Requirements and formatting specifications for batch log files

Data Format

Screen6 supports the following data file formats:

  • Apache Parquet
  • Optimized Row Columnar (ORC)
  • Newline-delimited JSON
  • Apache Avro
  • Tab- or comma-separated values (TSV/CSV), or a similar custom delimiter

If you use a delimited format, tab separation is preferred because values that contain commas (e.g. User Agents) do not need to be quoted.
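
As an illustration of the quoting difference, the following minimal sketch renders the same row with a tab and with a comma delimiter using Python's standard csv module. The row contents and the render helper are hypothetical and are not part of any Screen6 tooling.

```python
import csv
import io

# Hypothetical event row: timestamp, device ID, User Agent.
# The User Agent contains a comma, which is what forces quoting in CSV.
rows = [
    [
        "2015-01-22T10:00:00Z",
        "device-1",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)",
    ],
]


def render(rows, delimiter):
    """Serialize rows to text with the given delimiter (illustrative helper)."""
    buf = io.StringIO()
    # The default quoting mode only quotes fields that contain the delimiter,
    # the quote character, or a line terminator.
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


tsv_text = render(rows, "\t")  # no quotes needed: no field contains a tab
csv_text = render(rows, ",")   # the User Agent field is wrapped in quotes
```

With tab separation the User Agent is written verbatim, while the comma-separated output wraps it in double quotes.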

Files must be encoded in UTF-8 or ISO-8859-1.

Empty values

If your data contains empty values (for example, events where a DeviceID is missing), Screen6 needs to know how these empty values are indicated. Common ways of indicating empty values are:

  • empty string (the empty field must still be delimited by the separator character!)
  • null
  • \N
  • 0
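
The sketch below shows one way to write a missing value consistently when producing tab-separated lines. The function names and the choice of \N as the default marker are assumptions for illustration; use whichever marker you have agreed on with Screen6.

```python
# Assumption: "\N" is the agreed empty-value marker. Any of the markers
# listed above could be substituted.
NULL_MARKER = "\\N"


def format_field(value, null_marker=NULL_MARKER):
    """Return the marker for a missing value, otherwise the value as text."""
    return null_marker if value is None else str(value)


def format_row(values, delimiter="\t", null_marker=NULL_MARKER):
    """Join fields with the delimiter, keeping one field per column."""
    return delimiter.join(format_field(v, null_marker) for v in values)


# A row whose second column (e.g. a DeviceID) is missing.
line = format_row(["2015-01-22", None, "click"])

# With the empty string as marker, both separators are still written,
# so the column count stays the same.
empty_line = format_row(["2015-01-22", None, "click"], null_marker="")
```

In both variants the row keeps three columns; only the representation of the missing value changes.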

Size, compression and naming

The log files you provide to us should comply with the following requirements:

  • File size: files should not exceed 2 GB. Also avoid large numbers of small files (below 50 MB), as this slows down ingestion.
  • Compression: gzip, or no compression
  • Naming: filenames should include a date stamp in YYYYMMDD format. For example: client_20150122_part123.log.gz
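
The requirements above could be checked before upload with a small script such as the sketch below. The helper names are hypothetical, and the byte thresholds assume binary units (1 GB = 1024^3 bytes), which this document does not specify.

```python
import re
from datetime import date, datetime

# Assumption: binary units. The text only says "2 GB" and "50 MB".
MAX_BYTES = 2 * 1024**3
MIN_BYTES = 50 * 1024**2

# An 8-digit run that is not part of a longer run of digits.
DATE_STAMP = re.compile(r"(?<!\d)(\d{8})(?!\d)")


def build_filename(prefix, day, part):
    """Build a name like client_20150122_part123.log.gz (illustrative pattern)."""
    return f"{prefix}_{day:%Y%m%d}_part{part}.log.gz"


def has_date_stamp(filename):
    """Return True if the name contains an 8-digit run that is a valid YYYYMMDD date."""
    for match in DATE_STAMP.finditer(filename):
        try:
            datetime.strptime(match.group(1), "%Y%m%d")
            return True
        except ValueError:
            continue
    return False


def size_status(num_bytes):
    """Classify a file size against the limits above."""
    if num_bytes > MAX_BYTES:
        return "too large"
    if num_bytes < MIN_BYTES:
        return "small"
    return "ok"


name = build_filename("client", date(2015, 1, 22), 123)
```

Here size_status reports "small" as a warning rather than an error, since small files are discouraged but not forbidden.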