
File format

Requirements and formatting specifications for batch log files

[block:api-header]
{
  "type": "basic",
  "title": "Data Format"
}
[/block]
Screen6 supports the following data file formats:

* Apache Parquet
* Optimized Row Columnar (ORC)
* Newline-delimited JSON
* Apache Avro
* Tab- or comma-separated values (TSV/CSV), or a similar custom value separator

If you use a delimited format, tab separation is preferred because values that contain commas (e.g. User Agents) then do not need to be quoted.

Files must be encoded in `UTF-8` or `ISO-8859-1`.
[block:api-header]
{
  "type": "basic",
  "title": "Empty values"
}
[/block]
If your data contains empty values (for example, events where a DeviceID is missing), Screen6 needs to know how these empty values are indicated. Common ways of specifying empty values are:

* empty string (**the empty field must still be delimited by the separator character!**)
* `null`
* `\N`
* `0`
[block:api-header]
{
  "type": "basic",
  "title": "Size, compression and naming"
}
[/block]
The log files you provide to us should comply with the following requirements:

  * **File size**: a file should not exceed **2 GB**. Also avoid large numbers of small files (below 50 MB), as this slows down ingestion.
  * **Compression**: use gzip, or leave files uncompressed.
  * **Naming**: filenames should include a date stamp in YYYYMMDD format. For example: `client_20150122_part123.log.gz`
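
The requirements above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Screen6 tooling: the helper names (`build_filename`, `write_tsv_gz`, `check_file`) are hypothetical, the filename pattern simply mirrors the example above, missing values are written as empty strings, and the 2 GB limit is read conservatively as 2 × 10^9 bytes.

```python
import csv
import gzip
import re
from datetime import date
from pathlib import Path

# Assumption: "2 GB" is interpreted as decimal gigabytes to stay on the safe side.
MAX_BYTES = 2 * 1000**3
# Eight consecutive digits that are not part of a longer digit run (YYYYMMDD).
DATE_STAMP = re.compile(r"(?<!\d)\d{8}(?!\d)")


def build_filename(client: str, day: date, part: int) -> str:
    """Build a name such as client_20150122_part123.log.gz (illustrative pattern)."""
    return f"{client}_{day:%Y%m%d}_part{part}.log.gz"


def write_tsv_gz(path: Path, rows) -> None:
    """Write rows as gzip-compressed, UTF-8 encoded, tab-separated text.

    None is written as an empty string, so the surrounding tab
    separators are still present for the missing field.
    """
    with gzip.open(path, "wt", encoding="utf-8", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        for row in rows:
            writer.writerow(["" if value is None else value for value in row])


def check_file(path: Path) -> list:
    """Return a list of problems found with the file name and size."""
    problems = []
    if not DATE_STAMP.search(path.name):
        problems.append("filename has no YYYYMMDD date stamp")
    if path.stat().st_size > MAX_BYTES:
        problems.append("file exceeds 2 GB")
    return problems
```

With tab separation, a User Agent value that contains commas is written without quotes, and a missing DeviceID appears as two adjacent tab characters. Whichever empty-value marker you choose, it should be the same in every file.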