Data checklist
[block:html]
{
  "html": "<div>\n<ol>\n  <li> Did I include <b>all of these fields</b>:\n    <ul>\n      <li> Date-time or timestamp </li>\n      <li> IP address (may be hashed, but not truncated) </li>\n      <li> User agent or equivalent <sup>(1)</sup> </li>\n      <li> All available types of user ID, preferably in separate columns <sup>(2)</sup> </li>\n    </ul>\n  </li>\n  <li> Are all <b>IP addresses</b> complete, not truncated? <sup>(3)</sup> </li>\n  <li> Are all <b>user agents</b> complete, not truncated? </li>\n  <li> Is the data in one of the supported formats?\n    <ul type='a'>\n      <li> Structured formats: Parquet, ORC, NDJSON or Avro </li>\n      <li> Delimited formats: CSV or TSV\n        <ol type='i'>\n          <li> Are user agents <b>quoted</b> wherever they contain the delimiter character? </li>\n          <li> Are <b>empty values</b> clearly indicated? </li>\n        </ol>\n      </li>\n    </ul>\n  </li>\n  <li> Are my files either <b>gzip-compressed</b> or uncompressed? </li>\n  <li> Is every file <b>between 100 MB and 2 GB</b>? </li>\n  <li> Does each file path include a <b>date stamp</b> in YYYYMMDD format, e.g. <code>20180122/datasource_20180122_part123.log.gz</code>? </li>\n</ol>\n\n<p>&nbsp;</p>\n\n<p><small>(1)</small> If a full, browser-style user agent isn't available (for example, for data from mobile applications), please discuss alternatives with Screen6.</p>\n<p><small>(2)</small> If different types of user identifier (cookies, MAIDs such as IFA or GAID, vendor IDs, impression IDs) are combined, it's essential to identify the type of each ID clearly, either by using separate columns or by including an ID-type column.</p>\n<p><small>(3)</small> IP addresses are sometimes truncated by replacing the final octet with a zero or another placeholder. If any of your data will include truncated IP addresses, please discuss this with us.</p>\n</div>\n\n<style>\nol { line-height: 2em; }\nul { line-height: 1.5em; }\n</style>"
}
[/block]

[block:callout]
{
  "type": "warning",
  "title": "Please avoid:",
  "body": "* Any kind of sampling, except region-based sampling.\n* Altering ID values or otherwise modifying source events.\n* Providing event or impression IDs instead of cookie IDs, user IDs or MAIDs.\n\nScreen6 algorithms must process the entire dataset to produce meaningful results."
}
[/block]
If you have any questions, please don't hesitate to contact us. Thanks for reading!
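You can check several file-level items in the checklist automatically before delivery: gzip or no compression, 100 MB to 2 GB per file, and a YYYYMMDD date stamp in the path. The Python below is a minimal, hypothetical sketch of such a check. It is not a Screen6 tool, and the helper names `has_date_stamp`, `size_ok`, `compression_problem` and `check_file` are illustrative. The sketch makes three assumptions that the checklist does not state. It treats MB and GB as decimal units. It applies the size limits to each file as delivered, so a `.gz` file is measured by its compressed size. It recognises only a few common non-gzip compression signatures.

```python
from __future__ import annotations

import re
from datetime import datetime
from pathlib import Path

# Assumptions (not stated by Screen6): "100MB" and "2GB" are decimal units,
# and the limits apply to each file as delivered (compressed size for .gz).
MIN_BYTES = 100 * 10**6
MAX_BYTES = 2 * 10**9

GZIP_MAGIC = b"\x1f\x8b"
# Leading bytes of some common non-gzip compression formats (not exhaustive).
OTHER_COMPRESSION = {
    b"PK\x03\x04": "zip",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
    b"\x28\xb5\x2f\xfd": "zstd",
}

# A run of exactly eight digits, e.g. "20180122".
EIGHT_DIGITS = re.compile(r"(?<!\d)\d{8}(?!\d)")


def has_date_stamp(path: str) -> bool:
    """True if the path contains at least one valid YYYYMMDD date."""
    for match in EIGHT_DIGITS.finditer(path):
        try:
            datetime.strptime(match.group(), "%Y%m%d")
            return True
        except ValueError:
            continue  # eight digits, but not a real date (e.g. 20181399)
    return False


def size_ok(num_bytes: int) -> bool:
    """True if the size is within the 100 MB to 2 GB range (inclusive)."""
    return MIN_BYTES <= num_bytes <= MAX_BYTES


def compression_problem(path: str, header: bytes) -> str | None:
    """Describe a compression issue from the file's first bytes, or return None."""
    for magic, name in OTHER_COMPRESSION.items():
        if header.startswith(magic):
            return f"{name}-compressed; use gzip or no compression"
    # A gzip stream should carry a .gz extension, and only a gzip stream should.
    if header.startswith(GZIP_MAGIC) != path.endswith(".gz"):
        return "file extension does not match gzip content"
    return None


def check_file(path: str) -> list[str]:
    """Return the checklist problems found for one delivery file."""
    problems = []
    if not has_date_stamp(path):
        problems.append("path has no YYYYMMDD date stamp")
    size = Path(path).stat().st_size
    if not size_ok(size):
        problems.append(f"size {size} bytes is outside 100 MB to 2 GB")
    with open(path, "rb") as f:
        issue = compression_problem(path, f.read(6))
    if issue is not None:
        problems.append(issue)
    return problems
```

Consider `check_file("20180122/datasource_20180122_part123.log.gz")`. If that file exists, starts with a gzip header and is between 100 MB and 2 GB, the call returns an empty list. The compression check reads only the first six bytes, so it stays cheap even for 2 GB files. This also limits what it can tell you. It cannot prove a file is uncompressed, only that it does not start with one of the listed signatures.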