Data Checklist

Before transferring data to Samba TV, please review the following checklist to ensure your files are complete and properly formatted.

Required Parameters

  • Date-time or Timestamp included
  • IP Address (can be hashed, but not truncated)
  • User Agent or Client Hints (or equivalent*)
  • All available User IDs, preferably in separate columns
  • IP addresses complete, not truncated
  • User Agents complete, not truncated

File Format

  • Data is provided in a supported format:
    • Structured: Parquet, ORC, NDJSON, Avro
    • Delimited: CSV/TSV
  • User Agents are quoted if they contain a separator character
  • Empty values are clearly indicated ("", null, \N, or 0)
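As an illustration of the quoting and empty-value items above, here is a minimal sketch of a CSV export using Python's standard csv module. The column names, the sample row, and the choice of \N as the empty-value marker are assumptions made for this example, not a layout required by Samba TV.

```python
import csv
import io

# Hypothetical column names and sample data, for illustration only.
FIELDS = ["timestamp", "ip", "user_agent", "cookie_id", "maid"]
ROWS = [
    {
        "timestamp": "2018-01-22T10:15:00Z",
        "ip": "203.0.113.7",
        # Contains a comma, so it must be quoted in a comma-delimited file.
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko)",
        "cookie_id": "abc123",
        "maid": None,  # no mobile advertising ID for this event
    },
]


def to_csv(rows, fields=FIELDS, null_marker="\\N"):
    """Serialize rows to CSV text, writing null_marker for missing values."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=fields,
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {f: null_marker if row.get(f) is None else row[f] for f in fields}
        )
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(ROWS), end="")
```

With QUOTE_MINIMAL, the writer quotes only fields that contain the delimiter, the quote character, or a line break, so the User Agent above is quoted while the other columns are left bare.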

File Size, Compression & Naming

  • File sizes between 100 MB and 2 GB
  • Files are gzip-compressed or uncompressed
  • File paths include a date stamp in YYYYMMDD format

Example:

20180122/datasource_20180122_part123.log.gz 
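
The naming pattern in the example can be generated and checked with a short script. This is a sketch under assumptions: the helper names build_path and has_valid_date_stamp are hypothetical, and the source name and part number are simply taken from the example path above.

```python
import re
from datetime import date

# Exactly eight consecutive digits, not embedded in a longer digit run.
DATE_STAMP = re.compile(r"(?<!\d)(\d{8})(?!\d)")


def build_path(source: str, day: date, part: int) -> str:
    """Build a path shaped like the example: YYYYMMDD/source_YYYYMMDD_partN.log.gz."""
    stamp = day.strftime("%Y%m%d")
    return f"{stamp}/{source}_{stamp}_part{part}.log.gz"


def has_valid_date_stamp(path: str) -> bool:
    """Return True if the path contains an 8-digit run that is a real calendar date."""
    for match in DATE_STAMP.finditer(path):
        digits = match.group(1)
        try:
            date(int(digits[:4]), int(digits[4:6]), int(digits[6:]))
            return True
        except ValueError:
            continue  # e.g. month 13 or day 40; keep looking
    return False


if __name__ == "__main__":
    path = build_path("datasource", date(2018, 1, 22), 123)
    print(path, has_valid_date_stamp(path))
```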

Notes & Special Cases

  1. User Agent Alternatives: If a full browser-like User Agent isn’t available (e.g., mobile app data), please discuss equivalent fields or Client Hints with Samba TV.
  2. User ID Types: If multiple identifiers are present (cookies, MAID/IFA/GAID, Vendor IDs, Impression IDs), they must either:
    1. Be placed in separate columns, or
    2. Include an ID type column to distinguish them.
  3. IP Address Truncation: Some datasets replace the final octet with zero or another placeholder. If truncated IPs are included, please discuss this with Samba TV before delivery.
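
One way to spot the truncation described in note 3 before delivery is to measure how many IPv4 addresses end in a zero octet. The sketch below is a heuristic under assumptions: the function name final_octet_zero_share is hypothetical, it only works on unhashed IPv4 values, and it does not detect other placeholders. Some legitimate addresses do end in .0, so a small share is expected, while a share near 100% suggests truncation.

```python
import ipaddress


def final_octet_zero_share(ips):
    """Return the fraction of parseable IPv4 addresses whose last octet is 0.

    Values that are not valid IP addresses (hashed, empty, malformed) and
    IPv6 addresses are skipped, since this check does not apply to them.
    """
    checked = 0
    zeros = 0
    for raw in ips:
        try:
            addr = ipaddress.ip_address(raw.strip())
        except ValueError:
            continue  # hashed or malformed value
        if addr.version != 4:
            continue
        checked += 1
        if addr.packed[-1] == 0:
            zeros += 1
    return zeros / checked if checked else 0.0


if __name__ == "__main__":
    sample = ["203.0.113.0", "198.51.100.0", "192.0.2.15", "not-an-ip"]
    print(f"{final_octet_zero_share(sample):.2f}")
```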

Please Avoid

❌ Any type of sampling (except for region-based sampling)
❌ Alteration of ID values or modification of source events
❌ Substituting event/impression IDs in place of Cookie/User IDs/MAIDs

Samba TV’s identity algorithms must process the entire dataset to produce meaningful, accurate results.