Before transferring data to Samba TV, please review the following checklist to ensure your files are complete and properly formatted.
Required Parameters
- Date-time or Timestamp included
- IP Address (can be hashed, but not truncated)
- User Agent or Client Hints (or an equivalent; see Notes & Special Cases)
- All available User IDs, preferably in separate columns
- IP addresses complete, not truncated
- User Agents complete, not truncated
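As a rough pre-flight check, the required parameters above can be compared against a file's header row. The sketch below is illustrative only: every column name in it (`timestamp`, `ip_address`, `cookie_id`, and so on) is a hypothetical placeholder, not a name Samba TV prescribes, so substitute whatever your export actually uses.

```python
# Hypothetical column names; replace them with the ones in your own export.
REQUIRED = {
    "timestamp": {"timestamp", "datetime", "event_time"},
    "ip": {"ip", "ip_address"},
    "user_agent": {"user_agent", "client_hints"},
}
ID_COLUMNS = {"cookie_id", "maid", "vendor_id"}


def missing_parameters(header):
    """Return the required parameters that no column in the header covers."""
    cols = {c.strip().lower() for c in header}
    missing = [name for name, aliases in REQUIRED.items() if not cols & aliases]
    if not cols & ID_COLUMNS:
        missing.append("user_id")  # at least one user ID column is expected
    return missing


print(missing_parameters(["Timestamp", "ip_address", "user_agent", "maid"]))
```

An empty list means every required parameter has at least one matching column. It does not prove the values themselves are complete or untruncated.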
File Format
- Data is provided in a supported format:
  - Structured: Parquet, ORC, NDJSON, Avro
  - Delimited: CSV/TSV
- User Agents are quoted if they contain a separator character
- Empty values are clearly indicated ("", null, \N, or 0)
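For delimited files, one way to satisfy the quoting and empty-value items is to let a CSV library handle both. The following is a minimal sketch using Python's standard `csv` module. The column names and the choice of `\N` as the empty-value marker are assumptions made for illustration, and any of the markers listed above would work the same way.

```python
import csv
import io

NULL_MARKER = r"\N"  # assumption: one of the accepted empty-value markers


def write_rows(rows, fieldnames, out):
    """Write dict rows as CSV, quoting any field that contains the delimiter."""
    writer = csv.DictWriter(
        out,
        fieldnames=fieldnames,
        quoting=csv.QUOTE_MINIMAL,  # quotes only fields that need it
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        # Replace missing or empty values with the explicit marker.
        writer.writerow(
            {k: NULL_MARKER if row.get(k) in (None, "") else row[k] for k in fieldnames}
        )


# Hypothetical column names and sample record.
FIELDS = ["timestamp", "ip", "user_agent", "cookie_id", "maid"]
UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)"

buf = io.StringIO()
write_rows(
    [{"timestamp": "2018-01-22T10:15:00Z", "ip": "203.0.113.7",
      "user_agent": UA, "cookie_id": None, "maid": "abc123"}],
    FIELDS,
    buf,
)
csv_text = buf.getvalue()
print(csv_text)
```

Because the sample User Agent contains a comma, the writer wraps it in double quotes, and the missing `cookie_id` is written as `\N` rather than left blank.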
File Size, Compression & Naming
- File sizes between 100MB and 2GB
- Files are gzip-compressed or uncompressed
- File paths include a date stamp in YYYYMMDD format
Example: 20180122/datasource_20180122_part123.log.gz
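The naming and size items can be checked before upload with a few lines of code. This is only a sketch under stated assumptions: it treats 100MB and 2GB as decimal units (10^6 and 10^9 bytes), which the checklist does not specify, and the function names are hypothetical.

```python
import re
from datetime import date

MIN_BYTES = 100 * 10**6  # assumption: 100MB read as decimal megabytes
MAX_BYTES = 2 * 10**9    # assumption: 2GB read as decimal gigabytes

# Exactly eight digits, not part of a longer digit run.
STAMP_RE = re.compile(r"(?<!\d)(\d{8})(?!\d)")


def path_date_stamps(path):
    """Return every eight-digit run in the path that is a valid YYYYMMDD date."""
    stamps = []
    for s in STAMP_RE.findall(path):
        try:
            date(int(s[:4]), int(s[4:6]), int(s[6:]))
        except ValueError:
            continue  # e.g. month 13 or day 40
        stamps.append(s)
    return stamps


def check_file(path, size_bytes):
    """Return a list of problems found for one file; empty means it passes."""
    problems = []
    if not path_date_stamps(path):
        problems.append("no YYYYMMDD date stamp in path")
    if not MIN_BYTES <= size_bytes <= MAX_BYTES:
        problems.append("size outside 100MB-2GB")
    return problems


print(check_file("20180122/datasource_20180122_part123.log.gz", 500 * 10**6))
```

In practice `size_bytes` would come from something like `os.path.getsize(path)`. It is passed in explicitly here so the check can be run without touching the filesystem.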
Notes & Special Cases
- User Agent Alternatives: If a full browser-like User Agent isn’t available (e.g., mobile app data), please discuss equivalent fields or Client Hints with Samba TV.
- User ID Types: If multiple identifiers are present (cookies, MAID/IFA/GAID, Vendor IDs, Impression IDs), they must either:
  - Be placed in separate columns, or
  - Include an ID type column to distinguish them.
- IP Address Truncation: Some datasets replace the final octet with zero or another placeholder. If truncated IPs are included, please discuss this with Samba TV before delivery.
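To spot the zeroed-final-octet pattern described above before delivery, a simple heuristic is to measure how many IPv4 addresses in a sample end in `.0`. The sketch below uses Python's standard `ipaddress` module. It only applies to unhashed IPv4 values, it skips anything that does not parse, and the function name is hypothetical. A high share is a hint that truncation may have occurred, not proof.

```python
import ipaddress


def zero_last_octet_share(ips):
    """Fraction of parseable IPv4 addresses whose final octet is 0."""
    v4 = []
    for raw in ips:
        try:
            addr = ipaddress.ip_address(raw.strip())
        except ValueError:
            continue  # hashed or malformed values are skipped
        if addr.version == 4:
            v4.append(addr)
    if not v4:
        return 0.0
    zeros = sum(1 for a in v4 if a.packed[-1] == 0)
    return zeros / len(v4)


sample = ["203.0.113.0", "198.51.100.0", "192.0.2.17", "not-an-ip"]
print(zero_last_octet_share(sample))
```

In untruncated traffic, only a small fraction of addresses would normally end in `.0`, so a share close to 1.0 suggests the final octet was replaced upstream.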
Please Avoid
❌ Any type of sampling (except for region-based sampling)
❌ Alteration of ID values or modification of source events
❌ Substituting event/impression IDs in place of Cookie/User IDs/MAIDs
Samba TV’s identity algorithms must process the entire dataset to produce meaningful, accurate results.