Selecting data for deduplication with Screen6

Because the level of deduplication we can achieve depends on the data you provide, it’s important to select the right data. Keep the following in mind: date range, density, device mix and user ID types.

  • Date range
    The data should cover a sufficiently long period, because our pattern-matching algorithms work by spotting patterns over time. We need at least two to three weeks of data; depending on density, up to four weeks may be required.

  • Density in data
    The data you select should be dense in terms of User Identifier (UID) activity. The more frequently we encounter a UID, the sooner we can deduplicate it.

  • Device mix
    Including traffic from a mix of devices (mobile, PC, TV, OTT) increases the level of deduplication: we can spot more patterns, and there are more devices that we can deduplicate. We can also connect UIDs within a single device (intra-device).

  • User ID types
    The data should contain all available UID types. If you have access to device IDs, IDFAs (Identifiers for Advertisers), Android IDs or any other type of UID, include all of them in the data. These non-cookie UIDs are more persistent than cookies, which helps deduplication.
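Before sending an extract, it can help to profile it against the four criteria above. The sketch below is illustrative only: the `Event` record layout, its field names, the "events per UID" density proxy and the 14-day threshold (taken from the two-week minimum above) are all assumptions, not a Screen6 input format or API.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    """One row of a hypothetical extract; field names are illustrative only."""
    uid: str          # the user identifier (cookie ID, IDFA, Android ID, ...)
    uid_type: str     # e.g. "cookie", "idfa", "android_id"
    device_type: str  # e.g. "mobile", "pc", "tv", "ott"
    timestamp: datetime


def profile_extract(events: list[Event]) -> dict:
    """Summarise an extract along date range, density, device mix and UID types."""
    if not events:
        raise ValueError("empty extract")
    times = [e.timestamp for e in events]
    span = max(times) - min(times)
    # Assumed density proxy: average number of events seen per distinct UID.
    per_uid = Counter(e.uid for e in events)
    return {
        "span_days": span / timedelta(days=1),
        "events_per_uid": len(events) / len(per_uid),
        "device_types": sorted({e.device_type for e in events}),
        "uid_types": sorted({e.uid_type for e in events}),
    }


def warnings_for(profile: dict, min_days: float = 14.0) -> list[str]:
    """Flag extracts that look too short or too narrow (thresholds are assumptions)."""
    issues = []
    if profile["span_days"] < min_days:
        issues.append(f"covers only {profile['span_days']:.1f} days; aim for 2 to 4 weeks")
    if len(profile["device_types"]) < 2:
        issues.append("only one device type present")
    if profile["uid_types"] == ["cookie"]:
        issues.append("only cookie UIDs present; include device IDs if available")
    return issues


# Usage: three events, two UIDs, two device types, spanning 20 days.
start = datetime(2024, 1, 1)
sample = [
    Event("c1", "cookie", "pc", start),
    Event("c1", "cookie", "pc", start + timedelta(days=3)),
    Event("a9", "idfa", "mobile", start + timedelta(days=20)),
]
profile = profile_extract(sample)
print(profile["events_per_uid"])  # 1.5
print(warnings_for(profile))      # []
```

The checks only flag obvious gaps; whether a given extract is dense enough is still best confirmed with Screen6.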

Sampling

You can reduce data volume by providing data for a single country or region. Extract only the required attributes, and use a compact, columnar data format such as Parquet or ORC.

We very strongly recommend against any other type of data sampling. If volume is an issue, please discuss the options with the Screen6 Technical Ops team.
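As a rough illustration of the permitted reductions, the sketch below keeps every row for one country and drops all columns except a required set, without any random or per-user sampling. The column names in `REQUIRED_COLUMNS` and the `restrict_extract` helper are hypothetical; it reads and writes CSV with the standard library only, so converting the result to Parquet or ORC (for example with Apache Arrow's `pyarrow` library) is left out.

```python
import csv
import io

# Hypothetical column names; use whichever attributes Screen6 actually requires.
REQUIRED_COLUMNS = ["uid", "uid_type", "device_type", "timestamp", "country"]


def restrict_extract(src, dst, country: str, columns=REQUIRED_COLUMNS) -> int:
    """Copy all rows for one country, keeping only the required columns.

    No rows are dropped at random: the only reductions are the country
    filter and the column projection. Returns the number of rows written.
    """
    reader = csv.DictReader(src)
    # extrasaction="ignore" silently drops any column not listed in `columns`.
    writer = csv.DictWriter(dst, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    written = 0
    for row in reader:
        if row["country"] == country:
            writer.writerow(row)
            written += 1
    return written


# Usage: the page_url column and the DE row are removed.
raw = io.StringIO(
    "uid,uid_type,device_type,timestamp,country,page_url\n"
    "c1,cookie,pc,2024-01-01T10:00:00,NL,/home\n"
    "a9,idfa,mobile,2024-01-02T11:00:00,DE,/news\n"
)
out = io.StringIO()
n = restrict_extract(raw, out, "NL")  # n == 1
```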