
Precision validation


[block:api-header]
{
  "type": "basic",
  "title": "Precision"
}
[/block]
Screen6 uses probabilistic algorithms to match User IDs that belong to the same person. These matches should be as accurate as possible. When deterministic verification data, such as login IDs or email addresses, is available, the client and Screen6 can compute the accuracy of the verifiable part of the graph. This figure is called precision.
[block:api-header]
{
  "type": "basic",
  "title": "How we compute precision"
}
[/block]
We compute precision and recall using deterministic data that clients provide to us. Because the device graphs that we produce are private to each client, we cannot use deterministic data collected elsewhere for verification.

## Pairs-based and cluster-based verification ##
We use two methods for computing precision: pairs-based and cluster-based.

The pairs-based method looks at individual pairs of User IDs. A pair is labelled correct when both User IDs have the same Deterministic ID; a pair is labelled incorrect when the two User IDs have different Deterministic IDs.

```Precision = correctPairs / (correctPairs + incorrectPairs)```

The cluster-based method looks at all the User IDs that belong to a single cluster, or Match ID. A cluster has been matched correctly when only one Deterministic ID is found for the cluster and that Deterministic ID is linked to at least two of the cluster's User IDs.

```Precision = correctClusters / (correctClusters + incorrectClusters)```

## Cleaning verification data ##
Before calculating precision, the verification data is cleaned:

- User IDs, and their corresponding Deterministic IDs, that are not present in the source event data sent to Screen6 are discarded during verification.
- A User ID should not be linked to more than one Deterministic ID. When it is, the related Deterministic IDs are removed from the verification data.
- A Deterministic ID should link at least two User IDs. A Deterministic ID that links to only one User ID can only point to an incorrect match and can never confirm a correct one. Such IDs would therefore bias the precision calculation strongly downward.

[block:api-header]
{
  "type": "basic",
  "title": "Inaccuracies in deterministic data"
}
[/block]
When computing precision, the provided deterministic data is treated as a truth set. However, most of the time these data sets are not flawless themselves. For example, someone may log in on site 1 as johndoe@gmail.com and on site 2 as johndoe@mycompany.com. When Screen6 matches the corresponding User IDs, the match is correct, but the precision calculation will deem it incorrect.
[block:api-header]
{
  "type": "basic",
  "title": "Recall and coverage"
}
[/block]
Recall indicates what percentage of the matches in a deterministic verification data set have also been matched correctly in the device graph. Note that the same cleaning of the verification data described above should be applied.

Coverage does not relate to verification data. Coverage is the number of User IDs or impressions that can be linked to a Match ID. [Read more about Coverage here](doc:coverage-match-rate).
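As an illustration of the cleaning rules, the two precision formulas, and recall described above, here is a minimal Python sketch. It is not Screen6's implementation. The function names, the input shapes (a set of `(user_id, deterministic_id)` links, a `user_id -> match_id` mapping for the graph) and the sample data are all hypothetical. Three assumptions go beyond the text: the cleaning rules are applied in the order listed, a cluster counts as incorrect when it contains more than one Deterministic ID (clusters with fewer than two verifiable User IDs are skipped), and recall is computed over pairs.

```python
from collections import defaultdict
from itertools import combinations


def clean_verification(links, event_user_ids):
    """Apply the three cleaning rules (assumed order: as listed) to (user_id, det_id) links."""
    # Rule 1: drop links whose User ID is absent from the source event data.
    links = {(u, d) for u, d in links if u in event_user_ids}
    # Rule 2: a User ID linked to several Deterministic IDs invalidates all of them.
    dets_by_user = defaultdict(set)
    for u, d in links:
        dets_by_user[u].add(d)
    ambiguous = set()
    for dets in dets_by_user.values():
        if len(dets) > 1:
            ambiguous |= dets
    links = {(u, d) for u, d in links if d not in ambiguous}
    # Rule 3: keep only Deterministic IDs that link at least two User IDs.
    users_by_det = defaultdict(set)
    for u, d in links:
        users_by_det[d].add(u)
    return {u: d for u, d in links if len(users_by_det[d]) >= 2}


def clusters_of(graph):
    """Group User IDs by Match ID; graph maps user_id -> match_id."""
    clusters = defaultdict(list)
    for u, m in graph.items():
        clusters[m].append(u)
    return clusters


def pair_precision(graph, truth):
    correct = incorrect = 0
    for users in clusters_of(graph).values():
        verifiable = [u for u in users if u in truth]
        for a, b in combinations(verifiable, 2):
            if truth[a] == truth[b]:
                correct += 1
            else:
                incorrect += 1
    total = correct + incorrect
    return correct / total if total else None


def cluster_precision(graph, truth):
    correct = incorrect = 0
    for users in clusters_of(graph).values():
        dets = [truth[u] for u in users if u in truth]
        if len(set(dets)) > 1:
            incorrect += 1  # assumption: several Deterministic IDs => incorrect
        elif len(dets) >= 2:
            correct += 1    # one Deterministic ID on at least two User IDs
        # otherwise the cluster is unverifiable and skipped (assumption)
    total = correct + incorrect
    return correct / total if total else None


def pair_recall(graph, truth):
    """Assumed pairs-based recall: share of truth pairs that share a Match ID."""
    users_by_det = defaultdict(list)
    for u, d in truth.items():
        users_by_det[d].append(u)
    found = total = 0
    for users in users_by_det.values():
        for a, b in combinations(users, 2):
            total += 1
            if a in graph and graph.get(a) == graph.get(b):
                found += 1
    return found / total if total else None


# Hypothetical sample data.
links = {("u1", "d1"), ("u2", "d1"), ("u3", "d1"), ("u4", "d2"), ("u5", "d2"),
         ("u6", "d3"),                              # d3 links a single User ID
         ("u7", "d4"), ("u7", "d5"), ("u8", "d5"),  # u7 has two Deterministic IDs
         ("u9", "d1")}                              # u9 is not in the event data
event_user_ids = {f"u{i}" for i in range(1, 9)}
graph = {"u1": "m1", "u2": "m1", "u3": "m2", "u4": "m2",
         "u5": "m3", "u6": "m3", "u7": "m4", "u8": "m4"}

truth = clean_verification(links, event_user_ids)
print(pair_precision(graph, truth))     # 0.5
print(cluster_precision(graph, truth))  # 0.5
print(pair_recall(graph, truth))        # 0.25
```

In this sample, cleaning leaves only `d1` (u1, u2, u3) and `d2` (u4, u5). Cluster `m1` is confirmed by `d1`, cluster `m2` mixes `d1` and `d2`, and the remaining clusters cannot be verified, which is why both precision figures come out at 0.5 while only one of the four deterministic pairs is recovered.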