“Anonymized data” is one of those holy grails, like “healthy ice-cream” or “selectively breakable crypto” — if “anonymized data” is a thing, then companies can monetize their surveillance dossiers on us by selling them to all comers, without putting us at risk or putting themselves in legal jeopardy (to say nothing of the benefits to science and research of being able to do large-scale data analyses and then publish them along with the underlying data for peer review without posing a risk to the people in the data-set, AKA “release and forget”).
As the old saying goes: “wanting it badly is not enough.” Worse still, legislatures around the world are convinced that because anonymized data would be amazing and profitable and useful, it must therefore be possible, and they’ve made laws that say, “once you’ve anonymized this data, you can treat it like it is totally harmless,” without ever saying what “anonymization” actually entails.
Enter a research team from Imperial College London and Belgium’s Université Catholique de Louvain, whose Nature Communications article, “Estimating the success of re-identifications in incomplete datasets using generative models,” shows that they can reidentify “99.98 percent of Americans from almost any available data set with as few as 15 attributes.” That means that virtually every large-scale, anonymized data-set for sale or circulating for scientific research purposes today is not anonymized at all, and should not be circulating or sold. (Rob discussed this earlier today.)
The researchers chose to publish their method rather than keep it a secret so that people who maintain these data-sets can use it to test whether their anonymization methods actually work (Narrator: They don't).
“While rich medical, behavioral, and socio-demographic data are key to modern data-driven research, their collection and use raise legitimate privacy concerns. Anonymizing datasets through de-identification and sampling before sharing them has been the main tool used to address those concerns. We here propose a generative copula-based method that can accurately estimate the likelihood of a specific person to be correctly re-identified, even in a heavily incomplete dataset. On 210 populations, our method obtains AUC scores for predicting individual uniqueness ranging from 0.84 to 0.97, with low false-discovery rate. Using our model, we find that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes. Our results suggest that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model.”
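To get a feel for why a handful of attributes is enough, here is a rough sketch in Python. It is not the researchers' copula-based method: it just counts how many records in a made-up data-set have a combination of attribute values that nobody else shares. The synthetic population, the attribute names (`attr0`, `attr1`, ...) and the helper functions are all hypothetical, and the values are drawn uniformly at random, which real demographic data are not. It also measures uniqueness within the sample only, whereas the paper estimates uniqueness in the whole population from an incomplete sample.

```python
import random
from collections import Counter


def uniqueness_fraction(records, attributes):
    """Fraction of records whose combination of `attributes` is shared by no other record."""
    if not records:
        return 0.0
    keys = [tuple(record[name] for name in attributes) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(records)


def make_synthetic_population(n, n_attributes, n_values, seed=0):
    """Hypothetical stand-in for a real data-set: uniform random categorical attributes."""
    rng = random.Random(seed)
    return [
        {f"attr{i}": rng.randrange(n_values) for i in range(n_attributes)}
        for _ in range(n)
    ]


# Assumed toy parameters: 10,000 people, 15 attributes, 4 possible values each.
population = make_synthetic_population(n=10_000, n_attributes=15, n_values=4)
attribute_names = [f"attr{i}" for i in range(15)]

# Uniqueness when an attacker knows the first k attributes, for k = 1..15.
fractions = [
    uniqueness_fraction(population, attribute_names[:k]) for k in range(1, 16)
]

for k, fraction in enumerate(fractions, start=1):
    print(f"{k:2d} attributes known: {fraction:.1%} of records are unique")
```

Each extra attribute can only split groups of look-alike records into smaller groups, so the unique fraction never goes down as more attributes are known. That is the basic arithmetic working against “release and forget.”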
Your Data Were ‘Anonymized’? These Scientists Can Still Identify You [Gina Kolata/New York Times]
Estimating the success of re-identifications in incomplete datasets using generative models [Luc Rocher, Julien M. Hendrickx & Yves-Alexandre de Montjoye/Nature Communications]