Anonymised data unwound
Researchers say it is becoming easier to identify individuals from anonymised data.
A method that estimates how likely it is that a specific person can be correctly re-identified from an incomplete, anonymised dataset has been presented in Nature Communications.
The paper suggests that current methods of anonymisation and data sharing may be inadequate to protect individual privacy or to satisfy requirements set by data protection laws, such as the European Union's General Data Protection Regulation (GDPR).
Much of the work in data science and artificial intelligence depends on large-scale, detailed, individual-level data, and the collection and sharing of such data has raised concerns about individual privacy.
Authorities often anonymise and release only partial datasets to address these concerns. However, recent successful re-identifications of anonymised datasets, including browsing histories and mobile phone and credit card data, have shown that these practices may be inadequate.
Researchers in the UK set out to re-identify individuals in anonymised datasets using only a few data points, such as postcode, date of birth, number of children and gender.
They found that most people could be picked out fairly easily, with a relatively high level of confidence.
The study shows that the likelihood of identification increases rapidly with the number of known attributes.
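To show why each extra attribute matters, here is a minimal sketch that counts how many records in a small, invented table are unique on their first k attributes. It is a simplified stand-in, not the authors' method: the study uses a statistical model to estimate the probability of correct re-identification across a whole population, whereas this only measures uniqueness within the toy table. The records, the attribute order and the `fraction_unique` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical toy records: (postcode, birth_year, gender, children).
# These values are invented purely for illustration, not taken from the study.
RECORDS = [
    ("SW7", 1985, "F", 2),
    ("SW7", 1985, "M", 0),
    ("SW7", 1990, "F", 1),
    ("SW7", 1990, "F", 3),
    ("NW1", 1985, "F", 2),
    ("NW1", 1972, "M", 1),
    ("NW1", 1972, "M", 2),
    ("E1", 1990, "M", 0),
]


def fraction_unique(records, n_attributes):
    """Share of records whose first n_attributes values match no other record."""
    keys = [rec[:n_attributes] for rec in records]
    counts = Counter(keys)
    # A record is unique if its combination of attribute values occurs exactly once.
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, fraction_unique(RECORDS, k))
    # Prints:
    # 1 0.125
    # 2 0.25
    # 3 0.5
    # 4 1.0
```

In this toy table the share of unique records doubles with each added attribute until every record stands alone. Real data will not follow so tidy a pattern, but the direction matches the one the study describes.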
For example, 99.98 per cent of people in Massachusetts would be identifiable based on 15 demographic attributes.
Releasing only a sampled or partial dataset is therefore not sufficient to protect individual privacy, the authors conclude.