Tech Firms Can Easily Identify You Using Anonymized Data On Yourself
Chitanis - Aug 27, 2019
Researchers have shown that even if your personal information has been anonymized, modern techniques can still identify you.
Just by living in the modern world, you hand over a lot of personal information to services and institutions. Many of them promise to keep your data private and secure, but in practice they often share your anonymized data with third parties, either for profit or for research. New research, however, shows that anonymized data isn't so anonymous.
Recently, researchers at Imperial College London published a paper titled "Estimating the success of re-identifications in incomplete datasets using generative models," showing that the techniques currently used to anonymize datasets are insufficient. Before sharing a dataset, companies delete directly identifying information (names, email addresses, etc.). But even with those identifiers removed, it is often possible to cross-reference the remaining attributes and re-identify the people behind the records with high accuracy.
The researchers analyzed 210 datasets drawn from five sources, including US government data covering more than 11 million individuals. According to the study, using a machine learning model on datasets containing 15 demographic attributes (gender, date of birth, age, marital status, ZIP code, etc.), the researchers could re-identify up to 99.98% of people in an anonymized dataset. The findings, the researchers say, propose and validate "a statistical model to quantify the likelihood for a re-identification attempt to be successful, even if the disclosed dataset is heavily incomplete."
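The intuition behind the result can be illustrated with a small sketch. The following Python snippet (a toy simulation with made-up attributes, not the researchers' model) counts how many people in a synthetic population have a combination of quasi-identifiers that is unique to them. Even with only four coarse attributes, a large share of records become unique; with 15 attributes, as in the study, nearly everyone does.

```python
import random

random.seed(0)

# Toy population: each person is a tuple of quasi-identifier values.
# (gender, birth_year, zip_prefix, marital_status) -- hypothetical attributes.
def make_person():
    return (
        random.choice(["M", "F"]),
        random.randint(1950, 2000),
        random.randint(900, 961),          # coarse 3-digit ZIP prefix
        random.choice(["single", "married", "divorced"]),
    )

population = [make_person() for _ in range(10_000)]

def unique_fraction(records, attr_indices):
    """Fraction of records whose quasi-identifier combination is unique."""
    counts = {}
    for r in records:
        key = tuple(r[i] for i in attr_indices)
        counts[key] = counts.get(key, 0) + 1
    return sum(1 for r in records
               if counts[tuple(r[i] for i in attr_indices)] == 1) / len(records)

# Uniqueness grows sharply as more attributes are combined.
for k in range(1, 5):
    print(f"{k} attribute(s): {unique_fraction(population, range(k)):.3f} unique")
```

Gender alone identifies no one, but gender plus birth year, ZIP prefix, and marital status already makes a majority of this toy population unique, which is why adding attributes up to 15 drives re-identification toward certainty.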
The study offers a hypothetical: a health insurance company releases a dataset of 1,000 anonymous customers, 1% of its total customers in California. The dataset includes each customer's ZIP code, gender, date of birth, and breast cancer diagnosis. One of these individuals' boss notices a record for a man with the same date of birth and ZIP code as his employee and concludes, based on the dataset, that the employee has breast cancer and that his stage IV treatment did not succeed. The insurer, however, can argue that even an exact match on these attributes could correspond to someone else among the tens of thousands of people it insures, since the released sample is incomplete.
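The boss's inference in this hypothetical amounts to a simple filter over the released records. The sketch below uses entirely made-up records and field names to show the mechanics: if only one record survives the filter, the "anonymous" entry is effectively identified, and the insurer's defense rests on how many unreleased customers might also match.

```python
# Hypothetical released records: (zip_code, gender, dob, diagnosis).
# All values are invented for illustration.
released = [
    ("94110", "M", "1975-01-05", "stage IV breast cancer"),
    ("94110", "F", "1982-03-12", "none"),
    ("90210", "M", "1975-01-05", "none"),
]

def matches(records, zip_code, gender, dob):
    """Return every released record consistent with the known attributes."""
    return [r for r in records if r[:3] == (zip_code, gender, dob)]

# The boss knows the employee's ZIP code, gender, and date of birth.
hits = matches(released, "94110", "M", "1975-01-05")
```

The paper's contribution is precisely to quantify the insurer's defense: given how unusual the matching attribute combination is, how likely is it that the single hit really is the employee rather than one of the unreleased customers?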
Luc Rocher, one of the paper's authors and a researcher at Université Catholique de Louvain, said: "While there might be a lot of people who are in their thirties, male, and living in New York City, far fewer of them were also born on 5 January, are driving a red sports car, and live with two kids (both girls) and one dog."
Yves-Alexandre de Montjoye, the paper's senior author, characterized these attributes as "pretty standard information for companies to ask for."
The study's hypothetical is not pure fiction. In June, a patient at the University of Chicago Medicine sued both Google and the private research university for sharing his personal data without his permission. The medical center supposedly de-identified the dataset, yet it still provided Google with records of patients' vital signs, height, weight, diagnoses, medical procedures, medications, and date stamps. The complaint not only highlighted the privacy hole in sharing patient data without consent, but also pointed out that even when data is anonymized, powerful tech corporations can use their tools to reverse-engineer it and identify individuals.
Many companies are now collecting datasets that contain enough information to identify someone, and the fact that the researchers could re-identify users from only 15 attributes shows that we need to reevaluate what constitutes an ethically anonymized dataset.
“Companies and governments have downplayed the risk of re-identification by arguing that the datasets they sell are always incomplete,” Mr. de Montjoye said. “Our findings contradict this and demonstrate that an attacker could easily and accurately estimate the likelihood that the record they found belongs to the person they are looking for.”
According to the researchers, policymakers have a responsibility to set better standards for anonymization techniques, so that sharing datasets does not become an invasion of privacy. "The goal of anonymization is so we can use data to benefit society," said Mr. de Montjoye. "This is extremely important but should not and does not have to happen at the expense of people's privacy."