Is Anonymized Data Truly Safe From Re-Identification? Maybe not.

Aug 5, 2019

By Linda Henry


Across all industries, data collection is ubiquitous. One recent study estimates that over 2.5 quintillion bytes of data are created every day and that over 90% of the world's data was generated in the last two years. Not surprisingly, this proliferation of data collection has been an impetus for increased regulatory scrutiny of the collection and use of personal data.

Companies rely on data anonymization both to maximize the utility and value of the personal data they collect and to comply with privacy regulations. Although data protection regulations vary, data that meets the de-identification or anonymization requirements of the applicable regulation is not considered personal data and is thus exempt from privacy regulations such as the California Consumer Privacy Act (CCPA) and the European Union's General Data Protection Regulation (GDPR). For example, the CCPA does not restrict a business's ability to collect, use, retain, sell, or disclose consumer information that is de-identified or aggregated, and the GDPR does not apply to personal data that has been anonymized. Consequently, a company that fails to adequately de-identify or anonymize data may violate the CCPA or GDPR with respect to its use of personal or consumer data.

Recent research on the re-identification of anonymized data sets indicates that current anonymization techniques are often ineffective at protecting individuals against re-identification. A study published in Nature Communications ("Estimating the success of re-identifications in incomplete datasets using generative models") found that 99.98% of Americans could be correctly re-identified in any anonymized data set using as few as 15 demographic attributes. The researchers also found that even when an anonymized data set is "heavily incomplete," they could still estimate the likelihood of correctly re-identifying an individual with high accuracy, and they rejected the argument that the incompleteness of most data sets reduces the risk of re-identification.

The study posits that many anonymized data sets may not meet the requirements of the GDPR or the CCPA, and it calls into question whether the current release-and-forget model of anonymization is adequate. For example, Recital 26 of the GDPR defines anonymous information as "information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable" (emphasis added). Recital 26 further provides that in order to determine whether "means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments." Consequently, if data subjects are identifiable from a small number of attributes, and available technology removes significant hurdles to re-identification, companies should consider whether their current practices fail to meet the standard that the GDPR (and other privacy regulations) prescribe for anonymization or de-identification.

The researchers also made an online tool available that allows individuals to see the likelihood of being re-identified from anonymized data by entering a few common demographic characteristics. (On average, an individual has an 83% chance of being re-identified if gender, birth date, and ZIP code are known.) The tool also allows individuals to add further basic demographic characteristics to see how the likelihood of identification increases.
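The intuition behind these findings can be illustrated with a short sketch. The synthetic population, attribute choices, and value ranges below are my own illustrative assumptions, not the study's actual data or its generative model; the sketch simply measures empirical uniqueness, showing how the fraction of records that are unique (and therefore re-identifiable) rises sharply as quasi-identifiers are combined.

```python
import random
from collections import Counter

random.seed(0)

# Synthetic population: each record is a tuple of quasi-identifiers.
# Attribute names and value ranges are illustrative assumptions only.
def make_population(n):
    return [
        (
            random.choice(["F", "M"]),     # gender
            random.randint(1950, 2000),    # birth year
            random.randint(1, 12),         # birth month
            random.randint(10000, 10099),  # ZIP code (one of 100 ZIPs)
        )
        for _ in range(n)
    ]

def unique_fraction(records, attrs):
    """Fraction of records whose combination of the selected attributes
    is unique in the data set, i.e. re-identifiable from those attributes
    alone."""
    counts = Counter(tuple(r[i] for i in attrs) for r in records)
    unique = sum(1 for r in records if counts[tuple(r[i] for i in attrs)] == 1)
    return unique / len(records)

population = make_population(10_000)
for attrs in [(0,), (0, 1), (0, 1, 2), (0, 1, 2, 3)]:
    print(f"{len(attrs)} attributes -> {unique_fraction(population, attrs):.1%} unique")
```

With gender alone, no record is unique; once birth year, birth month, and ZIP code are added, the number of possible attribute combinations far exceeds the population size, so most records become unique. The Nature Communications study makes this estimate rigorous for incomplete data sets, but the underlying dynamic is the same.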

Although numerous prior studies have established that data anonymization is often reversible, the latest study demonstrates that technological advances now make it possible to re-identify data sets that previously could not be de-anonymized. It is becoming increasingly difficult to truly de-identify a data set and thus satisfy the requirements of privacy laws such as the GDPR and the CCPA.

