This is the second part of a series assessing how the proposed Consumer Data Protection Act in Virginia stacks up against the CCPA in California and GDPR in Europe. Today: the all-important definition of personal data.

Under the CDPA, “personal data” means data that can reasonably be linked to an identified or identifiable person. Not too far from the CCPA’s definition of “personal information,” which covers information that “identifies, relates to, describes, is reasonably* capable of being associated with, or could reasonably* be linked, directly or indirectly, with a particular consumer or household.” And not too far from the GDPR’s definition of “personal data,” either, which includes “information relating to an identified or identifiable natural person.” Good, simple, easy—we’re done.

Not quite. Because the more important part of the “personal data” definition is the exclusions.

The most important exclusion is for anonymized or “de-identified” data. On this point, the proposed Virginia law closely tracks the CCPA in basic principle: data ceases to be personal data once it’s de-identified, or stripped of details so that it “cannot reasonably be linked to an identified or identifiable natural person” (or health data de-identified to HIPAA standards). Both the CCPA and the CDPA impose some additional organizational requirements around de-identified data. For example, controllers need to commit not to try to re-identify the data. There is a slight difference in that the CCPA builds those requirements into the definition, whereas the CDPA just states them as requirements, which is not an important distinction for present purposes. What’s important is that those organizational requirements necessarily presume that there may be some ways to re-identify de-identified data. And that means that de-identified data doesn’t need to be absolutely, perfectly, 100% anonymized forever and for all purposes, just de-identified enough that it can no longer be “reasonably” linked to a person. It’s also important that both the CCPA and CDPA contain a lesser category for pseudonymized data, which under the CDPA means data that could be re-identified using a separate data set that the controller keeps under separate lock and key. The existence of this separate, lesser category implies that you can’t de-identify data simply by separating out identifiers and keeping them in a separate drawer of your desk.

The definition of de-identified data also contains a potential glitch around devices. It is basically the inverted image of the definition of personal data: it’s information that “cannot reasonably be linked to an identified or identifiable natural person.” But the definition actually goes further: “or a device linked to such person.” Is this meant to imply that device identifiers like IP addresses or MAC addresses might be personal data, because those represent a device that is linked to a person? If the CDPA’s intent is to include device identifiers in personal data, it’s far from clear. The mismatch between the definitions of personal data and de-identified data necessarily produces an interpretive contradiction:

  • If device data is already encompassed in the general phrase “information that is linked or reasonably linkable to an identified or identifiable natural person” when it appears in the definition of personal data, then there’s no need for the definition of de-identified data to make specific mention of devices. It would be enough to say that de-identified data means data that is “not linked or reasonably linkable to an identified or identifiable person”—which is already what the definition says.
  • If device data is not already encompassed in the phrase “information that is linked or reasonably linkable to an identified or identifiable natural person” when it appears in the definition of personal data, then again, there’s no need to specifically mention device data in the definition of de-identified data—it’s not personal data to begin with.

This interpretive contradiction means that the CDPA’s mention of “devices” in “de-identified data” is definitely not a feature; it’s a bug. And it’s one that will need to be fixed, because the status of device identifiers like IP addresses has been a perennial problem under GDPR and the CCPA alike. European court decisions have suggested that IP addresses inherently are personal data because they could be identified with an individual, even if it takes inordinate effort. The status of IP addresses changed back and forth as the CCPA regulations were being hammered out; at one time, the California AG proposed excluding IP addresses so long as a business didn’t itself maintain other data allowing identification. But that safe harbor fell out of the final regs.

Next, there’s the “publicly available” exception. Publicly available information is excluded from the CDPA much more extensively than under the current CCPA. The current CCPA’s definition of “publicly available” is unduly narrow: just information from official public records. The recently enacted CPREA will fix that once it goes into effect by excluding not only public records but also information about a person that’s lawfully made public by the person or is in widely distributed media, and also some second-hand information. Thankfully, the Virginia CDPA’s definition largely tracks the more sensible CPREA definition.

Finally, we can’t ignore the big elephants in the room: employee and B2B data. At first, the CCPA handled these categories in the most awkward way possible: silence. The definition of “consumer” was broad enough to capture employees and business counterparties, which left people scratching their heads about whether the definition of “personal information” also included data in these categories. The California legislature later passed amendments to exclude data in these categories from most of the CCPA’s requirements, but didn’t entirely carve these categories out of the definition of personal information. Which means that personal information in these categories might still have important implications for businesses, such as counting towards applicability thresholds. And the CCPA’s exclusions contain sunset provisions, which means there’s still a cloud of uncertainty over these exclusions. Virginia’s CDPA, meanwhile, deals with these categories definitively and cleanly: employees and people in the B2B context are carved out of the definition of “consumer” entirely.

* “Reasonably” was recently added by the California Privacy Rights and Enforcement Act.

In this series: 

Part I: Language

Part II: The Definition of Personal Data