When I worked at Comscore, I had my first encounter with the Wild West of consumer privacy. Despite what critics might have claimed, I've never been with an organization more concerned with consumers' digital privacy, at literally every level of the organization. Comscore practically invented the digital currency of online advertising with its 100k+ household panel, but to do so, we had a bird's-eye view into each consumer's entire clickstream, and I mean every single thing they did on their computers and devices. You quickly learned two things. One: yuck... just... yuck. Two: if folks like me at the data engineering and aggregation layer didn't take steps to protect the privacy of these consumers, who would?
It's easy to think of digital privacy as someone else's problem, but once you imagine that attitude as the norm, you can quickly imagine how the data breaches at so many large companies began. Things in development move fast: 'This table has too much PII, it should be hashed... aw, damn, okay, next sprint for sure.' But that isn't how we'd want our own or our families' private information handled, is it? As a quasi-known issue that someone will circle back to when demands allow? The good news is that, in my experience anyway, privacy issues raised in scrums or meetings are not treated like 'case and space' arguments or other design and performance debates. Typically, if someone champions a cause for consumer privacy, be it a security policy or how and where particular information is collected or stored, it is:
A) Generally something everyone can get behind, and
B) Not at all an esoteric or nebulous topic.
With regulations like GDPR looming, it is time for all data professionals to take up the torch for consumer privacy at every level of our organizations. Bad practices are easy to spot, and saying 'No' should never get you into trouble. If it does, you might consider that you are a white hat working in a black hat company, which, as we all know, do exist.
Championing consumer privacy protects your company. Moreover, it protects your company's executive team, who may be deeply conscious of privacy ramifications or may be clueless and counting on being informed. And you can make this personal, because ultimately, it is. The same privacy concerns you have for your company's customers could be concerns for you personally, or, again, for your family.
It is easy to turn a blind eye and not make waves, but consider this: when it happens with you, it happens right at the point where privacy issues fall through the cracks. Turn your head now, and the cascade effect could very easily roll all the way up through the ranks of your organization, and we all know how that ends, don't we?
One bit of advice I can give architects and modelers to head the issue off at the pass is simply to design with privacy in mind. These days this is very easy to do, and it can even improve performance as well as give peace of mind. Personally, I follow the old Sith 'Rule of Two'... well, sort of, anyway. When designing any storage for data (table, cloud, whatever), I apply a simple rule. If a dataset contains more than two bits of information that are innocuous alone but could be personally identifying together, the 'most unique' of those identifiers is gonna get hashed, and that is that. That identifier is typically a phone number, email address, or other mostly unique value. When analysts email and say, 'Hey, these aren't emails in the email column,' it is very easy to reply, 'You're absolutely right, but they are both unique and representative of the unhashed email. Can I ask your use case for needing the actual address?'
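The hashing rule above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definitive implementation. It uses Python, a hypothetical secret key (`PEPPER`) kept outside the dataset, and hypothetical helper names (`hash_identifier`, `protect_row`). It uses a keyed hash (HMAC-SHA256) rather than a bare hash. Emails and phone numbers come from a small enough space that an unkeyed digest can often be reversed by hashing candidate values and comparing. The output is deterministic, so the hashed column stays unique and joinable, which is what makes the analyst conversation above so easy.

```python
import hashlib
import hmac

# Hypothetical secret key ("pepper"). ASSUMPTION: in practice this would be
# loaded from a secrets manager and never stored alongside the data.
PEPPER = b"replace-with-a-secret-from-your-vault"


def hash_identifier(value: str, key: bytes = PEPPER) -> str:
    """Return a keyed SHA-256 hex digest of a normalized identifier.

    Normalization (trim whitespace, lowercase) is an assumption so that
    trivially different spellings of the same address hash identically.
    """
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def protect_row(row: dict, quasi_identifiers: list[str], most_unique: str) -> dict:
    """Sketch of the 'rule of two': if a row carries more than two
    quasi-identifying values, replace the most unique one with its keyed hash.
    """
    present = [col for col in quasi_identifiers if row.get(col)]
    if len(present) > 2 and row.get(most_unique):
        row = dict(row)  # copy so the caller's record is not mutated
        row[most_unique] = hash_identifier(row[most_unique])
    return row
```

For example, calling `protect_row` on a record with an email, ZIP code, and birth year, with `"email"` as the most unique column, replaces the email with a 64-character hex digest and leaves the other columns alone. A record carrying only two of those values is returned unchanged.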
That conversation will end there 99.9% of the time. For the 0.1% of the time it doesn't, that, for me anyway, is a huge red flag. In fact, very few use cases require a set of PII to be exposed in the workplace. Even marketing can make do with the communication identifiers needed for campaigns, coupled with encrypted personal data that doesn't expose a consumer's private bits.
So, my fellow data superheroes, the gauntlet has been thrown. If not us, who? If not now... you get the point. Take up the shield for our consumers, for they are you, me, our parents, and our children. We are the first and probably strongest line of defense, protecting them from the nefarious ill-doers who await the first opportunity to exploit them using just a few bits of data.