Privacy Engineering Presentation (2022)

I was asked to present at a Privacy Engineering conference last year, but it was cancelled and later rescheduled for a time when I was much busier, so I couldn’t finish the presentation or give it. Some of the information I had collected for it seemed useful though, so I’m dumping the draft of my presentation here, lightly edited down.

Dan Ballard of Open Privacy’s 2022 Stripped Down Presentation for Privacy Engineering Conference

What is Privacy Engineering

Privacy and security are separate focuses. Security is more a focus on preventing unauthorized access to an app and its data, whereas privacy could be argued to be about preventing any access to user data.

Security: confidentiality, integrity and availability.

Privacy: refers to the rights of individuals and organizations with respect to personal information.

Privacy refers to the control that you have over your personal information and how that information is used. Personal information is any information that can be used to determine your identity.

Security refers to how protected your personal information is.

A bank selling your data in a way you agreed to (shitty EULA) is security maintained, privacy compromised.

Privacy is one of the core principles of human dignity.

Consentful Design

We have found AndAlsoToo’s Consentful Design ezine a very useful framework when looking at privacy. It provides a solid conceptual basis for reasoning about privacy issues from the perspective of user consent.

Just go read it. It’s quick, barely 15 pages, and I heavily quoted it in this section.

Done? I had a few notes to expand on:

Talking about F.R.I.E.S. from a tech design perspective: Reversible can have complex technical implications, so it really needs to be baked into design and reinforced as design turns to implementation. Specific is super important and something industry is often bad at, with overreach in asks, and even tension from other design principles about not pestering users with questions.

Good Defaults are important

About “Ideas For Technical Mechanisms” I said: We’ll come back to this conclusion in a coming section, but it shows that good frameworks for privacy, even coming from different perspectives, often reinforce and support each other.

Seriously, just go read Consentful Design

Zero Trust / Trustless Architecture

The zero trust security model (also, zero trust architecture, zero trust network architecture, ZTA, ZTNA), sometimes known as perimeterless security, describes an approach to the design and implementation of IT systems. The main concept behind the zero trust security model is “never trust, always verify” - Wikipedia

While this is nominally more of a security model, as discussed, there is overlap between security and privacy design with them reinforcing each other so I’ve included it in this tour.

Zero Trust Architecture becomes important in distributed and peer-to-peer systems and mesh networks, where there is no one central root of trust, as opposed to, for example, TLS certs, which are issued by trusted root authorities. When designing towards zero trust, you should, as the saying goes, “never trust, always verify”.

In Cwtch for instance, all identities are onion v3 addresses, which are also public keys, so during the connection process both sides can issue encryption challenges to have the other side prove they possess the private key, that they are who they say they are.
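To make the idea of "prove you hold the private key behind this identity" concrete, here is a minimal sketch of a Diffie-Hellman style challenge and response. This is not Cwtch's actual protocol and does not use real onion v3 (Ed25519) keys; the group parameters, function names and key format are all assumptions picked so the example runs with only the Python standard library, and the toy parameters are not secure.

```python
import hashlib
import hmac
import secrets

# Toy group parameters (assumption): NOT secure, chosen only so the sketch
# runs with the standard library. Real systems use vetted curves and libraries.
P = 2**255 - 19  # a well-known prime
G = 2

def _digest(shared):
    """Hash a shared group element down to a fixed-size response."""
    return hashlib.sha256(shared.to_bytes(32, "big")).digest()

def keygen():
    """Return (private_key, public_key). The public key doubles as the identity."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    return private, pow(G, private, P)

def make_challenge(claimed_public):
    """Verifier: create a challenge plus the answer only the key holder can compute."""
    r = secrets.randbelow(P - 3) + 2
    challenge = pow(G, r, P)
    expected = _digest(pow(claimed_public, r, P))
    return challenge, expected

def answer_challenge(private, challenge):
    """Prover: answering correctly requires the private key behind the identity."""
    return _digest(pow(challenge, private, P))

def verify(expected, response):
    """Verifier: constant-time comparison of the expected and received answers."""
    return hmac.compare_digest(expected, response)
```

In a mutual handshake each side would run make_challenge against the other's claimed identity, so neither peer is trusted until it has answered correctly.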

Again, we find this security practice is not just well suited to distributed design but a key component when building secure and privacy preserving distributed software.

Data Sovereignty of the User and Local First

Local First Design

Definition of Self-Sovereign Identity

“Cloud apps like Google Docs and Trello are popular because they enable real-time collaboration with colleagues, and they make it easy for us to access our work from all of our devices. However, by centralizing data storage on servers, cloud apps also take away ownership and agency from users. If a service shuts down, the software stops functioning, and data created with that software is lost.” - Local First Design

Local First Design is supported by concepts such as Data Sovereignty, a term originally coined with a focus on keeping your data within the same legal jurisdiction you live in, with some eye to privacy and safety. More recently, and in this context, Data Sovereignty has been extended to mean giving users full autonomy over their data while they retain full rights to it.

One of the main ways developers can help users pursue personal Data Sovereignty is through Local First Design. Its introductory article lists 7 core ideals to aspire towards in your design:

No spinners: your work at your fingertips
Your work is not trapped on one device
The network is optional - where possible
Seamless collaboration with your colleagues
The Long Now - problem with cloud services is your data can disappear or be inaccessible when they go away or out of business
Security and privacy by default
You retain ultimate ownership and control

It obviously prioritizes local data first, both for access and as the authoritative source, but it’s also a modern design principle that doesn’t want to sacrifice the convenience the cloud provides. Naturally it works best alongside distributed design, as a companion framework that can help inform your design process when building distributed software.

Read the Local First Design article for more details.
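As a rough illustration of "local first, cloud by opt in", here is a minimal sketch of a store that always writes to local disk and only hands data to an uploader after the user explicitly opts in. The class name LocalFirstStore, the uploader callback and the JSON file layout are all hypothetical, and a real app would also encrypt anything before it leaves the device.

```python
import json
import tempfile
from pathlib import Path

class LocalFirstStore:
    """Hypothetical store: the local file is authoritative, sync is opt-in."""

    def __init__(self, path, uploader=None):
        self.path = Path(path)
        self.uploader = uploader    # hypothetical callable(bytes), stands in for a cloud client
        self.sync_opted_in = False  # good default: nothing leaves the device

    def opt_in_to_sync(self):
        self.sync_opted_in = True

    def revoke_sync(self):
        self.sync_opted_in = False  # consent stays reversible

    def save(self, record):
        data = json.dumps(record).encode("utf-8")
        self.path.write_bytes(data)  # local copy is always written first
        if self.sync_opted_in and self.uploader is not None:
            self.uploader(data)      # a real app would encrypt before uploading

    def load(self):
        return json.loads(self.path.read_bytes())

# Usage: a plain list stands in for the cloud so uploads can be counted.
uploads = []
with tempfile.TemporaryDirectory() as tmp:
    store = LocalFirstStore(Path(tmp) / "notes.json", uploader=uploads.append)
    store.save({"note": "draft"})
    saved_locally = store.load()
    uploads_before_opt_in = len(uploads)
    store.opt_in_to_sync()
    store.save({"note": "shared"})
    uploads_after_opt_in = len(uploads)
```

The point of the design is that the default path never touches the network, so forgetting to configure anything fails safe for the user.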

Metadata

Something to consider when following Local First Design and Data Sovereignty principles is metadata. Metadata is data about your data. For instance, you could encrypt your data and upload it to a P2P network for storage backup, redundancy, or easier retrieval. This can still generate metadata about your data: anything from data about the author/owner, to the size or amount of data, to dates of upload and access, and in the worst cases, locations of access. In more social scenarios such as messaging, this includes contacts/associations and times and frequencies of contact, which can lead to detailed social graphs. All of this metadata should be as protected as the core user data itself, since it all has value to others. One could argue the more benign use is just advertising, but once data is out there it is always accessible to others, in the worst case the “we kill people based on metadata” folks.
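To show how little an observer needs, here is a small sketch that rebuilds a social graph from message envelopes alone. The log entries, names and field layout are invented for illustration; the assumption is that message bodies are encrypted and never seen.

```python
from collections import Counter

# Hypothetical traffic log as a relay or network observer might record it.
# Bodies are encrypted, so only sender, recipient, time and size are visible.
observed = [
    ("alice", "bob", "2022-06-01T09:00", 512),
    ("alice", "bob", "2022-06-01T09:05", 2048),
    ("bob", "alice", "2022-06-01T09:07", 256),
    ("alice", "carol", "2022-06-02T23:40", 1024),
]

def social_graph(log):
    """Count contacts per unordered pair of parties, using no message content."""
    edges = Counter()
    for sender, recipient, _timestamp, _size in log:
        edges[frozenset((sender, recipient))] += 1
    return edges

graph = social_graph(observed)
```

Even with perfect encryption of the contents, the resulting counts reveal who talks to whom and how often, which is exactly the association data described above.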

Decentralized / Distributed Design

As the previous design frameworks, paradigms and perspectives have all been pointing towards, distributed and decentralized design are really powerful tools from a privacy perspective.

Obviously on its own there is nothing inherently privacy preserving about distributed design. For instance, one of the most popular distributed apps on the planet, BitTorrent, has no privacy at all, as evidenced by the emails I occasionally get from my ISP, acting as proxy for copyright holders, listing files they say they have observed me downloading.

However, when you complement distributed design with the above mentioned frameworks, and a general privacy preserving focus in software design, you get tools and design options that centralized service design finds nearly impossible to compete with. Centralized services can still be built with a strong focus on preserving privacy, and Signal is probably one of the better examples. But even then, their Whisper Systems protocol by design generates a lot of metadata. So far there isn’t a lot of evidence they’ve taken advantage of it, but in the hands of less scrupulous operators, like Meta, we’ve seen the same system be mined for metadata to funnel into advertising.

Decentralized design just provides the right structure and framework to start properly building privacy preserving apps.

I can’t do a fair overview of distributed and decentralized design in this presentation; it’s a huge systems design practice with a long history and lots of knowledge.

I will leave you with some anarchist quotes that are relevant to decentralization in terms of privacy:

“The thing you’re supposed to decentralize is power”

“There is a principle of Defensive Decentralization: when besieged, a well constructed decentralized system will further decentralize. The corollary of which is: A well constructed decentralized system will identify & attack emergent centralization.”

Surprise, It’s Actually Capitalism’s Fault

So why aren’t we seeing more of these design principles in the real world, in the tech products we use? Why, even though the field of distributed design is rich and well studied, are we instead in a “golden age” of centralized cloud services, with privacy being eroded in new ways each year?

My conclusion is “capitalism”.

I outlined my thoughts on this years ago in an article, “Capitalism Compromises Design”, and this thinking influenced why I never ended up doing a startup, but instead helped co-found a tech nonprofit, which is much less common.

In short, under capitalism, companies’ core principle is to maximize profit for owners, be it founders, investors, VCs or eventually shareholders and their board of representatives. No social rule holds 100%, so there are always some examples of small organizations staying true to their values early on, but over time the pressure to “sell out” increases, especially as more investment is taken on. The ideal goal for most companies is to “go public”.

Capitalism elevates profit at the expense of every other interest.

The core drive at the heart of all companies becomes finding additional ways to extract wealth, and so it informs and influences all design decisions. It is a control seeking drive that will always favour centralized approaches over distributed design, to retain control and power over users and their data and to exploit it for financial gain. Which means that under capitalism, we usually see distributed design approaches discarded very early on in the process.

Capitalism is in the end antithetical to privacy.

We are seeing a small counter trend gain a bit more traction recently. I mentioned Signal as an example of a privacy focused centralized app. They also saw what a dead end being a privacy preserving corporation was for their app and users, and transitioned to a nonprofit, the Signal Foundation. Beyond that, one of the largest and oldest privacy preserving pieces of tech we have, Tor, is also maintained by a nonprofit.

More recently, with the fall of Roe v. Wade in the US, there was a wave of folks realizing period tracker apps had a lot of data about them that could be weaponized in their jurisdiction. The very first discourse I saw about this was from non-technical folks positing that apps built in the EU might be better “because the EU has stronger data protection laws”. Sadly, upon actually examining the EULAs and stated practices of said companies, they all admitted to reselling a lot of user data. In a quick survey, only two apps emerged that actually offered strong and reliable guarantees of users’ privacy, both following (intentionally or not) Local First and Consentful Design principles, in that the data was stored locally by default and only uploaded to the cloud by opt in. And both of these apps were built by nonprofits, Planned Parenthood and UNICEF.

Capitalism is antithetical to strong privacy preserving design. The overwhelming examples of the last decades bear this out.