AI, Agents and the new frontiers of privacy
You don't know what you've got till it's gone
This one is for Sankarshan over at The Trust Graph.
Quote 1: “In some videos you can see someone going to the toilet, or getting undressed. I don’t think they know, because if they knew they wouldn’t be recording.”
Quote 2: “I saw a video where a man puts the glasses on the bedside table and leaves the room”…“Shortly afterwards his wife comes in and changes her clothes”
Quote 3: “There are also sex scenes filmed with the smart glasses – someone is wearing them having sex.”
Quote 4: “We see chats where someone talks about crimes or protests. It is not just greetings, it can be very dark things as well”
Lastly, “You think that if they knew about the extent of the data collection, no one would dare to use the glasses”
By now, you should have read what Nairobi-based subcontractor employees told Swedish newspapers when asked about their job reviewing footage from Meta’s Ray-Ban smart glasses. I had written earlier, in AI that Sees for us, about Agastya Mehta’s explanation of how these glasses enable people who struggle to see, and how in that enablement lie features that could be developed to enhance our lives. I had flagged privacy issues, but my focus then was primarily on product and utility.
But there’s more to AI and privacy than just wearables…
Why AI impacts Privacy differently
AI Agents exacerbate the utility versus privacy conflict because of a few factors that differentiate AI and agentic operations from apps:
Agents can be persistent: always on, always monitoring, and hence always collecting data, which expands the scale and scope of data collection.
Agents can be autonomous: they can decide on their own to collect, use, share or move data, which stress-tests purpose limitation.
Agents are multimodal: they can build and use tools to collect data, from recording video and taking photos to scraping the web and running inference, which expands the risk surface.
Training is non-discriminatory: personal data gets hoovered up along with non-personal data. Max Schrems raised this issue a couple of years ago.
Learning is irreversible: outputs can be blocked from display, but data, once trained in, is never removed.
The knowledge graph can be enormous and keeps growing, so an ever more complete picture of every user gets built.
AI and Agents should change the privacy conversation completely because they change how data is collected, combined and acted upon.
It goes from what you choose to share to what is observed, inferred and done with that information.
The New Frontiers of Privacy
A few days before the Meta glasses report came out, I sat down with Jules Polonetsky, CEO of the Future of Privacy Forum, at the AI Summit in India, to discuss how AI impacts privacy. The transcript, over 6,000 words, is here. Based on that conversation, and my writing across multiple Reasoned and MediaNama articles, here are eight new frontiers of privacy:
1. Bystander capture: You can be surveilled by other people’s AI
This is what wearable AI brings to the table. When someone walks into a room wearing AI glasses, every person in the camera’s field of vision has data captured about them, without consent. Some glasses have a blinking red dot that is barely visible in some contexts, so that’s “Notice”, but barely so. It can also be disabled. Filming people in public is not illegal, but historically that was supposed to be for transient or personal use. Now both glasses and CCTV cameras are adding facial recognition to the mix, and that could mean easy doxing: scan a person’s face, cross-reference with a web search, and retrieve personal information. This goes beyond visuals: someone could be wearing an AI Pin that listens in on their conversations.
As I wrote in AI that sees for us: AI that sees for us can also capture us without our consent. And the cost of the countermeasure (a developer built an app to alert you when smart glasses are nearby) is externalised to the people being surveilled, not to the companies doing the surveilling.
2. Lived behavior extraction: how you behave in a real-world environment
Polonetsky called this “Spatial intelligence”, but I don’t think that phrase covers it…it’s too impersonal. He said:
“So, what happens when you scrape the world, not just text and scraping your face and scraping what’s happening in your home and using videos about what’s happening in the world and now embedding that and trying to have models really learn so they can truly predict.”
This sits somewhere in between. It includes the idea that the real world is extractable training data: the entire physical environment we inhabit, but also how we inhabit it. How we behave in it, our preferences (whether you put the milk in the tea before you pour in boiling water, or after, or prefer chai), whether you smirk or smile when making a particular comment. Or that I have lost weight (I have) or have a wound on my forehead (I don’t). How you behave in the real world, with a specific person, your personal tics, how you look today.
Google is currently advertising the use of the phone camera to capture the physical world for answers. CCTVs are coming up everywhere, including inside our homes…baby monitors, anyone? Meta’s glasses, Kaze by Sarvam, and B by Lenskart are enabling mass usage of visual capture, but right now the exposure (heh) is relatively small.
This is an architectural shift: the physical world treated as training data, with no equivalent of a robots.txt file, no consent model, and no framework for what rights people hold over their physical presence being observed and ingested.
I can potentially use my glasses to capture someone’s facial expressions to determine whether they meant what they said. It’s already happening with audio:
Hedy.ai (“Real-time meeting/class coach”) can already sit in and advise you during a meeting; its homepage pitches exactly this.
An even bigger concern: AI can now be used to predict how you might behave in the real world.
We’re heading towards a very pre-cognition, Minority Report-ish situation.
3. False sanctuary: you thought that the space was private but it isn’t
I’d spotted this when I wrote, in When AI enters the conversation, about how uncomfortable I am with AI note-takers in Zoom calls. Seeing “XYZ’s note taker” in a meeting makes me feel watched and documented. Behavioural profiling is not new: social media has always captured vast amounts of data about users, but those are largely recognised as public spaces. Social media is seen as public; AI chat is seen as private. People share more personal information in spaces they assume are private, but are not. Michael Mignano captures this well as “Passive Context” in What AI Knows. He wrote:
Granola released a feature called Crunched, their take on an end-of-year, Spotify Wrapped–style recap. Crunched left me stunned. It made me realize just how much Granola had learned about me after transcribing many of my meetings throughout 2025. And judging by my X timeline, plenty of others felt the same. It made me wonder: What does ChatGPT know about me?
An exercise for everyone, whether you use ChatGPT, Claude, DeepSeek or any other service:
Ask: Based on my conversations with you, tell me what you know about me in a structured format, especially about my values, relationships, emotional intelligence, actual interests, what I know, what I am curious about, what I don’t know about, what my fears are, and what I’m trying to do. Avoid overlaps between sections.
These services capture information about you to serve you better, but you also tell them more because you trust the space. A few weeks ago, an influential group I am in had a private conversation debating the idea of AI-enabled medical transcription at a doctor’s office: while the assumption is that only the doctor will use it, how does someone know that it’s not being fed as training data to an AI service? It’s meant to be, or perceived to be, a safe space…that’s why False Sanctuary.
MediaNama is planning PrivacyNama for September.
Drop me an email at nikhil@medianama.com if you’re looking to partner, sponsor or speak.
4. Silo collapse: AI and Agents enable inferences across connected data surfaces
We give tools like Claude and ChatGPT access to multiple surfaces, including email, calendar and maybe even our Social Media. People use AI Agents for summaries of messages across their WhatsApp groups. We are going to increasingly delegate more actions to AI agents that store context about us in a memory.md, a PARA architecture, or a knowledge graph.
(I wasted 10 days trying to set the last two up with a picoclaw, unsuccessfully, but it will happen. Currently using memory.md).
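For a sense of what that accumulation looks like, here’s a hypothetical fragment of a memory.md after a few weeks of delegated tasks. Every entry below is invented for illustration:

```
# memory.md (hypothetical)
- Prefers morning meetings; declines calls after 6pm.
- Cardiologist appointment every third Monday (inferred from calendar).
- Lipid report uploaded to Drive in January; LDL flagged as high.
- Formal tone with editors; casual tone in the founders' WhatsApp group.
- Trying to lose weight; has asked for low-oil recipes twice.
```

Each line is individually banal; together they are a profile, which is exactly the combination-is-more-than-the-sum problem that follows.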
When you give an agent access to your Drive to respond to your emails, it also gains access to your private data. When AI has access to multiple surfaces at once, what it can infer from the combination is not the sum of what each surface knows separately.
For example, I store my medical test reports in my Google Drive. If I ask Gemini for recipes, will it avoid those that might increase cardiovascular risk, and explain why, when I’m simply trying to demonstrate that it’s good at recipes?
Where is this heading? The companies with large passive context stores, Google with email and calendar, Apple with messages and health, Meta with browsing behaviour and now glasses footage, are the same companies building AI products to activate that context.
This is one of those “consent is not enough” scenarios.
5. Purpose expansion: Agents carry your data into contexts you never authorised
The question of whether data uploaded for health purposes stays bounded to health purposes, or whether it flows into a broader profile, is a tricky one. You give consent once, but the agent has access to multiple surfaces and the memory.md file, and maybe the goals.md you create for it will push it to use that data for other purposes.
Polonetsky’s concern is that the protocols being built to enable this — MCP, agent-to-agent — are being built by technical teams focused on interoperability, not privacy:
“Ad tech was built to quickly move all the data very quickly across all the players — the advertiser, the targeter, the third-party bidding, the data company — without paying attention to the fact that, well, wait a second: how is this data collected? What are the limitations? Who are you giving it to?”
Agents prioritise jobs-to-be-done over the barriers they face. Something I failed to explore in the interoperability piece I wrote is that agents are being designed to route around barriers: restrictions, especially when loosely worded, may sometimes be seen as obstructions, and instructions as consent.
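To make that concrete, here’s a minimal sketch of what machine-checkable purpose limitation inside an agent could look like. The Record structure, the scope names and the deny-by-default rule are my assumptions for illustration, not any existing framework or protocol:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    """A piece of user data that carries the purposes it was consented for."""
    content: str
    allowed_purposes: frozenset  # consent travels with the data, e.g. {"health"}

class PurposeError(PermissionError):
    pass

def use(record: Record, purpose: str) -> str:
    """Deny by default: the agent may only use data for a consented purpose."""
    if purpose not in record.allowed_purposes:
        raise PurposeError(
            f"data consented for {set(record.allowed_purposes)} "
            f"requested for '{purpose}'; refusing"
        )
    return record.content

# A report uploaded for one purpose...
report = Record("LDL cholesterol: high", frozenset({"health"}))
use(report, "health")                      # fine
try:
    use(report, "recipe_personalisation")  # blocked instead of silently reused
except PurposeError as e:
    print(e)
```

Deny-by-default is the inverse of the ad-tech pattern Polonetsky describes above: there, data moved first, and questions about collection and limits came later, if at all.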
I had discussed agentic purpose limitation with Polonetsky, but I think we only scratched the surface of this issue. He said:
There are already a lot of tools I have where I have a plug-in and Google is going into my email and taking my reservation and putting it on my calendar, and automatically once I make a, a reservation or I get an email with a confirmation, Google is jumping from one service to another and it’s putting it on there and so forth, right? I mean, these are obviously much more extensive, but it’s not novel that we trust tools to do complicated actions for us. But, the rules of what I have authorized you to do need to be spelled out and clear.
The other issue here is the delegation of trust. I wrote about this in When AI acts as you, not for you:
“Once an agent has acted competently a few times, we stop supervising it closely.”
6. Compounding Memory: AI memory is permanent and accumulates
I had flagged this to Polonetsky in our conversation:
“So how do we evolve norms that ensure that .. that personal data of those people that these glasses are seeing … because… Didn’t recognize, didn’t care. This could be persistent memory. And one of the challenges that we are seeing with agentic AI is the expansion of memory and context, extensively, in an irreversible permanent manner. How do we address that problem?”
We went from a “let’s collect everything” environment to restricting collection, after the GDPR and ahead of impending global regulations. With competition in AI, we’re once again collecting everything.
But LLMs, which tokenise information, cannot untrain once trained, and trained data is used to build future models as well. What goes out the window: the right to be forgotten and the right to erasure.
It’s no surprise that Polonetsky struggled a little on the right to erasure:
“So, point one, erasure in different, you know, statutes around the world has never been 100% absolute. There are places where you have a very strong right to erasure. Sometimes it’s been limited. Now, that’s turned. Sometimes it might not be technically possible.”
…
“The European Data Protection Board has been providing different opinions that have said, okay, we understand that at this point – ’cause maybe erasure will be feasible at some point – but we understand that at this point, erasure is a complicated problem that has not been technically solved. You can’t go in and figure out which tokens to, to delete, and the retraining is complicated.”
…
“So today, if we want training to exist, we’re obligated to provide some flexibility.”
This is the utility versus privacy debate again, and once again, there are no easy answers. The issue will arise when this memory becomes available to governments for surveillance, and to companies for decision-making when engaging with you and for predicting your behaviour.
7. Synthetic Generation Violation: AI can create a privacy violation without your original data
I’d been wondering if there’s a privacy angle to deepfakes, and had thus asked Polonetsky about it: is there a new kind of privacy violation when AI generates outputs that closely resemble or reconstruct personal data like facial information, voice, or likeness, even when the underlying training data can’t be traced? His answer: “Are deepfakes a privacy violation as well? They certainly are. I don’t think we have full solutions yet for how to deal with deepfakes.”
Deepfakes are a privacy violation at the output layer, where they’re synthesising a version of you, your face, your voice, your likeness, in situations you never created. We saw a version of this already on X, when Grok allowed users to edit other people’s photos and publish them into feeds, leading to sexualised imagery.
It’s also possible that something that looks just like you, or almost like you, can be generated without you uploading a picture.
Also, is an almost-deepfake a privacy violation? What if a generated image looks like you, but with a tiny birthmark below your left eye? Or a slightly longer nose? Where is the line to be drawn? I think nudify apps and deepfake porn will test this boundary.
8. Human “Reviewer” exposure: “AI processes your data” doesn’t mean humans don’t see it
I’ll be honest, I almost didn’t include this one, because reviewer exposure is not new. But there is something new about it: previously, reviews were of content that was problematic, or that AI or a human had flagged on a social network. It’s there in most platform terms and conditions.
The difference is that human review is now structural, not just for harm prevention. Your chats and conversations, images you’ve uploaded, videos you’ve recorded and uploaded, video taken using AI glasses: all of it is potentially used by reviewers to annotate “training data”.
I would argue then that the word “review” is misleading in this conversation, because a human reviewing the occasional reported post is very different from a situation where the system actively involves humans going through private information for the purpose of annotation.
How can this be solved for?
That’s a conversation to be had. A few questions to consider:
Where does liability for a privacy violation in an autonomous agentic ecosystem lie? Most claws are open sourced, and outputs and actions depend on the claw design, the LLM in use, and other parameters that might be user-created, like goals.md or agents.md. These might also be auto-generated in a knowledge graph, or the agent might identify its own goals for actions through a self-learning mechanism (and a learnings.md).
Therefore: can purpose limitation survive autonomy?
Do we need norms for bystander protection?
Someone I spoke with earlier today suggested this: how do we build “memory fade” into AI systems where personal data cannot otherwise be removed? (A minimal sketch of one possible mechanism follows this list.)
What costs can be externalised? I keep seeing this across multiple pieces, so I’m beginning to see it as a conscious choice: the externalisation of cost. The burden of protection is placed on you, the user, not on the companies building the products.
How long can this be left to market dynamics? We got privacy regulation because there was a global market failure in privacy. AI is a highly competitive market, and competition is leading to overrides of legal boundaries, for example the downloading and use of pirated content for training. The same applies to privacy: there’s currently a market incentive for companies to push at the boundaries of privacy protection, and possibly violate privacy, because competitive activity pushes them towards it. When do we declare a market failure when it comes to AI and privacy? Do we allow the violation to become too large and too useful to undo, such that we end up managing consequences rather than preventing them?
Do we take a risk-based approach to privacy? Do we treat AI and privacy as a separate issue? As I said in the interview with Jules, “It’s almost as if we’ve seen the systematic dismantling of data protection regulations because of AI.”
Where do privacy enhancing technologies go from here? The scope of what they must do is increasing.
Should we move from regulating data to regulating systems, or expand the scope of harms, given that harm often emerges only after agents act, and may not be observable until then?
How do we find a balance between interoperability and privacy without compromising utility?
What kind of privacy-by-design defaults should agentic systems have?
Should certain capabilities and actions be treated as high-risk by default and stopped by agentic design?
How do we address privacy in systems that are user-deployed (like most claws) versus institutional agents?
Any other new frontiers? Any other questions?
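On the “memory fade” question above, here’s a minimal sketch of one possible mechanism: each memory entry carries a timestamp and a sensitivity level, and its weight decays until the entry is dropped. The half-lives, field names and threshold are assumptions for illustration, not a description of any shipping system:

```python
import time

# Hypothetical half-lives: more sensitive memories fade faster.
HALF_LIFE_DAYS = {"routine": 365, "personal": 90, "sensitive": 14}

def retention_weight(age_days: float, sensitivity: str) -> float:
    """Exponential fade: the weight halves every half-life."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS[sensitivity])

def recall(memories: list[dict], threshold: float = 0.1) -> list[dict]:
    """Keep only memories whose faded weight is above the threshold.
    Entries below it are deleted outright, approximating erasure over time."""
    now = time.time()
    kept = []
    for m in memories:
        age_days = (now - m["created_at"]) / 86400
        if retention_weight(age_days, m["sensitivity"]) >= threshold:
            kept.append(m)
    return kept

old = {"created_at": time.time() - 60 * 86400, "sensitivity": "sensitive"}
print(recall([old]))  # []: a 60-day-old sensitive entry has faded below 0.1
```

This does nothing for data already trained into model weights, which is the harder problem from section 6, but it would at least stop agent-side context from compounding forever.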