
AI Security: How China Stole Claude's Capabilities Through Distillation
How DeepSeek, MiniMax, and Moonshot AI copied Claude's capabilities through 16 million queries. A technical analysis of the Hydra cluster attack and capability stealing.

Vit Safarik
AI & business productivity
Today Anthropic publicly accused three Chinese AI companies — DeepSeek, MiniMax and Moonshot AI — of targeted capability stealing. Their method was ingeniously simple:
- 24,000 fake accounts
- 16 million queries
- Knowledge distillation to copy Claude’s capabilities
It’s not a dramatic hacking operation. It’s patient, methodical industrial espionage — and that’s precisely why it’s so interesting.
What is knowledge distillation and why is it legal?
Knowledge distillation is a standard machine learning technique from 2015 (Hinton et al.).
Basic principle:
- You have a large “teacher” model (expensive, slow)
- You want a smaller “student” model (cheap, fast)
- The student learns to mimic the teacher’s outputs, not just raw data
Result: A smaller model that retains 70-85% of the large one’s capabilities.
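The principle above can be sketched in a few lines of numpy: the student is trained to match the teacher's temperature-softened output distribution via KL divergence, the core objective from Hinton et al. (2015). The logits and temperature below are toy values for illustration, not anyone's real training setup:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the teacher's softened distribution and the
    student's: the quantity the student minimizes during distillation."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return float(np.mean(kl))

# The student is rewarded for matching the teacher exactly:
teacher = np.array([[4.0, 1.0, 0.5]])
bad_student = np.array([[0.5, 1.0, 4.0]])

assert distillation_loss(teacher.copy(), teacher) < 1e-9   # perfect mimic
assert distillation_loss(bad_student, teacher) > 0.5       # poor mimic
```

The softened distribution is the whole point: the teacher's near-miss probabilities ("dark knowledge") carry far more signal than hard labels, which is why API outputs are such valuable training data.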
Who does it:
- Meta with LLaMA
- Google internally
- Every other startup for deployment
When it’s legal:
- You train on your own data
- You have a license to the data
- You distill your own model
When it’s industrial espionage:
- You call someone else's API in violation of its Terms of Service
- You use the outputs to train a competing model
That's exactly what happened here.
Chinese companies didn’t need to break any encryption. They just had to ask. A lot.
Anatomy of the attack: How the Hydra cluster worked
Anthropic operates geofencing — access from China is restricted. Attackers bypassed it with simple infrastructure:
Layer 1 — Fake accounts
- 24,000+ registrations with different identities
- Various payment methods and metadata
- Gradual registrations over time (not bulk)
Layer 2 — Proxy networks
- Traffic through residential proxy networks
- IP addresses from real households (USA, EU, SEA)
- From a detection perspective, they look like legitimate users
Layer 3 — Fingerprint diversity
- Different User-Agents
- Different behavior patterns
- Different time zones
- Each “account” behaved like a different person
What they asked about:
- Complex reasoning (multi-level logical tasks)
- Code assistance (generation, debugging, refactoring)
- Tool use (the most expensive to train)
It wasn’t random. It was targeted theft of the most valuable capabilities.
Why didn’t Anthropic catch it months earlier?
Dispersal in time and space
- 24,000 accounts = ~667 queries per account
- Queries spread over weeks
- No single account looks anomalous
- Rate limits don’t protect against distributed attacks
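The arithmetic behind "no single account looks anomalous" is worth spelling out. The 60-day campaign window below is my own assumption for illustration; the real timeline hasn't been published:

```python
total_queries = 16_000_000
accounts = 24_000
campaign_days = 60  # assumed window; the actual duration isn't public

per_account = total_queries / accounts              # ~667 queries per account
per_account_per_day = per_account / campaign_days   # ~11 queries per day

# ~11 queries/day is indistinguishable from a single moderately
# active developer, and far below any per-account rate limit.
assert round(per_account) == 667
assert per_account_per_day < 20
```

At that rate, every account individually looks like a hobbyist. Only the aggregate is anomalous.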
Legitimate behavior patterns
- Complex reasoning queries look like research
- The same thing researchers and developers do
- A detection algorithm would generate a huge false positive rate
Lack of cross-account correlation in real-time
- Detecting that 24,000 accounts are asking similar things is computationally intensive
- It requires sophisticated platform-wide analysis
- Not just individual account monitoring
Incentive problem
- AI companies are motivated to process as many queries as possible
- Every query = revenue
- Aggressive detection = lost money
Result: The attack was discovered through internal forensic analysis — not real-time automated detection.
Technical signals that should have been caught
Hindsight makes everyone smart. Still, let's be concrete about the red flags that were there:
Rate limiting is ineffective without behavioral analysis
Classic rate limiting (X queries per minute) is trivially bypassable. A better approach:
- Semantic similarity: Thousands of accounts ask structurally identical questions
- Answer entropy monitoring: Responses from different “users” are too similar
- Temporal clustering: Waves of queries in similar time windows
- Coordination patterns: Queries logically build on each other
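The first signal can be sketched with a toy detector: compare queries across accounts and flag pairs that are structurally near-identical. Token-set Jaccard overlap stands in here for real embedding-based semantic similarity; the account names and threshold are invented:

```python
from itertools import combinations

def tokens(query):
    return set(query.lower().split())

def jaccard(a, b):
    return len(a & b) / len(a | b)

def flag_similar_accounts(queries_by_account, threshold=0.6):
    """Flag account pairs asking structurally near-identical questions.
    Toy stand-in for embedding-based cross-account similarity search."""
    pairs = ((acc, q) for acc, qs in queries_by_account.items() for q in qs)
    flagged = set()
    for (acc1, q1), (acc2, q2) in combinations(pairs, 2):
        if acc1 != acc2 and jaccard(tokens(q1), tokens(q2)) >= threshold:
            flagged.add(frozenset((acc1, acc2)))
    return flagged

fleet = {
    "acct_0412": ["refactor this python parser to use a visitor pattern"],
    "acct_1093": ["refactor this python parser to use the visitor pattern"],
    "acct_7731": ["what is the capital of France"],
}
suspicious = flag_similar_accounts(fleet)
assert frozenset(("acct_0412", "acct_1093")) in suspicious
assert not any("acct_7731" in pair for pair in suspicious)
```

A production version would embed queries into vectors and use approximate nearest-neighbor search, since pairwise comparison over millions of queries doesn't scale; the principle is the same.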
Residential proxy detection already exists
Services like IPQS, Sift and Sardine can identify proxy traffic:
- ASN analysis
- Latency fingerprinting
- WebRTC leaks
- Behavioral biometrics in the browser
This isn’t rocket science — it just wasn’t used aggressively enough.
Account clustering through payment infrastructure
24,000 accounts had to pay. Fingerprinting patterns exist:
- Card BIN numbers
- Issuer geolocation vs. IP address
- Velocity checks on payment methods
- Stripe and PayPal have these signals
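A minimal sketch of payment-side clustering, assuming signup records that carry the card's issuing country and the signup IP's geolocation. All records and the velocity limit are fabricated; in practice these signals come from the payment processor:

```python
from collections import Counter

# Hypothetical signup records for illustration only.
signups = [
    {"account": "a1", "card_bin": "414720", "card_country": "US", "ip_country": "US"},
    {"account": "a2", "card_bin": "510510", "card_country": "CN", "ip_country": "DE"},
    {"account": "a3", "card_bin": "510510", "card_country": "CN", "ip_country": "FR"},
    {"account": "a4", "card_bin": "510510", "card_country": "CN", "ip_country": "NL"},
]

def risk_flags(signups, bin_velocity_limit=2):
    """Flag accounts with issuer/IP geo mismatch or excessive BIN reuse."""
    bin_counts = Counter(s["card_bin"] for s in signups)
    flagged = []
    for s in signups:
        geo_mismatch = s["card_country"] != s["ip_country"]
        bin_reuse = bin_counts[s["card_bin"]] > bin_velocity_limit
        if geo_mismatch or bin_reuse:
            flagged.append(s["account"])
    return flagged

assert risk_flags(signups) == ["a2", "a3", "a4"]
```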
Query profile vs. declared use case
If an account registers as an “indie developer” and then asks 500 complex reasoning queries daily focused on edge cases — that’s an anomaly.
Behavioral profiles are standard in fraud detection. In AI API environments, they’re only just starting to be applied.
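One way to sketch that check: compare the query mix expected for the declared use case against what the account actually sends. The categories, expected shares, and anomaly threshold below are all illustrative:

```python
# Expected query-category mix per declared use case (illustrative numbers).
EXPECTED = {
    "indie_developer": {"code_assist": 0.7, "reasoning": 0.2, "tool_use": 0.1},
}

def profile_anomaly(declared, observed_counts):
    """Return the largest deviation between the expected category share
    for the declared use case and the share actually observed."""
    total = sum(observed_counts.values())
    expected = EXPECTED[declared]
    return max(abs(expected.get(cat, 0.0) - count / total)
               for cat, count in observed_counts.items())

# A self-declared indie developer sending 480 complex-reasoning
# queries a day is a strong mismatch with the declared profile:
observed = {"reasoning": 480, "code_assist": 15, "tool_use": 5}
assert profile_anomaly("indie_developer", observed) > 0.5
```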
How an AI company should defend its models
No solution is 100% effective. It’s about raising the cost of attack so it becomes unprofitable.
1. Cross-account behavioral analysis in real-time
- Stop thinking per-account
- Graph analysis of account clusters
- Detection of shared infrastructure
- The Hydra cluster would have been caught months earlier
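Graph analysis of account clusters can be sketched with union-find over shared infrastructure signals: any two accounts that reuse the same IP, card BIN, or device fingerprint get merged into one cluster. The signal names and accounts below are invented:

```python
from collections import defaultdict

class UnionFind:
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def cluster_accounts(accounts):
    """Merge accounts that share any infrastructure signal; return
    clusters with more than one member (candidate coordinated fleets)."""
    uf = UnionFind()
    first_seen = {}  # signal -> first account that used it
    for acc, signals in accounts.items():
        uf.find(acc)
        for sig in signals:
            if sig in first_seen:
                uf.union(acc, first_seen[sig])
            else:
                first_seen[sig] = acc
    clusters = defaultdict(set)
    for acc in accounts:
        clusters[uf.find(acc)].add(acc)
    return [c for c in clusters.values() if len(c) > 1]

accounts = {
    "a1": {"ip:203.0.113.7", "bin:510510"},
    "a2": {"ip:198.51.100.9", "bin:510510"},  # shares a card BIN with a1
    "a3": {"ip:198.51.100.9"},                # shares an IP with a2
    "a4": {"ip:192.0.2.44"},                  # isolated, stays unflagged
}
(cluster,) = cluster_accounts(accounts)
assert cluster == {"a1", "a2", "a3"}
```

Note the transitivity: a1 and a3 share nothing directly, but are linked through a2. That is exactly the structure a per-account view can never see.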
2. Output watermarking
- Embedding subtle statistical patterns into LLM outputs
- Invisible to users, detectable in training data
- If a model trained on “stolen” data reproduces these patterns — that’s forensic evidence
3. Canary queries
- Artificially created knowledge artifacts
- Specific facts, stories, style patterns
- Don’t exist anywhere else
- Appearing in a competitor’s model = direct proof
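A minimal sketch of canary detection: plant fabricated facts that exist nowhere else in the world, then scan a suspect model's outputs for verbatim reproduction. The canary strings below are obviously invented:

```python
# Unique fabricated "facts" planted in API responses; anything that
# exists nowhere else works (invented theorems, fake port assignments).
CANARIES = [
    "the zephyrine condensation theorem of 1883",
    "port 49172 is reserved for the Meridian handshake",
]

def canary_hits(suspect_outputs, canaries=CANARIES):
    """Return the planted canaries a suspect model reproduces verbatim."""
    text = " ".join(suspect_outputs).lower()
    return [c for c in canaries if c.lower() in text]

# Outputs sampled from a hypothetical competitor model:
outputs = [
    "As shown by the Zephyrine condensation theorem of 1883, ...",
    "Paris is the capital of France.",
]
assert canary_hits(outputs) == ["the zephyrine condensation theorem of 1883"]
```

A model can only know a fact that never existed if it was trained on the outputs that contained it, which is what makes a canary hit forensic-grade evidence.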
4. Stricter KYC for high-volume usage
- Enterprise customers go through KYC processes
- For higher API tiers it should be standard
- Friction layer for coordinated attacks
5. Geopolitical risk scoring
- Combination of infrastructure signals
- Payment data and query patterns
- Risk score flagging potentially state-sponsored activity
- It’s about infrastructure, not ethnicity
Conclusion: Software theft has changed
It’s not a traditional cyber attack:
- No code vulnerabilities
- No phishing campaigns
- No insider threats
- Just APIs and patience
Model distillation as a vector for industrial espionage is a new type of threat. AI companies aren’t ready for it — neither technically nor mentally.
Questions that should concern you:
- How many other operations are currently underway?
- Which ones haven’t been caught yet?
- What’s the price of your AI infrastructure?
Geopolitics: The USA restricts chip exports to slow down Chinese AI development. But copying model capabilities takes nothing more than 16 million API calls. Hardware security is necessary, but insufficient.
Want to get your AI infrastructure security diagnosed?
I conduct detailed AI audits — mapping security risks including capability stealing, model extraction attacks and API abuse.
Also check out our AI services or read more on our blog.
Have a specific question? Get in touch — I’m happy to advise on how to defend against similar threats.