- Everything you paste or upload to a cloud AI is transmitted to external servers — including file contents read by Claude Code in agentic mode.
- For GDPR compliance with personal data, you need Claude Team or Enterprise (includes DPA (Data Processing Agreement) + EU SCCs (Standard Contractual Clauses)). Pro plan is not sufficient.
- For identifiable clinical or health data, use local models (Ollama) only — cloud AI is not an option regardless of plan tier.
Before You Start: Decision Flowchart
Work through this before opening any AI tool with research data.
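The flowchart's triage logic can be sketched in code. This is a simplification of the rules summarized at the top of this guide (local models for identifiable clinical data, Team/Enterprise with a DPA for other personal data, anything for public data); the function and category names are illustrative, not part of any official tooling:

```python
def permitted_tools(is_personal: bool, is_identifiable_clinical: bool) -> list[str]:
    """Triage sketch: which AI tools are on the table for a given dataset."""
    if is_identifiable_clinical:
        # Cloud AI is not an option regardless of plan tier
        return ["local models (e.g. Ollama)"]
    if is_personal:
        # Requires a DPA: Claude Team or Enterprise, not Pro
        return ["Claude Team/Enterprise (DPA + SCCs)", "local models"]
    # Public or non-personal data: use the best available tool
    return ["any tool, including cloud AI"]
```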
What Gets Sent to the Cloud
Everything you type or paste into a cloud AI tool (Claude.ai, ChatGPT, Copilot) is transmitted to external servers. This includes:
| What you do | What gets transmitted |
|---|---|
| Type a prompt | The full prompt text |
| Paste text, code, or data into the chat | The full pasted content |
| Upload or attach a file | The full file contents |
| Run claude in a project directory | Prompts and model responses (encrypted via TLS); not your file system |
Claude Code specifically: Claude Code runs locally on your machine. It reads files into its context window and sends that context to the API with each request. This is not excerpts or summaries — it is full file contents. A large session with many files read can transmit substantial portions of a codebase.
What also gets loaded automatically at session start, without a permission prompt:
- CLAUDE.md files in the current directory and parent directories
- Project memory files (from previous sessions, if any)
- Git state (branch, uncommitted changes, recent commits)
- Skill descriptions and MCP server tool definitions
During the agentic loop, Claude proactively reads additional files — package.json, source files, lock files, tests — to gather context. Each file's content is appended to the context window and transmitted on the next API call.
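To get a feel for the scale, this sketch sums the size of files an agent could plausibly read under a project root. It is an upper-bound estimate, not a measurement of Claude Code's actual behavior, and the extension and directory lists are illustrative:

```python
import os

# Illustrative: file types an agent commonly reads for context
SOURCE_EXTS = {".py", ".js", ".ts", ".json", ".md", ".toml", ".lock"}

def estimate_context_bytes(project_root: str) -> int:
    """Upper-bound estimate of bytes a long agentic session could transmit."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(project_root):
        # Skip directories agents typically ignore
        dirnames[:] = [d for d in dirnames if d not in {".git", "node_modules"}]
        for name in filenames:
            if os.path.splitext(name)[1] in SOURCE_EXTS:
                total += os.path.getsize(os.path.join(dirpath, name))
    return total
```

Running this on a real project often yields megabytes; every byte read by the agent is a byte sent to the API.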
What Stays Local
| What stays local | How to enforce it |
|---|---|
| Files you do not paste or upload | Don't paste them |
| Files excluded by .claudeignore | Add patterns to .claudeignore in your project root |
| Your file system (not scanned or indexed) | No action needed |
| Telemetry and error logs | Set DISABLE_TELEMETRY=1 (see Privacy Kill Switches) |
Important caveat: .claudeignore rules are not fully reliable in agentic mode. See "The .env Problem" below.
Paid Plan Comparison
| Feature | Claude Pro ($20/mo) | Claude Team ($30/user/mo) | Claude Enterprise |
|---|---|---|---|
| Trains on your data | No (opt-in only) | No | No |
| Data retention | 30 days | 30 days | 30 days (custom configurable) |
| Zero Data Retention | Not available | Not available | Available |
| Data Processing Agreement (DPA) | Not available | Yes — automatic, includes EU SCCs | Yes — automatic, includes EU SCCs |
| Anthropic's legal role | Data controller | Data processor | Data processor |
| GDPR Art. 28 compliance | No | Yes | Yes |
| Recommended for institutional use | No | Yes | Yes |

Controller = Anthropic decides how data is used. Processor = your institution decides, and Anthropic handles the data per your instructions.
Key implications
- Pro plan: Anthropic does not train on your conversations by default. However, no DPA is available. Anthropic acts as a data controller — not a processor — which means the controller-processor relationship required for GDPR Art. 28 compliance does not exist. This matters for institutional use.
- Team/Enterprise: Anthropic acts as a data processor. DPA with EU Standard Contractual Clauses (SCCs) is automatically included. Conversations are never used for training. This is what institutional GDPR compliance requires.
- All tiers: Data is stored in the US. Cross-border transfer safeguards rely on SCCs, the primary mechanism for EU-US data transfers since the Schrems II ruling (2020).
For institutional use with any personal data, Team or Enterprise is the minimum. Pro is not GDPR-compliant for research involving personal data.
Sources: privacy.claude.com, Anthropic DPA, Claude Code data usage
Medical & Clinical Data
Health data is a special category under GDPR Art. 9 (health, biometric, genetic data receiving extra legal protection). Stricter rules apply beyond standard personal data protections.
Legal Basis
Processing health data requires explicit consent or falls under the research exemption (Art. 9(2)(j) — the GDPR provision permitting processing for scientific research purposes in the public interest). Legitimate interest (the general-purpose legal basis allowing processing when an organisation has a compelling reason) — the fallback for standard personal data — does not apply.
Ethics Board Requirements
Using AI tools on study data may require amendment to your existing ethics approval. Ethics committees (e.g., German Ethikkommissionen) increasingly require AI tool disclosure in study protocols. Check with your ethics board before integrating any cloud AI into a study pipeline.
DICOM De-identification
Medical images contain embedded patient metadata — name, date of birth, hospital ID — that survives standard export. This metadata must be stripped before any AI processing.
| Tool | What it does | Limitation |
|---|---|---|
| pydicom with de-identification recipe | Strips DICOM tags systematically | Requires configuration; verify output |
| CTP / Clinical Trial Processor | Full de-identification pipeline | Requires Java setup |
| dcm2niix | Format converter | Not designed for anonymization — strips some headers but not all PII |
Do not rely on format converters for de-identification.
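A minimal pydicom sketch of the first approach, assuming pydicom is installed. The tag list below is an illustrative subset, not a complete de-identification profile (see DICOM PS3.15 Annex E for the full picture), so always verify the output before any AI processing:

```python
# Illustrative subset of DICOM attributes carrying direct identifiers
IDENTIFYING_TAGS = [
    "PatientName", "PatientID", "PatientBirthDate",
    "ReferringPhysicianName", "InstitutionName", "AccessionNumber",
]

def strip_identifiers(src_path: str, dst_path: str) -> None:
    """Blank identifying tags, drop private tags, and re-save the file."""
    from pydicom import dcmread  # requires: pip install pydicom

    ds = dcmread(src_path)
    for name in IDENTIFYING_TAGS:
        if name in ds:
            # Blank rather than delete, keeping the element structure intact
            ds.data_element(name).value = ""
    # Private (vendor-specific) tags are a common hiding place for PII
    ds.remove_private_tags()
    ds.save_as(dst_path)
```

Inspect the saved file afterwards (for example with `pydicom.dcmread(dst_path)`) to confirm no identifiers survived.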
Practical Decision Tree
Institutional DPA
Contact your Data Protection Officer (DPO) before using cloud AI on any data derived from patient contact. The university's framework agreement does not necessarily cover research use — do not assume it does.
Privacy Kill Switches
Add these to your shell profile (~/.zshrc or ~/.bashrc) to disable telemetry and non-essential data transmission:

```shell
export DISABLE_TELEMETRY=1                          # No usage metrics sent to Anthropic
export DISABLE_ERROR_REPORTING=1                    # No error logs to Sentry
export DISABLE_FEEDBACK_COMMAND=1                   # Prevents transcript upload via /feedback
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1   # Disables all non-essential network calls
```

Add these patterns to .claudeignore in your project root to prevent Claude from reading sensitive files:
```
.env
.env.*
.env.local
credentials*
**/secrets/**
*.pem
*.key
id_rsa*
**/patient-data/**
**/dicom/**
```

Critical caveat: .claudeignore is not a security boundary. See "The .env Problem" below. Use .claudeignore as one layer of defense, not the only one.
The .env Problem
Claude Code does not reliably prevent reading .env files.
Security researcher Dor Munis (Knostic, 2025) documented that Claude Code loads .env, .env.local, and similar files — including API keys and passwords — into the context window automatically, without explicit permission.
In January 2026, The Register verified that .claudeignore rules intended to block .env access were inconsistently enforced. Claude read blocked files when operating in agentic mode.
What this means in practice: Any file in or under your project directory is potentially readable by Claude Code. "Potentially readable" means "potentially transmitted to Anthropic's API."
Mitigation
| What not to do | What to do instead |
|---|---|
| Store secrets in .env inside the project directory | Move .env files to the parent directory (outside the project root) |
| Rely on .claudeignore alone to protect secrets | Use a secrets manager (Doppler, 1Password CLI, direnv) that injects credentials at runtime |
| Put API keys in any file Claude might read | Use environment variables injected by your shell, not stored in files |
The pattern to internalize: if a file is in the project directory tree, treat it as readable by the agent. Structure your project so secrets never live there.
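One way to follow that pattern in application code: read credentials from the process environment (injected by direnv or a secrets-manager wrapper) and fail loudly when they are missing, so nothing tempts you to fall back to a project-level .env file. The variable name here is an example:

```python
import os

def require_secret(name: str) -> str:
    """Fetch a credential from the process environment.

    The value is injected at runtime (direnv, 1Password CLI, Doppler, ...)
    and never written to a file inside the project tree, so an agent
    reading project files cannot see it.
    """
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell or inject it via "
            "your secrets manager; do not store it in a project .env file."
        )
    return value
```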
Sources: Claude Code Data Usage, Knostic .env research, The Register investigation
Local & Self-Hosted Alternatives
When cloud AI is not an option — identifiable clinical data, institutional policy, or sovereignty concerns — local models provide full data control at the cost of capability.
| Tool | What it does | Trade-off | Best for |
|---|---|---|---|
| Ollama | Run open-weight LLMs locally (Llama 3, Mistral, Phi-3) | Less capable than frontier models; requires GPU for good performance | Privacy-sensitive tasks, offline environments |
| vLLM | High-throughput LLM serving on your infrastructure | Requires infrastructure setup | Group-level deployment, shared research infrastructure |
| Mistral (Le Chat / API) | EU-hosted frontier model, GDPR-native | Smaller ecosystem, fewer agentic tools | EU sovereignty requirements |
| Aleph Alpha (Luminous) | EU/German-hosted, on-premise available | Smaller models, less capable at coding | Maximum data sovereignty, EU data sovereignty requirements |
EU sovereignty note: Using Mistral or Aleph Alpha addresses concerns about US jurisdiction and data residency. Both offer DPAs and are subject to EU law. Capabilities lag US frontier models, particularly for coding and complex reasoning, as of early 2026.
Practical decision: For routine tasks (editing, brainstorming, literature search) where data is not sensitive, use the best available tool. For anything involving personal data or institutional constraints, default to local models until you've confirmed your DPA coverage.
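For the Ollama route, a minimal sketch of a local query using only the Python standard library. It assumes Ollama is running on its default port (11434) and that the model has already been pulled with `ollama pull llama3`:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(prompt: str, model: str = "llama3") -> dict:
    # stream=False returns a single JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def ask_local(prompt: str, model: str = "llama3") -> str:
    """Send a prompt to a locally running model; nothing leaves the machine."""
    payload = json.dumps(build_request(prompt, model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because the endpoint is localhost, this works offline and is safe for data that must never reach a cloud provider.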
Non-EU Researchers
The workflow patterns in this guide apply regardless of your legal jurisdiction. Equivalent frameworks:
| Jurisdiction | Framework | Key similarities to GDPR |
|---|---|---|
| South Africa | POPIA (Protection of Personal Information Act) | Lawful basis required; cross-border transfer restrictions |
| United Kingdom | UK GDPR | Near-identical to EU GDPR; SCCs replaced by International Data Transfer Agreements |
| United States | Sector-specific (HIPAA (Health Insurance Portability and Accountability Act) for health data, FERPA for education) | HIPAA in particular: cloud AI requires a BAA (Business Associate Agreement) for PHI (Protected Health Information) |
| Canada | PIPEDA (Personal Information Protection and Electronic Documents Act) / provincial laws | Consent-based; cross-border adequacy requirements |
US health data specifically: HIPAA-covered entities need a BAA with any cloud AI provider processing Protected Health Information (PHI). Anthropic offers BAAs only at the Enterprise tier. For health research, check with your IRB (Institutional Review Board) and compliance office before using any cloud AI on patient data.
The core principle — local models for identifiable data, institutional DPA for anonymized data, cloud AI freely for public/non-personal data — holds across all jurisdictions.