Privacy & GDPR

Practical Compliance Guide for Research Data

Key Points

Before You Start: Decision Flowchart

Work through this before opening any AI tool with research data.

Does your data contain personal information about identifiable individuals?

- YES → Is it health, biometric, or clinical data?
  - YES → See Medical & Clinical Data section below. Use local models (Ollama) only, on approved infrastructure. Contact your Data Protection Officer (DPO) before proceeding.
  - NO → Is it pseudonymized or fully anonymized?
    - ANONYMIZED → Cloud AI permitted. Use Team plan (DPA required for GDPR).
    - PSEUDONYMIZED → Does your institution have a DPA with the provider?
      - YES → Cloud AI permitted with care (Team/Enterprise). Scope working directory tightly.
      - NO → Local models only, or upgrade to Team.
- NO → Is it unpublished research, embargoed data, or confidential grant content?
  - YES → Do not paste into cloud AI. Draft locally, then paste anonymized/synthetic excerpts for editing help only.
  - NO → Cloud AI permitted. Standard precautions apply.
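For labs that want the flowchart enforced consistently (in onboarding scripts, for example), it can be encoded directly. This is a sketch, not an official tool; the argument names are my own shorthand for the questions above, asked in the same order.

```python
# The decision flowchart above, encoded as a function. Each boolean
# argument corresponds to one question in the flowchart; the returned
# string is the recommendation.
def ai_tool_decision(personal: bool, health: bool = False,
                     anonymized: bool = False, has_dpa: bool = False,
                     confidential: bool = False) -> str:
    if personal:
        if health:
            return "local models only; contact your DPO first"
        if anonymized:
            return "cloud AI permitted; use Team plan (DPA)"
        # Pseudonymized personal data: hinges on institutional DPA coverage.
        if has_dpa:
            return "cloud AI permitted with care; scope working directory tightly"
        return "local models only, or upgrade to Team"
    if confidential:
        return "draft locally; paste only anonymized/synthetic excerpts"
    return "cloud AI permitted; standard precautions apply"

assert ai_tool_decision(personal=True, health=True).startswith("local models only")
assert ai_tool_decision(personal=False) == "cloud AI permitted; standard precautions apply"
```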

What Gets Sent to the Cloud

Everything you type or paste into a cloud AI tool (Claude.ai, ChatGPT, Copilot) is transmitted to external servers. This includes:

| What you do | What gets transmitted |
|---|---|
| Type a prompt | The full prompt text |
| Paste text, code, or data into the chat | The full pasted content |
| Upload or attach a file | The full file contents |
| Run `claude` in a project directory | Prompts and model responses (encrypted via TLS); not your file system |

Claude Code specifically: Claude Code runs locally on your machine. It reads files into its context window and sends that context to the API with each request. This is not excerpts or summaries — it is full file contents. A large session with many files read can transmit substantial portions of a codebase.

CLAUDE.md memory files (project-level and user-level) are also loaded automatically at session start, without a permission prompt.

During the agentic loop, Claude proactively reads additional files — package.json, source files, lock files, tests — to gather context. Each file's content is appended to the context window and transmitted on the next API call.


What Stays Local

| What stays local | How to enforce it |
|---|---|
| Files you do not paste or upload | Don't paste them |
| Files excluded by `.claudeignore` | Add patterns to `.claudeignore` in your project root |
| Your file system (not scanned or indexed) | No action needed |
| Telemetry and error logs | Set `DISABLE_TELEMETRY=1` (see Privacy Kill Switches) |

Important caveat: .claudeignore rules are not fully reliable in agentic mode. See "The .env Problem" below.


Paid Plan Comparison

| Feature | Claude Pro ($20/mo) | Claude Team ($30/user/mo) | Claude Enterprise |
|---|---|---|---|
| Trains on your data | No (opt-in only) | No | No |
| Data retention | 30 days | 30 days | 30 days (custom configurable) |
| Zero Data Retention | Not available | Not available | Available |
| Data Processing Agreement (DPA) | Not available | Yes — automatic, includes EU SCCs | Yes — automatic, includes EU SCCs |
| Anthropic's legal role | Data controller | Data processor | Data processor |
| GDPR Art. 28 compliance | No | Yes | Yes |
| Recommended for institutional use | No | Yes | Yes |

Controller = Anthropic decides how data is used. Processor = your institution decides, Anthropic handles it per your instructions.

Key implications

For institutional use with any personal data, Team or Enterprise is the minimum. Pro is not GDPR-compliant for research involving personal data.

Sources: privacy.claude.com, Anthropic DPA, Claude Code data usage


Medical & Clinical Data

Health data is a special category under GDPR Art. 9 (health, biometric, genetic data receiving extra legal protection). Stricter rules apply beyond standard personal data protections.

Legal Basis

Processing health data requires explicit consent, or it must fall under the research exemption (Art. 9(2)(j), the GDPR provision permitting processing for scientific research purposes in the public interest). Legitimate interest — the general-purpose legal basis allowing processing when an organisation has a compelling reason, and the usual fallback for standard personal data — does not apply to special-category data.

Ethics Board Requirements

Using AI tools on study data may require amendment to your existing ethics approval. Ethics committees (e.g., German Ethikkommissionen) increasingly require AI tool disclosure in study protocols. Check with your ethics board before integrating any cloud AI into a study pipeline.

DICOM De-identification

Medical images contain embedded patient metadata — name, date of birth, hospital ID — that survives standard export. This metadata must be stripped before any AI processing.

| Tool | What it does | Limitation |
|---|---|---|
| `pydicom` with de-identification recipe | Strips DICOM tags systematically | Requires configuration; verify output |
| CTP (Clinical Trial Processor) | Full de-identification pipeline | Requires Java setup |
| `dcm2niix` | Format converter | Not designed for anonymization — strips some headers but not all PII |

Do not rely on format converters for de-identification.
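The principle behind the tools above is an allowlist: keep only the tags you explicitly need and drop everything else, rather than trying to enumerate every identifier to remove. A minimal sketch of that principle, using a plain dict in place of a real pydicom Dataset (the tag names are stand-ins, not a complete DICOM attribute list):

```python
# Allowlist de-identification: retain only explicitly approved tags.
# A real pipeline would operate on pydicom Datasets and a vetted recipe;
# this dict-based version only illustrates the logic.
SAFE_TAGS = {"Modality", "SliceThickness", "PixelSpacing", "StudyDescription"}

def deidentify(header: dict) -> dict:
    """Return a copy of a DICOM-like header containing only allowlisted tags."""
    return {tag: value for tag, value in header.items() if tag in SAFE_TAGS}

header = {
    "PatientName": "DOE^JANE",       # direct identifier: must go
    "PatientBirthDate": "19700101",  # direct identifier: must go
    "Modality": "MR",
    "SliceThickness": 1.0,
}
clean = deidentify(header)
assert "PatientName" not in clean
assert clean["Modality"] == "MR"
```

An allowlist fails safe: a tag nobody thought about is dropped by default, whereas a denylist silently passes it through.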

Practical Decision Tree

Is the data identifiable clinical data (patient records, DICOM, study data)?

- YES → Local models (Ollama) only, on approved institutional infrastructure. Contact your Data Protection Officer (DPO) first.
- NO → Is it anonymized research data derived from patient contact?
  - YES → Cloud AI permitted with Team plan (DPA included). Verify anonymization is complete before uploading.
  - NO → Standard GDPR decision flowchart applies.
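"Verify anonymization is complete" deserves a concrete first step. The sketch below is a crude pre-upload screen for obvious direct identifiers; the patterns are illustrative, not exhaustive, and passing this check does not prove the data is anonymized. It only catches easy mistakes before they reach a cloud API.

```python
import re

# Crude pre-upload screen for obvious direct identifiers. Passing it does
# NOT establish anonymization; treat it as a smoke test, not a guarantee.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "iso_date": re.compile(r"\b(19|20)\d{2}-\d{2}-\d{2}\b"),  # e.g. birth dates
}

def screen_for_pii(text: str) -> list[str]:
    """Return the names of the patterns that matched; empty list means no hits."""
    return [name for name, pat in PII_PATTERNS.items() if pat.search(text)]

assert screen_for_pii("Patient reached at jane.doe@example.org") == ["email"]
assert screen_for_pii("Mean reaction time 412 ms, n=24") == []
```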

Institutional DPA

Contact your Data Protection Officer (DPO) before using cloud AI on any data derived from patient contact. Your university's framework agreement does not necessarily cover research use — do not assume it does.


Privacy Kill Switches

Add these to your shell profile (~/.zshrc or ~/.bashrc) to disable telemetry and non-essential data transmission:

```bash
export DISABLE_TELEMETRY=1                         # No usage metrics sent to Anthropic
export DISABLE_ERROR_REPORTING=1                   # No error logs to Sentry
export DISABLE_FEEDBACK_COMMAND=1                  # Prevents transcript upload via /feedback
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1  # Disables all non-essential network calls
```

Add these patterns to .claudeignore in your project root to prevent Claude from reading sensitive files:

```
.env
.env.*
.env.local
credentials*
**/secrets/**
*.pem
*.key
id_rsa*
**/patient-data/**
**/dicom/**
```

Critical caveat: .claudeignore is not a security boundary. See "The .env Problem" below. Use .claudeignore as one layer of defense, not the only one.


The .env Problem

Claude Code does not reliably prevent reading .env files.

Security researcher Dor Munis (Knostic, 2025) documented that Claude Code loads .env, .env.local, and similar files — including API keys and passwords — into the context window automatically, without explicit permission.

In January 2026, The Register verified that .claudeignore rules intended to block .env access were inconsistently enforced. Claude read blocked files when operating in agentic mode.

What this means in practice: Any file in or under your project directory is potentially readable by Claude Code. "Potentially readable" means "potentially transmitted to Anthropic's API."

Mitigation

| What not to do | What to do instead |
|---|---|
| Store secrets in `.env` inside the project directory | Move `.env` files to the parent directory (outside the project root) |
| Rely on `.claudeignore` alone to protect secrets | Use a secrets manager (Doppler, 1Password CLI, direnv) that injects credentials at runtime |
| Put API keys in any file Claude might read | Use environment variables injected by your shell, not stored in files |

The pattern to internalize: if a file is in the project directory tree, treat it as readable by the agent. Structure your project so secrets never live there.
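The "inject at runtime, never store in the tree" pattern looks like this in code. A minimal sketch: the variable name `MYLAB_API_KEY` is a placeholder, and in real use a secrets manager (direnv, Doppler, 1Password CLI) would set it before your process starts, so no file inside the project directory ever contains the value.

```python
import os

def require_secret(name: str) -> str:
    """Read a credential from the process environment; fail fast if missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; inject it via your secrets manager")
    return value

# Stands in for runtime injection by a secrets manager; never do this with
# a real key, and never write the key to a file in the project tree.
os.environ["MYLAB_API_KEY"] = "dummy-for-demo"
assert require_secret("MYLAB_API_KEY") == "dummy-for-demo"
```

Failing fast on a missing variable beats a silent fallback: a misconfigured environment surfaces immediately instead of leaking into a half-working run.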

Sources: Claude Code Data Usage, Knostic .env research, The Register investigation


Local & Self-Hosted Alternatives

When cloud AI is not an option — identifiable clinical data, institutional policy, or sovereignty concerns — local models provide full data control at the cost of capability.

| Tool | What it does | Trade-off | Best for |
|---|---|---|---|
| Ollama | Run open-weight LLMs locally (Llama 3, Mistral, Phi-3) | Less capable than frontier models; requires GPU for good performance | Privacy-sensitive tasks, offline environments |
| vLLM | High-throughput LLM serving on your infrastructure | Requires infrastructure setup | Group-level deployment, shared research infrastructure |
| Mistral (Le Chat / API) | EU-hosted frontier model, GDPR-native | Smaller ecosystem, fewer agentic tools | EU sovereignty requirements |
| Aleph Alpha (Luminous) | EU/German-hosted, on-premise available | Smaller models, less capable at coding | Maximum data sovereignty (on-premise) requirements |

EU sovereignty note: Using Mistral or Aleph Alpha addresses concerns about US jurisdiction and data residency. Both offer DPAs and are subject to EU law. Capabilities lag US frontier models, particularly for coding and complex reasoning, as of early 2026.

Practical decision: For routine tasks (editing, brainstorming, literature search) where data is not sensitive, use the best available tool. For anything involving personal data or institutional constraints, default to local models until you've confirmed your DPA coverage.
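The privacy property of a local model is concrete at the code level: the prompt only ever travels to localhost. A hedged sketch of talking to a local Ollama server (the default port 11434 and the `/api/generate` endpoint reflect a standard install; check yours), building the request locally without sending it:

```python
import json

# Default local Ollama endpoint; verify against your own install.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> bytes:
    """Serialize a generation request for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

body = build_request("llama3", "Summarize this anonymized abstract: ...")
payload = json.loads(body)
assert payload["model"] == "llama3"
# To actually send (stays on your machine):
#   urllib.request.urlopen(urllib.request.Request(
#       OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}))
```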


Non-EU Researchers

The workflow patterns in this guide apply regardless of your legal jurisdiction. Equivalent frameworks:

| Jurisdiction | Framework | Key similarities to GDPR |
|---|---|---|
| South Africa | POPIA (Protection of Personal Information Act) | Lawful basis required; cross-border transfer restrictions |
| United Kingdom | UK GDPR | Near-identical to EU GDPR; SCCs replaced by International Data Transfer Agreements |
| United States | Sector-specific: HIPAA (Health Insurance Portability and Accountability Act) for health data, FERPA for education | HIPAA in particular: cloud AI requires a BAA (Business Associate Agreement) for PHI (Protected Health Information) |
| Canada | PIPEDA (Personal Information Protection and Electronic Documents Act) / provincial laws | Consent-based; cross-border adequacy requirements |

US health data specifically: HIPAA-covered entities need a BAA with any cloud AI provider processing Protected Health Information (PHI). Anthropic offers BAAs only at the Enterprise tier. For health research, check with your IRB (Institutional Review Board) and compliance office before using any cloud AI on patient data.

The core principle — local models for identifiable data, institutional DPA for anonymized data, cloud AI freely for public/non-personal data — holds across all jurisdictions.
