Security · dlp · content-shield · pii

AI Data Leak Prevention: 26 DLP Patterns at the LLM Gateway

AI agents leak PII, credit cards, and API keys daily. 26 DLP patterns at the gateway level catch them before they reach the provider — block, redact, or alert.

Gil Kal · April 23, 2026 · 5 min read

A support agent pastes a customer email into a prompt to summarize it. The email contains a credit card number. The summary gets logged in the LLM provider's telemetry, the audit log, the anomaly detector, and the observability dashboard. Three compliance regimes have just been violated in a single call — GDPR, PCI DSS, and whatever internal data handling policy your legal team wrote last quarter.

This is not a hypothetical. Data leakage through AI agents is among the most common compliance findings in SOC 2 audits of AI platforms, because the default flow — user input → LLM → response — has no choke point where secrets get caught. The only durable fix is to intercept before the request leaves your infrastructure. That is what a Content Shield does.

Why Upstream DLP Doesn't Help Here

Enterprises already have DLP for email (Mimecast, Microsoft Purview), for file uploads (Netskope), and for cloud storage (Google DLP API). None of these sit on the LLM request path. An agent making an API call to OpenAI flows over HTTPS from a backend service, bypassing every DLP gateway the security team has deployed. The request never looks like email or a file upload — it looks like an API call, because that is what it is.

The only place to run DLP on an AI agent's traffic is the gateway that sits between your agent and the LLM provider. If you do not have such a gateway, you have no enforcement point. The LLM provider might filter some patterns on their side (OpenAI redacts some PII in their moderation layer), but you cannot audit what they caught and you cannot change the policy. A self-hosted gateway gives you that control.

Your LLM provider is not your DLP. Their redaction is a courtesy, not a contract.

The 26 Patterns — What Content Shield Actually Scans For

Dobby's Content Shield ships with 26 patterns out of the box, grouped into four categories:

  • Identity — US SSN, EU national IDs (DE, FR, IT, ES, NL), Israeli Teudat Zehut, UK NI numbers, passport number formats for G20 countries
  • Financial — credit card numbers with Luhn validation, IBANs, SWIFT codes, US routing numbers
  • Credentials — API keys for the top 12 providers (OpenAI sk-*, Anthropic, Google, AWS, Stripe, GitHub, Slack, Twilio, SendGrid, and more), private keys with PEM headers, JWTs, SSH keys
  • PII free-form — email addresses, phone numbers (E.164 and national formats), IP addresses, MAC addresses

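To make the shape of this concrete, here is a minimal sketch of what a pattern table and scan pass could look like. The class names mirror the policy example later in this post; the specific regexes and the `scan` helper are illustrative assumptions, not Dobby's actual pattern set.

```python
import re

# Illustrative subset of DLP patterns. Names follow the policy JSON in
# this post; the regexes themselves are assumptions for the sketch.
PATTERNS = {
    # OpenAI-style secret keys: "sk-" followed by a long token
    "api_key_openai": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
    # AWS access key IDs: AKIA plus 16 uppercase alphanumerics
    "api_key_aws": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM private-key header
    "private_key_pem": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    # 13-19 digit runs with optional separators; Luhn-checked separately
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
    # US SSN in NNN-NN-NNNN form
    "ssn_us": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # Simple email matcher
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scan(text: str) -> list[tuple[str, str]]:
    """Return (pattern_class, matched_span) pairs found in text."""
    hits = []
    for name, rx in PATTERNS.items():
        for m in rx.finditer(text):
            hits.append((name, m.group(0)))
    return hits
```

A real implementation would compile these once at startup and run them in a single multi-pattern pass, but the interface — text in, classified spans out — is the part the rest of the pipeline depends on.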
Each pattern runs on both the inbound prompt and the outbound completion. That matters — an LLM can generate PII that was not in the prompt (hallucinated credit card numbers that happen to pass Luhn, for example). Outbound scanning catches those. The same policy applies in both directions.
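The Luhn validation mentioned above is what keeps the credit-card pattern from firing on every 16-digit number. A standard implementation of the check, as a sketch:

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right,
    subtract 9 when the doubled digit exceeds 9; the number is
    valid if the total is divisible by 10."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

Note the limitation the paragraph above points out: a hallucinated number can pass this check by chance (roughly one in ten random digit strings will), which is exactly why Luhn is a false-positive filter, not proof that a real card was leaked.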

Three Actions — Block, Redact, Alert

For each pattern class, the policy picks one of three actions. Block halts the request and returns a 422 error with the matched class. Redact replaces the matched span with a marker like [REDACTED:CREDIT_CARD] and lets the request through. Alert passes the request through unchanged but fires a Slack notification. Different patterns want different actions — you almost always block API keys, redact credit cards, and alert on internal email domains.

// Per-org Content Shield policy
{
  "version": 1,
  "actions": {
    "api_key_openai": "block",
    "api_key_aws": "block",
    "private_key_pem": "block",
    "credit_card": "redact",
    "ssn_us": "redact",
    "national_id_eu": "redact",
    "email_internal": "alert",
    "phone_number": "alert"
  },
  "alert_channels": ["slack:#security"],
  "redaction_marker": "[REDACTED:{type}]",
  "scan_outbound": true
}

The 'scan_outbound' flag is worth highlighting. It roughly doubles CPU cost per request (you scan both the prompt and the completion), but it is the only way to catch model-generated leaks. For regulated industries, treat it as mandatory.
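Putting the policy and the three actions together, enforcement can be sketched as a small dispatch over the scan results. The field names come from the policy JSON above; the `Verdict` type, the block response body, and the default action for an unlisted class are assumptions made for this sketch.

```python
from dataclasses import dataclass

# Trimmed version of the per-org policy shown above.
POLICY = {
    "actions": {
        "api_key_openai": "block",
        "credit_card": "redact",
        "phone_number": "alert",
    },
    "redaction_marker": "[REDACTED:{type}]",
}

@dataclass
class Verdict:
    status: int          # 200 pass-through, 422 blocked
    body: str            # possibly redacted text
    alerts: list[str]    # pattern classes that fired alert-only

def enforce(text: str, matches: list[tuple[str, str]]) -> Verdict:
    """Apply block / redact / alert to a list of (class, span) matches."""
    alerts = []
    for cls, span in matches:
        # Default for an unlisted class is a sketch-level assumption.
        action = POLICY["actions"].get(cls, "alert")
        if action == "block":
            # Halt the request; report the matched class, never the span
            return Verdict(422, f'{{"blocked": "{cls}"}}', [])
        if action == "redact":
            marker = POLICY["redaction_marker"].format(type=cls.upper())
            text = text.replace(span, marker)
        elif action == "alert":
            alerts.append(cls)   # request passes through unchanged
    return Verdict(200, text, alerts)
```

One design point worth copying: the 422 response names the matched class but never echoes the matched span — returning the secret in an error body would itself be a leak.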

One Invariant — Scan the Original, Not the Rewritten

Earlier versions of Dobby's pipeline ran profile merge (which can rewrite fields in the request body) before Content Shield. This missed a class of leaks where a profile override injected a PII-laden default into the prompt. The lesson is general: the DLP scanner must operate on the original request body, byte-for-byte, not a downstream transformation of it. Any rewriting that happens between the user and the scanner is a hole — small, but real.

This is why Content Shield sits at hook 8 in the gateway, before profile merge at hook 9. The order is not arbitrary; it is the result of one incident and a better invariant.
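An invariant like this is cheap to enforce mechanically at startup rather than by convention. A minimal sketch, assuming the pipeline is representable as an ordered list of named hooks (the hook numbers follow this post; the structure is hypothetical):

```python
# Gateway hooks in execution order. The invariant: Content Shield
# (hook 8) must see the original body before profile merge (hook 9)
# gets a chance to rewrite it.
PIPELINE = [
    (8, "content_shield"),   # DLP scan on the raw, byte-for-byte body
    (9, "profile_merge"),    # may rewrite fields in the request body
]

def assert_dlp_before_rewrites(pipeline):
    order = [name for _, name in pipeline]
    assert order.index("content_shield") < order.index("profile_merge"), \
        "DLP must scan the original request body, not a rewritten one"

assert_dlp_before_rewrites(PIPELINE)
```

A check like this turns the lesson from the incident into something a refactor cannot silently undo.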

Audit Trail and the Compliance Story

Every Content Shield match writes three audit records: one to the regional policy_validation_log, one to dobby_security.security_events, and one into the llm_gateway_requests row for that call. This redundancy is not elegant, but it is what lets a compliance dashboard widget, a real-time security console, and a long-range usage report all query the same event without one service being the bottleneck.

For a SOC 2 or ISO 27001 audit, the evidence you hand the auditor is the 365-day retention of these logs plus the versioned policy JSON. You can show every request that triggered a block, every redaction that landed, and every alert that fired. 'Show me all the times a credit card was sent to OpenAI in Q1' is a single SQL query. That is the shape of a defensible DLP program for AI agents — not a PDF of a policy, but logs that prove the policy ran.
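To show the shape of that query, here is a self-contained sketch against SQLite. The table name comes from this post; the column names and schema slice are assumptions for illustration.

```python
import sqlite3

# Hypothetical slice of the llm_gateway_requests schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE llm_gateway_requests (
        id INTEGER PRIMARY KEY,
        provider TEXT,
        dlp_class TEXT,     -- matched pattern class, NULL if clean
        dlp_action TEXT,    -- block / redact / alert
        created_at TEXT
    )""")
conn.executemany(
    "INSERT INTO llm_gateway_requests "
    "(provider, dlp_class, dlp_action, created_at) VALUES (?, ?, ?, ?)",
    [
        ("openai", "credit_card", "redact", "2026-02-14"),
        ("openai", None, None, "2026-02-15"),
        ("anthropic", "credit_card", "redact", "2026-05-01"),  # outside Q1
    ],
)

# "Show me all the times a credit card was sent to OpenAI in Q1"
rows = conn.execute("""
    SELECT id, created_at, dlp_action
    FROM llm_gateway_requests
    WHERE provider = 'openai'
      AND dlp_class = 'credit_card'
      AND created_at BETWEEN '2026-01-01' AND '2026-03-31'
""").fetchall()
```

The point is not the specific columns but that the answer is a filter over structured log rows, not a grep through unstructured text — which is what makes the evidence auditable.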

Ready to take control of your AI agents?

Start free with Dobby AI — connect, monitor, and govern agents from any framework.
