
Per-Tenant Gateway Profiles for Multi-Tenant AI Policy

One gateway, 50 tenants with different policies? Per-tenant gateway profiles give each tenant its own budget, models, and DLP via a 5-layer merge.

Gil Kal · April 24, 2026 · 5 min read

A platform team at an enterprise runs one AI gateway for 50 internal tenants. Marketing wants GPT-4o for ad copy. Legal wants Claude for contract review but only with DLP on maximum. Engineering wants Gemini with a large monthly budget. Finance wants no internet-model access at all — only the on-prem LLM. One gateway, four very different policies.

The naive answer is one gateway per tenant, which is how most platforms fail. Fifty gateways means fifty deploys, fifty sets of credentials, fifty monitoring dashboards, fifty things to patch when a vuln drops. The correct answer is one gateway with per-tenant policy overrides — a single control plane that lets each tenant carry its own rules without operating its own infrastructure.

The Problem with Flat Policy

Most LLM gateways ship with org-level policy. You set a budget, a model whitelist, and a DLP rule set, and they apply to everyone in the org. This is fine for a 10-person startup with one shared OpenAI key, but it breaks down in an enterprise, where tenants have different trust levels, data sensitivity, budgets, and compliance requirements. A flat policy forces a single compromise: set it to the strictest tenant's rules and you over-constrain the permissive tenants; set it any looser and you under-enforce the strict ones.

The fix is not 'layered policy' as a buzzword — it is a deterministic merge algorithm that operators can reason about. You want a tenant operator to be able to look at their policy dump and explain exactly why a specific request got blocked or passed, in terms of which layer set which field.

The 5-Layer Merge Dobby Uses

Gateway profiles in Dobby merge five layers, in strict priority order. Later layers win over earlier layers, but only on fields they actually set — not on fields they default to:

Layer 1 — Platform default: the minimum baseline every tenant gets (e.g., DLP scan on, prompt injection detection on, 100 RPM cap)
Layer 2 — Plan tier: per-plan defaults (free gets GPT-4o-mini only; enterprise gets all models, higher RPM, longer retention)
Layer 3 — Org profile: the primary admin-defined policy for the whole organization
Layer 4 — Tenant profile: the override for this specific tenant (e.g., Finance tenant restricts to on-prem models only)
Layer 5 — Gateway key overrides: scope restrictions baked into the API key itself (e.g., CI-only keys get a narrower model list)

The merge is field-level. If the org profile sets max_tokens to 8192 and the tenant profile does not mention max_tokens, the tenant gets 8192. If the tenant profile sets max_tokens to 4096, it wins. Plan caps are the exception to later-wins: if the tenant profile sets max_tokens to 16384 but the plan tier caps it at 8192, the effective value is 8192, because plan caps are hard ceilings that tenant overrides cannot exceed.
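A minimal sketch of that merge, not Dobby's actual implementation. The Policy shape, the mergePolicy function, and the list of plan-capped fields are assumptions made for illustration; only max_tokens and the three scenarios come from the text above.

```typescript
// Hypothetical policy shape. Only max_tokens comes from the text above;
// rpm and allowed_models are illustrative.
type Policy = {
  max_tokens?: number;
  rpm?: number;
  allowed_models?: string[];
};

type Layers = { platform: Policy; plan: Policy; org: Policy; tenant: Policy; key: Policy };

// Assumption: these numeric fields treat the plan-tier value as a hard ceiling.
const PLAN_CAPPED_FIELDS: Array<"max_tokens" | "rpm"> = ["max_tokens", "rpm"];

function mergePolicy(layers: Layers): Policy {
  // Priority order: later entries win.
  const ordered: Policy[] = [layers.platform, layers.plan, layers.org, layers.tenant, layers.key];
  const out: Record<string, unknown> = {};

  // Field-level merge: a layer only overrides fields it explicitly sets.
  for (const layer of ordered) {
    const fields = layer as Record<string, unknown>;
    for (const field of Object.keys(fields)) {
      if (fields[field] !== undefined) out[field] = fields[field];
    }
  }
  const merged = out as Policy;

  // Plan ceilings: clamp any capped field that ended up above the plan value.
  for (const field of PLAN_CAPPED_FIELDS) {
    const cap = layers.plan[field];
    const value = merged[field];
    if (cap !== undefined && value !== undefined && value > cap) merged[field] = cap;
  }
  return merged;
}

// The three scenarios from the paragraph above.
const base = { platform: { rpm: 100 }, plan: { max_tokens: 8192 }, org: { max_tokens: 8192 }, key: {} };
console.log(mergePolicy({ ...base, tenant: {} }).max_tokens);                    // 8192, inherited from org
console.log(mergePolicy({ ...base, tenant: { max_tokens: 4096 } }).max_tokens);  // 4096, tenant override wins
console.log(mergePolicy({ ...base, tenant: { max_tokens: 16384 } }).max_tokens); // 8192, clamped by the plan cap
```

Clamping after the merge, rather than special-casing the plan layer inside the loop, keeps the later-wins rule uniform and makes the ceiling a separate, auditable step.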

Multi-tenant policy is not a bag of rules. It is a priority list, and the priority order must never surprise the operator.

Dual Cost Checking

One trap with layered budgets is the silent overshoot. Say a tenant has a $5,000 monthly budget and the org has a $50,000 monthly budget. What happens when the tenant's usage is within its own budget but aggregate usage across tenants is about to exceed the org cap? Naive systems check one or the other. Dobby's gateway checks both: the request must pass the tenant budget AND the org budget, so whichever is tighter at the moment of the call decides.

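// Dual budget check: runs at hook 5 of the pipeline.
// getOrgBudgetUsage, getTenantBudgetUsage, and BudgetError are assumed to be defined elsewhere.
async function checkBudget(orgId: string, tenantId: string, cost: number): Promise<void> {
  // Fetch both usage snapshots in parallel.
  const [orgBudget, tenantBudget] = await Promise.all([
    getOrgBudgetUsage(orgId),
    getTenantBudgetUsage(tenantId),
  ])

  // Tenant ceiling is checked first, so it is the one reported when both are exhausted.
  if (tenantBudget.remaining < cost) {
    throw new BudgetError('TENANT_BUDGET_EXHAUSTED', {
      tenant_id: tenantId,
      limit: tenantBudget.limit,
      used: tenantBudget.used,
    })
  }
  // The tenant can be within its own budget while the org as a whole is not.
  if (orgBudget.remaining < cost) {
    throw new BudgetError('ORG_BUDGET_EXHAUSTED', {
      org_id: orgId,
      limit: orgBudget.limit,
      used: orgBudget.used,
    })
  }
}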

The error code tells the operator which ceiling was hit. A tenant that keeps hitting its own budget wants a quota increase from the org admin. An org that keeps hitting the org ceiling wants to revisit the org plan. These are different escalation paths and they deserve different error signals.
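One way a caller might act on the two codes is sketched below. The BudgetError class shape and the escalationFor helper are assumptions for illustration; the source only shows that BudgetError takes a code and a details object.

```typescript
type BudgetErrorCode = "TENANT_BUDGET_EXHAUSTED" | "ORG_BUDGET_EXHAUSTED";

// Hypothetical error class matching the constructor call used in checkBudget.
class BudgetError extends Error {
  readonly code: BudgetErrorCode;
  readonly details: Record<string, unknown>;

  constructor(code: BudgetErrorCode, details: Record<string, unknown>) {
    super(code);
    this.name = "BudgetError";
    this.code = code;
    this.details = details;
  }
}

// Hypothetical helper: map each ceiling to its own escalation path.
function escalationFor(err: BudgetError): string {
  switch (err.code) {
    case "TENANT_BUDGET_EXHAUSTED":
      return "Request a tenant quota increase from the org admin";
    case "ORG_BUDGET_EXHAUSTED":
      return "Revisit the org plan";
  }
}

const err = new BudgetError("TENANT_BUDGET_EXHAUSTED", { tenant_id: "finance", limit: 5000, used: 5000 });
console.log(escalationFor(err));
```

Because the code is a closed union, the compiler flags the switch if a third budget ceiling is ever added without an escalation path.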

The Data Residency Bonus

Per-tenant profiles also solve a thorny compliance problem: data residency. A multi-tenant SaaS whose tenants live in IL, EU, and US regions cannot route EU tenants' LLM calls through US-hosted providers without running into GDPR's restrictions on cross-border data transfers. Per-tenant profiles let you set a provider_region constraint per tenant: the EU tenant can only use providers with an EU endpoint, the IL tenant can only use IL-hosted ones, and the US tenant is unconstrained.
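As a rough sketch, the constraint can be enforced as a filter over the provider catalog before routing. The Provider and TenantProfile shapes, the provider names, and eligibleProviders are hypothetical; only the provider_region field name comes from the text.

```typescript
type Region = "IL" | "EU" | "US";

// Hypothetical catalog entry: the regions where a provider exposes an endpoint.
interface Provider {
  name: string;
  regions: Region[];
}

// Assumption: an unset provider_region means the tenant is unconstrained.
interface TenantProfile {
  tenant_id: string;
  provider_region?: Region;
}

function eligibleProviders(profile: TenantProfile, providers: Provider[]): Provider[] {
  const required = profile.provider_region;
  if (required === undefined) return providers;
  // Keep only providers that have an endpoint in the tenant's required region.
  return providers.filter((p) => p.regions.some((r) => r === required));
}

// Placeholder provider names, not real services.
const catalog: Provider[] = [
  { name: "provider-a", regions: ["US"] },
  { name: "provider-b", regions: ["US", "EU"] },
  { name: "provider-c", regions: ["IL"] },
];

const euTenant: TenantProfile = { tenant_id: "eu-tenant", provider_region: "EU" };
console.log(eligibleProviders(euTenant, catalog).map((p) => p.name)); // [ 'provider-b' ]
```

An empty result should fail the request rather than fall back to an out-of-region provider; otherwise the constraint is advisory rather than enforced.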

Combined with Dobby's regional BigQuery datasets (ds_tenant_il, ds_tenant_eu, ds_tenant_us), this means even the audit trail stays in the right region. The tenant's requests, responses, audit logs, and cost records never cross the regional boundary. That is not a feature you can retrofit — it is a foundation you either build in on day one or you never have at all.

How to Model Your Own Profiles

If you are designing this from scratch: start with the org and platform layers, ship one opinionated default, and add tenant-level overrides only when a real tenant asks for them. Resist the temptation to predict which fields need to be overridable, because you will predict wrong. Add override capability field-by-field, driven by actual tenant requests. By the time you have 20 tenants, you will know which fields are worth overriding and which are just complexity.

Also: make the merged, resolved policy a first-class API. Tenant operators should be able to GET /policy/resolved and see exactly what is active for their tenant right now, annotated by which layer set each field. Without that debug surface, troubleshooting is guesswork — and troubleshooting is always where multi-tenant policy pain lives.
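A sketch of what the body of such a response could look like, assuming each resolved field records the layer that last set it. The ResolvedField shape, the set_by name, the example field names, and resolveWithProvenance are illustrative assumptions, and plan-ceiling handling is left out to keep the example short.

```typescript
type LayerName = "platform" | "plan" | "org" | "tenant" | "key";

// Hypothetical annotation: the effective value plus the layer that set it.
interface ResolvedField {
  value: unknown;
  set_by: LayerName;
}

type ResolvedPolicy = Record<string, ResolvedField>;

interface NamedLayer {
  name: LayerName;
  policy: Record<string, unknown>;
}

// Layers must be passed in priority order; later layers overwrite earlier ones
// and take over the set_by annotation for the fields they explicitly set.
function resolveWithProvenance(layers: NamedLayer[]): ResolvedPolicy {
  const resolved: ResolvedPolicy = {};
  for (const layer of layers) {
    for (const field of Object.keys(layer.policy)) {
      const value = layer.policy[field];
      if (value !== undefined) resolved[field] = { value, set_by: layer.name };
    }
  }
  return resolved;
}

const resolved = resolveWithProvenance([
  { name: "platform", policy: { dlp_scan: true, rpm: 100 } },
  { name: "plan", policy: {} },
  { name: "org", policy: { max_tokens: 8192 } },
  { name: "tenant", policy: { allowed_models: ["on-prem-llm"] } },
  { name: "key", policy: {} },
]);

// A handler for GET /policy/resolved could return this object as JSON.
console.log(JSON.stringify(resolved, null, 2));
```

With this shape, "why was my request blocked?" becomes a lookup: the operator reads the field that triggered the block and sees which layer to go and change.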
