Governance · kill-switch · safety

AI Agent Kill-Switch: 5-Second Stop for Runaway Agents

When an AI agent goes rogue, you have minutes before the bill, the data leak, or the PR incident. A kill-switch is the 5-second stop — here is how to build one.

Gil Kal · April 21, 2026 · 5 min read

At 2:14 AM on a Saturday, an agent starts a retry loop. By 2:20 AM it has burned through the monthly OpenAI budget. By 2:45 AM the oncall engineer is paged, fumbles through three dashboards to find the right agent, and struggles to remember whether to revoke the key, disable the agent, or pull the Kubernetes deployment. By 3:10 AM the incident is over, at the cost of $18,000 and a postmortem that will reference 'unclear shutdown procedure' for years.

This is not a hypothetical. It is the reason every AI platform that survives contact with production grows a kill-switch. Not a polite pause button — a cord you yank to cut power to the entire fleet in under five seconds, from anywhere, without a login ritual.

What a Kill-Switch Actually Is

A kill-switch is not a feature flag. Feature flags propagate through config reloads, CDN caches, and pod restarts — fine for rolling out a new button, useless for an active incident. A kill-switch must be a hot state variable that every request path reads on every call, with sub-second propagation and zero cold-start dependency.

Mechanically, that means a single Redis or Valkey key (or equivalent) that the gateway checks before any other work. Set the key, and every subsequent request sees the new state within one TTL window. No deploy, no code change, no database migration. It is the infrastructure equivalent of the red button under a glass cover — it should be boring, obvious, and never used unless things are already on fire.

A kill-switch that takes longer than five seconds to deploy is not a kill-switch — it is a change request.
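The write path is as small as the description suggests. Here is a minimal sketch, assuming any Redis/Valkey-style client with a `set` method; `activateKillSwitch`, the `KV` interface, and the state shape are illustrative, not Dobby's actual internals:

```typescript
// Any Redis/Valkey client with a string set() fits this interface.
interface KV {
  set(key: string, value: string): Promise<unknown>
}

type KillScope = 'stop_llm' | 'stop_all' | 'block_new_keys'

// Activation is a single SET, no deploy. Every gateway replica picks up
// the new state on its next TTL-bounded read, i.e. within ~5 seconds.
async function activateKillSwitch(
  kv: KV,
  orgId: string,
  scope: KillScope,
  actor: string,
): Promise<string> {
  const key = `kill:org:${orgId}`
  const state = {
    active: true,
    scope,
    activated_at: new Date().toISOString(),
    actor,
  }
  await kv.set(key, JSON.stringify(state))
  return key
}
```

The important property is that activation touches exactly one key: there is nothing to rebuild, restart, or roll out.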

Three Scopes, Not One

One of the hardest design decisions is how blunt the instrument should be. Too blunt and you nuke healthy agents alongside the misbehaving one. Too narrow and you need an operator to diagnose which knob to turn — exactly what you do not have time for at 2:14 AM. Dobby's gateway splits the difference with three scopes:

  • Scope 1 — Stop LLM traffic: halt all chat.completions calls for the org. MCP tool calls (non-LLM) still run, which preserves the ability to read logs, trigger manual overrides, and query the state of other agents.
  • Scope 2 — Stop everything: halt LLM traffic AND MCP tool calls AND new task triggers. Nuclear option. Used when the agent is actively causing external harm (sending emails, posting tweets, writing to a database).
  • Scope 3 — Block new keys: stop new gk_user_*, gk_svc_*, and gk_tmp_* keys from being created. Used when the suspicion is a compromised credential supply chain rather than a misbehaving agent.

In a real incident, operators usually start with Scope 1 — the agent stops bleeding money, the rest of the org keeps working, the oncall person has time to diagnose. Scope 2 is a second-stage escalation. Scope 3 is used rarely, mostly after a known key rotation gets botched.
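The scope semantics above reduce to a small lookup table the gateway can consult per request. A sketch, with `RequestKind` names invented for illustration:

```typescript
type KillScope = 'stop_llm' | 'stop_all' | 'block_new_keys'
type RequestKind = 'llm_call' | 'tool_call' | 'task_trigger' | 'key_creation'

// What each scope blocks: Scope 1 keeps MCP tools alive for diagnosis,
// Scope 2 halts LLM calls, tool calls, and new task triggers,
// Scope 3 only blocks minting of new keys.
const BLOCKED_BY_SCOPE: Record<KillScope, ReadonlySet<RequestKind>> = {
  stop_llm: new Set<RequestKind>(['llm_call']),
  stop_all: new Set<RequestKind>(['llm_call', 'tool_call', 'task_trigger']),
  block_new_keys: new Set<RequestKind>(['key_creation']),
}

function isBlocked(scope: KillScope | null, kind: RequestKind): boolean {
  return scope !== null && BLOCKED_BY_SCOPE[scope].has(kind)
}
```

Encoding the policy as data rather than branching logic makes the blast radius of each scope auditable at a glance.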

The 5-Second Propagation Path

How do you get from 'operator clicks the button' to 'next request is blocked' in under five seconds? The state lives in Redis/Valkey, keyed by org, with a 5-second TTL on the read path. Every gateway request does a single GET before it starts the pipeline. If the key says KILLED, the request returns 503 immediately with a clear error code. If the operator sets the key from the admin UI, the state is visible to every gateway replica within one TTL window.

// Gateway hook 0 — kill-switch check (runs before all 14 pipeline hooks).
// `valkey` and `HttpError` are gateway internals; the state shape is:
interface KillSwitchState {
  active: boolean
  scope?: 'stop_llm' | 'stop_all' | 'block_new_keys'
  activated_at?: string
  actor?: string
}

async function checkKillSwitch(orgId: string): Promise<KillSwitchState> {
  const key = `kill:org:${orgId}`
  const cached = await valkey.get(key)  // 5s TTL read path

  if (!cached) return { active: false }  // no key set: org is healthy
  const state = JSON.parse(cached) as KillSwitchState

  if (state.scope === 'stop_llm' || state.scope === 'stop_all') {
    // Fail fast with a machine-readable code so callers can tell a
    // deliberate stop apart from an ordinary outage.
    throw new HttpError(503, {
      code: 'KILL_SWITCH_ACTIVE',
      scope: state.scope,
      activated_at: state.activated_at,
      activated_by: state.actor,
    })
  }
  return state
}

The trade-off is that disabling the switch also has up to a 5-second delay. Acceptable — restoring traffic is never a 5-second-SLA operation anyway. You check that the root cause is addressed, you un-kill, you watch the dashboard.
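Releasing the switch is correspondingly simple: delete the key and accept that replicas may serve the stale KILLED state for up to one more TTL window. A sketch, with names again illustrative:

```typescript
// Release is a single DEL. Traffic resumes within one TTL window (~5 s),
// which is acceptable: un-killing is a deliberate, dashboard-in-hand
// operation, not an emergency.
async function releaseKillSwitch(
  kv: { del(key: string): Promise<unknown> },
  orgId: string,
): Promise<string> {
  const key = `kill:org:${orgId}`
  await kv.del(key)
  return key
}
```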

Graceful Degradation When Redis Is Down

The obvious question: what if Valkey itself is down? If the kill-switch read fails, you cannot fail-open (that defeats the purpose) and you cannot fail-closed (that is an outage every time your cache hiccups). The answer is an in-memory fallback per pod with a short TTL — on cache miss, the gateway assumes 'not killed' and logs a telemetry event. If Redis comes back up within seconds, normal operation resumes. If Redis stays down, a Slack alert fires and a secondary admin endpoint (Cloud SQL-backed) becomes the source of truth. This is the pattern Dobby's gateway uses, and it is the same pattern that keeps the rate limiters and free-tier checker running during Valkey outages.
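The fallback described above can be sketched as a small reader wrapper: serve the last-known state from a short-lived per-pod cache when the Valkey read fails, emit a telemetry event, and otherwise assume 'not killed'. Class and parameter names here are illustrative, not Dobby's actual internals:

```typescript
interface KillSwitchState {
  active: boolean
  scope?: string
}

// Per-pod degradation: a Valkey error must neither fail the request
// (fail-closed) nor silently drop the kill state (fail-open).
class KillSwitchReader {
  private memo = new Map<string, { state: KillSwitchState; at: number }>()

  constructor(
    private kv: { get(key: string): Promise<string | null> },
    private memTtlMs = 5_000,
    private onDegraded: (orgId: string, err: unknown) => void = () => {},
  ) {}

  async read(orgId: string): Promise<KillSwitchState> {
    const key = `kill:org:${orgId}`
    try {
      const raw = await this.kv.get(key)
      const state: KillSwitchState = raw ? JSON.parse(raw) : { active: false }
      this.memo.set(key, { state, at: Date.now() })  // refresh fallback copy
      return state
    } catch (err) {
      this.onDegraded(orgId, err)  // telemetry: cache read failed
      const hit = this.memo.get(key)
      if (hit && Date.now() - hit.at < this.memTtlMs) return hit.state
      return { active: false }  // assume not killed; a sustained outage
    }                           // is handled by the alerting path
  }
}
```

Note the asymmetry: a brief hiccup is invisible, while a sustained outage is loud, because every degraded read fires the telemetry callback.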

Who Can Pull the Cord

The kill-switch should be the most blunt and the most restricted control in the platform. In Dobby, only org owners and platform super-admins can activate it. Every activation is audited with actor, timestamp, scope, and reason. A Slack alert fires instantly to the #incidents channel. And when the switch is released, a follow-up postmortem is auto-created in the admin UI so the team never normalizes pulling the cord without writing up why.
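The authorization-plus-audit invariant can be enforced in one place: the switch never activates without producing the record the postmortem will need. A minimal sketch, with role and field names illustrative:

```typescript
type Role = 'org_owner' | 'super_admin' | 'member'

interface KillAudit {
  actor: string
  role: Role
  orgId: string
  scope: string
  reason: string
  at: string
}

// Gate activation on role, and make the audit record a return value so the
// caller cannot activate without also persisting who, when, and why.
function auditedActivation(
  actor: string,
  role: Role,
  orgId: string,
  scope: string,
  reason: string,
): KillAudit {
  if (role !== 'org_owner' && role !== 'super_admin') {
    throw new Error('FORBIDDEN: only org owners and super-admins may activate')
  }
  return { actor, role, orgId, scope, reason, at: new Date().toISOString() }
}
```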

If you take one thing from this: the kill-switch is cheap to build and expensive to lack. Build it before you need it, and practice it in a non-prod drill once a quarter. The hour it takes to add is the hour you will wish you had spent the first time production catches fire.

Ready to take control of your AI agents?

Start free with Dobby AI — connect, monitor, and govern agents from any framework.

Get Started Free