Sandboxing AI Agents per Business Role
Giving a non-technical manager an AI assistant that can query the real business database and even change the product is genuinely useful - and genuinely dangerous. The danger isn't the model being "evil"; it's that an autonomous agent treats every boundary you enforce by convention as a suggestion. So the boundaries have to be structural.
Here's the design I run in production: each manager gets their own sandboxed instance of our internal app with its own AI assistant, and the safety properties come from the shape of the system rather than from asking the agent nicely.
One container per manager
Every manager's instance runs in its own container, behind a single reverse proxy that routes by hostname. The container gives each tenant an isolated filesystem and process space for free. A manager's agent literally cannot open another manager's files, because they aren't in its container. This is the cheapest strong isolation you can buy: you're not writing access-control code, you're relying on a boundary the kernel already enforces.
Each container also gets its own database user scoped to its own database, and its own CPU/memory limits, so one runaway agent can't starve its neighbors or reach another tenant's data.
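To make the per-tenant limits concrete, here is a minimal sketch of how the launch arguments for one manager's container might be assembled. The image name, the `sandbox-`/`sandbox_` naming scheme, the `DB_USER`/`DB_NAME` variables, and the default limits are all assumptions for illustration; only the `docker run` flags themselves (`--name`, `--cpus`, `--memory`, `--env`, `--detach`) are real.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tenant:
    """One manager's sandbox. All default values here are hypothetical."""
    slug: str            # short identifier, e.g. "alice"
    cpus: str = "1.0"    # handed to docker's --cpus
    memory: str = "1g"   # handed to docker's --memory


def docker_run_args(tenant: Tenant, image: str = "internal-app:latest") -> list[str]:
    """Build the argv for launching one tenant's container.

    The database user and database name are derived from the tenant slug,
    so no two containers are ever handed the same credentials. The password
    is deliberately left out; it would come from a secret store.
    """
    db_name = f"sandbox_{tenant.slug}"
    return [
        "docker", "run", "--detach",
        "--name", f"sandbox-{tenant.slug}",
        "--cpus", tenant.cpus,        # cap CPU so a runaway agent can't starve neighbors
        "--memory", tenant.memory,    # cap memory for the same reason
        "--env", f"DB_USER={db_name}",
        "--env", f"DB_NAME={db_name}",
        image,
    ]
```

The point of generating the arguments from a single tenant record is that isolation becomes a property of how containers are created, not something each container has to remember to do.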
A real database, refreshed every four hours - never the production one
The assistants are only useful if the data looks real, so each sandbox is seeded from a copy of production. The important word is copy. The agent reads and writes a private database rebuilt every four hours from a production mirror via an atomic swap. Two things fall out of that:
- Blast radius is one refresh cycle. Whatever an agent does to its sandbox data is gone at the next refresh. The four-hourly reset isn't just freshness - it's the backstop. "What if the agent corrupts the data?" Then it's corrupted for at most four hours, in one sandbox, affecting no one.
- Production is never in the loop. The sandboxes never point at the live database, so there is no code path by which a creative agent's action can land on a real reservation.
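The build-then-swap idea behind the refresh can be sketched with a file-based stand-in. This is not the production mechanism: it assumes the mirror and the sandbox database are single files (as they would be with SQLite), and the function name `refresh_sandbox` is hypothetical. On a server database the equivalent step would be building a fresh database alongside the old one and renaming it into place.

```python
import os
import shutil
import tempfile
from pathlib import Path


def refresh_sandbox(mirror: Path, sandbox: Path) -> None:
    """Rebuild `sandbox` from `mirror`, then swap it in with one rename.

    Readers see either the old file or the new one, never a half-written
    copy, because os.replace is atomic when source and destination are on
    the same filesystem.
    """
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=sandbox.parent, prefix=sandbox.name + ".")
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        shutil.copyfile(mirror, tmp)   # slow part: build the fresh copy off to the side
        os.replace(tmp, sandbox)       # fast part: swap it in
    except BaseException:
        tmp.unlink(missing_ok=True)    # never leave a partial copy behind
        raise
```

Whatever the agent wrote to the old sandbox file simply disappears at the swap, which is the "backstop" behavior described above. The mirror is only ever read.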
Default-deny command execution
The agent can run shell commands - that's most of its power - but only against an allowlist. The design choice is the fallback: anything not on the list is denied, not asked. A default-ask policy sounds safer but trains everyone to click "yes"; default-deny means the set of things the agent can do is finite, enumerable, and reviewable. Adding a capability is a deliberate edit to a config file.
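A default-deny check can be very small. The sketch below is an illustration under assumptions, not the actual config format: the `ALLOWLIST` contents, the metacharacter filter, and the `is_allowed` name are all hypothetical. What it demonstrates is the fallback: every path that doesn't match the list returns a denial, and there is no "ask the user" branch.

```python
import shlex

# Hypothetical allowlist: program name -> permitted subcommands.
# None means "any arguments are fine for this program".
ALLOWLIST = {
    "ls": None,
    "cat": None,
    "git": frozenset({"status", "diff", "log"}),
}

# Characters that would let one command string smuggle in a second command.
SHELL_METACHARACTERS = set(";|&$`<>\n")


def is_allowed(command: str) -> bool:
    """Return True only if `command` matches the allowlist; deny everything else."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar parse failures
        return False
    if not argv or argv[0] not in ALLOWLIST:
        return False
    subcommands = ALLOWLIST[argv[0]]
    if subcommands is None:
        return True
    # Programs with a subcommand list must name one of the listed subcommands.
    return len(argv) > 1 and argv[1] in subcommands
```

With this table, `is_allowed("git status")` passes while `is_allowed("git push origin main")` and `is_allowed("rm -rf /")` are denied. Granting a new capability means editing `ALLOWLIST`, which is exactly the kind of change that shows up in review.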
How code changes get out - and the honest part
This is the part people usually overclaim, so I'll be precise. Agent code changes don't flow straight to production. Each manager's work lands on its own long-lived branch, and a nightly job commits and pushes that branch automatically. Production is a separate branch that only moves when a human merges into it.
So the "gate" between AI-written code and production is a human merge step, not a clever piece of enforcement tooling. I think that's the right design, and I'd rather describe it accurately than dress it up: the structural guarantees (container isolation, private four-hourly-reset databases, default-deny execution) are what make it safe to let the agent work at all; the human merge decides whether any given change is good enough to ship. Pretending a review gate is automated when it's a convention is how teams get surprised.
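For concreteness, the nightly job amounts to a handful of git commands per manager. This is a rough sketch under assumptions: the `sandbox/<manager>` branch naming, the `origin` remote, and the commit message are hypothetical, and it assumes the working tree is already checked out on the manager's branch.

```python
def nightly_push_commands(manager: str, remote: str = "origin") -> list[list[str]]:
    """Return the git commands a nightly job might run for one manager.

    Nothing here merges into, or pushes to, the production branch; that
    step stays with a human.
    """
    branch = f"sandbox/{manager}"  # hypothetical naming scheme
    return [
        ["git", "add", "--all"],
        # git commit exits non-zero when there is nothing to commit;
        # a real job would treat that case as a no-op rather than a failure.
        ["git", "commit", "--message", f"nightly snapshot for {manager}"],
        ["git", "push", remote, branch],
    ]
```

Note what is absent: there is no check in this job that stops anyone from merging a bad branch later. That is the conventional boundary, and the sketch doesn't pretend otherwise.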
What generalizes
If you're putting LLM agents in front of non-technical users at your own company, the transferable ideas are:
- Isolate with the platform, not with prompts. Containers and per-tenant DB users do work that no system prompt can.
- Make the data resettable. A four-hourly rebuild turns "the agent broke the data" from an incident into a non-event.
- Default-deny the dangerous surface. A finite, enumerable capability set beats per-action approval fatigue.
- Be honest about which boundaries are enforced vs. conventional. Both are legitimate; conflating them is the trap.
None of this requires a frontier-lab budget. It's mostly Docker, a reverse proxy, a four-hourly cron, and the discipline to keep the agent's authority small and its blast radius smaller.
The full pattern this sandbox layer belongs to - six boundaries, the human-gate tradeoff, and a copyable checklist - is in Human-in-the-Loop AI Agents for Legacy Business Systems.
Each boundary here has its own decision record: per-tenant isolation (0001), snapshot-fed writable sandboxes (0003), a scoped agent identity (0005), and default-deny, host-pinned data access (0009).