Here's the dirty secret about shipping AI agents in production: the model is the easy part. The actual time sink is sandboxed execution, crash recovery, state persistence, credential vaulting, and context window engineering. I've watched teams spend more engineering months on that infrastructure plumbing than on the agent's behavior itself.

Anthropic's answer dropped Tuesday as a public beta: Claude Managed Agents. The pitch is blunt — stop building all of that. Hand over your system prompt, tools, and MCP servers; Anthropic handles the containers, the sandboxing, the crash recovery, and the context engineering. The engineering blog post pulled two million views and 40,000 likes within two hours. Notion, Asana, and Rakuten already have integrations shipping. Every API account can start using it today with the managed-agents-2026-04-01 beta header.

The Architecture Trick That Actually Matters

Most "managed agent" services are just a hosted container with an LLM loop inside. Anthropic did something structurally different — they decoupled three components that every existing framework crams into a single process:

```mermaid
flowchart LR
    App["Your App"] -- events --> S["Session\n(event log)"]
    S -- context --> B["Brain\n(Claude + harness)"]
    B -- "execute()" --> H["Hands\n(containers, MCP)"]
    H -- result --> B
    B -- emit --> S
    S -- SSE --> App
```

The Brain is Claude plus the orchestration harness. The Hands are sandboxed execution environments — containers, MCP servers, anything that actually runs code. The Session is an append-only event log that lives outside both.

This separation has a non-obvious consequence. The harness no longer lives inside the container. It calls containers through a clean execute(name, input) → string interface, treating them identically to any other tool. Container crash? Brain keeps reasoning. Harness crash? It reboots, pulls event history from the durable session store via getEvents(), and resumes from the last recorded event. Zero state loss, zero context corruption.
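The resume-from-event-log behavior is worth making concrete. Here is a minimal sketch of that pattern — all class and function names are illustrative stand-ins, not the real Managed Agents API:

```python
class SessionStore:
    """Durable, append-only event log that outlives any single harness process."""
    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)

    def get_events(self):
        # Stand-in for getEvents(): full history, available for replay.
        return list(self.events)

def execute(name, input):
    """Clean tool boundary: containers look like any other tool to the Brain."""
    if name == "bash":
        return f"ran: {input}"
    raise ValueError(f"unknown tool: {name}")

def run_harness(store, steps):
    """Stateless harness: on (re)start, rebuild position from the event log,
    then continue from the first step not yet recorded."""
    done = len(store.get_events())
    for step in steps[done:]:
        result = execute("bash", step)
        store.append({"tool": "bash", "input": step, "result": result})

store = SessionStore()
run_harness(store, ["echo 1", "echo 2"])
# Simulate a harness crash: a fresh harness resumes from the durable log
# and only performs the work that was never recorded.
run_harness(store, ["echo 1", "echo 2", "echo 3"])
assert len(store.get_events()) == 3  # no duplicated work, no lost state
```

Because the log is the single source of truth, any harness replica can pick up any session — which is exactly what makes the harnesses stateless and horizontally scalable.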

The performance numbers are striking: roughly 60% reduction in median time-to-first-token, over 90% at p95. Containers provision lazily — only when Claude actually needs to execute something — instead of blocking inference during upfront setup. Because harnesses are stateless, they scale horizontally without dedicated containers per brain. Organizations can even connect Claude to their own VPC infrastructure without network peering, which is a big deal for regulated industries that couldn't previously touch hosted agent services.
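The lazy-provisioning idea is a standard pattern, and a small sketch shows why it moves time-to-first-token: nothing container-related runs until the model actually requests a tool. This is my own illustration of the pattern, not Anthropic's implementation:

```python
class LazyContainer:
    """Container provisioned on first execute(), not at session start."""
    def __init__(self):
        self._box = None
        self.provisioned = False

    def _provision(self):
        # Stand-in for real container startup (image pull, network setup, ...).
        self.provisioned = True
        return {"id": "box-1"}

    def execute(self, command):
        if self._box is None:  # pay the startup cost only when code must run
            self._box = self._provision()
        return f"{self._box['id']} ran: {command}"

c = LazyContainer()
assert not c.provisioned  # inference can begin immediately; no blocking setup
c.execute("ls")
assert c.provisioned      # provisioned only once a tool call actually happened
```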

What You Build Against

The API surface is four concepts: Agent (model, system prompt, tools, MCP servers), Environment (container template with packages and network rules), Session (a running instance performing a task), and Events (SSE messages flowing between your app and the agent). Create an agent definition once, spawn unlimited sessions from it.
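The four concepts and their relationship can be modeled in a few lines. This is an illustrative data model under my own naming, not the SDK's actual types:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Environment:
    """Container template: preinstalled packages and network rules."""
    packages: tuple = ()
    network_rules: tuple = ()

@dataclass(frozen=True)
class Agent:
    """Defined once: model, system prompt, tools, MCP servers."""
    model: str
    system_prompt: str
    tools: tuple = ()
    mcp_servers: tuple = ()

@dataclass
class Session:
    """A running instance of an agent performing one task."""
    agent: Agent
    environment: Environment
    events: list = field(default_factory=list)  # append-only message stream

env = Environment(packages=("git", "ripgrep"),
                  network_rules=("allow: api.github.com",))
agent = Agent(model="claude-sonnet",
              system_prompt="Triage incoming bug reports.",
              tools=("bash", "web_search"))
# One agent definition, arbitrarily many concurrent sessions.
sessions = [Session(agent, env) for _ in range(3)]
```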

Sessions support mid-flight steering — send new events to redirect the agent while it's executing, or hard-interrupt to change course. Full event history persists server-side, so replay, audit, and debugging come free. The platform handles prompt caching, context compaction, and tool orchestration internally. No more hand-rolling your own agent loop.
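A toy event loop makes the steering semantics concrete: new events redirect the task without discarding session history, and an interrupt halts it outright. The event shapes here are assumptions for illustration, not the documented wire format:

```python
def run_session(task, steering):
    """steering maps turn index -> event, simulating messages that arrive
    while the agent is mid-execution."""
    log = []
    for turn in range(4):
        event = steering.get(turn)
        if event and event["type"] == "interrupt":
            log.append(("interrupted", task))
            break
        if event and event["type"] == "steer":
            task = event["task"]  # redirect; prior log entries are preserved
        log.append(("working", task))
    return log

log = run_session("draft release notes",
                  {2: {"type": "steer", "task": "fix the changelog instead"}})
assert log[1] == ("working", "draft release notes")
assert log[2] == ("working", "fix the changelog instead")
```

Since the full log persists server-side, the same history that powered the steering also powers replay and audit after the fact.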

Built-in tools include bash, file operations, web search and fetch, and arbitrary MCP connections. Credentials never touch execution sandboxes: OAuth tokens sit in external vaults, with a proxy making authenticated calls on the session's behalf.
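The credential-isolation claim reduces to a proxy pattern: the sandbox emits unauthenticated requests, and a component outside the sandbox injects the token before forwarding. A minimal sketch, with an invented in-memory vault standing in for the real credential store:

```python
VAULT = {"github": "tok_abc123"}  # lives outside the execution sandbox

def proxy_call(service, request):
    """The proxy attaches the credential and forwards the request.
    The raw token never enters the container that built the request."""
    token = VAULT[service]
    return {"url": request["url"],
            "headers": {"Authorization": f"Bearer {token}"}}

# Code inside the sandbox only ever constructs this much:
sandbox_request = {"url": "https://api.github.com/user"}
authed = proxy_call("github", sandbox_request)
assert "headers" not in sandbox_request  # sandbox never saw the token
```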

The Price Tag

Standard Claude token rates plus $0.08 per session-hour. Web search inside sessions costs $10 per 1,000 queries. For a typical 15-minute Sonnet coding session, you're looking at about $0.02 in platform fees plus $0.30–0.80 in tokens depending on complexity.
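A back-of-envelope cost helper, using only the published rates above (the token cost is an input, since it depends on usage):

```python
def session_cost(minutes, token_cost, searches=0,
                 session_rate=0.08,      # $ per session-hour
                 search_rate=10 / 1000): # $10 per 1,000 web searches
    """Return (platform_fee, total) in dollars for one session."""
    platform = minutes / 60 * session_rate + searches * search_rate
    return platform, platform + token_cost

platform, total = session_cost(15, token_cost=0.50)
# 15 min at $0.08/hr -> $0.02 platform fee, matching the estimate above
assert abs(platform - 0.02) < 1e-9
```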

That's not free. At scale — say, thousands of concurrent sessions running for hours — the session-hour fees stack up fast. But compare that to hiring two or three infrastructure engineers to build equivalent sandboxing, crash recovery, and state management from scratch. For most teams, the math isn't close.

The Lock-in Question Nobody Wants to Ask

I want to be direct about this because the enthusiasm is drowning out the downsides.

OpenAI's Responses API gives you tool execution without managed infrastructure — you still run your own sandboxes. Google's ADK and frameworks like LangGraph or CrewAI are runtime-level, meaning you own the full stack. Managed Agents is the only offering where the provider controls infrastructure end to end.

The closest open-source alternative is Multica — self-hosted with multi-model support and task lifecycle management. If vendor independence matters to your organization, that's your path.

Here's what concerns me: your agent definitions, session schemas, event formats, and tool configurations are all Anthropic-proprietary. Switching providers later means rewriting your infrastructure layer, not just swapping a model name in a config file. For a startup shipping an agent feature next quarter, that tradeoff is probably acceptable. For a platform team building capabilities they'll maintain for five years, think carefully about what you're coupling to before the convenience becomes a cage.

Where This Is Heading

The research preview features reveal Anthropic's roadmap. Multi-agent orchestration lets sessions spawn and coordinate child agents. Outcomes let you declare what "done" means and have the platform optimize toward it. Persistent memory carries context across sessions, turning one-shot executors into long-running collaborators that learn.

This is a platform play. Anthropic is building the operating system layer that agent-powered products run on — not just a runtime but a substrate. The forty thousand developers who liked that announcement in two hours weren't celebrating a new API. They were expressing relief. They'd been building exactly this infrastructure themselves, and they desperately want someone else to own the pager.