When we set out to build Towar — a platform for deploying and orchestrating AI agents — we had to make a stack decision that would either accelerate us or slow us down for months.
We chose Elixir. And it turned out to be one of the best decisions we made.
Here’s why.
The problem shapes the stack
An AI agent orchestration platform is not a typical web app. At its core, it needs to:

- run large numbers of concurrent agent sessions, each with its own isolated state
- stream LLM responses to the UI in real time, token by token
- call external services (LLM providers, tools, webhooks) that time out, rate-limit, and fail constantly
- recover from those failures without taking the rest of the system down
- run scheduled and background work alongside interactive conversations
If you read that list and think “this sounds like a distributed systems problem,” you’re right. And that’s exactly where Elixir and the BEAM shine.
Why Elixir and the BEAM
The BEAM (Erlang’s virtual machine) was designed for telecom systems — millions of concurrent connections, fault tolerance, hot code reloading, and the ability to let individual processes crash without taking down the system.
AI agent orchestration has surprisingly similar requirements.
Each agent session is a process. In Elixir, processes are cheap — you can run hundreds of thousands of them on a single machine. Each agent conversation gets its own process with its own state. If one agent crashes, nothing else is affected. The supervisor restarts it. This is not something you bolt on — it’s how the language works.
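A minimal sketch of what that looks like, with module and field names invented for this example rather than taken from Towar's actual code:

```elixir
defmodule Towar.AgentSession do
  # Hypothetical sketch: one GenServer per conversation, holding its own state.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    {:ok, %{agent_id: opts[:agent_id], messages: [], status: :idle}}
  end

  @impl true
  def handle_call({:user_message, text}, _from, state) do
    # Append the message and kick off the next agent turn (omitted here).
    {:reply, :ok, %{state | messages: state.messages ++ [text], status: :running}}
  end
end

# Each conversation gets its own isolated process, started on demand:
#
#   DynamicSupervisor.start_child(Towar.SessionSupervisor,
#     {Towar.AgentSession, [agent_id: agent_id]})
```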
Concurrency is native, not an afterthought. When an agent needs to call three tools in parallel, wait for an LLM response, and stream results to the UI — that’s three lines of code in Elixir, not a threading nightmare. Task.async_stream gives you concurrent execution with backpressure out of the box.
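As a rough illustration (run_tool/1 and the option values are hypothetical), fanning tool calls out with bounded concurrency and per-call timeouts looks roughly like this:

```elixir
# Hypothetical helper: run each requested tool call concurrently, at most
# three at a time, killing any call that exceeds 30 seconds.
results =
  tool_calls
  |> Task.async_stream(&run_tool/1,
    max_concurrency: 3,
    timeout: 30_000,
    on_timeout: :kill_task
  )
  |> Enum.map(fn
    {:ok, result} -> result
    {:exit, reason} -> {:error, reason}
  end)
```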
Fault tolerance is built into the philosophy. The Elixir mantra is “let it crash.” Instead of wrapping everything in defensive try/catch blocks, you design supervision trees that handle failure gracefully. For an agent platform where external APIs fail constantly — LLM rate limits, tool timeouts, flaky third-party services — this is exactly the right approach.
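A simplified picture of how that fits together, with module names invented for the example: the session processes live under a supervisor, and a crash in one session restarts only that session.

```elixir
# Sketch of an application supervision tree; names are illustrative.
children = [
  {Phoenix.PubSub, name: Towar.PubSub},
  # Agent sessions are started on demand under this supervisor. If one crashes
  # (an LLM timeout, a tool error), only that child is restarted.
  {DynamicSupervisor, name: Towar.SessionSupervisor, strategy: :one_for_one}
]

Supervisor.start_link(children, strategy: :one_for_one, name: Towar.Supervisor)
```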
Why Phoenix and LiveView
Phoenix gave us three things that matter a lot for this kind of product:
1. Real-time streaming with no extra infrastructure
Agent responses need to stream token by token to the UI. With Phoenix LiveView, this is a WebSocket connection managed by the framework. No separate WebSocket server. No Redis pub/sub layer. No polling. Just Phoenix.PubSub broadcasting events to a LiveView that updates the DOM in real time.
The entire agent chat interface — streaming responses, tool execution indicators, error states — is server-rendered HTML pushed over a WebSocket. No React. No client-side state management. No API layer between the frontend and backend.
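A hedged sketch of that loop, with topic and assign names invented for the example: the agent process broadcasts each token over Phoenix.PubSub, and the LiveView appends it to the rendered response.

```elixir
# In the agent process, each streamed token is published roughly like this
# (topic and payload shape are invented for the example):
#
#   Phoenix.PubSub.broadcast(Towar.PubSub, "session:#{session_id}", {:token, chunk})

defmodule TowarWeb.SessionLive do
  use Phoenix.LiveView

  def mount(%{"id" => session_id}, _session, socket) do
    # Subscribe only once the WebSocket is connected.
    if connected?(socket) do
      Phoenix.PubSub.subscribe(Towar.PubSub, "session:#{session_id}")
    end

    {:ok, assign(socket, response: "")}
  end

  def handle_info({:token, chunk}, socket) do
    # Append the token; LiveView diffs and pushes only the change to the DOM.
    {:noreply, update(socket, :response, &(&1 <> chunk))}
  end

  def render(assigns) do
    ~H"<pre><%= @response %></pre>"
  end
end
```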
2. The umbrella architecture
Phoenix supports umbrella applications — multiple apps living in one repository with shared configuration but clear boundaries.
This is a monolith that scales like a monolith should — simple to deploy, simple to reason about, easy to split later if needed.
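For illustration only (the app names are invented, not Towar's actual apps), the boundaries show up directly in each app's mix.exs: siblings are declared as explicit in-umbrella dependencies.

```elixir
# apps/towar_web/mix.exs (hypothetical app names)
defp deps do
  [
    {:towar_core, in_umbrella: true},   # schemas, contexts, shared logic
    {:towar_agents, in_umbrella: true}, # agent runtime: sessions, tools, LLM calls
    {:phoenix, "~> 1.7"},
    {:phoenix_live_view, "~> 1.0"}
  ]
end
```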
3. LiveView for complex UI without frontend complexity
Our agent debugger has a two-column layout with session lists, real-time message streaming, eval reports, and interactive controls. The agent builder has multi-step forms, drag-and-drop tool configuration, and live previews.
All of it is LiveView. Server-rendered. No JavaScript framework. No API serialization layer. No client-side state synchronization bugs.
This is not a limitation — it’s a feature. When the UI and the backend share the same process, the feedback loop is instant. Change a database query, see the UI update. No “rebuild the frontend” step.
The LLM communication layer
For LLM communication, we use Req — Elixir’s modern HTTP client — with a vendored library called ReqLLM that provides a unified interface across providers.
Why not just call the APIs directly? Because provider APIs are surprisingly different in their streaming formats, error responses, and rate limiting behavior. ReqLLM normalizes this, so our agent runtime doesn’t care whether it’s talking to OpenAI, Anthropic, or any other provider.
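The practical effect is that switching providers is a one-string change at the call site. The function name and model strings below are illustrative assumptions, not ReqLLM's exact interface:

```elixir
# Illustrative only: the real ReqLLM call in our codebase may be shaped differently.
{:ok, response} = ReqLLM.generate_text("anthropic:claude-3-5-sonnet", messages)

# ...and switching providers changes nothing but the model string:
#   ReqLLM.generate_text("openai:gpt-4o", messages)
```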
We initially looked at Jido for agent workflow orchestration. In practice, we ended up using only a minimal subset of it — mostly its signal primitives. The framework promised a lot, but at the time we were building, maintenance was inconsistent and we ran into enough issues that we couldn’t rely on it as a foundation. So we built a custom orchestration layer on top, keeping the parts that worked and replacing what didn’t.
Honestly, this turned out to be the right call. Agent orchestration is the core of our product — it’s the one thing you don’t want to depend on an under-maintained external library for. Our custom layer gives us full control over tool execution, conversation state management, context window handling, and the exact retry and fallback behavior we need.
For the MCP server, we use Hermes MCP — an Elixir implementation of the Model Context Protocol. This lets external clients (like Claude Code itself) connect to our platform and use our agents’ capabilities as tools.
Observability and evals: Braintrust
When you’re running AI agents in production, “it works” is not enough. You need to see what the agent actually did — every LLM call, every tool invocation, every decision — and you need to track whether the system is getting better or worse over time.
Every LLM call in our system flows through a Req pipeline step that automatically logs it to Braintrust — input messages, output response, token counts, latency, model, provider, and any metadata we attach. This happens asynchronously via a batching buffer, so it never blocks the agent’s execution.
```elixir
request
|> ReqLLM.Step.Braintrust.attach(tags: ["production"])
```
Each agent conversation gets a root span. Each turn gets a parent span. Each LLM call gets its own span. You can drill down from “this session” to “this turn” to “this exact LLM call” and see exactly what went in and what came out.
We also use Braintrust for evaluation experiments — multi-turn test conversations with assertions and LLM-as-judge scoring. After each run, Braintrust automatically compares against the previous experiment, so we can see if a change made things better or worse.
Background processing with Oban
Agents don’t just respond to user messages. They run on schedules, process webhooks, execute evals, and perform maintenance tasks. For all of this, we use Oban — a job processing library backed by PostgreSQL.
Why Oban over Redis-based alternatives? Because it uses the same database we already have. No extra infrastructure. Jobs are transactional — if an agent run creates a job as part of a database transaction, the job only runs if the transaction commits. This eliminates an entire class of consistency bugs.
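A hedged sketch of what that looks like (worker, queue, and changeset names are hypothetical): the job is inserted in the same Ecto.Multi as the run record, so it is only enqueued if the whole transaction commits.

```elixir
defmodule Towar.Workers.RunEval do
  # Hypothetical worker: module name, queue, and args are illustrative.
  use Oban.Worker, queue: :evals, max_attempts: 3

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"run_id" => _run_id}}) do
    # Execute the eval for this agent run (omitted in this sketch).
    :ok
  end
end

# run_changeset is an Ecto.Changeset for the agent run (construction omitted).
# If inserting the run fails, the job is never enqueued.
Ecto.Multi.new()
|> Ecto.Multi.insert(:run, run_changeset)
|> Oban.insert(:eval_job, fn %{run: run} ->
  Towar.Workers.RunEval.new(%{"run_id" => run.id})
end)
|> Towar.Repo.transaction()
```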
The database: PostgreSQL
PostgreSQL is the only database. No Redis for caching. No MongoDB for documents. No Elasticsearch for search.
Agent configurations, conversation history, tool definitions, eval results, organization data, user sessions — all in PostgreSQL. We use Ecto as the data layer, which gives us composable queries, schema validation, and migration management.
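For a flavor of what that buys us (the schema and field names here are invented for the example), a typical composable query reads like this:

```elixir
import Ecto.Query

# Hypothetical schema: the most recent sessions for one agent, newest first.
def recent_sessions(agent_id, limit \\ 20) do
  from(s in Towar.Agents.Session,
    where: s.agent_id == ^agent_id,
    order_by: [desc: s.inserted_at],
    limit: ^limit
  )
  |> Towar.Repo.all()
end
```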
For encrypted data (API keys, secrets), we use Cloak Ecto — transparent encryption at the schema level. Fields are encrypted before hitting the database and decrypted when read.
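A minimal sketch of how this looks at the schema level, assuming a vault and schema with names we invented for the example:

```elixir
defmodule Towar.Vault do
  # The Cloak vault holding the encryption keys (key configuration omitted).
  use Cloak.Vault, otp_app: :towar
end

defmodule Towar.Encrypted.Binary do
  # An Ecto type that encrypts on write and decrypts on read via the vault.
  use Cloak.Ecto.Binary, vault: Towar.Vault
end

defmodule Towar.Integrations.Credential do
  use Ecto.Schema

  schema "credentials" do
    field :provider, :string
    # Stored encrypted in PostgreSQL; plaintext exists only in application memory.
    field :api_key, Towar.Encrypted.Binary
    timestamps()
  end
end
```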
Deployment: Fly.io
The entire umbrella deploys as a single release to Fly.io.
One VM. A persistent volume for agent data files. That’s it.
The BEAM’s efficiency means we’re running 13 applications, a PostgreSQL connection pool, background job processing, WebSocket connections, and an MCP server — all on a single shared CPU with 1GB of memory. Try doing that with a Node.js monolith or a microservices architecture.
What we considered and rejected
The stack that AI coding tools love
One unexpected benefit: Elixir and Phoenix are excellent for AI-assisted development.
- Convention over configuration means Claude Code already knows where to put things
- Pattern matching makes code self-documenting: def handle_event("save", %{"agent" => params}, socket) is instantly readable
- Ecto schemas act as type documentation: the AI reads the schema and generates correct queries
- Phoenix generators provide templates the AI follows naturally
- Umbrella boundaries help the AI scope its changes to the right app
We processed 2.5 billion tokens building this platform. The stack’s consistency meant the AI produced correct, idiomatic code the vast majority of the time.
Conclusion
The stack decision for an AI agent platform comes down to one question: what are the fundamental properties your system needs?
For us: concurrency, fault tolerance, real-time streaming, fast iteration, and minimal operational overhead. That clarity is what drives our Elixir development work every day.
Sometimes the best architecture is the simplest one that handles the hard problems natively.
This stack enabled the speed we described in The Developer Role Is Changing — 483K lines of code, 349 PRs, 2.5 months, two people. If you’re building something that needs this kind of engineering depth, check out our Elixir development services or get in touch.