Sub-agents: how I parallelized my AI workflow
I kept running into the same problem. I’d ask Regulus to do something that takes a minute — research a topic, refactor a file, check the weather and markets — and I’d be sitting there waiting. The main conversation is blocked. I can’t ask anything else until it finishes.
Sub-agents fixed this.
What they are
OpenClaw has a sessions_spawn mechanism that lets the main agent kick off isolated sessions. Think of them as background workers. The main agent writes a brief, spawns a sub-agent, gets an immediate confirmation, and moves on. When the sub-agent finishes, it announces the results back.
Each sub-agent gets its own context, its own tool access, and its own model allocation. If it fails or goes sideways, the main conversation doesn’t care. It’s isolated.
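The spawn-and-continue pattern can be sketched in plain Python. To be clear, this is conceptual only: `spawn` and `run_subagent` are stand-ins I made up, not OpenClaw's actual sessions_spawn API. But the shape is the same: submit work to a background worker, get a handle back immediately, and any failure stays inside the handle until you ask for the result.

```python
from concurrent.futures import ThreadPoolExecutor

# Background worker pool standing in for OpenClaw's session machinery.
pool = ThreadPoolExecutor(max_workers=4)

def run_subagent(brief: str) -> str:
    # In the real system this is a full LLM session with its own
    # context and tool access; here it's a placeholder.
    if not brief.strip():
        raise ValueError("empty brief")
    return f"done: {brief}"

def spawn(brief: str):
    """Kick off a sub-agent and return immediately with a handle."""
    future = pool.submit(run_subagent, brief)
    print(f"Spawned sub-agent for: {brief!r}")  # immediate confirmation
    return future

handle = spawn("summarize today's market moves")
# ...main conversation keeps going...
result = handle.result()  # the "announcement" when the sub-agent finishes
```

Note the isolation property: if `run_subagent` raises, the exception surfaces only when you call `.result()` on that one handle. The main flow, and every other sub-agent, is unaffected.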
How they differ from named agents
I already have four named agents running 24/7 — Regulus, Farsight, Forge, Centinel. Those are persistent roles with their own workspaces, personalities, and Discord channels. Centinel runs trading strategies. Forge tracks workouts. They have continuity.
Sub-agents are none of that. They’re ephemeral. They spin up, do one job, report back, and disappear. No workspace persistence, no long-term memory, no Discord presence. They’re closer to a function call than an agent — just one that happens to have the full power of an LLM behind it.
The distinction matters because it changes when you reach for each tool. Need ongoing trading automation? That’s a named agent. Need to research three topics simultaneously while I keep chatting? Sub-agents.
When I use them
The trigger is simple: if a task would block the main conversation for more than a few seconds, it’s a sub-agent candidate.
Parallel research. I asked for weather, market analysis, and a technical deep-dive all at once. Three sub-agents spawned simultaneously, ran in parallel, and all three reported back within 46 seconds. Sequentially that would’ve been two-plus minutes of waiting.
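The three-way burst can be simulated in miniature. This sketch uses Python's `concurrent.futures` rather than OpenClaw itself, with sleeps standing in for LLM calls; the point is that wall time tracks the slowest task, not the sum of all three.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def research(topic: str, seconds: float) -> str:
    time.sleep(seconds)  # stand-in for a long-running LLM call
    return f"{topic}: report ready"

# Three "research" tasks with different durations.
tasks = [("weather", 0.2), ("markets", 0.3), ("deep-dive", 0.25)]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=3) as pool:
    reports = list(pool.map(lambda t: research(*t), tasks))
elapsed = time.monotonic() - start

# elapsed lands near 0.3 (the slowest task), not 0.75 (the sum).
```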
Code changes. Building out a new feature for the Flutter launcher — the sub-agent got a clear brief, made the changes across multiple files, committed, and pushed. 68 seconds, and I was doing other things the entire time.
System audits. I’ve run audits where the main agent spawns sub-agents to interview each named agent about their status, then compiles everything into a report. Each sub-agent talks to a different agent in parallel.
Blog posts. This post was written by a sub-agent. I gave it a brief, pointed it at the existing blog post for tone, and kept working on other things.
The workflow
It’s straightforward:
- I describe what I want done
- The main agent spawns a sub-agent with a clear brief — task description, constraints, output format, any files to reference
- I get an immediate confirmation: “Spawned sub-agent blog-sub-agents, working on it”
- I keep talking to the main agent about whatever else

- The sub-agent finishes and the result shows up in the conversation
The brief matters. A vague brief gets vague results. I’ve learned to include specific file paths, style references, word limits, and clear definitions of “done” (like “commit and push when finished”).
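One way to keep briefs honest is to make the fields explicit. This is a sketch of my own, not a structure OpenClaw requires; the field names just mirror the checklist above (task, constraints, output format, files, definition of done).

```python
from dataclasses import dataclass, field

@dataclass
class Brief:
    """Hypothetical structured brief; all names here are illustrative."""
    task: str                                         # what to do
    constraints: list[str] = field(default_factory=list)
    output_format: str = "markdown"
    files: list[str] = field(default_factory=list)    # paths to reference
    done_when: str = "result posted back to the main conversation"

    def render(self) -> str:
        """Flatten the brief into the prompt text a sub-agent would receive."""
        lines = [f"Task: {self.task}"]
        if self.constraints:
            lines.append("Constraints: " + "; ".join(self.constraints))
        lines.append(f"Output format: {self.output_format}")
        if self.files:
            lines.append("Reference files: " + ", ".join(self.files))
        lines.append(f"Done when: {self.done_when}")
        return "\n".join(lines)
```

A blog-post brief might look like `Brief(task="write the sub-agents post", files=["posts/named-agents.md"], done_when="commit and push when finished")`, which forces me to fill in the fields a vague brief would leave empty.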
Configuration
Two settings control concurrency in OpenClaw’s config:
- agents.defaults.maxConcurrent — how many concurrent sessions a single agent handles
- subagents.maxConcurrent — how many sub-agents can run simultaneously
On a Pi 5, the constraint is API calls, not local compute. The agents are just orchestrating; the LLM inference happens in the cloud. So the limit is really about how many parallel API streams you want running and how much you want to spend.
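For reference, roughly where those keys would sit in a config file. Only the two key paths come from above; the nesting, syntax, and values here are illustrative guesses, not copied from a real config.

```json5
{
  agents: {
    defaults: {
      maxConcurrent: 4  // concurrent sessions per agent
    }
  },
  subagents: {
    maxConcurrent: 3    // sub-agents running at once
  }
}
```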
Why it matters
Three reasons:
Responsiveness. The main conversation never blocks. I can ask a question, get an answer, and start a new topic while three sub-agents are churning away in the background.
Parallelism. Things that would take minutes sequentially take seconds in parallel. The 46-second research burst would’ve been painful as a serial conversation.
Failure isolation. If a sub-agent hits an error or produces garbage, the main session is fine. I don't lose my conversation context; I just spawn another one with a better brief.
The mental model shift is going from an assistant you talk to, one thing at a time, to a coordinator that can delegate. It changes how you think about what to ask for, because the cost of asking for more isn't waiting longer; it's just more parallel work.
What’s next
I’m still figuring out the edges. How detailed should briefs be? When does spawning a sub-agent add overhead versus just doing the thing inline? Are there tasks where the isolation actually hurts because the sub-agent lacks conversational context?
Early days. But the parallel execution alone has already changed how I use the system daily.