Research Source
Towards a Science of Scaling Agent Systems (arXiv:2512.08296)
180 controlled experiments across 3 LLM families and 4 agentic benchmarks · Predictive accuracy: 87%
One brain, zero coordination overhead
One reasoning locus with all perception, reasoning, and action in a single sequential loop.
Topology
Sequential loop with unified memory stream
Single-Agent System
One brain handles everything sequentially
Best For
- Sequential tasks requiring full context integration
- Low-latency requirements (<100ms)
- Simple to moderate complexity tasks
- Tasks with limited tool usage (≤4 tools)
- Budget-constrained projects
Limitations
- Limited capacity for task decomposition
- Single point of failure
- May struggle with highly parallelizable tasks
When to Choose
Start here unless you have clear evidence that task decomposition will help. The research shows a single-agent system (SAS) often matches or beats multi-agent systems (MAS).
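The single-agent pattern above can be sketched as one sequential loop over a unified memory stream; `reason` here is a hypothetical stand-in for an LLM call and the `FINAL:` convention is illustrative, not from the paper.

```python
# Minimal single-agent loop: one reasoning locus, one unified memory
# stream, strictly sequential reason -> act steps.
# `reason` stands in for an LLM call; tools are plain functions.

def run_single_agent(task, tools, reason, max_steps=8):
    memory = [f"task: {task}"]               # unified memory stream
    for _ in range(max_steps):
        thought = reason(memory)             # reason over the full context
        if thought.startswith("FINAL:"):     # model decides it is done
            return thought[len("FINAL:"):].strip()
        name, _, arg = thought.partition(" ")
        result = tools[name](arg)            # act: one tool call at a time
        memory.append(f"{thought} -> {result}")  # observation feeds back in
    return None                              # step budget exhausted
```

Because one locus sees the whole history, there is zero coordination overhead, at the cost of no parallelism.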
Work alone, combine at the end
Agents work in isolation with results aggregated. No peer communication.
Topology
Agent-to-aggregator only (no peer communication)
Independent Multi-Agent
Work independently, combine results at end
Best For
- Embarrassingly parallel tasks
- Tasks where diversity of attempts is valuable
- Simple aggregation scenarios
Limitations
- Highest error amplification (17.2x)
- No error correction between agents
- Duplicates errors without correction opportunities
- Universal underperformance vs. SAS (-70% on some tasks)
When to Choose
Avoid this architecture. Research consistently shows it performs worse than alternatives due to error amplification and wasted parallel effort without verification.
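Although the guidance is to avoid this architecture, a sketch makes the failure mode concrete: with no peer communication, a majority-vote aggregator ratifies whatever error most agents share. The majority-vote merge is an illustrative choice, not the paper's aggregator.

```python
# Independent multi-agent: N isolated attempts, combined only at the end.
# With no peer communication there is no error correction: if most agents
# make the same mistake, the aggregator amplifies it.
from collections import Counter

def independent_mas(task, agents):
    answers = [agent(task) for agent in agents]       # fully isolated runs
    winner, _ = Counter(answers).most_common(1)[0]    # majority-vote merge
    return winner
```

If two of three agents share a systematic error, the wrong answer wins the vote, which is the amplification effect described above.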
One boss coordinates specialist workers
Central orchestrator coordinates specialized agents.
Topology
Orchestrator-to-agents communication only
Centralized Multi-Agent
Boss assigns tasks to specialists
Best For
- Naturally decomposable tasks (revenue, cost, market analysis)
- Tasks requiring specialized domain expertise
- Information synthesis from multiple sources
- Financial analysis (+80.8% improvement observed)
Limitations
- High coordination overhead (285%)
- Counterproductive for sequential tasks
- Orchestrator becomes bottleneck
- Artificial subtask decomposition wastes tokens
When to Choose
Choose when your task naturally splits into independent subtasks that can be worked on in parallel by specialists, and an orchestrator can reliably synthesize outputs.
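The orchestrator-to-agents topology can be sketched as decompose, fan out to specialists, synthesize; `decompose`, the role names, and `synthesize` are illustrative stand-ins, not the paper's implementation.

```python
# Centralized multi-agent: an orchestrator decomposes the task, routes
# subtasks to specialists, and synthesizes the results. Specialists
# never talk to each other (orchestrator-to-agent communication only).
from concurrent.futures import ThreadPoolExecutor

def centralized_mas(task, decompose, specialists, synthesize):
    subtasks = decompose(task)           # e.g. revenue / cost / market analysis
    with ThreadPoolExecutor() as pool:   # specialists run in parallel
        results = list(pool.map(
            lambda s: specialists[s["role"]](s["query"]), subtasks))
    return synthesize(results)           # single merge point (the bottleneck)
```

Note that both decomposition and synthesis pass through one node, which is where the coordination overhead and bottleneck cited above accumulate.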
All agents communicate with all others
All agents communicate with all other agents (all-to-all topology).
Topology
All-to-all peer communication
Decentralized Multi-Agent
Everyone communicates with everyone
Best For
- Tasks benefiting from parallel exploration
- Consensus-building scenarios
- Distributed information gathering
- Tasks where redundancy provides error correction
Limitations
- High coordination overhead (263%)
- Communication complexity grows quadratically
- Higher error amplification (7.8x)
When to Choose
Choose when parallel exploration and cross-checking genuinely help (e.g., tool-heavy or multi-perspective problems) and latency is not critical. Communication cost grows as O(n²).
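The all-to-all exchange can be sketched as repeated rounds in which each agent re-answers with every peer's previous answer in context; the two-argument agent signature and round count are assumptions for illustration.

```python
# Decentralized multi-agent: in every round after the first, each agent
# sees all peers' previous answers (all-to-all), so message volume per
# round grows O(n^2) in the number of agents.

def decentralized_mas(task, agents, rounds=2):
    answers = [agent(task, []) for agent in agents]    # round 1: independent
    for _ in range(rounds - 1):                        # consensus rounds
        answers = [agent(task, answers[:i] + answers[i + 1:])
                   for i, agent in enumerate(agents)]  # peers' answers as input
    return answers
```

Each extra round multiplies the quadratic communication cost, which is why latency-critical tasks are a poor fit.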
Hierarchical control + peer collaboration
Combines centralized orchestration with limited peer-to-peer communication.
Topology
Orchestrator plus limited peer-to-peer
Hybrid Multi-Agent
Hierarchical + peer-to-peer communication
Best For
- Complex tasks requiring both coordination and collaboration
- Scenarios needing hierarchical control with peer verification
- Tasks with natural sub-group structures
Limitations
- Highest overhead (515%)
- Lowest efficiency (0.074)
- Collapses on tool-heavy benchmarks
- Most complex to implement and debug
When to Choose
Only for genuinely complex tasks where simpler architectures have failed and you can afford very high coordination overhead. Protocol complexity increases failure modes.
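One way to sketch the hybrid pattern: the orchestrator assigns one subtask per sub-group, peers inside a group cross-check each other's drafts before reporting up, and the orchestrator merges the group reports. The in-group majority vote is an illustrative stand-in for peer verification.

```python
# Hybrid multi-agent: hierarchical assignment plus limited peer-to-peer
# verification inside each sub-group. Two coordination layers is why this
# topology carries the highest overhead of the five architectures.
from collections import Counter

def hybrid_mas(task, decompose, groups, synthesize):
    reports = []
    for subtask, group in zip(decompose(task), groups):
        drafts = [agent(subtask) for agent in group]     # parallel peer drafts
        agreed, _ = Counter(drafts).most_common(1)[0]    # in-group verification
        reports.append(agreed)                           # one report per group
    return synthesize(reports)                           # hierarchical merge
```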
Key findings on when multi-agent systems help vs hurt
Tool Coordination Tradeoff
Tool-heavy tasks (T>4) suffer disproportionately from multi-agent coordination overhead.
Threshold: 4
Capability Saturation
Coordination yields diminishing returns once the single-agent baseline exceeds ~45%.
Threshold: 0.45
Critical Complexity Threshold
A domain complexity threshold at D ≈ 0.40 determines MAS viability.
Threshold: 0.4
Overhead Threshold
At T=16 tools, the overhead threshold is ~150%; beyond it, coordination cost exceeds the benefit.
Decomposability Requirement
Coordination benefits depend on task decomposability rather than team size.
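The thresholds above can be folded into a single hedged decision rule; the function name, parameter names, and the all-must-pass logic are assumptions for illustration, not the paper's predictive model.

```python
# Illustrative gate built from the reported thresholds. Each check maps
# to one finding; real tasks sit on a continuum, so treat this as a
# first-pass filter, not a verdict.

def mas_looks_viable(num_tools, sa_baseline, domain_complexity,
                     decomposable, overhead_pct):
    """True only when every reported threshold favors a multi-agent system."""
    return (decomposable                     # decomposability, not team size
            and num_tools <= 4               # tool-heavy tasks (T > 4) suffer
            and sa_baseline <= 0.45          # past this, coordination saturates
            and domain_complexity >= 0.40    # critical complexity D ~ 0.40
            and overhead_pct <= 150)         # overhead ceiling (reported at T=16)
```

A task that is decomposable, tool-light, below the capability-saturation baseline, and above the complexity threshold passes; failing any one check argues for staying with a single agent.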
