Phase 16 - Lesson 19
Swarm Optimization for LLMs (PSO, ACO)
Bio-inspired optimization is making a comeback in LLM systems. LMPSO (arXiv:2504.09247) runs PSO where each particle's velocity is a prompt and the LLM generates the next candidate; it works well on structured-sequence outputs such as math expressions and programs. Model Swarms (arXiv:2410.11163) treats each LLM expert as a PSO particle on a model-weight manifold and reports a 13.3% average gain over 12 baselines on 9 datasets with just 200 instances. SwarmPrompt (ICAART 2025) hybridizes PSO and Grey Wolf optimization for prompt optimization. AMRO-S (arXiv:2603.12933) uses ACO-inspired pheromone specialists for multi-agent LLM routing and reports a 4.7x speedup, interpretable routing evidence, and a quality-gated asynchronous update that decouples inference from learning. This lesson implements PSO on a prompt parameter space and ACO on agent routing, then examines why these classical algorithms fit the LLM era and when they do not.
Type: Learn + Build
Languages: Python (stdlib)
Prerequisites: Phase 16 · 09 (Parallel Swarm Networks), Phase 16 · 14 (Consensus and BFT)
Time: ~75 minutes
Problem
You have a prompt that scores 62% on your task eval, and you want to improve it. The naive move is manual tweaking, which is gradient-free but scales badly. Reinforcement learning needs reward signals and enough rollouts to train. You cannot backpropagate through the prompt directly, because it is a discrete string rather than a differentiable parameter.
Classical bio-inspired optimizers (PSO for continuous search spaces, ACO for path selection) were designed for exactly this regime: gradient-free, population-based search that needs only a fitness score per candidate. Let an LLM generate the candidates inside that search loop, and you get a surprisingly practical optimizer.
The same patterns apply to agent routing in multi-agent systems. An ACO-style pheromone trail records which agent worked best on which task-type, lets the router exploit the trail, and decays pheromones so routes can be rediscovered.
Concept
PSO refresher (Kennedy & Eberhart 1995)
Particle Swarm Optimization (PSO) maintains a population of particles in a continuous search space. Each particle i has a position x_i and a velocity v_i. Each iteration:
v_i <- w * v_i + c1 * r1 * (p_best_i - x_i) + c2 * r2 * (g_best - x_i)
x_i <- x_i + v_i
evaluate fitness(x_i)
update p_best_i if improved
update g_best if global best
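Here p_best_i is particle i's own best position so far, g_best is the best position found by the whole swarm, w, c1, and c2 are the inertia, cognitive, and social weights, and r1 and r2 are random factors drawn uniformly from [0, 1] at each update.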
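The update rule above can be sketched in a few lines of stdlib Python. This is a minimal global-best PSO under some assumptions that are not in the lesson text: it minimizes rather than maximizes, clamps positions to a box [lo, hi], draws fresh r1 and r2 per dimension, and uses w = 0.7, c1 = c2 = 1.5 as illustrative defaults. The `sphere` function is a hypothetical toy fitness.

```python
import random

def pso(fitness, dim, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5,
        lo=-5.0, hi=5.0, seed=0):
    """Minimize `fitness` over [lo, hi]^dim with a basic global-best PSO."""
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    p_best = [x[:] for x in xs]                      # personal best positions
    p_best_fit = [fitness(x) for x in xs]
    g_idx = min(range(n_particles), key=p_best_fit.__getitem__)
    g_best, g_best_fit = p_best[g_idx][:], p_best_fit[g_idx]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # v_i <- w*v_i + c1*r1*(p_best_i - x_i) + c2*r2*(g_best - x_i)
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (p_best[i][d] - xs[i][d])
                            + c2 * r2 * (g_best[d] - xs[i][d]))
                # x_i <- x_i + v_i, clamped to the search box (assumption)
                xs[i][d] = min(hi, max(lo, xs[i][d] + vs[i][d]))
            f = fitness(xs[i])
            if f < p_best_fit[i]:                    # update p_best_i if improved
                p_best[i], p_best_fit[i] = xs[i][:], f
                if f < g_best_fit:                   # update g_best if global best
                    g_best, g_best_fit = xs[i][:], f
    return g_best, g_best_fit

def sphere(x):
    """Toy fitness: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)

best_x, best_f = pso(sphere, dim=3)
```

Each run costs `n_particles * (iters + 1)` fitness evaluations, which is the number to watch once the evaluator is an LLM call rather than a cheap function.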
PSO on LLM outputs — LMPSO
arXiv:2504.09247 adapts PSO for LLM-generated structured outputs (math expressions, programs). Each particle is a candidate output. Velocity is a prompt that describes how to modify the current output toward the personal/global best. The LLM generates the new output from the velocity prompt. The "inertia" of the velocity is a prompt like "make small incremental changes."
This works well when:
- The output is structured (parseable, evaluable).
- Fitness is automatic (test runs, arithmetic evaluation).
- Population is small (~10-30 particles) so total LLM calls stay manageable.
It does not work well when fitness needs human review — the per-iteration cost becomes prohibitive.
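To make the "velocity is a prompt" idea concrete, here is a toy sketch of the loop shape. It is not the paper's implementation or its prompts. Everything named here is hypothetical: `velocity_prompt`, `toy_llm` (a stand-in that parses the prompt and mixes candidates, taking an `rng` argument a real model call would not have), and the task (recover three integer polynomial coefficients, written as a space-separated string). Fitness is automatic and higher is better.

```python
import random

def velocity_prompt(current, p_best, g_best):
    # The "velocity" is text: an instruction for how to move the candidate.
    return (
        "Make small incremental changes to CURRENT so it moves toward "
        "PERSONAL_BEST and GLOBAL_BEST. Reply with one candidate only.\n"
        f"CURRENT: {current}\n"
        f"PERSONAL_BEST: {p_best}\n"
        f"GLOBAL_BEST: {g_best}"
    )

def toy_llm(prompt, rng):
    # Stand-in for a real model call (assumption): mixes the three candidates
    # coefficient by coefficient and sometimes nudges one value by +/-1.
    fields = dict(line.split(": ", 1) for line in prompt.splitlines()[1:])
    rows = [[int(t) for t in fields[k].split()]
            for k in ("CURRENT", "PERSONAL_BEST", "GLOBAL_BEST")]
    out = [rng.choice(col) for col in zip(*rows)]
    if rng.random() < 0.5:
        out[rng.randrange(len(out))] += rng.choice((-1, 1))
    return " ".join(map(str, out))

def fitness(candidate):
    # Automatic evaluator: negative squared error against 2 - 3x + x^2.
    a, b, c = (int(t) for t in candidate.split())
    return -sum((a + b * x + c * x * x - (2 - 3 * x + x * x)) ** 2
                for x in range(-3, 4))

def lmpso_sketch(n_particles=10, iters=30, seed=0, llm=toy_llm):
    rng = random.Random(seed)
    xs = [" ".join(str(rng.randint(-5, 5)) for _ in range(3))
          for _ in range(n_particles)]
    p_best = xs[:]
    g_best = max(p_best, key=fitness)
    for _ in range(iters):
        for i in range(n_particles):
            # One model call per particle per iteration.
            xs[i] = llm(velocity_prompt(xs[i], p_best[i], g_best), rng)
            if fitness(xs[i]) > fitness(p_best[i]):
                p_best[i] = xs[i]
                if fitness(xs[i]) > fitness(g_best):
                    g_best = xs[i]
    return g_best, fitness(g_best)

best, score = lmpso_sketch()
```

With the defaults this makes 300 model calls, which is why the small-population and automatic-fitness conditions above matter. Swapping `toy_llm` for a real client would also require validating that the reply parses before scoring it.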
Model Swarms
arXiv:2410.11163 takes PSO off the output layer and into the model layer. Each particle is an expert LLM, represented by its weights. The swarm moves those weights toward the collective best via a gradient-free update. Reported: 13.3% average gain over 12 baselines on 9 datasets, with just 200 instances.
The key insight is that LLM expert models are already nearby in a shared parameter manifold (adapter weights, LoRA deltas). PSO on this low-dimensional subspace is cheap and effective.
ACO refresher (Dorigo 1992)
Ant Colony Optimization (ACO): ants traverse a graph, and each edge carries a pheromone trail. An ant chooses its next move with probability weighted by pheromone strength. Ants that complete the task deposit pheromone in proportion to solution quality, and all pheromone decays over time.
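The three ACO ingredients (pheromone-weighted choice, quality-proportional deposit, evaporation) fit in a short sketch. This toy reduces the graph to a handful of parallel routes between one source and one destination, which is a simplification rather than full ACO on a general graph. The evaporation rate `rho`, the deposit rule 1 / length, and the route names are illustrative assumptions.

```python
import random

def aco_routes(lengths, n_ants=20, iters=30, rho=0.1, seed=0):
    """Toy ACO over parallel routes; a shorter route is a better solution."""
    rng = random.Random(seed)
    tau = {route: 1.0 for route in lengths}   # pheromone per route
    for _ in range(iters):
        routes = list(tau)
        # Each ant picks a route with probability proportional to pheromone.
        picks = rng.choices(routes, weights=[tau[r] for r in routes], k=n_ants)
        # Evaporation: every trail decays by a factor (1 - rho).
        for r in tau:
            tau[r] *= (1.0 - rho)
        # Deposit: each ant adds pheromone proportional to solution quality.
        for r in picks:
            tau[r] += 1.0 / lengths[r]
    return tau

tau = aco_routes({"short": 2.0, "medium": 4.0, "long": 8.0})
```

Because better routes receive more pheromone per ant and are then chosen more often, the shortest route typically ends up dominating `tau`. Evaporation is what keeps early, unlucky choices from being locked in permanently.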
AMRO-S — ACO for agent routing
arXiv:2603.12933 uses ACO for multi-agent routing. Each task-type is a "destination"; each agent is a possible route. Pheromones strengthen routes that produce good outputs. Key contributions:
- Interpretable routing evidence. Pheromone strength is a human-readable signal.
- Quality-gated asynchronous update. Pheromones update only after quality checks pass, decoupling inference from learning.
- 4.7x speedup on the multi-agent routing benchmark.
The quality gate matters: without it, fast-but-wrong agents accrue pheromone, and the system locks in on bad routes.
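A quality-gated pheromone router can be sketched as follows. This is an illustration of the idea, not AMRO-S itself: the class name `PheromoneRouter`, the gate threshold, the evaporation rate, the agent names, and the quality scores are all hypothetical. `route` and `update` are separate methods so that the update can run later, after a quality check, instead of on the inference path.

```python
import random
from collections import defaultdict

class PheromoneRouter:
    """Toy ACO-style router: one pheromone value per (task_type, agent)."""

    def __init__(self, agents, rho=0.05, gate=0.7, seed=0):
        self.agents = list(agents)
        self.rho = rho      # evaporation rate
        self.gate = gate    # minimum quality score that earns pheromone
        self.rng = random.Random(seed)
        # Every task type starts with a uniform trail over all agents.
        self.tau = defaultdict(lambda: {a: 1.0 for a in self.agents})

    def route(self, task_type):
        # Inference path: sample an agent in proportion to its pheromone.
        trail = self.tau[task_type]
        weights = [trail[a] for a in self.agents]
        return self.rng.choices(self.agents, weights=weights)[0]

    def update(self, task_type, agent, quality):
        # Learning path: can be called after the response has been returned.
        trail = self.tau[task_type]
        for a in trail:
            trail[a] *= (1.0 - self.rho)   # evaporation on every update
        if quality >= self.gate:           # quality gate
            trail[agent] += quality        # deposit only for passing outputs

router = PheromoneRouter(["fast_wrong", "slow_right"])
quality = {"fast_wrong": 0.3, "slow_right": 0.9}   # hypothetical eval scores
for _ in range(200):
    agent = router.route("sql")
    router.update("sql", agent, quality[agent])
```

In this toy run the agent that fails the gate never receives a deposit, so its trail only evaporates. The table `router.tau["sql"]` is the human-readable routing evidence: it can be printed or logged directly.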
When to use PSO / ACO for LLMs
Use PSO when:
- Search space is continuous or maps to continuous parameters (prompt embeddings, LoRA weights, numeric generation parameters).
- Fitness is cheap and automatic.
- Population can be small (10-30).
Use ACO when:
- You have a routing or path-selection problem.
- Decisions reinforce over time (the same task types come back).
- You need interpretable evidence for routing decisions.
Do not use either when:
- Fitness requires human review (too expensive per iteration).
- The search space is discrete and combinatorial in a way that PSO does not cover (use genetic algorithms instead).
- Real-time decisions need strict latency (PSO/ACO converge slowly relative to single-pass heuristics).
Why bio-inspired still wins
Gradient-based methods need differentiable signals. LLM outputs and routing decisions are not trivially differentiable. Pseudo-gradient methods (reinforcement-learned routers, DPO-style prompt tuners) work but need expensive training.
PSO and ACO need only an evaluator function. If you can score a candidate output or a routing decision, you can optimize over the space. That makes the bar for applicability much lower.
Practical limits
- Population budget. N particles × T iterations × per-eval cost. For LLM evals at ~$0.02 / call, a 20-particle PSO running 50 iterations costs ~