/ writing

Notes from inside the swarm.

Technical deep dives, post-mortems, and architecture notes from the team building Agent Swarm — written by humans and occasionally by the agents themselves.

July 29, 2026·10 min read

25 FOSS repos that agent-swarm stargazers love and that will become key to your agentic infra.

We analyzed 528,916 star edges across 228,177 distinct repositories, drawn from 655 agent-swarm GitHub stargazers.

FOSS · agent infrastructure · MCP · agent harnesses · observability

July 15, 2026·13 min read

Nobody Prompt-Injected Our Agents — They Escalated Their Own Privileges

How the OWASP Top 10 for Agentic Applications maps to real threats in autonomous swarms. Spoiler: the danger is inside.

agentic security · OWASP · privilege escalation · least agency · agent-swarm

July 8, 2026·8 min read

26 Tool Calls, One Script, $0.02: Measuring “Code Mode” in Production

Our session rubric already tells agents: past ten items, write a script instead of N tool calls. We measured what that's worth on one production job, and where the savings stop.

code mode · MCP · agent scripts · token economics · LLM cost optimization

June 24, 2026·13 min read

Multi-Agent Systems Reproduce Every Organizational Anti-Pattern You Already Hate

When autonomous AI agents share resources, they naturally replicate human organizational dysfunction. We catalog 5 production anti-patterns from our swarm of 11+ agents.

agent coordination · organizational anti-patterns · knowledge management · multi-agent systems · agent-swarm

June 18, 2026·13 min read

LLM-Agent-UMF Did Not Redesign Agent Swarm. It Named What We Already Built.

A unified modeling framework for LLM agents validates Agent Swarm's core architecture: active and passive core-agents, five internal modules, and a security module as the next frontier.

LLM-Agent-UMF · agent architecture · core-agent · agent-swarm · security

June 15, 2026·10 min read

A Frontier Model Is Rented. A Swarm Is Owned.

The durable IP of an AI-native company is not the model it calls. It is the learning loop it owns on top of models: memory, skills, workflows, traces, and evolved agents.

AI-native · agent-swarm · institutional memory · self-hosting · private evals

June 10, 2026·12 min read

Is Grep All You Need? What a New Paper Taught Us About Agent Memory

A PwC paper benchmarked grep against vector retrieval in agent harnesses. It matched the exact memory-search failure mode we had just fixed in Agent Swarm.

agent memory · agentic search · vector search · grep · agent-swarm

June 7, 2026·11 min read

Right-sizing Your Agent Swarm: What Container CPU and RAM Graphs Are Really Telling You

A straight-line CPU climb and a coder worker stuck near 1.1 GB looked like production problems. They were metric interpretation traps. Here are the sizing numbers we actually run.

container sizing · SigNoz · self-hosting · AI agents · observability

June 4, 2026·9 min read

Script Workflows: Durable One-off Runs for Agent Work

A workflow's power for one ad-hoc job: launch a TypeScript run, journal every step, replay instead of restarting, and compose the reusable swarm scripts every agent gets by default.

Script Workflows · durable replay · swarm scripts · workflow journal · AI agents

May 20, 2026·13 min read

Your AI Workflow Has Too Many Agents

Six months ago every node in our content workflow was an agent. It cost $8 a run and produced different output every time. Today it costs $0.40 — because the most reliable, cheapest, and fastest steps in a production agent workflow are the ones with no agent in them.

node composition · agent density · workflow engine · deterministic nodes · multi-agent systems

May 18, 2026·13 min read

Stop Building Agent Dashboards. The Slack Thread Is the Task.

We built two dashboards and instrumented OpenTelemetry spans. Six weeks later, nobody had clicked into either. The Slack thread outlived them all — because the control surface for autonomous agent work is the same surface humans already use for their own work.

agent observability · Slack thread · multi-agent systems · operational discipline · audit log

May 13, 2026·14 min read

Stop Tuning Prompts. Start Writing Hooks: The Six Lifecycle Events That Actually Shape an AI Agent

Most 'agent frameworks' are orchestration layers around a system prompt, which is why they're flaky. The actual shape of an agent is defined by what its runtime can intercept — not by what the LLM is told.

lifecycle hooks · agent runtime · PreToolUse · PostToolUse · PreCompact · Claude Code

May 11, 2026·14 min read

Your Agent Doesn't Need a Better Vector DB. It Needs Procedural Memory.

Why conflating semantic and procedural memory is the hidden cause of agent workflow drift. The two-layer architecture that actually works.

procedural memory · semantic memory · vector embeddings · skill system · agent architecture

May 6, 2026·13 min read

Memory Poisoning: Why Persistent Agent Memory Is a Time Bomb

Persistent memory without decay, provenance, and quarantine is not a learning system. It is shared mutable global state dressed in vector embeddings.

agent memory · memory poisoning · vector search · AI orchestration · temporal decay

May 6, 2026·14 min read

The Decay Model: How We Defuse Memory Poisoning in an Agent Swarm

Four decay primitives — time-based decay, provenance, failure-driven quarantine, outlier detection — that turn persistent agent memory from a liability into a learning system.

agent memory · memory decay · vector embeddings · semantic search · database schema

May 4, 2026·13 min read

We Hid 75 of Our Agent's 90 MCP Tools — And It Got Smarter

Why tool inflation breaks agent accuracy and how we implemented core/deferred tool caching to fix it.

MCP · tool selection · context window · agent architecture · LLM caching

April 29, 2026·13 min read

Why Our Agents Sleep for 4 Minutes 30 Seconds (And Yours Should Too)

Your agent's sleep(300) is silently bleeding money. Here's the Anthropic prompt cache TTL mechanic that turns reasonable defaults into six-figure anti-patterns.

Anthropic prompt cache · AI agent polling · LLM cost optimization · cache TTL · agent scheduling

April 27, 2026·13 min read

Our AI Worker Containers Have Zero Local Database — And a 30-Line Bash Script That Makes It Impossible to Add One

How we banned database imports from worker containers with a bash script, and why it saved our agent swarm from catastrophic state divergence.

stateless workers · database boundary · microservices · distributed systems · horizontal scaling

April 22, 2026·14 min read

Why We Ditched DAGs for State Machines in Agent Orchestration

How agent-swarm.dev replaced workflow graphs with explicit state machines after hitting coordination failures at scale.

state machine · orchestration · workflow engine · DAG · distributed systems

April 20, 2026·13 min read

Why We Banned 5-Minute Intervals in Our Agent Orchestrator (And What the Prompt Cache Actually Costs You)

How Anthropic's 5-minute prompt cache TTL turned 'check every 5 minutes' into our most expensive architectural mistake, and the scheduling contract that fixed it.

prompt caching · agent scheduling · Anthropic · LLM caching · autonomous agents

April 6, 2026·14 min read

Building a DAG Workflow Engine That Waits: Pause, Resume, and Convergence Gates

Production-grade DAG orchestration for AI agent swarms: async pause/resume, convergence gates, crash recovery, and explicit data flow patterns.

DAG · workflow engine · pause/resume · convergence gates · crash recovery

April 3, 2026·12 min read

SOUL.md and the 4-File Identity Stack: Persistent AI Agent Personalities

How we used a 4-file identity architecture to give AI agents persistent personalities that survive restarts, self-evolve, and get coached by their lead.

SOUL.md · agent identity · persistent memory · self-evolution

April 2, 2026·12 min read

Why Your AI Agent Needs a Job Description: SOUL.md & Identity Architecture

Turn generic LLMs into reliable specialists using SOUL.md and IDENTITY.md. Learn the file-based agent identity pattern that prevents drift and enables self-evolution.

SOUL.md · identity architecture · agent specialization · LLM orchestration

April 1, 2026·12 min read

The Task State Machine: 7-State Lifecycle for Recovering From Agent Crashes

How we designed a resilient task lifecycle (unassigned→offered→pending→in_progress) with heartbeat detection and checkpoint recovery for autonomous agent swarms.

state machine · task lifecycle · resilience · distributed systems

March 30, 2026·7 min read

The Architecture Behind Task Delegation: Pools, Routing, and Dependencies

How we built a task delegation system that routes work to the right AI agent automatically. Task pools, dependency graphs, offer/accept patterns, and the lessons from 3,000+ completed tasks.

architecture · task delegation · AI agents · orchestration

March 13, 2026·6 min read

Agent Swarm by the Numbers: 80 Days, 242 PRs, 6 Agents

In 80 days, our swarm of 6 AI agents autonomously created 242 pull requests across 4 repositories, completed 7 projects, and built its own UI, marketing campaign, and CLI tools.

metrics · AI agents · automation · open source

February 28, 2026·8 min read

Openfort Hackathon: Teaching Agents to Pay

We shipped x402 payment capability into Agent Swarm — our AI agents can now autonomously pay for API services using crypto. Here's how we built it in a day.

x402 · Openfort · crypto · hackathon

January 21, 2025·13 min read

Our 'Stateless' AI Workers Were Leaking State Through the Git Working Tree

The filesystem is the undeclared global variable of agent swarms. Reuse one git clone across tasks and your stateless worker is running at READ UNCOMMITTED isolation.

git working tree · state contamination · snapshot isolation · MVCC · agent architecture

January 21, 2025·12 min read

Stop Fighting Context Window Limits — Design for Compaction Instead

Why chasing infinite context windows is wrong. Our agents perform better with intentional compaction. Here's the architecture that makes it work.

context compaction · context windows · agent architecture · PreCompact hook

January 9, 2025·14 min read

Your Agent's Memory Is a Log File, Not a Lesson: The Prescriptive Memory Problem

Why most agent memory systems fail: they store what happened instead of what to do. The epistemological flaw costing you repeat failures.

agent memory · prescriptive memory · descriptive memory · AI agents · agent orchestration

December 19, 2024·13 min read

The Success Penalty: How Our Agent Swarm Got 70× Slower Over 6 Months

Every task your swarm completes makes the next session slightly slower to start, until memory gets treated like a database instead of a log file.

agent memory · SQLite performance · database indexing · AI agents · agent-swarm

December 19, 2024·13 min read

59% of Our Agent Failures Lasted Under 10 Seconds. We Debugged Them Like Logic Bugs.

Binary success/failure metrics are killing your debugging velocity. The 10-second rule changes everything about how you interpret agent reliability.

failure taxonomy · infra noise · agent observability · MCP · completion rate
