Autonomous Content Creator
Multi-agent system that owns a publishing calendar end-to-end. Research, draft, schedule, ship. Human-in-the-loop is optional, not load-bearing.
The first version of this project was a single agent with a long system prompt and access to four tools. It worked. It also broke in exactly the way long-prompt agents break: the writer's voice drifted across topics, the researcher started inventing sources when the search tool returned thin results, and one Tuesday the scheduler queued the same post three times because it didn't notice the publisher had already taken it.
The rewrite was a multi-agent system, not because multi-agent is fashionable but because the failures were domain-specific. Each agent now does one thing.
How the agents split
A researcher pulls trending sources, deduplicates them, and proposes angles. A writer drafts under a brand tone profile and respects platform-specific length constraints. A scheduler sequences the queue against a publishing calendar and audience time zones. A publisher calls the platform APIs, handles retries, and writes audit records.
An orchestrator coordinates them through shared state. Each handoff goes through an eval gate. Drafts that fail the quality or brand-safety rubric never reach the publisher.
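To make the shape concrete, here's a minimal sketch of the pipeline in Python. Every name in it is hypothetical (Piece, eval_gate, the agent method signatures, the 0.8 rubric cutoff); the real interfaces differ, but the control flow is the point: one verb per agent, a gate at every handoff.

```python
from dataclasses import dataclass, field
import datetime


@dataclass
class Piece:
    # hypothetical shared-state record each agent reads and writes
    topic: str
    angle: str = ""
    body: str = ""
    scheduled_for: datetime.datetime | None = None
    eval_scores: dict[str, float] = field(default_factory=dict)


QUALITY_THRESHOLD = 0.8  # assumed rubric cutoff, not the real one


def eval_gate(piece: Piece, stage: str, score: float) -> bool:
    # record the score so the audit trail can replay every gate decision
    piece.eval_scores[stage] = score
    return score >= QUALITY_THRESHOLD


def run_pipeline(topic, researcher, writer, scheduler, publisher, rubric):
    piece = Piece(topic=topic)

    piece.angle = researcher.propose_angle(topic)        # hypothetical agent API
    if not eval_gate(piece, "research", rubric.score_angle(piece)):
        return None                                      # thin sourcing stops here

    piece.body = writer.draft(piece.angle)               # hypothetical agent API
    if not eval_gate(piece, "draft", rubric.score_draft(piece)):
        return None                                      # failed drafts never reach the publisher

    piece.scheduled_for = scheduler.slot(piece)          # hypothetical agent API
    publisher.publish(piece)
    return piece
```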
The bug that taught me about idempotency
Six weeks in, the publisher started double-posting on weekends. The pattern was always the same: a transient API timeout on the platform side, a retry from the publisher, and a first call that had quietly succeeded despite the timeout. We were posting the same piece twice, and our audit log treated both posts as independent runs.
The fix wasn't smarter retry logic. It was idempotency keys persisted in the publisher state, plus a reconcile step that asked the platform whether a post with that key already existed before issuing the call. Boring infrastructure work. The kind of thing that makes a system actually production-grade.
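A minimal sketch of that fix, assuming a hypothetical platform client with create_post and find_post methods and a local SQLite table for publisher state:

```python
import hashlib
import sqlite3


def idempotency_key(piece_id: str, platform: str) -> str:
    # stable key: same piece + same platform always hashes the same,
    # so a retry reuses the key instead of minting a fresh request
    return hashlib.sha256(f"{piece_id}:{platform}".encode()).hexdigest()


def publish_once(piece_id: str, body: str, client, db: sqlite3.Connection) -> None:
    key = idempotency_key(piece_id, client.platform_name)
    db.execute(
        "CREATE TABLE IF NOT EXISTS publishes (key TEXT PRIMARY KEY, status TEXT)"
    )

    row = db.execute("SELECT status FROM publishes WHERE key = ?", (key,)).fetchone()
    if row and row[0] == "confirmed":
        return  # already shipped; a retry becomes a no-op

    # reconcile first: the timeout may have hidden a success
    if client.find_post(key) is None:            # hypothetical lookup-by-key API
        client.create_post(body, idempotency_key=key)  # hypothetical create API

    db.execute(
        "INSERT OR REPLACE INTO publishes (key, status) VALUES (?, 'confirmed')",
        (key,),
    )
    db.commit()
```

The design choice that matters is reconcile-before-create: the publisher never trusts its own memory of a call that timed out. It asks the platform first.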
What I'd say to anyone building this
Don't decompose for fashion. Decompose where the failures live. The five-agent version of this system is the right shape because each agent failed differently and needed different guardrails. A two-agent version would have been simpler and broken in places I couldn't fix with a prompt edit.
The audit log was a late addition and the most useful thing in the whole system. Every published piece traces back to its source, the agents that touched it, and the eval scores at each gate. When a post lands badly I can replay the chain. When a post lands well I can figure out why and tune from there.
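For illustration, a sketch of what one audit record might hold; the field names are mine, not the system's:

```python
from dataclasses import dataclass
import datetime


@dataclass
class AuditRecord:
    # one record per published piece
    piece_id: str
    source_urls: tuple[str, ...]    # where the researcher found it
    agent_trail: tuple[str, ...]    # e.g. ("researcher", "writer", "scheduler", "publisher")
    eval_scores: dict[str, float]   # score at each gate, keyed by stage
    published_at: datetime.datetime


def replay(records: list[AuditRecord], piece_id: str) -> AuditRecord | None:
    # pull the full chain for one post, whether it landed badly or well
    return next((r for r in records if r.piece_id == piece_id), None)
```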
What I'd add next: tighter feedback from real engagement back to the writer. The system currently produces good content. It doesn't yet learn which kind of good content the audience actually rewards.