Pker.xyz

Autom8: A Claude Orchestrator

rust, tools

github.com/louisboilard/autom8 | crates.io | docs.rs


Intro & Objectives

At the moment of writing this, orchestrators I have been exposed to have sane concepts but the implementation lacks the things we aim to bring with autom8: type safety, determinism where possible (i.e autom8 does as much work as possible, only leverage Claude when needed), safe re-runs/resuming of interupted work, ratatui powered TUI to monitor on-going runs and investigate/review previous runs, a native desktop GUI for richer workflows, proper error/state types, sane toml based config files, intelligent context injection between agents via git diffs and shared context, concrete agent types, commits/pr creations if desired, git worktree support for parallel execution, context size optimization..

Some of these exists in orchestrators that I believe are massively overengineered or in others that are very much underengineered (multiple thousand lines of shell scripts, or requiring a bunch of external deps or spawning multiple agents at the same time and basically tell them to "figure it out" and "self correct").

Autom8 therefore aims to improve on already existing concepts (agentic loops, context sharing between iteratively spawned agents..) in a simple way: a single binary, one command to run, cross platform, rich per-project configuration, sane defaults, easy to interface with.

Let's take a look.


Architecture Overview

autom8 is structured around three core concepts: specs, state, and knowledge.

A spec defines what needs to be built. It's a JSON file listing user stories with acceptance criteria, priorities, and a pass/fail status for each. The spec is the source of truth for "what work remains."

source
{
  "project": "my-feature",
  "branchName": "feature/add-auth",
  "userStories": [
    {
      "id": "US-001",
      "title": "Add login endpoint",
      "acceptanceCriteria": ["POST /login returns JWT", "Invalid creds return 401"],
      "priority": 1,
      "passes": false
    }
  ]
}

State tracks execution progress. Every transition gets persisted to disk, so the orchestrator can resume from any point. The state includes the current machine state, which story is being worked on, iteration counts, and timestamps.

Knowledge accumulates across the run. When a story completes, autom8 captures what files were modified, what architectural decisions were made, and what patterns were established. This knowledge gets injected into subsequent prompts, giving later agents context about earlier work.

The component relationships look like this:

source
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│      Spec       │     │   RunState      │     │  ProjectKnowledge│
│  (what to do)   │────▶│ (where we are)  │────▶│  (what we know)  │
└─────────────────┘     └─────────────────┘     └───────────────── ┘
         │                      │                        │
         │                      ▼                        │
         │              ┌─────────────────┐              │
         └─────────────▶│   StateManager  │◀─────────────┘
                        │   (persistence) │
                        └─────────────────┘
                                │
                        ┌───────┴───────┐
                        ▼               ▼
                  state.json      runs/*.json
                  (current)       (archived)

The StateManager handles all persistence. Current state lives in a single file that gets updated on every transition. When a run completes, it's archived with a timestamp for later reference.


The State Machine

Current impl is a 12-state machine with explicit transitions. Every state is a discrete phase of execution.

rust
#[derive(Debug, Clone, Copy, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
pub enum MachineState {
    Idle,
    LoadingSpec,
    GeneratingSpec,
    Initializing,
    PickingStory,
    RunningClaude,
    Reviewing,
    Correcting,
    Committing,
    #[serde(rename = "creating-pr")]
    CreatingPR,
    Completed,
    Failed,
}

At a higher level there are 4 main steps, the user only runs `autom8` to go through all 4.

First is planning/spec creation: launching autom8 will spawn claude with an injected prompt, where we define what we want to work on, once done we exit, autom8 picks the new spec and converts it to json.

Second phase is implementation: agents will do what the plan told them to.

Thirdly we have a review-correction loop.

Lastly (and optionally): commits and pr creation.

The state flow for a typical run:

source
                         ┌──────────────────┐
                         │       Idle       │
                         └────────┬─────────┘
                                  │ start
                                  ▼
                         ┌──────────────────┐
                         │   LoadingSpec    │
                         └────────┬─────────┘
                                  │
                                  ▼
                         ┌──────────────────┐
                         │  Initializing    │
                         └────────┬─────────┘
                                  │
              ┌───────────────────┼───────────────────┐
              │                   ▼                   │
              │          ┌──────────────────┐         │
              │          │  PickingStory    │◀────────┤
              │          └────────┬─────────┘         │
              │                   │                   │
              │    ┌──────────────┼──────────────┐    │
              │    │ incomplete   │    all done  │    │
              │    ▼              │              │    │
              │ ┌──────────────┐  │              │    │
              │ │ RunningClaude│  │              │    │
              │ └──────┬───────┘  │              │    │
              │        │          │              │    │
              │        ▼          │              │    │
              │ ┌──────────────┐  │              │    │
   issues ──▶ │ │  Reviewing   │  │              │    │
   found      │ └──────┬───────┘  │              │    │
     │        │        │ pass     │              │    │
     │        │        ▼          │              │    │
     │        │ ┌──────────────┐  │              │    │
     │        │ │  Committing  │  │              │    │
     │        │ └──────┬───────┘  │              │    │
     │        │        │          │              │    │
     │        └────────┴──────────┘              │    │
     │                                           │    │
     ▼                                           ▼    │
┌──────────────┐                        ┌──────────────┐
│  Correcting  │                        │  CreatingPR  │
└──────┬───────┘                        └──────┬───────┘
       │                                       │
       └──────────▶ back to Reviewing          ▼
                                        ┌──────────────┐
                                        │  Completed   │
                                        └──────────────┘

State is persisted after every transition.


Cumulative Context Architecture

Each Claude invocation starts fresh with zero knowledge of what happened before. For single-shot tasks this is fine. For complex workflow/implementations where we want to keep relatively low context per agent but share some level of context to avoid doing too much work per agent we need something a little more sophisticated.

Consider implementing three related features: authentication, user profiles, and settings. Story 1 creates the auth module. Story 2 needs to know about that module to build profiles. Story 3 needs to know about both to build settings that integrate with them.

autom8 solves this with the ProjectKnowledge struct:

rust
pub struct ProjectKnowledge {
    /// Known files and their metadata, keyed by path
    pub files: HashMap<PathBuf, FileInfo>,

    /// Architectural decisions made during the run
    pub decisions: Vec<Decision>,

    /// Code patterns established during the run
    pub patterns: Vec<Pattern>,

    /// Changes made for each completed story
    pub story_changes: Vec<StoryChanges>,

    /// Baseline commit hash when the run started
    pub baseline_commit: Option<String>,
}

After each agent completes, we captures knowledge from two sources:

  1. Git diffs: What files were created, modified, or deleted
  2. Agent output: Semantic information the LLM provides about decisions and patterns

The agent is prompted to output structured context blocks:

source
<files-context>
src/auth.rs | JWT authentication module | [authenticate, verify_token]
src/main.rs | Application entry | [main]
</files-context>

<decisions>
Auth method | JWT | Stateless, scalable, well-supported
</decisions>

<patterns>
Use Result<T, AuthError> for auth operations
</patterns>

This gets parsed and merged into the knowledge base:

rust
pub struct StoryChanges {
    pub story_id: String,
    pub files_created: Vec<FileChange>,
    pub files_modified: Vec<FileChange>,
    pub files_deleted: Vec<PathBuf>,
    pub commit_hash: Option<String>,
}

pub struct FileChange {
    pub path: PathBuf,
    pub additions: u32,
    pub deletions: u32,
    pub purpose: Option<String>,
    pub key_symbols: Vec<String>,
}

When the next story starts, autom8 injects this knowledge into the prompt. The agent sees a table of files touched so far, architectural decisions that were made, and patterns it should follow. This context injection is automatic, stories don't need to explicitly reference each other. We have git based non-llm fallbacks for when agents misbehave and don't output file context/decision. Note context injection going from one agent to the next is a sub millisecond op: there's no llm invocation for this.

Between stories, the user might make unrelated changes—updating dependencies, editing configs, etc. autom8 tracks which files it touched via the baseline_commit and filters subsequent diffs to only include "our" changes:

rust
pub fn filter_our_changes(&self, all_changes: &[DiffEntry]) -> Vec<DiffEntry> {
    let our_files = self.our_files();

    all_changes
        .iter()
        .filter(|entry| {
            // Include if it's a new file (we created it)
            if entry.status == DiffStatus::Added {
                return true;
            }
            // Include if we've touched it before
            our_files.contains(&entry.path)
        })
        .cloned()
        .collect()
}

Each story in a run has access to accumulated context from all previous stories, without the noise of unrelated changes. Later agents build on earlier work rather than rediscovering it.


Type-Safe Error Handling

Orchestration code being inherently complex (there are dozens of failure modes: file not found, invalid JSON, git errors, Claude crashes, timeouts, network issues)...

We use exhaustive enums for all error types, with compiler enforced all variant handling:

rust
use thiserror::Error;

#[derive(Error, Debug)]
pub enum Autom8Error {
    #[error("Spec file not found: {0}")]
    SpecNotFound(PathBuf),

    #[error("Invalid spec format: {0}")]
    InvalidSpec(String),

    #[error("No incomplete stories found in spec")]
    NoIncompleteStories,

    #[error("Claude process failed: {0}")]
    ClaudeError(String),

    #[error("Claude process timed out after {0} seconds")]
    ClaudeTimeout(u64),

    #[error("State file error: {0}")]
    StateError(String),

    #[error("No active run to resume")]
    NoActiveRun,

    #[error("Run already in progress: {0}")]
    RunInProgress(String),

    #[error("IO error: {0}")]
    Io(#[from] std::io::Error),

    #[error("JSON error: {0}")]
    Json(#[from] serde_json::Error),

    #[error("Review failed after 3 iterations")]
    MaxReviewIterationsReached,
    // ... more variants
}
        

Each variant has a human-readable message, and the #[from] attribute enables automatic conversion from underlying error types.

Outcome enums follow the same pattern. Rather than returning booleans or magic strings, operations return typed results:

rust
#[derive(Debug, Clone, PartialEq)]
pub enum ReviewResult {
    Pass,
    IssuesFound,
    Error(ClaudeErrorInfo),
}

#[derive(Debug, Clone, PartialEq)]
pub enum CorrectorResult {
    Complete,
    Error(ClaudeErrorInfo),
}

When handling review results, the match is exhaustive:

rust
match review_result {
    ReviewResult::Pass => {
        // All good, proceed to commit
        state.transition_to(MachineState::Committing);
    }
    ReviewResult::IssuesFound => {
        // Need correction
        if state.review_iteration < 3 {
            state.transition_to(MachineState::Correcting);
        } else {
            return Err(Autom8Error::MaxReviewIterationsReached);
        }
    }
    ReviewResult::Error(info) => {
        // Claude failed, decide what to do
        log::error!("Review failed: {}", info.message);
        state.transition_to(MachineState::Failed);
    }
}

The benefit of this approach compounds over time: adding a new error type or state variant causes compile errors everywhere that needs updating.


Self-Correcting Review Loops

We use a simple bounded review loop. After all stories are marked complete, a reviewer agent checks the work against the spec. If issues are found, a corrector agent attempts fixes. This cycles up to n times before giving up gracefully.

The flow looks like this:

source
PickingStory (all complete)
        │
        ▼
    Reviewing ◀───────────┐
        │                 │
   ┌────┴────┐            │
   │         │            │
Pass    IssuesFound       │
   │         │            │
   ▼         ▼            │
Committing  Correcting ───┘
                (if iteration < n)

The reviewer agent gets the full spec context and writes issues to a file if any are found:

rust
pub fn run_reviewer<F>(
    spec: &Spec,
    iteration: u32,
    max_iterations: u32,
    mut on_output: F,
) -> Result<ReviewResult>
where
    F: FnMut(&str),
{
    let prompt = build_reviewer_prompt(spec, iteration, max_iterations);
    // ... run Claude ...

    // Check if review file exists and has content
    let review_path = Path::new(REVIEW);
    if review_path.exists() {
        match std::fs::read_to_string(review_path) {
            Ok(content) if !content.trim().is_empty() => Ok(ReviewResult::IssuesFound),
            Ok(_) => Ok(ReviewResult::Pass),
            Err(e) => Ok(ReviewResult::Error(/* ... */)),
        }
    } else {
        Ok(ReviewResult::Pass)
    }
}

If issues are found, the corrector agent reads that file and attempts fixes.

rust
// In the main loop
state.review_iteration += 1;

match run_reviewer(&spec, state.review_iteration, 3, on_output)? {
    ReviewResult::Pass => {
        state.transition_to(MachineState::Committing);
    }
    ReviewResult::IssuesFound => {
        if state.review_iteration >= 3 {
            return Err(Autom8Error::MaxReviewIterationsReached);
        }
        state.transition_to(MachineState::Correcting);
    }
    ReviewResult::Error(e) => { /* handle */ }
}

When the limit is hit, autom8 leaves the review file in place. The human can review the remaining issues, make manual fixes, and re-run if needed.


Git Worktrees: Parallel Execution

When worktrees are enabled (which they are by default), autom8 creates an isolated worktree for each session. The session ID is deterministic is just a SHA-256 hash of the repo path and branch name, truncated to 8 hex chars. Same inputs always produce the same session, so resuming a run finds the right worktree automatically.

rust
pub struct WorktreeInfo {
    pub path: PathBuf,
    pub branch: String,
    pub session_id: String,
}

pub enum WorktreeResult {
    Created(WorktreeInfo),
    Existing(WorktreeInfo),
    Conflict { branch: String, existing_path: PathBuf },
}

The session ID generation is straightforward:

rust
fn generate_session_id(repo_path: &Path, branch: &str) -> String {
    let mut hasher = Sha256::new();
    hasher.update(repo_path.to_string_lossy().as_bytes());
    hasher.update(b":");
    hasher.update(branch.as_bytes());
    let hash = hasher.finalize();
    hex::encode(&hash[..4]) // first 8 hex chars
}

Before creating a worktree, autom8 checks for branch conflicts. Git doesn't allow two worktrees to have the same branch checked out. If another session is already using the branch, we return a Conflict instead of silently breaking:

rust
// In ensure_worktree()
for wt in existing_worktrees {
    if wt.branch == branch && wt.path != expected_path {
        return Ok(WorktreeResult::Conflict {
            branch: branch.to_string(),
            existing_path: wt.path,
        });
    }
}

The worktree path pattern is configurable via worktree_path_pattern in the config. The default is {repo}-wt-{branch}, placing worktrees as siblings of the main repo directory. After a run completes, worktrees can be automatically cleaned up if worktree_cleanup is enabled.


Follow-Up: The Improve Command

What happens when you come back to a feature branch a day later and need refinements, bug fixes, or extensions?

autom8 improve is designed for this. It gathers context from up to three layers and spawns an interactive Claude session with full awareness of what was previously implemented. You describe what you want changed, and the session starts with all the relevant history already loaded. This saves you time instead of either re-running autom8 or running a claude session to make changes to what autom8 did.

The context model has three layers: git context (always present), the original spec (if it still exists), and accumulated project knowledge (if a previous run captured it). Each layer is optional except git, which is derived directly from the branch:

rust
pub struct FollowUpContext {
    pub git: GitContext,
    pub spec: Option<Spec>,
    pub knowledge: Option<ProjectKnowledge>,
}

pub struct GitContext {
    pub branch: String,
    pub commits: Vec<CommitInfo>,
    pub diff_summary: String,
    pub files_changed: Vec<PathBuf>,
}

GitContext is reconstructed from the branch itself: the commits since divergence from main, a diff summary, and the list of files touched. This is always available regardless of whether the original run used autom8 or not — any feature branch has enough git history to build useful context.

The richness of the session depends on how many layers are present. autom8 uses a simple 1-3 scale:

rust
impl FollowUpContext {
    /// 1 = git only, 2 = git + one of spec/knowledge, 3 = all three
    pub fn richness_level(&self) -> u8 {
        1 + self.spec.is_some() as u8 + self.knowledge.is_some() as u8
    }
}

At level 1 (git only), Claude sees the branch diff and commit messages. At level 2, it also gets either the spec with acceptance criteria or the knowledge base with architectural decisions. At level 3, it has the full picture: what was planned, what was decided during implementation, and exactly what changed. The richer the context, the less you need to re-explain when asking for changes.


UI/UX Design

autom8 has three display modes: a streaming CLI for running tasks, an interactive TUI for monitoring, and a native desktop GUI for richer workflows. All prioritize simplicity and consistency over showing everything.

CLI Output: Filtering the Noise

Claude produces lots of output: tool calls, file reads, thinking blocks, status messages... We filter this to only output what happens, and we always display the same nb of chars at the same pos for consistency.

rust
fn should_display(event: &StreamEvent) -> bool {
    match event {
        StreamEvent::Text(text) => !text.trim().is_empty(),
        StreamEvent::ToolUse { name, .. } => {
            // Show edits and writes, skip reads and searches
            matches!(name.as_str(), "Edit" | "Write" | "Bash")
        }
        StreamEvent::Thinking(_) => false,
        _ => false,
    }
}

We get a clean stream that shows: state transitions, which story is being worked on, edits being made, and commands being run. You can follow progress without scrolling through hundreds of lines of file contents and tool metadata.

Full Claude output is still available in log files if you need to debug.

Phase banners mark transitions between stages:

source
━━━━━ RUNNING CLAUDE ━━━━━
Working on: US-001 Add login endpoint

━━━━━ REVIEWING ━━━━━
Iteration 1/3

━━━━━ COMMITTING ━━━━━
feat(auth): implement login endpoint

These banners adapt to terminal width and provide consistent visual anchors as the run progresses.

Autom8 CLI output showing filtered, clean progress display

CLI output: filtered to show only relevant state transitions and actions

Monitor TUI: Real-Time Dashboard

The monitor TUI uses ratatui for an interactive dashboard. It polls state files from disk and renders a live view of all running projects.

The main view is a 2x2 quadrant grid showing active runs:

Navigation is arrows or vim-style: hjkl to move between quadrants, n/p to paginate when there are more than four runs, and Tab to switch views. Three views are available: active runs, project list, and run history.

Autom8 monitor TUI showing real-time dashboard with quadrant grid

Monitor TUI: real-time dashboard with state and progress at a glance

Desktop GUI

The CLI and TUI are great for headless monitoring and quick checks, but some workflows benefit from a richer interface. autom8 ships a native desktop GUI built with eframe/egui that provides visual config editing, spec creation through a chat interface, and detailed session views that would be impractical in a terminal.

The GUI uses a dynamic tab system. Each session, config editor, or spec chat opens in its own tab, so you can switch between monitoring a running session and editing the config for the next one without losing context. Session tabs show live session cards with state, progress, current story, and iteration count.

Output display uses a source selection model. For a running session, you see live streamed output. For a completed iteration, you can select any previous iteration's output to review what happened. When nothing is actively running, the card shows the final status and summary. This three-tier approach (live output, iteration output, status) keeps the display relevant regardless of session state.

The config editor renders the TOML config as a form with toggles and text fields, validates changes before writing, and supports both global and project-level configs. Spec creation works through a chat panel where you describe what you want to build, and autom8 generates the spec JSON, the same flow as the CLI but with a visual conversation history.

Desktop GUI: live session monitoring with output stream and story status


Config

Config lives at ~/.config/autom8/config.toml, project specific config: ~/.config/autom8/"project_xyz"/config.toml and is automatically created when running autom8 for the first time or in another directory where we want to do work. Project specific configs take precedence over global config.

The config struct captures what features are enabled:

rust
pub struct Config {
    pub review: bool,                // Run reviewer/corrector loop?
    pub commit: bool,                // Create git commits?
    pub pull_request: bool,          // Create GitHub PR?
    pub pull_request_draft: bool,    // Create PR as draft?
    pub worktree: bool,              // Use git worktrees for isolation?
    pub worktree_path_pattern: String, // e.g. "{repo}-wt-{branch}"
    pub worktree_cleanup: bool,      // Remove worktree after run?
}

The new fields cover worktree isolation (worktree, worktree_path_pattern, worktree_cleanup) and draft PRs (pull_request_draft). Config is validated at load time — enabling pull_request without commit is an error, since there's nothing to push.

Config gets snapshotted at run start and stored in the state. This ensures resumed runs use the same settings they started with, even if the config file changes:

rust
pub struct RunState {
    // ...
    /// Configuration snapshot taken at run start
    #[serde(default)]
    pub config: Option<Config>,
    // ...
}

/// Get the effective config for this run
pub fn effective_config(&self) -> Config {
    self.config.clone().unwrap_or_default()
}

Conclusion

Our goal with autom8 is to have a simple, fast, somewhat sophisticated and robust tool that facilitates using Claude for complex tasks by doing sane context window optimization, knowledge sharing between agents, features baked in (commits, pr's, reviews, worktrees) without user hassle and without noise.

It's published on crates.io and works well for complex/long tasks where a single claude instance's context window would grow quite large.

github.com/louisboilard/autom8 | crates.io

Uploaded 31-01-2026 | Updated 07-02-2026