
Agent Operations: Timeouts, Retries, and Stalemate Detection

Reference guide for Symphony's agent timeout, retry, and stalemate detection behavior.

Timeout Configuration

Symphony uses two timeout mechanisms that work together as belt-and-suspenders:

turn_timeout_ms (Primary)

A per-agent hard deadline, implemented as a setTimeout inside the agent process promise. When the configured time elapses, the timer fires and sends SIGTERM to the agent process tree.

  • Default: 1,200,000 ms (20 min)
  • Config path: agent.turn_timeout_ms in symphony.config.json
  • Per-project override: agent.turn_timeout_ms in project settings
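A minimal sketch of the primary timer, assuming the agent runs as a child process started with Node's `child_process.spawn` on a POSIX host. The function name `runAgentTurn` and the `completed`/`failed` statuses are hypothetical; only `timed_out` comes from this page. Spawning with `detached: true` makes the child the leader of a new process group, so a negative PID signals the whole tree.

```typescript
import { spawn } from "node:child_process";

// Hypothetical status values; only "timed_out" is documented by Symphony.
type RunStatus = "completed" | "failed" | "timed_out";

// Runs one agent turn and enforces turn_timeout_ms with a setTimeout inside
// the promise. Assumes POSIX: `detached: true` puts the child in its own
// process group, so process.kill(-pid) reaches every descendant.
function runAgentTurn(command: string, args: string[], turnTimeoutMs: number): Promise<RunStatus> {
  return new Promise<RunStatus>((resolve) => {
    const child = spawn(command, args, { detached: true, stdio: "ignore" });
    let timedOut = false;

    const timer = setTimeout(() => {
      timedOut = true;
      if (child.pid !== undefined) {
        try {
          process.kill(-child.pid, "SIGTERM"); // negative PID = process group
        } catch {
          // The group exited between the timer firing and the kill.
        }
      }
    }, turnTimeoutMs);

    child.on("exit", (code) => {
      clearTimeout(timer); // a normal exit disarms the deadline
      resolve(timedOut ? "timed_out" : code === 0 ? "completed" : "failed");
    });

    child.on("error", () => {
      clearTimeout(timer); // e.g. the executable could not be spawned
      resolve("failed");
    });
  });
}
```

Clearing the timer on exit matters: a pending setTimeout keeps the Node.js event loop alive, so without it a finished turn would hold the process open until the deadline.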

stall_timeout_ms (Backup)

Orchestrator-level wall-clock check, polled every poll_interval_ms (default 5s) in the reconcile loop. Catches zombie or hung processes where the primary timer fails to fire (e.g., blocked subprocess spawns, Node.js event loop stalls).

  • Default: 1,200,000 ms (20 min)
  • Config path: stall_timeout_ms in symphony.config.json
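  • Recommendation: Set slightly higher than turn_timeout_ms so the backup check does not fire before the primary timer (a false positive). Both defaults are 1,200,000 ms, so following this recommendation means setting stall_timeout_ms explicitly.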
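The backup check can be sketched as a pure filter over in-flight runs plus a setInterval loop. This assumes each run records a wall-clock start time at dispatch; the `RunningAgent` shape and the `listRunning`/`killRun` hooks are hypothetical stand-ins for orchestrator internals.

```typescript
// Hypothetical shape of an in-flight run as seen by the orchestrator.
interface RunningAgent {
  runId: string;
  startedAtMs: number; // wall-clock start time (Date.now()) recorded at dispatch
}

// Returns the IDs of runs whose wall-clock age has reached stall_timeout_ms.
function findStalledRuns(running: RunningAgent[], nowMs: number, stallTimeoutMs: number): string[] {
  return running
    .filter((run) => nowMs - run.startedAtMs >= stallTimeoutMs)
    .map((run) => run.runId);
}

// Polls every pollIntervalMs (default 5 s, matching poll_interval_ms) and
// kills stalled runs. `killRun` is assumed to remove the run from the list.
function startStallCheck(
  listRunning: () => RunningAgent[],
  killRun: (runId: string) => void,
  stallTimeoutMs: number,
  pollIntervalMs = 5000,
): ReturnType<typeof setInterval> {
  return setInterval(() => {
    for (const runId of findStalledRuns(listRunning(), Date.now(), stallTimeoutMs)) {
      killRun(runId);
    }
  }, pollIntervalMs);
}
```

Because the check only runs on each poll, a stalled run is caught up to one poll interval after stall_timeout_ms elapses.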

How They Interact

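Whichever mechanism fires, the result is the same: SIGTERM to the agent process tree and a timed_out status on the agent run record. When stall_timeout_ms is above turn_timeout_ms, the primary timer normally fires first, and the stall check acts only when that timer fails to fire.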
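The Tuning Guide examples on this page set stall_timeout_ms to turn_timeout_ms plus 5% (600,000 to 630,000 and 1,800,000 to 1,890,000). A hypothetical config check built on that observation might look like this; the 5% margin is inferred from those examples and is not a documented rule, and the function names are illustrative.

```typescript
// Hypothetical validator for the two timeout settings (values in ms).
interface TimeoutSettings {
  turnTimeoutMs: number;
  stallTimeoutMs: number;
}

// Margin used by the Tuning Guide examples: stall = turn + 5% (turn / 20).
function suggestedStallTimeoutMs(turnTimeoutMs: number): number {
  return turnTimeoutMs + Math.ceil(turnTimeoutMs / 20);
}

// Returns warnings; an empty array means the backup check is configured to
// fire only after the primary timer has had its chance.
function checkTimeoutSettings(s: TimeoutSettings): string[] {
  const warnings: string[] = [];
  if (s.stallTimeoutMs <= s.turnTimeoutMs) {
    warnings.push(
      `stall_timeout_ms (${s.stallTimeoutMs}) should exceed turn_timeout_ms (${s.turnTimeoutMs}); ` +
        `consider ${suggestedStallTimeoutMs(s.turnTimeoutMs)}`,
    );
  }
  return warnings;
}
```

With both defaults at 1,200,000 ms, `checkTimeoutSettings` returns one warning suggesting 1,260,000 ms.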

Retry Configuration

Exponential Backoff

When an agent fails or exits without producing work, Symphony schedules a retry with exponential backoff:

delay = min(10,000 ms × 2^(attempt - 1), max_retry_backoff_ms)
| Attempt | Delay |
|---------|-------|
| 1 | 10s |
| 2 | 20s |
| 3 | 40s |
| 4 | 80s |
| 5 | 160s |
| 6+ | 300s (cap) |
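The formula translates directly into a small function. This is a sketch: the name `retryDelayMs` is hypothetical, and the 300,000 ms default mirrors agent.max_retry_backoff_ms.

```typescript
// Base delay from the formula: 10,000 ms for the first retry.
const BASE_RETRY_DELAY_MS = 10000;

// delay = min(10,000 ms × 2^(attempt - 1), max_retry_backoff_ms)
function retryDelayMs(attempt: number, maxRetryBackoffMs = 300000): number {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError(`attempt must be a positive integer, got ${attempt}`);
  }
  return Math.min(BASE_RETRY_DELAY_MS * 2 ** (attempt - 1), maxRetryBackoffMs);
}
```

Attempt 6 computes 320,000 ms, which the default cap clamps to 300,000 ms. With the Cost-Sensitive Environments profile in the Tuning Guide (max_retry_backoff_ms: 600000), attempt 6 waits the full 320 s and attempts 7 and later wait 600 s.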

Config Keys

| Key | Default | Description |
|-----|---------|-------------|
| agent.max_retries | 15 | Total runs per issue before circuit break |
| agent.max_retry_backoff_ms | 300,000 | Cap on backoff delay in ms (5 min) |

Circuit Breaker

When totalRuns >= max_retries OR consecutiveFailures >= max_retries, the issue is moved to todo with a system comment for human triage. The stalemate counter and failure counter are cleared.

Issues whose agents fail or stalemate always go to todo, never blocked. The dispatch cycle then re-dispatches the correct agent type based on the issue's current phase label.
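A minimal sketch of the circuit-breaker check, assuming a hypothetical `IssueState` record whose fields mirror the counters named above. The comment text and the choice to leave totalRuns untouched are assumptions; the text only says the stalemate and failure counters are cleared.

```typescript
// Hypothetical issue record; field names mirror the counters described above.
interface IssueState {
  status: string;
  totalRuns: number;           // persisted in the database
  consecutiveFailures: number; // the failure counter
  staleCount: number;          // the stalemate counter
  comments: string[];
}

// Trips the breaker when either counter reaches agent.max_retries.
// Returns true if the issue was moved to todo for human triage.
function checkCircuitBreaker(issue: IssueState, maxRetries: number): boolean {
  if (issue.totalRuns < maxRetries && issue.consecutiveFailures < maxRetries) {
    return false;
  }
  // Always todo, never blocked: dispatch picks the agent type from the phase label.
  issue.status = "todo";
  issue.comments.push(
    `[system] Circuit breaker: ${issue.totalRuns} runs, ` +
      `${issue.consecutiveFailures} consecutive failures. Needs human triage.`,
  );
  issue.staleCount = 0;
  issue.consecutiveFailures = 0;
  // totalRuns is left as-is (assumption): it is the persisted absolute cap.
  return true;
}
```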

Stalemate Detection

Prevents wasted compute from agents that repeatedly exit successfully but produce no work product.

Worker Stalemate Detection

  1. At dispatch time, the orchestrator records the HEAD commit hash in the worktree
  2. When a worker agent exits successfully:
    • Auto-commit captures any uncommitted changes
    • If no PR-worthy diff is detected, a stalemate check runs
    • If git diff <headCommitAtDispatch>..HEAD is empty, the stalemate counter increments
  3. When the stalemate count reaches max_stale_runs, the issue moves to todo with a comment
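Steps 1 and 2 can be sketched with two git commands: `git rev-parse HEAD` at dispatch and `git diff --name-only <sha>..HEAD` after the auto-commit. The helper names and the injectable `GitRunner` are hypothetical; the injection exists only so the logic can be exercised without a real repository.

```typescript
import { execFileSync } from "node:child_process";

// Runs git in `cwd` and returns trimmed stdout. Injectable for testing.
type GitRunner = (args: string[], cwd: string) => string;

const runGit: GitRunner = (args, cwd) =>
  execFileSync("git", args, { cwd, encoding: "utf8" }).trim();

// Step 1: record the worktree's HEAD commit when the worker is dispatched.
function recordHeadAtDispatch(worktree: string, git: GitRunner = runGit): string {
  return git(["rev-parse", "HEAD"], worktree);
}

// Step 2: after a successful exit (and the auto-commit), the run is stale when
// `git diff <headAtDispatch>..HEAD` lists no changed files.
function isWorkerStale(worktree: string, headAtDispatch: string, git: GitRunner = runGit): boolean {
  const changedFiles = git(["diff", "--name-only", `${headAtDispatch}..HEAD`], worktree);
  return changedFiles.length === 0;
}
```

A stale result would then increment the stalemate counter from step 3.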

Phase Agent Stalemate Detection

Pre-ready phase agents (research, architecture, grooming) track artifact changes instead of git diffs:

  1. At dispatch time, the orchestrator computes a SHA-256 hash of the phase artifact file (e.g., docs/tickets/SYM-001/research.md)
  2. When a phase agent exits successfully without calling complete_phase:
    • The orchestrator compares the artifact hash before vs. after
    • If unchanged (or file still doesn't exist), the stalemate counter increments
  3. When the stalemate count reaches max_stale_runs, the issue moves to todo with a comment
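Steps 1 and 2 for phase agents can be sketched with Node's built-in `crypto` and `fs` modules. The function names are hypothetical, and this sketch does not apply the first-run grace period described under Scope.

```typescript
import { createHash } from "node:crypto";
import { existsSync, readFileSync } from "node:fs";

// SHA-256 of the phase artifact, or null if the file does not exist yet.
function hashArtifact(path: string): string | null {
  if (!existsSync(path)) return null;
  return createHash("sha256").update(readFileSync(path)).digest("hex");
}

// A phase agent that exited successfully made no progress when it did not
// call complete_phase and the hash is unchanged. null === null covers the
// "file still doesn't exist" case.
function isPhaseAgentStale(hashBefore: string | null, hashAfter: string | null, calledCompletePhase: boolean): boolean {
  return !calledCompletePhase && hashBefore === hashAfter;
}
```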

Scope

  • Tracked agents: workers and pre-ready phase agents only. Judges, planners, and background researchers are excluded from stalemate tracking.
  • In-memory tracking: the stalemate counter resets on orchestrator restart.
  • Absolute cap: totalRuns (persisted in the database) still enforces the overall limit after a restart.
  • Phase agent grace period: the first run with no artifact gets a pass, because the artifact is expected to be created on the first attempt.
  • Counter resets: the phase stalemate counter resets when the agent modifies the artifact, calls complete_phase, or the issue reaches a terminal status (done or cancelled).
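The counting rules above can be combined into one in-memory tracker. This is a sketch under stated assumptions: the class and method names are hypothetical, the grace period is assumed to apply once per issue, and clearing the counter when it trips is an assumption the text does not spell out.

```typescript
// Only these kinds are tracked; judges, planners, and background
// researchers never reach this code.
type TrackedKind = "worker" | "phase";

// In-memory state, so everything here is lost on orchestrator restart.
class StalemateTracker {
  private readonly maxStaleRuns: number;
  private readonly counts = new Map<string, number>();
  private readonly graceUsed = new Set<string>();

  constructor(maxStaleRuns: number) {
    this.maxStaleRuns = maxStaleRuns;
  }

  // Records one successful exit. `madeProgress` means a non-empty git diff
  // (workers) or a changed artifact / complete_phase call (phase agents).
  // Returns true when the issue should move to todo.
  recordExit(issueId: string, kind: TrackedKind, madeProgress: boolean, artifactExists = true): boolean {
    if (madeProgress) {
      this.reset(issueId);
      return false;
    }
    // Grace period (assumed once per issue): a phase agent's first exit
    // with no artifact on disk is not counted.
    if (kind === "phase" && !artifactExists && !this.graceUsed.has(issueId)) {
      this.graceUsed.add(issueId);
      return false;
    }
    const count = (this.counts.get(issueId) ?? 0) + 1;
    if (count >= this.maxStaleRuns) {
      this.counts.delete(issueId); // assumption: cleared once the breaker trips
      return true;
    }
    this.counts.set(issueId, count);
    return false;
  }

  // Progress (modified artifact, complete_phase, non-empty diff) resets the counter.
  reset(issueId: string): void {
    this.counts.delete(issueId);
  }

  // Terminal status (done or cancelled): drop all state for the issue.
  forget(issueId: string): void {
    this.counts.delete(issueId);
    this.graceUsed.delete(issueId);
  }
}
```

With the default max_stale_runs of 3, a worker trips on its third consecutive empty diff, while a phase agent that never creates its artifact trips on its fourth exit because the first one is covered by the grace period.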

Config

| Key | Default | Description |
|-----|---------|-------------|
| agent.max_stale_runs | 3 | Consecutive no-progress exits (empty git diff for workers, unchanged artifact hash for phase agents) before circuit break |

Tuning Guide

Small/Fast Projects

```json
{
  "agent": {
    "turn_timeout_ms": 600000,
    "max_retries": 5,
    "max_stale_runs": 2
  },
  "stall_timeout_ms": 630000
}
```

Lower timeouts and retries for projects where agents should complete quickly.

Large/Complex Projects

```json
{
  "agent": {
    "turn_timeout_ms": 1800000,
    "max_retries": 15,
    "max_stale_runs": 4
  },
  "stall_timeout_ms": 1890000
}
```

Higher limits for monorepos or projects requiring extensive exploration.

Cost-Sensitive Environments

```json
{
  "agent": {
    "turn_timeout_ms": 900000,
    "max_retries": 5,
    "max_stale_runs": 2,
    "max_retry_backoff_ms": 600000
  }
}
```

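Aggressive limits to minimize wasted compute; the higher max_retry_backoff_ms (600,000 ms, 10 min) also spaces later retries further apart. stall_timeout_ms is omitted, so its 1,200,000 ms default applies, which is already above turn_timeout_ms. Combine with budgetAlertThresholdUsd in project config for spend monitoring.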

Per-Project Overrides

Individual projects can override agent.turn_timeout_ms via the Symphony UI settings page. All other agent config keys use the global symphony.config.json values.
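The resolution order can be sketched as follows, assuming a project setting takes precedence over the global file and that an absent key falls back to the defaults listed on this page. The interface and function names are hypothetical.

```typescript
// Hypothetical view of the agent section of symphony.config.json.
interface GlobalAgentConfig {
  turn_timeout_ms?: number;
  max_retries?: number;
  max_retry_backoff_ms?: number;
  max_stale_runs?: number;
}

// Hypothetical per-project settings: turn_timeout_ms is the only agent key
// a project may override.
interface ProjectSettings {
  turn_timeout_ms?: number;
}

// Defaults documented on this page.
const AGENT_DEFAULTS: Required<GlobalAgentConfig> = {
  turn_timeout_ms: 1200000,
  max_retries: 15,
  max_retry_backoff_ms: 300000,
  max_stale_runs: 3,
};

function resolveAgentConfig(globalCfg: GlobalAgentConfig, project: ProjectSettings): Required<GlobalAgentConfig> {
  return {
    // Project setting wins, then the global file, then the default.
    turn_timeout_ms: project.turn_timeout_ms ?? globalCfg.turn_timeout_ms ?? AGENT_DEFAULTS.turn_timeout_ms,
    // Every other key comes from the global file or the default.
    max_retries: globalCfg.max_retries ?? AGENT_DEFAULTS.max_retries,
    max_retry_backoff_ms: globalCfg.max_retry_backoff_ms ?? AGENT_DEFAULTS.max_retry_backoff_ms,
    max_stale_runs: globalCfg.max_stale_runs ?? AGENT_DEFAULTS.max_stale_runs,
  };
}
```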