Agent Operations: Timeouts, Retries, and Stalemate Detection
Reference guide for Symphony's agent timeout, retry, and stalemate detection behavior.
Timeout Configuration
Symphony uses two timeout mechanisms that work together as belt-and-suspenders:
turn_timeout_ms (Primary)
Per-agent hard deadline timer. Implemented as a setTimeout inside the agent process promise. Fires at the configured time and kills the agent process tree.
- Default: 1,200,000 ms (20 min)
- Config path: agent.turn_timeout_ms in symphony.config.json
- Per-project override: agent.turn_timeout_ms in project settings
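The primary timer can be pictured roughly as follows. This is a minimal sketch, not Symphony's actual implementation: `AgentRun`, `Scheduler`, `armTurnTimeout`, and `killProcessTree` are hypothetical names introduced for illustration, and the scheduler is injectable only so the behavior can be exercised without waiting 20 minutes.

```typescript
// Sketch only: AgentRun, Scheduler, and armTurnTimeout are hypothetical names,
// not Symphony's actual internals.
type AgentRun = { id: string; status: "running" | "completed" | "timed_out" };

// A scheduler takes a callback and a delay, and returns a cancel function.
type Scheduler = (fn: () => void, ms: number) => () => void;

// Default scheduler backed by the real setTimeout/clearTimeout.
const realScheduler: Scheduler = (fn, ms) => {
  const handle = setTimeout(fn, ms);
  return () => clearTimeout(handle);
};

// Arms the per-agent deadline. Call the returned function when the agent
// exits on its own so the timer does not fire afterwards.
function armTurnTimeout(
  run: AgentRun,
  turnTimeoutMs: number,
  killProcessTree: () => void,
  schedule: Scheduler = realScheduler,
): () => void {
  return schedule(() => {
    if (run.status !== "running") return; // agent already finished
    killProcessTree(); // assumed to send SIGTERM to the whole process tree
    run.status = "timed_out";
  }, turnTimeoutMs);
}
```

In this sketch the caller keeps the returned cancel function and invokes it as soon as the agent promise settles, so a finished agent is never killed late.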
stall_timeout_ms (Backup)
Orchestrator-level wall-clock check, polled every poll_interval_ms (default 5s) in the reconcile loop. Catches zombie or hung processes where the primary timer fails to fire (e.g., blocked subprocess spawns, Node.js event loop stalls).
- Default: 1,200,000 ms (20 min)
- Config path: stall_timeout_ms in symphony.config.json
- Recommendation: Set slightly higher than turn_timeout_ms to avoid false positives
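One way to read the backup check is as a pure wall-clock comparison made on each reconcile pass. The sketch below assumes that elapsed time is measured from dispatch and that the comparison is inclusive; `RunningAgent`, `findStalledAgents`, and `reconcileOnce` are hypothetical names, not Symphony APIs.

```typescript
// Sketch only: names and the ">=" comparison are assumptions.
type RunningAgent = { issueId: string; startedAtMs: number };

// Returns the agents whose wall-clock runtime has reached stallTimeoutMs.
function findStalledAgents(
  running: RunningAgent[],
  nowMs: number,
  stallTimeoutMs: number,
): RunningAgent[] {
  return running.filter((agent) => nowMs - agent.startedAtMs >= stallTimeoutMs);
}

// One pass of a hypothetical reconcile loop, intended to run every
// poll_interval_ms. Kills each stalled agent and returns their issue IDs.
function reconcileOnce(
  running: RunningAgent[],
  nowMs: number,
  stallTimeoutMs: number,
  killProcessTree: (issueId: string) => void,
): string[] {
  const stalled = findStalledAgents(running, nowMs, stallTimeoutMs);
  stalled.forEach((agent) => killProcessTree(agent.issueId));
  return stalled.map((agent) => agent.issueId);
}
```

Because the check depends only on timestamps, it still works when the agent's own timer never fires.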
How They Interact
Whichever mechanism fires first, the outcome is the same: SIGTERM is sent to the agent process tree and the agent run record is marked timed_out.
Retry Configuration
Exponential Backoff
When an agent fails or exits without producing work, Symphony schedules a retry with exponential backoff:
delay = min(10,000 ms × 2^(attempt - 1), max_retry_backoff_ms)

| Attempt | Delay |
|---|---|
| 1 | 10s |
| 2 | 20s |
| 3 | 40s |
| 4 | 80s |
| 5 | 160s |
| 6+ | 300s cap |
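The formula and table above can be reproduced with a few lines of code. This is an illustrative sketch; `retryDelayMs` and `BASE_DELAY_MS` are hypothetical names, and the default cap mirrors the documented max_retry_backoff_ms default.

```typescript
// Sketch only: retryDelayMs and BASE_DELAY_MS are hypothetical names.
const BASE_DELAY_MS = 10000;

// delay = min(10,000 ms * 2^(attempt - 1), max_retry_backoff_ms)
// `attempt` is 1-based, matching the table.
function retryDelayMs(attempt: number, maxRetryBackoffMs: number = 300000): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt - 1), maxRetryBackoffMs);
}
```

With the default cap, attempt 6 would compute 320,000 ms uncapped, which is why the table shows the 300s cap from that point on.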
Config Keys
| Key | Default | Description |
|---|---|---|
| agent.max_retries | 15 | Total runs per issue before circuit break |
| agent.max_retry_backoff_ms | 300,000 | Cap on backoff delay (5 min) |
Circuit Breaker
When totalRuns >= max_retries OR consecutiveFailures >= max_retries, the issue is moved to todo with a system comment for human triage. The stalemate counter and failure counter are cleared.
All failed or stale agents go to todo (never blocked). The dispatch cycle re-dispatches the correct agent type based on the issue's current phase label.
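The circuit-breaker rule can be sketched as a single check applied after each run. The `TrackedIssue` shape, the `circuitBreakIfNeeded` name, and the comment wording are assumptions made for illustration. The sketch leaves totalRuns untouched, since the text only says the stalemate and failure counters are cleared.

```typescript
// Sketch only: TrackedIssue and circuitBreakIfNeeded are hypothetical names.
type TrackedIssue = {
  status: string;
  totalRuns: number;
  consecutiveFailures: number;
  staleRuns: number;
  comments: string[];
};

// Trips when totalRuns >= max_retries OR consecutiveFailures >= max_retries.
// Returns true if the issue was moved to todo.
function circuitBreakIfNeeded(issue: TrackedIssue, maxRetries: number): boolean {
  const tripped =
    issue.totalRuns >= maxRetries || issue.consecutiveFailures >= maxRetries;
  if (!tripped) return false;

  issue.status = "todo"; // never "blocked"
  issue.comments.push(
    `Circuit breaker tripped after ${issue.totalRuns} runs; needs human triage.`,
  );
  issue.consecutiveFailures = 0; // failure counter cleared
  issue.staleRuns = 0; // stalemate counter cleared
  return true;
}
```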
Stalemate Detection
Prevents wasted compute from agents that repeatedly exit successfully but produce no work product.
Worker Stalemate Detection
- At dispatch time, the orchestrator records the HEAD commit hash in the worktree
- When a worker agent exits successfully:
  - Auto-commit captures any uncommitted changes
  - If no PR-worthy diff is detected, a stalemate check runs
  - If git diff <headCommitAtDispatch>..HEAD is empty, the stalemate counter increments
- When the stalemate count reaches max_stale_runs, the issue moves to todo with a comment
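The worker steps above can be sketched as a pure function over the exit data. The names `WorkerExit`, `WorkerStalemateResult`, and `evaluateWorkerExit` are hypothetical, and the reset to zero on any progress is an assumption inferred from the counter being described as counting consecutive no-progress exits.

```typescript
// Sketch only: all names here are hypothetical, not Symphony APIs.
type WorkerExit = {
  hasPrWorthyDiff: boolean; // result of the PR-worthiness check after auto-commit
  diffSinceDispatch: string; // output of `git diff <headCommitAtDispatch>..HEAD`
};

type WorkerStalemateResult = { staleRuns: number; moveToTodo: boolean };

function evaluateWorkerExit(
  staleRuns: number,
  exit: WorkerExit,
  maxStaleRuns: number,
): WorkerStalemateResult {
  const diffIsEmpty = exit.diffSinceDispatch.trim().length === 0;
  const madeProgress = exit.hasPrWorthyDiff || !diffIsEmpty;
  // Assumption: any progress resets the consecutive counter to zero.
  const next = madeProgress ? 0 : staleRuns + 1;
  return { staleRuns: next, moveToTodo: next >= maxStaleRuns };
}
```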
Phase Agent Stalemate Detection
Pre-ready phase agents (research, architecture, grooming) track artifact changes instead of git diffs:
- At dispatch time, the orchestrator computes a SHA-256 hash of the phase artifact file (e.g., docs/tickets/SYM-001/research.md)
- When a phase agent exits successfully without calling complete_phase:
  - The orchestrator compares the artifact hash before vs. after
  - If unchanged (or file still doesn't exist), the stalemate counter increments
- When the stalemate count reaches max_stale_runs, the issue moves to todo with a comment
Scope
- Workers and pre-ready phase agents: judges, planners, and background researchers are excluded from stalemate tracking
- In-memory tracking: the counter resets on orchestrator restart
- Absolute cap: totalRuns (persisted in the database) still enforces the overall limit after restart
- Phase agent grace period: first run with no artifact gets a pass (creation is expected to happen on first attempt)
- Counter resets: the phase stalemate counter resets when the agent modifies the artifact, calls complete_phase, or the issue reaches a terminal status (done/cancelled)
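The phase-agent rules, including the grace period and the reset conditions, can be combined into one small decision function. This sketch assumes the SHA-256 hex digests are computed elsewhere and passed in, with null standing for a missing artifact file; `ArtifactHash`, `PhaseExit`, and `nextPhaseStaleCount` are hypothetical names.

```typescript
// Sketch only: all names here are hypothetical, not Symphony APIs.
type ArtifactHash = string | null; // null means the artifact file does not exist

type PhaseExit = {
  hashAtDispatch: ArtifactHash; // SHA-256 hex digest recorded at dispatch
  hashAtExit: ArtifactHash; // SHA-256 hex digest after the agent exits
  calledCompletePhase: boolean;
  isFirstRun: boolean;
};

// Returns the new stalemate count for a phase agent that exited successfully.
function nextPhaseStaleCount(staleRuns: number, exit: PhaseExit): number {
  if (exit.calledCompletePhase) return 0; // explicit completion resets
  if (exit.hashAtExit !== exit.hashAtDispatch) return 0; // artifact modified
  if (exit.isFirstRun && exit.hashAtExit === null) return staleRuns; // grace period
  return staleRuns + 1; // unchanged or still missing
}
```

The caller would compare the returned count against max_stale_runs and also clear it when the issue reaches done or cancelled.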
Config
| Key | Default | Description |
|---|---|---|
| agent.max_stale_runs | 3 | Consecutive no-progress exits (empty git diff for workers, unchanged artifact hash for phase agents) before circuit break |
Tuning Guide
Small/Fast Projects
{
  "agent": {
    "turn_timeout_ms": 600000,
    "max_retries": 5,
    "max_stale_runs": 2
  },
  "stall_timeout_ms": 630000
}

Lower timeouts and retries for projects where agents should complete quickly.
Large/Complex Projects
{
  "agent": {
    "turn_timeout_ms": 1800000,
    "max_retries": 15,
    "max_stale_runs": 4
  },
  "stall_timeout_ms": 1890000
}

Higher limits for monorepos or projects requiring extensive exploration.
Cost-Sensitive Environments
{
  "agent": {
    "turn_timeout_ms": 900000,
    "max_retries": 5,
    "max_stale_runs": 2,
    "max_retry_backoff_ms": 600000
  }
}

Aggressive limits to minimize wasted compute. Combine with budgetAlertThresholdUsd in project config for spend monitoring.
Per-Project Overrides
Individual projects can override agent.turn_timeout_ms via the Symphony UI settings page. All other agent config keys use the global symphony.config.json values.
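A rough sketch of how an override might be resolved is shown below. The `ProjectSettings` shape is an assumption (the source does not describe how project settings are stored), and `effectiveTurnTimeoutMs` and `backupMayFireFirst` are hypothetical helpers. The second helper flags the case where a raised per-project turn timeout reaches the global stall_timeout_ms, which would let the backup check fire before the primary timer.

```typescript
// Sketch only: the settings shapes and helper names are assumptions.
type GlobalConfig = {
  agent: { turn_timeout_ms: number };
  stall_timeout_ms: number;
};

type ProjectSettings = { agent?: { turn_timeout_ms?: number } };

// Per-project value wins when present; otherwise fall back to the global value.
function effectiveTurnTimeoutMs(globalCfg: GlobalConfig, project: ProjectSettings): number {
  return project.agent?.turn_timeout_ms ?? globalCfg.agent.turn_timeout_ms;
}

// True when the backup check would trigger at or before the primary timer.
function backupMayFireFirst(globalCfg: GlobalConfig, project: ProjectSettings): boolean {
  return globalCfg.stall_timeout_ms <= effectiveTurnTimeoutMs(globalCfg, project);
}
```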