Workflows & runs
The durable-execution model — immutable versions, run lifecycle, signals, and replay.
Workflows are versioned, immutable definitions
A Workflow is a graph of nodes — agents, activities, tools, LLM calls, control nodes — stored as a versioned document. Published versions are immutable: you never edit a live definition in place. Changing a workflow means publishing a new version and flipping the active pointer; rolling back is flipping it back. Every run records exactly which version it executed, so a run from last month replays against the definition it actually ran.
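The publish, flip, and roll-back cycle can be sketched as an append-only list of versions plus an active pointer. This is a minimal in-memory illustration, not the product's storage model: `WorkflowRegistry`, `publish`, `activate`, `start_run`, and the `workflow_version` field are hypothetical names.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass
class WorkflowRegistry:
    """Hypothetical registry: append-only versions plus an active pointer."""

    versions: list = field(default_factory=list)  # index i holds version i + 1
    active: int = 0  # 0 means nothing has been published yet

    def publish(self, definition: dict) -> int:
        # Store a read-only (shallow) view so a published version
        # cannot be reassigned in place.
        self.versions.append(MappingProxyType(dict(definition)))
        self.active = len(self.versions)
        return self.active

    def activate(self, version: int) -> None:
        # Rolling back is just pointing at an older version.
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"unknown version {version}")
        self.active = version

    def get(self, version: int):
        return self.versions[version - 1]

    def start_run(self) -> dict:
        # A run pins the version that was active when it started.
        if self.active == 0:
            raise RuntimeError("no published version")
        return {"workflow_version": self.active}


registry = WorkflowRegistry()
registry.publish({"nodes": ["fetch", "summarize"]})            # version 1
registry.publish({"nodes": ["fetch", "summarize", "review"]})  # version 2, now active
first_run = registry.start_run()
registry.activate(1)                                           # roll back
rollback_run = registry.start_run()
```

Because `first_run` stores its own version number, it can still be replayed against version 2 after the pointer has moved back to version 1.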
Graphs support:
- DAGs & trees — fan-out/fan-in, parallel branches with joins.
- Branching — router nodes choose a path from node output.
- Loops — cycles with a loop condition; each iteration is recorded and expands in the console (the dashed amber edge in the graph view).
- Subworkflows — child runs with their own graph, linked to the parent.
- Human input — a node can suspend the run durably until a signal resolves it.
Runs are durable executions
A Run is one execution of a workflow version. Execution is durable: state is persisted at every step, so worker crashes, deploys, and restarts don't lose progress — the run resumes exactly where it left off, and completed steps are never re-executed.
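One way to picture "state is persisted at every step" is a journal that is written after each step and consulted on restart. The sketch below is an assumption-laden toy (a JSON file on local disk, a hypothetical `run_steps` helper), not how the orchestrator stores state.

```python
import json
import os
import tempfile


def run_steps(steps, journal_path):
    """Run (name, fn) steps in order, persisting each result before moving on."""
    done = {}
    if os.path.exists(journal_path):
        with open(journal_path) as f:
            done = json.load(f)
    for name, fn in steps:
        if name in done:
            continue  # completed before the crash: reuse the recorded result
        done[name] = fn()
        # Write to a temp file and rename, so a crash never leaves
        # a half-written journal behind.
        tmp = journal_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(done, f)
        os.replace(tmp, journal_path)
    return done


calls = []


def step_a():
    calls.append("a")
    return 1


def step_b():
    calls.append("b")
    return 2


def crash():
    raise RuntimeError("worker crashed")


path = os.path.join(tempfile.mkdtemp(), "run.json")
try:
    run_steps([("a", step_a), ("b", crash)], path)  # first worker dies in step b
except RuntimeError:
    pass
result = run_steps([("a", step_a), ("b", step_b)], path)  # a new worker resumes
```

On the second call, step `a` is skipped because its result is already in the journal; only step `b` executes.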
Run lifecycle
| Status | Meaning |
|---|---|
| Queued | Accepted; waiting for a worker slot. |
| Running | Executing. Nodes stream status/metrics live. |
| Retrying | A step failed transiently and is being retried under its retry policy. |
| Succeeded | Completed; the result is available. |
| Failed | Terminal failure: a step exhausted retries or hit a non-retryable error. |
| Cancelled | Stopped by a caller. Cooperative first; force-terminate as the escape hatch. |
Deny, config, and safety errors are terminal and are never retried; transient, capacity, and transport errors are retried with backoff. Error messages are factual and actionable, for example:
Worker disconnected — retrying in 4s (attempt 2 of 5).
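The terminal-versus-retryable split can be sketched as a small decision function. The error-kind strings mirror the categories named above; the doubling schedule, the 2-second base, the 60-second cap, and the default of 5 attempts are illustrative assumptions, since the actual retry policy is configured per step.

```python
TERMINAL_KINDS = {"deny", "config", "safety"}
RETRYABLE_KINDS = {"transient", "capacity", "transport"}


def backoff_seconds(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Delay before the retry that follows failed attempt `attempt` (1-based)."""
    # Assumed schedule: double the delay each attempt, up to a cap.
    return min(cap, base * 2 ** (attempt - 1))


def next_action(kind: str, attempt: int, max_attempts: int = 5):
    """Return ("retry", delay_seconds) or ("fail", None) after a failed attempt."""
    if kind in TERMINAL_KINDS:
        return ("fail", None)  # never retried
    if kind not in RETRYABLE_KINDS:
        raise ValueError(f"unknown error kind: {kind}")
    if attempt >= max_attempts:
        return ("fail", None)  # retries exhausted
    return ("retry", backoff_seconds(attempt))
```

For example, `next_action("transport", 1)` asks for a retry after 2 seconds, while `next_action("deny", 1)` fails immediately regardless of how many attempts remain.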
Signals, queries, and human input
- Signals deliver data into a running run (POST /v1/runs/{id}/signals/{name}): resolve a human-approval node, feed mid-run parameters, or trigger a branch.
- Queries read live run state without perturbing it.
- Input requests are the structured form of human input: a node registers the JSON schema it needs, the console renders the form, and the matching signal resolves it. Resolution is idempotent by (run_id, request_id).
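Idempotency by (run_id, request_id) can be sketched as a store that remembers the first resolution for each key. Everything here is hypothetical: the `InputRequests` class, the use of a set of required field names as a stand-in for a full JSON schema, and the choice to return the first payload on a duplicate delivery.

```python
class InputRequests:
    """Hypothetical store for input requests, keyed by (run_id, request_id)."""

    def __init__(self):
        self._pending = {}   # key -> set of required field names
        self._resolved = {}  # key -> payload that resolved the request

    def register(self, run_id, request_id, required_fields):
        self._pending[(run_id, request_id)] = set(required_fields)

    def resolve(self, run_id, request_id, payload):
        key = (run_id, request_id)
        if key in self._resolved:
            # Duplicate delivery: return the first result, ignore the new payload.
            return self._resolved[key]
        if key not in self._pending:
            raise KeyError(f"no input request {key}")
        missing = self._pending[key] - set(payload)
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        del self._pending[key]
        self._resolved[key] = dict(payload)
        return self._resolved[key]


requests = InputRequests()
requests.register("run-1", "approve-refund", ["approved", "reviewer"])
first = requests.resolve("run-1", "approve-refund", {"approved": True, "reviewer": "kim"})
# A retried delivery with a different payload does not change the outcome.
second = requests.resolve("run-1", "approve-refund", {"approved": False, "reviewer": "lee"})
```

With this shape, a client that times out and resends the same signal cannot resolve the node twice.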
Cancellation
POST /v1/runs/{id}/cancel is cooperative: in-flight steps get a chance to clean up and
compensations run. Force-terminate exists for operators when cooperative cancel can't proceed.
Cancelling an already-terminal run is a no-op, not an error.
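The cancellation rules can be sketched as a single function over a run record. The dict layout, the `in_flight_cleanups` and `compensations` keys, running compensations in reverse order, and having force-terminate skip both phases are assumptions made for illustration.

```python
TERMINAL = {"Succeeded", "Failed", "Cancelled"}


def cancel(run: dict, force: bool = False) -> bool:
    """Cancel a run in place. Returns True if the status changed."""
    if run["status"] in TERMINAL:
        return False  # already terminal: a no-op, not an error
    if not force:
        # Cooperative path: in-flight steps clean up first, then
        # compensations undo completed work, newest first.
        for cleanup in run.get("in_flight_cleanups", []):
            cleanup()
        for compensate in reversed(run.get("compensations", [])):
            compensate()
    run["status"] = "Cancelled"
    return True


log = []
run = {
    "status": "Running",
    "in_flight_cleanups": [lambda: log.append("cleanup: close upload")],
    "compensations": [
        lambda: log.append("undo: reserve stock"),
        lambda: log.append("undo: charge card"),
    ],
}
changed = cancel(run)
changed_again = cancel(run)  # the run is already Cancelled
```

The second call returns `False` and runs no callbacks, which is the no-op behavior described above.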
Determinism & replay
The orchestrator replays workflow code against recorded history to recover state, so workflow code must be deterministic — no wall-clock reads, randomness, or direct I/O in the coordinator; side effects live in activities. This is what makes record-replay debugging possible: any run's node-by-node history can be replayed and inspected in the trace inspector.
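A toy replay loop shows why determinism matters. During replay, activity results come from recorded history instead of being executed again, and a mismatch between what the code asks for and what history recorded is an error. `Context`, `NonDeterminismError`, and the activity names are hypothetical and stand in for whatever the orchestrator actually provides.

```python
class NonDeterminismError(Exception):
    pass


class Context:
    """Serves recorded results during replay; runs activities only past the end of history."""

    def __init__(self, history):
        self.history = history  # list of (activity_name, result)
        self.cursor = 0

    def activity(self, name, fn):
        if self.cursor < len(self.history):
            recorded_name, result = self.history[self.cursor]
            if recorded_name != name:
                raise NonDeterminismError(
                    f"history has {recorded_name!r}, code asked for {name!r}"
                )
        else:
            result = fn()  # the side effect happens here, exactly once
            self.history.append((name, result))
        self.cursor += 1
        return result


def workflow(ctx, fetch, charge):
    # Deterministic coordinator: it branches only on activity results.
    order = ctx.activity("fetch_order", fetch)
    if order["total"] > 0:
        return ctx.activity("charge", charge)
    return "nothing to charge"


executed = []


def fetch():
    executed.append("fetch")
    return {"total": 30}


def charge():
    executed.append("charge")
    return "charged"


history = []
first = workflow(Context(history), fetch, charge)
replayed = workflow(Context(history), fetch, charge)  # no activity runs again
```

If the coordinator branched on a wall-clock read or a random number, a replay could request a different activity than the one recorded, and `activity` would raise `NonDeterminismError` instead of silently diverging.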
