Workflows & runs

The durable-execution model — immutable versions, run lifecycle, signals, and replay.

Workflows are versioned, immutable definitions

A Workflow is a graph of nodes — agents, activities, tools, LLM calls, control nodes — stored as a versioned document. Published versions are immutable: you never edit a live definition in place. Changing a workflow means publishing a new version and flipping the active pointer; rolling back is flipping it back. Every run records exactly which version it executed, so a run from last month replays against the definition it actually ran.
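The publish-then-flip model above can be sketched as a small in-memory registry. This is an illustration only: `WorkflowRegistry`, `publish`, `activate`, and `start_run` are hypothetical names, not part of any documented API, and the graph shape is an assumption.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass(frozen=True)
class WorkflowVersion:
    """A published, immutable workflow definition (hypothetical shape)."""
    workflow_id: str
    version: int
    graph: MappingProxyType  # read-only view of the node graph


@dataclass
class WorkflowRegistry:
    """Toy in-memory registry; all names here are illustrative."""
    versions: dict = field(default_factory=dict)  # (workflow_id, version) -> WorkflowVersion
    active: dict = field(default_factory=dict)    # workflow_id -> active version number

    def publish(self, workflow_id: str, graph: dict) -> WorkflowVersion:
        # Publishing always creates a new version; existing ones are never edited.
        next_version = 1 + max(
            (v for (wid, v) in self.versions if wid == workflow_id), default=0
        )
        published = WorkflowVersion(
            workflow_id, next_version, MappingProxyType(dict(graph))
        )
        self.versions[(workflow_id, next_version)] = published
        return published

    def activate(self, workflow_id: str, version: int) -> None:
        # Rolling forward and rolling back are the same operation: move the pointer.
        if (workflow_id, version) not in self.versions:
            raise KeyError(f"{workflow_id} v{version} was never published")
        self.active[workflow_id] = version

    def start_run(self, workflow_id: str) -> dict:
        # A run pins the version that was active when it started.
        return {"workflow_id": workflow_id, "version": self.active[workflow_id]}
```

Because a run stores its version number rather than a reference to "the current definition", moving the active pointer later does not change what an earlier run replays against.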

Graphs support:

  • DAGs & trees — fan-out/fan-in, parallel branches with joins.
  • Branching — router nodes choose a path from node output.
  • Loops — cycles with a loop condition; each iteration is recorded and expands in the console (the dashed amber edge in the graph view).
  • Subworkflows — child runs with their own graph, linked to the parent.
  • Human input — a node can suspend the run durably until a signal resolves it.

Runs are durable executions

A Run is one execution of a workflow version. Execution is durable: state is persisted at every step, so worker crashes, deploys, and restarts don't lose progress — the run resumes exactly where it left off, and completed steps are never re-executed.
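A minimal sketch of the idea, assuming state is checkpointed to a local JSON file after each step. The real persistence layer is not described here; `run_durably` and the file-based store are stand-ins chosen to keep the example self-contained.

```python
import json
from pathlib import Path
from typing import Callable


def run_durably(
    steps: list[tuple[str, Callable[[dict], object]]], state_path: Path
) -> dict:
    """Run named steps in order, persisting each result before moving on.

    On restart, steps whose results are already recorded are skipped.
    """
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    for name, step in steps:
        if name in state:
            continue  # completed before the crash; do not re-execute
        state[name] = step(state)
        state_path.write_text(json.dumps(state))  # checkpoint after every step
    return state
```

If the process dies during the second step, calling `run_durably` again with the same `state_path` skips the first step and resumes at the second.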

Run lifecycle

Status      Meaning
Queued      Accepted; waiting for a worker slot.
Running     Executing. Nodes stream status/metrics live.
Retrying    A step failed transiently and is being retried under its retry policy.
Succeeded   Completed; the result is available.
Failed      Terminal failure: a step exhausted retries or hit a non-retryable error.
Cancelled   Stopped by a caller. Cooperative first; force-terminate as the escape hatch.

Deny, config, and safety errors are terminal and are never retried; transient, capacity, and transport errors are retried with backoff. Error messages are factual and actionable, for example: "Worker disconnected — retrying in 4s (attempt 2 of 5)".
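The retry rule can be sketched as a small classifier. The category strings mirror the error classes named above; the function names, the exponential schedule, and the `base` and `cap` defaults are assumptions made for illustration (the text only says "backoff").

```python
import enum


class Status(enum.Enum):
    QUEUED = "Queued"
    RUNNING = "Running"
    RETRYING = "Retrying"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


NON_RETRYABLE = {"deny", "config", "safety"}
RETRYABLE = {"transient", "capacity", "transport"}


def status_after_error(category: str, attempt: int, max_attempts: int) -> Status:
    """Decide the run status after attempt number `attempt` failed."""
    if category in NON_RETRYABLE:
        return Status.FAILED  # terminal, no retry
    if category not in RETRYABLE:
        raise ValueError(f"unknown error category: {category!r}")
    # Retry only while the policy still has attempts left.
    return Status.RETRYING if attempt < max_attempts else Status.FAILED


def backoff_seconds(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Assumed exponential backoff: base * 2^(attempt - 1), capped at `cap`."""
    return min(cap, base * 2 ** (attempt - 1))
```

With these assumed defaults, `backoff_seconds(2)` is 4 seconds, which happens to match the sample message; the actual schedule may differ.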

Signals, queries, and human input

  • Signals deliver data into a running run (POST /v1/runs/{id}/signals/{name}) — resolve a human-approval node, feed mid-run parameters, or trigger a branch.
  • Queries read live run state without perturbing it.
  • Input requests are the structured form: a node registers the JSON schema it needs, the console renders the form, and the matching signal resolves it. Idempotent by (run_id, request_id).
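As a rough sketch of the two pieces above: building the signal path from the documented route, and resolving an input request idempotently by (run_id, request_id). The `InputRequests` class and its first-delivery-wins behaviour are assumptions for illustration; only the route itself comes from the text.

```python
from urllib.parse import quote


def signal_path(run_id: str, name: str) -> str:
    """Build the documented route POST /v1/runs/{id}/signals/{name}."""
    # Percent-encode both segments so slashes in either cannot alter the route.
    return f"/v1/runs/{quote(run_id, safe='')}/signals/{quote(name, safe='')}"


class InputRequests:
    """Toy store showing idempotent resolution (hypothetical, in-memory)."""

    def __init__(self) -> None:
        self._resolved: dict[tuple[str, str], dict] = {}

    def resolve(self, run_id: str, request_id: str, payload: dict) -> tuple[dict, bool]:
        """Return (stored payload, whether this call was the one that resolved it)."""
        key = (run_id, request_id)
        if key in self._resolved:
            # Duplicate delivery: keep the first payload, report no change.
            return self._resolved[key], False
        self._resolved[key] = payload
        return payload, True
```

A client that retries the same signal after a timeout therefore cannot resolve the same request twice.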

Cancellation

POST /v1/runs/{id}/cancel is cooperative: in-flight steps get a chance to clean up and compensations run. Force-terminate exists for operators when cooperative cancel can't proceed. Cancelling an already-terminal run is a no-op, not an error.
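The cancellation semantics can be sketched as follows. The `Run` shape, the returned labels, and the `on_cleanup_finished` hook are hypothetical; the sketch only illustrates the no-op on terminal runs and the cooperative-then-force ordering.

```python
from dataclasses import dataclass

TERMINAL_STATUSES = {"Succeeded", "Failed", "Cancelled"}


@dataclass
class Run:
    status: str = "Running"
    cancel_requested: bool = False


def cancel(run: Run, force: bool = False) -> str:
    """Handle a cancel call and return what happened (labels are illustrative)."""
    if run.status in TERMINAL_STATUSES:
        return "noop"  # already terminal: succeed without changing anything
    if force:
        run.status = "Cancelled"  # operator escape hatch: stop without waiting
        return "terminated"
    run.cancel_requested = True  # in-flight steps observe this flag and clean up
    return "cancel_requested"


def on_cleanup_finished(run: Run) -> None:
    # Assumed hook: called once in-flight steps and compensations have finished.
    if run.cancel_requested and run.status not in TERMINAL_STATUSES:
        run.status = "Cancelled"
```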

Determinism & replay

The orchestrator replays workflow code against recorded history to recover state, so workflow code must be deterministic — no wall-clock reads, randomness, or direct I/O in the coordinator; side effects live in activities. This is what makes record-replay debugging possible: any run's node-by-node history can be replayed and inspected in the trace inspector.
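A toy illustration of why this matters, assuming a history that records each activity result in order. `Context` and its `activity` method are hypothetical names, not the orchestrator's API: on first execution the activity runs and its result is recorded; on replay the recorded result is returned and the function is not called.

```python
import random
import time


class Context:
    """Minimal record/replay context (illustrative only)."""

    def __init__(self, history=None):
        self.history = list(history or [])  # ordered (activity name, result) pairs
        self._cursor = 0

    def activity(self, name, fn, *args):
        if self._cursor < len(self.history):
            # Replay: reuse the recorded result instead of running the side effect.
            recorded_name, result = self.history[self._cursor]
            if recorded_name != name:
                raise RuntimeError(
                    f"non-deterministic workflow: history has {recorded_name!r}, "
                    f"code asked for {name!r}"
                )
        else:
            # First execution: run the activity and record its result.
            result = fn(*args)
            self.history.append((name, result))
        self._cursor += 1
        return result


def workflow(ctx: Context):
    # Clock reads and randomness go through activities, never the coordinator.
    started_at = ctx.activity("read_clock", time.time)
    token = ctx.activity("draw_token", random.random)
    return started_at, token
```

Running `workflow` with a fresh `Context` and then again with `Context(first.history)` yields the same values, because the second pass reads the clock and the random draw from history. Calling `time.time()` directly inside `workflow` would break that guarantee.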
