What is InfraWeave?
The control plane for building, running, observing, and debugging multi-agent workflows at scale.
Where a typical app calls one LLM, InfraWeave runs fleets of agents arranged as trees and DAGs — durable, observable, replayable, and governed. You write agents as plain code; the platform gives you the runtime, security, observability, and improvement infrastructure that production agentic systems need, so you don't re-solve the same cross-cutting problems each time.
Capability areas
| Area | What it does |
|---|---|
| Workflow orchestration | Durable execution: trees/DAGs, branching, loops, shared state, signals & queries. Crash mid-run and resume exactly where you left off. |
| Run visualizer | Node-graph view of a run: every agent or activity is a node, with status, latency, cost, and token overlays. Loop iterations expand as a list. |
| Observability | OpenTelemetry traces joined to runs; per-tenant / per-workflow / per-agent / per-node metrics. |
| LLM routing | Pluggable models behind one client boundary; alias-based, layered config (platform → tenant → agent → experiment). |
| A/B & RL | Deterministic variant assignment; reward pipelines that learn from feedback joined to traces. |
| Testing | Agents tested in isolation; workflows tested end-to-end. Record-replay, simulation, eval. |
| Security & secrets | Agent- and workflow-level permissions; built-in secret vault; encrypted worker I/O. Multi-tenant by design. |
| MCP & tools | Agents call MCP servers and arbitrary APIs through a governed tool layer — allowlisted, policy-checked, credential-injected. |
| RAG / knowledge | Vector + graph connectors, hybrid search, layout-aware ingestion. |
| Client SDK / API | Run-centric REST: start, status, result, cancel, signal, stream events. |
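
The "LLM routing" and "A/B & RL" rows describe two mechanisms that are easier to see in code: layered alias resolution and deterministic variant assignment. The following is a minimal sketch and not InfraWeave's actual API. The names `LAYER_ORDER`, `assign_variant`, and `resolve_alias`, the hashing scheme, and the alias and model strings are all assumptions made for illustration. Only the layer order comes from the table above.

```python
import hashlib
from typing import Mapping, Optional, Sequence

# Layer names from the table, lowest to highest precedence
# (platform, then tenant, then agent, then experiment).
LAYER_ORDER = ("platform", "tenant", "agent", "experiment")


def assign_variant(experiment_id: str, unit_id: str, variants: Sequence[str]) -> str:
    """Pick a variant deterministically (hypothetical scheme).

    Hashing the experiment and unit IDs means the same inputs always map
    to the same variant, with no stored assignment table.
    """
    digest = hashlib.sha256(f"{experiment_id}:{unit_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


def resolve_alias(alias: str, layers: Mapping[str, Mapping[str, str]]) -> Optional[str]:
    """Return the model bound to `alias`; later layers override earlier ones."""
    resolved = None
    for name in LAYER_ORDER:
        layer = layers.get(name, {})
        if alias in layer:
            resolved = layer[alias]
    return resolved


if __name__ == "__main__":
    # Hypothetical alias and model names, for illustration only.
    base_layers = {
        "platform": {"default-chat": "model-a"},
        "tenant": {"default-chat": "model-b"},
    }
    overrides = {"control": {}, "treatment": {"default-chat": "model-c"}}

    variant = assign_variant("exp-1", "run-123", ["control", "treatment"])
    layers = {**base_layers, "experiment": overrides[variant]}
    print(variant, resolve_alias("default-chat", layers))
```

In this sketch the experiment layer is just the highest-precedence config layer, so a variant changes behavior by overriding an alias rather than by changing agent code.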
The two surfaces
- The Console — the web app where humans build, run, observe, evaluate, and administer agents: the workflow graph, runs list, trace inspector, and observability dashboards.
- The SDK & API — how programmatic callers define and trigger workflows.
Quickstart
Define an agent, start a run, watch it live — in five minutes.
Workflows & runs
The durable-execution model: versions, statuses, signals, replay.
REST API
The run-centric HTTP surface: start, status, result, cancel, signal, stream.
Design principles
- Multi-tenant everywhere. Tenant is always explicit; deep links carry it.
- Immutability + supersede. You edit by creating new versions and flipping pointers — never in place. The version graph is first-class.
- Fail-closed by default. When the platform denies a request, it explains why and offers the next step.
- One place to see anything. Every event, policy decision, and model call flows into the audit, trace, and cost surfaces.
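
The "immutability + supersede" principle can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the platform's data model: `Version`, `VersionStore`, `publish`, and `rollback` are hypothetical names. The idea it shows is that an edit appends a new immutable version and moves an "active" pointer, and a rollback only moves the pointer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    """An immutable snapshot; frozen=True makes attribute assignment raise."""
    version_id: int
    parent_id: Optional[int]  # edge in the version graph
    definition: str


class VersionStore:
    """Hypothetical append-only store with a movable 'active' pointer."""

    def __init__(self) -> None:
        self._versions: List[Version] = []
        self._active: Optional[int] = None

    def publish(self, definition: str) -> Version:
        """Append a new version whose parent is the currently active one."""
        version = Version(len(self._versions) + 1, self._active, definition)
        self._versions.append(version)
        self._active = version.version_id  # flip the pointer
        return version

    def rollback(self, version_id: int) -> None:
        """Point 'active' at an existing version; nothing is deleted."""
        if not 1 <= version_id <= len(self._versions):
            raise KeyError(version_id)
        self._active = version_id

    def active(self) -> Optional[Version]:
        if self._active is None:
            return None
        return self._versions[self._active - 1]

    def history(self) -> List[Version]:
        """Return a copy so callers cannot mutate the stored list."""
        return list(self._versions)
```

Because publishing after a rollback records the rolled-back version as the parent, the history forms a graph rather than a straight line, which is one way to read "the version graph is first-class".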
