
What is InfraWeave?

The control plane for building, running, observing, and debugging multi-agent workflows at scale.

Where a typical app calls one LLM, InfraWeave runs fleets of agents arranged as trees and DAGs — durable, observable, replayable, and governed. You write agents as plain code; the platform gives you the runtime, security, observability, and improvement infrastructure that production agentic systems need, so you don't re-solve the same cross-cutting problems each time.

Capability areas

| Area | What it does |
|---|---|
| Workflow orchestration | Durable execution: trees/DAGs, branching, loops, shared state, signals & queries. Crash mid-run and resume exactly where you left off. |
| Run visualizer | Node-graph view of a run: every agent/activity is a node, with status, latency, cost & token overlays. Loop iterations expand as a list. |
| Observability | OpenTelemetry traces joined to runs; per-tenant / per-workflow / per-agent / per-node metrics. |
| LLM routing | Pluggable models behind one client boundary; alias-based, layered config (platform → tenant → agent → experiment). |
| A/B & RL | Deterministic variant assignment; reward pipelines that learn from feedback joined to traces. |
| Testing | Agents tested in isolation; workflows tested end-to-end. Record-replay, simulation, eval. |
| Security & secrets | Agent- and workflow-level permissions; built-in secret vault; encrypted worker I/O. Multi-tenant by design. |
| MCP & tools | Agents call MCP servers and arbitrary APIs through a governed tool layer: allowlisted, policy-checked, credential-injected. |
| RAG / knowledge | Vector + graph connectors, hybrid search, layout-aware ingestion. |
| Client SDK / API | Run-centric REST: start, status, result, cancel, signal, stream events. |
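The durable-execution row can be illustrated with a journal-replay sketch. This is not InfraWeave's engine or API: `Journal`, `Context.step`, `run`, and the `Crash` exception are hypothetical names, and the journal lives in memory rather than in durable storage. What it shows is the core idea: each completed step's result is recorded, so a restarted workflow replays recorded results instead of re-running them and picks up at the first unfinished step.

```python
from typing import Any, Callable


class Crash(Exception):
    """Simulated worker crash, used only to demonstrate resumption."""


class Journal:
    """Append-only log of completed step results (in memory here; a real system would persist it)."""

    def __init__(self) -> None:
        self.results: list[Any] = []


class Context:
    """Replays results already in the journal; executes and records steps that are not."""

    def __init__(self, journal: Journal) -> None:
        self.journal = journal
        self.cursor = 0  # index of the next step in the workflow's deterministic order

    def step(self, fn: Callable[[], Any]) -> Any:
        if self.cursor < len(self.journal.results):
            result = self.journal.results[self.cursor]  # replay: do not re-execute
        else:
            result = fn()  # first execution of this step
            self.journal.results.append(result)  # record before moving on
        self.cursor += 1
        return result


def run(workflow: Callable[[Context], Any], journal: Journal) -> Any:
    """(Re)start a workflow against an existing journal."""
    return workflow(Context(journal))


# --- Usage: crash after two steps, then resume ---
calls: list[str] = []
crash_armed = [True]


def activity(name: str) -> Callable[[], str]:
    def act() -> str:
        calls.append(name)  # record the call so we can see which steps actually ran
        return f"{name}-done"
    return act


def workflow(ctx: Context) -> list[str]:
    plan = ctx.step(activity("plan"))
    found = ctx.step(activity("search"))
    if crash_armed[0]:
        crash_armed[0] = False
        raise Crash("worker died mid-run")
    draft = ctx.step(activity("write"))
    return [plan, found, draft]


journal = Journal()
try:
    run(workflow, journal)
except Crash:
    pass  # the worker restarts...
result = run(workflow, journal)  # ...and resumes: plan/search are replayed, only write executes
print(calls)  # ['plan', 'search', 'write']
```

Replay matches steps by position, so this approach assumes the workflow code is deterministic (the same steps in the same order on every run). A crash after a step executes but before its result is recorded would re-run that step, which is why engines built on this idea generally expect activities to be idempotent or safe to retry.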
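For LLM routing, the layered config (platform → tenant → agent → experiment) can be read as a precedence merge in which a key set in a later layer overrides the same key from an earlier one. The sketch below assumes flat key/value layers and a hypothetical `model.<alias>` key convention; `resolve_config`, `resolve_model`, and the provider/model strings are placeholders, not InfraWeave's configuration schema.

```python
from typing import Any, Mapping

# Lowest to highest precedence, following the order named in the table.
LAYER_ORDER = ("platform", "tenant", "agent", "experiment")


def resolve_config(layers: Mapping[str, Mapping[str, Any]]) -> dict[str, Any]:
    """Merge layers in precedence order; a key set in a higher layer overrides lower ones."""
    merged: dict[str, Any] = {}
    for name in LAYER_ORDER:
        merged.update(layers.get(name, {}))  # missing layers are simply skipped
    return merged


def resolve_model(alias: str, config: Mapping[str, Any]) -> str:
    """Look up the concrete model behind an alias; an unknown alias is an error, not a guess."""
    key = f"model.{alias}"
    if key not in config:
        raise KeyError(f"unknown model alias: {alias!r}")
    return config[key]


# Hypothetical layer contents; keys and model names are placeholders.
cfg = resolve_config({
    "platform": {"model.chat": "provider-x/large", "model.fast": "provider-x/small", "temperature": 0.7},
    "tenant": {"model.chat": "provider-y/large"},
    "agent": {"temperature": 0.2},
    "experiment": {"model.fast": "provider-z/small"},
})
print(resolve_model("chat", cfg), resolve_model("fast", cfg), cfg["temperature"])
# provider-y/large provider-z/small 0.2
```

Callers only ever name the alias, so swapping the model behind it is a config change at whichever layer owns that decision.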
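Deterministic variant assignment is commonly implemented by hashing a stable unit identifier, so no assignment table is needed and a given user sees the same variant on every request. A minimal sketch, assuming integer weights and SHA-256 over a hypothetical `experiment:unit_id` string; `assign_variant` is illustrative, not an InfraWeave function.

```python
import hashlib
from typing import Sequence


def assign_variant(experiment: str, unit_id: str, weights: Sequence[tuple[str, int]]) -> str:
    """Deterministically map (experiment, unit) to a variant, proportional to integer weights."""
    total = sum(w for _, w in weights)
    if total <= 0 or any(w < 0 for _, w in weights):
        raise ValueError("weights must be non-negative and sum to a positive number")
    # Salting with the experiment name makes assignments independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total  # 0 <= bucket < total
    cumulative = 0
    for name, weight in weights:
        cumulative += weight
        if bucket < cumulative:  # zero-weight variants never satisfy this
            return name
    raise AssertionError("unreachable: bucket is always < total")


split = [("control", 50), ("treatment", 50)]
first = assign_variant("prompt-v2", "user-42", split)
again = assign_variant("prompt-v2", "user-42", split)
print(first == again)  # True: the same user always lands in the same arm
```

Integer weights keep the bucket arithmetic exact; the modulo over a 64-bit hash introduces a bias far too small to matter for typical weight totals.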

The two surfaces

  1. The Console — the web app where humans build, run, observe, evaluate, and administer agents: the workflow graph, runs list, trace inspector, and observability dashboards.
  2. The SDK & API — how programmatic callers define and trigger workflows.

Design principles

  • Multi-tenant everywhere. Tenant is always explicit; deep links carry it.
  • Immutability + supersede. You edit by creating new versions and flipping pointers — never in place. The version graph is first-class.
  • Fail-closed by default. When the platform denies a request, it explains why and offers a next step.
  • One place to see anything. Every event, policy decision, and model call flows into the audit, trace, and cost surfaces.
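The immutability + supersede principle amounts to a version store in which records never change and only pointers move. A minimal sketch, assuming an in-memory registry; `Version`, `Registry`, `publish`, `rollback`, and `lineage` are hypothetical names, not InfraWeave's API.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    """An immutable snapshot; 'supersedes' is the edge back to the version it replaced."""
    id: int
    name: str
    spec: str
    supersedes: Optional[int]


class Registry:
    def __init__(self) -> None:
        self._versions: dict[int, Version] = {}
        self._pointers: dict[str, int] = {}  # name -> active version id
        self._ids = itertools.count(1)

    def publish(self, name: str, spec: str) -> Version:
        """'Editing' means creating a new version, then flipping the pointer to it."""
        version = Version(next(self._ids), name, spec, self._pointers.get(name))
        self._versions[version.id] = version
        self._pointers[name] = version.id
        return version

    def current(self, name: str) -> Version:
        return self._versions[self._pointers[name]]

    def rollback(self, name: str, version_id: int) -> None:
        """Re-point to an older version; nothing is mutated or deleted."""
        if self._versions[version_id].name != name:
            raise ValueError(f"version {version_id} does not belong to {name!r}")
        self._pointers[name] = version_id

    def lineage(self, name: str) -> list[int]:
        """Follow 'supersedes' edges from the active version back to the first one."""
        chain: list[int] = []
        vid = self._pointers.get(name)
        while vid is not None:
            chain.append(vid)
            vid = self._versions[vid].supersedes
        return chain


registry = Registry()
v1 = registry.publish("triage-agent", "prompt: v1")
v2 = registry.publish("triage-agent", "prompt: v2")
print(registry.lineage("triage-agent"))  # [2, 1]
registry.rollback("triage-agent", v1.id)
print(registry.current("triage-agent").spec)  # prompt: v1
```

Because rollback is just another pointer flip, it needs no special undo logic, and every version that was ever active remains addressable by id.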
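Fail-closed with an explanation can be sketched as a policy check that returns a decision object rather than a bare boolean: anything short of an explicit grant is a denial, and every denial carries a reason and a suggested next step. `Decision`, `check_tool_call`, and the grant set are hypothetical; passing `grants=None` stands in for an unavailable policy source.

```python
from dataclasses import dataclass
from typing import AbstractSet, Optional


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str           # always populated, including on allow
    next_step: str = ""   # what the caller can do about a denial


# Hypothetical grants: (agent, tool) pairs that were explicitly allowed.
GRANTS: frozenset = frozenset({("research-agent", "web.search")})


def check_tool_call(agent: str, tool: str,
                    grants: Optional[AbstractSet[tuple[str, str]]] = GRANTS) -> Decision:
    """Allow only on an explicit grant; no grant, or any error while checking, means deny."""
    try:
        if grants is None:
            raise LookupError("policy source unavailable")  # stand-in for a real outage
        if (agent, tool) in grants:
            return Decision(True, f"{agent} holds a grant for {tool}")
    except Exception as exc:  # fail closed: an error during evaluation is a denial
        return Decision(False, f"policy check failed: {exc}",
                        "Retry later or contact a tenant admin.")
    return Decision(False, f"no grant for {agent} to call {tool}",
                    f"Ask a tenant admin to grant {tool} to {agent}.")


print(check_tool_call("research-agent", "web.search").allowed)  # True
print(check_tool_call("research-agent", "payments.refund").next_step)
```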
