CAP: CLI Agent Protocol
Discover, drive, and orchestrate any command-line AI agent.
One protocol that lets a single orchestrator coordinate Claude Code, Codex, Opencode, aider, openclaude, Gemini CLI, and any future CLI agent — whether they expose a structured API or not.
The 30-second pitch
Every major AI coding agent in 2026 ships as a CLI. They each have their own protocol — or none at all. Trying to coordinate three of them on the same project today means writing three different adapters, debugging three different output formats, and giving up on real-time interaction.
CAP fixes that. It defines a universal way to drive CLI agents via PTY (works with anything that runs in a terminal), uses structured fast-paths (stream-json / gRPC / ACP / A2A) when the agent supports them, and adds first-class multi-agent orchestration on top.
The agent protocol stack
CAP composes with — not competes with — existing protocols.
┌──────────────────────────────────────────────────────────────────┐
│                                                                  │
│  Layer                  Protocol        Maintainer               │
│  ─────────────────────  ──────────────  ──────────────────────   │
│  agent ↔ tools          MCP             Anthropic                │
│  agent ↔ editor         ACP             Zed                      │
│  agent ↔ agent (peer)   A2A             Google / LF              │
│  ─────────────────────  ──────────────  ──────────────────────   │
│  orch ↔ CLI agent       CAP             cap-protocol.org         │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
A single agent may speak all four at once: expose ACP to an editor, A2A to remote peers, consume MCP for tools, and be driven by a CAP orchestrator over PTY.
How it works
PTY is the universal substrate
Any CLI agent — Claude Code, aider, cursor-agent, a future tool nobody's built yet — runs in a terminal. CAP drives it like a human would: typing, reading the screen, sending Ctrl+C, watching for prompts.
Zero protocol negotiation. Day-1 compatibility.
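As a rough illustration of "driving it like a human would", here is a minimal Python sketch that runs a command under a pseudo-terminal, waits for a prompt pattern, types a line, and reads the screen until a second pattern appears. The helper name `drive_in_pty` and its parameters are hypothetical and not taken from the CAP spec; the sketch is Unix-only and skips what a real driver would need (ANSI escape handling, window sizing, Ctrl+C delivery).

```python
import os
import pty
import re
import select
import subprocess
import time


def drive_in_pty(argv, ready_pattern, keystrokes, done_pattern, timeout=5.0):
    """Run argv under a pseudo-terminal, wait for its prompt, type, read the reply.

    Hypothetical helper for illustration only; not part of the CAP spec.
    """
    master, slave = pty.openpty()
    proc = subprocess.Popen(argv, stdin=slave, stdout=slave, stderr=slave)
    os.close(slave)  # the child keeps its own copy of the slave end
    raw = bytearray()

    def screen():
        return raw.decode("utf-8", errors="replace")

    def wait_for(pattern):
        deadline = time.monotonic() + timeout
        while re.search(pattern, screen()) is None:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError(f"pattern {pattern!r} not seen within {timeout}s")
            readable, _, _ = select.select([master], [], [], remaining)
            if not readable:
                continue
            try:
                chunk = os.read(master, 4096)
            except OSError:  # Linux reports EIO once the child has exited
                chunk = b""
            if not chunk:
                raise EOFError(f"agent exited before {pattern!r} appeared")
            raw.extend(chunk)

    try:
        wait_for(ready_pattern)                 # watch for the prompt
        os.write(master, keystrokes.encode())   # type, as a human would
        wait_for(done_pattern)                  # read the screen for the reply
    finally:
        if proc.poll() is None:
            proc.terminate()
        proc.wait()
        os.close(master)
    return screen()
```

Because the terminal echoes typed input, the returned screen text contains both the keystrokes and the agent's reply, which is exactly what a human at the keyboard would see.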
Fast-path when the agent has one
When an agent exposes stream-json, gRPC, ACP-stdio, or A2A HTTPS+SSE, the CAP driver prefers it for cleaner event extraction. PTY is the fallback, not the only path.
No re-implementation. Just adapters.
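The "prefer a fast-path, fall back to PTY" rule could be sketched as a small selection function over a manifest's fast-path table. This is an assumption-laden illustration: the function name `choose_binding`, the `a2a` key, and the preference order are hypothetical; only `stream_json`, `acp_stdio`, and `grpc` appear as keys in the manifest example on this page.

```python
from typing import Any, Mapping, Sequence, Tuple

# Assumed preference order; the spec may define a different one.
DEFAULT_PREFERENCE = ("stream_json", "grpc", "acp_stdio", "a2a")


def choose_binding(
    fast_path: Mapping[str, Any],
    preference: Sequence[str] = DEFAULT_PREFERENCE,
) -> Tuple[str, Any]:
    """Return (binding_name, config) for the first enabled fast-path, else PTY."""
    for name in preference:
        value = fast_path.get(name)
        if value:  # missing, false, or empty means "not supported"
            return name, value
    return "pty", None  # PTY is always available as the fallback
```

With a fast-path table that enables only `stream_json`, this returns that entry; with every fast-path disabled or absent, it returns `("pty", None)`.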
Multi-agent orchestration is core
Coordinate N sub-agents on one project: plan propagation, cross-agent message routing (always orchestrator-mediated and human-auditable), workspace isolation via git worktrees, and budget aggregation with hard cancel.
Not bolted on. Built in.
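To make "budget aggregation with hard cancel" concrete, here is one possible shape for it: a ledger that sums spend across sub-agents and trips a cancel flag once the shared limit is exceeded. The class name `BudgetLedger`, the token unit, and the "strictly over the limit" rule are all assumptions for illustration, not the behavior the spec defines.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BudgetLedger:
    """Hypothetical shared budget across all sub-agents of one orchestration."""

    limit_tokens: int
    spent_by_agent: Dict[str, int] = field(default_factory=dict)
    cancelled: bool = False

    def total(self) -> int:
        return sum(self.spent_by_agent.values())

    def record(self, agent_urn: str, tokens: int) -> bool:
        """Add one sub-agent's spend; return True if the run may continue."""
        self.spent_by_agent[agent_urn] = self.spent_by_agent.get(agent_urn, 0) + tokens
        if self.total() > self.limit_tokens:
            self.cancelled = True  # hard cancel: stays set for every agent
        return not self.cancelled
```

An orchestrator built this way would check the return value after every reported usage event and cancel all running turns as soon as it sees `False`.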
Every agent declares a Manifest
One TOML file. It lives in the agent's package, in /usr/share/cap-agents/, or is emitted by the agent on --cap-manifest.
# cap-agent.toml: Claude Code example
[agent]
name = "claude-code"
binary = "claude"
profiles = ["coding"]
[startup]
command = ["claude"]
ready_when = { pattern = "Try \"how do I\\?\"" }
[fast_path]
stream_json = ["claude", "-p", "--input-format=stream-json", "--output-format=stream-json"]
acp_stdio = false
grpc = false
[pty]
cols = 200
rows = 50
bracketed_paste = true
sigint_cancels_turn = "graceful"
queued_input_supported = true
[parse]
idle = ["^> $", "^❯ $"]
tool_call_start = "^◉ (?P<tool>\\w+)\\((?P<args>.*?)\\)$"
[capabilities.coding]
fs = { read = true, write = true, scope = "workspace_only" }
terminal = { enabled = true, concurrent_max = 4 }
tool_permission = "interactive"
artifacts = ["diff", "pr_link", "test_result", "plan_doc"]
Want to add a new agent to the ecosystem? Write a manifest and a PTY parser. PRs welcome.
The specification
The v1 draft is on GitHub. Two normative documents:
- CAP Core v1 — transport bindings (PTY / stream-json / gRPC / ACP / A2A), core events, manifest, orchestration, A2A interop.
- CAP Profile: Coding v1 — filesystem & terminal reverse RPC, code-specific artifact types, ACP bridge mapping.
Profiles
The core is domain-neutral. Vertical extensions layer on top.
| Profile | Status | Scope |
|---|---|---|
| profile/coding | draft v1 | Software engineering: fs/terminal RPC, diffs, PRs, tests, commits |
| profile/devops | reserved | Infrastructure agents: k8s, terraform, ansible context |
| profile/data | reserved | Data analysis: dataframe artifacts, SQL sessions |
| profile/security | reserved | Audit / pentest agents: finding artifacts, audit logs |
| profile/research | reserved | Literature / scientific agents |
| profile/sysadmin | reserved | System operations agents |
Status
- 2026-05-18 · v1 draft published. Public review open.
- now · Reference Rust implementation in progress (rsclaw).
- next · First-party manifests for Claude Code, Opencode, aider, openclaude, Codex.
- future · Linux Foundation governance proposal once 3+ independent implementations exist.
FAQ
How is this different from ACP?
ACP (Zed) is for editor ↔ agent over local stdio. CAP is for orchestrator ↔ CLI agent across local, remote, and fleet deployments. CAP can bridge to ACP agents via its ACP-stdio binding, so the two don't compete. Different layers.
How is this different from A2A?
A2A is generic agent-to-agent peer coordination. CAP is the vertical for CLI agents specifically, with PTY as a universal substrate that A2A doesn't address. CAP composes with A2A: a CAP-driven agent can be exposed as an A2A peer, and an A2A peer can be driven by a CAP orchestrator.
Does my agent need to support all five bindings?
No. The only required binding is PTY, which works automatically because your agent already runs in a terminal. Fast-paths (stream-json / gRPC / ACP / A2A) are optional optimizations. Most agents implement at most one fast-path.
Can I use CAP for non-coding agents?
Yes. The core protocol is domain-neutral. profile/coding is just the first vertical. profile/devops, profile/data, and others are reserved for the same pattern; the core spec applies unchanged.
How does multi-agent coordination work?
Each sub-agent gets a CAP URN (cap://agent-1). The orchestrator publishes a master plan whose entries reference sub-agents via _meta.cap.assigned_to. Sub-agents never talk directly: all inter-agent messages route through the orchestrator and appear in the human-auditable log. See spec §10.
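The routing rule described in this answer can be sketched in a few lines. The `Orchestrator` class, the message fields, and the representation of `_meta.cap.assigned_to` as nested dictionaries are assumptions made for illustration; the normative shapes are in the spec.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Orchestrator:
    """Hypothetical mediator: every inter-agent message passes through route()."""

    inboxes: Dict[str, List[dict]] = field(default_factory=dict)
    audit_log: List[dict] = field(default_factory=list)

    def register(self, urn: str) -> None:
        self.inboxes[urn] = []

    def route(self, sender: str, recipient: str, body: str) -> None:
        for urn in (sender, recipient):
            if urn not in self.inboxes:
                raise KeyError(f"unknown agent {urn}")
        entry = {"from": sender, "to": recipient, "body": body}
        self.audit_log.append(entry)           # logged before delivery
        self.inboxes[recipient].append(entry)  # delivered by the orchestrator


def assigned_entries(plan: List[dict], urn: str) -> List[dict]:
    """Pick the plan entries whose _meta.cap.assigned_to names this agent."""
    return [
        entry for entry in plan
        if entry.get("_meta", {}).get("cap", {}).get("assigned_to") == urn
    ]
```

Sub-agents hold no references to each other in this design, so the audit log is complete by construction: there is no delivery path that bypasses `route()`.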
What about real-time bidirectional conversation?
PTY enables it as far as the underlying LLM allows. Current LLMs do not support atomic mid-turn interruption (you can't change Claude's mind mid-sentence). CAP provides a cooperative interruption pattern: cancel current turn, preserve partial output, resend with user correction. See spec §10.6.
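The three steps of the cooperative interruption pattern (cancel, preserve partial output, resend with the correction) might look like this in outline. The `Turn` class, the `interrupt` function, and the wording of the follow-up message are hypothetical; the spec section referenced above defines the actual pattern.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    """Hypothetical record of one request/response exchange with an agent."""

    request: str
    chunks: List[str] = field(default_factory=list)
    status: str = "running"

    def on_output(self, text: str) -> None:
        if self.status == "running":  # output after cancellation is dropped
            self.chunks.append(text)

    def cancel(self) -> str:
        """Stop the turn and return whatever output was produced so far."""
        self.status = "cancelled"
        return "".join(self.chunks)


def interrupt(turn: Turn, correction: str) -> Turn:
    """Cancel the current turn and build the follow-up turn to resend."""
    partial = turn.cancel()
    follow_up = (
        f"Original request:\n{turn.request}\n\n"
        f"Partial output before cancellation:\n{partial}\n\n"
        f"Correction from the user:\n{correction}"
    )
    return Turn(request=follow_up)
```

The point of the sketch is that nothing is changed mid-turn: the old turn ends, and the correction travels in a new turn that carries the preserved partial output as context.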