Draft 2026-05-18 · v1 · Public review open

CAP CLI Agent Protocol

Discover, drive, and orchestrate any command-line AI agent.

One protocol that lets a single orchestrator coordinate Claude Code, Codex, Opencode, aider, openclaude, Gemini CLI, and any future CLI agent — whether they expose a structured API or not.

The 30-second pitch

Every major AI coding agent in 2026 ships as a CLI. They each have their own protocol — or none at all. Trying to coordinate three of them on the same project today means writing three different adapters, debugging three different output formats, and giving up on real-time interaction.

CAP fixes that. It defines a universal way to drive CLI agents via PTY (works with anything that runs in a terminal), uses structured fast-paths (stream-json / gRPC / ACP / A2A) when the agent supports them, and adds first-class multi-agent orchestration on top.

The agent protocol stack

CAP composes with existing protocols rather than competing with them.

┌──────────────────────────────────────────────────────────────────┐
│                                                                  │
│  Layer                  Protocol         Maintainer              │
│  ─────────────────────  ──────────────   ──────────────────────  │
│  agent ↔ tools          MCP              Anthropic               │
│  agent ↔ editor         ACP              Zed                     │
│  agent ↔ agent (peer)   A2A              Google / LF             │
│  ─────────────────────  ──────────────   ──────────────────────  │
│  orch ↔ CLI agent       CAP              cap-protocol.org        │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

A single agent may speak all four at once: expose ACP to an editor, A2A to remote peers, consume MCP for tools, and be driven by a CAP orchestrator over PTY.

How it works

01

PTY is the universal substrate

Any CLI agent — Claude Code, aider, cursor-agent, a future tool nobody's built yet — runs in a terminal. CAP drives it like a human would: typing, reading the screen, sending Ctrl+C, watching for prompts.

Zero protocol negotiation. Day-1 compatibility.
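
To make the "drive it like a human would" idea concrete, here is a minimal sketch of a PTY driver in Python, using only the standard library (Unix only). The function names spawn_in_pty, send_line, send_ctrl_c, and read_until are illustrative assumptions, not names taken from the CAP spec.

```python
import codecs
import os
import pty
import re
import select
import time


def spawn_in_pty(argv):
    """Fork a child whose controlling terminal is a new PTY; return (pid, fd)."""
    pid, master_fd = pty.fork()
    if pid == 0:  # child process: become the agent
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    return pid, master_fd


def send_line(fd, text):
    """'Type' a line, ending with carriage return as the Enter key would."""
    os.write(fd, text.encode("utf-8") + b"\r")


def send_ctrl_c(fd):
    """Write ETX (0x03). In cooked mode the terminal turns it into SIGINT;
    raw-mode programs read the byte and handle it themselves."""
    os.write(fd, b"\x03")


def read_until(fd, pattern, timeout=5.0):
    """Collect output from fd until `pattern` matches or the timeout expires.

    Returns (matched, text_so_far).
    """
    regex = re.compile(pattern, re.MULTILINE)
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    text = ""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        readable, _, _ = select.select([fd], [], [], 0.05)
        if not readable:
            continue
        try:
            chunk = os.read(fd, 4096)
        except OSError:  # Linux reports EIO once the child has exited
            break
        if not chunk:  # EOF
            break
        text += decoder.decode(chunk)
        if regex.search(text):
            return True, text
    return False, text
```

A driver built this way would call spawn_in_pty(["claude"]), wait with read_until(fd, r"^> $") for an idle prompt, then send_line(fd, "...") to start a turn. Real terminal output also carries ANSI escape sequences, which a production parser would need to strip before matching; this sketch leaves that out.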

02

Fast-path when the agent has one

When an agent exposes stream-json, gRPC, ACP-stdio, or A2A HTTPS+SSE, the CAP driver prefers it for cleaner event extraction. PTY is the fallback, not the only path.

No re-implementation. Just adapters.
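
The "prefer a fast-path, fall back to PTY" rule can be sketched as a small selection function. The preference order below and the "a2a" key are assumptions made for illustration; the source only states that a fast-path is preferred over PTY when one exists.

```python
from typing import Any, Mapping

# Assumed preference order; the source only says fast-paths beat PTY.
# "a2a" is a hypothetical key name, the others mirror manifest field names.
FAST_PATH_PREFERENCE = ("stream_json", "grpc", "acp_stdio", "a2a")


def choose_binding(fast_path: Mapping[str, Any]) -> tuple[str, Any]:
    """Return (binding_name, config) for the first offered fast-path, else PTY."""
    for name in FAST_PATH_PREFERENCE:
        config = fast_path.get(name, False)
        if config:  # false, missing, or empty all mean "not offered"
            return name, config
    return "pty", None
```

Given a [fast_path] table that offers only stream_json, this returns that binding with its command line; given an empty table it returns ("pty", None).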

03

Multi-agent orchestration is core

Coordinate N sub-agents on one project: plan propagation, cross-agent message routing (always orchestrator-mediated and human-auditable), workspace isolation via git worktrees, and budget aggregation with hard cancel.

Not bolted on. Built in.
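
As a rough illustration of three of these pieces (orchestrator-mediated routing with an audit log, budget aggregation with hard cancel, and one git worktree per sub-agent), here is a minimal in-memory sketch. The class, method names, dollar-denominated budget, and worktree path layout are all hypothetical; the spec may define these differently.

```python
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when aggregate spend crosses the project budget."""


@dataclass
class Orchestrator:
    budget_usd: float                                 # assumed unit: US dollars
    spent: dict = field(default_factory=dict)         # per-agent spend
    audit_log: list = field(default_factory=list)     # every routed message
    cancelled: set = field(default_factory=set)       # agents that were hard-cancelled

    def register(self, agent):
        self.spent.setdefault(agent, 0.0)

    def route(self, sender, recipient, body):
        """Relay a message between sub-agents, logging it before delivery."""
        if sender not in self.spent or recipient not in self.spent:
            raise KeyError("both agents must be registered")
        entry = {"from": sender, "to": recipient, "body": body}
        self.audit_log.append(entry)
        return entry

    def total_spent(self):
        return sum(self.spent.values())

    def charge(self, agent, cost_usd):
        """Add to one agent's spend; cancel everyone if the total exceeds budget."""
        self.spent[agent] += cost_usd
        if self.total_spent() > self.budget_usd:
            self.cancelled = set(self.spent)
            raise BudgetExceeded(f"spent {self.total_spent():.2f} of {self.budget_usd:.2f}")


def worktree_command(repo, agent_id):
    """Build (not run) the git command that gives one sub-agent its own checkout."""
    path = f"{repo}/.cap-worktrees/{agent_id}"        # hypothetical layout
    return ["git", "-C", repo, "worktree", "add", "-b", f"cap/{agent_id}", path]
```

Because sub-agents only ever call route on the orchestrator, the audit log is complete by construction, which is the property the paragraph above describes.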

Every agent declares a Manifest

One TOML file. Lives in the agent's package, in /usr/share/cap-agents/, or emitted by the agent on --cap-manifest.

# cap-agent.toml: Claude Code example

[agent]
name          = "claude-code"
binary        = "claude"
profiles      = ["coding"]

[startup]
command       = ["claude"]
ready_when    = { pattern = "Try \"how do I\\?\"" }

[fast_path]
stream_json   = ["claude", "-p", "--input-format=stream-json", "--output-format=stream-json"]
acp_stdio     = false
grpc          = false

[pty]
cols          = 200
rows          = 50
bracketed_paste = true
sigint_cancels_turn = "graceful"
queued_input_supported = true

[parse]
idle          = ["^> $", "^❯ $"]
tool_call_start = "^◉ (?P<tool>\\w+)\\((?P<args>.*?)\\)$"

[capabilities.coding]
fs            = { read = true, write = true, scope = "workspace_only" }
terminal      = { enabled = true, concurrent_max = 4 }
tool_permission = "interactive"
artifacts     = ["diff", "pr_link", "test_result", "plan_doc"]

Want to add a new agent to the ecosystem? Write a manifest and a PTY parser. PR welcome.

The specification

The v1 draft is on GitHub. Two normative documents:

Profiles

The core is domain-neutral. Vertical extensions layer on top.

Profile           Status     Scope
────────────────  ─────────  ──────────────────────────────────────────────────────────────────
profile/coding    draft v1   Software engineering: fs/terminal RPC, diffs, PRs, tests, commits
profile/devops    reserved   Infrastructure agents: k8s, terraform, ansible context
profile/data      reserved   Data analysis: dataframe artifacts, SQL sessions
profile/security  reserved   Audit / pentest agents: finding artifacts, audit logs
profile/research  reserved   Literature / scientific agents
profile/sysadmin  reserved   System operations agents

Status

FAQ

How is this different from ACP?

ACP (Zed) is for editor ↔ agent over local stdio. CAP is for orchestrator ↔ CLI agent across local, remote, and fleet deployments. The two sit at different layers and don't compete: CAP can bridge to ACP agents via its ACP-stdio binding.

How is this different from A2A?

A2A is generic agent-to-agent peer coordination. CAP is the vertical for CLI agents specifically, with PTY as a universal substrate that A2A doesn't address. CAP composes with A2A: a CAP-driven agent can be exposed as an A2A peer, and an A2A peer can be driven by a CAP orchestrator.

Does my agent need to support all five bindings?

No. The only required binding is PTY, which works automatically because your agent already runs in a terminal. Fast-paths (stream-json / gRPC / ACP / A2A) are optional optimizations. Most agents implement at most one fast-path.

Can I use CAP for non-coding agents?

Yes. The core protocol is domain-neutral. profile/coding is just the first vertical. profile/devops, profile/data, and others are reserved for the same pattern — the core spec applies unchanged.

How does multi-agent coordination work?

Each sub-agent gets a CAP URN (cap://agent-1). The orchestrator publishes a master plan whose entries reference sub-agents via _meta.cap.assigned_to. Sub-agents never talk directly — all inter-agent messages route through the orchestrator and appear in the human-auditable log. See spec §10.
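
A master plan of this kind might look like the following sketch. Only the cap:// URN form and the _meta.cap.assigned_to field come from the description above; the "entries" and "content" keys and the example tasks are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical plan shape; only _meta.cap.assigned_to and the URN form
# are taken from the text.
master_plan = {
    "entries": [
        {"content": "Design the REST schema",
         "_meta": {"cap": {"assigned_to": "cap://agent-1"}}},
        {"content": "Implement the handlers",
         "_meta": {"cap": {"assigned_to": "cap://agent-2"}}},
        {"content": "Write integration tests",
         "_meta": {"cap": {"assigned_to": "cap://agent-1"}}},
        {"content": "Review and merge"},  # no assignee yet
    ]
}


def entries_by_agent(plan):
    """Group plan entries by the sub-agent URN they are assigned to."""
    grouped = defaultdict(list)
    for entry in plan["entries"]:
        urn = entry.get("_meta", {}).get("cap", {}).get("assigned_to")
        grouped[urn or "unassigned"].append(entry["content"])
    return dict(grouped)
```

An orchestrator could use a grouping like this to decide which slice of the plan to propagate to each sub-agent.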

What about real-time bidirectional conversation?

PTY enables it as far as the underlying LLM allows. Current LLMs do not support atomic mid-turn interruption (you can't change Claude's mind mid-sentence). CAP provides a cooperative interruption pattern: cancel current turn, preserve partial output, resend with user correction. See spec §10.6.
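
The three steps of the cooperative interruption pattern can be sketched as follows. FakeSession stands in for a live agent session, and both its interface and the wording of the resent prompt are assumptions; the actual mechanics are defined in the spec section cited above.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSession:
    """Stand-in for a live agent session; records what a driver would do."""
    partial_output: str = ""
    sent: list = field(default_factory=list)
    cancelled_turns: int = 0

    def cancel_turn(self):
        """Stop the current turn and hand back whatever was produced so far."""
        self.cancelled_turns += 1
        return self.partial_output

    def send(self, prompt):
        self.sent.append(prompt)


def interrupt_and_correct(session, original_prompt, correction):
    """Cancel the turn, keep the partial output, resend with the correction."""
    partial = session.cancel_turn()                      # 1. cancel current turn
    prompt = (                                           # 2. preserve partial output
        f"{original_prompt}\n\n"
        f"[Interrupted. Partial answer so far:]\n{partial}\n\n"
        f"[User correction:]\n{correction}"
    )
    session.send(prompt)                                 # 3. resend with correction
    return partial
```

The point of the pattern is that nothing is lost on interruption: the partial output travels with the corrected request, so the agent can continue from it rather than start over.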