The Technical Stack of an Agent Marketplace

If autonomous software is going to become a true asset class, it cannot just generate output. It has to transact. It has to find counterparties, make commitments, move money, build a track record, survive disputes, and stay inside the bounds of what its principal actually authorized it to do. In other words, it has to enter a market -- and operate inside the rules of one.

That is the missing layer in most conversations about agents. We talk about models, copilots, and workflows. We talk about smarter interfaces and lower labor costs. But markets do not run on intelligence alone. They run on identity, trust, pricing, contracts, settlement, enforcement, and policy. Strip those away and you do not have an economy. You have a demo.

This is the infrastructure question underneath agentic capital markets. Not who funds an agent, but what allows one software actor to reliably transact with another. Not a directory of prompts. Not a model registry. A marketplace in the hard sense of the word: a system where autonomous economic actors can be authorized, discovered, priced, contracted with, paid, verified, and governed.

The most useful way to think about it now is as three planes, ten layers. The trust plane establishes who the agent is and what it has done. The market plane is how value clears between agents. The control plane is what the agent is allowed to do, what regulators require it to prove, and how multi-agent work coordinates at runtime.

The agent marketplace stack: three planes, ten layers

This is not the final architecture. The boundaries between layers are still soft, and the names will shift as the market matures. But the decomposition holds up under research: each layer maps to a distinct buyer, a distinct failure mode, and an emerging vendor category. Pieces of it already exist. Some are standards (ERC-8004), some are payment primitives, some are observability and identity fragments, some are policy engines being marketed as "control planes." They do not yet compose into a coherent system. Whoever assembles them first will not just build a useful product. They will own the transaction rails of machine labor.

ERC-8004 matters here not because it solves the whole stack, but because it is the first serious attempt to define a shared trust substrate that multiple layers can read from: identity at the base, discovery through registration, reputation through feedback, and validation at the edge of disputes. It is the thread that runs through the trust plane and into the market plane.

What follows is a layer-by-layer map of where each piece sits, what already exists, and what is still missing.

The Trust Plane

Three layers establish who an agent is, how it is found, and what it has done. Without these, no market can clear at all.

Layer 1: Identity (Know-Your-Agent)

You cannot have a market without knowing who is on the other side of the trade. For humans this is KYC -- a passport, a tax ID, a credit bureau. For agents it is unsolved.

The current state is identifiers without identity. An agent introduces itself as gpt-4.1-claims-processor-v3 and you have no way to verify that the weights match, that the system prompt is what it claims, or that the tools it can reach are the ones it advertises. A malicious operator can wrap a frontier model with a hostile prompt and route it onto the network as something else.

What the layer needs:

Model lineage attestations. A cryptographic proof tying a deployed agent back to a base model version, a fine-tuning run, and a hash of its system prompt. Anthropic and OpenAI both publish model cards, but neither signs them in a way another party can verify on-chain or off.
Tool permission manifests. A declaration of which APIs, databases, and bank accounts the agent can touch, signed by the deploying operator. The closest existing analog is OAuth scopes, which were not built for this.
Persistent agent DIDs. A decentralized identifier the agent owns across deployments. Cheqd and Sphereon are early. Most agents today have no stable identity between sessions, which makes reputation impossible.
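
A minimal sketch of the first two items, assuming a made-up manifest layout: the operator hashes the system prompt, lists the tools, and signs a canonical serialization so that any later change is detectable. The field names, the did:example identifier, and the key are hypothetical, and HMAC is only a standard-library stand-in for the asymmetric signature a real operator would use.

```python
import hashlib
import hmac
import json

# Hypothetical manifest fields; no standard schema is implied.
manifest = {
    "agent_id": "did:example:agent-123",
    "base_model": "example-model-v1",
    "system_prompt_sha256": hashlib.sha256(b"You process insurance claims.").hexdigest(),
    "tools": ["ocr.read", "claims_db.read"],
}

def canonical_bytes(doc: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash identical bytes."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign(doc: dict, key: bytes) -> str:
    # HMAC is a stand-in: a real deployment would use an asymmetric signature
    # so that verifiers never hold the operator's signing key.
    return hmac.new(key, canonical_bytes(doc), hashlib.sha256).hexdigest()

def verify(doc: dict, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(doc, key), signature)

operator_key = b"demo-key-not-for-production"
signature = sign(manifest, operator_key)

# An operator who quietly adds a tool after signing no longer verifies.
tampered = dict(manifest, tools=manifest["tools"] + ["bank.transfer"])

print(verify(manifest, signature, operator_key))  # True
print(verify(tampered, signature, operator_key))  # False
```

The point of the sketch is the shape, not the cryptography: the counterparty checks a signature over the declared prompt hash and tool list instead of trusting the agent's self-description.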

The most concrete attempt to standardize this layer is ERC-8004, a draft Ethereum standard published in August 2025 with authors from MetaMask, the Ethereum Foundation, Google, and Coinbase -- which is itself the news, because that author list spans the crypto-native and enterprise-native camps that usually ship in parallel. ERC-8004 defines an Identity Registry as an ERC-721 NFT where each agent owns a token whose URI resolves to a JSON registration file naming its A2A endpoint, MCP endpoint, ENS handle, wallet addresses across chains, and supported trust models. The token is transferable, the identity is portable, and the spec sits on top of MCP and A2A rather than competing with them. As of early 2026 the AgentProof team reports more than 128,000 agents registered across 24 chains. That is real adoption for a draft less than a year old. The remaining question is whether the inference providers sign attestations into that registry directly -- Apple-signs-apps style -- or stay agnostic and let operators self-attest. Today it's the latter, which is the soft underbelly of the whole stack.

Layer 2: Discovery and Capability Registry

Once you know who agents are, you need to find them. Today this is a flat list on a webpage with no machine-readable capability surface. An agent looking for an "OCR service that handles Vietnamese invoices, under 200ms p95, with HIPAA controls" cannot put that question to any existing registry and get a structured answer.

What the layer needs:

Structured capability declarations. Not "marketing assistant" but a typed interface: inputs, outputs, latency SLAs, jurisdictions, regulated-data handling. The MCP server registry pattern is the right shape but covers the wrong abstraction -- it indexes tools, not autonomous services.
Semantic search over those declarations. Vector search across capability embeddings, filterable by cost, latency, and compliance posture. A reverse Yellow Pages where the buyer is a machine.
Live availability and pricing. Agents go down, get rate-limited, or change pricing. The registry has to reflect state in seconds, not days.
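
To make the typed-interface idea concrete, here is a rough sketch of the Vietnamese-invoice query from above run against a toy in-memory registry. The Capability fields, the task label, and the find function are all hypothetical names; vector search is left out, and only the structured filters (availability, language, latency, compliance) are shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    """Hypothetical typed declaration; field names are illustrative only."""
    agent_id: str
    task: str
    languages: frozenset
    p95_latency_ms: int
    compliance: frozenset
    price_per_call_usd: float
    available: bool = True

def find(registry, task, language, p95_under_ms, required_compliance):
    """Return live agents that satisfy every constraint, cheapest first."""
    matches = [
        c for c in registry
        if c.available
        and c.task == task
        and language in c.languages
        and c.p95_latency_ms < p95_under_ms
        and required_compliance <= c.compliance  # subset test on frozensets
    ]
    return sorted(matches, key=lambda c: c.price_per_call_usd)

registry = [
    Capability("agent-a", "ocr.invoice", frozenset({"vi", "en"}), 180, frozenset({"HIPAA"}), 0.004),
    Capability("agent-b", "ocr.invoice", frozenset({"vi"}), 150, frozenset(), 0.002),
    Capability("agent-c", "ocr.invoice", frozenset({"vi"}), 190, frozenset({"HIPAA", "SOC2"}), 0.003),
    Capability("agent-d", "ocr.invoice", frozenset({"vi"}), 120, frozenset({"HIPAA"}), 0.001, available=False),
]

hits = find(registry, "ocr.invoice", "vi", 200, frozenset({"HIPAA"}))
print([c.agent_id for c in hits])  # ['agent-c', 'agent-a']
```

agent-b is cheaper but lacks the compliance flag, and agent-d is cheapest but offline, which is why live state belongs in the registry rather than on a static page.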

Anthropic's MCP and Google's A2A protocol are the foundation. Neither is a marketplace; both are interop standards. ERC-8004 is where those standards start to become economically useful. Its registration file is the closest thing to a structured capability declaration in production -- each agent publishes its name, description, A2A and MCP endpoints, supported trust models, and arbitrary on-chain metadata via setMetadata(agentId, key, value). That matters because it turns identity from a static badge into an addressable market object: something another agent can discover, inspect, and route work toward. It's discovery primitives, not a search engine. The marketplace -- the part that ranks, filters, and prices against those declarations -- sits on top of them and does not yet exist as a category-defining product.

Layer 3: Reputation

Markets clear on price when goods are commoditized. They clear on reputation when they aren't. Agent work, today, is not commoditized -- outputs vary in quality, agents drift, models get deprecated. Reputation is load-bearing, and it belongs in the trust plane because it is what allows a counterparty to commit before any contract is signed.

What the layer needs:

Outcome track records. Cryptographically attested job histories: this agent completed 12,400 OCR jobs, average accuracy 0.94, average latency 220ms, disputed rate 0.3%. Signed by the buyers who paid for the work.
Cross-marketplace portability. An agent that builds reputation on one marketplace should be able to carry it to another. This requires the DID layer (1) and a common data schema. Today no marketplace lets reputation out, because reputation is the moat.
Sybil resistance. An operator spinning up 10,000 fresh agents to flood low-bid auctions has to be detectable. Costly identity (staked collateral, attested compute spend) is one answer; behavioral fingerprinting is another. Both will exist.
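
As an illustration of an outcome track record, the sketch below rolls buyer-reported job records up into the kind of summary described above. The JobRecord fields are assumptions, signature checking is omitted, and the distinct-buyer count is only a crude stand-in for real Sybil analysis.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass(frozen=True)
class JobRecord:
    """Hypothetical buyer-attested record; signature verification is omitted."""
    buyer_id: str
    accuracy: float
    latency_ms: int
    disputed: bool

def summarize(records):
    """Aggregate raw job records into a track-record summary."""
    if not records:
        raise ValueError("no job history to summarize")
    return {
        "jobs": len(records),
        "avg_accuracy": round(mean(r.accuracy for r in records), 4),
        "avg_latency_ms": round(mean(r.latency_ms for r in records), 1),
        "dispute_rate": round(sum(r.disputed for r in records) / len(records), 4),
        # Many jobs from very few buyers is a weak hint of self-dealing.
        "distinct_buyers": len({r.buyer_id for r in records}),
    }

history = [
    JobRecord("buyer-1", 0.95, 200, False),
    JobRecord("buyer-2", 0.93, 240, False),
    JobRecord("buyer-1", 0.94, 220, True),
    JobRecord("buyer-3", 0.94, 220, False),
]

print(summarize(history))
```

A scoring layer would sit on top of a summary like this; the summary itself is just the raw material.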

ERC-8004's Reputation Registry is the most concrete attempt: a standard interface for posting and fetching feedback signals, with scoring and aggregation happening both on-chain (for composability) and off-chain (for sophisticated algorithms). This is where the standard begins to look less like identity infrastructure and more like market infrastructure. The registry is intentionally thin -- it stores the feedback, not the score. The scoring layer on top, the one that turns raw feedback into something a buyer agent actually queries before committing capital, is wide open. Crypto-native attempts exist (EAS attestations, Karma, Gitcoin Passport, and AgentProof building directly on ERC-8004's reputation primitives). The boring enterprise version -- call it Moody's for Machines, a rating agency that consumes the registry and outputs investment-grade scores with explainable methodology -- has not been built and is one of the most valuable openings in the stack.

The Market Plane

Four layers move value between agents. This is where price gets discovered, obligations get encoded, money moves, and disputes get resolved.

Layer 4: Quoting and Price Discovery

This is the layer that does not exist anywhere today and is the most economically interesting.

Human services price in three ways: hourly rates, fixed scope, or outcome (contingency, success fees). Agent services don't fit any of them cleanly. Inference is cheap and getting cheaper. The interesting price is not "per token" but "per result of acceptable quality."

What the layer needs:

Real-time RFQ. A buyer agent broadcasts a job spec; provider agents respond with quotes inside 100ms. Same shape as institutional credit markets -- Tradeweb for cognitive work.
Outcome contracts with verifiable completion. "I'll generate 50 qualified leads for $X, you pay only on conversion verified by a third-party agent." The verifier is itself a marketplace participant. This is where the model breaks from SaaS: SaaS prices the tool, agents price the outcome.
Auction mechanics for fungible work. Combinatorial auctions for parallel jobs, Dutch auctions for time-sensitive ones. Existing ad-tech is the closest blueprint -- agent work clears the same way display impressions do, with first-price sealed bids resolved in milliseconds.
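
The simplest version of this clearing step can be sketched as a first-price sealed-bid selection: every provider submits one quote, quotes that miss the job spec are dropped, and the cheapest survivor wins at its own bid. The Quote fields and the clear_rfq name are hypothetical, and the sketch trusts the promised accuracy and latency rather than verifying them.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Quote:
    """Hypothetical sealed quote from one provider agent."""
    provider_id: str
    price_usd: float
    promised_accuracy: float
    promised_latency_ms: int

def clear_rfq(quotes, min_accuracy, max_latency_ms, budget_usd) -> Optional[Quote]:
    """First-price sealed bid: the cheapest eligible quote wins and is paid its own bid."""
    eligible = [
        q for q in quotes
        if q.promised_accuracy >= min_accuracy
        and q.promised_latency_ms <= max_latency_ms
        and q.price_usd <= budget_usd
    ]
    if not eligible:
        return None
    # Ties on price are broken by provider id so the outcome is deterministic.
    return min(eligible, key=lambda q: (q.price_usd, q.provider_id))

quotes = [
    Quote("prov-a", 0.010, 0.95, 300),
    Quote("prov-b", 0.006, 0.90, 250),  # cheapest, but below the accuracy bar
    Quote("prov-c", 0.008, 0.93, 400),
    Quote("prov-d", 0.008, 0.96, 450),
]

winner = clear_rfq(quotes, min_accuracy=0.92, max_latency_ms=500, budget_usd=0.02)
print(winner.provider_id, winner.price_usd)  # prov-c 0.008
```

Pricing "per result of acceptable quality" shows up here as the eligibility filter: the lowest raw price loses if it cannot meet the quality bar.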

The companies positioning here are mostly stealthed. The public ones -- CrewAI, AgentOps, a few others -- are orchestration and observability vendors, not market makers. The opening is large.

Layer 5: Contracting

When two agents agree on price, what is the contract? Today the answer is usually "nothing" -- a prompt, an API call, a payment authorization, and an assumption that both sides mean the same thing. That is enough for a demo. It is not enough for a market.

Contracting is the layer that turns a quote into an obligation. It is where the marketplace stops being a matching engine and starts becoming infrastructure. Price tells you what one side is willing to pay. A contract tells you what the other side is now required to deliver, under what conditions, by what deadline, with what remedies if the job fails. Without that transition, every transaction remains informal. Informal transactions do not scale because every edge case becomes a custom negotiation.

The important distinction is that contracting is not settlement. Settlement answers: did money move? Contracting answers: what exactly was that money supposed to buy? Those are different questions. You can settle a payment instantly and still have no shared understanding of the task, the acceptable quality threshold, the data-handling rules, or the recourse if the output is wrong. In agent markets, that ambiguity is lethal because machines will execute exactly what is encoded and nothing that is merely implied.

What the layer needs:

Machine-readable MSAs. A standardized contract format that captures scope, deliverable, deadline, payment terms, data rights, liability allocation, and remedies, signed by both parties' DIDs. Ethereum's EIP-712 typed data is the right cryptographic primitive; the legal layer is unbuilt.
Encoded SLAs. "99% uptime, p95 latency under 500ms, accuracy above 0.92 measured by these three eval prompts." The eval prompts themselves get signed and stored as part of the contract record, so the quality bar is not negotiated after the fact.
Proof of authority. A contract is only enforceable if the agent that signed it had authority to bind its principal. That proof lives in the governance layer (Layer 8); contracting consumes it. Without that link, every signed contract is one ultra vires claim away from worthless.
Termination conditions. What kills the contract: budget exceeded, quality drop, repeated policy violations, sanctions exposure, or revoked tool permissions. These need to be enforceable by a third party without either side's cooperation.
State transitions. The contract has to know whether the job is quoted, accepted, in progress, delivered, verified, disputed, or terminated. That sounds operational, but it is actually contractual: each state changes what each party is allowed to do next.
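
The state-transition requirement can be sketched as a small state machine. The seven states come from the list above; the specific transition table is an assumption made for illustration, and a real contract format would negotiate its own.

```python
from enum import Enum

class State(Enum):
    QUOTED = "quoted"
    ACCEPTED = "accepted"
    IN_PROGRESS = "in_progress"
    DELIVERED = "delivered"
    VERIFIED = "verified"
    DISPUTED = "disputed"
    TERMINATED = "terminated"

# Hypothetical transition table: which states may follow which.
ALLOWED = {
    State.QUOTED: {State.ACCEPTED, State.TERMINATED},
    State.ACCEPTED: {State.IN_PROGRESS, State.TERMINATED},
    State.IN_PROGRESS: {State.DELIVERED, State.TERMINATED},
    State.DELIVERED: {State.VERIFIED, State.DISPUTED},
    State.DISPUTED: {State.VERIFIED, State.TERMINATED},
    State.VERIFIED: set(),    # terminal: payment can be released
    State.TERMINATED: set(),  # terminal: no further obligations
}

def advance(current: State, target: State) -> State:
    """Move the contract to target, or raise if the move is not permitted."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target

state = State.QUOTED
for step in (State.ACCEPTED, State.IN_PROGRESS, State.DELIVERED, State.VERIFIED):
    state = advance(state, step)
print(state.value)  # verified
```

Under this table a provider cannot jump from accepted straight to verified, and a delivered job cannot be silently terminated without passing through a dispute, which is the sense in which each state changes what a party may do next.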

This matters because most agent work is not binary. A task can be delivered on time but below threshold. It can be directionally correct but non-compliant. It can satisfy the visible prompt while violating a hidden constraint like data residency, budget ceiling, or approval chain. Human vendors handle this through lawyers, sales calls, and ambiguous statements of work. Agents cannot. They need terms that are explicit enough for code to enforce and flexible enough for commercial reality.

A real contracting layer would likely look less like a PDF and more like a hybrid object: part legal agreement, part workflow schema, part policy bundle. One section defines the economic terms. Another defines the evaluation criteria. Another specifies which datasets, APIs, or jurisdictions are allowed. Another names the verifier and the dispute path. In effect, the contract becomes a portable execution wrapper for the job itself.

Layer 6: Settlement

Money moves on agent timescales (per-call, per-result, per-second), not human timescales (Net 30). This is the layer with the most public activity and the most confusion.

The contenders, by approach:

Stablecoin rails. Coinbase's x402 protocol, Stripe's Bridge acquisition, Circle's CCTP. Sub-second settlement, programmable, no chargebacks. The legal status of an agent holding a stablecoin balance is unsettled in most jurisdictions, which is why this lane is more crypto-native than the others.
Agent-issued cards. Visa Intelligent Commerce, Mastercard Agent Pay, Stripe Issuing virtual cards scoped to a single transaction with a single merchant. These are bank rails dressed up for agents. They work today, are jurisdictionally clean, and have chargebacks -- which is both a feature (recourse) and a bug (the merchant agent never knows if revenue is final).
Direct-debit and ACH. Mercury, Brex, Column Bank. Cheap and slow, with recourse limited to ACH return and reversal rules rather than card-style chargebacks. Fine for B2B agent-to-agent invoicing on weekly cycles, useless for per-call settlement.

The right architecture is layered: stablecoins for high-frequency agent-to-agent, virtual cards for agent-to-merchant, ACH for periodic sweeps and treasury management. No single rail wins. Whoever builds the abstraction layer over all three -- pay(amount, currency, counterparty, settlement_window) and the system picks the cheapest rail -- wins the developer surface.
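
A rough sketch of that abstraction, keeping the pay(amount, currency, counterparty, settlement_window) signature from the paragraph above. Everything else is assumed for illustration: the Rail and Counterparty types, the fee and settlement-time numbers (which are not quotes from any real provider), and the choice to express settlement_window in seconds. The function only selects a rail; it does not move money.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Counterparty:
    party_id: str
    kind: str  # "agent" or "merchant"

@dataclass(frozen=True)
class Rail:
    name: str
    fixed_fee: float
    pct_fee: float
    settle_seconds: int
    serves: frozenset

# Illustrative numbers only; they are not quotes from any real provider.
RAILS = (
    Rail("stablecoin", 0.001, 0.0, 2, frozenset({"agent"})),
    Rail("virtual_card", 0.30, 0.029, 5, frozenset({"merchant"})),
    Rail("ach", 0.25, 0.0, 2 * 86_400, frozenset({"agent", "merchant"})),
)

def pay(amount, currency, counterparty, settlement_window):
    """Pick the cheapest rail that serves the counterparty within the window (seconds)."""
    if currency != "USD":
        raise ValueError("this sketch only models USD")
    eligible = [
        r for r in RAILS
        if counterparty.kind in r.serves and r.settle_seconds <= settlement_window
    ]
    if not eligible:
        raise ValueError("no rail can settle inside the requested window")

    def cost(rail):
        return rail.fixed_fee + rail.pct_fee * amount

    best = min(eligible, key=cost)
    return {"rail": best.name, "fee": round(cost(best), 4),
            "amount": amount, "currency": currency, "to": counterparty.party_id}

ocr_agent = Counterparty("agent-ocr-7", "agent")
supplier = Counterparty("merchant-42", "merchant")

print(pay(0.02, "USD", ocr_agent, 10)["rail"])           # stablecoin
print(pay(50.0, "USD", supplier, 60)["rail"])            # virtual_card
print(pay(5000.0, "USD", supplier, 7 * 86_400)["rail"])  # ach
```

The same merchant is routed to a card when the window is a minute and to ACH when the window is a week, which is the layered behavior the paragraph describes.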

Layer 7: Dispute Resolution

When an agent doesn't deliver, who decides? The honest answer today is "the human operator on the slow side of the trade emails the other operator." That doesn't scale to a million agentic LLCs.

What the layer needs:

Escrow as default. Payment held by the settlement layer until the SLA conditions in the contract (5) are met, verified by attested outputs from the runtime (10).
Automated remediation paths. Quality below threshold → automatic refund. Late delivery → automatic penalty. These are clauses in the contract, executed by code, not by a courtroom.
Validators and arbitration agents. For ambiguous disputes -- was the output "good enough"? -- a third-party validator is appointed (randomly, or per the contract's named arbiter) and renders judgment. ERC-8004's Validation Registry is the standardized hook here, and it is deliberately unopinionated about how validation happens: stake-secured re-execution of the job, zkML proofs of correct inference, TEE attestations from a trusted enclave, or human-in-the-loop judges. That flexibility is important. It means the same trust substrate can support cheap low-stakes agent jobs and high-stakes regulated workflows without forcing one validation model onto every market. The arbiter is itself a marketplace participant with its own reputation, and the choice of trust model scales with the value at risk.
Human escalation, rarely. A small number of cases will need humans. This is what Kleros and similar crypto-native dispute markets were built for; they were too early and built for the wrong customer. The real customer is the autonomous LLC's registered agent.
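
The automated remediation paths can be sketched as a pure function over the escrowed amount. The clause values here (a full refund on a quality miss, a late penalty expressed in basis points) are hypothetical contract terms, and amounts are kept in integer cents to avoid rounding drift.

```python
def release_escrow(amount_cents, accuracy, min_accuracy, seconds_late, late_penalty_bps):
    """Split an escrowed payment according to hypothetical contract clauses.

    - accuracy below the agreed threshold: full refund to the buyer
    - late delivery: provider is paid minus a penalty (basis points of the amount)
    - otherwise: provider is paid in full
    """
    if accuracy < min_accuracy:
        return {"provider": 0, "buyer": amount_cents, "reason": "quality_below_threshold"}
    if seconds_late > 0:
        penalty = amount_cents * late_penalty_bps // 10_000
        return {"provider": amount_cents - penalty, "buyer": penalty, "reason": "late_delivery"}
    return {"provider": amount_cents, "buyer": 0, "reason": "sla_met"}

print(release_escrow(10_000, 0.95, 0.92, 0, 1_000))    # paid in full
print(release_escrow(10_000, 0.95, 0.92, 120, 1_000))  # 10% late penalty
print(release_escrow(10_000, 0.90, 0.92, 0, 1_000))    # full refund
```

Only the cases this function cannot decide, such as a contested accuracy measurement, would need to go to a validator or a human.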

The Control Plane

Three layers decide what the agent is allowed to do, what regulators require it to prove, and how the work actually runs. The trust plane and the market plane assume the agent has the authority to act and the ability to execute. The control plane is what makes those assumptions true.

The category language is already settling: vendors selling into this plane uniformly market themselves as "control planes for AI agents." That naming convergence inside twelve months -- Cordum, Aegis AI, Assury Enforce, Galileo, Microsoft's open-source toolkit -- is the strongest evidence that this is a real architectural plane and not a cross-cutting concern.

Layer 8: Governance and Authority

This is the layer that decides what an agent can do, who authorized it, and what happens when it tries to do something outside scope. For most of 2024 and early 2025 this was treated as a prompting problem -- write a better system prompt, list the forbidden actions, hope the model complies. That framing has collapsed. Governance is now a runtime authorization problem and a legal authority problem, and the market is treating it as such.

What the layer needs:

Policy engines that enforce before execution, not after logging. Cordum, Aegis AI, Assury Enforce, PolicyLayer, Galileo Agent Control, and Microsoft's open-source Agent Governance Toolkit are all selling the same primitive: a policy check sits inline between the agent and the action, evaluates the proposed call against declared policy, and approves, escalates to a human, or blocks. Gartner is publicly predicting that more than 40% of agentic AI projects will be cancelled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls. When a category gets its own term of art and its own Gartner failure-mode stat in under a year, it is a layer.
Spending limits and delegated authority. This is the most concrete part of governance and the one with the most mature primitives -- almost all of them shipped first on the crypto side. Safe allowance modules (caps per token per spender, one-time or recurring), Zodiac role-based modules, hierarchical multisig with security councils and timelocks, ERC-4337 session keys with scoped time-bounded permissions, and MPC wallets from Turnkey, Privy, and Fireblocks all map cleanly onto enterprise agent governance. Safe's own documentation explicitly recommends 2-of-4 multisig with human signers for agent-controlled treasuries. The pattern is M-of-N approval for any action above a threshold, automatic execution below it.
Approval chains and human-in-the-loop gates. Below $X, autonomous. Between $X and $Y, requires one human approval. Above $Y, requires two approvals and a 24-hour timelock. These rules are not legal documents; they are policy code that the runtime reads before every action.
Kill switches and rollback. Any deployed agent needs an out-of-band revocation path that does not depend on the agent being cooperative. This is closer to corporate governance (board override) than to software engineering (kill -9).
Proof of authority that downstream layers can consume. When the contracting layer (5) needs to know whether the agent signing an MSA actually has the right to bind its principal, it queries the governance layer. This is the bridge between technical permissions and legal authority.
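
The approval-chain rule above (autonomous below $X, one approval between $X and $Y, two approvals plus a 24-hour timelock above $Y) can be sketched as policy code. The class name, the thresholds used in the example, and the treatment of amounts exactly at $X or $Y as needing one approval are assumptions of this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ApprovalPolicy:
    """Hypothetical spend policy that a runtime would consult before every action."""
    autonomous_below: float     # $X
    dual_approval_above: float  # $Y
    timelock_hours: int = 24

    def requirements(self, amount):
        """Return (human approvals needed, hours that must elapse) for an amount."""
        if amount < self.autonomous_below:
            return (0, 0)
        if amount <= self.dual_approval_above:
            return (1, 0)
        return (2, self.timelock_hours)

    def may_execute(self, amount, approvals, hours_waited):
        """True only when both the approval count and the timelock are satisfied."""
        need_approvals, need_hours = self.requirements(amount)
        return approvals >= need_approvals and hours_waited >= need_hours

policy = ApprovalPolicy(autonomous_below=100.0, dual_approval_above=10_000.0)

print(policy.may_execute(40.0, approvals=0, hours_waited=0))       # True
print(policy.may_execute(2_500.0, approvals=0, hours_waited=0))    # False
print(policy.may_execute(50_000.0, approvals=2, hours_waited=3))   # False
print(policy.may_execute(50_000.0, approvals=2, hours_waited=24))  # True
```

Because the check is a pure function of the amount, the approvals, and the elapsed time, the same policy object could in principle be evaluated by an enterprise policy engine or mirrored in an on-chain module.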

That last point is the deepest piece, and it is genuinely unsolved. Major law firms -- Venable, DLA Piper, Frankfurt Kurnit -- are now publishing on agentic AI liability. The Arbel/Goldstein/Salib paper "How to Count AIs: Individuation and Liability for AI Agents" (2026) treats it as an open research problem in agency law. In Moffatt v. Air Canada (February 2024), British Columbia's Civil Resolution Tribunal held the airline liable for a promise its chatbot made to a customer, establishing that companies are bound by their agents' representations even when the company never approved the specific statement. The architectural primitive underneath all of this is authorized scope -- a machine-readable, signed declaration of what the agent is allowed to bind its principal to. Without that, every agent-signed contract is one ultra vires claim away from worthless. With it, the contracting layer becomes enforceable and the dispute layer has something to arbitrate against.

Governance is also where the crypto-native and enterprise-native versions of this stack diverge most visibly. The crypto version is Safe modules, on-chain timelocks, and ERC-4337 session keys -- programmable, auditable, transferable across organizations. The enterprise version is Cordum, Galileo, Microsoft's toolkit, and Airia -- SOC 2-friendly, IAM-integrated, runs against existing identity providers. Both will exist. The interesting bet is whether the two converge on a shared policy format (think OPA Rego for agents) or stay in parallel forever.

Layer 9: Compliance

For a brief moment compliance looked like a sub-feature of governance. That moment is over. Compliance is splitting out as its own vendor category, and the reason is regulatory: the EU AI Act's high-risk obligations come into effect on August 2, 2026, and Article 99 sets penalties of up to €15 million or 3% of global annual turnover for breaching them, rising to €35 million or 7% for prohibited practices.

The distinction between governance and compliance is real and worth holding tight. Governance answers: what is this organization willing to let the agent do? Compliance answers: what is this organization required to prove to a regulator, after the fact, about what the agent did? The data pipeline is similar -- both consume the same observability stream -- but the buyer is different (CISO or head of platform vs general counsel), the SLA is different (real-time enforcement vs evidentiary completeness), and the failure mode is different (bad business outcome vs regulatory fine).

The vendor signal is already clear:

EU AI Act runtime enforcement. ComplyEdge is shipping an open-source enforcement layer specifically for Articles 9 (risk management), 12 (logging), 13 (transparency), and 14 (human oversight). Lucairn, Agent Module, and AgentWorks are all positioning in the same lane. The aicompliancevendors.com directory exists, which is its own signal -- there are now enough of them to warrant a directory.
Audit-grade logging. Articles 12 and 19 of the EU AI Act require automatic event logging that allows regulators to reconstruct the system's behavior after the fact. That obligation is similar in shape to the SOX requirements that built the GRC industry, and the same kind of vendor stack is likely to emerge.
Sector-specific overlays. Healthcare workflows need HIPAA, financial workflows need SR 11-7 / OCC 2011-12 model risk, EU workflows need the AI Act, employment workflows need anti-discrimination law. These are not different products; they are different policy bundles running on the same compliance runtime.
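
One way to make an event log reconstructable and tamper-evident is to chain each entry to the hash of the one before it. The sketch below shows that technique with hypothetical event fields; hash chaining is an assumption of this example, not something the text above attributes to the AI Act, and the sketch makes no claim to satisfy any regulation.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(prev_hash, event):
    """Hash the previous entry's hash together with a canonical form of the event."""
    body = json.dumps({"prev": prev_hash, "event": event},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append_event(log, event):
    """Append an event, linking it to the hash of the entry before it."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev, "event": event, "hash": _digest(prev, event)}
    log.append(entry)
    return entry

def verify_chain(log):
    """Recompute every link; any edited, dropped, or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True

log = []
append_event(log, {"agent": "agent-123", "action": "tool_call", "tool": "claims_db.read"})
append_event(log, {"agent": "agent-123", "action": "policy_check", "result": "approved"})
append_event(log, {"agent": "agent-123", "action": "payment", "amount_cents": 500})

print(verify_chain(log))                  # True
log[1]["event"]["result"] = "escalated"   # rewrite history after the fact
print(verify_chain(log))                  # False
```

An evidence pack built from a log like this lets an auditor check that the record was not edited after the fact, although it says nothing about whether the events were logged honestly in the first place.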

Compliance vendors do not displace governance vendors. They consume governance's policy decisions and the runtime's logs, and they produce evidence packs. Over time the most likely consolidation is that the largest governance platforms acquire or absorb compliance vendors, the same way GRC platforms consolidated audit, risk, and compliance into a single tool. But for the next 24 to 36 months, this is a real and separate buying motion.

Layer 10: Orchestration and Runtime

Agents need somewhere to run, somewhere to remember, somewhere to be watched, and a way to hand work to each other. The first three of those are runtime. The fourth is orchestration. The original framing of this stack folded them into one layer, which was wrong. They are different concerns, different vendors, and increasingly different procurement decisions.

Runtime. Modal, Anthropic's Claude in compute environments, OpenAI's Agents SDK, E2B, Daytona. The differentiation is collapsing into price, cold-start time, and tool-permission models.

Memory. Mem0, Letta (née MemGPT), Zep. Vector stores plus episodic structure. The interesting question is whether agent memory is portable across runtimes -- currently it isn't, which is why most agents are tied to one platform.

Observability. Langfuse, Helicone, Arize, Braintrust, LangSmith. Trace every call, every tool invocation, every prompt. For the marketplace this is the audit trail that backs reputation (3), compliance (9), and dispute resolution (7).

Orchestration. LangGraph, CrewAI, AutoGen, OpenAI Swarm, Google ADK, Anthropic Skills, Microsoft Semantic Kernel. Dimension Market Research is publishing standalone 2026-2035 forecasts for AI Agent Orchestration as a category, separate from runtime infrastructure. Around 86% of enterprise copilot spending in 2026 -- roughly $7.2 billion -- goes to agent-based systems, and 59% of organizations are running three or more LLMs that need to coordinate. The economic shape is clear: orchestration is where multi-agent state, handoffs, parallelism, and failure recovery get encoded. Runtime is where a single agent executes; orchestration is how a network of them does work together. Conflating the two understates how much of the buying decision is now about coordination, not execution.

The marketplace itself does not have to build any of these. It does have to standardize the export format so any of them can feed reputation, compliance, and dispute resolution. OpenTelemetry-for-agents is the missing standard.

What This Means Strategically

The ten layers compose. Identity unlocks reputation. Reputation unlocks pricing. Pricing unlocks contracting. Contracting unlocks settlement. Settlement plus governance unlocks enforceable disputes. Each one is a real product. Several are real companies that don't yet exist. The control plane is the part that is moving fastest right now in raw vendor count, but the trust plane is where the standards are being set and the market plane is where the dollars eventually flow.

Three observations on the competitive shape:

The platform play requires three layers, not two. A year ago I would have said whoever owns identity and settlement wins, because those are the layers where switching costs accrue. That is still true, but it is no longer sufficient. Governance has earned a seat at the table because enterprises cannot deploy agents without it and because the legal authority primitive -- proof of what an agent can bind its principal to -- has to live somewhere. The new platform claim is: whoever owns identity, settlement, and governance owns the marketplace. Everything else can be commoditized around them.

The crypto-native and enterprise-native stacks are converging at the trust plane and diverging at the control plane. ERC-8004 is the convergence story: Coinbase and Google co-authoring an Ethereum ERC for agent identity is not how those camps usually behave, and the spec reaches into discovery, reputation, and validation rather than staying confined to identity. The control plane is the divergence story. The crypto version of governance is Safe modules, on-chain timelocks, and session keys; the enterprise version is Cordum, Galileo, Microsoft's toolkit, and Airia. Both will exist. The dominant version inside any specific industry depends on whether the buyers are agents-paying-agents (crypto wins) or agents-paying-businesses (enterprise wins). For the next five years, more dollars flow through the second -- but the second is now reading from the first's identity and reputation layer, which it wasn't six months ago.

The most undervalued layer right now is still Layer 4 (quoting). Settlement gets the press, identity gets the standards work, governance has a wave of funded startups, observability has fifteen. Real-time machine-to-machine price discovery is a TAM measured in low-trillions of dollars of cognitive labor reallocating to software, and almost nobody is building it as a primary product. The closest analog -- programmatic ad exchanges -- built three of the largest companies of the last twenty years. The agent-work version will too.

What matters now is less whether each individual layer is technically possible and more whether anyone assembles them into a system that can actually clear transactions at scale. That is the missing move. The industry has standards efforts, payment primitives, observability vendors, identity experiments, early reputation registries, and a fresh crop of governance control planes. It does not yet have a marketplace architecture that makes those pieces interoperable, legible, and economically coherent.

That is why this stack matters. It is not just a map of tools. It is a map of bottlenecks. Every missing layer keeps agents trapped as demos, copilots, or workflow wrappers. Every solved layer pushes them one step closer to becoming autonomous counterparties that can discover work, price risk, contract, settle, prove authority, and build durable economic history.

The prize is not another app store for prompts. It is the transaction layer for machine labor.

And the company that wins it will not look like a model lab. It will look like a market operator: part exchange, part payments network, part identity provider, part trust infrastructure, part governance platform. In other words, less like OpenAI and more like Visa, Moody's, Stripe, Nasdaq, and ServiceNow fused into a single system for software actors.

That company does not exist yet. But the constraints around it are now visible, and the standards substrate is starting to take shape underneath them. ERC-8004 is not the marketplace, but it may prove to be one of the first schemas the marketplace is built on. And once a market's bottlenecks become legible -- across the trust plane, the market plane, and the control plane -- they usually do not stay open for long.