experiment · integration layer · brief

MCP is the integration layer your AI strategy lives or dies on.

Model Context Protocol is moving from interesting standard to the layer where every serious AI workspace in legal converges. The protocol is real, and the major clients pull in the same direction. But tools-only MCP servers aren't enough, and renting the integration layer piecemeal from rotating vendors is a structural mistake. This is the brief for firm leaders on what MCP actually offers, why a firm-hosted gateway is the right shape, and what safe means in legal-specific terms.

§ 01 — premise

MCP is a two-sided standard, and that changes the buyer questions.

Model Context Protocol is to AI assistants what ODBC was to databases or SAML was to identity — a vendor-neutral interface that lets any compliant client talk to any compliant server. The problem it solves is the M-by-N problem: with M AI applications and N data sources, the naive integration cost is M × N. With MCP, it collapses to M + N. Three assistants against eight internal systems means twenty-four bespoke integrations the naive way, eleven with the standard. The firm switches assistants in two years, or runs two side by side, and the integration work doesn't get redone.

The less-obvious half is that MCP is a two-sided standard. A compliant client has obligations — declaring capabilities accurately, honoring server requests for elicitation or sampling, rendering structured results faithfully. A compliant server has corresponding obligations. Both sides have to do their part, and the protocol uses SHOULD almost everywhere it could use MUST — by design, so it can evolve quickly across many vendors. The cost is real: a server author can do everything right and still produce inconsistent quality across clients.

The buyer implication. "Supports MCP" is now table stakes. The better questions are which parts of the spec does this implement, with what fidelity, and what happens when the other side does or doesn't reciprocate? That applies to evaluating an AI workspace and to evaluating a vendor MCP server.

§ 02 — beyond tools

Tools tell the model what. Prompts, resources, and instructions tell it how.

A tool definition tells the model what it can do. It does not tell the model when to do it, why, or how to interpret the result. In any specialized domain — and legal is the canonical specialized domain — that contextual layer is where most of the value lives.

Most published MCP servers exercise roughly a third of the spec. They expose tools and stop. The rest of the spec is where the firm's intellectual capital fits:

  • Server instructions — system-prompt-level guidance the server returns at initialization: "This server provides access to the firm's billing system. Always confirm matter context. Cite by client-matter ID. Do not summarize beyond what the user asked for." The most direct mechanism in the spec, and the most unevenly implemented by clients — worth budgeting for reinforcement through other channels.
  • Prompts — named, parameterized templates the server publishes. nda_review, matter_intake, precedent_search — each one encapsulating the firm's standard playbook for that task.
  • Resources — structured references the server can hand to the model: redline conventions, reps and warranties libraries, privilege-marking rules, opinion-letter templates the deal team uses.

When you scope an MCP server, plan the instructions, prompts, and resources before you plan the tools. The tools are the easy part. Everything around them is how the firm expresses its standards in a form the assistant can actually follow — and the part that compounds: every prompt or resource you add makes every existing tool more useful in every client that supports it as designed.
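
For concreteness, a minimal sketch of that scoping in the TypeScript MCP SDK (@modelcontextprotocol/sdk). The server name, the nda_review arguments, and the firm:// resource URI are illustrative assumptions; the calls follow the SDK's high-level registerPrompt and registerResource surface at the time of writing.

  import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
  import { z } from "zod";

  // Server instructions: system-prompt-level guidance returned at initialization.
  const server = new McpServer(
    { name: "firm-billing", version: "1.0.0" },
    {
      instructions:
        "This server provides access to the firm's billing system. " +
        "Always confirm matter context. Cite by client-matter ID. " +
        "Do not summarize beyond what the user asked for.",
    }
  );

  // A prompt: a named, parameterized template encoding the firm's playbook.
  server.registerPrompt(
    "nda_review",
    {
      description: "Review an NDA against the firm's standard positions.",
      argsSchema: { counterparty: z.string(), governing_law: z.string() },
    },
    ({ counterparty, governing_law }) => ({
      messages: [{
        role: "user",
        content: {
          type: "text",
          text: `Review this NDA with ${counterparty} (governing law: ` +
            `${governing_law}) against the firm's standard positions. ` +
            `Flag deviations; do not redraft without asking.`,
        },
      }],
    })
  );

  // A resource: structured reference material the client can hand to the model.
  server.registerResource(
    "redline-conventions",
    "firm://standards/redline-conventions",
    { description: "The firm's redline and markup conventions.", mimeType: "text/markdown" },
    async (uri) => ({
      contents: [{ uri: uri.href, text: "## Redline conventions\n..." }],
    })
  );

The instructions string is the same guidance quoted in the first bullet; clients vary in how faithfully they apply it, which is exactly why the prompt and resource channels are worth populating too.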

§ 03 — own the layer

The firm-hosted gateway pattern.

Survey the vendor MCP servers available today and the same pattern repeats: five to ten thinly described tools, no prompts, no resources, no elicitation, sometimes no auth worth the name. Loose schemas, minimal structured output. Documentation that explains how to install the server but not how to use it well. Plug three of those into an assistant and the failure modes compound:

  • Capability collisions. Two servers expose search. Three expose get_user. The model picks wrong, often.
  • Auth fragmentation. Each server has its own OAuth dance, its own token lifetime. The user gets prompted constantly, or integrations silently degrade.
  • Inconsistent identity. The server doesn't actually know which user is asking; it only knows which OAuth client connected. Ethical walls become hopeful, not enforced.
  • Audit blind spots. Each vendor logs differently, in different places, with different retention. There is no single ledger of what the assistant did today on whose behalf.

The right architectural pattern is a firm-owned MCP gateway: one HTTPS endpoint, trusted by the firm's AI clients, that aggregates internal capabilities and selectively proxies vetted external ones. Same logic that produced the API gateway pattern fifteen years ago.

What the gateway centralizes: identity (SSO once, propagate downstream), authorization (ethical walls and role-based access applied once, not re-implemented per vendor), audit (every tool call into a single ledger that maps to the firm's existing supervision tooling), observability, policy (PII redaction, privilege markers, blocklists, retention rules), rate limiting, and tool registry filtering — show the assistant only the tools that this user, in this matter context, is allowed to use. The model can't misuse a tool it never sees.
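
A sketch of the registry-filtering idea in TypeScript, independent of any SDK. The User and ToolEntry shapes and the wall check are hypothetical stand-ins for the firm's real entitlement and wall systems.

  // Hypothetical types standing in for the firm's real entitlement model.
  type User = { id: string; roles: string[] };
  type ToolEntry = {
    name: string;
    requiredRole?: string;   // e.g. "billing:read"
    matterScoped: boolean;   // subject to the ethical-wall check
  };

  // Stand-in for a call to the firm's authoritative wall source.
  function passesEthicalWall(user: User, matterId: string): boolean {
    return !matterId.startsWith("walled-");  // illustrative only
  }

  // The gateway answers tools/list with only the tools this user, in this
  // matter context, is allowed to use.
  function visibleTools(all: ToolEntry[], user: User, matterId: string): ToolEntry[] {
    return all.filter((t) => {
      if (t.requiredRole && !user.roles.includes(t.requiredRole)) return false;
      if (t.matterScoped && !passesEthicalWall(user, matterId)) return false;
      return true;
    });
  }

Filtering at list time is cheap insurance, but it complements rather than replaces the per-call check in § 06: visibility rules can go stale between list and call.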

Done well, the gateway becomes the firm's official AI integration substrate. Vendors plug into it, not into every assistant separately. New AI products that come along get one well-governed front door instead of having to rebuild trust from scratch.

§ 04 — not just AI

MCP is a general-purpose substrate. The AI client is one consumer.

Nothing in MCP requires the consumer to be an LLM. The protocol itself is JSON-RPC over HTTP with capability discovery, structured input and output schemas, and OAuth-based auth. A workflow engine, an RPA tool, a no-code platform, a BI dashboard, an internal script — all can speak MCP just as well as an agent can.

Once a firm has a well-built MCP server in front of its time-and-billing system, that server can be consumed by the AI assistant of choice, by a Friday-afternoon scheduled job (pull WIP exceeding $50k, route to the responsible partner with a draft narrative), by a no-code partner portal, by a BI tool, by an internal automation script — all using the same auth, audit, schema validation, and access controls. The integration work compounds across consumers, deterministic and AI alike.
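
On the wire, the scheduled job and the AI assistant send the same thing: a JSON-RPC 2.0 tools/call request, after the usual initialize handshake. The tool name and arguments below are hypothetical; only the envelope comes from the spec.

  // The JSON-RPC 2.0 envelope for a tool call, per the MCP spec.
  // Tool name and arguments are illustrative.
  const wipSweep = {
    jsonrpc: "2.0" as const,
    id: 7,
    method: "tools/call",
    params: {
      name: "list_wip_exceptions",
      arguments: { threshold_usd: 50_000 },
    },
  };

Nothing in that payload knows or cares whether the caller is a model.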

This also pushes back productively on agent-autonomy skepticism. Firms appropriately cautious about letting an AI assistant run loose against firm systems can start with deterministic automation and broaden later. The integration layer the workflow team builds today is the same layer the AI client gets tomorrow. Same tools, same data, same audit trail — different consumers, different levels of autonomy, all governed centrally. For procurement, this reframes the buy-versus-build conversation on automation entirely: build the integration once, let each new tool consume it.

§ 05 — when wrapping isn't enough

Intelligence belongs in the server, not just on top of an API.

The most common pattern in published MCP servers — and the most common reason they disappoint — is the assumption that a useful server is a thin adapter around an existing API. Wrap the endpoints, declare some tools, ship it. Two examples make the limit concrete:

EDGAR. A naive "fetch the latest 10-K" tool returns a 200-page document, fully formatted, that detonates the assistant's context window before the user has asked a substantive question. The model can't reason about what it can't fit. The fix is intelligence in the server: structured extraction (filing date, period, key figures), section-aware retrieval (just the MD&A, just the risk factors), and summarization that respects what the user actually asked. That last capability needs an LLM — and using MCP sampling (the server asking the client's model to do the work) avoids building model-vendor lock-in into the integration layer.
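
A hedged sketch of that pattern with the TypeScript MCP SDK. The tool name and the EDGAR fetch helper are hypothetical; createMessage is the SDK's sampling request to the client's model, assuming the connected client advertises sampling support.

  import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
  import { z } from "zod";

  const server = new McpServer({ name: "edgar-tools", version: "0.1.0" });

  // Hypothetical stand-in for real EDGAR retrieval plus section extraction.
  async function fetchRiskFactorsSection(ticker: string): Promise<string> {
    return `Risk factors for ${ticker}...`;
  }

  server.registerTool(
    "summarize_risk_factors",
    {
      description: "Fetch the latest 10-K and summarize its risk factors section.",
      inputSchema: { ticker: z.string() },
    },
    async ({ ticker }) => {
      // Section-aware retrieval happens server-side, so only the relevant
      // section ever goes near a context window.
      const section = await fetchRiskFactorsSection(ticker);

      // MCP sampling: ask the client's own model to summarize, so the server
      // bakes no model vendor into the integration layer.
      const result = await server.server.createMessage({
        messages: [{
          role: "user",
          content: { type: "text", text: `Summarize these risk factors:\n\n${section}` },
        }],
        maxTokens: 800,
      });

      const text = result.content.type === "text" ? result.content.text : "";
      return { content: [{ type: "text", text }] };
    }
  );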

Case law. A wrapper over a search API inherits whatever retrieval quality the upstream provider chose to ship. If their search is keyword-only, your tool returns keyword results. The quality ceiling of your server is set by whoever owns the index. Building your own index against the same source data — with hybrid retrieval, your own reranker, your own per-jurisdiction tuning — is what lets the firm own the most consequential design choice in the stack.

Both examples point in the same direction. The firm gateway is the natural place to put this intelligence, because it's the only place that survives changes in upstream APIs and AI clients. Vendors will rotate. Models will change. The intelligence in the integration layer is what compounds.

§ 06 — what safe means

Generic security guidance is necessary but far from sufficient.

HTTPS, OAuth, IP allowlisting, audience-bound tokens — table stakes. The legal-specific layer is where most firms will need to do real work.

  • Confidentiality and ethical walls. The server is the only thing the firm controls in the path. It must enforce matter-level access. Tokens should carry user identity, not just tenant identity, and the server should consult the firm's authoritative wall source on every call. This is annoying to build. It is also the only defensible posture. A minimal per-call guard, covering this and the privilege scrub below, is sketched after this list.
  • Privilege. Privileged content can flow into a tool call's arguments and out of a tool's results. Apply gateway-side scrubbing on tool arguments before they leave for external services, and on results before they reach the model. If the server uses sampling, understand which model the client routes to and whether the firm's privilege posture allows it.
  • Audit and supervision. Every tool call should produce a record — actor, matter, tool, arguments, result summary, latency, status — landing in the firm's audit infrastructure, not in a vendor's logging pipeline. "What did the AI do on this matter?" should be answerable in minutes, not weeks.
  • Document handling. MCP supports file in/out as base64 blobs. That bypasses normal DMS controls if you let it. Decide explicitly: where do tool-returned files land, do tool-sent files leave the firm's environment, what's the versioning and retention story?
  • Vendor due diligence. When you do connect a vendor MCP — and you will — apply the questions you'd apply to any other data processor: hosting region, auth flows, what's logged for how long, sub-processors, attestations. The gateway makes this tractable: vendor servers connect to the gateway, not directly to user clients, and the gateway is where you apply per-vendor policy uniformly.
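
The per-call guard referenced above, as a minimal TypeScript sketch. The wall service and the privilege-scrubbing rule are loud stand-ins for the firm's real systems, not a recommended implementation.

  type CallContext = { userId: string; matterId: string };

  // Stand-ins for the firm's real wall service and scrubbing rules.
  const wallService = {
    check: async (_user: string, matterId: string) => !matterId.startsWith("walled-"),
  };
  const scrubPrivilege = (s: string) =>
    s.replace(/ATTORNEY[- ]CLIENT PRIVILEGED?/gi, "[REDACTED]");

  // Gateway-side guard applied to every tools/call: wall check on every call,
  // argument scrubbing before anything leaves the firm's environment.
  async function guardToolCall(
    ctx: CallContext,
    args: Record<string, unknown>
  ): Promise<Record<string, unknown>> {
    const allowed = await wallService.check(ctx.userId, ctx.matterId);
    if (!allowed) {
      throw new Error(`ethical wall: ${ctx.userId} is blocked on ${ctx.matterId}`);
    }
    return Object.fromEntries(
      Object.entries(args).map(([k, v]) => [k, typeof v === "string" ? scrubPrivilege(v) : v])
    );
  }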

Worth flagging: independent security research has documented that a meaningful share of publicly deployed MCP servers ship with severe vulnerabilities — command injection, server-side request forgery, arbitrary file access, indirect prompt injection through tool descriptions. None of this is a reason to walk away from MCP. It is a reason to be selective about which servers a firm exposes its agents to, and to put a governed gateway between any agent and any external server.

§ 07 — observability

Audit answers what happened. Observability answers how to make it better.

Most discussions of MCP conflate the two, and as a result, the operational improvement loop never gets built. They're different problems with different consumers.

Useful observability for an MCP layer captures, at minimum:

  • Request and response payloads, with appropriate redaction at capture time, not at query time. The right answer is rarely "log nothing"; it's "log carefully", and put the controls upstream of the log. An event shape with capture-time redaction is sketched after this list.
  • Latency and failure modes broken down by type — auth, schema, upstream timeout, internal error, rate limit. The breakdown is what tells you whether the bottleneck is the firm's directory service, the vendor's API, or the model's understanding of how to call your tool.
  • Tool selection patterns. Which tools is the model picking, which is it consistently failing to pick when it should, which arguments is it getting wrong? A tool that's never called is either misnamed, mis-described, or unneeded — you can only tell which from telemetry.
  • User feedback. Thumbs, free-text correction, escalation to human. The single most undervalued signal in legal AI deployments. Without a structured feedback channel attached to specific tool calls, you have no way to tell whether last month's retrieval changes made things better or worse.
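
A sketch of that event shape, with redaction applied at capture. Field names and the status taxonomy are illustrative assumptions, not a standard.

  // Hypothetical per-call telemetry event; field names and the status
  // taxonomy are illustrative.
  type ToolCallEvent = {
    ts: string;                      // ISO timestamp
    actor: string;                   // user identity, not OAuth client identity
    matterId: string;
    tool: string;
    args: Record<string, unknown>;   // redacted before write, never after
    status: "ok" | "auth_error" | "schema_error"
          | "upstream_timeout" | "internal_error" | "rate_limited";
    latencyMs: number;
    feedback?: "up" | "down";        // attached later, keyed to this event
  };

  // Redaction at capture time: the raw values never reach storage.
  function capture(event: ToolCallEvent, redact: (v: unknown) => unknown): ToolCallEvent {
    const args = Object.fromEntries(
      Object.entries(event.args).map(([k, v]) => [k, redact(v)])
    );
    return { ...event, args };
  }

The status breakdown is what powers the failure-mode analysis above, and the feedback field is what makes last month's retrieval changes measurable.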

Most of the interesting failure modes in legal AI deployments don't surface in the model's output; they surface in the interaction between the model and the integration layer. The tool called with wrong arguments. The search that returned the right document buried at position 47. The auth flow that quietly failed and produced an empty result the model treated as authoritative. Without observability on the integration layer, those failure modes are invisible. With it, they become a backlog of concrete improvements.

§ 08 — where to begin

A practical starting point.

For a firm beginning to build seriously on MCP, the sequence that works:

  • Pick one high-value internal source. Time and billing or the DMS. Both have wide internal demand and well-understood data shapes.
  • Build one MCP server that does it well. Streamable HTTP transport, OAuth 2.1 with audience-bound tokens, tools with proper input and output schemas, structured content returns, and at least one prompt and one resource. Not three tools on top of an undocumented surface — a real piece of work. A minimal skeleton is sketched after this list.
  • Connect it to one AI client. Use it. Discover what's missing. Iterate on the prompts and resources, not just the tools. The biggest wins on day 30 are usually in the prompt-side context, not the tool surface.
  • Stand up the gateway when you add the second source. Don't wait. The discipline of "everything goes through the gateway" is much easier to install before there are six things to migrate.
  • Log everything from day one — audit and observability both. Compliance audit answers "what happened"; observability answers "what should we improve next". Both are expensive to retrofit. Observability is more expensive, because you can't recover signals you never captured.
  • Treat external vendor MCPs as proxied, never direct. When a vendor publishes an MCP server, route it through the gateway. The vendor gets reach; the firm keeps control.
  • Plan for sampling and elicitation, but don't depend on them. Build features that use them where supported and degrade gracefully where not. The MCP client landscape will be heterogeneous for a while.
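
The skeleton referenced in the second step, using the TypeScript MCP SDK's Streamable HTTP transport in stateless mode. OAuth 2.1 validation, audience checks, and the actual registrations are elided; the Express wiring follows the SDK's documented pattern at the time of writing.

  import express from "express";
  import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
  import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";

  const app = express();
  app.use(express.json());

  // Stateless mode: a fresh transport per request. Auth middleware belongs
  // ahead of this handler and is omitted here.
  app.post("/mcp", async (req, res) => {
    const server = new McpServer({ name: "firm-billing", version: "1.0.0" });
    // ...registerTool / registerPrompt / registerResource as in § 02...
    const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
    res.on("close", () => { transport.close(); server.close(); });
    await server.connect(transport);
    await transport.handleRequest(req, res, req.body);
  });

  app.listen(3000);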

The firms that win at AI-augmented legal work over the next three years will not be the ones with the most vendor connectors. Counting integrations is a 2010s metric. The firms that win will be the ones with the cleanest, most policy-aware integration layer — one where every assistant the firm uses, present and future, plugs into the same well-governed front door, and where the firm's standards, walls, audit, and conventions are encoded in software the firm controls.

MCP is the standard that makes that integration layer portable across AI vendors. The question for firm leadership isn't whether to adopt MCP. It's whether to own the integration layer or rent a fragmented version of it from a rotating cast of point vendors. Owning it is more work in year one. It is also the only version that compounds.

Thinking about an integration-layer strategy? We're glad to talk it through.