A unified, queryable view of the global legal market.
There's no Bloomberg, no PitchBook, no Crunchbase for the legal industry. Harvest is that platform — tracking several hundred of the world's leading law firms and unifying attorneys, practices, offices, lateral movement, and multi-year market history into one current view, refreshed daily.
The legal industry runs on incomplete information.
Anyone making decisions about law firms — hiring one, recruiting from one, investing in one, or competing with one — pieces together intelligence from a dozen disconnected sources: published rankings, individual firm websites, directories, legal press, LinkedIn, internal CRM exports. Nothing unifies it. Nothing keeps it current.
A simple question — which partners moved firms last quarter, and where did they land? — turns into a research project. The information exists, publicly, scattered across hundreds of sites. But there is no single platform that pulls it together, normalizes it, and keeps it fresh.
Hundreds of firms, one current view.
Through a combination of public sources and in-depth research, Harvest tracks several hundred of the world's most significant law firms and unifies attorney profiles, practice areas, office footprints, lateral movement, and multi-year market history into one queryable view, refreshed daily.
Every firm and every attorney carries a multi-year history rather than a snapshot, so the data answers questions about change — growth, movement, hiring patterns, footprint shifts — not just current state.
The collection layer is the moat.
The challenge isn't designing the database — it's the data itself. Law firm websites are notoriously inconsistent and increasingly hostile to automation, and each one structures its people pages differently. There is no shared schema, no canonical directory, no API. Anti-bot defenses block most public-source aggregation outright.
Harvest's collection layer is what makes the rest possible — and the reason no one has built this before.
Three layers: collect, structure, expose.
- Collection. A fleet of ~450 firm-specific extractors, each tuned to a particular firm's site structure, runs on managed scraping infrastructure. An adaptive proxy layer escalates through three modes (direct request, datacenter IPs, full browser rendering) to get past the anti-bot defenses that block most public-source aggregation. Raw HTML is archived to cloud storage, so new fields can be re-extracted from history without re-crawling.
- Data. Extracted records (attorneys, titles, offices, practice areas, sectors, contact details) flow into a unified PostgreSQL database with semantic search built on pgvector. Public rankings and market context are layered alongside, giving every firm and attorney a multi-year history rather than a snapshot.
- Access. Two interfaces, same data. A web application for exploring: firm and people browsers, comparison views, and an "Ask AI" mode powered by Claude that answers natural-language questions through a curated set of vetted query tools (no raw SQL, no hallucinated data). And a Model Context Protocol (MCP) server for integrating, letting clients plug Harvest directly into their own AI agents, copilots, and workflow tools as a live legal-market data source.
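For readers who want a concrete picture of the escalation described in the collection layer, here is a minimal sketch in Python. It is not Harvest's implementation. The three mode names come from the text above; the function names, the `FetchResult` type, and the "blocked" heuristic (HTTP 403/429 or an empty body) are assumptions made for illustration. Fetchers are passed in as plain callables so the sketch runs without any network access.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# The three modes named in the text, ordered cheapest first.
MODES = ("direct", "datacenter_proxy", "browser_render")


@dataclass
class FetchResult:
    mode: str    # which mode produced this response
    status: int  # HTTP status code
    html: str    # response body


# A fetcher takes a URL and returns a FetchResult (hypothetical interface).
Fetcher = Callable[[str], FetchResult]


def looks_blocked(result: FetchResult) -> bool:
    """Assumed heuristic: 403/429 or an empty body means we were blocked."""
    return result.status in (403, 429) or not result.html.strip()


def fetch_with_escalation(url: str, fetchers: Dict[str, Fetcher]) -> Optional[FetchResult]:
    """Try each mode in order and return the first response that is not blocked."""
    for mode in MODES:
        fetcher = fetchers.get(mode)
        if fetcher is None:
            continue  # this mode is not configured; skip it
        try:
            result = fetcher(url)
        except Exception:
            continue  # treat any fetch error as a reason to escalate
        if not looks_blocked(result):
            return result
    return None  # every configured mode was blocked or failed


def _stub(mode: str, status: int, html: str) -> Fetcher:
    """Build a fake fetcher that always returns the same response."""
    return lambda url: FetchResult(mode, status, html)


# Stub fetchers: the first two modes are blocked, the third succeeds.
demo_fetchers = {
    "direct": _stub("direct", 403, ""),
    "datacenter_proxy": _stub("datacenter_proxy", 429, ""),
    "browser_render": _stub("browser_render", 200, "<html>people page</html>"),
}
demo_result = fetch_with_escalation("https://example.com/people", demo_fetchers)
blocked_result = fetch_with_escalation(
    "https://example.com/people", {"direct": _stub("direct", 403, "")}
)
```

Ordering the modes from cheapest to most expensive means full browser rendering is only paid for on sites that actually require it. In a real pipeline the successful response would then be archived before extraction, as the text describes.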
Questions that used to take a week.
- "Which firms grew their M&A practice last year?" Year-over-year practice-area headcount across the tracked universe, ranked and filterable by region, tier, or sector focus.
- "Compare two firms on growth and footprint over five years." Side-by-side multi-year trajectories — headcount, partner-to-associate ratio, office openings and closures, practice-area composition.
- "Show me every partner who's moved into Boston in the last 18 months." Lateral movement filtered by destination city, origin firm, practice area, and time window — with sources.
Ask through the web app, or wire Harvest into the agent already on your desk over MCP.
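To illustrate why a per-attorney history makes the lateral-movement question cheap to answer, the sketch below derives moves from dated profile snapshots and filters them by destination city and time window. It is a simplified illustration, not Harvest's schema or query tooling: the `Snapshot` and `Move` types, the field names, and the sample firms are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Snapshot:
    """One dated observation of an attorney's profile (hypothetical shape)."""
    attorney_id: str
    firm: str
    city: str
    seen_on: date


@dataclass(frozen=True)
class Move:
    """A firm-to-firm change inferred from two consecutive snapshots."""
    attorney_id: str
    from_firm: str
    to_firm: str
    to_city: str
    detected_on: date


def detect_moves(snapshots: Iterable[Snapshot]) -> List[Move]:
    """Emit a Move whenever an attorney's firm differs from their previous snapshot."""
    moves: List[Move] = []
    last_seen: Dict[str, Snapshot] = {}
    for snap in sorted(snapshots, key=lambda s: (s.attorney_id, s.seen_on)):
        prev = last_seen.get(snap.attorney_id)
        if prev is not None and prev.firm != snap.firm:
            moves.append(
                Move(snap.attorney_id, prev.firm, snap.firm, snap.city, snap.seen_on)
            )
        last_seen[snap.attorney_id] = snap
    return moves


def moves_into(moves: Iterable[Move], city: str, since: date) -> List[Move]:
    """Keep moves whose destination is `city` and that were detected on or after `since`."""
    return [m for m in moves if m.to_city == city and m.detected_on >= since]


# Made-up sample data: a1 changes firms and lands in Boston; a2 stays put.
sample = [
    Snapshot("a1", "Firm A", "New York", date(2023, 1, 1)),
    Snapshot("a1", "Firm B", "Boston", date(2024, 6, 1)),
    Snapshot("a2", "Firm C", "Boston", date(2023, 1, 1)),
    Snapshot("a2", "Firm C", "Boston", date(2024, 6, 1)),
]
all_moves = detect_moves(sample)
boston_moves = moves_into(all_moves, "Boston", since=date(2024, 1, 1))
```

With only a current-state snapshot, attorney a1 would simply appear at Firm B in Boston. The move itself is visible only because the earlier observation was kept.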
Anyone making decisions about law firms.
- Corporate legal departments evaluating outside counsel.
- Recruiters mapping lateral targets.
- Consultants and investors benchmarking firm performance.
- Firm leaders seeing themselves the way the market sees them.
The legal industry generates an enormous amount of public information about itself. Harvest is the first platform that makes all of it usable from one place — and then asks the next obvious question: what would you actually want to know?