The 30-year gap
Virtual data rooms have existed since the late 1990s. The category was invented to digitize the paper-stuffed "deal rooms" that M&A bankers, corporate lawyers, and due-diligence teams flew between for weeks at a time during big transactions. The first VDRs, including Intralinks and Merrill DataSite (today's Datasite), solved a real problem: secure, auditable, multi-party access to confidential documents at a distance.
The market they created is now estimated at over $2 billion in annual revenue, growing in the high single digits. And yet, until 2023, the entire category looked the same as it did in 2003:
- Sales-led contracts; no self-serve sign-up.
- Browser-only interfaces with drag-and-drop folder views.
- No public REST API. No command-line tool. No webhooks. No SDK.
- Per-seat licensing with five-figure annual floors.
- Closed source; opaque infrastructure; vendor lock-in by default.
Every other piece of B2B SaaS went programmable in the 2010s. Stripe rewrote payments. Twilio rewrote telecom. Vercel rewrote hosting. Algolia rewrote search. Auth0 rewrote authentication. DocuSign rewrote contracts. Even Salesforce, the platonic ideal of sales-led software, exposed a usable API. Data rooms didn't. The category sat out the API revolution.
What's a virtual data room, exactly?
Strip the legalese: a virtual data room is a permissioned bucket of documents with per-recipient access tracking. Practically, it's "an S3 bucket plus an ACL plus an audit log plus a viewer that watermarks PDFs and counts how long each recipient looked at each page." The differentiating capabilities versus generic file-sharing (Dropbox, Google Drive, OneDrive) are:
- Per-link policy. One folder of documents can have many share links, each with its own gating: password, expiry, email verification, download permission, watermark.
- Granular audit. Every view event captures who, when, from what IP, on which page, for how long. This is the evidentiary backbone of due diligence.
- Forensic watermarking. The viewer overlays the recipient's identifier on every page. If a confidential PDF leaks, the watermark identifies the source.
- Document-grade viewer. Streamed page-by-page rendering, no full-file download by default, blocked print/copy at the browser level.
- Access lifecycle. Revoke a link and the recipient's access dies server-side, even on already-loaded tabs.
These primitives matter for any workflow where documents are sensitive, recipients are identified, and you need to know after the fact who looked at what.
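To make per-link policy and the access lifecycle concrete, here is a minimal sketch of how a server might decide whether one request against one share link is allowed. The LinkPolicy and AccessAttempt shapes and the reason strings are hypothetical, not Papermark's schema; the point is that every gate is evaluated server-side on every request, which is what lets revocation take effect even on an already-open tab.

```typescript
// Hypothetical shapes: field names are illustrative, not Papermark's schema.
interface LinkPolicy {
  revoked: boolean;
  expiresAt?: Date;                 // undefined = never expires
  password?: string;                // undefined = no password gate
  requireEmailVerification: boolean;
  allowDownload: boolean;
}

interface AccessAttempt {
  now: Date;
  password?: string;
  verifiedEmail?: string;           // set once the recipient has proved they own an address
  wantsDownload: boolean;
}

type Decision = { allowed: true } | { allowed: false; reason: string };

function evaluateLink(policy: LinkPolicy, attempt: AccessAttempt): Decision {
  // Lifecycle checks come first, so a dead link fails before any other gate is consulted.
  if (policy.revoked) return { allowed: false, reason: "revoked" };
  if (policy.expiresAt && attempt.now.getTime() >= policy.expiresAt.getTime()) {
    return { allowed: false, reason: "expired" };
  }
  // Plaintext comparison keeps the sketch short; a real service would store a salted hash.
  if (policy.password !== undefined && attempt.password !== policy.password) {
    return { allowed: false, reason: "bad_password" };
  }
  if (policy.requireEmailVerification && !attempt.verifiedEmail) {
    return { allowed: false, reason: "email_unverified" };
  }
  if (attempt.wantsDownload && !policy.allowDownload) {
    return { allowed: false, reason: "download_disabled" };
  }
  return { allowed: true };
}
```

Ordering the checks so revocation and expiry come first means a dead link never reveals anything about the other gates it had, such as whether a guessed password was correct.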
What a "developer data room" actually means
A developer data room (also called a programmable VDR, API-first data room, or headless data room) is the same primitive exposed through code. Concretely: every operation a human can do in the dashboard is also available as a REST endpoint, a CLI command, an MCP tool, and a typed SDK method. The core operations include:
| Operation | Endpoint |
|---|---|
| Create dataroom | POST /v1/datarooms |
| List datarooms | GET /v1/datarooms |
| Update dataroom | PATCH /v1/datarooms/:id |
| Delete dataroom | DELETE /v1/datarooms/:id |
| Upload document | POST /v1/documents (multipart or S3 presigned) |
| Version document | POST /v1/documents/:id/versions |
| Attach doc to dataroom | POST /v1/datarooms/:id/documents |
| Mint share link | POST /v1/links |
| Configure password / expiry | PATCH /v1/links/:id |
| Revoke link instantly | DELETE /v1/links/:id |
| Stream view events | GET /v1/links/:id/views |
| Aggregate analytics | GET /v1/datarooms/:id/analytics |
| List visitors | GET /v1/visitors |
| Subscribe to webhooks | POST /v1/webhooks |

Every endpoint is mirrored as a CLI subcommand (papermark datarooms create), an MCP tool (create_dataroom), and a typed SDK method (client.datarooms.create(...)). 43 operations across 6 resources, identical semantics across all four surfaces.
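The endpoint list maps directly onto plain HTTP. This sketch builds (but does not send) requests for three of those operations, assuming the hosted base URL https://api.papermark.com and the Authorization: Bearer header described in the FAQ. The JSON body fields (name, dataroomId), the IDs, and the token value are placeholders, not documented fields; check the OpenAPI spec for the real request schemas.

```typescript
// Assumption: hosted API host from the FAQ; paths from the endpoint list.
const BASE_URL = "https://api.papermark.com";

type HttpMethod = "GET" | "POST" | "PATCH" | "DELETE";

interface ApiRequest {
  method: HttpMethod;
  url: string;
  headers: Record<string, string>;
  body?: string;
}

// Builds a request description; sending it is left to fetch or any HTTP client.
function buildRequest(token: string, method: HttpMethod, path: string, payload?: unknown): ApiRequest {
  const headers: Record<string, string> = { Authorization: `Bearer ${token}` };
  const req: ApiRequest = { method, url: `${BASE_URL}${path}`, headers };
  if (payload !== undefined) {
    headers["Content-Type"] = "application/json";
    req.body = JSON.stringify(payload);
  }
  return req;
}

const token = "pm_live_example"; // placeholder, not a real token
// Hypothetical bodies and IDs, for illustration only.
const createRoom = buildRequest(token, "POST", "/v1/datarooms", { name: "Project Falcon" });
const mintLink = buildRequest(token, "POST", "/v1/links", { dataroomId: "dr_123" });
const revoke = buildRequest(token, "DELETE", "/v1/links/lnk_456");
```

Each ApiRequest can be passed to fetch(req.url, { method: req.method, headers: req.headers, body: req.body }). Keeping construction separate from sending makes the calls easy to log, test, or replay.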
Why now
Three forces converged in 2024 to make a developer data room not just possible but necessary:
- AI agents reached production maturity. Claude, GPT-4-class models, and the Model Context Protocol all matured at the same time. Agents now provision infrastructure, write code, send email, and, naturally, share documents. Without a programmable VDR, the agent stops at "here, you handle this manually."
- Open source ate one more vertical. Document-sharing was one of the last B2B categories without a credible open-source incumbent. Papermark's AGPL release changed that.
- Self-serve infrastructure became the default. The expectation that you can start using a piece of B2B software in under five minutes by signing up, grabbing an API token, and making a call became table stakes. VDRs were the obvious outlier.
For backend engineers and platform teams
If you run the systems that move documents between humans and external parties, a developer data room collapses a category of integration work that used to require human-in-the-loop steps. Specifically:
- Provisioning becomes a workflow step. A CRM stage change ("Deal > Due Diligence") fires a webhook to your service, which creates the dataroom, attaches a template set of documents, mints recipient-specific links, and writes the URLs back to the CRM contact record. All in code, all auditable.
- View events become a data stream. Engagement signals (this VP spent 14 minutes on page 7 of the financial model) flow into your warehouse alongside product analytics. You can score deals, prioritize follow-up, and build attribution.
- Authentication scopes constrain blast radius. Mint a token per service with only the scopes that service needs. Provisioning gets writes; the analytics digest gets reads. An incident in one service can't cross-contaminate the other.
- Audit logs are queryable, not screenshots. The Papermark API exposes full audit history as JSON. Every action carries a request_id. You can rebuild the entire access history of any document on demand.
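The provisioning bullet describes a small orchestration: react to a CRM stage change, create a room, attach template documents, mint one link per recipient, and write the URLs back. A minimal sketch, assuming hypothetical DataroomClient and CrmWriter interfaces that stand in for the REST calls and your CRM's API (neither is a published SDK):

```typescript
// Hypothetical interfaces: stand-ins for the REST endpoints and a CRM API.
interface DataroomClient {
  createDataroom(name: string): Promise<{ id: string }>;
  attachDocument(dataroomId: string, documentId: string): Promise<void>;
  mintLink(dataroomId: string, recipientEmail: string): Promise<{ url: string }>;
}

interface CrmWriter {
  setDataroomLink(contactId: string, url: string): Promise<void>;
}

interface StageChange {
  dealId: string;
  dealName: string;
  newStage: string;
  contacts: { id: string; email: string }[];
}

// Returns the new dataroom ID, or null when the event is not a move into diligence.
async function onStageChange(
  event: StageChange,
  templateDocIds: string[],
  api: DataroomClient,
  crm: CrmWriter,
): Promise<string | null> {
  if (event.newStage !== "Due Diligence") return null;

  const room = await api.createDataroom(`${event.dealName} (diligence)`);
  for (const docId of templateDocIds) {
    await api.attachDocument(room.id, docId);
  }
  // One link per recipient, so every later view event is attributable to one person.
  for (const contact of event.contacts) {
    const link = await api.mintLink(room.id, contact.email);
    await crm.setDataroomLink(contact.id, link.url);
  }
  return room.id;
}
```

Because webhooks can be redelivered, a production handler would also need an idempotency key (the deal ID is a natural candidate) so a retry doesn't create a second room.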
For frontend and product engineers
Embedding document-sharing in your product used to mean choosing between two bad options: build it yourself (painful, off-mission) or iframe a third-party viewer (poor UX, wrong brand). The developer data room gives you a third option:
- White-label viewer. Embed the Papermark viewer inside your product on your own custom domain. Recipients never leave your brand.
- OAuth-as-the-customer flows. Use OAuth 2.1 device flow to let your users authorize your app to manage their datarooms. This is the "Stripe Connect" pattern for documents.
- Custom domains for share links. Mint links on share.yourcompany.com instead of papermark.com. Critical for enterprise-facing flows.
- React components. Drop-in upload widgets, viewer iframes, and analytics charts. Built on the same API your backend uses.
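The device-flow option follows the OAuth 2.0 Device Authorization Grant (RFC 8628). The fiddly part for a client is polling the token endpoint correctly while the user approves on another screen. This sketch encodes the RFC's client-side polling rules; Papermark's endpoint URLs and client IDs are not given in this article, so the device code and client ID inputs are hypothetical.

```typescript
// Error codes a token endpoint may return while the user is still deciding (RFC 8628 §3.5).
type TokenError = "authorization_pending" | "slow_down" | "access_denied" | "expired_token";

type PollStep =
  | { kind: "wait"; intervalSeconds: number }
  | { kind: "stop"; reason: TokenError };

// RFC 8628 §3.2: if the server omits `interval`, clients use 5 seconds.
const DEFAULT_INTERVAL_SECONDS = 5;

function initialInterval(serverInterval?: number): number {
  return serverInterval ?? DEFAULT_INTERVAL_SECONDS;
}

function nextPoll(error: TokenError, currentIntervalSeconds: number): PollStep {
  // User hasn't approved yet: poll again after the same interval.
  if (error === "authorization_pending") return { kind: "wait", intervalSeconds: currentIntervalSeconds };
  // Server asked us to back off: add 5 seconds for this and all later polls.
  if (error === "slow_down") return { kind: "wait", intervalSeconds: currentIntervalSeconds + 5 };
  // access_denied or expired_token: this flow is over; start a new device authorization.
  return { kind: "stop", reason: error };
}

// Form-encoded body for each poll of the token endpoint.
function tokenPollBody(deviceCode: string, clientId: string): string {
  return new URLSearchParams({
    grant_type: "urn:ietf:params:oauth:grant-type:device_code",
    device_code: deviceCode,
    client_id: clientId,
  }).toString();
}
```

Keeping the interval as state that only ever grows (via slow_down) is what the RFC requires; a client that resets it after each response would keep tripping the server's rate limit.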
For AI engineers and agent builders
This is where the developer data room earns the "first of its kind" claim. Papermark is the only virtual data room with a production-grade Model Context Protocol server. That means:
- Zero glue code. Add npx -y @papermark/mcp-server to your Claude Desktop config and the agent gets 43 tools. No tool definitions to write, no schemas to maintain, no fragile prompt engineering.
- Real API calls, not mocks. Every MCP tool maps 1:1 to a REST endpoint. The agent operates on real data with real consequences, which is exactly what makes it useful.
- Scope-bounded operation. The agent inherits the token's scopes. A token with only analytics.read literally cannot call delete_dataroom. Safety is enforced at the API layer, not the prompt layer.
- Function-calling friendly. If your runtime doesn't speak MCP, generate OpenAI / Anthropic function-call schemas from the OpenAPI spec with one command.
- Audit trail by default. Every API call carries a request ID. You can replay an agent's entire session for post-mortem analysis or compliance review.
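Scope-bounded operation can be pictured as a lookup from each tool to the scope it requires. In Papermark the authoritative check happens at the API, as the list says; a client-side copy like this sketch is only useful for hiding tools an agent could never call successfully. The tool names and analytics.read appear in this article; the other scope names are hypothetical.

```typescript
// Hypothetical mapping: only `analytics.read` and the tool names come from the article.
const REQUIRED_SCOPE: Record<string, string> = {
  create_dataroom: "datarooms.write",
  delete_dataroom: "datarooms.write",
  create_link: "links.write",
  list_visitor_views: "analytics.read",
};

function isAllowed(tool: string, grantedScopes: ReadonlySet<string>): boolean {
  const needed = REQUIRED_SCOPE[tool];
  // Deny by default: a tool with no known scope mapping is never allowed.
  return needed !== undefined && grantedScopes.has(needed);
}

// Tools worth advertising to an agent holding these scopes.
function visibleTools(grantedScopes: ReadonlySet<string>): string[] {
  return Object.keys(REQUIRED_SCOPE).filter((tool) => isAllowed(tool, grantedScopes));
}
```

Denying unmapped tools by default matters: a newly added tool stays unavailable until someone decides which scope should gate it.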
The practical effect: an agent can provision a deal room, populate it from your file system, generate per-investor links, send the links via email, watch the resulting view events, and summarize engagement in a Slack channel. All from one user prompt, all bounded by the scopes you minted. This is what "agent-native document sharing" means in 2026.
For founders and CTOs
If you're evaluating a virtual data room for your company, especially as part of an M&A transaction, fundraising round, or board-management cadence, the build-vs-buy calculus has shifted. Three observations:
- Self-serve eliminates procurement drag. You can sign up, get a token, and start sharing in under five minutes. No sales call, no MSA, no minimum spend. For fundraising specifically, this matters because you often need a dataroom this week.
- Build-vs-buy collapses into integrate-or-host. The OSS Papermark engine is AGPL. If your compliance or sovereignty requirements rule out a managed service, you can run the same code in your own VPC and still use the API. Most teams use the hosted API; teams in regulated industries (healthcare, defense) are the likeliest to self-host.
- Vendor lock-in is bounded by an open spec. The OpenAPI contract is public. If you ever need to migrate, the data model is documented and exportable.
For security and compliance teams
The capabilities security and compliance teams care about, and which legacy VDRs market heavily, are all present and queryable:
- SOC 2 reports available on request.
- GDPR-compliant workflows, including data export and deletion endpoints.
- HIPAA-capable via self-hosted deployment inside your compliance boundary.
- Audit log API exposing every view, every link minted, every document touched, and every token issued, as structured JSON rather than PDFs.
- OAuth 2.1 with PKCE for distributed tools; dashboard tokens for CI you own.
- Forensic watermarking on every page view, configurable per link.
- Webhook signing with HMAC and replay protection.
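HMAC signing with replay protection is typically verified on the receiving side along these lines, here using Node's crypto module. The exact signing scheme is an assumption (HMAC-SHA256 over the string timestamp.rawBody, hex-encoded, with a five-minute tolerance); Papermark's header names and payload format may differ, so treat this as a sketch of the technique rather than its documented contract.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed tolerance for clock skew and delivery delay.
const TOLERANCE_SECONDS = 300;

// Assumed scheme: HMAC-SHA256 over `${timestamp}.${rawBody}`, hex digest.
function signPayload(secret: string, timestamp: number, rawBody: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${rawBody}`).digest("hex");
}

function verifyWebhook(
  secret: string,
  rawBody: string,
  timestamp: number,
  signatureHex: string,
  nowSeconds: number,
): boolean {
  // Replay protection: reject deliveries whose timestamp is too far from our clock.
  if (Math.abs(nowSeconds - timestamp) > TOLERANCE_SECONDS) return false;
  const expected = Buffer.from(signPayload(secret, timestamp, rawBody), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The timestamp window bounds replays but does not stop a replay inside the window; recording delivery IDs you have already processed closes that gap. Verify against the raw request body before any JSON parsing, since re-serializing can change the bytes.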
The difference versus legacy VDRs: these aren't marketing claims in a sales deck. They're HTTP endpoints you can call to verify the behavior.
Use cases by domain
Mergers and acquisitions
The original VDR use case. Acquirer-side diligence teams, sell-side bankers, and target-company management all need scoped, auditable access to financial statements, contracts, IP filings, employment records, and cap tables. The developer data room lets the deal team automate room provisioning per active deal, generate per-bidder links, and pull engagement analytics into the deal-tracker spreadsheet.
Startup fundraising
Founders raising seed-through-Series-C rounds need to share decks, financials, and diligence packets with dozens of investors over weeks or months. With a programmable VDR, an investor-outreach event in the CRM can auto-provision a personalized data room and email the link. Engagement signals (which investors actually read the deck) feed back into the CRM as engagement scores.
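Turning view events into engagement scores is a fold over the event stream. The sketch below uses one possible scoring rule: total reading time per visitor, with distinct pages viewed as a tiebreaker. The ViewEvent fields are hypothetical stand-ins for whatever GET /v1/links/:id/views or the view webhook actually returns, and the rule itself is illustrative, not Papermark's analytics.

```typescript
// Hypothetical event shape; adapt to the real view-event payload.
interface ViewEvent {
  visitorEmail: string;
  page: number;
  durationSeconds: number;
}

interface Engagement {
  visitorEmail: string;
  totalSeconds: number;
  distinctPages: number;
}

function engagementByVisitor(events: ViewEvent[]): Engagement[] {
  const acc = new Map<string, { total: number; pages: Set<number> }>();
  for (const e of events) {
    const entry = acc.get(e.visitorEmail) ?? { total: 0, pages: new Set<number>() };
    entry.total += e.durationSeconds;
    entry.pages.add(e.page); // revisits to a page add time but not coverage
    acc.set(e.visitorEmail, entry);
  }
  // Most engaged first: longest total reading time, ties broken by pages covered.
  return Array.from(acc.entries())
    .map(([visitorEmail, { total, pages }]) => ({ visitorEmail, totalSeconds: total, distinctPages: pages.size }))
    .sort((a, b) => b.totalSeconds - a.totalSeconds || b.distinctPages - a.distinctPages);
}
```

The ranked list is what a CRM sync would write back, one score per investor contact.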
Board portals
Board members need recurring, scheduled access to board packs, meeting materials, and committee documents. Programmatic distribution via cron or scheduled jobs replaces the "email the PDF as an attachment" pattern, adding revocation, watermarking, and engagement tracking that an email attachment can't offer.
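Scheduled board-pack distribution boils down to minting one expiring, watermarked link per member before each meeting. Here is a sketch of the payload-building step a cron job might run; the LinkRequest field names and the grace-period policy are hypothetical, not the documented body of POST /v1/links.

```typescript
// Hypothetical link payload; field names are illustrative only.
interface LinkRequest {
  dataroomId: string;
  recipientEmail: string;
  expiresAt: string; // ISO 8601, UTC
  watermark: boolean;
  allowDownload: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function boardPackLinks(dataroomId: string, members: string[], meeting: Date, graceDays: number): LinkRequest[] {
  // Every link expires a fixed number of days after the meeting, unlike an emailed attachment.
  const expiresAt = new Date(meeting.getTime() + graceDays * DAY_MS).toISOString();
  return members.map((recipientEmail) => ({
    dataroomId,
    recipientEmail,
    expiresAt,
    watermark: true,      // per-recipient watermark for leak attribution
    allowDownload: false, // view-only by default
  }));
}
```

The job would then POST each payload; cutting off a member early remains a single DELETE /v1/links/:id.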
Clinical trials and life sciences
Trial protocols, investigator brochures, and regulatory submissions move between sponsors, CROs, ethics committees, and regulators. The HIPAA-bound versions of these workflows favor self-hosted deployment, which the AGPL release supports.
Legal document sharing
Outside counsel handling sensitive litigation discovery, settlement documents, or privileged client materials need watermarked, expiring, audit-logged access. But they also need it integrated with their matter-management software via an API. The developer data room is the missing primitive.
Vendor and supplier due diligence
Enterprise procurement teams send security questionnaires, SOC 2 reports, and contract artifacts to dozens of vendors per quarter. Programmatic provisioning of vendor-specific rooms replaces the typical "Dropbox link + PDF tracker spreadsheet" chaos.
AI products and data ingestion
AI products that ingest customer documents (legal-AI, finance-AI, healthcare-AI) need a sharing primitive their customers trust. Hosting customer documents in a Papermark dataroom, with the customer holding the link policy, gives the AI product's end users a familiar control surface and gives the AI product a clean ingestion API.
The capability matrix
As of writing, Papermark is the only virtual data room platform with all of the following capabilities production-grade and publicly available:
| Capability | Papermark | DocSend | Datasite | Intralinks | Firmex |
|---|---|---|---|---|---|
| Full REST API (all resources) | Yes | Partial, high tier | No | No | No |
| Native CLI (npm i -g papermark) | Yes | No | No | No | No |
| MCP server (agent-native) | Yes (43 tools) | No | No | No | No |
| OAuth 2.1 device flow + PKCE | Yes | No | No | No | No |
| Open-source core | Yes (AGPL) | No | No | No | No |
| OpenAPI spec | Yes | Partial | No | No | No |
| Webhooks | Yes | Limited | No | No | No |
| Custom domains for links | Yes | Enterprise | Enterprise | Enterprise | Enterprise |
| Self-serve sign-up | Yes | Yes | No | No | No |
| Free tier | Yes | Yes | No | No | No |
Datasite, Intralinks, and Firmex are excellent products for the M&A workflows they were built for. They're not built for code. Papermark is.
Open source matters
Three reasons the AGPL release is load-bearing, not cosmetic:
- Trust. Document infrastructure is among the most sensitive infrastructure a company operates. Read the code. Audit the auth flow. Verify the watermarking algorithm. Self-host if you need to.
- Vendor independence. If the company behind Papermark disappeared tomorrow, the engine still runs. The OpenAPI spec is public. The MCP server is on npm. Your integrations don't break.
- Community velocity. Bug reports often arrive with patches attached. Edge cases get contributed back. The platform can improve faster than a closed competitor can ship.
Migration paths from legacy VDRs
Most teams adopting a developer data room are migrating away from one of three things:
- From DocSend / PandaDoc / similar light VDRs: straightforward. Re-upload documents via the API; re-mint links; cut over DNS for custom domains. Bulk upload runs in minutes per dataroom.
- From Datasite / Intralinks / Firmex: heavier. These platforms hold documents and audit history under contract. Export what you can via their portals; re-upload via the Papermark API; run both in parallel until the new room becomes the source of truth.
- From DIY (S3 bucket + Google Drive shares + spreadsheet of links): the most common starting point. The API lets you replace the spreadsheet with code and inherit watermarking, audit logs, and revocation for free.
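Bulk re-uploads during a migration are mostly a concurrency problem: sending every file at once invites rate limiting, while sending them one by one is slow. A common pattern is a small worker pool with a fixed number of requests in flight. In this sketch, upload is a hypothetical stand-in for your client's call to POST /v1/documents.

```typescript
// Runs `upload` over all items with at most `limit` calls in flight; results keep input order.
async function uploadAll<T, R>(items: T[], limit: number, upload: (item: T) => Promise<R>): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker claims the next unprocessed index until the queue is drained.
  // `next++` is safe here: JavaScript runs one worker at a time between awaits.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await upload(items[i]);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Because results come back in input order regardless of which upload finishes first, they can be zipped with the source file list to record each new document ID next to its original path.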
Pricing model implications
Legacy VDRs are priced per seat per month with five-figure annual floors. This pricing shape assumes documents are accessed by a fixed roster of named human users, which is a reasonable model for a single M&A transaction with a known bidder list, but a poor model for any usage pattern involving programmatic provisioning, agent operation, or recurring per-deal lifecycle.
Papermark's pricing aligns with usage: free tier for low volume, paid tiers based on active datarooms and total storage. API calls are not metered separately. This makes agent-driven and automation-heavy patterns economically viable for the first time.
FAQ
What is a virtual data room (VDR)?
A virtual data room is a permissioned online repository for confidential documents, typically used in M&A, fundraising, legal review, clinical trials, and vendor due diligence. Think of it as 'a folder with strong access controls, watermarking, granular per-recipient links, and an audit log of every view.' It is the digital successor to the physical 'deal rooms' lawyers and bankers used in the 1990s.
What makes Papermark a 'developer data room'?
Papermark is the first virtual data room platform with all of: a full REST API across every resource, a native command-line interface (npm install -g papermark), a Model Context Protocol (MCP) server with 43 agent tools, OAuth 2.1 with device flow and PKCE, an open-source AGPL core, an OpenAPI spec for generating SDKs, webhooks, and custom-domain support for branded share links. Together these surfaces let developers and AI agents create, share, monitor, and revoke data rooms entirely in code.
Is Papermark open source?
Yes. The Papermark engine is licensed under AGPL and lives on GitHub at github.com/mfts/papermark. You can self-host the platform inside your VPC, audit the source code, and contribute back. The hosted API at api.papermark.com is the same engine running as a managed service.
How does Papermark compare to DocSend, Datasite, Intralinks, or Firmex?
DocSend has a partial API behind a high-tier plan; Datasite, Intralinks, and Firmex are sales-led, closed-source, and browser-only. None of them ships a CLI or MCP server. Papermark is API-first, free to start, open source, and ships agent-native tooling out of the box.
Can an AI agent operate a Papermark data room?
Yes, via the Model Context Protocol server. @papermark/mcp-server exposes 43 tools (create_dataroom, upload_document, create_link, list_visitor_views, etc.) directly to MCP-compatible hosts like Claude Desktop, Claude Code, Cursor, and Zed. The same scopes that govern REST tokens govern the agent, so blast radius is bounded by the token you mint.
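For Claude Desktop, adding the server means adding an entry to the mcpServers section of claude_desktop_config.json. This sketch merges that entry into an existing config without dropping other servers. The npx command and arguments come from the zero-glue-code setup described earlier; the PAPERMARK_API_TOKEN environment variable name is a hypothetical placeholder, so check the server's README for the variable it actually reads. Claude Code, Cursor, and Zed use their own config formats.

```typescript
// Shape of an entry in the `mcpServers` section of claude_desktop_config.json.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [other: string]: unknown;
}

// Returns a new config object; the input and any existing servers are left untouched.
function withPapermark(config: ClaudeDesktopConfig, token: string): ClaudeDesktopConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      papermark: {
        command: "npx",
        args: ["-y", "@papermark/mcp-server"],
        // Hypothetical variable name: confirm against the MCP server's documentation.
        env: { PAPERMARK_API_TOKEN: token },
      },
    },
  };
}
```

Write the result back with JSON.stringify(updated, null, 2) and restart Claude Desktop so it picks up the new server.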
Does Papermark support webhooks?
Yes. Webhooks fire on view events, link creation, dataroom changes, and document uploads. They're signed with HMAC and support replay protection. Use them to forward engagement events into your CRM, data warehouse, or Slack.
Can I use my own domain for share links?
Yes. Verify a custom domain in the Papermark dashboard and links can be minted on that domain instead of papermark.com. This is critical for white-labeled flows where you want recipients to see your brand, not a vendor's URL.
What are the auth options?
Two: dashboard tokens (long-lived, format pm_live_…) for personal scripts and CI, and OAuth 2.1 device flow with PKCE (90-day tokens, auto-refresh with offline_access) for distributed tools. Both use the same Authorization: Bearer header.
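PKCE (RFC 7636) is what lets a distributed tool without a client secret prove that whoever redeems an authorization code is the same party that requested it. The client keeps a random code_verifier and sends only its SHA-256 hash, the code_challenge, up front. A minimal sketch of generating the pair with Node's crypto module; how Papermark's endpoints accept these values is not covered in this article.

```typescript
import { createHash, randomBytes } from "node:crypto";

// S256 method: BASE64URL(SHA256(ASCII(code_verifier))), no padding.
function s256Challenge(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

function generatePkcePair(): { verifier: string; challenge: string; method: "S256" } {
  // 32 random bytes encode to 43 base64url characters, within RFC 7636's 43-128 range.
  const verifier = randomBytes(32).toString("base64url");
  return { verifier, challenge: s256Challenge(verifier), method: "S256" };
}
```

The challenge and method go in the authorization request; the verifier is sent only at token exchange, where the server re-hashes it and compares it with the challenge it stored.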
Is there an SDK for my language?
The official TypeScript SDK is stable; the Python SDK is in alpha. For Go, Rust, Ruby, Java, C#, PHP, Swift, and others, generate a client from the OpenAPI spec at api.papermark.com/v1/openapi.yaml with openapi-generator-cli.
What about HIPAA / SOC 2 / GDPR compliance?
Papermark provides SOC 2 reports on request and supports GDPR-compliant workflows. For HIPAA-bound use cases (clinical trials, healthcare M&A), self-hosting inside your VPC keeps PHI inside your compliance boundary while still giving you the API surface.
Next steps: Quickstart on papermark.com · REST API reference · MCP server · Building agents on data rooms · Data room concepts · Full Papermark docs