Data rooms for developers: concepts, primitives, mental model

What's a data room

Strip the legalese: a data room is a permissioned bucket of documents with per-recipient access tracking. Think "S3 bucket + ACL + audit log + a viewer that watermarks PDFs."

The traditional products (Datasite, Intralinks, Firmex) live behind a sales call and a per-seat license. Papermark offers the same primitive, but open, API-first, and free to start.

Core primitives

Six resources. That's the entire API.

conceptual schema

ts

type Dataroom = {
  id: string;          // dr_
  name: string;
  documents: Document[];
  folders: Folder[];
};

type Document = {
  id: string;          // doc_
  name: string;
  versions: DocumentVersion[];
  size_bytes: number;
  mime_type: string;
};

type Folder = {        // fld_. Exists both standalone and inside datarooms
  id: string;
  name: string;
  parent_id?: string;
};

type Link = {          // lnk_. The visitor-facing URL
  id: string;
  target: { kind: "dataroom" | "document"; id: string };
  password?: string;
  expires_at?: string;
  require_email: boolean;
  allow_download: boolean;
  watermark?: string;  // template, e.g. "{{email}} · {{timestamp}}"
};

type Visitor = {       // vis_. Recognized via email gate
  id: string;
  email?: string;
};

type View = {          // vw_. Single session
  id: string;
  link_id: string;
  visitor_id: string;
  pages: { number: number; duration_seconds: number }[];
};
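Because a View carries per-page durations, engagement metrics are a simple fold over `pages`. The sketch below is illustrative only: `PageStat`, `ViewLike`, `totalSeconds`, and `secondsPerPage` are hypothetical names that mirror the conceptual schema above, not part of any Papermark SDK.

```typescript
// Mirrors the `pages` field of the conceptual View type (names are hypothetical).
type PageStat = { number: number; duration_seconds: number };
type ViewLike = { id: string; link_id: string; visitor_id: string; pages: PageStat[] };

// Total seconds a visitor spent in one session.
function totalSeconds(view: ViewLike): number {
  return view.pages.reduce((sum, page) => sum + page.duration_seconds, 0);
}

// Seconds spent on each page number, summed across many sessions.
function secondsPerPage(views: ViewLike[]): Map<number, number> {
  const totals = new Map<number, number>();
  for (const view of views) {
    for (const page of view.pages) {
      totals.set(page.number, (totals.get(page.number) ?? 0) + page.duration_seconds);
    }
  }
  return totals;
}

const firstView: ViewLike = {
  id: "vw_1", link_id: "lnk_1", visitor_id: "vis_1",
  pages: [{ number: 1, duration_seconds: 12 }, { number: 2, duration_seconds: 40 }],
};
const secondView: ViewLike = {
  id: "vw_2", link_id: "lnk_1", visitor_id: "vis_2",
  pages: [{ number: 2, duration_seconds: 5 }],
};

console.log(totalSeconds(firstView));                          // 52
console.log(secondsPerPage([firstView, secondView]).get(2));   // 45
```

Per-page totals are usually the more useful number: they show which slide of a deck held attention, independent of who was reading.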

Lifecycle

text

1. Create  dataroom         (POST /v1/datarooms)
2. Upload  document          (POST /v1/documents)
3. Attach  document → room   (POST /v1/datarooms/:id/documents)
4. Mint    link              (POST /v1/links)
5. Send    link              (your email/CRM stack)
6. Watch   views             (GET  /v1/links/:id/views)
7. Revoke  link              (DELETE /v1/links/:id)
8. Archive dataroom          (PATCH /v1/datarooms/:id { archived: true })
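The same lifecycle can be written down as an ordered request plan. This is a minimal sketch, not a client: `Step` and `lifecycleSteps` are hypothetical helpers, request bodies are omitted except for the `archived` flag shown above, and in practice the dataroom and link ids only become known from the responses to steps 1 and 4.

```typescript
type Step = {
  method: "GET" | "POST" | "PATCH" | "DELETE";
  path: string;
  body?: unknown;
};

// Lists the API calls for one deal, in order.
// Step 5 (sending the link) happens in your own email/CRM stack, so it has no API call here.
function lifecycleSteps(dataroomId: string, linkId: string): Step[] {
  return [
    { method: "POST", path: "/v1/datarooms" },                            // 1. create dataroom
    { method: "POST", path: "/v1/documents" },                            // 2. upload document
    { method: "POST", path: `/v1/datarooms/${dataroomId}/documents` },    // 3. attach document
    { method: "POST", path: "/v1/links" },                                // 4. mint link
    { method: "GET", path: `/v1/links/${linkId}/views` },                 // 6. watch views
    { method: "DELETE", path: `/v1/links/${linkId}` },                    // 7. revoke link
    { method: "PATCH", path: `/v1/datarooms/${dataroomId}`, body: { archived: true } }, // 8. archive
  ];
}

for (const step of lifecycleSteps("dr_123", "lnk_456")) {
  console.log(step.method, step.path);
}
```

Keeping the plan as data makes the ordering easy to test and lets a thin transport layer (with the bearer token) execute it.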

Access model

Access is enforced at the link, not the dataroom. One dataroom can have N links, each with different gating. That gives you per-recipient policy without duplicating content.

password. Pre-shared secret
expires_at. Hard cutoff
require_email. Verified email for identification
allow_download. False = viewer-only, true = downloadable
watermark. Template like {{email}} · {{timestamp}}
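To make "one dataroom, N links" concrete, here is a sketch that drafts a different link payload per recipient. The field names follow the list above; `Recipient`, `draftLinkFor`, and `renderWatermark` are hypothetical helpers, and the watermark substitution shown is only a local illustration of how a `{{email}} · {{timestamp}}` template might expand, not the viewer's actual rendering.

```typescript
type LinkDraft = {
  target: { kind: "dataroom" | "document"; id: string };
  password?: string;
  expires_at?: string;
  require_email: boolean;
  allow_download: boolean;
  watermark?: string;
};

// Hypothetical description of what each recipient is allowed to do.
type Recipient = { email: string; canDownload: boolean; daysValid: number };

// One link draft per recipient: same content, different policy.
function draftLinkFor(dataroomId: string, recipient: Recipient, now: Date): LinkDraft {
  const expires = new Date(now.getTime() + recipient.daysValid * 24 * 60 * 60 * 1000);
  return {
    target: { kind: "dataroom", id: dataroomId },
    expires_at: expires.toISOString(),
    require_email: true,
    allow_download: recipient.canDownload,
    watermark: "{{email}} · {{timestamp}}",
  };
}

// Replaces {{key}} placeholders; unknown keys are left untouched.
function renderWatermark(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key: string) => values[key] ?? match);
}

const issuedAt = new Date("2025-01-01T00:00:00.000Z");
const investorLink = draftLinkFor("dr_123", { email: "vc@example.com", canDownload: false, daysValid: 7 }, issuedAt);
const counselLink = draftLinkFor("dr_123", { email: "legal@example.com", canDownload: true, daysValid: 30 }, issuedAt);

console.log(investorLink.expires_at, investorLink.allow_download);
console.log(counselLink.expires_at, counselLink.allow_download);
console.log(renderWatermark("{{email}} · {{timestamp}}", { email: "vc@example.com", timestamp: "2025-01-01" }));
```

Both drafts point at the same dataroom id, so revoking or tightening one recipient's access never touches the other's.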

Storage & uploads

Small files (under roughly 5 MB) can be streamed through a multipart request. Anything larger goes through S3 presigned URLs in three steps:

bash

# 1. Reserve a slot and get a presigned PUT URL.
#    The JSON body needs an explicit Content-Type; curl -d defaults to form encoding.
curl -X POST https://api.papermark.com/v1/documents \
  -H "Authorization: Bearer $PAPERMARK_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "big-deck.pdf", "upload": "presigned"}'

# 2. PUT the bytes directly to S3.
#    $PRESIGNED_URL and $DOC_ID are assumed to come from the step 1 response.
curl -X PUT "$PRESIGNED_URL" \
  --upload-file big-deck.pdf \
  -H "Content-Type: application/pdf"

# 3. Confirm the upload.
curl -X POST "https://api.papermark.com/v1/documents/$DOC_ID/finalize" \
  -H "Authorization: Bearer $PAPERMARK_TOKEN"
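Client code typically has to pick between the two upload paths by file size. A minimal sketch, assuming the cutoff is exactly 5 MiB (the text only says "~5 MB"); `MULTIPART_LIMIT_BYTES`, `planUpload`, and `finalizePath` are hypothetical names, while the `{"name", "upload": "presigned"}` body and the finalize path come from the curl calls above.

```typescript
// ASSUMPTION: exact threshold; the documentation only gives an approximate figure.
const MULTIPART_LIMIT_BYTES = 5 * 1024 * 1024;

type UploadPlan =
  | { mode: "multipart" }
  | { mode: "presigned"; reserveBody: { name: string; upload: "presigned" } };

// Small files stream through multipart; larger ones reserve a presigned slot first.
function planUpload(fileName: string, sizeBytes: number): UploadPlan {
  if (sizeBytes < MULTIPART_LIMIT_BYTES) {
    return { mode: "multipart" };
  }
  return { mode: "presigned", reserveBody: { name: fileName, upload: "presigned" } };
}

// Path for step 3, called once the PUT to S3 has succeeded.
function finalizePath(docId: string): string {
  return `/v1/documents/${docId}/finalize`;
}

console.log(JSON.stringify(planUpload("one-pager.pdf", 300 * 1024)));
console.log(JSON.stringify(planUpload("big-deck.pdf", 40 * 1024 * 1024)));
console.log(finalizePath("doc_123"));
```

Modeling the plan as a discriminated union keeps the presigned-only fields out of reach on the multipart branch, so the compiler catches a mixed-up flow.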

Mental moves

Datarooms are cheap. Create one per deal, not one giant shared room. Cleanup is one DELETE.
Links are policy. Different recipients should get different links, not shared credentials.
Views are events. Treat the views stream like Stripe events. Webhook it into your CRM.
Versions are immutable. Promoting a new version doesn't break existing links, because a link always serves the document's current active version.
Search is built in. search_documents queries OCR'd content. You don't need to ship your own indexer.
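Treating views as events implies handling them idempotently, since webhook-style delivery can repeat a message. The sketch below assumes the webhook payload has the same shape as the conceptual View type, which is not confirmed here; `ViewEvent` and `ViewLedger` are hypothetical names.

```typescript
// ASSUMPTION: the event payload mirrors the conceptual View type.
type ViewEvent = {
  id: string;
  link_id: string;
  visitor_id: string;
  pages: { number: number; duration_seconds: number }[];
};

// In-memory ledger; a real integration would persist seen ids in its database or CRM.
class ViewLedger {
  private seen = new Set<string>();
  private secondsByVisitor = new Map<string, number>();

  // Applies an event once. Returns false when the view id was already processed.
  apply(viewEvent: ViewEvent): boolean {
    if (this.seen.has(viewEvent.id)) {
      return false;
    }
    this.seen.add(viewEvent.id);
    const seconds = viewEvent.pages.reduce((sum, page) => sum + page.duration_seconds, 0);
    const previous = this.secondsByVisitor.get(viewEvent.visitor_id) ?? 0;
    this.secondsByVisitor.set(viewEvent.visitor_id, previous + seconds);
    return true;
  }

  secondsFor(visitorId: string): number {
    return this.secondsByVisitor.get(visitorId) ?? 0;
  }
}

const ledger = new ViewLedger();
const delivered: ViewEvent = {
  id: "vw_1", link_id: "lnk_1", visitor_id: "vis_1",
  pages: [{ number: 1, duration_seconds: 30 }, { number: 2, duration_seconds: 15 }],
};

console.log(ledger.apply(delivered));     // true
console.log(ledger.apply(delivered));     // false (duplicate delivery ignored)
console.log(ledger.secondsFor("vis_1"));  // 45
```

Deduplicating on the view id means a retried delivery cannot double-count a visitor's reading time in the CRM.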