And keep the proof.
One system for the whole loop. Analysts and agents search the governing version directly, build reports from targeted passages instead of half-million-token pastes, and every claim is checked before it ships: what is wrong gets flagged, what is missing gets named, and every step lands in a signed, tamper-evident ledger with human sign-off. Databases gave data a paper trail. PortMem gives one to work made from documents.
About $1 per verified report. Agents read the passages that matter, not 500,000-token pastes.
Minutes, not review cycles. Documents index in seconds; a clean report verifies in under two.
Every step on the record. Verdicts and sign-offs land in a signed, tamper-evident ledger.
A database can always answer: where did this number come from, which version was current at decision time, who approved the change. Documents make no such promises. The provisions that actually govern a deal sit in a 300-page indenture and its amendments, with no schema, no version control, and no query log. And the moment an LLM reads those documents and someone acts on the answer, even the informal trail disappears: the model read something, said something, someone acted on it, and no record connects the three.
PortMem gives document work the guarantees databases gave to data. The record layer is cryptographic: hash-chained, signed, verifiable by anyone with our published key. The judgment layer is calibrated and fail-safe: it never certifies what it cannot support, and a human signs off on every flag.
Every claim in a report cites the exact passages it derives from, in the governing version of the document. A citation that is faithful to a superseded contract gets flagged, not certified.
Contradicted claims are caught before the report ships, and omissions are named: the provisions the document requires that the report never covered. Both directions, wrong and missing.
Measured, not estimated: about one dollar of model spend for a full review. When the labs make raw verification cheaper, our costs drop with them. The signed record is what you are buying.
To index a 1,274-chunk bond indenture. Measured on the production system, not a demo corpus.
For a clean report to verify, end to end. Reviews run while the analyst is still reading, not overnight.
That never enter a context window. An agent built and verified a full bondholder report through 18 targeted searches over MCP; it read the passages that mattered and paid for nothing else.
Underneath the certification layer is retrieval built for regulated documents: the exact passage that should govern the answer, with the trail of how it got there. Four published papers of research, running as one product.
The answer is a passage from a real document, with a citation. Nothing is generated on the critical path.
Superseded, recalled, and overruled documents are removed at retrieval time. The result is the one that is still in force.
When the evidence is weak or contradictory, the system says so. Better silence than a confident wrong answer in regulated work.
For questions that need two or three connected documents, PortMem chains the retrievals and shows the bridge between them.
Different question types need different strategies. A router picks the right one without you having to label queries.
Sits above your vector store. Vendor-neutral, model-agnostic, no rip-and-replace.
PortMem sits above any vector store. Buyers add the regulated-domain ranking layer that long-context LLMs, agentic frameworks, and generic rerankers do not have.
| Approach | Currency | Audit | Multi-hop | Model-agnostic | Regulated | Speed | Cost |
|---|---|---|---|---|---|---|---|
|
Long-context LLMs
Sonnet 1M, GPT-4
|
✗ | ✗ | ~ | ✗ | ✗ | Slow | $$$ |
|
Naive RAG
LangChain, LlamaIndex
|
✗ | ~ | ✗ | ✓ | ✗ | Fast | $ |
|
Agentic RAG
LangGraph, CrewAI, AutoGen
|
✗ | ~ | ✓ | ✓ | ✗ | Slow | $$ |
|
Compilation-stage knowledge
Pinecone Nexus, PageIndex
|
✗ | ~ | ✓ | ✓ | ✗ | Med | $$ |
|
Vertical legal AI
Harvey, Casetext, Hebbia
|
~ | ✓ | ~ | ✗ | legal only | Med | $$$ |
|
Enterprise search
Glean, Sana
|
✗ | ~ | ✗ | ✓ | ✗ | Fast | $$ |
|
Vector store + rerank
Vectara, Pinecone, Cohere
|
✗ | ~ | ~ | ✓ | ✗ | Fast | $$ |
|
PortMem
The certification layer
|
✓ | ✓ | ✓ | ✓ | ✓ | Fast | $ |
Reviewing a real debenture report, PortMem caught a covenant threshold that had been imported from a different instrument: a number that was word-for-word accurate, cited from the wrong contract. Every plain similarity check passes it. Authority-aware verification caught it.
Across human-adjudicated ground truth, PortMem has never certified a false claim. When evidence is thin it abstains and flags for human review. The error direction is engineered: extra caution, never silent approval.
Every verdict and sign-off lives in a hash-chained, Ed25519-signed, append-only ledger. Verification key: 506afc81aa39331a4c704a97bca26100a0d857a682a6fa2dd7274b0dae547c3e. Anyone holding it can verify our records. Nobody holding it can forge one.
We built three test corpora where the right answer is not the most similar document. In every one, a workhorse frontier model working alone fails most of the time. PortMem clears it.
Across SEC filings, FASB updates, FINRA rule changes, and SEC no-action letters, the latest superseding document wins. PortMem picks it deterministically. A frontier LLM working from raw text gets it right 65% of the time.
SCOTUS overrules are semantically distant from the cases they replace. PortMem finds them by authority structure, not by similarity.
FDA's recall database overlaps with marketing-approval text. Standard retrieval pulls the approval; we surface the recall. Pharma demo →
Datasets, code, and the single-command evaluator are published with each paper. "Workhorse frontier model alone" baseline is Claude Haiku reading the same corpus directly without PortMem retrieval. Method, prompts, and full result tables are in the CAR paper (paper 4) under github.com/andremir/car-retrieval.
The pain is sharpest where a stale rule is an SEC enforcement action, a withdrawn no-action letter is a deficient supervisory procedure, or a missed supersession is a restated filing. We are starting with finance because the buyer has the budget, the audit clock is recurring, and the data (10-Ks, FASB ASUs, FINRA rule amendments, no-action letters) is structurally version-controlled and machine-readable.
SEC filings, FASB Accounting Standards Updates, FINRA rule amendments, SEC no-action letters, internal compliance memos. Knowing which version is currently in force is the entire job, and the cost of being wrong is regulatory exposure measured in seven figures.
PortMem hits 100% accuracy on financial supersessions. A workhorse frontier model alone hits 65%. The 35-point gap is the gap between "tooling" and "audit-defensible."
Case law, precedent tracking, statutory authority. One mis-cited brief is 80 to 200 hours of associate time and a malpractice exposure. PortMem catches overruled precedent that frontier LLMs miss 93% of the time.
Buyers: AmLaw partners, in-house GC, litigation support, legal AI platforms.
FDA labels, clinical trial protocols, drug recall notices. A recalled product retrieved as "approved" is a patient safety event.
Buyers: medical affairs, regulatory affairs, pharmacovigilance. Pharma demo →
CVE entries, GHSA advisories, vendor patch notes. "Is this CVE patched?" is a controlling-authority question. Standard RAG gets it wrong 39% of the time.
Buyers: secops, vulnerability management, AppSec platforms.
PortMem is the productization of four sole-authored research contributions. Each one fixes a specific failure mode that standard retrieval has in regulated content.
Vector search and graph search produce scores on different scales. PhaseGraph maps them onto a common rank-based scale before fusing, which lifts last-hop recall on MuSiQue and 2Wiki without discarding magnitude information.
arXiv 2603.28886 →For questions that need a chain of evidence, the second-hop document should be ranked by usefulness given the first hop, not by similarity to the original question. Training-free, graph-free, beats published baselines on three standard benchmarks.
arXiv 2604.03384 →Different question types need different retrieval strategies. A lightweight router picks the right mode per query, with measurable gains in domain and graceful behavior out of domain. No hand-labeled query types required.
arXiv 2604.09019 →Finding the currently valid document is a different problem from finding the most similar one, and the two metrics are formally decoupled. CAR validates the framework on FDA, SCOTUS, and security advisories with large gains over dense baselines.
arXiv 2604.14488 · GitHub →Currently in active conversations with finance and legal teams across mid-market RIAs, regional banks, AmLaw firms, and large-cap controller offices. We onboard four to six design partners before commercial GA.
Hosted endpoint, integration support, and a co-built ingestion adapter for one corpus.
Design-partner license at a discount from list. Roadmap input. No long-term lock-in.
Two-week scoping. Six-week integration. Twelve-week paid pilot decision.
Or write directly to andre@portmem.com.