An honest list of what this assistant cannot do, where it can mislead, and how to use it responsibly as a research aid rather than an oracle.
Every research tool has limits. A tool that pretends otherwise is more dangerous than one that says so plainly. This page lists, without softening, the failure modes you should expect when using this assistant — both technical limitations of the underlying retrieval system and editorial limitations that follow from the nature of the source material.
The assistant searches only the documents indexed in this corpus. It cannot answer questions based on materials that have not been released, are not in the dataset, or have been added since the most recent index update. If a question cannot be answered from the indexed material, the assistant will say so rather than improvise.
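One common way to implement this kind of refusal is a relevance threshold on the retriever's scores. The sketch below is illustrative only, not the system's actual logic, and the 0.25 cutoff is an invented value:

```python
# Illustrative sketch: abstain when retrieval finds nothing relevant.
# The 0.25 threshold is a made-up value for illustration.
def answer_or_abstain(scores: list[float], threshold: float = 0.25) -> str:
    # If even the best-matching passage scores below the threshold,
    # decline to answer rather than improvise from weak evidence.
    if not scores or max(scores) < threshold:
        return "Not answerable from the indexed corpus."
    return "Proceed to generate a grounded answer."
```

The design choice here is deliberate asymmetry: a false refusal costs the user a follow-up question, while a false answer costs them trust.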
The system retrieves passages by semantic similarity, which means the most relevant passage sometimes ranks below one that is similar in wording but less directly responsive to the question. Reranking reduces this problem but does not eliminate it.
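The two-stage retrieve-then-rerank pipeline described above can be sketched roughly as follows. The toy vectors, passage dictionaries, and term-overlap reranker are illustrative stand-ins, not the system's actual components:

```python
# Illustrative sketch: two-stage retrieve-then-rerank over toy data.
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, passages, k=3):
    # Stage 1: rank every passage by embedding similarity to the query.
    scored = sorted(passages, key=lambda p: cosine(query_vec, p["vec"]),
                    reverse=True)
    return scored[:k]

def rerank(query_terms, candidates):
    # Stage 2: reorder the shortlist by exact term overlap with the query.
    # A semantically similar passage with no term overlap can still win
    # stage 1 -- which is exactly the failure mode described above.
    def overlap(p):
        return len(query_terms & set(p["text"].lower().split()))
    return sorted(candidates, key=overlap, reverse=True)
```

Stage 1 casts a wide net cheaply; stage 2 spends more effort on a short list. The gap between "similar" and "responsive" lives in the difference between those two scoring functions.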
Many of the source documents are image-based PDFs processed via OCR. OCR is least reliable on proper names, dates, numbers, handwritten annotations, and low-quality scans, so a passage can exist in the corpus yet fail to match an exact-phrase search.
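Downstream matching can partially compensate by normalizing the most common character confusions before comparing strings. The substitution table below is a hypothetical example, not a rule the system actually applies:

```python
# Illustrative sketch: normalize common OCR character confusions
# before comparing strings. This table is hypothetical.
OCR_CONFUSIONS = str.maketrans({
    "0": "o",  # zero misread as letter O
    "1": "l",  # one misread as lowercase L
    "5": "s",
    "8": "b",
})

def ocr_normalize(text: str) -> str:
    # Lowercase, then collapse digit/letter confusions so that
    # "0ctober" and "October" compare equal.
    return text.lower().translate(OCR_CONFUSIONS)

def fuzzy_equal(a: str, b: str) -> bool:
    return ocr_normalize(a) == ocr_normalize(b)
```

Normalization like this trades precision for recall: it recovers matches OCR broke, at the cost of occasionally conflating genuinely different strings.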
Redactions in the original documents are visible as blacked-out boxes in the PDF, but the underlying text is not recoverable from the source. If the assistant’s answer depends on a redacted passage, that passage is genuinely unavailable.
Even with retrieval grounding, large language models occasionally produce plausible-sounding text that is not actually in the retrieved sources. The system mitigates this through explicit instructions to cite each claim, but it does not eliminate the risk. If you see a factual claim that is not directly tied to a citation, treat it as suspect and verify against the cited documents.
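A simple mechanical check catches part of this problem: scan the answer for sentences that carry no citation marker. The `[n]` citation format and the naive sentence splitter below are assumptions for illustration:

```python
# Illustrative sketch: flag answer sentences with no citation marker.
# The [n] citation format is an assumption for illustration.
import re

CITATION = re.compile(r"\[\d+\]")

def uncited_sentences(answer: str) -> list[str]:
    # Split naively on sentence-ending punctuation, then keep any
    # sentence that contains no [n] marker. These are the claims a
    # reader should treat as suspect until verified.
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", answer)
                 if s.strip()]
    return [s for s in sentences if not CITATION.search(s)]
```

A check like this cannot confirm that a citation actually supports its sentence; it only surfaces the claims that lack even a nominal anchor, which is where verification effort should start.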
Public-record documents tend to over-represent what was litigated and under-represent what was not. A subject that was the focus of intense civil discovery has thousands of pages of material in the corpus; a subject that was investigated but never prosecuted may have few or no documents. The presence or absence of material in the corpus reflects the structure of the legal system, not the structure of reality.
Civil pleadings, witness statements, and preliminary court filings often contain allegations that were never adjudicated. Many cases settled before trial. A claim made in a complaint may be true, partially true, or false — the corpus contains the document either way. The assistant does not, and cannot, distinguish between alleged fact and adjudicated fact except to the extent that the documents themselves do.
A name appearing in a flight log, a witness list, an address book, or an exhibit is a fact about the document — not an allegation, finding, or insinuation about the person named. This is especially important for documents that include large numbers of names of family members, business contacts, household staff, professional associates, journalists, and others whose presence is incidental to the conduct at issue.
The assistant can search and summarize documents. It cannot conduct interviews, evaluate source credibility, weigh competing accounts, develop original reporting, or provide legal advice. Treat its output as a finding aid, not a finished work product.
A responsible workflow treats the assistant as a starting point: ask a question, read the answer alongside its citations, open the cited documents to confirm each claim, and regard any statement without a citation as unverified until you have checked it yourself.
If you find a clear error in an answer, you have done useful work: the system is fallible, the citations are checkable, and disagreement with a source is itself a finding.