How the assistant produces clickable citations that link directly to the original PDFs hosted on justice.gov, and why that matters for verification.
A research assistant that cannot be checked is, in practice, a confident-sounding rumor. The
citation system on this site exists to make every answer verifiable: each EFTA identifier
in an assistant response links to the original document at its source. This page explains
how the linking works and what to expect when you click through.
EFTA identifier

Every document in the corpus carries a unique identifier of the form EFTA followed by eight
digits — for example, EFTA00009654. This identifier is assigned by the U.S. Department of
Justice at the time of public release and is used as the canonical reference for the document
across the dataset, internal indexes, and the public-facing tool.
The identifier is preserved exactly as published: no renaming, no normalization beyond zero
padding to eight digits. If you encounter an EFTA identifier in another source — for
example, in a journalist’s article or a court filing — it should match the same document on
this site.
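
As a rough illustration of the format described above, the following Python sketch checks an identifier and zero-pads it to eight digits. The function name `normalize_efta_id` and the decision to accept shorter digit runs as input are assumptions made for this example; they do not describe the site's actual code.

```python
import re

# Assumption for this sketch: "EFTA" followed by 1 to 8 digits is accepted
# and left-padded with zeros to the canonical eight digits.
_EFTA_RE = re.compile(r"EFTA(\d{1,8})")


def normalize_efta_id(raw: str) -> str:
    """Return the canonical form (EFTA + eight digits) or raise ValueError."""
    match = _EFTA_RE.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"not an EFTA identifier: {raw!r}")
    return "EFTA" + match.group(1).zfill(8)
```

The match is deliberately case-sensitive and rejects more than eight digits, in keeping with the rule that identifiers are preserved exactly as published.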
When the language model generates an answer, it is instructed to mark each factual
attribution with the EFTA identifier of the supporting chunk in square brackets — for
example, [EFTA00009654]. The model can cite a single document or several:
[EFTA00009654, EFTA00012345].
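
To make the bracket convention concrete, here is a minimal Python sketch that collects the identifiers cited in an answer. The helper name `cited_ids` and the exact regular expression are assumptions for illustration, not the pattern the backend necessarily uses.

```python
import re

# Matches one bracket group that holds one or more comma-separated identifiers,
# e.g. "[EFTA00009654]" or "[EFTA00009654, EFTA00012345]".
_BRACKET_RE = re.compile(r"\[\s*(EFTA\d{8}(?:\s*,\s*EFTA\d{8})*)\s*\]")


def cited_ids(answer: str) -> list[str]:
    """Return identifiers cited in brackets, in order of first appearance."""
    seen: dict[str, None] = {}  # a dict keeps insertion order and drops repeats
    for group in _BRACKET_RE.findall(answer):
        for ident in group.split(","):
            seen.setdefault(ident.strip(), None)
    return list(seen)
```

A list like this is a convenient starting point when you want to check every source behind an answer in one pass.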
After generation, a small post-processing step on the backend takes the raw text and:
1. Finds each EFTA identifier mention.
2. Looks up the matching document's online_url field.
3. Rewrites the mention as a Markdown link of the form [EFTA00009654](https://justice.gov/...).
The system is intentionally tolerant: it handles citations inside brackets, comma-separated lists of citations, and bare mentions of an identifier in the body of the answer. All of these become clickable links pointing at the original PDF.
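
The post-processing step could look roughly like the Python sketch below. It is an approximation under stated assumptions: the function name `linkify_citations` is hypothetical, and the `online_urls` mapping stands in for however the backend actually looks up each document's `online_url` field. Identifiers with no known URL are left as plain text.

```python
import re
from typing import Mapping

# One pattern, three alternatives, tried in this order at each position:
#   1. an existing Markdown link, which is left untouched;
#   2. a bracketed citation holding one or more comma-separated identifiers;
#   3. a bare identifier in running text.
_CITATION_RE = re.compile(
    r"(?P<link>\[[^\]]*\]\([^)]*\))"
    r"|\[\s*(?P<group>EFTA\d{8}(?:\s*,\s*EFTA\d{8})*)\s*\]"
    r"|(?P<bare>\bEFTA\d{8}\b)"
)


def linkify_citations(text: str, online_urls: Mapping[str, str]) -> str:
    """Turn EFTA mentions into Markdown links where a URL is known."""

    def render(ident: str) -> str:
        url = online_urls.get(ident)
        # No known URL: keep the identifier as plain text, never guess one.
        return f"[{ident}]({url})" if url else ident

    def replace(match: re.Match) -> str:
        if match.group("link"):
            return match.group(0)
        if match.group("group"):
            ids = [part.strip() for part in match.group("group").split(",")]
            return ", ".join(render(ident) for ident in ids)
        return render(match.group("bare"))

    return _CITATION_RE.sub(replace, text)
```

Handling all three cases in a single substitution pass means the text is scanned once and freshly inserted links are never re-scanned, so running the function on already-linked text leaves it unchanged.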
When you click a citation, your browser navigates to the URL on justice.gov (or, in some
cases, another government repository) where the original document is hosted. For most
documents, this is the official DOJ public-records page. You may see:
The system does not host the documents itself. Citations always lead to the canonical government source.
Some documents in the corpus were public-record releases that have an EFTA identifier but
no current online URL. In those cases, the citation appears as plain text, without a
hyperlink. This is rare but possible, and it is handled deliberately: the system would rather
show you the identifier than fabricate a URL.
You may notice that the assistant cites documents by identifier rather than by long verbatim quotation. This is intentional, and serves several purposes:
We recommend the following workflow when you want to verify an assistant claim:
Find the EFTA identifier(s) cited at or near the sentence you want to check.
This process is straightforward and reliable. If you find a discrepancy between an assistant answer and a cited document, the cited document is authoritative: the assistant may have summarized too aggressively, or retrieval may have surfaced an imperfect match. In either case, the source is the truth, and the source is one click away.