
The Citation Linking System

How the assistant produces clickable citations that link directly to the original PDFs hosted on justice.gov, and why that matters for verification.

A research assistant that cannot be checked is, in practice, a confident-sounding rumor. The citation system on this site exists to make every answer verifiable: each EFTA identifier in an assistant response links to the original document at its source. This page explains how the linking works and what to expect when you click through.

The EFTA identifier

Every document in the corpus carries a unique identifier of the form EFTA followed by eight digits — for example, EFTA00009654. This identifier is assigned by the U.S. Department of Justice at the time of public release and is used as the canonical reference for the document across the dataset, internal indexes, and the public-facing tool.

The identifier is preserved as published: no renaming, and no normalization other than zero-padding the number to eight digits. If you encounter an EFTA identifier in another source, such as a journalist’s article or a court filing, it should refer to the same document on this site.
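
A minimal sketch of the format check and the zero-padding rule, assuming the identifier is the literal prefix EFTA followed by exactly eight ASCII digits. The function names are hypothetical and are not taken from the site's code.

```python
import re

# Assumed canonical form: the literal prefix "EFTA" plus exactly eight ASCII digits.
# [0-9] is used instead of \d, which would also accept non-ASCII digits.
EFTA_PATTERN = re.compile(r"EFTA[0-9]{8}")


def is_valid_efta(identifier: str) -> bool:
    """Return True if the whole string is one canonical EFTA identifier."""
    return EFTA_PATTERN.fullmatch(identifier) is not None


def pad_efta(number: int) -> str:
    """Build a canonical identifier by zero-padding the number to eight digits."""
    if not 0 <= number <= 99_999_999:
        raise ValueError(f"EFTA number out of range: {number}")
    return f"EFTA{number:08d}"
```

For example, pad_efta(9654) returns "EFTA00009654", while is_valid_efta("EFTA9654") is False because the number is not padded.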

How citations are produced

When the language model generates an answer, it is instructed to mark each factual attribution with the EFTA identifier of the retrieved chunk that supports it, in square brackets, for example [EFTA00009654]. A single attribution can cite one document or several: [EFTA00009654, EFTA00012345].
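
The bracket convention is regular enough to parse with a pattern. The hypothetical extract_citations helper below (an illustration, not the site's code) collects identifiers from bracketed groups, including comma-separated lists, in order of first appearance.

```python
import re

# A bracketed group holding one or more comma-separated EFTA identifiers,
# e.g. "[EFTA00009654]" or "[EFTA00009654, EFTA00012345]".
CITATION_GROUP = re.compile(r"\[(EFTA[0-9]{8}(?:\s*,\s*EFTA[0-9]{8})*)\]")
EFTA_ID = re.compile(r"EFTA[0-9]{8}")


def extract_citations(answer: str) -> list[str]:
    """Return cited identifiers in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}  # dicts keep insertion order
    for group in CITATION_GROUP.finditer(answer):
        for efta_id in EFTA_ID.findall(group.group(1)):
            seen.setdefault(efta_id, None)
    return list(seen)
```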

After generation, a small post-processing step on the backend takes the raw text and:

  1. Locates each EFTA identifier mention.
  2. Looks up the document’s metadata, which includes its online_url field.
  3. Replaces the bare identifier with a Markdown link of the form [EFTA00009654](https://justice.gov/...).
  4. Returns the linked Markdown to the frontend, which renders each link with target="_blank" so that it opens in a new tab.
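
Step 4 happens in the frontend. The sketch below, written in Python for consistency with the other examples, shows the shape of the HTML a renderer might produce for a citation link. It converts only inline links and does not escape the surrounding text; a real frontend would use a full Markdown renderer. The render_links name is hypothetical, and the rel attribute is a common hardening choice assumed here rather than stated on this page.

```python
import html
import re

# A Markdown inline link: [label](url), with no nested brackets or parentheses.
MARKDOWN_LINK = re.compile(r"\[([^\[\]]+)\]\(([^()\s]+)\)")


def render_links(markdown: str) -> str:
    """Convert Markdown inline links into anchors that open in a new tab."""

    def to_anchor(match: re.Match) -> str:
        label = html.escape(match.group(1))
        url = html.escape(match.group(2), quote=True)
        # target="_blank" opens a new tab; rel="noopener" hides window.opener
        # from the new page, and "noreferrer" also omits the Referer header.
        return f'<a href="{url}" target="_blank" rel="noopener noreferrer">{label}</a>'

    return MARKDOWN_LINK.sub(to_anchor, markdown)
```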

The system is intentionally tolerant: it handles citations inside brackets, comma-separated lists of citations, and bare mentions of an identifier in the body of the answer. All of these become clickable links pointing at the original PDF.
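
Steps 1 to 3, together with the tolerance just described, can be sketched as a single regular-expression substitution. This assumes the metadata is available as an in-memory dictionary keyed by identifier, with online_url as the only field used; link_citations and the dictionary shape are hypothetical, not the backend's actual code.

```python
import re

EFTA = r"EFTA[0-9]{8}"
# Alternative 1: a bracketed group of one or more comma-separated identifiers.
# Alternative 2: a bare identifier standing on its own in the answer text.
CITATION = re.compile(rf"\[({EFTA}(?:\s*,\s*{EFTA})*)\]|\b({EFTA})\b")
EFTA_ID = re.compile(EFTA)


def link_citations(answer: str, metadata: dict[str, dict]) -> str:
    """Replace every EFTA mention with a Markdown link to its online_url."""

    def link(efta_id: str) -> str:
        url = metadata.get(efta_id, {}).get("online_url")
        # No URL on record: keep the identifier as plain text rather than invent one.
        return f"[{efta_id}]({url})" if url else efta_id

    def replace(match: re.Match) -> str:
        if match.group(1) is not None:
            # Bracketed group: link each identifier and rejoin with commas.
            return ", ".join(link(i) for i in EFTA_ID.findall(match.group(1)))
        return link(match.group(2))

    return CITATION.sub(replace, answer)
```

Because the bracketed alternative is tried first at each position and consumes the whole group, an identifier inside brackets is never matched a second time as a bare mention.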

What you see when you click

When you click a citation, your browser navigates to the URL on justice.gov (or, in some cases, another government repository) where the original document is hosted. For most documents, this is the official DOJ public-records page. You may see:

  • A landing page with an age-verification or terms-of-use prompt; click through to reach the document.
  • A direct link to the PDF, which your browser will display or download depending on its settings.
  • In some cases, a redirect to a different government site that re-hosts the document.

The system does not host the documents itself. Citations always lead to the canonical government source.

What if a document has no online URL?

Some documents in the corpus have an EFTA identifier but no current online URL. For those documents, the citation appears as plain text without a hyperlink. This is rare, and it is handled deliberately: the system would rather show you the identifier than fabricate a URL.
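
The fallback amounts to a guard before the link is built. In this sketch, format_citation is a hypothetical helper, and treating an empty or malformed online_url as missing (requiring an http or https scheme and a host) is an added assumption, not a rule stated on this page.

```python
from typing import Optional
from urllib.parse import urlparse


def format_citation(efta_id: str, online_url: Optional[str]) -> str:
    """Link the identifier only when a usable URL is on record; never synthesize one."""
    if online_url:
        url = online_url.strip()
        parsed = urlparse(url)
        # Hypothetical sanity check: require an http(s) scheme and a host.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return f"[{efta_id}]({url})"
    # Missing, empty, or malformed URL: fall back to the bare identifier.
    return efta_id
```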

Why citations rather than quotations

You may notice that the assistant cites documents by identifier rather than by long verbatim quotation. This is intentional and serves several purposes:

  • Honesty about scope. A citation should point you to a complete source, not a curated excerpt that frames the source in a particular way.
  • Document length. Many of the source documents are court filings of dozens or hundreds of pages. A short answer cannot meaningfully quote them, but a citation lets you read the full context.
  • Avoiding manipulation. If the assistant only quoted snippets, it would be easy to inadvertently emphasize one side of an issue. A citation system pushes the reader to engage with the underlying source.

Verifying an answer in practice

We recommend the following workflow when you want to verify an assistant claim:

  1. Identify the specific sentence or claim you want to check.
  2. Note the EFTA identifier(s) cited at or near that sentence.
  3. Click the link to open the original PDF.
  4. Use your browser’s text-search to locate the relevant passage in the document.
  5. Compare the assistant’s summary against the original wording.

Because every citation leads to the full original document, this check does not depend on trusting the assistant. If you find a discrepancy between an assistant answer and a cited document, treat the cited document as authoritative: the assistant may have summarized too aggressively, or retrieval may have surfaced an imperfect match. In either case, the source is the truth, and the source is one click away.