Transparent documentation of how this research tool works, what it can and cannot do, and how to verify its outputs. A research assistant that asks for trust without explaining itself is, in practice, a confident-sounding rumor — these pages exist so that you do not have to take the assistant's word for anything.
Three pages, written for non-technical readers, describing the entire pipeline from your question to a sourced answer. The first explains the retrieval-augmented generation technique in plain terms. The second describes how citation linking back to the original PDFs is produced and how to verify an answer in practice. The third lists, without softening, the failure modes and editorial limits of the approach.
Reading the three pages takes about ten minutes and gives you everything you need to interpret an assistant answer critically — including what to do when it is wrong.
The technical stack behind this site is documented openly. Embeddings are produced by OpenAI's text-embedding-3-small model. Vector search runs against a Pinecone index containing approximately 2.2 million chunks derived from the public-record dataset. Reranking uses Cohere's rerank-v3.5 model.
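For readers who want to see the shape of the retrieval step in code, the sketch below follows the same sequence using the official Python SDKs. The index name, metadata field names, and candidate counts are placeholders for illustration, not the production values.

```python
import os

import cohere
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()                                      # reads OPENAI_API_KEY from the environment
pinecone_client = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
cohere_client = cohere.Client()                               # reads CO_API_KEY from the environment


def retrieve(question: str, top_k: int = 50, keep: int = 8) -> list[dict]:
    """Embed the question, pull candidate chunks from Pinecone,
    then let the reranker keep the passages that best answer it."""
    # 1. Embed the question with the same model used to embed the corpus.
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=question,
    ).data[0].embedding

    # 2. Approximate nearest-neighbour search over the chunk index.
    #    "document-chunks" is a placeholder index name.
    index = pinecone_client.Index("document-chunks")
    matches = index.query(vector=embedding, top_k=top_k, include_metadata=True).matches

    # 3. Rerank the candidates against the question and keep the best few.
    #    "text" is a placeholder metadata field holding the chunk's content.
    texts = [m.metadata["text"] for m in matches]
    reranked = cohere_client.rerank(
        model="rerank-v3.5", query=question, documents=texts, top_n=keep
    )
    return [matches[r.index].metadata for r in reranked.results]
```

The two-stage design, a broad vector search followed by a reranker, trades a little latency for noticeably better passages, which is why answer quality depends on both of the models named above.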
Generation runs on Llama 3.3 70B served by Groq. Citation linking is a deterministic post-processing step that maps each EFTA identifier mentioned in the model output to the document's online URL on justice.gov.
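The last two steps, generation and citation linking, can be sketched in the same way. The prompt wording, the Groq model id, the passage field names, and the EFTA identifier pattern below are assumptions made for illustration; in production the identifier-to-URL mapping is built from the corpus metadata rather than hard-coded.

```python
import re

from groq import Groq

groq_client = Groq()  # reads GROQ_API_KEY from the environment

# Hypothetical identifier pattern: "EFTA" followed by digits. The real
# identifiers may be formatted differently; treat this as illustrative.
EFTA_PATTERN = re.compile(r"\bEFTA[-_ ]?\d+\b")


def answer_question(question: str, passages: list[dict]) -> str:
    """Ask the model to answer strictly from the retrieved passages,
    citing the EFTA identifier of every passage it relies on."""
    # "efta_id" and "text" are placeholder metadata field names.
    context = "\n\n".join(f"[{p['efta_id']}] {p['text']}" for p in passages)
    completion = groq_client.chat.completions.create(
        model="llama-3.3-70b-versatile",  # assumed Groq model id for Llama 3.3 70B
        messages=[
            {
                "role": "system",
                "content": "Answer only from the provided passages and cite their EFTA identifiers.",
            },
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content


def link_citations(answer: str, url_by_identifier: dict[str, str]) -> str:
    """Deterministic post-processing: replace every known EFTA identifier
    with a link to the document's justice.gov URL. No model call is involved."""
    def replace(match: re.Match) -> str:
        identifier = match.group(0)
        url = url_by_identifier.get(identifier)
        # Identifiers missing from the lookup stay as plain text, so a
        # hallucinated citation is visibly unverifiable rather than linked.
        return f"[{identifier}]({url})" if url else identifier

    return EFTA_PATTERN.sub(replace, answer)
```

Because the lookup table comes from corpus metadata, a citation either resolves to a real document or fails to resolve at all; it cannot silently point somewhere plausible-looking.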
Naming the models matters. It tells you what kind of failure modes to expect, makes the results reproducible in principle, and prevents the assistant from sounding more authoritative than the underlying components warrant. None of these models are infallible; all of them produce occasional errors that the citation system is designed to make detectable.
A non-technical explanation of how this assistant searches the document corpus, retrieves relevant passages, and generates sourced answers.
How the assistant produces clickable citations that link directly to the original PDFs hosted on justice.gov, and why that matters for verification.
An honest list of what this assistant cannot do, where it can mislead, and how to use it responsibly as a research aid rather than an oracle.
The system is fallible by design — retrieval is approximate, language models can hallucinate, and the underlying corpus contains OCR artifacts and redactions. If you find an answer that disagrees with the document it cites, the document is authoritative. Disagreement with a source is a finding, not a malfunction: it tells you the assistant has summarized too aggressively or surfaced an imperfect match. In either case, the source is one click away and is the right place to anchor any conclusion.