Who runs this site, how content is written and reviewed, and how to report errors. Last reviewed: May 2026.
Epstein Files is maintained by Juan, an independent software engineer. The project began as a public-interest experiment in applying retrieval-augmented generation (RAG) to a body of court records that, in its raw form, is too large for a single person to browse — 2.2 million indexed passages drawn from DOJ filings, depositions, exhibits, and congressional materials related to the Jeffrey Epstein and Ghislaine Maxwell cases.
I am not a journalist or a lawyer. I am the sole author and reviewer of the editorial content on this site (topic guides, methodology pages, glossary, FAQ, this page). My background is in software, and the tool exists because the underlying records are already public — anyone can read them on justice.gov — but few people have the time to do so. A searchable interface lowers the cost of looking.
This is a non-commercial project. There is no investor, no sponsoring publication, and no editorial board outside of me. Hosting, models, and the vector database are paid for out of pocket. If the site begins running advertising through Google AdSense, that will be disclosed here, and ad revenue will only be used to offset infrastructure costs. There is no paid content, no affiliate placement, and no relationship — paid or otherwise — with any person or entity named in the underlying documents.
The site is built around two kinds of content, and the distinction matters:
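Editorial content (topic guides, methodology pages, glossary, FAQ, this page) is written and reviewed by me.
Chat answers are generated by the assistant and cite their sources with identifiers of the form [EFTA00000000], linking back to the original PDF. Chat answers are not human-reviewed before they are shown to you; they should be read as a research tool, not as edited prose.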
When you ask a question, the system does the following:
The question is embedded with the text-embedding-3-small model.
Each [EFTA…] identifier in the answer is turned into a clickable link to justice.gov.
This pipeline has known failure modes, documented honestly on the limitations page. The most important: language models can produce plausible-sounding citations that don’t actually exist, can misattribute claims to documents that don’t support them, and can be confidently wrong about specifics. Treat every assistant answer as a starting point, not a settled fact, and follow the citation to the original PDF before relying on anything important.
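As an illustration only, the first failure mode (a citation that does not exist) can be partly caught by comparing the identifiers in an answer against the passages that were actually retrieved. The sketch below is not this site's code. It assumes identifiers look like the [EFTA00000000] example on this page, and the base URL, the `.pdf` suffix, and all function names are hypothetical placeholders, since the real justice.gov link layout is not described here.

```python
import re

# Assumption: a citation is "EFTA" followed by digits, wrapped in square brackets,
# matching the [EFTA00000000] example on this page.
CITATION_RE = re.compile(r"\[(EFTA\d+)\]")

# Hypothetical placeholder: the real justice.gov URL layout is not given on this page.
BASE_URL = "https://example.invalid/documents/"


def cited_ids(answer: str) -> list[str]:
    """Return cited identifiers in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for match in CITATION_RE.finditer(answer):
        seen.setdefault(match.group(1), None)
    return list(seen)


def unsupported_citations(answer: str, retrieved_ids: list[str]) -> list[str]:
    """Return cited identifiers that were not among the retrieved passages."""
    allowed = set(retrieved_ids)
    return [doc_id for doc_id in cited_ids(answer) if doc_id not in allowed]


def linkify(answer: str, base_url: str = BASE_URL) -> str:
    """Replace each bracketed identifier with an HTML link (URL shape is assumed)."""
    def to_link(match: re.Match) -> str:
        doc_id = match.group(1)
        return f'<a href="{base_url}{doc_id}.pdf">[{doc_id}]</a>'
    return CITATION_RE.sub(to_link, answer)
```

A check like this only flags identifiers that were never retrieved. It cannot tell whether a retrieved document actually supports the claim attached to it, so reading the original PDF remains necessary.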
If you find an error — a misstated fact in a topic guide, a broken citation, an incorrect glossary definition, a documented historical detail that I’ve gotten wrong — please report it. I take corrections seriously and will fix issues promptly, attributing the correction in the page footer with a date.
For corrections, factual disputes, takedown requests, or general feedback, email [email protected]. Include a link to the page in question and a citation to the source you believe is correct. I read every email, though I cannot guarantee a response time.
Editorial decisions — what to summarize, how to characterize a legal proceeding, how to handle the appearance of an uncharged individual in a document — are mine. I am open to being persuaded by a well-sourced correction, but I do not take down accurate characterizations of public proceedings on request.
I have no professional, financial, or personal relationship with any party named in the documents in the corpus — survivors, defendants, accused individuals, prosecutors, attorneys, judges, or institutions. I have not been retained, contacted, or compensated by any law firm, advocacy group, journalist, or political organization connected to this case. If that changes, I will disclose it on this page.
The dataset on which the assistant is built — Nikity/Epstein-Files on HuggingFace — is a third-party aggregation of public materials. I did not assemble the corpus; I indexed it. If the upstream dataset is corrected or expanded, this site will re-index in due course. See the dataset and corpus page for a longer discussion.
This site is built on open infrastructure: Python (FastAPI, LangChain) for the backend, Astro and React for the frontend, Pinecone for the vector index, OpenAI for embeddings, Cohere for reranking, and Groq for inference. The application code is independent of the dataset itself. If you want to verify how chat answers are produced (what prompt the model sees, what passages are retrieved, how citations are injected), read the methodology pages, which describe the system without marketing.
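For readers who want a rough mental model before opening the methodology pages, the stages named above (embedding, vector search, reranking, inference) could be chained roughly as follows. This is a sketch under assumptions, not the site's implementation: each stage is passed in as a plain callable instead of a real OpenAI, Pinecone, Cohere, or Groq client, and the function names, the passage shape, the prompt wording, and the `top_k` and `keep` defaults are all hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical passage shape: {"id": "EFTA...", "text": "..."}
Passage = dict


def build_prompt(question: str, passages: Sequence[Passage]) -> str:
    """Put each passage under its identifier so the model can cite it (wording assumed)."""
    context = "\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    return (
        "Answer using only the passages below and cite their identifiers.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


def answer_question(
    question: str,
    embed: Callable[[str], list],                      # stand-in for an embeddings call
    search: Callable[[list, int], list],               # stand-in for a vector index query
    rerank: Callable[[str, list], list],               # stand-in for a reranker
    generate: Callable[[str], str],                    # stand-in for model inference
    top_k: int = 20,                                   # hypothetical default
    keep: int = 5,                                     # hypothetical default
) -> dict:
    """Run embed -> search -> rerank -> generate and return everything needed for auditing."""
    vector = embed(question)
    candidates = search(vector, top_k)
    best = rerank(question, candidates)[:keep]
    prompt = build_prompt(question, best)
    return {"answer": generate(prompt), "passages": best, "prompt": prompt}
```

Returning the prompt and the retained passages alongside the answer is a deliberate choice in this sketch: it keeps visible the three things the paragraph above says a reader may want to verify.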