Inside The New York Times’s A.I. toolkit
By Duy Nguyen, The New York Times

The daily reality of journalism often involves painstaking work that, while important, has little to do with breaking a story. It’s the mundane task of sifting through thousands of documents, the repetitive cycle of drafting a dozen SEO headlines for the same news event, or the challenge of ensuring every piece of copy meets high standards of quality.
At The New York Times, where our journalism will always be human-led, our A.I. Initiatives team has focused on building tools to help reporters and editors solve specific challenges.
Cheatsheet: An Investigative Reporter’s Swiss Army Knife
Cheatsheet, which was developed by my colleague Dylan Freedman, is our answer to the reporter’s age-old problem: the sheer volume of data in an investigation.
Cheatsheet features the familiar interface of a spreadsheet but is powered by advanced technical capabilities. Users upload datasets — for instance, a politician’s public statements over the years, police records from information requests or video recordings of city council meetings. Cheatsheet then allows users to apply “recipes” — tested workflows for specific investigative tasks such as quote extraction, summarization, translation, web searching and information classification. Each recipe processes data columns, generating new columns with the results (witnessing this feels quite magical).
With Cheatsheet, reporters can apply complicated data transformations that would previously have required technical expertise and days of coding work. The tool parallelizes A.I. queries to enable rapid iteration, and recipes can be chained together. For instance, users could use the search recipe to query how each country is being affected by Trump’s tariffs and then summarize the search results, or extract text from social media posts of protests in Turkey and then translate the extracted text. A.I.-generated columns are specifically labeled in the interface to indicate the need to verify any and all generated content.
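The core pattern — a recipe applied cell by cell to a column, in parallel, with outputs that can feed the next recipe — can be sketched roughly as follows. This is an illustrative sketch, not Cheatsheet’s actual code; the recipe functions here are placeholders standing in for real model-backed workflows.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder recipes; in Cheatsheet these would call a language model
# for tasks like extraction or translation.
def extract_text(post: str) -> str:
    return post.upper()          # stand-in for text extraction

def translate(text: str) -> str:
    return f"[en] {text}"        # stand-in for translation

def apply_recipe(recipe, column, max_workers=8):
    """Run a recipe over every cell of a column in parallel,
    producing a new column of results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(recipe, column))

# Chaining: one recipe's output column becomes the next recipe's input.
posts = ["tweet A", "tweet B"]
extracted = apply_recipe(extract_text, posts)
translated = apply_recipe(translate, extracted)
```

Parallelizing the per-cell calls is what makes iterating on a recipe fast even across thousands of rows.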
To date, Cheatsheet has been instrumental to dozens of Times investigations, from analyzing evidence of Donald Trump’s cognitive decline to identifying China’s misuse of U.S. dollars meant for panda conservation to debunking a narrative about Sydney Sweeney’s ad campaign.
Echo: Remixing Our Own Content
A month after I joined the A.I. Initiatives team, I noticed a pattern in the requests coming from our newsroom. From SEO headlines to social copy to one-sentence summaries for newsletters, many editors were seeking faster ways to draft summaries of our own articles before editing them for publication. We could have built bespoke prototypes for each request, but that would have been inefficient, trapping developer time in one-off projects. In an “aha” moment, I realized that instead of creating dozens of siloed solutions, we could build a single, powerful engine to solve summarization as an entire class of problems.
That idea became Echo, a centralized internal tool that connects our vast content archive to the power of large language models. With a single, user-defined prompt, any Times journalist can now summarize, transform or extract information from any piece of Times-owned content. We also shared Echo’s API with developers across the Times company, empowering them to build their own summarization-driven applications. What started as a summarization tool has led to experiments across the company, making it possible to extract key quotes from articles, suggest story tags and even help write news quizzes.
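Echo’s central idea — one user-defined prompt applied to any piece of Times-owned content — can be sketched in a few lines. The function name and shape here are hypothetical, not Echo’s real API; the point is that the prompt, not the tool, is what each editor customizes.

```python
# Hypothetical sketch: a single user-supplied instruction is paired
# with an article's text and handed to a language model. The journalist
# then edits the draft before anything is published.
def run_echo(prompt: str, article_text: str, model=None) -> str:
    request = f"{prompt}\n\n---\n\n{article_text}"
    if model is None:        # no model wired up in this sketch;
        return request       # return the assembled request instead
    return model(request)

draft = run_echo("Write a one-sentence newsletter summary.",
                 "Full text of a Times article...")
```

Because the prompt is the only variable, the same engine serves SEO headlines, social copy, quote extraction and quiz writing without bespoke builds.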
Stet: Defining and Measuring “Good” Copy
Tools like Echo operate in what we call a “breadth-first” paradigm: applying a single action across tens or even hundreds of articles in parallel. When working at that scale, how do you ensure quality, especially in a place of high editorial standards like The Times? A prompt that works well on five articles may fail on the next 50. Relying on subjective “eyeballing” of the results is brittle and risks an erosion of quality over time. In light of this challenge, my colleague Rubina Fillion and I created Stet (notice my affinity for four-letter project names), our framework for evaluating A.I.-generated copy. It’s important to note here that we do not use A.I. to write articles and that our journalists are ultimately responsible for everything that we publish.
The most critical work for Stet happened not in code, but in a room with other editors. The first and most important step was to codify our collective institutional knowledge into a concrete rubric to score A.I.-generated copy. This meant turning the implicit, nuanced judgment of an experienced editor (“this doesn’t feel right”) into a set of explicit, measurable qualities ranging from objective (“the summary is more than two sentences long”) to subjective (“the summary fails to convey the main point of the story” or “it misrepresents the tone”). This process of quantitatively defining success is the most valuable part of the framework.
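One way to picture the codified rubric: objective qualities become plain checks that code can run, while subjective ones stay as questions for a model or an editor to answer. This is an illustrative sketch, assuming hypothetical check names; it is not the Stet rubric itself.

```python
# Objective check: "the summary is more than two sentences long."
def too_long(summary: str) -> bool:
    return summary.count(".") > 2   # crude sentence count, for the sketch

# The rubric mixes automatable checks (functions) with subjective
# questions (strings) that need a model's or an editor's judgment.
RUBRIC = {
    "too_long": too_long,
    "misses_main_point": "Does the summary fail to convey the main point?",
    "wrong_tone": "Does the summary misrepresent the tone of the story?",
}

def objective_flags(summary: str) -> set:
    """Run only the automatable checks and report which flaws fired."""
    return {name for name, check in RUBRIC.items()
            if callable(check) and check(summary)}

flags = objective_flags("One sentence. A second. A third. A fourth.")
```

The division matters: the objective rows give cheap, repeatable signals, while the subjective rows define exactly what human reviewers are asked to judge.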
Once that rubric existed, we could employ automation to score every A.I.-generated summary against our standards. Crucially, this is paired with human oversight: The system also presents a random sample of summaries to human editors for review. This combination provides both the scale of automation and the invaluable ground truth of human expertise, creating a powerful feedback loop. It also aligns with our team’s belief and The Times’s stance on the use of A.I. in the newsroom: A.I. systems should never replace humans. In the Stet framework, editors will always be the first to notice changes in the English language at large and to adapt our standards to them; A.I. exists not to dictate those adaptations but to adhere to them, amplifying human oversight.
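The score-everything, sample-for-humans loop can be sketched as below. The scoring function is a placeholder (real scoring would run the rubric); the sampling logic is the part being illustrated.

```python
import random

def score(summary: str) -> float:
    """Placeholder for rubric-based scoring of one summary."""
    return 1.0 if summary else 0.0

def review_batch(summaries, sample_rate=0.1, rng=None):
    """Score every summary automatically, then pick a random sample
    for human editors — the ground truth that keeps the automated
    scores honest."""
    rng = rng or random.Random(0)        # seeded for a reproducible sketch
    scores = {s: score(s) for s in summaries}
    k = max(1, int(len(summaries) * sample_rate))
    for_humans = rng.sample(summaries, k)
    return scores, for_humans

summaries = [f"summary {i}" for i in range(20)]
scores, for_humans = review_batch(summaries)
```

Editors’ verdicts on the sampled items can then be compared against the automated scores, closing the feedback loop the framework depends on.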
Stet’s principles have already been put to use in developing and refining the A.I. prompt that now suggests alt text for approved images within our content management system. By rigorously evaluating the quality of these suggestions against a clear rubric for what constitutes helpful and accurate alt text, we could confidently integrate a tool that saves our newsroom production time while upholding our high accessibility and editorial standards.
Our Approach to Sharing and Collaboration
The Times is committed to sharing our knowledge and tools to benefit other newsrooms. While internal systems like Echo and Stet are tailored to our infrastructure and are not currently planned to be shared as code, their core value lies in their underlying philosophies, which can be adapted to any newsroom. For reporting tools like Cheatsheet, we’re interested in contributing open-source code and would love to hear from journalists working on similar problems.