Newspaper Navigator Project

The Newspaper Navigator project at the Library of Congress finds and extracts visual content from 16 million pages found in Chronicling America. Although LC is closed to the public, work continues behind the scenes on projects just like this.

Newspaper Navigator Project

Innovator in Residence Ben Lee trained a machine-learning model to identify visual content in historic newspapers at Chronicling America. Lee said,

The 16+ million historical newspapers within Chronicling America are fascinating to me on so many levels. They are a portal back in time and reveal the rich history of the United States in a way that is unique to historic newspapers, from local histories to fun advertisements.

But what excites me most about Chronicling America is how it reaches such a wide range of the American public, including school groups, genealogists, journalists, local historians, researchers, and even people looking to recreate old cooking recipes!

With Newspaper Navigator, I have been fortunate to be able to build on the work of Beyond Words volunteers who identified visual content in World War I-era newspapers; using the annotations volunteers created, I have trained a machine learning model to identify visual content in historic newspapers.

In the first phase of Newspaper Navigator, I have used this model to find and extract all of the visual content included in over 16 million pages in Chronicling America. The resulting dataset, which I am calling the Newspaper Navigator dataset, includes headlines, photographs, illustrations, maps, comics, editorial cartoons, and advertisements.

A doctoral student in Computer Science and Engineering at the University of Washington, Lee was drawn to this project by his own family history.

My interest in using computer science to help identify and analyze images in digitized cultural heritage collections stems from my time as a Digital Humanities Associate Fellow at the United States Holocaust Memorial Museum (USHMM).

My grandmother is a survivor of Auschwitz-Birkenau concentration camp. When I first visited the USHMM with her in 2007 as a teenager, a reference librarian told us about the International Tracing Service, one of the largest repositories of documents pertaining to the Holocaust. A decade later, while a fellow at the USHMM, I learned about the idiosyncrasies and difficulties of manually searching the digital archive.

As a result, I developed a project to aid the search process by using machine learning to identify different types of documents in the archive. Read more about this work here. The idea of facilitating access to meaningful materials has stuck with me. It has been a large motivation in pursuing this line of research with cultural heritage collections with both the Library of Congress and the University of Washington.

Using the Newspaper Navigator Project Dataset

A compilation of extracted images from the Newspaper Navigator dataset (Courtesy, Library of Congress)

Search the dataset for the Newspaper Navigator project at this link. For other projects in the works at LC, visit the Library of Congress Labs here. And for more posts about using newspapers for family history research. click here.