Portrait of the AI as a Young Man: Future Prospects of AI in Archives Cataloging

The following was submitted by AV Cataloging and Metadata Intern, Emily Hankins.

Futuristic depictions of a technological world (think Blade Runner or RoboCop) often center on machines taking over humans, or humans fighting back against machines. It’s becoming easy to see this reflected in the real world now, too, albeit with fewer physical robots running around our streets and more digital robots running around online. Artificial intelligence is often associated with making real things harder to find, flooding social media feeds with AI-generated posts, and creating chat bots that pop up at every corner. However, my time as an intern at GBH this past semester has allowed me to temper my hostility towards AI and bring nuance to my understanding of what a future with this technology could actually mean, especially in archives.

One of the main projects that I worked on this semester, alongside another GBH Archives intern, Kaycee Conover, and the Metadata Operations Manager, Owen King, was to improve a system that recognizes on-screen text for materials being catalogued in the American Archive of Public Broadcasting (AAPB). Initiatives for providing archival access often involve a trade-off between quality and quantity, and the AAPB is no different. Having a designated team of catalogers sit down and capture the data could be possible, but with thousands upon thousands of hours of recorded broadcasts, it couldn’t be done without significant sacrifices in time. So in comes the computer.

The current workflow relies upon technology from the CLAMS team’s work on image classification for scene recognition and automatic text recognition by a vision language model (VLM). This is followed by prompting a large language model (LLM) to get a standardized output, and then having humans review the work to enhance the catalog record for each broadcast. It’s truly a hand-in-hand collaboration between human and machine, using AI to create a framework that makes it easier for catalogers to clean it all up.
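In rough outline, that flow can be sketched in a few lines of Python. This is a hypothetical illustration of the VLM → LLM → human-review sequence described above; the function names, field names, and stub logic are my own stand-ins, not the actual CLAMS or AAPB code.

```python
# Illustrative sketch of the pipeline: VLM extracts on-screen text,
# an LLM-style step maps it to standardized catalog fields, and the
# cataloger reviews and corrects the draft record.

def vlm_extract_text(frame):
    """Stand-in for the vision language model: returns raw on-screen text."""
    return frame.get("on_screen_text", "")

def llm_standardize(raw_text):
    """Stand-in for the LLM prompt that maps raw text to catalog fields."""
    parts = raw_text.split("\n", 1)
    record = {"name": parts[0].strip()}
    if len(parts) > 1:
        record["attribute"] = parts[1].strip()
    return record

def human_review(record, corrections=None):
    """The cataloger gets the last word: apply any manual fixes."""
    return {**record, **(corrections or {})}

frame = {"on_screen_text": "John Smith\nState Senator"}
draft = llm_standardize(vlm_extract_text(frame))
final = human_review(draft, corrections={"name": "Smith, John"})
print(final)  # {'name': 'Smith, John', 'attribute': 'State Senator'}
```

The point of the shape, rather than the details: the machine produces a draft, and the human correction step is built into the pipeline rather than bolted on afterward.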

In the optimal scenario, a person’s face is clearly depicted with their name below them, which is captured by the VLM, and analyzed by the LLM to create an output that formalizes the name and its parts into a normalized format (like Smith, John) as well as any attributes under their name. For example:

The image is the actual frame from the broadcast, the extracted text shows what the VLM recognized, and the catalog data is the output from the LLM to fit the standardized format, editable by the cataloger to fix any mistakes. As the automatic processing pipeline has been improved and changes have been made, more and more frames are following this ideal chain of events, minimizing the amount of manual typing the cataloger needs to do to get this information out there.
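The name-inversion step ("John Smith" becoming "Smith, John") can be sketched as a small function. This is a minimal, assumed version of that normalization, not the project's actual logic; real names are far messier (suffixes, particles, single names, stage names), which is exactly where the cataloger's review comes in.

```python
# Minimal sketch of normalizing a display name into "Surname, Forename"
# order. Deliberately naive: it treats the last word as the surname,
# which fails for many real names and is why a human reviews the output.

def normalize_name(display_name):
    parts = display_name.strip().split()
    if len(parts) < 2:
        return display_name.strip()  # single names pass through unchanged
    surname = parts[-1]
    forenames = " ".join(parts[:-1])
    return f"{surname}, {forenames}"

print(normalize_name("John Smith"))  # Smith, John
```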

But of course it isn’t all going to flow this easily. Take the following frame:

Early on in our work, the syntax that we were using only allowed us to catalog one person per frame, and for most of what we encountered this worked perfectly fine. But as seen here, there are two people, and if only one person per frame is allowed, then you’re forced to make decisions about who to include and who to ignore. Instead, we came up with a way to denote when there are two people being cataloged as separate entities, to give them each a chance to exist in catalog records for future searching.
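One way to picture that change: instead of a single record per frame, the extracted text is split into one record per person. The delimiter and field names below are hypothetical, my own stand-ins rather than the project's actual syntax.

```python
# Illustrative sketch of parsing one frame's extracted text into a list
# of separate person records, so no one on screen has to be ignored.
# Assumes (hypothetically) that people are separated by ";" and that
# each person's first line is a name and second line an attribute.

def parse_frame_entities(raw_text, delimiter=";"):
    """Split the extracted text into one record per on-screen person."""
    records = []
    for chunk in raw_text.split(delimiter):
        lines = [ln.strip() for ln in chunk.strip().split("\n") if ln.strip()]
        if not lines:
            continue
        record = {"name": lines[0]}
        if len(lines) > 1:
            record["attribute"] = lines[1]
        records.append(record)
    return records

print(parse_frame_entities("Jane Doe\nMayor; John Smith\nCouncilman"))
```

The design point is small but real: once the output is a list, a frame with one person and a frame with four are handled the same way.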

And there are so many more areas of public broadcasting that don’t fit the typical mold and that haven’t been encountered yet. Because of this, humans are always going to be needed to provide input into the final product, to provide the final look at a human-created, machine-analyzed cataloging product. It’s refreshing to realize that the computer is able to adapt to the creativity that makes public broadcasting so unique and that these core values are going to be retained even as we utilize new technologies to optimize the process.

Towards the beginning of the internship I also encountered an instance where the computer saw something we couldn’t. Looking at this frame:

A human looking at this would likely just see this as a black rectangle, with the “other text” marker at the top being an example of the computer mislabeling the frame. But after adjusting the brightness and contrast, you can see that there actually is text in the bottom right corner that reads “Dr. Kaye Whiley.”
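The brightness-and-contrast adjustment that revealed the hidden text is, at heart, a linear rescale of pixel values. Here is a small, pure-Python sketch of that kind of contrast stretch (in practice an image editor or a library like Pillow does this); the pixel values are invented for illustration.

```python
# Sketch of a linear contrast stretch: map the darkest value in the
# frame to 0 and the brightest to 255, clipping anything outside.
# Near-black text on a near-black background becomes visible because
# the tiny difference between them is stretched across the full range.

def stretch_contrast(pixels, lo=None, hi=None):
    """Linearly map the range [lo, hi] onto [0, 255], clipping outside it."""
    lo = min(pixels) if lo is None else lo
    hi = max(pixels) if hi is None else hi
    if hi == lo:
        return [0 for _ in pixels]  # flat frame: nothing to stretch
    return [max(0, min(255, round(255 * (p - lo) / (hi - lo)))) for p in pixels]

# A row from a near-black frame: "text" pixels (12) are barely brighter
# than the background (4) until the range is stretched.
frame_row = [4, 4, 12, 12, 4]
print(stretch_contrast(frame_row))  # [0, 0, 255, 255, 0]
```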

This example showed me how using AI programs to detect text, though it may miss some, is also able to see others that humans would have a harder time detecting. It really brings home the cooperation of human and machine that I have been thinking about when working on this project.

The last example I have is about recording the person when no name is displayed on screen. This situation isn’t too uncommon, especially for on-the-street interviews and quick cameos. The bulk of our cataloging this semester has been for the New Jersey Nightly News on New Jersey Public Television (later renamed the New Jersey Network or NJN), and by focusing on the same series we were able to become familiar with the recurring faces of the cast and their styles for editing their news programs. The anchors, such as Kent Manahan and Sandra King, are often given Chyrons with their names and locations, which are picked up by the VLM and displayed for us to catalog. We also created tags to label when persons are part of the station such as anchors and reporters (rather than guests or interviewees only shown occasionally).
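The staff-versus-guest tagging could be as simple as a lookup against a list of known recurring faces. The tag names and the known-staff mapping below are hypothetical examples for illustration, not actual NJN cataloging data.

```python
# Illustrative sketch of role tags: people known to be station staff
# get their role tag; anyone else defaults to a guest tag. The mapping
# here is a made-up stand-in, not the real cataloging vocabulary.

STATION_STAFF = {
    "Kent Manahan": "anchor",
    "Sandra King": "anchor",
}

def tag_role(name):
    """Tag recurring station staff; everyone else defaults to 'guest'."""
    return STATION_STAFF.get(name, "guest")

print(tag_role("Kent Manahan"))  # anchor
print(tag_role("Hela Yungst"))   # guest
```

As the next example shows, a fixed mapping like this is exactly the kind of category that real broadcasts eventually break.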

But another recurring face was missed. The host of the New Jersey Lottery’s nightly drawing, appearing at the end of each program, never had her name shown on screen despite appearing so many times. How can you record her contribution when her name is never displayed?

In this case, I was able to find the Wikipedia article for the host, Hela Yungst. But finding her name required 1) my motivation to see if I could find her name using an outside source, rather than just leaving the cataloging record blank, and 2) her information to be out on the internet. Her name itself also presented questions about our definitions of normalized names, as she was born Hela Yungst but often went by Hela Young on screen. And her presence over so many programs despite not being an official member of the NJN staff breaks the categories we established about who is a regular for the show. These questions are the kinds of areas where humans really are important in this whole process, to offer empathetic and complex understandings to questions that don’t necessarily have correct answers.

Throughout all of this work, the effects of this system on the cataloging output always have to be kept in consideration. Within the cataloging team there’s lots of reflection on the biases inherent in the technology, the people that may be missed, or how the workflow changes when you incorporate another perspective into it. The archives team at GBH is thinking about all these questions; they aren’t just implementing AI to speed things along at a major sacrifice to the quality or goals of the work, and all of this really changed how I view AI as a tool in archival work. The purpose of cataloging has remained the same, and in a sense is further fueled by just how much these capabilities can improve the number of items that can be accessed. Rather than molding the materials to somehow fit the capabilities of the AI model, throughout the semester we made revisions to the tool to fit how humans had decided to portray themselves and their communities through broadcasting. The uniqueness of public broadcasting doesn’t stop the process; it just means that we need to adapt it.

The knowledge I have gained over these past few months will carry into the intentions that inform my decision-making as an archivist going forward, especially regarding who is added to the archival record and who is being left out. People are the center of archives, and this work has truly cemented that. Being a part of this project has also been illuminating in how I understand the purpose of cataloging in archives and how computers can be involved in what often feels like a paper-based profession, brightening the possibilities that the future holds for expanding who can be accounted for.

Maybe Ridley Scott was wrong, and in the end we don’t have to fight the robots.
