The year is 2008. I walk into a movie theater in my Michigan hometown, not knowing that the animated sci-fi movie I am about to watch about a trash-compacting robot will cement my view on artificial intelligence for the next decade and a half. The younger me has no idea that the automation and AI shown in the movie will be closer to fully realized by the time she gets to graduate school. It’s a fantasy, but it’s one that plants a seed of distrust.
Of course, AUTO, the AI autopilot that plays the role of the villain in WALL-E, is nothing like the AI models we have become familiar with today. In my time in Simmons University’s MLIS program, I have heard and taken part in many discussions on how we can utilize AI in the field of library and archival science. There are many arguments for and against AI in libraries and archives, but I find that many points (for me) boil down to something like this: We don’t want AI to replace us in helping people – we want it to make it easier for us to help those people.
My main goal in my final semester at Simmons was to tackle the parts of the archival profession with which I was less than comfortable: namely, those that had to deal with the digital aspects of archiving that are the present and future of the field. I enrolled in classes on metadata standards and digital asset management, and I secured an internship at GBH that, unbeknownst to me, would tie everything together in the most serendipitous way.
I find that I am a kinesthetic learner who is big on communication as a management strategy – that is, I learn best when I am given a task, tossed in the deep end, and allowed to ask as many questions as I need to. Luckily, GBH’s Media Library and Archives was the perfect environment for that. As the Audio Visual Metadata and Training Data intern working on the American Archive of Public Broadcasting (AAPB), GBH’s collaboration with the Library of Congress, I have gained so much practical knowledge of digital archives. It seemed that every week, I learned something in one of my courses that was immediately applicable to my work at GBH. No matter the task, the Archives team has been incredibly supportive in teaching me how to apply these newly learned skills in the context of the AAPB.
While my weekly assignments have varied, the majority of my work has focused on CLAMS, a project led by the Brandeis Lab for Linguistics and Computation. CLAMS aims to “improve access, search, and exploration of archival audiovisual material” by creating AI tools that can generate information from those materials that would otherwise go unseen and unindexed. Since the beginning of the CLAMS project, GBH Archives and the AAPB have been heavily involved—defining use cases, sharing data, and integrating CLAMS apps into their local data workflows. Over these past months, I have had the privilege of working on many aspects of this project from the archival side, and I will admit I started off curious but also somewhat mistrustful because of the AI involvement. There is a balance to be struck when incorporating AI into any field, and it was one I hadn’t previously seen achieved. Despite the cartoonishly evil memory of AUTO lingering in the back of my mind, though, the CLAMS project has been eye-opening for me in terms of how AI can actually be used to help archivists, researchers, and other users.
My supervisor, Owen King (Metadata Operations Specialist at GBH), introduced me to the CLAMS project by showing me how to identify different types of text frames within videos. The SWT (scenes-with-text) detection CLAMS tool extracts scenes containing textual content from videos, so becoming familiar with the taxonomy of these text frames was key to knowing what we were trying to teach the program to identify. With this foundational knowledge, I began going through the AAPB and curating batches of videos that were diverse in both content and text format.
Over the next several weeks, I curated more of these batches, each focusing on different kinds of video content that the program had trouble identifying in earlier rounds. In many ways, it felt like I was learning alongside the program. A big learning moment came during the week of my spring break, when I was finally able to meet with the Brandeis team. Before that meeting, my supervisor asked me to look over visual indexes, or “visaids,” from two sets of videos, both of which had been processed by SWT with several different settings, and determine what the program most consistently had trouble identifying. It was a big undertaking, but a rewarding one in terms of furthering my understanding of the project. This was the point at which I began to, as Owen put it, “give the program some credit,” and he wasn’t wrong.
I could see that, at times, the program desperately wanted to identify certain frames, resulting in false positives. It sometimes missed things, too, resulting in false negatives, but overall, it really seemed like it wanted to try. It even attempted to identify, in an animated video, a type of text frame one would usually expect to see only in live-action footage (like a billboard in the background), and it did so somewhat accurately. With this evidence in hand, I was able to make a case to the Brandeis team, during my presentation, for training the program to identify that type of frame in animation in order to strengthen its identification of unusual instances of that text frame.
From that point on, I felt better equipped to put together batches of assets that would challenge the program on the things it had trouble with, like plants or power lines. Each time, my understanding of the program grew, and I hoped that these training batches would help it grow into what its creators intended it to be. A WALL-E, not an AUTO, if you will.
My time at GBH has helped me achieve my goals and better prepared me to work in this field. I am leaving this internship not only with many new skills and some of the best mentorship I’ve received in this field, but also with a new perspective on the future of AI in archives. I am less fearful of the possibility that robots will take our jobs and more hopeful that projects like CLAMS will strengthen our abilities as archivists to help not only our user bases but each other as well.

