The traditional understanding of Machine Learning and Artificial Intelligence is that we (the humans) bestow our authentic and true knowledge upon the ignorant and brainless machine. We explain what truth is to the machine in the hope that it will learn to perfectly replicate the pure human mind. However, my time as an intern at GBH taught me that this relationship between human and computer is more reciprocal than often thought.
One of my roles here at GBH was to create training data for the CLAMS team to use for their Scenes-With-Text (SWT) project. I labeled still frames from selected videos in the American Archive of Public Broadcasting according to specified categories. The goal of this task was to teach an ML model how to classify scenes with text somewhere in the frame.
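For readers curious what this kind of annotation work looks like in practice, here is a minimal sketch in Python. It is illustrative only: the file name, the video identifier, and every category code except “F” (which appears in the guidelines quoted below) are hypothetical, and the actual CLAMS/SWT tooling and label set may differ.

```python
import csv

# Hypothetical category codes. "F" (filmed text) comes from the SWT
# guidelines quoted below; the others are placeholders for illustration.
CATEGORIES = {
    "F": "filmed text",
    "G": "graphical text (placeholder)",
    "N": "no text (placeholder)",
}

def record_label(writer, video_id, timestamp_ms, category):
    """Append one human judgment: this frame belongs to this category."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    writer.writerow([video_id, timestamp_ms, category])

# Usage: label one still frame, sampled 137 seconds into a video.
with open("swt_labels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["video_id", "timestamp_ms", "category"])
    record_label(writer, "hypothetical-video-id", 137000, "F")
```

The model never sees my hesitation, only the single letter I settle on; that flattening is exactly what the frames below made me question.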
So on I trekked, the classification guidelines in one hand and trust in my own objective truth in the other, until I came across a few frames that stumped me:
First, let’s try to make sense of what this frame is even displaying. From what I can tell, this seems to be a picture/scan of a vintage advertisement superimposed on a graphical brick wall. The closest category that would fit this image is “F,” short for “Filmed text.” The specification reads: “It is common for camera footage to include things like signs, labels, papers, computer screens, etc., that have readable text on them. This category includes frames from such footage.”
While this seems close enough, something feels disingenuous about this classification. This advertisement was never “filmed” for this footage. Its image was taken and then fused with other graphical elements to create a cohesive frame. Is text only “filmed” if the camera recording the video captures it as it exists in time and space?
Here’s another example of a frame that left me in a state of uncertainty:
Most people who have seen old computer graphics would conclude that the black and white scribbles in the boxes on the display are text. However, it seems all but impossible to make out what any of the text says. I didn’t know whether or not to tell the computer that this was text. Even if the subject being filmed is text, does it still count as text once it is illegible?
The guidelines for the category of “filmed text” say no: “It also excludes filmed text where the text is completely illegible.”
But now I was contending with multiple considerations: what I (the human) understood the text to be, the relationship between the camera and the text, what the computer had the capacity to understand, and what I wanted the computer to consider as truth.
As I reflected on these varying perspectives, my trust in my own conception of truth began to waver. Yes, the subject of the frame is likely text that is legible. But what happens to the integrity of that text after multiple layers of abstraction (the computer screen, the camera lens, the screen I am watching it on)? This degradation renders the scribbles in the frame only a signifier of text. Still, my trust in the authenticity that comes with the camera lens makes me uneasy about simply disregarding the text on the screen. In the first example, the vintage advertisement, a graphically composite image was never “filmed” in the traditional sense. Is it disingenuous or accurate to label that as filmed text? After all, the text in the frame was captured with a camera at some point in time. Ultimately, the question remains: do we teach the machine to recognize only “[reflections] of basic reality,” or do we, in line with how we treat many abstractions and representations of reality, also want it to accept an image that “bears no relation to any reality whatever: it is its own pure simulacrum”?1
After a week of classifying and labeling, I decided to unwind by catching Ari Aster’s Eddington at the theater. In one of the opening scenes, I immediately noticed a sign with filmed text in the frame. My first thought was simply: “F.”
I quickly realized the nature of what had just occurred. By classifying so many images, I had effectively automated myself, rewiring my malleable and mushy human brain to perform the job of a machine. Yes, my job was to teach the computer how to think like a human, but a byproduct of this transaction was that I was starting to think like a machine. By no longer seeing the movie scene and the text in the frame for their real function, but rather as merely a sign shared between me and the machine, I had effectively abstracted my own perceptual reality away from itself. My perception passed through a model; reality had become hyperreal.2
While this reordering of truth between human and AI may seem bleak, I think it poses a unique opportunity. Artificial intelligence, whether we want it to or not, is being integrated into many aspects of life. But perhaps AI can teach us something about the reality we seek to create for ourselves. It can challenge our notions of truth and remind us to question our biased perception of the world.
We need not look only at ML and AI’s applications to see postmodern implications in the archive world. In his seminal essay “The Work of Art in the Age of Mechanical Reproduction,” Walter Benjamin examines the concept of “aura.” “Aura” is the “unique existence” of a work of art.3 It is “a strange tissue of space and time: the unique apparition of a distance, however near it may be.”4 Another aspect of my work this summer was capturing CDs of WUWM’s radio broadcasts. We can think of listening to the radio show live, in the studio, as having “aura.” Each level of abstraction, however, degrades the aura of the original recording. The recording on the CD, the extraction of that information into a .wav file, the compression of that file into .mp3 format: each of these steps creates distance between the future listener and the original, thus degrading its aura. In the vintage advertisement frame mentioned above, the advertisement now takes on a completely different form and purpose than it originally had. With the power of graphical technology, it has been repurposed and morphed, and much of its aura has degraded with that distance.
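To make the final step of that audio chain concrete, here is a minimal sketch of the .wav-to-.mp3 compression, assuming the ffmpeg command-line tool is installed; the file name and bitrate are hypothetical and not GBH’s actual workflow.

```python
import subprocess
from pathlib import Path

def compress_to_mp3(wav_path: Path, bitrate: str = "128k") -> Path:
    """Lossy-compress a .wav capture into an .mp3 derivative.

    The encoder discards audio detail the .wav still held: one more
    layer of distance between the listener and the original broadcast.
    """
    mp3_path = wav_path.with_suffix(".mp3")
    subprocess.run(
        # -y overwrites an existing output file instead of prompting
        ["ffmpeg", "-y", "-i", str(wav_path), "-b:a", bitrate, str(mp3_path)],
        check=True,
    )
    return mp3_path

# Usage (file name is hypothetical):
# compress_to_mp3(Path("wuwm_broadcast.wav"))
```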
Benjamin would argue that this destruction of the aura is beneficial. Publicly accessible archives maximize reproducibility and outreach while still aiming to capture the uniqueness (or “aura”) of their artifacts. Much of the work I was exposed to as an intern involved replication and abstraction, all serving not only to make public media accessible, but also to ensure, through metadata and preservation, that the record of its “aura” (its existence in time and space) is not lost.
My time at the GBH Archives brought to light the many postmodern considerations at play in the digital humanities, especially in the context of Machine Learning and Artificial Intelligence. Archives, more generally, are a perfect example of balancing the value that exists in the original against the power that comes from mass accessibility. Additionally, as ML becomes a part of many of these workflows, it can teach us about how data shapes our understanding of history and reality. More importantly, it can teach us about what we understand as truth.
1. Jean Baudrillard, Simulations (Semiotext(e), 1983), 11.
2. Baudrillard, Simulations, 2.
3. Walter Benjamin, “The Work of Art in the Age of Mechanical Reproduction,” in The Work of Art in the Age of Its Technological Reproducibility, and Other Writings on Media (Cambridge, Mass.: Belknap Press of Harvard University Press, 2008), 21.
4. Benjamin, “The Work of Art in the Age of Mechanical Reproduction,” 23.

