The Internet made me do it

Confessions of an Accidental Archivist

Posted by Jack Brighton on May 16, 2015

My main theme is the extension of the nervous system in the electric age.

Marshall McLuhan - Letters of Marshall McLuhan (1987), p. 300

The ability to record and play back moving images and sound has been part of the human technology repertoire for barely more than 100 years. Electricity wasn’t commonplace anywhere until the turn of the twentieth century. The combination of electricity and media technology has brought an acceleration of change in our capacity to see and hear, paradoxically both extending and annihilating distance and time. It has also allowed us to create new forms of time-based narrative arts, and increasingly intervenes in every economic, political, and social interaction.

And most of this happened before the Internet.

If “all media are extensions of some human faculty,” as McLuhan has it, then archival media are an extension of our memory and imagination. For this to actually mean anything, the preservation of moving images and sound is necessary but not sufficient: the value of archival media in human culture depends on ready access.

I began my professional life in public broadcasting as a journalist and creator of electronic media. It is toward the realization of “persistent access” to the vast store of our multimedia memory that my career has increasingly become focused.

Fortunately we have no resources

Exposure to archival practice came from my struggles as a media producer in the emerging digital age. I began designing websites with streaming and downloadable multimedia in 1997, and quickly realized that without an archival plan the situation was becoming hopeless. I saw how quickly technology was changing, and suspected that the media we published on the web at that time would be unplayable within a few years. I placed bets all over the board by publishing RealMedia, QuickTime, Windows Media, and Flash, but I assumed all these formats would sooner or later be obsolete. I guessed wrong about how quickly that would happen. But I got one or two things right: I saved the original physical formats, and high-resolution digital files derived from the masters. And I began to maintain a database of everything I saved.

Fortunately nobody saw me doing this. As a lowly producer, I had no budget. Therefore, no one could cut my budget or tell me to stop. I was the guy running the station website in his “spare time,” and no one complained so long as I did the magic. But I soon noticed something truly disturbing: the problem of persistent media was growing more difficult, more complex, and just plain larger every day. And the rate of change was accelerating.

And that’s not counting our shelves and desk drawers everywhere filled with analog media in fragile formats with barely a peeling label to tell us what they are.

Toward content management and a life of data

You build a website with content and metadata. The presentation is just the access layer. A web page is accessed by humans by way of a browser that parses the HTML and presents a view we can understand and use. The same web page is accessed by machines (like web crawlers and other applications) by virtue of well-structured data they can parse and reuse. As I began to understand these things, I learned from my colleagues at other public media organizations that many of us were becoming deeply invested in technical solutions that could manage this data, and present it to humans and machines, at the cost of a very large contract and ongoing vendor dependency.

The solution I pursued seemed half-baked by comparison with these expensive, proprietary systems: a simple MySQL database tied to a file system. With this as a data core, we can build applications that speak the language of the Internet. We can present web pages to humans; RSS/podcast feeds to iTunes; Dublin Core and PBCore records to library and archival services. The most difficult challenge is getting good data to begin with. So that’s what I focused on, and sought external funding for with some success.
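As a rough illustration of the "one data core, many outputs" idea (a sketch, not the actual Illinois Public Media code), here is how a single record might be rendered both as a simple Dublin Core record and as an RSS item. The record, its field names, the `example.org` URL, and the bare `record` wrapper element are all hypothetical stand-ins for a row in the database.

```python
import xml.etree.ElementTree as ET

# Dublin Core element set namespace, registered so output uses the "dc:" prefix.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical record: stands in for one row of the database.
record = {
    "identifier": "example-0001",
    "title": "Example program title",
    "description": "Example description of the program.",
    "date": "2015-05-16",
    "media_url": "https://example.org/media/example-0001.mp3",
    "media_bytes": "1234567",
}

def to_dublin_core(rec):
    """Render the descriptive fields as Dublin Core elements."""
    root = ET.Element("record")  # wrapper element name is an assumption
    for field in ("identifier", "title", "description", "date"):
        ET.SubElement(root, f"{{{DC_NS}}}{field}").text = rec[field]
    return ET.tostring(root, encoding="unicode")

def to_rss_item(rec):
    """Render the same record as an RSS 2.0 item with a media enclosure."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = rec["title"]
    ET.SubElement(item, "description").text = rec["description"]
    ET.SubElement(item, "guid", isPermaLink="false").text = rec["identifier"]
    ET.SubElement(
        item,
        "enclosure",
        url=rec["media_url"],
        length=rec["media_bytes"],
        type="audio/mpeg",
    )
    return ET.tostring(item, encoding="unicode")

print(to_dublin_core(record))
print(to_rss_item(record))
```

The point of the sketch is that both functions read from the same record. If the underlying data is good, adding another output (a PBCore record, say) means writing another small renderer rather than re-describing the media.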

Today we openly share well-structured data with a growing number of public institutions and archival services, including the American Archive, the Pop Up Archive, Collective Access, and the Internet Archive. We provide open access to high-resolution digital media files and all the metadata we can capture and share. In some cases our partners add to and enhance the existing metadata. For example, the Pop Up Archive provides a speech-to-text transcription service that feeds back into our core data. Most of this happens automatically using PBCore as an exchange format between systems.

At Illinois Public Media we don’t aspire to fully solve the archival problem. We simply try to be an awesome version of the best data source ever. I think it’s important to keep role and scope in mind as more public media and cultural heritage institutions take on the challenge of preserving and creating access to their media collections. Illinois Public Media will serve as the source and authoritative voice for our collections, and if we can’t stand up our own trusted repository it’s within our mission to share them with other public institutions that can.

If you don’t know what you have, you don’t really have it

The complexities of persistent access in the age of rapid change in media technology can seem overwhelming for public media and other cultural heritage institutions. The situation is more dire with the deterioration of our legacy analog and physical media collections. We are racing against both time and scope, and for some significant portion of our audiovisual heritage, we will lose this race.

If we have any remaining arguments about which system or tool is best for managing media content and metadata, we can enjoy that conversation over drinks. The most important thing is to get the data about what we have, and put it in structured form so it can be accessed and shared. Media census projects, like those recently completed at Indiana University and the University of Illinois at Urbana-Champaign, are needed to begin answering the question of what there is to preserve. We can then marshal that data to prioritize the work of preservation and “persistent access.”
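To make "marshal that data to prioritize" a little more concrete, here is a minimal sketch of ranking census records by a crude risk score. Everything in it is an assumption for illustration: the fields, the 1 to 5 condition scale, and especially the format weights, which are invented and not drawn from any actual census.

```python
from dataclasses import dataclass

# Invented weights for illustration only; a real census would supply its own.
FORMAT_RISK = {"lacquer disc": 3, "open-reel tape": 2, "vhs": 2, "cd": 1}

@dataclass
class CensusItem:
    identifier: str
    media_format: str
    condition: int  # hypothetical scale: 1 (good) to 5 (poor)

def priority(item):
    """Higher score means preserve sooner; unknown formats get weight 1."""
    return FORMAT_RISK.get(item.media_format, 1) * item.condition

def rank(items):
    """Return items ordered from most to least urgent."""
    return sorted(items, key=priority, reverse=True)

items = [
    CensusItem("a", "cd", 1),
    CensusItem("b", "lacquer disc", 4),
    CensusItem("c", "vhs", 3),
]
for item in rank(items):
    print(item.identifier, item.media_format, priority(item))
```

The scoring rule matters less than the precondition: none of this is possible until the inventory exists in structured form.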

Persistence of Vision

The written symbol extends infinitely, as regards time and space, the range within which one mind can communicate with another.

Samuel Butler, Life and Habit, London: Trubner & Co, 1878

We live at a time when all previous forms of media are potentially hyperlinked and accessible. We added moving images and sound to the written symbol as means to extend perception, communication, knowledge, and imagination. It may seem lofty to claim that our media archives are to culture what memory is to the human brain. But to the extent that we have not yet embraced caring for these as means of extending our senses, our vision and reach are impaired.

The work still to be done is far from trivial, but my experience has been that it begins by focusing on the data. The technical means of handling data will change, but we can make the data accessible as systems and standards evolve. Small institutions like Illinois Public Media can’t do everything required for preservation, but we can serve as a node in a larger preservation ecosystem. In terms of preserving an aggregate of potentially all institutional media collections, resources can be allocated at different levels depending on roles and scope. The thing that has brought such rapid change, the Internet, is also a fantastic means of connecting the levels.

We may have limited resources, but fortunately we have an architecture of collaboration we’re just now beginning to understand.