Data Takes Command

This week continued a theme that we have focused on over the past three weeks: the digital is fundamentally different from the physical, and we ignore those differences at our peril. As I learned about search, databases, and the new ways in which historians work, I became a little more pessimistic about this whole digital history thing. Patrick Spedding, Caleb McDaniel, Lara Putnam, and Marlene Manoff all criticized some of the unexamined practices of historians using digital tools. Whether the problem is poorly OCRed full-text databases used for keyword searching, inexact and irreproducible searches, or the troubling implications of digital methods in general, the digital offers as many pitfalls as peaks.

Theoretically, as someone versed in the new digital methods, I will be aware of these shortcomings, think more deeply about what is happening “inside the box,” and understand when a digital method is less appropriate than a traditional close reading. James Mussell describes some of these digital best practices for historians: connect directly to APIs and you’ll know exactly what search parameters you’re using, and other scholars will be able to replicate your search. But Jennifer Rutner and Roger Schonfeld’s Ithaka S+R report on research practices indicated that historians fell much more into the camp of inexact keyword searching, ending up with piles of unidentified archival photos (not a new phenomenon) and a mass of citations. The unorganized hard drive has replaced the unorganized filing cabinet.
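To make the point about replicable searches concrete, here is a minimal sketch of recording a search as explicit parameters rather than as something typed into a search box. The endpoint `api.example.org` and the parameter names (`q`, `from`, `to`, `rows`) are made up for illustration; any real archive's API will have its own names and rules.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; not a real service.
BASE_URL = "https://api.example.org/search"


def build_search_record(term, start_year, end_year, page_size=50):
    """Return the exact request URL together with the parameters behind it."""
    # Parameter names here are assumptions, not any vendor's actual API.
    params = {
        "q": term,
        "from": start_year,
        "to": end_year,
        "rows": page_size,
    }
    # Sort the keys so the same search always produces the same URL.
    query = urlencode(sorted(params.items()))
    return {"url": f"{BASE_URL}?{query}", "params": params}


record = build_search_record("temperance society", 1830, 1860)

# Saving this record alongside research notes lets someone else rerun the search.
print(json.dumps(record, indent=2))
```

The search term, date range, and page size are written down in one place, so the query can be cited, shared, and rerun, which a keyword typed into a vendor's interface generally cannot.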

And I cannot claim to be much better or more organized, especially when it comes to databases. As part of my day job as a public historian, I’m constantly creating databases. They range from simple Excel spreadsheets for research notes to large proprietary databases of digital documents or resource figures. Yet through Stephen Ramsay’s article (not too much has changed in the past decade regarding relational database philosophy) and especially Mark Merry’s online course, I learned many of the crucial organizing principles behind relational databases, from method-driven vs. source-driven design to entity relationships.
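As a small illustration of what an entity relationship looks like in practice, here is a sketch using Python's built-in sqlite3 module. The two tables, their columns, and the sample rows are invented for the example and are not taken from Ramsay's article or Merry's course; the point is only that people and documents live in separate tables linked by a key, rather than being repeated in every spreadsheet row.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Two entities (person, document) and a one-to-many relationship:
# one person can be the author of many documents.
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE document (
    document_id INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    year        INTEGER,
    author_id   INTEGER NOT NULL REFERENCES person(person_id)
);
""")

# Invented sample data for illustration.
conn.execute("INSERT INTO person (person_id, name) VALUES (1, 'Example Author')")
conn.executemany(
    "INSERT INTO document (title, year, author_id) VALUES (?, ?, ?)",
    [("Letter to a colleague", 1852, 1), ("Field notes", 1853, 1)],
)

# Join the two tables back together to list each document with its author.
rows = conn.execute("""
    SELECT person.name, document.title, document.year
    FROM document
    JOIN person ON person.person_id = document.author_id
    ORDER BY document.year
""").fetchall()

for row in rows:
    print(row)
```

Because the author's name is stored once, correcting a misspelling or adding a birth date means changing one row instead of hunting through every note that mentions that person.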

Lev Manovich argued that while new media and databases can approximate or represent physical, real-world objects or media, we should not forget that this representation is built on top of software. What we need to remember as we create new research methodologies is that we are constantly interfacing with software and data rather than paper documents, and that these interactions are directed by the digital-industrial complex: ProQuest, Elsevier, and Thomson Reuters. Until enough scholars demand API endpoints, transparent and stable databases, and disclosure about OCR accuracy, these keepers of the digital documents are unlikely to listen.
