404 Error: Database Not Found

For over 20 years, students, scholars, and the general public have been enjoying the benefits of full-text search of formerly paper documents. Many workflows might look something like this: Google->Wikipedia->JSTOR->Chronicling America/Proquest historic newspapers->Digital Dissertations->digitized archives->physical archives->fill in gaps from other digital sources->add something “transnational” from that digitized French archive that you could never visit otherwise. Collect everything relevant and write it up.

Yet, in a review of three years of Environmental History, the primary journal of global environmental history scholarship, articles made very few explicit mentions of this now-common workflow. There were a handful of references to online material: a blog post, an article in a digital encyclopedia, or even a reference to a particular longitude/latitude on Google Earth. But unsurprisingly, most citations were to books (likely read in physical form, but what about pre-1923 Google books/Open books/etc.), physical articles (probably read via JSTOR/Proquest/etc.), or archival citations (very likely to still be physical archives, but could be online). That the authors didn’t cite what they actually consulted is a problem, as a historian’s methodology is very important to their authority as a scholar.

But this post is less about silence towards the database and more about explicit references to databases or online sources in the articles of the last three years. Two references to this type of work were particularly interesting in representing the new scholarship: John McNeill’s “Presidential Address: Toynbee as Environmental Historian” and Daniel Simberloff’s “Integrity, Stability, and Beauty: Aldo Leopold’s Evolving View of Nonnative Species”.

McNeill’s address concerned Arnold Toynbee, (in)famous public intellectual and historian, as a proto-environmental historian, and what “big” histories could contribute to the field as a whole. In preparing the address/article, McNeill “read only 2% of [Toynbee’s] output” (10-15 million words). He laments:

When I chose the subject for this address, I ignorantly assumed that Toynbee’s works would be available via Google books and I could instantly locate all the passages that use words such as environment or ecology. By the time I learned that I would have to work from the printed pages, it was too late to change my plans. But happily Veronica Boulter Toynbee, who worked as hard as her husband and a bit more carefully, prepared the indexes for all his major books.

Full-text search is so ingrained in our methods that even the President of the American Society for Environmental History assumes it in his workflow. Had the books been digitized, I doubt McNeill would have fully described how he arrived at the requisite references. He might have given us search terms and insight into the relevant passages, as he does in his address, but he likely would not have informed us of the OCR accuracy or of any false positives or false negatives.

The other interesting example comes from Simberloff’s article on Aldo Leopold. I don’t mean to pick on Simberloff, who has written an interesting article exploring one of the conservation movement’s important figures and his views on diversity vs. stability in academic ecology. But Simberloff appears to derive his archival material, beyond Leopold’s books, exclusively from the University of Wisconsin’s digitized Aldo Leopold collection.

The UW-M collection offers a variety of ways to approach its digital works. It helpfully provides some guidelines:

Most users with a scholarly or general interest in Aldo Leopold will find that the Detailed Contents List provides the best access to the collection. It describes each file series and, within each series, each box and folder in the collection, and there are links from the description of each folder directly to the digitized material in those folders.

Alternatively, readers can use the full-text search capability (the collection doesn’t list the OCR accuracy rate or any additional processing). This method is best for those “who are primarily interested in whether Leopold had any connection with a particular person, place, or topic,” though it does not include his handwritten correspondence. Simberloff does not describe which method he took in researching Leopold, and this is a problem for the reader. Did Simberloff ignore the handwritten material while performing a full-text search for “nonnative” or “ecology”? Or did he review the archive front to back?

These are serious questions that our discipline needs to answer as the archive may literally look different each time we approach it.

