Bridging the Gap between Tools and Texts


When I sat down this week with a large stack of papers on textual analysis, I was hoping that they would illuminate for me how literary text functions on the web, and how analysis of literature has been changed by digital technologies.

What I found instead was a history of text encoding that entirely changed my focus.

In “Facilitating Communities of Practice in Digital Humanities: Librarian Collaborations for Research and Training in Text Encoding,” Harriet E. Green examines five case studies of text encoding projects that use TEI, the Text Encoding Initiative guidelines, to encode literary texts in SGML or XML, “two metalanguages that provide an application-independent standard for data interchange” (220). Libraries play a key role in training scholars, usually graduate students, to use these guidelines and markup languages. In the process, scholars join a DH community of practice, gain experience researching biographical information and other relevant contextual data, and contribute to projects whose final products offer enhanced searchability of texts and other benefits for analysis.

After reading this article, I was conflicted. The increased searchability of texts and other digital advancements, such as word counts and text marking, are wonderful for textual analysis. EEBO (Early English Books Online) is a great example of success in this regard. Being included in a community of practice is also valuable, especially in contrast to the solitary work of the typical humanities scholar. However, I couldn’t help but think that if I were a graduate student at one of the five case study institutions, I wouldn’t want to be involved in a project like this because of how labour-intensive such projects can be. At the Kolb-Proust Archive, graduate students have worked with the librarian over the past fifteen years to create and edit notecards about Proust’s correspondence. That’s a great deal of labour, and most students didn’t see overwhelming results from it. The Victorian Women Writers Project (VWWP) at Indiana University Bloomington has a similarly long history. Begun in 1995, it was revised in 2010, when all existing texts required updates and new texts were added. Students on this project showed little enthusiasm for finishing their work. As one librarian explains: “We also had many students who didn’t finish their projects during the course of the semester. Most of them continued to work on them, but some we had to take care of their work for them” (225). MONK, web-based text-mining software hosted at the University of Illinois Library that contains TEI-encoded texts, is a prototype for many other projects. Even here, “the process of TEI encoding is so labor intensive” that projects were sometimes abandoned (222). The general consensus is that while the end product may be useful, the process of creating it is too arduous, and researchers feel “strong reservations about engaging in the actual process of encoding texts” (222).

With regard to the University of Virginia (UVA) text encoding project, the librarian at the Scholars’ Lab notes: “‘Ten years ago… doing TEI was almost an art unto itself and people were interested.’ But now, she says, faculty and students are less inclined to do the encoding and far prefer to acquire the texts already marked up with encoding.” If I were one of the graduate students enlisted in mass encoding, I’d rather skip straight to the prefabricated tools as well.

However, I’m aware that receiving tools after they have been created also means missing out on the community of practice inherent in building Digital Humanities tools. Reading about the history of text analysis tools really makes me appreciate that DH today stands on the shoulders of so many innovators. Most histories of textual analysis, including “Towards an Archaeology of Text Analysis Tools” by Stefan Sinclair and Geoffrey Rockwell and “A History of Humanities Computing” by Susan M. Hockey, note that Roberto Busa, a Jesuit priest, was one of the pioneers of textual analysis (along with his collaborator Paul Tasman). Busa’s work on the Index Thomisticus project, a tool for performing text searches within the massive corpus of Aquinas’s works, lasted for approximately 30 years. Now, that’s a long time to spend on textual encoding. He and others like him had to create their tools as they went, and those tools shaped their very processes and products. “A History of Humanities Computing” traces advances in textual analysis over decades, from times when scholars were “hampered by technology” to periods “where academic respectability for computer-based work in the humanities was questionable.” By advancing digital processes, early DH innovators made it possible for the tools we have today to come into being. They also did a great deal of work toward resolving “the two cultures” of the humanities and the sciences, by bringing “the rigor and systematic unambiguous procedural methodologies characteristic of the sciences to address problems within the humanities.” Since the 1990s, and continuing today, the problem of the two cultures has “emerged again” in the form of a divide between scholars who create tools and those who use and talk about them (Hockey).

I believe quite strongly that the divide between science and the humanities is an artificial one, and that applying scientific rigor to humanities projects is one way to overcome this gap. Although I can’t blame the graduate students who weren’t interested in encoding text, their reluctance is a bit like denying the importance of basic procedures in scientific experimentation: you have to cultivate a lot of samples in Petri dishes before you can effectively prove germ theory. There’s something romantic, as well, about being at the forefront of innovation the way those early technologists were. I now get to enjoy the products of their labour via DH tools such as Voyant and those listed in TAPoR, but being one of the first creators of a tool that no one had ever seen before would be something else. It reminds me of Stephen Ramsay’s controversial statement: “Personally, I think Digital Humanities is about building things. [. . .] If you are not making anything, you are not…a digital humanist.” I don’t agree, but I understand the appeal of believing so.

Using DH tools such as Voyant and the many tools available through TAPoR brings me closer to understanding how English literature functions on the web, and how its analysis has changed with digital technologies, which was my original goal for this week. I have explored these tools and find them quite useful. I particularly like Umigon, which analyzes the sentiment of tweets. I enjoy using these tools that I didn’t have to construct myself, just as I enjoy using digital texts online with excellent searchability. They help me conduct analysis, and I benefit so much from others’ work.

Having read about this history, though, I’m going to be asking myself two questions going forward:

How is what I am accessing online the product of another person’s rigorous labour?

And, what innovation could I create to help bridge the gap between the two cultures and advance DH? Projects that would not stultify an entire generation of graduate students preferred.



Green, Harriet E. 2014. “Facilitating Communities of Practice in Digital Humanities: Librarian Collaborations for Research and Training in Text Encoding.” The Library Quarterly 84 (2): 219–234.

Hockey, Susan M. 2004. “A History of Humanities Computing.” In A Companion to Digital Humanities, edited by Susan Schreibman, Ray Siemens, and John Unsworth, 3–20. Malden, MA: Blackwell Publishing.

Sinclair, Stefan, and Geoffrey Rockwell. 2014. “Towards an Archaeology of Text Analysis Tools.” In DH2014. Lausanne, Switzerland.