The Collation

Research and Exploration at the Folger

Extensions of the book

A guest post by Daniel Shore

Working in the Folger Shakespeare Library over the past eight months, I’ve felt some dissonance between the rich physical resources of the Library and the digital focus of my book project, Cyberformalism, which explores the capacity of full-text searchable archives like Early English Books Online to expand the domain of philological inquiry to new objects of knowledge. Advanced search tools, I argue, allow us to uncover the history not just of words but of linguistic forms: phrases, formulas, moods, syntactic constructions, etc. Though the philological stories I tell stretch over hundreds or even thousands of years, their main events take place primarily in my period of expertise, the seventeenth century.

Yet at the Folger I have often felt peculiarly distant from the early modern books that I’ve been writing about. Day in and day out, I’ve spent more time in front of a screen than a printed page, most often with Early English Books Online-Text Creation Partnership (EEBO-TCP) texts, which are keyed versions of scans of microfilm reproductions of books and pamphlets that, in many cases, sit on shelves a few floors below the desk where I work. Rather than reading all the sentences in any individual book from first to last, I’ve more often read through hundreds or thousands of isolated sentences plucked from texts in the EEBO-TCP archive, sometimes wondering if “reading” is the right word for this activity. Other researchers at the Library have confided to me that they, too, have occasionally felt the dissonance of working primarily with digital texts even as the printed book sits waiting nearby. But the demands of my digital project have intensified my sense of shamefully neglecting the Folger’s real treasures.

While the primary goal of Cyberformalism is to explore the methods of philological inquiry opened up by digital texts and search engines, it also aims to situate search in a longer history of textual finding tools. To understand this history, my first port of call has been the scholarship of book historians like Ann Blair, Anthony Grafton, William Sherman, and Richard and Mary Rouse, but I’ve also leapt at the occasional opportunity to take off my digital mittens and attend to the book as a textured, three-dimensional thing, structured for use and bearing the marks of its users. One book in particular, a work of Latin philology written by the humanist Niccolò Perotti and first published in 1489 (and often reprinted; the Folger’s copy is of a 1494 edition), has prompted me to rethink my basic assumptions about the nature of search engines and finding tools more generally.

The incipit of the Folger's 1494 Cornucopiae

Perotti’s Cornucopiae might be described as a centaur text. Its body is a commentary on Martial’s Liber spectaculorum and first book of epigrams, which it follows line by line, word by word. In his glosses, Perotti assembles a remarkable array of examples from Latin literature. Martial’s word ferant, appearing in verse six, elicits no fewer than seven hundred examples of the verb fero (Latin for “to carry”) as it is used in other texts and authors. ((Martine Furno, Le Cornucopiae de Niccolò Perotti: Culture et méthode d’un humaniste qui aimait les mots. Geneva: Droz, 1995. p. 133))

Commentary on Martial’s word “ferant.”

The head of the Cornucopiae, however, is an alphabetical index of roughly 20,000 words, each of which directs the reader to one of the book’s approximately three hundred leaves. ((Ann M. Blair, Too Much to Know: Managing Scholarly Information before the Modern Age. New Haven: Yale University Press, 2010. p. 129))

Index pointing to single leaves using Roman numerals

Turning to the indicated leaf, the reader will find the word helpfully printed in the margin next to its appearance in the text. The index and marginal aids mean that the volume functions as a dictionary as well as a poetic commentary; because it does not merely define words but gives collected examples of their use, it is also a powerful lexicographical research tool, one used by Erasmus and other prominent humanists. ((Jean Louis Charlet, 1997. “Niccolo Perotti (1429/30-1480).” Centuriae latinae. Cent une figures humanistes de la Renaissance aux Lumières offertes à Jacques Chomarat. ed. Colette Nativel, 601-5. Geneva: Droz. p. 603.)) In the Folger copy of the 1494 edition, a reader has sought in a few places to augment the book’s reference function by writing additional words in the margin, underlining them in the text, and in one case directing attention with a manicule (or drawn hand).

Lexicographical marginalia in the Cornucopiae

Subsequent philologists dispensed with the book’s commentary but expanded on its lexicographical function; the Dictionarium of Ambrosius Calepino (1521), for example, advertises that it is “decerptus” (plucked) “ex Nicolai Perotti Cornucopie.”

The Dictionarium of Ambrosius Calepino

In the 1513 edition and those that followed, Aldo Manutius printed Perotti’s book with a reworked index, probably the first, according to Ann Blair, to use Arabic rather than Roman numerals on both sides of the leaf. ((Blair p. 49.)) In the Folger’s 1527 copy, also published by Manutius, many index entries point not just to a single leaf, as in the 1494 edition, but to multiple leaves and line numbers, a considerable practical improvement that allows the edition to forgo marginal aids.

Index with Arabic numerals and multiple references.

Without an index, the Cornucopiae would still be an impressively learned commentary, but it would be useless as a dictionary.
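
For readers who think in code, the difference between the two index designs can be sketched as a data structure. What follows is only an illustration in Python, not a transcription of either index: the headwords, leaf numbers, and line numbers are invented, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical entries of the form (headword, leaf, line).
# All values are invented for illustration.
ENTRIES = [
    ("fero", 12, 30),
    ("fero", 47, 8),
    ("amphitheatrum", 3, 14),
    ("barbarus", 3, 2),
]


def build_single_leaf_index(entries):
    """1494-style sketch: each headword points to a single leaf.

    Here the first leaf encountered is kept; later ones are ignored.
    """
    index = {}
    for word, leaf, _line in entries:
        index.setdefault(word, leaf)
    # Alphabetical order is the only organizing schema.
    return dict(sorted(index.items()))


def build_multi_reference_index(entries):
    """1527-style sketch: each headword points to every (leaf, line) pair."""
    index = defaultdict(list)
    for word, leaf, line in entries:
        index[word].append((leaf, line))
    return {word: sorted(refs) for word, refs in sorted(index.items())}


single = build_single_leaf_index(ENTRIES)
multi = build_multi_reference_index(ENTRIES)
print(single["fero"])  # one leaf only
print(multi["fero"])   # every leaf and line recorded for the headword
```

With a single leaf per headword, the reader still has to scan the leaf, which is where the printed marginal words earn their keep. Once each reference carries a line number, the lookup lands on the word directly.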

What can Perotti’s book teach us about search engines, which, like indexes, are instruments, useful things?  In good McLuhanite fashion, I had conceived of finding tools as “extensions of man,” prostheses that, like Galileo’s telescope or Hooke’s microscope, sharpen and amplify our vision. A reader can identify a syntactic construction in one sentence after another, but a search engine using Natural Language Processing (the computational parsing of sentences) can determine its presence or absence in each of the millions of sentences in a large digital archive. Search, in this conception, enhances our abilities, changing what it is possible to observe and know.

This conception is not wrong, I think, but the Cornucopiae suggests a different possibility. Its index is at once an addition to the book, standing apart from the commentary and pointing back to it, and an integral part of its use and identity as a work of lexicographical reference. Its presence transforms the nature of the book—dictionary as well as commentary—by transforming what it can do. Historians of the book have been fascinated by the ontological instability of “paratexts”—title pages, tables of contents, marginal notes, page headers, etc.—at least since the work of Gerard Genette. ((Gerard Genette, Paratexts: Thresholds of Interpretation. Cambridge, UK: Cambridge University Press, 1997.))  Perotti’s index is a classic example, standing on the threshold of the text, at once outside and inside, supplemental and essential.

To what extent should we regard a search engine as a paratext on the order of Perotti’s index? The EEBO search engine postdates the Cornucopiae by roughly half a millennium; it is physically distinct from the book as an object; it can only access books insofar as they have been remediated as digital full-text surrogates; it is “textual” without being, strictly speaking, a text. Yet the search engine, like the index, has the capacity to alter the nature of the work it searches by altering our mode of access to it. Just as the index of the Cornucopiae makes a commentary into a lexicon, a consultable collection of words, so too do sophisticated search engines make texts into consultable collections of linguistic forms, collections for which we as yet have no name. If search is an “extension of man” (or woman), as Marshall McLuhan suggests, it is also an “extension of the book.” ((Marshall McLuhan, Understanding Media: The Extensions of Man. Cambridge, MA: MIT Press, 1994.)) It can show us things about texts because, as paratext, it is already textual; it can show us things about texts because, as prosthesis, it is already human. In the fullest sense, this double role is what we mean when we speak of a search engine as an “interface.”

Supposing we accept that a search engine is, like the index of the Cornucopiae, a transformative part of books printed half a millennium ago, a further consequence follows. An alphabetical index is highly specific to the work it indexes; it cannot be lifted and reused in another work, not even a repaginated version of the same work. All that can be extracted for reuse is the barest schema of alphabetization itself, which is at once arbitrary and deracinated and yet durably, unalterably conventional. By contrast, a search engine can, by design, access many documents at once. New documents are added to EEBO-TCP without altering the search function used to retrieve them. Like a card catalog rather than an index, a search engine is a paratext that participates in a finite but open-ended collection of texts while remaining relatively independent of any of them. Its independence is only relative because it is limited by the content and formatting of the documents it retrieves; we can sort search results by gender, or nationality, or page formatting, or part of speech only if documents are marked up for these properties. For a search to succeed, in other words, there must be a fit between the capabilities of the search and the thing searched. To paraphrase Kant: searches without content are empty; archives without search tools are blind.
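
The contrast drawn above can be made concrete with a small sketch. It is a toy written in Python under stated assumptions, not a description of how the EEBO-TCP search actually works: the `Document` class, the `search` and `group_by` functions, the sample texts, and the `genre` property are all hypothetical. It illustrates two points: the search function stays the same when the collection grows, and grouping by a property is possible only for documents marked up with that property.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    text: str
    # Markup is optional; a document may carry no properties at all.
    markup: dict = field(default_factory=dict)


def search(collection, pattern):
    """Return (title, sentence) pairs for every sentence matching pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for doc in collection:
        # Naive sentence splitting on terminal punctuation (an assumption).
        for sentence in re.split(r"(?<=[.!?])\s+", doc.text):
            if regex.search(sentence):
                hits.append((doc.title, sentence))
    return hits


def group_by(collection, key):
    """Group titles by a markup property; unmarked documents are set aside."""
    groups, unmarked = {}, []
    for doc in collection:
        if key in doc.markup:
            groups.setdefault(doc.markup[key], []).append(doc.title)
        else:
            unmarked.append(doc.title)
    return groups, unmarked


# Invented sample collection.
collection = [
    Document("Tract A", "Would that it were so. The rest is silence.",
             {"genre": "sermon"}),
    Document("Tract B", "He said nothing. Would that he had spoken!"),
]
hits_before = search(collection, r"\bwould that\b")

# A new document joins the collection; the search function is untouched.
collection.append(Document("Tract C", "O would that I had wings.",
                           {"genre": "verse"}))
hits_after = search(collection, r"\bwould that\b")

groups, unmarked = group_by(collection, "genre")
print(len(hits_before), len(hits_after))
print(groups, unmarked)
```

In this sketch "Tract B" is found by the search but cannot be sorted by genre, because nothing in its markup says what genre it is: the search is only as capable as the documents allow.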

All of this might strike someone as a fair bit of ontological noodling. But the point is that the Janus-faced nature of finding tools—part of the book, part of us—is rooted in practice. We change books, in ways large and small, by changing the way we access and study them. Whether the catalog describes a book solely, as the title suggests, as a commentary (see the record for the 1494 Folger copy) or also as a dictionary changes how it will be found, read, studied, and understood by future scholars. Search engines alter these books by changing once again how we access them. No longer restricted to the categories of MARC records or cataloging notes, they can retrieve digitized full-texts according to a potentially limitless set of textual and markup criteria. By using, studying, and developing these finding tools, I’d like to think that I’ve been touched by a good many of the Folger Library’s books after all, even those I’ve never had the pleasure of handling.


DANIEL SHORE, Assistant Professor of English at Georgetown University, is working on a book project, Cyberformalism (under contract with Johns Hopkins University Press), about how search engines transform literary and philological inquiry. His first book, Milton and the Art of Rhetoric, was published by Cambridge University Press in 2012, and he has published articles in journals such as PMLA, Critical Inquiry, Milton Studies, and Milton Quarterly. He is a 2013-2014 Mellon Long-term Fellow at the Folger.


  • I am very troubled by the apparently inaccurate and/or careless use of the terms “leaf” and “page” in this posting. If I am reading the text correctly they appear to be used interchangeably. Of course, they are quite different things. A “page” is one side of a “leaf,” and a “leaf” contains two pages. This is a matter I constantly harp on to my students and I was surprised to find the error in a posting in The Collation.

    • You are correct, William, that in nearly all instances here the word should be “leaf” and not “page.” I’m afraid that in the back-and-forth of editing, not all changes made it into the published version. But I’ve now updated the post and am grateful for the chance to correct our mistakes!

  • These are weighty issues that you address, Dan, issues that all scholars of early modern studies have had to address ever since EEBO gave us a kind of wide access to early printed texts that often bypasses engagements with the material object. That possible disjunction is especially acute when experienced in a place like the Folger’s Reading Room.
    I find it especially productive that as you work through your own methodologies, and questions about this possible disconnect, you do so in a historicized way. You have comparative recourse to the tools and technologies of early modern media. Thinking about the functionality of the printed index through the lens of new possibilities of organizing access is a promising step towards a more robust understanding of the history of media.

    We also saw a similar approach recently in another Institute program. Collin Jennings described his work on topic modelling and the indexing of Adam Smith’s Wealth of Nations to the summer 2013 symposium on the Orality and Literacy Heuristic, codirected by Adam Fox and Paula McDowell. Such projects model exciting new approaches to early modern studies.
