When it comes to tagging, or encoding, manuscript transcriptions in XML (Extensible Markup Language) for Early Modern Manuscripts Online (EMMO), two important questions are how much we should tag and when we should do it.
With thousands of pages from a variety of genres, the “how much” question is a big one. For example, should tags be used to provide information about ink color, shifts in hand, size or ornamentation of letters, illustrations, marginalia, flourishes, indentations, spacing, symbols, quotations, layout, structure, lines, paper material, historical/literary connections, etymology, smudges, etc., etc.? The images of manuscript pages below give some idea of the challenges involved:
Should online transcriptions attempt to recreate the intriguing graphs, charts, and illustrations we see on these pages as well as the text and white space? If so, how? Should each line and symbol be tagged in relation to its position on the page? How about gradations, marks, or blots showing on the paper? Given the multitude of options available with encoding, so granular a version of a transcription may be possible. After all, the X in XML stands for extensible. This flexibility in what is noted, along with broader accessibility for the text, explains why the Folger Shakespeare Library is doing this type of work in EMMO.
But where to start with tagging? Entities such as the Text Encoding Initiative (TEI) provide extremely detailed and ever-widening guidelines for digital projects in the humanities (and elsewhere), and these are a helpful resource. However, the quantity of markup elements in TEI, not to mention associated attributes for the elements, might best be described as vast. Just because a huge number of choices are available does not mean all of them have to be pursued, though, or at least not all at once.
Many projects dealing with digital texts end up using a customized and/or limited set of encoding tags to make a project feasible. EMMO is no exception. With thousands of manuscripts in the (always growing) Folger collection and the skill, effort, and time required to transcribe the archaic hands in which those pages are written, some limits had to be placed on what could reasonably be accomplished in the three-year initial phase of the project.
With that in mind, the EMMO team decided we would focus on tagging the text of the manuscripts for now and let the accompanying high-resolution images provide additional information about the page. Of course, even the adjustable image will not show everything about the actual manuscript, but we think the digital representation and the transcription text together will serve as a valuable resource.
Accordingly, we set out to identify and test a tag set for primarily textual elements, but even this reduced scope contains many questions and possibilities. Would the lineation of the text be preserved? Would the original abbreviations, contractions, punctuation, and spelling (including apparent mistakes) be left as written? Would cross-outs and corrections made by the scribe be reflected in the transcription?
For answers, we turned to the methods of transcription already in use as a guide. Since semi-diplomatic transcription is a generally accepted model in the field of early modern paleography and the one taught here at the Folger Shakespeare Library, the starting tag set developed for EMMO closely reflects semi-diplomatic conventions. This simply means certain minor changes are made in the transcription for the sake of clarity and comprehension by a 21st-century readership. Semi-diplomatic transcription has set boundaries, and so do transcriptions for EMMO. Unfamiliar or just plain unusual spellings in the manuscripts (even those appearing to be mistakes) are maintained, as are punctuation (often quite different from modern usage), capitalization, insertions, cancellations, and lineation. If words or parts of words are illegible or indecipherable, they are marked as gaps. The Collation’s transcriptions follow semi-diplomatic conventions, so these may be familiar to readers.
A slightly trickier category of changes made in a semi-diplomatic standard, however, deals with abbreviated forms, contractions, and brevigraphs; in these cases, transcriptions show the expanded version of these letter constructions, since many may be unfamiliar to modern readers. Some of these abbreviations can be quite difficult (and interesting) for transcribers as they strive to recognize words with letters missing and/or raised.
For two brief examples, “wch” and “wth” commonly appear in 16th- and 17th-century manuscripts as abbreviations for “which” and “with,” though in secretary and mixed hands it is often a test to differentiate between the superscript “ch” and “th” paired with a leading “w.” In a semi-diplomatic transcription, “wch” would be entered as “which” with the “hi” tagged as an expansion and displayed in italics to signify it was expanded. Similarly, the “ch” would be tagged as lowered superscript and then displayed as regular text for readability. The abbreviated form “wth” (with) follows the same pattern. Telling the two apart often comes down to context and/or minute comparisons at high levels of magnification. Other regular abbreviations such as “ye” (the) work similarly, but in this case, the archaic thorn letter (which looks like a modern “y”) is entered as “th”, tagged as a thorn brevigraph (expansion), and displayed as “th” in italics. As with “wch,” the superscript “e” in “ye” is tagged as lowered superscript in the semi-diplomatic transcription to be displayed as “the.” Dromio, our online collation/transcription tool, simplifies this work with shortcut buttons that enter frequently appearing abbreviations with the appropriate tags all in one step.
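To make the “wch” and “wth” pattern concrete, here is a minimal sketch in Python. It rests on assumptions: the element names `<ex>` (editorial expansion) and `<hi rend="superscript">` are borrowed from TEI for illustration, since EMMO's actual tag names are not listed in this post, and the wrapper element `<w>` and the function `render` are hypothetical. Underscores stand in for italics and a caret marks a raised letter.

```python
import xml.etree.ElementTree as ET

# Illustrative encodings only: tag names are borrowed from TEI as an
# assumption, not taken from the EMMO tag set.
WHICH = '<w>w<ex>hi</ex><hi rend="superscript">ch</hi></w>'
WITH = '<w>w<ex>i</ex><hi rend="superscript">th</hi></w>'


def render(xml_text, expand=True):
    """Return a plain-text rendering of one encoded word.

    expand=True  -> semi-diplomatic view: supplied letters are wrapped in
                    underscores (standing in for italics) and superscript
                    letters are lowered to regular text.
    expand=False -> closer to the page: supplied letters are omitted and
                    superscript letters are marked with a caret.
    """
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        text = child.text or ""
        if child.tag == "ex":
            parts.append(f"_{text}_" if expand else "")
        elif child.tag == "hi" and child.get("rend") == "superscript":
            parts.append(text if expand else f"^{text}")
        else:
            parts.append(text)
        # Text that follows the child element inside the word.
        parts.append(child.tail or "")
    return "".join(parts)


print(render(WHICH))                # w_hi_ch
print(render(WHICH, expand=False))  # w^ch
print(render(WITH))                 # w_i_th
```

Because the supplied letters and the raised letters are tagged separately, the same encoded word can be shown either way without retyping it.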
An image of a short letter from the Bacon-Townshend collection contains the common abbreviations discussed above and shows how the system works in practice. In EMMO, the image will always appear with the transcription, so the former serves as the full diplomatic version and the latter serves as the semi-diplomatic version for reading and searching.
A basic semi-diplomatic transcription of the letter with the expansions of these abbreviations and brevigraphs shown is below:
Madame, I thought it good for me to writ somthinge to
your Ladyship though I wrot the lesse, least by not writinge
I sholde seame to forget that dutie, which I iustly owe vnto
your Ladyship. I vnderstande allmost every weake by Mr
Stringar of your Ladyship good health & my fathers, which I am
very glad to heare of. I myself am somwhat sickly.
Some perswade me, that it is to a good end. My owne experi=
ence (as your Ladyship knoweth) is small to iudg. Suer I am, ther
is yet no certeintie of that thei saie. Yet I hope well, & I
praie to God I be not deceived of my hope. Thus with
humble remembrance of my dutie to my father I take
my leave, wisshinge your Ladyship longe to live in perfect health.
In comparing the original image with the transcription, sharp eyes may notice a few other examples of expansions:
- “yor” is expanded to “your”
- “yt” is expanded to “that”
- “yei” is expanded to “thei” (i.e., they)
- “La:” is expanded to “Ladyship” (notice the colon is dropped in the semi-diplomatic transcription)
- “pswade” and “pfect” are expanded, respectively, to “perswade” and “perfect” (the special “p” in these words has a stylized loop in the descender to signal the expansion). What we call the special p demonstrates the flexibility of early modern writing in which one brevigraph can stand for differing combinations of letters depending on style and use.
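The expansions listed above can be read as a lookup from the form on the page to the expanded form. The following Python sketch shows that idea only; the table is limited to the forms mentioned in this post, the names `EXPANSIONS` and `expand_word` are hypothetical, and it does not describe how Dromio's shortcut buttons are actually implemented.

```python
# Whole-word forms taken from the letter discussed above. A real tool would
# need context as well, since one brevigraph (such as the special "p") can
# stand for different letter combinations.
EXPANSIONS = {
    "wch": "which",
    "wth": "with",
    "ye": "the",
    "yor": "your",
    "yt": "that",
    "yei": "thei",
    "La:": "Ladyship",
    "pswade": "perswade",
    "pfect": "perfect",
}


def expand_word(word):
    """Return the expanded form if the word is a known abbreviation,
    otherwise return the word unchanged."""
    return EXPANSIONS.get(word, word)


print(expand_word("yor"))     # your
print(expand_word("Madame"))  # Madame
```

A plain lookup like this would be too blunt on its own (for instance, “ye” can also be the pronoun), which is one reason the choice stays with the transcriber.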
Signs of correction in a document are more straightforward since most people have experience with crossing out words and replacing them, even if we use word-processing programs today, but how to display such instances still involves questions: should crossed-out items be included? In EMMO transcriptions, deleted words are entered (if they can be determined) and then tagged as deletions so that they display as struck through. If a deleted word is partially or totally illegible to the transcriber, periods “…” representing illegible letters would be entered and then tagged as a gap. Insertions (often to replace deleted words) are similarly entered, then tagged as insertions and displayed as superscript in the line. An excerpt from the Inventories of the Townshend family shows some examples of these:
A basic transcription is shown below with the tagged deletions and insertions.
In the Lobie.
Item one livery beadstide matt & Cord a
fether bead and boulster iij blankettes
A rogge greene of yellowy Read a canope of Cadows
Item a Trundell beadstid matt & Corde a fether
bead & ij boulster iiij a pare of blankettes a dornex & a yelow Roge
Item a pallett bead & boulster for my Ladys Chamber
a pare of blanckettes & a dornex covering & the
Lobbye hanged Round with panted Clothe
Item ij downe pillowes
Item a Corse woole beade
The deleted and inserted words are hard to miss in the image. A few more examples of common expansions are also included in the transcription above: “Itm” becomes “Item” and the “es” terminal brevigraph appears at the end of “blankettes” and “blanckettes.”
A major advantage of tagging (or encoding) letters/words in this way, of course, is that it allows not only targeted searches but also flexible display options online. With the tags in place, we can show a transcription with the expansions and lowerings or without them. Similarly, deletions can appear or not. One could also choose to view only deleted words (or expanded ones) to look for patterns.
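As a rough illustration of these display and search options, the Python sketch below toggles deletions on and off and lists only the deleted words. It rests on assumptions: the element names `<del>`, `<add>`, and `<gap>` are borrowed from TEI, the `extent` attribute, the `<line>` wrapper, and the function names are hypothetical, and the sample line is invented for the example rather than taken from the inventory.

```python
import xml.etree.ElementTree as ET

# Invented sample line; tag names are borrowed from TEI as an assumption,
# not taken from the EMMO tag set.
LINE = '<line>Item a <del>blew</del> <add>yelow</add> Roge <gap extent="3"/></line>'


def show(xml_text, deletions=True):
    """Render one line as plain text, optionally hiding deleted words.

    Deletions appear as [-word-], insertions as [+word+], and a gap as a
    run of periods, one per illegible letter.
    """
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        text = child.text or ""
        if child.tag == "del":
            parts.append(f"[-{text}-]" if deletions else "")
        elif child.tag == "add":
            parts.append(f"[+{text}+]")
        elif child.tag == "gap":
            parts.append("." * int(child.get("extent", "3")))
        else:
            parts.append(text)
        parts.append(child.tail or "")
    # Collapse any doubled spaces left behind by hidden deletions.
    return " ".join("".join(parts).split())


def find_all(xml_text, tag):
    """Return the text of every element with the given tag name."""
    return [el.text or "" for el in ET.fromstring(xml_text).iter(tag)]


print(show(LINE))                   # Item a [-blew-] [+yelow+] Roge ...
print(show(LINE, deletions=False))  # Item a [+yelow+] Roge ...
print(find_all(LINE, "del"))        # ['blew']
```

The transcription itself never changes here; only the view of it does, which is the practical payoff of tagging rather than silently editing.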
The EMMO tag set in its first phase does contain a few tags to identify the presence of non-textual items like illustrations and notational marks (e.g., a manicule), and a very limited number of tags to describe certain kinds of content (e.g., for postscripts in letters). Such tags, however, have intentionally been kept to a minimum in order to keep the transcription process manageable. Producing a consistent, neutral set of textual transcriptions for reading and analysis is the goal of EMMO’s initial stage. Further tagging of the text as well as other features of manuscripts is expected and encouraged in the years ahead. Encoding for physical characteristics, iconography, or specific content, for example, could make for fascinating future projects. This is how the “when” question relates to the “how much” question, and we think the possibilities in this regard are exciting.