Manuscript Loss in Digital Contexts

Originally presented at the 14th Annual Schoenberg Symposium on Manuscript Studies in the Digital Age, November 17, 2021

Thank you for that kind and generous introduction and thanks to Lynn for inviting me to present this talk today. Thank you to all of you this afternoon, or this evening for those of you in Europe, for sticking around for my talk tonight on manuscript loss in digital contexts.

I want to do a couple of things with this paper and I’m not entirely sure that they go together, so please bear with me. The first thing that I want to do is to look back over some specific things that have been said in the past about loss and manuscript digitization. There’s quite a long history of both theorists and practitioners giving lectures and responding to the topic of what we lose when we digitize a manuscript – how much digitized manuscripts lack in comparison with “the real thing” and all of the reasons why digitized manuscripts aren’t as good as the real thing because of these losses. I want to address those complaints and to respond to them with examples of work that we’ve been doing at Penn that I think answers some of these issues. The second thing that I want to do is to showcase the Lightning Talks. Last month we put out a call for five-minute presentations, for anybody who wanted to submit a paper on the issue of loss specifically in digital work and digitization – this was the ask that we put out:

“The theme of this year’s symposium is Loss and we are particularly interested in talks that focus on digital aspects of loss in manuscript studies.”

From the call for Lightning Talk proposals, Schoenberg Symposium 2021

I was pleasantly surprised by the submissions, which didn’t cover old ground and in some cases responded to concerns that have been expressed about digitized manuscripts in the past.

In June 2013 A. S. G. Edwards published a short essay called “Back to the real” in the Times Literary Supplement.[1] I cannot overstate the effect that this piece had on those of us who were creating digitized manuscripts. When this piece came out I’d been at Penn for about three months, having just started my position in April 2013, and although Penn had been digitizing its manuscripts for many years and there was an interface, Penn In Hand, which is still available, this was before OPenn, well before BiblioPhilly, before VisColl. The first version of Parker on the Web, which Edwards names in his piece, had been released on 1 October 2009, but it was behind a very high paywall, which was only lifted when Parker on the Web 2.0 was released in 2018. 2013 was also about a year after the Walters Art Museum had published The Digital Walters and released that data under an open access license, and I came to Penn knowing that we wanted to recreate the Digital Walters here – the project that would become OPenn – so at that point in time we were thinking about logistics and technical details: how we could take the existing data Penn had and turn it into an open access collection explicitly for download and reuse, as opposed to something that was accessed through an interface.

Mid 2013 was also when a lot of libraries were really starting to ramp up full-manuscript and full-collection digitization – digitization at scale, full books and full collections, as opposed to focusing only on the most precious books, or digitizing only sections of books. I also discovered that this was two years after the publication of “SharedCanvas: A Collaborative Model for Medieval Manuscript Layout Dissemination,” a little article published in the journal Digital Libraries in April 2011 by Robert Sanderson, Ben Albritton, and others, which is notable because it is what would eventually become the backbone of the International Image Interoperability Framework, or IIIF, which I will mention again.

Edwards’s piece has been cited and quoted many times over the eight years since it was published, and for good reason. Edwards doesn’t pull any punches; he says exactly what he’s thinking with regard to digitized manuscripts, and honestly, as somebody who was in the thick of things in summer 2013, this came as a little bit of a bomb in the middle of that. Since this is such a primary piece I want to look at his concerns and see how they hold up eight years later.

“The convenience of ready accessibility is beyond dispute, and one can see that there may be circumstances in which scholars do have a need for some sort of surrogate, whether of a complete manuscript or of selected bits. But the downsides are in fact many. One of the obvious limits of the virtual world is the size of the computer screen; it is often difficult for viewers to take in the scale of the object being presented.”

A. S. G. Edwards, “Back to the real”

The issue of interpreting the physical size of manuscripts in digital images is an obvious one and it’s one that I talk about a lot when I talk to classes and school groups. Edwards is correct – it is difficult to tell the size of an object when it’s presented in an online interface, and there are a few different reasons for that. So I’m going to take a look at two manuscripts in comparison, with the thinking that comparative analysis might be helpful.

Search results screenshot from the BiblioPhilly interface

Both of these manuscripts are from the Free Library of Philadelphia, and they were digitized as part of the Bibliotheca Philadelphiensis project. I’m showing them here in the BiblioPhilly interface, which was created for this project. They’re both 15th-century manuscripts from England. FLP LC 14 19 is a copy of the old statutes, the statutes of England beginning with the Magna Carta, a document first issued in 1215 by King John (ruled 1199-1216), and FLP LC 14 10 is a copy of the new statutes, English legal statutes beginning with the reign of Edward III (ruled 1327-1377).

FLP LC 14 10 (New Statutes) is pictured on the left and FLP LC 14 19 (Old Statutes) is on the right. I present them to you like this to give you a sense of how these two manuscripts look on the surface, and I’m going to point out some things that I notice as we go back and forth that give me clues to their relative sizes.

Both of these manuscripts have clasps, in the case of the old statutes manuscript, or remnants of clasps, in the case of the new statutes. I have a sense of how large clasps are in comparison with the rest of a book, and the clasps look bigger to me on the old statutes; they take up more space. The decoration also looks bigger on the old statutes cover. Both of these covers have decoration embossed on the front, and the new statutes book has a lot more of these decorations and they’re thin, which implies to me that the new statutes book is bigger than the old one.

Now let’s look inside – again, FLP LC 14 10 (New Statutes) is pictured on the left and FLP LC 14 19 (Old Statutes) is on the right. Immediately I notice that the new statutes manuscript has more lines and there appear to be more characters per line and the writing looks smaller than it does in the old statutes manuscript. So again, what this implies to me is that the new statutes manuscript is bigger because you can write more in it. Now I’ve seen enough manuscripts to know that this can be misleading. There are many very small manuscripts that contain tiny tiny writing – like Ms. Codex 1058 from Penn’s collection.

This is a glossed Psalter, and back in 2013 before I arrived at Penn I spent a lot of time looking at this manuscript online. The first time that I saw it in person I was absolutely floored because it is so much smaller than the writing implied to me – for reference, the codex is 40 mm, or 1.5 inches, shorter than the old statutes manuscript. It’s small, and by including it here I’m aware that I’m helping to prove Edwards’s point – although comparing the two statutes manuscripts is helpful in coming up with size cues, it’s still really difficult to tell generally, at least in this interface.

“It is also difficult to discern distinctions between materials such as parchment and paper, and between different textures of ink.”

A. S. G. Edwards, “Back to the real”

Let’s turn back to the two statutes manuscripts again. We’ll look at the support – zoomed in to 100% so we’re very very close to the material.

If you know what parchment looks like and if you know what paper looks like, I think that it’s clear that both of these manuscripts are written on parchment. One of these is darker than the other one; that could be because of the type of animal the parchment comes from, or variance between hair and flesh side, or it could be how it was produced. Also, some of the things that you’ll look for in parchment, like hair follicles, are not immediately clear, at least in these examples.

I haven’t yet mentioned training and the knowledge that you bring with you when you come to a digitized manuscript but I think that’s really important. If you are familiar with the statutes manuscripts, if you’ve seen several of them you know that the new statutes are much longer than the old statutes and so new statutes manuscripts are bigger and old statutes manuscripts are smaller, and that’s just something that you know. You probably didn’t know that but I knew that and that knowledge is reflected in what we’ve already looked at for those manuscripts.

It’s the same for parchment and paper. If you’ve been trained and you know what parchment looks like and you know what paper looks like, then you’re going to be able, most of the time, to tell the difference between them in digital images.

This next example is from another manuscript, LJS 266 from the Schoenberg collection, and this is something that’s very typical in parchment manuscripts. That rounded area is an armpit of the animal, and you can also see some hair follicles around there. So this is very clearly a parchment manuscript and not a paper manuscript.

And then finally this last example, Bryn Mawr Ms. 4, is a paper manuscript. You should be able to tell, if you’ve studied paper, because there are chain lines there and also because of the way the paper is wearing around the edges. Parchment doesn’t flake like that; paper does. Being able to recognize that has nothing to do with whether you’re looking at it in person or in a digital image: it looks the same either way. Whether or not you can tell the difference has more to do with your own training than with the fact that it’s an image. Now, we could talk about image quality, but most images that you access from an institution will be digitized to some set of best practices and guidelines. The images will also be presented alongside metadata – so if you’re not sure whether it’s paper or parchment, you can take a peek at the metadata and it will tell you.

One of the things I like to talk about when I discuss digitized manuscripts is the concept of mediation, how one of the things that digitization does is mediate our experience with the physical object. Many people play a part in that mediation: photographers, software developers who build the systems and interfaces, and catalogers. Metadata is another example of that. We trust that the mediation is being done effectively; we trust that the cataloger knows the difference between parchment and paper and that this information is correct.

Ink can definitely be an issue in digitization, but I’m going to step sideways and talk instead for a moment about gold leaf. Gold leaf is notoriously difficult to photograph in the way that manuscripts are normally digitized – in a lab, on a cradle or a table, perhaps with a glass plate between the book and the camera holding the page flat. But gold leaf was made to move, was made to be seen in candlelight where it would gleam.

Here is an example of the difference between the same page digitized in a lab vs. an animated gif taken in a classroom – and yes, this is the same page (Ms. Codex 2032, f. 60r). While the gold in the gif shines when it is passed through the light, the gold in the still image looks almost black. Two things. One: again, because I’m trained, I know that’s gold in the still image even though it doesn’t look like gold, because I know what can happen to the appearance of gold when it’s photographed under normal lab conditions. And now you do, too. Just because it doesn’t shine like gold doesn’t mean it isn’t gold. Two: we don’t include images like the animated gif in our record, and that’s a choice. We make lots of choices about what we include and what we don’t, many of them made because of time issues or cost. We could make different decisions; we could choose to take images under different sorts of lighting and include them in the record. The technology is there.

“Often we can’t tell what the overall structure of the work is like, how many leaves it has, and whether it contains any cancelled leaves…”

A. S. G. Edwards, “Back to the real”

Now it’s really my time to shine because this concern about the loss of the structure of manuscripts in digitization is why I started the VisColl project with Alberto Campagnolo and Doug Emery back in 2013; it was one of the first things I did after I came to work at Penn.

Screenshot of a book of hours modeled in VCEditor, the current software implementation of the VisColl data model

The aim of the VisColl project has been to create a data model and a software system for modeling and visualizing the structure of codex manuscripts. You can use it on your own to help you with the study of individual manuscripts, but it was also designed for use by manuscript catalogers, and in fact has been used in the BiblioPhilly project to create models that are presented alongside or integrated with the usual sort of page-turning digital interfaces for manuscript collections, to provide a different kind of view. This is the sort of view that Edwards was concerned about, something that enables us to see what the structure is and see how long the manuscripts are.

So we’ll go back again to the statutes manuscripts. I mentioned earlier that I know that old statutes manuscripts are smaller than new statutes manuscripts both in terms of physical size of the covers and also in the size of the text and that is reflected here. The old statutes manuscript on the left has 21 quires, mostly of eight leaves, and the new statutes manuscript has 52 quires, also of eight leaves. It is a very large manuscript, a thick manuscript, and VisColl provides a way for us to show this size in our interfaces in a way that isn’t normally done.
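As a rough illustration of how those quire counts translate into thickness, the arithmetic can be sketched in Python. The function name is hypothetical, and so is the assumption that every quire has exactly eight leaves; the old statutes manuscript is described only as "mostly" quires of eight, so these figures are estimates and not leaf counts taken from the catalog records.

```python
def estimate_leaves(quire_count, leaves_per_quire=8):
    """Approximate leaf count, assuming every quire is regular (a simplification)."""
    return quire_count * leaves_per_quire


old_statutes = estimate_leaves(21)  # FLP LC 14 19: 21 quires
new_statutes = estimate_leaves(52)  # FLP LC 14 10: 52 quires

print(old_statutes, new_statutes)   # 168 416
```

Under that assumption the new statutes manuscript has roughly two and a half times as many leaves as the old statutes manuscript.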

But making a model with VisColl is work, and it is not the only way to see the size of a manuscript. You can also tell the size of a manuscript by looking at its spine and edges, and this is coming around again to the issue of the choices that we – we, the institutions and libraries – make when we present these manuscripts.

This is a gallery view for LC 14 19 in the BiblioPhilly interface. If you scroll all the way to the end of the manuscript you can see that the presentation ends with the back cover, which makes sense, since when we close a book we see the back cover. But if we go look at the same manuscript in OPenn, which is the collection on our website where we make the data available, you’ll see that there are more images of the book there. BiblioPhilly takes this data and makes an interface that’s user-friendly, but OPenn is more like a bucket of stuff.

So we go to the bucket of stuff and we scroll down to the images, and here are all the images: no page-turning interface, just image after image. We scroll all the way down and we’re going to find some images that don’t appear on the interface. There are images of the spine and the fore edges. These are available, but we made a decision not to include those in the main BiblioPhilly interface, either in the page-turning view or in the gallery view.

These are available in BiblioPhilly but you have to go to the “Binding Images” tab to find them. You have to know to go there; they are categorized as something different, something special, and not part of the main view. This is another choice that we made in designing the interface.

“… and we can rarely be confident that the colours have been reproduced accurately.”

A. S. G. Edwards, “Back to the real”

I can report that at this point in time there are standards and guidelines and best practices for ensuring that digital images are color corrected and checked against the manuscript. One of the ways that we ensure this is by including a color bar in either a reference image or occasionally in every single image. There are institutions, and I believe that the British Library is one of these, where you will see a color bar in every single frame.

At Penn we do include color bars in photographs but they are trimmed out as part of post-processing. We do, however, maintain reference images that, as well as the photos of the spine and the fore edges, are available on OPenn, and you can see them here. So here is a reference image for the old statutes manuscript that we’ve been looking at. In addition to being a color correction aid, the color bar also serves as a ruler. So here you can see that the color bar is longer than the manuscript.

Now if we look at the reference image for the new statutes manuscript you’ll see how much larger the book is in reference to the color bar. So this is getting back again to our starting issue of how big is the manuscript. It’s possible to see how big the manuscripts are because we have these reference images, but sometimes they are hard to find in the interface. And again this comes down to the decisions that we are making in terms of what is easy for you the user to see and find in our collections and through our interfaces.
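To make the "color bar as ruler" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name, the pixel measurements, and the physical length of the bar are all hypothetical, and the calculation only holds if the bar and the book lie in the same plane in an undistorted image.

```python
def estimate_size_mm(object_px, bar_px, bar_mm):
    """Scale a pixel measurement to millimetres using a reference bar of known length."""
    if bar_px <= 0:
        raise ValueError("bar_px must be positive")
    return object_px * bar_mm / bar_px


# Hypothetical measurements taken from a single reference image.
bar_px = 2000    # length of the color bar in pixels
bar_mm = 200.0   # assumed physical length of the color bar
cover_px = 1500  # height of the book cover in pixels

height_mm = estimate_size_mm(cover_px, bar_px, bar_mm)
print(height_mm)  # 150.0
```

Measuring the same bar in two different reference images would let the two books be compared on a common scale, which is what the eye does informally when looking at the slides.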

Size information also comes along in the metadata. We’ve already looked at the metadata before when we were looking at the paper versus parchment question, and you can see here we also have information about the physical size of the manuscript. So even if you can’t tell by the cues in the image, if you don’t know anything about the genre which might help you know the physical size, if you don’t have access to a reference image with some kind of ruler or color bar that might give you an indication of the size, you might still have information in the metadata that should be easily accessible in whatever interface you’re using.

And now I want to start the pivot to the Schoenberg Symposium lightning talks because one of our lightning talk speakers talks quite a bit about metadata and image coming along together. Lisa Fagin Davis in her talk “IIIF, Fragmentology, and the Digital Remediation of 20th-c. Biblioclasm” talks about IIIF, which I’ve already mentioned, the International Image Interoperability Framework, and its use particularly with fragments in the study of what we’re now calling fragmentology. In her lightning talk, Dr. Davis talks about how you can add images to a shared interface and it brings the metadata along with it. All of this contextual information that Edwards was very worried about actually becomes an integral part of image sharing. So in IIIF you’re not just sharing an image file, you’re sharing a lot of information along with it.
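A small sketch may help show what "sharing a lot of information along with it" can look like in practice. The dictionary below loosely follows the shape of a IIIF Presentation API 2 manifest, but it is a hypothetical, heavily trimmed example: the URLs, labels, and values are invented, and real manifests may express labels and values in richer forms than the plain strings assumed here.

```python
# Hypothetical, trimmed manifest in the general shape of IIIF Presentation 2.
manifest = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@id": "https://example.org/iiif/ms-1/manifest",
    "@type": "sc:Manifest",
    "label": "Example MS 1",
    "metadata": [
        {"label": "Support", "value": "Parchment"},
        {"label": "Dimensions", "value": "250 x 170 mm"},
    ],
    "sequences": [{
        "@type": "sc:Sequence",
        "canvases": [{
            "@type": "sc:Canvas",
            "label": "f. 1r",
            "width": 4000,
            "height": 6000,
            "images": [{"resource": {
                "@id": "https://example.org/iiif/ms-1/f1r/full/full/0/default.jpg"
            }}],
        }],
    }],
}


def describe(manifest):
    """Return the label, descriptive metadata, and (page label, image URL) pairs."""
    meta = {m["label"]: m["value"] for m in manifest.get("metadata", [])}
    pages = [
        (canvas["label"], canvas["images"][0]["resource"]["@id"])
        for sequence in manifest.get("sequences", [])
        for canvas in sequence.get("canvases", [])
    ]
    return manifest["label"], meta, pages


label, meta, pages = describe(manifest)
print(label, meta["Support"], meta["Dimensions"], pages[0][0])
```

The point of the sketch is only that the image reference and the descriptive information travel in the same document, so any viewer that loads the one also receives the other.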

The last word that I want to give to Dr. Edwards is his final comments in this section of his piece. He says a lot more after this but I really love this question: “Are digital surrogates not really just a new, more expensive form of microfilm?” To which I say yes, and…

There’s just so much that you can do with digitized images and our lightning talks speak to this so I want to go through the lightning talks and talk about how their concerns answer or reflect Edwards’s own concerns.

In addition to Lisa’s talk, three other talks focus on interacting with digital images in platforms to work towards a scholarly aim. Chris Nighman from Wilfrid Laurier University in his talk “Loss and recovery in Manuscripts for the CLIMO Project” gives an overview of his project to edit Burgundio of Pisa’s translation of John Chrysostom’s homilies on the Gospel of Matthew, which he rendered at the request of Pope Eugenius III in 1151. The apparent presentation copy Burgundio prepared for the pope survives as MS Vat. lat. 383, which is provided online by the BAV, but unfortunately the manuscript is lacking two pages. However, Nighman is able to restore the text from another copy of the same text, MS Vat. lat. 384, which is also available online.

In her talk “A Lost and Found Ending of the Gospel of Mark,” Claire Clivaz presents the Mark 16 project, which is seeking to create a new edition – the first edition – of the alternative ending of the Gospel of Mark, which although well attested in many languages, is usually ignored by scholars as being marginal or unimportant. And in their talk “Lost in Transcription: EMROC, Recipe Books, and Knowledge in the Making,” Margaret Simon, Hillary Nunn, and Jennifer Munroe illustrate how they are using a shared transcription platform, From the Page, to create the first complete transcription of Lady Sedley’s 1686 manuscript recipe book, which has only been partially transcribed in the past, the sections of the text relating to women’s concerns in particular having been ignored.

In all four of these talks, rather than expressing discontent about the loss that happens when materials are digitized (or, in Chris Nighman’s talk, complaining about the watermarks added by the Vatican digital library), the presenters are using digital technology to help fill in losses that are wholly unrelated to digitization.

In her talk “Loss and Gain in Indo-Persian Manuscripts,” Hallie Nell Swanson provides a fascinating overview of the various ways that Indo-Persian manuscripts have been used and misused, cut apart and put back together, over time. For her, digitization is primarily useful as an access point; until recently, Swanson has only been able to access these books virtually, and yet she’s able to make compelling arguments about them.

Kate Falardeau’s talk, “London, British Library, Add. MS 19725: Loss and Wholeness,” is particularly compelling for me, because I wonder if it’s the kind of presentation that would only be made in a digital context, although the paper itself is not “digital.” Digitization has normalized fragmentation in a way not seen before now; we’re used to seeing leaves floating around, disbound and disembodied, and Kate’s argument that an incomplete, fragmentary copy of Bede’s Martyrology might nevertheless be considered whole within its own context is an idea that works today but might not have been conceived at all back in 2013.

In another non-digital talk that uses digital technology in a completely different way, William Stoneman presents on “George Clifford Thomas (1839-1909) of Philadelphia: Lost in Transition.” Thomas was a 19th-century bibliophile who is often overlooked within the context of the history of Philadelphia book collecting, even though the books he owned later passed through more well-known hands and now reside in some of the world’s top libraries. Stoneman points to the Schoenberg Database of Manuscripts, directed by my colleague Lynn Ransom, which is a provenance database; that is, it traces the ownership of manuscripts over time – including manuscripts owned by George Clifford Thomas.

In his entertaining and illuminating talk, “Extreme Loss and Subtle Discoveries: The Corpus of Sotades of Maronea,” Mark Saltveit presents on recently discovered lines of text by the ancient poet Sotades – discoveries that were made entirely by reconsidering quotes and identification of poets in earlier texts, and not at all through any kind of digital work (which is part of his point).

Finally, of all the Lightning Talks, Mary M. Alcaro takes a more critical approach in her talk “Closing the Book on Kanuti: Lost Authorship & Digital Archives.” This Kanuti, we discover, was credited with the authorship of “A litil boke for the Pestilence” in the 15th century, and this authorship followed the text until 2010, when Kari Anne Rand said in no uncertain terms that this Kanuti was not the author. But no matter – online catalogs still list him as the author. A problem for sure!

I want to close by pushing back a little on the accepted knowledge that digitization only causes loss. I think it does, but there are other ways that we can talk about digitization too, which may sit alongside the concept of loss and which might help us respond to it.

I’ve talked about mediation a bit during this talk – the idea that the digitized object is mediated through people and software, and this mediation provides a different way for users to have a relationship with the physical object. The decisions that the collection and interface creators make have a huge impact on how this mediation functions and how people “see” manuscripts out the other side, and we need to take that seriously when we choose what we show and what we hide.

Transformation: basically, digitization transforms the physical object into something else. There is loss in comparison with the original, but there is also gain. It’s much easier to take apart a digital manuscript than a physical one. About that…

I’ve talked before about how digitization is essentially a deconstruction, breaking down a manuscript into individual leaves, and interfaces are ways to reconstruct the manuscript again. It’s typical to rebuild a manuscript as it exists, that’s what we do in most interfaces, but it also enables things like Fragmentarium, or VisColl, where we can pull together materials that have long been separated, or organize an object in a different order.

Finally, my colleague Whitney Trettien, Assistant Professor of English at Penn, claims the term creative destruction (currently used primarily in an economic context) and applies it to the work of the late seventeenth-century shoemaker and bibliophile John Bagford, who took fragments of parchment and paper and created great scrapbooks from them – as she says in this context, “creative destruction with text technologies is not the oppositional bête noire of inquiry but rather is its generative force.”[2] Why must we insist that digitized copies of manuscripts reflect the physical object? Why not claim the pieces as our own and do completely new things with them?

Digitization is lossy, yes. But it can also generate something new.

“I must say that to see a manuscript will never be replaced by a digital tool or feature. A scholar knows that a face to face meeting with a manuscript is something that can be replaced by nothing else.”

Claire Clivaz, “A Lost and Found Ending of the Gospel of Mark”

Claire Clivaz is correct – a digital thing will never replace the physical object. But I think that’s okay; it doesn’t have to. It can be its own thing.

[1] Edwards, A. S. G. “Back to the real?” The Times Literary Supplement, no. 5749, 7 June 2013, p. 15. The Times Literary Supplement Historical Archive.

[2] Trettien, Whitney. “Creative Destruction and the Digital Humanities,” The Routledge Research Companion to Digital Medieval Literature and Culture, Ed. Jen Boyle and Helen Burgess, 2018, pp. 47-60.

Books of Hours as Transformative Works

This is the text of a talk originally presented at the Center for Medieval and Renaissance Studies at the Ohio State University in September, 2019. I presented a shorter version of the paper at Dark Archives: A Conference on the Medieval Unread and Unreadable, at Oxford University in September 2019.

Good afternoon, and thank you everyone for coming today. Thanks especially to the Center for Medieval and Renaissance Studies for inviting me, to Chris Highley and Leslie Lockett for organizing, and to Nick Spitulski for making all my travel arrangements.

The advertised topic of today’s talk is Books of Hours as Transformative Works, and I’m excited to be here to talk to you about this work that I’m really only beginning to work on. I’m hoping that we have time in Q&A to have a good discussion, and that I might be able to learn from you. 

But first, a bit about me so you know where we’re starting. 

I am a librarian and curator at the Schoenberg Institute for Manuscript Studies (SIMS), which is a research and development group in the Kislak Center for Special Collections, Rare Books and Manuscripts at the University of Pennsylvania. I’ve been at Penn, at SIMS, for six years, since SIMS was founded in 2013. The work we do at SIMS is focused broadly on medieval manuscripts and on digital medieval manuscripts – we have databases we host, we work with the physical collections in the library, we collaborate on a number of projects hosted at other institutions. Anything manuscript related, we’re interested in. 

For the past three years I’ve been co-PI, along with Lois Black of Lehigh University and Janine Pollock of the Free Library of Philadelphia, of a major project, funded by the Council on Library and Information Resources, to digitize and make available the western medieval manuscripts from 15 Philadelphia area institutions – about 475 mss codices total. We call this project Bibliotheca Philadelphiensis, or the library of Philadelphia – BiblioPhilly for short. My work on the project is largely related to project management: making sure the manuscripts are being photographed on time, that the cataloging is going well (and at the start we had to set up cataloging protocols and practices). The grant, as a whole, has gone remarkably well and we’re in the process of writing the final report right now. The manuscripts went online as they were digitized, in the same manner that all our manuscripts do: 

They go on OPenn: Primary Digital Resources Available for Everyone, which is a website where we make available raw data as Free Cultural Works (for BiblioPhilly, this means images, including high-resolution master TIFFs, in the Public Domain, and metadata in the form of TEI Manuscript Descriptions under CC0 licenses – this means released into the public domain).

OPenn has a very specific purpose: it’s designed to make data available for reuse. It is not designed for searching and browsing. 

There is a Google search box, which helps, but not the kind of robust keyword-based browsing you’d expect for a collection like this. 

And the presentation of the data on the site is also simple – this is an HTML rendition of a TEI file, with the information presented very simply and image files linked at the bottom. There’s no page-turning facility or gallery or filmstrip-style presentation. And this is by design. We designed it this way, because we believe in separating out data from presentation. The data, once created, won’t change much. We’ll need to migrate to new hardware, and at some point we may need to convert the TEI to some other format. The technologies for presentation, on the other hand, are numerous and change frequently. So we made a conscious decision to keep our data raw, and create and use interfaces as they come along, and as we wish. Releasing the data as Free Cultural Works means that other people can create interfaces for our data as well, which we welcome.
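The separation of data from presentation can be illustrated with a short Python sketch. The XML below is a hypothetical, heavily trimmed fragment in the general shape of a TEI manuscript description, not a record from OPenn, and the function names are invented for this example. The data is extracted once and then handed to two different, interchangeable renderers.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical fragment; real TEI manuscript descriptions carry far more detail.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><sourceDesc><msDesc>
    <msIdentifier><idno>Example MS 1</idno></msIdentifier>
    <physDesc><objectDesc><supportDesc material="parchment">
      <extent>100 leaves
        <dimensions unit="mm"><height>250</height><width>170</width></dimensions>
      </extent>
    </supportDesc></objectDesc></physDesc>
  </msDesc></sourceDesc></fileDesc></teiHeader>
</TEI>"""


def extract_record(tei_xml):
    """Pull a few descriptive fields out of the TEI; this is the 'data' layer."""
    root = ET.fromstring(tei_xml)
    support = root.find(".//tei:supportDesc", TEI_NS)
    return {
        "shelfmark": root.findtext(".//tei:msIdentifier/tei:idno", namespaces=TEI_NS),
        "material": support.get("material") if support is not None else None,
        "height_mm": root.findtext(".//tei:dimensions/tei:height", namespaces=TEI_NS),
        "width_mm": root.findtext(".//tei:dimensions/tei:width", namespaces=TEI_NS),
    }


def render_text(record):
    """One possible presentation: a plain-text line."""
    return (f"{record['shelfmark']}: {record['material']}, "
            f"{record['height_mm']} x {record['width_mm']} mm")


def render_html(record):
    """Another presentation of the same data: a small HTML snippet."""
    return (f"<p><b>{record['shelfmark']}</b>: {record['material']}, "
            f"{record['height_mm']} x {record['width_mm']} mm</p>")


record = extract_record(SAMPLE_TEI)
print(render_text(record))
print(render_html(record))
```

Either renderer could be replaced or discarded without touching the TEI, which is the design choice the paragraph above describes.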

We also have a user-friendly interface. We partnered with a development company called Byte Studios out of Milwaukee, Wisconsin, which also worked with the Walters Art Museum on their site called Ex Libris. It’s this site, which we also call BiblioPhilly, which we expect most people will use to interact with our collection. It has the browsing facility you’d expect: here we’re selecting a Book of Hours, and here’s Library Company MS 5. We have a Contents and Decorations menu so we can browse directly to a specific text. Here’s the start of the Office of the Dead with a miniature and an illuminated initial.

We can also download individual images or the TEI for the whole manuscript directly from this site.

And then further down the page is all the data from the record you’d expect to see on any good digitized manuscript site. This is pulled from the TEI Manuscript Descriptions and indexed in a backend database for the site, while the images are pulled directly from their URLs on OPenn.

I want to return back to the top and point out the icon in the middle, which is meant to resemble the edge of an uncovered manuscript spine.

This function of the website is, as far as I know, original to BiblioPhilly, and it contains an interactive view of the physical collation of the manuscript. Initially we have diagrams of each quire, but then we can select a quire and view the bifolia diagrams separated out, and if we select one 

we can view the page images that form the bifolia.

I’m showing you this both to give you a sense of the kind of work I focus on in my day to day work, but also to start to get us thinking about the concept of transformation as it applies to medieval manuscripts. Manuscript digitization and the BiblioPhilly project specifically is one example of how we transform manuscripts: through digitization we deconstruct manuscripts, breaking them virtually into individual pages and providing metadata about the object they represent (that is, the physical manuscript) as well as what we in the business call structural metadata, which is what enables us to then reconstruct some kind of digital version of the physical object. As the example of the collation view implies, it’s possible to have multiple digital versions of an object, focused on different aspects of the physical object (in this case we provide both a page-turning view that mimics the experience of paging through the manuscript, and a collation view that shows us how the manuscript would appear if we were to take it apart, as well as providing diagrams that would not be available to us if we were just using the manuscript in the reading room).

For the rest of my talk I want to think about another kind of transformation: the textual and physical transformation inherent in a group of manuscripts that dominated the manuscript trade for 250 years – Books of Hours.

Transformative Works / Language of Care

Before I do, however, I think it’s only fair to come clean. Besides my family, there are two things in the world that I love more than anything. One of them is medieval manuscripts. The other one is Star Wars. I’m a fan. I read and write fan fiction, I participate in fandom activities on and offline, I encourage other fans to create transformative works.

I’ve been able to combine my loves together in various ways too, from collaborating on a Star Wars fan story set in the middle ages and focusing on a group of medieval scribes,

to collaborating on a series of videos with another medievalist and Star Wars fan, Dr. Brandon Hawk at Rhode Island College, where we compare manuscripts from Penn’s collections with the “Ancient Jedi texts” shown in The Last Jedi. (Sacred Texts: Codices Far, Far Away)

So this project is another opportunity for me to combine my great loves. However, it’s also probably important to note that although I am in some sense a codicologist and manuscript scholar I am neither technically a scholar of Books of Hours – I don’t have a PhD in Art History, for example – nor am I a scholar of fandom. Which, honestly, is one of the reasons I am excited to be here with you all today, because I’m hoping that you’ll be able to help me through some of the questions I have, that I don’t yet have answers for.

Back to Transformative Works. 

Transformative work is a concept that comes out of fandom: that is, the fans of a particular person, team, fictional series, etc. regarded collectively as a community or subculture. We typically talk about fandom in relation to sports, movies, or TV shows, but people can be fans of many things (including manuscripts). As defined on the Fanlore wiki:

Transformative works are creative works about characters or settings created by fans of the original work, rather than by the original creators. Transformative works include but are not limited to fanfiction, real person fiction, fan vids, and graphics. A transformative use is one that, in the words of the U.S. Supreme Court, adds something new, with a further purpose or different character, altering the [source] with new expression, meaning, or message.

In some fandom communities, transformative works play a major role in how the members of that fandom communicate with each other and how they interact with the canon material (“canon” being the term fans use to refer to the original work). Transformative works start with canon but then transform it in various ways to create new work – new stories, new art, new ideas, possible directions for canon to take in the future, directions canon would never take but which are fun or interesting to consider.

For example, Darth Vader reimagined as a medieval Dark Knight

There is a small but growing academic movement to apply the concept of transformative work to historical texts. Some of this work is happening through the Organization for Transformative Works, which among other things hosts Archive of Our Own, a major site for fans to publish their fanworks, and provides legal advocacy for creators of fanworks.

The Organization for Transformative Works also publishes a journal, Transformative Works and Cultures, and in 2016 they published an issue, “The Classical Canon and/as Transformative Work,” which focused on relating ancient historical and literary texts to the concept of fan fiction (that is, stories that fans write that feature characters and situations from canon). There is also an upcoming special journal issue on “Fan Fiction and Ancient Scribal Culture,” which will “explore the potential of fan fiction as an interpretative model to study ancient religious texts.” This special issue is being edited by a group of scholars who lead the “Fan Fiction and Ancient Scribal Cultures” working group in the European Association of Biblical Studies, which organized a conference on the topic in 2016.

You will note that the academic work on transformative works I’ve cited focuses specifically on fan fiction’s relationship with classical and medieval texts, which makes a fair amount of sense given the role of textual reuse in the classical and medieval world. In her article “The Role of Affect in Fan Fiction,” published in the Transformative Works and Cultures special issue of 2016, Dr. Anna Wilson places fan fiction within the category of textual reception, wherein texts from previous times are received by and reworked by future authors. In particular, Dr. Wilson points to the epic poetry of classical literature, medieval romance poetry, and Biblical exegesis, but she notes that comparisons between modern transformative works – that is, fan fiction – and these past examples of textual reception are undertheorized, and leave out a major aspect of fan fiction that is typically not found, or even looked for, in the past examples. She says,

“To define fan fiction only by its transformative relationship to other texts runs the risk of missing the fan in fan fiction—the loving reader to whom fan fiction seeks to give pleasure. Fan fiction is an example of affective reception. While classical reception designates the content being received, affective reception designates the kind of reading and transformation that is taking place. It is a form of reception that is organized around feeling.”

Wilson, 1.2

For my paper at the International Medieval Congress at Leeds in 2018 I described the manuscript University of Pennsylvania LJS 101 as an example of “medieval manuscript as transformative work,” not as a piece of data to be mined for its texts, but as a transformative work in itself. 

Here is the manuscript in question. UPenn LJS 101 is the oldest codex we have in the University of Pennsylvania Libraries by at least 150 years, and it is one of only two codices in our collection written in Caroline minuscule (the other one being UPenn Ms. Codex 1058, dated to ca. 1100 and localized to Laon).

The bulk of the manuscript, folios five through 44 (Quires two through six), is dated to the mid-9th century, but in the early 12th century replacement leaves were added for the first four leaves and for the last 20 leaves. LJS 101 reflects the educational program set up in the Carolingian court by Alcuin, featuring a copy of Boethius’s translation of Aristotle’s De Interpretatione (which was commonly called Periermenias, the name also used in this text) and a short commentary on that text (also called Periermenias), along with a few other shorter texts.

Over the course of the talk I did a slow walk-through of the manuscript, examining it from cover to cover and noting every sign of care from the people who created and, for the most part, have used the manuscript since it was created.

I talked about the 19th century binding with several 20th century owner’s marks, 

the 12th century replacement leaves at the front and the back of the manuscript (which notably contain all but one of the texts extant in the manuscript) 

The 12th century corrections that run throughout the 9th century section

The colorful highlighting that was added to diagrams and some of the headwords

There are also many quires’ worth of text missing from the main text, and the first and last quires have been misbound in a way that makes me think someone dropped the quires, scattering the bifolia across the floor, and picked them up without paying attention to the order.

There were two things I discovered as I was working on that paper, with regard to trying to fit the manuscript into the frame of a transformative work. The first issue was what exactly is it that is being transformed. Looking again at the definition of the transformative work

Transformative works are creative works about characters or settings created by fans of the original work, rather than by the original creators. Transformative works include but are not limited to fanfiction, real person fiction, fan vids, and graphics. A transformative use is one that, in the words of the U.S. Supreme Court, adds something new, with a further purpose or different character, altering the [source] with new expression, meaning, or message.

The thing that is being transformed is something canon – usually a text (and in text I’m including TV shows or movies), but it could be something else; an artwork, or a sports game, for example – and then there is another thing that is the transformation, the transformative work. In the case of LJS 101, one could argue that the canon is the various texts in the manuscript, but by the time they are there, they have already been transformed – they’ve been copied and recopied, edited, a scribe has decided to use this copy for their manuscript and not that one (if they even had a choice, which we can’t know), someone decided which other texts to include. If I argue instead that the canon was the 9th century version of the manuscript – as it existed when it was created – and the version we have now is the transformative work, I find that a bit more satisfying, although there’s still another problem. 

I’m not comfortable applying Dr. Wilson’s concept of affective reception to the people who created and worked with LJS 101 – I didn’t want to suggest that these people loved the manuscript the same way that I do – but I did want to explore the idea that this person or people cared about it, and that other people have cared about this manuscript over time enough that it survives to live now in our library at the University of Pennsylvania. Their interests may have been scholarly, or based on pride of ownership, or even based on curiosity, but whatever their reasons for caring for the manuscript, they did care, and we know they cared because of the physical marks that they have left on these books. Manuscripts that survive also often show examples of lack of care, damage and so forth, so those elements need to be included in this framework as well. So coming out of that paper I suggested a language of care around their use, rather than the frame of Transformative Work.

After Leeds, I spent some time wondering if there were actually a way to make the frame of Transformative Work fit any medieval manuscript, or if it was just another one of my zany ideas. But in the fall, when I was teaching an undergraduate course on medieval manuscripts with Dr. Will Noel, the director of the Schoenberg Institute and the Kislak Center, I had a bit of an epiphany as he was lecturing on Books of Hours. He made the very important point that people saw Books of Hours as at least part of a ticket to salvation. As I will describe in a moment, Books of Hours are organized around the veneration of the Virgin Mary, who acts as an intercessor between us and Christ. By using Books of Hours to ask Mary to intercede for us with Christ, we hope to spend less time in purgatory and to make it to heaven, to be with Christ, sooner than we would otherwise.

I still thought this idea was a bit weird, so I sat down with my colleague Dr. Nicholas Herman, the SIMS Curator of Manuscripts, and talked about the potential of manuscripts as transformative works, and he reminded me that, unlike LJS 101, which really only works as a transformative work in a physical sense, Books of Hours can work both textually and physically. What do I mean by this?

Textually, Books of Hours are designed to be lay versions of the texts that priests, monks, and nuns used during their lives in the Church. Churches, cathedrals, and monasteries had antiphonaries – containing liturgical chants – and breviaries –  books furnishing the regulations for the celebration of the canonical Office – for the use of their communities, usually very large books that the members of the community would share as they sang and recited prayers together. 

Historians trace the growth of Books of Hours to the late 13th century, when social and economic changes led to both a growing secularization and an embryonic urban middle class, which emerged through the 14th and 15th centuries. Speaking generally these groups – although small – had money, and they were both literate and interested in books. They were also pious. In Time Sanctified, Roger Wieck explains that the laity at this time wished to imitate the clergy by adapting their prayers, adapting their books, and by adopting their direct relationship with God. The book of hours gave them all of these things: “a series of prayers like the clergy’s, but less complex, and a type of book like the breviary, but easier to use and more pleasing to the eye.”

The growth of the cult of the Virgin around this same time also contributed to the development and popularity of Books of Hours. Indeed, the thing that determines whether a prayer book is a Book of Hours or some other type of prayer book is the inclusion of the Hours of the Virgin. The Hours of the Virgin did not come into common use before the 10th century, although it may be older, and it was added to breviaries in various monastic orders throughout the 11th and 12th centuries. The Hours of the Virgin were extracted from the breviary and became the centerpiece of the Book of Hours: a set of prayers to Mary, designed to be recited daily and throughout the day – eight times, roughly following the canonical hours that would be practiced in a monastery – an ongoing reminder of the pious life of the user of any given Book of Hours.

In addition to the Hours of the Virgin, Books of Hours will typically include a Calendar, which will list festivals and saints’ days presumed relevant for the geographical location in which the owner of the book resides; the Gospels; the Hours of the Cross and Hours of the Holy Spirit; two prayers to the Virgin known as the Obsecro te and the O intemerata; the Penitential Psalms and Litany; the Office of the Dead; and various and perhaps numerous Suffrages.

The Hours of the Virgin were pulled from existing monastic or liturgical books, but so were other parts of a typical Book of Hours – the Calendar, the Office of the Dead, and of course the Penitential Psalms. Other elements, including the Hours of the Cross and Hours of the Holy Spirit, the Obsecro te and the O intemerata, and the Suffrages, have uncertain sources, and it’s unclear from my research whether they were original to Books of Hours.

Returning to the frame of Transformative Works, here we have Books of Hours, as a class or type of books, that represent a transformation of canonical liturgical texts originally developed for clerical and monastic users, into texts that are explicitly for lay use, and this transformation was explicitly made from a place of affection on the part of the people doing the transforming. One could thus argue that Books of Hours are transformative works of liturgical works. 

But that’s looking at Books of Hours as a class. What about transformation within the class? Although, as I explained earlier, Books of Hours usually contain a set group of texts, and they usually appear in a set order, my theory coming into this is that in practice the textual organization of individual Books of Hours varies much more. So I did a bit of work, using data from OPenn, to visualize the variance across a collection of Books of Hours.

Through OPenn, I have access to three collections of digitized Books of Hours: those digitized through the Bibliotheca Philadelphiensis project, those digitized through The Digital Walters, which is the ongoing digitization project at the Walters Art Museum in Baltimore – the data for which is hosted on OPenn – and the collections of Penn Libraries itself. As part of the data publication, OPenn hosts spreadsheets that list all the manuscripts in a given repository or collection, so it was simple enough for me to filter for only those titles that include the word “Hours”, check for stray odd things, and then use some scripts I wrote to generate a new spreadsheet that pulls out the contents of each of these manuscripts. Then I color coded each text to align with the list of texts in the “usual” order prescribed by Roger Wieck in Time Sanctified.
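The filtering and coding step described above can be sketched roughly as follows. This is a minimal illustration and not the scripts mentioned in the talk: the column name `title`, the function names, and the sample data are all hypothetical, and placing the Hours of the Virgin after the Gospels in the "usual" order is an assumption made for the example.

```python
import csv
import io

# Assumed "usual" order, based on the list of texts given earlier in the talk.
# The position of the Hours of the Virgin within it is an assumption.
USUAL_ORDER = [
    "Calendar",
    "Gospels",
    "Hours of the Virgin",
    "Hours of the Cross",
    "Hours of the Holy Spirit",
    "Obsecro te",
    "O intemerata",
    "Penitential Psalms and Litany",
    "Office of the Dead",
    "Suffrages",
]


def books_of_hours(csv_text, title_column="title"):
    """Keep only rows whose title contains the word "Hours" (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if "hours" in row[title_column].lower()]


def position_code(text_name):
    """Return the text's index in USUAL_ORDER, or None if it is not in the list.

    The index can then be mapped to a color (warm to cool) for charting.
    """
    try:
        return USUAL_ORDER.index(text_name)
    except ValueError:
        return None
```

A filter this loose will also catch titles that merely mention "Hours", which is why a manual check for stray odd things still matters.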

Here are the first couple of books of hours, and you can see that the texts in both are in a slightly different order than the “usual” order.

To more easily view the variance I turned the color charts on their side, placing each manuscript in a row, and stacked them. The Hours of the Virgin are the gold bar down the center; I did it this way because the Hours of the Virgin are the central text of Books of Hours and it made sense to visualize the other texts around that one. The rest of the contents are arranged around that text. We can see that, although the area before the Hours of the Virgin tends to be warmer and the area after the Hours of the Virgin tends to be cooler, there is an awful lot of variation there. Unfortunately this chart isn’t organized in a thoughtful way: there is a mix of uses represented here, including Paris, Rome, Sarum, Franciscan, Bourges, and Utrecht, among others, and they range from the 14th through the 16th centuries, mostly from France and the Netherlands, but they aren’t ordered, and that’s definitely something that would need to be addressed in future use. The chart also doesn’t take into account any physical modifications that might have been made to the individual manuscripts, modifications that could lead to texts being reordered at some point after the book was written.
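One way to line rows up on a central text is to pad each row so that the anchor text falls in the same column. The sketch below is only an illustration of that idea, with hypothetical names; it assumes each manuscript is represented as a simple list of text names and skips any row that lacks the anchor.

```python
ANCHOR = "Hours of the Virgin"


def align_on_anchor(rows, anchor=ANCHOR, pad=""):
    """Pad each row on the left and right so the anchor sits in one column.

    Rows that do not contain the anchor text are left out of the result.
    """
    with_anchor = [row for row in rows if anchor in row]
    # Widest stretch of texts found before and after the anchor in any row.
    before = max((row.index(anchor) for row in with_anchor), default=0)
    after = max((len(row) - row.index(anchor) - 1 for row in with_anchor), default=0)

    aligned = []
    for row in with_anchor:
        i = row.index(anchor)
        left = [pad] * (before - i)
        right = [pad] * (after - (len(row) - i - 1))
        aligned.append(left + row + right)
    return aligned
```

Each returned row has the same length, so the rows can be stacked directly as a grid of colored cells.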

This view also doesn’t take into account the actual or relative length of the books, so I tried another way to visualize the texts. 

In this example, Books of Hours from the Walters Art Museum, each book has its texts distributed across a row, which is the same length for each book. Again the Hours of the Virgin are in gold, with the other texts colored appropriately, but because the gold bar of the Hours of the Virgin appears very long in some, and short in others – assuming that the Hours of the Virgin is approximately the same in each manuscript – you get a better sense of the size of the books in number of folios from this visualization.
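The fixed-length rows can be thought of as scaling each text's folio count to a share of a constant row width. A minimal sketch of that arithmetic, assuming each manuscript is given as a list of (text name, folio count) pairs; the function name and the default width of 100 units are hypothetical:

```python
def proportional_segments(texts, row_width=100.0):
    """Scale each text's folio count to its share of a fixed row width."""
    total = sum(count for _, count in texts)
    if total == 0:
        return []
    return [(name, row_width * count / total) for name, count in texts]
```

For a book of 100 folios in which the Hours of the Virgin fill 60, the gold bar takes up 60 units of the row; in a 300-folio book with the same 60 folios, it takes up only 20, which is what makes the longer book visible in the chart.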

So we’ve looked at Books of Hours as transforming from monastic liturgical texts to texts used by laypeople, and then at the transformation of those texts in terms of which are included and which left out, across a collection of Books of Hours. I want to close by talking very briefly about a third way in which Books of Hours might be considered transformative works: physical modifications made to them, both on purpose and accidentally, after their creation and as a result of their use.


For several years now Dr. Kathryn Rudy at the University of St Andrews has been doing important and cutting-edge work on how people in the medieval and early modern periods physically interacted with their books, particularly their prayer books. In this section I’ll talk about two of her studies that are particularly relevant to the concept of Books of Hours as transformative works.

In her book Piety in Pieces: How Medieval Readers Customized Their Manuscripts, Dr Rudy classifies the many ways in which the owners of Netherlandish prayerbooks modified these books over time. In the introduction, she points out that it’s often difficult if not impossible to understand exactly why some of these changes were made, but clearly some of the possible reasons lie within the realm of Wilson’s definition of affection. Rudy says:

“In this study I explore the ways in which medieval book owners adjusted the contents of their books to reflect changed circumstances. Such circumstances were not usually so overtly political, but they nonetheless reveal other fears and motivations. Religious, social or economic reasons could also motivate such emendations. Augmentations to a book reveal strong emotional and social forces.”

Dr Rudy divides the modifications into two groups, those that require rebinding and those that don’t, and comes up with a detailed variety of changes, which I list here. This list includes many ways that people changed their prayer books, from changing the text in various ways 

(in this example, an owner found an error and had a professional scribe fix it) to adding new leaves or removing old ones. This list isn’t even exhaustive – Dr Rudy’s study also includes sections on modular design – where prayer books were built from modules that were designed to be modified on spec, either during the time the book was being made or after – and on modifications that led to complete overhauls of books; for example, building books out of quires that were originally part of other books.

Most relevant for the discussion of Books of Hours as transformative works and the accompanying concept of affective reception is Dr Rudy’s final chapter, in which she lists the reasons that changes might have been made – reasons that she refers to as patterns of desire.

“I have asked in this study: how did later users register their opinions that a book considered perfectly acceptable by its previous owners was for them somehow incomplete, and by what means did they express their discontent? How can their acts of recycling and upcycling be interpreted? The kinds of augmentations owners made to books reveal certain patterns of desires, which I enumerate here.”

Although these desires vary significantly from the kind of affection that I’ve discussed earlier in the paper – which focused on the love of the Virgin Mary and the desire to reach God (so, I suppose, aligns most clearly with “H. Fear of Hell”) – these other desires also reflect affection of various sorts.  

One of Dr Rudy’s other studies focuses on densitometry, or the quantitative measurement of optical density in light-sensitive materials. It is a simple fact that medieval and later users of manuscripts leave traces of their use of manuscripts in the form of dirt around the edges of pages, and sometimes on decorations and illustrations as well. More use would deposit more dirt, more dirt would make an area darker, and that darkness vs. the lightness of the rest of the page can be picked up using a tool called a densitometer. Dr. Rudy used a densitometer to measure the distribution of dirt through a number of Netherlandish prayer books and first published her findings in “Dirty Books: Quantifying Patterns of Use in Medieval Manuscripts Using a Densitometer,” in the Journal of Historians of Netherlandish Art in 2010. Her conclusion was that in applying densitometry to prayer books, one might be able to determine which pages – and thus which texts – users of those books turned to the most. 
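The general idea of comparing darkness across pages can be sketched with the standard definition of reflection optical density, D = -log10(R), where R is the fraction of light reflected. This is only an illustration of that definition and not Dr. Rudy's procedure; the function names, the folio labels, and the reflectance values are hypothetical.

```python
import math


def optical_density(reflectance):
    """Optical density D = -log10(R) for a reflectance R in the range (0, 1]."""
    if not 0 < reflectance <= 1:
        raise ValueError("reflectance must be greater than 0 and at most 1")
    return -math.log10(reflectance)


def rank_by_darkness(readings):
    """Sort folio labels from darkest (highest density) to lightest.

    `readings` maps a folio label to a reflectance reading for its margin.
    """
    return sorted(readings, key=lambda folio: optical_density(readings[folio]), reverse=True)
```

A margin that reflects less light has a higher density, so under the assumption that dirt tracks use, the folios at the top of the ranking would be the ones handled most.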

While my concern with the care of LJS 101 focused on how it was manipulated by later users – by how they bound it, notated it, cared for it in a physical sense – Dr. Rudy’s use of densitometry is designed to determine which prayers the people who actively used the prayer books recited most frequently. It’s a physical indication of religious practice – Dr Wilson’s affective reception. 

Here are just two examples from Dr Rudy’s article. This first is from a Missal – a liturgical book, not a personal prayer book – that shows where a priest (or perhaps priests) kissed the illumination enough to damage it. This circle and cross at the bottom of the page is an osculation plaque, which was designed to be kissed and touched in place of the illumination, as artists knew that their work would be the object of veneration. So that plaque has been well-worn, but the priest wandered, at times reaching up as high as Christ’s feet.

In addition to measuring the wear of illuminations to see how people caressed their books, Dr Rudy used densitometry to measure dirt on the edge of pages. These pages show very heavy discoloration in the bottom margins, which implies that someone (or many someones) held the book open frequently to this page, which contains the incipit of a prayer to the “Seventy-two Names of the Virgin.” 

Someone loved the Virgin Mary, so they returned to this page again and again to read this prayer, thereby physically changing the book with the addition of dirt from their fingers. This book is one of many that contain approximately the same texts, in the same approximate order, but each of which was designed and written to be used by individuals for prayer and contemplation. And finally this type of book, the Book of Hours, was a transformation of liturgical texts into something that individuals could use for prayer in their own lives.

Thank you

The Sacred Texts: Manuscripts in Star Wars and Star Wars Fanfiction

This is the text of a talk originally presented at the conference Fan Cultures and the Premodern World at Oxford University in July 2019, organized by Dr. Juliana Dresvina of the Oxford History Faculty. This presentation represents a collaboration between myself and Dr. Brandon Hawk of Rhode Island College, and is essentially a summation of our video project Sacred Texts: Codices Far, Far Away.

title slide, with art by (@gwendy85 on Twitter)

Hi, my name is Dot Porter, and I want to start by thanking Juliana for the wonderful organization of this conference, and also for including me in the program. This is very different from the kind of conference I normally present at – in my day job I’m a special collections curator at the University of Pennsylvania, specializing in medieval manuscripts, their digitization, and their post-digital lives. Basically I get paid to digitize medieval manuscripts and then play with them. (I’d be remiss if I didn’t mention the Bibliotheca Philadelphiensis project, funded by the Council on Library and Information Resources, which has just finished, and through which we digitized and made available for reuse more than 465 codices from institutions in Philadelphia.)

The search interface for the Bibliotheca Philadelphiensis project

Aside from my family there are two things in life I adore: medieval manuscripts, and Star Wars. I must admit that while I am a scholar of manuscripts, of a sort, I am also a fan. I love manuscripts – the way they look, feel, smell; I love to hold a manuscript and think about all the other people who have touched it, and consider the signs of use that imply their long histories. This interest has led to current work on conceiving of medieval manuscripts as transformative works themselves, first presented at Leeds 2018 and work I’m continuing, looking specifically at Books of Hours. (My original draft of this presentation featured some of this work, but it threatened to take over, so I axed it all; the text of my Leeds paper is on my blog, if you’re curious).

While I am arguably a manuscript scholar, I am most definitely not a scholar of fandom studies – you will, I’m sure, find my theory wanting – nor am I a scholar of Star Wars, but I am a fan. I do the things that fans do. I’m on Tumblr, although that platform is pretty dead now, and I have a fandom Twitter account, which is much more active. I write and consume fan fiction, and I regularly commission artwork to illustrate my stories and stories I would like to write. I have written exactly one notable meta, which was even picked up by the AV Club – they actually cited me, unlike many of the other websites, which only cited the person who stole my work and posted it on Reddit!

My claim to fame in the Star Wars fandom

In Star Wars: The Last Jedi, released in December 2017, we were introduced, for the first time, to manuscripts in the Star Wars universe. I had avoided trailers and spoilers, so the first time I saw this was in the theater, and I was, as the kids say, shooketh. Not only one manuscript, but a whole shelf-full of them! And they’re important. Rey, our heroine, has been sent to the island of Ahch-to to bring Luke Skywalker back to help the Resistance, led by Luke’s sister General Leia Organa, defeat the First Order. Rey has been there for a day or so, following Luke around, making no headway, when she is called to the Uneti tree, a large, hollow, Force-sensitive tree that houses these manuscripts. It’s in the company of these books that Rey and Luke finally communicate with each other, when Rey admits that she has only recently come to the Force and that she needs Luke to train her to be a Jedi, and when Luke grudgingly agrees to give her some lessons, but also tells her that the Jedi must die. Exciting stuff, and the books are there to hear it.

Rey in the Uneti tree
The manuscripts in the tree
Luke with a book
Luke’s hand on the sacred text

According to Star Wars The Last Jedi: The Visual Dictionary, Luke Skywalker scoured the galaxy for these texts and collected them himself, storing them in the tree that we see in the film. So these texts weren’t originally all in one collection, they are from many different planets, potentially written in ten different places, ten different times, ten different languages and alphabets, although there’s only one we ever see in the film. The blog post “Inside the Lucasfilm Archives: The Jedi Texts” gives us an up-close look at the prop book that was shown in the film; as you can see it’s a real book, written and bound, and even damaged. There are manuscripts in our collection at Penn that look not very unlike this book. It is a real manuscript. 

This is one manuscript in the universe. What else do we know about manuscripts in Star Wars in general? To be honest: not much. But we do know that it is rare to write by hand (as opposed to writing with digital technology like data pads). In Claudia Gray’s novel Bloodline, which takes place six years before The Last Jedi, Leia Organa is preparing for a fancy party when she finds a handwritten note at her seat, and she’s shocked: “Virtually nobody wrote any longer; it had been years since Leia had seen actual words handwritten in ink on anything but historical documents.” So it appears that, by the time the current films take place, there are no longer manuscripts being actively written in the galaxy, or at least it’s very rare.

Interestingly there is one character in the Sequel Trilogy who it is suggested knows how to write by hand: Kylo Ren, formerly Ben Solo. There is a scene – the same scene is actually shown three times, from three different points of view – where a young padawan Ben is sleeping and his uncle, Luke Skywalker, comes to him and looks into his head, sensing great darkness in his dreams. Ben calls his lightsaber to either attack his uncle or defend himself against him, depending on the version of the scene, and in one of these shots we can see that he has a calligraphy set in his bedroom. We can see the set here, in a screenshot of his desk just before he calls his lightsaber over – which knocks over the pen and inkwell and jar of parchment scrolls in the process – and in The Art of Star Wars: The Last Jedi.

What else do we know about these specific books? There is concept art in The Art of Star Wars: The Last Jedi, including six internal pages and six shots of the bindings.

I remember looking at the concept art and thinking how alike and different they were from the manuscripts I’ve had the pleasure of working with at Penn, and I discovered that my Twitter mutual Brandon Hawk, an Assistant Professor of English at Rhode Island College, was having many of the same thoughts that I was. So in October of 2018, Brandon came down to Penn and we sat for hours in front of a green screen and talked about manuscripts and Star Wars, comparing books in the Penn collections to what we see of the manuscripts in the concept art. We’ve been posting snippets of our discussions on the Schoenberg Institute YouTube channel, and there’s a link at the top there if you want to check them out. So for most of the rest of this paper I’ll be walking through some of the possible comparisons between real manuscripts and the Star Wars manuscripts. I want to stress that we did this for fun, and not for science, and that we’re limited by the collections at Penn and by our own knowledge.

Consider yourself warned: The remainder of this presentation is essentially an educated fan, raving.

As far as Brandon and I have been able to determine, this is a previously unknown script in the Star Wars universe. When I saw it my mind immediately went to Ge’ez, shown here in an early 20th century book of hymns from Ethiopia. There’s something about the blockiness that is just slightly curved, and a few of the letter forms are slightly similar, although I don’t think that’s necessarily meaningful.

We also made a comparison with Coptic, which is thinner, more curved, and perhaps a closer match.

For the third example we looked not at the text, but at its layout on the page. We found a similarity with this 16th century collection of Persian poetry, both its illuminated header (similar in aspect to the illuminated blue line of text in the center of the ancient Jedi text) and the framing of the text.

Aside from text, it is clear that the concept art of pages supplied to us here represents astronomical texts. This is really not surprising, considering that in the Star Wars universe we have a galaxy that seems to have been very closely connected, between planets and cultures, for a very long time, and so it makes sense that even the most ancient texts would be concerned with objects in the system – stars and planets and moons – and how they relate to and interact with one another. And this is a major concern in medieval astronomical texts, too: these texts illustrate people trying to make sense of the system they live in, in the best way they know.

One of the pages in the Jedi texts is the symbol of the Galactic Republic, but placed on some kind of chart, with characters dispersed through the chart and text – perhaps labels – along the outside. We found a similarity with this chart in LJS 57, a 14th century astronomical anthology from Spain. I don’t know exactly what this chart represents but I can tell you that astronomical texts are full of similar charts; it was one of the ways that medieval people made sense of the data they had available to them.

Something similar is happening here, in LJS 449, a 15th century German medical and astronomical miscellany. These charts are perhaps a bit simpler than the Spanish chart, but they have that attractive blue coloring. Both the coloring and the arrangement of data around the circle reminded Brandon and me of the diagrams on this page of the Jedi texts.

The next three slides show diagrams from a mid-13th century copy of Johannes de Sacro Bosco’s Algorismus and Tractatus de sphaera, an immensely popular text that was copied and translated and commented upon from the time it was written in the early 13th century (it is possible that our copy was written during Sacrobosco’s lifetime) through the 16th century. It is full of diagrams illustrating the movement of the planets, and the sun, and the moon in relation to the earth. I personally find these diagrams most reminiscent of the two pages on the bottom left, although I feel like their organization suggests a sense of scale that is lacking in the medieval diagrams.

Medieval astronomers only had to think about the earth, and the moon, and the sun, and a few other planets. On the other hand, the Star Wars universe operates on a whole other level – a galaxy with countless star systems and planets that aren’t even charted. When I look at these diagrams I see a clever attempt to illustrate scale using the relatively primitive technology of ink and paper in place of the star charts and 3D maps that we see in the films.

On the other hand, there are some really simple 1:1 comparisons to be made, such as this diagram, which pretty clearly illustrates the phases of a moon.

I want to take a quick look at the bindings of these manuscripts, particularly this piece of concept art, which is quite similar to the prop that we see in the film.

This has a fairly standard binding structure, quite similar to LJS 102, the Ethiopic manuscript we looked at earlier, except for the front cover, which is built of three separate pieces that are obviously connected together. In western bindings, if a wooden cover were a composite of multiple pieces, we would expect that to be obscured, as in this late 13th century Catalonian manuscript (it’s hard to tell, which is the point, but this cover is made of three pieces of wood).

The only example of a cover like this I’ve seen is from the Walters Art Museum, this 14th century Ethiopian Gospel book. The cover was broken and then sewn back together, but this was the result of an accident, not done on purpose.

My colleague Alberto Campagnolo also suggested that it is similar to the Chinese practice of writing on bamboo strips and binding them together, as in this 18th century example. 

This is one instance where the artists who created these concepts have done an excellent job with suggesting a manuscript culture – in fact, several manuscript cultures, cultures that use what is available to them. There are two manuscripts here that appear to be bound in decorated tusks, one that has what appear to be shells embedded in a leather binding, and another that might be bound in hairy skin or – I like to think – had the binding grown on it underground. In any case these all suggest books written in different places, perhaps at different times, and as a manuscript scholar I find that fascinating.

Following up on this I wanted to see how the concept of the manuscripts was received by writers of fan fiction. As a fan author myself I have written a few stories featuring the ancient Jedi texts, but given my interests that made sense; I was curious to see what other authors have done with them. I think there’s more extensive work to be done here, but in reading through the 40 or so stories I was able to find (by searching AO3 for ancient jedi texts, and the “jedi text” tag) I discovered not surprisingly that the stories focused on the text of the books, not on their physical appearance (which is at least partially due to fan fiction being a written medium, vs. film being a visual medium) and that there are three main themes that can appear by themselves or be combined:

  • Rey can read the texts on her own, or she needs help (Kylo Ren, C-3PO, Obi-Wan Kenobi’s Force ghost)
  • The translation is used to further the story (whether or not it happens) 
  • The texts do something (e.g., magic spells)

What will happen next? Will there be manuscripts in The Rise of Skywalker, the final film in this last trilogy? Of course I hope so, and it seems likely. The Uneti tree was struck by lightning and burned, but Rey took the manuscripts with her (here is a screenshot of a drawer in the Millennium Falcon, at the very end of the film, showing the books clearly safe and tucked away), and in the Poe Dameron comic #27 we learn that Rey has been working with C-3PO to translate the texts.

And there’s also the spectre of Kylo Ren with a calligraphy set; if he had access to these manuscripts when he was studying with Luke Skywalker, it’s possible that he has read and perhaps even annotated some of the books. Only time will tell, and I for one can’t wait for December.

Thank you!

fan art based on the Lewis Psalter by

Is This Your Book? What we call digitized manuscripts and why it matters

This is a version of a paper I presented as a Rare Book School Lecture at the University of Pennsylvania in Philadelphia on June 12, 2018, originally entitled “Is this your book? What digitization does to manuscripts and what we can do about it.” 

Good afternoon and thank you for coming to my talk today. The title of my talk is “Is this your book? What digitization does to manuscripts and what we can do about it.” However I want to make a small change to my title. I’m not entirely sure if there’s anything we can do about what digitization does to manuscripts but I do think we can think about it, so that’s what I want to do a bit today. I want us to think about digitized books – specifically about digitized manuscripts, since that’s what I’m particularly interested in.

So, like any self-respecting book history scholar, I’m going to start our discussion of digitized manuscripts by talking about memes.


Definition of the word “meme” from the Oxford English Dictionary.

The word meme was coined in 1976 by Richard Dawkins in his book The Selfish Gene. In the Oxford English Dictionary, meme is defined as “a cultural element or behavioral trait whose transmission and consequent persistence in a population, although occurring by non-genetic means (especially imitation), is considered as analogous to the inheritance of a gene.” Dawkins was looking for a term to describe something that had existed for millennia – as long as humans have existed – and the examples he gave include tunes, ideas, catchphrases, clothes fashions, ways of making pots or building arches. These are all things that are picked up by a community, ideas and concepts that move among members of that community, are imitated and modified, and which are frequently moved on to new communities as well where the process of imitation and modification continues. More recently the term meme has been applied specifically to images or text shared, often with modification, on the Internet, particularly through social media: If you’ve ever been RickRolled, you have been on the receiving end of a particularly popular and virulent meme.

This is all very interesting, Dot (I hear you say), but what do memes have to do with digitized manuscripts? This is an excellent question. What I want to do now is look at a couple of specific examples of memes and think a bit in detail about how they work, and what it looks like to push the same idea through memes that are similar but that have slightly different connotations. Then I want to look at some different terms that scholars have used to refer to digitized manuscripts and think a bit about how those terms influence the way we think about digitized manuscripts (if they do). My proposition is that these terms, while they may not exactly be memes, function like memes in the way they are adapted and used within the library and medieval studies scholarly communities. So let’s see how this goes.

In the film Black Panther, which was released back in February of this year, there’s a scene where a character has come to the country of Wakanda to challenge the king for the throne. This character, N’Jadaka (also named Erik Stevens, but better known by his nickname Killmonger), is a cousin of the king, T’Challa, but was unknown to pretty much everyone in Wakanda until just before he arrives to make his challenge. At the climax of this scene, during which Killmonger and T’Challa fight hand-to-hand in six inches of water, Killmonger – who is clearly winning – turns to the small audience of Wakandans gathered to witness the battle and exclaims, “IS THIS YOUR KING?” If you haven’t seen the film I’m about to spoil it for you: it turns out the answer to that question is NO.

This is a phrase that was born to be a meme, and within a month that’s exactly what happened.

According to the Know Your Meme website the first instance of the “Is this your king” meme appeared on March 20 on Twitter when @TheyWant_Nolan tweeted a screen shot of the scene with the caption “is this your spring”. If you think back to March, the weather was pretty terrible everywhere around the country. It was long and tedious going back and forth between snow and heat then back to snow. Is this your Spring? NOPE.

This type of meme is a snowclone, defined as “a type of phrasal templates in which certain words may be replaced with another to produce new variations with altered meanings, similar to the “fill-in-the-blank” game of Mad Libs.” I would like to note here that this term, snowclone, was coined in 2004 by American linguists Geoffrey K. Pullum and Glen Whitman specifically to describe this phenomenon. The concept of a snowclone has been around for much longer than the term – think of “I’m not an X but I play one on TV” which was the most hilarious phrase when I was a kid – and the “Is this your king” meme works the same way, where we replace king with some other word to make a phrase that is understood to elicit a negative response.

Here are some other examples of this meme featured on its Know Your Meme page. These all supply the identity of the question asker, they vary widely by topic, and one of them makes a slight modification to the image, but they all imply a negative response to the question.

I made one myself. My meme features a screen shot of my favorite manuscript, UPenn LJS 101, as seen through the Penn in Hand manuscript interface. In my meme, the question asked is, is this your book? As we know from the context of the original meme, the answer to the question is no. This is not my book. Or: It’s not my real book.

I’ve made a few other memes and for some reason most of them play with the relationship that a digitized version of a manuscript has with the physical object.

Memes such as “Is this your king” and this next one, the “Is this a pigeon” meme, enable us to ask questions with assumed answers. In this meme, the original scene is from an anime where a human-like android sees a butterfly and asks, “Is this a pigeon?” This is another snowclone, where the question asker, the object of the question, and the question itself can be replaced with almost literally anything else. I find these snowclone memes work well for my needs, though I find the differences between the emotions that these two memes elicit fascinating.

As before, I’ve replaced the object of the question with digital images of LJS 101 and specifically identified myself as the question asker. As with the previous meme, we know the answer to the question posed is no, although the context is different: while the king meme is used to express aggressive negativity, the pigeon meme is used to express mild but total confusion. The same idea can be pushed through both memes – is this digital thing a manuscript? – and while the answer is the same – no it’s not – the negative response of the pigeon meme is “oh you silly thing, thinking the digitized manuscript is the same as the manuscript” while the negative response of the king meme is “that thing is NOT the same as the manuscript, I’m offended you think so, and I’m going to throw it off a cliff so you don’t try it again.”

Although both of these memes can be used as a kind of mirror for us to view the relationship between a manuscript and its digitized version, they expect different responses and elicit different emotions, much as different words used to refer to the same situation or person might invoke different emotions. The memes are, in effect, acting as a kind of terminology, so now I want to pivot and talk about how terminology might act as memes.


I would like to take it as a given that how we talk about things influences how we think about them; therefore, the terms we use to describe things matter. The terms we use to describe other people matter; the terms that we choose to refer to digitized manuscripts matter. I would also like to reiterate the proposition I made a few minutes ago that our terms, while perhaps not memes themselves, are meme-like. In his 2016 article “‘ut legi’: Sir John Mandeville’s Audience and Three Late Medieval English Travelers to Italy and Jerusalem,” Anthony Bale discusses Jerusalem as a meme in medieval English travel writings, but I find that his description of meme fits well with what I would like to do here. He says, “the meme proposes a model of cultural transmission based on audiences’ ongoing use and appropriation of the source, as opposed to the scholarly desire to return to the source as the “best” or “original” iteration.” (For a term, this would mean common usage points not to the original meaning of the word, but to the word as it is being used. That’s a bit of a circular argument but I think it makes sense.) He continues, “Memes have not one stable author, no unitary point of origins, and are not retrospective, but rather change with their audiences, causing people to do things; stimulating actions and changing behaviors; leading people to take a particular route, see a particular site, notice one thing but not another, find new meanings in an old source.” (Bale, p. 210)

Following this theory, terms work like this:

  1. A term begins with a specific meaning (e.g., outlined in the OED, citing earlier usage).
  2. A scholar adopts the term because we need some way to describe this new thing that we’ve created. So we appropriate this term, with its existing meaning, and we use it to describe our new thing.
  3. The new thing takes on the old meaning of the term.
  4. The term itself becomes imbued with meaning from what we are now using it to describe.
  5. The next time someone uses that term, it carries along with it the new meaning.

Some scholars take time to define their terms, but some scholars choose not to, instead depending on their audience to recognize the existing definitions and connotations of the terms they use. For example, in her 2013 article “Fleshing out the text: The transcendent manuscript in the digital age,” Elaine Treharne (coming out of a description of how medieval people would have always interacted with a physical book) says: “for the greater proportion of a modern audience on any given day, one has necessarily to rely on the digital replication: the world of the ironically disembodied and defleshed simulacrum, avatar, surrogate.” (Treharne, p. 470) [emphasis mine] Here Treharne uses the terms simulacrum, avatar, and surrogate without defining them, and she groups them together, in that order, placing simulacrum first in that list. More than the other two, simulacrum has a negative connotation – as we can see from its entry in the OED, a simulacrum is a “mere image”; it looks like a thing without possessing its substance or proper qualities; it is a “specious imitation”. Although it is nearly identical in meaning to the term facsimile, which I’ll discuss in a moment, and comes from related Latin roots, facsimile lacks the negative connotations that simulacrum has. Although the terms are undefined by the author, it seems that this was a purposeful word choice intended to elicit a negative response.

Compare this with Bill Endres, who in his 2012 article “More than Meets the Eye: Going 3D with an Early Medieval Manuscript” spends several paragraphs defining his terms and arguing for why he chooses to use some terms and not others. Endres says, “I will refer to 3D and 2D images as digital artifacts or digital versions, although not totally satisfied with either term as it relates to epistemology. I am tempted to refer to them as digital offspring, the results of a marriage between digital and manuscript technologies, with digital versions having unique qualities and a life of their own. This term is problematic but it speaks to the excesses, commonalities, and deficits when digital versions are measured against their physical antecedent.” (Endres, p. 4) Endres then discusses some other terms, including two of the ones I will consider in a moment, so we’ll return to his thoughts later. The point here is that Endres defines his terms and explains why he is using them, while Treharne relies on us to understand her meaning through the known definition of her terms.


For each term I will discuss pre-digital definitions of the term, using the Oxford English Dictionary as the source.[1] I’ll also include a few quotes where scholars refer to digitized manuscripts using that term, although these quotes are meant to be representative and not exhaustive (that is, I couldn’t tell you the first time that the term was used by someone to refer to a digitized manuscript, but I can give you an impression of how the term has been used or is being used currently).

Let’s begin with the term facsimile.


It is from the Latin meaning literally make similar. The earliest attestation of the term is from 1661, and refers to a transcribed copy of a text, and not necessarily something that looks just like the text it is being copied from. About 30 years later, facsimile is being used to mean an exact copy or likeness; an exact counterpart or representation, and the citations refer to written texts or drawings. The term continues to be used according to this definition into the later 19th century, by the time photography of books and manuscripts has become well-represented in the scholarly landscape. (David McKitterick, Old Books, New Technologies, pp. 117-118)

By the late 19th century, facsimile has been adapted to refer to the communication of images through radio, wire, or similar methods – the modern day “fax” machine, for example. This meaning maintains the previous definitions focusing on a facsimile as some kind of copy, but adds the meaning of communicating over distance, and I expect these combined uses of the term – print facsimiles plus the sharing of images over distance – are why digital facsimile became an obvious term to use to describe these new representations of old objects.

The use of facsimile to refer to textual materials clearly varies over time and from individual to individual. In his 1926 article ‘Facsimile’ Reprints of Old Books, A. W. Pollard seems to use the term according to its 1661 attestation, not according to its 1691 attestation. He says “It is intended to cover any reprint the form of which has been influenced to any considerable extent by the form of the edition reproduced.” (Pollard, p. 305) Pollard’s ‘Facsimile’ reprints include “1) Photographic facsimiles, 2) Type-facsimiles, i.e. editions in which types of similar founts to those used in the original are set to follow the original setting as closely as possible; 3) more or less luxurious reprints which seek to reproduce the general effect of the original with such concessions to modern usage as the producer may think desirable.” (Pollard, p. 306)

Facsimile or digital facsimile has been, for as long as I can remember, the default term that libraries use to refer to their own digital copies, and that scholars use to refer to the digital images they incorporate into their online projects. In November 1993, Kevin Kiernan gave a presentation at a symposium of the Association of Research Libraries [Kiernan, “Digital Preservation, Restoration, and Dissemination of Medieval Manuscripts”] in which he says that the Electronic Beowulf “will in its first manifestation make available in early 1994 a full-color electronic facsimile of Cotton Vitellius A. xv to readers in the British Library and at other selected sites.” He continues, “As this electronic archive grows, it will incorporate facsimiles of many other documents that help us restore parts of the manuscript that were lost or damaged by fire in the early eighteenth century.” Kiernan is referring not only to straightforward digital images, but also to images taken under ultraviolet light that were included in the edition. As he says later in the presentation, because of the UV images “Readers of the electronic facsimile will thus acquire a reproduction of the manuscript that reveals more than the manuscript itself does under ordinary circumstances.”

The use of the term facsimile makes it possible for scholars to consider how digital facsimiles relate to older ways of making similar. In “The Ghost in the Machine: Digital Avatars and Medieval Manuscripts”, Sian Echard discusses the restoration of manuscripts by Matthew Parker and his circle, which she interprets as a kind of facsimile. Dr. Echard says “Today, digital technologies continue to recreate medieval books for a variety of audiences, and the digital facsimiles, like the hand and machine produced examples … both reproduce and relocate their medieval objects. But our current attitudes toward facsimile differ from Parker’s and Dibdin’s, and may in fact inhibit our ability to see the extent to which we too are recreating medieval text objects according to our own tastes. As technology has enabled ever more exact reproduction, the cheerful refashioning proposed by Parker has been replaced by an emphasis on the photographic, on the exact, with at times an accompanying confidence that perfect reproduction can approach the revelation of an object’s truth.” (Echard p. 201)


The term surrogate is interesting because, unlike facsimile – which is a fairly straightforward synonym for a copy – the term refers to something standing in for, or perhaps replacing, something else.

It was first used in the 16th century to describe the act of appointing someone as a delegate or a substitute. In the 17th century the term is adopted to be a noun – to refer to a person who is thus delegated. Other uses of the term, meaning more or less similar things, are attested through the 17th century, until in 1644 we have a general meaning: substitute.




Since the 1970s the term has been used in a more intimate way, to refer to sexual surrogates and surrogate mothers. As my colleague Bridget Whearty pointed out to me while we were discussing the word surrogate, the term is almost always used to describe bodies – either a person having power delegated to them, or a body acting as a substitute for another body. So the implication is that using this term to refer to digitized manuscripts doesn’t only mean the digital is standing in for the physical, but it also – by virtue of previous uses of the term – may imply some sort of embodiment or materiality of the digital object that is acting as the surrogate.

Paul Conway has an extensive discussion of the digital surrogate in his 2014 article “Digital transformations and the archival nature of surrogates”, and although he is referring to archival materials and not medieval manuscripts, I would expect that the use of the term comes from the same place, so I will quote him here. He reflects my own thoughts about a surrogate being more than a copy, saying “The creation of digital surrogates from archival sources is fundamentally a process of representation, far more interesting and complex than merely copying from one medium to another. Theories of representation – and the vast literature derived from them – are at the heart of many disciplines’ scholarship and of particular relevance for scholars who work primarily or exclusively in the digital domain.” (Conway pp. 2-3) He then continues to cite several other scholars – Mitchell, Scruton, Geoffrey Yeo, Matthew Kirschenbaum, Michael Taussig, and Johanna Drucker – who discuss the relationship that digital copies continue to have with their sources well after they have been created, even as they have their own materialities.

Bill Endres, who I quoted above, continues his thoughtfulness in the same piece as he considers surrogate as a term for his own use in describing 3D images of manuscripts. He says, “a term that has gained some commonality in 3D is digital surrogate. Bernard Fischer uses the term for 3D renderings of archaeological sites, like the impressive Rome Reborn. Fischer’s interest in 3D is to construct digital cityscapes and large spaces, thus his use of surrogate, the virtual environment functioning as a substitute or proxy, a stand in for the likes of a dig site or what once was, like ancient Rome, as a means to generate and test hypotheses, fulfilling a specific epistemic function. Surrogate fits Fischer’s needs but does not speak as readily to the full range of epistemic considerations that I want to explore for a manuscript, particularly the excesses of a digital artifact that add to our knowledge in other ways and its effect on looking and knowing.” (Endres, p. 4) The excesses that Endres is referring to here are things like special lighting and the affordances of 3D imaging, and he feels that the term surrogate isn’t sufficient to include these things, although Endres’s excesses are very similar to those things that Kiernan was thinking of in 1993 when he used the term electronic facsimile. However Kiernan did not use the term surrogate in 1993 – it would be interesting to see when the term surrogate was first used to refer to digital objects, and if it would have been available to Kiernan in 1993.


The third term, avatar, is relatively new to me, although Sian Echard used it in the chapter quoted above, and the term was also used by classicist Ségolène M. Tarte, in her 2011 presentation “Interpreting Ancient Documents: Of Avatars, Uncertainty and Knowledge Creation,” and is also mentioned by Endres and very recently by Michelle Warren, in a just-published article “Remix the Medieval Manuscript: Experiments with Digital Infrastructure.” This term is not yet common, but it may be gaining purchase because of its inherent complexity.

I really like avatar because of the connotations brought along with its original definition. According to Hindu mythology, an avatar is the incarnate, human manifestation of a deity. It is thus the avatar that is embodied, not the thing that the avatar represents. This can be contrasted with the term surrogate, which is also embodied, but the surrogate embodiment is in replacement of something else, while the embodiment of the avatar is the same thing, but in different form. And compare both of these again with facsimile, which again is a copy – these are three very different terms, and yet we have the desire to apply these terms to… if not the exact same things, then at least to the same kind of things.

The term avatar has also been used to mean more generally a manifestation, and I actually think that this is the usage of the term that is closest to its application to digitized manuscripts, although there is another recent usage that is relevant: avatar as a term to describe a character in a computer game or environment, a character that represents a person or a player within that virtual environment (think of Second Life, or, to use a more current example, Minecraft).

(There was also a popular movie by this name that came out in 2009, right around the same time Second Life was reaching peak popularity, and I can’t give short shrift to Avatar: The Last Airbender, an animated show that ran from 2005 to 2008.)

So what is an avatar when it comes to medieval manuscripts? Echard uses the term to refer both to physical objects and to digital ones, first describing the digital avatars of the Sherborne Missal included in the British Library exhibit celebrating its purchase. These include large-screen installations in the Library gallery, a CD-ROM available for purchase, an online version, and a 3D animation sequence that plays as an introduction to the CD-ROM. However, as Echard says, “The avatars for these rare objects have … been books themselves- manipulable, tangible, physical … the physicality of the book is part of its cultural role, whether as public object or private delight. The digital facsimiles I have discussed here all attempt in one way or another to offer these medieval and early modern books to the fulfilling of both roles, and yet I would argue that they are ultimately stymied by the requirement to disembody the objects they display. The resulting tension, between access and absence, creates the ghosts that haunt the digital realm.” (Echard, p. 214) I’ve always loved this description of the tension of digitized manuscripts, and I am tickled to notice only now that the term avatar is attached to it.

I know that I keep quoting Endres, but I find here that again his thoughtfulness in exploring the terminology is really refreshing and I wish more scholars did this kind of intellectual work. He says,  “I find Ségolène Tarte’s impulse to call digital versions avatars most consistent with my needs, the digital version as an incarnation, the physical artifact crossing over and into a digital form. Since I am working on a gospel book, I cannot help but to think about this issue’s echo in early Christian prohibitions against depictions of Christ in the flesh, the prohibition motivated by the belief that physical matter is mundane, not divine, and therefore a painting or statue could not portray Christ’s divine nature, thus could not portray Christ and was blasphemous. In a similar vein, without the blasphemy, a digital version cannot portray all of the features of a physical artifact, but as mentioned, it also includes excesses. I appreciate Tarte’s choice of the word avatars, its recognition that digital artifacts have excesses and exist in a different reality and with different rules and potentials, offering unique advantages and experiences, a recognition that I want to carry forward in my sense of digital artifact or version.” (Endres, p. 4)

Before I conclude, I would like to remark on our apparent desire as a community to apply meaning to digital versions of manuscripts by using existing terms, rather than by inventing new terms. After all, we coin new words all the time – just in this paper, I’ve mentioned snowclone and meme, so it would be understandable if we decided to make up a new term rather than reusing old ones. But as far as I know we haven’t, and if anyone has, it hasn’t caught on enough to be reused widely in the scholarly community. I expect this comes from a desire to describe a new thing in terms that are understandable, as well as to define the new thing according to what came before. After all, both snowclone and meme are terms for things that have existed long before there were words for them, while digital versions of manuscripts are new things that have a close relationship with things that existed before, so while we want to differentiate them we also want to be able to acknowledge their similarities, and one way to do that is through the terms we call them.

Although we use these three terms – facsimile, surrogate, and avatar – to refer to digitized manuscripts, it is clear that these terms don’t mean the same thing, and that by choosing a specific term to refer to digitized manuscripts we are drawing attention to particular aspects of them. If I call a digitized manuscript a facsimile, I draw attention to its status as a copy. If I call it a surrogate, I draw attention to its status as a stand-in for the physical object. And if I call it an avatar, I draw attention to its status as a representation of the physical object in a digital world. Not a copy, not a replacement, but another version of that thing. Like pushing an idea through different memes, pushing the concept of a digitized manuscript through different terms gives us flexibility in how we consider them and how we explain them, and our feelings about them, to our audiences. That we can so easily apply terms with vastly different meanings to the digital versions of manuscripts says something about the complexity of these objects and their digital counterparts.

Thank you.

Sincere thanks to Bridget Whearty, Keri Thomas, Johanna Green, and Anna Levine, for their help getting this paper ready for the public eye.

[1] In the paper presented at the Rare Book School (which was recorded; I will add a link here when it becomes available) I used the Historical Thesaurus of English as the source for the term definitions, but I found during further editing that the Thesaurus timelines weren’t doing what I needed them to. If I continue this work, I expect to bring the timelines back in again.

Zombie Manuscripts: Digital Facsimiles in the Uncanny Valley

This is a version of a paper presented at the International Congress on Medieval Studies, May 12, 2018, in session 482, Digital Skin II: ‘Franken-Manuscripts’ and ‘Zombie Books’: Digital Manuscript Interfaces and Sensory Engagement, sponsored by Information Studies (HATII), Univ. of Glasgow, and organized by Dr. Johanna Green.

The uncanny valley was described by Masahiro Mori in a 1970 article in the Japanese journal Energy, and it wasn’t translated into English completely until 2012.[1] In this article, Mori discusses how he envisions people responding to robots as they become more like humans. The article is a thought piece – that is, it’s not based on any data or study. In the article, which we’ll walk through closely over the course of this presentation, Mori posits a graph, with human likeness on the x axis and affinity on the y axis. Mori’s proposition is that, as robots become more human-like, we have greater affinity for them, until they reach a point at which the likeness becomes creepy, or uncanny, leading to a sudden dip into negative affinity – the uncanny valley.

Now, Mori defined the uncanny valley specifically in relation to robotics, but I think it’s an interesting thought exercise to see how we can plot various presentations of digitized medieval manuscripts along the affinity/likeness axes, and think about where the uncanny valley might fall.

In 2009 I presented a paper, “Reading, Hexateuch,” (unpublished but archived in the Indiana University institutional repository) in which I considered the uncanny valley in relation to digital manuscript editions. This consideration followed a long description of the “Turning the Pages Virtualbook” technology which was then being developed at the British Library, of which I was quite critical. At that time, I said:

In my mind, the models created by Turning the Pages™ fall at the nadir of the “uncanny valley of digital texts” – which has perhaps a plain text transcription at one end and the original manuscript at the other end, with print facsimiles and editions, and the various digital displays and visualizations presented earlier in this paper falling somewhere between the plain text and the lip above the chasm.

Which would plot out something like this on the graph. (Graph was not included in the original 2009 paper)

Dot’s 2009 Conception of the Uncanny Valley of Manuscripts

After nine years of thinking on this and learning more about how digital manuscripts are created and how they function, I’m no longer happy with this arrangement. Additionally, in 2009 I was working with imperfect knowledge of Mori’s proposition – the translation of the article I referred to then was an incomplete translation from 2005, and included a single, simplified graph in place of the two graphs from the original article – which we will look at later in this talk.

Manuscripts aren’t people, and digitized manuscripts aren’t robots, so before we start I want to be clear about what exactly I’m thinking about here. Out of Mori’s proposition I distill four points relevant to our manuscript discussion:

First, robots are physical objects that resemble humans more or less (this is the x-axis of the graph)

Second, as robots become more human-like, people have greater affinity for them (until they don’t – uncanny valley) – this is the y-axis of the graph

Third, the peak of the graph is a human, not the most human robot

Fourth, the graph refers to robots and to humans generally, not robots compared to a specific human.

Four parallel points can be drawn to manuscripts:

First, digitized manuscripts are data about manuscripts (digital images + structural metadata + additional data) that are presented on computers. Digitized manuscripts are pieces, and in visualizing the manuscript on a computer we are reconstructing them in various ways. (Given the theme of the session I want to point out that this description makes digitized manuscripts sound a lot more like Frankenstein’s creature than like a traditional zombie, and I’m distraught that I don’t have time to investigate this concept further today.) These presentations resemble the parent manuscript more or less (this is the x-axis)

Second, as presentations of digitized manuscripts become more manuscript-like, people have greater affinity for them (until they don’t – uncanny valley) – this is the y-axis

Third, the peak of the graph is the parent manuscript, not the most manuscript-like digital presentation

Fourth, the graph refers to a specific manuscript, not to manuscripts generally

I think that this is going to be the major difference in applying the concept of the uncanny valley to manuscripts vs. robots: while robots are general, not specific (i.e., they are designed and built to imitate humans and not specific people), the ideal (i.e., most manuscript-like) digital presentation of a manuscript would need to be specific, not general (i.e., it would need to be designed to look and act like the parent manuscript, not like any old manuscript).

Now let’s move on to Affinity

A Valley in One’s Sense of Affinity

Mori’s article is divided into four sections, the first being “A Valley in One’s Sense of Affinity”. In this section Mori describes what he means by affinity and how affinity is affected by sensory input. Figure one in this section is the graph we saw before, which starts with an Industrial Robot (little likeness, little affinity), then a Toy Robot (more likeness, more affinity), then drops to negative affinity at about 80-85% likeness, with Prosthetic Hand at negative affinity and Bunraku Puppet on the steep rise to positive affinity and up to Healthy Person.

For Mori, sensory input beyond the visual is important for an object’s placement on the x-axis. An object might look very human, but if it feels strange, that not only sends the affinity into the negative but also lessens the likeness. Mori’s original argument focuses on prosthetic hands, specifically realistic prosthetic hands, which cannot be distinguished at a glance from real ones. I’m afraid the language in his example is ableist so I don’t want to quote him, but his argument is essentially that when one touches a very realistic prosthetic hand and realizes it is not a real hand (as one had been led to believe), it becomes uncanny. Relating this feeling to the graph, Mori says, “In mathematical terms, this can be represented by a negative value. Therefore, in this case, the appearance of the prosthetic hand is quite humanlike, but the level of affinity is negative, thus placing the hand near the bottom of the valley in Figure 1.”

Luke Skywalker’s prosthetic hand in The Empire Strikes Back

The character Osono, from the play Hade Sugata Onna Maiginu (艶容女舞衣), in a performance by the Tonda Puppet Troupe of Nagahama, Shiga Prefecture. (CC:BY:SA)

Bunraku puppets, while not actually resembling humans physically as strongly as a very realistic prosthetic hand visually resembles a human hand, fall farther up the graph both in terms of likeness and in affinity. Mori makes it clear that likeness is not only, or even mostly, a visual thing. He says:

I don’t think that, on close inspection, a bunraku puppet appears similar to a human being. Its realism in terms of size, skin texture, and so on, does not even reach that of a realistic prosthetic hand. But when we enjoy a puppet show in the theater, we are seated at a certain distance from the stage. The puppet’s absolute size is ignored, and its total appearance, including hand and eye movements, is close to that of a human being. So, given our tendency as an audience to become absorbed in this form of art, we might feel a high level of affinity for the puppet.

So it’s not that bunraku puppets look like humans in great detail, but when we experience them within the context of the puppet show they have the effect of being very human-like, thus they are high on the human likeness scale.

For a book-related parallel I want to quote briefly a blog post, brought to my attention earlier this week, by Sean Gilmore. Sean is an undergraduate student at Colby College and this past semester took Dr. Megan Cook’s Book History course, for which he wrote this post, “Zombie Books; Digital Facsimiles for the Dotty Dimple Stories.” There’s nothing in this post to suggest that Sean is familiar with the uncanny valley, but I was tickled with his description of reading a digital facsimile of a printed book. Sean says:

In regards to reading experience, reading a digital facsimile could not be farther from the experience of reading from the Dotty Dimple box set. The digital facsimile does in truth feel like reading a “zombie book”. While every page is exactly the same as the original copy in the libraries of the University of Minnesota, it feels as though the book has lost its character. When I selected my pet book from Special Collection half of the appeal of the Dotty Stories was the small red box they came in, the gold spines beckoning, almost as if they were shouting out to be read. This facsimile, on the other hand, feels like a taxidermy house cat; it used to be a real thing, but now it feels hollow, and honestly a little weird.

Sean has found the uncanny valley without even knowing it exists.

The Effect of Movement

The second section of Mori’s article, and where I think it really gets interesting for thinking about digitized manuscripts, is The Effect of Movement. In the first section we were talking in generalities, but here we see what happens when we consider movement alongside general appearance. Manuscripts, after all, are complex physical objects, much as humans are complex physical objects. Manuscripts have multiple leaves, which are connected to each other across quires, which are then bound together and, often, connected to a binding. So moving a page doesn’t just move a page, much as bending your leg doesn’t just move your leg. Turning the leaf of a manuscript might tug on the conjoined leaf, push against the binding, tug on the leaves preceding and following – a single movement provoking a tiny chain reaction through the object, and one which, with practice, we are conditioned to recognize and expect.

Mori says:

Movement is fundamental to animals— including human beings—and thus to robots as well. Its presence changes the shape of the uncanny valley graph by amplifying the peaks and valleys (Figure 2). For illustration, when an industrial robot is switched off, it is just a greasy machine. But once the robot is programmed to move its gripper like a human hand, we start to feel a certain level of affinity for it.

And here, finally, we find our zombie, at the nadir of the “Moving” line of the uncanny valley. The lowest point of the “Still” line is the Corpse, and you can see the arrow Mori has drawn from “Healthy Person” at the pinnacle of the graph down to “Corpse” at the bottom. As Mori says, “We might be glad that this arrow leads down into the still valley of the corpse and not the valley animated by the living dead.” A zombie is thus, in this proposition, an animated corpse. So what is a “dead” manuscript? What is the corpse? And what is the zombie? (I don’t actually have answers, but I think Johanna might be addressing these or similar questions in her talk)

Reservoir Dogs (not zombies)

The Walking Dead (shuffling zombies)

28 Days Later (manic zombies)

I expect most of us here have seen zombie movies, so, in the same way we’ve been conditioned to recognize how manuscripts move, we’ve been conditioned to understand when we’re looking at “normal” humans and when we’re looking at zombies. They move differently from normal humans. It’s part of the fun of watching a zombie film – when that person comes around the corner, we (along with the human characters in the film) are watching carefully. Are they shuffling or just limping? Are they running towards us or away from something else? It’s the movement that gives away a zombie, and it’s the movement that will give away a zombie manuscript.


I want to take a minute to look at a manuscript in action. This is a video of me turning the pages of Ms. Codex 1056, a Book of Hours from the University of Pennsylvania. This will give you an idea of what this manuscript is like (its size, what its pages look like, how it moves, how it sounds), although within Mori’s conception this video is more similar to a bunraku puppet than it is like the manuscript itself.

It’s a copy of the manuscript, showing just a few pages, and the video was taken in a specific time and space with a specific person. If you came to our reading room and paged through this manuscript, it would not look and act the same for you.

e-codices manuscript viewer

e-codices viewed through Mirador

Now let’s take a look at a few examples of different page-turning interfaces. The first is from e-codices, and is their regular, purpose-built viewer. When you select the next page, the opening is simply replaced with the next opening (after a few seconds for loading). The second is also e-codices, but is from the Mirador viewer, a IIIF viewer that is being adopted by institutions and that can also be used by individuals. Similar to the other viewer, when you select the next page the opening is replaced with the next opening (and you can also track through the pages using the image strip along the bottom of the window). The next example is a Bible from Swarthmore College near Philadelphia, presented in the Internet Archive BookReader. This one is designed to mimic a physical page turning, but it simply tilts and moves the image. This would be fine (maybe a bit weird) if the image were text-only, but as the image includes the edges of the text-block and you can see a bit of the binding, the effect here is very odd. Finally, my old friend Turning the Pages (a newer version than the one I complained about in my 2009 paper), which works very hard to mimic the movement of a page turning, but does so in a way that is unlike any manuscript I’ve ever seen.

Escape by Design

In the third section of his article, Mori proposes that designers focus their work in the area just before the uncanny valley, creating robots that have lower human likeness but maximum affinity (similar to how he discussed bunraku puppets in the section on affinity, although they are on the other side of the valley). He says:

In fact, I predict that it is possible to create a safe level of affinity by deliberately pursuing a nonhuman design. I ask designers to ponder this. To illustrate the principle, consider eyeglasses. Eyeglasses do not resemble real eyeballs, but one could say that their design has created a charming pair of new eyes. So we should follow the same principle in designing prosthetic hands. In doing so, instead of pitiful looking realistic hands, stylish ones would likely become fashionable.

Floral Porcelain Leg from the Alternative Limb Project

And here’s an example of a very stylish prosthetic leg from the Alternative Limb Project, which specializes in beautiful and decidedly not realistic prosthetic limbs (and realistic ones too). This is definitely a leg, and it’s definitely not her real leg.


In the world of manuscripts, there are a few approaches that would, I think, keep digitized manuscript presentations in that nice bump before the valley:


“Page turning” interfaces that don’t try too hard to look like they are actually turning pages (see the two e-codices examples above)

Alternative interfaces that are obviously not attempting to show the whole manuscript but still illustrate something important about them (for example, RTI, MSI, or 3D models of single pages). This example is an interactive 3D image of the miniature of St. Luke from Bill Endres’s Manuscripts of Lichfield Cathedral project.

Visualizations that illustrate physical aspects of the manuscript without trying to imitate them (for example, VisColl visualizations with collation diagrams and bifolia)


I think these would plot out something like this on the graph.

Dot’s 2018 Conception of the Uncanny Valley of Digitized Manuscripts

This is all I have to say about the uncanny valley and zombie books, but I’m looking forward to Johanna, Bridget and Angie’s contributions and to our discussion at the end. I also want to give a huge shout-out to Johanna and Bridget, to Johanna for conceiving of this session and inviting me to contribute, and both of them for being immensely supportive colleagues and friends as I worked through my thoughts about frankenbooks and zombie manuscripts, many of which, sadly, didn’t make it into the presentation, but which I look forward to investigating in future papers.

[1] M. Mori, “The uncanny valley,” Energy, vol. 7, no. 4, pp. 33–35, 1970 (in Japanese); M. Mori, K. F. MacDorman and N. Kageki, “The Uncanny Valley [From the Field],” in IEEE Robotics & Automation Magazine, vol. 19, no. 2, pp. 98-100, June 2012. (translated into English)

Data for Curators: OPenn and Bibliotheca Philadelphiensis as Use Cases

Following are my remarks from the Collections as Data National Forum 2 event held at the University of Nevada, Las Vegas, on May 7, 2018. Collections as Data is an Institute of Museum and Library Services supported effort that aims to foster a strategic approach to developing, describing, providing access to, and encouraging reuse of collections that support computationally-driven research and teaching in areas including but not limited to Digital Humanities, Public History, Digital History, data driven Journalism, Digital Social Science, and Digital Art History. The event was organized by Thomas Padilla, and I thank him for inviting me. It was a great event and I was honored to participate.

Today I’m going to be talking about curators as an audience for collections as data, using two projects from the University of Pennsylvania’s Kislak Center for Special Collections, Rare Books and Manuscripts as use cases. I am a curator in the Kislak Center, and most of my time I work on projects under the aegis of the Schoenberg Institute for Manuscript Studies, which is a unit under the Kislak Center. SIMS is a kind of research and development group (our director likes to refer to it as a think tank) that focuses on manuscript studies writ large, mostly but by no means only focused on medieval manuscripts from Europe, and that specializes in examining the relationship between manuscripts as physical objects and their digitized counterparts.

For this session, we’ve been asked to react to this assertion from the Collections as Data Santa Barbara Statement: Collections as data designed for everyone serve no one, and to discuss the audiences that our collections as data are built for.

I’ll start with OPenn, which launched in May 2015 as an open access collection of Penn’s digitized manuscript material. Penn started digitizing its manuscripts in the mid 1990s, but they had been virtually locked in a black box system. To create OPenn we cracked open the box, generated new derivative images from the master TIFF files, generated TEI/XML manuscript description files using the data from our catalog and supporting databases, and put it all on a fully public file server. The collection navigation is provided by HTML pages – one that lists all the repositories, pages listing the manuscripts in each repository, and finally HTML pages for each manuscript presenting the catalog data and links to the image files. At the time OPenn launched, there was no search facility, although one has recently been added.

OPenn’s developer, Doug Emery, describes the access that OPenn provides as friction-free access, referring both to the licensing (the image files are in the public domain, the metadata is licensed cc:by) and to the technical access. There’s no login and no API. You can navigate to the site in a browser and download images, or you can point wget at the server and bulk download entire manuscripts.
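To make the bulk download a little more concrete, here is a minimal sketch that builds a wget command for mirroring a single manuscript directory. The URL is a placeholder rather than a real OPenn path, and the flags are my assumption of what a polite recursive download might look like, not a documented OPenn recipe.

```python
import shlex


def build_wget_command(manuscript_url, wait_seconds=1):
    """Return a wget argument list that mirrors one manuscript directory."""
    if not manuscript_url.endswith("/"):
        # With a trailing slash, --no-parent treats the URL as a directory.
        manuscript_url += "/"
    return [
        "wget",
        "--recursive",                # follow links below the starting URL
        "--no-parent",                # never climb above the manuscript directory
        "--no-host-directories",      # do not create a folder named after the host
        "--reject", "index.html*",    # skip generated directory listings
        "--wait", str(wait_seconds),  # pause between requests
        manuscript_url,
    ]


# Hypothetical URL, for illustration only.
command = build_wget_command("https://example.org/Data/0001/ms_example")
print(shlex.join(command))
```

The printed command can be pasted into a terminal, or the list can be handed to subprocess.run. Either way there is no key, token, or login step, which is the point of friction-free access.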

When we were designing OPenn, we weren’t thinking that much about the audience, honestly. We were thinking about pushing the envelope with fully available, openly licensed, high resolution, robustly described and well-organized digitized medieval manuscripts. We did imagine who might use our collections, and how, and you can read the statement from our readme here on the screen.

But I can’t say that we built the system to serve any audience in particular. We did build the system in a way that we thought would be generally useful and usable. But it became clear after OPenn launched that our lack of an audience made it difficult for us to “sell” OPenn to any group of people. Medievalists, faculty and students, who might want to use the material, were put off by the relatively high technical learning curve, the simple interface (lacking the expected page-turning view) and by the lack of search (we do have a Google Search now, but it was only added to the site in the past month). Data analysts who might want to visualize the collection-wide data were put off by the formatting of each manuscript having its own TEI file. Indeed, data designed for everyone does seem to serve no one.

But wait! Don’t lose hope! An accidental audience did present itself. In the months and into the first year after OPenn launched, it was slowly used as a source for projects. The Philadelphia Area Consortium of Special Collections Libraries, PACSCL, undertook a collaborative project whereby each member institution digitized five diaries from their collections, which were put on OPenn, the PACSCL Diaries Project.

When the project went live, the folks at PACSCL wanted a user-friendly way to make the diaries available, so I generated page-turning interfaces using the Internet Archive BookReader that pulled in metadata from the TEI files and pointed to the image files served on OPenn.

At some point I decided that I wanted to get a better sense of one of our manuscript collections, the Lawrence J. Schoenberg Collection, so again I wrote a script to generate a CSV file pulling from all the collection’s TEI files. Jessie Dummer, the Kislak Center’s Digitization Project Coordinator, cleaned up the data in the CSV, and we were able to load the CSV into Palladio for visualization and analysis (on github)
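A script of this kind can be quite short. The following is a minimal sketch, not the script I actually ran: it pulls a handful of fields out of TEI manuscript descriptions and writes them as CSV. The TEI namespace is the standard one, but the choice of fields, the element paths, and the sample record are assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Column name -> element path. These paths are illustrative assumptions,
# not a statement of how any particular TEI file is laid out.
FIELDS = {
    "call_number": ".//tei:msDesc/tei:msIdentifier/tei:idno",
    "title": ".//tei:msDesc//tei:msItem/tei:title",
    "orig_date": ".//tei:msDesc//tei:origDate",
    "orig_place": ".//tei:msDesc//tei:origPlace",
}


def tei_to_row(tei_xml):
    """Extract one CSV row (as a dict) from a TEI document given as a string."""
    root = ET.fromstring(tei_xml)
    row = {}
    for column, path in FIELDS.items():
        element = root.find(path, TEI_NS)
        text = "".join(element.itertext()) if element is not None else ""
        row[column] = " ".join(text.split())  # collapse internal whitespace
    return row


def rows_to_csv(rows):
    """Serialize a list of row dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(FIELDS))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample record, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><sourceDesc><msDesc>
    <msIdentifier><idno>MS Example 1</idno></msIdentifier>
    <msContents><msItem><title>Book of Hours</title></msItem></msContents>
    <history><origin>
      <origDate>ca. 1475</origDate><origPlace>France</origPlace>
    </origin></history>
  </msDesc></sourceDesc></fileDesc></teiHeader>
</TEI>"""

print(rows_to_csv([tei_to_row(SAMPLE)]))
```

Running this over every TEI file in a collection gives one row per manuscript, which is the shape that a tool like Palladio expects. The cleanup step still matters, since free-text fields such as dates and places rarely come out of the catalog in a consistent form.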

I combined the links to images on OPenn with data gathered through another SIMS project, VisColl (which I’ll describe in a bit more detail later) to generate a visualization of the gathering structure of manuscripts with the bifolia, or sheets, laid alongside. And last but not least, I experimented with setting up a IIIF image server that could serve the images from OPenn as IIIF-compatible images (this is a screenshot of the github site where I published IIIF manifests I generated as part of that project, but they don’t work because the server no longer exists).

The accidental audience? It was me.

I don’t remember thinking about or discussing with the rest of the team as we planned for OPenn how I might use it as part of my regular work. I was familiar with the concept of an open collection of metadata and image files online; OPenn was based on The Digital Walters, which both the Director of the Kislak Center Will Noel and Doug Emery had built when they were employed at the Walters Art Museum in Baltimore, and I had been playing with that data for a year before I was even hired at Penn. I must have known that I would use it, I just didn’t realize how much I would use it, or how having it available to me would change the way I thought about my work, and the way I worked with the collections. The things that made it difficult for other people to use OPenn – the lack of a search facility, the dependence on XML – didn’t affect me negatively. I already knew the collection, so a search wasn’t necessary; at the time OPenn launched I had been working with XML technologies for 10 years or so, so I was very comfortable with it.

Having OPenn as a source for data gives me so much in my curatorial role. I have the flexibility to build the interfaces I want using tools I can understand, along with easy access and familiar formats.

At the very end of 2015, several months after OPenn was launched, we, along with PACSCL, Lehigh University, and the Free Library of Philadelphia, were awarded a grant from the Council on Library and Information Resources under the “Digitizing Hidden Collections” program to digitize western Medieval manuscripts in 15 Philadelphia area libraries. We call the project Bibliotheca Philadelphiensis, the “library of Philadelphia”, or BiblioPhilly for short. Drawing on my experience working with data on OPenn, during the six-month lead up to cataloging and digitization I was able to build the requirements for the BiblioPhilly metadata in a way that would guarantee that the resulting data would be useful to me and to the curators and librarians at the other institutions. Some of the things we implemented include a closed list of keywords (based on the keyword list developed for the Digital Walters), in contrast with the Library of Congress subject headings in OPenn, and four different date fields (date range start, date range end, single date, and narrative date) with strict instructions for each (except for narrative date) to ensure that the dates will be computer readable.
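To show what “computer readable” can mean in practice, here is a minimal sketch of a check over the four date fields. The field names and the rules (the three strict fields must be plain years, and a range must not run backwards) are my assumptions for illustration; they are not the actual BiblioPhilly cataloging instructions.

```python
# Hypothetical field names; the narrative date is free text and is not checked.
YEAR_FIELDS = ("date_range_start", "date_range_end", "date_single")


def check_dates(record):
    """Return a list of problems found in one record's date fields."""
    problems = []
    years = {}
    for field in YEAR_FIELDS:
        value = str(record.get(field, "")).strip()
        if not value:
            continue  # an empty strict field is allowed in this sketch
        try:
            years[field] = int(value)
        except ValueError:
            problems.append(f"{field} is not a plain year: {value!r}")
    start = years.get("date_range_start")
    end = years.get("date_range_end")
    if start is not None and end is not None and start > end:
        problems.append("date_range_start is later than date_range_end")
    return problems


good = {
    "date_range_start": "1450",
    "date_range_end": "1475",
    "date_narrative": "third quarter of the 15th century",
}
bad = {"date_single": "ca. 1475"}

print(check_dates(good))
print(check_dates(bad))
```

The split is the useful part of the design: the narrative date can carry all the hedging a cataloger wants, while the strict fields stay sortable and plottable.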

We have also integrated data from VisColl into BiblioPhilly, both into the data itself, and in combination with the data in the interfaces. VisColl, as I mentioned before, is a system to model and visualize the quire structure of manuscripts. (A manuscript’s quire structure is called its collation, hence the name VisColl – visualizing collation) VisColl models are XML files that describe each leaf in a manuscript and how those leaves relate to each other (if they are in the same quire, or if they are conjoined, if a leaf is missing or has been added, etc.). From a model we’re able to generate a concise description of a manuscript’s construction, in a format referred to as a collation formula, and this formula is included in the manuscript’s cataloging and becomes part of the TEI manuscript description. However, we’re also able to combine the information from the collation model with the links to the image files on OPenn to generate views of a collation diagram alongside the sheets that make up the quires.
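The step from a model to a formula can be sketched with a deliberately simplified stand-in. The code below is not the VisColl data model or its actual output: it reduces a manuscript to a list of leaf counts, one per quire, and collapses runs of same-sized quires into a compact string. A real model also records conjoins and missing or added leaves, and the formula notation used here is an assumption.

```python
def collation_formula(leaf_counts):
    """Build a simple formula; leaf_counts[i] is the leaf count of quire i + 1."""
    groups = []  # each entry is [first_quire, last_quire, leaves]
    for number, leaves in enumerate(leaf_counts, start=1):
        if groups and groups[-1][2] == leaves:
            groups[-1][1] = number  # extend the current run of equal quires
        else:
            groups.append([number, number, leaves])
    parts = []
    for first, last, leaves in groups:
        quires = str(first) if first == last else f"{first}-{last}"
        parts.append(f"{quires}({leaves})")
    return ", ".join(parts)


# Three quires of eight leaves, one of six, then one more of eight.
print(collation_formula([8, 8, 8, 6, 8]))  # prints: 1-3(8), 4(6), 5(8)
```

Even in this reduced form the formula does its job: the one irregular quire stands out immediately, which is where a cataloger would want to look more closely.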

For BiblioPhilly, because of the experimentation we did with Penn manuscripts on OPenn, we’ve been able to make the digitized BiblioPhilly manuscripts available online in ways that are more user-friendly to non-technical users than OPenn is, even before we have an “official” project interface. We did this by building an In Progress Viewer relatively early on. The aim of the In Progress viewer was 1) to provide technically simple, user-friendly ways to search, browse, and view the manuscripts, and 2) to make available information both about the manuscripts that were online, and about the manuscripts that had yet to go online (including the date they were photographed, so users can track manuscripts of particular interest through the process).

The first In Progress Viewer was built in the Library of Congress’s Viewshare, which provided federated browsing for all the fields in our records, along with a timeline and simple mapping facility. Unfortunately the Library of Congress is no longer supporting Viewshare, and when it went offline on March 20 we moved to an Omeka platform, which is more attractive but lacks the federated searching that made Viewshare so compelling. From Omeka (and Viewshare before it) we link to the manuscript data on OPenn, to Internet Archive BookReader page-turners, and to VisColl collation views. Both the BookReaders and VisColl views are generated locally from scripts and hosted on a Digital Ocean droplet. This is a temporary system, and is not built to last beyond the end of the project. It will be replaced by an official, longer-lived interface.

We’re also able to leverage the OPenn design of BiblioPhilly and VisColl for this “official” interface, which is currently under development with Byte Studios of Milwaukee, Wisconsin. While our In Progress Viewer has both page-turning facility and collation views, those elements are separate and are not designed to interact. The interface that we are designing with Byte Studios incorporates the collation data with the page-turning and will allow a user to switch seamlessly between page openings and full sheets.

It’s exciting that we’ve been able to leverage what was essentially an audience-less platform into something that can so well serve its curator, but there is a question that this approach pushes wide open: What does it mean to be a curator? With a background in digital humanities focused on the development of editions of medieval manuscripts I was basically the perfect curator for OPenn. But that was a happy accident. Most special collections curators don’t have my background or my technical training, so access to something like OPenn wouldn’t help them, and I’m very hesitant to suggest that every curator be trained in programming. I do think that every special collections department should have some in-house digital expertise, and maybe that’s the direction to go. Anyway, I’m very happy being in my current situation and I only wish we’d considered the curator as an audience for OPenn earlier in the process.

Ceci n’est pas un manuscrit: Summary of Mellon Seminar, February 19th 2018

This post is a summary of a Mellon Seminar I presented at the Price Lab for Digital Humanities at the University of Pennsylvania on February 19th, 2018. I will be presenting an expanded version of this talk at the Rare Book School in Philadelphia, PA, on June 12th, 2018.

In my talk for the Mellon Seminar I presented on three of my current projects, talked about what we gain and lose through digitization, and made a valiant attempt to relate my talk to the theme of the seminars for this semester, which is music and sound. (The page for the Mellon Seminars is here, although it only shows upcoming seminars.) I’m not sure how well that went, but I tried!

I started my talk by pointing out that medieval manuscripts are physical objects – sometimes very large objects! They have weight and size and heft, and unlike static objects like sculptures, manuscripts move. They need to move in order for us to read them. But digitized manuscripts – the ones you find for example in Penn in Hand, the page-turning interface for Penn’s digitized manuscript collection – don’t really move. Sure, we have an interface that gives the impression of turning the pages of the book, but those images are flat, static files that are just the latest version in a long history of facsimile copies of manuscripts. A page-turning interface for medieval manuscripts is the equivalent of taking a book, cutting the pages out, and then pasting those pages into a photo album. You can read the pages but you lose the sense of the book as a physical object.

It sounds like I’m complaining, but I’m really not. I like that digital photographs of manuscripts are readily available and relatively standard, but I do think it’s vitally important that people using them are aware of how they’re different from the “real” manuscript. So in my talk I spent some time deconstructing a screenshot from a manuscript in Penn in Hand (see above). It presents itself as a manuscript opening (that is, two facing pages), but it should be immediately apparent that this is a fake. This isn’t the opening in the book, it’s two photos placed side-by-side to give the impression of the opening of the book. There is a dark line down the center of the window which clearly delineates the photo on the left and the one on the right. You can see two gutters – the book only has one, of course, but each photo includes it – and you can also see a bit of the text on the facing page in each photo. From the way the text is angled you can tell that this book was not laid flat when it was photographed – it was held at or near a 90 degree angle (and here’s another lie – the impression that the page-turning interface gives us is that of a book laid flat. Very few manuscripts lie flat. So many lies!).

We can see in the left-hand photo the line of the edge of the glass, to the right of the gutter and just to the left of the black line. In our digitization lab we use a table with a spring-loaded top and a glass plate that lies down on the page to hold it flat. (You can see a two-part demo of the table on Facebook, Part One and Part Two.) This means the photographer will always know where to focus the camera (that is, at the level of the glass plate), and as each page of the book is turned the pages are the same distance from the camera (hence the spring under the table top). I think it’s also important to know that when you’re looking at an opening in a digital manuscript, the two photos in that composite view were not taken one after the other; they were possibly taken hours apart. In SCETI, the digitization lab in the Penn Libraries, all the rectos (that is, the front of the page) are taken at one time, and then the versos (the back of the page) are taken, and then the system interleaves them. (For an excellent description of digital photography of books and issues around it please see Dr. Sarah Werner’s Pforzheimer Lecture at the Harry Ransom Center on Early Digital Facsimiles.)
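The interleaving step is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not the lab's actual software, and the file names are invented for the example.

```python
from itertools import chain


def interleave(rectos, versos):
    """Return images in reading order: 1r, 1v, 2r, 2v, ..."""
    if len(rectos) != len(versos):
        raise ValueError("expected one verso for every recto")
    return list(chain.from_iterable(zip(rectos, versos)))


def openings(sequence):
    """Pair each verso with the recto that faces it: (1v, 2r), (2v, 3r), ..."""
    return list(zip(sequence[1::2], sequence[2::2]))


# Hypothetical file names: all rectos shot in one pass, all versos in another.
rectos = ["0001r.tif", "0002r.tif", "0003r.tif"]
versos = ["0001v.tif", "0002v.tif", "0003v.tif"]

sequence = interleave(rectos, versos)
print(sequence)
print(openings(sequence))
```

Each pair returned by `openings` combines a photo from the verso pass with a photo from the recto pass, which is why the two halves of an "opening" on screen may have been taken hours apart.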

I moved from talking about how digital images served through page-turning interfaces provide one kind of mediated (~fake~) view of manuscripts to one of my ongoing projects that provides another kind of mediated (also fake?) view of manuscripts: video. I could talk and write for a long time about manuscript videos, and I am trying to summarize my talk and not present it in full, so I’ll just say that one advantage that videos have over digitized images is that they do give an impression of the “real” manuscript: the size of them, the way they move (Is it stiff? How far can it open? Is the binding loose or tight?), and – relevant to the Seminar theme! – how they sound. I didn’t really think about it when I started making the videos four years ago, but if you listen carefully in any of the videos you can hear the pages (and in some cases the bindings), and if you listen to several of them you can really tell the difference between how different types of parchment and paper sound. Our complete YouTube playlist of video orientations is here, but I’ll embed one of my favorites here. This is LJS 280, a 13th century copy of Decretales Gregorii IX in a 15th century chain binding that makes a lot of noise.

I don’t want to imply that videos are better than digital images – they just tell us something that digital images can’t. And digital images are useful in ways that videos aren’t. For one thing, if you’re watching a video you can see the way the book moves, but I’m the one moving it. It’s still a mediated experience, it’s just mediated in a different way. You can see how it moved at a specific time, in a specific situation, with a specific person. If you want to see folio 45v, you’re out of luck, because I didn’t turn to that page (and even if I had, the video resolution might not be high enough for you to read it; the video isn’t for reading – that’s why we have the digital images).

There’s another problem with videos.

In four years of the video orientation program, we have 74 videos online. We could have more if we made it a higher priority (and arguably we should), but each one takes time: for research, to set up and take down equipment, for the recording (sometimes multiple takes), and then for the processing. The videos are also part of the official record of the manuscript (we load them into the library’s institutional repository and link them to records in the library’s catalog) and doing that means additional work.

At this point I left videos behind and went back to digital images, but a specific project: Bibliotheca Philadelphiensis, which we call BiblioPhilly. BiblioPhilly is a major collaborative project to digitize medieval manuscripts from institutions across Philadelphia, organized by the Philadelphia Area Consortium of Special Collections Libraries (PACSCL) and funded by the Council on Library and Information Resources (CLIR). We’re just entering year three of a three-year grant, and when we’re done we’ll have 476 manuscripts online (we have around 130 online now). If you’re interested in checking out the manuscripts that are online, and in seeing what’s coming, you can visit our search and browse site here.

The relevance of BiblioPhilly in my talk is that we’re being experimental with the kind of data we’re creating in the cataloging work, and with how we use that data to provide new and different manuscript views.

Manuscript catalogers traditionally examine and describe the physical structure of the codex. Codex manuscripts start as sheets of parchment or paper, which are stacked and folded to create booklets called quires. Quires are then gathered together and sewn together to make a text block, then that is bound to make the codex. So describing the physical structure means answering a few questions: How many quires? How many leaves in each quire? Are there leaves that are missing? Are there leaves that are singletons (i.e., were never part of a sheet)? When a cataloger has answered these questions they traditionally describe the structure using a collation formula. The formula will list the quires, number of leaves in a quire, and any variations. For example, a manuscript with 10 quires, all of which have eight leaves except for quire six which has four, and there are some missing leaves, might have a formula like this:

1-4(8), 5(8, -4,5), 6(4), 7-10(8)

(Quires 1 through 4 have eight leaves, quire 5 had eight leaves but four and five are now missing, quire 6 has four leaves, and quires 7-10 have eight leaves)

The formula is standardized for printed books, but not for manuscripts.
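To make the notation concrete, here is a minimal sketch in Python that produces the example formula above from a simple list of quires. The `Quire` class and `collation_formula` function are hypothetical, written only for this illustration, and the notation follows the one example given here (as noted, there is no single standard for manuscripts).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Quire:
    """Hypothetical, simplified quire record for this sketch."""
    number: int
    leaves: int                                       # leaves as originally constructed
    missing: List[int] = field(default_factory=list)  # positions of leaves now lost


def describe(q: Quire) -> str:
    """Describe one quire, e.g. '8' or '8, -4,5'."""
    if q.missing:
        return f"{q.leaves}, -" + ",".join(str(p) for p in q.missing)
    return str(q.leaves)


def collation_formula(quires: List[Quire]) -> str:
    """Collapse runs of consecutive, identically described quires."""
    groups = []  # each entry: [first_number, last_number, description]
    for q in quires:
        d = describe(q)
        if groups and groups[-1][2] == d and groups[-1][1] == q.number - 1:
            groups[-1][1] = q.number
        else:
            groups.append([q.number, q.number, d])
    parts = []
    for first, last, d in groups:
        label = str(first) if first == last else f"{first}-{last}"
        parts.append(f"{label}({d})")
    return ", ".join(parts)


# The ten-quire manuscript from the example above.
quires = [Quire(n, 8) for n in range(1, 5)]
quires.append(Quire(5, 8, missing=[4, 5]))
quires.append(Quire(6, 4))
quires += [Quire(n, 8) for n in range(7, 11)]

print(collation_formula(quires))  # 1-4(8), 5(8, -4,5), 6(4), 7-10(8)
```

Keeping the structure as data and generating the formula from it means the same record could be rendered in a different notation without re-examining the manuscript.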

Using tools developed through the research project VisColl, which is designing a data model and system for describing and visualizing the physical construction of manuscripts, we’re building models for the manuscripts as part of the BiblioPhilly cataloging process, and then using those models to generate the formulas that go into our records. This itself is good, but once we have models we can use them to visualize the manuscripts in other ways too. So if you go to the BiblioPhilly search and browse site and peek into the records, you’ll find that some of them include links to a “Collation View”.

Following that link will take you to a page where you can see diagrams showing each quire, and image files organized to show how the leaves are physically connected through the quire (that is, the sheets that were originally bound together to form the quire).
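The logic of arranging page images by sheet can be sketched briefly in Python. This assumes a regular quire, meaning one made only of folded sheets with no missing or added leaves, and the function names are invented for this example; it is not the VisColl code.

```python
def conjugate_pairs(leaf_count):
    """Pair conjoined leaves in a regular quire, outermost sheet first.

    In a regular quire of n leaves, leaf i is conjoined with leaf n + 1 - i.
    """
    if leaf_count % 2:
        raise ValueError("a regular quire has an even number of leaves")
    return [(i, leaf_count + 1 - i) for i in range(1, leaf_count // 2 + 1)]


def sheet_sides(first, second):
    """Return (outer side, inner side) of an unfolded sheet, each as (left, right).

    'first' is the leaf that comes earlier in the quire, 'second' the later one.
    """
    outer = (f"{second}v", f"{first}r")
    inner = (f"{first}v", f"{second}r")
    return outer, inner


for first, second in conjugate_pairs(8):
    outer, inner = sheet_sides(first, second)
    print(f"leaves {first}+{second}: outer {outer}, inner {inner}")
```

For a quire of eight leaves, the innermost sheet joins leaves 4 and 5, and its inner side (4v and 5r) is the only place where a sheet view and a page-turning opening show the same two pages together.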

Like the page-turning interface, this is giving us a false impression of what it would be like to deconstruct the manuscript and view it in a different way, but like the video it is also giving us a view of the manuscript that is based in some way on its physicality.

And this is where my talk ended. We had a really excellent question and answer session, which included a question about why I don’t wear gloves in the videos (my favorite question, which I answer here with a link to this blog post at the British Library) but also a lot of great discussion about why we digitize, and how, and why it matters, and how we can do it best.

Thanks so much to Glenda Goodman and Stewart Varner for inviting me, and to everyone who showed up.


Slides from OPenn Demo at the American Historical Association Meeting

This week I participated in a workshop organized by the Collections as Data project at the annual meeting of the American Historical Association in Washington, DC. The session was organized by Stewart Varner and Laurie Allen, who introduced the session, and the other participants were Clifford Anderson and Alex Galarza.

The stated aim of the session was “to spark conversations about using emerging digital approaches to study cultural heritage collections,” (I’ll copy the full workshop description at the end of this post) but all of our presentations ended up focusing on the labor involved in developing our projects. This was not planned, but it was good, and also interesting that all of us independently came to this conclusion.

Clifford’s presentation was about work being done by the Scholarly Communications team at Vanderbilt University Libraries as they convert data from legacy projects (which have tended to be purpose built, siloed, and bespoke) into more tractable, reusable open data, and Alex told us about the GAM Digital Archive Project, which is digitizing materials related to human rights violations in Guatemala. Both Clifford and Alex stressed the amount of time and effort it takes to do the work behind their projects. The audience was mainly history faculty and maybe a few graduate students, and I expect they, like me, wanted to make sure the audience understood that the issue of where data comes from is arguably more important than the existence of the data itself.

My own talk was about the University of Pennsylvania’s OPenn (Primary Digital Resources for Everyone), which if you know me you probably already know about. OPenn is the website in which the Kislak Center for Special Collections, Rare Books and Manuscripts publishes its digitized collections in the public domain, as well as hosting collections for many other institutions. This includes several libraries and archives around Philadelphia who are partners on the CLIR-funded Bibliotheca Philadelphiensis project (a collaboration with Lehigh University, the Free Library of Philadelphia, Penn, and the Philadelphia Area Consortium of Special Collections Libraries), which I always mention in talks these days (I’m a co-PI and much of the work of the project is being done at Penn). I also focused my talk on the labor of OPenn, mentioning the people involved and including slides on where the data in OPenn comes from, which I haven’t mentioned in a public talk before.

Ironically I ended up spending so much time talking about what OPenn is and how it works that I didn’t have time to show much of the data, or what you can do with it. But that ended up fitting the (unplanned) theme of the workshop, and the attendees seemed to appreciate it, so I consider it a success.

Here are my slides:

Workshop abstract (from this page):

The purpose of this workshop is to spark conversations about using emerging digital approaches to study cultural heritage collections. It will include a few demonstrations of history projects that make use of collection materials from galleries, libraries, archives, or museums (GLAM) in computational ways, or that address those materials as data. The group will also discuss a range of ways that historical collections can be transformed and creatively re-imagined as data. The workshop will include conversations about the ethical aspects of these kinds of transformations, as well as the potential avenues of exploration that are opened by historical materials treated as data. Part of an IMLS-funded National Digital Forum grant, this workshop will ultimately inform the development of recommendations that aim to support cultural heritage community efforts to make collections more readily amenable to computational use.

The Historiography of Medieval Manuscripts in England (and the USA)

The text of a lightning talk originally presented at The Futures of Medieval Historiography, a conference at the University of Pennsylvania organized by Jackie Burek and Emily Steiner. Keep in mind that this was very lightly researched; please be kind.

Rather than the originally proposed topic, the historiography of medieval manuscript descriptions, I will instead be talking about the historiography of medieval manuscripts specifically in England and the USA, as perceived through the lens of manuscript descriptions.

We’ll start in the late 12th into the 15th century, when monastic houses cataloged the books in their care using little more than a shelf-list. Such a list would be practical in nature: the community needs to be able to know what books they own, so as books are borrowed internally or loaned to other houses (or perhaps sold) they have a way to keep track of them. Entries on the list would be very simple: a brief statement of contents, and perhaps a note on the number of volumes. There is, of course, an entire field of study around reconstructing medieval libraries using these lists, and as the descriptions are quite simple it is not an easy task.

c. 1190-1200. Cambridge, Jesus College MS 34, fol. 1r. First catalogue of the library of Rievaulx. (Plate 3 from The Libraries of the Cistercians, etc. Vol. 3 in Corpus of British Medieval Library Catalogues, 1992)

Late 13th c. Oxford, Bodleian Library MS Rawlinson B. 336, page 187. Catalogue of the library of St Radegund’s abbey at Bradsole. (Plate 5 from The Libraries of the Cistercians, etc. Vol. 3 in Corpus of British Medieval Library Catalogues, 1992)

1400. London, BL MS Additional 70507, fol. 2r. Description of the library at Titchfield (Plate 6 from The Libraries of the Cistercians, etc. Vol. 3 in Corpus of British Medieval Library Catalogues, 1992)

In the 15th and 16th centuries there were two major historical events that I expect played a major role both in a change in the reception of manuscripts, and in the development of manuscript descriptions moving forward: those are the invention of the printing press in the mid-15th century, and the dissolution of the monasteries in the mid-16th century. The first made it possible to relatively easily print multiple copies of the same book, and also began the long process that rendered manuscripts obsolete. The second led to the transfer of monastic books from institutional into private hands, and the development of private collections with singular owners. When it came to describing their books, these collectors seemed to be interested in describing for themselves and other collectors, and not only for the practical purpose of keeping track of them. Here is a 1697 reprint of a catalog published in 1600 of Matthew Parker’s private collection (bequeathed to Corpus Christi College Cambridge in 1574). You can see that the descriptions themselves are not much different from those in the manuscript lists, but the technology for sharing the catalog – and thus the audience for the catalog – is different.

1600. Ecloga Oxonio–Cantabrigiensis, tributa in libros duos, quorum prior continet catalogum confusum librorum manuscriptorum in illustrissimis bibliothecis, duarum florentissimarum Academiarum, Oxoniae et Cantabrigiae (London, 1600; reprinted in 1697)

In the later 16th and into the 17th century these private manuscript collections began to be donated back to institutions (educational and governmental), leading to descriptions for yet other audiences and for a new purpose: for institutions to inform scholars of what they have available for their use. The next three examples, from three catalogs of the Cotton Collection (now at the British Library), reflect this movement. The first is from a catalogue published in 1696. The content description is perhaps a bit longer than the earlier examples, and barely visible in the margin is a bit of a physical description: this is a codex with 155 folios. Notably this is the first description we’ve looked at that mentions the size of the book at all, so we are moving beyond a focus only on content. This next example, from 1777, is notable because it completely forefronts the contents. This catalog as a whole is organized by theme, not by manuscript (you can see below the contents listed out for Cotton Nero A. i), so we might describe it as a catalog of the collection, rather than a catalog of the manuscripts comprising the collection.

1696. Catalogue of the manuscripts in the Cottonian Library, 1696 (facsimile 1984)

1777. A catalogue of the manuscripts in Cottonian library: to which are added many emendations and additions. With an appendix containing an account of the damage sustained by the fire in 1731; and also a catalogue of the charters preserved in the same library. British Museum Dept. of Manuscripts, 1777

The third example is from the 1802 catalog, and although it’s still in Latin we can see that there is more physical description as well as more detail about the contents and appearance of the manuscript. There is also a citation to a book in which the preface on the manuscript has been published – the manuscript description is beginning to look a bit scholarly.

1802. A catalogue of the manuscripts in the Cottonian library deposited in the British museum : printed by command of His Majesty King George III. &c. &c. &c. in pursuance of an address of the House of Commons of Great Britain. British Museum Dept. of Manuscripts, 1802

We’ll jump ahead 150 years, and we can see in that time that concern with manuscripts has spread out from the institution to include the realm of the scholar. This example is from N.R. Ker’s Catalogue of Manuscripts Containing Anglo-Saxon. Rather than focusing on the books in a particular collection, it is focused on a class of manuscripts, regardless of where they are physically located. The description is in the vernacular, and has more detail in every regard. The text is divided into sections as well: General description; codicological description; discussion of the hands; and provenance.

1957. N. R. Ker, Catalogue of Manuscripts Containing Anglo-Saxon. Oxford, 1957.

And now we arrive at today, and to the next major change to come to manuscript descriptions, again due to new technology. Libraries around the world, including here at Penn, are writing our manuscript descriptions using code instead of on paper, and publishing them online along with digital images of the manuscript pages, so people can not only read about our manuscripts, but also see images of them and use our data to create new things. We use the data ourselves: for example, in OPenn (Primary digital resources available to everyone!) we build websites from our manuscript descriptions to make them available to the widest possible audience.

I want to close by giving a shout-out to the Schoenberg Database of Manuscripts, directed by Lynn Ransom, which is pushing the definition of manuscript descriptions in new scholarly directions. In the SDBM, a manuscript is described temporally, through entries that describe where a book was at particular moments in time (either in published catalogs, or through personal observation). As scholarly needs continue to change, and technology makes new things possible, the description of manuscripts will likewise continue to change around these, even as they have already over the last 800 years.