The Uncanny Valley and the Ghost in the Machine: a discussion of analogies for thinking about digitized medieval manuscripts

This is a version of a paper I presented at the University of Kansas Digital Humanities Seminar, Co-Sponsored with the Hall Center for the Humanities on September 17, 2018.

Good afternoon, and thank you everyone for coming today. Thanks especially to the Digital Humanities Seminar and the Medieval and Early Modern Seminar for inviting me today, and to Peter and Elspeth for being such excellent and kind hosts.

What I’d like to do today is present an overview of some of my current work on how to think about manuscripts and digitized manuscripts, and to present some newer work, all of which I must warn you is still in a bit of an embryonic state. I’m hoping that we have time in Q&A to have a good discussion, and that I might be able to learn from you.

First, a bit about me so you know where we’re starting.

I am a librarian and curator at the Schoenberg Institute for Manuscript Studies (SIMS), which is a research and development group in the Kislak Center for Special Collections, Rare Books and Manuscripts at the University of Pennsylvania. I’ve been at Penn, at SIMS, since it was founded a little over five years ago. The work we do at SIMS is focused broadly on manuscripts and on digital manuscripts, particularly but not solely medieval manuscripts – we host databases, we work with the physical collections in the library, and we collaborate on a number of projects hosted at other institutions. Anything manuscript related, we’re interested in. My place in this group is primarily that of resident digital humanist and digital librarian – and let me tell you, these are two different roles.

At the moment, in my librarian role, I’m co-PI of a major project, funded by the Council on Library and Information Resources, to digitize and make available the western medieval manuscripts from 15 Philadelphia area institutions – about 475 manuscript codices in total. We call this project Bibliotheca Philadelphiensis, or the library of Philadelphia – BiblioPhilly for short. The work I’ve been doing so far on this project is largely related to project management: making sure the manuscripts are being photographed on time and that the cataloging is going well (and at the start we had to set up cataloging protocols and practices). The manuscripts are going online as they are digitized, in the same manner that all our manuscripts do:

They go on OPenn: Primary Digital Resources Available for Everyone, which is a website where we make available raw data as Free Cultural Works. For BiblioPhilly, this means images, including high-resolution master TIFFs, in the public domain, and metadata in the form of TEI Manuscript Descriptions under a CC0 license – that is, also dedicated to the public domain.

OPenn has a very specific purpose: it’s designed to make data available for reuse. It is not designed for searching and browsing. There is a Google search box, which helps, but it doesn’t offer the kind of robust keyword-based browsing you’d expect for a collection like this.

And the presentation of the data on the site is also simple – this is an HTML rendition of a TEI file, with the information presented very simply and image files linked at the bottom. There’s no page-turning facility or gallery or filmstrip-style presentation. And this is by design. We designed it this way, because we believe in separating out data from presentation. The data, once created, won’t change much. We’ll need to migrate to new hardware, and at some point we may need to convert the TEI to some other format. The technologies for presentation, on the other hand, are numerous and change frequently. So we made a conscious decision to keep our data raw, and create and use interfaces as they come along, and as we wish. Releasing the data as Free Cultural Works means that other people can create interfaces for our data as well, which we welcome.
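To give a concrete sense of the kind of reuse this enables, here is a minimal sketch of pulling a title and the image references out of an OPenn TEI Manuscript Description with Python. The file URL and the exact element paths are offered only as plausible examples of how such data tends to be laid out, not as documented OPenn endpoints, so check the site itself before relying on them.

```python
# A minimal sketch of reusing OPenn data. The TEI URL below is hypothetical;
# substitute the path of a real TEI Manuscript Description from OPenn.
import urllib.request
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

def manuscript_summary(tei_url):
    """Fetch a TEI manuscript description and pull out its title and image URLs."""
    with urllib.request.urlopen(tei_url) as response:
        root = ET.fromstring(response.read())
    # The manuscript title, from the TEI header.
    title = root.findtext(f".//{TEI_NS}titleStmt/{TEI_NS}title")
    # Image references: TEI facsimiles point at image files with <graphic url="...">.
    images = [g.get("url") for g in root.iter(f"{TEI_NS}graphic")]
    return title, images

# Hypothetical example call.
title, images = manuscript_summary(
    "https://openn.library.upenn.edu/Data/0001/ljs101/data/ljs101_TEI.xml")
print(title, len(images))
```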

So we are working on an interface for the Bibliotheca Philadelphiensis data right now. We’re partnering with the development company that worked with the Walters Art Museum on their site Ex Libris (manuscripts.thewalters.org), and the resulting site is the one most people will use to interact with the collection. It has the browsing facilities you’d expect.

Here we’re selecting a Book of Hours.


Let’s say we just want 14th century Books of Hours.

So here’s Lewis E 89.

We have a Contents and Decorations menu so we can browse directly to a specific text; here’s the start of the Office of the Dead with a four-line illuminated initial.

And then further down the page is all the data from the record you’d expect to see on any good digitized manuscript site. This is pulled from the TEI Manuscript Descriptions and indexed in a backend database for the site, while the images are pulled directly from their URLs on OPenn.

So this is great. We’re doing very important work making data about manuscripts available to the world, in ways that make it easy to reuse them, whether to build new projects or just to publish an image in a book or on a website. And I want to make it clear that I don’t intend anything in the rest of my talk to undermine this vital work. But. But.

I mentioned that I’m also the resident Digital Humanist on our team. In addition to the technical work involved in that, the work of building tools (which I promise I will get to before this talk is finished), I do a lot of thinking about what it is we do. And there’s one question that keeps me up at night and drives the focus of my current research. The question comes out of a statement. And that statement is:

The digitized manuscript is not the manuscript itself.


Or, as I prefer it, in meme form:

This shouldn’t be a controversial statement to anyone who has ever worked with a manuscript and then used a digital version of it. It’s obvious. And yet we undermine this statement all the time in the ways we talk about digitized manuscripts – I do it too. How many times have you said, or heard someone else say, “I saw the manuscript online” or “I consulted it online” or “I used it online”? Not pictures of the manuscript or the digitized manuscript but the manuscript? So one question to come out of this is:

If the digitized manuscript isn’t the manuscript, then what is it?


This is not actually the question that keeps me up at night, because although it’s interesting, it’s not practical or useful for me. My job is to make these things available so you can use them. So the question that actually keeps me up at night is:

If a digitized manuscript isn’t a manuscript, how can we present it in ways that explore aspects of the original’s manuscript-ness, ethically and with care, while both pushing and respecting the boundaries of technology? Although this practice of thinking about what it means to digitize a manuscript and what that becomes seems quite philosophical, it is a really practical question.

As a librarian who works on the digitization and online presentation of medieval manuscripts, I think it’s really important for me and others in my position to be mindful about what exactly it is that we do in our work.

So for the rest of our time today I’ll first go over a few things I’ve already worked out a bit for other papers, and then move on to the Ghost in the Machine, which is new and confuses me in a way that my previous stuff hasn’t.

Outline (my work so far):

  • Uncanny valley
  • Memes & Terms
  • Transformative works
  • Ghost in the Machine*

The Uncanny Valley

We’ll start with the Uncanny Valley, which I presented on at the International Congress on Medieval Studies in Kalamazoo this May, although I have been thinking about applying the uncanny valley to digitized manuscripts since 2008.

The uncanny valley is a concept that comes out of robotics, and it has to do with how humans perceive robots as they become more like humans. The concept was first described by Masahiro Mori in a 1970 article in the Japanese journal Energy, and it wasn’t fully translated into English until 2012. The article is a thought piece – that is, it’s not based on any data or study. In it, Mori posits a graph with human likeness on the x axis and affinity on the y axis. Mori’s proposition is that, as robots become more human-like, we have greater affinity for them, until they reach a point at which the likeness becomes creepy, or uncanny, leading to a sudden dip into negative affinity – the uncanny valley.

M. Mori, “The uncanny valley,” Energy, vol. 7, no. 4, pp. 33–35, 1970 (in Japanese); M. Mori, K. F. MacDorman, and N. Kageki, “The Uncanny Valley [From the Field],” IEEE Robotics & Automation Magazine, vol. 19, no. 2, pp. 98–100, June 2012 (English translation). (https://ieeexplore.ieee.org/document/6213238/)

Manuscripts aren’t people, and digitized manuscripts aren’t robots, so I distilled four relevant points out of Mori’s proposition, and then drew four parallel points for our manuscript discussion:

First, Robots are physical objects that resemble humans more or less (that is the x-axis of the graph)

Second, as robots become more human-like, people have greater affinity for them (until they don’t – uncanny valley) – this is the y-axis of the graph

Third, the peak of the graph is a human, not the most human robot

Fourth, the graph refers to robots and to humans generally, not robots compared to a specific human

And the four parallel points:

First, digitized manuscripts are data about manuscripts (digital images + structural metadata + additional data) that are presented on computers through interfaces. Digitized manuscripts are fragments, and in visualizing the manuscript on a computer we are reconstructing them in various ways. These presentations resemble the parent manuscript more or less (this is the x-axis)

Second, as presentations of digitized manuscripts become more manuscript-like, people have greater affinity for them (until they don’t – uncanny valley) – this is the y-axis

Third, the peak of the graph is the parent manuscript, not the most manuscript-like digital presentation

Fourth, the graph refers to a specific manuscript, not to manuscripts generally

I think that this is the major difference in applying the concept of the uncanny valley to manuscripts vs. robots: while robots are general, not specific (i.e., they are designed and built to imitate humans, not specific people), the ideal (i.e., most manuscript-like) digital presentation of a manuscript would need to be specific, not general (i.e., it would need to be designed to look and act like the parent manuscript, not like any old manuscript).

The Effect of Movement is an important piece of the functioning of the uncanny valley for robotics, and it is also important when thinking about manuscripts. Manuscripts, after all, are complex physical objects, much as humans are complex physical objects. Manuscripts have multiple leaves, which are connected to each other across quires; the quires are then bound together and, often, connected to a binding. So moving a page doesn’t just move that page, much as bending your leg doesn’t just move your leg. Turning the leaf of a manuscript might tug on the conjoined leaf, push against the binding, pull on the leaves preceding and following – a single movement provoking a tiny chain reaction through the object, and one which, with practice, we are conditioned to recognize and expect. If things move other than we expect them to (as zombies move differently from humans in monster movies), our brains will recognize that there is something “off” about them.

Here is a video of me turning the pages of Ms. Codex 1056, a Book of Hours from the University of Pennsylvania. This will give you an idea of what this manuscript is like (its size, what its pages look like, how it moves, how it sounds). It’s a copy of the manuscript, showing just a few pages, and the video was taken in a specific time and space with a specific person. If you came to our reading room and paged through this manuscript, it would not look and act the same for you.

Now let’s take a look at this manuscript presented through Penn’s page turning interface, Penn in Hand (this is the University of Pennsylvania’s official interface and is separate from the BiblioPhilly interface that is under development). When you select the next page, the opening is simply replaced with the next opening (after a few seconds for loading). The images are, literally, flat. This is two images, taken at different times (perhaps within hours of each other), presented side-by-side to give the impression of a book opening. It’s a reconstruction using the digitized fragments.

These are two very different ways of looking at this manuscript through a digital interface, and they do very different things, illustrate different aspects of the book, and can be used in different ways. We’ll come back to this later in the talk, but for now I would like to move on to memes and terms.

Memes and Terms

I find the Uncanny Valley useful for providing a framework for presentation technologies along a spectrum from “less like the manuscript” to “more like the manuscript”. Next, I want to say a little bit about the terms that people choose to use to refer to digitized manuscripts – how we talk about them. (I initially presented this material this summer in a lecture for Rare Book School.) I think that in this instance terms function rather like memes, so I want to start with the meme (you’ll notice that I’m a fan of memes).

The word meme was coined in 1976 by Richard Dawkins in his book The Selfish Gene. In the Oxford English Dictionary, meme is defined as “a cultural element or behavioral trait whose transmission and consequent persistence in a population, although occurring by non-genetic means (especially imitation), is considered as analogous to the inheritance of a gene.” Dawkins was looking for a term to describe something that had existed for millennia – as long as humans have existed – and the examples he gave include tunes, ideas, catchphrases, clothes fashions, ways of making pots or building arches. These are all things that are picked up by a community, ideas and concepts that move among members of that community, are imitated and modified, and which are frequently moved on to new communities as well where the process of imitation and modification continues. More recently the term meme has been applied specifically to images or text shared, often with modification, on the Internet, particularly through social media: If you’ve ever been RickRolled, you have been on the receiving end of a particularly popular and virulent meme.

Following this theory, terms work like this:

  1. A term begins with a specific meaning (e.g., outlined in the OED, citing earlier usage).
  2. A scholar adopts the term because we need some way to describe this new thing that we’ve created. So we appropriate this term, with its existing meaning, and we use it to describe our new thing.
  3. The new thing takes on the old meaning of the term.
  4. The term itself becomes imbued with meaning from what we are now using it to describe.
  5. The next time someone uses that term, it carries along with it the new meaning.

There are three terms that I see used a lot to refer to digitized manuscripts, but although we use these three terms – facsimile, surrogate, and avatar – to refer to digitized manuscripts, it is clear that these terms don’t mean the same thing, and that by choosing a specific term to refer to digitized manuscripts we are drawing attention to particular aspects of them. Facsimile literally means make similar, so if I call a digitized manuscript a facsimile, I draw attention to its status as a copy. Surrogate, on the other hand, generally means something that stands in for something else. So if I call it a surrogate, I draw attention to its status as a stand-in for the physical object. Avatar, finally, refers to manifestation, originally a god manifesting in human form, but now used to refer to people or physical objects manifesting in digital form. So if I call it an avatar, I draw attention to its status as a representation of the physical object in a digital world. Not a copy, not a replacement, but another version of that thing.

I would like to remark on our apparent desire as a community to apply meaning to digital versions of manuscripts by using existing terms, rather than by inventing new ones. After all, we coin new words all the time, so it would be understandable if we decided to make up a new term rather than reusing old ones. But as far as I know we haven’t come up with a completely new term for “digitized medieval manuscript,” and if anyone has, it hasn’t caught on enough to be used widely in the scholarly community. I expect this comes from a desire to describe a new thing in terms that are understandable, as well as to define the new thing according to what came before. Digital versions of manuscripts are new things that have a close relationship with things that existed before, so while we want to differentiate them we also want to be able to acknowledge their similarities, and one way to do that is through the terms we use for them.

Like pushing an idea through different memes, pushing the concept of a digitized manuscript through different terms gives us flexibility in how we consider them, and how we explain them, and our feelings about them, to our audiences. That we can so easily apply terms with vastly different meanings to the digital versions of manuscripts says something about the complexity of these objects and their digital counterparts.

Transformative Works / Language of Care

Another aspect that I’ve been looking into is coming up with a way to talk about manuscripts and manuscript digitization using the language of transformative works. (First presented at the International Medieval Congress, Leeds)

Transformative work is a concept that comes out of fandom: that is, the fans of a particular person, team, fictional series, etc. regarded collectively as a community or subculture. We typically talk about fandom in relation to sports, movies, or TV shows, but people can be fans of many things (including manuscripts). As defined on the Fanlore wiki:

“Transformative works are creative works about characters or settings created by fans of the original work, rather than by the original creators. Transformative works include but are not limited to fanfiction, real person fiction, fan vids, and graphics. A transformative use is one that, in the words of the U.S. Supreme Court, adds something new, with a further purpose or different character, altering the [source] with new expression, meaning, or message.”

In some fandom communities, transformative works play a major role in how the members of that fandom communicate with each other and how they interact with the canon material (“canon” being the term fans use to refer to the original work). Transformative works start with canon but then transform it in various ways to create new work – new stories, new art, new ideas, possible directions for canon to take in the future, directions canon would never take but which are fun or interesting to consider.

Although it’s still quite niche, there is a small but growing academic movement to apply the concept of transformative work to historical texts. Some of this work is happening through the Organization for Transformative Works, which among other things hosts Archive of Our Own, a major site for fans to publish their fanworks, and provides legal advocacy for creators of fanworks.

The Organization for Transformative Works also publishes a journal, Transformative Works and Cultures, and in 2016 they published an issue “The Classical Canon and/as Transformative Work,” which focused on relating ancient historical and literary texts to the concept of fan fiction (that is, stories that fans write that feature characters and situations from canon). There is also a call for papers currently open for an upcoming special journal issue on “Fan Fiction and Ancient Scribal Culture,” which will “explore the potential of fan fiction as an interpretative model to study ancient religious texts.” This special issue is being edited by a group of scholars who lead the “Fan Fiction and Ancient Scribal Cultures” working group in the European Association of Biblical Studies, which organized a conference on the topic in 2016.

You will note that the academic work on transformative works I’ve cited focuses specifically on fan fiction’s relationship with classical and medieval texts, which makes a fair amount of sense given the role of textual reuse in the classical and medieval world. In her article “The Role of Affect in Fan Fiction,” published in the Transformative Works and Cultures special issue of 2016, Dr. Anna Wilson places fan fiction within the category of textual reception, wherein texts from previous times are received and reworked by later authors. In particular, Dr. Wilson points to the epic poetry of classical literature, medieval romance poetry, and Biblical exegesis, but she notes that comparisons between fan fiction and these past examples of textual reception are undertheorized, and leave out a major aspect of fan fiction that is typically not found, or even looked for, in the past examples. She says, “To define fan fiction only by its transformative relationship to other texts runs the risk of missing the fan in fan fiction—the loving reader to whom fan fiction seeks to give pleasure. Fan fiction is an example of affective reception. While classical reception designates the content being received, affective reception designates the kind of reading and transformation that is taking place. It is a form of reception that is organized around feeling.” (Wilson, 1.2)

For my paper at the International Medieval Congress at Leeds earlier this summer I used the manuscript University of Pennsylvania LJS 101 as an example of “medieval manuscript as transformative work” – not as a piece of data to be mined for its texts, but as a transformative work in itself, centered around a language of care. I’m not comfortable applying the concept of affective reception to the people who created and worked with medieval manuscripts – I don’t want to suggest that these people loved the manuscript the same way that I do – but I do want to explore the idea that this person or people cared about them, and that other people have cared about these manuscripts over time, enough that they survive to live now in libraries and private collections. Their interests may have been scholarly, or based on pride of ownership, or even on curiosity, but whatever their reasons for caring for the manuscripts, they did care, and we know they cared because of the physical marks that they have left on these books. Manuscripts that survive also often show signs of lack of care, damage and so forth, so those elements need to be included in this framework as well.

However, for this paper I want to suggest that we consider that manuscripts relate to digitized manuscripts, within the framework of Transformative Works, in the same way that so-called canon works relate to fan works. In this framework, we design and build visualizations and interfaces for digitized manuscripts in the same way that fans create fan fiction and fan art: by respecting the canon but adding something new, with a further purpose, altering the source with new expression, meaning, or message. Such visualizations or interfaces would be embodiments of Dr. Wilson’s concept of affective reception. They may look and function like scholarly interfaces, and may even be usable for scholarship, but they would primarily be designed to elicit an emotional response from the user.

The Ghost in the Machine

And now we get to our discussion of The Ghost in the Machine, a term coined by the philosopher Gilbert Ryle in 1949 to describe the concept of mind-body dualism, that is, the idea that the human mind and the body are distinct and separable.

In the last chapter of 2008’s Printing the Middle Ages, “Coda: The Ghost in the Machine: Digital Avatars of Medieval Manuscripts,” Siân Echard talks of “the ominous implications of the phrase ‘the ghost in the machine’” in the consideration of modern reception, particularly digital reception, of medieval manuscripts. As she notes in her footnote, “the phrase has become a commonplace of digital popular culture and reflection on it, and I use it in that spirit here. It will become clear, however, that the phrase’s origins in discussions of Cartesian dualism are oddly relevant to a consideration of the dis- and re-embodiment of medieval text-objects in digital avatars.”

The idea of applying the concept of the Ghost in the Machine – that is, the separability or inseparability of the mind of the manuscript and the body of the manuscript – brings to mind a few other, perhaps similar ways of thinking about art and communication. In preparing for this talk I read, in addition to Echard’s piece quoted from a minute ago, both Walter Benjamin’s “The Work of Art in the Age of Mechanical Reproduction,” his 1936 treatise on what photography means for art, and Marshall McLuhan’s chapter “The Medium is the Message” from his 1964 book Understanding Media: The Extensions of Man. There are a few things from these texts that I think are relevant for an application of the Ghost in the Machine to a discussion of manuscripts and digital manuscripts.

Benjamin is concerned, among other things, with the aura of a work of art, which he at one point defines as its uniqueness, closely related to its authenticity. Because of the nature of photography (one of the means of mechanical reproduction identified by Benjamin) it is simply not possible to reproduce the aura of an artwork, because art exists in the world, and reproducing a piece of art in a photograph takes it out of that world. In his words,

“Even the most perfect reproduction of a work of art is lacking in one element: its presence in time and space, its unique existence at the place where it happens to be. This unique existence of the work of art determined the history to which it was subject throughout the time of its existence. This includes the changes which it may have suffered in physical condition over the years as well as the various changes in ownership. The traces of the first can be revealed only by chemical or physical analyses which it is impossible to perform on a reproduction; changes of ownership are subject to a tradition which must be traced from the situation of the original.”

McLuhan, on the other hand, is more concerned with how information is conveyed to us – the medium being the physical conveyance of the message, which he takes pains to point out is different from the content, which is something else altogether. In fact, he says, “the ‘content’ of any medium is always another medium. The content of writing is speech, just as the written word is the content of print, and print is the content of the telegraph.” The message instead is “the change of scale or pace or pattern that it introduces into human affairs.” So the message for us isn’t the text or illustrations in a manuscript; the message is the means by which that content makes it to us through whatever medium.

For example, the medium could be a 15th century Book of Hours, but the message is that the Psalms are divisible into groups, that the Penitential Psalms are one such group, and that at this point in time it was important for these Psalms to be set apart in these Books using some agreed upon conventions. That is the message. The message of a digital photograph of the Penitential Psalms is altogether different, because the medium is different. The medium is bits and bytes, arranged through a system of machines, to display something that looks like those pages in that Book of Hours. The concept of digital photography as a medium with a message is very similar to McLuhan’s description of electric light:

For it is not till the electric light is used to spell out some brand name that it is noticed as a medium. Then it is not the light but the “content” (or what is really another medium) that is noticed. The message of the electric light is like the message of electric power in industry, totally radical, pervasive, and decentralized.

Likewise, I think, the medium of pervasive digital photography for medieval manuscripts is radical for us today. We are creating relatively massive amounts of fragmentary data about our manuscripts. What do we do with it?

What the analogy of the Ghost in the Machine forces us to do first is to determine what is the ghost and what is the machine. Informed by Benjamin’s concept of the aura, McLuhan’s concept of the medium and the message (distinct from content), Dr. Wilson’s concept of affective reception, and also the relationship between affinity and “manuscript-ness” in the uncanny valley, I’d like to propose that the ghost of a manuscript is very close to Benjamin’s concept of the aura, and that the aura is what informs our affinity towards any interface, and is also what we as humans are primed to respond to emotionally. The aura, the ghost, is what makes a manuscript unique, and what allows us to identify it. Earlier, when talking about the uncanny valley, I mentioned that if you had Ms. Codex 1056 in your hands tomorrow it would look different than it did in the video I showed you. That difference between what you would see in your interactions with the manuscript, and what you see in my video – that’s the ghost. And it is, I think, impossible to reproduce the ghost of the manuscript, and impossible to visualize it completely in any digital interface I can conceive of.

What we can do, though, is reproduce aspects of the machine – that is, the manuscript stripped of its ghost, removed from time and space and placed in a kind of virtual vacuum – and use those to construct something that gives us insights into the parts of the ghost that aren’t themselves reproducible. As we do this we can also be mindful of what emotional responses the interface might provoke in the users.

To be fair to those of us providing access to digitized manuscripts, the first part of this – reproducing aspects of the machine – is pretty much what we’re doing now and what we’ve always done, although in my experience the thinking that happens around the digitization process is quite prosaic, literal, and practical, and we don’t think enough about how digitization or other photographic reproduction presents us with something that is very different from the manuscript while still telling us something about it. For example:

Early facsimiles that present the folios disembodied and set within pages of a modern book. You can read the text but don’t get a real sense of the physicality of the manuscript.

Microfilm, designed for access to the text, giving in most cases even less of a sense of the physical object.

Luxe facsimiles, designed to give an impression of the physical book, although in most cases the quire structure and physicality are not exactly like the original.

Digital images presented in a page-turning interface to give the impression of the page opening, which can be zoomed in on for reading. Although the photography is high-res, the images are only of the pages, and the binding is photographed separately, so you don’t get a sense of the book as a three-dimensional object.

Digital images presented in a page-turning interface focused on the movement of the electronic page, although the movement doesn’t correspond with the movement of the physical manuscript.

Interactive 3D images that model the topography of the page, but without the context of the full manuscript. This is a miniature of St. Luke from Bill Endres’s Manuscripts of Lichfield Cathedral project.

A visualization using VisColl, a project out of SIMS, which models and visualizes the physical construction of manuscripts, generating quire diagrams and using digitized folio images to reconstruct conjoins.

The VisColl system integrated into the BiblioPhilly interface I showed you at the beginning of the talk, so you can get a sense of how the manuscript is constructed alongside the more usual facing-page view.
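To make that last example a little more concrete, here is a toy sketch of the kind of model such a system works from. It is not VisColl’s actual data model or output, just an illustration of the underlying logic: a quire is an ordered list of leaves, the conjoint pairs follow from that order, and digitized folio images can then be attached to each pair. The folio numbers and image filenames are invented.

```python
# A toy model of a regular eight-leaf quire; not VisColl's real format.
quire = [1, 2, 3, 4, 5, 6, 7, 8]                         # leaves in order
images = {leaf: f"fol{leaf:03d}.tif" for leaf in quire}  # invented filenames

# In a regular quire, leaf 1 conjoins leaf 8, leaf 2 conjoins leaf 7, and so on.
bifolia = [(quire[i], quire[-1 - i]) for i in range(len(quire) // 2)]

for n, (a, b) in enumerate(bifolia, start=1):
    print(f"bifolium {n}: fol. {a} + fol. {b} -> {images[a]} | {images[b]}")
```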

Each of these interfaces, whether consciously or not, has been designed to illustrate some aspect of the manuscript and to ignore others. What I’d like to suggest is that we simply start being conscious of our choices as we design and build interfaces. Going back again to the uncanny valley, this is what Mori would call Escape by Design.

Uncanny Valley: Escape by Design

Mori proposes that designers focus their work in the area just before the uncanny valley, creating robots that have lower human likeness but maximum affinity. He says:

In fact, I predict that it is possible to create a safe level of affinity by deliberately pursuing a nonhuman design. I ask designers to ponder this. To illustrate the principle, consider eyeglasses. Eyeglasses do not resemble real eyeballs, but one could say that their design has created a charming pair of new eyes. So we should follow the same principle in designing prosthetic hands. In doing so, instead of pitiful looking realistic hands, stylish ones would likely become fashionable.

So after a lot of philosophical thinking, this gives us something practical to aim for. Let’s build page-turning interfaces that are conspicuous in their use of flat digital images, let’s do more with 3D, RTI, and MSI to show us parts of the manuscript we can’t see under regular institutional photography, let’s do more work using data about the manuscript to organize our flat images in new and interesting ways, all with a mind towards informing us about that unreproducible ghost.

Now, this is an ideal, and there are practical reasons it won’t always work. Interface design and development is expensive, and we – the librarians responsible for the digitization – usually aren’t the ones building the interfaces. We use interfaces other people build for us: either we buy them from a company, or we participate in consortia like the International Image Interoperability Framework (IIIF) Consortium, which provides open access tools for the community. Very few institutions have the wherewithal to build their own interfaces from scratch. At Penn we’re very lucky, and the interfaces I’ve shown you today – VisColl for visualizing the quire structures, and the BiblioPhilly interface with that integrated – are open source, so they could be adapted by other institutions. But that adaptation work also requires resources. There’s no free answer here.

There’s also the issue of the data itself. Manuscript digitization happens systematically, with each book getting a very specific set of flat images out the other end (every page gets one image, as do the front and back covers and, hopefully, the edges and spine, though not all libraries include those shots in their usual workflow), so it’s not usually possible to take the time to point out interesting anomalies (imagine a bookmark that is conspicuous when you’re sitting down with the book, but which goes unphotographed because it’s not in the list of shots provided to the photographer). And 3D modeling, RTI, and MSI are not available to most library digitization facilities, and even if they were, there would need to be policies and procedures to determine which manuscripts get such special treatment and which don’t. So it’s great for me to stand up here and say this is what we should do, but it’s a whole other thing to do it in any practical way.

Dot’s 2018 Conception of the Uncanny Valley of Digitized Manuscripts

I’m over time, I’m sorry, so I’ll just close here with the question that keeps me up at night. I’d like to talk to you about this and see if you have ideas or suggestions, or if you think this is moving in the right direction. Thank you.

Reading and Writing Philadelphia, University of Pennsylvania LJS 101, c. 850–1100

This is a version of a paper I presented at the International Medieval Congress at Leeds on July 3, 2018, in the session “The Origins, Effects, and Memory of Caroline Minuscule, II” sponsored by the Network for the Study of Caroline Minuscule. 

Today I’m going to tell you about UPenn LJS 101, which is the oldest codex we have in the University of Pennsylvania Libraries by at least 150 years, and which is one of only two codices in our collection written in Caroline minuscule (the other being UPenn Ms. Codex 1058, dated to ca. 1100 and localized to Laon) – we also have one leaf written in Caroline minuscule.

The bulk of the manuscript, folios five through 44 (Quires two through six), is dated to the mid-9th century, but in the early 12th century replacement leaves were added for the first four leaves and for the last 20 leaves. LJS 101 reflects the educational program set up in the Carolingian court by Alcuin, featuring a copy of Boethius’s translation of Aristotle’s De Interpretatione (which was commonly called Periermenias, the name also used in this text) and a short commentary on that text (also called Periermenias), along with a few other shorter texts.

I want to admit up front that I don’t have a serious scholarly interest in LJS 101. I’m not a Carolingianist, I don’t study Alcuin, or Bede, or Aristotle. I’m a librarian, and the focus of my work is manuscript digitization and visualization, so I spend a lot of time thinking about manuscripts, how they’re put together, how to digitize them, and how to visualize them in ways that reveal truths about the physical object, ideally without fetishizing them.

But I also love manuscripts. I wouldn’t be doing what I do if I didn’t. I love the way they look, especially books that have been well-used: the imperfect edges, worn ink, and the many and varied signs that people have had their hands on these manuscripts, that they were well-used and well-loved. And there’s no book I love more than LJS 101.

So what I want to do today is tell you about LJS 101, but I want to put my discussion within the context of that love; specifically, I want to talk about LJS 101 within the frame of transformative works.

A transformative work is a concept that comes out of fandom: that is, the fans of a particular person, team, fictional series, etc. regarded collectively as a community or subculture. We typically talk about fandom in relation to sports, movies, or TV shows, but people can be fans of many things (including manuscripts). As defined on the Fanlore wiki:

Transformative works are creative works about characters or settings created by fans of the original work, rather than by the original creators. Transformative works include but are not limited to fanfiction, real person fiction, fan vids, and graphics. A transformative use is one that, in the words of the U.S. Supreme Court, adds something new, with a further purpose or different character, altering the [source] with new expression, meaning, or message.

In some fandom communities, transformative works play a major role in how the members of that fandom communicate with each other and how they interact with the canon material (“canon” being the term fans use to refer to the original work). Transformative works start with canon but then transform it in various ways to create new work – new stories, new art, new ideas, possible directions for canon to take in the future, directions canon would never take but which are fun or interesting to consider.

Although it’s still quite niche, there is a small but growing academic movement to apply the concept of transformative work to historical texts. Some of this work is happening through the Organization for Transformative Works, which among other things hosts Archive of Our Own, a major site for fans to publish their fanworks, and provides legal advocacy for creators of fanworks.

The Organization for Transformative Works also publishes a journal, Transformative Works and Cultures, and in 2016 they published an issue “The Classical Canon and/as Transformative Work,” which focused on relating ancient historical and literary texts to the concept of fan fiction (that is, stories that fans write that feature characters and situations from canon). There is also a call for papers currently open for an upcoming special journal issue on “Fan Fiction and Ancient Scribal Culture,” which will “explore the potential of fan fiction as an interpretative model to study ancient religious texts.” This special issue is being edited by a group of scholars who lead the “Fan Fiction and Ancient Scribal Cultures” working group in the European Association of Biblical Studies, which organized a conference on the topic in 2016. Closer to home, Dr. Juliana Dresvina at Oxford University is organizing a colloquium later this month on “Fanfiction and the Pre-Modern World,” and I understand she is planning to organize a larger conference next year.

You will note that the academic work on transformative works I’ve cited focuses specifically on fan fiction’s relationship with classical and medieval texts, which makes a fair amount of sense. In her article “The Role of Affect in Fan Fiction,” published in the Transformative Works and Cultures special issue of 2016, Dr. Anna Wilson places fan fiction within the category of textual reception, wherein texts from previous times are received and reworked by later authors. In particular, Dr. Wilson points to the epic poetry of classical literature, medieval romance poetry, and Biblical exegesis, but she notes that comparisons between fan fiction and these past examples of textual reception are undertheorized, and leave out a major aspect of fan fiction that is typically not found, or even looked for, in the past examples. She says, “To define fan fiction only by its transformative relationship to other texts runs the risk of missing the fan in fan fiction—the loving reader to whom fan fiction seeks to give pleasure. Fan fiction is an example of affective reception. While classical reception designates the content being received, affective reception designates the kind of reading and transformation that is taking place. It is a form of reception that is organized around feeling.” (Wilson, 1.2)

Back to LJS 101. What I want to do here is look at LJS 101 not as a piece of data to be mined for its texts, but both as a transformative work in itself and as an object for the transformative work of others, particularly digital versions, and I want to center this look at the manuscript on a language of care. I’m not comfortable applying the concept of affective reception to the people who created and worked with LJS 101 over the past 1100 or so years – I don’t want to suggest that the person who took the manuscript from its original form to its 12th century form loved the manuscript the same way that I do – but I do want to explore the idea that this person or people cared about it, and that other people have cared about this manuscript over time, enough that it survives to live now in the library at the University of Pennsylvania. Their interests may have been scholarly, or based on pride of ownership, or even on curiosity, but whatever their reasons for caring for the manuscript, they did care, and we know they cared because of the physical marks that they have left on this book. The manuscript as it survives also shows some examples of lack of care, and I want to address those as well.

Binding: 19th-century English diced Russia leather, bound for Sir Thomas Phillipps.

The first obvious mark of care on LJS 101 is the binding, a lovely 19th century leather binding done for the book collector Sir Thomas Phillipps, who purchased the book in or around 1826. The manuscript was sold out of his estate in 1945, sold again in 1978 and 1979, and finally sold to Lawrence J. Schoenberg in 1997.

Formerly owned by Sir Thomas Phillipps, ms. 2179 (stamped crest inside upper cover). LJS collection bookplate. Gift of Barbara Brizdle Schoenberg in honor of Amy Gutmann, President, University of Pennsylvania, 2014.

Phillipps also left two ownership marks on the inside front cover: a stamped crest in the upper part of the inside cover, and a second ownership stamp with his library’s number for the manuscript (ms. 2179). Another ownership mark is the Penn Libraries bookplate, showing that the manuscript belongs with the Lawrence J. Schoenberg collection. Phillipps and Schoenberg both cared: Phillipps cared enough to bind the book, they both marked it as their own, and Schoenberg gifted it to Penn in 2014 for long-term institutional care.

1r: Conclusion of a grammatical work, 7-line verse by Eugenius II of Toledo, Isidore’s definition of rhetoric (12th c.)

The first quire – four leaves – is a 12th century replacement. Fol. 1r begins with the ending of a grammatical text on declensions, including some words in Greek and references to the Aeneid and the Thebais of Statius. This folio also contains a 7-line poem by Eugenius II of Toledo, “Primus in orbe dies…,” a poem on the seven days of Biblical creation (MGH, Auct. Antiq. XIV; Migne, PL LXXXVII:365-6) [1]. This implies that the first quire has not always been the first quire, and that at some point there was at least one more quire before Quire 1. As we’ll see in a moment, the text on the last leaf of Quire 1 leads directly into the text on the first leaf of Quire 2, which makes it clear that the 12th century work was created in response to the 9th century piece; it was not the case that two existing pieces were simply placed together without regard for each other. (Note that the first leaf in the manuscript also includes another ownership mark from Sir Thomas Phillipps, noting the number of the manuscript in his collection and the number it had in a catalog – yet another sign of care from Phillipps.)

1v: Boethius’s translation of Aristotle’s De Interpretatione (Periermenias Aristotelis), 12th c., switching to 9th c. on fol. 5, back to 12th c. on fol. 45

The main text of the manuscript, the Latin translation of Aristotle’s De Interpretatione (called Periermenias Aristotelis generally and in the text), begins on fol. 1v. Boethius’s translation of De Interpretatione, along with his translation of the Categories and Porphyry’s introduction to Aristotelian logic, the Isagoge, formed the core of Alcuin of York’s logic textbook, De dialectica. These three works—as translated by Boethius—would become known as the logica vetus, and would dominate the study of logic until the twelfth century. This explains the why of this manuscript – this was an important text. The inclusion of the now-missing grammatical text also implies that this book was designed with care to be a sort of textbook. Note the striking illuminated initial P that begins the text – a visual sign of care taken in the design of this manuscript.

Folio 4v-5r, with the 12th century script on the left and the 9th century script on the right.

As noted above, the 12th century text from Quire 1, folio 4v, continues directly to the 9th century portion of the manuscript on Quire 2, folio 5r.

UPenn LJS 101, folio 27r, showing interlinear and marginal corrections and glosses.

The same hand that wrote out the 12th century full text here and from folio 45 through the end of the manuscript also went through the 9th century text and made many corrections, both deleting and adding text. As far as I know there hasn’t been a full textual analysis of the text in this manuscript, but it’s possible, if not likely, that the 12th century scribe had a more recent copy of the text and corrected the older version in comparison with it. For whatever reason the scribe, or someone supervising the scribe, cared enough to take the existing 9th century copy of Boethius, to complete it, and to bring it up to date with an improved version of the text.

36v: Diagram from 9th c. with color added in the 12th century

Also in the 12th century program of care, the scribe or someone alongside the scribe added green and yellow highlighting to the 9th century diagrams and to some of the headwords.

As we move on through the manuscript, note that the number of corrections drops precipitously after folio 45, when we are back with the 12th century scribe.

LJS 101, folio 44v and 45r, with the 9th century script on the left and the 12th century script on the right

LJS 101, folios 52v-53r; there are several leaves missing here.

There are several quires’ worth of leaves missing between Quire six (ending with folio 44) and Quire seven (beginning with folio 45, where the manuscript switches again from 9th century to 12th century) – 49 pages’ worth of edited text, from Prima Editio, I c. 9, p. 111, line 20 to Prima Editio, II c. 11, p. 160, line 15 – and there are at least two quires missing between Quire seven (folio 52) and Quire eight (folio 53), from Prima Editio, II c. 13, p. 188, line 5 to Prima Editio, II c. 14, p. 224, line 13, representing 36 pages of edited text. It’s unclear when, how, and why these pages were removed, although the folio numbering appears to be from the time of Thomas Phillipps, so we can safely assume that they were removed at some point before he had the manuscript bound in its lovely leather binding.

53v: The last six lines of the unidentified text; Periermeniae (12th c.)

In Quire eight, the Boethius translation ends naturally on folio 53r, line 16, at the end of Prima Editio, II c. 14 (page 225 in the edition). There is another text, between the end of that and the beginning of the next commentary, that has yet to be identified: it occupies the last six lines of 53r and the first six lines of 53v. The next text begins on line seven of 53v. This text is a short commentary on Aristotle’s De Interpretatione, the Periermeniae attributed to Apuleius, the second-century AD Platonist philosopher and Latin-language prose writer. (Emma Kathleen Ramsey, “A commentary on the Peri Hermeneias ascribed to Apuleius of Madaura“)

Commentary is a kind of transformative work, in which a writer expands on the thoughts of the original author, explaining and elaborating in order to create something new and (hopefully) illuminating. In LJS 101, then, we have a physical expression of a canon work followed by a transformation, an order that was planned by someone who cared enough to organize them that way.

Fol. 59v: Ending of Periermeniae, beginning of commentary by Haymo of Auxerre.

After the commentary by Apuleius there is a brief section of a commentary on Isaiah by Haymo of Auxerre (formerly attributed to Haymo of Halberstadt)[2], followed by “Versus de singulis mensibus” (a poem by Decimus Magnus Ausonius on the seven days of Creation). The poem itself has been laid out with care, the columns blind-ruled to keep them straight and the large initials alternating between lighter and darker ink. Given the topic of the poem, one could say this is yet another example of a transformative work – a poetic retelling of the Christian creation story originally told in the Bible. It also ties in with the poem by Eugenius II of Toledo, “Primus in orbe dies…,” from folio 1r, which is also on the topic of the seven days of Creation.

60v: Sample letter of a monk to an abbot

The next text in Quire eight is a sample letter of a monk to an abbot, on folio 60v. I want to thank Brother Thomas Sullivan of Conception Abbey for helping me with this text, which hasn’t been otherwise studied. This is the only section of the 12th century portion of the manuscript that is heavily annotated, the original letter being expanded by both interlinear and marginal glosses. The interlinear glosses expand the primary text into more intense or elaborate language, e.g., l. 8 inserts the Latin word valde (very). The marginal glosses are signaled by a system of thirteen different interpolative marks in the left margin and one in the right. The addressee appears to be one “Domno Luculemus,” and it is not clear whether this is the name of an actual abbot or an imagined character. The letter fits in with the medieval tradition of model letters and letter-writing guides, which is traditionally dated to the work of Alberic of Monte Cassino in the late 11th century and is well documented in the 12th century.[3] Is this model letter another example of a transformative work? Because this particular letter hasn’t been studied we can’t tell if it’s a version of an existing letter, or written with “characters” featured in other letters. In any case, going back to our language of care, we can venture that since it appears on the verso side of a bifolium it was placed here for some reason understood by the scribe or scribes planning the layout.

61r: Boethius’s translation of Aristotle’s De Interpretatione (Periermenias Aristotelis), 12th century

The Boethius text continues on Quire nine, folio 61r, and cuts off at the end of 62v.

63r: Miscellaneous verses, definitions, and biblical commentary

We finish up with folio 63r-64r, containing miscellaneous verses, definitions, and biblical commentary. Oddly, folio 63v is a continuation of the sample letter on folio 60v. We’ll return to Quires eight and nine below, where I’ll say more about signs of lack of care in LJS 101, and describe how these two quires are currently misbound.


Bern, Burgerbibliothek, Cod. 250, fol. 10v (9th c.)

Although LJS 101 is unique to Penn, it is not the only 9th century manuscript showing 12th century care. Bern, Burgerbibliothek, Cod. 250, begins with a 9th century section (folios 1-11) describing a meeting between Einhard and Lupus of Ferrières, at the time that Einhard gave Lupus a book of arithmetic by Victorius of Aquitaine along with a now widely known model alphabet for Ancient Capitals.

Bern, Burgerbibliothek, Cod. 250, fol. 12r (12th c.)

This is followed by a 12th century section, folios 12-18, including a commentary by Abbo of Fleury on the ‘computus’ (reckoning the date for Easter). Note the green highlighting, which is similar to the green highlighting added to the diagrams in LJS 101. As in LJS 101, the 12th century scribe did not just add to the existing manuscript. They marked the 9th century text to bring it up to date, and to incorporate it into what is essentially a new object, and, arguably, a transformative work.

Bern, Burgerbibliothek, Cod. 250, fol. 1r (11th c.)

They added an abacus table to folio 1r, which was presumably left blank in the 9th century, and, as with LJS 101, they also added interlinear glosses and corrections. As with the transformation of LJS 101, these modifications show a certain amount of care, both for the older sections of the manuscript and for the new object.

Bern, Burgerbibliothek, Cod. 250, fol. 10v: 9th c. text with 11th c. interlinear gloss

So we’ve walked through LJS 101 and looked at the transformative nature of the texts in the manuscript, and the physical object itself. I’d like to spend the last portion of my talk looking at another transformative physical aspect of the manuscript and how this aspect may illustrate a lack of care, while at the same time exploring LJS 101’s digitization as another potential for transformative work around the manuscript.

In addition to the missing quires between Quires six and seven and Quires seven and eight, there are two quires in LJS 101 that have been misbound. In both cases it is clear that somehow bifolia were mixed up, likely during rebinding (whether during the last rebinding, under the ownership of Sir Thomas Phillipps, or earlier, we don’t know), and care was not taken to ensure that the bifolia were put back together correctly with regard to the text contained on them. I can’t see any aesthetic reason for the quires to be rearranged as they are; antiquarians such as Matthew Parker frequently transformed the manuscripts in their ownership in ways that made them more attractive in their eyes, but the changes made in LJS 101 appear to be accidental rather than purposeful.[4]

A study of the text in Quires two and three (the first eight leaves of the 9th century portion of the translation) makes it clear that the leaves were bound out of order.[5] Here is the current order, along with the text beginning and ending each folio (all are from Prima Editio I c. 2; page and line numbers are from the edition):

  • Folio 5: ends with p. 38 line 2 [text continues on folio 9]
  • Folio 6: begins with p. 41 line 5 [text continues on folio 7]
  • Folio 7: ends with p. 44 line 2 [text continues on folio 11]
  • Folio 8: begins with p. 46 line 30, ends with p. 48 line 15 [text continues on folio 13]
  • Folio 9: begins with p. 38 line 2 [text continues on folio 10]
  • Folio 10: ends with p. 41 line 4 [text continues on folio 6]
  • Folio 11: begins with p. 44 line 2 [text continues on folio 12]
  • Folio 12: ends with p. 46 line 30 [text continues on folio 8]
  • Folio 13 (the first leaf of Quire four): begins with p. 48 line 15

Beginning with folio 5 and following the text through these eight leaves, we can find the original order: 5, 9, 10, 6, 7, 11, 12, 8 [13…

What was originally a quire of eight leaves was made into two quires of four leaves. The current Quire two consists of the innermost bifolium of the original quire nested in the outermost bifolium, and the two internal bifolia from the original quire form the current Quire three. This is our first example of a lack of care. How did this happen, and why wasn’t this error, assuming it was an error, discovered before the book was rebound?

It might be hard to picture exactly what has happened here, so it might help to look at some transformative digital work using VisColl, a project designed specifically to visualize the physical construction of manuscripts. A couple of years ago a student in my Rare Book School class used VisColl to model both the current and previous structures of these leaves and generated diagrams to help us understand exactly what is happening here.

Correct arrangement of fols. 5-12. Jesse McDowell, “An Ideal Collation of LJS 101”

Here is a diagram and bifolia visualization of the original structure of what are now folios 5 through 12. Using the current numbering, the order of leaves should be 5-9-10-6-7-11-12-8. You can see in this diagram that 5 and 8 are conjugate leaves forming the outer bifolium, followed by 9-12, then 10-11, then 6-7. Looking at the numbering you can already see how they were rearranged.

Current (out of order) arrangement of fols. 5-12. Jesse McDowell, “An Ideal Collation of LJS 101”

But the next diagram shows the current structure: two four-leaf quires, with the middle bifolia grouped together and the outer and inner ones likewise. VisColl, with its focus on modeling the physical construction of manuscripts and visualizing them in various ways, is a really good example of a system for building transformative works based on a medieval manuscript: it takes an existing character, expands on it, illuminates it, and in the process makes something new.
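To make the misbinding easier to follow, here is a rough sketch of the kind of collation model that sits behind diagrams like McDowell’s. I should stress that the element and attribute names below are invented for illustration – this is not the actual VisColl schema – but the idea is the same: each leaf records its position in the quire and the leaf it is conjoint with, so the original and current structures of fols. 5-12 can both be expressed and compared.

<!-- Illustrative sketch only: invented element names, not the real VisColl format -->
<collation manuscript="LJS 101">
  <!-- Original structure: a single quire of eight leaves -->
  <quire n="original" leaves="8">
    <leaf pos="1" folio="5"  conjoint="8"/> <!-- outermost bifolium: 5 + 8 -->
    <leaf pos="2" folio="9"  conjoint="7"/>
    <leaf pos="3" folio="10" conjoint="6"/>
    <leaf pos="4" folio="6"  conjoint="5"/> <!-- innermost bifolium: 6 + 7 -->
    <leaf pos="5" folio="7"  conjoint="4"/>
    <leaf pos="6" folio="11" conjoint="3"/>
    <leaf pos="7" folio="12" conjoint="2"/>
    <leaf pos="8" folio="8"  conjoint="1"/>
  </quire>
  <!-- Current (misbound) structure: two quires of four leaves -->
  <quire n="2" leaves="4">
    <leaf pos="1" folio="5" conjoint="4"/>
    <leaf pos="2" folio="6" conjoint="3"/>
    <leaf pos="3" folio="7" conjoint="2"/>
    <leaf pos="4" folio="8" conjoint="1"/>
  </quire>
  <quire n="3" leaves="4">
    <leaf pos="1" folio="9"  conjoint="4"/>
    <leaf pos="2" folio="10" conjoint="3"/>
    <leaf pos="3" folio="11" conjoint="2"/>
    <leaf pos="4" folio="12" conjoint="1"/>
  </quire>
</collation>

Modeling both states side by side in this way is exactly what a system like VisColl makes possible, and it is what the two diagrams above visualize.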

As I was preparing this paper for the blog, I discovered a second example of a misbound quire in LJS 101, illustrating another lack of care in this book’s long history. As I mentioned above, the sample letter on fol. 60v actually continues on fol. 63r. Fagin-Davis notes this in her description of LJS 101, but until now no one has investigated why this might be – it doesn’t make sense for a text to begin on one leaf and pick up again two leaves later. So why does this happen? While taking a closer look at this section – Quires eight and nine, from folio 52 (the end of Quire seven) through folio 64 – I discovered that, as mentioned earlier, at least a few quires of the Boethius text are missing between Quires seven and eight. Quire eight, eight leaves, begins with folio 53, and the text starts at Prima Editio, II c. 14, p. 224, line 13. The text then ends naturally in the middle of 53r. But Quire nine (four leaves, starting with folio 61) picks up Boethius again, and when I checked the citation it begins at page 216, line 25 in the edition – this text comes before the text on fol. 53. Folio 62 ends at page 224, line 13, which is exactly where the text picks up on fol. 53r:

  • Folio 53: begins with p. 224, line 13; continues through Folio 60
  • Folio 60: ends with the sample letter [text continues on Folio 63r]
  • Folio 61: begins with p. 216, line 25; continues through Folio 62
  • Folio 62: ends with p. 224, line 13 [text continues on Folio 53]
  • Folio 63: contains the rest of the sample letter; continues through Folio 64

Beginning with folio 61 (which carries the earlier text) and following the text through these 12 leaves, we can find the original order: 61, 62, 53-60, 63, 64

The manuscript originally had a quire of 12 leaves. The innermost eight leaves were removed and placed before the outermost four leaves, giving us two new quires. As with the example of Quires two and three, there’s no clear explanation of why this happened, and I am assuming it was an error.

Transformative work in fandom is created by fans who take characters and situations from existing works and make new things with them. Transformative work differs from traditional scholarly work in that the focus is on affection. To quote Anna Wilson again, “It is a form of reception that is organized around feeling.” I don’t want to claim that 12th century scribes or Thomas Phillipps loved the manuscript that we call LJS 101, but I do think it’s reasonable to suggest a language of care, and I think it’s a useful exercise to think about this manuscript and others within the theoretical frame of the transformative work. Doing this pushes the boundaries of current research in this area, which tends to focus on the relationship between fan fiction and earlier forms of textual reception. Moving beyond this – considering a language of care when talking about manuscripts, which are bearers of text as well as physical expressions of their own history, and visualizing digitized manuscripts using new methods – pushes traditional scholarship in new and exciting directions, and it also normalizes the affection we hold for the objects of our study.


[1] Identification of the texts on folio 1r is from Lisa Fagin-Davis, Catalog record for LJS 101, March 2001.

[2] Haymo Halberstatensis: HAYMONIS HALBERSTATENSIS EPISCOPI COMMENTARIORUM IN ISAIAM LIBRI TRES Ab eodem auctore dum viveret, multorum additione, quae in aliis plerisque exemplaribus desiderantur, passim locupletati et recognitione postrema ad unguem ubique recogniti (Coloniae, per honestum civem Petrum Quentell, anno 1531), Liber Secundus, Caput LIII (from Patrologia Latina, Vol. 116, Col. 0991C-Col. 0991D).

[3] Malcolm Richardson, “The Ars dictaminis, the Formulary, and Medieval Epistolary Practice,” in Letter-Writing Manuals and Instruction from Antiquity to the Present, edited by Carol Poster and Linda C. Mitchell (University of South Carolina Press, 2007), pp. 52-66.

[4] Timothy Graham, ‘Matthew Parker’s manuscripts: an Elizabethan library and its use’, in The Cambridge History of Libraries in Britain and Ireland, Volume 1: To 1640, ed. E. Leedham-Green and T. Webber (Cambridge, 2006), pp. 322-41.

[5] The misbinding of Quires 2 and 3 has been noted by Fagin-Davis in her catalog record, and also by Jesse McDowell in his blog post An Ideal Collation of LJS 101

Reaction, a Mémoire

For the #madememedieval hashtag currently going around Twitter, here’s the story of how I became a medievalist (although I didn’t realize it until much later). This is part of the Preface to Reactions Medieval/Modern, the catalog for the exhibit I curated at the University of Pennsylvania Libraries in Fall 2016.

When I was eleven years old, my parents brought my brother (who would have been thirteen at that time) and me to England for two weeks during the summer. They rented a house in the southwest corner of the country, not far from Bath, and borrowed a car. We went all over the place; I remember Salisbury and Stonehenge, Wells Cathedral and Bath Abbey. I also remember riding in the back seat down a particularly narrow road surrounded by trees and fields and pointing out the funny stones the cows were grazing around, at which point my father remarked that we were probably getting close to Avebury. But one memory of that trip stands out above all the rest: The Castle. Over the years, The Castle in my mind has grown to almost mythical proportions as I’ve come to realize (even more so over the past couple of years as I have been preparing for this exhibition) that it marks the point at which I was destined to become a medievalist. My reaction to The Castle was an epiphany, my path set in childhood—and I didn’t realize it until almost thirty years later.

In my memory, we visited The Castle toward the end of the afternoon. I was probably tired and grouchy, although I don’t remember that. (I spent much of this trip tired and grouchy.) I do remember a small town, walking through a residential area with lots of houses, turning the corner, and all of a sudden there it was. It was very different from Warwick Castle, which we’d visited earlier in our trip and which I’d found dull and crowded and ugly. This one was small. I remember a tower, and a demolished wall; it was a ruin. There was no one else around, so my brother and I climbed on the broken walls and ran around and basically acted like kids.

At some point, I noticed that the interior of the tower, which was several meters tall, had regular sets of holes around the perimeter, several feet apart horizontally and several more running vertically all the way to the top. I asked about the holes, and someone told me that wooden beams would have gone through those holes, serving as supports for floors. And I remember being struck very suddenly that people had lived here. I was standing in this ruined tower, we were using it essentially as a playground, and yet hundreds of years ago people had made this place their home.

That experience was the first time I can remember having a visceral reaction to a physical object, a reminder that this object was not just the thing we have today, but a thing that has existed over time and been touched by so many hands and lives before it came to us, and will continue touching people long after we are gone.

As my personal experience attests, reactions are both immediate and ongoing, with potentially long-term effects (on both people and objects). Not all premodern book owners wrote in their books, and not all modern artists look to medieval manuscripts for inspiration, but by looking at the various ways that medieval and modern people have reacted to manuscripts, we may come to appreciate these objects as more than simply bearers of information, or beautiful things for us to enjoy. They bear the marks of their own history, and they still have the potential to make history today and in the future.

The Castle: Nunney Castle in Southwest England, not far from Bath. Photograph by Hugh Llewelyn, licensed CC:BY:SA.

Hosting the Digital Rāmamālā Library at Penn, or, thinking about open licenses for non-Western digitized manuscripts

This talk was presented as part of a panel at the Global Digital Humanities Symposium at Michigan State University, March 16-17, 2017: ARC Panel: Access, Data, and Collaboration in the Global Digital Humanities

My story begins in 2012, when Dr. Benjamin Fleming, Visiting Scholar in Religious Studies and Cataloger of Indic Manuscripts for the Kislak Center for Special Collections at the University of Pennsylvania, proposed and was awarded an Endangered Archives grant from the British Library. The main purpose of the grant was to write a catalog for the Ramamala Library, which is one of the oldest still-active traditional libraries in Bangladesh. A secondary part of this grant was to digitize around 150 of the most fragile manuscripts from the Ramamala Library, and an agreement was made that the University of Pennsylvania Libraries would be responsible for hosting these digital images. At this time, someone from the Penn libraries recommended to Dr. Fleming that they get a Creative Commons license, and the non-commercial license was given as an example. The proposal went forward with a CC-NC license, which both the Penn Libraries and the Ramamala Library agreed to, and everything was fine.

A bit of historical context might be helpful here. 2012, the year this proposal was agreed to, was one year before the Schoenberg Institute for Manuscript Studies (SIMS) was founded at Penn. One of the very first things that SIMS was tasked with doing – one of the things it was designed to do – was to create some kind of open access portal to enable the reuse of digital images of our medieval manuscripts. Penn has been digitizing our manuscripts and posting them online since the late 1990s, and in 2013 all of them were online in a system called Penn in Hand. Penn in Hand is a kind of black box – you can see the manuscripts in there, search for them, navigate them, but if you want to publish an image in a book or use one in a project, you have to do some work to figure out what’s allowed in terms of licensing, and then figure out how to get access to high-resolution images that will be usable for your needs.

It took us a couple of years, but on May 1, 2015, we launched OPenn: Primary Digital Resources Available for Everyone, as a platform not for viewing images, but explicitly for downloading and reusing images and metadata. OPenn includes high-resolution master TIFF images and smaller JPEG derivatives, as well as robust metadata in TEI/XML using the Manuscript Description element. We started by hosting our own data, but today we host manuscript data for 17 institutions in the Philadelphia area, with others in the US and Europe (including Hebrew manuscripts from the John Rylands Library at the University of Manchester) to come online in the next year. One of the things we wanted was for users of OPenn to always be certain about what they could do with the data, so we decided that anything that goes into OPenn must follow the licenses that Creative Commons has approved for Free Cultural Works:

  • the CC Public Domain mark
  • CC0 (“CC-zero”), the Public Domain dedication for copyrighted works
  • CC-BY, the Creative Commons Attribution license
  • CC-BY-SA, the Creative Commons Attribution-Share Alike license

Note that licenses with a non-commercial clause are not approved for Free Cultural Works, and thus OPenn, by policy, is not able to host them.

You see where this is going.

So in March 2016, a year after OPenn was launched, and well after the Ramamala Library manuscripts had been photographed, Dr. Fleming asked about adding the Ramamala Library material to OPenn, in addition to having it in Penn in Hand (where it was already going online). It wasn’t until he asked this question that we realized that, under current policy, we couldn’t include that material in OPenn because of the license. Over the next few weeks we (meaning representatives of OPenn, and Dr. Fleming) had several conversations during which we floated various ideas:

  1. We could loosen the “Free Cultural Works” requirement and allow inclusion of the Ramamala data with the noncommercial license.
  2. We could build a parallel OPenn to contain data with a noncommercial license.
  3. We could use OPenn as a kind of carrot, to encourage the Ramamala library administration to loosen the noncommercial clause on the license and release the data as Free Cultural Works.

The third option was struck down almost as soon as it was suggested. There were, it turns out, highly sensitive discussions happening at the Ramamala Library during the course of the project that would have made such a request difficult, to say the least. As Dr. Fleming said in an email to me as we were discussing this talk, “It would be highly inappropriate and complex to try and revisit the copyright agreement as, even as it was, the act of a Western organization making digital copies of a small set of mss unraveled a dense set of internal issues related to private property and government control over cultural property (digital or otherwise).”

Before I return to our other two options I want to take a quick detour to talk a bit about how this conversation changed my thinking about Open Access in general, and about open access of non-Western material specifically.

I am by all accounts an evangelist for open access to medieval manuscript material. I like to complain: about institutions that keep their images under strict licenses, that make their images difficult or impossible to download, that charge hefty fees for manuscripts that have been digitized for years. OPenn is a reaction against that kind of thinking. We say: Here are our manuscript images! Here is our metadata! Here’s how you can download them. Do whatever you like with them. We own the books, but we acknowledge that they are our shared cultural heritage and in fact they belong to all of us. So the very least we can do is give you digital copies.

Until Ramamala, I would have told you that digital images of every manuscript written before modern times, in every culture, should be in the public domain and available to everyone. The people who wrote them are dead, and they wouldn’t have had the same conception of copyright ownership in any case, so why not? But suddenly I wasn’t so sure. I was forced to move beyond thinking in very black and white terms about “old vs. new” to thinking in a more nuanced way about “old vs. new”, sure, but also about “what is yours vs. what is ours” – and what ownership of the physical means for ownership of the digital. Again, until Ramamala I would have told you that physical owners owe it to the rest of us to allow the digital to sit in the public domain. But what does this mean for countries that have suffered under colonialism, and that have been forced for the past however many hundred years or more to share their cultural heritage with the west? In my somewhat unstructured thoughts, I keep coming back to the Elgin Marbles, which are just one example of cultural vandalism by the west (in this case, with the assistance of the Ottoman Empire, which ruled Greece at the time the marbles were taken), but a particularly egregious one. I’m sure you’re all familiar with the Elgin Marbles: they used to decorate buildings on the Acropolis, including the Parthenon, before they were removed to Britain between 1801 and 1812 and later purchased by the British Museum, where they can be seen today – although the current government of Greece has urged their return for many years.

Now the situation of the manuscripts in the Ramamala Library isn’t the same as that of the Elgin Marbles – we aren’t suggesting moving the physical collection to Penn – but I can’t help but believe there is a parallel here, and particularly a case to be made for respecting the ownership of cultural heritage by the cultures that created it. But what does that mean? Dr. Fleming’s comment above hints at the complexities around this question. Does “the culture that created the heritage” mean the current government? Or the cultural institutions? Or the citizens of the country, or the members of that culture, no matter where they live now? And once we figure out the who, how do we ensure they have the power to make decisions about their heritage objects? But I have taken my tangent far enough, and I want to get back to talking about our ideas for making the Ramamala Library data available in OPenn.

When we left off, we had two other ideas:

  1. We could loosen the “Free Cultural Works” requirement and allow inclusion of the Ramamala data with the noncommercial license.
  2. We could build a parallel OPenn to contain data with a noncommercial license.

We considered the first suggestion but decided very quickly that we didn’t want to open that can of worms. Our concern was that if we started allowing one collection with a noncommercial license, no matter what the circumstances, other institutions wanting to include such a clause could point to it and say, “Well, them – why not us?” We have in fact used our “Free Cultural Works”-only policy as a carrot for other institutions we host data for, including several museums and libraries in Philadelphia, and it works remarkably well (free hosting in exchange for an open license is apparently an attractive prospect). We don’t want to lose that leverage – particularly when it comes to institutions that own materials from former colonial countries, we have the ability, and thus the responsibility, to make that data available again, and it’s not a responsibility we take lightly.

The second idea, building a parallel OPenn to allow noncommercially licensed data, was more attractive but, we decided, just too much work at this time, with only one collection. However, it did seem like something that would be a good community project: an open access portal, similar to OPenn in design, but with policies designed for the concerns surrounding access and reuse of cultural heritage data from former colonial countries. If something like this is currently being designed I would love to hear about it, and I expect we would be very happy to put our Ramamala data into such a platform.

So what did we do? We decided to take the path of least resistance and not do anything. The Ramamala manuscripts are available on Penn in Hand, and, since the license information was entered into the MARC record notes field, the license is obvious (unlike most other materials in Penn in Hand). The images are still not easily downloadable, although we are working on a new page for the project through which people can request free access to the high-resolution TIFF files. However, they aren’t available on OPenn. And I’m not sorry about that. I’m not sorry that OPenn’s policy is strictly for “Free Cultural Works,” because I think within our community, serving mainly institutions in Philadelphia and other US and Western European cities, the policy helps us leverage collections into Open Access that might otherwise be under more strict licenses. But I do think it’s important for us to keep talking and thinking about how we can serve other communities with respect.

Edit: after posting this talk, I was contacted by Caroline Schroeder who pointed me to an article she wrote about similar issues with Coptic manuscripts. I share a link to that article here: Caroline T. Schroeder, “Shenoute in Code: Digitizing Coptic Cultural Heritage For Collaborative Online Research and Study” Coptica 14 (2015), 21-36.

Manuscript PDFs: Update

My last post was an announcement that I’d posted the University of Pennsylvania’s Schoenberg Collection manuscripts on Google Drive as PDF files, along with details on how I did it. This is a follow-up to announce that I’ve since added PDF files for UPenn’s Medieval and Renaissance Manuscript collection, AND for the Walters Art Museum manuscripts (which are available for download through The Digital Walters).

As with the Schoenberg Manuscripts, these two other collections are in their own folders, along with a spreadsheet you can search and browse to aid in discovery. You are free to download the PDF files and redistribute them as you wish. They are in the public domain.

The main directory for the manuscripts is here.

Enjoy!

Title: Initial "C" with St. Paul trampling Agrippa Form: Historiated initial "C," 12 lines Text: Psalm 97 Comment: The inscriptions on the scrolls read "Paulus/Agr[ipp]a." The second inscription is partially obliterated.
Source: Walters Ms. W.36, Touke Psalter, fol. 89r
Title: Initial “C” with St. Paul trampling Agrippa
Form: Historiated initial “C,” 12 lines
Text: Psalm 97
Comment: The inscriptions on the scrolls read “Paulus/Agr[ipp]a.”
The second inscription is partially obliterated.


It’s been a while since I rapped at ya

I’m not dead! I’m just really bad when it comes to blogging. I’m better at Facebook, and somewhat better at Twitter, and I do my best to update Tumblr.

The stated purpose of this blog is to give technical details of my work. This mostly involves finding data, and moving it around from one format to another. I use XSLT, because it’s what I know, although I’ve seen the promise of Python and I may eventually take the time to learn it well. I don’t know when that will happen, though.

I’ve taken to posting files and documentation on GitHub, so if you’re curious you can look there. If you’re familiar with my interests, and you share them, the most interesting things will be VisColl, a developing system for generating visualizations of manuscript codices showing elements of physical construction; DistributionVis, which is, as described on GitHub, “a wee script to visualize the distribution of illustration in manuscripts from the Walters Art Museum”; and ebooks, files I use to start the process of building ebooks from our digitized collection. (Finished ebooks are archived in UPenn’s institutional repository, where you can download them.)

VisColl – quire with an added leaf
DistributionVis – Different color lines refer to different types of illustrations or texts.

VisColl has legs: there’s a lot of interest in the community, and it is part of a major grant from the Mellon Foundation to the University of Toronto. Woohoo! DistributionVis is something I threw together in an afternoon because I wanted to see if I could. I thought ebooks were a nice way to provide a different way for people to access our collection. I’ve no idea if either of those two is any use to anyone, but I put them out there, because why not?

I do a lot of putting-things-out-there-because-why-not – it seems to be what I’m best at. So I’m going to continue doing that. And when I do, I shall try my very best to put it here too!

Until next time…

Disbinding Some Manuscripts, and Rebinding Some Others (presented at ICMS, Kalamazoo, MI, May 2014)

I presented my collaborative project on visualizing collation at the International Congress on Medieval Studies in Kalamazoo, Michigan, last week, and it was really well received. Also last week I discovered the Screen Recording function in QuickTime on my Mac. So, I thought it might be interesting to re-present the Kalamazoo talk in my office and record it, so people who weren’t able to make the talk could still see what we are up to. I think this is longer than the original presentation – 23 minutes! – so feel free to skip around if it gets boring. Also there is no editing, so um ah um sorry about that. (Watch out for a noise at 18:16 – I think my hand brushed the microphone – it’s unnerving if you’re not expecting it.)

We’ll also be presenting this work as a poster/demo at the Digital Humanities 2014 Conference in Lausanne this July.

How to get MODS using the NYPL Digital Collections API

Last week I figured out how to batch-download MODS records from the NYPL Digital Collections API (http://api.repo.nypl.org/) using my limited set of technical skills, so I thought I would share my process here.

I had a few tools at my disposal. First, I’m on a MacBook; I’m not sure how I would have done this had I been on a Windows machine. Second, I’m pretty good with XSLT. Although I have some experience with a few other languages (JavaScript, Python, Perl), I’m not really good at them. It’s possible one could do something like this using other languages and it would be more effective – but I use what I know. I also had a browser, which came in handy in the first step.

The first thing I had to do was find all the objects I wanted to get the MODS for. I wanted all the medieval objects (surprise!), so to get as broad a search as possible I opted for the “Search across all MODS fields” option (Method 4 in the API Documentation), which involves constructing a URL to stick in a browser. Because the API will return at most 500 results per search, I included that limit in my searches. I ended up constructing four URLs, since it turned out there were between 1500 and 2000 objects:

I plugged these into my browser, then saved the result pages as XML files in a directory on my Mac. Each of these results pages had a brief set of fields for each object: UUID (the unique identifier for the object, and the thing I needed to get the MODS), title, typeOfResource, imageID, and itemLink (the URL for the object in the NYPL Digital Collections website).

Next, I had to figure out how to feed the UUIDs back into the API. I thought about this for most of a day, and an evening, and then a morning. I tapped my network for some suggestions, and it wasn’t until Conal Tuohy suggested using document() in XSLT that I thought XSLT might actually work.

To get the MODS record for any UUID, you simply construct a URL that points to the MODS record on the NYPL server. The URLs look like this:

http://api.repo.nypl.org/api/v1/items/mods/[UUID].xml

For my first attempt, I wrote an XSLT document that used document(), constructing pointers to each MODS record as it processed the result documents I had saved from my browser. Had this worked, it would have pulled all the MODS records into a new document during processing. I use Oxygen for almost all of my XML work, including processing, but when I tried to process my first result document I got an I/O error. Of course I did – the API doesn’t allow just any old person in. You need to authenticate, and when you sign up with the API they send you an authentication token. There may be some way to authenticate through Oxygen, but if so I couldn’t figure it out. So, back to the drawing board.
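For the curious, that first attempt looked roughly like the sketch below. It is not the exact stylesheet I wrote, and the //uuid path is an assumption about how the saved result documents name the identifier field – adjust it to match the real markup. As described above, the approach fails because document() has no way to send the authentication token.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: pull every MODS record into one output document using document().
     The //uuid path is assumed; it must match the saved search-results markup.
     This fails against the NYPL API because document() cannot authenticate. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <modsRecords>
      <xsl:for-each select="//uuid">
        <!-- Build the MODS URL for this UUID and try to fetch it -->
        <xsl:copy-of select="document(concat('http://api.repo.nypl.org/api/v1/items/mods/', normalize-space(.), '.xml'))"/>
      </xsl:for-each>
    </modsRecords>
  </xsl:template>
</xsl:stylesheet>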

Over lunch on the second day, I picked the brain of my colleague Doug Emery. Doug and I worked together on the Walters BookReaders (which are elsewhere on this site), and I trust him to give good advice. We didn’t have much time, but he suggested using a curl request through the terminal on my Mac – maybe I could do something like that? I had seen curl mentioned on the API documentation as well, but I hadn’t heard of it and certainly hadn’t used it before. But I went back to my office and did some research.

Basically, curl is a command-line tool for grabbing the content of whatever is at the other end of a URL: you give it a URL, and it sends back whatever it finds there. So, if you send the URL for an NYPL MODS record, it will send the MODS record back. There’s an example on the NYPL API documentation page which incorporates the authentication token. Score!

curl "http://api.repo.nypl.org/api/v1/items?identifier_type=local_bnumber&identifier_val=b11722689" -H 'Authorization: Token token="abcdefghijklmn"'

where ‘abcdefghijklmn’ is the authentication token you receive when you sign up (link coming soon).

Next, I needed to figure out how to send between 1500 and 2000 URLs through my terminal, without having to do them one by one. Through a bit of Google searching I discovered that it’s possible to replace the URL in the command with a pointer to a text file containing a list of URLs, in the format url = [url]. So I wrote a short XSLT to process each of the four result documents, pulling out the UUIDs, constructing URLs that pointed to the corresponding MODS records, and putting them in the correct format (there’s a sketch of it below). Then I put pointers to those files in my curl command:

curl -K "nypl_medieval_4_forCurl.txt" -H 'Authorization: Token token="[my_token]"' > test.xml
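For reference, the short UUID-extraction stylesheet looked something like this sketch. Again, the //uuid path is my assumption about the element holding each identifier in the saved result documents; the important part is the output: one url = line per object, which is the format curl’s -K option expects.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: read a saved search-results document and emit one "url = ..." line
     per object, pointing at that object's MODS record, for use with curl -K.
     The //uuid path is assumed; adjust it to the real results markup. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//uuid">
      <xsl:text>url = http://api.repo.nypl.org/api/v1/items/mods/</xsl:text>
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:text>.xml&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>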

Voila – four documents chock full of MODS goodness. And I was able to do it with a Mac terminal and just a little bit of XSLT.

Q: How do you teach TEI in an hour?

A: You don’t! But you can provide a substantial introduction to the concept of the TEI, and explain how it functions.

On June 4 I participated in PhillyDH@Penn, a day of workshops and unconference sessions sponsored by PhillyDH and held in my own beloved Van Pelt-Dietrich Library on the University of Pennsylvania campus. I was sick, so I wasn’t able to participate fully, but I was able to lead a one-hour Introduction to TEI. I aimed it at absolute beginners, with the intention of a) giving the audience an idea of what TEI is and what it’s for (to help them answer the question, “Is TEI really what I need?”), and b) explaining enough about the TEI that they would know a bit of something walking into their first “real” (multi-hour, hands-on) TEI workshop. I got a lot of good feedback, so hopefully it did its job. And I do hope to have the opportunity to follow this up with more substantial workshops.
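To give a sense of what “enough about the TEI” looks like at that level, here is the kind of toy example I show absolute beginners – a complete, minimal TEI document, invented for teaching and not drawn from any real project:

<!-- A minimal TEI document: a header describing the (imaginary) source,
     and a lightly encoded transcription with a couple of names tagged. -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A sample transcription</title>
      </titleStmt>
      <publicationStmt>
        <p>For teaching purposes only.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Transcribed from an imaginary manuscript leaf.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>On this leaf <persName>Lupus</persName> receives a book of arithmetic
         from <persName>Einhard</persName>.</p>
    </body>
  </text>
</TEI>

Even a snippet this small is enough to introduce the header/text split, the idea of descriptive (rather than presentational) markup, and the question beginners most need to answer for themselves: what do I actually want to encode?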

Slides (in PDF format) are posted here.

EDIT: Need to add that these slides owe a ton to James Cummings, with whom I have taught TEI and to whom I owe much of what I know about it!