The Uncanny Valley and the Ghost in the Machine: a discussion of analogies for thinking about digitized medieval manuscripts

This is a version of a paper I presented at the University of Kansas Digital Humanities Seminar, Co-Sponsored with the Hall Center for the Humanities on September 17, 2018.

Good afternoon, and thank you everyone for coming today. Thanks especially to the Digital Humanities Seminar and the Medieval and Early Modern Seminar for inviting me today, and to Peter and Elspeth for being such excellent and kind hosts.

What I’d like to do today is present an overview of some of my current work on how to think about manuscripts and digitized manuscripts, and to present some newer work, all of which I must warn you is still in a bit of an embryonic state. I’m hoping that we have time in Q&A to have a good discussion, and that I might be able to learn from you.

First, a bit about me so you know where we’re starting.

I am a librarian and curator at the Schoenberg Institute for Manuscript Studies (SIMS), which is a research and development group in the Kislak Center for Special Collections, Rare Books and Manuscripts at the University of Pennsylvania. I’ve been at Penn, at SIMS, for a little over five years, since SIMS was founded. The work we do at SIMS is focused broadly on manuscripts and on digital manuscripts, particularly but not solely medieval manuscripts – we have databases we host, we work with the physical collections in the library, we collaborate on a number of projects hosted at other institutions. Anything manuscript related, we’re interested in. My place in this group is primarily that of resident digital humanist and digital librarian – and let me tell you, these are two different roles.

At the moment, in my librarian role, I’m co-PI of a major project, funded by the Council on Library and Information Resources, to digitize and make available the western medieval manuscripts from 15 Philadelphia area institutions – about 475 manuscript codices in total. We call this project Bibliotheca Philadelphiensis, or the library of Philadelphia – BiblioPhilly for short. The work I’ve been doing so far on this project is largely related to project management: making sure the manuscripts are being photographed on time, that the cataloging is going well (and at the start we had to set up cataloging protocols and practices). The manuscripts are going online as they are digitized, in the same manner that all our manuscripts do:

They go on OPenn: Primary Digital Resources Available for Everyone, which is a website where we make available raw data as Free Cultural Works. For BiblioPhilly, this means images, including high-resolution master TIFFs, in the Public Domain, and metadata in the form of TEI Manuscript Descriptions under CC0 licenses – this means released into the public domain.

OPenn has a very specific purpose: it’s designed to make data available for reuse. It is not designed for searching and browsing. There is a Google search box, which helps, but not the kind of robust keyword-based browsing you’d expect for a collection like this.

And the presentation of the data on the site is also simple – this is an HTML rendition of a TEI file, with the information presented very simply and image files linked at the bottom. There’s no page-turning facility or gallery or filmstrip-style presentation. And this is by design. We designed it this way, because we believe in separating out data from presentation. The data, once created, won’t change much. We’ll need to migrate to new hardware, and at some point we may need to convert the TEI to some other format. The technologies for presentation, on the other hand, are numerous and change frequently. So we made a conscious decision to keep our data raw, and create and use interfaces as they come along, and as we wish. Releasing the data as Free Cultural Works means that other people can create interfaces for our data as well, which we welcome.

So we are working on an interface for the Bibliotheca Philadelphiensis data right now. We’re partnering with a development company that worked with the Walters Art Museum on their site called Ex Libris (manuscripts.thewalters.org), and this is the site that most people will use to interface with the collection. It has the browsing facility you’d expect.

Here we’re selecting a Book of Hours.

Let’s say we just want 14th century Books of Hours.

So here’s Lewis E 89,

we have a Contents and Decorations menu so we can browse directly to a specific text, here’s the start of the Office of the Dead with a four-line illuminated initial.

And then further down the page is all the data from the record you’d expect to see on any good digitized manuscript site. This is pulled from the TEI Manuscript Descriptions and indexed in a backend database for the site, while the images are pulled directly from their URLs on OPenn.
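
To make that split between data and presentation a little more concrete, here is a minimal sketch of how a site might pull a few fields out of a TEI Manuscript Description for indexing. It is an illustration only, not the BiblioPhilly code: the sample record is hypothetical and heavily trimmed (it would not validate against the TEI schema), the `index_record` function is a made-up name, and the use of the `n` attribute on `msItem` to carry a folio number is an assumption about the encoding.

```python
import xml.etree.ElementTree as ET

# The TEI namespace is standard; everything in SAMPLE_TEI is invented.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily trimmed record. Real descriptions are much fuller.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <sourceDesc>
        <msDesc>
          <msIdentifier>
            <settlement>Philadelphia</settlement>
            <repository>Example Library</repository>
            <idno>Example MS 1</idno>
          </msIdentifier>
          <msContents>
            <msItem n="1r"><title>Calendar</title></msItem>
            <msItem n="13r"><title>Office of the Dead</title></msItem>
          </msContents>
        </msDesc>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def index_record(tei_xml):
    """Flatten a TEI manuscript description into a dict ready for indexing."""
    root = ET.fromstring(tei_xml)
    ident = root.find(".//tei:msIdentifier", TEI_NS)

    def text_of(tag):
        # Empty string when the element is absent, so the index stays uniform.
        return ident.findtext(f"tei:{tag}", default="", namespaces=TEI_NS)

    # One (folio, title) pair per msItem, in document order.
    contents = [
        (item.get("n"), item.findtext("tei:title", default="", namespaces=TEI_NS))
        for item in root.iterfind(".//tei:msItem", TEI_NS)
    ]
    return {
        "settlement": text_of("settlement"),
        "repository": text_of("repository"),
        "idno": text_of("idno"),
        "contents": contents,
    }


record = index_record(SAMPLE_TEI)
for folio, title in record["contents"]:
    print(record["idno"], folio, title)
```

The point of the sketch is that the interface only reads the description; the TEI file itself stays untouched, so a different interface could index the same record in a completely different way.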

So this is great. We’re doing very important work making data about manuscripts available to the world, in ways that make it easy to reuse them, whether to build new projects or to just publish an image in a book or on a website. And I want to make it clear that I don’t intend anything in the rest of my talk to undermine this vital work. But. But.

I mentioned that I’m also the resident Digital Humanist on our team. And in addition to the technical work involved in that, the work of building tools (which I promise I will get to before this talk is finished), I do a lot of thinking about what it is we do. And there’s a question, one question that keeps me up at night and drives the focus of my current research. The question comes out of a statement. And that statement is:

The digitized manuscript is not the manuscript itself.

Or, as I prefer it, in meme form

This shouldn’t be a controversial statement, to anyone who has ever worked with a manuscript and then used a digital version of it. It’s obvious. This is an obvious statement. And yet we undermine this statement all the time in the ways we talk about digitized manuscripts – I do it too. How many times have you said, or heard someone else say, “I saw the manuscript online” or “I consulted it online” or “I used it online”? Not pictures of the manuscript or the digitized manuscript but the manuscript? So one question to come out of this is:

If the digitized manuscript isn’t the manuscript, then what is it?

This is not actually the question that keeps me up nights, because although this is interesting, it’s not practical or useful for me. My job is to make these things available so you can use them. So the question that actually keeps me up at night is:

If a digitized manuscript isn’t a manuscript, how can we present it in ways that explore aspects of the original’s manuscript-ness, ethically and with care, while both pushing and respecting the boundaries of technology? Although this practice of thinking about what it means to digitize a manuscript and what that becomes seems really philosophical, this is a really practical question.

As a librarian who works on the digitization and online presentation of medieval manuscripts, I think it’s really important for me and others in my position to be mindful about what exactly it is that we do in our work.

So for the rest of our time today I’ll first go over a few things I’ve already worked out a bit for other papers, and then move on to the Ghost in the Machine, which is new and confuses me in a way that my previous stuff hasn’t.

Outline (my work so far):

  • Uncanny valley
  • Memes & Terms
  • Transformative works
  • Ghost in the Machine*

The Uncanny Valley

We’ll start with the Uncanny Valley, which I presented on at the International Congress on Medieval Studies in Kalamazoo this May, although I have been thinking about applying the uncanny valley to digitized manuscripts since 2008.

The uncanny valley is a concept that comes out of robotics and it has to do with how humans perceive robots as they become more like humans. The concept was first described by Masahiro Mori in a 1970 article in the Japanese journal Energy, and it wasn’t translated into English completely until 2012. In this article, Mori discusses how he envisions people responding to robots as they become more like humans. The article is a thought piece – that is, it’s not based on any data or study. In the article, Mori posits a graph, with human likeness on the x axis and affinity on the y axis. Mori’s proposition is that, as robots become more human-like, we have greater affinity for them, until they reach a point at which the likeness becomes creepy, or uncanny, leading to a sudden dip into negative affinity – the uncanny valley.

M. Mori, “The uncanny valley,” Energy, vol. 7, no. 4, pp. 33–35, 1970 (in Japanese); M. Mori, K. F. MacDorman and N. Kageki, “The Uncanny Valley [From the Field],” IEEE Robotics & Automation Magazine, vol. 19, no. 2, pp. 98–100, June 2012 (translated into English) (https://ieeexplore.ieee.org/document/6213238/)

Manuscripts aren’t people, and digitized manuscripts aren’t robots, so I distilled four relevant points out of Mori’s proposition, and then drew four parallel points for our manuscript discussion:

First, robots are physical objects that resemble humans more or less (that is the x-axis of the graph)

Second, as robots become more human-like, people have greater affinity for them (until they don’t – uncanny valley) – this is the y-axis of the graph

Third, the peak of the graph is a human, not the most human robot

Fourth, the graph refers to robots and to humans generally, not robots compared to a specific human

And the four parallel points:

First, digitized manuscripts are data about manuscripts (digital images + structural metadata + additional data) that are presented on computers through interfaces. Digitized manuscripts are fragments, and in visualizing the manuscript on a computer we are reconstructing them in various ways. These presentations resemble the parent manuscript more or less (this is the x-axis)

Second, as presentations of digitized manuscripts become more manuscript-like, people have greater affinity for them (until they don’t – uncanny valley) – this is the y-axis

Third, the peak of the graph is the parent manuscript, not the most manuscript-like digital presentation

Fourth, the graph refers to a specific manuscript, not to manuscripts generally

I think that this is the major difference in applying the concept of the uncanny valley to manuscripts vs. robots: while robots are general, not specific (i.e., they are designed and built to imitate humans and not specific people), the ideal (i.e., most manuscript-like) digital presentation of a manuscript would need to be specific, not general (i.e., it would need to be designed to look and act like the parent manuscript, not like any old manuscript).

The Effect of Movement is an important piece of the functioning of the uncanny valley for robotics, and it is also important when thinking about manuscripts. Manuscripts, after all, are complex physical objects, much as humans are complex physical objects. Manuscripts have multiple leaves, which are connected to each other across quires, and the quires are then bound together and, often, connected to a binding. So moving a page doesn’t just move that page, much as bending your leg doesn’t just move your leg. Turning the leaf of a manuscript might tug on the conjoined leaf, push against the binding, pull on the leaves preceding and following – a single movement provoking a tiny chain reaction through the object, and one which, with practice, we are conditioned to recognize and expect. If things move other than we expect them to (as zombies move differently from humans in monster movies) our brains will recognize that there is something “off” about them.

Here is a video of me turning the pages of Ms. Codex 1056, a Book of Hours from the University of Pennsylvania. This will give you an idea of what this manuscript is like (its size, what its pages look like, how it moves, how it sounds). It’s a copy of the manuscript, showing just a few pages, and the video was taken in a specific time and space with a specific person. If you came to our reading room and paged through this manuscript, it would not look and act the same for you.

Now let’s take a look at this manuscript presented through Penn’s page turning interface, Penn in Hand (this is the University of Pennsylvania’s official interface and is separate from the BiblioPhilly interface that is under development). When you select the next page, the opening is simply replaced with the next opening (after a few seconds for loading). The images are, literally, flat. This is two images, taken at different times (perhaps within hours of each other), presented side-by-side to give the impression of a book opening. It’s a reconstruction using the digitized fragments.

These are two very different ways for looking at this manuscript through a digital interface, and they do very different things, illustrate different aspects of the book, and can be used in different ways. We’ll come back to this later in the talk, but for now I would like to move on to memes and terms.

Memes and Terms

I find the Uncanny Valley useful for providing a framework for presentation technologies along a spectrum from “less like the manuscript” to “more like the manuscript”. Next, I want to say a little bit about the terms that people choose to use to refer to digitized manuscripts, how we talk about them. (I initially presented this material this summer in a lecture for Rare Book School.) I think that in this instance, terms function rather like memes, so I want to start with the meme (you’ll notice that I’m a fan of memes).

The word meme was coined in 1976 by Richard Dawkins in his book The Selfish Gene. In the Oxford English Dictionary, meme is defined as “a cultural element or behavioral trait whose transmission and consequent persistence in a population, although occurring by non-genetic means (especially imitation), is considered as analogous to the inheritance of a gene.” Dawkins was looking for a term to describe something that had existed for millennia – as long as humans have existed – and the examples he gave include tunes, ideas, catchphrases, clothes fashions, ways of making pots or building arches. These are all things that are picked up by a community, ideas and concepts that move among members of that community, are imitated and modified, and which are frequently moved on to new communities as well where the process of imitation and modification continues. More recently the term meme has been applied specifically to images or text shared, often with modification, on the Internet, particularly through social media: If you’ve ever been RickRolled, you have been on the receiving end of a particularly popular and virulent meme.

Following this theory, terms work like this:

  1. A term begins with a specific meaning (e.g., outlined in the OED, citing earlier usage),
  2. A scholar adopts the term because we need some way to describe this new thing that we’ve created. So we appropriate this term, with its existing meaning, and we use it to describe our new thing.
  3. The new thing takes on the old meaning of the term,
  4. The term itself becomes imbued with meaning from what we are now using it to describe.
  5. The next time someone uses that term, it carries along with it the new meaning.

There are three terms that I see used a lot to refer to digitized manuscripts, but although we use these three terms – facsimile, surrogate, and avatar – to refer to digitized manuscripts, it is clear that these terms don’t mean the same thing, and that by choosing a specific term to refer to digitized manuscripts we are drawing attention to particular aspects of them. Facsimile literally means make similar, so if I call a digitized manuscript a facsimile, I draw attention to its status as a copy. Surrogate, on the other hand, generally means something that stands in for something else. So if I call it a surrogate, I draw attention to its status as a stand-in for the physical object. Avatar, finally, refers to manifestation, originally a god manifesting in human form, but now used to refer to people or physical objects manifesting in digital form. So if I call it an avatar, I draw attention to its status as a representation of the physical object in a digital world. Not a copy, not a replacement, but another version of that thing.

I would like to remark on our apparent desire as a community to apply meaning to digital versions of manuscripts by using existing terms, rather than by inventing new terms. After all, we coin new words all the time, so it would be understandable if we decided to make up a new term rather than reusing old ones. But as far as I know we haven’t come up with a completely new term for “digitized medieval manuscript,” and if anyone has, it hasn’t caught on enough to be reused widely in the scholarly community. I expect this comes from a desire to describe a new thing in terms that are understandable, as well as to define the new thing according to what came before. Digital versions of manuscripts are new things that have a close relationship with things that existed before, so while we want to differentiate them we also want to be able to acknowledge their similarities, and one way to do that is through the terms we call them.

Like pushing an idea through different memes, pushing the concept of a digitized manuscript through different terms gives us flexibility in how we consider them and how we explain them, and our feelings about them, to our audiences. That we can so easily apply terms with vastly different meanings to the digital versions of manuscripts says something about the complexity of these objects and their digital counterparts.

Transformative Works / Language of Care

Another aspect that I’ve been looking into is coming up with a way to talk about manuscripts and manuscript digitization using the language of transformative works. (First presented at the International Medieval Congress, Leeds)

Transformative work is a concept that comes out of fandom: that is, the fans of a particular person, team, fictional series, etc. regarded collectively as a community or subculture. We typically talk about fandom in relation to sports, movies, or TV shows, but people can be fans of many things (including manuscripts). As defined on the Fanlore wiki:

“Transformative works are creative works about characters or settings created by fans of the original work, rather than by the original creators. Transformative works include but are not limited to fanfiction, real person fiction, fan vids, and graphics. A transformative use is one that, in the words of the U.S. Supreme Court, adds something new, with a further purpose or different character, altering the [source] with new expression, meaning, or message.”

In some fandom communities, transformative works play a major role in how the members of that fandom communicate with each other and how they interact with the canon material (“canon” being the term fans use to refer to the original work). Transformative works start with canon but then transform it in various ways to create new work – new stories, new art, new ideas, possible directions for canon to take in the future, directions canon would never take but which are fun or interesting to consider.

Although it’s still quite niche, there is a small but growing academic movement to apply the concept of transformative work to historical texts. Some of this work is happening through the Organization for Transformative Works, which among other things hosts Archive of Our Own, a major site for fans to publish their fanworks, and provides legal advocacy for creators of fanworks.

The Organization for Transformative Works also publishes a journal, Transformative Works and Cultures, and in 2016 they published an issue “The Classical Canon and/as Transformative Work,” which focused on relating ancient historical and literary texts to the concept of fan fiction (that is, stories that fans write that feature characters and situations from canon). There is also a call for papers currently open for an upcoming special journal issue on “Fan Fiction and Ancient Scribal Culture,” which will “explore the potential of fan fiction as an interpretative model to study ancient religious texts.” This special issue is being edited by a group of scholars who lead the “Fan Fiction and Ancient Scribal Cultures” working group in the European Association of Biblical Studies, which organized a conference on the topic in 2016.

You will note that the academic work on transformative works I’ve cited focuses specifically on fan fiction’s relationship with classical and medieval texts, which makes a fair amount of sense given the role of textual reuse in the classical and medieval world. In her article “The Role of Affect in Fan Fiction,” published in the Transformative Works and Cultures special issue of 2016, Dr. Anna Wilson places fan fiction within the category of textual reception, wherein texts from previous times are received by and reworked by future authors. In particular, Dr. Wilson points to the epic poetry of classical literature, medieval romance poetry, and Biblical exegesis, but she notes that comparisons between fan fiction and these past examples of textual reception are undertheorized, and leave out a major aspect of fan fiction that is typically not found, or even looked for, in the past examples. She says, “To define fan fiction only by its transformative relationship to other texts runs the risk of missing the fan in fan fiction—the loving reader to whom fan fiction seeks to give pleasure. Fan fiction is an example of affective reception. While classical reception designates the content being received, affective reception designates the kind of reading and transformation that is taking place. It is a form of reception that is organized around feeling.” (Wilson, 1.2)

For my paper at the International Medieval Congress at Leeds earlier this summer I used the manuscript University of Pennsylvania LJS 101 as an example of “medieval manuscript as transformative work,” not as a piece of data to be mined for its texts, but as a transformative work in itself, centered around a language of care. I’m not comfortable applying the concept of affective reception to the people who created and worked with medieval manuscripts – I don’t want to suggest that these people loved their manuscripts the same way that I do – but I do want to explore the idea that these people cared about them, and that other people have cared about these manuscripts over time enough that they survive to live now in libraries and private collections. Their interests may have been scholarly, or based on pride of ownership, or even based on curiosity, but whatever their reasons for caring for the manuscripts, they did care, and we know they cared because of the physical marks that they have left on these books. Manuscripts that survive also often show examples of lack of care, damage and so forth, so those elements need to be included in this framework as well.

However, for this paper I want to suggest that we consider that manuscripts relate to digitized manuscripts within the framework of Transformative Works in the same way that so-called canon works relate to fan works. In this framework, we design and build visualizations and interfaces for digitized manuscripts in the same way that fans create fan fiction and fan art: by respecting the canon but adding something new, with a further purpose, altering the source with new expression, meaning, or message. Such visualizations or interfaces would be embodiments of Dr. Wilson’s concept of affective reception. They may look and function similar to scholarly interfaces, and may even be able to be used for scholarship, but they would primarily be designed to elicit an emotional response from the user.

The Ghost in the Machine

And now we get to our discussion of The Ghost in the Machine, which was a term coined by philosopher Gilbert Ryle in 1949 to describe the concept of mind-body dualism, that is, that the human mind and the body are distinct and separable.

In the last chapter of 2008’s Printing the Middle Ages, “Coda; The Ghost in the Machine; Digital Avatars of Medieval Manuscripts,” Sian Echard talks of “the ominous implications of the phrase ‘the ghost in the machine’” in the consideration of modern reception, particularly digital reception, of medieval manuscripts. As she notes in her footnote, “the phrase has become a commonplace of digital popular culture and reflection on it, and I use it in that spirit here. It will become clear, however, that the phrase’s origins in discussions of Cartesian dualism are oddly relevant to a consideration of the dis- and re-embodiment of medieval text-objects in digital avatars.”

The idea of applying the concept of the Ghost in the Machine, that is the separability or inseparability of the mind of the manuscript and the body of the manuscript, brings to mind a few other, perhaps similar ways of thinking about art and communication. In preparing for this talk I read, in addition to Echard’s piece, quoted a minute ago, both Walter Benjamin’s “The Work of Art in the Age of Mechanical Reproduction,” his 1936 treatise on what photography means for art, and Marshall McLuhan’s chapter “The Medium is the Message” from his 1964 book Understanding Media: The Extensions of Man. There are a few things from these texts that I think are relevant for an application of the Ghost in the Machine to a discussion of manuscripts and digital manuscripts.

Benjamin is concerned, among other things, with the aura of a work of art, which he at one point defines as its uniqueness, closely related to its authenticity. Because of the nature of photography (one of the means of mechanical reproduction identified by Benjamin) it is simply not possible to reproduce the aura of an artwork, because art exists in the world, and reproducing a piece of art in a photograph takes it out of that world. In his words,

“Even the most perfect reproduction of a work of art is lacking in one element: its presence in time and space, its unique existence at the place where it happens to be. This unique existence of the work of art determined the history to which it was subject throughout the time of its existence. This includes the changes which it may have suffered in physical condition over the years as well as the various changes in ownership. The traces of the first can be revealed only by chemical or physical analyses which it is impossible to perform on a reproduction; changes of ownership are subject to a tradition which must be traced from the situation of the original.”

McLuhan, on the other hand, is more concerned with how information is conveyed to us – the medium being the physical conveyance of the message, which he takes pains to point out is different from the content, which is something else altogether. In fact, he says, “the ‘content’ of any medium is always another medium. The content of writing is speech, just as the written word is the content of print, and print is the content of the telegraph.” The message instead is “the change of scale or pace or pattern that it introduces into human affairs.” So the message for us isn’t the text or illustrations in a manuscript, the message is the means by which that content makes it to us through whatever medium.

For example, the medium could be a 15th century Book of Hours, but the message is that the Psalms are divisible into groups, the Penitential Psalms are one such group, and at this point in time it was important for these Psalms to be set apart in these Books using some agreed upon conventions. That is the message. The message of a digital photograph of the Penitential Psalms is altogether different, because the medium is different. The medium is bits and bytes, arranged through a system of machines, to display something that looks like those pages in that Book of Hours. The concept of digital photography as a medium with a message is very similar to McLuhan’s description of electric light:

For it is not till the electric light is used to spell out some brand name that it is noticed as a medium. Then it is not the light but the “content” (or what is really another medium) that is noticed. The message of the electric light is like the message of electric power in industry, totally radical, pervasive, and decentralized.

Likewise, I think, the medium of pervasive digital photography for medieval manuscripts is radical for us today. We are creating relatively massive amounts of fragmentary data about our manuscripts. What do we do with it?

What the analogy of the Ghost in the Machine forces us to do first is to determine what is the ghost, and what is the machine. Being informed by Benjamin’s concept of the aura, McLuhan’s concept of the medium and the message (distinct from content), Dr. Wilson’s concept of affective reception, and also the relationship between affinity and “manuscript-ness” in the uncanny valley, I’d like to propose that the ghost of a manuscript is very close to Benjamin’s concept of the aura, and that the aura is what informs our affinity towards any interface, and is also what we as humans are set to respond to emotionally. The aura, the ghost, is what makes a manuscript unique, and what allows us to identify it. Earlier, when talking about the uncanny valley, I mentioned that if you had Ms. Codex 1056 in your hands tomorrow it would look different than it did in the video I showed you. That difference between what you would see in your interactions with the manuscript, and what you see in my video, that’s the ghost. And it is, I think, impossible to reproduce the ghost of the manuscript, and impossible to visualize it completely in any digital interface I can comprehend.

What we can do, though, is reproduce aspects of the machine – that is, the manuscript stripped of its ghost, removed from time and space and placed in a kind of virtual vacuum – and use those to construct something that gives us insights into the parts of the ghost that aren’t themselves reproducible. As we do this we can also be mindful of what emotional responses the interface might provoke in the users.

To be fair to those of us providing access to digitized manuscripts, the first part of this – reproducing aspects of the machine – is pretty much what we’re doing now and what we’ve always done, although from my experience the thinking that happens around the digitization process is really quite prosaic, literal, practical, and we don’t think enough about how digitization or other photographic reproduction presents us with something that is very different from the manuscript while still telling us something about it. For example:

Early facsimiles that present the folios disembodied and set within pages of a modern book. You can read the text but don’t get a real sense of the physicality of the manuscript.

Microfilm, designed for access to the text, giving in most cases even less of a sense of the physical object.

Luxe facsimile, designed to give an impression of the physical book, although in most cases the quire structure and physicality is not exactly like the original.

Digital images presented in a page-turning interface to give the impression of the page opening, which can be zoomed in on for reading. Although the photography is high-res, the images are only of the page, and the binding is photographed separately, so you don’t get a sense of the book as a three-dimensional object.

Digital images presented in a page-turning interface focused on the movement of the electronic page, although the movement doesn’t correspond with the movement of the physical manuscript.

Interactive 3D images that model the topography of the page, but without the context of the full manuscript. This is a miniature of St. Luke from Bill Endres’s Manuscripts of Lichfield Cathedral project.

A visualization using VisColl, a project out of SIMS, which models and visualizes the physical construction of manuscripts, generating quire diagrams and using digitized folio images to reconstruct conjoins.

The VisColl system integrated into the BiblioPhilly interface I showed you at the beginning of the talk, so you can get a sense of how the manuscript is constructed alongside the more usual facing-page view.

Each of these interfaces, whether consciously or not, has been designed to illustrate some aspect of the manuscript and to ignore others. What I’d like to suggest is that we simply start being conscious of our choices as we design and build interfaces. Going back again to the uncanny valley, this is what Mori would call Escape by Design.

Uncanny Valley: Escape by Design

Mori proposes that designers focus their work in the area just before the uncanny valley, creating robots that have lower human likeness but maximum affinity. He says:

In fact, I predict that it is possible to create a safe level of affinity by deliberately pursuing a nonhuman design. I ask designers to ponder this. To illustrate the principle, consider eyeglasses. Eyeglasses do not resemble real eyeballs, but one could say that their design has created a charming pair of new eyes. So we should follow the same principle in designing prosthetic hands. In doing so, instead of pitiful looking realistic hands, stylish ones would likely become fashionable.

So after a lot of philosophical thinking, this gives us something practical to aim for. Let’s build page-turning interfaces that are conspicuous in their use of flat digital images, let’s do more with 3D, RTI, and MSI to show us parts of the manuscript we can’t see under regular institutional photography, let’s do more work using data about the manuscript to organize our flat images in new and interesting ways, all with a mind towards informing us about that unreproducible ghost.

Now, this is an ideal, and there are practical reasons this won’t always work. Interface design and development is expensive, and we – the librarians responsible for the digitization – usually aren’t the ones building the interfaces. We’re using interfaces other people build for us: either we buy them from a company or we participate in consortia like the International Image Interoperability Framework (IIIF) Consortium, which provides open access tools for the community. Very few institutions have the wherewithal to build their own interfaces from scratch. At Penn we’re very lucky, and the interfaces I’ve shown you today, VisColl for visualizing the quire structures and the BiblioPhilly interface with that integrated, are open source code, so they could be adapted by other institutions. But that adaptation work also requires resources. There’s no free answer here.

There’s also the issue of the data itself. Manuscript digitization happens systematically, with each book getting a very specific set of flat images out the other end (every page gets one image, front and back cover, hopefully the edges and spine but not all libraries include those shots in their usual workflow) so it’s not usually possible to take the time to point out interesting anomalies (imagine a bookmark that is conspicuous when you’re sitting down with the book, but which goes unphotographed because it’s not listed in the list of shots provided to the photographer). And 3D modeling, RTI, MSI, are not available to most library digitization facilities, and even if they were there would need to be policies and procedures to determine which manuscripts get such special treatment and which don’t. So it’s great for me to stand up here and say this is what we should do, but a whole other thing to do it in any practical way.

Dot’s 2018 Conception of the Uncanny Valley of Digitized Manuscripts

I’m over time, I’m sorry, so I’ll just close here with the question that keeps me up at night. I’d like to talk to you about this and see if you have ideas or suggestions, or if you think this is moving in the right direction. Thank you.

Reading and Writing Philadelphia, University of Pennsylvania LJS 101, c. 850–1100

This is a version of a paper I presented at the International Medieval Congress at Leeds on July 3, 2018, in the session “The Origins, Effects, and Memory of Caroline Minuscule, II” sponsored by the Network for the Study of Caroline Minuscule. 

Today I’m going to tell you about UPenn LJS 101, which is the oldest codex we have in the University of Pennsylvania Libraries by at least 150 years and one of only two codices in our collection written in Caroline minuscule (the other being UPenn Ms. Codex 1058, dated to ca. 1100 and located to Laon) – we also have one leaf written in Caroline minuscule.

The bulk of the manuscript, folios five through 44 (Quires two through six), is dated to the mid-9th century, but in the early 12th century replacement leaves were added for the first four leaves and for the last 20 leaves. LJS 101 reflects the educational program set up in the Carolingian court by Alcuin, featuring a copy of Boethius’s translation of Aristotle’s De Interpretatione (which was commonly called Periermenias, the name also used in this text) and a short commentary on that text (also called Periermenias), along with a few other shorter texts.

I want to admit up front that I don’t have a serious scholarly interest in LJS 101. I’m not a Carolingianist, I don’t study Alcuin, or Bede, or Aristotle. I’m a librarian, and the focus of my work is manuscript digitization and visualization, so I spend a lot of time thinking about manuscripts, how they’re put together, how to digitize them, and how to visualize them in ways that reveal truths about the physical object, ideally without fetishizing them.

But I also love manuscripts. I wouldn’t be doing what I do if I didn’t. I love the way they look, especially books that have been well-used: the imperfect edges, worn ink, and the many and varied signs that people have had their hands on these manuscripts, that they were well-used and well-loved. And there’s no book I love more than LJS 101.

So what I want to do today is tell you about LJS 101, but I want to put my discussion within the context of that love: specifically, I want to talk about LJS 101 within the frame of transformative works.

A transformative work is a concept that comes out of fandom: that is, the fans of a particular person, team, fictional series, etc. regarded collectively as a community or subculture. We typically talk about fandom in relation to sports, movies, or TV shows, but people can be fans of many things (including manuscripts). As defined on the Fanlore wiki:

Transformative works are creative works about characters or settings created by fans of the original work, rather than by the original creators. Transformative works include but are not limited to fanfiction, real person fiction, fan vids, and graphics. A transformative use is one that, in the words of the U.S. Supreme Court, adds something new, with a further purpose or different character, altering the [source] with new expression, meaning, or message.

In some fandom communities, transformative works play a major role in how the members of that fandom communicate with each other and how they interact with the canon material (“canon” being the term fans use to refer to the original work). Transformative works start with canon but then transform it in various ways to create new work – new stories, new art, new ideas, possible directions for canon to take in the future, directions canon would never take but which are fun or interesting to consider.

Although it’s still quite niche, there is a small but growing academic movement to apply the concept of transformative work to historical texts. Some of this work is happening through the Organization for Transformative Works, which among other things hosts Archive of Our Own, a major site for fans to publish their fanworks, and provides legal advocacy for creators of fanworks.

The Organization for Transformative Works also publishes a journal, Transformative Works and Cultures, and in 2016 they published an issue “The Classical Canon and/as Transformative Work,” which focused on relating ancient historical and literary texts to the concept of fan fiction (that is, stories that fans write that feature characters and situations from canon). There is also a call for papers currently open for an upcoming special journal issue on “Fan Fiction and Ancient Scribal Culture,” which will “explore the potential of fan fiction as an interpretative model to study ancient religious texts.” This special issue is being edited by a group of scholars who lead the “Fan Fiction and Ancient Scribal Cultures” working group in the European Association of Biblical Studies, which organized a conference on the topic in 2016. Closer to home, Dr. Juliana Dresvina at Oxford University is organizing a colloquium later this month on “Fanfiction and the Pre-Modern World,” and I understand she is planning to organize a larger conference next year.

You will note that the academic work on transformative works I’ve cited focuses specifically on fan fiction’s relationship with classical and medieval texts, which makes a fair amount of sense. In her article “The Role of Affect in Fan Fiction,” published in the Transformative Works and Cultures special issue of 2016, Dr. Anna Wilson places fan fiction within the category of textual reception, wherein texts from previous times are received by and reworked by future authors. In particular, Dr. Wilson points to the epic poetry of classical literature, medieval romance poetry, and Biblical exegesis, but she notes that comparisons between fan fiction and these past examples of textual reception are undertheorized, and leave out a major aspect of fan fiction that is typically not found, or even looked for, in the past examples. She says, “To define fan fiction only by its transformative relationship to other texts runs the risk of missing the fan in fan fiction – the loving reader to whom fan fiction seeks to give pleasure. Fan fiction is an example of affective reception. While classical reception designates the content being received, affective reception designates the kind of reading and transformation that is taking place. It is a form of reception that is organized around feeling.” (Wilson, 1.2)

Back to LJS 101. What I want to do here is look at LJS 101, not as a piece of data to be mined for its texts, but both as a transformative work in itself, and as an object for the transformative work of others, particularly digital versions, and I want to center this looking at the manuscript using a language of care. I’m not comfortable applying the concept of affective reception to the people who created and worked with LJS 101 over the past 1100 or so years – I don’t want to suggest that the person who took the manuscript from its original form to its 12th century form loved the manuscript the same way that I do – but I do want to explore the idea that this person or people cared about it, and that other people have cared about this manuscript over time enough that it survives to live now in the library at the University of Pennsylvania. Their interests may have been scholarly, or based on pride of ownership, or even based on curiosity, but whatever their reasons for caring for the manuscript, they did care, and we know they cared because of the physical marks that they have left on this book. The manuscript as it survives also shows some examples of lack of care, and I want to address those as well.

Binding: 19th-century English diced Russia leather, bound for Sir Thomas Phillipps.

The first obvious mark of care on LJS 101 is the binding, which is a lovely 19th century leather binding done for book collector Sir Thomas Phillipps, who purchased the book in or around 1826. The book was sold out of his estate in 1945, sold again in 1978 and 1979, and finally sold to Lawrence J. Schoenberg in 1997.

Formerly owned by Sir Thomas Phillipps, ms. 2179 (stamped crest inside upper cover). LJS collection bookplate. Gift of Barbara Brizdle Schoenberg in honor of Amy Gutmann, President, University of Pennsylvania, 2014.

Phillipps also left two ownership marks on the inside front cover: a stamped crest in the upper part of the inside cover, and a second ownership stamp with his library’s number for the manuscript (ms. 2179). Another ownership mark is the Penn Libraries bookplate, showing that the manuscript belongs with the Lawrence J. Schoenberg collection. Phillipps and Schoenberg both cared: Phillipps cared enough to bind the book, they both marked it as their own, and Schoenberg gifted it to Penn in 2014 for long-term institutional care.

1r: Conclusion of a grammatical work, 7-line verse by Eugenius II of Toledo, Isidore’s definition of rhetoric (12th c.)

The first quire – four leaves – is a 12th century replacement. Fol. 1r begins with the ending of a grammatical text on declensions, including some words in Greek and references to the Aeneid and the Thebais of Statius. This folio also contains a 7-line poem by Eugenius II of Toledo, “Primus in orbe dies…,” a poem on the seven days of Biblical creation (MGH, Auct. Antiq. XIV; Migne, PL LXXXVII:365-6) [1]. This implies that the first quire has not always been the first quire, and at some point there was at least one more quire before Quire 1. As we’ll see in a moment, the text on the last leaf of Quire 1 leads directly into the text on the first leaf of Quire 2, which makes it clear that the 12th century work was created in response to the 9th century piece, and it was not the case that two existing pieces were placed together without regard for each other. (Note that the first leaf in the manuscript also includes another ownership mark from Sir Thomas Phillipps, noting the number of the manuscript in his collection, and the number it had in a catalog – yet another sign of care from Phillipps.)

1v: Boethius’ translation of Aristotle’s De Interpretatione (Periermenias Aristotelis) 12th c. switching to 9th c. on fol. 5, back to 12th c. on fol. 45

The main text of the manuscript, the Latin translation of Aristotle’s De Interpretatione (called Periermenias Aristotelis generally and in the text), begins on fol. 1v. Boethius’s translation of De Interpretatione, along with his translation of the Categories and Porphyry’s introduction to Aristotelian logic, the Isagoge, formed the core of Alcuin of York’s logic textbook, De dialectica. These three works – as translated by Boethius – would become known as the logica vetus, and would dominate the study of logic until the twelfth century. This explains the why of this manuscript – this was an important text. The inclusion of the now-missing grammatical text also implies that this book was designed with care to be a sort of textbook. Note the striking illuminated initial P that begins the text – a visual sign of care taken in the design of this manuscript.

Folio 4v-5r, with the 12th century script on the left and the 9th century script on the right.

As noted above, the 12th century text from Quire 1, folio 4v, continues directly to the 9th century portion of the manuscript on Quire 2, folio 5r.

UPenn LJS 101, folio 27r, showing interlinear and marginal corrections and glosses.

The same hand that wrote out the 12th century full text here and from folio 45 through the end of the manuscript also went through the 9th century text and made many corrections, both deleting and adding text. As far as I know there hasn’t been a full textual analysis of the text in this manuscript, but it’s possible, if not likely, that the 12th century scribe had a more recent copy of the text and corrected the older version in comparison with it. For whatever reason the scribe, or someone supervising the scribe, cared enough to take the existing 9th century copy of Boethius, to complete it, and to bring it up to date with an improved version of the text.

36v: Diagram from 9th c. with color added in the 12th century

Also in the 12th century program of care, the scribe or someone alongside the scribe added green and yellow highlighting to the 9th century diagrams and to some of the headwords.

As we move on through the manuscript, note that the number of corrections drops precipitously after folio 45, when we are back with the 12th century scribe.

LJS 101, folio 44v and 45r, with the 9th century script on the left and the 12th century script on the right
LJS 101, folios 52v-53r; there are several leaves missing here.

There are several quires’ worth of leaves missing between Quire six (ending with folio 44) and Quire seven (beginning with folio 45, where the manuscript switches again from 9th century to 12th century) – 49 pages’ worth of edited text, from Prima Editio, I c. 9, p. 111 line 20 to Prima Editio, II c. 11, p. 160 line 15 – and there are at least two quires missing between Quire seven (folio 52) and Quire eight (folio 53), from Prima Editio, II c. 13, p. 188, line 5 to Prima Editio, II c. 14, p. 224, line 13, representing 36 pages of edited text. It’s unclear when, how, and why these pages were removed, although the folio numbering appears to be from the time of Thomas Phillipps, so we can safely assume that they were removed at some point before he had the manuscript bound in its lovely leather binding.

53v: The last six lines of the unidentified text; Periermeniae (12th c.)

In Quire eight, the Boethius translation ends naturally on folio 53r, line 16, at the end of Prima Editio, II c. 14 (page 225 in the edition). Between the end of that text and the beginning of the next commentary there is another text that has yet to be identified. This unidentified text occupies the last six lines of 53r and the first six lines of 53v. The next text begins on line seven of 53v. This text is a short commentary on Aristotle’s De Interpretatione, the Periermeniae attributed to Apuleius, the second-century AD Platonist philosopher and Latin-language prose writer. (Emma Kathleen Ramsey, “A commentary on the Peri Hermeneias ascribed to Apuleius of Madaura”)

Commentary is a kind of transformative work, in which a writer expands on and explains the thoughts of the original writer in order to create something new and (hopefully) illuminating. In LJS 101, then, we have a physical expression of a canon work followed by a transformation, an order that was planned by someone who cared enough to organize them that way.

Fol. 59v: Ending of Periermeniae, beginning of commentary by Haymo of Auxerre.

After the commentary by Apuleius there is a brief section of a commentary on Isaiah by Haymo of Auxerre (formerly attributed to Haymo of Halberstadt)[2] that is followed by “Versus de singulis mensibus” (a poem by Decimus Magnus Ausonius on the seven days of Creation). The poem itself has been laid out with care, the columns blind-ruled to keep them straight, and the large initials alternating between lighter and darker ink. Given the topic of the poem one could say this is yet another example of a transformative work – a poetic retelling of the Christian creation story originally told in the Bible. It also ties in with the poem by Eugenius II of Toledo, “Primus in orbe dies…,” from folio 1r – which is also on the topic of the seven days of Creation.

60v: Sample letter of a monk to an abbot

The next text in Quire eight is a sample letter of a monk to an abbot, on folio 60v. I want to thank Brother Thomas Sullivan from Conception Abbey for helping me with this text, which hasn’t been otherwise studied. This is the only section of the 12th century portion of the manuscript that is heavily annotated, the original letter being expanded by both interlinear and marginal glosses. The interlinear glosses expand the primary text into more intense or elaborate language, e.g., l. 8 inserts the Latin word valde (very). The marginal glosses are signaled by a system of thirteen different interpolative marks in the left margin and one in the right. The addressee appears to be one “Domno Luculemus,” and it is not clear if this is the name of an actual abbot, or an imagined character. The letter fits in with the medieval tradition of model letters and letter-writing guides, which is traditionally dated to the work of Alberic of Monte Cassino in the late 11th century and is well documented in the 12th century.[3] Is this model letter another example of a transformative work? Because this particular letter hasn’t been studied, we can’t tell if it’s a version of an existing letter, or written with “characters” featured in other letters. If not, going back to our language of care, we can venture that since it appears on the verso side of a bifolium it was placed here for some reason understood by the scribe or scribes planning the layout.

61r: Boethius’ translation of Aristotle’s De Interpretatione (Periermenias Aristotelis), 12th century

The Boethius text continues on Quire nine, folio 61r, and cuts off at the end of 62v.

63r: Miscellaneous verses, definitions, and biblical commentary

We finish up with folios 63r-64r, containing miscellaneous verses, definitions, and biblical commentary. Oddly, folio 63r is a continuation of the sample letter on folio 60v. We’ll return to Quires eight and nine below, where I’ll say more about signs of lack of care in LJS 101, and describe how these two quires are currently misbound.


Bern, Burgerbibliothek, Cod. 250, fol. 10v (9th c.)

Although LJS 101 is unique to Penn, it is not the only 9th century manuscript showing 12th century care. Bern, Burgerbibliothek, Cod. 250, begins with a 9th century section (folios 1-11) describing a meeting between Einhard and Lupus of Ferrières, at the time that Einhard gave Lupus a book of arithmetic by Victorius of Aquitaine along with a now widely known model alphabet for Ancient Capitals.

Bern, Burgerbibliothek, Cod. 250, fol. 12r (12th c.)

This is followed by a 12th century section, folios 12-18, including a commentary by Abbo of Fleury on the ‘computus’ (reckoning the date for Easter). Note the green highlighting, which is similar to the green highlighting added to the diagrams in LJS 101. As in LJS 101, the 12th century scribe did not just add to the existing manuscript. They marked the 9th century text to bring it up to date, and to incorporate it into what is essentially a new object, and, arguably, a transformative work.

Bern, Burgerbibliothek, Cod. 250, fol. 1r (11th c.)

They added an abacus table to folio 1r, which was presumably left blank in the 9th century, and, as with LJS 101, they also added interlinear glosses and corrections. As with the transformation of LJS 101, these modifications show a certain amount of care, both for the older sections of the manuscript and for the new object.

Bern, Burgerbibliothek, Cod. 250, fol. 10v: 9th c. text with 11th c. interlinear gloss

So we’ve walked through LJS 101 and looked at the transformative nature of the texts in the manuscript, and the physical object itself. I’d like to spend the last portion of my talk looking at another transformative physical aspect of the manuscript and how this aspect may illustrate a lack of care, while at the same time exploring LJS 101’s digitization as another potential for transformative work around the manuscript.

In addition to the missing quires between Quires six and seven and between Quires seven and eight, there are two quires in LJS 101 that have been misbound. In both cases it is clear that somehow bifolia were mixed up, likely during rebinding (whether during the last rebinding, under the ownership of Sir Thomas Phillipps, or earlier, we don’t know) and care was not taken to ensure that the bifolia were put back together correctly with regard to the text contained on them. I can’t see any aesthetic reason for the quires to be rearranged as they are; antiquarians such as Matthew Parker frequently transformed the manuscripts in their ownership in ways that made them more attractive in their eyes, but the changes made in LJS 101 appear to be accidental rather than purposeful.[4]

A study of the text in Quires two and three (the first eight leaves of the 9th century portion of the translation) makes it clear that the leaves were bound out of order.[5] Here is the current order, along with the text beginning and ending each folio (all are Prima Editio I c. 2; page and line numbers are from the edition):

Folio 5: ends with p. 38 line 2 [the text continues on folio 9]

Folio 6: begins with p. 41 line 5, [text continues on folio 7]

Folio 7: ends with p. 44 line 2 [the text continues on folio 11]

Folio 8: begins with p. 46 line 30, ends with p. 48 line 15 [the text continues on folio 13]

Folio 9: begins with p. 38 line 2, [text continues on folio 10]

Folio 10: ends with p. 41 line 4 [the text continues on folio 6]

Folio 11: begins with p. 44 line 2, [text continues onto folio 12]

Folio 12: ends with p. 46 line 30 [the text continues on folio 8]

Folio 13 (the first leaf of Quire four): begins with p. 48 line 15

Beginning with folio 5 and following the text through these eight leaves, we can find the original order: 5, 9, 10, 6, 7, 11, 12, 8   [13… 

What was originally a quire of eight leaves was made into two quires of four leaves. The current Quire two consists of the innermost bifolium of the original quire nested in the outermost bifolium, and the two internal bifolia from the original quire form the current Quire three. This is our first example of a lack of care. How did this happen, and why wasn’t this error, assuming it was an error, discovered before it was bound?
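The step of following the text from leaf to leaf can be written out mechanically. Here is a minimal sketch in Python, assuming only the "text continues on" notes from the list above; the names `CONTINUES_ON` and `reading_order` are mine, made up for illustration, and this is not how VisColl itself works.

```python
# Where the text on each folio continues, using the current folio numbers.
# The pairs are taken from the "text continues on" notes in the list above.
# The names used here are illustrative only.
CONTINUES_ON = {
    5: 9,
    6: 7,
    7: 11,
    8: 13,  # folio 13 is the first leaf of Quire four
    9: 10,
    10: 6,
    11: 12,
    12: 8,
}


def reading_order(start, stop, continues_on):
    """Follow the text from folio `start` until it reaches folio `stop`.

    Returns the folios in the order the text runs, not including `stop`.
    """
    order = []
    folio = start
    while folio != stop:
        if folio in order:
            raise ValueError(f"the text loops back to folio {folio}")
        order.append(folio)
        folio = continues_on[folio]
    return order


original_order = reading_order(5, 13, CONTINUES_ON)
print(original_order)  # [5, 9, 10, 6, 7, 11, 12, 8]
```

The result matches the order worked out by hand, which is all the sketch is meant to show: the evidence for the misbinding is entirely in where each leaf's text picks up again.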

It might be hard to picture in your mind exactly what has happened here, so it might help to look at some transformative digital work using a project designed specifically to visualize the physical construction of manuscripts, VisColl.  A couple of years ago a student in my Rare Book School class used VisColl to model both the current and previous structures of these leaves and generated diagrams to help us understand what exactly is happening here.

Correct arrangement of fols. 5-12. Jesse McDowell, “An Ideal Collation of LJS 101”

Here is a diagram and bifolia visualization of the original structure of what are now folios 5 through 12. Using current numbering, the order of leaves should be 5-9-10-6-7-11-12-8. You can see in this diagram that 5 and 8 are conjoint and form the outer bifolium, followed by 9-12, then 10-11, then 6-7. Looking at the numbering here you can see already how they were rearranged.
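The pairing of leaves into bifolia follows from the leaf order alone, if we assume a regular quire with no singletons or added leaves (that is my assumption here, and it will not hold for every manuscript). In such a quire the first leaf is conjoint with the last, the second with the second-to-last, and so on. The sketch below is in Python and is purely illustrative; the function name `conjugate_pairs` is hypothetical and is not part of VisColl.

```python
def conjugate_pairs(leaves):
    """Pair up the leaves of a regular quire, outermost bifolium first.

    Assumes every leaf has a conjugate (no singletons), so the number
    of leaves must be even.
    """
    if len(leaves) % 2 != 0:
        raise ValueError("a regular quire has an even number of leaves")
    last = len(leaves) - 1
    return [(leaves[i], leaves[last - i]) for i in range(len(leaves) // 2)]


# The original quire, in the order the text runs (current folio numbers).
original_quire = [5, 9, 10, 6, 7, 11, 12, 8]
print(conjugate_pairs(original_quire))
# [(5, 8), (9, 12), (10, 11), (6, 7)]

# The two four-leaf quires as the leaves are bound today.
print(conjugate_pairs([5, 6, 7, 8]))     # [(5, 8), (6, 7)]
print(conjugate_pairs([9, 10, 11, 12]))  # [(9, 12), (10, 11)]
```

The same four bifolia appear in both arrangements. Nothing was cut apart; the sheets were simply regrouped.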

Current (out of order) arrangement of fols. 5-12. Jesse McDowell, “An Ideal Collation of LJS 101”

But the next diagram shows the current structure, two four-leaf quires, with the middle bifolia grouped together and the outer and inner ones likewise. VisColl, with its focus on modeling the physical construction of manuscripts and visualizing them in various ways, is a really good example of a system for building transformative works based on a medieval manuscript: It takes an existing character, expands on it, illuminates it, and in the process makes something new.

As I was preparing this paper for the blog, I discovered a second example of a misbound quire in LJS 101, illustrating another example of a lack of care in this book’s long history. As I mentioned above, the sample letter on fol. 60v actually continues on fol. 63r. Fagin Davis notes this in her description of LJS 101, but until now no one has investigated why this might be – it doesn’t make sense for a text to start on one leaf and end two leaves later. So why does this happen? While taking a closer look at this section – Quires eight and nine, from folio 52 (the end of Quire seven) through folio 64 – I discovered that, as mentioned earlier, there are at least a few quires of the Boethius text missing between Quires seven and eight. Quire eight, eight leaves, begins with folio 53, and the text starts with Prima Editio, II c. 14, p. 224, line 13. The text then ends naturally in the middle of 53r. But Quire nine (four leaves, starting with folio 61) picks up Boethius again, and when I checked the citation it begins with page 216, line 25 in the edition – this text comes before the text on fol. 53. Folio 62 ends with the text from the edition page 224, line 13, which is exactly where the text picks up on fol. 53r:

Folio 53: begins with p. 224, line 13, continues through Folio 60

Folio 60: ends with the sample letter [text continues on Folio 63r]

Folio 61: begins with page 216, line 25, continues through Folio 62

Folio 62: ends with page 224, line 13 [text continues on Folio 53]

Folio 63: contains the rest of the sample letter, continues through Folio 64

Beginning with folio 61 (having the earlier text) and following through these 12 leaves, we can find the original order: 61, 62, 53-60, 63, 64

The manuscript originally had a quire of 12 leaves. The innermost eight leaves were removed and placed before the outermost four leaves, giving us two new quires. As with the example of Quires two and three, there’s no clear explanation of why, and I am assuming this was an error.
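This rearrangement can be checked the same way. The following Python sketch assumes the original 12-leaf quire was regular, with the outer four leaves forming two bifolia wrapped around the inner eight; the function name `split_quire` is hypothetical and only there for illustration.

```python
def split_quire(leaves, outer_bifolia):
    """Separate a quire into its outer leaves and the leaves they enclose.

    `outer_bifolia` is how many bifolia to peel off the outside; each one
    contributes a leaf at the front of the quire and a leaf at the back.
    """
    k = outer_bifolia
    outer = leaves[:k] + leaves[len(leaves) - k:]
    inner = leaves[k:len(leaves) - k]
    return outer, inner


# The original quire in the order the text runs (current folio numbers).
original_quire = [61, 62] + list(range(53, 61)) + [63, 64]

outer, inner = split_quire(original_quire, outer_bifolia=2)
print(inner)  # [53, 54, 55, 56, 57, 58, 59, 60]
print(outer)  # [61, 62, 63, 64]

# Binding the inner leaves in front of the outer ones gives today's order.
current_order = inner + outer
print(current_order == list(range(53, 65)))  # True
```

Placing the inner eight leaves ahead of the outer four reproduces the current foliation, 53 through 64, which is consistent with the misbinding described above.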

Transformative work in fandom is created by fans who take characters and situations from existing works and make new things with them. Transformative work differs from traditional scholarly work in that the focus is on affection. To quote Anna Wilson again, “It is a form of reception that is organized around feeling.” I don’t want to claim that 12th century scribes or Thomas Phillipps loved the manuscript that we call LJS 101, but I do think it’s reasonable to suggest a language of care, and I think it’s a useful exercise to think about this manuscript and others within the theoretical frame of the transformative work. Doing this pushes the boundaries of current research in this area, which tends to focus on the relationship between fan fiction and earlier forms of textual reception. Moving beyond this, to consider a language of care when talking about manuscripts – bearers of text as well as physical expressions of their own history – and to the visualization of digitized manuscripts using new methods, pushes traditional scholarship in new and exciting directions that also normalize the affection we hold for the objects of our study.


[1] Identification of the texts on Folio 1r is from Lisa Fagin Davis, Catalog record for LJS 101, March 2001.

[2] Haymo Halberstatensis: HAYMONIS HALBERSTATENSIS EPISCOPI COMMENTARIORUM IN ISAIAM LIBRI TRES Ab eodem auctore dum viveret, multorum additione, quae in aliis plerisque exemplaribus desiderantur, passim locupletati et recognitione postrema ad unguem ubique recogniti. (Coloniae, per honestum civem Petrum Quentell, anno 1531), Liber Secundus, Caput LIII (from Patrologia Latina, Vol. 116, Col.0991C-Col.0991D)

[3] Malcolm Richardson, “The Ars dictaminis, the Formulary, and Medieval Epistolary Practice,” in Letter-Writing Manuals and Instruction from Antiquity to the Present, edited by Carol Poster and Linda C. Mitchell (University of South Carolina Press, 2007), pp. 52-66.

[4] Timothy Graham, ‘Matthew Parker’s manuscripts: an Elizabethan library and its use’, in The Cambridge History of Libraries in Britain and Ireland, Volume 1: To 1640, ed. E. Leedham-Green and T. Webber (Cambridge, 2006), pp. 322-41.

[5] The misbinding of Quires 2 and 3 has been noted by Fagin Davis in her catalog record, and also by Jesse McDowell in his blog post “An Ideal Collation of LJS 101.”

Is This Your Book? What we call digitized manuscripts and why it matters

This is a version of a paper I presented as a Rare Book School Lecture at the University of Pennsylvania in Philadelphia on June 12, 2018, originally entitled “Is this your book? What digitization does to manuscripts and what we can do about it.” 

Good afternoon and thank you for coming to my talk today. The title of my talk is “Is this your book? What digitization does to manuscripts and what we can do about it.” However, I want to make a small change to my title. I’m not entirely sure if there’s anything we can do about what digitization does to manuscripts, but I do think we can think about it, so that’s what I want to do a bit today. I want us to think about digitized books – specifically about digitized manuscripts, since that’s what I’m particularly interested in.

So, like any self-respecting book history scholar, I’m going to start our discussion of digitized manuscripts by talking about memes.

Memes

Definition of the word “meme” from the Oxford English Dictionary.

The word meme was coined in 1976 by Richard Dawkins in his book The Selfish Gene. In the Oxford English Dictionary, meme is defined as “a cultural element or behavioral trait whose transmission and consequent persistence in a population, although occurring by non-genetic means (especially imitation), is considered as analogous to the inheritance of a gene.” Dawkins was looking for a term to describe something that had existed for millennia – as long as humans have existed – and the examples he gave include tunes, ideas, catchphrases, clothes fashions, ways of making pots or building arches. These are all things that are picked up by a community, ideas and concepts that move among members of that community, are imitated and modified, and which are frequently moved on to new communities as well where the process of imitation and modification continues. More recently the term meme has been applied specifically to images or text shared, often with modification, on the Internet, particularly through social media: If you’ve ever been RickRolled, you have been on the receiving end of a particularly popular and virulent meme.

This is all very interesting, Dot (I hear you say), but what do memes have to do with digitized manuscripts? This is an excellent question. What I want to do now is look at a couple of specific examples of memes and think a bit in detail about how they work, and what it looks like to push the same idea through memes that are similar but that have slightly different connotations. Then I want to look at some different terms that scholars have used to refer to digitized manuscripts and think a bit about how those terms influence the way we think about digitized manuscripts (if they do). My proposition is that these terms, while they may not exactly be memes, function like memes in the way they are adapted and used within the library and medieval studies scholarly communities. So let’s see how this goes.

In the film Black Panther, which was released back in February of this year, there’s a scene where a character has come to the country of Wakanda to challenge the king for the throne. This character, N’Jadaka (also named Erik Stevens, but better known by his nickname Killmonger), is a cousin of the king, T’Challa, but was unknown to pretty much everyone in Wakanda until just before he arrives to make his challenge. At the climax of this scene, during which Killmonger and T’Challa fight hand-to-hand in six inches of water, Killmonger – who is clearly winning – turns to the small audience of Wakandans gathered to witness the battle and exclaims, “IS THIS YOUR KING?” If you haven’t seen the film I’m about to spoil it for you: it turns out the answer to that question is NO.

This is a phrase that was born to be a meme, and within a month that’s exactly what happened.

According to the Know Your Meme website the first instance of the “Is this your king” meme appeared on March 20 on Twitter when @TheyWant_Nolan tweeted a screen shot of the scene with the caption “is this your spring”. If you think back to March, the weather was pretty terrible everywhere around the country. It was long and tedious going back and forth between snow and heat then back to snow. Is this your Spring? NOPE.

This type of meme is a snowclone, defined as “a type of phrasal templates in which certain words may be replaced with another to produce new variations with altered meanings, similar to the “fill-in-the-blank” game of Mad Libs.” I would like to note here that this term, snowclone, was coined in 2004 by American linguists Geoffrey K. Pullum and Glen Whitman specifically to describe this phenomenon. The concept of a snowclone has been around for much longer than the term – think of “I’m not an X but I play one on TV” which was the most hilarious phrase when I was a kid – and the “Is this your king” meme works the same way, where we replace king with some other word to make a phrase that is understood to elicit a negative response.

Here are some other examples of this meme featured on its Know Your Meme page. These all supply the identity of the question asker, they vary widely by topic, and one of them makes a slight modification to the image, but they all imply a negative response to the question.

I made one myself. My meme features a screen shot of my favorite manuscript, UPenn LJS 101, as seen through the Penn in Hand manuscript interface. In my meme, the question asked is, is this your book? As we know from the context of the original meme, the answer to the question is no. This is not my book. Or: It’s not my real book.

I’ve made a few other memes and for some reason most of them play with the relationship that a digitized version of a manuscript has with the physical object.

Memes such as “Is this your king” and this next one, the “Is this a pigeon” meme, enable us to ask questions with assumed answers. In this meme, the original scene is from an anime where a human-like android sees a butterfly and asks, “Is this a pigeon?” This is another snowclone, where the question asker, the object of the question, and the question itself can be replaced with almost literally anything else. I find these snowclone memes work well for my needs, though I find the differences between the emotions that these two memes elicit fascinating.

As before, I’ve replaced the object of the question with digital images of LJS 101 and specifically identified myself as the question asker. As with the previous meme, we know the answer to the question posed is no, although the context is different: while the king meme is used to express aggressive negativity, the pigeon meme is used to express mild but total confusion. The same idea can be pushed through both memes – is this digital thing a manuscript? – and while the answer is the same – no it’s not – the negative response of the pigeon meme is “oh you silly thing, thinking the digitized manuscript is the same as the manuscript” while the negative response of the king meme is “that thing is NOT the same as the manuscript, I’m offended you think so, and I’m going to throw it off a cliff so you don’t try it again.”

Although both of these memes can be used as a kind of mirror for us to view the relationship between a manuscript and its digitized version, they expect different responses and elicit different emotions, much as different words used to refer to the same situation or person might invoke different emotions. The memes are, in effect, acting as a kind of terminology, so now I want to pivot and talk about how terminology might act as memes.

Terms

I would like to take it as a given that how we talk about things influences how we think about them; therefore, the terms we use to describe things matter. The terms we use to describe other people matter; the terms that we choose to refer to digitized manuscripts matter. I would also like to reiterate the proposition I made a few minutes ago that our terms, while perhaps not memes themselves, are meme-like. In his 2016 article “‘ut legi’: Sir John Mandeville’s Audience and Three Late Medieval English Travelers to Italy and Jerusalem,” Anthony Bale discusses Jerusalem as a meme in medieval English travel writings, but I find that his description of meme fits well with what I would like to do here. He says, “the meme proposes a model of cultural transmission based on audiences’ ongoing use and appropriation of the source, as opposed to the scholarly desire to return to the source as the “best” or “original” iteration.” (For a term, this would mean common usage points not to the original meaning of the word, but to the word as it is being used. That’s a bit of a circular argument but I think it makes sense.) He continues, “Memes have not one stable author, no unitary point of origins, and are not retrospective, but rather change with their audiences, causing people to do things; stimulating actions and changing behaviors; leading people to take a particular route, see a particular site, notice one thing but not another, find new meanings in an old source.” (Bale, p. 210)

Following this theory, terms work like this:

  1. A term begins with a specific meaning (e.g., outlined in the OED, citing earlier usage),
  2. A scholar adopts the term because we need some way to describe this new thing that we’ve created. So we appropriate this term, with its existing meaning, and we use it to describe our new thing.
  3. The new thing takes on the old meaning of the term,
  4. The term itself becomes imbued with meaning from what we are now using it to describe.
  5. The next time someone uses that term, it carries along with it the new meaning.

Some scholars take time to define their terms, but some scholars choose not to, instead depending on their audience to recognize the existing definitions and connotations of the terms they use. For example, in her 2013 article “Fleshing out the text: The transcendent manuscript in the digital age,” Elaine Treharne (coming out of a description of how medieval people would have always interacted with a physical book) says: “for the greater proportion of a modern audience on any given day, one has necessarily to rely on the digital replication: the world of the ironically disembodied and defleshed simulacrum, avatar, surrogate.” (Treharne, p. 470) [emphasis mine] Here Treharne uses the terms simulacrum, avatar, and surrogate without defining them, and she groups them together, in that order, placing simulacrum first in that list. More than the other two, simulacrum has a negative connotation – as we can see from its entry in the OED, a simulacrum is a “mere image”; it looks like a thing without possessing its substance or proper qualities; it is a “specious imitation”. Although it is near identical in meaning and from related Latin roots as the term facsimile, which I’ll discuss in a moment, facsimile lacks the negative connotations that simulacrum has. Although the terms are undefined by the author, it seems that this was a purposeful word choice intended to elicit a negative response.

Compare this with Bill Endres, who in his 2012 article “More than Meets the Eye: Going 3D with an Early Medieval Manuscript” spends several paragraphs defining his terms and arguing for why he chooses to use some terms and not others. Endres says, “I will refer to 3D and 2D images as digital artifacts or digital versions, although not totally satisfied with either term as it relates to epistemology. I am tempted to refer to them as digital offspring, the results of a marriage between digital and manuscript technologies, with digital versions having unique qualities and a life of their own. This term is problematic but it speaks to the excesses, commonalities, and deficits when digital versions are measured against their physical antecedent.” (Endres, p. 4) Endres then discusses some other terms, including two of the ones I will consider in a moment, so we’ll return to his thoughts later. The point here is that Endres defines his terms and explains why he is using them, while Treharne relies on us to understand her meaning through the known definition of her terms.

Facsimile

For each term I will discuss pre-digital definitions of the term, using the Oxford English Dictionary as the source.[1] I’ll also include a few quotes where scholars refer to digitized manuscripts using that term, although these quotes are meant to be representative and not exhaustive (that is, I couldn’t tell you the first time that the term was used by someone to refer to a digitized manuscript, but I can give you an impression of how the term has been used or is being used currently).

Let’s begin with the term facsimile.

It is from the Latin meaning literally make similar. The earliest attestation of the term is from 1661, and refers to a transcribed copy of a text, and not necessarily something that looks just like the text it is being copied from. About 30 years later, facsimile is being used to mean an exact copy or likeness; an exact counterpart or representation, and the citations refer to written texts or drawings. The term continues to be used according to this definition into the later 19th century, by the time photography of books and manuscripts has become well-represented in the scholarly landscape. (David McKitterick, Old Books, New Technologies, pp. 117-118)

By the late 19th century, facsimile has been adapted to refer to the communication of images through radio, wire, or similar methods – the modern day “fax” machine, for example. This meaning maintains the previous definitions focusing on a facsimile as some kind of copy, but adds the meaning of communicating over distance, and I expect these combined uses of the term – print facsimiles plus the sharing of images over distance – are why digital facsimile became an obvious term to use to describe these new representations of old objects.

The use of facsimile to refer to textual materials clearly varies over time and from individual to individual. In his 1926 article ‘Facsimile’ Reprints of Old Books, A. W. Pollard seems to use the term according to its 1661 attestation, not according to its 1691 attestation. He says “It is intended to cover any reprint the form of which has been influenced to any considerable extent by the form of the edition reproduced.” (Pollard, p. 305) Pollard’s ‘Facsimile’ reprints include “1) Photographic facsimiles, 2) Type-facsimiles, i.e. editions in which types of similar founts to those used in the original are set to follow the original setting as closely as possible; 3) more or less luxurious reprints which seek to reproduce the general effect of the original with such concessions to modern usage as the producer may think desirable.” (Pollard, p. 306)

Facsimile or digital facsimile has been, for as long as I can remember, the default term that libraries use to refer to their own digital copies, and that scholars use to refer to the digital images they incorporate into their online projects. In November 1993, Kevin Kiernan gave a presentation at a symposium of the Association of Research Libraries [Kiernan, “Digital Preservation, Restoration, and Dissemination of Medieval Manuscripts”] in which he says that the Electronic Beowulf “will in its first manifestation make available in early 1994 a full-color electronic facsimile of Cotton Vitellius A. xv to readers in the British Library and at other selected sites.” He continues, “As this electronic archive grows, it will incorporate facsimiles of many other documents that help us restore parts of the manuscript that were lost or damaged by fire in the early eighteenth century.” Kiernan is referring not only to straightforward digital images, but also to images taken under ultraviolet light that were included in the edition. As he says later in the presentation, because of the UV images “Readers of the electronic facsimile will thus acquire a reproduction of the manuscript that reveals more than the manuscript itself does under ordinary circumstances.”

The use of the term facsimile makes it possible for scholars to consider how digital facsimiles relate to older ways of making similar. In “The Ghost in the Machine: Digital Avatars and Medieval Manuscripts”, Sian Echard discusses the restoration of manuscripts by Matthew Parker and his circle, which she interprets as a kind of facsimile. Dr. Echard says “Today, digital technologies continue to recreate medieval books for a variety of audiences, and the digital facsimiles, like the hand and machine produced examples … both reproduce and relocate their medieval objects. But our current attitudes toward facsimile differ from Parker’s and Dibdin’s, and may in fact inhibit our ability to see the extent to which we too are recreating medieval text objects according to our own tastes. As technology has enabled ever more exact reproduction, the cheerful refashioning proposed by Parker has been replaced by an emphasis on the photographic, on the exact, with at times an accompanying confidence that perfect reproduction can approach the revelation of an object’s truth.” (Echard p. 201)

Surrogate

The term surrogate is interesting because, unlike facsimile – which is a fairly straightforward synonym for a copy – the term refers to something standing in for, or perhaps replacing, something else.

It was first used in the 16th century to describe the act of appointing someone as a delegate or a substitute. In the 17th century the term is adopted as a noun, to refer to a person who is thus delegated. Other uses of the term, meaning more or less similar things, are attested through the 17th century, until in 1644 we have the general meaning of substitute.

Since the 1970s the term has been used in a more intimate way, to refer to sexual surrogates and surrogate mothers. As my colleague Bridget Whearty pointed out to me while we were discussing the word surrogate, the term is almost always used to describe bodies – either a person having power delegated to them, or a body acting as a substitute for another body. So the implication is that using this term to refer to digitized manuscripts doesn’t only mean the digital is standing in for the physical, but it also – by virtue of previous uses of the term – may imply some sort of embodiment or materiality of the digital object that is acting as the surrogate.

Paul Conway has an extensive discussion of the digital surrogate in his 2014 article “Digital transformations and the archival nature of surrogates”, and although he is referring to archival materials and not medieval manuscripts, I would expect that the use of the term comes from the same place, so I will quote him here. He reflects my own thoughts about a surrogate being more than a copy, saying “The creation of digital surrogates from archival sources is fundamentally a process of representation, far more interesting and complex than merely copying from one medium to another. Theories of representation – and the vast literature derived from them – are at the heart of many disciplines’ scholarship and of particular relevance for scholars who work primarily or exclusively in the digital domain.” (Conway pp. 2-3) He then continues to cite several other scholars – Mitchell, Scruton, Geoffrey Yeo, Matthew Kirschenbaum, Michael Taussig, and Johanna Drucker – who discuss the relationship that digital copies continue to have with their sources well after they have been created, even as they have their own materialities.

Bill Endres, whom I quoted above, continues his thoughtfulness in the same piece as he considers surrogate as a term for his own use in describing 3D images of manuscripts. He says, “a term that has gained some commonality in 3D is digital surrogate. Bernard Fischer uses the term for 3D renderings of archaeological sites, like the impressive Rome Reborn. Fischer’s interest in 3D is to construct digital cityscapes and large spaces, thus his use of surrogate, the virtual environment functioning as a substitute or proxy, a stand in for the likes of a dig site or what once was, like ancient Rome, as a means to generate and test hypotheses, fulfilling a specific epistemic function. Surrogate fits Fischer’s needs but does not speak as readily to the full range of epistemic considerations that I want to explore for a manuscript, particularly the excesses of a digital artifact that add to our knowledge in other ways and its effect on looking and knowing.” (Endres, p. 4) The excesses that Endres is referring to here are things like special lighting and the affordances of 3D imaging, and he feels that the term surrogate isn’t sufficient to include these things, although Endres’s excesses are very similar to those things that Kiernan was thinking of in 1993 when he used the term electronic facsimile. However, Kiernan did not use the term surrogate in 1993 – it would be interesting to see when the term surrogate was first used to refer to digital objects, and if it would have been available to Kiernan in 1993.

Avatar

The third term, avatar, is relatively new to me, although Sian Echard used it in the chapter quoted above, and the term was also used by classicist Ségolène M. Tarte, in her 2011 presentation “Interpreting Ancient Documents: Of Avatars, Uncertainty and Knowledge Creation,” and is also mentioned by Endres and very recently by Michelle Warren, in a just-published article “Remix the Medieval Manuscript: Experiments with Digital Infrastructure.” This term is not yet common, but it may be gaining purchase because of its inherent complexity.

I really like avatar because of the connotations brought along with its original definition. According to Hindu mythology, an avatar is the incarnate, human manifestation of a deity. It is thus the avatar that is embodied, not the thing that the avatar represents. This can be contrasted with the term surrogate, which is also embodied, but the surrogate embodiment is in replacement of something else, while the embodiment of the avatar is the same thing, but in different form. And compare both of these again with facsimile, which again is a copy – these are three very different terms, and yet we have the desire to apply these terms to… if not the exact same things, then at least to the same kind of things.

The term avatar has also been used to mean more generally a manifestation, and I actually think that this is the usage of the term that is closest to its application to digitized manuscripts, although there is another recent usage that is relevant: avatar as a term to describe a character in a computer game or environment, a character that represents a person or a player within that virtual environment (think of Second Life, or, to use a more current example, Minecraft).

(There was also a popular movie by this name that came out in 2009, right around the same time Second Life was reaching peak popularity, and I can’t give short shrift to Avatar: The Last Airbender, an animated show that ran from 2005 to 2008.)

So what is an avatar when it comes to medieval manuscripts? Echard uses the term to refer both to physical objects and to digital ones, first describing the digital avatars of the Sherborne Missal included in the British Library exhibit celebrating its purchase. These include large-screen installations in the Library gallery, a CD-ROM available for purchase, an online version, and a 3D animation sequence that plays as an introduction to the CD-ROM. However, as Echard says, “The avatars for these rare objects have … been books themselves- manipulable, tangible, physical … the physicality of the book is part of its cultural role, whether as public object or private delight. The digital facsimiles I have discussed here all attempt in one way or another to offer these medieval and early modern books to the fulfilling of both roles, and yet I would argue that they are ultimately stymied by the requirement to disembody the objects they display. The resulting tension, between access and absence, creates the ghosts that haunt the digital realm.” (Echard, p. 214) I’ve always loved this description of the tension of digitized manuscripts, and I am tickled to notice only now that the term avatar is attached to it.

I know that I keep quoting Endres, but I find here that again his thoughtfulness in exploring the terminology is really refreshing and I wish more scholars did this kind of intellectual work. He says, “I find Ségolène Tarte’s impulse to call digital versions avatars most consistent with my needs, the digital version as an incarnation, the physical artifact crossing over and into a digital form. Since I am working on a gospel book, I cannot help but to think about this issue’s echo in early Christian prohibitions against depictions of Christ in the flesh, the prohibition motivated by the belief that physical matter is mundane, not divine, and therefore a painting or statue could not portray Christ’s divine nature, thus could not portray Christ and was blasphemous. In a similar vein, without the blasphemy, a digital version cannot portray all of the features of a physical artifact, but as mentioned, it also includes excesses. I appreciate Tarte’s choice of the word avatars, its recognition that digital artifacts have excesses and exist in a different reality and with different rules and potentials, offering unique advantages and experiences, a recognition that I want to carry forward in my sense of digital artifact or version.” (Endres, p. 4)

Before I conclude, I would like to remark on our apparent desire as a community to apply meaning to digital versions of manuscripts by using existing terms, rather than by inventing new terms. After all, we coin new words all the time – just in this paper, I’ve mentioned snowclone and meme, so it would be understandable if we decided to make up a new term rather than reusing old ones. But as far as I know we haven’t, and if anyone has it hasn’t caught on enough to be reused widely in the scholarly community. I expect this comes from a desire to describe a new thing in terms that are understandable, as well as to define the new thing according to what came before. After all, both snowclone and meme are terms for things that have existed long before there were words for them, while digital versions of manuscripts are new things that have a close relationship with things that existed before, so while we want to differentiate them we also want to be able to acknowledge their similarities, and one way to do that is through the terms we call them.

Although we use these three terms – facsimile, surrogate, and avatar – to refer to digitized manuscripts, it is clear that these terms don’t mean the same thing, and that by choosing a specific term to refer to digitized manuscripts we are drawing attention to particular aspects of them. If I call a digitized manuscript a facsimile, I draw attention to its status as a copy. If I call it a surrogate, I draw attention to its status as a stand-in for the physical object. And if I call it an avatar, I draw attention to its status as a representation of the physical object in a digital world. Not a copy, not a replacement, but another version of that thing. Like pushing an idea through different memes, pushing the concept of a digitized manuscript through different terms gives us flexibility in how we consider them and how we explain them, and our feelings about them, to our audiences. That we can so easily apply terms with vastly different meanings to the digital versions of manuscripts says something about the complexity of these objects and their digital counterparts.

Thank you.

Sincere thanks to Bridget Whearty, Keri Thomas, Johanna Green, and Anna Levine, for their help getting this paper ready for the public eye.

[1] In the paper presented at the Rare Book School (which was recorded; I will add a link here when it becomes available) I used the Historical Thesaurus of English as the source for the term definitions, but I found during further editing that the Thesaurus timelines weren’t doing what I needed them to. If I continue this work, I expect to bring the timelines back in again.

Zombie Manuscripts: Digital Facsimiles in the Uncanny Valley

This is a version of a paper presented at the International Congress on Medieval Studies, May 12, 2018, in session 482, Digital Skin II: ‘Franken-Manuscripts’ and ‘Zombie Books’: Digital Manuscript Interfaces and Sensory Engagement, sponsored by Information Studies (HATII), Univ. of Glasgow, and organized by Dr. Johanna Green.

The uncanny valley was described by Masahiro Mori in a 1970 article in the Japanese journal Energy, and it wasn’t translated into English completely until 2012.[1] In this article, Mori discusses how he envisions people responding to robots as they become more like humans. The article is a thought piece – that is, it’s not based on any data or study. In the article, which we’ll walk through closely over the course of this presentation, Mori posits a graph, with human likeness on the x axis and affinity on the y axis. Mori’s proposition is that, as robots become more human-like, we have greater affinity for them, until they reach a point at which the likeness becomes creepy, or uncanny, leading to a sudden dip into negative affinity – the uncanny valley.

Now, Mori defined the uncanny valley specifically in relation to robotics, but I think it’s an interesting thought exercise to see how we can plot various presentations of digitized medieval manuscripts along the affinity/likeness axes, and think about where the uncanny valley might fall.

In 2009 I presented a paper, “Reading, Writing, Building: the Old English Illustrated Hexateuch,” (unpublished but archived in the Indiana University institutional repository) in which I considered the uncanny valley in relation to digital manuscript editions. This consideration followed a long description of the “Turning the Pages Virtualbook” technology which was then being developed at the British Library, of which I was quite critical. At that time, I said:

In my mind, the models created by Turning the Pages™ fall at the nadir of the “uncanny valley of digital texts” – which has perhaps a plain text transcription at one end and the original manuscript at the other end, with print facsimiles and editions, and the various digital displays and visualizations presented earlier in this paper falling somewhere between the plain text and the lip above the chasm.

Which would plot out something like this on the graph. (Graph was not included in the original 2009 paper)

Dot’s 2009 Conception of the Uncanny Valley of Manuscripts

After nine years of thinking on this and learning more about how digital manuscripts are created and how they function, I’m no longer happy with this arrangement. Additionally, in 2009 I was working with imperfect knowledge of Mori’s proposition – the translation of the article I referred to then was an incomplete translation from 2005, and included a single, simplified graph in place of the two graphs from the original article – which we will look at later in this talk.

Manuscripts aren’t people, and digitized manuscripts aren’t robots, so before we start I want to be clear about what exactly I’m thinking about here. Out of Mori’s proposition I distill four points relevant to our manuscript discussion:

First, robots are physical objects that resemble humans more or less (that is the x-axis of the graph)

Second, as robots become more human-like, people have greater affinity for them (until they don’t – uncanny valley) – this is the y-axis of the graph

Third, the peak of the graph is a human, not the most human robot

Fourth, the graph refers to robots and to humans generally, not robots compared to a specific human.

Four parallel points can be drawn to manuscripts:

First, digitized manuscripts are data about manuscripts (digital images + structural metadata + additional data) that are presented on computers. Digitized manuscripts are pieces, and in visualizing the manuscript on a computer we are reconstructing them in various ways. (Given the theme of the session I want to point out that this description makes digitized manuscripts sound a lot more like Frankenstein’s creature than like a traditional zombie, and I’m distraught that I don’t have time to investigate this concept further today.) These presentations resemble the parent manuscript more or less (this is the x-axis)

Second, as presentations of digitized manuscripts become more manuscript-like, people have greater affinity for them (until they don’t – uncanny valley) – this is the y-axis

Third, the peak of the graph is the parent manuscript, not the most manuscript-like digital presentation

Fourth, the graph refers to a specific manuscript, not to manuscripts generally

I think that this is going to be the major difference in applying the concept of the uncanny valley to manuscripts vs. robots: while robots are general, not specific (i.e., they are designed and built to imitate humans and not specific people), the ideal (i.e., most manuscript-like) digital presentation of a manuscript would need to be specific, not general (i.e., it would need to be designed to look and act like the parent manuscript, not like any old manuscript).

Now let’s move on to Affinity

A Valley in One’s Sense of Affinity

Mori’s article is divided into four sections, the first being “A Valley in One’s Sense of Affinity”. In this section Mori describes what he means by affinity and how affinity is affected by sensory input. Figure one in this section is the graph we saw before, which starts with an Industrial Robot (little likeness, little affinity), then a Toy Robot (more likeness, more affinity), then drops to negative affinity at about 80-85% likeness, with Prosthetic Hand at negative affinity and Bunraku Puppet on the steep rise to positive affinity and up to Healthy Person.

For Mori, sensory input beyond the visual is important for an object’s placement on the x-axis. An object might look very human, but if it feels strange, that doesn’t only send the affinity into the negative, but it also lessens the likeness. Mori’s original argument focuses on prosthetic hands, specifically realistic prosthetic hands, which cannot be distinguished at a glance from real ones. I’m afraid the language in his example is ableist so I don’t want to quote him,

Luke Skywalker’s prosthetic hand in The Empire Strikes Back

but his argument is essentially that when one touches a very realistic prosthetic hand and realizes it is not a real hand (as one had been led to believe), the hand becomes uncanny. Relating this feeling to the graph, Mori says, “In mathematical terms, this can be represented by a negative value. Therefore, in this case, the appearance of the prosthetic hand is quite humanlike, but the level of affinity is negative, thus placing the hand near the bottom of the valley in Figure 1.”

The character Osono, from the play Hade Sugata Onna Maiginu (艶容女舞衣), in a performance by the Tonda Puppet Troupe of Nagahama, Shiga Prefecture. https://en.wikipedia.org/wiki/Bunraku#/media/File:Osonowiki.jpg (CC:BY:SA)

Bunraku puppets, while not actually resembling humans physically as strongly as a very realistic prosthetic hand visually resembles a human hand, fall farther up the graph both in terms of likeness and in affinity. Mori makes it clear that likeness is not only, or even mostly, a visual thing. He says:

I don’t think that, on close inspection, a bunraku  puppet appears similar to a human being. Its realism in terms of size, skin texture, and so on, does not even reach that of a realistic prosthetic hand. But when we enjoy a puppet show in the theater, we are seated at a certain distance from the stage. The puppet’s absolute size is ignored, and its total appearance, including hand and eye movements, is close to that of a human being. So, given our tendency as an audience to become absorbed in this form of art, we might feel a high level of affinity for the puppet.

So it’s not that bunraku puppets look like humans in great detail, but when we experience them within the context of the puppet show they have the effect of being very human-like, thus they are high on the human likeness scale.

For a book-related parallel I want to quote briefly a blog post, brought to my attention earlier this week, by Sean Gilmore. Sean is an undergraduate student at Colby College and this past semester took Dr. Megan Cook’s Book History course, for which he wrote this post, “Zombie Books; Digital Facsimiles for the Dotty Dimple Stories.” There’s nothing in this post to suggest that Sean is familiar with the uncanny valley, but I was tickled with his description of reading a digital facsimile of a printed book. Sean says:

In regards to reading experience, reading a digital facsimile could not be farther from the experience of reading from the Dotty Dimple box set. The digital facsimile does in truth feel like reading a “zombie book”. While every page is exactly the same as the original copy in the libraries of the University of Minnesota, it feels as though the book has lost its character. When I selected my pet book from Special Collection half of the appeal of the Dotty Stories was the small red box they came in, the gold spines beckoning, almost as if they were shouting out to be read. This facsimile, on the other hand, feels like a taxidermy house cat; it used to be a real thing, but now it feels hollow, and honestly a little weird.

Sean has found the uncanny valley without even knowing it exists.

The Effect of Movement

The second section of Mori’s article, and where I think it really gets interesting for thinking about digitized manuscripts, is The Effect of Movement. In the first section we were talking in generalities, but here we see what happens when we consider movement alongside general appearance. Manuscripts, after all, are complex physical objects, much as humans are complex physical objects. Manuscripts have multiple leaves, which are connected to each other across quires; the quires are then bound together and, often, connected to a binding. So moving a page doesn’t just move a page, much as bending your leg doesn’t just move your leg. Turning the leaf of a manuscript might tug on the conjoined leaf, push against the binding, tug on the leaves preceding and following – a single movement provoking a tiny chain reaction through the object, and one which, with practice, we are conditioned to recognize and expect.

Mori says:

Movement is fundamental to animals— including human beings—and thus to robots as well. Its presence changes the shape of the uncanny valley graph by amplifying the peaks and valleys (Figure 2). For illustration, when an industrial robot is switched off, it is just a greasy machine. But once the robot is programmed to move its gripper like a human hand, we start to feel a certain level of affinity for it.

And here, finally, we find our zombie, at the nadir of the “Moving” line of the uncanny valley. The lowest point of the “Still” line is the Corpse, and you can see the arrow Mori has drawn from “Healthy Person” at the pinnacle of the graph down to “Corpse” at the bottom. As Mori says, “We might be glad that this arrow leads down into the still valley of the corpse and not the valley animated by the living dead.” A zombie is thus, in this proposition, an animated corpse. So what is a “dead” manuscript? What is the corpse? And what is the zombie? (I don’t actually have answers, but I think Johanna might be addressing these or similar questions in her talk)

Reservoir Dogs (not zombies)
The Walking Dead (shuffling zombies)
28 Days Later (manic zombies)

I expect most of us here have seen zombie movies, so, in the same way we’ve been conditioned to recognize how manuscripts move, we’ve been conditioned to understand when we’re looking at “normal” humans and when we’re looking at zombies. They move differently from normal humans. It’s part of the fun of watching a zombie film – when that person comes around the corner, we (along with the human characters in the film) are watching carefully. Are they shuffling or just limping? Are they running towards us or away from something else? It’s the movement that gives away a zombie, and it’s the movement that will give away a zombie manuscript.

I want to take a minute to look at a manuscript in action. This is a video of me turning the pages of Ms. Codex 1056, a Book of Hours from the University of Pennsylvania. This will give you an idea of what this manuscript is like (its size, what its pages look like, how it moves, how it sounds), although within Mori’s conception this video is more similar to a bunraku puppet than it is like the manuscript itself.

It’s a copy of the manuscript, showing just a few pages, and the video was taken in a specific time and space with a specific person. If you came to our reading room and paged through this manuscript, it would not look and act the same for you.

e-codices manuscript viewer
e-codices viewed through Mirador

Now let’s take a look at a few examples of different page-turning interfaces. The first is from e-codices, and is their regular, purpose-built viewer. When you select the next page, the opening is simply replaced with the next opening (after a few seconds for loading). The second is also e-codices, but is from the Mirador viewer, a IIIF viewer that is being adopted by institutions and that can also be used by individuals. Similar to the other viewer, when you select the next page the opening is replaced with the next opening (and you can also track through the pages using the image strip along the bottom of the window). The next example is a Bible from Swarthmore College near Philadelphia, presented in the Internet Archive BookReader. This one is designed to mimic a physical page turning, but it simply tilts and moves the image. This would be fine (maybe a bit weird) if the image were text-only, but as the image includes the edges of the text-block and you can see a bit of the binding, the effect here is very odd. Finally, my old friend Turning the Pages (a newer version than the one I complained about in my 2009 paper), which works very hard to mimic the movement of a page turning, but does so in a way that is unlike any manuscript I’ve ever seen.

Escape by Design

In the third section of his article, Mori proposes that designers focus their work in the area just before the uncanny valley, creating robots that have lower human likeness but maximum affinity (similar to how he discussed bunraku puppets in the section on affinity, although they are on the other side of the valley). He says:

In fact, I predict that it is possible to create a safe level of affinity by deliberately pursuing a nonhuman design. I ask designers to ponder this. To illustrate the principle, consider eyeglasses. Eyeglasses do not resemble real eyeballs, but one could say that their design has created a charming pair of new eyes. So we should follow the same principle in designing prosthetic hands. In doing so, instead of pitiful looking realistic hands, stylish ones would likely become fashionable.

Floral Porcelain Leg from the Alternative Limb Project (http://www.thealternativelimbproject.com/project/floral-porcelain-leg/)

And here’s an example of a very stylish prosthetic leg from the Alternative Limb Project, which specializes in beautiful and decidedly not realistic prosthetic limbs (and realistic ones too). This is definitely a leg, and it’s definitely not her real leg.

 

In the world of manuscripts, there are a few approaches that would, I think, keep digitized manuscript presentations in that nice bump before the valley:

“Page turning” interfaces that don’t try too hard to look like they are actually turning pages (see the two e-codices examples above)

Alternative interfaces that are obviously not attempting to show the whole manuscript but still illustrate something important about them (for example, RTI, MSI, or 3D models of single pages). This example is an interactive 3D image of the miniature of St. Luke from Bill Endres’s Manuscripts of Lichfield Cathedral project.

Visualizations that illustrate physical aspects of the manuscript without trying to imitate them (for example, VisColl visualizations with collation diagrams and bifolia)

 

I think these would plot out something like this on the graph.

Dot’s 2018 Conception of the Uncanny Valley of Digitized Manuscripts

This is all I have to say about the uncanny valley and zombie books, but I’m looking forward to Johanna, Bridget and Angie’s contributions and to our discussion at the end. I also want to give a huge shout-out to Johanna and Bridget, to Johanna for conceiving of this session and inviting me to contribute, and both of them for being immensely supportive colleagues and friends as I worked through my thoughts about frankenbooks and zombie manuscripts, many of which, sadly, didn’t make it into the presentation, but which I look forward to investigating in future papers.

[1] M. Mori, “The uncanny valley,” Energy, vol. 7, no. 4, pp. 33–35, 1970 (in Japanese); M. Mori, K. F. MacDorman and N. Kageki, “The Uncanny Valley [From the Field],” IEEE Robotics & Automation Magazine, vol. 19, no. 2, pp. 98–100, June 2012 (translated into English) (https://ieeexplore.ieee.org/document/6213238/)

Data for Curators: OPenn and Bibliotheca Philadelphiensis as Use Cases

Following are my remarks from the Collections as Data National Forum 2 event held at the University of Nevada, Las Vegas, on May 7 2018. Collections as Data is an Institute of Museum and Library Services supported effort that aims to foster a strategic approach to developing, describing, providing access to, and encouraging reuse of collections that support computationally-driven research and teaching in areas including but not limited to Digital Humanities, Public History, Digital History, data driven Journalism, Digital Social Science, and Digital Art History. The event was organized by Thomas Padilla, and I thank him for inviting me. It was a great event and I was honored to participate.

Today I’m going to be talking about curators as an audience for collections as data, using two projects from the University of Pennsylvania’s Kislak Center for Special Collections, Rare Books and Manuscripts as use cases. I am a curator in the Kislak Center, and most of my time I work on projects under the aegis of the Schoenberg Institute for Manuscript Studies, which is a unit under the Kislak Center. SIMS is a kind of research and development group (our director likes to refer to it as a think tank) that focuses on manuscript studies writ large, mostly but by no means only focused on medieval manuscripts from Europe, and that specializes in examining the relationship between manuscripts as physical objects and their digitized counterparts.

For this session, we’ve been asked to react to this assertion from the Collections as Data Santa Barbara Statement: Collections as data designed for everyone serve no one, and to discuss the audiences that our collections as data are built for.

I’ll start with OPenn, which launched in May 2015 as an open access collection of Penn’s digitized manuscript material. Penn started digitizing its manuscripts in the mid 1990s, but they had been virtually locked in a black box system. To create OPenn we cracked open the box, generated new derivative images from the master TIFF files, generated TEI/XML manuscript description files using the data from our catalog and supporting databases, and put it all on a fully public file server. The collection navigation is provided by HTML pages – one that lists all the repositories, pages listing the manuscripts in each repository, and finally HTML pages for each manuscript presenting the catalog data and links to the image files. At the time OPenn launched, there was no search facility, although one has recently been added.

OPenn’s developer, Doug Emery, describes the access that OPenn provides as friction-free access, referring both to the licensing (the image files are in the public domain, the metadata is licensed cc:by) and to the technical access. There’s no login and no API. You can navigate to the site in a browser and download images, or you can point wget at the server and bulk download entire manuscripts.
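As a rough illustration of what pointing wget at the server can look like when scripted, here is a minimal Python sketch. The URL is a placeholder, not a real OPenn path, and the function names are mine; the wget options used (`--recursive`, `--no-parent`, `--no-host-directories`, `-P`) are standard GNU Wget options.

```python
import subprocess


def wget_command(url, dest="."):
    """Build a wget command that mirrors one directory tree into dest."""
    return [
        "wget",
        "--recursive",            # follow links below the starting URL
        "--no-parent",            # never climb above the starting directory
        "--no-host-directories",  # do not create a folder named after the host
        "-P", dest,               # save everything under this local folder
        url,
    ]


def download(url, dest="."):
    """Run the command; requires wget to be installed and network access."""
    subprocess.run(wget_command(url, dest), check=True)


# Placeholder URL for illustration only; substitute a real manuscript directory.
EXAMPLE = wget_command("https://example.org/manuscripts/ms1/", "downloads")
```

Building the command separately from running it makes it easy to inspect what will be fetched before starting a large download.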

When we were designing OPenn, we weren’t thinking that much about the audience, honestly. We were thinking about pushing the envelope with fully available, openly licensed, high resolution, robustly described and well-organized digitized medieval manuscripts. We did imagine who might use our collections, and how, and you can read the statement from our readme here on the screen.

But I can’t say that we built the system to serve any audience in particular. We did build the system in a way that we thought would be generally useful and usable. But it became clear after OPenn launched that our lack of an audience made it difficult for us to “sell” OPenn to any group of people. Medievalists, faculty and students, who might want to use the material, were put off by the relatively high technical learning curve, the simple interface (lacking the expected page-turning view) and by the lack of search (we do have a Google Search now, but it was only added to the site in the past month). Data analysts who might want to visualize the collection-wide data were put off by the formatting of each manuscript having its own TEI file. Indeed data designed for everyone does seem to serve no one.

But wait! Don’t lose hope! An accidental audience did present itself. In the months and into the first year after OPenn launched, it was slowly used as a source for projects. The Philadelphia Area Consortium of Special Collections Libraries, PACSCL, undertook a collaborative project whereby each member institution digitized five diaries from their collections, which were put on OPenn, the PACSCL Diaries Project.

When the project went live, the folks at PACSCL wanted a user-friendly way to make the diaries available, so I generated page-turning interfaces using the Internet Archive BookReader that pull in metadata from the TEI files and point to the image files served on OPenn.

At some point I decided that I wanted to get a better sense of one of our manuscript collections, the Lawrence J. Schoenberg Collection, so again I wrote a script to generate a CSV file pulling from all the collection’s TEI files. Jessie Dummer, the Kislak Center’s Digitization Project Coordinator, cleaned up the data in the CSV, and we were able to load the CSV into Palladio for visualization and analysis (on github)
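For readers curious what a TEI-to-CSV script of this kind might look like, here is a minimal sketch in Python. It is not the script I actually used: the column names, the choice of fields, and the sample record are all assumptions made for illustration. The element names (`msIdentifier`, `idno`, `origDate`, `origPlace`) and the namespace are standard TEI.

```python
import csv
import io
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical column names mapped to the TEI elements they are read from.
FIELDS = {
    "shelfmark": ".//tei:msIdentifier/tei:idno",
    "title": ".//tei:titleStmt/tei:title",
    "origin_date": ".//tei:origin/tei:origDate",
    "origin_place": ".//tei:origin/tei:origPlace",
}


def tei_to_row(tei_xml):
    """Pull one CSV row (as a dict) out of one TEI manuscript description."""
    root = ET.fromstring(tei_xml)
    row = {}
    for column, path in FIELDS.items():
        node = root.find(path, TEI_NS)
        # Missing elements become empty cells rather than errors.
        row[column] = "".join(node.itertext()).strip() if node is not None else ""
    return row


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(FIELDS))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# A made-up, heavily abbreviated record, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc>
    <titleStmt><title>Book of Hours</title></titleStmt>
    <sourceDesc><msDesc>
      <msIdentifier><idno>Ms. Example 1</idno></msIdentifier>
      <history><origin>
        <origDate>1450</origDate><origPlace>France</origPlace>
      </origin></history>
    </msDesc></sourceDesc>
  </fileDesc></teiHeader>
</TEI>"""
```

In practice you would loop over every TEI file in a collection, call `tei_to_row` on each, and write the combined rows out once. The resulting CSV still needs the kind of manual cleanup described above before it is ready for visualization.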

I combined the links to images on OPenn with data gathered through another SIMS project, VisColl (which I’ll describe in a bit more detail later) to generate a visualization of the gathering structure of manuscripts with the bifolia, or sheets, laid alongside. And last but not least, I experimented with setting up a IIIF image server that could serve the images from OPenn as IIIF-compatible images (this is a screenshot of the github site where I published IIIF manifests I generated as part of that project, but they don’t work because the server no longer exists).

The accidental audience? It was me.

I don’t remember thinking about or discussing with the rest of the team as we planned for OPenn how I might use it as part of my regular work. I was familiar with the concept of an open collection of metadata and image files online; OPenn was based on The Digital Walters, which both the Director of the Kislak Center Will Noel and Doug Emery had built when they were employed at the Walters Art Museum in Baltimore, and I had been playing with that data for a year before I was even hired at Penn. I must have known that I would use it, I just didn’t realize how much I would use it, or how having it available to me would change the way I thought about my work, and the way I worked with the collections. The things that made it difficult for other people to use OPenn – the lack of a search facility, the dependence on XML – didn’t affect me negatively. I already knew the collection, so a search wasn’t necessary; at the time OPenn launched I had been working with XML technologies for 10 years or so, so I was very comfortable with it.

Having OPenn as a source for data gives me so much in my curatorial role: flexibility, easy access, and familiar formats. I can build the interfaces I want using tools I understand.

At the very end of 2015, several months after OPenn was launched, we, along with PACSCL, Lehigh University, and the Free Library of Philadelphia, were awarded a grant from the Council on Library and Information Resources under the “Digitizing Hidden Collections” program to digitize western Medieval manuscripts in 15 Philadelphia area libraries. We call the project Bibliotheca Philadelphiensis, the “library of Philadelphia”, or BiblioPhilly for short. Drawing on my experience working with data on OPenn, during the six-month lead-up to cataloging and digitization I was able to build the requirements for the BiblioPhilly metadata in a way that would guarantee that the resulting data would be useful to me and to the curators and librarians at the other institutions. Some of the things we implemented include a closed list of keywords (based on the keyword list developed for the Digital Walters), in contrast with the Library of Congress subject headings in OPenn, and four different date fields (date range start, date range end, single date, and narrative date) with strict instructions for each (except for narrative date) to ensure that the dates will be computer readable.
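To give a sense of what "computer readable" buys you, here is a small Python sketch that checks the three structured date fields of a record. The field names and the rules themselves (plain integer years, either a single date or a complete range, start no later than end) are assumptions for illustration, not the project's actual cataloging instructions. The narrative date is deliberately left unchecked.

```python
# Hypothetical field names for the three machine-readable date fields.
DATE_FIELDS = ("date_single", "date_range_start", "date_range_end")


def check_dates(record):
    """Return a list of problems with a record's date fields (empty if none)."""
    problems = []
    years = {}
    for field in DATE_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            continue  # blank fields are allowed
        try:
            years[field] = int(str(value).strip())
        except ValueError:
            problems.append(f"{field} is not a plain year: {value!r}")

    start = years.get("date_range_start")
    end = years.get("date_range_end")
    if "date_single" in years and (start is not None or end is not None):
        problems.append("use either a single date or a range, not both")
    if (start is None) != (end is None):
        problems.append("a range needs both a start and an end")
    if start is not None and end is not None and start > end:
        problems.append("range start is later than range end")
    return problems
```

A value like "ca. 1450" belongs in the narrative date; in a structured field it would be flagged here, which is exactly the kind of slip that strict instructions are meant to prevent.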

We have also integrated data from VisColl into BiblioPhilly, both into the data itself, and in combination with the data in the interfaces. VisColl, as I mentioned before, is a system to model and visualize the quire structure of manuscripts. (A manuscript’s quire structure is called its collation, hence the name VisColl – visualizing collation.) VisColl models are XML files that describe each leaf in a manuscript and how those leaves relate to each other (if they are in the same quire, or if they are conjoined, if a leaf is missing or has been added, etc.). From a model we’re able to generate a concise description of a manuscript’s construction, in a format referred to as a collation formula, and this formula is included in the manuscript’s cataloging and becomes part of the TEI manuscript description. However we’re also able to combine the information from the collation model with the links to the image files on OPenn to generate views of a collation diagram alongside the sheets that make up the quires.
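The underlying idea can be sketched in a few lines of Python. This is not VisColl's data model or its formula syntax: it assumes only regular quires (no missing or added leaves), represents a manuscript as a plain list of leaf counts, and uses one simplified formula notation among the many in use.

```python
def conjugate_pairs(leaf_count):
    """Pair each leaf of a regular quire with its conjoin (1 with last, etc.)."""
    if leaf_count % 2:
        raise ValueError("a regular quire has an even number of leaves")
    return [(i, leaf_count + 1 - i) for i in range(1, leaf_count // 2 + 1)]


def collation_formula(quire_sizes):
    """Collapse runs of equal-sized quires, e.g. [8, 8, 8, 6] -> '1-3(8), 4(6)'."""
    parts = []
    start = 0
    for i in range(1, len(quire_sizes) + 1):
        # Close the current run at the end of the list or when the size changes.
        if i == len(quire_sizes) or quire_sizes[i] != quire_sizes[start]:
            label = str(start + 1) if i - start == 1 else f"{start + 1}-{i}"
            parts.append(f"{label}({quire_sizes[start]})")
            start = i
    return ", ".join(parts)
```

The conjugate pairs are what make the sheet views possible: once you know that leaves 1 and 8 of a quire are the same sheet, you can place their images side by side.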

For BiblioPhilly, because of the experimentation we did with Penn manuscripts on OPenn, we’ve been able to make the digitized BiblioPhilly manuscripts available online in ways that are more user-friendly to non-technical users than OPenn is, even before we have an “official” project interface. We did this by building an In Progress Viewer relatively early on. The aim of the In Progress viewer was 1) to provide technically simple, user-friendly ways to search, browse, and view the manuscripts, and 2) to make available information both about the manuscripts that were online, and about the manuscripts that had yet to go online (including the date they were photographed, so users can track manuscripts of particular interest through the process).

The first In Progress Viewer was built in the Library of Congress’s Viewshare, which provided faceted browsing for all the fields in our records, along with a timeline and simple mapping facility. Unfortunately the Library of Congress is no longer supporting Viewshare, and when it went offline on March 20 we moved to an Omeka platform, which is more attractive but lacks the faceted searching that made Viewshare so compelling. From Omeka (and Viewshare before it) we link to the manuscript data on OPenn, to Internet Archive BookReader page-turners, and to VisColl collation views. Both the BookReaders and VisColl views are generated locally from scripts and hosted on a Digital Ocean droplet. This is a temporary system, and is not built to last beyond the end of the project. It will be replaced by an official, longer-lived interface.

We’re also able to leverage the OPenn design of BiblioPhilly and VisColl for this “official” interface, which is currently under development with Byte Studios of Milwaukee, Wisconsin. While our In Progress Viewer has both page-turning facility and collation views, those elements are separate and are not designed to interact. The interface that we are designing with Byte Studios incorporates the collation data with the page-turning and will allow a user to switch seamlessly between page openings and full sheets.

It’s exciting that we’ve been able to leverage what was essentially an audience-less platform into something that can so well serve its curator, but there is a question that this approach pushes wide open: What does it mean to be a curator? With a background in digital humanities focused on the development of editions of medieval manuscripts I was basically the perfect curator for OPenn. But that was a happy accident. Most special collections curators don’t have my background or my technical training, so access to something like OPenn wouldn’t help them, and I’m very hesitant to suggest that every curator be trained in programming. I do think that every special collections department should have some in-house digital expertise, and maybe that’s the direction to go. Anyway, I’m very happy being in my current situation and I only wish we’d considered the curator as an audience for OPenn earlier in the process.

Workflow: MS Word to TEI

For the past couple of years I’ve been refining a workflow to convert MS Word files into publishable TEI. By “publishable” I mean TEI that can be loaded into some existing publication system (something like TEI Publisher, Edition Visualization Technology (EVT), or TEI Boilerplate), or that you could process yourself in some other way.

Why might you want to use such a workflow? In my experience, it’s useful when you have a person or people who are designated as transcribers, but who aren’t comfortable or interested in encoding in XML. Microsoft Word is ubiquitous, so pretty much everyone in academia uses it and has access to it. For people who don’t want to work with pointy brackets but still want to collaborate on a digital editing project, a workflow that converts Microsoft Word to TEI can be very useful. (I have also used this workflow myself, even though I’m capable of hand-encoding XML, just because there are times when I’d rather just do it in Word. YMMV!)

I think the workflow works best when there is one person designated to do all the conversion at the end (steps 4 and 5) and any number of people involved in the first three steps. The workflow could be used in the classroom as a group project (where the students model the TEI, plan the pseudocodes, and do the encoding, and one student or the instructor does the conversion work at the end) although I’ve only used it for non-classroom editing projects.

There are a few things you need in order to be successful with this workflow:

  1. You need a team that knows TEI. This doesn’t mean they need to know XML! (Although yes, you will need someone on your team who knows XML, but that’s not related to TEI.) You need to know TEI basics – what tags and attributes are, how modules and classes work – and you need these because you need to know what TEI tags you want in your final document before you start transcribing.
  2. Microsoft Word (obviously)
  3. OxGarage conversion tools. OxGarage is a service of the Text Encoding Initiative, which provides scripts for converting between a variety of text formats, including MS Word to TEI.
  4. OxygenXML Editor (or an XML editor of your choice). OxygenXML is popular with the TEI community, and it has the find & replace functionality that is required by this workflow. BBEdit is another XML/text editor that I use a lot, and it has a great find & replace functionality, but it doesn’t work as well for this workflow for reasons I’ll describe later in this post.

The steps of the workflow are (briefly):

  1. Model your TEI.
  2. Create pseudocodes to “tag” in MS Word.
  3. Transcribe in MS Word, using the pseudocode “tags” to indicate those things that will eventually be converted into TEI.
  4. Convert the finished MS Word document into TEI using OxGarage.
  5. Use find & replace in OxygenXML to convert the pseudocode tags into TEI tags, resulting in well-formed and complete TEI.

In more detail:

Model your TEI

The very first thing you need to do is to decide what you need your finished TEI to be able to do. If you’re working with an existing system (e.g., if you know you’ll be publishing in EVT at the end) some of your decisions will be made for you, because you will need to have TEI code that the system can use.[1] Are you encoding abbreviations, and if so are you going to tag the entire word or just the abbreviation and expansion? Are you going to normalize spelling, and if so are you going to do it silently or tag it? Are there marginalia you want to include in your TEI code? Do you want to include editorial notes?

Make a list of everything you need in your TEI, which TEI tags you plan to use, and how you plan to use them. You’ll need this to do the next step of the workflow, the creation of pseudocodes.

Create pseudocodes and tag them in MS Word

Pseudocodes are what I call non-TEI formatting elements that are used to set text apart, and are later processed into TEI tags. Pseudocodes can be divided into two main types: native MS Word formatting (italics, underlining, superscript, etc.) and punctuation marks.

Native MS Word Formatting

Native formatting in Word

MS Word formatting is converted by OxGarage into TEI <hi> tags with the relevant values for @rend. For example, Italics converts to <hi rend="italic">Italics</hi>, Bold converts to <hi rend="bold">Bold</hi>, Underscore converts to <hi rend="underline">Underscore</hi>, Strikethrough converts to <hi rend="strikethrough">Strikethrough</hi>, Red text converts to <hi rend="color(FF0000)">Red text</hi>, Yellow highlight (not an option in WordPress) converts to <hi rend="background(yellow)">Yellow highlight</hi>, Superscript converts to <hi rend="superscript">Superscript</hi>, and Subscript converts to <hi rend="subscript">Subscript</hi>.

This is of course useful if you want these exact tags reflected in your final TEI, but once the TEI comes out of OxGarage, you can use the find and replace function in OxygenXML (or some other text/XML editor) to convert these tags into other tags. More on this below.
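If you would rather script that retagging than run it as a series of find & replace operations in an editor, it might look something like the following Python sketch. The mapping from @rend values to TEI tags is purely hypothetical (each project would decide its own), and the regular expression assumes straight double quotes and <hi> elements that are not nested inside one another.

```python
import re

# Hypothetical project decisions: which Word formatting stands for which TEI tag.
REND_TO_TAG = {
    "strikethrough": "del",
    "superscript": "add",
    "underline": "unclear",
}

# Matches one non-nested <hi rend="...">...</hi> element.
HI_PATTERN = re.compile(r'<hi rend="([^"]+)">(.*?)</hi>', re.DOTALL)


def retag_hi(tei, mapping=REND_TO_TAG):
    """Swap <hi> elements for other TEI tags according to their @rend value."""
    def swap(match):
        rend, content = match.group(1), match.group(2)
        tag = mapping.get(rend)
        if tag is None:
            return match.group(0)  # leave unmapped formatting untouched
        return f"<{tag}>{content}</{tag}>"
    return HI_PATTERN.sub(swap, tei)
```

Formatting that is not in the mapping (italics, in this sketch) passes through unchanged, so you can convert a few pseudocodes at a time and check the result after each pass.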

Native MS Word formatting works very well and can represent a very large number of TEI tags (using just text color and highlight would give you 75 pseudocodes mapping to 75 TEI tags or tag/abbreviation combinations), but there are definitely cases when you would want to use punctuation marks instead.

Punctuation Marks

You can use punctuation marks to set text apart that might not correspond 1:1 with a TEI tag. These are cases, such as expanded abbreviations or corrected readings, where you need tags nested within tags. Brackets work particularly well for this, especially various combinations of brackets. You do need to be careful about configuring bracket combinations, particularly when you’ll have brackets nested within brackets, and (as will also be mentioned later) the order in which you find & replace brackets later will also be relevant. This isn’t a matter to be taken lightly. You should test your pseudocodes and find & replace expressions on a section of text before encoding a full text.

Here is an example using the first line of Genesis 3, from University of Pennsylvania MS. Codex 236, fol. 31r

Genesis 3:1, UPenn Ms. Codex 236, fol. 31r

The text in this line reads:

sed et serpens erat callidior cūctis aīantib t̄

This includes a number of abbreviations that we could expand silently, or we could encode them in TEI in a few different ways. (For more information see the TEI Guidelines 11.3.1.2, “Abbreviations and Expansion”) Options include:

Noting that a word contains an abbreviation, without expanding it. In this example we put <abbr> tags around the complete word, and <am> tags around the abbreviated letter:

sed et serpens erat callidior <abbr>c<am>ū</am>ctis</abbr> <abbr>a<am>ī</am>anti<am>b</am></abbr> <abbr><am></am></abbr>

In word, you might choose pseudocodes using nested brackets. In this case, [[]] will be converted later into <abbr></abbr>, and [] (nested within [[]]) will be converted to <am></am>:

sed et serpens erat callidior [[c[ū]ctis]] [[a[ī]antib]] [[[t̄]…]]

Alternatively, you might choose to encode both abbreviation and expansion, and enable the system to choose between them. In this example we add <expan> and <ex> tags to the mix alongside <abbr> and <am>, and then include <choice> to make it clear that the abbreviations and expansions come in pairs:

sed et serpens erat callidior

<choice>

<abbr>c<am>ū</am>ctis</abbr>

<expan>c<ex>un</ex>ctis</expan>

</choice>

<choice>

<abbr>a<am>ī</am>anti<am>b</am></abbr>

<expan>a<ex>nim</ex>antib</expan>

</choice>

<choice>

<abbr><am>t̄</am></abbr>

<expan><ex>ter</ex></expan>

</choice>

As above, you can come up with combinations of marks that you can use to indicate the encoding. In this case <abbr> and <am> are encoded as above, || will later be converted to <choice>, {{}} will be converted later into <expan></expan>, and {} (nested within {{}}) will be converted to <ex></ex>:

sed et serpens erat callidior |[[c[ū]ctis]]{{c{un}ctis}}| |[[a[ī]anti[b]]]{{a{nim}anti{bus}}}| |[[[t̄]]]{{{ter}…}}|

I like to group brackets of the same type together (as here, where square brackets are used for abbreviations, curly brackets for expansions, and pipes for choice) but you can also combine them in various ways for more options. For example, here are the bracketing options for a project I’m currently working on:

In all cases you need to be very careful that the punctuation marks you use don’t appear in your text, or only use them in combinations that don’t appear in your text, or else you will accidentally create TEI tags where you don’t want them.

Convert Word to TEI in OxGarage

Once you’ve transcribed and entered pseudocodes in MS Word, it’s time to convert your Word file into TEI. You can do this using OxGarage, a conversion service provided by the TEI. OxGarage has an online interface where you can convert one document at a time, described here, but you can also download the XSLTs from GitHub and run bulk conversion processes (converting multiple files at one time).

The online OxGarage interface is at http://www.tei-c.org/oxgarage/. You need to indicate that you are converting Documents, then select your options from (Microsoft Word doc or docx) and to (TEI P5), then load in your Word document and click “Convert”. Here is my input file (so you can download it and try this yourself), and a screenshot: 
OxGarage will generate a TEI file with a template header (including information gleaned from the Word document) and the textual content of the Word doc converted into very basic TEI (file here; you’ll need to change the file extension to .xml):

You can see here how the Word comment is converted into a <note> nested in <hi>, with a <date> included. I also included some red text (to indicate the Tironian et symbol) which has converted as expected. The punctuation mark pseudocodes are unchanged.

Replace Pseudocodes with TEI Tags

This is where we convert those pseudocodes – the <hi> color tags and the combinations of punctuation marks – into TEI. I like to do this in OxygenXML, because that software has an advanced find & replace that enables you to search using regular expressions, including the ability to save pieces of what is being searched and reuse them in the replace (a bit like setting a variable in the search).[2]

As mentioned above the order in which you replace tags matters. You will always want to replace the outermost pseudotags first, then the interior ones, because the find & replace will always match from the first instance of a character in the regular expression to the last instance. This means that if you have […] (for <am>) nested inside [[…]] (for <abbr>) you need to replace the [[…]] before the […] or else you will end up with <am> around the word, and there will be no match when you then search for [[…]].

For example, to find [[…]] and replace it with <abbr>, using Find/Replace with the “Regular Expression” and “Dot matches all” boxes checked, you would search for:

\[\[([^\s]*)\]\] (this is a regular expression that will find every instance of a string of any characters except whitespace (\s) enclosed by [[ and ]]. The central part of the expression is enclosed in parentheses because we’re going to reuse it in the replace. The square brackets are preceded by \ to ensure they are treated as literal characters and not as part of the regular expression syntax)

And replace that with

<abbr>$1</abbr> (This will replace the [[ and ]] with the opening and closing tags, and copy everything else in the middle – $1 refers to the piece of the search that was enclosed in parentheses)
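If you would rather script this step than click through Oxygen, here is a minimal sketch of the same find & replace in Python. It assumes Python's built-in re module, where the back-reference is written \1 instead of $1. The sample line is the pseudocoded text from earlier in this post, and the variable names are only illustrative.

```python
import re

# One line of pseudocoded transcription, copied from the example above.
line = "sed et serpens erat callidior [[c[ū]ctis]] [[a[ī]antib]] [[[t̄]…]]"

# The same pattern as the Oxygen search: [[, then any run of non-whitespace, then ]].
abbr_pattern = re.compile(r"\[\[([^\s]*)\]\]")

# \1 plays the role of $1: it copies whatever the parentheses captured.
tagged = abbr_pattern.sub(r"<abbr>\1</abbr>", line)
print(tagged)
```

Only the outer [[ ]] pairs are converted here; the inner [ ] pseudocodes are left in place for a later pass.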

Unfortunately, if you have an abbreviated word that starts on one line and ends on the following line (as we do here – the last word on this line is terre, but the re are on the next line) this regular expression won’t catch it, because it stops at any whitespace, including the line break. So I do two sets of finds for each set of pseudocodes: one using the expression above, which excludes whitespace, and a second one which includes it.

\[\[(.*)\]\] (replace with <abbr>$1</abbr>) as above

You don’t want to include spaces in your first search because if you have multiple sets of the same pseudocode in your document (which you probably do), the regular expression will match across all the spaces, so it will only find the very first and the very last instance of the double brackets and you’ll end up with this:

The regular expression has matched the first [[ (on line 43) and the last ]] (on line 46), but there are many in between that are missed because spaces are included.
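The difference between the two searches can be sketched in Python as follows. This is only an illustration: the sample text is hypothetical (two abbreviations on one line, plus one word broken across a line break), and re.DOTALL stands in for Oxygen's "Dot matches all" checkbox.

```python
import re

# Hypothetical sample: two pseudocodes on one line and a third split over a line break.
text = "[[c[ū]ctis]] [[a[ī]antib]] [[ter\nre]]"

no_spaces = re.compile(r"\[\[([^\s]*)\]\]")
with_spaces = re.compile(r"\[\[(.*)\]\]", re.DOTALL)

# Running the with-spaces search first matches from the first [[ to the last ]].
overmatch = with_spaces.search(text).group(1)
print(repr(overmatch))

# Running the no-spaces search first, then the with-spaces search, tags each one.
first_pass = no_spaces.sub(r"<abbr>\1</abbr>", text)
second_pass = with_spaces.sub(r"<abbr>\1</abbr>", first_pass)
print(repr(second_pass))
```

Note that the second search would still overmatch if more than one split word were left after the first pass. A non-greedy (.*?) is one way around that, though that is my suggestion here and not part of the workflow described in this post.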

Starting with the first search followed immediately with the second gives you:

Similarly, replace |…| with <choice></choice> and {{…}} with <expan></expan>. Now all the outer nesting has been replaced with TEI tags:

When you have multiple codes that may be nested in a single tag (as the multiple [] and {} now within <abbr> and <expan>) you need to modify the regular expression again, so it catches every matching pair of brackets.

\{([^\s\}]*)\} (Note the \} now within the square brackets. This will keep the expression from moving past the first closing bracket)
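Putting the ordering rule together, the whole sequence (outermost pseudocodes first, inner ones last) might be sketched like this in Python. The patterns for |…|, {{…}} and […] are my own analogues of the two expressions quoted in this post, so treat them as assumptions to test against your own text; the sample line is a shortened version of the one above.

```python
import re

line = "|[[c[ū]ctis]]{{c{un}ctis}}| |[[a[ī]anti[b]]]{{a{nim}anti{bus}}}|"

# Ordered list of (search, replace) pairs: outer pseudocodes before inner ones.
replacements = [
    (r"\|([^\s]*)\|", r"<choice>\1</choice>"),
    (r"\[\[([^\s]*)\]\]", r"<abbr>\1</abbr>"),
    (r"\{\{([^\s]*)\}\}", r"<expan>\1</expan>"),
    # Inner codes: the closing bracket is excluded from the match so that
    # each search stops at the first closing bracket it meets.
    (r"\[([^\s\]]*)\]", r"<am>\1</am>"),
    (r"\{([^\s\}]*)\}", r"<ex>\1</ex>"),
]

tei = line
for search, replace in replacements:
    tei = re.sub(search, replace, tei)

print(tei)
```

Keeping the pairs in a list makes the order explicit, which matters because swapping the inner and outer searches would wrap the wrong spans.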

The result is a complete set of TEI tags encoding abbreviations and expansions (result file here, change the file extension to .txt).

 

You can also use OxygenXML’s find & replace function to replace the pseudocode TEI tags, or you can be fancy and write an XSLT to do that work. In this example, I want to replace the <hi rend="color(FF0000)"> with <g ref="#t_et"> (I’ll add a corresponding <glyph> tag to the <charDecl> section of the header as described in 5.5.2 of the TEI Guidelines). This is fairly straightforward: since I know the content of the tag will always be “et”, I can do a find & replace for the whole thing. If the content of the tag varies, I can use a regular expression as I did above to copy content from find to replace.
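As a rough sketch of that last case, where the content of the tag varies, the replacement could look like this in Python. The sample string is made up for illustration, and the non-greedy (.*?) is my choice so that two red spans on the same line are not merged into one.

```python
import re

# Made-up fragment of OxGarage output containing the red-text pseudocode.
tei = 'sed <hi rend="color(FF0000)">et</hi> serpens erat callidior'

# The parentheses in color(FF0000) are escaped so they are read literally.
red_text = re.compile(r'<hi rend="color\(FF0000\)">(.*?)</hi>')

converted = red_text.sub(r'<g ref="#t_et">\1</g>', tei)
print(converted)
```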

And that’s the workflow. It’s still a lot of work: you need a strong handle on the TEI and you need to plan everything in advance. But if you are working with a large number of people transcribing, and advanced TEI training isn’t possible or desirable, it is an approach worth considering.

[1] EVT for example has specific requirements for tags it can process and how those tags need to be formatted, and if your TEI doesn’t meet those requirements it won’t work out of the box – you’ll need to modify the EVT code to suit your TEI.

[2] For more information and tutorials on Regular Expressions, visit https://regexone.com/.

Using VisColl to Visualize Parker on the Web: Reports on an experiment

This is the full text of a talk I presented at the Parker on the Web 2.0 Symposium in Cambridge on March 16, 2018. (Please note the addendum at the end, which addresses an issue that came up in discussion later in the day.)

I want to begin my presentation by talking about interface.

DATA OVER INTERFACE

A couple of years ago I presented a keynote at a digital humanities conference on digital editing in which I made the argument that data for a project should take precedence over the interfaces used to present that data. (I stole this idea from my colleague Doug Emery, and I liked it so much, I had it put on a T-shirt.) In my talk today I want to investigate how data and interface work together, how existing interfaces can influence both the data we gather and the development of new interfaces, and some ways that we can think around existing interfaces to develop new ones (and what this in turn means for our data).

 

This is MS 433, a Miscellany copied in a number of hands from the 13th into the late 15th century. If you want to see this manuscript, you have a few different options, which you can access through the menu in the top right.

The options are: Image View, Book View, Scroll View, and Gallery View. You probably know exactly what you’ll get when you make a selection here: Image View will present you with a single image, Book View will show the book openings, also known as facing pages (as Dr Anne McLaughlin said in her introduction at the Symposium, Book View presents the images “as a book, so when I turn the pages, it looks like a book”), Scroll View will show all the page images in a continuous row that you can scroll through back and forth, and Gallery View will show all page images as thumbnails in a single page.

Each of these views serves a different purpose: Image View, Book View, and Scroll View present the images in a size large enough to read, with slightly different methods for moving through the book, while Gallery View is more like a finding tool that also gives you the ability to get the “sense” of the aesthetic contents of a book: the relative size of script and written area, distribution of illuminations or miniatures, that kind of thing (as Anne said in her introduction, in this view you can “look at the whole thing – look for initials, for something pretty to look at”). You wouldn’t read a text in the Gallery View, you would select an image from that view and then interact with that larger image (clicking on a thumbnail in the Gallery View on Parker takes you to the Image View).

I want to consider for a moment why we present digital manuscript images in these ways. Let’s start by looking at some examples of non-digital manuscript facsimiles.

The Exeter book of Old English poetry . London, Printed and Pub. for the Dean and Chapter of Exeter Cathedral by P. Lund, Humphries & Co., ltd., 1933. Limited to twelve copies, unnumbered and not for sale and two hundred and fifty copies numbered and for sale of which this is no. 182 PR 1490 .A1 1933 Special Coll Oversize (University of Arizona)

For example, here’s an opening from the 1933 Early English Manuscripts in Facsimile facsimile of the Exeter Book. The pages face each other in the manuscript (this is 65b and 66a), but they’ve been decontextualized, presented in frames and with labels underneath.

Bestiario di Peterborough. Rome: Salerno Editrice, 2004

Compare this with the Salerno Editrice edition of the Peterborough Bestiary, published in 2004, which looks very much like what I imagine the manuscript looks like (I haven’t seen it so I can’t say for sure, but it definitely looks like a manuscript, unlike the Exeter Book facsimile, which looks like pictures of pages reproduced in a modern book).

Microfilm reader and microfilm,
https://blogs.acu.edu/csart/2017/02/08/from-microfilm-to-mass-media-biblical-manuscripts-in-the-digital-age/

And here’s something that is probably familiar to many of us: Microfilm, which presents images on a long ribbon of film, which you scroll through a special machine to find whichever page you want.

 

Microfiche Reader, linked example from https://www.abc-clio.com/ODLIS/odlis_m.aspx#microfiche

Finally, there’s Microfiche, which consists of rectangles of film onto which small images of pages are presented in a grid.

I expect you can see where I’m going with this, because I’m not exactly being subtle. The options for viewing manuscripts in Parker on the Web are basically the same as they have always been. The difference is that instead of having to go to a library to check out a book or access a reader (or order a book through interlibrary loan, if your library doesn’t own it), you can access them in your office, or at your house, at all times of day (as long as your Internet is working, and the system isn’t down).

It’s not just the Mirador Viewer (the interface that provides image access in Parker on the Web) that has these options; every online environment for viewing medieval manuscripts will have some similar setup with at least a page-turning interface and frequently a selection of the other three. E-codices is the only interface I know of that has another option: to view the front and back of a leaf at the same time, which is pretty cool. (The Scroll View also shows the front and back of leaves side by side, but in e-codices you can purposefully select this view. If you know of any other interface with unique views I would be very happy to know about them.)

A System: Data + Processes

Why is it the case that all manuscript libraries have basically the same interfaces? One reason is probably that, as we can see from the non-digital examples above, that’s the way we’ve always done it. We are used to seeing manuscripts as single pages, and facing pages, and scrolling pages, and galleries of pages, so that’s how we present them digitally. But it becomes a self-fulfilling prophecy. This is how we view manuscripts, so we create systems that allow us to look at manuscripts in this way. If we want to look at manuscripts in a different way, we need to build new systems. Keep in mind that in a computer system you have two things that need to work together: You need data (information presented in a format that the computer can work with), and you need processes (software or scripts that take that data and do something with it). (This is a really simplified view, of course, but I think it works pretty well.)

Parker on the Web: A IIIF System

Parker on the Web, for example, is a IIIF system, so in order to function it needs IIIF Manifests, which provide metadata in a specific format in addition to links to images served in a specific way, and it needs the IIIF server to serve the images, and the IIIF APIs (or more properly, software built to work with the APIs). If any piece of this system doesn’t meet specification – if the manifest is formatted incorrectly, or the image links don’t point to a IIIF image server, or the software doesn’t reference the APIs correctly – the system won’t work. Without both data and processes – data and processes designed to work together – you won’t have a working system.

I’m interested in creating new ways to present digitized manuscripts, and the frame I’m using is that of the manuscript’s collation. Rather than displaying a digitized manuscript only as a series of images of pages arranged from beginning to end, I want to create displays that take into account pages as leaves connected to each other through the pattern of the quiring: the collation.

M. R. James, A Descriptive Catalogue of the Manuscripts in the Fitzwilliam Museum (Cambridge University Press, 1895)

Collation isn’t a new way to think about manuscripts. In his 1895 Catalog of the Fitzwilliam Museum, M. R. James wrote a description of how to collate a manuscript, and also included collation formulas in most of the manuscript descriptions (a few random examples are in the figure below).

Examples of collation formulas from M. R. James’ Fitzwilliam catalogue

Parker on the Web also includes collation formulas.

Building a system that considers a manuscript’s quiring in the display should be possible. We have the information (in the form of collation formulas), so we should be able to build processes to act on that. But of course it’s not that simple, because although a collation formula contains the information a person might need to construct a diagram of the codex, it isn’t formatted in a way that is able to be processed by a computer. It’s not an effective piece of data for a system of the type I describe above. The collation formula isn’t data, it’s a visualization of data, just one way to express the physical collation of a manuscript among many possibilities, and which visualization you choose will depend on what you want to do with it. For example, formulas work well in library manuscript descriptions or catalog records because they are compact and textual, while diagrams might be better suited for a scholarly essay or book because they can be annotated. There are other views one could take of the same information; I’m quite fond of this synoptic chart that shows how different texts and image cycles are dispersed through this miscellany.

However, it takes work (both time and effort) to write formulas and draw diagrams and build charts. This is labor that doesn’t have to be repeated! What we need isn’t a formula, but a specially formatted, data-oriented description that can be turned into many different versions for different purposes.

VisColl as a system

This is where VisColl comes in. Briefly, VisColl is a system that consists of a data model, which is basically a set of rules, that you can use as a guide to build collation models of manuscripts, and then scripts that you can use to process the collation model to generate different views of that model. We currently have three working scripts: one that generates diagrams, one that generates a presentation of leaves as conjoins (which we call the bifolia view – this view requires digitized page images) and one that generates collation formulas. 
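To make the idea of one model feeding several views a little more concrete, here is a minimal sketch in Python. I should stress that this is not the VisColl data model or one of our scripts: the model is reduced to a hypothetical list of leaf counts per quire, the formula notation is made up for the example, and the pairing of leaves assumes regular quires with no added or missing leaves.

```python
from itertools import groupby

# Hypothetical, much-simplified model: the number of leaves in each quire, in order.
quire_sizes = [8, 8, 8, 6, 8, 8]

def collation_formula(sizes):
    """View 1: a compact formula, grouping runs of quires with the same leaf count."""
    parts, first = [], 1
    for leaves, run in groupby(sizes):
        last = first + len(list(run)) - 1
        label = str(first) if first == last else f"{first}-{last}"
        parts.append(f"{label}({leaves})")
        first = last + 1
    return ", ".join(parts)

def conjoin_pairs(leaves):
    """View 2: which leaves of a regular quire share a sheet (leaf i with leaf n+1-i)."""
    return [(i, leaves + 1 - i) for i in range(1, leaves // 2 + 1)]

print(collation_formula(quire_sizes))  # 1-3(8), 4(6), 5-6(8)
print(conjoin_pairs(8))                # [(1, 8), (2, 7), (3, 6), (4, 5)]
```

Both outputs come from the same list; nothing has to be written or drawn twice.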

We are currently at an in-between stage with VisColl. We had a first version of our data model, and we have developed a second version but that one doesn’t have good visualizations yet, so today I’m going to talk about our first model, but I’m happy to answer questions about the second data model later.

The prototype of VisColl took collation formulas from The Walters Art Museum in Baltimore’s Digital Walters collection and generated diagrams directly from them. It was this prototype work that convinced me that generating data out of existing formulas was a terrible idea and would never work at scale. Nevertheless, I decided that the first step in my experiment would be to attempt to parse the collation formulas in Parker on the Web and convert them into XML files following the rules for our collation models. I wrote scripts to pull the formulas out of the Parker records and got to work figuring out rules that would describe the conversion from formula to XML. As I spent a few hours on this, I was reminded of why we decided to move from processing formulas to creating new models in the first place.

The collation formulas in Parker on the Web are inconsistent (this is not a criticism of Parker on the Web – the formulas come from different catalogues created over time by many different people, with no shared guidelines. The same thing would happen in any project that combines existing catalogs). Unlike with printed books, there is no standard for manuscript collation formulas, and the formulas in Parker on the Web have a lot of variance among them, notably that some use Arabic numerals, some Roman numerals, and some letters, while some describe flyleaves as quires and some do not. Because of the inconsistency, it was very difficult to get a handle on every single thing that would need to be caught by a process in order to convert every detail of a formula into an XML model. The use of letters, Arabic numerals, and Roman numerals is one example. In order to identify quires I would need a script that would be able to interpret each of these, to recognize when a quire identified by a letter was a set of flyleaves and when not, and to be able to generate multiple quires when presented with a span of numbers or letters.
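A small sketch in Python shows how quickly even one piece of this (expanding a span of quire identifiers) runs into trouble. The function names and the rules are my own assumptions for illustration, not the scripts I wrote for this experiment: lowercase strings made only of i, v, x, l and c are read as Roman numerals, and any other single letters are read as letters.

```python
import re

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100}

def roman_to_int(numeral):
    """Convert a lowercase Roman numeral to an integer."""
    values = [ROMAN_VALUES[ch] for ch in numeral]
    total = 0
    for index, value in enumerate(values):
        # Subtractive notation: a smaller value before a larger one is subtracted.
        if index + 1 < len(values) and value < values[index + 1]:
            total -= value
        else:
            total += value
    return total

def expand_span(span):
    """Expand '1-3', 'iv-vi' or 'A-C' into a list of individual quire labels."""
    start, _, end = span.partition("-")
    end = end or start
    if start.isdigit() and end.isdigit():
        return [str(n) for n in range(int(start), int(end) + 1)]
    if re.fullmatch(r"[ivxlc]+", start) and re.fullmatch(r"[ivxlc]+", end):
        return [str(n) for n in range(roman_to_int(start), roman_to_int(end) + 1)]
    if len(start) == 1 and len(end) == 1 and start.isalpha() and end.isalpha():
        return [chr(code) for code in range(ord(start), ord(end) + 1)]
    raise ValueError(f"cannot interpret quire span: {span!r}")

print(expand_span("1-3"))    # ['1', '2', '3']
print(expand_span("iv-vi"))  # ['4', '5', '6']
print(expand_span("A-C"))    # ['A', 'B', 'C']
```

Even this toy version has to guess: a quire labelled with a lowercase "c" or "i" would be read as a Roman numeral, and nothing here can tell whether a lettered quire is a set of flyleaves.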

Penn Collation Modeler

It is possible that this is something that could be done, given enough time and expertise, but given my constraints I was clearly not going to be able to do it myself for this talk. So instead, I turned to the Collation Modeler, which is the tool that we use at Penn to build models from scratch. (Another implementation of VisColl is being developed as part of the Digital Tools for Manuscript Study project by the Old Books New Science Lab at the University of Toronto)

If you have a collation formula to work from, it’s actually pretty easy to build a model for it in the Collation Modeler.

MS 433 in the Penn Collation Modeler

Here is MS 433 again, in the context of the Collation Modeler. Here on the main manuscript page, I’ve listed out all the quires in the manuscript and you can see the number of leaves in each. Using the Collation Modeler I can generate multiple regular quires all at once and then modify them, or create quires one-at-a-time. Folio numbers are generated automatically, but if the manuscript is paginated I need to change the folio numbers to page numbers (formatted as two numbers separated by a dash); to make this easier I wrote a script to fix the numbering in the finished collation model rather than doing it in the modeler (pagination will be built into the system for the new data model).
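The arithmetic behind that renumbering is simple enough to sketch in Python. This is not the script I used (that one works on the XML of the finished collation model); it only illustrates the mapping from a leaf's position to a pair of page numbers, and the function name and the first_page parameter are hypothetical.

```python
def pages_for_leaf(leaf_position, first_page=1):
    """Return the page label for a leaf: recto and verso numbers separated by a dash."""
    recto = first_page + 2 * (leaf_position - 1)
    return f"{recto}-{recto + 1}"

# The first four leaves of a paginated manuscript that starts at page 1.
labels = [pages_for_leaf(position) for position in range(1, 5)]
print(labels)  # ['1-2', '3-4', '5-6', '7-8']
```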

MS 433 Quire 3 in the Penn Collation Modeler

Taking a look at Quire 3, we can see the list of leaves, the folio numbering (which again I can change – we can also renumber completely from any point in the manuscript, if for example the numbering skips a leaf or numbers repeat). We can also note the “mode” of a leaf – is it original to the manuscript, added, a replacement, or missing? Once the model is built in the collation modeler, I output the collation model, which is an XML file.

MS 433 output: diagrams, bifolia view, and collation formula

And then I processed this model using the existing scripts and got diagrams, bifolia view, and a collation formula (the script actually generates a set of formulas – we can generate as many different flavors of formula as we need). You will note that the formula isn’t exactly like the formula from the record.

MS 433 collation formula: comparing the James formula and the VisColl-generated formula

That’s because the first data model is very simple and doesn’t indicate advanced things like gaps or quire groupings (indicated by “gap” and || dividers in this formula). This is actually something that can be done in the second data model, so hopefully by the end of the summer we’ll be able to output something that looks more like the record formula, and also include that information with the diagrams and bifolia view.

Once I decided to forgo processing the formulas, work progressed more quickly. I was able to get 21 formulas into the collation modeler within a few hours spread out over two days. In addition to referencing the formula I would also reference the folio or page numbering in the manuscript description (which describes when page numbers are missing, repeated, or otherwise inconsistent), and at times I would reference the image files too (although I did that to double-check numbering, not to seek out physical clues to collation).

Work slowed down again when I discovered while entering the data that in several cases the foliation or pagination given in the record didn’t agree with the numbering required by the given collation formula. In the Collation Modeler you specify which folio or page aligns with which leaf in a quire – there should be a 1:1 correspondence between foliated leaves or pairs of paginated pages and leaves listed in the collation model. The first time I noticed this, a manuscript with several regular quires of 8 ended up with 8 more leaves required by the formula than were accounted for in the manuscript description. I figure that the person who made the formula got caught up in the regular quires and just added an extra one to the count, so I was comfortable removing one quire of 8 from the model. It’s not always this clear, however.

A few examples of instances where foliation and collation don’t add up

I have a list of manuscripts that I was unable to make models for because of slight variations between numbers of leaves needed by the model and the number of pages or folios listed in the description. Somebody would need to sit down with the manuscripts to see if the problem is with the formula or the foliation or pagination. (This shows one of the positive side effects of the collation modeling approach that I didn’t consider when we started: it can be used as a tool in the cataloger’s toolkit to double-check both the collation and numbering to ensure they align.)
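The check itself comes down to comparing two counts, which can be sketched in a few lines of Python. The numbers below are hypothetical, chosen to mirror the case I described of a formula asking for one quire of 8 too many; the function name is my own.

```python
def leaf_discrepancy(quire_sizes, numbered_leaves):
    """Leaves required by the collation minus leaves accounted for in the description."""
    return sum(quire_sizes) - numbered_leaves

# Hypothetical manuscript: the formula describes ten quires of 8,
# but the description counts only 72 folios.
quire_sizes = [8] * 10
discrepancy = leaf_discrepancy(quire_sizes, 72)
print(discrepancy)  # 8
```

A result of zero means formula and foliation agree; anything else tells you there is a problem but not which of the two is wrong, so someone still has to look at the manuscript.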

I’ve created a website that links brief records to the Parker on the Web and the diagram/bifolia collation views, just for fun (and I’m afraid it’s not very pretty). But if you’d like to see that you can visit parkercollations.omeka.net.

Although the combined diagram/bifolia view is interesting on its own, I’m most interested in how it might be combined with the more traditional facing-page view to provide an alternative access/navigation to digitized manuscripts. I’m currently co-PI on Bibliotheca Philadelphiensis (BiblioPhilly), a collaboration between the University of Pennsylvania, the Free Library of Philadelphia, Lehigh University, and the Philadelphia Area Consortium of Special Collections Libraries (PACSCL) and funded by the Council on Library and Information Resources. BiblioPhilly serves to digitize all the medieval manuscripts in Philadelphia written in Europe before 1600 (476 of them, not including several hundred already digitized at Penn). We have incorporated collation modeling into our cataloging workflow, and we are working with a software developer to build an interface that will make the collation information an integral part of the experience.

Here are some mock-ups that I made to pass along to the software developers so they can see what I’m thinking about, but there will be a certain amount of back and forth with them, and others in the project are involved in this, so I’m not really sure what we’ll come up with but I’m excited to see it. And the reason we can do this is because we have the data. Now we can build the processes.

There’s no reason not to incorporate collation views of some kind into the navigation options of the Parker on the Web and other IIIF collections. There would need to be a standard way to model the collation within IIIF manifests, and then a plug-in for the IIIF image viewers that takes advantage of that new data in new and interesting ways.

I hope that our experience with integrating VisColl into BiblioPhilly from the beginning, and my experiments building models from the Parker formulas for this talk, will encourage Parker on the Web and other libraries to develop more experimental interfaces for their digitized manuscripts.

Addendum

During his presentation “The Durham Library Recreated project,” Dr. Richard Higgins from Durham University Library suggested using the Bodleian Library’s Manifest Editor, one tool in their Digital Manuscripts Toolkit, to rearrange images so they present as bifolia in the facing-page view. Here is a screenshot of the manifest for MS 433 with the first quire rearranged as bifolia:

This works on one level: if you paged through this in an interface using a Book View, you would be presented with the conjoin leaves as sheets. But it’s really just another flat list of images, presented one after the other, just in a different order than they are in the book (Edit on 3/20/2018: It’s come to my attention that this is the general approach used by the Electronic Beowulf 4.0, in that edition’s collation navigation, so if you want to try paging through manuscript images organized by bifolia you can do it there. Instructions are here; be sure to select manuscript for both sides or else it’s not possible to click the collation option). This approach doesn’t really express the structural, three-dimensional aspect of the manuscript’s collation, so it can’t be used to generate alternative views (like diagrams or formulas). I think that a manifest like this could, however, be another kind of output from a collation model, but I think for IIIF it would make more sense to make the model part of the manifest, or something standard that IIIF APIs combine with manifests, to create any number of collation-aware views. 

Ceci n’est pas un manuscrit: Summary of Mellon Seminar, February 19th 2018

This post is a summary of a Mellon Seminar I presented at the Price Lab for Digital Humanities at the University of Pennsylvania on February 19th, 2018. I will be presenting an expanded version of this talk at the Rare Book School in Philadelphia, PA, on June 12th, 2018.

In my talk for the Mellon Seminar I presented on three of my current projects, talked about what we gain and lose through digitization, and made a valiant attempt to relate my talk to the theme of the seminars for this semester, which is music and sound. (The page for the Mellon Seminars is here, although it only shows upcoming seminars.) I’m not sure how well that went, but I tried!

I started my talk by pointing out that medieval manuscripts are physical objects – sometimes very large objects! They have weight and size and heft, and unlike static objects like sculptures, manuscripts move. They need to move in order for us to read them. But digitized manuscripts – the ones you find for example in Penn in Hand, the page-turning interface for Penn’s digitized manuscript collection – don’t really move. Sure, we have an interface that gives the impression of turning the pages of the book, but those images are flat, static files that are just the latest version in a long history of facsimile copies of manuscripts. A page-turning interface for medieval manuscripts is the equivalent of taking a book, cutting the pages out, and then pasting those pages into a photo album. You can read the pages but you lose the sense of the book as a physical object.

It sounds like I’m complaining, but I’m really not. I like that digital photographs of manuscripts are readily available and relatively standard, but I do think it’s vitally important that people using them are aware of how they’re different from the “real” manuscript. So in my talk I spent some time deconstructing a screenshot from a manuscript in Penn in Hand (see above). It presents itself as a manuscript opening (that is, two facing pages), but it should be immediately apparent that this is a fake. This isn’t the opening in the book, it’s two photos placed side-by-side to give the impression of the opening of the book. There is a dark line down the center of the window which clearly delineates the photo on the left and the one on the right. You can see two gutters – the book only has one, of course, but each photo includes it – and you can also see a bit of the text on the facing page in each photo. From the way the text is angled you can tell that this book was not laid flat when it was photographed – it was held at or near a 90 degree angle (and here’s another lie – the impression that the page-turning interface gives us is that of a book laid flat. Very few manuscripts lie flat. So many lies!).

We can see in the left-hand photo the line of the edge of the glass, to the right of the gutter and just to the left of the black line. In our digitization lab we use a table with a spring-loaded top and a glass plate that lies on the page to hold it flat. (You can see a two-part demo of the table on Facebook, Part One and Part Two.) This means the photographer will always know where to focus the camera (that is, at the level of the glass plate), and as each page of the book is turned the pages are the same distance from the camera (hence the spring under the table top). I think it’s also important to know that when you’re looking at an opening in a digital manuscript, the two photos in that composite view were not taken one after the other; they were possibly taken hours apart. In SCETI, the digitization lab in the Penn Libraries, all the rectos (that is, the front of the page) are taken at one time, and then the versos (the back of the page) are taken, and then the system interleaves them. (For an excellent description of digital photography of books and issues around it please see Dr. Sarah Werner’s Pforzheimer Lecture at the Harry Ransom Center on Early Digital Facsimiles.)

I moved from talking about how digital images served through page-turning interfaces provide one kind of mediated (~fake~) view of manuscripts to one of my ongoing projects that provides another kind of mediated (also fake?) view of manuscripts: video. I could talk and write for a long time about manuscript videos, and I am trying to summarize my talk and not present it in full, so I’ll just say that one advantage that videos have over digitized images is that they do give an impression of the “real” manuscript: the size of them, the way they move (Is it stiff? How far can it open? Is the binding loose or tight?), and – relevant to the Seminar theme! – how they sound. I didn’t really think about it when I started making the videos four years ago, but if you listen carefully in any of the videos you can hear the pages (and in some cases the bindings), and if you listen to several of them you can really tell the difference between how different types of parchment and paper sound. Our complete YouTube playlist of video orientations is here, but I’ll embed one of my favorites here. This is LJS 280, a 13th century copy of Decretales Gregorii IX in a 15th century chain binding that makes a lot of noise.

I don’t want to imply that videos are better than digital images – they just tell us something that digital images can’t. And digital images are useful in ways that videos aren’t. For one thing, if you’re watching a video you can see the way the book moves, but I’m the one moving it. It’s still a mediated experience, it’s just mediated in a different way. You can see how it moved at a specific time, in a specific situation, with a specific person. If you want to see folio 45v, you’re out of luck, because I didn’t turn to that page (and even if I had, the video resolution might not be high enough for you to read it; the video isn’t for reading – that’s why we have the digital images).

There’s another problem with videos.

In four years of the video orientation program, we have 74 videos online. We could have more if we made it a higher priority (and arguably we should), but each one takes time: for research, to set up and take down equipment, for the recording (sometimes multiple takes), and then for the processing. The videos are also part of the official record of the manuscript (we load them into the library’s institutional repository and link them to records in the library’s catalog) and doing that means additional work.

At this point I left videos behind and went back to digital images, but to a specific project: Bibliotheca Philadelphiensis, which we call BiblioPhilly. BiblioPhilly is a major collaborative project to digitize medieval manuscripts from institutions across Philadelphia, organized by the Philadelphia Area Consortium of Special Collections Libraries (PACSCL) and funded by the Council on Library and Information Resources (CLIR). We’re just entering year three of a three-year grant, and when we’re done we’ll have 476 manuscripts online (we have around 130 online now). If you’d like to check out the manuscripts that are online, and see what’s coming, you can visit our search and browse site here.

The relevance of BiblioPhilly in my talk is that we’re being experimental with the kind of data we’re creating in the cataloging work, and with how we use that data to provide new and different manuscript views.

Manuscript catalogers traditionally examine and describe the physical structure of the codex. Codex manuscripts start as sheets of parchment or paper, which are stacked and folded to create booklets called quires. Quires are then gathered together and sewn together to make a text block, then that is bound to make the codex. So describing the physical structure means answering a few questions: How many quires? How many leaves in each quire? Are there leaves that are missing? Are there leaves that are singletons (i.e., were never part of a sheet)? When a cataloger has answered these questions they traditionally describe the structure using a collation formula. The formula will list the quires, number of leaves in a quire, and any variations. For example, a manuscript with 10 quires, all of which have eight leaves except for quire six which has four, and there are some missing leaves, might have a formula like this:

1-4(8), 5(8, -4,5), 6(4), 7-10(8)

(Quires 1 through 4 have eight leaves, quire 5 had eight leaves but four and five are now missing, quire 6 has four leaves, and quires 7-10 have eight leaves)

The formula is standardized for printed books, but not for manuscripts.
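To make the example formula above concrete, here is a minimal Python sketch that builds it from a simple model of the quires. The `Quire` class and the `collation_formula` function are hypothetical names invented for illustration, and the output style simply mimics the example above; this is not the VisColl data model or any standard notation.

```python
from dataclasses import dataclass, field


@dataclass
class Quire:
    """One quire: its position, original leaf count, and any lost leaves."""
    number: int
    leaves: int
    missing: list[int] = field(default_factory=list)

    def description(self) -> str:
        """Return the parenthesized part of the formula, e.g. '8, -4,5'."""
        if not self.missing:
            return str(self.leaves)
        lost = ",".join(str(n) for n in sorted(self.missing))
        return f"{self.leaves}, -{lost}"


def collation_formula(quires: list[Quire]) -> str:
    """Collapse runs of adjacent, identically described quires into ranges."""
    parts = []
    i = 0
    while i < len(quires):
        j = i
        # Extend the run while the next quire has the same description.
        while (j + 1 < len(quires)
               and quires[j + 1].description() == quires[i].description()):
            j += 1
        first, last = quires[i].number, quires[j].number
        label = str(first) if first == last else f"{first}-{last}"
        parts.append(f"{label}({quires[i].description()})")
        i = j + 1
    return ", ".join(parts)


# The ten-quire manuscript described above.
quires = (
    [Quire(n, 8) for n in range(1, 5)]
    + [Quire(5, 8, missing=[4, 5]), Quire(6, 4)]
    + [Quire(n, 8) for n in range(7, 11)]
)
print(collation_formula(quires))  # 1-4(8), 5(8, -4,5), 6(4), 7-10(8)
```

The useful part of working this way is that the formula becomes an output of the model rather than something typed by hand, so the same model can be reused for other views.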

Using tools developed through the research project VisColl, which is designing a data model and system for describing and visualizing the physical construction of manuscripts, we’re building models for the manuscripts as part of the BiblioPhilly cataloging process, and then using those models to generate the formulas that go into our records. This itself is good, but once we have models we can use them to visualize the manuscripts in other ways too. So if you go to the BiblioPhilly search and browse site and peek into the records, you’ll find that some of them include links to a “Collation View.”

Following that link will take you to a page where you can see diagrams showing each quire, and image files organized to show how the leaves are physically connected through the quire (that is, the sheets that were originally bound together to form the quire).
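As a rough illustration of what “physically connected” means here, this Python sketch pairs up the leaves that share a sheet in a regular quire. It assumes the simplest case only: sheets nested inside one another and folded once, with no singletons and no missing leaves. The function name `conjoint_pairs` is hypothetical, and this is not how the Collation View itself is implemented.

```python
def conjoint_pairs(leaf_count: int) -> list[tuple[int, int]]:
    """Pair leaves that share a sheet in a regular quire, outermost first."""
    if leaf_count <= 0 or leaf_count % 2 != 0:
        raise ValueError("a regular quire has a positive, even number of leaves")
    # Leaf n shares a sheet with the leaf the same distance from the other end.
    return [(n, leaf_count + 1 - n) for n in range(1, leaf_count // 2 + 1)]


for left, right in conjoint_pairs(8):
    print(f"leaf {left} is conjoint with leaf {right}")
```

In a quire of eight, then, the first and last leaves are two halves of the outermost sheet, and leaves four and five are the innermost sheet at the center fold.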

Like the page-turning interface, this is giving us a false impression of what it would be like to deconstruct the manuscript and view it in a different way, but like the video it is also giving us a view of the manuscript that is based in some way on its physicality.

And this is where my talk ended. We had a really excellent question and answer session, which included a question about why I don’t wear gloves in the videos (my favorite question, which I answer here with a link to this blog post at the British Library) but also a lot of great discussion about why we digitize, and how, and why it matters, and how we can do it best.

Thanks so much to Glenda Goodman and Stewart Varner for inviting me, and to everyone who showed up.

 

Dot’s Twitter Bots

I made some Twitter bots! It was mostly very easy.

The bots I made use Zach Whalen’s SSBot, documented in “How to Make A Twitter Bot with Google Spreadsheets version 0.4,” which includes all the information you need about how to link your Twitter account to the spreadsheet and start the bot tweeting. The only thing I’ll note is that the Spreadsheet’s “Project Key” (asked for in Step 4) is deprecated; you’ll need to use the Script ID instead (it’s located directly under the Project Key in the Spreadsheet’s Project Properties).

Once you link the Twitter account to the SSBot, you enter data in the spreadsheet and that data is what gets tweeted.

Here’s a list of bots I made:

For all but WhyBeBot I generated a list of strings of 140 characters or fewer, one per line, and pasted it into column one of the “Select from Columns” tab in the SSBot spreadsheet. This was really the most difficult and interesting part, because in each case I had to figure out how to download and process the texts. For example, for CollationBot I had to figure out how to pull out just the collation formulas from the records, while for the full-text bots I had to download the texts, find the sentences, and ideally find sentences that were shorter than 140 characters (if you pay attention you can see that these bots were created over time, and I got much better later on about including only complete sentences). Clearly most of these bots were made before Twitter increased to 280 characters; I may go back and lengthen the strings someday.
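For anyone who wants to try the full-text approach, here is a minimal Python sketch of the “find the sentences, keep the short ones” step. It is not the process I actually used; the function name `tweetable_sentences` is hypothetical, and the sentence splitting is deliberately naive (it will stumble on abbreviations like “Dr.” or “St.”).

```python
import re


def tweetable_sentences(text: str, limit: int = 140) -> list[str]:
    """Return the sentences in text that fit within limit characters."""
    # Normalize whitespace so line breaks in the source don't split sentences.
    flat = " ".join(text.split())
    # Naive split: break after '.', '!' or '?' when whitespace follows.
    candidates = re.split(r"(?<=[.!?])\s+", flat)
    return [s for s in candidates if s and len(s) <= limit]


sample = "Short one. " + "x" * 150 + ". Another short\none!"
for line in tweetable_sentences(sample):
    print(line)
```

Each string this returns can be pasted into its own row of the spreadsheet. Raising `limit` to 280 would be the obvious change for the longer tweet length.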

WhyBeBot is a bit different. It takes advantage of SSBot’s ability to mix content among columns. Instead of just one column, WhyBeBot has four columns. The first contains only “Why be ” while the third contains only “when you can be “, and the second and fourth both have a randomly-generated list of a few hundred adjectives.
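The column-mixing idea is simple enough to sketch outside the spreadsheet. This Python version is only an illustration of the pattern, not SSBot’s code: the function name `why_be` and the adjective list are made up, and joining the pieces with single spaces is my assumption about how the result should read.

```python
import random


def why_be(adjectives: list[str], rng: random.Random) -> str:
    """Pick one entry from each of four columns and join them into a tweet."""
    columns = [["Why be "], adjectives, ["when you can be "], adjectives]
    # Strip each piece and join with single spaces (an assumption; the
    # spreadsheet may handle spacing differently).
    return " ".join(rng.choice(column).strip() for column in columns)


rng = random.Random(2018)  # seeded only so the example is repeatable
print(why_be(["stiff", "loose", "noisy", "flat"], rng))
```

Because the two adjective columns are drawn independently, the same adjective can occasionally turn up on both sides.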

There are many other ways to make Twitter bots (I know that a lot of people have had good luck with Cheap Bots Done Quick – I’ve never tried it, maybe someday). I would like to do more bots; the setup is pretty simple and getting the content situated is a fun challenge.

Slides from OPenn Demo at the American Historical Association Meeting

This week I participated in a workshop organized by the Collections as Data project at the annual meeting of the American Historical Association in Washington, DC. The session was organized by Stewart Varner and Laurie Allen, who introduced the session, and the other participants were Clifford Anderson and Alex Galarza.

The stated aim of the session was “to spark conversations about using emerging digital approaches to study cultural heritage collections,” (I’ll copy the full workshop description at the end of this post) but all of our presentations ended up focusing on the labor involved in developing our projects. This was not planned, but it was good, and also interesting that all of us independently came to this conclusion.

Clifford’s presentation was about work being done by the Scholarly Communications team at Vanderbilt University Libraries as they convert data from legacy projects (which have tended to be purpose built, siloed, and bespoke) into more tractable, reusable open data, and Alex told us about the GAM Digital Archive Project, which is digitizing materials related to human rights violations in Guatemala. Both Clifford and Alex stressed the amount of time and effort it takes to do the work behind their projects. The audience was mainly history faculty and maybe a few graduate students, and I expect they, like me, wanted to make sure the audience understood that the issue of where data comes from is arguably more important than the existence of the data itself.

My own talk was about the University of Pennsylvania’s OPenn (Primary Digital Resources for Everyone), which if you know me you probably already know about. OPenn is the website in which the Kislak Center for Special Collections, Rare Books and Manuscripts publishes its digitized collections in the public domain, as well as hosting collections for many other institutions. This includes several libraries and archives around Philadelphia who are partners on the CLIR-funded Bibliotheca Philadelphiensis project (a collaboration with Lehigh University, the Free Library of Philadelphia, Penn, and the Philadelphia Area Consortium of Special Collections Libraries), which I always mention in talks these days (I’m a co-PI and much of the work of the project is being done at Penn). I also focused my talk on the labor of OPenn, mentioning the people involved and including slides on where the data in OPenn comes from, which I haven’t mentioned in a public talk before.

Ironically I ended up spending so much time talking about what OPenn is and how it works that I didn’t have time to show much of the data, or what you can do with it. But that ended up fitting the (unplanned) theme of the workshop, and the attendees seemed to appreciate it, so I consider it a success.

Here are my slides:

Workshop abstract (from this page):

The purpose of this workshop is to spark conversations about using emerging digital approaches to study cultural heritage collections. It will include a few demonstrations of history projects that make use of collection materials from galleries, libraries, archives, or museums (GLAM) in computational ways, or that address those materials as data. The group will also discuss a range of ways that historical collections can be transformed and creatively re-imagined as data. The workshop will include conversations about the ethical aspects of these kinds of transformations, as well as the potential avenues of exploration that are opened by historical materials treated as data. Part of an IMLS-funded National Digital Forum grant, this workshop will ultimately inform the development of recommendations that aim to support cultural heritage community efforts to make collections more readily amenable to computational use.