digital libraries Archives - Dot Porter Digital

February 16, 2022

Manuscript Loss in Digital Contexts

Originally presented at the 14th Annual Schoenberg Symposium on Manuscript Studies in the Digital Age, November 17, 2021

Thank you for that kind and generous introduction and thanks to Lynn for inviting me to present this talk today. Thank you to all of you this afternoon, or this evening for those of you in Europe, for sticking around for my talk tonight on manuscript loss in digital contexts.

I want to do a couple of things with this paper and I’m not entirely sure that they go together, so please bear with me. The first thing that I want to do is to look back over some specific things that have been said in the past about loss and manuscript digitization specifically. There’s quite a long history of both theorists and practitioners giving lectures and responding to the topic of what we lose when we digitize a manuscript – how much digitized manuscripts lack in comparison with “the real thing” and all of the reasons why digitized manuscripts aren’t as good as the real thing because of these losses. I want to address those complaints and to respond to them with examples of work that we’ve been doing at Penn that I think answers some of these issues. The second thing that I want to do is to showcase the Lightning Talks. Last month we put out a call for five-minute presentations, for anybody who wanted to submit a paper on the issue of loss specifically in digital work and digitization – this was the ask that we put out:

“The theme of this year’s symposium is Loss and we are particularly interested in talks that focus on digital aspects of loss in manuscript studies.”
From the call for Lightning Talk proposals, Schoenberg Symposium 2021

I was pleasantly surprised by the submissions, which didn’t cover old ground and in some cases responded to concerns that have been expressed about digitized manuscripts in the past.

In June 2013 A. S. G. Edwards published a short essay called “Back to the real” in the Times Literary Supplement.[1] I cannot overstate the effect that this piece had on those of us who were creating digitized manuscripts. When this piece came out I’d been at Penn for about three months, having just started my position in April 2013, and although Penn had been digitizing its manuscripts for many years and there was an interface, Penn In Hand, which is still available, this was before OPenn, well before BiblioPhilly, before VisColl. The first version of Parker on the Web, which Edwards names in his piece, had been released on 1 October 2009, but it was behind a very high paywall, which was only lifted when Parker on the Web 2.0 was released in 2018. 2013 was also about a year after the Walters Art Museum had published The Digital Walters, and released that data under an open access license, and I came to Penn knowing that we wanted to recreate the Digital Walters here – the project that would become OPenn – so at that point in time we were thinking about logistics and technical details, how we could take the existing data Penn had and turn it into an open access collection explicitly for download and reuse as opposed to something that was accessed through an interface.

Mid 2013 was also when a lot of libraries were really starting to ramp up full-manuscript and full-collection digitization – digitization at scale, full books and full collections, as opposed to focusing only on the most precious books, or digitizing only sections of books. I also discovered that this was two years after the publication of “SharedCanvas: A Collaborative Model for Medieval Manuscript Layout Dissemination,” a little article published in the journal Digital Libraries in April 2011 by Robert Sanderson, Ben Albritton, and others, which is notable because it is what would eventually become the backbone of the International Image Interoperability Framework, or IIIF, which I will mention again.

Edwards’s piece has been cited and quoted many times over the eight years since it was published, and for good reason. Edwards doesn’t pull any punches; he says exactly what he’s thinking with regard to digitized manuscripts, and honestly, for somebody who was in the thick of things in summer 2013, this came as a little bit of a bomb in the middle of that. Since this is such a primary piece I want to look at his concerns and see how they hold up eight years later.

“The convenience of ready accessibility is beyond dispute, and one can see that there may be circumstances in which scholars do have a need for some sort of surrogate, whether of a complete manuscript or of selected bits. But the downsides are in fact many. One of the obvious limits of the virtual world is the size of the computer screen; it is often difficult for viewers to take in the scale of the object being presented.”
A. S. G. Edwards, “Back to the real”

The issue of interpreting the physical size of manuscripts in digital images is an obvious one and it’s one that I talk about a lot when I talk to classes and school groups. Edwards is correct – it is difficult to tell the size of an object when it’s presented in an online interface and there are a few different reasons for that. So I’m going to take a look at two manuscripts in comparison, with the thinking that comparative analysis might be helpful.

Search results screenshot from the BiblioPhilly interface (https://bibliophilly.library.upenn.edu/)

Both of these manuscripts are from the Free Library of Philadelphia, and they were digitized as part of the Bibliotheca Philadelphiensis project and I’m showing them here in the BiblioPhilly interface, which was created for this project. They’re both 15th-century manuscripts from England. FLP LC 14 19 is a copy of the old statutes, the statutes of England beginning with the Magna Carta, a document first issued in 1215 by King John (ruled 1199-1216), and FLP LC 14 10 is a copy of the new statutes, English legal statutes beginning with the reign of Edward III (ruled 1327-1377).

FLP LC 14 10 (New Statutes) is pictured on the left and FLP LC 14 19 (Old Statutes) is on the right. I present them to you like this to give you a sense of how these two manuscripts look on the surface, and I’m going to point out some things that I notice as we go back and forth that give me clues to their relative sizes.

Both of these manuscripts have clasps, in the case of the old statutes manuscript, or remnants of clasps, in the case of the new statutes. I have a sense of how large the clasps are in comparison with the rest of the book, so the clasps look bigger to me on the old statutes; they take up more space. The decoration also looks bigger on the old statutes cover. Both of these covers have decoration embossed on the front, and the new statutes book has a lot more decorative elements and they’re thin, which implies to me that the new statutes book is bigger than the old one.

Now let’s look inside – again, FLP LC 14 10 (New Statutes) is pictured on the left and FLP LC 14 19 (Old Statutes) is on the right. Immediately I notice that the new statutes manuscript has more lines and there appear to be more characters per line and the writing looks smaller than it does in the old statutes manuscript. So again, what this implies to me is that the new statutes manuscript is bigger because you can write more in it. Now I’ve seen enough manuscripts to know that this can be misleading. There are many very small manuscripts that contain tiny tiny writing – like Ms. Codex 1058 from Penn’s collection.

This is a glossed Psalter, and back in 2013 before I arrived at Penn I spent a lot of time looking at this manuscript online. The first time that I saw it in person I was absolutely floored because it is so much smaller than the writing implied to me – for reference, the codex is 40 mm, or 1.5 inches, shorter than the old statutes manuscript. It’s small, and by including it here I’m aware that I’m helping to prove Edwards’s point – although comparing the two statutes manuscripts is helpful in coming up with size cues, it’s still really difficult to tell generally, at least in this interface.

“It is also difficult to discern distinctions between materials such as parchment and paper, and between different textures of ink.”
A. S. G. Edwards, “Back to the real”

Let’s turn back to the two statutes manuscripts again. We’ll look at the support – zoomed in to 100% so we’re very very close to the material.

If you know what parchment looks like and if you know what paper looks like I think that it’s clear that both of these manuscripts are written on parchment. One of these is darker than the other one; that could be because of the type of animal the parchment comes from, or variance between hair and flesh side, or it could be how it was produced. Also, some of the things that you’ll look for in parchment, like hair follicles, are not immediately clear, at least in these examples.

I haven’t yet mentioned training and the knowledge that you bring with you when you come to a digitized manuscript but I think that’s really important. If you are familiar with the statutes manuscripts, if you’ve seen several of them you know that the new statutes are much longer than the old statutes and so new statutes manuscripts are bigger and old statutes manuscripts are smaller, and that’s just something that you know. You probably didn’t know that but I knew that and that knowledge is reflected in what we’ve already looked at for those manuscripts.

It’s the same for parchment and paper. If you’ve been trained and you know what parchment looks like and you know what paper looks like then you’re going to be able, most of the time, to tell the difference between them in digital images.

This next example is from another manuscript, LJS 266 from the Schoenberg collection, and this is something that’s very typical in parchment manuscripts. That rounded area is an armpit of the animal, and you can also see some hair follicles around there. So this is very clearly a parchment manuscript and not a paper manuscript.

And then finally this last example, Bryn Mawr Ms. 4, is a paper manuscript. You should be able to tell, if you’ve studied paper, because there are chain lines there and also because of the way the paper is wearing around the edges. Parchment doesn’t flake like that; paper does, and being able to recognize that has nothing to do with whether you’re looking at it in person or in a digital image: it looks the same either way. Whether or not you can tell the difference has more to do with your own training than with the fact that it’s an image. Now, we could talk about image quality, but most images that you access from an institution will be digitized to some set of best practices and guidelines. The images will also be presented alongside metadata – so if you’re not sure whether it’s paper or parchment, you can take a peek at the metadata and it will tell you.

One of the things I like to talk about when I discuss digitized manuscripts is the concept of mediation, how one of the things that digitization does is mediate our experience with the physical object. People play a part in that mediation: photographers, software developers who build the systems and interfaces, and catalogers. Metadata is another example of that. We trust that the mediation is being done effectively; we trust that the cataloger knows the difference between parchment and paper and that this information is correct.

Ink can definitely be an issue in digitization, but I’m going to step sideways and talk instead for a moment about gold leaf. Gold leaf is notoriously difficult to photograph in the way that manuscripts are normally digitized – in a lab, on a cradle or a table, perhaps with a glass plate between the book and the camera holding the page flat. But gold leaf was made to move, was made to be seen in candlelight where it would gleam.

Here is an example of the difference between the same page digitized in a lab vs. an animated gif taken in a classroom – and yes, this is the same page (Ms. Codex 2032, f. 60r). While the gold in the gif shines when it is passed through the light, the gold in the still image looks almost black. Two things. One, because I’m trained, I know that’s gold in the still image even though it doesn’t look like gold, because I know what can happen to the appearance of gold when it’s photographed under normal lab conditions. And now you do, too. Just because it doesn’t shine like gold doesn’t mean it isn’t gold. Two, we don’t include images like the first one in our record, and that’s a choice. We make lots of choices about what we include and what we don’t, many of them made because of time issues or cost. We could make different decisions; we could choose to include images taken under different sorts of lighting in the record. The technology is there.

“Often we can’t tell what the overall structure of the work is like, how many leaves it has, and whether it contains any cancelled leaves…”
A. S. G. Edwards, “Back to the real”

Now it’s really my time to shine because this concern about the loss of the structure of manuscripts in digitization is why I started the VisColl project with Alberto Campagnolo and Doug Emery back in 2013; it was one of the first things I did after I came to work at Penn.

Screenshot of a book of hours modeled in VCEditor, the current software implementation of the VisColl data model

The aim of the VisColl project has been to create a data model and a software system for modeling and visualizing the structure of codex manuscripts. You can use it on your own to help you with the study of individual manuscripts, but it was also designed for use by manuscript catalogers, and in fact has been used in the BiblioPhilly project to create models that are presented alongside or integrated with the usual sort of page-turning digital interfaces for manuscript collections, to provide a different kind of view. This is the sort of view whose absence Edwards was concerned about, something that enables us to see what the structure is and see how long the manuscripts are.

So we’ll go back again to the statutes manuscripts. I mentioned earlier that I know that old statutes manuscripts are smaller than new statutes manuscripts both in terms of physical size of the covers and also in the size of the text and that is reflected here. The old statutes manuscript on the left has 21 quires, mostly of eight leaves, and the new statutes manuscript has 52 quires, also of eight leaves. It is a very large manuscript, a thick manuscript, and VisColl provides a way for us to show this size in our interfaces in a way that isn’t normally done.
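The idea of a collation model can be made concrete with a very small sketch in Python. This is not the VisColl data model or the VCEditor software; the names `Quire`, `regular_quires`, and `total_leaves` are hypothetical, and the sketch assumes that every quire is a regular gathering of eight leaves, which is only approximately true of the old statutes manuscript ("mostly of eight leaves").

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Quire:
    """One gathering of a codex (hypothetical structure, not VisColl's schema)."""
    number: int
    leaves: int  # number of leaves in this quire


def regular_quires(count: int, leaves_per_quire: int = 8) -> List[Quire]:
    """Build `count` quires, assuming each one has the same number of leaves."""
    return [Quire(number=n, leaves=leaves_per_quire) for n in range(1, count + 1)]


def total_leaves(quires: List[Quire]) -> int:
    """Add up the leaves across all quires to estimate the length of the book."""
    return sum(q.leaves for q in quires)


# Quire counts as described in the talk; eight leaves per quire is an assumption.
old_statutes = regular_quires(21)
new_statutes = regular_quires(52)

print(total_leaves(old_statutes))  # 168
print(total_leaves(new_statutes))  # 416
```

Even this crude count shows why the new statutes manuscript is the thicker book: under the stated assumption it has roughly two and a half times as many leaves.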

But making a model with VisColl is work, and it is not the only way to see the size of a manuscript. You can also tell the size of a manuscript by looking at its spine and edges, and this is coming around again to the issue of the choices that we – we, the institutions and libraries – make when we present these manuscripts.

This is a gallery view for LC 14 19 in the BiblioPhilly interface. If you scroll all the way to the end of the manuscript you can see that the presentation ends with the back cover, which makes sense, since when we close a book we see the back cover. But if we go look at the same manuscript in OPenn, which is the collection on our website where we make the data available, you’ll see that there are more images of the book there. BiblioPhilly takes this data and makes an interface that’s user-friendly, but OPenn is more like a bucket of stuff.

So we go to the bucket of stuff and we scroll down to the images and here are all the images, no page-turning interface, just image after image. We scroll all the way down and we’re going to find some images that don’t appear on the interface. There are images of the spine and the fore edges. These are available but we made a decision not to include those in the main BiblioPhilly interface, neither in the page-turning view nor in the gallery view.

These are available in BiblioPhilly but you have to go to the “Binding Images” tab to find them. You have to know to go there; they are categorized as something different, something special, and not part of the main view. This is another choice that we made in designing the interface.

“… and we can rarely be confident that the colours have been reproduced accurately.”
A. S. G. Edwards, “Back to the real”

I can report that at this point in time there are standards and guidelines and best practices for ensuring that digital images have color correction checked against the manuscript. One of the ways that we ensure this is by including a color bar in either a reference image or occasionally in every single image. There are institutions, and I believe that the British Library is one of these, where you will see a color bar in every single frame.

At Penn we do include color bars in our photographs but they are trimmed out as part of post-processing. We do, however, maintain reference images that, as well as the photos of the spine and the fore edges, are available on OPenn and you can see them here. So here is a reference image for the old statutes manuscript that we’ve been looking at. In addition to being a color correction aid, the color bar also serves as a ruler. So here you can see that the color bar is longer than the manuscript.

Now if we look at the reference image for the new statutes manuscript you’ll see how much larger the book is in reference to the color bar. So this is getting back again to our starting issue of how big is the manuscript. It’s possible to see how big the manuscripts are because we have these reference images, but sometimes they are hard to find in the interface. And again this comes down to the decisions that we are making in terms of what is easy for you the user to see and find in our collections and through our interfaces.
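Using the color bar as a ruler comes down to simple proportion: if the bar's real length is known, the ratio of pixels gives the scale of the image. The sketch below is illustrative only. The function name `estimate_length_mm` is hypothetical, the pixel counts and the 200 mm bar length are invented numbers rather than measurements from these reference images, and it assumes that the bar and the book sit in the same plane in the same photograph.

```python
def estimate_length_mm(object_px: float, bar_px: float, bar_mm: float) -> float:
    """Estimate a real-world length from pixel measurements in one image.

    object_px: length of the object in pixels
    bar_px:    length of the color bar in pixels, in the same image
    bar_mm:    known physical length of the color bar in millimetres
    """
    if bar_px <= 0 or bar_mm <= 0:
        raise ValueError("bar_px and bar_mm must be positive")
    # Multiply first, then divide, so the scale factor is applied in one step.
    return (object_px * bar_mm) / bar_px


# Hypothetical measurements: the cover spans 1500 px, the bar spans 2000 px,
# and the bar is assumed to be 200 mm long.
print(estimate_length_mm(object_px=1500, bar_px=2000, bar_mm=200))  # 150.0
```

Because the cover in this made-up example is shorter than the bar in pixels, it comes out shorter than the bar in millimetres too, which is the same comparison the eye makes in the reference image.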

Size information also comes along in the metadata. We’ve already looked at the metadata before when we were looking at the paper versus parchment question, and you can see here we also have information about the physical size of the manuscript. So even if you can’t tell by the cues in the image, if you don’t know anything about the genre which might help you know the physical size, if you don’t have access to a reference image with some kind of ruler or color bar that might give you an indication of the size, you might still have information in the metadata that should be easily accessible in whatever interface you’re using.

And now I want to start the pivot to the Schoenberg Symposium lightning talks (click here for the complete playlist) because one of our lightning talk speakers talks quite a bit about metadata and image coming along together. Lisa Fagin Davis in her talk “IIIF, Fragmentology, and the Digital Remediation of 20th-c. Biblioclasm” talks about IIIF, which I’ve already mentioned, the International Image Interoperability Framework, and its use particularly with fragments in the study of what we’re now calling fragmentology. In her lightning talk, Dr. Davis talks about how you can add images to a shared interface and it brings the metadata along with it. All of this current contextual information that Edwards was very worried about actually becomes an integral part of image sharing. So in IIIF you’re not just sharing an image file, you’re sharing a lot of information along with it.
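To illustrate how metadata travels with the images, here is a heavily simplified, hand-written manifest in the general shape of the IIIF Presentation API 2.x, held as a Python dictionary. Everything in it is an assumption made for illustration: the `example.org` URLs, the shelfmark “Example MS 1”, and the metadata values are invented, the helper names `metadata_of` and `canvas_labels` are hypothetical, and a real manifest carries much more (including the image annotations on each canvas, which are omitted here).

```python
from typing import Dict, List

# A pared-down manifest in the style of IIIF Presentation 2.x (invented values).
manifest = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@id": "https://example.org/iiif/example-ms-1/manifest",
    "@type": "sc:Manifest",
    "label": "Example MS 1",
    "metadata": [
        {"label": "Support", "value": "Parchment"},
        {"label": "Dimensions", "value": "250 x 170 mm"},
    ],
    "sequences": [
        {
            "@type": "sc:Sequence",
            "canvases": [
                {"@id": "https://example.org/iiif/example-ms-1/canvas/1",
                 "@type": "sc:Canvas", "label": "f. 1r",
                 "height": 6000, "width": 4000},
                {"@id": "https://example.org/iiif/example-ms-1/canvas/2",
                 "@type": "sc:Canvas", "label": "f. 1v",
                 "height": 6000, "width": 4000},
            ],
        }
    ],
}


def metadata_of(m: dict) -> Dict[str, str]:
    """Collect the label/value pairs (this sketch assumes plain strings)."""
    return {pair["label"]: pair["value"] for pair in m.get("metadata", [])}


def canvas_labels(m: dict) -> List[str]:
    """List the page labels in the order the manifest presents them."""
    return [canvas["label"]
            for sequence in m.get("sequences", [])
            for canvas in sequence.get("canvases", [])]


print(metadata_of(manifest))
print(canvas_labels(manifest))
```

The point of the sketch is only that a viewer loading this one document receives the description (support, dimensions) and the page order together with the pointers to the images, rather than a bare image file.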

The last word that I want to give to Dr. Edwards is his final comments in this section of his piece. He says a lot more after this but I really love this question: “Are digital surrogates not really just a new, more expensive form of microfilm?” To which I say yes, and…

There’s just so much that you can do with digitized images and our lightning talks speak to this so I want to go through the lightning talks and talk about how their concerns answer or reflect Edwards’s own concerns.

In addition to Lisa’s talk, three other talks focus on interacting with digital images in platforms to work towards a scholarly aim. Chris Nighman from Wilfrid Laurier University in his talk “Loss and recovery in Manuscripts for the CLIMO Project” gives an overview of his project to edit Burgundio of Pisa’s translation of John Chrysostom’s homilies on the Gospel of Matthew, which he rendered at the request of Pope Eugenius III in 1151. The apparent presentation copy Burgundio prepared for the pope survives as MS Vat. lat. 383, which is provided online by the BAV, but unfortunately the manuscript is lacking two pages. However, Nighman is able to restore the text from another copy of the same text, MS Vat. lat. 384, which is also available online.

In her talk “A Lost and Found Ending of the Gospel of Mark,” Claire Clivaz presents the Mark 16 project, which is seeking to create a new edition – the first edition – of the alternative ending of the Gospel of Mark, which, although well attested in many languages, is usually ignored by scholars as being marginal or unimportant. And in their talk “Lost in Transcription: EMROC, Recipe Books, and Knowledge in the Making,” Margaret Simon, Hillary Nunn, and Jennifer Munroe illustrate how they are using a shared transcription platform, From the Page, to create the first complete transcription of Lady Sedley’s 1686 manuscript recipe book, which has only been partially transcribed in the past, the sections of the text relating to women’s concerns in particular having been ignored.

In all four of these talks, rather than expressing discontent about the loss that happens when materials are digitized (or, in Chris Nighman’s talk, complaining about the watermarks added by the Vatican digital library), the presenters are using digital technology to help fill in losses that are wholly unrelated to digitization.

In her talk “Loss and Gain in Indo-Persian Manuscripts,” Hallie Nell Swanson provides a fascinating overview of the various ways that Indo-Persian manuscripts have been used and misused, cut apart and put back together, over time. For her, digitization is primarily useful as an access point; until recently, Swanson has only been able to access these books virtually, and yet she’s able to make compelling arguments about them.

Kate Falardeau’s talk, “London, British Library, Add. MS 19725: Loss and Wholeness,” is particularly compelling for me, because I wonder if it’s the kind of presentation that would only be made in a digital context, although the paper itself is not “digital.” Digitization has normalized fragmentation in a way not seen before now; we’re used to seeing leaves floating around, disbound and disembodied, and Kate’s argument that an incomplete, fragmentary copy of Bede’s Martyrology might nevertheless be considered whole within its own context is an idea that works today but might not have been conceived at all back in 2013.

In another non-digital talk that uses digital technology in a completely different way, “George Clifford Thomas (1839-1909) of Philadelphia: Lost in Transition,” William Stoneman presents on a 19th-century bibliophile who is often overlooked within the context of the history of Philadelphia book collecting, even though the books he owned later passed through more well-known hands and now reside in some of the world’s top libraries. Stoneman points to the Schoenberg Database of Manuscripts, directed by my colleague Lynn Ransom, which is a provenance database; that is, it traces the ownership of manuscripts over time – including manuscripts owned by George Clifford Thomas.

In his entertaining and illuminating talk, “Extreme Loss and Subtle Discoveries: The Corpus of Sotades of Maronea,” Mark Saltveit presents on recently discovered lines of text by the ancient poet Sotades – discoveries that were made entirely by reconsidering quotes and identification of poets in earlier texts, and not at all through any kind of digital work (which is part of his point).

Finally, of all the Lightning Talks, Mary M. Alcaro takes a more critical approach in her talk “Closing the Book on Kanuti: Lost Authorship & Digital Archives.” This Kanuti, we discover, was credited with the authorship of “A litil boke for the Pestilence” in the 15th century, and this attribution followed the text until 2010, when Kari Anne Rand said in no uncertain terms that this Kanuti was not the author. But no matter – online catalogs still list him as the author. A problem for sure!

I want to close by pushing back a little on the accepted knowledge that digitization only causes loss. I think it does cause loss, but there are other ways that we can talk about digitization too, which may sit alongside the concept of loss and which might help us respond to it.

I’ve talked about mediation a bit during this talk – the idea that the digitized object is mediated through people and software, and this mediation provides a different way for users to have a relationship with the physical object. The decisions that the collection and interface creators make have a huge impact on how this mediation functions and how people “see” manuscripts out the other side, and we need to take that seriously when we choose what we show and what we hide.

Transformation: basically, digitization transforms the physical object into something else. There is loss in comparison with the original, but there is also gain. It’s much easier to take apart a digital manuscript than a physical one. About that…

I’ve talked before about how digitization is essentially a deconstruction, breaking down a manuscript into individual leaves, and interfaces are ways to reconstruct the manuscript again. It’s typical to rebuild a manuscript as it exists, that’s what we do in most interfaces, but it also enables things like Fragmentarium, or VisColl, where we can pull together materials that have long been separated, or organize an object in a different order.

Finally, my colleague Whitney Trettien, Assistant Professor of English at Penn, claims the term creative destruction (currently used primarily in an economic context) and applies it to the work of the late-seventeenth-century shoemaker and bibliophile John Bagford, who took fragments of parchment and paper and created great scrapbooks from them – as she says in this context, “creative destruction with text technologies is not the oppositional bête noire of inquiry but rather is its generative force.”[2] Why must we insist that digitized copies of manuscripts reflect the physical object? Why not claim the pieces as our own and do completely new things with them?

Digitization is lossy, yes. But it can also generate something new.

“I must say that to see a manuscript will never be replaced by a digital tool or feature. A scholar knows that a face to face meeting with a manuscript is something that can be replaced by nothing else.”
Claire Clivaz, “A Lost and Found Ending of the Gospel of Mark”

Claire Clivaz is correct – a digital thing will never replace the physical object. But I think that’s okay; it doesn’t have to. It can be its own thing.

[1] Edwards, A. S. G. “Back to the real?” The Times Literary Supplement, no. 5749, 7 June 2013, p. 15. The Times Literary Supplement Historical Archive.

[2] Trettien, Whitney. “Creative Destruction and the Digital Humanities,” The Routledge Research Companion to Digital Medieval Literature and Culture, Ed. Jen Boyle and Helen Burgess, 2018, pp. 47-60.

March 4, 2018

Ceci n’est pas un manuscrit: Summary of Mellon Seminar, February 19th 2018

This post is a summary of a Mellon Seminar I presented at the Price Lab for Digital Humanities at the University of Pennsylvania on February 19th, 2018. I will be presenting an expanded version of this talk at the Rare Book School in Philadelphia, PA, on June 12th, 2018.

In my talk for the Mellon Seminar I presented on three of my current projects, talked about what we gain and lose through digitization, and made a valiant attempt to relate my talk to the theme of the seminars for this semester, which is music and sound. (The page for the Mellon Seminars is here, although it only shows upcoming seminars.) I’m not sure how well that went, but I tried!

I started my talk by pointing out that medieval manuscripts are physical objects – sometimes very large objects! They have weight and size and heft, and unlike static objects like sculptures, manuscripts move. They need to move in order for us to read them. But digitized manuscripts – the ones you find for example in Penn in Hand, the page-turning interface for Penn’s digitized manuscript collection – don’t really move. Sure, we have an interface that gives the impression of turning the pages of the book, but those images are flat, static files that are just the latest version in a long history of facsimile copies of manuscripts. A page-turning interface for medieval manuscripts is the equivalent of taking a book, cutting the pages out, and then pasting those pages into a photo album. You can read the pages but you lose the sense of the book as a physical object.

It sounds like I’m complaining, but I’m really not. I like that digital photographs of manuscripts are readily available and relatively standard, but I do think it’s vitally important that people using them are aware of how they’re different from the “real” manuscript. So in my talk I spent some time deconstructing a screenshot from a manuscript in Penn in Hand (see above). It presents itself as a manuscript opening (that is, two facing pages), but it should be immediately apparent that this is a fake. This isn’t the opening in the book, it’s two photos placed side-by-side to give the impression of the opening of the book. There is a dark line down the center of the window which clearly delineates the photo on the left and the one on the right. You can see two gutters – the book only has one, of course, but each photo includes it – and you can also see a bit of the text on the facing page in each photo. From the way the text is angled you can tell that this book was not laid flat when it was photographed – it was held at or near a 90 degree angle (and here’s another lie – the impression that the page-turning interface gives us is that of a book laid flat. Very few manuscripts lie flat. So many lies!).

We can see in the left-hand photo the line of the edge of the glass, to the right of the gutter and just to the left of the black line. In our digitization lab we use a table with a spring-loaded top and a glass plate that lies down on the page to hold it flat. (You can see a two-part demo of the table on Facebook, Part One and Part Two.) This means the photographer will always know where to focus the camera (that is, at the level of the glass plate), and as each page of the book is turned the pages are the same distance from the camera (hence the spring under the table top). I think it’s also important to know that when you’re looking at an opening in a digital manuscript, the two photos in that composite view were not taken one after the other; they were possibly taken hours apart. In SCETI, the digitization lab in the Penn Libraries, all the rectos (that is, the front of the page) are taken at one time, and then the versos (the back of the page) are taken, and then the system interleaves them. (For an excellent description of digital photography of books and issues around it please see Dr. Sarah Werner’s Pforzheimer Lecture at the Harry Ransom Center on Early Digital Facsimiles.)
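The recto-then-verso workflow described above can be sketched in a few lines of Python. This is not SCETI’s actual software; the function names `interleave` and `openings` and the file names are hypothetical, and the sketch assumes that every leaf was photographed exactly once on each side.

```python
from typing import List, Tuple


def interleave(rectos: List[str], versos: List[str]) -> List[str]:
    """Merge two shooting passes into reading order: 1r, 1v, 2r, 2v, ..."""
    if len(rectos) != len(versos):
        raise ValueError("expected one recto and one verso per leaf")
    ordered: List[str] = []
    for recto, verso in zip(rectos, versos):
        ordered.extend([recto, verso])
    return ordered


def openings(rectos: List[str], versos: List[str]) -> List[Tuple[str, str]]:
    """Pair each verso with the recto of the NEXT leaf, as a viewer shows them."""
    return list(zip(versos[:-1], rectos[1:]))


# Hypothetical file names: the first pass captures rectos, the second versos.
rectos = ["0001r.tif", "0002r.tif", "0003r.tif"]
versos = ["0001v.tif", "0002v.tif", "0003v.tif"]

print(interleave(rectos, versos))
# ['0001r.tif', '0001v.tif', '0002r.tif', '0002v.tif', '0003r.tif', '0003v.tif']
print(openings(rectos, versos))
# [('0001v.tif', '0002r.tif'), ('0002v.tif', '0003r.tif')]
```

The second function makes the “fake opening” visible: each pair on screen joins an image from the verso pass with an image from the recto pass, so the two halves of an opening always come from different shooting sessions.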

I moved from talking about how digital images served through page-turning interfaces provide one kind of mediated (~fake~) view of manuscripts to one of my ongoing projects that provides another kind of mediated (also fake?) view of manuscripts: video. I could talk and write for a long time about manuscript videos, and I am trying to summarize my talk and not present it in full, so I’ll just say that one advantage that videos have over digitized images is that they do give an impression of the “real” manuscript: the size of them, the way they move (Is it stiff? How far can it open? Is the binding loose or tight?), and – relevant to the Seminar theme! – how they sound. I didn’t really think about it when I started making the videos four years ago, but if you listen carefully in any of the videos you can hear the pages (and in some cases the bindings), and if you listen to several of them you can really tell the difference between how different types of parchment and paper sound. Our complete YouTube playlist of video orientations is here, but I’ll embed one of my favorites here. This is LJS 280, a 13th century copy of Decretales Gregorii IX in a 15th century chain binding that makes a lot of noise.

I don’t want to imply that videos are better than digital images – they just tell us something that digital images can’t. And digital images are useful in ways that videos aren’t. For one thing, if you’re watching a video you can see the way the book moves, but I’m the one moving it. It’s still a mediated experience, it’s just mediated in a different way. You can see how it moved at a specific time, in a specific situation, with a specific person. If you want to see folio 45v, you’re out of luck, because I didn’t turn to that page (and even if I had, the video resolution might not be high enough for you to read it; the video isn’t for reading – that’s why we have the digital images).

There’s another problem with videos.

In four years of the video orientation program, we have 74 videos online. We could have more if we made it a higher priority (and arguably we should), but each one takes time: for research, to set up and take down equipment, for the recording (sometimes multiple takes), and then for the processing. The videos are also part of the official record of the manuscript (we load them into the library’s institutional repository and link them to records in the library’s catalog) and doing that means additional work.

At this point I left videos behind and went back to digital images, but a specific project: Bibliotheca Philadelphiensis, which we call BiblioPhilly. BiblioPhilly is a major collaborative project to digitize medieval manuscripts from institutions across Philadelphia, organized by the Philadelphia Area Consortium of Special Collections Libraries (PACSCL) and funded by the Council on Library and Information Resources (CLIR). We’re just entering year three of a three-year grant, and when we’re done we’ll have 476 manuscripts online (we have around 130 online now). If you’re interested in checking out the manuscripts that are online, and in seeing what’s coming, you can visit our search and browse site here.

The relevance of BiblioPhilly in my talk is that we’re being experimental with the kind of data we’re creating in the cataloging work, and with how we use that data to provide new and different manuscript views.

Manuscript catalogers traditionally examine and describe the physical structure of the codex. Codex manuscripts start as sheets of parchment or paper, which are stacked and folded to create booklets called quires. Quires are then gathered together and sewn together to make a text block, then that is bound to make the codex. So describing the physical structure means answering a few questions: How many quires? How many leaves in each quire? Are there leaves that are missing? Are there leaves that are singletons (i.e., were never part of a sheet)? When a cataloger has answered these questions they traditionally describe the structure using a collation formula. The formula will list the quires, number of leaves in a quire, and any variations. For example, a manuscript with 10 quires, all of which have eight leaves except for quire six, which has four, and with some leaves missing, might have a formula like this:

1-4(8), 5(8, -4,5), 6(4), 7-10(8)

(Quires 1 through 4 have eight leaves, quire 5 had eight leaves but four and five are now missing, quire 6 has four leaves, and quires 7-10 have eight leaves)

The formula is standardized for printed books, but not for manuscripts.
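As a rough illustration of how a formula like the one above can be generated from structured data, here is a minimal sketch in Python. It is not the VisColl data model: the `Quire` class and both functions are hypothetical, and the notation simply follows the example formula, which is only one of the conventions in use for manuscripts.

```python
from dataclasses import dataclass, field


@dataclass
class Quire:
    number: int                                   # position of the quire in the book
    leaves: int                                   # leaves the quire was made with
    missing: list = field(default_factory=list)   # leaf positions now lost


def describe(quire):
    """Describe one quire, e.g. "(8)" or "(8, -4,5)"."""
    if quire.missing:
        lost = ",".join(str(n) for n in sorted(quire.missing))
        return f"({quire.leaves}, -{lost})"
    return f"({quire.leaves})"


def collation_formula(quires):
    """Collapse consecutive quires with identical descriptions into ranges."""
    runs = []  # each run is [first quire, last quire, description]
    for quire in quires:
        description = describe(quire)
        if runs and runs[-1][2] == description and runs[-1][1] == quire.number - 1:
            runs[-1][1] = quire.number
        else:
            runs.append([quire.number, quire.number, description])
    parts = []
    for first, last, description in runs:
        label = str(first) if first == last else f"{first}-{last}"
        parts.append(label + description)
    return ", ".join(parts)


# The ten-quire example described above.
quires = [Quire(n, 8) for n in range(1, 5)]
quires.append(Quire(5, 8, missing=[4, 5]))
quires.append(Quire(6, 4))
quires += [Quire(n, 8) for n in range(7, 11)]

formula = collation_formula(quires)
print(formula)  # 1-4(8), 5(8, -4,5), 6(4), 7-10(8)
```

Keeping the structure as data and deriving the formula from it means the same model can be rendered in a different notation, or reused for other views, without re-examining the book.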

Using tools developed through the research project VisColl, which is designing a data model and system for describing and visualizing the physical construction of manuscripts, we’re building models for the manuscripts as part of the BiblioPhilly cataloging process, and then using those models to generate the formulas that go into our records. This itself is good, but once we have models we can use them to visualize the manuscripts in other ways too. So if you go to the BiblioPhilly search and browse site and peek into the records, you’ll find that some of them include links to a “Collation View”.

Following that link will take you to a page where you can see diagrams showing each quire, and image files organized to show how the leaves are physically connected through the quire (that is, the sheets that were originally bound together to form the quire).
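The pairing behind a view like this can be sketched in a few lines of Python. This is a simplification, not the Collation View's own code: the `conjoins` function is hypothetical, and it assumes a regularly folded quire with an even number of original leaves and no added singletons, which real manuscripts often violate.

```python
def conjoins(leaves, missing=()):
    """Pair each leaf with the leaf on the other half of the same sheet.

    Pairs run from the outermost sheet to the innermost.
    A missing leaf is reported as None.
    """
    if leaves % 2:
        raise ValueError("a regularly folded quire has an even number of leaves")
    lost = set(missing)
    pairs = []
    for left in range(1, leaves // 2 + 1):
        right = leaves + 1 - left  # leaf 1 joins leaf 8, leaf 2 joins leaf 7, ...
        pairs.append((
            None if left in lost else left,
            None if right in lost else right,
        ))
    return pairs


# A quire of eight with leaves 4 and 5 lost: the whole central sheet is gone.
pairs = conjoins(8, missing=[4, 5])
print(pairs)  # [(1, 8), (2, 7), (3, 6), (None, None)]
```

Laying out the page images by these pairs, rather than in reading order, shows which leaves were once part of the same sheet.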

Like the page-turning interface, this is giving us a false impression of what it would be like to deconstruct the manuscript and view it in a different way, but like the video it is also giving us a view of the manuscript that is based in some way on its physicality.

And this is where my talk ended. We had a really excellent question and answer session, which included a question about why I don’t wear gloves in the videos (my favorite question, which I answer here with a link to this blog post at the British Library) but also a lot of great discussion about why we digitize, and how, and why it matters, and how we can do it best.

Thanks so much to Glenda Goodman and Stewart Varner for inviting me, and to everyone who showed up.

January 7, 2018

Slides from OPenn Demo at the American Historical Association Meeting

This week I participated in a workshop organized by the Collections as Data project at the annual meeting of the American Historical Association in Washington, DC. The session was organized by Stewart Varner and Laurie Allen, who introduced the session, and the other participants were Clifford Anderson and Alex Galarza.

The stated aim of the session was “to spark conversations about using emerging digital approaches to study cultural heritage collections,” (I’ll copy the full workshop description at the end of this post) but all of our presentations ended up focusing on the labor involved in developing our projects. This was not planned, but it was good, and also interesting that all of us independently came to this conclusion.

Clifford’s presentation was about work being done by the Scholarly Communications team at Vanderbilt University Libraries as they convert data from legacy projects (which have tended to be purpose built, siloed, and bespoke) into more tractable, reusable open data, and Alex told us about the GAM Digital Archive Project, which is digitizing materials related to human rights violations in Guatemala. Both Clifford and Alex stressed the amount of time and effort it takes to do the work behind their projects. The audience was mainly history faculty and maybe a few graduate students, and I expect they, like me, wanted to make sure the audience understood that the issue of where data comes from is arguably more important than the existence of the data itself.

My own talk was about the University of Pennsylvania’s OPenn (Primary Digital Resources for Everyone), which if you know me you probably already know about. OPenn is the website on which the Kislak Center for Special Collections, Rare Books and Manuscripts publishes its digitized collections in the public domain, as well as hosting collections for many other institutions. This includes several libraries and archives around Philadelphia that are partners on the CLIR-funded Bibliotheca Philadelphiensis project (a collaboration with Lehigh University, the Free Library of Philadelphia, Penn, and the Philadelphia Area Consortium of Special Collections Libraries), which I always mention in talks these days (I’m a co-PI and much of the work of the project is being done at Penn). I also focused my talk on the labor of OPenn, mentioning the people involved and including slides on where the data in OPenn comes from, which I haven’t mentioned in a public talk before.

Ironically I ended up spending so much time talking about what OPenn is and how it works that I didn’t have time to show much of the data, or what you can do with it. But that ended up fitting the (unplanned) theme of the workshop, and the attendees seemed to appreciate it, so I consider it a success.

Here are my slides:

Workshop abstract (from this page):

The purpose of this workshop is to spark conversations about using emerging digital approaches to study cultural heritage collections. It will include a few demonstrations of history projects that make use of collection materials from galleries, libraries, archives, or museums (GLAM) in computational ways, or that address those materials as data. The group will also discuss a range of ways that historical collections can be transformed and creatively re-imagined as data. The workshop will include conversations about the ethical aspects of these kinds of transformations, as well as the potential avenues of exploration that are opened by historical materials treated as data. Part of an IMLS-funded National Digital Forum grant, this workshop will ultimately inform the development of recommendations that aim to support cultural heritage community efforts to make collections more readily amenable to computational use.

October 13, 2016

“Freely available online”: What I really want to know about your new digital manuscript collection

So you’ve just digitized medieval manuscripts from your collection and you’re putting them online. Congratulations! That’s great. Online access to manuscripts is so important, for scholars and students and lots of other people, too (I know a tattoo artist who depends on digital images for design ideas). As the number of collections available online has grown in recent years (DMMAP lists 545 institutions offering at least one digitized manuscript), the use of digital manuscripts by medievalists has grown right along with supply.[1] If you’re a medievalist and you study manuscripts, I’m confident that you regularly use digital images of manuscripts. So every new manuscript online is a celebration. But now, you who are making digitized medieval manuscripts available online, tell us more. How, exactly, are you making your manuscripts available? And please don’t say you’re making them freely available online.

I hate this phrase. It makes my teeth clench and my heart beat faster. It makes me feel this way because it doesn’t actually tell me anything at all. I know you are publishing your images online, because where else would you publish them (the age of CD-ROM for these things is long gone) and I know they are going to be free, because otherwise you’d be making a very different kind of announcement and I would be making a very different kind of complaint (I’m looking at you, Codices Vossiani Latini Online). What else can you tell me?

Here are the questions I want answered when I read about an online manuscript collection.

How are your images licensed? This is going to be my first question, and for me it’s the most important because it defines what I can do with your images. Are you placing them in the public domain, licensing them CC0? This is what we do at my institution, and it’s what I like to see, since, you know, medieval manuscripts are not in copyright, at least not in the USA (I understand things are more complicated in Europe). If not CC0, then what restrictions are you placing on them? Creative Commons has a tool that lets you select the restrictions you want and then gives you license options. Consider using it as part of your decision-making process. A clear license is a good license.
How can I find your manuscripts? Is there a search and browse function on your site, or do I have to know what I’m looking for when I come in?
Will your images be served through the International Image Interoperability Framework (IIIF)? IIIF has become very popular recently, and for good reason – it enables users to pull manuscripts from any IIIF-compliant repository into a single interface, for example comparing manuscripts from different institutions in a single browser window. A user will need access to the IIIF manifests to make this work – the manifest is essentially a file containing metadata about the manuscript and a list of links to image files. So, if you are using IIIF, will the manifests be easily accessible so I can use them for my own purposes? (For reference, e-codices links IIIF manifests to each manuscript record, and it couldn’t be easier to find them.)
What kind of interface will you have? I usually assume that a page-turning interface will be provided, but if there is some other interface (like, for example, Yale University, which links individual images from a thumbnail strip on the manuscript record) I’d like to know that. Will users be able to build collections or make annotations on page images, or contribute transcriptions? I’d like to know that, too.
How can I get your images? I know you’re proud of your interface, but I might want to do something else with your images, either download them to my own machine or point to them from an interface I’ve built myself or borrowed from someone else (maybe using IIIF, but maybe not). If you provide IIIF manifests I have a list of URLs I can use to point to or download your image files (more or less, depending on how your server works), but if you’re not using IIIF, is there some other way I can easily get a list of image URLs for a manuscript? For example, OPenn and The Digital Walters publish TEI documents with facsimile lists. If you can’t provide a list, can you at least share how your URLs are constructed? If I know how they’re made I can probably figure out how to build them myself.
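To show why manifests and facsimile lists matter for the last question, here is a minimal sketch in Python of pulling image URLs out of each. Everything in the sample data is made up for illustration: the `example.org` URLs, the file paths, and the two function names are hypothetical. The manifest follows the general shape of a IIIF Presentation API 2.x manifest (sequences, canvases, images), and the TEI follows the usual `facsimile` / `surface` / `graphic` pattern; a real repository's files will carry much more detail and may be organized differently.

```python
import json
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def image_urls_from_manifest(manifest):
    """List image URLs from a parsed IIIF Presentation 2.x manifest."""
    urls = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                urls.append(image["resource"]["@id"])
    return urls


def image_urls_from_tei(tei_xml):
    """List the url attribute of every graphic element in a TEI document."""
    root = ET.fromstring(tei_xml)
    return [g.get("url") for g in root.findall(".//tei:graphic", TEI_NS)]


# Made-up sample data standing in for files a repository would publish.
manifest = json.loads("""
{
  "@type": "sc:Manifest",
  "label": "Example manuscript",
  "sequences": [{
    "canvases": [
      {"label": "1r", "images": [{"resource":
        {"@id": "https://example.org/iiif/ms1_0001/full/full/0/default.jpg"}}]},
      {"label": "1v", "images": [{"resource":
        {"@id": "https://example.org/iiif/ms1_0002/full/full/0/default.jpg"}}]}
    ]
  }]
}
""")

tei = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface n="1r"><graphic url="master/0001.tif"/></surface>
    <surface n="1v"><graphic url="master/0002.tif"/></surface>
  </facsimile>
</TEI>"""

iiif_urls = image_urls_from_manifest(manifest)
tei_urls = image_urls_from_tei(tei)
print(iiif_urls)
print(tei_urls)
```

Either list is enough to download the images or point another interface at them, which is exactly what I can't do when all I'm given is a page-turner.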

Those are the big five questions I like to have answered when I read about a new digital manuscript collection, and they very rarely are. Please, please, please, next time you announce a new collection, try to go beyond freely available online and tell us all more about how your collection will be made available, and what users will be able and allowed to do with it.

[1] In 2002, 33% of survey respondents reported using manuscript facsimiles “print mostly, electronic sometimes” and 47% reported using “print only”. In 2011, 44% reported using them “electronic mostly, print sometimes” and 17% reported using “electronic only”. This is an enormous shift. From Dot Porter, “Medievalists and the Scholarly Digital Edition,” Scholarly Editing: The Annual of the Association for Documentary Editing Volume 34, 2013. http://www.scholarlyediting.org/2013/essays/essay.porter.html