I spent this evening setting up the hosting (via Dreamhost), installing Omeka, MediaWiki, and WordPress, moving the few page-turning books I set up yesterday (using the Internet Archive BookReader) over to the sandbox, installing a bunch of Omeka plugins, and thinking about what exactly it is I want to do first.
Someone very smart told me just the other day that the best way to learn new programming skills is to have a project in mind, and to work through that project. So I think that’s what I’m going to do, and I have an idea of what I want my first project to be. It’s a doozie 🙂
The Walters Art Museum, with great foresight (mainly the foresight of Will Noel, Curator of Manuscripts) has released all of their digitized manuscripts, and many TEI manuscript descriptions along with them, under a Creative Commons license. The metadata is regular and standard, and the naming conventions for the scanned images are regular as well. So if one wishes to do something with the digitized collections, one could write up some scripts to grab the information one required (metadata and/or image files) and do it all in one go. Theoretically. And that is what I am going to do.
At this point my plan has two stages.
The first stage is to create Internet Archive BookReader versions for all of the scanned manuscripts from the Walters collection. The BookReader is super simple and there are only a few changes that would need to be made to the code for each version: Changing the book title (manuscript name) in a few places in the index.html file, and changing the URL pattern, manuscript name, and “home URL” (which would point to the Walters website) in the BookReader.js file. Honestly, I’m not entirely sure how I am going to do this, but I know it can be done. So I’ll figure it out.
The second stage is to create records for all these BookReader versions in Omeka. Now, the manuscript descriptions for the Walters manuscripts are extensive and good, but there is no place for descriptive metadata in the BookReader itself. On the other hand, Omeka doesn’t do well with multi-page documents. There’s no page-turning interface (I didn’t even see a plug-in for one), each individual page is presented separately, which isn’t really a good way to interact with codex manuscripts. My plan is to create a CSV file of metadata extracted from the Walters manuscript descriptions, import that into Omeka, and then provide links from those records to the BookReader versions, where the manuscripts can be interacted with a bit more naturally. (I’m actually following the model that we used at Indiana University Bloomington in developing the War of 1812 in the Collections of the Lilly Library project, which also uses Omeka to describe books and other objects but doesn’t display them through Omeka, instead it links out to our purpose-built publication services: http://collections.libraries.iub.edu/warof1812/) Now, how exactly am I going to create that CSV file? Again, I’m not sure yet, but I know that I can do it.
That’s it for tonight. Tomorrow I’m going to try to figure out what I’m going to have to do to start on stage one. Right now I’m thinking some kind of script to grab all the manuscript names + sigla (it’s the sigla that are different in the page URL from manuscript to manuscript), put them in some kind of (probably XML) document, then a separate script (probably XSLT, since that’s what I know best) to generate the BookReader files for each title. Any suggestions?