Going on hiatus

I will be presenting at a couple of conferences in Germany later this month, and I need to get my stuff together. So I will be dedicating the time I had been giving to this project to those presentations. I will be back at the end of July.

Walters BookReaders are updated! And more content in Omeka

The subject line says it all.

I’ve uploaded new versions of all the BookReaders, and also updated the index to include all the manuscripts in the collection. (Follow the “Walters BookReader” link in the top menu to find it.) The BookReaders are greatly improved; however, there is a pretty major bug: all the 1-up images display very, very small. Doug is working on methods to fix this, but for now I would recommend that you use only the 2-up or thumbnail views.

Also this morning, I uploaded a couple of thousand decoration images into Omeka. Some of them are now tagged (with the siglum of the ms in which they are found) and public. (Follow the “Walters Omeka” link in the top menu.) This is taking much longer than anticipated, as Omeka seems unable to accept a CSV file with more than about 90-100 rows. Considering the thousands of images I’m dealing with, loading only 100 or so at a time means a lot of time and effort. I’m going to check out the forums to see if there is a way around this, but for now… enjoy!

CSV, almost ready for import

I spent this evening finalizing the XSLT to convert the msDesc into CSV for import into Omeka. To the fields I mapped yesterday I’m adding the manuscript siglum (so we can easily find which decorations are from which manuscript), as well as the folio number on which the decorations appear (this was Doug’s suggestion; I’m hoping this will make it possible to create some kind of automatic link between the Omeka records and the corresponding folios within the context of the BookReader).

I’m generating the CSV files now. I had really hoped to be able to process everything into one huge CSV file, but I wasn’t able to get it to work (and really, that would be one huge file), so instead I’m generating one CSV file for each manuscript. There is some post-processing to be done: to keep things from getting too messy I put XML tags around each row of data, and those will need to be stripped out (a rough sketch of that step is below). I may see about combining some of the CSV files together, so I won’t have to strip quite as many separate files, but 200 together may be too many. We’ll see. I have a holiday tomorrow, so hopefully I’ll have some time to work on this between cooking and wrangling the toddler.
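
In case it’s useful later, here’s roughly what that stripping step could look like (a sketch in node; the file name and the wrapper tag name are both placeholders, since I haven’t settled on either):

// Sketch: strip the XML wrapper tags from a generated CSV file.
var fs = require('fs');
var raw = fs.readFileSync('W4-decorations.csv', 'utf8');
var clean = raw.replace(/<\/?row>/g, '');   // "row" is a placeholder tag name
fs.writeFileSync('W4-decorations.csv', clean.trim() + '\n');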

Digital Walters in Omeka!

The second part of my project, after getting the manuscripts all loaded into the Internet Archive BookReader, is to build a more extensive catalog for the manuscripts in Omeka. Eventually I’m going to experiment with some scholarly tools as well (I’m particularly interested in Scripto, which enables crowdsourced transcription, and Neatline, released just today, which supports temporal and geographic research) but for now I’m most interested in getting descriptive metadata out of the manuscript descriptions and into Omeka, where it can be searched and explored.

Tonight I generated a CSV file from one of the Walters manuscripts (using XSLT), and then used the Omeka CSV import plugin to import that data. I wasn’t really careful about mapping the fields (I’m using extended Dublin Core in Omeka), so I’ll probably go back and take another look to make sure I’m mapping to the best DC fields. For now, I’m most interested in making sure the workflow is effective. So far, it’s great.

I’m using another plugin in Omeka that allows for hierarchical collections, so I’ve created one main collection, Digital Walters, and currently one subcollection, for manuscripts that are described according to their decoration rather than their textual divisions. I will create a second subcollection for those described according to textual divisions. I expect there are some (probably several) that have both extensive decoNote sections and msContents sections… I’ll deal with that when I get to it (ah, one benefit of experimenting: I don’t have to have all the answers before I start!).

For now, however, enjoy!

http://www.dotporterdigital.org/omeka/collections/show/2

Changed wordpress theme

You are in the right place! I just changed my theme. I upgraded to the most recent version of the theme I’d been using, and I didn’t like some of the changes, so I decided to do something completely different. This one is a bit more sophisticated. I like it!

Almost ready to update

Doug and I have been working on the generalized BookReader javascript file, and I am almost ready to update all the Walters manuscripts. There is just one bug that is causing all the 1-up images to display teeny-tiny, instead of fitting to the height of the browser window. But once that is taken care of I’ll generate the new javascript files (I have an XSLT ready to go now) and get those up right quick.

The javascript has been updated to pull everything it needs to function from the TEI manuscript description file, except for the name of the msDesc file itself. That is the only bit that needs to be hard-coded into the javascript – everything else gets grabbed from that file. This includes (a rough sketch follows the list):

  • The manuscript’s id number (tei:idno) and siglum (which is the idno with the “.” removed – W.4 becomes W4). Both of these are needed at several different points in the javascript. In an earlier post I was pleased with myself for figuring out how to grab the siglum from the name of the TEI file, which takes a few steps and is a bit complicated. I’m glad that I figured out how to do that, but using the idno for this purpose is much more elegant.
  • The number of leaves in the manuscript. This is required by the BookReader. In the context of the Walters collection, it doesn’t make sense to just grab the number of <surface> elements in <facsimile> (which is what I did the first time), because there are images of the fore-edge, tail, spine, head, and in some cases where the bindings have flaps, an image of the flap closed. We removed those images from the group (although important, they would not make sense within the context of a page-turning version of the manuscript) and then counted the number of images remaining.
  • The title of the manuscript: title[@type='common']. The Walters manuscripts have several different titles, but for the purpose of the BookReader – the title to display at the top of the main window – the common title made the most sense.
  • The height and width of the page images. The first version of the BookReader hard-coded these numbers in, so the best I could do was to use measurements from an example page and hope that the other images in the manuscript weren’t too much different. However, I know that the images are frequently different sizes. So being able to pull the measurements for each individual page image is very useful.
  • Finally, we were able to use the information about the language of the manuscript (textLang/@mainLang) to determine whether a manuscript should display right-to-left or left-to-right. Doug figured out how to modify the javascript to allow right-to-left page-turning, and I figured out how to grab that information from the TEI file. We were both pretty tickled about this one! Given that many, if not most, of the Digital Walters manuscripts are non-Western, having a right-to-left display is really important functionality.
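
To make that concrete, here’s a rough sketch of the sort of extraction the updated javascript does. This is my illustration, not the actual file: the msDesc path, the surface-filtering test, and the right-to-left language list are all assumptions, and I’m assuming br is the BookReader object set up earlier in the file.

// Sketch only: pull the BookReader settings out of the TEI msDesc.
// (Depending on the browser, namespaced TEI elements may need the
// 'tei\\:idno, idno' style of selector; plain names are used here for clarity.)
$.get('W4.xml', function (tei) {
    var $tei = $(tei);
    // idno and siglum: the siglum is just the idno with the "." removed (W.4 -> W4)
    var idno = $tei.find('idno').first().text();
    var siglum = idno.replace('.', '');
    // the common title, for the top of the main window
    br.bookTitle = $tei.find('title[type="common"]').first().text();
    // count the leaves, skipping fore-edge, tail, spine, head, and flap surfaces
    // (how those surfaces are labeled is my assumption)
    br.numLeafs = $tei.find('surface').filter(function () {
        return !/fore-edge|tail|spine|head|flap/i.test($(this).attr('n') || '');
    }).length;   // the BookReader really does spell it "numLeafs"
    // height & width for each page image, straight from the TEI
    // (parseInt tolerates values like "1800px"; matching a graphic to its
    // leaf is a little more involved in practice)
    br.getPageWidth = function (index) {
        return parseInt($tei.find('graphic').eq(index).attr('width'), 10);
    };
    br.getPageHeight = function (index) {
        return parseInt($tei.find('graphic').eq(index).attr('height'), 10);
    };
    // page-turning direction from the main language of the text
    // (the code list is illustrative, not exhaustive)
    var mainLang = $tei.find('textLang').first().attr('mainLang');
    if (/^(ara|heb|per|fas|ota|tur)/.test(mainLang || '')) {
        // an assumption about how Doug's right-to-left modification is wired up
        br.pageProgression = 'rl';
    }
    br.init();
}, 'xml');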

So, updates to the manuscripts are coming very soon. Once they are up I will (finally) publish a post that explains the workflow in more detail and includes all the files and documentation you would need to create your own Digital Walters BookReaders, or to do the same kind of thing with your own materials.

New manuscript descriptions, and javascript

For the past couple of days I’ve been working with Doug Emery at Digital Walters on a few things that will greatly improve our little project!

1) Updating the TEI manuscript descriptions to include the height & width for each image file. If you’ve read previous posts on this blog, you know that the BookReader javascript file requires a height & width for the images in order to work correctly. My workflow has been, for each manuscript, to open a representative image in Jeffrey’s Exif Viewer (http://regex.info/exif.cgi), take the measurements from that image, and hard-code them into the javascript file. This is time- and effort-intensive, and it makes my head hurt. Having the measurements easily available (encoded into the TEI files) means that we now have the option of pulling those measurements out and applying them programmatically. Which leads us to the next cool thing we’ve been doing…

2) Improving the BookReader simple javascript file with a more thoughtful use of code to pull as much as we can out of the metadata rather than hard-coding it. Ideally the same file could be used for each manuscript, but I don’t know if we’ll get to that point.[1] The main problem seems to be with the siglum – it’s not actually coded anywhere in the metadata. When I generated the javascript files that are running the manuscripts now I used XSLT to extract the siglum from the name of the TEI file, but I am having trouble doing something similar using javascript. Although, as I think about it, there’s nothing stopping me from just adding that to the TEI files myself. We’ve already added the measurements, so why not add this too? It would be a simple script and would take a few minutes, and it might just make the javascript file completely free from hard-coded information. Perhaps I’ll try that tomorrow!

Although I started out intending to do everything myself, I’m really glad to have picked up a partner to help me with the javascript (someone who also happens to be knowledgeable about the metadata and able to feed me additional metadata when needed!). I have learned a lot in the past week, and this is the first time I’ve done javascript and really understood how it’s working, and I feel confident applying it on my own. It’s pretty exciting.

[1] If you’ve read previous posts, you know that I do want to have self-contained sets of files for each ms, rather than running everything off of one javascript file controlled by a drop-down list of sigla or something like that. I think it would be possible to do that, but it’s not really what I want. When I say “the same file used for each manuscript” I mean that identical files could be found running each manuscript. I hope this distinction makes sense.

More Walters Manuscripts online

I uploaded 71 more manuscripts this afternoon: see Walters Art Museum Manuscripts.

I spent some time on Skype with Doug Emery this afternoon. He figured out how to use jquery to pull in the TEI manuscript description (an XML file) and use that to get the correct order of the images! He showed me how he did it, so I should be able to figure out something similar in the future if I need to. He also showed me generally how to use javascript to reach into an XML file and grab an element value, a great trick and more versatile than XSLT. We had an interesting conversation about XML-brain (which is what I have) vs. programmer brain (which is what he has). I tend to use XSLT to do everything, since that’s what I know, and he will only use XSLT under duress (I’m paraphrasing and may be exaggerating). But it’s clear that javascript/jquery adds a lot to a toolkit, even if there is still much I can do with XSLT.
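
The general shape of that trick, as I understand it (a minimal example of my own; the file path is a placeholder):

// Minimal example: load an XML file and pull out one element's value.
$.ajax({
    url: 'ms/W4_tei.xml',        // placeholder path to the msDesc
    dataType: 'xml',
    success: function (tei) {
        // the doubled selector covers browsers that want the namespace prefix
        var idno = $(tei).find('tei\\:idno, idno').first().text();
        console.log('idno: ' + idno);
    }
});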

What I’d like eventually is a simple javascript file that can replace the current javascript files I’ve generated for each manuscript, one that can pull all of the information it needs straight from the metadata files rather than having it hard-coded into the javascript (which is how it is working now).

There are three things currently hard-coded that will need to be grabbed (there’s a sketch of the current setup below):

  • The URL for the image files
  • The home URL for the book (where the data resides on the WAM server)
  • The height and width for the image files

The third I figured would be the toughest, because that information doesn’t currently exist anywhere other than in the image files themselves. Doug and I have a plan (his plan 🙂  ), so hopefully we’ll be able to get that in this week.
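
For context, this is roughly the shape of one of the current simple js files, with the hard-coded bits marked. All the values here are placeholders, not real Walters paths or numbers:

// Every value below is a placeholder; the point is what's hard-coded.
var br = new BookReader();
var imageBase = 'http://example.org/images/W4/';        // (1) URL for the image files
br.bookUrl = 'http://example.org/books/W4/';            // (2) home URL for the book
br.getPageWidth  = function (index) { return 1800; };   // (3) height & width, taken
br.getPageHeight = function (index) { return 2400; };   //     from one sample image
br.getPageURI = function (index, reduce, rotate) {
    // assumes consecutively numbered files (the naming scheme is a guess)
    var n = '00000' + (index + 1);
    return imageBase + 'W4_' + n.slice(-6) + '.jpg';
};
br.numLeafs = 234;
br.bookTitle = 'Example manuscript';
br.init();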

In the meantime, until we can get the javascript updated significantly, I’m going to go ahead and finish checking and uploading the existing manuscripts. Once the updated javascript is ready I’ll switch everything over to the new system.

This is fun!

Two steps forward and a big step back

First, the good news: I generated BookReaders for all the Walters manuscripts, and I’ve checked and uploaded some of them to this site (look for “Walters Art Museum Manuscripts” in the right menu, or go here: http://www.dotporterdigital.org/?page_id=20). Yesterday (after I figured out that there was a relative path that was slightly wrong, keeping anything from working) there was only one thing left to do: find the height and width, in pixels, of all the page images. I was hoping to do something programmatic, but instead I have been using Jeffrey’s Exif Viewer (http://regex.info/exif.cgi) to find those numbers and putting them into the simple js files by hand. It’s probably not ideal, and it takes a while, but it’s 1) simple (I like simple!) and 2) another opportunity for quality control. I’ve found a few instances where the title pulled out of the XML contains a single quote, which messes up the javascript and keeps the book from displaying, so it’s a good check.
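
As an illustration of the single-quote problem (the title here is made up): an unescaped apostrophe in the generated javascript closes the string literal early, and the whole file fails to parse.

// A made-up title showing the failure mode:
br.bookTitle = 'Saint Anne's Prayerbook';    // SyntaxError: the apostrophe ends the string
// What the generated file needs instead:
br.bookTitle = 'Saint Anne\'s Prayerbook';   // escaped, parses fine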

Now for the not-so-good news: While going through this exif-checking / quality control, I found an instance where the image files aren’t named in order. That is, page 1 is image 0001, page 2 is image 0007, page 3 is image 0003, etc. In this instance there is no image 0002, so “page 2” shows as a blank, and “page 7” displays what is actually page 2. I wrote to Will Noel, Mike Toth, and Doug Emery (Digital Walters’ brain trust) and it turns out that this is a fairly frequent occurrence. While the IA BookReader assumes that files are numbered consecutively (since that is how they design their projects), the Walters provides the ordering in the form of a facsimile element in each TEI manuscript description.

Long story short, I spent some time on Skype with Doug (who very generously gave me his time on a Saturday afternoon!) and he recommended using jquery to build a web app that generates a list of fake file names, using the order from the facsimile element, and associates each of those names with the file that matches the surface the fake name represents. I think this will be more elegant than trying to modify the way the BookReader works.
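
A rough sketch of one way to get that effect (the names here are mine, not Doug’s code): collect the image files in the order the facsimile element gives them, and have the BookReader look pages up by position instead of trusting the numbering in the file names.

// Sketch: build the page order from the facsimile element.
var pageFiles = [];
$(tei).find('surface').each(function () {
    // take one graphic per surface, skipping thumbnails (an assumption
    // about how the Walters graphics are organized)
    var url = $(this).find('graphic').filter(function () {
        return ($(this).attr('url') || '').indexOf('thumb') === -1;
    }).first().attr('url');
    if (url) { pageFiles.push(url); }
});
// "page N" is now just position N in the list, whatever the file is named:
br.getPageURI = function (index, reduce, rotate) {
    return imageBase + pageFiles[index];
};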

Now, I don’t know jquery! Doug pointed me to a tutorial, and I also have access to Lynda.com through my employer, so I expect I’ll spend some time this week getting up to speed on jquery. Doug is going to work on this in the meantime too.

I’ve placed a warning at the top of the Walters page, to let folks know that there may indeed be wonkiness in the BookReaders, but we are on the case! I’m going to go ahead and check and upload all the rest of the manuscripts, and hope that the jquery fix can be made programmatically across all the simple js files. Depending on how that goes I may go ahead with the planned Omeka catalog (the issue with page ordering in the BookReader shouldn’t really have an impact on that work) or I may wait. Stay tuned!

Progress!

I got in a good few hours of work this evening, and I really have some things to show for it.

I’m almost done with the stylesheet to generate the javascript BookReader files. Not having used XSLT to generate a javascript file before, I wasn’t sure it would work, but it sure did. I had to set a bunch of variables, one after the other, to do things like extract the ms siglum from the file name and extract the number of pages from the last graphic tag in the facsimile section. That last one I’m particularly pleased with:

<!-- extract the number of page images from the last graphic in the facsimile section -->
<xsl:variable name="url">
    <xsl:for-each select="tei:TEI/tei:facsimile/tei:surface/tei:graphic">
        <xsl:if test="position()=last()">
            <xsl:value-of select="@url"/>
        </xsl:if>
    </xsl:for-each>
</xsl:variable>
<xsl:variable name="replace-thumb" select="replace($url,'thumb','')"/>
<xsl:variable name="replace-siglum" select="replace($replace-thumb,$siglum,'')"/>
<xsl:variable name="replace-underscore" select="replace($replace-siglum,'_','')"/>
<!-- note the escaped dot: replace() takes a regular expression, and a bare "." would match any character -->
<xsl:variable name="replace-jpg" select="replace($replace-underscore,'\.jpg','')"/>
<xsl:variable name="replace-slash" select="replace($replace-jpg,'/','')"/>
<!-- after all the stripping, only the zero-padded page number is left -->
<xsl:variable name="page-nos" select="number($replace-slash)"/>

I expect there is a more sophisticated way to do this, but I’m just tickled that I figured it out myself.

I made good use of those variables, too; in fact, every replacement I made in the javascript was done with a variable. I don’t think I’ve ever made a stylesheet constructed entirely of variables before.

The stylesheet generates a folder, named after the siglum, and the (much simpler) stylesheet I wrote for the html file does the same, and also uses the same variables (although not as many of them).

There is still work to do. For one thing, it’s not working! It’s something in the javascript, and since javascript isn’t my forte it’s going to take me a while to figure it out. I’ll probably spend some time this weekend modifying my test BookReader (the one that works) by hand, in comparison with the XSLT output, and see if I can get it to break or if I can see what’s different in the output that would cause it to break.

In any case I feel really good about what I’ve been able to get through tonight, and now I am READY FOR BED.