I will be presenting at a couple of conferences in Germany later this month, and I need to get my stuff together. So I will be dedicating the time I had been giving to this project to those presentations. I will be back at the end of July.
The subject line says it all.
I’ve uploaded new versions of all the BookReaders, and also updated the index to include all the manuscripts in the collection. (Follow the “Walters BookReader” link in the top menu to find it.) The BookReaders are greatly improved; however, there is a pretty major bug: all the 1-up images display very, very small. Doug is working on methods to fix this, but for now I would recommend using only the 2-up or thumbnail views.
Also this morning, I uploaded a couple of thousand decoration images into Omeka. Some of them are now tagged (with the siglum of the manuscript in which they are found) and public. (Follow the “Walters Omeka” link in the top menu.) This is taking much longer than anticipated, as Omeka seems unable to accept a CSV file with more than about 90–100 rows. Considering the thousands of images I’m dealing with, loading only 100 or so at a time means a lot of effort and time. I’m going to check the forums to see if there is a way around this, but for now… enjoy!
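In the meantime, the splitting itself can at least be automated. Here is a minimal Python sketch that breaks one big CSV into import-sized chunks; the 90-row chunk size reflects the limit I'm seeing, but the file-naming scheme is my own invention, not anything Omeka requires:

```python
import csv

def chunk_csv(path, chunk_size=90):
    """Split a large CSV into chunks small enough for Omeka's CSV importer.

    Writes <name>-001.csv, <name>-002.csv, ... each repeating the header row,
    and returns the list of files written.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    out_paths = []
    stem = path.rsplit(".", 1)[0]
    for i in range(0, len(rows), chunk_size):
        out = f"{stem}-{i // chunk_size + 1:03d}.csv"
        with open(out, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)          # Omeka needs the header in every file
            writer.writerows(rows[i:i + chunk_size])
        out_paths.append(out)
    return out_paths
```

This still means many separate imports, but at least the slicing isn't done by hand.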
I spent this evening finalizing the XSLT to convert the msDesc into CSV for import into Omeka. To the fields I mapped yesterday I’m adding the manuscript siglum (so we can easily find which decorations are from which manuscript), as well as the folio number on which the decorations appear (this was Doug’s suggestion; I’m hoping this will make it possible to create some kind of automatic link between the Omeka records and the corresponding folios within the context of the BookReader).
I’m generating the CSV files now. I had really hoped to be able to process everything into one huge CSV file, but I wasn’t able to get it to work (and really, that would be one huge file), so instead I’m generating one CSV file for each manuscript. There is some post-processing to be done: to keep things from getting too messy, I put XML tags around each row of data, and those will need to be stripped out. I may see about combining some of the CSV files, so I won’t have to strip quite as many separate files, but 200 together may be too many. We’ll see. I have a holiday tomorrow, so hopefully I’ll have some time to work on this between cooking and wrangling the toddler.
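The tag-stripping step is simple enough to script. A sketch in Python, assuming each data row is wrapped in an element like `<row>` (the actual element name my stylesheet emits may differ; adjust `tag` to match):

```python
import re

def strip_row_tags(text, tag="row"):
    """Remove the XML wrapper tags the XSLT put around each CSV row.

    `tag` is a guess at the wrapper element name; change it to whatever
    the stylesheet actually emits.
    """
    pattern = re.compile(rf"</?{tag}>")
    return "\n".join(pattern.sub("", line) for line in text.splitlines())
```

Run over each generated file, this leaves plain CSV ready for import.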
The second part of my project, after getting the manuscripts all loaded into the Internet Archive BookReader, is to build a more extensive catalog for the manuscripts in Omeka. Eventually I’m going to experiment with some scholarly tools as well (I’m particularly interested in Scripto, which enables crowdsourced transcription, and Neatline, released just today, which supports temporal and geographic research), but for now I’m most interested in getting descriptive metadata out of the manuscript descriptions and into Omeka, where it can be searched and explored.
Tonight I generated a CSV file from one of the Walters manuscripts (using XSLT), and then used the Omeka CSV plugin to import that data. I wasn’t really careful about mapping the fields (I’m using extended Dublin Core in Omeka), so I’ll probably go back and take another look to make sure I’m mapping to the best DC fields. For now, I’m most interested in making sure the workflow is effective. So far, it’s great.
I’m using another plugin in Omeka that allows for hierarchical collections, so I’ve created one main collection, Digital Walters, and currently one subcollection, for manuscripts that are described according to their decoration rather than their textual divisions. I will create a second subcollection for those described according to textual divisions. I expect there are some (probably several) that have both extensive decoNote sections and msContents sections… I’ll deal with that when I get to it (ah, one benefit of experimenting, I don’t have to have all the answers before I start!).
For now, however, enjoy!
You are in the right place! I just changed my theme. I upgraded to the most recent version of the theme I’d been using, and I didn’t like some of the changes, so I decided to do something completely different. This one is a bit more sophisticated. I like it!
- The number of leaves in the manuscript. This is required by the BookReader. In the context of the Walters collection, it doesn’t make sense to just grab the number of <surface> elements in <facsimile> (which is what I did the first time), because there are images of the fore-edge, tail, spine, head, and in some cases where the bindings have flaps, an image of the flap closed. We removed those images from the group (although important, they would not make sense within the context of a page-turning version of the manuscript) and then counted the number of images remaining.
- The title of the manuscript: title[@type='common']. The Walters manuscripts have several different titles, but for the purpose of the BookReader – the title to display at the top of the main window – the common title made the most sense.
- The height and width of the page images. The first version of the BookReader hard-coded these numbers, so the best I could do was to use measurements from an example page and hope that the other images in the manuscript weren’t too different. However, I know that the images are frequently different sizes, so being able to pull the measurements for each individual page image is very useful.
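The leaf-counting rule in the first item can be sketched in code: count the `<surface>` elements in `<facsimile>`, skipping the binding shots. A hedged Python example, assuming the binding images are identifiable from each surface's `n` attribute (the actual Walters descriptions may flag them differently):

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Surfaces that photograph the binding rather than pages. These labels are
# my guess at how such surfaces might be named; adjust to the real data.
BINDING_SHOTS = {"fore-edge", "tail", "spine", "head", "flap closed"}

def count_leaves(tei_xml):
    """Count <surface> elements in <facsimile>, excluding binding images."""
    root = ET.fromstring(tei_xml)
    return sum(
        1
        for s in root.iter(f"{TEI_NS}surface")
        if s.get("n", "").lower() not in BINDING_SHOTS
    )
```

The point is simply that the count comes from the TEI description itself, not from hand-counting image files.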
So, updates to the manuscripts are coming very soon. Once they are up, I will (finally) publish a post that explains the workflow in more detail and includes all the files and documentation you would need to create your own Digital Walters BookReaders, or to do the same kind of thing with your own materials.
For the past couple of days I’ve been working with Doug Emery at Digital Walters on a few things that will greatly improve our little project!
I uploaded 71 more manuscripts this afternoon: see Walters Art Museum Manuscripts.
There are three things currently hard-coded that will need to be grabbed:
- The URL for the image files
- The home URL for the book (where the data resides on the WAM server)
- The height and width of the image files
The third I figured would be the toughest, because that information doesn’t currently exist anywhere other than in the image files themselves. Doug and I have a plan (his plan 🙂 ), so hopefully we’ll be able to get that in this week.
This is fun!
Now for the not-so-good news: while going through this EXIF-checking / quality control, I found an instance where the image files aren’t named in order. That is, page 1 is image 0001, page 2 is image 0007, page 3 is image 0003, etc. In this instance there is no image 0002, so “page 2” shows as a blank, and “page 7” displays what is actually page 2. I wrote to Will Noel, Mike Toth, and Doug Emery (Digital Walters’ brain trust), and it turns out that this is a fairly frequent occurrence. While the IA BookReader assumes that files are numbered consecutively (since that is how they design their projects), Walters provides ordering in the form of a TEI facsimile element in each manuscript description.
Long story short, I spent some time on Skype with Doug (very generously giving me his time on a Saturday afternoon!), and he recommended using jQuery to build a web app that will build a list of fake file names, using the order from the facsimile element, and will associate each of those names with the file that matches the surface that the fake file name represents. I think that this will be more elegant than trying to modify the way that the BookReader works.
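The actual implementation will be jQuery, but the core idea is small enough to sketch here in Python: walk the image files in the reading order given by the facsimile element, and pair each consecutive "fake" name (what the BookReader expects) with the real file that belongs at that position:

```python
def fake_name_map(ordered_files):
    """Map consecutive fake file names to the real files.

    `ordered_files` is the list of image file names in reading order,
    as given by the TEI facsimile element. The fake names simply count
    up from 0001, which is what the BookReader assumes.
    """
    return {f"{i:04d}.jpg": real for i, real in enumerate(ordered_files, start=1)}
```

Using the example above, where page 2 is really image 0007 and image 0002 doesn't exist, `fake_name_map(["0001.jpg", "0007.jpg", "0003.jpg"])` pairs fake "0002.jpg" with real "0007.jpg", so the reader shows the right page.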
Now, I don’t know jQuery! Doug pointed me to a tutorial, and I also have access to Lynda.com through my employer, so I expect I’ll spend some time this week getting up to speed on jQuery. Doug is going to work on this in the meantime, too.
I’ve placed a warning at the top of the Walters page to let folks know that there may indeed be wonkiness in the BookReaders, but we are on the case! I’m going to go ahead and check and upload all the rest of the manuscripts, and hope that the jQuery fix can be applied programmatically across all the simple JS files. Depending on how that goes, I may go ahead with the planned Omeka catalog (the issue with page ordering in the BookReader shouldn’t really affect that work), or I may wait. Stay tuned!
I got in a good few hours of work this evening, and I really have some things to show for it.
<!-- extract the number of page images from the last graphic in the facsimile section -->
<xsl:variable name="url">
  <xsl:for-each select="tei:TEI/tei:facsimile/tei:surface/tei:graphic">
    <xsl:if test="position()=last()">
      <xsl:value-of select="@url"/>
    </xsl:if>
  </xsl:for-each>
</xsl:variable>
<xsl:variable name="replace-thumb" select="replace($url,'thumb','')"/>
<xsl:variable name="replace-siglum" select="replace($replace-thumb,$siglum,'')"/>
<xsl:variable name="replace-underscore" select="replace($replace-siglum,'_','')"/>
<xsl:variable name="replace-jpg" select="replace($replace-underscore,'.jpg','')"/>
<xsl:variable name="replace-slash" select="replace($replace-jpg,'/','')"/>
<xsl:variable name="page-nos" select="number($replace-slash)"/>
I expect there is a more sophisticated way to do this, but I’m just tickled that I figured it out myself.
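For the record, one "more sophisticated way" would be a single regular expression that grabs the trailing run of digits from the URL, instead of chaining replace() calls. A Python sketch of the idea (the URL shape in the comment is my guess at the file naming, not confirmed):

```python
import re

def page_count_from_url(url):
    """Pull the numeric page index out of a graphic URL.

    E.g. something shaped like 'W4/W4_000123_thumb.jpg' -> 123.
    Finds the last run of digits before the end of the string;
    returns None if there are no digits at all.
    """
    m = re.search(r"(\d+)[^\d]*$", url)
    return int(m.group(1)) if m else None
```

The same trick would work in XSLT 2.0 with one `replace()` or `analyze-string`, but the chained version above gets the job done too.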
The stylesheet generates a folder, named after the siglum, and the (much simpler) stylesheet I wrote for the html file does the same, and also uses the same variables (although not as many of them).
In any case I feel really good about what I’ve been able to get through tonight, and now I am READY FOR BED.