This post is a bit long to serve as talking points for a poster presentation, but it’s the general gist of what I’ll be saying at today’s iSchool Experiential Learning Expo.
My field study was part of the larger project of implementing the ArchivesSpace system here at Hornbake. It was a great chance to apply and build on some of the technical skills I’ve learned, like database organization, XML and EAD. At the same time, it was a chance to think about some bigger issues around language, and policies, and how we organize information for users.
The existing information management system is known as “The Beast”. It’s a Microsoft Access database that was built in the early 1990s to facilitate the introduction of Encoded Archival Description for online finding aids. Twenty years on, while the Beast still works as intended, it has issues. The database fields accept information however it is entered, which, combined with inconsistent written data entry policies, has led to extremely messy data. The path from data entry to finding aid is also complicated with the Beast, requiring the data to be fed through a homemade Java conversion program before it can be plugged into the online finding aid template.
The hope is that ArchivesSpace will clean up this process. It’s an open-source program developed by a community of archivists, with a web interface that removes the need to work with Access or other desktop-based software. With its drop-down menus and ability to encode default language, it’s easier to enforce data entry guidelines. Creating a more streamlined data entry process that also provides consistent data should help to reduce the amount of time it takes to get from accessioning a new collection to having information up on the website about it.
As part of the field study, I worked on three elements of the conversion process. The first was analyzing the EAD being produced by the Beast and comparing it against best practices from other institutions as well as ArchivesSpace. From this, I developed a new EAD template. As functional as our EAD was, it seemed like there were areas where we could be using it to convey more information. The <physdesc> tag is a good example. The original version conveyed different information depending on who entered the data. Most frequently, it provided the linear feet, although sometimes it included box count as well. The linear feet number is useful to archivists, but is it something that researchers understand? In my revision, I expanded the tag to require linear feet and box count, as well as some more optional information. This will give both archivists and researchers a better sense of the size of the collection they want to view before they get to the reading room.
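To make the revised requirement concrete, here is a minimal sketch in Python of a `<physdesc>` that always carries both linear feet and box count. The function name, the use of one `<extent>` child per measurement, and the sample numbers are assumptions for illustration, not the actual template.

```python
import xml.etree.ElementTree as ET


def build_physdesc(linear_feet, box_count, extras=()):
    """Build a <physdesc> element (hypothetical helper, not the real template).

    Linear feet and box count are both required; `extras` holds any
    optional extent statements, such as oversize folders.
    """
    if linear_feet is None or box_count is None:
        raise ValueError("linear feet and box count are both required")

    physdesc = ET.Element("physdesc")

    # Required: the measurement archivists rely on.
    feet = ET.SubElement(physdesc, "extent")
    feet.text = f"{linear_feet:g} linear feet"

    # Required: the count researchers are more likely to understand.
    boxes = ET.SubElement(physdesc, "extent")
    boxes.text = f"{box_count} {'box' if box_count == 1 else 'boxes'}"

    # Optional: any additional extent statements, kept as entered.
    for extra in extras:
        ET.SubElement(physdesc, "extent").text = extra

    return physdesc


# Sample values are made up for the example.
element = build_physdesc(12.5, 25, extras=["3 oversize folders"])
print(ET.tostring(element, encoding="unicode"))
```

Raising an error when either required value is missing mirrors the point of the revision: the tag should no longer vary depending on who entered the data.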
My next project was creating a map to show the way in which our existing data would translate (or not) to the ArchivesSpace system. Because of the localized nature of the Beast, a lot of the tags didn’t have a direct translation. On the other end of the equation, the fact that ArchivesSpace is still in development means that not all of the regular EAD tags are accounted for either. At the same time, the impending release of EAD3 may raise other issues that need to be reconciled. In the end, policy decisions will need to be made about any data that doesn’t fit into the ArchivesSpace map.
The culmination of this analysis was to start looking at the steps needed for the data cleanup of the resource records. The chart of <unitdateinclusive> and <unitdatebulk> information is a good example of what we’re up against. The variety of date formats, combined with inconsistent use of bulk dates, raises questions not only of data normalization, but also of policies. Here we see multiple bulk dates assigned to collections that include four items. There are also bulk dates that were clearly just copied from the inclusive tag. Given these examples, when should we use bulk dates, and what’s the best way to properly identify what those dates are? While programs like OpenRefine might help with the normalization of the data, these policy issues should be resolved first.
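A first pass at finding these problems could be a script that flags records for human review rather than changing them. This is only a sketch in Python: the function names, the accepted formats (a single year or a year range), and the sample values are assumptions, and what counts as a valid bulk date remains a policy decision.

```python
import re

# Accept "1950", "1950-1975", "1950 – 1975", or "1950 to 1975" (assumed formats).
YEAR_RANGE = re.compile(r"^\s*(\d{4})\s*(?:[-\u2013]|to)\s*(\d{4})\s*$")
SINGLE_YEAR = re.compile(r"^\s*(\d{4})\s*$")


def parse_years(text):
    """Return (start, end) for a recognized date string, otherwise None."""
    match = YEAR_RANGE.match(text)
    if match:
        return int(match.group(1)), int(match.group(2))
    match = SINGLE_YEAR.match(text)
    if match:
        year = int(match.group(1))
        return year, year
    return None


def review_dates(inclusive, bulk):
    """List the issues a person should look at; an empty list means none found."""
    issues = []
    inclusive_years = parse_years(inclusive)
    if inclusive_years is None:
        issues.append("inclusive date not in a recognized format")

    if bulk:
        bulk_years = parse_years(bulk)
        if bulk_years is None:
            issues.append("bulk date not in a recognized format")
        elif inclusive_years is not None:
            if bulk_years == inclusive_years:
                # Likely copied straight from the inclusive tag.
                issues.append("bulk date duplicates inclusive date")
            elif (bulk_years[0] < inclusive_years[0]
                  or bulk_years[1] > inclusive_years[1]):
                issues.append("bulk date falls outside inclusive range")
    return issues


# Hypothetical records, not taken from the actual database.
print(review_dates("1950-1975", "1950-1975"))
print(review_dates("circa 1950s", ""))
```

The script deliberately reports instead of repairing, since the cleanup rules depend on policy questions that have not been settled yet.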
So this is the point where technology and policy intersect. A software implementation is never just that. A fresh slate like ArchivesSpace opens up all sorts of questions about processes and policies, and whether we are doing what is in the best interest of our users. But upending the status quo is a political process. Stakeholders must be consulted and middle ground sought in order to get everyone on board with the changes. In the end, changes like those needed to implement ArchivesSpace are in the best interest of users, and therefore the repository. As hard as that change might be, it’s both a good and necessary thing.