Behind the scenes at ARTstor: Metadata
By Dustin Wees, Director of Metadata and Cataloging
At cocktail parties and in the checkout line at Whole Foods, I’m often asked to explain the difference between data and metadata. I first try “You know the difference between physics and metaphysics, don’t you? Metadata is a lot more philosophical than data.” When that flops—and it usually does—I then try a more prosaic answer: “Metadata is data about data.” In terms of the ARTstor Digital Library, I think of the image as data, and the metadata is the information about the image.
ARTstor’s metadata is an additive aggregation of heterogeneous bits, the equivalent of yellow stickies and typed lists, though we refer to them as differing schemas and databases. It comes from various places and in a variety of forms. Typically, our contributors have already created the metadata for their own uses, frequently tailoring the requirements to their own point of view. Describing old master paintings, say, calls for categories of information that differ from those used to describe arrowheads or autograph letters.
Despite the wide variety of incoming metadata, we need to make it all as standard, or “normalized,” as possible so that the images in the ARTstor Digital Library behave similarly. We start by organizing the information into a uniform set of fields. Sometimes we flesh out the records if the source data is very skimpy or not very useful.
Most data sets we receive from contributors are pretty much ARTstor-ready. The Metadata team analyzes each one, first figuring out which fields are useful for ARTstor and then determining which ARTstor field should hold the values from each source field. There are plenty of instances, however, when the Metadata team adjusts the source data during analysis and mapping: misspellings are corrected, terms normalized, and so on.
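To make the mapping step a little more concrete, here is a minimal Python sketch of a field crosswalk with a small correction table applied along the way. The source field names, the target names (`Creator`, `Title`, `Material`, and so on), and `FIELD_MAP`, `TERM_FIXES`, and `map_record` are all hypothetical; they are not ARTstor's actual schema or tooling.

```python
# Hypothetical crosswalk from one contributor's field names to target fields.
# None marks a source field that analysis judged not useful, so it is dropped.
FIELD_MAP = {
    "Artist": "Creator",
    "Object Title": "Title",
    "Date Made": "Date",
    "Medium": "Material",
    "Notes": None,
}

# Hypothetical corrections, keyed by lowercased value: misspellings and variant terms.
TERM_FIXES = {
    "oil on canavs": "oil on canvas",
    "oil-on-canvas": "oil on canvas",
}


def map_record(source: dict[str, str]) -> dict[str, str]:
    """Map one contributor record onto a uniform set of target fields."""
    mapped: dict[str, str] = {}
    for src_field, value in source.items():
        target = FIELD_MAP.get(src_field)
        if target is None:
            continue  # unknown or deliberately dropped field
        value = value.strip()
        # Replace the value only if it appears in the correction table.
        mapped[target] = TERM_FIXES.get(value.lower(), value)
    return mapped


# Illustrative record, not real catalog data.
example = {
    "Artist": "Lambert Sustris ",
    "Object Title": "Reclining Nude",
    "Medium": "Oil on canavs",
    "Notes": "Gift of a donor",
}
print(map_record(example))
# {'Creator': 'Lambert Sustris', 'Title': 'Reclining Nude', 'Material': 'oil on canvas'}
```

Keeping the crosswalk as data rather than code means each new data set needs only a new table, which fits the per-contributor analysis described above.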
But that’s not all we do to help our users find what they need or to discover images they wouldn’t have known about. The Metadata team also creates Clusters and Featured groups, adds enhancements to help with browse and advanced search, corrects errors for better discovery through keyword searches, and more.
Such as? Well, for one, we’ve developed a way to improve searching without actually editing every record. Let’s say I have a painting in mind and I know something, but not everything, about it. I know its date, vaguely, and I think it’s from the Netherlands. Is it by Sustris? I’m not sure. All I know for certain is that it’s a nude, and if I just search for nude I get more than 1,000 results. But when I use the Digital Library’s filters and select Netherlands and Painting, I reduce the results to a manageable 45 images, including the one I was looking for. (I could have done something similar in the advanced search.) How did this work? The Metadata team has been adding enhancement terms, such as country and classification, to the data provided by contributors.
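Here is a minimal Python sketch of that idea: keep the contributor's fields untouched for display, store the added terms alongside them, and let keyword search and filters use both. The `Record` class, the `country` and `classification` facet names, and the simple substring matching are assumptions for illustration; this post doesn't describe how ARTstor actually stores or indexes enhancements.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    # Contributor-supplied fields, displayed exactly as received.
    display: dict[str, str]
    # Hypothetical enhancement terms added by catalogers; searchable, not displayed.
    enhancements: dict[str, str] = field(default_factory=dict)

    def matches_keyword(self, word: str) -> bool:
        """Case-insensitive substring match over display and enhancement values."""
        text = " ".join([*self.display.values(), *self.enhancements.values()])
        return word.lower() in text.lower()


def search(records: list[Record], keyword: str, **filters: str) -> list[Record]:
    """Keyword search, then narrow the hits by enhancement facets."""
    hits = [r for r in records if r.matches_keyword(keyword)]
    for facet, wanted in filters.items():
        hits = [r for r in hits if r.enhancements.get(facet) == wanted]
    return hits


# Illustrative records, not real ARTstor data.
records = [
    Record({"Title": "Reclining Nude"}, {"country": "Netherlands", "classification": "Painting"}),
    Record({"Title": "Standing Nude"}, {"country": "Italy", "classification": "Painting"}),
    Record({"Title": "Nude Study"}, {"country": "Netherlands", "classification": "Drawings"}),
    Record({"Title": "Harbor View"}, {"country": "Netherlands", "classification": "Painting"}),
]

print(len(search(records, "nude")))  # 3
narrowed = search(records, "nude", country="Netherlands", classification="Painting")
print([r.display["Title"] for r in narrowed])  # ['Reclining Nude']
```

The contributor's record never changes in this sketch; the added terms only widen what search and filters can reach.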
When we analyze searches in the ARTstor Digital Library, we see that the vast majority are for artists’ names, so we have also developed the ARTstor Name Authority, or ANA. Our goal, as with the enhancements, is to improve discovery without having to edit what users see in the record display.
The Getty Research Institute has allowed us to use the Union List of Artist Names® Online (ULAN), so we developed a computer algorithm to match creator names in ARTstor to ULAN names. ULAN records are typically very rich, listing variant names, nationalities, and life dates for each artist. The Metadata team also employs controlled vocabularies based on work type, enabling advanced search by classification.
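This post doesn't spell out how the matching algorithm works, so what follows is only one plausible approach, sketched in Python: normalize each name into a comparison key (inverting "Last, First", stripping accents and punctuation, lowercasing), index every variant name in an authority file by that key, and accept a match only when it points to exactly one authority record. The `AUTHORITY` entries, their IDs, and the function names are hypothetical placeholders, not real ULAN data or ARTstor code.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical authority file: placeholder IDs mapped to variant names.
# These are illustrative, not actual ULAN records or identifiers.
AUTHORITY = {
    "auth-001": ["Albrecht Dürer", "Dürer, Albrecht", "Albrecht Duerer"],
    "auth-002": ["Lambert Sustris", "Sustris, Lambert"],
}


def normalize_name(name: str) -> str:
    """Reduce a creator name to a simple comparison key (assumed rules)."""
    # Invert a single "Last, First" form: "Sustris, Lambert" -> "Lambert Sustris".
    if name.count(",") == 1:
        last, first = (part.strip() for part in name.split(","))
        name = f"{first} {last}"
    # Decompose accented letters and drop the combining marks: "Dürer" -> "Durer".
    decomposed = unicodedata.normalize("NFKD", name)
    name = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, turn punctuation into spaces, and collapse runs of whitespace.
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def build_index(authority: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized variant to the authority IDs that use it."""
    index: dict[str, set[str]] = {}
    for auth_id, variants in authority.items():
        for variant in variants:
            index.setdefault(normalize_name(variant), set()).add(auth_id)
    return index


def match_creator(name: str, index: dict[str, set[str]]) -> Optional[str]:
    """Return the single matching authority ID, or None if unmatched or ambiguous."""
    ids = index.get(normalize_name(name), set())
    return next(iter(ids)) if len(ids) == 1 else None


index = build_index(AUTHORITY)
print(match_creator("DURER, Albrecht", index))   # auth-001
print(match_creator("Lambert  Sustris", index))  # auth-002
print(match_creator("Sustris", index))           # None
```

Returning None for an ambiguous key, rather than guessing, would leave that name for a person to review; that is a design choice of this sketch, not a description of how ANA behaves.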
The Metadata team is also heavily involved in shaping Shared Shelf’s Names vocabulary and in working with the Built Works Registry, but that will have to wait for a future blog post. Hopefully by now I’ve given you enough information to understand why, when people ask me about metadata, I respond with “You know the difference between physics and metaphysics, don’t you?”