Accurator: Enriching collections with expert knowledge from the crowd
Abstract
Crowdsourcing is not a new phenomenon for museums. There are good examples for museums (e.g., Powerhouse museum, steve.museum). But not all crowdsourcing initiatives are successful. Crowdsourced tagging does not always contribute to a better understanding of art and can even be confusing. The Rijksmuseum, Free University Amsterdam, Center for Mathematics and Informatics, and Technical University of Delft developed the Accurator: a visual tool to get experts in domains like birds, ships, castles, etc. involved in annotating art and enrich the museums' metadata with expertise that is not available internally. In this how-to session, we will demonstrate the tool and the ways other museums can implement this Open Web application for their own collections.
Keywords: Crowdsource; nichesource; open tool; Rijksmuseum; Linked Open Data
In 2012, the Rijksmuseum launched its new website, Rijksstudio, presenting 220,000 works of art in high resolution. The website is a big breakthrough in presenting art online: the museum’s website no longer looks like a library catalogue, but is all about high-resolution images of objects. The website allows users to see, “touch,” pinch, collect, and reuse the images of the Rijksmuseum. This visual approach attracts large numbers of visitors, and many of them spend a long time on the Rijksmuseum website (Gorgels, 2013). Currently 300,000 to 400,000 people visit the website each month. One third of the visits are repeat visits. On average, visitors spend three minutes on the website, but when they start searching the average goes up to twelve minutes.
The website is based on the collections of the museum, and the museum’s Collection Management System (CMS) supplies the object information shown on the website. But the CMS was built as an internal, back-office registration and documentation system, not an information system for hundreds of thousands of art lovers. There is a big gap between the information provided by the museum and the questions asked by the users of the Rijksmuseum website.
The most used search keys on the Rijksmuseum website are: Rembrandt, Vermeer, Golden Age (Gouden Eeuw), flowers (bloemen), and still life (stilleven). These iconographic keywords are hardly used to describe works of art in the CMS. The museum describes forms, sizes, materials, and techniques. Searching for “flowers” or “still life paintings” results in unexpected sets of objects.
To answer the questions from the audiences, the museum has to redescribe its collections: describe iconographic elements depicted on works of art. But the museum lacks manpower and knowledge to describe all details on each object in the collection. Crowdsourcing iconographic aspects of the collections is the only way to get this job done.
2. The things museums do
The collections of the Rijksmuseum consist of over a million objects, ranging from ancient Asian pottery to medieval religious artefacts, seventeenth century art, the guns of Napoleon, and the first Dutch airplane. As the fourth largest print room in the world, the Rijksmuseum has 700,000 works of art on paper in its vaults: prints, drawings, photographs, and artists’ books.
To keep track of all objects, the museum started automating registration in 2000. All local and personal database systems were merged into one central CMS. Even though all data was centralised, it took the museum some years to realise that one system does not automatically mean one systematic way of describing objects. Each department and each individual cataloguer had his or her own standards for documentation. Objects were described in many different ways, and each collection used different fields and different metadata schemes. Specific fields and schemes were built on request.
In 2006, the museum started to work on the biggest project in its history: the New Rijksmuseum. Storage facilities of the museum had been moved out of Amsterdam, and (a new generation of) curators started to plan new exhibitions. More detailed and better-structured information about the objects in the collections, as well as images, was needed to stay in touch with the collections and to plan new exhibitions. At the same time, the museum got funding from the Dutch government to digitise its largest collection: the print room collection. To be able to produce a measurable amount of digital data (a specific number of records and digital images), the Rijksmuseum had to develop a workflow for digitisation and object registration.
The basis for the workflow is the registration manual, a document that describes what information has to be registered, and how and in which fields it needs to be registered. What resources can be used to annotate an object and even the format of the data is laid out in order to standardise all object information.
In the last ten years, the manual has been refined and updated constantly. Specific standards for metadata were incorporated: the Art & Architecture Thesaurus of the Getty Research Institute to describe the physical appearance of works of art (object name, materials, technique, and so on); ULAN and RKDArtist for makers; as well as Wikipedia and Wikidata. For subject annotation, the Iconclass system (http://www.iconclass.org/) is used to describe iconographic information in a structured way. The Getty Thesaurus of Geographic Names is used to pinpoint locations. And some collections use subject-specific resources: controlled vocabularies to document weapons, print marks, etc.
3. Digital photography
In 2006, the Rijksmuseum, in collaboration with external experts, developed new standards for digital photography. The most important development in this area has been the creation of a strict set of rules so that each individual object is digitised in accordance with the rules of its collection (e.g., all plates are photographed from above; each painting is photographed at least eight times: with frame, without frame, back side, signatures, marks, and signs). All digital images are colour-managed, and all colours are automatically measured and checked. All objects are recorded in high resolution.
4. Linked Open Data
The museum currently has different systems for collection management, for images and digital assets, for its library, and for archival material. Information is shared between these different systems and the website (and partners outside the Rijksmuseum) through application programming interfaces (APIs). The museum’s APIs are also available online for external users.
The museum chose structured resources to support digitisation workflows. These resources are now available online, and most are available as Linked Open Data sets. The museum is developing techniques to attach these Linked Open Data resources directly to the CMS and produce structured, linked, and comprehensible (and multilingual) metadata.
This metadata is the basis to find, interpret, connect, and share information. It also helps to collaborate with external partners like Europeana, Wikipedia and Wikimedia Commons, Google Art Institute, etc.
5. Dear sir/madam
The object with object number RP-P-1882-A-5712 is a different (bird) species than mentioned in your description. It is a brambling (Fringilla montifringilla) and not a Tit Bird.
(e-mail sent to firstname.lastname@example.org)
Crowdsourcing is not a new phenomenon for museums. The Powerhouse museum launched its first collection 2.0 (Chan, 2007) in the first decade of this century; steve.museum (Leason & steve.museum, 2009) is another well-known social tagging project; and in the Netherlands, the Institute for Sound and Vision runs very interesting crowdsourcing projects (Oomen et al., 2010).
But not all crowdsourcing initiatives are successful. The Rijksmuseum incorporated a general tagging system in its first iteration of Rijksstudio. Users could add their own keywords next to the keywords of the Rijksmuseum. These user-generated keywords had the same color and were on the same line as the official Rijksmuseum keywords. This may have been a design error. The tags added by the public were usually confusing, of emotional nature, or even insulting and did not contribute to a better understanding of works of art. People sent e-mails complaining about the silly keywords on the Rijksmuseum website. The tagging system on the Rijksmuseum website was shut down.
The only means of communication left on the website is a link to an e-mail address: at the bottom of each object page, users can click on the text: “Do you have a remark or extra information on this object? Please let us know!” Surprisingly, at least two remarks per day are e-mailed to the museum. Ninety-five percent of the remarks are valid corrections or comments and help to improve the description of the object. The remarks include corrections, references to additional publications, date information, additional information about creators, etc. Many of the remarks are corrections to iconographic information.
This was the beginning of the Accurator system: http://www.accurator.nl. The goal of the system is to develop new tools for crowdsourcing in the cultural heritage domain: to get people involved in annotating and enriching museums’ data with expertise that is not available internally. The project team therefore introduced the notion of “nichesourcing”: crowdsourcing aimed at specific areas of expertise (niches). These areas can range from biology to monuments, from ships to shoes: expertise about subjects depicted in art that are often difficult for art historians to identify.
The work on Accurator is conducted in the context of SEALINCMedia, a consortium of Dutch researchers from the VU University Amsterdam, Delft University of Technology, and Centrum Wiskunde & Informatica (Centre for Mathematics and Informatics), teamed up with the Rijksmuseum and other heritage institutions like Sound and Vision and Naturalis, Center for Biodiversity. SEALINCMedia is part of the COMMIT/ project (http://www.commit-nl.nl/projects/socially-enriched-access-to-linked-cultural-media). The goal of the project is to bring together academic research in the field of informatics and (non-)profit organisations (https://sealincmedia.wordpress.com/tag/accurator/).
Nichesourcing starts with identifying a suitable subset of the collection. These subsets are formed around a shared subject-matter topic. There is no shortage of topics to choose from in the Rijksmuseum collection. Some examples are historical figures, animals, plants, and buildings.
For the Accurator project, we started setting up campaigns around the topics of castles and plants. But subjects that are suitable for the museum are not always suitable for nichesourcing. As the team soon learned, not every topic has an interested, active group of experts available that is willing to invest time and energy in annotating artworks.
Requests for contributions were sent out on fora dedicated to historical monuments and plants, counting on the intrinsic motivation of the fora members. But we barely received feedback, and the number of contributions was low. From this, we learned to also take into consideration the availability of and access to experts while selecting topics. The project team started to collaborate with other institutions that have access to specific expert groups: with the help of Naturalis, Center for Biodiversity, we got in touch with birdwatching groups. We decided to draw attention to our annotation system by organising launch events: edit-a-thons in the Library of the Rijksmuseum.
Deciding on which information should be gathered is essential to make a crowdsourcing exercise useful for a museum. Crowdsourcing projects often rely on free text tagging. But with Accurator, we gather more-structured data. The museum has little time to change free-text annotations into structured metadata, and it has limited knowledge to check the validity of unstructured information. To check validity and make the process of ingesting annotations easy, Accurator is built upon structured resources for annotation, tailored to a specific domain. Thesauri are connected to the Annotation tool and serve as resource for autocompletion. One field for the topic “birds,” for example, is “Scientific name,” with the description “add scientifically accepted species or genus name.” A taxonomy of bird names is used so that annotators can select one of the alternatives. This results in uniform entries and avoids nonsense or unstructured data.
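The autocompletion step described above can be illustrated with a minimal sketch. It is not the Accurator implementation: the `Concept` class, the `autocomplete` and `make_annotation` functions, and the `example.org` identifiers are assumptions made for illustration, and the three bird names stand in for a full taxonomy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """One entry in a controlled vocabulary (all values here are illustrative)."""
    uri: str    # hypothetical identifier, not a real vocabulary URI
    label: str  # preferred label shown to the annotator


# Tiny stand-in for a bird taxonomy; a real campaign would load thousands of names.
BIRD_VOCABULARY = [
    Concept("http://example.org/birds/fringilla-montifringilla", "Fringilla montifringilla"),
    Concept("http://example.org/birds/fringilla-coelebs", "Fringilla coelebs"),
    Concept("http://example.org/birds/parus-major", "Parus major"),
]


def autocomplete(prefix: str, vocabulary: list[Concept], limit: int = 10) -> list[Concept]:
    """Return concepts whose label starts with the typed prefix, ignoring case."""
    needle = prefix.strip().lower()
    if not needle:
        return []
    matches = [c for c in vocabulary if c.label.lower().startswith(needle)]
    return sorted(matches, key=lambda c: c.label)[:limit]


def make_annotation(object_number: str, field: str, concept: Concept) -> dict:
    """Build a structured annotation that stores the concept identifier, not free text."""
    return {
        "object": object_number,
        "field": field,
        "value": concept.uri,
        "label": concept.label,
    }


# The annotator types "fring" and picks the brambling from the suggestions.
suggestions = autocomplete("fring", BIRD_VOCABULARY)
chosen = next(c for c in suggestions if c.label == "Fringilla montifringilla")
annotation = make_annotation("RP-P-1882-A-5712", "Scientific name", chosen)
```

Because the annotator can only pick an existing concept, every stored value is an identifier from the vocabulary, which is what keeps entries uniform and easy to check.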
The sources can originate from a very simple vocabulary or even a list of preferred keywords. But to leverage the advantages of Linked Open Data (documentation, translations, shared and exchangeable data), Accurator supports loading vocabularies modelled according to the Simple Knowledge Organisation System (SKOS). Some vocabularies are already used by the museum (e.g., the Iconclass and Getty Art & Architecture thesaurus). This allows the system to gather annotations appropriate for automatic feeding back into the CMS.
But often, data structures that are not modelled in SKOS are more suitable for specific expert groups. Accurator supports any data structure as long as it comes from an authority that includes identifiers for the concepts for reference. In specific instances, the project team converts data structures to a SKOS format: for the bird domain, the IOC World Bird List, available in Open Data format, was converted into SKOS.
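As a rough illustration of such a conversion, the sketch below turns a small tabular species list into SKOS concepts serialised as Turtle. The row layout, the `example.org` namespace, and the function names are assumptions; the actual layout of the IOC World Bird List and the conversion used in the project are not described here.

```python
# Hypothetical namespace for the generated concepts; a real conversion would
# use identifiers published by the authority that maintains the list.
BASE = "http://example.org/birdlist/"
PREFIXES = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."

# Illustrative rows: a genus and one species that belongs to it.
ROWS = [
    {"id": "fringilla", "scientific": "Fringilla", "english": "", "parent": None},
    {
        "id": "fringilla-montifringilla",
        "scientific": "Fringilla montifringilla",
        "english": "Brambling",
        "parent": "fringilla",
    },
]


def escape(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_skos_turtle(rows: list[dict]) -> str:
    """Serialise rows as SKOS concepts with labels and broader links."""
    blocks = [PREFIXES]
    for row in rows:
        scientific = escape(row["scientific"])
        statements = ["a skos:Concept", f'skos:prefLabel "{scientific}"@la']
        if row.get("english"):
            english = escape(row["english"])
            statements.append(f'skos:altLabel "{english}"@en')
        if row.get("parent"):
            statements.append(f"skos:broader <{BASE}{row['parent']}>")
        subject = f"<{BASE}{row['id']}>"
        blocks.append(subject + " " + " ;\n    ".join(statements) + " .")
    return "\n\n".join(blocks) + "\n"


turtle = to_skos_turtle(ROWS)
print(turtle)
```

Each concept keeps a stable identifier, so an annotation can refer to the species unambiguously, and the `skos:broader` link preserves the genus/species hierarchy of the source list.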
The data structures are also used to elicit information from contributors about their levels of expertise within a given domain. On the login page of the Accurator, users can differentiate specific expertise within a domain or make statements about levels of expertise. Some birdwatchers will know more about birds of prey than about waterbirds. This knowledge about the expert is later used to recommend suitable artworks to annotate.
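One simple way to use such a profile is to rank artworks by how well their expected subject matches the self-declared expertise. The sketch below assumes that each artwork already carries a rough topic label and that expertise is expressed as a number between 0 and 1; the `recommend` function, the topic names, and the object numbers are all hypothetical and do not describe the recommendation method used in Accurator.

```python
def recommend(expertise: dict[str, float], artworks: list[dict], limit: int = 5) -> list[str]:
    """Rank artworks by the annotator's strongest matching expertise level."""
    scored = []
    for artwork in artworks:
        # Use the best match among the artwork's topics; unknown topics score 0.
        score = max((expertise.get(topic, 0.0) for topic in artwork["topics"]), default=0.0)
        if score > 0:
            scored.append((score, artwork["object_number"]))
    # Highest score first; ties are broken by object number for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [object_number for _, object_number in scored[:limit]]


# Illustrative data: placeholder object numbers with rough topic labels.
ARTWORKS = [
    {"object_number": "EXAMPLE-001", "topics": ["birds of prey"]},
    {"object_number": "EXAMPLE-002", "topics": ["waterbirds"]},
    {"object_number": "EXAMPLE-003", "topics": ["songbirds", "waterbirds"]},
    {"object_number": "EXAMPLE-004", "topics": ["castles"]},
]

# A birdwatcher who knows more about birds of prey than about waterbirds.
profile = {"birds of prey": 0.9, "waterbirds": 0.4}
recommended = recommend(profile, ARTWORKS)
```

In this sketch the birdwatcher is offered the bird-of-prey image first, then the waterbird images, and never the castle image.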
During the project, research was done on automatically assessing expertise, based upon the profiles of experts but also on criteria like “speed of typing,” “value of previous annotations,” and “precision of usage of vocabularies” (Oosterman et al., 2014). Some algorithms from this research are implemented in the Accurator to validate annotations before they are uploaded into the CMS.
(The code written for the Accurator tool is available on GitHub: https://github.com/rasvaan/accurator/wiki)
7. The birdwatching event
The Accurator was launched on October 4, 2015. The project team, in collaboration with Wikimedia NL and Naturalis, organised a birdwatching event at the Rijksmuseum. Over forty birdwatchers were invited to come to the Rijksmuseum to search for and annotate birds on works of art.
The Rijksmuseum preselected 1,606 works of art. Digital images of these objects were uploaded in the Accurator and on Wikimedia Commons, to see if experts would like to write or edit articles about birds on Wikipedia. The project team collaborated with Naturalis to get in touch with communities of birdwatchers.
The event attracted a lot of publicity, and the forty bird experts annotated 213 works of art, pinpointing 1,528 birds in the Accurator. Most experts preferred the Accurator over Wikipedia, because of the easy-to-use interface of the Accurator. They didn’t need a lot of training to work with the system.
The most remarkable finding was a black-capped donacobius on print RP-T-1898-A-3679: experts think that this may be (one of) the oldest recording(s) of this species of birds.
Currently, the project team is improving the back end of the Accurator. A dashboard is being created to visually help museum staff to upload collections and connect Linked Open Datasets. The dashboard will also display annotations and predict the quality of the annotations. Curators can then decide if they want to add the annotations to the CMS.
The Rijksmuseum will organise a second Accurator event on April 23, 2016: the subject will be fashion from the Rijksmuseum collection.
- The Rijksmuseum puts a lot of time and energy into structured, high-quality digital data (images and metadata).
- Experts in the museum register the art-historical information about the collections.
- The Rijksmuseum shares its collections online in high resolution and offers users new experiences with art online.
- Questions and e-mails from the (online) audience are often about iconographic details. Museum staff can’t always answer these questions.
- Crowdsourcing with the help of niche experts, experts from other domains, was tested to help annotate works of art.
- Tests showed that some expert groups are more willing to help than others, and that collaborating with partner institutions and making personal contact both help to reach the experts.
- Using thesauri, structured metadata, helps estimate the quality of the annotations and fits in the quality-controlled workflows of the Rijksmuseum.
- The Accurator annotation tool is easy to use and helps the museum acquire a large number of annotations.
References
Chan, S. (2007). “Tagging and Searching – Serendipity and museum collection databases.” In J. Trant & D. Bearman (eds.). Museums and the Web 2007: Proceedings. Toronto: Archives & Museum Informatics. Published March 1, 2007. Consulted January 28, 2016. Available http://www.archimuse.com/mw2007/papers/chan/chan.html
Dijkshoorn, Chris, Jasper Oosterman, Lora Aroyo, & Geert-Jan Houben. (2012). “Personalization in crowd-driven annotation for cultural heritage collections.” UMAP Workshops 2012. Available http://ceur-ws.org/Vol-872/patch2012_paper_3.pdf
Gorgels, Peter. (2013). “Make Your Own Masterpiece!” In N. Proctor & R. Cherry (eds.). Museums and the Web 2013. Silver Spring, MD: Museums and the Web. Published January 28, 2013. Consulted January 28, 2016. Available http://mw2013.museumsandtheweb.com/paper/rijksstudio-make-your-own-masterpiece/
jegoosterman. (2014). “COMMIT – SEALINCMedia Rijksmuseum Use Case.” Youtube. [Video about Accurator Annotation Tool and SEALINCMedia project]. January 14. Available https://www.youtube.com/watch?v=OJAJbxzOV7o&feature=youtu.be
Leason, T., and steve.museum. (2009). “Steve: The Art Museum Social Tagging Project: A Report on the Tag Contributor Experience.” In J. Trant & D. Bearman (eds.). Museums and the Web 2009: Proceedings. Toronto: Archives & Museum Informatics. Published March 31, 2009. Consulted January 28, 2016. Available http://www.archimuse.com/mw2009/papers/leason/leason.html
Oomen, J., M. Brinkerink, L. Heijmans, & T. van Exel. (2010). “Emerging Institutional Practices: Reflections on Crowdsourcing and Collaborative Storytelling.” In J. Trant & D. Bearman (eds.). Museums and the Web 2010: Proceedings. Toronto: Archives & Museum Informatics. Published March 31, 2010. Consulted January 29, 2016. Available http://www.archimuse.com/mw2010/papers/oomen/oomen.html
Oosterman, Jasper, Archana Nottamkandath, Chris Dijkshoorn, Alessandro Bozzon, Geert-Jan Houben, & Lora Aroyo. (2014). “Crowdsourcing knowledge-intensive tasks in cultural heritage.” WebSci ’14, Proceedings of the 2014 ACM conference on Web science: 267-268. Available http://dl.acm.org/citation.cfm?doid=2615569.2615644
"Accurator: Enriching collections with expert knowledge from the crowd." MW2016: Museums and the Web 2016. Published February 7, 2016.