AnnoTate: Uncover the lives of artists
Background: The Archives & Access Project
The Tate Archive is the largest archive of British art in the world. Comprising over 800 collections, it contains a rich array of artists’ materials, including photographs, sketchbooks, diaries, letters and objects, documenting the lives and working processes of British-born and émigré artists from 1900 to the present.
Through an ambitious programme of digitisation, learning and participation, the HLF-funded Archives & Access project has opened up these archival materials to national and international online audiences.
The project is built around a large-scale digitisation initiative, which has seen 53,000 pieces from Tate’s Archive captured and published online. Archival materials can now be browsed via the website, and are accompanied by a suite of bespoke digital tools, study guides, and specially commissioned films. Archives & Access also saw the completion of a major capital build project, which included newly created archive and learning spaces – the Archive Gallery and the Taylor Digital Studio. In addition to providing new levels of physical and online access, the digitised content is central to a national Outreach Learning partnership programme in which Tate joins forces with five institutions across the UK: Josef Herman Art Foundation Cymru, Tate Liverpool, Tyne & Wear Archives & Museums, Turner Contemporary, and Tate Collective.
Using this multi-faceted approach to facilitate the exploration and discovery of archival collections, Tate aims to provide audiences with a variety of means to enjoy and explore art, culture, and heritage.
However, textual documents captured require particular attention if they are to be made as accessible as possible; they need to be transcribed, and plaintext transcriptions must be generated to sit alongside the archival pieces on the website.
Considering the range of textual items contained within Tate’s significant collection, this poses a unique challenge on a considerable scale. To negotiate this, Tate collaborated with Zooniverse, a team based at the University of Oxford that works on Citizen Science projects, to create a tool – called AnnoTate – to crowdsource archive transcriptions.
This nomination outlines Tate’s transcription rationale and approach, and reflects on some emergent project outcomes.
Within the notes, letters and sketchbooks that constitute an artist’s archive, insights into their lives, loves, and losses are preserved. Beyond offering ways to learn about an artist’s character, the wealth of detail captured in archival material is culturally significant, casting light on social histories, art practice, local, national and international events, and so much more. Such textual archives thus form a rich resource for all.
Yet so often these documents can be hard to read: written by hand, drawn over, typed on paper that is torn, discoloured, or damaged. And, if the content of such documents presents a challenge for human eyes, it is completely inaccessible to search engines. To overcome these barriers, and to provide new means for engagement with – and discovery of – these archives, Tate and Zooniverse launched the crowd-sourced online transcription tool AnnoTate.
Since 1 September 2015, thousands of volunteers have helped decipher writing found in over 17,000 artists’ letters, diaries and sketchbooks. AnnoTate transcribers are working through a wealth of material from the mundane to the magical: professional gripes to professions of love; social commentary to shopping lists; jotted ‘to-dos’ and reminders, to a moving description of the outbreak of war. AnnoTate visitors can browse the collections and type up anything from Francis Bacon’s letters to his art dealer, to the notes in Donald Rodney’s sketchbooks. Other artists whose archives feature include Barbara Hepworth, Eileen Agar, Duncan Grant, Ian Breakwell, Ethel Sands and Stuart Brisley, amongst many others.
Beyond exploring the archives, those participating in AnnoTate will be contributing to an entirely new online resource: the plaintext transcripts that will be published on Tate’s website, alongside the original materials.
Tate turned to crowdsourcing in part because of the sheer volume of material that needed transcription, but crowdsourcing offers benefits beyond mere speed. Over time, each page on AnnoTate is viewed by many transcribers and numerous versions of a transcription are captured. Even the most experienced lone transcriber will stumble over a crux or two, but when ten transcribers tackle the same manuscript, the chance of the correct reading eluding all contributors becomes very low. AnnoTate has therefore been designed to invite many transcriptions of a single page and uses an interface that manages to be both very open and very granular so that multiple versions of the text can be reconciled by a machine algorithm.
The archival materials digitised by Tate embrace a great variety of age, style, scale, format and materiality. Lines of handwritten text can lie at odd angles or even meander around the page. It was clear that no machine would decipher everything in these pages accurately; in fact, simply locating reliably all lines of text in need of transcription would be a challenge without human input. AnnoTate presents transcribers with a complete, unmodified photograph of each page, giving them full control over defining the position of each line of text before they attempt to transcribe it. Having marked the start and end points of a single line, the transcriber enters their rendition of what may only be a single word. Behind the scenes, this is matched against other attempts to transcribe text in that area of the image. Once a certain threshold of attempts has been recorded in the system, the line is ‘retired’, meaning the system holds enough data to produce an aggregated version of the line.
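The matching and retirement steps described above can be pictured with a small Python sketch. It is illustrative only and is not taken from the AnnoTate code base: the `Mark` type, the `MATCH_RADIUS` distance, the `RETIREMENT_THRESHOLD` value and the grouping rule (two marks belong to the same line when both their start and end points lie close together) are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

# Hypothetical values chosen for illustration; the real project settings
# are not stated in this document.
MATCH_RADIUS = 20.0       # pixels within which two end points count as "the same"
RETIREMENT_THRESHOLD = 5  # attempts needed before a line is retired

Point = Tuple[float, float]


@dataclass
class Mark:
    """One transcriber's attempt: a line drawn on the page plus its text."""
    start: Point
    end: Point
    text: str


def same_line(a: Mark, b: Mark, radius: float = MATCH_RADIUS) -> bool:
    """Treat two marks as the same line if both end points nearly coincide."""
    start_gap = hypot(a.start[0] - b.start[0], a.start[1] - b.start[1])
    end_gap = hypot(a.end[0] - b.end[0], a.end[1] - b.end[1])
    return start_gap <= radius and end_gap <= radius


def group_marks(marks: List[Mark]) -> List[List[Mark]]:
    """Greedily assign each mark to the first group whose first mark it matches."""
    groups: List[List[Mark]] = []
    for mark in marks:
        for group in groups:
            if same_line(group[0], mark):
                group.append(mark)
                break
        else:
            groups.append([mark])
    return groups


def is_retired(group: List[Mark], threshold: int = RETIREMENT_THRESHOLD) -> bool:
    """A line is retired once enough attempts have been recorded for it."""
    return len(group) >= threshold
```

In this sketch, two volunteers who draw almost the same line across a page end up in one group, and the group is reported as retired once its size reaches the chosen threshold.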
The machine aggregation of the transcriptions is the next stage. Having identified all the versions of a line submitted by our transcribers, the software assesses the level of consensus between them. The aim is to produce a single version that represents a merger of all individual versions and flags up the significant variations between them. To minimise noise in the end result, single outlying readings are ignored, along with insignificant variations such as multiple spaces. At the end of the process, Tate’s archivists receive a remarkably clean version of the transcribed page, although frequently with some significant variant readings still to be reviewed and reconciled by an expert human eye.
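As a rough illustration of this kind of consensus step, the following Python sketch merges several readings of one line by voting word by word. It is a simplified stand-in and not the algorithm Zooniverse actually uses: the function names, the `min_support` parameter and the rule of only comparing readings with the most common word count are assumptions made for the example, and the sample readings in the usage note are invented.

```python
from collections import Counter
from typing import List


def normalise(text: str) -> str:
    """Collapse runs of whitespace so that extra spaces do not count as variants."""
    return " ".join(text.split())


def aggregate_line(readings: List[str], min_support: int = 2) -> List[List[str]]:
    """Merge several readings of one line into a word-by-word consensus.

    Returns one list of candidate words per position. A single candidate
    means the transcribers agree; several candidates mark a variant that
    still needs review. Words offered by fewer than `min_support`
    transcribers are dropped as outliers (hypothetical rule).
    """
    token_lists = [normalise(r).split(" ") for r in readings if r.strip()]
    if not token_lists:
        return []

    # Simplification: only compare readings that have the most common word count.
    common_len = Counter(len(t) for t in token_lists).most_common(1)[0][0]
    aligned = [t for t in token_lists if len(t) == common_len]

    result: List[List[str]] = []
    for words_at_position in zip(*aligned):
        counts = Counter(words_at_position)
        supported = [w for w, n in counts.most_common() if n >= min_support]
        if not supported:
            # No word reached the support level; fall back to the top choice.
            supported = [counts.most_common(1)[0][0]]
        result.append(supported)
    return result
```

Called on five invented readings such as `"paint the sky blue"` (three times, one with a doubled space), `"paint the sky blur"` (twice, one with the first word misread as `"point"`), the sketch would drop the lone `"point"`, ignore the doubled space, and keep both `"blue"` and `"blur"` as candidates for the final word, leaving that choice to a human reviewer.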
As we work towards project completion, AnnoTate transcribers keep in touch with Tate and Zooniverse staff, as well as with each other, via the dedicated online forum ‘Talk’. Talk is both a social hub and a noticeboard where shared archival content can be browsed, questioned, and commented on. Using Talk, Tate asked two project participants to describe their reasons for getting involved in AnnoTate. One transcriber, @starrymirth, explained that she was ‘curious to see what it was about’, adding that on the one hand ‘the AnnoTate concept is something that I consider quite important – the digitization of data, making it searchable and accessible; but on the other hand I transcribe simply because I find the process enjoyable.’ She further added:
‘I think one of the reasons I’ve enjoyed AnnoTate is that there’s a lot of flexibility – I can just transcribe a line or two if I don’t have a lot of time. There’s also an immense sense of satisfaction when you’re staring at a line and you suddenly see what the illegible word is and the sentence makes sense. It’s a sort of instant-gratification that makes contributing to AnnoTate quite addictive.’
@jules transcribed Keith Vaughan’s notebooks, which she describes as both poignant and revealing:
‘He sketched and painted some wonderful images from WWII. I’ve come to realise that he suffered from poor physical and mental health and eventually took his own life. I have also learnt something about the process of creating a piece of art as he sometimes included notes on the colours he wanted to use in his sketches.’
The transcription process is illuminating in many respects, with participants learning about aspects of and approaches to practice, gaining insights into societal histories, and encountering striking biographical details. As @jules notes:
‘Sometimes it feels like prying but it’s all part of the bigger picture of understanding who the person was and why they produced the art they did… It’s important to save these notebooks and sketches. They are part of our heritage and social history – and you never know what you might find.’
Sometimes, extraordinary finds are uncovered. Transcriber Barbara Jung was fascinated by a typescript of an unattributed German-language poem in Klaus E Hinrichsen’s archive. As a German speaker and translator by profession, she volunteered her own translation – to our knowledge, the first time the verses have appeared in English. The poem thoughtfully explores the abject absurdity of the internment of detainees during World War Two. Hinrichsen himself had fled to Britain during the 1930s to escape Nazi persecution. While interned in Hutchinson camp on the Isle of Man during the early 1940s, he befriended other German and Austrian scientists, musicians and artists – most notably Erich Kahn and the Dadaist Kurt Schwitters – and helped to establish what became known as the ‘Hutchinson University’, where internees could while away the hours behind barbed wire in a creative spirit. The translated poem can be read here.
Share and share alike
Tate is the first art gallery to collaborate with the Zooniverse team, led by the University of Oxford, to crowdsource text transcriptions of handwritten documents in this way. As Zooniverse develop all their code under an Apache software license, other organisations will be able to use the code behind AnnoTate. The source code is freely available and open for reuse – visible on GitHub – enabling other materials and collections to be transcribed as an outcome of this project’s investment. Tate, too, will be able to accommodate new images within AnnoTate as further collections are digitised.
It is hoped that this open and collaborative approach will continue to foster dialogues within and between audiences, whilst contributing to ongoing developments in the arts and heritage sectors (and beyond) which see digital tools implemented as means of providing user access on unprecedented scales.