Towards Affordable Disclosure of Spoken Heritage Archives

Contents:

  • 1. Introduction
  • 2. From spoken-word archive to multimedia information portal
  • 3. Automatic annotation using automatic speech recognition
  • 4. The Buchenwald user interface
  • 5. Discussion and conclusion
  • References

This paper presents and discusses ongoing work aimed at the affordable disclosure of real-world spoken heritage archives in general, and in particular of a collection of recorded video interviews with Dutch survivors of the World War II concentration camp Buchenwald. The work extends the ‘Radio Oranje’ project. The goal of the Buchenwald project was to develop a Dutch educational multimedia information portal on this World War II concentration camp that gives the user a complete picture of the camp then and now by presenting written articles, photos and the interview collection.

The paper first introduces three main difficulties confronting automatic annotation systems: the low accuracy of the technology, limited knowledge of usability issues, and the affordability of integrating new technology into current archive workflows. It then describes how two annotation strategies, alignment and large-vocabulary speech recognition, were applied to the Buchenwald collection. These automatic annotation strategies support, for example, within-document search. The paper also includes an assessment of the end-user interface. The case shows that a minimal amount of manual effort can help to improve accuracy and/or usability. The authors conclude that, although users are positive about the search possibilities, more work is needed to improve the technology. The paper appeared in the Journal of Digital Information, Volume 10, No. 6 (2009). The authors are affiliated with the Universiteit Twente and Radboud Universiteit in the Netherlands.
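To make "within-document search" concrete: both alignment and speech recognition can yield a time-coded transcript, in which every word carries a start and end time, so a search hit can be turned into a jump-in point inside a long interview. The sketch below is illustrative only and is not the authors' system. It assumes the annotation step produces a plain list of timed words, and the names `TimedWord`, `search_within` and `mmss` are hypothetical. Exact single-word matching is a deliberate simplification; a real portal would also have to cope with recognition errors, inflected forms and multi-word queries.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    """One word of a time-coded transcript (assumed output of ASR or alignment)."""
    text: str
    start: float  # seconds from the start of the recording
    end: float


def normalise(token: str) -> str:
    # Lower-case and strip surrounding punctuation so "Buchenwald," matches "buchenwald".
    return token.lower().strip(".,;:!?\"'()")


def search_within(words: list[TimedWord], query: str, context: int = 2) -> list[tuple[float, str]]:
    """Return (jump-in time, snippet) for every occurrence of a single-word query."""
    target = normalise(query)
    hits = []
    for i, word in enumerate(words):
        if normalise(word.text) == target:
            # Show a few words either side of the hit as a result snippet.
            lo = max(0, i - context)
            hi = min(len(words), i + context + 1)
            snippet = " ".join(w.text for w in words[lo:hi])
            hits.append((word.start, snippet))
    return hits


def mmss(seconds: float) -> str:
    # Format a time offset as MM:SS for display next to a hit.
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


# Invented toy transcript, for illustration only (not data from the collection).
EXAMPLE = [
    TimedWord("We", 61.0, 61.2),
    TimedWord("arrived", 61.2, 61.7),
    TimedWord("in", 61.7, 61.8),
    TimedWord("Buchenwald", 61.8, 62.5),
    TimedWord("in", 62.5, 62.6),
    TimedWord("1944.", 62.6, 63.4),
]

if __name__ == "__main__":
    for start, snippet in search_within(EXAMPLE, "buchenwald"):
        print(mmss(start), snippet)  # 01:01 arrived in Buchenwald in 1944.
```

In a user interface, each returned time would be used to start video playback at that point, which is what distinguishes within-document search from merely retrieving the whole interview.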

This paper clearly illustrates how strongly a collection’s characteristics can influence the applicability of automatic annotation, and thus the ultimate affordability of introducing it into standard operations. The article is perhaps best suited to readers with some knowledge of automatic indexing technology, as some sections are quite technical. The options it describes for applying speech recognition in an audiovisual (AV) portal may inspire readers.