BBC World Service Archive Prototype

Contents:

1 Description
1.1 Automatic interlinking
1.2 Putting the archive online
1.3 Crowdsourcing
2 Appendix

The BBC World Service radio archive includes around 70,000 English-language programmes spanning more than 45 years. The metadata for this archive was sparse (consisting of catalogue descriptions) and sometimes wrong, but the full audio content was available in digital form. To provide proper online access, BBC R&D built a system that automatically annotates programmes in this archive with Linked Data web identifiers. This short paper describes how building a feedback cycle leads to continuously improving access to the archive. Combining Semantic Web technologies with automatic speech recognition (ASR) and crowdsourced metadata (researchers giving feedback on the automatically created tags) has also dramatically reduced the time and effort required to publish this rich archive online. The system's navigation functionality visualises archive programmes related to current news events, so that journalists and editors can quickly locate relevant archive content and use it to give more context to particular events. The appendix describes the characteristics of the application as a list of development criteria and requirements. The BBC World Service archive prototype is available online.
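The feedback cycle between automatic tagging and crowdsourced corrections can be illustrated with a small sketch. This is not the BBC R&D implementation: the `Tag` class, the `score` blending rule, the `prior_weight` and `threshold` parameters, and the example DBpedia URIs are all hypothetical. The sketch assumes each programme has candidate Linked Data tags with a confidence score from the automatic (ASR-based) tagger, and that researchers can confirm or reject each tag. The automatic score is treated as a prior worth a few "pseudo-votes", so every round of feedback pulls a tag's score towards what people actually said.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    # Hypothetical model of one automatically suggested tag on a programme.
    uri: str            # Linked Data identifier (e.g. a DBpedia URI); illustrative only
    auto_score: float   # confidence from the automatic tagger, assumed to be in [0, 1]
    up: int = 0         # crowd votes confirming the tag
    down: int = 0       # crowd votes rejecting the tag

    def score(self, prior_weight: float = 2.0) -> float:
        """Blend the automatic score with crowd votes (hypothetical rule).

        The automatic score counts as `prior_weight` pseudo-votes; each
        confirming vote pulls the estimate towards 1, each rejection towards 0.
        """
        total = prior_weight + self.up + self.down
        return (self.auto_score * prior_weight + self.up) / total


def rank_tags(tags: list[Tag], threshold: float = 0.5) -> list[Tag]:
    """Keep tags whose blended score reaches `threshold`, best first."""
    kept = [t for t in tags if t.score() >= threshold]
    return sorted(kept, key=lambda t: t.score(), reverse=True)


if __name__ == "__main__":
    tags = [
        Tag("http://dbpedia.org/resource/Nelson_Mandela", 0.6, up=3),
        Tag("http://dbpedia.org/resource/Cricket", 0.7, down=4),
        Tag("http://dbpedia.org/resource/Apartheid", 0.55),
    ]
    # Nelson_Mandela (0.84) ranks first, Apartheid (0.55) second;
    # Cricket drops below the threshold after four rejections.
    for t in rank_tags(tags):
        print(f"{t.score():.2f}  {t.uri}")
```

A pseudo-vote prior is one simple way to let untouched tags keep their automatic ranking while reviewed tags are increasingly governed by human judgement. A real system might instead feed the corrections back into the tagger itself.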

A nice example of how automatic indexing, combined with crowdsourced feedback on the automatically generated tags, can improve online access to an audiovisual (radio news) archive.