Although many digitization efforts have sought to open up the content of handwritten document collections, none has succeeded in full digital transcription, i.e., the automatic conversion of digital images of historical manuscripts into plain text. As a result, a large number of historical handwritten texts cannot be easily studied, and their valuable content remains inaccessible.
The aim of the μDOC.tS project is to develop an innovative and cost-effective solution towards achieving automatic transcription of historical handwritten documents to be used not only by individual scholars but also on a large scale, by archives, libraries, museums, etc. The resulting technologies will be integrated into a set of tools that will be implemented on a dedicated platform for transcription purposes.
The end product of the project will enable users to:
- Automatically export full text from digitized historical manuscripts through a Handwritten Transcription Engine based on handwritten text recognition (HTR) technology that goes beyond the state of the art and relies on recurrent neural networks and language models.
- Improve image quality (e.g., noise reduction) and enhance text areas in digitized historical manuscripts.
- Search for keywords directly in digitized documents of historical handwritten collections using keyword spotting (KWS) techniques.
- Manage digitized handwritten documents in a user-friendly, intelligent and efficient manner, through an automated environment for document image processing, transcription and management.
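
To make the noise-reduction item above more concrete, the following is a minimal sketch of one common technique, a median filter, applied to a tiny grayscale image. It is an illustration only: the project description does not say which method is used, and the function name `median_filter`, the `radius` parameter and the sample `page` data are all hypothetical.

```python
from statistics import median_low


def median_filter(image, radius=1):
    """Return a denoised copy of a grayscale image given as a list of rows.

    Each pixel is replaced by the (low) median of its neighbourhood, which
    suppresses isolated speckles. Illustrative only; not the project's method.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbourhood, clipped at the image borders.
            window = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(median_low(window))
        result.append(row)
    return result


# Hypothetical sample: a white page (255) with one dark speckle (0).
page = [
    [255, 255, 255, 255],
    [255, 0, 255, 255],
    [255, 255, 255, 255],
]
cleaned = median_filter(page)
```

In this example the single dark speckle is outvoted by its white neighbours, so `cleaned` contains only background pixels. A real manuscript pipeline would need to preserve thin pen strokes, which a plain median filter can erode, so this should be read as a starting point rather than a recommendation.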