@PhilosTEI creates a simple and free way to transform images of texts in several languages and from several historical periods into the TEI format, which is today's standard for digital editions.
In more detail, @PhilosTEI develops and makes available an open source, web-based, user-friendly workflow from digital images of text to TEI, based on a combination of an OCRopus / Tesseract web service for text layout analysis and Optical Character Recognition (OCR) and a multilingual version of TICCL, available as the web service TICCLops.
This tool will enable the building of corpora appropriate to philosophy research of the kind we do in other projects, such as our current ERC project 'Tarski's Revolution: A New History'. Developing this tool will in turn open the way to new e-methodologies in different areas of philosophy and philosophy-informed intellectual history, to be pursued in future projects.
The result of this project will be a workflow from scanned images to digital texts in TEI format, covering several historical scripts and European languages. This result is crucial to proper corpus building in philosophy. Proper corpus building is the first essential step towards the Axiom group's overall methodological goal, namely developing a new computational methodology for the history of philosophy and philosophy-informed intellectual history.
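To make the last step of such a workflow concrete, here is a minimal sketch of how already recognised and corrected page texts might be wrapped in a bare TEI skeleton. It is not the project's actual implementation: the function name `pages_to_tei`, the placeholder header texts, and the rule of splitting paragraphs on blank lines are all assumptions made for illustration, and the OCR and TICCL steps are assumed to have run beforehand.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def _sub(parent, name, text=None, attrib=None):
    """Create a child element in the TEI namespace (illustrative helper)."""
    el = ET.SubElement(parent, f"{{{TEI_NS}}}{name}", attrib or {})
    if text is not None:
        el.text = text
    return el


def pages_to_tei(title, pages):
    """Wrap a list of page texts in a minimal TEI document.

    Hypothetical helper: `pages` is assumed to hold plain text that has
    already been produced by OCR and post-corrected.
    """
    # Serialise TEI as the default namespace instead of an "ns0:" prefix.
    ET.register_namespace("", TEI_NS)
    tei = ET.Element(f"{{{TEI_NS}}}TEI")

    # Minimal header: title, publication and source statements.
    file_desc = _sub(_sub(tei, "teiHeader"), "fileDesc")
    _sub(_sub(file_desc, "titleStmt"), "title", title)
    _sub(_sub(file_desc, "publicationStmt"), "p", "Unpublished sketch.")
    _sub(_sub(file_desc, "sourceDesc"), "p", "Transcribed from page images.")

    body = _sub(_sub(tei, "text"), "body")
    for number, page_text in enumerate(pages, start=1):
        # A page break marks where each scanned image starts.
        _sub(body, "pb", attrib={"n": str(number)})
        # Assumption: blank lines separate paragraphs in the OCR output.
        for paragraph in page_text.split("\n\n"):
            if paragraph.strip():
                # Collapse line breaks and repeated spaces inside a paragraph.
                _sub(body, "p", " ".join(paragraph.split()))

    return ET.tostring(tei, encoding="unicode")


if __name__ == "__main__":
    print(pages_to_tei("Demo", ["First paragraph.\n\nSecond\nparagraph."]))
```

A real edition would need a much richer header and structural markup (divisions, headings, notes); the sketch only shows the overall shape of the image-to-TEI output.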
This project will have considerable relevance because this workflow, if successful, will provide a viable open source alternative to the commercial OCR software ABBYY (featuring an expensive, pay-per-page Fraktur module). This will be of interest both to individual scholars in the humanities and to textual data providers such as libraries and archives.
It will also open up interesting perspectives for automatic TEI encoding from a variety of formats widespread among researchers (such as .doc(x)). Note that the .doc(x) to TEI feature is potentially of great interest and relevance for the Open Access movement, and thus also outside the humanities. Public funders place growing demands on the researchers they fund regarding the open accessibility of their results in a sustainable format. Automatic .doc(x) to TEI conversion will arguably be an ideal way to meet this demand.
The project, which will also have a training component for younger scholars, is embedded in a larger framework including 'Tarski's Revolution: A New History', the ERC Proof of Concept GLAMMap and the pilot Phil@Scale, involving philosophers, mathematicians, computer scientists, programmers and librarians.
You can follow this project on Twitter: @PhilosTEI.
Arianna Betti (Axiom Group, VU University)
Martin Reynaert (Tilburg University, TICCL)
Ko van der Sloot (Tilburg University)
Rik Hoekstra (Huygens Institute)
Hein van den Berg (Axiom Group, VU University)
Research Assistant (vacancy)
State of Affairs
This project is funded by CLARIN-NL.
Papyrs page (for project members only): Papyrs PhilosTEI.