On 27.01.15 17:40, Stefano Bargioni wrote:
Hi, I'd like to generate an ALTO XML file [*] from a PDF file, such as an ebook. Is there a tool to do this on a Unix/Linux machine?
I have more or less the opposite problem: I'd like to combine a bitmap image and an ALTO file into a PDF document with searchable text. I think I have found the Python packages needed to build such a PDF (see http://stackoverflow.com/questions/1180115/add-text-to-existing-pdf-using-python ), and parsing the ALTO XML to extract the text elements and their positions on the page is certainly feasible. However, I'd rather skip the inevitable debugging phase and use well-tested tools if I can find them :-)
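For the ALTO-parsing half, a minimal sketch using only the Python standard library might look like the following. It assumes a single-page ALTO file, matches elements by local name so the v2/v3/v4 namespaces all work, and converts each `String` box to PDF points (1/72 inch) with the origin flipped to the bottom-left corner, as PDF expects. The names `Word` and `alto_words`, the `dpi` parameter (needed only when `MeasurementUnit` is `pixel`) and the `default_unit` fallback are hypothetical choices for illustration, not part of any existing tool.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float       # PDF points from the left page edge
    y: float       # PDF points from the bottom page edge (bottom of the box)
    width: float   # PDF points
    height: float  # PDF points


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix so any ALTO version matches."""
    return tag.rsplit("}", 1)[-1]


def _to_points(value: float, unit: str, dpi: float) -> float:
    """Convert an ALTO measurement to PDF points (1/72 inch)."""
    if unit == "pixel":
        return value * 72.0 / dpi
    if unit == "inch1200":
        return value * 72.0 / 1200.0
    if unit == "mm10":
        return value / 10.0 / 25.4 * 72.0
    raise ValueError(f"unsupported MeasurementUnit: {unit!r}")


def alto_words(xml_text: str, dpi: float = 300.0,
               default_unit: str = "pixel") -> list[Word]:
    """Return the words of the first <Page> with PDF-style coordinates.

    Assumption: if the file has no <MeasurementUnit>, default_unit is used;
    check this against your own ALTO files.
    """
    root = ET.fromstring(xml_text)
    unit = default_unit
    page = None
    for el in root.iter():
        name = _local(el.tag)
        if name == "MeasurementUnit" and el.text:
            unit = el.text.strip()
        elif name == "Page" and page is None:
            page = el
    if page is None:
        raise ValueError("no <Page> element found")

    page_h = float(page.get("HEIGHT"))
    words = []
    for el in page.iter():
        if _local(el.tag) != "String":
            continue
        hpos = float(el.get("HPOS"))
        vpos = float(el.get("VPOS"))
        w = float(el.get("WIDTH"))
        h = float(el.get("HEIGHT"))
        # ALTO measures VPOS downward from the top; PDF measures y upward
        # from the bottom, so flip using the page height.
        words.append(Word(
            text=el.get("CONTENT", ""),
            x=_to_points(hpos, unit, dpi),
            y=_to_points(page_h - (vpos + h), unit, dpi),
            width=_to_points(w, unit, dpi),
            height=_to_points(h, unit, dpi),
        ))
    return words
```

Each `Word` could then be drawn over the scanned image in PDF text render mode 3 (invisible but searchable text), which ReportLab exposes as `setTextRenderMode(3)` on a text object; scaling the font so the string fills `width` is left out here.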
Does anyone have pointers to share?

Best regards,

Alain Borel
EPFL Bibliothèque
Rolex Learning Center
Station 20
CH-1015 LAUSANNE (SWITZERLAND)
Phone: +41 (0)21 693.98.01
Fax: +41 (0)21 693.51.00