Hello,

I'm working on historical newspapers from the interwar period written in three different languages: Corsican, French and Italian. After many tries, Tesseract seems to be the best OCR engine for me, but the layout analysis of a newspaper is complex. However, using the API of Gallica (the French national library), I have access to an existing OCR (of poor quality) and to usable ALTO files.

My question is: can I use those ALTO files to make Tesseract follow the same segmentation as Gallica's OCR? I don't know if my question makes sense.

Thanks a lot,
Vincent Sarbach-Pulicani
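
To make the question concrete, here is a minimal sketch of the idea, not a tested pipeline: read the `TextBlock` rectangles from an ALTO file and run Tesseract on each rectangle separately, so that Tesseract's own layout analysis is bypassed. The function names `alto_text_blocks` and `ocr_blocks` are hypothetical. The sketch assumes that the ALTO coordinates are expressed in pixels of the same page image (otherwise a `scale` factor is needed), that Pillow and pytesseract are installed, and that the `fra`, `ita` and `cos` language data are available.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any XML namespace, since ALTO versions use different ones."""
    return tag.rsplit("}", 1)[-1]


def alto_text_blocks(alto_xml, scale=1.0):
    """Return (left, top, right, bottom) boxes for every TextBlock element.

    `scale` converts ALTO units to image pixels; 1.0 assumes they already match.
    """
    root = ET.fromstring(alto_xml)
    boxes = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextBlock":
            continue
        left = float(elem.get("HPOS")) * scale
        top = float(elem.get("VPOS")) * scale
        width = float(elem.get("WIDTH")) * scale
        height = float(elem.get("HEIGHT")) * scale
        boxes.append(
            (round(left), round(top), round(left + width), round(top + height))
        )
    return boxes


def ocr_blocks(image_path, boxes, lang="fra+ita+cos"):
    """Run Tesseract on each block crop and return one string per block."""
    # Imported here so the ALTO parsing above works without these packages.
    from PIL import Image
    import pytesseract

    page = Image.open(image_path)
    texts = []
    for box in boxes:
        crop = page.crop(box)
        # --psm 6 asks Tesseract to treat the crop as a single uniform block of text.
        texts.append(pytesseract.image_to_string(crop, lang=lang, config="--psm 6"))
    return texts
```

With this approach the reading order and block boundaries come entirely from the ALTO file, and Tesseract is only used for recognition inside each block.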