[tesseract-ocr] Question : can I force Tesseract to follow an existing layout?

Vincent Sarbach-Pulicani Fri, 23 Sep 2022 08:20:22 -0700

Hello,
I'm working on historical newspapers from the interwar period, written in 
three different languages: Corsican, French and Italian.
After many trials, Tesseract seems to be the best OCR engine for my material, 
but the layout analysis of a newspaper page is complex.
However, through the Gallica API (French national library), I have access to 
existing OCR output (of poor quality) and to usable ALTO files.
My question is: can I use those ALTO files to make Tesseract follow the same 
segmentation as the existing OCR?
I don't know if my question makes sense.
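
A minimal sketch of the idea, under several assumptions: the ALTO coordinates 
(HPOS, VPOS, WIDTH, HEIGHT on each TextBlock) are expressed in pixels of the 
same page image that is given to Tesseract, and the pytesseract and Pillow 
packages are available. The function names and the default language string 
are hypothetical. This does not make Tesseract read the ALTO file itself; it 
only crops each block and runs OCR on the crop with layout analysis 
effectively switched off (--psm 6).

```python
import xml.etree.ElementTree as ET


def alto_text_blocks(alto_xml: str):
    """Return (id, left, top, right, bottom) for each TextBlock in an ALTO document."""
    root = ET.fromstring(alto_xml)
    boxes = []
    for el in root.iter():
        # Ignore the XML namespace, which differs between ALTO versions.
        if el.tag.rsplit("}", 1)[-1] != "TextBlock":
            continue
        # Assumption: coordinates are in pixels (check MeasurementUnit in the file).
        left = int(float(el.get("HPOS")))
        top = int(float(el.get("VPOS")))
        width = int(float(el.get("WIDTH")))
        height = int(float(el.get("HEIGHT")))
        boxes.append((el.get("ID"), left, top, left + width, top + height))
    return boxes


def ocr_blocks(image_path: str, alto_xml: str, lang: str = "fra+ita"):
    """Crop each ALTO block from the page image and OCR it separately."""
    # Imported here so the parsing part works without Pillow/pytesseract installed.
    from PIL import Image
    import pytesseract

    page = Image.open(image_path)
    results = {}
    for block_id, left, top, right, bottom in alto_text_blocks(alto_xml):
        region = page.crop((left, top, right, bottom))
        # --psm 6: treat the crop as a single uniform block of text.
        results[block_id] = pytesseract.image_to_string(
            region, lang=lang, config="--psm 6"
        )
    return results
```

If the ALTO file uses another measurement unit, or the image was rescaled 
after the original OCR, the boxes would have to be converted before cropping.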
Thanks a lot,
Vincent Sarbach-Pulicani


