
I have some old, busy documents that I'm trying to OCR. Tesseract does an 
incredible job with them out of the box (especially in comparison to other 
open source tools), but there are a few lines that it fails to detect 
entirely. I've spent some time trying to figure out how Tesseract detects 
text lines to no real avail. So I have two questions for the community: how 
does Tesseract detect text lines, and if detection is ML-based, is it 
possible to fine-tune that model on our own datasets?

Thank you in advance for your answer!

