The default language model of Tesseract is the one for English. It's the 
same one you get with the command line option '-l eng'. This model uses a 
reasonably small character set of letters, punctuation, numbers and some 
symbols.
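
To make the model choice explicit, here is a minimal Python sketch that builds and times a Tesseract command line run with '-l eng'. It assumes the tesseract binary is on the PATH. The names build_tesseract_cmd and run_ocr are hypothetical helpers for illustration. Passing "stdout" as the output base makes Tesseract print the recognized text instead of writing a file.

```python
import shutil
import subprocess
import sys
import time


def build_tesseract_cmd(image_path, lang="eng"):
    """Return the argv for a Tesseract CLI run that prints text to stdout.

    lang="eng" selects the English model, the same as the default ('-l eng').
    """
    # "stdout" as the output base tells Tesseract to write the text to stdout.
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_ocr(image_path, lang="eng"):
    """Run Tesseract on one image and return (text, elapsed_seconds)."""
    start = time.perf_counter()
    result = subprocess.run(
        build_tesseract_cmd(image_path, lang),
        capture_output=True,  # collect stdout (text) and stderr (warnings)
        text=True,
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    return result.stdout, time.perf_counter() - start


if __name__ == "__main__":
    # Only attempt a real run if an image is given and tesseract is installed.
    if len(sys.argv) > 1 and shutil.which("tesseract"):
        text, seconds = run_ocr(sys.argv[1])
        print(f"{seconds:.2f} s for {len(text)} characters of output")
```

Running it as `python ocr_time.py page.png` prints the elapsed time together with the amount of recognized text. That makes it easy to check how the time grows with the number of characters on the page.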

You can save a small amount of time with a lower resolution, because there 
will not be much difference in OCR quality between 150 and 300 dpi. But 
downscaling the images also takes time, possibly more than you save. The 
biggest factor in processing time is the number of characters on the page.
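
As a rough worked example, assume an A4 page (210 x 297 mm). Halving the resolution from 300 to 150 dpi cuts the pixel count to a quarter, while the number of characters, which dominates the OCR time, stays the same. The helper page_pixels is hypothetical:

```python
MM_PER_INCH = 25.4


def page_pixels(width_mm, height_mm, dpi):
    """Return (width_px, height_px, total_pixels) for a page scanned at dpi."""
    w = round(width_mm / MM_PER_INCH * dpi)
    h = round(height_mm / MM_PER_INCH * dpi)
    return w, h, w * h


if __name__ == "__main__":
    # A4 page at the two resolutions discussed above.
    for dpi in (300, 150):
        w, h, n = page_pixels(210, 297, dpi)
        print(f"{dpi} dpi: {w} x {h} = {n:,} pixels")
    # 300 dpi: 2480 x 3508 = 8,699,840 pixels
    # 150 dpi: 1240 x 1754 = 2,174,960 pixels
```

The image shrinks by a factor of 4, but the characters to recognize do not. This fits the observation above that the time saved is small.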

viju...@gmail.com wrote on Wednesday, 18 August 2021 at 12:21:50 UTC+2:

> Hi,
>
> I am working on printed documents in English. I need to extract all text 
> from the image, but with plain Tesseract it takes 4 seconds. Is it 
> possible to fine-tune it for only the English alphabet and numbers? Please help.
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/873c14b2-c16b-4711-8bb2-a2de10a1708cn%40googlegroups.com.
