The default language model of Tesseract is the one for English. It is the same one you get with the command line option '-l eng'. This model uses a reasonably small character set of letters, punctuation, numbers and some symbols.
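If you want to check this on your own pages, a small script along the following lines could run Tesseract once without a language option and once with '-l eng' and report the time for each. It is only a sketch: it assumes the tesseract executable is on your PATH, and the function names are made up for illustration.

```python
import subprocess
import time


def build_command(image_path, lang=None):
    """Build the tesseract argument list; 'stdout' prints the text instead of writing a file."""
    cmd = ["tesseract", image_path, "stdout"]
    if lang is not None:
        cmd += ["-l", lang]
    return cmd


def timed_ocr(image_path, lang=None):
    """Run tesseract once and return (recognized_text, elapsed_seconds)."""
    start = time.perf_counter()
    result = subprocess.run(
        build_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
    return result.stdout, time.perf_counter() - start


def compare_default_and_eng(image_path):
    """Print character count and timing for the default model and for '-l eng'."""
    for lang in (None, "eng"):
        text, seconds = timed_ocr(image_path, lang)
        print(f"lang={lang!r}: {len(text)} characters in {seconds:.2f} s")
```

Calling compare_default_and_eng("page.png") with one of your own images (the file name here is just a placeholder) should show similar output and similar timings for both runs. The timing includes process startup and model loading, so it is only a rough measure.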
You can save a small amount of time by using a lower resolution, because OCR quality will not differ much between 150 and 300 dpi. But downscaling the images also takes time, possibly more than it saves. The largest factor in the processing time is the number of characters on the page.

viju...@gmail.com wrote on Wednesday, 18 August 2021 at 12:21:50 UTC+2:
> Hi,
>
> I am working on printed document of English language. I need to extract
> all text from that image, but with simple tesseract it is taking 4 sec. Is
> it possible to fine tune for only English alphabet and numbers? Please help.