I have a 1920x1080 screen and a script that takes a screenshot every so often (usually every 30 seconds). I run tesseract on those screenshots to make them searchable, so I can go back in time and find something I recall seeing.

This works well and has given me much-appreciated certainty many times. It is perhaps a little CPU/power hungry though: most of the time it's the only thing that pegs the CPU. So today I changed it to run the OCR only when the battery is full and the laptop is on AC power.

Then I got to thinking. Tesseract takes about 4 seconds to process one screenshot, or about 13% of my whole CPU at one screenshot every 30 seconds. That's only okay for web browsers, right? :-p Is there a way to speed that up?

So I read https://tesseract-ocr.github.io/tessdoc/FAQ.html#can-i-increase-speed-of-ocr and tried "tessedit_do_invert=0", which wrecked the output: completely unusable, garbled text. I've been specifying dpi 96 all this time; maybe dpi could affect performance? I tried "OMP_THREAD_LIMIT=1" as well, but limits of 1, 2, and 4 all performed the same.

My cheap laptop has an i5-1235U CPU, so 2 performance cores and 8 efficiency cores. I have no idea how to tell tesseract to use only the performance cores, but maybe the e-cores slow it down.

I also wonder if there are parts of tesseract that I can shut off to reduce CPU usage, knowing that my input is "perfect" text. That is, it will never be tilted or rotated 90 or 180 degrees, I only want to recognise English, and it is guaranteed never to have the defects common to printed/scanned paper images. Tesseract could maybe be 'lazier' and still do a good job in this case.

Any suggestions or feedback? Maybe I should be trying to text-scrape via X11 or GTK somehow? But I do often use IPMI KVM-over-IP consoles or remote terminals, where my local PC wouldn't have the text in a buffer, but it should still be exceptionally clean text.
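For anyone who wants to reproduce the "only when the battery is full and on AC power" gate mentioned above, here is a minimal sketch. It is not my actual script. It assumes a Linux machine that exposes the usual /sys/class/power_supply entries, and the function names (read_field, should_run_ocr) are made up for illustration. Some chargers report a type other than "Mains" and some batteries with charge thresholds never report "Full", so treat those string checks as assumptions to adjust.

```python
from pathlib import Path


def read_field(device: Path, name: str) -> str:
    """Return a stripped sysfs attribute, or "" if it cannot be read."""
    try:
        return (device / name).read_text().strip()
    except OSError:
        return ""


def should_run_ocr(root: Path = Path("/sys/class/power_supply")) -> bool:
    """True only when a mains supply is online and no battery is below Full.

    Assumption: mains adapters report type "Mains" and a charged battery
    reports status "Full". A machine with no battery counts as full.
    """
    if not root.is_dir():
        return False
    on_ac = False
    batteries_full = True
    for device in root.iterdir():
        kind = read_field(device, "type")
        if kind == "Mains" and read_field(device, "online") == "1":
            on_ac = True
        elif kind == "Battery" and read_field(device, "status") != "Full":
            batteries_full = False
    return on_ac and batteries_full
```

The screenshot loop can call should_run_ocr() before each OCR pass and simply leave the image on disk for later when it returns False.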
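To make the settings discussed above concrete, here is a rough sketch of how the tesseract call could be wrapped so that the thread limit and the CPU pinning are easy to vary and time. The helper names are hypothetical. The CPU list "0-3" is an assumption that Linux numbers the two hyperthreaded performance cores of the i5-1235U as logical CPUs 0 to 3; check with `lscpu --extended` before trusting it. Whether pinning actually helps is untested.

```python
import os
import subprocess
from pathlib import Path


def build_ocr_command(image: Path, out_base: Path,
                      cpu_list: str = "0-3", psm: int = 3) -> list[str]:
    """Build a tesseract command line pinned to the given CPUs via taskset.

    Assumption: cpu_list "0-3" covers the performance cores on this machine.
    --psm 3 is tesseract's default (automatic segmentation, no orientation
    or script detection); it is spelled out here only so it is easy to change.
    """
    return [
        "taskset", "-c", cpu_list,
        "tesseract", str(image), str(out_base),
        "-l", "eng",       # English only
        "--dpi", "96",     # same dpi the original script passes
        "--psm", str(psm),
    ]


def build_ocr_env(threads: int = 1) -> dict[str, str]:
    """Copy the current environment and cap tesseract's OpenMP threads."""
    env = dict(os.environ)
    env["OMP_THREAD_LIMIT"] = str(threads)
    return env


def run_ocr(image: Path, out_base: Path) -> None:
    """Run tesseract; the recognised text lands in <out_base>.txt."""
    subprocess.run(build_ocr_command(image, out_base),
                   env=build_ocr_env(), check=True)
```

Calling run_ocr(Path("shot.png"), Path("shot")) under `time`, once with cpu_list="0-3" and once with a list covering only the efficiency cores, should show whether the e-cores are what slows things down.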