I have a 1920x1080 screen and a script that takes a screenshot of it every
so often (usually every 30 seconds). I run tesseract on those screenshots
to make them searchable, so I can go back in time and find something that
I think I recall seeing.

This works well, and has given me much-appreciated certainty many times.
It is perhaps a little CPU/power hungry though; most of the time it's the
only thing that pegs the CPU. So today I changed it to run the OCR only
when the battery is full and the laptop is on AC power.
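
For illustration, the "battery full and on AC" check could be sketched in
Python like this. It is only a sketch, not my actual script. It assumes a
Linux laptop that exposes its power supplies under /sys/class/power_supply,
that the charger reports type "Mains" (some USB-C chargers report "USB"
instead), and that a full battery reports status "Full". The function name
on_ac_and_full is made up for this example.

```python
from pathlib import Path


def _read(path):
    """Return the stripped contents of a sysfs file, or "" if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return ""


def on_ac_and_full(root="/sys/class/power_supply"):
    """True if a mains supply is online and every battery reports "Full"."""
    base = Path(root)
    if not base.is_dir():
        return False
    ac_online = False
    battery_full = []
    for supply in base.iterdir():
        kind = _read(supply / "type")
        if kind == "Mains":
            # Assumption: the charger shows up with type "Mains".
            ac_online = ac_online or _read(supply / "online") == "1"
        elif kind == "Battery":
            battery_full.append(_read(supply / "status") == "Full")
    # Require at least one battery, and all batteries full.
    return ac_online and bool(battery_full) and all(battery_full)
```

The screenshot loop would then queue images as usual and only call
tesseract on the backlog while on_ac_and_full() returns True.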

Then I got to thinking.  Tesseract takes about 4 seconds to process one
screenshot.  At one screenshot every 30 seconds, that's about 13% (4/30)
of my whole CPU.  That's only okay for web browsers, right? :-p

Is there a way to speed that up?  I read
https://tesseract-ocr.github.io/tessdoc/FAQ.html#can-i-increase-speed-of-ocr
and tried "tessedit_do_invert=0", but it wrecked the output: completely
unusable, garbled text.

I've been specifying a DPI of 96 all this time.  Could the DPI setting
affect performance?

I tried "OMP_THREAD_LIMIT=1" as well, but 1, 2, and 4 all performed the
same.  My cheap laptop has an "i5-1235U" CPU, so 2 performance cores and 8
efficiency cores.  I have no idea how to tell tesseract to use only the
performance cores, but maybe the E-cores slow it down.
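
To make the thread-limit comparison repeatable, and to try restricting a
run to particular cores, a small timing helper along these lines might
help. This is a sketch under assumptions: it is Linux-only
(os.sched_setaffinity), the helper name timed_run is made up, and the guess
that the two P-cores appear as logical CPUs 0-3 is unverified and should be
checked with `lscpu --extended` first.

```python
import os
import subprocess
import time

# Assumption: the two hyper-threaded P-cores are logical CPUs 0-3.
# Check the real layout with `lscpu --extended` before relying on this.
P_CORE_CPUS = {0, 1, 2, 3}


def timed_run(cmd, thread_limit, cpus=None):
    """Run cmd with OMP_THREAD_LIMIT set, optionally pinned to `cpus`.

    Returns (elapsed_seconds, stdout_text).
    """
    env = dict(os.environ, OMP_THREAD_LIMIT=str(thread_limit))

    def pin():
        # Runs in the child process just before exec (Linux only).
        os.sched_setaffinity(0, cpus)

    start = time.perf_counter()
    result = subprocess.run(
        cmd,
        env=env,
        preexec_fn=pin if cpus else None,
        capture_output=True,
        text=True,
        check=True,
    )
    return time.perf_counter() - start, result.stdout
```

For example, timed_run(["tesseract", "shot.png", "shot", "--dpi", "96"],
1, P_CORE_CPUS) would time one single-threaded run pinned to the assumed
P-cores, where shot.png stands in for a real screenshot.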

I also wonder if there are some parts of tesseract that I can shut off to
reduce CPU usage, knowing that my input is "perfect" text, i.e. it will
never be tilted or rotated 90 or 180 degrees.  I only want to recognise
English.  And it is guaranteed never to have the defects common to
printed/scanned paper images.  Tesseract could maybe be 'lazier' and still
do a good job in this case.
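
As a concrete starting point for experiments, the command line can be built
from a few knobs that the tesseract CLI does expose: the language (-l), the
DPI (--dpi), the page segmentation mode (--psm), and config variables (-c).
The helper below is a hypothetical sketch, and the file names are
placeholders. Note that the default mode (--psm 3) already skips
orientation and script detection. Whether a simpler mode such as --psm 6
(assume a single uniform block of text) is faster or still accurate on
screenshots is untested here.

```python
def build_tesseract_cmd(image, out_base, dpi=96, psm=3, lang="eng",
                        variables=None):
    """Build a tesseract argument list; nothing is executed here."""
    cmd = [
        "tesseract", str(image), str(out_base),
        "-l", lang,            # English only
        "--dpi", str(dpi),     # skip DPI guessing
        "--psm", str(psm),     # 3 is tesseract's default mode
    ]
    # Optional config variables, passed as repeated "-c name=value".
    for name, value in (variables or {}).items():
        cmd += ["-c", f"{name}={value}"]
    return cmd
```

The resulting list can be handed to subprocess.run, so each variation
(psm 3 versus 6, different dpi values) can be timed on the same screenshot.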

Any suggestions or feedback?  Maybe I should be trying to scrape the text
via X11 or GTK somehow?  But I often use IPMI KVM-over-IP consoles or
remote terminals, where my local PC wouldn't have the text in a buffer,
though it should still be exceptionally clean text.
