[tesseract-ocr] Re: tesseract returns random and spurious characters

Terry Hardie Wed, 15 Mar 2023 10:28:27 -0700

I'm having the same issue, although I see it when interfacing with tesseract 
programmatically. If I take the same image (it's a PERFECT source, coming 
from a machine-generated PDF->PNG conversion) and run it through tesseract on 
the command line, the spurious equals sign does not show up.
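
To pin down exactly which characters differ between two outputs, a 
character-level diff may help. This is only a rough sketch, assuming Python 
and its standard difflib module. The function name spurious_fragments is 
hypothetical, and the two strings are the "Actual" and "Returned" lines from 
the original post quoted below, not output from my own image.

```python
from difflib import SequenceMatcher

# Sample lines copied from the original post (trailing spaces stripped).
ACTUAL = "10/17  10/17, 0000 PAYMENT THANK YOU $64.79CR"
RETURNED = "10/17, 10/17, 0000 =PAYMENT THANK YOU $64.79CR"


def spurious_fragments(expected, ocr_text):
    """Return (tag, expected_fragment, ocr_fragment) for every differing span.

    Hypothetical helper: it only compares two strings and knows nothing
    about tesseract itself.
    """
    matcher = SequenceMatcher(None, expected, ocr_text, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((tag, expected[i1:i2], ocr_text[j1:j2]))
    return diffs


if __name__ == "__main__":
    for tag, want, got in spurious_fragments(ACTUAL, RETURNED):
        print(f"{tag}: expected {want!r}, OCR returned {got!r}")
```

Running the same comparison on the command-line output and the programmatic 
output for one image should show whether the two code paths really disagree, 
and at which position.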


Did you manage to find a solution and just not update this thread?

Thanks!

On Tuesday, June 21, 2022 at 10:25:33 AM UTC-7 Z. Jay wrote:

> We have been using a competing OCR tool and are now evaluating a switch to 
> tesseract. However, when converting a PNG, tesseract randomly, albeit 
> rarely, returns characters where there is only white space. For example, 
> tesseract will return a comma or equals sign where there is only white 
> space. Scrutinizing the PNG, I do not see anything, such as dirt or a speck, 
> that looks like anything other than white space. While this is rare and 
> random, it happens often enough to be a problem. Note that this does not 
> occur with our current OCR tool. I suspect someone has encountered this 
> issue before and already posted the solution somewhere on this list or 
> elsewhere.
>
> For reference, here is a comparison of the actual text and the text 
> returned by tesseract:
> Actual:
>    10/17  10/17, 0000 PAYMENT THANK YOU $64.79CR  
>
> Returned:
>    10/17, 10/17, 0000 =PAYMENT THANK YOU $64.79CR  
>
> Any pointers appreciated.
>
> Thanks,
>
> --zj
>

