I'm having the same issue, although I see it when calling tesseract programmatically. If I take the same image (a perfect source: a PNG rendered from a machine-generated PDF) and run it through tesseract on the command line, the equals sign does not show up.
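A minimal sketch of one way to narrow down a difference like this, assuming the `tesseract` binary is on the PATH: run the CLI with the page segmentation mode and engine mode set explicitly, then diff its text token by token against whatever the programmatic call returned. The helper names `run_cli` and `token_diff` are made up for illustration, and the defaults shown are tesseract's documented defaults, not settings known to fix the problem.

```python
import difflib
import subprocess


def run_cli(image_path, psm=3, oem=3):
    """Run the tesseract CLI with explicit settings and return the recognized text.

    psm=3 and oem=3 are tesseract's defaults; pass whatever values the
    programmatic wrapper uses so both code paths run with the same settings.
    """
    cmd = ["tesseract", image_path, "stdout", "--psm", str(psm), "--oem", str(oem)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def token_diff(expected, actual):
    """Return (expected_tokens, actual_tokens) pairs for every span that differs.

    Both texts are split on whitespace, so a stray character fused to a word
    (for example "=PAYMENT") shows up as a replaced token.
    """
    a, b = expected.split(), actual.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Calling `token_diff(run_cli("page.png"), text_from_wrapper)` (both names hypothetical) would list only the tokens where the two outputs disagree. If the outputs still differ with identical `--psm` and `--oem` values, the wrapper is probably handing tesseract a different image (resolution, color mode, or preprocessing) than the CLI reads from disk.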
I hope you managed to find a solution and just haven't updated this thread?

Thanks!

On Tuesday, June 21, 2022 at 10:25:33 AM UTC-7 Z. Jay wrote:

> We have been using a competing OCR tool and are now evaluating a switch to
> tesseract. However, when converting a PNG, tesseract randomly, albeit
> rarely, returns characters where there is only white space. For example,
> tesseract will return a comma or equal sign where there is only white
> space. Scrutinizing the PNG, I do not see anything such as dirt or a speck
> that looks like anything other than white space. While this is rare and
> random, it happens often enough to be a problem. Note that this does not
> occur when using our current OCR tool. I suspect someone has encountered
> this issue before and already posted the solution somewhere on this list or
> elsewhere.
>
> For reference, here is a comparison of the actual text and the text
> returned by tesseract:
>
> Actual:
> 10/17 10/17, 0000 PAYMENT THANK YOU $64.79CR
>
> Returned:
> 10/17, 10/17, 0000 =PAYMENT THANK YOU $64.79CR
>
> Any pointers appreciated.
>
> Thanks,
>
> --zj