Apparently, version 4 doesn't support whitelisting:
https://groups.google.com/g/tesseract-ocr/c/IBbQIQpdSpE
That is not good.

On Friday, September 22, 2023 at 2:23:39 PM UTC+3 Des Bw wrote:
> The difference between zero and O is deeply problematic, even for the human
> eye. Some fonts make it even harder.
> You can try the method used here:
> https://pyimagesearch.com/2021/09/06/whitelisting-and-blacklisting-characters-with-tesseract-and-python/
> if that helps.
>
> On Friday, September 22, 2023 at 9:43:51 AM UTC+3 [email protected] wrote:
>
>> I found the parameters:
>> "C:\Program Files\Tesseract-OCR\tesseract.exe" "..\Lambregts0001 - cleaned.jpg" "Lambregts0001 - cleaned.txt" -c tessedit_char_whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 :@."
>> It is not working. I still get "uw BTW nummer:: NLOO7900000B01".
>>
>> Any other ideas?
>>
>> On Thursday, September 21, 2023 at 22:25:12 UTC+2 [email protected] wrote:
>>
>>> Whitelist the digits so that the O will not confuse it.
>>> You can also try --psm 13 if all of your texts are single line.
>>>
>>> On Thu, Sep 21, 2023, 4:07 PM A Nederpelt <[email protected]> wrote:
>>>
>>>> Hi.
>>>> I am trying to use the Tesseract engine instead of the Nuance engine.
>>>> When I run tesseract.exe on the image, it returns a few strange
>>>> characters.
>>>> 2x OO instead of 00:
>>>> "uw BTW nummer:: NLOO7900000B01"
>>>> instead of
>>>> "uw BTW nummer:: NL007900000B01"
>>>> and
>>>> "Tel £01"
>>>> instead of
>>>> "Tel : 01"
>>>> but "Tel : 0168-452452" is recognized OK.
>>>>
>>>> I see no improvement from the suggestions in
>>>> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md
>>>> because these are really clean documents.
>>>>
>>>> Am I missing some parameters, like a second pass or a more accurate
>>>> mode? Or should I compile tesseract.exe myself with different,
>>>> higher-quality settings?
>>>>
>>>> Thanks,
>>>> Alwin
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/6f5f957e-4f33-419f-aba6-2e8a3f6f8d92n%40googlegroups.com
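
For what it's worth, here is a rough Python sketch of the two ideas from the thread (digit whitelist plus --psm 13), together with a post-processing fallback for when the whitelist is ignored. Note that a whitelist containing both the letter O and the digit 0, like the one in the command above, cannot by itself force one over the other. Assumptions: the helper names build_command, fix_btw_number and run_ocr are made up for illustration, tesseract is on the PATH, the image is a cropped single-line field, and a Dutch BTW number has the shape "NL", nine digits, "B", two digits. Only the --psm and -c tessedit_char_whitelist options are taken from the thread.

```python
import re
import subprocess

# Map letters that OCR commonly confuses with zero onto the digit.
DIGIT_LOOKALIKES = str.maketrans("Oo", "00")

# Assumed BTW shape: "NL" + 9 digits + "B" + 2 digits.
# O/o is tolerated wherever a digit is expected so misreads still match.
BTW_PATTERN = re.compile(r"\bNL([0-9Oo]{9})B([0-9Oo]{2})\b")


def build_command(image_path, whitelist, psm=13, exe="tesseract"):
    """Build the argument list for one OCR run that prints text to stdout."""
    return [
        exe, image_path, "stdout",
        "--psm", str(psm),                          # 13 = raw single line
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


def fix_btw_number(text):
    """Replace O/o with 0 inside the digit positions of a BTW number only."""
    def repair(match):
        head = match.group(1).translate(DIGIT_LOOKALIKES)
        tail = match.group(2).translate(DIGIT_LOOKALIKES)
        return f"NL{head}B{tail}"
    return BTW_PATTERN.sub(repair, text)


def run_ocr(image_path, whitelist="0123456789"):
    """Run tesseract (must be installed) and clean up any BTW number found."""
    result = subprocess.run(
        build_command(image_path, whitelist),
        capture_output=True, text=True, check=True,
    )
    return fix_btw_number(result.stdout)
```

The regex only touches text that already looks like a BTW number, so ordinary words containing the letter O are left alone. Whether the whitelist takes effect depends on the Tesseract version and engine in use, so the post-processing step is the part that does not rely on it.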

