Telling a zero from a capital O is hard even for the human eye, and some fonts make it harder still. You can try the method described here: https://pyimagesearch.com/2021/09/06/whitelisting-and-blacklisting-characters-with-tesseract-and-python/ and see if that helps.

On Friday, September 22, 2023 at 9:43:51 AM UTC+3 powe...@gmail.com wrote:
> I found the parameters:
> "C:\Program Files\Tesseract-OCR\tesseract.exe" "..\Lambregts0001 - cleaned.jpg" "Lambregts0001 - cleaned.txt" -c tessedit_char_whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 :@."
> It is not working. I still get "uw BTW nummer:: NLOO7900000B01"
>
> Any other ideas?
>
> On Thursday, September 21, 2023 at 10:25:12 PM UTC+2, elvi...@gmail.com wrote:
>
>> Whitelist the digits so that the O will not confuse it.
>> You can also try --psm 13 if all of your texts are single lines.
>>
>> On Thu, Sep 21, 2023, 4:07 PM A Nederpelt <powe...@gmail.com> wrote:
>>
>>> Hi.
>>> I am trying to use the Tesseract engine instead of the Nuance engine.
>>> When I currently run tesseract.exe on the image, it returns a few
>>> strange characters.
>>> 2x OO instead of 00
>>> "uw BTW nummer:: NLOO7900000B01"
>>> instead of
>>> "uw BTW nummer:: NL007900000B01"
>>> and
>>> "Tel £01"
>>> instead of
>>> "Tel : 01"
>>> but "Tel : 0168-452452" is recognized OK.
>>>
>>> I see no improvement from
>>> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md
>>> because these are really clean documents.
>>>
>>> Am I missing some parameters? Like a second run, or a more accurate run,
>>> etc.
>>> Maybe compile tesseract.exe myself with different, higher-quality
>>> parameters?
>>>
>>> Thanks,
>>> Alwin
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-oc...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/6f5f957e-4f33-419f-aba6-2e8a3f6f8d92n%40googlegroups.com.
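The two suggestions made in this thread (a digit whitelist and --psm 13) can be combined in a single call. Below is a minimal Python sketch of how such a command line might be assembled. It assumes the number has already been cropped to its own single-line image, since a digits-only whitelist would discard the letters on a full page. The helper name build_tesseract_command and the file names vat_field.png / vat_field are hypothetical and only for illustration; the -c tessedit_char_whitelist=... and --psm options are the ones already mentioned above.

```python
import subprocess
from typing import List, Optional


def build_tesseract_command(
    image_path: str,
    output_base: str,
    whitelist: Optional[str] = None,
    psm: Optional[int] = None,
    exe: str = "tesseract",
) -> List[str]:
    """Assemble a tesseract CLI call as an argument list (hypothetical helper).

    Passing a list to subprocess avoids shell quoting problems with
    paths that contain spaces, such as "C:\\Program Files\\...".
    """
    cmd = [exe, image_path, output_base]
    if psm is not None:
        # Page segmentation mode, e.g. 13 as suggested in this thread.
        cmd += ["--psm", str(psm)]
    if whitelist is not None:
        # Restrict recognition to the given characters.
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return cmd


def run_tesseract(cmd: List[str]) -> None:
    """Run the assembled command; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names: a crop containing only the digits of the number.
    cmd = build_tesseract_command(
        "vat_field.png", "vat_field", whitelist="0123456789", psm=13
    )
    print(cmd)  # inspect the arguments before actually running them
```

Calling run_tesseract(cmd) would then execute it, provided tesseract is on the PATH (or exe is set to the full path of tesseract.exe). Whether the whitelist is honored may depend on the Tesseract version and OCR engine mode in use, so treat this as something to try rather than a guaranteed fix.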