The difference between zero and the letter O is hard to see, even for the 
human eye, and some fonts make it harder still. 
You can try the method described here: 
https://pyimagesearch.com/2021/09/06/whitelisting-and-blacklisting-characters-with-tesseract-and-python/
if that helps. 
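
In case it is useful, here is a rough Python sketch of the whitelisting idea. 
It is an illustration under assumptions, not a tested fix for your scans. 
Note that the whitelist in the command quoted below still contains the 
letter O, so Tesseract remains free to pick it. The sketch assumes you run 
it on a crop that holds only digits on a single line (a digits-only 
whitelist would also drop the "NL" and "B" of the BTW number). The names 
build_tesseract_args, ocr_digits and digits_only_crop.png are made up for 
the example, and it assumes a Tesseract build that honours 
tessedit_char_whitelist.

```python
import os
import shutil
import subprocess

# Digits only: the letter O is deliberately absent.
DIGITS = "0123456789"


def build_tesseract_args(image_path, whitelist=DIGITS, psm=13, exe="tesseract"):
    """Build the argument list for one Tesseract command-line run.

    Using "stdout" as the output base makes Tesseract print the recognized
    text instead of writing a .txt file.
    """
    return [
        exe,
        image_path,
        "stdout",
        "--psm", str(psm),  # 13 = treat the image as a single raw text line
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


def ocr_digits(image_path):
    """Run Tesseract on image_path and return the recognized text."""
    args = build_tesseract_args(image_path)
    # Passing a list (no shell) avoids quoting problems with spaces in paths.
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical file name: a crop containing only the digit part.
    sample = "digits_only_crop.png"
    if shutil.which("tesseract") and os.path.exists(sample):
        print(ocr_digits(sample))
```

Passing the arguments as a list also means a whitelist that contains a 
space or punctuation does not need extra quoting.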
On Friday, September 22, 2023 at 9:43:51 AM UTC+3 powe...@gmail.com wrote:

> I found the parameters:
> "C:\Program Files\Tesseract-OCR\tesseract.exe" "..\Lambregts0001 - cleaned.jpg" "Lambregts0001 - cleaned.txt" -c tessedit_char_whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 :@."
> It is not working. The output is still "uw BTW nummer:: NLOO7900000B01".
>
> Any other ideas?
>
> On Thursday, September 21, 2023 at 22:25:12 UTC+2 elvi...@gmail.com wrote:
>
>> Whitelist the digits so that the O will not confuse it. 
>>
>> You can also try --psm 13 if all of your text is on a single line.
>>
>
>> On Thu, Sep 21, 2023, 4:07 PM A Nederpelt <powe...@gmail.com> wrote:
>>
>>> Hi.
>>> I am trying to use the Tesseract engine instead of the Nuance engine.
>>> When I run tesseract.exe on the image, it returns a few strange 
>>> characters.
>>> OO (two letter O's) instead of 00:
>>>   "uw BTW nummer:: NLOO7900000B01"
>>> instead of
>>>   "uw BTW nummer:: NL007900000B01"
>>> and
>>> "Tel £01"
>>> instead of
>>> "Tel : 01"
>>> but "Tel : 0168-452452" is recognized ok.
>>>
>>> I do not see anything to gain from 
>>> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md 
>>> because these are really clean documents.
>>>
>>> Am I missing some parameters? Like a second run, or a more accurate 
>>> run, etc.
>>> Maybe I should compile tesseract.exe myself with different, 
>>> higher-quality parameters?
>>>
>>> Thanks,
>>> Alwin
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to tesseract-oc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/tesseract-ocr/6f5f957e-4f33-419f-aba6-2e8a3f6f8d92n%40googlegroups.com
>>>
>>

