Apparently, version 4 doesn't support whitelisting:
https://groups.google.com/g/tesseract-ocr/c/IBbQIQpdSpE
That is not good.
On Friday, September 22, 2023 at 2:23:39 PM UTC+3 Des Bw wrote:

> The difference between zero and O is problematic even for the human 
> eye, and some fonts make it harder still.
> If it helps, you can try the method used here: 
> https://pyimagesearch.com/2021/09/06/whitelisting-and-blacklisting-characters-with-tesseract-and-python/
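Below is a minimal Python sketch of that article's approach. It assumes the `pytesseract` wrapper and Pillow are installed, and that your Tesseract build honours `tessedit_char_whitelist` (see the version caveat at the top of this thread). `build_config` and `ocr` are hypothetical helpers. Only `pytesseract.image_to_string(..., config=...)` and the `tessedit_char_whitelist` / `tessedit_char_blacklist` variables come from the real tools.

```python
from typing import Optional


def build_config(whitelist: Optional[str] = None,
                 blacklist: Optional[str] = None,
                 psm: Optional[int] = None) -> str:
    """Return a Tesseract config string (hypothetical helper).

    pytesseract splits `config` on whitespace, so the character lists
    must not contain spaces.
    """
    for chars in (whitelist, blacklist):
        if chars and any(c.isspace() for c in chars):
            raise ValueError("character lists must not contain whitespace")
    parts = []
    if psm is not None:
        parts.append(f"--psm {psm}")
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    if blacklist:
        parts.append(f"-c tessedit_char_blacklist={blacklist}")
    return " ".join(parts)


def ocr(path: str, **kwargs) -> str:
    # Imported lazily so build_config can be used without Tesseract installed.
    import pytesseract
    from PIL import Image
    with Image.open(path) as img:
        return pytesseract.image_to_string(img, config=build_config(**kwargs))
```

For example, `ocr("field.png", whitelist="0123456789", psm=13)` would read a hypothetical crop that contains only digits.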
> On Friday, September 22, 2023 at 9:43:51 AM UTC+3 [email protected] wrote:
>
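>> I found the parameters:
>> "C:\Program Files\Tesseract-OCR\tesseract.exe" "..\Lambregts0001 - cleaned.jpg" "Lambregts0001 - cleaned" -c tessedit_char_whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 :@."
>> (Tesseract appends .txt to the output base itself, so passing "Lambregts0001 - cleaned.txt" would produce "Lambregts0001 - cleaned.txt.txt".)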
>> It is not working; the output is still "uw BTW nummer:: NLOO7900000B01".
>>
>> Any other ideas?
>>
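The whitelist in that command still contains both the letter O and the digit 0. It therefore cannot stop Tesseract from choosing one over the other.

A different technique can help when the field layout is known: post-correcting the text with a format-aware pattern. The sketch below assumes the Dutch VAT (BTW) number format of NL, nine digits, B, then two digits. `fix_dutch_vat` is a hypothetical helper, not part of Tesseract.

```python
import re

# NL, then 9 digit positions, "B", then 2 digit positions. The letter O is
# accepted in the digit positions because that is the OCR confusion to fix.
VAT_PATTERN = re.compile(r"\bNL([0-9O]{9})B([0-9O]{2})\b")


def fix_dutch_vat(text: str) -> str:
    """Replace letter O with digit 0 inside anything shaped like an NL VAT number."""
    def repair(match: re.Match) -> str:
        digits = match.group(1).replace("O", "0")
        suffix = match.group(2).replace("O", "0")
        return f"NL{digits}B{suffix}"
    return VAT_PATTERN.sub(repair, text)


print(fix_dutch_vat("uw BTW nummer:: NLOO7900000B01"))
# → uw BTW nummer:: NL007900000B01
```

The pattern only touches strings with the full VAT shape, so ordinary words containing O are left alone.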
>> On Thursday, September 21, 2023 at 10:25:12 PM UTC+2 
>> [email protected] wrote:
>>
>>> Whitelist the digits so that the letter O does not confuse it.
>>>
>>> You can also try --psm 13 if all of your texts are single lines.
>>>
>>
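Both suggestions can be combined in one call. The sketch below assumes pytesseract and Pillow are installed, that your Tesseract build honours the whitelist, and that the VAT field's pixel box is known in advance. Cropping matters because `--psm 13` treats the whole image as one raw text line.

A digits-only whitelist would drop the N, L and B of the VAT number. The hypothetical `VAT_CHARS` therefore allows only those three letters plus digits, which excludes the letter O. `ocr_single_line` and the box coordinates you pass to it are hypothetical.

```python
from typing import Tuple

# Only the characters a Dutch VAT number can contain; the letter O is excluded.
VAT_CHARS = "BLN0123456789"
# --psm 13: treat the image as a single raw text line.
VAT_CONFIG = f"--psm 13 -c tessedit_char_whitelist={VAT_CHARS}"


def ocr_single_line(path: str, box: Tuple[int, int, int, int]) -> str:
    """OCR one cropped line; box is (left, upper, right, lower) in pixels."""
    # Imported lazily so the config above can be inspected without Tesseract.
    import pytesseract
    from PIL import Image
    with Image.open(path) as img:
        field = img.crop(box)
        return pytesseract.image_to_string(field, config=VAT_CONFIG).strip()
```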
>>> On Thu, Sep 21, 2023, 4:07 PM A Nederpelt <[email protected]> wrote:
>>>
>>>> Hi.
>>>> I am trying to use the Tesseract engine instead of the Nuance engine.
>>>> When I run tesseract.exe on the image, it returns a few strange 
>>>> characters, such as OO (two letter Os) instead of 00:
>>>>   "uw BTW nummer:: NLOO7900000B01"
>>>> instead of
>>>>   "uw BTW nummer:: NL007900000B01"
>>>> and
>>>> "Tel £01"
>>>> instead of
>>>> "Tel : 01"
>>>> while "Tel : 0168-452452" is recognized correctly.
>>>>
>>>> I see nothing to improve by following 
>>>> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md 
>>>> because these are really clean documents.
>>>>
>>>> Am I missing some parameters, such as a second pass or a more accurate 
>>>> mode? Or should I compile tesseract.exe myself with different, 
>>>> higher-quality parameters?
>>>>
>>>> Thanks,
>>>> Alwin
>>>>
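A wrapper can approximate a "second run" by OCRing the page with several page segmentation modes. It then keeps the result that Tesseract itself reports as most confident.

The sketch below assumes pytesseract and Pillow. `best_of_psms` and `mean_confidence` are hypothetical helpers. A higher mean confidence does not guarantee fewer O/0 errors.

```python
from typing import Dict, Iterable, List, Tuple


def mean_confidence(data: Dict[str, List]) -> float:
    """Average word confidence from pytesseract's image_to_data DICT output.

    Rows with confidence -1 (page/block/line rows) or empty text are skipped.
    """
    scores = [float(conf) for conf, text in zip(data["conf"], data["text"])
              if float(conf) >= 0 and str(text).strip()]
    return sum(scores) / len(scores) if scores else 0.0


def best_of_psms(path: str, psms: Iterable[int] = (3, 4, 6)) -> Tuple[int, str]:
    """OCR once per PSM; return (psm, text) for the highest mean confidence."""
    # Imported lazily so mean_confidence can be used without Tesseract installed.
    import pytesseract
    from PIL import Image
    best = (-1.0, 0, "")  # (score, psm, text); psm 0 means nothing was tried
    with Image.open(path) as img:
        for psm in psms:
            config = f"--psm {psm}"
            data = pytesseract.image_to_data(
                img, config=config, output_type=pytesseract.Output.DICT)
            score = mean_confidence(data)
            if score > best[0]:
                text = pytesseract.image_to_string(img, config=config)
                best = (score, psm, text)
    return best[1], best[2]
```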
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/6f5f957e-4f33-419f-aba6-2e8a3f6f8d92n%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/6f5f957e-4f33-419f-aba6-2e8a3f6f8d92n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>
