Whitelist the digits (the tessedit_char_whitelist variable) so that the letter O is not confused with the digit 0.
You can also try --psm 13 if every image contains a single line of text.
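
A minimal sketch of how those two suggestions could be passed to the tesseract command line from Python. The function names, the file name field.png and the default whitelist are only illustrative assumptions, not something from the original question. Note that a digits-only whitelist fits a cropped numeric field; for a value like "NL007900000B01" the letters N, L and B would have to be added to the whitelist as well. Depending on the Tesseract version and engine mode the whitelist may be ignored, so check the output.

```python
import subprocess


def build_tesseract_args(image_path, whitelist="0123456789", psm=13):
    """Build the argument list for the tesseract CLI (hypothetical helper).

    --psm 13 treats the image as a single raw text line;
    -c tessedit_char_whitelist=... restricts the characters that may be output.
    """
    return [
        "tesseract", image_path, "stdout",
        "--psm", str(psm),
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


def run_ocr(image_path, **kwargs):
    """Run tesseract and return the recognized text (needs tesseract on PATH)."""
    result = subprocess.run(
        build_tesseract_args(image_path, **kwargs),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

For example, run_ocr("field.png") would use the digits-only defaults, while run_ocr("vat.png", whitelist="0123456789NLB") would also allow the letters of a Dutch VAT number.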

On Thu, Sep 21, 2023, 4:07 PM A Nederpelt <powern...@gmail.com> wrote:

> Hi.
> I am trying to use the Tesseract engine instead of the Nuance engine.
> When I run tesseract.exe on the image, it returns a few strange
> characters.
> Twice it gives the letters OO instead of the digits 00:
>   "uw BTW nummer:: NLOO7900000B01"
> instead of
>   "uw BTW nummer:: NL007900000B01"
> and
> "Tel £01"
> instead of
> "Tel : 01"
> but "Tel : 0168-452452" is recognized ok.
>
> I don't see anything I could improve using
> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md
> because these are really clean documents.
>
> Am I missing some parameters? Like a second run, or a more accurate run, etc.
> Or should I compile tesseract.exe myself with different, higher-quality parameters?
>
> Thanks,
> Alwin
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/6f5f957e-4f33-419f-aba6-2e8a3f6f8d92n%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/6f5f957e-4f33-419f-aba6-2e8a3f6f8d92n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

