Whitelist the digits so that the letter O is not confused with the digit 0. You can also try --psm 13 if all of your text is single-line.
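A minimal sketch of what that call could look like, wrapped in Python so the arguments are easy to tweak. It assumes the `tesseract` executable is on your PATH, a version where `tessedit_char_whitelist` takes effect with your engine mode, and an image cropped to a region that holds only digits (a digits-only whitelist would also drop letters such as the "NL" and "B" in your BTW number). The function names and the file name in the comment are hypothetical.

```python
import subprocess


def build_tesseract_args(image_path, whitelist="0123456789", psm=13):
    """Build the argument list for a tesseract CLI call (nothing is executed here)."""
    return [
        "tesseract",
        image_path,
        "stdout",            # print recognized text instead of writing an output file
        "--psm", str(psm),   # 13 = raw line: treat the image as a single text line
        "-c", f"tessedit_char_whitelist={whitelist}",  # restrict recognized characters
    ]


def run_ocr(image_path, whitelist="0123456789", psm=13):
    """Run tesseract with the arguments above and return the recognized text."""
    args = build_tesseract_args(image_path, whitelist, psm)
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Hypothetical usage, with a crop that contains only the phone number digits:
# print(run_ocr("tel_number_crop.png"))
```

If the whole line has to be read in one pass, you can pass a wider whitelist (for example digits plus the capital letters you expect), but then O and 0 can be confused again.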

On Thu, Sep 21, 2023, 4:07 PM A Nederpelt <powern...@gmail.com> wrote:

> Hi.
> I am trying to use the tesseract engine instead of the nuance engine.
> When I currently use tesseract.exe on the image, it returns a few strange
> characters.
> 2x OO instead of 00
> "uw BTW nummer:: NLOO7900000B01"
> instead of
> "uw BTW nummer:: NL007900000B01"
> and
> "Tel £01"
> instead of
> "Tel : 01"
> but "Tel : 0168-452452" is recognized ok.
>
> I see no optimization using
> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md
> because they are really clean documents.
>
> Am I missing some parameters? Like a second run, or a more accurate run, etc.
> Maybe compile tesseract.exe myself with different, higher-quality parameters?
>
> Thanks,
> Alwin