You need upscaling followed by a bit of blurring, and then it should work. For upscaling, I tried Lanczos with a factor of 3x, which eliminates most of the "8 vs. 3" errors. Don't forget that your source TIFF is black and white (2 colors), so the upscaled result has to be saved in a format that can hold the interpolated gray levels, e.g. as a 24-bit PNG.
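As a rough illustration of the upscale-then-blur preprocessing, here is a minimal sketch that builds an ImageMagick command line from Python. It is not the exact procedure used above (that was done by hand in other tools). The file names, the helper names, and the sigma value of 1.5 are assumptions made for illustration; the sigma in particular is only a starting point to experiment with. It also assumes the ImageMagick 6 `convert` binary is on the PATH (ImageMagick 7 installs it as `magick`).

```python
import subprocess


def build_preprocess_cmd(src, dst, scale_percent=300, sigma=1.5, binary="convert"):
    """Build an ImageMagick argument list: Lanczos upscale, then Gaussian blur.

    scale_percent=300 corresponds to the 3x factor mentioned above.
    sigma is a placeholder value (assumption); tune it on your own images.
    """
    return [
        binary, src,
        "-filter", "Lanczos",              # resampling filter, must precede -resize
        "-resize", f"{scale_percent}%",    # 300% = 3x upscale
        "-gaussian-blur", f"0x{sigma}",    # radius 0 lets ImageMagick pick it from sigma
        f"PNG24:{dst}",                    # force 24-bit RGB PNG output
    ]


def preprocess(src, dst, **kwargs):
    """Run the command; raises CalledProcessError if ImageMagick fails."""
    subprocess.run(build_preprocess_cmd(src, dst, **kwargs), check=True)


# Hypothetical usage (file names are placeholders):
# preprocess("invoice.tif", "invoice_3x.png", sigma=1.5)
```

The resulting PNG can then be passed to Tesseract in the usual way. Keeping the command construction separate from its execution makes it easy to print the arguments and try several sigma values.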
For blurring, I used FastStone Image Viewer's Blur with a parameter of 14. If you want to use ImageMagick instead, I don't know exactly how that value relates to its Gaussian blur sigma, so you will have to experiment. After that, a standard Tesseract command line works well; at least there are no more "8 vs. 3" errors.

Best regards,
Dmitri Silaev
www.CustomOCR.com

On Tue, Feb 24, 2015 at 6:31 PM, Federico C. <[email protected]> wrote:
> Hi,
>
> I'm having a problem with recognition of an invoice image: the
> recognition is reading most of the 8 characters as 3s.
>
> Attached is the image I'm using.
>
> I have tried different PSM values and some basic configuration options
> (resolution, avoid loading dawgs).
>
> Any help is appreciated.

