I'm having a similar issue with a font that I've trained for numbers and a 
few symbols only (I've attached a sample of the numbers). In my case, 
Tesseract is detecting 2s as 8s.

I tried applying a Gaussian blur, and it appears to help. The results also 
seem to change depending on how much blur I apply. Do you know why this is?
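
One plausible explanation, offered as an assumption rather than something 
confirmed in this thread: if the blurred image ends up being binarized 
before recognition, the amount of blur decides whether narrow gaps between 
strokes stay open or fill in, which changes the character shape. The toy 
sketch below uses a single hand-made pixel row and a fixed threshold of 128 
(both hypothetical, pure Python, no Tesseract involved) to show the effect:

```python
import math

WHITE, BLACK = 255, 0

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights covering roughly +/- 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_row(row, sigma):
    """Gaussian-blur one pixel row, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(row) - 1
    blurred = []
    for x in range(len(row)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = min(max(x + k - radius, 0), last)
            acc += w * row[idx]
        blurred.append(acc)
    return blurred

def count_ink_runs(row, threshold=128):
    """Binarize with a fixed threshold and count separate dark runs."""
    runs = 0
    prev_ink = False
    for value in row:
        ink = value < threshold
        if ink and not prev_ink:
            runs += 1
        prev_ink = ink
    return runs

def ink_runs_after_blur(row, sigma):
    return count_ink_runs(blur_row(row, sigma))

# Two 5-pixel strokes separated by a 2-pixel gap (made-up test data).
ROW = [WHITE] * 6 + [BLACK] * 5 + [WHITE] * 2 + [BLACK] * 5 + [WHITE] * 6

for sigma in (0.5, 1.0, 2.0):
    print(sigma, ink_runs_after_blur(ROW, sigma))
# 0.5 -> 2 runs, 1.0 -> 2 runs, 2.0 -> 1 run (the gap has filled in)
```

With light blur the two strokes stay separate; with heavier blur they merge 
into one. A real glyph is two-dimensional and the actual threshold may be 
chosen differently, so treat this only as an illustration of the mechanism.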

Do you know if it would also help to blur the images when training Tesseract?

Thanks!
Andy

On Tuesday, February 24, 2015 at 10:20:48 AM UTC-6, Dmitri Silaev wrote:
>
> You need upscaling, then a bit of blurring and it should work.
>
> For upscaling, I personally tried Lanczos with a factor of 3x. This 
> eliminates most of the "8 vs. 3" errors. Don't forget that your source TIFF is 
> BW (2 colors), so you have to save the upscaled result as, e.g., a 24-bit PNG.
>
> For blurring, I used FastStone Image Viewer's Blur with a parameter of 
> 14. If you want to use ImageMagick instead, I don't know exactly how that 
> parameter relates to its Gaussian blur sigma, so you will have to experiment.
>
> After that, a standard Tesseract command line works well. At least, there 
> are no more "8 vs. 3" errors.
>
> Best regards,
> Dmitri Silaev
> www.CustomOCR.com
>
>
>
> On Tue, Feb 24, 2015 at 6:31 PM, Federico C. <[email protected]> wrote:
>
>> Hi, I'm having a problem with recognition of an invoice image: Tesseract 
>> is reading most of the 8 characters as 3s.
>>
>> Attached is the image I'm using.
>>
>> I have tried different PSM values and some basic configuration options 
>> (resolution, avoiding loading dawgs).
>>
>> Any help is appreciated.
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/ad762df6-4617-4184-b5c5-aedf1ec9b92c%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
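
The upscale-then-blur recipe quoted above could be scripted roughly as 
follows. This is a sketch under assumptions: it uses ImageMagick's `convert` 
instead of FastStone, the blur sigma of 1.5 is only a placeholder to 
experiment with (the thread gives no ImageMagick value), and the file names 
are hypothetical. The helper functions only build the command lines; 
`run_pipeline` executes them if both tools are installed.

```python
import shutil
import subprocess

def upscale_and_blur_cmd(src, dst, scale_percent=300, blur_sigma=1.5):
    """ImageMagick arguments: Lanczos upscale, Gaussian blur, 24-bit PNG output.

    scale_percent=300 matches the 3x factor from the thread;
    blur_sigma is an assumed starting point, not a recommended value.
    """
    return [
        "convert", src,
        "-filter", "Lanczos",
        "-resize", f"{scale_percent}%",
        "-gaussian-blur", f"0x{blur_sigma}",  # radius 0 lets ImageMagick pick it
        f"PNG24:{dst}",                       # force 24-bit RGB, not bilevel
    ]

def tesseract_cmd(image, output_base):
    """Plain Tesseract invocation; the text is written to output_base + '.txt'."""
    return ["tesseract", image, output_base]

def run_pipeline(src, work_png="upscaled_blurred.png", output_base="result"):
    """Run both steps in order and return the path of the text output."""
    commands = [
        upscale_and_blur_cmd(src, work_png),
        tesseract_cmd(work_png, output_base),
    ]
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)
    return output_base + ".txt"
```

Calling `run_pipeline("invoice.tif")` would then leave the recognized text 
in `result.txt`. Varying `blur_sigma` over a few values and comparing the 
output is probably the quickest way to find a setting that suits a given font.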

