Re: Tesseract fails recognizing simple and isolated digits. How can I train tesseract for recognizing digits from unknown font type

Nick White Wed, 26 Mar 2014 11:54:31 -0700

Hi V.Lorz,

Firstly, it's Tesseract 3.02.02, not 3.2. We may release version 3.2 
someday, but not for a long time yet ;)


Doing training is not going to help you, I'm afraid. The font is 
quite standard, so you aren't going to be able to do a better job at 
training Tesseract for it than the eng.traineddata provides.

Out of curiosity, why did you think that training would help you 
here? I ask as it's a very common misconception, but (AFAIK) our 
documentation doesn't imply it anywhere.

You may just have to accept that the accuracy from Tesseract won't 
be 100%, I'm afraid. Maybe someone else here has suggestions, but 
the image looks alright to me, so the general advice of "more 
preprocessing" may not be helpful.

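For anyone reproducing the digits-only setup mentioned in the quoted
question, here is a minimal sketch of one way to drive the command-line
tool from Python. It assumes the `tesseract` binary is on the PATH, that
the stock `digits` config file shipped in tessdata/configs is installed,
and that the 3.x flag spelling `-psm` applies (newer releases spell it
`--psm`). The function names and the choice of page segmentation mode 7
(single text line) are illustrative assumptions, not something taken from
the original post.

```python
import subprocess


def build_digits_command(image_path, output_base, psm=7):
    """Build the argv list for a digits-only Tesseract 3.x run.

    The trailing "digits" argument names the stock config file that
    restricts the recognizer's character whitelist to digits.
    """
    return ["tesseract", image_path, output_base, "-psm", str(psm), "digits"]


def run_digits_ocr(image_path, output_base):
    """Run Tesseract and return the recognized text (hypothetical helper)."""
    subprocess.check_call(build_digits_command(image_path, output_base))
    # Tesseract writes its result to <output_base>.txt
    with open(output_base + ".txt") as handle:
        return handle.read().strip()
```

Restricting the character set this way only narrows the candidates; it
does not by itself stop a '6' from being read as a '5'.
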
Nick

On Wed, Mar 26, 2014 at 11:10:56AM -0700, V.Lorz wrote:
> Hi All,
> 
> I started integrating tesseract (version 3.2, EMGV) in a project for
> recognizing short texts in scanned images. Using some very simple image
> processing I extract the area of interest for speeding up the process.
> 
> The errors I get are related to recognition results: tesseract sometimes
> confuses the digits '6' and '5'. The image below is recognized as "4436695"
> instead of "4436696". I'm using the default eng.traineddata file bundled with
> the library. Using some other trained data files from around the Inet I got
> the same results with the same two digits (5 and 6). Before processing the
> image I configure tesseract to process only digits.
> 
> 
> [inline image]
> 
> Does anyone know what could be causing this error? How could I solve it?
> 
> I started reading the guide for training the engine
> (http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3) as suggested
> in some other threads, but it is of next to no help for me. Is there any other
> guide around for 'dummies' like [presumably :(] me? In this case I want to
> train it using one image that I created from 40 sampled documents (attached
> here). Using jTessBoxEditor-1.0 I was able to generate and correct the box
> file. What should I do next?
> 
> 
> Thanks a lot in advance, V.Lorz
> 
> 
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
> 
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email
> to [email protected].
> For more options, visit https://groups.google.com/d/optout.


