I defined an ROI around each number, and that seemed to produce better results.
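
For anyone wanting to try the same thing, here is a minimal sketch of one way to get a box per digit. It is not necessarily how I did it: it assumes the image is already binarized (1 = ink, 0 = background) and that the digits do not touch each other, so blank columns separate them. The function name `digit_rois` and the `pad` parameter are made up for this example.

```python
from typing import List, Tuple

# (left, top, width, height), the same order Tesseract's SetRectangle expects
Box = Tuple[int, int, int, int]


def digit_rois(image: List[List[int]], pad: int = 1) -> List[Box]:
    """Return one padded bounding box per run of inked columns.

    `image` is a list of rows of 0/1 pixels, where 1 means ink.
    Assumes the digits are separated by at least one blank column.
    """
    if not image or not image[0]:
        return []
    height, width = len(image), len(image[0])

    # True for every column that contains at least one ink pixel
    col_ink = [any(image[y][x] for y in range(height)) for x in range(width)]

    boxes: List[Box] = []
    x = 0
    while x < width:
        if not col_ink[x]:
            x += 1
            continue
        start = x
        while x < width and col_ink[x]:
            x += 1
        end = x  # exclusive

        # Tighten vertically to the rows that contain ink inside this run
        rows = [y for y in range(height) if any(image[y][start:end])]
        top, bottom = rows[0], rows[-1] + 1

        # Add a small margin, clamped to the image borders
        left = max(start - pad, 0)
        right = min(end + pad, width)
        top = max(top - pad, 0)
        bottom = min(bottom + pad, height)
        boxes.append((left, top, right - left, bottom - top))
    return boxes
```

Each box can then be handed to the engine one at a time, for example with `TessBaseAPI::SetRectangle(left, top, width, height)`, possibly combined with the single-character page segmentation mode. Whether that helps with the 5/6 confusion on your images is something you would have to test.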

On Wednesday, March 26, 2014 1:10:56 PM UTC-5, V.Lorz wrote:
>
> Hi All,
>
> I started integrating tesseract (version 3.2, EMGV) in a project for 
> recognizing short texts in scanned images. Using some very simple image 
> processing I extract the area of interest for speeding up the process. 
>
> The errors I get are related to recognition results: tesseract sometimes 
> confuses the digits '6' and '5'. The image below is recognized as 
> "443669*5*" instead of "443669*6*". I'm using the default *eng.traineddata* 
> file bundled with the library. Using some other trained data files from 
> around the Internet I got the same results with the same two digits (5 and 
> 6). Before processing the image I configure tesseract to process only digits.
>
>
>
>
> Does anyone know what could be causing this error? How could I solve it?
>
> I started reading the guide for training the engine (
> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3) as 
> suggested in some other threads, but it is of next to no help to me. Is 
> there any other guide around for 'dummies' like [presumably :(] me? In 
> this case I want to train it using one image that I created from 40 sampled 
> documents (attached here). Using jTessBoxEditor-1.0 I was able to generate 
> and correct the box file. What should I do next?
>
>
> Thanks a lot in advance, V.Lorz
>
>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
