Hi Nguyen,

Thanks for the suggestion. I've tried using a ROI and also isolating the digits as independent images, but neither improved the results. For some images I got better results by resizing the image by a scale factor of 2.5; others required DILATE/ERODE operations to close 1-pixel holes.
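For readers following along, here is a minimal sketch of the two preprocessing steps mentioned above: upscaling by 2.5 and a dilate-then-erode pass (a morphological closing) to fill 1-pixel holes. It is plain Python on a binary list-of-lists image (1 = ink, 0 = background) so it runs without dependencies. The function names and the 3x3 neighbourhood are assumptions chosen for illustration and are not taken from the original project; with OpenCV the same steps would typically be done with `cv2.resize` and `cv2.morphologyEx` using `cv2.MORPH_CLOSE`.

```python
def resize_nearest(img, scale):
    """Upscale a 2D list-of-lists image by `scale` (nearest-neighbour sampling)."""
    h, w = len(img), len(img[0])
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    return [
        [img[min(h - 1, int(y / scale))][min(w - 1, int(x / scale))]
         for x in range(new_w)]
        for y in range(new_h)
    ]


def _morph(img, pick):
    """Apply `pick` (max or min) over each pixel's 3x3 neighbourhood.

    The window is clipped at the image border, so pixels outside the
    image never influence the result.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(pick(window))
        out.append(row)
    return out


def dilate(img):
    """Grow ink regions by one pixel (hypothetical helper, 3x3 kernel)."""
    return _morph(img, max)


def erode(img):
    """Shrink ink regions by one pixel (hypothetical helper, 3x3 kernel)."""
    return _morph(img, min)


def close_holes(img):
    """Morphological closing: dilate, then erode. Fills 1-pixel holes in ink."""
    return erode(dilate(img))


# Example input: a 9x9 image with a 5x5 ink block that has a 1-pixel hole.
glyph = [[1 if 2 <= x <= 6 and 2 <= y <= 6 else 0 for x in range(9)]
         for y in range(9)]
glyph[4][4] = 0  # the hole

closed = close_holes(glyph)          # hole filled, block size unchanged
enlarged = resize_nearest(closed, 2.5)  # upscaled copy for the OCR engine
```

Whether to upscale, close holes, or both seems to depend on the individual image, so it may be worth trying each variant and comparing the recognized text.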
V.Lorz

On Thursday, March 27, 2014 1:53:27 AM UTC+1, Quan Nguyen wrote:
>
> I defined a ROI around each number and it seemed to produce better results.
>
> On Wednesday, March 26, 2014 1:10:56 PM UTC-5, V.Lorz wrote:
>>
>> Hi All,
>>
>> I started integrating tesseract (version 3.2, EMGV) in a project for
>> recognizing short texts in scanned images. Using some very simple image
>> processing I extract the area of interest to speed up the process.
>>
>> The errors I get are related to the recognition results: tesseract sometimes
>> confuses the digits '6' and '5'. The image below is recognized as
>> "443669*5*" instead of "443669*6*". I'm using the default *eng.traineddata*
>> file bundled with the library. Using some other trained data files from
>> around the Internet I got the same results with the same two digits (5 and 6).
>> Before processing the image I configure tesseract to process only digits.
>>
>> Does anyone know what could be causing this error? How could I solve it?
>>
>> I started reading the guide for training the engine
>> (http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3)
>> as suggested in some other threads, but it is of little to no help to me. Is
>> there any other guide around for 'dummies' like [presumably :(] me? In
>> this case I want to train it using one image that I created from 40 sampled
>> documents (attached here). Using jTessBoxEditor-1.0 I was able to generate
>> and correct the box file. What should I do next?
>>
>> Thanks a lot in advance,
>> V.Lorz

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

