Hi Joe and Moffette,

Thanks for the tips you provided. They are very helpful to me. I'm testing your instructions at the moment.

Thanks again.

Regards,
Thilanka

> Topic: word review <http://groups.google.com/group/tesseract-ocr/t/4e723fa1766b7167>
>
> Joe K <joekarlov...@gmail.com> Mar 08 11:02AM -0800
>
> Hey Thilanka,
>
> I ran into a similar problem when I only needed it to look at
> hexadecimal values. What I ended up doing was creating a separate
> "language" that only contained the specified characters. So you could
> create a language of numbers and a language of letters and use
> tesseract to read each part of your image using the appropriate
> language.
>
> The web address below shows you how to train tesseract for a specific
> language. Hope this helps.
>
> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
>
> Moffette <omoffe...@gmail.com> Mar 08 12:26PM -0800
>
> Hi,
>
> An easier way to deal with numbers only or letters only is to use this
> from the FAQ (http://code.google.com/p/tesseract-ocr/wiki/FAQ):
>
> ----------------------------------------------------------------------
> How do I recognize only digits?
>
> In 2.03 and above:
>
> Use
>
> TessBaseAPI::SetVariable("tessedit_char_whitelist", "0123456789");
>
> BEFORE calling an Init function or put this in a text file called
> tessdata/configs/digits:
>
> tessedit_char_whitelist 0123456789
>
> and then your command line becomes:
>
> tesseract image.tif outputbase nobatch digits
>
> Warning: Until the old and new config variables get merged, you must
> have the nobatch parameter too.
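
The config-file route from the FAQ excerpt above could be scripted roughly as follows. This is only a sketch in Python: the helper names write_whitelist_config and build_command are hypothetical and not part of Tesseract. Only the tessedit_char_whitelist line, the tessdata/configs/digits path and the command-line layout are taken from the FAQ text.

```python
from pathlib import Path


def write_whitelist_config(tessdata_dir, name="digits", chars="0123456789"):
    """Write tessdata/configs/<name> with a single whitelist line.

    Hypothetical helper: it only reproduces the one-line config file
    described in the FAQ excerpt.
    """
    config_path = Path(tessdata_dir) / "configs" / name
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(f"tessedit_char_whitelist {chars}\n", encoding="utf-8")
    return config_path


def build_command(image, output_base, config="digits"):
    """Build the argument list for the FAQ command line:

        tesseract image.tif outputbase nobatch digits

    The nobatch parameter is kept because the FAQ warns it is required.
    """
    return ["tesseract", str(image), str(output_base), "nobatch", config]
```

Assuming the tesseract binary is on the PATH, the list returned by build_command("image.tif", "outputbase") could be passed to subprocess.run. Writing a different character set, for example chars="0123456789ABCDEF" under another config name for the hexadecimal case, is an untested assumption on my part and not something the FAQ states.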
> ----------------------------------------------------------------------
>
> For the second part: "I'm willing to review the recognised letters
> with the possible words so we can improve the accuracy"
>
> If you are using a 2.0X version, you could use eng.user-words (a
> user dictionary), as suggested in the FAQ
> (http://code.google.com/p/tesseract-ocr/wiki/FAQ):
>
> ----------------------------------------------------------------------
> How do I provide my own dictionary?
>
> Easy: Replace tessdata/eng.user-words with your own word list, in the
> same format - UTF8 text, one word per line.
>
> More difficult, but better for a large dictionary: Replace
> tessdata/eng.word-dawg with one created from your own word list, using
> wordlist2dawg. See the TrainingTesseract wiki page for details.
>
> ----------------------------------------------------------------------

--
http://coders-view.blogspot.com/
http://thilankagekawuluwa.blogspot.com/
http://twitter.com/thilanka_k