Hello, if I correctly understood "Comment by ffournel, Mar 30, 2010" on http://code.google.com/p/tesseract-ocr/wiki/FAQ, we can achieve the same behavior by creating a config file (e.g. digits in the directory tessdata/configs/) with the line:
tessedit_char_whitelist 0123456789

and then run:

C:>tesseract.exe nine.tif out tessdata/configs/nobatch tessdata/configs/digits

Zd

On Sun, Apr 18, 2010 at 7:50 PM, MARTIN Pierre <hicksc...@gmail.com> wrote:
> Dear NGuyenQ,
>
> From the page http://www.pixel-technology.com/freeware/tessnet2/
>
>     tessnet2.Tesseract ocr = new tessnet2.Tesseract();
>     ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // If digit only
>
> This is brilliant advice you just gave him. It is very effective; I just
> tested it on a document with only digits and a few special characters.
> Since I'm working with C++ only (no .NET wrapper), here is what I recommend
> doing:
>
>     // Init your tess API.
>     _tessApi = new tesseract::TessBaseAPI();
>     // Set up the current directory and language prefix.
>     _tessApi->Init("./", "cst");
>     // This is only important if you'll be parsing pictures with only one
>     // line of text (which is my case).
>     _tessApi->SetPageSegMode(tesseract::PSM_SINGLE_LINE);
>     // Here is the trick as explained and pointed out by NGuyenQ:
>     _tessApi->SetVariable("tessedit_char_whitelist", "<0123456789");
>     // Then, in a loop for each of my documents, here is the idea:
>     PIX *pix = pixReadMemTiff((const l_uint8*)buffer.buffer().constData(),
>                               buffer.size(), 0);
>     _tessApi->SetImage(pix);
>     // Run recognition; the caller owns the returned buffer.
>     char *text = _tessApi->GetUTF8Text();
>     doc.setRecognizedData("OCRLine", QString(text).trimmed());
>     // pixDestroy() frees the image and nulls the pointer, so no extra
>     // delete is needed (a second delete would be a double free).
>     pixDestroy(&pix);
>     delete [] text;
>     // Release everything.
>     _tessApi->Clear();
>     _tessApi->End();
>     delete _tessApi;
>
> The very, very interesting part is that before, I was getting "D" and "O"
> instead of zeros, sometimes even "A" for "4", and "[]" and "[)" instead of
> zeros, despite my disambiguation file. Now I'm getting everything correct,
> which means the *whitelist / blacklist are not just post-processing
> filters, but real "recognition clues"*.
>
> I recommend everyone take note (well... I'm discovering this feature and
> its real consequences, maybe you're not :D).
>
> Pierre.