Re: Disable Special characters?

zdenko podobny Sun, 18 Apr 2010 16:01:25 -0700

Hello,

if I understood "Comment by ffournel, Mar 30, 2010" on
http://code.google.com/p/tesseract-ocr/wiki/FAQ correctly, we can achieve the
same behavior by creating a config file (e.g. digits in the directory
tessdata/configs/) containing the line:


tessedit_char_whitelist 0123456789

and then running:

C:>tesseract.exe nine.tif out tessdata/configs/nobatch tessdata/configs/digits
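
If you would rather generate that config file from code, here is a minimal
sketch using only the C++ standard library. The helper name
writeWhitelistConfig is my own invention for illustration and is not part of
Tesseract; the only thing taken from above is the format of the
tessedit_char_whitelist line.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not part of Tesseract): writes a one-line config file
// of the form "tessedit_char_whitelist <chars>".
// Returns false if the file could not be opened or written.
bool writeWhitelistConfig(const std::string& path, const std::string& chars) {
    std::ofstream out(path);
    if (!out) return false;
    out << "tessedit_char_whitelist " << chars << '\n';
    return static_cast<bool>(out);
}
```

Calling writeWhitelistConfig("tessdata/configs/digits", "0123456789") should
produce the file described above, assuming the tessdata/configs/ directory
already exists.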

Zd

On Sun, Apr 18, 2010 at 7:50 PM, MARTIN Pierre <hicksc...@gmail.com> wrote:

> Dear NGuyenQ,
>
> From the page http://www.pixel-technology.com/freeware/tessnet2/
> tessnet2.Tesseract ocr = new tessnet2.Tesseract();
> ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // If digit only
>
> This is brilliant advice you just gave him. It is very effective; I just
> tested it on a document with only digits and a few special characters.
> Since I'm working with C++ only (no .NET wrapper), here is what I recommend
> doing:
>
> // Init your tess API.
> _tessApi = new tesseract::TessBaseAPI();
> // Set up the current directory and language prefix.
> _tessApi->Init("./", "cst");
> // This is only important if you'll be parsing pictures with only one
> // line of text (which is my case).
> _tessApi->SetPageSegMode(tesseract::PSM_SINGLE_LINE);
> // Here is the trick as explained and pointed out by NGuyenQ:
> _tessApi->SetVariable("tessedit_char_whitelist", "<0123456789");
> // Then, in a loop for each of my documents, here is the idea:
> PIX *pix = pixReadMemTiff((const l_uint8*)buffer.buffer().constData(),
>                           buffer.size(), 0);
> _tessApi->SetImage(pix);
> // Run recognition; the caller owns the returned buffer.
> char *text = _tessApi->GetUTF8Text();
> doc.setRecognizedData("OCRLine", QString(text).trimmed());
> delete [] text;
> // pixDestroy() frees the image and sets pix to NULL; no separate delete.
> pixDestroy(&pix);
> // Release everything.
> _tessApi->Clear();
> _tessApi->End();
> delete _tessApi;
>
> The very, very interesting part is that before, I was getting "D" and "O"
> instead of zeros, sometimes even "A" for "4" and "[]" and "[)" instead of
> zeros, despite my disambiguation file. Now I'm getting everything correct,
> which means the *whitelist / blacklist are not just post-processing
> filters, but real "recognition clues"*.
>
> I recommend everyone take note (well... I'm discovering this feature and
> its real consequences, maybe you're not :D).
>
> Pierre.
>
>  --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to tesseract-...@googlegroups.com.
> To unsubscribe from this group, send email to
> tesseract-ocr+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.
>
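
To make Pierre's point about "recognition clues" concrete, here is a small
C++ sketch of what a pure post-processing filter would do. The function name
postFilter is hypothetical and this is not Tesseract code; it only
illustrates the alternative that his results argue against.

```cpp
#include <string>

// Hypothetical post-processing filter: keeps only the characters found in
// `whitelist` and silently drops everything else. It cannot turn a misread
// "D" or "O" back into "0"; it can only remove it.
std::string postFilter(const std::string& text, const std::string& whitelist) {
    std::string kept;
    for (char c : text) {
        if (whitelist.find(c) != std::string::npos) kept += c;
    }
    return kept;
}
```

With this sketch, postFilter("1D0O4", "<0123456789") gives "104": the misread
characters are lost rather than corrected. Pierre reports getting the right
digits instead, which suggests the whitelist is applied during recognition.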

