Hello,

Yep, that is exactly what I do. We read a predefined format, so
I limit it to capital letters, digits and the < character.  It speeds
up the reading and removes a lot of the rubbish.
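
As a rough illustration only (not the actual code used here), such a
whitelist could be built in C++ along the following lines. The helper
names buildWhitelist and onlyWhitelisted are made up for this sketch,
and the SetVariable call is shown only as a comment because it assumes
an already initialised tesseract::TessBaseAPI object.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build a whitelist of capital letters, digits and '<'.
std::string buildWhitelist() {
    std::string chars;
    for (char c = 'A'; c <= 'Z'; ++c) chars += c;  // capital letters
    for (char c = '0'; c <= '9'; ++c) chars += c;  // digits
    chars += '<';                                  // filler character
    assert(chars.size() == 37);  // 26 letters + 10 digits + '<'
    return chars;
}

// Hypothetical helper: true if every character of an OCR result is in the
// whitelist (an empty result is treated as valid).
bool onlyWhitelisted(const std::string& text, const std::string& whitelist) {
    return text.find_first_not_of(whitelist) == std::string::npos;
}

// Assumed usage with an initialised tesseract::TessBaseAPI* api:
//   api->SetVariable("tessedit_char_whitelist", buildWhitelist().c_str());
```

The second helper is just a cheap sanity check that could be run on the
recognised text; anything outside the whitelist would point to the
variable not having been applied.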

Cheers,

Neil

On 19 April 2010 01:00, zdenko podobny <zde...@gmail.com> wrote:
> Hello,
>
> If I correctly understood "Comment by ffournel, Mar 30, 2010" on
> http://code.google.com/p/tesseract-ocr/wiki/FAQ, we can achieve the same
> behavior by creating a config file (e.g. digits in the directory
> tessdata/configs/) with the line:
>
> tessedit_char_whitelist 0123456789
>
> and then running:
>
> C:>tesseract.exe nine.tif out tessdata/configs/nobatch tessdata/configs/digits
>
> Zd
>
> On Sun, Apr 18, 2010 at 7:50 PM, MARTIN Pierre <hicksc...@gmail.com> wrote:
>>
>> Dear NGuyenQ,
>>
>> From the page http://www.pixel-technology.com/freeware/tessnet2/
>> tessnet2.Tesseract ocr = new tessnet2.Tesseract();
>> ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // If digit only
>>
>> This is brilliant advice you just gave him. It is very effective; I just
>> tested it on a document with only digits and a few special characters.
>> Since I'm working with C++ only (no .NET wrapper), here is what I
>> recommend doing:
>> // Init your tess API.
>> _tessApi = new tesseract::TessBaseAPI();
>> // Set up the current directory and language prefix.
>> _tessApi->Init("./", "cst");
>> // This is only important if you'll be parsing pictures with only one line
>> // of text (which is my case).
>> _tessApi->SetPageSegMode(tesseract::PSM_SINGLE_LINE);
>> // Here is the trick as explained and pointed by NGuyenQ:
>> _tessApi->SetVariable("tessedit_char_whitelist", "<0123456789");
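>> // Then, in a loop over each of my documents, here is the idea:
>> PIX *pix = pixReadMemTiff((const l_uint8*)buffer.buffer().constData(),
>> buffer.size(), 0);
>> _tessApi->SetImage(pix);
>> // Run recognition; the caller owns the returned buffer.
>> char *text = _tessApi->GetUTF8Text();
>> doc.setRecognizedData("OCRLine", QString(text).trimmed());
>> // pixDestroy() frees the image and nulls the pointer, so no extra delete.
>> pixDestroy(&pix);
>> delete [] text;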
>> // Release everything.
>> _tessApi->Clear();
>> _tessApi->End();
>> delete _tessApi;
>> The very interesting part is that before, I was getting "D" and "O"
>> instead of zeros, sometimes even "A" for "4", and "[]" and "[)" instead of
>> zeros, despite my disambiguation file. Now I'm getting everything correct,
>> which means the whitelist / blacklist are not just post-processing filters,
>> but real "recognition clues".
>> I recommend everyone take note (well... I'm discovering this feature
>> and its real consequences, maybe you're not :D).
>> Pierre.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To post to this group, send email to tesseract-...@googlegroups.com.
>> To unsubscribe from this group, send email to
>> tesseract-ocr+unsubscr...@googlegroups.com.
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en.
>



-- 

Neil Benn Msc
Director
Ziath Ltd
Phone :+44 (0)7508 107942
Website - http://www.ziath.com