Re: Problem using DangAmbigs and user-words files

Jimmy O'Regan Tue, 20 Jul 2010 07:47:18 -0700

On 20 July 2010 15:18, caro <caroline.ma...@gmail.com> wrote:
> I am trying to complete these files after looking at the errors that
> appear during recognition.
> Typically, I get the following error, which occurs very often:
> tesseract recognizes FESLLTS instead of RESULTS
>
> So I added to the user-words file: RESULTS
> and in the file DangAmbigs:
> 2 F E 2 R E
> 2 L L 2 U L
> 1 F 1 R
> 1 L 1 U
>
> But adding this does not change anything, and the OCR still
> finds FESLLTS instead of RESULTS.
> Any idea what I am doing wrong?


DangAmbigs and the dictionaries are used in a non-obvious way. There
are two passes: the first uses the training data you usually download
in the language pack, and the second uses that data after it has been
adapted to the input being recognized. The dictionaries and DangAmbigs
('Dang' is short for 'dangerous') are used to determine (approximately)
whether the characters are being interpreted properly.

The 'dangerous' part is that these ambiguities are meant to be
impossible sequences, so you shouldn't list legal character sequences
in them: not only will you not get the results you expect, you will
get worse results in places that would otherwise have been read
correctly.
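
As a rough illustration of that rule, here is a minimal Python sketch that
flags DangAmbigs entries whose left-hand side occurs in a legal word. It
assumes the whitespace-separated layout of the entries quoted above (count,
characters, count, characters); the real file format may differ between
Tesseract versions. The function names and the word list are hypothetical,
not part of Tesseract.

```python
def parse_ambig_line(line):
    """Parse an entry such as '2 F E 2 R E' into ('FE', 'RE').

    Assumed layout: <n> <n characters> <m> <m characters>.
    """
    fields = line.split()
    n_wrong = int(fields[0])
    wrong = fields[1:1 + n_wrong]
    n_right = int(fields[1 + n_wrong])
    right = fields[2 + n_wrong:2 + n_wrong + n_right]
    if len(wrong) != n_wrong or len(right) != n_right:
        raise ValueError("malformed entry: %r" % line)
    return "".join(wrong), "".join(right)


def risky_entries(ambig_lines, legal_words):
    """Return entries whose 'wrong' sequence appears inside a legal word."""
    risky = []
    for line in ambig_lines:
        if not line.strip():
            continue  # skip blank lines
        wrong, right = parse_ambig_line(line)
        hits = sorted(w for w in legal_words if wrong in w)
        if hits:
            risky.append((wrong, right, hits))
    return risky


if __name__ == "__main__":
    entries = ["2 F E 2 R E", "2 L L 2 U L", "1 F 1 R", "1 L 1 U"]
    # Hypothetical sample of legal words; in practice, use a real word list.
    words = {"RESULTS", "FELLOW"}
    for wrong, right, hits in risky_entries(entries, words):
        print("%s -> %s occurs in legal words: %s" % (wrong, right, ", ".join(hits)))
```

With this sample word list all four entries are flagged, because 'FE', 'LL',
'F' and 'L' each occur in at least one legal word.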

There is a post-processing tool that works more-or-less in the way you
expect: http://www.cs.toronto.edu/~mreimer/tesseract.html
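
To give an idea of what such post-processing can look like, here is a
minimal Python sketch. It is not the method used by the tool linked above,
just an assumption about one simple approach: apply known confusion pairs to
a word from the OCR output and accept the result only if exactly one variant
is in your word list. The confusion pairs, function names, and word list are
all hypothetical.

```python
# Hypothetical confusion pairs: (what the OCR produced, what was probably meant).
CONFUSIONS = [("F", "R"), ("LL", "UL"), ("L", "U")]


def candidates(word, confusions=CONFUSIONS, max_edits=3):
    """Return `word` plus every variant reachable by applying up to
    `max_edits` single substitutions from `confusions`."""
    seen = {word}
    frontier = {word}
    for _ in range(max_edits):
        next_frontier = set()
        for w in frontier:
            for wrong, right in confusions:
                start = w.find(wrong)
                while start != -1:
                    variant = w[:start] + right + w[start + len(wrong):]
                    if variant not in seen:
                        seen.add(variant)
                        next_frontier.add(variant)
                    start = w.find(wrong, start + 1)
        frontier = next_frontier
    return seen


def correct(word, user_words, confusions=CONFUSIONS):
    """Replace `word` with a known word if exactly one variant matches."""
    if word in user_words:
        return word
    matches = candidates(word, confusions) & set(user_words)
    # Leave the word alone when there is no match or the match is ambiguous.
    return matches.pop() if len(matches) == 1 else word


if __name__ == "__main__":
    user_words = {"RESULTS"}
    ocr_line = "FESLLTS OF THE TEST"
    print(" ".join(correct(w, user_words) for w in ocr_line.split()))
```

Because the substitutions are applied after recognition and only accepted
when they produce a known word, they cannot degrade words that were already
read correctly, unlike adding the same pairs to DangAmbigs.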

-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
