On 20 July 2010 15:18, caro <caroline.ma...@gmail.com> wrote:
> I try to complete these files, after looking at errors appearing
> during the recognition.
> Typically, I have the following error which occurs very often:
> tesseract recognizes FESLLTS instead of RESULTS
>
> So I added in the file user-word: RESULTS
> and in the file DangAmbigs:
> 2 F E 2 R E
> 2 L L 2 U L
> 1 F 1 R
> 1 L 1 U
>
> But when adding this, it does not change anything, and the OCR still
> finds FESLLTS, instead of RESULTS.
> Any idea what am I doing wrong?

DangAmbigs and the dictionaries are used in a non-obvious way. Recognition runs in two passes: the first uses the training data you usually download in the language pack, and the second uses that data after it has been adapted to the data being recognized. The dictionaries and DangAmbigs ('Dang' is short for 'dangerous') are used to determine (approximately) whether the characters are being interpreted properly.

The 'dangerous' part is that these ambiguities are impossible sequences, so you shouldn't list legal character sequences in them. If you do, you won't get the results you expect, and you'll also get worse results in places where you would otherwise have had a proper read.

There is a post-processing tool that works more or less in the way you expect:
http://www.cs.toronto.edu/~mreimer/tesseract.html
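
To make the post-processing idea concrete, here is a minimal Python sketch of correcting the OCR output after the fact, outside Tesseract. It is not how the tool linked above works (I haven't reproduced its method), and the names `CONFUSIONS`, `WORDS`, `candidates` and `correct` are made up for illustration. The confusion pairs are just the ones reported in this thread, and the word list is a stand-in for a real one.

```python
# Hypothetical confusion pairs from the errors reported in this thread:
# (what the OCR produced, what was probably meant).
CONFUSIONS = [("F", "R"), ("LL", "UL"), ("L", "U")]

# Stand-in for a real word list (an assumption for this sketch).
WORDS = {"RESULTS"}


def candidates(word, confusions=CONFUSIONS):
    """Return the set of strings reachable by applying confusion pairs.

    At each position we either keep the character or replace a confusable
    sequence starting there. This is exponential in the worst case, which
    is acceptable for single words.
    """
    def expand(i):
        if i == len(word):
            yield ""
            return
        # Option 1: keep the character as recognized.
        for rest in expand(i + 1):
            yield word[i] + rest
        # Option 2: substitute any confusion that starts at this position.
        for wrong, right in confusions:
            if word.startswith(wrong, i):
                for rest in expand(i + len(wrong)):
                    yield right + rest

    return set(expand(0))


def correct(word, words=WORDS):
    """Return the unique dictionary word reachable from `word`, else `word`."""
    if word in words:
        return word
    matches = sorted(c for c in candidates(word) if c in words)
    # Only rewrite when the correction is unambiguous.
    return matches[0] if len(matches) == 1 else word


print(correct("FESLLTS"))
```

Because a word is only rewritten when it is not already in the word list and exactly one candidate is, legal words are left alone, which is the failure mode that listing legal sequences in DangAmbigs runs into.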