I have created kan.unicharambigs (attached below) based on the output text of the Kan.training_text file (which is large). I could not work out how to test the attached file and find out whether it works. Kindly point out any mistakes in the attached file; I would be grateful. I would prefer a command-line test if possible.
==========================================================================
Based on the wiki instructions (extract reproduced below for ready reference):

The rules are not bidirectional, so if you want 'rn' to be considered when 'm' is detected and vice versa, you need a rule for each.

Version 3.03 and on supports a new, simpler format for the unicharambigs file:

    v2
    '' " 1
    m rn 0
    iii m 0

In this format, the "error" and "correction" are simple utf-8 strings separated by *a space*, and, after another space, the same type specifier as v1 (0 for optional and 1 for mandatory substitution). Note the downside of this simpler format is that Tesseract has to encode the utf-8 strings into the components of the unicharset. In complex scripts, this encoding may be ambiguous. In this case, the encoding is chosen such as to use the least utf-8 characters for each component, i.e. the shortest unicharset components will make up the encoding.

Like most other files used in training, the 'unicharambigs' file must be encoded as UTF8, and must end with a newline character.

The unicharambigs format is also described in the unicharambigs(5) man page <https://tesseract-ocr.googlecode.com/svn-history/r683/trunk/doc/unicharambigs.5.html>.
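The format rules quoted above (a v2 header, three space-separated fields per rule, a 0 or 1 type specifier, UTF-8 encoding, a trailing newline) can at least be checked mechanically from the command line. The following is only a minimal sketch of such a check: the function name check_unicharambigs_v2 is made up for illustration and is not part of Tesseract, and it assumes every rule line has exactly three fields separated by single spaces, as the extract describes. It checks the file format only; it does not tell whether Tesseract can encode the strings with the unicharset or whether the rules improve recognition.

```python
import sys


def check_unicharambigs_v2(data):
    """Return a list of format problems found in v2 unicharambigs bytes.

    Hypothetical helper (not a Tesseract tool): it only checks the rules
    quoted from the wiki, nothing about the unicharset itself.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return ["not valid UTF-8: %s" % exc]

    problems = []
    if not text.endswith("\n"):
        problems.append("file does not end with a newline")

    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline

    if not lines or lines[0].strip() != "v2":
        problems.append("line 1: expected the version header 'v2'")

    # Assumption: each rule is "error correction type" with single spaces.
    for number, line in enumerate(lines[1:], start=2):
        fields = line.split(" ")
        if len(fields) != 3 or "" in fields:
            problems.append("line %d: expected 3 space-separated fields" % number)
        elif fields[2] not in ("0", "1"):
            problems.append("line %d: type must be 0 or 1, got %r" % (number, fields[2]))
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as handle:
        found = check_unicharambigs_v2(handle.read())
    print("\n".join(found) if found else "no format problems found")
    sys.exit(1 if found else 0)
```

If the sketch is saved under a name such as check_ambigs.py (a file name chosen here only for illustration), it could be run as "python check_ambigs.py kan.unicharambigs"; it prints one line per problem and exits with a non-zero status when any are found.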
kan.unicharambigs
Description: Binary data

