A solution is requested urgently.

On Wed, Dec 2, 2015 at 4:25 PM, sriranga(83yrsold) <[email protected]> wrote:
>
> I have created kan.unicharambigs (attached below) based on the output text
> of the Kan.training_text file (which is big). I do not understand how to
> test the attached file and find out whether it works.
> Kindly point out any mistakes in the attached file, for which
> I shall be thankful to you. I would prefer a command-line test if possible.
>
> ==========================================================================
> Based on the wiki instructions (extract reproduced below for ready reference):
>
> The rules are not bidirectional, so if you want 'rn' to be considered when
> 'm' is detected and vice versa you need a rule for each.
>
> Version 3.03 and on supports a new, simpler format for the unicharambigs
> file:
>
> v2
> '' " 1
> m rn 0
> iii m 0
>
> In this format, the "error" and "correction" are simple utf-8 strings
> separated by *a space*, and, after another space, the same type specifier
> as v1 (0 for optional and 1 for mandatory substitution). Note the downside
> of this simpler format is that Tesseract has to encode the utf-8 strings
> into the components of the unicharset. In complex scripts, this encoding
> may be ambiguous. In this case, the encoding is chosen such as to use the
> least utf-8 characters for each component, i.e. the shortest unicharset
> components will make up the encoding.
>
> Like most other files used in training, the 'unicharambigs' file must be
> encoded as UTF8, and must end with a newline character. The unicharambigs
> format is also described in the unicharambigs(5) man page
> <https://tesseract-ocr.googlecode.com/svn-history/r683/trunk/doc/unicharambigs.5.html>.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/0d30025d-cc11-4f69-9e98-ec919d3f43df%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CANKD7YzvkpNMbzdfnP_Z3SG7dMSMbCUWEqGSj1n4yqTCqTOVew%40mail.gmail.com.
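
For a quick command-line check of the format rules in the wiki extract quoted in this thread, a minimal Python sketch follows. It only tests what that extract states for the v2 format: the file decodes as UTF-8, ends with a newline, starts with a `v2` line, and every rule is "error", "correction" and a type of 0 or 1, separated by single spaces. The function name `check_unicharambigs_v2` and the script name used below are made up for illustration. The sketch does not run Tesseract, so it cannot tell whether the strings can be encoded with your unicharset or whether the rules change recognition results.

```python
import sys
from typing import List


def check_unicharambigs_v2(data: bytes) -> List[str]:
    """Return a list of format problems; an empty list means none were found."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return ["file is not valid UTF-8: %s" % exc]

    problems = []
    if not text.endswith("\n"):
        problems.append("file does not end with a newline character")

    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline

    if not lines or lines[0] != "v2":
        problems.append("line 1: expected the version marker 'v2'")

    for number, line in enumerate(lines[1:], start=2):
        fields = line.split(" ")
        # A rule is: error, one space, correction, one space, type.
        if len(fields) != 3 or fields[0] == "" or fields[1] == "":
            problems.append(
                "line %d: expected 'error correction type' "
                "separated by single spaces" % number
            )
        elif fields[2] not in ("0", "1"):
            problems.append(
                "line %d: type must be 0 (optional) or 1 (mandatory)" % number
            )
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as handle:
        found = check_unicharambigs_v2(handle.read())
    for problem in found:
        print(problem)
    if not found:
        print("no format problems found")
```

Saved under a hypothetical name such as `check_ambigs.py`, it could be run as `python check_ambigs.py kan.unicharambigs`. Because the rules are not bidirectional, a file that passes may still lack the reverse rule for a pair; this sketch does not check for that.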

