I have created kan.unicharambigs (attached below) based on the output text 
of the Kan.training_text file (which is big). I could not work out how to 
test the attached file and find out whether it works or not.
Kindly point out any mistakes in the attached file; I would be thankful 
to you. I would prefer a command-line test if possible.

==========================================================================
Based on the wiki instructions (extract reproduced below for ready reference):

The rules are not bidirectional, so if you want 'rn' to be considered when 
'm' is detected and vice versa, you need a rule for each. 
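
Because each direction needs its own rule, it can help to list the rules that have no reverse counterpart. The following is only a minimal sketch: the function name missing_reverse_rules is made up for illustration and is not part of Tesseract, and it assumes the rules have already been read into (error, correction) pairs.

```python
def missing_reverse_rules(rules):
    """Return the (error, correction) pairs whose reverse pair is absent."""
    present = set(rules)
    return [(err, cor) for err, cor in rules if (cor, err) not in present]


# Hypothetical rule list: 'm' <-> 'rn' is covered both ways, 'iii' -> 'm' is not.
rules = [("m", "rn"), ("rn", "m"), ("iii", "m")]
print(missing_reverse_rules(rules))
```

A one-directional rule is not necessarily a mistake; the list is only a reminder of which reverse rules would have to be added if both directions are wanted.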

Version 3.03 and on supports a new, simpler format for the unicharambigs 
file: 

v2
'' " 1
m rn 0
iii m 0

In this format, the "error" and "correction" are simple utf-8 strings 
separated by *a space*, and, after another space, the same type specifier 
as v1 (0 for optional and 1 for mandatory substitution). Note the downside 
of this simpler format is that Tesseract has to encode the utf-8 strings 
into the components of the unicharset. In complex scripts, this encoding 
may be ambiguous. In this case, the encoding is chosen such as to use the 
fewest UTF-8 characters for each component, i.e. the shortest unicharset 
components will make up the encoding. 
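
The line layout described above (a "v2" header, then error, correction and type separated by single spaces) can be checked from the command line with a small script. This is a rough sketch, not Tesseract's own parser: parse_v2_ambigs is a hypothetical helper, and skipping blank lines is an assumption of this sketch. It only checks the surface format, not whether the strings can be encoded with a given unicharset.

```python
def parse_v2_ambigs(text):
    """Parse v2-format unicharambigs text into (error, correction, type) tuples.

    Raises ValueError on the first line that does not fit the format.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "v2":
        raise ValueError("first line must be 'v2'")
    rules = []
    for number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # assumption of this sketch: blank lines are skipped
        fields = line.split(" ")
        # Exactly three non-empty fields, each separated by one space.
        if len(fields) != 3 or "" in fields:
            raise ValueError(f"line {number}: expected 'error correction type'")
        error, correction, kind = fields
        if kind not in ("0", "1"):
            raise ValueError(f"line {number}: type must be 0 or 1, got {kind!r}")
        rules.append((error, correction, int(kind)))
    return rules


# The sample from the wiki extract above.
sample = "v2\n'' \" 1\nm rn 0\niii m 0\n"
for rule in parse_v2_ambigs(sample):
    print(rule)
```

For a real file, the text could be read with open("kan.unicharambigs", encoding="utf-8").read() and passed to the same function.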

Like most other files used in training, the 'unicharambigs' file must be 
encoded as UTF8, and must end with a newline character. The unicharambigs 
format is also described in the unicharambigs(5) man page 
<https://tesseract-ocr.googlecode.com/svn-history/r683/trunk/doc/unicharambigs.5.html>.
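
The two file-level requirements above (UTF-8 encoding and a trailing newline) can be checked on the raw bytes. Again a minimal sketch: check_ambigs_bytes is a hypothetical helper name, not something shipped with Tesseract.

```python
def check_ambigs_bytes(data):
    """Return a list of problems found in the raw bytes of an ambigs file."""
    problems = []
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 at byte {exc.start}")
    if not data.endswith(b"\n"):
        problems.append("file does not end with a newline")
    return problems


print(check_ambigs_bytes("v2\nm rn 0\n".encode("utf-8")))  # no problems expected
print(check_ambigs_bytes(b"v2\nm rn 0"))  # missing final newline
```

To check the attached file, the bytes could be read with open("kan.unicharambigs", "rb").read() and passed in; an empty list means both requirements are met.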
 



Attachment: kan.unicharambigs
Description: Binary data
