Re: [tesseract-ocr] combine_lang_model makes no dawg file

Shree Devi Kumar Mon, 17 Sep 2018 09:25:05 -0700

I use it as follows and it works. Please check that you are using the correct
paths for the files.


combine_lang_model \
--input_unicharset ./layersan/san.unicharset \
--script_dir ~/langdata \
--words ~/langdata/san/san.wordlist \
--numbers ~/langdata/san/san.numbers \
--puncs ~/langdata/san/san.punc \
--output_dir ./layersan \
--lang san \
--pass_through_recoder \
--version_str "$(cat ./layersan/san.new.version)"
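
A quick way to rule out a bad path is to test each input file before running
the command. This is only a sketch: check_inputs is a hypothetical helper, not
part of Tesseract, and it assumes a POSIX shell such as bash.

```shell
#!/bin/sh
# check_inputs: hypothetical helper, not part of Tesseract.
# Reports every argument that is not an existing, non-empty file
# and returns non-zero if at least one was missing or empty.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible usage (paths are the ones from the command above):
#   check_inputs ~/langdata/san/san.wordlist \
#                ~/langdata/san/san.numbers \
#                ~/langdata/san/san.punc && combine_lang_model ...
```

Also note that in such a shell an unquoted backslash is an escape character, so
a path typed as ..\lists\fas.wordlist reaches the program as
..listsfas.wordlist. Forward slashes avoid that.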

And here is the unpacking of this traineddata file:

~/tesstutorial-deva/layersan/san$ combine_tessdata -u san.traineddata ./san.

Extracting tessdata components from san.traineddata
Wrote ./san.config
Wrote ./san.lstm-punc-dawg
Wrote ./san.lstm-word-dawg
Wrote ./san.lstm-number-dawg
Wrote ./san.lstm-unicharset
Wrote ./san.lstm-recoder
Wrote ./san.version
Version string:4.0.0-beta.4-138-g2093:san:shreeshrii20180917:from:4.00.00alpha:Devanagari:synth20170629test
0:config:size=1013, offset=192
18:lstm-punc-dawg:size=5306, offset=1205
19:lstm-word-dawg:size=15123986, offset=6511
20:lstm-number-dawg:size=450, offset=15130497
21:lstm-unicharset:size=12621, offset=15130947
22:lstm-recoder:size=1552, offset=15143568
23:version:size=92, offset=15145120
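
To check for the dawg components without reading the listing by eye, the output
can be filtered. A minimal sketch, assuming the listing format shown above
(index:name:size=..., offset=...); has_lstm_dawgs is a hypothetical helper, not
part of Tesseract, that reads such a listing on standard input.

```shell
#!/bin/sh
# has_lstm_dawgs: hypothetical helper, not part of Tesseract.
# Reads a component listing (in the format printed above) on stdin and
# succeeds only if all three LSTM dawg components appear in it.
has_lstm_dawgs() {
    listing=$(cat)
    for name in lstm-punc-dawg lstm-word-dawg lstm-number-dawg; do
        if ! printf '%s\n' "$listing" | grep -q ":$name:"; then
            echo "no $name in listing" >&2
            return 1
        fi
    done
    return 0
}

# Possible usage (2>&1 so the listing is captured whichever stream it is
# printed on):
#   combine_tessdata -u san.traineddata ./san. 2>&1 | has_lstm_dawgs
```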




On Mon, Sep 17, 2018 at 4:18 PM, Hosein Khoshdel <hoskhosh...@gmail.com>
wrote:

> I used combine_lang_model like this:
>
> combine_lang_model    --input_unicharset    ../combinelangmodel/fas.lstm-unicharset  \
> --script_dir    ../combinelangmodel/sdir   \
> --outputdir    outputdir \
> --lang    fas  \
> --lang_is_rtl    true \
> --words    ..\lists\fas.wordlist  \
> --puncs    ..\lists\fas.punc  \
> --numbers     ..\lists\fas.numbers  \
>
> BTW, I got fas.lstm-unicharset by running combine_tessdata with -u on the
> official fas.traineddata, and I got fas.wordlist, fas.punc and fas.numbers
> from the langdata repo. Almost everything is fine, except that when I unpack
> the resulting traineddata there is no dawg file in it, although the help
> says that if the 3 word lists are provided the dawg files are also added to
> the traineddata file.
> Can you please help me and show me what I am doing wrong?
> Also, the extra spaces in the command are just for better readability here.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/ecb262d7-d448-4125-a60e-ddf266aea40c%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>



-- 

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

