Please see
https://tesseract-ocr.github.io/tessdoc/Data-Files-in-tessdata_fast.html



Version string:4.00.00alpha:tir:synth20170629
LSTM training info:Network
str:[1,36,0,1Ct3,3,16Mp3,3Lfys48Lfx96Lrx96Lfx128O1c1], flags=41,
iteration=10498000, sample_iteration=10498000, null_char=267,
learning_rate=0.001, momentum=0.5, adam_beta=0.999
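
In case it helps to compare models, here is a minimal Python sketch that pulls the fields out of a dump like the one above. It assumes the text has exactly the shape pasted here (as far as I know, this is what `combine_tessdata -d tir.traineddata` prints). The function name `parse_training_info` is made up for this example and is not part of Tesseract.

```python
import re

# The dump pasted above, copied verbatim.
DUMP = """Version string:4.00.00alpha:tir:synth20170629
LSTM training info:Network
str:[1,36,0,1Ct3,3,16Mp3,3Lfys48Lfx96Lrx96Lfx128O1c1], flags=41,
iteration=10498000, sample_iteration=10498000, null_char=267,
learning_rate=0.001, momentum=0.5, adam_beta=0.999"""


def parse_training_info(text):
    """Return the network spec and the key=value fields as a dict.

    Hypothetical helper: it only understands the layout shown in DUMP.
    """
    flat = " ".join(text.split())  # undo the line wrapping
    info = {}
    spec = re.search(r"str:\s*(\[[^\]]*\])", flat)
    if spec:
        info["net_spec"] = spec.group(1)
    # Numeric key=value pairs such as iteration=10498000 or momentum=0.5
    for key, value in re.findall(r"(\w+)=([0-9.]+(?:[eE][-+]?[0-9]+)?)", flat):
        is_float = "." in value or "e" in value.lower()
        info[key] = float(value) if is_float else int(value)
    return info


if __name__ == "__main__":
    for name, value in parse_training_info(DUMP).items():
        print(f"{name}: {value}")
```

The parsed values only describe the state stored in the shipped model. They are not the command line that produced it.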



On Wed, Aug 5, 2020 at 8:49 AM Biniam <bini...@gmail.com> wrote:

> For the language tir (which has over 350 characters), only 272 are included
> in the existing LSTM tir.traineddata. I have a file that contains the full
> character set, including the missing characters, and I have a training
> text. I want to recreate tir.traineddata, but I could not find the exact
> commands and parameters used to build it.
>
> Basically, how do I compile
> https://github.com/tesseract-ocr/langdata_lstm/tree/master/tir so that I
> get the same output as
> https://github.com/tesseract-ocr/tessdata_best/blob/master/tir.traineddata ?
>
>
> I followed the documentation at
> https://tesseract-ocr.github.io/tessdoc/TrainingTesseract-4.00.html to
> train from scratch and came up with the set of commands shown here:
> https://github.com/TigrinyaNLP/Tigrinya-tasseract-ocr/blob/master/bin/train_from_scrach.sh
>
> But the final result is not that good. For example, I used
> --max_iterations 50000 and net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48
> Lfx96 Lrx96 Lfx256 O1c352]', but these parameters are copied from the eng
> example and may not be a good fit for 'tir'. I would appreciate it if
> someone could tell me what commands were used to build tir.traineddata in
> tessdata_best.
>
> I know I could fine-tune or add the missing characters instead of
> building from scratch, but I have more things to change (such as adding a
> wordlist, fonts, and other improvements) that should improve the quality
> of 'tir' a lot. This language is not that big, so rebuilding it should not
> be as big a task as rebuilding 'eng'.
>
> Thanks,
> Biniam
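
To make the difference between the two network specs in this thread easier to see, here is a small Python sketch that splits each spec into layers and lists the positions where they differ. The helper names `split_layers` and `diff_specs` are invented for this example, and the tokenizer is only a rough approximation that handles the two strings quoted in this thread. It is not a full VGSL parser.

```python
import re
from itertools import zip_longest

# Spec reported for the shipped tir model (from the dump in the reply).
SHIPPED_SPEC = "[1,36,0,1Ct3,3,16Mp3,3Lfys48Lfx96Lrx96Lfx128O1c1]"
# Spec used in the from-scratch attempt (from the question).
QUESTION_SPEC = "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c352]"


def split_layers(spec):
    """Split a spec into its input shape and layer tokens.

    Assumption: every layer starts with one uppercase letter, and the
    input shape is the leading run of digits and commas.
    """
    body = "".join(spec.strip().strip("[]").split())  # drop brackets and spaces
    return re.findall(r"[A-Z][a-z0-9,]*|[0-9][0-9,]*", body)


def diff_specs(a, b):
    """Return (position, layer_in_a, layer_in_b) for each differing layer."""
    pairs = zip_longest(split_layers(a), split_layers(b))
    return [(i, x, y) for i, (x, y) in enumerate(pairs) if x != y]


if __name__ == "__main__":
    for position, shipped, used in diff_specs(SHIPPED_SPEC, QUESTION_SPEC):
        print(f"layer {position}: shipped={shipped} question={used}")
```

For these two strings, the only differences are the last LSTM layer and the output layer. The `O1c1` in the dump may just be how the spec is stored rather than the real number of output classes. That is an assumption on my part, so the output-layer difference should not be over-interpreted.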


-- 

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduX4Ui2Dz_GouBKxiLBb9fX4neOxyD%3Da1J8OwsnLXFwAyQ%40mail.gmail.com.
