Re: [tesseract-ocr] How to get the net_spec

Shree Devi Kumar Sat, 16 Sep 2023 13:11:45 -0700

The language name headings seem to be missing from the tessdoc page for
tessdata_fast.

Please revert to an older version of the page from its history.

On Sat, Sep 16, 2023, 2:08 PM Shree Devi Kumar <[email protected]> wrote:

>
> https://github.com/tesseract-ocr/tessdoc/blob/main/Data-Files-in-tessdata_best.md
>
>
> https://github.com/tesseract-ocr/tessdoc/blob/main/Data-Files-in-tessdata_fast.md
>
> Version string : 4.00.00alpha : [Network specification] for tessdata_best
>
> tessdata_best models - *incomplete list*, only up to Kannada.
>
> The flags are TrainingFlags from lstmrecognizer.h: 0x40 is compress
> unicharset and 1 is integer mode. The models from tessdata_best have
> flags=40, i.e. compressed unicharset and not integer mode.
>
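
The flag note above can be sketched as a small bitmask decoder. This is only an illustration: the two bit values are the ones named in the message, the printed `flags=40` is read as hexadecimal the way the message reads it, and the function name `decode_training_flags` is made up for this example.

```python
# Bit values as described in the message (TrainingFlags in lstmrecognizer.h).
TF_INT_MODE = 0x01             # integer mode
TF_COMPRESS_UNICHARSET = 0x40  # compressed unicharset


def decode_training_flags(printed: str) -> dict:
    """Decode a printed flags value, assuming it is hexadecimal (e.g. '40')."""
    value = int(printed, 16)
    return {
        "int_mode": bool(value & TF_INT_MODE),
        "compress_unicharset": bool(value & TF_COMPRESS_UNICHARSET),
    }


if __name__ == "__main__":
    # flags=40 as listed for the tessdata_best models
    print(decode_training_flags("40"))
```
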
> afr
> Version string:4.00.00alpha:afr:synth20170629:[1,36,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx512O1c1]
> LSTM training info:Network str:[1,36,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx512O1c1],
> flags=40, iteration=286700, sample_iteration=286724, null_char=95,
> learning_rate=0.001, momentum=0.5, adam_beta=0.999
>
> amh
> Version string:4.00.00alpha:amh
> LSTM training info:Network str:[1,36,0,1Ct3,3,16Mp3,3Lfys48Lfx96Lrx96Lfx192O1c1],
> flags=40, iteration=6112200, sample_iteration=6112270, null_char=284,
> learning_rate=0.001, momentum=0.5, adam_beta=0.999
>
>
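
If it helps, the net_spec and the other fields can be pulled out of an "LSTM training info" entry like the ones above with a few lines of Python. This is a rough sketch, not a Tesseract tool: `parse_training_info` is a hypothetical helper, and it assumes the text has the same layout as the amh entry quoted above (possibly wrapped across lines).

```python
import re


def parse_training_info(text: str) -> dict:
    """Extract the network spec and key=value fields from a training info entry."""
    # Undo mail line wrapping by collapsing all whitespace runs to single spaces.
    flat = " ".join(text.split())
    info = {}
    # The network spec is the bracketed string after "Network str:".
    match = re.search(r"Network str:\s*(\[[^\]]*\])", flat)
    if match:
        info["net_spec"] = match.group(1)
    # Remaining fields look like name=value, separated by commas.
    for key, value in re.findall(r"(\w+)=([\w.]+)", flat):
        info[key] = value
    return info


# The amh entry as quoted in this thread, including the line wraps.
sample = """LSTM training info:Network
str:[1,36,0,1Ct3,3,16Mp3,3Lfys48Lfx96Lrx96Lfx192O1c1],
flags=40, iteration=6112200, sample_iteration=6112270, null_char=284,
learning_rate=0.001, momentum=0.5, adam_beta=0.999"""

if __name__ == "__main__":
    for name, value in parse_training_info(sample).items():
        print(f"{name}: {value}")
```

All values are kept as strings so that nothing is reinterpreted (for example, whether `flags` is decimal or hexadecimal).
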
>
> On Fri, Sep 15, 2023, 9:50 PM Des Bw <[email protected]> wrote:
>
>> For the last couple of days, I have been trying to train the amh data to
>> include some missing characters.
>>
>> I have seen that Shree was able to add the Norwegian Æ by removing the
>> top layer and training on it (
>> https://groups.google.com/g/tesseract-ocr/c/l33zsTEPj70/m/wPzPv6HiEQAJ).
>>
>> I was trying to do the same. However, the Amharic traineddata doesn't
>> include the net_spec in its version line.
>>
>> Version:4.00.00alpha:amh:synth20170629
>>
>> 17:lstm:size=3356155, offset=192
>>
>> 18:lstm-punc-dawg:size=3154, offset=3356347
>>
>> 19:lstm-word-dawg:size=5007810, offset=3359501
>>
>> 20:lstm-number-dawg:size=810, offset=8367311
>>
>> 21:lstm-unicharset:size=18906, offset=8368121
>>
>> 22:lstm-recoder:size=2578, offset=8387027
>>
>> 23:version:size=30, offset=8389605
>>
>>
>> Can somebody (@Shree, please) help me get the net_spec, or tell me how to
>> proceed with adding a layer to introduce the missing characters?
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/5bedab52-2f9b-44ab-a97a-e2033d0e92den%40googlegroups.com
>> <https://groups.google.com/d/msgid/tesseract-ocr/5bedab52-2f9b-44ab-a97a-e2033d0e92den%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

