Re: [tesseract-ocr] What are Langdata repository given for retraining Tesseract

Shree Devi Kumar Thu, 15 Apr 2021 03:16:20 -0700

Use the langdata_lstm repo for LSTM training; it has a larger training text.
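
To see the size difference for yourself, here is a minimal sketch that counts the non-blank lines of a language's training text in local clones of both repos. It assumes the clones sit in directories named langdata and langdata_lstm and that each keeps the file at <lang>/<lang>.training_text. The directory names and the helper names are illustrative, not part of any Tesseract tooling.

```python
from pathlib import Path


def count_nonblank_lines(path):
    """Return the number of non-blank lines in a UTF-8 text file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def compare_training_texts(langdata_dir, langdata_lstm_dir, lang="eng"):
    """Count training-text lines for one language in two local clones.

    Assumes both clones keep the file at <lang>/<lang>.training_text;
    a missing file is reported as None instead of raising an error.
    """
    counts = {}
    clones = (("langdata", langdata_dir), ("langdata_lstm", langdata_lstm_dir))
    for name, root in clones:
        path = Path(root) / lang / f"{lang}.training_text"
        counts[name] = count_nonblank_lines(path) if path.is_file() else None
    return counts


if __name__ == "__main__":
    # Hypothetical clone locations; adjust to wherever you cloned the repos.
    print(compare_training_texts("langdata", "langdata_lstm"))
```

A None entry in the result means the file was not found at the assumed path, so check the clone layout before reading anything into the numbers.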

On Thu, Apr 15, 2021, 00:52 Venkatapathy S <[email protected]> wrote:


> Hi,
> I want to retrain Tesseract from scratch for a particular language (I
> have read as many resources as possible, including warnings, from the
> Tutorial <https://tesseract-ocr.github.io/tessdoc/>, GitHub
> <https://github.com/tesseract-ocr/tesseract/issues/654#issuecomment-274574951>
> and
> this forum). To begin (and to get familiar with the process), I
> started with the English language. While going through the
> langdata files (https://github.com/tesseract-ocr/langdata) for English, I
> found that the training text contains only 72 lines. Is the training
> text in the langdata repository provided only as a sample, or is it
> exactly the same set used to train the default eng.traineddata model
> provided by Tesseract? Can someone help me with this, please?
>
> Regards,
> Venkat
> https://sites.google.com/view/venkatapathy/home
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/5f588dfc-5c8b-400a-96c5-65c547f27d46n%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/5f588dfc-5c8b-400a-96c5-65c547f27d46n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
