Use the langdata_lstm repo for LSTM training. It has a larger training text.

On Thu, Apr 15, 2021, 00:52 Venkatapathy S <venkat.s.i...@gmail.com> wrote:
> Hi,
>
> I want to retrain Tesseract from scratch for a particular language. (I have read as many resources as possible, including the warnings, from the Tutorial <https://tesseract-ocr.github.io/tessdoc/>, GitHub <https://github.com/tesseract-ocr/tesseract/issues/654#issuecomment-274574951> and this forum.) To begin, and to get familiar with the process, I am starting with English. When I went through the langdata files (https://github.com/tesseract-ocr/langdata) for English, I found that the training text contains only 72 lines. Is the training text in the langdata repository only a sample, or is it exactly the set that was used to train the default eng.traineddata model provided by Tesseract? Can someone help me with this, please?
>
> Regards,
> Venkat
> https://sites.google.com/view/venkatapathy/home
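
To see the size difference between the two training texts for yourself, here is a minimal sketch that counts the non-empty lines in each file. It assumes both repositories are cloned side by side in the current directory and that each keeps the English text at `eng/eng.training_text`; those paths and the helper name `count_training_lines` are my assumptions, so adjust them to your checkout.

```python
from pathlib import Path


def count_training_lines(path):
    """Return the number of non-empty lines in a UTF-8 text file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


if __name__ == "__main__":
    # Assumed local clone locations and file layout; adjust as needed.
    candidates = [
        Path("langdata/eng/eng.training_text"),
        Path("langdata_lstm/eng/eng.training_text"),
    ]
    for candidate in candidates:
        if candidate.exists():
            print(candidate, count_training_lines(candidate))
        else:
            print(candidate, "not found")
```

Blank lines are skipped so that trailing newlines or spacing in the file do not inflate the count.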