See https://github.com/tesseract-ocr/tesseract/issues/654#issuecomment-274574951
That comment was about Devanagari and other Indic languages. Also see https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#training-text-requirements

On Thu, Oct 10, 2019 at 12:45 PM peter bence <peterbence....@gmail.com> wrote:

> I'm working with the Arabic `langdata_lstm`, which has only 84 lines of
> training text in its `training_text` file. I believe that is too small
> for building/training a reliable model. Reading the `training_text` file,
> I can see randomly generated text with no meaning. At first I thought this
> was an Arabic problem, but later I found that it is the same for all other
> languages.
>
> *My questions are:*
>
> 1. What specifications are followed when generating these `training_text`
> files? (I can see, for example, that each line is no more than 60
> characters long. Is this one of the specifications?)
>
> 2. Could I simply extend the `training_text` file, then generate my
> training data with custom fonts and start training directly? Or are there
> other files that should be changed after changing this file? If so, which
> ones, and how do I regenerate them?
>
> Best Regards
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/f40d972a-50d8-4a17-b69c-3f83271b3af8%40googlegroups.com
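Neither message gives a formal spec for `training_text`. What follows is a minimal Python sketch for question 2, under stated assumptions. It treats the 60-character line length observed in the question as a target worth keeping, though that is not a documented Tesseract requirement. It reports the line count and longest line of an existing `training_text`, then appends extra text wrapped to that width. The helper names (`line_stats`, `wrap_corpus`, `extend_training_text`) and `MAX_LINE_CHARS` are hypothetical. The sketch only touches the text file itself and does not answer whether other langdata files need regenerating afterwards.

```python
import textwrap
from pathlib import Path

# Assumption: 60 is the line length observed in the question,
# not a documented Tesseract limit.
MAX_LINE_CHARS = 60


def line_stats(lines):
    """Return (number of non-empty lines, length of the longest one).

    len() counts Unicode code points, so for Arabic or Devanagari text with
    combining marks the rendered line is shorter than this count suggests.
    """
    non_empty = [line for line in lines if line.strip()]
    longest = max((len(line) for line in non_empty), default=0)
    return len(non_empty), longest


def wrap_corpus(text, width=MAX_LINE_CHARS):
    """Re-wrap free text into lines of at most `width` code points.

    Blank lines are treated as paragraph breaks. Words are never split, so a
    single word longer than `width` ends up alone on an over-long line.
    """
    wrapped = []
    for paragraph in text.split("\n\n"):
        words = " ".join(paragraph.split())  # collapse internal whitespace
        if words:
            wrapped.extend(
                textwrap.wrap(words, width=width, break_long_words=False)
            )
    return wrapped


def extend_training_text(original_path, extra_text, out_path):
    """Hypothetical helper: write a copy of training_text plus wrapped extra text."""
    original = Path(original_path).read_text(encoding="utf-8").splitlines()
    combined = original + wrap_corpus(extra_text)
    Path(out_path).write_text("\n".join(combined) + "\n", encoding="utf-8")
    return line_stats(combined)
```

Writing to a separate `out_path` leaves the stock `langdata_lstm` file untouched, so the generated data can be compared against the original 84-line baseline.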