See
https://github.com/tesseract-ocr/tesseract/issues/654#issuecomment-274574951

This was for Devanagari and Indic languages.

Also see
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#training-text-requirements
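
To check the line-count and line-length observations from the question below against any `training_text` file, a minimal sketch along these lines may help. The function name `training_text_stats` and the 60-character default are only illustrative assumptions taken from the question, not a documented Tesseract rule or tool.

```python
from pathlib import Path


def training_text_stats(path, max_len=60):
    """Summarize a training text file.

    Returns (non_empty_line_count, longest_line_length, over_limit_line_numbers).
    max_len=60 mirrors the observation in the question; it is an assumption,
    not a documented Tesseract requirement.
    """
    text = Path(path).read_text(encoding="utf-8")
    count = 0
    longest = 0
    over_limit = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are not counted as training lines
        count += 1
        # len() counts Unicode code points, not rendered glyphs, so for
        # scripts with combining marks this is only a rough measure.
        length = len(line)
        longest = max(longest, length)
        if length > max_len:
            over_limit.append(lineno)
    return count, longest, over_limit
```

Calling `training_text_stats("ara.training_text")` (the file name here is an assumed example) returns the number of non-empty lines, the longest line length, and the 1-based numbers of any lines over the limit.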

On Thu, Oct 10, 2019 at 12:45 PM peter bence <peterbence....@gmail.com>
wrote:

> I'm working with the Arabic `langdata_lstm`, whose `training_text` file
> has only 84 lines, which I believe is too few to train a reliable model.
> Reading the `training_text` file, I see randomly generated text with no
> meaning. At first I thought this was specific to Arabic, but later I
> found that it is the same for all other languages.
>
> *My questions are:*
>
> 1. What specifications are followed when generating these `training_text`
> files? (For example, I can see that each line is no more than 60
> characters long. Is this one of the specifications?)
>
> 2. Could I simply extend the `training_text` file, generate my training
> data with custom fonts, and start training directly? Or are there other
> files that should be changed after changing this file? If so, which
> files are they, and how do I regenerate them?
>
> Best Regards



-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduUi%3DOGxD3-U4cec7ggJYTA9DmSMwvUWcxSfzWvtZH%3DUEw%40mail.gmail.com.
