Re: [tesseract-ocr] Re: Can't encode transcription error with Sinhala language

ShreeDevi Kumar Thu, 18 Jan 2018 20:14:23 -0800

I also noticed that you are training with just one font and evaluating
with that same font.


While probably unrelated to the errors you are getting, LSTM training from
scratch requires a large number of fonts and a large amount of training
text. You should try fine-tuning instead, to adapt the current best model
to the font you need.

On 19-Jan-2018 7:23 AM, "ShreeDevi Kumar" <shreesh...@gmail.com> wrote:

> Take a look at the lines that are getting the error and check that all
> characters are in the unicharset generated by training.
>
> The size of lstm-unicharset is different from the one generated from the
> training text; note the message shown at the beginning of training.
>
> Check the GitHub issues; one of the most recent ones is about differing
> unicharset sizes and their impact on training.
>
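
The character check suggested in the quoted message could be sketched
roughly as below. This is a first-pass illustration, not part of the
Tesseract tools: the function names are hypothetical, and it assumes the
unicharset file has the entry count on its first line, the unichar as the
first space-delimited token on each later line, and the special entries
NULL, Joined and |Broken|0|1. It compares code points only, so a
character it reports is certainly absent, but an empty result does not
prove that every cluster on the line can be encoded.

```python
import unicodedata

# Assumed special entries that are not ordinary characters.
SPECIAL = {"NULL", "Joined", "|Broken|0|1"}

def load_unicharset(lines):
    """Return the set of unichar strings from the lines of a unicharset file.

    Assumes line 1 is the entry count and every later line starts with
    the unichar, followed by a space and its properties.
    """
    entries = set()
    for line in lines[1:]:
        token = line.rstrip("\n").split(" ", 1)[0]
        if token and token not in SPECIAL:
            entries.add(token)
    return entries

def missing_chars(text_line, unichars):
    """Return sorted code points of text_line that occur in no entry."""
    covered = set()
    for entry in unichars:
        covered.update(entry)  # entries may be multi-code-point clusters
    return sorted(
        {ch for ch in text_line if not ch.isspace() and ch not in covered}
    )

def describe(ch):
    """Format a character as its code point plus Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"
```

To use it, read the unicharset with open(path, encoding="utf-8").readlines(),
pass the result to load_unicharset, then call missing_chars on each
training-text line that triggers the error and print describe(ch) for
every character returned.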

