Thank you for your reply, Shreeshrii!

Indeed, the finetune method is a much better solution for my problem. Thanks 
to the logs and data provided in your repo, I realized that I don't need to 
generate every single MRZ code separately (I'm sure it was mentioned 
somewhere <dummy>). In fact, the process of making tiffs with boxes and then 
lstmf's was oddly long (loading lines in the form of pages also takes much 
less time). Using merged data, it is now just a matter of seconds. I don't 
know if it affected accuracy, but now I'm generating every code in one .txt 
file and then processing it.
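
For anyone following along, here is a minimal sketch of what I mean by 
putting every code in one .txt file. It is only an illustration: the names 
write_training_text and mrz.training_text are made up, the lines are 
random characters from the MRZ alphabet (44 per line, as in a passport 
MRZ), and no check digits are computed, so these are not valid MRZ codes.

```python
import random
import string

# MRZ character set: upper-case letters, digits and the '<' filler.
MRZ_ALPHABET = string.ascii_uppercase + string.digits + "<"


def random_mrz_line(rng, length=44):
    """Return one random MRZ-like line (no check digits, illustration only)."""
    return "".join(rng.choice(MRZ_ALPHABET) for _ in range(length))


def write_training_text(path, n_lines, seed=0):
    """Write n_lines random MRZ-like lines to a single text file, one per line."""
    rng = random.Random(seed)
    lines = [random_mrz_line(rng) for _ in range(n_lines)]
    with open(path, "w", encoding="ascii", newline="\n") as f:
        f.write("\n".join(lines) + "\n")
    return lines


if __name__ == "__main__":
    # Hypothetical output name; use whatever your training script expects.
    write_training_text("mrz.training_text", 1000)
```

The single file can then be rendered and processed in one pass instead of 
once per code.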

I've managed to make my own traineddata based on the Polish language, and 
the results are outstanding. Thank you very much!

Usually I've avoided the tesstrain.sh script and tried to use my own scripts, 
just to customize the process and control it. When it combines the language 
model, I've spotted that it makes some dawg files. Is that because I'm using 
already existing language data? If so, how can I generate langdata myself 
for a custom language? The documentation isn't very clear on this. I know 
that they are created by combine_lang_model based on the wordlist 
(langdata). I don't need it at the moment, but I think it's a good idea to 
clear that up in case I need to do some training from scratch, although I 
know that's a rare case.

Thank you for taking the time to solve my problem! :)
