> When it's combining language model I've spotted that it's making some dawg files.
Yes, it takes the files from the langdata repo specified in the training command.

You could change langdata/pol/pol.wordlist to contain only the LAST NAMES and GIVEN NAMES, pol.punc to contain only <, and the number formats in pol.numbers to the MRZ number patterns (i.e. whatever customizations your use case requires).

I am not sure how much the dawgs help with the LSTM engine, but you can try after customizing to see if you get improved results.

On Thu, Sep 6, 2018 at 4:23 PM, <kaminski.robert...@gmail.com> wrote:

> Thank you for your reply Shreeshrii!
>
> Indeed, the finetune method is a much, much better solution for my problem.
> Thanks to your logs and the data provided in the repo I realized that I don't
> need to generate every single MRZ code separately (I'm sure it was mentioned
> somewhere). In fact the process of making tiffs with boxes and then lstmf's
> was oddly long (also, loading lines in the form of pages takes much less
> time). Using merged data is now just a matter of seconds. I don't know if it
> affected accuracy, but now I'm generating every code in one .txt file and
> then processing it.
>
> I've managed to make my own traineddata based on the Polish language and the
> results are outstanding. Thank you very much!
>
> Usually I've avoided the tesstrain.sh script and tried to use my own, just to
> customize the process and control it. When it's combining the language model
> I've spotted that it's making some dawg files. Is it because I'm using
> already existing language data? If so, how can I generate langdata myself for
> a custom language? In this case the documentation isn't so clear. I know that
> it's created by combine_lang_model based on the wordlist (langdata). I don't
> need it at the moment, but I think it's a good idea to clear that up in case
> I need to do some training from scratch, although I know it's a rare case.
>
> Thank you for taking your time to solve my problem!
> :)
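
For anyone wanting to try the langdata customization suggested in the reply, here is a minimal Python sketch, not a tested recipe. It only writes a trimmed-down pol.wordlist and pol.punc. The helper name write_langdata, the sample names, and the one-entry-per-line layout are my assumptions and are not taken from the thread. pol.numbers is deliberately left out, because the pattern syntax for that file is not described here.

```python
import tempfile
from pathlib import Path


def write_langdata(lang_dir, lang, names, punctuation):
    """Write <lang>.wordlist and <lang>.punc, one entry per line.

    Hypothetical helper: the one-entry-per-line layout is an assumption.
    """
    lang_dir = Path(lang_dir)
    lang_dir.mkdir(parents=True, exist_ok=True)

    # MRZ text is upper-case, so normalize the names and drop
    # duplicates and blank entries before writing them out.
    words = sorted({n.strip().upper() for n in names if n.strip()})

    wordlist_path = lang_dir / f"{lang}.wordlist"
    punc_path = lang_dir / f"{lang}.punc"
    wordlist_path.write_text("\n".join(words) + "\n", encoding="utf-8")
    punc_path.write_text("\n".join(punctuation) + "\n", encoding="utf-8")
    return wordlist_path, punc_path


if __name__ == "__main__":
    # Hypothetical sample names; replace with your own name lists.
    sample_names = ["Kowalski", "Nowak", "Anna", "Jan", "nowak"]
    # Write into a temporary directory so nothing real is overwritten.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "langdata" / "pol"
        for path in write_langdata(target, "pol", sample_names, ["<"]):
            print(path.name, repr(path.read_text(encoding="utf-8")))
```

To use it for real, point lang_dir at your own copy of langdata/pol and then rerun the training command so the dawgs are rebuilt from the new files. Whether that helps the LSTM engine is, as noted above, something you would have to measure.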