I have exactly the same problem with Amharic. I have found three characters missing, and they are degrading the OCR results. Dear Shree, could you please help me?
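For anyone hitting the same thing, here is a rough way to confirm which characters are missing. It is only a sketch and rests on several assumptions. It assumes the unicharset has already been unpacked from the traineddata (for example with `combine_tessdata -u`, which for a 4.0 model should produce a file such as `amh.lstm-unicharset`). It also assumes the usual layout where the first line is the entry count and every later line starts with the unichar followed by space-separated properties. The function names and file names are made up for illustration. The check works per code point, so it is approximate for scripts where unicharset entries are multi-character clusters.

```python
import unicodedata
from pathlib import Path


def read_unicharset(path):
    """Return the set of unichar strings listed in a Tesseract unicharset file.

    Assumption: line 1 is the entry count; each later line begins with the
    unichar, followed by space-separated properties.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    entries = set()
    for line in lines[1:]:
        first = line.split(" ")[0]
        if first:
            entries.add(first)
    return entries


def missing_characters(text, unichars):
    """Return the non-whitespace characters of `text` absent from `unichars`, sorted."""
    normalized = unicodedata.normalize("NFC", text)
    wanted = {ch for ch in normalized if not ch.isspace()}
    return sorted(wanted - unichars)


def describe(ch):
    """Format one character as code point plus Unicode name for a report line."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"
```

Typical use would be to load a representative text sample, call `missing_characters(sample, read_unicharset("amh.lstm-unicharset"))`, and paste the `describe()` output for each result into the report so the exact code points are unambiguous.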
On Friday, January 6, 2017 at 3:50:38 PM UTC+3 shree wrote:
> I have uploaded the modified nor.traineddata at
>
> https://github.com/Shreeshrii/tessdata4alpha/blob/master/nor.traineddata
>
> See the attached log and info file for the commands used in training. It took
> about 9 hours on my PC, for only about 1700 iterations, and then my PC froze, so
> I rebooted and created the traineddata for norlayer0.853_1615.lstm, i.e.
> 0.853% character error rate at iteration number 1615.
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Fri, Jan 6, 2017 at 5:59 PM, ShreeDevi Kumar <shree...@gmail.com> wrote:
>
>> @Peter, have you tried the 4.0.0alpha version yet?
>>
>> @Ludvig F. Aarstad - "Add a layer" training worked for adding 'Æ'. I will
>> upload the new traineddata so that you can test. You will need the 4.0.alpha
>> version for testing.
>>
>> Here are a couple of the training tifs and the OCRed text.
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Fri, Jan 6, 2017 at 5:01 PM, Peter <pe...@peterkrantz.se> wrote:
>>
>>> On Thursday, January 5, 2017 at 04:39:01 UTC+1, shree wrote:
>>>>
>>>> Ray is planning to retrain the languages for the new 4.0.0 version
>>>> sometime in January. So it would be helpful if you could open an issue on
>>>> https://github.com/tesseract-ocr/langdata/issues with this information.
>>>>
>>>
>>> Is it possible to contribute training data for this effort? I realise
>>> Swedish will not be at the top of the list, but I think it would be easy to
>>> involve some of the research community here in contributing training data
>>> if it could improve the language model.
>>>
>>> /Peter
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-ocr+unsubscr...@googlegroups.com.
>>> To post to this group, send email to tesser...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/9788db26-bb8a-4861-b29e-80db2b5a687f%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.