You have not stated which version of Tesseract you are using.

> We downloaded some online training data available for the language Malayalam

You have not mentioned where you got it. Are these the official traineddata files?

> we found that few special characters in the language are not picked up by the training data properly.

Which characters?

> Current achieved 60% accuracy

With the LSTM engine, better results are expected. Please share a sample image along with its expected result.

You can also try https://ocr.sanskritdictionary.com/

On Sun, Mar 14, 2021, 00:41 avinash singh <avinasht...@gmail.com> wrote:

> Hello,
>
> We are working on a project for underprivileged kids, we need to build an
> OCR for the Malayalam language.
>
> We downloaded some online training data available for the language
> Malayalam, the current accuracy is around 60%, we found that few special
> characters in the language are not picked up by the training data properly.
>
> So we wanted to fine-tune the current training data, we did some research
> and then downloaded Jtessbox editor for creating training data but we
> couldn't edit the incorrect character.
>
> then we tried the QT-Box editor, we were able to edit the incorrect
> letters but we couldn't generate the training data through the software
>
> Finally, we tried Cygwin with the command line to generate the custom data
> but we failed to combine the training data
>
> As this is for an NGO our company wants to close this project with the
> current achieved 60% accuracy, I really wish to complete this as the
> English translation is completely wrong can someone please guide us on how
> to train the data
>
> Any help would be much appreciated
> Thanks in advance
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/84a6fc1f-300a-4aac-85b8-99c47a7d88f4n%40googlegroups.com
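
In case it helps with the version question and with trying the LSTM engine, here is a rough Python sketch. It assumes the `tesseract` binary is on your PATH and that a Malayalam model (`mal.traineddata`) is installed. The helper names `parse_version` and `build_ocr_command` and the file name `sample.png` are only placeholders of mine, not part of Tesseract. `--oem 1` asks for the LSTM-only engine in Tesseract 4 and later.

```python
import re
import shutil
import subprocess
from typing import List, Optional


def parse_version(banner: str) -> Optional[str]:
    """Pull the version number out of the `tesseract --version` banner."""
    match = re.search(r"tesseract\s+v?(\d+(?:\.\d+)*)", banner)
    return match.group(1) if match else None


def build_ocr_command(image_path: str, lang: str = "mal", oem: int = 1) -> List[str]:
    """Build a command line that writes the recognized text to stdout.

    lang="mal" is the Malayalam language code; oem=1 selects the LSTM-only engine.
    """
    return ["tesseract", image_path, "stdout", "-l", lang, "--oem", str(oem)]


if __name__ == "__main__":
    if shutil.which("tesseract") is None:
        print("tesseract not found on PATH")
    else:
        result = subprocess.run(
            ["tesseract", "--version"], capture_output=True, text=True
        )
        # Some releases print the banner to stderr, so look at both streams.
        print("version:", parse_version(result.stdout + result.stderr))
        # sample.png is a hypothetical file name; the command is only printed here.
        print("command:", " ".join(build_ocr_command("sample.png")))
```

Posting the reported version together with the exact command you ran and a sample image would make it much easier to tell whether the problem lies in the model or in the input.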