You have not stated the version of tesseract that you are using.
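
In case it helps, here is a minimal sketch for reporting the installed version. It uses only the Python standard library and assumes the `tesseract` binary is on your PATH; the function name is my own, not part of any Tesseract tooling.

```python
import shutil
import subprocess


def tesseract_version_report(binary="tesseract"):
    """Return the text printed by `<binary> --version`, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some builds print the version banner to stderr, so keep both streams.
    return (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    report = tesseract_version_report()
    print(report if report is not None else "tesseract not found on PATH")
```

Pasting the full output into your reply also shows which image libraries your build was compiled with.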

>We downloaded some online training data available for the language Malayalam

You have not said where you got it. Are these the official traineddata
files (from the tessdata, tessdata_best, or tessdata_fast repositories)?

>we found that few special characters in the language are not picked up by the training data properly.

Which characters?

>Current achieved 60% accuracy

With the LSTM engine (Tesseract 4 and later), you should expect better
results than that.
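
To make sure the LSTM engine is the one being used, it can be selected explicitly on the command line. The sketch below only builds the command. It assumes Tesseract 4 or later and an installed `mal` traineddata file; `sample.png`, the function name, and the page segmentation mode are placeholders of mine that you should adjust to your images.

```python
import shlex


def build_tesseract_command(image_path, lang="mal", oem=1, psm=6):
    """Build a tesseract command line that writes recognized text to stdout.

    --oem 1 selects the LSTM engine only (Tesseract 4 and later).
    --psm 6 treats the page as a single uniform block of text (an assumption).
    """
    return [
        "tesseract", image_path, "stdout",
        "-l", lang,
        "--oem", str(oem),
        "--psm", str(psm),
    ]


if __name__ == "__main__":
    # "sample.png" is a hypothetical file name.
    print(shlex.join(build_tesseract_command("sample.png")))
```

The returned list can be passed to `subprocess.run` as is once the binary and language data are installed.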

Please share a sample image with its expected result.
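
If you have the expected text for a sample, a short script can show which characters are being misread and give a rough accuracy figure. This is only a sketch under my own assumptions: it compares Unicode code points after NFC normalization, so a Malayalam conjunct or vowel sign counts as several units, and the accuracy it reports is not necessarily the same measure as your 60%.

```python
from collections import Counter
from difflib import SequenceMatcher
import unicodedata


def compare_ocr(expected, actual):
    """Return (accuracy, confusions) for an expected text and its OCR output.

    accuracy   : fraction of expected code points that were matched
    confusions : Counter mapping (expected_piece, ocr_piece) to a count;
                 an empty string on either side means an insertion or deletion
    """
    expected = unicodedata.normalize("NFC", expected)
    actual = unicodedata.normalize("NFC", actual)
    matcher = SequenceMatcher(None, expected, actual, autojunk=False)
    matched = 0
    confusions = Counter()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            matched += i2 - i1
        else:
            confusions[(expected[i1:i2], actual[j1:j2])] += 1
    accuracy = matched / len(expected) if expected else 1.0
    return accuracy, confusions


if __name__ == "__main__":
    # Replace these placeholder strings with the ground truth and the OCR output.
    accuracy, confusions = compare_ocr("abcd", "abxd")
    print(f"accuracy: {accuracy:.2%}")
    for (want, got), count in confusions.most_common():
        print(f"{want!r} read as {got!r}: {count}")
```

The list of confusions is what answers the "which characters" question, and it is more useful to post than the percentage alone.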

You can also try

https://ocr.sanskritdictionary.com/



On Sun, Mar 14, 2021, 00:41 avinash singh <avinasht...@gmail.com> wrote:

> Hello,
>
> We are working on a project for underprivileged kids, and we need to
> build an OCR system for the Malayalam language.
>
> We downloaded some online training data available for Malayalam. The
> current accuracy is around 60%, and we found that a few special characters
> in the language are not picked up properly by the training data.
>
> So we wanted to fine-tune the current training data. We did some research
> and downloaded jTessBoxEditor to create training data, but we couldn't
> edit the incorrect characters.
>
> Then we tried the QT-Box editor. We were able to edit the incorrect
> letters, but we couldn't generate the training data through the software.
>
> Finally, we tried Cygwin with the command line to generate the custom
> data, but we failed to combine the training data.
>
> As this is for an NGO, our company wants to close this project with the
> currently achieved 60% accuracy. I really wish to complete this, as the
> English translation is completely wrong. Can someone please guide us on
> how to train the data?
>
> Any help would be much appreciated
> Thanks in advance
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/84a6fc1f-300a-4aac-85b8-99c47a7d88f4n%40googlegroups.com
>
>
