I am currently running training on synthetic data for Sanskrit, supporting both Devanagari script with Vedic accents and IAST (Roman script with diacritics). I will share the traineddata so that you and anyone else interested can test how well it works on real-life images.
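Until that traineddata is available, here is a minimal sketch of one step in a synthetic-data workflow. It is not the setup used for the run above. It assumes the usual approach where a plain-text training file is rendered into line images (for example with Tesseract's `text2image` and its `--text` option). The script NFC-normalizes the IAST characters from the quoted question, so that `t` + U+0323 becomes the single code point `ṭ` (U+1E6D). It then samples words into lines until every target character occurs a minimum number of times. The word list and all function names are hypothetical. A real corpus should come from actual Sanskrit/IAST text.

```python
import random
import unicodedata
from collections import Counter

# Characters listed in the question. "ṭh" and "ḍh" are digraphs
# (letter + "h"), so covering ṭ and ḍ also covers them.
TARGETS = ["ṭ", "Ṭ", "ṅ", "ḍ", "ṇ", "ṃ", "ṣ", "Ḥ", "ḥ"]

# Hypothetical seed vocabulary; replace with real IAST text for actual training.
WORDS = ["aṣṭa", "ṭīkā", "gaṅgā", "daṇḍa", "saṃskṛta", "duḥkha", "rāmaḥ"]


def nfc(text: str) -> str:
    """Compose base letter + combining mark (e.g. 't' + U+0323) into one code point."""
    return unicodedata.normalize("NFC", text)


def variants(word: str) -> list[str]:
    """Lower, Title and UPPER forms, so capitals such as Ṭ and Ḥ also appear."""
    w = nfc(word)
    return [w, w.capitalize(), w.upper()]


def build_training_text(words, targets, min_count=3, words_per_line=6, seed=0):
    """Sample words into lines until every target character occurs min_count times."""
    rng = random.Random(seed)
    pool = [v for w in words for v in variants(w)]
    targets = [nfc(t) for t in targets]

    # Fail early instead of looping forever if a target can never be produced.
    available = set("".join(pool))
    missing = [t for t in targets if t not in available]
    if missing:
        raise ValueError(f"no word in the vocabulary contains: {missing}")

    counts = Counter()
    lines = []
    while any(counts[t] < min_count for t in targets):
        line = " ".join(rng.choice(pool) for _ in range(words_per_line))
        counts.update(line)  # counts individual characters of the line
        lines.append(line)
    return lines, counts


if __name__ == "__main__":
    lines, counts = build_training_text(WORDS, TARGETS)
    print("\n".join(lines))
    print({t: counts[t] for t in TARGETS})
```

Normalizing to NFC before training matters because the same visible glyph can be typed either as a precomposed character or as a base letter plus a combining dot. Mixing the two forms in the ground truth splits one character across separate unicharset entries.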
On Mon, Sep 28, 2020, 10:43 shreyansh dwivedi <advocates...@gmail.com> wrote:
> Hello everyone,
> I want to train some diacritics that are not present in the Latin trained model. Apart from Latin, I tried the Vietnamese and Latvian trained models, but some of the diacritics are missing from those models too. Some of the missing characters I need to recognise are listed below:
> ṭ
> Ṭ
> ṅ
> ṭh
> ḍ
> ḍh
> ṇ
> ṃ
> ṣ
> Ḥ
> ḥ
> I want to train Tesseract to recognise the above diacritics in text images.
> Any help would be appreciated, and a from-scratch explanation would be a great way to understand the process.
> Thank you!
>
> --
> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAMREWd6R%2Bec5r%3D77%2BRWGM7PUKZPqqJT%2BkNX6r9zwijvW5sxykQ%40mail.gmail.com.