Re: [tesseract-ocr] Fine-tuning via tesstrain repo gives me poorer results than built-in eng model

2020-09-27 Thread Grad
@shree thank you for the advice, it was helpful. I managed to get everything working satisfactorily: after adding more training images, I now get perfect results (446 pass, 0 fail)! Furthermore, these results come from using the built-in "eng" model. I ended up not needing to re-train or

Re: [tesseract-ocr] Fine-tuning via tesstrain repo gives me poorer results than built-in eng model

2020-09-27 Thread Shree Devi Kumar
Thank you for sharing the results of your trial with fine-tuning and getting better results with the official traineddata after pre-processing the images. Hope your notes will help other users with similar questions. On Sun, Sep 27, 2020, 20:51 Grad wrote: > @shree thank you for the advice, it

[tesseract-ocr] Diacriticals Training

2020-09-27 Thread shreyansh dwivedi
Hello everyone, I want to train some diacriticals that are not present in the Latin trained model. Apart from Latin, I also tried the Vietnamese and Latvian trained models, but some of the diacriticals are missing from those models too. Some of the missing characters that I need to recognise are listed below. ṭ Ṭ ṅ
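A quick way to pin down which characters are actually missing is to compare the required characters against the character set of a model. The following is only a minimal sketch: `describe` and `missing_chars` are hypothetical helper names, and the `available` string stands in for a character list that you would have to obtain from the model yourself. It uses only Python's standard `unicodedata` module and compares in NFC form, so that a precomposed character such as ṭ and its decomposed form (t plus a combining dot below) are treated as the same character.

```python
import unicodedata


def describe(chars):
    """Return (character, code point, NFD code points) for each character."""
    rows = []
    for ch in chars:
        nfd = unicodedata.normalize("NFD", ch)
        rows.append((ch, f"U+{ord(ch):04X}", [f"U+{ord(c):04X}" for c in nfd]))
    return rows


def missing_chars(required, available):
    """List required characters that are absent from `available`.

    Both strings are normalized to NFC first. Assumption: every required
    character has a precomposed NFC form; sequences that stay decomposed
    after NFC would be checked one code point at a time.
    """
    available_nfc = set(unicodedata.normalize("NFC", available))
    missing = []
    for ch in unicodedata.normalize("NFC", required):
        if ch.isspace() or ch in available_nfc or ch in missing:
            continue
        missing.append(ch)
    return missing


if __name__ == "__main__":
    required = "\u1e6d \u1e6c \u1e45"    # the three characters quoted above
    available = "abcdefghijklmnopqrstuvwxyz"  # placeholder character set
    for row in describe(required.replace(" ", "")):
        print(row)
    print("missing:", missing_chars(required, available))
```

Characters reported as missing cannot be recognised by that model without further training, whatever pre-processing is applied to the images.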

Re: [tesseract-ocr] Diacriticals Training

2020-09-27 Thread Shree Devi Kumar
I am currently running a training run based on synthetic training data for Sanskrit to support both Devanagari script with Vedic accents and IAST (Roman with diacritics support). I will share the traineddata with you and others who are interested to test how well it works with real life image