Re: [tesseract-ocr] Diacriticals Training

Greg Jay Sun, 13 Dec 2020 22:40:56 -0800

Thank you

> On Dec 11, 2020, at 7:13 AM, shree <shreesh...@gmail.com> wrote:
> 
> For Sanskrit in Devanagari and IAST, you can try the traineddata files from 
> https://github.com/Shreeshrii/tesstrain-Sanskrit-IAST 
> <https://github.com/Shreeshrii/tesstrain-Sanskrit-IAST>
> 
> For Sanskrit alone, you can try the traineddata file from 
> https://github.com/Shreeshrii/tesstrain-sanPlusMinus 
> <https://github.com/Shreeshrii/tesstrain-sanPlusMinus>
Thanks. I’m not sure exactly what to do with these links or the files they 
point to.


> These have the float models, to improve speed they can be compressed using 
> `combine_tessdata -c`

Sorry, but I don’t know what this means.
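
A rough sketch of what the quoted steps might look like in practice, for
anyone else following the thread. It assumes a traineddata file has been
downloaded from one of the repositories above into a local folder. The file
name `my_model.traineddata`, the folder `models`, and the helper names are
placeholders of mine, not names taken from the repositories. The only commands
used are `combine_tessdata -c` (from the quoted message) and the standard
`tesseract` command line with `--tessdata-dir` and `-l`.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def language_name(traineddata: Path) -> str:
    """Tesseract's -l value is the file name without '.traineddata'."""
    return traineddata.stem


def compress_command(traineddata: Path) -> List[str]:
    """Build the command that converts the float LSTM model to integer.

    As far as I know this rewrites the given file, so it is safer to
    run it on a copy and keep the original float model.
    """
    return ["combine_tessdata", "-c", str(traineddata)]


def ocr_command(image: Path, out_base: Path,
                tessdata_dir: Path, lang: str) -> List[str]:
    """Build a tesseract call that reads models from tessdata_dir."""
    return ["tesseract", str(image), str(out_base),
            "--tessdata-dir", str(tessdata_dir), "-l", lang]


def run(cmd: List[str]) -> None:
    """Run cmd if the program is installed, otherwise just print it."""
    if shutil.which(cmd[0]) is None:
        print("not installed, would run:", " ".join(cmd))
        return
    subprocess.run(cmd, check=True)
```

With a downloaded file at `models/my_model.traineddata` (hypothetical),
`run(compress_command(Path("models/my_model.traineddata")))` would do the
compression, and `run(ocr_command(Path("page.png"), Path("out"),
Path("models"), "my_model"))` would write the recognized text to `out.txt`.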

> I would appreciate feedback on how well these work compared to the official 
> `san` and `Devanagari` files.

I would be happy to give feedback. I have been using `san`, but was unaware 
that you can also use `Devanagari`. What is the difference?

> I had done some training for grantha using the Noto fonts. But to be usable, 
> I need more training data of actual line images and their groundtruth 
> transcription. If you can provide that, I'll be happy to retrain it.

I would be happy to provide more examples of Grantha if you tell me how to 
make “actual line images” and “groundtruth transcription”.

I can make images of Grantha. What format should they be in?

I can also provide the transliteration in IAST or ISO 15919, or in an Indic 
script such as Devanagari.
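
For reference, my understanding of the tesstrain convention is that each
training sample is one image of a single text line (`.tif` or `.png`) plus a
plain-text file with the same base name and the extension `.gt.txt` holding
the one-line transcription. The sketch below checks a folder against that
convention. The function name and the report keys are my own, and it
deliberately ignores the `.bin.png` and `.nrm.png` variants, so treat it as an
illustration and not as part of tesstrain.

```python
from pathlib import Path
from typing import Dict, List

IMAGE_SUFFIXES = (".tif", ".png")  # simplification: plain .tif/.png only
GT_SUFFIX = ".gt.txt"


def check_ground_truth(folder: Path) -> Dict[str, List[str]]:
    """Report which line images and transcriptions pair up in folder."""
    images = set()
    transcripts = set()
    for path in folder.iterdir():
        if path.name.endswith(GT_SUFFIX):
            transcripts.add(path.name[: -len(GT_SUFFIX)])
        elif path.suffix in IMAGE_SUFFIXES:
            images.add(path.stem)

    # A transcription should contain exactly one line of text.
    not_single_line = []
    for base in sorted(images & transcripts):
        text = (folder / (base + GT_SUFFIX)).read_text(encoding="utf-8")
        if len(text.strip().splitlines()) != 1:
            not_single_line.append(base)

    return {
        "paired": sorted(images & transcripts),
        "missing_transcription": sorted(images - transcripts),
        "missing_image": sorted(transcripts - images),
        "not_single_line": not_single_line,
    }
```

Calling `check_ground_truth(Path("my-ground-truth"))` on a folder of samples
(the folder name is a placeholder) returns the base names in each group, so
unpaired or multi-line entries can be fixed before they are sent for training.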

Sorry if I show my lack of understanding here.

