> On Jan 27, 2021, at 1:42 AM, Shree Devi Kumar <shreesh...@gmail.com> wrote:
>
> >The Internet Archive has switched to using Tesseract for all our OCR,
>
> I am so happy to hear this. It will be great to have the Indic languages that
> were marked as non-ocrable so far be converted to text correctly on Internet
> Archive.
>
> Is there any page with instructions to do this? Can a language be specified
> while OCRing? eg. Better results are many times received using
> script/Devanagari instead of san for Sanskrit.
>
> Regarding your question about tessdata, there have only been minor changes to
> tessdata files but adding a tag is a good idea. I suggest you post this as a
> feature request in the repo.

I hope someone adds Grantha script as there are many texts on Archive.org <http://archive.org/> in this script.

Greg