Re: [tesseract-ocr] Re: How to use the "latin sanskrit" language?

Jajwalya Karajgikar Wed, 15 Sep 2021 00:09:03 -0700

Hello Frank, I am wondering whether you have worked on "3. OCR Kannada
inscriptions and keep them in OCR'ed format". I am very interested in
multilingual OCR for Kannada inscriptions. You mention epigraphy
documents; might they be the Epigraphia Carnatica? If so, I would be
grateful for any knowledge you can share.
Thank you,
Jajwalya
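Tesseract can handle the multilingual part on its own. Its `-l` option takes several models joined with `+` (for example `kan+eng` for Kannada plus English), and `--tessdata-dir` points it at a folder of `.traineddata` files, such as the experimental IAST models Shree mentions further down. Note that `--tessdata-dir` replaces the default model folder, so every model named in `-l` must be in that folder. The following is a minimal sketch that calls the `tesseract` CLI from Python. It assumes Tesseract 4 or later is on the PATH with the official `kan` and `eng` models installed. `build_tesseract_cmd`, `ocr_to_text` and the folder name `tessdata_extra` are hypothetical names for illustration. This thread does not give the exact IAST model file names.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence

# Hypothetical helper names; adjust image paths and model names to your setup.


def build_tesseract_cmd(
    image: str,
    out_base: str,
    langs: Sequence[str],
    tessdata_dir: Optional[str] = None,
    psm: Optional[int] = None,
) -> List[str]:
    """Build an argv list for the tesseract command-line tool."""
    cmd = ["tesseract", image, out_base]
    if tessdata_dir is not None:
        # Folder holding ALL .traineddata files to load (e.g. kan, eng, IAST models).
        cmd += ["--tessdata-dir", tessdata_dir]
    # Multiple languages are joined with '+', e.g. "kan+eng".
    cmd += ["-l", "+".join(langs)]
    if psm is not None:
        # Page segmentation mode, e.g. 6 = assume a single uniform block of text.
        cmd += ["--psm", str(psm)]
    return cmd


def ocr_to_text(image: str, langs: Sequence[str], tessdata_dir: Optional[str] = None) -> str:
    """Run tesseract and return the recognized text; needs tesseract on PATH."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # The output base "stdout" makes tesseract print the text instead of writing a .txt file.
    cmd = build_tesseract_cmd(image, "stdout", langs, tessdata_dir)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `ocr_to_text("page.png", ["kan", "eng"])` would return the recognized text of a hypothetical scan `page.png`. Keeping the command construction in a separate function makes it easy to inspect the exact arguments before running them.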
On Friday, May 15, 2020 at 5:39:07 AM UTC-4 Frank wrote:


> Hi, I've just installed Tesseract to OCR some old epigraphy documents. I
> used Google Colab as well as a Mac install. Everything works, except that I
> can't get the IAST characters recognized: they are substituted (e.g. ā
> becomes i). I tried setting the lang option to lat, but it doesn't find a
> Latin language package, and installing the Latin script didn't help. I've
> searched through all of Shree's work on GitHub but can't figure this out.
> I have three objectives:
> 1. OCR English pages and search through them
> 2. It would be nice to convert the Sanskrit into IAST and search through it
> 3. OCR Kannada inscriptions and keep them in OCR'ed format (optional, a
> "good to have")
>
> Writing the search code doesn't seem tough; the IAST
> recognition/transcription is the challenge. Accuracy is not very important,
> as I have to search through volumes of inscriptions for specific keywords
> to recategorize many miscategorized inscriptions on my research topic.
> The sheer volume also makes the Google OCR solution Shree suggested
> elsewhere impractical.
>
> I'm new to Python and Tesseract, though I have programmed in the past.
> Any help is appreciated.
>
>
> On Friday, July 27, 2018 at 6:29:09 AM UTC+2, shree wrote:
>
>> You can try the IAST ones from
>> https://github.com/Shreeshrii/tessdata_shreetest?files=1
>>
>> On Fri 27 Jul, 2018, 8:27 AM Shree Devi Kumar, <shree...@gmail.com> 
>> wrote:
>>
>>> There is no official traineddata for san_latn or iast. I have created some
>>> experimental versions, but the output is not fully accurate.
>>>
>>>
>>>
>>> On Fri 27 Jul, 2018, 12:21 AM John Muccigrosso, <jmuc...@gmail.com> 
>>> wrote:
>>>
>>>> You're telling Tesseract that your text is in Latin. You need the
>>>> traineddata for san-lat.
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>>
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesser...@googlegroups.com.
>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/d2fc7942-16a2-48f0-9651-920616179d54%40googlegroups.com
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
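Frank needs keyword hits more than exact transcriptions, so the search side can be made tolerant of both IAST diacritics and OCR substitutions such as ā being read as i. The following is a minimal sketch that uses only the Python standard library. It normalizes each line to NFC, folds away diacritics, and compares each word against the keyword with `difflib.SequenceMatcher`. The function names `fold` and `find_keyword` and the 0.8 similarity cutoff are illustrative assumptions, not something proposed in the thread.

```python
import difflib
import re
import unicodedata
from typing import List, Tuple


def fold(text: str) -> str:
    """Lowercase and drop combining marks, e.g. 'Śrī' -> 'sri'."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Category 'Mn' covers combining marks such as the macron and dot below.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def find_keyword(text: str, keyword: str, cutoff: float = 0.8) -> List[Tuple[int, str]]:
    """Return (line_number, word) pairs whose folded form is close to keyword.

    Similarity is difflib's ratio (0..1), so a single OCR substitution
    in a long word (e.g. 'a' read as 'i') still scores above the cutoff.
    """
    target = fold(keyword)
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # NFC composes 'a' + combining macron into 'ā', so \w+ keeps
        # each IAST word in one piece instead of splitting at the mark.
        line = unicodedata.normalize("NFC", line)
        for word in re.findall(r"\w+", line):
            score = difflib.SequenceMatcher(None, fold(word), target).ratio()
            if score >= cutoff:
                hits.append((lineno, word))
    return hits
```

With these definitions, `find_keyword("Rimacandra", "Rāmacandra")` still reports a hit (similarity 0.9). Lowering `cutoff` catches more OCR errors at the price of more false positives. That trade-off may be acceptable when the goal is only to shortlist inscriptions for manual review.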

