Re: [tesseract-ocr] Re: How to use the "latin sanskrit" language?

Greg Jay Sun, 20 Nov 2022 22:13:48 -0800

I have installed Tesseract 5.2.0 on a MacBook Pro (M1 Apple Silicon) running 
macOS 12.6.1 Monterey using Homebrew.


I have downloaded IAST.traineddata.

Moving this file into /opt/homebrew/cellar/tesseract/5.2.0/share/tessdata 
or /opt/homebrew/opt/tesseract-lang/share/tessdata doesn't seem to work.

I get error messages.

How do I load it into the program and use it for OCRing IAST diacritics?
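
A minimal sketch of one way to check this, not a diagnosis, since the actual error messages are not shown here. It assumes the tesseract command-line options `--tessdata-dir`, `--list-langs` and `-l`, and that the value passed to `-l` has to match the file name (`IAST.traineddata` goes with `-l IAST`). The directory `~/tessdata`, the image `page.png` and the helper names `find_traineddata` and `build_ocr_command` are hypothetical and only for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def find_traineddata(tessdata_dir, lang):
    """Return the path to <lang>.traineddata inside tessdata_dir, or None if absent."""
    candidate = Path(tessdata_dir) / f"{lang}.traineddata"
    return candidate if candidate.is_file() else None


def build_ocr_command(image, output_base, tessdata_dir, lang):
    """Build a tesseract command line that names the tessdata directory explicitly."""
    return [
        "tesseract",
        str(image),
        str(output_base),
        "--tessdata-dir", str(tessdata_dir),
        "-l", lang,
    ]


if __name__ == "__main__":
    tessdata_dir = Path.home() / "tessdata"  # hypothetical location of the download
    lang = "IAST"                            # must match IAST.traineddata

    if shutil.which("tesseract") is None:
        print("tesseract is not on PATH")
    elif find_traineddata(tessdata_dir, lang) is None:
        print(f"{lang}.traineddata not found in {tessdata_dir}")
    else:
        # Show which languages tesseract can see in that directory.
        subprocess.run(
            ["tesseract", "--list-langs", "--tessdata-dir", str(tessdata_dir)],
            check=False,
        )
        # OCR the hypothetical page.png into page.txt.
        subprocess.run(
            build_ocr_command("page.png", "page", tessdata_dir, lang),
            check=False,
        )
```

If `IAST` does not appear in the `--list-langs` output, the file is probably not in the directory tesseract is reading, or its name does not match the language code.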

Also, are there any traineddata files for ISO 15919 diacritics, or for the 
Indian Grantha script?

Thanks in advance

Greg




On Tuesday, September 14, 2021 at 9:09:04 PM UTC-10 jajw...@gmail.com wrote:

>
> Hello Frank, I am wondering if you have worked on " 3. OCR Kannada 
> inscriptions and keep them in OCR'ed format". I am very interested in 
> multilingual OCR-ing for Kannada inscriptions. You mention Epigraphy 
> documents, might they be Epigraphia Carnatica? In which case I would be 
> grateful for any knowledge you have to share.
> Thank you,
> Jajwalya
> On Friday, May 15, 2020 at 5:39:07 AM UTC-4 Frank wrote:
>
>> Hi, I've just installed tesseract to OCR some old Epigraphy documents. I 
>> used Google Colab as well as a Mac install. All fine, except I am unable to 
>> get the text with IAST: characters are substituted (ā becomes i etc). I 
>> tried using the lang attribute as lat but it doesn't find a latin lang 
>> package and installing latin script didn't help. I've searched through all of 
>> Shree's work on GitHub, but can't figure this out. I have three objectives:
>> 1. OCR english pages and search through them
>> 2. It would be nice to convert the sanskrit into IAST and search through 
>> it
>> 3. OCR Kannada inscriptions and keep them in OCR'ed format; this is 
>> optional, a "good to have"
>>
>> Writing the search code doesn't seem to be tough; however, the IAST 
>> recognition/transcription is the challenge. Accuracy is not very important 
>> as I have to search through volumes of inscriptions for specific key words 
>> to recategorize a lot of miscategorised inscriptions on my research topic. 
>> Any help would be appreciated. The volume itself doesn't make the Google OCR 
>> solution suggested by Shree elsewhere practicable.
>>
>> I'm new at Python and tesseract, though I have programmed in the past.
>> Any help is appreciated.
>>
>>
>> On Friday, July 27, 2018 at 6:29:09 AM UTC+2, shree wrote:
>>
>>> You can try IAST ones from 
>>> https://github.com/Shreeshrii/tessdata_shreetest?files=1
>>>
>>> On Fri 27 Jul, 2018, 8:27 AM Shree Devi Kumar, <shree...@gmail.com> 
>>> wrote:
>>>
>>>> There is no official traineddata for san_latn or IAST. I have created 
>>>> some experimental versions but the output is not fully accurate.
>>>>
>>>>
>>>>
>>>> On Fri 27 Jul, 2018, 12:21 AM John Muccigrosso, <jmuc...@gmail.com> 
>>>> wrote:
>>>>
>>>>> You're telling tesseract that your text is in Latin. You need the 
>>>>> traineddata for san-lat.
>>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/0077c583-9e17-49fa-8f7f-06793c00969bn%40googlegroups.com.
