Re: [tesseract-ocr] add new characters

Timo Struppi Wed, 28 Oct 2020 13:51:11 -0700

Hello, problem solved.

I just did a Linux installation and the error was gone.

Thanks again for your file and help!


On Wednesday, October 28, 2020 at 3:06:18 AM UTC+1 shree wrote:

> Did you copy the traineddata file to
> /usr/share/tesseract-ocr/4.00/tessdata?
> What's the value of TESSDATA_PREFIX in your 'env' output?
>
> What's the output of the following commands?
>
> ls -l /usr/share/tesseract-ocr/4.00/tessdata/Sanskrit-1017-fast.traineddata
>
> combine_tessdata -d /usr/share/tesseract-ocr/4.00/tessdata/Sanskrit-1017-fast.traineddata
>
> tesseract --list-langs --tessdata-dir /usr/share/tesseract-ocr/4.00/tessdata
>
> tesseract --list-langs
>
> tesseract -v
>
>
> On Wednesday, October 28, 2020 at 3:04:01 AM UTC+5:30 Timo Struppi wrote:
>
>> Help! I get the following error. What am I doing wrong?
>>
>> Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/Sanskrit-1017-fast.traineddata
>> Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
>> Failed loading language 'Sanskrit-1017-fast'
>> Tesseract couldn't load any languages!
>> Could not initialize tesseract.
>>
>> On Saturday, October 24, 2020 at 5:53:55 PM UTC+2 Timo Struppi wrote:
>>
>>> *Perfect!* Thank you very much <3 That's what I was looking for:
>>> International Alphabet of Sanskrit Transliteration characters.
>>>
>>> Can you tell me in which folder I must place the .traineddata file?
>>>
>>> My configuration:
>>> tesseract 4.1.1
>>>  leptonica-1.79.0
>>>   libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
>>>  Found AVX
>>>  Found SSE
>>>  Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4
>>>
>>> Many thanks again for your fast help
>>>
>>> On Saturday, October 24, 2020 at 3:12:15 PM UTC+2 shree wrote:
>>>
>>>> Ray has suggested using plus-minus type of training for adding a couple 
>>>> of characters to the traineddata. Did you try that?
>>>>
>>>> Please share the training data you used (box/tiff pairs or lstmf files).
>>>>
>>>> I have done replace-a-layer training for Sanskrit. It adds the two
>>>> characters you want (in addition to many others required for Sanskrit
>>>> transliteration). See the sample image and attached output. The file is
>>>> available at
>>>> https://github.com/Shreeshrii/tess5training-sanskrit-iast/tree/main/tessdata/fast
>>>>
>>>>  
>>>>
>>>> On Sat, Oct 24, 2020 at 5:31 PM Timo Struppi <mac...@gmail.com> wrote:
>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>> I don't want to reinvent the wheel by creating a new language, but how
>>>>> do I add the letters ṛ and ī to the OCR?
>>>>>
>>>>> I have tried a lot (VietOCR, Linux Intelligent OCR Solution, following
>>>>> the few available tutorials, etc.) for several days, but I still have
>>>>> not managed to add a single letter.
>>>>>
>>>>>
>>>>> Many thanks in advance
>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to tesseract-oc...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/f23a9be3-dea4-46a6-8e21-dbe9c120d993n%40googlegroups.com.
>>>>>
>>>>
>>>>
>>>> -- 
>>>>
>>>> ____________________________________________________________
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>
>>>
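For anyone hitting the same "Error opening data file" message, the file checks quoted above can be sketched as a small shell function. This is only a rough sketch, not something from Tesseract itself: check_traineddata is a hypothetical helper name, and the script assumes that TESSDATA_PREFIX, when set, points directly at the "tessdata" directory, as the error message suggests. The default directory and the language name are the ones from this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): report whether the
# traineddata file for language $2 exists and is readable in directory $1.
check_traineddata() {
    dir=$1
    lang=$2
    file="$dir/$lang.traineddata"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir"
        return 1
    fi
    if [ ! -f "$file" ]; then
        echo "missing file: $file"
        return 1
    fi
    if [ ! -r "$file" ]; then
        echo "not readable: $file"
        return 1
    fi
    echo "ok: $file"
}

# Example call with the path and language name from this thread.
# Assumption: TESSDATA_PREFIX, if set, names the tessdata directory itself.
check_traineddata "${TESSDATA_PREFIX:-/usr/share/tesseract-ocr/4.00/tessdata}" \
    "Sanskrit-1017-fast" || true
```

If the function reports "ok" but Tesseract still fails to load the language, comparing the output of tesseract --list-langs with and without --tessdata-dir, as asked above, may show which directory is actually being searched.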

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/430aa390-b441-462c-a106-308a82d6f26fn%40googlegroups.com.
