If you used tesstrain, you trained the LSTM engine. Why, then, are you asking
Tesseract to use the legacy engine?
Do you understand what you are doing?

Zdenko
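
For illustration, here is a minimal sketch of how the LSTM engine could be requested from pytesseract. It assumes the model sits in the directory named in the error message and that the character blacklist was meant to be a variable rather than a config file (the "read_params_file: Can't open tessedit_char_blacklist=,;:" line suggests it was passed without -c). The function names, the page segmentation mode 6, and the image path argument are hypothetical and not taken from the original report.

```python
def build_tesseract_config(oem=1, psm=6, blacklist=",;:"):
    """Build a pytesseract config string (helper name is illustrative)."""
    # --oem 1 selects the LSTM engine only; --oem 0 would request the legacy
    # engine, which an LSTM-only traineddata file cannot provide.
    parts = [f"--oem {oem}", f"--psm {psm}"]  # psm 6 is an assumed default
    if blacklist:
        # Variables are set with -c NAME=VALUE. Without -c, the argument is
        # looked up as a config file, which matches the error in the report.
        parts.append(f"-c tessedit_char_blacklist={blacklist}")
    return " ".join(parts)


def ocr_with_lstm(image_path, lang="eng_pcb",
                  tessdata_dir="external/tesstrain/data/eng_pcb"):
    """Run OCR with the LSTM engine; requires pytesseract and tesseract."""
    import pytesseract  # imported lazily so the helper above works without it

    config = build_tesseract_config() + f" --tessdata-dir {tessdata_dir}"
    return pytesseract.image_to_string(image_path, lang=lang, config=config)


if __name__ == "__main__":
    print(build_tesseract_config())
```

Calling ocr_with_lstm("board.png") would then run the trained model, where "board.png" stands in for any input image.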


On Thu, 25 Apr 2024 at 11:35, Surya VaraPrasad Alla <asvp.0...@gmail.com>
wrote:

> eng_pcb.traineddata is a traineddata file trained starting from
> eng.traineddata.
>
> I did LSTM training with the tesstrain Git repo to improve the detection of
> OCR rather than the recognition.
>
> Final error: couldn't find the legacy components in eng_pcb.traineddata
>
> On Monday, April 22, 2024 at 6:43:54 PM UTC+2 zdenop wrote:
>
>> No, you are not using best float tessdata files from:
>> https://github.com/tesseract-ocr/tessdata_best/blob/main/eng.traineddata
>> There is no eng_pcb.traineddata there. (Read your error message.)
>>
>>
>> Zdenko
>>
>>
>> On Mon, 22 Apr 2024 at 17:40, Surya VaraPrasad Alla <asvp...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I get a similar response:
>>>
>>> pytesseract.pytesseract.TesseractError: (1, "read_params_file: Can't
>>> open tessedit_char_blacklist=,;: Error: Tesseract (legacy) engine
>>> requested, but components are not present in
>>> external/tesstrain/data/eng_pcb/eng_pcb.traineddata!! Failed loading
>>> language 'eng_pcb' Tesseract couldn't load any languages! Could not
>>> initialize tesseract.")
>>>
>>> tesseract --version:
>>> tesseract -v
>>> tesseract 4.1.1
>>>  leptonica-1.82.0
>>>   libgif 5.1.9 : libjpeg 8d (libjpeg-turbo 2.1.1) : libpng 1.6.37 :
>>> libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.2 : libopenjp2 2.4.0
>>>  Found AVX512BW
>>>  Found AVX512F
>>>  Found AVX2
>>>  Found AVX
>>>  Found FMA
>>>  Found SSE
>>>  Found libarchive 3.6.0 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8
>>> liblz4/1.9.3 libzstd/1.4.8
>>>
>>> I am using best float tessdata files from:
>>> https://github.com/tesseract-ocr/tessdata_best/blob/main/eng.traineddata
>>>
>>> I also tried some of the suggestions in
>>> https://github.com/ocrmypdf/OCRmyPDF/issues/209
>>>
>>> I am looking for the source of the issue. Could someone who understands it
>>> help, so that I can continue working?
>>> On Tuesday, January 19, 2021 at 5:30:46 PM UTC+1 Shree Devi Kumar wrote:
>>>
>>>> > wget https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata
>>>>
>>>> That is not correct. You need to get the `raw` file.
>>>>
>>>>
>>>> https://github.com/tesseract-ocr/tessdata_best/raw/master/eng.traineddata
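
As a rough sketch of the point above: a `blob` URL returns GitHub's HTML page, which is what the `file` output quoted further down reports, while the `raw` URL returns the model itself. The snippet below downloads from the raw URL given in this message and refuses to save anything that starts like an HTML page. The function names and the destination path are hypothetical, and the HTML check is only a simple heuristic.

```python
import shutil
import urllib.request

# Raw URL copied from the message above.
RAW_URL = "https://github.com/tesseract-ocr/tessdata_best/raw/master/eng.traineddata"


def looks_like_html(head: bytes) -> bool:
    """Return True if the first bytes of a download look like an HTML page."""
    start = head.lstrip().lower()
    return start.startswith(b"<!doctype html") or start.startswith(b"<html")


def download_traineddata(url: str, dest: str) -> None:
    """Download url to dest, rejecting HTML pages (illustrative helper)."""
    with urllib.request.urlopen(url) as resp:
        head = resp.read(512)
        if looks_like_html(head):
            raise ValueError(f"{url} returned an HTML page, not a traineddata file")
        # Only create the destination file once the content looks plausible.
        with open(dest, "wb") as out:
            out.write(head)
            shutil.copyfileobj(resp, out)


if __name__ == "__main__":
    # Destination is an assumed path inside the tessdata directory.
    download_traineddata(RAW_URL, "eng.traineddata")
```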
>>>>
>>>>
>>>>
>>>> On Tue, Jan 19, 2021 at 9:49 PM Roparzh Hemon <roparz...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> I downloaded it as you suggested, and as the terminal output below
>>>>> shows, the file is now present at the correct place :
>>>>>
>>>>> $ file /home/mbalambala/tesseract/tessdata/eng.traineddata
>>>>> /home/mbalambala/tesseract/tessdata/eng.traineddata : HTML document,
>>>>> UTF-8 Unicode text, with very long lines
>>>>>
>>>>> $ echo $TESSDATA_PREFIX
>>>>> /home/mbalambala/tesseract/tessdata
>>>>>
>>>>> but the error message stays exactly the same :
>>>>>
>>>>> $ tesseract Downloads/p1.pdf p1
>>>>> Error opening data file
>>>>> /home/mbalambala/tesseract/tessdata/eng.traineddata
>>>>> Please make sure the TESSDATA_PREFIX environment variable is set to
>>>>> your "tessdata" directory.
>>>>> Failed loading language 'eng'
>>>>> Tesseract couldn't load any languages!
>>>>> Could not initialize tesseract.
>>>>>
>>>>>
>>>>> Whatever the real problem is, the error message does not identify it.
>>>>>
>>>>> On Sunday, January 17, 2021 at 10:37:22 AM UTC+1 ... wrote:
>>>>>
>>>>>> Run the following command to download the eng.traineddata file into
>>>>>> the tessdata directory:
>>>>>> wget https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesseract-oc...@googlegroups.com.
>>>>>
>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/47e8b734-5de9-4624-8872-ed91ac8775b4n%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/47e8b734-5de9-4624-8872-ed91ac8775b4n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> ____________________________________________________________
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>

