eng_pcb.traineddata is a traineddata file that I trained starting from
eng.traineddata.

I did LSTM training to improve the OCR detection rather than the
recognition, using the tesstrain Git repo.

Final error: couldn't find the legacy components in eng_pcb.traineddata.
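
For reference, here is a minimal sketch of how the pytesseract config could be built for an LSTM-only model such as eng_pcb. It rests on two assumptions that nothing in this thread confirms: that the `read_params_file: Can't open tessedit_char_blacklist=,;:` part of the quoted error comes from passing the variable without `-c` (Tesseract then treats the argument as a config file name), and that requesting the LSTM engine with `--oem 1` is acceptable here. The helper names `build_config` and `run_ocr` are hypothetical.

```python
import shlex


def build_config(blacklist: str, oem: int = 1) -> str:
    """Return a pytesseract config string for an LSTM-only traineddata.

    --oem 1 asks for the LSTM engine only, so Tesseract does not look for
    legacy components. Variables must be passed with -c; a bare
    name=value argument is treated as the name of a config file.
    """
    variable = f"tessedit_char_blacklist={blacklist}"
    # shlex.quote protects characters such as ';' when the string is
    # split back into separate arguments.
    return f"--oem {oem} -c {shlex.quote(variable)}"


def run_ocr(image_path: str, lang: str = "eng_pcb") -> str:
    """Run OCR with the config above (needs pytesseract and Pillow)."""
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(
        Image.open(image_path), lang=lang, config=build_config(",;:")
    )


if __name__ == "__main__":
    print(build_config(",;:"))
```

If the model lives outside the default tessdata directory, the directory would also have to be passed to Tesseract; that part is left out of the sketch.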

On Monday, April 22, 2024 at 6:43:54 PM UTC+2 zdenop wrote:

> No, you are not using the best (float) tessdata files from: 
> https://github.com/tesseract-ocr/tessdata_best/blob/main/eng.traineddata
> There is no eng_pcb.traineddata there. (Read your error message.)
>
>
> Zdenko
>
>
> po 22. 4. 2024 o 17:40 Surya VaraPrasad Alla <asvp...@gmail.com> 
> napísal(a):
>
>> Hello,
>>
>> I am getting a similar error:
>>
>> pytesseract.pytesseract.TesseractError: (1, "read_params_file: Can't open 
>> tessedit_char_blacklist=,;: Error: Tesseract (legacy) engine requested, but 
>> components are not present in 
>> external/tesstrain/data/eng_pcb/eng_pcb.traineddata!! Failed loading 
>> language 'eng_pcb' Tesseract couldn't load any languages! Could not 
>> initialize tesseract.")
>>
>> tesseract --version:
>> tesseract -v
>> tesseract 4.1.1
>>  leptonica-1.82.0
>>   libgif 5.1.9 : libjpeg 8d (libjpeg-turbo 2.1.1) : libpng 1.6.37 : 
>> libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.2 : libopenjp2 2.4.0
>>  Found AVX512BW
>>  Found AVX512F
>>  Found AVX2
>>  Found AVX
>>  Found FMA
>>  Found SSE
>>  Found libarchive 3.6.0 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8 
>> liblz4/1.9.3 libzstd/1.4.8
>>
>> I am using the best (float) tessdata files from: 
>> https://github.com/tesseract-ocr/tessdata_best/blob/main/eng.traineddata
>>
>> I also tried some of the suggestions in 
>> https://github.com/ocrmypdf/OCRmyPDF/issues/209
>>
>> I am looking for the source of the issue. Could someone who understands 
>> it help, so that I can continue working?
>> On Tuesday, January 19, 2021 at 5:30:46 PM UTC+1 Shree Devi Kumar wrote:
>>
>>> > wget 
>>> > https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata
>>>
>>> That is not correct. You need to get the `raw` file.
>>>
>>> https://github.com/tesseract-ocr/tessdata_best/raw/master/eng.traineddata
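
The difference between the two URL forms can be sketched in a few lines of Python. The helper below rewrites a GitHub `blob` page URL into the corresponding `raw` file URL. The function name `to_raw_url` is hypothetical, and the rewrite assumes the usual `github.com/<owner>/<repo>/blob/<branch>/<path>` layout.

```python
from urllib.parse import urlsplit, urlunsplit


def to_raw_url(url: str) -> str:
    """Rewrite a github.com 'blob' page URL to its 'raw' file URL."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Expected path: /<owner>/<repo>/blob/<branch>/<file path>
    if parts.netloc == "github.com" and len(segments) > 3 and segments[3] == "blob":
        segments[3] = "raw"
    return urlunsplit(parts._replace(path="/".join(segments)))


if __name__ == "__main__":
    print(to_raw_url(
        "https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata"
    ))
```

URLs that do not match the assumed layout are returned unchanged, so the helper is safe to apply to a URL that is already in `raw` form.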
>>>
>>>
>>> On Tue, Jan 19, 2021 at 9:49 PM Roparzh Hemon <roparz...@gmail.com> 
>>> wrote:
>>>
>>>>
>>>> I downloaded it as you suggested, and as the terminal output below 
>>>> shows, the file is now present in the correct place:
>>>>
>>>> $ file /home/mbalambala/tesseract/tessdata/eng.traineddata
>>>> /home/mbalambala/tesseract/tessdata/eng.traineddata : HTML document, 
>>>> UTF-8 Unicode text, with very long lines
>>>>
>>>> $ echo $TESSDATA_PREFIX
>>>> /home/mbalambala/tesseract/tessdata
>>>>
>>>> However, the error message stays exactly the same:
>>>>
>>>> $ tesseract Downloads/p1.pdf p1
>>>> Error opening data file 
>>>> /home/mbalambala/tesseract/tessdata/eng.traineddata
>>>> Please make sure the TESSDATA_PREFIX environment variable is set to 
>>>> your "tessdata" directory.
>>>> Failed loading language 'eng'
>>>> Tesseract couldn't load any languages!
>>>> Could not initialize tesseract.
>>>>
>>>>
>>>> Whatever the real problem is, the error message does not point to it.
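
A quick way to catch this situation before running Tesseract is to look at the file itself. The sketch below assumes that TESSDATA_PREFIX points directly at the tessdata directory (as the error message asks) and flags a `.traineddata` file that is missing or that starts like an HTML page, which is what `file` reported above. The function name `check_traineddata` and the 512-byte sniffing window are illustrative choices, not part of Tesseract.

```python
import os
from pathlib import Path
from typing import List, Optional


def check_traineddata(lang: str, prefix: Optional[str] = None) -> List[str]:
    """Return a list of problems found with <prefix>/<lang>.traineddata."""
    prefix = prefix if prefix is not None else os.environ.get("TESSDATA_PREFIX")
    if not prefix:
        return ["TESSDATA_PREFIX is not set"]

    path = Path(prefix) / f"{lang}.traineddata"
    if not path.is_file():
        return [f"{path} does not exist"]

    problems = []
    with path.open("rb") as handle:
        head = handle.read(512).lstrip().lower()
    # A saved GitHub web page starts with an HTML doctype or <html> tag,
    # whereas a real traineddata file is binary.
    if head.startswith((b"<!doctype html", b"<html")):
        problems.append(f"{path} is an HTML page, not a traineddata file")
    return problems


if __name__ == "__main__":
    for problem in check_traineddata("eng"):
        print(problem)
```

An empty result only means that none of these specific checks failed; it does not prove that the file is a valid model.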
>>>>
>>>> On Sunday, January 17, 2021 at 10:37:22 AM UTC+1 ... wrote:
>>>>
>>>>> Run the following command in the tessdata directory to get the 
>>>>> eng.traineddata file:
>>>>> wget https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata
>>>>>
>>>>
>>>>  
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/47e8b734-5de9-4624-8872-ed91ac8775b4n%40googlegroups.com.
>>>>
>>>
>>>
>>> -- 
>>>
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>
