No, you are not using the best float tessdata files from:
https://github.com/tesseract-ocr/tessdata_best/blob/main/eng.traineddata
There is no eng_pcb.traineddata there. (Read your error message.)
Zdenko

On Mon, Apr 22, 2024 at 17:40, Surya VaraPrasad Alla <asvp.0...@gmail.com> wrote:

> Hello,
>
> I get a similar error:
>
> pytesseract.pytesseract.TesseractError: (1, "read_params_file: Can't open
> tessedit_char_blacklist=,;: Error: Tesseract (legacy) engine requested, but
> components are not present in
> external/tesstrain/data/eng_pcb/eng_pcb.traineddata!! Failed loading
> language 'eng_pcb' Tesseract couldn't load any languages! Could not
> initialize tesseract.")
>
> tesseract --version:
> tesseract -v
> tesseract 4.1.1
> leptonica-1.82.0
> libgif 5.1.9 : libjpeg 8d (libjpeg-turbo 2.1.1) : libpng 1.6.37 :
> libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.2 : libopenjp2 2.4.0
> Found AVX512BW
> Found AVX512F
> Found AVX2
> Found AVX
> Found FMA
> Found SSE
> Found libarchive 3.6.0 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8
> liblz4/1.9.3 libzstd/1.4.8
>
> I am using the best float tessdata files from:
> https://github.com/tesseract-ocr/tessdata_best/blob/main/eng.traineddata
>
> I also tried some of the possibilities in
> https://github.com/ocrmypdf/OCRmyPDF/issues/209
>
> I am looking for the source of the issue. Could someone help me
> understand it, so I can work further?
>
> On Tuesday, January 19, 2021 at 5:30:46 PM UTC+1 Shree Devi Kumar wrote:
>
>> >*wget https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata*
>>
>> That is not correct. You need to get the `raw` file.
>>
>> https://github.com/tesseract-ocr/tessdata_best/raw/master/eng.traineddata
>>
>> *wget https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata*
>>
>> On Tue, Jan 19, 2021 at 9:49 PM Roparzh Hemon <roparz...@gmail.com>
>> wrote:
>>
>>> I downloaded it as you suggested, and as the terminal output below
>>> shows, the file is now present in the correct place:
>>>
>>> $ file /home/mbalambala/tesseract/tessdata/eng.traineddata
>>> /home/mbalambala/tesseract/tessdata/eng.traineddata : HTML document,
>>> UTF-8 Unicode text, with very long lines
>>>
>>> $ echo $TESSDATA_PREFIX
>>> /home/mbalambala/tesseract/tessdata
>>>
>>> but the error message stays exactly the same:
>>>
>>> $ tesseract Downloads/p1.pdf p1
>>> Error opening data file
>>> /home/mbalambala/tesseract/tessdata/eng.traineddata
>>> Please make sure the TESSDATA_PREFIX environment variable is set to your
>>> "tessdata" directory.
>>> Failed loading language 'eng'
>>> Tesseract couldn't load any languages!
>>> Could not initialize tesseract.
>>>
>>> Whatever the real problem is, the error message is not detecting it.
>>>
>>> On Sunday, January 17, 2021 at 10:37:22 AM UTC+1 ... wrote:
>>>
>>>> Run the following command in order to get the eng.traineddata file
>>>> within the tessdata directory: *wget
>>>> https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata*

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8y8f9X%2BUcRa8nADS3JDbS8Gn%3DZPtszgafmcSe3dt8yz1Q%40mail.gmail.com.
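
In the 2021 messages quoted above, `file` reports eng.traineddata as an "HTML document". That is the result of fetching a GitHub `blob` page instead of the `raw` file. Below is a minimal Python sketch of how one might catch that case before running tesseract. The helper names `to_raw_url` and `looks_like_html` are hypothetical, and the HTML check is only a simple heuristic (an assumption), not something Tesseract or pytesseract provides.

```python
def to_raw_url(url: str) -> str:
    """Rewrite a GitHub 'blob' page URL into the matching 'raw' file URL.

    Assumes the usual github.com/<owner>/<repo>/blob/<branch>/<path> layout.
    """
    return url.replace("/blob/", "/raw/", 1)


def looks_like_html(path: str) -> bool:
    """Return True if the file starts like an HTML page, not binary model data.

    Heuristic only: reads the first 512 bytes and looks for an HTML prologue.
    """
    with open(path, "rb") as fh:
        head = fh.read(512).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

For example, `to_raw_url("https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata")` yields the same URL with `/raw/` in place of `/blob/`. If `looks_like_html` returns True for the file under TESSDATA_PREFIX, the download should be repeated with the raw URL.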