Hello, I am getting a similar error:

pytesseract.pytesseract.TesseractError: (1, "read_params_file: Can't open tessedit_char_blacklist=,;: Error: Tesseract (legacy) engine requested, but components are not present in external/tesstrain/data/eng_pcb/eng_pcb.traineddata!! Failed loading language 'eng_pcb' Tesseract couldn't load any languages! Could not initialize tesseract.")

Version output:

$ tesseract -v
tesseract 4.1.1
 leptonica-1.82.0
  libgif 5.1.9 : libjpeg 8d (libjpeg-turbo 2.1.1) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.2 : libopenjp2 2.4.0
 Found AVX512BW
 Found AVX512F
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.6.0 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8 liblz4/1.9.3 libzstd/1.4.8

I am using the tessdata_best (float) model files from:
https://github.com/tesseract-ocr/tessdata_best/blob/main/eng.traineddata

I also tried some of the suggestions in https://github.com/ocrmypdf/OCRmyPDF/issues/209.

I am trying to find the source of the issue. Could someone who understands the cause point me to it, so I can continue from there?

On Tuesday, January 19, 2021 at 5:30:46 PM UTC+1 Shree Devi Kumar wrote:

> > *wget https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata*
>
> That is not correct. You need to get the `raw` file.
>
> https://github.com/tesseract-ocr/tessdata_best/raw/master/eng.traineddata
>
> On Tue, Jan 19, 2021 at 9:49 PM Roparzh Hemon <roparz...@gmail.com> wrote:
>
>> I downloaded it as you suggested, and as the terminal output below shows,
>> the file is now present at the correct place:
>>
>> $ file /home/mbalambala/tesseract/tessdata/eng.traineddata
>> /home/mbalambala/tesseract/tessdata/eng.traineddata: HTML document,
>> UTF-8 Unicode text, with very long lines
>>
>> $ echo $TESSDATA_PREFIX
>> /home/mbalambala/tesseract/tessdata
>>
>> but the error message stays exactly the same:
>>
>> $ tesseract Downloads/p1.pdf p1
>> Error opening data file
>> /home/mbalambala/tesseract/tessdata/eng.traineddata
>> Please make sure the TESSDATA_PREFIX environment variable is set to your
>> "tessdata" directory.
>> Failed loading language 'eng'
>> Tesseract couldn't load any languages!
>> Could not initialize tesseract.
>>
>> Whatever the real problem is, the error message is not detecting it.
>>
>> On Sunday, January 17, 2021 at 10:37:22 AM UTC+1 ... wrote:
>>
>>> Run the following command in order to get the eng.traineddata file
>>> within the tessdata directory: *wget
>>> https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata*
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/47e8b734-5de9-4624-8872-ed91ac8775b4n%40googlegroups.com
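Shree Devi Kumar's correction can be made concrete with a small Python sketch. As the `file` output in this thread shows, fetching a GitHub `/blob/` URL saves the HTML page that displays the file, whereas the `/raw/` form of the same path (as in the tessdata_best link in her reply) serves the file itself. The helper `blob_to_raw` is hypothetical and assumes the usual `https://github.com/<owner>/<repo>/blob/<branch>/<path>` layout; it only rewrites the URL and downloads nothing.

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_raw(url: str) -> str:
    """Rewrite a GitHub 'blob' page URL into the matching 'raw' file URL.

    Hypothetical helper; assumes https://github.com/<owner>/<repo>/blob/<branch>/<path>.
    Any query string or fragment is dropped.
    """
    parts = urlsplit(url)
    if parts.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {url}")
    segments = parts.path.split("/")
    # Expected shape: ['', owner, repo, 'blob', branch, *path]
    if len(segments) < 6 or segments[3] != "blob":
        raise ValueError(f"not a /blob/ URL: {url}")
    segments[3] = "raw"
    return urlunsplit((parts.scheme, parts.netloc, "/".join(segments), "", ""))


if __name__ == "__main__":
    blob = "https://github.com/tesseract-ocr/tessdata_best/blob/main/eng.traineddata"
    # Prints https://github.com/tesseract-ocr/tessdata_best/raw/main/eng.traineddata
    print(blob_to_raw(blob))
```

The printed URL is the one to pass to wget; the blob URL from the January 17 message would otherwise keep producing an HTML file named eng.traineddata.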
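Roparzh Hemon's `file` check is what exposes the problem: the eng.traineddata in the tessdata directory is an HTML page, so Tesseract cannot open it even though TESSDATA_PREFIX points at the right place. A rough Python equivalent is sketched below, assuming Tesseract 4 behaviour where TESSDATA_PREFIX names the tessdata directory itself (as the error message asks). The function name `check_traineddata` and the HTML test (looking for `<!doctype html` or `<html` at the start of the file) are assumptions for illustration; passing the check does not prove the model is valid.

```python
import os
from pathlib import Path
from typing import Optional


def check_traineddata(lang: str, prefix: Optional[str] = None) -> str:
    """Return a one-line diagnosis for <prefix>/<lang>.traineddata.

    Hypothetical helper. `prefix` defaults to $TESSDATA_PREFIX, taken to be
    the tessdata directory itself. The HTML test is a heuristic mirroring the
    `file` output in this thread; "ok" only means the file exists and does
    not look like an HTML page.
    """
    prefix = prefix or os.environ.get("TESSDATA_PREFIX")
    if not prefix:
        return "TESSDATA_PREFIX is not set"
    path = Path(prefix) / f"{lang}.traineddata"
    if not path.is_file():
        return f"missing: {path}"
    with path.open("rb") as fh:
        # Skip leading whitespace and compare case-insensitively.
        head = fh.read(512).lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return f"HTML page, not model data (download the raw file): {path}"
    return f"ok: {path}"


if __name__ == "__main__":
    print(check_traineddata("eng"))
```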
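The pytesseract error at the top of this post seems to combine two separate problems, if the message is read literally. `read_params_file: Can't open tessedit_char_blacklist=,;:` suggests the variable was passed as a bare word: on the tesseract command line, trailing words are treated as config file names, and variables need `-c name=value`. The "Tesseract (legacy) engine requested" part suggests `--oem 0` or `--oem 2` was requested, while tessdata_best and tesstrain models normally contain only the LSTM engine, so `--oem 1` would match them. The sketch below only builds the option string that pytesseract's `config` argument forwards (as far as I know, pytesseract splits it with `shlex.split` and appends the words to the command). The helper name `build_config` is hypothetical, and the exact call in the original code is unknown.

```python
import shlex
from typing import Mapping, Optional

# Characters this sketch refuses, because shlex.split() would treat them
# specially (whitespace splits words; quotes and backslash escape).
_UNSAFE = set(" \t\n'\"\\")


def build_config(variables: Mapping[str, str], oem: int = 1,
                 psm: Optional[int] = None) -> str:
    """Build an option string for pytesseract's `config` argument.

    Hypothetical helper. Every variable goes through `-c name=value`; a bare
    `name=value` word would be read by tesseract as a config *file* name.
    oem=1 asks for the LSTM engine only, which is what tessdata_best and
    tesstrain models provide.
    """
    args = ["--oem", str(oem)]
    if psm is not None:
        args += ["--psm", str(psm)]
    for name, value in variables.items():
        word = f"{name}={value}"
        if _UNSAFE & set(word):
            raise ValueError(f"unsupported character in {word!r}")
        args += ["-c", word]
    return " ".join(args)


if __name__ == "__main__":
    cfg = build_config({"tessedit_char_blacklist": ",;:"})
    print(cfg)               # --oem 1 -c tessedit_char_blacklist=,;:
    print(shlex.split(cfg))  # ['--oem', '1', '-c', 'tessedit_char_blacklist=,;:']
    # Assumed call (needs pytesseract and Pillow; "page.png" is a placeholder):
    # pytesseract.image_to_string(Image.open("page.png"), lang="eng_pcb", config=cfg)
```

Unquoted values are used on purpose: pytesseract reportedly splits in non-POSIX mode on Windows, where shell-style quotes would be passed through literally.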