eng_pcb.traineddata is a traineddata file that I trained starting from eng.traineddata. I did LSTM training with the tesstrain git repo to improve OCR detection rather than recognition.
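For context on this setup: as far as I understand, LSTM training with tesstrain yields an LSTM-only traineddata, so such a model has no legacy components to load and needs the LSTM engine (`--oem 1`). The `read_params_file: Can't open tessedit_char_blacklist=,;:` part of the traceback quoted below also suggests the blacklist was passed as a config file name rather than as a `-c` variable. Here is a minimal sketch of building the pytesseract `config` string under those assumptions. `build_tesseract_config` is a hypothetical helper, not part of pytesseract:

```python
def build_tesseract_config(blacklist="", oem=1, tessdata_dir=None):
    """Return an option string for pytesseract's `config` argument.

    Assumptions (not confirmed in this thread): the model is LSTM-only,
    so oem=1 is the default here, and the blacklist is meant to be a
    Tesseract variable set with -c.
    """
    parts = []
    if tessdata_dir is not None:
        # Note: a path containing spaces would need extra quoting.
        parts += ["--tessdata-dir", str(tessdata_dir)]
    # --oem 1 selects the LSTM engine only; --oem 0 is the legacy engine.
    parts += ["--oem", str(oem)]
    if blacklist:
        # Without the leading -c, tesseract treats the argument as a
        # config file name and fails to open it.
        parts += ["-c", f"tessedit_char_blacklist={blacklist}"]
    return " ".join(parts)


config = build_tesseract_config(blacklist=",;:")
print(config)
```

With pytesseract installed, the resulting string could be passed as `pytesseract.image_to_string(image, lang="eng_pcb", config=config)`.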
Final error: the legacy components could not be found in eng_pcb.traineddata.

On Monday, April 22, 2024 at 6:43:54 PM UTC+2 zdenop wrote:

> No, you are not using best float tessdata files from:
> https://github.com/tesseract-ocr/tessdata_best/blob/main/eng.traineddata
> There is nothing like eng_pcb.traineddata. (read your error message)
>
> Zdenko
>
> On Mon, Apr 22, 2024 at 17:40 Surya VaraPrasad Alla <asvp...@gmail.com> wrote:
>
>> Hello,
>>
>> I get a similar error:
>>
>> pytesseract.pytesseract.TesseractError: (1, "read_params_file: Can't open
>> tessedit_char_blacklist=,;: Error: Tesseract (legacy) engine requested, but
>> components are not present in
>> external/tesstrain/data/eng_pcb/eng_pcb.traineddata!! Failed loading
>> language 'eng_pcb' Tesseract couldn't load any languages! Could not
>> initialize tesseract.")
>>
>> tesseract --version:
>> tesseract -v
>> tesseract 4.1.1
>> leptonica-1.82.0
>> libgif 5.1.9 : libjpeg 8d (libjpeg-turbo 2.1.1) : libpng 1.6.37 :
>> libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.2 : libopenjp2 2.4.0
>> Found AVX512BW
>> Found AVX512F
>> Found AVX2
>> Found AVX
>> Found FMA
>> Found SSE
>> Found libarchive 3.6.0 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8
>> liblz4/1.9.3 libzstd/1.4.8
>>
>> I am using the best float tessdata files from:
>> https://github.com/tesseract-ocr/tessdata_best/blob/main/eng.traineddata
>>
>> I also tried some of the possibilities in
>> https://github.com/ocrmypdf/OCRmyPDF/issues/209
>>
>> I am looking for the source of the issue. Could someone help me
>> understand it, so I can work further?
>>
>> On Tuesday, January 19, 2021 at 5:30:46 PM UTC+1 Shree Devi Kumar wrote:
>>
>>> > *wget https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata*
>>>
>>> That is not correct. You need to get the `raw` file.
>>>
>>> https://github.com/tesseract-ocr/tessdata_best/raw/master/eng.traineddata
>>>
>>> *wget https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata*
>>>
>>> On Tue, Jan 19, 2021 at 9:49 PM Roparzh Hemon <roparz...@gmail.com> wrote:
>>>
>>>> I downloaded it as you suggested, and as the terminal output below
>>>> shows, the file is now present at the correct place:
>>>>
>>>> $ file /home/mbalambala/tesseract/tessdata/eng.traineddata
>>>> /home/mbalambala/tesseract/tessdata/eng.traineddata: HTML document,
>>>> UTF-8 Unicode text, with very long lines
>>>>
>>>> $ echo $TESSDATA_PREFIX
>>>> /home/mbalambala/tesseract/tessdata
>>>>
>>>> but the error message stays exactly the same:
>>>>
>>>> $ tesseract Downloads/p1.pdf p1
>>>> Error opening data file
>>>> /home/mbalambala/tesseract/tessdata/eng.traineddata
>>>> Please make sure the TESSDATA_PREFIX environment variable is set to
>>>> your "tessdata" directory.
>>>> Failed loading language 'eng'
>>>> Tesseract couldn't load any languages!
>>>> Could not initialize tesseract.
>>>>
>>>> Whatever the real problem is, the error message is not detecting it.
>>>>
>>>> On Sunday, January 17, 2021 at 10:37:22 AM UTC+1 ... wrote:
>>>>
>>>>> Run the following command in order to get the eng.traineddata file
>>>>> within the tessdata directory: *wget
>>>>> https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata*
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/47e8b734-5de9-4624-8872-ed91ac8775b4n%40googlegroups.com.
>>>
>>> --
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
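
The 2021 part of this thread comes down to the `file` output reporting "HTML document": a GitHub `blob` URL serves the web page for the file, not the file itself. As a rough illustration, the sketch below checks whether a downloaded traineddata file is actually an HTML page and rewrites a `blob` URL to the `raw` form mentioned above. The helper names `looks_like_html` and `blob_to_raw_url` are hypothetical, and the HTML check is only a heuristic on the first bytes of the file:

```python
def looks_like_html(path, probe_bytes=512):
    """Heuristically detect an HTML page saved under a .traineddata name."""
    with open(path, "rb") as handle:
        head = handle.read(probe_bytes)
    # Saved web pages may begin with blank lines before the doctype.
    head = head.lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))


def blob_to_raw_url(url):
    """Turn a GitHub .../blob/<branch>/<file> URL into its .../raw/... form."""
    return url.replace("/blob/", "/raw/", 1)


blob_url = "https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata"
print(blob_to_raw_url(blob_url))
```

If `looks_like_html` returns True for the file under TESSDATA_PREFIX, downloading again from the `raw` URL should replace the HTML page with the real binary file.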