If you used tesstrain, you trained the LSTM engine. Why do you then ask Tesseract to use the legacy engine? Do you understand what you are doing?
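To make the point concrete, here is a minimal sketch of a configuration string that requests the LSTM engine explicitly. It relies on two command-line options, `--oem` (1 selects LSTM only, 0 selects legacy only) and `-c VAR=VALUE` for setting a variable. The helper name `build_tesseract_config` is made up for illustration. The `read_params_file: Can't open tessedit_char_blacklist=,;:` part of the error suggests the blacklist was read as a config file name, which would happen if it was passed without `-c`; that is an inference from the message, not something confirmed in this thread.

```python
def build_tesseract_config(oem: int = 1, blacklist: str = "") -> str:
    """Build a config string for an LSTM-only traineddata (hypothetical helper).

    oem=1 asks for the LSTM engine; oem=0 would ask for the legacy engine,
    which a model trained with tesstrain does not contain.
    """
    if any(ch.isspace() for ch in blacklist):
        # Whitespace would split the argument when the string is tokenized.
        raise ValueError("blacklist must not contain whitespace")
    parts = ["--oem", str(oem)]
    if blacklist:
        # Variables are set with -c VAR=VALUE, not as a bare trailing argument.
        parts += ["-c", f"tessedit_char_blacklist={blacklist}"]
    return " ".join(parts)


config = build_tesseract_config(oem=1, blacklist=",;:")
print(config)  # --oem 1 -c tessedit_char_blacklist=,;:

# Possible usage, assuming pytesseract is installed and `image` is a PIL image:
# import pytesseract
# text = pytesseract.image_to_string(image, lang="eng_pcb", config=config)
```

Whether the blacklist is honoured by the LSTM engine in 4.1.1 is a separate question that this sketch does not settle.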
Zdenko

On Thu, Apr 25, 2024 at 11:35, Surya VaraPrasad Alla <asvp.0...@gmail.com> wrote:

> eng_pcb.traineddata is a traineddata starting with eng.traineddata
>
> i did lstm training to improve the detection of ocr rather than the
> recognition. i used tesstrain git repo.
>
> final error: couldn't find the legacy components in eng_pcb.traineddata
>
> On Monday, April 22, 2024 at 6:43:54 PM UTC+2 zdenop wrote:
>
>> No, you are not using best float tessdata files from:
>> https://github.com/tesseract-ocr/tessdata_best/blob/main/eng.traineddata
>> There is nothing like eng_pcb.traineddata. (read your error message)
>>
>> Zdenko
>>
>> On Mon, Apr 22, 2024 at 17:40, Surya VaraPrasad Alla <asvp...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I have the similar response
>>>
>>> pytesseract.pytesseract.TesseractError: (1, "read_params_file: Can't
>>> open tessedit_char_blacklist=,;: Error: Tesseract (legacy) engine
>>> requested, but components are not present in
>>> external/tesstrain/data/eng_pcb/eng_pcb.traineddata!! Failed loading
>>> language 'eng_pcb' Tesseract couldn't load any languages! Could not
>>> initialize tesseract.")
>>>
>>> tesseract --version:
>>> tesseract -v
>>> tesseract 4.1.1
>>> leptonica-1.82.0
>>> libgif 5.1.9 : libjpeg 8d (libjpeg-turbo 2.1.1) : libpng 1.6.37 :
>>> libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.2 : libopenjp2 2.4.0
>>> Found AVX512BW
>>> Found AVX512F
>>> Found AVX2
>>> Found AVX
>>> Found FMA
>>> Found SSE
>>> Found libarchive 3.6.0 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8
>>> liblz4/1.9.3 libzstd/1.4.8
>>>
>>> I am using best float tessdata files from:
>>> https://github.com/tesseract-ocr/tessdata_best/blob/main/eng.traineddata
>>>
>>> also tried some of possibilities in
>>> https://github.com/ocrmypdf/OCRmyPDF/issues/209
>>>
>>> I am looking for the source of the issue ---> could someone help if
>>> understood the source. so I can work further.
>>>
>>> On Tuesday, January 19, 2021 at 5:30:46 PM UTC+1 Shree Devi Kumar wrote:
>>>
>>>> >*wget
>>>> >https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata*
>>>>
>>>> That is not correct. You need to get the `raw` file.
>>>>
>>>> https://github.com/tesseract-ocr/tessdata_best/raw/master/eng.traineddata
>>>>
>>>> *wget https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata*
>>>>
>>>> On Tue, Jan 19, 2021 at 9:49 PM Roparzh Hemon <roparz...@gmail.com>
>>>> wrote:
>>>>
>>>>> I downloaded it as you suggested, and as the terminal output below
>>>>> shows, the file is now present at the correct place :
>>>>>
>>>>> $file /home/mbalambala/tesseract/tessdata/eng.traineddata
>>>>> /home/mbalambala/tesseract/tessdata/eng.traineddata : HTML document,
>>>>> UTF-8 Unicode text, with very long lines
>>>>>
>>>>> $ echo TESSDATA_PREFIX
>>>>> /home/mbalambala/tesseract/tessdata
>>>>>
>>>>> but the error message stays exactly the same :
>>>>>
>>>>> $ tesseract Downloads/p1.pdf p1
>>>>> Error opening data file
>>>>> /home/mbalambala/tesseract/tessdata/eng.traineddata
>>>>> Please make sure the TESSDATA_PREFIX environment variable is set to
>>>>> your "tessdata" directory.
>>>>> Failed loading language 'eng'
>>>>> Tesseract couldn't load any languages!
>>>>> Could not initialize tesseract.
>>>>>
>>>>> Whatever the real problem is, the error message is not detecting it.
>>>>>
>>>>> On Sunday, January 17, 2021 at 10:37:22 AM UTC+1 ... wrote:
>>>>>
>>>>>> Run the following command in order to get the eng.traineddata file
>>>>>> within the tessdata directory: *wget
>>>>>> https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata*
>>>>>>
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesseract-oc...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/47e8b734-5de9-4624-8872-ed91ac8775b4n%40googlegroups.com
>>>>> .
>>>>>
>>>>
>>>> --
>>>>
>>>> ____________________________________________________________
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>
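For the 2021 part of the thread quoted above (a `/blob/` URL was fetched with wget and `file` reported an HTML document), the following is a rough sketch of how one might rewrite the URL and sanity-check a download before pointing Tesseract at it. Both function names are hypothetical, and the HTML check is only a heuristic on the first bytes of the file, not a validation of the traineddata format.

```python
from pathlib import Path


def github_raw_url(url: str) -> str:
    """Turn a github.com '/blob/' page URL into the matching '/raw/' URL."""
    return url.replace("/blob/", "/raw/", 1)


def looks_like_html(path: Path, probe_bytes: int = 512) -> bool:
    """Return True if the file starts like an HTML page rather than binary data."""
    with open(path, "rb") as fh:
        head = fh.read(probe_bytes).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


blob = "https://github.com/tesseract-ocr/tessdata_best/blob/master/eng.traineddata"
print(github_raw_url(blob))
# https://github.com/tesseract-ocr/tessdata_best/raw/master/eng.traineddata
```

If `looks_like_html` returns True for a downloaded `eng.traineddata`, the file is most likely the GitHub web page and should be fetched again from the `/raw/` URL.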