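*tesseract executable problem:* for the tessdata path (here passed via --tessdata-dir, which overrides TESSDATA_PREFIX) you use a path containing a space, and you did not quote or escape it properly. That is why you get an error about a non-existent file ("C:\Program/eng.traineddata"): the unquoted path is cut at the space, so tesseract only sees "C:\Program". Solutions: a) use a path without special characters such as spaces, or b) learn how to properly quote paths in command-line options and environment variables.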
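To illustrate option b), here is a minimal sketch (not node-tesseract-ocr's own code) that calls the tesseract CLI through Node's `child_process.execFile` with an argument array instead of a single shell command string. Each array element reaches tesseract as exactly one argument, so "C:\Program Files\Tesseract-OCR\tessdata" is not split at the space. The helper names `buildTesseractArgs` and `runTesseract` are hypothetical, and the sketch assumes `tesseract` is on the PATH.

```js
// Minimal sketch (assumption): run the tesseract CLI without a shell, so a
// path such as "C:\Program Files\Tesseract-OCR\tessdata" stays one argument.
const { execFile } = require("child_process");

// Hypothetical helper: builds the argv list for `tesseract <image> stdout [options]`.
function buildTesseractArgs(imagePath, { lang, oem, psm, tessdataDir } = {}) {
  const args = [imagePath, "stdout"];
  if (lang) args.push("-l", lang);
  if (oem !== undefined) args.push("--oem", String(oem));
  if (psm !== undefined) args.push("--psm", String(psm));
  // No quoting needed: each array element is passed to tesseract unchanged.
  if (tessdataDir) args.push("--tessdata-dir", tessdataDir);
  return args;
}

// Hypothetical runner: resolves with tesseract's stdout, rejects with its stderr.
function runTesseract(imagePath, options) {
  return new Promise((resolve, reject) => {
    execFile("tesseract", buildTesseractArgs(imagePath, options), (error, stdout, stderr) => {
      if (error) {
        reject(new Error(stderr || error.message));
      } else {
        resolve(stdout);
      }
    });
  });
}

module.exports = { buildTesseractArgs, runTesseract };
```

For example, `runTesseract(imagePath, { lang: "eng", psm: 3, tessdataDir: "C:\\Program Files\\Tesseract-OCR\\tessdata" }).then(console.log)`. If you stay with the wrapper, note that the logged command shows the image path in double quotes but --tessdata-dir without them; putting the value in its own double quotes might help there, but that is an untested assumption.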
Once you solve this problem, you will face the same error ("Error, OSD requires a model for the legacy engine") as with node-tesseract-ocr, which seems to handle the image path correctly ;-) I guess the problem is that OSD needs the legacy engine, while you restrict tesseract to the LSTM engine only (--oem 1). So you need to change that option to allow the legacy engine. I am not sure whether OSD also needs an eng.traineddata with legacy components, but you will see.

KR,
Zdenko

On Fri, 12 Jan 2024 at 14:08, Oliver Saintilien <osaintilie...@gmail.com> wrote:

> Sorry for the confusion. When I do
>
> tesseract
>   .recognize(`C:\\Users\\osain\\OneDrive\\Desktop\\1992 Spring\\Document_20240109_0014.jpg`, {
>     lang: "eng",
>     oem: 1,
>     psm: 0,
>   })
>
> I get
>
> Command failed: tesseract "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 0
> Warning, detects only orientation with -l eng
> Error, OSD requires a model for the legacy engine
>
> How do I fix this error? I am using it through this wrapper: node-tesseract-ocr - npm (npmjs.com) <https://www.npmjs.com/package/node-tesseract-ocr>. I hear you when you say to make sure tesseract (outside of the wrapper) is providing the expected results. But that's the thing: when I set psm to 0, I expect to get back orientation data. However, when I set psm to other numbers like 3 or 1, it returns the text from the image.
>
> Something else I tried was this:
>
> const tesseract = require("node-tesseract-ocr")
>
> tesseract
>   .recognize(`C:\\Users\\osain\\OneDrive\\Desktop\\1992 Spring\\Document_20240109_0014.jpg`, {
>     lang: "eng",
>     oem: 1,
>     psm: 0,
>     "tessdata-dir": "C:\\Program Files\\Tesseract-OCR\\tessdata"
>   })
>
> That's when I get the error about the Tessdata env var.
> I have pasted it below:
>
> Command failed: tesseract "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3 --tessdata-dir C:\Program Files\Tesseract-OCR\tessdata
> Error opening data file C:\Program/eng.traineddata
> Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
> Failed loading language 'eng'
> Tesseract couldn't load any languages!
> Could not initialize tesseract.
>
> On Friday, January 12, 2024 at 1:11:56 AM UTC-5 zdenop wrote:
>
>> Unfortunately you don't.
>>
>> Instead of showing irrelevant information, make sure tesseract (outside of the wrapper) is providing the expected results.
>>
>> You are claiming "I keep getting an error that I have to set the TESSDATA_PREFIX", but your only relevant screenshot (which you made hardly readable) shows that this is not true.
>> Please do not post screenshots; send the relevant logs (text) or copy the text from the console.
>>
>> Zdenko
>>
>>
>> On Fri, 12 Jan 2024 at 4:59, Oliver Saintilien <osaint...@gmail.com> wrote:
>>
>>> When I do
>>> ```js
>>> tesseract
>>>   .recognize(`C:\\Users\\osain\\OneDrive\\Desktop\\1992 Spring\\Document_20240109_0014.jpg`, {
>>>     lang: "eng",
>>>     oem: 1,
>>>     psm: 0,
>>>   })
>>>   .then((text) => {
>>>     console.log(text)
>>>   })
>>> ```
>>> I was expecting to get some orientation info on the image, like whether it's sideways, upside down, etc., but instead it gives me the error you see in my subject and in the screenshot. Changing the psm to 3 extracts the text perfectly! But when I change it to 0, I get that error.
>>> I got the number code for psm from here: Improving the quality of the output | tessdoc (tesseract-ocr.github.io) <https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html>
>>>
>>> On Thursday, January 11, 2024 at 1:25:53 PM UTC-5 Oliver Saintilien wrote:
>>>
>>>> So I keep getting an error that I have to set the TESSDATA_PREFIX env var, which I did, both in the User Vars and the System Vars. However, after doing that I get another error. I attached screenshots to make my setup and issue as clear as possible. I'm using node-tesseract-ocr - npm (npmjs.com) <https://www.npmjs.com/package/node-tesseract-ocr>
>>>>
>>>> [image: Screenshot 2024-01-11 131619.png][image: Screenshot 2024-01-11 131802.png]
>>>>
>>>> [image: Screenshot 2024-01-11 131330.png]
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
>>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/4e67e75a-b3c6-4f95-a168-eb8d9e50d6e3n%40googlegroups.com <https://groups.google.com/d/msgid/tesseract-ocr/4e67e75a-b3c6-4f95-a168-eb8d9e50d6e3n%40googlegroups.com?utm_medium=email&utm_source=footer>.
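Following Zdenko's suggestion to allow the legacy engine for OSD, here is a minimal sketch of adjusting the node-tesseract-ocr options before calling `recognize`. It assumes the OEM numbering printed by `tesseract --help-oem` (0 legacy only, 1 LSTM only, 2 legacy + LSTM, 3 default) and that osd.traineddata is installed in the tessdata directory. `osdSafeConfig` is a hypothetical helper, not part of the wrapper; it simply drops `oem: 1` when `psm` is 0, so tesseract falls back to its default engine mode.

```js
// Minimal sketch (assumption): make a node-tesseract-ocr style config usable for
// OSD (psm 0) by not restricting tesseract to the LSTM-only engine.
// OEM values as printed by `tesseract --help-oem`:
//   0 = legacy engine only, 1 = LSTM only, 2 = legacy + LSTM, 3 = default (based on what is available)
const OEM_LSTM_ONLY = 1;
const PSM_OSD_ONLY = 0;

// Hypothetical helper, not part of node-tesseract-ocr.
function osdSafeConfig(config) {
  const result = { ...config }; // never mutate the caller's object
  if (result.psm === PSM_OSD_ONLY && result.oem === OEM_LSTM_ONLY) {
    // Without an explicit --oem, tesseract uses its default mode (3), which
    // can use the legacy engine when a model provides it (e.g. osd.traineddata).
    delete result.oem;
  }
  return result;
}

module.exports = { osdSafeConfig };
```

Usage would look like `tesseract.recognize(imagePath, osdSafeConfig({ lang: "eng", oem: 1, psm: 0 }))`. Whether the default mode is enough, or you need `oem: 0` plus an eng.traineddata with legacy components, is the open question raised in the reply above.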
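Once psm 0 works, the text returned for it (assumed to be what tesseract prints to stdout) is orientation information rather than recognized text. The sketch below turns that text into an object and a rough description (upright, sideways, upside down), which is what the original question was after. It assumes the output is made of "Key: value" lines such as "Orientation in degrees: 180" and "Script: Latin"; check this against your own output. `parseOsd` and `describeOrientation` are hypothetical helpers.

```js
// Minimal sketch (assumption): the psm 0 output is a set of "Key: value" lines,
// e.g. "Orientation in degrees: 180" or "Script: Latin". Verify with your own output.
function parseOsd(text) {
  const info = {};
  for (const line of text.split(/\r?\n/)) {
    const match = line.match(/^([^:]+):\s*(.*)$/);
    if (!match) continue; // skip blank lines and messages without a colon
    const key = match[1].trim();
    const raw = match[2].trim();
    const num = Number(raw);
    // Keep numbers as numbers ("180" -> 180), everything else as text ("Latin").
    info[key] = raw !== "" && !Number.isNaN(num) ? num : raw;
  }
  return info;
}

// Hypothetical helper: map the detected orientation to plain words.
function describeOrientation(info) {
  const degrees = info["Orientation in degrees"];
  if (degrees === 0) return "upright";
  if (degrees === 180) return "upside down";
  if (degrees === 90 || degrees === 270) return "sideways";
  return "unknown";
}

module.exports = { parseOsd, describeOrientation };
```

With the wrapper, this would be used as `tesseract.recognize(imagePath, config).then((text) => describeOrientation(parseOsd(text)))`, where `config` is whatever psm 0 configuration ends up working.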