Oh right, for those facing a similar issue, here is what I did:

1. Replace the eng.traineddata file with the eng.traineddata found in tesseract-ocr/tessdata (Trained models with fast variant of the "best" LSTM models + legacy models) <https://github.com/tesseract-ocr/tessdata/tree/main>. That version includes the legacy models, which --oem 0 requires. I didn't delete the original file, I just renamed it.

2. Test the orientation command directly with tesseract in the terminal, like so:

    tesseract "C:\Users\osain\OneDrive\Desktop\2000\Document_20240110_0001.jpg" stdout --psm 0 --oem 0

If this command works in the terminal, then it should work in the node wrapper too. Here is how I called it:

    tesseract.recognize(path, { oem: 0, psm: 0, lang: "eng" })
      .then((data) => {
        return data
      })
      .catch((error) => {
        console.log(error.message)
      })

On Friday, January 12, 2024 at 8:21:03 PM UTC-5 Oliver Saintilien wrote:

> Great, it works like a charm now, thanks very much for your help.
>
> On Friday, January 12, 2024 at 10:42:05 AM UTC-5 g...@hobbelt.com wrote:
>
>> On Fri, 12 Jan 2024, 14:08 Oliver Saintilien, <osaint...@gmail.com> wrote:
>>
>>> Something else I tried was this:
>>>
>>>     const tesseract = require("node-tesseract-ocr")
>>>
>>>     tesseract
>>>       .recognize(`C:\\Users\\osain\\OneDrive\\Desktop\\1992 Spring\\Document_20240109_0014.jpg`, {
>>>         lang: "eng",
>>>         oem: 1,
>>>         psm: 0,
>>>         "tessdata-dir": "C:\\Program Files\\Tesseract-OCR\\tessdata"
>>>       })
>>>
>>> That's when I get the error about the Tessdata env var. I have pasted it below:
>>>
>>>     Command failed: tesseract "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3 --tessdata-dir C:\Program Files\Tesseract-OCR\tessdata
>>>     Error opening data file C:\Program/eng.traineddata
>>>     Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
>>
>> Adding to Zdenko's answer: what you need to do is fix / patch node-tesseract-ocr (or file a bug report there and see if someone else does it for you; since this is open source I suggest fork+fix+pullreq at node-tesseract-ocr instead ;-) ) so that it correctly converts paths with spaces, as specified in the JS config struct, into correctly escaped, operating-system-dependent command-line arguments for the tesseract executable that node-tesseract-ocr invokes.
>> The quickest fix would be to wrap the --tessdata-dir path argument in double quotes, which fixes most/your path issues on MS Windows (as long as the path itself is not adversarial, containing a double quote of its own).
>>
>> In other words: currently node-tesseract-ocr produces this command line, as reported by you:
>>
>>     tesseract "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3 --tessdata-dir C:\Program Files\Tesseract-OCR\tessdata
>>
>> which is interpreted like this (extra newlines added to show the arguments separated):
>>
>>     tesseract
>>     "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg"
>>     stdout
>>     -l eng
>>     --oem 1
>>     --psm 3
>>     --tessdata-dir C:\Program
>>     Files\Tesseract-OCR\tessdata
>>
>> so tesseract receives a damaged path PLUS a surplus argument it apparently ignored: "Files\Tesseract-OCR\tessdata".
>>
>> What SHOULD have been generated by node-tesseract-ocr is this (with extra newlines again):
>>
>>     tesseract
>>     "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg"
>>     stdout
>>     -l eng
>>     --oem 1
>>     --psm 3
>>     --tessdata-dir "C:\Program Files\Tesseract-OCR\tessdata"
>>
>> as was intended in the JS code.
>>
>> HTH,
>>
>> Ger
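
For anyone who wants to try the patch suggested in the quoted reply, here is a rough sketch of the quoting idea. It is not the actual node-tesseract-ocr source: quoteWindowsArg, buildArgs, buildCommandLine and runTesseract are hypothetical helper names, and the mapping of lang to -l and of the other keys to --key is an assumption based only on the command line shown in the error message. The last helper shows an alternative that avoids manual quoting by passing an argument array to Node's child_process.execFile, which does not go through a shell.

```javascript
// Sketch only: these helpers are hypothetical and not part of node-tesseract-ocr.
const { execFile } = require("node:child_process")

// Wrap an argument in double quotes when it contains whitespace.
// As noted in the thread, this does not cope with values that
// themselves contain a double quote.
function quoteWindowsArg(value) {
  const text = String(value)
  return /\s/.test(text) ? `"${text}"` : text
}

// Turn an options object such as { lang: "eng", oem: 1, psm: 0 }
// into an argument array. Assumption: "lang" becomes -l and every
// other key becomes --key, as in the command line from the error.
function buildArgs(imagePath, options) {
  const args = [imagePath, "stdout"]
  for (const [key, value] of Object.entries(options)) {
    args.push(key === "lang" ? "-l" : `--${key}`, String(value))
  }
  return args
}

// Build a single command-line string with every argument quoted as needed.
function buildCommandLine(imagePath, options) {
  const quoted = buildArgs(imagePath, options).map(quoteWindowsArg)
  return ["tesseract", ...quoted].join(" ")
}

// Alternative: hand the argument array to execFile so that no
// command-line string has to be assembled by hand.
function runTesseract(imagePath, options) {
  return new Promise((resolve, reject) => {
    execFile("tesseract", buildArgs(imagePath, options), (error, stdout) => {
      if (error) reject(error)
      else resolve(stdout)
    })
  })
}
```

With the options from the quoted message, buildCommandLine should put "C:\Program Files\Tesseract-OCR\tessdata" in double quotes after --tessdata-dir, so the path arrives at tesseract as one argument instead of two.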