Great, it works like a charm now. Thanks very much for your help.

On Friday, January 12, 2024 at 10:42:05 AM UTC-5 g...@hobbelt.com wrote:
> On Fri, 12 Jan 2024, 14:08 Oliver Saintilien, <osaint...@gmail.com> wrote:
>
>> Something else I tried was this:
>>
>> const tesseract = require("node-tesseract-ocr")
>>
>> tesseract
>>   .recognize(`C:\\Users\\osain\\OneDrive\\Desktop\\1992 Spring\\Document_20240109_0014.jpg`, {
>>     lang: "eng",
>>     oem: 1,
>>     psm: 0,
>>     "tessdata-dir": "C:\\Program Files\\Tesseract-OCR\\tessdata"
>>   })
>>
>> That's when I get the error about the TESSDATA_PREFIX env var. I have
>> pasted it below:
>>
>> Command failed: tesseract "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3 --tessdata-dir C:\Program Files\Tesseract-OCR\tessdata
>> Error opening data file C:\Program/eng.traineddata
>> Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
>
> Adding to Zdenko's answer: what you need to do is fix/patch
> node-tesseract-ocr (or file a bug report there and see if someone else does
> it for you; since this is open source, I suggest fork+fix+pull request at
> node-tesseract-ocr instead ;-) ) so that it correctly converts paths with
> spaces, as specified in the JS config object, into correctly escaped,
> OS-dependent command-line arguments for the tesseract executable that
> node-tesseract-ocr invokes.
>
> The quickest fix would be to wrap the --tessdata-dir path argument in
> double quotes, which fixes most path issues on MS Windows, yours included
> (as long as the path itself is not adversarial, i.e. does not contain a
> double quote of its own).
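A minimal JavaScript sketch of the "quickest fix" Ger describes: a hypothetical `buildCommand` helper (not node-tesseract-ocr's actual code) that assembles the tesseract command string and wraps the input path and the --tessdata-dir path in double quotes. Following the caveat about adversarial paths, the hypothetical `quoteArg` refuses any value that already contains a double quote. The option-to-flag mapping simply mirrors the command line reported in the thread and is an assumption.

```javascript
// Hypothetical helper, not node-tesseract-ocr's own code: wrap a value in
// double quotes so a path containing spaces stays a single argument.
function quoteArg(value) {
  const s = String(value);
  // Simple quoting cannot protect a value that itself contains a double
  // quote (the "adversarial path" case), so reject it outright.
  if (s.includes('"')) {
    throw new Error(`cannot safely quote a value containing '"': ${s}`);
  }
  return `"${s}"`;
}

// Assemble the command line in the same order as the one reported in the
// thread: input, "stdout", then the options. The flag mapping is assumed.
function buildCommand(input, config) {
  const parts = ["tesseract", quoteArg(input), "stdout"];
  if (config.lang) parts.push("-l", config.lang);
  if (config.oem !== undefined) parts.push("--oem", String(config.oem));
  if (config.psm !== undefined) parts.push("--psm", String(config.psm));
  if (config["tessdata-dir"]) {
    // The fix: quote the directory so the space in "Program Files" no
    // longer splits it into two arguments.
    parts.push("--tessdata-dir", quoteArg(config["tessdata-dir"]));
  }
  return parts.join(" ");
}

// Example: print the command line for the paths used in the thread.
console.log(
  buildCommand("C:\\Users\\osain\\OneDrive\\Desktop\\1992 Spring\\Document_20240109_0014.jpg", {
    lang: "eng",
    oem: 1,
    psm: 3,
    "tessdata-dir": "C:\\Program Files\\Tesseract-OCR\\tessdata",
  })
);
```

With these inputs the printed command matches the one Ger says should have been generated, with --tessdata-dir "C:\Program Files\Tesseract-OCR\tessdata" kept as one argument. It only covers the common case of spaces in paths; it does not attempt full Windows or shell escaping.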
>
> In other words: currently node-tesseract-ocr produces this command line, as
> reported by you:
>
> tesseract "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3 --tessdata-dir C:\Program Files\Tesseract-OCR\tessdata
>
> which is interpreted like this (extra newlines added to show the separate
> arguments):
>
> tesseract
> "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg"
> stdout
> -l eng
> --oem 1
> --psm 3
> --tessdata-dir C:\Program
> Files\Tesseract-OCR\tessdata
>
> so tesseract receives a damaged path PLUS a surplus argument, which it
> apparently ignores: "Files\Tesseract-OCR\tessdata".
>
> What SHOULD have been generated by node-tesseract-ocr is this (with extra
> newlines again):
>
> tesseract
> "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg"
> stdout
> -l eng
> --oem 1
> --psm 3
> --tessdata-dir "C:\Program Files\Tesseract-OCR\tessdata"
>
> as was intended in the JS code.
>
> HTH,
>
> Ger

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/7308b2a0-2e9a-4cc1-8c92-0d186ce6d753n%40googlegroups.com.
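Ger's longer-term suggestion is for node-tesseract-ocr to produce correctly escaped arguments. One common way to get that in Node.js is to skip the shell command string altogether and pass an argument array to `child_process.execFile`. Each array element then reaches tesseract as a single argument, so C:\Program Files\Tesseract-OCR\tessdata is not split at the space. This is a sketch assuming the `tesseract` executable is on PATH; `buildArgs` and `recognizeFile` are hypothetical names for illustration, not node-tesseract-ocr's API.

```javascript
const { execFile } = require("child_process");

// Hypothetical helper: translate a config like the one in the quoted message
// into an argv array. No quoting is needed because each element is handed
// to tesseract as exactly one argument.
function buildArgs(input, config) {
  const args = [input, "stdout"];
  if (config.lang) args.push("-l", config.lang);
  if (config.oem !== undefined) args.push("--oem", String(config.oem));
  if (config.psm !== undefined) args.push("--psm", String(config.psm));
  if (config["tessdata-dir"]) {
    // Passed verbatim; the space in "Program Files" stays inside this element.
    args.push("--tessdata-dir", config["tessdata-dir"]);
  }
  return args;
}

// Run tesseract without going through a shell; resolves with the OCR text
// written to stdout, rejects with the error from execFile on failure.
function recognizeFile(input, config) {
  return new Promise((resolve, reject) => {
    execFile("tesseract", buildArgs(input, config), (err, stdout) => {
      if (err) {
        reject(err);
      } else {
        resolve(stdout);
      }
    });
  });
}
```

For example, `recognizeFile(imagePath, { lang: "eng", oem: 1, psm: 3, "tessdata-dir": "C:\\Program Files\\Tesseract-OCR\\tessdata" }).then(console.log)` should pass the tessdata directory intact. Avoiding the shell this way also sidesteps the embedded-double-quote problem that simple quoting cannot handle.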