On Fri, 12 Jan 2024, 14:08 Oliver Saintilien, <osaintilie...@gmail.com> wrote:
> Something else I tried was this
>
> const tesseract = require("node-tesseract-ocr")
>
> tesseract
>   .recognize(`C:\\Users\\osain\\OneDrive\\Desktop\\1992 Spring\\Document_20240109_0014.jpg`, {
>     lang: "eng",
>     oem: 1,
>     psm: 0,
>     "tessdata-dir": "C:\\Program Files\\Tesseract-OCR\\tessdata"
>   })
>
> Thats when I get the error about the Tessdata env var. I have pasted it
> below:
>
> Command failed: tesseract "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3 --tessdata-dir C:\Program Files\Tesseract-OCR\tessdata
> Error opening data file C:\Program/eng.traineddata
> Please make sure the TESSDATA_PREFIX environment variable is set to your
> "tessdata" directory.

Adding to Zdenko's answer: what you need to do is fix or patch node-tesseract-ocr (or file a bug report there and see if someone else does it for you; since this is open source, I suggest fork + fix + pull request at node-tesseract-ocr instead ;-) ) so that it correctly converts paths with spaces, as specified in the JS config struct, into correctly escaped, operating-system-dependent command-line arguments for the tesseract executable that node-tesseract-ocr invokes.

The quickest fix would be to wrap the --tessdata-dir path argument in double quotes, which fixes most path issues on MS Windows, yours included (as long as the path itself is not adversarial, i.e. does not contain a double quote of its own).
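A minimal sketch of that quoting fix, for illustration only. The helper names quoteArg and buildCommand are hypothetical and are not taken from node-tesseract-ocr's source; the sketch merely assumes the library joins its arguments into a single command-line string, which is what the reported "Command failed" line suggests.

```javascript
// Hypothetical helper, NOT part of node-tesseract-ocr.
// Wraps an argument in double quotes when it contains whitespace.
// Refuses values that contain a double quote, since this simple
// scheme cannot escape them safely.
function quoteArg(value) {
  const text = String(value);
  if (text.includes('"')) {
    throw new Error(`cannot safely quote argument: ${text}`);
  }
  return /\s/.test(text) ? `"${text}"` : text;
}

// Hypothetical builder: turns an image path and a config object into one
// command-line string. Each config key becomes "--key value", except
// "lang", which maps to "-l" as seen in the reported command line.
function buildCommand(imagePath, config) {
  const args = ["tesseract", quoteArg(imagePath), "stdout"];
  for (const [key, value] of Object.entries(config)) {
    args.push(key === "lang" ? "-l" : `--${key}`, quoteArg(value));
  }
  return args.join(" ");
}

const command = buildCommand(
  "C:\\Users\\osain\\OneDrive\\Desktop\\1992 Spring\\Document_20240109_0014.jpg",
  {
    lang: "eng",
    oem: 1,
    psm: 3,
    "tessdata-dir": "C:\\Program Files\\Tesseract-OCR\\tessdata",
  }
);

console.log(command);
```

With this, the --tessdata-dir value reaches tesseract as one quoted argument. An alternative that avoids hand-written quoting altogether is Node's child_process.execFile, which takes the arguments as an array instead of a single string.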
In other words: node-tesseract-ocr currently produces this command line, as reported by you:

  tesseract "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3 --tessdata-dir C:\Program Files\Tesseract-OCR\tessdata

which is interpreted like this (extra newlines added to show the arguments separated):

  tesseract
  "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg"
  stdout
  -l
  eng
  --oem
  1
  --psm
  3
  --tessdata-dir
  C:\Program
  Files\Tesseract-OCR\tessdata

so tesseract receives a damaged path PLUS a surplus argument it apparently ignored: "Files\Tesseract-OCR\tessdata".

What SHOULD have been generated by node-tesseract-ocr is this (with extra newlines again):

  tesseract
  "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg"
  stdout
  -l
  eng
  --oem
  1
  --psm
  3
  --tessdata-dir
  "C:\Program Files\Tesseract-OCR\tessdata"

as was intended in the JS code.

HTH,

Ger