On Fri, 12 Jan 2024, 14:08 Oliver Saintilien, <osaintilie...@gmail.com>
wrote:

> Something else I tried was this
> const tesseract = require("node-tesseract-ocr")
>
> tesseract
>   .recognize(`C:\\Users\\osain\\OneDrive\\Desktop\\1992 Spring\\
> Document_20240109_0014.jpg`, {
>     lang: "eng",
>     oem: 1,
>     psm: 0,
>     "tessdata-dir": "C:\\Program Files\\Tesseract-OCR\\tessdata"
>   })
>
> Thats when I get the error about the Tessdata env var. I have pasted it
> below:
>
> Command failed: tesseract "C:\Users\osain\OneDrive\Desktop\1992
> Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3
> --tessdata-dir C:\Program Files\Tesseract-OCR\tessdata
> Error opening data file C:\Program/eng.traineddata
> Please make sure the TESSDATA_PREFIX environment variable is set to your
> "tessdata" directory.
>

Adding to Zdenko's answer: what you need to do is fix / patch
node-tesseract-ocr (or file a bug report there and see if someone else does
it for you; since this is open source I suggest fork+fix+pullreq at
node-tesseract-ocr instead ;-) ) so that it correctly converts paths with
spaces, as specified in the js config struct, into properly escaped,
operating-system-dependent commandline arguments for the tesseract
executable that node-tesseract-ocr invokes.
The quickest fix would be to wrap the --tessdata-dir path argument in double
quotes, which fixes most path issues (including yours) on MS Windows, as long
as the path itself is not adversarial, i.e. does not contain a double quote
of its own.
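
To make the quoting idea concrete, here is a minimal sketch of what such a
patch could look like. The names quoteArg and buildCommand are hypothetical
and are NOT taken from node-tesseract-ocr's actual source; the option-to-flag
mapping (lang becomes -l, everything else becomes --name) is only inferred
from the commandline you reported.

```javascript
// Hypothetical helper, not node-tesseract-ocr's real code:
// wrap an argument in double quotes when it contains whitespace.
function quoteArg(value) {
  const text = String(value)
  // Refuse "adversarial" values instead of guessing how to escape them.
  if (text.includes('"')) {
    throw new Error(`cannot safely quote argument: ${text}`)
  }
  return /\s/.test(text) ? `"${text}"` : text
}

// Hypothetical command builder; the flag mapping is an assumption
// based on the reported commandline.
function buildCommand(input, options) {
  const parts = ["tesseract", quoteArg(input), "stdout"]
  for (const [key, value] of Object.entries(options)) {
    const flag = key === "lang" ? "-l" : `--${key}`
    parts.push(flag, quoteArg(value))
  }
  return parts.join(" ")
}
```

With this, a "tessdata-dir" value of C:\Program Files\Tesseract-OCR\tessdata
ends up as a single double-quoted argument, while values without spaces such
as eng or 1 are passed through unchanged. A path that ends in a backslash
would need extra care on MS Windows, which this sketch does not handle.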

In other words: currently node-tesseract-ocr produces this commandline, as
reported by you:

tesseract "C:\Users\osain\OneDrive\Desktop\1992
Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3
--tessdata-dir C:\Program Files\Tesseract-OCR\tessdata

which is interpreted like this (extra newlines added to show the arguments
separated):

tesseract
 "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg"
 stdout
 -l eng
 --oem 1
 --psm 3
 --tessdata-dir C:\Program
 Files\Tesseract-OCR\tessdata

so tesseract receives a damaged --tessdata-dir path ("C:\Program") PLUS a
surplus argument, which it apparently ignored: "Files\Tesseract-OCR\tessdata".
That is why the error message complains about C:\Program/eng.traineddata.

What SHOULD have been generated by node-tesseract-ocr is this (with extra
newlines again):


tesseract
 "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg"
 stdout
 -l eng
 --oem 1
 --psm 3
 --tessdata-dir "C:\Program Files\Tesseract-OCR\tessdata"

as was intended in the js code.
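
As an alternative to quoting by hand, a fix could avoid building a single
commandline string altogether and hand the arguments over as an array via
Node's child_process.execFile, so that each array element arrives at
tesseract as one argument, spaces included. The sketch below is only an
illustration under that assumption: buildArgs and runTesseract are
hypothetical names, and I have not checked how node-tesseract-ocr invokes
the executable internally.

```javascript
const { execFile } = require("child_process")

// Hypothetical: turn the js config struct into an argument array.
// The flag mapping (lang -> -l, otherwise --name) is an assumption
// based on the commandline shown in the error report.
function buildArgs(input, options) {
  const args = [input, "stdout"]
  for (const [key, value] of Object.entries(options)) {
    args.push(key === "lang" ? "-l" : `--${key}`, String(value))
  }
  return args
}

// Hypothetical wrapper: run tesseract without composing a command string,
// so a path with spaces stays one argument.
function runTesseract(input, options) {
  return new Promise((resolve, reject) => {
    execFile("tesseract", buildArgs(input, options), (error, stdout, stderr) => {
      if (error) reject(new Error(stderr || error.message))
      else resolve(stdout)
    })
  })
}
```

Here the "tessdata-dir" value stays a single array element, so there is no
hand-written quoting to get wrong in the js code.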


HTH,

Ger

