Great, it works like a charm now. Thanks very much for your help.

On Friday, January 12, 2024 at 10:42:05 AM UTC-5 g...@hobbelt.com wrote:

> On Fri, 12 Jan 2024, 14:08 Oliver Saintilien, <osaint...@gmail.com> wrote:
>
>> Something else I tried was this:
>> const tesseract = require("node-tesseract-ocr")
>>
>> tesseract
>>   .recognize(`C:\\Users\\osain\\OneDrive\\Desktop\\1992 Spring\\Document_20240109_0014.jpg`, {
>>     lang: "eng",
>>     oem: 1,
>>     psm: 0,
>>     "tessdata-dir": "C:\\Program Files\\Tesseract-OCR\\tessdata"
>>   })
>>
>> That's when I get the error about the TESSDATA_PREFIX env var. I have 
>> pasted it below:
>>  
>> Command failed: tesseract "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3 --tessdata-dir C:\Program Files\Tesseract-OCR\tessdata
>> Error opening data file C:\Program/eng.traineddata
>> Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
>>
>
> Adding to Zdenko's answer: what you need to do is fix / patch 
> node-tesseract-ocr (or file a bug report there and see if someone else does 
> it for you; since this is open source, I suggest fork + fix + pull request at 
> node-tesseract-ocr instead ;-) ) so that it correctly converts paths with 
> spaces, as specified in the JS config struct, into properly escaped, 
> operating-system-dependent command-line arguments for the tesseract 
> executable that node-tesseract-ocr invokes.
> The quickest fix would be to wrap the --tessdata-dir path argument in double 
> quotes, which fixes most path issues (including yours) on MS Windows, as 
> long as the path itself is not adversarial, i.e. does not contain a double 
> quote of its own.
>
> In other words: currently node-tesseract-ocr produces this commandline, as 
> reported by you:
>
> tesseract "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3 --tessdata-dir C:\Program Files\Tesseract-OCR\tessdata
>
> which is interpreted like this (extra newlines added to show the arguments 
> separated):
>
> tesseract
>  "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg"
>  stdout 
>  -l eng
>  --oem 1
>  --psm 3
>  --tessdata-dir C:\Program
>  Files\Tesseract-OCR\tessdata
>
> so tesseract receives this and gets a damaged path PLUS a surplus argument 
> it apparently ignored: "Files\Tesseract-OCR\tessdata".
>
> What SHOULD have been generated by node-tesseract-ocr is this (with extra 
> newlines again):
>
>
> tesseract
>  "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg"
>  stdout 
>  -l eng
>  --oem 1
>  --psm 3
>  --tessdata-dir "C:\Program Files\Tesseract-OCR\tessdata"
>
> as was intended in the js code.
>
>
> HTH,
>
> Ger
>
>
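
For anyone landing on this thread later, the quoting fix Ger describes can be sketched in JavaScript roughly as follows. This is only an illustration under stated assumptions, not node-tesseract-ocr's actual code: quoteArg and buildCommand are hypothetical helper names, and the mapping of lang to -l and of every other key to --key is assumed from the command line shown in the error message above.

```javascript
// Hypothetical helper (not part of node-tesseract-ocr): wrap an argument in
// double quotes when it contains whitespace, so the shell hands it to
// tesseract as one argument instead of splitting it at the space.
function quoteArg(arg) {
  const text = String(arg);
  // Refuse the "adversarial" case: a value that itself contains a double quote.
  if (text.includes('"')) {
    throw new Error(`cannot safely quote argument containing a double quote: ${text}`);
  }
  return /\s/.test(text) ? `"${text}"` : text;
}

// Hypothetical helper: build the command line string from an options object
// shaped like the one in the thread. The flag mapping is an assumption.
function buildCommand(binary, input, options) {
  const parts = [binary, quoteArg(input), "stdout"];
  for (const [key, value] of Object.entries(options)) {
    const flag = key === "lang" ? "-l" : `--${key}`;
    parts.push(flag, quoteArg(value));
  }
  return parts.join(" ");
}

const command = buildCommand(
  "tesseract",
  "C:\\Users\\osain\\OneDrive\\Desktop\\1992 Spring\\Document_20240109_0014.jpg",
  {
    lang: "eng",
    oem: 1,
    psm: 3,
    "tessdata-dir": "C:\\Program Files\\Tesseract-OCR\\tessdata",
  }
);

// The --tessdata-dir value is now quoted, matching the intended command line.
console.log(command);
```

Passing the arguments as an array to Node's child_process.execFile (or spawn) instead of joining them into one string would probably sidestep most of this quoting, since no shell has to split the arguments. Whether that fits node-tesseract-ocr's design is for its maintainers to judge.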
