For anyone facing a similar issue, here is what I did:
1. Replace the eng.traineddata file with the eng.traineddata found in the 
tesseract-ocr/tessdata repository (trained models with the fast variant of the 
"best" LSTM models + legacy models): 
<https://github.com/tesseract-ocr/tessdata/tree/main>. I didn't delete the 
original file; I renamed it. 
2. Test the orientation command directly with tesseract in the terminal, 
like so: 
tesseract "C:\Users\osain\OneDrive\Desktop\2000\Document_20240110_0001.jpg" stdout --psm 0 --oem 0

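If this command works in the terminal, it should also work through the node 
wrapper. Here is how I called it:
const tesseract = require("node-tesseract-ocr")

// path is the full path to the image file
tesseract.recognize(path, {
  oem: 0,  // legacy engine, as in the terminal test
  psm: 0,  // orientation and script detection only
  lang: "eng"
})
  .then((data) => {
    return data
  })
  .catch((error) => {
    console.log(error.message)
  })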
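
In case it helps, here is a minimal sketch of turning the --psm 0 output into 
an object. parseOsd is a hypothetical helper of my own, not part of 
node-tesseract-ocr, and it assumes the output consists of "Key: value" lines 
such as "Orientation in degrees: 180" and "Script: Latin"; check what your 
tesseract version actually prints before relying on it.

```javascript
// Hypothetical helper: parse the "Key: value" lines printed by
// `tesseract <image> stdout --psm 0` (the line format is an assumption).
function parseOsd(stdout) {
  const fields = {}
  for (const line of String(stdout).split(/\r?\n/)) {
    const index = line.indexOf(":")
    if (index === -1) continue // skip lines that are not "Key: value"
    const key = line.slice(0, index).trim()
    const value = line.slice(index + 1).trim()
    if (key) fields[key] = value
  }
  // Missing numeric fields come back as NaN, a missing script as undefined.
  return {
    orientationDegrees: Number(fields["Orientation in degrees"]),
    rotate: Number(fields["Rotate"]),
    orientationConfidence: Number(fields["Orientation confidence"]),
    script: fields["Script"]
  }
}
```

Assuming recognize resolves with the raw text output, it could be chained as 
.then((data) => parseOsd(data)) in the call above.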


On Friday, January 12, 2024 at 8:21:03 PM UTC-5 Oliver Saintilien wrote:

> Great it works like a charm now, thanks very much for your help.
>
> On Friday, January 12, 2024 at 10:42:05 AM UTC-5 g...@hobbelt.com wrote:
>
>> On Fri, 12 Jan 2024, 14:08 Oliver Saintilien, <osaint...@gmail.com> 
>> wrote:
>>
>>> Something else I tried was this:
>>> const tesseract = require("node-tesseract-ocr")
>>>
>>> tesseract
>>>   .recognize(`C:\\Users\\osain\\OneDrive\\Desktop\\1992 Spring\\Document_20240109_0014.jpg`, {
>>>     lang: "eng",
>>>     oem: 1,
>>>     psm: 0,
>>>     "tessdata-dir": "C:\\Program Files\\Tesseract-OCR\\tessdata"
>>>   })
>>>
>>> That's when I get the error about the TESSDATA_PREFIX env var. I have 
>>> pasted it below:
>>>  
>>> Command failed: tesseract "C:\Users\osain\OneDrive\Desktop\1992 
>>> Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3 
>>> --tessdata-dir C:\Program Files\Tesseract-OCR\tessdata
>>> Error opening data file C:\Program/eng.traineddata
>>> Please make sure the TESSDATA_PREFIX environment variable is set to your 
>>> "tessdata" directory.
>>>
>>
>> Adding to Zdenko's answer: what you need to do is fix / patch 
>> node-tesseract-ocr (or file a bug report there and see if someone else does 
>> it for you; since this is open source I suggest fork+fix+pullreq at 
>> node-tesseract-ocr instead ;-) ) so that it correctly converts paths 
>> with spaces, as specified in the js config struct, into correctly escaped, 
>> operating-system-dependent command-line arguments for the tesseract 
>> executable that node-tesseract-ocr invokes.
>> The quickest fix would be to wrap the --tessdata-dir path argument in double 
>> quotes, which fixes most/your path issues on MS Windows (as long as the path 
>> itself is not adversarial, containing a double quote of its own).
>>
>> In other words: currently node-tesseract-ocr produces this commandline, 
>> as reported by you:
>>
>> tesseract "C:\Users\osain\OneDrive\Desktop\1992 
>> Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3 
>> --tessdata-dir C:\Program Files\Tesseract-OCR\tessdata
>>
>> which is interpreted like this (extra newlines added to show the 
>> arguments separated):
>>
>> tesseract
>>  "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg"
>>  stdout 
>>  -l eng
>>  --oem 1
>>  --psm 3
>>  --tessdata-dir C:\Program 
>> Files\Tesseract-OCR\tessdata
>>
>> so tesseract receives this and gets a damaged path PLUS a surplus 
>> argument it apparently ignored: "Files\Tesseract-OCR\tessdata".
>>
>> What SHOULD have been generated by node-tesseract-ocr is this (with 
>> extra newlines again):
>>
>>
>> tesseract
>>  "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg"
>>  stdout 
>>  -l eng
>>  --oem 1
>>  --psm 3
>>  --tessdata-dir "C:\Program Files\Tesseract-OCR\tessdata"
>>
>> as was intended in the js code.
>>
>>
>> HTH,
>>
>> Ger
>>
>>
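Ger's quick fix above (double quotes around any argument containing a space) 
can be sketched as follows. This is only an illustration under stated 
assumptions: quoteArg and buildCommand are hypothetical names, this is not how 
node-tesseract-ocr is actually implemented, and the option keys are assumed to 
already be the tesseract flag names (l, oem, psm, tessdata-dir).

```javascript
// Hypothetical helper: wrap an argument in double quotes when it contains
// whitespace. Arguments that already contain a double quote are rejected,
// since simple wrapping cannot handle them safely.
function quoteArg(value) {
  const text = String(value)
  if (text.includes('"')) {
    throw new Error("cannot quote an argument containing a double quote: " + text)
  }
  return /\s/.test(text) ? `"${text}"` : text
}

// Hypothetical helper: build the command line from an image path and an
// options object whose keys are tesseract flag names.
function buildCommand(imagePath, options) {
  const parts = ["tesseract", quoteArg(imagePath), "stdout"]
  for (const [key, value] of Object.entries(options)) {
    const flag = key.length === 1 ? `-${key}` : `--${key}`
    parts.push(flag, quoteArg(value))
  }
  return parts.join(" ")
}
```

Passing the arguments as an array to child_process.execFile would avoid 
building a shell command string at all, which is probably the more robust 
route for an actual patch.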
