You do not need to rename the traineddata files. You can move them to a
subdirectory of tessdata, e.g. tessdata/fast or tessdata/best, and then use
them as "-l best/eng" or "-l fast/eng".
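A minimal Node.js sketch of that layout. It assumes tesseract resolves a language spec such as "best/eng" relative to the tessdata directory, i.e. to <tessdata>/best/eng.traineddata. The helper names `traineddataPath` and `checkVariants` and the example install path are hypothetical. The check only reports whether each file sits where `-l <spec>` would look for it.

```javascript
const fs = require("fs")
const path = require("path")

// Hypothetical helper: map a language spec such as "best/eng" to the
// .traineddata file it should refer to, assuming tesseract resolves the
// spec relative to the tessdata directory.
function traineddataPath(tessdataDir, langSpec) {
  return path.join(tessdataDir, ...langSpec.split("/")) + ".traineddata"
}

// Report, for each spec, where the model is expected and whether it exists.
function checkVariants(tessdataDir, specs) {
  return specs.map((spec) => {
    const file = traineddataPath(tessdataDir, spec)
    return { spec, file, exists: fs.existsSync(file) }
  })
}

// Example (hypothetical install location on Windows):
// console.table(checkVariants("C:\\Program Files\\Tesseract-OCR\\tessdata",
//   ["eng", "best/eng", "fast/eng"]))
```

With node-tesseract-ocr, the `lang` option appears as `-l <value>` on the generated command line (the reported command later in this thread shows `-l eng`), so `lang: "best/eng"` should select the model in the subdirectory.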

Zdenko


On Sat, 13 Jan 2024 at 3:38 Oliver Saintilien <osaintilie...@gmail.com>
wrote:

> Oh right, for those facing a similar issue, here is what I did:
> 1. Replaced the eng.traineddata file with the eng.traineddata from
> tesseract-ocr/tessdata ("Trained models with fast variant of the 'best'
> LSTM models + legacy models", <https://github.com/tesseract-ocr/tessdata/tree/main>).
> I didn't delete the original file but renamed it.
> 2. Tested the orientation command directly with tesseract in the
> terminal, like so (--psm 0 = orientation and script detection only,
> --oem 0 = legacy engine only):
>
>   tesseract "C:\Users\osain\OneDrive\Desktop\2000\Document_20240110_0001.jpg" stdout --psm 0 --oem 0
>
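> If this command works in the terminal, it will also work in the node
> wrapper (node-tesseract-ocr). Here is how I called it:
>
> const tesseract = require("node-tesseract-ocr")
>
> // path: the image file to check (e.g. the .jpg used above)
> tesseract
>   .recognize(path, {
>     oem: 0,
>     psm: 0,
>     lang: "eng",
>   })
>   .then((text) => {
>     console.log(text)
>   })
>   .catch((error) => {
>     console.log(error.message)
>   })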
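With psm 0 the result is a short plain-text OSD report rather than recognized text. A minimal sketch for turning it into values, assuming node-tesseract-ocr resolves with tesseract's stdout and that the report consists of `Key: value` lines such as `Rotate: 180` and `Script: Latin`. The function name `parseOsd` is hypothetical, and the sample text is illustrative, not output from this thread.

```javascript
// Hypothetical helper: parse a tesseract OSD report (psm 0) into an object.
// Assumes "Key: value" lines; values that parse as numbers become numbers.
function parseOsd(report) {
  const result = {}
  for (const line of report.split(/\r?\n/)) {
    const match = line.match(/^\s*([^:]+?)\s*:\s*(.+?)\s*$/)
    if (!match) continue // skip blank or unrecognized lines
    const [, key, raw] = match
    const num = Number(raw)
    result[key] = Number.isNaN(num) ? raw : num
  }
  return result
}

// Illustrative sample only; real values depend on the image.
const sample = [
  "Page number: 0",
  "Orientation in degrees: 180",
  "Rotate: 180",
  "Orientation confidence: 12.5",
  "Script: Latin",
  "Script confidence: 3.2",
].join("\n")

const osd = parseOsd(sample)
console.log(osd["Rotate"], osd["Script"]) // 180 Latin
```

In the call above, `.then((text) => parseOsd(text))` would hand the parsed object on instead of the raw text.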
>
>
> On Friday, January 12, 2024 at 8:21:03 PM UTC-5 Oliver Saintilien wrote:
>
>> Great it works like a charm now, thanks very much for your help.
>>
>> On Friday, January 12, 2024 at 10:42:05 AM UTC-5 g...@hobbelt.com wrote:
>>
>>> On Fri, 12 Jan 2024, 14:08 Oliver Saintilien, <osaint...@gmail.com>
>>> wrote:
>>>
>>>> Something else I tried was this
>>>> const tesseract = require("node-tesseract-ocr")
>>>>
>>>> tesseract
>>>>   .recognize(`C:\\Users\\osain\\OneDrive\\Desktop\\1992 Spring\\Document_20240109_0014.jpg`, {
>>>>     lang: "eng",
>>>>     oem: 1,
>>>>     psm: 0,
>>>>     "tessdata-dir": "C:\\Program Files\\Tesseract-OCR\\tessdata"
>>>>   })
>>>>
>>>> That's when I get the error about the TESSDATA_PREFIX environment
>>>> variable. I have pasted it below:
>>>>
>>>> Command failed: tesseract "C:\Users\osain\OneDrive\Desktop\1992
>>>> Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3
>>>> --tessdata-dir C:\Program Files\Tesseract-OCR\tessdata
>>>> Error opening data file C:\Program/eng.traineddata
>>>> Please make sure the TESSDATA_PREFIX environment variable is set to
>>>> your "tessdata" directory.
>>>>
>>>
>>> Adding to Zdenko's answer: what you need to do is fix/patch
>>> node-tesseract-ocr (or file a bug report there and see if someone else
>>> does it for you; since this is open source, I suggest fork + fix + pull
>>> request at node-tesseract-ocr instead ;-) ) so that it correctly converts
>>> paths with spaces, as specified in the JS config object, into correctly
>>> escaped, OS-dependent command-line arguments for the tesseract executable
>>> that node-tesseract-ocr invokes.
>>> The quickest fix would be to wrap the --tessdata-dir path argument in
>>> double quotes, which fixes most path issues on MS Windows, including
>>> yours (as long as the path itself is not adversarial, i.e. contains no
>>> double quote of its own).
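A minimal sketch of the "wrap it in double quotes" fix, assuming the wrapper builds one command string by concatenation (as the reported command line suggests). `quoteArg` is a hypothetical helper. Following the caveat above, it refuses arguments that already contain a double quote instead of trying to escape them. It is enough for spaces, not a general-purpose shell escaper.

```javascript
// Hypothetical helper: wrap one command-line argument in double quotes so
// that embedded spaces stay inside a single argument.
function quoteArg(value) {
  const s = String(value)
  if (s.includes('"')) {
    throw new Error("refusing to quote an argument that contains a double quote")
  }
  // Double any trailing backslashes so they cannot escape the closing quote
  // (MS C runtime argv rules; inside sh double quotes "\\" is also one "\").
  return '"' + s.replace(/(\\+)$/, "$1$1") + '"'
}

const tessdataDir = "C:\\Program Files\\Tesseract-OCR\\tessdata"
const option = `--tessdata-dir ${quoteArg(tessdataDir)}`
console.log(option) // --tessdata-dir "C:\Program Files\Tesseract-OCR\tessdata"
```

Assuming node-tesseract-ocr inserts option values verbatim, passing `"tessdata-dir": quoteArg(dir)` in the config object may already work around the problem without patching the library.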
>>>
>>> In other words: currently node-tesseract-ocr produces this command line,
>>> as you reported:
>>>
>>> tesseract "C:\Users\osain\OneDrive\Desktop\1992
>>> Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3
>>> --tessdata-dir C:\Program Files\Tesseract-OCR\tessdata
>>>
>>> which is interpreted like this (extra newlines added to separate the
>>> arguments):
>>>
>>> tesseract
>>>  "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg"
>>>  stdout
>>>  -l eng
>>>  --oem 1
>>>  --psm 3
>>>  --tessdata-dir C:\Program
>>>  Files\Tesseract-OCR\tessdata
>>>
>>> So tesseract receives a damaged path ("C:\Program", hence the "Error
>>> opening data file C:\Program/eng.traineddata" above) PLUS a surplus
>>> argument that it apparently ignored: "Files\Tesseract-OCR\tessdata".
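To see that splitting effect directly, here is a deliberately simplified argument splitter: whitespace separates arguments, double quotes group them, and backslashes are ordinary characters. That matches the Windows and shell rules for these particular strings, but it is not a full implementation of either. `splitArgs` is a hypothetical name.

```javascript
// Simplified splitter: whitespace ends an argument unless inside double
// quotes; the quote characters themselves are dropped; no escape handling.
function splitArgs(commandLine) {
  const args = []
  let current = ""
  let inQuotes = false
  let hasArg = false // lets "" count as an (empty) argument
  for (const ch of commandLine) {
    if (ch === '"') {
      inQuotes = !inQuotes
      hasArg = true
    } else if (/\s/.test(ch) && !inQuotes) {
      if (hasArg) args.push(current)
      current = ""
      hasArg = false
    } else {
      current += ch
      hasArg = true
    }
  }
  if (hasArg) args.push(current)
  return args
}

const unquoted = "--tessdata-dir C:\\Program Files\\Tesseract-OCR\\tessdata"
const quoted = '--tessdata-dir "C:\\Program Files\\Tesseract-OCR\\tessdata"'
console.log(splitArgs(unquoted).join(" | "))
// --tessdata-dir | C:\Program | Files\Tesseract-OCR\tessdata
console.log(splitArgs(quoted).join(" | "))
// --tessdata-dir | C:\Program Files\Tesseract-OCR\tessdata
```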
>>>
>>> What SHOULD have been generated by node-tesseract-ocr is this (again
>>> with extra newlines):
>>>
>>> tesseract
>>>  "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg"
>>>  stdout
>>>  -l eng
>>>  --oem 1
>>>  --psm 3
>>>  --tessdata-dir "C:\Program Files\Tesseract-OCR\tessdata"
>>>
>>> as was intended in the js code.
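Quoting is the quick fix. A sturdier approach (a sketch, not node-tesseract-ocr's actual code) is to skip the shell entirely and hand tesseract an argument array via Node's `child_process.execFile`, so each array element reaches tesseract as exactly one argument, spaces included. The option mapping mirrors the command line above (`lang` to `-l`, `oem`/`psm`/`tessdata-dir` to `--name value`); `buildArgs` and `runTesseract` are hypothetical names.

```javascript
const { execFile } = require("child_process")

// Hypothetical: turn a config like the one in this thread into an argv array.
// No quoting is needed because no shell parses the result.
function buildArgs(imagePath, config = {}) {
  const args = [imagePath, "stdout"]
  if (config.lang) args.push("-l", String(config.lang))
  for (const key of ["oem", "psm", "tessdata-dir"]) {
    // Compare with undefined: psm 0 / oem 0 are falsy but valid values.
    if (config[key] !== undefined) args.push(`--${key}`, String(config[key]))
  }
  return args
}

// Run the tesseract binary directly (no shell); resolves with its stdout.
function runTesseract(imagePath, config = {}, binary = "tesseract") {
  return new Promise((resolve, reject) => {
    execFile(binary, buildArgs(imagePath, config), (error, stdout, stderr) => {
      if (error) {
        if (stderr) error.message += `\n${stderr}`
        reject(error)
      } else {
        resolve(stdout)
      }
    })
  })
}
```

For example, `buildArgs("C:\\scan 1.jpg", { lang: "eng", psm: 0, "tessdata-dir": "C:\\Program Files\\Tesseract-OCR\\tessdata" })` keeps both paths as single array elements. The `!== undefined` check is the main design choice: a truthiness test would silently drop `psm: 0` or `oem: 0`.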
>>>
>>>
>>> HTH,
>>>
>>> Ger
>>>
>>>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/77f1b6af-6cea-4294-b4fd-5a2ec03ded23n%40googlegroups.com.
>
