You do not need to rename the traineddata files. You can move them into tessdata subdirectories, e.g. tessdata/fast and tessdata/best, and then select them with "-l best/eng" or "-l fast/eng".
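A minimal sketch of how that could look from the Node side. It assumes that node-tesseract-ocr passes its lang option through to -l unchanged (not verified here) and that eng.traineddata has been copied into tessdata/best and tessdata/fast. The helper names langInSubdir and optionsFor are made up for illustration.

```javascript
// Hypothetical helper: build the value for "-l" when the model lives in a
// subdirectory of tessdata (e.g. tessdata/best/eng.traineddata).
function langInSubdir(subdir, lang) {
  // Forward slash, as in "-l best/eng"; no subdirectory means plain "eng".
  return subdir ? `${subdir}/${lang}` : lang
}

// Hypothetical helper: an options object in the shape used in the quoted
// messages (lang only; add oem/psm as needed).
function optionsFor(subdir, lang) {
  return { lang: langInSubdir(subdir, lang) }
}

console.log(optionsFor("best", "eng"))
console.log(optionsFor("fast", "eng"))
```

The resulting object would then be handed to tesseract.recognize(path, options) in place of the plain lang: "eng".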
Zdenko

On Sat, 13 Jan 2024 at 3:38, Oliver Saintilien <osaintilie...@gmail.com> wrote:

> Oh right, for those facing a similar issue, what I did was
> 1. Replace the eng.traineddata file with the eng.traineddata found here:
> tesseract-ocr/tessdata: Trained models with fast variant of the "best" LSTM models + legacy models
> (github.com) <https://github.com/tesseract-ocr/tessdata/tree/main>
> I didn't delete the original file but renamed it.
> 2. Test the orientation command directly with tesseract in the terminal, like so:
> tesseract "C:\Users\osain\OneDrive\Desktop\2000\Document_20240110_0001.jpg" stdout --psm 0 --oem 0
>
> If this command works in the terminal then it will work in the node wrapper version. Here is how I called it.
> tesseract.recognize(path, {
>   oem: 0,
>   psm: 0,
>   lang: "eng"
> })
> .then((data) => {
>   return data
> })
> .catch((error) => {
>   console.log(error.message)
> })
>
> On Friday, January 12, 2024 at 8:21:03 PM UTC-5 Oliver Saintilien wrote:
>
>> Great it works like a charm now, thanks very much for your help.
>>
>> On Friday, January 12, 2024 at 10:42:05 AM UTC-5 g...@hobbelt.com wrote:
>>
>>> On Fri, 12 Jan 2024, 14:08 Oliver Saintilien, <osaint...@gmail.com> wrote:
>>>
>>>> Something else I tried was this
>>>> const tesseract = require("node-tesseract-ocr")
>>>> tesseract
>>>> .recognize(`C:\\Users\\osain\\OneDrive\\Desktop\\1992 Spring\\Document_20240109_0014.jpg`, {
>>>> lang: "eng",
>>>> oem: 1,
>>>> psm: 0,
>>>> "tessdata-dir": "C:\\Program Files\\Tesseract-OCR\\tessdata"
>>>> })
>>>>
>>>> That's when I get the error about the Tessdata env var.
>>>> I have pasted it below:
>>>>
>>>> Command failed: tesseract "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3 --tessdata-dir C:\Program Files\Tesseract-OCR\tessdata
>>>> Error opening data file C:\Program/eng.traineddata
>>>> Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
>>>
>>> Adding to Zdenko's answer: what you need to do is fix/patch node-tesseract-ocr (or file a bug report there and see if someone else does it for you; since this is open source I suggest fork+fix+pullreq at node-tesseract-ocr instead ;-) ) so that it correctly converts paths with spaces, as specified in the JS config struct, into correctly escaped, operating-system-dependent command-line arguments for the tesseract executable that node-tesseract-ocr invokes.
>>> The quickest fix would be to wrap the --tessdata-dir path argument in double quotes, which fixes most/your path issues on MS Windows (as long as the path itself is not adversarial, containing a double quote of its own).
>>>
>>> In other words: currently node-tesseract-ocr produces this command line, as reported by you:
>>>
>>> tesseract "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3 --tessdata-dir C:\Program Files\Tesseract-OCR\tessdata
>>>
>>> which is interpreted like this (extra newlines added to show the arguments separated):
>>>
>>> tesseract
>>> "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg"
>>> stdout
>>> -l eng
>>> --oem 1
>>> --psm 3
>>> --tessdata-dir C:\Program
>>> Files\Tesseract-OCR\tessdata
>>>
>>> so tesseract receives this and gets a damaged path PLUS a surplus argument it apparently ignored: "Files\Tesseract-OCR\tessdata".
>>>
>>> What SHOULD have been generated by node-tesseract-ocr is this (with extra newlines again):
>>>
>>> tesseract
>>> "C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg"
>>> stdout
>>> -l eng
>>> --oem 1
>>> --psm 3
>>> --tessdata-dir "C:\Program Files\Tesseract-OCR\tessdata"
>>>
>>> as was intended in the js code.
>>>
>>> HTH,
>>>
>>> Ger
>
> --
> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/77f1b6af-6cea-4294-b4fd-5a2ec03ded23n%40googlegroups.com.
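For anyone patching the wrapper along the lines Ger describes, here is a rough sketch of the two usual options in Node. It is not the node-tesseract-ocr source: quoteArg, buildCommand and runTesseract are hypothetical names, the fixed option values are simply copied from the command line quoted above, and the quoting only covers the simple case Ger mentions (paths with spaces and no double quotes of their own).

```javascript
const { execFile } = require("child_process")

// Hypothetical helper: wrap an argument in double quotes when it contains
// whitespace. Arguments that already contain a double quote are rejected
// rather than escaped, since correct escaping is OS dependent.
function quoteArg(arg) {
  if (arg.includes('"')) {
    throw new Error(`cannot safely quote argument: ${arg}`)
  }
  return /\s/.test(arg) ? `"${arg}"` : arg
}

// Argument list matching the command line reported in this thread.
function tesseractArgs(image, tessdataDir) {
  return [image, "stdout", "-l", "eng", "--oem", "1", "--psm", "3",
    "--tessdata-dir", tessdataDir]
}

// Option 1: build a single command-line string with each argument quoted.
function buildCommand(image, tessdataDir) {
  return ["tesseract", ...tesseractArgs(image, tessdataDir).map(quoteArg)].join(" ")
}

// Option 2: hand the arguments to execFile as an array, so each element is
// passed as one argument and no manual quoting is needed.
function runTesseract(image, tessdataDir, callback) {
  return execFile("tesseract", tesseractArgs(image, tessdataDir), callback)
}

console.log(buildCommand(
  "C:\\Users\\osain\\OneDrive\\Desktop\\1992 Spring\\Document_20240109_0014.jpg",
  "C:\\Program Files\\Tesseract-OCR\\tessdata"
))
```

Option 2 is usually the more robust choice because the path never has to survive being parsed back out of a command-line string; option 1 is the smaller change if the wrapper already builds a string.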