Re: [tesseract-ocr] Obtain both PDF and HOCR output from single scan?

2020-03-11 Thread Shree Devi Kumar
Use both at the end of the command line, e.g. tesseract image outbase -l foo --oem 1 hocr pdf
On Thu, Mar 12, 2020, 03:59 Chris Falter wrote:
> Hi,
>
> My project is using Tesseract 4.x to scan multi-page TIFFs. We need to
> obtain HOCR output to perform some analytics, and we need to obtain a
> searchab
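For anyone driving this from a script, the command line suggested in the reply above could be assembled roughly as follows. This is only a sketch: the helper name build_tesseract_command and the file names scan.tif and outbase are made up for illustration, and it assumes the tesseract binary is on PATH with the stock hocr and pdf config files installed.

```python
from typing import List, Sequence


def build_tesseract_command(
    image: str,
    outbase: str,
    lang: str = "eng",
    oem: int = 1,
    configs: Sequence[str] = ("hocr", "pdf"),
) -> List[str]:
    """Build an argv list that lists several output configs at the end.

    Hypothetical helper: it mirrors the command line from the reply,
    i.e. tesseract image outbase -l LANG --oem N hocr pdf
    """
    return ["tesseract", image, outbase, "-l", lang, "--oem", str(oem), *configs]


# Example use (assumes tesseract is installed and scan.tif exists):
#   import subprocess
#   subprocess.run(build_tesseract_command("scan.tif", "outbase"), check=True)
# With both configs listed, one run should write outbase.hocr and outbase.pdf.
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.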

[tesseract-ocr] Obtain both PDF and HOCR output from single scan?

2020-03-11 Thread Chris Falter
Hi, My project is using Tesseract 4.x to scan multi-page TIFFs. We need to obtain HOCR output to perform some analytics, and we need to obtain a searchable PDF to interact with a different system. The documentation shows how to make Tesseract produce *either *a HOCR *or *a PDF. Is it possible

Re: [tesseract-ocr] Re: Failed loading language 'eng'

2020-03-11 Thread shree
I suggest you file an issue with Sikulix. Also see https://github.com/RaiMan/SikuliX1/issues/246
On Wednesday, March 11, 2020 at 10:04:40 PM UTC+5:30, Jeremiah wrote:
> So I did download the latest version of the trained data file and tried
> but it didn't work. In the actual Java code a Tessera

Re: [tesseract-ocr] Re: Failed loading language 'eng'

2020-03-11 Thread Jeremiah
So I did download the latest version of the trained data file and tried it, but it didn't work. In the actual Java code a Tesseract object isn't ever created, from what I can find; what the bots do is create a Region in Sikulix, which then calls collectWordsText(). This is the code for reference. //

Re: [tesseract-ocr] Re: Failed loading language 'eng'

2020-03-11 Thread Shree Devi Kumar
One possibility is that the eng.traineddata file you have is not compatible with the latest tesseract version you are using. The other possibility is that the Java userbot is calling tesseract with the wrong --oem. I have cc'ed Quan for advice regarding tess4j and Java.
On Wed, Mar 11, 2020, 17:

[tesseract-ocr] Re: Failed loading language 'eng'

2020-03-11 Thread Jeremiah
Yes, I've tried both C:\Program Files\Tesseract-OCR and C:\Program Files\Tesseract-OCR\tessdata and neither one works for me.

[tesseract-ocr] Re: Failed loading language 'eng'

2020-03-11 Thread Jeremiah
Yes, I've tried both C:\Program Files\Tesseract-OCR and C:\Program Files\Tesseract-OCR\tessdata and neither one works for me.
On Wednesday, March 11, 2020 at 7:21:57 AM UTC-4, PD wrote:
> Is TESSDATA_PREFIX pointing to tessdata directory ? It should point to
> tessdata directory where it can fi

[tesseract-ocr] Re: Failed loading language 'eng'

2020-03-11 Thread PD
Is TESSDATA_PREFIX pointing to the tessdata directory? It should point to the tessdata directory where it can find the traineddata file.
Regards,
PD
On Wednesday, March 11, 2020 at 1:10:13 AM UTC+5:30, Jeremiah wrote:
> I am getting this error when running some userbot java code on my Win10
> machine whi
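A quick way to check what PD is asking about is to test whether the directory named by TESSDATA_PREFIX actually contains the language file. The snippet below is a rough sketch, not part of Tesseract or SikuliX: find_traineddata is a made-up helper, and it assumes the Tesseract 4.x convention described above, where the variable points at the tessdata directory itself.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_traineddata(
    lang: str = "eng", env: Optional[Mapping[str, str]] = None
) -> Optional[Path]:
    """Return the path to LANG.traineddata under TESSDATA_PREFIX, or None.

    Hypothetical diagnostic helper. Assumes TESSDATA_PREFIX names the
    tessdata directory itself (not its parent).
    """
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        return None  # variable is unset or empty
    candidate = Path(prefix) / f"{lang}.traineddata"
    return candidate if candidate.is_file() else None


# Example use:
#   print(find_traineddata("eng") or "eng.traineddata not found via TESSDATA_PREFIX")
```

A None result only says the file is not where the variable points; it does not rule out the other causes raised in this thread, such as an incompatible traineddata file.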