[tesseract-ocr] Doing OCR on pdfs with embedded CID fonts

2019-04-02 Thread Kristóf Horváth
I just tried to do OCR on a PDF that has embedded CID fonts, and it gave me the following error:
6329 [pool-2-thread-1] INFO org.ghost4j.Ghostscript - Error: can't process embedded font stream,
6329 [pool-2-thread-1] INFO org.ghost4j.Ghostscript - attempting to load the

[tesseract-ocr] confuse whether Otsu Thresholding affects lstm training

2019-04-02 Thread kotomi . niu
I went through the source code and found that Tesseract performs Otsu thresholding and stores the binary pix in the Thresholder object. But it seems the Thresholder object is not invoked when I use the LSTM engine. As for DPI, the Tesseract wiki says 300 dpi works best. This is a requirement for tes
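
For readers unfamiliar with the term, here is a minimal sketch of textbook Otsu thresholding on 8-bit gray levels. It is not Tesseract's own code and does not settle whether the LSTM engine uses the binarized image; the function names `otsu_threshold` and `binarize` are made up for this illustration.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of the gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no pixels in the lower class yet
        if w1 == 0:
            break     # upper class is empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(pixels, t):
    """Map each pixel to 1 if it is brighter than t, else 0."""
    return [1 if p > t else 0 for p in pixels]
```

The threshold is computed from the histogram alone, so the same function works on a flattened image of any size.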

[tesseract-ocr] OCRing existing PDF

2019-04-02 Thread robert.j.richard via tesseract-ocr
Hello, I need to OCR several PDFs. What command line can I use to batch-process them (French text)? Can you point me to where I can find this information in the Tesseract manual? Thanks, Robert Richard, Archiviste en ethnologie acadienne, Centre d'études acadiennes Anselme-Chiasson, Université de Moncton, Moncton

[tesseract-ocr] Does Tesseract Send Information to Google?

2019-04-02 Thread Dave Walsh
Hello, My company is using Tesseract for OCR in an internal application. The information contained may be sensitive in nature and be subject to GDPR rules. Does anyone know if Tesseract sends information to Google for processing? Or is it a completely standalone app that requires no externa

[tesseract-ocr] Please help a new member to train tess

2019-04-02 Thread Trong
Dear friends, I'm trying to train Tesseract 4 as described here: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 (I also installed Tesseract 4: https://github.com/tesseract-ocr/tesseract/wiki). But I got an error at the first step "$ sudo make training-install :( make: *** No rule to make

Re: [tesseract-ocr] Doing OCR on pdfs with embedded CID fonts

2019-04-02 Thread Shree Devi Kumar
Tesseract does not take PDFs as direct input. You have to convert the PDF to images and pass those to Tesseract. However, there are many third-party applications that take PDF as input and use Tesseract as the OCR backend.
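
The convert-then-OCR route described above can be sketched as follows. This assumes Poppler's `pdftoppm` and the `tesseract` command-line tool are both on the PATH; the helper names (`rasterize_cmd`, `ocr_cmd`, `ocr_pdf`) and the 300 dpi default are choices made for this illustration, not something taken from the thread.

```python
import subprocess
from pathlib import Path


def rasterize_cmd(pdf, prefix, dpi=300):
    """Build the pdftoppm command that renders each PDF page to a PNG."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def ocr_cmd(image, lang="eng"):
    """Build the tesseract command; the text goes to <image name>.txt."""
    image = Path(image)
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang]


def ocr_pdf(pdf, workdir, lang="eng", dpi=300):
    """Rasterize one PDF into workdir, OCR every page, return the .txt paths."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    stem = Path(pdf).stem
    subprocess.run(rasterize_cmd(pdf, workdir / stem, dpi), check=True)

    outputs = []
    # pdftoppm names pages <prefix>-<n>.png with zero-padded page numbers,
    # so a plain sort keeps them in page order.
    for page in sorted(workdir.glob(stem + "-*.png")):
        subprocess.run(ocr_cmd(page, lang), check=True)
        outputs.append(page.with_suffix(".txt"))
    return outputs
```

To process a whole folder, one could loop over `Path("scans").glob("*.pdf")` and call `ocr_pdf(pdf, "out", lang="fra")` for each file; `fra` requires the French traineddata to be installed.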

Re: [tesseract-ocr] Does Tesseract Send Information to Google?

2019-04-02 Thread Shree Devi Kumar
Tesseract is a standalone app and can be run locally.

Re: [tesseract-ocr] Does Tesseract Send Information to Google?

2019-04-02 Thread Du Kotomi
Can you help me answer this question: "confuse whether Otsu Thresholding affects lstm training"? This topic has been submitted, but no one has answered.

Re: [tesseract-ocr] Please help a new member to train tess

2019-04-02 Thread Shree Devi Kumar
Please see https://github.com/tesseract-ocr/tesseract/wiki/Compiling-%E2%80%93-GitInstallation for building Tesseract. To install pre-built versions, see https://github.com/tesseract-ocr/tesseract/wiki#installation

[tesseract-ocr] Tesseract 4 Training Tutorials

2019-04-02 Thread Shree Devi Kumar
I have set up a GitHub repo with the required files and bash scripts for running the Tesseract 4 training tutorials: https://github.com/Shreeshrii/tess4training Please give it a try and let me know of any problems.

Re: [tesseract-ocr] Re: Example how to use tessseract C-API in python with cffi

2019-04-02 Thread Zdenko Podobny
OK. I had more time to look at your code and I see a few problems:
- Whitelist does not work in Tesseract 4.0 (search the forum/issue tracker for more details)
- Setting variable: there is no variable named "image_default_resolution" in Tesseract 4.x
- you need to check return value
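
On the point about checking return values, here is a minimal sketch using ctypes rather than cffi. It assumes the C API conventions in `capi.h` as I understand them: `TessBaseAPIInit3` returns 0 on success and `TessBaseAPISetVariable` returns a false value when the variable name is unknown. The helper names `load_tesseract`, `init_checked` and `set_variable_checked` are hypothetical.

```python
import ctypes
import ctypes.util


def load_tesseract():
    """Locate libtesseract and declare the few signatures used below."""
    name = ctypes.util.find_library("tesseract")
    if name is None:
        raise OSError("libtesseract not found")
    lib = ctypes.CDLL(name)
    lib.TessBaseAPICreate.restype = ctypes.c_void_p
    lib.TessBaseAPIInit3.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_char_p]
    lib.TessBaseAPIInit3.restype = ctypes.c_int
    lib.TessBaseAPISetVariable.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_char_p]
    lib.TessBaseAPISetVariable.restype = ctypes.c_int
    lib.TessBaseAPIDelete.argtypes = [ctypes.c_void_p]
    lib.TessBaseAPIDelete.restype = None
    return lib


def init_checked(lib, datapath, lang):
    """Create and initialize an API handle, failing loudly if Init3 fails."""
    handle = lib.TessBaseAPICreate()
    if lib.TessBaseAPIInit3(handle, datapath, lang) != 0:
        lib.TessBaseAPIDelete(handle)  # do not leak the handle on failure
        raise RuntimeError("TessBaseAPIInit3 failed for language %r" % lang)
    return handle


def set_variable_checked(lib, handle, name, value):
    """Set a variable and raise if Tesseract does not recognize its name."""
    if not lib.TessBaseAPISetVariable(handle, name, value):
        raise KeyError(name)
```

With these helpers, a misspelled or removed variable name raises `KeyError` instead of being silently ignored. Arguments are byte strings, for example `init_checked(lib, None, b"eng")`.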

[tesseract-ocr] confuse whether Otsu Thresholding affects lstm training

2019-04-02 Thread kotomi . niu
Sorry to disturb you again. I sent my issue before, but no one has answered. I really need your help. I went through the source code and found that Tesseract performs Otsu thresholding and stores the binary pix in the Thresholder object. But it seems the Thresholder object is not invoked if I us

Re: [tesseract-ocr] Re: Example how to use tessseract C-API in python with cffi

2019-04-02 Thread Guru Govindan
Thanks a lot for your response and example. I will try it today. I don't have much experience with cffi, so I decided to write my Tesseract Python interface with just ctypes. The following is my code for the same. It seems to work. Loading the Tesseract library takes about 160 ms and recogn

[tesseract-ocr] Re: Tesseract 4 Training Tutorials

2019-04-02 Thread Kristóf Horváth
Woho, I will try it as soon as I get a chance. It might not be today, since I am preparing for an exam, but I am really happy you made this.

Re: [tesseract-ocr] Re: Tesseract 4 Training Tutorials

2019-04-02 Thread Soumik Ranjan Dasgupta
Thank you for your effort, Shree, appreciate it!