[tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra

2018-04-11 Thread Firlefanz
It works! I am so relieved. Thank you all for the help. I still have a couple of questions, since I've read a couple of tutorials, each using different commands: 1. To convert my Fraktur pdf files to tiff I use ImageMagick. Is this the right command? convert -density 300 test.pdf -depth 8 -strip -b

Re: [tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra

2018-04-11 Thread ShreeDevi Kumar
1. Check the output tif and adjust convert command if needed 2. Depending on your tesseract version you could try -l frk also. 3. Yes, you can get a pdf as output. Search Github issues, there is a long discussion thread regarding best ways to create a pdf output. Look for pdf and invisible pdf.
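Points 2 and 3 above can be sketched as a small Python helper that only assembles the tesseract argument list. This is a minimal sketch, not the poster's actual invocation: the function name and the file names test.tif / test are hypothetical, and it assumes the frk traineddata is installed for the tesseract version in use. The trailing "pdf" argument is tesseract's config name for PDF output.

```python
import shlex


def build_tesseract_command(image_path, output_base, lang="frk", make_pdf=False):
    """Return the argument list for a tesseract run (nothing is executed here)."""
    # Basic form: tesseract <image> <outputbase> -l <lang>
    cmd = ["tesseract", image_path, output_base, "-l", lang]
    if make_pdf:
        # Appending the "pdf" config asks tesseract for <outputbase>.pdf
        # instead of the default plain-text output.
        cmd.append("pdf")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(shlex.join(build_tesseract_command("test.tif", "test")))
    print(shlex.join(build_tesseract_command("test.tif", "test", make_pdf=True)))
```

The returned list can be passed to subprocess.run(cmd, check=True) once the tif has been checked by eye, as suggested in point 1.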

Re: [tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra

2018-04-11 Thread ShreeDevi Kumar
Regarding pdf: https://github.com/tesseract-ocr/tesseract/issues/660

Re: [tesseract-ocr] Column splitting failed around fuzzy line

2018-04-11 Thread ShreeDevi Kumar
Try looking at the leptonica sample programs on column splitting to see if you can preprocess the image better before giving it to tesseract. On Wed 11 Apr, 2018, 11:46 AM Ewan Mellor, wrote: > Hi, > > > I am using Tesseract 4 (git 10f4998a) to process a file with two columns. > A snippet of the im

Re: [tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra

2018-04-11 Thread Firlefanz
Thank you again. I think I'll stay with plain txt -- pdf looks too difficult to achieve. Now, the next problem: everything worked fine with my 1-page test pdf. I then tried to do the same with a 30 MB, 500-page pdf. After running convert -density 300 test.pdf -depth 8 -strip -background white -alph

[tesseract-ocr] Re: How to train for multiple languages?

2018-04-11 Thread Fanatico
Thanks, I was going to do this; I just wanted to be sure there wasn't a way to train 2 traineddata like the actual one.

Re: [tesseract-ocr] Tesseract 4.0 korean detecting chinese

2018-04-11 Thread Fanatico
After some research on Korean I found that Chinese characters are used in the language, so it is correct to set Chinese as a sublanguage. The problem is that kor.training_text doesn't contain Chinese characters, so the code trains only Korean and ignores the Chinese, so if I tesseract o

[tesseract-ocr] Re: Tesseract 4.0 on Alpine Linux Docker Container

2018-04-11 Thread Kalven Schraut
After doing some more digging and running valgrind on the code, the last few lines were: ==360==    by 0x95B913A: tesseract::Tesseract::classify_word_and_language(int, PAGE_RES_IT*, tesseract::WordData*) (control.cpp:1314) ==360==    by 0x95BC63B: tesseract::Tesseract::RecogAllWordsPassN(int, ETEX