It works! I am so relieved. Thank you all for the help.
I still have a couple of questions, since I've read several tutorials,
each using different commands:
1. To convert my Fraktur PDF files to TIFF I use ImageMagick. Is this the
right command? convert -density 300 test.pdf -depth 8 -strip -b
1. Check the output tif and adjust convert command if needed
2. Depending on your tesseract version you could try -l frk also.
3. Yes, you can get a pdf as output.
Search the GitHub issues; there is a long discussion thread on the best ways
to create PDF output.
Look for "pdf" and "invisible pdf".
https://github.com/tesseract-ocr/tesseract/issues/660
Regarding pdf
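
For reference, points 2 and 3 above can be sketched as a single tesseract call: the language is chosen with -l, and adding the "pdf" config name after the output base requests a searchable PDF. The snippet below is only a minimal sketch that builds the command line without running it. The file names test.tif and test are placeholders, the helper name tesseract_command is made up for illustration, and whether frk is available depends on your tesseract version and installed traineddata.

```python
import shlex


def tesseract_command(image, output_base, lang="frk", make_pdf=True):
    """Build an argument list for the tesseract CLI (does not run it).

    `image` and `output_base` are placeholder file names; `lang` must match
    a traineddata file that is actually installed.
    """
    cmd = ["tesseract", image, output_base, "-l", lang]
    if make_pdf:
        # "pdf" is the config name that asks tesseract for PDF output
        # (written to <output_base>.pdf) instead of plain text.
        cmd.append("pdf")
    return cmd


if __name__ == "__main__":
    # Print the command so it can be checked before running it by hand.
    print(shlex.join(tesseract_command("test.tif", "test")))
```

Leaving out the "pdf" argument (make_pdf=False) gives the plain-text variant of the same call.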
On Wed 11 Apr, 2018, 1:28 PM ShreeDevi Kumar, wrote:
> 1. Check the output tif and adjust convert command if needed
>
> 2. Depending on your tesseract version you could try -l frk also.
>
> 3. Yes, you can get a pdf as output.
Try looking at the leptonica sample programs on column splitting to see if
you can preprocess the image better before passing it to tesseract.
On Wed 11 Apr, 2018, 11:46 AM Ewan Mellor, wrote:
> Hi,
>
>
> I am using Tesseract 4 (git 10f4998a) to process a file with two columns.
> A snippet of the im
Thank you again. I think I'll stay with plain txt -- pdf looks too
difficult to achieve.
Now, the next problem: everything worked fine with my 1-page test PDF. I then
tried to do the same with a 30 MB, 500-page PDF. After running convert
-density 300 test.pdf -depth 8 -strip -background white -alph
Thanks, I was going to do this, just to be sure if there wasn't a way to
train 2 traineddata like the actual.
After some research on Korean I found that the language does use Chinese
characters, so it is correct to set Chinese as a sublanguage. The problem is
that kor.training_text doesn't contain Chinese characters, so the code
trains only Korean and ignores the Chinese, so if I tesseract o
After doing some more digging and running valgrind on the code, the last few
lines were
==360==    by 0x95B913A: tesseract::Tesseract::classify_word_and_language(int, PAGE_RES_IT*, tesseract::WordData*) (control.cpp:1314)
==360==    by 0x95BC63B: tesseract::Tesseract::RecogAllWordsPassN(int,
ETEX