Re: [tesseract-ocr] I do not include 'chi_tra' in my tessdata folder. What is it? I have seen language-specific.sh

2018-03-10 Thread 이경준
Sorry... I just want to know about Tesseract 4.0. Sorry.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/8f232574-7291-4798-af73-b1f2690bcf89%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[tesseract-ocr] Re: I do not include 'chi_tra' in my tessdata folder. What is it? I have seen language-specific.sh

2018-03-10 Thread Gonil Rho
Regarding questions 2) and 3):

I'm also wondering about using Tesseract 4.0 with multiple languages.

After searching and testing for a while, I found that the old Tesseract 3
method (e.g. running with the '-l lang1+lang2' option) does not seem to work.
Is there another method I should try?

Or do I have to train Tesseract on both languages at the same time?


On Saturday, March 10, 2018 at 4:18:48 PM UTC+9, 이경준 wrote:
>
> Hi, I'm sorry for asking so often, and for so many questions.
>
> But I must use Tesseract 4.0 for my business.
>
> Please understand my situation. I have a large family to support.
>
>
> Earlier you gave me a bash script. Inside it, tesstrain.sh (of course)
> gives me an error like this:
>
>
> Please make sure the TESSDATA_PREFIX environment variable is set to your
> "tessdata" directory.
> Failed loading language 'chi_tra'
> Error opening data file
> /usr/share/tesseract-ocr/4.00/tessdata/chi_tra.traineddata
>
>
> Before that, you gave me a reference. It is from the lang directory:
> kor.config.
>
> It contains:
>
> # Fixes https://github.com/tesseract-ocr/tesseract/issues/1009
> preserve_interword_spaces 1
> tessedit_load_sublangs chi_tra
> # New Segmentation search params
>
>
> So I guess "tessedit_load_sublangs chi_tra" causes the error when
> executing "tesstrain.sh".
>
> So my proposed solution is: 1) Delete that line. Is that right? Or what
> are the side effects?
>
> I want to have one fine-tuned traineddata for two languages
> (Korean and English),
>
> so 1-1) is it possible to add a line like
>
> "tessedit_load_sublangs eng"? Is that right, or even possible?
>
> In conclusion:
>
> 1) I do not want to see errors like "Please make sure the TESSDATA_PREFIX
> environment variable is set to your "tessdata" directory. Failed loading
> language 'chi_tra' Error opening data file
> /usr/share/tesseract-ocr/4.00/tessdata/chi_tra.traineddata".
>
> 2) I want to use Tesseract 4.0 for two languages (e.g. Korean and
> English) with one fine-tuned traineddata file.
>
> Is that possible, and how do I make one fine-tuned traineddata for two
> languages (e.g. Korean and English)?
>
> 3) Is it possible to use Tesseract like this?
>
> $ tesseract (picture.png) -l kor+eng
>
> 4) What is kor_vert.traineddata (in tessdata_best)?
>
> How is it different from kor.traineddata?
>
> 5) Is it possible to fine-tune using existing images? How can I use the
> script you gave me for that?
>
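Regarding the proposed fix 1) in the quoted message: rather than deleting the line outright, you could comment it out in a copy of the config and compare. The sketch below does that with sed. It assumes the config uses one "parameter value" pair per line and "#" comments, as in the kor.config excerpt above; the function name disable_sublangs is hypothetical and not part of Tesseract. It does not tell you whether dropping the sub-language has side effects on Korean recognition.

```sh
#!/bin/sh
# Hypothetical helper (not part of Tesseract): print a config file with every
# "tessedit_load_sublangs" line commented out, so the config no longer asks
# Tesseract to load extra languages such as chi_tra.
# Assumes one "parameter value" pair per line and "#" comments, as in the
# kor.config excerpt. The original file is only read, never modified.
disable_sublangs() {
    config_file=$1
    # Match the parameter name (optionally indented) followed by whitespace,
    # and prefix the whole line with "# ".
    sed 's/^\([[:space:]]*tessedit_load_sublangs[[:space:]]\)/# \1/' "$config_file"
}
```

For example, `disable_sublangs kor.config > kor.config.new` writes the edited copy to a new file, which you can review before using it in place of the original.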


Re: [tesseract-ocr] Re: I do not include 'chi_tra' in my tessdata folder. What is it? I have seen language-specific.sh

2018-03-10 Thread ShreeDevi Kumar
Using -l lang1+lang2 should work. If it does not, please open an issue with
an example image.

If lang2 is English, you may want to try the script-level traineddata,
which includes English along with the other languages.

Please take a look at the README file in tessdata_fast, which explains the
script-level files in more detail.
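Errors like the chi_tra one quoted earlier in this thread appear when a requested language has no matching .traineddata file. Before running, for example, `tesseract picture.png out -l kor+eng`, a minimal sketch can check that every language in the "+"-separated spec has a file in your tessdata directory. The function name missing_langs is hypothetical. It assumes script-level files sit in a script/ subdirectory, so a spec such as script/Hangul is checked as script/Hangul.traineddata; confirm the actual file names in the tessdata_fast README.

```sh
#!/bin/sh
# Hypothetical helper (not part of Tesseract).
# Usage: missing_langs TESSDATA_DIR LANG_SPEC   (e.g. kor+eng, script/Hangul)
# Prints each language whose <lang>.traineddata is missing, one per line,
# and returns 1 if any file is missing, 0 otherwise.
missing_langs() {
    tessdata_dir=$1
    spec=$2
    status=0
    # Split the spec on "+" by temporarily changing IFS (POSIX sh, no arrays).
    old_ifs=$IFS
    IFS=+
    for lang in $spec; do
        if [ ! -f "$tessdata_dir/$lang.traineddata" ]; then
            echo "$lang"
            status=1
        fi
    done
    IFS=$old_ifs
    return $status
}
```

For example, `missing_langs /usr/share/tesseract-ocr/4.00/tessdata kor+eng && tesseract picture.png out -l kor+eng` only starts OCR when both files are present.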


[tesseract-ocr] Multiple pages in parallel?

2018-03-10 Thread Matthew Lai
Hello!

According to the FAQ[1], if I run tesseract on a multi-page image, it 
should process the pages in parallel.

I am converting a 10-page TIF (in one file) into PDF, and looking at *top*, 
it seems like tesseract never uses more than about 250% CPU (I have 16 
cores / 32 threads on my machine).

Am I doing something wrong?

tesseract combined.tif out pdf
Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
Page 1
Page 2
Page 3
Page 4
Page 5
Page 6
Page 7
Page 8
Page 9
Page 10
OSD: Weak margin (6.98) for 914 blob text block, but using orientation anyway: 0

tesseract -v (from Debian Testing):
tesseract 4.00.00alpha
 leptonica-1.74.1
  libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.1) : libpng 1.6.28 : libtiff 4.0.8 : zlib 1.2.8 : libwebp 0.5.2 : libopenjp2 2.1.2

 Found AVX
 Found SSE

Thanks!
Matthew

[1]: https://github.com/tesseract-ocr/tesseract/wiki/FAQ#can-i-increase-speed-of-ocr
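One workaround for the question above, not confirmed anywhere in this thread, is to split the multi-page TIFF into single-page images and run several tesseract processes at once. The sketch below assumes the pages are already split into files whose names have an extension, and that your xargs supports -P (GNU and BSD xargs do; POSIX does not require it). If your build uses OpenMP, the standard OpenMP variable OMP_THREAD_LIMIT=1 keeps each process to one thread so the processes do not compete for cores. The function name ocr_pages_parallel and the TESSERACT override variable are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: OCR single-page images in parallel, one tesseract
# process per page, at most MAX_JOBS processes at a time.
# Assumes pages are already split into files with an extension (page1.tif),
# and that xargs supports -P.
# Usage: ocr_pages_parallel MAX_JOBS FILE...
ocr_pages_parallel() {
    max_jobs=$1
    shift
    # TESSERACT may point at another binary; defaults to "tesseract".
    tess=${TESSERACT:-tesseract}
    # OMP_THREAD_LIMIT=1 caps each process at one OpenMP thread (only
    # relevant if the build uses OpenMP). Each "name.ext" page is passed as
    # "tesseract name.ext name pdf", which writes name.pdf.
    printf '%s\n' "$@" |
        OMP_THREAD_LIMIT=1 xargs -P "$max_jobs" -I {} \
            sh -c 'exec "$1" "$2" "${2%.*}" pdf' sh "$tess" {}
}
```

For example, `ocr_pages_parallel 16 page-*.tif` produces one PDF per page; merging them into a single PDF needs a separate tool.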
