Re: [tesseract-ocr] Unnecessary extra space with Japanese.traineddata

2018-07-24 Thread Shree Devi Kumar
Please see https://github.com/tesseract-ocr/tessdata_fast#example---jpn-and--japanese for Ray's comment regarding the 'script' traineddata. preserve_interword_spaces 1 was added via jpn.config to jpn.traineddata file and other CJK languages to fix this issue - see https://github.com/tesseract

Re: [tesseract-ocr] Assert failed:in file weightmatrix.cpp, line 244

2018-07-24 Thread Lorenzo Bolzani
I had this error when I was mixing best models with non best models. I would try to run again combine_tessdata -e base_model/eng.traineddata base_model/eng.lstm to generate the eng.lstm from the "_best" model (the ones from /usr/share/tessdata are not the "_best" models). Then if the error is s
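
The extraction step quoted above can be scripted. The following is a minimal sketch that only assembles the `combine_tessdata -e` call from the message; the helper name `extract_lstm_command` is hypothetical, and the paths are the ones given in the post.

```python
import shlex


def extract_lstm_command(traineddata, lstm_out):
    """Build the combine_tessdata call that extracts the LSTM component.

    Hypothetical helper: it only assembles the argument list and does not
    run anything, so it works without Tesseract installed.
    """
    return ["combine_tessdata", "-e", traineddata, lstm_out]


cmd = extract_lstm_command("base_model/eng.traineddata", "base_model/eng.lstm")
# Print the command in a form that can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

Pass the resulting list to `subprocess.run` to execute it, with `base_model/eng.traineddata` taken from the "_best" models as the message advises.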

Re: [tesseract-ocr] Unnecessary extra space with Japanese.traineddata

2018-07-24 Thread Atsuyoshi Suzuki
Thank you, Shree. I got the same result with jpn and Japanese when using '-c preserve_interword_spaces=1'. $ tesseract -l Japanese -c preserve_interword_spaces=1 test_jpn_04.jpg stdout The unnecessary space problem is solved. Thanks. On Tuesday, July 24, 2018 at 16:28:22 UTC+9, shree wrote: > > Please see > https://github.com/t
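
For anyone calling Tesseract from a script, the command in the message above can be assembled programmatically. This is a sketch under assumptions: the helper names `tesseract_command` and `run_ocr` are hypothetical, and the argument order simply mirrors the command shown in the post.

```python
import subprocess


def tesseract_command(image, lang, config_vars=None):
    """Assemble a tesseract call that prints recognized text to stdout.

    config_vars maps config variable names to values; each pair becomes
    one "-c name=value" argument, as in the command from the thread.
    """
    cmd = ["tesseract", "-l", lang]
    for name, value in (config_vars or {}).items():
        cmd += ["-c", "{}={}".format(name, value)]
    cmd += [image, "stdout"]
    return cmd


def run_ocr(image, lang, config_vars=None):
    """Run tesseract and return its text output (requires tesseract on PATH)."""
    result = subprocess.run(
        tesseract_command(image, lang, config_vars),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


# Reproduces the argument list of the command quoted in the message.
example = tesseract_command(
    "test_jpn_04.jpg", "Japanese", {"preserve_interword_spaces": 1}
)
```

Only `tesseract_command` is exercised here; `run_ocr` needs a local Tesseract installation with the matching traineddata file.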

[tesseract-ocr] How is the Mean rms calculated?

2018-07-24 Thread j . biros
I have been looking through the documentation but cannot seem to find anything that explains how the rms is calculated. I am a bit new to this sort of work, so I am not quite sure where to look. Can anyone point me in the right direction? -- You received this message because you are subscrib

Re: [tesseract-ocr] Assert failed:in file weightmatrix.cpp, line 244

2018-07-24 Thread Emiliano Isaza Villamizar
I'm using OCR-D, which uses 4.0.0-beta.1. On Tuesday, July 24, 2018 at 12:05:22 AM UTC-5, shree wrote: > > Which version of tesseract are you using? > > Please post output of > > tesseract -v > > On Tue 24 Jul, 2018, 2:26 AM Emiliano Isaza Villamizar, > wrote: > >> Hello everyone, >> >> >> I'm trying

Re: [tesseract-ocr] Unnecessary extra space with Japanese.traineddata

2018-07-24 Thread mahendrag gajera
I am using Japanese.traineddata, which gives good results. On Tue, Jul 24, 2018 at 2:59 PM, Atsuyoshi Suzuki < atuyosi.unloc...@gmail.com> wrote: > Thank you Shree. > > > I got same result jpn and Japanese with '-c preserve_interword_spaces=1'. > > $ tesseract -l Japanese -c preserve_interword_spa

Re: [tesseract-ocr] Assert failed:in file weightmatrix.cpp, line 244

2018-07-24 Thread Emiliano Isaza Villamizar
I'm using OCR-D. I compiled it again after changing the .traineddata in the original file, but it hasn't worked; I still get the same error. Iteration 0: ALIGNED TRUTH : Zhejiang Huamei Holding Co Ltd Iteration 0: BEST OCR TEXT : ₩Z₩h₩e₩j₩i₩a₩n₩ ₩₩u₩a₩m₩e ₩₩o₩₩d₩i₩n₩ ₩C₩o ₩L₩₩d File /home/tulipan1637/D

[tesseract-ocr] Re: Read Bold fonts with Tesseract API - JAVA

2018-07-24 Thread Raed Kubaizi
any luck guys ??? > > > > Thanks

Re: [tesseract-ocr] Assert failed:in file weightmatrix.cpp, line 244

2018-07-24 Thread shree
> --continue_from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm \ > --old_traineddata /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.traineddata \ Use eng.traineddata from tessdata_best

[tesseract-ocr] Problems when training tesseract in Spanish language

2018-07-24 Thread ricardo valadez
Whenever a word contains this tilde, the word is not recognized and comes out changed; the same happens with the letter "ñ".

Re: [tesseract-ocr] Assert failed:in file weightmatrix.cpp, line 244

2018-07-24 Thread Emiliano Isaza Villamizar
It worked; maybe I was using another eng.traineddata. Thank you for your time, Shree and Lorenzo. Kind regards, Emiliano On Tuesday, July 24, 2018 at 11:40:34 AM UTC-5, shree wrote: > > * --continue_from >>> >>> /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.l

[tesseract-ocr] Re: Problems when training tesseract in Spanish language

2018-07-24 Thread 'John Lee Ward' via tesseract-ocr
This may be a silly question, but I assume that when you call tesseract you are using the -l spa option? On Tuesday, July 24, 2018 at 12:20:11 PM UTC-5, ricardo valadez wrote: > > It happens to the moment in which a word contains this tilde, it is not > recognized and the word changes,

Re: [tesseract-ocr] Assert failed:in file weightmatrix.cpp, line 244

2018-07-24 Thread Emiliano Isaza Villamizar
If anyone is following this thread and is using OCR-D: I had to change the start of the .py file by adding these lines because I kept getting a Unicode error: import sys reload(sys) sys.setdefaultencoding('utf-8') On Tuesday, July 24, 2018 at 4:41:45 PM UTC-5, Emiliano Isaza Villamizar

Re: [tesseract-ocr] Assert failed:in file weightmatrix.cpp, line 244

2018-07-24 Thread Emiliano Isaza Villamizar
If anyone is following this thread and are using OCR-D, I had to modify the .py file because I kept getting a Unicode error, just add these lines to the file: import sys reload(sys) sys.setdefaultencoding('utf-8') On Tuesday, July 24, 2018 at 4:41:45 PM UTC-5, Emiliano Isaza Villamizar wrote:
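
The three lines quoted above are a Python 2 idiom: `sys.setdefaultencoding` is removed from the `sys` module at interpreter startup, and `reload(sys)` brings it back. Laid out as a block, with a version guard added as an assumption so the same file also runs under Python 3 (where the default encoding is already UTF-8), it could look like this. The function name `force_utf8_default` is hypothetical.

```python
import sys


def force_utf8_default():
    """Apply the workaround from the thread on Python 2 only.

    Returns the default string encoding in effect afterwards.
    """
    if sys.version_info[0] == 2:
        # Python 2: reload() is a builtin and restores setdefaultencoding,
        # which site.py deletes during startup.
        reload(sys)  # noqa: F821 (name exists only on Python 2)
        sys.setdefaultencoding('utf-8')
    return sys.getdefaultencoding()


print(force_utf8_default())
```

The guard is not part of the original post; on Python 2 the body is exactly the three lines the message gives.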

[tesseract-ocr] Re: How is the Mean rms calculated?

2018-07-24 Thread 'John Lee Ward' via tesseract-ocr
I am also new to Tesseract. Where in Tesseract does the rms value come up? As a general rule in engineering, the rms value is 0.707 times the peak value if you are working with amps or volts and dealing with sinusoids. If the waveform is not sinusoidal, the rms value is equal to the averag
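
As a point of reference for the general definition mentioned in this reply, root mean square is the square root of the mean of the squared values. The sketch below checks the 0.707 figure for a unit-amplitude sine wave. It illustrates the textbook definition only; the thread does not say how Tesseract computes the "Mean rms" it reports, so no claim is made about that.

```python
import math


def rms(values):
    """Root mean square: sqrt of the mean of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# One full period of a unit-amplitude sine wave, sampled uniformly.
n = 1000
sine = [math.sin(2 * math.pi * k / n) for k in range(n)]

# For a sinusoid the rms is peak / sqrt(2), roughly 0.707 times the peak.
print(round(rms(sine), 3))
```

The same function applies to non-sinusoidal data, for example a list of per-sample errors.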

[tesseract-ocr] Re: Problems when training tesseract in Spanish language

2018-07-24 Thread ricardo valadez
Maybe it is silly, but I'm new to tesseract ... I'll call it that way, thank you. On Tuesday, July 24, 2018, 16:42:55 (UTC-5), John Lee Ward wrote: > > This may be a silly question, but I assume that when you call tesseract > that you are using the -l spa option? > > > > On Tuesday, July 2