Re: [tesseract-ocr] Unnecessary extra space with Japanese.traineddata

2018-07-24 Thread mahendrag gajera
I am using Japanese.traineddata, which gives good results.

On Tue, Jul 24, 2018 at 2:59 PM, Atsuyoshi Suzuki <atuyosi.unloc...@gmail.com> wrote:
> Thank you Shree.
>
> I got the same result for jpn and Japanese with '-c preserve_interword_spaces=1'.
>
> $ tesseract -l Japanese -c preserve_interword_spa

Re: [tesseract-ocr] Unnecessary extra space with Japanese.traineddata

2018-07-24 Thread Atsuyoshi Suzuki
Thank you Shree.

I got the same result for jpn and Japanese with '-c preserve_interword_spaces=1'.

$ tesseract -l Japanese -c preserve_interword_spaces=1 test_jpn_04.jpg stdout

The unnecessary space problem is solved. Thanks.

On Tuesday, July 24, 2018 at 4:28:22 PM UTC+9, shree wrote:
>
> Please see
> https://github.com/t

Re: [tesseract-ocr] Unnecessary extra space with Japanese.traineddata

2018-07-24 Thread Shree Devi Kumar
Please see https://github.com/tesseract-ocr/tessdata_fast#example---jpn-and--japanese for Ray's comment regarding the 'script' traineddata.

preserve_interword_spaces 1 was added via jpn.config to the jpn.traineddata file, and likewise to the files for the other CJK languages, to fix this issue; see https://github.com/tesseract
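For Japanese.traineddata, which has no such built-in config, the same setting can also be kept in a small config file rather than passed with '-c' on every call. This is a minimal sketch, assuming Tesseract 4 command-line behaviour: config files are plain "parameter value" lines, they are listed after the output base, and a relative path is accepted. The file name cjk_spaces.config is hypothetical; test_jpn_04.jpg is the image from this thread, and Japanese.traineddata is assumed to be installed where '-l Japanese' finds it.

```shell
#!/bin/sh
# Sketch: keep preserve_interword_spaces in a config file instead of
# repeating "-c" on every call. cjk_spaces.config is a hypothetical name.
set -eu

# Tesseract config files hold one "parameter value" pair per line.
printf 'preserve_interword_spaces 1\n' > cjk_spaces.config

# Only run OCR if tesseract and the sample image are actually present.
if command -v tesseract >/dev/null 2>&1 && [ -f test_jpn_04.jpg ]; then
  # Config files are given after the output base ("stdout" here).
  tesseract test_jpn_04.jpg stdout -l Japanese ./cjk_spaces.config
fi
```

If the file is instead copied into the tessdata/configs directory, it can be referred to by its bare name.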

Re: [tesseract-ocr] Unnecessary extra space with Japanese.traineddata

2018-07-23 Thread Atsuyoshi Suzuki
Hi Shree.

I use tessdata_fast.

On Tuesday, July 24, 2018 at 1:44:40 PM UTC+9, shree wrote:
>
> Which tessdata repository are you using for your trained data files?
>
> tessdata
> tessdata_best
> tessdata_fast
>
> On Tue 24 Jul, 2018, 9:01 AM Atsuyoshi Suzuki wrote:
>
>> Hi.
>>
>> I tried the new tesseract and tra

Re: [tesseract-ocr] Unnecessary extra space with Japanese.traineddata

2018-07-23 Thread Shree Devi Kumar
Which tessdata repository are you using for your trained data files?

tessdata
tessdata_best
tessdata_fast

On Tue 24 Jul, 2018, 9:01 AM Atsuyoshi Suzuki wrote:
> Hi.
>
> I tried the new tesseract and traineddata for Japanese (both jpn.traineddata
> and Japanese.traineddata).
>
> It's very good r
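One way to be sure which repository's models are loaded is to point Tesseract at that checkout explicitly. This is a sketch, assuming a local clone at the hypothetical path ./tessdata_fast (override it with the TESSDATA_DIR variable); --tessdata-dir and --list-langs are standard Tesseract 4 command-line options.

```shell
#!/bin/sh
# Sketch: make the model source explicit. ./tessdata_fast is a hypothetical
# path to a local clone of tessdata_fast (or tessdata / tessdata_best).
set -eu

TESSDATA_DIR=${TESSDATA_DIR:-./tessdata_fast}

if command -v tesseract >/dev/null 2>&1 && [ -d "$TESSDATA_DIR" ]; then
  # Print the languages Tesseract can load from that directory only.
  tesseract --tessdata-dir "$TESSDATA_DIR" --list-langs
fi
```

The same --tessdata-dir option can be added to an OCR command so that jpn or Japanese is read from the chosen repository rather than from the default install location.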

[tesseract-ocr] Unnecessary extra space with Japanese.traineddata

2018-07-23 Thread Atsuyoshi Suzuki
Hi.

I tried the new tesseract and traineddata for Japanese (both jpn.traineddata and Japanese.traineddata).

jpn.traineddata gives a very good recognition result. Japanese.traineddata also gives a good result, but unnecessary spaces are inserted within words or between characters. Is this behavior expected? In