Which tessdata repository are you using for your trained data files?

tessdata
tessdata_best
tessdata_fast
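
Whichever repository it turns out to be, the extra spaces in the output quoted below could be removed in a post-processing step. The following is only a minimal sketch, not a Tesseract feature: the function name `strip_japanese_spaces` and the chosen Unicode ranges are my own assumptions, and it only drops whitespace that sits between two Japanese characters, so the spaces around Latin words such as "Web API" are kept.

```python
import re

# Assumed character ranges (illustrative, not exhaustive):
# CJK punctuation, hiragana, katakana, common CJK ideographs, full-width forms.
_JA = r"\u3000-\u303F\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF\uFF00-\uFFEF"

# A run of spaces or tabs with a Japanese character on both sides.
_SPACE_BETWEEN_JA = re.compile(rf"(?<=[{_JA}])[ \t]+(?=[{_JA}])")


def strip_japanese_spaces(text: str) -> str:
    """Remove whitespace between adjacent Japanese characters.

    Hypothetical helper for cleaning OCR output; spaces next to
    Latin letters or digits are left untouched.
    """
    return _SPACE_BETWEEN_JA.sub("", text)


if __name__ == "__main__":
    sample = "OCR 機能 を 提供 する Web API は いく つか 存在 し ます が 、 用 途 に よっ て"
    print(strip_japanese_spaces(sample))
```

This leaves line breaks alone, so text that wraps across lines would still need to be joined separately.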

On Tue 24 Jul, 2018, 9:01 AM Atsuyoshi Suzuki, <atuyosi.unloc...@gmail.com> wrote:

> Hi.
>
> I tried new tesseract and traineddata for Japanese (both jpn.traineddata
> and Japanese.traineddata).
>
> It's very good recognition result with jpn.traineddata.
>
> Japanese.traineddata provide good result but unnecessary space is
> inserted in words or characters.
>
> Is this behavior expected? In Japanese, there is no space between each
> words.
>
> If this behavior is expected, what kind of usage is assumed for
> Japanese.traineddata?
>
> jpn.traineddata (very good, and I expected):
>
> --- start ---
> $ tesseract -l jpn test_jpn_04.jpg stdout
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> Estimating resolution as 168
> OCR 機能を提供する Web API はいくつか存在しますが、用途によってカスタマイズすることが
> できません。Tesseract は多数の言語に対応し、Linux、macOS、Windows で動作します。
>
> --- end ---
>
> Japanese.traineddata:
>
> --- start ---
> $ tesseract -l Japanese test_jpn_04.jpg stdout
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> Estimating resolution as 168
> OCR 機能 を 提供 する Web API は いく つか 存在 し ます が 、 用 途 に よっ て カス タマ イズ する こと が
> で きま せん 。Tesseract は 多数 の 言語 に 対応 し 、Linux、macOS、Windows で 動作 し ます 。
>
> --- end ---
>
> This result is same between Ubuntu (beta.1) and macOS
> (4.0.0-beta.2-586-g607e).
>
> Thanks.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/ccfcb61b-3afa-4ecc-b6ac-ae3aebc55465%40googlegroups.com
> For more options, visit https://groups.google.com/d/optout.