Hi Shree. I use tessdata_fast.
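
In case it is useful while this is being looked at, here is a minimal post-processing sketch that removes the spaces Japanese.traineddata inserts between Japanese characters. This is my own workaround idea, not something Tesseract provides: the function name `strip_cjk_spaces` and the character ranges are assumptions chosen for illustration, and the ranges may need widening for rarer kanji.

```python
import re

# Assumed ranges: CJK punctuation, hiragana, katakana, common kanji,
# and fullwidth forms. Not exhaustive.
_CJK = r"\u3000-\u303f\u3040-\u30ff\u4e00-\u9fff\uff00-\uffef"

# A run of spaces or tabs with a Japanese character on both sides.
_GAP = re.compile(rf"(?<=[{_CJK}])[ \t]+(?=[{_CJK}])")


def strip_cjk_spaces(text: str) -> str:
    """Remove spaces that sit between two Japanese characters.

    Spaces next to Latin text (for example around "Web API") are kept.
    """
    return _GAP.sub("", text)


if __name__ == "__main__":
    sample = "OCR 機能 を 提供 する Web API は いく つか 存在 し ます が 、"
    print(strip_cjk_spaces(sample))
```

On the first part of the Japanese.traineddata output quoted below, this gives the same text as the jpn.traineddata result. It only hides the symptom, so I would still like to know whether the spacing is expected.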

On Tuesday, July 24, 2018 at 13:44:40 UTC+9, shree wrote:
>
> Which tessdata repository are you using for your trained data files?
>
> tessdata
> tessdata_best
> tessdata_fast
>
> On Tue 24 Jul, 2018, 9:01 AM Atsuyoshi Suzuki, <atuyosi....@gmail.com> wrote:
>
>> Hi.
>>
>> I tried new tesseract and traineddata for Japanese (both jpn.traineddata
>> and Japanese.traineddata).
>>
>> It's very good recognition result with jpn.traineddata.
>>
>> Japanese.traineddata provide good result but unnecessary space is
>> inserted in words or characters.
>>
>> Is this behavior expected? In Japanese, there is no space between each
>> words.
>>
>> If this behavior is expected, what kind of usage is assumed for
>> Japanese.traineddata?
>>
>> jpn.traineddata (very good, and I expected):
>>
>> --- start ---
>> $ tesseract -l jpn test_jpn_04.jpg stdout
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> Estimating resolution as 168
>> OCR 機能を提供する Web API はいくつか存在しますが、用途によってカスタマイズすることが
>> できません。Tesseract は多数の言語に対応し、Linux、macOS、Windows で動作します。
>>
>> --- end ---
>>
>> Japanese.traineddata:
>>
>> --- start ---
>> $ tesseract -l Japanese test_jpn_04.jpg stdout
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> Estimating resolution as 168
>> OCR 機能 を 提供 する Web API は いく つか 存在 し ます が 、 用 途 に よっ て カス タマ イズ する こと が
>> で きま せん 。Tesseract は 多数 の 言語 に 対応 し 、Linux、macOS、Windows で 動作 し ます 。
>>
>> --- end ---
>>
>> This result is same between Ubuntu (beta.1) and macOS
>> (4.0.0-beta.2-586-g607e).
>>
>> Thanks.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-oc...@googlegroups.com.
>> To post to this group, send email to tesser...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/ccfcb61b-3afa-4ecc-b6ac-ae3aebc55465%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.

--
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/e009654e-7f40-42fb-bc56-6946a60105aa%40googlegroups.com.