Re: [tesseract-ocr] Japanese - Problems with vertical words

2019-06-03 Thread Seokbong Choi
Are you using jpn_vert instead of jpn? I have trained jpn_vert https://github.com/zodiac3539/jpn_vert On Mon, Jun 3, 2019 at 11:31 AM Shree Devi Kumar wrote: > tesseract 4 has been trained on line images and hence gives better results > for lines, as far as I have seen. > > On Sun, Jun 2, 2019

Re: [tesseract-ocr] Failing to run on OSX after installation with brew

2018-12-31 Thread Seokbong Choi
You need to install the prerequisite libraries. https://gist.github.com/fractaledmind/cd2fc4125bef57bcb3e2 Please refer to lines 17-19. Thanks. Happy new year! On Mon, Dec 31, 2018 at 6:49 AM Bernard Pochet wrote: > After installing (and reinstall ...) with brew, I receive this message ... > >

Re: [tesseract-ocr] How I extra the green word on a image

2018-12-23 Thread Seokbong Choi
Use an HSV filter. You can use OpenCV. I guess you don't need to filter on the V range, but a high S range may work. On Sun, Dec 23, 2018 at 10:09 AM 童虎 wrote: > > I want to use tesseract to extract green text (which is Chinese words) > [image: t_first_name_more_foggy.png] > but the result is not good. > > And
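The HSV suggestion above can be sketched without OpenCV, using only Python's standard `colorsys` module. This is a minimal illustration, not the poster's code: the function name `isolate_green_text`, the hue window, and the saturation cutoff are all assumptions and would need tuning for a real image. As the reply suggests, the V channel is not filtered.

```python
import colorsys

def isolate_green_text(pixels, hue_range=(0.22, 0.45), min_saturation=0.4):
    """Turn green, well-saturated pixels black (0) and everything else white (255).

    `pixels` is a list of rows of (R, G, B) tuples with values 0-255.
    The hue window and saturation cutoff are illustrative guesses, not tuned values.
    """
    result = []
    for row in pixels:
        out_row = []
        for r, g, b in row:
            # colorsys works on floats in [0, 1]; hue is also returned in [0, 1].
            h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            is_green = hue_range[0] <= h <= hue_range[1] and s >= min_saturation
            out_row.append(0 if is_green else 255)
        result.append(out_row)
    return result

if __name__ == "__main__":
    sample = [[(0, 200, 0), (255, 255, 255)],
              [(200, 0, 0), (120, 130, 120)]]
    for row in isolate_green_text(sample):
        print(row)
```

The output is dark text on a light background, which is the form Tesseract generally handles best. With OpenCV, the same idea is usually expressed with `cv2.cvtColor` and `cv2.inRange`.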

Re: [tesseract-ocr] [/usr/local/bin/language-specific.sh: 줄 1125: FONTS: unbound variable] Error help me!!

2018-12-05 Thread Seokbong Choi
Hello, I think you are missing the "fontlist" argument... The script below worked for Japanese. Even if you want to train with all the fonts in language-specific.sh, I would still suggest including the "fontlist" argument. tesstrain.sh \ --fonts_dir /usr/share/fonts/ \ --lang jpn \ --linedata_onl

Re: [tesseract-ocr] Is Tesseract high security for commercial APP?

2018-12-03 Thread Seokbong Choi
Hello Long, Tesseract does not require an internet connection to run. That fact eliminates most concerns around network security. (As a matter of fact, the current threat landscape mostly stems from internet connectivity.) However, I do not know whether it will impact integrity and availability. I

[tesseract-ocr] New jpn_vert.trainnedata

2018-11-26 Thread Seokbong Choi
Hello all, Although our jpn_vert from best worked well, it didn't serve my purpose - reading comic books. Here, I retrained it with a new font and new expressions that most Japanese comic books use. https://github.com/zodiac3539/jpn_vert - Add more fonts - Othutome, the font wher

Re: [tesseract-ocr] Tesseract v4 generated incorrect text output

2018-11-26 Thread Seokbong Choi
Hello, OEM and PSM are values that you should set whenever you run tesseract.exe; the current version cannot detect them automatically. (I hope this can be improved in the next version.) I guess you are in a situation where the optimal result can be obtained through different psm
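One way to act on this advice is to run the same image under several page segmentation modes and compare the output. The sketch below only builds the command lines and does not invoke Tesseract. The helper name `build_tesseract_commands`, the file name `page.png`, and the choice of modes to try are assumptions. `--oem`, `--psm`, and `-l` are standard Tesseract 4 command-line options, and `stdout` as the output base prints the result to the terminal.

```python
def build_tesseract_commands(image_path, lang="eng", oem=1, psm_values=(3, 6, 7, 11)):
    """Build one tesseract command per page segmentation mode to compare results.

    PSM 3 = fully automatic page segmentation (the default), 6 = single uniform
    block of text, 7 = single text line, 11 = sparse text. OEM 1 = LSTM engine only.
    The set of modes tried here is an illustrative choice.
    """
    return [
        ["tesseract", image_path, "stdout",
         "-l", lang, "--oem", str(oem), "--psm", str(psm)]
        for psm in psm_values
    ]

if __name__ == "__main__":
    for cmd in build_tesseract_commands("page.png"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` on a machine where Tesseract is installed.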

Re: [tesseract-ocr] Images with text in white color

2018-11-12 Thread Seokbong Choi
Use Otsu Inverse from OpenCV. https://www.meccanismocomplesso.org/en/opencv-python-otsu-binarization-thresholding/ On Mon, Nov 12, 2018 at 6:38 AM raghunath rs wrote: > Hi, > > I recently experienced that Tesseract 4 is not identifying images with > text in white and background colored > > Is
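To show what "Otsu Inverse" does, here is a minimal pure-Python sketch of Otsu's threshold followed by an inverted binarization, so that white text on a dark or colored background becomes black text on white. It is an illustration of the technique, not OpenCV's implementation. The function names are assumptions, and the input is assumed to be a grayscale image given as rows of integers 0-255.

```python
def otsu_threshold(gray):
    """Return the threshold that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]           # pixels at or below t
        sum_bg += t * hist[t]
        w_fg = total - w_bg       # pixels above t
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t

def otsu_inverse(gray):
    """Inverted binarization: pixels above the Otsu threshold become 0, the rest 255."""
    t = otsu_threshold(gray)
    return [[0 if value > t else 255 for value in row] for row in gray]

if __name__ == "__main__":
    image = [[20, 25, 240],
             [22, 245, 250]]   # bright "text" pixels on a dark background
    print(otsu_inverse(image))
```

With OpenCV, the equivalent is typically a single `cv2.threshold` call combining the `cv2.THRESH_BINARY_INV` and `cv2.THRESH_OTSU` flags on a grayscale image.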

Re: [tesseract-ocr] Re: Retrain tesseract 4 model from real image (not from text file and tesstrain.sh)

2018-10-19 Thread Seokbong Choi
Can you share the content of the "eng.training_files.txt" file that the --train_listfile argument refers to? Thanks. On Fri, Oct 19, 2018 at 1:59 PM tu tonquang wrote: > I want my application able to recognize characters like: 'Φ' > > At 00:56:01 UTC+7 on Saturday, October 20, 2018, tu tonquang

[tesseract-ocr] New JPN_VERT traineddata (for 4.0)

2018-10-15 Thread Seokbong Choi
Hello all, Over the past 2 weeks, I trained JPN_VERT a little bit further. I included heart symbols, which are commonly used in Japanese comic books. Whenever I tried to OCR them, the entire sentence got weird. So, I got around the issue by training those symbols. I also trained on more casual conversations. The

Re: [tesseract-ocr] Generate box file for JPN_VERT?

2018-10-11 Thread Seokbong Choi
HAT_EVER_FONT_YOU_WANT_TO_ADD" You will see the box file and tiff file where characters are vertically aligned. Thanks! On Sun, Oct 7, 2018 at 12:56 PM Seokbong Choi wrote: > Hello, > > I am a Japanese comic book fan. Recently, I came to learn about tesseract, > which is aweso

[tesseract-ocr] Generate box file for JPN_VERT?

2018-10-07 Thread Seokbong Choi
Hello, I am a Japanese comic book fan. Recently, I came to learn about tesseract, which is awesome. There are many challenges around Japanese - it has millions of characters, so millions of iterations are required to train. Another challenge is vertical text. Most comic books use vertical