Re: [tesseract-ocr] How to set whitelist for non-English characters?

2020-08-02 Thread
Haha, you can also try -c tessedit_char_whitelist='我愛你', with single quotes instead of double quotes. -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+uns

Re: [tesseract-ocr] How to set whitelist for non-English characters?

2020-07-30 Thread
Maybe you can try '-c tessedit_char_whitelist="我愛你"', something like this. On Wed, Jul 29, 2020 at 5:27 PM, un C wrote: > I am using tesseract v5.0.0-alpha.20200328. > > I tried ' -c tessedit_char_whitelist=0123456789,' and it does work. > But for Chinese characters, neither '-c tessedit_char_whitelist=我愛你' nor > t
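Both replies in this thread come down to how the shell quotes the whitelist value. A minimal sketch of one way to sidestep shell quoting altogether is to pass the arguments as a list from Python, so the Chinese characters reach tesseract unchanged. The helper names `build_tesseract_cmd` and `run_tesseract` and the file names are hypothetical, and whether the whitelist actually takes effect still depends on the Tesseract version and engine in use.

```python
import subprocess


def build_tesseract_cmd(image_path, output_base, whitelist, lang=None):
    """Build an argv list for tesseract (hypothetical helper, not part of Tesseract).

    Each list element is passed to the program as one argument, so no shell
    quoting (single or double) is applied to the whitelist characters.
    """
    cmd = ["tesseract", image_path, output_base]
    if lang:
        cmd += ["-l", lang]
    cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return cmd


def run_tesseract(image_path, output_base, whitelist, lang=None):
    """Run tesseract without going through a shell (assumes it is on PATH)."""
    cmd = build_tesseract_cmd(image_path, output_base, whitelist, lang)
    return subprocess.run(cmd, check=True)
```

For example, `run_tesseract("test.jpg", "out", "我愛你", lang="chi_tra")` would invoke tesseract with the whitelist as a single, unquoted argument; the image name and language here are placeholders.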

Re: [tesseract-ocr] how to improve No block overlapping textline

2019-04-04 Thread
It works!! Thanks! On Thursday, April 4, 2019 at 1:39:25 PM UTC+8, shree wrote: > > Try > > tesseract test.jpg test --psm 6 lstm.train

[tesseract-ocr] how to improve No block overlapping textline

2019-04-03 Thread
I have an image. When I run tesseract on it, it recognises the first line, although with some wrong words. But when I run `tesseract test.jpg test lstm.train` to make the lstmf file, it can't recognise the first line. The log is: ``` No block overlapping textline: The first line content ``` How can I improve this? T

[tesseract-ocr] How do I increase the accuracy in this situation

2019-03-30 Thread
I have an image like this: [image: 111.jpg] I then run tesseract 111.jpg out -l chi_sim; cat out.txt; rm out.txt; but the result is: Tesseract Open Source OCR Engine v4.0.0-332-gb727 with Leptonica E I have no idea how to improve this. Any ideas? Thank you.

Re: [tesseract-ocr] Why combine_lang_model need ommon.unicharset

2019-03-14 Thread
rated traineddata with and without recoder option. > On Tue, Mar 12, 2019 at 9:59 PM 童虎 wrote: > >> I use this command follow by a post to create a xx.tessdata >> >> combine_lang_model \ >> --input_unicharset cp.unicharset \ >> --script_dir

[tesseract-ocr] Can't encode transcription: '你 好' in language '' Encoding of string failed!

2019-03-14 Thread
I want to train on version 4.0. I use unicharset_extractor 0_gray.box to create the unicharset file, and use combine_lang_model \ --input_unicharset unicharset \ --script_dir /Users/th/source/langdata \ --output_dir . \ --lang chi_sim to create the chi_sim.traineddata file, and train using this co
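One thing that may be worth checking for a "Can't encode transcription" error is whether every character of the transcription, including the space, is present in the unicharset. The sketch below is only a diagnostic under stated assumptions: it assumes the usual text layout of a unicharset file (a count on the first line, then one entry per line starting with the character, with a space stored as `NULL`), it compares single code points only, and the helper names are hypothetical. It does not claim to be the cause of the error in this thread.

```python
def load_unichars(unicharset_text):
    """Collect the characters listed in a unicharset file's text.

    Assumed layout: first line is the entry count; each later line starts
    with the unichar, followed by its properties. "NULL" stands for a space.
    """
    known = set()
    for line in unicharset_text.splitlines()[1:]:
        if not line.strip():
            continue
        token = line.split(" ", 1)[0]
        known.add(" " if token == "NULL" else token)
    return known


def missing_chars(transcription, known):
    """Return the characters of the transcription that are not in `known`."""
    return sorted({ch for ch in transcription if ch not in known})
```

Reading the file with `open("unicharset", encoding="utf-8").read()` and calling `missing_chars("你 好", load_unichars(text))` would list any characters the unicharset cannot encode.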

[tesseract-ocr] Why combine_lang_model need ommon.unicharset

2019-03-12 Thread
I use this command, following a post, to create a xx.tessdata: combine_lang_model \ --input_unicharset cp.unicharset \ --script_dir /Users/th/source/langdata \ --output_dir output \ --pass_through_recoder \ --lang cp and the cp.unicharset is very simple: https://gist.github.com/huhuang03/62391f632d

Re: [tesseract-ocr] How I extra the green word on a image

2018-12-23 Thread
👌, thank you! On Monday, December 24, 2018 at 1:50:47 AM UTC+8, Seokbong Choi wrote: > > Use HSV filter. You can use OpenCV. I guess you don't need to filter out V > range, but High S range may work. > > > On Sun, Dec 23, 2018 at 10:09 AM 童虎 > > wrote: > >> >> I wa
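The HSV filter suggested in this reply can be sketched without any image library to show the idea: convert each pixel to HSV, keep pixels whose hue is greenish and whose saturation is high, and ignore V. This is a dependency-free illustration using Python's standard `colorsys` module, not the OpenCV code the reply refers to. The function name `green_mask`, the hue range, and the saturation threshold are assumptions that would need tuning for a real image.

```python
import colorsys


def green_mask(pixels, hue_range=(0.22, 0.45), min_saturation=0.4):
    """Binarise an RGB image so green text becomes black on white.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples in 0..255.
    Hue and saturation are in 0..1 as returned by colorsys; the default
    thresholds are illustrative guesses. Value (V) is deliberately ignored.
    """
    mask = []
    for row in pixels:
        mask_row = []
        for r, g, b in row:
            h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            is_green = hue_range[0] <= h <= hue_range[1] and s >= min_saturation
            mask_row.append(0 if is_green else 255)
        mask.append(mask_row)
    return mask
```

The resulting mask (0 for green text, 255 elsewhere) could then be saved as a grayscale image and passed to tesseract in place of the original picture.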

[tesseract-ocr] How I extra the green word on a image

2018-12-23 Thread
I want to use tesseract to extract green text (which is Chinese) [image: t_first_name_more_foggy.png] but the result is not good. I set `-c tessedit_write_images=1` to save and inspect the intermediate image, which is not good either [image: bb.png] How can I preprocess this picture? I'm new to image processing.