Re: [tesseract-ocr] Unnecessary extra space with Japanese.traineddata

2018-07-23 Thread Atsuyoshi Suzuki
Hi Shree. I use tessdata_fast.

On Tuesday, July 24, 2018 at 1:44:40 PM UTC+9, shree wrote:
> Which tessdata repository are you using for your trained data files?
> tessdata
> tessdata_best
> tessdata_fast
>
> On Tue 24 Jul, 2018, 9:01 AM Atsuyoshi Suzuki, wrote:
>> Hi.
>>
>> I tried new tesseract and tra

Re: [tesseract-ocr] Assert failed:in file weightmatrix.cpp, line 244

2018-07-23 Thread Shree Devi Kumar
Which version of Tesseract are you using? Please post the output of tesseract -v

On Tue 24 Jul, 2018, 2:26 AM Emiliano Isaza Villamizar, wrote:
> Hello everyone,
>
> 'm trying to train tesseract to improve the detection of some prices such
> as: CN¥2,400.48. I got got to a point that I keep gett

Re: [tesseract-ocr] Unnecessary extra space with Japanese.traineddata

2018-07-23 Thread Shree Devi Kumar
Which tessdata repository are you using for your trained data files?
tessdata
tessdata_best
tessdata_fast

On Tue 24 Jul, 2018, 9:01 AM Atsuyoshi Suzuki, wrote:
> Hi.
>
> I tried new tesseract and traineddata for Japanese (both jpn.traineddata
> and Japanese.traineddata).
>
> It's very good r

[tesseract-ocr] Unnecessary extra space with Japanese.traineddata

2018-07-23 Thread Atsuyoshi Suzuki
Hi. I tried the new Tesseract and the traineddata for Japanese (both jpn.traineddata and Japanese.traineddata). jpn.traineddata gives very good recognition results. Japanese.traineddata also gives good results, but unnecessary spaces are inserted within words or between characters. Is this behavior expected? In

Re: [tesseract-ocr] How to train by tesseract 4.00

2018-07-23 Thread Lorenzo Bolzani
Maybe it's the TESSDATA_PREFIX?

2018-07-23 17:37 GMT+02:00 Emiliano Isaza Villamizar :
> But still i don't know why this happens I haven't modified anything in the
> Makefile!! What would I need to change?
>
> On Friday, July 20, 2018 at 5:30:00 AM UTC-5, Lorenzo Blz wrote:
>>
>> You have som
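To test that guess, it can help to confirm where the traineddata file would actually be found. A minimal Python sketch, assuming a hypothetical helper named find_traineddata: it looks for <lang>.traineddata both directly under TESSDATA_PREFIX and under a tessdata subdirectory, since depending on the Tesseract version the variable has been expected to point either at the tessdata directory itself or at its parent. It only inspects the file system and does not call Tesseract.

```python
import os
from pathlib import Path
from typing import List, Optional


def find_traineddata(lang: str, prefix: Optional[str] = None) -> List[Path]:
    """Hypothetical helper: list the places where <lang>.traineddata exists.

    Both layouts are checked because, depending on the Tesseract version,
    TESSDATA_PREFIX has been expected to name the tessdata directory itself
    or the directory that contains it.
    """
    if prefix is None:
        # Fall back to the environment variable the shell would pass on.
        prefix = os.environ.get("TESSDATA_PREFIX", "")
    if not prefix:
        return []  # variable unset or empty: nothing to check
    base = Path(prefix)
    candidates = [
        base / f"{lang}.traineddata",               # prefix points at tessdata/
        base / "tessdata" / f"{lang}.traineddata",  # prefix points at its parent
    ]
    return [path for path in candidates if path.is_file()]


if __name__ == "__main__":
    # "eng" is only an example language code.
    found = find_traineddata("eng")
    print(found if found else "eng.traineddata not found via TESSDATA_PREFIX")
```

An empty result for the language being trained would be consistent with a "Failed to read" error caused by the path configuration; a non-empty result suggests looking elsewhere.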

[tesseract-ocr] Assert failed:in file weightmatrix.cpp, line 244

2018-07-23 Thread Emiliano Isaza Villamizar
Hello everyone,

I'm trying to train Tesseract to improve the detection of prices such as CN¥2,400.48. I have got to a point where I keep getting this error:

    total=`cat data/all-lstmf | wc -l` \
       no=`echo "$total * 0.90 / 1" | bc`; \
       head -n "$no" data/all-lstmf > "data/list.train"
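The recipe in that error counts the lines in data/all-lstmf, multiplies the count by 0.90 in bc (whose default scale of 0 makes the "/ 1" drop the fractional part), and writes that many leading lines to data/list.train. A minimal Python sketch of the same arithmetic, with hypothetical function names; it uses integer math so the result truncates like bc, and it also returns the remaining lines, on the assumption that they would serve as an evaluation list (the quoted snippet does not show that part):

```python
from pathlib import Path
from typing import List, Tuple


def split_train_list(lines: List[str], percent: int = 90) -> Tuple[List[str], List[str]]:
    """Hypothetical helper mirroring: no=`echo "$total * 0.90 / 1" | bc`.

    Integer arithmetic (total * percent // 100) truncates like bc with
    scale 0, without floating-point rounding surprises.
    """
    total = len(lines)
    no = total * percent // 100
    return lines[:no], lines[no:]


def write_train_list(all_lstmf: Path, list_train: Path) -> int:
    """Write the first 90% of all_lstmf to list_train, like the head -n step."""
    lines = all_lstmf.read_text().splitlines(keepends=True)
    train, _rest = split_train_list(lines)
    list_train.write_text("".join(train))
    return len(train)
```

For example, 7 .lstmf entries give 7 * 90 // 100 = 6 training lines, matching bc's 6.30 truncated to 6.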

Re: [tesseract-ocr] How to train by tesseract 4.00

2018-07-23 Thread Emiliano Isaza Villamizar
But I still don't know why this happens. I haven't modified anything in the Makefile! What would I need to change?

On Friday, July 20, 2018 at 5:30:00 AM UTC-5, Lorenzo Blz wrote:
>
> You have some problems with your path configuration, check the error
> message:
>
> Failed to read
> /home

Re: [tesseract-ocr] How to train by tesseract 4.00

2018-07-23 Thread Emiliano Isaza Villamizar
I just did that, thank you! https://github.com/OCR-D/ocrd-train/issues/17

On Friday, July 20, 2018 at 3:37:10 AM UTC-5, shree wrote:
>
> Please ask at https://github.com/OCR-D/ocrd-train/issues
>
> for ocr-d related questions.
>
> On Fri, Jul 20, 2018 at 11:36 AM Emiliano Isaza Villamizar <

Re: [tesseract-ocr] Re: unrecognized argument "unrecognised argument linedata_only"

2018-07-23 Thread Jennil Thiyam
Even though the double quotes look fancy here, that's not the case in the command prompt. With all your help I am now able to run this command, but I still get lots of messages saying "Normalization failed for string" and, at the end, "Error writing unicharset!!". Any help is welcome; I am so new to

Re: [tesseract-ocr] Re: unrecognized argument "unrecognised argument linedata_only"

2018-07-23 Thread Lorenzo Bolzani
Please read the complete error message: it's telling you exactly where the problem is. I think you are using "fancy double quotes" or something like that rather than the normal ones. Are you cutting and pasting from some word processor? This is probably causing all the errors...

2018-07-23 9:4
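That diagnosis can be checked mechanically. A minimal Python sketch, with hypothetical helper names, that swaps the common typographic quotes a word processor inserts for plain ASCII ones and then lists any other non-ASCII characters still left in a pasted command; the replacement table is an assumption covering the usual curly quotes, not an exhaustive list:

```python
from typing import List, Tuple

# Curly quotes that word processors commonly substitute for ASCII quotes.
SMART_QUOTES = {
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u201e": '"',  # DOUBLE LOW-9 QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
}
_QUOTE_TABLE = str.maketrans(SMART_QUOTES)


def normalize_quotes(command: str) -> str:
    """Replace curly quotes with the ASCII quotes the shell understands."""
    return command.translate(_QUOTE_TABLE)


def non_ascii_chars(command: str) -> List[Tuple[int, str, str]]:
    """Report (position, character, code point) for each remaining non-ASCII char."""
    return [(i, ch, f"U+{ord(ch):04X}") for i, ch in enumerate(command) if ord(ch) > 127]


if __name__ == "__main__":
    pasted = "--fonts_dir \u201c/usr/share/fonts\u201d --lang ben --linedata_only"
    fixed = normalize_quotes(pasted)
    print(fixed)
    print(non_ascii_chars(fixed) or "only ASCII left")
```

Anything non_ascii_chars still reports (for example, a dash that autocorrect substituted for two hyphens) is worth retyping by hand.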

Re: [tesseract-ocr] Need Help Tesseract for unicode characters (Vietnamese) with framework IONIC

2018-07-23 Thread Adrian Owen
I had similar problems with Traditional Chinese. My issue was caused by the low quality of the input. I did some preprocessing and resizing, and the accuracy improved. Hope this helps.

Original Message
Subject: [tesseract-ocr] Need Help Tesseract for unicode characters (Vietna

[tesseract-ocr] Need Help Tesseract for unicode characters (Vietnamese) with framework IONIC

2018-07-23 Thread Tai Doan
I'm using IONIC 3. I want to use Tesseract to recognize words in Vietnamese images, but when I use Tesseract, words are recognized incorrectly or turned into numbers. Please help. I am following https://devdactic.com/ionic-ocr-using-tesseract/

Thank you.

Re: [tesseract-ocr] Re: unrecognized argument "unrecognised argument linedata_only"

2018-07-23 Thread Jennil Thiyam
I tried using Lohit Bengali, and here is the command:

/usr/share/tesseract-ocr/./tesstrain.sh --fonts_dir /usr/share/fonts --lang ben --linedata_only --noextract_font_properties --langdata_dir /home/jennil/Desktop/pro/langdata-master --tessdata_dir /usr/share/tesseract-ocr/4.00/tessdata --output_dir

[tesseract-ocr] Is there a way to specify to Tesseract API which color represents background and foreground?

2018-07-23 Thread Dario Cazzato
Dear all, I am processing bounding boxes that each contain a single letter. The bounding boxes are found with other techniques, and a preprocessing step directly creates a binary mask that is quite well segmented for Tesseract OCR. The method works, but sometimes it happens that the OCR is detect
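The archive does not show a reply. One possible workaround, assuming the misreads come from masks with inverted polarity, is to normalize every mask to dark text on a light background before passing it to Tesseract; Tesseract's "Improving the quality of the output" documentation recommends dark text on a light background for 4.x. A minimal pure-Python sketch with a hypothetical function name, assuming the mask is a list of rows of 8-bit values (0 = black, 255 = white) and that in a single-letter box the background covers more pixels than the glyph:

```python
from typing import List

Mask = List[List[int]]  # rows of 8-bit pixels: 0 = black, 255 = white


def to_dark_on_light(mask: Mask) -> Mask:
    """Hypothetical helper: return a copy with dark text on a light background.

    Heuristic (an assumption): in a box holding one letter the background
    covers more pixels than the glyph, so a mask that is mostly black is
    treated as inverted and flipped.
    """
    total = sum(len(row) for row in mask)
    black = sum(1 for row in mask for p in row if p == 0)
    if black * 2 > total:
        # Mostly black: invert so the glyph becomes dark on white.
        return [[255 - p for p in row] for row in mask]
    return [list(row) for row in mask]  # already dark-on-light: plain copy
```

The majority rule is only a heuristic: a bold glyph cropped so tightly that it covers more than half of its box would be flipped wrongly.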