Re: [tesseract-ocr] How to choose the stop condition of LSTM training

2019-04-18 Thread Lorenzo Bolzani
Yes, lstmeval is manual but easy to automate. I use a script like this:

./train.sh $NAME 100
./train.sh $NAME 300
./train.sh $NAME 400
./train.sh $NAME 500
./train.sh $NAME 750
./train.sh $NAME 1000
./train.sh $NAME 1200
...

It runs short trainings, saves the models into a folder and runs lstmeval.
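A minimal sketch of how the repeated calls could be folded into a loop. It assumes only what the calls above show: that train.sh takes a model name and an iteration count. What train.sh does internally (training, saving the model, running lstmeval) is not shown in the thread, and the names run_schedule and TRAIN_CMD are hypothetical, introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run the training script once per iteration count.
# TRAIN_CMD is an assumed override hook; it defaults to the poster's ./train.sh.

run_schedule() {
    # $1 = model name, remaining arguments = iteration counts to train to
    name="$1"
    shift
    for iters in "$@"; do
        # Stop at the first failed run so later checkpoints are not
        # evaluated against a missing or broken model.
        ${TRAIN_CMD:-./train.sh} "$name" "$iters" || return 1
    done
}
```

It could then be called as `run_schedule "$NAME" 100 300 400 500 750 1000 1200`. Setting `TRAIN_CMD=echo` first prints the calls instead of running them, which is a cheap way to check the schedule before starting a long training.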

Re: [tesseract-ocr] How to choose the stop condition of LSTM training

2019-04-18 Thread 易鑫
Thank you. I see.

On Thu, Apr 18, 2019 at 3:00 PM, Lorenzo Bolzani wrote:
> Yes, lstmeval is manual but easy to automate. I use a script like this:
>
> ./train.sh $NAME 100
> ./train.sh $NAME 300
> ./train.sh $NAME 400
> ./train.sh $NAME 500
> ./train.sh $NAME 750
> ./train.sh $NAME 1000
> ./train.sh $NAME 1200

[tesseract-ocr] Can I use this way for fine tuning?

2019-04-18 Thread yixinlucky080
Hello, everyone: I have used Tesseract 4.0 to train a chi_sim model, but the result is not as good as I expected, so I have come up with a way of fine-tuning it. 1. src/training/tesstrain.sh --fonts_dir /usr/share/fonts --training_text ../training_data/chi_sim_layer_training_text \ --langdata_dir ../

[tesseract-ocr] Re: training tesseract 4.0.. issue with 'Make leptonica' giving error

2019-04-18 Thread yoganand
Yes, I have followed the same steps. They are not very clear, at least to me. I have finished the Windows setup in Cygwin. I didn't follow the steps that are mentioned for compiling Leptonica and Tesseract. The make command gives an error:

$ make tesseract
make: *** No rule to make target 'tes

[tesseract-ocr] traning devanagari: »Encoding of string failed!«

2019-04-18 Thread barth
Dear reader, I want to improve Devanagari recognition. I have images and manually corrected text with line coordinates. From those, I've generated .box files; see the attached file, which produces the error above. Complete error message from lstmtrain: »Encoding of string failed! Failure bytes: 9 32

Re: [tesseract-ocr] traning devanagari: »Encoding of string failed!«

2019-04-18 Thread Shree Devi Kumar
> I have images and manually corrected Text with line coordinates. From those, I've generated .box files;

What method did you use for generating the .box files? Please provide the image that belongs to the box file so it can be tested.

On Thu, Apr 18, 2019 at 6:09 PM wrote:
> Dear reader,
> I want to improve devanag

Re: [tesseract-ocr] traning devanagari: »Encoding of string failed!«

2019-04-18 Thread Shree Devi Kumar
Also see https://github.com/OCR-D/ocrd-train/pull/66 and https://github.com/tesseract-ocr/tesseract/issues/2357#issuecomment-477239316

Re: [tesseract-ocr] Can I use this way for fine tuning?

2019-04-18 Thread 易鑫
Is anybody here? Can someone help me? Thanks a lot.

On Thu, Apr 18, 2019 at 5:19 PM wrote:
> Hello, everyone:
> I have used Tesseract 4.0 to train a chi_sim model, but the result is
> not as good as I expected, so I have come up with a way of fine-tuning it.
>
> 1. src/training/tesstrain.sh --fonts_dir /usr/share/font

[tesseract-ocr] Italian, Portuguese, Arabic, Japanese, Korean, and Chinese test datasets

2019-04-18 Thread Sarasi Lalithsena
Hello everyone, I am looking for datasets to test OCR engines on the following languages: Italian, Portuguese, Arabic, Japanese, Korean, and Chinese. The datasets need to contain the raw OCR documents and the ground truth text. If you know of any such dataset, please post it here. Maybe it is helpful to have a catal