Re: [tesseract-ocr] Training stops before specified iterations

2019-07-19 Thread Shree Devi Kumar
Look at tesstrain.log for details of your training run. The flag table at https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#lstmtraining-command-line lists target_error_rate (type double, default 0.01): Stop training if the mean percent error rate gets below this value. On Fri, Jul 19, 2019 at 11:39 AM Pooja Kamr

Re: [tesseract-ocr] Trained data for E13B font

2019-07-19 Thread Lorenzo Bolzani
PSM 7 was a partial solution for my specific case; it improved the situation but did not solve it. Also, I could not use it in some other cases. The proper solution is very likely more training with more data; data augmentation might help if data is scarce. Also doing less train

Re: [tesseract-ocr] Trained data for E13B font

2019-07-19 Thread ElGato ElMago
Lorenzo, We both have the same case. It seems a solution to this problem would help a lot of people. Shree, I pulled the current head of the master branch, but it doesn't seem to contain the merges you pointed to, which were merged 3 to 4 days ago. How can I get them? ElMagoElGato 2019年7月1

Re: [tesseract-ocr] Trained data for E13B font

2019-07-19 Thread Claudiu
Is there any way to pass bounding boxes to use to the LSTM? We have an algorithm that cleanly gets bounding boxes of MRZ characters. However the results using psm 10 are worse than passing the whole line in. Yet when we pass the whole line in we get these phantom characters. Should PSM 10 mode wor

Re: [tesseract-ocr] Trained data for E13B font

2019-07-19 Thread Shree Devi Kumar
>Is there any way to pass bounding boxes to use to the LSTM? See https://github.com/tesseract-ocr/tesseract/wiki/APIExample#getcomponentimages-example On Fri, Jul 19, 2019 at 2:50 PM Claudiu wrote: > Is there any way to pass bounding boxes to use to the LSTM? We have an > algorithm that cleanl

Re: [tesseract-ocr] Trained data for E13B font

2019-07-19 Thread Shree Devi Kumar
>I pulled the current head of master branch but it doesn't seem to contain the merges you pointed that have been merged 3 to 4 days ago. How can I get them? I usually do `git pull origin master` to get all latest changes from the master branch. On Fri, Jul 19, 2019 at 2:35 PM ElGato ElMago wrot

Re: [tesseract-ocr] Trained data for E13B font

2019-07-19 Thread ElGato ElMago
Lorenzo, I haven't been checking psm too much. I will turn to those options after I see how it goes with bounding boxes. Shree, I see the merges in the git log and also see that the new option lstm_choice_amount works now. I guess my executable is the latest, though I still see the phantom character.

Re: [tesseract-ocr] Trained data for E13B font

2019-07-19 Thread Claudiu
Thanks for that link. Should passing a full image and calling SetRectangle have different behavior than passing just the cropped image and not using SetRectangle? On Fri, Jul 19, 2019 at 11:29 AM Shree Devi Kumar wrote: > >Is there any way to pass bounding boxes to use to the LSTM? > > See > htt

Re: [tesseract-ocr] Training stops before specified iterations

2019-07-19 Thread Pooja Kamra
As per log file, finished error rate is 1.439. On Friday, July 19, 2019 at 1:24:35 PM UTC+5:30, shree wrote: > > Look at tesstrain.log for details of your training run. > > > https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#lstmtraining-command-line > > > target_error_rat

Re: [tesseract-ocr] GPU for Tesseract

2019-07-19 Thread Pooja Kamra
Tesseract does not require a GPU. But if my system has a GPU, will it help improve performance? On Friday, June 28, 2019 at 7:02:30 PM UTC+5:30, Timothy Snyder wrote: > > I think it means that Tesseract doesn't support nor require hardware > acceleration via the GPU. > > Looks like there is

Re: [tesseract-ocr] Training stops before specified iterations

2019-07-19 Thread Shree Devi Kumar
As per your screenshot 15000 iterations have been done. On Fri, Jul 19, 2019 at 3:52 PM Pooja Kamra wrote: > As per log file, finished error rate is 1.439. > > > > On Friday, July 19, 2019 at 1:24:35 PM UTC+5:30, shree wrote: >> >> Look at tesstrain.log for details of your training run. >> >> >>
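The two stopping conditions discussed in this thread can be sketched as follows. This is only an illustration, not the lstmtraining source: the function name `should_stop` is hypothetical, `max_iterations` is modeled on the lstmtraining flag of the same name, treating 0 as "no limit" is an assumption, and the limit of 15000 in the usage note is assumed for the example.

```python
def should_stop(error_rate, iteration, target_error_rate=0.01, max_iterations=0):
    """Return True when training would end under either condition.

    error_rate and target_error_rate are mean percent error rates.
    max_iterations == 0 is treated as "no iteration limit" (assumption).
    """
    # Condition 1: the error rate got below the target.
    reached_target = error_rate < target_error_rate
    # Condition 2: the iteration limit, if any, has been reached.
    reached_limit = max_iterations > 0 and iteration >= max_iterations
    return reached_target or reached_limit


# With the numbers from the thread: the error rate of 1.439 is far above
# the default target of 0.01, so only the iteration limit can end the run.
stopped_by_limit = should_stop(1.439, 15000, max_iterations=15000)
```

Under these assumptions, a run that ends with an error rate of 1.439 did not stop because of `target_error_rate`; it stopped because the iteration count reached the limit.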

Re: [tesseract-ocr] understading lstmeval and use it on pretrained models for comparison

2019-07-19 Thread Arno Loo
I went and tried to understand the source code as well as I could and although I did not find all the answers I did find some. (for tesseract 4.0.0-beta.3) At iteration 14615/695400/698614, Mean rms=0.158%, delta=0.295%, char train= 1.882%, word train=2.285%, skip ratio=0.4%, wrote checkpoint.
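A progress line like the one quoted above can be split into fields with a regular expression. This is a minimal sketch that assumes the line format stays exactly as quoted. The labels `learning`, `training`, and `sample` for the three iteration counters are my own naming and should be checked against the source for your Tesseract version; `parse_progress` is a hypothetical helper, not part of Tesseract.

```python
import re

LOG_LINE = (
    "At iteration 14615/695400/698614, Mean rms=0.158%, delta=0.295%, "
    "char train= 1.882%, word train=2.285%, skip ratio=0.4%, wrote checkpoint."
)

# \s* after each "=" tolerates the padding seen in "char train= 1.882%".
PATTERN = re.compile(
    r"At iteration (\d+)/(\d+)/(\d+), "
    r"Mean rms=\s*([\d.]+)%, delta=\s*([\d.]+)%, "
    r"char train=\s*([\d.]+)%, word train=\s*([\d.]+)%, "
    r"skip ratio=\s*([\d.]+)%"
)


def parse_progress(line):
    """Return the fields of a progress line as a dict, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    groups = match.groups()
    learning, training, sample = (int(g) for g in groups[:3])
    rms, delta, char_train, word_train, skip = (float(g) for g in groups[3:])
    return {
        "iterations": (learning, training, sample),  # counter labels assumed
        "mean_rms": rms,
        "delta": delta,
        "char_train": char_train,  # character error rate on training data, %
        "word_train": word_train,  # word error rate on training data, %
        "skip_ratio": skip,
    }


fields = parse_progress(LOG_LINE)
```

Collecting `char_train` over all progress lines in a log gives a quick way to see how the error rate evolves across checkpoints.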

Re: [tesseract-ocr] Training stops before specified iterations

2019-07-19 Thread Arno Loo
I was confused about the triple iteration number too... https://groups.google.com/d/msg/tesseract-ocr/hni4owhU3vs/ankF3gSrAwAJ On Friday, July 19, 2019 at 12:22:16 UTC+2, Pooja Kamra wrote: > > As per log file, finished error rate is 1.439. > > > > On Friday, July 19, 2019 at 1:24:35 PM UTC+5

[tesseract-ocr] Re: GPU for Tesseract

2019-07-19 Thread Arno Loo
If I understand correctly, you can run Tesseract OCR on a GPU to speed up the process, but not Tesseract training. On Friday, June 28, 2019 at 07:54:07 UTC+2, Pooja Kamra wrote: > > On Tesseract site, it is mentioned that no GPU is needed (No support). > What does this statement mean? > If i ha

Re: [tesseract-ocr] understading lstmeval and use it on pretrained models for comparison

2019-07-19 Thread Shree Devi Kumar
Very well written. You may want to update the wiki pages with the info too. On Fri, Jul 19, 2019 at 7:45 PM Arno Loo wrote: > I went and tried to understand the source code as well as I could and > although I did not find all the answers I did find some. (for tesseract > 4.0.0-beta.3) > At itera