Re: [tesseract-ocr] confuse whether Otsu Thresholding affects lstm training

2019-04-10 Thread Shree Devi Kumar
Hi Lorenzo, Thanks for the detailed description of the pre-processing steps. I will link to it from the wiki so that it is available for easy reference. Thank you for sharing.

Re: [tesseract-ocr] confuse whether Otsu Thresholding affects lstm training

2019-04-08 Thread Lorenzo Bolzani
Hi, yes, at the very least you can use some adaptive threshold method, like OTSU, to find the best parameters. But OTSU has its own parameters so you need to fine tune those too (a little). What worked best for me was first to do a rough normalization of the images (lightness, contrast) and then d
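For readers unfamiliar with the method mentioned above: classic Otsu thresholding picks the single global threshold that maximizes the between-class variance of the grayscale histogram. The sketch below is a minimal pure-Python illustration of that idea, not Tesseract's or OpenCV's implementation. The names `otsu_threshold` and `binarize` are hypothetical, and the input is assumed to be a flat list of 8-bit gray values. Tunable parameters such as block size belong to tiled or locally adaptive variants, which are not shown here.

```python
def otsu_threshold(pixels):
    """Return t in 0..255 maximizing between-class variance.

    Pixels <= t form the dark class, pixels > t the bright class.
    `pixels` is assumed to be a non-empty flat list of ints in 0..255.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of gray values in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break             # bright class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, t):
    """Map each pixel to 0 (dark, <= t) or 1 (bright, > t)."""
    return [1 if p > t else 0 for p in pixels]
```

On a clearly bimodal image any threshold between the two modes scores the same, so the sketch returns the lowest such value; real images rarely have that tie.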

Re: [tesseract-ocr] confuse whether Otsu Thresholding affects lstm training

2019-04-08 Thread Lorenzo Bolzani
Hi Shree, I'd love to but it is a commercial project I'm working on so I cannot share the current solution. I will try to find the old scripts I used for the first attempts. Basically it was something like this:
- normalize lightness
- make illumination uniform (CLAHE on HSV "V" channel)
- denois

Re: [tesseract-ocr] confuse whether Otsu Thresholding affects lstm training

2019-04-03 Thread Du Kotomi
Thank you so much for sharing. It seems like a very complicated cleanup. It would be very useful if you could provide a preprocessing script. I am also wondering: some of the thresholds depend on the individual image, right? By the way, I have read some papers about LSTM + CTC for OCR. The ad

Re: [tesseract-ocr] confuse whether Otsu Thresholding affects lstm training

2019-04-03 Thread Shree Devi Kumar
Hi Lorenzo, Do you have a script for image pre-processing? Please share, if possible. It will be helpful to many.
On Wed, Apr 3, 2019 at 6:47 PM Lorenzo Bolzani wrote:
> Hi, I train with real data. I use grayscale images, I think color makes no
> difference.
>
> I do a very good image cleanup:

Re: [tesseract-ocr] confuse whether Otsu Thresholding affects lstm training

2019-04-03 Thread Lorenzo Bolzani
Hi, I train with real data. I use grayscale images, I think color makes no difference. I do a very good image cleanup: background removal, denoise, straightening, sharpening, illumination correction, contrast stretching, etc. before passing the text to tesseract. This part is likely better done o
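Of the cleanup steps listed above, contrast stretching is the simplest to illustrate. The following is a minimal pure-Python sketch under stated assumptions, not the script used in this thread: the function name `stretch_contrast` and the percentile defaults are hypothetical, and the image is assumed to be a flat list of 8-bit gray values.

```python
def stretch_contrast(pixels, low_pct=1.0, high_pct=99.0):
    """Linearly rescale gray values so that the low/high percentiles
    map to 0 and 255. Values outside that range are clipped.

    `pixels` is assumed to be a non-empty flat list of ints in 0..255.
    """
    ordered = sorted(pixels)
    n = len(ordered)
    lo = ordered[int((n - 1) * low_pct / 100)]
    hi = ordered[int((n - 1) * high_pct / 100)]
    if hi == lo:
        # Flat image (or degenerate percentiles): nothing to stretch.
        return list(pixels)

    out = []
    for p in pixels:
        v = round((p - lo) * 255 / (hi - lo))
        out.append(max(0, min(255, v)))  # clip to the 8-bit range
    return out
```

Using percentiles rather than the absolute minimum and maximum keeps a few stray very dark or very bright pixels from dominating the stretch.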

Re: [tesseract-ocr] confuse whether Otsu Thresholding affects lstm training

2019-04-03 Thread Du Kotomi
Thank you for your kind reminder. I have done this, and something confusing happens. I trained my model on grayscale images without resizing them to 300 dpi. It does not do that well when I validate with some of my test data. But if I resize the test images to 300 dpi, the results are better. In fact, the image will be resized to
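As a small illustration of what "resizing to 300 dpi" means in pixels: if the source resolution is known (for example from scanner metadata, which is an assumption here), the target size follows from a single scale factor. The helper name `size_at_dpi` is hypothetical, and this says nothing about what Tesseract does internally.

```python
def size_at_dpi(width_px, height_px, source_dpi, target_dpi=300):
    """Return the (width, height) in pixels that an image scanned at
    `source_dpi` would have at `target_dpi`, keeping its physical size."""
    if source_dpi <= 0:
        raise ValueError("source_dpi must be positive")
    scale = target_dpi / source_dpi
    return round(width_px * scale), round(height_px * scale)
```

For instance, a 1000 x 500 pixel image scanned at 150 dpi would be resampled to 2000 x 1000 pixels. The resampling itself would be done with whatever imaging library is in use.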

Re: [tesseract-ocr] confuse whether Otsu Thresholding affects lstm training

2019-04-03 Thread Shree Devi Kumar
I haven't trained with real images. I would guess that training images should be similar to what you will be using for OCR. It might be best to test with a small set of images and see what works best for you.
On Wed, Apr 3, 2019 at 2:38 PM Du Kotomi wrote:
> If we use text2image tool, there is n

Re: [tesseract-ocr] confuse whether Otsu Thresholding affects lstm training

2019-04-03 Thread Du Kotomi
If we use the text2image tool, there is no such problem. But what about training with our own real data? I have enough images for training. Do I need to do some preprocessing, such as binarization or resizing to a target dpi, before LSTM training?
On Wed, Apr 3, 2019 at 16:36 Shree Devi Kumar wrote:
> Usually for LSTM train

Re: [tesseract-ocr] confuse whether Otsu Thresholding affects lstm training

2019-04-03 Thread Shree Devi Kumar
Usually for LSTM training we use synthetic images that the text2image program renders from training text and fonts, driven by tesstrain.sh or tesstrain.py. Hence the question of binarization or dpi does not arise, as the program creates images in the form the tesseract training process expects. On Wed, Apr 3, 2019 at

Re: [tesseract-ocr] confuse whether Otsu Thresholding affects lstm training

2019-04-03 Thread Du Kotomi
Anybody here?
On Wed, Apr 3, 2019 at 09:57 wrote:
> Sorry to disturb you again. I sent my issue before, but no one has
> answered. I really need your help.
>
>
> I went through the source code and found that tesseract does Otsu thresholding and
> puts the binary pix in the Thresholder object.
> But

[tesseract-ocr] confuse whether Otsu Thresholding affects lstm training

2019-04-02 Thread kotomi . niu
Sorry to disturb you again. I sent my issue before, but no one has answered. I really need your help. I went through the source code and found that tesseract does Otsu thresholding and puts the binary pix in the Thresholder object. But it seems the Thresholder object is not invoked if I us

[tesseract-ocr] confuse whether Otsu Thresholding affects lstm training

2019-04-02 Thread kotomi . niu
I went through the source code and found that tesseract does Otsu thresholding and puts the binary pix in the Thresholder object. But it seems the Thresholder object is not invoked if I use the LSTM engine. As for dpi, the tesseract wiki says that 300 dpi works best. This is a requirement for tes