[tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-17 Thread Ramakant Kushwaha
Hi, recently I have been trying to retrain Tesseract 4.0 to recognise handwritten digits. I am following the official page but am finding it very difficult. It would be great if someone could elaborate on the steps below: - Prepare training text.
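For a digits-only model, the "prepare training text" step amounts to producing a plain-text file of the strings the model should learn. A minimal sketch, assuming random space-separated digit groups are acceptable training text; the group lengths, line count, seed and output file name are hypothetical choices, not anything taken from the official page:

```python
import random

DIGITS = "0123456789"


def make_digit_training_text(num_lines=100, groups_per_line=6, seed=0):
    """Return lines of space-separated random digit groups.

    Assumption: random 1-6 digit groups are a reasonable training text for a
    digits-only model; adjust to match the layout of your real data.
    """
    rng = random.Random(seed)  # seeded so the text is reproducible
    lines = []
    for _ in range(num_lines):
        groups = ["".join(rng.choice(DIGITS) for _ in range(rng.randint(1, 6)))
                  for _ in range(groups_per_line)]
        lines.append(" ".join(groups))
    return lines


def write_training_text(path, lines):
    """Write one training line per text line (UTF-8, trailing newline)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

Usage: `write_training_text("digits.training_text", make_digit_training_text())`, where the file name is only an example; point the training script at whatever path it expects.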

[tesseract-ocr] Re: OCR-D training process - High error rate [Tess 4]

2018-07-17 Thread Ramakant Kushwaha
Hi, I am also trying to train Tesseract 4.0 for handwritten digits. I want to know the best way to create pairs of [*.tif, *.gt.txt] files "with binarized chars and TTFs from two fonts (1869 text lines in total)". Are you using any specific tool to generate the *.tif and *.gt.txt files? I h
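Whatever tool produces the pairs, it is cheap to check them before training. A minimal sketch, assuming the OCR-D training convention of `name.tif` next to `name.gt.txt`, with a single line of text per transcription; `check_pairs` is a hypothetical helper, not part of Tesseract or ocrd-train:

```python
from pathlib import Path


def check_pairs(directory):
    """Report problems in a [*.tif, *.gt.txt] ground-truth directory.

    Assumes the naming convention  line_0001.tif <-> line_0001.gt.txt.
    Returns a list of human-readable problem strings (empty if all is well).
    """
    d = Path(directory)
    problems = []
    tifs = {p.name[: -len(".tif")] for p in d.glob("*.tif")}
    gts = {p.name[: -len(".gt.txt")] for p in d.glob("*.gt.txt")}
    for stem in sorted(tifs - gts):
        problems.append(f"{stem}.tif has no {stem}.gt.txt")
    for stem in sorted(gts - tifs):
        problems.append(f"{stem}.gt.txt has no {stem}.tif")
    for stem in sorted(tifs & gts):
        # Each transcription is expected to be exactly one non-empty line.
        lines = (d / f"{stem}.gt.txt").read_text(encoding="utf-8").splitlines()
        if len(lines) != 1 or not lines[0].strip():
            problems.append(f"{stem}.gt.txt should contain exactly one non-empty line")
    return problems
```

Running it on the ground-truth folder and fixing everything it reports before starting a training run is one way to rule out bad pairs as the cause of a crash.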

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-17 Thread Ramakant Kushwaha
> is working, but it's a bumpy road (lots of assertion failures/segmentation faults if you miss something).
>
> Bye
> Lorenzo
>
> 2018-07-17 15:03 GMT+02:00 Ramakant Kushwaha:
>> Hi,
>> Recently I have been trying to retrain Tesseract 4.0

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-17 Thread Ramakant Kushwaha
>> ine manually (1 2 3 4...) and duplicate that one. Or you could use a very good online OCR service.
>>
>> But I'm not convinced this data is good for training. What does the real data that you want to recognize look like? Individual dig
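If every line image really contains the same sequence (for example a sheet where each line was written as "1 2 3 4 ..."), transcribing one line and duplicating it can be scripted. A sketch under that assumption; the function name and the `*.tif` glob are hypothetical:

```python
from pathlib import Path


def write_duplicated_ground_truth(image_dir, transcription, pattern="*.tif"):
    """Write the same one-line transcription next to every line image.

    Only valid when every line image contains the same sequence.
    Returns the written .gt.txt paths, sorted by image name.
    """
    text = " ".join(transcription.split()) + "\n"  # one line, single spaces
    written = []
    for image in sorted(Path(image_dir).glob(pattern)):
        gt = image.with_name(image.stem + ".gt.txt")  # line_01.tif -> line_01.gt.txt
        gt.write_text(text, encoding="utf-8")
        written.append(gt)
    return written
```

If even one line deviates from the template (a skipped or repeated digit), its ground truth will be wrong, which is one reason to be cautious about this data, as the reply above points out.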

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-18 Thread Ramakant Kushwaha
> omponents(blobs_img) and sort
> i = 0
> for d in digits:
>     tiff = crop from original image using d coordinates
>     gt.txt = i
>     i = (i+1)%10
>
> Now you have the TIFF images and the gt.txt files to run OCR-D.
>
> Maybe there are some tools to d
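The quoted pseudocode (find the blobs, sort them, crop each one from the original image and give it a label cycling through 0..9) can be made runnable. A minimal pure-Python sketch, assuming the page has already been binarized into a 2D list of 0/1 values, that each digit is a single 8-connected blob, and that the digits were written in one row in order; the function names, `first_digit`, `min_area` and the file naming in `save_pairs` are hypothetical, and `save_pairs` assumes Pillow is installed:

```python
from collections import deque
from pathlib import Path


def connected_components(binary):
    """Bounding boxes (x0, y0, x1, y1), x1/y1 exclusive, of 8-connected
    foreground (non-zero) pixels in a 2D list."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1 = x, y, x + 1, y + 1
            while queue:  # breadth-first flood fill of one blob
                cx, cy = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx + 1), max(y1, cy + 1)
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def crop(image, box):
    """Cut a box out of a 2D list (rows of pixel values)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def make_digit_pairs(gray, binary, first_digit=0, min_area=1):
    """Pair each blob, cropped from the original gray image, with a label
    that cycles through 0..9, as in the quoted pseudocode.

    Assumes one row of digits, so blobs are sorted left to right.
    """
    boxes = [b for b in connected_components(binary)
             if (b[2] - b[0]) * (b[3] - b[1]) >= min_area]  # drop tiny boxes
    boxes.sort(key=lambda b: b[0])
    pairs = []
    label = first_digit
    for box in boxes:
        pairs.append((crop(gray, box), str(label)))
        label = (label + 1) % 10
    return pairs


def save_pairs(pairs, out_dir, prefix="digit"):
    """Write digit_00000.tif / digit_00000.gt.txt pairs (requires Pillow)."""
    from PIL import Image  # assumption: Pillow is available

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for n, (pixels, label) in enumerate(pairs):
        img = Image.new("L", (len(pixels[0]), len(pixels)))
        img.putdata([v for row in pixels for v in row])  # expects 0..255 ints
        img.save(out / f"{prefix}_{n:05d}.tif")
        (out / f"{prefix}_{n:05d}.gt.txt").write_text(label + "\n", encoding="utf-8")
```

Handwritten digits whose strokes do not touch (a broken 5 or 4) come out as two blobs and shift every later label, so it is worth eyeballing a few crops. If OpenCV is available, `cv2.connectedComponentsWithStats` can replace the hand-written labelling step.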

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-18 Thread Ramakant Kushwaha
> s best, it also depends on speed requirements.
>
> Number 3: this is easy once you have a small image with just a single character inside. You do not need to make a binary black/white image; gray is fine (at least that is what works best for me). You can use an MNIST trained
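To feed a single-character crop to an MNIST-trained model, it usually has to match MNIST's input format: the digit is size-normalized to fit a 20x20 box (aspect ratio preserved) inside a 28x28 image, with background 0 and ink towards 255. A minimal pure-Python sketch of that normalization, keeping gray levels as suggested above; it assumes dark ink on light paper, uses nearest-neighbour scaling and centres by bounding box instead of MNIST's centre-of-mass shift, and the function name is hypothetical:

```python
def to_mnist_like(gray, box=20, canvas=28, ink_is_dark=True):
    """Turn a single-character grayscale crop (rows of 0..255 values) into a
    canvas x canvas image with background 0 and ink near 255, scaled so the
    longer side is `box` pixels and centred by bounding box.

    Simplifications: nearest-neighbour scaling, no centre-of-mass shift.
    """
    h, w = len(gray), len(gray[0])
    if ink_is_dark:  # paper scans: dark ink on light background
        gray = [[255 - v for v in row] for row in gray]
    scale = box / max(h, w)
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    scaled = [[gray[min(h - 1, int(y / scale))][min(w - 1, int(x / scale))]
               for x in range(nw)] for y in range(nh)]
    out = [[0] * canvas for _ in range(canvas)]
    top, left = (canvas - nh) // 2, (canvas - nw) // 2
    for y in range(nh):
        for x in range(nw):
            out[top + y][left + x] = scaled[y][x]
    return out
```

Whether the bounding-box centring is close enough depends on the model; if accuracy is poor, shifting by the centre of mass as MNIST does is the next thing to try.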

Re: [tesseract-ocr] Easiest way to make training data to recognize Handwritten Mathematical Expressions

2018-07-18 Thread Ramakant Kushwaha
As per suggestions on the group, I am planning to use an MNIST model for handwriting. I do not have trained data; I will post if I work on it. On Wed, Jul 18, 2018 at 7:09 PM, Hamza Rajput wrote:
> If some trained data for this application has already been developed, instead of equ.traineddata, then please

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-19 Thread Ramakant Kushwaha
> corpus and generate the starter traineddata.
> 3. Use the starter traineddata to generate the final traineddata after LSTM training.
>
> If you want a detailed description, I can supply you with complete documentation of the steps.
>
> Chandra Churh Chatterjee
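The last quoted steps (train with the starter traineddata, then produce the final traineddata) correspond to two `lstmtraining` invocations in the Tesseract 4 training docs: one that trains, here by continuing from an existing `.lstm` model (for example one extracted with `combine_tessdata -e eng.traineddata eng.lstm`), and one with `--stop_training` that converts the last checkpoint. A sketch that only assembles those argument lists, assuming the fine-tuning route rather than training from scratch (which additionally needs `--net_spec`); all paths and the iteration count are hypothetical, and if the starter unicharset differs from the base model's, the docs' recipe also passes `--old_traineddata`:

```python
import shlex


def lstm_training_cmd(continue_from, traineddata, model_output, train_listfile,
                      max_iterations=400):
    """argv for LSTM training that continues from an existing .lstm model."""
    return ["lstmtraining",
            "--continue_from", continue_from,
            "--traineddata", traineddata,
            "--model_output", model_output,
            "--train_listfile", train_listfile,
            "--max_iterations", str(max_iterations)]


def stop_training_cmd(model_output, traineddata, final_traineddata):
    """argv that converts the last checkpoint into a final .traineddata.

    lstmtraining writes its checkpoint as <model_output>_checkpoint.
    """
    return ["lstmtraining", "--stop_training",
            "--continue_from", model_output + "_checkpoint",
            "--traineddata", traineddata,
            "--model_output", final_traineddata]


def as_shell(argv):
    """Quote an argv list for pasting into a shell."""
    return " ".join(shlex.quote(a) for a in argv)
```

Each list can be run with `subprocess.run(cmd, check=True)`, or printed with `as_shell` to compare against the commands in the training docs before running anything.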