[tesseract-ocr] Shapeclustering Not Responding

2018-07-17 Thread xyqiao7
Hi all, I'm trying to train Tesseract. I've gone through the first few steps, including: 1. getting TIFFs 2. creating the box files 3. correcting the box files 4. training (tesseract [language].[fontname].exp[samplenumber].tif [language].[fontname].exp[samplenumber] box.train) 5. creating the u

[tesseract-ocr] Creating traineddata with specific wordlist

2018-07-17 Thread James Q
Hi, I'm trying to create a traineddata file with a specific word list. What I have done so far: 1.) Created specific files in langdata/eng - eng.wordlist (containing my specific words) - eng.finetune.training_text (representative text containing only chars found in my words) - eng.numbers a

[tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-17 Thread Ramakant Kushwaha
Hi, I have recently been trying to retrain Tesseract 4.0 to recognise handwritten digits. I am following the official page but finding it very difficult. It would be great if someone could elaborate on the steps below: - Prepare training text.

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-17 Thread Lorenzo Bolzani
Have a look at this thread: https://groups.google.com/forum/#!topic/tesseract-ocr/be4-rjvY2tQ It's easier than it seems: you do not need per-character boxes with 4.0, just one per line (which OCR-D generates automatically). If your text is already split into lines you do not have to do anything m
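
As a rough illustration of the one-transcription-per-line layout mentioned in this reply: a minimal sketch, assuming each line image NAME.tif gets a sibling NAME.gt.txt holding its text (the [*.tif, *.gt.txt] pairing named elsewhere in this archive). The helper name write_ground_truth and the example file names are hypothetical and not part of ocrd-train.

```python
from pathlib import Path


def write_ground_truth(image_dir, transcriptions):
    """Write NAME.gt.txt next to each NAME.tif listed in `transcriptions`.

    `transcriptions` maps an image file name to the text of that line.
    Returns the list of .gt.txt paths written; images that do not exist
    are skipped rather than guessed at.
    """
    image_dir = Path(image_dir)
    written = []
    for image_name, text in sorted(transcriptions.items()):
        image_path = image_dir / image_name
        if not image_path.is_file():
            continue  # no image, so no ground-truth file
        # Replace only the final suffix: line_0001.tif -> line_0001.gt.txt
        gt_path = image_path.with_suffix(".gt.txt")
        gt_path.write_text(text.strip() + "\n", encoding="utf-8")
        written.append(gt_path)
    return written
```

Calling write_ground_truth("data/ground-truth", {"line_0001.tif": "12345"}) would, under these assumptions, leave line_0001.gt.txt beside the image; the directory name here is likewise only an example.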

[tesseract-ocr] Re: OCR-D training process - High error rate [Tess 4]

2018-07-17 Thread Ramakant Kushwaha
Hi, I am also trying to train Tesseract 4.0 for handwritten digits. I want to know the best way to create pairs of [*.tif, *.gt.txt] with binarized chars and TTFs from two fonts (1869 text lines in total). Are you using any specific tool to generate the *.tif and *.gt.txt files? I h

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-17 Thread Ramakant Kushwaha
Thank you so much for guiding me. I have read the links and sub-links provided and, as suggested, I will use OCR-D (https://github.com/OCR-D/ocrd-train) for training. I want to know the best way to create pairs of [*.tif, *.gt.txt] from TIF images for two or more fonts. Is there any

[tesseract-ocr] Re: java.lang.UnsatisfiedLinkError in tess4j

2018-07-17 Thread Dattatraya Tembare
I'm facing the same problem on Windows 10 with JDK 10 (the same code works on Windows 7 with JDK 8). Error logs: java.lang.UnsatisfiedLinkError: The specified module could not be found. at com.sun.jna.Native.open(Native Method) ~[jna-4.5.2.jar!/:4.5.2 (b0)] at com.sun.jna.Nati

[tesseract-ocr] Re: java.lang.UnsatisfiedLinkError in tess4j

2018-07-17 Thread Dattatraya Tembare
Forgot to mention: I have installed the Visual C++ Redistributable for VS2013. Still, how do I check whether I have installed the correct version? On Tuesday, July 17, 2018 at 1:49:09 PM UTC-4, Dattatraya Tembare wrote: > > I'm facing the s

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-17 Thread Lorenzo Bolzani
Generating the training data is a completely different problem from training Tesseract. If you want to recognize full words, it's better to have full words (or numbers), not individual characters, so that the process of splitting the words into characters is done by Tesseract. Unless you just wa

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-17 Thread Soumik Ranjan Dasgupta
Try creating a text corpus with only digits, using various handwritten fonts from fonts.google.com that come close to your dataset. Use tesstrain.sh for rendering the images and lstmtraining to train Tesseract - you'll achieve fair accuracy. On Tue, Jul 17, 2018 at 11:38 PM Lorenzo Bolzani wrot
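
A digits-only text corpus of the kind suggested in this reply could be produced with a short script. This is a minimal sketch, assuming space-separated groups of random digits are an acceptable training text; the function name, line count, and group lengths are illustrative assumptions, and the rendering step with tesstrain.sh is not shown.

```python
import random


def make_digit_corpus(num_lines=1000, groups_per_line=6, max_group_len=8, seed=0):
    """Return lines of space-separated digit groups for use as training text."""
    rng = random.Random(seed)  # fixed seed so the corpus is reproducible
    lines = []
    for _ in range(num_lines):
        groups = []
        for _ in range(groups_per_line):
            # Each group is 1..max_group_len random digits.
            length = rng.randint(1, max_group_len)
            groups.append("".join(rng.choice("0123456789") for _ in range(length)))
        lines.append(" ".join(groups))
    return lines
```

Writing the returned lines to a text file, one per line, gives a training text that can then be handed to whatever rendering step you use.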

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-17 Thread Ramakant Kushwaha
@Soumik, thanks, but I am not getting it. Please provide some links to help me understand; I am very new to this. Can you guide me in creating a text corpus of digits with different fonts? @Lorenzo, I want to detect digits written in the boxes of the image below. It's a cash deposit form of a ban