Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-18 Thread Soumik Ranjan Dasgupta
I normally use a custom python file to generate the training text. Attaching a sample text corpus containing only digits 1234. On Wed, Jul 18, 2018 at 12:04 PM Ramakant Kushwaha < ramakant.sing...@gmail.com> wrote: > @Soumik,Thanks Soumik, but I am not getting it, please provide me some > links t
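A minimal sketch of what such a generator script might look like, assuming the corpus is simply one random digit string per line. The function name, parameters, and defaults here are hypothetical and are not taken from the attached file.

```python
import random


def make_digit_corpus(num_lines=1000, digits="0123456789",
                      min_len=1, max_len=8, seed=0):
    """Return a list of random digit strings, one per training-text line.

    All names and defaults are illustrative assumptions, not the
    poster's actual script.
    """
    rng = random.Random(seed)  # seeded so the corpus is reproducible
    lines = []
    for _ in range(num_lines):
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        lines.append("".join(rng.choice(digits) for _ in range(length)))
    return lines
```

Passing `digits="1234"` restricts the corpus to those four characters; writing `"\n".join(...)` of the result to a text file gives a corpus that can be used as training text.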

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-18 Thread Soumik Ranjan Dasgupta
Follow https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 to create the traineddata. Copy the eng.traineddata file to the $TESSDATA_PREFIX directory, and you'll be good to go. On Wed, Jul 18, 2018 at 1:20 PM Soumik Ranjan Dasgupta < srd1...@cse.jgec.ac.in> wrote: > I normally use

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-18 Thread Lorenzo Bolzani
This is exactly the MNIST problem. I would not use Tesseract for this. You can download something like this: https://github.com/EN10/KerasMNIST that comes with pre-trained models too. The problem you'll have will be to extract the di

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-18 Thread Ramakant Kushwaha
@Lorenzo As per my understanding, MNIST is useful for detecting individual chars/digits, so to use MNIST I have to do the steps below (correct me if I am wrong): 1. Gray + Threshold (OpenCV) 2. Extract connected components (MSER, OpenCV) 3. Run a loop over the (sorted) connected components list and crop in
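The visible part of the steps above (threshold, extract connected components, loop over them in sorted order and crop) can be sketched without any library as follows. This is an illustration only: it uses a plain flood fill with 8-connectivity rather than MSER or any OpenCV call, it assumes dark ink on a light background, and all function names and the cutoff value are hypothetical.

```python
from collections import deque


def threshold(gray, cutoff=128):
    """Binarize: pixels darker than cutoff count as ink (True)."""
    return [[px < cutoff for px in row] for row in gray]


def connected_components(binary):
    """Return inclusive (left, top, right, bottom) boxes of 8-connected ink blobs."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not binary[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            left, top, right, bottom = x0, y0, x0, y0
            while queue:
                x, y = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


def crop_digits(gray, cutoff=128):
    """Crop each blob from the grayscale image, ordered by left edge."""
    boxes = sorted(connected_components(threshold(gray, cutoff)))
    return [[row[l:r + 1] for row in gray[t:b + 1]] for (l, t, r, b) in boxes]
```

Each crop would still need to be resized and normalized to whatever input the digit classifier expects before recognition. Touching or broken strokes will merge or split blobs, which is the hard part of this approach.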

[tesseract-ocr] Re: java.lang.UnsatisfiedLinkError in tess4j

2018-07-18 Thread Quan Nguyen
Those were old posts for older versions. You need VC++ 2015 at this time. Btw, we haven't tested it with Java 10 yet. Will do that soon. Thanks. On Tuesday, July 17, 2018 at 12:50:20 PM UTC-5, Dattatraya Tembare wrote: > > Forgot to mention, I have installed Visual C++ Redistributable for VS2013

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-18 Thread Lorenzo Bolzani
An MNIST-trained model does character recognition, not detection. You first need to isolate characters to use it. The advantage is that it is already trained and I think it may work better than fine-tuning Tesseract because the handwritten digits are quite different from standard fonts. The di

[tesseract-ocr] Easiest way to make a train data to recognize Handwritten Mathmatical Expression.

2018-07-18 Thread Hamza Rajput
If trained data for this application has already been developed, other than equ.traineddata, please send it to me by email. I would be very grateful, thank you!

[tesseract-ocr] Re: java.lang.UnsatisfiedLinkError in tess4j

2018-07-18 Thread Quan Nguyen
I've found the unit tests completed successfully with JDK 10 as well. So make sure you install the appropriate VC++ runtime. On Wednesday, July 18, 2018 at 6:48:35 AM UTC-5, Quan Nguyen wrote: > > That was old posts for older versions. You need VC++ 2015 at this time. > > Btw, we haven't tested i

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-18 Thread Ramakant Kushwaha
Thanks Lorenzo, I will try OpenCV + SIFT + MNIST and will update you soon. On Wednesday, July 18, 2018 at 5:26:05 PM UTC+5:30, Lorenzo Blz wrote: > > A MNIST trained model does character recognition, not detection. You first > need to isolate characters to use it. The advantage is that it

Re: [tesseract-ocr] Easiest way to make a train data to recognize Handwritten Mathmatical Expression.

2018-07-18 Thread Ramakant Kushwaha
As per suggestions on the group, I am planning to use an MNIST model for handwritten digits. I do not have trained data; I will post it if I work on it. On Wed, Jul 18, 2018 at 7:09 PM, Hamza Rajput wrote: > If some trained data about this application is already developed instead > of equ.traineddata, then please

[tesseract-ocr] Questions about training korean language in tesseract 4.0

2018-07-18 Thread nampyo hong
Hello, I have two questions about training Tesseract 4.0. 1. In the case of English, I can find box files and instructions on how to train, such as T 112 4663 140 4696 0 e 140 4662 160 4686 0 s 163 4662 179 4686 0 s 182 4661 198 4686 0 e 200 4661 220 4685 0 r 221 4662 238 4685 0 a 239 4661 260 4685 0 c 261 4661 281
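For readers unfamiliar with entries like the ones quoted above, here is a small sketch of how one box-file line can be read. It assumes the usual Tesseract box layout of `symbol left bottom right top page`, with pixel coordinates measured from the bottom-left corner of the image. The `Box` type and function names are hypothetical helpers for illustration.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    """Parse one 'symbol left bottom right top page' line."""
    # Split from the right so the symbol field is whatever precedes
    # the five trailing integers.
    symbol, left, bottom, right, top, page = line.rstrip("\n").rsplit(" ", 5)
    return Box(symbol, int(left), int(bottom), int(right), int(top), int(page))


def parse_box_text(text: str) -> List[Box]:
    """Parse a whole box file, one entry per non-empty line."""
    return [parse_box_line(ln) for ln in text.splitlines() if ln.strip()]
```

For the first entry quoted above, `parse_box_line("T 112 4663 140 4696 0")` yields a box 28 pixels wide (`right - left`) and 33 pixels tall (`top - bottom`) on page 0.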

Re: [tesseract-ocr] Questions about training korean language in tesseract 4.0

2018-07-18 Thread Soumik Ranjan Dasgupta
2) For checking the fonts used in generating the traineddata for your language, you can see training/language-specific.sh and langdata/font_properties under your respective language code. If I'm not wrong, the language code for Korean is "kor". Check out the langdata/kor directory. On Thu, Jul 19, 2