[tesseract-ocr] Re: Training Tesseract 4 from Scratch

2019-04-19 Thread yoganand
I'm trying to train Tesseract 4. I started by installing Cygwin and got as far as the setup, but the steps you gave for OCRD-train are giving issues when I try to compile Leptonica and Tesseract. I felt that the steps you gave are a bit high-level for me. I tried 'make leptonica' through cyg

[tesseract-ocr] compute ctc targets failed

2019-04-19 Thread suraa syss
lstmtraining --traineddata data/tamtrain/tamtrain.traineddata --old_traineddata tesseract/tessdata/tam.traineddata --continue_from data/tam/tam.lstm --net_spec '[Lfx256 O1c111]' --model_output data/checkpoints --learning_rate 20e-4 --train_listfile data/list.train --eval_listfil

[tesseract-ocr] Re: Training Tesseract 4 from Scratch

2019-04-19 Thread Kristóf Horváth
So what I meant there is that you have to execute the commands from the location of OCR-d, because that's where you can find the Makefile. On Friday, 19 April 2019 at 9:59:17 UTC+2, yoganand wrote: > > *im trying to train my tesseract 4.. i started it with installing cygwin >

Re: [tesseract-ocr] Training Tesseract 4 from Scratch

2019-04-19 Thread Reddy, Yoganand
I have the same problem too. I think many people are facing this issue. Can someone step up and provide a bit more clarity in the documentation? On Wed, Apr 3, 2019 at 8:16 PM Shobhit Kapil wrote: > Hi Team, > > I am not at all aware of training tesseract 4, is there any way that how > to learn train

[tesseract-ocr] Re: Can I use this way for fine tuning?

2019-04-19 Thread suraa syss
You need to prepare the unicharset before LSTM training. On Thursday, 18 April 2019 14:49:20 UTC+5:30, yixinl...@gmail.com wrote: > > Hello,everyone: > I have used tesseract 4.0 to train a chi_sim model,but the result is > not so good as I expected,So I think out one way for fine tuning. > > 1.s

[tesseract-ocr] is there a way to scan only first word of a page?

2019-04-19 Thread Vikas Sharma
Hello guys, I am trying to identify a page's category by recognizing only the first word on the page, but the pages can contain much more text, so it takes a long time. I just want to limit scanning to one word. I have tried the psm option, but no luck there.

Re: [tesseract-ocr] is there a way to scan only first word of a page?

2019-04-19 Thread Zdenko Podobny
The simple answer is no: you cannot limit OCR to the first word. But you can restrict the OCR area via a uzn file (search the forum for that). If you know that the text must be in some part of the image, you can define your area of interest in the uzn file. Zdenko On Fri, 19 Apr 2019 at 14:49, Vikas Sharma wrote
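As a rough illustration of restricting the OCR area with a zone file, the sketch below writes a single zone covering the top strip of a page. It assumes the zone file uses one zone per line in the form `left top width height label`, that it shares the image's base name with a `.uzn` extension, and that Tesseract is run with a page segmentation mode that reads such files. None of this is confirmed in the thread, so check it against the forum posts mentioned above. The helper names and coordinates are hypothetical.

```python
from pathlib import Path


def uzn_line(left, top, width, height, label="Text"):
    """Format one zone as 'left top width height label' (assumed layout)."""
    if left < 0 or top < 0 or width <= 0 or height <= 0:
        raise ValueError("zone must have non-negative origin and positive size")
    return f"{left} {top} {width} {height} {label}"


def write_uzn(image_path, zones):
    """Write the zones next to the image, swapping its extension for .uzn."""
    uzn_path = Path(image_path).with_suffix(".uzn")
    lines = [uzn_line(*zone) for zone in zones]
    uzn_path.write_text("\n".join(lines) + "\n")
    return uzn_path
```

For example, `write_uzn("page.png", [(0, 0, 800, 120)])` would create `page.uzn` with one zone spanning the top 120 pixels of an 800-pixel-wide page.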

Re: [tesseract-ocr] How to train tesseract with new script?

2019-04-19 Thread suraa syss
Because that script is not properly trained. On Tuesday, 9 April 2019 11:13:44 UTC+5:30, Moni wrote: > > Hi good morning... Currently I am Phd scholar doing my research in > ancient Tamil Inscriptions. Had seen your trained data for bramhi script > and working with that but getting an error "Fail

Re: [tesseract-ocr] is there a way to scan only first word of a page?

2019-04-19 Thread Lorenzo Bolzani
Hi, if the page has a fixed, simple format, you can crop the image, leaving only the upper part. You can use ImageMagick, a Python script, etc. Lorenzo On Fri, 19 Apr 2019 at 14:49, Vikas Sharma <vikasharma2...@gmail.com> wrote: > Hello guys, > > I am trying to identify page cate
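Cropping to the upper part of the page, as suggested in this reply, could look like the sketch below. It assumes the Pillow library is installed. The function names, the file paths and the `top_fraction` default are hypothetical, and the right fraction depends on the page layout.

```python
def top_region_box(width, height, top_fraction=0.25):
    """Return a (left, upper, right, lower) box covering the top of the page."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    # Keep at least one pixel row so the crop is never empty.
    lower = max(1, int(height * top_fraction))
    return (0, 0, width, lower)


def crop_top(in_path, out_path, top_fraction=0.25):
    """Save a copy of the image that keeps only its upper part."""
    # Pillow is imported here so the box helper above has no dependencies.
    from PIL import Image

    with Image.open(in_path) as img:
        box = top_region_box(img.width, img.height, top_fraction)
        img.crop(box).save(out_path)
    return box
```

Running OCR on the cropped file instead of the full page means only the retained strip is processed, which is the point of the suggestion when the category word always appears near the top.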