Re: [tesseract-ocr] Training Fonts, mftraining hangs

2022-08-31 Thread Zdenko Podobny
First of all: if you follow any tutorial on the internet, report the problem to the author of the tutorial. Next: use the official documentation for training. I see there are a bunch of folks just "generating content" to gain an audience. Without insight and therefore also without support, using old/out

Re: [tesseract-ocr] Training Fonts, mftraining hangs

2022-08-31 Thread John Alway
"First of all: if you follow any tutorial on the internet, report the problem to the author of the tutorial. Next: use the official documentation for training. I see there are a bunch of folks just "generating content" to gain an audience. Without insight and therefore also without support, using old/o

Re: [tesseract-ocr] Training Fonts, mftraining hangs

2022-08-31 Thread Adrian Paul Ciobanita
I don't think the GitHub link is very helpful, tbh. I've had this issue with training something specific to my use case since 2020. I've not had much time lately, but there's still no clean and easy tutorial for retraining that correctly describes how to create and use the ground truth fi

Re: [tesseract-ocr] Training Fonts, mftraining hangs

2022-08-31 Thread Zdenko Podobny
Trained data from tesseract 5 is compatible with tesseract 4, so I would definitely suggest using the latest tesseract version for training: there have been a lot of bug fixes and speed improvements. IMO tesseract training has never been easy. I always suggest focusing on image preprocessing rather than traini
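As an illustration of the kind of preprocessing meant here, one step the official ImproveQuality page discusses is binarisation. Tesseract already thresholds internally (Otsu), but doing it yourself lets you inspect and adjust the result. Below is a minimal pure-Python sketch of Otsu's global threshold, assuming the image is already loaded as rows of 8-bit grayscale values (loading, e.g. with Pillow or OpenCV, is left out; `otsu_threshold` and `binarize` are hypothetical helper names, not Tesseract APIs).

```python
from collections import Counter


def otsu_threshold(image):
    """Otsu's method: pick the gray level t that maximizes between-class variance.

    `image` is a list of rows of ints in 0..255 (assumed 8-bit grayscale).
    Pixels <= t form the dark class, pixels > t the light class.
    """
    hist = Counter(p for row in image for p in row)
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty image")
    sum_all = sum(level * count for level, count in hist.items())

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels <= t
    sum_dark = 0    # sum of gray levels <= t
    for t in range(256):
        w_dark += hist.get(t, 0)
        sum_dark += t * hist.get(t, 0)
        if w_dark == 0:
            continue
        w_light = total - w_dark
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(image, threshold):
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [[255 if p > threshold else 0 for p in row] for row in image]
```

For dark text on a light page, the text ends up black and the background white, which is the polarity Tesseract expects.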

Re: [tesseract-ocr] Training Fonts, mftraining hangs

2022-08-31 Thread Adrian Paul Ciobanita
Can you recommend tutorials or books about how to do image pre-processing effectively and efficiently? Do we need to do different types of image pre-processing for each image? If we have 100+ images, how do we ensure that the pre-processing is helping the prediction accuracy 100%?

Re: [tesseract-ocr] Training Fonts, mftraining hangs

2022-08-31 Thread Zdenko Podobny
Shreeshrii, bertky, and many others from the tesseract community have invested a lot of time in improving training and documentation (e.g. tesstrain.sh was abandoned an

Re: [tesseract-ocr] Training Fonts, mftraining hangs

2022-08-31 Thread Zdenko Podobny
There is no such thing as 100% OCR accuracy. You simply cannot get good results from a bad image (maybe Google Vision comes close ;-), but that is a different story). Our best experience is collected in the docs ( https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html). For different images/problems yo
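Since 100% accuracy is not achievable, the practical answer to "is the pre-processing helping?" is to measure it: hand-transcribe a representative sample of the images, OCR them with and without the preprocessing step, and compare the character error rate (CER). A minimal sketch, assuming you already have the OCR output strings (e.g. from the tesseract CLI); `levenshtein` and `corpus_cer` are hypothetical helper names.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = cur
    return prev[-1]


def corpus_cer(references, hypotheses):
    """Character error rate over a whole sample: total edits / total ground-truth chars."""
    references, hypotheses = list(references), list(hypotheses)
    if len(references) != len(hypotheses):
        raise ValueError("need exactly one OCR output per ground-truth text")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("ground truth is empty")
    edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return edits / total_chars
```

Usage: compute `corpus_cer(truth, raw_outputs)` and `corpus_cer(truth, preprocessed_outputs)` (hypothetical variable names) on the same sample; a lower value means the preprocessing helped on that sample. Pooling edits over all images, rather than averaging per-image rates, keeps short images from dominating the result.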

Re: [tesseract-ocr] Training Fonts, mftraining hangs

2022-08-31 Thread Adrian Paul Ciobanita
Thank you for the links and knowledge! It definitely makes a good read, and a fine introduction to the missing pieces. Stay safe and healthy, Ciobanita Paul Adrian.

Re: [tesseract-ocr] Training Fonts, mftraining hangs

2022-08-31 Thread John Alway
Hello Zdenko, Thank you for the advice. I ended up being able to tweak the tesseract parameters and was able to improve performance so that it was good enough without having to train. And I do appreciate the hard work and cleverness that has gone into creating and improving tesseract. It's a b

Re: [tesseract-ocr] Training Fonts, mftraining hangs

2022-08-31 Thread Adrian Paul Ciobanita
Hey John, Are you able to share which parameters you tweaked to get better performance? Thank you.
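John's actual settings are not given in this thread. For reference, the tesseract CLI exposes a few commonly adjusted knobs: `-l` (language), `--psm` (page segmentation mode, e.g. 6 = a single uniform block of text, 7 = a single text line), `--dpi`, and `-c VAR=VALUE` for any config variable such as `tessedit_char_whitelist`. A minimal sketch that builds and runs such a command from Python, assuming `tesseract` is on the PATH; `tesseract_command` and `run_tesseract` are hypothetical helpers, and the values shown are examples, not recommendations.

```python
import subprocess


def tesseract_command(image_path, output_base="stdout", lang="eng",
                      psm=None, dpi=None, variables=None):
    """Build an argv list for the tesseract CLI with optional tuning parameters."""
    cmd = ["tesseract", image_path, output_base, "-l", lang]
    if psm is not None:
        cmd += ["--psm", str(psm)]        # page segmentation mode
    if dpi is not None:
        cmd += ["--dpi", str(dpi)]        # scan resolution, useful when the image lacks it
    for name, value in (variables or {}).items():
        cmd += ["-c", f"{name}={value}"]  # set a tesseract config variable
    return cmd


def run_tesseract(image_path, **options):
    """Run tesseract and return the recognized text ("stdout" prints instead of writing a file)."""
    cmd = tesseract_command(image_path, "stdout", **options)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Usage: `run_tesseract("scan.png", psm=7, variables={"tessedit_char_whitelist": "0123456789"})` would OCR a single line restricted to digits. Trying a few `--psm` values on a sample of your own images is usually the cheapest first experiment before considering training.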