Re: [tesseract-ocr] tesseract failing on extremely simple example

2021-03-30 Thread Shree Devi Kumar
I did fine-tuning with the eng.traineddata, using about 200 text lines from the training text and 1100 iterations, reaching a CER of 0.01. The resulting model is small because it does not have the dictionary files and is compressed to a fast/integer model. On Wed, Mar 31, 2021, 03:37 marvin thielk wrote: >
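
For readers unfamiliar with the CER figure quoted above: character error rate is commonly defined as the edit distance between the recognized text and the ground truth, divided by the length of the ground truth. The following is a minimal sketch of that common definition, not the code Tesseract's training tools use. The function names `levenshtein` and `cer` and the sample strings are made up for illustration, and the message does not say whether 0.01 is a fraction or a percentage.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or keep a match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction of the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical ground truth and OCR output for a short numeric field.
    print(cer("12.5%", "12,5%"))
```

Under this definition, one wrong character in a five-character line gives a CER of 0.2, so a value near 0.01 would mean roughly one character error per hundred characters of ground truth if it is read as a fraction.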

Re: [tesseract-ocr] tesseract failing on extremely simple example

2021-03-30 Thread marvin thielk
Oops, missed this delivery failure. The ttf file is too large to attach because it contains Asian characters. I can upload it somewhere if you're interested, but I plan on training a model for my own edification. Original message below: This is awesome, thank you so much! What hyperparameters did

Re: [tesseract-ocr] tesseract failing on extremely simple example

2021-03-27 Thread Marvin Thielk
I do have the font available as a ttf file. It is probably copyright protected, but I could post it if it would be useful. No, I need to recognize letters and numbers, and I've been able to extract text from other regions of the images; it's just this region of numbers and .%'s Thanks, ~Marvin O

Re: [tesseract-ocr] tesseract failing on extremely simple example

2021-03-27 Thread Shree Devi Kumar
Do you have the font used in the sample? Do you only need to recognise numbers in it?

On Sat, Mar 27, 2021, 16:10 Marvin Thielk wrote:
> I've tried a variety of pre-processing attempts and different configs, but
> this feels like it should be an easy detection task.
>
> I've tried with several d