Re: [tesseract-ocr] Bad recognition with good input image

2022-12-16 Thread jannes hoekman
We have commercial (not free) postprocessing software with 38 filters for optimal OCR. It's called BIQE (Batch Image Quality Enhancer), and it also works with Tesseract 4 OCR. On Thursday, 24 November 2022 at 20:57:17 UTC+1, zdenop wrote: > please read and follow the docs: > https://github.com/tesseract-oc

[tesseract-ocr] Re: quality of the recognized text

2022-12-16 Thread jannes hoekman
We have postprocessing software (not free) with 38 filters that improve the OCR of images in batch. It's called BIQE. On Monday, 7 November 2022 at 21:37:55 UTC+1, Аллигатор wrote: > Hi. Is it possible to improve the quality of text recognition? --oem 0 > recognizes better than --oem 3, bu

[tesseract-ocr] Tesseract assigns wrong font size

2022-12-16 Thread Kehinde Adeoya
I'm using Tesseract-3.0.5 and Tessdata-3.0.4. I have trained the font successfully, and Tesseract recognizes the font properties. I have two trained fonts, namely Ubuntu and Inter. Tesseract assigns the correct properties to the Ubuntu font but sometimes gets it wrong when assigning font-size to I

[tesseract-ocr] Re: Error in training

2022-12-16 Thread Kehinde Adeoya
Which Tesseract version are you using? On Thursday, 15 December 2022 at 13:19:50 UTC+1 soumenha...@gmail.com wrote: > *Please help me* > > ESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME=foo > START_MODEL=eng TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10 > combine_lang_model \ >

[tesseract-ocr] Fine Tuning with image containing multiple languages

2022-12-16 Thread Jacob Pedersen
Hi, consider an image containing a mix of English and German text. I extract wordstr boxes from it and fix the mistakes. When fine-tuning for the two languages, I get encoding errors for English, as it does not contain German characters. What is the correct approach here? 1. Ignore encoding errors? What
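Before choosing between the options, it may help to list exactly which characters in the ground truth fall outside the start model's character set, since those are the ones that cannot be encoded. The sketch below assumes the ground-truth lines are available as plain strings; `missing_chars` and `ENGLISH_CHARS` are hypothetical, and the ASCII-only set is only a stand-in for the English model's real unicharset, not Tesseract's actual data.

```python
import string
from typing import Iterable, Set

# Hypothetical stand-in for the start model's character set (ASCII letters, digits, punctuation).
ENGLISH_CHARS: Set[str] = set(string.ascii_letters + string.digits + string.punctuation)


def missing_chars(gt_lines: Iterable[str], known: Set[str]) -> Set[str]:
    """Return the non-whitespace characters in gt_lines that are not in `known`."""
    seen: Set[str] = set()
    for line in gt_lines:
        # Whitespace is ignored; only visible characters are checked.
        seen.update(ch for ch in line if not ch.isspace())
    return seen - known


print(sorted(missing_chars(["The quick fox", "Größe der Straße"], ENGLISH_CHARS)))
```

With these two sample lines, the sketch reports `ß` and `ö` as the characters the stand-in set cannot cover.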

[tesseract-ocr] Best Font for numbers

2022-12-16 Thread TOPSie
I am a simple user of Tesseract with a single purpose. I scan Proof of Delivery slips that contain a 6-digit number in the text. I successfully OCR the number and file the scanned image under that number. But the digits 8 and 3, and 5 and 6, are often confused. Rather than do anything with the Tessera
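Independent of the font question, a small validation step after OCR can stop a slip from being filed under a malformed number. A minimal sketch, assuming the OCR output is already available as a string; the helper name `find_pod_number` is hypothetical. It only accepts text containing exactly one standalone 6-digit number. It cannot detect an 8 read as a 3 when the digit count stays the same, but it catches misreads that add or drop digits, as well as ambiguous slips.

```python
import re
from typing import Optional

# A run of exactly six digits, not part of a longer run of digits.
SIX_DIGITS = re.compile(r"(?<!\d)\d{6}(?!\d)")


def find_pod_number(ocr_text: str) -> Optional[str]:
    """Return the single 6-digit number in ocr_text, or None if there are zero or several."""
    candidates = set(SIX_DIGITS.findall(ocr_text))
    if len(candidates) == 1:
        return candidates.pop()
    # No match or several different matches: flag the scan for manual review instead of guessing.
    return None


print(find_pod_number("POD no. 583610 delivered 2022-12-16"))
```

Returning `None` rather than the first match keeps ambiguous scans out of the automatic filing step.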