[tesseract-ocr] Re: Optimal image resolution (dpi/ppi) for Tesseract 4.0.0 and eng.traineddata?

2022-02-27 Thread zdenop
Hello Willus, can you also test Tesseract 5? Can you share your input data for testing, or the script you use for evaluation and for generating the output charts? Zdenko On Monday, 31 December 2018 at 23:23:39 UTC+1, wil...@gmail.com wrote: > So I did some more experimenting and convinced myself

Re: [tesseract-ocr] Incorrect OCR of 4-digit number

2022-02-27 Thread Zdenko Podobny
My 2 cents: first of all, create a public test case/repository focused on this problem, e.g. different font families, font sizes, short text (like 0swZuoU.png), long text, etc. This could be used for finding problems/bugs, evaluating possible solutions, and maybe (re)training. So synthetic data imita

Re: [tesseract-ocr] Incorrect OCR of 4-digit number

2022-02-27 Thread Zdenko Podobny
I do not know. The upscaling trick has been around since version 3.x. The downscaling trick has worked since version 4.x. Just looking at Willus Dotkom's chart[1], I would guess there is some design decision behind it... But without an explanation from the original/Google programmers, we can only guess or find a bug ;-)
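For anyone who wants to try the up/downscaling trick, here is a minimal sketch of the size arithmetic only. It assumes you know (or can estimate) the DPI of the input image; the function name `rescaled_size` and the default target of 300 DPI are illustrative assumptions, not something the thread settles on. The actual resampling would be done with an image tool of your choice before passing the result to tesseract.

```python
def rescaled_size(width, height, current_dpi, target_dpi=300):
    """Return (new_width, new_height) for resampling an image to target_dpi.

    target_dpi=300 is only an assumed starting point; the best value
    may differ per font, text length and Tesseract version.
    """
    if current_dpi <= 0 or target_dpi <= 0:
        raise ValueError("DPI values must be positive")
    factor = target_dpi / current_dpi  # > 1 upscales, < 1 downscales
    # Never return a zero-sized dimension for very small inputs.
    return max(1, round(width * factor)), max(1, round(height * factor))


# Upscale a 150 DPI scan, downscale a 600 DPI scan.
print(rescaled_size(1000, 500, 150))
print(rescaled_size(1200, 600, 600))
```

Trying a few different target values on the same image and comparing the OCR output is probably more informative than trusting any single number.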

Re: [tesseract-ocr] Incorrect OCR of 4-digit number

2022-02-27 Thread Merlijn B.W. Wajer
Hi, On 27/02/2022 08:55, Zdenko Podobny wrote: tesseract fix_size.png - 0326 0939 1552 2206 See the documentation for an explanation: https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md#rescaling Thanks for th