Thanks
On Saturday, July 20, 2024 at 5:15:44 PM UTC+3 ger.h...@gmail.com wrote:

> Too little information provided for anyone to try and (at least)
> reproduce your problem.
>
> Besides, if this is your source image you're toast anyway. For you and
> others:
>
> [image: mekur-bad-rez2.webp]
>
> Your image reports as ~ 400x500-something pixels in size. (In the chart
> image above, the numbers are in *hundreds of pixels*, i.e. '4' = 400 px.)
> *For tesseract to have a chance at all, a single text line's
> C[apitals]-height should be around 30px*; anything higher can be scaled
> down if needed, during the image preprocessing done before feeding your
> stuff to tesseract.
>
> TL;DR: that '30' number means the number of text lines in a section of
> 100 pixels should be about *3* (or rather less, as line-height >
> C-height > x-height), not **9** lines as counted in your image!
>
> I don't know this language, but for you & anyone else who likes to have
> at least a fighting chance of OCR-ing something: 30px C-height implies a
> ball-park number of 20px for x-height and "reasonable" line heights of
> 40px or more. And, please, don't get me started on "I resize the image
> if you want it to be bigger!" 🤦 To the machine, the above image is just
> a bunch of pixelated noise, alas, irrespective of what language the
> original was ever written in.
>
> Are your pixel measurements lower, falling short of the benchmark of
> 30px per line? Then redo your scans, get better hardware, do a better
> job at the image preprocessing. (This image is also failing that
> benchmark, incidentally, but one can write a book on that subject alone,
> so we'll leave that out.)
>
> Met vriendelijke groeten / Best regards,
>
> Ger Hobbelt
>
> --------------------------------------------------
> web:    http://www.hobbelt.com/
>         http://www.hebbut.net/
> mail:   g...@hobbelt.com
> mobile: +31-6-11 120 978
> --------------------------------------------------
>
> On Wed, Jul 10, 2024 at 11:12 AM Mekuriaw Aze <mekur...@gmail.com> wrote:
>
>> Dear All
>> Cooperation request
>> My question is: when I try again and again in Python to convert the
>> image to text and make it readable, it gives me an error. Can you help
>> me?
>> Is the image attached below? Is Geez an Ethiopian language?
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send
>> an email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/4f47a021-d4ee-4994-bb1b-65009a443153n%40googlegroups.com
>> .

--
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/0c5ab05c-9f4e-46c5-950b-99afd248a0dan%40googlegroups.com.
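
For anyone who wants to run the same sanity check on their own scans, the line-count rule of thumb from Ger's reply can be written as a few lines of Python. This is only a rough sketch: the 30px and 40px figures are the ball-park numbers quoted above, while the function names and the assumption that capital height is about 30/40 of the line pitch are made up here for illustration and are not part of tesseract or any library.

```python
"""Rough text-size check based on the ~30px capital-height rule of thumb."""

# Ball-park figures quoted in the thread.
TARGET_CAP_HEIGHT_PX = 30.0
TARGET_LINE_HEIGHT_PX = 40.0  # "reasonable" line heights: 40px or more


def line_pitch_px(lines_counted: int, section_height_px: float = 100.0) -> float:
    """Average distance between text lines, from a manual line count."""
    if lines_counted <= 0:
        raise ValueError("lines_counted must be positive")
    return section_height_px / lines_counted


def estimated_cap_height_px(lines_counted: int, section_height_px: float = 100.0) -> float:
    """Estimate the capital height from the line pitch.

    ASSUMPTION: capital height is roughly 30/40 of the line pitch,
    taken from the ball-park numbers above. Real fonts will differ.
    """
    ratio = TARGET_CAP_HEIGHT_PX / TARGET_LINE_HEIGHT_PX
    return line_pitch_px(lines_counted, section_height_px) * ratio


def meets_benchmark(lines_counted: int, section_height_px: float = 100.0) -> bool:
    """True if the estimated capital height reaches the ~30px target."""
    return estimated_cap_height_px(lines_counted, section_height_px) >= TARGET_CAP_HEIGHT_PX


if __name__ == "__main__":
    # The image discussed above: 9 text lines counted in a 100px section.
    print(f"line pitch:  {line_pitch_px(9):.1f} px")
    print(f"cap height:  {estimated_cap_height_px(9):.1f} px (estimated)")
    print(f"good enough: {meets_benchmark(9)}")
```

With 9 lines per 100 pixels the line pitch comes out at about 11 px and the estimated capital height at about 8 px, far below the target. As noted in the reply above, enlarging such an image afterwards does not bring the missing detail back; the check is only useful for deciding whether a rescan at higher resolution is needed.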