Re: [tesseract-ocr] Miss lots of words in the detection

2024-01-23 Thread L ht
Hi Zdenko, Thanks. Your insights have been instrumental in helping me grasp the concepts behind Tesseract. I've been experimenting with various thresholding methods, such as Otsu (0), LeptonicaOtsu (1), and Sauvola (2), and I've noticed that they yield distinct outcomes when applied to my images.
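
For readers who want to repeat this comparison, here is a minimal sketch, assuming the three values map to Tesseract 5's `thresholding_method` config variable and that it is set on the command line with `-c`. The helper names `build_command` and `compare_methods` and the file name `page.png` are illustrative and do not come from the thread.

```python
import os
import shutil
import subprocess

# Values quoted in the message above: 0 = Otsu, 1 = LeptonicaOtsu, 2 = Sauvola.
THRESHOLDING_METHODS = {0: "Otsu", 1: "LeptonicaOtsu", 2: "Sauvola"}


def build_command(image_path, method, lang="eng", psm=3):
    """Return the tesseract argument list for one thresholding method."""
    if method not in THRESHOLDING_METHODS:
        raise ValueError(f"unknown thresholding method: {method}")
    return [
        "tesseract", image_path, "-",  # "-" sends the recognized text to stdout
        "-l", lang,
        "--psm", str(psm),
        "-c", f"thresholding_method={method}",
    ]


def compare_methods(image_path):
    """Run tesseract once per method and return {method name: recognized text}."""
    results = {}
    for method, name in THRESHOLDING_METHODS.items():
        proc = subprocess.run(
            build_command(image_path, method),
            capture_output=True, text=True, check=True,
        )
        results[name] = proc.stdout
    return results


if __name__ == "__main__":
    # "page.png" is a placeholder; nothing runs unless it and tesseract exist.
    if shutil.which("tesseract") and os.path.exists("page.png"):
        for name, text in compare_methods("page.png").items():
            print(f"{name}: {len(text.split())} words")
```

Counting words per method is only a rough proxy for detection quality, but it makes the differences between the three binarizations easy to spot on a given image.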

Re: [tesseract-ocr] Miss lots of words in the detection

2024-01-22 Thread Zdenko Podobny
Hi, The most critical part is this: https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html, but I need to stress: Tesseract is an OCR *engine*, not an OCR *suite*. Unless your input is a book page scan without a difficult structure, you need to do your part, like image processing and documen

Re: [tesseract-ocr] Miss lots of words in the detection

2024-01-22 Thread L ht
Hi Zdenko, Thanks for your response. I read the Tesseract User Manual (https://tesseract-ocr.github.io/tessdoc/), but I have not read the code. I tried both tessdata_best and tessdata and tried different --psm values, but I still cannot get more detections. To provide some context, when I applied Tessera

Re: [tesseract-ocr] Miss lots of words in the detection

2024-01-21 Thread Zdenko Podobny
Did you read the documentation or did you just set your expectations? Zdenko On Sun, 21 Jan 2024 at 12:00, L ht wrote: > I am new to use tesseract. I found tesseract does not work as expected. I > attach one example. > > tesseract 5.3.2 > tesseract 272525030292764523137280353496213864766.png -

[tesseract-ocr] Miss lots of words in the detection

2024-01-21 Thread L ht
I am new to Tesseract, and I found that it does not work as I expected. I attach one example. tesseract 5.3.2 tesseract 272525030292764523137280353496213864766.png - -l eng --psm 3 quiet can only detect these words: "Log in Username Password Cancel". I submitted this picture to several online pic->t