[tesseract-ocr] Re: Prescription scan recognition

2024-02-15 Thread 'Mert T' via tesseract-ocr
Any ideas?

Mert T wrote on Thursday, 8 February 2024 at 17:16:16 UTC+1:
> Hello,
>
> I'm new to Tesseract and have the problem that the text recognition has
> many errors. What I'm doing is scanning a prescription in German, and I
> want to show only certain areas.
> So I created certain ar

Re: [tesseract-ocr] Re: Prescription scan recognition

2024-02-15 Thread Ger Hobbelt
Re "X" checkbox: Since this is (I assume) a standardized form, those checkboxes are at known, fixed positions. A couple of thoughts: 1: Assuming that everyone "crosses" a checkbox is faulty. Some people, depending on circumstances, "blacken" the box in other ways, all legal and to be expe
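
A minimal sketch of that idea, assuming the page is already available as a grid of 8-bit grayscale values and that the box coordinates are known from the form layout: instead of trying to OCR an "X", measure how much ink sits inside each fixed box. The helper names, the sample coordinates, and both thresholds are hypothetical and would need tuning on real scans.

```python
def ink_ratio(gray, box, dark_threshold=128):
    """Fraction of pixels inside box that are darker than dark_threshold.

    gray is a list of rows of 0..255 values; box is (x, y, width, height)
    and is assumed to lie fully inside the image.
    """
    x, y, w, h = box
    dark = sum(
        1
        for row in gray[y:y + h]
        for px in row[x:x + w]
        if px < dark_threshold
    )
    return dark / float(w * h)


def is_marked(gray, box, min_ratio=0.15):
    """Treat a box as marked when enough of it is dark (assumed threshold)."""
    return ink_ratio(gray, box) >= min_ratio


# Synthetic 12x6 white page with three 4x4 boxes at made-up positions.
page = [[255] * 12 for _ in range(6)]
crossed_box = (0, 1, 4, 4)
filled_box = (4, 1, 4, 4)
empty_box = (8, 1, 4, 4)

# Draw an "X" in the first box.
for i in range(4):
    page[1 + i][0 + i] = 0
    page[1 + i][3 - i] = 0

# Blacken the second box completely.
for yy in range(1, 5):
    for xx in range(4, 8):
        page[yy][xx] = 0

for name, box in [("crossed", crossed_box), ("filled", filled_box), ("empty", empty_box)]:
    print(name, ink_ratio(page, box), is_marked(page, box))
```

A crossed box and a blackened box both pass the same test, which is the point: the check does not care how the box was marked. On a real form the printed box outline also counts as ink, so the measured region would probably need to be shrunk to the box interior.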

Re: [tesseract-ocr] Re: Prescription scan recognition

2024-02-15 Thread Ger Hobbelt
On Thu, 15 Feb 2024, 17:06 Ger Hobbelt wrote:
> Re "X" checkbox:
>
>

More shorthand examples in your "input language":

Tabl. = tablet (pill)
tägl = täglich (German: daily dosage)

I mention these extra examples (visible in the scanned images) as I find generally people have a hard time wrap

Re: [tesseract-ocr] Re: Prescription scan recognition

2024-02-15 Thread Ger Hobbelt
Re Tesseract output for "mittag" etc. in your sample: the first port of call for cleaning up dot-matrix printer output for OCR, i.e. dedicated image preprocessing, would be googling "leptonica image morphology", "open close expand dilate dot matrix" or some such. While I would go with using leptonica for that,
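
To illustrate what a morphological "close" does to dot-matrix print, here is a minimal pure-Python sketch on a tiny binary grid (1 = ink). It is not leptonica's API, just the underlying operation: dilate with a 3x3 square, then erode with the same square, so that small gaps between neighbouring dots get bridged. The function names and the 3x3 element size are assumptions for illustration; a real scan would need an element sized to the actual dot spacing.

```python
def _neighbourhood(grid, y, x):
    """Values in the 3x3 window around (y, x), ignoring out-of-bounds cells."""
    height, width = len(grid), len(grid[0])
    return [
        grid[ny][nx]
        for ny in range(max(0, y - 1), min(height, y + 2))
        for nx in range(max(0, x - 1), min(width, x + 2))
    ]


def dilate(grid):
    """A pixel becomes ink if any pixel in its 3x3 window is ink."""
    return [
        [1 if any(_neighbourhood(grid, y, x)) else 0 for x in range(len(grid[0]))]
        for y in range(len(grid))
    ]


def erode(grid):
    """A pixel stays ink only if every pixel in its 3x3 window is ink."""
    return [
        [1 if all(_neighbourhood(grid, y, x)) else 0 for x in range(len(grid[0]))]
        for y in range(len(grid))
    ]


def close(grid):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(grid))


# A horizontal stroke printed as four separate dots with one-pixel gaps.
dots = [[0] * 11 for _ in range(5)]
for x in (2, 4, 6, 8):
    dots[2][x] = 1

closed = close(dots)
for row in closed:
    print("".join("#" if v else "." for v in row))
```

After closing, the four separate dots become one continuous stroke from column 2 to column 8, while the surrounding rows stay empty. Too large an element would start merging adjacent characters, so the size is a trade-off to tune per scan.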