tesseract fix_size.png -
0326 0939 1552 2206
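The thread does not say how fix_size.png was produced from the original crop. A minimal sketch of one way to make such a rescaled copy: use ImageMagick's `convert -resize` with a hypothetical helper, `resize_percent`. The helper is not part of tesseract or ImageMagick. It computes the percentage that maps a glyph height you measure by hand to a target height you choose. Neither number comes from this thread; both are assumptions to tune against the rescaling guidance in the tessdoc ImproveQuality page.

```shell
#!/bin/sh
# Sketch only: compute an ImageMagick -resize percentage that scales a
# measured glyph height (in pixels) to a chosen target height.
# resize_percent is a hypothetical helper, not a tesseract/ImageMagick tool.
resize_percent() {
  measured=$1   # glyph height measured in the source image, in px
  target=$2     # desired glyph height after resizing, in px (assumption)
  # Integer percentage, rounded to the nearest whole number.
  echo $(( (target * 100 + measured / 2) / measured ))
}

# Run the conversion only when given: image, measured height, target height.
# With no arguments the script just defines the helper.
if [ "$#" -eq 3 ]; then
  pct=$(resize_percent "$2" "$3")
  convert "$1" -resize "${pct}%" fix_size.png
  tesseract fix_size.png - --psm 6
fi
```

Saved under a hypothetical name such as rescale.sh, it could be run as `sh rescale.sh crop.png 128 32`. Here 128 is a placeholder for the digit height you measure and 32 is an assumed target. For those values the helper yields 25, so the image is shrunk to a quarter of its size before OCR.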
See the documentation for an explanation: https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md#rescaling

Zdenko

On Sat, 26 Feb 2022 at 21:04, Chris McClelland <prophet3...@gmail.com> wrote:
> Hi tesseract community!
>
> I've found an interesting scenario where a simple 4-digit number cropped
> from a PDF (i.e. from a region rendered from a vector font, not from an
> embedded bitmap) is incorrectly OCR'd. I used ImageMagick to extract a .png
> from the source PDF, like this:
>
> convert -density 1600 -trim input.pdf[42] -rotate 90 +repage -crop 600x720+900+3400 crop.png
>
> ...and then used tesseract to OCR it:
>
> tesseract crop.png stdout --psm 6
>
> The digits "1552" in the source image are OCR'd as "15562".
>
> You can try for yourself like this:
>
> wget https://i.imgur.com/0swZuoU.png
> tesseract 0swZuoU.png stdout --psm 6
>
> The image as hosted on imgur is not bitwise-equivalent to crop.png, but
> it's impossible to tell apart by eye. I can upload the original crop.png
> somewhere else, if necessary.
>
> I'm using the latest commit (30ebb31f) of the tesseract engine, and I
> tried with the latest commits (4767ea9 & e2aad9b) of both tessdata and
> tessdata_best.
>
> Can I do anything to improve the OCR result in this sort of scenario?
>
> Chris
>
> --
> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/6d94d071-6161-4d21-8733-c5322ee71dd0n%40googlegroups.com.
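Chris's command renders the page at `-density 1600`, which makes the digits very large. Rather than downscaling afterwards, one could render at a lower density directly. The `-crop` geometry is in pixels at 1600 dpi, though, so it has to be scaled by the same factor. A minimal sketch, assuming a hypothetical helper `scale_geometry` (not an ImageMagick feature) that rescales a `WxH+X+Y` geometry from one density to another and rounds to whole pixels. The target densities in the examples are arbitrary choices, not recommendations from this thread.

```shell
#!/bin/sh
# Sketch only: rescale an ImageMagick crop geometry WxH+X+Y that was measured
# at one -density so it selects roughly the same region at another density.
# scale_geometry is a hypothetical helper; components are rounded to whole px.
scale_geometry() {
  geometry=$1   # e.g. 600x720+900+3400 (pixels at the old density)
  old_dpi=$2    # density the geometry was measured at, e.g. 1600
  new_dpi=$3    # density you intend to render at (assumption)
  # Split on 'x' and '+' to get W, H, X, Y, then scale each by new/old.
  echo "$geometry" | awk -F'[x+]' -v o="$old_dpi" -v n="$new_dpi" '{
    f = n / o
    # Adding 0.5 before %d truncation rounds positive values to nearest.
    printf "%dx%d+%d+%d\n", $1*f + 0.5, $2*f + 0.5, $3*f + 0.5, $4*f + 0.5
  }'
}
```

For example, `scale_geometry 600x720+900+3400 1600 400` prints `150x180+225+850`. That value could be passed to `-crop` together with `-density 400` in the command above. Because `-trim` runs before the crop, the trimmed margins at a different density may not scale exactly, so the region can shift by a pixel or so.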