tesseract I_read_docs_carefully_instead_of_a_lot_of_writing.png - --psm 6
$0.081
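Since the question is about a Python program, the same invocation could be wrapped roughly as below. This is only a minimal sketch: the helper names `build_tesseract_cmd` and `run_tesseract` are made up for illustration, it assumes the `tesseract` binary is on PATH, and the optional allow list relies on the `tessedit_char_whitelist` config variable, whose effect can vary by Tesseract version and engine mode.

```python
import subprocess


def build_tesseract_cmd(image_path, psm=6, whitelist=None):
    """Build the argument list for: tesseract <image> - --psm <psm>."""
    # "-" as the output base tells tesseract to write the text to stdout.
    cmd = ["tesseract", image_path, "-", "--psm", str(psm)]
    if whitelist:
        # Optional: restrict recognition to the given characters.
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return cmd


def run_tesseract(image_path, psm=6, whitelist=None):
    """Run tesseract and return the recognized text, stripped of whitespace."""
    result = subprocess.run(
        build_tesseract_cmd(image_path, psm, whitelist),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits non-zero
    )
    return result.stdout.strip()
```

Passing the arguments as a list (no shell) means a `$` in the allow list needs no escaping, for example `run_tesseract("value.png", whitelist="0123456789$,.")`, where `value.png` is a placeholder file name.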
Zdenko

On Mon, 12 Feb 2024 at 18:40, Rob <madiso...@gmail.com> wrote:

> Hello,
>
> I've run into some trouble using Tesseract OCR in a Python program doing
> some screen scraping. I can't quite wrap my head around why this one value
> is having so much more trouble than the others on the same page, with the
> same contrast and font.
>
> This is the image in question:
>
> It has been scraped from a 1080p resolution screenshot, sliced into
> individual images for the values in a grid, scaled up by 10x, inverted
> (from white-on-black to this), thresholded, and passed to Tesseract. I have
> also tried various Gaussian and median blurs, but those seem to just make
> other strings fail more.
>
> I have tried most of the PSM options that make sense, and passed options
> with just numerals, $, comma, and decimal as the allow list of characters.
> I've tried all the different interpolations OpenCV has to offer. Tesseract
> just constantly chokes on this value.
>
> It's a little frustrating because the only OCR I've found that works with
> this value is an A9T9 model (I think) through the free API at ocr.space
> ( https://ocr.space/ocrapi#ocrengine2 ). Unfortunately there doesn't appear
> to be a way for me to run that locally, and the string seems like it should
> be simple for an OCR engine to read.
>
> Any advice on poking Tesseract in the right way to read this, or some
> fancy filtering I could do to help make the image clearer for it?
>
> Thanks!
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/ae2ae7cd-6cd1-44ef-843e-ef10a35929c6n%40googlegroups.com