post:
1. The original image (without preprocessing)
2. The image used for OCR (preprocessed)
3. The output from the tesseract executable (not from a tesseract wrapper) and the parameters/options used
Otherwise, nobody can reproduce the problem, and therefore nobody can suggest a solution.

Zdenko

On Sun, 31 Dec 2023 at 10:53, Jason Shepherd <jmanshepher...@gmail.com> wrote:

> I'm using pytesseract and tesseract v5.3.3 to read some text from some
> images, and I sometimes get these weird phantom characters. I've tried
> some image preprocessing like increasing the image size, erosion,
> thresholding, etc., but nothing seems to get rid of this random character
> that's spawning from nothing. Attached are two image examples (left side
> is processed, right is original with rect bounding boxes drawn). The blue
> rectangle to the right of "KB PNG" is a '_' being detected even though
> that space is completely blank. Any ideas on getting rid of this?
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/8800b99f-b92d-4dbf-83b8-d1d3da9c2bf4n%40googlegroups.com

--
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8wZqRPS17_TXa05XyvMJ41h-4FuFNS9egUcm0c%2Be2Oh4A%40mail.gmail.com.
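
For the third item requested above (output from the tesseract executable itself, plus the parameters used), one possible way to collect it is to call the command-line program directly instead of going through pytesseract. The following is only a sketch: the helper names `build_tesseract_command`, `format_command` and `run_and_report` are made up for illustration, and the file name and page segmentation mode in the usage note are assumptions, since the thread does not say which parameters were actually used.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_command(image_path: str,
                            psm: Optional[int] = None,
                            lang: Optional[str] = None) -> List[str]:
    """Build the argument list for a direct tesseract CLI call.

    "stdout" as the output base makes tesseract print the recognized
    text instead of writing it to a file.
    """
    cmd = ["tesseract", image_path, "stdout"]
    if lang is not None:
        cmd += ["-l", lang]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    return cmd


def format_command(cmd: List[str]) -> str:
    """Render the argument list as a shell-quoted line that can be pasted into a post."""
    return " ".join(shlex.quote(arg) for arg in cmd)


def run_and_report(cmd: List[str]) -> str:
    """Run the command and return the exact command line together with its raw output."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} executable not found on PATH")
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return "\n".join([
        "$ " + format_command(cmd),
        "--- stdout ---",
        result.stdout,
        "--- stderr ---",
        result.stderr,
    ])
```

Assuming the preprocessed image is saved as `preprocessed.png` (a hypothetical name), `print(run_and_report(build_tesseract_command("preprocessed.png", psm=6)))` would print the command line and the raw output in a form that can be pasted into a reply. Running `tesseract --version` and including its output is also useful, since it confirms which build produced the result.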