Hi all, I have an issue with Tesseract (tesseract.js, if that matters) detecting the wrong things in an image. In the following image, it picks up the artefact in the top-right quadrant and, for some reason, outputs only "LEVEL", with no digits.
[image: fail_gyarados.png]

I realize that removing the artefacts is the best solution, but their position and shape are unpredictable. Does anyone have good ideas or resources for isolating and removing them? They always start on an edge, so my intuition is that I could remove every pixel that is connected to the edge, either directly or through a chain of adjacent pixels. However, I'm not sure how to read and modify image data in that way, or whether I should use an existing library for it. I'm also not sure what search terms to use to research such algorithms.
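The edge-connected removal described above is commonly found under the search terms "flood fill", "connected-component labeling", and "clear border" (for example `imclearborder` in MATLAB or `clear_border` in scikit-image). A minimal sketch in plain JavaScript follows. It rests on assumptions that are not in the original post: the image has already been thresholded into a `Uint8Array` where 1 means ink and 0 means background, and pixels count as adjacent when they touch on a side or a corner (8-connectivity). The function name `clearBorderComponents` is hypothetical.

```javascript
// Hypothetical helper: clears every foreground component that touches the image border.
// `pixels` is a Uint8Array of length width * height; 1 = foreground (ink), 0 = background.
// Assumes the image was binarized beforehand.
function clearBorderComponents(pixels, width, height) {
  const out = Uint8Array.from(pixels); // work on a copy, leave the input untouched
  const stack = [];

  // Clear a foreground pixel and remember it so its neighbours get visited too.
  const visit = (x, y) => {
    const i = y * width + x;
    if (out[i] === 1) {
      out[i] = 0;
      stack.push(i);
    }
  };

  // Seed the fill with every foreground pixel on the four edges.
  for (let x = 0; x < width; x++) { visit(x, 0); visit(x, height - 1); }
  for (let y = 0; y < height; y++) { visit(0, y); visit(width - 1, y); }

  // Iterative flood fill with an explicit stack, so large images
  // cannot overflow the call stack the way naive recursion could.
  while (stack.length > 0) {
    const i = stack.pop();
    const x = i % width;
    const y = (i - x) / width;
    for (let dy = -1; dy <= 1; dy++) {
      for (let dx = -1; dx <= 1; dx++) {
        const nx = x + dx;
        const ny = y + dy;
        if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
        visit(nx, ny);
      }
    }
  }
  return out;
}

// Toy 5x5 example: an artefact touching the top-right edge and an isolated 2x2 blob.
const img = Uint8Array.from([
  0, 0, 0, 1, 1,
  0, 0, 0, 0, 1,
  0, 1, 1, 0, 0,
  0, 1, 1, 0, 0,
  0, 0, 0, 0, 0,
]);
const cleaned = clearBorderComponents(img, 5, 5);
// The artefact is cleared; the 2x2 blob in the interior is kept.
```

In a browser, the pixel data could be read from a canvas with `getImageData`, thresholded to 0/1, cleaned, and written back with `putImageData` before the canvas is handed to tesseract.js. One caveat of this approach: any text that touches the artefact, or the edge itself, would be erased along with it.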