When I say "Mickey-Mouse" I mean looking for a run of black pixels along a line in the white space between the words. It works OK for a binary image ...
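Here is a minimal sketch of that gap check in Python. It assumes the page is already binarised and held as a list of rows of 0/1 values (1 = black), and that word boxes are (left, top, right, bottom) tuples, for example converted from the left/top/width/height fields of a word-level OCR pass. The names `find_line_between` and `longest_black_run`, and the `min_fraction` threshold, are hypothetical and only for illustration. It only looks for vertical rules; the horizontal case would be the same scan with rows and columns swapped.

```python
from typing import List, Optional, Tuple

# Assumed representation: image[y][x] == 1 is a black pixel, 0 is white.
Image = List[List[int]]
# Word box as (left, top, right, bottom); right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def longest_black_run(image: Image, x: int, top: int, bottom: int) -> int:
    """Longest run of consecutive black pixels in column x, rows top..bottom-1."""
    best = run = 0
    for y in range(top, bottom):
        if image[y][x]:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def find_line_between(image: Image, left_word: Box, right_word: Box,
                      min_fraction: float = 0.8) -> Optional[int]:
    """Return the x of a vertical rule in the gap between two words, or None.

    Every column in the white space between the words is scanned; the first one
    whose black run covers at least min_fraction of the combined word height
    counts as a line. min_fraction is an arbitrary tuning knob.
    """
    gap_start = left_word[2]   # right edge of the left-hand word
    gap_end = right_word[0]    # left edge of the right-hand word
    top = min(left_word[1], right_word[1])
    bottom = max(left_word[3], right_word[3])
    needed = min_fraction * (bottom - top)
    for x in range(gap_start, gap_end):
        if longest_black_run(image, x, top, bottom) >= needed:
            return x
    return None


# Toy 6x12 page: two solid blobs standing in for words, and a rule at x = 6.
img = [[0] * 12 for _ in range(6)]
for y in range(6):
    img[y][6] = 1
for y in range(1, 5):
    for x in (1, 2, 3, 9, 10, 11):
        img[y][x] = 1
print(find_line_between(img, (1, 1, 4, 5), (9, 1, 12, 5)))  # → 6
```

Requiring a long unbroken run, rather than just any black pixel, is what stops stray specks or noise in the gap from being mistaken for a rule.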
On Tuesday, June 8, 2021 at 9:09:26 AM UTC+1 Jeremy Young wrote:

> You won't like this, but ....
> We had a similar problem and we tackled it by doing an initial OCR run to
> locate the words, then a really simple mickey-mouse process to look for
> lines between the words, and then use the detected lines to identify
> regions which we re-OCRd one-by-one.
> Enjoy!
>
> On Friday, June 4, 2021 at 8:40:20 PM UTC+1 [email protected] wrote:
>
>> https://www.slideshare.net/EdwardOHalloran1/officer-evaluation-form-20160908
>>
>> How do I go about improving the OCR of the form above? I have tried a lot
>> of methods, such as erasing the lines, cropping out individual rows, etc,
>> and none seem to improve the tesseract OCR performance.
>>
>> The biggest problem is the text that I need (the field) seems to do OK,
>> but the surrounding identifier is sometimes poor, which makes extraction
>> difficult using regex.
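The pipeline described in the quoted reply (OCR to find the words, detect the rules between them, then re-OCR each region on its own) could be outlined roughly as follows. This is a sketch under assumptions, not the poster's actual code: it takes a binarised page as a list of rows of 0/1 values, assumes the x positions of vertical rules and y positions of horizontal rules have already been found, and cuts the page into grid cells between them. `spans_between`, `regions_from_lines`, `crop` and `reocr_regions` are hypothetical helpers. The `ocr` argument stands in for whatever engine call you use, for example a wrapper around pytesseract's `image_to_string`, perhaps with a single-line or single-block `--psm` mode.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed representation: image[y][x] is 1 for black, 0 for white.
Image = List[List[int]]
# Region as (left, top, right, bottom); right and bottom are exclusive.
Region = Tuple[int, int, int, int]


def spans_between(cuts: Sequence[int], size: int,
                  min_size: int = 2) -> List[Tuple[int, int]]:
    """Split [0, size) at each rule position, dropping the rule pixel itself.

    Spans narrower than min_size (e.g. between two adjacent rule pixels) are skipped.
    """
    spans = []
    start = 0
    for cut in sorted(set(cuts)) + [size]:
        if cut - start >= min_size:
            spans.append((start, cut))
        start = cut + 1
    return spans


def regions_from_lines(width: int, height: int, vertical_xs: Sequence[int],
                       horizontal_ys: Sequence[int]) -> List[Region]:
    """Cells bounded by the detected rules, top-to-bottom then left-to-right."""
    return [(x0, y0, x1, y1)
            for (y0, y1) in spans_between(horizontal_ys, height)
            for (x0, x1) in spans_between(vertical_xs, width)]


def crop(image: Image, region: Region) -> Image:
    """Copy out the pixels inside one region."""
    left, top, right, bottom = region
    return [row[left:right] for row in image[top:bottom]]


def reocr_regions(image: Image, regions: Sequence[Region],
                  ocr: Callable[[Image], str]) -> List[str]:
    """Run the supplied OCR callable on each cropped region separately."""
    return [ocr(crop(image, region)) for region in regions]


page = [[0] * 12 for _ in range(6)]
cells = regions_from_lines(12, 6, vertical_xs=[6], horizontal_ys=[])
print(cells)  # → [(0, 0, 6, 6), (7, 0, 12, 6)]
# Stand-in "OCR" that just reports each crop's size, to show the plumbing.
print(reocr_regions(page, cells, lambda cell: f"{len(cell[0])}x{len(cell)}"))
# → ['6x6', '5x6']
```

Running the engine on one small cell at a time means the label and the field value no longer share a line with the form's rules, which is roughly what the quoted approach relies on to get cleaner identifiers.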

