[tesseract-ocr] Re: Improving OCR of Form

Jeremy Young Tue, 08 Jun 2021 01:09:31 -0700

You won't like this, but...
We had a similar problem and tackled it in two passes. First we did an 
initial OCR run to locate the words. Then we ran a really simple 
Mickey Mouse process to look for ruling lines in the gaps between the 
words, and used the detected lines to split the form into regions, which 
we re-OCR'd one by one.
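
A rough sketch of that idea in Python, in case it helps. This is not our 
actual code, just an illustration under some assumptions: the page is 
already binarised into a 2D list (1 = dark, 0 = light), the word boxes come 
from the first OCR pass (for example the left/top/width/height columns of 
pytesseract's image_to_data, converted to corners), and only horizontal 
lines are handled. The function names and the `ocr` callback are 
hypothetical placeholders for whatever engine call you use.

```python
from typing import Callable, List, Sequence, Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def find_horizontal_lines(
    pixels: Sequence[Sequence[int]],
    word_boxes: Sequence[Box],
    min_dark_fraction: float = 0.8,
) -> List[int]:
    """Return the y indices of pixel rows that look like ruling lines.

    A row counts as a line if it does not pass through any word box and
    at least min_dark_fraction of its pixels are dark (value 1).
    """
    width = len(pixels[0])
    line_rows = []
    for y, row in enumerate(pixels):
        inside_word = any(top <= y < bottom for (_, top, _, bottom) in word_boxes)
        if not inside_word and sum(row) >= min_dark_fraction * width:
            line_rows.append(y)
    return line_rows


def regions_between_lines(line_rows: Sequence[int], height: int) -> List[Tuple[int, int]]:
    """Split the page into (top, bottom) bands separated by the line rows."""
    regions = []
    start = 0
    for y in line_rows:
        if y > start:  # skip zero-height bands, e.g. inside a thick line
            regions.append((start, y))
        start = y + 1
    if start < height:
        regions.append((start, height))
    return regions


def reocr_regions(
    pixels: Sequence[Sequence[int]],
    word_boxes: Sequence[Box],
    ocr: Callable[[Sequence[Sequence[int]]], object],
) -> List[Tuple[Tuple[int, int], object]]:
    """Second pass: call ocr() on each band between detected lines."""
    lines = find_horizontal_lines(pixels, word_boxes)
    results = []
    for top, bottom in regions_between_lines(lines, len(pixels)):
        results.append(((top, bottom), ocr(pixels[top:bottom])))
    return results
```

Vertical lines can be found the same way by scanning columns instead of 
rows, which gives you cells rather than bands. The 0.8 threshold is a guess 
and will need tuning for scans with broken or skewed lines.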
Enjoy!

On Friday, June 4, 2021 at 8:40:20 PM UTC+1 [email protected] wrote:

>
>
> https://www.slideshare.net/EdwardOHalloran1/officer-evaluation-form-20160908
>
> How do I go about improving the OCR of the form above? I have tried a lot 
> of methods, such as erasing the lines, cropping out individual rows, etc, 
> and none seem to improve the tesseract OCR performance. 
>
> The biggest problem is the text that I need (the field) seems to do OK, 
> but the surrounding identifier is sometimes poor, which makes extraction 
> difficult using regex. 
>
>

