Re: [tesseract-ocr] Numbers detection

Ger Hobbelt Fri, 22 Dec 2023 11:05:23 -0800

On Thu, 21 Dec 2023, 15:22 Art Rhyno, <artrh...@uwindsor.ca> wrote:

> If
>


Important extra note (as I see a new image that's white text on black
background):

Tesseract was trained on black text on a white background, targeting OCR of
books, publications and academic papers.

To improve your chances, ALWAYS make sure your text (the letters) is black
(or at least very dark grey) and your background is white. That is what the
engine was trained on and for, so black text on a white background is what
you should strive for in any image you intend to feed to tesseract.
(See also the notes in my email response in another thread in here just a
few minutes ago.) It is in the documentation, but in case you don't realize
what you're reading there, this is the main thing to check and ensure:
- text is black
- background is white
- a greyscale image is fine, and possibly better for the added edge detail,
but you invariably aim for something that looks as close as possible to
"black print on white paper".

Thus the shown output (processed) image should be inverted to match the
above conditions.
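
A minimal sketch of that inversion step, in plain Python so it runs without
any imaging library. It assumes an 8-bit greyscale image held as a list of
rows of pixel values (0 = black, 255 = white). The function names, the
threshold of 128 and the tiny sample image are all made up for illustration;
in practice you would do the same thing with your image library of choice
(Pillow's ImageOps.invert, for instance) before handing the image to
tesseract.

```python
from statistics import median


def needs_inversion(pixels, threshold=128):
    """Guess whether the image is light text on a dark background.

    Assumption (not from tesseract itself): text covers less area than the
    background, so the median pixel value approximates the background.
    """
    flat = [value for row in pixels for value in row]
    return median(flat) < threshold


def invert(pixels):
    """Invert an 8-bit greyscale image: black becomes white and vice versa."""
    return [[255 - value for value in row] for row in pixels]


def prepare_for_ocr(pixels):
    """Return the image as dark text on a light background."""
    return invert(pixels) if needs_inversion(pixels) else pixels


# Hypothetical 3x4 image: a white stroke (255) on a black background (0).
sample = [
    [0, 0, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 255, 0],
]

prepared = prepare_for_ocr(sample)
```

The median check is only a heuristic: it will guess wrong on images where
the text covers more area than the background, so do eyeball the result.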

HTH


>

