[tesseract-ocr] Image Inversion vs Border Creation [Docs Question]

Philip L Wed, 05 Jan 2022 23:08:50 -0800

At
https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md#binarisation
the Tesseract docs say:

*While tesseract version 3.05 (and older) handle inverted image (dark
background and light text) without problem, for 4.x version use dark text
on light background*
and

*If you OCR just text area without any border, tesseract could have
problems with it. See for some details in tesseract user forum
<https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-ocr/v26a-RYPSOE/2Sppq61GBwAJ>
#427 <https://github.com/tesseract-ocr/tesseract/issues/427> . You can easy add
small border (e.g. 10 pt) with ImageMagick®
<http://imagemagick.org/script/index.php>:*
*convert 427-1.jpg -bordercolor White -border 10x10 427-1b.jpg*

I'm a little puzzled about two things:

1. If we're using a light background, won't "adding a white border"
typically just mean making a larger image in which the target text makes up
less of the area (because the border will match the color of the
background)? Is that the intended interpretation, i.e. to avoid text
that directly touches the boundaries of the image?

2. The inversion advice talks about Tesseract 3 and 4. Does Tesseract 5
maintain the "dark text on light background" preference of 4?
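
To make the two operations in these questions concrete, here is a minimal
sketch in plain Python of what I understand "invert" and "add a border" to
mean at the pixel level. The image is modeled as a list of rows of 8-bit
grayscale values, and the helper names (add_border, invert) are made up for
illustration; they are not part of Tesseract or ImageMagick.

```python
# Hypothetical helpers for illustration only; not Tesseract or ImageMagick APIs.
# An image is a non-empty list of rows of 8-bit grayscale values
# (0 = black, 255 = white).

WHITE = 255


def add_border(pixels, border, color=WHITE):
    """Return a copy of pixels padded on every side by `border` pixels of `color`."""
    width = len(pixels[0]) + 2 * border
    top = [[color] * width for _ in range(border)]
    bottom = [[color] * width for _ in range(border)]
    middle = [[color] * border + list(row) + [color] * border for row in pixels]
    return top + middle + bottom


def invert(pixels):
    """Return the photographic negative: light becomes dark and vice versa."""
    return [[255 - value for value in row] for row in pixels]


# Light text (255) on a dark background (0), with text touching the image edge.
light_on_dark = [
    [255, 0, 255],
    [255, 255, 255],
]

# Step 1: get dark text on a light background, as the docs advise for 4.x.
dark_on_light = invert(light_on_dark)

# Step 2: pad with white so no text pixel touches the image boundary.
bordered = add_border(dark_on_light, border=2)

print(len(bordered), len(bordered[0]))
```

On a dark-on-light image the white border only adds margin around the text,
which is the reading I'm asking about in question 1.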

p.s. Tried to post a message once before and it didn't show up for some
reason. Giving it one more shot; sorry if this doubleposts.

