At https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md#binarisation, the Tesseract docs say:

*While tesseract version 3.05 (and older) handle inverted image (dark 
background and light text) without problem, for 4.x version use dark text 
on light background*
and

*If you OCR just text area without any border, tesseract could have 
problems with it. See for some details in tesseract user forum 
<https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-ocr/v26a-RYPSOE/2Sppq61GBwAJ> #427 
<https://github.com/tesseract-ocr/tesseract/issues/427>. You can easy add 
small border (e.g. 10 pt) with ImageMagick® 
<http://imagemagick.org/script/index.php>:*
*convert 427-1.jpg -bordercolor White -border 10x10 427-1b.jpg*
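
For anyone doing this from Python instead of the ImageMagick command line, here is a rough sketch of the same two steps. It assumes Pillow is installed. `prepare_for_tesseract` is just a made-up helper name. The rule "mean brightness below 128 means a dark background" is my own guess and does not come from the docs. The sketch inverts first and pads afterwards, so the white border ends up matching the (now light) background.

```python
from PIL import Image, ImageOps, ImageStat


def prepare_for_tesseract(img: Image.Image, border: int = 10) -> Image.Image:
    """Return a grayscale copy with dark text on a light background plus a white border.

    Hypothetical helper. Assumption: if the mean pixel value is below mid-gray,
    the background is dark (background usually covers more area than text),
    so the image is inverted before padding.
    """
    gray = img.convert("L")
    # Mean brightness of the single grayscale band (0 = black, 255 = white).
    if ImageStat.Stat(gray).mean[0] < 128:
        gray = ImageOps.invert(gray)
    # Pad every side with white pixels, roughly like
    # `convert in.jpg -bordercolor White -border 10x10 out.jpg`.
    return ImageOps.expand(gray, border=border, fill=255)


if __name__ == "__main__":
    # Synthetic "inverted" sample: black background with a white bar as the "text".
    sample = Image.new("L", (100, 40), 0)
    sample.paste(255, (10, 15, 90, 25))
    out = prepare_for_tesseract(sample)
    print(out.size)  # (120, 60)
    print(out.getpixel((0, 0)), out.getpixel((60, 30)))  # 255 0
```

The corner pixel is border (white) and the centre pixel is the bar, which is now dark text.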

I'm a little puzzled about two things:

1. If we're using a light background, won't "adding a white border" 
typically just mean making a larger image with the target text making up 
less of its area (because the border will match the color of the 
background)? Is that the intended interpretation of this, that is, to avoid 
text that directly touches the boundaries of the image?

2. The inversion advice talks about Tesseract 3 and 4. Does Tesseract 5 
maintain the "dark text on light background" preference of 4?

p.s. Tried to post a message once before and it didn't show up for some 
reason. Giving it one more shot; sorry if this doubleposts.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/4fc6a9e9-b674-479b-930b-e955f23204d1n%40googlegroups.com.
