Your #1 is correct. Tesseract 4 behaviour here applies to Tesseract 5 without change. (In other words: Tesseract 3 is the odd one out.)
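For what it's worth, here is a rough sketch of both preprocessing steps (padding so the text does not touch the image edge, and inverting light-on-dark input). It is only an illustration: it works on a plain list of grayscale rows rather than a real image file, the names add_border and ensure_dark_on_light are made up for this example, and the "mean brightness below 128 means dark background" test is my own simplifying assumption, not something Tesseract does. In practice you would do the same thing with ImageMagick or an imaging library.

```python
from statistics import mean


def add_border(pixels, width=10, value=255):
    """Return a copy of a grayscale image (list of rows) padded on all sides.

    `value` is the border colour; 255 is white, matching a light background.
    """
    if not pixels:
        return []
    new_width = len(pixels[0]) + 2 * width
    top = [[value] * new_width for _ in range(width)]
    bottom = [[value] * new_width for _ in range(width)]
    middle = [[value] * width + list(row) + [value] * width for row in pixels]
    return top + middle + bottom


def ensure_dark_on_light(pixels, threshold=128):
    """Invert the image if its mean brightness suggests a dark background.

    Assumption for this sketch: the background covers most of the image,
    so a low mean brightness means light text on a dark background.
    """
    flat = [p for row in pixels for p in row]
    if flat and mean(flat) < threshold:
        return [[255 - p for p in row] for row in pixels]
    return [list(row) for row in pixels]


# Tiny 2x3 example: one light pixel (text) on a dark background.
image = [[0, 255, 0],
         [0, 0, 0]]
prepared = add_border(ensure_dark_on_light(image), width=1)
```

After this, prepared is a 4x5 image with a single dark pixel surrounded by white, so the "text" no longer touches the image boundary.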

On Thu, Jan 6, 2022, 08:08 Philip L <philip.lecl...@gmail.com> wrote:

> At
> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md#binarisation,
> the Tesseract docs say:
>
> *While tesseract version 3.05 (and older) handle inverted image (dark
> background and light text) without problem, for 4.x version use dark text
> on light background*
>
> and
>
> *If you OCR just text area without any border, tesseract could have
> problems with it. See for some details in tesseract user forum
> <https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-ocr/v26a-RYPSOE/2Sppq61GBwAJ>
> #427 <https://github.com/tesseract-ocr/tesseract/issues/427>. You can easy add
> small border (e.g. 10 pt) with ImageMagick®
> <http://imagemagick.org/script/index.php>:*
> *convert 427-1.jpg -bordercolor White -border 10x10 427-1b.jpg*
>
> I'm a little puzzled about two things:
>
> 1. If we're using a light background, won't "adding a white border"
> typically just mean making a larger image with the target text making up
> less of its area (because the border will match the color of the
> background)? Is that the intended interpretation of this -- to avoid text
> that directly touches the boundaries of the image?
>
> 2. The inversion advice talks about Tesseract 3 and 4. Does Tesseract 5
> maintain the "dark text on light background" preference of 4?
>
> p.s. Tried to post a message once before and it didn't show up for some
> reason. Giving it one more shot; sorry if this doubleposts.