Your #1 is correct.

Tesseract4 behaviour here applies to tesseract5 without change. (In other
words: tesseract3 is the odd one out)
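
In case it helps, here is a rough sketch of the two preprocessing steps from
the quoted docs: making sure the text is dark on a light background, and
padding the image with a white border so the text does not touch the edges.
It is plain Python on a grayscale image held as a list of rows of 0..255
values, purely for illustration. The function names, the 128 threshold, and
the "mostly dark pixels means an inverted image" heuristic are my own
assumptions, not anything tesseract does internally.

```python
# Illustrative sketch only: a grayscale image is a list of rows of 0..255 ints.
# All names and the threshold below are assumptions made for this example.
WHITE = 255


def mean_brightness(pixels):
    """Average pixel value over the whole image."""
    total = sum(sum(row) for row in pixels)
    count = sum(len(row) for row in pixels)
    return total / count


def ensure_dark_on_light(pixels, threshold=128):
    """Invert the image if it looks like light text on a dark background.

    Heuristic (assumption): the background covers more area than the text,
    so a low average brightness suggests a dark background.
    """
    if mean_brightness(pixels) < threshold:
        return [[WHITE - p for p in row] for row in pixels]
    return [row[:] for row in pixels]


def add_border(pixels, border=10, fill=WHITE):
    """Pad the image on all four sides with `border` pixels of `fill`."""
    width = len(pixels[0]) + 2 * border
    top = [[fill] * width for _ in range(border)]
    bottom = [[fill] * width for _ in range(border)]
    middle = [[fill] * border + row[:] + [fill] * border for row in pixels]
    return top + middle + bottom


def prepare_for_ocr(pixels, border=10):
    """Apply both steps: fix polarity first, then pad with white."""
    return add_border(ensure_dark_on_light(pixels), border=border)
```

With a real image you would do the same thing through an image library or
the ImageMagick command quoted below; the point is only the order of the
steps (fix the polarity first, then add the white margin).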

On Thu, Jan 6, 2022, 08:08 Philip L <philip.lecl...@gmail.com> wrote:

> At
> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md#binarisation,
> the Tesseract docs say:
>
> *While tesseract version 3.05 (and older) handle inverted image (dark
> background and light text) without problem, for 4.x version use dark text
> on light background*
> and
>
> *If you OCR just text area without any border, tesseract could have
> problems with it. See for some details in tesseract user forum
> <https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-ocr/v26a-RYPSOE/2Sppq61GBwAJ>#427
> <https://github.com/tesseract-ocr/tesseract/issues/427> . You can easy add
> small border (e.g. 10 pt) with ImageMagick®
> <http://imagemagick.org/script/index.php>:*
> *convert 427-1.jpg -bordercolor White -border 10x10 427-1b.jpg*
>
> I'm a little puzzled about two things:
>
> 1. If we're using a light background, won't "adding a white border"
> typically just mean making a larger image with the target text making up
> less of its area (because the border will match the color of the
> background)? Is that the intended interpretation of this -- to avoid text
> that directly touches the boundaries of the image?
>
> 2. The inversion advice talks about Tesseract 3 and 4. Does Tesseract 5
> maintain the "dark text on light background" preference of 4?
>
> p.s. Tried to post a message once before and it didn't show up for some
> reason. Giving it one more shot; sorry if this doubleposts.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/4fc6a9e9-b674-479b-930b-e955f23204d1n%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/4fc6a9e9-b674-479b-930b-e955f23204d1n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

