I only skimmed Ger's long reply, but didn't see a link to the issue, which I think is the important bit of information:
https://github.com/tesseract-ocr/tesseract/issues/238

It's a long-standing (and complex) problem in which behavior varies across different PDF viewers.

Tom