Hello. I have an input document, input.pdf. It comes straight from a scanner, so I do some preprocessing to improve OCR accuracy (i.e., unpaper, black/white conversion, increased contrast), which yields preprocessed.png.
When I run the command tesseract preprocessed.png output pdf, I get a PDF that has the OCR'ed text embedded. Great! However, can I tell tesseract to use the original document input.pdf (i.e., the one without preprocessing) as the background of the generated PDF while still performing OCR on the preprocessed input?

Thanks,
Jonas