Modern(-ish) scanners can optionally run OCR on the documents they
scan. I've seen PDF files produced this way: each page has one big image
of the scanned page as a background, with lots of transparent text on top.
That lets the user copy and paste the OCR-ed text from the scanned document.
I vaguel
I'm not quite sure what you mean by "scanned pdf", but if each page
basically consists of one image and no text, that might be a strong
indication that the PDF was produced by a scanner.
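
To make that heuristic concrete, here is a minimal sketch in plain Java. It assumes you have already collected, for every page, the number of images and the extracted text (for example with PDFBox); the PageSummary class and both method names are hypothetical and are not part of PDFBox.

```java
import java.util.List;

public class ScannedPageHeuristic {

    // Hypothetical per-page summary. Filling it in from a real PDF
    // (image count, extracted text) is left to the caller.
    static final class PageSummary {
        final int imageCount;
        final String extractedText;

        PageSummary(int imageCount, String extractedText) {
            this.imageCount = imageCount;
            this.extractedText = extractedText == null ? "" : extractedText;
        }
    }

    // A page "looks scanned" if it holds exactly one image and no text
    // other than whitespace.
    static boolean pageLooksScanned(PageSummary page) {
        return page.imageCount == 1 && page.extractedText.trim().isEmpty();
    }

    // The document "looks scanned" only if it has pages and every page does.
    static boolean documentLooksScanned(List<PageSummary> pages) {
        if (pages.isEmpty()) {
            return false;
        }
        for (PageSummary page : pages) {
            if (!pageLooksScanned(page)) {
                return false;
            }
        }
        return true;
    }
}
```

Note that a scan with an OCR text layer has extractable text, so this check alone would not flag it as scanned.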
On Wed, 20 Nov 2024, 11:05 achilles, <1743702...@qq.com.invalid> wrote:
> hi:
> How can i judge a PDF is a Scanned PDF use pdfbox?
> i don't fin
A sure sign that text is the product of OCR is that it is rendered in
mode 3 (operator "3 Tr"), i.e. invisible. See the PDF 1.7 specification
(ISO 32000-1:2008), section 9.3.6.
Unless the PDF producer adds some kind of visible watermark using text, all
text in such a file will be set to render in this mode.
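
As a rough illustration of the "3 Tr" check, the sketch below scans an already decoded page content stream, passed in as a String, and reports whether every text-showing operator (Tj, TJ, ', ") runs while the rendering mode is 3. It is plain Java, not PDFBox API: the class and method names are made up, getting the decoded stream out of the PDF is left to the caller, and inline images and form XObjects are deliberately ignored.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class InvisibleTextCheck {

    // Splits a decoded content stream into tokens. Literal strings "(...)" and
    // hex strings "<...>" are kept whole so their contents are never mistaken
    // for operators. Inline images (BI ... EI) are not handled in this sketch.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '%') {                 // comment: skip to end of line
                while (i < s.length() && s.charAt(i) != '\n' && s.charAt(i) != '\r') i++;
            } else if (c == '(') {                 // literal string, parentheses may nest
                int start = i;
                int depth = 0;
                do {
                    char d = s.charAt(i);
                    if (d == '\\') i++;            // skip the escaped character
                    else if (d == '(') depth++;
                    else if (d == ')') depth--;
                    i++;
                } while (i < s.length() && depth > 0);
                tokens.add(s.substring(start, Math.min(i, s.length())));
            } else if (c == '<' && i + 1 < s.length() && s.charAt(i + 1) != '<') {
                int end = s.indexOf('>', i);       // hex string
                end = (end < 0) ? s.length() : end + 1;
                tokens.add(s.substring(i, end));
                i = end;
            } else if ("[]<>".indexOf(c) >= 0) {   // array and dictionary delimiters
                tokens.add(String.valueOf(c));
                i++;
            } else {                               // number, name or operator
                int start = i++;
                while (i < s.length() && !Character.isWhitespace(s.charAt(i))
                        && "()<>[]/%".indexOf(s.charAt(i)) < 0) {
                    i++;
                }
                tokens.add(s.substring(start, i));
            }
        }
        return tokens;
    }

    // Returns true if the stream shows at least one piece of text and every
    // text-showing operator is executed in rendering mode 3 (invisible).
    static boolean allTextInvisible(String content) {
        Deque<Integer> savedModes = new ArrayDeque<>();
        int mode = 0;                              // the default rendering mode is 0 (fill)
        boolean sawText = false;
        String previous = null;
        for (String token : tokenize(content)) {
            switch (token) {
                case "q":                          // save graphics state
                    savedModes.push(mode);
                    break;
                case "Q":                          // restore graphics state
                    if (!savedModes.isEmpty()) mode = savedModes.pop();
                    break;
                case "Tr":                         // operand is the preceding token
                    try {
                        mode = Integer.parseInt(previous);
                    } catch (NumberFormatException e) {
                        // malformed operand: keep the current mode
                    }
                    break;
                case "Tj":
                case "TJ":
                case "'":
                case "\"":
                    if (mode != 3) return false;   // visible text found
                    sawText = true;
                    break;
                default:
                    break;
            }
            previous = token;
        }
        return sawText;
    }
}
```

The rendering mode is tracked across q/Q because it is part of the graphics state, so a "3 Tr" set inside a q ... Q pair does not apply to text shown after the Q. A page with no text at all returns false here, so it would have to be caught by the image-only check instead.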