Hi PDFBox team,
I’m trying to render PDF pages to images so that I can run OCR on them.
I’ve been using PDFBox for splitting for many years and I’m happy with it.
This year, a PDF file with a problematic annotation/signature caused
problems (see the stack trace below; same effect on 2.0.26, 2.0.29 and 3.0).
Tesseract/Tess4J is a good OCR combination; Tess4J uses PDFBox to convert PDFs to images.
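A minimal sketch of the PDF-to-page-image step done directly with PDFBox. It assumes the PDFBox 2.0.x API (`PDDocument.load`; in 3.0 the equivalent is `org.apache.pdfbox.Loader.loadPDF`) and writes PNGs with plain `javax.imageio.ImageIO`. It is not necessarily what Tess4J does internally. The `skipAnnotations` switch is only a possible workaround, assuming the exception is raised while rendering an annotation or signature appearance. It empties each page's annotation list in memory (the file on disk is untouched), so any text that exists only in annotations (stamps, form field values) will not reach the OCR. The class and method names (`PdfToImages`, `pageImageName`, `renderPages`) and the 300 DPI grayscale setting are hypothetical choices for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Collections;
import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

public class PdfToImages {

    // Hypothetical helper: builds "<base>-page-0001.png" style names (1-based page numbers).
    static String pageImageName(String base, int pageIndex) {
        return String.format("%s-page-%04d.png", base, pageIndex + 1);
    }

    // Renders every page of 'pdf' to a grayscale PNG in 'outDir' at the given DPI.
    // If skipAnnotations is true, each page's /Annots array is replaced by an empty one
    // in memory only, so annotation appearances (including signature widgets) are not drawn.
    static void renderPages(File pdf, File outDir, float dpi, boolean skipAnnotations)
            throws IOException {
        // PDFBox 2.0.x API; with 3.0 use org.apache.pdfbox.Loader.loadPDF(pdf) instead.
        try (PDDocument document = PDDocument.load(pdf)) {
            if (skipAnnotations) {
                for (PDPage page : document.getPages()) {
                    page.setAnnotations(Collections.emptyList());
                }
            }
            PDFRenderer renderer = new PDFRenderer(document);
            String base = pdf.getName().replaceFirst("(?i)\\.pdf$", "");
            for (int i = 0; i < document.getNumberOfPages(); i++) {
                BufferedImage image = renderer.renderImageWithDPI(i, dpi, ImageType.GRAY);
                ImageIO.write(image, "png", new File(outDir, pageImageName(base, i)));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PdfToImages input.pdf outputDir [--skip-annotations]
        boolean skip = args.length > 2 && "--skip-annotations".equals(args[2]);
        renderPages(new File(args[0]), new File(args[1]), 300f, skip);
    }
}
```

Rendering page by page keeps only one image in memory at a time, which matters for long scans at OCR resolutions. The resulting PNGs can then be passed to Tesseract/Tess4J.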
-----Original Message-----
From: Tilman Hausherr
Sent: Tuesday, 3 October 2023 10:05
To: users@pdfbox.apache.org
Subject: Re: extract text from a password protected PDF
Well yes, OCR, obviously.
You could also look a