from:"David Francescato"

PDF2Img Annotation problem

2023-09-05 Thread David Francescato

Hi PDFBox team, I’m trying to convert PDF pages to images in order to perform OCR. I’ve used PDFBox for splitting for many years and I’m happy with it. This year a PDF file with a problematic annotation/signature caused some problems. (see stack trace below, same effect on 2.0.26, 2.0.29, 3.0

RE: extract text from a password protected PDF

2023-10-03 Thread David Francescato

Tesseract/Tess4J is a good OCR combo; Tess4J uses PDFBox for PDF-to-image conversion. -Original Message- From: Tilman Hausherr Sent: Tuesday, 3 October 2023 10:05 To: users@pdfbox.apache.org Subject: Re: extract text from a password protected PDF Well yes, OCR, obviously. You could also look a