Re: extract text from a password protected PDF

2023-10-03 Thread Tilman Hausherr
Well yes, OCR, obviously. You could also look at the source code of ExtractText and decide how you want to handle the permissions 😂
Tilman
On 02.10.2023 19:37, Robert Rodini wrote:
> Hi, I have had great success with PDFBOX Extract. That is until the supplier of the PDF decided to password

RE: extract text from a password protected PDF

2023-10-03 Thread David Francescato
Tesseract/Tess4J is a good OCR combo; Tess4J uses PDFBox for the PDF-to-image conversion.
-Original Message-
From: Tilman Hausherr
Sent: Tuesday, 3 October 2023 10:05
To: users@pdfbox.apache.org
Subject: Re: extract text from a password protected PDF
Well yes, OCR, obviously. You could also look a

Re: empty/missing pdf content

2023-10-03 Thread Pados Attila
Hi, here is the repository with the test/reproduction code: https://github.com/padisah/pdfboxtests
Here I am reproducing a character displacement problem: text that includes a '-' sign is shifted out of position. More cases, with missing content, will be added.
On Tue, Sep 26, 2023 at 3:04 PM