Hello Gilad,

Thank you.

Maruan Sahyoun already contacted me with the same tip. It works fine, but only
because we currently use PDFBox solely for rendering and text extraction. If
we used it for other purposes, especially for filling in form fields, we
would have to create a copy of the document for text extraction, which is
obviously not optimal in a web application that may have multiple documents
open at the same time.
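
One way to limit the cost of such a copy might be to keep it short-lived: load
a second PDDocument from the PDF bytes, flatten and extract on that copy only,
and close it right away, so the fillable document is never modified. This is a
minimal sketch, assuming PDFBox 3.0.x and that the application still holds the
current PDF bytes (a document changed in memory would have to be serialized
first). The names CopyForExtraction and extractFromThrowawayCopy are
hypothetical, not part of PDFBox.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.text.PDFTextStripper;

class CopyForExtraction {

    // Hypothetical helper: parses the bytes into a separate, throwaway document,
    // so any PDDocument the application keeps open for form filling is untouched.
    static String extractFromThrowawayCopy(byte[] pdfBytes) throws IOException {
        try (PDDocument copy = Loader.loadPDF(pdfBytes)) {
            PDAcroForm acroForm = copy.getDocumentCatalog().getAcroForm();
            if (acroForm != null) {
                // Moves the field appearances into the page content of the copy.
                acroForm.flatten();
            }
            return new PDFTextStripper().getText(copy);
        } // the copy is closed here, releasing its resources
    }

    public static void main(String[] args) throws IOException {
        byte[] pdfBytes = Files.readAllBytes(Path.of(args[0]));
        System.out.println(extractFromThrowawayCopy(pdfBytes));
    }
}
```

The extra memory is then only held for the duration of one extraction call
rather than for as long as the document is open in the web application.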

Kind regards,

Dipl.-Ing. (FH) Paul Grütter
Head of Development


 
signotec GmbH
Am Gierath 20b
40885 Ratingen (Germany)

Tel.: +49 2102 53575-10
Fax: +49 2102 53575-39

E-Mail: paul.gruet...@signotec.de
URL: www.signotec.com

Amtsgericht Düsseldorf: HRB 44307
Geschäftsführung/CEO: Arne Brandes


From: Gilad Denneboom <gilad.denneb...@gmail.com>
Sent: Sunday, 24 March 2024 22:50
To: paul.gruet...@signotec.de.invalid
Cc: users@pdfbox.apache.org
Subject: Re: How to search for / extract text of form field



Flatten the form fields before searching the file if you want PDFTextStripper 
to find the text in them.
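
A minimal sketch of this suggestion, assuming PDFBox 3.0.x on the classpath.
The class name FlattenThenExtract and its helper method are illustrative only,
not part of PDFBox. Note that flatten() changes the document in memory: the
fields are removed and their current appearances become page content.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.text.PDFTextStripper;

class FlattenThenExtract {

    // Illustrative helper: flattens the AcroForm (if there is one) and then
    // extracts the text, which now includes the former field appearances.
    static String extractIncludingFormText(PDDocument document) throws IOException {
        PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
        if (acroForm != null) {
            acroForm.flatten();
        }
        return new PDFTextStripper().getText(document);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the PDF to read, e.g. Test.pdf
        try (PDDocument document = Loader.loadPDF(new File(args[0]))) {
            System.out.println(extractIncludingFormText(document));
        }
    }
}
```

Because the document is not saved afterwards, the file on disk keeps its form
fields; only the in-memory instance is flattened.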

On Thu, Mar 21, 2024 at 12:10 PM Paul Grütter
<paul.gruet...@signotec.de.invalid> wrote:
Hello list,
 
I want to search for words in a PDF document and get their positions. It seems
that PDFBox ignores text which has been entered into a form field, although it
is rendered correctly. This can be reproduced easily with the standalone app:
 
java -jar pdfbox-app-3.0.2.jar export:text -i=Test.pdf
java -jar pdfbox-app-3.0.2.jar render -i=Test.pdf
 
Acrobat both finds and extracts text that has been entered into a form
field.
 
In my code I use PDFTextStripper. I haven’t found any way to configure this
behaviour. Is it a bug or have I overlooked something? For clarification: I
don’t want to search for the value (‘V’) but for its visual representation
(‘AP’).
 
Kind regards,
 
Dipl.-Ing. (FH) Paul Grütter
Head of Development
 

 
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org
