Hello Gilad, Thank you.
Maruan Sahyoun already contacted me with the same tip. It works fine, but only because we currently use PDFBox solely for rendering and text extraction. If we used it for other purposes, especially for filling in form fields, we would have to create a copy of the document for text extraction, which is obviously not optimal in a web application that may have multiple documents open at the same time.

Kind regards,

Dipl.-Ing. (FH) Paul Grütter
Head of Development
signotec GmbH
Am Gierath 20b
40885 Ratingen (Germany)
Tel.: +49 2102 53575-10
Fax: +49 2102 53575-39
E-Mail: paul.gruet...@signotec.de
URL: www.signotec.com
Amtsgericht Düsseldorf: HRB 44307
Geschäftsführung/CEO: Arne Brandes

From: Gilad Denneboom <gilad.denneb...@gmail.com>
Sent: Sunday, 24 March 2024 22:50
To: paul.gruet...@signotec.de.invalid
Cc: users@pdfbox.apache.org
Subject: Re: How to search for / extract text of form field

Flatten the form fields before searching the file if you want PDFTextStripper to find the text in them.

On Thu, Mar 21, 2024 at 12:10 PM Paul Grütter <paul.gruet...@signotec.de.invalid> wrote:

Hello list,

I want to search for words in a PDF document and get their positions. It seems that PDFBox ignores text which has been entered into a form field, although it’s rendered correctly. This can be reproduced easily with the standalone app:

java -jar pdfbox-app-3.0.2.jar export:text -i=Test.pdf
java -jar pdfbox-app-3.0.2.jar render -i=Test.pdf

Acrobat both finds and extracts text which has been entered into a form field.
In my code I use PDFTextStripper. I haven’t found any way to configure its behaviour. Is it a bug, or have I overlooked something? For clarification: I don’t want to search for the value (‘V’) but for its visual representation (‘AP’).

Kind regards,
Paul Grütter

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org
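
The workaround discussed in this thread (flatten the form fields, then run PDFTextStripper) could be sketched roughly as follows for PDFBox 3.0.x. This is only an illustration, not code from any of the posters: the class name FlattenedTextExtractor and the method extractText are hypothetical, and the sketch assumes the application keeps the original PDF bytes around so that a throwaway copy can be loaded and flattened while the working document and its form fields stay untouched.

```java
import java.io.IOException;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical helper: extracts text from a flattened, temporary copy of a PDF.
final class FlattenedTextExtractor {

    private FlattenedTextExtractor() {
    }

    // pdfBytes is assumed to hold the complete PDF file as currently stored.
    static String extractText(byte[] pdfBytes) throws IOException {
        // Load a second, independent document so that flattening
        // does not affect the document the application is working on.
        try (PDDocument copy = Loader.loadPDF(pdfBytes)) {
            PDAcroForm form = copy.getDocumentCatalog().getAcroForm();
            if (form != null) {
                // Merge the field appearances into the page content,
                // where PDFTextStripper can see them.
                form.flatten();
            }
            return new PDFTextStripper().getText(copy);
        }
    }
}
```

A call such as FlattenedTextExtractor.extractText(Files.readAllBytes(Path.of("Test.pdf"))) would then return the page text including the flattened field content. To get word positions rather than plain text, the same copy could presumably be passed to a PDFTextStripper subclass that overrides writeString(String, List<TextPosition>). The cost Paul mentions remains: every extraction loads a second document into memory.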