Re: Text extraction from a certain PDF uses up multiple GB of memory

2023-12-14 Thread Andreas Lehmkühler
Looks like https://issues.apache.org/jira/browse/PDFBOX-5479

On 13.12.23 at 14:50, Tilman Hausherr wrote:
> On 13.12.2023 11:23, Brangs, Erik wrote:
>> Hi, we ran into problems when doing text extraction from the PDF at https://d-nb.info/1312454512/34 . We were using PDFBox 3.0.0 to extract the

Text extraction from a certain PDF uses up multiple GB of memory

2023-12-13 Thread Brangs, Erik
Hi, we ran into problems when doing text extraction from the PDF at https://d-nb.info/1312454512/34 . We were using PDFBox 3.0.0 to extract the text, and the extraction used up multiple GB of memory. The problem can also be reproduced with PDFBox 4.0.0-SNAPSHOT and PDFBox 3.0.2-SNAPSHOT. Is ther
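
For anyone who wants to put a number on "multiple GB" when reproducing this, here is a minimal sketch that reports peak heap usage around an arbitrary task, using only the JDK's `java.lang.management` API. The class name `HeapPeak` and its methods are hypothetical helpers made up for this illustration and are not part of PDFBox. The reported value is the sum of per-pool peaks, so it is an upper-bound estimate and not an exact simultaneous peak.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not part of PDFBox): estimates peak heap use of a task.
final class HeapPeak {
    private HeapPeak() {}

    // Reset the recorded peak of every heap memory pool to its current usage.
    static void resetPeaks() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                pool.resetPeakUsage();
            }
        }
    }

    // Sum the recorded peak usage (in bytes) over all heap memory pools.
    // Pools may peak at different moments, so this is an upper bound.
    static long peakHeapBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                MemoryUsage peak = pool.getPeakUsage();
                if (peak != null) { // null if the pool is no longer valid
                    total += peak.getUsed();
                }
            }
        }
        return total;
    }

    // Run the task and return the estimated peak heap usage observed for it.
    static long measurePeak(Runnable task) {
        resetPeaks();
        task.run();
        return peakHeapBytes();
    }
}
```

To apply it to this case, the task passed to `measurePeak` would load the downloaded PDF with `Loader.loadPDF(File)` and call `PDFTextStripper#getText(PDDocument)` (both PDFBox 3.x calls), wrapping the checked `IOException` since `Runnable` cannot throw it. Running with a fixed `-Xmx` makes results easier to compare between PDFBox versions.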