Re: Could not load font file

2023-12-14 Thread Tilman Hausherr
Hi, the "SubstFormat" bug is not really important because it doesn't abort. The "Format 14 cmap table" isn't really a bug either; there are usually several tables. Please try a snapshot version, in which the "SubstFormat" bug has been fixed: https://repository.apache.org/content/groups/snapshots/org/apache/pd

Re: Text extraction from a certain PDF uses up multiple GB of memory

2023-12-14 Thread Andreas Lehmkühler
Looks like https://issues.apache.org/jira/browse/PDFBOX-5479 On 13.12.23 at 14:50, Tilman Hausherr wrote: On 13.12.2023 11:23, Brangs, Erik wrote: Hi, we ran into problems when doing text extraction from the PDF at https://d-nb.info/1312454512/34 . We were using PDFBox 3.0.0 to extract the