Hello Patrycja,
Please try with a snapshot while I download the file:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.3-SNAPSHOT/
Tilman
PS: Please start a new thread next time; replying to an old topic
is confusing.
On 05.08.2024 15:31, Patrycja Zaremba wrote:
Hi!
I have a PDF file which is very large (more than 1 GB). I'm trying to
extract text from it, but I get this error:
Caused by: java.lang.IllegalArgumentException: capacity < 0: (-2115587440 < 0)
    at java.base/java.nio.Buffer.createCapacityException(Buffer.java:290) ~[na:na]
    at java.base/java.nio.ByteBuffer.allocate(ByteBuffer.java:390) ~[na:na]
    at org.apache.pdfbox.io.RandomAccessReadBuffer.<init>(RandomAccessReadBuffer.java:70) ~[pdfbox-io-3.0.2.jar!/:3.0.2]
    at org.apache.pdfbox.io.RandomAccessReadWriteBuffer.<init>(RandomAccessReadWriteBuffer.java:40) ~[pdfbox-io-3.0.2.jar!/:3.0.2]
    at org.apache.pdfbox.filter.Filter.decode(Filter.java:250) ~[pdfbox-3.0.2.jar!/:3.0.2]
    at org.apache.pdfbox.cos.COSStream.createView(COSStream.java:196) ~[pdfbox-3.0.2.jar!/:3.0.2]
    at org.apache.pdfbox.pdmodel.PDPage.getContentsForRandomAccess(PDPage.java:177) ~[pdfbox-3.0.2.jar!/:3.0.2]
    at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:59) ~[pdfbox-3.0.2.jar!/:3.0.2]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:525) ~[pdfbox-3.0.2.jar!/:3.0.2]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:506) ~[pdfbox-3.0.2.jar!/:3.0.2]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:153) ~[pdfbox-3.0.2.jar!/:3.0.2]
    at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:153) ~[pdfbox-3.0.2.jar!/:3.0.2]
    at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:362) ~[pdfbox-3.0.2.jar!/:3.0.2]
    at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:288) ~[pdfbox-3.0.2.jar!/:3.0.2]
    at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:235) ~[pdfbox-3.0.2.jar!/:3.0.2]
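A negative capacity like this is typical of a byte count above Integer.MAX_VALUE being narrowed to a 32-bit int, although the trace does not confirm that this is what happens inside PDFBox. The sketch below uses only the JDK to show the effect. The class name, the helper names, and the size 2179379856 are assumptions chosen for illustration; the size is simply a value that wraps to the number in the error message.

```java
import java.nio.ByteBuffer;

class CapacityOverflowDemo {
    // Hypothetical helper: narrows a long byte count to int the way an
    // unchecked cast would, silently discarding the upper 32 bits.
    static int narrowToInt(long size) {
        return (int) size;
    }

    // Returns true if ByteBuffer.allocate rejects the given capacity
    // with an IllegalArgumentException, as seen at the top of the trace.
    static boolean allocationRejected(int capacity) {
        try {
            ByteBuffer.allocate(capacity);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Assumed size (a little over 2 GiB), not taken from the PDF itself.
        long size = 2_179_379_856L;
        int capacity = narrowToInt(size);
        System.out.println("narrowed capacity: " + capacity);
        System.out.println("allocation rejected: " + allocationRejected(capacity));
    }
}
```

If that reading is right, the limit would be the size of a single decoded stream rather than the size of the PDF file as a whole, since a Java ByteBuffer cannot hold more than Integer.MAX_VALUE bytes.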
Is there a way to make this work, or is it not supported?
Here is the problematic file:
https://mega.nz/file/NscTmJSL#Bp4TL4UjqUqgMykNO_f7j33y3n0Zwy12K7fNr45GYF8
I opened it with Acrobat Reader and it doesn't appear to be corrupted. It
takes a long time to load, but eventually I can select text, etc.
Best regards,
Patrycja Zaremba