That commit fixed another regression, i.e. testPDFBox5752 would fail if
this commit were reverted.
Tilman
On 04.02.2026 at 13:19, Daniel Persson wrote:
Hi Andreas
You are right, the commit that introduced the error is:
----------------------------------------------------
commit 41c3a431e21c31a9cf6d6dec4b47a126bac2996f (HEAD)
Author: Andreas Lehmkühler <[email protected]>
Date: Tue Dec 16 07:20:09 2025 +0000
PDFBOX-6036: avoid overlapping object keys when importing pages from
another pdf
git-svn-id: https://svn.apache.org/repos/asf/pdfbox/branches/3.0@1930616
13f79535-47bb-0310-9956-ffa450edef68
----------------------------------------------------
I still don't like the new implementation of COSWriterObjectStream. The
original thread-safe implementation is simpler to read and more correct,
but I understand if you want the change for performance reasons. Removing
the synchronization seems like the wrong way to do this. And looking at the
numbers, the original implementation handles memory and time better in my
comparisons.
== Test 3.0.6 ==
iter=0 wall_ms=376.855 cpu_ms=322.945 alloc_mb=92.806
heap_before_mb=216.448 heap_after_mb=141.633 heap_delta_mb=-74.814
gc_count_delta=1 gc_time_ms_delta=5
iter=1 wall_ms=308.536 cpu_ms=302.087 alloc_mb=92.774
heap_before_mb=141.633 heap_after_mb=233.633 heap_delta_mb=92.000
gc_count_delta=0 gc_time_ms_delta=0
iter=2 wall_ms=327.187 cpu_ms=316.977 alloc_mb=92.774
heap_before_mb=233.633 heap_after_mb=327.635 heap_delta_mb=94.002
gc_count_delta=0 gc_time_ms_delta=0
iter=3 wall_ms=396.564 cpu_ms=381.406 alloc_mb=92.774
heap_before_mb=327.635 heap_after_mb=113.293 heap_delta_mb=-214.343
gc_count_delta=1 gc_time_ms_delta=5
iter=4 wall_ms=470.828 cpu_ms=469.447 alloc_mb=92.774
heap_before_mb=113.293 heap_after_mb=205.293 heap_delta_mb=92.000
gc_count_delta=0 gc_time_ms_delta=0
iter=5 wall_ms=528.677 cpu_ms=523.818 alloc_mb=94.127
heap_before_mb=205.293 heap_after_mb=301.293 heap_delta_mb=96.000
gc_count_delta=0 gc_time_ms_delta=0
iter=6 wall_ms=543.075 cpu_ms=533.379 alloc_mb=92.886
heap_before_mb=301.293 heap_after_mb=393.293 heap_delta_mb=92.000
gc_count_delta=0 gc_time_ms_delta=0
iter=7 wall_ms=489.066 cpu_ms=483.486 alloc_mb=92.886
heap_before_mb=393.293 heap_after_mb=485.293 heap_delta_mb=92.000
gc_count_delta=0 gc_time_ms_delta=0
iter=8 wall_ms=817.173 cpu_ms=512.633 alloc_mb=92.824
heap_before_mb=487.291 heap_after_mb=148.949 heap_delta_mb=-338.342
gc_count_delta=1 gc_time_ms_delta=4
iter=9 wall_ms=897.804 cpu_ms=343.736 alloc_mb=92.824
heap_before_mb=148.949 heap_after_mb=240.949 heap_delta_mb=92.000
gc_count_delta=0 gc_time_ms_delta=0
== Test 3.0.7 ==
iter=0 wall_ms=507.491 cpu_ms=501.767 alloc_mb=94.896
heap_before_mb=195.869 heap_after_mb=170.142 heap_delta_mb=-25.726
gc_count_delta=1 gc_time_ms_delta=6
iter=1 wall_ms=495.749 cpu_ms=492.284 alloc_mb=94.861
heap_before_mb=170.142 heap_after_mb=266.142 heap_delta_mb=96.000
gc_count_delta=0 gc_time_ms_delta=0
iter=2 wall_ms=437.024 cpu_ms=435.485 alloc_mb=94.861
heap_before_mb=266.142 heap_after_mb=360.140 heap_delta_mb=93.998
gc_count_delta=0 gc_time_ms_delta=0
iter=3 wall_ms=478.096 cpu_ms=465.265 alloc_mb=94.861
heap_before_mb=360.140 heap_after_mb=127.227 heap_delta_mb=-232.913
gc_count_delta=1 gc_time_ms_delta=5
iter=4 wall_ms=1096.645 cpu_ms=509.049 alloc_mb=94.862
heap_before_mb=127.227 heap_after_mb=221.229 heap_delta_mb=94.002
gc_count_delta=0 gc_time_ms_delta=0
iter=5 wall_ms=1049.944 cpu_ms=319.307 alloc_mb=94.863
heap_before_mb=221.229 heap_after_mb=317.229 heap_delta_mb=96.000
gc_count_delta=0 gc_time_ms_delta=0
iter=6 wall_ms=1851.772 cpu_ms=343.559 alloc_mb=94.863
heap_before_mb=317.229 heap_after_mb=411.227 heap_delta_mb=93.998
gc_count_delta=0 gc_time_ms_delta=0
iter=7 wall_ms=485.274 cpu_ms=345.465 alloc_mb=94.863
heap_before_mb=411.227 heap_after_mb=116.257 heap_delta_mb=-294.970
gc_count_delta=1 gc_time_ms_delta=4
iter=8 wall_ms=407.939 cpu_ms=405.857 alloc_mb=94.802
heap_before_mb=116.257 heap_after_mb=210.259 heap_delta_mb=94.002
gc_count_delta=0 gc_time_ms_delta=0
iter=9 wall_ms=689.281 cpu_ms=380.940 alloc_mb=94.801
heap_before_mb=210.259 heap_after_mb=304.257 heap_delta_mb=93.998
gc_count_delta=0 gc_time_ms_delta=0
But maybe you've seen another trend using another profiling tool.
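With only ten samples per version and a few wall-clock outliers, the mean
and the median can differ quite a bit. The following is a small sketch (not
part of the linked test code; the class and method names are made up for
illustration) that summarises the wall_ms column copied from the two runs
above:

```java
import java.util.Arrays;
import java.util.Locale;

class WallTimeSummary {
    // wall_ms values copied from the two runs above, iter=0..9
    static final double[] WALL_3_0_6 = {
        376.855, 308.536, 327.187, 396.564, 470.828,
        528.677, 543.075, 489.066, 817.173, 897.804
    };
    static final double[] WALL_3_0_7 = {
        507.491, 495.749, 437.024, 478.096, 1096.645,
        1049.944, 1851.772, 485.274, 407.939, 689.281
    };

    /** Arithmetic mean; NaN for an empty array. */
    static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    /** Median of a copy of the input, so the caller's array stays unsorted. */
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 0
                ? (sorted[mid - 1] + sorted[mid]) / 2.0
                : sorted[mid];
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "3.0.6 wall_ms mean=%.1f median=%.1f%n",
                mean(WALL_3_0_6), median(WALL_3_0_6));
        System.out.printf(Locale.ROOT, "3.0.7 wall_ms mean=%.1f median=%.1f%n",
                mean(WALL_3_0_7), median(WALL_3_0_7));
    }
}
```

For the values above, this gives a mean of roughly 515.6 ms and a median of
roughly 479.9 ms for 3.0.6, against roughly 749.9 ms and 501.6 ms for 3.0.7.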
The benchmark results above were produced by a ChatGPT testing tool that
runs the code 5 times as warm-up and then measures it 10 times, printing
the result of each iteration.
The PDFBox code I ran loads a PDF and saves the document without any
changes.
Full code here:
https://github.com/kalaspuffar/PDFBoxTestBase/blob/main/src/main/java/TestingPerformance.java
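For readers who do not want to open the link, here is a rough sketch of the
warm-up-then-measure pattern described above, using only JDK management
beans. It is not the linked TestingPerformance code: the class name, the
Sample type and the placeholder workload are assumptions for illustration,
and it omits the alloc_mb column.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.Locale;

class WarmupBenchmark {
    /** One measured iteration; field names follow the columns of the log above. */
    static final class Sample {
        final double wallMs, cpuMs, heapBeforeMb, heapAfterMb;
        final long gcCountDelta, gcTimeMsDelta;

        Sample(double wallMs, double cpuMs, double heapBeforeMb, double heapAfterMb,
               long gcCountDelta, long gcTimeMsDelta) {
            this.wallMs = wallMs;
            this.cpuMs = cpuMs;
            this.heapBeforeMb = heapBeforeMb;
            this.heapAfterMb = heapAfterMb;
            this.gcCountDelta = gcCountDelta;
            this.gcTimeMsDelta = gcTimeMsDelta;
        }
    }

    /** Returns {total collection count, total collection time in ms} over all collectors. */
    static long[] gcTotals() {
        long count = 0;
        long timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            count += Math.max(0, gc.getCollectionCount()); // -1 means "undefined"
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, timeMs};
    }

    static double usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024.0 * 1024.0);
    }

    /** Runs the workload once and records time, heap and GC deltas around it. */
    static Sample measure(Runnable workload) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] gcBefore = gcTotals();
        double heapBefore = usedHeapMb();
        long cpuStart = threads.getCurrentThreadCpuTime(); // CPU time of this thread only
        long wallStart = System.nanoTime();
        workload.run();
        long wallEnd = System.nanoTime();
        long cpuEnd = threads.getCurrentThreadCpuTime();
        double heapAfter = usedHeapMb();
        long[] gcAfter = gcTotals();
        return new Sample((wallEnd - wallStart) / 1e6, (cpuEnd - cpuStart) / 1e6,
                heapBefore, heapAfter, gcAfter[0] - gcBefore[0], gcAfter[1] - gcBefore[1]);
    }

    /** Runs unmeasured warm-ups first, then the measured iterations. */
    static Sample[] run(Runnable workload, int warmups, int iterations) {
        for (int i = 0; i < warmups; i++) {
            workload.run();
        }
        Sample[] samples = new Sample[iterations];
        for (int i = 0; i < iterations; i++) {
            samples[i] = measure(workload);
        }
        return samples;
    }

    public static void main(String[] args) {
        // Placeholder workload; the real test loads a PDF and saves it unchanged.
        Runnable workload = () -> Arrays.fill(new byte[8 * 1024 * 1024], (byte) 1);
        Sample[] samples = run(workload, 5, 10);
        for (int i = 0; i < samples.length; i++) {
            Sample s = samples[i];
            System.out.printf(Locale.ROOT,
                    "iter=%d wall_ms=%.3f cpu_ms=%.3f heap_before_mb=%.3f heap_after_mb=%.3f "
                            + "gc_count_delta=%d gc_time_ms_delta=%d%n",
                    i, s.wallMs, s.cpuMs, s.heapBeforeMb, s.heapAfterMb,
                    s.gcCountDelta, s.gcTimeMsDelta);
        }
    }
}
```

Note that the heap numbers in such a harness depend heavily on when the
collector happens to run, so a single heap_delta_mb value says little on
its own.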
Best regards
Daniel
On Wed, Feb 4, 2026 at 8:28 AM Andreas Lehmkühler <[email protected]> wrote:
Hmmm, the first commit introduced a regression that resulted in crashes,
and the second one fixed that regression. The whole change was about
compressed object streams, which must not contain already compressed
objects such as content streams using FlateFilter as filter. That
said, I'm hesitant to believe that your issue is related to those
changes. Maybe another commit between those two is the root cause.
Without some sample code it is guesswork.
On 03.02.26 at 18:17, Daniel Persson wrote:
Hi Andreas
It's in 3.0.7. I ran a bunch of commits in order to figure out when the
issue was introduced.
87011ade3 fail
f3bb496975ee6ca6ae98c00c0e50cfc4375a3f8a fail
7ee6d390278fd0b06668ec65ede14810c6075ec9 crash
26283807ad crash
dd76acd546 crash
2fef081c714d8c6524aab118e2bfec7cf379e45a crash
08bc6fdd5200966309787a8188c3d7d5827b170a crash
3800af7bc5d8f08af99a653b37f8e4cd67bf1659 crash
1d4ae695a83c33999bda78a1d9f8c43512940965 crash
1ac4a24f8f7dfd08924ef9645246656ad3b9b33a crash
994b87e2b4d30ac2435cff9fe20ecdfc6ab1b916 crash
f82d2224a047bc642f1d38ff18360c61eaf9cccf success
d7d34f25cec7f4884e8f599ed620b2c3c704017b success
045d17604640a68b798027300f690f0af2b1a95d success
cdffe505e8bdeb5810456c1e6d9df61c7e2aab85 success
304ab0027d18fc8df5638f39bac033a55769dc4e success
222fb5f3b32fdb20f11107919700a80d1dcc130e success
Newer commits on top.
So the two pivotal commits are:
--------------------------------------------------
commit 994b87e2b4d30ac2435cff9fe20ecdfc6ab1b916 (head)
Author: Andreas Lehmkühler <[email protected]>
Date: Sat Dec 6 12:32:10 2025 +0000
PDFBOX-5169: reduce the memory footprint by reusing the internal byte
array instead of copying it
git-svn-id: https://svn.apache.org/repos/asf/pdfbox/branches/3.0@1930285
13f79535-47bb-0310-9956-ffa450edef68
--------------------------------------------------
After this one the created PDF could not be rendered in poppler.
Next we have this:
--------------------------------------------------
commit f3bb496975ee6ca6ae98c00c0e50cfc4375a3f8a (HEAD)
Author: Andreas Lehmkühler <[email protected]>
Date: Sat Jan 10 11:25:01 2026 +0000
PDFBOX-6142: take the size of the stream into account when accessing
the data of the underlying byte array
git-svn-id: https://svn.apache.org/repos/asf/pdfbox/branches/3.0@1931215
13f79535-47bb-0310-9956-ffa450edef68
--------------------------------------------------
This one sometimes stores a COSDictionary instead of a COSStream for the
contents of the document.
Best regards
Daniel
On Tue, Feb 3, 2026 at 4:34 PM Andreas Lehmkühler <[email protected]>
wrote:
On 03.02.26 at 15:46, Daniel Persson wrote:
Hi again.
Sorry to say that this version is still not great.
Thanks for the feedback
-1.
I have not figured out what is going on because we do a lot of
operations, but when I process a file with multiple pages (48), do all our
operations, and then save it again, I get a bunch of blank pages.
The first 38 pages don't save a COSStream for the content stream; they
use a COSDictionary with the length and filter.
Filter: FlateDecode
Length: 7820
So the first 38 pages are blank, and the last 10 are stored correctly.
This is a change from the previous version of PDFBox.
I'm trying to create a minimal reproducible example to show this issue.
I'm sending this email in case someone has an idea why I see this.
Is this new in 3.0.7?
Best regards
Daniel
On Mon, Feb 2, 2026 at 6:14 PM Andreas Lehmkühler <[email protected]>
wrote:
Hi,
a candidate for the PDFBox 3.0.7 release is available at:
https://dist.apache.org/repos/dist/dev/pdfbox/3.0.7/
The release candidate is a zip archive of the sources in:
https://svn.apache.org/repos/asf/pdfbox/tags/3.0.7/
The SHA-512 checksum of the archive is
bf863c69225821d93d4a4cf86b4dae59c93211651ca72bfbf5da7dfcf6a480b3d7b8c0ea672adbba789afd0e79481ec8883da15e29c5fa31cba564aa8cfc89d0.
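A minimal sketch of checking a downloaded archive against the checksum
above, using only the JDK. The class name and the two command-line
arguments (archive path, expected checksum) are illustrative assumptions;
this is not part of the release tooling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

class Sha512Check {
    /** Streams the input and returns its SHA-512 digest as lowercase hex. */
    static String sha512Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-512");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the downloaded source archive (hypothetical local file)
        // args[1]: expected SHA-512 checksum as published in the vote mail
        String expected = args[1].trim().toLowerCase(Locale.ROOT);
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String actual = sha512Hex(in);
            System.out.println(actual.equals(expected) ? "checksum OK" : "checksum MISMATCH");
        }
    }
}
```

It would be run as `java Sha512Check <archive> <checksum>`, with the
checksum pasted without the trailing period of the sentence above.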
Please vote on releasing this package as Apache PDFBox 3.0.7.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.
[ ] +1 Release this package as Apache PDFBox 3.0.7
[ ] -1 Do not release this package because...
Here is my +1
Andreas
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]