Re: Splitter does not include structure tree in documents past the first split

2025-05-23 Thread Alastair Porter
I've tested the latest trunk on the test files that I have and the splitter appears to work now. I'll consider this fixed unless our team get back to me with any other exceptional cases. Thanks again for your help Alastair On Tue, 20 May 2025 at 08:37, Tilman Hausherr wrote: > On 19.05.2025 17:

Re: Splitter does not include structure tree in documents past the first split

2025-05-19 Thread Tilman Hausherr
On 19.05.2025 17:59, Alastair Porter wrote: I'm currently verifying if we can privately share these documents with you. Please let me know if it would be useful for debugging. About the second bug: please try opening and saving it. Tilman -

Re: Splitter does not include structure tree in documents past the first split

2025-05-19 Thread Tilman Hausherr
Hi, I fixed the first bug. I'll see if I can reproduce the second one with the many files I have. It might take some time before the new version appears in the repository, as there are currently build problems. JIRA is the best place to discuss the problems. I'd like to get both files, despite that the

Re: Splitter does not include structure tree in documents past the first split

2025-05-19 Thread Alastair Porter
Hi, With some of our PDFs I get two different errors: 1: java.lang.NullPointerException at org.apache.pdfbox.multipdf.Splitter.cloneStructureTree(Splitter.java:238) at org.apache.pdfbox.multipdf.Splitter.split(Splitter.java:145) at org.apache.pdfbox.tools.PDFSplit.call(PDFSplit.java:133) at org.ap

Re: Splitter does not include structure tree in documents past the first split

2025-05-17 Thread Tilman Hausherr
Hi, Make sure to download the software again, I found another bug that I fixed. Tilman On 16.05.2025 21:36, Alastair Porter wrote: Hi Tilman, Please try with a snapshot: https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/4.0.0-SNAPSHOT/ Now elements withou

Re: Splitter does not include structure tree in documents past the first split

2025-05-16 Thread Alastair Porter
Hi Tilman, > Please try with a snapshot: https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/4.0.0-SNAPSHOT/ Now elements without /Pg entry are removed only if they have MCIDs. Note > that the "new" second page doesn't pass the PAC test but this is because > it s

Re: Splitter does not include structure tree in documents past the first split

2025-05-16 Thread Tilman Hausherr
Please try with a snapshot: https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/4.0.0-SNAPSHOT/ Now elements without /Pg entry are removed only if they have MCIDs. Note that the "new" second page doesn't pass the PAC test but this is because it starts with H2.
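For readers following the thread, the pruning rule described above can be sketched as a toy model. This is only a literal reading of the two statements in this thread (elements whose /Pg points at a page outside the split are dropped; elements without /Pg are dropped only if they have MCIDs). The class `StructurePruneSketch`, the `Elem` type, and the sample tree are hypothetical stand-ins for illustration, not PDFBox classes and not the actual Splitter code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class StructurePruneSketch {

    /** Hypothetical stand-in for a structure element; NOT a PDFBox class. */
    static final class Elem {
        final String type;       // structure type, e.g. "H2" or "P"
        final Integer page;      // stand-in for /Pg; null when the entry is absent
        final boolean hasMcids;  // true when the element's /K holds MCIDs
        final List<Elem> kids = new ArrayList<>();

        Elem(String type, Integer page, boolean hasMcids) {
            this.type = type;
            this.page = page;
            this.hasMcids = hasMcids;
        }

        Elem add(Elem kid) {
            kids.add(kid);
            return this;
        }
    }

    /** Returns a pruned copy of e for the given split, or null if e is dropped. */
    static Elem prune(Elem e, Set<Integer> keptPages) {
        // /Pg present but pointing at a page that is not part of this split
        if (e.page != null && !keptPages.contains(e.page)) {
            return null;
        }
        // no /Pg entry: removed only if the element has MCIDs
        if (e.page == null && e.hasMcids) {
            return null;
        }
        Elem copy = new Elem(e.type, e.page, e.hasMcids);
        for (Elem kid : e.kids) {
            Elem kept = prune(kid, keptPages);
            if (kept != null) {
                copy.kids.add(kept);
            }
        }
        return copy;
    }

    /** Renders the tree as nested type names, e.g. "Document[P, Sect[H2]]". */
    static String describe(Elem e) {
        if (e.kids.isEmpty()) {
            return e.type;
        }
        return e.type + e.kids.stream()
                .map(StructurePruneSketch::describe)
                .collect(Collectors.joining(", ", "[", "]"));
    }

    /** Made-up two-page example tree (pages are numbered 0 and 1). */
    static Elem sample() {
        return new Elem("Document", null, false)
                .add(new Elem("H1", 0, true))
                .add(new Elem("P", 1, true))
                .add(new Elem("Span", null, true))
                .add(new Elem("Sect", null, false)
                        .add(new Elem("H2", 1, true)));
    }

    public static void main(String[] args) {
        // Structure kept for the split that contains only the second page
        System.out.println(describe(prune(sample(), Set.of(1))));
        // prints: Document[P, Sect[H2]]
    }
}
```

In this made-up tree the grouping element without /Pg and without MCIDs survives, while the page-less element that carries MCIDs is removed. Whether empty grouping elements are kept in the real Splitter is not stated in the thread; the sketch simply keeps them.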

Re: Splitter does not include structure tree in documents past the first split

2025-05-15 Thread Tilman Hausherr
On 15.05.2025 07:50, Alastair Porter wrote: I uploaded it to https://porter.net.nz/~alastair/pdfbox-split-missing-tags.pdf New ticket: https://issues.apache.org/jira/browse/PDFBOX-6009 I can reproduce it. Ouch. I hope to work on it in the next few days. Tilman ---

Re: Splitter does not include structure tree in documents past the first split

2025-05-15 Thread Tilman Hausherr
On 14.05.2025 18:35, Alastair Porter wrote: In the first file, I correctly see the /K element. What's more, this element has correctly been pruned and doesn't include any items from the input document which point to pages that are not in this split. In subsequent split files, I see no /K eleme

Re: Splitter does not include structure tree in documents past the first split

2025-05-14 Thread Alastair Porter
I uploaded it to https://porter.net.nz/~alastair/pdfbox-split-missing-tags.pdf Alastair On Thu, 15 May 2025 at 05:38, wrote: > Please upload the file to a file-sharing service > > Tilman > > -- Original Message -- > From: Alastair Porter > Subject: Splitter does not include structure tree in documents