On 26.12.25 at 20:19, PDF Newbie via dev wrote:
Hi.  I’ve reviewed the documentation for the PDFMerger utility located here:

   https://pdfbox.apache.org/3.0/commandline.html#pdfmerger

And it appears that it only accepts a list of input files on the command line 
with the -i option.  While this is good, I would also like to be able to pass a 
list of file names (one per line), and have the list of files added to a single 
output file specified by the already-existing -o option.

Why?  The number of files I’ll be merging is larger than most UNIX/Linux shell 
environments permit, and file names may be up to 68-characters long 
(64-character string + .pdf), further reducing the number of files I can merge 
within one command.

While I don’t particularly care how this is implemented, other open source 
utilities follow the convention of prefixing an input file name with an ‘at’ 
symbol (‘@‘), or offer a separate option (“-l” / lowercase L) to specify the 
file.
PDFBox offers the '@' option as well, so there is no need to implement anything else, provided there aren't any other limitations.

Simply put any command line options into a text file and use the @ option to append the options from that file to the command line:

java -jar pdfbox-app-3.y.z.jar merge @cli_options.txt
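For the "one file name per line" use case, the options file can be generated from the list. The following is only a minimal sketch: the class and method names are hypothetical, it assumes that repeating -i once per input file is accepted and that the options file is split on whitespace, and it therefore rejects file names containing whitespace instead of guessing at quoting rules.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: turns a plain list of file
// names into the lines of an options file for the merge command.
final class MergeArgsFile {

    // Returns one argument per line: "-i", name, "-i", name, ..., "-o", output.
    // Assumption: repeating -i per input is accepted by the merge command.
    static List<String> buildArgs(List<String> inputs, String output) {
        List<String> lines = new ArrayList<>();
        for (String raw : inputs) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue; // skip blank lines in the list
            }
            requireNoWhitespace(name);
            lines.add("-i");
            lines.add(name);
        }
        requireNoWhitespace(output);
        lines.add("-o");
        lines.add(output);
        return lines;
    }

    // Names with whitespace would need quoting, which this sketch avoids.
    private static void requireNoWhitespace(String name) {
        if (name.chars().anyMatch(Character::isWhitespace)) {
            throw new IllegalArgumentException("whitespace in file name: " + name);
        }
    }

    // Usage: java MergeArgsFile.java <list.txt> <merged.pdf> <cli_options.txt>
    public static void main(String[] args) throws IOException {
        List<String> inputs = Files.readAllLines(Path.of(args[0]));
        Files.write(Path.of(args[2]), buildArgs(inputs, args[1]));
    }
}
```

The resulting file would then be passed as @cli_options.txt in the command shown above.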

We should add that "hidden" feature to the documentation.

BUT there is a possible issue if one tries to merge too many or too large files at once. As PDFBox merges one file after the other into the target without saving the result, the merger utility will sooner or later run into an OutOfMemory exception. To avoid that, the implementation has to be changed, e.g. by saving the result after 10 or so input files ....
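To illustrate just the grouping step of "saving the result after 10 or so input files", here is a small sketch that splits the input list into batches. The class and method names are hypothetical and it does not call PDFBox at all; whether merging batch by batch really lowers peak memory depends on the implementation and is not verified here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: plans which inputs would be
// merged together before an intermediate result is saved.
final class MergeBatches {

    // Splits items into consecutive batches of at most batchSize elements,
    // preserving the original order.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            files.add("file" + i + ".pdf");
        }
        // 25 inputs with a batch size of 10 give batches of 10, 10 and 5.
        for (List<String> batch : partition(files, 10)) {
            System.out.println(batch.size() + " files, first: " + batch.get(0));
        }
    }
}
```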

Testing only needs to be basic — before beginning, ensure all files described 
in the list file exist, and optionally test the correctness/compliance of the 
input PDF’s to ensure a defective merged PDF isn’t created, and ensure that the 
input file doesn’t contain more pages than the PDF spec allows for, and 
potentially adding a warning when files that contain incompatible options 
(password protected features such as copying / printing, etc.) are added.

That feature sounds easy, but the details may become complicated.

- Testing all files before merging will slow down the whole process, especially if the number of files is huge.
- It is tricky to define which files are correct/compliant, as there are many files in the wild which are corrupt or don't follow the spec but can be read using Acrobat, which makes them "correct"/"compliant", and PDFBox tries to read such files as well.
- I have never heard of an explicit maximum number of pages, but there is a maximum size for an object number, which in the end limits the maximum number of pages. However, such a PDF will most likely be huge, and I expect PDFBox will run into an OutOfMemory or similar exception when writing/reading such a PDF.


Given that the use case for this is commercial in nature, I’d like to offer a 
bounty for this feature - I’m neutral as to which platform should be used for 
posting / tracking / paying for this — just let me know which one.  I’d prefer 
to work with one of the core maintainers for the project, but other recognized 
contributors are also eligible, my only requirement is that the change is added 
to the main distribution.  I’d also consider making a donation directly to the 
project if that is preferred.

I'm hesitant to add your (enterprise) workflow to a command line utility which wasn't meant to be used in an enterprise environment. IMHO that check feature should be implemented in a separate tool, and you should do the checks yourself before feeding the merger utility with a list of files.
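The simplest of those checks, that every listed file exists and is readable, could live in a small separate tool along these lines. This is a sketch only, with hypothetical class and method names; it uses nothing but the Java standard library and says nothing about whether a file is a valid PDF.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone pre-check, not part of PDFBox.
final class MergeInputCheck {

    // Returns every path that is not an existing, readable regular file.
    static List<Path> findUnreadable(List<Path> inputs) {
        List<Path> bad = new ArrayList<>();
        for (Path p : inputs) {
            if (!Files.isRegularFile(p) || !Files.isReadable(p)) {
                bad.add(p);
            }
        }
        return bad;
    }

    // Usage: java MergeInputCheck.java <list.txt>
    // Exits with status 1 if any listed file is missing or unreadable.
    public static void main(String[] args) throws IOException {
        List<Path> inputs = new ArrayList<>();
        for (String line : Files.readAllLines(Path.of(args[0]))) {
            if (!line.isBlank()) {
                inputs.add(Path.of(line.trim()));
            }
        }
        List<Path> bad = findUnreadable(inputs);
        for (Path p : bad) {
            System.err.println("missing or unreadable: " + p);
        }
        if (!bad.isEmpty()) {
            System.exit(1);
        }
    }
}
```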

Andreas
