[PR] Add option to skip corrupt PDFs in PDFMergerUtility with improved exception handling [pdfbox]

via GitHub Fri, 04 Jul 2025 00:42:27 -0700


SwethaMuthuvel opened a new pull request, #208:
URL: https://github.com/apache/pdfbox/pull/208


   ## 🧩 What This PR Does
   
   This pull request improves the robustness and debuggability of `PDFMergerUtility` by:
   
   1. **Adding a `skipCorruptFiles` flag**  
      - Allows users to skip unreadable or corrupt PDF files during merge.  
      - Default behavior remains unchanged (i.e., throws on error).
   
   2. **Wrapping `IOException` with source context**  
      - Converts vague errors like:  
        ```
        IOException: Could not parse object stream
        ```  
        into more useful messages like:  
        ```
        IOException: Failed to load PDF from source: /path/to/file.pdf
        ```  
      - Helps identify exactly which file failed.
   
   3. **Applying both changes consistently in both merge modes**  
      - `optimizedMergeDocuments(...)`  
      - `legacyMergeDocuments(...)`  
      - A warning is logged whenever a file is skipped.
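
The three points above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: the class `SkipCorruptSketch`, the `SourceLoader` interface, and the `loadAll` method are hypothetical stand-ins for the merge loop and for PDF parsing, and `java.util.logging` is used only to keep the example dependency-free. Only the flag name `skipCorruptFiles`, its setter, and the message format are taken from the description.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical stand-in for the merge loop; this is not PDFBox code. */
class SkipCorruptSketch {
    private static final Logger LOG =
            Logger.getLogger(SkipCorruptSketch.class.getName());

    /** Hypothetical abstraction standing in for "parse one PDF source". */
    interface SourceLoader {
        String load(String source) throws IOException;
    }

    // Default is false, so existing callers still get an exception on error.
    private boolean skipCorruptFiles = false;

    void setSkipCorruptFiles(boolean skipCorruptFiles) {
        this.skipCorruptFiles = skipCorruptFiles;
    }

    /** Loads every source in order, either skipping or failing on bad ones. */
    List<String> loadAll(List<String> sources, SourceLoader loader)
            throws IOException {
        List<String> loaded = new ArrayList<>();
        for (String source : sources) {
            try {
                loaded.add(loader.load(source));
            } catch (IOException e) {
                if (!skipCorruptFiles) {
                    // Name the failing source and keep the original as cause.
                    throw new IOException(
                            "Failed to load PDF from source: " + source, e);
                }
                LOG.warning("Skipping unreadable source: " + source
                        + " (" + e.getMessage() + ")");
            }
        }
        return loaded;
    }
}
```

Passing the original exception as the cause is a choice made in this sketch so that the low-level parse error stays available in the stack trace; whether the PR does the same is not stated in the description.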
   
   ---
   
   ## 💡 Why This Helps
   
   - **Improves debuggability**: the error message names the file that caused the failure.
   - **Makes batch operations resilient**: one bad input no longer fails the entire merge.
   - **Suits bulk merging**: with the flag enabled, a large batch completes even when some inputs are unreadable.
   - **Does not break existing behavior**: skipping is opt-in via `setSkipCorruptFiles(true)`.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

