My question is: was it intentional for this to fail silently? We realize that
with the wide range of content we receive there are going to be "bad" PDFs,
which is fine. Currently, though, we rely on PDFBox to tell us *when* a file
is something we shouldn't do any further post-processing on. If it fails
silently, we assume that because nothing blew up we have received all of the
pages. If we were to go to alpha3, that assumption would no longer hold.

For years we have allowed all sorts of broken PDFs to pass, because that is what the majority of users wished for, as expressed by the often-repeated emotional plea "But it renders with Adobe Reader!".

Using PDFBox to check whether a PDF is valid isn't a good idea. Try a tool like JHOVE.

Tilman



Effectively we loop through a PDF to extract pages like so:

Splitter splitter = new Splitter();
for(PDDocument page : splitter.split(document)) {
   // save each page for consumption later
}

Thanks in advance for any information that you can provide regarding our
expectations of this behavior.

- Levi


