On 15.01.2023 16:04, Levi Wilson wrote:
Is it accurate that all versions through 3.0.0-alpha2 *do* raise exceptions,
then? Just not alpha3?

I don't know. You can test this yourself, you have the file.

Tilman




On Sun, Jan 15, 2023 at 2:06 AM Tilman Hausherr <thaush...@t-online.de>
wrote:

My question is: was it intentional for this to fail silently? We realize
that, given the wide variety of content we receive, there are going to be
"bad" PDFs, which is fine. But we currently rely on PDFBox to tell us
*when* a file is something we shouldn't do any further post-processing on.
If it fails silently, we assume that because nothing blew up, we have
received all of the pages. If we were to move to alpha3, that assumption
would no longer hold.

It has been this way for years: we have allowed all sorts of broken PDFs to
pass, because that was the wish of the majority of users, expressed in the
often-repeated emotional text "But it renders with Adobe Reader!".

Using PDFBox to check whether a PDF is valid isn't a good idea. Try a
tool like JHOVE.

Tilman


Effectively we loop through a PDF to extract pages like so:

Splitter splitter = new Splitter();
for (PDDocument page : splitter.split(document)) {
    // save each page for consumption later
}

Thanks in advance for any information that you can provide regarding our
expectations of this behavior.

- Levi


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org



