Found in PDMetadataExtractor.java:

       //TODO: find an example where basic.getThumbNail is not null
        //and figure out how to add that info

I can answer the first question:

for (String language : basic.getThumbnailLanguages())
{
    Thumbnail thumbnail = basic.getThumbnail(language);
    // Base64-encoded image data as stored in the XMP
    String s = thumbnail.getImage();
    if (s != null)
    {
        // The basic decoder fails unless newlines are removed first,
        // so use the MIME decoder, which ignores them.
        byte[] decoded = Base64.getMimeDecoder().decode(s);

        // The actual image file content is now in "decoded".
    }
}
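
To illustrate the decoder remark without needing a PDF at hand, here is a minimal standalone sketch. The class name, method names and sample string are made up for illustration and are not part of PDFBox or Jempbox. It only shows that java.util.Base64's basic decoder rejects line breaks, while the MIME decoder skips them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ThumbnailDecodeSketch
{
    // Decodes Base64 text that may be wrapped over several lines,
    // as thumbnail data in XMP often is.
    static byte[] decodeWrapped(String wrapped)
    {
        return Base64.getMimeDecoder().decode(wrapped);
    }

    // Returns true if the basic (non-MIME) decoder refuses the input.
    static boolean basicDecoderRejects(String s)
    {
        try
        {
            Base64.getDecoder().decode(s);
            return false;
        }
        catch (IllegalArgumentException e)
        {
            // thrown for characters outside the Base64 alphabet, e.g. '\n'
            return true;
        }
    }

    public static void main(String[] args)
    {
        // Hypothetical sample: "hello thumbnail", encoded and wrapped with newlines
        String wrapped = "aGVsbG8g\ndGh1bWJu\nYWls";
        System.out.println("basic decoder rejects input: " + basicDecoderRejects(wrapped));
        System.out.println(new String(decodeWrapped(wrapped), StandardCharsets.US_ASCII));
    }
}
```

Stripping the line breaks first and then using Base64.getDecoder() would work as well; the MIME decoder just saves that step.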

Alternatively: basic.getThumbnail("") will usually succeed, and basic.getThumbnail() will usually fail (because that one expects an x-default language entry in the XMP).

thumbnail.getImage() may throw a NullPointerException due to a bug in Jempbox (fixed in PDFBOX-5984; it happens with govdocs file 002868.pdf), so the exception should be caught for now.
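
One possible way to guard against that NPE is sketched below. It is not taken from PDMetadataExtractor; the class and method names are hypothetical, and the getter is passed in as a Supplier so the sketch compiles without Jempbox on the classpath.

```java
import java.util.Base64;
import java.util.function.Supplier;

public class SafeThumbnailImage
{
    // Calls the supplied getter and decodes its Base64 result.
    // Returns null if the getter throws a NullPointerException
    // or if it returns null (no image data present).
    static byte[] decodeOrNull(Supplier<String> imageSource)
    {
        String s;
        try
        {
            s = imageSource.get();
        }
        catch (NullPointerException e)
        {
            // workaround for the Jempbox bug described above
            return null;
        }
        if (s == null)
        {
            return null;
        }
        return Base64.getMimeDecoder().decode(s);
    }
}
```

Inside the loop this could be called as decodeOrNull(() -> thumbnail.getImage()), assuming getImage() declares no checked exceptions. The catch block can go once a Jempbox release containing the fix is in use.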

Some files that have thumbnails (all from govdocs): 000143.pdf, 000146.pdf, 000162.pdf, 000163.pdf, 000314.pdf

Tilman
