found in PDMetadataExtractor.java:
//TODO: find an example where basic.getThumbNail is not null
//and figure out how to add that info
I can answer the first question:
for (String language : basic.getThumbnailLanguages())
{
    Thumbnail thumbnail = basic.getThumbnail(language);
    String s = thumbnail.getImage();
    if (s != null)
    {
        // the normal decoder will fail unless you remove newlines
        byte[] decoded = Base64.getMimeDecoder().decode(s);
        // the actual file is now in "decoded".
    }
}
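On the decoder choice in the loop above: here is a small standalone sketch of why the MIME decoder is needed. It uses only java.util.Base64; the class name, the method names and the sample string are made up for illustration and are not taken from PDMetadataExtractor or a real XMP packet.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ThumbnailBase64Demo {

    // The MIME decoder skips line separators, so line-wrapped Base64 decodes fine.
    static byte[] decodeWrapped(String base64) {
        return Base64.getMimeDecoder().decode(base64);
    }

    // The basic decoder rejects any character outside the Base64 alphabet,
    // which includes CR and LF.
    static boolean basicDecoderRejects(String base64) {
        try {
            Base64.getDecoder().decode(base64);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // "hello" encoded as Base64, with a line break in the middle (made-up sample)
        String wrapped = "aGVs\r\nbG8=";
        System.out.println(new String(decodeWrapped(wrapped), StandardCharsets.US_ASCII)); // hello
        System.out.println(basicDecoderRejects(wrapped)); // true
    }
}
```

Stripping the newlines first and using the basic decoder would work too; the MIME decoder just saves that step.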
As an alternative to iterating over the languages: basic.getThumbnail("")
will usually succeed, while basic.getThumbnail() will usually fail (because
that one expects an x-default language entry in the XMP).
thumbnail.getImage() may throw an NPE due to a bug in Jempbox (fixed in
PDFBOX-5984; it happens with govdocs file 002868.pdf), so the NPE should be
caught for now.
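A possible way to do that catching, as a rough sketch only: the helper below takes the getImage() call as a Supplier so that it compiles without Jempbox on the classpath. The class name, the method name and the Supplier wiring are my assumptions, not existing code; in the loop it would be called with something like thumbnail::getImage.

```java
import java.util.Base64;
import java.util.function.Supplier;

public class SafeThumbnailDecode {

    // imageSource stands in for a call such as thumbnail::getImage (hypothetical wiring).
    // Returns the decoded bytes, or null if no image data could be read.
    static byte[] decodeOrNull(Supplier<String> imageSource) {
        String s;
        try {
            s = imageSource.get();
        } catch (NullPointerException e) {
            // workaround for the NPE described above; treat it as "no thumbnail"
            return null;
        }
        if (s == null) {
            return null;
        }
        // MIME decoder because the Base64 text may contain newlines
        return Base64.getMimeDecoder().decode(s);
    }
}
```

Only the NPE from reading the image is swallowed here; a malformed Base64 string would still surface as an IllegalArgumentException.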
Some files that have thumbnails (all from govdocs): 000143.pdf,
000146.pdf, 000162.pdf, 000163.pdf, 000314.pdf
Tilman