[ https://issues.apache.org/jira/browse/TIKA-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983813#comment-13983813 ]

Tim Allison commented on TIKA-1283:
-----------------------------------

[~thaichat04], thank you, as always.  By "thumbnail," I'd also want to include
images/icons of documents that are included only for display purposes.  For
example, the icon image (image1.emf) in test-documents/EmbeddedPDF.docx isn't
linked through a relationship of type "thumbnail," but I'd still want to treat
it as a thumbnail because it appears as a <v:shape> within a <w:object>.
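As a rough sketch of the two signals described above, the Python below flags (a) relationship targets whose Type is the OOXML package thumbnail type and (b) images drawn by a <v:shape> inside a <w:object>. It assumes the parts have already been read out of the .docx zip as strings, and that the icon is referenced through a VML <v:imagedata r:id="..."> element, as is typical for OLE object icons. The function names and sample XML are hypothetical; this is not how Tika's OOXML parser works.

```python
import xml.etree.ElementTree as ET
from typing import Set

# Namespace URIs for WordprocessingML, legacy VML, and OOXML relationships.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
V_NS = "urn:schemas-microsoft-com:vml"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
THUMBNAIL_REL_TYPE = (
    "http://schemas.openxmlformats.org/package/2006/relationships/metadata/thumbnail"
)


def display_only_image_ids(document_xml: str) -> Set[str]:
    """Relationship ids of images drawn by a <v:shape> inside a <w:object>,
    i.e. the icon shown in place of an embedded OLE object."""
    root = ET.fromstring(document_xml)
    ids = set()
    for obj in root.iter(f"{{{W_NS}}}object"):
        for shape in obj.iter(f"{{{V_NS}}}shape"):
            for imagedata in shape.iter(f"{{{V_NS}}}imagedata"):
                rid = imagedata.get(f"{{{R_NS}}}id")
                if rid:
                    ids.add(rid)
    return ids


def thumbnail_targets(rels_xml: str) -> Set[str]:
    """Targets of relationships whose Type is the package thumbnail type."""
    root = ET.fromstring(rels_xml)
    return {
        rel.get("Target")
        for rel in root.iter(f"{{{PKG_REL_NS}}}Relationship")
        if rel.get("Type") == THUMBNAIL_REL_TYPE
    }


# Hypothetical sample parts: rId5 is an OLE icon, rId6 is an ordinary picture.
SAMPLE_DOCUMENT_XML = """<w:document
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
  xmlns:v="urn:schemas-microsoft-com:vml"
  xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <w:body><w:p><w:r>
    <w:object><v:shape><v:imagedata r:id="rId5"/></v:shape></w:object>
    <w:pict><v:shape><v:imagedata r:id="rId6"/></v:shape></w:pict>
  </w:r></w:p></w:body>
</w:document>"""

SAMPLE_RELS_XML = """<Relationships
  xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
    Target="word/document.xml"/>
  <Relationship Id="rId2"
    Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/thumbnail"
    Target="docProps/thumbnail.emf"/>
</Relationships>"""

print(display_only_image_ids(SAMPLE_DOCUMENT_XML))  # {'rId5'}
print(thumbnail_targets(SAMPLE_RELS_XML))           # {'docProps/thumbnail.emf'}
```

The r:ids returned by the first function would still need to be resolved against word/_rels/document.xml.rels to get the actual image part names.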

The point you make about the differences in how each application handles these
is right on.  Each application links thumbnail images to the underlying data in
different ways, and we'll have to go application by application to do this
correctly (whether we go with this proposal or with TIKA-90).

I'm not tied to the original proposal in this issue, and I like the clarity of
TIKA-90 quite a bit.  Some other thoughts: the signature I proposed above won't
work because a given image can have more than one thumbnail (at least for RTFs),
and it omits metadata about the thumbnail image itself (such as the mediaType of
the thumbnail).
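One possible shape that avoids both problems is sketched below: each embedded resource holds a list of thumbnails, and each thumbnail carries its own metadata map. The class names and the "Content-Type" key are hypothetical choices for illustration, not an existing Tika API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Thumbnail:
    """One rendering of an embedded resource, with its own metadata."""
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)

    @property
    def media_type(self) -> Optional[str]:
        # "Content-Type" is an assumed key for the thumbnail's media type.
        return self.metadata.get("Content-Type")


@dataclass
class EmbeddedResource:
    """An embedded document/object and every thumbnail found for it."""
    name: str
    media_type: str
    # A list, not a single slot: some formats (e.g. RTF) can carry more
    # than one thumbnail for the same object.
    thumbnails: List[Thumbnail] = field(default_factory=list)


# Hypothetical usage: one embedded object with two thumbnails of different types.
pdf = EmbeddedResource(name="embedded.pdf", media_type="application/pdf")
pdf.thumbnails.append(Thumbnail(b"...", {"Content-Type": "image/png"}))
pdf.thumbnails.append(Thumbnail(b"...", {"Content-Type": "image/jpeg"}))
print([t.media_type for t in pdf.thumbnails])  # ['image/png', 'image/jpeg']
```

Keeping metadata per thumbnail, rather than on the parent, lets two thumbnails of the same object differ in media type without colliding.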

> Add "thumbnail" as possible metadata item to TikaCoreProperties
> ---------------------------------------------------------------
>
>                 Key: TIKA-1283
>                 URL: https://issues.apache.org/jira/browse/TIKA-1283
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata
>            Reporter: Tim Allison
>            Priority: Minor
>
> TIKA-90 originally requested to add thumbnails to a document's metadata.
> I'd like to have a unified way of determining whether an embedded 
> document/resource is a thumbnail or a regular attachment.
> With the changes in TIKA-1223 (ooxml) and TIKA-1010 (rtf), we are now pulling 
> out more thumbnails than before.
> I propose adding "tika:thumbnail" to the metadata of each thumbnail image.  
> The consumer can then determine what to do with the embedded resource based 
> on the metadata.
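On the consumer side, the proposal amounts to a simple partition of embedded-resource metadata. The sketch below assumes each embedded resource's metadata is available as a flat string map (for example, one entry per embedded document) and that the proposed "tika:thumbnail" key holds the value "true"; both the value format and the helper name are assumptions, since the key does not exist yet.

```python
from typing import Dict, List, Tuple

# The key proposed in this issue; it is not an existing Tika property, and the
# value "true" is an assumption for this sketch.
THUMBNAIL_KEY = "tika:thumbnail"

Metadata = Dict[str, str]


def split_embedded(
    embedded: List[Metadata],
) -> Tuple[List[Metadata], List[Metadata]]:
    """Partition embedded-resource metadata into (thumbnails, attachments)."""
    thumbnails: List[Metadata] = []
    attachments: List[Metadata] = []
    for md in embedded:
        if md.get(THUMBNAIL_KEY, "").lower() == "true":
            thumbnails.append(md)
        else:
            attachments.append(md)
    return thumbnails, attachments


# Hypothetical metadata for two embedded resources.
embedded = [
    {"resourceName": "image1.emf", THUMBNAIL_KEY: "true"},
    {"resourceName": "embedded.pdf"},
]
thumbs, attachments = split_embedded(embedded)
print([m["resourceName"] for m in thumbs])       # ['image1.emf']
print([m["resourceName"] for m in attachments])  # ['embedded.pdf']
```

Resources without the key fall through to attachments, so older output that never sets it is treated as it is today.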



--
This message was sent by Atlassian JIRA
(v6.2#6252)
