[ 
https://issues.apache.org/jira/browse/TIKA-4391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17931277#comment-17931277
 ] 

Ross Johnson commented on TIKA-4391:
------------------------------------

I've worked a lot with msg files & normalizing attachments, so just thought I'd 
give a bit of a brain dump of related info.

In HTML bodies & RTF-encapsulated HTML bodies, inline images normally have 
`PidTagAttachmentHidden` = true. Note that I have seen unusual emails where an 
attachment image has `PidTagAttachmentHidden` = true, but there doesn't 
seem to be a place in the HTML where that image actually goes, i.e. no apparent 
reference in the HTML. This flag is also used to hide other non-image 
attachments, mostly related to calendar invites & calendar exceptions.
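
As a rough sketch of the HTML case above (not Tika's implementation): the 
`Attachment` class and its field names are hypothetical, and matching on a 
`cid:` reference to the attachment's Content-ID is an assumption about how 
"a place in the HTML" could be detected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attachment:
    # Hypothetical container; a real parser would fill these from the msg file.
    hidden: bool                # value of PidTagAttachmentHidden
    mime_type: str              # e.g. "image/png"
    content_id: Optional[str]   # assumed: Content-ID that the HTML refers to via cid:


def classify_html_attachment(att: Attachment, html: str) -> str:
    """Heuristic label for an attachment of a message with an HTML body."""
    if not att.hidden:
        return "regular"
    # The hidden flag is also set on non-image attachments (e.g. calendar data).
    if not att.mime_type.lower().startswith("image/"):
        return "hidden-non-image"
    # Assumption: an inline image is referenced as cid:<content-id> in the HTML.
    referenced = bool(att.content_id) and f"cid:{att.content_id}".lower() in html.lower()
    # Hidden images with no apparent reference do occur, so keep them separate.
    return "inline" if referenced else "hidden-unreferenced"
```

Keeping "hidden-unreferenced" as its own label lets a caller decide whether to 
treat those unusual images as inline or as ordinary attachments.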

For real RTF bodies that reference attachments, things are a bit different. 
These attachments don't have `PidTagAttachmentHidden` = true but rather have 
`PidTagRenderingPosition` < 0xFFFFFFFF. The main issue with RTF attachments is 
determining whether the attachment is actually fully shown inline or not. For 
example, an embedded message or normal binary file will just show a thumbnail 
in the body (stored in an OLE presentation stream). Other attachments, such as 
an Excel file, may show a selection of a worksheet inline, and clicking on that 
section in Outlook then opens the full Excel file. I think true inline images 
won't have any OLE presentation defined, indicating that the original image 
data is used inline directly instead.
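
The RTF case could be sketched along the same lines. Again this is only an 
illustration: `RtfAttachment` and `has_ole_presentation` are hypothetical 
names, and the last branch rests on the unconfirmed guess above that true 
inline images have no OLE presentation.

```python
from dataclasses import dataclass

NOT_RENDERED = 0xFFFFFFFF  # PidTagRenderingPosition value meaning "not placed in the body"


@dataclass
class RtfAttachment:
    # Hypothetical container; a real parser would fill these from the msg file.
    rendering_position: int      # PidTagRenderingPosition (may arrive as signed -1)
    has_ole_presentation: bool   # assumed flag: an OLE presentation stream was found


def classify_rtf_attachment(att: RtfAttachment) -> str:
    """Heuristic label for an attachment of a message with a real RTF body."""
    # Normalize to unsigned 32-bit in case the value was read as a signed int.
    pos = att.rendering_position & 0xFFFFFFFF
    if pos >= NOT_RENDERED:
        return "regular"
    if att.has_ole_presentation:
        # A thumbnail or partial view is shown; the full file opens separately.
        return "positioned-with-preview"
    # Unverified guess: no presentation means the image data itself is shown.
    return "probably-inline"
```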

> Detect inline images in msg files
> ---------------------------------
>
>                 Key: TIKA-4391
>                 URL: https://issues.apache.org/jira/browse/TIKA-4391
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> Images are stored as attachments. It would be helpful to be able to 
> distinguish between "inline" images that are intended to be rendered in the 
> email vs regular image attachments.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
