Tim Allison created TIKA-4248:
---------------------------------

             Summary: Improve PST handling of attachments
                 Key: TIKA-4248
                 URL: https://issues.apache.org/jira/browse/TIKA-4248
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


The PST parser doesn't handle attachments in quite the same way as other
parsers, which hinders analysis of attachments.

The problem is that the PST parser handles both the text content of an email
and its embedded attachments. Moreover, it processes the attachments before
the main body. Together, these two behaviors break the normal patterns for
embedded attachments in the RecursiveParserWrapper. For example, while the
attachments are being processed, the RecursiveParserWrapper can't determine
what the embedded path through the "body" will be, because the body hasn't
been parsed yet.

We should probably create a PSTMailItemParser that handles the content and
attachments of each mail item the way other parsers do, so that embedded
paths can be maintained.
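The ordering problem can be illustrated with a minimal sketch. It does not use
Tika's or java-libpst's APIs: Item, ParsedRecord and parse() are hypothetical
stand-ins for a mail item, a parse result and a recursive wrapper. It assumes
a "parent first" traversal, where a mail item's own content is emitted before
its attachments, so each attachment's embedded path can be built from a
parent path that is already known.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model only; these are NOT Tika or java-libpst classes.
public class PstPathSketch {

    // A hypothetical mail item or attachment, possibly with its own attachments.
    record Item(String name, String body, List<Item> attachments) {
        static Item leaf(String name, String body) {
            return new Item(name, body, List.of());
        }
    }

    // One hypothetical result per parsed item, with its embedded path.
    record ParsedRecord(String embeddedPath, String content) {}

    // Parent first: fix this item's path and emit its content, then recurse
    // into each attachment using that path as the prefix.
    static void parse(Item item, String parentPath, List<ParsedRecord> out) {
        String path = parentPath + "/" + item.name();
        out.add(new ParsedRecord(path, item.body()));
        for (Item attachment : item.attachments()) {
            parse(attachment, path, out);
        }
    }

    public static void main(String[] args) {
        Item mail = new Item("message1.msg", "Hello, see attached.",
                List.of(Item.leaf("report.pdf", "quarterly numbers"),
                        new Item("fwd.msg", "forwarded note",
                                List.of(Item.leaf("photo.jpg", "")))));
        List<ParsedRecord> records = new ArrayList<>();
        parse(mail, "", records);
        for (ParsedRecord r : records) {
            System.out.println(r.embeddedPath());
        }
    }
}
```

Running main prints /message1.msg, /message1.msg/report.pdf,
/message1.msg/fwd.msg and /message1.msg/fwd.msg/photo.jpg, one per line. If
the attachments were visited before the parent's path was assigned, as the PST
parser currently does, there would be no parent path to prefix them with.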

This will be a breaking change, and I'm targeting it only at the 3.x branch.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
