[ https://issues.apache.org/jira/browse/TIKA-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17896423#comment-17896423 ]
ASF GitHub Bot commented on TIKA-4345:
--------------------------------------

tballison merged PR #2037:
URL: https://github.com/apache/tika/pull/2037

> Allow body-only content extraction for msg and other email formats
> ------------------------------------------------------------------
>
>                 Key: TIKA-4345
>                 URL: https://issues.apache.org/jira/browse/TIKA-4345
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>
> At least in the OutlookExtractor, we're writing some of the headers into the
> content stream. For some use cases, it would be helpful to extract only the
> body content into the content stream.
> Looks like OutlookExtractor and maybe OutlookPSTParser are the only parsers
> that need to be modified. We're not writing the from/to etc in the
> RFC822Parser into the content stream.
> I propose that this be a non-breaking/opt-in option in 3.x, and then the
> default in 4.x.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)