[ https://issues.apache.org/jira/browse/TIKA-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated TIKA-4345:
------------------------------
    Description:
At least in the OutlookExtractor, we're writing some of the headers into the content stream. For some use cases, it would be helpful to extract only the body content into the content stream.

Looks like OutlookExtractor and maybe OutlookPSTParser are the only parsers that need to be modified. We're not writing the from/to etc in the RFC822Parser into the content stream.

I propose that this be a non-breaking/opt-in option.

  was:
At least in the OutlookParser, we're writing some of the headers into the content stream. For some use cases, it would be helpful to extract only the body content into the content stream.

I propose that this be a non-breaking/opt-in option.


> Allow body-only content extraction for msg and other email formats
> ------------------------------------------------------------------
>
>                 Key: TIKA-4345
>                 URL: https://issues.apache.org/jira/browse/TIKA-4345
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>
> At least in the OutlookExtractor, we're writing some of the headers into the
> content stream. For some use cases, it would be helpful to extract only the
> body content into the content stream.
> Looks like OutlookExtractor and maybe OutlookPSTParser are the only parsers
> that need to be modified. We're not writing the from/to etc in the
> RFC822Parser into the content stream.
> I propose that this be a non-breaking/opt-in option.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)