[ https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843798#comment-17843798 ]
Tim Allison commented on TIKA-4250:
-----------------------------------

So, I caught an example of libpst not reading an attachment in our unit test file (testPST.pst). The attached msg should contain an embedded msg that includes a docx. Via a hex editor, I can see that there is no embedded msg in 8.msg, whereas the structure is correctly maintained in 8.eml.

> Add a libpst-based parser
> -------------------------
>
>                 Key: TIKA-4250
>                 URL: https://issues.apache.org/jira/browse/TIKA-4250
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: 8.eml, 8.msg
>
>
> We currently use the com.pff Java-based PST parser for PST files. It would be
> useful to add a wrapper for libpst as an optional parser.
> One of the benefits of libpst is that it creates .eml or .msg files from the
> PST records. This is critical for those who want the original bytes from
> embedded files. Obv, PST doesn't store eml or msg, but some users want the
> "original" emails even if they are constructed from PST records.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)