[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272297#comment-14272297 ]
Luis Filipe Nassif commented on TIKA-623:
-----------------------------------------

I think OutlookPSTParser currently does not extract .msg files, because they do not exist inside a PST; mails are stored there broken into several pieces. Looking at the source, it seems to extract and process raw text mail bodies and attachments, even if you set up the parsing to recurse down only one level. To get the relationship between a mail and its attachments, I think you currently need to monitor the handler output. I think the parser could be improved to set a parent mail ID in the metadata of its attachments, and vice versa, to make it easier to recover the relationships.

> Add support for Outlook PST
> ---------------------------
>
>                 Key: TIKA-623
>                 URL: https://issues.apache.org/jira/browse/TIKA-623
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Tran Nam Quang
>             Fix For: 1.6
>
>         Attachments: OutlookPSTParser.java
>
>
> Hello everyone,
> As you might know, Outlook stores its mails and other stuff in a single PST
> file. There's a relatively new Java library called java-libpst for reading
> Outlook PST files. It is licensed under the LGPL and available over here:
> http://code.google.com/p/java-libpst/
> I have tested the library on Outlook 2000 and Outlook 2003, with good
> results. It would be great if the library could be integrated into Tika.
> Best regards
> Tran Nam Quang

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)