[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271534#comment-14271534 ]
Luis Filipe Nassif commented on TIKA-623:
-----------------------------------------

Maybe the PSTParserTest can help:

http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mbox/OutlookPSTParserTest.java

ParsingEmbeddedDocumentExtractor simply appends the contents of all mails together, so I think the hits will point to the PST file. You could override the parseEmbedded(...) method to extract individual mails and process (index) them separately, but I do not know how to do this with solr.

> Add support for Outlook PST
> ---------------------------
>
>                 Key: TIKA-623
>                 URL: https://issues.apache.org/jira/browse/TIKA-623
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Tran Nam Quang
>             Fix For: 1.6
>
>         Attachments: OutlookPSTParser.java
>
>
> Hello everyone,
> As you might know, Outlook stores its mails and other stuff in a single PST
> file. There's a relatively new Java library called java-libpst for reading
> Outlook PST files. It is licensed under the LGPL and available over here:
> http://code.google.com/p/java-libpst/
> I have tested the library on Outlook 2000 and Outlook 2003, with good
> results. It would be great if the library could be integrated into Tika.
> Best regards
> Tran Nam Quang

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)