[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271534#comment-14271534 ]

Luis Filipe Nassif commented on TIKA-623:
-----------------------------------------

Maybe the OutlookPSTParserTest can help:
http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mbox/OutlookPSTParserTest.java

ParsingEmbeddedDocumentExtractor simply appends the text of every mail to the 
output of the PST file itself, so I think search hits will point to the PST 
file rather than to individual mails. You could override the parseEmbedded(...) 
method to extract each mail and process (index) it separately, but I do not 
know how to do this with Solr.
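A minimal sketch of that idea in Java, assuming the Tika 1.x-era API (the EmbeddedDocumentExtractor interface with shouldParseEmbedded/parseEmbedded, registered through ParseContext) and assuming the PST parser hands each mail to whatever EmbeddedDocumentExtractor is set in the ParseContext. It implements the interface directly instead of subclassing ParsingEmbeddedDocumentExtractor. PerMailExtractor and MailRecord are hypothetical names: each embedded item is parsed into its own handler and kept as a separate record, which you could then send to Solr as one document per mail (the Solr side is not shown).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.apache.tika.exception.TikaException;
import org.apache.tika.extractor.EmbeddedDocumentExtractor;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// Sketch only (hypothetical class): keeps each embedded item, e.g. one mail
// from a PST, as its own record instead of appending its text to the
// container's output the way ParsingEmbeddedDocumentExtractor does.
class PerMailExtractor implements EmbeddedDocumentExtractor {

    // Hypothetical holder for one extracted item; hand these to your indexer.
    static final class MailRecord {
        final Metadata metadata;
        final String text;

        MailRecord(Metadata metadata, String text) {
            this.metadata = metadata;
            this.text = text;
        }
    }

    private final Parser parser;
    private final List<MailRecord> mails = new ArrayList<>();

    PerMailExtractor(Parser parser) {
        this.parser = parser;
    }

    @Override
    public boolean shouldParseEmbedded(Metadata metadata) {
        return true; // accept every embedded item
    }

    @Override
    public void parseEmbedded(InputStream stream, ContentHandler handler,
                              Metadata metadata, boolean outputHtml)
            throws SAXException, IOException {
        // Parse into a private handler; nothing is written to the outer
        // handler, so the PST's own text no longer contains every mail.
        BodyContentHandler itemHandler = new BodyContentHandler(-1);
        ParseContext itemContext = new ParseContext();
        // Documents nested inside this item are appended to this item's text.
        itemContext.set(Parser.class, parser);
        try {
            parser.parse(stream, itemHandler, metadata, itemContext);
        } catch (TikaException e) {
            throw new IOException("Failed to parse embedded item", e);
        }
        mails.add(new MailRecord(metadata, itemHandler.toString()));
    }

    List<MailRecord> getMails() {
        return mails;
    }

    // Usage: java PerMailExtractor mailbox.pst
    public static void main(String[] args) throws Exception {
        AutoDetectParser parser = new AutoDetectParser();
        PerMailExtractor extractor = new PerMailExtractor(parser);
        ParseContext context = new ParseContext();
        context.set(EmbeddedDocumentExtractor.class, extractor);
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            parser.parse(in, new BodyContentHandler(-1), new Metadata(), context);
        }
        for (MailRecord mail : extractor.getMails()) {
            System.out.println(mail.text.length() + " chars");
        }
    }
}
```

Run it with tika-parsers on the classpath; each printed line is one embedded item. Attachments probably arrive through parseEmbedded as well, so they would show up as separate records too; check each record's metadata if only mail bodies should be indexed.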

> Add support for Outlook PST
> ---------------------------
>
>                 Key: TIKA-623
>                 URL: https://issues.apache.org/jira/browse/TIKA-623
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Tran Nam Quang
>             Fix For: 1.6
>
>         Attachments: OutlookPSTParser.java
>
>
> Hello everyone,
> As you might know, Outlook stores its mails and other stuff in a single PST 
> file. There's a relatively new Java library called java-libpst for reading 
> Outlook PST files. It is licensed under the LGPL and available over here: 
> http://code.google.com/p/java-libpst/
> I have tested the library on Outlook 2000 and Outlook 2003, with good 
> results. It would be great if the library could be integrated into Tika.
> Best regards
> Tran Nam Quang



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
