[ 
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843593#comment-17843593
 ] 

Luís Filipe Nassif commented on TIKA-4250:
------------------------------------------

I included a patched version of java-libpst-0.9.5 in the test, results below:
For 258 pst/ost files (93GB):
|Item type|LibPst-0.6.76|LibPff-20131028|Java-libpst-0.9.5*|
|Emails|195698|201818|208373|
|Contacts|19738|19949|24342|
|Attachments|242394|286723|275481|
|Feeds|0|47916|47913|
|Appointments|0|12664|15885|
|Meetings|0|5285|0|
|Activity|0|3457|3457|
|Documents|0|2202|0|
|Tasks|0|578|562|
|Notes|0|391|0|
|Vcalendar|8642|0|0|
|Vjournal|2352|0|0|
|Total|468824|580983|576013|
*java-libpst-0.9.5 fork with some fixes

PS: The libpff version I tested is pretty old; I should have run with a newer
version...

PS2: Libpff's recovery of deleted items was not enabled. When enabled, it
recovers several thousand more emails and attachments.

> Add a libpst-based parser
> -------------------------
>
>                 Key: TIKA-4250
>                 URL: https://issues.apache.org/jira/browse/TIKA-4250
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> We currently use the com.pff Java-based PST parser for PST files. It would be 
> useful to add a wrapper for libpst as an optional parser. 
> One of the benefits of libpst is that it creates .eml or .msg files from the 
> PST records. This is critical for those who want the original bytes from 
> embedded files. Obviously, PST doesn't store eml or msg, but some users want the 
> "original" emails even if they are constructed from PST records.
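
As a rough illustration of the wrapper idea in the issue description, here is a minimal sketch of calling libpst's readpst command-line tool from Java. It is not Tika's implementation. The class and method names are hypothetical, it assumes readpst is installed and on the PATH, and the flags used (-e to write each item to its own file with an extension, -o to set the output directory) should be checked against the installed readpst version.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of Tika: shells out to readpst.
public class ReadpstSketch {

    // Builds the command line. Assumed flags: "-e" writes each item to a
    // separate file with an extension, "-o" sets the output directory.
    static List<String> buildCommand(Path pst, Path outDir) {
        return List.of("readpst", "-e", "-o", outDir.toString(), pst.toString());
    }

    // Runs readpst and returns its exit code. Assumes readpst is on the PATH.
    static int extract(Path pst, Path outDir) throws IOException, InterruptedException {
        Files.createDirectories(outDir);
        Process process = new ProcessBuilder(buildCommand(pst, outDir))
                .inheritIO() // forward readpst's stdout/stderr to this process
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: ReadpstSketch <file.pst> <output-dir>");
            return;
        }
        int exit = extract(Path.of(args[0]), Path.of(args[1]));
        System.out.println("readpst exited with code " + exit);
    }
}
```

A real parser would then walk the output directory and hand each extracted file to the embedded-document handling, and would need timeouts and error handling that this sketch leaves out.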



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
