[ https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843593#comment-17843593 ]
Luís Filipe Nassif commented on TIKA-4250:
------------------------------------------

I included a patched version of java-libpst-0.9.5 in the test; results below:

For 258 pst/ost files (93GB):

|Item type|LibPst-0.6.76|LibPff-20131028|Java-libpst-0.9.5*|
|Emails|195698|201818|208373|
|Contacts|19738|19949|24342|
|Attachments|242394|286723|275481|
|Feeds|0|47916|47913|
|Appointments|0|12664|15885|
|Meetings|0|5285|0|
|Activity|0|3457|3457|
|Documents|0|2202|0|
|Tasks|0|578|562|
|Notes|0|391|0|
|Vcalendar|8642|0|0|
|Vjournal|2352|0|0|
|Total|468824|580983|576013|

*java-libpst-0.9.5 fork with some fixes

PS: The libpff version tested is pretty old; I should have run a newer version...

PS2: libpff's recovery of deleted items was not enabled. When enabled, it recovers a few thousand more emails and attachments.

> Add a libpst-based parser
> -------------------------
>
>                 Key: TIKA-4250
>                 URL: https://issues.apache.org/jira/browse/TIKA-4250
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> We currently use the com.pff Java-based PST parser for PST files. It would be
> useful to add a wrapper for libpst as an optional parser.
> One of the benefits of libpst is that it creates .eml or .msg files from the
> PST records. This is critical for those who want the original bytes from
> embedded files. Obviously, PST doesn't store eml or msg, but some users want the
> "original" emails even if they are constructed from PST records.
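As a quick consistency check on the results table in the comment, the Total row can be recomputed from the per-category counts. The following is a minimal Java sketch: the class and constant names are hypothetical, and the numbers are copied verbatim from the table (rows in table order). It only sums each column; it does not run any of the PST parsers.

```java
import java.util.Arrays;

/** Hypothetical helper: recomputes the "Total" row of the TIKA-4250 comparison table. */
public class PstCountTotals {
    // Counts copied from the table, in row order:
    // Emails, Contacts, Attachments, Feeds, Appointments, Meetings,
    // Activity, Documents, Tasks, Notes, Vcalendar, Vjournal
    static final long[] LIBPST_0_6_76 =
        {195698, 19738, 242394, 0, 0, 0, 0, 0, 0, 0, 8642, 2352};
    static final long[] LIBPFF_20131028 =
        {201818, 19949, 286723, 47916, 12664, 5285, 3457, 2202, 578, 391, 0, 0};
    static final long[] JAVA_LIBPST_0_9_5 =
        {208373, 24342, 275481, 47913, 15885, 0, 3457, 0, 562, 0, 0, 0};

    /** Sums one column of per-category counts. */
    static long total(long[] counts) {
        return Arrays.stream(counts).sum();
    }

    public static void main(String[] args) {
        System.out.println("LibPst-0.6.76:      " + total(LIBPST_0_6_76));
        System.out.println("LibPff-20131028:    " + total(LIBPFF_20131028));
        System.out.println("Java-libpst-0.9.5*: " + total(JAVA_LIBPST_0_9_5));
    }
}
```

Running it prints 468824, 580983 and 576013, which match the reported Total row for each tool.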