How are you instantiating the ForkParser and configuring the LibPstParser?

> Last but not least, the file I am testing with is a plain text 
> file, so I am not sure why the PST parser is getting invoked for it.

When the AutoDetectParser is built and the LibPstParser is "turned on"
via TikaConfig, the LibPstParser runs a check during initialization to
see whether it can execute readpst, and that check is where the failure
in the stack trace above occurs. The parser is not lazily initialized,
so the check runs regardless of the type of file you go on to parse.
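
One way to narrow this down might be to check what PATH the process
that fails actually sees. The sketch below is only a diagnostic and is
not part of Tika: the class name ReadPstCheck and the method findOnPath
are made up for illustration. It does not reproduce LibPstParser's own
check; it only looks for a file named readpst (or readpst.exe) in each
PATH directory. It assumes the forked JVM may be seeing a different
PATH than the parent, which is a guess on my part.

```java
import java.io.File;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper, not a Tika class.
final class ReadPstCheck {

    // Search each directory of a PATH-style string for the given executable.
    // Tries the bare name first, then name + ".exe" for Windows installs.
    static Optional<File> findOnPath(String name, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        String[] dirs = pathValue.split(Pattern.quote(File.pathSeparator));
        for (String dir : dirs) {
            if (dir.isEmpty()) {
                continue; // skip empty PATH entries
            }
            for (String candidate : new String[] {name, name + ".exe"}) {
                File f = new File(dir, candidate);
                if (f.isFile()) {
                    return Optional.of(f);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Print the PATH this JVM sees and where (if anywhere) readpst lives.
        String path = System.getenv("PATH");
        System.out.println("PATH = " + path);
        System.out.println("readpst = "
                + findOnPath("readpst", path)
                        .map(File::getAbsolutePath)
                        .orElse("not found"));
    }
}
```

Running the same check in the parent process and in the environment
used for the forked process, then comparing the two outputs, should
show whether c:\cygwin64\bin is missing from the child's PATH.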

On Mon, Nov 18, 2024 at 7:52 AM Sandeep Kulkarni
<sandeep.kulkar...@veritas.com.invalid> wrote:
>
> Hi All,
>
> We are using Tika as a library and also making use of Fork Parser to launch 
> Tika in a separate process. Things work for me for
>
> We have integrated Tika 3.0.0 and would like to try out support for readpst 
> that was added to it (TIKA-4250). Main reason is to see if we can get rid of 
> java-libpst which is marked EOL by various scanners and customers are 
> complaining about it.
>
> I used the config example to disable OutlookPSTParser and enable LibPstParser 
> in its place in the commit 
> https://github.com/apache/tika/commit/32baf2345abe1a04d767ea6641a567d5c924587e
>
> As the new parser does not have any config option to specify the path to the 
> readpst binary, I added its directory to the system PATH environment variable. 
> It is installed via Cygwin on Windows, at a path like c:\cygwin64\bin. This 
> works fine, and the new LibPstParser gets launched. But when we do the same 
> with Fork Parser, we get an error.
>
> [LibPstParser] Couldn't get version of libpst
> java.io.IOException: Cannot run program "readpst": CreateProcess error=2, The system cannot find the file specified
>         at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
>         at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
>         at org.apache.tika.utils.ProcessUtils.execute(ProcessUtils.java:94)
>         at org.apache.tika.parser.microsoft.libpst.LibPstParser.check(LibPstParser.java:176)
>         at org.apache.tika.parser.microsoft.libpst.LibPstParser.initialize(LibPstParser.java:161)
>
> Any help would be appreciated.
>
> Last but not least, the file I am testing with is a plain text 
> file, so I am not sure why the PST parser is getting invoked for it.
>
> Regards,
> Sandeep Kulkarni
