I found another issue that would prevent LibPstParser from working
correctly in the ForkParser. Let's move the discussion of this
integration to: https://issues.apache.org/jira/browse/TIKA-4355
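
In the meantime, for anyone hitting the same "Cannot run program" error, here is a minimal diagnostic sketch. It uses only JDK calls and nothing from Tika; the class name ReadpstPathCheck and the helper findOnPath are hypothetical names made up for this illustration. It prints the PATH that the current JVM sees and checks whether a readpst executable sits in any of those directories. It only approximates the operating system's lookup (on Windows it tries the bare name and ".exe", not the full PATHEXT list), so treat the result as a hint.

```java
import java.io.File;
import java.util.Optional;
import java.util.regex.Pattern;

public class ReadpstPathCheck {

    /**
     * Searches each directory listed in pathValue for an executable file
     * named `name` (or `name` + ".exe", to cover Windows/Cygwin installs).
     * Returns the first match, or empty if nothing is found.
     */
    static Optional<File> findOnPath(String name, String pathValue, String separator) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(Pattern.quote(separator))) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String candidate : new String[] {name, name + ".exe"}) {
                File f = new File(dir, candidate);
                if (f.isFile() && f.canExecute()) {
                    return Optional.of(f);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The PATH as seen by *this* JVM process; it can differ from the
        // PATH of the shell or service that you edited.
        String path = System.getenv("PATH");
        System.out.println("PATH seen by this JVM: " + path);

        Optional<File> hit = findOnPath("readpst", path, File.pathSeparator);
        System.out.println(hit.map(f -> "readpst found at " + f)
                .orElse("readpst NOT found on PATH"));
    }
}
```

Running this once in the parent application and once with the same command line and environment that are used to start the forked JVM should show whether both processes see the same PATH.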

On Tue, Nov 19, 2024 at 10:00 AM Tim Allison <talli...@apache.org> wrote:
>
> Sorry...should have responded to both dev@ and user@
>
> ---------- Forwarded message ---------
> From: Tim Allison <talli...@apache.org>
> Date: Tue, Nov 19, 2024 at 9:59 AM
> Subject: Re: Tika fork parser not picking up system environment variables
> To: <dev@tika.apache.org>
>
>
> How are you instantiating the ForkParser and configuring the LibPstParser?
>
> > Last but not least, the file I am testing with is a plain
> > text file, so I am not sure why the PST parser is getting invoked for it.
>
> When the AutoDetectParser is built and the LibPstParser is "turned on"
> via TikaConfig, the LibPstParser runs a check to see if it can execute
> readpst during initialization and this is where you're seeing the
> failure in the above stack trace. It is not lazily initialized.
>
> On Mon, Nov 18, 2024 at 7:52 AM Sandeep Kulkarni
> <sandeep.kulkar...@veritas.com.invalid> wrote:
> >
> > Hi All,
> >
> > We are using Tika as a library and also making use of Fork Parser to launch 
> > Tika in a separate process. Things work for me for
> >
> > We have integrated Tika 3.0.0 and would like to try out the support for
> > readpst that was added to it (TIKA-4250). The main reason is to see if we
> > can get rid of java-libpst, which is marked EOL by various scanners;
> > customers are complaining about it.
> >
> > I used the config example in the commit below to disable OutlookPSTParser
> > and enable LibPstParser in its place:
> > https://github.com/apache/tika/commit/32baf2345abe1a04d767ea6641a567d5c924587e
> >
> > Because the new parser has no config option to specify the path to the
> > readpst binary, I added its directory to the system PATH environment
> > variable. readpst is installed via Cygwin on Windows, at a path like
> > c:\cygwin64\bin. This works fine, and the new LibPstParser is launched.
> > But when we do the same with the Fork Parser, we get an error:
> >
> > [LibPstParser] Couldn't get version of libpst
> > java.io.IOException: Cannot run program "readpst": CreateProcess error=2, The system cannot find the file specified
> >                 at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
> >                 at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
> >                 at org.apache.tika.utils.ProcessUtils.execute(ProcessUtils.java:94)
> >                 at org.apache.tika.parser.microsoft.libpst.LibPstParser.check(LibPstParser.java:176)
> >                 at org.apache.tika.parser.microsoft.libpst.LibPstParser.initialize(LibPstParser.java:161)
> >
> > Any help would be appreciated.
> >
> > > Last but not least, the file I am testing with is a plain
> > > text file, so I am not sure why the PST parser is getting invoked for it.
> >
> > Regards,
> > Sandeep Kulkarni
