I found another issue that would prevent LibPstParser from working correctly in the ForkParser. Let's move the discussion of this integration to: https://issues.apache.org/jira/browse/TIKA-4355
On Tue, Nov 19, 2024 at 10:00 AM Tim Allison <talli...@apache.org> wrote:
>
> Sorry...should have responded to both dev@ and user@
>
> ---------- Forwarded message ---------
> From: Tim Allison <talli...@apache.org>
> Date: Tue, Nov 19, 2024 at 9:59 AM
> Subject: Re: Tika fork parser not picking up system environment variables
> To: <dev@tika.apache.org>
>
>
> How are you instantiating the ForkParser and configuring the LibPstParser?
>
> > Last but not the least, the file for which I am doing testing is a plain
> > text file, so not sure why the PST parser is getting invoked for it.
>
> When the AutoDetectParser is built and the LibPstParser is "turned on"
> via TikaConfig, the LibPstParser runs a check to see if it can execute
> readpst during initialization and this is where you're seeing the
> failure in the above stack trace. It is not lazily initialized.
>
> On Mon, Nov 18, 2024 at 7:52 AM Sandeep Kulkarni
> <sandeep.kulkar...@veritas.com.invalid> wrote:
> >
> > Hi All,
> >
> > We are using Tika as a library and also making use of Fork Parser to launch
> > Tika in a separate process. Things work for me for
> >
> > We have integrated Tika 3.0.0 and would like to try out support for readpst
> > that was added to it (TIKA-4250). Main reason is to see if we can get rid
> > of java-libpst which is marked EOL by various scanners and customers are
> > complaining about it.
> >
> > I used the config example to disable OutlookPSTParser and enable
> > LibPstParser in its place in the commit
> > https://github.com/apache/tika/commit/32baf2345abe1a04d767ea6641a567d5c924587e
> >
> > As the new parser is not having any config option to specify path for
> > readpst binary, I added path for it in system environment PATH variable. It
> > is installed via Cygwin on Windows environment, path is like
> > c:\cygwin64\bin. It is working fine, and new LibPstParser parser is getting
> > launched. But when we do the same with Fork Parser, we get an error.
> >
> > [LibPstParser] Couldn't get version of libpst
> > java.io.IOException: Cannot run program "readpst": CreateProcess error=2, The system cannot find the file specified
> >     at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
> >     at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
> >     at org.apache.tika.utils.ProcessUtils.execute(ProcessUtils.java:94)
> >     at org.apache.tika.parser.microsoft.libpst.LibPstParser.check(LibPstParser.java:176)
> >     at org.apache.tika.parser.microsoft.libpst.LibPstParser.initialize(LibPstParser.java:161)
> >
> > Any help would be appreciated.
> >
> > Last but not the least, the file for which I am doing testing is a plain
> > text file, so not sure why the PST parser is getting invoked for it.
> >
> > Regards,
> > Sandeep Kulkarni
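
For anyone hitting the same "Cannot run program "readpst"" error, one way to narrow it down is to check what PATH a given JVM actually sees. The following is only a rough diagnostic sketch using plain JDK calls; it is not part of Tika and is not how LibPstParser performs its check. The class name ReadpstPathCheck, the helper findOnPath, and the ".exe" suffix list are all made up for illustration. It walks each PATH entry and looks for a file named readpst (or readpst.exe).

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical standalone diagnostic; not part of Apache Tika.
class ReadpstPathCheck {

    // Returns the first regular file named program + suffix found in any
    // directory listed in pathValue, or empty if none is found.
    static Optional<Path> findOnPath(String program, String pathValue, List<String> suffixes) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String suffix : suffixes) {
                try {
                    Path candidate = Paths.get(dir, program + suffix);
                    if (Files.isRegularFile(candidate)) {
                        return Optional.of(candidate);
                    }
                } catch (InvalidPathException e) {
                    // Skip PATH entries that are not valid paths on this platform.
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The PATH inherited by this JVM process.
        String path = System.getenv("PATH");
        // Assumption: on Windows/Cygwin the binary is readpst.exe.
        List<String> suffixes = Arrays.asList("", ".exe");
        Optional<Path> hit = findOnPath("readpst", path, suffixes);
        System.out.println("PATH = " + path);
        System.out.println(hit.map(p -> "readpst found at " + p)
                .orElse("readpst NOT found on PATH"));
    }
}
```

Running this once from the parent application's environment and once from a JVM started the same way as the forked process should show whether the two see different PATH values. That would point to an environment problem rather than a parser problem.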