[ https://issues.apache.org/jira/browse/TIKA-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17954583#comment-17954583 ]
Tino Schöllhorn commented on TIKA-4422:
---------------------------------------

After further investigation we noticed that we had run out of disk space for the temporary files. This explains why the worker processes could not be started again. Still, it would help greatly if the error log gave better feedback on why the processes could not be started.

Anyway, this is not a bug in Tika. I am changing the issue to a possible improvement, though it was entirely my fault.

> Availability problem with TikaServer 3.1.0
> ------------------------------------------
>
> Key: TIKA-4422
> URL: https://issues.apache.org/jira/browse/TIKA-4422
> Project: Tika
> Issue Type: Bug
> Components: tika-server
> Affects Versions: 3.1.0
> Environment: Java21
> Ubuntu22
> Reporter: Tino Schöllhorn
> Priority: Major
>
> Hi,
> we have a problem when running the TikaServer. We use Tika 3.1.0 on Ubuntu with Java21.
> Previously, we used Tika 2.4.x - there we could not observe this problem.
> We run a *lot* of text-extraction requests. After a few hours (8-10h) Tika is not able to restart its worker processes.
> Tika runs via systemd, and via journalctl we see the following output:
>
> {noformat}
> May 28 04:39:39 dss-index java[350084]: INFO [pool-2-thread-1] 04:39:39,752 org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 3
> May 28 04:39:40 dss-index java[376963]: May 28, 2025 4:39:40 AM org.apache.cxf.endpoint.ServerImpl initDestination
> May 28 04:39:40 dss-index java[376963]: INFO: Setting the server's publish address to be http://localhost:9998/
> May 28 05:35:32 dss-index java[350084]: INFO [pool-2-thread-1] 05:35:32,896 org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 2
> May 28 05:35:34 dss-index java[377213]: May 28, 2025 5:35:34 AM org.apache.cxf.endpoint.ServerImpl initDestination
> May 28 05:35:34 dss-index java[377213]: INFO: Setting the server's publish address to be http://localhost:9998/
> {noformat}
> After these messages the TikaServer does not respond to requests any more. A restart of the Tika parent process is the only thing that helps.
> The error messages are emitted in TikaServerWatchDog:161. Yet, I do not understand what is going wrong here. Probably the messages are error messages from the OS. perror gives the following output:
> {noformat}
> OS error code 2: No such file or directory
> OS error code 3: No such process
> {noformat}
> Yet, it is unclear to me what happens. Below you'll find the tika.config.
> As far as I understand the situation, this seems to be a bug which was introduced sometime between version 2.4.x and 3.1.0.
> Hope that someone has an idea what is going on and how this can be remedied.
> Tino
> – tika.config.start
> {code:java}
> <?xml version="1.0" encoding="UTF-8"?>
> <properties>
>   <parsers>
>     <parser class="org.apache.tika.parser.DefaultParser">
>     </parser>
>   </parsers>
>   <server>
>     <params>
>       <port>9998</port>
>       <host>localhost</host>
>       <digest>sha256</digest>
>       <digestMarkLimit>1000000</digestMarkLimit>
>       <id></id>
>       <cors>NONE</cors>
>       <logLevel>info</logLevel>
>       <returnStackTrace>false</returnStackTrace>
>       <noFork>false</noFork>
>       <taskTimeoutMillis>300000</taskTimeoutMillis>
>       <maxForkedStartupMillis>120000</maxForkedStartupMillis>
>       <maxRestarts>-1</maxRestarts>
>       <maxFiles>25000</maxFiles>
>       <javaPath>java</javaPath>
>       <forkedJvmArgs>
>         <arg>-Xms4g</arg>
>         <arg>-Xmx4g</arg>
>         <arg>-Dlog4j.configurationFile=tika-forked-log4j2.xml</arg>
>       </forkedJvmArgs>
>       <enableUnsecureFeatures>false</enableUnsecureFeatures>
>       <endpoints>
>         <endpoint>status</endpoint>
>         <endpoint>tika</endpoint>
>         <endpoint>rmeta</endpoint>
>         <endpoint>language</endpoint>
>       </endpoints>
>     </params>
>   </server>
> </properties>
> {code}
> – tika.config.stop
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)