Hello,

When calling solr stop on Linux, this command is used:
*CHECK_PID=`ps auxww | awk '{print $2}' | grep -w $SOLR_PID | sort -r | tr -d ' '`*
https://github.com/apache/solr/blob/122c88a0748769432ef62cc3fb94c2226dd67aa7/solr/bin/solr#L871

If Solr has stopped but remains as a zombie process, its entry stays in the
process table, so *ps auxww* continues to show the PID even after kill -9.
The result is something like this, with 3 minutes wasted waiting for a dead
process to exit:

*[2021-07-21T09:15:12.365Z] Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds to allow Jetty process 12622 to stop gracefully.*
*[2021-07-21T09:18:13.551Z]  [|] Solr process 12622 is still running; jstacking it now.*
*[2021-07-21T09:18:21.806Z] 12622: Unable to open socket file /proc/12622/root/tmp/.java_pid12622: target process 12622 doesn't respond within 10500ms or HotSpot VM not loaded*
*[2021-07-21T09:18:21.806Z] Solr process 12622 is still running; forcefully killing it now.*
*[2021-07-21T09:18:21.806Z] Killed process 12622*
*[2021-07-21T09:18:31.678Z] ERROR: Failed to kill previous Solr Java process 12622 ... script fails.*

But the output of ps auxww does identify zombie processes under STAT:
*USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND*
*root     12622  1.4  0.0      0     0 pts/1    Z    10:42   0:26 [java] <defunct>*

So the CHECK_PID command could filter out zombies.
Obviously the bigger issue is why the process ended up as a zombie. In this
case it was because of
https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
and not specifying "--init" when running Solr inside a Docker container. So
a message warning that the process is a zombie may be worth having, so that
the user has an opportunity to do something about it.
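
A rough sketch of what such a check might look like, assuming a procps-style
ps that accepts "-o stat= -p PID" (other ps implementations may differ). The
function names is_zombie_stat and live_pid are made up for illustration and
are not in bin/solr; this is not the actual patch.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds when a STAT value as printed by ps marks a
# zombie, i.e. the state code starts with "Z" (for example "Z" or "Z+").
is_zombie_stat() {
  case "$1" in
    Z*) return 0 ;;
    *)  return 1 ;;
  esac
}

# Hypothetical stand-in for the CHECK_PID lookup: prints the PID only when
# the process exists and is not a zombie, and warns on stderr when it is one.
live_pid() {
  local pid="$1" stat
  # "-o stat=" prints only the state column, without a header line.
  stat=$(ps -o stat= -p "$pid" 2>/dev/null | tr -d ' ')
  if [ -z "$stat" ]; then
    return 0   # no such process (or ps failed): print nothing
  fi
  if is_zombie_stat "$stat"; then
    echo "WARNING: process $pid is a zombie (defunct) and has not been reaped by its parent." >&2
    return 0
  fi
  echo "$pid"
}

# Possible use in the stop logic, with SOLR_PID as in the existing script:
# CHECK_PID=$(live_pid "$SOLR_PID")
```

With something like this, CHECK_PID would come back empty for a defunct
process, so the script would stop waiting, and the warning would still tell
the user that a zombie was left behind.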

I guess I will raise a JIRA issue with a patch to do that, unless there are
alternative suggestions?

Regards,
Colvin
