We've encountered zombies that are waiting for a thread to exit, where
that thread is looping in ep_poll() almost endlessly even though a
SIGKILL is pending as a result of a group exit.

This happens because we always find ep_events_available() to be true and
go back to fetch more events, so we are never able to reach the check for
signal_pending() that would break from the loop and return -EINTR.

Special case fatal signals and break immediately to guarantee that we do
not loop to fetch more events and delay making a timely exit.

It would also be possible to simply move the check for signal_pending()
above the check for ep_events_available(), but the only reported case of
delayed signal handling that this would fix is SIGKILL preventing zombies
from exiting.
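
For illustration only, the three orderings discussed above can be compared
in a small userspace model. This is a sketch, not kernel code: struct
model, enum ordering, model_ep_poll() and MODEL_GAVE_UP are hypothetical
names, and the model assumes that events look available while nothing is
actually delivered, which is what sends the loop back to fetch again.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of this is kernel API. */
struct model {
	bool events_available;	/* models ep_events_available(ep) */
	bool fatal_pending;	/* models fatal_signal_pending(current) */
	bool signal_pending;	/* models signal_pending(current) */
	int delivered;		/* models the number of events sent to userspace */
};

enum ordering {
	ORDER_ORIGINAL,		/* events checked before signals */
	ORDER_PATCHED,		/* fatal signals checked first (this patch) */
	ORDER_SIGNAL_FIRST,	/* any signal checked first (the alternative) */
};

#define MODEL_GAVE_UP (-1000)	/* hypothetical sentinel: iteration cap hit */

/*
 * Returns -EINTR if a signal is noticed, the number of delivered events,
 * 0 if the task would simply sleep until timeout, or MODEL_GAVE_UP if the
 * loop was still refetching after max_iters passes.
 */
static int model_ep_poll(const struct model *m, enum ordering ord,
			 int max_iters)
{
	for (int i = 0; i < max_iters; i++) {
		if (ord == ORDER_PATCHED && m->fatal_pending)
			return -EINTR;
		if (ord == ORDER_SIGNAL_FIRST && m->signal_pending)
			return -EINTR;

		if (!m->events_available) {
			/* Only now is the ordinary signal check reached. */
			if (m->signal_pending)
				return -EINTR;
			return 0;	/* would sleep; modelled as a timeout */
		}

		/* Events looked available: try to deliver them. */
		if (m->delivered > 0)
			return m->delivered;

		/* Nothing delivered: go around and fetch again. */
	}
	return MODEL_GAVE_UP;
}
```

With events always available, nothing delivered and SIGKILL pending, the
original ordering hits the iteration cap while the other two return
-EINTR. With only a non-fatal signal pending, just the signal-first
ordering returns -EINTR, which is the case with no reports so far.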

Signed-off-by: David Rientjes <rient...@google.com>
---
 fs/eventpoll.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1748,6 +1748,16 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
                         * to TASK_INTERRUPTIBLE before doing the checks.
                         */
                        set_current_state(TASK_INTERRUPTIBLE);
+                       /*
+                        * Always short-circuit for fatal signals to allow
+                        * threads to make a timely exit without the chance of
+                        * finding more events available and fetching
+                        * repeatedly.
+                        */
+                       if (fatal_signal_pending(current)) {
+                               res = -EINTR;
+                               break;
+                       }
                        if (ep_events_available(ep) || timed_out)
                                break;
                        if (signal_pending(current)) {
