On Sat, Jul 6, 2019 at 12:13 AM José María Mateos <ch...@rinzewind.org> wrote:
>
> Hi,
>
> This is a minimal proof of concept for something that has been bugging
> me for a few days:
>
> ```
> $ cat signal_multiprocessing_poc.py
>
> import random
> import multiprocessing
> import signal
> import time
>
> def signal_handler(signum, frame):
>     raise Exception(f"Unexpected signal {signum}!")
>
> def process_message(args):
>     time.sleep(random.random() / 100.)
>
> if __name__ == "__main__":
>     signal.signal(signal.SIGTERM, signal_handler)
>     n_round = 1
>     while n_round < 10:
>         job_list = [x for x in range(random.randint(100, 400))]
>         print(f"Running round {n_round} with {len(job_list)} jobs")
>         with multiprocessing.Pool(8) as p1:
>             p1.map(process_message, job_list)
>         n_round += 1
> ```
>
> So basically I have some subprocesses that don't do anything, just
> sleep for a few milliseconds, and I capture SIGTERM signals. I don't
> expect
>
> This is the output:
>
> ```
> $ python signal_multiprocessing_poc.py
> Running round 1 with 244 jobs
> Running round 2 with 151 jobs
> Running round 3 with 173 jobs
> Running round 4 with 124 jobs
> Running round 5 with 249 jobs
> Running round 6 with 359 jobs
> Process ForkPoolWorker-48:
> Traceback (most recent call last):
>   File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
>     self.run()
>   File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/process.py", line 99, in run
>     self._target(*self._args, **self._kwargs)
>   File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/pool.py", line 110, in worker
>     task = get()
>   File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/queues.py", line 352, in get
>     res = self._reader.recv_bytes()
>   File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/synchronize.py", line 98, in __exit__
>     return self._semlock.__exit__(*args)
>   File "signal_multiprocessing_poc.py", line 7, in signal_handler
>     raise Exception(f"Unexpected signal {signum}!")
> Exception: Unexpected signal 15!
> Running round 7 with 185 jobs
> Running round 8 with 246 jobs
> Running round 9 with 217 jobs
> Process ForkPoolWorker-68:
> Traceback (most recent call last):
>   File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
>     self.run()
>   File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/process.py", line 99, in run
>     self._target(*self._args, **self._kwargs)
>   File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/pool.py", line 110, in worker
>     task = get()
>   File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/queues.py", line 352, in get
>     res = self._reader.recv_bytes()
>   File "/home/j_mariamateos/miniconda3/lib/python3.7/multiprocessing/synchronize.py", line 98, in __exit__
>     return self._semlock.__exit__(*args)
>   File "signal_multiprocessing_poc.py", line 7, in signal_handler
>     raise Exception(f"Unexpected signal {signum}!")
> Exception: Unexpected signal 15!
> ```
>
> Can anyone help me understand why the random SIGTERM gets captured? I
> guess some of the children processes receive the signal for termination
> and that's why I see it, but I don't understand the randomness of it.
>
> Of course, as a coworker just pointed out, this works (telling the
> children processes to ignore the signal handler):
>
> ```
> import random
> import multiprocessing
> import signal
> import time
>
> def signal_handler(signum, frame):
>     raise Exception(f"Unexpected signal {signum}!")
>
> def init_worker():
>     signal.signal(signal.SIGTERM, signal.SIG_IGN)
>     signal.signal(signal.SIGINT, signal.SIG_IGN)
>
> def process_message(args):
>     time.sleep(random.random() / 100.)
>
> if __name__ == "__main__":
>     signal.signal(signal.SIGTERM, signal_handler)
>     signal.signal(signal.SIGINT, signal_handler)
>     n_round = 1
>     while n_round < 20:
>         job_list = [x for x in range(random.randint(100, 400))]
>         print(f"Running round {n_round} with {len(job_list)} jobs")
>         with multiprocessing.Pool(8, init_worker) as p1:
>             p1.map(process_message, job_list)
>         n_round += 1
> ```
>
In the interests of test consistency, I removed all the random number
generation from the test, locking in some numbers instead:

import multiprocessing
import signal
import time

def signal_handler(signum, frame):
    raise Exception(f"Unexpected signal {signum}!")

def process_message(args):
    time.sleep(0.005)

if __name__ == "__main__":
    signal.signal(signal.SIGTERM, signal_handler)
    for n_round in range(1, 10):
        job_list = range(250)
        print(f"Running round {n_round} with {len(job_list)} jobs")
        with multiprocessing.Pool(8) as p1:
            p1.map(process_message, job_list)

The SIGTERMs are still happening, and it seems to be somewhat
unpredictable. I can have a couple of runs of the program with none,
then two come up in the same run. Once, I actually got a chained
exception:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "mpterm.py", line 16, in <module>
    with multiprocessing.Pool(8) as p1:
  File "/usr/local/lib/python3.9/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 218, in __init__
    p.join()
  File "/usr/local/lib/python3.9/multiprocessing/process.py", line 147, in join
    assert self._parent_pid == os.getpid(), 'can only join a child process'
AssertionError: can only join a child process

(Yes, that says python3.9 there, but I don't think this is
version-dependent; the same thing happens on 3.8, 3.7, and 3.6. On 3.5,
of course, f-strings aren't available.)

But if I comment out the signal.signal line, there seem to be no ill
effects.

I suspect that what you're seeing here is the multiprocessing module
managing its own subprocesses, telling some of them to shut down. I
added a print call to multiprocessing/popen_fork.py inside _send_signal
(line 53 or thereabouts, depending on Python version) and saw a *lot*
of termination signals being sent; only a few actually triggered the
exception message.
My guess is that most of the time, the SIGTERM is handled smoothly as
part of the Pool's __exit__ method, but sometimes a child process is
blocked on something and has to be told to shut down. Normally that
signal gets caught and handled just fine, but since you're explicitly
hooking it, you get to see it.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list