Currently, mpirun takes that second SIGINT to mean "you seem to be stuck trying to cleanly abort - just die", so mpirun exits immediately without doing any cleanup. The individual procs all terminate themselves when they see their daemons go away, which is why you don't get zombies left behind...but it does mean that the vader shared memory files are left in /dev/shm.
The second SIGINT has to come within a 5 second window of the first one to trigger that immediate exit, so one solution would be for you to delay passing the SIGINT to mpirun by more than 5 seconds. Alternatively, you could just not pass the signal at all, since mpirun has already received it - is there some reason why you need to pass the signal down? Are you trying to do your cleanup _before_ mpirun does its own? Guess I'm wondering: why not just trap the signal, do your cleanup, and then wait for mpirun to terminate?

On Apr 6, 2020, at 6:39 AM, Kreutzer, Moritz via users <users@lists.open-mpi.org> wrote:

Hi,

We are invoking mpirun from within a script which installs some signal handlers. Now, if we abort an Open MPI run with CTRL+C, the system sends SIGINT to the entire process group. Hence, the mpirun process receives a SIGINT from the system with si_code=SI_KERNEL. Additionally, our own signal handler intercepts SIGINT, does some cleanup, and sends the SIGINT on to the mpirun process with si_code=SI_USER. Consequently, mpirun receives SIGINT twice.

This leads to unclean termination with Open MPI 4.0.3. While it does not leave behind any zombie processes, killing it in the described way leads to leftover vader shared memory segment files in /dev/shm (a known issue with Open MPI 3, but supposedly resolved in Open MPI 4). Also, strace shows that the mpirun process does not receive any SIGCHLD.

If we remove our own signal handler (which is not our preferred option), mpirun receives only a single SIGINT and n times SIGCHLD (n is the number of processes). This also leads to correct cleanup of the vader shared memory segment files.

Is it expected that the cleanup fails when mpirun receives multiple signals at the same time? If yes, is the only way to guarantee proper cleanup to always make sure that only a single signal gets propagated to mpirun?
Thanks,
Moritz

--
Moritz Kreutzer
Siemens Digital Industries Software
Simulation and Test Solutions, Product Development, High Performance Computing
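
For illustration, the suggestion in the reply above (trap the signal, do your own cleanup, then wait for mpirun to terminate, without passing SIGINT down) might look roughly like the sketch below. It is only a sketch under assumptions: the thread does not say what language the wrapper script is written in, so Python is used here, and `run_with_cleanup` and the `cleanup` callback are hypothetical names, not part of Open MPI. It relies on the behaviour described in the original message, namely that CTRL+C already delivers SIGINT to the whole process group, including mpirun.

```python
import signal
import subprocess
import sys


def run_with_cleanup(cmd, cleanup):
    """Run cmd, call cleanup() once on SIGINT, and wait for cmd to exit.

    The handler deliberately does NOT forward SIGINT to the child:
    with CTRL+C the child is assumed to receive it already, because
    it shares the wrapper's process group.
    """
    interrupted = False

    def on_sigint(signum, frame):
        nonlocal interrupted
        if not interrupted:  # ignore repeated SIGINTs for our own cleanup
            interrupted = True
            cleanup()

    previous = signal.signal(signal.SIGINT, on_sigint)
    try:
        proc = subprocess.Popen(cmd)
        # wait() resumes after the handler has run, so the wrapper stays
        # alive until the child has finished its own shutdown.
        return proc.wait()
    finally:
        signal.signal(signal.SIGINT, previous)


# Hypothetical usage (the application name and cleanup function are
# placeholders):
#   rc = run_with_cleanup(["mpirun", "-np", "4", "./my_app"], my_cleanup)
#   sys.exit(rc)
```

If the wrapper can also be interrupted by a SIGINT sent to it alone (for example with kill rather than CTRL+C), mpirun would not see the signal at all; in that case the handler would presumably need to forward it exactly once, which this sketch does not attempt.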