On Apr 23, 2011, at 9:07 AM, Pablo Lopez Rios wrote:

>> what about:
>> ( trap "" sigint; exec mpiexec ...)&
>
> Yup, that's included in the workarounds I tried. Tried again with your
> specific suggestion; no luck.
>
>> Well, maybe mpiexec is adjusting it on its own
>> again. This can be checked in /proc/<pid>/status
>
> The signal masks in /proc/$!/status are:
>
> nompi (bash):
> SigBlk: 0000000000010000 -> 16 blocked
> SigIgn: 0000000000000006 -> 1,2 ignored
> SigCgt: 0000000000010000 -> 16 caught
>
> mpi (mpirun):
> SigBlk: 0000000000000000 -> none blocked
> SigIgn: 0000000000000004 -> 2 ignored
> SigCgt: 0000000180015ee2 -> 1,5,6,7,9,10,11,12,14,16,31,32 caught
>
> I think I'm off by one in interpreting the above masks (for instance I would
> expect signals 30 and 31 to be caught, not 31 and 32), but I'm already
> assuming that the least significant bit is "signal 0"; assuming it is "signal
> 1" would just worsen the values.
>
> Anyway, why does mpirun bypass the traps I try to set and how do I stop it
> doing so?
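
A note on the off-by-one question in the quoted masks: on Linux, the hex masks in /proc/<pid>/status put signal 1 in the least significant bit, so signal n corresponds to bit n-1. A minimal bash sketch for decoding them under that reading follows; decode_sigmask is a hypothetical helper name, not part of Open MPI or any tool mentioned in this thread.

```shell
#!/usr/bin/env bash
# decode_sigmask (hypothetical helper): print the signal numbers set in a
# hex mask as shown in the SigBlk/SigIgn/SigCgt lines of /proc/<pid>/status.
# Assumption: bit 0 (least significant) is signal 1, so signal = bit + 1.
decode_sigmask() {
    local mask=$(( 16#$1 ))   # parse the hex string as a 64-bit integer
    local bit
    local out=()
    # Walk a fixed 64 bits rather than looping until the mask is zero, so a
    # mask with the top bit set (negative in bash arithmetic) still ends.
    for (( bit = 0; bit < 64; bit++ )); do
        if (( (mask >> bit) & 1 )); then
            out+=( "$(( bit + 1 ))" )
        fi
    done
    local IFS=,
    echo "${out[*]}"          # comma-separated list, empty if no bits set
}

# Example calls with the masks quoted above:
#   decode_sigmask 0000000000000006
#   decode_sigmask 0000000180015ee2
```

Under that reading, 0000000000000006 decodes to 2,3 and 0000000180015ee2 to 2,6,7,8,10,11,12,13,15,17,32,33. On Linux x86, signal 2 is SIGINT and signal 3 is SIGQUIT, so the nompi process has SIGINT ignored, whereas mpirun ignores only SIGQUIT and has a handler installed for SIGINT. That fits the behaviour reported in this thread.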

You can't - this is a design requirement for clean termination of MPI jobs when the user interrupts execution.

>
> Thanks,
> Pablo
>
> On 23/04/11 13:20, Reuti wrote:
>> Hi,
>>
>> On 23.04.2011 at 04:31, Pablo Lopez Rios wrote:
>>
>>> I'm having a bit of a problem with wrapping mpirun in a script. The script
>>> needs to run an MPI job in the background and tail -f the output. Pressing
>>> Ctrl+C should stop tail -f, and the MPI job should continue. However mpirun
>>> seems to detect the SIGINT that was meant for tail, and kills the job
>>> immediately. I've tried workarounds involving nohup, disown, trap,
>>> subshells (including calling the script from within itself), etc, to no
>>> avail.
>>>
>>> The problem is that this doesn't happen if I run the command directly
>>> instead, without mpirun. Attached is a script that reproduces the problem.
>>> It runs a simple counting script in the background which takes 10 seconds
>>> to run, and tails the output. If called with "nompi" as first argument, it
>>> will simply run bash -c "$SCRIPT">& "$out"&, and with "mpi" it will do the
>>> same with 'mpirun -np 1' prepended. The output I get is:
>> what about:
>>
>> ( trap "" sigint; exec mpiexec ...)&
>>
>> i.e. replace the subshell with changed interrupt handling with the mpiexec.
>> Well, maybe mpiexec is adjusting it on its own again. This can be checked in
>> /proc/<pid>/status
>>
>> -- Reuti
>>
>>> $ ./ompi_bug.sh mpi
>>> mpi:
>>> 1
>>> 2
>>> 3
>>> 4
>>> ^C
>>> $ ./ompi_bug.sh nompi
>>> nompi:
>>> 1
>>> 2
>>> 3
>>> 4
>>> ^C
>>> $ cat output.*
>>> mpi:
>>> 1
>>> 2
>>> 3
>>> 4
>>> mpirun: killing job...
>>>
>>> --------------------------------------------------------------------------
>>> mpirun noticed that process rank 0 with PID 1222 on node pablomme exited on
>>> signal 0 (Unknown signal 0).
>>> --------------------------------------------------------------------------
>>> mpirun: clean termination accomplished
>>>
>>> nompi:
>>> 1
>>> 2
>>> 3
>>> 4
>>> 5
>>> 6
>>> 7
>>> 8
>>> 9
>>> 10
>>> Done
>>>
>>>
>>> This convinces me that there is something strange with OpenMPI, since I
>>> expect no difference in signal handling when running a simple command with
>>> or without mpirun in the middle.
>>>
>>> I've tried looking for options to change this behaviour, but I don't seem
>>> to find any. Is there one, preferably in the form of an environment
>>> variable? Or is this a bug?
>>>
>>> I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also v1.2.8
>>> as distributed with OpenSUSE 11.3.
>>>
>>> Thanks,
>>> Pablo
>>> <ompi_bug.sh.gz>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users