Re: [OMPI users] OpenMPI exits when subsequent tail -f in script is interrupted

Pablo Lopez Rios Sat, 23 Apr 2011 11:07:14 -0400

 what about:
 ( trap "" sigint; exec mpiexec ...)&


Yup, that's included in the workarounds I tried. Tried again with your specific 
suggestion; no luck.

 Well, maybe mpiexec is adjusting it on its own
 again. This can be checked in /proc/<pid>/status


The signal masks in /proc/$!/status are:

nompi (bash):
SigBlk: 0000000000010000 ->  16 blocked
SigIgn: 0000000000000006 ->  1,2 ignored
SigCgt: 0000000000010000 ->  16 caught

mpi (mpirun):
SigBlk: 0000000000000000 ->  none blocked
SigIgn: 0000000000000004 ->  2 ignored
SigCgt: 0000000180015ee2 ->  1,5,6,7,9,10,11,12,14,16,31,32 caught

I think I'm off by one in interpreting the above masks (for instance I would expect signals 30 and 
31 to be caught, not 31 and 32), but I'm already assuming that the least significant bit is 
"signal 0"; assuming it is "signal 1" would just worsen the values.
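
A minimal sketch for decoding these masks, in case it helps with the off-by-one
question. It assumes that Linux, as far as I know, uses the least significant
bit for signal 1 (bit n-1 for signal n); decode_sigmask is a hypothetical
helper name, not part of any tool. Under that assumption the bash job ignores
signals 2 and 3 (SIGINT and SIGQUIT on Linux), while mpirun ignores only 3 and
catches 2, which would be consistent with mpirun installing its own SIGINT
handler.

```shell
#!/usr/bin/env bash
# decode_sigmask HEXMASK (hypothetical helper): print the signal numbers whose
# bits are set in a mask copied from /proc/<pid>/status.
# Assumption: bit 0 (least significant) is signal 1, so bit n-1 is signal n.
decode_sigmask() {
    local mask=$(( 16#$1 ))   # parse the hex string as a 64-bit integer
    local sig out=""
    for (( sig = 1; sig <= 64; sig++ )); do
        # Test the bit that belongs to signal number $sig.
        if (( (mask >> (sig - 1)) & 1 )); then
            out+="${out:+,}${sig}"
        fi
    done
    echo "${out:-none}"
}

decode_sigmask 0000000000000006   # bash SigIgn   -> 2,3
decode_sigmask 0000000000000004   # mpirun SigIgn -> 3
decode_sigmask 0000000180015ee2   # mpirun SigCgt -> 2,6,7,8,10,11,12,13,15,17,32,33
```

The signal names for these numbers can be listed with kill -l.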

Anyway, why does mpirun bypass the traps I try to set and how do I stop it 
doing so?

Thanks,
Pablo

On 23/04/11 13:20, Reuti wrote:

Hi,

On 23.04.2011 at 04:31, Pablo Lopez Rios wrote:

I'm having a bit of a problem with wrapping mpirun in a script. The script 
needs to run an MPI job in the background and tail -f the output. Pressing 
Ctrl+C should stop tail -f, and the MPI job should continue. However, mpirun 
seems to detect the SIGINT that was meant for tail, and kills the job 
immediately. I've tried workarounds involving nohup, disown, trap, subshells 
(including calling the script from within itself), etc., to no avail.

This doesn't happen if I run the command directly, without mpirun. Attached is 
a script that reproduces the problem. It runs a simple counting script in the 
background, which takes 10 seconds to finish, and tails the output. If called 
with "nompi" as its first argument, it simply runs 
bash -c "$SCRIPT" >& "$out" &, and with "mpi" it does the same with 
'mpirun -np 1' prepended. The output I get is:

what about:

( trap "" sigint; exec mpiexec ...)&

i.e. change the interrupt handling in the subshell and then replace the 
subshell with mpiexec via exec. Well, maybe mpiexec is adjusting it on its own 
again. This can be checked in /proc/<pid>/status
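
A quick way to check this on Linux, as a sketch only: sleep stands in for
mpiexec here (an assumption, so that the check runs without MPI), and
show_sig_state is a hypothetical helper name. An ignored SIGINT is inherited
across exec, so it should show up as the value 0x2 in SigIgn (assuming the
least significant bit is signal 1). A program that installs its own handler
after starting would show the signal in SigCgt instead.

```shell
#!/usr/bin/env bash
# show_sig_state PID (hypothetical helper): print the blocked, ignored and
# caught signal masks of a running process. Needs Linux /proc.
show_sig_state() {
    grep -E '^Sig(Blk|Ign|Cgt):' "/proc/$1/status"
}

# Ignore SIGINT in a subshell, then replace the subshell with the command.
# 'sleep 5' is a placeholder for the real mpiexec command line.
( trap "" INT; exec sleep 5 ) &
job=$!

sleep 0.2               # crude wait so that the exec has happened
show_sig_state "$job"   # SigIgn should have the 0x2 bit set
kill "$job"             # clean up the placeholder process
```

Running the same check against the real mpiexec process would show whether the
ignore setting survives or is replaced by a handler.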

-- Reuti

$ ./ompi_bug.sh mpi
mpi:
1
2
3
4
^C
$ ./ompi_bug.sh nompi
nompi:
1
2
3
4
^C
$ cat output.*
mpi:
1
2
3
4
mpirun: killing job...

--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 1222 on node pablomme exited on 
signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
mpirun: clean termination accomplished

nompi:
1
2
3
4
5
6
7
8
9
10
Done


This convinces me that there is something strange with OpenMPI, since I expect 
no difference in signal handling when running a simple command with or without 
mpirun in the middle.

I've tried looking for options to change this behaviour, but I can't seem to 
find any. Is there one, preferably in the form of an environment variable? Or 
is this a bug?

I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also v1.2.8 as 
distributed with OpenSUSE 11.3.

Thanks,
Pablo
<ompi_bug.sh.gz>
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users