>> Pressing Ctrl+C should stop tail -f, and the MPI job
>> should continue.
> I don't think that is true at all. When you hit ctrl-C,
> every process executing in the script receives it. Mpirun
> traps the ctrl-c and immediately terminates all running
> MPI procs.
By "Ctrl+C should stop tail -f" I mean that this is the
desired behaviour of the script, not that this is what ought
to happen in general. My question is how to achieve this
behaviour, since I'm having trouble working around mpirun
catching SIGINT.
Thanks,
Pablo
On 23/04/11 15:12, Ralph Castain wrote:
On Apr 23, 2011, at 6:20 AM, Reuti wrote:
Hi,
On 23.04.2011, at 04:31, Pablo Lopez Rios wrote:
I'm having a bit of a problem with wrapping mpirun in a script. The script
needs to run an MPI job in the background and tail -f the output. Pressing
Ctrl+C should stop tail -f, and the MPI job should continue.
I don't think that is true at all. When you hit ctrl-C, every process executing
in the script receives it. Mpirun traps the ctrl-c and immediately terminates
all running MPI procs.
However, mpirun seems to detect the SIGINT that was meant for tail, and kills
the job immediately. I've tried workarounds involving nohup, disown, trap,
subshells (including calling the script from within itself), etc., to no avail.
The odd thing is that this doesn't happen if I run the command directly, without mpirun. Attached is a script that
reproduces the problem. It runs a simple counting script in the background, which takes 10 seconds to run, and tails the output. If
called with "nompi" as the first argument, it simply runs bash -c "$SCRIPT" >& "$out" &,
and with "mpi" it does the same with 'mpirun -np 1' prepended. The output I get is:
What about:
( trap "" sigint; exec mpiexec ...)&
i.e. replace the subshell, whose interrupt handling has been changed, with mpiexec.
Then again, mpiexec may adjust the handling on its own again. You can check this in
/proc/<pid>/status
-- Reuti
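Reuti's suggestion can be sketched as a small wrapper plus a check of the resulting SIGINT disposition. This is a minimal sketch assuming bash on Linux, where /proc/<pid>/status reports the ignored and caught signals as hexadecimal bitmasks in the SigIgn and SigCgt fields. The function names start_ignoring_sigint and sigint_disposition are hypothetical. SIGINT is signal 2, so it corresponds to bit value 0x2 in both masks. If the mpirun process reports "caught" rather than "ignored", it has installed its own handler over the inherited setting, which is the possibility Reuti raises.

```shell
#!/bin/bash
# Hypothetical helpers illustrating the "( trap "" sigint; exec ... )&" idea.
# Assumes bash and a Linux /proc filesystem.

# Start "$@" in the background with SIGINT ignored. The subshell sets the
# disposition to "ignore", then exec replaces the subshell with the command,
# which keeps ignored signals across exec. Output goes to the file in $1.
start_ignoring_sigint() {
  local out=$1; shift
  ( trap '' INT; exec "$@" ) >"$out" 2>&1 &
  bg_pid=$!   # PID of the backgrounded command, left global for the caller
}

# Print how process $1 currently handles SIGINT: ignored, caught or default.
# Returns 1 if the process cannot be inspected.
sigint_disposition() {
  local status="/proc/$1/status" ign cgt
  [ -r "$status" ] || return 1
  ign=$(awk '/^SigIgn:/ {print $2}' "$status")
  cgt=$(awk '/^SigCgt:/ {print $2}' "$status")
  [ -n "$ign" ] && [ -n "$cgt" ] || return 1
  # SIGINT is signal 2, i.e. bit value 0x2 in each hexadecimal mask.
  if (( 16#$ign & 2 )); then
    echo ignored
  elif (( 16#$cgt & 2 )); then
    echo caught
  else
    echo default
  fi
}
```

In the wrapper script this would be used roughly as `start_ignoring_sigint "$out" mpirun -np 1 ...`, then `sigint_disposition "$bg_pid"` to see whether mpirun kept the ignored setting, and finally `tail -f "$out"` in the foreground.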
$ ./ompi_bug.sh mpi
mpi:
1
2
3
4
^C
$ ./ompi_bug.sh nompi
nompi:
1
2
3
4
^C
$ cat output.*
mpi:
1
2
3
4
mpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 1222 on node pablomme exited on
signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
mpirun: clean termination accomplished
nompi:
1
2
3
4
5
6
7
8
9
10
Done
This convinces me that there is something strange with OpenMPI, since I expect
no difference in signal handling when running a simple command with or without
mpirun in the middle.
I've tried looking for options to change this behaviour, but I can't seem to
find any. Is there one, preferably in the form of an environment variable? Or
is this a bug?
I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also v1.2.8 as
distributed with OpenSUSE 11.3.
Thanks,
Pablo
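For readers without the attachment, here is a hypothetical reconstruction of a script like ompi_bug.sh, written only from the description above. It is not the original attachment. The counting loop, the output.mpi/output.nompi file names (inferred from `cat output.*`) and the function names start_counter and main are all assumptions.

```shell
#!/bin/bash
# Hypothetical reconstruction of the reproduction script described above;
# NOT the original ompi_bug.sh attachment.
# Usage: <script> mpi|nompi

# Start a 10-second counting script in the background, writing to file $2.
# With "mpi" the same command is run under "mpirun -np 1".
start_counter() {
  local mode=$1 out=$2
  # Child script: print the mode, count to 10 at one line per second, then "Done".
  local script="echo '$mode:'; for i in \$(seq 1 10); do echo \$i; sleep 1; done; echo Done"
  case $mode in
    nompi) bash -c "$script" >& "$out" & ;;
    mpi)   mpirun -np 1 bash -c "$script" >& "$out" & ;;
    *)     echo "usage: $0 mpi|nompi" >&2; return 1 ;;
  esac
}

main() {
  local mode=$1 out="output.$1"
  start_counter "$mode" "$out" || exit 1
  tail -f "$out"   # Ctrl+C is meant to stop only this tail
}

# Run only when given an argument, so the functions can also be sourced.
if [ $# -gt 0 ]; then
  main "$@"
fi
```

Saved under any name (for example repro.sh), it would be run as `./repro.sh nompi` or `./repro.sh mpi`, interrupted with Ctrl+C after a few counts, and the results compared with `cat output.*`.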
<ompi_bug.sh.gz>
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users