On Apr 23, 2011, at 6:20 AM, Reuti wrote:

> Hi,
> 
> On 23.04.2011 at 04:31, Pablo Lopez Rios wrote:
> 
>> I'm having a bit of a problem with wrapping mpirun in a script. The script 
>> needs to run an MPI job in the background and tail -f the output. Pressing 
>> Ctrl+C should stop tail -f, and the MPI job should continue.

I don't think that is true at all. When you hit Ctrl-C, the terminal sends SIGINT 
to every process started by the script, since they all share its process group. 
mpirun traps the SIGINT and immediately terminates all running MPI processes.
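The "nompi" case most likely survives because of a rule documented in the bash manual: when job control is not in effect (as in a script), asynchronous commands start with SIGINT and SIGQUIT ignored. A program that installs its own SIGINT handler replaces that inherited "ignore", which is what Pablo's results suggest mpirun does (not checked against the Open MPI source). A minimal Linux-only sketch, since it reads /proc/self/status, showing the inherited disposition; the script name used below is hypothetical:

```shell
#!/bin/bash
# Linux-only sketch: reads /proc/self/status. Run it as a script, e.g.
# `bash sigint_demo.sh`, so job control is off, as in Pablo's wrapper.

# awk reports whether *it* ignores SIGINT. SigIgn is a hex bitmask of ignored
# signals; its last hex digit covers signals 1-4, and SIGINT (signal 2) is
# bit value 0x2, so it is ignored iff that digit is one of 2 3 6 7 a b e f.
check='/^SigIgn:/ {
  d = tolower(substr($2, length($2), 1))
  print label ": SIGINT " (d ~ /[2367abef]/ ? "ignored" : "not ignored")
}'

awk -v label=foreground "$check" /proc/self/status    # ordinary command
awk -v label=background "$check" /proc/self/status &  # asynchronous command
wait
```

Unless the script itself was started with SIGINT already ignored, this should print "foreground: SIGINT not ignored" followed by "background: SIGINT ignored".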


>> However mpirun seems to detect the SIGINT that was meant for tail, and kills 
>> the job immediately. I've tried workarounds involving nohup, disown, trap, 
>> subshells (including calling the script from within itself), etc, to no 
>> avail.
>> 
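nohup only makes a command ignore SIGHUP, and disown only removes a job from the shell's job table, so neither changes which processes receive SIGINT from the terminal. An alternative not tried in this thread is to start the job in a new session with setsid(1) from util-linux: the job then leaves the terminal's foreground process group, so Ctrl+C never reaches it, whatever handler it installs. A minimal sketch, assuming Linux with util-linux setsid and GNU coreutils tail (for `--pid`); the counting loop and the file name `output.setsid` are hypothetical stand-ins, and in a real wrapper the job would be prefixed with `mpirun -np 1`. I have not verified how mpirun itself behaves when started without a controlling terminal.

```shell
#!/bin/bash
# Sketch: keep a background job out of reach of Ctrl+C while tailing it.
# Assumes Linux, util-linux setsid(1) and GNU tail (for --pid).
# The loop below is a hypothetical stand-in for the real (MPI) job.
SCRIPT='for i in $(seq 5); do echo $i; sleep 1; done; echo Done'
out=output.setsid

# Create the file first so tail -f can open it before the job writes to it.
: > "$out"

# setsid starts the job in a new session and process group, so the SIGINT
# that Ctrl+C sends to the terminal's foreground process group skips it.
setsid bash -c "$SCRIPT" >> "$out" 2>&1 < /dev/null &
job=$!

# Ctrl+C stops tail (and this script) but not the job; without Ctrl+C,
# --pid makes tail exit by itself once the job has finished.
tail -f --pid="$job" "$out"
```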
>> The problem is that this doesn't happen if I run the command directly 
>> instead, without mpirun. Attached is a script that reproduces the problem. 
>> It runs a simple counting script in the background which takes 10 seconds to 
>> run, and tails the output. If called with "nompi" as first argument, it will 
>> simply run bash -c "$SCRIPT" >& "$out" &, and with "mpi" it will do the same 
>> with 'mpirun -np 1' prepended. The output I get is:
> 
> what about:
> 
> ( trap "" sigint; exec mpiexec ...) &
> 
> i.e. replace the subshell, whose interrupt handling you changed, with mpiexec 
> itself. Then again, mpiexec may reset the handler on its own; you can check 
> this in the SigIgn and SigCgt fields of /proc/<pid>/status.
> 
> -- Reuti
> 
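Reuti's suggestion can be tried along these lines. In this minimal Linux-only sketch, `sleep 30` is a stand-in for `mpiexec ...` (an assumption so it runs without MPI). The subshell sets SIGINT to ignored and then execs the job, so `$!` is the job's own PID, and the SigIgn and SigCgt fields of /proc/<pid>/status show whether the job kept the ignored disposition or installed its own handler.

```shell
#!/bin/bash
# Sketch of Reuti's suggestion, Linux-only. `sleep 30` is a stand-in for
# `mpiexec ...` so this runs without MPI installed.

# Ignore SIGINT in a subshell, then replace the subshell with the job via
# exec: an ignored signal stays ignored across exec, and $! is the job's PID.
( trap '' INT; exec sleep 30 ) &
pid=$!

# Wait (up to about 5 s) until the exec has happened, i.e. the process name
# has changed to that of the job.
for _ in $(seq 50); do
  [ "$(cat "/proc/$pid/comm" 2>/dev/null)" = sleep ] && break
  sleep 0.1
done

# SigIgn lists ignored signals, SigCgt signals with an installed handler,
# both as hex bitmasks. SIGINT is bit value 0x2: for sleep it should appear
# in SigIgn; a program that shows it in SigCgt instead has installed its
# own handler, overriding the "ignore" it inherited.
grep -E '^Sig(Ign|Cgt):' "/proc/$pid/status"

# Stop the stand-in job and reap it quietly.
kill "$pid"
wait "$pid" 2>/dev/null
```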
>> 
>> $ ./ompi_bug.sh mpi
>> mpi:
>> 1
>> 2
>> 3
>> 4
>> ^C
>> $ ./ompi_bug.sh nompi
>> nompi:
>> 1
>> 2
>> 3
>> 4
>> ^C
>> $ cat output.*
>> mpi:
>> 1
>> 2
>> 3
>> 4
>> mpirun: killing job...
>> 
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 1222 on node pablomme exited on 
>> signal 0 (Unknown signal 0).
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>> 
>> nompi:
>> 1
>> 2
>> 3
>> 4
>> 5
>> 6
>> 7
>> 8
>> 9
>> 10
>> Done
>> 
>> 
>> This convinces me that there is something strange with OpenMPI, since I 
>> expect no difference in signal handling when running a simple command with 
>> or without mpirun in the middle.
>> 
>> I've tried looking for options to change this behaviour, but I can't seem to 
>> find any. Is there one, preferably in the form of an environment variable? 
>> Or is this a bug?
>> 
>> I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also v1.2.8 
>> as distributed with OpenSUSE 11.3.
>> 
>> Thanks,
>> Pablo
>> <ompi_bug.sh.gz>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users