On Apr 23, 2011, at 10:40 AM, Pablo Lopez Rios wrote:

>> I'm not sure what you are actually trying to accomplish
>
> I simply want a script that runs the equivalent of:
>
> mpirun command>& out&
> tail -f out
>
> such that hitting Ctrl+C stops tail but leaves mpirun running. I can
> certainly do this without mpirun,

I don't think that's true. If both commands are in a script, then at least for
me, a ctrl-c of the -script- will cause ctrl-c to be sent to -both- processes.
At least when I test it, even non-mpirun processes will abort.

> it's not unreasonable to expect to be able to do the same with mpirun.

I'm afraid it won't work, per my earlier comments.

> I need mpirun to either ignore the SIGINT or not receive it at all -- and as
> per your comments, ignoring it is not an option.
>
> Let me rephrase my question then. With the following script:
>
> mpirun command>& out&
> tail -f out
>
> SIGINT stops tail AND mpirun. That's OK. The following:
>
> (
>   trap : SIGINT
>   mpirun command>& out&
> )
> tail -f out
>
> has the same effect, indicating that mpirun overrides previous traps in the
> same subshell. That's OK too. However the following:
>
> (
>   trap : SIGINT
>   (
>     mpirun command>& out&
>   )
> )
> tail -f out
>
> also has the same effect. How is mpirun overriding the trap in the *parent*
> subshell so that it ends up getting the SIGINT that was supposedly blocked at
> that level? Am I missing something trivial? How can I avoid this?

I keep telling you - you can't. The better way to do this is to execute mpirun,
and then run tail in a -separate- command. Now you can ctrl-c tail without
mpirun seeing it.

But you are welcome to not believe me and continue thrashing... :-/

>
> Thanks,
> Pablo
>
>
> On 23/04/11 16:27, Ralph Castain wrote:
>> On Apr 23, 2011, at 9:11 AM, Pablo Lopez Rios wrote:
>>
>>>>> Pressing Ctrl+C should stop tail -f, and the MPI job
>>>>> should continue.
>>>> I don't think that is true at all. When you hit ctrl-C,
>>>> every process executing in the script receives it. Mpirun
>>>> traps the ctrl-c and immediately terminates all running
>>>> MPI procs.
>>> By "Ctrl+C should stop tail -f" I mean that this is the
>>> desired behaviour of the script, not that this is what ought
>>> to happen in general.
>>> My question is how to achieve this
>>> behaviour, since I'm having trouble working around mpirun
>>> catching sigint.
>> Like I said in my other response, you can't - mpirun automatically traps
>> sigint and terminates the job in order to ensure proper cleanup during
>> abnormal terminations.
>>
>> I'm not sure what you are actually trying to accomplish, but there are other
>> signals that don't cause termination. For example, we trap and forward
>> SIGUSR1 and SIGUSR2 to your application procs, if that is of use.
>>
>> But ctrl-c has a special meaning ("die"), and you can't tell mpirun to
>> ignore it.
>>
>>
>>> Thanks,
>>> Pablo
>>>
>>>
>>>
>>> On 23/04/11 15:12, Ralph Castain wrote:
>>>> On Apr 23, 2011, at 6:20 AM, Reuti wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On 23.04.2011, at 04:31, Pablo Lopez Rios wrote:
>>>>>
>>>>>> I'm having a bit of a problem with wrapping mpirun in a script. The
>>>>>> script needs to run an MPI job in the background and tail -f the output.
>>>>>> Pressing Ctrl+C should stop tail -f, and the MPI job should continue.
>>>> I don't think that is true at all. When you hit ctrl-C, every process
>>>> executing in the script receives it. Mpirun traps the ctrl-c and
>>>> immediately terminates all running MPI procs.
>>>>
>>>>
>>>>>> However mpirun seems to detect the SIGINT that was meant for tail, and
>>>>>> kills the job immediately. I've tried workarounds involving nohup,
>>>>>> disown, trap, subshells (including calling the script from within
>>>>>> itself), etc, to no avail.
>>>>>>
>>>>>> The problem is that this doesn't happen if I run the command directly
>>>>>> instead, without mpirun. Attached is a script that reproduces the
>>>>>> problem. It runs a simple counting script in the background which takes
>>>>>> 10 seconds to run, and tails the output. If called with "nompi" as first
>>>>>> argument, it will simply run bash -c "$SCRIPT">& "$out"&, and with
>>>>>> "mpi" it will do the same with 'mpirun -np 1' prepended.
>>>>>> The output I get is:
>>>>> what about:
>>>>>
>>>>> ( trap "" sigint; exec mpiexec ...)&
>>>>>
>>>>> i.e. replace the subshell with changed interrupt handling with the
>>>>> mpiexec. Well, maybe mpiexec is adjusting it on its own again. This can
>>>>> be checked in /proc/<pid>/status
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>> $ ./ompi_bug.sh mpi
>>>>>> mpi:
>>>>>> 1
>>>>>> 2
>>>>>> 3
>>>>>> 4
>>>>>> ^C
>>>>>> $ ./ompi_bug.sh nompi
>>>>>> nompi:
>>>>>> 1
>>>>>> 2
>>>>>> 3
>>>>>> 4
>>>>>> ^C
>>>>>> $ cat output.*
>>>>>> mpi:
>>>>>> 1
>>>>>> 2
>>>>>> 3
>>>>>> 4
>>>>>> mpirun: killing job...
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun noticed that process rank 0 with PID 1222 on node pablomme exited
>>>>>> on signal 0 (Unknown signal 0).
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun: clean termination accomplished
>>>>>>
>>>>>> nompi:
>>>>>> 1
>>>>>> 2
>>>>>> 3
>>>>>> 4
>>>>>> 5
>>>>>> 6
>>>>>> 7
>>>>>> 8
>>>>>> 9
>>>>>> 10
>>>>>> Done
>>>>>>
>>>>>>
>>>>>> This convinces me that there is something strange with OpenMPI, since I
>>>>>> expect no difference in signal handling when running a simple command
>>>>>> with or without mpirun in the middle.
>>>>>>
>>>>>> I've tried looking for options to change this behaviour, but I don't
>>>>>> seem to find any. Is there one, preferably in the form of an environment
>>>>>> variable? Or is this a bug?
>>>>>>
>>>>>> I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also
>>>>>> v1.2.8 as distributed with OpenSUSE 11.3.
>>>>>>
>>>>>> Thanks,
>>>>>> Pablo
>>>>>> <ompi_bug.sh.gz>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
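
Reuti's suggestion in the thread, checking the signal masks in /proc/<pid>/status to see whether the launched process still ignores SIGINT, could be sketched roughly as follows. This is only a minimal sketch, assuming Linux and bash. The helper name sigint_disposition is hypothetical, and sleep stands in for mpiexec so that the example runs without Open MPI installed; nothing here is taken from the attached ompi_bug.sh.

```shell
#!/bin/bash
# Sketch: report how a process currently treats SIGINT by reading the
# SigIgn and SigCgt masks in /proc/<pid>/status (Linux only).
# sigint_disposition is a hypothetical helper, not part of Open MPI.
sigint_disposition() {
    local pid=$1 key value ign=0 cgt=0
    [[ -r /proc/$pid/status ]] || return 1
    while read -r key value; do
        case $key in
            SigIgn:) ign=$value ;;
            SigCgt:) cgt=$value ;;
        esac
    done < "/proc/$pid/status"
    # SIGINT is signal 2, so it is bit 1 (value 2) of each hex mask. That
    # bit lives in the last hex digit, which is all we need to inspect.
    if (( 16#${ign: -1} & 2 )); then
        echo ignored
    elif (( 16#${cgt: -1} & 2 )); then
        echo caught
    else
        echo default
    fi
}

# Stand-in for Reuti's '( trap "" sigint; exec mpiexec ...)&' one-liner:
# sleep replaces the MPI job (an assumption made for illustration).
( trap "" INT; exec sleep 5 ) &
pid=$!
sleep 1   # give the subshell time to exec
echo "PID $pid: SIGINT is $(sigint_disposition "$pid")"
kill "$pid"
wait "$pid" 2>/dev/null || true
```

Pointing the same helper at the PID of a real mpiexec started this way would show whether it keeps the inherited "ignored" setting or installs its own handler ("caught"), which is the possibility Reuti raises and which matches Ralph's statement that mpirun traps sigint. That case is not exercised here.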