On 23.04.2011 at 19:33, Ralph Castain wrote:

> On Apr 23, 2011, at 10:40 AM, Pablo Lopez Rios wrote:
>
>>> I'm not sure what you are actually trying to accomplish
>>
>> I simply want a script that runs the equivalent of:
>>
>>   mpirun command>& out&
>>   tail -f out
>>
>> such that hitting Ctrl+C stops tail but leaves mpirun running. I can
>> certainly do this without mpirun,
>
> I don't think that's true. If both commands are in a script, then at least
> for me, a ctrl-c of the -script- will cause ctrl-c to be sent to -both-
> processes.

What about using setsid to push it into a new session, instead of using & in the script?

-- Reuti

>
> At least when I test it, even non-mpirun processes will abort.
>
>> it's not unreasonable to expect to be able to do the same with mpirun.
>
> I'm afraid it won't work, per my earlier comments.
>
>> I need mpirun to either ignore the SIGINT or not receive it at all -- and as
>> per your comments, ignoring it is not an option.
>>
>> Let me rephrase my question then. With the following script:
>>
>>   mpirun command>& out&
>>   tail -f out
>>
>> SIGINT stops tail AND mpirun. That's OK. The following:
>>
>>   (
>>     trap : SIGINT
>>     mpirun command>& out&
>>   )
>>   tail -f out
>>
>> has the same effect, indicating that mpirun overrides previous traps in the
>> same subshell. That's OK too. However the following:
>>
>>   (
>>     trap : SIGINT
>>     (
>>       mpirun command>& out&
>>     )
>>   )
>>   tail -f out
>>
>> also has the same effect. How is mpirun overriding the trap in the *parent*
>> subshell so that it ends up getting the SIGINT that was supposedly blocked
>> at that level? Am I missing something trivial? How can I avoid this?
>
> I keep telling you - you can't. The better way to do this is to execute
> mpirun, and then run tail in a -separate- command. Now you can ctrl-c tail
> without mpirun seeing it.
>
> But you are welcome to not believe me and continue thrashing... :-/
>
>>
>> Thanks,
>> Pablo
>>
>>
>> On 23/04/11 16:27, Ralph Castain wrote:
>>> On Apr 23, 2011, at 9:11 AM, Pablo Lopez Rios wrote:
>>>
>>>>>> Pressing Ctrl+C should stop tail -f, and the MPI job
>>>>>> should continue.
>>>>> I don't think that is true at all. When you hit ctrl-C,
>>>>> every process executing in the script receives it. Mpirun
>>>>> traps the ctrl-c and immediately terminates all running
>>>>> MPI procs.
>>>> By "Ctrl+C should stop tail -f" I mean that this is the
>>>> desired behaviour of the script, not that this is what ought
>>>> to happen in general.
>>>> My question is how to achieve this
>>>> behaviour, since I'm having trouble working around mpirun
>>>> catching sigint.
>>> Like I said in my other response, you can't - mpirun automatically traps
>>> sigint and terminates the job in order to ensure proper cleanup during
>>> abnormal terminations.
>>>
>>> I'm not sure what you are actually trying to accomplish, but there are
>>> other signals that don't cause termination. For example, we trap and
>>> forward SIGUSR1 and SIGUSR2 to your application procs, if that is of use.
>>>
>>> But ctrl-c has a special meaning ("die"), and you can't tell mpirun to
>>> ignore it.
>>>
>>>
>>>> Thanks,
>>>> Pablo
>>>>
>>>>
>>>>
>>>> On 23/04/11 15:12, Ralph Castain wrote:
>>>>> On Apr 23, 2011, at 6:20 AM, Reuti wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 23.04.2011 at 04:31, Pablo Lopez Rios wrote:
>>>>>>
>>>>>>> I'm having a bit of a problem with wrapping mpirun in a script. The
>>>>>>> script needs to run an MPI job in the background and tail -f the
>>>>>>> output. Pressing Ctrl+C should stop tail -f, and the MPI job should
>>>>>>> continue.
>>>>> I don't think that is true at all. When you hit ctrl-C, every process
>>>>> executing in the script receives it. Mpirun traps the ctrl-c and
>>>>> immediately terminates all running MPI procs.
>>>>>
>>>>>
>>>>>>> However mpirun seems to detect the SIGINT that was meant for tail, and
>>>>>>> kills the job immediately. I've tried workarounds involving nohup,
>>>>>>> disown, trap, subshells (including calling the script from within
>>>>>>> itself), etc, to no avail.
>>>>>>>
>>>>>>> The problem is that this doesn't happen if I run the command directly
>>>>>>> instead, without mpirun. Attached is a script that reproduces the
>>>>>>> problem. It runs a simple counting script in the background which takes
>>>>>>> 10 seconds to run, and tails the output.
>>>>>>> If called with "nompi" as
>>>>>>> first argument, it will simply run bash -c "$SCRIPT">& "$out"&, and
>>>>>>> with "mpi" it will do the same with 'mpirun -np 1' prepended. The
>>>>>>> output I get is:
>>>>>> what about:
>>>>>>
>>>>>>   ( trap "" sigint; exec mpiexec ...)&
>>>>>>
>>>>>> i.e. replace the subshell with changed interrupt handling with the
>>>>>> mpiexec. Well, maybe mpiexec is adjusting it on its own again. This can
>>>>>> be checked in /proc/<pid>/status
>>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>> $ ./ompi_bug.sh mpi
>>>>>>> mpi:
>>>>>>> 1
>>>>>>> 2
>>>>>>> 3
>>>>>>> 4
>>>>>>> ^C
>>>>>>> $ ./ompi_bug.sh nompi
>>>>>>> nompi:
>>>>>>> 1
>>>>>>> 2
>>>>>>> 3
>>>>>>> 4
>>>>>>> ^C
>>>>>>> $ cat output.*
>>>>>>> mpi:
>>>>>>> 1
>>>>>>> 2
>>>>>>> 3
>>>>>>> 4
>>>>>>> mpirun: killing job...
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>> mpirun noticed that process rank 0 with PID 1222 on node pablomme
>>>>>>> exited on signal 0 (Unknown signal 0).
>>>>>>> --------------------------------------------------------------------------
>>>>>>> mpirun: clean termination accomplished
>>>>>>>
>>>>>>> nompi:
>>>>>>> 1
>>>>>>> 2
>>>>>>> 3
>>>>>>> 4
>>>>>>> 5
>>>>>>> 6
>>>>>>> 7
>>>>>>> 8
>>>>>>> 9
>>>>>>> 10
>>>>>>> Done
>>>>>>>
>>>>>>>
>>>>>>> This convinces me that there is something strange with OpenMPI, since I
>>>>>>> expect no difference in signal handling when running a simple command
>>>>>>> with or without mpirun in the middle.
>>>>>>>
>>>>>>> I've tried looking for options to change this behaviour, but I don't
>>>>>>> seem to find any. Is there one, preferably in the form of an
>>>>>>> environment variable? Or is this a bug?
>>>>>>>
>>>>>>> I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also
>>>>>>> v1.2.8 as distributed with OpenSUSE 11.3.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Pablo
>>>>>>> <ompi_bug.sh.gz>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
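
For reference, the setsid idea at the top of this message could be sketched roughly as below. This is only a sketch under assumptions: run_detached is a hypothetical helper name (not part of Open MPI), it relies on setsid(1) from util-linux being installed, and nothing in this thread confirms that mpirun itself behaves well when started this way. The reasoning is that Ctrl+C at the terminal delivers SIGINT to the foreground process group, so a command started in its own session should not be in the group that receives it.

```shell
#!/bin/bash
# run_detached OUTFILE CMD [ARGS...]
# Hypothetical helper: start CMD in a new session via setsid(1), so that it
# is no longer part of the script's process group or controlling terminal.
# stdout and stderr go to OUTFILE; stdin is taken from /dev/null.
run_detached() {
    local out=$1
    shift
    # setsid may fork internally, so $! is not guaranteed to be CMD's PID.
    setsid "$@" > "$out" 2>&1 < /dev/null &
}

# Intended use in the wrapper script (assumption, untested with mpirun):
#   run_detached out mpirun -np 1 command
#   tail -f out
```

With this layout, Ctrl+C should end tail -f only, and the detached command would then have to be stopped explicitly (for example with kill) if needed.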