On Apr 23, 2011, at 11:55 AM, Pablo Lopez Rios wrote:

>> What about setsid and pushing it in a new
>> session instead of using & in the script?
>
> :-) That works. Thanks!
>
> NB, the working script looks like:
>
> setsid bash -c "mpirun command >& out" &
> tail -f out
>
Yes - but now you can't kill mpirun when something goes wrong....<shrug>

> Thanks,
> Pablo
>
>
> On 23/04/11 18:39, Reuti wrote:
>> On 23.04.2011 at 19:33, Ralph Castain wrote:
>>
>>> On Apr 23, 2011, at 10:40 AM, Pablo Lopez Rios wrote:
>>>
>>>>> I'm not sure what you are actually trying to accomplish
>>>> I simply want a script that runs the equivalent of:
>>>>
>>>> mpirun command >& out &
>>>> tail -f out
>>>>
>>>> such that hitting Ctrl+C stops tail but leaves mpirun running. I can
>>>> certainly do this without mpirun,
>>> I don't think that's true. If both commands are in a script, then at least
>>> for me, a ctrl-c of the -script- will cause ctrl-c to be sent to -both-
>>> processes.
>> What about setsid and pushing it in a new session instead of using & in the
>> script?
>>
>> -- Reuti
>>
>>
>>> At least when I test it, even non-mpirun processes will abort.
>>>
>>>> it's not unreasonable to expect to be able to do the same with mpirun.
>>> I'm afraid it won't work, per my earlier comments.
>>>
>>>> I need mpirun to either ignore the SIGINT or not receive it at all -- and
>>>> as per your comments, ignoring it is not an option.
>>>>
>>>> Let me rephrase my question then. With the following script:
>>>>
>>>> mpirun command >& out &
>>>> tail -f out
>>>>
>>>> SIGINT stops tail AND mpirun. That's OK. The following:
>>>>
>>>> (
>>>> trap : SIGINT
>>>> mpirun command >& out &
>>>> )
>>>> tail -f out
>>>>
>>>> has the same effect, indicating that mpirun overrides previous traps in the
>>>> same subshell. That's OK too. However the following:
>>>>
>>>> (
>>>> trap : SIGINT
>>>> (
>>>> mpirun command >& out &
>>>> )
>>>> )
>>>> tail -f out
>>>>
>>>> also has the same effect. How is mpirun overriding the trap in the
>>>> *parent* subshell so that it ends up getting the SIGINT that was
>>>> supposedly blocked at that level? Am I missing something trivial? How can
>>>> I avoid this?
>>> I keep telling you - you can't.
>>> The better way to do this is to execute
>>> mpirun, and then run tail in a -separate- command. Now you can ctrl-c tail
>>> without mpirun seeing it.
>>>
>>> But you are welcome to not believe me and continue thrashing... :-/
>>>
>>>> Thanks,
>>>> Pablo
>>>>
>>>>
>>>> On 23/04/11 16:27, Ralph Castain wrote:
>>>>> On Apr 23, 2011, at 9:11 AM, Pablo Lopez Rios wrote:
>>>>>
>>>>>>>> Pressing Ctrl+C should stop tail -f, and the MPI job
>>>>>>>> should continue.
>>>>>>> I don't think that is true at all. When you hit ctrl-C,
>>>>>>> every process executing in the script receives it. Mpirun
>>>>>>> traps the ctrl-c and immediately terminates all running
>>>>>>> MPI procs.
>>>>>> By "Ctrl+C should stop tail -f" I mean that this is the
>>>>>> desired behaviour of the script, not that this is what ought
>>>>>> to happen in general. My question is how to achieve this
>>>>>> behaviour, since I'm having trouble working around mpirun
>>>>>> catching sigint.
>>>>> Like I said in my other response, you can't - mpirun automatically traps
>>>>> sigint and terminates the job in order to ensure proper cleanup during
>>>>> abnormal terminations.
>>>>>
>>>>> I'm not sure what you are actually trying to accomplish, but there are
>>>>> other signals that don't cause termination. For example, we trap and
>>>>> forward SIGUSR1 and SIGUSR2 to your application procs, if that is of use.
>>>>>
>>>>> But ctrl-c has a special meaning ("die"), and you can't tell mpirun to
>>>>> ignore it.
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Pablo
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 23/04/11 15:12, Ralph Castain wrote:
>>>>>>> On Apr 23, 2011, at 6:20 AM, Reuti wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 23.04.2011 at 04:31, Pablo Lopez Rios wrote:
>>>>>>>>
>>>>>>>>> I'm having a bit of a problem with wrapping mpirun in a script. The
>>>>>>>>> script needs to run an MPI job in the background and tail -f the
>>>>>>>>> output. Pressing Ctrl+C should stop tail -f, and the MPI job should
>>>>>>>>> continue.
>>>>>>> I don't think that is true at all. When you hit ctrl-C, every process
>>>>>>> executing in the script receives it. Mpirun traps the ctrl-c and
>>>>>>> immediately terminates all running MPI procs.
>>>>>>>
>>>>>>>
>>>>>>>>> However mpirun seems to detect the SIGINT that was meant for tail,
>>>>>>>>> and kills the job immediately. I've tried workarounds involving
>>>>>>>>> nohup, disown, trap, subshells (including calling the script from
>>>>>>>>> within itself), etc, to no avail.
>>>>>>>>>
>>>>>>>>> The problem is that this doesn't happen if I run the command directly
>>>>>>>>> instead, without mpirun. Attached is a script that reproduces the
>>>>>>>>> problem. It runs a simple counting script in the background which
>>>>>>>>> takes 10 seconds to run, and tails the output. If called with "nompi"
>>>>>>>>> as first argument, it will simply run bash -c "$SCRIPT" >& "$out" &,
>>>>>>>>> and with "mpi" it will do the same with 'mpirun -np 1' prepended. The
>>>>>>>>> output I get is:
>>>>>>>> what about:
>>>>>>>>
>>>>>>>> ( trap "" sigint; exec mpiexec ...)&
>>>>>>>>
>>>>>>>> i.e. replace the subshell with changed interrupt handling with the
>>>>>>>> mpiexec. Well, maybe mpiexec is adjusting it on its own again. This
>>>>>>>> can be checked in /proc/<pid>/status
>>>>>>>>
>>>>>>>> -- Reuti
>>>>>>>>
>>>>>>>>> $ ./ompi_bug.sh mpi
>>>>>>>>> mpi:
>>>>>>>>> 1
>>>>>>>>> 2
>>>>>>>>> 3
>>>>>>>>> 4
>>>>>>>>> ^C
>>>>>>>>> $ ./ompi_bug.sh nompi
>>>>>>>>> nompi:
>>>>>>>>> 1
>>>>>>>>> 2
>>>>>>>>> 3
>>>>>>>>> 4
>>>>>>>>> ^C
>>>>>>>>> $ cat output.*
>>>>>>>>> mpi:
>>>>>>>>> 1
>>>>>>>>> 2
>>>>>>>>> 3
>>>>>>>>> 4
>>>>>>>>> mpirun: killing job...
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> mpirun noticed that process rank 0 with PID 1222 on node pablomme
>>>>>>>>> exited on signal 0 (Unknown signal 0).
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> mpirun: clean termination accomplished
>>>>>>>>>
>>>>>>>>> nompi:
>>>>>>>>> 1
>>>>>>>>> 2
>>>>>>>>> 3
>>>>>>>>> 4
>>>>>>>>> 5
>>>>>>>>> 6
>>>>>>>>> 7
>>>>>>>>> 8
>>>>>>>>> 9
>>>>>>>>> 10
>>>>>>>>> Done
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This convinces me that there is something strange with OpenMPI, since
>>>>>>>>> I expect no difference in signal handling when running a simple
>>>>>>>>> command with or without mpirun in the middle.
>>>>>>>>>
>>>>>>>>> I've tried looking for options to change this behaviour, but I don't
>>>>>>>>> seem to find any. Is there one, preferably in the form of an
>>>>>>>>> environment variable? Or is this a bug?
>>>>>>>>>
>>>>>>>>> I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also
>>>>>>>>> v1.2.8 as distributed with OpenSUSE 11.3.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Pablo
>>>>>>>>> <ompi_bug.sh.gz>
>>>>>>>>> _______________________________________________
>>>>>>>>> users mailing list
>>>>>>>>> us...@open-mpi.org
>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
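The setsid workaround at the top of the thread and the objection to it (mpirun can no longer be killed easily) can be reconciled by recording the PID of the detached process. What follows is a minimal sketch under stated assumptions, not something posted in the thread: it assumes Linux with setsid(1) from util-linux, and the function name run_detached and the file names out and job.pid are hypothetical. It has not been tested against Open MPI itself.

```shell
#!/usr/bin/env bash
# Sketch: start a job in its own session so that Ctrl+C in this terminal does
# not reach it, and record its PID so it can still be stopped on purpose.
# run_detached is a hypothetical helper, not part of Open MPI or util-linux.

run_detached() {
    local out=$1 pidfile=$2 i
    shift 2
    rm -f "$pidfile"

    # setsid(1) runs the command in a new session, outside the terminal's
    # foreground process group, so the terminal's SIGINT is not delivered.
    # The inner shell writes its own PID and then exec's the job, so the
    # recorded PID is the PID of the job itself.
    setsid bash -c 'echo "$$" > "$1"; shift; exec "$@"' _ "$pidfile" "$@" \
        > "$out" 2>&1 < /dev/null &

    # Wait briefly (about 5 seconds at most) until the PID has been written.
    for i in $(seq 1 50); do
        [ -s "$pidfile" ] && return 0
        sleep 0.1
    done
    echo "run_detached: no PID recorded in $pidfile" >&2
    return 1
}

# Example usage (the mpirun command line is a placeholder):
#   run_detached out job.pid mpirun -np 4 ./command
#   tail -f out                   # Ctrl+C stops only tail
#   kill -INT "$(cat job.pid)"    # later: stop the job deliberately
```

Since mpirun is described above as trapping SIGINT and cleaning up the job, sending SIGINT explicitly with kill -INT should trigger that same shutdown path, which is the reason for keeping the PID around rather than detaching the job completely.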