On 23.04.2011 at 19:58, Ralph Castain wrote:

>
> On Apr 23, 2011, at 11:55 AM, Pablo Lopez Rios wrote:
>
>>> What about setsid and pushing it in a new
>>> session instead of using & in the script?
>>
>> :-) That works. Thanks!
>>
>> NB, the working script looks like:
>>
>> setsid bash -c "mpirun command >& out" &
>> tail -f out
>>
>
> Yes - but now you can't kill mpirun when something goes wrong....<shrug>
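A minimal sketch of the setsid approach quoted above, assuming Linux (it reads /proc) and using a counting loop as a stand-in for "mpirun command". The helper name session_of and the temporary files are illustrative, not part of the original script. It shows that the job ends up in a different session from the script, and that it can still be stopped by signalling its process group. The sketch sends SIGTERM because a plain background command in a non-interactive shell starts with SIGINT ignored; mpirun installs its own SIGINT handler, so for a real mpirun job SIGINT would be the signal to send.

```shell
#!/bin/bash
# Sketch only: a counting loop stands in for "mpirun command" (assumption).
set +m   # job control off, as in any non-interactive script
out=$(mktemp)
pidfile=$(mktemp)

# Print the session ID of a process: the 4th field after the "(comm)"
# entry in /proc/<pid>/stat (Linux-specific).
session_of() {
    sed 's/.*) //' "/proc/$1/stat" | awk '{print $4}'
}

# Start the job in a new session. It records its own PID so that it
# can be signalled later.
setsid bash -c 'echo $$ > "$1"
for i in 1 2 3 4 5 6 7 8 9 10; do echo $i; sleep 1; done
echo Done' _ "$pidfile" > "$out" 2>&1 &

# Wait until the job has written its PID.
while [ ! -s "$pidfile" ]; do sleep 0.1; done
job_pid=$(cat "$pidfile")

own_sid=$(session_of $$)
job_sid=$(session_of "$job_pid")
echo "script session: $own_sid  job session: $job_sid"

# In the real script "tail -f out" would run here; Ctrl+C would reach
# the script and tail, but not the job in the other session.

# The job leads its own process group, so a negative PID signals the
# whole group (the shell and anything it started).
kill -TERM "-$job_pid"
wait
rm -f "$pidfile"
```

The two session IDs printed should differ, and the output file should stop growing once the group has been signalled.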
You can still send a SIGINT from the command line to the mpirun process or its process group, besides using killall.

-- Reuti

>> Thanks,
>> Pablo
>>
>>
>> On 23/04/11 18:39, Reuti wrote:
>>> On 23.04.2011 at 19:33, Ralph Castain wrote:
>>>
>>>> On Apr 23, 2011, at 10:40 AM, Pablo Lopez Rios wrote:
>>>>
>>>>>> I'm not sure what you are actually trying to accomplish
>>>>> I simply want a script that runs the equivalent of:
>>>>>
>>>>> mpirun command >& out &
>>>>> tail -f out
>>>>>
>>>>> such that hitting Ctrl+C stops tail but leaves mpirun running. I can
>>>>> certainly do this without mpirun,
>>>> I don't think that's true. If both commands are in a script, then at least
>>>> for me, a ctrl-c of the -script- will cause ctrl-c to be sent to -both-
>>>> processes.
>>> What about setsid and pushing it in a new session instead of using & in the
>>> script?
>>>
>>> -- Reuti
>>>
>>>
>>>> At least when I test it, even non-mpirun processes will abort.
>>>>
>>>>> it's not unreasonable to expect to be able to do the same with mpirun.
>>>> I'm afraid it won't work, per my earlier comments.
>>>>
>>>>> I need mpirun to either ignore the SIGINT or not receive it at all -- and
>>>>> as per your comments, ignoring it is not an option.
>>>>>
>>>>> Let me rephrase my question then. With the following script:
>>>>>
>>>>> mpirun command >& out &
>>>>> tail -f out
>>>>>
>>>>> SIGINT stops tail AND mpirun. That's OK. The following:
>>>>>
>>>>> (
>>>>> trap : SIGINT
>>>>> mpirun command >& out &
>>>>> )
>>>>> tail -f out
>>>>>
>>>>> has the same effect, indicating that mpirun overrides previous traps in
>>>>> the same subshell. That's OK too. However the following:
>>>>>
>>>>> (
>>>>> trap : SIGINT
>>>>> (
>>>>> mpirun command >& out &
>>>>> )
>>>>> )
>>>>> tail -f out
>>>>>
>>>>> also has the same effect. How is mpirun overriding the trap in the
>>>>> *parent* subshell so that it ends up getting the SIGINT that was
>>>>> supposedly blocked at that level? Am I missing something trivial?
>>>>> How can I avoid this?
>>>> I keep telling you - you can't. The better way to do this is to execute
>>>> mpirun, and then run tail in a -separate- command. Now you can ctrl-c tail
>>>> without mpirun seeing it.
>>>>
>>>> But you are welcome to not believe me and continue thrashing... :-/
>>>>
>>>>> Thanks,
>>>>> Pablo
>>>>>
>>>>>
>>>>> On 23/04/11 16:27, Ralph Castain wrote:
>>>>>> On Apr 23, 2011, at 9:11 AM, Pablo Lopez Rios wrote:
>>>>>>
>>>>>>>>> Pressing Ctrl+C should stop tail -f, and the MPI job
>>>>>>>>> should continue.
>>>>>>>> I don't think that is true at all. When you hit ctrl-C,
>>>>>>>> every process executing in the script receives it. Mpirun
>>>>>>>> traps the ctrl-c and immediately terminates all running
>>>>>>>> MPI procs.
>>>>>>> By "Ctrl+C should stop tail -f" I mean that this is the
>>>>>>> desired behaviour of the script, not that this is what ought
>>>>>>> to happen in general. My question is how to achieve this
>>>>>>> behaviour, since I'm having trouble working around mpirun
>>>>>>> catching sigint.
>>>>>> Like I said in my other response, you can't - mpirun automatically traps
>>>>>> sigint and terminates the job in order to ensure proper cleanup during
>>>>>> abnormal terminations.
>>>>>>
>>>>>> I'm not sure what you are actually trying to accomplish, but there are
>>>>>> other signals that don't cause termination. For example, we trap and
>>>>>> forward SIGUSR1 and SIGUSR2 to your application procs, if that is of use.
>>>>>>
>>>>>> But ctrl-c has a special meaning ("die"), and you can't tell mpirun to
>>>>>> ignore it.
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> Pablo
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 23/04/11 15:12, Ralph Castain wrote:
>>>>>>>> On Apr 23, 2011, at 6:20 AM, Reuti wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 23.04.2011 at 04:31, Pablo Lopez Rios wrote:
>>>>>>>>>
>>>>>>>>>> I'm having a bit of a problem with wrapping mpirun in a script.
>>>>>>>>>> The script needs to run an MPI job in the background and tail -f the
>>>>>>>>>> output. Pressing Ctrl+C should stop tail -f, and the MPI job should
>>>>>>>>>> continue.
>>>>>>>> I don't think that is true at all. When you hit ctrl-C, every process
>>>>>>>> executing in the script receives it. Mpirun traps the ctrl-c and
>>>>>>>> immediately terminates all running MPI procs.
>>>>>>>>
>>>>>>>>
>>>>>>>>>> However mpirun seems to detect the SIGINT that was meant for tail,
>>>>>>>>>> and kills the job immediately. I've tried workarounds involving
>>>>>>>>>> nohup, disown, trap, subshells (including calling the script from
>>>>>>>>>> within itself), etc, to no avail.
>>>>>>>>>>
>>>>>>>>>> The problem is that this doesn't happen if I run the command
>>>>>>>>>> directly instead, without mpirun. Attached is a script that
>>>>>>>>>> reproduces the problem. It runs a simple counting script in the
>>>>>>>>>> background which takes 10 seconds to run, and tails the output. If
>>>>>>>>>> called with "nompi" as first argument, it will simply run bash -c
>>>>>>>>>> "$SCRIPT" >& "$out" &, and with "mpi" it will do the same with
>>>>>>>>>> 'mpirun -np 1' prepended. The output I get is:
>>>>>>>>> what about:
>>>>>>>>>
>>>>>>>>> ( trap "" sigint; exec mpiexec ...) &
>>>>>>>>>
>>>>>>>>> i.e. replace the subshell with changed interrupt handling with the
>>>>>>>>> mpiexec. Well, maybe mpiexec is adjusting it on its own again. This
>>>>>>>>> can be checked in /proc/<pid>/status
>>>>>>>>>
>>>>>>>>> -- Reuti
>>>>>>>>>
>>>>>>>>>> $ ./ompi_bug.sh mpi
>>>>>>>>>> mpi:
>>>>>>>>>> 1
>>>>>>>>>> 2
>>>>>>>>>> 3
>>>>>>>>>> 4
>>>>>>>>>> ^C
>>>>>>>>>> $ ./ompi_bug.sh nompi
>>>>>>>>>> nompi:
>>>>>>>>>> 1
>>>>>>>>>> 2
>>>>>>>>>> 3
>>>>>>>>>> 4
>>>>>>>>>> ^C
>>>>>>>>>> $ cat output.*
>>>>>>>>>> mpi:
>>>>>>>>>> 1
>>>>>>>>>> 2
>>>>>>>>>> 3
>>>>>>>>>> 4
>>>>>>>>>> mpirun: killing job...
>>>>>>>>>>
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> mpirun noticed that process rank 0 with PID 1222 on node pablomme
>>>>>>>>>> exited on signal 0 (Unknown signal 0).
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> mpirun: clean termination accomplished
>>>>>>>>>>
>>>>>>>>>> nompi:
>>>>>>>>>> 1
>>>>>>>>>> 2
>>>>>>>>>> 3
>>>>>>>>>> 4
>>>>>>>>>> 5
>>>>>>>>>> 6
>>>>>>>>>> 7
>>>>>>>>>> 8
>>>>>>>>>> 9
>>>>>>>>>> 10
>>>>>>>>>> Done
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This convinces me that there is something strange with OpenMPI,
>>>>>>>>>> since I expect no difference in signal handling when running a
>>>>>>>>>> simple command with or without mpirun in the middle.
>>>>>>>>>>
>>>>>>>>>> I've tried looking for options to change this behaviour, but I don't
>>>>>>>>>> seem to find any. Is there one, preferably in the form of an
>>>>>>>>>> environment variable? Or is this a bug?
>>>>>>>>>>
>>>>>>>>>> I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also
>>>>>>>>>> v1.2.8 as distributed with OpenSUSE 11.3.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Pablo
>>>>>>>>>> <ompi_bug.sh.gz>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> users mailing list
>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
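
Reuti's suggestion of checking signal handling in /proc/<pid>/status can be sketched as below, assuming Linux and using "sleep" as a stand-in for the real job. The function name sig_state is hypothetical. It reads the SigIgn and SigCgt masks, where bit N-1 corresponds to signal number N, so SIGINT (signal 2) is mask 0x2. A likely explanation for the mpi/nompi difference reported in the thread is that a shell without job control starts background commands with SIGINT ignored: a plain command inherits that setting, while a program that installs its own handler, as Ralph says mpirun does, replaces it and would show up as "caught".

```shell
#!/bin/bash
# Sketch only: "sleep" stands in for the real job; /proc is Linux-specific.
set +m   # job control off, as in any non-interactive script

# Report how process $1 treats signal number $2, based on the SigIgn and
# SigCgt masks in /proc/<pid>/status (bit N-1 <-> signal N).
sig_state() {
    ign=$(awk '/^SigIgn:/ {print $2}' "/proc/$1/status")
    cgt=$(awk '/^SigCgt:/ {print $2}' "/proc/$1/status")
    bit=$(( 1 << ($2 - 1) ))
    if [ $(( 0x$ign & bit )) -ne 0 ]; then
        echo ignored
    elif [ $(( 0x$cgt & bit )) -ne 0 ]; then
        echo caught
    else
        echo default
    fi
}

sleep 5 &                          # background command, no job control
job_pid=$!
sleep 0.2                          # give the child time to exec
state=$(sig_state "$job_pid" 2)    # 2 is SIGINT
echo "background job: SIGINT is $state"
kill "$job_pid"
```

Running the same check against the PID of a live mpirun would show whether it has replaced the inherited setting, which is what Reuti proposed verifying.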