On 23.04.2011, at 19:58, Ralph Castain wrote:

> 
> On Apr 23, 2011, at 11:55 AM, Pablo Lopez Rios wrote:
> 
>>> What about setsid and pushing it into a new
>>> session instead of using & in the script?
>> 
>> :-) That works. Thanks!
>> 
>> NB, the working script looks like:
>> 
>> setsid bash -c "mpirun command >& out" &
>> tail -f out
>> 
> 
> Yes - but now you can't kill mpirun when something goes wrong... <shrug>

Besides killall, you can still send a SIGINT from the command line to the
mpirun process or to its process group.
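One way to keep the setsid approach and still be able to stop mpirun later is to record the PID of the detached job. The sketch below assumes Linux, bash and the util-linux `setsid`; the helper names `start_detached` and `signal_detached` and the PID/output file names are hypothetical. Because the inner shell is the leader of its new session, its PID is also the process group ID, and `exec` hands that PID on to the command.

```shell
#!/bin/bash
# Hypothetical helpers, not part of Open MPI. Assumes Linux, bash and
# the util-linux setsid.

# start_detached PIDFILE OUTFILE CMD [ARGS...]
# Starts CMD in a new session, and therefore in a new process group, with
# stdout and stderr redirected to OUTFILE, so a Ctrl+C typed at the
# terminal is not delivered to it. The inner bash is the session leader:
# it writes its own PID (which equals the process group ID) to PIDFILE and
# then execs CMD, which keeps that PID.
start_detached() {
    local pidfile=$1 outfile=$2 i=0
    shift 2
    rm -f "$pidfile"
    setsid bash -c 'echo "$$" > "$1"; out=$2; shift 2; exec "$@" > "$out" 2>&1' \
        _ "$pidfile" "$outfile" "$@" &
    # Wait up to about 5 seconds for the PID to be recorded.
    while [ "$i" -lt 50 ]; do
        [ -s "$pidfile" ] && return 0
        sleep 0.1
        i=$((i + 1))
    done
    return 1
}

# signal_detached PIDFILE [SIGNAL]
# Sends SIGNAL (default INT) to the whole process group recorded in
# PIDFILE; the negative PID after "--" addresses the group, not one process.
signal_detached() {
    local pgid
    pgid=$(cat "$1")
    kill "-${2:-INT}" -- "-$pgid"
}
```

Applied to the script above, this would be `start_detached job.pid out mpirun command` followed by `tail -f out`. Ctrl+C then stops only `tail`, and later `signal_detached job.pid` (SIGINT) or `signal_detached job.pid TERM`, or simply `kill -INT -- -"$(cat job.pid)"` from any shell, reaches mpirun's process group.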

-- Reuti


>> Thanks,
>> Pablo
>> 
>> 
>> On 23/04/11 18:39, Reuti wrote:
>>> On 23.04.2011, at 19:33, Ralph Castain wrote:
>>> 
>>>> On Apr 23, 2011, at 10:40 AM, Pablo Lopez Rios wrote:
>>>> 
>>>>>> I'm not sure what you are actually trying to accomplish
>>>>> I simply want a script that runs the equivalent of:
>>>>> 
>>>>> mpirun command >& out &
>>>>> tail -f out
>>>>> 
>>>>> such that hitting Ctrl+C stops tail but leaves mpirun running. I can 
>>>>> certainly do this without mpirun,
>>>> I don't think that's true. If both commands are in a script, then at least 
>>>> for me, a ctrl-c of the -script- will cause ctrl-c to be sent to -both- 
>>>> processes.
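The reason both processes see it: Ctrl+C makes the terminal driver send SIGINT to every process in the terminal's foreground process group, and in a script, where job control is off, a command started with `&` stays in the script's process group. This Linux-only sketch (the helper `pgid_of` is hypothetical) compares the process group IDs read from `/proc/<pid>/stat` for a plain background job and for one started under `setsid`.

```shell
#!/bin/bash
# Linux only. pgid_of PID prints the process group ID from /proc/PID/stat.
# The command name (field 2) is in parentheses and may contain spaces, so
# everything up to the last ") " is stripped first; state, ppid and pgrp
# then follow as fields 1 to 3.
pgid_of() {
    sed 's/.*) //' "/proc/$1/stat" | awk '{print $3}'
}

sleep 5 &           # plain background job: stays in the script's group
plain=$!
setsid sleep 5 &    # new session, hence its own process group
detached=$!
sleep 0.5           # give setsid time to detach before looking

script_pg=$(pgid_of $$)
plain_pg=$(pgid_of "$plain")
detached_pg=$(pgid_of "$detached")
echo "script: $script_pg  plain: $plain_pg  setsid: $detached_pg"

kill "$plain" "$detached"
wait 2>/dev/null
```

Only the `setsid` job ends up outside the group that a Ctrl+C aimed at the script would reach.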
>>> What about setsid and pushing it into a new session instead of using & in the
>>> script?
>>> 
>>> -- Reuti
>>> 
>>> 
>>>> At least when I test it, even non-mpirun processes will abort.
>>>> 
>>>>> it's not unreasonable to expect to be able to do the same with mpirun.
>>>> I'm afraid it won't work, per my earlier comments.
>>>> 
>>>>> I need mpirun to either ignore the SIGINT or not receive it at all -- and 
>>>>> as per your comments, ignoring it is not an option.
>>>>> 
>>>>> Let me rephrase my question then. With the following script:
>>>>> 
>>>>> mpirun command >& out &
>>>>> tail -f out
>>>>> 
>>>>> SIGINT stops tail AND mpirun. That's OK. The following:
>>>>> 
>>>>> (
>>>>> trap : SIGINT
>>>>> mpirun command >& out &
>>>>> )
>>>>> tail -f out
>>>>> 
>>>>> has the same effect, indicating that mpirun overrides previous traps in 
>>>>> the same subshell. That's OK too. However the following:
>>>>> 
>>>>> (
>>>>> trap : SIGINT
>>>>> (
>>>>> mpirun command >& out &
>>>>> )
>>>>> )
>>>>> tail -f out
>>>>> 
>>>>> also has the same effect. How is mpirun overriding the trap in the 
>>>>> *parent* subshell so that it ends up getting the SIGINT that was 
>>>>> supposedly blocked at that level? Am I missing something trivial? How can 
>>>>> I avoid this?
>>>> I keep telling you - you can't. The better way to do this is to execute 
>>>> mpirun, and then run tail in a -separate- command. Now you can ctrl-c tail 
>>>> without mpirun seeing it.
>>>> 
>>>> But you are welcome to not believe me and continue thrashing... :-/
>>>> 
>>>>> Thanks,
>>>>> Pablo
>>>>> 
>>>>> 
>>>>> On 23/04/11 16:27, Ralph Castain wrote:
>>>>>> On Apr 23, 2011, at 9:11 AM, Pablo Lopez Rios wrote:
>>>>>> 
>>>>>>>>> Pressing Ctrl+C should stop tail -f, and the MPI job
>>>>>>>>> should continue.
>>>>>>>> I don't think that is true at all. When you hit ctrl-C,
>>>>>>>> every process executing in the script receives it. Mpirun
>>>>>>>> traps the ctrl-c and immediately terminates all running
>>>>>>>> MPI procs.
>>>>>>> By "Ctrl+C should stop tail -f" I mean that this is the
>>>>>>> desired behaviour of the script, not that this is what ought
>>>>>>> to happen in general. My question is how to achieve this
>>>>>>> behaviour, since I'm having trouble working around mpirun
>>>>>>> catching sigint.
>>>>>> Like I said in my other response, you can't - mpirun automatically traps 
>>>>>> sigint and terminates the job in order to ensure proper cleanup during 
>>>>>> abnormal terminations.
>>>>>> 
>>>>>> I'm not sure what you are actually trying to accomplish, but there are 
>>>>>> other signals that don't cause termination. For example, we trap and 
>>>>>> forward SIGUSR1 and SIGUSR2 to your application procs, if that is of use.
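As an illustration of that route, here is a minimal bash sketch with a stand-in process in place of the real job: `notify_job` sends SIGUSR1 to the PID it is given (for a real run, mpirun's PID), and `app_loop` plays the part of an application script that handles SIGUSR1 instead of dying. Both names and the marker file are hypothetical. That mpirun forwards SIGUSR1 to the ranks is taken from the message above and is not exercised here.

```shell
#!/bin/bash
# Hypothetical sketch. app_loop stands in for an application script that
# handles SIGUSR1 itself instead of being terminated by it; here the
# handler just appends a line to MARKER and exits.
app_loop() {
    local marker=$1
    trap 'echo "SIGUSR1 received" >> "$marker"; exit 0' USR1
    while :; do
        sleep 0.1
    done
}

# notify_job PID: send SIGUSR1 to PID. For a real run this would be
# mpirun's PID, relying on mpirun to forward the signal to the ranks.
notify_job() {
    kill -USR1 "$1"
}
```

What the handler does is up to the application; unlike Ctrl+C, the signal carries no "die" meaning for mpirun itself.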
>>>>>> 
>>>>>> But ctrl-c has a special meaning ("die"), and you can't tell mpirun to 
>>>>>> ignore it.
>>>>>> 
>>>>>> 
>>>>>>> Thanks,
>>>>>>> Pablo
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 23/04/11 15:12, Ralph Castain wrote:
>>>>>>>> On Apr 23, 2011, at 6:20 AM, Reuti wrote:
>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> On 23.04.2011, at 04:31, Pablo Lopez Rios wrote:
>>>>>>>>> 
>>>>>>>>>> I'm having a bit of a problem with wrapping mpirun in a script. The 
>>>>>>>>>> script needs to run an MPI job in the background and tail -f the 
>>>>>>>>>> output. Pressing Ctrl+C should stop tail -f, and the MPI job should 
>>>>>>>>>> continue.
>>>>>>>> I don't think that is true at all. When you hit ctrl-C, every process 
>>>>>>>> executing in the script receives it. Mpirun traps the ctrl-c and 
>>>>>>>> immediately terminates all running MPI procs.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>>> However mpirun seems to detect the SIGINT that was meant for tail, 
>>>>>>>>>> and kills the job immediately. I've tried workarounds involving 
>>>>>>>>>> nohup, disown, trap, subshells (including calling the script from 
>>>>>>>>>> within itself), etc, to no avail.
>>>>>>>>>> 
>>>>>>>>>> The problem is that this doesn't happen if I run the command 
>>>>>>>>>> directly instead, without mpirun. Attached is a script that 
>>>>>>>>>> reproduces the problem. It runs a simple counting script in the 
>>>>>>>>>> background which takes 10 seconds to run, and tails the output. If 
>>>>>>>>>> called with "nompi" as the first argument, it will simply run bash -c 
>>>>>>>>>> "$SCRIPT" >& "$out" &, and with "mpi" it will do the same with 
>>>>>>>>>> 'mpirun -np 1' prepended. The output I get is:
>>>>>>>>> What about:
>>>>>>>>> 
>>>>>>>>> ( trap "" sigint; exec mpiexec ...)&
>>>>>>>>> 
>>>>>>>>> i.e. replace the subshell, whose interrupt handling has been changed, with 
>>>>>>>>> mpiexec. Then again, mpiexec may be adjusting the handling on its own. This 
>>>>>>>>> can be checked in /proc/<pid>/status (the SigIgn and SigCgt lines).
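A sketch of that check, assuming Linux: in `/proc/<pid>/status` the `SigIgn` and `SigCgt` lines are hexadecimal bitmasks in which bit N-1 stands for signal N, so SIGINT (signal 2) is bit 1. The helper name `sigint_disposition` is hypothetical.

```shell
#!/bin/bash
# sigint_disposition PID: print "ignored", "caught" or "default" for SIGINT
# in process PID (Linux only). In /proc/PID/status, SigIgn and SigCgt are
# hexadecimal masks where bit N-1 stands for signal N; SIGINT is signal 2,
# so it is bit 1.
sigint_disposition() {
    local status=/proc/$1/status ign cgt
    ign=$(awk '$1 == "SigIgn:" {print $2}' "$status")
    cgt=$(awk '$1 == "SigCgt:" {print $2}' "$status")
    if [ $(( (0x$ign >> 1) & 1 )) -eq 1 ]; then
        echo ignored
    elif [ $(( (0x$cgt >> 1) & 1 )) -eq 1 ]; then
        echo caught
    else
        echo default
    fi
}
```

For example, `( trap "" INT; exec sleep 60 ) &` followed by `sigint_disposition $!` should report `ignored`, since an ignored disposition survives `exec`. Pointed at mpirun's PID instead, `caught` would be consistent with mpirun installing its own SIGINT handler regardless of what the parent shell set up.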
>>>>>>>>> 
>>>>>>>>> -- Reuti
>>>>>>>>> 
>>>>>>>>>> $ ./ompi_bug.sh mpi
>>>>>>>>>> mpi:
>>>>>>>>>> 1
>>>>>>>>>> 2
>>>>>>>>>> 3
>>>>>>>>>> 4
>>>>>>>>>> ^C
>>>>>>>>>> $ ./ompi_bug.sh nompi
>>>>>>>>>> nompi:
>>>>>>>>>> 1
>>>>>>>>>> 2
>>>>>>>>>> 3
>>>>>>>>>> 4
>>>>>>>>>> ^C
>>>>>>>>>> $ cat output.*
>>>>>>>>>> mpi:
>>>>>>>>>> 1
>>>>>>>>>> 2
>>>>>>>>>> 3
>>>>>>>>>> 4
>>>>>>>>>> mpirun: killing job...
>>>>>>>>>> 
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> mpirun noticed that process rank 0 with PID 1222 on node pablomme 
>>>>>>>>>> exited on signal 0 (Unknown signal 0).
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> mpirun: clean termination accomplished
>>>>>>>>>> 
>>>>>>>>>> nompi:
>>>>>>>>>> 1
>>>>>>>>>> 2
>>>>>>>>>> 3
>>>>>>>>>> 4
>>>>>>>>>> 5
>>>>>>>>>> 6
>>>>>>>>>> 7
>>>>>>>>>> 8
>>>>>>>>>> 9
>>>>>>>>>> 10
>>>>>>>>>> Done
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> This convinces me that there is something strange with OpenMPI, 
>>>>>>>>>> since I expect no difference in signal handling when running a 
>>>>>>>>>> simple command with or without mpirun in the middle.
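A likely contributor to the difference, offered as a hypothesis rather than a diagnosis of this particular setup: POSIX specifies that when job control is disabled, as in a non-interactive script, commands started with `&` inherit SIGINT and SIGQUIT as ignored. The plain `bash -c "$SCRIPT"` job therefore keeps ignoring Ctrl+C, whereas mpirun, which according to the replies in this thread installs its own SIGINT handler, replaces that inherited setting. The Linux-only check below (hypothetical helper `bg_sigint_ignored`) lets a background command inspect its own `/proc/self/status`.

```shell
#!/bin/bash
# bg_sigint_ignored: start a command with & from this non-interactive
# script and print "yes" if it began with SIGINT ignored (Linux only).
# awk reads /proc/self/status, i.e. its own status, and the SigIgn mask is
# tested at bit 1 (SIGINT = 2).
bg_sigint_ignored() {
    local tmp mask
    tmp=$(mktemp)
    awk '$1 == "SigIgn:" {print $2}' /proc/self/status > "$tmp" &
    wait "$!"
    mask=$(cat "$tmp")
    rm -f "$tmp"
    if [ $(( (0x$mask >> 1) & 1 )) -eq 1 ]; then echo yes; else echo no; fi
}

echo "background job starts with SIGINT ignored: $(bg_sigint_ignored)"
```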
>>>>>>>>>> 
>>>>>>>>>> I've tried looking for options to change this behaviour, but I can't 
>>>>>>>>>> seem to find any. Is there one, preferably in the form of an 
>>>>>>>>>> environment variable? Or is this a bug?
>>>>>>>>>> 
>>>>>>>>>> I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also 
>>>>>>>>>> v1.2.8 as distributed with OpenSUSE 11.3.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Pablo
>>>>>>>>>> <ompi_bug.sh.gz>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> users mailing list
>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users

