On Apr 23, 2011, at 10:40 AM, Pablo Lopez Rios wrote:

>> I'm not sure what you are actually trying to accomplish
> 
> I simply want a script that runs the equivalent of:
> 
> mpirun command >& out &
> tail -f out
> 
> such that hitting Ctrl+C stops tail but leaves mpirun running. I can 
> certainly do this without mpirun,

I don't think that's true. If both commands are in a script, then at least for 
me, a ctrl-c of the -script- will cause ctrl-c to be sent to -both- processes.

At least when I test it, even non-mpirun processes will abort.

> it's not unreasonable to expect to be able to do the same with mpirun.

I'm afraid it won't work, per my earlier comments.

> I need mpirun to either ignore the SIGINT or not receive it at all -- and as 
> per your comments, ignoring it is not an option.
> 
> Let me rephrase my question then. With the following script:
> 
> mpirun command >& out &
> tail -f out
> 
> SIGINT stops tail AND mpirun. That's OK. The following:
> 
> (
> trap : SIGINT
> mpirun command >& out &
> )
> tail -f out
> 
> has the same effect, indicating that mpirun overrides previous traps in the 
> same subshell. That's OK too. However the following:
> 
> (
> trap : SIGINT
> (
>  mpirun command >& out &
> )
> )
> tail -f out
> 
> also has the same effect. How is mpirun overriding the trap in the *parent* 
> subshell so that it ends up getting the SIGINT that was supposedly blocked at 
> that level? Am I missing something trivial? How can I avoid this?

I keep telling you - you can't. The better way to do this is to execute mpirun, 
and then run tail in a -separate- command. Now you can ctrl-c tail without 
mpirun seeing it.
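
A minimal sketch of the two-command approach, assuming an interactive bash prompt with job control, where a background job gets its own process group and a later Ctrl+C reaches only the foreground tail. The helper name start_logged and the variable JOB_PID are hypothetical, and `> out 2>&1` is the portable spelling of the `>& out` redirection used in the thread:

```shell
#!/bin/sh
# start_logged OUTFILE COMMAND [ARGS...]
# Hypothetical helper: run COMMAND in the background with stdout and
# stderr both written to OUTFILE, and remember its PID in JOB_PID.
start_logged() {
    outfile=$1
    shift
    "$@" > "$outfile" 2>&1 &
    JOB_PID=$!
}

# Intended use at an interactive prompt (not executed here):
#   start_logged out mpirun -np 1 command
#   tail -f out        # typed as its own, separate command
```

If both steps are placed in one non-interactive script instead, the job still shares the script's process group, which is the situation discussed above.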

But you are welcome to not believe me and continue thrashing... :-/

> 
> Thanks,
> Pablo
> 
> 
> On 23/04/11 16:27, Ralph Castain wrote:
>> On Apr 23, 2011, at 9:11 AM, Pablo Lopez Rios wrote:
>> 
>>>>>  Pressing Ctrl+C should stop tail -f, and the MPI job
>>>>>  should continue.
>>>> I don't think that is true at all. When you hit ctrl-C,
>>>> every process executing in the script receives it. Mpirun
>>>> traps the ctrl-c and immediately terminates all running
>>>> MPI procs.
>>> By "Ctrl+C should stop tail -f" I mean that this is the
>>> desired behaviour of the script, not that this is what ought
>>> to happen in general. My question is how to achieve this
>>> behaviour, since I'm having trouble working around mpirun
>>> catching sigint.
>> Like I said in my other response, you can't - mpirun automatically traps 
>> sigint and terminates the job in order to ensure proper cleanup during 
>> abnormal terminations.
>> 
>> I'm not sure what you are actually trying to accomplish, but there are other 
>> signals that don't cause termination. For example, we trap and forward 
>> SIGUSR1 and SIGUSR2 to your application procs, if that is of use.
>> 
>> But ctrl-c has a special meaning ("die"), and you can't tell mpirun to 
>> ignore it.
>> 
>> 
>>> Thanks,
>>> Pablo
>>> 
>>> 
>>> 
>>> On 23/04/11 15:12, Ralph Castain wrote:
>>>> On Apr 23, 2011, at 6:20 AM, Reuti wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> On 23.04.2011 at 04:31, Pablo Lopez Rios wrote:
>>>>> 
>>>>>> I'm having a bit of a problem with wrapping mpirun in a script. The 
>>>>>> script needs to run an MPI job in the background and tail -f the output. 
>>>>>> Pressing Ctrl+C should stop tail -f, and the MPI job should continue.
>>>> I don't think that is true at all. When you hit ctrl-C, every process 
>>>> executing in the script receives it. Mpirun traps the ctrl-c and 
>>>> immediately terminates all running MPI procs.
>>>> 
>>>> 
>>>>>> However mpirun seems to detect the SIGINT that was meant for tail, and 
>>>>>> kills the job immediately. I've tried workarounds involving nohup, 
>>>>>> disown, trap, subshells (including calling the script from within 
>>>>>> itself), etc, to no avail.
>>>>>> 
>>>>>> The problem is that this doesn't happen if I run the command directly 
>>>>>> instead, without mpirun. Attached is a script that reproduces the 
>>>>>> problem. It runs a simple counting script in the background which takes 
>>>>>> 10 seconds to run, and tails the output. If called with "nompi" as first 
>>>>>> argument, it will simply run bash -c "$SCRIPT" >& "$out" &, and with 
>>>>>> "mpi" it will do the same with 'mpirun -np 1' prepended. The output I 
>>>>>> get is:
>>>>> what about:
>>>>> 
>>>>> ( trap "" sigint; exec mpiexec ...)&
>>>>> 
>>>>> i.e. replace the subshell with changed interrupt handling with the 
>>>>> mpiexec. Well, maybe mpiexec is adjusting it on its own again. This can 
>>>>> be checked in /proc/<pid>/status
>>>>> 
>>>>> -- Reuti
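
The /proc/<pid>/status check suggested above can be sketched as follows, assuming Linux, where the SigIgn and SigCgt lines are hexadecimal masks and SIGINT (signal 2) corresponds to mask bit 0x2. The function names are hypothetical:

```shell
#!/bin/sh
# sigint_state SIGIGN_HEX SIGCGT_HEX
# Decode the two masks for SIGINT. Only the last hex digit matters,
# because SIGINT lives in the lowest four bits of each mask.
sigint_state() {
    ign_last=${1#"${1%?}"}
    cgt_last=${2#"${2%?}"}
    if [ $(( 0x$ign_last & 2 )) -ne 0 ]; then
        echo ignored
    elif [ $(( 0x$cgt_last & 2 )) -ne 0 ]; then
        echo caught
    else
        echo default
    fi
}

# sigint_state_of_pid PID
# Linux only: read the masks of a running process from /proc.
sigint_state_of_pid() {
    ign=$(awk '/^SigIgn:/ {print $2}' "/proc/$1/status")
    cgt=$(awk '/^SigCgt:/ {print $2}' "/proc/$1/status")
    sigint_state "$ign" "$cgt"
}
```

Calling `sigint_state_of_pid` on the PID of the launched mpiexec would report "caught" if it has installed its own handler again, and "ignored" if the `trap ""` setting survived the exec.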
>>>>> 
>>>>>> $ ./ompi_bug.sh mpi
>>>>>> mpi:
>>>>>> 1
>>>>>> 2
>>>>>> 3
>>>>>> 4
>>>>>> ^C
>>>>>> $ ./ompi_bug.sh nompi
>>>>>> nompi:
>>>>>> 1
>>>>>> 2
>>>>>> 3
>>>>>> 4
>>>>>> ^C
>>>>>> $ cat output.*
>>>>>> mpi:
>>>>>> 1
>>>>>> 2
>>>>>> 3
>>>>>> 4
>>>>>> mpirun: killing job...
>>>>>> 
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun noticed that process rank 0 with PID 1222 on node pablomme exited 
>>>>>> on signal 0 (Unknown signal 0).
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun: clean termination accomplished
>>>>>> 
>>>>>> nompi:
>>>>>> 1
>>>>>> 2
>>>>>> 3
>>>>>> 4
>>>>>> 5
>>>>>> 6
>>>>>> 7
>>>>>> 8
>>>>>> 9
>>>>>> 10
>>>>>> Done
>>>>>> 
>>>>>> 
>>>>>> This convinces me that there is something strange with OpenMPI, since I 
>>>>>> expect no difference in signal handling when running a simple command 
>>>>>> with or without mpirun in the middle.
>>>>>> 
>>>>>> I've tried looking for options to change this behaviour, but I don't 
>>>>>> seem to find any. Is there one, preferably in the form of an environment 
>>>>>> variable? Or is this a bug?
>>>>>> 
>>>>>> I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also 
>>>>>> v1.2.8 as distributed with OpenSUSE 11.3.
>>>>>> 
>>>>>> Thanks,
>>>>>> Pablo
>>>>>> <ompi_bug.sh.gz>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users

