On 23.04.2011 at 19:33, Ralph Castain wrote:

> On Apr 23, 2011, at 10:40 AM, Pablo Lopez Rios wrote:
> 
>>> I'm not sure what you are actually trying to accomplish
>> 
>> I simply want a script that runs the equivalent of:
>> 
>> mpirun command >& out &
>> tail -f out
>> 
>> such that hitting Ctrl+C stops tail but leaves mpirun running. I can 
>> certainly do this without mpirun,
> 
> I don't think that's true. If both commands are in a script, then at least 
> for me, a ctrl-c of the -script- will cause ctrl-c to be sent to -both- 
> processes.

What about using setsid to start it in a new session instead of using & in the
script?
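
A minimal sketch of that idea, assuming the util-linux setsid utility is installed. The helper name run_detached is made up for illustration, and whether mpirun then behaves as wanted is not confirmed in this thread.

```shell
#!/bin/bash
# Sketch only: run_detached is a hypothetical helper name.
# Usage: run_detached OUTFILE COMMAND [ARGS...]
run_detached() {
    local out=$1
    shift
    # setsid starts COMMAND in a new session, so it is no longer part of
    # the script's foreground process group and is not sent the SIGINT
    # that the terminal generates on Ctrl+C.
    setsid "$@" > "$out" 2>&1 < /dev/null &
}
```

In the wrapper script this would be called as `run_detached out mpirun command`, followed by `tail -f out`.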

-- Reuti


> 
> At least when I test it, even non-mpirun processes will abort.
> 
>> it's not unreasonable to expect to be able to do the same with mpirun.
> 
> I'm afraid it won't work, per my earlier comments.
> 
>> I need mpirun to either ignore the SIGINT or not receive it at all -- and as 
>> per your comments, ignoring it is not an option.
>> 
>> Let me rephrase my question then. With the following script:
>> 
>> mpirun command >& out &
>> tail -f out
>> 
>> SIGINT stops tail AND mpirun. That's OK. The following:
>> 
>> (
>> trap : SIGINT
>> mpirun command >& out &
>> )
>> tail -f out
>> 
>> has the same effect, indicating that mpirun overrides previous traps in the 
>> same subshell. That's OK too. However the following:
>> 
>> (
>> trap : SIGINT
>> (
>> mpirun command >& out &
>> )
>> )
>> tail -f out
>> 
>> also has the same effect. How is mpirun overriding the trap in the *parent* 
>> subshell so that it ends up getting the SIGINT that was supposedly blocked 
>> at that level? Am I missing something trivial? How can I avoid this?
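
On the trap question above: a trap set with a command, such as `trap : SIGINT`, is not passed on to the programs the shell starts; they begin with the default SIGINT action. Only an ignored signal (`trap '' SIGINT`) is inherited, and a program is still free to install its own handler afterwards, which is what mpirun is described as doing in this thread. The sketch below illustrates the inheritance rule with a plain child shell. The helper name child_survives_sigint is made up for illustration, and nothing here involves mpirun itself.

```shell
#!/bin/bash
# Sketch only: child_survives_sigint is a hypothetical helper name.
# It installs the given trap action for SIGINT in a subshell, starts a
# child shell there, and lets that child send SIGINT to itself.
# Prints "survived" if the child outlived the signal, "killed" otherwise.
child_survives_sigint() {
    local action=$1
    (
        trap "$action" INT
        # $$ is expanded by the child shell, so it signals only itself.
        bash -c 'kill -INT $$; exit 0' 2>/dev/null
    )
    if [ $? -eq 0 ]; then
        echo survived
    else
        echo killed
    fi
}

child_survives_sigint ':'   # trap with a command: not inherited by the child
child_survives_sigint ''    # ignored signal: inherited by the child
```

The nesting depth of the subshells makes no difference to this rule, which would be consistent with the two trap variants above behaving the same.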
> 
> I keep telling you - you can't. The better way to do this is to execute 
> mpirun, and then run tail in a -separate- command. Now you can ctrl-c tail 
> without mpirun seeing it.
> 
> But you are welcome to not believe me and continue thrashing... :-/
> 
>> 
>> Thanks,
>> Pablo
>> 
>> 
>> On 23/04/11 16:27, Ralph Castain wrote:
>>> On Apr 23, 2011, at 9:11 AM, Pablo Lopez Rios wrote:
>>> 
>>>>>> Pressing Ctrl+C should stop tail -f, and the MPI job
>>>>>> should continue.
>>>>> I don't think that is true at all. When you hit ctrl-C,
>>>>> every process executing in the script receives it. Mpirun
>>>>> traps the ctrl-c and immediately terminates all running
>>>>> MPI procs.
>>>> By "Ctrl+C should stop tail -f" I mean that this is the
>>>> desired behaviour of the script, not that this is what ought
>>>> to happen in general. My question is how to achieve this
>>>> behaviour, since I'm having trouble working around mpirun
>>>> catching sigint.
>>> Like I said in my other response, you can't - mpirun automatically traps 
>>> sigint and terminates the job in order to ensure proper cleanup during 
>>> abnormal terminations.
>>> 
>>> I'm not sure what you are actually trying to accomplish, but there are 
>>> other signals that don't cause termination. For example, we trap and 
>>> forward SIGUSR1 and SIGUSR2 to your application procs, if that is of use.
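
If a status signal is of use, the application side might look roughly like the sketch below. It assumes the application is a shell script; count_with_status is a hypothetical stand-in for it, and the forwarding by mpirun described above is not exercised here.

```shell
#!/bin/bash
# Sketch only: count_with_status is a hypothetical stand-in for the
# application. It counts to N and prints a status line on SIGUSR1.
count_with_status() {
    local n=$1 i=0
    # The handler runs once the current foreground command (sleep) ends.
    trap 'echo "status: at $i of $n"' USR1
    while [ "$i" -lt "$n" ]; do
        i=$((i + 1))
        echo "$i"
        sleep 1
    done
    trap - USR1   # restore the default action
    echo Done
}
```

Sending the signal with `kill -USR1 PID` to the running script then produces one status line without stopping the count.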
>>> 
>>> But ctrl-c has a special meaning ("die"), and you can't tell mpirun to 
>>> ignore it.
>>> 
>>> 
>>>> Thanks,
>>>> Pablo
>>>> 
>>>> 
>>>> 
>>>> On 23/04/11 15:12, Ralph Castain wrote:
>>>>> On Apr 23, 2011, at 6:20 AM, Reuti wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> On 23.04.2011 at 04:31, Pablo Lopez Rios wrote:
>>>>>> 
>>>>>>> I'm having a bit of a problem with wrapping mpirun in a script. The 
>>>>>>> script needs to run an MPI job in the background and tail -f the 
>>>>>>> output. Pressing Ctrl+C should stop tail -f, and the MPI job should 
>>>>>>> continue.
>>>>> I don't think that is true at all. When you hit ctrl-C, every process 
>>>>> executing in the script receives it. Mpirun traps the ctrl-c and 
>>>>> immediately terminates all running MPI procs.
>>>>> 
>>>>> 
>>>>>>> However mpirun seems to detect the SIGINT that was meant for tail, and 
>>>>>>> kills the job immediately. I've tried workarounds involving nohup, 
>>>>>>> disown, trap, subshells (including calling the script from within 
>>>>>>> itself), etc, to no avail.
>>>>>>> 
>>>>>>> The problem is that this doesn't happen if I run the command directly 
>>>>>>> instead, without mpirun. Attached is a script that reproduces the 
>>>>>>> problem. It runs a simple counting script in the background which takes 
>>>>>>> 10 seconds to run, and tails the output. If called with "nompi" as 
>>>>>>> first argument, it will simply run bash -c "$SCRIPT" >& "$out" &, and 
>>>>>>> with "mpi" it will do the same with 'mpirun -np 1' prepended. The 
>>>>>>> output I get is:
>>>>>> what about:
>>>>>> 
>>>>>> ( trap "" sigint; exec mpiexec ...)&
>>>>>> 
>>>>>> i.e. replace the subshell with changed interrupt handling with the 
>>>>>> mpiexec. Well, maybe mpiexec is adjusting it on its own again. This can 
>>>>>> be checked in /proc/<pid>/status
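
One possible way to do that check, as a rough sketch for Linux only. It assumes the usual layout of /proc/PID/status, where SigIgn and SigCgt are hexadecimal bit masks with bit n-1 standing for signal n. The helper name sigint_disposition is made up for illustration.

```shell
#!/bin/bash
# Sketch only: sigint_disposition is a hypothetical helper name.
# Prints "ignored", "caught" or "default" for SIGINT of process PID,
# based on the SigIgn and SigCgt masks in /proc/PID/status (Linux).
sigint_disposition() {
    local pid=$1 ign cgt
    ign=$(awk '/^SigIgn:/ {print $2}' "/proc/$pid/status") || return 1
    cgt=$(awk '/^SigCgt:/ {print $2}' "/proc/$pid/status") || return 1
    # SIGINT is signal number 2, i.e. bit 1 of each mask.
    if [ "$(( (0x$ign >> 1) & 1 ))" -eq 1 ]; then
        echo ignored
    elif [ "$(( (0x$cgt >> 1) & 1 ))" -eq 1 ]; then
        echo caught
    else
        echo default
    fi
}
```

Called as `sigint_disposition "$!"` shortly after the background launch, this would show whether the process ignores SIGINT, catches it, or has the default action. A short delay may be needed so the program has time to set up its handlers. For a plain command started with & from a script without job control, the result is typically "ignored", since the shell starts such background commands with SIGINT ignored.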
>>>>>> 
>>>>>> -- Reuti
>>>>>> 
>>>>>>> $ ./ompi_bug.sh mpi
>>>>>>> mpi:
>>>>>>> 1
>>>>>>> 2
>>>>>>> 3
>>>>>>> 4
>>>>>>> ^C
>>>>>>> $ ./ompi_bug.sh nompi
>>>>>>> nompi:
>>>>>>> 1
>>>>>>> 2
>>>>>>> 3
>>>>>>> 4
>>>>>>> ^C
>>>>>>> $ cat output.*
>>>>>>> mpi:
>>>>>>> 1
>>>>>>> 2
>>>>>>> 3
>>>>>>> 4
>>>>>>> mpirun: killing job...
>>>>>>> 
>>>>>>> --------------------------------------------------------------------------
>>>>>>> mpirun noticed that process rank 0 with PID 1222 on node pablomme 
>>>>>>> exited on signal 0 (Unknown signal 0).
>>>>>>> --------------------------------------------------------------------------
>>>>>>> mpirun: clean termination accomplished
>>>>>>> 
>>>>>>> nompi:
>>>>>>> 1
>>>>>>> 2
>>>>>>> 3
>>>>>>> 4
>>>>>>> 5
>>>>>>> 6
>>>>>>> 7
>>>>>>> 8
>>>>>>> 9
>>>>>>> 10
>>>>>>> Done
>>>>>>> 
>>>>>>> 
>>>>>>> This convinces me that there is something strange with OpenMPI, since I 
>>>>>>> expect no difference in signal handling when running a simple command 
>>>>>>> with or without mpirun in the middle.
>>>>>>> 
>>>>>>> I've tried looking for options to change this behaviour, but I don't 
>>>>>>> seem to find any. Is there one, preferably in the form of an 
>>>>>>> environment variable? Or is this a bug?
>>>>>>> 
>>>>>>> I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also 
>>>>>>> v1.2.8 as distributed with OpenSUSE 11.3.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Pablo
>>>>>>> <ompi_bug.sh.gz>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users