On Apr 23, 2011, at 11:55 AM, Pablo Lopez Rios wrote:

>> What about setsid and pushing it in a new
>> session instead of using & in the script?
> 
> :-) That works. Thanks!
> 
> NB, the working script looks like:
> 
> setsid bash -c "mpirun command >& out" &
> tail -f out
> 

Yes - but now you can't kill mpirun when something goes wrong....<shrug>
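A minimal sketch of one way around that objection, assuming bash and the util-linux `setsid` command: have the new session write down its own PID before it execs mpirun, so the job can still be stopped deliberately with `kill` even though Ctrl+C no longer reaches it. The helper names `start_detached` and `stop_detached`, and the PID-file approach, are illustrative assumptions, not something Open MPI provides.

```shell
#!/bin/bash
# Sketch: run a command (e.g. mpirun) in its own session so a Ctrl+C typed at
# the terminal does not reach it, while recording its PID so it can still be
# stopped on purpose. start_detached/stop_detached are hypothetical names.

# usage: start_detached OUTFILE PIDFILE COMMAND [ARGS...]
start_detached() {
  local out=$1 pidfile=$2 i
  shift 2
  rm -f "$pidfile"
  # The inner bash is the session leader; it writes its own PID ($$) and then
  # execs the command, which therefore keeps that same PID.
  setsid bash -c 'echo "$$" > "$1"; shift; exec "$@"' start_detached \
    "$pidfile" "$@" > "$out" 2>&1 < /dev/null &
  # Wait up to about 5 seconds for the PID file to appear.
  for i in {1..50}; do
    [ -s "$pidfile" ] && return 0
    sleep 0.1
  done
  return 1
}

# usage: stop_detached PIDFILE [SIGNAL]   (SIGNAL defaults to INT)
stop_detached() {
  local pidfile=$1 sig=${2:-INT} pid
  pid=$(< "$pidfile") || return 1
  # Per this thread, mpirun traps SIGINT and tears the whole job down.
  # Caveat: a job started with & from a script inherits SIGINT as ignored.
  # mpirun installs its own handler, but a plain program would not, so pass
  # a different SIGNAL (e.g. TERM) for commands that do not.
  kill "-$sig" "$pid"
}
```

In the wrapper from this thread this would read `start_detached out mpirun.pid mpirun -np 1 command` followed by `tail -f out`; Ctrl+C then stops only tail, and `stop_detached mpirun.pid` later sends mpirun the SIGINT it treats as "die".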

> Thanks,
> Pablo
> 
> 
> On 23/04/11 18:39, Reuti wrote:
>> Am 23.04.2011 um 19:33 schrieb Ralph Castain:
>> 
>>> On Apr 23, 2011, at 10:40 AM, Pablo Lopez Rios wrote:
>>> 
>>>>> I'm not sure what you are actually trying to accomplish
>>>> I simply want a script that runs the equivalent of:
>>>> 
>>>> mpirun command >& out &
>>>> tail -f out
>>>> 
>>>> such that hitting Ctrl+C stops tail but leaves mpirun running. I can 
>>>> certainly do this without mpirun,
>>> I don't think that's true. If both commands are in a script, then at least 
>>> for me, a ctrl-c of the -script- will cause ctrl-c to be sent to -both- 
>>> processes.
>> What about setsid and pushing it into a new session instead of using & in the 
>> script?
>> 
>> -- Reuti
>> 
>> 
>>> At least when I test it, even non-mpirun processes will abort.
>>> 
>>>> it's not unreasonable to expect to be able to do the same with mpirun.
>>> I'm afraid it won't work, per my earlier comments.
>>> 
>>>> I need mpirun to either ignore the SIGINT or not receive it at all -- and 
>>>> as per your comments, ignoring it is not an option.
>>>> 
>>>> Let me rephrase my question then. With the following script:
>>>> 
>>>> mpirun command >& out &
>>>> tail -f out
>>>> 
>>>> SIGINT stops tail AND mpirun. That's OK. The following:
>>>> 
>>>> (
>>>> trap : SIGINT
>>>> mpirun command >& out &
>>>> )
>>>> tail -f out
>>>> 
>>>> has the same effect, indicating that mpirun overrides previous traps in the 
>>>> same subshell. That's OK too. However the following:
>>>> 
>>>> (
>>>> trap : SIGINT
>>>> (
>>>> mpirun command >& out &
>>>> )
>>>> )
>>>> tail -f out
>>>> 
>>>> also has the same effect. How is mpirun overriding the trap in the 
>>>> *parent* subshell so that it ends up getting the SIGINT that was 
>>>> supposedly blocked at that level? Am I missing something trivial? How can 
>>>> I avoid this?
>>> I keep telling you - you can't. The better way to do this is to execute 
>>> mpirun, and then run tail in a -separate- command. Now you can ctrl-c tail 
>>> without mpirun seeing it.
>>> 
>>> But you are welcome to not believe me and continue thrashing... :-/
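Why the nested `trap` makes no difference follows from standard signal rules rather than from mpirun reaching into its parents. Ctrl+C makes the terminal driver send SIGINT directly to every process in the terminal's foreground process group, so a trap in an ancestor subshell never gets a chance to intercept it on the way down. A trap that runs a command, such as `trap : SIGINT`, only installs a handler in that shell, and handlers are reset to the default action when a new program is exec'd; only `trap '' SIGINT` (ignore) is inherited, and, as stated in this thread, mpirun installs its own SIGINT handler anyway. That is also why the setsid variant works: it takes mpirun out of the terminal's foreground process group. The sketch below demonstrates the exec rule on Linux, with `cat` standing in for mpirun; `exec_disposition` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: show what a freshly exec'd program inherits after its parent
# subshell ran `trap ACTION SIG`. cat stands in for mpirun.
# exec_disposition is a hypothetical name; Linux only (reads /proc).

# usage: exec_disposition ACTION SIGNAME   e.g. exec_disposition ':' INT
exec_disposition() {
  local action=$1 sig=$2 num bit key val ign= cgt=
  num=$(kill -l "$sig") || return 2     # signal name -> number
  bit=$(( 1 << (num - 1) ))             # bit n-1 of the mask is signal n
  # The subshell sets the trap, then `exec cat` replaces it, so
  # /proc/self/status describes the exec'd cat process, not the shell.
  while read -r key val _; do
    case $key in
      SigIgn:) ign=$val ;;
      SigCgt:) cgt=$val ;;
    esac
  done < <(trap "$action" "$sig"; exec cat /proc/self/status)
  if (( 16#$ign & bit )); then echo ignored
  elif (( 16#$cgt & bit )); then echo caught
  else echo default
  fi
}
```

Run from an interactive shell, `exec_disposition ':' INT` would be expected to print `default` (the handler did not survive exec) and `exec_disposition '' INT` to print `ignored`.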
>>> 
>>>> Thanks,
>>>> Pablo
>>>> 
>>>> 
>>>> On 23/04/11 16:27, Ralph Castain wrote:
>>>>> On Apr 23, 2011, at 9:11 AM, Pablo Lopez Rios wrote:
>>>>> 
>>>>>>>> Pressing Ctrl+C should stop tail -f, and the MPI job
>>>>>>>> should continue.
>>>>>>> I don't think that is true at all. When you hit ctrl-C,
>>>>>>> every process executing in the script receives it. Mpirun
>>>>>>> traps the ctrl-c and immediately terminates all running
>>>>>>> MPI procs.
>>>>>> By "Ctrl+C should stop tail -f" I mean that this is the
>>>>>> desired behaviour of the script, not that this is what ought
>>>>>> to happen in general. My question is how to achieve this
>>>>>> behaviour, since I'm having trouble working around mpirun
>>>>>> catching sigint.
>>>>> Like I said in my other response, you can't - mpirun automatically traps 
>>>>> sigint and terminates the job in order to ensure proper cleanup during 
>>>>> abnormal terminations.
>>>>> 
>>>>> I'm not sure what you are actually trying to accomplish, but there are 
>>>>> other signals that don't cause termination. For example, we trap and 
>>>>> forward SIGUSR1 and SIGUSR2 to your application procs, if that is of use.
>>>>> 
>>>>> But ctrl-c has a special meaning ("die"), and you can't tell mpirun to 
>>>>> ignore it.
>>>>> 
>>>>> 
>>>>>> Thanks,
>>>>>> Pablo
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 23/04/11 15:12, Ralph Castain wrote:
>>>>>>> On Apr 23, 2011, at 6:20 AM, Reuti wrote:
>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> Am 23.04.2011 um 04:31 schrieb Pablo Lopez Rios:
>>>>>>>> 
>>>>>>>>> I'm having a bit of a problem with wrapping mpirun in a script. The 
>>>>>>>>> script needs to run an MPI job in the background and tail -f the 
>>>>>>>>> output. Pressing Ctrl+C should stop tail -f, and the MPI job should 
>>>>>>>>> continue.
>>>>>>> I don't think that is true at all. When you hit ctrl-C, every process 
>>>>>>> executing in the script receives it. Mpirun traps the ctrl-c and 
>>>>>>> immediately terminates all running MPI procs.
>>>>>>> 
>>>>>>> 
>>>>>>>>> However mpirun seems to detect the SIGINT that was meant for tail, 
>>>>>>>>> and kills the job immediately. I've tried workarounds involving 
>>>>>>>>> nohup, disown, trap, subshells (including calling the script from 
>>>>>>>>> within itself), etc, to no avail.
>>>>>>>>> 
>>>>>>>>> The problem is that this doesn't happen if I run the command directly 
>>>>>>>>> instead, without mpirun. Attached is a script that reproduces the 
>>>>>>>>> problem. It runs a simple counting script in the background which 
>>>>>>>>> takes 10 seconds to run, and tails the output. If called with "nompi" 
>>>>>>>>> as first argument, it will simply run bash -c "$SCRIPT" >& "$out" &, 
>>>>>>>>> and with "mpi" it will do the same with 'mpirun -np 1' prepended. The 
>>>>>>>>> output I get is:
>>>>>>>> what about:
>>>>>>>> 
>>>>>>>> ( trap "" sigint; exec mpiexec ...)&
>>>>>>>> 
>>>>>>>> i.e. replace the subshell, whose interrupt handling has been changed, 
>>>>>>>> with mpiexec. Well, maybe mpiexec adjusts it again on its own. This can 
>>>>>>>> be checked in /proc/<pid>/status.
>>>>>>>> 
>>>>>>>> -- Reuti
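Reuti's suggestion to look in /proc/<pid>/status can be made concrete. On Linux, the SigIgn and SigCgt lines there are hexadecimal bitmasks of ignored and caught signals, where bit n-1 corresponds to signal number n (SIGINT, signal 2, is bit 1). A minimal sketch, assuming bash; `sig_disposition` is a hypothetical helper name, and it expects a signal name such as INT rather than a number.

```shell
#!/bin/bash
# Sketch: report how process PID currently handles signal SIG, using the
# SigIgn/SigCgt bitmasks in /proc/PID/status (Linux only).
# sig_disposition is a hypothetical name.

# usage: sig_disposition PID SIGNAME   e.g. sig_disposition 1234 INT
sig_disposition() {
  local pid=$1 sig=$2 num bit key val ign= cgt=
  [ -r "/proc/$pid/status" ] || return 2
  num=$(kill -l "$sig") || return 2     # signal name -> number, e.g. INT -> 2
  bit=$(( 1 << (num - 1) ))
  while read -r key val _; do
    case $key in
      SigIgn:) ign=$val ;;
      SigCgt:) cgt=$val ;;
    esac
  done < "/proc/$pid/status"
  if (( 16#$ign & bit )); then echo ignored
  elif (( 16#$cgt & bit )); then echo caught
  else echo default
  fi
}
```

For example, running `sig_disposition <pid of mpiexec> INT` after launching it via `( trap "" sigint; exec mpiexec ... ) &` would print `ignored` if the inherited setting survived, or `caught` if mpiexec installed its own handler, which is what Ralph's replies describe.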
>>>>>>>> 
>>>>>>>>> $ ./ompi_bug.sh mpi
>>>>>>>>> mpi:
>>>>>>>>> 1
>>>>>>>>> 2
>>>>>>>>> 3
>>>>>>>>> 4
>>>>>>>>> ^C
>>>>>>>>> $ ./ompi_bug.sh nompi
>>>>>>>>> nompi:
>>>>>>>>> 1
>>>>>>>>> 2
>>>>>>>>> 3
>>>>>>>>> 4
>>>>>>>>> ^C
>>>>>>>>> $ cat output.*
>>>>>>>>> mpi:
>>>>>>>>> 1
>>>>>>>>> 2
>>>>>>>>> 3
>>>>>>>>> 4
>>>>>>>>> mpirun: killing job...
>>>>>>>>> 
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> mpirun noticed that process rank 0 with PID 1222 on node pablomme 
>>>>>>>>> exited on signal 0 (Unknown signal 0).
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> mpirun: clean termination accomplished
>>>>>>>>> 
>>>>>>>>> nompi:
>>>>>>>>> 1
>>>>>>>>> 2
>>>>>>>>> 3
>>>>>>>>> 4
>>>>>>>>> 5
>>>>>>>>> 6
>>>>>>>>> 7
>>>>>>>>> 8
>>>>>>>>> 9
>>>>>>>>> 10
>>>>>>>>> Done
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> This convinces me that there is something strange with OpenMPI, since 
>>>>>>>>> I expect no difference in signal handling when running a simple 
>>>>>>>>> command with or without mpirun in the middle.
>>>>>>>>> 
>>>>>>>>> I've tried looking for options to change this behaviour, but I don't 
>>>>>>>>> seem to find any. Is there one, preferably in the form of an 
>>>>>>>>> environment variable? Or is this a bug?
>>>>>>>>> 
>>>>>>>>> I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also 
>>>>>>>>> v1.2.8 as distributed with OpenSUSE 11.3.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Pablo
>>>>>>>>> <ompi_bug.sh.gz>
>>>>>>>>> _______________________________________________
>>>>>>>>> users mailing list
>>>>>>>>> us...@open-mpi.org
>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users

