Re: [OMPI users] signal handling with mpirun

2017-11-21 Thread r...@open-mpi.org
Try upgrading to v3.0, or at least to the latest release in the v2.x series. The v1.10 series is legacy and no longer maintained.

> On Nov 21, 2017, at 8:20 AM, Kulshrestha, Vipul wrote:
>
> Hi,
>
> I am finding that on Ctrl-C, mpirun immediately stops and does not send SIGTERM to the chil

[OMPI users] signal handling with mpirun

2017-11-21 Thread Kulshrestha, Vipul
Hi, I am finding that on Ctrl-C, mpirun immediately stops and does not send SIGTERM to the child processes. I am using openmpi 1.10.6. The child processes are able to handle SIGINT. I verified that by a printf in my signal handler and then issuing SIGINT to my process directly. However, when

Re: [OMPI users] signal handling

2007-03-13 Thread Reuti
On 13.03.2007 at 06:01, Ralph Castain wrote: I've been letting this rattle around in my head some more, and *may* have come up with an idea of what *might* be going on. In the GE environment, qsub only launches the daemons - the daemons are the ones that actually "launch" your local appli

Re: [OMPI users] signal handling (part 2)

2007-03-13 Thread Olesen, Mark
Hi Reuti (and others),

> And now the odd thing: the jobscript (with the mpirun) is gone on the head node of this parallel job, but all the spawned qrsh processes are still there:

I'm glad that someone else can almost reproduce my problem. On the suspicion that my application was not ignoring

Re: [OMPI users] signal handling (part 2)

2007-03-13 Thread Reuti
On 12.03.2007 at 21:29, Ralph Castain wrote: But now we are going beyond Mark's initial problem. Back to the initial problem: suspending a parallel job in SGE leads to:

19924  1786 19924 S    \_ sge_shepherd-45250 -bg
19926 19924 19926 Ts   |   \_ /bin/sh /var/spool/sge/node39/ job_script

Re: [OMPI users] signal handling

2007-03-13 Thread Reuti
On 12.03.2007 at 21:29, Ralph Castain wrote: On 3/12/07 2:18 PM, "Reuti" wrote: On 12.03.2007 at 20:36, Ralph Castain wrote: ORTE propagates the signal to the application processes, but the ORTE daemons never actually look at the signal themselves (looks just like a message to them). So

Re: [OMPI users] signal handling

2007-03-13 Thread Ralph Castain
I've been letting this rattle around in my head some more, and *may* have come up with an idea of what *might* be going on. In the GE environment, qsub only launches the daemons - the daemons are the ones that actually "launch" your local application processes. If qsub -notify uses qsub's knowledg

Re: [OMPI users] signal handling

2007-03-12 Thread Ralph Castain
On 3/12/07 2:18 PM, "Reuti" wrote:

> On 12.03.2007 at 20:36, Ralph Castain wrote:
>
>> ORTE propagates the signal to the application processes, but the ORTE daemons never actually look at the signal themselves (looks just like a message to them). So I'm a little puzzled by that erro

Re: [OMPI users] signal handling

2007-03-12 Thread Reuti
On 12.03.2007 at 20:36, Ralph Castain wrote: ORTE propagates the signal to the application processes, but the ORTE daemons never actually look at the signal themselves (looks just like a message to them). So I'm a little puzzled by that error message about the "daemon received signal 12" -

Re: [OMPI users] signal handling

2007-03-12 Thread Ralph Castain
It's supposed to be... but several of us have found it "blocking" somewhere in the OPAL subdirectory tree.

On 3/12/07 2:06 PM, "Ben Allan" wrote:

> A build-related question about 1.1.4:
> Is parallel make usage (make -j 8) supported (at least if make is GNU)?
>
> thanks,
> Ben

Re: [OMPI users] signal handling

2007-03-12 Thread Ben Allan
A build-related question about 1.1.4: Is parallel make usage (make -j 8) supported (at least if make is GNU)? Thanks, Ben

Re: [OMPI users] signal handling

2007-03-12 Thread Reuti
On 12.03.2007 at 20:22, Pak Lui wrote: Hi Mark, Olesen, Mark wrote: I'm testing openmpi 1.2rc1 with GridEngine 6.0u9 and ran into interesting behaviour when using the qsub -notify option. With -notify, USR1 and USR2 are sent X seconds before sending STOP and KILL signals, respectively.

Re: [OMPI users] signal handling

2007-03-12 Thread Ralph Castain
ORTE propagates the signal to the application processes, but the ORTE daemons never actually look at the signal themselves (looks just like a message to them). So I'm a little puzzled by that error message about the "daemon received signal 12" - I suspect that's just a misleading message that was s

Re: [OMPI users] signal handling

2007-03-12 Thread Reuti
On 12.03.2007 at 19:55, Ralph Castain wrote: I'll have to look into it - I suspect this is simply an erroneous message and that no daemon is actually being started. I'm not entirely sure I understand what's happening, though, in your code. Are you saying that mpirun starts some number of a

Re: [OMPI users] signal handling

2007-03-12 Thread Pak Lui
Hi Mark, Olesen, Mark wrote: I'm testing openmpi 1.2rc1 with GridEngine 6.0u9 and ran into interesting behaviour when using the qsub -notify option. With -notify, USR1 and USR2 are sent X seconds before sending STOP and KILL signals, respectively. When the USR2 signal is sent to the process gro

Re: [OMPI users] signal handling

2007-03-12 Thread Ralph Castain
I'll have to look into it - I suspect this is simply an erroneous message and that no daemon is actually being started. I'm not entirely sure I understand what's happening, though, in your code. Are you saying that mpirun starts some number of application processes which run merrily along, and the

[OMPI users] signal handling

2007-03-12 Thread Olesen, Mark
I'm testing openmpi 1.2rc1 with GridEngine 6.0u9 and ran into interesting behaviour when using the qsub -notify option. With -notify, USR1 and USR2 are sent X seconds before sending STOP and KILL signals, respectively. When the USR2 signal is sent to the process group with the mpirun process, I re
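
The warning-signal behaviour described in the post above (USR1 arriving ahead of STOP, USR2 ahead of KILL) can be illustrated from the application's side with a small sketch. This is not taken from the thread: it is a plain Python process on a Unix system, with no Open MPI or Grid Engine involved, and the process sends the signals to itself to stand in for the scheduler. The names on_notify and send_and_wait are hypothetical helpers chosen for this illustration.

```python
import os
import signal
import time

# Names of the warning signals seen so far, in order of arrival.
received = []


def on_notify(signum, frame):
    """Record the warning signal instead of letting it terminate the process."""
    received.append(signal.Signals(signum).name)


# Without handlers, the default action for SIGUSR1/SIGUSR2 is to terminate.
signal.signal(signal.SIGUSR1, on_notify)
signal.signal(signal.SIGUSR2, on_notify)


def send_and_wait(signum, timeout=2.0):
    """Send a signal to this process and wait until the handler has run.

    Stand-in for the scheduler: in the scenario from the post, the signal
    would come from outside the process.
    """
    before = len(received)
    os.kill(os.getpid(), signum)
    deadline = time.monotonic() + timeout
    while len(received) == before and time.monotonic() < deadline:
        time.sleep(0.01)


send_and_wait(signal.SIGUSR1)  # the warning that would precede STOP
send_and_wait(signal.SIGUSR2)  # the warning that would precede KILL
print(received)
```

A real application would use the handler to set a flag and then checkpoint or shut down cleanly before the following STOP or KILL arrives. Whether the signal reaches the application at all depends on how the launcher forwards it, which is the question this thread is about.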