Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-16 Thread Ted Sussman
Hello Gilles and Ralph,

Thank you for your advice so far.  I appreciate the time that you have spent to 
educate me about the details of Open MPI.

But I think that there is something fundamental that I don't understand.  
Consider Example 2 run with Open MPI 2.1.1.

mpirun --> shell for process 0 -->  executable for process 0 --> MPI calls, MPI_Abort
       --> shell for process 1 -->  executable for process 1 --> MPI calls

After the MPI_Abort is called, ps shows that both shells are running, and that 
the executable for process 1 is running (in this case, process 1 is sleeping).  
And mpirun does not exit until process 1 is finished sleeping.

I cannot reconcile this observed behavior with the statement

> > 2.x: each process is put into its own process group upon launch. When we 
> > issue a "kill", we issue it to the process group. Thus, every child proc of 
> > that child proc will receive it. IIRC, this was the intended behavior.

I assume that, for my example, there are two process groups.  The process group 
for process 0 contains the shell for process 0 and the executable for process 0; 
and the process group for process 1 contains the shell for process 1 and the 
executable for process 1.  So what does MPI_ABORT do?  MPI_ABORT does not kill 
the process group for process 0, since the shell for process 0 continues.  And 
MPI_ABORT does not kill the process group for process 1, since both the shell 
and executable for process 1 continue.
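
One way to check this assumption directly is to look at the PGID column of ps 
while the job is still running.  The command below is only a diagnostic sketch; 
dum.sh and myexe stand in for whatever wrapper and executable names the actual 
launch line uses:

    # Run in a second terminal while mpirun, the wrapper shells, and the
    # executables are still alive; each line shows the PID, the process-group
    # ID (PGID), the parent PID, and the command.
    ps -eo pid,pgid,ppid,args | grep -E 'mpirun|dum\.sh|myexe' | grep -v grep

If the 2.x behavior described in the quote above is in effect, each wrapper 
shell and the executable it launched should share one PGID, distinct from 
mpirun's.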

If I hit Ctrl-C after MPI_Abort is called, I get the message

mpirun: abort is already in progress.. hit ctrl-c again to forcibly terminate

but I don't need to hit Ctrl-C again because mpirun immediately exits.

Can you shed some light on all of this?

Sincerely,

Ted Sussman


On 15 Jun 2017 at 14:44, r...@open-mpi.org wrote:

>
> You have to understand that we have no way of knowing who is making MPI calls 
> - all we see is
> the proc that we started, and we know someone of that rank is running (but we 
> have no way of
> knowing which of the procs you sub-spawned it is).
>
> So the behavior you are seeking only occurred in some earlier release by 
> sheer accident. Nor will
> you find it portable as there is no specification directing that behavior.
>
> The behavior I've provided is to either deliver the signal to _all_ child 
> processes (including
> grandchildren etc.), or _only_ the immediate child of the daemon. It won't do 
> what you describe -
> kill the MPI proc underneath the shell, but not the shell itself.
>
> What you can eventually do is use PMIx to ask the runtime to selectively 
> deliver signals to
> pid/procs for you. We don't have that capability implemented just yet, I'm 
> afraid.
>
> Meantime, when I get a chance, I can code an option that will record the pid 
> of the subproc that
> calls MPI_Init, and then lets you deliver signals to just that proc. No 
> promises as to when that will
> be done.
>
>
> On Jun 15, 2017, at 1:37 PM, Ted Sussman  wrote:
>
> Hello Ralph,
>
> I am just an Open MPI end user, so I will need to wait for the next 
> official release.
>
> mpirun --> shell for process 0 -->  executable for process 0 --> MPI calls
>    --> shell for process 1 -->  executable for process 1 --> MPI calls
>         ...
>
> I guess the question is, should MPI_ABORT kill the executables or the 
> shells?  I naively
> thought, that, since it is the executables that make the MPI calls, it is 
> the executables that
> should be aborted by the call to MPI_ABORT.  Since the shells don't make 
> MPI calls, the
> shells should not be aborted.
>
> And users might have several layers of shells in between mpirun and the 
> executable.
>
> So now I will look for the latest version of Open MPI that has the 1.4.3 
> behavior.
>
> Sincerely,
>
> Ted Sussman
>
> On 15 Jun 2017 at 12:31, r...@open-mpi.org wrote:
>
> >
> > Yeah, things jittered a little there as we debated the "right" behavior. 
> > Generally, when we see that happening it means that a param is required, 
> > but somehow we never reached that point.
> >
> > See if https://github.com/open-mpi/ompi/pull/3704 helps - if so, I can 
> > schedule it for the next 2.x release if the RMs agree to take it
> >
> > Ralph
> >
> > On Jun 15, 2017, at 12:20 PM, Ted Sussman  wrote:
> >
> > Thank you for your comments.
> >
> > Our application relies upon "dum.sh" to clean up after the process exits, 
> > either if the process exits normally, or if the process exits abnormally 
> > because of MPI_ABORT.  If the process group is killed by MPI_ABORT, this 
> > clean up will not be performed.  If exec is used to launch the executable 
> > from dum.sh, then dum.sh is terminated by the exec, so dum.sh cannot 
> > perform any clean up.
> >
> > I supp

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-16 Thread Jeff Squyres (jsquyres)
Ted --

Sorry for jumping in late.  Here's my $0.02...

In the runtime, we can do 4 things:

1. Kill just the process that we forked.
2. Kill just the process(es) that call back and identify themselves as MPI 
processes (we don't track this right now, but we could add that functionality).
3. Union of #1 and #2.
4. Kill all processes (to include any intermediate processes that are not 
included in #1 and #2).

In Open MPI 2.x, #4 is the intended behavior.  There may be a bug or two that 
needs to get fixed (e.g., in your last mail, I don't see offhand why it waits 
until the MPI process finishes sleeping), but we should be killing the process 
group, which -- unless any of the descendant processes have explicitly left the 
process group -- should hit the entire process tree.  

Sidenote: there's actually a way to be a bit more aggressive and do a better 
job of ensuring that we kill *all* processes (via creative use of 
PR_SET_CHILD_SUBREAPER), but that's basically a future enhancement / 
optimization.

I think Gilles and Ralph proposed a good point to you: if you want to be sure 
to be able to do cleanup after an MPI process terminates (normally or 
abnormally), you should trap signals in your intermediate processes to catch 
what Open MPI's runtime throws and therefore know that it is time to cleanup.  

Hypothetically, this should work in all versions of Open MPI...?
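
A minimal sketch of such an intermediate wrapper, written for a POSIX shell, 
might look like the following.  The names myapp.exe and the scratch file are 
placeholders, not anything from your actual setup.  The two key points are that 
the executable is started without exec (so the shell survives to receive the 
signal) and that the cleanup is kept short because of the SIGTERM-to-SIGKILL 
window noted further below:

    #!/bin/sh
    # Hypothetical intermediate wrapper (in the spirit of dum.sh): clean up
    # when the runtime delivers SIGTERM ahead of the SIGKILL.
    cleanup() {
        # Keep this fast -- the window before SIGKILL arrives is short.  It
        # must also be idempotent, since it can run from both the TERM trap
        # and the EXIT trap.
        rm -f /tmp/myapp.scratch.$$
    }
    trap 'cleanup; exit 143' TERM INT
    trap cleanup EXIT

    # Start the MPI executable in the background and wait for it.  Do NOT use
    # "exec" here: exec replaces this shell, so the traps above could never run.
    ./myapp.exe "$@" &
    wait $!
    exit $?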

I think Ralph made a pull request that adds an MCA param to change the default 
behavior from #4 to #1.

Note, however, that there's a little time between when Open MPI sends the 
SIGTERM and the SIGKILL, so this solution could be racy.  If you find that 
you're running out of time to cleanup, we might be able to make the delay 
between the SIGTERM and SIGKILL be configurable (e.g., via MCA param).




> On Jun 16, 2017, at 10:08 AM, Ted Sussman  wrote:
> 
> Hello Gilles and Ralph,
> 
> Thank you for your advice so far.  I appreciate the time that you have spent 
> to educate me about the details of Open MPI.
> 
> But I think that there is something fundamental that I don't understand.  
> Consider Example 2 run with Open MPI 2.1.1. 
> 
> mpirun --> shell for process 0 -->  executable for process 0 --> MPI calls, 
> MPI_Abort
>--> shell for process 1 -->  executable for process 1 --> MPI calls
> 
> After the MPI_Abort is called, ps shows that both shells are running, and 
> that the executable for process 1 is running (in this case, process 1 is 
> sleeping).  And mpirun does not exit until process 1 is finished sleeping.
> 
> I cannot reconcile this observed behavior with the statement
> 
> > > 2.x: each process is put into its own process group upon launch. When we 
> > > issue a "kill", we issue it to the process group. Thus, every child proc 
> > > of that child proc will receive it. IIRC, this was the intended behavior.
> 
> I assume that, for my example, there are two process groups.  The process 
> group for process 0 contains the shell for process 0 and the executable for 
> process 0; and the process group for process 1 contains the shell for process 
> 1 and the executable for process 1.  So what does MPI_ABORT do?  MPI_ABORT 
> does not kill the process group for process 0, since the shell for process 0 
> continues.  And MPI_ABORT does not kill the process group for process 1, 
> since both the shell and executable for process 1 continue.
> 
> If I hit Ctrl-C after MPI_Abort is called, I get the message
> 
> mpirun: abort is already in progress.. hit ctrl-c again to forcibly terminate
> 
> but I don't need to hit Ctrl-C again because mpirun immediately exits.
> 
> Can you shed some light on all of this?
> 
> Sincerely,
> 
> Ted Sussman
> 
> 
> On 15 Jun 2017 at 14:44, r...@open-mpi.org wrote:
> 
> >
> > You have to understand that we have no way of knowing who is making MPI 
> > calls - all we see is
> > the proc that we started, and we know someone of that rank is running (but 
> > we have no way of
> > knowing which of the procs you sub-spawned it is).
> >
> > So the behavior you are seeking only occurred in some earlier release by 
> > sheer accident. Nor will
> > you find it portable as there is no specification directing that behavior.
> >
> > The behavior I’ve provided is to either deliver the signal to _all_ child 
> > processes (including
> > grandchildren etc.), or _only_ the immediate child of the daemon. It won’t 
> > do what you describe -
> > kill the MPI proc underneath the shell, but not the shell itself.
> >
> > What you can eventually do is use PMIx to ask the runtime to selectively 
> > deliver signals to
> > pid/procs for you. We don’t have that capability implemented just yet, I’m 
> > afraid.
> >
> > Meantime, when I get a chance, I can code an option that will record the 
> > pid of the subproc that
> > calls MPI_Init, and then lets you deliver signals to just that proc. No 
> > promises as to when that will
> > be done.
> >
> >
> > On Jun 15, 2017, at 1:37 PM, T

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-16 Thread Ted Sussman
Hello Jeff,

Thanks for your comments.

I am not seeing behavior #4, on the two computers that I have tested on, using 
Open MPI 2.1.1.

I wonder if you can duplicate my results with the files that I have uploaded.

Regarding what is the "correct" behavior, I am willing to modify my application 
to correspond to Open MPI's behavior (whatever behavior the Open MPI developers 
decide is best) -- provided that Open MPI does in fact kill off both shells.

So my highest priority now is to find out why Open MPI 2.1.1 does not kill off 
both shells on my computer.

Sincerely,

Ted Sussman

 On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:

> Ted --
> 
> Sorry for jumping in late.  Here's my $0.02...
> 
> In the runtime, we can do 4 things:
> 
> 1. Kill just the process that we forked.
> 2. Kill just the process(es) that call back and identify themselves as MPI 
> processes (we don't track this right now, but we could add that 
> functionality).
> 3. Union of #1 and #2.
> 4. Kill all processes (to include any intermediate processes that are not 
> included in #1 and #2).
> 
> In Open MPI 2.x, #4 is the intended behavior.  There may be a bug or two that 
> needs to get fixed (e.g., in your last mail, I don't see offhand why it waits 
> until the MPI process finishes sleeping), but we should be killing the 
> process group, which -- unless any of the descendant processes have 
> explicitly left the process group -- should hit the entire process tree.  
> 
> Sidenote: there's actually a way to be a bit more aggressive and do a better 
> job of ensuring that we kill *all* processes (via creative use of 
> PR_SET_CHILD_SUBREAPER), but that's basically a future enhancement / 
> optimization.
> 
> I think Gilles and Ralph proposed a good point to you: if you want to be sure 
> to be able to do cleanup after an MPI process terminates (normally or 
> abnormally), you should trap signals in your intermediate processes to catch 
> what Open MPI's runtime throws and therefore know that it is time to cleanup. 
>  
> 
> Hypothetically, this should work in all versions of Open MPI...?
> 
> I think Ralph made a pull request that adds an MCA param to change the 
> default behavior from #4 to #1.
> 
> Note, however, that there's a little time between when Open MPI sends the 
> SIGTERM and the SIGKILL, so this solution could be racy.  If you find that 
> you're running out of time to cleanup, we might be able to make the delay 
> between the SIGTERM and SIGKILL be configurable (e.g., via MCA param).
> 
> 
> 
> 
> > On Jun 16, 2017, at 10:08 AM, Ted Sussman  wrote:
> > 
> > Hello Gilles and Ralph,
> > 
> > Thank you for your advice so far.  I appreciate the time that you have 
> > spent to educate me about the details of Open MPI.
> > 
> > But I think that there is something fundamental that I don't understand.  
> > Consider Example 2 run with Open MPI 2.1.1. 
> > 
> > mpirun --> shell for process 0 -->  executable for process 0 --> MPI calls, 
> > MPI_Abort
> >--> shell for process 1 -->  executable for process 1 --> MPI calls
> > 
> > After the MPI_Abort is called, ps shows that both shells are running, and 
> > that the executable for process 1 is running (in this case, process 1 is 
> > sleeping).  And mpirun does not exit until process 1 is finished sleeping.
> > 
> > I cannot reconcile this observed behavior with the statement
> > 
> > > > 2.x: each process is put into its own process group upon launch. When 
> > > > we issue a "kill", we issue it to the process group. Thus, every child 
> > > > proc of that child proc will receive it. IIRC, this was the intended 
> > > > behavior.
> > 
> > I assume that, for my example, there are two process groups.  The process 
> > group for process 0 contains the shell for process 0 and the executable for 
> > process 0; and the process group for process 1 contains the shell for 
> > process 1 and the executable for process 1.  So what does MPI_ABORT do?  
> > MPI_ABORT does not kill the process group for process 0, since the shell 
> > for process 0 continues.  And MPI_ABORT does not kill the process group for 
> > process 1, since both the shell and executable for process 1 continue.
> > 
> > If I hit Ctrl-C after MPI_Abort is called, I get the message
> > 
> > mpirun: abort is already in progress.. hit ctrl-c again to forcibly 
> > terminate
> > 
> > but I don't need to hit Ctrl-C again because mpirun immediately exits.
> > 
> > Can you shed some light on all of this?
> > 
> > Sincerely,
> > 
> > Ted Sussman
> > 
> > 
> > On 15 Jun 2017 at 14:44, r...@open-mpi.org wrote:
> > 
> > >
> > > You have to understand that we have no way of knowing who is making MPI 
> > > calls - all we see is
> > > the proc that we started, and we know someone of that rank is running 
> > > (but we have no way of
> > > knowing which of the procs you sub-spawned it is).
> > >
> > > So the behavior you are seeking only occurred in some earlier release by 
> > > shee

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-16 Thread gilles
Ted,

if you

mpirun --mca odls_base_verbose 10 ...

you will see which processes get killed and how
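
For example, with a two-rank launch through the dum.sh wrapper discussed 
earlier in this thread (the wrapper name and process count here are only an 
assumption about the actual launch line), that would be something like

mpirun --mca odls_base_verbose 10 -np 2 ./dum.sh

and the verbose output should show which pids are signalled and how.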

Best regards,


Gilles

- Original Message -
> Hello Jeff,
> 
> Thanks for your comments.
> 
> I am not seeing behavior #4, on the two computers that I have tested 
on, using Open MPI 
> 2.1.1.
> 
> I wonder if you can duplicate my results with the files that I have 
uploaded.
> 
> Regarding what is the "correct" behavior, I am willing to modify my 
application to correspond 
> to Open MPI's behavior (whatever behavior the Open MPI developers 
decide is best) -- 
> provided that Open MPI does in fact kill off both shells.
> 
> So my highest priority now is to find out why Open MPI 2.1.1 does not 
kill off both shells on 
> my computer.
> 
> Sincerely,
> 
> Ted Sussman
> 
>  On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
> 
> > Ted --
> > 
> > Sorry for jumping in late.  Here's my $0.02...
> > 
> > In the runtime, we can do 4 things:
> > 
> > 1. Kill just the process that we forked.
> > 2. Kill just the process(es) that call back and identify themselves 
as MPI processes (we don't track this right now, but we could add that 
functionality).
> > 3. Union of #1 and #2.
> > 4. Kill all processes (to include any intermediate processes that 
are not included in #1 and #2).
> > 
> > In Open MPI 2.x, #4 is the intended behavior.  There may be a bug or 
two that needs to get fixed (e.g., in your last mail, I don't see 
offhand why it waits until the MPI process finishes sleeping), but we 
should be killing the process group, which -- unless any of the 
descendant processes have explicitly left the process group -- should 
hit the entire process tree.  
> > 
> > Sidenote: there's actually a way to be a bit more aggressive and do 
a better job of ensuring that we kill *all* processes (via creative use 
of PR_SET_CHILD_SUBREAPER), but that's basically a future enhancement / 
optimization.
> > 
> > I think Gilles and Ralph proposed a good point to you: if you want 
to be sure to be able to do cleanup after an MPI process terminates (
normally or abnormally), you should trap signals in your intermediate 
processes to catch what Open MPI's runtime throws and therefore know 
that it is time to cleanup.  
> > 
> > Hypothetically, this should work in all versions of Open MPI...?
> > 
> > I think Ralph made a pull request that adds an MCA param to change 
the default behavior from #4 to #1.
> > 
> > Note, however, that there's a little time between when Open MPI 
sends the SIGTERM and the SIGKILL, so this solution could be racy.  If 
you find that you're running out of time to cleanup, we might be able to 
make the delay between the SIGTERM and SIGKILL be configurable (e.g., 
via MCA param).
> > 
> > 
> > 
> > 
> > > On Jun 16, 2017, at 10:08 AM, Ted Sussman  
wrote:
> > > 
> > > Hello Gilles and Ralph,
> > > 
> > > Thank you for your advice so far.  I appreciate the time that you 
have spent to educate me about the details of Open MPI.
> > > 
> > > But I think that there is something fundamental that I don't 
understand.  Consider Example 2 run with Open MPI 2.1.1. 
> > > 
> > > mpirun --> shell for process 0 -->  executable for process 0 --> 
MPI calls, MPI_Abort
> > >--> shell for process 1 -->  executable for process 1 --> 
MPI calls
> > > 
> > > After the MPI_Abort is called, ps shows that both shells are 
running, and that the executable for process 1 is running (in this case, 
process 1 is sleeping).  And mpirun does not exit until process 1 is 
finished sleeping.
> > > 
> > > I cannot reconcile this observed behavior with the statement
> > > 
> > > > > 2.x: each process is put into its own process group 
upon launch. When we issue a
> > > > > "kill", we issue it to the process group. Thus, every 
child proc of that child proc will
> > > > > receive it. IIRC, this was the intended behavior.
> > > 
> > > I assume that, for my example, there are two process groups.  The 
process group for process 0 contains the shell for process 0 and the 
executable for process 0; and the process group for process 1 contains 
the shell for process 1 and the executable for process 1.  So what does 
MPI_ABORT do?  MPI_ABORT does not kill the process group for process 0, 
since the shell for process 0 continues.  And MPI_ABORT does not kill 
the process group for process 1, since both the shell and executable for 
process 1 continue.
> > > 
> > > If I hit Ctrl-C after MPI_Abort is called, I get the message
> > > 
> > > mpirun: abort is already in progress.. hit ctrl-c again to 
forcibly terminate
> > > 
> > > but I don't need to hit Ctrl-C again because mpirun immediately 
exits.
> > > 
> > > Can you shed some light on all of this?
> > > 
> > > Sincerely,
> > > 
> > > Ted Sussman
> > > 
> > > 
> > > On 15 Jun 2017 at 14:44, r...@open-mpi.org wrote:
> > > 
> > > >
> > > > You have to understand that we have no way of knowing who is 
making MPI calls - all we see is
> > > > the proc that we s