Ted,

If you run

    mpirun --mca odls_base_verbose 10 ...

you will see which processes get killed, and how.
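
For example, to reproduce your Example 2 with the extra output (a sketch; the appfile name is the one from your earlier mails):

    # odls is the framework that launches and kills local procs; at
    # verbosity 10 it reports each child it signals and the signal sent
    mpirun --mca odls_base_verbose 10 -app addmpw2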

Best regards,


Gilles

----- Original Message -----
> Hello Jeff,
> 
> Thanks for your comments.
> 
> I am not seeing behavior #4 on the two computers that I have tested on, using Open MPI 2.1.1.
> 
> I wonder if you can duplicate my results with the files that I have uploaded.
> 
> Regarding what is the "correct" behavior, I am willing to modify my application to correspond to Open MPI's behavior (whatever behavior the Open MPI developers decide is best) -- provided that Open MPI does in fact kill off both shells.
> 
> So my highest priority now is to find out why Open MPI 2.1.1 does not kill off both shells on my computer.
> 
> Sincerely,
> 
> Ted Sussman
> 
>  On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
> 
> > Ted --
> > 
> > Sorry for jumping in late.  Here's my $0.02...
> > 
> > In the runtime, we can do 4 things:
> > 
> > 1. Kill just the process that we forked.
> > 2. Kill just the process(es) that call back and identify themselves as MPI processes (we don't track this right now, but we could add that functionality).
> > 3. Union of #1 and #2.
> > 4. Kill all processes (including any intermediate processes that are not covered by #1 and #2).
> > 
> > In Open MPI 2.x, #4 is the intended behavior.  There may be a bug or two that needs to get fixed (e.g., in your last mail, I don't see offhand why it waits until the MPI process finishes sleeping), but we should be killing the process group, which -- unless any of the descendant processes have explicitly left the process group -- should hit the entire process tree.
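> > 
> > For reference, a process-group kill from a shell looks something like this (a sketch; $PGID stands in for the group leader's PID):
> > 
> >     # the leading dash before the PID targets the whole process group,
> >     # so every process still in that group receives the SIGTERM
> >     kill -TERM -- -"$PGID"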
> > 
> > Sidenote: there's actually a way to be a bit more aggressive and do a better job of ensuring that we kill *all* processes (via creative use of PR_SET_CHILD_SUBREAPER), but that's basically a future enhancement / optimization.
> > 
> > I think Gilles and Ralph raised a good point: if you want to be sure you can do cleanup after an MPI process terminates (normally or abnormally), you should trap signals in your intermediate processes to catch what Open MPI's runtime throws, and therefore know that it is time to clean up.
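> > 
> > A minimal sketch of such a wrapper (assuming dum.sh otherwise just invokes the executable; the echo stands in for whatever cleanup you need):
> > 
> >     #!/bin/sh
> >     # run the real executable in the background so the shell is free
> >     # to receive and trap signals
> >     /home/buildadina/src/aborttest02/aborttest02.exe &
> >     pid=$!
> > 
> >     # SIGTERM is what the runtime sends first; clean up, forward the
> >     # signal to the child, and exit before the SIGKILL arrives
> >     trap 'echo "cleanup for rank $OMPI_COMM_WORLD_RANK"; kill $pid; exit 1' TERM
> > 
> >     # wait returns when the child exits or a trapped signal arrives
> >     wait $pid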
> > 
> > Hypothetically, this should work in all versions of Open MPI...?
> > 
> > I think Ralph made a pull request that adds an MCA param to change the default behavior from #4 to #1.
> > 
> > Note, however, that there's a little time between when Open MPI sends the SIGTERM and the SIGKILL, so this solution could be racy.  If you find that you're running out of time to clean up, we might be able to make the delay between the SIGTERM and SIGKILL configurable (e.g., via an MCA param).
> > 
> > 
> > > On Jun 16, 2017, at 10:08 AM, Ted Sussman <ted.suss...@adina.com> wrote:
> > > 
> > > Hello Gilles and Ralph,
> > > 
> > > Thank you for your advice so far.  I appreciate the time that you have spent to educate me about the details of Open MPI.
> > > 
> > > But I think that there is something fundamental that I don't understand.  Consider Example 2 run with Open MPI 2.1.1.
> > > 
> > > mpirun --> shell for process 0 --> executable for process 0 --> MPI calls, MPI_Abort
> > >        --> shell for process 1 --> executable for process 1 --> MPI calls
> > > 
> > > After MPI_Abort is called, ps shows that both shells are running, and that the executable for process 1 is running (in this case, process 1 is sleeping).  And mpirun does not exit until process 1 is finished sleeping.
> > > 
> > > I cannot reconcile this observed behavior with the statement
> > > 
> > > >     >     2.x: each process is put into its own process group upon launch. When we issue a "kill", we issue it to the process group. Thus, every child proc of that child proc will receive it. IIRC, this was the intended behavior.
> > > 
> > > I assume that, for my example, there are two process groups.  The process group for process 0 contains the shell for process 0 and the executable for process 0; the process group for process 1 contains the shell for process 1 and the executable for process 1.  So what does MPI_ABORT do?  MPI_ABORT does not kill the process group for process 0, since the shell for process 0 continues.  And MPI_ABORT does not kill the process group for process 1, since both the shell and executable for process 1 continue.
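> > > 
> > > While the job is stuck, the grouping can be checked directly (a sketch; the grep pattern assumes the file names from my examples):
> > > 
> > >     # PID, process-group ID, parent PID and command for each process;
> > >     # rows sharing a PGID belong to the same process group
> > >     ps -e -o pid,pgid,ppid,stat,command | grep -E 'mpirun|dum.sh|aborttest02'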
> > > 
> > > If I hit Ctrl-C after MPI_Abort is called, I get the message
> > > 
> > > mpirun: abort is already in progress.. hit ctrl-c again to forcibly terminate
> > > 
> > > but I don't need to hit Ctrl-C again because mpirun immediately exits.
> > > 
> > > Can you shed some light on all of this?
> > > 
> > > Sincerely,
> > > 
> > > Ted Sussman
> > > 
> > > 
> > > On 15 Jun 2017 at 14:44, r...@open-mpi.org wrote:
> > > 
> > > >
> > > > You have to understand that we have no way of knowing who is making MPI calls - all we see is the proc that we started, and we know someone of that rank is running (but we have no way of knowing which of the procs you sub-spawned it is).
> > > >
> > > > So the behavior you are seeking only occurred in some earlier release by sheer accident. Nor will you find it portable, as there is no specification directing that behavior.
> > > >
> > > > The behavior I've provided is to either deliver the signal to _all_ child processes (including grandchildren etc.), or _only_ the immediate child of the daemon. It won't do what you describe - kill the MPI proc underneath the shell, but not the shell itself.
> > > >
> > > > What you can eventually do is use PMIx to ask the runtime to selectively deliver signals to pids/procs for you. We don't have that capability implemented just yet, I'm afraid.
> > > >
> > > > Meantime, when I get a chance, I can code an option that will record the pid of the subproc that calls MPI_Init, and then lets you deliver signals to just that proc. No promises as to when that will be done.
> > > >
> > > >
> > > >     On Jun 15, 2017, at 1:37 PM, Ted Sussman <ted.sussman@adina.com> wrote:
> > > >
> > > >     Hello Ralph,
> > > >
> > > >     I am just an Open MPI end user, so I will need to wait for the next official release.
> > > >
> > > >     mpirun --> shell for process 0 --> executable for process 0 --> MPI calls
> > > >            --> shell for process 1 --> executable for process 1 --> MPI calls
> > > >                                     ...
> > > >
> > > >     I guess the question is, should MPI_ABORT kill the executables or the shells?  I naively thought that, since it is the executables that make the MPI calls, it is the executables that should be aborted by the call to MPI_ABORT.  Since the shells don't make MPI calls, the shells should not be aborted.
> > > >
> > > >     And users might have several layers of shells in between mpirun and the executable.
> > > >
> > > >     So now I will look for the latest version of Open MPI that has the 1.4.3 behavior.
> > > >
> > > >     Sincerely,
> > > >
> > > >     Ted Sussman
> > > >
> > > >     On 15 Jun 2017 at 12:31, r...@open-mpi.org wrote:
> > > >
> > > >     >
> > > >     > Yeah, things jittered a little there as we debated the "right" behavior. Generally, when we see that happening it means that a param is required, but somehow we never reached that point.
> > > >     >
> > > >     > See if https://github.com/open-mpi/ompi/pull/3704 helps - if so, I can schedule it for the next 2.x release if the RMs agree to take it.
> > > >     >
> > > >     > Ralph
> > > >     >
> > > >     >     On Jun 15, 2017, at 12:20 PM, Ted Sussman <ted.sussman@adina.com> wrote:
> > > >     >
> > > >     >     Thank you for your comments.
> > > >     >    
> > > >     >     Our application relies upon "dum.sh" to clean up after the process exits, whether the process exits normally or exits abnormally because of MPI_ABORT.  If the process group is killed by MPI_ABORT, this cleanup will not be performed.  If exec is used to launch the executable from dum.sh, then dum.sh is terminated by the exec, so dum.sh cannot perform any cleanup.
> > > >     >    
> > > >     >     I suppose that other user applications might work similarly, so it would be good to have an MCA parameter to control the behavior of MPI_ABORT.
> > > >     >    
> > > >     >     We could rewrite our shell script that invokes mpirun, so that the cleanup that is now done by dum.sh is done by the invoking shell script after mpirun exits.  Perhaps this technique is the preferred way to clean up after mpirun is invoked.
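> > > >     >    
> > > >     >     For instance, something like the following (a sketch; the echo stands in for the cleanup now done by dum.sh):
> > > >     >    
> > > >     >     #!/bin/sh
> > > >     >     # mpirun returns once all ranks are gone, whether they exited
> > > >     >     # normally or were killed by MPI_ABORT
> > > >     >     mpirun -app addmpw2
> > > >     >     status=$?
> > > >     >     # cleanup formerly done inside dum.sh goes here
> > > >     >     echo "mpirun exited with status $status; cleaning up"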
> > > >     >    
> > > >     >     By the way, I have also tested with Open MPI 1.10.7, and Open MPI 1.10.7 has different behavior than either Open MPI 1.4.3 or Open MPI 2.1.1.  In this explanation, it is important to know that the aborttest executable sleeps for 20 sec.
> > > >     >    
> > > >     >     When running example 2:
> > > >     >    
> > > >     >     1.4.3:  process 1 immediately aborts.
> > > >     >     1.10.7: process 1 doesn't abort and never stops.
> > > >     >     2.1.1:  process 1 doesn't abort, but stops after it is finished sleeping.
> > > >     >    
> > > >     >     Sincerely,
> > > >     >    
> > > >     >     Ted Sussman
> > > >     >    
> > > >     >     On 15 Jun 2017 at 9:18, r...@open-mpi.org wrote:
> > > >     >
> > > >     >     Here is how the system is working:
> > > >     >    
> > > >     >     Master: each process is put into its own process group upon launch. When we issue a "kill", however, we only issue it to the individual process (instead of the process group that is headed by that child process). This is probably a bug, as I don't believe that is what we intended, but set that aside for now.
> > > >     >    
> > > >     >     2.x: each process is put into its own process group upon launch. When we issue a "kill", we issue it to the process group. Thus, every child proc of that child proc will receive it. IIRC, this was the intended behavior.
> > > >     >    
> > > >     >     It is rather trivial to make the change (it only involves 3 lines of code), but I'm not sure what our intended behavior is supposed to be. Once we clarify that, it is also trivial to add another MCA param (you can never have too many!) to allow you to select the other behavior.
> > > >     >    
> > > >     >
> > > >     >     On Jun 15, 2017, at 5:23 AM, Ted Sussman <ted.sussman@adina.com> wrote:
> > > >     >    
> > > >     >     Hello Gilles,
> > > >     >    
> > > >     >     Thank you for your quick answer.  I confirm that if exec is used, both processes immediately abort.
> > > >     >    
> > > >     >     Now suppose that the line
> > > >     >    
> > > >     >     echo "After aborttest: OMPI_COMM_WORLD_RANK="$OMPI_COMM_WORLD_RANK
> > > >     >    
> > > >     >     is added to the end of dum.sh.
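> > > >     >    
> > > >     >     (Assuming dum.sh otherwise just invokes the executable, the modified script reads:)
> > > >     >    
> > > >     >     #!/bin/sh
> > > >     >     # run the test executable, then report which rank's shell survived
> > > >     >     /home/buildadina/src/aborttest02/aborttest02.exe
> > > >     >     echo "After aborttest: OMPI_COMM_WORLD_RANK="$OMPI_COMM_WORLD_RANK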
> > > >     >    
> > > >     >     If Example 2 is run with Open MPI 1.4.3, the output is
> > > >     >    
> > > >     >     After aborttest: OMPI_COMM_WORLD_RANK=0
> > > >     >    
> > > >     >     which shows that the shell script for the process with rank 0 continues after the abort, but that the shell script for the process with rank 1 does not continue after the abort.
> > > >     >    
> > > >     >     If Example 2 is run with Open MPI 2.1.1, with exec used to invoke aborttest02.exe, then there is no such output, which shows that neither shell script continues after the abort.
> > > >     >    
> > > >     >     I prefer the Open MPI 1.4.3 behavior because our original application depends upon the Open MPI 1.4.3 behavior.  (Our original application will also work if both executables are aborted and both shell scripts continue after the abort.)
> > > >     >    
> > > >     >     It might be too much to expect, but is there a way to recover the Open MPI 1.4.3 behavior using Open MPI 2.1.1?
> > > >     >    
> > > >     >     Sincerely,
> > > >     >    
> > > >     >     Ted Sussman
> > > >     >    
> > > >     >    
> > > >     >     On 15 Jun 2017 at 9:50, Gilles Gouaillardet wrote:
> > > >     >
> > > >     >     Ted,
> > > >     >    
> > > >     >    
> > > >     >     fwiw, the 'master' branch has the behavior you expect.
> > > >     >    
> > > >     >    
> > > >     >     meanwhile, you can simply edit your 'dum.sh' script and replace
> > > >     >    
> > > >     >     /home/buildadina/src/aborttest02/aborttest02.exe
> > > >     >    
> > > >     >     with
> > > >     >    
> > > >     >     exec /home/buildadina/src/aborttest02/aborttest02.exe
> > > >     >    
> > > >     >    
> > > >     >     Cheers,
> > > >     >    
> > > >     >    
> > > >     >     Gilles
> > > >     >    
> > > >     >    
> > > >     >     On 6/15/2017 3:01 AM, Ted Sussman wrote:
> > > >     >     Hello,
> > > >     >    
> > > >     >     My question concerns MPI_ABORT, indirect execution of executables by mpirun, and Open MPI 2.1.1.  When mpirun runs executables directly, MPI_ABORT works as expected, but when mpirun runs executables indirectly, MPI_ABORT does not work as expected.
> > > >     >    
> > > >     >     If Open MPI 1.4.3 is used instead of Open MPI 2.1.1, MPI_ABORT works as expected in all cases.
> > > >     >    
> > > >     >     The examples given below have been simplified as far as possible to show the issues.
> > > >     >    
> > > >     >     ---
> > > >     >    
> > > >     >     Example 1
> > > >     >    
> > > >     >     Consider an MPI job run in the following way:
> > > >     >    
> > > >     >     mpirun ... -app addmpw1
> > > >     >    
> > > >     >     where the appfile addmpw1 lists two executables:
> > > >     >    
> > > >     >     -n 1 -host gulftown ... aborttest02.exe
> > > >     >     -n 1 -host gulftown ... aborttest02.exe
> > > >     >    
> > > >     >     The two executables are executed on the local node gulftown.  aborttest02 calls MPI_ABORT for rank 0, then sleeps.
> > > >     >    
> > > >     >     The above MPI job runs as expected.  Both processes immediately abort when rank 0 calls MPI_ABORT.
> > > >     >    
> > > >     >     ---
> > > >     >    
> > > >     >     Example 2
> > > >     >    
> > > >     >     Now change the above example as follows:
> > > >     >    
> > > >     >     mpirun ... -app addmpw2
> > > >     >    
> > > >     >     where the appfile addmpw2 lists shell scripts:
> > > >     >    
> > > >     >     -n 1 -host gulftown ... dum.sh
> > > >     >     -n 1 -host gulftown ... dum.sh
> > > >     >    
> > > >     >     dum.sh invokes aborttest02.exe.  So aborttest02.exe is executed indirectly by mpirun.
> > > >     >    
> > > >     >     In this case, the MPI job only aborts process 0 when rank 0 calls MPI_ABORT.  Process 1 continues to run.  This behavior is unexpected.
> > > >     >    
> > > >     >     ----
> > > >     >    
> > > >     >     I have attached all files to this E-mail.  Since there are absolute pathnames in the files, to reproduce my findings you will need to update the pathnames in the appfiles and shell scripts.  To run example 1,
> > > >     >    
> > > >     >     sh run1.sh
> > > >     >    
> > > >     >     and to run example 2,
> > > >     >    
> > > >     >     sh run2.sh
> > > >     >    
> > > >     >     ---
> > > >     >    
> > > >     >     I have tested these examples with Open MPI 1.4.3 and 2.0.3.  In Open MPI 1.4.3, both examples work as expected.  Open MPI 2.0.3 has the same behavior as Open MPI 2.1.1.
> > > >     >    
> > > >     >     ---
> > > >     >    
> > > >     >     I would prefer that Open MPI 2.1.1 abort both processes, even when the executables are invoked indirectly by mpirun.  If there is an MCA setting that is needed to make Open MPI 2.1.1 abort both processes, please let me know.
> > > >     >    
> > > >     >    
> > > >     >     Sincerely,
> > > >     >    
> > > >     >     Theodore Sussman
> > > >     >    
> > > >     >    
> > > >     >    
> > > >     >       ---- File information -----------
> > > >     >         File:  config.log.bz2
> > > >     >         Date:  14 Jun 2017, 13:35
> > > >     >         Size:  146548 bytes.
> > > >     >         Type:  Binary
> > > >     >    
> > > >     >    
> > > >     >    
> > > >     >       ---- File information -----------
> > > >     >         File:  ompi_info.bz2
> > > >     >         Date:  14 Jun 2017, 13:35
> > > >     >         Size:  24088 bytes.
> > > >     >         Type:  Binary
> > > >     >    
> > > >     >    
> > > >     >    
> > > >     >       ---- File information -----------
> > > >     >         File:  aborttest02.tgz
> > > >     >         Date:  14 Jun 2017, 13:52
> > > >     >         Size:  4285 bytes.
> > > >     >         Type:  Binary
> > > >     >    
> > > >     >    
> > > >     >
> > > >
> > >
> >
> > -- 
> > Jeff Squyres
> > jsquy...@cisco.com