Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1
Hello Gilles and Ralph,

Thank you for your advice so far. I appreciate the time that you have spent to educate me about the details of Open MPI.

But I think that there is something fundamental that I don't understand. Consider Example 2 run with Open MPI 2.1.1.

mpirun --> shell for process 0 --> executable for process 0 --> MPI calls, MPI_Abort
       --> shell for process 1 --> executable for process 1 --> MPI calls

After MPI_Abort is called, ps shows that both shells are running, and that the executable for process 1 is running (in this case, process 1 is sleeping). And mpirun does not exit until process 1 is finished sleeping.

I cannot reconcile this observed behavior with the statement

> 2.x: each process is put into its own process group upon launch. When we issue a
> "kill", we issue it to the process group. Thus, every child proc of that child proc will
> receive it. IIRC, this was the intended behavior.

I assume that, for my example, there are two process groups. The process group for process 0 contains the shell for process 0 and the executable for process 0; and the process group for process 1 contains the shell for process 1 and the executable for process 1. So what does MPI_ABORT do? MPI_ABORT does not kill the process group for process 0, since the shell for process 0 continues. And MPI_ABORT does not kill the process group for process 1, since both the shell and executable for process 1 continue.

If I hit Ctrl-C after MPI_Abort is called, I get the message

mpirun: abort is already in progress.. hit ctrl-c again to forcibly terminate

but I don't need to hit Ctrl-C again, because mpirun immediately exits.

Can you shed some light on all of this?

Sincerely,

Ted Sussman

On 15 Jun 2017 at 14:44, r...@open-mpi.org wrote:

> You have to understand that we have no way of knowing who is making MPI calls - all we see is the proc that we started, and we know someone of that rank is running (but we have no way of knowing which of the procs you sub-spawned it is).
>
> So the behavior you are seeking only occurred in some earlier release by sheer accident. Nor will you find it portable, as there is no specification directing that behavior.
>
> The behavior I've provided is to either deliver the signal to _all_ child processes (including grandchildren etc.), or _only_ the immediate child of the daemon. It won't do what you describe - kill the MPI proc underneath the shell, but not the shell itself.
>
> What you can eventually do is use PMIx to ask the runtime to selectively deliver signals to pid/procs for you. We don't have that capability implemented just yet, I'm afraid.
>
> Meantime, when I get a chance, I can code an option that will record the pid of the subproc that calls MPI_Init, and then lets you deliver signals to just that proc. No promises as to when that will be done.
>
> > On Jun 15, 2017, at 1:37 PM, Ted Sussman wrote:
> >
> > Hello Ralph,
> >
> > I am just an Open MPI end user, so I will need to wait for the next official release.
> >
> > mpirun --> shell for process 0 --> executable for process 0 --> MPI calls
> >        --> shell for process 1 --> executable for process 1 --> MPI calls
> > ...
> >
> > I guess the question is, should MPI_ABORT kill the executables or the shells? I naively thought that, since it is the executables that make the MPI calls, it is the executables that should be aborted by the call to MPI_ABORT. Since the shells don't make MPI calls, the shells should not be aborted.
> >
> > And users might have several layers of shells in between mpirun and the executable.
> >
> > So now I will look for the latest version of Open MPI that has the 1.4.3 behavior.
> >
> > Sincerely,
> >
> > Ted Sussman
> >
> > On 15 Jun 2017 at 12:31, r...@open-mpi.org wrote:
> >
> > > Yeah, things jittered a little there as we debated the "right" behavior. Generally, when we see that happening it means that a param is required, but somehow we never reached that point.
> > >
> > > See if https://github.com/open-mpi/ompi/pull/3704 helps - if so, I can schedule it for the next 2.x release if the RMs agree to take it
> > >
> > > Ralph
> > >
> > > > On Jun 15, 2017, at 12:20 PM, Ted Sussman wrote:
> > > >
> > > > Thank you for your comments.
> > > >
> > > > Our application relies upon "dum.sh" to clean up after the process exits, either if the process exits normally, or if the process exits abnormally because of MPI_ABORT. If the process group is killed by MPI_ABORT, this clean up will not be performed. If exec is used to launch the executable from dum.sh, then dum.sh is terminated by the exec, so dum.sh cannot perform any clean up.
> > > >
> > > > I supp
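The "shell for process N" layer above is an intermediate wrapper like the dum.sh quoted at the end of this message. For readers following along, here is a minimal hypothetical sketch of such a wrapper -- the executable name ./myprog and the cleanup action are placeholders, not Ted's actual files. It also shows why exec defeats the cleanup: exec replaces the shell, so nothing after it ever runs.

    #!/bin/sh
    # Hypothetical reconstruction of a wrapper like dum.sh: run the real MPI
    # executable as a child process, then clean up after it exits.
    # Using "exec ./myprog" here instead would replace this shell entirely,
    # so the cleanup line below would never run -- the trade-off described
    # in the quoted 12:20 PM message above.
    ./myprog "$@"
    status=$?
    rm -f /tmp/myprog_scratch.$$    # placeholder cleanup action
    exit $status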
Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1
Ted --

Sorry for jumping in late. Here's my $0.02...

In the runtime, we can do 4 things:

1. Kill just the process that we forked.
2. Kill just the process(es) that call back and identify themselves as MPI processes (we don't track this right now, but we could add that functionality).
3. Union of #1 and #2.
4. Kill all processes (to include any intermediate processes that are not included in #1 and #2).

In Open MPI 2.x, #4 is the intended behavior. There may be a bug or two that needs to get fixed (e.g., in your last mail, I don't see offhand why it waits until the MPI process finishes sleeping), but we should be killing the process group, which -- unless any of the descendant processes have explicitly left the process group -- should hit the entire process tree.

Sidenote: there's actually a way to be a bit more aggressive and do a better job of ensuring that we kill *all* processes (via creative use of PR_SET_CHILD_SUBREAPER), but that's basically a future enhancement / optimization.

I think Gilles and Ralph made a good point: if you want to be sure to be able to do cleanup after an MPI process terminates (normally or abnormally), you should trap signals in your intermediate processes to catch what Open MPI's runtime throws, and therefore know that it is time to clean up. Hypothetically, this should work in all versions of Open MPI...?

I think Ralph made a pull request that adds an MCA param to change the default behavior from #4 to #1.

Note, however, that there's a little time between when Open MPI sends the SIGTERM and the SIGKILL, so this solution could be racy. If you find that you're running out of time to clean up, we might be able to make the delay between the SIGTERM and the SIGKILL configurable (e.g., via an MCA param).

> On Jun 16, 2017, at 10:08 AM, Ted Sussman wrote:
>
> Hello Gilles and Ralph,
>
> Thank you for your advice so far. I appreciate the time that you have spent to educate me about the details of Open MPI.
>
> But I think that there is something fundamental that I don't understand. Consider Example 2 run with Open MPI 2.1.1.
>
> mpirun --> shell for process 0 --> executable for process 0 --> MPI calls, MPI_Abort
>        --> shell for process 1 --> executable for process 1 --> MPI calls
>
> After MPI_Abort is called, ps shows that both shells are running, and that the executable for process 1 is running (in this case, process 1 is sleeping). And mpirun does not exit until process 1 is finished sleeping.
>
> I cannot reconcile this observed behavior with the statement
>
> > 2.x: each process is put into its own process group upon launch. When we issue a
> > "kill", we issue it to the process group. Thus, every child proc of that child proc will
> > receive it. IIRC, this was the intended behavior.
>
> I assume that, for my example, there are two process groups. The process group for process 0 contains the shell for process 0 and the executable for process 0; and the process group for process 1 contains the shell for process 1 and the executable for process 1. So what does MPI_ABORT do? MPI_ABORT does not kill the process group for process 0, since the shell for process 0 continues. And MPI_ABORT does not kill the process group for process 1, since both the shell and executable for process 1 continue.
>
> If I hit Ctrl-C after MPI_Abort is called, I get the message
>
> mpirun: abort is already in progress.. hit ctrl-c again to forcibly terminate
>
> but I don't need to hit Ctrl-C again, because mpirun immediately exits.
>
> Can you shed some light on all of this?
>
> Sincerely,
>
> Ted Sussman
>
> On 15 Jun 2017 at 14:44, r...@open-mpi.org wrote:
>
> > You have to understand that we have no way of knowing who is making MPI calls - all we see is the proc that we started, and we know someone of that rank is running (but we have no way of knowing which of the procs you sub-spawned it is).
> >
> > So the behavior you are seeking only occurred in some earlier release by sheer accident. Nor will you find it portable, as there is no specification directing that behavior.
> >
> > The behavior I've provided is to either deliver the signal to _all_ child processes (including grandchildren etc.), or _only_ the immediate child of the daemon. It won't do what you describe - kill the MPI proc underneath the shell, but not the shell itself.
> >
> > What you can eventually do is use PMIx to ask the runtime to selectively deliver signals to pid/procs for you. We don't have that capability implemented just yet, I'm afraid.
> >
> > Meantime, when I get a chance, I can code an option that will record the pid of the subproc that calls MPI_Init, and then lets you deliver signals to just that proc. No promises as to when that will be done.
> >
> > > On Jun 15, 2017, at 1:37 PM, T
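Jeff's suggestion to trap signals in the intermediate processes can be sketched in the wrapper itself. This is a hypothetical example, not code from the thread: the program name, cleanup action, and exit code are placeholders, and it assumes the runtime delivers a catchable SIGTERM before the SIGKILL mentioned above (SIGKILL itself cannot be trapped, so the handler must finish within that window).

    #!/bin/sh
    # Hypothetical signal-aware wrapper, per Jeff's suggestion: clean up on
    # normal exit and when Open MPI's runtime sends SIGTERM during an abort.
    cleanup() {
        rm -f /tmp/myprog_scratch.$$    # placeholder cleanup action (idempotent)
    }
    # 143 = 128 + 15, the conventional exit status for death by SIGTERM.
    # The exit re-triggers the EXIT trap, but cleanup is safe to run twice.
    trap 'cleanup; exit 143' TERM INT
    trap cleanup EXIT
    # Run the executable in the background and wait for it, so the shell can
    # act on a signal immediately instead of only after the child exits.
    ./myprog "$@" &
    wait $!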
Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1
Hello Jeff,

Thanks for your comments.

I am not seeing behavior #4 on the two computers that I have tested on, using Open MPI 2.1.1.

I wonder if you can duplicate my results with the files that I have uploaded.

Regarding what is the "correct" behavior, I am willing to modify my application to correspond to Open MPI's behavior (whatever behavior the Open MPI developers decide is best) -- provided that Open MPI does in fact kill off both shells.

So my highest priority now is to find out why Open MPI 2.1.1 does not kill off both shells on my computer.

Sincerely,

Ted Sussman

On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:

> Ted --
>
> Sorry for jumping in late. Here's my $0.02...
>
> In the runtime, we can do 4 things:
>
> 1. Kill just the process that we forked.
> 2. Kill just the process(es) that call back and identify themselves as MPI processes (we don't track this right now, but we could add that functionality).
> 3. Union of #1 and #2.
> 4. Kill all processes (to include any intermediate processes that are not included in #1 and #2).
>
> In Open MPI 2.x, #4 is the intended behavior. There may be a bug or two that needs to get fixed (e.g., in your last mail, I don't see offhand why it waits until the MPI process finishes sleeping), but we should be killing the process group, which -- unless any of the descendant processes have explicitly left the process group -- should hit the entire process tree.
>
> Sidenote: there's actually a way to be a bit more aggressive and do a better job of ensuring that we kill *all* processes (via creative use of PR_SET_CHILD_SUBREAPER), but that's basically a future enhancement / optimization.
>
> I think Gilles and Ralph made a good point: if you want to be sure to be able to do cleanup after an MPI process terminates (normally or abnormally), you should trap signals in your intermediate processes to catch what Open MPI's runtime throws, and therefore know that it is time to clean up. Hypothetically, this should work in all versions of Open MPI...?
>
> I think Ralph made a pull request that adds an MCA param to change the default behavior from #4 to #1.
>
> Note, however, that there's a little time between when Open MPI sends the SIGTERM and the SIGKILL, so this solution could be racy. If you find that you're running out of time to clean up, we might be able to make the delay between the SIGTERM and the SIGKILL configurable (e.g., via an MCA param).
>
> > On Jun 16, 2017, at 10:08 AM, Ted Sussman wrote:
> >
> > Hello Gilles and Ralph,
> >
> > Thank you for your advice so far. I appreciate the time that you have spent to educate me about the details of Open MPI.
> >
> > But I think that there is something fundamental that I don't understand. Consider Example 2 run with Open MPI 2.1.1.
> >
> > mpirun --> shell for process 0 --> executable for process 0 --> MPI calls, MPI_Abort
> >        --> shell for process 1 --> executable for process 1 --> MPI calls
> >
> > After MPI_Abort is called, ps shows that both shells are running, and that the executable for process 1 is running (in this case, process 1 is sleeping). And mpirun does not exit until process 1 is finished sleeping.
> >
> > I cannot reconcile this observed behavior with the statement
> >
> > > 2.x: each process is put into its own process group upon launch. When we issue a
> > > "kill", we issue it to the process group. Thus, every child proc of that child proc will
> > > receive it. IIRC, this was the intended behavior.
> >
> > I assume that, for my example, there are two process groups. The process group for process 0 contains the shell for process 0 and the executable for process 0; and the process group for process 1 contains the shell for process 1 and the executable for process 1. So what does MPI_ABORT do? MPI_ABORT does not kill the process group for process 0, since the shell for process 0 continues. And MPI_ABORT does not kill the process group for process 1, since both the shell and executable for process 1 continue.
> >
> > If I hit Ctrl-C after MPI_Abort is called, I get the message
> >
> > mpirun: abort is already in progress.. hit ctrl-c again to forcibly terminate
> >
> > but I don't need to hit Ctrl-C again, because mpirun immediately exits.
> >
> > Can you shed some light on all of this?
> >
> > Sincerely,
> >
> > Ted Sussman
> >
> > On 15 Jun 2017 at 14:44, r...@open-mpi.org wrote:
> >
> > > You have to understand that we have no way of knowing who is making MPI calls - all we see is the proc that we started, and we know someone of that rank is running (but we have no way of knowing which of the procs you sub-spawned it is).
> > >
> > > So the behavior you are seeking only occurred in some earlier release by shee
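One way to pin down which of Jeff's four behaviors a given installation actually exhibits is to watch the process tree and process groups while the abort happens. A hedged sketch follows: dum.sh is the wrapper named earlier in this thread, but the exact mpirun arguments of Ted's Example 2 are not shown in the archive, so "-np 2" is an assumption.

    # Terminal 1: run the failing case (rank 0 calls MPI_Abort, rank 1 sleeps).
    mpirun -np 2 ./dum.sh

    # Terminal 2: watch PIDs and process groups before and after the abort.
    # If each rank really gets its own process group (the 2.x claim), each
    # dum.sh and its executable should share a PGID; if behavior #4 worked,
    # none of them should survive the MPI_Abort.
    watch -n 1 "ps -eo pid,pgid,ppid,args | grep -E 'mpirun|dum[.]sh' | grep -v grep"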
Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1
Ted,

if you

mpirun --mca odls_base_verbose 10 ...

you will see which processes get killed, and how.

Best regards,

Gilles

----- Original Message -----
> Hello Jeff,
>
> Thanks for your comments.
>
> I am not seeing behavior #4 on the two computers that I have tested on, using Open MPI 2.1.1.
>
> I wonder if you can duplicate my results with the files that I have uploaded.
>
> Regarding what is the "correct" behavior, I am willing to modify my application to correspond to Open MPI's behavior (whatever behavior the Open MPI developers decide is best) -- provided that Open MPI does in fact kill off both shells.
>
> So my highest priority now is to find out why Open MPI 2.1.1 does not kill off both shells on my computer.
>
> Sincerely,
>
> Ted Sussman
>
> On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
>
> > Ted --
> >
> > Sorry for jumping in late. Here's my $0.02...
> >
> > In the runtime, we can do 4 things:
> >
> > 1. Kill just the process that we forked.
> > 2. Kill just the process(es) that call back and identify themselves as MPI processes (we don't track this right now, but we could add that functionality).
> > 3. Union of #1 and #2.
> > 4. Kill all processes (to include any intermediate processes that are not included in #1 and #2).
> >
> > In Open MPI 2.x, #4 is the intended behavior. There may be a bug or two that needs to get fixed (e.g., in your last mail, I don't see offhand why it waits until the MPI process finishes sleeping), but we should be killing the process group, which -- unless any of the descendant processes have explicitly left the process group -- should hit the entire process tree.
> >
> > Sidenote: there's actually a way to be a bit more aggressive and do a better job of ensuring that we kill *all* processes (via creative use of PR_SET_CHILD_SUBREAPER), but that's basically a future enhancement / optimization.
> >
> > I think Gilles and Ralph made a good point: if you want to be sure to be able to do cleanup after an MPI process terminates (normally or abnormally), you should trap signals in your intermediate processes to catch what Open MPI's runtime throws, and therefore know that it is time to clean up. Hypothetically, this should work in all versions of Open MPI...?
> >
> > I think Ralph made a pull request that adds an MCA param to change the default behavior from #4 to #1.
> >
> > Note, however, that there's a little time between when Open MPI sends the SIGTERM and the SIGKILL, so this solution could be racy. If you find that you're running out of time to clean up, we might be able to make the delay between the SIGTERM and the SIGKILL configurable (e.g., via an MCA param).
> >
> > > On Jun 16, 2017, at 10:08 AM, Ted Sussman wrote:
> > >
> > > Hello Gilles and Ralph,
> > >
> > > Thank you for your advice so far. I appreciate the time that you have spent to educate me about the details of Open MPI.
> > >
> > > But I think that there is something fundamental that I don't understand. Consider Example 2 run with Open MPI 2.1.1.
> > >
> > > mpirun --> shell for process 0 --> executable for process 0 --> MPI calls, MPI_Abort
> > >        --> shell for process 1 --> executable for process 1 --> MPI calls
> > >
> > > After MPI_Abort is called, ps shows that both shells are running, and that the executable for process 1 is running (in this case, process 1 is sleeping). And mpirun does not exit until process 1 is finished sleeping.
> > >
> > > I cannot reconcile this observed behavior with the statement
> > >
> > > > 2.x: each process is put into its own process group upon launch. When we issue a
> > > > "kill", we issue it to the process group. Thus, every child proc of that child proc will
> > > > receive it. IIRC, this was the intended behavior.
> > >
> > > I assume that, for my example, there are two process groups. The process group for process 0 contains the shell for process 0 and the executable for process 0; and the process group for process 1 contains the shell for process 1 and the executable for process 1. So what does MPI_ABORT do? MPI_ABORT does not kill the process group for process 0, since the shell for process 0 continues. And MPI_ABORT does not kill the process group for process 1, since both the shell and executable for process 1 continue.
> > >
> > > If I hit Ctrl-C after MPI_Abort is called, I get the message
> > >
> > > mpirun: abort is already in progress.. hit ctrl-c again to forcibly terminate
> > >
> > > but I don't need to hit Ctrl-C again, because mpirun immediately exits.
> > >
> > > Can you shed some light on all of this?
> > >
> > > Sincerely,
> > >
> > > Ted Sussman
> > >
> > > On 15 Jun 2017 at 14:44, r...@open-mpi.org wrote:
> > >
> > > > You have to understand that we have no way of knowing who is making MPI calls - all we see is
> > > > the proc that we s
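For reference, Gilles' one-liner expands to something like the following. The dum.sh wrapper name comes from earlier in the thread, and "-np 2" again stands in for the actual arguments of Ted's Example 2, which the archive does not show.

    # Re-run the failing case with verbose output from the ODLS framework
    # (the Open MPI runtime component that launches and kills local child
    # processes); the debug stream shows each kill as it is issued.
    mpirun -np 2 --mca odls_base_verbose 10 ./dum.sh 2>&1 | tee odls_debug.log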