If I replace the sleep with an infinite loop, I get the same behavior. One "aborttest" process remains after all the signals are sent.
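A quick way to see which processes survive the abort, and whether the survivor is still in the process group that the signals are sent to, is something like the following (a hypothetical check from a second terminal, not part of the attached test case; the command names match aborttest10):

    # List pid, process-group id, parent and state for the wrapper shells
    # and the test executables while the job is hung after MPI_ABORT.
    ps -C dum.sh,aborttest10.exe -o pid,pgid,ppid,stat,cmd

    # Full process tree for the current user, as Gilles does further down.
    ps -fu "$USER" --forest

    # If the surviving aborttest10.exe reports the same PGID as the dum.sh
    # that was signaled, it should have received SIGTERM/SIGKILL as well;
    # a different PGID would explain why the signals never reach it.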
On 19 Jun 2017 at 10:10, r...@open-mpi.org wrote:

> That is typical behavior when you throw something into "sleep" - not much we can do about it, I think.
>
> On Jun 19, 2017, at 9:58 AM, Ted Sussman <ted.suss...@adina.com> wrote:
>
> Hello,
>
> I have rebuilt Open MPI 2.1.1 on the same computer, including --enable-debug.
>
> I have attached the abort test program aborttest10.tgz. This version sleeps for 5 sec before calling MPI_ABORT, so that I can check the pids using ps.
>
> This is what happens (see run2.sh.out).
>
> Open MPI invokes two instances of dum.sh. Each instance of dum.sh invokes aborttest10.exe.
>
>   Pid    Process
>   -------------------
>   19565  dum.sh
>   19566  dum.sh
>   19567  aborttest10.exe
>   19568  aborttest10.exe
>
> When MPI_ABORT is called, Open MPI sends SIGCONT, SIGTERM and SIGKILL to both instances of dum.sh (pids 19565 and 19566).
>
> ps shows that both shell processes vanish, and that one of the aborttest10.exe processes vanishes. But the other aborttest10.exe remains and continues until it is finished sleeping.
>
> Hope that this information is useful.
>
> Sincerely,
>
> Ted Sussman
>
> On 19 Jun 2017 at 23:06, gil...@rist.or.jp wrote:
>
> Ted,
>
> some traces are missing because you did not configure with --enable-debug. I am afraid you have to do it (and you probably want to install that debug version in another location, since its performance is not good for production) in order to get all the logs.
>
> Cheers,
>
> Gilles
>
> ----- Original Message -----
> Hello Gilles,
>
> I retried my example, with the same results as I observed before. The process with rank 1 does not get killed by MPI_ABORT.
>
> I have attached to this E-mail:
>
>   config.log.bz2
>   ompi_info.bz2 (uses ompi_info -a)
>   aborttest09.tgz
>
> This testing is done on a computer running Linux 3.10.0. This is a different computer than the computer that I previously used for testing. You can confirm that I am using Open MPI 2.1.1.
>
>   tar xvzf aborttest09.tgz
>   cd aborttest09
>   sh run2.sh
>
> run2.sh contains the command
>
>   /opt/openmpi-2.1.1-GNU/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 10 ./dum.sh
>
> The output from this run is in aborttest09/run2.sh.out.
>
> The output shows that the "default" component is selected by odls.
>
> The only messages from odls are: odls: launch spawning child ... (two messages). There are no messages from odls with "kill", and I see no SENDING SIGCONT / SIGKILL messages.
>
> I am not running from within any batch manager.
>
> Sincerely,
>
> Ted Sussman
>
> On 17 Jun 2017 at 16:02, gil...@rist.or.jp wrote:
>
> Ted,
>
> I do not observe the same behavior you describe with Open MPI 2.1.1
>
>   # mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 5 ./abort.sh
>
>   abort.sh 31361 launching abort
>   abort.sh 31362 launching abort
>   I am rank 0 with pid 31363
>   I am rank 1 with pid 31364
>   --------------------------------------------------------------------------
>   MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>   with errorcode 1.
>
>   NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>   You may or may not see output from other processes, depending on
>   exactly when Open MPI kills them.
>   --------------------------------------------------------------------------
>   [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
>   [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],0]
>   [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],0]
>   [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31361 SUCCESS
>   [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],1]
>   [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],1]
>   [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31362 SUCCESS
>   [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],0]
>   [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31361 SUCCESS
>   [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],1]
>   [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31362 SUCCESS
>   [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],0]
>   [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31361 SUCCESS
>   [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],1]
>   [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31362 SUCCESS
>   [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
>   [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],0]
>   [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],0] is not alive
>   [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],1]
>   [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],1] is not alive
>
> Open MPI did kill both shells, and they were indeed killed, as evidenced by ps
>
>   # ps -fu gilles --forest
>   UID        PID   PPID  C STIME TTY       TIME CMD
>   gilles    1564   1561  0 15:39 ?     00:00:01 sshd: gilles@pts/1
>   gilles    1565   1564  0 15:39 pts/1 00:00:00  \_ -bash
>   gilles   31356   1565  3 15:57 pts/1 00:00:00      \_ /home/gilles/local/ompi-v2.x/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base
>   gilles   31364      1  1 15:57 pts/1 00:00:00 ./abort
>
> so trapping SIGTERM in your shell and manually killing the MPI task should work (as Jeff explained, as long as the shell script is fast enough to do that between SIGTERM and SIGKILL)
>
> if you observe a different behavior, please double check your Open MPI version and post the outputs of the same commands.
>
> btw, are you running from a batch manager ? if yes, which one ?
>
> Cheers,
>
> Gilles
>
> ----- Original Message -----
> Ted,
>
> if you
>
>   mpirun --mca odls_base_verbose 10 ...
>
> you will see which processes get killed and how
>
> Best regards,
>
> Gilles
>
> ----- Original Message -----
> Hello Jeff,
>
> Thanks for your comments.
>
> I am not seeing behavior #4, on the two computers that I have tested on, using Open MPI 2.1.1.
>
> I wonder if you can duplicate my results with the files that I have uploaded.
>
> Regarding what is the "correct" behavior, I am willing to modify my application to correspond to Open MPI's behavior (whatever behavior the Open MPI developers decide is best) -- provided that Open MPI does in fact kill off both shells.
>
> So my highest priority now is to find out why Open MPI 2.1.1 does not kill off both shells on my computer.
>
> Sincerely,
>
> Ted Sussman
>
> On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
> Ted --
>
> Sorry for jumping in late. Here's my $0.02...
>
> In the runtime, we can do 4 things:
>
> 1. Kill just the process that we forked.
> 2. Kill just the process(es) that call back and identify themselves as MPI processes (we don't track this right now, but we could add that functionality).
> 3. Union of #1 and #2.
> 4. Kill all processes (to include any intermediate processes that are not included in #1 and #2).
>
> In Open MPI 2.x, #4 is the intended behavior. There may be a bug or two that needs to get fixed (e.g., in your last mail, I don't see offhand why it waits until the MPI process finishes sleeping), but we should be killing the process group, which -- unless any of the descendant processes have explicitly left the process group -- should hit the entire process tree.
>
> Sidenote: there's actually a way to be a bit more aggressive and do a better job of ensuring that we kill *all* processes (via creative use of PR_SET_CHILD_SUBREAPER), but that's basically a future enhancement / optimization.
>
> I think Gilles and Ralph proposed a good point to you: if you want to be sure to be able to do cleanup after an MPI process terminates (normally or abnormally), you should trap signals in your intermediate processes to catch what Open MPI's runtime throws and therefore know that it is time to clean up.
>
> Hypothetically, this should work in all versions of Open MPI...?
>
> I think Ralph made a pull request that adds an MCA param to change the default behavior from #4 to #1.
>
> Note, however, that there's a little time between when Open MPI sends the SIGTERM and the SIGKILL, so this solution could be racy. If you find that you're running out of time to clean up, we might be able to make the delay between the SIGTERM and the SIGKILL configurable (e.g., via an MCA param).
>
> --
> Jeff Squyres
> jsquy...@cisco.com
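A minimal sketch of the kind of intermediate wrapper described above -- hypothetical, not the dum.sh shipped in the attached tarballs -- that traps the SIGTERM sent on MPI_ABORT, forwards it to the MPI child it launched, and runs its cleanup before the follow-up SIGKILL arrives:

    #!/bin/sh
    # Hypothetical wrapper: launch the MPI executable in the background,
    # trap the SIGTERM that the runtime sends on MPI_ABORT, kill the child,
    # clean up, and exit. The window before SIGKILL is short, so the
    # cleanup must be fast (this is the race Jeff mentions above).

    cleanup () {
        # placeholder for whatever per-rank cleanup the application needs
        rm -f /tmp/scratch.$OMPI_COMM_WORLD_RANK
    }

    ./aborttest02.exe &        # assumed name of the MPI executable
    child=$!

    trap 'kill -TERM "$child" 2>/dev/null; cleanup; exit 143' TERM INT

    wait "$child"              # returns early if a trapped signal arrives
    status=$?
    cleanup                    # normal-exit path
    exit "$status"

This only helps if the wrapper itself receives the SIGTERM, which is the case with the 2.x kill-the-process-group behavior discussed in this thread.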
> On Jun 16, 2017, at 10:08 AM, Ted Sussman <ted.suss...@adina.com> wrote:
>
> Hello Gilles and Ralph,
>
> Thank you for your advice so far. I appreciate the time that you have spent to educate me about the details of Open MPI.
>
> But I think that there is something fundamental that I don't understand. Consider Example 2 run with Open MPI 2.1.1.
>
>   mpirun --> shell for process 0 --> executable for process 0 --> MPI calls, MPI_Abort
>          --> shell for process 1 --> executable for process 1 --> MPI calls
>
> After MPI_Abort is called, ps shows that both shells are running, and that the executable for process 1 is running (in this case, process 1 is sleeping). And mpirun does not exit until process 1 is finished sleeping.
>
> I cannot reconcile this observed behavior with the statement
>
>   "2.x: each process is put into its own process group upon launch. When we issue a
>   'kill', we issue it to the process group. Thus, every child proc of that child proc will
>   receive it. IIRC, this was the intended behavior."
>
> I assume that, for my example, there are two process groups. The process group for process 0 contains the shell for process 0 and the executable for process 0; and the process group for process 1 contains the shell for process 1 and the executable for process 1. So what does MPI_ABORT do? MPI_ABORT does not kill the process group for process 0, since the shell for process 0 continues. And MPI_ABORT does not kill the process group for process 1, since both the shell and executable for process 1 continue.
>
> If I hit Ctrl-C after MPI_Abort is called, I get the message
>
>   mpirun: abort is already in progress.. hit ctrl-c again to forcibly terminate
>
> but I don't need to hit Ctrl-C again because mpirun immediately exits.
>
> Can you shed some light on all of this?
>
> Sincerely,
>
> Ted Sussman
>
> On 15 Jun 2017 at 14:44, r...@open-mpi.org wrote:
>
> You have to understand that we have no way of knowing who is making MPI calls - all we see is the proc that we started, and we know someone of that rank is running (but we have no way of knowing which of the procs you sub-spawned it is).
>
> So the behavior you are seeking only occurred in some earlier release by sheer accident. Nor will you find it portable, as there is no specification directing that behavior.
>
> The behavior I've provided is to either deliver the signal to _all_ child processes (including grandchildren etc.), or _only_ the immediate child of the daemon. It won't do what you describe - kill the MPI proc underneath the shell, but not the shell itself.
>
> What you can eventually do is use PMIx to ask the runtime to selectively deliver signals to pids/procs for you. We don't have that capability implemented just yet, I'm afraid.
>
> Meantime, when I get a chance, I can code an option that will record the pid of the subproc that calls MPI_Init, and then lets you deliver signals to just that proc. No promises as to when that will be done.
>
> On Jun 15, 2017, at 1:37 PM, Ted Sussman <ted.sussman@adina.com> wrote:
>
> Hello Ralph,
>
> I am just an Open MPI end user, so I will need to wait for the next official release.
>
>   mpirun --> shell for process 0 --> executable for process 0 --> MPI calls
>          --> shell for process 1 --> executable for process 1 --> MPI calls
>          ...
>
> I guess the question is, should MPI_ABORT kill the executables or the shells? I naively thought that, since it is the executables that make the MPI calls, it is the executables that should be aborted by the call to MPI_ABORT. Since the shells don't make MPI calls, the shells should not be aborted.
>
> And users might have several layers of shells in between mpirun and the executable.
>
> So now I will look for the latest version of Open MPI that has the 1.4.3 behavior.
>
> Sincerely,
>
> Ted Sussman
>
> On 15 Jun 2017 at 12:31, r...@open-mpi.org wrote:
>
> Yeah, things jittered a little there as we debated the "right" behavior. Generally, when we see that happening it means that a param is required, but somehow we never reached that point.
>
> See if https://github.com/open-mpi/ompi/pull/3704 helps - if so, I can schedule it for the next 2.x release if the RMs agree to take it.
>
> Ralph
>
> On Jun 15, 2017, at 12:20 PM, Ted Sussman <ted.sussman@adina.com> wrote:
>
> Thank you for your comments.
>
> Our application relies upon "dum.sh" to clean up after the process exits, either if the process exits normally, or if the process exits abnormally because of MPI_ABORT. If the process group is killed by MPI_ABORT, this cleanup will not be performed. If exec is used to launch the executable from dum.sh, then dum.sh is terminated by the exec, so dum.sh cannot perform any cleanup.
>
> I suppose that other user applications might work similarly, so it would be good to have an MCA parameter to control the behavior of MPI_ABORT.
>
> We could rewrite our shell script that invokes mpirun, so that the cleanup that is now done by dum.sh is done by the invoking shell script after mpirun exits. Perhaps this technique is the preferred way to clean up after mpirun is invoked.
>
> By the way, I have also tested with Open MPI 1.10.7, and Open MPI 1.10.7 has different behavior than either Open MPI 1.4.3 or Open MPI 2.1.1. In this explanation, it is important to know that the aborttest executable sleeps for 20 sec.
>
> When running example 2:
>
>   1.4.3:  process 1 immediately aborts
>   1.10.7: process 1 doesn't abort and never stops
>   2.1.1:  process 1 doesn't abort, but stops after it is finished sleeping
>
> Sincerely,
>
> Ted Sussman
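A sketch of the alternative Ted describes in the message above (cleanup moved out of dum.sh into the script that invokes mpirun), with hypothetical file names:

    #!/bin/sh
    # Hypothetical top-level script: run the job, then perform the cleanup
    # formerly done inside dum.sh once mpirun has exited, whether the job
    # ended normally or via MPI_ABORT.

    mpirun -np 2 -mca btl tcp,self ./dum.sh
    status=$?

    for rank in 0 1; do
        rm -f /tmp/scratch.$rank    # placeholder for the real per-rank cleanup
    done

    exit $status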
> On 15 Jun 2017 at 9:18, r...@open-mpi.org wrote:
>
> Here is how the system is working:
>
> Master: each process is put into its own process group upon launch. When we issue a "kill", however, we only issue it to the individual process (instead of the process group that is headed by that child process). This is probably a bug, as I don't believe that is what we intended, but set that aside for now.
>
> 2.x: each process is put into its own process group upon launch. When we issue a "kill", we issue it to the process group. Thus, every child proc of that child proc will receive it. IIRC, this was the intended behavior.
>
> It is rather trivial to make the change (it only involves 3 lines of code), but I'm not sure what our intended behavior is supposed to be. Once we clarify that, it is also trivial to add another MCA param (you can never have too many!) to allow you to select the other behavior.
>
> On Jun 15, 2017, at 5:23 AM, Ted Sussman <ted.sussman@adina.com> wrote:
>
> Hello Gilles,
>
> Thank you for your quick answer. I confirm that if exec is used, both processes immediately abort.
>
> Now suppose that the line
>
>   echo "After aborttest: OMPI_COMM_WORLD_RANK="$OMPI_COMM_WORLD_RANK
>
> is added to the end of dum.sh.
>
> If Example 2 is run with Open MPI 1.4.3, the output is
>
>   After aborttest: OMPI_COMM_WORLD_RANK=0
>
> which shows that the shell script for the process with rank 0 continues after the abort, but that the shell script for the process with rank 1 does not continue after the abort.
>
> If Example 2 is run with Open MPI 2.1.1, with exec used to invoke aborttest02.exe, then there is no such output, which shows that both shell scripts do not continue after the abort.
>
> I prefer the Open MPI 1.4.3 behavior because our original application depends upon the Open MPI 1.4.3 behavior. (Our original application will also work if both executables are aborted, and if both shell scripts continue after the abort.)
>
> It might be too much to expect, but is there a way to recover the Open MPI 1.4.3 behavior using Open MPI 2.1.1?
>
> Sincerely,
>
> Ted Sussman
>
> On 15 Jun 2017 at 9:50, Gilles Gouaillardet wrote:
>
> Ted,
>
> fwiw, the 'master' branch has the behavior you expect.
>
> meanwhile, you can simply edit your 'dum.sh' script and replace
>
>   /home/buildadina/src/aborttest02/aborttest02.exe
>
> with
>
>   exec /home/buildadina/src/aborttest02/aborttest02.exe
>
> Cheers,
>
> Gilles
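For reference, the exec form of the wrapper that Gilles suggests would look roughly like this (a sketch; note that nothing placed after the exec line ever runs, which is why this form cannot do cleanup inside dum.sh):

    #!/bin/sh
    # dum.sh rewritten to use exec: the shell process is replaced by the MPI
    # executable, so the signals sent on MPI_ABORT reach the MPI process
    # directly and both ranks abort immediately.

    exec /home/buildadina/src/aborttest02/aborttest02.exe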
> On 6/15/2017 3:01 AM, Ted Sussman wrote:
>
> Hello,
>
> My question concerns MPI_ABORT, indirect execution of executables by mpirun, and Open MPI 2.1.1. When mpirun runs executables directly, MPI_ABORT works as expected, but when mpirun runs executables indirectly, MPI_ABORT does not work as expected.
>
> If Open MPI 1.4.3 is used instead of Open MPI 2.1.1, MPI_ABORT works as expected in all cases.
>
> The examples given below have been simplified as far as possible to show the issues.
>
> ---
>
> Example 1
>
> Consider an MPI job run in the following way:
>
>   mpirun ... -app addmpw1
>
> where the appfile addmpw1 lists two executables:
>
>   -n 1 -host gulftown ... aborttest02.exe
>   -n 1 -host gulftown ... aborttest02.exe
>
> The two executables are executed on the local node gulftown. aborttest02 calls MPI_ABORT for rank 0, then sleeps.
>
> The above MPI job runs as expected. Both processes immediately abort when rank 0 calls MPI_ABORT.
>
> ---
>
> Example 2
>
> Now change the above example as follows:
>
>   mpirun ... -app addmpw2
>
> where the appfile addmpw2 lists shell scripts:
>
>   -n 1 -host gulftown ... dum.sh
>   -n 1 -host gulftown ... dum.sh
>
> dum.sh invokes aborttest02.exe. So aborttest02.exe is executed indirectly by mpirun.
>
> In this case, the MPI job only aborts process 0 when rank 0 calls MPI_ABORT. Process 1 continues to run. This behavior is unexpected.
>
> ---
>
> I have attached all files to this E-mail. Since there are absolute pathnames in the files, to reproduce my findings, you will need to update the pathnames in the appfiles and shell scripts. To run example 1,
>
>   sh run1.sh
>
> and to run example 2,
>
>   sh run2.sh
>
> ---
>
> I have tested these examples with Open MPI 1.4.3 and 2.0.3. In Open MPI 1.4.3, both examples work as expected. Open MPI 2.0.3 has the same behavior as Open MPI 2.1.1.
>
> ---
>
> I would prefer that Open MPI 2.1.1 aborts both processes, even when the executables are invoked indirectly by mpirun. If there is an MCA setting that is needed to make Open MPI 2.1.1 abort both processes, please let me know.
>
> Sincerely,
>
> Theodore Sussman
> [Attachments to the quoted messages: config.log.bz2, ompi_info.bz2, aborttest02.tgz, aborttest10.tgz]

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users