Ted,
fwiw, the 'master' branch has the behavior you expect.
meanwhile, you can simply edit your 'dum.sh' script and replace
/home/buildadina/src/aborttest02/aborttest02.exe
with
exec /home/buildadina/src/aborttest02/aborttest02.exe
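For what it's worth, here is a sketch of why `exec` makes a difference (my illustration, not taken from the attached files): `exec` replaces the wrapper shell's process image with the command instead of forking a child, so the PID that mpirun launched (and signals during an abort) is the MPI executable itself. This self-contained demo prints the same PID twice, showing that no new process is created:

```shell
# Print the wrapper shell's PID, then exec a new shell that prints its
# own PID. Because exec replaces the process image without forking,
# both numbers are identical.
pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
echo "$pids"
```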
Cheers,
Gilles
On 6/15/2017 3:01 AM, Ted Sussman wrote:
Hello,
My question concerns MPI_ABORT, indirect execution of executables by mpirun, and
Open MPI 2.1.1. When mpirun runs executables directly, MPI_ABORT works as
expected, but when mpirun runs executables indirectly, MPI_ABORT does not work
as expected. If Open MPI 1.4.3 is used instead of Open MPI 2.1.1, MPI_ABORT
works as expected in all cases.
The examples given below have been simplified as far as possible to show the
issues.
---
Example 1
Consider an MPI job run in the following way:
mpirun ... -app addmpw1
where the appfile addmpw1 lists two executables:
-n 1 -host gulftown ... aborttest02.exe
-n 1 -host gulftown ... aborttest02.exe
The two executables are executed on the local node gulftown. aborttest02 calls
MPI_ABORT for rank 0, then sleeps.
The above MPI job runs as expected: both processes abort immediately when
rank 0 calls MPI_ABORT.
---
Example 2
Now change the above example as follows:
mpirun ... -app addmpw2
where the appfile addmpw2 lists shell scripts:
-n 1 -host gulftown ... dum.sh
-n 1 -host gulftown ... dum.sh
dum.sh invokes aborttest02.exe, so aborttest02.exe is executed indirectly by
mpirun.
In this case, only process 0 aborts when rank 0 calls MPI_ABORT; process 1
continues to run. This behavior is unexpected.
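The symptom can be reproduced in isolation. My guess (an assumption on my part, not verified against the Open MPI sources) is that during the abort the runtime signals only the process it launched directly, i.e. the wrapper shell, and the shell's child survives. A minimal sketch of that effect, independent of the attached files:

```shell
# A wrapper shell forks 'sleep 30' as its child; the trailing ':' keeps
# the shell from exec-ing the last command as an optimization, so a real
# child process is forked.
sh -c 'sleep 30; :' &
wrapper=$!
sleep 1                       # let the wrapper fork its child
kill "$wrapper"               # terminate only the wrapper shell
sleep 1
survivors=$(pgrep -f 'sleep 30' || true)
# The orphaned child is still running even though its parent is gone.
[ -n "$survivors" ] && echo "child outlived the wrapper"
pkill -f 'sleep 30'           # clean up the orphan
```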
---
I have attached all files to this E-mail. Since the files contain absolute
pathnames, you will need to update the pathnames in the appfiles and shell
scripts to reproduce my findings. To run example 1:
sh run1.sh
and to run example 2:
sh run2.sh
---
I have tested these examples with Open MPI 1.4.3 and 2.0.3. In Open MPI 1.4.3,
both examples work as expected. Open MPI 2.0.3 has the same behavior as
Open MPI 2.1.1.
---
I would prefer that Open MPI 2.1.1 abort both processes, even when the
executables are invoked indirectly by mpirun. If there is an MCA setting
needed to make Open MPI 2.1.1 abort both processes, please let me know.
Sincerely,
Theodore Sussman
The following section of this message contains a file attachment
prepared for transmission using the Internet MIME message format.
If you are using Pegasus Mail, or any other MIME-compliant system,
you should be able to save it or view it from within your mailer.
If you cannot, please ask your system administrator for assistance.
---- File information -----------
File: config.log.bz2
Date: 14 Jun 2017, 13:35
Size: 146548 bytes.
Type: Binary
---- File information -----------
File: ompi_info.bz2
Date: 14 Jun 2017, 13:35
Size: 24088 bytes.
Type: Binary
---- File information -----------
File: aborttest02.tgz
Date: 14 Jun 2017, 13:52
Size: 4285 bytes.
Type: Binary
_______________________________________________
users mailing list
[email protected]
https://rfd.newmexicoconsortium.org/mailman/listinfo/users