The result should be the same with v3.1.1. I will investigate on our Coral test
systems.
-Nathan
On Jul 02, 2018, at 02:23 PM, "Hammond, Simon David via users"
<users@lists.open-mpi.org> wrote:
Howard,
This fixed the issue with OpenMPI 3.1.0. Do you want me to try the same with
3.1.1 as well?
S.
--
Si Hammond
Scalable Computer Architectures
Sandia National Laboratories, NM, USA
From: users <users-boun...@lists.open-mpi.org> on behalf of Howard Pritchard
<hpprit...@gmail.com>
Reply-To: Open MPI Users <users@lists.open-mpi.org>
Date: Monday, July 2, 2018 at 1:34 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] [EXTERNAL] Re: OpenMPI 3.1.0 Lock Up on POWER9 w/
CUDA9.2
Hi Si,
Could you add --disable-builtin-atomics
to the configure options and see if the hang goes away?
Howard
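[Editor's sketch] For anyone following along, the rebuild Howard suggests could look roughly like the following. It reuses the configure line from Si's original report quoted later in this thread and only appends the one extra option. The `make -j 8` parallelism level is an arbitrary assumption, and the commands assume you are at the top of the unpacked openmpi-3.1.0 source tree with CUDA_ROOT already set.

```shell
# Sketch only: original configure options from the report, plus the flag
# suggested above. Adjust the prefix and paths for your own system.
./configure \
  --prefix=/home/projects/ppc64le-pwr9-nvidia/openmpi/3.1.0-nomxm/gcc/7.2.0/cuda/9.2.88 \
  --with-cuda="$CUDA_ROOT" --enable-mpi-java --enable-java \
  --with-lsf=/opt/lsf/10.1 \
  --with-lsf-libdir=/opt/lsf/10.1/linux3.10-glibc2.17-ppc64le/lib --with-verbs \
  --disable-builtin-atomics

make -j 8     # parallelism level is an arbitrary choice
make check    # re-runs the test suite, including test/class/opal_fifo
```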
2018-07-02 8:48 GMT-06:00 Jeff Squyres (jsquyres) via users
<users@lists.open-mpi.org>:
Simon --
You don't currently have another Open MPI installation in your PATH /
LD_LIBRARY_PATH, do you?
I have seen dependency library loads cause "make check" to get confused, and
instead of loading the libraries from the build tree, actually load some -- but not all
-- of the required OMPI/ORTE/OPAL/etc. libraries from an installation tree. Hilarity
ensues (to include symptoms such as running forever).
Can you double check that you have no Open MPI libraries in your LD_LIBRARY_PATH before
running "make check" on the build tree?
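[Editor's sketch] One possible way to do that check is to walk each directory in LD_LIBRARY_PATH and list anything that looks like an Open MPI library. The helper name `find_ompi_libs` is hypothetical, and the file name patterns (libmpi, libopen-rte, libopen-pal) are an assumption based on the usual Open MPI 3.x library names; an installation with renamed libraries would need different patterns.

```shell
# find_ompi_libs (hypothetical helper): print Open MPI shared libraries found
# in a colon-separated search path. No output means nothing was found.
find_ompi_libs() {
  search_path=$1
  old_ifs=$IFS
  IFS=:                      # split the path on colons
  for dir in $search_path; do
    [ -d "$dir" ] || continue
    for lib in "$dir"/libmpi.so* "$dir"/libopen-rte.so* "$dir"/libopen-pal.so*; do
      # An unmatched glob stays literal, so test that the file really exists.
      if [ -e "$lib" ]; then
        echo "$lib"
      fi
    done
  done
  IFS=$old_ifs
}

# Check the current environment before running "make check".
find_ompi_libs "${LD_LIBRARY_PATH:-}"
```

If this prints any paths, removing those directories from LD_LIBRARY_PATH (for the shell that runs "make check") would rule out the mixed-library scenario described above.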
On Jun 30, 2018, at 3:18 PM, Hammond, Simon David via users
<users@lists.open-mpi.org> wrote:
Nathan,
Same issue with OpenMPI 3.1.1 on POWER9 with GCC 7.2.0 and CUDA9.2.
S.
--
Si Hammond
Scalable Computer Architectures
Sandia National Laboratories, NM, USA
[Sent from remote connection, excuse typos]
On 6/16/18, 10:10 PM, "Nathan Hjelm" <hje...@me.com> wrote:
Try the latest nightly tarball for v3.1.x. Should be fixed.
On Jun 16, 2018, at 5:48 PM, Hammond, Simon David via users
<users@lists.open-mpi.org> wrote:
The output from the test in question is:
Single thread test. Time: 0 s 10182 us 10 nsec/poppush
Atomics thread finished. Time: 0 s 169028 us 169 nsec/poppush
<then runs forever>
S.
--
Si Hammond
Scalable Computer Architectures
Sandia National Laboratories, NM, USA
[Sent from remote connection, excuse typos]
On 6/16/18, 5:45 PM, "Hammond, Simon David" <sdha...@sandia.gov> wrote:
Hi OpenMPI Team,
We have recently updated an install of OpenMPI on a POWER9 system (configuration details
below). We migrated from OpenMPI 2.1 to OpenMPI 3.1. We seem to have a symptom where code
that ran before is now locking up and making no progress, getting stuck in wait-all
operations. While I think it's prudent for us to root-cause this a little more, I have
gone back and rebuilt MPI and re-run the "make check" tests. The opal_fifo test
appears to hang forever. I am not sure if this is the cause of our issue, but I wanted to
report that we are seeing this on our system.
OpenMPI 3.1.0 Configuration:
./configure
--prefix=/home/projects/ppc64le-pwr9-nvidia/openmpi/3.1.0-nomxm/gcc/7.2.0/cuda/9.2.88
--with-cuda=$CUDA_ROOT --enable-mpi-java --enable-java
--with-lsf=/opt/lsf/10.1
--with-lsf-libdir=/opt/lsf/10.1/linux3.10-glibc2.17-ppc64le/lib --with-verbs
GCC versions are 7.2.0, built by our team. CUDA is 9.2.88 from NVIDIA for
POWER9 (standard download from their website). We enable IBM's JDK 8.0.0.
RedHat: Red Hat Enterprise Linux Server release 7.5 (Maipo)
Output:
make[3]: Entering directory `/home/sdhammo/openmpi/openmpi-3.1.0/test/class'
make[4]: Entering directory `/home/sdhammo/openmpi/openmpi-3.1.0/test/class'
PASS: ompi_rb_tree
PASS: opal_bitmap
PASS: opal_hash_table
PASS: opal_proc_table
PASS: opal_tree
PASS: opal_list
PASS: opal_value_array
PASS: opal_pointer_array
PASS: opal_lifo
<runs forever>
Output from Top:
20 0 73280 4224 2560 S 800.0 0.0 17:22.94 lt-opal_fifo
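[Editor's sketch] Since the test spins indefinitely, one way to reproduce it without tying up a session is to run it under a time limit. The sketch below assumes GNU coreutils `timeout` is available, which exits with status 124 when the limit is reached. The helper name `run_with_limit` and the 120-second limit are hypothetical choices, not anything from the Open MPI test suite.

```shell
# run_with_limit (hypothetical helper): run a command, but stop it after the
# given number of seconds so a hung test cannot block the session.
run_with_limit() {
  limit=$1
  shift
  timeout "$limit" "$@"
  status=$?
  # GNU timeout reports 124 when it had to stop the command.
  if [ "$status" -eq 124 ]; then
    echo "TIMEOUT after ${limit}s: $*"
  fi
  return "$status"
}

# Example use from the test/class directory shown in the log above
# (the 120-second limit is an arbitrary assumption):
#   cd /home/sdhammo/openmpi/openmpi-3.1.0/test/class
#   run_with_limit 120 ./opal_fifo
```

A non-124 exit status is simply the status of the test itself, so a passing run still returns 0.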
--
Si Hammond
Scalable Computer Architectures
Sandia National Laboratories, NM, USA
[Sent from remote connection, excuse typos]
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
--
Jeff Squyres
jsquy...@cisco.com