Howard,

This fixed the issue with OpenMPI 3.1.0. Do you want me to try the same with 3.1.1 as well?
S.

--
Si Hammond
Scalable Computer Architectures
Sandia National Laboratories, NM, USA

From: users <users-boun...@lists.open-mpi.org> on behalf of Howard Pritchard <hpprit...@gmail.com>
Reply-To: Open MPI Users <users@lists.open-mpi.org>
Date: Monday, July 2, 2018 at 1:34 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] [EXTERNAL] Re: OpenMPI 3.1.0 Lock Up on POWER9 w/ CUDA9.2

Hi Si,

Could you add --disable-builtin-atomics to the configure options and see if the hang goes away?

Howard

2018-07-02 8:48 GMT-06:00 Jeff Squyres (jsquyres) via users <users@lists.open-mpi.org>:

Simon --

You don't currently have another Open MPI installation in your PATH / LD_LIBRARY_PATH, do you?

I have seen dependency library loads cause "make check" to get confused and, instead of loading the libraries from the build tree, actually load some -- but not all -- of the required OMPI/ORTE/OPAL/etc. libraries from an installation tree. Hilarity ensues (to include symptoms such as running forever).

Can you double check that you have no Open MPI libraries in your LD_LIBRARY_PATH before running "make check" on the build tree?

> On Jun 30, 2018, at 3:18 PM, Hammond, Simon David via users <users@lists.open-mpi.org> wrote:
>
> Nathan,
>
> Same issue with OpenMPI 3.1.1 on POWER9 with GCC 7.2.0 and CUDA 9.2.
>
> S.
>
> --
> Si Hammond
> Scalable Computer Architectures
> Sandia National Laboratories, NM, USA
> [Sent from remote connection, excuse typos]
>
> On 6/16/18, 10:10 PM, "Nathan Hjelm" <hje...@me.com> wrote:
>
> Try the latest nightly tarball for v3.1.x. Should be fixed.
>
>> On Jun 16, 2018, at 5:48 PM, Hammond, Simon David via users <users@lists.open-mpi.org> wrote:
>>
>> The output from the test in question is:
>>
>> Single thread test. Time: 0 s 10182 us 10 nsec/poppush
>> Atomics thread finished. Time: 0 s 169028 us 169 nsec/poppush
>> <then runs forever>
>>
>> S.
>>
>> --
>> Si Hammond
>> Scalable Computer Architectures
>> Sandia National Laboratories, NM, USA
>> [Sent from remote connection, excuse typos]
>>
>> On 6/16/18, 5:45 PM, "Hammond, Simon David" <sdha...@sandia.gov> wrote:
>>
>> Hi OpenMPI Team,
>>
>> We have recently updated an install of OpenMPI on a POWER9 system (configuration details below). We migrated from OpenMPI 2.1 to OpenMPI 3.1. We seem to have a symptom where code that ran before is now locking up and making no progress, getting stuck in wait-all operations. While I think it's prudent for us to root-cause this a little more, I have gone back, rebuilt MPI, and re-run the "make check" tests. The opal_fifo test appears to hang forever. I am not sure if this is the cause of our issue but wanted to report that we are seeing this on our system.
>>
>> OpenMPI 3.1.0 configuration:
>>
>> ./configure --prefix=/home/projects/ppc64le-pwr9-nvidia/openmpi/3.1.0-nomxm/gcc/7.2.0/cuda/9.2.88 --with-cuda=$CUDA_ROOT --enable-mpi-java --enable-java --with-lsf=/opt/lsf/10.1 --with-lsf-libdir=/opt/lsf/10.1/linux3.10-glibc2.17-ppc64le/lib --with-verbs
>>
>> GCC version is 7.2.0, built by our team. CUDA is 9.2.88 from NVIDIA for POWER9 (standard download from their website). We enable IBM's JDK 8.0.0.
>> RedHat: Red Hat Enterprise Linux Server release 7.5 (Maipo)
>>
>> Output:
>>
>> make[3]: Entering directory `/home/sdhammo/openmpi/openmpi-3.1.0/test/class'
>> make[4]: Entering directory `/home/sdhammo/openmpi/openmpi-3.1.0/test/class'
>> PASS: ompi_rb_tree
>> PASS: opal_bitmap
>> PASS: opal_hash_table
>> PASS: opal_proc_table
>> PASS: opal_tree
>> PASS: opal_list
>> PASS: opal_value_array
>> PASS: opal_pointer_array
>> PASS: opal_lifo
>> <runs forever>
>>
>> Output from top:
>>
>> 20 0 73280 4224 2560 S 800.0 0.0 17:22.94 lt-opal_fifo
>>
>> --
>> Si Hammond
>> Scalable Computer Architectures
>> Sandia National Laboratories, NM, USA
>> [Sent from remote connection, excuse typos]
>>
>> _______________________________________________
>> users mailing list
>> users@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/users

--
Jeff Squyres
jsquy...@cisco.com
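Jeff's sanity check can be scripted as a small shell sketch. The helper name find_ompi_libs is ours (hypothetical), not an Open MPI tool; the library name patterns are the usual OMPI/ORTE/OPAL shared-library prefixes.

```shell
# Sketch of the LD_LIBRARY_PATH check Jeff describes: make sure no
# installed Open MPI libraries can shadow the build tree's own copies
# during "make check".
find_ompi_libs() {
  # Walk each directory in a colon-separated path list and print any
  # Open MPI runtime libraries found there.
  echo "$1" | tr ':' '\n' | while read -r dir; do
    [ -d "$dir" ] && ls "$dir"/libmpi* "$dir"/libopen-pal* "$dir"/libopen-rte* 2>/dev/null
  done
  # Always succeed; callers inspect the printed output.
  return 0
}

# Before running "make check", verify nothing is found:
if [ -z "$(find_ompi_libs "$LD_LIBRARY_PATH")" ]; then
  echo "LD_LIBRARY_PATH is clean; safe to run make check"
fi
```

If the function prints anything, those directories should be removed from LD_LIBRARY_PATH before re-running the tests.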
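For reference, the workaround Si confirms at the top of the thread amounts to reconfiguring with Howard's flag appended. A minimal sketch, assuming the same source tree and configure line quoted above (paths are the poster's, not generic):

```shell
# Rebuild OpenMPI 3.1.0 with compiler builtin atomics disabled
# (Howard's suggested workaround for the opal_fifo hang).
cd /home/sdhammo/openmpi/openmpi-3.1.0
./configure \
  --prefix=/home/projects/ppc64le-pwr9-nvidia/openmpi/3.1.0-nomxm/gcc/7.2.0/cuda/9.2.88 \
  --with-cuda=$CUDA_ROOT \
  --enable-mpi-java --enable-java \
  --with-lsf=/opt/lsf/10.1 \
  --with-lsf-libdir=/opt/lsf/10.1/linux3.10-glibc2.17-ppc64le/lib \
  --with-verbs \
  --disable-builtin-atomics
make -j 8 && make check   # opal_fifo should now complete rather than hang
```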