Hi Si,

Could you add --disable-builtin-atomics to the configure options and see if the hang goes away?
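
For reference, that would be your configure line from below with the flag appended (a sketch; prefix and paths exactly as in your report):

  ./configure --prefix=/home/projects/ppc64le-pwr9-nvidia/openmpi/3.1.0-nomxm/gcc/7.2.0/cuda/9.2.88 \
      --with-cuda=$CUDA_ROOT --enable-mpi-java --enable-java \
      --with-lsf=/opt/lsf/10.1 \
      --with-lsf-libdir=/opt/lsf/10.1/linux3.10-glibc2.17-ppc64le/lib \
      --with-verbs \
      --disable-builtin-atomics

If I remember right, that flag makes Open MPI fall back to its own inline-assembly atomics instead of the compiler's builtin atomics, which are exactly what the opal_fifo test hammers on.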

Howard


2018-07-02 8:48 GMT-06:00 Jeff Squyres (jsquyres) via users <users@lists.open-mpi.org>:

> Simon --
>
> You don't currently have another Open MPI installation in your PATH /
> LD_LIBRARY_PATH, do you?
>
> I have seen dependency library loads cause "make check" to get confused,
> and instead of loading the libraries from the build tree, actually load
> some -- but not all -- of the required OMPI/ORTE/OPAL/etc. libraries from
> an installation tree.  Hilarity ensues (including symptoms such as running
> forever).
>
> Can you double check that you have no Open MPI libraries in your
> LD_LIBRARY_PATH before running "make check" on the build tree?
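>
> For instance, something like this (a sketch, assuming a bash-like shell):
>
>     # list any Open MPI-ish entries currently on the library path
>     echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep -i ompi
>
>     # then re-run the tests with the variable cleared for just this command
>     env LD_LIBRARY_PATH= make check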
>
>
>
> > On Jun 30, 2018, at 3:18 PM, Hammond, Simon David via users <users@lists.open-mpi.org> wrote:
> >
> > Nathan,
> >
> > Same issue with OpenMPI 3.1.1 on POWER9 with GCC 7.2.0 and CUDA 9.2.
> >
> > S.
> >
> > --
> > Si Hammond
> > Scalable Computer Architectures
> > Sandia National Laboratories, NM, USA
> > [Sent from remote connection, excuse typos]
> >
> >
> > On 6/16/18, 10:10 PM, "Nathan Hjelm" <hje...@me.com> wrote:
> >
> >    Try the latest nightly tarball for v3.1.x. Should be fixed.
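> >
> >    For example (a sketch -- the snapshot filename is hypothetical, since
> >    nightly names change; check the v3.1.x listing on the Open MPI download
> >    pages for the current one):
> >
> >        wget https://www.open-mpi.org/nightly/v3.1.x/openmpi-v3.1.x-YYYYMMDD-NNNN.tar.gz
> >        tar xzf openmpi-v3.1.x-YYYYMMDD-NNNN.tar.gz
> >        cd openmpi-v3.1.x-YYYYMMDD-NNNN && ./configure ... && make check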
> >
> >> On Jun 16, 2018, at 5:48 PM, Hammond, Simon David via users <users@lists.open-mpi.org> wrote:
> >>
> >> The output from the test in question is:
> >>
> >> Single thread test. Time: 0 s 10182 us 10 nsec/poppush
> >> Atomics thread finished. Time: 0 s 169028 us 169 nsec/poppush
> >> <then runs forever>
> >>
> >> S.
> >>
> >> --
> >> Si Hammond
> >> Scalable Computer Architectures
> >> Sandia National Laboratories, NM, USA
> >> [Sent from remote connection, excuse typos]
> >>
> >>
> >> On 6/16/18, 5:45 PM, "Hammond, Simon David" <sdha...@sandia.gov> wrote:
> >>
> >>   Hi OpenMPI Team,
> >>
> >>   We have recently updated an install of OpenMPI on a POWER9 system (configuration details below). We migrated from OpenMPI 2.1 to OpenMPI 3.1. We seem to have a symptom where code that ran before is now locking up and making no progress, getting stuck in wait-all operations. While I think it's prudent for us to root-cause this a little more, I have gone back, rebuilt MPI, and re-run the "make check" tests. The opal_fifo test appears to hang forever. I am not sure if this is the cause of our issue, but I wanted to report that we are seeing this on our system.
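> >>
> >>   (In case it helps to reproduce: once "make check" has built the test
> >>   binaries, the hanging test can be run on its own from the build tree --
> >>   a sketch:
> >>
> >>       cd test/class && ./opal_fifo
> >>
> >>   this is the lt-opal_fifo libtool-wrapped process that shows up in the
> >>   top output below.)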
> >>
> >>   OpenMPI 3.1.0 Configuration:
> >>
> >>   ./configure --prefix=/home/projects/ppc64le-pwr9-nvidia/openmpi/3.1.0-nomxm/gcc/7.2.0/cuda/9.2.88 --with-cuda=$CUDA_ROOT --enable-mpi-java --enable-java --with-lsf=/opt/lsf/10.1 --with-lsf-libdir=/opt/lsf/10.1/linux3.10-glibc2.17-ppc64le/lib --with-verbs
> >>
> >>   GCC is 7.2.0, built by our team. CUDA is 9.2.88 from NVIDIA for POWER9 (the standard download from their website). We use IBM's JDK 8.0.0.
> >>   RedHat: Red Hat Enterprise Linux Server release 7.5 (Maipo)
> >>
> >>   Output:
> >>
> >>   make[3]: Entering directory `/home/sdhammo/openmpi/openmpi-3.1.0/test/class'
> >>   make[4]: Entering directory `/home/sdhammo/openmpi/openmpi-3.1.0/test/class'
> >>   PASS: ompi_rb_tree
> >>   PASS: opal_bitmap
> >>   PASS: opal_hash_table
> >>   PASS: opal_proc_table
> >>   PASS: opal_tree
> >>   PASS: opal_list
> >>   PASS: opal_value_array
> >>   PASS: opal_pointer_array
> >>   PASS: opal_lifo
> >>   <runs forever>
> >>
> >>   Output from top:
> >>
> >>   20   0   73280   4224   2560 S 800.0  0.0  17:22.94 lt-opal_fifo
> >>
> >>   --
> >>   Si Hammond
> >>   Scalable Computer Architectures
> >>   Sandia National Laboratories, NM, USA
> >>   [Sent from remote connection, excuse typos]
> >>
> >>
> >>
> >>
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>