Nathan,

Same issue with OpenMPI 3.1.1 on POWER9 with GCC 7.2.0 and CUDA 9.2.

S.

-- 
Si Hammond
Scalable Computer Architectures
Sandia National Laboratories, NM, USA
[Sent from remote connection, excuse typos]
 

On 6/16/18, 10:10 PM, "Nathan Hjelm" <hje...@me.com> wrote:

    Try the latest nightly tarball for v3.1.x. Should be fixed. 
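
    [Archive note: for anyone wanting to repeat this, a rough build-and-check
    cycle against a snapshot might look like the sketch below. The tarball
    name is a placeholder, not a real file; use whichever v3.1.x nightly
    snapshot the Open MPI download page currently lists.]

```shell
# Placeholder tarball name -- substitute the actual v3.1.x nightly
# snapshot listed on the Open MPI download page.
tar xjf openmpi-v3.1.x-nightly.tar.bz2
cd openmpi-v3.1.x-nightly
./configure --prefix=$HOME/ompi-nightly-test
make -j 16
make check   # watch whether opal_fifo now completes instead of hanging
```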
    
    > On Jun 16, 2018, at 5:48 PM, Hammond, Simon David via users <users@lists.open-mpi.org> wrote:
    > 
    > The output from the test in question is:
    > 
    > Single thread test. Time: 0 s 10182 us 10 nsec/poppush
    > Atomics thread finished. Time: 0 s 169028 us 169 nsec/poppush
    > <then runs forever>
    > 
    > S.
    > 
    > -- 
    > Si Hammond
    > Scalable Computer Architectures
    > Sandia National Laboratories, NM, USA
    > [Sent from remote connection, excuse typos]
    > 
    > 
    > On 6/16/18, 5:45 PM, "Hammond, Simon David" <sdha...@sandia.gov> wrote:
    > 
    >    Hi OpenMPI Team,
    > 
    >    We have recently updated an install of OpenMPI on a POWER9 system
    >    (configuration details below). We migrated from OpenMPI 2.1 to OpenMPI 3.1.
    >    We seem to have a symptom where code that ran before is now locking up and
    >    making no progress, getting stuck in wait-all operations. While I think it's
    >    prudent for us to root-cause this a little more, I have gone back, rebuilt
    >    MPI, and re-run the "make check" tests. The opal_fifo test appears to hang
    >    forever. I am not sure if this is the cause of our issue, but I wanted to
    >    report that we are seeing it on our system.
    > 
    >    OpenMPI 3.1.0 Configuration:
    > 
    >    ./configure \
    >        --prefix=/home/projects/ppc64le-pwr9-nvidia/openmpi/3.1.0-nomxm/gcc/7.2.0/cuda/9.2.88 \
    >        --with-cuda=$CUDA_ROOT --enable-mpi-java --enable-java \
    >        --with-lsf=/opt/lsf/10.1 \
    >        --with-lsf-libdir=/opt/lsf/10.1/linux3.10-glibc2.17-ppc64le/lib \
    >        --with-verbs
    > 
    >    GCC is version 7.2.0, built by our team. CUDA is 9.2.88 from NVIDIA for
    >    POWER9 (the standard download from their website). We use IBM's JDK 8.0.0.
    >    RedHat: Red Hat Enterprise Linux Server release 7.5 (Maipo)
    > 
    >    Output:
    > 
    >    make[3]: Entering directory `/home/sdhammo/openmpi/openmpi-3.1.0/test/class'
    >    make[4]: Entering directory `/home/sdhammo/openmpi/openmpi-3.1.0/test/class'
    >    PASS: ompi_rb_tree
    >    PASS: opal_bitmap
    >    PASS: opal_hash_table
    >    PASS: opal_proc_table
    >    PASS: opal_tree
    >    PASS: opal_list
    >    PASS: opal_value_array
    >    PASS: opal_pointer_array
    >    PASS: opal_lifo
    >    <runs forever>
    > 
    >    Output from Top:
    > 
    >    20   0   73280   4224   2560 S 800.0  0.0  17:22.94 lt-opal_fifo
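
    [Archive note: if it helps with root-causing, dumping backtraces from the
    spinning process usually shows where the threads are stuck. The sketch
    below assumes the libtool wrapper process name shown in the top output;
    it is a diagnostic suggestion, not something from the original thread.]

```shell
# Attach gdb to the spinning test and dump every thread's stack.
# Assumes a process matching the "lt-opal_fifo" name from top is running.
pid=$(pgrep -f lt-opal_fifo | head -n 1)
gdb -p "$pid" -batch -ex 'thread apply all bt'
```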
    > 
    >    -- 
    >    Si Hammond
    >    Scalable Computer Architectures
    >    Sandia National Laboratories, NM, USA
    >    [Sent from remote connection, excuse typos]
    > 
    > 
    > 
    > 
    > _______________________________________________
    > users mailing list
    > users@lists.open-mpi.org
    > https://lists.open-mpi.org/mailman/listinfo/users
    

