Re: [OMPI users] UCX errors after upgrade

2019-10-03 Thread Geoffrey Paulsen via users
To address Raymond Muno's question: "Any estimate on when OpenMPI 4.2 will be released?" Howard and I are the release managers for v4.0.2. We expect to publish it tomorrow, assuming no show-stopper bugs come up in testing this week. ---Geoff

Re: [OMPI users] UCX errors after upgrade

2019-10-02 Thread Raymond Muno via users
We are now using OpenMPI 4.0.2RC2 and RC3, compiled with Intel, PGI, and GCC, against MLNX_OFED 4.7 (released a couple of days ago), which supplies UCX 1.7. So far, things seem to be working well. Any estimate on when OpenMPI 4.2 will be released? On 9/25/19 2:27 PM, Jeff Squyres (jsquyres)

Re: [OMPI users] UCX errors after upgrade

2019-09-25 Thread Jeff Squyres (jsquyres) via users
Thanks Raymond; I have filed an issue for this on GitHub and tagged the relevant Mellanox people: https://github.com/open-mpi/ompi/issues/7009 On Sep 25, 2019, at 3:09 PM, Raymond Muno via users <users@lists.open-mpi.org> wrote: We are running against 4.0.2RC2 now. This is using

Re: [OMPI users] UCX errors after upgrade

2019-09-25 Thread Raymond Muno via users
As a test, I rebooted a set of nodes. The user could then run on 480 cores across 5 nodes. Before that, we could not run beyond two nodes. We still get the VM_UNMAP warning, however. On 9/25/19 2:09 PM, Raymond Muno via users wrote: We are running against 4.0.2RC2 now. This is using current Inte

Re: [OMPI users] UCX errors after upgrade

2019-09-25 Thread Raymond Muno via users
We are running against 4.0.2RC2 now. This is using current Intel compilers, version 2019 update 4. Still having issues. [epyc-compute-1-3.local:17402] common_ucx.c:149  Warning: UCX is unable to handle VM_UNMAP event. This may cause performance degradation or data corruption. [epyc-compute-1-3.
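
For anyone trying to see which nodes emit this warning, here is a minimal sketch that tallies VM_UNMAP warnings per host from captured job output. The regular expression is modeled only on the single warning line quoted above; the function name `count_vm_unmap_warnings` and the assumption that every warning follows the `[host:pid] file.c:line  Warning: ...` layout are mine, not something Open MPI or UCX documents.

```python
import re
from collections import Counter

# Pattern inferred from the one warning line quoted in this thread:
#   [host:pid] source.c:line  Warning: message
# Other Open MPI / UCX messages may be laid out differently (assumption).
WARNING_RE = re.compile(
    r"\[(?P<host>[^\]:]+):(?P<pid>\d+)\]\s+"
    r"(?P<src>\S+):(?P<line>\d+)\s+Warning:\s+(?P<msg>.*)"
)


def count_vm_unmap_warnings(lines):
    """Return a Counter mapping host name to number of VM_UNMAP warnings."""
    counts = Counter()
    for text in lines:
        match = WARNING_RE.search(text)
        if match and "VM_UNMAP" in match.group("msg"):
            counts[match.group("host")] += 1
    return counts


if __name__ == "__main__":
    # The first entry is the warning quoted above; the second is a
    # hypothetical unrelated line to show that non-matching output is skipped.
    sample = [
        "[epyc-compute-1-3.local:17402] common_ucx.c:149  Warning: UCX is "
        "unable to handle VM_UNMAP event. This may cause performance "
        "degradation or data corruption.",
        "unrelated application output",
    ]
    for host, n in count_vm_unmap_warnings(sample).items():
        print(host, n)
```

Passing the lines of a saved stderr file to the function instead of `sample` would give a per-node count, which may help compare nodes before and after a reboot.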

Re: [OMPI users] UCX errors after upgrade

2019-09-25 Thread Jeff Squyres (jsquyres) via users
Can you try the latest 4.0.2rc tarball? We're very, very close to releasing v4.0.2... I don't know if there's a specific UCX fix in there, but there are a ton of other good bug fixes in there since v4.0.1. On Sep 25, 2019, at 2:12 PM, Raymond Muno via users <users@lists.open-mpi.org>