I would suggest trying OMPI v4.1.4 (or the v5 snapshot)
 * https://www.open-mpi.org/software/ompi/v4.1/
 * https://www.mail-archive.com/announce@lists.open-mpi.org//msg00152.html

We fixed some large-payload collective issues in that release, which might be 
what you are seeing here with MPI_Alltoallv and the tuned collective component.
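As a quick runtime check (just a suggestion on my part, and the exact MCA syntax may vary with your install), you can also take the tuned component out of the picture to see whether its decision logic is involved:

# diagnostic only: disable the "tuned" collective component so the basic
# algorithms are used instead (expect lower performance)
mpirun --mca coll ^tuned -np <nprocs> ./your_app

If the crash goes away with tuned disabled, that would point at the same code path the v4.1.4 fixes touched.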



On Thu, Jun 2, 2022 at 1:54 AM Mikhail Brinskii via users 
<users@lists.open-mpi.org> wrote:
Hi Eric,

 

Yes, UCX is supposed to be stable for large sized problems.

Did you see the same crash with both OMPI-4.0.3 + UCX 1.8.0 and OMPI-4.1.2 + 
UCX 1.11.2?

Have you also tried running the large-sized problems with OMPI-5.0.x?

Regarding the application, at some point it invokes MPI_Alltoallv sending more 
than 2GB to some of the ranks (using derived datatypes), right?
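For concreteness, the pattern I mean is something like the following minimal sketch (made-up names and sizes, not your actual code): a contiguous derived datatype lets a plain "int" count describe well over 2 GB per destination.

#include <mpi.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK 1024               /* int64_t elements per derived-type unit (made up) */
#define BLOCKS_PER_RANK 4        /* blocks sent to / received from each rank (made up) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* One count unit = BLOCK contiguous int64_t, so an "int" count of c
       describes c * BLOCK * 8 bytes; large enough BLOCK/counts exceed 2 GB. */
    MPI_Datatype block;
    MPI_Type_contiguous(BLOCK, MPI_INT64_T, &block);
    MPI_Type_commit(&block);

    int *scounts = malloc(nprocs * sizeof(int));
    int *rcounts = malloc(nprocs * sizeof(int));
    int *sdispls = malloc(nprocs * sizeof(int));
    int *rdispls = malloc(nprocs * sizeof(int));
    for (int p = 0; p < nprocs; ++p) {
        scounts[p] = rcounts[p] = BLOCKS_PER_RANK;
        sdispls[p] = rdispls[p] = BLOCKS_PER_RANK * p;  /* displacements in block units */
    }

    size_t n = (size_t)nprocs * BLOCKS_PER_RANK * BLOCK;
    int64_t *sbuf = malloc(n * sizeof(int64_t));
    int64_t *rbuf = malloc(n * sizeof(int64_t));
    for (size_t i = 0; i < n; ++i)
        sbuf[i] = rank;

    MPI_Alltoallv(sbuf, scounts, sdispls, block,
                  rbuf, rcounts, rdispls, block, MPI_COMM_WORLD);

    MPI_Type_free(&block);
    free(sbuf); free(rbuf);
    free(scounts); free(rcounts); free(sdispls); free(rdispls);
    MPI_Finalize();
    return 0;
}

Scaling BLOCK and the per-rank counts up is what would push individual messages past the 2 GB mark in your real runs.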

 

//WBR, Mikhail

 

From: users <users-boun...@lists.open-mpi.org> On Behalf Of Eric Chamberland via users
Sent: Thursday, June 2, 2022 5:31 AM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Eric Chamberland <eric.chamberl...@giref.ulaval.ca>; Thomas Briffard 
<thomas.briff...@michelin.com>; Vivien Clauzon <vivien.clau...@michelin.com>; 
dave.mar...@giref.ulaval.ca; Ramses van Zon <r...@scinet.utoronto.ca>; 
charles.coulomb...@ulaval.ca
Subject: [OMPI users] Segfault in ucp_dt_pack function from UCX library 1.8.0 
and 1.11.2 for large sized communications using both OpenMPI 4.0.3 and 4.1.2

 

Hi,

In the past, we have successfully launched large (finite element) computations 
using PARMetis as the mesh partitioner.

We first succeeded in 2012 with OpenMPI (v2.?), and again in March 2019 with 
OpenMPI 3.1.2.

Today, we have a bunch of nightly (small) tests running nicely against all of 
OpenMPI (4.0.x, 4.1.x and 5.0.x), MPICH-3.3.2 and IntelMPI 2021.6.

Preparing to launch the same computation we did in 2012, and even larger ones, 
we compiled with both OpenMPI 4.0.3+ucx-1.8.0 and OpenMPI 4.1.2+ucx-1.11.2 
and launched computations from small to large problems (meshes).

For small meshes, it goes fine.

But when we reach nearly 2^31 faces in the 3D mesh we are using and call 
ParMETIS_V3_PartMeshKway, we always get a segfault with the same backtrace 
pointing into the UCX library:

Wed Jun  1 23:04:54 
2022<stdout>:chrono::InterfaceParMetis::ParMETIS_V3_PartMeshKway::debut VmSize: 
1202304 VmRSS: 349456 VmPeak: 1211736 VmData: 500764 VmHWM: 359012 <etiq_18> 
Wed Jun  1 23:07:07 2022<stdout>:Erreur    :  MEF++ Signal recu : 11 :  
segmentation violation  
Wed Jun  1 23:07:07 2022<stdout>:Erreur    :   
Wed Jun  1 23:07:07 2022<stdout>:------------------------------ (Début des 
informations destinées aux développeurs C++) ------------------------------
Wed Jun  1 23:07:07 2022<stdout>:La pile d'appels contient 27 symboles. 
Wed Jun  1 23:07:07 2022<stdout>:# 000: 
reqBacktrace(std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> >&)  >>>  probGD.opt 
(probGD.opt(_Z12reqBacktraceRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x71)
 [0x4119f1])
Wed Jun  1 23:07:07 2022<stdout>:# 001: attacheDebugger()  >>>  probGD.opt 
(probGD.opt(_Z15attacheDebuggerv+0x29a) [0x41386a])
Wed Jun  1 23:07:07 2022<stdout>:# 002: 
/gpfs/fs0/project/d/deteix/ericc/GIREF/lib/libgiref_opt_Util.so(traitementSignal+0x1f9f)
 [0x2ab3aef0e5cf]
Wed Jun  1 23:07:07 2022<stdout>:# 003: /lib64/libc.so.6(+0x36400) 
[0x2ab3bd59a400]
Wed Jun  1 23:07:07 2022<stdout>:# 004: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(ucp_dt_pack+0x123)
 [0x2ab3c966e353]
Wed Jun  1 23:07:07 2022<stdout>:# 005: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(+0x536b7)
 [0x2ab3c968d6b7]
Wed Jun  1 23:07:07 2022<stdout>:# 006: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/ucx/libuct_ib.so.0(uct_dc_mlx5_ep_am_bcopy+0xd7)
 [0x2ab3ca712137]
Wed Jun  1 23:07:07 2022<stdout>:# 007: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(+0x52d3c)
 [0x2ab3c968cd3c]
Wed Jun  1 23:07:07 2022<stdout>:# 008: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(ucp_tag_send_nbx+0x5ad)
 [0x2ab3c9696dcd]
Wed Jun  1 23:07:07 2022<stdout>:# 009: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_send+0xf2)
 [0x2ab3c922e0b2]
Wed Jun  1 23:07:07 2022<stdout>:# 010: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(ompi_coll_base_sendrecv_actual+0x92)
 [0x2ab3bbca5a32]
Wed Jun  1 23:07:07 2022<stdout>:# 011: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(ompi_coll_base_alltoallv_intra_pairwise+0x141)
 [0x2ab3bbcad941]
Wed Jun  1 23:07:07 2022<stdout>:# 012: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_alltoallv_intra_dec_fixed+0x42)
 [0x2ab3d4836da2]
Wed Jun  1 23:07:07 2022<stdout>:# 013: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(PMPI_Alltoallv+0x29)
 [0x2ab3bbc7bdf9]
Wed Jun  1 23:07:07 2022<stdout>:# 014: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1/lib/libparmetis.so(libparmetis__gkMPI_Alltoallv+0x106)
 [0x2ab3bb0e1c06]
Wed Jun  1 23:07:07 2022<stdout>:# 015: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1/lib/libparmetis.so(ParMETIS_V3_Mesh2Dual+0xdd6)
 [0x2ab3bb0f10b6]
Wed Jun  1 23:07:07 2022<stdout>:# 016: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1/lib/libparmetis.so(ParMETIS_V3_PartMeshKway+0x100)
 [0x2ab3bb0f1ac0]

PARMetis is compiled as part of PETSc-3.17.1 with 64-bit indices.  Here are the 
PETSc configure options:

--prefix=/scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1
COPTFLAGS=\"-O2 -march=native\"
CXXOPTFLAGS=\"-O2 -march=native\"
FOPTFLAGS=\"-O2 -march=native\"
--download-fftw=1
--download-hdf5=1
--download-hypre=1
--download-metis=1
--download-mumps=1
--download-parmetis=1
--download-plapack=1
--download-prometheus=1
--download-ptscotch=1
--download-scotch=1
--download-sprng=1
--download-superlu_dist=1
--download-triangle=1
--with-avx512-kernels=1
--with-blaslapack-dir=/scinet/intel/oneapi/2021u4/mkl/2021.4.0
--with-cc=mpicc
--with-cxx=mpicxx
--with-cxx-dialect=C++11
--with-debugging=0
--with-fc=mpifort
--with-mkl_pardiso-dir=/scinet/intel/oneapi/2021u4/mkl/2021.4.0
--with-scalapack=1
--with-scalapack-lib=\"[/scinet/intel/oneapi/2021u4/mkl/2021.4.0/lib/intel64/libmkl_scalapack_lp64.so,/scinet/intel/oneapi/2021u4/mkl/2021.4.0/lib/intel64/libmkl_blacs_openmpi_lp64.so]\"
--with-x=0
--with-64-bit-indices=1
--with-memalign=64

and OpenMPI configure options:

'--prefix=/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2'
'--enable-mpi-cxx'
'--enable-mpi1-compatibility'
'--with-hwloc=internal'
'--with-knem=/opt/knem-1.1.3.90mlnx1'
'--with-libevent=internal'
'--with-platform=contrib/platform/mellanox/optimized'
'--with-pmix=internal'
'--with-slurm=/opt/slurm'
'--with-ucx=/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2'

I am then wondering:

1) Is the UCX library considered "stable" for production use with very large 
problems?

2) Is there a way to "bypass" UCX at runtime? (see the sketch after these questions for what I have in mind)

3) Any idea for debugging this?
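
For question 2, what I have in mind is something like this (the component names are a guess on my part):

# hypothetical runtime selection: use the ob1 PML with the shared-memory and
# TCP BTLs instead of the UCX PML
mpirun --mca pml ob1 --mca btl self,vader,tcp -np 512 ./probGD.opt ...

If that is a legitimate way to take UCX out of the path, it would at least tell us whether the crash is specific to the UCX PML.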

Of course, I do not yet have a "minimal reproducer" that triggers the bug, since 
it happens only on "large" problems, but I think I could export the data for a 
512-process reproducer with the PARMetis call only...

Thanks for helping,

Eric


-- 

Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42



-- 
Josh Hursey
IBM Spectrum MPI Developer
