Hi - we're having a weird problem with OpenMPI on our newish InfiniBand EDR (mlx5) nodes. We're running CentOS 7.6, with all the InfiniBand and UCX libraries as provided by CentOS, i.e.

    ucx-1.4.0-1.el7.x86_64
    libibverbs-utils-17.2-3.el7.x86_64
    libibverbs-17.2-3.el7.x86_64
    libibumad-17.2-3.el7.x86_64

and the kernel is 3.10.0-957.21.2.el7.x86_64. I've compiled my own OpenMPI, version 4.0.1 (configured with --with-verbs --with-ofi --with-ucx).
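For reference, the build was roughly the following (a sketch from memory; the prefix is taken from the library paths in the stack traces below, and the compiler choice is inferred from the "gnu" directory name, so details may differ):

    # approximate OpenMPI 4.0.1 build; prefix matches the install path seen in the traces
    ./configure \
        --prefix=/share/apps/mpi/openmpi/4.0.1/ib/gnu \
        --with-verbs --with-ofi --with-ucx
    make -j && make install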
The job is started with

    mpirun --mca pml ucx --mca btl ^vader,tcp,openib

as recommended for UCX.

We have some jobs (one particular code, some but not all sets of input parameters) that appear to take an increasing amount of memory (in MPI?) until the node crashes. The total memory used by all processes (reported by ps or top) is not increasing, but "free" reports less and less available memory. Within a couple of minutes it uses all of the 96 GB on each of the nodes. When the job is killed the processes go away, but the memory usage (as reported by "free") stays the same, e.g.:

                  total        used        free      shared  buff/cache   available
    Mem:       98423956    88750140     7021688        2184     2652128     6793020
    Swap:      65535996      365312    65170684

As far as I can tell I have to reboot to get the memory back.

If I attach to a running process with "gdb -p", I see stack traces that look like these two examples (starting from the first MPI-related call):

    #0  0x00002b22a95134a3 in pthread_spin_lock () from /lib64/libpthread.so.0
    #1  0x00002b22be73a3e8 in mlx5_poll_cq_v1 () from /usr/lib64/libibverbs/libmlx5-rdmav17.so
    #2  0x00002b22bcb267de in uct_ud_verbs_iface_progress () from /lib64/libuct.so.0
    #3  0x00002b22bc8d28b2 in ucp_worker_progress () from /lib64/libucp.so.0
    #4  0x00002b22b7cd14e7 in mca_pml_ucx_progress () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/openmpi/mca_pml_ucx.so
    #5  0x00002b22ab6064fc in opal_progress () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libopen-pal.so.40
    #6  0x00002b22a9f51dc5 in ompi_request_default_wait () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi.so.40
    #7  0x00002b22a9fa355c in ompi_coll_base_allreduce_intra_ring () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi.so.40
    #8  0x00002b22a9f65cb3 in PMPI_Allreduce () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi.so.40
    #9  0x00002b22a9cedf9b in pmpi_allreduce__ () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi_mpifh.so.40

    #0  0x00002ae0518de69d in write () from /lib64/libpthread.so.0
    #1  0x00002ae064458d7f in ibv_cmd_reg_mr () from /usr/lib64/libibverbs.so.1
    #2  0x00002ae066b9221b in mlx5_reg_mr () from /usr/lib64/libibverbs/libmlx5-rdmav17.so
    #3  0x00002ae064461f08 in ibv_reg_mr () from /usr/lib64/libibverbs.so.1
    #4  0x00002ae064f6e312 in uct_ib_md_reg_mr.isra.11.constprop () from /lib64/libuct.so.0
    #5  0x00002ae064f6e4f2 in uct_ib_rcache_mem_reg_cb () from /lib64/libuct.so.0
    #6  0x00002ae0651aec0f in ucs_rcache_get () from /lib64/libucs.so.0
    #7  0x00002ae064f6d6a4 in uct_ib_mem_rcache_reg () from /lib64/libuct.so.0
    #8  0x00002ae064d1fa58 in ucp_mem_rereg_mds () from /lib64/libucp.so.0
    #9  0x00002ae064d21438 in ucp_request_memory_reg () from /lib64/libucp.so.0
    #10 0x00002ae064d21663 in ucp_request_send_start () from /lib64/libucp.so.0
    #11 0x00002ae064d335dd in ucp_tag_send_nb () from /lib64/libucp.so.0
    #12 0x00002ae06420a5e6 in mca_pml_ucx_start () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/openmpi/mca_pml_ucx.so
    #13 0x00002ae05236fc06 in ompi_coll_base_alltoall_intra_basic_linear () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi.so.40
    #14 0x00002ae05232f347 in PMPI_Alltoall () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi.so.40
    #15 0x00002ae0520b704c in pmpi_alltoall__ () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi_mpifh.so.40

This doesn't seem to happen on our older nodes (which have FDR mlx4 interfaces).

I don't really have a mental model for OpenMPI's memory use, so I don't know what component I should investigate: OpenMPI itself? UCX? OFED? Something else?
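For completeness, this is roughly how I've been observing it (a sketch; the application name, PID, and watch interval are just placeholders):

    # node-wide view: "free" available memory keeps dropping
    watch -n 5 free -m

    # per-process view: resident set sizes stay roughly flat
    # ("my_app" stands in for the actual executable name)
    ps -C my_app -o pid=,rss=,comm=

    # attach to one rank and dump backtraces for all threads
    gdb -p <pid>
    (gdb) thread apply all bt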
If anyone has any suggestions for what to try, and/or what other information would be useful, I'd appreciate it.

thanks,
Noam