Noam, this may be a stupid question, but could you try running slabtop as the program executes? 'watch cat /proc/meminfo' is also a good diagnostic.
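Concretely, something along these lines on one of the affected nodes while the job is running is what I have in mind; the particular /proc/meminfo fields and the slabtop options are just one way to do it, not the only one:

    # kernel-side memory counters, refreshed every 5 seconds (slab in particular)
    watch -n 5 'grep -E "MemFree|MemAvailable|Slab|SReclaimable|SUnreclaim" /proc/meminfo'

    # in another terminal: kernel slab caches, one snapshot, sorted by cache size
    slabtop -o -s c | head -25

If Slab / SUnreclaim keeps growing while the per-process RSS in top stays flat, slabtop should point at which kernel cache the memory is going into.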
On Wed, 19 Jun 2019 at 18:32, Noam Bernstein via users <users@lists.open-mpi.org> wrote:

> Hi - we're having a weird problem with OpenMPI on our newish infiniband
> EDR (mlx5) nodes. We're running CentOS 7.6, with all the infiniband and
> ucx libraries as provided by CentOS, i.e.
>
>   ucx-1.4.0-1.el7.x86_64
>   libibverbs-utils-17.2-3.el7.x86_64
>   libibverbs-17.2-3.el7.x86_64
>   libibumad-17.2-3.el7.x86_64
>
> kernel is
>
>   3.10.0-957.21.2.el7.x86_64
>
> I've compiled my own OpenMPI, version 4.0.1 (--with-verbs --with-ofi
> --with-ucx).
>
> The job is started with
>
>   mpirun --mca pml ucx --mca btl ^vader,tcp,openib
>
> as recommended for ucx.
>
> We have some jobs (one particular code, some but not all sets of input
> parameters) that appear to take an increasing amount of memory (in MPI?)
> until the node crashes. The total memory used by all processes (reported
> by ps or top) is not increasing, but "free" reports less and less
> available memory. Within a couple of minutes it uses all of the 96GB on
> each of the nodes. When the job is killed the processes go away, but the
> memory usage (as reported by "free") stays the same, e.g.:
>
>                  total        used        free      shared  buff/cache   available
>   Mem:        98423956    88750140     7021688        2184     2652128     6793020
>   Swap:       65535996      365312    65170684
>
> As far as I can tell I have to reboot to get the memory back.
>
> If I attach to a running process with "gdb -p", I see stack traces that
> look like these two examples (starting from the first MPI-related call):
>
>   #0  0x00002b22a95134a3 in pthread_spin_lock () from /lib64/libpthread.so.0
>   #1  0x00002b22be73a3e8 in mlx5_poll_cq_v1 () from /usr/lib64/libibverbs/libmlx5-rdmav17.so
>   #2  0x00002b22bcb267de in uct_ud_verbs_iface_progress () from /lib64/libuct.so.0
>   #3  0x00002b22bc8d28b2 in ucp_worker_progress () from /lib64/libucp.so.0
>   #4  0x00002b22b7cd14e7 in mca_pml_ucx_progress () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/openmpi/mca_pml_ucx.so
>   #5  0x00002b22ab6064fc in opal_progress () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libopen-pal.so.40
>   #6  0x00002b22a9f51dc5 in ompi_request_default_wait () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi.so.40
>   #7  0x00002b22a9fa355c in ompi_coll_base_allreduce_intra_ring () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi.so.40
>   #8  0x00002b22a9f65cb3 in PMPI_Allreduce () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi.so.40
>   #9  0x00002b22a9cedf9b in pmpi_allreduce__ () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi_mpifh.so.40
>
>   #0  0x00002ae0518de69d in write () from /lib64/libpthread.so.0
>   #1  0x00002ae064458d7f in ibv_cmd_reg_mr () from /usr/lib64/libibverbs.so.1
>   #2  0x00002ae066b9221b in mlx5_reg_mr () from /usr/lib64/libibverbs/libmlx5-rdmav17.so
>   #3  0x00002ae064461f08 in ibv_reg_mr () from /usr/lib64/libibverbs.so.1
>   #4  0x00002ae064f6e312 in uct_ib_md_reg_mr.isra.11.constprop () from /lib64/libuct.so.0
>   #5  0x00002ae064f6e4f2 in uct_ib_rcache_mem_reg_cb () from /lib64/libuct.so.0
>   #6  0x00002ae0651aec0f in ucs_rcache_get () from /lib64/libucs.so.0
>   #7  0x00002ae064f6d6a4 in uct_ib_mem_rcache_reg () from /lib64/libuct.so.0
>   #8  0x00002ae064d1fa58 in ucp_mem_rereg_mds () from /lib64/libucp.so.0
>   #9  0x00002ae064d21438 in ucp_request_memory_reg () from /lib64/libucp.so.0
>   #10 0x00002ae064d21663 in ucp_request_send_start () from /lib64/libucp.so.0
>   #11 0x00002ae064d335dd in ucp_tag_send_nb () from /lib64/libucp.so.0
>   #12 0x00002ae06420a5e6 in mca_pml_ucx_start () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/openmpi/mca_pml_ucx.so
>   #13 0x00002ae05236fc06 in ompi_coll_base_alltoall_intra_basic_linear () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi.so.40
>   #14 0x00002ae05232f347 in PMPI_Alltoall () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi.so.40
>   #15 0x00002ae0520b704c in pmpi_alltoall__ () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi_mpifh.so.40
>
> This doesn't seem to happen on our older nodes (which have FDR mlx4
> interfaces).
>
> I don't really have a mental model for OpenMPI's memory use, so I don't
> know what component I should investigate: OpenMPI itself? ucx? OFED?
> Something else? If anyone has any suggestions for what to try, and/or
> what other information would be useful, I'd appreciate it.
>
> thanks,
> Noam
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users