Noam, it may be a stupid question, but could you try running 'slabtop' as
the program executes?

'watch cat /proc/meminfo' is also a good diagnostic.
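
For example, something along these lines (a rough sketch; the grep fields are
standard /proc/meminfo counters):

    # sort slab caches by cache size while the job runs
    slabtop -s c

    # watch kernel-side memory accounting; growth in Slab/SUnreclaim with flat
    # process RSS points at kernel allocations rather than the application
    watch -n 2 'grep -E "MemFree|MemAvailable|Slab|SReclaimable|SUnreclaim" /proc/meminfo'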

On Wed, 19 Jun 2019 at 18:32, Noam Bernstein via users <
users@lists.open-mpi.org> wrote:

> Hi - we’re having a weird problem with OpenMPI on our newish infiniband
> EDR (mlx5) nodes.  We're running CentOS 7.6, with all the infiniband and
> ucx libraries as provided by CentOS, i.e.
>
> ucx-1.4.0-1.el7.x86_64
> libibverbs-utils-17.2-3.el7.x86_64
> libibverbs-17.2-3.el7.x86_64
> libibumad-17.2-3.el7.x86_64
>
> kernel is
>
> 3.10.0-957.21.2.el7.x86_64
>
> I’ve compiled my own OpenMPI, version 4.0.1 (--with-verbs --with-ofi
> --with-ucx).
>
> The job is started with
>
> mpirun --mca pml ucx --mca btl ^vader,tcp,openib
>
> as recommended for ucx.
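>
> (A quick sanity check, in case it helps anyone reproduce this: the selected
> PML can be confirmed by raising the framework verbosity, e.g. with an
> illustrative executable name:
>
>     mpirun --mca pml ucx --mca pml_base_verbose 10 --mca btl ^vader,tcp,openib ./my_app
>
> and "ompi_info | grep -i ucx" shows whether the build was compiled against
> UCX at all.)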
>
> We have some jobs (one particular code, some but not all sets of input
> parameters) that appear to take an increasing amount of memory (in MPI?)
> until the node crashes.  The total memory used by all processes (reported
> by ps or top) is not increasing, but “free” reports less and less available
> memory.  Within a couple of minutes it uses all of the 96GB on each of the
> nodes. When the job is killed the processes go away, but the memory usage
> (as reported by “free”) stays the same, e.g.:
>
>               total        used        free      shared  buff/cache   available
> Mem:       98423956    88750140     7021688        2184     2652128     6793020
> Swap:      65535996      365312    65170684
>
> As far as I can tell I have to reboot to get the memory back.
>
> If I attach to a running process with “gdb -p”, I see stack traces that
> look like these two examples (starting from the first mpi-related call):
>
>
> #0  0x00002b22a95134a3 in pthread_spin_lock () from /lib64/libpthread.so.0
> #1  0x00002b22be73a3e8 in mlx5_poll_cq_v1 () from /usr/lib64/libibverbs/libmlx5-rdmav17.so
> #2  0x00002b22bcb267de in uct_ud_verbs_iface_progress () from /lib64/libuct.so.0
> #3  0x00002b22bc8d28b2 in ucp_worker_progress () from /lib64/libucp.so.0
> #4  0x00002b22b7cd14e7 in mca_pml_ucx_progress () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/openmpi/mca_pml_ucx.so
> #5  0x00002b22ab6064fc in opal_progress () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libopen-pal.so.40
> #6  0x00002b22a9f51dc5 in ompi_request_default_wait () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi.so.40
> #7  0x00002b22a9fa355c in ompi_coll_base_allreduce_intra_ring () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi.so.40
> #8  0x00002b22a9f65cb3 in PMPI_Allreduce () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi.so.40
> #9  0x00002b22a9cedf9b in pmpi_allreduce__ () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi_mpifh.so.40
>
>
> #0  0x00002ae0518de69d in write () from /lib64/libpthread.so.0
> #1  0x00002ae064458d7f in ibv_cmd_reg_mr () from /usr/lib64/libibverbs.so.1
> #2  0x00002ae066b9221b in mlx5_reg_mr () from /usr/lib64/libibverbs/libmlx5-rdmav17.so
> #3  0x00002ae064461f08 in ibv_reg_mr () from /usr/lib64/libibverbs.so.1
> #4  0x00002ae064f6e312 in uct_ib_md_reg_mr.isra.11.constprop () from /lib64/libuct.so.0
> #5  0x00002ae064f6e4f2 in uct_ib_rcache_mem_reg_cb () from /lib64/libuct.so.0
> #6  0x00002ae0651aec0f in ucs_rcache_get () from /lib64/libucs.so.0
> #7  0x00002ae064f6d6a4 in uct_ib_mem_rcache_reg () from /lib64/libuct.so.0
> #8  0x00002ae064d1fa58 in ucp_mem_rereg_mds () from /lib64/libucp.so.0
> #9  0x00002ae064d21438 in ucp_request_memory_reg () from /lib64/libucp.so.0
> #10 0x00002ae064d21663 in ucp_request_send_start () from /lib64/libucp.so.0
> #11 0x00002ae064d335dd in ucp_tag_send_nb () from /lib64/libucp.so.0
> #12 0x00002ae06420a5e6 in mca_pml_ucx_start () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/openmpi/mca_pml_ucx.so
> #13 0x00002ae05236fc06 in ompi_coll_base_alltoall_intra_basic_linear () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi.so.40
> #14 0x00002ae05232f347 in PMPI_Alltoall () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi.so.40
> #15 0x00002ae0520b704c in pmpi_alltoall__ () from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi_mpifh.so.40
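>
> (For reference, the same kind of backtrace can be captured non-interactively
> from every rank on a node with something like the sketch below; the process
> name is just a placeholder.)
>
>     for pid in $(pgrep -f my_app); do
>         gdb -batch -p "$pid" -ex 'thread apply all bt' > bt.$pid.txt
>     done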
>
> This doesn’t seem to happen on our older nodes (which have FDR mlx4
> interfaces).
>
> I don’t really have a mental model for OpenMPI's memory use, so I don’t
> know what component I should investigate: OpenMPI itself? ucx?  OFED?
> Something else?  If anyone has any suggestions for what to try, and/or what
> other information would be useful, I’d appreciate it.
>
> thanks,
> Noam
>
>