This looks a lot like a problem I had with OpenMPI 3.1.2. I thought the fix landed in 4.0.0, but you might want to check the code to be sure there wasn’t a regression in 4.1.x. Most of our codes are still running 3.1.2, so I haven’t built anything beyond 4.0.0, which definitely included the fix.
See…

- Apply patch for memory leak associated with UCX PML.
- https://github.com/openucx/ucx/issues/2921
- https://github.com/open-mpi/ompi/pull/5878

Charles Taylor
UF Research Computing

> On Jun 19, 2019, at 2:26 PM, Noam Bernstein via users <users@lists.open-mpi.org> wrote:
>
>> On Jun 19, 2019, at 2:00 PM, John Hearns via users <users@lists.open-mpi.org> wrote:
>>
>> Noam, it may be a stupid question. Could you try running slabtop as the
>> program executes
>
> The top SIZE usage is this line
>     OBJS  ACTIVE  USE  OBJ SIZE   SLABS  OBJ/SLAB  CACHE SIZE  NAME
>  5937540 5937540 100%     0.09K  141370        42     565480K  kmalloc-96
> which seems to be growing continuously. However, it’s much smaller than the
> drop in free memory. It gets to around 1 GB after tens of seconds (500 MB
> here), but the overall free memory is dropping by about 1 GB / second, so
> tens of GB over the same time.
>
>> Also 'watch cat /proc/meminfo' is also a good diagnostic
>
> Other than MemFree dropping, I don’t see much. Here’s a diff, 10 seconds
> apart:
> 2,3c2,3
> < MemFree:        54229400 kB
> < MemAvailable:   54271804 kB
> ---
> > MemFree:        45010772 kB
> > MemAvailable:   45054200 kB
> 19c19
> < AnonPages:      22063260 kB
> ---
> > AnonPages:      22526300 kB
> 22,24c22,24
> < Slab:             851380 kB
> < SReclaimable:      87100 kB
> < SUnreclaim:       764280 kB
> ---
> > Slab:            1068208 kB
> > SReclaimable:      89148 kB
> > SUnreclaim:       979060 kB
> 31c31
> < Committed_AS:   34976896 kB
> ---
> > Committed_AS:   34977680 kB
>
> MemFree has dropped by 9 GB, but as far as I can tell nothing else has
> increased by anything near as much, so I don’t know where the memory is going.
>
> Noam
>
> Noam Bernstein, Ph.D.
> Center for Materials Physics and Technology
> U.S. Naval Research Laboratory
> T +1 202 404 8628  F +1 202 404 7546
> https://www.nrl.navy.mil
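
For anyone repeating the /proc/meminfo comparison quoted above, the two-snapshot diff can be scripted so the per-field changes come out as signed numbers. This is only a minimal sketch: the function names (parse_meminfo, meminfo_delta, report_delta) are made up for illustration, and it assumes the usual Linux "Name: value kB" layout of /proc/meminfo.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts:
            # The first token is the number; a trailing "kB" unit is ignored.
            fields[name.strip()] = int(parts[0])
    return fields


def meminfo_delta(before, after):
    """Return {field: after - before} for fields whose value changed."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] != before[name]
    }


def report_delta(interval_s=10, path="/proc/meminfo"):
    """Take two snapshots interval_s seconds apart; print changes, largest first."""
    with open(path) as f:
        before = parse_meminfo(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_meminfo(f.read())
    delta = meminfo_delta(before, after)
    for name, change in sorted(delta.items(), key=lambda kv: abs(kv[1]), reverse=True):
        print(f"{name:>16} {change:+12d} kB")
```

Calling report_delta(10) on a compute node while the job runs would print one line per changed field. Fed the numbers from the diff above, meminfo_delta gives MemFree -9218628 kB against AnonPages +463040 kB and Slab +216828 kB, which is the same mismatch Noam describes.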
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users