This looks a lot like a problem I had with OpenMPI 3.1.2. I thought the fix
landed in 4.0.0, but you might want to check the code to be sure there wasn’t
a regression in 4.1.x. Most of our codes are still running 3.1.2, so I haven’t
built anything beyond 4.0.0, which definitely included the fix.

See…

- Apply patch for memory leak associated with UCX PML.
-    https://github.com/openucx/ucx/issues/2921
-    https://github.com/open-mpi/ompi/pull/5878

Charles Taylor
UF Research Computing


> On Jun 19, 2019, at 2:26 PM, Noam Bernstein via users <users@lists.open-mpi.org> wrote:
> 
>> On Jun 19, 2019, at 2:00 PM, John Hearns via users <users@lists.open-mpi.org> wrote:
>> 
>> Noam, it may be a stupid question, but could you try running slabtop as the
>> program executes?
> 
> The top entry by cache size is this line:
>    OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> 5937540 5937540 100%    0.09K 141370       42    565480K kmalloc-96
> which seems to be growing continuously. However, it’s much smaller than the 
> drop in free memory.  It gets to around 1 GB after tens of seconds (500 MB 
> here), but the overall free memory is dropping by about 1 GB / second, so 
> tens of GB over the same time.
> 
>> 
>> Also, 'watch cat /proc/meminfo' is a good diagnostic.
> 
> Other than MemFree dropping, I don’t see much. Here’s a diff, 10 seconds 
> apart:
> 2,3c2,3
> < MemFree:        54229400 kB
> < MemAvailable:   54271804 kB
> ---
> > MemFree:        45010772 kB
> > MemAvailable:   45054200 kB
> 19c19
> < AnonPages:      22063260 kB
> ---
> > AnonPages:      22526300 kB
> 22,24c22,24
> < Slab:             851380 kB
> < SReclaimable:      87100 kB
> < SUnreclaim:       764280 kB
> ---
> > Slab:            1068208 kB
> > SReclaimable:      89148 kB
> > SUnreclaim:       979060 kB
> 31c31
> < Committed_AS:   34976896 kB
> ---
> > Committed_AS:   34977680 kB
> 
> MemFree has dropped by 9 GB, but as far as I can tell nothing else has 
> increased by anything near as much, so I don’t know where the memory is going.
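The same 10-second comparison can be scripted so the deltas are computed rather than read off a diff. This is a minimal sketch, assuming only the standard /proc/meminfo layout ("Field:   value kB"); parse_meminfo, diff_meminfo, read_meminfo and watch are hypothetical names chosen for illustration.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value}; values are in kB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            # Most lines end in "kB"; the HugePages_* counts have no unit.
            values[key.strip()] = int(parts[0])
    return values


def diff_meminfo(before, after):
    """Return {field: after - before} for every field whose value changed."""
    return {k: after[k] - v for k, v in before.items()
            if k in after and after[k] != v}


def read_meminfo(path="/proc/meminfo"):
    with open(path) as f:
        return parse_meminfo(f.read())


def watch(interval=10.0, path="/proc/meminfo"):
    """Print changed fields every `interval` seconds, largest drop first.

    Loops until interrupted (Ctrl-C), like `watch`.
    """
    before = read_meminfo(path)
    while True:
        time.sleep(interval)
        after = read_meminfo(path)
        deltas = diff_meminfo(before, after)
        for key, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
            print(f"{key:16s} {delta:+12d} kB")
        print("---")
        before = after
```

Fed the two snapshots above, diff_meminfo reports MemFree at -9218628 kB, while AnonPages (+463040 kB) and Slab (+216828 kB) together grew by only 679868 kB. That leaves 8538760 kB of the drop that does not appear in any counter that changed.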
> 
>                                                                       Noam
> 
> 
> 
> Noam Bernstein, Ph.D.
> Center for Materials Physics and Technology
> U.S. Naval Research Laboratory
> T +1 202 404 8628  F +1 202 404 7546
> https://www.nrl.navy.mil 
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users

