> On Jun 20, 2019, at 9:40 AM, Jeff Squyres (jsquyres) <[email protected]>
> wrote:
>
> On Jun 20, 2019, at 9:31 AM, Noam Bernstein via users
> <[email protected]> wrote:
>>
>> One thing that I’m wondering if anyone familiar with the internals can
>> explain is how you get a memory leak that isn’t freed when the program
>> ends? Doesn’t that suggest that it’s something lower level, like maybe a
>> kernel issue?
>
> If "top" doesn't show processes eating up the memory, and killing processes
> (e.g., MPI processes) doesn't give you memory back, then it's likely that
> something in the kernel is leaking memory.
That’s definitely what’s happening. “free” is reporting a lot of memory used,
but the sum of the per-process values from ps is much lower.
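
For anyone who wants to put a number on that gap, here is a rough sketch in Python. It is only an approximation under stated assumptions: the "used" figure is computed as MemTotal - MemFree - Buffers - Cached - SReclaimable from /proc/meminfo (close to, but not guaranteed identical to, what a given version of "free" prints), and the per-process total is the sum of VmRSS from /proc/<pid>/status, which double-counts shared pages. The function names are made up for illustration.

```python
import glob
import re


def parse_kb_fields(text):
    """Parse 'Name:   123 kB' lines into a dict of name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"(\w+):\s+(\d+)\s*kB", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def used_kb(meminfo):
    """Approximate 'used' memory (an assumption, see the note above)."""
    return (meminfo["MemTotal"] - meminfo["MemFree"]
            - meminfo.get("Buffers", 0) - meminfo.get("Cached", 0)
            - meminfo.get("SReclaimable", 0))


def total_rss_kb():
    """Sum VmRSS over all processes visible in /proc (shared pages count twice)."""
    total = 0
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as f:
                total += parse_kb_fields(f.read()).get("VmRSS", 0)
        except OSError:
            continue  # the process exited while we were scanning
    return total


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_kb_fields(f.read())
    used = used_kb(info)
    rss = total_rss_kb()
    print(f"used (approx.):        {used} kB")
    print(f"sum of process RSS:    {rss} kB")
    print(f"not held by processes: {used - rss} kB")
    print(f"unreclaimable slab:    {info.get('SUnreclaim', 0)} kB")
```

If the "not held by processes" figure is large and stays large after the MPI processes exit, that is consistent with memory being held on the kernel side rather than by user processes. The SUnreclaim line is only one place such memory might show up; driver allocations may not appear there.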
>
> Have you tried the latest version of UCX -- including their kernel drivers --
> from Mellanox (vs. inbox/CentOS)?
>
I’ve tried the latest UCX release from the UCX web site (1.5.1), which doesn’t
change the behavior.
I haven’t yet tried the latest OFED or the low-level Mellanox drivers. That’s
next on my list, but it’s slightly more involved, so I’ve been avoiding it.
thanks,
Noam