On 12/14/21 07:08, Dominique Martinet wrote:
> I've double-checked with traces in load_spa_handle/unref_handle and it
> is all free()d as soon as the client disconnects, so there's no reason
> the memory would still be used... And I think we're just looking at some
> malloc optimisation not releasing the memory.
> 
> To confirm, I've tried starting pipewire-pulse with jemalloc loaded,
> LD_PRELOAD=/usr/lib64/libjemalloc.so , and interestingly after the 100
> clients exit the memory stays at ~3-400MB but as soon as single new
> client connects it jumps back down to 20MB, so that seems to confirm it.
> (with tcmalloc it stays all the way up at 700+MB...)

 
> So I guess we're just chasing after artifacts from the allocator, and
> it'll be hard to tell which it is when I happen to see pipewire-pulse
> with high memory later on...

It can be difficult to tell the difference between:
(a) allocator caching
(b) application usage

To help with this, we developed some additional tracing utilities:
https://pagure.io/glibc-malloc-trace-utils

The idea was to get a full API trace of the malloc family calls and then play
them back in a simulator to evaluate the heap/arena usage when threads were involved.

Knowing the exact API calls lets you determine if you have (a), where the API
calls show a small usage but in reality RSS is higher, or (b), where the API
calls show there are some unmatched free()s and the usage is growing.

It seems like you used jemalloc and then found that memory usage stays low?

If that is the case, it may be userspace caching by the allocator.

jemalloc is particularly lean, with a time-decay thread that frees memory back
to the OS in order to reduce usage down to a fixed percentage. The consequence
is that you get latency on the allocation side, and the application has to
take this into account.
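If you want to make that trade-off explicit while testing with jemalloc, you
can shorten its decay times so cached pages are purged back to the OS sooner.
A sketch (whether these values are reasonable for pipewire-pulse is an
assumption):

# Purge unused dirty/muzzy pages after ~1 second instead of the defaults.
MALLOC_CONF="dirty_decay_ms:1000,muzzy_decay_ms:1000" \
LD_PRELOAD=/usr/lib64/libjemalloc.so \
pipewire-pulse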

> From what I can see the big allocations are (didn't look at lifetime of each
> alloc):
>  - load_spa_handle for audioconvert/libspa-audioconvert allocs 3.7MB
>  - pw_proxy_new allocates 590k
>  - reply_create_playback_stream allocates 4MB
>  - spa_buffer_alloc_array allocates 1MB from negotiate_buffers
>  - spa_buffer_alloc_array allocates 256K x2 + 128K
>    from negotiate_link_buffers

On a 64-bit system the dynamic mmap threshold grows to a maximum of 32MiB.

As you call malloc with ever larger values, the dynamic scaling will scale up
to at most 32MiB (half of a 64MiB heap). So it is possible that all of these
allocations are placed on the mmap'd/sbrk'd heaps and stay there for future
use until freed back.

Could you try running with this env var:

GLIBC_TUNABLES=glibc.malloc.mmap_threshold=131072

Note: See `info libc tunables`.
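
Since pipewire-pulse is normally started as a systemd user service, a sketch
of how to test this (assuming the unit is named pipewire-pulse.service):

# Quick foreground test:
GLIBC_TUNABLES=glibc.malloc.mmap_threshold=131072 pipewire-pulse

# Or add a drop-in via `systemctl --user edit pipewire-pulse` containing:
#   [Service]
#   Environment=GLIBC_TUNABLES=glibc.malloc.mmap_threshold=131072
# and then `systemctl --user restart pipewire-pulse`.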

> maybe some of these buffers sticking around for the duration of the
> connection could be pooled and shared?
 
They are pooled and shared if they are cached by the system memory allocator.

All of tcmalloc, jemalloc, and glibc malloc attempt to cache the userspace
requests, with different algorithms that match given workloads.
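
One quick way to separate allocator caching from live usage is to watch the
daemon's RSS across the connect/disconnect cycle under each allocator. A
sketch (assumes the process is named pipewire-pulse):

# Sample RSS once a second while the 100 test clients connect and disconnect.
watch -n 1 'grep VmRSS /proc/$(pgrep -x pipewire-pulse)/status'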

-- 
Cheers,
Carlos.