On Mon, Dec 9, 2024 at 4:39 PM Mattias Rönnblom <hof...@lysator.liu.se> wrote:
> On 2024-12-09 12:03, David Marchand wrote:
> > On Fri, Dec 6, 2024 at 12:02 PM Mattias Rönnblom <hof...@lysator.liu.se> wrote:
> >> On 2024-12-05 18:57, David Marchand wrote:
> >>> As I had reported in rc2, the lcore variables allocation has a
> >>> noticeable impact on applications consuming DPDK, even when such
> >>> applications do not use DPDK, or do not use the features associated
> >>> with some lcore variables.
> >>>
> >>> While the amount was reduced in a rush before rc2, there are still
> >>> cases where the increased memory footprint is noticeable, like in
> >>> scaling tests.
> >>> See https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2090931
> >>>
> >>
> >> What this bug report fails to mention is that it only affects
> >> applications using locked memory.
> >
> > - By locked memory, are you referring to mlock() and friends?
> > No ovsdb binary calls them; only the datapath cares about mlocking.
> >
> > - At a minimum, I understand the lcore var change introduced an
> > increase in memory of 4kB * 128 (getpagesize() * RTE_MAX_LCORES),
> > since lcore_var_alloc() calls memset() of the lcore var size, for
> > every lcore.
> >
>
> Yes, that is my understanding. It's also consistent with the
> measurements I've posted on this list.
>
> > In this unit test, where 1000 processes are kept alive in parallel,
> > this means memory consumption increased by 512k * 1000, so ~500M at
> > least.
> > This amount of memory is probably significant in a resource-constrained
> > env like a (Ubuntu) CI.
> >
>
> I wouldn't expect thousands of concurrent processes in a
> resource-constrained system. Sounds wasteful indeed. But sure, there may
> well be scenarios where this makes sense.
>
> > - I went and traced this unit test on my laptop by monitoring
> > kmem:mm_page_alloc, though there may be a better metric when it comes
> > to memory consumption.
> >
> > # dir=build; perf stat -e kmem:mm_page_alloc -- tests/testsuite -C
> > $dir/tests
> > AUTOTEST_PATH=$dir/utilities:$dir/vswitchd:$dir/ovsdb:$dir/vtep:$dir/tests:$dir/ipsec::
> > 2154
> >
> > Which gives:
> > - 1 635 489 kmem:mm_page_alloc for v23.11
> > - 5 777 043 kmem:mm_page_alloc for v24.11
> >
>
> Interesting. What is vm.overcommit_memory set to?

# cat /proc/sys/vm/overcommit_memory
0

And I am not sure what is being used in Ubuntu CI.
But the problem is, in the end, simpler.

[snip]

> > There is a 4M difference, where I would expect 128k.
> > So something more happens than a simple page allocation per lcore,
> > though I fail to understand what.

Isolating the perf events for one process of this huge test, I counted
4878 page alloc calls.
Of them, 4108 had rte_lcore_var_alloc in their calling stack, which is
unexpected.

After spending some time reading glibc, I noticed alloc_perturb().
*bingo*, I remembered that OVS unit tests are run with MALLOC_PERTURB_
(=165, after double checking the OVS sources).

"""
Tunable: glibc.malloc.perturb

This tunable supersedes the MALLOC_PERTURB_ environment variable and is
identical in features.

If set to a non-zero value, memory blocks are initialized with values
depending on some low order bits of this tunable when they are allocated
(except when allocated by calloc) and freed. This can be used to debug
the use of uninitialized or freed heap memory. Note that this option
does not guarantee that the freed block will have any specific values.
It only guarantees that the content the block had before it was freed
will be overwritten.

The default value of this tunable is ‘0’.
"""

Now, reproducing this outside of the test:

$ perf stat -e kmem:mm_page_alloc -- ./build/ovsdb/ovsdb-client --help >/dev/null

 Performance counter stats for './build/ovsdb/ovsdb-client --help':

               810      kmem:mm_page_alloc

       0,003277941 seconds time elapsed

       0,003260000 seconds user
       0,000000000 seconds sys

$ MALLOC_PERTURB_=165 perf stat -e kmem:mm_page_alloc -- ./build/ovsdb/ovsdb-client --help >/dev/null

 Performance counter stats for './build/ovsdb/ovsdb-client --help':

             4 789      kmem:mm_page_alloc

       0,008766171 seconds time elapsed

       0,000976000 seconds user
       0,007794000 seconds sys

So the issue is not triggered by mlock'd memory, but by the whole 16M
buffer for lcore variables being touched by a glibc debugging feature.
And in Ubuntu CI, it translated to requesting 16G.
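To make the mechanism concrete, here is a minimal sketch (not the
reproducer above; the 16M size just mirrors the lcore variable buffer,
and counting minor faults with getrusage() is only my choice for
illustration): a large malloc'd block left untouched costs almost
nothing under overcommit, while writing a perturb pattern over it, as
alloc_perturb() does, faults in every page.

/*
 * Sketch only: simulate the effect of glibc's perturb feature on a
 * large, otherwise untouched allocation (16M mirrors the lcore
 * variable buffer size discussed above).
 *
 * Build: cc -O2 -o perturb_demo perturb_demo.c
 * Run:   ./perturb_demo           -> buffer left untouched, few faults
 *        ./perturb_demo perturb   -> every page written, ~4096 faults
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

#define BUF_SIZE (16u * 1024 * 1024)

int main(int argc, char **argv)
{
	struct rusage before, after;
	unsigned char *buf;

	getrusage(RUSAGE_SELF, &before);

	buf = malloc(BUF_SIZE);
	if (buf == NULL)
		return 1;

	/* What MALLOC_PERTURB_ effectively does at allocation time:
	 * write a pattern over the whole block, faulting in every page. */
	if (argc > 1)
		memset(buf, 0xa5, BUF_SIZE);

	getrusage(RUSAGE_SELF, &after);
	printf("minor page faults: %ld\n", after.ru_minflt - before.ru_minflt);

	free(buf);
	return 0;
}

With the argument, the minor fault count jumps by roughly
BUF_SIZE / page size, i.e. ~4096 pages, consistent with the delta
observed above.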
> > Btw, just focusing on lcore var, I did two more tests:
> > - 1 606 998 kmem:mm_page_alloc for v24.11 + revert of all lcore var
> > changes.
> > - 1 634 606 kmem:mm_page_alloc for v24.11 + current series with
> > postponed allocations.
> >
>
> If one moves initialization to shared object constructors (from having
> been done at some later time), and then ends up not running that
> initialization code at all (e.g., DPDK is not used), those code pages
> will increase RSS. That might well hurt more than the lcore variable
> memory itself, depending on how much code is run.
>
> However, such read-only pages can be replaced with something more useful
> if the system is under memory pressure, so they aren't really a big
> issue as far as (real) memory footprint is concerned.
>
> Just linking to DPDK (and its dependencies) already came with a 1-7 MB
> RSS penalty, prior to lcore variables. I wonder how much of that goes
> away if all RTE_INIT() type constructors are removed.

Regardless of the RSS change, completely removing constructors is not
simple.

Postponing *all* existing constructors from DPDK code would be an ABI
breakage, as RTE_INIT has a priority notion and application callbacks
using RTE_INIT may rely on it.

Just deferring "unprioritised" constructors would be doable on paper,
but the location in rte_eal_init where those deferred constructors would
run would have to be carefully evaluated (with -d plugins in mind).
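For reference, a small sketch of the priority notion mentioned above
(illustrative only, written against the RTE_INIT_PRIO()/RTE_INIT()
macros in rte_common.h, with made-up constructor names): a plain
RTE_INIT() callback runs at the last priority, so application code can
currently assume the prioritised DPDK constructors already ran.

/*
 * Illustration only, not a change proposal: constructor ordering with
 * DPDK's RTE_INIT_PRIO()/RTE_INIT() macros.
 */
#include <stdio.h>
#include <rte_common.h>

/* Stand-in for a DPDK-internal constructor (e.g. a bus registration). */
RTE_INIT_PRIO(dpdk_like_ctor, BUS)
{
	printf("bus-priority constructor runs first\n");
}

/* Application-side callback: ordered after all prioritised constructors. */
RTE_INIT(app_ctor)
{
	printf("application RTE_INIT() callback runs afterwards\n");
}

int main(void)
{
	/* Both constructors have already run before main() is entered. */
	return 0;
}


-- 
David Marchand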