On 2024-11-09 00:52, Morten Brørup wrote:
From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
Sent: Friday, 8 November 2024 23.23
On 2024-11-08 20:53, Morten Brørup wrote:
From: Morten Brørup [mailto:m...@smartsharesystems.com]
Sent: Friday, 8 November 2024 19.35
From: David Marchand [mailto:david.march...@redhat.com]
Sent: Friday, 8 November 2024 19.18
OVS locks all pages to avoid page faults while processing packets.
It sounds smart, so I just took a look at how it does this. I'm not
sure, but it seems like it only locks pages that are actually mapped
(current and future).
mlockall(MCL_CURRENT) will bring in the whole BSS, it seems, plus
everything else: unused parts of the execution stacks, the data section,
and unused code (text) in the binary and all the libraries it links to.
It makes a simple (e.g., a unit test) DPDK 24.07 program use ~33x more
resident memory. After lcore variables, the same MCL_CURRENT-ed
program is ~30% larger than before. So, a relatively modest increase.
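
To make the effect above easier to reproduce, here is a minimal sketch, assuming Linux and /proc/self/status for the RSS reading. The names demo_bss, rss_kb() and lock_and_measure() are hypothetical and are not taken from DPDK or OVS. The 16 MB array only stands in for a large, mostly untouched BSS.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical stand-in for a large, mostly untouched BSS region. */
#define DEMO_BSS_SIZE (16UL * 1024 * 1024)
char demo_bss[DEMO_BSS_SIZE];

/* Resident set size in kB, read from /proc/self/status (Linux only).
 * Returns -1 if the value cannot be read. */
long rss_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (sscanf(line, "VmRSS: %ld kB", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}

/* Lock all current and future mappings and report RSS before and after.
 * Returns 0 on success, -1 if mlockall() failed (for example because of
 * RLIMIT_MEMLOCK). */
int lock_and_measure(long *before_kb, long *after_kb)
{
    assert(before_kb != NULL && after_kb != NULL);

    demo_bss[0] = 1; /* touch one page; the rest has not been faulted in */
    *before_kb = rss_kb();
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -1;
    *after_kb = rss_kb();
    munlockall(); /* undo the lock so the caller is not affected */
    return 0;
}
```

Calling lock_and_measure() from main() and printing both values should show the RSS growing by roughly the size of the untouched array when the lock succeeds. An unprivileged process will often hit RLIMIT_MEMLOCK instead, in which case the function returns -1.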
Thank you for testing this, Mattias.
What are the absolute numbers, i.e. in KB, to get an idea of the numbers I
should be looking for?
Hello world type program with static linking. Default DPDK config. x86_64.
DPDK version  MAX_LCORE_VAR  EAL params         mlock  RSS [MB]
22.11         -              --no-huge -m 1000  no       22
24.11         1048576        --no-huge -m 1000  no       22
24.11         1048576        --no-huge -m 1000  yes    1576
24.11         4096           --no-huge -m 1000  yes    1445
22.11         -              -                  yes     333*
24.11         1048576        -                  yes     542*
24.11         4096           -                  yes     411*
* Excluding huge pages
If you are more selective about which libraries you bring in, the
footprint will be lower. How large a fraction is effectively unavoidable,
I don't know. The relative increase will depend on how much memory the
application uses, obviously. The hello world app doesn't have any
app-level state.
I wonder why the footprint grows at all... Intuitively the same variables
should consume approximately the same amount of RAM, regardless of how they are
allocated.
Speculating...
lcore variables use malloc(), which in turn does not bring in memory
pages unless they are needed. Much of the lcore buffer will be unused,
and not resident. I covered this, including an example calculation of
the space savings, in an earlier thread. It may be in the programmer's
guide as well, I don't remember.
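
The lazy paging described above can be observed directly with mincore(). This is a minimal sketch, assuming Linux. An anonymous mmap() stands in for a large malloc(), because mincore() needs a page-aligned address. The function names resident_pages() and demo_lazy_residency() are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the resident pages in [addr, addr + len). addr must be
 * page-aligned. Returns -1 on error. */
long resident_pages(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t n, i;
    unsigned char *vec;
    long count = 0;

    assert(page > 0);
    n = (len + (size_t)page - 1) / (size_t)page;
    vec = malloc(n > 0 ? n : 1);
    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    for (i = 0; i < n; i++)
        count += vec[i] & 1; /* the lowest bit marks a resident page */
    free(vec);
    return count;
}

/* Map npages anonymous pages, write to the first `touched` of them, and
 * return how many pages of the mapping are resident afterwards. */
long demo_lazy_residency(size_t npages, size_t touched)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len, i;
    char *buf;
    long resident;

    assert(page > 0);
    len = npages * (size_t)page;
    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    for (i = 0; i < touched && i < npages; i++)
        buf[i * (size_t)page] = 1; /* first write faults the page in */
    resident = resident_pages(buf, len);
    munmap(buf, len);
    return resident;
}
```

With nothing touched, no page of the mapping should be resident. Touching a few pages makes only those pages resident (possibly a few more, depending on the kernel's huge page settings). Locking the mapping would instead make all of it resident.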
The lcore_states were allocated through rte_calloc() and thus used some space
in the already allocated hugepages, so they didn't add more pages to the
footprint. But they do when allocated and initialized as lcore variables.
The first lcore variable allocated/initialized uses RTE_MAX_LCORE (128) pages
of 4 KB each = 512 KB total. It seems unlikely that adding 512 KB increases the
footprint by 30 %.
mlockall() brings in all currently-untouched malloc()ed pages, growing
the set of resident pages.
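
As a rough check of the 512 KB estimate above, here is a small worked calculation. It assumes the lcore variable buffer is sized as RTE_MAX_LCORE times MAX_LCORE_VAR bytes. That sizing is an assumption for illustration: the thread only supplies the two input values. The helper lcore_var_buffer_bytes() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed buffer size: one MAX_LCORE_VAR-sized slot per possible lcore.
 * This formula is an assumption, not confirmed by the thread. */
size_t lcore_var_buffer_bytes(size_t max_lcore, size_t max_lcore_var)
{
    /* guard against overflow in the multiplication */
    assert(max_lcore_var == 0 || max_lcore <= (size_t)-1 / max_lcore_var);
    return max_lcore * max_lcore_var;
}
```

Under that assumption, 128 lcores with MAX_LCORE_VAR = 4096 give 512 KB, which matches the estimate above. With MAX_LCORE_VAR = 1048576 the same formula gives 128 MB. That is in the same range as the 131 MB difference between the two mlock-ed 24.11 rows in the table (542 vs. 411, and 1576 vs. 1445).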
The numbers are less drastic, obviously, for many real-world programs,
which have large packet pools and other memory hogs.
Agree.
However, it would be good to understand why switching to lcore variables has
this effect on the footprint when using mlockall() like OVS.