On 2024-10-14 09:56, Morten Brørup wrote:
From: Jerin Jacob [mailto:jerinjac...@gmail.com]
Sent: Wednesday, 18 September 2024 12.12
On Thu, Sep 12, 2024 at 8:52 PM Jerin Jacob <jerinjac...@gmail.com>
wrote:
On Thu, Sep 12, 2024 at 7:11 PM Morten Brørup
<m...@smartsharesystems.com> wrote:
From: Jerin Jacob [mailto:jerinjac...@gmail.com]
Sent: Thursday, 12 September 2024 15.17
On Thu, Sep 12, 2024 at 2:40 PM Morten Brørup
<m...@smartsharesystems.com>
wrote:
+#define LCORE_BUFFER_SIZE (RTE_MAX_LCORE_VAR * RTE_MAX_LCORE)
Considering hugepages...
Lcore variables may be allocated before DPDK's memory allocator
(rte_malloc()) is ready, so rte_malloc() cannot be used for lcore
variables.
And lcore variables are not usable (shared) for DPDK multi-process, so the
lcore_buffer could be allocated through the O/S APIs as anonymous hugepages,
instead of using rte_malloc().
The alternative, using rte_malloc(), would disallow allocating
lcore
variables before DPDK's memory allocator has been initialized,
which I think
is too late.
I thought it is not. A lot of the subsystems are initialized
after the
memory subsystem is initialized.
See the example given in the documentation [1]. I thought RTE_INIT needs to
be replaced if the subsystem is called after memory is initialized (which is
the case for most of the libraries).
The RTE_INIT functions are all called before main(), which is not
very useful.
Yes, it would be good to replace (or supplement) RTE_INIT_PRIO by
something similar, which calls the list of "INIT" functions at the
appropriate time during EAL initialization.
DPDK should then use this "INIT" list for all its initialization,
so the init function of new features (such as this, and trace) can be
inserted at the correct location in the list.
Trace library had a similar situation. It is managed like [2]
Yes, if we insist on using rte_malloc() for lcore variables, the
alternative is to prohibit establishing lcore variables in functions
called through RTE_INIT.
I was not insisting on using ONLY rte_malloc(). rte_malloc() can be
called before rte_eal_init() (it will return NULL), so the alloc routine
can first check whether rte_malloc() is available and, if not, switch
over to glibc.
@Mattias Rönnblom This comment is not addressed in v7. Could you check?
Mattias, following up on Jerin's suggestion:
When allocating an lcore variable, and the buffer holding lcore variables is
out of space (or was never allocated), a new buffer is allocated.
Here's the twist I think Jerin is asking for:
You could check if rte_malloc() is available, and use that (instead of the
heap) when allocating a new buffer holding lcore variables.
This check can be performed (aggressively) when allocating a new lcore
variable, or (conservatively) only when allocating a new buffer.
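The conservative variant could look roughly like the sketch below. It is only an illustration under stated assumptions, not the patch's implementation: hugepage_alloc_hook, struct lcore_buffer and lcore_buffer_alloc() are hypothetical names, and the hook merely stands in for a wrapper around rte_malloc() so the example compiles without DPDK.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a hugepage-backed allocator. In DPDK this
 * would be a thin wrapper around rte_malloc(); it is NULL here until
 * such an allocator is known to be usable.
 */
typedef void *(*hugepage_alloc_fn)(size_t size, size_t align);
static hugepage_alloc_fn hugepage_alloc_hook;

/* Hypothetical descriptor for one buffer holding lcore variables. */
struct lcore_buffer {
	void *base;
	int from_hugepages; /* decides which free routine to use later */
};

/*
 * Allocate a new buffer: prefer the hugepage allocator when it is
 * available and succeeds, otherwise fall back to the libc heap.
 * 'align' must be a power of two. Returns 0 on success, -1 on failure.
 */
static int
lcore_buffer_alloc(struct lcore_buffer *buf, size_t size, size_t align)
{
	buf->base = NULL;
	buf->from_hugepages = 0;

	if (hugepage_alloc_hook != NULL) {
		buf->base = hugepage_alloc_hook(size, align);
		if (buf->base != NULL) {
			buf->from_hugepages = 1;
			return 0;
		}
	}

	/* C11 aligned_alloc() wants size to be a multiple of align. */
	size = (size + align - 1) & ~(align - 1);
	buf->base = aligned_alloc(align, size);
	return buf->base != NULL ? 0 : -1;
}
```

The check happens only when a new buffer is needed, so lcore variables established early (for example from RTE_INIT functions) end up on the heap, while buffers allocated later can come from hugepages.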
Now, if using hugepages, the value of RTE_MAX_LCORE_VAR (the maximum size of
one lcore variable instance) becomes more important.
Let's consider systems with 2 MB hugepages:
If it supports two lcores (RTE_MAX_LCORE is 2), the current RTE_MAX_LCORE_VAR
default of 1 MB is a perfect match; it will use 2 MB of RAM as one 2 MB
hugepage.
If it supports 128 lcores, the current RTE_MAX_LCORE_VAR default of 1 MB will
use 128 MB of RAM.
If we scale it back, so it only uses one 2 MB hugepage, RTE_MAX_LCORE_VAR will
have to be 2 MB / 128 lcores = 16 KB.
16 KB might be too small. E.g. a mempool cache uses 2 * 512 * sizeof(void *) =
8 KB + a few bytes for the information about the cache. So I can easily point
at one example where 16 KB is going very close to the edge.
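The arithmetic above can be checked with a few lines of C. The helper names below are made up for this illustration; the numbers (2 MB hugepages, 1 MB default RTE_MAX_LCORE_VAR, 2 * 512 pointers in a mempool cache) are the ones quoted in this thread.

```c
#include <stddef.h>

#define KB ((size_t)1024)
#define MB (1024 * KB)

/* Total buffer size: one RTE_MAX_LCORE_VAR-sized slot per lcore. */
static size_t
lcore_buffer_size(size_t max_lcore_var, unsigned int max_lcores)
{
	return max_lcore_var * max_lcores;
}

/* Largest per-lcore slot so that all lcores fit in one hugepage. */
static size_t
max_lcore_var_for_one_hugepage(size_t hugepage_size, unsigned int max_lcores)
{
	return hugepage_size / max_lcores;
}

/* Object pointer array of a mempool cache, as quoted: 2 * 512 pointers. */
static size_t
mempool_cache_objs_size(void)
{
	return 2 * 512 * sizeof(void *);
}
```

With 128 lcores, lcore_buffer_size(1 * MB, 128) is 128 MB, and max_lcore_var_for_one_hugepage(2 * MB, 128) is 16 KB. On a 64-bit system the mempool cache pointer array alone is 8 KB, i.e. half of that 16 KB budget.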
So, as you already asked, what is a reasonable default minimum value of
RTE_MAX_LCORE_VAR?
Maybe we should just stick with your initial suggestion (1 MB) and see how it
goes.
Sure. Let's stick with 1 MB.
I'm guessing that if/when someone takes a closer look at how to do
per-lcore *dynamic* allocations, this API and its implementation will be
revisited as well.
<roadmap>
At the recent DPDK Summit, we discussed memory consumption in one of the
workshops.
One of the possible means for reducing memory consumption is making
RTE_MAX_LCORE dynamic, so an application using only a few cores will scale its
per-lcore tables to the actual number of lcores, instead of scaling to some
hardcoded maximum.
With this in mind, I'm less worried about the RTE_MAX_LCORE multiplier.
</roadmap>
An interesting hack would be to disable huge page usage, set up a swap file
in a zram device, and then MADV_PAGEOUT the DPDK process after startup.
I wonder how much smaller the DPDK process RSS would be once it had paged
back in all the pages that were actually required.