On 2024/9/12 13:35, Mattias Rönnblom wrote:
> On 2024-09-12 04:33, fengchengwen wrote:
>> On 2024/9/12 1:04, Mattias Rönnblom wrote:
>>> Introduce DPDK per-lcore id variables, or lcore variables for short.
>>>
>>> An lcore variable has one value for every current and future lcore
>>> id-equipped thread.
>>>
>>> The primary <rte_lcore_var.h> use case is for statically allocating
>>> small, frequently-accessed data structures, for which one instance
>>> should exist for each lcore.
>>>
>>> Lcore variables are similar to thread-local storage (TLS, e.g., C11
>>> _Thread_local), but decoupling the values' life time from that of the
>>> threads.
>>>
>>> Lcore variables are also similar in terms of functionality provided by
>>> FreeBSD kernel's DPCPU_*() family of macros and the associated
>>> build-time machinery. DPCPU uses linker scripts, which effectively
>>> prevents the reuse of its, otherwise seemingly viable, approach.
>>>
>>> The currently-prevailing way to solve the same problem as lcore
>>> variables is to keep a module's per-lcore data as an RTE_MAX_LCORE-sized
>>> array of cache-aligned, RTE_CACHE_GUARDed structs. The benefit of
>>> lcore variables over this approach is that data related to the same
>>> lcore now is close (spatially, in memory), rather than data used by
>>> the same module, which in turn avoids excessive use of padding,
>>> polluting caches with unused data.
>>>
>>> Signed-off-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
>>> Acked-by: Morten Brørup <m...@smartsharesystems.com>
>>>
>>> --
>>>
>>> PATCH v2:
>>>  * Add Windows support. (Morten Brørup)
>>>  * Fix lcore variables API index reference. (Morten Brørup)
>>>  * Various improvements of the API documentation. (Morten Brørup)
>>>  * Elimination of unused symbol in version.map. (Morten Brørup)
>>
>> this history could move to the cover letter.
>>
>>>
>>> PATCH:
>>>  * Update MAINTAINERS and release notes.
>>>  * Stop covering included files in extern "C" {}.
>>>
>>> RFC v6:
>>>  * Include <stdlib.h> to get aligned_alloc().
>>>  * Tweak documentation (grammar).
>>>  * Provide API-level guarantees that lcore variable values take on an
>>>    initial value of zero.
>>>  * Fix misplaced __rte_cache_aligned in the API doc example.
>>>
>>> RFC v5:
>>>  * In Doxygen, consistently use @<cmd> (and not \<cmd>).
>>>  * The RTE_LCORE_VAR_GET() and SET() convenience access macros
>>>    covered an uncommon use case, where the lcore value is of a
>>>    primitive type, rather than a struct, and are thus eliminated
>>>    from the API. (Morten Brørup)
>>>  * In the wake of the GET()/SET() removal, rename RTE_LCORE_VAR_PTR()
>>>    to RTE_LCORE_VAR_VALUE().
>>>  * The underscores are removed from __rte_lcore_var_lcore_ptr() to
>>>    signal that this function is a part of the public API.
>>>  * Macro arguments are documented.
>>>
>>> RFC v4:
>>>  * Replace large static array with libc heap-allocated memory. One
>>>    implication of this change is there no longer exists a fixed upper
>>>    bound for the total amount of memory used by lcore variables.
>>>    RTE_MAX_LCORE_VAR has changed meaning, and now represents the
>>>    maximum size of any individual lcore variable value.
>>>  * Fix issues in example. (Morten Brørup)
>>>  * Improve access macro type checking. (Morten Brørup)
>>>  * Refer to the lcore variable handle as "handle" and not "name" in
>>>    various macros.
>>>  * Document lack of thread safety in rte_lcore_var_alloc().
>>>  * Provide API-level assurance the lcore variable handle is
>>>    always non-NULL, to allow applications to use NULL to mean
>>>    "not yet allocated".
>>>  * Note zero-sized allocations are not allowed.
>>>  * Give API-level guarantee the lcore variable values are zeroed.
>>>
>>> RFC v3:
>>>  * Replace use of GCC-specific alignof(<expression>) with alignof(<type>).
>>>  * Update example to reflect FOREACH macro name change (in RFC v2).
>>>
>>> RFC v2:
>>>  * Use alignof to derive alignment requirements.
>>>    (Morten Brørup)
>>>  * Change name of FOREACH to make it distinct from <rte_lcore.h>'s
>>>    *per-EAL-thread* RTE_LCORE_FOREACH(). (Morten Brørup)
>>>  * Allow user-specified alignment, but limit max to cache line size.
>>> ---
>>>  MAINTAINERS                            |   6 +
>>>  config/rte_config.h                    |   1 +
>>>  doc/api/doxy-api-index.md              |   1 +
>>>  doc/guides/rel_notes/release_24_11.rst |  14 +
>>>  lib/eal/common/eal_common_lcore_var.c  |  78 +++++
>>>  lib/eal/common/meson.build             |   1 +
>>>  lib/eal/include/meson.build            |   1 +
>>>  lib/eal/include/rte_lcore_var.h        | 385 +++++++++++++++++++++++++
>>>  lib/eal/version.map                    |   2 +
>>>  9 files changed, 489 insertions(+)
>>>  create mode 100644 lib/eal/common/eal_common_lcore_var.c
>>>  create mode 100644 lib/eal/include/rte_lcore_var.h
>>>
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> index c5a703b5c0..362d9a3f28 100644
>>> --- a/MAINTAINERS
>>> +++ b/MAINTAINERS
>>> @@ -282,6 +282,12 @@ F: lib/eal/include/rte_random.h
>>>  F: lib/eal/common/rte_random.c
>>>  F: app/test/test_rand_perf.c
>>> +Lcore Variables
>>> +M: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
>>> +F: lib/eal/include/rte_lcore_var.h
>>> +F: lib/eal/common/eal_common_lcore_var.c
>>> +F: app/test/test_lcore_var.c
>>> +
>>>  ARM v7
>>>  M: Wathsala Vithanage <wathsala.vithan...@arm.com>
>>>  F: config/arm/
>>> diff --git a/config/rte_config.h b/config/rte_config.h
>>> index dd7bb0d35b..311692e498 100644
>>> --- a/config/rte_config.h
>>> +++ b/config/rte_config.h
>>> @@ -41,6 +41,7 @@
>>>  /* EAL defines */
>>>  #define RTE_CACHE_GUARD_LINES 1
>>>  #define RTE_MAX_HEAPS 32
>>> +#define RTE_MAX_LCORE_VAR 1048576
>>>  #define RTE_MAX_MEMSEG_LISTS 128
>>>  #define RTE_MAX_MEMSEG_PER_LIST 8192
>>>  #define RTE_MAX_MEM_MB_PER_LIST 32768
>>> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
>>> index f9f0300126..ed577f14ee 100644
>>> --- a/doc/api/doxy-api-index.md
>>> +++ b/doc/api/doxy-api-index.md
>>> @@ -99,6 +99,7 @@ The public API headers are grouped by topics:
>>>    [interrupts](@ref
>>>    rte_interrupts.h),
>>>    [launch](@ref rte_launch.h),
>>>    [lcore](@ref rte_lcore.h),
>>> +  [lcore variables](@ref rte_lcore_var.h),
>>>    [per-lcore](@ref rte_per_lcore.h),
>>>    [service cores](@ref rte_service.h),
>>>    [keepalive](@ref rte_keepalive.h),
>>> diff --git a/doc/guides/rel_notes/release_24_11.rst
>>> b/doc/guides/rel_notes/release_24_11.rst
>>> index 0ff70d9057..a3884f7491 100644
>>> --- a/doc/guides/rel_notes/release_24_11.rst
>>> +++ b/doc/guides/rel_notes/release_24_11.rst
>>> @@ -55,6 +55,20 @@ New Features
>>>       Also, make sure to start the actual text at the margin.
>>>       =======================================================
>>> +* **Added EAL per-lcore static memory allocation facility.**
>>> +
>>> +  Added EAL API <rte_lcore_var.h> for statically allocating small,
>>> +  frequently-accessed data structures, for which one instance should
>>> +  exist for each EAL thread and registered non-EAL thread.
>>> +
>>> +  With lcore variables, data is organized spatially on a per-lcore id
>>> +  basis, rather than per library or PMD, avoiding the need for cache
>>> +  aligning (or RTE_CACHE_GUARDing) data structures, which in turn
>>> +  reduces CPU cache internal fragmentation, improving performance.
>>> +
>>> +  Lcore variables are similar to thread-local storage (TLS, e.g.,
>>> +  C11 _Thread_local), but decoupling the values' life time from that
>>> +  of the threads.
>>>  Removed Items
>>>  -------------
>>> diff --git a/lib/eal/common/eal_common_lcore_var.c
>>> b/lib/eal/common/eal_common_lcore_var.c
>>> new file mode 100644
>>> index 0000000000..309822039b
>>> --- /dev/null
>>> +++ b/lib/eal/common/eal_common_lcore_var.c
>>> @@ -0,0 +1,78 @@
>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>> + * Copyright(c) 2024 Ericsson AB
>>> + */
>>> +
>>> +#include <inttypes.h>
>>> +#include <stdlib.h>
>>> +
>>> +#ifdef RTE_EXEC_ENV_WINDOWS
>>> +#include <malloc.h>
>>> +#endif
>>> +
>>> +#include <rte_common.h>
>>> +#include <rte_debug.h>
>>> +#include <rte_log.h>
>>> +
>>> +#include <rte_lcore_var.h>
>>> +
>>> +#include "eal_private.h"
>>> +
>>> +#define LCORE_BUFFER_SIZE (RTE_MAX_LCORE_VAR * RTE_MAX_LCORE)
>>> +
>>> +static void *lcore_buffer;
>>> +static size_t offset = RTE_MAX_LCORE_VAR;
>>> +
>>> +static void *
>>> +lcore_var_alloc(size_t size, size_t align)
>>> +{
>>> +	void *handle;
>>> +	void *value;
>>> +
>>> +	offset = RTE_ALIGN_CEIL(offset, align);
>>> +
>>> +	if (offset + size > RTE_MAX_LCORE_VAR) {
>>> +#ifdef RTE_EXEC_ENV_WINDOWS
>>> +		lcore_buffer = _aligned_malloc(LCORE_BUFFER_SIZE,
>>> +			RTE_CACHE_LINE_SIZE);
>>> +#else
>>> +		lcore_buffer = aligned_alloc(RTE_CACHE_LINE_SIZE,
>>> +			LCORE_BUFFER_SIZE);
>>> +#endif
>>> +		RTE_VERIFY(lcore_buffer != NULL);
>>> +
>>> +		offset = 0;
>>> +	}
>>> +
>>> +	handle = RTE_PTR_ADD(lcore_buffer, offset);
>>> +
>>> +	offset += size;
>>> +
>>> +	RTE_LCORE_VAR_FOREACH_VALUE(value, handle)
>>> +		memset(value, 0, size);
>>> +
>>> +	EAL_LOG(DEBUG, "Allocated %"PRIuPTR" bytes of per-lcore data with a "
>>> +		"%"PRIuPTR"-byte alignment", size, align);
>>
>> Currently the data is malloc'ed by a libc function; I think that's mainly
>> for such INIT macros, which will be initialized before main.
>> But it will introduce the following problems:
>> 1\ it can't benefit from huge-pages.
>> this patch may reserve many 1 MB areas for
>> each lcore; if we could place them in huge-pages it would reduce the TLB miss
>> rate, especially since this is frequently accessed data.
>
> This mechanism is for small allocations, the sum of which is also expected to
> be small (although the system won't break if they aren't).
>
> If you have large allocations, you are better off using lazy huge page
> allocations further down the initialization process. Otherwise, you will end
> up using memory for RTE_MAX_LCORE instances, rather than the actual lcore
> count, which could be substantially smaller.

Yes, it may cost too much memory if allocated from hugepage memory.

>
> But sure, everything else being equal, you could have used huge pages for
> these lcore variable values. But everything isn't equal.
>
>> 2\ it can't across multi-process. many of current lcore-data also don't
>> support multi-process, but I think it worth do that, and it will help us to
>> some service recovery when sub-process failed and reboot.
>>
>> ...
>>
>
> Not sure I think that's a downside. Further cementing that anti-pattern into
> DPDK seems to be a bad idea to me.
>
> lcore variables doesn't *introduce* any of these issues, since the mechanisms
> it's replacing also have these shortcomings (if you think about them as such
> - I'm not sure I do).

Got it. This feature is an enhancement of the current per-lcore data handling:
it brings together scattered data from the point of view of a single core.
And currently it seems hard to extend it to support hugepage memory.
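
To make the layout discussed above concrete, here is a minimal standalone
sketch in plain C11 with no DPDK headers. It is not the patch's code: MAX_LCORE,
MAX_LCORE_VAR and CACHE_LINE are small illustrative stand-ins for
RTE_MAX_LCORE, RTE_MAX_LCORE_VAR and RTE_CACHE_LINE_SIZE, and lcore_var_value()
is a hypothetical helper. It also assumes that the value for lcore id i lives
at handle + i * MAX_LCORE_VAR, which is my reading of the buffer sizing in the
quoted lcore_var_alloc(), not something the quoted hunk shows directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-ins, deliberately tiny; not DPDK's values. */
#define MAX_LCORE 4
#define MAX_LCORE_VAR 1024
#define CACHE_LINE 64

static void *lcore_buffer;
/* Starting at the limit forces a buffer allocation on the first call. */
static size_t offset = MAX_LCORE_VAR;

/* Round v up to a multiple of align (align must be a power of two). */
static size_t align_ceil(size_t v, size_t align)
{
	return (v + align - 1) & ~(align - 1);
}

/* Hypothetical helper: address of the value belonging to lcore_id.
 * Assumes consecutive per-lcore slices of MAX_LCORE_VAR bytes each. */
static void *lcore_var_value(unsigned int lcore_id, void *handle)
{
	return (char *)handle + (size_t)lcore_id * MAX_LCORE_VAR;
}

/* Returns the handle, i.e. the address of the lcore 0 value. */
static void *lcore_var_alloc(size_t size, size_t align)
{
	void *handle;
	unsigned int i;

	assert(size > 0 && size <= MAX_LCORE_VAR);
	assert(align != 0 && (align & (align - 1)) == 0 && align <= CACHE_LINE);

	offset = align_ceil(offset, align);

	if (offset + size > MAX_LCORE_VAR) {
		/* The current buffer is full or absent. Start a new one;
		 * the old buffer stays alive because handles point into it. */
		lcore_buffer = aligned_alloc(CACHE_LINE,
			(size_t)MAX_LCORE_VAR * MAX_LCORE);
		assert(lcore_buffer != NULL);
		offset = 0;
	}

	handle = (char *)lcore_buffer + offset;
	offset += size;

	/* Zero the value in every lcore's slice, used or not. */
	for (i = 0; i < MAX_LCORE; i++)
		memset(lcore_var_value(i, handle), 0, size);

	return handle;
}
```

In this sketch, two variables allocated back to back end up next to each other
inside every lcore's slice, which is the "data related to the same lcore is
close" property from the commit message. It also shows the cost raised in this
thread: each buffer is MAX_LCORE_VAR * MAX_LCORE bytes no matter how many
lcores are actually in use.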