Re: [RFC] memtank: add memtank library

Stephen Hemminger Fri, 03 Jul 2026 09:30:37 -0700

On Wed, 10 Jun 2026 11:39:17 +0100
Konstantin Ananyev <[email protected]> wrote:


> Introduce memtank, highly customizable fixed sized object allocator
> for DPDK applications. It offers close to the mempool level performance
> on the fast path.
> Main difference with the mempool is the ability to grow/shrink at runtime
> with user provided grow/shirnk threshold values, plus some extra
> features for higher flexibility.
> 
> Key properties:
> 
> - relies on user to provide callbacks for actual memory reservations.
>   User is free to choose whatever is most suitable way for his scenario,
>   i.e: via malloc/rte_malloc/mmap/some custom memory allocator.
> - user defined constructor callback for newly allocated objects.
> - bulk alloc and free APIs.
> - different alloc/free policies (specified by user via flags parameter):
>   * lightweight as possible, but can fail
>   * more robust, but heavyweight - causes call to user-provided backing
>     memory allocator.
> - backing memory grows/shrinks on demand, special API extensions
>   to allow user control grow/shrink size, frequency and when/where it
>   is going to happen (DP, CP, both, etc.).
> - ability to pre-allocate all objects at memtank creation time
>   (mempool like behavior).
> - custom object size and alignment.
> - per object runtime statistics and sanity-checks (boundary violation,
>   double free, etc.) can be enabled/disabled at memtank creation time.
> 
> Known limitations (subject for further improvements):
> 
> - scalability:
>   after 8+ lcores conventional mempool (with FIFO) starts to outperform
>   memtank (which uses LIFO inside).
> - mempool_cache integration is not part of the library and right now
>   has to be implemented by used manually on top of memtank API.
> 
> Envisioned usage scenarios within DPDK-based apps:
> various flow/session control structures (TCP PCB, CT, NAT sessions, etc.)
> that needs to be allocated/freed at the data-path.
> Also can be used by 'semi-fastpath' allocations:
> TBL-8 blocks for LPM, hash buckets, etc.
> 
> Initial idea is inspired by Linux/Solaris SLAB allocators.
> Also re-used some ideas from my previous work for TLDK project:
> https://github.com/FDio/tldk
> Signed-off-by: Konstantin Ananyev <[email protected]>
> ---

Lots of good feedback from Fable AI review. Still needs more work.


Thanks for bringing memtank into DPDK proper - dynamic grow/shrink
fills a real gap between mempool and malloc for session-type
structures. The API shape holds up well against the new-library
criteria: handle-based create/destroy, single installed header,
RTE_EXPORT_EXPERIMENTAL_SYMBOL used correctly (26.11 for the next
merge window), and the three callbacks in rte_memtank_prm are the
point of the design rather than a framework smell.

That said, I applied the RFC on 26.07-rc2 and found real bugs.
Items 1-3 were verified at runtime with ASAN/UBSan harnesses
against the applied tree, item 4 with a build.

Errors:

1. Heap buffer overflow in rte_memtank_create() metadata sizing.

memtank_meta_size() reserves no headroom for aligning the user
allocator's return pointer, unlike memchunk_size() which adds
"algn - 1" for exactly this reason. create() then does:

        mt = RTE_PTR_ALIGN_CEIL(p, alignof(typeof(*mt)));

which can advance p by up to 63 bytes into an allocation that only
has 48 bytes of slack (the gap between the offset of mtf.free[]
and sizeof(struct memtank_free)). Nothing documents an alignment
contract for prm->alloc(); with any allocator returning less than
16-byte-aligned memory (32-bit glibc malloc guarantees only 8),
put_free() writes the tail of the free[] cache past the end of
the allocation. Verified with ASAN and an 8-byte-aligned alloc
callback:

  WRITE of size 7680 ...
    #2 copy_objs lib/memtank/memtank.c:101
    #3 put_free lib/memtank/memtank.c:131
    #4 rte_memtank_free lib/memtank/memtank.c:627
  located 0 bytes after 8520-byte region

Fix: mirror memchunk_size(),

        sz = sizeof(*mt) + nb_free * sizeof(mt->mtf.free[0]);
        sz = RTE_ALIGN_CEIL(sz + alignof(typeof(*mt)) - 1,
                alignof(typeof(*mt)));

and document minimum alignment expectations for alloc().

2. Shrink is a no-op: the loop in shrink_chunk() never executes.

memtank.c:234-239:

        k = 0;
        n = 0;
        for (i = 0; i != num && n != k; i += k) {

"n != k" is evaluated before the first iteration with both
variables zero, so the body never runs and the function always
returns 0. Consequently rte_memtank_shrink() and the
RTE_MTANK_FREE_SHRINK flag never release memory. Verified at
runtime: with all chunks on the FULL list and nb_free == max_free,
rte_memtank_shrink() returns 0 and the free() callback is never
invoked; memory is only released by destroy(). The condition
should presumably be "n == k" (continue while the previous batch
was fully satisfied, stop when _shrink_chunk() drains the FULL
list). Note this also means the stress test exercises none of the
shrink paths, despite reporting shrink statistics.

3. Misaligned struct memobj access when obj_align < 8.

check_param() accepts any power-of-two obj_align including 1 and
2, but the per-object trailer (two uint64_t red zones plus a
pointer) is placed at obj + obj_sz - sizeof(struct memobj), which
is only 8-byte aligned when the object stride is. With obj_align=2
and obj_size=5, UBSan traps on the first grow:

  memtank.c:86: runtime error: store to misaligned address ...
  for type 'struct memobj', which requires 8 byte alignment

Undefined behavior, and a fault on strict-alignment targets. Fix:
round obj_align up to alignof(struct memobj) inside create(), or
reject smaller values in check_param(). The shipped unit test uses
obj_align=2 and only avoids the trap because obj_size=0 keeps the
stride a multiple of 8.

4. Build failure: VLA in the stress test.

test_memtank_stress.c:425:

        uint8_t buf[sz];

DPDK's default warning flags include -Wvla, so CI builds with
werror=true fail here. sz can also reach OBJ_SZ_MAX (1MB), which
is unfriendly to thread stacks regardless. Use a fixed-size buffer
with a bounded compare loop, or allocate once per worker.

Warnings:

5. mt->obj_size truncation for very large objects. memobj_size()
returns size_t but is stored into uint32_t (memtank.c:561). With
obj_size near UINT32_MAX the padded per-object size wraps, so
init_chunk() strides with a truncated value while chunk_size is
computed with the full one - corruption instead of EINVAL.
check_param() should bound obj_size.

6. Deprecated API in new test code: rte_atomic64_t/rte_atomic64_*
(test_memtank_stress.c:33-39) and rte_smp_rmb()/rte_smp_mb()
(:542, :616, :678, :686). New code should use
rte_atomic_*_explicit() with rte_memory_order_*. Related: wrk_cmd
(:127) is a plain uint32_t shared between the control lcore and
workers; it should be RTE_ATOMIC(uint32_t) with release store /
acquire load rather than bare access plus full barriers.

7. master/slave terminology throughout the stress test
(struct master_args, MASTER_FLAG_*, test_memtank_master, "launch
on all slaves"). DPDK uses main/worker; this was purged tree-wide
in 20.11 and must not be reintroduced.

8. __rte_cache_aligned placement. memtank.h:40,48,62 and
test_memtank_stress.c:97 use the old trailing form
"} __rte_cache_aligned;"; current convention is
"struct __rte_cache_aligned foo {" (MSVC-compatible position, see
rte_mempool.h). wrk_cmd applies the attribute to a variable - use
"static alignas(RTE_CACHE_LINE_SIZE) uint32_t" there.

9. meson.build declares deps += ['ring', 'telemetry'] but the
library uses neither (the ring is only in the test app, which has
its own dep entry). The empty extra_flags loop is dead boilerplate.

10. Callback contracts are undocumented. alloc/free/init can be
invoked concurrently from multiple lcores (any thread can trigger
grow or shrink), so they must be MT-safe. Worth stating in the
rte_memtank_prm doxygen, along with alignment expectations and
that alloc() need not zero memory.

11. Documentation needs a pass before leaving RFC:
- memtank_lib.rst uses markdown triple backticks (22 occurrences),
  which RST renders literally; use ``inline literals``.
- The create/destroy example does not compile: "sruct", semicolons
  instead of commas inside the designated initializer, and
  "user_define_free" vs the defined "user_defined_free".
- Typos in the rst and header doxygen: "Same a s mempool", "Aled
  public API", "ret_memtank_free", "rte_memetank_destroy",
  "memntank", "pooll", "susbystem", "udnerlying", "thershold",
  "maximimum", "statisitics", "insteance", "intitialize".
- rte_memtank.h:32 says the lists are "(USED, FREE)" but the
  code's states are FULL and USED.
- misc.c:233 dump output has a stray comma: "[USED]={,".
- rte_memtank_chunk_free() doxygen for nb_obj says "Number of
  objects to allocate".

12. Missing MAINTAINERS entry and release notes for a new library
(fine to defer for an RFC, needed for the real submission).

Info:

13. extern "C" opens before the #includes in rte_memtank.h;
current convention wraps only the declarations, after all includes
(see rte_mempool.h, rte_argparse.h). Include order is also
inverted (stdio.h after the rte_ headers).

14. RTE_MTANK_ALLOC_CHUNK is never tested by the implementation -
rte_memtank_alloc() gates the chunk path on "flags != 0", so any
future flag silently implies chunk allocation. Either test the
flag explicitly or document that any flag enables the chunk path.

Re: [RFC] memtank: add memtank library

Reply via email to