> From: Konstantin Ananyev [mailto:[email protected]]
> Sent: Monday, 15 December 2025 15.41
>
> >
> > Executive Summary:
> >
> > My analysis shows that the mbuf library is not a barrier for
> > fast-freeing segmented packet mbufs, and thus fast-free of jumbo
> > frames is possible.
> >
> >
> > Detailed Analysis:
> >
> > The purpose of the mbuf fast-free Tx optimization is to reduce
> > rte_pktmbuf_free_seg() to something much simpler in the ethdev
> > drivers, by eliminating the code path related to indirect mbufs.
> > Optimally, we want to simplify the ethdev driver's function that
> > frees the transmitted mbufs, so it can free them directly to their
> > mempool without accessing the mbufs themselves.
> >
> > If the driver cannot access the mbuf itself, it cannot determine
> > which mempool it belongs to.
> > We don't want the driver to access every mbuf being freed; but if all
> > mbufs of a Tx queue belong to the same mempool, the driver can
> > determine which mempool by looking into just one of the mbufs.
> >
> > REQUIREMENT 1: The mbufs of a Tx queue must come from the same
> > mempool.
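
To illustrate what REQUIREMENT 1 buys the driver, here is a minimal,
self-contained sketch of a Tx completion path. It does not use the real DPDK
types: 'toy_pool', 'toy_mbuf', toy_pool_put_bulk() and toy_tx_free_bufs() are
hypothetical stand-ins, with toy_pool_put_bulk() playing the role that a bulk
mempool put would play in a real driver.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_POOL_SIZE 64

struct toy_mbuf;

/* Toy stand-in for a mempool (hypothetical, NOT struct rte_mempool). */
struct toy_pool {
    struct toy_mbuf *slots[TOY_POOL_SIZE]; /* stack of free objects */
    unsigned int count;
};

/* Toy stand-in for an mbuf (hypothetical, NOT struct rte_mbuf). */
struct toy_mbuf {
    struct toy_pool *pool;
    struct toy_mbuf *next;
    unsigned short refcnt;
    unsigned short nb_segs;
};

/* Return n objects to the pool without looking inside any of them. */
static void
toy_pool_put_bulk(struct toy_pool *mp, struct toy_mbuf * const *objs,
        unsigned int n)
{
    assert(mp->count + n <= TOY_POOL_SIZE);
    for (unsigned int i = 0; i < n; i++)
        mp->slots[mp->count++] = objs[i];
}

/*
 * Tx completion with fast-free: only the first mbuf is dereferenced,
 * to learn the pool. All other mbufs are handled as opaque pointers,
 * which is only valid if they all come from that same pool.
 */
static void
toy_tx_free_bufs(struct toy_mbuf **done, unsigned int n)
{
    if (n == 0)
        return;
    toy_pool_put_bulk(done[0]->pool, done, n);
}
```

The point of the sketch is that toy_tx_free_bufs() touches exactly one mbuf
cache line per burst, regardless of the burst size.
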
> >
> >
> > When an mbuf is freed to its mempool, some of the fields in the mbuf
> > must be initialized.
> > So, for fast-free, this must be done by the driver's function that
> > prepares the Tx descriptor.
> > This is a requirement on the driver, not a requirement on the
> > application.
> >
> > Now, let's dig into the code for freeing an mbuf.
> > Note: For readability purposes, I'll cut out some code and comments
> > unrelated to this topic.
> >
> > static __rte_always_inline void
> > rte_pktmbuf_free_seg(struct rte_mbuf *m)
> > {
> >     m = rte_pktmbuf_prefree_seg(m);
> >     if (likely(m != NULL))
> >         rte_mbuf_raw_free(m);
> > }
> >
> >
> > rte_mbuf_raw_free(m) is simple, so nothing to gain there:
> >
> > /**
> >  * Put mbuf back into its original mempool.
> >  *
> >  * The caller must ensure that the mbuf is direct and properly
> >  * reinitialized (refcnt=1, next=NULL, nb_segs=1), as done by
> >  * rte_pktmbuf_prefree_seg().
> >  */
> > static __rte_always_inline void
> > rte_mbuf_raw_free(struct rte_mbuf *m)
> > {
> >     rte_mbuf_history_mark(m, RTE_MBUF_HISTORY_OP_LIB_FREE);
> >     rte_mempool_put(m->pool, m);
> > }
> >
> > Note that the description says that the mbuf must be direct.
> > This is not entirely accurate; the mbuf is allowed to use a pinned
> > external buffer, if the mbuf holds the only reference to it.
> > (Most of the mbuf library functions have this documentation
> > inaccuracy, which should be fixed some day.)
> >
> > So, the fast-free optimization really comes down to
> > rte_pktmbuf_prefree_seg(m), which must not return NULL.
> >
> > Let's dig into that.
> >
> > /**
> >  * Decrease reference counter and unlink a mbuf segment
> >  *
> >  * This function does the same than a free, except that it does not
> >  * return the segment to its pool.
> >  * It decreases the reference counter, and if it reaches 0, it is
> >  * detached from its parent for an indirect mbuf.
> >  *
> >  * @return
> >  *   - (m) if it is the last reference. It can be recycled or freed.
> >  *   - (NULL) if the mbuf still has remaining references on it.
> >  */
> > static __rte_always_inline struct rte_mbuf *
> > rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > {
> >     bool refcnt_not_one;
> >
> >     refcnt_not_one = unlikely(rte_mbuf_refcnt_read(m) != 1);
> >     if (refcnt_not_one && __rte_mbuf_refcnt_update(m, -1) != 0)
> >         return NULL;
> >
> >     if (unlikely(!RTE_MBUF_DIRECT(m))) {
> >         rte_pktmbuf_detach(m);
> >         if (RTE_MBUF_HAS_EXTBUF(m) &&
> >                 RTE_MBUF_HAS_PINNED_EXTBUF(m) &&
> >                 __rte_pktmbuf_pinned_extbuf_decref(m))
> >             return NULL;
> >     }
> >
> >     if (refcnt_not_one)
> >         rte_mbuf_refcnt_set(m, 1);
> >     if (m->nb_segs != 1)
> >         m->nb_segs = 1;
> >     if (m->next != NULL)
> >         m->next = NULL;
> >
> >     return m;
> > }
> >
> > This function can only succeed (i.e. return non-NULL) when 'refcnt'
> > is 1 (or reaches 0).
> >
> > REQUIREMENT 2: The driver must hold the only reference to the mbuf,
> > i.e. 'm->refcnt' must be 1.
> >
> >
> > When the function succeeds, it initializes the mbuf fields as
> > required by rte_mbuf_raw_free() before returning.
> >
> > Now, since the driver has exclusive access to the mbuf, it is free to
> > initialize the 'm->next' and 'm->nb_segs' at any time.
> > It could do that when preparing the Tx descriptor.
> >
> > This is very interesting, because it means that fast-free does not
> > prohibit segmented packets!
> > (But the driver must have sufficient Tx descriptors for all segments
> > in the mbuf.)
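
One detail a driver has to get right when doing this at Tx descriptor
preparation time: 'm->next' must be read before it is cleared, otherwise the
rest of the chain is lost. The following self-contained sketch shows the idea
with toy types. 'toy_mbuf', 'toy_txq' and toy_tx_prep_pkt() are hypothetical
names, not DPDK APIs, and the software ring stands in for whatever
per-descriptor bookkeeping a real driver keeps.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RING_SIZE 8

/* Toy stand-in for an mbuf (hypothetical, NOT struct rte_mbuf). */
struct toy_mbuf {
    struct toy_mbuf *next;
    unsigned short refcnt;
    unsigned short nb_segs;
};

/* Toy Tx queue: one software ring entry per Tx descriptor. */
struct toy_txq {
    struct toy_mbuf *sw_ring[TOY_RING_SIZE];
    unsigned int tail;
    unsigned int nb_free;
};

/*
 * Queue all segments of a packet and reinitialize each segment for a
 * later raw free. Returns the number of descriptors used, or 0 if the
 * queue does not have enough free descriptors for the whole packet.
 */
static unsigned int
toy_tx_prep_pkt(struct toy_txq *txq, struct toy_mbuf *pkt)
{
    unsigned int nb_segs = pkt->nb_segs; /* read before it is reset */
    struct toy_mbuf *seg = pkt;
    unsigned int used = 0;

    if (nb_segs > txq->nb_free)
        return 0; /* packet left untouched */

    while (seg != NULL) {
        struct toy_mbuf *next = seg->next; /* save link before clearing */

        assert(seg->refcnt == 1); /* REQUIREMENT 2 */

        /* A real driver would fill the hardware descriptor here. */
        txq->sw_ring[txq->tail] = seg;
        txq->tail = (txq->tail + 1) % TOY_RING_SIZE;

        /* Reinitialize the fields expected by a raw free. */
        seg->nb_segs = 1;
        seg->next = NULL;

        seg = next;
        used++;
    }
    assert(used == nb_segs);
    txq->nb_free -= used;
    return used;
}
```

After this, every ring entry holds a standalone segment, so the completion
path can free entries without following any chain.
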
> >
> >
> > Now, let's dig into rte_pktmbuf_prefree_seg()'s block handling
> > non-direct mbufs, i.e. cloned mbufs and mbufs with external buffer:
> >
> > if (unlikely(!RTE_MBUF_DIRECT(m))) {
> >     rte_pktmbuf_detach(m);
> >     if (RTE_MBUF_HAS_EXTBUF(m) &&
> >             RTE_MBUF_HAS_PINNED_EXTBUF(m) &&
> >             __rte_pktmbuf_pinned_extbuf_decref(m))
> >         return NULL;
> > }
> >
> > Starting with rte_pktmbuf_detach():
> >
> > static inline void rte_pktmbuf_detach(struct rte_mbuf *m)
> > {
> >     struct rte_mempool *mp = m->pool;
> >     uint32_t mbuf_size, buf_len;
> >     uint16_t priv_size;
> >
> >     if (RTE_MBUF_HAS_EXTBUF(m)) {
> >         /*
> >          * The mbuf has the external attached buffer,
> >          * we should check the type of the memory pool where
> >          * the mbuf was allocated from to detect the pinned
> >          * external buffer.
> >          */
> >         uint32_t flags = rte_pktmbuf_priv_flags(mp);
> >
> >         if (flags & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) {
> >             /*
> >              * The pinned external buffer should not be
> >              * detached from its backing mbuf, just exit.
> >              */
> >             return;
> >         }
> >         __rte_pktmbuf_free_extbuf(m);
> >     } else {
> >         __rte_pktmbuf_free_direct(m);
> >     }
> >     priv_size = rte_pktmbuf_priv_size(mp);
> >     mbuf_size = (uint32_t)(sizeof(struct rte_mbuf) + priv_size);
> >     buf_len = rte_pktmbuf_data_room_size(mp);
> >
> >     m->priv_size = priv_size;
> >     m->buf_addr = (char *)m + mbuf_size;
> >     rte_mbuf_iova_set(m, rte_mempool_virt2iova(m) + mbuf_size);
> >     m->buf_len = (uint16_t)buf_len;
> >     rte_pktmbuf_reset_headroom(m);
> >     m->data_len = 0;
> >     m->ol_flags = 0;
> > }
> >
> > The only quick and simple code path through this function is when the
> > mbuf uses a pinned external buffer:
> >
> > if (RTE_MBUF_HAS_EXTBUF(m)) {
> >     uint32_t flags = rte_pktmbuf_priv_flags(mp);
> >     if (flags & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF)
> >         return;
> >
> > REQUIREMENT 3: The mbuf must not be cloned or use a non-pinned
> > external buffer.
> >
> >
> > Continuing with the next part of rte_pktmbuf_prefree_seg()'s block:
> > if (RTE_MBUF_HAS_EXTBUF(m) &&
> >         RTE_MBUF_HAS_PINNED_EXTBUF(m) &&
> >         __rte_pktmbuf_pinned_extbuf_decref(m))
> >     return NULL;
> >
> > This calls __rte_pktmbuf_pinned_extbuf_decref():
> >
> > /**
> >  * @internal Handle the packet mbufs with attached pinned external
> >  * buffer on the mbuf freeing:
> >  *
> >  *  - return zero if reference counter in shinfo is one. It means
> >  *  there is no more reference to this pinned buffer and mbuf can be
> >  *  returned to the pool
> >  *
> >  *  - otherwise (if reference counter is not one), decrement
> >  *  reference counter and return non-zero value to prevent freeing
> >  *  the backing mbuf.
> >  *
> >  * Returns non zero if mbuf should not be freed.
> >  */
> > static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
> > {
> >     struct rte_mbuf_ext_shared_info *shinfo;
> >
> >     /* Clear flags, mbuf is being freed. */
> >     m->ol_flags = RTE_MBUF_F_EXTERNAL;
> >     shinfo = m->shinfo;
> >
> >     /* Optimize for performance - do not dec/reinit */
> >     if (likely(rte_mbuf_ext_refcnt_read(shinfo) == 1))
> >         return 0;
> >
> >     /*
> >      * Direct usage of add primitive to avoid
> >      * duplication of comparing with one.
> >      */
> >     if (likely(rte_atomic_fetch_add_explicit(&shinfo->refcnt, -1,
> >             rte_memory_order_acq_rel) - 1))
> >         return 1;
> >
> >     /* Reinitialize counter before mbuf freeing. */
> >     rte_mbuf_ext_refcnt_set(shinfo, 1);
> >     return 0;
> > }
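
The return value convention here is easy to misread, so here is a small
standalone model of the same counter logic using C11 atomics instead of the
rte_atomic wrappers. 'toy_shinfo' and toy_pinned_extbuf_decref() are
hypothetical names, the ol_flags handling is left out, and the memory orders
are my own choice for the sketch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy stand-in for the shared info of an external buffer. */
struct toy_shinfo {
    atomic_ushort refcnt;
};

/* Returns non-zero if the backing mbuf must NOT be freed yet. */
static int
toy_pinned_extbuf_decref(struct toy_shinfo *shinfo)
{
    /* Fast path: sole reference, leave the counter at 1 untouched. */
    if (atomic_load_explicit(&shinfo->refcnt, memory_order_relaxed) == 1)
        return 0;

    /*
     * fetch_sub returns the OLD value, so (old - 1) is the new count.
     * A non-zero new count means other references remain.
     */
    if (atomic_fetch_sub_explicit(&shinfo->refcnt, 1,
            memory_order_acq_rel) - 1 != 0)
        return 1;

    /*
     * The count dropped to 0, i.e. another holder released its
     * reference between the load and the decrement. Restore the
     * counter to 1 so the mbuf goes back to the pool in its
     * initial state.
     */
    atomic_store_explicit(&shinfo->refcnt, 1, memory_order_relaxed);
    return 0;
}
```

With a count of 1 the function returns 0 and the counter stays at 1. With a
count of 2 it returns 1 and the counter drops to 1, so a second call on the
same shared info then returns 0.
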
> >
> > Essentially, if the mbuf does use a pinned external buffer,
> > rte_pktmbuf_prefree_seg() only succeeds if that pinned external
> > buffer is only referred to by the mbuf.
> >
> > REQUIREMENT 4: If the mbuf uses a pinned external buffer, the mbuf
> > must hold the only reference to that pinned external buffer, i.e. in
> > that case, 'm->shinfo->refcnt' must be 1.
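
For reference, the four requirements can be condensed into one predicate.
This is only a sketch on toy types to make the conditions explicit; all
names ('toy_pool', 'toy_shinfo', 'toy_mbuf', toy_fast_free_ok()) and the
boolean fields are hypothetical simplifications of the real flags, not the
DPDK definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy pool: the bool models RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF. */
struct toy_pool {
    bool pinned_extbuf;
};

/* Toy shared info of an external buffer. */
struct toy_shinfo {
    unsigned short refcnt;
};

/* Toy mbuf (hypothetical, NOT struct rte_mbuf). */
struct toy_mbuf {
    struct toy_pool *pool;
    struct toy_shinfo *shinfo;
    unsigned short refcnt;
    bool indirect; /* cloned, i.e. attached to another mbuf's buffer */
    bool extbuf;   /* uses an external buffer */
};

/* True if the mbuf satisfies REQUIREMENTS 1-4 for the given Tx queue pool. */
static bool
toy_fast_free_ok(const struct toy_pool *txq_pool, const struct toy_mbuf *m)
{
    assert(txq_pool != NULL && m != NULL);

    if (m->pool != txq_pool)
        return false; /* REQUIREMENT 1: same mempool */
    if (m->refcnt != 1)
        return false; /* REQUIREMENT 2: sole reference to the mbuf */
    if (m->indirect)
        return false; /* REQUIREMENT 3: not cloned */
    if (m->extbuf) {
        if (!m->pool->pinned_extbuf)
            return false; /* REQUIREMENT 3: no non-pinned extbuf */
        if (m->shinfo == NULL || m->shinfo->refcnt != 1)
            return false; /* REQUIREMENT 4: sole extbuf reference */
    }
    return true;
}
```
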
> >
> >
> > Please review.
> >
> > If I'm not mistaken, the mbuf library is not a barrier for
> > fast-freeing segmented packet mbufs, and thus fast-free of jumbo
> > frames is possible.
> >
> > We need a driver developer to confirm that my suggested approach -
> > resetting the mbuf fields, incl. 'm->nb_segs' and 'm->next', when
> > preparing the Tx descriptor - is viable.
>
> Great analysis, makes a lot of sense to me.
> Shall we then add a special API to make PMD maintainers' lives a bit
> easier? Something like rte_mbuf_fast_free_prep(mp, mb), that will
> optionally check that the requirements outlined above are satisfied
> for the given mbuf and also reset the mbuf fields to expected values?

Good idea, Konstantin.
Detailed suggestion below.

Note that __rte_mbuf_raw_sanity_check_mp() is used to check the requirements
after 'nb_segs' and 'next' have been initialized.

/**
 * Reinitialize an mbuf for freeing back into the mempool.
 *
 * The caller must ensure that the mbuf comes from the specified mempool,
 * is direct and only referred to by the caller (refcnt=1).
 *
 * This function is used by drivers in their transmit function for mbuf
 * fast release when the transmit descriptor is initialized,
 * so the driver can call rte_mbuf_raw_free()
 * when the packet segment has been transmitted.
 *
 * @see RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
 *
 * @param mp
 *   The mempool to which the mbuf belongs.
 * @param m
 *   The mbuf being reinitialized.
 */
static __rte_always_inline void
rte_mbuf_raw_prefree_seg(const struct rte_mempool *mp, struct rte_mbuf *m)
{
    if (m->nb_segs != 1)
        m->nb_segs = 1;
    if (m->next != NULL)
        m->next = NULL;
    __rte_mbuf_raw_sanity_check_mp(m, mp);
    rte_mbuf_history_mark(m, RTE_MBUF_HISTORY_OP_LIB_PREFREE_RAW);
}

/**
 * Reinitialize a bulk of mbufs for freeing back into the mempool.
 *
 * The caller must ensure that the mbufs come from the specified mempool,
 * are direct and only referred to by the caller (refcnt=1).
 *
 * This function is used by drivers in their transmit function for mbuf
 * fast release when the transmit descriptors are initialized,
 * so the driver can call rte_mbuf_raw_free_bulk()
 * when the packet segments have been transmitted.
 *
 * @see RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
 *
 * @param mp
 *   The mempool to which the mbufs belong.
 * @param mbufs
 *   Array of pointers to mbufs being reinitialized.
 *   The array must not contain NULL pointers.
 * @param count
 *   Array size.
 */
static __rte_always_inline void
rte_mbuf_raw_prefree_seg_bulk(const struct rte_mempool *mp,
        struct rte_mbuf **mbufs, unsigned int count)
{
    for (unsigned int idx = 0; idx < count; idx++) {
        struct rte_mbuf *m = mbufs[idx];

        if (m->nb_segs != 1)
            m->nb_segs = 1;
        if (m->next != NULL)
            m->next = NULL;
        __rte_mbuf_raw_sanity_check_mp(m, mp);
    }
    rte_mbuf_history_mark_bulk(mbufs, count,
            RTE_MBUF_HISTORY_OP_LIB_PREFREE_RAW);
}

> Konstantin
>
>