RE: [PATCH v2] net/null: Add fast mbuf release TX offload

Ivan Malov Mon, 28 Jul 2025 06:51:30 -0700

Hi Morten,

On Mon, 28 Jul 2025, Morten Brørup wrote:

From: Ivan Malov [mailto:ivan.ma...@arknetworks.am]
Sent: Saturday, 26 July 2025 08.15

Hi Morten,

Good patch. Please see below.

On Sat, 26 Jul 2025, Morten Brørup wrote:

Added fast mbuf release, re-using the existing mbuf pool pointer
in the queue structure.

Signed-off-by: Morten Brørup <m...@smartsharesystems.com>
---
v2:
* Also announce the offload as a per-queue capability.
* Added missing test of per-device offload configuration when

configuring

 the queue.
---
drivers/net/null/rte_eth_null.c | 33 ++++++++++++++++++++++++++++++---
1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/net/null/rte_eth_null.c

b/drivers/net/null/rte_eth_null.c

index 8a9b74a03b..09cfc74494 100644
--- a/drivers/net/null/rte_eth_null.c
+++ b/drivers/net/null/rte_eth_null.c
@@ -34,6 +34,17 @@ struct pmd_internals;
struct null_queue {
        struct pmd_internals *internals;

+       /**
+        * For RX queue:
+        *  Mempool to allocate mbufs from.
+        *
+        * For TX queue:


Perhaps spell it 'Rx', 'Tx', but this is up to you.


I just checked, and it seems all three spellings "rx", "Rx" and "RX" are being 
used in DPDK.
I personally prefer RX, so I'll keep that.

+        *  Mempool to free mbufs to, if fast release of mbufs is enabled.
+        *  UINTPTR_MAX if the mempool for fast release of mbufs has not

yet been detected.

+        *  NULL if fast release of mbufs is not enabled.
+        *
+        *  @see RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
+        */


May be it would be better to have a separate 'tx_pkt_burst' callback, to
avoid
conditional checks below. Though, I understand this will downgrade the
per-queue
capability to the per-port only, so feel free to disregard this point.


I considered this, and I can imagine an application using FAST_FREE for its 
primary queues, and normal free for some secondary queues.
Looking at other drivers - which have implemented a runtime check, not separate 
callbacks for FAST_FREE - I guess they came to the same conclusion.

        struct rte_mempool *mb_pool;
        void *dummy_packet;

@@ -151,7 +162,16 @@ eth_null_tx(void *q, struct rte_mbuf **bufs,

uint16_t nb_bufs)

        for (i = 0; i < nb_bufs; i++)
                bytes += rte_pktmbuf_pkt_len(bufs[i]);

-       rte_pktmbuf_free_bulk(bufs, nb_bufs);
+       if (h->mb_pool != NULL) { /* RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE */
+               if (unlikely(h->mb_pool == (void *)UINTPTR_MAX)) {
+                       if (unlikely(nb_bufs == 0))
+                               return 0; /* Do not dereference uninitialized

bufs[0]. */

+                       h->mb_pool = bufs[0]->pool;
+               }
+               rte_mbuf_raw_free_bulk(h->mb_pool, bufs, nb_bufs);
+       } else {
+               rte_pktmbuf_free_bulk(bufs, nb_bufs);
+       }
        rte_atomic_fetch_add_explicit(&h->tx_pkts, nb_bufs,

rte_memory_order_relaxed);

        rte_atomic_fetch_add_explicit(&h->tx_bytes, bytes,

rte_memory_order_relaxed);


@@ -259,7 +279,7 @@ static int
eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
                uint16_t nb_tx_desc __rte_unused,
                unsigned int socket_id __rte_unused,
-               const struct rte_eth_txconf *tx_conf __rte_unused)
+               const struct rte_eth_txconf *tx_conf)
{
        struct rte_mbuf *dummy_packet;
        struct pmd_internals *internals;
@@ -284,6 +304,10 @@ eth_tx_queue_setup(struct rte_eth_dev *dev,

uint16_t tx_queue_id,


        internals->tx_null_queues[tx_queue_id].internals = internals;
        internals->tx_null_queues[tx_queue_id].dummy_packet =

dummy_packet;

+       internals->tx_null_queues[tx_queue_id].mb_pool =
+                       (dev->data->dev_conf.txmode.offloads | tx_conf-
offloads) &
+                       RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE ?
+                       (void *)UINTPTR_MAX : NULL;


Given the fact that FAST_FREE and MULTI_SEGS are effectively
conflicting,
wouldn't it be better to have a check for the presence of both flags
here? I'm
not sure whether this is already checked at the generic layer above the
PMD.


Interesting thought - got me looking deeper into this.

It seems MULTI_SEGS is primarily a capability flag.
The description of the MULTI_SEGS flag [1] could be interpreted that way too:
/** Device supports multi segment send. */

[1]: https://elixir.bootlin.com/dpdk/v25.07/source/lib/ethdev/rte_ethdev.h#L1614

E.g. the i40e driver offers MULTI_SEGS capability per-device, but not 
per-queue. And it doesn't use the MULTI_SEGS flag for any purpose (beyond 
capability reporting).

Furthermore, enabling MULTI_SEGS on TX (per device or per queue) wouldn't mean 
that all transmitted packets are segmented; it only means that the driver must 
be able to handle segmented packets.
I.e. MULTI_SEGS could be enabled on a device, and yet it would be acceptable to 
enable FAST_FREE on a queue on that device.


Yes, you are correct and I apologise. It's capability, not the requestor bit.

Thank you.


Thank you.


Thank you for reviewing.


        return 0;
}
@@ -309,7 +333,10 @@ eth_dev_info(struct rte_eth_dev *dev,
        dev_info->max_rx_queues = RTE_DIM(internals->rx_null_queues);
        dev_info->max_tx_queues = RTE_DIM(internals->tx_null_queues);
        dev_info->min_rx_bufsize = 0;
-       dev_info->tx_offload_capa = RTE_ETH_TX_OFFLOAD_MULTI_SEGS |

RTE_ETH_TX_OFFLOAD_MT_LOCKFREE;

+       dev_info->tx_queue_offload_capa =

RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;

+       dev_info->tx_offload_capa = RTE_ETH_TX_OFFLOAD_MULTI_SEGS |
+                       RTE_ETH_TX_OFFLOAD_MT_LOCKFREE |
+                       dev_info->tx_queue_offload_capa;

        dev_info->reta_size = internals->reta_size;
        dev_info->flow_type_rss_offloads = internals-
flow_type_rss_offloads;
--
2.43.0

RE: [PATCH v2] net/null: Add fast mbuf release TX offload

Reply via email to