[AMD Official Use Only - AMD Internal Distribution Only]

Snipped

> > [Public]
> >
> > Hi Morten,
> >
> > We have tested the effect of the patch using func-latency and PPs via
> > testpmd.
> > Please find our observations below
> >
> > - DPDK tag: 25.07-rc1
> > - compiler: gcc 14.2
> > - platform: AMD EPYC 8534P 64core 2.3GHz
> > - app cmd:
> > -- One port: `sudo build/app/dpdk-testpmd -l 15,16 --vdev=net_null1
> > --no-pci -- --nb-cores=1 --nb-ports=1 --txq=1 --rxq=1 --txd=2048
> > --rxd=2048 -a --forward-mode=io --stats-period=1`
> > -- Two port: `sudo build/app/dpdk-testpmd -l 15,16,17
> > --vdev=net_null1 --vdev=net_null2 --no-pci -- --nb-cores=2 --nb-ports=2
> > --txq=1 --rxq=1 --txd=2048 --rxd=2048 -a --forward-mode=io
> > --stats-period=1`
> >
> > Result 1 port:
> > - Before patch: TX MPPs 117.61, RX-PPs 117.67, Func-latency TX:
> > 1918ns, Func-latency free-bulk: 2667ns
> > - After patch: TX MPPs 117.55, RX-PPs 117.54, Func-latency TX:
> > 1921ns, Func-latency free-bulk: 2660ns
> >
> > Result 2 port:
> > - Before patch: TX MPPs 117.61, RX-PPs 117.67, Func-latency TX:
> > 1942ns, Func-latency free-bulk: 2557ns
> > - After patch: TX MPPs 117.54, RX-PPs 117.54, Func-latency TX:
> > 1946ns, Func-latency free-bulk: 2740ns
> >
> > Perf Top: diff before vs after shows 13.84% vs 13.79%
> >
> > Reviewed-by: Thiyagarjan P <thiyagaraja...@amd.com>
> > Tested-by: Vipin Varghese <vipin.vargh...@amd.com>
>
> Thank you for reviewing and testing.
>
> >
> > Clarification request: `with fast-mbuf-free on single port we see
> > free-bulk reduction by -7ns. But null_tx increase by +3ns. TX PPs
> > reduction by 0.07 Mpps. Is this anomaly of null_net PMD?`
>
> I have finally found the bug in my patch:
> It announces device-level capability for FAST_FREE, but ignores device-level
> FAST_FREE configuration, and uses queue-level FAST_FREE configuration
> instead.
>
> Due to this bug, your testing probably shows the performance of the
> non-FAST_FREE code path.
> The added comparison for FAST_FREE (code path not taken) might explain the
> null_tx +3ns increase.
>
> I will send a v2 patch.

Will check

> > > > On Tue, 24 Jun 2025 18:14:16 +0000 Morten Brørup
> > > > <m...@smartsharesystems.com> wrote:
> > > >
> > > > > Added fast mbuf release, re-using the existing mbuf pool pointer in
> > > > > the queue structure.
> > > > >
> > > > > Signed-off-by: Morten Brørup <m...@smartsharesystems.com>
> > > >
> > > > Makes sense.
> > > >
> > > > > ---
> > > > >  drivers/net/null/rte_eth_null.c | 30 +++++++++++++++++++++++++++---
> > > > >  1 file changed, 27 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
> > > > > index 8a9b74a03b..12c0d8d1ff 100644
> > > > > --- a/drivers/net/null/rte_eth_null.c
> > > > > +++ b/drivers/net/null/rte_eth_null.c
> > > > > @@ -34,6 +34,17 @@ struct pmd_internals;
> > > > >  struct null_queue {
> > > > >  	struct pmd_internals *internals;
> > > > >
> > > > > +	/**
> > > > > +	 * For RX queue:
> > > > > +	 *  Mempool to allocate mbufs from.
> > > > > +	 *
> > > > > +	 * For TX queue:
> > > > > +	 *  Mempool to free mbufs to, if fast release of mbufs is enabled.
> > > > > +	 *  UINTPTR_MAX if the mempool for fast release of mbufs has not yet been detected.
> > > > > +	 *  NULL if fast release of mbufs is not enabled.
> > > > > +	 *
> > > > > +	 * @see RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
> > > > > +	 */
> > > > >  	struct rte_mempool *mb_pool;
> > > >
> > > > Do all drivers do it this way?
> > >
> > > No, I think most drivers have separate structures for rx and tx queues. This driver
> > > doesn't so I'm reusing the existing mempool pointer.
> > > Also, they don't cache the mempool pointer, but look at mbuf[0].pool at every burst;
> > > so their tx queue structure doesn't have a mempool pointer field.
> > > And they check an offload flag (either the bit in the raw offload field, or a shadow
> > > variable for the relevant offload flag), instead of checking the mempool pointer.
> > >
> > > Other drivers can be improved, and I have submitted an optimization patch for the
> > > i40e driver with some of the things I do in this patch:
> > > https://inbox.dpdk.org/dev/20250624061238.89259-1-m...@smartsharesystems.com/
> > >
> > > > Is it documented in ethdev?
> > >
> > > The RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE flag is documented.
> > > How to implement it in the drivers is not.
> > >
> > > -Morten
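
For readers following the thread, the tri-state mb_pool convention from the
patch comment (NULL, UINTPTR_MAX, or a detected pool), together with the
device-level vs queue-level configuration issue described above, can be
sketched roughly as follows. This is a standalone illustration and not the
driver code or the v2 patch. struct mempool, struct mbuf, struct tx_queue,
TX_OFFLOAD_MBUF_FAST_FREE, POOL_UNDETECTED, tx_queue_setup() and
tx_burst_uses_fast_free() are hypothetical stand-ins for the DPDK
equivalents, and OR-ing the device-level and queue-level offloads at queue
setup is an assumption about what a fix could look like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct rte_mempool and struct rte_mbuf. */
struct mempool { const char *name; };
struct mbuf { struct mempool *pool; };

/* Stand-in for RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE; the bit value is illustrative only. */
#define TX_OFFLOAD_MBUF_FAST_FREE (UINT64_C(1) << 16)

/* Sentinel: fast free is enabled, but the pool has not been detected yet. */
#define POOL_UNDETECTED ((struct mempool *)UINTPTR_MAX)

/* Hypothetical TX queue holding only the cached mempool pointer. */
struct tx_queue {
	struct mempool *mb_pool; /* NULL, POOL_UNDETECTED, or the detected pool */
};

/*
 * Queue setup. Assumption: the offload counts as enabled if it is set at
 * either device level or queue level, so both masks are combined here.
 */
static void
tx_queue_setup(struct tx_queue *q, uint64_t dev_offloads, uint64_t queue_offloads)
{
	uint64_t offloads = dev_offloads | queue_offloads;

	assert(q != NULL);
	q->mb_pool = (offloads & TX_OFFLOAD_MBUF_FAST_FREE) ? POOL_UNDETECTED : NULL;
}

/*
 * Per-burst decision. Returns 1 if the burst may be released in bulk to the
 * cached pool, 0 if each mbuf must be freed individually.
 */
static int
tx_burst_uses_fast_free(struct tx_queue *q, struct mbuf **bufs, unsigned int nb_bufs)
{
	assert(q != NULL);
	if (q->mb_pool == NULL || nb_bufs == 0)
		return 0;
	/* First burst with fast free enabled: remember the pool of the first mbuf. */
	if (q->mb_pool == POOL_UNDETECTED)
		q->mb_pool = bufs[0]->pool;
	return 1;
}
```

With this shape, the per-burst check is a single comparison of the cached
pointer, in contrast to drivers that test an offload flag and read
mbuf[0].pool on every burst. Calling tx_queue_setup() with the flag set only
in the device-level mask still selects the fast path in this sketch.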