> From: Feifei Wang [mailto:feifei.wa...@arm.com]
> Sent: Thursday, 25 May 2023 11.46
>
> Add 'rte_eth_recycle_rx_queue_info_get' and 'rte_eth_recycle_mbufs'
> APIs to recycle used mbufs from a transmit queue of an Ethernet device,
> and move these mbufs into a mbuf ring for a receive queue of an Ethernet
> device. This can bypass mempool 'put/get' operations hence saving CPU
> cycles.
>
> For each recycling mbufs, the rte_eth_recycle_mbufs() function performs
> the following operations:
> - Copy used *rte_mbuf* buffer pointers from Tx mbuf ring into Rx mbuf
> ring.
> - Replenish the Rx descriptors with the recycling *rte_mbuf* mbufs freed
> from the Tx mbuf ring.
>
> Suggested-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> Suggested-by: Ruifeng Wang <ruifeng.w...@arm.com>
> Signed-off-by: Feifei Wang <feifei.wa...@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> ---

[...]

> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index 2c9d615fb5..c6723d5277 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -59,6 +59,10 @@ struct rte_eth_dev {
>  	eth_rx_descriptor_status_t rx_descriptor_status;
>  	/** Check the status of a Tx descriptor */
>  	eth_tx_descriptor_status_t tx_descriptor_status;
> +	/** Pointer to PMD transmit mbufs reuse function */
> +	eth_recycle_tx_mbufs_reuse_t recycle_tx_mbufs_reuse;
> +	/** Pointer to PMD receive descriptors refill function */
> +	eth_recycle_rx_descriptors_refill_t recycle_rx_descriptors_refill;
>
>  	/**
>  	 * Device data that is shared between primary and secondary processes

The rte_eth_dev struct currently looks like this:

/**
 * @internal
 * The generic data structure associated with each Ethernet device.
 *
 * Pointers to burst-oriented packet receive and transmit functions are
 * located at the beginning of the structure, along with the pointer to
 * where all the data elements for the particular device are stored in shared
 * memory. This split allows the function pointer and driver data to be per-
 * process, while the actual configuration data for the device is shared.
 */
struct rte_eth_dev {
	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function */
	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function */

	/** Pointer to PMD transmit prepare function */
	eth_tx_prep_t tx_pkt_prepare;
	/** Get the number of used Rx descriptors */
	eth_rx_queue_count_t rx_queue_count;
	/** Check the status of a Rx descriptor */
	eth_rx_descriptor_status_t rx_descriptor_status;
	/** Check the status of a Tx descriptor */
	eth_tx_descriptor_status_t tx_descriptor_status;

	/**
	 * Device data that is shared between primary and secondary processes
	 */
	struct rte_eth_dev_data *data;
	void *process_private; /**< Pointer to per-process device data */
	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
	struct rte_device *device; /**< Backing device */
	struct rte_intr_handle *intr_handle; /**< Device interrupt handle */

	/** User application callbacks for NIC interrupts */
	struct rte_eth_dev_cb_list link_intr_cbs;
	/**
	 * User-supplied functions called from rx_burst to post-process
	 * received packets before passing them to the user
	 */
	struct rte_eth_rxtx_callback *post_rx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
	/**
	 * User-supplied functions called from tx_burst to pre-process
	 * received packets before passing them to the driver for transmission
	 */
	struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];

	enum rte_eth_dev_state state; /**< Flag indicating the port state */
	void *security_ctx; /**< Context for security ops */
} __rte_cache_aligned;

Inserting the two new function pointers (recycle_tx_mbufs_reuse and
recycle_rx_descriptors_refill) as the 7th and 8th fields will move the
'data' and 'process_private' pointers out of the first cache line.

If those data pointers are used in the fast path with the rx_pkt_burst and
tx_pkt_burst functions, moving them to a different cache line might have a
performance impact on those two functions.

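To make the offsets concrete, here is a minimal sketch of the layout arithmetic. It assumes a 64-bit target with 8-byte pointers and 64-byte cache lines. The struct names (dev_current, dev_patched), the fn_t stand-in type and the helper functions are hypothetical and only mirror the leading fields of rte_eth_dev; this is not DPDK code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: 64-byte cache lines and 8-byte pointers (typical 64-bit target). */
#define CACHE_LINE_SIZE 64

typedef void (*fn_t)(void); /* hypothetical stand-in for the eth_*_t types */

static_assert(sizeof(void *) == 8, "sketch assumes 8-byte data pointers");
static_assert(sizeof(fn_t) == 8, "sketch assumes 8-byte function pointers");

/* Leading fields of the struct as it looks today. */
struct dev_current {
	fn_t rx_pkt_burst;
	fn_t tx_pkt_burst;
	fn_t tx_pkt_prepare;
	fn_t rx_queue_count;
	fn_t rx_descriptor_status;
	fn_t tx_descriptor_status;
	void *data;
	void *process_private;
};

/* Leading fields with the two new pointers inserted as in the patch. */
struct dev_patched {
	fn_t rx_pkt_burst;
	fn_t tx_pkt_burst;
	fn_t tx_pkt_prepare;
	fn_t rx_queue_count;
	fn_t rx_descriptor_status;
	fn_t tx_descriptor_status;
	fn_t recycle_tx_mbufs_reuse;
	fn_t recycle_rx_descriptors_refill;
	void *data;
	void *process_private;
};

/* Index of the cache line that holds a given byte offset. */
size_t cache_line_index(size_t offset)
{
	return offset / CACHE_LINE_SIZE;
}

/* Print where 'data' and 'process_private' land in both layouts. */
void report_layouts(void)
{
	printf("current: data @%zu (line %zu), process_private @%zu (line %zu)\n",
	       offsetof(struct dev_current, data),
	       cache_line_index(offsetof(struct dev_current, data)),
	       offsetof(struct dev_current, process_private),
	       cache_line_index(offsetof(struct dev_current, process_private)));
	printf("patched: data @%zu (line %zu), process_private @%zu (line %zu)\n",
	       offsetof(struct dev_patched, data),
	       cache_line_index(offsetof(struct dev_patched, data)),
	       offsetof(struct dev_patched, process_private),
	       cache_line_index(offsetof(struct dev_patched, process_private)));
}
```

Under those assumptions, six function pointers occupy bytes 0..47, so 'data' and 'process_private' sit at offsets 48 and 56 in the first cache line. With eight function pointers they start at offsets 64 and 72, i.e. in the second cache line. On a target with larger cache lines the result would differ.
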
Disclaimer: This is a big "if", and wild speculation from me, because I
haven't looked at it in detail! If this structure is not used in the fast
path like this, you can ignore my suggestion below.

Please consider moving the 'data' and 'process_private' pointers to the
beginning of this structure, so they are kept in the same cache line as the
rx_pkt_burst and tx_pkt_burst function pointers.

I don't know the relative importance of the remaining six fast path
functions (the four existing ones plus the two new ones in this patch), so
you could also rearrange those so that the two least important functions are
the ones moved out of the first cache line. It doesn't have to be the two
recycle functions that go into a different cache line.

-Morten