> -----Original Message-----
> From: Morten Brørup <m...@smartsharesystems.com>
> Sent: Thursday, May 25, 2023 11:09 PM
> To: Feifei Wang <feifei.wa...@arm.com>; tho...@monjalon.net; Ferruh
> Yigit <ferruh.yi...@amd.com>; Andrew Rybchenko
> <andrew.rybche...@oktetlabs.ru>
> Cc: dev@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>; Ruifeng Wang
> <ruifeng.w...@arm.com>
> Subject: RE: [PATCH v6 1/4] ethdev: add API for mbufs recycle mode
>
> > From: Feifei Wang [mailto:feifei.wa...@arm.com]
> > Sent: Thursday, 25 May 2023 11.46
> >
> > Add 'rte_eth_recycle_rx_queue_info_get' and 'rte_eth_recycle_mbufs'
> > APIs to recycle used mbufs from a transmit queue of an Ethernet
> > device, and move these mbufs into a mbuf ring for a receive queue of
> > an Ethernet device. This can bypass mempool 'put/get' operations hence
> > saving CPU cycles.
> >
> > For each mbuf being recycled, the rte_eth_recycle_mbufs() function
> > performs the following operations:
> > - Copy used *rte_mbuf* buffer pointers from the Tx mbuf ring into the
> > Rx mbuf ring.
> > - Replenish the Rx descriptors with the recycled *rte_mbuf* mbufs
> > freed from the Tx mbuf ring.
> >
> > Suggested-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > Suggested-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > Signed-off-by: Feifei Wang <feifei.wa...@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > ---
>
> [...]
>
> > diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> > index 2c9d615fb5..c6723d5277 100644
> > --- a/lib/ethdev/ethdev_driver.h
> > +++ b/lib/ethdev/ethdev_driver.h
> > @@ -59,6 +59,10 @@ struct rte_eth_dev {
> >  	eth_rx_descriptor_status_t rx_descriptor_status;
> >  	/** Check the status of a Tx descriptor */
> >  	eth_tx_descriptor_status_t tx_descriptor_status;
> > +	/** Pointer to PMD transmit mbufs reuse function */
> > +	eth_recycle_tx_mbufs_reuse_t recycle_tx_mbufs_reuse;
> > +	/** Pointer to PMD receive descriptors refill function */
> > +	eth_recycle_rx_descriptors_refill_t recycle_rx_descriptors_refill;
> >
> >  	/**
> >  	 * Device data that is shared between primary and secondary processes
>
> The rte_eth_dev struct currently looks like this:
>
> /**
>  * @internal
>  * The generic data structure associated with each Ethernet device.
>  *
>  * Pointers to burst-oriented packet receive and transmit functions are
>  * located at the beginning of the structure, along with the pointer to
>  * where all the data elements for the particular device are stored in shared
>  * memory. This split allows the function pointer and driver data to be per-
>  * process, while the actual configuration data for the device is shared.
>  */
> struct rte_eth_dev {
> 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function */
> 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function */
>
> 	/** Pointer to PMD transmit prepare function */
> 	eth_tx_prep_t tx_pkt_prepare;
> 	/** Get the number of used Rx descriptors */
> 	eth_rx_queue_count_t rx_queue_count;
> 	/** Check the status of a Rx descriptor */
> 	eth_rx_descriptor_status_t rx_descriptor_status;
> 	/** Check the status of a Tx descriptor */
> 	eth_tx_descriptor_status_t tx_descriptor_status;
>
> 	/**
> 	 * Device data that is shared between primary and secondary processes
> 	 */
> 	struct rte_eth_dev_data *data;
> 	void *process_private; /**< Pointer to per-process device data */
> 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> 	struct rte_device *device; /**< Backing device */
> 	struct rte_intr_handle *intr_handle; /**< Device interrupt handle */
>
> 	/** User application callbacks for NIC interrupts */
> 	struct rte_eth_dev_cb_list link_intr_cbs;
> 	/**
> 	 * User-supplied functions called from rx_burst to post-process
> 	 * received packets before passing them to the user
> 	 */
> 	struct rte_eth_rxtx_callback *post_rx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
> 	/**
> 	 * User-supplied functions called from tx_burst to pre-process
> 	 * received packets before passing them to the driver for transmission
> 	 */
> 	struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
>
> 	enum rte_eth_dev_state state; /**< Flag indicating the port state */
> 	void *security_ctx; /**< Context for security ops */
> } __rte_cache_aligned;
>
> Inserting the two new function pointers (recycle_tx_mbufs_reuse and
> recycle_rx_descriptors_refill) as the 7th and 8th fields will move the
> 'data' and 'process_private' pointers out of the first cache line.
>
> If those data pointers are used in the fast path with the rx_pkt_burst and
> tx_pkt_burst functions, moving them to a different cache line might have a
> performance impact on those two functions.
>
> Disclaimer: This is a big "if", and wild speculation from me, because I
> haven't looked at it in detail! If this structure is not used in the fast
> path like this, you can ignore my suggestion below.
>
> Please consider moving the 'data' and 'process_private' pointers to the
> beginning of this structure, so they are kept in the same cache line as the
> rx_pkt_burst and tx_pkt_burst function pointers.
>
> I don't know the relative importance of the remaining six fast path
> functions (the four existing ones plus the two new ones in this patch), so
> you could also rearrange those, so the least important two functions are
> moved out of the first cache line. It doesn't have to be the two recycle
> functions that go into a different cache line.
>
> -Morten
This is a good question. By reviewing the code, we find that the pointers
used in the fast path are copied into 'struct rte_eth_fp_ops' (fpo) by
eth_dev_fp_ops_setup(), which ensures all fast path pointers stay together
in the Rx/Tx cache lines:
void
eth_dev_fp_ops_setup(struct rte_eth_fp_ops *fpo,
		const struct rte_eth_dev *dev)
{
	fpo->rx_pkt_burst = dev->rx_pkt_burst;
	fpo->tx_pkt_burst = dev->tx_pkt_burst;
	fpo->tx_pkt_prepare = dev->tx_pkt_prepare;
	fpo->rx_queue_count = dev->rx_queue_count;
	fpo->rx_descriptor_status = dev->rx_descriptor_status;
	fpo->tx_descriptor_status = dev->tx_descriptor_status;
	fpo->recycle_tx_mbufs_reuse = dev->recycle_tx_mbufs_reuse;
	fpo->recycle_rx_descriptors_refill = dev->recycle_rx_descriptors_refill;

	fpo->rxq.data = dev->data->rx_queues;
	fpo->rxq.clbk = (void **)(uintptr_t)dev->post_rx_burst_cbs;

	fpo->txq.data = dev->data->tx_queues;
	fpo->txq.clbk = (void **)(uintptr_t)dev->pre_tx_burst_cbs;
}
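
For reference, this is roughly how 'struct rte_eth_fp_ops' in
lib/ethdev/rte_ethdev_core.h groups these pointers with the patch applied.
This is a simplified sketch from memory, so the exact field order and the
amount of reserved padding may differ slightly from the actual tree:

/* sketch only: field order and reserved padding are approximate */
struct rte_eth_fp_ops {
	/* Rx fast path functions and queue data: first 64B cache line */
	eth_rx_burst_t rx_pkt_burst;              /* PMD receive function */
	eth_rx_queue_count_t rx_queue_count;
	eth_rx_descriptor_status_t rx_descriptor_status;
	eth_recycle_rx_descriptors_refill_t recycle_rx_descriptors_refill;
	struct rte_ethdev_qdata rxq;              /* rxq.data / rxq.clbk */
	uintptr_t reserved1[2];
	/* Tx fast path functions and queue data: next 64B cache line */
	eth_tx_burst_t tx_pkt_burst;              /* PMD transmit function */
	eth_tx_prep_t tx_pkt_prepare;
	eth_tx_descriptor_status_t tx_descriptor_status;
	eth_recycle_tx_mbufs_reuse_t recycle_tx_mbufs_reuse;
	struct rte_ethdev_qdata txq;              /* txq.data / txq.clbk */
	uintptr_t reserved2[2];
} __rte_cache_aligned;

So in the fast path the two new recycle pointers sit in the same cache
lines as the existing Rx/Tx burst pointers and queue data.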
Besides this, only the rx_queues and tx_queues pointers of 'data' matter
for the fast path, and they are copied into fpo->rxq.data and fpo->txq.data
above. The other members of 'data', as well as 'process_private', are used
only in the slow path, so it is not necessary for them to stay in the first
cache line of 'struct rte_eth_dev'.
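
To illustrate why the layout of 'struct rte_eth_dev' does not matter here,
below is a stripped-down sketch of the rte_eth_rx_burst() inline wrapper in
rte_ethdev.h (the debug checks, tracing and Rx callback handling are
omitted):

static inline uint16_t
rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
		struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
{
	struct rte_eth_fp_ops *p = &rte_eth_fp_ops[port_id];
	/* Per-queue data was copied from dev->data->rx_queues at setup
	 * time, so dev->data itself is not dereferenced here. */
	void *qd = p->rxq.data[queue_id];

	return p->rx_pkt_burst(qd, rx_pkts, nb_pkts);
}

rte_eth_tx_burst() follows the same pattern, and as far as we can tell the
new rte_eth_recycle_mbufs() wrapper does too, so only 'struct
rte_eth_fp_ops' needs careful cache line placement.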