> -----Original Message-----
> From: Morten Brørup <m...@smartsharesystems.com>
> Sent: Thursday, May 25, 2023 11:09 PM
> To: Feifei Wang <feifei.wa...@arm.com>; tho...@monjalon.net; Ferruh
> Yigit <ferruh.yi...@amd.com>; Andrew Rybchenko
> <andrew.rybche...@oktetlabs.ru>
> Cc: dev@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>; Ruifeng Wang
> <ruifeng.w...@arm.com>
> Subject: RE: [PATCH v6 1/4] ethdev: add API for mbufs recycle mode
> 
> > From: Feifei Wang [mailto:feifei.wa...@arm.com]
> > Sent: Thursday, 25 May 2023 11.46
> >
> > Add 'rte_eth_recycle_rx_queue_info_get' and 'rte_eth_recycle_mbufs'
> > APIs to recycle used mbufs from a transmit queue of an Ethernet
> > device, and move these mbufs into a mbuf ring for a receive queue of
> > an Ethernet device. This can bypass mempool 'put/get' operations hence
> > saving CPU cycles.
> >
> > For each recycling mbufs, the rte_eth_recycle_mbufs() function
> > performs the following operations:
> > - Copy used *rte_mbuf* buffer pointers from Tx mbuf ring into Rx mbuf
> > ring.
> > - Replenish the Rx descriptors with the recycling *rte_mbuf* mbufs
> > freed from the Tx mbuf ring.
> >
> > Suggested-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > Suggested-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > Signed-off-by: Feifei Wang <feifei.wa...@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > ---
> 
> [...]
> 
> > diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> > index 2c9d615fb5..c6723d5277 100644
> > --- a/lib/ethdev/ethdev_driver.h
> > +++ b/lib/ethdev/ethdev_driver.h
> > @@ -59,6 +59,10 @@ struct rte_eth_dev {
> >     eth_rx_descriptor_status_t rx_descriptor_status;
> >     /** Check the status of a Tx descriptor */
> >     eth_tx_descriptor_status_t tx_descriptor_status;
> > +   /** Pointer to PMD transmit mbufs reuse function */
> > +   eth_recycle_tx_mbufs_reuse_t recycle_tx_mbufs_reuse;
> > +   /** Pointer to PMD receive descriptors refill function */
> > +   eth_recycle_rx_descriptors_refill_t recycle_rx_descriptors_refill;
> >
> >     /**
> >      * Device data that is shared between primary and secondary
> > processes
> 
> The rte_eth_dev struct currently looks like this:
> 
> /**
>  * @internal
>  * The generic data structure associated with each Ethernet device.
>  *
>  * Pointers to burst-oriented packet receive and transmit functions are
>  * located at the beginning of the structure, along with the pointer to
>  * where all the data elements for the particular device are stored in shared
>  * memory. This split allows the function pointer and driver data to be per-
>  * process, while the actual configuration data for the device is shared.
>  */
> struct rte_eth_dev {
>       eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function */
>       eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function */
> 
>       /** Pointer to PMD transmit prepare function */
>       eth_tx_prep_t tx_pkt_prepare;
>       /** Get the number of used Rx descriptors */
>       eth_rx_queue_count_t rx_queue_count;
>       /** Check the status of a Rx descriptor */
>       eth_rx_descriptor_status_t rx_descriptor_status;
>       /** Check the status of a Tx descriptor */
>       eth_tx_descriptor_status_t tx_descriptor_status;
> 
>       /**
>        * Device data that is shared between primary and secondary
> processes
>        */
>       struct rte_eth_dev_data *data;
>       void *process_private; /**< Pointer to per-process device data */
>       const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
>       struct rte_device *device; /**< Backing device */
>       struct rte_intr_handle *intr_handle; /**< Device interrupt handle */
> 
>       /** User application callbacks for NIC interrupts */
>       struct rte_eth_dev_cb_list link_intr_cbs;
>       /**
>        * User-supplied functions called from rx_burst to post-process
>        * received packets before passing them to the user
>        */
>       struct rte_eth_rxtx_callback
> *post_rx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
>       /**
>        * User-supplied functions called from tx_burst to pre-process
>        * received packets before passing them to the driver for transmission
>        */
>       struct rte_eth_rxtx_callback
> *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
> 
>       enum rte_eth_dev_state state; /**< Flag indicating the port state */
>       void *security_ctx; /**< Context for security ops */
> } __rte_cache_aligned;
> 
> Inserting the two new function pointers (recycle_tx_mbufs_reuse and
> recycle_rx_descriptors_refill) as the 7th and 8th fields will move the 'data' 
> and
> 'process_private' pointers out of the first cache line.
> 
> If those data pointers are used in the fast path with the rx_pkt_burst and
> tx_pkt_burst functions, moving them to a different cache line might have a
> performance impact on those two functions.
> 
> Disclaimer: This is a big "if", and wild speculation from me, because I 
> haven't
> looked at it in detail! If this structure is not used in the fast path like 
> this, you
> can ignore my suggestion below.
> 
> Please consider moving the 'data' and 'process_private' pointers to the
> beginning of this structure, so they are kept in the same cache line as the
> rx_pkt_burst and tx_pkt_burst function pointers.
> 
> I don't know the relative importance of the remaining six fast path functions
> (the four existing ones plus the two new ones in this patch), so you could 
> also
> rearrange those, so the least important two functions are moved out of the
> first cache line. It doesn't have to be the two recycle functions that go 
> into a
> different cache line.
> 
> -Morten

This is a good question. Reviewing the code, the function pointers used in the
fast path are copied into the structure 'struct rte_eth_fp_ops *fpo', which
ensures that all fast path pointers stay together in the same Rx/Tx cache
lines:

void
eth_dev_fp_ops_setup(struct rte_eth_fp_ops *fpo,
                const struct rte_eth_dev *dev)
{
        fpo->rx_pkt_burst = dev->rx_pkt_burst;
        fpo->tx_pkt_burst = dev->tx_pkt_burst;
        fpo->tx_pkt_prepare = dev->tx_pkt_prepare;
        fpo->rx_queue_count = dev->rx_queue_count;
        fpo->rx_descriptor_status = dev->rx_descriptor_status;
        fpo->tx_descriptor_status = dev->tx_descriptor_status;
        fpo->recycle_tx_mbufs_reuse = dev->recycle_tx_mbufs_reuse;
        fpo->recycle_rx_descriptors_refill = dev->recycle_rx_descriptors_refill;

        fpo->rxq.data = dev->data->rx_queues;
        fpo->rxq.clbk = (void **)(uintptr_t)dev->post_rx_burst_cbs;

        fpo->txq.data = dev->data->tx_queues;
        fpo->txq.clbk = (void **)(uintptr_t)dev->pre_tx_burst_cbs;
}

Besides the rx_queues and tx_queues pointers, which are important for the fast
path, the other members of 'data', as well as 'process_private', are only used
in the slow path. So it is not necessary for these members to be in the first
cache line.
