On Thu, 2021-09-16 at 09:46 +0530, Jerin Jacob wrote:
> On Wed, Sep 15, 2021 at 8:15 PM Xueming(Steven) Li <xuemi...@nvidia.com> 
> wrote:
> > 
> > Hi Jerin,
> > 
> > On Mon, 2021-08-30 at 15:01 +0530, Jerin Jacob wrote:
> > > On Sat, Aug 28, 2021 at 7:46 PM Xueming(Steven) Li <xuemi...@nvidia.com> 
> > > wrote:
> > > > 
> > > > 
> > > > 
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjac...@gmail.com>
> > > > > Sent: Thursday, August 26, 2021 7:58 PM
> > > > > To: Xueming(Steven) Li <xuemi...@nvidia.com>
> > > > > Cc: dpdk-dev <dev@dpdk.org>; Ferruh Yigit <ferruh.yi...@intel.com>; 
> > > > > NBU-Contact-Thomas Monjalon <tho...@monjalon.net>;
> > > > > Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>
> > > > > Subject: Re: [PATCH v2 01/15] ethdev: introduce shared Rx queue
> > > > > 
> > > > > On Thu, Aug 19, 2021 at 5:39 PM Xueming(Steven) Li 
> > > > > <xuemi...@nvidia.com> wrote:
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > -----Original Message-----
> > > > > > > From: Jerin Jacob <jerinjac...@gmail.com>
> > > > > > > Sent: Thursday, August 19, 2021 1:27 PM
> > > > > > > To: Xueming(Steven) Li <xuemi...@nvidia.com>
> > > > > > > Cc: dpdk-dev <dev@dpdk.org>; Ferruh Yigit 
> > > > > > > <ferruh.yi...@intel.com>;
> > > > > > > NBU-Contact-Thomas Monjalon <tho...@monjalon.net>; Andrew 
> > > > > > > Rybchenko
> > > > > > > <andrew.rybche...@oktetlabs.ru>
> > > > > > > Subject: Re: [PATCH v2 01/15] ethdev: introduce shared Rx queue
> > > > > > > 
> > > > > > > On Wed, Aug 18, 2021 at 4:44 PM Xueming(Steven) Li 
> > > > > > > <xuemi...@nvidia.com> wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Jerin Jacob <jerinjac...@gmail.com>
> > > > > > > > > Sent: Tuesday, August 17, 2021 11:12 PM
> > > > > > > > > To: Xueming(Steven) Li <xuemi...@nvidia.com>
> > > > > > > > > Cc: dpdk-dev <dev@dpdk.org>; Ferruh Yigit
> > > > > > > > > <ferruh.yi...@intel.com>; NBU-Contact-Thomas Monjalon
> > > > > > > > > <tho...@monjalon.net>; Andrew Rybchenko
> > > > > > > > > <andrew.rybche...@oktetlabs.ru>
> > > > > > > > > Subject: Re: [PATCH v2 01/15] ethdev: introduce shared Rx 
> > > > > > > > > queue
> > > > > > > > > 
> > > > > > > > > On Tue, Aug 17, 2021 at 5:01 PM Xueming(Steven) Li 
> > > > > > > > > <xuemi...@nvidia.com> wrote:
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > From: Jerin Jacob <jerinjac...@gmail.com>
> > > > > > > > > > > Sent: Tuesday, August 17, 2021 5:33 PM
> > > > > > > > > > > To: Xueming(Steven) Li <xuemi...@nvidia.com>
> > > > > > > > > > > Cc: dpdk-dev <dev@dpdk.org>; Ferruh Yigit
> > > > > > > > > > > <ferruh.yi...@intel.com>; NBU-Contact-Thomas Monjalon
> > > > > > > > > > > <tho...@monjalon.net>; Andrew Rybchenko
> > > > > > > > > > > <andrew.rybche...@oktetlabs.ru>
> > > > > > > > > > > Subject: Re: [PATCH v2 01/15] ethdev: introduce shared Rx
> > > > > > > > > > > queue
> > > > > > > > > > > 
> > > > > > > > > > > On Wed, Aug 11, 2021 at 7:34 PM Xueming Li 
> > > > > > > > > > > <xuemi...@nvidia.com> wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > In the current DPDK framework, each Rx queue is pre-loaded
> > > > > > > > > > > > with mbufs for incoming packets. When the number of
> > > > > > > > > > > > representors scales out in a switch domain, the memory
> > > > > > > > > > > > consumption becomes significant. More importantly, polling
> > > > > > > > > > > > all ports leads to high cache misses, high latency and
> > > > > > > > > > > > low throughput.
> > > > > > > > > > > > 
> > > > > > > > > > > > This patch introduces the shared Rx queue. Ports with the
> > > > > > > > > > > > same configuration in a switch domain can share an Rx queue
> > > > > > > > > > > > set by specifying a sharing group.
> > > > > > > > > > > > Polling any queue that uses the same shared Rx queue receives
> > > > > > > > > > > > packets from all member ports. The source port is
> > > > > > > > > > > > identified by mbuf->port.
> > > > > > > > > > > > 
> > > > > > > > > > > > The queue number of each port in a shared group should be
> > > > > > > > > > > > identical. Queue indexes are mapped 1:1 within the shared
> > > > > > > > > > > > group.
> > > > > > > > > > > > 
> > > > > > > > > > > > A shared Rx queue must be polled on a single thread or core.
> > > > > > > > > > > > 
> > > > > > > > > > > > Multiple groups are supported by group ID.
> > > > > > > > > > > > 
> > > > > > > > > > > > Signed-off-by: Xueming Li <xuemi...@nvidia.com>
> > > > > > > > > > > > Cc: Jerin Jacob <jerinjac...@gmail.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > > The Rx queue object could be used as the shared Rx queue
> > > > > > > > > > > > object; it's important to clarify all queue control callback
> > > > > > > > > > > > APIs that use the queue object:
> > > > > > > > > > > > 
> > > > > > > > > > > > https://mails.dpdk.org/archives/dev/2021-July/215574.html
> > > > > > > > > > > 
> > > > > > > > > > > >  #undef RTE_RX_OFFLOAD_BIT2STR
> > > > > > > > > > > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > > > > > > > > > > index d2b27c351f..a578c9db9d 100644
> > > > > > > > > > > > --- a/lib/ethdev/rte_ethdev.h
> > > > > > > > > > > > +++ b/lib/ethdev/rte_ethdev.h
> > > > > > > > > > > > @@ -1047,6 +1047,7 @@ struct rte_eth_rxconf {
> > > > > > > > > > > >         uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
> > > > > > > > > > > >         uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
> > > > > > > > > > > >         uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. */
> > > > > > > > > > > > +       uint32_t shared_group; /**< Shared port group index in switch domain. */
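To make the intended usage concrete, here is a rough sketch of how an
application would opt a queue into a shared group with this patch
(illustrative only, assuming the device reports the new flag in
rx_offload_capa; error handling omitted):

    #include <rte_ethdev.h>

    /* Opt queue 'qid' of 'port_id' into shared Rx group 0 (sketch). */
    static int
    setup_shared_rxq(uint16_t port_id, uint16_t qid, uint16_t nb_desc,
                     struct rte_mempool *mp)
    {
            struct rte_eth_dev_info info;
            struct rte_eth_rxconf rxconf;

            rte_eth_dev_info_get(port_id, &info);
            rxconf = info.default_rxconf;
            if (info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_SHARED_RXQ) {
                    /* Share this queue with other ports of group 0. */
                    rxconf.offloads |= RTE_ETH_RX_OFFLOAD_SHARED_RXQ;
                    rxconf.shared_group = 0;
            }
            return rte_eth_rx_queue_setup(port_id, qid, nb_desc,
                                          rte_eth_dev_socket_id(port_id),
                                          &rxconf, mp);
    }

Ports of the same switch domain that call this with the same group and
queue index would then receive from the one shared queue.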
> > > > > > > > > > > 
> > > > > > > > > > > I am not able to see anyone setting/creating this group ID
> > > > > > > > > > > in the test application.
> > > > > > > > > > > How is this group created?
> > > > > > > > > > 
> > > > > > > > > > Nice catch, the initial testpmd version only supports one
> > > > > > > > > > default group (0).
> > > > > > > > > > All ports that support shared-rxq are assigned to the same group.
> > > > > > > > > > 
> > > > > > > > > > We should be able to change "--rxq-shared" to
> > > > > > > > > > "--rxq-shared-group"
> > > > > > > > > > to support groups other than the default.
> > > > > > > > > > 
> > > > > > > > > > To support more groups simultaneously, we need to consider the
> > > > > > > > > > testpmd forwarding stream core assignment; all streams in the
> > > > > > > > > > same group need to stay on the same core.
> > > > > > > > > > It's possible to specify after how many ports the group number
> > > > > > > > > > increases, but the user must schedule stream affinity carefully -
> > > > > > > > > > error prone.
> > > > > > > > > > 
> > > > > > > > > > On the other hand, one group should be sufficient for most
> > > > > > > > > > customers; the doubt is whether it is valuable to support
> > > > > > > > > > testing multiple groups.
> > > > > > > > > 
> > > > > > > > > Ack. One group is enough in testpmd.
> > > > > > > > > 
> > > > > > > > > My question was more about who creates this group and how.
> > > > > > > > > Shouldn't we need an API to create the shared_group? If we do
> > > > > > > > > the following, at least I can think of how it can be implemented
> > > > > > > > > in SW or other HW.
> > > > > > > > > 
> > > > > > > > > - Create an aggregation queue group
> > > > > > > > > - Attach multiple Rx queues to the aggregation queue group
> > > > > > > > > - Pull the packets from the queue group (which internally fetches
> > > > > > > > > from the _attached_ Rx queues)
> > > > > > > > > 
> > > > > > > > > Does the above kind of sequence break your representor use
> > > > > > > > > case?
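For readers following the thread: the sequence proposed above would map
to something like the pseudo-API below. None of these calls exist in
ethdev today; the names are purely hypothetical and only meant to
visualize the create/attach/poll flow.

    /* Hypothetical aggregation-group API - pseudocode, not existing ethdev. */
    struct rte_eth_rxq_agg *grp;
    struct rte_mbuf *pkts[32];
    uint16_t nb;

    grp = rte_eth_rxq_agg_create(socket_id);           /* create the group    */
    rte_eth_rxq_agg_attach(grp, port_a, /*queue*/ 0);  /* attach Rx queues    */
    rte_eth_rxq_agg_attach(grp, port_b, /*queue*/ 0);
    nb = rte_eth_rxq_agg_rx_burst(grp, pkts, 32);      /* pull from the group */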
> > > > > > > > 
> > > > > > > > Seems more like a set of EAL wrappers. The current API tries to
> > > > > > > > minimize the application effort to adapt to shared-rxq.
> > > > > > > > - step 1: not sure how important it is to create the group with an
> > > > > > > > API; in rte_flow, a group is created on demand.
> > > > > > > 
> > > > > > > Which rte_flow pattern/action for this?
> > > > > > 
> > > > > > No rte_flow for this; I just recalled that the group in rte_flow is
> > > > > > not created explicitly - it comes along with the flow, not via an API.
> > > > > > I don't see anything else to create along with the group; I just doubt
> > > > > > whether it is valuable to introduce a new API set to manage groups.
> > > > > 
> > > > > See below.
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > > - step 2: currently, the attaching is done in
> > > > > > > > rte_eth_rx_queue_setup, specifying the offload and group in the
> > > > > > > > rx_conf struct.
> > > > > > > > - step 3: define a dedicated API to receive packets from the shared
> > > > > > > > rxq? That looks clearer for receiving packets from a shared rxq.
> > > > > > > >   Currently, the rxq objects in a share group are the same - the
> > > > > > > > shared rxq - so the eth callback eth_rx_burst_t(rxq_obj, mbufs, n)
> > > > > > > > could be used to receive packets from any port in the group,
> > > > > > > > normally the first port (PF) in the group.
> > > > > > > >   An alternative way is defining a vdev with the same queue number
> > > > > > > > and copying the rxq objects, which will make the vdev a proxy of
> > > > > > > > the shared rxq group - this could be a helper API.
> > > > > > > > 
> > > > > > > > Anyway, the wrapper doesn't break the use case; the step 3 API is
> > > > > > > > clearer, but we need to understand how to implement it efficiently.
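To make the step 2 model above concrete, this is roughly what the
application side looks like with the unchanged fast path: poll any
member port (typically the PF) and demultiplex by mbuf->port. A minimal
sketch, with handle_pkt() as a stand-in for application processing:

    #include <rte_common.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 32

    /* Placeholder per-packet handler keyed by the real source port. */
    static void
    handle_pkt(uint16_t src_port, struct rte_mbuf *m)
    {
            RTE_SET_USED(src_port);
            rte_pktmbuf_free(m); /* application logic goes here */
    }

    /* Poll the shared Rx queue through one member port (e.g. the PF) and
     * dispatch packets by the source port recorded in mbuf->port. */
    static void
    poll_shared_rxq(uint16_t anchor_port_id, uint16_t queue_id)
    {
            struct rte_mbuf *pkts[BURST_SIZE];
            uint16_t nb, i;

            nb = rte_eth_rx_burst(anchor_port_id, queue_id, pkts, BURST_SIZE);
            for (i = 0; i < nb; i++)
                    handle_pkt(pkts[i]->port, pkts[i]);
    }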
> > > > > > > 
> > > > > > > Are you doing this feature based on any HW support or is it a pure
> > > > > > > SW thing? If it is SW, it is better to have just a new vdev like
> > > > > > > drivers/net/bonding/. With this we can help aggregate multiple
> > > > > > > Rxqs across the multiple ports of the same driver.
> > > > > > 
> > > > > > Based on HW support.
> > > > > 
> > > > > In Marvell HW, we do have some support; I will outline it here along
> > > > > with some queries on this.
> > > > > 
> > > > > # We need to create some new HW structure for aggregation
> > > > > # Connect each Rxq to the new HW structure for aggregation
> > > > > # Use rx_burst from the new HW structure.
> > > > > 
> > > > > Could you outline your HW support?
> > > > > 
> > > > > Also, I am not able to understand how this will reduce the memory; at
> > > > > least in our HW we need to create more memory now to deal with this,
> > > > > as we need to deal with a new HW structure.
> > > > > 
> > > > > How does it reduce the memory in your HW? Also, if memory is the
> > > > > constraint, why NOT reduce the number of queues?
> > > > > 
> > > > 
> > > > Glad to know that Marvell is working on this; what's the status of the
> > > > driver implementation?
> > > > 
> > > > In my PMD implementation, it's very similar: a new HW object, a shared
> > > > memory pool, is created to replace the per-rxq memory pool.
> > > > A legacy rxq feeds the queue with allocated mbufs equal to the number
> > > > of descriptors; now shared rxqs share the same pool, so there is no
> > > > need to supply mbufs for each rxq, just feed the shared rxq.
> > > > 
> > > > So the memory saving comes from the mbufs per rxq: even with 1000
> > > > representors in a shared rxq group, the mbufs consumed are those of
> > > > one rxq.
> > > > In other words, a new member in a shared rxq doesn't allocate new
> > > > mbufs to feed its rxq; it just shares with the existing shared rxq
> > > > (HW mempool).
> > > > The memory required to set up each rxq doesn't change too much, agreed.
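To put rough numbers on the saving (purely illustrative figures, not from
a real setup): with 512 descriptors per queue and ~2 KB mbufs, 1000
dedicated rxqs pin about 1000 * 512 * 2 KB, roughly 1 GB of mbufs, while
1000 ports sharing one rxq pin only 512 * 2 KB, roughly 1 MB. The
per-queue rings and control structures stay per port, which is why the
setup memory itself doesn't change much.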
> > > 
> > > We can ask the application to configure the same mempool for multiple
> > > RQs too, right? If the saving is based on sharing the mempool
> > > with multiple RQs.
> > > 
> > > > 
> > > > > # Also, I was thinking, one way to avoid the fast path or ABI change
> > > > > would be the following.
> > > > > 
> > > > > # Driver initializes one more eth_dev_ops in the driver as an
> > > > > aggregator ethdev
> > > > > # devargs of the new ethdev, or a specific API like
> > > > > drivers/net/bonding/rte_eth_bond.h, can take (port, queue) tuples
> > > > > which need to be aggregated by the new ethdev port
> > > > > # No change in fastpath or ABI is required in this model.
> > > > > 
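For illustration, the application-facing side of such an aggregator could
look like the sketch below. The "net_rxq_agg" driver name and its
"agg=(port,queue)" devargs are invented here; only rte_vdev_init(),
rte_eth_dev_get_port_by_name() and rte_eth_rx_burst() are existing APIs.

    #include <rte_bus_vdev.h>
    #include <rte_ethdev.h>

    /* Create a hypothetical aggregator vdev over (port, queue) tuples,
     * similar in spirit to drivers/net/bonding, and return its port id. */
    static int
    create_aggregator(uint16_t *agg_port_id)
    {
            if (rte_vdev_init("net_rxq_agg0", "agg=(0,0),agg=(1,0)") != 0)
                    return -1;
            return rte_eth_dev_get_port_by_name("net_rxq_agg0", agg_port_id);
    }

The fast path then stays as-is: the application polls only the aggregator
port with rte_eth_rx_burst(), and, as in the shared Rx queue model above,
the originating port would still be reported in mbuf->port.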
> > > > 
> > > > This could be an option to access the shared rxq. What's the
> > > > difference with the new PMD?
> > > 
> > > No ABI and fast path changes are required.
> > > 
> > > > What's the difference for the PMD driver in creating the new device?
> > > > 
> > > > Is it important in your implementation? Does it work with the existing
> > > > rx_burst API?
> > > 
> > > Yes. It will work with the existing rx_burst API.
> > > 
> > 
> > The aggregator ethdev required by the user is a port; maybe it is good
> > to add a callback for the PMD to prepare a complete ethdev, just like
> > creating a representor ethdev - the PMD registers the new port
> > internally. If the PMD doesn't provide the callback, the ethdev API
> > falls back to initializing an empty ethdev by copying the rxq data
> > (shared) and the rx_burst API from the source port and share group.
> > Actually, users can do this fallback themselves or with a util API.
> > 
> > IIUC, an aggregator ethdev is not a must; do you think we can continue
> > and leave that design to a later stage?
> 
> 
> IMO, the aggregator ethdev reduces the complexity for the application
> and hence avoids any change in the test application etc. I prefer to
> take that. I will leave the decision to the ethdev maintainers.

Hi Jerin, a new API has been added for the aggregator, the last one in v3, thanks!

> 
> 
> > 
> > > > 
> > > > > 
> > > > > 
> > > > > > Most users might use the PF in the group as the anchor port for rx
> > > > > > burst; the current definition should be easy for them to migrate to.
> > > > > > But some users might prefer grouping some hot-plugged/unplugged
> > > > > > representors; EAL could provide wrappers, or users could do that
> > > > > > themselves since the strategy is not too complex.
> > > > > > Anyway, any suggestion is welcome.
> > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > >         /**
> > > > > > > > > > > >          * Per-queue Rx offloads to be set using DEV_RX_OFFLOAD_* flags.
> > > > > > > > > > > >          * Only offloads set on rx_queue_offload_capa or rx_offload_capa
> > > > > > > > > > > > @@ -1373,6 +1374,12 @@ struct rte_eth_conf {
> > > > > > > > > > > >  #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
> > > > > > > > > > > >  #define DEV_RX_OFFLOAD_RSS_HASH                0x00080000
> > > > > > > > > > > >  #define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000
> > > > > > > > > > > > +/**
> > > > > > > > > > > > + * Rx queue is shared among ports in same switch domain to save memory,
> > > > > > > > > > > > + * avoid polling each port. Any port in group can be used to receive packets.
> > > > > > > > > > > > + * Real source port number saved in mbuf->port field.
> > > > > > > > > > > > + */
> > > > > > > > > > > > +#define RTE_ETH_RX_OFFLOAD_SHARED_RXQ   0x00200000
> > > > > > > > > > > > 
> > > > > > > > > > > >  #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
> > > > > > > > > > > >                                  DEV_RX_OFFLOAD_UDP_CKSUM | \
> > > > > > > > > > > > --
> > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > 
> > 
