On Tue, 2021-09-28 at 19:08 +0530, Jerin Jacob wrote:
> On Tue, Sep 28, 2021 at 6:55 PM Xueming(Steven) Li <xuemi...@nvidia.com> 
> wrote:
> > 
> > On Tue, 2021-09-28 at 18:28 +0530, Jerin Jacob wrote:
> > > On Tue, Sep 28, 2021 at 5:07 PM Xueming(Steven) Li <xuemi...@nvidia.com> 
> > > wrote:
> > > > 
> > > > On Tue, 2021-09-28 at 15:05 +0530, Jerin Jacob wrote:
> > > > > On Sun, Sep 26, 2021 at 11:06 AM Xueming(Steven) Li 
> > > > > <xuemi...@nvidia.com> wrote:
> > > > > > 
> > > > > > On Wed, 2021-08-11 at 13:04 +0100, Ferruh Yigit wrote:
> > > > > > > On 8/11/2021 9:28 AM, Xueming(Steven) Li wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Jerin Jacob <jerinjac...@gmail.com>
> > > > > > > > > Sent: Wednesday, August 11, 2021 4:03 PM
> > > > > > > > > To: Xueming(Steven) Li <xuemi...@nvidia.com>
> > > > > > > > > Cc: dpdk-dev <dev@dpdk.org>; Ferruh Yigit 
> > > > > > > > > <ferruh.yi...@intel.com>; NBU-Contact-Thomas Monjalon 
> > > > > > > > > <tho...@monjalon.net>;
> > > > > > > > > Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>
> > > > > > > > > Subject: Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared 
> > > > > > > > > Rx queue
> > > > > > > > > 
> > > > > > > > > On Mon, Aug 9, 2021 at 7:46 PM Xueming(Steven) Li 
> > > > > > > > > <xuemi...@nvidia.com> wrote:
> > > > > > > > > > 
> > > > > > > > > > Hi,
> > > > > > > > > > 
> > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > From: Jerin Jacob <jerinjac...@gmail.com>
> > > > > > > > > > > Sent: Monday, August 9, 2021 9:51 PM
> > > > > > > > > > > To: Xueming(Steven) Li <xuemi...@nvidia.com>
> > > > > > > > > > > Cc: dpdk-dev <dev@dpdk.org>; Ferruh Yigit 
> > > > > > > > > > > <ferruh.yi...@intel.com>;
> > > > > > > > > > > NBU-Contact-Thomas Monjalon <tho...@monjalon.net>; Andrew 
> > > > > > > > > > > Rybchenko
> > > > > > > > > > > <andrew.rybche...@oktetlabs.ru>
> > > > > > > > > > > Subject: Re: [dpdk-dev] [PATCH v1] ethdev: introduce 
> > > > > > > > > > > shared Rx queue
> > > > > > > > > > > 
> > > > > > > > > > > On Mon, Aug 9, 2021 at 5:18 PM Xueming Li 
> > > > > > > > > > > <xuemi...@nvidia.com> wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > In current DPDK framework, each RX queue is pre-loaded 
> > > > > > > > > > > > with mbufs
> > > > > > > > > > > > for incoming packets. When number of representors scale 
> > > > > > > > > > > > out in a
> > > > > > > > > > > > switch domain, the memory consumption became 
> > > > > > > > > > > > significant. Most
> > > > > > > > > > > > important, polling all ports leads to high cache miss, 
> > > > > > > > > > > > high
> > > > > > > > > > > > latency and low throughput.
> > > > > > > > > > > > 
> > > > > > > > > > > > This patch introduces shared RX queue. Ports with same
> > > > > > > > > > > > configuration in a switch domain could share RX queue 
> > > > > > > > > > > > set by specifying sharing group.
> > > > > > > > > > > > Polling any queue using same shared RX queue receives 
> > > > > > > > > > > > packets from
> > > > > > > > > > > > all member ports. Source port is identified by 
> > > > > > > > > > > > mbuf->port.
> > > > > > > > > > > > 
> > > > > > > > > > > > Port queue number in a shared group should be 
> > > > > > > > > > > > identical. Queue
> > > > > > > > > > > > index is
> > > > > > > > > > > > 1:1 mapped in shared group.
> > > > > > > > > > > > 
> > > > > > > > > > > > Share RX queue is supposed to be polled on same thread.
> > > > > > > > > > > > 
> > > > > > > > > > > > Multiple groups is supported by group ID.
> > > > > > > > > > > 
> > > > > > > > > > > Is this offload specific to the representor? If so can 
> > > > > > > > > > > this name be changed specifically to representor?
> > > > > > > > > > 
> > > > > > > > > > Yes, PF and representor in switch domain could take 
> > > > > > > > > > advantage.
> > > > > > > > > > 
> > > > > > > > > > > If it is for a generic case, how the flow ordering will 
> > > > > > > > > > > be maintained?
> > > > > > > > > > 
> > > > > > > > > > Not quite sure that I understood your question. The control 
> > > > > > > > > > path of is
> > > > > > > > > > almost same as before, PF and representor port still 
> > > > > > > > > > needed, rte flows not impacted.
> > > > > > > > > > Queues still needed for each member port, descriptors(mbuf) 
> > > > > > > > > > will be
> > > > > > > > > > supplied from shared Rx queue in my PMD implementation.
> > > > > > > > > 
> > > > > > > > > My question was if create a generic 
> > > > > > > > > RTE_ETH_RX_OFFLOAD_SHARED_RXQ offload, multiple ethdev 
> > > > > > > > > receive queues land into the same
> > > > > > > > > receive queue, In that case, how the flow order is maintained 
> > > > > > > > > for respective receive queues.
> > > > > > > > 
> > > > > > > > I guess the question is testpmd forward stream? The forwarding 
> > > > > > > > logic has to be changed slightly in case of shared rxq.
> > > > > > > > basically for each packet in rx_burst result, lookup source 
> > > > > > > > stream according to mbuf->port, forwarding to target fs.
> > > > > > > > Packets from same source port could be grouped as a small burst 
> > > > > > > > to process, this will accelerates the performance if traffic 
> > > > > > > > come from
> > > > > > > > limited ports. I'll introduce some common api to do shard rxq 
> > > > > > > > forwarding, call it with packets handling callback, so it 
> > > > > > > > suites for
> > > > > > > > all forwarding engine. Will sent patches soon.
> > > > > > > > 
> > > > > > > 
> > > > > > > All ports will put the packets in to the same queue (share 
> > > > > > > queue), right? Does
> > > > > > > this means only single core will poll only, what will happen if 
> > > > > > > there are
> > > > > > > multiple cores polling, won't it cause problem?
> > > > > > > 
> > > > > > > And if this requires specific changes in the application, I am 
> > > > > > > not sure about
> > > > > > > the solution, can't this work in a transparent way to the 
> > > > > > > application?
> > > > > > 
> > > > > > Discussed with Jerin, new API introduced in v3 2/8 that aggregate 
> > > > > > ports
> > > > > > in same group into one new port. Users could schedule polling on the
> > > > > > aggregated port instead of all member ports.
> > > > > 
> > > > > The v3 still has testpmd changes in fastpath. Right? IMO, For this
> > > > > feature, we should not change fastpath of testpmd
> > > > > application. Instead, testpmd can use aggregated ports probably as
> > > > > separate fwd_engine to show how to use this feature.
> > > > 
> > > > Good point to discuss :) There are two strategies to polling a shared
> > > > Rxq:
> > > > 1. polling each member port
> > > >    All forwarding engines can be reused to work as before.
> > > >    My testpmd patches are efforts towards this direction.
> > > >    Does your PMD support this?
> > > 
> > > Not unfortunately. More than that, every application needs to change
> > > to support this model.
> > 
> > Both strategies need user application to resolve port ID from mbuf and
> > process accordingly.
> > This one doesn't demand aggregated port, no polling schedule change.
> 
> I was thinking, mbuf will be updated from driver/aggregator port as when it
> comes to application.
> 
> > 
> > > 
> > > > 2. polling aggregated port
> > > >    Besides forwarding engine, need more work to to demo it.
> > > >    This is an optional API, not supported by my PMD yet.
> > > 
> > > We are thinking of implementing this PMD when it comes to it, ie.
> > > without application change in fastpath
> > > logic.
> > 
> > Fastpath have to resolve port ID anyway and forwarding according to
> > logic. Forwarding engine need to adapt to support shard Rxq.
> > Fortunately, in testpmd, this can be done with an abstract API.
> > 
> > Let's defer part 2 until some PMD really support it and tested, how do
> > you think?
> 
> We are not planning to use this feature so either way it is OK to me.
> I leave to ethdev maintainers decide between 1 vs 2.

A better driver should support both, but specific driver could select
either one. 1 brings less changes to application, 2 brings better
performance with additional steps.

> 
> I do have a strong opinion not changing the testpmd basic forward engines
> for this feature.I would like to keep it simple as fastpath optimized and 
> would
> like to add a separate Forwarding engine as means to verify this feature.
> 
> 
> 
> > 
> > > 
> > > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > Overall, is this for optimizing memory for the port represontors? 
> > > > > > > If so can't we
> > > > > > > have a port representor specific solution, reducing scope can 
> > > > > > > reduce the
> > > > > > > complexity it brings?
> > > > > > > 
> > > > > > > > > If this offload is only useful for representor case, Can we 
> > > > > > > > > make this offload specific to representor the case by 
> > > > > > > > > changing its name and
> > > > > > > > > scope.
> > > > > > > > 
> > > > > > > > It works for both PF and representors in same switch domain, 
> > > > > > > > for application like OVS, few changes to apply.
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Signed-off-by: Xueming Li <xuemi...@nvidia.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > >  doc/guides/nics/features.rst                    | 11 
> > > > > > > > > > > > +++++++++++
> > > > > > > > > > > >  doc/guides/nics/features/default.ini            |  1 +
> > > > > > > > > > > >  doc/guides/prog_guide/switch_representation.rst | 10 
> > > > > > > > > > > > ++++++++++
> > > > > > > > > > > >  lib/ethdev/rte_ethdev.c                         |  1 +
> > > > > > > > > > > >  lib/ethdev/rte_ethdev.h                         |  7 
> > > > > > > > > > > > +++++++
> > > > > > > > > > > >  5 files changed, 30 insertions(+)
> > > > > > > > > > > > 
> > > > > > > > > > > > diff --git a/doc/guides/nics/features.rst
> > > > > > > > > > > > b/doc/guides/nics/features.rst index 
> > > > > > > > > > > > a96e12d155..2e2a9b1554 100644
> > > > > > > > > > > > --- a/doc/guides/nics/features.rst
> > > > > > > > > > > > +++ b/doc/guides/nics/features.rst
> > > > > > > > > > > > @@ -624,6 +624,17 @@ Supports inner packet L4 checksum.
> > > > > > > > > > > >    
> > > > > > > > > > > > ``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_OUTER_UDP_CKSUM``.
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > +.. _nic_features_shared_rx_queue:
> > > > > > > > > > > > +
> > > > > > > > > > > > +Shared Rx queue
> > > > > > > > > > > > +---------------
> > > > > > > > > > > > +
> > > > > > > > > > > > +Supports shared Rx queue for ports in same switch 
> > > > > > > > > > > > domain.
> > > > > > > > > > > > +
> > > > > > > > > > > > +* **[uses]     rte_eth_rxconf,rte_eth_rxmode**: 
> > > > > > > > > > > > ``offloads:RTE_ETH_RX_OFFLOAD_SHARED_RXQ``.
> > > > > > > > > > > > +* **[provides] mbuf**: ``mbuf.port``.
> > > > > > > > > > > > +
> > > > > > > > > > > > +
> > > > > > > > > > > >  .. _nic_features_packet_type_parsing:
> > > > > > > > > > > > 
> > > > > > > > > > > >  Packet type parsing
> > > > > > > > > > > > diff --git a/doc/guides/nics/features/default.ini
> > > > > > > > > > > > b/doc/guides/nics/features/default.ini
> > > > > > > > > > > > index 754184ddd4..ebeb4c1851 100644
> > > > > > > > > > > > --- a/doc/guides/nics/features/default.ini
> > > > > > > > > > > > +++ b/doc/guides/nics/features/default.ini
> > > > > > > > > > > > @@ -19,6 +19,7 @@ Free Tx mbuf on demand =
> > > > > > > > > > > >  Queue start/stop     =
> > > > > > > > > > > >  Runtime Rx queue setup =
> > > > > > > > > > > >  Runtime Tx queue setup =
> > > > > > > > > > > > +Shared Rx queue      =
> > > > > > > > > > > >  Burst mode info      =
> > > > > > > > > > > >  Power mgmt address monitor =
> > > > > > > > > > > >  MTU update           =
> > > > > > > > > > > > diff --git 
> > > > > > > > > > > > a/doc/guides/prog_guide/switch_representation.rst
> > > > > > > > > > > > b/doc/guides/prog_guide/switch_representation.rst
> > > > > > > > > > > > index ff6aa91c80..45bf5a3a10 100644
> > > > > > > > > > > > --- a/doc/guides/prog_guide/switch_representation.rst
> > > > > > > > > > > > +++ b/doc/guides/prog_guide/switch_representation.rst
> > > > > > > > > > > > @@ -123,6 +123,16 @@ thought as a software "patch 
> > > > > > > > > > > > panel" front-end for applications.
> > > > > > > > > > > >  .. [1] `Ethernet switch device driver model (switchdev)
> > > > > > > > > > > > 
> > > > > > > > > > > > <https://www.kernel.org/doc/Documentation/networking/switchdev.txt
> > > > > > > > > > > > > `_
> > > > > > > > > > > > 
> > > > > > > > > > > > +- Memory usage of representors is huge when number of 
> > > > > > > > > > > > representor
> > > > > > > > > > > > +grows,
> > > > > > > > > > > > +  because PMD always allocate mbuf for each descriptor 
> > > > > > > > > > > > of Rx queue.
> > > > > > > > > > > > +  Polling the large number of ports brings more CPU 
> > > > > > > > > > > > load, cache
> > > > > > > > > > > > +miss and
> > > > > > > > > > > > +  latency. Shared Rx queue can be used to share Rx 
> > > > > > > > > > > > queue between
> > > > > > > > > > > > +PF and
> > > > > > > > > > > > +  representors in same switch domain.
> > > > > > > > > > > > +``RTE_ETH_RX_OFFLOAD_SHARED_RXQ``
> > > > > > > > > > > > +  is present in Rx offloading capability of device 
> > > > > > > > > > > > info. Setting
> > > > > > > > > > > > +the
> > > > > > > > > > > > +  offloading flag in device Rx mode or Rx queue 
> > > > > > > > > > > > configuration to
> > > > > > > > > > > > +enable
> > > > > > > > > > > > +  shared Rx queue. Polling any member port of shared 
> > > > > > > > > > > > Rx queue can
> > > > > > > > > > > > +return
> > > > > > > > > > > > +  packets of all ports in group, port ID is saved in 
> > > > > > > > > > > > ``mbuf.port``.
> > > > > > > > > > > > +
> > > > > > > > > > > >  Basic SR-IOV
> > > > > > > > > > > >  ------------
> > > > > > > > > > > > 
> > > > > > > > > > > > diff --git a/lib/ethdev/rte_ethdev.c 
> > > > > > > > > > > > b/lib/ethdev/rte_ethdev.c
> > > > > > > > > > > > index 9d95cd11e1..1361ff759a 100644
> > > > > > > > > > > > --- a/lib/ethdev/rte_ethdev.c
> > > > > > > > > > > > +++ b/lib/ethdev/rte_ethdev.c
> > > > > > > > > > > > @@ -127,6 +127,7 @@ static const struct {
> > > > > > > > > > > >         RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> > > > > > > > > > > >         RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
> > > > > > > > > > > >         RTE_ETH_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT),
> > > > > > > > > > > > +       RTE_ETH_RX_OFFLOAD_BIT2STR(SHARED_RXQ),
> > > > > > > > > > > >  };
> > > > > > > > > > > > 
> > > > > > > > > > > >  #undef RTE_RX_OFFLOAD_BIT2STR
> > > > > > > > > > > > diff --git a/lib/ethdev/rte_ethdev.h 
> > > > > > > > > > > > b/lib/ethdev/rte_ethdev.h
> > > > > > > > > > > > index d2b27c351f..a578c9db9d 100644
> > > > > > > > > > > > --- a/lib/ethdev/rte_ethdev.h
> > > > > > > > > > > > +++ b/lib/ethdev/rte_ethdev.h
> > > > > > > > > > > > @@ -1047,6 +1047,7 @@ struct rte_eth_rxconf {
> > > > > > > > > > > >         uint8_t rx_drop_en; /**< Drop packets if no 
> > > > > > > > > > > > descriptors are available. */
> > > > > > > > > > > >         uint8_t rx_deferred_start; /**< Do not start 
> > > > > > > > > > > > queue with rte_eth_dev_start(). */
> > > > > > > > > > > >         uint16_t rx_nseg; /**< Number of descriptions 
> > > > > > > > > > > > in rx_seg array.
> > > > > > > > > > > > */
> > > > > > > > > > > > +       uint32_t shared_group; /**< Shared port group 
> > > > > > > > > > > > index in
> > > > > > > > > > > > + switch domain. */
> > > > > > > > > > > >         /**
> > > > > > > > > > > >          * Per-queue Rx offloads to be set using 
> > > > > > > > > > > > DEV_RX_OFFLOAD_* flags.
> > > > > > > > > > > >          * Only offloads set on rx_queue_offload_capa or
> > > > > > > > > > > > rx_offload_capa @@ -1373,6 +1374,12 @@ struct 
> > > > > > > > > > > > rte_eth_conf {
> > > > > > > > > > > > #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
> > > > > > > > > > > >  #define DEV_RX_OFFLOAD_RSS_HASH                
> > > > > > > > > > > > 0x00080000
> > > > > > > > > > > >  #define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000
> > > > > > > > > > > > +/**
> > > > > > > > > > > > + * Rx queue is shared among ports in same switch 
> > > > > > > > > > > > domain to save
> > > > > > > > > > > > +memory,
> > > > > > > > > > > > + * avoid polling each port. Any port in group can be 
> > > > > > > > > > > > used to receive packets.
> > > > > > > > > > > > + * Real source port number saved in mbuf->port field.
> > > > > > > > > > > > + */
> > > > > > > > > > > > +#define RTE_ETH_RX_OFFLOAD_SHARED_RXQ   0x00200000
> > > > > > > > > > > > 
> > > > > > > > > > > >  #define DEV_RX_OFFLOAD_CHECKSUM 
> > > > > > > > > > > > (DEV_RX_OFFLOAD_IPV4_CKSUM | \
> > > > > > > > > > > >                                  
> > > > > > > > > > > > DEV_RX_OFFLOAD_UDP_CKSUM | \
> > > > > > > > > > > > --
> > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > 
> > > > 
> > 

Reply via email to