On Fri, 2021-10-15 at 12:28 +0300, Andrew Rybchenko wrote:
> On 10/12/21 5:39 PM, Xueming Li wrote:
> > In current DPDK framework, each Rx queue is pre-loaded with mbufs to
> > save incoming packets. For some PMDs, when number of representors scale
> > out in a switch domain, the memory consumption became significant.
> > Polling all ports also leads to high cache miss, high latency and low
> > throughput.
> >
> > This patch introduce shared Rx queue. Ports in same Rx domain and
> > switch domain could share Rx queue set by specifying non-zero sharing
> > group in Rx queue configuration.
> >
> > No special API is defined to receive packets from shared Rx queue.
> > Polling any member port of a shared Rx queue receives packets of that
> > queue for all member ports, source port is identified by mbuf->port.
> >
> > Shared Rx queue must be polled in same thread or core, polling a queue
> > ID of any member port is essentially same.
> >
> > Multiple share groups are supported by non-zero share group ID. Device
>
> "by non-zero share group ID" is not required. Since it must be
> always non-zero to enable sharing.
>
> > should support mixed configuration by allowing multiple share
> > groups and non-shared Rx queue.
> >
> > Even Rx queue shared, queue configuration like offloads and RSS should
> > not be impacted.
>
> I don't understand above sentence.
> Even when Rx queues are shared, queue configuration like
> offloads and RSS may differ. If a PMD has some limitation,
> it should care about consistency itself. These limitations
> should be documented in the PMD documentation.
>
OK, I'll remove this line.

> >
> > Example grouping and polling model to reflect service priority:
> > Group1, 2 shared Rx queues per port: PF, rep0, rep1
> > Group2, 1 shared Rx queue per port: rep2, rep3, ... rep127
> > Core0: poll PF queue0
> > Core1: poll PF queue1
> > Core2: poll rep2 queue0
>
> Can I have:
> PF RxQ#0, RxQ#1
> Rep0 RxQ#0 shared with PF RxQ#0
> Rep1 RxQ#0 shared with PF RxQ#1
>
> I guess no, since it looks like RxQ ID must be equal.
> Or am I missing something? Otherwise grouping rules
> are not obvious to me. May be we need dedicated
> shared_qid in boundaries of the share_group?

Yes, RxQ ID must be equal. The following configuration should work:
    Rep1 RxQ#1 shared with PF RxQ#1
Equal mapping works by default, so no new field needs to be set.
I'll add some description to emphasize this. What do you think?

> >
> > PMD driver advertise shared Rx queue capability via
> > RTE_ETH_DEV_CAPA_RXQ_SHARE.
> >
> > PMD driver is responsible for shared Rx queue consistency checks to
> > avoid member port's configuration contradict to each other.
> >
> > Signed-off-by: Xueming Li <xuemi...@nvidia.com>
> > ---
> >  doc/guides/nics/features.rst | 13 ++++++++++++
> >  doc/guides/nics/features/default.ini | 1 +
> >  .../prog_guide/switch_representation.rst | 10 +++++++++
> >  doc/guides/rel_notes/release_21_11.rst | 5 +++++
> >  lib/ethdev/rte_ethdev.c | 9 ++++++++
> >  lib/ethdev/rte_ethdev.h | 21 +++++++++++++++++++
> >  6 files changed, 59 insertions(+)
> >
> > diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> > index e346018e4b8..b64433b8ea5 100644
> > --- a/doc/guides/nics/features.rst
> > +++ b/doc/guides/nics/features.rst
> > @@ -615,6 +615,19 @@ Supports inner packet L4 checksum.
> >    ``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_OUTER_UDP_CKSUM``.
> >
> >
> > +.. _nic_features_shared_rx_queue:
> > +
> > +Shared Rx queue
> > +---------------
> > +
> > +Supports shared Rx queue for ports in same Rx domain of a switch domain.
> > +
> > +* **[uses] rte_eth_dev_info**: ``dev_capa:RTE_ETH_DEV_CAPA_RXQ_SHARE``.
> > +* **[uses] rte_eth_dev_info,rte_eth_switch_info**: ``rx_domain``,
> >   ``domain_id``.
> > +* **[uses] rte_eth_rxconf**: ``share_group``.
> > +* **[provides] mbuf**: ``mbuf.port``.
> > +
> > +
> >  .. _nic_features_packet_type_parsing:
> >
> >  Packet type parsing
> >
> > diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
> > index d473b94091a..93f5d1b46f4 100644
> > --- a/doc/guides/nics/features/default.ini
> > +++ b/doc/guides/nics/features/default.ini
> > @@ -19,6 +19,7 @@ Free Tx mbuf on demand =
> >  Queue start/stop =
> >  Runtime Rx queue setup =
> >  Runtime Tx queue setup =
> > +Shared Rx queue =
> >  Burst mode info =
> >  Power mgmt address monitor =
> >  MTU update =
> >
> > diff --git a/doc/guides/prog_guide/switch_representation.rst b/doc/guides/prog_guide/switch_representation.rst
> > index ff6aa91c806..de41db8385d 100644
> > --- a/doc/guides/prog_guide/switch_representation.rst
> > +++ b/doc/guides/prog_guide/switch_representation.rst
> > @@ -123,6 +123,16 @@ thought as a software "patch panel" front-end for applications.
> >  .. [1] `Ethernet switch device driver model (switchdev)
> >
> >     <https://www.kernel.org/doc/Documentation/networking/switchdev.txt>`_
> >
> > +- For some PMDs, memory usage of representors is huge when number of
> > +  representor grows, mbufs are allocated for each descriptor of Rx queue.
> > +  Polling large number of ports brings more CPU load, cache miss and
> > +  latency. Shared Rx queue can be used to share Rx queue between PF and
> > +  representors among same Rx domain. ``RTE_ETH_DEV_CAPA_RXQ_SHARE`` is
> > +  present in device capability of device info. Setting non-zero share group
> > +  in Rx queue configuration to enable share. Polling any member port can
> > +  receive packets of all member ports in the group, port ID is saved in
> > +  ``mbuf.port``.
> > +
> >  Basic SR-IOV
> >  ------------
> >
> > diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
> > index 5036641842c..d72fc97f4fb 100644
> > --- a/doc/guides/rel_notes/release_21_11.rst
> > +++ b/doc/guides/rel_notes/release_21_11.rst
> > @@ -141,6 +141,11 @@ New Features
> >    * Added tests to validate packets hard expiry.
> >    * Added tests to verify tunnel header verification in IPsec inbound.
> >
> > +* **Added ethdev shared Rx queue support.**
> > +
> > +  * Added new device capability flag and rx domain field to switch info.
> > +  * Added share group to Rx queue configuration.
> > +  * Added testpmd support and dedicate forwarding engine.
>
> Please, add one more empty line since it must be two
> before the next section. Also it should be put after
> the last ethdev item above since list of features has
> defined order.
>
> >
> >  Removed Items
> >  -------------
> >
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> > index 028907bc4b9..9b1b66370a7 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -2159,6 +2159,15 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> >  		return -EINVAL;
> >  	}
> >
> > +	if (local_conf.share_group > 0 &&
> > +	    (dev_info.dev_capa & RTE_ETH_DEV_CAPA_RXQ_SHARE) == 0) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			"Ethdev port_id=%d rx_queue_id=%d, enabled share_group=%u while device doesn't support Rx queue share in %s()\n",
> > +			port_id, rx_queue_id, local_conf.share_group,
> > +			__func__);
>
> I'd remove function name logging here. Log is unique enough.
>
> > +		return -EINVAL;
> > +	}
> > +
> >  	/*
> >  	 * If LRO is enabled, check that the maximum aggregated packet
> >  	 * size is supported by the configured device.
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index 6d80514ba7a..041da6ee52f 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1044,6 +1044,13 @@ struct rte_eth_rxconf {
> >  	uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
> >  	uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
> >  	uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. */
> > +	/**
> > +	 * Share group index in Rx domain and switch domain.
> > +	 * Non-zero value to enable Rx queue share, zero value disable share.
> > +	 * PMD driver is responsible for Rx queue consistency checks to avoid
> > +	 * member port's configuration contradict to each other.
> > +	 */
> > +	uint32_t share_group;
>
> I think that we don't need 32-bit for shared groups.
> 16-bits sounds more than enough.
>
> >  	/**
> >  	 * Per-queue Rx offloads to be set using DEV_RX_OFFLOAD_* flags.
> >  	 * Only offloads set on rx_queue_offload_capa or rx_offload_capa
> > @@ -1445,6 +1452,14 @@ struct rte_eth_conf {
> >  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
> >  /** Device supports Tx queue setup after device started. */
> >  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> > +/**
> > + * Device supports shared Rx queue among ports within Rx domain and
> > + * switch domain. Mbufs are consumed by shared Rx queue instead of
> > + * every port. Multiple groups is supported by share_group of Rx
> > + * queue configuration. Polling any port in the group receive packets
> > + * of all member ports, source port identified by mbuf->port field.
> > + */
> > +#define RTE_ETH_DEV_CAPA_RXQ_SHARE 0x00000004
>
> Let's use RTE_BIT64(2)
>
> I think above two should be fixed in a separate
> cleanup patch.

Not only the above two; there are more Rx/Tx offload bits to clean up.
Let's do it later.
>
> >  /**@}*/
> >
> >  /*
> > @@ -1488,6 +1503,12 @@ struct rte_eth_switch_info {
> >  	 * but each driver should explicitly define the mapping of switch
> >  	 * port identifier to that physical interconnect/switch
> >  	 */
> > +	uint16_t rx_domain;
> > +	/**<
> > +	 * Shared Rx queue sub-domain boundary. Only ports in same Rx domain
> > +	 * and switch domain can share Rx queue. Valid only if device advertised
> > +	 * RTE_ETH_DEV_CAPA_RXQ_SHARE capability.
> > +	 */
>
> Please, put the documentation before the documented
> field.
>
> [snip]