On 9/30/21 5:55 PM, Xueming Li wrote:
> In current DPDK framework, each RX queue is pre-loaded with mbufs for

RX -> Rx

> incoming packets. When number of representors scale out in a switch
> domain, the memory consumption became significant. Most important,
> polling all ports leads to high cache miss, high latency and low
> throughput.

It should be highlighted that it is a problem of some PMDs. Not all.

>
> This patch introduces shared RX queue. Ports with same configuration in

"This patch introduces" -> "Introduce"
RX -> Rx

> a switch domain could share RX queue set by specifying sharing group.

RX -> Rx

> Polling any queue using same shared RX queue receives packets from all

RX -> Rx

> member ports. Source port is identified by mbuf->port.
>
> Port queue number in a shared group should be identical. Queue index is
> 1:1 mapped in shared group.
>
> Share RX queue must be polled on single thread or core.

RX -> Rx

>
> Multiple groups is supported by group ID.

is -> are

>
> Signed-off-by: Xueming Li <xuemi...@nvidia.com>
> Cc: Jerin Jacob <jerinjac...@gmail.com>

The patch should update release notes.

> ---
> Rx queue object could be used as shared Rx queue object, it's important
> to clear all queue control callback api that using queue object:
>   https://mails.dpdk.org/archives/dev/2021-July/215574.html
> ---
>  doc/guides/nics/features.rst                    | 11 +++++++++++
>  doc/guides/nics/features/default.ini            |  1 +
>  doc/guides/prog_guide/switch_representation.rst | 10 ++++++++++
>  lib/ethdev/rte_ethdev.c                         |  1 +
>  lib/ethdev/rte_ethdev.h                         |  7 +++++++
>  5 files changed, 30 insertions(+)
>
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index 4fce8cd1c97..69bc1d5719c 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -626,6 +626,17 @@ Supports inner packet L4 checksum.
>  ``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_OUTER_UDP_CKSUM``.
>
>
> +.. _nic_features_shared_rx_queue:
> +
> +Shared Rx queue
> +---------------
> +
> +Supports shared Rx queue for ports in same switch domain.
> +
> +* **[uses] rte_eth_rxconf,rte_eth_rxmode**: ``offloads:RTE_ETH_RX_OFFLOAD_SHARED_RXQ``.
> +* **[provides] mbuf**: ``mbuf.port``.
> +
> +
>  .. _nic_features_packet_type_parsing:
>
>  Packet type parsing
> diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
> index 754184ddd4d..ebeb4c18512 100644
> --- a/doc/guides/nics/features/default.ini
> +++ b/doc/guides/nics/features/default.ini
> @@ -19,6 +19,7 @@ Free Tx mbuf on demand =
>  Queue start/stop =
>  Runtime Rx queue setup =
>  Runtime Tx queue setup =
> +Shared Rx queue =
>  Burst mode info =
>  Power mgmt address monitor =
>  MTU update =
> diff --git a/doc/guides/prog_guide/switch_representation.rst b/doc/guides/prog_guide/switch_representation.rst
> index ff6aa91c806..bc7ce65fa3d 100644
> --- a/doc/guides/prog_guide/switch_representation.rst
> +++ b/doc/guides/prog_guide/switch_representation.rst
> @@ -123,6 +123,16 @@ thought as a software "patch panel" front-end for applications.
>  .. [1] `Ethernet switch device driver model (switchdev)
>  <https://www.kernel.org/doc/Documentation/networking/switchdev.txt>`_
>
> +- Memory usage of representors is huge when number of representor grows,
> +  because PMD always allocate mbuf for each descriptor of Rx queue.

It is a problem of some PMDs only. So, it must be rewritten to highlight it.

> +  Polling the large number of ports brings more CPU load, cache miss and
> +  latency. Shared Rx queue can be used to share Rx queue between PF and
> +  representors in same switch. ``RTE_ETH_RX_OFFLOAD_SHARED_RXQ`` is
> +  present in Rx offloading capability of device info. Setting the
> +  offloading flag in device Rx mode or Rx queue configuration to enable
> +  shared Rx queue. Polling any member port of the shared Rx queue can return
> +  packets of all ports in the group, port ID is saved in ``mbuf.port``.
> +
>  Basic SR-IOV
>  ------------
>
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 61aa49efec6..73270c10492 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -127,6 +127,7 @@ static const struct {
>  	RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
>  	RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
>  	RTE_ETH_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT),
> +	RTE_ETH_RX_OFFLOAD_BIT2STR(SHARED_RXQ),
>  };
>
>  #undef RTE_RX_OFFLOAD_BIT2STR
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index afdc53b674c..d7ac625ee74 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1077,6 +1077,7 @@ struct rte_eth_rxconf {
>  	uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
>  	uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
>  	uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. */
> +	uint32_t shared_group; /**< Shared port group index in switch domain. */
>  	/**
>  	 * Per-queue Rx offloads to be set using DEV_RX_OFFLOAD_* flags.
>  	 * Only offloads set on rx_queue_offload_capa or rx_offload_capa
> @@ -1403,6 +1404,12 @@ struct rte_eth_conf {
>  #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM 0x00040000
>  #define DEV_RX_OFFLOAD_RSS_HASH 0x00080000
>  #define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000
> +/**
> + * Rx queue is shared among ports in same switch domain to save memory,
> + * avoid polling each port. Any port in the group can be used to receive
> + * packets. Real source port number saved in mbuf->port field.
> + */
> +#define RTE_ETH_RX_OFFLOAD_SHARED_RXQ 0x00200000
>
>  #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
>  				 DEV_RX_OFFLOAD_UDP_CKSUM | \
>

IMHO it should be squashed with the second patch to make it easier to
review. Otherwise it is hard to understand what shared_group and the
offload are for, since both are dead code in this patch.
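
For anyone reading this patch in isolation, here is a minimal sketch of the application-side flow as the quoted documentation seems to describe it: check the capability bit, set the offload flag and the group in the Rx queue configuration, then attribute received packets by `mbuf->port`. This is not DPDK code. `mock_mbuf`, `mock_rxconf`, `mock_request_shared_rxq`, `mock_demux_burst` and `MOCK_MAX_PORTS` are hypothetical stand-ins so the fragment builds without DPDK headers; only the flag value and the `shared_group` field name are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag value copied from the patch under review. */
#define RTE_ETH_RX_OFFLOAD_SHARED_RXQ 0x00200000

/* Hypothetical port limit, chosen only for this sketch. */
#define MOCK_MAX_PORTS 8

/* Hypothetical stand-in for struct rte_mbuf; only the port field matters. */
struct mock_mbuf {
	uint16_t port; /* source port, playing the role of mbuf->port */
};

/* Hypothetical stand-in for the relevant part of struct rte_eth_rxconf. */
struct mock_rxconf {
	uint32_t shared_group; /* field name as added by the patch */
	uint64_t offloads;
};

/*
 * Request a shared Rx queue only if the device advertises the capability.
 * Returns 1 when sharing was requested, 0 when the capability is missing.
 */
static int
mock_request_shared_rxq(uint64_t rx_offload_capa, uint32_t group,
			struct mock_rxconf *conf)
{
	assert(conf != NULL);
	if ((rx_offload_capa & RTE_ETH_RX_OFFLOAD_SHARED_RXQ) == 0)
		return 0;
	conf->offloads |= RTE_ETH_RX_OFFLOAD_SHARED_RXQ;
	conf->shared_group = group;
	return 1;
}

/*
 * Count packets per source port in a burst polled from any member port.
 * Packets with a port outside the sketch's limit are ignored.
 */
static void
mock_demux_burst(const struct mock_mbuf *pkts, size_t n,
		 unsigned int counts[MOCK_MAX_PORTS])
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (pkts[i].port < MOCK_MAX_PORTS)
			counts[pkts[i].port]++;
	}
}
```

With the real API the capability check would presumably be made against `rx_offload_capa` in the device info and the settings written to `struct rte_eth_rxconf`, but the PMD side of that is exactly what is not visible in this patch.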