On Mon, 2021-10-11 at 14:49 +0300, Andrew Rybchenko wrote:
> Hi Xueming,
> 
> On 9/30/21 5:55 PM, Xueming Li wrote:
> > In the current DPDK framework, all Rx queues are pre-loaded with mbufs
> > for incoming packets. When the number of representors in a switch
> > domain scales out, the memory consumption becomes significant.
> > Furthermore, polling all ports leads to high cache miss rates, high
> > latency and low throughput.
> > 
> > This patch introduces shared Rx queues. A PF and representors with the
> > same configuration in the same switch domain can share an Rx queue set
> > by specifying the shared Rx queue offload flag and a sharing group.
> > 
> > All member ports of a shared Rx queue actually share one Rx queue, and
> > mbufs are pre-loaded into that single queue only, so memory is saved.
> > 
> > Polling any queue that uses the same shared Rx queue receives packets
> > from all member ports. The source port is identified by mbuf->port.
> > 
> > Multiple groups are supported via a group ID. The number of queues per
> > port in a shared group should be identical, and queue indexes are 1:1
> > mapped within a shared group.
> > An example of polling two share groups:
> >   core    group   queue
> >   0       0       0
> >   1       0       1
> >   2       0       2
> >   3       0       3
> >   4       1       0
> >   5       1       1
> >   6       1       2
> >   7       1       3
> > 
> > A shared Rx queue must be polled on a single thread or core. If both
> > PF0 and representor0 have joined the same share group, pf0rxq0 cannot
> > be polled on core1 while rep0rxq0 is polled on core2. Actually, polling
> > one port within a share group is sufficient, since polling any port in
> > the group returns packets for every port in the group.
> 
> I apologize for jumping into the review process this late.

I appreciate the bold suggestion; it's never too late :)

> 
> Frankly speaking I doubt that it is the best design to solve
> the problem. Yes, I confirm that the problem exists, but I
> think there is a better and simpler way to solve it.
> 
> The problem of the suggested solution is that it puts all
> the headache about consistency to application and PMDs
> without any help from ethdev layer to guarantee the
> consistency. As a result, I believe it will be either
> missing/lost consistency checks or huge duplication in
> each PMD which supports the feature. Shared RxQs must be
> equally configured including number of queues, offloads
> (taking device level Rx offloads into account), RSS
> settings etc. So, applications must care about it and
> PMDs (or ethdev layer) must check it.

The name might be confusing; here is my understanding:
1. The NIC shares the buffer-supply HW queue between shared RxQs - for memory.
2. The PMD polls one shared RxQ - for latency and performance.
3. Most per-queue features like offloads and RSS are not impacted, which
is why they are not mentioned. Some offloads might not be supported due
to PMD or HW limitations; checks need to be added in each PMD case by case.
4. Multiple groups are defined for service-level flexibility. For
example, PF and VIP customers' load is distributed via queues and dedicated
cores, while low-priority customers share one core with one shared queue.
Multiple groups enable more combinations.
5. One port could assign queues to different groups for polling
flexibility. For example, with the first 4 queues in group 0 and the next
4 queues in group 1, and each group having other member ports with 4
queues, the port with 8 queues could be polled by 8 cores without the
non-shared RxQ penalty; in other words, each core polls only one shared
RxQ. A configuration sketch follows below.
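
To illustrate point 5, a minimal configuration sketch. It assumes the
rxconf.share_group field and RTE_ETH_RX_OFFLOAD_SHARED_RXQ flag proposed
by this series; both names are assumptions, not an accepted ethdev API.

#include <rte_ethdev.h>
#include <rte_mempool.h>

/* Sketch: split one port's 8 Rx queues across two share groups.
 * rxconf.share_group and RTE_ETH_RX_OFFLOAD_SHARED_RXQ are assumed
 * names from this proposal, not an accepted ethdev API.
 */
static int
setup_shared_rxqs(uint16_t port_id, uint16_t nb_rxd, struct rte_mempool *mp)
{
	struct rte_eth_dev_info info;
	struct rte_eth_rxconf rxconf;
	uint16_t q;
	int ret;

	ret = rte_eth_dev_info_get(port_id, &info);
	if (ret != 0)
		return ret;

	rxconf = info.default_rxconf;
	rxconf.offloads |= RTE_ETH_RX_OFFLOAD_SHARED_RXQ; /* assumed flag */

	for (q = 0; q < 8; q++) {
		/* Queues 0-3 join share group 0, queues 4-7 join group 1. */
		rxconf.share_group = (q < 4) ? 0 : 1; /* assumed field */
		ret = rte_eth_rx_queue_setup(port_id, q, nb_rxd,
					     rte_eth_dev_socket_id(port_id),
					     &rxconf, mp);
		if (ret != 0)
			return ret;
	}
	return 0;
}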

> 
> The advantage of the solution is that any device may
> create group and subsequent devices join. Absence of
> primary device is nice. But do we really need it?
> Will the design work if some representors are configured
> to use shared RxQ, but some do not? Theoretically it
> is possible, but could require extra non-trivial code
> on fast path.

With multiple groups, any device could be hot-unplugged.

Mixed configuration is supported; the only difference is how mbuf->port
is set. Since the group is per queue, mixed configuration is better to
support, and I don't see any difficulty here.

A PMD could choose to support only group 0 with the same settings for
each RxQ; that fits most scenarios. A fast-path sketch follows below.
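
For any shared RxQ, the fast path only needs to trust mbuf->port instead
of the polled port_id. A minimal sketch (app_handle_pkt is a hypothetical
application handler):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Hypothetical application handler for one packet from a source port. */
extern void app_handle_pkt(uint16_t src_port, struct rte_mbuf *m);

/* Poll one queue of a shared RxQ: packets from every member port of the
 * share group arrive here, tagged with their source in mbuf->port.
 */
static void
poll_shared_rxq(uint16_t any_member_port, uint16_t queue_id)
{
	struct rte_mbuf *pkts[BURST_SIZE];
	uint16_t nb, i;

	nb = rte_eth_rx_burst(any_member_port, queue_id, pkts, BURST_SIZE);
	for (i = 0; i < nb; i++) {
		/* Do not assume pkts[i] came from any_member_port. */
		app_handle_pkt(pkts[i]->port, pkts[i]);
	}
}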

> 
> Also looking at the first two patch I don't understand
> how application will find out which devices may share
> RxQs. E.g. if we have two different NICs which support
> sharing, we can try to setup only one group 0, but
> finally will have two devices (not one) which must be
> polled.
> 
> 1. We need extra flag in dev_info->dev_capa
>    RTE_ETH_DEV_CAPA_RX_SHARE to advertise that
>    the device supports Rx sharing.

dev_info->rx_queue_offload_capa could be used here, no?
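
For example, something like the check below, assuming the per-queue
shared RxQ offload bit from this series (the flag name is an assumption,
not an existing ethdev definition):

#include <rte_ethdev.h>

/* Return 1 if the port advertises the (assumed) shared RxQ per-queue
 * offload, 0 if not, negative errno on failure.
 */
static int
port_supports_shared_rxq(uint16_t port_id)
{
	struct rte_eth_dev_info info;
	int ret;

	ret = rte_eth_dev_info_get(port_id, &info);
	if (ret != 0)
		return ret;
	return (info.rx_queue_offload_capa &
		RTE_ETH_RX_OFFLOAD_SHARED_RXQ) != 0; /* assumed flag */
}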

> 
> 2. I think we need "rx_domain" in device info
>    (which should be treated in boundaries of the
>    switch_domain) if and only if
>    RTE_ETH_DEV_CAPA_RX_SHARE is advertised.
>    Otherwise rx_domain value does not make sense.

I see, this will give flexibility for different HW; will add it.
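
With (1) and (2), the application could group ports roughly as in the
sketch below. RTE_ETH_DEV_CAPA_RX_SHARE is the capability you propose
above and switch_info.rx_domain is the field to be added, so both are
assumptions here:

#include <stdbool.h>
#include <rte_ethdev.h>

/* Sketch: may two ports share Rx queues? Assumes the proposed
 * RTE_ETH_DEV_CAPA_RX_SHARE capability and switch_info.rx_domain,
 * which is only meaningful within one switch domain.
 */
static bool
ports_may_share_rxq(uint16_t port_a, uint16_t port_b)
{
	struct rte_eth_dev_info a, b;

	if (rte_eth_dev_info_get(port_a, &a) != 0 ||
	    rte_eth_dev_info_get(port_b, &b) != 0)
		return false;
	if (!(a.dev_capa & RTE_ETH_DEV_CAPA_RX_SHARE) ||
	    !(b.dev_capa & RTE_ETH_DEV_CAPA_RX_SHARE))
		return false;
	return a.switch_info.domain_id == b.switch_info.domain_id &&
	       a.switch_info.rx_domain == b.switch_info.rx_domain;
}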

> 
> (1) and (2) will allow application to find out which
> devices can share Rx.
> 
> 3. Primary device (representors backing device) should
>    advertise shared RxQ offload. Enabling of the offload
>    tells the device to provide packets to all device in
>    the Rx domain with mbuf->port filled in appropriately.
>    Also it allows app to identify primary device in the
>    Rx domain. When application enables the offload, it
>    must ensure that it does not treat used port_id as an
>    input port_id, but always check mbuf->port for each
>    packet.
> 
> 4. A new Rx mode should be introduced for secondary
>    devices. It should not allow to configure RSS, specify
>    any Rx offloads etc. ethdev must ensure it.
>    It is an open question right now if it should require
>    to provide primary port_id. In theory representors
>    have it. However, maybe it is nice for consistency
>    to ensure that application knows that it does.
>    If shared Rx mode is specified for device, application
>    does not need to setup RxQs and attempts to do it
>    should be discarded in ethdev.
>    For consistency it is better to ensure that number of
>    queues match.

RSS and Rx offloads should be supported individually per queue; the PMD
needs to check and reject whatever it cannot support. A sketch of such a
check follows below.
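
As a sketch of the kind of per-PMD consistency check I mean (all
structure and field names here are illustrative, not from the patch):

#include <stdint.h>
#include <errno.h>

/* Illustrative PMD-internal record of the settings already used by a
 * share group.
 */
struct pmd_share_group_cfg {
	uint64_t offloads;
	uint16_t nb_desc;
};

/* A queue joining a share group must match the settings of the queues
 * already in it; the first queue in the group defines them.
 */
static int
pmd_check_shared_rxq(const struct pmd_share_group_cfg *grp,
		     uint64_t offloads, uint16_t nb_desc)
{
	if (grp == NULL)
		return 0;
	if (grp->offloads != offloads || grp->nb_desc != nb_desc)
		return -EINVAL;
	return 0;
}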

>    It is an interesting question what should happen if
>    primary device is reconfigured and shared Rx is
>    disabled on reconfiguration.

I feel it is better to have no primary port/queue assumption in the
configuration: all members are treated equally and each queue can join
or quit a share group. That is important for supporting multiple groups.

> 
> 5. If so, in theory implementation of the Rx burst
>    in the secondary could simply call Rx burst on
>    primary device.
> 
> Andrew.
