RE: [RFC PATCH v1 0/4] Direct re-arming of buffers on receive side

Morten Brørup Fri, 28 Jan 2022 03:29:20 -0800

> From: Morten Brørup
> Sent: Thursday, 27 January 2022 18.14
> 
> > From: Honnappa Nagarahalli [mailto:[email protected]]
> > Sent: Thursday, 27 January 2022 05.07
> >
> > Thanks Morten, appreciate your comments. Few responses inline.
> >
> > > -----Original Message-----
> > > From: Morten Brørup <[email protected]>
> > > Sent: Sunday, December 26, 2021 4:25 AM
> > >
> > > > From: Feifei Wang [mailto:[email protected]]
> > > > Sent: Friday, 24 December 2021 17.46
> > > >
> > <snip>
> >
> > > >
> > > > > However, this solution poses several constraints:
> > > > >
> > > > > 1) The receive queue needs to know which transmit queue it should
> > > > > take the buffers from. The application logic decides which transmit
> > > > > port to use to send out the packets. In many use cases the NIC might
> > > > > have a single port ([1], [2], [3]), in which case a given transmit
> > > > > queue is always mapped to a single receive queue (1:1 Rx queue: Tx
> > > > > queue). This is easy to configure.
> > > > >
> > > > > If the NIC has 2 ports (there are several references), then we will
> > > > > have 1:2 (RX queue: TX queue) mapping which is still easy to
> > > > > configure. However, if this is generalized to 'N' ports, the
> > > > > configuration can be long. Moreover, the PMD would have to scan a
> > > > > list of transmit queues to pull the buffers from.
> > >
> > > I disagree with the description of this constraint.
> > >
> > > > As I understand it, it doesn't matter how many ports or queues are
> > > > in a NIC or system.
> > > >
> > > > The constraint is more narrow:
> > > >
> > > > This patch requires that all packets ingressing on some port/queue
> > > > must egress on the specific port/queue that it has been configured
> > > > to re-arm its buffers from. I.e. an application cannot route packets
> > > > between multiple ports with this patch.
> > > Agree, this patch as is has this constraint. It is not a constraint
> > > that would apply for NICs with a single port. The above text is
> > > describing some of the issues associated with generalizing the
> > > solution for N number of ports. If N is small, the configuration is
> > > small and scanning should not be bad.


But I think N is the number of queues, not the number of ports.
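
To make the scanning cost concrete, here is a minimal sketch of what the PMD
would have to do per re-arm when one RX queue is mapped to N TX queues. The
struct and function names are hypothetical and do not come from the patch; the
only point is that the loop grows with the number of mapped queues.

```c
#include <stddef.h>

/* Hypothetical per-TX-queue state (not a DPDK type): only the number of
 * completed, reusable buffers matters for this sketch. */
struct txq_state {
    size_t free_bufs;
};

/* Scan the TX queues mapped to one RX queue and return the index of the
 * first one holding at least `need` reusable buffers, or -1 if none does.
 * The cost is linear in n_mapped, i.e. in N. */
static int pick_tx_queue(const struct txq_state *mapped, size_t n_mapped,
                         size_t need)
{
    for (size_t i = 0; i < n_mapped; i++) {
        if (mapped[i].free_bufs >= need)
            return (int)i;
    }
    return -1; /* caller has to get its buffers elsewhere */
}
```

With a 1:1 mapping the loop runs once; with many queues it is paid on every
re-arm.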

> >
> 
> Perhaps we can live with the 1:1 limitation, if that is the primary use
> case.

Or some similar limitation for NICs with dual ports for redundancy.

> 
> Alternatively, the feature could fall back to using the mempool if
> unable to get/put buffers directly from/to a participating NIC. In this
> case, I envision a library serving as a shim layer between the NICs and
> the mempool. In other words: Take a step back from the implementation,
> and discuss the high level requirements and architecture of the
> proposed feature.

Please ignore my comment above. I had missed the fact that the direct re-arm 
feature only works inside a single NIC, and not across multiple NICs. And it is 
not going to work across multiple NICs, unless they are exactly the same type, 
because their internal descriptor structures may differ. Also, taking a deeper 
look at the i40e part of the patch, I notice that it already falls back to 
using the mempool.
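
For readers following along, the fallback behaviour can be sketched roughly as
below. This is not the i40e code from the patch: the types are simplified
stand-ins (a plain LIFO instead of a TX ring or a mempool), and all names are
my own assumptions.

```c
#include <stddef.h>

#define STACK_SIZE 64

/* Hypothetical stand-in used twice: once for buffers already completed by
 * the mapped TX queue, once for the mempool. Not a DPDK type. */
struct buf_stack {
    void *bufs[STACK_SIZE];
    size_t count;
};

/* Pop up to n buffers from s into out; returns how many were popped. */
static size_t stack_pop(struct buf_stack *s, void **out, size_t n)
{
    size_t i = 0;

    while (i < n && s->count > 0)
        out[i++] = s->bufs[--s->count];
    return i;
}

/* Re-arm n RX slots: take buffers directly from the TX side first, then
 * fall back to the pool for whatever is still missing. Returns the number
 * of slots filled, which is less than n only if both sources run dry. */
static size_t rx_rearm(struct buf_stack *tx_done, struct buf_stack *pool,
                       void **rx_slots, size_t n)
{
    size_t got = stack_pop(tx_done, rx_slots, n);

    if (got < n)
        got += stack_pop(pool, rx_slots + got, n - got);
    return got;
}
```

The direct path is only an optimization here; correctness never depends on the
TX side having buffers available.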

> 
> > >
> > > >
> >
> > <snip>
> >
> > > >
> > >
> > > You are missing the fourth constraint:
> > >
> > > > 4) The application must transmit all received packets immediately,
> > > > i.e. QoS queueing and similar is prohibited.
> > > I do not understand this, can you please elaborate? Even if there is
> > > QoS queuing, there would be a steady stream of packets being
> > > transmitted. These transmitted packets will fill the buffers on the
> > > RX side.
> 
> E.g. an appliance may receive packets on a 10 Gbps backbone port, and
> queue some of the packets up for a customer with a 20 Mbit/s
> subscription. When there is a large burst of packets towards that
> subscriber, they will queue up in the QoS queue dedicated to that
> subscriber. During that traffic burst, there is much more RX than TX.
> And after the traffic burst, there will be more TX than RX.
> 
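To put rough numbers on the burst case, here is a small back-of-the-envelope
helper. The burst rate, burst duration and packet size in the usage note are
purely illustrative assumptions of mine; only the 20 Mbit/s subscription rate
comes from the example above.

```c
#include <stdint.h>

/* Number of packets that pile up in a QoS queue during a burst, i.e. the
 * buffers that RX consumed but TX has not yet released.
 * All parameters are hypothetical inputs for illustration. */
static uint64_t qos_backlog_pkts(uint64_t rx_bps, uint64_t tx_bps,
                                 uint64_t burst_us, uint64_t pkt_bytes)
{
    if (rx_bps <= tx_bps || pkt_bytes == 0)
        return 0; /* TX keeps up, nothing accumulates */

    /* Excess bits received during the burst, then converted to packets. */
    uint64_t excess_bits = (rx_bps - tx_bps) * burst_us / 1000000u;

    return excess_bits / (pkt_bytes * 8u);
}
```

For example, qos_backlog_pkts(1000000000, 20000000, 10000, 1500), i.e. a
10 ms burst at 1 Gbit/s of 1500 byte packets towards the 20 Mbit/s subscriber,
gives 816 packets held in the queue. During that time TX frees almost no
buffers, so RX cannot be re-armed from TX alone.
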
> >
> > >
> > <snip>
> >
> > > >
> > >
> > > > The patch provides a significant performance improvement, but I am
> > > > wondering if any real world applications exist that would use this.
> > > > Only a "router on a stick" (i.e. a single-port router) comes to my
> > > > mind, and that is probably sufficient to call it useful in the real
> > > > world. Do you have any other examples to support the usefulness of
> > > > this patch?
> > > SmartNIC is a clear and dominant use case, typically they have a
> > > single port for data plane traffic (dual ports are mostly for
> > > redundancy). This patch avoids a good amount of store operations. The
> > > smaller CPUs found in SmartNICs have smaller store buffers which can
> > > become bottlenecks. Avoiding the lcore cache saves valuable HW cache
> > > space.
> 
> OK. This is an important use case!

Some NICs have many queues, so the number of RX/TX queue mappings is big. 
Aren't SmartNICs going to use many RX/TX queues?

> 
> >
> > >
> > > > Anyway, the patch doesn't do any harm if unused, and the only
> > > > performance cost is the "if (rxq->direct_rxrearm_enable)" branch in
> > > > the Ethdev driver. So I don't oppose it.

If a PMD maintainer agrees to maintain such a feature, I don't oppose it either.

The PMDs are full of cruft already, so why bother complaining about more if
the performance impact is negligible? :-)
