Hi

From: Eric Kinzie
> On Wed Aug 22 11:42:37 +0000 2018, Matan Azrad wrote:
> > Hi Luca
> >
> > From: Luca Boccassi
> > > On Wed, 2018-08-22 at 07:09 +0000, Matan Azrad wrote:
> > > > Hi Chas
> > > >
> > > > From: Chas Williams
> > > > > On Tue, Aug 21, 2018 at 11:43 AM Matan Azrad
> > > > > <matan@mellanox.com> wrote:
> > > > > Hi Chas
> > > > >
> > > > > From: Chas Williams
> > > > > > On Tue, Aug 21, 2018 at 6:56 AM Matan Azrad
> > > > > > <matan@mellanox.com> wrote:
> > > > > > Hi
> > > > > >
> > > > > > From: Chas Williams
> > > > > > > This will need to be implemented for some of the other RX
> > > > > > > burst methods at some point for other modes to see this
> > > > > > > performance improvement (with the exception of
> > > > > > > active-backup).
> > > > > >
> > > > > > Yes, I think it should be done at least for
> > > > > > bond_ethdev_rx_burst_8023ad_fast_queue (should be easy) for
> > > > > > now.
> > > > > >
> > > > > > There is some duplicated code between the various RX paths.
> > > > > > I would like to eliminate that as much as possible, so I was
> > > > > > going to give that some thought first.
> > > > >
> > > > > There is no reason to leave this function as is while its twin
> > > > > is changed.
> > > > >
> > > > > Unfortunately, this is all the patch I have at this time.
> > > > >
> > > > > > > On Thu, Aug 16, 2018 at 9:32 AM Luca Boccassi
> > > > > > > <bluca@debian.org> wrote:
> > > > > > > > During bond 802.3ad receive, a burst of packets is
> > > > > > > > fetched from each slave into a local array and appended
> > > > > > > > to a per-slave ring buffer. Packets are taken from the
> > > > > > > > head of the ring buffer and returned to the caller. The
> > > > > > > > number of mbufs provided to each slave is sufficient to
> > > > > > > > meet the requirements of the ixgbe vector receive.
> > > > > >
> > > > > > Luca,
> > > > > >
> > > > > > Can you explain these requirements of ixgbe?
> > > > > >
> > > > > > The ixgbe (and some other Intel PMDs) have vectorized RX
> > > > > > routines that are more efficient (if not faster) because
> > > > > > they take advantage of some advanced CPU instructions. I
> > > > > > think you need to be receiving at least 32 packets or more.
> > > > >
> > > > > So why do it in bonding, which is a generic driver for all the
> > > > > vendors' PMDs? If it is better for ixgbe and other Intel NICs,
> > > > > you can force those PMDs to always receive 32 packets and to
> > > > > manage a ring by themselves.
> > > > >
> > > > > The drawback of the ring is some additional latency on the
> > > > > receive path.
> > > > > In testing, the additional latency hasn't been an issue for
> > > > > bonding.
> > > >
> > > > When bonding processes packets more slowly, it may become a
> > > > bottleneck for packet processing in some applications.
> > > >
> > > > > The bonding PMD has a fair bit of overhead associated with the
> > > > > RX and TX path calculations. Most applications can just
> > > > > arrange to call the RX path with a sufficiently large receive.
> > > > > Bonding can't do this.
> > > >
> > > > I wasn't talking about the application, I was talking about the
> > > > slave PMDs. A slave PMD can manage a ring by itself if it helps
> > > > its own performance. Bonding should not be oriented to specific
> > > > PMDs.
> > >
> > > The issue though is that the performance problem is not with the
> > > individual PMDs - it's with bonding. There were no reports
> > > regarding the individual PMDs.
> > > This comes from reports from customers from real world production
> > > deployments - the issue of bonding being too slow was raised
> > > multiple times.
> > > This patch addresses those issues, again in production
> > > deployments, where it's been used for years, to users' and
> > > customers' satisfaction.
> >
> > From Chas I understood that using a burst of 32 helps the
> > performance of some slave PMDs, which makes sense.
> > I can't understand how the extra copy phases improve the
> > performance of bonding itself:
> >
> > You added 2 extra copy phases in the bonding RX function, so each
> > packet now goes through 3 phases:
> > 1. Get packets from the slave to a local array.
> > 2. Copy packet pointers from the local array to the ring array.
> > 3. Copy packet pointers from the ring array to the application
> >    array.
> >
> > Each packet arriving to the application must pass through the above
> > 3 phases (in a specific call or in previous calls).
> >
> > Without this patch we have only:
> > Get packets from the slave to the application array.
> >
> > Can you explain how the extra copies improve the bonding
> > performance?
> >
> > It looks like it improves the slave PMDs, and because of that the
> > bonding PMD performance becomes better.
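To make sure we are talking about the same flow, here is a rough sketch
of the two receive paths as I read them, written against the public
DPDK API. All the names (rx_direct, rx_via_ring, pkt_ring, LOCAL_BURST)
are invented for the example; this is not the actual patch code, and a
real implementation must also check the ring's free space before
enqueueing so mbufs are not leaked.

/* Illustration only -- not the actual patch code. */
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define LOCAL_BURST 32

/* Path without the patch: one copy, slave -> application array. */
static uint16_t
rx_direct(uint16_t slave_port, uint16_t slave_queue,
          struct rte_mbuf **app_pkts, uint16_t nb_pkts)
{
        return rte_eth_rx_burst(slave_port, slave_queue,
                                app_pkts, nb_pkts);
}

/* Path with the patch: slave -> local array -> per-slave ring ->
 * application array.  The slave is always asked for a full burst of
 * LOCAL_BURST mbufs, whatever the caller requested.
 */
static uint16_t
rx_via_ring(uint16_t slave_port, uint16_t slave_queue,
            struct rte_ring *pkt_ring,
            struct rte_mbuf **app_pkts, uint16_t nb_pkts)
{
        struct rte_mbuf *local[LOCAL_BURST];
        uint16_t nb_rx;

        /* Phase 1: fetch from the slave into a local array. */
        nb_rx = rte_eth_rx_burst(slave_port, slave_queue,
                                 local, LOCAL_BURST);

        /* Phase 2: copy the mbuf pointers into the per-slave ring.
         * (A real implementation checks rte_ring_free_count() first
         * so mbufs are not leaked when the ring is full.)
         */
        if (nb_rx > 0)
                rte_ring_enqueue_burst(pkt_ring, (void **)local,
                                       nb_rx, NULL);

        /* Phase 3: copy pointers from the head of the ring to the
         * caller's array.
         */
        return rte_ring_dequeue_burst(pkt_ring, (void **)app_pkts,
                                      nb_pkts, NULL);
}

The only difference between the two is where the mbuf pointers are
staged before they reach the application array - that staging is the
extra cost I am asking about.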
> I'm not sure that adding more buffer management to the vector PMDs
> will improve the drivers' performance; it's just that calling the rx
> function in such a way that it returns no data wastes time.

Sorry, I don't fully understand what you said here, please rephrase.

> The bonding driver is already an exercise in buffer management, so
> adding this layer of indirection here makes sense in my opinion, as
> does hiding the details of the constituent interfaces where possible.

Can you explain how this new buffer management, with the extra pointer
copies, improves the performance of bonding itself?

Looks really strange to me.

> > > So I'd like to share this improvement rather than keeping it
> > > private - because I'm nice that way :-P
> > >
> > > > > > Did you check for other vendor PMDs? It may hurt performance
> > > > > > there..
> > > > > >
> > > > > > I don't know, but I suspect probably not. For the most part
> > > > > > you are typically reading almost up to the vector
> > > > > > requirement. But if one slave has just a single packet, then
> > > > > > you can't vectorize on the next slave.
> > > > >
> > > > > I don't think that the ring overhead is better for PMDs which
> > > > > are not using the vectorized instructions.
> > > > >
> > > > > The non-vectorized PMDs are usually quite slow. The additional
> > > > > overhead doesn't make a difference in their performance.
> > > >
> > > > We should not make things worse than they are.
> > >
> > > There were no reports that this made things worse. The feedback
> > > from production was that it made things better.
> >
> > Yes, it may be good for some slave drivers but hurt other slave
> > drivers, so maybe it should stay private to specific customers
> > using specific NICs.
> >
> > Again, I can understand how this patch improves the performance of
> > some PMDs; therefore I think bonding is not the place to add it,
> > but maybe some PMDs are.
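To show what I mean by "the slave PMD can manage a ring by itself",
something along these lines could live next to the vectorized driver
(or even in the application) instead of in the bonding PMD. This is
only a sketch under the assumption that the vector RX wants bursts of
at least 32; the names (struct rx_stash, rx_burst_stashed, VEC_BURST)
are invented and this is not real driver code:

/* Illustration only -- invented names, not real driver code. */
#include <rte_common.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define VEC_BURST 32    /* assumed minimum burst the vector RX wants */

struct rx_stash {
        struct rte_mbuf *pkts[VEC_BURST];
        uint16_t count; /* packets still stashed */
        uint16_t next;  /* index of the next stashed packet */
};

/* Always ask the hardware for a full VEC_BURST and hand the caller
 * only what it asked for, keeping the remainder for the next call.
 */
static uint16_t
rx_burst_stashed(uint16_t port, uint16_t queue, struct rx_stash *s,
                 struct rte_mbuf **pkts, uint16_t nb_pkts)
{
        uint16_t i, n;

        if (s->count == 0) {
                s->count = rte_eth_rx_burst(port, queue, s->pkts,
                                            VEC_BURST);
                s->next = 0;
        }

        n = RTE_MIN(nb_pkts, s->count);
        for (i = 0; i < n; i++)
                pkts[i] = s->pkts[s->next + i];

        s->next += n;
        s->count -= n;
        return n;
}

This keeps the extra staging cost local to the drivers that actually
benefit from large bursts, which is my point about where the change
belongs.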