RE: [RFC v2 3/7] ethdev: introduce Rx queue based limit watermark

Spike Du Mon, 23 May 2022 20:46:42 -0700

> -----Original Message-----
> From: Stephen Hemminger <step...@networkplumber.org>
> Sent: Tuesday, May 24, 2022 6:55 AM
> To: Spike Du <spi...@nvidia.com>
> Cc: Matan Azrad <ma...@nvidia.com>; Slava Ovsiienko
> <viachesl...@nvidia.com>; Ori Kam <or...@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <tho...@monjalon.net>; dev@dpdk.org;
> Raslan Darawsheh <rasl...@nvidia.com>
> Subject: Re: [RFC v2 3/7] ethdev: introduce Rx queue based limit watermark
> 
> External email: Use caution opening links or attachments
> 
> 
> On Mon, 23 May 2022 03:01:20 +0000
> Spike Du <spi...@nvidia.com> wrote:
> 
> > Hi, pls see below.
> >
> > > -----Original Message-----
> > > From: Stephen Hemminger <step...@networkplumber.org>
> > > Sent: Sunday, May 22, 2022 11:23 PM
> > > To: Spike Du <spi...@nvidia.com>
> > > Cc: Matan Azrad <ma...@nvidia.com>; Slava Ovsiienko
> > > <viachesl...@nvidia.com>; Ori Kam <or...@nvidia.com>; NBU-Contact-
> > > Thomas Monjalon (EXTERNAL) <tho...@monjalon.net>; dev@dpdk.org;
> > > Raslan Darawsheh <rasl...@nvidia.com>
> > > Subject: Re: [RFC v2 3/7] ethdev: introduce Rx queue based limit
> > > watermark
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > On Sun, 22 May 2022 08:58:56 +0300
> > > Spike Du <spi...@nvidia.com> wrote:
> > >
> > > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > > index
> > > > 04cff8ee10..687ae5ff29 100644
> > > > --- a/lib/ethdev/rte_ethdev.h
> > > > +++ b/lib/ethdev/rte_ethdev.h
> > > > @@ -1249,7 +1249,16 @@ struct rte_eth_rxconf {
> > > >        */
> > > >       union rte_eth_rxseg *rx_seg;
> > > >
> > > > -     uint64_t reserved_64s[2]; /**< Reserved for future fields */
> > > > +     /**
> > > > +      * Per-queue Rx limit watermark defined as percentage of Rx queue
> > > > +      * size. If Rx queue receives traffic higher than this percentage,
> > > > +      * the event RTE_ETH_EVENT_RX_LWM is triggered.
> > > > +      */
> > > > +     uint8_t lwm;
> > > > +
> > > > +     uint8_t reserved_bits[3];
> > > > +     uint32_t reserved_32s;
> > > > +     uint64_t reserved_64s;
> > > >       void *reserved_ptrs[2];   /**< Reserved for future fields */
> > > >  };
> > > >
> > >
> > > Ok but, this is an ABI risk about this because reserved stuff was
> > > never required before.
> > > Whenever is a reserved field is introduced the code (in this case
> > > rte_ethdev_configure).
> > >
> > > Best practice would have been to have the code require all reserved
> > > fields be
> > > 0 in earlier releases. In this case an application is like to define
> > > a watermark of zero; how will your code handle it.
> > Having watermark of 0 is desired, which is the default. LWM of 0 means
> > the Rx Queue's watermark is not monitored, hence no LWM event is
> generated.
> > >
> > > Also, using 8 bits as percentage is different than how other API's handle
> this.
> > > Since Rx queue size is in packets, why is this not in packets?
> > The short answer is to simply the LWM configuration.
> > Rx queue descriptor is complex nowadays.
> > For normal queue, user may configure LWM according to queue descriptor
> number easily.
> > But for below queues, it's not easy:
> > Take mprq as example, the testpmd cmd  options can be " -a
> >
> 0000:03:00.0,rxqs_min_mprq=1,mprq_en=1,mprq_max_memcpy_len=465,
> mprq_lo
> > g_stride_size=8,mprq_log_stride_num=3
> > -- --mbcache=512 -i  --nb-cores=7  --txd=1024 --rxd=1024 ", For MLX5
> > implementation,  the minimum "unit" in queue has 64 descriptors, the
> > "unit" number is 16,  if you configure according to descriptor number(1024)
> Here, you may easily set LWM as something like 512, but HW doesn't allow it,
> because 512 > 16. If you want the watermark to be half, the correct value is 
> 8.
> > The same issue happens to feature like "Rx queue buffer split" where a
> packet can be split to multiple descriptors.
> > Using percentage doesn't have such issues, PMD will cover all the details.
> >
> > > Also document what behavior of 0 is.
> > Sure. The behavior is like the old days without this feature, pls see above.
> >
> > > Why introduce new query/set operations? This should just be part of
> > > the overall device configuration.
> > Due to different implementation. LWM can be a dynamic configuration
> which can help user design a flexible flow control.
> > User may feel ok with LWM of 80% to get high throughput, or later on with
> 50% to throttle the traffic responsively by handling LWM event in order to
> reduce drop.
> > Some driver like mlx5 may implement LWM event as one-time shot. When
> > you receive LWM event, you need to reconfigure LWM in order to receive
> the event again, thus you will not likely to be overwhelmed by the events.
> > These all require set operation.
> >
> > For the query operation. The rte_event API
> rte_eth_dev_callback_process() is per-port API, it doesn't carry much
> information when an event happens.
> > When a LWM event happens, we need to know in which Rx queue it
> happens or optionally what's the current LWM percentage of this queue.
> > The query operation serves this purpose.
> >
> >
> > Regards,
> > Spike.
> >
> >
> 
> The bigger question is why does this have to be just MLX5 and why can't it fit
> into the existing DPDK RX interrupt framework?
> 
> Linux and BSD have had this for years in their packet coalescing logic.
> Ethtool provides ability to set lot of irq coalescing parameters like:
> 
>        ethtool -C|--coalesce devname [adaptive-rx on|off] [adaptive-tx on|off]
>               [rx-usecs N] [rx-frames N] [rx-usecs-irq N] [rx-frames-irq N]
>               [tx-usecs N] [tx-frames N] [tx-usecs-irq N] [tx-frames-irq N]
>               [stats-block-usecs N] [pkt-rate-low N] [rx-usecs-low N]
>               [rx-frames-low N] [tx-usecs-low N] [tx-frames-low N]
>               [pkt-rate-high N] [rx-usecs-high N] [rx-frames-high N]
>               [tx-usecs-high N] [tx-frames-high N] [sample-interval N]
>               [cqe-mode-rx on|off] [cqe-mode-tx on|off]
> 
> It feels like this is just the DPDK version of a small subset of that.
> Since many device already support IRQ coalescing, it would be best to build
> one new API that has most of these. Rather than a MLX/Nvidia only API for a
> single parameter.

I take MLX5 as example here because I only know how mlx5 works, I don't 
understand
How other NICs work.  It doesn't mean I try to change common code only to 
satisfy 
Mlx5 needs.

I think interrupt coalesce is different from LWM:
Interrupt coalesce is delay interrupt until a batch of packets(or an interval) 
is received. 
LWM intends to notify when a Rx queue is out of buffer. Delaying interrupt 
can't detect
A specific fullness value of the Rx queue, but LWM can if driver supports it.


Regards,
Spike.
RE: [RFC v2 3/7] ethdev: introduce Rx queue based limit watermark

Reply via email to