> -----Original Message-----
> From: Jerin Jacob <jerinjac...@gmail.com>
> Sent: Monday, October 5, 2020 5:35 PM
> To: Nicolau, Radu <radu.nico...@intel.com>
> Cc: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Richardson, Bruce
> <bruce.richard...@intel.com>; Ananyev, Konstantin
> <konstantin.anan...@intel.com>; Van Haaren, Harry
> <harry.van.haa...@intel.com>; dev@dpdk.org; jer...@marvell.com; nd
> <n...@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v1] event/sw: performance improvements
> 
> On Tue, Sep 29, 2020 at 2:32 PM Nicolau, Radu <radu.nico...@intel.com> wrote:
> >
> >
> > On 9/28/2020 5:02 PM, Honnappa Nagarahalli wrote:
> > > <snip>
> > >>> Add minimum burst throughout the scheduler pipeline and a flush counter.
> > >>> Replace ring API calls with local single threaded implementation where
> > >>> possible.
> > >>>
> > >>> Signed-off-by: Radu Nicolau mailto:radu.nico...@intel.com
> > >>>
> > >>> Thanks for the patch, a few comments inline.
> > >>>
> 
> > >>> Why not make these APIs part of the rte_ring library? You could further
> > >> optimize them by keeping the indices on the same cacheline.
> > >>> I'm not sure there is any need for non thread-safe rings outside this
> > >> particular case.
> > >>> [Honnappa] I think if we add the APIs, we will find the use cases.
> > >>> But, more than that, I understand that rte_ring structure is exposed to 
> > >>> the
> > >> application. The reason for doing that is the inline functions that 
> > >> rte_ring
> > >> provides. IMO, we should still maintain modularity and should not use the
> > >> internals of the rte_ring structure outside of the library.
> > >>> +1 to that.
> > >>>
> > >>> BTW, is there any real perf benefit from such micor-optimisation?
> > >> I'd tend to view these as use-case specific, and I'm not sure we should 
> > >> clutter
> > >> up the ring library with yet more functions, especially since they can't 
> > >> be
> > >> mixed with the existing enqueue/dequeue functions, since they don't use
> > >> the head pointers.
> > > IMO, the ring library is pretty organized with the recent addition of 
> > > HTS/RTS
> modes. This can be one of the modes and should allow us to use the existing
> functions (though additional functions are required as well).
> > > The other concern I have is, this implementation can be further optimized 
> > > by
> using a single cache line for the pointers. It uses 2 cache lines just 
> because of the
> layout of the rte_ring structure.
> > > There was a question earlier about the performance improvements of this
> patch? Are there any % performance improvements that can be shared?
> > > It is also possible to change the above functions to use the head/tail 
> > > pointers
> from producer or the consumer cache line alone to check for perf differences.
> >
> > I don't have a % for the final improvement for this change alone, but
> > there was some improvement in the memory overhead measurable during
> > development, which very likely resulted in the whole optimization having
> > more headroom.
> >
> > I agree that this may be further optimized, maybe by having a local
> > implementation of a ring-like container instead.
> 
> Have we decided on the next steps for this patch? Is the plan to
> supersede this patch and have different
> one in rte_ring subsystem,

My preference is to merge this version of the patch;
1) The ring helper functions are stripped to the SW PMD usage, and not valid to 
use in the general.
2) Adding static inline APIs in an LTS without extensive doesn't seem a good 
idea.

If Honnappa is OK with the above solution for 20.11, we can see about moving 
the rings part of the
code to rte_ring library location in 21.02, and give ourselves some time to 
settle the usage/API before
the next LTS.

Regards, -Harry

Reply via email to