> > > -----Original Message----- > > From: Jerin Jacob <jerinjac...@gmail.com> > > Sent: Monday, October 5, 2020 5:35 PM > > To: Nicolau, Radu <radu.nico...@intel.com> > > Cc: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Richardson, Bruce > > <bruce.richard...@intel.com>; Ananyev, Konstantin > > <konstantin.anan...@intel.com>; Van Haaren, Harry > > <harry.van.haa...@intel.com>; dev@dpdk.org; jer...@marvell.com; nd > > <n...@arm.com> > > Subject: Re: [dpdk-dev] [PATCH v1] event/sw: performance improvements > > > > On Tue, Sep 29, 2020 at 2:32 PM Nicolau, Radu <radu.nico...@intel.com> > > wrote: > > > > > > > > > On 9/28/2020 5:02 PM, Honnappa Nagarahalli wrote: > > > > <snip> > > > >>> Add minimum burst throughout the scheduler pipeline and a flush > > > >>> counter. > > > >>> Replace ring API calls with local single threaded implementation where > > > >>> possible. > > > >>> > > > >>> Signed-off-by: Radu Nicolau mailto:radu.nico...@intel.com > > > >>> > > > >>> Thanks for the patch, a few comments inline. > > > >>> > > > > > >>> Why not make these APIs part of the rte_ring library? You could > > > >>> further > > > >> optimize them by keeping the indices on the same cacheline. > > > >>> I'm not sure there is any need for non thread-safe rings outside this > > > >> particular case. > > > >>> [Honnappa] I think if we add the APIs, we will find the use cases. > > > >>> But, more than that, I understand that rte_ring structure is exposed > > > >>> to the > > > >> application. The reason for doing that is the inline functions that > > > >> rte_ring > > > >> provides. IMO, we should still maintain modularity and should not use > > > >> the > > > >> internals of the rte_ring structure outside of the library. > > > >>> +1 to that. > > > >>> > > > >>> BTW, is there any real perf benefit from such micor-optimisation? > > > >> I'd tend to view these as use-case specific, and I'm not sure we > > > >> should clutter > > > >> up the ring library with yet more functions, especially since they > > > >> can't be > > > >> mixed with the existing enqueue/dequeue functions, since they don't use > > > >> the head pointers. > > > > IMO, the ring library is pretty organized with the recent addition of > > > > HTS/RTS > > modes. This can be one of the modes and should allow us to use the existing > > functions (though additional functions are required as well). > > > > The other concern I have is, this implementation can be further > > > > optimized by > > using a single cache line for the pointers. It uses 2 cache lines just > > because of the > > layout of the rte_ring structure. > > > > There was a question earlier about the performance improvements of this > > patch? Are there any % performance improvements that can be shared? > > > > It is also possible to change the above functions to use the head/tail > > > > pointers > > from producer or the consumer cache line alone to check for perf > > differences. > > > > > > I don't have a % for the final improvement for this change alone, but > > > there was some improvement in the memory overhead measurable during > > > development, which very likely resulted in the whole optimization having > > > more headroom. > > > > > > I agree that this may be further optimized, maybe by having a local > > > implementation of a ring-like container instead. > > > > Have we decided on the next steps for this patch? Is the plan to > > supersede this patch and have different > > one in rte_ring subsystem, > > My preference is to merge this version of the patch; > 1) The ring helper functions are stripped to the SW PMD usage, and not valid > to use in the general. > 2) Adding static inline APIs in an LTS without extensive doesn't seem a good > idea. > > If Honnappa is OK with the above solution for 20.11, we can see about moving > the rings part of the > code to rte_ring library location in 21.02, and give ourselves some time to > settle the usage/API before > the next LTS. >
As ring library maintainer I share Honnappa concern that another library not uses public ring API, but instead accesses ring internals directly. Obviously such coding practice is not welcomed as it makes harder to maintain/extend ring library in future. About 2) - these new API can(/shoud) be marked an experimental anyway. As another thing - it is still unclear what a performance gain we are talking about here. Is it really worth it comparing to just using SP/SC?