> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> Sent: Monday, April 1, 2019 12:41 PM
> To: Eads, Gage <gage.e...@intel.com>; dev@dpdk.org
> Cc: olivier.m...@6wind.com; arybche...@solarflare.com; Richardson, Bruce
> <bruce.richard...@intel.com>; Ananyev, Konstantin
> <konstantin.anan...@intel.com>; Gavin Hu (Arm Technology China)
> <gavin...@arm.com>; nd <n...@arm.com>; tho...@monjalon.net; nd
> <n...@arm.com>
> Subject: RE: [PATCH v3 1/8] stack: introduce rte stack library
>
>
> > > > +static ssize_t
> > > > +rte_stack_get_memsize(unsigned int count) {
> > > > +	ssize_t sz = sizeof(struct rte_stack);
> > > > +
> > > > +	/* Add padding to avoid false sharing conflicts */
> > > > +	sz += RTE_CACHE_LINE_ROUNDUP(count * sizeof(void *)) +
> > > > +			2 * RTE_CACHE_LINE_SIZE;
> > > I did not understand how the false sharing is caused and how this
> > > padding is solving the issue. Verbose comments would help.
> >
> > The additional padding (beyond the CACHE_LINE_ROUNDUP) is to prevent
> > false sharing caused by adjacent/next-line hardware prefetchers. I'll
> > address this.
> >
> Is it not a generic problem? Or is it specific to this library?
This is not limited to this library, but it only affects systems with (enabled) next-line prefetchers, for example Intel products with an L2 adjacent-cache-line prefetcher[1]. On those systems, the additional padding can potentially improve performance. As I understand it, this was the reason behind the 128B alignment added to rte_ring a couple of years ago[2].

[1] https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors
[2] http://mails.dpdk.org/archives/dev/2017-February/058613.html