On Wed, Oct 28, 2020 at 10:24:01AM +0000, Van Haaren, Harry wrote:
> > -----Original Message-----
> > From: dev <dev-boun...@dpdk.org> On Behalf Of Thomas Monjalon
> > Sent: Wednesday, October 28, 2020 10:09 AM
> > To: Nithin Dabilpuram <ndabilpu...@marvell.com>
> > Cc: Pavan Nikhilesh <pbhagavat...@marvell.com>; Jerin Jacob
> > <jer...@marvell.com>; Ruifeng Wang <ruifeng.w...@arm.com>; Richardson, Bruce
> > <bruce.richard...@intel.com>; Ananyev, Konstantin
> > <konstantin.anan...@intel.com>; kirankum...@marvell.com; dev@dpdk.org;
> > david.march...@redhat.com; olivier.m...@6wind.com
> > Subject: Re: [dpdk-dev] [PATCH v4] node: switch IPv4 metadata to dynamic
> > mbuf field
> > 
> > 28/10/2020 10:30, Nithin Dabilpuram:
> > > From: Thomas Monjalon <tho...@monjalon.net>
> > >
> > > The node_mbuf_priv1 was stored in the deprecated mbuf field udata64.
> > > It is moved to a dynamic field in order to allow removal of udata64.
> > >
> > > Signed-off-by: Thomas Monjalon <tho...@monjalon.net>
> > > Signed-off-by: Nithin Dabilpuram <ndabilpu...@marvell.com>
> > [...]
> > > + IP4_LOOKUP_NODE_PRIV1_OFF(node->ctx) = node_mbuf_priv1_dynfield_offset;
> > 
> > That's interesting.
> > You copy the offset in the node context for better performance.
> > How much better is it than with a global offset variable?
> > How much does it decrease compared to a static mbuf field?
> 
> Also interested in this topic, I'll offer the logical/theory point of view:
> 
> With a static field, the offset into the mbuf can be encoded in the
> instruction stream, meaning there are no d-cache loads to identify the
> particular dynamic field.
> 
> With a static/global variable, the cache line where the value resides is
> presumably not hot in cache per burst (assuming an application that does
> significant work, so the line has not stayed in cache since the last burst).
> Hence the overhead estimate could be 1x cache line load per burst.
> 
> With the data copied into the node, the offset is presumably on a hot cache
> line, as the node is using other data members of its context. As a result,
> perhaps a cold static cache line load is converted to a hot node-context
> line re-use.
> 
> Real world overhead likely depends on A) whether the application
> cache-thrashes enough to make the static/global line fall out of cache,
> causing perf degradation due to reload, and B) whether the node->ctx still
> fits in the same number of lines as before if the value is copied there.

Agreed, node->ctx is already referenced to get other data (the lpm pointer), so
referencing another 4 bytes might even turn that into a load pair, which comes
at no extra cost.
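
To make the three options concrete, here is a minimal, self-contained sketch.
It does not use the real DPDK API: struct toy_mbuf, struct toy_node_ctx,
toy_register_dynfield() and the process_*() functions are hypothetical
stand-ins, and the 64-byte cache line is an assumption. It only shows where the
offset comes from in each case, not the measured cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_mbuf; NOT the real DPDK layout. */
struct toy_mbuf {
	uint64_t static_priv;   /* static field: offset known at compile time */
	uint8_t dyn_area[64];   /* space handed out to dynamic fields at run time */
};

/* Hypothetical stand-in for the node context (node->ctx). */
struct toy_node_ctx {
	void *lpm;       /* data the node already reads on every process call */
	int priv1_off;   /* per-node copy of the dynamic field offset */
};

/* Point B in the quoted mail: the copy should not push the context onto
 * another cache line. 64 is an assumed cache line size. */
_Static_assert(sizeof(struct toy_node_ctx) <= 64, "ctx exceeds one cache line");

/* Global offset variable, written once at registration time. */
int toy_priv1_dynfield_offset = -1;

/* Pretend registration; a real offset would come from the mbuf library. */
int toy_register_dynfield(void)
{
	toy_priv1_dynfield_offset = (int)offsetof(struct toy_mbuf, dyn_area) + 16;
	return toy_priv1_dynfield_offset;
}

/* Write a 64-bit value at a run-time byte offset inside the toy mbuf. */
static inline void toy_field_write(struct toy_mbuf *m, int off, uint64_t v)
{
	assert(off >= 0 && (size_t)off + sizeof(v) <= sizeof(*m));
	memcpy((uint8_t *)m + off, &v, sizeof(v));
}

/* Read a 64-bit value back from a run-time byte offset. */
static inline uint64_t toy_field_read(const struct toy_mbuf *m, int off)
{
	uint64_t v;

	memcpy(&v, (const uint8_t *)m + off, sizeof(v));
	return v;
}

/* 1) Static field: the offset is an immediate in the instruction stream. */
void process_static(struct toy_mbuf *pkts, size_t n, uint64_t v)
{
	for (size_t i = 0; i < n; i++)
		pkts[i].static_priv = v;
}

/* 2) Global offset: one extra load per call, possibly from a cold line. */
void process_global(struct toy_mbuf *pkts, size_t n, uint64_t v)
{
	const int off = toy_priv1_dynfield_offset;

	for (size_t i = 0; i < n; i++)
		toy_field_write(&pkts[i], off, v);
}

/* 3) Context copy: the offset sits next to data the node reads anyway. */
void process_ctx(const struct toy_node_ctx *ctx, struct toy_mbuf *pkts,
		 size_t n, uint64_t v)
{
	const int off = ctx->priv1_off;

	for (size_t i = 0; i < n; i++)
		toy_field_write(&pkts[i], off, v);
}
```

In all three variants the offset is read once per call, outside the per-packet
loop; the only difference is which cache line that single read touches.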

Numbers-wise, performance decreases by ~1.4% going from a static mbuf field to
a global offset variable, and by ~1% going from a static mbuf field to a node
context field cached per process call.
