> -----Original Message-----
> From: dev <dev-boun...@dpdk.org> On Behalf Of Thomas Monjalon
> Sent: Wednesday, October 28, 2020 10:09 AM
> To: Nithin Dabilpuram <ndabilpu...@marvell.com>
> Cc: Pavan Nikhilesh <pbhagavat...@marvell.com>; Jerin Jacob
> <jer...@marvell.com>; Ruifeng Wang <ruifeng.w...@arm.com>; Richardson, Bruce
> <bruce.richard...@intel.com>; Ananyev, Konstantin
> <konstantin.anan...@intel.com>; kirankum...@marvell.com; dev@dpdk.org;
> david.march...@redhat.com; olivier.m...@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v4] node: switch IPv4 metadata to dynamic mbuf
> field
> 
> 28/10/2020 10:30, Nithin Dabilpuram:
> > From: Thomas Monjalon <tho...@monjalon.net>
> >
> > The node_mbuf_priv1 was stored in the deprecated mbuf field udata64.
> > It is moved to a dynamic field in order to allow removal of udata64.
> >
> > Signed-off-by: Thomas Monjalon <tho...@monjalon.net>
> > Signed-off-by: Nithin Dabilpuram <ndabilpu...@marvell.com>
> [...]
> > +   IP4_LOOKUP_NODE_PRIV1_OFF(node->ctx) =
> node_mbuf_priv1_dynfield_offset;
> 
> That's interesting.
> You copy the offset in the node context for better performance.
> How much is it better than with global offset variable?
> How much it decreases compared to a static mbuf field?

I'm also interested in this topic, so I'll offer the theoretical point of view:

With a static field, the offset into the mbuf can be encoded in the instruction
stream, so no d-cache load is needed to locate the field.

With a static/global variable, the cache line where the value resides is
presumably not hot in cache at the start of each burst (assuming an application
that does significant work, so the line has been evicted since the last burst).
Hence the overhead estimate could be 1x cache line load per burst.

With the offset copied into the node, it is presumably on a hot cache line, as
the node is already using other data members of its context. As a result, a cold
static cache line load is perhaps converted into re-use of a hot node-context
line.
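
To make the three cases concrete, here is a minimal sketch in plain C. It does
not use the real DPDK types: struct fake_mbuf, struct fake_node_ctx,
meta_dynfield_offset and every function name below are hypothetical stand-ins,
and the code only shows where the offset comes from in each case, not how the
node library actually does it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf; NOT the real rte_mbuf layout. */
struct fake_mbuf {
	uint64_t static_meta;  /* case 1: static field, offset fixed at compile time */
	uint64_t dyn_area[8];  /* space handed out to dynamic fields at run time */
};

/* Case 2: the offset lives in a global, set once at registration time. */
static int meta_dynfield_offset = -1;

/* Hypothetical node context: small state the node touches on every burst. */
struct fake_node_ctx {
	uint16_t other_state;
	uint16_t meta_off;     /* case 3: per-node copy of the offset */
};

/* Pretend registration: record a byte offset inside dyn_area. */
static void register_meta_field(size_t off)
{
	assert(off % sizeof(uint64_t) == 0);
	assert(off >= offsetof(struct fake_mbuf, dyn_area));
	assert(off + sizeof(uint64_t) <= sizeof(struct fake_mbuf));
	meta_dynfield_offset = (int)off;
}

/* Node init: copy the global offset into the node's own context. */
static void node_init(struct fake_node_ctx *ctx)
{
	ctx->meta_off = (uint16_t)meta_dynfield_offset;
}

static inline uint64_t *meta_at(struct fake_mbuf *m, size_t off)
{
	return (uint64_t *)((uint8_t *)m + off);
}

/* Case 1: offsetof() is folded into the load; no extra data load. */
static uint64_t sum_static(struct fake_mbuf **pkts, unsigned int n)
{
	uint64_t sum = 0;
	for (unsigned int i = 0; i < n; i++)
		sum += pkts[i]->static_meta;
	return sum;
}

/* Case 2: one load of the global per burst, possibly from a cold line. */
static uint64_t sum_global(struct fake_mbuf **pkts, unsigned int n)
{
	const size_t off = (size_t)meta_dynfield_offset;
	uint64_t sum = 0;
	for (unsigned int i = 0; i < n; i++)
		sum += *meta_at(pkts[i], off);
	return sum;
}

/* Case 3: one load per burst from the context the node already uses. */
static uint64_t sum_ctx(const struct fake_node_ctx *ctx,
			struct fake_mbuf **pkts, unsigned int n)
{
	const size_t off = ctx->meta_off;
	uint64_t sum = 0;
	for (unsigned int i = 0; i < n; i++)
		sum += *meta_at(pkts[i], off);
	return sum;
}
```

Cases 2 and 3 execute the same instructions per packet; the only difference is
which cache line supplies the offset at the top of the burst.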

Real-world overhead likely depends on A) whether the application thrashes the
cache enough to make the static/global line fall out of cache, causing perf
degradation due to the reload, and B) whether node->ctx still fits in the same
number of cache lines as before once the value is copied there.
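
Point B can be checked at build time. A rough sketch, assuming 64-byte cache
lines and a context that starts on a line boundary; struct ctx_before,
struct ctx_after and the CACHE_LINES macro are made up for illustration and are
not the real node context layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u  /* assumption: 64-byte cache lines */

/* Number of cache lines covered by an object of the given size,
 * assuming it starts on a cache-line boundary. */
#define CACHE_LINES(bytes) (((bytes) + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE)

/* Hypothetical node context before and after adding the offset copy. */
struct ctx_before {
	uint8_t state[60];
};

struct ctx_after {
	uint8_t state[60];
	uint16_t meta_off;   /* copied dynfield offset */
};

/* Fails the build if the extra member spills into another cache line. */
static_assert(CACHE_LINES(sizeof(struct ctx_after)) ==
	      CACHE_LINES(sizeof(struct ctx_before)),
	      "offset copy grows the node context by a cache line");
```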
