> -----Original Message-----
> From: Thomas Monjalon <tho...@monjalon.net>
> Sent: Wednesday, October 28, 2020 6:08 PM
> To: Nithin Dabilpuram <ndabilpu...@marvell.com>; Van Haaren, Harry
> <harry.van.haa...@intel.com>
> Cc: dev@dpdk.org; Pavan Nikhilesh <pbhagavat...@marvell.com>; Jerin Jacob
> <jer...@marvell.com>; Ruifeng Wang <ruifeng.w...@arm.com>; Richardson, Bruce
> <bruce.richard...@intel.com>; Ananyev, Konstantin
> <konstantin.anan...@intel.com>; kirankum...@marvell.com; dev@dpdk.org;
> david.march...@redhat.com; olivier.m...@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v4] node: switch IPv4 metadata to dynamic mbuf field
> 
> 28/10/2020 11:24, Van Haaren, Harry:
> > From: Thomas Monjalon
> > > > +       IP4_LOOKUP_NODE_PRIV1_OFF(node->ctx) = node_mbuf_priv1_dynfield_offset;
> > >
> > > That's interesting.
> > > You copy the offset into the node context for better performance.
> > > How much better is it than with a global offset variable?
> > > How much does it decrease compared to a static mbuf field?
> >
> > Also interested in this topic; I'll offer the logical/theoretical point of view:
> >
> > With a static field, the offset into the mbuf can be encoded in the instruction
> > stream, meaning there are no d-cache loads to identify the particular field.
> >
> > With a static/global variable, the cache line where the value resides is
> > presumably not hot in cache on each burst (assuming an application that does
> > significant work, so the line has been evicted since the last burst). Hence the
> > overhead estimate could be one cache-line load per burst.
> 
> Would it help to group all dynfields and dynflags offsets
> in the same cache line?

It could, but whether and how much it would benefit depends on the workload, I think.

Using each cache line fully is always good, so if grouping the offsets together is
reasonable to do, it seems a good idea.

My assumption is that dynamic fields/flags are expected to be registered at init
time, and that their values remain constant at runtime. That would keep this cache
line in the "shared" state on each core that uses the mbuf dynfields.
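Under that assumption, grouping could look roughly like the C sketch below: one
cache-line-aligned, read-mostly struct that is filled once after registration and
only read on the fast path. The struct, its fields, dyn_offsets_init() and
pkt_has_dynflag() are hypothetical names for illustration, and a 64-byte cache line
is assumed. In DPDK the values would come from rte_mbuf_dynfield_register() (a byte
offset) and rte_mbuf_dynflag_register() (a bit number).

```c
#include <assert.h>    /* static_assert, assert */
#include <stdalign.h>  /* alignas, alignof */
#include <stdint.h>

#define CACHE_LINE_SIZE 64  /* assumption: 64-byte cache lines */

/* Hypothetical grouping of dynfield offsets and dynflag bit numbers.
 * Aligning the first member aligns (and pads) the whole struct to one line. */
struct dyn_offsets {
    alignas(CACHE_LINE_SIZE) int32_t priv1_off; /* dynfield byte offset */
    int32_t field_b_off;                        /* another dynfield offset */
    int32_t flag_a_bit;                         /* dynflag bit number */
    int32_t flag_b_bit;                         /* another dynflag bit number */
};

static_assert(sizeof(struct dyn_offsets) == CACHE_LINE_SIZE,
              "all offsets should share a single cache line");

/* Written once at init, then only read, so the line can stay "shared". */
static struct dyn_offsets dyn_offs;

/* Hypothetical init hook: store the values obtained at registration time. */
static void dyn_offsets_init(int32_t priv1_off, int32_t field_b_off,
                             int32_t flag_a_bit, int32_t flag_b_bit)
{
    dyn_offs.priv1_off = priv1_off;
    dyn_offs.field_b_off = field_b_off;
    dyn_offs.flag_a_bit = flag_a_bit;
    dyn_offs.flag_b_bit = flag_b_bit;
}

/* Fast-path helper: test a dynamic flag given its registered bit number. */
static inline int pkt_has_dynflag(uint64_t ol_flags, int32_t bit)
{
    return (ol_flags & (UINT64_C(1) << bit)) != 0;
}
```

With this layout, a burst that finds the line cold pays at most one cache-line load
for all the offsets, rather than one load per offset scattered across different lines.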

Overall, it is unlikely to have much impact on a real-world application, but DPDK
puts performance first! And packing a single cache line full of hot data is best
practice :)
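For reference, the three access patterns compared earlier in the thread (static
field, global offset variable, offset copied into the node context) can be sketched
in C as below. struct pkt, struct node_ctx, PKT_DYNFIELD() and the helpers are
hypothetical simplifications, not DPDK API; PKT_DYNFIELD() only mirrors the pointer
arithmetic of DPDK's RTE_MBUF_DYNFIELD() macro.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */
#include <stdint.h>  /* uint32_t, uintptr_t */

/* Hypothetical, much-simplified stand-in for struct rte_mbuf. */
struct pkt {
    uint32_t static_priv1; /* static field: offset known at compile time */
    uint32_t dynarea[4];   /* area where dynamic fields get placed */
};

/* Global offset, set once at init (like node_mbuf_priv1_dynfield_offset). */
static int priv1_dynfield_offset = -1;

/* Hypothetical per-node context holding a private copy of the offset. */
struct node_ctx {
    int priv1_off;
};

/* Same pointer arithmetic as RTE_MBUF_DYNFIELD(m, offset, type). */
#define PKT_DYNFIELD(p, off, type) ((type)((uintptr_t)(p) + (off)))

/* 1) Static field: the offset is an immediate in the instruction stream. */
static uint32_t read_static(const struct pkt *p)
{
    return p->static_priv1;
}

/* 2) Global offset: needs a load of the global, whose line may be cold. */
static uint32_t read_global(const struct pkt *p)
{
    return *PKT_DYNFIELD(p, priv1_dynfield_offset, const uint32_t *);
}

/* 3) Offset cached in the node context, a line the node is already using. */
static uint32_t read_ctx(const struct node_ctx *ctx, const struct pkt *p)
{
    return *PKT_DYNFIELD(p, ctx->priv1_off, const uint32_t *);
}

/* Init-time "registration", then copy the offset into the node context. */
static void init_offsets(struct node_ctx *ctx)
{
    priv1_dynfield_offset = (int)offsetof(struct pkt, dynarea);
    ctx->priv1_off = priv1_dynfield_offset;
}
```

All three return the same kind of value; the difference is only where the offset
comes from: an immediate, a possibly cold global, or an already-hot context line.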
