> -----Original Message-----
> From: Thomas Monjalon <tho...@monjalon.net>
> Sent: Wednesday, October 28, 2020 6:08 PM
> To: Nithin Dabilpuram <ndabilpu...@marvell.com>; Van Haaren, Harry
> <harry.van.haa...@intel.com>
> Cc: dev@dpdk.org; Pavan Nikhilesh <pbhagavat...@marvell.com>; Jerin Jacob
> <jer...@marvell.com>; Ruifeng Wang <ruifeng.w...@arm.com>; Richardson, Bruce
> <bruce.richard...@intel.com>; Ananyev, Konstantin
> <konstantin.anan...@intel.com>; kirankum...@marvell.com; dev@dpdk.org;
> david.march...@redhat.com; olivier.m...@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v4] node: switch IPv4 metadata to dynamic mbuf
> field
>
> 28/10/2020 11:24, Van Haaren, Harry:
> > From: Thomas Monjalon
> > > > + IP4_LOOKUP_NODE_PRIV1_OFF(node->ctx) =
> > > > node_mbuf_priv1_dynfield_offset;
> > >
> > > That's interesting.
> > > You copy the offset in the node context for better performance.
> > > How much is it better than with global offset variable?
> > > How much it decreases compared to a static mbuf field?
> >
> > Also interested in this topic, I'll offer the logical/theory point of view;
> >
> > With a static field, the offset into the mbuf can be encoded in the
> > instruction stream, meaning there are no d-cache loads to identify
> > particular dynamic field.
> >
> > With a static/global variable, the cache line where the value resides is
> > presumably not hot in cache per burst (assuming an application that does
> > significant work, so not in cache since last burst). Hence overhead
> > estimate could be 1x cache line load per burst.
>
> Would it help to group all dynfields and dynflags offsets
> in the same cache line?

It could, but whether and how much it helps depends on the workload, I think.
Using each cache line fully is always good, so if grouping the offsets
together is reasonable to do, it seems a good idea.

My assumption is that dynamic fields/flags are registered at init time, and
that the offsets remain constant at runtime. That would keep this cache line
in the "shared" state on each core that uses the mbuf dynfields, so cores
never invalidate each other's copy.

Overall, it is unlikely to have much impact on a real-world application, but
DPDK puts performance first! And packing a single cache line full of hot data
is best practice :)
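
To make the grouping idea concrete, here is a minimal sketch in plain C. It does not use the DPDK headers: the struct name, the field names, the 64-byte line size and the fake mbuf are all assumptions for illustration. In DPDK the offsets would be the values returned by rte_mbuf_dynfield_register() at init time.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed line size; a DPDK build would use RTE_CACHE_LINE_SIZE. */
#define CACHE_LINE_SIZE 64

/*
 * Hypothetical table holding every dynamic field/flag offset together.
 * Aligning the first member to a cache line makes the whole struct
 * start on a line boundary, so one load brings in all the offsets.
 */
struct dyn_offsets {
	alignas(CACHE_LINE_SIZE) int node_priv1; /* byte offset in mbuf */
	int timestamp;                           /* byte offset in mbuf */
	int seqn;                                /* byte offset in mbuf */
	uint64_t rx_timestamp_flag;              /* dynflag bit mask */
};

/* The table must not spill into a second cache line. */
static_assert(sizeof(struct dyn_offsets) == CACHE_LINE_SIZE,
	      "offset table must fit in exactly one cache line");

/* Written once at init, read-only afterwards (stays in "shared" state). */
static struct dyn_offsets dyn_off;

/* Stand-in for struct rte_mbuf, only so the sketch compiles on its own. */
struct fake_mbuf {
	uint8_t bytes[128];
};

/* Init-time setup; the arguments stand in for registered offsets. */
static void
dyn_offsets_init(int priv1, int timestamp, int seqn, uint64_t ts_flag)
{
	dyn_off.node_priv1 = priv1;
	dyn_off.timestamp = timestamp;
	dyn_off.seqn = seqn;
	dyn_off.rx_timestamp_flag = ts_flag;
}

/* Datapath accessors: one offset load, then the access into the mbuf. */
static inline void
write_u32_field(void *obj, int offset, uint32_t value)
{
	memcpy((uint8_t *)obj + offset, &value, sizeof(value));
}

static inline uint32_t
read_u32_field(const void *obj, int offset)
{
	uint32_t value;

	memcpy(&value, (const uint8_t *)obj + offset, sizeof(value));
	return value;
}
```

A node could still copy dyn_off.node_priv1 into its own context as the patch does; the table only helps code that reads several offsets per burst from global state.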