On Thu, May 12, 2016 at 01:14:34PM +0000, Ananyev, Konstantin wrote: > > > > On Thu, May 12, 2016 at 10:07:09AM +0000, Ananyev, Konstantin wrote: > > > Hi Jerrin, > > > > > > > > > > > Hi All, > > > > > > > > I would like align mbuff rearm_data field to 8 byte aligned so that > > > > write to mbuf->rearm_data with uint64_t* will be naturally aligned. > > > > I am not sure about IA but some other architecture/implementation has > > > > overhead > > > > in non-naturally aligned stores. > > > > > > > > Proposed patch is something like this below, But open for any change to > > > > make fit for all other architectures/platform. > > > > > > > > Any thoughts ? > > > > > > > > ? [master] [dpdk-master] $ git diff > > > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > > > > index 529debb..5a917d0 100644 > > > > --- a/lib/librte_mbuf/rte_mbuf.h > > > > +++ b/lib/librte_mbuf/rte_mbuf.h > > > > @@ -733,10 +733,8 @@ struct rte_mbuf { > > > > void *buf_addr; /**< Virtual address of segment > > > > buffer. */ > > > > phys_addr_t buf_physaddr; /**< Physical address of segment > > > > buffer. */ > > > > > > > > - uint16_t buf_len; /**< Length of segment buffer. */ > > > > - > > > > > > > > > There is no need to move buf_len itself, I think. > > > Just move rearm_data marker prior to buf_len is enough. > > > Though how do you suggest to deal with the fact, that right now we blindly > > > update the whole 64bits pointed by rearm_data: > > > > > > drivers/net/ixgbe/ixgbe_rxtx_vec.c: > > > /* > > > * Flush mbuf with pkt template. > > > * Data to be rearmed is 6 bytes long. > > > * Though, RX will overwrite ol_flags that are coming next > > > * anyway. So overwrite whole 8 bytes with one load: > > > * 6 bytes of rearm_data plus first 2 bytes of ol_flags. > > > */ > > > p0 = (uintptr_t)&mb0->rearm_data; > > > *(uint64_t *)p0 = rxq->mbuf_initializer; > > > > > > ? > > > > > > If buf_len will be inside these 64bits, we can't do it anymore. > > > > > > Are you suggesting something like: > > > > > > uint64_t *p0, v0; > > > > > > p0 = &mb0->rearm_data; > > > v0 = *p0 & REARM_MASK; > > > *p0 = v0 | rxq->mbuf_initializer; > > > ? > > > > Due to unaligned rearm_data issue, In ThunderX platform, we need to write > > multiple half word of aligned stores(so masking was better us). > > Ok, so what would be the gain on ARM if you'll make that change?
~4 cpu cycles per packet.Again it may not be ARM architecture specific as ARM architecture does not define instruction latency so it is more of a implementation specific data. > Again, what would be the drop (if any) on IA? > > > But I think, if we can put 16bit hole between port and ol_flags then > > we may not need the masking stuff in ixgbe. Right? > > You mean move buf_len somewhere else (end of cacheline0) and > introduce a 2B hole between port and ol_flags, right? Yes > Yep, that probably wouldn't have any performance impact. I will try two options and send a patch which don't have any performance impact on IA. > > > > > OR > > > > Even better, if we can fill in a uint16_t variable which will replaced > > later in the flow like "data_len"? > > data_len is grouped with rx_descriptor_fields1 on purpose - > so we can update packet_type,pkt_len, data_len,vlan_tci,rss with one 16B > write. OK > > Konstantin > > > and move buf_len at end the first cache line? > >or any other thoughts to fix unaligned rearm_data issue? > > > > Jerin > > > > > > > > > > > > If so I wonder what would be the performance impact of that change. > > > Konstantin > > > > > > > > > > /* next 6 bytes are initialised on RX descriptor rearm */ > > > > - MARKER8 rearm_data; > > > > + MARKER64 rearm_data; > > > > uint16_t data_off; > > > > > > > > /** > > > > @@ -754,6 +752,7 @@ struct rte_mbuf { > > > > uint8_t nb_segs; /**< Number of segments. */ > > > > uint8_t port; /**< Input port. */ > > > > > > > > + uint16_t buf_len; /**< Length of segment buffer. */ > > > > uint64_t ol_flags; /**< Offload features. */ > > > > > > > > /* remaining bytes are set on RX when pulling packet from > > > > * descriptor > > > > > > > > /Jerin