> -----Original Message----- > From: Richardson, Bruce > Sent: Monday, June 15, 2015 4:24 PM > To: Olivier MATZ > Cc: Ananyev, Konstantin; dev at dpdk.org; Damjan Marion (damarion) > Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline > > On Mon, Jun 15, 2015 at 05:19:27PM +0200, Olivier MATZ wrote: > > > > > > On 06/15/2015 04:52 PM, Ananyev, Konstantin wrote: > > > Hi Olivier, > > > > > >> -----Original Message----- > > >> From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > > >> Sent: Monday, June 15, 2015 3:31 PM > > >> To: Richardson, Bruce > > >> Cc: Ananyev, Konstantin; dev at dpdk.org; Damjan Marion (damarion) > > >> Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline > > >> > > >> > > >> > > >> On 06/15/2015 04:12 PM, Bruce Richardson wrote: > > >>> On Mon, Jun 15, 2015 at 04:05:05PM +0200, Olivier MATZ wrote: > > >>>> Hi, > > >>>> > > >>>> On 06/15/2015 03:54 PM, Ananyev, Konstantin wrote: > > >>>>> > > >>>>> > > >>>>>> -----Original Message----- > > >>>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce > > >>>>>> Richardson > > >>>>>> Sent: Monday, June 15, 2015 2:44 PM > > >>>>>> To: Olivier MATZ > > >>>>>> Cc: dev at dpdk.org; Damjan Marion (damarion) > > >>>>>> Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline > > >>>>>> > > >>>>>> On Mon, Jun 15, 2015 at 03:20:22PM +0200, Olivier MATZ wrote: > > >>>>>>> Hi Damjan, > > >>>>>>> > > >>>>>>> On 06/10/2015 11:47 PM, Damjan Marion (damarion) wrote: > > >>>>>>>> > > >>>>>>>> Hi, > > >>>>>>>> > > >>>>>>>> We noticed 7% performance improvement by simply moving > > >>>>>>>> rte_mbuf.next field to the 1st cache line. > > >>>>>>>> > > >>>>>>>> Currently, it falls under /* second cache line - fields only used > > >>>>>>>> in slow path or on TX */ > > >>>>>>>> but it is actually used at several places in rx fast path. (e.g.: > > >>>>>>>> i40e_rx_alloc_bufs() is setting that field to NULL). > > >>>>>>>> > > >>>>>>>> Is there anything we can do here (stop using next field, or move > > >>>>>>>> it to 1st cache line)? > > >>>>>>> > > >>>>>>> Agree, this is also something I noticed, see: > > >>>>>>> http://dpdk.org/ml/archives/dev/2015-February/014400.html > > >>>>>>> > > >>>>>>> I did not have the time to do performance testing, but it's > > >>>>>>> something > > >>>>>>> I'd like to do as soon as I can. I don't see any obvious reason not > > >>>>>>> to > > >>>>>>> do it. > > >>>>>>> > > >>>>>>> It seems we currently just have enough room to do it (8 bytes are > > >>>>>>> remaining in the first cache line when compiled in 64 bits). > > >>>>>> > > >>>>>> This, to me, is the obvious reason not to do it! It prevents us from > > >>>>>> taking in > > >>>>>> any other offload fields in the RX fast-path into the mbuf. > > >>>>>> > > >>>>>> That being said, I can see why we might want to look to move it - > > >>>>>> but from the > > >>>>>> work done in the ixgbe driver, I'd be hopeful we can get as much > > >>>>>> performance with > > >>>>>> it on the second cache line for most cases, through judicious use of > > >>>>>> prefetching, > > >>>>>> or otherwise. > > >>>>>> > > >>>>>> It took a lot of work and investigation to get free space in the > > >>>>>> mbuf - especially > > >>>>>> in cache line 0, and I'd like to avoid just filling the cache line > > >>>>>> up again as > > >>>>>> long as we possibly can! > > >>>>> > > >>>>> Yep, agree with Bruce here. > > >>>>> Plus, with packet_type going to be 4B and vlan_tci_outer, > > >>>>> we just don't have 8 free bytes at the first cache line any more. > > >>>> > > >>>> I don't understand why m->next would not be a better candidate than > > >>>> rx offload fields to be in the first cache line. For instance, m->next > > >>>> is mandatory and must be initialized when allocating a mbuf (to be > > >>>> compared with m->seqn for instance, which is also in the first cache > > >>>> line). So if we want to do some room in the first cache line, I > > >>>> think we can. > > >>>> > > >>>> To me, the only reason for not doing it now is because we don't > > >>>> have a full performance test report (several use-cases, drivers, ...) > > >>>> that shows it's better. > > >>>> > > >>> Because the "next" field is not mandatory to be set on initialization. > > >>> It can > > >>> instead be set only when needed, and cleared on free if it is used. > > >>> > > >>> The next pointers always start out as NULL when the mbuf pool is > > >>> created. The > > >>> only time it is set to non-NULL is when we have chained mbufs. If we > > >>> never have > > >>> any chained mbufs, we never need to touch the next field, or even read > > >>> it - since > > >>> we have the num-segments count in the first cache line. If we do have a > > >>> multi-segment > > >>> mbuf, it's likely to be a big packet, so we have more processing time > > >>> available > > >>> and we can then take the hit of setting the next pointer. Whenever we > > >>> go to > > >>> free that mbuf for that packet, the code to do the freeing obviously > > >>> needs to > > >>> read the next pointer so as to free all the buffers in the chain, and > > >>> so it can > > >>> also reset the next pointer to NULL when doing so. > > >>> > > >>> In this way, we can ensure that the next pointer on cache line 1 is not > > >>> a problem > > >>> in our fast path. > > >> > > >> This is a good idea, but looking at the drivers, it seems that today > > >> they all set m->next to NULL in the rx function. What you are suggesting > > >> is to remove all of them, and document somewhere that all mbufs in a > > >> pool are supposed to have their m->next set to NULL, correct? > > >> > > >> I think what you are describing could also apply to reference counter > > >> (set to 1 by default), right? > > > > > > We probably can reset next to NULL at __rte_pktmbuf_prefree_seg(), > > > at the same time we do reset refcnt to 0. > > > Is that what you suggesting? > > > > Yes, I can give it a try. > > > > > Why would we need to change that function? The main free_seg function (which > is called from rte_pktmbuf_free() function) already sets the next pointer to > NULL? Is there some edge case in the code now where we are missing setting > the next pointer to NULL on free?
ixgbe_tx_free_bufs() at drivers/net/ixgbe/ixgbe_rxtx_vec.c > > Also, any code that is not using the regular free functions i.e. mbuf_free or > free_seg - anything not starting with "-" :-) - is responsible itself for > ensuring that it frees things correctly. If it doesn't set next to NULL when > it > needs to - it needs to be fixed there, rather than changing the prefree_seg > function. Why do think it would be a problem? It seems like a good idea, tohide it inside __rte_pktmbuf_prefree_seg(), So users don't have to set it to NULL manually each time. Konstantin > > /Bruce > > > > > > > > > Konstantin > > > > > >> > > >> > > >> Olivier > > >> > > >> > > >>> > > >>> /Bruce > > >>>