On Mon, Jun 15, 2015 at 04:28:55PM +0100, Ananyev, Konstantin wrote: > > > > -----Original Message----- > > From: Richardson, Bruce > > Sent: Monday, June 15, 2015 4:24 PM > > To: Olivier MATZ > > Cc: Ananyev, Konstantin; dev at dpdk.org; Damjan Marion (damarion) > > Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline > > > > On Mon, Jun 15, 2015 at 05:19:27PM +0200, Olivier MATZ wrote: > > > > > > > > > On 06/15/2015 04:52 PM, Ananyev, Konstantin wrote: > > > > Hi Olivier, > > > > > > > >> -----Original Message----- > > > >> From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > > > >> Sent: Monday, June 15, 2015 3:31 PM > > > >> To: Richardson, Bruce > > > >> Cc: Ananyev, Konstantin; dev at dpdk.org; Damjan Marion (damarion) > > > >> Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline > > > >> > > > >> > > > >> > > > >> On 06/15/2015 04:12 PM, Bruce Richardson wrote: > > > >>> On Mon, Jun 15, 2015 at 04:05:05PM +0200, Olivier MATZ wrote: > > > >>>> Hi, > > > >>>> > > > >>>> On 06/15/2015 03:54 PM, Ananyev, Konstantin wrote: > > > >>>>> > > > >>>>> > > > >>>>>> -----Original Message----- > > > >>>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce > > > >>>>>> Richardson > > > >>>>>> Sent: Monday, June 15, 2015 2:44 PM > > > >>>>>> To: Olivier MATZ > > > >>>>>> Cc: dev at dpdk.org; Damjan Marion (damarion) > > > >>>>>> Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline > > > >>>>>> > > > >>>>>> On Mon, Jun 15, 2015 at 03:20:22PM +0200, Olivier MATZ wrote: > > > >>>>>>> Hi Damjan, > > > >>>>>>> > > > >>>>>>> On 06/10/2015 11:47 PM, Damjan Marion (damarion) wrote: > > > >>>>>>>> > > > >>>>>>>> Hi, > > > >>>>>>>> > > > >>>>>>>> We noticed 7% performance improvement by simply moving > > > >>>>>>>> rte_mbuf.next field to the 1st cache line. > > > >>>>>>>> > > > >>>>>>>> Currently, it falls under /* second cache line - fields only > > > >>>>>>>> used in slow path or on TX */ > > > >>>>>>>> but it is actually used at several places in rx fast path. > > > >>>>>>>> (e.g.: i40e_rx_alloc_bufs() is setting that field to NULL). > > > >>>>>>>> > > > >>>>>>>> Is there anything we can do here (stop using next field, or move > > > >>>>>>>> it to 1st cache line)? > > > >>>>>>> > > > >>>>>>> Agree, this is also something I noticed, see: > > > >>>>>>> http://dpdk.org/ml/archives/dev/2015-February/014400.html > > > >>>>>>> > > > >>>>>>> I did not have the time to do performance testing, but it's > > > >>>>>>> something > > > >>>>>>> I'd like to do as soon as I can. I don't see any obvious reason > > > >>>>>>> not to > > > >>>>>>> do it. > > > >>>>>>> > > > >>>>>>> It seems we currently just have enough room to do it (8 bytes are > > > >>>>>>> remaining in the first cache line when compiled in 64 bits). > > > >>>>>> > > > >>>>>> This, to me, is the obvious reason not to do it! It prevents us > > > >>>>>> from taking in > > > >>>>>> any other offload fields in the RX fast-path into the mbuf. > > > >>>>>> > > > >>>>>> That being said, I can see why we might want to look to move it - > > > >>>>>> but from the > > > >>>>>> work done in the ixgbe driver, I'd be hopeful we can get as much > > > >>>>>> performance with > > > >>>>>> it on the second cache line for most cases, through judicious use > > > >>>>>> of prefetching, > > > >>>>>> or otherwise. > > > >>>>>> > > > >>>>>> It took a lot of work and investigation to get free space in the > > > >>>>>> mbuf - especially > > > >>>>>> in cache line 0, and I'd like to avoid just filling the cache line > > > >>>>>> up again as > > > >>>>>> long as we possibly can! > > > >>>>> > > > >>>>> Yep, agree with Bruce here. > > > >>>>> Plus, with packet_type going to be 4B and vlan_tci_outer, > > > >>>>> we just don't have 8 free bytes at the first cache line any more. > > > >>>> > > > >>>> I don't understand why m->next would not be a better candidate than > > > >>>> rx offload fields to be in the first cache line. For instance, > > > >>>> m->next > > > >>>> is mandatory and must be initialized when allocating a mbuf (to be > > > >>>> compared with m->seqn for instance, which is also in the first cache > > > >>>> line). So if we want to do some room in the first cache line, I > > > >>>> think we can. > > > >>>> > > > >>>> To me, the only reason for not doing it now is because we don't > > > >>>> have a full performance test report (several use-cases, drivers, ...) > > > >>>> that shows it's better. > > > >>>> > > > >>> Because the "next" field is not mandatory to be set on > > > >>> initialization. It can > > > >>> instead be set only when needed, and cleared on free if it is used. > > > >>> > > > >>> The next pointers always start out as NULL when the mbuf pool is > > > >>> created. The > > > >>> only time it is set to non-NULL is when we have chained mbufs. If we > > > >>> never have > > > >>> any chained mbufs, we never need to touch the next field, or even > > > >>> read it - since > > > >>> we have the num-segments count in the first cache line. If we do have > > > >>> a multi-segment > > > >>> mbuf, it's likely to be a big packet, so we have more processing time > > > >>> available > > > >>> and we can then take the hit of setting the next pointer. Whenever we > > > >>> go to > > > >>> free that mbuf for that packet, the code to do the freeing obviously > > > >>> needs to > > > >>> read the next pointer so as to free all the buffers in the chain, and > > > >>> so it can > > > >>> also reset the next pointer to NULL when doing so. > > > >>> > > > >>> In this way, we can ensure that the next pointer on cache line 1 is > > > >>> not a problem > > > >>> in our fast path. > > > >> > > > >> This is a good idea, but looking at the drivers, it seems that today > > > >> they all set m->next to NULL in the rx function. What you are > > > >> suggesting > > > >> is to remove all of them, and document somewhere that all mbufs in a > > > >> pool are supposed to have their m->next set to NULL, correct? > > > >> > > > >> I think what you are describing could also apply to reference counter > > > >> (set to 1 by default), right? > > > > > > > > We probably can reset next to NULL at __rte_pktmbuf_prefree_seg(), > > > > at the same time we do reset refcnt to 0. > > > > Is that what you suggesting? > > > > > > Yes, I can give it a try. > > > > > > > > Why would we need to change that function? The main free_seg function (which > > is called from rte_pktmbuf_free() function) already sets the next pointer to > > NULL? Is there some edge case in the code now where we are missing setting > > the next pointer to NULL on free? > > ixgbe_tx_free_bufs() at drivers/net/ixgbe/ixgbe_rxtx_vec.c >
Not a gap - the vector TX functions cannot deal with scattered packets generally, not just freeing them :-) > > > > Also, any code that is not using the regular free functions i.e. mbuf_free > > or > > free_seg - anything not starting with "-" :-) - is responsible itself for > > ensuring that it frees things correctly. If it doesn't set next to NULL > > when it > > needs to - it needs to be fixed there, rather than changing the prefree_seg > > function. > > Why do think it would be a problem? > It seems like a good idea, tohide it inside __rte_pktmbuf_prefree_seg(), > So users don't have to set it to NULL manually each time. > I'm just nervous of slowing things down. Adding an extra instruction here applies to every single packet - it's not just per burst. That's why I'd like to be sure there is a definite problem before adding in more instructions here. /Bruce > Konstantin > > > > > /Bruce > > > > > > > > > > > > > Konstantin > > > > > > > >> > > > >> > > > >> Olivier > > > >> > > > >> > > > >>> > > > >>> /Bruce > > > >>>