> -----Original Message-----
> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> Sent: Monday, October 19, 2015 5:30 PM
> To: Ananyev, Konstantin; Richardson, Bruce; dev at dpdk.org
> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive function
>
> On 15/10/15 16:43, Ananyev, Konstantin wrote:
> >
> >> -----Original Message-----
> >> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> >> Sent: Thursday, October 15, 2015 11:33 AM
> >> To: Ananyev, Konstantin; Richardson, Bruce; dev at dpdk.org
> >> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive function
> >>
> >> On 15/10/15 00:23, Ananyev, Konstantin wrote:
> >>>
> >>>> -----Original Message-----
> >>>> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> >>>> Sent: Wednesday, October 14, 2015 5:11 PM
> >>>> To: Ananyev, Konstantin; Richardson, Bruce; dev at dpdk.org
> >>>> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive function
> >>>>
> >>>> On 28/09/15 00:19, Ananyev, Konstantin wrote:
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> >>>>>> Sent: Friday, September 25, 2015 7:29 PM
> >>>>>> To: Richardson, Bruce; dev at dpdk.org
> >>>>>> Cc: Ananyev, Konstantin
> >>>>>> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive function
> >>>>>>
> >>>>>> On 07/09/15 07:41, Richardson, Bruce wrote:
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> >>>>>>>> Sent: Monday, September 7, 2015 3:15 PM
> >>>>>>>> To: Richardson, Bruce; dev at dpdk.org
> >>>>>>>> Cc: Ananyev, Konstantin
> >>>>>>>> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive function
> >>>>>>>>
> >>>>>>>> On 07/09/15 13:57, Richardson, Bruce wrote:
> >>>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> >>>>>>>>>> Sent: Monday, September 7, 2015 1:26 PM
> >>>>>>>>>> To: dev at dpdk.org
> >>>>>>>>>> Cc: Ananyev, Konstantin; Richardson, Bruce
> >>>>>>>>>> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive function
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> I just realized I've missed the "[PATCH]" tag from the subject. Did
> >>>>>>>>>> anyone have time to review this?
> >>>>>>>>>>
> >>>>>>>>> Hi Zoltan,
> >>>>>>>>>
> >>>>>>>>> The big thing that concerns me with this is the addition of new
> >>>>>>>>> instructions for each packet in the fast path. Ideally, this
> >>>>>>>>> prefetching would be better handled in the application itself, as for
> >>>>>>>>> some apps, e.g. those using pipelining, the core doing the RX from the
> >>>>>>>>> NIC may not touch the packet data at all, and the prefetches will
> >>>>>>>>> instead cause a performance slowdown.
> >>>>>>>>> Is it possible to get the same performance increase - or something
> >>>>>>>>> close to it - by making changes in OVS?
> >>>>>>>> OVS already does a prefetch when it's processing the previous packet, but
> >>>>>>>> apparently it's not early enough, at least for my test scenario, where I'm
> >>>>>>>> forwarding UDP packets with the least possible overhead. I guess in tests
> >>>>>>>> where OVS does more complex processing it should be fine.
> >>>>>>>> I'll try to move the prefetch earlier in the OVS codebase, but I'm not
> >>>>>>>> sure if it'll help.
> >>>>>>> I would suggest trying to prefetch more than one packet ahead.
> >>>>>>> Prefetching 4 or 8 ahead might work better, depending on the
> >>>>>>> processing being done.
> >>>>>>
> >>>>>> I've moved the prefetch earlier, and it seems to work:
> >>>>>>
> >>>>>> https://patchwork.ozlabs.org/patch/519017/
> >>>>>>
> >>>>>> However it raises the question: should we remove header prefetch from
> >>>>>> all the other drivers, and the CONFIG_RTE_PMD_PACKET_PREFETCH option?
> >>>>>
> >>>>> My vote would be for that.
> >>>>> Konstantin
> >>>>
> >>>> After some further thinking I would rather support the
> >>>> rte_packet_prefetch() macro (prefetch the header in the driver, and
> >>>> configure it through CONFIG_RTE_PMD_PACKET_PREFETCH):
> >>>>
> >>>> - the prefetch can happen earlier, so if an application needs the header
> >>>> right away, this is the fastest option
> >>>> - if the application doesn't need header prefetch, it can turn it off at
> >>>> compile time, same as if we didn't have this option
> >>>> - if the application has mixed needs (sometimes it needs the header
> >>>> right away, sometimes it doesn't), it can turn it off and do what it
> >>>> needs, same as if we didn't have this option
> >>>>
> >>>> A harder decision would be whether it should be turned on or off by
> >>>> default. Currently it's on, and half of the receive functions don't use
> >>>> it.
> >>>
> >>> Yep, it is sort of a mess right now.
> >>> Another question, if we'd like to keep it and standardise it:
> >>> at what moment should we prefetch - as soon as we realize that the HW is
> >>> done with that buffer, or as late inside rx_burst() as possible?
> >>> I suppose there is no clear answer for that.
> >> I think if the application needs the driver to start the prefetching, it
> >> does so because it's already too late when rte_eth_rx_burst() returns.
> >> So I think it's best to do it as soon as we learn that the HW is done.
> >
> > Probably, but there might be situations when it would be more plausible to
> > do it later too.
> Could you elaborate?
> If the application wants the prefetch to start earlier than it could do
> itself (right after rte_eth_rx_burst() returns), earlier is better, so
> it has the best chance of having the data in cache.
> If it needs the data later, it can do the prefetch by itself.
>
> > Again, it might depend on the particular HW and your memory access pattern.
> >
> >>
> >>> That's why my thought was to just get rid of it.
> >>> Though if it were implemented in some standardized way and disabled
> >>> by default, that's probably ok too.
> >>> BTW, couldn't that be implemented just via the rte_ethdev rx callbacks
> >>> mechanism?
> >>> Then we could have the code in one place and wouldn't need a compile
> >>> option at all - it could be enabled/disabled dynamically on a per-NIC
> >>> basis.
> >>> Or would the overhead of that be too high?
> >>
> >> I think if we go that way, it's better to pass an option to
> >> rte_eth_rx_burst() and tell it whether you want the header prefetched by
> >> the driver. That would be the most flexible.
> >
> > That would mean ABI breakage for rx_burst()...
> > Probably then better a new parameter in rx_conf or something.
> > Though it still means that each PMD has to implement it on its own.
> That would be the case in any way, as we are talking about
> prefetching in the receive function.
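For illustration only, a minimal sketch of the application-side approach
suggested above: prefetching the headers of packets a few positions ahead of
the one being processed, after rte_eth_rx_burst() returns. BURST_SIZE,
PREFETCH_AHEAD, app_rx_loop() and process_packet() are hypothetical names, not
from the patch; the useful look-ahead depth depends on the processing being
done.

    /* Hypothetical application RX loop: prefetch headers a few packets
     * ahead while processing the current one, instead of relying on the
     * PMD to do it. Needs rte_ethdev.h, rte_mbuf.h and rte_prefetch.h. */

    #define BURST_SIZE     32
    #define PREFETCH_AHEAD  4   /* tune per workload, e.g. 4 or 8 as suggested */

    static void
    app_rx_loop(uint8_t port_id, uint16_t queue_id)
    {
        struct rte_mbuf *pkts[BURST_SIZE];
        uint16_t nb_rx, i;

        nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, BURST_SIZE);

        /* Start the first few prefetches before touching any packet data. */
        for (i = 0; i < PREFETCH_AHEAD && i < nb_rx; i++)
            rte_prefetch0(rte_pktmbuf_mtod(pkts[i], void *));

        for (i = 0; i < nb_rx; i++) {
            if (i + PREFETCH_AHEAD < nb_rx)
                rte_prefetch0(rte_pktmbuf_mtod(pkts[i + PREFETCH_AHEAD],
                            void *));
            process_packet(pkts[i]);    /* hypothetical per-packet work */
        }
    }

If the driver-side prefetch were disabled, this is the kind of loop the
application would need; whether the prefetch starts early enough depends on
how much per-packet work there is to hide the cache miss behind.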
>
> > And again, there might be PMDs that would just ignore that parameter.
> If the PMD has a good reason to do that (e.g. the prefetch has undesirable
> effects), I think that's fine.
>
> > While with a callback it would all be in one known place.
> Who would call that callback, and when? If the application called it after
> rte_eth_rx_burst() returned, it wouldn't be of much use; the application
> could just call prefetch by itself.
I am talking about the callbacks called by rx_burst() itself:

static inline uint16_t
rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
         struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
{
    struct rte_eth_dev *dev;

    dev = &rte_eth_devices[port_id];

    int16_t nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
            rx_pkts, nb_pkts);

#ifdef RTE_ETHDEV_RXTX_CALLBACKS
    struct rte_eth_rxtx_callback *cb = dev->post_rx_burst_cbs[queue_id];

    if (unlikely(cb != NULL)) {
        do {
            nb_rx = cb->fn.rx(port_id, queue_id, rx_pkts, nb_rx,
                    nb_pkts, cb->param);
            cb = cb->next;
        } while (cb != NULL);
    }
#endif

    return nb_rx;
}

Konstantin

> > But again, I didn't measure it, so I'm not sure what the impact of the
> > callback itself would be.
> > It might not be worth it.
> > Konstantin
> >
> >>
> >>> Konstantin
> >>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>>> /Bruce
> >>>>>
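For completeness, a rough, unmeasured sketch of how the header prefetch could
be hooked into that mechanism with rte_eth_add_rx_callback(), so it could be
enabled per port/queue at run time rather than through
CONFIG_RTE_PMD_PACKET_PREFETCH. The callback name and the registration snippet
are hypothetical.

    /* Hypothetical post-RX callback that prefetches each packet header.
     * Needs rte_ethdev.h, rte_mbuf.h, rte_prefetch.h and rte_common.h. */
    static uint16_t
    prefetch_hdr_cb(uint8_t port __rte_unused, uint16_t queue __rte_unused,
            struct rte_mbuf *pkts[], uint16_t nb_pkts,
            uint16_t max_pkts __rte_unused, void *user_param __rte_unused)
    {
        uint16_t i;

        for (i = 0; i < nb_pkts; i++)
            rte_prefetch0(rte_pktmbuf_mtod(pkts[i], void *));

        return nb_pkts;
    }

    /* In application init, for each RX queue that wants the prefetch: */
    if (rte_eth_add_rx_callback(port_id, queue_id, prefetch_hdr_cb, NULL) == NULL)
        printf("failed to add RX prefetch callback on port %u queue %u\n",
                port_id, queue_id);

As noted above, such a callback only runs after the PMD's RX loop has
completed, so the prefetch starts later than one issued from inside rx_burst()
itself; whether that is early enough, and whether the callback overhead is
acceptable, would need measuring.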