Hi Zoltan,

> -----Original Message-----
> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> Sent: Wednesday, July 29, 2015 10:40 AM
> To: Ananyev, Konstantin; Richardson, Bruce; dev at dpdk.org
> Subject: Re: [dpdk-dev] ixgbe vPMD RX functions and buffer number minimum requirement
>
> Hi,
>
> On 28/07/15 01:10, Ananyev, Konstantin wrote:
> > Hi Zoltan,
> >
> >> -----Original Message-----
> >> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> >> Sent: Monday, July 27, 2015 12:38 PM
> >> To: Ananyev, Konstantin; Richardson, Bruce; dev at dpdk.org
> >> Subject: Re: [dpdk-dev] ixgbe vPMD RX functions and buffer number minimum requirement
> >>
> >> Hi Konstantin,
> >>
> >> Thanks! Another question I would have: why does _recv_raw_pkts_vec()
> >> insist on (nb_pkts > RTE_IXGBE_VPMD_RX_BURST)? Looking at the code it
> >> should be able to return packets when nb_pkts >= RTE_IXGBE_DESCS_PER_LOOP.
> >
> > Yes, that seems like a pretty trivial modification.
> > I don't know any good reason why it wasn't done that way.
> >
> >> The split_flags check in ixgbe_recv_scattered_pkts_vec() would be a bit
> >> more complicated, and therefore might have a tiny performance overhead
> >> as well, but I don't think it would be anything serious.
> >
> > I think, if the performance isn't affected, that would be a really
> > useful change, so it is definitely worth a try.
> > Probably even _recv_raw_pkts_vec() for the first
> > nb_pkts & ~(RTE_IXGBE_VPMD_RX_BURST - 1) packets,
> > and then a sort of scalar analog for the remainder.
>
> Ok, I'll give it a go.
> Another question, regarding performance: what setup did you use to show a
> performance difference? I've tried to compare the vector function with the
> normal bulk alloc while receiving a 10 Gbps stream of 64-byte UDP packets
> (and forwarding them out on the other port), but both yielded ~14 Mpps.
> I have an i5-4570 CPU @ 3.20GHz; maybe I should limit the clock speed?
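Before getting to your performance question: to make the splitting idea above
a bit more concrete, here is a rough, untested sketch of the kind of wrapper
I had in mind. The function name ixgbe_recv_pkts_vec_any() is made up, and
the tail call assumes _recv_raw_pkts_vec() has already been relaxed to accept
any burst size that is a multiple of RTE_IXGBE_DESCS_PER_LOOP - that relaxed
variant is exactly the part that still has to be written:

static uint16_t
ixgbe_recv_pkts_vec_any(void *rx_queue, struct rte_mbuf **rx_pkts,
		uint16_t nb_pkts)
{
	struct ixgbe_rx_queue *rxq = rx_queue;
	uint16_t full = nb_pkts & ~(RTE_IXGBE_VPMD_RX_BURST - 1);
	uint16_t nb_rx = 0;

	/* full-size vector bursts for the first
	 * nb_pkts & ~(RTE_IXGBE_VPMD_RX_BURST - 1) packets */
	while (nb_rx < full) {
		uint16_t n = _recv_raw_pkts_vec(rxq, rx_pkts + nb_rx,
				RTE_IXGBE_VPMD_RX_BURST, NULL);
		nb_rx += n;
		if (n < RTE_IXGBE_VPMD_RX_BURST)
			return nb_rx; /* ring drained, stop early */
	}

	/* remainder, rounded down to whole descriptor loops; this is
	 * where the relaxed _recv_raw_pkts_vec() would be needed */
	nb_pkts = (nb_pkts - nb_rx) & ~(RTE_IXGBE_DESCS_PER_LOOP - 1);
	if (nb_pkts > 0)
		nb_rx += _recv_raw_pkts_vec(rxq, rx_pkts + nb_rx,
				nb_pkts, NULL);
	return nb_rx;
}

Note that a caller asking for fewer than RTE_IXGBE_DESCS_PER_LOOP packets
would still get 0 back; I don't see a way around that without leaving the
vector path.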
One port is not enough here if you are running @ 3.2 GHz.
For PMD testing I personally use a box with 4x10G ports, one port per PCI
lane, and run testpmd in io forwarding mode, all 4 ports over 1 core
@ 2.8 GHz.
If you don't have a similar box, let us know when you are done with your
changes and we can give it a try.
Yes, reducing the clock speed to the minimum is another good option.
Though I think just one 10G port wouldn't be enough even at 1.2 GHz;
probably 2x10G would do.

Konstantin

>
> Regards,
>
> Zoltan
>
> > Konstantin
> >
> >>
> >> Regards,
> >>
> >> Zoltan
> >>
> >>
> >> On 24/07/15 17:43, Ananyev, Konstantin wrote:
> >>> Hi Zoltan,
> >>>
> >>>> -----Original Message-----
> >>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zoltan Kiss
> >>>> Sent: Friday, July 24, 2015 4:00 PM
> >>>> To: Richardson, Bruce; dev at dpdk.org
> >>>> Subject: [dpdk-dev] ixgbe vPMD RX functions and buffer number minimum requirement
> >>>>
> >>>> Hi,
> >>>>
> >>>> I was thinking about how to handle the situation when you call
> >>>> rte_eth_rx_burst() with fewer than RTE_IXGBE_VPMD_RX_BURST buffers.
> >>>> In ODP-DPDK we unfortunately can't force this requirement onto the
> >>>> calling application.
> >>>> One idea I had was to check in ixgbe_recv_pkts_vec() whether
> >>>> nb_pkts < RTE_IXGBE_VPMD_RX_BURST, and call ixgbe_recv_pkts_bulk_alloc()
> >>>> in that case. Accordingly, in ixgbe_recv_scattered_pkts_vec() we could
> >>>> call ixgbe_recv_scattered_pkts() in this case. A branch predictor can
> >>>> easily eliminate the performance penalty of this, and applications could
> >>>> use whatever burst size is feasible for them.
> >>>> The obvious problem is whether you can mix the receive functions this
> >>>> way. I have a feeling it wouldn't fly, but I wanted to ask first before
> >>>> spending time investigating this option further.
> >>>
> >>> No, it is not possible to mix different RX functions; they set up and
> >>> use the ixgbe_rx_queue fields in different ways.
> >>> Konstantin
> >>>
> >>>>
> >>>> Regards,
> >>>>
> >>>> Zoltan
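P.S. In case it helps, the kind of testpmd run I mean looks roughly like the
following; the coremask and the number of memory channels are just examples
for my box, adjust them to yours:

./testpmd -c 0x3 -n 4 -- -i --nb-cores=1
testpmd> set fwd io
testpmd> start

That forwards between all probed ports in io mode over a single forwarding
core, which is the setup I described above.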