Hi Adrien,

Sorry for the late reply due to the Chinese New Year holiday.
> -----Original Message-----
> From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> Sent: Tuesday, January 24, 2017 12:36 AM
> To: Richardson, Bruce <bruce.richard...@intel.com>
> Cc: Ananyev, Konstantin <konstantin.anan...@intel.com>; Andrew
> Rybchenko <arybche...@solarflare.com>; Yang, Zhiyong
> <zhiyong.y...@intel.com>; dev@dpdk.org; thomas.monja...@6wind.com
> Subject: Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching
> behavior
>
> On Fri, Jan 20, 2017 at 11:48:22AM +0000, Bruce Richardson wrote:
> > On Fri, Jan 20, 2017 at 11:24:40AM +0000, Ananyev, Konstantin wrote:
> > >
> > > > From: Andrew Rybchenko [mailto:arybche...@solarflare.com]
> > > > Sent: Friday, January 20, 2017 10:26 AM
> > > > To: Yang, Zhiyong <zhiyong.y...@intel.com>; dev@dpdk.org
> > > > Cc: thomas.monja...@6wind.com; Richardson, Bruce
> > > > <bruce.richard...@intel.com>; Ananyev, Konstantin
> > > > <konstantin.anan...@intel.com>
> > > > Subject: Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD
> > > > batching behavior
> > > >
> > > > On 01/20/2017 12:51 PM, Zhiyong Yang wrote:
> > > > The rte_eth_tx_burst() function in the file rte_ethdev.h is
> > > > invoked by DPDK applications to transmit output packets on an
> > > > output queue, as follows:
> > > >
> > > > static inline uint16_t
> > > > rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
> > > >                  struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
> > > >
> > > > Note: the fourth parameter, nb_pkts, is the number of packets to
> > > > transmit. The rte_eth_tx_burst() function returns the number of
> > > > packets it actually sent. A return value equal to *nb_pkts* means
> > > > that all packets have been sent, which likely signifies that more
> > > > output packets could be transmitted immediately. Applications that
> > > > implement a "send as many packets as possible" policy can check
> > > > for this specific case and keep invoking rte_eth_tx_burst() until
> > > > a value less than *nb_pkts* is returned.
> > > >
> > > > When you call TX only once via rte_eth_tx_burst, you may get
> > > > different behavior from different PMDs. One problem every DPDK
> > > > user has to face is that they need to take this policy into
> > > > consideration at the application level when using any specific PMD
> > > > to send packets, whether or not it is necessary. This adds usage
> > > > complexity and easily confuses DPDK users, since they have to
> > > > learn the TX limits of each specific PMD and handle its particular
> > > > return value (the number of packets transmitted successfully).
> > > > Some PMD TX functions have a limit of sending at most 32 packets
> > > > per invocation, some have a limit of at most 64 packets, others
> > > > are implemented to send as many packets as possible, and so on.
> > > > This easily leads to incorrect usage by DPDK users.
> > > >
> > > > This patch proposes to implement the above policy in the DPDK
> > > > library in order to simplify application code and avoid incorrect
> > > > invocations as well. DPDK users then no longer need to consider
> > > > the implementation policy or write the same duplicated code at the
> > > > application level when sending packets. In addition, users don't
> > > > need to know the differences between specific PMD TX functions and
> > > > can transmit an arbitrary number of packets in one
> > > > rte_eth_tx_burst() call, then check the return value for the
> > > > number of packets actually sent.
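To make the policy above concrete, this is roughly the retry loop that
every application carries today (a minimal sketch; send_all is a made-up
name, and a real application would bound the retries or drop packets if
the queue stays full rather than spin):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* "Send as many packets as possible": keep calling rte_eth_tx_burst()
 * until the whole burst is queued.  Each call may accept fewer packets
 * than requested, depending on the PMD's per-call limit and on how many
 * free descriptors the TX ring has. */
static void
send_all(uint8_t port_id, uint16_t queue_id,
         struct rte_mbuf **pkts, uint16_t nb_pkts)
{
        uint16_t sent = 0;

        while (sent < nb_pkts)
                sent += rte_eth_tx_burst(port_id, queue_id,
                                         pkts + sent, nb_pkts - sent);
}

This is exactly the duplicated snippet the RFC wants to pull out of
applications.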
> > > > How to implement the policy in the DPDK library? Two solutions
> > > > are proposed below.
> > > >
> > > > Solution 1:
> > > > Implement wrapper functions that remove such limits for each
> > > > specific PMD, as i40e_xmit_pkts_simple and ixgbe_xmit_pkts_simple
> > > > already do.
> > > >
> > > > > IMHO, this solution is a bit better since it:
> > > > > 1. Does not affect other PMDs at all
> > > > > 2. Could be a bit faster for the PMDs which require it, since
> > > > >    there is no indirect function call on each iteration
> > > > > 3. No ABI change
> > >
> > > I also would prefer solution number 1 for the reasons outlined by
> > > Andrew above.
> > > Also, IMO the current limits on the number of packets to TX in some
> > > Intel PMD TX routines are sort of artificial:
> > > - they are not caused by any real HW limitations
> > > - avoiding them at the PMD level shouldn't cause any performance or
> > >   functional degradation.
> > > So I don't see any good reason why, instead of fixing these
> > > limitations in our own PMDs, we are trying to push them to the
> > > upper (rte_ethdev) layer.
>
> For what it's worth, I agree with Konstantin. Wrappers should be as
> thin as possible on top of PMD functions; they are not helpers. We
> could define a set of higher-level functions for this purpose, though.
>
> In the meantime, exposing and documenting PMD limitations seems safe
> enough.
>
> We could assert that RX/TX burst requests larger than the size of the
> target queue are unlikely to be fully met (i.e. PMDs usually do not
> check for completions in the middle of a TX burst).

As a tool, it is very important that DPDK is easy for its users to
consume and to use in the right way. This sort of artificial limit makes
things look a little confusing and can get some users into trouble when
writing drivers. Why don't we fix it and make things easier? :)

Zhiyong
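PS: To show what a solution 1 wrapper could look like, here is a minimal
sketch for a PMD whose fast path handles at most 32 packets per call.
The xyz_* names, XYZ_TX_MAX_BURST, and the stubbed fixed-burst routine
are all made up for this example; the real i40e/ixgbe code differs in
its details:

#include <rte_common.h>
#include <rte_mbuf.h>

#define XYZ_TX_MAX_BURST 32     /* hypothetical fast-path per-call limit */

/* Stand-in for the PMD's real fixed-size burst routine: accepts at most
 * XYZ_TX_MAX_BURST packets and returns how many it queued.  Stubbed out
 * here only so the sketch is self-contained. */
static uint16_t
xyz_xmit_fixed_burst(void *txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
{
        (void)txq;
        (void)tx_pkts;
        return nb_pkts;         /* pretend every packet was queued */
}

/* Wrapper: chops an arbitrary burst into chunks the fast path accepts,
 * and stops early as soon as the queue pushes back, so the return value
 * keeps its usual meaning (number of packets actually queued). */
static uint16_t
xyz_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
{
        uint16_t nb_tx = 0;

        while (nb_pkts > 0) {
                uint16_t n = RTE_MIN(nb_pkts, (uint16_t)XYZ_TX_MAX_BURST);
                uint16_t ret = xyz_xmit_fixed_burst(txq, tx_pkts + nb_tx, n);

                nb_tx += ret;
                nb_pkts -= ret;
                if (ret < n)    /* TX ring full: report partial send */
                        break;
        }
        return nb_tx;
}

With a wrapper like this installed as the device's tx_pkt_burst
callback, the application-level retry loop shown earlier becomes
unnecessary for that PMD.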