> From: Andrew Rybchenko [mailto:arybche...@solarflare.com]
> Sent: Friday, January 20, 2017 10:26 AM
> To: Yang, Zhiyong <zhiyong.y...@intel.com>; dev@dpdk.org
> Cc: thomas.monja...@6wind.com; Richardson, Bruce <bruce.richard...@intel.com>;
>     Ananyev, Konstantin <konstantin.anan...@intel.com>
> Subject: Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior
>
> On 01/20/2017 12:51 PM, Zhiyong Yang wrote:
> The rte_eth_tx_burst() function in the file rte_ethdev.h is invoked to
> transmit output packets on the output queue of a DPDK application as
> follows:
>
> static inline uint16_t
> rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
>                  struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
>
> Note: the fourth parameter, nb_pkts, is the number of packets to transmit.
> The rte_eth_tx_burst() function returns the number of packets it actually
> sent. A return value equal to *nb_pkts* means that all packets have been
> sent, which likely signifies that further output packets could be
> transmitted immediately. Applications that implement a "send as many
> packets as possible" policy can check for this specific case and keep
> invoking the rte_eth_tx_burst() function until a value less than
> *nb_pkts* is returned.
>
> When you call TX only once via rte_eth_tx_burst(), you may get different
> behavior from different PMDs. One problem every DPDK user has to face is
> that this policy has to be handled at the application level for whichever
> PMD is used to send the packets, whether or not it is necessary. This adds
> usage complexity and easily confuses DPDK users, since they have to learn
> the TX limits of each specific PMD and handle the different return values
> (the number of packets transmitted successfully) for the various PMDs.
> Some PMD TX functions have a limit of sending at most 32 packets per
> invocation, some PMDs have a limit of at most 64 packets, others are
> implemented to send as many packets as possible, etc. This easily leads
> to incorrect usage by DPDK users.
>
> This patch proposes to implement the above policy in the DPDK library in
> order to simplify application implementations and avoid incorrect usage.
> DPDK users then no longer need to consider the implementation policy or
> write duplicated code at the application level when sending packets. In
> addition, users do not need to know the differences between specific PMD
> TX routines and can transmit an arbitrary number of packets when invoking
> the TX API rte_eth_tx_burst(), then check the return value to get the
> number of packets actually sent.
>
> How to implement the policy in the DPDK library? Two solutions are
> proposed below.
>
> Solution 1:
> Implement wrapper functions that remove these limits for each specific
> PMD, as i40e_xmit_pkts_simple and ixgbe_xmit_pkts_simple already do.

> > IMHO, this solution is a bit better since it:
> > 1. Does not affect other PMDs at all.
> > 2. Could be a bit faster for the PMDs which require it, since it has no
> >    indirect function call on each iteration.
> > 3. Needs no ABI change.
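In other words, solution 1 amounts to each such PMD wrapping its burst-limited
TX routine in a chunking loop. A minimal sketch of the idea is below; the names
xxx_xmit_fixed_burst(), xxx_xmit_pkts() and XXX_TX_MAX_BURST are placeholders,
not the actual i40e/ixgbe code.

    #include <stdint.h>
    #include <rte_common.h> /* RTE_MIN */
    #include <rte_mbuf.h>

    /* Hypothetical burst-limited routine, standing in for the existing
     * per-PMD TX function that can send at most XXX_TX_MAX_BURST packets
     * per call. */
    uint16_t xxx_xmit_fixed_burst(void *txq, struct rte_mbuf **tx_pkts,
                                  uint16_t nb_pkts);

    #define XXX_TX_MAX_BURST 32

    /* Wrapper that the PMD could expose as its TX burst callback: keep
     * calling the limited routine in chunks until all packets are sent
     * or the TX ring fills up. */
    static uint16_t
    xxx_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
    {
            uint16_t nb_tx = 0;

            while (nb_pkts) {
                    uint16_t ret, n;

                    n = (uint16_t)RTE_MIN(nb_pkts, XXX_TX_MAX_BURST);
                    ret = xxx_xmit_fixed_burst(txq, &tx_pkts[nb_tx], n);
                    nb_tx = (uint16_t)(nb_tx + ret);
                    nb_pkts = (uint16_t)(nb_pkts - ret);
                    if (ret < n)
                            break; /* ring full, let the application retry */
            }
            return nb_tx;
    }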
I also would prefer solution number 1, for the reasons outlined by Andrew
above.
Also, IMO the current limits on the number of packets to TX in some Intel
PMD TX routines are somewhat artificial:
- they are not caused by any real HW limitations
- avoiding them at the PMD level shouldn't cause any performance or
  functional degradation.
So I don't see any good reason why, instead of fixing these limitations in
our own PMDs, we are trying to push them up to the upper (rte_ethdev) layer.

Konstantin

> Solution 2:
> Implement the policy in the function rte_eth_tx_burst() at the ethdev
> layer in a more consistent batching way: make a best effort to send
> *nb_pkts* packets in bursts of no more than 32 by default, since many
> DPDK TX PMDs use this max TX burst size (32). In addition, a data member
> that defines the max TX burst size, such as "uint16_t max_tx_burst_pkts;",
> will be added to rte_eth_dev_data, which drivers can override if they
> work with bursts of 64 or another number (thanks to Bruce
> <bruce.richard...@intel.com> for the suggestion). This keeps the
> performance impact to a minimum.

> > I see no noticeable difference in performance, so I don't mind if this
> > is finally chosen.
> > Just be sure that you update all PMDs to set reasonable default values,
> > or maybe even better, set UINT16_MAX in a generic place - 0 is a bad
> > default here.
> > (Lost a few seconds wondering why nothing was sent and it could not stop.)

> I prefer the latter of the two solutions because it makes the DPDK code
> more consistent and easier to use, and avoids writing too much duplicated
> logic in the DPDK source code. In addition, I think solution 2 introduces
> little or no performance drop, but it does introduce an ABI change.
>
> In fact, the current rte_eth_rx_burst() function uses a similar mechanism
> and faces the same problem as rte_eth_tx_burst():
>
> static inline uint16_t
> rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
>                  struct rte_mbuf **rx_pkts, const uint16_t nb_pkts);
>
> Applications are responsible for implementing the policy "retrieve as many
> received packets as possible", checking for this specific case and
> invoking the rte_eth_rx_burst() function repeatedly until a value less
> than *nb_pkts* is returned.
>
> This patch proposes to apply the above method to rte_eth_rx_burst() as
> well.
>
> In summary, the purpose of this RFC is to make the job easier and simpler
> for driver writers and to avoid writing too much duplicated code at the
> application level.
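For reference, a rough sketch of what solution 2 could look like at the ethdev
layer. It assumes the proposed max_tx_burst_pkts member in rte_eth_dev_data
(which does not exist today) and is not actual rte_ethdev code; the function
is given a distinct name only to keep the sketch self-contained.

    #include <stdint.h>
    #include <rte_common.h> /* RTE_MIN */
    #include <rte_ethdev.h>

    /* Sketch of the proposed rte_eth_tx_burst() change: chunk the request
     * by a per-device max TX burst size. max_tx_burst_pkts is the field
     * proposed in this thread; per Andrew's remark it should default to
     * UINT16_MAX rather than 0 when a driver sets no explicit limit. */
    static inline uint16_t
    eth_tx_burst_sketch(uint8_t port_id, uint16_t queue_id,
                        struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
    {
            struct rte_eth_dev *dev = &rte_eth_devices[port_id];
            uint16_t max_burst = dev->data->max_tx_burst_pkts; /* proposed */
            uint16_t nb_tx = 0;

            while (nb_pkts) {
                    uint16_t ret, n;

                    n = (uint16_t)RTE_MIN(nb_pkts, max_burst);
                    ret = (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id],
                                               &tx_pkts[nb_tx], n);
                    nb_tx = (uint16_t)(nb_tx + ret);
                    nb_pkts = (uint16_t)(nb_pkts - ret);
                    if (ret < n)
                            break; /* queue full, report what was sent */
            }
            return nb_tx;
    }

With either solution the application keeps the simple pattern of calling the
burst API once with an arbitrary nb_pkts and relying on the return value for
the number of packets actually sent.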