On 07/30/15 19:10, Zhang, Helin wrote:
>
>> -----Original Message-----
>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>> Sent: Thursday, July 30, 2015 7:58 AM
>> To: dev at dpdk.org; Ananyev, Konstantin; Zhang, Helin
>> Subject: RFC: i40e xmit path HW limitation
>>
>> Hi, Konstantin, Helin,
>> there is a documented limitation of xl710 controllers (i40e driver) which is
>> not handled in any way by the DPDK driver.
>> From the datasheet, chapter 8.4.1:
>>
>> "- A single transmit packet may span up to 8 buffers (up to 8 data
>> descriptors per packet, including both the header and payload buffers).
>> - The total number of data descriptors for the whole TSO (explained later on
>> in this chapter) is unlimited as long as each segment within the TSO obeys
>> the previous rule (up to 8 data descriptors per segment for both the TSO
>> header and the segment payload buffers)."
> Yes, I remember the RX side just supports 5 segments per packet receiving.
> But what's the possible issue you thought about?

Note that it's the Tx side we are talking about.
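The datasheet rule boils down to a per-chain descriptor count. A minimal sketch
of the check it implies for the non-TSO case follows; XL710_TX_MAX_SEG and
xl710_tx_needs_linearize() are names assumed here for illustration, not
something the i40e PMD actually defines:

#include <stdbool.h>
#include <rte_mbuf.h>

/* Assumed local define: 8 data descriptors per packet (and per TSO
 * segment) is the limit quoted from the xl710 datasheet, 8.4.1. */
#define XL710_TX_MAX_SEG 8

/* Non-TSO case only: each mbuf in the chain needs one data descriptor,
 * so a chain longer than 8 mbufs violates the HW rule and has to be
 * linearized before it is posted on the Tx ring. The TSO case is more
 * involved - every MSS worth of payload plus the header must also fit
 * into 8 descriptors. */
static inline bool
xl710_tx_needs_linearize(const struct rte_mbuf *m)
{
	return m->nb_segs > XL710_TX_MAX_SEG;
}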
See commit 30520831f058cd9d75c0f6b360bc5c5ae49b5f27 in the linux net-next repo.
If such a cluster arrives and you post it on the HW ring - HW will shut this HW
ring down permanently. The application will see that its ring is stuck.

>
>> This means that, for instance, a long cluster with small fragments has to be
>> linearized before it may be placed on the HW ring.
> What size are the small fragments? Basically 2KB is the default mbuf size in
> most example applications. 2KB x 8 is bigger than 1.5KB. So it is enough for
> the maximum packet size we support.
> If 1KB mbufs are used, don't expect it can transmit more than 8KB of packet.

I kinda lost you here. Again, we are talking about the Tx side here, and the
buffers are not necessarily completely filled. Namely, there may be a cluster
with 15 fragments of 100 bytes each.

>
>> In more standard environments like Linux or FreeBSD drivers the solution is
>> straightforward - call skb_linearize()/m_collapse() respectively.
>> In a non-conformist environment like DPDK life is not that easy - there is
>> no easy way to collapse the cluster into a linear buffer from inside the
>> device driver, since the device driver doesn't allocate memory in the fast
>> path and utilizes the user-allocated pools only.
>> Here are two proposals for a solution:
>>
>> 1. We may provide a callback that would return TRUE if a given
>>    cluster has to be linearized, and it should always be called before
>>    rte_eth_tx_burst(). Alternatively it may be called from inside
>>    rte_eth_tx_burst(), and rte_eth_tx_burst() is changed to return some
>>    error code for the case when one of the clusters it's given has to be
>>    linearized.
>> 2. Another option is to allocate a mempool in the driver with the
>>    elements consuming a single page each (standard 2KB buffers would
>>    do). The number of elements in the pool should be the Tx ring length
>>    multiplied by "64KB / (linear data length of the buffer in the pool
>>    above)". Here I use 64KB as the maximum packet length and do not take
>>    into account esoteric things like the "Giant" TSO mentioned in the
>>    spec above. Then we may actually go and linearize the cluster if
>>    needed on top of the buffers from the pool above, post the buffer
>>    from the mempool above on the HW ring, link the original cluster to
>>    that new cluster (using the private data) and release it when the
>>    send is done.
>>
>> The first option is a change in the API and would require some additional
>> handling (linearization) from the application. The second would require some
>> additional memory but would keep all the dirty details inside the driver and
>> would leave the rest of the code intact.
>>
>> Pls., comment.
>>
>> thanks,
>> vlad
>>
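For reference, a rough sketch of the linearization that the first option would
push to the application: copy the chain into a single mbuf taken from a pool
whose data room covers the whole packet. big_pool and linearize_chain() are
assumed names; copying of ol_flags/offload metadata and the per-2KB-buffer
splitting of the second option are left out:

#include <rte_mbuf.h>
#include <rte_memcpy.h>
#include <rte_mempool.h>

/* Copy a multi-segment cluster into a single mbuf allocated from
 * big_pool and free the original chain. big_pool is assumed to have a
 * data room large enough for the biggest packet the application sends;
 * ol_flags and the other Tx offload fields are not copied here. */
static struct rte_mbuf *
linearize_chain(struct rte_mempool *big_pool, struct rte_mbuf *m)
{
	struct rte_mbuf *flat = rte_pktmbuf_alloc(big_pool);
	const struct rte_mbuf *seg;
	char *dst;

	if (flat == NULL)
		return NULL;

	for (seg = m; seg != NULL; seg = seg->next) {
		dst = rte_pktmbuf_append(flat, rte_pktmbuf_data_len(seg));
		if (dst == NULL) {
			/* Packet does not fit into a single buffer. */
			rte_pktmbuf_free(flat);
			return NULL;
		}
		rte_memcpy(dst, rte_pktmbuf_mtod(seg, const void *),
			   rte_pktmbuf_data_len(seg));
	}

	rte_pktmbuf_free(m);
	return flat;
}

With the second option the same copy loop would live inside the PMD and spread
the data over several 2KB buffers from the driver-private pool instead of one
large mbuf, keeping the application code untouched.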