On 07/30/2015 07:17 PM, Stephen Hemminger wrote:
> On Thu, 30 Jul 2015 17:57:33 +0300
> Vlad Zolotarov <vladz at cloudius-systems.com> wrote:
>
>> Hi, Konstantin, Helin,
>> there is a documented limitation of xl710 controllers (i40e driver)
>> which is not handled in any way by the DPDK driver.
>> From the datasheet, chapter 8.4.1:
>>
>> "• A single transmit packet may span up to 8 buffers (up to 8 data
>> descriptors per packet including both the header and payload
>> buffers).
>> • The total number of data descriptors for the whole TSO (explained
>> later on in this chapter) is unlimited as long as each segment within
>> the TSO obeys the previous rule (up to 8 data descriptors per segment
>> for both the TSO header and the segment payload buffers)."
>>
>> This means that, for instance, a long cluster with small fragments
>> has to be linearized before it may be placed on the HW ring.
>> In more standard environments like Linux or FreeBSD drivers the
>> solution is straightforward - call skb_linearize() or m_collapse()
>> respectively.
>> In a non-conformist environment like DPDK life is not that easy -
>> there is no easy way to collapse the cluster into a linear buffer
>> from inside the device driver, since the device driver doesn't
>> allocate memory on the fast path and uses only the user-allocated
>> pools.
>>
>> Here are two proposals for a solution:
>>
>> 1. We may provide a callback that would return TRUE to the user if a
>>    given cluster has to be linearized, and it should always be called
>>    before rte_eth_tx_burst(). Alternatively it may be called from
>>    inside rte_eth_tx_burst(), and rte_eth_tx_burst() is changed to
>>    return some error code for the case when one of the clusters it
>>    is given has to be linearized.
>> 2. Another option is to allocate a mempool in the driver with
>>    elements consuming a single page each (standard 2KB buffers would
>>    do). The number of elements in the pool should be the Tx ring
>>    length multiplied by "64KB/(linear data length of the buffer in
>>    the pool above)". Here I use 64KB as the maximum packet length and
>>    am not taking into account esoteric things like the "Giant" TSO
>>    mentioned in the spec above. Then we may actually go and linearize
>>    the cluster if needed on top of the buffers from the pool above,
>>    post the buffer from that mempool on the HW ring, link the
>>    original cluster to the new cluster (using the private data) and
>>    release it when the send is done.
> Or just silently drop heavily scattered packets (and increment
> oerrors) with a PMD_TX_LOG debug message.
>
> I think a DPDK driver doesn't have to accept all possible mbufs and
> do extra work. It seems reasonable to expect the caller to be well
> behaved in this restricted ecosystem.
>
How can the caller know what's well behaved? It's device dependent.
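For illustration, the per-packet check from proposal 1 could be as
small as the sketch below for the non-TSO case. The function and macro
names here are made up; nb_segs is the real rte_mbuf segment counter,
and the TSO case additionally has to verify the 8-descriptor rule for
every MSS-sized window of the chain, which is omitted:

    #include <stdbool.h>
    #include <rte_mbuf.h>

    /* Hypothetical macro for the xl710 limit of 8 data descriptors
     * per packet (datasheet chapter 8.4.1). */
    #define XL710_TX_MAX_SEG 8

    /* Return true if a non-TSO mbuf chain spans too many buffers to
     * be posted on the HW ring as-is and must be linearized first. */
    static bool
    xl710_needs_linearize(const struct rte_mbuf *m)
    {
            return m->nb_segs > XL710_TX_MAX_SEG;
    }

The application would run this over every packet before
rte_eth_tx_burst() and collapse (or drop) the chains that fail. And to
put numbers on proposal 2: with 2KB of linear data per pool element,
the sizing rule above gives 64KB/2KB = 32 elements per Tx ring slot,
so e.g. a 1024-entry Tx ring would need a 32K-element linearization
pool.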