On Fri, 26 Aug 2016 18:22:52 +0200 Tomasz Kulasek <tomaszx.kulasek at intel.com> wrote:
> As discussed in that thread:
>
> http://dpdk.org/ml/archives/dev/2015-September/023603.html
>
> Different NIC models depending on HW offload requested might impose
> different requirements on packets to be TX-ed in terms of:
>
>  - Max number of fragments per packet allowed
>  - Max number of fragments per TSO segments
>  - The way pseudo-header checksum should be pre-calculated
>  - L3/L4 header fields filling
>  - etc.
>
>
> MOTIVATION:
> -----------
>
> 1) Some work cannot (and didn't should) be done in rte_eth_tx_burst.
>    However, this work is sometimes required, and now, it's an
>    application issue.

Why not? You are adding an additional API burden on every application.

> 2) Different hardware may have different requirements for TX offloads,
>    other subset can be supported and so on.

These need to be reported by the API so that the application can handle them.
Doing these transformations in tx_prep seems late in the process.

> 3) Some parameters (e.g. number of segments in ixgbe driver) may hung
>    device. These parameters may be vary for different devices.
>
>    For example i40e HW allows 8 fragments per packet, but that is after
>    TSO segmentation. While ixgbe has a 38-fragment pre-TSO limit.

It seems better to handle these limits as exceptions in i40e_tx_burst etc.,
rather than as a pre-step. Look at how the Linux driver API works: several
drivers have to have an exception linearize path.

> 4) Fields in packet may require different initialization (like e.g. will
>    require pseudo-header checksum precalculation, sometimes in a
>    different way depending on packet type, and so on). Now application
>    needs to care about it.

Once again, the driver should do this in Tx.

> 5) Using additional API (rte_eth_tx_prep) before rte_eth_tx_burst let to
>    prepare packet burst in acceptable form for specific device.
>
> 6) Some additional checks may be done in debug mode keeping tx_burst
>    implementation clean.

Most of this could be done by refactoring existing tx_burst in drivers.
Much of the code seems to be written in the style of "let's write a 2000-line function because that is most efficient" rather than "let's write small steps and let the compiler optimize it".
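
To make the exception-path idea concrete, here is a minimal sketch of a Tx burst routine built from small inline steps, with a linearize slow path taken only when a packet exceeds the segment limit. This is not DPDK or i40e code: struct pkt_seg, SEG_BUF_SIZE, MAX_TX_SEGS, pkt_nb_segs(), pkt_linearize() and sketch_tx_burst() are all hypothetical names chosen for illustration, and the sketch assumes segments are heap-allocated and that the limit is a simple per-packet count (it does not model TSO).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SEG_BUF_SIZE 2048  /* hypothetical per-segment buffer size */
#define MAX_TX_SEGS  8     /* hypothetical per-packet HW segment limit */

/* Hypothetical stand-in for a chained packet buffer (not rte_mbuf). */
struct pkt_seg {
    struct pkt_seg *next;
    uint16_t len;
    uint8_t data[SEG_BUF_SIZE];
};

/* Small step 1: count the segments in a chain. */
static inline unsigned int pkt_nb_segs(const struct pkt_seg *p)
{
    unsigned int n = 0;

    for (; p != NULL; p = p->next)
        n++;
    return n;
}

/*
 * Small step 2 (slow path): copy every trailing segment into the head
 * and free it. Assumes segments were allocated with malloc/calloc.
 * Returns 0 on success, -1 if the data does not fit in one buffer;
 * on failure the chain is left untouched.
 */
static int pkt_linearize(struct pkt_seg *head)
{
    size_t total = 0;
    size_t off = head->len;
    struct pkt_seg *s;

    for (s = head; s != NULL; s = s->next)
        total += s->len;
    if (total > SEG_BUF_SIZE)
        return -1;

    s = head->next;
    while (s != NULL) {
        struct pkt_seg *next = s->next;

        memcpy(head->data + off, s->data, s->len);
        off += s->len;
        free(s);
        s = next;
    }
    head->len = (uint16_t)off;
    head->next = NULL;
    return 0;
}

/*
 * Tx burst with the limit handled as an exception: the common case pays
 * only for the segment count, and only oversized chains are linearized.
 * Returns the number of packets accepted; a short count tells the caller
 * that pkts[ret] could not be sent.
 */
static uint16_t sketch_tx_burst(struct pkt_seg **pkts, uint16_t nb_pkts)
{
    uint16_t sent;

    assert(pkts != NULL || nb_pkts == 0);  /* debug-only sanity check */

    for (sent = 0; sent < nb_pkts; sent++) {
        struct pkt_seg *p = pkts[sent];

        if (pkt_nb_segs(p) > MAX_TX_SEGS && pkt_linearize(p) != 0)
            break;
        /* A real driver would fill Tx descriptors for p here. */
    }
    return sent;
}
```

With this shape the application keeps calling a single burst function, and a 10-segment packet handed to sketch_tx_burst() comes back as one contiguous segment instead of being rejected or hanging the device. The cost is a copy on the slow path, which is the same trade-off the Linux drivers mentioned above make.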