On Fri, 26 Aug 2016 18:22:52 +0200
Tomasz Kulasek <tomaszx.kulasek at intel.com> wrote:

> As discussed in that thread:
> http://dpdk.org/ml/archives/dev/2015-September/023603.html
> Different NIC models depending on HW offload requested might impose
> different requirements on packets to be TX-ed in terms of:
>  - Max number of fragments per packet allowed
>  - Max number of fragments per TSO segments
>  - The way pseudo-header checksum should be pre-calculated
>  - L3/L4 header fields filling
>  - etc.
> -----------
> 1) Some work cannot (and should not) be done in rte_eth_tx_burst.
>    However, this work is sometimes required, and for now it is left
>    to the application.

Why not? You are adding an additional API burden on every application.

> 2) Different hardware may have different requirements for TX offloads,
>    a different subset may be supported, and so on.

These need to be reported through an API so that the application can
handle them. Doing these transformations in tx_prep seems late in the
process.
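
As a rough sketch of what reporting through an API could look like:
the driver fills in a limits structure once, and the application
checks packets against it while building them. The names below
(tx_limits, pkt_desc, pkt_within_limits) are made up for illustration
and are not existing DPDK definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port TX limits, filled in by the driver at init. */
struct tx_limits {
    uint16_t max_segs_per_pkt; /* max fragments the HW accepts per packet */
};

/* Hypothetical minimal view of a packet, as seen by the application. */
struct pkt_desc {
    uint16_t nb_segs; /* number of fragments in the chain */
};

/* Application-side check done while the packet is being built,
 * well before it reaches the transmit call. */
static bool pkt_within_limits(const struct tx_limits *lim,
                              const struct pkt_desc *pkt)
{
    return pkt->nb_segs <= lim->max_segs_per_pkt;
}
```

An application that knows the limit up front can avoid building an
oversized chain in the first place, rather than fixing it up later.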

> 3) Some parameters (e.g. number of segments in ixgbe driver) may hang
>    the device. These parameters may vary between devices.
>    For example i40e HW allows 8 fragments per packet, but that is after
>    TSO segmentation, while ixgbe has a 38-fragment pre-TSO limit.

It seems better to handle these limits as exceptions inside
i40e_tx_burst etc., rather than in a pre-step. Look at how the Linux
driver API works: several drivers need an exception path that
linearizes the packet.
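
A minimal sketch of such an exception path, under stated assumptions:
struct seg is a stand-in for a chained packet buffer (it is not
rte_mbuf), HW_MAX_SEGS is an assumed hardware limit, and tx_one() only
returns the number of descriptors the packet would use instead of
touching a real ring.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HW_MAX_SEGS 8 /* assumed hardware limit, for illustration only */

/* Hypothetical stand-in for a chained packet buffer. */
struct seg {
    struct seg *next;
    uint16_t len;
    uint8_t data[64];
};

static unsigned int seg_count(const struct seg *s)
{
    unsigned int n = 0;

    for (; s != NULL; s = s->next)
        n++;
    return n;
}

/* Slow path: copy the chain into one contiguous malloc'ed buffer.
 * Returns NULL on allocation failure; the caller frees the result. */
static uint8_t *linearize(const struct seg *s, size_t *out_len)
{
    size_t total = 0, off = 0;
    const struct seg *p;
    uint8_t *buf;

    for (p = s; p != NULL; p = p->next) {
        assert(p->len <= sizeof(p->data));
        total += p->len;
    }
    buf = malloc(total ? total : 1);
    if (buf == NULL)
        return NULL;
    for (p = s; p != NULL; p = p->next) {
        memcpy(buf + off, p->data, p->len);
        off += p->len;
    }
    *out_len = total;
    return buf;
}

/* Returns the number of descriptors the packet would consume,
 * or -1 if the slow path could not allocate. */
static int tx_one(const struct seg *pkt)
{
    unsigned int n = seg_count(pkt);
    size_t len = 0;
    uint8_t *flat;

    if (n <= HW_MAX_SEGS)
        return (int)n; /* fast path: one descriptor per segment */

    flat = linearize(pkt, &len); /* exception path */
    if (flat == NULL)
        return -1;
    /* a real driver would post `flat` (len bytes) to the ring here */
    free(flat);
    return 1;
}
```

The common case pays only for counting segments; the copy happens
only for the rare packet that exceeds the limit.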

> 4) Fields in the packet may require different initialization (e.g.
>    pseudo-header checksum precalculation, sometimes done in a
>    different way depending on packet type, and so on). For now the
>    application needs to take care of it.

Once again, the driver should do this in Tx.
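
For illustration, the kind of helper a driver could call from its own
Tx path is small. This is a sketch, not code from any existing driver:
ipv4_phdr_sum is a made-up name, inputs are taken in host byte order
to keep it self-contained, and leaving the L4 length out for TSO is an
assumption about what a given NIC expects.

```c
#include <stdbool.h>
#include <stdint.h>

/* One's-complement sum over the IPv4 pseudo-header, folded to 16 bits
 * and NOT inverted. All inputs are in host byte order.
 * When `tso` is set the L4 length is left out of the sum (assumed
 * hardware behaviour: the NIC adds the per-segment length itself). */
static uint16_t ipv4_phdr_sum(uint32_t src, uint32_t dst, uint8_t proto,
                              uint16_t l4_len, bool tso)
{
    uint32_t sum = 0;

    sum += src >> 16;
    sum += src & 0xffff;
    sum += dst >> 16;
    sum += dst & 0xffff;
    sum += proto; /* zero byte followed by the protocol number */
    if (!tso)
        sum += l4_len;

    /* fold carries back into the low 16 bits */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

The driver already knows which variant its hardware wants, so the
application would not need to carry that knowledge.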

> 5) Using an additional API (rte_eth_tx_prep) before rte_eth_tx_burst
>    lets the application prepare the packet burst in a form acceptable
>    to the specific device.
> 6) Some additional checks may be done in debug mode keeping tx_burst
>    implementation clean.

Most of this could be done by refactoring the existing tx_burst
functions in the drivers. Much of the code seems to have been written
as "let's write a 2000 line function because that is most efficient"
rather than "let's write small steps and let the compiler optimize it".
