Hi, > -----Original Message----- > From: Anoob Joseph <ano...@marvell.com> > Sent: Wednesday, September 08, 2021 18:30 > To: Yigit, Ferruh <ferruh.yi...@intel.com>; Xu, Rosen <rosen...@intel.com>; > Andrew Rybchenko <arybche...@solarflare.com> > Cc: Nicolau, Radu <radu.nico...@intel.com>; Doherty, Declan > <declan.dohe...@intel.com>; hemant.agra...@nxp.com; > ma...@nvidia.com; Ananyev, Konstantin <konstantin.anan...@intel.com>; > tho...@monjalon.net; Ankur Dwivedi <adwiv...@marvell.com>; > andrew.rybche...@oktetlabs.ru; Akhil Goyal <gak...@marvell.com>; > dev@dpdk.org > Subject: RE: [EXT] Re: [PATCH] RFC: ethdev: add reassembly offload > > Hi Ferruh, Rosen, Andrew, > > Please see inline. > > Thanks, > Anoob > > > Subject: [EXT] Re: [PATCH] RFC: ethdev: add reassembly offload > > > > External Email > > > > ---------------------------------------------------------------------- > > On 8/23/2021 11:02 AM, Akhil Goyal wrote: > > > Reassembly is a costly operation if it is done in software, however, > > > if it is offloaded to HW, it can considerably save application cycles. > > > The operation becomes even more costlier if IP fragmants are > > > encrypted. > > > > > > To resolve above two issues, a new offload > > DEV_RX_OFFLOAD_REASSEMBLY > > > is introduced in ethdev for devices which can attempt reassembly of > > > packets in hardware. > > > rte_eth_dev_info is added with the reassembly capabilities which a > > > device can support. > > > Now, if IP fragments are encrypted, reassembly can also be attempted > > > while doing inline IPsec processing. > > > This is controlled by a flag in rte_security_ipsec_sa_options to > > > enable reassembly of encrypted IP fragments in the inline path. > > > > > > The resulting reassembled packet would be a typical segmented mbuf > > > in case of success. > > > > > > And if reassembly of fragments is failed or is incomplete (if > > > fragments do not come before the reass_timeout), the mbuf is updated > > > with an ol_flag PKT_RX_REASSEMBLY_INCOMPLETE and mbuf is returned > > as > > > is. Now application may decide the fate of the packet to wait more > > > for fragments to come or drop. > > > > > > Signed-off-by: Akhil Goyal <gak...@marvell.com> > > > --- > > > lib/ethdev/rte_ethdev.c | 1 + > > > lib/ethdev/rte_ethdev.h | 18 +++++++++++++++++- > > > lib/mbuf/rte_mbuf_core.h | 3 ++- > > > lib/security/rte_security.h | 10 ++++++++++ > > > 4 files changed, 30 insertions(+), 2 deletions(-) > > > > > > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index > > > 9d95cd11e1..1ab3a093cf 100644 > > > --- a/lib/ethdev/rte_ethdev.c > > > +++ b/lib/ethdev/rte_ethdev.c > > > @@ -119,6 +119,7 @@ static const struct { > > > RTE_RX_OFFLOAD_BIT2STR(VLAN_FILTER), > > > RTE_RX_OFFLOAD_BIT2STR(VLAN_EXTEND), > > > RTE_RX_OFFLOAD_BIT2STR(JUMBO_FRAME), > > > + RTE_RX_OFFLOAD_BIT2STR(REASSEMBLY), > > > RTE_RX_OFFLOAD_BIT2STR(SCATTER), > > > RTE_RX_OFFLOAD_BIT2STR(TIMESTAMP), > > > RTE_RX_OFFLOAD_BIT2STR(SECURITY), > > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index > > > d2b27c351f..e89a4dc1eb 100644 > > > --- a/lib/ethdev/rte_ethdev.h > > > +++ b/lib/ethdev/rte_ethdev.h > > > @@ -1360,6 +1360,7 @@ struct rte_eth_conf { > > > #define DEV_RX_OFFLOAD_VLAN_FILTER 0x00000200 > > > #define DEV_RX_OFFLOAD_VLAN_EXTEND 0x00000400 > > > #define DEV_RX_OFFLOAD_JUMBO_FRAME 0x00000800 > > > +#define DEV_RX_OFFLOAD_REASSEMBLY 0x00001000 > > > > previous '0x00001000' was 'DEV_RX_OFFLOAD_CRC_STRIP', it has been > long > > that offload has been removed, but not sure if it cause any problem to > > re- use it. > > > > > #define DEV_RX_OFFLOAD_SCATTER 0x00002000 > > > /** > > > * Timestamp is set by the driver in > > RTE_MBUF_DYNFIELD_TIMESTAMP_NAME > > > @@ -1477,6 +1478,20 @@ struct rte_eth_dev_portconf { > > > */ > > > #define RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID > > (UINT16_MAX) > > > > > > +/** > > > + * Reassembly capabilities that a device can support. > > > + * The device which can support reassembly offload should set > > > + * DEV_RX_OFFLOAD_REASSEMBLY > > > + */ > > > +struct rte_eth_reass_capa { > > > + /** Maximum time in ns that a fragment can wait for further > > fragments */ > > > + uint64_t reass_timeout; > > > + /** Maximum number of fragments that device can reassemble */ > > > + uint16_t max_frags; > > > + /** Reserved for future capabilities */ > > > + uint16_t reserved[3]; > > > +}; > > > + > > > > I wonder if there is any other hardware around supports reassembly > > offload, it would be good to get more feedback on the capabilities list. > > > > > /** > > > * Ethernet device associated switch information > > > */ > > > @@ -1582,8 +1597,9 @@ struct rte_eth_dev_info { > > > * embedded managed interconnect/switch. > > > */ > > > struct rte_eth_switch_info switch_info; > > > + /* Reassembly capabilities of a device for reassembly offload */ > > > + struct rte_eth_reass_capa reass_capa; > > > > > > - uint64_t reserved_64s[2]; /**< Reserved for future fields */ > > > > Reserved fields were added to be able to update the struct without > > breaking the ABI, so that a critical change doesn't have to wait until > > next ABI break release. > > Since this is ABI break release, we can keep the reserved field and > > add the new struct. Or this can be an opportunity to get rid of the reserved > field. > > > > Personally I have no objection to get rid of the reserved field, but > > better to agree on this explicitly. > > > > > void *reserved_ptrs[2]; /**< Reserved for future fields */ > > > }; > > > > > > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h > > > index > > > bb38d7f581..cea25c87f7 100644 > > > --- a/lib/mbuf/rte_mbuf_core.h > > > +++ b/lib/mbuf/rte_mbuf_core.h > > > @@ -200,10 +200,11 @@ extern "C" { > > > #define PKT_RX_OUTER_L4_CKSUM_BAD (1ULL << 21) > > > #define PKT_RX_OUTER_L4_CKSUM_GOOD (1ULL << 22) > > > #define PKT_RX_OUTER_L4_CKSUM_INVALID ((1ULL << 21) | (1ULL > > << 22)) > > > +#define PKT_RX_REASSEMBLY_INCOMPLETE (1ULL << 23) > > > > > > > Similar comment with Andrew's, what is the expectation from > > application if this flag exists? Can we drop it to simplify the logic in the > application? > > [Anoob] There can be few cases where hardware/NIC attempts inline > reassembly but it fails to complete it > > 1. Number of fragments is larger than what is supported by the hardware 2. > Hardware reassembly resources are exhausted (due to limited reassembly > contexts etc) 3. Reassembly errors such as overlapping fragments 4. Wait > time exhausted (or reassembly timeout) > > In such cases, application would be required to retrieve the original > fragments so that it can attempt reassembly in software. The incomplete flag > is useful for 2 purposes basically, 1. Application would need to retrieve the > time the fragment has already spend in hardware reassembly so that > software reassembly attempt can compensate for it. Otherwise, reassembly > timeout across hardware + software will not be accurate 2. Retrieve original > fragments. With this proposal, an incomplete reassembly would result in a > chained mbuf but the segments need not be consecutive. To explain bit more, > > Suppose we have a packet that is fragmented into 3 fragments, and fragment > 3 & fragment 1 arrives in that order. Fragment 2 didn't arrive and hardware > ultimately pushes it. In that case, application would be receiving a > chained/segmented mbuf with fragment 1 & fragment 3 chained. > > Now, this chained mbuf can't be treated like a regular chained mbuf. Each > fragment would have its IP hdr and there are fragments missing in between. > The only thing application is expected to do is, retrieve fragments, push it > to > s/w reassembly.
What you mentioned is error identification. But actually a negotiation about max frame size is needed before datagrams tx/rx. > > > > > /* add new RX flags here, don't forget to update PKT_FIRST_FREE */ > > > > > > -#define PKT_FIRST_FREE (1ULL << 23) > > > +#define PKT_FIRST_FREE (1ULL << 24) > > > #define PKT_LAST_FREE (1ULL << 40) > > > > > > /* add new TX flags here, don't forget to update PKT_LAST_FREE */ > > > diff --git a/lib/security/rte_security.h > > > b/lib/security/rte_security.h index 88d31de0a6..364eeb5cd4 100644 > > > --- a/lib/security/rte_security.h > > > +++ b/lib/security/rte_security.h > > > @@ -181,6 +181,16 @@ struct rte_security_ipsec_sa_options { > > > * * 0: Disable per session security statistics collection for this SA. > > > */ > > > uint32_t stats : 1; > > > + > > > + /** Enable reassembly on incoming packets. > > > + * > > > + * * 1: Enable driver to try reassembly of encrypted IP packets for > > > + * this SA, if supported by the driver. This feature will work > > > + * only if rx_offload DEV_RX_OFFLOAD_REASSEMBLY is set in > > > + * inline ethernet device. > > > + * * 0: Disable reassembly of packets (default). > > > + */ > > > + uint32_t reass_en : 1; > > > }; > > > > > > /** IPSec security association direction */ > > >