Hi Ferruh, Rosen, Andrew,

Please see inline.

Thanks,
Anoob

> Subject: [EXT] Re: [PATCH] RFC: ethdev: add reassembly offload
> 
> External Email
> 
> ----------------------------------------------------------------------
> On 8/23/2021 11:02 AM, Akhil Goyal wrote:
> > Reassembly is a costly operation if it is done in software, however,
> > if it is offloaded to HW, it can considerably save application cycles.
> > The operation becomes even more costlier if IP fragmants are
> > encrypted.
> >
> > To resolve above two issues, a new offload
> DEV_RX_OFFLOAD_REASSEMBLY
> > is introduced in ethdev for devices which can attempt reassembly of
> > packets in hardware.
> > rte_eth_dev_info is added with the reassembly capabilities which a
> > device can support.
> > Now, if IP fragments are encrypted, reassembly can also be attempted
> > while doing inline IPsec processing.
> > This is controlled by a flag in rte_security_ipsec_sa_options to
> > enable reassembly of encrypted IP fragments in the inline path.
> >
> > The resulting reassembled packet would be a typical segmented mbuf in
> > case of success.
> >
> > And if reassembly of fragments is failed or is incomplete (if
> > fragments do not come before the reass_timeout), the mbuf is updated
> > with an ol_flag PKT_RX_REASSEMBLY_INCOMPLETE and mbuf is returned
> as
> > is. Now application may decide the fate of the packet to wait more for
> > fragments to come or drop.
> >
> > Signed-off-by: Akhil Goyal <gak...@marvell.com>
> > ---
> >  lib/ethdev/rte_ethdev.c     |  1 +
> >  lib/ethdev/rte_ethdev.h     | 18 +++++++++++++++++-
> >  lib/mbuf/rte_mbuf_core.h    |  3 ++-
> >  lib/security/rte_security.h | 10 ++++++++++
> >  4 files changed, 30 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > 9d95cd11e1..1ab3a093cf 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -119,6 +119,7 @@ static const struct {
> >     RTE_RX_OFFLOAD_BIT2STR(VLAN_FILTER),
> >     RTE_RX_OFFLOAD_BIT2STR(VLAN_EXTEND),
> >     RTE_RX_OFFLOAD_BIT2STR(JUMBO_FRAME),
> > +   RTE_RX_OFFLOAD_BIT2STR(REASSEMBLY),
> >     RTE_RX_OFFLOAD_BIT2STR(SCATTER),
> >     RTE_RX_OFFLOAD_BIT2STR(TIMESTAMP),
> >     RTE_RX_OFFLOAD_BIT2STR(SECURITY),
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > d2b27c351f..e89a4dc1eb 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1360,6 +1360,7 @@ struct rte_eth_conf {
> >  #define DEV_RX_OFFLOAD_VLAN_FILTER 0x00000200
> >  #define DEV_RX_OFFLOAD_VLAN_EXTEND 0x00000400
> >  #define DEV_RX_OFFLOAD_JUMBO_FRAME 0x00000800
> > +#define DEV_RX_OFFLOAD_REASSEMBLY  0x00001000
> 
> previous '0x00001000' was 'DEV_RX_OFFLOAD_CRC_STRIP', it has been long
> that offload has been removed, but not sure if it cause any problem to re-
> use it.
> 
> >  #define DEV_RX_OFFLOAD_SCATTER             0x00002000
> >  /**
> >   * Timestamp is set by the driver in
> RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> > @@ -1477,6 +1478,20 @@ struct rte_eth_dev_portconf {
> >   */
> >  #define RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID
>       (UINT16_MAX)
> >
> > +/**
> > + * Reassembly capabilities that a device can support.
> > + * The device which can support reassembly offload should set
> > + * DEV_RX_OFFLOAD_REASSEMBLY
> > + */
> > +struct rte_eth_reass_capa {
> > +   /** Maximum time in ns that a fragment can wait for further
> fragments */
> > +   uint64_t reass_timeout;
> > +   /** Maximum number of fragments that device can reassemble */
> > +   uint16_t max_frags;
> > +   /** Reserved for future capabilities */
> > +   uint16_t reserved[3];
> > +};
> > +
> 
> I wonder if there is any other hardware around supports reassembly offload,
> it would be good to get more feedback on the capabilities list.
> 
> >  /**
> >   * Ethernet device associated switch information
> >   */
> > @@ -1582,8 +1597,9 @@ struct rte_eth_dev_info {
> >      * embedded managed interconnect/switch.
> >      */
> >     struct rte_eth_switch_info switch_info;
> > +   /* Reassembly capabilities of a device for reassembly offload */
> > +   struct rte_eth_reass_capa reass_capa;
> >
> > -   uint64_t reserved_64s[2]; /**< Reserved for future fields */
> 
> Reserved fields were added to be able to update the struct without breaking
> the ABI, so that a critical change doesn't have to wait until next ABI break
> release.
> Since this is ABI break release, we can keep the reserved field and add the
> new struct. Or this can be an opportunity to get rid of the reserved field.
> 
> Personally I have no objection to get rid of the reserved field, but better to
> agree on this explicitly.
> 
> >     void *reserved_ptrs[2];   /**< Reserved for future fields */
> >  };
> >
> > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h index
> > bb38d7f581..cea25c87f7 100644
> > --- a/lib/mbuf/rte_mbuf_core.h
> > +++ b/lib/mbuf/rte_mbuf_core.h
> > @@ -200,10 +200,11 @@ extern "C" {
> >  #define PKT_RX_OUTER_L4_CKSUM_BAD  (1ULL << 21)
> >  #define PKT_RX_OUTER_L4_CKSUM_GOOD (1ULL << 22)
> >  #define PKT_RX_OUTER_L4_CKSUM_INVALID      ((1ULL << 21) | (1ULL
> << 22))
> > +#define PKT_RX_REASSEMBLY_INCOMPLETE       (1ULL << 23)
> >
> 
> Similar comment with Andrew's, what is the expectation from application if
> this flag exists? Can we drop it to simplify the logic in the application?

[Anoob] There can be few cases where hardware/NIC attempts inline reassembly 
but it fails to complete it

1. Number of fragments is larger than what is supported by the hardware
2. Hardware reassembly resources are exhausted (due to limited reassembly 
contexts etc)
3. Reassembly errors such as overlapping fragments
4. Wait time exhausted (or reassembly timeout)

In such cases, application would be required to retrieve the original fragments 
so that it can attempt reassembly in software. The incomplete flag is useful 
for 2 purposes basically,
1. Application would need to retrieve the time the fragment has already spend 
in hardware reassembly so that software reassembly attempt can compensate for 
it. Otherwise, reassembly timeout across hardware + software will not be 
accurate
2. Retrieve original fragments. With this proposal, an incomplete reassembly 
would result in a chained mbuf but the segments need not be consecutive. To 
explain bit more,

Suppose we have a packet that is fragmented into 3 fragments, and fragment 3 & 
fragment 1 arrives in that order. Fragment 2 didn't arrive and hardware 
ultimately pushes it. In that case, application would be receiving a 
chained/segmented mbuf with fragment 1 & fragment 3 chained.

Now, this chained mbuf can't be treated like a regular chained mbuf. Each 
fragment would have its IP hdr and there are fragments missing in between. The 
only thing application is expected to do is, retrieve fragments, push it to s/w 
reassembly.
 
> 
> >  /* add new RX flags here, don't forget to update PKT_FIRST_FREE */
> >
> > -#define PKT_FIRST_FREE (1ULL << 23)
> > +#define PKT_FIRST_FREE (1ULL << 24)
> >  #define PKT_LAST_FREE (1ULL << 40)
> >
> >  /* add new TX flags here, don't forget to update PKT_LAST_FREE  */
> > diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h
> > index 88d31de0a6..364eeb5cd4 100644
> > --- a/lib/security/rte_security.h
> > +++ b/lib/security/rte_security.h
> > @@ -181,6 +181,16 @@ struct rte_security_ipsec_sa_options {
> >      * * 0: Disable per session security statistics collection for this SA.
> >      */
> >     uint32_t stats : 1;
> > +
> > +   /** Enable reassembly on incoming packets.
> > +    *
> > +    * * 1: Enable driver to try reassembly of encrypted IP packets for
> > +    *      this SA, if supported by the driver. This feature will work
> > +    *      only if rx_offload DEV_RX_OFFLOAD_REASSEMBLY is set in
> > +    *      inline ethernet device.
> > +    * * 0: Disable reassembly of packets (default).
> > +    */
> > +   uint32_t reass_en : 1;
> >  };
> >
> >  /** IPSec security association direction */
> >

Reply via email to