Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>
> Sent: Monday, October 3, 2022 3:47 PM
> To: Wang, YuanX <yuanx.w...@intel.com>; dev@dpdk.org; Thomas
> Monjalon <tho...@monjalon.net>; Ferruh Yigit <ferruh.yi...@amd.com>
> Cc: ferruh.yi...@xilinx.com; m...@ashroe.eu; Li, Xiaoyun
> <xiaoyun...@intel.com>; Singh, Aman Deep <aman.deep.si...@intel.com>;
> Zhang, Yuying <yuying.zh...@intel.com>; Zhang, Qi Z
> <qi.z.zh...@intel.com>; Yang, Qiming <qiming.y...@intel.com>;
> jerinjac...@gmail.com; viachesl...@nvidia.com;
> step...@networkplumber.org; Ding, Xuan <xuan.d...@intel.com>;
> hpoth...@marvell.com; Tang, Yaqi <yaqi.t...@intel.com>; Wenxuan Wu
> <wenxuanx...@intel.com>
> Subject: Re: [PATCH v7 2/4] ethdev: introduce protocol hdr based buffer split
> 
> On 10/2/22 00:05, Yuan Wang wrote:
> > Currently, Rx buffer split supports length based split. With Rx queue
> > offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet
> segment
> > configured, PMD will be able to split the received packets into
> > multiple segments.
> >
> > However, length based buffer split is not suitable for NICs that do
> > split based on protocol headers. Given an arbitrarily variable length
> > in Rx packet segment, it is almost impossible to pass a fixed protocol
> > header to driver. Besides, the existence of tunneling results in the
> > composition of a packet is various, which makes the situation even worse.
> >
> > This patch extends current buffer split to support protocol header
> > based buffer split. A new proto_hdr field is introduced in the
> > reserved field of rte_eth_rxseg_split structure to specify protocol
> > header. The proto_hdr field defines the split position of packet,
> > splitting will always happen after the protocol header defined in the
> > Rx packet segment. When Rx queue offload
> > RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol
> > header is configured, driver will split the ingress packets into multiple
> segments.
> >
> > Examples for proto_hdr field defines:
> > To split after ETH-IPV4-UDP, it should be defined as
> > RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
> RTE_PTYPE_L4_UDP
> >
> > For inner ETH-IPV4-UDP, it should be defined as
> > RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
> > RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP
> >
> > struct rte_eth_rxseg_split {
> >          struct rte_mempool *mp; /* memory pools to allocate segment from
> */
> >          uint16_t length; /* segment maximal data length,
> >                              configures split point */
> >          uint16_t offset; /* data offset from beginning
> >                              of mbuf data buffer */
> >          /**
> >      * Proto_hdr defines a bit mask of the protocol sequence as
> >           * RTE_PTYPE_*, configures split point. The last RTE_PTYPE*
> >           * in the mask indicates the split position.
> >      * For non-tunneling packets, the complete protocol sequence
> >           * should be defined.
> >      * For tunneling packets, for simplicity, only the tunnel and
> >           * inner protocol sequence should be defined.
> >      */
> >          uint32_t proto_hdr;
> > };
> >
> > If protocol header split can be supported by a PMD, the
> > rte_eth_buffer_split_get_supported_hdr_ptypes function can be use to
> > obtain a list of these protocol headers.
> >
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >          seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
> >                 off0=2B
> >          seg1 - pool1, proto_hdr1=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4
> >                 | RTE_PTYPE_L4_UDP, off1=128B
> >          seg2 - pool2, off1=0B
> >
> > The packet consists of ETH_IPV4_UDP_PAYLOAD will be split like
> > following:
> >          seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
> pool0
> >          seg1 - udp header @ 128 in mbuf from pool1
> >          seg2 - payload @ 0 in mbuf from pool2
> >
> > Note: NIC will only do split when the packets exactly match all the
> > protocol headers in the segments. For example, if ARP packets received
> > with above config, the NIC won't do split for ARP packets since it
> > does not contains ipv4 header and udp header. These packets will be
> > put into the last valid mempool, with zero offset.
> >
> > Now buffer split can be configured in two modes. For length based
> > buffer split, the mp, length, offset field in Rx packet segment should
> > be configured, while the proto_hdr field will be ignored.
> > For protocol header based buffer split, the mp, offset, proto_hdr
> > field in Rx packet segment should be configured, while the length
> > field will be ignored.
> >
> > The split limitations imposed by underlying driver is reported in the
> > rte_eth_dev_info->rx_seg_capa field. The memory attributes for the
> > split parts may differ either, dpdk memory and external memory,
> respectively.
> >
> > Signed-off-by: Yuan Wang <yuanx.w...@intel.com>
> > Signed-off-by: Xuan Ding <xuan.d...@intel.com>
> > Signed-off-by: Wenxuan Wu <wenxuanx...@intel.com>
> 
> I apologize for the delay with the review. Overall LGTM now. See a few notes below.

Thanks so much for your time and patience with this patch series.

> 
> > ---
> >   doc/guides/rel_notes/release_22_11.rst |  7 +++
> >   lib/ethdev/rte_ethdev.c                | 74 ++++++++++++++++++++++----
> >   lib/ethdev/rte_ethdev.h                | 29 +++++++++-
> >   3 files changed, 98 insertions(+), 12 deletions(-)
> >
> > diff --git a/doc/guides/rel_notes/release_22_11.rst
> > b/doc/guides/rel_notes/release_22_11.rst
> > index 6a7474a3d6..510869c73a 100644
> > --- a/doc/guides/rel_notes/release_22_11.rst
> > +++ b/doc/guides/rel_notes/release_22_11.rst
> > @@ -101,6 +101,13 @@ New Features
> >     * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get
> supported
> >       header protocols of a PMD to split.
> >
> > +* **Added protocol header based buffer split.**
> > +
> > +  * Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` 
> > structure is
> > +    replaced with ``proto_hdr`` to support protocol header based buffer
> split.
> > +    User can choose length or protocol header to configure buffer split
> > +    according to NIC's capability.
> > +
> 
> It should be grouped together with other ethdev features.

We will send a new version. As with patch 1, could you help adjust the doc
changes?
Thanks very much.

> 
> >
> >   Removed Items
> >   -------------
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > 1f0a7f8f3f..27ec19faed 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -1649,9 +1649,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
> >   }
> >
> >   static int
> > -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> > -                        uint16_t n_seg, uint32_t *mbp_buf_size,
> > -                        const struct rte_eth_dev_info *dev_info)
> > +rte_eth_rx_queue_check_split(uint16_t port_id,
> > +                   const struct rte_eth_rxseg_split *rx_seg,
> > +                   uint16_t n_seg, uint32_t *mbp_buf_size,
> > +                   const struct rte_eth_dev_info *dev_info)
> >   {
> >     const struct rte_eth_rxseg_capa *seg_capa = &dev_info-
> >rx_seg_capa;
> >     struct rte_mempool *mp_first;
> > @@ -1674,6 +1675,7 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> >             struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >             uint32_t length = rx_seg[seg_idx].length;
> >             uint32_t offset = rx_seg[seg_idx].offset;
> > +           uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
> >
> >             if (mpl == NULL) {
> >                     RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1707,13
> > +1709,63 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> >             }
> >             offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >             *mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > -           length = length != 0 ? length : *mbp_buf_size;
> > -           if (*mbp_buf_size < length + offset) {
> > -                   RTE_ETHDEV_LOG(ERR,
> > -                                  "%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > -                                  mpl->name, *mbp_buf_size,
> > -                                  length + offset, length, offset);
> > -                   return -EINVAL;
> > +
> > +           if (proto_hdr > 0) {
> > +                   /* Split based on protocol headers. */
> 
> Isn't it safer here to ensure that segment length is set to 0?
> Just to protect against misuse etc.

That's a reasonable suggestion; I will take it. Please see v8.

> 
> > +
> > +                   /* skip the payload */
> 
> Sorry, it is confusing. What do you mean here?

Setting n proto_hdr values generates (n+1) segments. So if we want to split
the packet into n segments, we only need to check the first (n-1) proto_hdr
values. For example, for ETH-IPV4-UDP-PAYLOAD, if we want to split after the
UDP header, we only need to set and check the UDP header in the first segment.

A mask is probably not a good way to express this, so we will use the segment
index to skip the proto_hdr check for the last segment.
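
To illustrate the index-based idea, here is a rough standalone sketch. It is
not the v8 code: struct rxseg_sketch and check_proto_hdrs() are hypothetical
names, and the supported list stands in for whatever
rte_eth_buffer_split_get_supported_hdr_ptypes() reports for the port.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_eth_rxseg_split; only the fields used here. */
struct rxseg_sketch {
	uint16_t length;
	uint16_t offset;
	uint32_t proto_hdr;
};

/*
 * Validate proto_hdr for every segment except the last one.
 * The last segment only receives the remaining payload, so its
 * proto_hdr is never looked at.
 * Returns 0 on success, -1 if a checked segment is not supported.
 */
static int
check_proto_hdrs(const struct rxseg_sketch *segs, uint16_t n_seg,
		 const uint32_t *supported, size_t n_supported)
{
	uint16_t seg_idx;
	size_t i;

	/* seg_idx + 1 < n_seg stops one short of the last segment. */
	for (seg_idx = 0; seg_idx + 1 < n_seg; seg_idx++) {
		for (i = 0; i < n_supported; i++)
			if (supported[i] == segs[seg_idx].proto_hdr)
				break;
		if (i == n_supported)
			return -1;
	}
	return 0;
}
```

With two segments, only segs[0].proto_hdr is checked, which matches the
"split after UDP" example above.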

> 
> > +                   if (proto_hdr == RTE_PTYPE_ALL_MASK)
> > +                           continue;
> > +
> > +                   int ptype_cnt;
> > +
> > +                   ptype_cnt =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
> > +                   if (ptype_cnt <= 0) {
> > +                           RTE_ETHDEV_LOG(ERR,
> > +                                   "Port %u failed to supported buffer
> split header protocols\n",
> > +                                   port_id);
> > +                           return -EINVAL;
> > +                   }
> > +
> > +                   uint32_t ptypes[ptype_cnt];
> > +                   int i;
> 
> First of all, do not mix code and variable declarations.
> It significantly complicates code reading.

Thanks, the code and variable declarations will be separated.

> Second, creating an array on the stack sized by a function return value is very
> dangerous from a security point of view: potential stack overflow and
> corresponding vulnerabilities.

The return value is used to determine how much space is needed to store the
ptypes. Thanks for pointing out the stack overflow risk; we will allocate the
array on the heap instead.
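
Roughly, the lookup would change along these lines. This is only a standalone
sketch with plain malloc(), not the v8 code: get_ptypes_fn,
proto_hdr_supported() and demo_get_ptypes() are hypothetical, and the callback
is assumed to behave like the query above (called with NULL/0 it returns the
number of entries; otherwise it fills at most num entries and returns the
total).

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical query callback modelled on the "NULL, 0 returns count" usage. */
typedef int (*get_ptypes_fn)(uint32_t *ptypes, int num);

/* Fake driver query used only to exercise the sketch. */
static int
demo_get_ptypes(uint32_t *ptypes, int num)
{
	static const uint32_t demo[] = { 0x1, 0x11, 0x211 };
	const int total = 3;
	int i;

	for (i = 0; i < num && i < total; i++)
		ptypes[i] = demo[i];
	return total;
}

/* Return 0 if proto_hdr is in the supported list, negative errno otherwise. */
static int
proto_hdr_supported(get_ptypes_fn get_ptypes, uint32_t proto_hdr)
{
	uint32_t *ptypes;
	int n_alloc, cnt, i;
	int ret = -EINVAL;

	n_alloc = get_ptypes(NULL, 0);
	if (n_alloc <= 0)
		return -ENOTSUP;

	/* Heap allocation instead of a variable-length array on the stack. */
	ptypes = malloc(sizeof(*ptypes) * (size_t)n_alloc);
	if (ptypes == NULL)
		return -ENOMEM;

	cnt = get_ptypes(ptypes, n_alloc);
	if (cnt > n_alloc)
		cnt = n_alloc; /* never read past what was allocated */

	for (i = 0; i < cnt; i++) {
		if (ptypes[i] == proto_hdr) {
			ret = 0;
			break;
		}
	}

	free(ptypes);
	return ret;
}
```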

> 
> > +
> > +                   ptype_cnt =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> > +
>       ptypes, ptype_cnt);
> > +                   if (ptype_cnt < 0) {
> > +                           RTE_ETHDEV_LOG(ERR,
> > +                                   "Port %u failed to supported buffer
> split header protocols\n",
> > +                                   port_id);
> > +                           return -EINVAL;
> > +                   }
> > +
> > +                   for (i = 0; i < ptype_cnt; i++)
> > +                           if (ptypes[i] == proto_hdr)
> > +                                   break;
> > +                   if (i == ptype_cnt) {
> > +                           RTE_ETHDEV_LOG(ERR,
> > +                                   "Requested Rx split header protocols
> 0x%x is not supported.\n",
> > +                                   proto_hdr);
> > +                           return -EINVAL;
> > +                   }
> > +
> > +                   if (*mbp_buf_size < offset) {
> 
> The check is obviously insufficient, but I agree that it should be the driver's
> responsibility to do extra checks for required space in mbuf.
> 
> > +                           RTE_ETHDEV_LOG(ERR,
> > +                                           "%s
> mbuf_data_room_size %u < %u segment offset)\n",
> > +                                           mpl->name, *mbp_buf_size,
> > +                                           offset);
> > +                           return -EINVAL;
> > +                   }
> > +           } else {
> > +                   /* Split at fixed length. */
> > +                   length = length != 0 ? length : *mbp_buf_size;
> > +                   if (*mbp_buf_size < length + offset) {
> > +                           RTE_ETHDEV_LOG(ERR,
> > +                                   "%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > +                                   mpl->name, *mbp_buf_size,
> > +                                   length + offset, length, offset);
> > +                           return -EINVAL;
> > +                   }
> >             }
> >     }
> >     return 0;
> > @@ -1793,7 +1845,7 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> uint16_t rx_queue_id,
> >             n_seg = rx_conf->rx_nseg;
> >
> >             if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
> {
> > -                   ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> > +                   ret = rte_eth_rx_queue_check_split(port_id, rx_seg,
> n_seg,
> >                                                        &mbp_buf_size,
> >                                                        &dev_info);
> >                     if (ret != 0)
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > cf14e04010..a5f9647bd3 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -994,6 +994,9 @@ struct rte_eth_txmode {
> >    *   specified in the first array element, the second buffer, from the
> >    *   pool in the second element, and so on.
> >    *
> > + * - The proto_hdrs in the elements define the split position of
> > + *   received packets.
> > + *
> >    * - The offsets from the segment description elements specify
> >    *   the data offset from the buffer beginning except the first mbuf.
> >    *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> > @@ -1015,12 +1018,36 @@ struct rte_eth_txmode {
> >    *     - pool from the last valid element
> >    *     - the buffer size from this pool
> >    *     - zero offset
> > + *
> > + * - Length based buffer split:
> > + *     - mp, length, offset should be configured.
> > + *     - The proto_hdr field will be ignored.
> 
> Looking at the code above I think proto_hdr must be 0.
> 
> > + *
> > + * - Protocol header based buffer split:
> > + *     - mp, offset, proto_hdr should be configured.
> > + *     - The length field will be ignored.
> 
> I'd require length to be 0 to avoid misuse of the API.

Sure, we will fix them in v8.

> 
> > + *
> > + * - For Protocol header based buffer split, if the received packets
> > + *   don't exactly match all protocol headers in the elements, packets
> > + *   will not be split.
> > + *   These packets will be put into:
> > + *     - pool from the last valid element
> > + *     - the buffer size from this pool
> > + *     - zero offset
> 
> Shouldn't we check that dataroom in the last segment mempool is sufficient
> for up to MTU packet if Rx scatter is disabled?

Yes, we will add this check for the last segment.
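
As a rough idea of the shape of that check (a standalone sketch only, not the
v8 code; check_last_seg_room() and its parameters are hypothetical, and
whether the limit should be the MTU or the full frame length is still to be
decided):

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper, not an ethdev API. Packets that are not split
 * land in the last segment's pool with zero offset, so without Rx
 * scatter a single mbuf from that pool must hold the largest packet.
 */
static int
check_last_seg_room(uint32_t last_pool_data_room, uint32_t max_rx_pkt_len,
		    int scatter_enabled)
{
	if (scatter_enabled)
		return 0; /* the packet may span several mbufs */
	if (last_pool_data_room < max_rx_pkt_len)
		return -EINVAL;
	return 0;
}
```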

Thanks,
Yuan

> 
> >    */
> >   struct rte_eth_rxseg_split {
> >     struct rte_mempool *mp; /**< Memory pool to allocate segment
> from. */
> >     uint16_t length; /**< Segment data length, configures split point. */
> >     uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
> */
> > -   uint32_t reserved; /**< Reserved field. */
> > +   /**
> > +    * Proto_hdr defines a bit mask of the protocol sequence as
> RTE_PTYPE_*,
> > +    * configures split point. The last RTE_PTYPE* in the mask indicates
> the
> > +    * split position.
> > +    * For non-tunneling packets, the complete protocol sequence should
> be defined.
> > +    * For tunneling packets, for simplicity, only the tunnel and inner
> > +    * protocol sequence should be defined.
> > +    */
> > +   uint32_t proto_hdr;
> >   };
> >
> >   /**
