Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>
> Sent: Monday, October 3, 2022 3:47 PM
> To: Wang, YuanX <yuanx.w...@intel.com>; dev@dpdk.org; Thomas
> Monjalon <tho...@monjalon.net>; Ferruh Yigit <ferruh.yi...@amd.com>
> Cc: ferruh.yi...@xilinx.com; m...@ashroe.eu; Li, Xiaoyun
> <xiaoyun...@intel.com>; Singh, Aman Deep <aman.deep.si...@intel.com>;
> Zhang, Yuying <yuying.zh...@intel.com>; Zhang, Qi Z
> <qi.z.zh...@intel.com>; Yang, Qiming <qiming.y...@intel.com>;
> jerinjac...@gmail.com; viachesl...@nvidia.com;
> step...@networkplumber.org; Ding, Xuan <xuan.d...@intel.com>;
> hpoth...@marvell.com; Tang, Yaqi <yaqi.t...@intel.com>; Wenxuan Wu
> <wenxuanx...@intel.com>
> Subject: Re: [PATCH v7 2/4] ethdev: introduce protocol hdr based buffer split
>
> On 10/2/22 00:05, Yuan Wang wrote:
> > Currently, Rx buffer split supports length based split. With Rx queue
> > offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> > configured, PMD will be able to split the received packets into
> > multiple segments.
> >
> > However, length based buffer split is not suitable for NICs that do
> > split based on protocol headers. Given an arbitrarily variable length
> > in Rx packet segment, it is almost impossible to pass a fixed protocol
> > header to driver. Besides, the existence of tunneling results in the
> > composition of a packet is various, which makes the situation even worse.
> >
> > This patch extends current buffer split to support protocol header
> > based buffer split. A new proto_hdr field is introduced in the
> > reserved field of rte_eth_rxseg_split structure to specify protocol
> > header. The proto_hdr field defines the split position of packet,
> > splitting will always happen after the protocol header defined in the
> > Rx packet segment.
> > When Rx queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and
> > corresponding protocol header is configured, driver will split the
> > ingress packets into multiple segments.
> >
> > Examples for proto_hdr field defines:
> > To split after ETH-IPV4-UDP, it should be defined as
> > RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP
> >
> > For inner ETH-IPV4-UDP, it should be defined as
> > RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
> > RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP
> >
> > struct rte_eth_rxseg_split {
> >     struct rte_mempool *mp; /* memory pools to allocate segment from */
> >     uint16_t length; /* segment maximal data length,
> >                         configures split point */
> >     uint16_t offset; /* data offset from beginning
> >                         of mbuf data buffer */
> >     /**
> >      * Proto_hdr defines a bit mask of the protocol sequence as
> >      * RTE_PTYPE_*, configures split point. The last RTE_PTYPE*
> >      * in the mask indicates the split position.
> >      * For non-tunneling packets, the complete protocol sequence
> >      * should be defined.
> >      * For tunneling packets, for simplicity, only the tunnel and
> >      * inner protocol sequence should be defined.
> >      */
> >     uint32_t proto_hdr;
> > };
> >
> > If protocol header split can be supported by a PMD, the
> > rte_eth_buffer_split_get_supported_hdr_ptypes function can be used to
> > obtain a list of these protocol headers.
> >
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >     seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
> >            off0=2B
> >     seg1 - pool1, proto_hdr1=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4
> >            | RTE_PTYPE_L4_UDP, off1=128B
> >     seg2 - pool2, off1=0B
> >
> > The packet consists of ETH_IPV4_UDP_PAYLOAD will be split like
> > following:
> >     seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> >     seg1 - udp header @ 128 in mbuf from pool1
> >     seg2 - payload @ 0 in mbuf from pool2
> >
> > Note: NIC will only do split when the packets exactly match all the
> > protocol headers in the segments. For example, if ARP packets received
> > with above config, the NIC won't do split for ARP packets since it
> > does not contain ipv4 header and udp header. These packets will be
> > put into the last valid mempool, with zero offset.
> >
> > Now buffer split can be configured in two modes. For length based
> > buffer split, the mp, length, offset field in Rx packet segment should
> > be configured, while the proto_hdr field will be ignored.
> > For protocol header based buffer split, the mp, offset, proto_hdr
> > field in Rx packet segment should be configured, while the length
> > field will be ignored.
> >
> > The split limitations imposed by underlying driver is reported in the
> > rte_eth_dev_info->rx_seg_capa field. The memory attributes for the
> > split parts may differ either, dpdk memory and external memory,
> > respectively.
> >
> > Signed-off-by: Yuan Wang <yuanx.w...@intel.com>
> > Signed-off-by: Xuan Ding <xuan.d...@intel.com>
> > Signed-off-by: Wenxuan Wu <wenxuanx...@intel.com>
>
> I apologize for delay with review. Overall LGTM now. See few notes below.

Thanks so much for your time and patience for this patch series.

> > ---
> >  doc/guides/rel_notes/release_22_11.rst |  7 +++
> >  lib/ethdev/rte_ethdev.c                | 74 ++++++++++++++++++++++----
> >  lib/ethdev/rte_ethdev.h                | 29 +++++++++-
> >  3 files changed, 98 insertions(+), 12 deletions(-)
> >
> > diff --git a/doc/guides/rel_notes/release_22_11.rst
> > b/doc/guides/rel_notes/release_22_11.rst
> > index 6a7474a3d6..510869c73a 100644
> > --- a/doc/guides/rel_notes/release_22_11.rst
> > +++ b/doc/guides/rel_notes/release_22_11.rst
> > @@ -101,6 +101,13 @@ New Features
> >     * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
> >       header protocols of a PMD to split.
> >
> > +* **Added protocol header based buffer split.**
> > +
> > +  * Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` structure is
> > +    replaced with ``proto_hdr`` to support protocol header based buffer split.
> > +    User can choose length or protocol header to configure buffer split
> > +    according to NIC's capability.
> > +
>
> It should be grouped together with other ethdev features.

We will send a new version. For the doc changes, the same as patch 1, could you help to adjust the doc? Thanks very much.

> >
> >  Removed Items
> >  -------------
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> > index 1f0a7f8f3f..27ec19faed 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -1649,9 +1649,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
> >  }
> >
> >  static int
> > -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> > -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> > -			     const struct rte_eth_dev_info *dev_info)
> > +rte_eth_rx_queue_check_split(uint16_t port_id,
> > +			const struct rte_eth_rxseg_split *rx_seg,
> > +			uint16_t n_seg, uint32_t *mbp_buf_size,
> > +			const struct rte_eth_dev_info *dev_info)
> >  {
> >  	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
> >  	struct rte_mempool *mp_first;
> > @@ -1674,6 +1675,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> >  		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >  		uint32_t length = rx_seg[seg_idx].length;
> >  		uint32_t offset = rx_seg[seg_idx].offset;
> > +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
> >
> >  		if (mpl == NULL) {
> >  			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> > @@ -1707,13 +1709,63 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> >  		}
> >  		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >  		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > -		length = length != 0 ? length : *mbp_buf_size;
> > -		if (*mbp_buf_size < length + offset) {
> > -			RTE_ETHDEV_LOG(ERR,
> > -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> > -				       mpl->name, *mbp_buf_size,
> > -				       length + offset, length, offset);
> > -			return -EINVAL;
> > +
> > +		if (proto_hdr > 0) {
> > +			/* Split based on protocol headers. */
>
> Isn't it safer here to ensure that segment length is set to 0?
> Just to protect against misusage etc.

It's a reasonable suggestion, I will take it, please see v8.

> > +
> > +			/* skip the payload */
>
> Sorry, it is confusing.
> What do you mean here?

Because setting n proto_hdr values generates (n+1) segments. If we want to split the packet into n segments, we only need to check the first (n-1) proto_hdr values. For example, for ETH-IPV4-UDP-PAYLOAD, if we want to split after the UDP header, we only need to set and check the UDP header in the first segment. Maybe a mask is not a good way to do this, so we will use the segment index to skip the proto_hdr check for the last segment.

> > +			if (proto_hdr == RTE_PTYPE_ALL_MASK)
> > +				continue;
> > +
> > +			int ptype_cnt;
> > +
> > +			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
> > +			if (ptype_cnt <= 0) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"Port %u failed to supported buffer split header protocols\n",
> > +					port_id);
> > +				return -EINVAL;
> > +			}
> > +
> > +			uint32_t ptypes[ptype_cnt];
> > +			int i;
>
> First of all do not mix code and variable declaration.
> It significantly complicates code reading.

Thanks, the code and variable declarations will be separated.

> Second creation of an array on stack based on function return value is very
> dangerous from security point of view - potential stack overflow and
> corresponding vulnerabilities.

The return value is used to determine how much space is needed to store the ptypes. Thanks for pointing out the stack overflow risk; we will allocate the array on the heap instead.
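
To make the two planned changes concrete (skipping the last segment by index, and a heap buffer instead of a variable-length array), here is a minimal standalone sketch. It is not the v8 code. get_supported_hdr_ptypes() is a hypothetical stub that only imitates the calling convention of rte_eth_buffer_split_get_supported_hdr_ptypes() as used in the patch, the ptype values are made-up placeholders, check_split_proto_hdrs() is an illustrative name, and plain malloc()/free() is an assumption about which allocator would be used.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stub, not a DPDK function. It copies at most num entries
 * into ptypes (when non-NULL) and returns the total number of supported
 * header ptypes. The values below are placeholders.
 */
static int
get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num)
{
	static const uint32_t supported[] = { 0x11, 0x211 };
	const int total = (int)(sizeof(supported) / sizeof(supported[0]));
	int i;

	(void)port_id;
	for (i = 0; ptypes != NULL && i < num && i < total; i++)
		ptypes[i] = supported[i];
	return total;
}

/*
 * Validate proto_hdrs[0 .. n_seg - 2] against the supported list.
 * The last segment only receives the remaining payload, so its
 * proto_hdr is skipped by index rather than by a special mask value.
 */
int
check_split_proto_hdrs(uint16_t port_id, const uint32_t *proto_hdrs,
		       uint16_t n_seg)
{
	uint32_t *ptypes;
	int alloc_cnt;
	int ptype_cnt;
	uint16_t seg_idx;
	int ret = 0;
	int i;

	/* First call: query how many entries are needed. */
	alloc_cnt = get_supported_hdr_ptypes(port_id, NULL, 0);
	if (alloc_cnt <= 0)
		return -ENOTSUP;

	/* Heap allocation instead of a VLA sized by a return value. */
	ptypes = malloc(sizeof(*ptypes) * (size_t)alloc_cnt);
	if (ptypes == NULL)
		return -ENOMEM;

	/* Second call: fill the array; never read past what was allocated. */
	ptype_cnt = get_supported_hdr_ptypes(port_id, ptypes, alloc_cnt);
	if (ptype_cnt < 0) {
		ret = -EINVAL;
		goto out;
	}
	if (ptype_cnt > alloc_cnt)
		ptype_cnt = alloc_cnt;

	for (seg_idx = 0; seg_idx + 1 < n_seg; seg_idx++) {
		for (i = 0; i < ptype_cnt; i++)
			if (ptypes[i] == proto_hdrs[seg_idx])
				break;
		if (i == ptype_cnt) {
			ret = -EINVAL;
			break;
		}
	}
out:
	free(ptypes);
	return ret;
}
```

With this shape, every exit path after the allocation goes through free(), and a three-segment configuration only has its first two proto_hdr values checked.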

> > +
> > +			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> > +									ptypes, ptype_cnt);
> > +			if (ptype_cnt < 0) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"Port %u failed to supported buffer split header protocols\n",
> > +					port_id);
> > +				return -EINVAL;
> > +			}
> > +
> > +			for (i = 0; i < ptype_cnt; i++)
> > +				if (ptypes[i] == proto_hdr)
> > +					break;
> > +			if (i == ptype_cnt) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"Requested Rx split header protocols 0x%x is not supported.\n",
> > +					proto_hdr);
> > +				return -EINVAL;
> > +			}
> > +
> > +			if (*mbp_buf_size < offset) {
>
> The check is obviously insufficient, but I agree that it should be driver
> responsibility to do extra checks for required space in mbuf.
>
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"%s mbuf_data_room_size %u < %u segment offset)\n",
> > +					mpl->name, *mbp_buf_size,
> > +					offset);
> > +				return -EINVAL;
> > +			}
> > +		} else {
> > +			/* Split at fixed length. */
> > +			length = length != 0 ? length : *mbp_buf_size;
> > +			if (*mbp_buf_size < length + offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> > +					mpl->name, *mbp_buf_size,
> > +					length + offset, length, offset);
> > +				return -EINVAL;
> > +			}
> >  		}
> >  	}
> >  	return 0;
> > @@ -1793,7 +1845,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> >  		n_seg = rx_conf->rx_nseg;
> >
> >  		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> > -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> > +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
> >  							   &mbp_buf_size,
> >  							   &dev_info);
> >  			if (ret != 0)
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index cf14e04010..a5f9647bd3 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -994,6 +994,9 @@ struct rte_eth_txmode {
> >   *   specified in the first array element, the second buffer, from the
> >   *   pool in the second element, and so on.
> >   *
> > + * - The proto_hdrs in the elements define the split position of
> > + *   received packets.
> > + *
> >   * - The offsets from the segment description elements specify
> >   *   the data offset from the buffer beginning except the first mbuf.
> >   *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> > @@ -1015,12 +1018,36 @@ struct rte_eth_txmode {
> >   *     - pool from the last valid element
> >   *     - the buffer size from this pool
> >   *     - zero offset
> > + *
> > + * - Length based buffer split:
> > + *     - mp, length, offset should be configured.
> > + *     - The proto_hdr field will be ignored.
>
> Looking at the code above I think proto_hdr must be 0.
>
> > + *
> > + * - Protocol header based buffer split:
> > + *     - mp, offset, proto_hdr should be configured.
> > + *     - The length field will be ignored.
>
> I'd require length to be 0 to avoid misusage of the API.

Sure, we will fix them in v8.

> > + *
> > + * - For Protocol header based buffer split, if the received packets
> > + *   don't exactly match all protocol headers in the elements, packets
> > + *   will not be split.
> > + *   These packets will be put into:
> > + *     - pool from the last valid element
> > + *     - the buffer size from this pool
> > + *     - zero offset
>
> Shouldn't we check that dataroom in the last segment mempool is sufficient
> for up to MTU packet if Rx scatter is disabled?

Yes, we will add this check in the last segment.

Thanks,
Yuan

> >   */
> >  struct rte_eth_rxseg_split {
> >  	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
> >  	uint16_t length; /**< Segment data length, configures split point. */
> >  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> > -	uint32_t reserved; /**< Reserved field. */
> > +	/**
> > +	 * Proto_hdr defines a bit mask of the protocol sequence as RTE_PTYPE_*,
> > +	 * configures split point. The last RTE_PTYPE* in the mask indicates the
> > +	 * split position.
> > +	 * For non-tunneling packets, the complete protocol sequence should be defined.
> > +	 * For tunneling packets, for simplicity, only the tunnel and inner
> > +	 * protocol sequence should be defined.
> > +	 */
> > +	uint32_t proto_hdr;
> >  };
> >
> >  /**