Hi Andrew, > -----Original Message----- > From: Andrew Rybchenko <andrew.rybche...@oktetlabs.ru> > Sent: Tuesday, September 20, 2022 1:35 PM > To: Wang, YuanX <yuanx.w...@intel.com>; dev@dpdk.org; Thomas > Monjalon <tho...@monjalon.net>; Ferruh Yigit <ferruh.yi...@xilinx.com> > Cc: m...@ashroe.eu; Li, Xiaoyun <xiaoyun...@intel.com>; Singh, Aman Deep > <aman.deep.si...@intel.com>; Zhang, Yuying <yuying.zh...@intel.com>; > Zhang, Qi Z <qi.z.zh...@intel.com>; Yang, Qiming <qiming.y...@intel.com>; > jerinjac...@gmail.com; viachesl...@nvidia.com; > step...@networkplumber.org; Ding, Xuan <xuan.d...@intel.com>; > hpoth...@marvell.com; Tang, Yaqi <yaqi.t...@intel.com>; Wenxuan Wu > <wenxuanx...@intel.com> > Subject: Re: [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split > > On 9/16/22 11:38, Wang, YuanX wrote: > > Hi Andrew, > > > >> -----Original Message----- > >> From: Andrew Rybchenko <andrew.rybche...@oktetlabs.ru> > >> Sent: Monday, September 12, 2022 7:47 PM > >> To: Wang, YuanX <yuanx.w...@intel.com>; dev@dpdk.org; Thomas > Monjalon > >> <tho...@monjalon.net>; Ferruh Yigit <ferruh.yi...@xilinx.com> > >> Cc: m...@ashroe.eu; Li, Xiaoyun <xiaoyun...@intel.com>; Singh, Aman > >> Deep <aman.deep.si...@intel.com>; Zhang, Yuying > >> <yuying.zh...@intel.com>; Zhang, Qi Z <qi.z.zh...@intel.com>; Yang, > >> Qiming <qiming.y...@intel.com>; jerinjac...@gmail.com; > >> viachesl...@nvidia.com; step...@networkplumber.org; Ding, Xuan > >> <xuan.d...@intel.com>; hpoth...@marvell.com; Tang, Yaqi > >> <yaqi.t...@intel.com>; Wenxuan Wu <wenxuanx...@intel.com> > >> Subject: Re: [PATCH v3 2/4] ethdev: introduce protocol hdr based > >> buffer split > >> > >> On 9/2/22 22:10, Yuan Wang wrote: > >>> Currently, Rx buffer split supports length based split. With Rx > >>> queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx > packet > >> segment > >>> configured, PMD will be able to split the received packets into > >>> multiple segments. > >>> > >>> However, length based buffer split is not suitable for NICs that do > >>> split based on protocol headers. Given an arbitrarily variable > >>> length in Rx packet segment, it is almost impossible to pass a fixed > >>> protocol header to driver. Besides, the existence of tunneling > >>> results in the composition of a packet is various, which makes the > situation even worse. > >>> > >>> This patch extends current buffer split to support protocol header > >>> based buffer split. A new proto_hdr field is introduced in the > >>> reserved field of rte_eth_rxseg_split structure to specify protocol > >>> header. The proto_hdr field defines the split position of packet, > >>> splitting will always happens after the protocol header defined in > >>> the Rx packet segment. When Rx queue offload > >>> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding > >> protocol > >>> header is configured, driver will split the ingress packets into > >>> multiple > >> segments. > >>> > >>> struct rte_eth_rxseg_split { > >>> struct rte_mempool *mp; /* memory pools to allocate > >>> segment from > >> */ > >>> uint16_t length; /* segment maximal data length, > >>> configures split point */ > >>> uint16_t offset; /* data offset from beginning > >>> of mbuf data buffer */ > >>> uint32_t proto_hdr; /* supported ptype of a specific pmd, > >>> configures split point. > >>> It should be defined by RTE_PTYPE_* > >> > >> If I understand correctly, the statement is a bit misleading since it > >> should be a bit mask of RTE_PTYPE_* defines. Not exactly one > RTE_PTYPE_*. > > > > Do you mean that a segment should support multiple protocol headers, > such as splitting both tcp and udp headers? > > No-no. Look. In order to split after some protocol, for example UDP, NIC > should recognice all previous protocols. Moreover, in the case of a tunenl > UDP could be inner and outer. At which point would you like to split if NIC > supports both? > Another example, if NIC support stplit after Eth-IPv4-UDP and after Eth-IPv6- > UDP, how to request to split just after Eth-IPv4-UDP, but not Eth-IPv6-UDP?
Thank you for your patience. We have a proposal to solve this problem. We define the proto_hdr as a bit mask. Each mask should contain the composition of the packet, but inner and outer are written separately (to avoid unnecessary trouble). We use the highest RTE_PTYPE* in the mask to define the split position. For the first example, since ptype has distinguished between outer and inner, external UDP can be simply written as RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP. The inner UDP can be simply written as RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP. For the second example, Eth-IPv4-UDP can be written as RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP. Eth-IPv6-UDP can be written as RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP. What do you think? Thanks, Yuan > > > > >> > >>> */ > >>> }; > >>> > >>> If protocol header split can be supported by a PMD. The > >>> rte_eth_buffer_split_get_supported_hdr_ptypes function can be use > to > >>> obtain a list of these protocol headers. > >>> > >>> For example, let's suppose we configured the Rx queue with the > >>> following segments: > >>> seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B > >>> seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B > >>> seg2 - pool2, off1=0B > >>> > >>> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like > >> > >> What is MAC_IPV4_UDP_PAYLOAD? Do you mean > ETH_IPV4_UDP_PAYLOAD? > > > > Thanks for your correction, it should be ETH_IPV4_UDP_PAYLOAD. > > > >> > >>> following: > >>> seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from > >> pool0 > >>> seg1 - udp header @ 128 in mbuf from pool1 > >>> seg2 - payload @ 0 in mbuf from pool2 > >>> > >>> Note: NIC will only do split when the packets exactly match all the > >>> protocol headers in the segments. For example, if ARP packets > >>> received with above config, the NIC won't do split for ARP packets > >>> since it does not contains ipv4 header and udp header. > >> > >> You must define which mempool is used in the case. > > > > IMHO I don't think we can define which mempool to use, it depends on NIC > behavior. > > For our NIC, packets that are unable to split will be put into the last > > valid > pool, with zero offset. > > So here we would like to define to put these packets into the last valid > mempool, with zero offset. > > > Anyway API should not be silent about the case since it is the first question > at > least in my head. IMHO the last segment is the only sensible option since it > is > typically will be big enough. Other mempool for protocol headers are likely to > be small. So, I suggest to define this way. > > >> > >>> > >>> Now buffer split can be configured in two modes. For length based > >>> buffer split, the mp, length, offset field in Rx packet segment > >>> should be configured, while the proto_hdr field will be ignored. > >>> For protocol header based buffer split, the mp, offset, proto_hdr > >>> field in Rx packet segment should be configured, while the length > >>> field will be ignored. > >>> > >>> The split limitations imposed by underlying driver is reported in > >>> the rte_eth_dev_info->rx_seg_capa field. The memory attributes for > >>> the split parts may differ either, dpdk memory and external memory, > >> respectively. > >>> > >>> Signed-off-by: Yuan Wang <yuanx.w...@intel.com> > >>> Signed-off-by: Xuan Ding <xuan.d...@intel.com> > >>> Signed-off-by: Wenxuan Wu <wenxuanx...@intel.com> > >>> --- > >>> doc/guides/rel_notes/release_22_11.rst | 5 +++ > >>> lib/ethdev/rte_ethdev.c | 55 ++++++++++++++++++++------ > >>> lib/ethdev/rte_ethdev.h | 17 +++++++- > >>> 3 files changed, 65 insertions(+), 12 deletions(-) > >>> > >>> diff --git a/doc/guides/rel_notes/release_22_11.rst > >>> b/doc/guides/rel_notes/release_22_11.rst > >>> index 4d90514a9a..f3b58c7895 100644 > >>> --- a/doc/guides/rel_notes/release_22_11.rst > >>> +++ b/doc/guides/rel_notes/release_22_11.rst > >>> @@ -60,6 +60,11 @@ New Features > >>> Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to > >>> get > >> supported > >>> header protocols of a PMD to split. > >>> > >>> +* **Added protocol header based buffer split.** > >>> + Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` > >>> +structure is > >>> + replaced with ``proto_hdr`` to support protocol header based buffer > split. > >>> + User can choose length or protocol header to configure buffer > >>> +split > >>> + according to NIC's capability. > >> > >> Add one more empty line to have two before the next sectoin. > > > > Thanks for your catch. > > > >> > >>> > >>> Removed Items > >>> ------------- > >>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index > >>> 093c577add..dfceb723ee 100644 > >>> --- a/lib/ethdev/rte_ethdev.c > >>> +++ b/lib/ethdev/rte_ethdev.c > >>> @@ -1635,9 +1635,10 @@ rte_eth_dev_is_removed(uint16_t port_id) > >>> } > >>> > >>> static int > >>> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split > *rx_seg, > >>> - uint16_t n_seg, uint32_t *mbp_buf_size, > >>> - const struct rte_eth_dev_info *dev_info) > >>> +rte_eth_rx_queue_check_split(uint16_t port_id, > >>> + const struct rte_eth_rxseg_split *rx_seg, > >>> + uint16_t n_seg, uint32_t *mbp_buf_size, > >>> + const struct rte_eth_dev_info *dev_info) > >>> { > >>> const struct rte_eth_rxseg_capa *seg_capa = &dev_info- > >>> rx_seg_capa; > >>> struct rte_mempool *mp_first; > >>> @@ -1660,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct > >> rte_eth_rxseg_split *rx_seg, > >>> struct rte_mempool *mpl = rx_seg[seg_idx].mp; > >>> uint32_t length = rx_seg[seg_idx].length; > >>> uint32_t offset = rx_seg[seg_idx].offset; > >>> + uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr; > >>> > >>> if (mpl == NULL) { > >>> RTE_ETHDEV_LOG(ERR, "null mempool pointer\n"); > >> @@ -1693,13 > >>> +1695,44 @@ rte_eth_rx_queue_check_split(const struct > >> rte_eth_rxseg_split *rx_seg, > >>> } > >>> offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM; > >>> *mbp_buf_size = rte_pktmbuf_data_room_size(mpl); > >>> - length = length != 0 ? length : *mbp_buf_size; > >>> - if (*mbp_buf_size < length + offset) { > >>> - RTE_ETHDEV_LOG(ERR, > >>> - "%s mbuf_data_room_size %u < %u > >> (segment length=%u + segment offset=%u)\n", > >>> - mpl->name, *mbp_buf_size, > >>> - length + offset, length, offset); > >>> - return -EINVAL; > >>> + > >>> + int ret = > >> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, > >>> +NULL, 0); > >> > >> Do not mix variable declaration and code. > >> It is better to give the variable some sensible name. > >> Otherwise else branch code is hard to read. > > > > Thanks for the suggestion, will take care of naming. > > > >> > >>> + if (ret <= 0) { > >> > >> May be I'm missing something, but nothing prevetns a driver/HW to > >> support both protocol-based and fixed-length split. > >> So, ability to support protocol based split should be treated as a > >> request to do it. It must be based on rx_seg->proto_hdr content (for all > segments). > >> > >> Also nothing should prevent to mix protocol and fixed-length split. > >> I.e. split just after UDP in the first segment, > >> 40 bytes in the second segment, everything else in the third. > > > > Mix mode is an interesting idea. Currently testpmd and driver do not > support mixed mode, but it does not affect the library to support this mode. > > That's OK. > > > > >> > >>> + /* Split at fixed length. */ > >>> + length = length != 0 ? length : *mbp_buf_size; > >>> + if (*mbp_buf_size < length + offset) { > >>> + RTE_ETHDEV_LOG(ERR, > >>> + "%s mbuf_data_room_size %u < %u > >> (segment length=%u + segment offset=%u)\n", > >>> + mpl->name, *mbp_buf_size, > >>> + length + offset, length, offset); > >>> + return -EINVAL; > >>> + } > >>> + } else { > >>> + /* Split after specified protocol header. */ > >>> + uint32_t ptypes[ret]; > >>> + int i; > >>> + > >>> + ret = > >> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, > >>> +ptypes, ret); > >> > >> In theory, the funciton could fail since input arguments differ. So, > >> it should be handled. > > > > Thanks for your catch, will fix in the next version. > > > >> > >>> + for (i = 0; i < ret; i++) > >>> + if (ptypes[i] & proto_hdr) > >> > >> IMHO it should be ==, not &. I think that > >> rte_eth_buffer_split_get_supported_hdr_ptypes() should define points > >> at which split could happen and we should match the point exactly. > > > > Sure, == is better. Thanks for the suggestion. > > > >> > >>> + break; > >>> + > >>> + if (i == ret) { > >>> +#define PTYPE_NAMESIZE 256 > >> > >> Why? It is looks really strange that it is defined here. > > > > I intend to display the protocol name in the log, but if the proto_hdr is a > > bit > mask, can I just show the number? > > Please see v4 for this modification. > > > >> > >>> + char ptype_name[PTYPE_NAMESIZE]; > >>> + rte_get_ptype_name(proto_hdr, > >> ptype_name, sizeof(ptype_name)); > >>> + RTE_ETHDEV_LOG(ERR, > >>> + "Protocol header %s is not > >> supported.\n", > >>> + ptype_name); > >>> + return -EINVAL; > >>> + } > >>> + if (*mbp_buf_size < offset) { > >>> + RTE_ETHDEV_LOG(ERR, > >>> + "%s > >> mbuf_data_room_size %u < %u segment offset)\n", > >>> + mpl->name, *mbp_buf_size, > >>> + offset); > >>> + return -EINVAL; > >>> + } > >>> } > >>> } > >>> return 0; > >>> @@ -1778,7 +1811,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, > >> uint16_t rx_queue_id, > >>> n_seg = rx_conf->rx_nseg; > >>> > >>> if (rx_conf->offloads & > >> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) { > >>> - ret = rte_eth_rx_queue_check_split(rx_seg, n_seg, > >>> + ret = rte_eth_rx_queue_check_split(port_id, rx_seg, > >> n_seg, > >>> > >>> &mbp_buf_size, > >>> &dev_info); > >>> if (ret != 0) > >>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index > >>> c58c908c3a..410fba5eab 100644 > >>> --- a/lib/ethdev/rte_ethdev.h > >>> +++ b/lib/ethdev/rte_ethdev.h > >>> @@ -1175,6 +1175,9 @@ struct rte_eth_txmode { > >>> * specified in the first array element, the second buffer, from the > >>> * pool in the second element, and so on. > >>> * > >>> + * - The proto_hdrs in the elements define the split position of > >>> + * received packets. > >>> + * > >>> * - The offsets from the segment description elements specify > >>> * the data offset from the buffer beginning except the first mbuf. > >>> * The first segment offset is added with RTE_PKTMBUF_HEADROOM. > >>> @@ -1196,12 +1199,24 @@ struct rte_eth_txmode { > >>> * - pool from the last valid element > >>> * - the buffer size from this pool > >>> * - zero offset > >>> + * > >>> + * - Length based buffer split: > >>> + * - mp, length, offset should be configured. > >>> + * - The proto_hdr field will be ignored. > >>> + * > >>> + * - Protocol header based buffer split: > >>> + * - mp, offset, proto_hdr should be configured. > >>> + * - The length field will be ignored. > >>> */ > >>> struct rte_eth_rxseg_split { > >>> struct rte_mempool *mp; /**< Memory pool to allocate segment > >> from. */ > >>> uint16_t length; /**< Segment data length, configures split > >>> point. */ > >>> uint16_t offset; /**< Data offset from beginning of mbuf data > >>> buffer. > >> */ > >>> - uint32_t reserved; /**< Reserved field. */ > >>> + /** > >>> + * Supported ptype of a specific pmd, configures split point. > >>> + * It should be defined by RTE_PTYPE_*. > >>> + */ > >>> + uint32_t proto_hdr; > >>> }; > >>> > >>> /** > >