Hi Xuan,
On 4/12/22 19:15, Ding, Xuan wrote:
Hi Andrew,
-----Original Message-----
From: Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>
Sent: Thursday, April 7, 2022 6:48 PM
To: Wu, WenxuanX <wenxuanx...@intel.com>; tho...@monjalon.net; Li,
Xiaoyun <xiaoyun...@intel.com>; Singh, Aman Deep
<aman.deep.si...@intel.com>; Zhang, Yuying <yuying.zh...@intel.com>;
Zhang, Qi Z <qi.z.zh...@intel.com>
Cc: dev@dpdk.org; step...@networkplumber.org;
m...@smartsharesystems.com; viachesl...@nvidia.com; Yu, Ping
<ping...@intel.com>; Ding, Xuan <xuan.d...@intel.com>; Wang, YuanX
<yuanx.w...@intel.com>; david.march...@redhat.com; Ferruh Yigit
<ferr...@xilinx.com>
Subject: Re: [v4 1/3] ethdev: introduce protocol type based header split
On 4/2/22 13:41, wenxuanx...@intel.com wrote:
From: Xuan Ding <xuan.d...@intel.com>
Header split consists of splitting a received packet into two separate
regions based on the packet content. The split happens after the
packet header and before the packet payload. Splitting is usually
between the packet header that can be posted to a dedicated buffer and
the packet payload that can be posted to a different buffer.
Currently, Rx buffer split supports length and offset based packet split.
Although header split is a subset of buffer split, configuring buffer
split based on length is not suitable for NICs that do split based on
header protocol types, because tunneling makes the conversion from
length to protocol type impossible.
This patch extends the current buffer split to support protocol type
and offset based header split. A new proto field is introduced in the
rte_eth_rxseg_split structure reserved field to specify header
protocol type. With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
enabled and protocol type configured, PMD will split the ingress
packets into two separate regions. Currently, both inner and outer
L2/L3/L4 level header split can be supported.
RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload was introduced some time
ago to substitute bit-field header_split in struct rte_eth_rxmode. It allows enabling
enable header split offload with the header size controlled using
split_hdr_size in the same structure.
Right now I see no PMD that actually supports
RTE_ETH_RX_OFFLOAD_HEADER_SPLIT with the above definition.
Many examples and test apps initialize the field to 0 explicitly. Most
drivers simply ignore split_hdr_size since the offload is not advertised, but
some double-check that its value is 0.
I think this means that the field should be removed in the next LTS and, I'd
say, together with the RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload bit.
We should not redefine the offload meaning.
Yes, you are right. No PMD supports RTE_ETH_RX_OFFLOAD_HEADER_SPLIT now.
Previously, I used this flag to distinguish buffer split from header split.
The former supports multi-segments split by length and offset.
Offset is misleading here, since the split offset is derived from
segment lengths. The offset specified in segments is a different
thing.
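To illustrate the point about offsets: in length-based buffer split, the position in the packet where each segment begins follows from the lengths of the preceding segments, while the per-segment offset only says where data lands inside that segment's buffer. A minimal sketch, using a hypothetical simplified struct seg_cfg and a hypothetical helper split_point() rather than the real ethdev types:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct rte_eth_rxseg_split. */
struct seg_cfg {
	uint16_t length; /* bytes of packet data placed in this segment */
	uint16_t offset; /* where data starts inside the segment's buffer */
};

/*
 * Return the packet offset at which segment seg_idx begins, i.e. the sum
 * of the lengths of all preceding segments. The per-segment 'offset'
 * field plays no part in this calculation.
 */
static uint32_t
split_point(const struct seg_cfg *segs, size_t n_seg, size_t seg_idx)
{
	uint32_t pkt_off = 0;
	size_t i;

	assert(segs != NULL && seg_idx < n_seg);
	for (i = 0; i < seg_idx; i++)
		pkt_off += segs[i].length;
	return pkt_off;
}
```

With segment lengths of 14 and 64 bytes, the second segment starts at packet offset 14 and the third at 78, whatever values the offset fields hold.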
The latter supports two segments split by proto and offset.
At this level, header split is a subset of buffer split.
IMHO, a generic definition of header split should not limit
it to just two segments.
Since we shouldn't redefine the meaning of this offload,
I will use the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
The existence of tunnels makes it necessary to define a proto field in buffer split,
because some PMDs do not support split based on length and offset.
Not sure that I fully understand, but I'm looking forward
to reviewing v5.
For example, let's suppose we configured the Rx queue with the
following segments:
seg0 - pool0, off0=2B
seg1 - pool1, off1=128B
The corresponding feature is named Rx buffer split.
Does it mean that protocol type based header split requires Rx buffer split
feature to be supported?
Protocol type based header split does not require Rx buffer split.
In the previous design, header split and buffer split are mutually exclusive,
because we only configure one split offload for one Rx queue.
With the header split type set to RTE_ETH_RX_HEADER_SPLIT_UDP,
a packet consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
pool0
seg1 - payload @ 128 in mbuf from pool1
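As a rough illustration of where the split point falls for MAC_IP_UDP_PAYLOAD, the sketch below computes the offset of the first payload byte in software. It assumes an untagged Ethernet frame carrying IPv4 and ignores IPv6, VLAN tags, IP fragments and tunnels. udp_split_offset() and the constants are hypothetical names, and a real NIC would do this classification in hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN     14u     /* untagged Ethernet II header */
#define IPV4_MIN_HDR    20u     /* IPv4 header without options */
#define ETHERTYPE_IPV4  0x0800u
#define IP_PROTO_UDP    17u
#define UDP_HDR_LEN     8u

/*
 * Return the offset of the first payload byte of an untagged
 * Ethernet/IPv4/UDP frame, or 0 if the frame does not match that stack.
 */
static size_t
udp_split_offset(const uint8_t *pkt, size_t len)
{
	size_t ip_hdr_len;

	assert(pkt != NULL);
	if (len < ETH_HDR_LEN + IPV4_MIN_HDR)
		return 0;
	/* EtherType is stored big-endian at bytes 12..13. */
	if ((((unsigned int)pkt[12] << 8) | pkt[13]) != ETHERTYPE_IPV4)
		return 0;
	/* First IPv4 byte: version in the high nibble, IHL in the low one. */
	if ((pkt[ETH_HDR_LEN] >> 4) != 4)
		return 0;
	ip_hdr_len = (size_t)(pkt[ETH_HDR_LEN] & 0x0f) * 4;
	/* The IPv4 protocol field is byte 9 of the IP header. */
	if (ip_hdr_len < IPV4_MIN_HDR || pkt[ETH_HDR_LEN + 9] != IP_PROTO_UDP)
		return 0;
	if (len < ETH_HDR_LEN + ip_hdr_len + UDP_HDR_LEN)
		return 0;
	return ETH_HDR_LEN + ip_hdr_len + UDP_HDR_LEN;
}
```

For a frame with a 20-byte IPv4 header this gives 14 + 20 + 8 = 42, so bytes 0..41 would land in the mbuf from pool0 and the remainder in the mbuf from pool1. The 0 return marks a frame that does not match the configured protocol; how a PMD should treat such a frame is left open here.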
Is it always the outermost UDP? Does it require both UDP over IPv4 and UDP over
IPv6 to be supported? What will happen if only one is supported? How can an
application find out which protocol stacks are supported?
Both inner and outer UDP are considered.
Current design does not distinguish UDP over IPv4 or IPv6.
If we want finer granularity, such as supporting only IPv4 or only IPv6,
the user needs to add more configuration.
You should make it clear to the application how to use it.
What happens if an unsupported packet is received on an RxQ
configured to do header split?
If an application wants to find out which protocol stacks are supported,
one way, I think, is to expose the protocol stacks supported by the driver through
dev_info.
Any thoughts are welcome :)
dev_info is nice, but very heavily overloaded. We can start
from dev_info and then see whether it should be factored out
into a separate API or whether it is OK to keep it in dev_info if
it is just a few simple fields.
The memory attributes for the split parts may differ as well - for
example, mempool0 and mempool1 may belong to DPDK memory and
external memory, respectively.
Signed-off-by: Xuan Ding <xuan.d...@intel.com>
Signed-off-by: Yuan Wang <yuanx.w...@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx...@intel.com>
Reviewed-by: Qi Zhang <qi.z.zh...@intel.com>
---
lib/ethdev/rte_ethdev.c | 34 ++++++++++++++++++++++-------
lib/ethdev/rte_ethdev.h | 48 +++++++++++++++++++++++++++++++++++++++--
2 files changed, 72 insertions(+), 10 deletions(-)
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 29a3d80466..29adcdc2f0 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
struct rte_mempool *mpl = rx_seg[seg_idx].mp;
uint32_t length = rx_seg[seg_idx].length;
uint32_t offset = rx_seg[seg_idx].offset;
+ uint16_t proto = rx_seg[seg_idx].proto;
if (mpl == NULL) {
RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1694,13 +1695,29 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
}
offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
- length = length != 0 ? length : *mbp_buf_size;
- if (*mbp_buf_size < length + offset) {
- RTE_ETHDEV_LOG(ERR,
- "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
- mpl->name, *mbp_buf_size,
- length + offset, length, offset);
- return -EINVAL;
+ if (proto == RTE_ETH_RX_HEADER_SPLIT_NONE) {
+ /* Check buffer split. */
+ length = length != 0 ? length : *mbp_buf_size;
+ if (*mbp_buf_size < length + offset) {
+ RTE_ETHDEV_LOG(ERR,
+ "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+ mpl->name, *mbp_buf_size,
+ length + offset, length, offset);
+ return -EINVAL;
+ }
+ } else {
+ /* Check header split. */
+ if (length != 0) {
+ RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in header split\n");
+ return -EINVAL;
+ }
+ if (*mbp_buf_size < offset) {
+ RTE_ETHDEV_LOG(ERR,
+ "%s mbuf_data_room_size %u < %u (segment offset)\n",
+ mpl->name, *mbp_buf_size,
+ offset);
+ return -EINVAL;
+ }
}
}
return 0;
@@ -1778,7 +1795,8 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
n_seg = rx_conf->rx_nseg;
- if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+ if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT ||
+ rx_conf->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
&mbp_buf_size,
&dev_info);
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 04cff8ee10..e8371b98ed 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1197,12 +1197,31 @@ struct rte_eth_txmode {
* - pool from the last valid element
* - the buffer size from this pool
* - zero offset
+ *
+ * Header split is a subset of buffer split. The split happens after the
+ * packet header and before the packet payload. For PMDs that do not
+ * support header split configuration by length, the location of the split
+ * needs to be specified by the header protocol type. While for buffer split,
+ * this field should not be configured.
+ *
+ * If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT flag is set in offloads field,
+ * the PMD will split the received packets into two separate regions:
+ * - The header buffer will be allocated from the memory pool,
+ * specified in the first array element, the second buffer, from the
+ * pool in the second element.
+ *
+ * - The lengths do not need to be configured in header split.
+ *
+ * - The offsets from the segment description elements specify
+ * the data offset from the buffer beginning except the first mbuf.
+ * The first segment offset is added with RTE_PKTMBUF_HEADROOM.
*/
struct rte_eth_rxseg_split {
struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
uint16_t length; /**< Segment data length, configures split point. */
uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
- uint32_t reserved; /**< Reserved field. */
+ uint16_t proto; /**< header protocol type, configures header split point. */
I realize that you don't want to use the enum type here in order to save some
reserved space, but the description must refer to the enum
rte_eth_rx_header_split_protocol_type.
Thanks for your suggestion, will fix it in next version.
+ uint16_t reserved; /**< Reserved field. */
As far as I can see, the structure is experimental, so extending it should not be a
problem, but it is a really good question raised by Stephen in the RFC
v1 discussion.
Shouldn't we require that all reserved fields are initialized to zero and
ignored on processing? Frankly speaking, I always thought so, but failed to
find the place where it is documented.
Yes, it can be documented. By default it should be zero, and we can configure
it to enable protocol type based buffer split.
@Thomas, @David, @Ferruh?
};
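If the requirement that reserved fields be zero were enforced in code rather than only documented, one strict variant could look like the sketch below. It rejects non-zero reserved fields instead of ignoring them, which is a different policy from "ignored on processing" and is chosen here only as an illustration. struct rxseg_sketch is a simplified stand-in for the layout in this patch, and check_reserved_zero() is a hypothetical helper, not an existing ethdev function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the segment descriptor as laid out in the patch. */
struct rxseg_sketch {
	void *mp;          /* memory pool (opaque in this sketch) */
	uint16_t length;
	uint16_t offset;
	uint16_t proto;
	uint16_t reserved; /* must be zero under the strict policy */
};

/* Return 0 if every reserved field is zero, -EINVAL otherwise. */
static int
check_reserved_zero(const struct rxseg_sketch *segs, size_t n_seg)
{
	size_t i;

	assert(segs != NULL || n_seg == 0);
	for (i = 0; i < n_seg; i++) {
		if (segs[i].reserved != 0)
			return -EINVAL;
	}
	return 0;
}
```

Rejecting non-zero values keeps the bits free for later reuse, since no existing application can then depend on garbage being accepted.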
/**
@@ -1212,7 +1231,7 @@ struct rte_eth_rxseg_split {
* A common structure used to describe Rx packet segment properties.
*/
union rte_eth_rxseg {
- /* The settings for buffer split offload. */
+ /* The settings for buffer split and header split offload. */
struct rte_eth_rxseg_split split;
/* The other features settings should be added here. */
};
@@ -1664,6 +1683,31 @@ struct rte_eth_conf {
RTE_ETH_RX_OFFLOAD_QINQ_STRIP)
#define DEV_RX_OFFLOAD_VLAN RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN) RTE_ETH_RX_OFFLOAD_VLAN
+/**
+ * @warning
+ * @b EXPERIMENTAL: this enum may change without prior notice.
+ * This enum indicates the header split protocol type
+ */
+enum rte_eth_rx_header_split_protocol_type {
+ RTE_ETH_RX_HEADER_SPLIT_NONE = 0,
+ RTE_ETH_RX_HEADER_SPLIT_MAC,
+ RTE_ETH_RX_HEADER_SPLIT_IPV4,
+ RTE_ETH_RX_HEADER_SPLIT_IPV6,
+ RTE_ETH_RX_HEADER_SPLIT_L3,
+ RTE_ETH_RX_HEADER_SPLIT_TCP,
+ RTE_ETH_RX_HEADER_SPLIT_UDP,
+ RTE_ETH_RX_HEADER_SPLIT_SCTP,
+ RTE_ETH_RX_HEADER_SPLIT_L4,
+ RTE_ETH_RX_HEADER_SPLIT_INNER_MAC,
+ RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4,
+ RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6,
+ RTE_ETH_RX_HEADER_SPLIT_INNER_L3,
+ RTE_ETH_RX_HEADER_SPLIT_INNER_TCP,
+ RTE_ETH_RX_HEADER_SPLIT_INNER_UDP,
+ RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP,
+ RTE_ETH_RX_HEADER_SPLIT_INNER_L4,
Enumeration members should be documented. See my question in the patch
description.
Thanks for your detailed comments, questions are answered accordingly.
Best Regards,
Xuan
+};
+
/*
* If new Rx offload capabilities are defined, they also must be
* mentioned in rte_rx_offload_names in rte_ethdev.c file.
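For readers following the rte_eth_rx_queue_check_split() hunk, the per-segment check it introduces can be restated as a standalone function. This is a sketch of the logic as posted in this patch version, not the final API: check_segment() is a hypothetical name, SKETCH_HEADROOM assumes the common default of 128 for RTE_PKTMBUF_HEADROOM, and buf_size stands for the value returned by rte_pktmbuf_data_room_size().

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_HEADROOM   128u /* assumed value of RTE_PKTMBUF_HEADROOM */
#define SKETCH_SPLIT_NONE 0u   /* mirrors RTE_ETH_RX_HEADER_SPLIT_NONE */

/* Validate one segment description; return 0 if valid, -EINVAL otherwise. */
static int
check_segment(uint32_t buf_size, uint32_t length, uint32_t offset,
	      uint16_t proto, unsigned int seg_idx)
{
	/* Only the first segment has the mbuf headroom added to its offset. */
	offset += seg_idx != 0 ? 0 : SKETCH_HEADROOM;

	if (proto == SKETCH_SPLIT_NONE) {
		/* Buffer split: a zero length defaults to the data room size. */
		length = length != 0 ? length : buf_size;
		if (buf_size < length + offset)
			return -EINVAL;
	} else {
		/* Header split: the split point comes from proto, not length. */
		if (length != 0)
			return -EINVAL;
		if (buf_size < offset)
			return -EINVAL;
	}
	return 0;
}
```

In the header split branch only the offset is checked against the buffer size, because the header length is not known at configuration time.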