> -----Original Message----- > From: Ferruh Yigit <ferruh.yi...@xilinx.com> > Sent: Wednesday, August 24, 2022 9:04 PM > To: Ding, Xuan <xuan.d...@intel.com>; Hanumanth Reddy Pothula > <hpoth...@marvell.com>; Thomas Monjalon <tho...@monjalon.net>; Andrew > Rybchenko <andrew.rybche...@oktetlabs.ru> > Cc: dev@dpdk.org; Wu, WenxuanX <wenxuanx...@intel.com>; Li, Xiaoyun > <xiaoyun...@intel.com>; step...@networkplumber.org; Wang, YuanX > <yuanx.w...@intel.com>; m...@ashroe.eu; Zhang, Yuying > <yuying.zh...@intel.com>; Zhang, Qi Z <qi.z.zh...@intel.com>; > viachesl...@nvidia.com; Jerin Jacob Kollanukkaran <jer...@marvell.com>; > Nithin Kumar Dabilpuram <ndabilpu...@marvell.com> > Subject: [EXT] Re: [PATCH v2 1/3] ethdev: introduce pool sort capability > > External Email > > ----------------------------------------------------------------------
Thanks Ding Xuan and Ferruh Yigit for reviewing the changes and for providing your valuable feedback. Please find responses inline. > On 8/23/2022 4:26 AM, Ding, Xuan wrote: > > Hi Hanumanth, > > > >> -----Original Message----- > >> From: Hanumanth Pothula <hpoth...@marvell.com> > >> Sent: Saturday, August 13, 2022 1:25 AM > >> To: Thomas Monjalon <tho...@monjalon.net>; Ferruh Yigit > >> <ferruh.yi...@xilinx.com>; Andrew Rybchenko > >> <andrew.rybche...@oktetlabs.ru> > >> Cc: dev@dpdk.org; Ding, Xuan <xuan.d...@intel.com>; Wu, WenxuanX > >> <wenxuanx...@intel.com>; Li, Xiaoyun <xiaoyun...@intel.com>; > >> step...@networkplumber.org; Wang, YuanX <yuanx.w...@intel.com>; > >> m...@ashroe.eu; Zhang, Yuying <yuying.zh...@intel.com>; Zhang, Qi Z > >> <qi.z.zh...@intel.com>; viachesl...@nvidia.com; jer...@marvell.com; > >> ndabilpu...@marvell.com; Hanumanth Pothula <hpoth...@marvell.com> > >> Subject: [PATCH v2 1/3] ethdev: introduce pool sort capability > >> > >> Presently, the 'Buffer Split' feature supports sending multiple > >> segments of the received packet to PMD, which programs the HW to > >> receive the packet in segments from different pools. > >> > >> This patch extends the feature to support the pool sort capability. > >> Some of the HW has support for choosing memory pools based on the > >> packet's size. The pool sort capability allows PMD to choose a memory > >> pool based on the packet's length. > >> > >> This is often useful for saving the memory where the application can > >> create a different pool to steer the specific size of the packet, > >> thus enabling effective use of memory. > >> > >> For example, let's say HW has a capability of three pools, > >> - pool-1 size is 2K > >> - pool-2 size is > 2K and < 4K > >> - pool-3 size is > 4K > >> Here, > >> pool-1 can accommodate packets with sizes < 2K > >> pool-2 can accommodate packets with sizes > 2K and < 4K > >> pool-3 can accommodate packets with sizes > 4K > >> > >> With pool sort capability enabled in SW, an application may create > >> three pools of different sizes and send them to PMD. Allowing PMD to > >> program HW based on packet lengths. So that packets with less than 2K > >> are received on pool-1, packets with lengths between 2K and 4K are > >> received on pool-2 and finally packets greater than 4K are received on > >> pool- > 3. > >> > >> The following two capabilities are added to the rte_eth_rxseg_capa > >> structure, 1. pool_sort --> tells pool sort capability is supported by HW. > >> 2. max_npool --> max number of pools supported by HW. > >> > >> Defined new structure rte_eth_rxseg_sort, to be used only when pool > >> sort capability is present. If required this may be extended further > >> to support more configurations. > >> > >> Signed-off-by: Hanumanth Pothula <hpoth...@marvell.com> > >> > >> v2: > >> - Along with spec changes, uploading testpmd and driver changes. > > > > Thanks for CCing. It's an interesting feature. > > > > But I have one question here: > > Buffer split is for split receiving packets into multiple segments, > > while pool sort supports PMD to put the receiving packets into different > > pools > according to packet size. > > Every packet is still intact. > > > > So, at this level, pool sort does not belong to buffer split. > > And you already use a different function to check pool sort rather than > > check > buffer split. > > > > Should a new RX offload be introduced? like > "RTE_ETH_RX_OFFLOAD_POOL_SORT". > > Please find my response below. > > Hi Hanumanth, > > I had the similar concern with the feature. I assume you want to benefit from > exiting config structure that gets multiple mempool as argument, since this > feature also needs multiple mempools, but the feature is different. > > It looks to me wrong to check 'OFFLOAD_BUFFER_SPLIT' offload to decide if to > receive into multiple mempool or not, which doesn't have anything related > split. > Also not sure about using the 'sort' keyword. > What do you think to introduce new fetaure, instead of extending existing > split > one? Actually we thought both BUFFER_SPLIT and POOL_SORT are similar features where RX pools are configured in certain way and thought not use up one more RX offload capability, as the existing software architecture can be extended to support pool_sort capability. Yes, as part of pool sort, there is no buffer split but pools are picked based on the buffer length. Since you think it's better to use new RX offload for POOL_SORT, will go ahead and implement the same. > This is optimisation, right? To enable us to use less memory for the packet > buffer, does it qualify to a device offload? > Yes, its qualify as a device offload and saves memory. Marvel NIC has a capability to receive packets on two different pools based on its length. Below explained more on the same. > > Also, what is the relation with segmented Rx, how a PMD decide to use > segmented Rx or bigger mempool? How can application can configure this? > > Need to clarify the rules, based on your sample, if a 512 bytes packet > received, > does it have to go pool-1, or can it go to any of three pools? > Here, Marvell NIC supports two HW pools, SPB(small packet buffer) pool and LPB(large packet buffer) pool. SPB pool can hold up to 4KB LPB pool can hold anything more than 4KB Smaller packets are received on SPB pool and larger packets on LPB pool, based on the RQ configuration. Here, in our case HW pools holds whole packet. So if a packet is divided into segments, lower layer HW going to receive all segments of the packet and then going to place the whole packet in SPB/LPB pool, based on the packet length. As pools are picked based on the packets length we used SORT term. In case you have any better term(word), please suggest. > > And I don't see any change in the 'net/cnxk' Rx burst code, when > multiple mempool used, while filling the mbufs shouldn't it check which > mempool is filled. How this works without update in the Rx burst code, > or am I missing some implementation detail? > Please find PMD changes in patch [v2,3/3] net/cnxk: introduce pool sort capability Here, in control path, HW pools are programmed based on the inputs it received from the application. Once the HW is programmed, packets are received on HW pools based the packets sizes. > I will upload V3 where POOL_SORT is implemented as new RX OFFLOAD, unless If you have any other suggestion/thoughts. > >> --- > >> lib/ethdev/rte_ethdev.c | 87 +++++++++++++++++++++++++++++++++++----- > - > >> lib/ethdev/rte_ethdev.h | 45 +++++++++++++++++++-- > >> 2 files changed, 118 insertions(+), 14 deletions(-) > >> > >> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index > >> 1979dc0850..7fd5443eb8 100644 > >> --- a/lib/ethdev/rte_ethdev.c > >> +++ b/lib/ethdev/rte_ethdev.c > >> @@ -1635,7 +1635,55 @@ rte_eth_dev_is_removed(uint16_t port_id) } > >> > >> static int > >> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg, > >> +rte_eth_rx_queue_check_sort(const struct rte_eth_rxseg *rx_seg, > >> + uint16_t n_seg, uint32_t *mbp_buf_size, > >> + const struct rte_eth_dev_info *dev_info) { > >> + const struct rte_eth_rxseg_capa *seg_capa = &dev_info- > >>> rx_seg_capa; > >> + uint16_t seg_idx; > >> + > >> + if (!seg_capa->multi_pools || n_seg > seg_capa->max_npool) { > >> + RTE_ETHDEV_LOG(ERR, > >> + "Invalid capabilities, multi_pools:%d differnt > >> length segments %u exceed supported %u\n", > >> + seg_capa->multi_pools, n_seg, seg_capa- > >>> max_nseg); > >> + return -EINVAL; > >> + } > >> + > >> + for (seg_idx = 0; seg_idx < n_seg; seg_idx++) { > >> + struct rte_mempool *mpl = rx_seg[seg_idx].sort.mp; > >> + uint32_t length = rx_seg[seg_idx].sort.length; > >> + > >> + if (mpl == NULL) { > >> + RTE_ETHDEV_LOG(ERR, "null mempool pointer\n"); > >> + return -EINVAL; > >> + } > >> + > >> + if (mpl->private_data_size < > >> + sizeof(struct rte_pktmbuf_pool_private)) { > >> + RTE_ETHDEV_LOG(ERR, > >> + "%s private_data_size %u < %u\n", > >> + mpl->name, mpl->private_data_size, > >> + (unsigned int)sizeof > >> + (struct rte_pktmbuf_pool_private)); > >> + return -ENOSPC; > >> + } > >> + > >> + *mbp_buf_size = rte_pktmbuf_data_room_size(mpl); > >> + length = length != 0 ? length : (*mbp_buf_size - > >> RTE_PKTMBUF_HEADROOM); > >> + if (*mbp_buf_size < length + RTE_PKTMBUF_HEADROOM) { > >> + RTE_ETHDEV_LOG(ERR, > >> + "%s mbuf_data_room_size %u < %u))\n", > >> + mpl->name, *mbp_buf_size, > >> + length); > >> + return -EINVAL; > >> + } > >> + } > >> + > >> + return 0; > >> +} > >> + > >> +static int > >> +rte_eth_rx_queue_check_split(const struct rte_eth_rxseg *rx_seg, > >> uint16_t n_seg, uint32_t *mbp_buf_size, > >> const struct rte_eth_dev_info *dev_info) { @@ - > >> 1654,12 +1702,12 @@ rte_eth_rx_queue_check_split(const struct > >> rte_eth_rxseg_split *rx_seg, > >> * Check the sizes and offsets against buffer sizes > >> * for each segment specified in extended configuration. > >> */ > >> - mp_first = rx_seg[0].mp; > >> + mp_first = rx_seg[0].split.mp; > >> offset_mask = RTE_BIT32(seg_capa->offset_align_log2) - 1; > >> for (seg_idx = 0; seg_idx < n_seg; seg_idx++) { > >> - struct rte_mempool *mpl = rx_seg[seg_idx].mp; > >> - uint32_t length = rx_seg[seg_idx].length; > >> - uint32_t offset = rx_seg[seg_idx].offset; > >> + struct rte_mempool *mpl = rx_seg[seg_idx].split.mp; > >> + uint32_t length = rx_seg[seg_idx].split.length; > >> + uint32_t offset = rx_seg[seg_idx].split.offset; > >> > >> if (mpl == NULL) { > >> RTE_ETHDEV_LOG(ERR, "null mempool pointer\n"); > >> @@ -1693,7 +1741,11 @@ rte_eth_rx_queue_check_split(const struct > >> rte_eth_rxseg_split *rx_seg, > >> } > >> offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM; > >> *mbp_buf_size = rte_pktmbuf_data_room_size(mpl); > >> - length = length != 0 ? length : *mbp_buf_size; > >> + /* On segment length == 0, update segment's length with > >> + * the pool's length - headeroom space, to make sure enough > >> + * space is accomidate for header. > >> + **/ > >> + length = length != 0 ? length : (*mbp_buf_size - > >> +RTE_PKTMBUF_HEADROOM); > >> if (*mbp_buf_size < length + offset) { > >> RTE_ETHDEV_LOG(ERR, > >> "%s mbuf_data_room_size %u < %u > >> (segment length=%u + segment offset=%u)\n", @@ -1764,7 +1816,6 @@ > >> rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id, > >> return -EINVAL; > >> } > >> } else { > >> - const struct rte_eth_rxseg_split *rx_seg; > >> uint16_t n_seg; > >> > >> /* Extended multi-segment configuration check. */ @@ - > >> 1774,13 +1825,27 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t > >> rx_queue_id, > >> return -EINVAL; > >> } > >> > >> - rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg; > >> n_seg = rx_conf->rx_nseg; > >> > >> if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) > >> { > >> - ret = rte_eth_rx_queue_check_split(rx_seg, n_seg, > >> - &mbp_buf_size, > >> - &dev_info); > >> + ret = -1; /* To make sure at least one of below > >> conditions becomes > >> +true */ > >> + > >> + /* Check both NIX and application supports buffer- > >> split capability */ > >> + if (dev_info.rx_seg_capa.mode_split && > >> + rx_conf->mode_flag == > >> RTE_ETH_RXSEG_MODE_SPLIT) { > >> + ret = rte_eth_rx_queue_check_split(rx_conf- > >>> rx_seg, n_seg, > >> + > >> &mbp_buf_size, > >> + &dev_info); > >> + } > >> + > >> + /* Check both NIX and application supports pool-sort > >> capability */ > >> + if (dev_info.rx_seg_capa.mode_sort && > >> + rx_conf->mode_flag == > >> RTE_ETH_RXSEG_MODE_SORT) { > >> + ret = rte_eth_rx_queue_check_sort(rx_conf- > >>> rx_seg, n_seg, > >> + > >> &mbp_buf_size, > >> + &dev_info); > >> + } > >> + > >> if (ret != 0) > >> return ret; > >> } else { > >> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index > >> de9e970d4d..9f6787d7ad 100644 > >> --- a/lib/ethdev/rte_ethdev.h > >> +++ b/lib/ethdev/rte_ethdev.h > >> @@ -1204,16 +1204,46 @@ struct rte_eth_rxseg_split { > >> uint32_t reserved; /**< Reserved field. */ }; > >> > >> +/** > >> + * The pool sort capability allows PMD to choose a memory pool based on > >> +the > >> + * packet's length. So, basically, PMD programs HW for receiving > >> +packets from > >> + * different pools, based on the packet's length. > >> + * > >> + * This is often useful for saving the memory where the application can > >> +create > >> + * a different pool to steer the specific size of the packet, thus > >> +enabling > >> + * effective use of memory. > >> + */ > >> +struct rte_eth_rxseg_sort { > >> + struct rte_mempool *mp; /**< Memory pool to allocate packets > >> from. */ > >> + uint16_t length; /**< Packet data length. */ > >> + uint32_t reserved; /**< Reserved field. */ }; > >> + > >> +enum rte_eth_rxseg_mode { > >> + /** > >> + * Buffer split mode: PMD split the received packets into multiple > >> segments. > >> + * @see struct rte_eth_rxseg_split > >> + */ > >> + RTE_ETH_RXSEG_MODE_SPLIT = RTE_BIT64(0), > >> + /** > >> + * Pool sort mode: PMD to chooses a memory pool based on the > >> packet's length. > >> + * @see struct rte_eth_rxseg_sort > >> + */ > >> + RTE_ETH_RXSEG_MODE_SORT = RTE_BIT64(1), }; > >> + > >> /** > >> * @warning > >> * @b EXPERIMENTAL: this structure may change without prior notice. > >> * > >> * A common structure used to describe Rx packet segment properties. > >> */ > >> -union rte_eth_rxseg { > >> +struct rte_eth_rxseg { > >> /* The settings for buffer split offload. */ > >> struct rte_eth_rxseg_split split; > >> - /* The other features settings should be added here. */ > >> + > >> + /*The settings for packet sort offload. */ > >> + struct rte_eth_rxseg_sort sort; > >> }; > >> > >> /** > >> @@ -1239,6 +1269,11 @@ struct rte_eth_rxconf { > >> * fields on rte_eth_dev_info structure are allowed to be set. > >> */ > >> uint64_t offloads; > >> + /** > >> + * PMD may support more than one rxseg mode. This allows > >> application > >> + * to chose which mode to enable. > >> + */ > >> + enum rte_eth_rxseg_mode mode_flag; > >> /** > >> * Points to the array of segment descriptions for an entire packet. > >> * Array elements are properties for consecutive Rx segments. > >> @@ -1246,7 +1281,7 @@ struct rte_eth_rxconf { > >> * The supported capabilities of receiving segmentation is reported > >> * in rte_eth_dev_info.rx_seg_capa field. > >> */ > >> - union rte_eth_rxseg *rx_seg; > >> + struct rte_eth_rxseg *rx_seg; > >> > >> uint64_t reserved_64s[2]; /**< Reserved for future fields */ > >> void *reserved_ptrs[2]; /**< Reserved for future fields */ > >> @@ -1827,10 +1862,14 @@ struct rte_eth_switch_info { > >> */ > >> struct rte_eth_rxseg_capa { > >> __extension__ > >> + uint32_t mode_split : 1; /**< Supports buffer split capability @see > >> struct rte_eth_rxseg_split */ > >> + uint32_t mode_sort : 1; /**< Supports pool sort capability @see > >> struct > > > > The same doubt here. As I know, the 'rte_eth_rxseg_capa' structure is used > > for > buffer split. > > > > Thanks, > > Xuan > > > >> +rte_eth_rxseg_sort */ > >> uint32_t multi_pools:1; /**< Supports receiving to multiple pools.*/ > >> uint32_t offset_allowed:1; /**< Supports buffer offsets. */ > >> uint32_t offset_align_log2:4; /**< Required offset alignment. */ > >> uint16_t max_nseg; /**< Maximum amount of segments to split. */ > >> + /* < Maximum amount of pools that PMD can sort based on > >> packet/segment lengths */ > >> + uint16_t max_npool; > >> uint16_t reserved; /**< Reserved field. */ }; > >> > >> -- > >> 2.25.1 > >