Re: [PATCH v2] app/testpmd: add flush multicast MAC address command
On 2023/9/29 1:23, Ferruh Yigit wrote: > On 8/2/2023 7:23 AM, Dengdui Huang wrote: >> Add command to flush multicast MAC address >> Usage: >> mcast_addr flush : >> flush multicast MAC address on port_id >> >> Signed-off-by: Dengdui Huang >> --- >> app/test-pmd/cmdline.c | 43 + >> app/test-pmd/config.c | 18 + >> app/test-pmd/testpmd.h | 1 + >> doc/guides/testpmd_app_ug/testpmd_funcs.rst | 8 >> 4 files changed, 70 insertions(+) >> >> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c >> index 0d0723f659..d5a3bd2287 100644 >> --- a/app/test-pmd/cmdline.c >> +++ b/app/test-pmd/cmdline.c >> @@ -516,6 +516,9 @@ static void cmd_help_long_parsed(void *parsed_result, >> "set allmulti (port_id|all) (on|off)\n" >> "Set the allmulti mode on port_id, or all.\n\n" >> >> +"mcast_addr flush (port_id)\n" >> +"To flush the set of multicast addresses.\n\n" >> +> > > I assume 'set of multicast address' comes from the API getting > 'mc_addr_set' as parameter, but using 'set' is not useful in the command > description, what about: > "Flush all multicast MAC addresses on port_id." > > Similarly there are logs/documents below refers as "set of multicast > addresses", does it make sense to replace all as "all multicast MAC > addresses"? > > > And relating to the ordering, as you said "mcast_addr add|remove ..." is > missing, which is contributing to the confusion. > Can you please add this first (in a separate patch) to just below > "mac_addr .*" group, later put "mcast_addr flush .*' below it in this patch? > > >> "set flow_ctrl rx (on|off) tx (on|off) (high_water)" >> " (low_water) (pause_time) (send_xon) >> mac_ctrl_frame_fwd" >> " (on|off) autoneg (on|off) (port_id)\n" >> @@ -8561,6 +8564,45 @@ static cmdline_parse_inst_t cmd_mcast_addr = { >> }, >> }; >> >> +/* *** FLUSH MULTICAST MAC ADDRESS ON PORT *** */ >> +struct cmd_mcast_addr_flush_result { >> +cmdline_fixed_string_t mcast_addr_cmd; >> +cmdline_fixed_string_t what; >> +uint16_t port_num; >> +}; >> + >> +static void cmd_mcast_addr_flush_parsed(void *parsed_result, >> +__rte_unused struct cmdline *cl, >> +__rte_unused void *data) >> +{ >> +struct cmd_mcast_addr_flush_result *res = parsed_result; >> + >> +mcast_addr_flush(res->port_num); >> +} >> + >> +static cmdline_parse_token_string_t cmd_mcast_addr_flush_cmd = >> +TOKEN_STRING_INITIALIZER(struct cmd_mcast_addr_result, >> + mcast_addr_cmd, "mcast_addr"); >> +static cmdline_parse_token_string_t cmd_mcast_addr_flush_what = >> +TOKEN_STRING_INITIALIZER(struct cmd_mcast_addr_result, what, >> + "flush"); >> +static cmdline_parse_token_num_t cmd_mcast_addr_flush_portnum = >> +TOKEN_NUM_INITIALIZER(struct cmd_mcast_addr_result, port_num, >> + RTE_UINT16); >> + >> +static cmdline_parse_inst_t cmd_mcast_addr_flush = { >> +.f = cmd_mcast_addr_flush_parsed, >> +.data = (void *)0, >> +.help_str = "mcast_addr flush : " >> +"flush multicast MAC address on port_id",> > > What about: > "Flush all multicast MAC addresses on port_id." > > >> +.tokens = { >> +(void *)&cmd_mcast_addr_flush_cmd, >> +(void *)&cmd_mcast_addr_flush_what, >> +(void *)&cmd_mcast_addr_flush_portnum, >> +NULL, >> +}, >> +}; >> + >> /* vf vlan anti spoof configuration */ >> >> /* Common result structure for vf vlan anti spoof */ >> @@ -12929,6 +12971,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = { >> (cmdline_parse_inst_t *)&cmd_set_port_meter_stats_mask, >> (cmdline_parse_inst_t *)&cmd_show_port_meter_stats, >> (cmdline_parse_inst_t *)&cmd_mcast_addr, >> +(cmdline_parse_inst_t *)&cmd_mcast_addr_flush, >> (cmdline_parse_inst_t *)&cmd_set_vf_vlan_anti_spoof, >> (cmdline_parse_inst_t *)&cmd_set_vf_mac_anti_spoof, >> (cmdline_parse_inst_t *)&cmd_set_vf_vlan_stripq, >> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c >> index 11f3a22048..d66d1db37c 100644 >> --- a/app/test-pmd/config.c >> +++ b/app/test-pmd/config.c >> @@ -6802,6 +6802,24 @@ mcast_addr_remove(portid_t port_id, struct >> rte_ether_addr *mc_addr) >> mcast_addr_pool_append(port, mc_addr); >> } >> >> +void >> +mcast_addr_flush(portid_t port_id) >> +{ >> +int ret; >> + >> +if (port_id_is_invalid(port_id, ENABLED_WARN)) >> +return; >> + >> +ret = rte_eth_dev_set_mc_addr_list(port_id, NULL, 0); >> +if (ret != 0) { >> +fprintf(stderr, >> +"Failed to flush the set of filtered addresses on port >> %u\n", >> +port_id); >> +return; >> +} >> +
Re: [PATCH v3 1/2] app/testpmd: fix help string
On 2023/10/8 10:47, fengchengwen wrote: > Hi Dengdui, > > On 2023/10/8 9:56, Dengdui Huang wrote: >> Command help string is missing 'mcast_addr add|remove'. >> This patch add it. >> >> Fixes: 8fff667578a7 ("app/testpmd: new command to add/remove multicast MAC >> addresses") >> Cc: sta...@dpdk.org >> >> Signed-off-by: Dengdui Huang >> --- >> app/test-pmd/cmdline.c | 6 ++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c >> index a0e97719b3..3165347a05 100644 >> --- a/app/test-pmd/cmdline.c >> +++ b/app/test-pmd/cmdline.c >> @@ -516,6 +516,12 @@ static void cmd_help_long_parsed(void *parsed_result, >> "set allmulti (port_id|all) (on|off)\n" >> "Set the allmulti mode on port_id, or all.\n\n" >> >> +"mcast_addr add (port_id) (mcast_addr)\n" >> +"Add a multicast MAC addresses on port_id.\n\n" >> + >> +"mcast_addr remove (port_id) (mcast_addr)" > > Please add '\n' at last. > > with above fixed, > Acked-by: Chengwen Feng > OK, thanks >> +"Remove a multicast MAC address from port_id.\n\n" >> + >> "set flow_ctrl rx (on|off) tx (on|off) (high_water)" >> " (low_water) (pause_time) (send_xon) >> mac_ctrl_frame_fwd" >> " (on|off) autoneg (on|off) (port_id)\n" >>
Re: [PATCH v6 1/2] bus/pci: fix secondary process PCI uio resource map problem
On 2024/7/2 15:40, Chaoyong He wrote: > From: Zerun Fu > > For the primary process, the logic loops all BARs and will skip > the map of BAR with an invalid physical address (0), also will > assign 'uio_res->nb_maps' with the real mapped BARs number. But > for the secondary process, instead of loops all BARs, the logic > using the 'uio_res->nb_map' as index. If the device uses continuous > BARs there will be no problem, whereas if it uses discrete BARs, > it will lead to mapping errors. > > Fix this problem by also loops all BARs and skip the map of BAR > with an invalid physical address in secondary process. > > Fixes: 9b957f378abf ("pci: merge uio functions for linux and bsd") Fixes: e6cf7bee1c77 ("bus/pci: fix UIO resource access from secondary process") > Cc: muk...@igel.co.jp > Cc: sta...@dpdk.org > > Signed-off-by: Zerun Fu > Reviewed-by: Chaoyong He > Reviewed-by: Long Wu > Reviewed-by: Peng Zhang > Acked-by: Anatoly Burakov Tested-by: Dengdui Huang
Re: [PATCH] app/testpmd: fix segment fault with invalid queue id
On 2023/5/16 23:12, Stephen Hemminger wrote: > On Tue, 16 May 2023 19:00:21 +0800 > Dengdui Huang wrote: > >> When input queue id is invalid, it will lead to >> Segmentation fault, like: >> >> dpdk-testpmd -a :01:00.0 -- -i >> testpmd> show port 0 txq/rxq 99 desc 0 status >> Segmentation fault >> >> dpdk-testpmd -a :01:00.0 -- -i >> testpmd> show port 0 rxq 99 desc used count >> Segmentation fault >> >> This patch fixes it. >> >> In addition, this patch add the check for the offset >> of the descriptor in case of other anomalies. >> >> Fixes: fae9aa717d6c ("app/testpmd: support checking descriptor status") >> Fixes: 3f9acb5c83bb ("ethdev: avoid non-dataplane checks in Rx queue count") >> Cc: sta...@dpdk.org >> >> Signed-off-by: Dengdui Huang > > What is the backtrace and device driver? The problem is that other users > besides testpmd might hit same problem. > > It would make sense to have a function to test for valid rx and tx queue id > in rte_ethdev. Similar to existing rte_eth_dev_is_valid_port() rather than > open coding it. Maybe rte_eth_dev_is_valid_rxq(port_id, queue_id)? > > here was talk that the existing rx queue descriptor status is racy, and > unused by any real application; and therefore would be good candidate for > future removal. Hi, Stephen OK. Agreed. But these APIs are still in dpdk, so testpmd as user side should use them correctly. Thanks, Dengdui
Re: [PATCH v2 1/2] ethdev: add API to check queue ID validity
On 2023/5/22 21:58, Andrew Rybchenko wrote: > On 5/22/23 16:09, Dengdui Huang wrote: >> The API rte_eth_dev_is_valid_rxq/txq checks >> the port ID validity and then the Rx/Tx queue ID is valid. > > What is valid Tx/Rx queue? It depends on on caller > expectations. Some functions are satisfied with just > check vs configured number of queues. Some require > the queue to be setup. May be some should require > the queue to be started. > > So, I suggest to avoid term "valid" and be more precise > here and API naming. > Hi Andrew, Thanks for your review, Accepted and fixed in v3. >> >> Signed-off-by: Dengdui Huang >> --- >> doc/guides/rel_notes/release_23_07.rst | 5 >> lib/ethdev/rte_ethdev.c | 30 + >> lib/ethdev/rte_ethdev.h | 36 ++ >> lib/ethdev/version.map | 4 +++ >> 4 files changed, 75 insertions(+) >> >> diff --git a/doc/guides/rel_notes/release_23_07.rst >> b/doc/guides/rel_notes/release_23_07.rst >> index a9b1293689..19e645156f 100644 >> --- a/doc/guides/rel_notes/release_23_07.rst >> +++ b/doc/guides/rel_notes/release_23_07.rst >> @@ -56,6 +56,11 @@ New Features >> === >> +* **Added ethdev Rx/Tx queue id check API.** >> + >> + Added ethdev Rx/Tx queue id check API which provides functions > > id -> ID > >> + for check if Rx/Tx queue id is valid. > > id -> ID > >> + > > It should be two empty lines here and just one above. > Thanks, will fix in v3 >> Removed Items >> - >> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c >> index 4d03255683..3d85218127 100644 >> --- a/lib/ethdev/rte_ethdev.c >> +++ b/lib/ethdev/rte_ethdev.c >> @@ -407,6 +407,36 @@ rte_eth_dev_is_valid_port(uint16_t port_id) >> return is_valid; >> } >> +int >> +rte_eth_dev_is_valid_rxq(uint16_t port_id, uint16_t queue_id) >> +{ >> + struct rte_eth_dev *dev; >> + >> + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); >> + dev = &rte_eth_devices[port_id]; >> + >> + if (queue_id >= dev->data->nb_rx_queues || >> + dev->data->rx_queues[queue_id] == NULL) > > We already have internal eth_dev_validate_tx_queue(). Shouldn't > it be used here? > Thanks, will do in v3. > Also, some functions check that queues array is not NULL. > If the the is excessive after queue number check, it > should be consistent everywhere and corresponding check > of the array pointer vs NULL should be removed in a separate > cleanup patch. If the check is required in some corner cases > (I hope no), it should be here as well. > I cannot understand what you mean by that. >> + return -EINVAL; >> + >> + return 0; >> +} > > [snip] >
Re: [PATCH v2 1/2] ethdev: add API to check queue ID validity
On 2023/6/2 6:13, Ferruh Yigit wrote: > On 5/31/2023 5:31 PM, Ferruh Yigit wrote: >> On 5/22/2023 2:58 PM, Andrew Rybchenko wrote: >>> On 5/22/23 16:09, Dengdui Huang wrote: The API rte_eth_dev_is_valid_rxq/txq checks the port ID validity and then the Rx/Tx queue ID is valid. >>> >>> What is valid Tx/Rx queue? It depends on on caller >>> expectations. Some functions are satisfied with just >>> check vs configured number of queues. Some require >>> the queue to be setup. May be some should require >>> the queue to be started. >>> >>> So, I suggest to avoid term "valid" and be more precise >>> here and API naming. >>> >> >> I understand the concern 'valid' keyword, but we already have an API as >> 'rte_eth_dev_is_valid_port()', which does similar checks, >> >> so 'rte_eth_dev_is_valid_rxq()' & 'rte_eth_dev_is_valid_txq()' looks >> consistent with it. >> >> v3 has API names, 'rte_eth_dev_rxq_avail()' & 'rte_eth_dev_txq_avail()', >> I am not sure about these naming too, it feels like queues are valid but >> it maybe in available and not available states. >> >> >> @Andrew, do you have any suggestion on the API naming? >> If not I am for going with rte_eth_dev_is_valid_rxq()' & >> 'rte_eth_dev_is_valid_txq()' mainly because of existing >> 'rte_eth_dev_is_valid_port()' API. >> >> Perhaps we can elaborate what 'valid' means in API documentation to help >> users. >> > > Hi Dengdui, > > It looks like there is no better suggestion, lets not block this patch > more and continue with > 'rte_eth_dev_is_valid_rxq()' & 'rte_eth_dev_is_valid_txq()' API names. > > Can you please send a v4, with changes in v3 but API names as above, and > more description in the API documentation for what 'valid' means? > Hi Ferruh, Thanks for your review, I will do in v4. > Signed-off-by: Dengdui Huang --- doc/guides/rel_notes/release_23_07.rst | 5 lib/ethdev/rte_ethdev.c | 30 + lib/ethdev/rte_ethdev.h | 36 ++ lib/ethdev/version.map | 4 +++ 4 files changed, 75 insertions(+) diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst index a9b1293689..19e645156f 100644 --- a/doc/guides/rel_notes/release_23_07.rst +++ b/doc/guides/rel_notes/release_23_07.rst @@ -56,6 +56,11 @@ New Features === +* **Added ethdev Rx/Tx queue id check API.** + + Added ethdev Rx/Tx queue id check API which provides functions >>> >>> id -> ID >>> + for check if Rx/Tx queue id is valid. >>> >>> id -> ID >>> + >>> >>> It should be two empty lines here and just one above. >>> Removed Items - diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index 4d03255683..3d85218127 100644 --- a/lib/ethdev/rte_ethdev.c +++ b/lib/ethdev/rte_ethdev.c @@ -407,6 +407,36 @@ rte_eth_dev_is_valid_port(uint16_t port_id) return is_valid; } +int +rte_eth_dev_is_valid_rxq(uint16_t port_id, uint16_t queue_id) +{ + struct rte_eth_dev *dev; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); + dev = &rte_eth_devices[port_id]; + + if (queue_id >= dev->data->nb_rx_queues || + dev->data->rx_queues[queue_id] == NULL) >>> >>> We already have internal eth_dev_validate_tx_queue(). Shouldn't >>> it be used here? >>> >>> Also, some functions check that queues array is not NULL. >>> If the the is excessive after queue number check, it >>> should be consistent everywhere and corresponding check >>> of the array pointer vs NULL should be removed in a separate >>> cleanup patch. If the check is required in some corner cases >>> (I hope no), it should be here as well. >>> + return -EINVAL; + + return 0; +} >>> >>> [snip] >>> >> >
Re: [PATCH] net/hns3: fix Rx packet truncation when KEEP CRC enabled
On 2024/2/27 0:43, Ferruh Yigit wrote: > On 2/26/2024 3:16 AM, Jie Hai wrote: >> On 2024/2/23 21:53, Ferruh Yigit wrote: >>> On 2/20/2024 3:58 AM, Jie Hai wrote: Hi, Ferruh, Thanks for your review. On 2024/2/7 22:15, Ferruh Yigit wrote: > On 2/6/2024 1:10 AM, Jie Hai wrote: >> From: Dengdui Huang >> >> When KEEP_CRC offload is enabled, some packets will be truncated and >> the CRC is still be stripped in following cases: >> 1. For HIP08 hardware, the packet type is TCP and the length >> is less than or equal to 60B. >> 2. For other hardwares, the packet type is IP and the length >> is less than or equal to 60B. >> > > If a device doesn't support the offload by some packets, it can be > option to disable offload for that device, instead of calculating it in > software and append it. The KEEP CRC feature of hns3 is faulty only in the specific packet type and small packet(<60B) case. What's more, the small ethernet packet is not common. > Unless you have a specific usecase, or requirement to support the > offload. Yes, some users of hns3 are already using this feature. So we cannot drop this offload > <...> > >> @@ -2492,10 +2544,16 @@ hns3_recv_pkts_simple(void *rx_queue, >> goto pkt_err; >> rxm->packet_type = hns3_rx_calc_ptype(rxq, l234_info, >> ol_info); >> - >> if (rxm->packet_type == RTE_PTYPE_L2_ETHER_TIMESYNC) >> rxm->ol_flags |= RTE_MBUF_F_RX_IEEE1588_PTP; >> + if (unlikely(rxq->crc_len > 0)) { >> + if (hns3_need_recalculate_crc(rxq, rxm)) >> + hns3_recalculate_crc(rxq, rxm); >> + rxm->pkt_len -= rxq->crc_len; >> + rxm->data_len -= rxq->crc_len; >> > > Removing 'crc_len' from 'mbuf->pkt_len' & 'mbuf->data_len' is > practically same as stripping CRC. > > We don't count CRC length in the statistics, but it should be > accessible > in the payload by the user. Our drivers are behaving exactly as you say. >>> >>> If so I missed why mbuf 'pkt_len' and 'data_len' reduced by >>> 'rxq->crc_len', can you please explain what above lines does? >>> >>> >> @@ -2470,8 +2523,7 @@ hns3_recv_pkts_simple(void *rx_queue, >> rxdp->rx.bd_base_info = 0; >> >> rxm->data_off = RTE_PKTMBUF_HEADROOM; >> - rxm->pkt_len = (uint16_t)(rte_le_to_cpu_16(rxd.rx.pkt_len)) - >> - rxq->crc_len; >> + rxm->pkt_len = rte_le_to_cpu_16(rxd.rx.pkt_len); >> >> In the previous code above, the 'pkt_len' is set to the length obtained >> from the BD. the length obtained from the BD already contains CRC length. >> But as you said above, the DPDK requires that the length of the mbuf >> does not contain CRC length . So we subtract 'rxq->crc_len' from >> mbuf'pkt_len' and 'data_len'. This patch doesn't change the logic, it >> just moves the code around. >> > > Nope, I am not saying mbuf length shouldn't contain CRC length, indeed > it is other way around and this is our confusion. > > CRC length shouldn't be in the statistics, I mean in received bytes stats. > Assume that received packet is 128 bytes and we know it has the CRC, > Rx received bytes stat should be 124 (rx_bytes = 128 - CRC = 124) > > But mbuf->data_len & mbuf->pkt_len should have full frame length, > including CRC. > > As application explicitly requested to KEEP CRC, it will know last 4 > bytes are CRC. > Anything after 'mbuf->data_len' in the mbuf buffer is not valid, so if > you reduce 'mbuf->data_len' by CRC size, application can't know if 4 > bytes after 'mbuf->data_len' is valid CRC or not. > I agree with you. But the implementation of other PMDs supported KEEP_CRC is like this. In addition, there are probably many users that are already using it. If we modify it, it may cause applications incompatible. what do you think?
Re: [PATCH] net/hns3: fix Rx packet truncation when KEEP CRC enabled
On 2024/2/28 21:07, Ferruh Yigit wrote: > On 2/28/2024 2:27 AM, huangdengdui wrote: >> >> >> On 2024/2/27 0:43, Ferruh Yigit wrote: >>> On 2/26/2024 3:16 AM, Jie Hai wrote: >>>> On 2024/2/23 21:53, Ferruh Yigit wrote: >>>>> On 2/20/2024 3:58 AM, Jie Hai wrote: >>>>>> Hi, Ferruh, >>>>>> >>>>>> Thanks for your review. >>>>>> >>>>>> On 2024/2/7 22:15, Ferruh Yigit wrote: >>>>>>> On 2/6/2024 1:10 AM, Jie Hai wrote: >>>>>>>> From: Dengdui Huang >>>>>>>> >>>>>>>> When KEEP_CRC offload is enabled, some packets will be truncated and >>>>>>>> the CRC is still be stripped in following cases: >>>>>>>> 1. For HIP08 hardware, the packet type is TCP and the length >>>>>>>> is less than or equal to 60B. >>>>>>>> 2. For other hardwares, the packet type is IP and the length >>>>>>>> is less than or equal to 60B. >>>>>>>> >>>>>>> >>>>>>> If a device doesn't support the offload by some packets, it can be >>>>>>> option to disable offload for that device, instead of calculating it in >>>>>>> software and append it. >>>>>> >>>>>> The KEEP CRC feature of hns3 is faulty only in the specific packet >>>>>> type and small packet(<60B) case. >>>>>> What's more, the small ethernet packet is not common. >>>>>> >>>>>>> Unless you have a specific usecase, or requirement to support the >>>>>>> offload. >>>>>> >>>>>> Yes, some users of hns3 are already using this feature. >>>>>> So we cannot drop this offload >>>>>> >>>>>>> <...> >>>>>>> >>>>>>>> @@ -2492,10 +2544,16 @@ hns3_recv_pkts_simple(void *rx_queue, >>>>>>>> goto pkt_err; >>>>>>>> rxm->packet_type = hns3_rx_calc_ptype(rxq, l234_info, >>>>>>>> ol_info); >>>>>>>> - >>>>>>>> if (rxm->packet_type == RTE_PTYPE_L2_ETHER_TIMESYNC) >>>>>>>> rxm->ol_flags |= RTE_MBUF_F_RX_IEEE1588_PTP; >>>>>>>> + if (unlikely(rxq->crc_len > 0)) { >>>>>>>> + if (hns3_need_recalculate_crc(rxq, rxm)) >>>>>>>> + hns3_recalculate_crc(rxq, rxm); >>>>>>>> + rxm->pkt_len -= rxq->crc_len; >>>>>>>> + rxm->data_len -= rxq->crc_len; >>>>>>>> >>>>>>> >>>>>>> Removing 'crc_len' from 'mbuf->pkt_len' & 'mbuf->data_len' is >>>>>>> practically same as stripping CRC. >>>>>>> >>>>>>> We don't count CRC length in the statistics, but it should be >>>>>>> accessible >>>>>>> in the payload by the user. >>>>>> Our drivers are behaving exactly as you say. >>>>>> >>>>> >>>>> If so I missed why mbuf 'pkt_len' and 'data_len' reduced by >>>>> 'rxq->crc_len', can you please explain what above lines does? >>>>> >>>>> >>>> @@ -2470,8 +2523,7 @@ hns3_recv_pkts_simple(void *rx_queue, >>>> rxdp->rx.bd_base_info = 0; >>>> >>>> rxm->data_off = RTE_PKTMBUF_HEADROOM; >>>> - rxm->pkt_len = (uint16_t)(rte_le_to_cpu_16(rxd.rx.pkt_len)) - >>>> - rxq->crc_len; >>>> + rxm->pkt_len = rte_le_to_cpu_16(rxd.rx.pkt_len); >>>> >>>> In the previous code above, the 'pkt_len' is set to the length obtained >>>> from the BD. the length obtained from the BD already contains CRC length. >>>> But as you said above, the DPDK requires that the length of the mbuf >>>> does not contain CRC length . So we subtract 'rxq->crc_len' from >>>> mbuf'pkt_len' and 'data_len'. This patch doesn't change the logic, it >>>> just moves the code around. >>>> >>> >>> Nope, I am
Re: [PATCH] net/hns3: fix Rx packet truncation when KEEP CRC enabled
On 2024/2/29 17:25, Ferruh Yigit wrote: > On 2/29/2024 3:58 AM, huangdengdui wrote: >> >> >> On 2024/2/28 21:07, Ferruh Yigit wrote: >>> On 2/28/2024 2:27 AM, huangdengdui wrote: >>>> >>>> >>>> On 2024/2/27 0:43, Ferruh Yigit wrote: >>>>> On 2/26/2024 3:16 AM, Jie Hai wrote: >>>>>> On 2024/2/23 21:53, Ferruh Yigit wrote: >>>>>>> On 2/20/2024 3:58 AM, Jie Hai wrote: >>>>>>>> Hi, Ferruh, >>>>>>>> >>>>>>>> Thanks for your review. >>>>>>>> >>>>>>>> On 2024/2/7 22:15, Ferruh Yigit wrote: >>>>>>>>> On 2/6/2024 1:10 AM, Jie Hai wrote: >>>>>>>>>> From: Dengdui Huang >>>>>>>>>> >>>>>>>>>> When KEEP_CRC offload is enabled, some packets will be truncated and >>>>>>>>>> the CRC is still be stripped in following cases: >>>>>>>>>> 1. For HIP08 hardware, the packet type is TCP and the length >>>>>>>>>> is less than or equal to 60B. >>>>>>>>>> 2. For other hardwares, the packet type is IP and the length >>>>>>>>>> is less than or equal to 60B. >>>>>>>>>> >>>>>>>>> >>>>>>>>> If a device doesn't support the offload by some packets, it can be >>>>>>>>> option to disable offload for that device, instead of calculating it >>>>>>>>> in >>>>>>>>> software and append it. >>>>>>>> >>>>>>>> The KEEP CRC feature of hns3 is faulty only in the specific packet >>>>>>>> type and small packet(<60B) case. >>>>>>>> What's more, the small ethernet packet is not common. >>>>>>>> >>>>>>>>> Unless you have a specific usecase, or requirement to support the >>>>>>>>> offload. >>>>>>>> >>>>>>>> Yes, some users of hns3 are already using this feature. >>>>>>>> So we cannot drop this offload >>>>>>>> >>>>>>>>> <...> >>>>>>>>> >>>>>>>>>> @@ -2492,10 +2544,16 @@ hns3_recv_pkts_simple(void *rx_queue, >>>>>>>>>> goto pkt_err; >>>>>>>>>> rxm->packet_type = hns3_rx_calc_ptype(rxq, l234_info, >>>>>>>>>> ol_info); >>>>>>>>>> - >>>>>>>>>> if (rxm->packet_type == RTE_PTYPE_L2_ETHER_TIMESYNC) >>>>>>>>>> rxm->ol_flags |= RTE_MBUF_F_RX_IEEE1588_PTP; >>>>>>>>>> + if (unlikely(rxq->crc_len > 0)) { >>>>>>>>>> + if (hns3_need_recalculate_crc(rxq, rxm)) >>>>>>>>>> + hns3_recalculate_crc(rxq, rxm); >>>>>>>>>> + rxm->pkt_len -= rxq->crc_len; >>>>>>>>>> + rxm->data_len -= rxq->crc_len; >>>>>>>>>> >>>>>>>>> >>>>>>>>> Removing 'crc_len' from 'mbuf->pkt_len' & 'mbuf->data_len' is >>>>>>>>> practically same as stripping CRC. >>>>>>>>> >>>>>>>>> We don't count CRC length in the statistics, but it should be >>>>>>>>> accessible >>>>>>>>> in the payload by the user. >>>>>>>> Our drivers are behaving exactly as you say. >>>>>>>> >>>>>>> >>>>>>> If so I missed why mbuf 'pkt_len' and 'data_len' reduced by >>>>>>> 'rxq->crc_len', can you please explain what above lines does? >>>>>>> >>>>>>> >>>>>> @@ -2470,8 +2523,7 @@ hns3_recv_pkts_simple(void *rx_queue, >>>>>> rxdp->rx.bd_base_info = 0; >>>>>> >>>>>> rxm->data_off = RTE_PKTMBUF_HEADROOM; >>>>>> - rxm->pkt_len = (uint16_t)(rte_le_to_cpu_16(rxd.rx.pkt_len)) - >>
Re: [PATCH] app/testpmd: fix crash in multi-process packet forwarding
On 2024/1/26 14:23, fengchengwen wrote: > Hi Dengdui, > > On 2024/1/26 10:41, Dengdui Huang wrote: >> On multi-process scenario, each process creates flows based on the >> number of queues. When nbcore is greater than 1, multiple cores may >> use the same queue to forward packet, like: >> dpdk-testpmd -a BDF --proc-type=auto -- -i --rxq=4 --txq=4 >> --nb-cores=2 --num-procs=2 --proc-id=0 >> testpmd> start >> mac packet forwarding - ports=1 - cores=2 - streams=4 - NUMA support >> enabled, MP allocation mode: native >> Logical Core 2 (socket 0) forwards packets on 2 streams: >> RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00 >> RX P=0/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00 >> Logical Core 3 (socket 0) forwards packets on 2 streams: >> RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00 >> RX P=0/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00 > > tip: it would be more readable if with an indent, just like below example. > > Acked-by: Chengwen Feng > > Thanks > OK, Thanks >> >> After this commit, the result will be: >> dpdk-testpmd -a BDF --proc-type=auto -- -i --rxq=4 --txq=4 >> --nb-cores=2 --num-procs=2 --proc-id=0 >> testpmd> start >> io packet forwarding - ports=1 - cores=2 - streams=2 - NUMA support >> enabled, MP allocation mode: native >> Logical Core 2 (socket 0) forwards packets on 1 streams: >> RX P=0/Q=0 (socket 2) -> TX P=0/Q=0 (socket 2) peer=02:00:00:00:00:00 >> Logical Core 3 (socket 0) forwards packets on 1 streams: >> RX P=0/Q=1 (socket 2) -> TX P=0/Q=1 (socket 2) peer=02:00:00:00:00:00 >> >> Fixes: a550baf24af9 ("app/testpmd: support multi-process") >> Cc: sta...@dpdk.org >> >> Signed-off-by: Dengdui Huang >> --- >> app/test-pmd/config.c | 6 +- >> 1 file changed, 1 insertion(+), 5 deletions(-) >> >> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c >> index cad7537bc6..2c4dedd603 100644 >> --- a/app/test-pmd/config.c >> +++ b/app/test-pmd/config.c >> @@ -4794,7 +4794,6 @@ rss_fwd_config_setup(void) >> queueid_t nb_q; >> streamid_t sm_id; >> int start; >> -int end; >> >> nb_q = nb_rxq; >> if (nb_q > nb_txq) >> @@ -4802,7 +4801,7 @@ rss_fwd_config_setup(void) >> cur_fwd_config.nb_fwd_lcores = (lcoreid_t) nb_fwd_lcores; >> cur_fwd_config.nb_fwd_ports = nb_fwd_ports; >> cur_fwd_config.nb_fwd_streams = >> -(streamid_t) (nb_q * cur_fwd_config.nb_fwd_ports); >> +(streamid_t) (nb_q / num_procs * cur_fwd_config.nb_fwd_ports); >> >> if (cur_fwd_config.nb_fwd_streams < cur_fwd_config.nb_fwd_lcores) >> cur_fwd_config.nb_fwd_lcores = >> @@ -4824,7 +4823,6 @@ rss_fwd_config_setup(void) >> * the 2~3 queue for secondary process. >> */ >> start = proc_id * nb_q / num_procs; >> -end = start + nb_q / num_procs; >> rxp = 0; >> rxq = start; >> for (sm_id = 0; sm_id < cur_fwd_config.nb_fwd_streams; sm_id++) { >> @@ -4843,8 +4841,6 @@ rss_fwd_config_setup(void) >> continue; >> rxp = 0; >> rxq++; >> -if (rxq >= end) >> -rxq = start; >> } >> } >> >>
Re: [PATCH v2 0/6] support setting lanes
Hi, Ferruh Sorry for replying your email very late. The answers to your questions are as follows. Please correct me if I am wrong. On 2024/4/4 21:58, Ferruh Yigit wrote: > > Hi Dengdui, Damodharam, > > As details of the implementation under discussion, I have a high level > question. > > Why/when an application need to configure the lanes explicitly? > > In above description, it mentions if one port is configured as 4x25G and > other 2x50G, port can't be up. How do you end up being in this situation? > According to the Ethernet standard document[1], speed and lanes need to be configured to the PHY together. Currently, the application just set one speed like 100G to the driver. And this doesn't matter with lane number. Currently, the lane number of one NIC for a specified speed depand on their default behavior. As a result, this situation will be happened. > Lets assume first port is configured as 100G, and FW configured it as > 4x25G, and again user configured second port as 100G, why FW can't > detect this and configure ports with correct lane configuration? When the auto-negotiation is disabled, FW of the local port never know the configuration of the peer port. After all, not all NICs support auto-negotiation feature. > > In this case, if we push the responsibility to the user, when user is > configuring the second port how she will know what is the lane > configuration for first port, and what is the proper lane configuration > for the second port? So we need to support the lane query function for above reason. > > Instead of pushing this configuration to user, why it can't be handled > internally? > > As long as user requested speed configured by device, the lane > configuration has no impact to the user, right?> Is there a case/reason user > need to explicitly set, lets say PAM4 > against NRZ? > Sorry, I can not understand what you mean. [1] https://lore.kernel.org/netdev/20201010154119.3537085-1-ido...@idosch.org/T/
Re: [PATCH v3] ethdev: Add link_speed lanes support
Hi Damodharam Here are some suggestions. See below. On 2024/6/18 4:34, Damodharam Ammepalli wrote: > Update the eth_dev_ops structure with new function vectors > to get, get capabilities and set ethernet link speed lanes. > Update the testpmd to provide required config and information > display infrastructure. > > The supporting ethernet controller driver will register callbacks > to avail link speed lanes config and get services. This lanes > configuration is applicable only when the nic is forced to fixed > speeds. In Autonegiation mode, the hardware automatically > negotiates the number of lanes. > > + > /* *** configure txq/rxq, txd/rxd *** */ > struct cmd_config_rx_tx { > cmdline_fixed_string_t port; > @@ -13238,6 +13459,9 @@ static cmdline_parse_ctx_t builtin_ctx[] = { > (cmdline_parse_inst_t *)&cmd_set_port_setup_on, > (cmdline_parse_inst_t *)&cmd_config_speed_all, > (cmdline_parse_inst_t *)&cmd_config_speed_specific, > + (cmdline_parse_inst_t *)&cmd_config_speed_lanes_all, > + (cmdline_parse_inst_t *)&cmd_config_speed_lanes_specific, > + (cmdline_parse_inst_t *)&cmd_show_speed_lanes, Please also add to help string and update doc > (cmdline_parse_inst_t *)&cmd_config_loopback_all, > (cmdline_parse_inst_t *)&cmd_config_loopback_specific, > (cmdline_parse_inst_t *)&cmd_config_rx_tx, > diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c > index 66c3a68c1d..cf3ea50114 100644 > --- a/app/test-pmd/config.c > +++ b/app/test-pmd/config.c > @@ -207,6 +207,32 @@ static const struct { > {"gtpu", RTE_ETH_FLOW_GTPU}, > }; > > +static const struct { > + enum rte_eth_speed_lanes lane; > + const uint32_t value; > +} speed_lane_name[] = { > + { > + .lane = RTE_ETH_SPEED_LANE_NOLANE, > + .value = 0, > + }, > + { > + .lane = RTE_ETH_SPEED_LANE_1, > + .value = 1, > + }, > + { > + .lane = RTE_ETH_SPEED_LANE_2, > + .value = 2, > + }, > + { > + .lane = RTE_ETH_SPEED_LANE_4, > + .value = 4, > + }, > + { > + .lane = RTE_ETH_SPEED_LANE_8, > + .value = 8, > + }, > +}; > + > static void > print_ethaddr(const char *name, struct rte_ether_addr *eth_addr) > { > @@ -786,6 +812,7 @@ port_infos_display(portid_t port_id) > char name[RTE_ETH_NAME_MAX_LEN]; > int ret; > char fw_version[ETHDEV_FWVERS_LEN]; > + uint32_t lanes; > > if (port_id_is_invalid(port_id, ENABLED_WARN)) { > print_valid_ports(); > @@ -828,6 +855,8 @@ port_infos_display(portid_t port_id) > > printf("\nLink status: %s\n", (link.link_status) ? ("up") : ("down")); > printf("Link speed: %s\n", rte_eth_link_speed_to_str(link.link_speed)); > + if (rte_eth_speed_lanes_get(port_id, &lanes) == 0) > + printf("Active Lanes: %d\n", lanes); It should not be printed when lanes=0. Or print unknown when lane=0? > printf("Link duplex: %s\n", (link.link_duplex == > RTE_ETH_LINK_FULL_DUPLEX) ? > ("full-duplex") : ("half-duplex")); > printf("Autoneg status: %s\n", (link.link_autoneg == > RTE_ETH_LINK_AUTONEG) ? > @@ -962,7 +991,7 @@ port_summary_header_display(void) > > port_number = rte_eth_dev_count_avail(); > printf("Number of available ports: %i\n", port_number); > - printf("%-4s %-17s %-12s %-14s %-8s %s\n", "Port", "MAC Address", > "Name", > + printf("%-4s %-17s %-12s %-14s %-8s %-8s\n", "Port", "MAC Address", > "Name", > "Driver", "Status", "Link"); > } > > @@ -993,7 +1022,7 @@ port_summary_display(portid_t port_id) > if (ret != 0) > return; > > - printf("%-4d " RTE_ETHER_ADDR_PRT_FMT " %-12s %-14s %-8s %s\n", > + printf("%-4d " RTE_ETHER_ADDR_PRT_FMT " %-12s %-14s %-8s %-8s\n", Does the lanes need to be printed? > port_id, RTE_ETHER_ADDR_BYTES(&mac_addr), name, > dev_info.driver_name, (link.link_status) ? ("up") : ("down"), > rte_eth_link_speed_to_str(link.link_speed)); > @@ -7244,3 +7273,35 @@ show_mcast_macs(portid_t port_id) > printf(" %s\n", buf); > } > } > + > +int > +parse_speed_lanes(uint32_t lane, uint32_t *speed_lane) > +{ > + uint8_t i; > + > + for (i = 0; i < RTE_DIM(speed_lane_name); i++) { > + if (speed_lane_name[i].value == lane) { > + *speed_lane = lane; > + return 0; > + } > + } > + return -1; > +} > + > +void > +show_speed_lanes_capability(unsigned int num, struct > rte_eth_speed_lanes_capa *speed_lanes_capa) > +{ > + unsigned int i, j; > + > + printf("\n%-15s %-10s", "Supported-speeds", "Valid-lanes"); > + printf("\n---\n"); > + for (i = 0; i < num; i++) { > + printf("%-17s ", > rte_eth_link_speed_to_str(speed_lanes_capa[i].speed));
Re: [PATCH v3] ethdev: Add link_speed lanes support
On 2024/6/26 5:07, Damodharam Ammepalli wrote: > On Wed, Jun 19, 2024 at 8:23 PM huangdengdui wrote: >> >> Hi Damodharam >> Here are some suggestions. See below. >> > Thank you for the review. > >> On 2024/6/18 4:34, Damodharam Ammepalli wrote: >>> Update the eth_dev_ops structure with new function vectors >>> to get, get capabilities and set ethernet link speed lanes. >>> Update the testpmd to provide required config and information >>> display infrastructure. >>> >>> The supporting ethernet controller driver will register callbacks >>> to avail link speed lanes config and get services. This lanes >>> configuration is applicable only when the nic is forced to fixed >>> speeds. In Autonegiation mode, the hardware automatically >>> negotiates the number of lanes. >>> >> >> >>> + >>> /* *** configure txq/rxq, txd/rxd *** */ >>> struct cmd_config_rx_tx { >>> cmdline_fixed_string_t port; >>> @@ -13238,6 +13459,9 @@ static cmdline_parse_ctx_t builtin_ctx[] = { >>> (cmdline_parse_inst_t *)&cmd_set_port_setup_on, cut >>> >>> @@ -993,7 +1022,7 @@ port_summary_display(portid_t port_id) >>> if (ret != 0) >>> return; >>> >>> - printf("%-4d " RTE_ETHER_ADDR_PRT_FMT " %-12s %-14s %-8s %s\n", >>> + printf("%-4d " RTE_ETHER_ADDR_PRT_FMT " %-12s %-14s %-8s %-8s\n", >> >> Does the lanes need to be printed? > Ferruh in the previous comment, asked not to print. > OK >> >>> port_id, RTE_ETHER_ADDR_BYTES(&mac_addr), name, >>> dev_info.driver_name, (link.link_status) ? ("up") : ("down"), >>> rte_eth_link_speed_to_str(link.link_speed)); >>> @@ -7244,3 +7273,35 @@ show_mcast_macs(portid_t port_id) >>> printf(" %s\n", buf); >>> } >>> } >>> + >>> +int >>> +parse_speed_lanes(uint32_t lane, uint32_t *speed_lane) >>> +{ >>> + uint8_t i; >>> + >>> + for (i = 0; i < RTE_DIM(speed_lane_name); i++) { >>> + if (speed_lane_name[i].value == lane) { >>> + *speed_lane = lane; >>> + return 0; >>> + } >>> + } >>> + return -1; >>> +} >>> + cut >>> >>> +/** >>> + * This enum indicates the possible link speed lanes of an ethdev port. >>> + */ >>> +enum rte_eth_speed_lanes { >>> + RTE_ETH_SPEED_LANE_NOLANE = 0, /**< speed lanes unsupported mode or >>> default */ >>> + RTE_ETH_SPEED_LANE_1, /**< Link speed lane 1 */ >>> + RTE_ETH_SPEED_LANE_2, /**< Link speed lanes 2 */ >>> + RTE_ETH_SPEED_LANE_4, /**< Link speed lanes 4 */ >>> + RTE_ETH_SPEED_LANE_8, /**< Link speed lanes 8 */ >>> + RTE_ETH_SPEED_LANE_MAX, >>> +}; >> >> Is it better to make the index equal to the lanes num? >> enum rte_eth_speed_lanes { >> RTE_ETH_SPEED_LANE_NOLANE = 0, /**< speed lanes unsupported >> mode or default */ >> RTE_ETH_SPEED_LANE_1 = 1, /**< Link speed lane 1 */ >> RTE_ETH_SPEED_LANE_2 = 2, /**< Link speed lanes 2 */ >> RTE_ETH_SPEED_LANE_4 = 4, /**< Link speed lanes 4 */ >> RTE_ETH_SPEED_LANE_8 = 8, /**< Link speed lanes 8 */ >> RTE_ETH_SPEED_LANE_MAX, >> }; >> > I followed the existing enums code convention in rtelib. Your point > makes sense too. > I looked at the other enum code in the lib. There are many similar code styles. Make the index meaningful to avoid conversion. For example, the parse_speed_lanes() function in this patch >> In addition, when lanes = 0, is it better to define it as Unknown? >> If default lanes can return 0 lanes, The active lanes are different for each >> NIC, >> users may be confused. >> > Ack. Are you proposing a new enum RTE_ETH_SPEED_LANE_UKNOWN or rename > RTE_ETH_SPEED_LANE_NOLANE? > I suggest changing the name to RTE_ETH_SPEED_LANE_UKNOWN, Also change the comment to describe it as an unknown lane. This prevents the driver from always returning lanes=0 even if the driver knows the number of active lanes. >>> + >>> +/* Translate from link speed lanes to speed lanes capa */ >>> +#define RTE_ETH_SPEED_LANES_TO_CAPA
Re: [PATCH] devtools: fix version variable not initialized
On 2024/6/27 22:28, David Marchand wrote: > On Wed, Apr 17, 2024 at 11:32 AM Dengdui Huang > wrote: >> >> The version variable is not initialized. Therefore, if the -V option >> is not specified, the value of $version is obtained from the context, >> which may cause the version map parsing failure. >> >> Fixes: 6edec7f202ac ("devtools: list symbols by version") >> Cc: sta...@dpdk.org >> >> Signed-off-by: Dengdui Huang > > This is an internal script and I wonder how the mentionned issue is hit. > In any case this fix is correct. > > Reviewed-by: David Marchand > > Thanks for your review. The project build script may pass version information through environment variables. This problem occurs if the following execution sequence exists: export version=devel meson build ninja -C build
Re: [PATCH v4 0/6] refine argparse library
For the patchset, Reviewed-by: Dengdui Huang On 2024/3/18 19:18, Chengwen Feng wrote: > I found a couple of issues when I revisited the argparse_autotest > output, so got this patchset. > > Chengwen Feng (6): > argparse: refine error message > argparse: remove dead code > argparse: replace flag enum with marco > argparse: fix argument flags operate as uint32 type > test/argparse: refine testcases > argparse: fix doc don't display two hyphens > > --- > v4: address Thomas's comment on 4/6 commit: > - remove unused macros TEST_ARGPARSE_FLAG_HAS_ARG_BITMASK and > TEST_ARGPARSE_FLAG_VAL_TYPE_BITMASK. > v3: > - address Thomas's comment on 4/6 comit. > - add commit: fix doc don't display two hyphens. > v2: address David Marchand's comment: > - replace flag enum with marco. > - replace flag's hardcode with macro in test_argparse.c. > > app/test/test_argparse.c | 65 +++- > doc/guides/prog_guide/argparse_lib.rst | 47 +++--- > lib/argparse/rte_argparse.c| 61 +- > lib/argparse/rte_argparse.h| 85 -- > 4 files changed, 127 insertions(+), 131 deletions(-) >
Re: [PATCH] app/testpmd: add flush multicast MAC address command
在 2023/8/1 23:18, Stephen Hemminger 写道: > On Tue, 1 Aug 2023 10:43:04 +0800 > Dengdui Huang wrote: > >> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c >> index 0d0723f659..2d9d925776 100644 >> --- a/app/test-pmd/cmdline.c >> +++ b/app/test-pmd/cmdline.c >> @@ -494,6 +494,9 @@ static void cmd_help_long_parsed(void *parsed_result, >> "mac_addr remove (port_id) (XX:XX:XX:XX:XX:XX)\n" >> "Remove a MAC address from port_id.\n\n" >> >> +"mcast_addr flush (port_id)\n" >> +"To flush the set of multicast addresses.\n\n" >> + >> "mac_addr set (port_id) (XX:XX:XX:XX:XX:XX)\n" >> "Set the default MAC address for port_id.\n\n" > > Why out this in middle of the mac_addr commands? better to be in logical or > alpha order. Thanks, I will do in v2.
Re: [PATCH] app/testpmd: add flush multicast MAC address command
在 2023/8/1 23:18, Stephen Hemminger 写道: > On Tue, 1 Aug 2023 10:43:04 +0800 > Dengdui Huang wrote: > >> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c >> index 0d0723f659..2d9d925776 100644 >> --- a/app/test-pmd/cmdline.c >> +++ b/app/test-pmd/cmdline.c >> @@ -494,6 +494,9 @@ static void cmd_help_long_parsed(void *parsed_result, >> "mac_addr remove (port_id) (XX:XX:XX:XX:XX:XX)\n" >> "Remove a MAC address from port_id.\n\n" >> >> +"mcast_addr flush (port_id)\n" >> +"To flush the set of multicast addresses.\n\n" >> + >> "mac_addr set (port_id) (XX:XX:XX:XX:XX:XX)\n" >> "Set the default MAC address for port_id.\n\n" > > Why out this in middle of the mac_addr commands? better to be in logical or > alpha order. Sorry Stephen, I made a mistake in my reply. It's already in logical order(mac_addr/mcast_addr add/remove>other setting), the same order as in the doc. The order looks odd because the help command doesn't have a description of "multicast add/remove".Do you agree with this explanation?
Re: [PATCH] app/testpmd: add flush multicast MAC address command
在 2023/8/2 12:16, Stephen Hemminger 写道: > On Wed, 2 Aug 2023 10:41:46 +0800 > huangdengdui wrote: > >> 在 2023/8/1 23:18, Stephen Hemminger 写道: >>> On Tue, 1 Aug 2023 10:43:04 +0800 >>> Dengdui Huang wrote: >>> >>>> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c >>>> index 0d0723f659..2d9d925776 100644 >>>> --- a/app/test-pmd/cmdline.c >>>> +++ b/app/test-pmd/cmdline.c >>>> @@ -494,6 +494,9 @@ static void cmd_help_long_parsed(void *parsed_result, >>>>"mac_addr remove (port_id) (XX:XX:XX:XX:XX:XX)\n" >>>>"Remove a MAC address from port_id.\n\n" >>>> >>>> + "mcast_addr flush (port_id)\n" >>>> + "To flush the set of multicast addresses.\n\n" >>>> + >>>>"mac_addr set (port_id) (XX:XX:XX:XX:XX:XX)\n" >>>>"Set the default MAC address for port_id.\n\n" >>> >>> Why out this in middle of the mac_addr commands? better to be in logical or >>> alpha order. >> Sorry Stephen, I made a mistake in my reply. It's already in logical >> order(mac_addr/mcast_addr add/remove>other setting), the same order as >> in the doc. >> The order looks odd because the help command doesn't have a description of >> "multicast add/remove".Do you agree with this explanation? > > The help is already a bit of a mess. It really needs to be split up more. > > Lets add the new line after "set allmulti"? > OK, I will do in v2.
Re: [PATCH 3/3] app/testpmd: support setting lanes
On 2024/3/16 5:47, Damodharam Ammepalli wrote: > On Tue, Mar 12, 2024 at 12:52 AM Dengdui Huang > wrote: >> >> Extended speed command for setting lane number and >> Show info print lane number. >> >> Signed-off-by: Dengdui Huang >> --- >> app/test-pmd/cmdline.c | 110 +++- >> app/test-pmd/config.c | 60 +++ >> doc/guides/rel_notes/release_24_03.rst | 1 + >> doc/guides/testpmd_app_ug/testpmd_funcs.rst | 3 +- >> 4 files changed, 103 insertions(+), 71 deletions(-) >> >> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c >> index f521a1fe9e..e66daf4bba 100644 >> --- a/app/test-pmd/cmdline.c >> +++ b/app/test-pmd/cmdline.c >> @@ -1356,15 +1356,20 @@ struct cmd_config_speed_all { >> cmdline_fixed_string_t keyword; >> cmdline_fixed_string_t all; >> cmdline_fixed_string_t item1; >> + cmdline_fixed_string_t lanes_item; >> cmdline_fixed_string_t item2; >> cmdline_fixed_string_t value1; >> + uint8_t lanes_value; >> cmdline_fixed_string_t value2; >> }; >> >> static int >> -parse_and_check_speed_duplex(char *speedstr, char *duplexstr, uint32_t >> *speed) >> +parse_and_check_speed_duplex(char *speedstr, uint8_t lanes, char *duplexstr, >> +uint32_t *speed) >> { >> > We internally implemented a similar feature, without changing the > existing testpmd speed cmd. > Instead of modifying the existing command set we can have a separate > cmd for the lanes > configuration similar to FEC configuration. Our internal > implementation looks something like this, > without affecting existing implementations. > testpmd> port stop 0 > testpmd> port config 0 speed_lanes 4 Hi, Damodharam, Thanks for your review. I think the lanes should be configured with speed and duplex, they will eventually map to Link speed capabilities(RTE_ETH_LINK_SPEED_*). Would it be better to add the following new command? testpmd> port config 0 speed 20 lanes 4 duplex full It can be used when the driver supports setting lanes; It cannot be used when the driver does not support the setting lanes. what do you think? > testpmd> port config 0 speed 20 duplex full > testpmd> port start 0 > testpmd> show port summary 0 > Number of available ports: 2 > Port MAC Address Name Driver Status Link Lanes > 014:23:F2:C3:BA:D2 :b1:00.0 net_bnxt up 200 Gbps 4 The summary command adds print of the number of lanes. I will implement this in the next version. > testpmd> > testpmd> show port info 0 > > * Infos for port 0 * > MAC address: 14:23:F2:C3:BA:D2 > Device name: :b1:00.0 > Driver name: net_bnxt > Firmware-version: 228.9.115.0 > Connect to socket: 2 > memory allocation on the socket: 2 > Link status: up > Link speed: 200 Gbps > Lanes: 4 > Link duplex: full-duplex > Autoneg status: Off > >> + uint32_t speed_num; >> + char *endptr; >> int duplex; >> >> if (!strcmp(duplexstr, "half")) { >> @@ -1378,47 +1383,22 @@ parse_and_check_speed_duplex(char *speedstr, char >> *duplexstr, uint32_t *speed) >> return -1; >> } >> >> - if (!strcmp(speedstr, "10")) { >> - *speed = (duplex == RTE_ETH_LINK_HALF_DUPLEX) ? >> - RTE_ETH_LINK_SPEED_10M_HD : >> RTE_ETH_LINK_SPEED_10M; >> - } else if (!strcmp(speedstr, "100")) { >> - *speed = (duplex == RTE_ETH_LINK_HALF_DUPLEX) ? >> - RTE_ETH_LINK_SPEED_100M_HD : >> RTE_ETH_LINK_SPEED_100M; >> - } else { >> - if (duplex != RTE_ETH_LINK_FULL_DUPLEX) { >> - fprintf(stderr, "Invalid speed/duplex parameters\n"); >> - return -1; >> - } >> - if (!strcmp(speedstr, "1000")) { >> - *speed = RTE_ETH_LINK_SPEED_1G; >> - } else if (!strcmp(speedstr, "2500")) { >> - *speed = RTE_ETH_LINK_SPEED_2_5G; >> - } else if (!strcmp(speedstr, "5000")) { >> - *speed = RTE_ETH_LINK_SPEED_5G; >> - } else if (!strcmp(speedstr, "1")) { >> - *speed = RTE_ETH_LINK_SPEED_10G; >> - } else if (!strcmp(speedstr, "25000")) { >> - *speed = RTE_ETH_LINK_SPEED_25G; >> - } else if (!strcmp(speedstr, "4")) { >> - *speed = RTE_ETH_LINK_SPEED_40G; >> - } else if (!strcmp(speedstr, "5")) { >> - *speed = RTE_ETH_LINK_SPEED_50G; >> - } else if (!strcmp(speedstr, "10")) { >> - *speed = RTE_ETH_LINK_SPEED_100G; >> - } else if (!strcmp(speedstr, "20")) { >> - *speed = RTE_ETH_LINK_SPEED_200G; >> - } else if (!strcmp(speedst
Re: [PATCH 1/3] ethdev: support setting lanes
On 2024/3/19 11:02, Stephen Hemminger wrote: > On Tue, 12 Mar 2024 15:52:36 +0800 > Dengdui Huang wrote: > >> -ret = snprintf(str, len, "Link up at %s %s %s", >> +ret = snprintf(str, len, "Link up at %s %ulanes %s %s", > > Don't you want a space after %u? > > Could you make it so that lanes is only part of the message if non-default > value > is used? Ok, I'll do it in the next version.
Re: [PATCH 0/3] support setting lanes
On 2024/3/20 20:31, Ferruh Yigit wrote: > On 3/18/2024 9:26 PM, Damodharam Ammepalli wrote: >> On Mon, Mar 18, 2024 at 7:56 AM Thomas Monjalon wrote: >>> >>> 12/03/2024 08:52, Dengdui Huang: Some speeds can be achieved with different number of lanes. For example, 100Gbps can be achieved using two lanes of 50Gbps or four lanes of 25Gbps. When use different lanes, the port cannot be up. >>> >>> I'm not sure what you are referring to. >>> I suppose it is not PCI lanes. >>> Please could you link to an explanation of how a port is split in lanes? >>> Which hardware does this? >>> >>> >>> >> This is a snapshot of 100Gb that the latest BCM576xx supports. >> 100Gb (NRZ: 25G per lane, 4 lanes) link speed >> 100Gb (PAM4-56: 50G per lane, 2 lanes) link speed >> 100Gb (PAM4-112: 100G per lane, 1 lane) link speed >> >> Let the user feed in lanes=< integer value> and the NIC driver decides >> the matching combination speed x lanes that works. In future if a new speed >> is implemented with more than 8 lanes, there wouldn't be a need >> to touch this speed command. Using separate lane command would >> be a better alternative to support already shipped products and only new >> drivers would consider this lanes configuration, if applicable. >> > > As far as I understand, lane is related to the physical layer of the > NIC, there are multiple copies of transmitter, receiver, modulator HW > block and each set called as a 'lane' and multiple lanes work together > to achieve desired speed. (please correct me if this is wrong). > > Why not just configuring the speed is not enough? Why user needs to know > the detail and configuration of the lanes? > Will it work if driver/device configure the "speed x lane" internally > for the requested speed? > > Is there a benefit to force specific lane count for a specific speed > (like power optimization, just a wild guess)? > > > And +1 for auto-negotiation if possible. As you said above,,multiple lanes work together to achieve desired speed. For example, the following solutions can be used to implement 100G: 1、Combines four 25G lanes 2、Combines two 50G lanes 3、A single 100G lane It is assumed that two ports are interconnected and the two ports support the foregoing three solutions. But, we just configured the speed to 100G and one port uses four 25G lanes by default and the other port uses two 50G lanes by default, the port cannot be up. In this case, we need to configure the two ports to use the same solutions (for example, uses two 50G lanes) so that the ports can be up. ethtool has supported lanes configuration a while ago.[1] [1] https://lore.kernel.org/netdev/20201010154119.3537085-1-ido...@idosch.org/T/
Re: [PATCH 0/3] support setting lanes
On 2024/3/21 16:28, Thomas Monjalon wrote: > 21/03/2024 03:02, huangdengdui: >> >> On 2024/3/20 20:31, Ferruh Yigit wrote: >>> On 3/18/2024 9:26 PM, Damodharam Ammepalli wrote: >>>> On Mon, Mar 18, 2024 at 7:56 AM Thomas Monjalon >>>> wrote: >>>>> >>>>> 12/03/2024 08:52, Dengdui Huang: >>>>>> Some speeds can be achieved with different number of lanes. For example, >>>>>> 100Gbps can be achieved using two lanes of 50Gbps or four lanes of >>>>>> 25Gbps. >>>>>> When use different lanes, the port cannot be up. >>>>> >>>>> I'm not sure what you are referring to. >>>>> I suppose it is not PCI lanes. >>>>> Please could you link to an explanation of how a port is split in lanes? >>>>> Which hardware does this? >>>>> >>>> This is a snapshot of 100Gb that the latest BCM576xx supports. >>>> 100Gb (NRZ: 25G per lane, 4 lanes) link speed >>>> 100Gb (PAM4-56: 50G per lane, 2 lanes) link speed >>>> 100Gb (PAM4-112: 100G per lane, 1 lane) link speed >>>> >>>> Let the user feed in lanes=< integer value> and the NIC driver decides >>>> the matching combination speed x lanes that works. In future if a new speed >>>> is implemented with more than 8 lanes, there wouldn't be a need >>>> to touch this speed command. Using separate lane command would >>>> be a better alternative to support already shipped products and only new >>>> drivers would consider this lanes configuration, if applicable. >>>> >>> >>> As far as I understand, lane is related to the physical layer of the >>> NIC, there are multiple copies of transmitter, receiver, modulator HW >>> block and each set called as a 'lane' and multiple lanes work together >>> to achieve desired speed. (please correct me if this is wrong). >>> >>> Why not just configuring the speed is not enough? Why user needs to know >>> the detail and configuration of the lanes? >>> Will it work if driver/device configure the "speed x lane" internally >>> for the requested speed? >>> >>> Is there a benefit to force specific lane count for a specific speed >>> (like power optimization, just a wild guess)? >>> >>> >>> And +1 for auto-negotiation if possible. >> >> As you said above,,multiple lanes work together to achieve desired speed. >> For example, the following solutions can be used to implement 100G: >> 1、Combines four 25G lanes >> 2、Combines two 50G lanes >> 3、A single 100G lane >> >> It is assumed that two ports are interconnected and the two ports support >> the foregoing three solutions. But, we just configured the speed to 100G and >> one port uses four 25G lanes by default and the other port uses two 50G lanes >> by default, the port cannot be up. In this case, we need to configure the >> two ports to use the same solutions (for example, uses two 50G lanes) >> so that the ports can be up. > > Why this config is not OK? How do we know? > Really I have a very bad feeling about this feature. > > Sorry, I don't quite understand your question. Are you asking why cannot be up when one port uses four 25G lanes and the other port uses two 50G lanes? 100GBASE-SR2 (two 50G lanes) and 100GBASE-SR4 (four 25G lanes) have different standards at the physical layer.[1] So it's not possible to communicate. Configuring lanes can help the driver choose the same standard. [1] https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9844436
Re: [PATCH v2 1/6] ethdev: support setting lanes
On 2024/3/22 21:58, Thomas Monjalon wrote: > 22/03/2024 08:09, Dengdui Huang: >> -#define RTE_ETH_LINK_SPEED_10G RTE_BIT32(8) /**< 10 Gbps */ >> -#define RTE_ETH_LINK_SPEED_20G RTE_BIT32(9) /**< 20 Gbps */ >> -#define RTE_ETH_LINK_SPEED_25G RTE_BIT32(10) /**< 25 Gbps */ >> -#define RTE_ETH_LINK_SPEED_40G RTE_BIT32(11) /**< 40 Gbps */ >> -#define RTE_ETH_LINK_SPEED_50G RTE_BIT32(12) /**< 50 Gbps */ >> -#define RTE_ETH_LINK_SPEED_56G RTE_BIT32(13) /**< 56 Gbps */ >> -#define RTE_ETH_LINK_SPEED_100GRTE_BIT32(14) /**< 100 Gbps */ >> -#define RTE_ETH_LINK_SPEED_200GRTE_BIT32(15) /**< 200 Gbps */ >> -#define RTE_ETH_LINK_SPEED_400GRTE_BIT32(16) /**< 400 Gbps */ >> +#define RTE_ETH_LINK_SPEED_10GRTE_BIT32(8) /**< 10 Gbps */ >> +#define RTE_ETH_LINK_SPEED_20GRTE_BIT32(9) /**< 20 Gbps 2lanes >> */ >> +#define RTE_ETH_LINK_SPEED_25GRTE_BIT32(10) /**< 25 Gbps */ >> +#define RTE_ETH_LINK_SPEED_40GRTE_BIT32(11) /**< 40 Gbps 4lanes >> */ >> +#define RTE_ETH_LINK_SPEED_50GRTE_BIT32(12) /**< 50 Gbps */ >> +#define RTE_ETH_LINK_SPEED_56GRTE_BIT32(13) /**< 56 Gbps 4lanes >> */ >> +#define RTE_ETH_LINK_SPEED_100G RTE_BIT32(14) /**< 100 Gbps */ >> +#define RTE_ETH_LINK_SPEED_200G RTE_BIT32(15) /**< 200 Gbps >> 4lanes */ >> +#define RTE_ETH_LINK_SPEED_400G RTE_BIT32(16) /**< 400 Gbps >> 4lanes */ >> +#define RTE_ETH_LINK_SPEED_10G_4LANES RTE_BIT32(17) /**< 10 Gbps >> 4lanes */ >> +#define RTE_ETH_LINK_SPEED_50G_2LANES RTE_BIT32(18) /**< 50 Gbps 2 >> lanes */ >> +#define RTE_ETH_LINK_SPEED_100G_2LANESRTE_BIT32(19) /**< 100 Gbps 2 >> lanes */ >> +#define RTE_ETH_LINK_SPEED_100G_4LANESRTE_BIT32(20) /**< 100 Gbps >> 4lanes */ >> +#define RTE_ETH_LINK_SPEED_200G_2LANESRTE_BIT32(21) /**< 200 Gbps >> 2lanes */ >> +#define RTE_ETH_LINK_SPEED_400G_8LANESRTE_BIT32(22) /**< 400 Gbps >> 8lanes */ > > I don't think it is a good idea to make this more complex. > It brings nothing as far as I can see, compared to having speed and lanes > separated. > Can we have lanes information a separate value? no need for bitmask. > Hi,Thomas, Ajit, roretzla, damodharam I also considered the option at the beginning of the design. But this option is not used due to the following reasons: 1. For the user, ethtool couples speed and lanes. The result of querying the NIC capability is as follows: Supported link modes: 10baseSR4/Full 10baseSR2/Full The NIC capability is configured as follows: ethtool -s eth1 speed 10 lanes 4 autoneg off ethtool -s eth1 speed 10 lanes 2 autoneg off Therefore, users are more accustomed to the coupling of speed and lanes. 2. For the PHY, When the physical layer capability is configured through the MDIO, the speed and lanes are also coupled. For example: Table 45–7—PMA/PMD control 2 register bit definitions[1] PMA/PMD type selection 1 0 0 1 0 1 0 = 100GBASE-SR2 PMA/PMD 0 1 0 1 1 1 1 = 100GBASE-SR4 PMA/PMD Therefore, coupling speeds and lanes is easier to understand. And it is easier for the driver to report the support lanes. In addition, the code implementation is compatible with the old version. When the driver does not support the lanes setting, the code does not need to be modified. So I think the speed and lanes coupling is better. [1] https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9844436
Re: [PATCH v2 1/6] ethdev: support setting lanes
On 2024/3/27 2:21, Damodharam Ammepalli wrote: > On Tue, Mar 26, 2024 at 11:12 AM Ajit Khaparde > wrote: >> >> On Tue, Mar 26, 2024 at 6:47 AM Ajit Khaparde >> wrote: >>> >>> On Tue, Mar 26, 2024 at 4:15 AM lihuisong (C) wrote: >>>> >>>> >>>> 在 2024/3/26 18:30, Thomas Monjalon 写道: >>>>> 26/03/2024 02:42, lihuisong (C): >>>>>> 在 2024/3/25 17:30, Thomas Monjalon 写道: >>>>>>> 25/03/2024 07:24, huangdengdui: >>>>>>>> On 2024/3/22 21:58, Thomas Monjalon wrote: >>>>>>>>> 22/03/2024 08:09, Dengdui Huang: >>>>>>>>>> -#define RTE_ETH_LINK_SPEED_10G RTE_BIT32(8) /**< 10 Gbps */ >>>>>>>>>> -#define RTE_ETH_LINK_SPEED_20G RTE_BIT32(9) /**< 20 Gbps */ >>>>>>>>>> -#define RTE_ETH_LINK_SPEED_25G RTE_BIT32(10) /**< 25 Gbps */ >>>>>>>>>> -#define RTE_ETH_LINK_SPEED_40G RTE_BIT32(11) /**< 40 Gbps */ >>>>>>>>>> -#define RTE_ETH_LINK_SPEED_50G RTE_BIT32(12) /**< 50 Gbps */ >>>>>>>>>> -#define RTE_ETH_LINK_SPEED_56G RTE_BIT32(13) /**< 56 Gbps */ >>>>>>>>>> -#define RTE_ETH_LINK_SPEED_100GRTE_BIT32(14) /**< 100 Gbps */ >>>>>>>>>> -#define RTE_ETH_LINK_SPEED_200GRTE_BIT32(15) /**< 200 Gbps */ >>>>>>>>>> -#define RTE_ETH_LINK_SPEED_400GRTE_BIT32(16) /**< 400 Gbps */ >>>>>>>>>> +#define RTE_ETH_LINK_SPEED_10GRTE_BIT32(8) /**< 10 >>>>>>>>>> Gbps */ >>>>>>>>>> +#define RTE_ETH_LINK_SPEED_20GRTE_BIT32(9) /**< 20 >>>>>>>>>> Gbps 2lanes */ >>>>>>>>>> +#define RTE_ETH_LINK_SPEED_25GRTE_BIT32(10) /**< 25 >>>>>>>>>> Gbps */ >>>>>>>>>> +#define RTE_ETH_LINK_SPEED_40GRTE_BIT32(11) /**< 40 >>>>>>>>>> Gbps 4lanes */ >>>>>>>>>> +#define RTE_ETH_LINK_SPEED_50GRTE_BIT32(12) /**< 50 >>>>>>>>>> Gbps */ >>>>>>>>>> +#define RTE_ETH_LINK_SPEED_56GRTE_BIT32(13) /**< 56 >>>>>>>>>> Gbps 4lanes */ >>>>>>>>>> +#define RTE_ETH_LINK_SPEED_100G RTE_BIT32(14) /**< 100 >>>>>>>>>> Gbps */ >>>>>>>>>> +#define RTE_ETH_LINK_SPEED_200G RTE_BIT32(15) /**< 200 >>>>>>>>>> Gbps 4lanes */ >>>>>>>>>> +#define RTE_ETH_LINK_SPEED_400G RTE_BIT32(16) /**< 400 >>>>>>>>>> Gbps 4lanes */ >>>>>>>>>> +#define RTE_ETH_LINK_SPEED_10G_4LANES RTE_BIT32(17) /**< 10 >>>>>>>>>> Gbps 4lanes */ >>>>>>>>>> +#define RTE_ETH_LINK_SPEED_50G_2LANES RTE_BIT32(18) /**< 50 >>>>>>>>>> Gbps 2 lanes */ >>>>>>>>>> +#define RTE_ETH_LINK_SPEED_100G_2LANESRTE_BIT32(19) /**< 100 >>>>>>>>>> Gbps 2 lanes */ >>>>>>>>>> +#define RTE_ETH_LINK_SPEED_100G_4LANESRTE_BIT32(20) /**< 100 >>>>>>>>>> Gbps 4lanes */ >>>>>>>>>> +#define RTE_ETH_LINK_SPEED_200G_2LANESRTE_BIT32(21) /**< 200 >>>>>>>>>> Gbps 2lanes */ >>>>>>>>>> +#define RTE_ETH_LINK_SPEED_400G_8LANESRTE_BIT32(22) /**< 400 >>>>>>>>>> Gbps 8lanes */ >>>>>>>>> I don't think it is a good idea to make this more complex. >>>>>>>>> It brings nothing as far as I can see, compared to having speed and >>>>>>>>> lanes separated. >>>>>>>>> Can we have lanes information a separate value? no need for bitmask. >>>>>>>>> >>>>>>>> Hi,Thomas, Ajit, roretzla, damodharam >>>>>>>> >>>>>>>> I also considered the option at the beginning of the design. >>>>>>>> But this option is not used due to the following reasons: >>>>>>>> 1. For the user, ethtool couples speed and lanes. >>>>>>>> The result of querying the NIC c
Re: [PATCH v2 1/6] ethdev: support setting lanes
On 2024/4/2 4:07, Thomas Monjalon wrote: > 30/03/2024 12:38, huangdengdui: >> But, there are different solutions for the device to report the setting >> lane capability, as following: >> 1. Like the current patch, reporting device capabilities in speed and >>lane coupling mode. However, if we use this solution, we will have >>to couple the the lanes setting with speed setting. >> >> 2. Like the Damodharam's RFC patch [1], the device reports the maximum >>number of supported lanes. Users can config a lane randomly, >>which is completely separated from the speed. >> >> 3. Similar to the FEC capability reported by a device, the device reports the >>relationship table of the number of lanes supported by the speed, >>for example: >> speedlanes_capa >> 50G 1,2 >> 100G 1,2,4 >> 200G 2,4 >> >> Options 1 and 2 have been discussed a lot above. >> >> For solution 1, the speed and lanes are over-coupled, and the implementation >> is too >> complex. But I think it's easier to understand and easier for the device to >> report >> capabilities. In addition, the ethtool reporting capability also uses this >> mode. >> >> For solution 2, as huisong said that user don't know what lanes should or >> can be set >> for a specified speed on one NIC. >> >> I think that when the device reports the capability, the lanes should be >> associated >> with the speed. In this way, users can know which lanes are supported by the >> current >> speed and verify the configuration validity. >> >> So I think solution 3 is better. What do you think? > > I don't understand your proposals. > Please could you show the function signature for each option? > > I agree with separating the lanes setting from the speed setting. I have a different proposal for device lanes capability reporting. Three interfaces are added to the lib/ethdev like FEC interfaces. 1. rte_eth_lanes_get(uint16_t port_id, uint32_t *capa) /* get current lanes */ 2. rte_eth_lanes_set(uint16_t port_id, uint32_t capa) 3. rte_eth_lanes_get_capa(uint16_t port_id, struct rte_eth_lanes_capa *speed_lanes_capa) /* A structure used to get capabilities per link speed */ struct rte_eth_lanes_capa { uint32_t speed; /**< Link speed (see RTE_ETH_SPEED_NUM_*) */ uint32_t capa; /**< lanes capabilities bitmask */ }; For example, an ethdev report the following lanes capability array: struct rte_eth_lanes_capa[] device_capa = { { RTE_ETH_SPEED_NUM_50G, 0x0003 }, //supports lanes 1 and 2 for 50G { RTE_ETH_SPEED_NUM_100G, 0x000B } //supports lanes 1, 2 and 4 for 100G }; The application can know which lanes are supported at a specified speed. I think it's better to implement the setting lanes feature in this way. Welcom to jump into discuss.
Re: [PATCH] app/testpmd: handle IEEE1588 init fail
On 2024/4/6 0:44, Stephen Hemminger wrote: > On Sat, 30 Mar 2024 15:44:09 +0800 > Dengdui Huang wrote: > >> When the port's timestamping function failed to initialize >> (for example, the device does not support PTP), the packets >> received by the hardware do not contain the timestamp. >> In this case, IEEE1588 packet forwarding should not start. >> This patch fix it. >> >> Plus, adding a failure message when failed to disable PTP. >> >> Fixes: a78040c990cb ("app/testpmd: update forward engine beginning") >> Cc: sta...@dpdk.org >> >> Signed-off-by: Dengdui Huang > > Noticed that ieee1588 part is printing errors to stdout, > but other parts of test-pmd are using stderr or TEST_PMD_LOG. > > It would be good to decide on one good way to handle this > across all of testpmd. Yeah, it's a bit of a mess. Is it better to use TEST_PMD_LOG? But this is a test app, and modifying it seems unnecessary. What should we do next?
Re: [PATCH] app/testpmd: handle IEEE1588 init fail
On 2024/4/8 16:45, Ferruh Yigit wrote: > On 4/8/2024 6:52 AM, huangdengdui wrote: >> >> >> On 2024/4/6 0:44, Stephen Hemminger wrote: >>> On Sat, 30 Mar 2024 15:44:09 +0800 >>> Dengdui Huang wrote: >>> >>>> When the port's timestamping function failed to initialize >>>> (for example, the device does not support PTP), the packets >>>> received by the hardware do not contain the timestamp. >>>> In this case, IEEE1588 packet forwarding should not start. >>>> This patch fix it. >>>> >>>> Plus, adding a failure message when failed to disable PTP. >>>> >>>> Fixes: a78040c990cb ("app/testpmd: update forward engine beginning") >>>> Cc: sta...@dpdk.org >>>> >>>> Signed-off-by: Dengdui Huang >>> >>> Noticed that ieee1588 part is printing errors to stdout, >>> but other parts of test-pmd are using stderr or TEST_PMD_LOG. >>> >>> It would be good to decide on one good way to handle this >>> across all of testpmd. >> >> Yeah, it's a bit of a mess. Is it better to use TEST_PMD_LOG? >> But this is a test app, and modifying it seems unnecessary. >> What should we do next? >> > > 'TESTPMD_LOG' exists and used in a few places, but still most of the > logging done with 'printf/fprintf'. > > Agree to have an agreement what to use, document it, and stick to it. > > > For some cases, like 'usage()' output (testpmd supported parameters), or > cmdline prompt we always want to have output, so 'printf' suits well. > > Not sure where 'TESTPMD_LOG' is needed and what is the benefit. I don't > remember many cases that I want to refine testpmd app level output. > Maybe a case can be packet verbose output, but it also has its specific > command to control it. > > So should we continue to 'printf/fprintf', is there any disadvantage to > do so? OK, 'printf/fprintf' is really necessary. Am I to understand it as follows? 'TESTPMD_LOG' is more suitable for printing app runtime context logs, such as initialization logs and uninstallation logs. 'printf' is suitable for normal interaction with the user, such as show port info When should we print to stderr?
Re: [PATCH] app/testpmd: handle IEEE1588 init fail
On 2024/4/9 10:50, Stephen Hemminger wrote: > On Tue, 9 Apr 2024 10:06:01 +0800 > huangdengdui wrote: > >> On 2024/4/8 16:45, Ferruh Yigit wrote: >>> On 4/8/2024 6:52 AM, huangdengdui wrote: >>>> >>>> >>>> On 2024/4/6 0:44, Stephen Hemminger wrote: >>>>> On Sat, 30 Mar 2024 15:44:09 +0800 >>>>> Dengdui Huang wrote: >>>>> >>>>>> When the port's timestamping function failed to initialize >>>>>> (for example, the device does not support PTP), the packets >>>>>> received by the hardware do not contain the timestamp. >>>>>> In this case, IEEE1588 packet forwarding should not start. >>>>>> This patch fix it. >>>>>> >>>>>> Plus, adding a failure message when failed to disable PTP. >>>>>> >>>>>> Fixes: a78040c990cb ("app/testpmd: update forward engine beginning") >>>>>> Cc: sta...@dpdk.org >>>>>> >>>>>> Signed-off-by: Dengdui Huang >>>>> >>>>> Noticed that ieee1588 part is printing errors to stdout, >>>>> but other parts of test-pmd are using stderr or TEST_PMD_LOG. >>>>> >>>>> It would be good to decide on one good way to handle this >>>>> across all of testpmd. >>>> >>>> Yeah, it's a bit of a mess. Is it better to use TEST_PMD_LOG? >>>> But this is a test app, and modifying it seems unnecessary. >>>> What should we do next? >>>> >>> >>> 'TESTPMD_LOG' exists and used in a few places, but still most of the >>> logging done with 'printf/fprintf'. >>> >>> Agree to have an agreement what to use, document it, and stick to it. >>> >>> >>> For some cases, like 'usage()' output (testpmd supported parameters), or >>> cmdline prompt we always want to have output, so 'printf' suits well. >>> >>> Not sure where 'TESTPMD_LOG' is needed and what is the benefit. I don't >>> remember many cases that I want to refine testpmd app level output. >>> Maybe a case can be packet verbose output, but it also has its specific >>> command to control it. >>> >>> So should we continue to 'printf/fprintf', is there any disadvantage to >>> do so? >> OK, 'printf/fprintf' is really necessary. Am I to understand it as follows? >> >> 'TESTPMD_LOG' is more suitable for printing app runtime context logs, >> such as initialization logs and uninstallation logs. >> >> 'printf' is suitable for normal interaction with the user, such as >> show port info >> >> When should we print to stderr? > > Any Unix convention is that any error message should go to stderr. > For test-pmd, using TESTPMD_LOG has benefits and problems. > The benefit is following a convention across all the startup and error > codes. And with the color log patches, errors are highlighted. > > But the current way rte_log works, the testpmd stuff ends up going > to syslog/journal which is not necessary and overly chatty. That seems hard to unify. I don't have a better idea at the moment. Do you have a better suggestion?
Re: [PATCH v4 1/2] ethdev: add API to check if queue is valid
On 2023/6/2 20:47, Ferruh Yigit wrote: > On 6/2/2023 8:52 AM, Dengdui Huang wrote: >> The API rte_eth_dev_is_valid_rxq/txq which >> is used to check if Rx/Tx queue is valid. >> If the queue has been setup, it is considered valid. >> >> Signed-off-by: Dengdui Huang >> --- >> doc/guides/rel_notes/release_23_07.rst | 6 >> lib/ethdev/rte_ethdev.c| 22 +++ >> lib/ethdev/rte_ethdev.h| 38 ++ >> lib/ethdev/version.map | 2 ++ >> 4 files changed, 68 insertions(+) >> >> diff --git a/doc/guides/rel_notes/release_23_07.rst >> b/doc/guides/rel_notes/release_23_07.rst >> index 0d3cada5d0..1332fa3a5a 100644 >> --- a/doc/guides/rel_notes/release_23_07.rst >> +++ b/doc/guides/rel_notes/release_23_07.rst >> @@ -83,6 +83,12 @@ New Features >>for new capability registers, large passthrough BAR and some >>performance enhancements for UPT. >> >> +* **Added ethdev Rx/Tx queue ID check API.** >> + >> + Added ethdev Rx/Tx queue ID check API which provides functions >> + for check if Rx/Tx queue is valid. If the queue has been setup, >> + it is considered valid. >> + >> > > Can you please move the release note update to up, before rte_flow > updates, the expected order is documented in section comment. > >> Removed Items >> - >> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c >> index d46e74504e..a134928c8a 100644 >> --- a/lib/ethdev/rte_ethdev.c >> +++ b/lib/ethdev/rte_ethdev.c >> @@ -771,6 +771,28 @@ eth_dev_validate_tx_queue(const struct rte_eth_dev >> *dev, uint16_t tx_queue_id) >> return 0; >> } >> >> +int >> +rte_eth_dev_is_valid_rxq(uint16_t port_id, uint16_t queue_id) >> +{ >> +struct rte_eth_dev *dev; >> + >> +RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); >> +dev = &rte_eth_devices[port_id]; >> + >> +return eth_dev_validate_rx_queue(dev, queue_id); >> +} >> + >> +int >> +rte_eth_dev_is_valid_txq(uint16_t port_id, uint16_t queue_id) >> +{ >> +struct rte_eth_dev *dev; >> + >> +RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); >> +dev = &rte_eth_devices[port_id]; >> + >> +return eth_dev_validate_tx_queue(dev, queue_id); >> +} >> + >> int >> rte_eth_dev_rx_queue_start(uint16_t port_id, uint16_t rx_queue_id) >> { >> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h >> index 9932413c33..4ef803c244 100644 >> --- a/lib/ethdev/rte_ethdev.h >> +++ b/lib/ethdev/rte_ethdev.h >> @@ -2666,6 +2666,44 @@ int rte_eth_dev_socket_id(uint16_t port_id); >> */ >> int rte_eth_dev_is_valid_port(uint16_t port_id); >> >> +/** >> + * @warning >> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice >> + * >> + * Check if Rx queue is valid. If the queue has been setup, >> + * it is considered valid. >> + * >> + * @param port_id >> + * The port identifier of the Ethernet device. >> + * @param queue_id >> + * The index of the receive queue. >> + * @return >> + * - -ENODEV: if *port_id* is valid. > > s/valid/invalid ? > >> + * - -EINVAL: if queue is out of range or not been setup. > > "if queue_id is out of range or queue is not been setup" ? > >> + * - 0 if Rx queue is valid. >> + */ >> +__rte_experimental >> +int rte_eth_dev_is_valid_rxq(uint16_t port_id, uint16_t queue_id); >> + >> +/** >> + * @warning >> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice >> + * >> + * Check if Tx queue is valid. If the queue has been setup, >> + * it is considered valid. >> + * >> + * @param port_id >> + * The port identifier of the Ethernet device. >> + * @param queue_id >> + * The index of the transmit queue. >> + * @return >> + * - -ENODEV: if *port_id* is valid. > > s/valid/invalid ? > >> + * - -EINVAL: if queue is out of range or not been setup. > > "if queue_id is out of range or queue is not been setup" ? > >> + * - 0 if Tx queue is valid. >> + */ >> +__rte_experimental >> +int rte_eth_dev_is_valid_txq(uint16_t port_id, uint16_t queue_id); >> + >> /** >> * Start specified Rx queue of a port. It is used when rx_deferred_start >> * flag of the specified queue is true. >> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map >> index 9d418091ef..3aa6bce156 100644 >> --- a/lib/ethdev/version.map >> +++ b/lib/ethdev/version.map >> @@ -309,6 +309,8 @@ EXPERIMENTAL { >> rte_flow_async_action_list_handle_destroy; >> rte_flow_async_action_list_handle_query_update; >> rte_flow_async_actions_update; >> +rte_eth_dev_is_valid_rxq; >> +rte_eth_dev_is_valid_txq; > > Can you please sort '23.07' block alphabetically? > >> }; >> >> INTERNAL { > Hi Ferruh, Thanks for your review, I will do in v5.
Re: [PATCH v6] app/dma-perf: introduce dma-perf application
Hi Cheng, Few comments inline. Please check. Thanks, Dengdui On 2023/6/13 12:31, Cheng Jiang wrote: > There are many high-performance DMA devices supported in DPDK now, and > these DMA devices can also be integrated into other modules of DPDK as > accelerators, such as Vhost. Before integrating DMA into applications, > developers need to know the performance of these DMA devices in various > scenarios and the performance of CPUs in the same scenario, such as > different buffer lengths. Only in this way can we know the target > performance of the application accelerated by using them. This patch > introduces a high-performance testing tool, which supports comparing the > performance of CPU and DMA in different scenarios automatically with a > pre-set config file. Memory Copy performance test are supported for now. > > Signed-off-by: Cheng Jiang > Signed-off-by: Jiayu Hu > Signed-off-by: Yuan Wang > Acked-by: Morten Brørup > Acked-by: Chenbo Xia > --- > v6: > improved code based on Anoob's comments; > fixed some code structure issues; > v5: > fixed some LONG_LINE warnings; > v4: > fixed inaccuracy of the memory footprint display; > v3: > fixed some typos; > v2: > added lcore/dmadev designation; > added error case process; > removed worker_threads parameter from config.ini; > improved the logs; > improved config file; > > app/meson.build | 1 + > app/test-dma-perf/benchmark.c | 477 > app/test-dma-perf/config.ini | 59 > app/test-dma-perf/main.c | 569 ++ > app/test-dma-perf/main.h | 69 + > app/test-dma-perf/meson.build | 17 + > 6 files changed, 1192 insertions(+) > create mode 100644 app/test-dma-perf/benchmark.c > create mode 100644 app/test-dma-perf/config.ini > create mode 100644 app/test-dma-perf/main.c > create mode 100644 app/test-dma-perf/main.h > create mode 100644 app/test-dma-perf/meson.build > > diff --git a/app/meson.build b/app/meson.build > index 74d2420f67..4fc1a83eba 100644 > --- a/app/meson.build > +++ b/app/meson.build > @@ -19,6 +19,7 @@ apps = [ > 'test-cmdline', > 'test-compress-perf', > 'test-crypto-perf', > +'test-dma-perf', > 'test-eventdev', > 'test-fib', > 'test-flow-perf', > diff --git a/app/test-dma-perf/benchmark.c b/app/test-dma-perf/benchmark.c > new file mode 100644 > index 00..bc1ca82297 > --- /dev/null > +++ b/app/test-dma-perf/benchmark.c > @@ -0,0 +1,477 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2023 Intel Corporation > + */ > + > +static inline int > +__rte_format_printf(3, 4) > +print_err(const char *func, int lineno, const char *format, ...) > +{ > + va_list ap; > + int ret; > + > + ret = fprintf(stderr, "In %s:%d - ", func, lineno); > + va_start(ap, format); > + ret += vfprintf(stderr, format, ap); > + va_end(ap); > + > + return ret; > +} > + > +static inline void > +calc_result(uint32_t buf_size, uint32_t nr_buf, uint16_t nb_workers, > uint16_t test_secs, > + uint32_t total_cnt, float *memory, uint32_t > *ave_cycle, > + float *bandwidth, float *mops) > +{ > + *memory = (float)(buf_size * (nr_buf / nb_workers) * 2) / (1024 * 1024); > + *ave_cycle = test_secs * rte_get_timer_hz() / total_cnt; > + *bandwidth = (buf_size * 8 * (rte_get_timer_hz() / (float)*ave_cycle)) > / 10; > + *mops = (float)rte_get_timer_hz() / *ave_cycle / 100; The value of ave_cycle may be 0. *mops = (float)(total_cnt / test_secs) / 100; ? > +} > + > +static void > +output_result(uint8_t scenario_id, uint32_t lcore_id, char *dma_name, > uint64_t ave_cycle, > + uint32_t buf_size, uint32_t nr_buf, float memory, > + float bandwidth, float mops, bool is_dma) > +{ > + if (is_dma) > + printf("lcore %u, DMA %s:\n", lcore_id, dma_name); > + else > + printf("lcore %u\n", lcore_id); > + > + printf("average cycles/op: %" PRIu64 ", buffer size: %u, nr_buf: %u, > memory: %.2lfMB, frequency: %" PRIu64 ".\n", > + ave_cycle, buf_size, nr_buf, memory, > rte_get_timer_hz()); > + printf("Average bandwidth: %.3lfGbps, MOps: %.3lf\n", bandwidth, mops); > + > + if (is_dma) > + snprintf(output_str[lcore_id], MAX_OUTPUT_STR_LEN, > CSV_LINE_DMA_FMT, > + scenario_id, lcore_id, dma_name, buf_size, > + nr_buf, memory, ave_cycle, bandwidth, mops); > + else > + snprintf(output_str[lcore_id], MAX_OUTPUT_STR_LEN, > CSV_LINE_CPU_FMT, > + scenario_id, lcore_id, buf_size, > + nr_buf, memory, ave_cycle, bandwidth, mops); > +} > + > +static inline void > +cache_flush_buf(__maybe_unused struct rte_mbuf **array, > + __maybe_unused uint32_t buf_size, > +
Re: [PATCH v4 00/42] replace strerror
On 2024/10/26 5:56, Thomas Monjalon wrote: > 24/10/2024 08:47, huangdengdui: >> On 2024/10/23 23:42, Stephen Hemminger wrote: >>> On Wed, 23 Oct 2024 16:28:10 +0800 >>> Dengdui Huang wrote: >>> >>>> The function strerror() is insecure in a multi-thread environment. >>>> It is better to use rte_strerror() instead of strerror(). >>>> In this patchset, only the libs and drivers are modified. >>> >>> Even rte_strerror is not completely safe. It depends on the calling >>> thread being a registered lcore. >> >> As discussed earlier, it is still safe if used from non-DPDK registered >> threads[1]: >> >> #define RTE_DEFINE_PER_LCORE(type, name) \ >> __thread __typeof__(type) per_lcore_##name >> >> [1]: >> https://elixir.bootlin.com/dpdk/v23.11-rc3/source/lib/eal/include/rte_per_lcore.h#L37 >> >>> >>> It would be better to use a coccinelle script to do direct replacement >>> with strerror_r(). >>> >>> Also, rte_strerror is not signal safe. >> >> Can we use strerror_r() in the signal processing context and replace it with >> rte_strerror() everywhere else? > > It does not make sense to use rte_strerror after libc functions. > Please restrict the use of rte_strerror for error numbers from DPDK functions. > > The Windows platform does not support strerror_r(). Using strerror_r() instead of strerror() is not a good idea, Platform differences have been handled in rte_strerror()[1]. The rte_strerror() can also handle errno from libc functions[2][3]. So is it better to use rte_strerror() instead of strerror()? [1]https://elixir.bootlin.com/dpdk/v23.11-rc3/source/lib/eal/common/eal_common_errno.c#L15 [2]https://elixir.bootlin.com/dpdk/v23.11-rc3/source/lib/eal/common/eal_common_errno.c#L49 [3]https://elixir.bootlin.com/dpdk/v23.11-rc3/source/lib/eal/include/rte_errno.h#L32
Re: [PATCH v4 00/42] replace strerror
Hi, Stephen Hemminger On 2024/10/23 23:42, Stephen Hemminger wrote: > On Wed, 23 Oct 2024 16:28:10 +0800 > Dengdui Huang wrote: > >> The function strerror() is insecure in a multi-thread environment. >> It is better to use rte_strerror() instead of strerror(). >> In this patchset, only the libs and drivers are modified. >> >> chang log: >> v3->v4 fix ci error >> v2->v3 drop patch "telemetry: replace strerror" due to compile fail >> v1-v2 fix ci error > > Even rte_strerror is not completely safe. It depends on the calling > thread being a registered lcore. As discussed earlier, it is still safe if used from non-DPDK registered threads[1]: #define RTE_DEFINE_PER_LCORE(type, name)\ __thread __typeof__(type) per_lcore_##name [1]: https://elixir.bootlin.com/dpdk/v23.11-rc3/source/lib/eal/include/rte_per_lcore.h#L37 > > It would be better to use a coccinelle script to do direct replacement > with strerror_r(). > > Also, rte_strerror is not signal safe. Can we use strerror_r() in the signal processing context and replace it with rte_strerror() everywhere else?
Re: [PATCH 00/43] replace strerror
On 2024/11/7 2:57, Stephen Hemminger wrote: > On Tue, 14 Nov 2023 16:24:56 +0800 > Dengdui Huang wrote: > >> The function strerror() is insecure in a multi-thread environment. >> This series of patches fix it. In this patchset, only the libs and >> drivers are modified. > > strerror is ok in multi-threaded environment, look at what glibc does. > Yes, it's safe in the latest version.[1] But, static variables used in previous versions, so it is insecure in multi-threaded environment.[2][3] [1]:https://elixir.bootlin.com/glibc/glibc-2.40/source/string/strerror.c#L24 [2]:https://elixir.bootlin.com/glibc/glibc-2.31/source/string/strerror.c#L26 [3]:https://elixir.bootlin.com/glibc/glibc-2.3/source/string/strerror.c#L29 > The bigger problem is where DPDK code is calling strerror() on rte_errno. > This should be flagged by checkpatch (but isn't now). I found it, too, and have fixed it in this patchset. But I didn't add it to the checkpatch script. > > Examples: > $ git grep 'strerror(rte_errno)' | grep -v rte_strerror > app/dumpcap/main.c:strerror(rte_errno)); > app/test/test_bpf.c: __func__, __LINE__, rte_errno, > strerror(rte_errno)); > app/test/test_bpf.c: __func__, __LINE__, str, rte_errno, > strerror(rte_errno)); > app/test/test_bpf.c: __func__, __LINE__, rte_errno, > strerror(rte_errno)); > app/test/test_bpf.c: __func__, __LINE__, s, rte_errno, > strerror(rte_errno)); > app/test/test_bpf.c: __func__, __LINE__, rte_errno, > strerror(rte_errno)); > drivers/bus/vdev/vdev.c: devname, > strerror(rte_errno)); > drivers/common/mlx5/linux/mlx5_nl.c: iface_idx, strerror(rte_errno)); > drivers/common/mlx5/linux/mlx5_nl.c: add ? "add" : "remove", > m, strerror(rte_errno)); > drivers/common/mlx5/linux/mlx5_nl.c: strerror(rte_errno)); > drivers/common/mlx5/linux/mlx5_nl.c: strerror(rte_errno)); > drivers/common/mlx5/linux/mlx5_nl.c: strerror(rte_errno)); > drivers/common/mlx5/mlx5_common.c:strerror(rte_errno)); > drivers/common/mlx5/mlx5_common.c:strerror(rte_errno)); > drivers/common/mlx5/mlx5_common.c: > strerror(rte_errno)); > drivers/net/af_xdp/rte_eth_af_xdp.c: name, > strerror(rte_errno)); > drivers/net/bonding/rte_eth_bond_flow.c: > strerror(rte_errno)); > drivers/net/bonding/rte_eth_bond_flow.c: > strerror(rte_errno)); > drivers/net/failsafe/failsafe.c:strerror(rte_errno)); > drivers/net/failsafe/failsafe_eal.c:rte_errno > ? strerror(rte_errno) : "", > drivers/net/failsafe/failsafe_flow.c: strerror(rte_errno)); > drivers/net/failsafe/failsafe_flow.c: strerror(rte_errno)); > drivers/net/memif/rte_eth_memif.c:strerror(rte_errno)); > drivers/net/mlx4/mlx4.c:strerror(rte_errno)); > drivers/net/mlx4/mlx4_ethdev.c: mode, rte_errno, > strerror(rte_errno), error.type, error.cause, > drivers/net/mlx4/mlx4_ethdev.c: index, rte_errno, > strerror(rte_errno), error.type, error.cause, > drivers/net/mlx4/mlx4_ethdev.c: index, rte_errno, > strerror(rte_errno), error.type, error.cause, > drivers/net/mlx4/mlx4_ethdev.c: rte_errno, strerror(rte_errno), > error.type, error.cause, > drivers/net/mlx4/mlx4_ethdev.c: rte_errno, strerror(rte_errno), > error.type, error.cause, > drivers/net/mlx4/mlx4_ethdev.c: WARN("ioctl(SIOCGIFFLAGS) > failed: %s", strerror(rte_errno)); > drivers/net/mlx4/mlx4_ethdev.c:strerror(rte_errno)); > drivers/net/mlx4/mlx4_ethdev.c:strerror(rte_errno)); > drivers/net/mlx4/mlx4_ethdev.c:strerror(rte_errno)); > drivers/net/mlx4/mlx4_rxq.c:(void *)dev, > strerror(rte_errno)); > drivers/net/mlx4/mlx4_rxq.c:(void *)dev, > strerror(rte_errno)); > drivers/net/mlx4/mlx4_txq.c:(void *)dev, strerror(rte_errno)); > drivers/net/mlx4/mlx4_txq.c:(void *)dev, strerror(rte_errno)); > drivers/net/mlx4/mlx4_txq.c:(void *)dev, strerror(rte_errno)); > drivers/net/mlx4/mlx4_txq.c:(void *)dev, strerror(rte_errno)); > drivers/net/mlx4/mlx4_txq.c:(void *)dev, strerror(rte_errno)); > drivers/net/mlx5/hws/mlx5dr_matcher.c: > strerror(rte_errno)); > drivers/net/mlx5/linux/mlx5_ethdev_os.c: > dev->data->port_id, strerror(rte_errno)); > drivers/net/mlx5/linux/mlx5_ethdev_os.c: > dev->data->port_id, strerror(rte_errno)); > drivers/net/mlx5/linux/mlx5_ethdev_os.c: > dev->data->port
Re: [PATCH v4] net/hns3: fix Rx packet without CRC data
On 2024/11/30 1:12, Stephen Hemminger wrote: > On Fri, 29 Nov 2024 09:36:43 +0800 > Jie Hai wrote: > >>> + >>> +static inline void >>> +hns3_recalculate_crc(struct rte_mbuf *m) >>> +{ >>> + char *append_data; >>> + uint32_t crc; >>> + >>> + crc = rte_net_crc_calc(rte_pktmbuf_mtod(m, void *), >>> + m->data_len, RTE_NET_CRC32_ETH); >>> + >>> + /* >>> +* After CRC is stripped by hardware, pkt_len and data_len do not >>> +* contain the CRC length. Therefore, after CRC data is appended >>> +* by PMD again. >>> +*/ >>> + append_data = rte_pktmbuf_append(m, RTE_ETHER_CRC_LEN); >>> + >>> + /* CRC data is binary data and does not care about the byte order. */ >>> + memcpy(append_data, &crc, RTE_ETHER_CRC_LEN); >>> +} > > As mentioned previously. > Including CRC in the packet length (pkt_len and data_len) is not the > current behavior of most drivers. Therefore hns3 should follow the precedent > of other drivers and put it past the data. Yes. This patch does not change the original behavior. In subsequent processing, crc_len is deducted from pkt_len and data_len. > > In the future the KEEP_CRC flag needs more work to be useable. It needs > documentation and flag in mbuf (similar to hash and checksum) so that > application > can no that it is present and valid. > > Please resend the patch as a bugfix that puts crc after the data.
Re: [PATCH v3 3/3] net/hns3: fix Rx packet without CRC data
On 2024/11/26 1:45, Stephen Hemminger wrote: > On Fri, 19 Jul 2024 17:04:15 +0800 > Jie Hai wrote: > >> +static inline void >> +hns3_recalculate_crc(struct rte_mbuf *m) >> +{ >> +char *append_data; >> +uint32_t crc; >> + >> +crc = rte_net_crc_calc(rte_pktmbuf_mtod(m, void *), >> + m->data_len, RTE_NET_CRC32_ETH); >> + >> +/* >> + * The hns3 driver requires that mbuf size must be at least 512B. >> + * When CRC is stripped by hardware, the pkt_len must be less than >> + * or equal to 60B. Therefore, the space of the mbuf is enough >> + * to insert the CRC. >> + * >> + * In addition, after CRC is stripped by hardware, pkt_len and data_len >> + * do not contain the CRC length. Therefore, after CRC data is appended >> + * by PMD again, both pkt_len and data_len add the CRC length. >> + */ >> +append_data = rte_pktmbuf_append(m, RTE_NET_CRC32_ETH); >> +/* The CRC data is binary data and does not care about the byte order. >> */ >> +rte_memcpy(append_data, (void *)&crc, RTE_NET_CRC32_ETH); >> +} >> + > > There is a bug here. If there is no space at end of mbuf (tailroom) then > rte_pktmbuf_append will return NULL. This would only happen if mbuf pool > was sized such that there was space of a full size packet without CRC > and then the code to add kept CRC was executed. > > You need to check for NULL return and figure out some kind of unwind > path such as making it a receive error and dropping the packet. This situation has been described in the comments. The hns3 driver requires that mbuf size must be at least 512B.[1] When CRC needs to be recalculated, the packet length must be less than 60. So the space of the mbuf must be enough to insert the CRC. [1]:https://elixir.bootlin.com/dpdk/v23.11/source/drivers/net/hns3/hns3_rxtx.c#L1723
Re: [PATCH] mbuf: add packet offload Rx flag for keep CRC
On 2025/1/24 22:34, Morten Brørup wrote: >> From: Dengdui Huang [mailto:huangdeng...@huawei.com] >> Sent: Friday, 24 January 2025 11.00 >> >> After discussion[1], the drivers do not include the CRC in the packet >> length calculation. This will cause users to be confused about whether >> the mbuf contains CRC data. This patch adds a packet offload Rx flag, >> indicating that CRC data exists at the end of the mbuf chain. >> >> [1] https://inbox.dpdk.org/dev/20240206011030.2007689-1- >> haij...@huawei.com/ >> >> Signed-off-by: Dengdui Huang >> --- > > Mbufs with F_RX_KEEP_CRC requires much more than this. > > If the packet length omits the 4 byte Ethernet CRC, and the last segment only > holds the CRC, rte_mbuf_check() will fail and cause panic in > rte_mbuf_sanity_check(). > And many functions working on segments, such as rte_pktmbuf_copy(), > linearize(), etc. need to be patched to check for F_RX_KEEP_CRC when working > on the packet. This will degrade performance, and we are also talking about > frequently used dataplane functions. > Currently, when the CRC data is stored at the end of a packet, neither data_len nor pkt_len contains the CRC length. Therefore, using rte_pktmbuf_copy() and linearization() for packets containing CRC data is also problematic. > Furthermore, if we really need to support KEEP_CRC with segmented packets, we > need to add test cases with the CRC partially in the last segment, and with > only the CRC in the last segment, for functions and libraries supporting > segmented packets. Regardless if the packet length includes the 4 bytes CRC > or not. > > KEEP_CRC looks exotic to me, and am worried that full support for KEEP_CRC > will impact performance and would be essentially untested in a bunch of > libraries. I don't want exotic features impacting the performance of > frequently used dataplane functions. > Can you please remind me of the use cases for KEEP_CRC? > > Perhaps support for KEEP_CRC could be a build time option (default omitted, > for performance and test coverage reasons)? > > Alternatively, support for KEEP_CRC could be limited to non-segmented packets? > > Or, how about a completely different approach: > Drivers supporting KEEP_CRC can strip the 4 byte CRC (like stripping a VLAN > tag) and store it in an mbuf dynfield. > > It's a good idea to store it in mbuf dynfield. As mentioned above, storing CRC data at the end of the mbuf is very complex and currently imperfect. Can this feature be re-implemented in this simpler way?
Re: [PATCH] mbuf: add packet offload Rx flag for keep CRC
On 2025/1/25 1:34, Stephen Hemminger wrote: > On Fri, 24 Jan 2025 17:59:57 +0800 > Dengdui Huang wrote: > >> diff --git a/lib/mbuf/rte_mbuf.c b/lib/mbuf/rte_mbuf.c >> index 559d5ad8a7..c828200ea1 100644 >> --- a/lib/mbuf/rte_mbuf.c >> +++ b/lib/mbuf/rte_mbuf.c >> @@ -771,7 +771,7 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask) >> case RTE_MBUF_F_RX_OUTER_L4_CKSUM_GOOD: return >> "RTE_MBUF_F_RX_OUTER_L4_CKSUM_GOOD"; >> case RTE_MBUF_F_RX_OUTER_L4_CKSUM_INVALID: >> return "RTE_MBUF_F_RX_OUTER_L4_CKSUM_INVALID"; >> - >> +case RTE_MBUF_F_RX_KEEP_CRC: return "RTE_MBUF_F_RX_KEEP_CRC"; >> default: return NULL; >> } > > DPDK style is to add break line after the case statement. > Please do it for both cases. OK,I made a mistake.
Re: [PATCH 1/3] net/hns3: fix simple Tx path incorrect free the mbuf
On 2025/1/9 0:57, Stephen Hemminger wrote: > On Wed, 8 Jan 2025 10:40:43 +0800 > Jie Hai wrote: > >> On 2024/12/31 1:55, Stephen Hemminger wrote: >>> On Mon, 30 Dec 2024 14:54:03 +0800 >>> Jie Hai wrote: >>> From: Jie Hai To: , , , , , Chengwen Feng , "Wei Hu (Xavier)" , Huisong Li CC: , Subject: [PATCH 1/3] net/hns3: fix simple Tx path incorrect free the mbuf Date: Mon, 30 Dec 2024 14:54:03 +0800 X-Mailer: git-send-email 2.22.0 From: Dengdui Huang When RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE offload is not set, use rte_pktmbuf_free_seg() to free the mbuf. Fixes: 7ef933908f04 ("net/hns3: add simple Tx path") Cc: sta...@dpdk.org Signed-off-by: Dengdui Huang Signed-off-by: Jie Hai >>> >>> What about the fast free case which is using rte_mempool_put_bulk when >>> it should use rte_pktmbuf_free_bulk instead? >>> >>> >> Hi, Stephen Hemminger, >> >> During the fast free case, the performance of using >> rte_mempool_put_bulk is higher than that of using >> rte_pktmbuf_free_bulk because other references >> to mbuf do not need to be considered. So it's better >> to not change. >> >> Thanks, >> Jie Hai > > The problem is that having an open coded version of this buried in > one driver is a long term potential proble> > If you really think that optimizing free like this is noticeable, then > why not make a new function "rte_pktmuf_fast_free_bulk" and put it in the > regular mbuf library. > Do you mean to add the following functions to the library? void rte_pktmbuf_fast_free_bulk(struct rte_mbuf **mbufs, unsigned int count) { rte_mempool_put_bulk(mbufs[0]->pool, (void **)mbufs, count); } The driver uses rte_mempool_put_bulk only when the following conditions are met: 1. All mbufs comes from the same mempool 2. All mbufs have only one reference. 3. All mbufs have only one segment. So the rte_pktmbuf_fast_free_bulk function is just a wrapper around the rte_mempool_put_bulk function.
Re: [PATCH] examples/l3fwd: add option to set refetch offset
On 2025/1/11 1:20, Stephen Hemminger wrote: > This will make it slower for many platforms. > GCC will unroll a loop of fixed small size, which is what we want. Do you mean to replace option with a macro? But most of prefetch_offset are used with the nb_rx, So using macros is the same as using options. const int32_t k = RTE_ALIGN_FLOOR(nb_rx, FWDSTEP); for (j = 0; j != k; j += FWDSTEP) { for (i = 0, pos = j + prefetch_offset; i < FWDSTEP && pos < k; i++, pos++) rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[pos], void *)); processx4_step1(&pkts_burst[j], &dip, &ipv4_flag); processx4_step2(qconf, dip, ipv4_flag, portid, &pkts_burst[j], &dst_port[j]); if (do_step3) processx4_step3(&pkts_burst[j], &dst_port[j]); } The option can dynamically adjust the prefetch window, which makes it easier to find the prefetch window for a HW platform. So I think it's better to use option.
Re: [PATCH] examples/l3fwd: add option to set refetch offset
On 2025/1/11 1:19, Stephen Hemminger wrote: > On Fri, 10 Jan 2025 17:37:15 +0800 > Dengdui Huang wrote: > >> +#define DEFAULT_PREFECH_OFFSET 4 > > Spelling I made a mistake. I'll fix it for the next version.
Re: [PATCH v2] mbuf: add fast free bulk function
On 2025/1/15 17:38, Morten Brørup wrote: >> From: huangdengdui [mailto:huangdeng...@huawei.com] >> Sent: Wednesday, 15 January 2025 07.52 >> >> On 2025/1/15 0:39, Morten Brørup wrote: >>> mbuf: add fast free bulk function >>> >>> When putting an mbuf back into its mempool, there are certain >> requirements >>> to the mbuf. Specifically, some of its fields must be initialized. >>> >>> These requirements are in fact invariants about free mbufs, held in >>> mempools, and thus also apply when allocating an mbuf from a mempool. >>> With this in mind, the additional assertions in rte_mbuf_raw_free() >> were >>> moved to __rte_mbuf_raw_sanity_check(). >>> Furthermore, the assertion regarding pinned external buffer was >> enhanced; >>> it now also asserts that the referenced pinned external buffer has >>> refcnt == 1. >>> >>> The description of RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE was updated to >>> include the remaining requirements, which were missing here. >>> >>> And finally: >>> A new rte_mbuf_fast_free_bulk() inline function was added for the >> benefit >>> of ethdev drivers supporting fast release of mbufs. >>> It asserts these requirements and that the mbufs belong to the >> specified >>> mempool, and then calls rte_mempool_put_bulk(). >>> >>> Signed-off-by: Morten Brørup >>> --- >>> v2: >>> * Fixed missing inline. >>> --- >>> lib/ethdev/rte_ethdev.h | 6 -- >>> lib/mbuf/rte_mbuf.h | 39 +-- >>> 2 files changed, 41 insertions(+), 4 deletions(-) >>> >>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h >>> index 1f71cad244..e9267fca79 100644 >>> --- a/lib/ethdev/rte_ethdev.h >>> +++ b/lib/ethdev/rte_ethdev.h >>> @@ -1612,8 +1612,10 @@ struct rte_eth_conf { >>> #define RTE_ETH_TX_OFFLOAD_MULTI_SEGS RTE_BIT64(15) >>> /** >>> * Device supports optimization for fast release of mbufs. >>> - * When set application must guarantee that per-queue all mbufs >> comes from >>> - * the same mempool and has refcnt = 1. >>> + * When set application must guarantee that per-queue all mbufs come >> from the same mempool, >>> + * are direct, have refcnt=1, next=NULL and nb_segs=1, as done by >> rte_pktmbuf_prefree_seg(). >>> + * >>> + * @see rte_mbuf_fast_free_bulk() >>> */ >>> #define RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE RTE_BIT64(16) >>> #define RTE_ETH_TX_OFFLOAD_SECURITY RTE_BIT64(17) >>> diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h >>> index 0d2e0e64b3..7590d82689 100644 >>> --- a/lib/mbuf/rte_mbuf.h >>> +++ b/lib/mbuf/rte_mbuf.h >>> @@ -568,6 +568,10 @@ __rte_mbuf_raw_sanity_check(__rte_unused const >> struct rte_mbuf *m) >>> RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1); >>> RTE_ASSERT(m->next == NULL); >>> RTE_ASSERT(m->nb_segs == 1); >>> + RTE_ASSERT(!RTE_MBUF_CLONED(m)); >>> + RTE_ASSERT(!RTE_MBUF_HAS_EXTBUF(m) || >>> + (RTE_MBUF_HAS_PINNED_EXTBUF(m) && >>> + rte_mbuf_ext_refcnt_read(m->shinfo) == 1)); >>> __rte_mbuf_sanity_check(m, 0); >>> } >>> >>> @@ -623,12 +627,43 @@ static inline struct rte_mbuf >> *rte_mbuf_raw_alloc(struct rte_mempool *mp) >>> static __rte_always_inline void >>> rte_mbuf_raw_free(struct rte_mbuf *m) >>> { >>> - RTE_ASSERT(!RTE_MBUF_CLONED(m) && >>> - (!RTE_MBUF_HAS_EXTBUF(m) || >> RTE_MBUF_HAS_PINNED_EXTBUF(m))); >>> __rte_mbuf_raw_sanity_check(m); >>> rte_mempool_put(m->pool, m); >>> } >>> >>> +/** >>> + * Put a bulk of mbufs allocated from the same mempool back into the >> mempool. >>> + * >>> + * The caller must ensure that the mbufs come from the specified >> mempool, >>> + * are direct and properly reinitialized (refcnt=1, next=NULL, >> nb_segs=1), as done by >>> + * rte_pktmbuf_prefree_seg(). >>> + * >>> + * This function should be used with care, when optimization is >>> + * required. For standard needs, prefer rte_pktmbuf_free_bulk(). >>> + * >>> + * @see RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE >>> + * >>> + * @param mp >>> + * The mempool to which the mbufs belong. >>> + * @param mbu
Re: [PATCH 1/3] net/hns3: fix simple Tx path incorrect free the mbuf
On 2025/1/10 1:03, Stephen Hemminger wrote: > On Thu, 9 Jan 2025 15:43:12 +0800 > huangdengdui wrote: > >> On 2025/1/9 0:57, Stephen Hemminger wrote: >>> On Wed, 8 Jan 2025 10:40:43 +0800 >>> Jie Hai wrote: >>> >>>> On 2024/12/31 1:55, Stephen Hemminger wrote: >>>>> On Mon, 30 Dec 2024 14:54:03 +0800 >>>>> Jie Hai wrote: >>>>> >>>>>> From: Jie Hai >>>>>> To: , , , >>>>>> , , Chengwen >>>>>> Feng , "Wei Hu (Xavier)" >>>>>> , Huisong Li >>>>>> CC: , >>>>>> Subject: [PATCH 1/3] net/hns3: fix simple Tx path incorrect free the mbuf >>>>>> Date: Mon, 30 Dec 2024 14:54:03 +0800 >>>>>> X-Mailer: git-send-email 2.22.0 >>>>>> >>>>>> From: Dengdui Huang >>>>>> >>>>>> When RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE offload is not set, >>>>>> use rte_pktmbuf_free_seg() to free the mbuf. >>>>>> >>>>>> Fixes: 7ef933908f04 ("net/hns3: add simple Tx path") >>>>>> Cc: sta...@dpdk.org >>>>>> >>>>>> Signed-off-by: Dengdui Huang >>>>>> Signed-off-by: Jie Hai >>>>> >>>>> What about the fast free case which is using rte_mempool_put_bulk when >>>>> it should use rte_pktmbuf_free_bulk instead? >>>>> >>>>> >>>> Hi, Stephen Hemminger, >>>> >>>> During the fast free case, the performance of using >>>> rte_mempool_put_bulk is higher than that of using >>>> rte_pktmbuf_free_bulk because other references >>>> to mbuf do not need to be considered. So it's better >>>> to not change. >>>> >>>> Thanks, >>>> Jie Hai >>> >>> The problem is that having an open coded version of this buried in >>> one driver is a long term potential proble> >>> If you really think that optimizing free like this is noticeable, then >>> why not make a new function "rte_pktmuf_fast_free_bulk" and put it in the >>> regular mbuf library. >>> >> >> Do you mean to add the following functions to the library? >> >> void rte_pktmbuf_fast_free_bulk(struct rte_mbuf **mbufs, unsigned int count) >> { >> rte_mempool_put_bulk(mbufs[0]->pool, (void **)mbufs, count); >> } > > Yes something like that. Or call it rte_mbuf_raw_free_bulk to align with > what rte_mbuf_raw_free(). And maybe add some debug assertions to make sure > that mbuf is not cloned, indirect or has refcnt. > > The concern is that this optimization might put an mbuf in the pool > that has different properties than the normal free path. > And that all semantics of allocation/free should be in rte_mbuf code to > allow for future optimizations. "Morten Brørup" has submitted a patch to implement this function.[1] Currently, multiple drivers use rte_mempool_put_bulk to release mbufs. We can modify them later so that all drivers use rte_mbuf_fast_free_bulk to free mbufs. The current patch is a function problem, which has caused troubles to hns3 users. Can we merge it first? [1]https://inbox.dpdk.org/dev/20250114163951.125667-1...@smartsharesystems.com/ > > >> >> The driver uses rte_mempool_put_bulk only when the following conditions are >> met: >> 1. All mbufs comes from the same mempool >> 2. All mbufs have only one reference. >> 3. All mbufs have only one segment. >> So the rte_pktmbuf_fast_free_bulk function is just a wrapper around the >> rte_mempool_put_bulk function. >
Re: [PATCH v2] mbuf: add fast free bulk function
On 2025/1/15 0:39, Morten Brørup wrote: > mbuf: add fast free bulk function > > When putting an mbuf back into its mempool, there are certain requirements > to the mbuf. Specifically, some of its fields must be initialized. > > These requirements are in fact invariants about free mbufs, held in > mempools, and thus also apply when allocating an mbuf from a mempool. > With this in mind, the additional assertions in rte_mbuf_raw_free() were > moved to __rte_mbuf_raw_sanity_check(). > Furthermore, the assertion regarding pinned external buffer was enhanced; > it now also asserts that the referenced pinned external buffer has > refcnt == 1. > > The description of RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE was updated to > include the remaining requirements, which were missing here. > > And finally: > A new rte_mbuf_fast_free_bulk() inline function was added for the benefit > of ethdev drivers supporting fast release of mbufs. > It asserts these requirements and that the mbufs belong to the specified > mempool, and then calls rte_mempool_put_bulk(). > > Signed-off-by: Morten Brørup > --- > v2: > * Fixed missing inline. > --- > lib/ethdev/rte_ethdev.h | 6 -- > lib/mbuf/rte_mbuf.h | 39 +-- > 2 files changed, 41 insertions(+), 4 deletions(-) > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h > index 1f71cad244..e9267fca79 100644 > --- a/lib/ethdev/rte_ethdev.h > +++ b/lib/ethdev/rte_ethdev.h > @@ -1612,8 +1612,10 @@ struct rte_eth_conf { > #define RTE_ETH_TX_OFFLOAD_MULTI_SEGS RTE_BIT64(15) > /** > * Device supports optimization for fast release of mbufs. > - * When set application must guarantee that per-queue all mbufs comes from > - * the same mempool and has refcnt = 1. > + * When set application must guarantee that per-queue all mbufs come from > the same mempool, > + * are direct, have refcnt=1, next=NULL and nb_segs=1, as done by > rte_pktmbuf_prefree_seg(). > + * > + * @see rte_mbuf_fast_free_bulk() > */ > #define RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE RTE_BIT64(16) > #define RTE_ETH_TX_OFFLOAD_SECURITY RTE_BIT64(17) > diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h > index 0d2e0e64b3..7590d82689 100644 > --- a/lib/mbuf/rte_mbuf.h > +++ b/lib/mbuf/rte_mbuf.h > @@ -568,6 +568,10 @@ __rte_mbuf_raw_sanity_check(__rte_unused const struct > rte_mbuf *m) > RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1); > RTE_ASSERT(m->next == NULL); > RTE_ASSERT(m->nb_segs == 1); > + RTE_ASSERT(!RTE_MBUF_CLONED(m)); > + RTE_ASSERT(!RTE_MBUF_HAS_EXTBUF(m) || > + (RTE_MBUF_HAS_PINNED_EXTBUF(m) && > + rte_mbuf_ext_refcnt_read(m->shinfo) == 1)); > __rte_mbuf_sanity_check(m, 0); > } > > @@ -623,12 +627,43 @@ static inline struct rte_mbuf > *rte_mbuf_raw_alloc(struct rte_mempool *mp) > static __rte_always_inline void > rte_mbuf_raw_free(struct rte_mbuf *m) > { > - RTE_ASSERT(!RTE_MBUF_CLONED(m) && > - (!RTE_MBUF_HAS_EXTBUF(m) || RTE_MBUF_HAS_PINNED_EXTBUF(m))); > __rte_mbuf_raw_sanity_check(m); > rte_mempool_put(m->pool, m); > } > > +/** > + * Put a bulk of mbufs allocated from the same mempool back into the mempool. > + * > + * The caller must ensure that the mbufs come from the specified mempool, > + * are direct and properly reinitialized (refcnt=1, next=NULL, nb_segs=1), > as done by > + * rte_pktmbuf_prefree_seg(). > + * > + * This function should be used with care, when optimization is > + * required. For standard needs, prefer rte_pktmbuf_free_bulk(). > + * > + * @see RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE > + * > + * @param mp > + * The mempool to which the mbufs belong. > + * @param mbufs > + * Array of pointers to packet mbufs. > + * The array must not contain NULL pointers. > + * @param count > + * Array size. > + */ > +static __rte_always_inline void > +rte_mbuf_fast_free_bulk(struct rte_mempool *mp, struct rte_mbuf **mbufs, > unsigned int count) > +{ > + for (unsigned int idx = 0; idx < count; idx++) { > + const struct rte_mbuf *m = mbufs[idx]; > + RTE_ASSERT(m != NULL); > + RTE_ASSERT(m->pool == mp); > + __rte_mbuf_raw_sanity_check(m); > + } Is there some way to avoid executing a loop in non-debug mode? Like the following or other better way #ifdef RTE_LIBRTE_MBUF_DEBUG { for (unsigned int idx = 0; idx < count; idx++) { const struct rte_mbuf *m = mbufs[idx]; RTE_ASSERT(m != NULL); RTE_ASSERT(m->pool == mp); __rte_mbuf_raw_sanity_check(m); } } #endif > + > + rte_mempool_put_bulk(mp, (void **)mbufs, count); Can the mp be obtained from the mbuf? > +} > + > /** > * The packet mbuf constructor. > *
Re: [PATCH] examples/l3fwd: optimize packet prefetch
On 2025/1/8 21:42, Konstantin Ananyev wrote: > > >> >> The prefetch window depending on the hardware platform. The current prefetch >> policy may not be applicable to all platforms. In most cases, the number of >> packets received by Rx burst is small (64 is used in most performance >> reports). >> In L3fwd, the maximum value cannot exceed 512. Therefore, prefetching all >> packets before processing can achieve better performance. > > As you mentioned 'prefetch' behavior differs a lot from one HW platform to > another. > So it could easily be that changes you suggesting will cause performance > boost on one platform and degradation on another. > In fact, right now l3fwd 'prefetch' usage is a bit of mess: > - l3fwd_lpm_neon.h uses FWDSTEP as a prefetch window. > - l3fwd_fib.c uses FIB_PREFETCH_OFFSET for that purpose > - rest of the code uses either PREFETCH_OFFSET or doesn't use 'prefetch' at > all > > Probably what we need here is some unified approach: > configurable at run-time prefetch_window_size that all code-paths will obey. Agreed, I'll add a parameter to configure the prefetch window. > >> Signed-off-by: Dengdui Huang >> --- >> examples/l3fwd/l3fwd_lpm_neon.h | 42 - >> 1 file changed, 5 insertions(+), 37 deletions(-) >> >> diff --git a/examples/l3fwd/l3fwd_lpm_neon.h >> b/examples/l3fwd/l3fwd_lpm_neon.h >> index 3c1f827424..0b51782b8c 100644 >> --- a/examples/l3fwd/l3fwd_lpm_neon.h >> +++ b/examples/l3fwd/l3fwd_lpm_neon.h >> @@ -91,53 +91,21 @@ l3fwd_lpm_process_packets(int nb_rx, struct rte_mbuf >> **pkts_burst, >> const int32_t k = RTE_ALIGN_FLOOR(nb_rx, FWDSTEP); >> const int32_t m = nb_rx % FWDSTEP; >> >> -if (k) { >> -for (i = 0; i < FWDSTEP; i++) { >> -rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i], >> -void *)); >> -} >> -for (j = 0; j != k - FWDSTEP; j += FWDSTEP) { >> -for (i = 0; i < FWDSTEP; i++) { >> -rte_prefetch0(rte_pktmbuf_mtod( >> -pkts_burst[j + i + FWDSTEP], >> -void *)); >> -} >> +/* The number of packets is small. Prefetch all packets. */ >> +for (i = 0; i < nb_rx; i++) >> +rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i], void *)); >> >> +if (k) { >> +for (j = 0; j != k; j += FWDSTEP) { >> processx4_step1(&pkts_burst[j], &dip, &ipv4_flag); >> processx4_step2(qconf, dip, ipv4_flag, portid, >> &pkts_burst[j], &dst_port[j]); >> if (do_step3) >> processx4_step3(&pkts_burst[j], &dst_port[j]); >> } >> - >> -processx4_step1(&pkts_burst[j], &dip, &ipv4_flag); >> -processx4_step2(qconf, dip, ipv4_flag, portid, &pkts_burst[j], >> -&dst_port[j]); >> -if (do_step3) >> -processx4_step3(&pkts_burst[j], &dst_port[j]); >> - >> -j += FWDSTEP; >> } >> >> if (m) { >> -/* Prefetch last up to 3 packets one by one */ >> -switch (m) { >> -case 3: >> -rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j], >> -void *)); >> -j++; >> -/* fallthrough */ >> -case 2: >> -rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j], >> -void *)); >> -j++; >> -/* fallthrough */ >> -case 1: >> -rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j], >> -void *)); >> -j++; >> -} >> -j -= m; >> /* Classify last up to 3 packets one by one */ >> switch (m) { >> case 3: >> -- >> 2.33.0 >
Re: [PATCH] maintainers: update for hns3
Acked-by: Dengdui Huang On 2025/4/2 16:47, Jie Hai wrote: > I am moving on to other things and dengdui is going to > take over the role of hns3 maintainer. Update the > MAINTAINERS accordingly. > > Signed-off-by: Jie Hai > --- > MAINTAINERS | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 4b01103f8e83..bcc9fdaf0c92 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -765,7 +765,7 @@ F: doc/guides/nics/gve.rst > F: doc/guides/nics/features/gve.ini > > Hisilicon hns3 > -M: Jie Hai > +M: Dengdui Huang > F: drivers/net/hns3/ > F: doc/guides/nics/hns3.rst > F: doc/guides/nics/features/hns3.ini
Re: [PATCH v2] net: fix GTP packet parsing
Hi, Jiale, this patch can solve the issues[1] you reported. Can you review this patch and add a Tested-by tag? [1] https://bugs.dpdk.org/show_bug.cgi?id=1672 On 2025/4/17 20:37, Dengdui Huang wrote: > After parsing the GTP packet header, the next protocol type should > be converted from RTE_GTP_TYPE_IPV4/IPV6 to RTE_ETHER_TYPE_IPV4/IPV6. > Otherwise, the next protocol cannot be parsed. > > Bugzilla ID: 1672 > Fixes: 64ed7f854cf4 ("net: add tunnel packet type parsing") > Cc: sta...@dpdk.org > > Signed-off-by: Dengdui Huang > Acked-by: Jie Hai > --- > lib/net/rte_net.c | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/lib/net/rte_net.c b/lib/net/rte_net.c > index be24690fdf..1771588a09 100644 > --- a/lib/net/rte_net.c > +++ b/lib/net/rte_net.c > @@ -231,7 +231,13 @@ ptype_tunnel_with_udp(uint16_t *proto, const struct > rte_mbuf *m, >*/ > if (gh->msg_type == 0xff) { > ip_ver = *(const uint8_t *)((const char *)gh + gtp_len); > - *proto = (ip_ver) & 0xf0; > + ip_ver = (ip_ver) & 0xf0; > + if (ip_ver == RTE_GTP_TYPE_IPV4) > + *proto = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4); > + else if (ip_ver == RTE_GTP_TYPE_IPV6) > + *proto = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV6); > + else > + *proto = 0; > } else { > *proto = 0; > }
Re: [PATCH] examples/l3fwd: adjust Tx burst size based on Rx burst
Tested-by: Dengdui Huang The issue addressed by this patch is the same as the one fixed by the patch [1] pushed by Haijie. However, this patch is better, and I have already tested it on the Kunpeng platform. Please make the changes based on Konstantin's review comments and push the next version. [1] https://patchwork.dpdk.org/project/dpdk/patch/20241204020622.977156-1-haij...@huawei.com/ On 2025/2/12 12:54, Sivaprasad Tummala wrote: > Previously, the TX burst size was fixed at 256, leading to performance > degradation in certain scenarios. > > This patch introduces logic to set the TX burst size to match the > configured RX burst size (--burst option, default 32, max 512) > for better efficiency. > > Fixes: d5c4897ecfb2 ("examples/l3fwd: add option to set Rx burst size") > Cc: haij...@huawei.com > Cc: sta...@dpdk.org > > Signed-off-by: Sivaprasad Tummala > --- > examples/l3fwd/l3fwd.h| 6 +++--- > examples/l3fwd/l3fwd_common.h | 11 +++ > examples/l3fwd/main.c | 2 ++ > 3 files changed, 12 insertions(+), 7 deletions(-) > > diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h > index 0cce3406ee..9d7c73504b 100644 > --- a/examples/l3fwd/l3fwd.h > +++ b/examples/l3fwd/l3fwd.h > @@ -35,7 +35,7 @@ > /* > * Try to avoid TX buffering if we have at least MAX_TX_BURST packets to > send. > */ > -#define MAX_TX_BURST (MAX_PKT_BURST / 2) > +#define MAX_TX_BURST (DEFAULT_PKT_BURST / 2) > > #define NB_SOCKETS8 > > @@ -152,8 +152,8 @@ send_single_packet(struct lcore_conf *qconf, > len++; > > /* enough pkts to be sent */ > - if (unlikely(len == MAX_PKT_BURST)) { > - send_burst(qconf, MAX_PKT_BURST, port); > + if (unlikely(len == nb_pkt_per_burst)) { > + send_burst(qconf, nb_pkt_per_burst, port); > len = 0; > } > > diff --git a/examples/l3fwd/l3fwd_common.h b/examples/l3fwd/l3fwd_common.h > index d94e5f1357..6cb7de5144 100644 > --- a/examples/l3fwd/l3fwd_common.h > +++ b/examples/l3fwd/l3fwd_common.h > @@ -25,6 +25,9 @@ > */ > #define SENDM_PORT_OVERHEAD(x) (x) > > +extern uint32_t nb_pkt_per_burst; > +extern uint32_t max_tx_burst; > + > /* > * From http://www.rfc-editor.org/rfc/rfc1812.txt section 5.2.2: > * - The IP version number must be 4. > @@ -71,7 +74,7 @@ send_packetsx4(struct lcore_conf *qconf, uint16_t port, > struct rte_mbuf *m[], >* If TX buffer for that queue is empty, and we have enough packets, >* then send them straightway. >*/ > - if (num >= MAX_TX_BURST && len == 0) { > + if (num >= max_tx_burst && len == 0) { > n = rte_eth_tx_burst(port, qconf->tx_queue_id[port], m, num); > if (unlikely(n < num)) { > do { > @@ -86,7 +89,7 @@ send_packetsx4(struct lcore_conf *qconf, uint16_t port, > struct rte_mbuf *m[], >*/ > > n = len + num; > - n = (n > MAX_PKT_BURST) ? MAX_PKT_BURST - len : num; > + n = (n > nb_pkt_per_burst) ? nb_pkt_per_burst - len : num; > > j = 0; > switch (n % FWDSTEP) { > @@ -112,9 +115,9 @@ send_packetsx4(struct lcore_conf *qconf, uint16_t port, > struct rte_mbuf *m[], > len += n; > > /* enough pkts to be sent */ > - if (unlikely(len == MAX_PKT_BURST)) { > + if (unlikely(len == nb_pkt_per_burst)) { > > - send_burst(qconf, MAX_PKT_BURST, port); > + send_burst(qconf, nb_pkt_per_burst, port); > > /* copy rest of the packets into the TX buffer. */ > len = num - n; > diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c > index 994b7dd8e5..4cabd05be2 100644 > --- a/examples/l3fwd/main.c > +++ b/examples/l3fwd/main.c > @@ -59,6 +59,7 @@ uint16_t nb_rxd = RX_DESC_DEFAULT; > uint16_t nb_txd = TX_DESC_DEFAULT; > uint32_t nb_pkt_per_burst = DEFAULT_PKT_BURST; > uint32_t mb_mempool_cache_size = MEMPOOL_CACHE_SIZE; > +uint32_t max_tx_burst = MAX_TX_BURST; > > /**< Ports set in promiscuous mode off by default. */ > static int promiscuous_on; > @@ -733,6 +734,7 @@ parse_pkt_burst(const char *optarg) > return; > } > nb_pkt_per_burst = burst_size; > + max_tx_burst = burst_size / 2; > RTE_LOG(INFO, L3FWD, "Using PMD-provided burst value %d\n", burst_size); > } >
Re: [PATCH 0/2] fix the problem of dma-perf infinite loop
On 2025/3/21 15:49, David Marchand wrote: > On Fri, Mar 21, 2025 at 5:03 AM Dengdui Huang wrote: >> >> After CPU isolation is configured, an infinite loop occurs when >> dma-perf is executed using the default config file. >> >> This patchset fix it. >> >> Dengdui Huang (2): >> eal: fix wake a incorrect lcore >> app/dma-perf: fix infinite loop >> >> app/test-dma-perf/benchmark.c | 5 - >> lib/eal/unix/eal_unix_thread.c | 3 +++ >> lib/eal/windows/eal_thread.c | 3 +++ >> 3 files changed, 10 insertions(+), 1 deletion(-) > > We can make EAL launch API more resilient, but the dma-perf > application should not post jobs on non sense lcore id, in the first > place. > parse_lcore() should be validating that the requested lcore_id is valid. > > OK, I'll add a check to make sure dma-perf uses the correct lcore.
Re: [PATCH 2/2] app/dma-perf: fix infinite loop
On 2025/3/21 23:58, Stephen Hemminger wrote: > On Fri, 21 Mar 2025 12:03:16 +0800 > Dengdui Huang wrote: > >> When a core that is not used by the rte is specified in the config >> for testing, the problem of infinite loop occurs. The root cause >> is that the program waits for the completion of the test task when >> the test worker fails to be started on the lcore. This patch fix it. >> >> Fixes: 533d7e7f66f3 ("app/dma-perf: support config per device") >> Cc: sta...@dpdk.org >> >> Signed-off-by: Dengdui Huang >> --- >> app/test-dma-perf/benchmark.c | 5 - >> 1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/app/test-dma-perf/benchmark.c b/app/test-dma-perf/benchmark.c >> index 6d617ea200..351c1c966e 100644 >> --- a/app/test-dma-perf/benchmark.c >> +++ b/app/test-dma-perf/benchmark.c >> @@ -751,7 +751,10 @@ mem_copy_benchmark(struct test_configure *cfg) >> goto out; >> } >> >> -rte_eal_remote_launch(get_work_function(cfg), (void >> *)(lcores[i]), lcore_id); >> +if (rte_eal_remote_launch(get_work_function(cfg), (void >> *)(lcores[i]), lcore_id)) { >> +printf("Error: Fail to start the test on lcore %d\n", >> lcore_id); > > Convention is to log errors on stderr and lcore_id is unsigned not signed > value. OK, I'll fix it for the next version.
Re: [PATCH] mem: fix infinite loop
On 2025/4/3 5:16, Dmitry Kozlyuk wrote: > On 02.04.2025 15:42, Dengdui Huang wrote: >> When the process address space is insufficient, >> mmap will fail, which will cause an infinite loop. >> This pathc fix it. >> >> Fixes: c4b89ecb64ea ("eal: introduce memory management wrappers") >> Cc: sta...@dpdk.org >> >> Signed-off-by: Dengdui Huang > > Acked-by: Dmitry Kozlyuk > > Typo: "pathc fix" -> "patch fixes", or, better: > "This patch stops attempting mmap if it fails and the requested size cannot > be reduced." > > The bug is actually older, but it doesn't matter since both c4b89ecb64ea and > b7cc54187ea4 belong to all LTS: > > Fixes: b7cc54187ea4 ("mem: move virtual area function in common directory") > It would be better to write the commit log like this, and I'll modify it in v2.
Re: [PATCH v3 2/3] eal: fix unckeck pipe
On 2025/4/2 22:45, Morten Brørup wrote: >> From: Dengdui Huang [mailto:huangdeng...@huawei.com] >> Sent: Wednesday, 2 April 2025 14.25 >> >> The communication pipe may be unavailable, for example, >> when the lcore role is ROLE_OFF or ROLE_NON_EAL. >> So check whether the pipe is available before using it. >> >> Fixes: a95d70547c57 ("eal: factorize lcore main loop") >> Cc: sta...@dpdk.org >> >> Signed-off-by: Dengdui Huang >> --- >> lib/eal/unix/eal_unix_thread.c | 3 +++ >> lib/eal/windows/eal_thread.c | 3 +++ >> 2 files changed, 6 insertions(+) >> >> diff --git a/lib/eal/unix/eal_unix_thread.c >> b/lib/eal/unix/eal_unix_thread.c >> index ef6cbff0ee..103571cabf 100644 >> --- a/lib/eal/unix/eal_unix_thread.c >> +++ b/lib/eal/unix/eal_unix_thread.c >> @@ -17,6 +17,9 @@ eal_thread_wake_worker(unsigned int worker_id) >> char c = 0; >> int n; >> >> +if (m2w == 0 || w2m == 0) >> +return -EINVAL; >> + >> do { >> n = write(m2w, &c, 1); >> } while (n == 0 || (n < 0 && errno == EINTR)); >> diff --git a/lib/eal/windows/eal_thread.c >> b/lib/eal/windows/eal_thread.c >> index 9e3df200b9..82bb974868 100644 >> --- a/lib/eal/windows/eal_thread.c >> +++ b/lib/eal/windows/eal_thread.c >> @@ -24,6 +24,9 @@ eal_thread_wake_worker(unsigned int worker_id) >> char c = 0; >> int n; >> >> +if (m2w == 0 || w2m == 0) >> +return -EINVAL; >> + >> do { >> n = _write(m2w, &c, 1); >> } while (n == 0 || (n < 0 && errno == EINTR)); >> -- >> 2.33.0 > > This internal function eal_thread_wake_worker() is only called from > rte_eal_remote_launch(). > It would be better to check the lcore role there. > Is it better to check lcore roles in rte_eal_remote_launch() instead? Only Role_RTE and Role_SERVICE lcore can be woken up through rte_eal_remote_launch(). > References: > https://elixir.bootlin.com/dpdk/v25.03-rc2/A/ident/eal_thread_wake_worker > https://elixir.bootlin.com/dpdk/v25.03-rc2/source/lib/eal/common/eal_common_launch.c#L52 > > Furthermore, in theory, file descriptor 0 is a valid file descriptor; if > "stdin" has been closed, a newly opened file (or pipe) may be assigned file > descriptor 0. > >
Re: [PATCH v2 1/2] eal: fix uncheck worker ID
On 2025/3/27 17:32, Morten Brørup wrote: >> From: Dengdui Huang [mailto:huangdeng...@huawei.com] >> Sent: Thursday, 27 March 2025 10.01 >> >> The worker_id may come from user input. >> So it is necessary to verify it. >> >> Fixes: a95d70547c57 ("eal: factorize lcore main loop") >> Cc: sta...@dpdk.org >> >> Signed-off-by: Dengdui Huang >> --- >> lib/eal/common/eal_common_launch.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/lib/eal/common/eal_common_launch.c >> b/lib/eal/common/eal_common_launch.c >> index 5320c3bd3c..76313d5cdf 100644 >> --- a/lib/eal/common/eal_common_launch.c >> +++ b/lib/eal/common/eal_common_launch.c >> @@ -35,6 +35,9 @@ rte_eal_remote_launch(lcore_function_t *f, void *arg, >> unsigned int worker_id) >> { >> int rc = -EBUSY; >> > > Also check the lcore_id: > + if (unlikely(lcore_id >= RTE_MAX_LCORE)) > + return -EINVAL; > + > >> +if (!rte_lcore_has_role(worker_id, ROLE_RTE)) >> +return -EINVAL; >> + > > Although rte_lcore_has_role() checks the lcore_id and returns -EINVAL if > invalid, "if (!-EINVAL))" will not take the branch to return -EINVAL here. So > this check alone does not suffice. You could change it to "if > (rte_lcore_has_role(worker_id, ROLE_RTE) <= 0) return -EINVAL;". > However, I prefer the separate lcore_id check, as mentioned above, rather > than changing this check. > >> /* Check if the worker is in 'WAIT' state. Use acquire order >> * since 'state' variable is used as the guard variable. >> */ >> -- >> 2.33.0 > > > While you are at it, please also add "if (unlikely(lcore_id >= > RTE_MAX_LCORE)) return -EINVAL;" checks in the other functions in > eal_common_lcore.c where lcore_id comes from user input and the check is > missing. > > > I'll modify it in v3 based on your suggestions above.
Re: [PATCH v2 1/2] net: fix offset calculation for GENEVE packet
I have also found some bugs about the tunnel packet type parsing API[1], including one of the problems fixed by this patch. [1] https://patchwork.dpdk.org/project/dpdk/list/?series=35233 On 2025/5/21 13:11, sk...@marvell.com wrote: > From: Sunil Kumar Kori > > While parsing packet headers, offset must be added to get next > header but for geneve header parsing offset is overwritten. > Also inner_l2_len is not set in case of geneve packets. > > Fixes: 64ed7f854cf4 ("net: add tunnel packet type parsing") > > Signed-off-by: Sunil Kumar Kori > --- > lib/net/rte_net.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/lib/net/rte_net.c b/lib/net/rte_net.c > index be24690fdf..1264f33d61 100644 > --- a/lib/net/rte_net.c > +++ b/lib/net/rte_net.c > @@ -251,7 +251,8 @@ ptype_tunnel_with_udp(uint16_t *proto, const struct > rte_mbuf *m, > if (unlikely(gnh == NULL)) > return 0; > geneve_len = sizeof(*gnh) + gnh->opt_len * 4; > - *off = geneve_len; > + *off += geneve_len; > + hdr_lens->inner_l2_len = sizeof(struct rte_udp_hdr) + > geneve_len; The l2_len in mbuf is also calculated from the outer L4 header, so it is calculated in the same way here. However, this is easy to be misunderstood, can we add a note to inner_l2_len as follows? struct rte_net_hdr_lens { uint8_t l2_len; /* Outer_L4_len + ... + inner L2_len for tunneling pkt. */ uint8_t inner_l2_len; uint16_t l3_len; uint16_t inner_l3_len; uint16_t tunnel_len; uint8_t l4_len; uint8_t inner_l4_len; }; > *proto = gnh->proto; > if (gnh->proto == 0) > *proto = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4);
Re: [PATCH v2 1/2] net: fix offset calculation for GENEVE packet
Acked-by: Dengdui Huang On 2025/5/21 13:11, sk...@marvell.com wrote: > From: Sunil Kumar Kori > > While parsing packet headers, offset must be added to get next > header but for geneve header parsing offset is overwritten. > Also inner_l2_len is not set in case of geneve packets. > > Fixes: 64ed7f854cf4 ("net: add tunnel packet type parsing") > > Signed-off-by: Sunil Kumar Kori > --- > lib/net/rte_net.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/lib/net/rte_net.c b/lib/net/rte_net.c > index be24690fdf..1264f33d61 100644 > --- a/lib/net/rte_net.c > +++ b/lib/net/rte_net.c > @@ -251,7 +251,8 @@ ptype_tunnel_with_udp(uint16_t *proto, const struct > rte_mbuf *m, > if (unlikely(gnh == NULL)) > return 0; > geneve_len = sizeof(*gnh) + gnh->opt_len * 4; > - *off = geneve_len; > + *off += geneve_len; > + hdr_lens->inner_l2_len = sizeof(struct rte_udp_hdr) + > geneve_len; > *proto = gnh->proto; > if (gnh->proto == 0) > *proto = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4);
Re: [PATCH v4 1/3] net: fix GTP packet parsing
On 2025/5/27 4:53, Stephen Hemminger wrote: > On Fri, 23 May 2025 10:55:55 +0800 > Dengdui Huang wrote: > >> After parsing the GTP packet header, the next protocol type should >> be converted from RTE_GTP_TYPE_IPV4/IPV6 to RTE_ETHER_TYPE_IPV4/IPV6. >> Otherwise, the next protocol cannot be parsed. >> >> Bugzilla ID: 1672 >> Fixes: 64ed7f854cf4 ("net: add tunnel packet type parsing") >> Cc: sta...@dpdk.org >> >> Signed-off-by: Dengdui Huang >> Acked-by: Jie Hai > > Looks good but there are failures in CI test of hardware L4 checksum. > Please fix (or rerun the test). > OK, thank you for your review. I should not push these three patches together because they address different issues. I will push this patch separately, and the other two patches will be pushed as a new patchset.
Re: [PATCH 0/6] net/hns3: VF support multi-TCs
Hi, maintainers, Kindly ping for reviews. The CI compilation errors have been checked and are not introduced by this patch. We hope to merge these patches into the 25.07 version. Thanks. On 2025/6/11 16:18, Dengdui Huang wrote: > This patchset adds the VF multi-TCs feature. > > Chengwen Feng (6): > net/hns3: fix VF fail to config queue TC > net/hns3: remove duplicate struct field > net/hns3: refactor DCB module code > net/hns3: VF support parse max TC number > net/hns3: VF support discover multi-TCs capability > net/hns3: VF support multi-TCs configure > > drivers/net/hns3/hns3_cmd.c | 5 +- > drivers/net/hns3/hns3_cmd.h | 10 + > drivers/net/hns3/hns3_dcb.c | 161 > drivers/net/hns3/hns3_dcb.h | 4 + > drivers/net/hns3/hns3_dump.c | 11 +- > drivers/net/hns3/hns3_ethdev.c| 132 ++--- > drivers/net/hns3/hns3_ethdev.h| 12 +- > drivers/net/hns3/hns3_ethdev_vf.c | 297 -- > drivers/net/hns3/hns3_mbx.h | 50 - > drivers/net/hns3/hns3_rss.c | 8 +- > drivers/net/hns3/hns3_rxtx.c | 26 ++- > drivers/net/hns3/hns3_tm.c| 6 +- > 12 files changed, 539 insertions(+), 183 deletions(-) >
Re: [PATCH 1/3] net/hns3: fix the hardware GRO function is abnormal
On 2025/6/10 1:13, Stephen Hemminger wrote: > On Mon, 9 Jun 2025 21:06:49 +0800 > Dengdui Huang wrote: > >> +rem = dma_addr & (HNS3_RX_DMA_ADDR_ALIGN_128 - 1); >> +if ((rx_offload & RTE_ETH_RX_OFFLOAD_TCP_LRO) && rem > 0) { >> +hns3_err(hw, "Hardware GRO is not supported when mbuf DMA " >> + "address is 64-byte aligned"); >> +return -EINVAL; > > Best to not break messages across source lines, it makes it harder for users > and tools looking for error messages in the source. > > I.e do: > hns3_err(hw, >"Hardware GRO is not supported for if mbuf DMA not > 64-byte aligned"); >" > I can fix during merge if that is ok? or you can resubmit I made a mistake, can you help me fix it when merging? Thanks
Re: [PATCH 3/3] net/hns3: fix can't use vector for Rx when set VLAN filter
On 2025/6/10 1:12, Stephen Hemminger wrote: > On Mon, 9 Jun 2025 21:06:51 +0800 > Dengdui Huang wrote: > >> Currently, When RTE_ETH_RX_OFFLOAD_VLAN_FILTER offload is set, >> driver wouldn't select Rx vector algorithm. Actually, this >> algorithm support it, so open it. >> >> Fixes: a3d4f4d291d7 ("net/hns3: support NEON Rx") >> Cc: sta...@dpdk.org >> >> Signed-off-by: Dengdui Huang > > Mind if I reword commit message slightly. > Use past tense for case before the patch was applied (old) > and present for case when patch is applied. > > When the RTE_ETH_RX_OFFLOAD_VLAN_FILTER offload flag was set, > the driver would not select the Rx vector algorithm. > But this flag is OK when using vector algorithm so remove > it from the mask. > > Okay, I don't mind. This new description is easier to understand.
Re: [PATCH v2] examples/l3fwd: adjust Tx burst size based on Rx burst
On 2025/6/9 23:21, Stephen Hemminger wrote: > On Mon, 9 Jun 2025 09:58:27 + > Sivaprasad Tummala wrote: > >> Previously, the TX burst size was fixed at 256, leading to performance >> degradation in certain scenarios. >> >> This patch introduces logic to set the TX burst size to match the >> configured RX burst size (--burst option, default 32, max 512) >> for better efficiency. >> >> Fixes: d5c4897ecfb2 ("examples/l3fwd: add option to set Rx burst size") >> Cc: haij...@huawei.com >> Cc: sta...@dpdk.org >> >> Signed-off-by: Sivaprasad Tummala >> Tested-by: Venkat Kumar Ande >> Tested-by: Dengdui Huang > > What driver? Why not fix the driver. > If RX burst is small, there should be no way to get TX burst larger > than that to happen. If the Tx burst is too large, a number of mbufs will be temporarily stored in l3fwd's mbuf_table in a short period of time. This leads to a decrease in the hit rate of the mempool cache, resulting in a drop in performance.
Does rte_net_get_ptype() support processing packets with two 0x8100 VLAN tags?
Hi everyone, The current rte_net_get_ptype() only supports parsing packets with a single 0x8100 VLAN tag and two 0x88a8 VLAN tags, but does not support processing packets with two 0x8100 VLAN tags. Some network cards (e.g., hns3) do support parsing packets with two 0x8100 VLAN tags. Can this API be extended to support such packets? I understand that two 0x8100 VLAN tag packets aren't commonly used, so I'm uncertain whether adding support is necessary. Anyone is welcome to discuss this and share their opinions. Thanks, /Dengdui
Re: Does rte_net_get_ptype() support processing packets with two 0x8100 VLAN tags?
On 2025/7/2 13:56, Morten Brørup wrote: >> From: huangdengdui [mailto:huangdeng...@huawei.com] >> Sent: Wednesday, 2 July 2025 04.38 >> >> Hi everyone, >> >> The current rte_net_get_ptype() only supports parsing packets with a >> single 0x8100 VLAN tag and two 0x88a8 VLAN tags, >> but does not support processing packets with two 0x8100 VLAN tags. >> >> Some network cards (e.g., hns3) do support parsing packets with two >> 0x8100 VLAN tags. >> Can this API be extended to support such packets? > > It already does; the layers parameter must have RTE_PTYPE_INNER_L2_MASK set. Yes, by setting RTE_PTYPE_INNER_L2_MASK, packets with two 0x8100 VLAN tags can be parsed, but they will be recognized as tunnel packets even though they are not actually tunnel packets. The current branch that parses the 0x8100 VLAN tag only handles a single VLAN tag[1][2]. Should we extend it to support parsing two or more 0x8100 VLAN tags? [1]https://elixir.bootlin.com/dpdk/v25.07-rc2/source/lib/net/rte_net.c#L348 [2]https://elixir.bootlin.com/dpdk/v25.07-rc2/source/lib/net/rte_net.c#L500 > >> >> I understand that two 0x8100 VLAN tag packets aren't commonly used, so >> I'm uncertain whether adding support is necessary. > > AFAIR, it's called "VLAN Stacking". > Stacking two 0x8100 VLAN tags is commonly used. > E.g. the majority of consumer fiber internet connections in Denmark use VLAN > Stacking (not QinQ) at the network owners' Points-of-Interconnect for the > service providers to identify individual homes. Oh, I'm not very familiar with this. If stacking two 0x8100 VLAN tags is a commonly used, it would make sense for rte_net_get_ptype() to support parsing such packets. > >> >> Anyone is welcome to discuss this and share their opinions. >> >> Thanks, >> /Dengdui
Re: [PATCH 6/6] net/hns3: VF support multi-TCs configure
On 2025/6/30 1:40, Stephen Hemminger wrote: > On Wed, 11 Jun 2025 16:19:00 +0800 > Dengdui Huang wrote: > >> +#pragma pack(1) >> +#define HNS3_MBX_PRIO_SHIFT 4 >> +#define HNS3_MBX_PRIO_MASK 0xFu >> +struct hns3_mbx_tc_config { >> +/* >> + * Each four bits correspond to one priority's TC. >> + * Bit0-3 correspond to priority-0's TC, bit4-7 correspond to >> + * priority-1's TC, and so on. >> + */ >> +uint32_t prio_tc_map; >> +uint8_t tc_dwrr[HNS3_MAX_TC_NUM]; >> +uint8_t num_tc; >> +/* >> + * Each bit correspond to one TC's scheduling mode, 0 means SP >> + * scheduling mode, 1 means DWRR scheduling mode. >> + * Bit0 corresponds to TC0, bit1 corresponds to TC1, and so on. >> + */ >> +uint8_t tc_sch_mode; >> }; >> +#pragma pack() >> > > DPDK has portable macros for packing __rte_packed_begin and __rte_packed_end. > Please change to using those macros. > Then rebase, retest and resubmit this patcheset Okay, I have modified and pushed the new version.