net/bnxt: wrong link status when lsc_intr is used
: dev->data->dev_conf.intr_conf=1 bnxt_mq_rx_configure(): pools = 1 nb_q_per_grp = 2 bnxt_mq_rx_configure(): rxq[0] = 0x106200280 vnic[0] = 0x100200080 bnxt_mq_rx_configure(): rxq[1] = 0x105f10e40 vnic[0] = 0x100200080 bnxt_setup_one_vnic(): vnic[0] = 0x100200080 vnic->fw_grp_ids = 0x105f07e00 bnxt_hwrm_vnic_alloc(): Alloc VNIC. Start 0, End 2 bnxt_hwrm_vnic_alloc(): VNIC ID 3 bnxt_setup_one_vnic(): rxq[0]->vnic=0x100200080 vnic->fw_grp_ids=0x105f07e00 bnxt_setup_one_vnic(): rxq[1]->vnic=0x100200080 vnic->fw_grp_ids=0x105f07e00 bnxt_setup_one_vnic(): vnic->rx_queue_cnt = 2 bnxt_hwrm_port_phy_qcfg(): Link Speed:0,Auto:4:64:140,Support:140,Force:0 bnxt_hwrm_port_phy_qcfg(): Link Signal:0,PAM::Auto:0,Support:0,Force:0 bnxt_hwrm_port_phy_qcfg(): Link Speed:0,Auto:4:64:140,Support:140,Force:0 bnxt_hwrm_port_phy_qcfg(): Link Signal:0,PAM::Auto:0,Support:0,Force:0 bnxt_ulp_port_init(): Skip ulp init for port: 1, TF is not enabled bnxt_receive_function(): Using SSE vector mode receive for port 1 bnxt_transmit_function(): Using SSE vector mode transmit for port 1 Port 1: 00:0A:F7:B6:E3:D1 Here, we can see that lsc interrupts are enabled even though we specified not to enable them. Then given autoneg does not work on my nic, I can try setting the link up and showing port info: testpmd> set link-up port 0 bnxt_print_link_info(): Port 0 Link Up - speed 0 Mbps - half-duplex testpmd> show port info 0 * Infos for port 0 * MAC address: 00:0A:F7:B6:E3:D0 Device name: :02:00.0 Driver name: net_bnxt Firmware-version: 223.0.161.0 Devargs: Connect to socket: 0 memory allocation on the socket: 0 Link status: up Link speed: None Link duplex: half-duplex Autoneg status: Off MTU: 1500 Promiscuous mode: enabled Allmulticast mode: disabled Maximum number of MAC addresses: 128 Maximum number of MAC addresses of hash filtering: 0 VLAN offload: strip off, filter off, extend off, qinq strip off Hash key size in bytes: 40 Redirection table size: 128 Supported RSS offload flow types: ipv4 ipv4-tcp ipv4-udp ipv6 ipv6-tcp ipv6-udp user-defined-50 user-defined-51 Minimum size of RX buffer: 1 Maximum configurable length of RX packet: 9600 Maximum configurable size of LRO aggregated packet: 0 Maximum number of VMDq pools: 64 Current number of RX queues: 2 Max possible RX queues: 117 Max possible number of RXDs per queue: 8192 Min possible number of RXDs per queue: 16 RXDs number alignment: 1 Current number of TX queues: 2 Max possible TX queues: 117 Max possible number of TXDs per queue: 4096 Min possible number of TXDs per queue: 16 TXDs number alignment: 1 Max segment number per packet: 65535 Max segment number per MTU/TSO: 65535 Device capabilities: 0x3( RUNTIME_RX_QUEUE_SETUP RUNTIME_TX_QUEUE_SETUP ) Switch name: :02:00.0 Switch domain Id: 0 Switch Port Id: 32768 Device error handling mode: proactive This shows link status is seen as up, even although link speed is None. I was wondering if patching the code to move this line which sets lsc interrupt on somewhere else might be reasonable, or if this could cause further trouble. Maybe having a parameter to trigger it ON/OFF might be a good addition. May I have your opinion on this matter? Sincerely, Edwin Brossette
[PATCH] bnxt: fix unwanted interrupt config on link state change
From: Edwin Brossette

When getting the device's info via bnxt_dev_info_get_op(), the device
enables interrupts on link state changes because of the following line:

> eth_dev->data->dev_conf.intr_conf.lsc = 1;

Enabling this mode might not be wanted by the user. The flag
RTE_ETH_DEV_INTR_LSC can be used to inform the application above the PMD
that lsc interrupts are supported. By checking this flag, the user can
decide whether or not to enable these interrupts.

Since there is no reason for a function meant to display the config to
actually modify it, remove this line. In addition, raise the dev_flag
associated with this state at the device's initialization to show the
device supports link state change interrupts.

Fixes: 7bc8e9a227cc ("net/bnxt: support async link notification")

Signed-off-by: Edwin Brossette
---
 drivers/net/bnxt/bnxt_ethdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index b3de490d3667..753e86b4b2af 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -1017,7 +1017,6 @@ static int bnxt_dev_info_get_op(struct rte_eth_dev *eth_dev,
 		.tx_free_thresh = 32,
 		.tx_rs_thresh = 32,
 	};
-	eth_dev->data->dev_conf.intr_conf.lsc = 1;
 
 	dev_info->rx_desc_lim.nb_min = BNXT_MIN_RING_DESC;
 	dev_info->rx_desc_lim.nb_max = BNXT_MAX_RX_RING_DESC;
@@ -5859,6 +5858,7 @@ bnxt_dev_init(struct rte_eth_dev *eth_dev, void *params __rte_unused)
 	rte_eth_copy_pci_info(eth_dev, pci_dev);
 	eth_dev->data->dev_flags |= RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
+	eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC;
 
 	bp = eth_dev->data->dev_private;
--
2.35.0.4.g44a5d4affccf
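A minimal sketch of the application-side check described in the patch above: enable lsc interrupts only when the PMD advertises them through RTE_ETH_DEV_INTR_LSC. The helper name is ours and error handling is trimmed; it is an illustration, not part of the patch.

#include <rte_ethdev.h>

/* Enable link state change interrupts only when the PMD advertises
 * support for them through RTE_ETH_DEV_INTR_LSC; otherwise fall back
 * to polling the link state.
 */
static int
configure_port_lsc(uint16_t port_id, struct rte_eth_conf *conf,
		   uint16_t nb_rxq, uint16_t nb_txq)
{
	struct rte_eth_dev_info dev_info;
	int ret;

	ret = rte_eth_dev_info_get(port_id, &dev_info);
	if (ret != 0)
		return ret;

	if (*dev_info.dev_flags & RTE_ETH_DEV_INTR_LSC)
		conf->intr_conf.lsc = 1; /* lsc interrupts supported */
	else
		conf->intr_conf.lsc = 0; /* poll the link state instead */

	return rte_eth_dev_configure(port_id, nb_rxq, nb_txq, conf);
}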
Re: net/bnxt: wrong link status when lsc_intr is used
gt; bnxt_hwrm_port_phy_qcfg(): Link Signal:0,PAM::Auto:0,Support:0,Force:0 > bnxt_hwrm_port_phy_qcfg(): Link Speed:0,Auto:4:64:140,Support:140,Force:0 > bnxt_hwrm_port_phy_qcfg(): Link Signal:0,PAM::Auto:0,Support:0,Force:0 > [...] > bnxt_hwrm_port_phy_qcfg(): Link Speed:0,Auto:4:64:140,Support:140,Force:0 > bnxt_hwrm_port_phy_qcfg(): Link Signal:0,PAM::Auto:0,Support:0,Force:0 > Port 0 Link down > bnxt_hwrm_port_phy_qcfg(): Link Speed:0,Auto:4:64:40,Support:140,Force:0 > bnxt_hwrm_port_phy_qcfg(): Link Signal:0,PAM::Auto:0,Support:0,Force:0 > Port 1 Link down > Done > testpmd> show port info 0 > bnxt_hwrm_port_phy_qcfg(): Link Speed:0,Auto:4:64:140,Support:140,Force:0 > bnxt_hwrm_port_phy_qcfg(): Link Signal:0,PAM::Auto:0,Support:0,Force:0 > > * Infos for port 0 * > MAC address: 00:0A:F7:B6:E3:D0 > Device name: :02:00.0 > Driver name: net_bnxt > Firmware-version: 223.0.161.0 > Devargs: > Connect to socket: 0 > memory allocation on the socket: 0 > Link status: down<== > Link speed: None > Link duplex: half-duplex > Autoneg status: Off > MTU: 1500 > Promiscuous mode: enabled > Allmulticast mode: disabled > Maximum number of MAC addresses: 128 > Maximum number of MAC addresses of hash filtering: 0 > VLAN offload: > strip off, filter off, extend off, qinq strip off > Hash key size in bytes: 40 > Redirection table size: 128 > Supported RSS offload flow types: > ipv4 ipv4-tcp ipv4-udp ipv6 ipv6-tcp ipv6-udp > user-defined-50 user-defined-51 > Minimum size of RX buffer: 1 > Maximum configurable length of RX packet: 9600 > Maximum configurable size of LRO aggregated packet: 0 > Maximum number of VMDq pools: 64 > Current number of RX queues: 2 > Max possible RX queues: 117 > Max possible number of RXDs per queue: 8192 > Min possible number of RXDs per queue: 16 > RXDs number alignment: 1 > Current number of TX queues: 2 > Max possible TX queues: 117 > Max possible number of TXDs per queue: 4096 > Min possible number of TXDs per queue: 16 > TXDs number alignment: 1 > Max segment number per packet: 65535 > Max segment number per MTU/TSO: 65535 > Device capabilities: 0x3( RUNTIME_RX_QUEUE_SETUP RUNTIME_TX_QUEUE_SETUP ) > Switch name: :02:00.0 > Switch domain Id: 0 > Switch Port Id: 32768 > Device error handling mode: proactive > testpmd> > Here you can see the link status appears down if lsc interrupts are turned off, which is the expected result. Sincerely, Edwin Brossette. On Tue, Jan 31, 2023 at 6:08 AM Somnath Kotur wrote: > On Thu, Jan 19, 2023 at 7:07 PM Edwin Brossette > wrote: > > > > Hello, > > > Hi Edwin, > Thanks for reaching out, here's my attempt at answering your questions > > I am trying to operate a Broadcom BCM57414 2x10G nic using dpdk bnxt > pmd. I use DPDK 22.11. > > However, doing so I stumbled over a number of different issues. Mainly, > using the dpdk rte_eth library, I don't seem to be able to correctly poll > the link status: I expect my nic has a problem using autoneg to set the > link speed/duplex, because these parameters remain unknown while autoneg is > on. However, after trying to set link up, instead of showing the link state > as down, I see the link being up, which is in truth not the case, as no > packets can transit on the line and the switch at the other end sees it > down. 
> > > > When searching around and trying to debug the code, I found the function > bnxt_dev_info_get_op() sets my nic in interrupt mode: > > > > > eth_dev->data->dev_conf.intr_conf.lsc = 1; > This was added long back in the driver code by this commit (as you can > see 6+ yrs old :)) , so I believe at the time the intent was to get > this link notification asynchronous > and 'lsc' I believe was for link state change? Please suggest an > alternative place to do this or you may even post the patch yourself > to help do this the right way > commit 7bc8e9a227ccbc649c2d230f0ee92ee58b1cb955 > Author: Ajit Khaparde > Date: Tue Oct 11 16:47:50 2016 -0500 > > net/bnxt: support async link notification > > > > > Which is a bit of a surprising thing to do for a function meant to get > info. > > Thus my card ends up working in a mode I didn't configure it to, which > may be the cause of my issue: later when setting the link up in function > bnxt_dev_set_link_up_op(): > > > > > if (!bp->link_info->link_up) > > > rc = bnxt_set_hwrm_link_config(bp, true); > > > if (!rc) > > > eth_dev->data->dev_link.link_status = 1; > > > > So link_status in eth_dev g
[ixgbevf] Problem with RSS initial config after device init on X550 nic
Hello, We recently ran into an issue with our product when working with an X550 nic with stable dpdk-23.11. We observed that all the incoming traffic was directed only into a single queue. The issue became evident after displaying the RSS reta which was fully zeroed after device init, thus directing all traffic to rxq0. Moreover, RSS hash key did not seem to be correctly initialized. Manually setting the reta afterwards was enough to balance the incoming traffic between our queues, which convinced me that the issue here was simply a matter of correctly initializing the device on port start. Looking into the pmd's code, I couldn't see any RSS configuration done vf side at device startup, at least not in ixgbevf_dev_rx_init(). I've seen ixgbe_dev_mq_rx_configure() was called during the pf's init and configured RSS to be handled by vf if sriov was on, but this isn't enough to fully configure RSS for the vf. (see code here: https://git.dpdk.org/dpdk/tree/drivers/net/ixgbe/ixgbe_rxtx.c#n4644 ) I have also observed that all different models of nics using ixgbe did not handle RSS in the same way. For example, for nics of the 82599 series, it is written in their datasheet that, for IOV mode: "— Note that RSS is not supported in IOV mode since there is only a single RSS hash function in the hardware." On the contrary, x550 nics have special registers to handle RSS in VF, like VFRSSRK and VFRETA, for example. I believe the RSS config not being initialized for X550 nics might come from a slight misunderstanding on this part. Therefore, I can suggest a patch to add a call to ixgbe_rss_configure() somewhere in ixgbevf_dev_rx_init() specifically for this model of nic. Despite this function being named ixgbe_xxx instead of ixgbevf_xxx, it will do correct initialization for RSS in vf mode because all functions to get RSS-related registers such as ixgbe_reta_reg_get() or ixgbe_reta_size_get() will check if the device is in vf or pf mode and fetch the appropriate registers. Here is a way to reproduce, on an X550 card: Here are the nics I am using: :08:00.0 ntfp1 ac:1f:6b:57:57:74 ixgbe1x2.5 GT/s PCIe 1x2.5 GT/s PCIe Intel Corporation Ethernet Connection X553 10 GbE SFP+ :08:00.1 ntfp2 ac:1f:6b:57:57:75 ixgbe1x2.5 GT/s PCIe 1x2.5 GT/s PCIe Intel Corporation Ethernet Connection X553 10 GbE SFP+ :08:10.0 eth0d2:c4:fc:c5:c3:05 ixgbevf 1x2.5 GT/s PCIe 0xUnknownIntel Corporation X553 Virtual Function :08:10.2 eth1e2:a8:68:09:20:29 ixgbevf 1x2.5 GT/s PCIe 0xUnknownIntel Corporation X553 Virtual Function 1) Starting up dpdk-testpmd: sudo dpdk-hugepages.py --setup 2G; dpdk-devbind --bind=vfio-pci :08:10.0 dpdk-devbind --bind=vfio-pci :08:10.2 dpdk-testpmd -a :08:10.0 -a :08:10.2 -- -i --rxq=2 --txq=2 --coremask=0xff0 --total-num-mbufs=25 EAL: Detected CPU lcores: 12 EAL: Detected NUMA nodes: 1 EAL: Detected static linkage of DPDK EAL: Multi-process socket /var/run/dpdk/rte/mp_socket EAL: Selected IOVA mode 'VA' EAL: VFIO support initialized EAL: Using IOMMU type 1 (Type 1) EAL: Probe PCI driver: net_ixgbe_vf (8086:15c5) device: :08:10.0 (socket -1) EAL: Probe PCI driver: net_ixgbe_vf (8086:15c5) device: :08:10.2 (socket -1) Interactive-mode selected previous number of forwarding cores 1 - changed to number of configured cores 8 Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa. 
testpmd: create a new mbuf pool : n=25, size=2176, socket=0 testpmd: preferred mempool ops selected: ring_mp_mc Configuring Port 0 (socket 0) Port 0: 02:09:C0:9E:09:75 Configuring Port 1 (socket 0) Port 1: 02:09:C0:76:6D:4B Checking link statuses... Done testpmd> 2) Display port info: testpmd> show port info 0 * Infos for port 0 * MAC address: 02:09:C0:B8:68:2F Device name: :08:10.0 Driver name: net_ixgbe_vf Firmware-version: not available Devargs: Connect to socket: 0 memory allocation on the socket: 0 Link status: down Link speed: None Link duplex: half-duplex Autoneg status: On MTU: 1500 Promiscuous mode: disabled Allmulticast mode: disabled Maximum number of MAC addresses: 128 Maximum number of MAC addresses of hash filtering: 4096 VLAN offload: strip off, filter off, extend off, qinq strip off Hash key size in bytes: 40 Redirection table size: 64 Supported RSS offload flow types: ipv4 ipv4-tcp ipv4-udp ipv6 ipv6-tcp ipv6-udp ipv6-ex ipv6-tcp-ex ipv6-udp-ex Minimum size of RX buffer: 1024 Maximum configurable length of RX packet: 9728 Maximum configurable size of LRO aggregated packet: 0 Maximum number of VMDq pools: 64 Current number of RX queues: 2 Max possible RX queues: 4 Max possible number of RXDs per queue: 4096 Min possible number of RXDs per queue: 32 RXDs number alignment: 8 Current number of TX queues: 2 Max possible TX queues: 4 Max possible number of TXDs per queue: 4096 Min possible number of
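As a side note on the workaround mentioned in the report above, manually programming the redirection table from the application after port start could be sketched as follows. The helper name is ours, the RETA size is read from the device info, and error handling is minimal; it only illustrates the workaround, it does not replace a proper fix in the PMD.

#include <string.h>
#include <rte_ethdev.h>

/* Spread incoming traffic round-robin over the configured Rx queues by
 * programming the RSS redirection table, until the VF initializes RSS
 * by itself.
 */
static int
workaround_program_reta(uint16_t port_id, uint16_t nb_rxq)
{
	struct rte_eth_rss_reta_entry64 reta_conf[RTE_ETH_RSS_RETA_SIZE_512 /
						  RTE_ETH_RETA_GROUP_SIZE];
	struct rte_eth_dev_info dev_info;
	uint16_t i;
	int ret;

	ret = rte_eth_dev_info_get(port_id, &dev_info);
	if (ret != 0 || nb_rxq == 0 || dev_info.reta_size == 0 ||
	    dev_info.reta_size > RTE_ETH_RSS_RETA_SIZE_512)
		return -1;

	memset(reta_conf, 0, sizeof(reta_conf));
	for (i = 0; i < dev_info.reta_size; i++) {
		uint16_t idx = i / RTE_ETH_RETA_GROUP_SIZE;
		uint16_t shift = i % RTE_ETH_RETA_GROUP_SIZE;

		reta_conf[idx].mask |= UINT64_C(1) << shift;
		reta_conf[idx].reta[shift] = i % nb_rxq;
	}

	return rte_eth_dev_rss_reta_update(port_id, reta_conf,
					   dev_info.reta_size);
}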
[PATCH] net/ixgbevf: fix RSS init for x550 nics
From: Edwin Brossette

Different Intel nics with the ixgbe pmd do not handle RSS in the same
way when working with virtualization. While some nics like Intel 82599ES
only have a single RSS table in the device and leave all RSS features to
be handled by the pf, some other nics like x550 let the vf handle RSS
features. This can lead to different behavior when RSS is enabled
depending on the model of nic used.

In particular, it occurred that ixgbevf_dev_rx_init() does not initialize
RSS parameters at device init, even if the multi-queue mode option is set
in the device configuration (ie: RTE_ETH_MQ_RX_RSS is set). Note that
this issue went unnoticed until now, probably because some nics do not
really have support for RSS in virtualization mode.

Thus, depending on the nic used, we can find ourselves in a situation
where RSS is not configured despite being enabled. This will cause
serious performance issues because the RSS reta will be fully zeroed,
causing all packets to go only in the first queue and leaving all the
others empty.

By looking at ixgbe_reta_size_get(), we can see that only X550 nic models
have a non-zero reta size set in vf mode. Thus add a call to
ixgbe_rss_configure() for these cards in ixgbevf_dev_rx_init() if the
option to enable RSS is set.

Fixes: f4d1598ee14f ("ixgbevf: support RSS config on x550")

Signed-off-by: Edwin Brossette
---
 drivers/net/ixgbe/ixgbe_rxtx.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 537aa2f68de8..0aa968f7e258 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -5873,6 +5873,25 @@ ixgbevf_dev_rx_init(struct rte_eth_dev *dev)
 			IXGBE_PSRTYPE_RQPL_SHIFT;
 	IXGBE_WRITE_REG(hw, IXGBE_VFPSRTYPE, psrtype);
 
+	/* Initialize the rss for x550_vf cards if enabled */
+	switch (hw->mac.type) {
+	case ixgbe_mac_X550_vf:
+	case ixgbe_mac_X550EM_x_vf:
+	case ixgbe_mac_X550EM_a_vf:
+		switch (dev->data->dev_conf.rxmode.mq_mode) {
+		case RTE_ETH_MQ_RX_RSS:
+		case RTE_ETH_MQ_RX_DCB_RSS:
+		case RTE_ETH_MQ_RX_VMDQ_RSS:
+			ixgbe_rss_configure(dev);
+			break;
+		default:
+			break;
+		}
+		break;
+	default:
+		break;
+	}
+
 	ixgbe_set_rx_function(dev);
 
 	return 0;
--
2.35.0.4.g44a5d4affccf
net/virtio: duplicated xstats
Hello, I noticed a small inconsistency in the virtio pmd's xstats. The stat "rx_q0_errors" appears twice. I also think the stats "rx_q0_packets", "rx_q0_bytes", "tx_q0_packets" and "tx_q0_bytes" are duplicates of "rx_q0_good_packets", "rx_q0_good_bytes", "tx_q0_good_packets" and "tx_q0_good_bytes" I believe this issue probably appeared after this commit: f30e69b41f94: ethdev: add device flag to bypass auto-filled queue xstats http://scm.6wind.com/vendor/dpdk.org/dpdk/commit/?id=f30e69b41f949cd4a9afb6ff39de196e661708e2 >From what I understand, the rxq0_error stat was originally reported by the librte. However, changes were made so it is reported by the pmd instead. The flag RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS was temporarily set to keep the old behaviour so that every pmd could have time to adapt the change. But it seems the flag was forgotten in the virtio pmd and as a result, some stats are fetched at two different times when displaying xstats. First in lib_rte_ethdev: https://git.dpdk.org/dpdk/tree/lib/ethdev/rte_ethdev.c#n3266 (you can see the check on the RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS flag before the snprintf on eth_dev_rxq_stats_strings[] ) And a second time in the virtio pmd: https://git.dpdk.org/dpdk/tree/drivers/net/virtio/virtio_ethdev.c#n705 pmd (see the snprintf on rte_virtio_rxq_stat_strings[] ) This problem can be reproduced on testpmd simply by displaying the xstats on a port with the net_virtio driver: Reproduction: === 1) start dpdk-testpmd: modprobe -a uio_pci_generic dpdk-devbind -b uio_pci_generic 03:00.0 dpdk-devbind -b uio_pci_generic 04:00.0 dpdk-devbind -s Network devices using DPDK-compatible driver :03:00.0 'Virtio 1.0 network device 1041' drv=uio_pci_generic unused=vfio-pci :04:00.0 'Virtio 1.0 network device 1041' drv=uio_pci_generic unused=vfio-pci [...] dpdk-testpmd -a :03:00.0 -a :04:00.0 -- -i --rxq=1 --txq=1 --coremask=0x4 --total-num-mbufs=25 EAL: Detected CPU lcores: 3 EAL: Detected NUMA nodes: 1 EAL: Detected static linkage of DPDK EAL: Multi-process socket /var/run/dpdk/rte/mp_socket EAL: Selected IOVA mode 'PA' EAL: VFIO support initialized EAL: Probe PCI driver: net_virtio (1af4:1041) device: :03:00.0 (socket -1) EAL: Probe PCI driver: net_virtio (1af4:1041) device: :04:00.0 (socket -1) Interactive-mode selected Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa. testpmd: create a new mbuf pool : n=25, size=2176, socket=0 testpmd: preferred mempool ops selected: ring_mp_mc Configuring Port 0 (socket 0) Port 0: 52:54:00:B0:8F:88 Configuring Port 1 (socket 0) Port 1: 52:54:00:EF:09:1F Checking link statuses... Done 2) port info: show port info 0 * Infos for port 0 * MAC address: 52:54:00:B0:8F:88 Device name: :03:00.0 Driver name: net_virtio Firmware-version: not available Devargs: Connect to socket: 0 memory allocation on the socket: 0 Link status: up Link speed: Unknown Link duplex: full-duplex Autoneg status: On MTU: 1500 Promiscuous mode: enabled Allmulticast mode: disabled Maximum number of MAC addresses: 64 Maximum number of MAC addresses of hash filtering: 0 VLAN offload: strip off, filter off, extend off, qinq strip off No RSS offload flow type is supported. 
Minimum size of RX buffer: 64 Maximum configurable length of RX packet: 9728 Maximum configurable size of LRO aggregated packet: 0 Current number of RX queues: 1 Max possible RX queues: 1 Max possible number of RXDs per queue: 32768 Min possible number of RXDs per queue: 32 RXDs number alignment: 1 Current number of TX queues: 1 Max possible TX queues: 1 Max possible number of TXDs per queue: 32768 Min possible number of TXDs per queue: 32 TXDs number alignment: 1 Max segment number per packet: 65535 Max segment number per MTU/TSO: 65535 Device capabilities: 0x0( ) Device error handling mode: none Device private info: guest_features: 0x110af8020 vtnet_hdr_size: 12 use_vec: rx-0 tx-0 use_inorder: rx-0 tx-0 intr_lsc: 1 max_mtu: 9698 max_rx_pkt_len: 1530 max_queue_pairs: 1 req_guest_features: 0x805f10ef8028 3) show port xstats: show port xstats 0 ## NIC extended statistics for port 0 rx_good_packets: 0 tx_good_packets: 0 rx_good_bytes: 0 tx_good_bytes: 0 rx_missed_errors: 0 rx_errors: 0 tx_errors: 0 rx_mbuf_allocation_errors: 0 rx_q0_packets: 0 rx_q0_bytes: 0 rx_q0_errors: 0 <== tx_q0_packets: 0 tx_q0_bytes: 0 rx_q0_good_packets: 0 rx_q0_good_bytes: 0 rx_q0_errors: 0 <== rx_q0_multicast_packets: 0 rx_q0_broadcast_packets: 0 rx_q0_undersize_packets: 0 rx_q0_size_64_packets: 0 rx_q0_size_65_127_packets: 0 rx_q0_size_128_255_packets: 0 rx_q0_size_256_511_packets: 0 rx_q0_size_512_1023_packets: 0 rx_q0_size_1024_1518_packets: 0 rx_q0_size_1519_max_packets: 0 tx_q0_good_packets: 0 tx_q0_good_bytes: 0 tx_q0_multicast_packets: 0 tx_q0_br
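To make the duplication easier to spot programmatically, here is a small sketch that lists duplicated xstat names on a port; with the bug described above it reports "rx_q0_errors" on virtio ports (the packets/bytes stats are duplicated in value but not in name, so only identical names are caught). The buffer size and helper name are arbitrary.

#include <stdio.h>
#include <string.h>
#include <rte_ethdev.h>

/* Print every xstat name that appears more than once on a port. */
static void
report_duplicate_xstats(uint16_t port_id)
{
	struct rte_eth_xstat_name names[256];
	int i, j, n;

	n = rte_eth_xstats_get_names(port_id, names, RTE_DIM(names));
	if (n < 0 || n > (int)RTE_DIM(names))
		return;

	for (i = 0; i < n; i++)
		for (j = i + 1; j < n; j++)
			if (strcmp(names[i].name, names[j].name) == 0)
				printf("port %u: duplicated xstat '%s'\n",
				       port_id, names[i].name);
}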
Re: net/virtio: duplicated xstats
Hello again, The flag is already set during the device init, so it should be removed, not added. I can confirm removing it fixed my issue. I will submit a patch for this bug. Regards, Edwin Brossette. On Fri, Nov 24, 2023 at 11:39 AM Maxime Coquelin wrote: > Hi Edwin, > > Thanks for reporting the issue. > > On 11/24/23 10:26, Olivier Matz wrote: > > Fix Maxime's mail. > > Thanks Olivier. > > > On Fri, Nov 24, 2023 at 10:18:27AM +0100, Edwin Brossette wrote: > >> Hello, > >> > >> I noticed a small inconsistency in the virtio pmd's xstats. > >> The stat "rx_q0_errors" appears twice. > >> I also think the stats "rx_q0_packets", "rx_q0_bytes", "tx_q0_packets" > and > >> "tx_q0_bytes" are duplicates of "rx_q0_good_packets", > "rx_q0_good_bytes", > >> "tx_q0_good_packets" and "tx_q0_good_bytes" > >> > >> I believe this issue probably appeared after this commit: > >> > >> f30e69b41f94: ethdev: add device flag to bypass auto-filled queue xstats > >> > http://scm.6wind.com/vendor/dpdk.org/dpdk/commit/?id=f30e69b41f949cd4a9afb6ff39de196e661708e2 > > Adding Ferruh as he is the author of this commit. > > >> From what I understand, the rxq0_error stat was originally reported by > the > >> librte. However, changes were made so it is reported by the pmd instead. > >> The flag RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS was temporarily set to keep > the > >> old behaviour so that every pmd could have time to adapt the change. > >> But it seems the flag was forgotten in the virtio pmd and as a result, > some > >> stats are fetched at two different times when displaying xstats. > > Have you tried adding this flag to Virtio PMD? Does it fixes the issue? > > >> First in lib_rte_ethdev: > >> https://git.dpdk.org/dpdk/tree/lib/ethdev/rte_ethdev.c#n3266 > >> (you can see the check on the RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS flag > before > >> the snprintf on eth_dev_rxq_stats_strings[] ) > >> > >> And a second time in the virtio pmd: > >> https://git.dpdk.org/dpdk/tree/drivers/net/virtio/virtio_ethdev.c#n705 > pmd > >> (see the snprintf on rte_virtio_rxq_stat_strings[] ) > >> > >> This problem can be reproduced on testpmd simply by displaying the > xstats > >> on a port with the net_virtio driver: > >> > >> Reproduction: > >> === > >> > >> 1) start dpdk-testpmd: > >> > >> modprobe -a uio_pci_generic > >> dpdk-devbind -b uio_pci_generic 03:00.0 > >> dpdk-devbind -b uio_pci_generic 04:00.0 > >> > >> dpdk-devbind -s > >> > >> Network devices using DPDK-compatible driver > >> > >> :03:00.0 'Virtio 1.0 network device 1041' drv=uio_pci_generic > >> unused=vfio-pci > >> :04:00.0 'Virtio 1.0 network device 1041' drv=uio_pci_generic > >> unused=vfio-pci > >> [...] > >> > >> dpdk-testpmd -a :03:00.0 -a :04:00.0 -- -i --rxq=1 --txq=1 > >> --coremask=0x4 --total-num-mbufs=25 > >> EAL: Detected CPU lcores: 3 > >> EAL: Detected NUMA nodes: 1 > >> EAL: Detected static linkage of DPDK > >> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket > >> EAL: Selected IOVA mode 'PA' > >> EAL: VFIO support initialized > >> EAL: Probe PCI driver: net_virtio (1af4:1041) device: :03:00.0 > (socket > >> -1) > >> EAL: Probe PCI driver: net_virtio (1af4:1041) device: :04:00.0 > (socket > >> -1) > >> Interactive-mode selected > >> Warning: NUMA should be configured manually by using --port-numa-config > and > >> --ring-numa-config parameters along with --numa. 
> >> testpmd: create a new mbuf pool : n=25, size=2176, > socket=0 > >> testpmd: preferred mempool ops selected: ring_mp_mc > >> Configuring Port 0 (socket 0) > >> Port 0: 52:54:00:B0:8F:88 > >> Configuring Port 1 (socket 0) > >> Port 1: 52:54:00:EF:09:1F > >> Checking link statuses... > >> Done > >> > >> 2) port info: > >> > >> show port info 0 > >> > >> * Infos for port 0 * > >> MAC address: 52:54:00:B0:8F:88 > >> Device name: :03:00.0 > >> Driver name: net_virtio > >> Firmware-version: not available > >> Devargs
[PATCH] net/virtio: fix duplicated rxq xstats
From: Edwin Brossette

The flag RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS was temporarily set while
moving queue stats from 'struct rte_eth_stats' to the individual pmds,
as explained in commit f30e69b41f94 ("ethdev: add device flag to bypass
auto-filled queue xstats"). This flag was added so every pmd would keep
its original behavior until the change was implemented.

However, this flag was not removed afterwards in the virtio pmd and as a
result, some queue stats are displayed twice when trying to get them:
once in lib_rte_ethdev, and a second time in the virtio pmd.

Remove this flag so stats are printed only once.

Fixes: f30e69b41f94 ("ethdev: add device flag to bypass auto-filled queue xstats")
Cc: sta...@dpdk.org

Signed-off-by: Edwin Brossette
---
 drivers/net/virtio/virtio_ethdev.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index c2c0a1a11137..517585740eeb 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1793,8 +1793,6 @@ virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features)
 	else
 		eth_dev->data->dev_flags &= ~RTE_ETH_DEV_INTR_LSC;
 
-	eth_dev->data->dev_flags |= RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
-
 	/* Setting up rx_header size for the device */
 	if (virtio_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF) ||
 	    virtio_with_feature(hw, VIRTIO_F_VERSION_1) ||
--
2.35.0.4.g44a5d4affccf
net/e1000: Test issues following change in max rx/tx queues
Hello,

We recently started to observe concerning behaviour on our continuous
integration platform following this commit, which set the maximum number
of tx/rx queues to 2 for the e1000 pmd:

net/e1000: fix queue number initialization
https://git.dpdk.org/dpdk/commit/?id=c1a42d646472fd3477429bf016f682e0865b77f0

Reverting this change locally on our side was enough to fix our test
issues.

There is a considerably long explanation in a comment just above the
change explaining why the number of rx/tx queues is limited to 1, yet it
was changed to 2 anyway. Since I couldn't see any mention in the logs of
these restrictions being removed, I wished to inquire whether this change
was truly intended and if there was any problem which motivated it.

Maybe the max number of rx/tx queues should be reverted back to 1? Or
maybe the comment should be updated if the limitations no longer apply?

Regards,
Edwin Brossette.
net/ixgbe: all interrupts disabled after enabling lsc on X552/X557-AT card
Hello,

I have found a bug in the ixgbe pmd, concerning this model of nic in
particular:
-> Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T

I was able to detect this bug after enabling the use of link state change
interrupts in the card via the following flag:
- port->rte_conf.intr_conf.lsc = 1;

With these interrupts enabled on this particular network card, there is a
chance on every link-up that all interrupts get disabled, as the
interrupt mask is unintentionally set to 0. This happens because there
are two interrupts that are triggered on a link status change on this
model of nic. See the related code here:
https://git.dpdk.org/dpdk/tree/drivers/net/ixgbe/ixgbe_ethdev.c?id=c99e1db87798c2cd2c0ac94c7676e961eedffc87#n4552

Whenever an lsc interrupt is triggered, we enter this
ixgbe_dev_interrupt_get_status() function. Bits 20 (IXGBE_EICR_LSC) and
25 (IXGBE_EICR_GPI_SDP0_X550EM_x) of the Extended Interrupt Cause
Register (EICR) will be set to '1' on a link status change. When the
interrupt is then handled in ixgbe_dev_interrupt_action(), if the
IXGBE_FLAG_NEED_LINK_UPDATE flag is raised (which happens when the
EICR_LSC bit is raised in eicr), an alarm for the delayed handler
ixgbe_dev_interrupt_delayed_handler() will be programmed.

The following code poses a problem:
https://git.dpdk.org/dpdk/tree/drivers/net/ixgbe/ixgbe_ethdev.c?id=c99e1db87798c2cd2c0ac94c7676e961eedffc87#n4736

intr->mask = intr->mask_original;
intr->mask_original = 0;

Sometimes, it's possible to see a first interrupt with bit IXGBE_EICR_LSC
raised, quickly followed by a second interrupt with both bit
IXGBE_EICR_LSC and bit IXGBE_EICR_GPI_SDP0_X550EM_x raised. In this case,
the flag IXGBE_FLAG_NEED_LINK_UPDATE is raised both times and the delayed
handler is programmed to be run twice. This leads to intr->mask_original
being set to 0 after the first delayed handler is done, and then to
intr->mask also being set to 0 after the second time it is run. This
effectively disables all interrupts from working on the device, which is
particularly painful since we no longer poll for link state on our side,
having switched to interrupt-based link state handling.

My suggestion to solve this issue is to avoid creating an alarm for
ixgbe_dev_interrupt_delayed_handler() if there is already one pending. I
do this by checking for bit IXGBE_EIMS_LSC in intr->mask. I no longer see
this issue after applying this change to my local repository. I will
suggest a patch by responding to this mail. I can show a way to reproduce
below with testpmd.
I have added some custom logs into the code for my own convenience: = 1) start testpmd: PCI-SLOT IFNAME MAC-ADDRESSKMOD MAX SPEEDCURRENT SPEEDDEVICE :03:00.0 ntfp1 0c:c4:7a:75:8f:ec ixgbe 1x2.5 GT/s PCIe 1x2.5 GT/s PCIe Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T :03:00.1 ntfp2 0c:c4:7a:75:8f:ed ixgbe 1x2.5 GT/s PCIe 1x2.5 GT/s PCIe Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T sudo dpdk-hugepages.py --setup 2G; dpdk-devbind --bind=vfio-pci :03:00.0 dpdk-devbind --bind=vfio-pci :03:00.1 dpdk-testpmd -a :03:00.0 -a :03:00.1 -- -i --rxq=2 --txq=2 --coremask=0x0c --total-num-mbufs=25 EAL: Detected CPU lcores: 16 EAL: Detected NUMA nodes: 1 EAL: Detected static linkage of DPDK EAL: Multi-process socket /var/run/dpdk/rte/mp_socket EAL: Selected IOVA mode 'PA' EAL: VFIO support initialized EAL: Using IOMMU type 8 (No-IOMMU) EAL: Probe PCI driver: net_ixgbe (8086:15ad) device: :03:00.0 (socket -1) EAL: Probe PCI driver: net_ixgbe (8086:15ad) device: :03:00.1 (socket -1) Interactive-mode selected previous number of forwarding cores 1 - changed to number of configured cores 2 Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa. testpmd: create a new mbuf pool : n=25, size=2176, socket=0 testpmd: preferred mempool ops selected: ring_mp_mc Configuring Port 0 (socket 0) Port 0: 0C:C4:7A:75:8F:EC Configuring Port 1 (socket 0) Port 1: 0C:C4:7A:75:8F:ED Checking link statuses... Done 2) To reproduce, simply repeatedly set link up and down. At one point there should be something like this: testpmd> set link-up port 0 ixgbe_dev_interrupt_get_status(): <><><><>><>: eicr register: 10 <-- IXGBE_EICR_LSC ixgbe_dev_interrupt_get_status(): <><><><>><>: set IXGBE_FLAG_NEED_LINK_UPDATE ixgbe_dev_interrupt_get_status(): <><><><>><>: current flags: flags = 1 ixgbe_dev_interrupt_action(): <><><><>><>: current flags (1): flags = 1 ixgbe_dev_interrupt_action(): <><><><>><>: current flags (2): flags = 1 ixgbe_dev_interrupt_action(): <><><><>><>: current flags (3): flags = 1 ixgbe_dev_interrupt_action(): <><><><>><>: for dev: 0x557c0cd844c0 ixgbe_dev_interrupt_action(): <><><><>><>: saved intr: mask_original = 36700160 < original interrupt mask ixgbe_dev_interrupt_act
[PATCH] net/ixgbe: don't create a delayed interrupt handler if one already exists
From: Edwin Brossette

Since the link state may need some time to stabilize after a link state
change, we cannot update the link state right after one occurs. So link
state change interrupts (lsc) are handled after a delay. To do this, an
alarm to call a delayed handler is programmed. This delayed handler is
tasked with updating the link after a variable delay of one to four
seconds, which should be enough time for the link state to become stable
again.

However, a problem can occur with some models of network cards. For
example, ixgbe_mac_X550EM_x may trigger this interrupt twice because
another interrupt signal is received on the General Purpose Interrupt pin
SDP0, which has the same interrupt handler. In such a case, the delayed
interrupt handler would be programmed to be executed twice.

Since we save the original interrupt mask value to restore it after the
delayed handler is done with its work, we end up overwriting its value
after the second alarm is programmed. Even worse: when restoring it the
first time, the saved original mask variable is reset to 0, so we end up
completely disabling all interrupts when trying to restore this mask
after the second time the delayed handler is executed.

Add a check on the interrupt mask value when programming the alarm for
the delayed handler. If the bit for lsc interrupts is unset, it means an
alarm was already programmed for the delayed handler. In this case, skip
the alarm creation.

Fixes: 9b667210700e ("net/ixgbe: fix blocked interrupts")
Cc: sta...@dpdk.org

Signed-off-by: Edwin Brossette
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index c61c52b2966b..52cafcbc965f 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -4667,14 +4667,20 @@ ixgbe_dev_interrupt_action(struct rte_eth_dev *dev)
 			timeout = IXGBE_LINK_DOWN_CHECK_TIMEOUT;
 
 		ixgbe_dev_link_status_print(dev);
-		if (rte_eal_alarm_set(timeout * 1000,
-			      ixgbe_dev_interrupt_delayed_handler, (void *)dev) < 0)
-			PMD_DRV_LOG(ERR, "Error setting alarm");
-		else {
-			/* remember original mask */
-			intr->mask_original = intr->mask;
-			/* only disable lsc interrupt */
-			intr->mask &= ~IXGBE_EIMS_LSC;
+
+		/* Don't program delayed handler if LSC interrupt is disabled.
+		 * It means one is already programmed.
+		 */
+		if (intr->mask & IXGBE_EIMS_LSC) {
+			if (rte_eal_alarm_set(timeout * 1000,
+				ixgbe_dev_interrupt_delayed_handler, (void *)dev) < 0)
+				PMD_DRV_LOG(ERR, "Error setting alarm");
+			else {
+				/* remember original mask */
+				intr->mask_original = intr->mask;
+				/* only disable lsc interrupt */
+				intr->mask &= ~IXGBE_EIMS_LSC;
+			}
 		}
 	}
--
2.35.0.4.g44a5d4affccf
Crash in tap pmd when using more than 8 rx queues
Hello,

I have recently stumbled into an issue with my DPDK-based application
running the failsafe pmd. This pmd uses a tap device, with which my
application fails to start if more than 8 rx queues are used. This issue
appears to be related to this patch:
https://git.dpdk.org/dpdk/commit/?id=c36ce7099c2187926cd62cff7ebd479823554929

I have seen in the documentation that there was a limitation of 8 max
queues shared when using a tap device shared between multiple processes.
However, my application uses a single primary process, with no secondary
process, but it appears that I am still running into this limitation.

Now if we look at this small chunk of code:

memset(&msg, 0, sizeof(msg));
strlcpy(msg.name, TAP_MP_REQ_START_RXTX, sizeof(msg.name));
strlcpy(request_param->port_name, dev->data->name, sizeof(request_param->port_name));
msg.len_param = sizeof(*request_param);
for (i = 0; i < dev->data->nb_tx_queues; i++) {
	msg.fds[fd_iterator++] = process_private->txq_fds[i];
	msg.num_fds++;
	request_param->txq_count++;
}
for (i = 0; i < dev->data->nb_rx_queues; i++) {
	msg.fds[fd_iterator++] = process_private->rxq_fds[i];
	msg.num_fds++;
	request_param->rxq_count++;
}

(Note that I am not using the latest DPDK version, but stable v23.11.1.
But I believe the issue is still present on latest.)

There are no checks on the maximum value i can take in the for loops.
Since the size of msg.fds is limited by the maximum of 8 queues shared
between processes because of the IPC API, there is a potential buffer
overflow which can happen here.

See the struct declaration:

struct rte_mp_msg {
	char name[RTE_MP_MAX_NAME_LEN];
	int len_param;
	int num_fds;
	uint8_t param[RTE_MP_MAX_PARAM_LEN];
	int fds[RTE_MP_MAX_FD_NUM];
};

This means that if the number of queues used is more than 8, the program
will crash. This is what happens on my end as I get the following log:
*** stack smashing detected ***: terminated

Reverting the commit mentioned above fixes my issue. Also setting a check
like this works for me:

if (dev->data->nb_tx_queues + dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM)
	return -1;

I've made the changes on my local branch to fix my issue. This mail is
just to bring attention to this problem.
Thank you in advance for considering it.

Regards,
Edwin Brossette.
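For reference, the check mentioned above, roughly as it could sit in the tap PMD just before the loops that fill msg.fds; the log call and the exact placement are our assumption, this is not a submitted patch.

	/* An IPC message can only carry RTE_MP_MAX_FD_NUM file
	 * descriptors: refuse configurations that would overflow
	 * msg.fds[] instead of smashing the stack.
	 */
	if (dev->data->nb_tx_queues + dev->data->nb_rx_queues >
			RTE_MP_MAX_FD_NUM) {
		TAP_LOG(ERR,
			"%u queues exceed the %d fds an IPC message can carry",
			dev->data->nb_tx_queues + dev->data->nb_rx_queues,
			RTE_MP_MAX_FD_NUM);
		return -1;
	}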
net/failsafe: segfault happens on hotplug alarm.
Hello,

I recently ran into an issue when using DPDK's failsafe pmd on a
Microsoft Azure setup. On this setup, I have the failsafe pmd managing a
netvsc interface with a Mellanox nic (which can be used through the
hardware acceleration feature). A segfault is sometimes seen whenever I
unplug the Mellanox device, which is a case that should be handled by the
pmd. On a more recent DPDK version (I tested with stable v23.11.1), this
segfault is systematic.

This seems to happen because the function rte_eth_dev_release_port() is
called twice when the hotplug_alarm triggers. You can see it in this bit
of code here:
https://git.dpdk.org/dpdk/tree/drivers/net/failsafe/failsafe_ether.c#n276

In the fs_dev_remove() function, the rte_eth_dev_close() call runs
rte_eth_dev_release_port() the first time, and it is then called a second
time when handling the DEV_PROBED case.

I noticed when searching the mailing list that this problem was already
seen once and a patch was suggested. Here is the link to the mail:
https://mails.dpdk.org/archives/dev/2022-November/256898.html

Applying this patch on my local branch fixed the issue on my end,
although I cannot attest for sure that it is the best possible fix. I
could see a memory leak happening if the fs_dev_remove() function was
called with sdev->state == DEV_PROBED, as the release would never be
done. But it is still weird that if sdev->state == DEV_STARTED, we do
rte_eth_dev_release_port() first then rte_dev_remove(), and if
sdev->state == DEV_PROBED, we call them in the opposite order. Perhaps
someone could look into it?

Regards,
Edwin Brossette.
Re: Crash in tap pmd when using more than 8 rx queues
Hello, I created a Bugzilla PR, just as you requested: https://bugs.dpdk.org/show_bug.cgi?id=1536 As for the bug resolution, I have other matters to attend to and I'm afraid I cannot spend more time on this issue, so I was only planning to report it. Regards, Edwin Brossette. On Fri, Sep 6, 2024 at 1:16 PM Ferruh Yigit wrote: > On 9/5/2024 1:55 PM, Edwin Brossette wrote: > > Hello, > > > > I have recently stumbled into an issue with my DPDK-based application > > running the failsafe pmd. This pmd uses a tap device, with which my > > application fails to start if more than 8 rx queues are used. This issue > > appears to be related to this patch: > > https://git.dpdk.org/dpdk/commit/? > > id=c36ce7099c2187926cd62cff7ebd479823554929 <https://git.dpdk.org/dpdk/ > > commit/?id=c36ce7099c2187926cd62cff7ebd479823554929> > > > > I have seen in the documentation that there was a limitation to 8 max > > queues shared when using a tap device shared between multiple processes. > > However, my application uses a single primary process, with no secondary > > process, but it appears that I am still running into this limitation. > > > > Now if we look at this small chunk of code: > > > > memset(&msg, 0, sizeof(msg)); > > strlcpy(msg.name <http://msg.name>, TAP_MP_REQ_START_RXTX, > > sizeof(msg.name <http://msg.name>)); > > strlcpy(request_param->port_name, dev->data->name, sizeof(request_param- > >>port_name)); > > msg.len_param = sizeof(*request_param); > > for (i = 0; i < dev->data->nb_tx_queues; i++) { > > msg.fds[fd_iterator++] = process_private->txq_fds[i]; > > msg.num_fds++; > > request_param->txq_count++; > > } > > for (i = 0; i < dev->data->nb_rx_queues; i++) { > > msg.fds[fd_iterator++] = process_private->rxq_fds[i]; > > msg.num_fds++; > > request_param->rxq_count++; > > } > > (Note that I am not using the latest DPDK version, but stable v23.11.1. > > But I believe the issue is still present on latest.) > > > > There are no checks on the maximum value i can take in the for loops. > > Since the size of msg.fds is limited by the maximum of 8 queues shared > > between process because of the IPC API, there is a potential buffer > > overflow which can happen here. > > > > See the struct declaration: > > struct rte_mp_msg { > > char name[RTE_MP_MAX_NAME_LEN]; > > int len_param; > > int num_fds; > > uint8_t param[RTE_MP_MAX_PARAM_LEN]; > > int fds[RTE_MP_MAX_FD_NUM]; > > }; > > > > This means that if the number of queues used is more than 8, the program > > will crash. This is what happens on my end as I get the following log: > > *** stack smashing detected ***: terminated > > > > Reverting the commit mentionned above fixes my issue. Also setting a > > check like this works for me: > > > > if (dev->data->nb_tx_queues + dev->data->nb_rx_queues > > RTE_MP_MAX_FD_NUM) > > return -1; > > > > I've made the changes on my local branch to fix my issue. This mail is > > just to bring attention on this problem. > > Thank you in advance for considering it. > > > > Hi Edwin, > > Thanks for the report, I confirm issue is valid, although that code > changed a little (to increase 8 limit) [3]. > > And in this release Stephen put another patch [1] to increase the limit > even more, but irrelevant from the limit, tap code needs to be fixed. > > To fix: > 1. We need to add "nb_rx_queues > RTE_MP_MAX_FD_NUM" check you > mentioned, to not blindly update the 'msg.fds[]' > 2. We should prevent this to be a limit for tap PMD when there is only > primary process, this seems was oversight in our end. 
> > > Can you work on the issue or just reporting it? > Can you please report the bug in Bugzilla [2], to record the issue? > > > > [1] > > https://patches.dpdk.org/project/dpdk/patch/20240905162018.74301-1-step...@networkplumber.org/ > > [2] > https://bugs.dpdk.org/ > > [3] > https://git.dpdk.org/dpdk/commit/?id=72ab1dc1598e > >
net/netvsc: problem with configuring netvsc port after first port start
[...]
Minimum size of RX buffer: 1024
Maximum configurable length of RX packet: 65536
Maximum configurable size of LRO aggregated packet: 0
Current number of RX queues: 2
Max possible RX queues: 64
Max possible number of RXDs per queue: 65535
Min possible number of RXDs per queue: 0
RXDs number alignment: 1
Current number of TX queues: 2
Max possible TX queues: 64
Max possible number of TXDs per queue: 4096
Min possible number of TXDs per queue: 1
TXDs number alignment: 1
Max segment number per packet: 40
Max segment number per MTU/TSO: 40
Device capabilities: 0x0( )
Device error handling mode: none
Device private info: none

-> First, stop the port:

testpmd> port stop 3
Stopping ports...
Done

-> Then, change something in the port config. This will trigger a call to
rte_eth_dev_configure() on the next port start. Here I change the link
speed/duplex:

testpmd> port config 3 speed 1 duplex full
testpmd>

-> Finally, try to start the port:

testpmd> port start 3
Configuring Port 3 (socket 0)
hn_nvs_alloc_subchans(): nvs subch alloc failed: 0x2
hn_dev_configure(): subchannel configuration failed
ETHDEV: Port3 dev_configure = -5
Fail to configure port 3    <--

As you can see, the port configuration fails. The error happens in
hn_nvs_alloc_subchans(). Maybe the previous resources were not properly
deallocated on port stop?

When I looked around in the pmd's code, I noticed the function
hn_reinit() with the following comment:

/*
 * Connects EXISTING rx/tx queues to NEW vmbus channel(s), and
 * re-initializes NDIS and RNDIS, including re-sending initial
 * NDIS/RNDIS configuration. To be used after the underlying vmbus
 * has been un- and re-mapped, e.g. as must happen when the device
 * MTU is changed.
 */

This function shows that it is possible to call rte_eth_dev_configure()
without failing, as this function hn_reinit() calls hn_dev_configure(),
and is called when the mtu is changed. I suspect the operations described
in the comment above might also be needed when running
rte_eth_dev_configure().

Is this a known issue? In my application's case, this bug causes issues
when I need to restart and configure the port.

Thank you for considering this issue.

Regards,
Edwin Brossette.
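For what it is worth, the same failure can be reproduced from an application without testpmd. The sketch below assumes port_id, nb_rxq, nb_txq and conf describe the already-started netvsc port; the link-speed change is just one example of a configuration change that forces rte_eth_dev_configure() to be replayed.

#include <rte_ethdev.h>

/* Stop, reconfigure and restart an already-started netvsc port.
 * With the issue described above, the second rte_eth_dev_configure()
 * fails (dev_configure = -5, from hn_nvs_alloc_subchans()).
 */
static int
reconfigure_port(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_txq,
		 struct rte_eth_conf *conf)
{
	int ret;

	ret = rte_eth_dev_stop(port_id);
	if (ret != 0)
		return ret;

	/* Example change: force the link speed instead of autoneg. */
	conf->link_speeds = RTE_ETH_LINK_SPEED_10G | RTE_ETH_LINK_SPEED_FIXED;

	ret = rte_eth_dev_configure(port_id, nb_rxq, nb_txq, conf);
	if (ret != 0)
		return ret;

	return rte_eth_dev_start(port_id);
}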
Re: net/mlx5: wrong Rx/Tx descriptor limits when DevX is off
Hello, Sorry for bothering you again. May I inquire if this issue is still being worked on ? If so, when can I expect to see a fix ? Best regards, Edwin Brossette On Mon, Dec 23, 2024 at 2:09 PM Slava Ovsiienko wrote: > Confirm, it’s a bug, IIUC was introduced by reporting function update. > AFAIK, we do not test with non-DevX environment anymore, so missed this. > > Fix should be provided. > > > > With best regards, > > Slava > > > > *From:* Asaf Penso > *Sent:* Sunday, December 22, 2024 9:39 AM > *To:* igooto...@gmail.com; Slava Ovsiienko > *Cc:* Laurent Hardy ; Olivier Matz < > olivier.m...@6wind.com>; Didier Pallard ; > Jean-Mickael Guerin ; Edwin Brossette < > edwin.brosse...@6wind.com>; dev@dpdk.org > *Subject:* RE: net/mlx5: wrong Rx/Tx descriptor limits when DevX is off > > > > Hello Igor and Slava, > > Can you please check out this issue? > > > > Regards, > > Asaf Penso > > > > *From:* Edwin Brossette > *Sent:* Friday, 20 December 2024 19:06 > *To:* dev@dpdk.org > *Cc:* Laurent Hardy ; Olivier Matz < > olivier.m...@6wind.com>; Didier Pallard ; > Jean-Mickael Guerin > *Subject:* net/mlx5: wrong Rx/Tx descriptor limits when DevX is off > > > > Hello, > > I have run into a regression following an update to stable dpdk-24.11 with > a number of my Mellanox cx4/5/6 nics. This regression occurs with all nics > in my lab which have DevX disabled: using mstconfig utility, I can see the > flag UCTX_EN is not set. > > Mainly, the issue is that the ports cannot be started, with the following > error logs in the journal: > > Set nb_rxd=1 (asked=512) for port=0 > Set nb_txd=1 (asked=512) for port=0 > starting port 0 > Initializing port 0 [7c:fe:90:65:e6:54] > port 0: ntfp1 (mlx5_pci) > nb_rxq=2 nb_txq=2 > rxq0=c9 rxq1=c25 > txq0=c9 txq1=c25 > port 0: rx_scatter=0 tx_scatter=0 max_rx_frame=1526 > mlx5_net: port 0 number of descriptors requested for Tx queue 0 must be > higher than MLX5_TX_COMP_THRESH, using 33 instead of 1 > mlx5_net: port 0 increased number of descriptors in Tx queue 0 to the next > power of two (64) > mlx5_net: port 0 number of descriptors requested for Tx queue 1 must be > higher than MLX5_TX_COMP_THRESH, using 33 instead of 1 > mlx5_net: port 0 increased number of descriptors in Tx queue 1 to the next > power of two (64) > mlx5_net: Port 0 Rx queue 0 CQ creation failure. > mlx5_net: port 0 Rx queue allocation failed: Cannot allocate memory > rte_eth_dev_start(port 0) failed, error=-12 > Failed to start port 0, set link down > Failed to start port 0 > > > Looking more precisely into the problem, it appears that the number of Rx > and Tx descriptors configured for my queues is 1. This happens because > mlx5_dev_infos_get() return a limit of 1 for both Rx and Tx, which is > unexpected. I identified this patch to be responsible for the regression: > > 4c3d7961d9002: net/mlx5: fix reported Rx/Tx descriptor limits > > https://git.dpdk.org/dpdk/commit/?id=4c3d7961d9002bb715a8ee76bcf464d633316d4c > > After doing some debugging, I noticed that hca_attr.log_max_wq_sz is never > configured. This should be done in mlx5_devx_cmd_query_hca_attr() which is > called in this bit of code: > > https://git.dpdk.org/dpdk/tree/drivers/common/mlx5/mlx5_common.c#n681 > > /* > * When CTX is created by Verbs, query HCA attribute is unsupported. > * When CTX is imported, we cannot know if it is created by DevX or > * Verbs. So, we use query HCA attribute function to check it. > */ > if (cdev->config.devx || cdev->config.device_fd != MLX5_ARG_UNSET) { > > /* Query HCA attributes. 
*/ > ret = mlx5_devx_cmd_query_hca_attr(cdev->ctx, &cdev->config.hca_attr); > if (ret) { > > DRV_LOG(ERR, "Unable to read HCA caps in DevX mode."); > rte_errno = ENOTSUP; > goto error; > > } > cdev->config.devx = 1; > > } > DRV_LOG(DEBUG, "DevX is %ssupported.", cdev->config.devx ? "" : "NOT "); > > > > I deduced that following the above patch, the correct value for maximum Rx > and Tx descriptors will only be set if DevX is enabled (see the if > condition on cdev->config.devx). If it is disabled, then maximum Rx and Tx > descriptors will be 1, which will make the ports fail to start. Perhaps we > should keep the previous default value (65535) if config.devx == 0 (DevX > off)? This could be done like this, for example: > > diff --git a/drivers/net/mlx5/mlx5_ethdev.c > b/drivers/net/mlx5/mlx5_ethdev.c > index 7708a0b80883..8ba3eb4a32de 100644 > --- a/drivers/net/mlx5/mlx5_ethdev.c > +++ b/drivers/net/mlx5/mlx5_ethdev.c > @@ -359,10 +359,12 @@ mlx5_dev
net/mlx5: wrong Rx/Tx descriptor limits when DevX is off
Hello, I have run into a regression following an update to stable dpdk-24.11 with a number of my Mellanox cx4/5/6 nics. This regression occurs with all nics in my lab which have DevX disabled: using mstconfig utility, I can see the flag UCTX_EN is not set. Mainly, the issue is that the ports cannot be started, with the following error logs in the journal: Set nb_rxd=1 (asked=512) for port=0 Set nb_txd=1 (asked=512) for port=0 starting port 0 Initializing port 0 [7c:fe:90:65:e6:54] port 0: ntfp1 (mlx5_pci) nb_rxq=2 nb_txq=2 rxq0=c9 rxq1=c25 txq0=c9 txq1=c25 port 0: rx_scatter=0 tx_scatter=0 max_rx_frame=1526 mlx5_net: port 0 number of descriptors requested for Tx queue 0 must be higher than MLX5_TX_COMP_THRESH, using 33 instead of 1 mlx5_net: port 0 increased number of descriptors in Tx queue 0 to the next power of two (64) mlx5_net: port 0 number of descriptors requested for Tx queue 1 must be higher than MLX5_TX_COMP_THRESH, using 33 instead of 1 mlx5_net: port 0 increased number of descriptors in Tx queue 1 to the next power of two (64) mlx5_net: Port 0 Rx queue 0 CQ creation failure. mlx5_net: port 0 Rx queue allocation failed: Cannot allocate memory rte_eth_dev_start(port 0) failed, error=-12 Failed to start port 0, set link down Failed to start port 0 Looking more precisely into the problem, it appears that the number of Rx and Tx descriptors configured for my queues is 1. This happens because mlx5_dev_infos_get() return a limit of 1 for both Rx and Tx, which is unexpected. I identified this patch to be responsible for the regression: 4c3d7961d9002: net/mlx5: fix reported Rx/Tx descriptor limits https://git.dpdk.org/dpdk/commit/?id=4c3d7961d9002bb715a8ee76bcf464d633316d4c After doing some debugging, I noticed that hca_attr.log_max_wq_sz is never configured. This should be done in mlx5_devx_cmd_query_hca_attr() which is called in this bit of code: https://git.dpdk.org/dpdk/tree/drivers/common/mlx5/mlx5_common.c#n681 /* * When CTX is created by Verbs, query HCA attribute is unsupported. * When CTX is imported, we cannot know if it is created by DevX or * Verbs. So, we use query HCA attribute function to check it. */ if (cdev->config.devx || cdev->config.device_fd != MLX5_ARG_UNSET) { /* Query HCA attributes. */ ret = mlx5_devx_cmd_query_hca_attr(cdev->ctx, &cdev->config.hca_attr); if (ret) { DRV_LOG(ERR, "Unable to read HCA caps in DevX mode."); rte_errno = ENOTSUP; goto error; } cdev->config.devx = 1; } DRV_LOG(DEBUG, "DevX is %ssupported.", cdev->config.devx ? "" : "NOT "); I deduced that following the above patch, the correct value for maximum Rx and Tx descriptors will only be set if DevX is enabled (see the if condition on cdev->config.devx). If it is disabled, then maximum Rx and Tx descriptors will be 1, which will make the ports fail to start. Perhaps we should keep the previous default value (65535) if config.devx == 0 (DevX off)? 
This could be done like this, for example:

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 7708a0b80883..8ba3eb4a32de 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -359,10 +359,12 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	info->flow_type_rss_offloads = ~MLX5_RSS_HF_MASK;
 	mlx5_set_default_params(dev, info);
 	mlx5_set_txlimit_params(dev, info);
-	info->rx_desc_lim.nb_max =
-		1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
-	info->tx_desc_lim.nb_max =
-		1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
+	if (priv->sh->cdev->config.devx) {
+		info->rx_desc_lim.nb_max =
+			1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
+		info->tx_desc_lim.nb_max =
+			1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
+	}
 	if (priv->sh->cdev->config.hca_attr.mem_rq_rmp &&
 	    priv->obj_ops.rxq_obj_new == devx_obj_ops.rxq_obj_new)
 		info->dev_capa |= RTE_ETH_DEV_CAPA_RXQ_SHARE;

Thanks in advance for your help.

Regards,
Edwin Brossette.
Re: net/mlx5: wrong Rx/Tx descriptor limits when DevX is off
Hello, Thank you for your answer. The short patch I joined with my first mail was just a rough example to report what I tested. I believe you know the driver's code better than I do, so I wouldn't be opposed to see you fix this issue. Thank you in advance. Regards, Edwin Brossette. On Wed, Mar 5, 2025 at 10:17 AM Slava Ovsiienko wrote: > Hi, Edwin > > > > Thank you for the patch. > > You are quite right, “sh->cdev->config.hca_attr.log_max_wq_sz” is not set > if DevX is disengaged. > > I found some other places where the uninitialized “log_max_wq_sz” might be > used. > So. I’d rather prefer to configure the “log_max_wq_sz” for IBV case as > well, instead of just fixing mlx5_dev_infos_get(). > > > There is the property in “priv->sh->dev_cap.max_qp_wr”, it reflects the > max number of descriptors if rdma_core is used. > > Would you like to update your patch with this? Or would you prefer me to > do it ? > > With best regards, > Slava > > > > > > *From:* Edwin Brossette > *Sent:* Wednesday, February 12, 2025 4:34 PM > *To:* Slava Ovsiienko > *Cc:* Asaf Penso ; igooto...@gmail.com; Laurent Hardy < > laurent.ha...@6wind.com>; Olivier Matz ; Didier > Pallard ; Jean-Mickael Guerin ; > dev@dpdk.org > *Subject:* Re: net/mlx5: wrong Rx/Tx descriptor limits when DevX is off > > > > Hello, > > > > Sorry for bothering you again. > > May I inquire if this issue is still being worked on ? > > If so, when can I expect to see a fix ? > > > > Best regards, > > Edwin Brossette > > > > On Mon, Dec 23, 2024 at 2:09 PM Slava Ovsiienko > wrote: > > Confirm, it’s a bug, IIUC was introduced by reporting function update. > AFAIK, we do not test with non-DevX environment anymore, so missed this. > > Fix should be provided. > > > > With best regards, > > Slava > > > > *From:* Asaf Penso > *Sent:* Sunday, December 22, 2024 9:39 AM > *To:* igooto...@gmail.com; Slava Ovsiienko > *Cc:* Laurent Hardy ; Olivier Matz < > olivier.m...@6wind.com>; Didier Pallard ; > Jean-Mickael Guerin ; Edwin Brossette < > edwin.brosse...@6wind.com>; dev@dpdk.org > *Subject:* RE: net/mlx5: wrong Rx/Tx descriptor limits when DevX is off > > > > Hello Igor and Slava, > > Can you please check out this issue? > > > > Regards, > > Asaf Penso > > > > *From:* Edwin Brossette > *Sent:* Friday, 20 December 2024 19:06 > *To:* dev@dpdk.org > *Cc:* Laurent Hardy ; Olivier Matz < > olivier.m...@6wind.com>; Didier Pallard ; > Jean-Mickael Guerin > *Subject:* net/mlx5: wrong Rx/Tx descriptor limits when DevX is off > > > > Hello, > > I have run into a regression following an update to stable dpdk-24.11 with > a number of my Mellanox cx4/5/6 nics. This regression occurs with all nics > in my lab which have DevX disabled: using mstconfig utility, I can see the > flag UCTX_EN is not set. 
> > Mainly, the issue is that the ports cannot be started, with the following > error logs in the journal: > > Set nb_rxd=1 (asked=512) for port=0 > Set nb_txd=1 (asked=512) for port=0 > starting port 0 > Initializing port 0 [7c:fe:90:65:e6:54] > port 0: ntfp1 (mlx5_pci) > nb_rxq=2 nb_txq=2 > rxq0=c9 rxq1=c25 > txq0=c9 txq1=c25 > port 0: rx_scatter=0 tx_scatter=0 max_rx_frame=1526 > mlx5_net: port 0 number of descriptors requested for Tx queue 0 must be > higher than MLX5_TX_COMP_THRESH, using 33 instead of 1 > mlx5_net: port 0 increased number of descriptors in Tx queue 0 to the next > power of two (64) > mlx5_net: port 0 number of descriptors requested for Tx queue 1 must be > higher than MLX5_TX_COMP_THRESH, using 33 instead of 1 > mlx5_net: port 0 increased number of descriptors in Tx queue 1 to the next > power of two (64) > mlx5_net: Port 0 Rx queue 0 CQ creation failure. > mlx5_net: port 0 Rx queue allocation failed: Cannot allocate memory > rte_eth_dev_start(port 0) failed, error=-12 > Failed to start port 0, set link down > Failed to start port 0 > > > Looking more precisely into the problem, it appears that the number of Rx > and Tx descriptors configured for my queues is 1. This happens because > mlx5_dev_infos_get() return a limit of 1 for both Rx and Tx, which is > unexpected. I identified this patch to be responsible for the regression: > > 4c3d7961d9002: net/mlx5: fix reported Rx/Tx descriptor limits > > https://git.dpdk.org/dpdk/commit/?id=4c3d7961d9002bb715a8ee76bcf464d633316d4c > > After doing some debugging, I noticed that hca_attr.log_max_wq_sz is never > configured. This should be done in mlx5_devx_cmd_query_hca_attr() which is > called in this bit of code: > >
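Separately, as a rough illustration of the alternative Slava suggests earlier in this thread (falling back to priv->sh->dev_cap.max_qp_wr when DevX is off), an untested sketch of mlx5_dev_infos_get() could look like the fragment below. It relies only on the field names quoted in this thread and is not a submitted patch:

	if (priv->sh->cdev->config.devx) {
		/* DevX case: limit taken from the queried HCA capabilities */
		info->rx_desc_lim.nb_max =
			1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
		info->tx_desc_lim.nb_max =
			1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
	} else {
		/* rdma-core (IBV) case: use the limit reported by the device */
		info->rx_desc_lim.nb_max = priv->sh->dev_cap.max_qp_wr;
		info->tx_desc_lim.nb_max = priv->sh->dev_cap.max_qp_wr;
	}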
[PATCH 3/5] Revert "net/qede: fix maximum Rx packet length"
From: Edwin Brossette This reverts commit d8ded501e05ce879f27f0ed1df7721a88b737e25. The maximum length for Rx packets computed in qede_rx_queue_setup() takes the Ethernet CRC into account, which is not consistent with the value computed in qede_set_mtu(). RTE_ETHER_CRC_LEN should not be added to max_rx_pktlen, as the HW does not include the CRC in received frames passed to the host. The original commit tried to fix another bug with this inappropriate patch: packets with a size nearing the MTU limit were being dropped. This was not because the CRC length was unaccounted for in the Rx buffer size, but because of the flooring applied to it: the computed Rx buffer size is lower than expected because we try to align it. This issue will be fixed in the following patch. CC: sta...@dpdk.org Signed-off-by: Edwin Brossette Acked-by: Didier Pallard --- drivers/net/qede/qede_rxtx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/qede/qede_rxtx.c b/drivers/net/qede/qede_rxtx.c index 601fcb30b357..fe839a6ba844 100644 --- a/drivers/net/qede/qede_rxtx.c +++ b/drivers/net/qede/qede_rxtx.c @@ -235,7 +235,7 @@ qede_rx_queue_setup(struct rte_eth_dev *dev, uint16_t qid, dev->data->rx_queues[qid] = NULL; } - max_rx_pktlen = dev->data->mtu + RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN; + max_rx_pktlen = dev->data->mtu + RTE_ETHER_HDR_LEN; /* Fix up RX buffer size */ bufsz = (uint16_t)rte_pktmbuf_data_room_size(mp) - RTE_PKTMBUF_HEADROOM; -- 2.35.0.4.g44a5d4affccf
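For context, a small illustration of the frame-length arithmetic this revert restores, for the standard 1500-byte MTU (RTE_ETHER_HDR_LEN is 14 and RTE_ETHER_CRC_LEN is 4 in rte_ether.h; the printed values are indicative only):

#include <stdio.h>
#include <rte_ether.h>

/* Illustration only: frame length used by qede_rx_queue_setup() for a
 * 1500-byte MTU, before and after this revert. The HW strips the CRC,
 * so the 4 extra bytes were never needed on the host side. */
int main(void)
{
	unsigned int mtu = 1500;

	printf("before revert: %u\n", mtu + RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN); /* 1518 */
	printf("after revert:  %u\n", mtu + RTE_ETHER_HDR_LEN);                      /* 1514 */
	return 0;
}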
QEDE bug report
Hello, I have several issues to report concerning the qede pmd as well as potential solutions for them. Most of them have to do with configuring the MTU. == Abort on mtu change === First, qede_assign_rxtx_handlers() seems to work incorrectly since an API change in the rte_eth lib. The commit linked below changes the way packet receive and transmit functions are handled: https://git.dpdk.org/dpdk/commit/?id=c87d435a4d79739c0cec2ed280b94b41cb908af7 Originally, the Rx/Tx handlers were located in the rte_eth_dev struct. This is no longer the case and the polling of incoming packets is done with functions registered in rte_eth_fp_ops instead. The rte lib change is supposed to be transparent for the individual pmds, but the polling functions in rte_eth_dev are only synchronized with the ones in rte_eth_fp_ops at device start. This leads to an issue when trying to configure the MTU while there is ongoing traffic: -> Trying to change the MTU triggers a port restart. -> qede_assign_rxtx_handlers() assigns dummy polling functions to dev->rx_pkt_burst and dev->tx_pkt_burst while the port is down. -> However, rte_eth_rx_burst() polls in &rte_eth_fp_ops[port_id].rx_pkt_burst, which still points to qede_recv_pkts_regular(). -> The application keeps polling packets in the receive function and triggers an assert(rx_mb != NULL), which causes an abort. Since rte_eth_fp_ops is reset in rte_eth_dev_stop(), it may be better to call this function instead of qede_dev_stop(). However, the dummy functions defined in lib/ethdev/ethdev_private.c log an error and dump stack when called, so they might not be intended to be used this way. The way I fixed this issue in our applications is by forcing a complete stop of the port before configuring the MTU (a rough sketch of this application-side workaround is shown after the reproduction below). I have no DPDK patch to suggest for this. -- Reproduction -- 1) Start testpmd: dpdk-testpmd --log-level=pmd.net.qede.driver:7 -a :17:00.0 -a :17:00.1 -- -i --rxq=2 --txq=2 --coremask=0x0c --total-num-mbufs=25 2) Start packet forwarding: start io packet forwarding - ports=2 - cores=2 - streams=4 - NUMA support enabled, MP allocation mode: native Logical Core 2 (socket 0) forwards packets on 2 streams: RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01 RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00 Logical Core 3 (socket 1) forwards packets on 2 streams: RX P=0/Q=1 (socket 0) -> TX P=1/Q=1 (socket 0) peer=02:00:00:00:00:01 RX P=1/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00 io packet forwarding packets/burst=32 nb forwarding cores=2 - nb forwarding ports=2 port 0: RX queue number: 2 Tx queue number: 2 Rx offloads=0x8 Tx offloads=0x0 RX queue: 0 RX desc=0 - RX free threshold=0 RX threshold registers: pthresh=0 hthresh=0 wthresh=0 RX Offloads=0x8 TX queue: 0 TX desc=0 - TX free threshold=0 TX threshold registers: pthresh=0 hthresh=0 wthresh=0 TX offloads=0x8000 - TX RS bit threshold=0 port 1: RX queue number: 2 Tx queue number: 2 Rx offloads=0x8 Tx offloads=0x0 RX queue: 0 RX desc=0 - RX free threshold=0 RX threshold registers: pthresh=0 hthresh=0 wthresh=0 RX Offloads=0x8 TX queue: 0 TX desc=0 - TX free threshold=0 TX threshold registers: pthresh=0 hthresh=0 wthresh=0 TX offloads=0x8000 - TX RS bit threshold=0 3) Send a continuous stream of packets and change the mtu while they are being forwarded: p = Ether()/IP(src='10.100.0.1', dst='10.200.0.1')/UDP(sport=1, dport=2)/Raw(load='A'*100) sendp(p, iface='ntfp1', count=500, inter=0.001) port config mtu 0 1500 
[qede_dev_set_link_state:1842(17:00.0:dpdk-port-0)]setting link state 0 [qede_link_update:1483(17:00.0:dpdk-port-0)]Link - Speed 0 Mode 1 AN 1 Status 0 [ecore_int_attentions:1239(17:00.0:dpdk-port-0-0)]MFW indication via attention [qede_link_update:1483(17:00.0:dpdk-port-0)]Link - Speed 0 Mode 1 AN 1 Status 0 [qede_activate_vport:536(17:00.0:dpdk-port-0)]vport is deactivated [qede_tx_queue_reset:509(17:00.0:dpdk-port-0)]Reset TX queue 0 [qede_tx_queue_stop:991(17:00.0:dpdk-port-0)]TX queue 0 stopped [qede_tx_queue_reset:509(17:00.0:dpdk-port-0)]Reset TX queue 1 [qede_tx_queue_stop:991(17:00.0:dpdk-port-0)]TX queue 1 stopped [qede_rx_queue_reset:293(17:00.0:dpdk-port-0)]Reset RX queue 0 [qede_rx_queue_stop:371(17:00.0:dpdk-port-0)]RX queue 0 stopped [qede_rx_queue_reset:293(17:00.0:dpdk-port-0)]Reset RX queue 1 [qede_rx_queue_stop:371(17:00.0:dpdk-port-0)]RX queue 1 stopped dpdk-testpmd: ../../../source/dpdk-24.11/drivers/net/qede/qede_rxtx.c:1600: qede_recv_pkts_regular: Assertion `rx_mb != NULL' failed. Aborted (core dumped) <=== As you can see, the application is aborted. = Bad Rx buffer size for mbufs === Another issue I had when trying to set the MTU was that I got bad sanity checks on mbufs when sending large packets, cau
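Coming back to the MTU-change abort above, a minimal sketch of the application-side workaround mentioned in that report (fully stopping the port before touching the MTU) could look like the following. The helper name is made up for illustration, and the application must of course also make sure no datapath thread is still calling rte_eth_rx_burst()/rte_eth_tx_burst() on that port while this runs:

#include <rte_ethdev.h>

/* Hypothetical helper: stop the port, change the MTU, then restart it,
 * so the restart triggered by the MTU change never races with polling. */
static int
port_set_mtu_stopped(uint16_t port_id, uint16_t mtu)
{
	int ret;

	ret = rte_eth_dev_stop(port_id);
	if (ret != 0)
		return ret;
	ret = rte_eth_dev_set_mtu(port_id, mtu);
	if (ret != 0)
		return ret;
	return rte_eth_dev_start(port_id);
}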
[PATCH 5/5] net/qede: fix rx_buf_size calculation
From: Edwin Brossette When the configured MTU is lower than the maximum mbuf size (all packet data can be stored in a single mbuf), the Rx buffer size is configured as MTU + some overhead. A flooring is applied to this value to align it, meaning its actual value is going to be lower than expected. This is a considerable design flaw, because a data packet fitting exactly the MTU might not fit in the Rx buffer. What is observed in this case is that the NIC splits the datagram into two segments, essentially doing Rx scatter. However, the qede pmd does not expect scatter-gather in this case and does not use the required function: it uses qede_recv_pkts_regular(), which is not capable of handling this case. Thus, we end up with a malformed packet with m->nb_segs = 2 and m->next = NULL, meaning the last part of the data is missing. Ceil the Rx buffer size to the cache-line size instead of flooring it. Also change the max_rx_pktlen > bufsz check in qede_rx_queue_setup(): if the ceiling would make max_rx_pktlen exceed the mbuf size, force-enable scatter-gather mode. Fixes: 318d7da3122b ("net/qede: fix Rx buffer size calculation") CC: sta...@dpdk.org Signed-off-by: Edwin Brossette Acked-by: Didier Pallard --- drivers/net/qede/qede_rxtx.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/qede/qede_rxtx.c b/drivers/net/qede/qede_rxtx.c index ea77f09e18be..2a6f1290ad4a 100644 --- a/drivers/net/qede/qede_rxtx.c +++ b/drivers/net/qede/qede_rxtx.c @@ -111,7 +111,7 @@ qede_calc_rx_buf_size(struct rte_eth_dev *dev, uint16_t mbufsz, rx_buf_size = max_frame_size; /* Align to cache-line size if needed */ - return QEDE_FLOOR_TO_CACHE_LINE_SIZE(rx_buf_size); + return QEDE_CEIL_TO_CACHE_LINE_SIZE(rx_buf_size); } static struct qede_rx_queue * @@ -237,7 +237,7 @@ qede_rx_queue_setup(struct rte_eth_dev *dev, uint16_t qid, /* cache align the mbuf size to simplify rx_buf_size calculation */ bufsz = QEDE_FLOOR_TO_CACHE_LINE_SIZE(bufsz); if ((rxmode->offloads & RTE_ETH_RX_OFFLOAD_SCATTER) || - max_rx_pktlen > bufsz) { + QEDE_CEIL_TO_CACHE_LINE_SIZE(max_rx_pktlen) > bufsz) { if (!dev->data->scattered_rx) { DP_INFO(edev, "Forcing scatter-gather mode\n"); dev->data->scattered_rx = 1; -- 2.35.0.4.g44a5d4affccf
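To make the flooring problem concrete, here is a small illustration using the generic RTE_ALIGN_FLOOR/RTE_ALIGN_CEIL macros, assuming the qede macros align to the cache-line size as their names suggest; 1526 is the max_rx_frame value seen in the bug report above, and the commented results assume a 64-byte cache line:

#include <stdio.h>
#include <rte_common.h>

int main(void)
{
	unsigned int needed = 1526; /* frame size the Rx buffer must hold */

	/* Old behaviour: floor to the cache line -> buffer too small,
	 * the NIC splits the frame into two segments. */
	printf("floored: %u\n", RTE_ALIGN_FLOOR(needed, RTE_CACHE_LINE_SIZE)); /* 1472 */
	/* New behaviour: ceil to the cache line -> buffer still fits the frame. */
	printf("ceiled:  %u\n", RTE_ALIGN_CEIL(needed, RTE_CACHE_LINE_SIZE));  /* 1536 */
	return 0;
}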
[PATCH 4/5] net/qede: fix QEDE_ETH_OVERHEAD being counted twice in rx_buf_size
From: Edwin Brossette rx_buf_size is computed in two different places: in qede_rx_queue_setup() and in qede_set_mtu(). In qede_rx_queue_setup(), it is initialized with mtu + RTE_ETHER_HDR_LEN, and QEDE_ETH_OVERHEAD is added to it in qede_calc_rx_buf_size(). In qede_set_mtu(), it is initialized with mtu + RTE_ETHER_HDR_LEN + QEDE_ETH_OVERHEAD, and QEDE_ETH_OVERHEAD is added to it a second time in qede_calc_rx_buf_size(). This is inconsistent and wrong in the case of qede_set_mtu(). Initialize this variable with mtu + QEDE_MAX_ETHER_HDR_LEN instead and stop adding QEDE_ETH_OVERHEAD again in qede_calc_rx_buf_size(). This also factorizes the code. Fixes: 318d7da3122b ("net/qede: fix Rx buffer size calculation") CC: sta...@dpdk.org Signed-off-by: Edwin Brossette Acked-by: Didier Pallard --- drivers/net/qede/qede_rxtx.c | 17 ++--- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/drivers/net/qede/qede_rxtx.c b/drivers/net/qede/qede_rxtx.c index fe839a6ba844..ea77f09e18be 100644 --- a/drivers/net/qede/qede_rxtx.c +++ b/drivers/net/qede/qede_rxtx.c @@ -101,19 +101,14 @@ qede_calc_rx_buf_size(struct rte_eth_dev *dev, uint16_t mbufsz, * buffers can be used for single packet. So need to make sure * mbuf size is sufficient enough for this. */ - if ((mbufsz * ETH_RX_MAX_BUFF_PER_PKT) < -(max_frame_size + QEDE_ETH_OVERHEAD)) { + if ((mbufsz * ETH_RX_MAX_BUFF_PER_PKT) < max_frame_size) { DP_ERR(edev, "mbuf %d size is not enough to hold max fragments (%d) for max rx packet length (%d)\n", mbufsz, ETH_RX_MAX_BUFF_PER_PKT, max_frame_size); return -EINVAL; } - - rx_buf_size = RTE_MAX(mbufsz, - (max_frame_size + QEDE_ETH_OVERHEAD) / - ETH_RX_MAX_BUFF_PER_PKT); - } else { - rx_buf_size = max_frame_size + QEDE_ETH_OVERHEAD; - } + rx_buf_size = mbufsz; + } else + rx_buf_size = max_frame_size; /* Align to cache-line size if needed */ return QEDE_FLOOR_TO_CACHE_LINE_SIZE(rx_buf_size); @@ -235,14 +230,14 @@ qede_rx_queue_setup(struct rte_eth_dev *dev, uint16_t qid, dev->data->rx_queues[qid] = NULL; } - max_rx_pktlen = dev->data->mtu + RTE_ETHER_HDR_LEN; + max_rx_pktlen = dev->data->mtu + QEDE_MAX_ETHER_HDR_LEN; /* Fix up RX buffer size */ bufsz = (uint16_t)rte_pktmbuf_data_room_size(mp) - RTE_PKTMBUF_HEADROOM; /* cache align the mbuf size to simplify rx_buf_size calculation */ bufsz = QEDE_FLOOR_TO_CACHE_LINE_SIZE(bufsz); if ((rxmode->offloads & RTE_ETH_RX_OFFLOAD_SCATTER) || - (max_rx_pktlen + QEDE_ETH_OVERHEAD) > bufsz) { + max_rx_pktlen > bufsz) { if (!dev->data->scattered_rx) { DP_INFO(edev, "Forcing scatter-gather mode\n"); dev->data->scattered_rx = 1; -- 2.35.0.4.g44a5d4affccf
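A symbolic sketch of the inconsistency described above; QEDE_ETH_OVERHEAD's exact value is not quoted here, so it is left as a parameter rather than hard-coded:

#include <stdint.h>
#include <rte_ether.h>

/* Frame size each path ended up using before this patch, for the same MTU.
 * 'overhead' stands in for QEDE_ETH_OVERHEAD. */
static uint32_t
old_frame_via_queue_setup(uint32_t mtu, uint32_t overhead)
{
	/* qede_rx_queue_setup(): overhead added once, in qede_calc_rx_buf_size() */
	return mtu + RTE_ETHER_HDR_LEN + overhead;
}

static uint32_t
old_frame_via_set_mtu(uint32_t mtu, uint32_t overhead)
{
	/* qede_set_mtu(): overhead already included, then added again in
	 * qede_calc_rx_buf_size() -> counted twice */
	return (mtu + RTE_ETHER_HDR_LEN + overhead) + overhead;
}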
[PATCH 1/5] qede: fix tunnel checksums offload flags
From: Didier Pallard In the tunnel case, a bad outer L3 checksum properly sets RTE_MBUF_F_RX_OUTER_IP_CKSUM_BAD, while all other checksum results are reported in the inner part of the offload flags. This can cause both the L4 BAD and GOOD flags to be set in the inner offloads when a tunnel packet is processed, turning these flags into RTE_MBUF_F_RX_L4_CKSUM_NONE instead of a GOOD or BAD value. This in turn can cause upper layers to take an incorrect decision on what to do with the packet. Report the outer L4 checksum status with the RTE_MBUF_F_RX_OUTER_L4_CKSUM_* flags instead, and remove the IP_CKSUM_GOOD flag on the outer IP layer, since there is currently no way to indicate that this checksum is good using DPDK offload flags. Fixes: 81f8804992c9 ("net/qede: enhance Rx CPU utilization") Fixes: 3d4bb4411683 ("net/qede: add fastpath support for VXLAN tunneling") CC: sta...@dpdk.org Signed-off-by: Edwin Brossette Signed-off-by: Didier Pallard Acked-by: Olivier Matz --- drivers/net/qede/qede_rxtx.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/drivers/net/qede/qede_rxtx.c b/drivers/net/qede/qede_rxtx.c index 25e28fd9f61b..c764e3d83763 100644 --- a/drivers/net/qede/qede_rxtx.c +++ b/drivers/net/qede/qede_rxtx.c @@ -1617,9 +1617,9 @@ qede_recv_pkts_regular(void *p_rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) "L4 csum failed, flags = 0x%x", parse_flag); rxq->rx_hw_errors++; - ol_flags |= RTE_MBUF_F_RX_L4_CKSUM_BAD; + ol_flags |= RTE_MBUF_F_RX_OUTER_L4_CKSUM_BAD; } else { - ol_flags |= RTE_MBUF_F_RX_L4_CKSUM_GOOD; + ol_flags |= RTE_MBUF_F_RX_OUTER_L4_CKSUM_GOOD; } if (unlikely(qede_check_tunn_csum_l3(parse_flag))) { @@ -1628,8 +1628,6 @@ qede_recv_pkts_regular(void *p_rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) parse_flag); rxq->rx_hw_errors++; ol_flags |= RTE_MBUF_F_RX_OUTER_IP_CKSUM_BAD; - } else { - ol_flags |= RTE_MBUF_F_RX_IP_CKSUM_GOOD; } flags = fp_cqe->tunnel_pars_flags.flags; @@ -1887,9 +1885,9 @@ qede_recv_pkts(void *p_rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) "L4 csum failed, flags = 0x%x", parse_flag); rxq->rx_hw_errors++; - ol_flags |= RTE_MBUF_F_RX_L4_CKSUM_BAD; + ol_flags |= RTE_MBUF_F_RX_OUTER_L4_CKSUM_BAD; } else { - ol_flags |= RTE_MBUF_F_RX_L4_CKSUM_GOOD; + ol_flags |= RTE_MBUF_F_RX_OUTER_L4_CKSUM_GOOD; } if (unlikely(qede_check_tunn_csum_l3(parse_flag))) { @@ -1898,8 +1896,6 @@ qede_recv_pkts(void *p_rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) parse_flag); rxq->rx_hw_errors++; ol_flags |= RTE_MBUF_F_RX_OUTER_IP_CKSUM_BAD; - } else { - ol_flags |= RTE_MBUF_F_RX_IP_CKSUM_GOOD; } if (tpa_start_flg) -- 2.35.0.4.g44a5d4affccf
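For reference, a simplified sketch of how an application might read the flags once the outer and inner L4 statuses are kept in their own bit fields. The helper name is illustrative, and the handling of the UNKNOWN/NONE cases is application policy:

#include <stdbool.h>
#include <rte_mbuf.h>

/* Returns false when either the outer or the inner L4 checksum is
 * reported bad. With both statuses in separate fields, the inner field
 * can no longer collapse to RTE_MBUF_F_RX_L4_CKSUM_NONE (GOOD | BAD)
 * just because the outer check disagreed with the inner one. */
static bool
rx_l4_csum_ok(const struct rte_mbuf *m)
{
	uint64_t inner = m->ol_flags & RTE_MBUF_F_RX_L4_CKSUM_MASK;
	uint64_t outer = m->ol_flags & RTE_MBUF_F_RX_OUTER_L4_CKSUM_MASK;

	if (outer == RTE_MBUF_F_RX_OUTER_L4_CKSUM_BAD)
		return false;
	return inner != RTE_MBUF_F_RX_L4_CKSUM_BAD;
}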
[PATCH 2/5] net/qede: fix bad sanity check on Rx queue release
From: Edwin Brossette As per the rte_mbuf API, the driver is responsible for initializing all the required fields. This is not done at qede allocation time, meaning there can be garbage data in the mbufs' memory, although this garbage data should be overwritten when the mbufs are used. Since a sanity check is done when freeing the queues, it's possible that some remaining garbage data causes a panic when trying to release the queues while some mbufs are being processed. Use rte_mbuf_raw_free() instead of rte_pktmbuf_free(), as its sanity check is more relaxed. Fixes: 2ea6f76aff40 ("qede: add core driver") CC: sta...@dpdk.org Signed-off-by: Edwin Brossette Acked-by: Didier Pallard --- drivers/net/qede/qede_rxtx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/qede/qede_rxtx.c b/drivers/net/qede/qede_rxtx.c index c764e3d83763..601fcb30b357 100644 --- a/drivers/net/qede/qede_rxtx.c +++ b/drivers/net/qede/qede_rxtx.c @@ -305,7 +305,7 @@ static void qede_rx_queue_release_mbufs(struct qede_rx_queue *rxq) if (rxq->sw_rx_ring) { for (i = 0; i < rxq->nb_rx_desc; i++) { if (rxq->sw_rx_ring[i]) { - rte_pktmbuf_free(rxq->sw_rx_ring[i]); + rte_mbuf_raw_free(rxq->sw_rx_ring[i]); rxq->sw_rx_ring[i] = NULL; } } -- 2.35.0.4.g44a5d4affccf