net/bnxt: wrong link status when lsc_intr is used

2023-01-20 Thread Edwin Brossette
: dev->data->dev_conf.intr_conf=1
bnxt_mq_rx_configure(): pools = 1 nb_q_per_grp = 2
bnxt_mq_rx_configure(): rxq[0] = 0x106200280 vnic[0] = 0x100200080
bnxt_mq_rx_configure(): rxq[1] = 0x105f10e40 vnic[0] = 0x100200080
bnxt_setup_one_vnic(): vnic[0] = 0x100200080 vnic->fw_grp_ids = 0x105f07e00
bnxt_hwrm_vnic_alloc(): Alloc VNIC. Start 0, End 2
bnxt_hwrm_vnic_alloc(): VNIC ID 3
bnxt_setup_one_vnic(): rxq[0]->vnic=0x100200080 vnic->fw_grp_ids=0x105f07e00
bnxt_setup_one_vnic(): rxq[1]->vnic=0x100200080 vnic->fw_grp_ids=0x105f07e00
bnxt_setup_one_vnic(): vnic->rx_queue_cnt = 2
bnxt_hwrm_port_phy_qcfg(): Link Speed:0,Auto:4:64:140,Support:140,Force:0
bnxt_hwrm_port_phy_qcfg(): Link Signal:0,PAM::Auto:0,Support:0,Force:0
bnxt_hwrm_port_phy_qcfg(): Link Speed:0,Auto:4:64:140,Support:140,Force:0
bnxt_hwrm_port_phy_qcfg(): Link Signal:0,PAM::Auto:0,Support:0,Force:0
bnxt_ulp_port_init(): Skip ulp init for port: 1, TF is not enabled
bnxt_receive_function(): Using SSE vector mode receive for port 1
bnxt_transmit_function(): Using SSE vector mode transmit for port 1
Port 1: 00:0A:F7:B6:E3:D1

Here, we can see that lsc interrupts are enabled even though we specified
not to enable them. Then, since autoneg does not work on my nic, I can try
setting the link up and showing the port info:

testpmd> set link-up port 0
bnxt_print_link_info(): Port 0 Link Up - speed 0 Mbps - half-duplex

testpmd> show port info 0

* Infos for port 0  *
MAC address: 00:0A:F7:B6:E3:D0
Device name: :02:00.0
Driver name: net_bnxt
Firmware-version: 223.0.161.0
Devargs:
Connect to socket: 0
memory allocation on the socket: 0
Link status: up
Link speed: None
Link duplex: half-duplex
Autoneg status: Off
MTU: 1500
Promiscuous mode: enabled
Allmulticast mode: disabled
Maximum number of MAC addresses: 128
Maximum number of MAC addresses of hash filtering: 0
VLAN offload:
  strip off, filter off, extend off, qinq strip off
Hash key size in bytes: 40
Redirection table size: 128
Supported RSS offload flow types:
  ipv4  ipv4-tcp  ipv4-udp  ipv6  ipv6-tcp  ipv6-udp
  user-defined-50  user-defined-51
Minimum size of RX buffer: 1
Maximum configurable length of RX packet: 9600
Maximum configurable size of LRO aggregated packet: 0
Maximum number of VMDq pools: 64
Current number of RX queues: 2
Max possible RX queues: 117
Max possible number of RXDs per queue: 8192
Min possible number of RXDs per queue: 16
RXDs number alignment: 1
Current number of TX queues: 2
Max possible TX queues: 117
Max possible number of TXDs per queue: 4096
Min possible number of TXDs per queue: 16
TXDs number alignment: 1
Max segment number per packet: 65535
Max segment number per MTU/TSO: 65535
Device capabilities: 0x3( RUNTIME_RX_QUEUE_SETUP RUNTIME_TX_QUEUE_SETUP )
Switch name: :02:00.0
Switch domain Id: 0
Switch Port Id: 32768
Device error handling mode: proactive

This shows the link status is seen as up, even though the link speed is None.

I was wondering whether patching the code to move this line, which turns
lsc interrupts on, somewhere else would be reasonable, or if this could
cause further trouble. Maybe having a parameter to toggle it ON/OFF would
be a good addition.
May I have your opinion on this matter?

Sincerely,
Edwin Brossette


[PATCH] bnxt: fix unwanted interrupt config on link state change

2023-02-06 Thread edwin . brossette
From: Edwin Brossette 

When getting the device's info via bnxt_dev_info_get_op(), the driver
enables interrupts on link state changes because of the following line:

  > eth_dev->data->dev_conf.intr_conf.lsc = 1;

Enabling this mode might not be wanted by the user.

The flag RTE_ETH_DEV_INTR_LSC can be used to inform the application
that lsc interrupts are supported. By checking this flag, the user can
decide whether or not to enable these interrupts.
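
For illustration only (not part of the patch), here is a minimal sketch of how
an application could make that decision; it assumes the dev_flags pointer
exposed through rte_eth_dev_info on recent DPDK versions, and error handling
is reduced to a bare minimum:

#include <rte_ethdev.h>

/* Hedged sketch: request lsc interrupts only when the PMD advertises them.
 * Assumes the dev_flags pointer exposed through rte_eth_dev_info.
 */
static int
configure_port(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_txq)
{
        struct rte_eth_dev_info dev_info;
        struct rte_eth_conf conf = {0};

        if (rte_eth_dev_info_get(port_id, &dev_info) != 0)
                return -1;

        /* Opt into lsc interrupts only if RTE_ETH_DEV_INTR_LSC is set. */
        if (*dev_info.dev_flags & RTE_ETH_DEV_INTR_LSC)
                conf.intr_conf.lsc = 1;

        return rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &conf);
}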

Since there is no reason for a function meant to report config to
actually modify it, remove this line. In addition, raise the dev_flag
associated with this capability at device initialization to show that the
device supports link state change interrupts.

Fixes: 7bc8e9a227cc ("net/bnxt: support async link notification")
Signed-off-by: Edwin Brossette 
---
 drivers/net/bnxt/bnxt_ethdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index b3de490d3667..753e86b4b2af 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -1017,7 +1017,6 @@ static int bnxt_dev_info_get_op(struct rte_eth_dev *eth_dev,
.tx_free_thresh = 32,
.tx_rs_thresh = 32,
};
-   eth_dev->data->dev_conf.intr_conf.lsc = 1;
 
dev_info->rx_desc_lim.nb_min = BNXT_MIN_RING_DESC;
dev_info->rx_desc_lim.nb_max = BNXT_MAX_RX_RING_DESC;
@@ -5859,6 +5858,7 @@ bnxt_dev_init(struct rte_eth_dev *eth_dev, void *params __rte_unused)
 
rte_eth_copy_pci_info(eth_dev, pci_dev);
eth_dev->data->dev_flags |= RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
+   eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC;
 
bp = eth_dev->data->dev_private;
 
-- 
2.35.0.4.g44a5d4affccf



Re: net/bnxt: wrong link status when lsc_intr is used

2023-02-08 Thread Edwin Brossette
> bnxt_hwrm_port_phy_qcfg(): Link Signal:0,PAM::Auto:0,Support:0,Force:0
> bnxt_hwrm_port_phy_qcfg(): Link Speed:0,Auto:4:64:140,Support:140,Force:0
> bnxt_hwrm_port_phy_qcfg(): Link Signal:0,PAM::Auto:0,Support:0,Force:0
>
[...]
>
bnxt_hwrm_port_phy_qcfg(): Link Speed:0,Auto:4:64:140,Support:140,Force:0
> bnxt_hwrm_port_phy_qcfg(): Link Signal:0,PAM::Auto:0,Support:0,Force:0
> Port 0 Link down
> bnxt_hwrm_port_phy_qcfg(): Link Speed:0,Auto:4:64:40,Support:140,Force:0
> bnxt_hwrm_port_phy_qcfg(): Link Signal:0,PAM::Auto:0,Support:0,Force:0
> Port 1 Link down
> Done
> testpmd> show port info 0
> bnxt_hwrm_port_phy_qcfg(): Link Speed:0,Auto:4:64:140,Support:140,Force:0
> bnxt_hwrm_port_phy_qcfg(): Link Signal:0,PAM::Auto:0,Support:0,Force:0
>
> * Infos for port 0  *
> MAC address: 00:0A:F7:B6:E3:D0
> Device name: :02:00.0
> Driver name: net_bnxt
> Firmware-version: 223.0.161.0
> Devargs:
> Connect to socket: 0
> memory allocation on the socket: 0
> Link status: down<==
> Link speed: None
> Link duplex: half-duplex
> Autoneg status: Off
> MTU: 1500
> Promiscuous mode: enabled
> Allmulticast mode: disabled
> Maximum number of MAC addresses: 128
> Maximum number of MAC addresses of hash filtering: 0
> VLAN offload:
>   strip off, filter off, extend off, qinq strip off
> Hash key size in bytes: 40
> Redirection table size: 128
> Supported RSS offload flow types:
>   ipv4  ipv4-tcp  ipv4-udp  ipv6  ipv6-tcp  ipv6-udp
>   user-defined-50  user-defined-51
> Minimum size of RX buffer: 1
> Maximum configurable length of RX packet: 9600
> Maximum configurable size of LRO aggregated packet: 0
> Maximum number of VMDq pools: 64
> Current number of RX queues: 2
> Max possible RX queues: 117
> Max possible number of RXDs per queue: 8192
> Min possible number of RXDs per queue: 16
> RXDs number alignment: 1
> Current number of TX queues: 2
> Max possible TX queues: 117
> Max possible number of TXDs per queue: 4096
> Min possible number of TXDs per queue: 16
> TXDs number alignment: 1
> Max segment number per packet: 65535
> Max segment number per MTU/TSO: 65535
> Device capabilities: 0x3( RUNTIME_RX_QUEUE_SETUP RUNTIME_TX_QUEUE_SETUP )
> Switch name: :02:00.0
> Switch domain Id: 0
> Switch Port Id: 32768
> Device error handling mode: proactive
> testpmd>
>

Here you can see that the link status appears as down when lsc interrupts
are turned off, which is the expected result.

Sincerely,
Edwin Brossette.

On Tue, Jan 31, 2023 at 6:08 AM Somnath Kotur 
wrote:

> On Thu, Jan 19, 2023 at 7:07 PM Edwin Brossette
>  wrote:
> >
> > Hello,
> >
> Hi Edwin,
>  Thanks for reaching out, here's my attempt at answering your questions
> > I am trying to operate a Broadcom BCM57414 2x10G nic using dpdk bnxt
> pmd. I use DPDK 22.11.
> > However, doing so I stumbled over a number of different issues. Mainly,
> using the dpdk rte_eth library, I don't seem to be able to correctly poll
> the link status: I expect my nic has a problem using autoneg to set the
> link speed/duplex, because these parameters remain unknown while autoneg is
> on. However, after trying to set link up, instead of showing the link state
> as down, I see the link being up, which is in truth not the case, as no
> packets can transit on the line and the switch at the other end sees it
> down.
> >
> > When searching around and trying to debug the code, I found the function
> bnxt_dev_info_get_op() sets my nic in interrupt mode:
> >
> > > eth_dev->data->dev_conf.intr_conf.lsc = 1;
> This was added long back in the driver code by this commit (as you can
> see 6+ yrs old :)) , so I believe at the time the intent was to get
> this link notification asynchronous
> and 'lsc' I believe was for link state change?  Please suggest an
> alternative place to do this or you may even post the patch yourself
> to help do this the right way
> commit 7bc8e9a227ccbc649c2d230f0ee92ee58b1cb955
> Author: Ajit Khaparde 
> Date:   Tue Oct 11 16:47:50 2016 -0500
>
> net/bnxt: support async link notification
>
> >
> > Which is a bit of a surprising thing to do for a function meant to get
> info.
> > Thus my card ends up working in a mode I didn't configure it to, which
> may be the cause of my issue: later when setting the link up in function
> bnxt_dev_set_link_up_op():
> >
> > >  if (!bp->link_info->link_up)
> > > rc = bnxt_set_hwrm_link_config(bp, true);
> > >  if (!rc)
> > > eth_dev->data->dev_link.link_status = 1;
> >
> > So link_status in eth_dev g

[ixgbevf] Problem with RSS initial config after device init on X550 nic

2024-02-15 Thread Edwin Brossette
Hello,

We recently ran into an issue with our product when working with an X550
nic with stable dpdk-23.11. We observed that all the incoming traffic was
directed only into a single queue.

The issue became evident after displaying the RSS reta which was fully
zeroed after device init, thus directing all traffic to rxq0. Moreover, RSS
hash key did not seem to be correctly initialized. Manually setting the
reta afterwards was enough to balance the incoming traffic between our
queues, which convinced me that the issue here was simply a matter of
correctly initializing the device on port start.

Looking into the pmd's code, I couldn't see any RSS configuration done on
the vf side at device startup, at least not in ixgbevf_dev_rx_init(). I've
seen that ixgbe_dev_mq_rx_configure() is called during the pf's init and
configures RSS to be handled by the vf if sriov is on, but this isn't
enough to fully configure RSS for the vf. (see code here:
https://git.dpdk.org/dpdk/tree/drivers/net/ixgbe/ixgbe_rxtx.c#n4644 )

I have also observed that the different models of nics using ixgbe do not
all handle RSS in the same way. For example, for nics of the 82599 series, it
is written in their datasheet that, for IOV mode: "— Note that RSS is not
supported in IOV mode since there is only a single RSS hash function in the
hardware." On the contrary, x550 nics have special registers to handle RSS
in VF, like VFRSSRK and VFRETA, for example. I believe the RSS config not
being initialized for X550 nics might come from a slight misunderstanding
on this part.

Therefore, I can suggest a patch to add a call to ixgbe_rss_configure()
somewhere in ixgbevf_dev_rx_init() specifically for this model of nic.
Despite this function being named ixgbe_xxx instead of ixgbevf_xxx, it will
do the correct RSS initialization in vf mode, because all functions that
access RSS-related registers, such as ixgbe_reta_reg_get() or
ixgbe_reta_size_get(), check whether the device is in vf or pf mode and
fetch the appropriate registers.

Here is a way to reproduce, on an X550 card:

Here are the nics I am using:
:08:00.0  ntfp1  ac:1f:6b:57:57:74  ixgbe    1x2.5 GT/s PCIe  1x2.5 GT/s PCIe  Intel Corporation Ethernet Connection X553 10 GbE SFP+
:08:00.1  ntfp2  ac:1f:6b:57:57:75  ixgbe    1x2.5 GT/s PCIe  1x2.5 GT/s PCIe  Intel Corporation Ethernet Connection X553 10 GbE SFP+
:08:10.0  eth0   d2:c4:fc:c5:c3:05  ixgbevf  1x2.5 GT/s PCIe  0x Unknown       Intel Corporation X553 Virtual Function
:08:10.2  eth1   e2:a8:68:09:20:29  ixgbevf  1x2.5 GT/s PCIe  0x Unknown       Intel Corporation X553 Virtual Function

1) Starting up dpdk-testpmd:

sudo dpdk-hugepages.py --setup 2G;
dpdk-devbind --bind=vfio-pci :08:10.0
dpdk-devbind --bind=vfio-pci :08:10.2

dpdk-testpmd -a :08:10.0 -a :08:10.2 -- -i --rxq=2 --txq=2
--coremask=0xff0 --total-num-mbufs=25
EAL: Detected CPU lcores: 12
EAL: Detected NUMA nodes: 1
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
EAL: Using IOMMU type 1 (Type 1)
EAL: Probe PCI driver: net_ixgbe_vf (8086:15c5) device: :08:10.0
(socket -1)
EAL: Probe PCI driver: net_ixgbe_vf (8086:15c5) device: :08:10.2
(socket -1)
Interactive-mode selected
previous number of forwarding cores 1 - changed to number of configured
cores 8
Warning: NUMA should be configured manually by using --port-numa-config and
--ring-numa-config parameters along with --numa.
testpmd: create a new mbuf pool : n=25, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: 02:09:C0:9E:09:75
Configuring Port 1 (socket 0)
Port 1: 02:09:C0:76:6D:4B
Checking link statuses...
Done
testpmd>

2) Display port info:

testpmd> show port info 0

* Infos for port 0  *
MAC address: 02:09:C0:B8:68:2F
Device name: :08:10.0
Driver name: net_ixgbe_vf
Firmware-version: not available
Devargs:
Connect to socket: 0
memory allocation on the socket: 0
Link status: down
Link speed: None
Link duplex: half-duplex
Autoneg status: On
MTU: 1500
Promiscuous mode: disabled
Allmulticast mode: disabled
Maximum number of MAC addresses: 128
Maximum number of MAC addresses of hash filtering: 4096
VLAN offload:
  strip off, filter off, extend off, qinq strip off
Hash key size in bytes: 40
Redirection table size: 64
Supported RSS offload flow types:
  ipv4  ipv4-tcp  ipv4-udp  ipv6  ipv6-tcp  ipv6-udp  ipv6-ex
  ipv6-tcp-ex  ipv6-udp-ex
Minimum size of RX buffer: 1024
Maximum configurable length of RX packet: 9728
Maximum configurable size of LRO aggregated packet: 0
Maximum number of VMDq pools: 64
Current number of RX queues: 2
Max possible RX queues: 4
Max possible number of RXDs per queue: 4096
Min possible number of RXDs per queue: 32
RXDs number alignment: 8
Current number of TX queues: 2
Max possible TX queues: 4
Max possible number of TXDs per queue: 4096
Min possible number of 

[PATCH] net/ixgbevf: fix RSS init for x550 nics

2024-02-15 Thread edwin . brossette
From: Edwin Brossette 

Different Intel nics with the ixgbe pmd do not handle RSS in the same
way when working with virtualization. While some nics like Intel 82599ES
only have a single RSS table in the device and leave all rss features to
be handled by the pf, some other nics like x550 let the vf handle RSS
features. This can lead to different behavior when rss is enabled
depending on the model of nic used.

In particular, it turns out that ixgbevf_dev_rx_init() does not initialize
rss parameters at device init, even if the multi-queue mode option is
set in the device configuration (i.e. RTE_ETH_MQ_RX_RSS is set). Note
that this issue went unnoticed until now, probably because some nics do
not really have support for RSS in virtualization mode.

Thus, depending on the nic used, we can find ourselves in a situation
where RSS is not configured despite being enabled. This will cause
serious performance issues because the RSS reta will be fully zeroed,
causing all packets to go only to the first queue and leaving all the
others empty.

By looking at ixgbe_reta_size_get(), we can see that only X550 nic
models have a non-zero reta size set in vf mode. Thus add a call to
ixgbe_rss_configure() for these cards in ixgbevf_dev_rx_init() if the
option to enable RSS is set.

Fixes: f4d1598ee14f ("ixgbevf: support RSS config on x550")
Signed-off-by: Edwin Brossette 
---
 drivers/net/ixgbe/ixgbe_rxtx.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 537aa2f68de8..0aa968f7e258 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -5873,6 +5873,25 @@ ixgbevf_dev_rx_init(struct rte_eth_dev *dev)
IXGBE_PSRTYPE_RQPL_SHIFT;
IXGBE_WRITE_REG(hw, IXGBE_VFPSRTYPE, psrtype);
 
+   /* Initialize the rss for x550_vf cards if enabled */
+   switch (hw->mac.type) {
+   case ixgbe_mac_X550_vf:
+   case ixgbe_mac_X550EM_x_vf:
+   case ixgbe_mac_X550EM_a_vf:
+   switch (dev->data->dev_conf.rxmode.mq_mode) {
+   case RTE_ETH_MQ_RX_RSS:
+   case RTE_ETH_MQ_RX_DCB_RSS:
+   case RTE_ETH_MQ_RX_VMDQ_RSS:
+   ixgbe_rss_configure(dev);
+   break;
+   default:
+   break;
+   }
+   break;
+   default:
+   break;
+   }
+
ixgbe_set_rx_function(dev);
 
return 0;
-- 
2.35.0.4.g44a5d4affccf



net/virtio: duplicated xstats

2023-11-24 Thread Edwin Brossette
Hello,

I noticed a small inconsistency in the virtio pmd's xstats.
The stat "rx_q0_errors" appears twice.
I also think the stats "rx_q0_packets", "rx_q0_bytes", "tx_q0_packets" and
"tx_q0_bytes" are duplicates of "rx_q0_good_packets", "rx_q0_good_bytes",
"tx_q0_good_packets" and "tx_q0_good_bytes"

I believe this issue probably appeared after this commit:

f30e69b41f94: ethdev: add device flag to bypass auto-filled queue xstats
http://scm.6wind.com/vendor/dpdk.org/dpdk/commit/?id=f30e69b41f949cd4a9afb6ff39de196e661708e2

From what I understand, the rxq0_error stat was originally reported by
librte_ethdev. However, changes were made so it is reported by the pmd instead.
The flag RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS was temporarily set to keep the
old behaviour so that every pmd could have time to adapt the change.
But it seems the flag was forgotten in the virtio pmd and as a result, some
stats are fetched at two different times when displaying xstats.

First in lib_rte_ethdev:
https://git.dpdk.org/dpdk/tree/lib/ethdev/rte_ethdev.c#n3266
(you can see the check on the RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS flag before
the snprintf on eth_dev_rxq_stats_strings[] )

And a second time in the virtio pmd:
https://git.dpdk.org/dpdk/tree/drivers/net/virtio/virtio_ethdev.c#n705 pmd
(see the snprintf on rte_virtio_rxq_stat_strings[] )

This problem can be reproduced on testpmd simply by displaying the xstats
on a port with the net_virtio driver:

Reproduction:
===

1) start dpdk-testpmd:

modprobe -a uio_pci_generic
dpdk-devbind -b uio_pci_generic 03:00.0
dpdk-devbind -b uio_pci_generic 04:00.0

dpdk-devbind -s

Network devices using DPDK-compatible driver

:03:00.0 'Virtio 1.0 network device 1041' drv=uio_pci_generic
unused=vfio-pci
:04:00.0 'Virtio 1.0 network device 1041' drv=uio_pci_generic
unused=vfio-pci
[...]

dpdk-testpmd -a :03:00.0 -a :04:00.0 -- -i --rxq=1 --txq=1
--coremask=0x4 --total-num-mbufs=25
EAL: Detected CPU lcores: 3
EAL: Detected NUMA nodes: 1
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
EAL: Probe PCI driver: net_virtio (1af4:1041) device: :03:00.0 (socket
-1)
EAL: Probe PCI driver: net_virtio (1af4:1041) device: :04:00.0 (socket
-1)
Interactive-mode selected
Warning: NUMA should be configured manually by using --port-numa-config and
--ring-numa-config parameters along with --numa.
testpmd: create a new mbuf pool : n=25, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: 52:54:00:B0:8F:88
Configuring Port 1 (socket 0)
Port 1: 52:54:00:EF:09:1F
Checking link statuses...
Done

2) port info:

show port info 0

* Infos for port 0  *
MAC address: 52:54:00:B0:8F:88
Device name: :03:00.0
Driver name: net_virtio
Firmware-version: not available
Devargs:
Connect to socket: 0
memory allocation on the socket: 0
Link status: up
Link speed: Unknown
Link duplex: full-duplex
Autoneg status: On
MTU: 1500
Promiscuous mode: enabled
Allmulticast mode: disabled
Maximum number of MAC addresses: 64
Maximum number of MAC addresses of hash filtering: 0
VLAN offload:
  strip off, filter off, extend off, qinq strip off
No RSS offload flow type is supported.
Minimum size of RX buffer: 64
Maximum configurable length of RX packet: 9728
Maximum configurable size of LRO aggregated packet: 0
Current number of RX queues: 1
Max possible RX queues: 1
Max possible number of RXDs per queue: 32768
Min possible number of RXDs per queue: 32
RXDs number alignment: 1
Current number of TX queues: 1
Max possible TX queues: 1
Max possible number of TXDs per queue: 32768
Min possible number of TXDs per queue: 32
TXDs number alignment: 1
Max segment number per packet: 65535
Max segment number per MTU/TSO: 65535
Device capabilities: 0x0( )
Device error handling mode: none
Device private info:
guest_features: 0x110af8020
vtnet_hdr_size: 12
use_vec: rx-0 tx-0
use_inorder: rx-0 tx-0
intr_lsc: 1
max_mtu: 9698
max_rx_pkt_len: 1530
max_queue_pairs: 1
req_guest_features: 0x805f10ef8028

3) show port xstats:

show port xstats 0
## NIC extended statistics for port 0
rx_good_packets: 0
tx_good_packets: 0
rx_good_bytes: 0
tx_good_bytes: 0
rx_missed_errors: 0
rx_errors: 0
tx_errors: 0
rx_mbuf_allocation_errors: 0
rx_q0_packets: 0
rx_q0_bytes: 0
rx_q0_errors: 0  <==
tx_q0_packets: 0
tx_q0_bytes: 0
rx_q0_good_packets: 0
rx_q0_good_bytes: 0
rx_q0_errors: 0  <==
rx_q0_multicast_packets: 0
rx_q0_broadcast_packets: 0
rx_q0_undersize_packets: 0
rx_q0_size_64_packets: 0
rx_q0_size_65_127_packets: 0
rx_q0_size_128_255_packets: 0
rx_q0_size_256_511_packets: 0
rx_q0_size_512_1023_packets: 0
rx_q0_size_1024_1518_packets: 0
rx_q0_size_1519_max_packets: 0
tx_q0_good_packets: 0
tx_q0_good_bytes: 0
tx_q0_multicast_packets: 0
tx_q0_br

Re: net/virtio: duplicated xstats

2023-11-24 Thread Edwin Brossette
Hello again,

The flag is already set during the device init, so it should be removed,
not added.
I can confirm removing it fixed my issue.

I will submit a patch for this bug.

Regards,
Edwin Brossette.

On Fri, Nov 24, 2023 at 11:39 AM Maxime Coquelin 
wrote:

> Hi Edwin,
>
> Thanks for reporting the issue.
>
> On 11/24/23 10:26, Olivier Matz wrote:
> > Fix Maxime's mail.
>
> Thanks Olivier.
>
> > On Fri, Nov 24, 2023 at 10:18:27AM +0100, Edwin Brossette wrote:
> >> Hello,
> >>
> >> I noticed a small inconsistency in the virtio pmd's xstats.
> >> The stat "rx_q0_errors" appears twice.
> >> I also think the stats "rx_q0_packets", "rx_q0_bytes", "tx_q0_packets"
> and
> >> "tx_q0_bytes" are duplicates of "rx_q0_good_packets",
> "rx_q0_good_bytes",
> >> "tx_q0_good_packets" and "tx_q0_good_bytes"
> >>
> >> I believe this issue probably appeared after this commit:
> >>
> >> f30e69b41f94: ethdev: add device flag to bypass auto-filled queue xstats
> >>
> http://scm.6wind.com/vendor/dpdk.org/dpdk/commit/?id=f30e69b41f949cd4a9afb6ff39de196e661708e2
>
> Adding Ferruh as he is the author of this commit.
>
> >>  From what I understand, the rxq0_error stat was originally reported by
> the
> >> librte. However, changes were made so it is reported by the pmd instead.
> >> The flag RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS was temporarily set to keep
> the
> >> old behaviour so that every pmd could have time to adapt the change.
> >> But it seems the flag was forgotten in the virtio pmd and as a result,
> some
> >> stats are fetched at two different times when displaying xstats.
>
> Have you tried adding this flag to Virtio PMD? Does it fixes the issue?
>
> >> First in lib_rte_ethdev:
> >> https://git.dpdk.org/dpdk/tree/lib/ethdev/rte_ethdev.c#n3266
> >> (you can see the check on the RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS flag
> before
> >> the snprintf on eth_dev_rxq_stats_strings[] )
> >>
> >> And a second time in the virtio pmd:
> >> https://git.dpdk.org/dpdk/tree/drivers/net/virtio/virtio_ethdev.c#n705
> pmd
> >> (see the snprintf on rte_virtio_rxq_stat_strings[] )
> >>
> >> This problem can be reproduced on testpmd simply by displaying the
> xstats
> >> on a port with the net_virtio driver:
> >>
> >> Reproduction:
> >> ===
> >>
> >>  1) start dpdk-testpmd:
> >>
> >> modprobe -a uio_pci_generic
> >> dpdk-devbind -b uio_pci_generic 03:00.0
> >> dpdk-devbind -b uio_pci_generic 04:00.0
> >>
> >> dpdk-devbind -s
> >>
> >> Network devices using DPDK-compatible driver
> >> 
> >> :03:00.0 'Virtio 1.0 network device 1041' drv=uio_pci_generic
> >> unused=vfio-pci
> >> :04:00.0 'Virtio 1.0 network device 1041' drv=uio_pci_generic
> >> unused=vfio-pci
> >> [...]
> >>
> >> dpdk-testpmd -a :03:00.0 -a :04:00.0 -- -i --rxq=1 --txq=1
> >> --coremask=0x4 --total-num-mbufs=25
> >> EAL: Detected CPU lcores: 3
> >> EAL: Detected NUMA nodes: 1
> >> EAL: Detected static linkage of DPDK
> >> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> >> EAL: Selected IOVA mode 'PA'
> >> EAL: VFIO support initialized
> >> EAL: Probe PCI driver: net_virtio (1af4:1041) device: :03:00.0
> (socket
> >> -1)
> >> EAL: Probe PCI driver: net_virtio (1af4:1041) device: :04:00.0
> (socket
> >> -1)
> >> Interactive-mode selected
> >> Warning: NUMA should be configured manually by using --port-numa-config
> and
> >> --ring-numa-config parameters along with --numa.
> >> testpmd: create a new mbuf pool : n=25, size=2176,
> socket=0
> >> testpmd: preferred mempool ops selected: ring_mp_mc
> >> Configuring Port 0 (socket 0)
> >> Port 0: 52:54:00:B0:8F:88
> >> Configuring Port 1 (socket 0)
> >> Port 1: 52:54:00:EF:09:1F
> >> Checking link statuses...
> >> Done
> >>
> >>  2) port info:
> >>
> >> show port info 0
> >>
> >> * Infos for port 0  *
> >> MAC address: 52:54:00:B0:8F:88
> >> Device name: :03:00.0
> >> Driver name: net_virtio
> >> Firmware-version: not available
> >> Devargs

[PATCH] net/virtio: fix duplicated rxq xstats

2023-11-24 Thread edwin . brossette
From: Edwin Brossette 

The flag RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS was temporarily set while
moving queue stats from 'struct rte_eth_stats' to the individual pmds,
as explained in commit f30e69b41f94 ("ethdev: add device flag to bypass
auto-filled queue xstats").

This flag was added so every pmd would keep its original behavior until
the change was implemented. However, this flag was not removed
afterwards in the virtio pmd and as a result, some queue stats are
displayed twice when trying to get them: once in lib_rte_ethdev, and a
second time in the virtio pmd.

Remove this flag so stats are printed only once.

Fixes: f30e69b41f94 ("ethdev: add device flag to bypass auto-filled queue xstats")
Cc: sta...@dpdk.org

Signed-off-by: Edwin Brossette 
---
 drivers/net/virtio/virtio_ethdev.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index c2c0a1a11137..517585740eeb 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1793,8 +1793,6 @@ virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features)
else
eth_dev->data->dev_flags &= ~RTE_ETH_DEV_INTR_LSC;
 
-   eth_dev->data->dev_flags |= RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
-
/* Setting up rx_header size for the device */
if (virtio_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF) ||
virtio_with_feature(hw, VIRTIO_F_VERSION_1) ||
-- 
2.35.0.4.g44a5d4affccf



net/e1000: Test issues following change in max rx/tx queues

2023-11-30 Thread Edwin Brossette
Hello,

We recently started to observe concerning behaviour on our continuous
integration platform following this commit, which set the maximum number
of tx/rx queues to 2 for the e1000 pmd:

net/e1000: fix queue number initialization
https://git.dpdk.org/dpdk/commit/?id=c1a42d646472fd3477429bf016f682e0865b77f0

Reverting this change locally on our side was enough to fix our test issues.

There is a considerably long explanation in a comment just above the
change explaining why the number of rx/tx queues is limited to 1, yet it
was changed to 2 anyway.

Since I couldn't see any mention in the logs about these restrictions being
removed, I wished to inquire whether this change was truly intended and if
there was any problem which motivated it.
Maybe the max number of rx/tx queues should be reverted back to 1? Or maybe
the comment should be updated if the limitations are no longer true?

Regards,
Edwin Brossette.


net/ixgbe: all interrupt disabled after enabling lsc on X552/X557-AT card

2024-04-18 Thread Edwin Brossette
Hello,

I have found a bug in the ixgbe pmd, concerning this model of nic in
particular:
 -> Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T

I was able to detect this bug after enabling link state change interrupts
on the card via the following flag:
- port->rte_conf.intr_conf.lsc = 1;

With these interrupts enabled on this particular network card, there is a
chance on every link-up that all interrupts get disabled, as the interrupt
mask is unintentionally set to 0.
This happens because there are two interrupts that are triggered on a link
status change on this model of nic. See related code here:
https://git.dpdk.org/dpdk/tree/drivers/net/ixgbe/ixgbe_ethdev.c?id=c99e1db87798c2cd2c0ac94c7676e961eedffc87#n4552

Whenever an lsc interrupt is triggered, we enter this
ixgbe_dev_interrupt_get_status() function.
Bits 20 (IXGBE_EICR_LSC) and 25 (IXGBE_EICR_GPI_SDP0_X550EM_x) of the
Extended Interrupt Cause Register (EICR) will be set to '1' on a link
status change.
When the interrupt is handled in the ixgbe_dev_interrupt_action()
function, if the IXGBE_FLAG_NEED_LINK_UPDATE flag is raised (which happens
when the EICR_LSC bit is set in eicr), an alarm for a delayed handler will
be programmed.

The following code poses an issue:
https://git.dpdk.org/dpdk/tree/drivers/net/ixgbe/ixgbe_ethdev.c?id=c99e1db87798c2cd2c0ac94c7676e961eedffc87#n4736

intr->mask = intr->mask_original;
intr->mask_original = 0;

Sometimes, it's possible to see a first interrupt with bit IXGBE_EICR_LSC
raised, quickly followed by a second interrupt with both bit IXGBE_EICR_LSC
and bit IXGBE_EICR_GPI_SDP0_X550EM_x raised.
In that case, the flag IXGBE_FLAG_NEED_LINK_UPDATE is raised both times and
the delayed handler is programmed to run twice.
This leads to intr->mask_original being set to 0 after the first run of the
delayed handler, and then to intr->mask also being set to 0 after the
second run.
This effectively disables all interrupts on the device, which is
particularly painful since we no longer poll for link state on our side,
having switched to interrupt-based link state handling.
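
To make the failure sequence explicit, here is a simplified, hedged trace of
the two fields involved (illustration only, not driver code; the lsc bit is
bit 20 as mentioned above, and the initial mask value is arbitrary):

#include <stdint.h>

#define EIMS_LSC (1u << 20)   /* lsc bit (bit 20), local define for the example */

static void lsc_mask_trace(void)
{
        uint32_t mask = EIMS_LSC | (1u << 25);  /* example: lsc + SDP0 enabled */
        uint32_t mask_original = 0;

        /* 1st interrupt (EICR_LSC): interrupt_action() programs alarm #1 */
        mask_original = mask;           /* original mask saved */
        mask &= ~EIMS_LSC;              /* lsc masked while waiting */

        /* 2nd interrupt (LSC + SDP0) before alarm #1 fires: alarm #2 programmed */
        mask_original = mask;           /* the saved value is overwritten */
        mask &= ~EIMS_LSC;

        /* alarm #1 fires: the delayed handler restores the mask */
        mask = mask_original;
        mask_original = 0;

        /* alarm #2 fires: it "restores" again */
        mask = mask_original;           /* mask == 0: every interrupt disabled */
        (void)mask;
}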

My suggestion to solve this issue is to avoid creating an alarm for
ixgbe_dev_interrupt_delayed_handler() if there is already one pending. I do
this by checking for bit IXGBE_EIMS_LSC in intr->mask.
I no longer see this issue after applying this change in my local
repository. I will suggest a patch by responding to this mail.

I can show a way to reproduce below with testpmd. I have added some
custom logs into the code for my own convenience:

=

1) start testpmd:

PCI-SLOT      IFNAME  MAC-ADDRESS        KMOD   MAX SPEED        CURRENT SPEED    DEVICE
:03:00.0  ntfp1   0c:c4:7a:75:8f:ec  ixgbe  1x2.5 GT/s PCIe  1x2.5 GT/s PCIe  Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T
:03:00.1  ntfp2   0c:c4:7a:75:8f:ed  ixgbe  1x2.5 GT/s PCIe  1x2.5 GT/s PCIe  Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T

sudo dpdk-hugepages.py --setup 2G;
dpdk-devbind --bind=vfio-pci :03:00.0
dpdk-devbind --bind=vfio-pci :03:00.1

 dpdk-testpmd -a :03:00.0 -a :03:00.1 -- -i --rxq=2 --txq=2
--coremask=0x0c --total-num-mbufs=25
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
EAL: Using IOMMU type 8 (No-IOMMU)
EAL: Probe PCI driver: net_ixgbe (8086:15ad) device: :03:00.0 (socket
-1)
EAL: Probe PCI driver: net_ixgbe (8086:15ad) device: :03:00.1 (socket
-1)
Interactive-mode selected
previous number of forwarding cores 1 - changed to number of configured
cores 2
Warning: NUMA should be configured manually by using --port-numa-config and
--ring-numa-config parameters along with --numa.
testpmd: create a new mbuf pool : n=25, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: 0C:C4:7A:75:8F:EC
Configuring Port 1 (socket 0)
Port 1: 0C:C4:7A:75:8F:ED
Checking link statuses...
Done


2) To reproduce, simply repeatedly set link up and down. At one point
there should be something like this:

testpmd> set link-up port 0
ixgbe_dev_interrupt_get_status(): <><><><>><>: eicr register:
10  <-- IXGBE_EICR_LSC
ixgbe_dev_interrupt_get_status(): <><><><>><>: set
IXGBE_FLAG_NEED_LINK_UPDATE
ixgbe_dev_interrupt_get_status(): <><><><>><>: current flags: flags = 1
ixgbe_dev_interrupt_action(): <><><><>><>: current flags (1): flags = 1
ixgbe_dev_interrupt_action(): <><><><>><>: current flags (2): flags = 1
ixgbe_dev_interrupt_action(): <><><><>><>: current flags (3): flags = 1
ixgbe_dev_interrupt_action(): <><><><>><>: for dev: 0x557c0cd844c0
ixgbe_dev_interrupt_action(): <><><><>><>: saved intr: mask_original =
36700160  < original interrupt mask
ixgbe_dev_interrupt_act

[PATCH] net/ixgbe: don't create a delayed interrupt handler if one already exists

2024-04-18 Thread edwin . brossette
From: Edwin Brossette 

Since link state may need some time to stabilize after a link state
change, we cannot update the link state right after one occurs. So link
state change interrupts (lsc) are handled after a delay. To do this, an
alarm to call a delayed handler is programmed. This delayed handler is
tasked with updating the link after a variable delay of one to four
seconds which should be enough time for the link state to become stable
again.

However, a problem can occur with some models of network cards. For
example, ixgbe_mac_X550EM_x may trigger this interrupt twice because
another interrupt signal is received on the General Purpose Interrupt
pin SPD0, which has the same interrupt handler. In such a case, the
delayed interrupt handler would be programmed to be executed twice.

Since we save the original interrupt mask value to restore it after the
delayed handler is done with its work, we end up overwritting its value
after the second alarm is programmed. Even worse: when restoring it the
first time, the saved original mask variable is reset to 0, so we end up
completely disabling all interrupts when trying to restore this mask
after the second time the delayed handler is executed.

Add a check on the interrupt mask value when programming the alarm for
the delayed handler. If the bit for lsc interrupts is unset, it means an
alarm was already programmed for the delayed handler. In this case, skip
the alarm creation.

Fixes: 9b667210700e ("net/ixgbe: fix blocked interrupts")
Cc: sta...@dpdk.org

Signed-off-by: Edwin Brossette 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index c61c52b2966b..52cafcbc965f 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -4667,14 +4667,20 @@ ixgbe_dev_interrupt_action(struct rte_eth_dev *dev)
timeout = IXGBE_LINK_DOWN_CHECK_TIMEOUT;
 
ixgbe_dev_link_status_print(dev);
-   if (rte_eal_alarm_set(timeout * 1000,
- ixgbe_dev_interrupt_delayed_handler, (void *)dev) < 0)
-   PMD_DRV_LOG(ERR, "Error setting alarm");
-   else {
-   /* remember original mask */
-   intr->mask_original = intr->mask;
-   /* only disable lsc interrupt */
-   intr->mask &= ~IXGBE_EIMS_LSC;
+
+   /* Don't program delayed handler if LSC interrupt is disabled.
+* It means one is already programmed.
+*/
+   if (intr->mask & IXGBE_EIMS_LSC) {
+   if (rte_eal_alarm_set(timeout * 1000,
+ ixgbe_dev_interrupt_delayed_handler, (void *)dev) < 0)
+   PMD_DRV_LOG(ERR, "Error setting alarm");
+   else {
+   /* remember original mask */
+   intr->mask_original = intr->mask;
+   /* only disable lsc interrupt */
+   intr->mask &= ~IXGBE_EIMS_LSC;
+   }
}
}
 
-- 
2.35.0.4.g44a5d4affccf



Crash in tap pmd when using more than 8 rx queues

2024-09-05 Thread Edwin Brossette
Hello,

I have recently stumbled into an issue with my DPDK-based application
running the failsafe pmd. This pmd uses a tap device, with which my
application fails to start if more than 8 rx queues are used. This issue
appears to be related to this patch:
https://git.dpdk.org/dpdk/commit/?id=c36ce7099c2187926cd62cff7ebd479823554929

I have seen in the documentation that there is a limit of 8 queues shared
when using a tap device between multiple processes.
However, my application uses a single primary process, with no secondary
process, but it appears that I am still running into this limitation.

Now if we look at this small chunk of code:

memset(&msg, 0, sizeof(msg));
strlcpy(msg.name, TAP_MP_REQ_START_RXTX, sizeof(msg.name));
strlcpy(request_param->port_name, dev->data->name,
        sizeof(request_param->port_name));
msg.len_param = sizeof(*request_param);
for (i = 0; i < dev->data->nb_tx_queues; i++) {
        msg.fds[fd_iterator++] = process_private->txq_fds[i];
        msg.num_fds++;
        request_param->txq_count++;
}
for (i = 0; i < dev->data->nb_rx_queues; i++) {
        msg.fds[fd_iterator++] = process_private->rxq_fds[i];
        msg.num_fds++;
        request_param->rxq_count++;
}
(Note that I am not using the latest DPDK version, but stable v23.11.1. But
I believe the issue is still present on latest.)

There are no checks on the maximum value i can take in the for loops. Since
the size of msg.fds is limited to 8 entries by the IPC API (the same 8-queue
limit mentioned above), a buffer overflow can happen here.

See the struct declaration:
struct rte_mp_msg {
 char name[RTE_MP_MAX_NAME_LEN];
 int len_param;
 int num_fds;
 uint8_t param[RTE_MP_MAX_PARAM_LEN];
 int fds[RTE_MP_MAX_FD_NUM];
};

This means that if the number of queues used is more than 8, the program
will crash. This is what happens on my end as I get the following log:
*** stack smashing detected ***: terminated

Reverting the commit mentioned above fixes my issue. Also setting a check
like this works for me:

if (dev->data->nb_tx_queues + dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM)
 return -1;
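
As an illustration only, here is a hedged sketch of where such a check could
sit relative to the loops quoted above (names reuse the snippet above; TAP_LOG
is assumed to be the tap PMD's log macro and the surrounding function is not
shown):

if (dev->data->nb_tx_queues + dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM) {
        /* More queue fds than a single rte_mp_msg can carry: bail out
         * instead of overflowing msg.fds[].
         */
        TAP_LOG(ERR, "Too many queues for a single IPC message");
        return -1;
}
for (i = 0; i < dev->data->nb_tx_queues; i++) {
        msg.fds[fd_iterator++] = process_private->txq_fds[i];
        msg.num_fds++;
        request_param->txq_count++;
}
/* ... same bounded copy for the rx queues ... */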

I've made the changes on my local branch to fix my issue. This mail is just
to bring attention on this problem.
Thank you in advance for considering it.

Regards,
Edwin Brossette.


net/failsafe: segfault happens on hotplug alarm.

2024-09-05 Thread Edwin Brossette
Hello,

I recently ran into an issue when using DPDK's failsafe pmd on a Microsoft
Azure setup. On this setup, I have the failsafe pmd managing a netvsc
interface with a Mellanox nic (which can be used through the hardware
acceleration feature). A segfault is sometimes seen whenever I unplug the
Mellanox device, which is a case that should be handled by the pmd. On a
more recent DPDK version (I tested with stable v23.11.1), this segfault is
systematic.

This seems to happen because the function rte_eth_dev_release_port() is
called twice when the hotplug_alarm triggers. You can see it in this bit of
code here:
https://git.dpdk.org/dpdk/tree/drivers/net/failsafe/failsafe_ether.c#n276

In the fs_dev_remove() function, the rte_eth_dev_close() call runs
rte_eth_dev_release_port() a first time, and it is then called a second
time when handling the DEV_PROBED case. I noticed when searching the
mailing list that this problem was already seen once and a patch was
suggested. Here is the link to the mail:
https://mails.dpdk.org/archives/dev/2022-November/256898.html

Applying this patch on my local branch fixed the issue on my end, although
I cannot attest for sure that it is the best possible fix. I could see a
memory leak happening if the fs_dev_remove() function was called with
sdev->state == DEV_PROBED, as the release would never be done. But it is
still weird that if sdev->state == DEV_STARTED, we do
rte_eth_dev_release_port() first then rte_dev_remove(), and if sdev->state
== DEV_PROBED, we call them in the opposite order.

Perhaps someone could look into it?

Regards,
Edwin Brossette.


Re: Crash in tap pmd when using more than 8 rx queues

2024-09-06 Thread Edwin Brossette
Hello,

I created a Bugzilla PR, just as you requested:
https://bugs.dpdk.org/show_bug.cgi?id=1536

As for the bug resolution, I have other matters to attend to and I'm afraid
I cannot spend more time on this issue, so I was only planning to report it.

Regards,
Edwin Brossette.

On Fri, Sep 6, 2024 at 1:16 PM Ferruh Yigit  wrote:

> On 9/5/2024 1:55 PM, Edwin Brossette wrote:
> > Hello,
> >
> > I have recently stumbled into an issue with my DPDK-based application
> > running the failsafe pmd. This pmd uses a tap device, with which my
> > application fails to start if more than 8 rx queues are used. This issue
> > appears to be related to this patch:
> > https://git.dpdk.org/dpdk/commit/?id=c36ce7099c2187926cd62cff7ebd479823554929
> >
> > I have seen in the documentation that there was a limitation to 8 max
> > queues shared when using a tap device shared between multiple processes.
> > However, my application uses a single primary process, with no secondary
> > process, but it appears that I am still running into this limitation.
> >
> > Now if we look at this small chunk of code:
> >
> > memset(&msg, 0, sizeof(msg));
> > strlcpy(msg.name, TAP_MP_REQ_START_RXTX, sizeof(msg.name));
> > strlcpy(request_param->port_name, dev->data->name, sizeof(request_param-
> >>port_name));
> > msg.len_param = sizeof(*request_param);
> > for (i = 0; i < dev->data->nb_tx_queues; i++) {
> > msg.fds[fd_iterator++] = process_private->txq_fds[i];
> > msg.num_fds++;
> > request_param->txq_count++;
> > }
> > for (i = 0; i < dev->data->nb_rx_queues; i++) {
> > msg.fds[fd_iterator++] = process_private->rxq_fds[i];
> > msg.num_fds++;
> > request_param->rxq_count++;
> > }
> > (Note that I am not using the latest DPDK version, but stable v23.11.1.
> > But I believe the issue is still present on latest.)
> >
> > There are no checks on the maximum value i can take in the for loops.
> > Since the size of msg.fds is limited by the maximum of 8 queues shared
> > between process because of the IPC API, there is a potential buffer
> > overflow which can happen here.
> >
> > See the struct declaration:
> > struct rte_mp_msg {
> >  char name[RTE_MP_MAX_NAME_LEN];
> >  int len_param;
> >  int num_fds;
> >  uint8_t param[RTE_MP_MAX_PARAM_LEN];
> >  int fds[RTE_MP_MAX_FD_NUM];
> > };
> >
> > This means that if the number of queues used is more than 8, the program
> > will crash. This is what happens on my end as I get the following log:
> > *** stack smashing detected ***: terminated
> >
> > Reverting the commit mentionned above fixes my issue. Also setting a
> > check like this works for me:
> >
> > if (dev->data->nb_tx_queues + dev->data->nb_rx_queues >
> RTE_MP_MAX_FD_NUM)
> >  return -1;
> >
> > I've made the changes on my local branch to fix my issue. This mail is
> > just to bring attention on this problem.
> > Thank you in advance for considering it.
> >
>
> Hi Edwin,
>
> Thanks for the report, I confirm issue is valid, although that code
> changed a little (to increase 8 limit) [3].
>
> And in this release Stephen put another patch [1] to increase the limit
> even more, but irrelevant from the limit, tap code needs to be fixed.
>
> To fix:
> 1. We need to add "nb_rx_queues > RTE_MP_MAX_FD_NUM" check you
> mentioned, to not blindly update the 'msg.fds[]'
> 2. We should prevent this to be a limit for tap PMD when there is only
> primary process, this seems was oversight in our end.
>
>
> Can you work on the issue or just reporting it?
> Can you please report the bug in Bugzilla [2], to record the issue?
>
>
>
> [1]
>
> https://patches.dpdk.org/project/dpdk/patch/20240905162018.74301-1-step...@networkplumber.org/
>
> [2]
> https://bugs.dpdk.org/
>
> [3]
> https://git.dpdk.org/dpdk/commit/?id=72ab1dc1598e
>
>


net/netvsc: problem with configuring netvsc port after first port start

2024-10-01 Thread Edwin Brossette
Minimum size of RX buffer: 1024
Maximum configurable length of RX packet: 65536
Maximum configurable size of LRO aggregated packet: 0
Current number of RX queues: 2
Max possible RX queues: 64
Max possible number of RXDs per queue: 65535
Min possible number of RXDs per queue: 0
RXDs number alignment: 1
Current number of TX queues: 2
Max possible TX queues: 64
Max possible number of TXDs per queue: 4096
Min possible number of TXDs per queue: 1
TXDs number alignment: 1
Max segment number per packet: 40
Max segment number per MTU/TSO: 40
Device capabilities: 0x0( )
Device error handling mode: none
Device private info:
  none

 -> First, stop port:

testpmd> port stop 3
Stopping ports...
Done

 -> Then, change something in the port config. This will trigger a call
to rte_eth_dev_configure() on the next port start. Here I change the link
speed/duplex:
testpmd> port config 3 speed 1 duplex full
testpmd>

 -> Finally, try to start the port:

testpmd> port start 3
Configuring Port 3 (socket 0)
hn_nvs_alloc_subchans(): nvs subch alloc failed: 0x2
hn_dev_configure(): subchannel configuration failed
ETHDEV: Port3 dev_configure = -5
Fail to configure port 3  <--


As you can see, the port configuration fails.
The error happens in hn_nvs_alloc_subchans(). Maybe the previous resources
were not properly deallocated on port stop?

When I looked around in the pmd's code, I noticed the function hn_reinit()
with the following comment:

/*
 * Connects EXISTING rx/tx queues to NEW vmbus channel(s), and
 * re-initializes NDIS and RNDIS, including re-sending initial
 * NDIS/RNDIS configuration. To be used after the underlying vmbus
 * has been un- and re-mapped, e.g. as must happen when the device
 * MTU is changed.
 */

This function shows that it is possible to call rte_eth_dev_configure()
without failing, as hn_reinit() calls hn_dev_configure() and is itself
called when the MTU is changed.
I suspect the operations described in the comment above might also be
needed when running rte_eth_dev_configure().

Is this a known issue?
In my application's case, this bug causes issues when I need to restart and
configure the port.
Thank you for considering this issue.

Regards,
Edwin Brossette.


Re: net/mlx5: wrong Rx/Tx descriptor limits when DevX is off

2025-02-12 Thread Edwin Brossette
Hello,

Sorry for bothering you again.
May I inquire if this issue is still being worked on?
If so, when can I expect to see a fix?

Best regards,
Edwin Brossette

On Mon, Dec 23, 2024 at 2:09 PM Slava Ovsiienko 
wrote:

> Confirm, it’s a bug, IIUC was introduced by reporting function update.
> AFAIK, we do not test with non-DevX environment anymore, so missed this.
>
> Fix should be provided.
>
>
>
> With best regards,
>
> Slava
>
>
>
> *From:* Asaf Penso 
> *Sent:* Sunday, December 22, 2024 9:39 AM
> *To:* igooto...@gmail.com; Slava Ovsiienko 
> *Cc:* Laurent Hardy ; Olivier Matz <
> olivier.m...@6wind.com>; Didier Pallard ;
> Jean-Mickael Guerin ; Edwin Brossette <
> edwin.brosse...@6wind.com>; dev@dpdk.org
> *Subject:* RE: net/mlx5: wrong Rx/Tx descriptor limits when DevX is off
>
>
>
> Hello Igor and Slava,
>
> Can you please check out this issue?
>
>
>
> Regards,
>
> Asaf Penso
>
>
>
> *From:* Edwin Brossette 
> *Sent:* Friday, 20 December 2024 19:06
> *To:* dev@dpdk.org
> *Cc:* Laurent Hardy ; Olivier Matz <
> olivier.m...@6wind.com>; Didier Pallard ;
> Jean-Mickael Guerin 
> *Subject:* net/mlx5: wrong Rx/Tx descriptor limits when DevX is off
>
>
>
> Hello,
>
> I have run into a regression following an update to stable dpdk-24.11 with
> a number of my Mellanox cx4/5/6 nics. This regression occurs with all nics
> in my lab which have DevX disabled: using mstconfig utility, I can see the
> flag UCTX_EN is not set.
>
> Mainly, the issue is that the ports cannot be started, with the following
> error logs in the journal:
>
> Set nb_rxd=1 (asked=512) for port=0
> Set nb_txd=1 (asked=512) for port=0
> starting port 0
> Initializing port 0 [7c:fe:90:65:e6:54]
> port 0: ntfp1 (mlx5_pci)
> nb_rxq=2 nb_txq=2
> rxq0=c9 rxq1=c25
> txq0=c9 txq1=c25
> port 0: rx_scatter=0 tx_scatter=0 max_rx_frame=1526
> mlx5_net: port 0 number of descriptors requested for Tx queue 0 must be
> higher than MLX5_TX_COMP_THRESH, using 33 instead of 1
> mlx5_net: port 0 increased number of descriptors in Tx queue 0 to the next
> power of two (64)
> mlx5_net: port 0 number of descriptors requested for Tx queue 1 must be
> higher than MLX5_TX_COMP_THRESH, using 33 instead of 1
> mlx5_net: port 0 increased number of descriptors in Tx queue 1 to the next
> power of two (64)
> mlx5_net: Port 0 Rx queue 0 CQ creation failure.
> mlx5_net: port 0 Rx queue allocation failed: Cannot allocate memory
> rte_eth_dev_start(port 0) failed, error=-12
> Failed to start port 0, set link down
> Failed to start port 0
>
>
> Looking more precisely into the problem, it appears that the number of Rx
> and Tx descriptors configured for my queues is 1. This happens because
> mlx5_dev_infos_get() return a limit of 1 for both Rx and Tx, which is
> unexpected. I identified this patch to be responsible for the regression:
>
> 4c3d7961d9002: net/mlx5: fix reported Rx/Tx descriptor limits
>
> https://git.dpdk.org/dpdk/commit/?id=4c3d7961d9002bb715a8ee76bcf464d633316d4c
>
> After doing some debugging, I noticed that hca_attr.log_max_wq_sz is never
> configured. This should be done in mlx5_devx_cmd_query_hca_attr() which is
> called in this bit of code:
>
> https://git.dpdk.org/dpdk/tree/drivers/common/mlx5/mlx5_common.c#n681
>
> /*
> * When CTX is created by Verbs, query HCA attribute is unsupported.
> * When CTX is imported, we cannot know if it is created by DevX or
> * Verbs. So, we use query HCA attribute function to check it.
> */
> if (cdev->config.devx || cdev->config.device_fd != MLX5_ARG_UNSET) {
>
> /* Query HCA attributes. */
> ret = mlx5_devx_cmd_query_hca_attr(cdev->ctx, &cdev->config.hca_attr);
> if (ret) {
>
> DRV_LOG(ERR, "Unable to read HCA caps in DevX mode.");
> rte_errno = ENOTSUP;
> goto error;
>
> }
> cdev->config.devx = 1;
>
> }
> DRV_LOG(DEBUG, "DevX is %ssupported.", cdev->config.devx ? "" : "NOT ");
>
>
>
> I deduced that following the above patch, the correct value for maximum Rx
> and Tx descriptors will only be set if DevX is enabled (see the if
> condition on cdev->config.devx). If it is disabled, then maximum Rx and Tx
> descriptors will be 1, which will make the ports fail to start. Perhaps we
> should keep the previous default value (65535) if config.devx == 0 (DevX
> off)? This could be done like this, for example:
>
> diff --git a/drivers/net/mlx5/mlx5_ethdev.c
> b/drivers/net/mlx5/mlx5_ethdev.c
> index 7708a0b80883..8ba3eb4a32de 100644
> --- a/drivers/net/mlx5/mlx5_ethdev.c
> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> @@ -359,10 +359,12 @@ mlx5_dev

net/mlx5: wrong Rx/Tx descriptor limits when DevX is off

2024-12-20 Thread Edwin Brossette
Hello,

I have run into a regression following an update to stable dpdk-24.11 with
a number of my Mellanox cx4/5/6 nics. This regression occurs with all nics
in my lab which have DevX disabled: using the mstconfig utility, I can see the
flag UCTX_EN is not set.

Mainly, the issue is that the ports cannot be started, with the following
error logs in the journal:

Set nb_rxd=1 (asked=512) for port=0
Set nb_txd=1 (asked=512) for port=0
starting port 0
Initializing port 0 [7c:fe:90:65:e6:54]
port 0: ntfp1 (mlx5_pci)
nb_rxq=2 nb_txq=2
rxq0=c9 rxq1=c25
txq0=c9 txq1=c25
port 0: rx_scatter=0 tx_scatter=0 max_rx_frame=1526
mlx5_net: port 0 number of descriptors requested for Tx queue 0 must be
higher than MLX5_TX_COMP_THRESH, using 33 instead of 1
mlx5_net: port 0 increased number of descriptors in Tx queue 0 to the next
power of two (64)
mlx5_net: port 0 number of descriptors requested for Tx queue 1 must be
higher than MLX5_TX_COMP_THRESH, using 33 instead of 1
mlx5_net: port 0 increased number of descriptors in Tx queue 1 to the next
power of two (64)
mlx5_net: Port 0 Rx queue 0 CQ creation failure.
mlx5_net: port 0 Rx queue allocation failed: Cannot allocate memory
rte_eth_dev_start(port 0) failed, error=-12
Failed to start port 0, set link down
Failed to start port 0

Looking more precisely into the problem, it appears that the number of Rx
and Tx descriptors configured for my queues is 1. This happens because
mlx5_dev_infos_get() returns a limit of 1 for both Rx and Tx, which is
unexpected. I identified this patch to be responsible for the regression:

4c3d7961d9002: net/mlx5: fix reported Rx/Tx descriptor limits
https://git.dpdk.org/dpdk/commit/?id=4c3d7961d9002bb715a8ee76bcf464d633316d4c

After doing some debugging, I noticed that hca_attr.log_max_wq_sz is never
configured. This should be done in mlx5_devx_cmd_query_hca_attr() which is
called in this bit of code:

https://git.dpdk.org/dpdk/tree/drivers/common/mlx5/mlx5_common.c#n681

/*
 * When CTX is created by Verbs, query HCA attribute is unsupported.
 * When CTX is imported, we cannot know if it is created by DevX or
 * Verbs. So, we use query HCA attribute function to check it.
 */
if (cdev->config.devx || cdev->config.device_fd != MLX5_ARG_UNSET) {
        /* Query HCA attributes. */
        ret = mlx5_devx_cmd_query_hca_attr(cdev->ctx, &cdev->config.hca_attr);
        if (ret) {
                DRV_LOG(ERR, "Unable to read HCA caps in DevX mode.");
                rte_errno = ENOTSUP;
                goto error;
        }
        cdev->config.devx = 1;
}
DRV_LOG(DEBUG, "DevX is %ssupported.", cdev->config.devx ? "" : "NOT ");

I deduced that following the above patch, the correct value for maximum Rx
and Tx descriptors will only be set if DevX is enabled (see the if
condition on cdev->config.devx). If it is disabled, then maximum Rx and Tx
descriptors will be 1, which will make the ports fail to start. Perhaps we
should keep the previous default value (65535) if config.devx == 0 (DevX
off)? This could be done like this, for example:

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 7708a0b80883..8ba3eb4a32de 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -359,10 +359,12 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
info->flow_type_rss_offloads = ~MLX5_RSS_HF_MASK;
mlx5_set_default_params(dev, info);
mlx5_set_txlimit_params(dev, info);
-   info->rx_desc_lim.nb_max =
-   1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
-   info->tx_desc_lim.nb_max =
-   1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
+   if (priv->sh->cdev->config.devx) {
+   info->rx_desc_lim.nb_max =
+   1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
+   info->tx_desc_lim.nb_max =
+   1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
+   }
if (priv->sh->cdev->config.hca_attr.mem_rq_rmp &&
    priv->obj_ops.rxq_obj_new == devx_obj_ops.rxq_obj_new)
info->dev_capa |= RTE_ETH_DEV_CAPA_RXQ_SHARE;

Thanks in advance for your help.

Regards,
Edwin Brossette.


Re: net/mlx5: wrong Rx/Tx descriptor limits when DevX is off

2025-03-17 Thread Edwin Brossette
Hello,

Thank you for your answer.
The short patch I attached to my first mail was just a rough example to
show what I tested. I believe you know the driver's code better than I
do, so I wouldn't be opposed to you fixing this issue.
Thank you in advance.

Regards,
Edwin Brossette.

On Wed, Mar 5, 2025 at 10:17 AM Slava Ovsiienko 
wrote:

> Hi, Edwin
>
>
>
> Thank you for the patch.
>
> You are quite right, “sh->cdev->config.hca_attr.log_max_wq_sz” is not set
> if DevX is disengaged.
>
> I found some other places where the uninitialized “log_max_wq_sz” might be
> used.
> So. I’d rather prefer to configure the “log_max_wq_sz” for IBV case as
> well, instead of just fixing mlx5_dev_infos_get().
>
>
> There is the property in  “priv->sh->dev_cap.max_qp_wr”, it reflects the
> max number of descriptors if rdma_core is used.
>
> Would you like to update your patch with this? Or would you prefer me to
> do it ?
>
> With best regards,
> Slava
>
>
>
>
>
> *From:* Edwin Brossette 
> *Sent:* Wednesday, February 12, 2025 4:34 PM
> *To:* Slava Ovsiienko 
> *Cc:* Asaf Penso ; igooto...@gmail.com; Laurent Hardy <
> laurent.ha...@6wind.com>; Olivier Matz ; Didier
> Pallard ; Jean-Mickael Guerin ;
> dev@dpdk.org
> *Subject:* Re: net/mlx5: wrong Rx/Tx descriptor limits when DevX is off
>
>
>
> Hello,
>
>
>
> Sorry for bothering you again.
>
> May I inquire if this issue is still being worked on ?
>
> If so, when can I expect to see a fix ?
>
>
>
> Best regards,
>
> Edwin Brossette
>
>
>
> On Mon, Dec 23, 2024 at 2:09 PM Slava Ovsiienko 
> wrote:
>
> Confirmed, it’s a bug; IIUC it was introduced by the reporting function
> update. AFAIK, we do not test in a non-DevX environment anymore, so we
> missed this.
>
> A fix should be provided.
>
>
>
> With best regards,
>
> Slava
>
>
>
> *From:* Asaf Penso 
> *Sent:* Sunday, December 22, 2024 9:39 AM
> *To:* igooto...@gmail.com; Slava Ovsiienko 
> *Cc:* Laurent Hardy ; Olivier Matz <
> olivier.m...@6wind.com>; Didier Pallard ;
> Jean-Mickael Guerin ; Edwin Brossette <
> edwin.brosse...@6wind.com>; dev@dpdk.org
> *Subject:* RE: net/mlx5: wrong Rx/Tx descriptor limits when DevX is off
>
>
>
> Hello Igor and Slava,
>
> Can you please check out this issue?
>
>
>
> Regards,
>
> Asaf Penso
>
>
>
> *From:* Edwin Brossette 
> *Sent:* Friday, 20 December 2024 19:06
> *To:* dev@dpdk.org
> *Cc:* Laurent Hardy ; Olivier Matz <
> olivier.m...@6wind.com>; Didier Pallard ;
> Jean-Mickael Guerin 
> *Subject:* net/mlx5: wrong Rx/Tx descriptor limits when DevX is off
>
>
>
> Hello,
>
> I have run into a regression following an update to stable dpdk-24.11 with
> a number of my Mellanox cx4/5/6 nics. This regression occurs with all nics
> in my lab which have DevX disabled: using mstconfig utility, I can see the
> flag UCTX_EN is not set.
>
> Mainly, the issue is that the ports cannot be started, with the following
> error logs in the journal:
>
> Set nb_rxd=1 (asked=512) for port=0
> Set nb_txd=1 (asked=512) for port=0
> starting port 0
> Initializing port 0 [7c:fe:90:65:e6:54]
> port 0: ntfp1 (mlx5_pci)
> nb_rxq=2 nb_txq=2
> rxq0=c9 rxq1=c25
> txq0=c9 txq1=c25
> port 0: rx_scatter=0 tx_scatter=0 max_rx_frame=1526
> mlx5_net: port 0 number of descriptors requested for Tx queue 0 must be
> higher than MLX5_TX_COMP_THRESH, using 33 instead of 1
> mlx5_net: port 0 increased number of descriptors in Tx queue 0 to the next
> power of two (64)
> mlx5_net: port 0 number of descriptors requested for Tx queue 1 must be
> higher than MLX5_TX_COMP_THRESH, using 33 instead of 1
> mlx5_net: port 0 increased number of descriptors in Tx queue 1 to the next
> power of two (64)
> mlx5_net: Port 0 Rx queue 0 CQ creation failure.
> mlx5_net: port 0 Rx queue allocation failed: Cannot allocate memory
> rte_eth_dev_start(port 0) failed, error=-12
> Failed to start port 0, set link down
> Failed to start port 0
>
>
> Looking more closely into the problem, it appears that the number of Rx
> and Tx descriptors configured for my queues is 1. This happens because
> mlx5_dev_infos_get() returns a limit of 1 for both Rx and Tx, which is
> unexpected. I identified the following patch as responsible for the regression:
>
> 4c3d7961d9002: net/mlx5: fix reported Rx/Tx descriptor limits
>
> https://git.dpdk.org/dpdk/commit/?id=4c3d7961d9002bb715a8ee76bcf464d633316d4c
>
> After doing some debugging, I noticed that hca_attr.log_max_wq_sz is never
> configured. This should be done in mlx5_devx_cmd_query_hca_attr() which is
> called in this bit of code:
>
>
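
For reference, here is a minimal standalone sketch of the fallback direction
Slava suggests earlier in this thread (use the DevX attribute when it is
available, otherwise the rdma-core limit from dev_cap.max_qp_wr). Parameter
names are placeholders for illustration only, not the actual mlx5 structures:

#include <stdint.h>

static uint32_t
mlx5_desc_limit_example(int devx_enabled, uint8_t log_max_wq_sz,
                        uint32_t max_qp_wr)
{
        /* DevX on: limit derived from the queried HCA attribute.
         * DevX off: fall back to the capability reported by rdma-core. */
        return devx_enabled ? (uint32_t)1 << log_max_wq_sz : max_qp_wr;
}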

[PATCH 3/5] Revert "net/qede: fix maximum Rx packet length"

2025-04-22 Thread edwin . brossette
From: Edwin Brossette 

This reverts commit d8ded501e05ce879f27f0ed1df7721a88b737e25.

The maximum length for Rx packets computed in qede_rx_queue_setup()
takes Ethernet CRC into account. This is not consistent with the value
computed in qede_set_mtu(). RTE_ETHER_CRC_LEN should not be added to
max_rx_pktlen, as HW does not include CRC in received frames passed to
host.

The original commit tried to fix another bug with this inappropriate
patch: packets with sizes nearing the MTU limit were being dropped. This
was not because the CRC length was unaccounted for in the Rx buffer
size, but because of the flooring applied to it: the computed rx_buf
size is lower than expected because we try to align it.

This issue is fixed in a later patch of this series.

CC: sta...@dpdk.org

Signed-off-by: Edwin Brossette 
Acked-by: Didier Pallard 
---
 drivers/net/qede/qede_rxtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/qede/qede_rxtx.c b/drivers/net/qede/qede_rxtx.c
index 601fcb30b357..fe839a6ba844 100644
--- a/drivers/net/qede/qede_rxtx.c
+++ b/drivers/net/qede/qede_rxtx.c
@@ -235,7 +235,7 @@ qede_rx_queue_setup(struct rte_eth_dev *dev, uint16_t qid,
dev->data->rx_queues[qid] = NULL;
}
 
-   max_rx_pktlen = dev->data->mtu + RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
+   max_rx_pktlen = dev->data->mtu + RTE_ETHER_HDR_LEN;
 
/* Fix up RX buffer size */
bufsz = (uint16_t)rte_pktmbuf_data_room_size(mp) - RTE_PKTMBUF_HEADROOM;
-- 
2.35.0.4.g44a5d4affccf



QEDE bug report

2025-04-22 Thread Edwin Brossette
Hello,

I have several issues to report concerning the qede pmd as well as
potential solutions for them. Most of them have to do with configuring the
MTU.

== Abort on mtu change ===

First, qede_assign_rxtx_handlers() seems to work incorrectly since an
API change in the rte_eth library. The commit linked below changes the
way packet receive and transmit functions are handled:

Originally, the Rx/Tx handlers were located in the rte_eth_dev struct. This
is no longer the case: the polling of incoming packets is now done through
functions registered in rte_eth_fp_ops instead. The rte_ethdev change is
supposed to be transparent for individual pmds, but the polling functions
in rte_eth_dev are only synchronized with the ones in rte_eth_fp_ops at
device start. This leads to an issue when trying to configure the MTU
while there is ongoing traffic:
-> Changing the MTU triggers a port restart.
-> qede_assign_rxtx_handlers() assigns dummy polling functions to
dev->rx_pkt_burst and dev->tx_pkt_burst while the port is down.
-> However, rte_eth_rx_burst() polls through
&rte_eth_fp_ops[port_id].rx_pkt_burst, which still points to
qede_recv_pkts_regular().
-> The application keeps polling packets through the stale receive function
and triggers an assert(rx_mb != NULL), which causes an abort.

Since rte_eth_fp_ops is reset in rte_eth_dev_stop(), it may be better to
call this function instead of qede_dev_stop(). However, the dummy functions
defined in lib/ethdev/ethdev_private.c log an error and dump the stack when
called, so they might not be intended to be used this way.
The way I fixed this issue in our application is by forcing a complete
stop of the port before configuring the MTU, as sketched below. I have no
DPDK patch to suggest for this.
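
Here is a minimal sketch of that workaround, assuming the port is already
started and that the application's polling threads are quiesced around the
call (that synchronization is application-specific and not shown):

#include <stdint.h>
#include <rte_ethdev.h>

/* Fully stop the port before changing the MTU so that rte_eth_fp_ops is
 * reset and no thread keeps calling the stale qede receive function,
 * then restart it. Error handling is reduced for brevity. */
static int
safe_set_mtu(uint16_t port_id, uint16_t new_mtu)
{
        int ret;

        ret = rte_eth_dev_stop(port_id); /* resets rte_eth_fp_ops for this port */
        if (ret != 0)
                return ret;

        ret = rte_eth_dev_set_mtu(port_id, new_mtu);
        if (ret != 0)
                return ret;

        return rte_eth_dev_start(port_id); /* reinstalls the real burst functions */
}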

-- Reproduction --

1) Start testpmd:
dpdk-testpmd --log-level=pmd.net.qede.driver:7 -a :17:00.0 -a
:17:00.1 -- -i --rxq=2 --txq=2 --coremask=0x0c --total-num-mbufs=25

2) Start packet forwarding:
start
io packet forwarding - ports=2 - cores=2 - streams=4 - NUMA support
enabled, MP allocation mode: native
Logical Core 2 (socket 0) forwards packets on 2 streams:
  RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
Logical Core 3 (socket 1) forwards packets on 2 streams:
  RX P=0/Q=1 (socket 0) -> TX P=1/Q=1 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00

  io packet forwarding packets/burst=32
  nb forwarding cores=2 - nb forwarding ports=2
  port 0: RX queue number: 2 Tx queue number: 2
Rx offloads=0x8 Tx offloads=0x0
RX queue: 0
  RX desc=0 - RX free threshold=0
  RX threshold registers: pthresh=0 hthresh=0  wthresh=0
  RX Offloads=0x8
TX queue: 0
  TX desc=0 - TX free threshold=0
  TX threshold registers: pthresh=0 hthresh=0  wthresh=0
  TX offloads=0x8000 - TX RS bit threshold=0
  port 1: RX queue number: 2 Tx queue number: 2
Rx offloads=0x8 Tx offloads=0x0
RX queue: 0
  RX desc=0 - RX free threshold=0
  RX threshold registers: pthresh=0 hthresh=0  wthresh=0
  RX Offloads=0x8
TX queue: 0
  TX desc=0 - TX free threshold=0
  TX threshold registers: pthresh=0 hthresh=0  wthresh=0
  TX offloads=0x8000 - TX RS bit threshold=0

3) Send a continuous stream of packets and change the mtu while they
are being forwarded:

p = Ether()/IP(src='10.100.0.1', dst='10.200.0.1')/UDP(sport=1,
dport=2)/Raw(load='A'*100)
sendp(p, iface='ntfp1', count=500, inter=0.001)

port config mtu 0 1500
[qede_dev_set_link_state:1842(17:00.0:dpdk-port-0)]setting link state 0
[qede_link_update:1483(17:00.0:dpdk-port-0)]Link - Speed 0 Mode 1 AN 1
Status 0
[ecore_int_attentions:1239(17:00.0:dpdk-port-0-0)]MFW indication via
attention
[qede_link_update:1483(17:00.0:dpdk-port-0)]Link - Speed 0 Mode 1 AN 1
Status 0
[qede_activate_vport:536(17:00.0:dpdk-port-0)]vport is deactivated
[qede_tx_queue_reset:509(17:00.0:dpdk-port-0)]Reset TX queue 0
[qede_tx_queue_stop:991(17:00.0:dpdk-port-0)]TX queue 0 stopped
[qede_tx_queue_reset:509(17:00.0:dpdk-port-0)]Reset TX queue 1
[qede_tx_queue_stop:991(17:00.0:dpdk-port-0)]TX queue 1 stopped
[qede_rx_queue_reset:293(17:00.0:dpdk-port-0)]Reset RX queue 0
[qede_rx_queue_stop:371(17:00.0:dpdk-port-0)]RX queue 0 stopped
[qede_rx_queue_reset:293(17:00.0:dpdk-port-0)]Reset RX queue 1
[qede_rx_queue_stop:371(17:00.0:dpdk-port-0)]RX queue 1 stopped
dpdk-testpmd: ../../../source/dpdk-24.11/drivers/net/qede/qede_rxtx.c:1600:
qede_recv_pkts_regular: Assertion `rx_mb != NULL' failed.
Aborted (core dumped)  <===

As you can see, the application is aborted.


== Bad Rx buffer size for mbufs ===

Another issue I had when trying to set the MTU was that I got bad sanity
checks on mbufs when sending large packets, cau

[PATCH 5/5] net/qede: fix rx_buf_size calculation

2025-04-22 Thread edwin . brossette
From: Edwin Brossette 

When the configured MTU is lower than the maximum mbuf size (all packet
data can be stored in a single mbuf), the Rx buffer size is configured as
MTU + some overhead. A flooring is then applied to this value to align it,
meaning its actual value ends up lower than expected.

This is a considerable design flaw, because a data packet fitting
exactly the MTU might not fit in the Rx buffer. What is observed in this
case is that the nic splits the datagram into two segments, essentially
doing Rx scatter. However, the qede pmd does not expect scatter-gather
in this case and does not use the required receive function: it uses
qede_recv_pkts_regular(), which is not capable of handling it.

Thus, we end up with a malformed packet with m->nb_segs = 2 and
m->next = NULL, meaning the last part of the data is missing.

Ceil the Rx buffer size to the cache-line size instead of flooring it.
Also change the max_rx_pktlen > bufsz check in qede_rx_queue_setup(): if
ceiling max_rx_pktlen would make it exceed the mbuf size, force-enable
scatter-gather mode.

Fixes: 318d7da3122b ("net/qede: fix Rx buffer size calculation")
CC: sta...@dpdk.org

Signed-off-by: Edwin Brossette 
Acked-by: Didier Pallard 
---
 drivers/net/qede/qede_rxtx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/qede/qede_rxtx.c b/drivers/net/qede/qede_rxtx.c
index ea77f09e18be..2a6f1290ad4a 100644
--- a/drivers/net/qede/qede_rxtx.c
+++ b/drivers/net/qede/qede_rxtx.c
@@ -111,7 +111,7 @@ qede_calc_rx_buf_size(struct rte_eth_dev *dev, uint16_t 
mbufsz,
rx_buf_size = max_frame_size;
 
/* Align to cache-line size if needed */
-   return QEDE_FLOOR_TO_CACHE_LINE_SIZE(rx_buf_size);
+   return QEDE_CEIL_TO_CACHE_LINE_SIZE(rx_buf_size);
 }
 
 static struct qede_rx_queue *
@@ -237,7 +237,7 @@ qede_rx_queue_setup(struct rte_eth_dev *dev, uint16_t qid,
/* cache align the mbuf size to simplify rx_buf_size calculation */
bufsz = QEDE_FLOOR_TO_CACHE_LINE_SIZE(bufsz);
if ((rxmode->offloads & RTE_ETH_RX_OFFLOAD_SCATTER) ||
-   max_rx_pktlen > bufsz) {
+   QEDE_CEIL_TO_CACHE_LINE_SIZE(max_rx_pktlen) > bufsz) {
if (!dev->data->scattered_rx) {
DP_INFO(edev, "Forcing scatter-gather mode\n");
dev->data->scattered_rx = 1;
-- 
2.35.0.4.g44a5d4affccf
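
A small standalone illustration of the flooring problem described above. The
numbers are examples only (64-byte cache line, 1526-byte max frame length =
1500-byte MTU + 14-byte Ethernet header + 12 bytes of assumed overhead), not
the driver's exact figures:

#include <stdio.h>

#define CACHE_LINE 64u
#define FLOOR_TO_CACHE_LINE(x) ((x) & ~(CACHE_LINE - 1))
#define CEIL_TO_CACHE_LINE(x)  (((x) + CACHE_LINE - 1) & ~(CACHE_LINE - 1))

int main(void)
{
        unsigned int frame = 1526; /* example max Rx frame length */

        /* Flooring yields 1472: a 1526-byte frame no longer fits in one
         * buffer and the NIC silently splits it. Ceiling yields 1536. */
        printf("floor=%u ceil=%u\n",
               FLOOR_TO_CACHE_LINE(frame), CEIL_TO_CACHE_LINE(frame));
        return 0;
}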



[PATCH 4/5] net/qede: fix QEDE_ETH_OVERHEAD being counted twice in rx_buf_size

2025-04-22 Thread edwin . brossette
From: Edwin Brossette 

rx_buf_size is computed in two different places: qede_rx_queue_setup()
and qede_set_mtu().

In qede_rx_queue_setup(), it is initialized with mtu + RTE_ETHER_HDR_LEN
and QEDE_ETH_OVERHEAD is added to it in qede_calc_rx_buf_size().

In qede_set_mtu(), it is initialized with
mtu + RTE_ETHER_HDR_LEN + QEDE_ETH_OVERHEAD and QEDE_ETH_OVERHEAD is
added to it a second time in qede_calc_rx_buf_size().

This is inconsistent and wrong in the case of qede_set_mtu(). Initialize
this variable with mtu + QEDE_MAX_ETHER_HDR_LEN instead and stop adding
QEDE_ETH_OVERHEAD over and over again. This also factorizes the code.

Fixes: 318d7da3122b ("net/qede: fix Rx buffer size calculation")
CC: sta...@dpdk.org

Signed-off-by: Edwin Brossette 
Acked-by: Didier Pallard 
---
 drivers/net/qede/qede_rxtx.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/net/qede/qede_rxtx.c b/drivers/net/qede/qede_rxtx.c
index fe839a6ba844..ea77f09e18be 100644
--- a/drivers/net/qede/qede_rxtx.c
+++ b/drivers/net/qede/qede_rxtx.c
@@ -101,19 +101,14 @@ qede_calc_rx_buf_size(struct rte_eth_dev *dev, uint16_t 
mbufsz,
 * buffers can be used for single packet. So need to make sure
 * mbuf size is sufficient enough for this.
 */
-   if ((mbufsz * ETH_RX_MAX_BUFF_PER_PKT) <
-(max_frame_size + QEDE_ETH_OVERHEAD)) {
+   if ((mbufsz * ETH_RX_MAX_BUFF_PER_PKT) < max_frame_size) {
DP_ERR(edev, "mbuf %d size is not enough to hold max 
fragments (%d) for max rx packet length (%d)\n",
   mbufsz, ETH_RX_MAX_BUFF_PER_PKT, max_frame_size);
return -EINVAL;
}
-
-   rx_buf_size = RTE_MAX(mbufsz,
- (max_frame_size + QEDE_ETH_OVERHEAD) /
-  ETH_RX_MAX_BUFF_PER_PKT);
-   } else {
-   rx_buf_size = max_frame_size + QEDE_ETH_OVERHEAD;
-   }
+   rx_buf_size = mbufsz;
+   } else
+   rx_buf_size = max_frame_size;
 
/* Align to cache-line size if needed */
return QEDE_FLOOR_TO_CACHE_LINE_SIZE(rx_buf_size);
@@ -235,14 +230,14 @@ qede_rx_queue_setup(struct rte_eth_dev *dev, uint16_t qid,
dev->data->rx_queues[qid] = NULL;
}
 
-   max_rx_pktlen = dev->data->mtu + RTE_ETHER_HDR_LEN;
+   max_rx_pktlen = dev->data->mtu + QEDE_MAX_ETHER_HDR_LEN;
 
/* Fix up RX buffer size */
bufsz = (uint16_t)rte_pktmbuf_data_room_size(mp) - RTE_PKTMBUF_HEADROOM;
/* cache align the mbuf size to simplify rx_buf_size calculation */
bufsz = QEDE_FLOOR_TO_CACHE_LINE_SIZE(bufsz);
if ((rxmode->offloads & RTE_ETH_RX_OFFLOAD_SCATTER) ||
-   (max_rx_pktlen + QEDE_ETH_OVERHEAD) > bufsz) {
+   max_rx_pktlen > bufsz) {
if (!dev->data->scattered_rx) {
DP_INFO(edev, "Forcing scatter-gather mode\n");
dev->data->scattered_rx = 1;
-- 
2.35.0.4.g44a5d4affccf
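
A quick arithmetic illustration of the double counting fixed above, with
placeholder values (1500-byte MTU, 14-byte Ethernet header, and an assumed
24-byte QEDE_ETH_OVERHEAD; only the structure of the computation matters):

#include <stdio.h>

#define ETHER_HDR_LEN 14u
#define ETH_OVERHEAD  24u /* placeholder for QEDE_ETH_OVERHEAD */

int main(void)
{
        unsigned int mtu = 1500;
        /* old qede_set_mtu() path: overhead added both before and inside
         * qede_calc_rx_buf_size() */
        unsigned int set_mtu_path = (mtu + ETHER_HDR_LEN + ETH_OVERHEAD) + ETH_OVERHEAD;
        /* old qede_rx_queue_setup() path: overhead added only once */
        unsigned int queue_setup_path = (mtu + ETHER_HDR_LEN) + ETH_OVERHEAD;

        printf("set_mtu: %u, queue_setup: %u, difference: %u\n",
               set_mtu_path, queue_setup_path, set_mtu_path - queue_setup_path);
        return 0;
}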



[PATCH 1/5] qede: fix tunnel checksums offload flags

2025-04-22 Thread edwin . brossette
From: Didier Pallard 

In the tunnel case, a bad outer L3 checksum properly sets
RTE_MBUF_F_RX_OUTER_IP_CKSUM_BAD, while all the other checksum flags are
set in the inner part of the offload flags. Because of this, both the L4
BAD and GOOD flags can end up set in the inner offloads when a tunnel
packet is processed, which turns them into RTE_MBUF_F_RX_L4_CKSUM_NONE
instead of a GOOD/BAD value. This in turn can cause upper layers to take
incorrect decisions on what to do with the packet. Report the tunnel L4
checksum status with the RTE_MBUF_F_RX_OUTER_L4_CKSUM_* flags instead.

Also remove the IP_CKSUM_GOOD flag on the outer IP layer, since there is
currently no way to indicate that this checksum is good using DPDK
offload flags.

Fixes: 81f8804992c9 ("net/qede: enhance Rx CPU utilization")
Fixes: 3d4bb4411683 ("net/qede: add fastpath support for VXLAN tunneling")
CC: sta...@dpdk.org

Signed-off-by: Edwin Brossette 
Signed-off-by: Didier Pallard 
Acked-by: Olivier Matz 
---
 drivers/net/qede/qede_rxtx.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/net/qede/qede_rxtx.c b/drivers/net/qede/qede_rxtx.c
index 25e28fd9f61b..c764e3d83763 100644
--- a/drivers/net/qede/qede_rxtx.c
+++ b/drivers/net/qede/qede_rxtx.c
@@ -1617,9 +1617,9 @@ qede_recv_pkts_regular(void *p_rxq, struct rte_mbuf 
**rx_pkts, uint16_t nb_pkts)
"L4 csum failed, flags = 0x%x",
parse_flag);
rxq->rx_hw_errors++;
-   ol_flags |= RTE_MBUF_F_RX_L4_CKSUM_BAD;
+   ol_flags |= RTE_MBUF_F_RX_OUTER_L4_CKSUM_BAD;
} else {
-   ol_flags |= RTE_MBUF_F_RX_L4_CKSUM_GOOD;
+   ol_flags |= RTE_MBUF_F_RX_OUTER_L4_CKSUM_GOOD;
}
 
if (unlikely(qede_check_tunn_csum_l3(parse_flag))) {
@@ -1628,8 +1628,6 @@ qede_recv_pkts_regular(void *p_rxq, struct rte_mbuf 
**rx_pkts, uint16_t nb_pkts)
parse_flag);
rxq->rx_hw_errors++;
ol_flags |= RTE_MBUF_F_RX_OUTER_IP_CKSUM_BAD;
-   } else {
-   ol_flags |= RTE_MBUF_F_RX_IP_CKSUM_GOOD;
}
 
flags = fp_cqe->tunnel_pars_flags.flags;
@@ -1887,9 +1885,9 @@ qede_recv_pkts(void *p_rxq, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)
"L4 csum failed, flags = 0x%x",
parse_flag);
rxq->rx_hw_errors++;
-   ol_flags |= RTE_MBUF_F_RX_L4_CKSUM_BAD;
+   ol_flags |= RTE_MBUF_F_RX_OUTER_L4_CKSUM_BAD;
} else {
-   ol_flags |= RTE_MBUF_F_RX_L4_CKSUM_GOOD;
+   ol_flags |= RTE_MBUF_F_RX_OUTER_L4_CKSUM_GOOD;
}
 
if (unlikely(qede_check_tunn_csum_l3(parse_flag))) {
@@ -1898,8 +1896,6 @@ qede_recv_pkts(void *p_rxq, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)
parse_flag);
  rxq->rx_hw_errors++;
ol_flags |= RTE_MBUF_F_RX_OUTER_IP_CKSUM_BAD;
-   } else {
-   ol_flags |= RTE_MBUF_F_RX_IP_CKSUM_GOOD;
}
 
if (tpa_start_flg)
-- 
2.35.0.4.g44a5d4affccf
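
For what it's worth, here is a hedged sketch of how an application could read
the outer checksum status after this change; the decision logic is only an
example and is not part of the patch:

#include <stdbool.h>
#include <stdint.h>
#include <rte_mbuf.h>

/* Return false if the outer (tunnel) IP or L4 checksum was reported bad. */
static bool
outer_csum_ok(const struct rte_mbuf *m)
{
        uint64_t f = m->ol_flags;

        if (f & RTE_MBUF_F_RX_OUTER_IP_CKSUM_BAD)
                return false;
        if ((f & RTE_MBUF_F_RX_OUTER_L4_CKSUM_MASK) ==
            RTE_MBUF_F_RX_OUTER_L4_CKSUM_BAD)
                return false;
        return true;
}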



[PATCH 2/5] net/qede: fix bad sanity check on Rx queue release

2025-04-22 Thread edwin . brossette
From: Edwin Brossette 

As per the rte_mbuf API, the driver is responsible for initializing all
the required fields. This is not done when qede allocates its mbufs,
meaning there can be garbage data in the mbufs' memory, although this
garbage data should be overwritten when the mbufs are used. Since a
sanity check is done when freeing the queues, it is possible that some
remaining garbage data causes a panic when releasing the queues while
some mbufs are still being processed.

Use rte_mbuf_raw_free() instead of rte_pktmbuf_free(), as its sanity
check is more relaxed.

Fixes: 2ea6f76aff40 ("qede: add core driver")
CC: sta...@dpdk.org

Signed-off-by: Edwin Brossette 
Acked-by: Didier Pallard 
---
 drivers/net/qede/qede_rxtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/qede/qede_rxtx.c b/drivers/net/qede/qede_rxtx.c
index c764e3d83763..601fcb30b357 100644
--- a/drivers/net/qede/qede_rxtx.c
+++ b/drivers/net/qede/qede_rxtx.c
@@ -305,7 +305,7 @@ static void qede_rx_queue_release_mbufs(struct 
qede_rx_queue *rxq)
if (rxq->sw_rx_ring) {
for (i = 0; i < rxq->nb_rx_desc; i++) {
if (rxq->sw_rx_ring[i]) {
-   rte_pktmbuf_free(rxq->sw_rx_ring[i]);
+   rte_mbuf_raw_free(rxq->sw_rx_ring[i]);
rxq->sw_rx_ring[i] = NULL;
}
}
-- 
2.35.0.4.g44a5d4affccf
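
As an aside, a hedged sketch of the alloc/free pairing this relies on. It
assumes the Rx ring is refilled with rte_mbuf_raw_alloc() without resetting
the pktmbuf fields, which is what the commit message implies; such buffers
are then best returned with rte_mbuf_raw_free(), whose sanity checks are more
relaxed than those of rte_pktmbuf_free():

#include <rte_mbuf.h>
#include <rte_mempool.h>

static void
refill_and_release_example(struct rte_mempool *mp)
{
        struct rte_mbuf *m = rte_mbuf_raw_alloc(mp); /* fields left as-is */

        if (m != NULL)
                rte_mbuf_raw_free(m); /* matches the raw allocation */
}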