Hi,

> -----Original Message-----
> From: Dariusz Sosnowski <dsosnow...@nvidia.com>
> Sent: Thursday, November 9, 2023 8:01 PM
> To: Matan Azrad <ma...@nvidia.com>; Slava Ovsiienko
> <viachesl...@nvidia.com>; Ori Kam <or...@nvidia.com>; Suanming Mou
> <suanmi...@nvidia.com>; Bing Zhao <bi...@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasl...@nvidia.com>;
> sta...@dpdk.org
> Subject: [PATCH] net/mlx5: fix unbind of incorrect hairpin queue
>
> Let's take an application with the following configuration:
>
> - It uses 2 ports.
> - Each port has 3 Rx queues and 3 Tx queues.
> - On each port, Rx queues have the following purposes:
>   - Rx queue 0 - SW queue,
>   - Rx queue 1 - hairpin queue, bound to Tx queue on the same port,
>   - Rx queue 2 - hairpin queue, bound to Tx queue on another port.
> - On each port, Tx queues have the following purposes:
>   - Tx queue 0 - SW queue,
>   - Tx queue 1 - hairpin queue, bound to Rx queue on the same port,
>   - Tx queue 2 - hairpin queue, bound to Rx queue on another port.
> - Application configured all of the hairpin queues for manual binding.
>
> After ports are configured and queues are set up, if the application does the
> following API call sequence:
>
> 1. rte_eth_dev_start(port_id=0)
> 2. rte_eth_hairpin_bind(tx_port=0, rx_port=0)
> 3. rte_eth_hairpin_bind(tx_port=0, rx_port=1)
>
> mlx5 PMD fails to modify SQ and logs this error:
>
>    mlx5_common: mlx5_devx_cmds.c:2079: mlx5_devx_cmd_modify_sq():
>    Failed to modify SQ using DevX
>
> This error was caused by an incorrect unbind operation taken during error
> handling inside call (3).
>
> (3) fails, because port 1 (Rx side of the hairpin) was not started.
> As a result of this failure, PMD goes into error handling, where all
> previously bound hairpin queues are unbound.
> This is incorrect, since this error handling procedure in
> rte_eth_hairpin_bind() implementation assumes that all hairpin queues are
> bound to the same rx_port, which is not the case.
> The following sequence of function calls appears:
>
> - rte_eth_hairpin_queue_peer_unbind(rx_port=**1**, rx_queue=1, 0),
> - mlx5_hairpin_queue_peer_unbind(dev=**port 0**, tx_queue=1, 1).
>
> Which violates the hairpin queue destroy flow, by unbinding Tx queue 1 on
> port 0, before unbinding Rx queue 1 on port 1.
>
> This patch fixes that behavior, by filtering Tx queues on which error
> handling is done to only affect:
>
> - hairpin queues (it also reduces unnecessary debug log messages),
> - hairpin queues connected to the rx_port which is currently processed.
>
> Fixes: 37cd4501e873 ("net/mlx5: support two ports hairpin mode")
> Cc: bi...@nvidia.com
> Cc: sta...@dpdk.org
>
> Signed-off-by: Dariusz Sosnowski <dsosnow...@nvidia.com>
> Acked-by: Viacheslav Ovsiienko <viachesl...@nvidia.com>
> ---
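
The filtering described in the quoted patch can be pictured with a small
standalone model. This is only a rough sketch and not the mlx5 PMD code:
struct txq_model, rollback_bind() and all of their fields are hypothetical
names made up for illustration, and the real driver works on its own queue
objects through DevX. It assumes the rollback walks the Tx queues that precede
the one whose bind failed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a Tx queue (not the real PMD structure). */
struct txq_model {
	bool is_hairpin;    /* false for a regular SW queue */
	bool bound;         /* true once the queue has been bound to its peer */
	uint16_t peer_port; /* Rx port this hairpin queue is paired with */
};

/*
 * Roll back a failed bind towards rx_port.
 * Only queues before failed_idx are considered, and only those that are
 * hairpin queues paired with rx_port are unbound.
 * Returns the number of queues that were unbound.
 */
static size_t
rollback_bind(struct txq_model *txqs, size_t failed_idx, uint16_t rx_port)
{
	size_t unbound = 0;

	assert(txqs != NULL);
	for (size_t i = 0; i < failed_idx; i++) {
		if (!txqs[i].is_hairpin)
			continue; /* SW queue: nothing to unbind */
		if (txqs[i].peer_port != rx_port)
			continue; /* paired with another Rx port: leave it bound */
		if (txqs[i].bound) {
			txqs[i].bound = false;
			unbound++;
		}
	}
	return unbound;
}
```

With the port 0 layout from the quoted description (Tx queue 1 bound to port 0,
Tx queue 2 failing to bind to port 1), a rollback for rx_port=1 in this model
leaves Tx queue 1 bound, whereas a loop without the two filters would unbind it.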
Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh