Okay, that's good - thanks for that.
Closing this ticket as invalid.

** Changed in: linux (Ubuntu Focal)
       Status: New => Invalid

** Changed in: ubuntu-z-systems
       Status: Incomplete => Opinion

** Changed in: ubuntu-z-systems
       Status: Opinion => Invalid

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1990275

Title:
  [UBUNTU 20.04] Unexpected LAG affinity behaviour with mlx5_core driver
  in Ubuntu 20.04

Status in Ubuntu on IBM z Systems:
  Invalid
Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Focal:
  Invalid

Bug description:
  == Comment: #0 - KISHORE KUMAR G <kishor...@in.ibm.com> - 2022-09-19 04:39:42 ==

  ---Problem Description---
  The system is an Ubuntu/s390 host with a Mellanox CX5 adapter whose
  two ports are connected to a pair of TOR switches. It acts as the
  entry point (edge node) through which a cluster of compute nodes
  accesses the public network. The mlx firmware level is as follows:

      ethtool -i p0
      driver: mlx5e_rep
      version: 5.4.0-104.118-
      firmware-version: 16.27.1016 (MT_0000000013)
      expansion-rom-version:
      bus-info: 0100:00:00.0
      supports-statistics: yes
      supports-test: no
      supports-eeprom-access: no
      supports-register-dump: no
      supports-priv-flags: no

  The LAG affinity module of mlx5_core in the upstream 5.4 kernel
  listens to routing events and sets the LAG affinity accordingly,
  whereas one of the custom services, Fabcon, listens to the routing
  events and sets the LAG affinity in the Mellanox driver accordingly.

  The edge node routes defined in the compute nodes use both interfaces
  (port 1, p0, and port 2, p1) for the LAG affinity.
  For instance:

      10.66.0.170 proto bgp src 10.66.11.43 metric 20
          nexthop via 172.31.22.42 dev p0 weight 1
          nexthop via 172.31.22.170 dev p1 weight 1

  As an example, after an edge node boots, the LAG mapping converges to
  use both port 1 (p0) and port 2 (p1) by default:

      root@pok1-qz1-sr1-rk011-s20:/# dmesg | grep lag
      [ 282.043011] mlx5_core 0100:00:00.0: lag map port 1:2 port 2:2
      [ 282.083541] mlx5_core 0100:00:00.0: modify lag map port 1:1 port 2:2  (<------ Both ports are equally mapped)

  The issue arises when the mlx5_core driver cannot derive the LAG
  configuration from specific routes. For instance, disabling an
  interface on the edge node above (10.66.0.170), or adding or removing
  an interface, causes the mlx5_core driver to react to the routing
  change and switch the LAG affinity to a single network interface only.

  In the following example, a new static route entry to a single
  destination (10.66.47.34) is added:

      ip route add 10.66.47.34 proto static src 10.66.11.43 metric 20 via 172.31.22.42 dev p0

  This caused the LAG mapping to change to port 1 (p0), as seen in the
  following output:

      root@pok1-qz1-sr1-rk011-s20:/# dmesg | grep lag
      [ 282.043011] mlx5_core 0100:00:00.0: lag map port 1:2 port 2:2
      [ 282.083541] mlx5_core 0100:00:00.0: modify lag map port 1:1 port 2:2
      [ 757.878626] mlx5_core 0100:00:00.0: modify lag map port 1:1 port 2:1  <---- mapping directs traffic through p0

  This behaviour causes all the traffic in 10.x to use a single network
  interface. The TOR switches (fabric) do not capture or know about such
  a LAG affinity change, and therefore packets are dropped on the
  "not in use" interface (e.g. port 2 (p1)).

  So the Mellanox driver (mlx5_core) should not change the LAG
  mapping/config based on the last route event, but should rely on the
  default routes only.

  Mellanox agreed to patch this, and the fix is available in 5.15.29
  Ubuntu and 5.15.39 respectively.

  The following commits resolve this issue:

  1. net/mlx5e: Lag, Only handle events from highest priority multipath entry
     Available in upstream kernel 5.15.29
     https://github.com/torvalds/linux/commit/ad11c4f1d8fd1f03639460e425a36f7fd0ea83f5

  2. net/mlx5e: Lag, Don't skip fib events on current dst (5.15.29)
     https://github.com/torvalds/linux/commit/4a2a664ed87962c4ddb806a84b5c9634820bcf55

  3. net/mlx5e: Lag, Fix fib_info pointer assignment (5.15.39)
     https://github.com/torvalds/linux/commit/a6589155ec9847918e00e7279b8aa6d4c272bea7

  4. net/mlx5e: Lag, Fix use-after-free in fib event handler (5.15.39)
     https://github.com/torvalds/linux/commit/27b0420fd959e38e3500e60b637d39dfab065645

  The request is to have the above commits backported to the Ubuntu
  20.04.x series, including the Ubuntu 18.04 HWE kernel.

  Contact Information = Kishore Kumar G/kishore.pil...@in.ibm.com utsav.shrivas...@ibm.com

  ---Additional Hardware Info---
  Mellanox CX5 adapter with firmware-version: 16.27.1016 (MT_0000000013)

  ---uname output---
  Linux version version: 5.4.0-104.118

  Machine Type = s390x LPAR

  ---Debugger---
  A debugger is not configured

  ---Steps to Reproduce---
  ...
  " default proto bgp src 10.66.11.41 metric 20
          nexthop via 172.31.22.40 dev p0 weight 1
          nexthop via 172.31.22.168 dev p1 weight 1"
  ......
  172.31.22.40/31 dev p0 proto kernel scope link src 172.31.22.41
  172.31.22.168/31 dev p1 proto kernel scope link src 172.31.22.169
  ..

  Also, we have around 64 SRIOV devices for VM consumption.

  In the above case, the LAG mapping works as expected, using both
  ports (p0 and p1) for traffic:

      root@pok1-qz1-sr1-rk011-s20:/# dmesg | grep lag
      [ 282.043011] mlx5_core 0100:00:00.0: lag map port 1:2 port 2:2
      [ 282.083541] mlx5_core 0100:00:00.0: modify lag map port 1:1 port 2:2  <<<---behavior expected

  The issue arises when we add a route to a single IP in the underlying
  network with a single next hop: we observe that all the traffic is
  shifted to a single next-hop port, as the example below shows.

      root@pok1-qz1-sr1-rk011-s20:/# ip route add 10.66.47.34 proto static src 10.66.11.41 metric 20 via 172.31.22.40 dev p0
      root@pok1-qz1-sr1-rk011-s20:/# dmesg | grep lag
      [ 282.043011] mlx5_core 0100:00:00.0: lag map port 1:2 port 2:2
      [ 282.083541] mlx5_core 0100:00:00.0: modify lag map port 1:1 port 2:2
      [ 757.878626] mlx5_core 0100:00:00.0: modify lag map port 1:1 port 2:1  <<<<------- Issue

  Stack trace output:
   no

  Oops output:
   no

  System Dump Info:
    The system is not configured to capture a system dump.

  *Additional Instructions for Kishore Kumar G/kishore.pil...@in.ibm.com utsav.shrivas...@ibm.com:
  -Attach sysctl -a output to the bug.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1990275/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp