Hi everyone, I noticed a behavior and I'm not sure whether it is expected after this patch.

Topology:
SRC VM --> Router VM (with accelerated networking / Mellanox driver) --> 2 tunnel links (ECMP) --> destination VM

It seems that inside a single flow:
- the SYN from the SRC VM gets hashed onto tunnel link one on the Router VM
- the ACK (part of the same flow / TCP handshake) gets hashed onto tunnel link two on the Router VM

RX hash (RSS hash) is on.

I would expect hashing and equal load balancing of traffic to happen per flow, not per packet. Is it supposed to happen per packet with the mlx5_core driver?

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1902531

Title:
  [linux-azure] IP forwarding issue in netvsc

Status in linux-azure package in Ubuntu:
  Fix Released
Status in linux-azure-4.15 package in Ubuntu:
  New
Status in linux-azure source package in Bionic:
  Invalid
Status in linux-azure-4.15 source package in Bionic:
  Fix Released
Status in linux-azure source package in Focal:
  Fix Released
Status in linux-azure-4.15 source package in Focal:
  Invalid

Bug description:
  [Impact]
  We identified an issue with the Linux netvsc driver when used in IP forwarding mode. The problem is that the RSS hash value is not propagated to the outgoing packet, so such packets all go out on channel 0. This produces an imbalance across outgoing channels and a possible overload on the single host-side CPU that is processing channel 0. The problem does not occur when Accelerated Networking is used, because the packets go out through the Mellanox driver. Because it is tied to IP forwarding, the problem is most likely to be visible in a virtual appliance that is doing network load balancing or other kinds of packet filtering and redirection.

  We would like to request fixes for this issue in 16.04, 18.04 and 20.04.

  Two fixes are already in upstream v5.5+, so they are already in 5.8.0-1011.11.

  For 5.4.0-1031.32, the 2 fixes apply cleanly:
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1fac7ca4e63bf935780cc632ccb6ba8de5f22321
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6f3aeb1ba05d41320e6cf9a60f698d9c4e44348e

  For 5.0.0-1036.38, we need 1 more patch applied first, so the list is:
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b441f79532ec13dc82d05c55badc4da1f62a6141
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1fac7ca4e63bf935780cc632ccb6ba8de5f22321
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6f3aeb1ba05d41320e6cf9a60f698d9c4e44348e

  For 4.15.0-1098.109~16.04.1, the 2 patches do not apply cleanly, so Dexuan backported them here:
  https://github.com/dcui/linux/commit/4ed58762a56cccfd006e633fac63311176508795
  https://github.com/dcui/linux/commit/40ad7849a6365a5a485f05453e10e3541025e25a
  (The 2 patches are on the branch https://github.com/dcui/linux/commits/decui/ubuntu_16.04/linux-azure/Ubuntu-azure-4.15.0-1098.109_16.04.1)

  [Test Case]
  As described in https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1902531/comments/6

  [Where problems could occur]
  A potential regression would affect Azure instances using netvsc without accelerated networking.
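
For reference, here is a minimal sketch of what "per flow" selection means in the question at the top of this message. It is only an illustration, not the kernel's or the mlx5_core/netvsc implementation: the names FlowKey and pick_link, the example addresses and ports, and the use of CRC32 as the hash are all assumptions made for the example (real RSS typically uses a Toeplitz hash, and the kernel's multipath hash is different again). The point is just that when the link is chosen from a hash of the flow's 5-tuple, every packet of that flow in the same direction, including the SYN and the later ACK from the SRC VM, maps to the same link.

```python
import zlib
from typing import NamedTuple


class FlowKey(NamedTuple):
    """5-tuple identifying one direction of a flow (hypothetical helper type)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int  # 6 = TCP


def pick_link(flow: FlowKey, num_links: int) -> int:
    """Choose an outgoing link index from a hash of the 5-tuple.

    CRC32 is a stand-in hash chosen for illustration only; it is not
    what the NIC or the kernel actually computes.
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_links


# SYN and the handshake ACK sent by the same client share one 5-tuple
# (addresses and ports below are made up for the example).
syn = FlowKey("10.0.0.4", "10.1.0.4", 40000, 443, 6)
ack = FlowKey("10.0.0.4", "10.1.0.4", 40000, 443, 6)

# Per-flow hashing: both packets land on the same tunnel link.
assert pick_link(syn, 2) == pick_link(ack, 2)
```

If two packets with an identical 5-tuple end up on different links, then either the selection is not a pure function of the 5-tuple, or the hash value attached to each packet was computed differently along the way; the sketch does not say which of these applies to the setup described above.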