I would have thought this would be the relevant patch:

  bonding: speed/duplex update at NETDEV_UP event
  commit 4d2c0cda07448ea6980f00102dc3964eb25e241c (parent b5c7d4e)
  authored by Mahesh Bandewar, committed by davem330 on Sep 28, 2017

However, it was first available in v4.15-rc1. At least as far as bonding kernel changes go, there does not seem to be another obvious candidate that might have fixed this problem between 4.12 and 4.13 (on a first skim).

In at least one scenario I looked at, we got a bad speed/duplex setting, which eventually ended up with the bond interface aggregating on a separate port and/or stuck in LACP DISABLED state, which it never got out of. We only checked for the correct, latest device speed/duplex settings via the NETDEV_CHANGE path, where we called __ethtool_get_settings(). If we don't receive another change event to correct the speed/duplex, we never recover.

There are some other patches which help address this at different points, but they landed either before or after (see above) the 4.12 to 4.13 window. I'll take a look at code outside the bonding dir which might impact this.

Joseph, could you provide the raw config files you used as well? It was not super clear from the png image whether those were the only diffs. They did not seem to be very relevant diffs either.

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1753662

Title:
  [i40e] LACP bonding start up race conditions

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Triaged

Bug description:
  When provisioning Ubuntu servers with MAAS at once, some bonding pairs
  will have unexpected LACP status such as "Expired". It randomly happens
  at each provisioning with the default xenial kernel (4.4), but is not
  reproducible with the HWE kernel (4.13). I'm using Intel X710 cards
  (Dell-branded).

  Using the HWE kernel works as a workaround for the short term, but it's
  not ideal since 4.13 is not covered by the Canonical Livepatch service.

  How to reproduce:
  1. configure LACP bonding with MAAS
  2. provision machines
  3. check the bonding status in /proc/net/bonding/bond*

  frequency of occurrence:
  About 5 bond pairs in 22 pairs at each provisioning.

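  Step 3 above can be scripted when many bonds have to be checked after each provisioning run. The following is only a rough sketch, not something taken from this bug: it assumes the usual 802.3ad layout of /proc/net/bonding/bond* (the "Slave Interface", "Speed", "Duplex" and "Aggregator ID" fields), the function names are made up for illustration, and the SAMPLE text is a hypothetical excerpt rather than output from the affected machines. It flags slaves whose speed/duplex is reported as Unknown or whose aggregator differs from the active one.

```python
import glob

# Hypothetical excerpt in the style of /proc/net/bonding/bond0 (not real output
# from the machines in this bug).
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

802.3ad info
Active Aggregator Info:
\tAggregator ID: 1

Slave Interface: eno1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Aggregator ID: 1

Slave Interface: eno2
MII Status: up
Speed: Unknown
Duplex: Unknown
Aggregator ID: 2
"""


def parse_bond_status(text):
    """Return (active_aggregator_id, slaves) parsed from bonding proc text."""
    active_agg = None
    slaves = []
    current = None  # dict for the slave section being parsed, if any
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # not a "key: value" line
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = {"name": value}
            slaves.append(current)
        elif key == "Aggregator ID":
            if current is None:
                # Before any slave section: the bond's active aggregator.
                active_agg = value
            else:
                current.setdefault("aggregator", value)
        elif current is not None and key in ("Speed", "Duplex"):
            current[key.lower()] = value
    return active_agg, slaves


def find_suspect_slaves(text):
    """List (slave_name, reasons) for slaves that look mis-negotiated."""
    active_agg, slaves = parse_bond_status(text)
    problems = []
    for slave in slaves:
        reasons = []
        if slave.get("speed", "Unknown").lower() == "unknown":
            reasons.append("speed unknown")
        if slave.get("duplex", "Unknown").lower() == "unknown":
            reasons.append("duplex unknown")
        if active_agg is not None and slave.get("aggregator") != active_agg:
            reasons.append("aggregator %s != active %s"
                           % (slave.get("aggregator"), active_agg))
        if reasons:
            problems.append((slave["name"], reasons))
    return problems


if __name__ == "__main__":
    # On a real host, check every bond; prints nothing if all slaves look sane.
    for path in sorted(glob.glob("/proc/net/bonding/bond*")):
        with open(path) as handle:
            for name, reasons in find_suspect_slaves(handle.read()):
                print("%s: %s: %s" % (path, name, ", ".join(reasons)))
```

  Run against the hypothetical SAMPLE text, find_suspect_slaves() reports only eno2. Note that this does not decode the LACP port state (for example "Expired"); it only covers the speed/duplex and aggregator symptoms.
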
  [reproducible combination]
  $ uname -a
  Linux comp006 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  $ sudo ethtool -i eno1
  driver: i40e
  version: 1.4.25-k
  firmware-version: 6.00 0x800034e6 18.3.6
  expansion-rom-version:
  bus-info: 0000:01:00.0
  supports-statistics: yes
  supports-test: yes
  supports-eeprom-access: yes
  supports-register-dump: yes
  supports-priv-flags: yes

  [non-reproducible combination]
  $ uname -a
  Linux comp006 4.13.0-36-generic #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  $ sudo ethtool -i eno1
  driver: i40e
  version: 2.1.14-k
  firmware-version: 6.00 0x800034e6 18.3.6
  expansion-rom-version:
  bus-info: 0000:01:00.0
  supports-statistics: yes
  supports-test: yes
  supports-eeprom-access: yes
  supports-register-dump: yes
  supports-priv-flags: yes

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: linux-image-4.4.0-116-generic 4.4.0-116.140
  ProcVersionSignature: Ubuntu 4.4.0-116.140-generic 4.4.98
  Uname: Linux 4.4.0-116-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116, 1 Mar 6 06:37 seq
   crw-rw---- 1 root audio 116, 33 Mar 6 06:37 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.1-0ubuntu2.15
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  Date: Tue Mar 6 06:46:32 2018
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb:
   Bus 002 Device 002: ID 8087:8002 Intel Corp.
   Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
   Bus 001 Device 003: ID 413c:a001 Dell Computer Corp. Hub
   Bus 001 Device 002: ID 8087:800a Intel Corp.
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: Dell Inc. PowerEdge R730
  PciMultimedia:
  ProcEnviron:
   TERM=screen
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 EFI VGA
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-116-generic.efi.signed root=UUID=0528f88e-cf1a-43e2-813a-e7261b88d460 ro console=tty0 console=ttyS0,115200n8
  RelatedPackageVersions:
   linux-restricted-modules-4.4.0-116-generic N/A
   linux-backports-modules-4.4.0-116-generic N/A
   linux-firmware 1.157.17
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 08/16/2017
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: 2.5.5
  dmi.board.name: 072T6D
  dmi.board.vendor: Dell Inc.
  dmi.board.version: A08
  dmi.chassis.asset.tag: 0018880
  dmi.chassis.type: 23
  dmi.chassis.vendor: Dell Inc.
  dmi.modalias: dmi:bvnDellInc.:bvr2.5.5:bd08/16/2017:svnDellInc.:pnPowerEdgeR730:pvr:rvnDellInc.:rn072T6D:rvrA08:cvnDellInc.:ct23:cvr:
  dmi.product.name: PowerEdge R730
  dmi.sys.vendor: Dell Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1753662/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp