I would have thought this would be the relevant patch:

  bonding: speed/duplex update at NETDEV_UP event
  commit 4d2c0cda07448ea6980f00102dc3964eb25e241c (parent b5c7d4e)
  Mahesh Bandewar authored and davem330 committed on Sep 28, 2017

However, it was first available in v4.15-rc1, which is later than
the 4.13 kernel where the problem already does not reproduce.

As far as bonding kernel changes go, there does not seem to be
another obvious candidate that might have fixed this problem
between 4.12 and 4.13 (on a first skim).

In at least one scenario I looked at, we got a bad speed/duplex
setting, which eventually led to the bond interface aggregating
on a separate port and/or getting stuck in the LACP DISABLED
state, which it never left. We only refreshed the device's
current speed/duplex settings via the NETDEV_CHANGE path, where
we called __ethtool_get_settings(). If we don't receive another
change event to correct the speed/duplex, we never recover.
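
For anyone who wants to check a batch of machines for this state,
here is a rough sketch that scans a /proc/net/bonding/bondX dump
for slaves whose speed is reported as "Unknown" or whose
aggregator ID differs from the active aggregator. The function
names (suspicious_slaves, scan_all) are made up for illustration,
and the parsing assumes the usual 802.3ad layout of that file; I
have not run it against the affected hosts, so treat it as a
starting point only.

```python
import glob
import re


def suspicious_slaves(text):
    """Return (slave, reason) pairs for one /proc/net/bonding/bondX dump.

    Hypothetical helper: it assumes the common 802.3ad layout, with an
    "Active Aggregator Info:" section followed by "Slave Interface:" blocks.
    """
    # Everything before the first "Slave Interface:" describes the bond itself.
    head, *slave_blocks = re.split(r"^Slave Interface: ", text, flags=re.M)
    m = re.search(r"Active Aggregator Info:.*?Aggregator ID: (\d+)",
                  head, flags=re.S)
    active = m.group(1) if m else None

    problems = []
    for block in slave_blocks:
        name = block.splitlines()[0].strip()
        speed = re.search(r"^[ \t]*Speed: (.+)$", block, flags=re.M)
        agg = re.search(r"^[ \t]*Aggregator ID: (\d+)", block, flags=re.M)
        # A slave whose speed was never refreshed shows up as "Unknown".
        if speed and speed.group(1).strip() == "Unknown":
            problems.append((name, "speed unknown"))
        # A slave outside the active aggregator is not carrying bond traffic.
        if active and agg and agg.group(1) != active:
            problems.append(
                (name, "aggregator %s != active %s" % (agg.group(1), active)))
    return problems


def scan_all(pattern="/proc/net/bonding/bond*"):
    """Print one line per suspicious slave across all bonds on this host."""
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            for name, reason in suspicious_slaves(f.read()):
                print("%s: %s: %s" % (path, name, reason))
```

Calling scan_all() on a freshly provisioned node prints nothing
when every slave has a known speed and sits in the active
aggregator. It does not look at the LACP port state, so a clean
result does not rule out the DISABLED case.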

There are some other patches which help address this at different
points, but they landed either before or after (see above) the
window.

I'll take a look at code outside the bonding dir which might
impact this. 

Joseph, could you provide the raw config files you used as well?
It was not clear from the PNG image whether those were the only
diffs, and the diffs shown did not seem very relevant either.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1753662

Title:
  [i40e] LACP bonding start up race conditions

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Triaged

Bug description:
  When provisioning multiple Ubuntu servers with MAAS at once, some
  bonding pairs will have an unexpected LACP status such as "Expired".
  It happens randomly at each provisioning with the default xenial
  kernel (4.4), but is not reproducible with the HWE kernel (4.13).
  I'm using Intel X710 cards (Dell-branded).

  Using the HWE kernel works as a short-term workaround, but it's
  not ideal since 4.13 is not covered by the Canonical Livepatch
  service.

  How to reproduce:
  1. configure LACP bonding with MAAS
  2. provision machines
  3. check the bonding status in /proc/net/bonding/bond*

  Frequency of occurrence:
  About 5 of 22 bond pairs at each provisioning.

  [reproducible combination]
  $ uname -a
  Linux comp006 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  $ sudo ethtool -i eno1
  driver: i40e
  version: 1.4.25-k
  firmware-version: 6.00 0x800034e6 18.3.6
  expansion-rom-version: 
  bus-info: 0000:01:00.0
  supports-statistics: yes
  supports-test: yes
  supports-eeprom-access: yes
  supports-register-dump: yes
  supports-priv-flags: yes

  [non-reproducible combination]
  $ uname -a
  Linux comp006 4.13.0-36-generic #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  $ sudo ethtool -i eno1
  driver: i40e
  version: 2.1.14-k
  firmware-version: 6.00 0x800034e6 18.3.6
  expansion-rom-version: 
  bus-info: 0000:01:00.0
  supports-statistics: yes
  supports-test: yes
  supports-eeprom-access: yes
  supports-register-dump: yes
  supports-priv-flags: yes

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: linux-image-4.4.0-116-generic 4.4.0-116.140
  ProcVersionSignature: Ubuntu 4.4.0-116.140-generic 4.4.98
  Uname: Linux 4.4.0-116-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Mar  6 06:37 seq
   crw-rw---- 1 root audio 116, 33 Mar  6 06:37 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.1-0ubuntu2.15
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  Date: Tue Mar  6 06:46:32 2018
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb:
   Bus 002 Device 002: ID 8087:8002 Intel Corp. 
   Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
   Bus 001 Device 003: ID 413c:a001 Dell Computer Corp. Hub
   Bus 001 Device 002: ID 8087:800a Intel Corp. 
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: Dell Inc. PowerEdge R730
  PciMultimedia:
   
  ProcEnviron:
   TERM=screen
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 EFI VGA
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-116-generic.efi.signed root=UUID=0528f88e-cf1a-43e2-813a-e7261b88d460 ro console=tty0 console=ttyS0,115200n8
  RelatedPackageVersions:
   linux-restricted-modules-4.4.0-116-generic N/A
   linux-backports-modules-4.4.0-116-generic  N/A
   linux-firmware                             1.157.17
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 08/16/2017
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: 2.5.5
  dmi.board.name: 072T6D
  dmi.board.vendor: Dell Inc.
  dmi.board.version: A08
  dmi.chassis.asset.tag: 0018880
  dmi.chassis.type: 23
  dmi.chassis.vendor: Dell Inc.
  dmi.modalias: dmi:bvnDellInc.:bvr2.5.5:bd08/16/2017:svnDellInc.:pnPowerEdgeR730:pvr:rvnDellInc.:rn072T6D:rvrA08:cvnDellInc.:ct23:cvr:
  dmi.product.name: PowerEdge R730
  dmi.sys.vendor: Dell Inc.
