Hello,
I'd also like to share my experience with this bug, which affects
us at work (GRNET) as well. We have the following setup:
# cat /etc/debian_version
9.3
# dpkg -l | grep -e ifupdown -e vlan -e bridge-utils | awk '{print $1, $2, $3}'
ii bridge-utils 1.5-13+deb9u1
ii ifupdown 0.8.19
ii vlan 1.9-3.2+b1
# uname -a
Linux foo 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux
I have reproduced it by disabling networking.service, not loading the
bonding module on boot, with the following configuration:
# cat /etc/network/interfaces
auto bond0
iface bond0 inet static
    mtu 9000
    bond-mode 802.3ad
    bond_xmit_hash_policy layer3+4
    bond-miimon 100
    slaves ens5f0 ens5f1
auto vlan109
iface vlan109 inet manual
    bridge_ports bond0.109
    bridge_stp off
    bridge_maxwait 0
    bridge_fd 0
    mtu 9000
auto vlan110
iface vlan110 inet manual
    bridge_ports bond0.110
    bridge_stp off
    bridge_maxwait 0
    bridge_fd 0
    mtu 9000
# cat /etc/modules
8021q
bonding
In our case, we noticed the following timeline, which is quite similar
to Apollon's:
* The bonding module gets loaded into the kernel way before
networking.service gets started (it is listed in /etc/modules, which
should be unnecessary tbh)
* Interface bond0 gets created, which triggers a udev 'add' action
* The action calls bridge-network-interface with INTERFACE=bond0
* bridge-network-interface creates interface bond0.109. bond0.109 has
MTU 1500 because ifup has not run yet
* The creation of bond0.109 triggers another udev 'add' action (which, I
think, should not happen)
* bridge-network-interface tries to run ifup --allow auto vlan109
* The above command fails because it cannot set the MTU of vlan109 to
9000, because bond0.109's MTU is 1500. vlan109 interface is left in an
unconfigured state.
* /lib/udev/bridge-network-interface fails because of set -e
* The second call of bridge-network-interface with INTERFACE=bond0.109
fails in a similar way. All other interfaces are untouched.
* systemd starts up networking.service and runs ifup --allow=auto -a
* bond0 gets MTU 9000
* ifup tries to bring the vlan109 interface up
* This fails because bond0.109's MTU is 1500. It seems that ifupdown
and/or bridge-utils do not touch it
* ifup for vlan110 runs successfully because it creates a new bond0.110
interface, which inherits the MTU of bond0 (now 9000), so vlan110 comes
up correctly
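To check whether a machine has ended up in this state, the MTU comparison
from the timeline above can be sketched in shell. This is only a sketch:
check_port_mtu and SYSFS_NET are hypothetical names I made up for
illustration, and it assumes the kernel exposes each interface's MTU in
/sys/class/net/<iface>/mtu.

```shell
#!/bin/sh
# Hypothetical helper, not part of ifupdown or bridge-utils.
# SYSFS_NET can be overridden so the function can be tried without real
# interfaces; by default it points at the kernel's sysfs tree.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print the current MTU of interface $1.
mtu_of() {
    cat "$SYSFS_NET/$1/mtu"
}

# $1 = bridge port (e.g. bond0.109), $2 = MTU wanted on the bridge.
# Returns 0 if the port MTU is large enough, 1 if it is too small,
# 2 if the MTU could not be read (e.g. the interface does not exist).
check_port_mtu() {
    port_mtu=$(mtu_of "$1") || return 2
    if [ "$port_mtu" -lt "$2" ]; then
        echo "$1: MTU $port_mtu is below $2"
        return 1
    fi
    echo "$1: MTU $port_mtu is ok"
}
```

On an affected host, "check_port_mtu bond0.109 9000" would be expected to
report the mismatch, while "check_port_mtu bond0.110 9000" should pass.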
The above behavior does not always happen: if, for some reason,
networking.service gets started before bridge-network-interface runs,
all interfaces will come up correctly. Also, this affects only the
first interface in /etc/network/interfaces that has a bridge_ports
stanza defined, because bridge-network-interface aborts there for the
reasons I described above.
I agree with Apollon: I really do not understand what the code is trying
to do and why BRIDGE_HOTPLUG defaults to yes. We ran into serious
problems with silent packet loss in QEMU VMs whose tap interfaces were
bridged to the above vlanXXX interfaces and used MTU 9000, and the only
way to mitigate this problem for now is to set BRIDGE_HOTPLUG=no.
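For reference, a minimal sketch of that mitigation. I am assuming here
that the setting lives in /etc/default/bridge-utils and that the udev
helper sources that file as shell; please verify the path on your
system before relying on it.

```shell
# /etc/default/bridge-utils (assumed location, sourced as shell)
# Do not let the udev hotplug helper bring up bridges or add ports;
# leave that entirely to networking.service / ifup.
BRIDGE_HOTPLUG=no
```

With this in place, only the ifup run started by networking.service should
create bond0.109, after bond0 already has MTU 9000.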
Unfortunately, it's not easy for us to suggest a solution, but we
can provide more information if needed.
Regards,
Nikos