After discussing this with Foundations, we concluded that ifupdown should not only lock on a per-interface basis, but should also have a way of creating a hierarchy of interfaces, in which locking the master would imply locking all of its slaves as well (for vlans, aliases, bridging, etc.). In a possible parallel execution, ifupdown would then obey those restrictions and configure interfaces in the proper order, with locking guaranteed.
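To make the hierarchy idea concrete, here is a minimal sketch of "master first, then slaves" lock ordering using nested flock(1) calls. It is only an illustration under stated assumptions: ifupdown itself is written in C, and LOCKDIR and with_hierarchy_lock are hypothetical names that exist nowhere in ifupdown or ifenslave.

```shell
#!/bin/sh
# Sketch only. LOCKDIR and with_hierarchy_lock are hypothetical names used
# for illustration; they are not part of ifupdown or ifenslave.
LOCKDIR="${LOCKDIR:-$(mktemp -d)}"

# with_hierarchy_lock MASTER "SLAVE1 SLAVE2 ..." COMMAND [ARGS...]
# Runs COMMAND while holding the lock of MASTER and of every listed slave.
with_hierarchy_lock() {
    master=$1
    slaves=$2
    shift 2
    # Wrap the command in one flock(1) call per slave. Each wrapper is
    # prepended, so the last slave listed becomes the outermost slave lock.
    for s in $slaves; do
        set -- flock "$LOCKDIR/$s.lock" "$@"
    done
    # The master wrapper is prepended last, so it is the outermost lock and
    # is always acquired before any slave lock.
    set -- flock "$LOCKDIR/$master.lock" "$@"
    "$@"
}

# Example: configure bond0 while eth1 and eth2 are locked too.
with_hierarchy_lock bond0 "eth1 eth2" echo "configuring bond0 and its slaves"
```

Because every caller takes the master lock before any slave lock, two parallel invocations for the same bond serialize on the master instead of interleaving on the slaves.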
I'm preparing those changes and I'll suggest them upstream. If they get accepted I'll provide SRUs for precise and trusty. If the SRUs or the upstream code proposal are not accepted, I may create a parallel ifupdown package, maintained by me, to address those issues.

Thank you. Coming back to this soon.

-- 
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to ifupdown in Ubuntu.
https://bugs.launchpad.net/bugs/1337873

Title:
Precise, Trusty, Utopic - ifupdown initialization problems caused by race condition

Status in “ifupdown” package in Ubuntu: In Progress
Status in “ifupdown” package in Debian: New

Bug description:

* Please note that my bonding examples use eth1 and eth2 as slave interfaces.

ifupdown has some race conditions, explained below. ifenslave does not behave well with sysv networking and upstart network-interface scripts running together.

!!!!
case 1)
(a) ifup eth0                        (b) ifup -a for eth0
-----------------------------------------------------------------
1-1. Lock ifstate.lock file.
                                     1-1. Wait for locking
                                          ifstate.lock file.
1-2. Read ifstate file to check
     the target NIC.
1-3. close(=release) ifstate.lock
     file.
1-4. Judge that the target NIC
     isn't processed.
                                     1-2. Read ifstate file to check
                                          the target NIC.
                                     1-3. close(=release) ifstate.lock
                                          file.
                                     1-4. Judge that the target NIC
                                          isn't processed.
2. Lock and update ifstate file.
   Release the lock.
                                     2. Lock and update ifstate file.
                                        Release the lock.
!!!

to be explained

!!!
case 2)
(a) ifenslave of eth0                   (b) ifenslave of eth0
------------------------------------------------------------------
3. Execute ifenslave of eth0.           3. Execute ifenslave of eth0.
4. Link down the target NIC.
5. Write NIC id to
   /sys/class/net/bond0/bonding/slaves,
   then NIC gets up.
                                        4. Link down the target NIC.
                                        5. Fails to write NIC id to
                                           /sys/class/net/bond0/bonding/slaves;
                                           it is already written.
!!!
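The core of case 1 is that the lock is released between the check (steps 1-2 to 1-4) and the update (step 2), so both callers can conclude that the NIC is unprocessed. A minimal sketch of the alternative, holding one lock across both the check and the update, could look like this. It assumes flock(1) from util-linux and a state file with one "iface=iface" line per interface; STATE and claim_iface are hypothetical names, and this is not how ifupdown (a C program) is actually implemented.

```shell
#!/bin/sh
# Sketch only. STATE and claim_iface are hypothetical; the state file here
# is a temporary stand-in, not the real ifstate file.
STATE="${STATE:-$(mktemp)}"

# claim_iface IFACE
# Returns 0 if IFACE was not recorded yet and has now been recorded by us,
# 1 if another caller had already recorded it.
claim_iface() {
    iface=$1
    (
        flock 9                                  # block until we own the lock
        if grep -qxF "$iface=$iface" "$STATE"; then
            exit 1                               # already processed by someone else
        fi
        echo "$iface=$iface" >> "$STATE"         # update while still holding the lock
    ) 9>"$STATE.lock"                            # lock is released when fd 9 closes
}
```

With this shape, "ifup eth0" and "ifup -a" racing for eth0 would both call claim_iface eth0, and only the caller that gets a zero exit status would go on to configure the interface.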
#####################################################################

#### My setup:

root@provisioned:~# cat /etc/modprobe.d/bonding.conf
alias bond0 bonding
options bonding mode=1 arp_interval=2000

Both /etc/init.d/networking and upstart network-interface are enabled.

#### Beginning:

root@provisioned:~# cat /etc/network/interfaces
# /etc/network/interfaces

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

I'm able to boot with both scripts (networking and network-interface) enabled with no problem. I can also boot with only the "networking" script enabled:

---
root@provisioned:~# initctl list | grep network
network-interface stop/waiting
networking start/running
---

OR with only the "network-interface" script enabled:

---
root@provisioned:~# initctl list | grep network
network-interface (eth2) start/running
network-interface (lo) start/running
network-interface (eth0) start/running
network-interface (eth1) start/running
---

#### Enabling bonding:

Following the ifenslave configuration example (/usr/share/doc/ifenslave/examples/two_hotplug_ethernet), my /etc/network/interfaces has to look like this:

---
auto eth1
iface eth1 inet manual
    bond-master bond0

auto eth2
iface eth2 inet manual
    bond-master bond0

auto bond0
iface bond0 inet static
    bond-mode 1
    bond-miimon 100
    bond-primary eth1 eth2
    address 192.168.169.1
    netmask 255.255.255.0
    broadcast 192.168.169.255
---

Having both scripts running does not make any difference here, since the "bond-slaves" keyword, which ifenslave needs in order to work, is missing and the slave interfaces are set to "manual".

Ifenslave code:

"""
for slave in $BOND_SLAVES ; do
...
    # Ensure $slave is down.
    ip link set "$slave" down 2>/dev/null
    if ! sysfs_add slaves "$slave" 2>/dev/null ; then
        echo "Failed to enslave $slave to $BOND_MASTER. Is $BOND_MASTER ready and a bonding interface ?" >&2
    else
        # Bring up slave if it is the target of an allow-bondX stanza.
        # This is usefull to bring up slaves that need extra setup.
        if [ -z "$(which ifquery)" ] || ifquery --allow \"$BOND_MASTER\" --list | grep -q $slave; then
            ifup $v --allow "$BOND_MASTER" "$slave"
        fi
"""

Without the "bond-slaves" keyword on the master interface declaration, ifenslave will NOT bring any slave interface up when ifup is invoked for the "master" interface.

*********** Part 1

So, having the networking sysv init script AND the upstart network-interface script running together... the following example works:

---
root@provisioned:~# cat /etc/network/interfaces
# /etc/network/interfaces

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet manual
    bond-master bond0

auto eth2
iface eth2 inet manual
    bond-master bond0

auto bond0
iface bond0 inet static
    bond-mode 1
    bond-miimon 100
    bond-primary eth1
    bond-slaves eth1 eth2
    address 192.168.169.1
    netmask 255.255.255.0
    broadcast 192.168.169.255
---

The ifenslave script sets the link down on all slave interfaces declared by the "bond-slaves" keyword and assigns them to the correct bond. The ifenslave script ONLY tries to make a reentrant call to ifupdown if the slave interfaces have an "allow-bondX" stanza (not our case).

So this should not work, since when the master bonding interface (bond0) is brought up, ifenslave does not configure slaves that lack an "allow-bondX" stanza. What is happening, and why is it working?

If we disable the upstart "network-interface" script, our bonding stops working on boot. This is because upstart was the one setting the slave interfaces up (with the configuration above), not the sysv networking scripts.
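Since the behaviour above depends on which stanzas carry "bond-master" and "bond-slaves", it can help to list the interfaces that a configuration ties to a given master. The following is a rough sketch that scans an interfaces-style file with awk; slaves_of is a hypothetical helper, it ignores "source" includes and other corner cases of the format, and it is not something ifenslave itself does.

```shell
#!/bin/sh
# Sketch only. slaves_of is a hypothetical helper for inspecting a
# configuration file; it is not part of ifupdown or ifenslave.

# slaves_of MASTER FILE
# Prints the name of every "iface" stanza in FILE that contains
# "bond-master MASTER", one name per line.
slaves_of() {
    awk -v master="$1" '
        $1 == "iface"                       { current = $2 }   # remember the stanza
        $1 == "bond-master" && $2 == master { print current }  # stanza points at master
    ' "$2"
}
```

For the Part 1 configuration, running slaves_of bond0 /etc/network/interfaces would be expected to print eth1 and eth2, each on its own line.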
It is clear that ifenslave, invoked from the sysv script, can set the slave interface down at any time (even during upstart script execution), so it might work and it might not:

"""
ip link set "$slave" down 2>/dev/null
"""

root@provisioned:~# initctl list | grep network-interface
network-interface (eth2) start/running
network-interface (lo) start/running
network-interface (bond0) start/running
network-interface (eth0) start/running
network-interface (eth1) start/running

Since having the interface down is a requirement for enslaving it, running both scripts together (upstart and sysv) could create a situation where upstart brings the slave interface up, but ifenslave from the sysv script brings it down and never brings it up again (because it does not have an "allow-bondX" stanza).

*********** Part 2

What if I disable upstart "network-interface", keep only the sysv script, but add the "allow-bondX" stanza to the slave interfaces?

The funny part begins... without upstart, the ifupdown tool calls ifenslave for the bond0 interface, and ifenslave runs these lines:

"""
for slave in $BOND_SLAVES ; do
...
        if [ -z "$(which ifquery)" ] || ifquery --allow \"$BOND_MASTER\" --list | grep -q $slave; then
            ifup $v --allow "$BOND_MASTER" "$slave"
        fi
"""

But ifenslave stays waiting forever for the bond0 interface to come online. We now have a chicken-and-egg situation:

* ifupdown tries to put the bond0 interface online.
* we are not running the upstart network-interface script.
* ifupdown for bond0 calls ifenslave.
* ifenslave tries to find interfaces with an "allow-bondX" stanza
* ifenslave tries to ifup the slave interfaces with that stanza
* slave interfaces keep waiting forever for the master
* master is waiting for the slave interface
* slave interface is waiting for the master interface
... :D

And we have an infinite loop in ifenslave:

"""
# Wait for the master to be ready
[ ! -f /run/network/ifenslave.$BOND_MASTER ] && echo "Waiting for bond master $BOND_MASTER to be ready"
while :; do
    if [ -f /run/network/ifenslave.$BOND_MASTER ]; then
        break
    fi
    sleep 0.1
done
"""

*********** Conclusion

These problems can be reproduced if the right triggers are set (like the ones I just showed).

Not having parallel ifupdown executions (sysv and upstart, for example) can cause an infinite loop during boot.

Having parallel ifupdown executions can trigger race conditions between:

1) ifupdown itself (case 1 in the bug description).
2) ifupdown and the ifenslave script (case 2 in the bug description).
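For comparison with the unbounded "while :" loop quoted in Part 2, here is a minimal sketch of the same wait with an upper bound on the number of polls. It is only an illustration: wait_for_master, its arguments, and the default of 100 tries are assumptions of mine, not something present in ifenslave, and a timeout would only turn the hang into a visible failure without fixing the master/slave ordering problem itself.

```shell
#!/bin/sh
# Sketch only. wait_for_master and its default limit are hypothetical and
# not part of ifenslave.

# wait_for_master FLAGFILE [MAX_TRIES]
# Polls for FLAGFILE every 0.1 s. Returns 0 as soon as the file exists,
# or 1 after MAX_TRIES unsuccessful polls.
wait_for_master() {
    flag=$1
    tries=${2:-100}          # assumed default: 100 polls, roughly 10 seconds
    while [ "$tries" -gt 0 ]; do
        if [ -f "$flag" ]; then
            return 0         # master signalled that it is ready
        fi
        tries=$((tries - 1))
        sleep 0.1            # same polling interval as the quoted loop
    done
    echo "Timed out waiting for $flag" >&2
    return 1
}
```

A caller could use it as wait_for_master "/run/network/ifenslave.$BOND_MASTER" || exit 1, so that a slave whose master never becomes ready fails instead of blocking the boot.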