I bisected again, and again it came back to that mount point change.
This seems so bizarre.

$ git bisect log
# bad: [6d4f0a79e5a307b6fd3ee3cc5bbb2fcb701b09db] UBUNTU: Ubuntu-4.4.0-57.78
# good: [db5f146d309e70067dae57798c9ea679af835aa7] UBUNTU: Ubuntu-4.4.0-53.74
git bisect start 'Ubuntu-4.4.0-57.78' 'Ubuntu-4.4.0-53.74'
# bad: [02bf412367b827aa5be05a315088ef5fdcf267ca] dmaengine: at_xdmac: fix spurious flag status for mem2mem transfers
git bisect bad 02bf412367b827aa5be05a315088ef5fdcf267ca
# bad: [1e089050b800ba7d6ba1bf5814827e6cca301ad5] smc91x: avoid self-comparison warning
git bisect bad 1e089050b800ba7d6ba1bf5814827e6cca301ad5
# bad: [d7632bdaba3dd143eac3c80bb7e2b0f62259583d] xhci: use default USB_RESUME_TIMEOUT when resuming ports.
git bisect bad d7632bdaba3dd143eac3c80bb7e2b0f62259583d
# bad: [7942010de9a2fe39e72b84e628867f4ff29a70f2] libxfs: clean up _calc_dquots_per_chunk
git bisect bad 7942010de9a2fe39e72b84e628867f4ff29a70f2
# good: [9d2524b0bdeb57f80d0279f6695a833606ad0597] UBUNTU: SAUCE: Bluetooth: decrease refcount after use
git bisect good 9d2524b0bdeb57f80d0279f6695a833606ad0597
# bad: [fd4b5fa6e3487d15ede746f92601af008b2abbc0] mnt: Add a per mount namespace limit on the number of mounts
git bisect bad fd4b5fa6e3487d15ede746f92601af008b2abbc0
# good: [f2109fe47ceb77647ef7d4f545efeba43d06fb64] videobuf2-v4l2: Verify planes array in buffer dequeueing
git bisect good f2109fe47ceb77647ef7d4f545efeba43d06fb64
# good: [d5d9494d2092a7e571dee635ca254075912355c1] thinkpad_acpi: Add support for HKEY version 0x200
git bisect good d5d9494d2092a7e571dee635ca254075912355c1
# first bad commit: [fd4b5fa6e3487d15ede746f92601af008b2abbc0] mnt: Add a per mount namespace limit on the number of mounts
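
For anyone wanting to double-check a log like the one above, here is a rough Python sketch that pulls the verdicts and the first bad commit out of `git bisect log` output. The function name summarize_bisect_log and the SAMPLE string are made up for illustration (SAMPLE is just a few lines of the log above with the mail wrapping undone), and it assumes each "# good:" / "# bad:" comment sits on a single line.

```python
import re

# Matches the "# good: [sha] subject" / "# bad: [sha] subject" comment lines
# and the final "# first bad commit: [sha] subject" line of `git bisect log`.
LINE_RE = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]{40})\] (.*)$")


def summarize_bisect_log(text):
    """Return (steps, first_bad) parsed from `git bisect log` output.

    Hypothetical helper, not part of git itself.
    """
    steps = []        # (verdict, sha, subject) tuples in log order
    first_bad = None  # (sha, subject) once the final line is seen
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip the replayable "git bisect ..." command lines
        verdict, sha, subject = m.groups()
        if verdict == "first bad commit":
            first_bad = (sha, subject)
        else:
            steps.append((verdict, sha, subject))
    return steps, first_bad


# Sample input: lines copied from the log above, with wrapping undone.
SAMPLE = """\
# good: [d5d9494d2092a7e571dee635ca254075912355c1] thinkpad_acpi: Add support for HKEY version 0x200
git bisect good d5d9494d2092a7e571dee635ca254075912355c1
# first bad commit: [fd4b5fa6e3487d15ede746f92601af008b2abbc0] mnt: Add a per mount namespace limit on the number of mounts
"""

if __name__ == "__main__":
    steps, first_bad = summarize_bisect_log(SAMPLE)
    print(len(steps), "verdict(s) recorded; first bad:", first_bad[0][:12])
```

Feeding it the full log should report the same fd4b5fa6e348 commit both times it shows up (once as a "bad" step and once as the first bad commit), which is a quick sanity check that the bisect converged where the log says it did.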

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1652348

Title:
  initrd dhcp fails / ignores valid response

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  Between kernel versions 4.4.0-53 and 4.4.0-57 a bug has been
  (re?)introduced that is breaking dhcp booting in the initrd
  environment.  This is stopping instances that use iscsi storage from
  being able to connect.

  Over serial console it outputs:

  IP-Config: no response after 2 secs - giving up
  IP-Config: ens2f0 hardware address 90:e2:ba:d1:36:38 mtu 1500 DHCP RARP
  IP-Config: ens2f1 hardware address 90:e2:ba:d1:36:39 mtu 1500 DHCP RARP
  IP-Config: no response after 3 secs - giving up

  with increasing delays until it fails, at which point a simple
  ipconfig -t dhcp -d "ens2f0" works.  The console output is slightly
  garbled but should give you an idea:

  (initramfs) ipconfig -t dhcp -[  728.379793] ixgbe 0000:13:00.0 ens2f0: changing MTU from 1500 to 9000
  d "ens2f0"
  IP-Config: ens2f0 hardware address 90:e2:ba:d1:36:38 mtu 1500 DHCP RARP
  IP-Config: ens2f0 guessed broadcast address 10.0.1.255
  IP-Config: ens2f0 complete (dhcp from 169.254.169.254):
   addres[  728.980448] ixgbe 0000:13:00.0 ens2f0: detected SFP+: 3
  s: 10.0.1.56        broadcast: 10.0.1.255       netmask: 255.255.255.0
   gateway: 10.0.1.1   [  729.148410] ixgbe 0000:13:00.0 ens2f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
        dns0     : 169.254.169.254  dns1   : 0.0.0.0
   rootserver: 169.254.169.254 rootpath:
   filename  : /ipxe.efi

  
  tcpdumps show that dhcp requests from the host are being received and
  responses sent, but the host does not accept them.  When the ipconfig
  command is issued manually, an identical dhcp request and response
  happens, only this time it is accepted.  The messages don't appear to
  be sent or received incorrectly; they are just silently ignored by
  ipconfig.

  I was seeing this behaviour earlier this year, which I was able to fix
  by specifying "ip=dhcp" as a kernel parameter.  About a month ago that
  was identified as causing us other problems (long story) and we
  dropped it, at which point we discovered the original bug was no
  longer an issue.

  Putting "ip=dhcp" back on with this kernel no longer fixes the
  problem.

  I've compared the two initrds and effectively the only thing that has
  changed between the two is the kernel components.

  I'm going to try and track back through kernel versions to see if I
  can find which version the fix happened in to maybe provide some
  additional context.  I'll also attach copies of the initrds, packet
  captures etc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1652348/+subscriptions

