> Is there no other more reliable way to reproduce the case that's being
> fixed here

sure, here's what I did; first, the setup:

-create VMs for the releases (Trusty/Xenial/Artful), managed by virsh
(e.g. with uvt-kvm or whatever)
-add a second interface to each of them, e.g.:

$ virsh attach-interface lp1718568-artful network default --model virtio --persistent

-set up another server/vm/container/whatever, connected to the same
network as the test VMs' second interfaces, and install and configure
isc-dhcp-server (or dnsmasq, or whatever dhcpv6 server) on it to serve
dhcpv6
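For the server side, a minimal isc-dhcp-server setup might look like the sketch below. It is only an assumption of what a working test config could be: the 2001:db8::/64 documentation prefix, the pool bounds, the lease times, the helper name and the server interface name eth1 are all made up, not taken from the runs in this comment.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal dhcpd6.conf serving one /64.
# The prefix, pool bounds and lease times are assumptions for a throwaway
# test network.
gen_dhcpd6_conf() {
    prefix="$1"    # e.g. 2001:db8:: (IPv6 documentation prefix)
    cat <<EOF
default-lease-time 3600;
max-lease-time 7200;
subnet6 ${prefix}/64 {
    range6 ${prefix}10 ${prefix}200;
}
EOF
}

# Usage on the server, as root (eth1 is an assumed interface name; dhcpd
# expects an address inside the subnet6 on the interface it serves):
#   gen_dhcpd6_conf 2001:db8:: > /etc/dhcp/dhcpd6.conf
#   ip -6 addr add 2001:db8::1/64 dev eth1
#   touch /var/lib/dhcp/dhcpd6.leases
#   dhcpd -6 -d -cf /etc/dhcp/dhcpd6.conf eth1
```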

now on each release VM:

-check the virsh interface you'll be using, e.g.:

$ virsh domiflist lp1718568-artful
Interface  Type       Source     Model       MAC
-------------------------------------------------------
vnet3      network    default    virtio      52:54:00:3f:b9:ad
vnet0      network    default    virtio      52:54:00:bf:16:8a
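With two NICs the MAC is easy to match by eye; a hypothetical helper like the one below does the same lookup with awk, assuming the column layout shown above (interface name in column 1, MAC in column 5, two header lines).

```shell
#!/bin/sh
# Hypothetical helper: read `virsh domiflist DOMAIN` output on stdin and
# print the host-side interface name (column 1) whose MAC (column 5)
# matches $1, ignoring case. The first two lines (header, dashes) are skipped.
vnet_for_mac() {
    awk -v mac="$1" 'NR > 2 && tolower($5) == tolower(mac) { print $1 }'
}

# Usage on the host (MAC as reported by `ip link` inside the VM):
#   virsh domiflist lp1718568-artful | vnet_for_mac 52:54:00:bf:16:8a
```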

confirm which one matches the test interface in the VM by comparing MAC
addresses; in my case it's vnet0.  now, bring its link state down using
virsh:

$ virsh domif-setlink lp1718568-artful vnet0 down

then ssh into the test VM (over its first, still working, default
interface, or use virt-viewer or whatever) and test dhclient -6, making
sure to first verify the test interface is down (i.e. that it doesn't
already have a link-local addr):

ubuntu@lp1718568-artful:~$ sudo ip a show ens7
3: ens7: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 52:54:00:bf:16:8a brd ff:ff:ff:ff:ff:ff
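The "no link-local addr" precondition can also be checked with the same `ip` query that dhclient-script's ipv6_link_up_and_dad() runs (quoted in the bug description below). The helper name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: exit 0 only if DEV has no link-scope IPv6 address at
# all, i.e. the precondition for reproducing this bug. Same query as
# dhclient-script's ipv6_link_up_and_dad().
no_link_local() {
    [ -z "$(ip -6 -o address show dev "$1" scope link)" ]
}

# Usage inside the guest:
#   no_link_local ens7 && echo "ens7 has no link-local address yet"
```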

ubuntu@lp1718568-artful:~$ sudo dhclient -v -6 ens7
Internet Systems Consortium DHCP Client 4.3.5
Copyright 2004-2016 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

no link-local IPv6 address for ens7

If you think you have received this message due to a bug rather
than a configuration issue please read the section on submitting
bugs on either our web page at www.isc.org or in the README file
before submitting a bug.  These pages explain the proper
process and the information we find helpful for debugging..

exiting.


as expected (for this bug), that fails immediately.  now, upgrade to the
test version in my PPA:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1718568

ubuntu@lp1718568-artful:~$ dpkg -l | grep isc-dhcp
ii  isc-dhcp-client  4.3.5-3ubuntu2.2+hf1718568v20180302b1  amd64  DHCP client for automatically obtaining an IP address
ii  isc-dhcp-common  4.3.5-3ubuntu2.2+hf1718568v20180302b1  amd64  common manpages relevant to all of the isc-dhcp packages
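To double-check that the test build (and not the archive version) is what's installed, the version string can be matched against the hotfix tag. A hypothetical sketch, assuming the `+hf1718568` tag seen in the version above:

```shell
#!/bin/sh
# Hypothetical check: succeed only if the version string passed in carries
# the hotfix tag of the PPA test build shown above.
is_hotfix_version() {
    case "$1" in
        *+hf1718568*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Usage inside the guest:
#   is_hotfix_version "$(dpkg-query -W -f='${Version}' isc-dhcp-client)" \
#       && echo "test build installed"
```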


again, make sure the interface is down (no link-local addr) and try the
new dhclient:

ubuntu@lp1718568-artful:~$ sudo ip l set down dev ens7
ubuntu@lp1718568-artful:~$ sudo ip a show ens7
3: ens7: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 52:54:00:bf:16:8a brd ff:ff:ff:ff:ff:ff

ubuntu@lp1718568-artful:~$ sudo dhclient -v -6 ens7
Internet Systems Consortium DHCP Client 4.3.5
Copyright 2004-2016 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/


now, instead of immediately exiting with an error, it waits: this is
where it's waiting for the interface to get a 'tentative' link-local
address and then complete DAD, switching to a normal link-local address
so dhcpv6 can begin.  After a few seconds (dhclient-script.linux
defaults to 60 attempts, with 0.1 second delays between them, so ~6
seconds of waiting) it will give up and exit as before:

no link-local IPv6 address for ens7

If you think you have received this message due to a bug rather
than a configuration issue please read the section on submitting
bugs on either our web page at www.isc.org or in the README file
before submitting a bug.  These pages explain the proper
process and the information we find helpful for debugging..

exiting.
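The ~6 second figure is just attempts × delay: 60 × 0.1 s = 6 s (presumably a bit more in practice, since each attempt also has to run the `ip` query). A generic fixed-count poll loop of that shape is sketched below; it only illustrates the pattern, it is not the actual dhclient-script.linux code, and `poll_until` is a hypothetical name.

```shell
#!/bin/sh
# Generic fixed-count poll loop (hypothetical helper, not the real
# dhclient-script.linux code): run CMD up to ATTEMPTS times, sleeping DELAY
# seconds between tries; succeed as soon as CMD succeeds.
# Worst case is roughly ATTEMPTS * DELAY, e.g. 60 * 0.1 s = 6 s.
poll_until() {
    attempts="$1"; delay="$2"; shift 2
    while [ "$attempts" -gt 0 ]; do
        "$@" && return 0
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep "$delay"    # fractional delays need e.g. GNU sleep
        fi
    done
    return 1
}

# Usage (some_check is a placeholder for whatever readiness test applies):
#   poll_until 60 0.1 some_check ens7
```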


Ok, we verified the patch does force dhclient to wait for the link-local addr 
(even without any tentative addr); now bring the interface back down and 
re-test, but this time immediately switch back to the host and use virsh to 
bring the link state back up (before dhclient times out):

ubuntu@lp1718568-artful:~$ sudo ip l set down dev ens7
ubuntu@lp1718568-artful:~$ sudo ip a show ens7
3: ens7: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 52:54:00:bf:16:8a brd ff:ff:ff:ff:ff:ff

ubuntu@lp1718568-artful:~$ sudo dhclient -v -6 ens7
Internet Systems Consortium DHCP Client 4.3.5
Copyright 2004-2016 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/


now, quickly, on the host:

$ virsh domif-setlink lp1718568-artful vnet0 up
Device updated successfully


and back in the test VM:

Listening on Socket/ens7
Sending on   Socket/ens7
PRC: Soliciting for leases (INIT).
XMT: Forming Solicit, 0 ms elapsed.
XMT:  X-- IA_NA 00:bf:16:8a
XMT:  | X-- Request renew in  +3600
XMT:  | X-- Request rebind in +5400
XMT:  | X-- Request address 2001:db8::99.
XMT:  | | X-- Request preferred in +7200
XMT:  | | X-- Request valid in     +10800
XMT: Solicit on ens7, interval 1080ms.
RCV: Advertise message on ens7 from fe80::5054:ff:fec9:897c.
RCV:  X-- IA_NA 00:bf:16:8a
RCV:  | X-- starts 1520020524
RCV:  | X-- t1 - renew  +3600
RCV:  | X-- t2 - rebind +7200
RCV:  | X-- [Options]
RCV:  | | X-- IAADDR 2001:db8::99
RCV:  | | | X-- Preferred lifetime 604800.
RCV:  | | | X-- Max lifetime 2592000.
RCV:  X-- Server ID: 00:01:00:01:22:2c:61:12:52:54:00:c9:89:7c
RCV:  Advertisement recorded.
PRC: Selecting best advertised lease.
PRC: Considering best lease.
PRC:  X-- Initial candidate 00:01:00:01:22:2c:61:12:52:54:00:c9:89:7c (s: 10105, p: 0).
XMT: Forming Request, 0 ms elapsed.
XMT:  X-- IA_NA 00:bf:16:8a
XMT:  | X-- Requested renew  +3600
XMT:  | X-- Requested rebind +5400
XMT:  | | X-- IAADDR 2001:db8::99
XMT:  | | | X-- Preferred lifetime +7200
XMT:  | | | X-- Max lifetime +7500
XMT:  V IA_NA appended.
XMT: Request on ens7, interval 950ms.
RCV: Reply message on ens7 from fe80::5054:ff:fec9:897c.
RCV:  X-- IA_NA 00:bf:16:8a
RCV:  | X-- starts 1520020525
RCV:  | X-- t1 - renew  +3600
RCV:  | X-- t2 - rebind +7200
RCV:  | X-- [Options]
RCV:  | | X-- IAADDR 2001:db8::99
RCV:  | | | X-- Preferred lifetime 604800.
RCV:  | | | X-- Max lifetime 2592000.
RCV:  X-- Server ID: 00:01:00:01:22:2c:61:12:52:54:00:c9:89:7c
PRC: Bound to lease 00:01:00:01:22:2c:61:12:52:54:00:c9:89:7c.


patch works!
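Instead of switching to the host and racing dhclient's ~6 second timeout by hand, the link-up could be scheduled from the host before starting dhclient in the guest. A hypothetical helper; the name is illustrative and it only wraps the same `virsh domif-setlink` call used above:

```shell
#!/bin/sh
# Hypothetical host-side helper: bring DOMAIN's VNET link back up after
# DELAY seconds, in the background, using the same virsh call as above.
# Start it, then run `sudo dhclient -v -6 ens7` in the guest within DELAY
# seconds (keep DELAY below the ~6 s timeout).
link_up_after() {
    dom="$1"; vnet="$2"; delay="$3"
    ( sleep "$delay" && virsh domif-setlink "$dom" "$vnet" up ) &
}

# Usage on the host, with the link already down as above:
#   link_up_after lp1718568-artful vnet0 3   # then start dhclient in the guest
```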

Of course, if the NIC can't get its interface up within the ~6 second
timeout, dhclient will still fail, but I think for any incredibly slow
interface hw like that, it's not unreasonable to expect some additional
ifupdown/netplan/networkd/NetworkManager configuration to delay dhclient
after bringing up the interface.  I can't imagine what HW takes more
than 6 seconds to bring up link state.

I should point out the regression potential for this as well: any NIC
that's configured for dhcpv6 but has no link state previously failed
immediately, while with this patch it won't fail for ~6 seconds.  That
may delay boot by those 6 seconds if systemd/upstart is waiting for the
interface to get its dhcpv6 address.  However, consider that for an
interface that *does* have link state, but has no dhcpv6 server on its
network, dhclient will wait much, much longer for a dhcpv6 response.
I think an additional 6 seconds for a broken configuration is a
reasonable price for getting some slow-to-come-up NICs working.

** Description changed:

  [impact]
  
  bug 1633479 made a change to isc-dhcp to wait for an interface's
  link-local ipv6 address to switch from 'tentative' to normal, because
  all link-local addresses briefly go through a 'tentative' state while
  the kernel is performing ipv6 link-local 'duplicate address detection'
  (DAD).  While in the 'tentative' state, dhclient can't take over the
  interface and send out dhcpv6 requests; it must wait until DAD
  completes.
  
  However, the change made in that bug does not account for the case where
  the 'tentative' check is done before the interface has even set up a
  link-local address; its case statement assumes if there is no
  'tentative' or 'dadfailed' string in the output, the link-local address
  is ready to use.  When the address check finds no address at all, this
  will return as successful, even though it shouldn't, and dhclient will
  fail to get the dhcpv6 address.
  
  [test case]
  
  on a system that is configured for dhcpv6 on one or more of its
  interfaces, repeatedly try to get the dhcpv6 address.  For interfaces
  that are slower to actually set up their initial tentative link-local
  address, they will occasionally fail, since the current code is a race
  between the kernel adding the tentative link-local address, and the
  dhclient-script.linux code checking the interface for a tentative
  address.
  
  with the patch to correct this, even interfaces slow to add their
  tentative link-local address should correctly wait for the address to
  get added, and then transition from tentative to normal, and then begin
  the dhcpv6 process.
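As a rough illustration of "repeatedly try to get the dhcpv6 address", a driver loop could look like the sketch below. It is hypothetical: it assumes it runs as root in the guest, relies on dhclient's `-1` (try once) and `-r` (release) options, and leaves it to dhclient-script to bring the link back up.

```shell
#!/bin/sh
# Hypothetical test-case driver, run as root in the guest: take DEV down,
# let dhclient (via dhclient-script) bring it back up, try one DHCPv6
# acquisition with -1 (try once, exit with an error on failure), release
# any lease with -r, and print how many of RUNS attempts failed.
count_dhcpv6_failures() {
    dev="$1"; runs="$2"; fails=0; i=0
    while [ "$i" -lt "$runs" ]; do
        ip link set dev "$dev" down
        if dhclient -6 -1 "$dev"; then
            dhclient -6 -r "$dev"    # release and stop the running client
        else
            fails=$((fails + 1))
        fi
        i=$((i + 1))
    done
    echo "$fails"
}

# Usage: count_dhcpv6_failures ens7 20
```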
  
  [regression potential]
  
  errors in this function can cause dhclient to fail to get an ipv6
  address for an interface; a regression would happen if this patch
  made it fail more often than it already does, but it would not cause
  other failures or problems after getting an ipv6 address, since this
  patch only affects startup time.
  
+ additionally, the current behavior of dhclient when using an interface
+ that has no link-local address after being brought up is to exit
+ immediately, while after this patch dhclient will wait ~6 seconds before
+ exiting (while waiting for the interface to get a non-tentative
+ link-local addr).  This is the point of this bug: some NIC hw doesn't
+ show a tentative link-local addr immediately after coming up.  However,
+ if dhclient -6 is configured to run on an interface without any link
+ state at all (e.g. its physical cable is unplugged), dhclient previously
+ exited immediately with an error, and now it waits 6 seconds first.  If
+ the system is misconfigured like that, or if someone pulls a cable and
+ reboots, system boot will be delayed an extra 6 seconds.  However, that
+ short delay for misconfigured/broken systems seems acceptable to me, in
+ exchange for allowing dhclient to work with slightly slow NIC hw.
+ Additionally, consider that if the problem is instead no dhcpv6 server,
+ dhclient -6 will wait much, much longer for a dhcpv6 response before
+ giving up.
+ 
+ 
  [other info]
  
  related bug 1633479
  
- 
  [original description]
- 
  
  Summary:
  ========
  
  If an interface does not yet have a link-local address (as it may have
  just been brought up), dhclient -6 <ifname> will fail. The built-in
  "wait for link-local address" loop does not function properly, causing
  DHCP failure.
  
  Discussion:
  ===========
  
  While trying to configure isc-dhcp-client 4.3.5-3ubuntu1 for IPv6 on
  Ubuntu 17.04, I found that on boot I was getting failures with the
  logged message "no link-local IPv6 address for <ifname>".
  
  I found that it took several seconds for the link-local address to be
  assigned when the interface came up (in this case, the ISP/modem-facing
  interface), and worked around it with a script that looks at
  
    /sbin/ifconfig $IFACE | /bin/fgrep -q 'scopeid 0x20'
  
  and loops for a fixed number of times for that to be successful.
  
  On looking at /sbin/dhclient-script it appears that it *tries* to do the
  same thing in
  
    # set the link up and wait for ipv6 link local dad to finish
    ipv6_link_up_and_dad()
  
  this code sets
  
    out=$(ip -6 -o address show dev "$dev" scope link)
  
  then checks it with a case statement inside of a loop for
  
          case " $out " in
              *\ dadfailed\ *)
                  error "$dev: ipv6 dad failed."
                  return 1;;
              *\ tentative\ *) :;;
              *) return 0;;
          esac
  
  If there is no link-local address, $out will be empty. The default case
  is taken, and the loop exits immediately:
  
  $ echo "'$out'" ; case " $out " in
  >     *\ dadfailed\ *)
  >         echo "dadfailed"
  >         ;;
  >     *\ tentative\ *)
  >         echo "tentative"
  >         ;;
  >     *)
  >         echo "default"
  > esac
  ''
  default
  
  As a result, there is no "wait for link-local address" and when there is
  no link-local address, dhclient fails later on.
  
  Possible Fix:
  =============
  
  Adding the missing case for "no address", which continues the loop,
  is one possible solution.
  
  .        case " $out " in
  .            *\ dadfailed\ *)
  .                error "$dev: ipv6 dad failed."
  .                return 1;;
  .            *\ tentative\ *) :;;
  +            "  ")
  +                :
  +                ;;
  .            *) return 0;;
  .        esac
  
  At least in my situation, this prevents the failure of dhclient due to
  the link-local address not being "ready" yet.
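The proposed fix can be exercised in isolation by wrapping the case statement in a function that prints which branch was taken. This is a sketch for testing the logic, not the shipped script; `ll_state` is a hypothetical name, and the `"  "` branch is the missing case from the diff above (it has to come before the `*)` catch-all).

```shell
#!/bin/sh
# Sketch of the patched classification (not the shipped script): map the
# output of `ip -6 -o address show dev DEV scope link` to one state word.
# Padding with spaces lets " tentative " etc. match as whole words, and makes
# empty output become exactly two spaces.
ll_state() {
    case " $1 " in
        *\ dadfailed\ *) echo dadfailed ;;
        *\ tentative\ *) echo tentative ;;
        "  ")            echo none ;;    # no link-local address at all yet (the missing case)
        *)               echo ready ;;
    esac
}
```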

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to isc-dhcp in Ubuntu.
https://bugs.launchpad.net/bugs/1718568

Title:
  dhclient-script fails to wait for link-local address

Status in isc-dhcp package in Ubuntu:
  Fix Released
Status in isc-dhcp source package in Trusty:
  In Progress
Status in isc-dhcp source package in Xenial:
  In Progress
Status in isc-dhcp source package in Zesty:
  Won't Fix
Status in isc-dhcp source package in Artful:
  In Progress
Status in isc-dhcp source package in Bionic:
  Fix Released
Status in isc-dhcp package in Debian:
  New
