On 2018-10-11, Markus Wichmann <nullp...@gmx.net> wrote: > Hi all, > > I recently read the source code of sdhcp, and it does seem to be a nice > tool, but I couldn't help but notice a few things. Now, maybe I'm dense, > and none of the things I'm about to bring up matter for some reason, and > if so, that would be really cool. But some of these really do seem like > major issues, so here goes:
Hey Markus, I have a branch of sdhcp[0] I'm using for oasis that fixes some of these issues. I didn't send them to the ML though since they were linux-specific (timerfd, CLOCK_BOOTTIME, ...), and I wasn't sure if sdhcp was meant to work on other systems as well. It's been a while since I looked at it, but sdhcp has mostly been working fine for me since then. Though, there have been a couple of occasions where I wasn't able to connect for whatever reason, so I keep a build of ISC dhclient as a backup. It'd be nice to get any remaining issues resolved. [0] https://github.com/michaelforney/sdhcp > 1. Timeout based on alarm() > --------------------------- > > That is, in principle, not a bad way to do it, but unsuitable for a > laptop application, which might see extended time spent in a sleep > state. This is mostly an issue during the Bound state, as I have no idea > if the alarms get adjusted for the time slept, and what happens if the > alarm times out during sleep. > > That said, I also have no idea what happens to monotonic clocks, so what > the heck. > 2. Signal handlers established with signal() > -------------------------------------------- > > Unfortunately, the BSD-v-SysV war has destroyed that interface beyond > usability. On Linux, it depends on the libc, and for glibc also on the > compiler flags, whether you get oneshot behavior or persistent behavior. > So now we have to use sigaction() to establish signal handlers, or > always re-establish the handlers from within, just in case. > > musl gives the BSD behavior always, BTW. > > 3. Broken timing model > ---------------------- > > The code reads the value from the Lease Time option into a variable > called "t1", and proceeds to use that variable as T1 (i.e. the time > after which the lease should be renewed). That is not what the lease > time is! > > The RFC states that after the lease time has elapsed, the client must no > longer use the address granted to it, and failing to do so was the > reason why many Apple devices where banned from some Ivy League campus > network or other, some years back; which was a great source of humour > for me and others. Yet here we are, doing the same thing wrong. > > The RFC states that T1 is specified in option 58, and T2 in option 59. > Failing these, T1 is 1/2 lease time, and T2 is 7/8 lease time. Apropos > failing: > > 4. Christian search routines > ---------------------------- > > The Bible states "Seek and you will find" but it might not be a > great idea to implement this in a computer program. Unfortunately, > optget() has absolutely no way of communicating failure to find a given > option, and its callers don't seem to care much, either way. "Be lenient > in what you accept" is all well and good, but not when the most > important pieces of data are missing. > > 5. Absolutely no timeout in Renewing and Rebinding states > --------------------------------------------------------- > > I couldn't find info on what happens after an alarm fires, but the only > behaviors that make any sense are (a) alarm() is oneshot, and (b) > alarm() is cyclical. Once sdhcp has made it into Bound state, an alarm() > for the lease time is registered. When that fires, no further alarm is > registered as sdhcp sends out a single request for renewal. If that > packet, or the acknowledgment of it, drops for any reason, we're stuck > in Renewing with either no timeout, if (a) is true, or one lease time > (typically something like 1 day) as timeout, if (b) is true. > > The RFC states that the client should set the packet timeout in these > states to half the remaining time (until T2 or lease time), down to one > minute. > > 6. No XID reroll or comparison > ------------------------------ > > The XID is rolled once, at the start of the process, and never rolled > again. Shouldn't a different XID be used for each separate transaction? > That might be down to taste, but this one is not: XID is never compared > to the XID of a received packet. And neither is the CID. Thus, sdhcp > ends up stealing people's leases! I mean, the whole protocol is > broadcast, we're bound to see a packet that wasn't meant for us. > > 7. Use of the realtime clock for time measurement > ------------------------------------------------- > > For some reason I don't quite get, DHCP clients are required to send the > time they spent working in the packet header. Well, not required; we > could just null the field. But since we do send something, we might as > well do it right. Currently, the time is measured by way of the time() > function, which returns the current realtime clock value. Thus, if we > were starting up on a system without battery backup, we would get a time > in 1970. If after getting a lease, the clock is then corrected with NTP, > suddenly we will tell the DHCP server that we spent almost 50 years > talking to it, which it might take exception to. > > All of this could be solved by way of clock_gettime(CLOCK_MONOTONIC). > Just wrap it with an interface similar to time() first. I think what you want here is CLOCK_BOOTTIME, which is like CLOCK_MONOTONIC, but also counts time sleeping. This way, if you resume your computer from sleep long after the lease expired, sdhcp will immediately attempt to get a new lease. > 8. Race condition for timeouts > ------------------------------ > > If an alarm should fire while we are not blocked in a syscall, it will > be ignored, and the program will hang. That could be solved with a > global flag that is set from the signal handler, but that will > significantly increase complexity (since we need to reset the flag after > taking notice of it) and is not without its pitfalls (cf. ppoll()). > Might be better just to use a self-pipe. > > 9. poll() with 1 descriptor? > ---------------------------- > > That one I just don't get. Wouldn't it be simpler to park the program in > a blocking recv()? Or am I missing something. I mean, as written, the > poll is conceptually useless. Add another descriptor, that's another > matter. > > 10. UDP recvfrom() misunderstanding > ----------------------------------- > > udprecv() seems to suffer from the misconception that the socket address > supplied to recvfrom() is an input parameter. It is not. A UDP socket > will receive every packet sent to the right port, and the socket address > in recvfrom() is an output parameter, identifying the source of that > particular datagram. > > 11. Failing to set "up" flag at start > ------------------------------------- > > Most other popular Linux DHCP clients set the "up" flag on the interface > before starting. I'm not 100% on what happens if it is missing, but I > guess that would mean no sending and no receiving packets. And I am all > but certain the flag is missing on startup. Might be worth documenting > at least. > > Want a patch? Might take a while, though. > > Ciao, > Markus