Hi all, I recently read the source code of sdhcp, and it does seem to be a nice tool, but I couldn't help but notice a few things. Now, maybe I'm dense, and none of the things I'm about to bring up matter for some reason, and if so, that would be really cool. But some of these really do seem like major issues, so here goes:
1. Timeout based on alarm() --------------------------- That is, in principle, not a bad way to do it, but unsuitable for a laptop application, which might see extended time spent in a sleep state. This is mostly an issue during the Bound state, as I have no idea if the alarms get adjusted for the time slept, and what happens if the alarm times out during sleep. That said, I also have no idea what happens to monotonic clocks, so what the heck. 2. Signal handlers established with signal() -------------------------------------------- Unfortunately, the BSD-v-SysV war has destroyed that interface beyond usability. On Linux, it depends on the libc, and for glibc also on the compiler flags, whether you get oneshot behavior or persistent behavior. So now we have to use sigaction() to establish signal handlers, or always re-establish the handlers from within, just in case. musl gives the BSD behavior always, BTW. 3. Broken timing model ---------------------- The code reads the value from the Lease Time option into a variable called "t1", and proceeds to use that variable as T1 (i.e. the time after which the lease should be renewed). That is not what the lease time is! The RFC states that after the lease time has elapsed, the client must no longer use the address granted to it, and failing to do so was the reason why many Apple devices where banned from some Ivy League campus network or other, some years back; which was a great source of humour for me and others. Yet here we are, doing the same thing wrong. The RFC states that T1 is specified in option 58, and T2 in option 59. Failing these, T1 is 1/2 lease time, and T2 is 7/8 lease time. Apropos failing: 4. Christian search routines ---------------------------- The Bible states "Seek and you will find" but it might not be a great idea to implement this in a computer program. Unfortunately, optget() has absolutely no way of communicating failure to find a given option, and its callers don't seem to care much, either way. "Be lenient in what you accept" is all well and good, but not when the most important pieces of data are missing. 5. Absolutely no timeout in Renewing and Rebinding states --------------------------------------------------------- I couldn't find info on what happens after an alarm fires, but the only behaviors that make any sense are (a) alarm() is oneshot, and (b) alarm() is cyclical. Once sdhcp has made it into Bound state, an alarm() for the lease time is registered. When that fires, no further alarm is registered as sdhcp sends out a single request for renewal. If that packet, or the acknowledgment of it, drops for any reason, we're stuck in Renewing with either no timeout, if (a) is true, or one lease time (typically something like 1 day) as timeout, if (b) is true. The RFC states that the client should set the packet timeout in these states to half the remaining time (until T2 or lease time), down to one minute. 6. No XID reroll or comparison ------------------------------ The XID is rolled once, at the start of the process, and never rolled again. Shouldn't a different XID be used for each separate transaction? That might be down to taste, but this one is not: XID is never compared to the XID of a received packet. And neither is the CID. Thus, sdhcp ends up stealing people's leases! I mean, the whole protocol is broadcast, we're bound to see a packet that wasn't meant for us. 7. Use of the realtime clock for time measurement ------------------------------------------------- For some reason I don't quite get, DHCP clients are required to send the time they spent working in the packet header. Well, not required; we could just null the field. But since we do send something, we might as well do it right. Currently, the time is measured by way of the time() function, which returns the current realtime clock value. Thus, if we were starting up on a system without battery backup, we would get a time in 1970. If after getting a lease, the clock is then corrected with NTP, suddenly we will tell the DHCP server that we spent almost 50 years talking to it, which it might take exception to. All of this could be solved by way of clock_gettime(CLOCK_MONOTONIC). Just wrap it with an interface similar to time() first. 8. Race condition for timeouts ------------------------------ If an alarm should fire while we are not blocked in a syscall, it will be ignored, and the program will hang. That could be solved with a global flag that is set from the signal handler, but that will significantly increase complexity (since we need to reset the flag after taking notice of it) and is not without its pitfalls (cf. ppoll()). Might be better just to use a self-pipe. 9. poll() with 1 descriptor? ---------------------------- That one I just don't get. Wouldn't it be simpler to park the program in a blocking recv()? Or am I missing something. I mean, as written, the poll is conceptually useless. Add another descriptor, that's another matter. 10. UDP recvfrom() misunderstanding ----------------------------------- udprecv() seems to suffer from the misconception that the socket address supplied to recvfrom() is an input parameter. It is not. A UDP socket will receive every packet sent to the right port, and the socket address in recvfrom() is an output parameter, identifying the source of that particular datagram. 11. Failing to set "up" flag at start ------------------------------------- Most other popular Linux DHCP clients set the "up" flag on the interface before starting. I'm not 100% on what happens if it is missing, but I guess that would mean no sending and no receiving packets. And I am all but certain the flag is missing on startup. Might be worth documenting at least. Want a patch? Might take a while, though. Ciao, Markus