On 23.09.24 13:39, Daniel Gröber wrote:
On Mon, Sep 23, 2024 at 12:25:15PM +0200, Chris Hofstaedtler wrote:
* Pierre-Elliott Bécue <p...@debian.org> [240923 11:34]:
I like ifupdown. It's simple and just works.
I find this quite funny, given a recent discussion about IPv6 DAD
issues with ifupdown on #debian-admin.
The "discussion" was about ifup@eth0 being in a failed state on a
particular server due to a DAD failure and someone having to manually
intervene.
I find my ghost being invoked here.
Chris, what behaviour do you expect here? Below I'm going to assume what
you're getting at is that we should continue to retry DAD.
To me going to a stable failure state seems desirable. Continuing to re-try
for IPs could cause instability in the face of legitimate address
conflicts: when the owning machine reboots the conflicting machine would
now win the IP due to continuous retrying. The change in owner would cause
disruption to services entirely unrelated to the machine that was just
rebooted.
DAD did not fail, it timed out after 60 sleeps of 0.1s, i.e. 6s. The
kernel subsequently succeeded in configuring the network. The script in
question was added in response to [1] and [2] to have a pause during
boot to give the kernel time to resolve the situation before continuing
the bootup. So it left the race around because there's not that much it
can do better as a script-based setup without much state.
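
To make the timing concrete, here is a minimal sketch of that kind of bounded polling loop. It is not the actual ifupdown script; the function names are hypothetical, and the `ip ... tentative` query is simply one way to ask whether an address is still undergoing DAD. The defaults mirror the numbers above: 60 attempts with 0.1s sleeps, so at most 6s of waiting.

```python
import subprocess
import time
from typing import Callable


def address_is_tentative(iface: str) -> bool:
    """Return True while any IPv6 address on iface is still tentative."""
    out = subprocess.run(
        ["ip", "-6", "-o", "addr", "show", "dev", iface, "tentative"],
        capture_output=True, text=True, check=False,
    ).stdout
    return bool(out.strip())


def wait_for_dad(is_tentative: Callable[[], bool],
                 attempts: int = 60, delay: float = 0.1,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until DAD settles.

    Returns False when we gave up waiting, which is a timeout and not
    necessarily a DAD failure: the kernel may still finish later.
    """
    for _ in range(attempts):
        if not is_tentative():
            return True
        sleep(delay)
    return False
```

Something like `wait_for_dad(lambda: address_is_tentative("eth0"))` would then block for at most about 6s. Note that a `False` result only says the loop ran out of attempts; nothing in it records what the kernel did afterwards.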
Unfortunately there's zero information from ifup@eth0 in the process as
to when that happened. Which adds to the frustrating debugging stories
when you can't get enough intel about what happened after the fact.
(Which to be fair, also probably needs env vars to be set with
systemd-networkd to increase the debug level.) As far as I can see
processes started listening on the IP in question (that... again...
wasn't logged because it's eaten by the script) a second afterwards.
So no, it did not enter a stable state. It let the kernel do its thing,
which was to actually enable the address. I don't know why Linux takes
that long to run DAD and what the assumptions around that are.
But if you listen on netlink, you learn when that happens, so you don't
need to poll and could send events at that point.
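
As a rough illustration of the netlink approach, the sketch below subscribes to IPv6 address notifications and looks at the flags of each RTM_NEWADDR message. The constants are the values from linux/rtnetlink.h and linux/if_addr.h as I understand them; the function names are made up for this example, and it only inspects the 8-bit ifa_flags field rather than parsing attributes. It assumes the kernel sends a fresh RTM_NEWADDR once an address leaves the tentative state; such messages can also arrive for other reasons, so a real tool would need to track state per address.

```python
import socket
import struct

RTM_NEWADDR = 20             # rtnetlink message type
RTMGRP_IPV6_IFADDR = 0x100   # multicast group for IPv6 address changes
IFA_F_DADFAILED = 0x08
IFA_F_TENTATIVE = 0x40
NLMSG_HDR = struct.Struct("=IHHII")  # len, type, flags, seq, pid
IFADDRMSG = struct.Struct("=BBBBI")  # family, prefixlen, flags, scope, index


def parse_addr_events(data: bytes):
    """Return (ifindex, ifa_flags) for every RTM_NEWADDR message in data."""
    events = []
    offset = 0
    while offset + NLMSG_HDR.size <= len(data):
        length, msg_type, _, _, _ = NLMSG_HDR.unpack_from(data, offset)
        if length < NLMSG_HDR.size or offset + length > len(data):
            break  # truncated or malformed message
        if msg_type == RTM_NEWADDR and length >= NLMSG_HDR.size + IFADDRMSG.size:
            _, _, flags, _, index = IFADDRMSG.unpack_from(
                data, offset + NLMSG_HDR.size)
            events.append((index, flags))
        offset += (length + 3) & ~3  # netlink messages are 4-byte aligned
    return events


def watch_dad():
    """Block forever and print DAD-related address events (Linux only)."""
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                       socket.NETLINK_ROUTE) as s:
        s.bind((0, RTMGRP_IPV6_IFADDR))
        while True:
            for index, flags in parse_addr_events(s.recv(65536)):
                if flags & IFA_F_DADFAILED:
                    print(f"ifindex {index}: DAD failed")
                elif not flags & IFA_F_TENTATIVE:
                    print(f"ifindex {index}: address no longer tentative")
```

Calling `watch_dad()` blocks on the socket instead of sleeping in a loop, so there is no fixed timeout to run out; the event arrives whenever the kernel is done.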
To be ultimately fair to ifupdown: There was probably not much of a
winning move here. The annoying bit was the systemd service that was
still in a failed state even though the failure condition resolved
itself <1s later.
Kind regards
Philipp Kern
[1] https://www.agwa.name/blog/post/beware_the_ipv6_dad_race_condition
[2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=705996