On Thu, May 13, 2021 at 4:12 PM Steve Langasek
<steve.langa...@ubuntu.com> wrote:
Hi there,
On Wed, May 12, 2021 at 05:52:07PM +1000, Christopher James Halse
Rogers wrote:
> There's an nfs-utils SRU¹ hanging around waiting for a policy decision on
> use of the After=network-online.target systemd unit dependency. I'm not an
> expert here, but it looks like part of my SRU rotation today is starting the
> discussion on this so we can resolve it one way or another!
> I am not an expert in this area, but as I understand it, the tradeoff here
> is:
> 1. Without a dependency on After=network-online.target there is no guarantee
> that the network interface(s) will be usable at the time the nfs-utils unit
> triggers, and nfs-utils will fail if the relevant network interface is not
> usable, or
> 2. With a dependency on After=network-online.target nfs-utils will reliably
> start, but if there are any interfaces which are configured but do not come
> up this will result in the boot hanging until the timeout is hit.
> In mitigation of (2), there are apparently a number of default packages
> which already have a dependency on After=network-online.target, so boot
> hanging if interfaces are down is the status quo?
From one of the comments in the bug report, I gathered that systemd upstream
(who, specifically?) was taking a position that distributions should not use
After=network-online.target. I think this is entirely unhelpful; the target
exists for this purpose, it is not required for systemd internally to get
the system up but exists only for other services to depend on.
There are risks of services not starting on boot because the network-online
target is not reached. However, that is not the same thing as a "hung
boot", because other services will still start on their own, and things like
gdm and tty don't depend on network-online.target, *unless* you're in a
situation where you've introduced a dependency between the filesystem and
network-online. This is possible when we're talking about nfs, because the
same system might both export nfs filesystems and mount them from localhost.
But I'm not sure it should block this specific change.
> The obvious thing to do here would be to follow Debian, but as far as I can
> tell there is not currently a Debian policy about this - the best I can find
> is an ancient draft of a best-practises-guide² suggesting that packages
> SHOULD handle networking dynamically, but if they do not MUST have a
> dependency on After=network-online.target
> As far as I understand it, handling networking dynamically requires upstream
> code changes (although maybe fairly simple code changes?).
It does require upstream code changes; not always simple. And it's not
always *correct* to make upstream code changes instead of simply starting
the service when the system is "online"; you can find a number of examples
in Ubuntu of services that it only makes sense to start once your network is
"up" - e.g. apt-daily.service, update-notifier, whoopsie, ...
There are issues with the network-online target, to be sure. There is not a
clear definition of the target, and there have definitely been
implementation bugs in what does/does not block the target. I've had
discussions with the Foundations Team in the past about this but it has yet
to result in a specification.
My working definition of what network-online.target SHOULD mean is:
- at least one interface is up, with routes
- all interfaces which are 'optional: no' (netplan sense) are up
- including completion of ipv6 RA and ipv4 link-local if enabled on the
  interface
- there is a default route for at least one configured address family
- attempts to discover default routes for other configured address families
  have completed (success or fail)
- DNS is configured
Things that must not block the network-online target:
- interfaces that are marked 'optional: yes'
- address sources that are listed in 'optional-addresses' for an interface
- default route for an address family for which no interfaces have addresses
At least historically, neither networkd nor NetworkManager has fulfilled
this definition. It would be nice to get there, but the first step is
having some agreed definition such as the above so that we can treat
deviations as bugs.
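To make the working definition above easier to argue about, here is a
minimal Python sketch of it as a predicate. It is only an illustration: the
Interface and NetworkState classes, their field names, and the strings used
for address families and address sources are all hypothetical, and neither
networkd nor NetworkManager exposes state in this form.

```python
from dataclasses import dataclass, field


@dataclass
class Interface:
    """Hypothetical per-interface state; field names are illustrative only."""
    name: str
    optional: bool = False                       # netplan 'optional' flag
    up: bool = False                             # link is up, with routes
    families: set = field(default_factory=set)   # families that have an address
    pending: set = field(default_factory=set)    # address sources still running
    optional_addresses: set = field(default_factory=set)  # 'optional-addresses'


@dataclass
class NetworkState:
    """Hypothetical system-wide state."""
    interfaces: list
    default_routes: set = field(default_factory=set)           # families with a default route
    route_discovery_pending: set = field(default_factory=set)  # discovery not yet finished
    dns_configured: bool = False


def is_online(state):
    """Return True if the working definition of 'online' is satisfied."""
    ifaces = state.interfaces
    # At least one interface is up, with routes.
    if not any(i.up for i in ifaces):
        return False
    # Every non-optional interface is up, and its address sources (e.g. RA,
    # link-local) have completed unless listed in optional-addresses.
    for i in ifaces:
        if i.optional:
            continue  # optional interfaces never block
        if not i.up or (i.pending - i.optional_addresses):
            return False
    # Only families for which some interface has an address are considered.
    configured = set().union(*(i.families for i in ifaces))
    # A default route exists for at least one configured family.
    if not (state.default_routes & configured):
        return False
    # Default-route discovery for configured families has finished.
    if state.route_discovery_pending & configured:
        return False
    return state.dns_configured
```

With this model, a down interface marked optional, or pending route
discovery for a family with no addresses, does not change the result.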
If netplan.io could implement that, it would be nice: either
synthetically (i.e. by generating a service unit on the fly that calls
systemd-networkd-wait-online with extra arguments specifying all the
non-optional interfaces), or by creating a new binary,
"netplan-wait-online", which would be wanted by network-online.target
and perform all of the above.
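As a rough sketch of the "synthetic" option, the following Python builds a
systemd-networkd-wait-online command line from an already-parsed netplan
configuration. The function name wait_online_argv is hypothetical, the
fallback to --any when every interface is optional is my assumption, and the
sketch assumes each netplan ID is also the kernel interface name (which is
not true when match/set-name is used).

```python
WAIT_ONLINE = "/lib/systemd/systemd-networkd-wait-online"
# Netplan device-type sections that may carry an 'optional' key.
DEVICE_TYPES = ("ethernets", "wifis", "bridges", "bonds", "vlans")


def wait_online_argv(netplan, timeout=30):
    """Build an argv that waits for every non-optional interface.

    `netplan` is the netplan YAML already loaded into a dict.
    """
    network = netplan.get("network", {})
    required = []
    for dev_type in DEVICE_TYPES:
        for name, cfg in (network.get(dev_type) or {}).items():
            # Interfaces are required unless explicitly marked optional.
            if not (cfg or {}).get("optional", False):
                required.append(name)
    argv = [WAIT_ONLINE, f"--timeout={timeout}"]
    if required:
        # Repeating --interface makes wait-online wait for all of them.
        argv += [f"--interface={name}" for name in sorted(required)]
    else:
        # Assumption: with nothing required, any one interface is enough.
        argv.append("--any")
    return argv
```

A generated service unit could then use the resulting list as its ExecStart
line; that part is not shown here.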
> It seems unlikely that, whatever we decide, we'll immediately do a full
> sweep of the archive and fix everything, so it looks like our choice is
> between:
> 1. The long-term goal is to have no After=network-online.target dependencies
> in default boot (stretch goal: in main). Whenever we run into a
> package-fails-if-network-is-not-yet-up bug, we patch the code and submit
> upstream. Over time we audit existing users of After=network-online.target
> and patch them for dynamic networking, as time permits.
> 2. We don't expect to be able to reach no After=network-online.target
> dependencies in the default boot, so it's not a priority to avoid them.
> Whenever we run into a package-fails-if-network-is-not-yet-up bug, we add an
> After=network-online.target dependency.
3. We expect to reach network-online.target in the common case, but accept
that there are systems for which it will ordinarily not be reached on boot
(i.e. offline systems). Services which depend on network-online.target
should be those which it is reasonable to not start if the system is not
connected to the Internet. This includes systems that are connected to a
local network, but have no default route.
So from my point of view, a short-term fix like having
After=network-online.target or even

    [Unit]
    After=systemd-resolved.service
    [Service]
    ExecStartPre=-/lib/systemd/systemd-networkd-wait-online --any --timeout 30

is fine to be SRUed.
However, I still have the same question - what if network connectivity
drops & gets re-established? Should we bounce network-online.target
(aka restart it)? We can declare that units be restarted when
network-online.target is restarted, if they are otherwise incapable of
dynamically detecting networking loss & networking resumption.
If we use this as the standard, it's easy to see that *in principle*
nfs-utils shouldn't depend on there being a route to the global
Internet.
It does, however, at least give us a framework for understanding the
behavior, and for users to modify the behavior if they have different
requirements.
None of this makes it any safer for an SRU, since at the end of the day if
users have such a config that is impacted if you set
After=network-online.target for nfs-utils, it would still be a regression.
--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                   https://www.debian.org/
slanga...@ubuntu.com                                     vor...@debian.org
--
ubuntu-devel mailing list
ubuntu-devel@lists.ubuntu.com
Modify settings or unsubscribe at:
https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel
--
Regards,
Dimitri.