I am chasing a bug in how systemd responds when the kernel initiates an 
ethernet interface "device" unit.  When the kernel initiates the "device" 
unit, systemd starts the associated service unit in response, but 
subsequently "enqueues" that same, already running service unit a second 
time.  Consequently, systemd forcefully terminates the running service unit 
and then restarts it.  After this first "mystery" restart, the service units 
operate as expected.

This "Replace" action is a pointless waste of time, of course.  Can someone 
familiar with the systemd state engine explain the source of the trigger which 
causes systemd to "Enqueue" a service unit a second time, while that same 
service unit is already running?

Since systemd is not able to mount a root file system when booting with 
systemd "debug" reporting enabled from the kernel command line, it is 
necessary to enable "debug" reporting after boot, and then initialize an 
ethernet interface "device" on the running system.  This can be done by 
removing and then reloading the ethernet hardware device driver using modprobe.
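
For anyone who wants to reproduce this, the procedure above can be sketched 
roughly as follows.  This is only a sketch under assumptions: the driver name 
"e1000e" is a placeholder (substitute the module for your own hardware), the 
DRY_RUN switch and the helper names are made up for illustration, and on 
older systemd releases the verb is "set-log-level" rather than "log-level".

```shell
#!/bin/sh
# Sketch: raise the systemd (PID 1) log level at runtime, then force the
# kernel to re-create the interface "device" by reloading its driver.
# DRIVER is a placeholder (assumption); DRY_RUN=1 only prints the commands.

DRIVER="${DRIVER:-e1000e}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it (needs root).
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

reload_with_debug() {
    run systemd-analyze log-level debug   # enable debug reporting after boot
    run modprobe -r "$DRIVER"             # remove the driver; the device goes away
    run modprobe "$DRIVER"                # reload it; the device is initiated again
}

reload_with_debug
```

Run it with DRY_RUN=0 as root to actually execute the commands.  Once the 
service units have settled, "systemd-analyze log-level info" returns the log 
level to its default.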

The function of the affected service units is to configure networking on the 
interface device.  The service units configure WIDE dhcp6c, a sit tunnel, 
udhcp, and some static addresses.  The service units for udhcp and the static 
addresses are somehow not affected by this "Replace" bug.

The service units are formatted as shown here:
----
[Unit]
After= sys-subsystem-net-devices-%i.device
BindsTo= sys-subsystem-net-devices-%i.device
ConditionPathExists= /sys/class/net/%i

[Service]
EnvironmentFile= /etc/conf.d/network.conf
Type= oneshot
RemainAfterExit= yes
...
[Install]
WantedBy= sys-subsystem-net-devices-%i.device
----

The udhcp6c service unit also includes:
----
 TimeoutStopSec= 20s
 Restart= no
----

For dhcp6c and the tunnel, the log shows:
----
 systemd[1]: Starting WIDE-DHCPv6 dhcp6c with PD Subnet on enp4s0 and NA on 
interface enp6s0...
...
 systemd[1]: Starting Tunnel Interface "sit1" through Physical Interface 
enp6s0...
...
 systemd[1]: Got message type=method_call sender=n/a 
destination=org.freedesktop.systemd1 path=/org/freedesktop/systemd1 
interface=org.freedesktop.systemd1.Manager member=RestartUnit  cookie=1 
reply_cookie=0 signature=ss error-name=n/a error-message=n/a
 systemd[1]: sit1@enp6s0.service: Trying to enqueue job 
sit1@enp6s0.service/restart/replace
 systemd[1]: sit1@enp6s0.service: Merged into running job, re-running: 
sit1@enp6s0.service/restart as 7268
 systemd[1]: sit1@enp6s0.service: Enqueued job sit1@enp6s0.service/restart as 
7268
...
 systemd[1]: sit1@enp6s0.service: Converting job sit1@enp6s0.service/restart -> 
sit1@enp6s0.service/start
----
and similarly:
----
 systemd[1]: Got message type=method_call sender=n/a 
destination=org.freedesktop.systemd1 path=/org/freedesktop/systemd1 
interface=org.freedesktop.systemd1.Manager member=RestartUnit  cookie=1 
reply_cookie=0 signature=ss error-name=n/a error-message=n/a
 systemd[1]: dhcp6c-enp4s0@enp6s0.service: Trying to enqueue job 
dhcp6c-enp4s0@enp6s0.service/restart/replace
 systemd[1]: dhcp6c-enp4s0@enp6s0.service: Installed new job 
dhcp6c-enp4s0@enp6s0.service/restart as 7984
 systemd[1]: dhcp6c-enp4s0@enp6s0.service: Enqueued job 
dhcp6c-enp4s0@enp6s0.service/restart as 7984
...
 systemd[1]: dhcp6c-enp4s0@enp6s0.service: Converting job 
dhcp6c-enp4s0@enp6s0.service/restart -> dhcp6c-enp4s0@enp6s0.service/start
----
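
To pull just these enqueue attempts out of a long debug log, a small filter 
along these lines may help.  It is a rough sketch: the function name 
"enqueue_attempts" is made up, and it assumes each journal message arrives on 
a single line (the excerpts above are wrapped only by the mail client).

```shell
#!/bin/sh
# Sketch: read journal text on stdin and print "<unit> <job>" for every
# "Trying to enqueue job" message logged by systemd (PID 1).
enqueue_attempts() {
    sed -n 's/^.*systemd\[1\]: \(.*\): Trying to enqueue job \(.*\)$/\1 \2/p'
}

# Sample input: one message from the log above, joined onto a single line.
enqueue_attempts <<'EOF'
 systemd[1]: sit1@enp6s0.service: Trying to enqueue job sit1@enp6s0.service/restart/replace
EOF
```

On a live system the input would come from something like 
"journalctl -b _PID=1 | enqueue_attempts".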

The basic question is, why does systemd "enqueue" these running service units?  
And who or what is the "sender=n/a" caller of "member=RestartUnit", which 
appears to be the intermediary for this "restart/replace" process?  What is the 
trigger?  Why are the udhcp and the static address service units not affected 
in the same way?

This "Replace" state does not seem to be well documented, but man systemctl 
does tell us:
----
--job-mode=
           When queuing a new job, this option controls how to deal with 
already queued jobs. It takes one of "fail", "replace", "replace-irreversibly", 
"isolate", "ignore-dependencies", "ignore-requirements", "flush", "triggering", 
or "restart-dependencies". Defaults to "replace", except when the isolate 
command is used which implies the "isolate" job mode.
...
           If "replace" (the default) is specified, any conflicting pending job 
will be replaced, as necessary.
...
----

As used there, does "conflicting pending job" mean a job for a service unit 
that another service unit lists under "Conflicts="?  But then, how would a 
service unit "Conflict" with itself?
