Hi!

Before I get to the actual question, a bit of context: for NixOS, we've recently been restructuring the units around PostgreSQL[0]. In our first iteration, we ended up with:
```ini
# postgresql.target
[Install]
WantedBy=multi-user.target

[Unit]
BindsTo=postgresql.service
BindsTo=postgresql-setup.service

# postgresql.service
[Unit]
BindsTo=postgresql.target
After=network.target

# postgresql-setup.service
[Unit]
Requires=postgresql.service
After=postgresql.service

[Service]
Type=oneshot
RemainAfterExit=yes
```

The goals of this approach are:

* Being able to restart either the target or the server unit, which then triggers a restart of both.
* `postgresql.service` being "active" means that the server is in _at least_ read-only mode (Type=notify is load-bearing here), and `postgresql.target` being active means that the server is in read-write mode (that's one of the things postgresql-setup is for).

Now, when adding `Restart=always` to `postgresql.service`, I noticed that sometimes both the target and the service get restarted and sometimes they don't. During a deeper investigation[1], I noticed that `BindsTo=` behaves differently from, e.g., `PartOf=`/`Requires=` because of the `UNIT_ATOM_CANNOT_BE_ACTIVE_WITHOUT` property in the systemd code which, to my understanding, schedules an immediate stop of a unit as soon as a unit it is bound to becomes inactive. This races with `Restart=always` when, e.g., the server gets killed. As a result, PostgreSQL sometimes doesn't come back up after being killed, despite `Restart=always`.

Interestingly, I have one machine where killing the server reliably takes down both the unit and the target, and another where the auto-restart reliably happens (both are on the exact same nixpkgs commit, i.e. run exactly the same software). The most notable difference is that the former is essentially idle, so my best guess is that it somehow depends on how "busy" the service manager is.

To my understanding, using a combination of `PartOf=`/`Wants=` (or `Requires=`) appears to work around this because the restart gets handled _after_ the stop has been propagated[1], so the potential race is avoided entirely.
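For illustration, the `PartOf=`/`Wants=` variant could look roughly like the sketch below. It is not necessarily the exact layout used in [1]; the mutual `PartOf=` (to keep the first goal of restarting either unit), the `After=postgresql-setup.service` on the target, and the `Type=notify`/`Restart=always` lines are assumptions of this sketch.

```ini
# postgresql.target
[Unit]
# Pull in the server and the setup unit when the target is started.
Wants=postgresql.service postgresql-setup.service
# Assumption: only reach the target once setup has finished (read-write mode).
After=postgresql-setup.service
# Assumption: a stop/restart of the server is propagated to the target.
PartOf=postgresql.service

[Install]
WantedBy=multi-user.target

# postgresql.service
[Unit]
After=network.target
# A stop/restart of the target is propagated to the server.
PartOf=postgresql.target

[Service]
Type=notify
Restart=always

# postgresql-setup.service
[Unit]
Requires=postgresql.service
After=postgresql.service
# Assumption: a stop/restart of the target re-runs the setup unit as well.
PartOf=postgresql.target

[Service]
Type=oneshot
RemainAfterExit=yes
```

Unlike `BindsTo=`, `PartOf=` only propagates explicit stop and restart jobs and does not force a unit down merely because another unit became inactive, which is why the auto-restart no longer has anything to race against.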
What I want to get at: this behavior was a little surprising to me, and I ended up diving into the systemd code to understand the exact differences because the man page doesn't reflect this aspect very well. The `BindsTo=` section in `systemd.unit(5)` states that it's

> [...], very similar in style to Requires=. However, this dependency type
> is stronger in addition to the effect of Requires= it declares that if the
> unit bound to is stopped, this unit will be stopped too.

To me, this reads as if the "stronger" aspect of `BindsTo=` is that the stop will be propagated, just like with `PartOf=`. The property that a unit "strictly has to be in active state" is only mentioned further below, for the combination of `BindsTo=` and `After=`.

My plan is to contribute a fix for this, but when I started, I realized that I need some pointers:

* This seems like a bit of a special case: you need bidirectional dependencies, and at least one of the involved units needs `BindsTo=`. Does it make sense to add this as another paragraph to `systemd.unit(5)`? Is there a better-suited place for it? Or is there even a reason to keep the status quo, i.e. not documenting what might be implementation details?
* Am I on the right track with my observations? As mentioned above, I noticed today that this is reliably reproducible on an idle VM, but not on my workstation. Is there some insight I'm missing?

Assuming what I wrote above is correct, are there more details around this that you'd like to see documented?

Cheers!
Ma27

[0] https://github.com/NixOS/nixpkgs/pull/403645
[1] https://github.com/NixOS/nixpkgs/pull/424625