applied

On 3/18/20 10:46 AM, Thomas Lamprecht wrote:
CONTAINER_INTERFACE[0] is what the systemd developers call their API, and
we need to adapt to it a bit, even if that means doing stupid,
unnecessary things, as otherwise systemd decides to regress and suddenly
breaks the network stack in a CT after an upgrade[1].

This mounts the parent /sys as mixed, which is:
mount /sys as read-only but with /sys/devices/virtual/net writable.
-- man 5 lxc.container.conf

Allow users to override this with a features knob, as some will surely
run into other issues otherwise, and manually adding an "lxc.mount.auto"
entry to the container .conf is not a nice user experience for most.
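As an illustrative sketch only, opting out might look like this in a container config. It assumes the knob is set through the existing `features:` property string as the `force_rw_sys` flag this patch adds, and that the config lives under /etc/pve/lxc/ (the `<vmid>` file name is a placeholder). Per the diff, the flag only has an effect for unprivileged CTs.

```
# /etc/pve/lxc/<vmid>.conf  (illustrative sketch; <vmid> is a placeholder)
# keep /sys fully writable instead of the new sys:mixed default
unprivileged: 1
features: force_rw_sys=1
```

With the flag unset (default 0), an unprivileged CT gets the `lxc.mount.auto = sys:mixed` line generated for it.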

Fixes the systemd regression in up-to-date Arch installations
introduced by [2].

[0]: https://systemd.io/CONTAINER_INTERFACE/
[1]: https://github.com/systemd/systemd/issues/15101#issuecomment-598607582
[2]: https://github.com/systemd/systemd/commit/bf331d87171b7750d1c72ab0b140a240c0cf32c3

Signed-off-by: Thomas Lamprecht <t.lampre...@proxmox.com>
---

changes v1 -> v2:
* use sys:mixed and only do this for unpriv. CTs
* add knob to allow easier opting out of this

  src/PVE/LXC.pm        | 6 ++++++
  src/PVE/LXC/Config.pm | 7 +++++++
  2 files changed, 13 insertions(+)

diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index 0742a53..df52afa 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -662,6 +662,12 @@ sub update_lxc_config {
        $raw .= "lxc.mount.entry = /dev/fuse dev/fuse none bind,create=file 0 0\n";
      }
+    if ($unprivileged && !$features->{force_rw_sys}) {
+       # unpriv. CTs default to sys:rw, but that doesn't always play well with
+       # systemd, e.g., systemd-networkd https://systemd.io/CONTAINER_INTERFACE/
+       $raw .= "lxc.mount.auto = sys:mixed\n";
+    }
+
      # WARNING: DO NOT REMOVE this without making sure that loop device nodes
      # cannot be exposed to the container with r/w access (cgroup perms).
      # When this is enabled mounts will still remain in the monitor's namespace
diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm
index e88ba0b..0909773 100644
--- a/src/PVE/LXC/Config.pm
+++ b/src/PVE/LXC/Config.pm
@@ -331,6 +331,13 @@ my $features_desc = {
            ." This requires a kernel with seccomp trap to user space support (5.3 or newer)."
            ." This is experimental.",
      },
+    force_rw_sys => {
+       optional => 1,
+       type => 'boolean',
+       default => 0,
+       description => "Mount /sys in unprivileged containers as `rw` instead of `mixed`."
+           ." This can break networking under newer (>= v245) systemd-network use."
+    },
  };
my $confdesc = {



_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
