Hey folks,

I've upgraded this system to buster and it seems that either the new kernel (4.19.0-8-amd64) or the new lxc version (1:3.1.0+really3.0.3-8) has fixed this problem: I can once again re-exec systemd in containers, even with lxc.cap.drop = sys_admin enabled.
I guess this issue could be closed? Feel free to do so if you think it is appropriate.

Anyway, below is some more info I collected a long time ago but never got around to cleaning up and sending. I'm including it here in case it is useful for anyone else running into the same problem.

Gr.

Matthijs

== Old debugging info below ==

When running systemd with debug loglevel (in /etc/systemd/system.conf), I see the following on boot (from the console logfile, since journald isn't running at that point yet):

    Using cgroup controller name=systemd. File system hierarchy is at /sys/fs/cgroup/systemd.
    Release agent already installed.

When reexecuting systemd, I get the following (from journalctl):

    Using cgroup controller name=systemd. File system hierarchy is at /sys/fs/cgroup/systemd/../...
    Release agent already installed.
    Failed to create /../../init.scope control group: Operation not permitted
    Failed to allocate manager object: Operation not permitted

The ../../init.scope is, I think, based on this file:

    $ cat /proc/1/cgroup
    10:freezer:/
    9:pids:/../../init.scope
    8:net_cls,net_prio:/
    7:devices:/../../init.scope
    6:blkio:/../../init.scope
    5:memory:/../../init.scope
    4:perf_event:/
    3:cpu,cpuacct:/../../init.scope
    2:cpuset:/
    1:name=systemd:/../../init.scope

This is how it looks both before and after the re-exec. I'm not sure what this file looks like when systemd first starts in the container, but I suspect the ../../ is not there yet, given the "File system hierarchy is at /sys/fs/cgroup/systemd" log message. Or maybe systemd does not read it on initial startup?
On the host, the file looks like this:

    $ cat /proc/1/cgroup
    10:freezer:/
    9:pids:/init.scope
    8:net_cls,net_prio:/
    7:devices:/init.scope
    6:blkio:/init.scope
    5:memory:/init.scope
    4:perf_event:/
    3:cpu,cpuacct:/init.scope
    2:cpuset:/
    1:name=systemd:/init.scope

When I look up the container's pid 1 on the host, it looks like this:

    matthijs@tika:/etc/lxc$ cat /proc/1755/cgroup
    10:freezer:/lxc/template
    9:pids:/init.scope
    8:net_cls,net_prio:/lxc/template
    7:devices:/init.scope
    6:blkio:/init.scope
    5:memory:/init.scope
    4:perf_event:/lxc/template
    3:cpu,cpuacct:/init.scope
    2:cpuset:/lxc/template
    1:name=systemd:/init.scope

When I start the container *with* CAP_SYS_ADMIN, the file inside the container looks different:

    matthijs@template:~$ cat /proc/1/cgroup | grep systemd
    1:name=systemd:/init.scope

When I look up the container's pid 1 on the host, it looks like this:

    matthijs@tika:/etc/lxc$ sudo cat /proc/507/cgroup | grep systemd
    1:name=systemd:/lxc/template/init.scope

== New debug info ==

After the upgrade to buster, it seems that the scopes are now correct. Inside the container *without* CAP_SYS_ADMIN, I now get:

    $ cat /proc/1/cgroup | grep systemd
    1:name=systemd:/init.scope
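
For anyone wanting to check whether their own container shows the same symptom, the comparison of /proc/<pid>/cgroup files can be automated. Below is a minimal sketch in Python, not something I ran as part of this debugging. The function names parse_cgroup and escaping_entries are made up for illustration, and the only assumption is the "ID:controllers:path" line format visible in the listings in this mail. It flags every hierarchy whose path contains a ".." component, as in the broken /../../init.scope case.

```python
def parse_cgroup(text):
    """Parse the contents of a /proc/<pid>/cgroup file.

    Returns a list of (hierarchy_id, controllers, path) tuples.
    (Hypothetical helper, named for this illustration only.)
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "hierarchy-ID:controller-list:cgroup-path".
        # Split at most twice so any colon in the path is preserved.
        hier_id, controllers, path = line.split(":", 2)
        entries.append((int(hier_id), controllers, path))
    return entries


def escaping_entries(entries):
    """Return the entries whose cgroup path contains a '..' component."""
    return [entry for entry in entries if ".." in entry[2].split("/")]


# A few lines copied from the broken container's /proc/1/cgroup.
SAMPLE = """\
10:freezer:/
9:pids:/../../init.scope
2:cpuset:/
1:name=systemd:/../../init.scope
"""

if __name__ == "__main__":
    for hier_id, controllers, path in escaping_entries(parse_cgroup(SAMPLE)):
        print(f"{controllers}: suspicious path {path}")
```

To check a live system, read /proc/1/cgroup inside the container and pass its contents to parse_cgroup() instead of SAMPLE. An empty result from escaping_entries() corresponds to the post-upgrade situation, where the path is simply /init.scope.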