On 11/8/19 11:06 AM, Wolfgang Bumiller wrote: > The pve-common path of this patch set should be straight forward: > minor additions to ProcFSTools and Tools, as well as the new mount api > constants added to Syscall.pm. >
It is, thus applied that one already. > The container part then makes use of the new mount api in case the > currently running kernel supports it. The hope for the future would be > to simplify the code a bit once we can stop supporting kernels older > than 5.2. So 7.X material right there ^^ > For now, it starts with the ability to stage a mount point, and then > moves on to changing the startup process to use this. > Previously, the startup goes through the mount points in order and > mounts them directly at the target location. This is prone to symlink > attacks (especially when using nested shared bind mounts). > When staging a mount in a fixed directory first, we can pick it up > afterwards with the new `open_tree()` syscall, and move it in place with > the new `move_mount()` syscall, which can work relative to directory > file descriptors and has flags for whether or not the paths are allowed > to follow symlinks. (In the future this can be hardened even more using > `openat2()` using the container's root directory as "implicit chroot" > while looking up the target directory and then issuing a `move_mount()` > right onto the resulting path file descriptor via > `MOVE_MOUNT_T_EMPTY_PATH`.) > > The main advantage of the new API however, is that we can pick up the > mounts as file descriptors, then switch into the running container's > mount namespace and `move_mount()` the mount point in place, without > having to rely on an existing MS_SHARED mount point "hack". Hence the > final patch adds support for mount point hotplugging - but only hotplug, > not un-plug, since unmounting has a lot of issues (open file > descriptors, unshared MS_PRIVATE mount namespaces referencing the mount > (as well as those namespaces opened as file descriptors...), mounts > having been moved (if they were previously hotplugged at least), ...). sounds all good, need to take a closer look at the meat tough :) Oguz, can you please give this also a testing spin? > > Wolfgang Bumiller (8): > implement "staged mountpoints" > add open_pid_fd, open_lxc_pid, open_ppid helpers > split open_namespace out of enter_namespace > add get_container_namespace helper > add mount stage directory helpers > prestart-hook: use staged mountpoints on newer kernels > config: vmconfig_apply_pending_mountpoint helper > implement mountpoint hotplugging > > src/PVE/LXC.pm | 183 ++++++++++++++++++++++++++++++++++++-- > src/PVE/LXC/Config.pm | 87 ++++++++++++------ > src/lxc-pve-prestart-hook | 79 +++++++++++++--- > 3 files changed, 304 insertions(+), 45 deletions(-) > _______________________________________________ pve-devel mailing list pve-devel@pve.proxmox.com https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel