On Thu, Sep 01, 2016 at 03:11:30PM -0000, oleg wrote:
> Question (related to UBUNTU: SAUCE: fs: Refuse uid/gid changes which don't
> map into s_user_ns)
> ---------------------------
>
> When an overlayfs is mounted inside a user-namespace, should it permit
> the creation of inodes in the upper layer with uids that are outside the
> user-namespace?
>
>
> My Tentative Answer
> -------------------
>
> (TLDR; yes)
> If a directory D is granted world-write permissions (in the initial
> namespace) via 'chmod -R a+rwX', then any user is permitted to edit or delete
> files in D, even if the files and D are owned by root.

Some operations are still restricted for normal users though, like
chowning a file that the user doesn't own or setting certain xattrs.

> If we subsequently enter a user-namespace, then we are still permitted
> to edit files in D.
>
> But if we enter a user-namespace and subsequently utilize D as the lower
> layer of an overlayfs, then editing files in D necessitates the creation
> of files in the upper layer with uids which are outside the user
> namespace.

And copying up files copies up all xattrs, setuid, setgid, etc. So what
if the lowerdir was in a nosuid mount and we copy up a suid-root file to
an upperdir that is in a mount without nosuid? Actually we have
protections against that specific scenario. I give it as an example of
how copy-up can be dangerous if the user doesn't control the ids of the
file being copied.

> While restricting the permissible range of uids in the upper layer may
> well enhance security, it also limits the utility of overlayfs.
> overlayfs will sometimes deny permissions which were granted in the
> initial namespace. overlayfs will remain useful for mounting a rootfs
> (since all uids are within the user-namespace), but not for mounting
> directories onto the rootfs.

Security is part of it, but not the full story. tmpfs is a little
unusual though; it's easier to understand if you think of using a
filesystem with a backing store (understanding that it's generally not
possible to mount these filesystems in user namespaces, but some support
for that is coming).

When we do a mount of a filesystem in a user namespace, the kernel will
interpret the uids in that filesystem as being in the user ns. So if
your user ns mapping is 0:100000:65536, id 0 in the filesystem will be
mapped to id 100000 in the kernel.
That's done for a couple of reasons - it means that id 0 in the
filesystem shows up as id 0 in the user ns where you mounted the
filesystem, and it means a user can't inject inodes into the kernel with
an id that the user doesn't have control of (i.e. any id not mapped into
the user ns). This also means that a kuid outside of the range
100000-165535 is literally meaningless with respect to that mount - the
kernel has no way to map it to an id valid in the filesystem. That is
the real reason for the patch.

Back to tmpfs - since it has no backing store, that restriction isn't
needed for that reason, and we could treat all tmpfs mounts as being in
init_user_ns. However, the way it is now does have the benefit of
reducing the kernel's attack surface. And as of 4.8-rc this patch is
upstream, so upstream tmpfs is going to behave the same way xenial does
now.

I'm still mulling all of this over. The truth is that this is a
regression in Ubuntu because we allow overlayfs mounts in user
namespaces, but upstream does not, so there's no regression there and
thus it may be difficult to convince upstream to change the behavior.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1617388

Title:
  When using overlayfs with kernel 4.4, some files cannot be deleted.

Status in linux package in Ubuntu:
  Triaged

Bug description:
#!/bin/bash
# ---------------------------------------------------------------------
# This script exhibits a bug in overlayfs in kernel 4.4.
# The bug is not present in kernel 4.2.
# The bug can be reproduced in an x86_64 virtual-machine;
# 32-bit has not been tested.
#
# With kernel 4.2, the script output ends with:
#    "script completed without encountering a kernel bug"
#
# With kernel 4.4, the script output ends with:
#    "rm: cannot remove ‘mnt_ovl/sub/sub.txt’:
#        Value too large for defined data type"
#
# The script depends upon lxc-usernsexec (part of the lxc1 package) to
# create a user-namespace.
#
# The script should be run as a normal user (not root), in a directory where
# the user has write-permission:
#    ./script
# --------------------------------------------------------------------

cleanup() {
    [[ -d "$storedir" ]] || exit 1
    cd "$storedir" || exit 1
    [[ -d "$tmpdir" ]] || exit 1
    lxc-usernsexec -m b:0:1000:1 -m b:100000:100000:1 -- rm -rf "$tmpdir"
}

trap cleanup EXIT

set -e

storedir="$(pwd)"

# create tmpdir
tmpdir="$(mktemp -d --tmpdir=.)"
cd "$tmpdir"

# create lowerdir for overlay
mkdir -p lower/sub
touch lower/lower.txt lower/sub/sub.txt

cd ..
chmod -R a+rwX "$tmpdir"

# run a script in a user namespace
lxc-usernsexec -m b:0:100000:65534 -- bash << EOF
set -e
cd "$tmpdir"

# create tmpfs
mkdir mnt_tmpfs
mount -t tmpfs tmpfs mnt_tmpfs

# create upperdir and workdir for overlay
mkdir mnt_tmpfs/{upper,work}

# mount overlay
mkdir mnt_ovl
mount -t overlay \
    -o lowerdir=lower,upperdir=mnt_tmpfs/upper,workdir=mnt_tmpfs/work \
    overlay mnt_ovl

echo 'overlay directory listing'
ls -RF mnt_ovl
echo ''

set -x
rm mnt_ovl/lower.txt # always succeeds
rm mnt_ovl/sub/sub.txt # fails with kernel 4.4+
set +x

echo 'script completed without encountering a kernel bug'
EOF
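
For reference, the id mapping described in the reply (a user ns mapping of
0:100000:65536) can be sketched as plain arithmetic. This is only an
illustration, not kernel code: the helper names ns_to_kuid and kuid_to_ns
are hypothetical, and the sketch assumes a single mapping extent, whereas a
real uid_map may contain several. The last call assumes the reproducer is
run by uid 1000, which the cleanup mapping in the script suggests but does
not guarantee.

```shell
#!/bin/bash
# Hypothetical helpers (not kernel code) for a single-extent user-namespace
# id map of the form first:lower_first:count, e.g. 0:100000:65536.

# ns_to_kuid ID FIRST LOWER_FIRST COUNT
# Prints the kernel-side id for an id seen inside the namespace.
# Returns non-zero if the id is not covered by the mapping.
ns_to_kuid() {
    local id=$1 first=$2 lower=$3 count=$4
    if (( id >= first && id < first + count )); then
        echo $(( lower + id - first ))
    else
        return 1
    fi
}

# kuid_to_ns KUID FIRST LOWER_FIRST COUNT
# Prints the namespace-side id for a kernel-side id.
# Returns non-zero if the kuid has no representation in the namespace.
kuid_to_ns() {
    local kuid=$1 first=$2 lower=$3 count=$4
    if (( kuid >= lower && kuid < lower + count )); then
        echo $(( first + kuid - lower ))
    else
        return 1
    fi
}

# id 0 in the filesystem maps to kuid 100000
ns_to_kuid 0 0 100000 65536

# the last mapped kuid, 165535, maps back to id 65535
kuid_to_ns 165535 0 100000 65536

# With the reproducer's mapping (b:0:100000:65534), a file owned by
# kuid 1000 (assumed invoking user) cannot be expressed in the namespace.
kuid_to_ns 1000 0 100000 65534 || echo "kuid 1000 has no mapping in this user ns"
```

Under these assumptions the files in lower/ carry an owner id that the
tmpfs mounted inside the namespace cannot represent, which is consistent
with the copy-up refusal described in the reply.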