On Sat, Apr 20, 2024 at 09:33:07PM +0000, Jordan Glover wrote:
> bubblwrap has --disable-userns option which prevents creation of nested
> namespaces (from manpage):
>
> --disable-userns
> Prevent the process in the sandbox from creating further user namespaces, so
> that it cannot rearrange the filesystem namespace or do other more complex
> namespace modification. This is currently implemented by setting the
> user.max_user_namespaces sysctl to 1, and then entering a nested user
> namespace which is unable to raise that limit in the outer namespace. This
> option requires --unshare-user, and doesn't work in the setuid version of
> bubblewrap.
>
> Flatpak uses this (or seccomp filter) to block nested namespaces as this can
> bypass security its design. For this reason firefox own sandbox doesn't use
> namespaces in flatpak, see
> https://bugzilla.mozilla.org/show_bug.cgi?id=1756236

Thanks, I didn't expect it was this advanced already.

In what exact way would nested namespaces bypass the security design of
Flatpak?  Is this about the kernel's attack surface exposed by
capabilities in a namespace or something else?  I guess capabilities are
also dropped in the nested namespace?

After reviewing some kernel code, I have doubts as to how effective the
dropping of capabilities in a namespace actually is.

security/commoncap.c: cap_capable() includes this:

		/*
		 * The owner of the user namespace in the parent of the
		 * user namespace has all caps.
		 */
		if ((ns->parent == cred->user_ns) && uid_eq(ns->owner, cred->euid))
			return 0;

This check is only reached when cap_capable() is called for a target
namespace other than one the credentials are from.  However, such uses
do exist, e.g. via Netlink, which would expose e.g. Netfilter:

net/netlink/af_netlink.c:

/**
 * netlink_net_capable - Netlink network namespace message capability test
 * @skb: socket buffer holding a netlink command from userspace
 * @cap: The capability to use
 *
 * Test to see if the opener of the socket we received the message
 * from had when the netlink socket was created and the sender of the
 * message has the capability @cap over the network namespace of
 * the socket we received the message from.
 */
bool netlink_net_capable(const struct sk_buff *skb, int cap)
{
	return netlink_ns_capable(skb, sock_net(skb->sk)->user_ns, cap);
}

So I worry whether even with all namespaces in a sandbox having dropped
capabilities, an attack can still be arranged (with a pair of namespaces
one nested in the other) where a task effectively "has all caps" for a
dangerous operation like configuring Netfilter due to it hitting code
paths like this, which bypass capability bit checks.

The above finding may be a reason for us to prefer making capabilities
in a namespace ineffective vs. dropping capabilities.
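
As an illustration of the concern above, here is a simplified userspace
model of the namespace walk in cap_capable().  It is only a sketch, not
kernel code: the struct layouts, model_cap_capable(), the
caps_ineffective flag (a stand-in for a hypothetical override of the
capability check result) and check_with_dropped_caps() are all
assumptions made for this example, and uids are plain integers rather
than kuid_t.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumed layout). */
struct user_ns {
	struct user_ns *parent;	/* NULL for the initial namespace */
	int level;		/* nesting depth, 0 for the initial namespace */
	unsigned int owner;	/* uid of the task that created the namespace */
};

struct cred {
	struct user_ns *user_ns;		/* namespace the credentials are from */
	unsigned int euid;
	unsigned long long cap_effective;	/* bitmask of effective capabilities */
};

#define CAP_NET_ADMIN 12

/* Returns 0 if the capability check passes, -EPERM otherwise. */
int model_cap_capable(const struct cred *cred, const struct user_ns *targ_ns,
		      int cap, bool caps_ineffective)
{
	const struct user_ns *ns = targ_ns;

	/*
	 * Hypothetical override: deny every check for credentials that
	 * are from a non-initial namespace, before any other logic runs.
	 */
	if (caps_ineffective && cred->user_ns->parent != NULL)
		return -EPERM;

	for (;;) {
		/* Same namespace: the capability bit decides. */
		if (ns == cred->user_ns)
			return ((cred->cap_effective >> cap) & 1ULL) ? 0 : -EPERM;

		/* Target is not below the credentials' namespace. */
		if (ns->level <= cred->user_ns->level)
			return -EPERM;

		/* Owner of a direct child namespace: no bit is consulted. */
		if (ns->parent == cred->user_ns && ns->owner == cred->euid)
			return 0;

		ns = ns->parent;
	}
}

/*
 * Scenario: a sandboxed task with all capabilities dropped, plus a
 * namespace nested in the sandbox namespace and created by the same uid.
 */
int check_with_dropped_caps(bool target_nested, bool caps_ineffective)
{
	struct user_ns init_ns = { NULL, 0, 0 };
	struct user_ns sandbox_ns = { &init_ns, 1, 1000 };
	struct user_ns nested_ns = { &sandbox_ns, 2, 1000 };
	struct cred task = { &sandbox_ns, 1000, 0 };	/* no capability bits set */

	return model_cap_capable(&task,
				 target_nested ? &nested_ns : &sandbox_ns,
				 CAP_NET_ADMIN, caps_ineffective);
}
```

In this model, check_with_dropped_caps(false, false) is denied because
the capability bit is clear, while check_with_dropped_caps(true, false)
is allowed through the owner check even though no bits are set.  With
the hypothetical override enabled, check_with_dropped_caps(true, true)
is denied as well.
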
In the context of my idea/proposal for a new sysctl, it could be better
for it to work as I had described, overriding the return value of
security_capable(), instead of e.g. hooking the return of
create_user_ns() and dropping the new cred's capabilities.

I hope the Ubuntu/AppArmor solution is also safe in this respect, as it
sounds like it similarly makes capabilities ineffective instead of
dropping them.

Alexander