On Sat, Apr 20, 2024 at 09:33:07PM +0000, Jordan Glover wrote:
> bubblewrap has a --disable-userns option which prevents creation of nested
> namespaces (from the manpage):
> 
>        --disable-userns
> Prevent the process in the sandbox from creating further user namespaces, so 
> that it cannot rearrange the filesystem namespace or do other more complex 
> namespace modification. This is currently implemented by setting the 
> user.max_user_namespaces sysctl to 1, and then entering a nested user 
> namespace which is unable to raise that limit in the outer namespace. This 
> option requires --unshare-user, and doesn't work in the setuid version of 
> bubblewrap.
> 
> Flatpak uses this (or a seccomp filter) to block nested namespaces, as they
> can bypass its security design. For this reason Firefox's own sandbox
> doesn't use namespaces in Flatpak, see
> https://bugzilla.mozilla.org/show_bug.cgi?id=1756236

Thanks, I didn't expect it was this advanced already.

In what exact way would nested namespaces bypass the security design of
Flatpak?  Is this about the kernel's attack surface exposed by
capabilities in a namespace or something else?  I guess capabilities are
also dropped in the nested namespace?

After reviewing some kernel code, I have doubts as to how effective the
dropping of capabilities in a namespace actually is.

security/commoncap.c: cap_capable() includes this:

                /*
                 * The owner of the user namespace in the parent of the
                 * user namespace has all caps.
                 */
                if ((ns->parent == cred->user_ns) && uid_eq(ns->owner, cred->euid))
                        return 0;

This check is only reached when cap_capable() is called for a target
namespace other than the one the credentials are from.  However, such
uses do exist, e.g. via Netlink, which would expose e.g. Netfilter:

net/netlink/af_netlink.c:

/**
 * netlink_net_capable - Netlink network namespace message capability test
 * @skb: socket buffer holding a netlink command from userspace
 * @cap: The capability to use
 *
 * Test to see if the opener of the socket we received the message
 * from had when the netlink socket was created and the sender of the
 * message has the capability @cap over the network namespace of
 * the socket we received the message from.
 */
bool netlink_net_capable(const struct sk_buff *skb, int cap)
{
        return netlink_ns_capable(skb, sock_net(skb->sk)->user_ns, cap);
}

So I worry that even with capabilities dropped in all namespaces of a
sandbox, an attack could still be arranged (with a pair of namespaces,
one nested in the other) where a task effectively "has all caps" for a
dangerous operation like configuring Netfilter, because it hits code
paths like this one, which bypass the capability bit checks.

The above finding may be a reason for us to prefer making capabilities
in a namespace ineffective over dropping them.  In the context of my
idea/proposal for a new sysctl, it could be better for it to work as I
had described, overriding the return value of security_capable(),
instead of e.g. hooking the return of create_user_ns() and dropping the
new cred's capabilities.

I hope the Ubuntu/AppArmor solution is also safe in this respect, as it
sounds like it similarly makes capabilities ineffective instead of
dropping them.

Alexander
