On Sun, 05 Feb 2023 at 21:23:18 +0100, Helmut Grohne wrote:
> What is needed to make this work? mmdebstrap --mode=unshare requires the
> following features:
>  * unprivileged unsharing of user namespaces
>    - This is prohibited on DSA machines via a sysctl
>    - It works on most other systems
>    - Test case: unshare -U true
>  * A subuid allocation in /etc/subuid
>    - Allocated by default during user creation
>    - Test case: grep -q ^$(id -un): /etc/subuid

To make this genuinely useful for typical container use-cases, you
usually want a block of 65536 uids; otherwise you'll tend to get weird
failures. User creation allocates a suitable block by default.
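
A stricter variant of the grep test case above could also check the size
of the allocation, not just that one exists. This is only a rough sketch:
the function names are invented for illustration, it assumes the usual
name:start:count format of /etc/subuid, and it matches entries by login
name only (entries keyed by numeric uid are ignored).

```shell
# subuid_block_size USER [FILE]
# Print the size of the largest single subuid range allocated to USER
# in FILE (default /etc/subuid), or 0 if there is none.
# Hypothetical helper, not part of any existing tool.
subuid_block_size() {
    user=$1
    file=${2:-/etc/subuid}
    if [ ! -r "$file" ]; then
        echo 0
        return 0
    fi
    # Fields are name:first_subuid:count; keep the largest count seen.
    awk -F: -v u="$user" '
        $1 == u && $3 > max { max = $3 }
        END { print max + 0 }
    ' "$file"
}

# has_full_subuid_block USER [FILE]
# Succeed if USER has at least one range of 65536 or more subuids.
has_full_subuid_block() {
    [ "$(subuid_block_size "$@")" -ge 65536 ]
}
```

For the current user this would be called as
has_full_subuid_block "$(id -un)". It says nothing about whether a
surrounding container actually maps that range.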

>  * subuid allocation must be mapped by container technology (if any)
>    - I suppose the unshare backend fails this. Likely also unprivileged
>      podman.
>  * It must be possible to mount proc in the unshared user+mount+pid
>    namespace.
>    - This should always work but may be restricted by the container
>      technology for some reason.

Container technologies that don't unshare the user namespace (uid 0 on
the host = uid 0 in the container), notably Docker and older privileged
setups for lxc, have no userns-based protection from misuse of things
like /proc/sysrq-trigger; so they have to either prevent those attacks
some other way, which tends to involve disallowing mounting /proc, or
accept that root in the container gives you root in real life (leading
to the mantra "containers don't contain").

>    - Test case: unshare -U -m -p -f -r --mount-proc true
>    - Paul tried this in the operational lxc containers. Successfully.
>    - I tried this in a local autopkgtest-unstable lxc container.
>      Successfully (unprivileged).
>    - Johannes reported that this would be the step that fails.
>  * Maybe more, but I don't know what would be missing. Maybe Johannes
>    knows.

A somewhat weaker form of user namespace support could be summarized
as "enough to run bubblewrap", which requires unprivileged unsharing
of user namespaces and the ability to mount a new instance of /proc,
but does not require a subuid allocation or any setuid helpers like
newuidmap/newgidmap.

This would be enough for single-user single-app sandboxes like bubblewrap,
Flatpak, epiphany-browser (GNOME Web), and probably Chromium's sandboxing
(which doesn't use bubblewrap, but makes similar syscalls itself),
but not enough for containers that want multiple uids, such as podman
and mmdebstrap/unshare.
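
The two levels could be told apart by combining the test cases quoted
above. The following is a sketch under assumptions, not a definitive
check: the function names and the level labels ("none",
"bubblewrap-level", "multi-uid") are made up here, and a "multi-uid"
result only means a subuid entry exists, not that newuidmap/newgidmap
work or that the range is mapped by the surrounding container.

```shell
# classify_userns USERNS_OK PROC_OK SUBUID_OK
# Each argument is "yes" or "no". Prints one of:
#   none              cannot unshare a userns, or cannot mount /proc in it
#   bubblewrap-level  userns + /proc work, but no subuid allocation
#   multi-uid         userns + /proc work and a subuid entry exists
# Labels are hypothetical, chosen for this example only.
classify_userns() {
    if [ "$1" != yes ] || [ "$2" != yes ]; then
        echo none
    elif [ "$3" != yes ]; then
        echo bubblewrap-level
    else
        echo multi-uid
    fi
}

# probe CMD [ARGS...]: print "yes" if CMD succeeds, "no" otherwise
# (including when CMD is not installed).
probe() {
    if "$@" >/dev/null 2>&1; then echo yes; else echo no; fi
}

# Run the three test cases from the quoted message and classify the result.
detect_userns_level() {
    userns=$(probe unshare -U true)
    proc=$(probe unshare -U -m -p -f -r --mount-proc true)
    subuid=$(probe grep -q "^$(id -un):" /etc/subuid)
    classify_userns "$userns" "$proc" "$subuid"
}
```

Running detect_userns_level inside the environment of interest (for
example an autopkgtest lxc container) prints a single label.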

    smcv
