On 25.04.2025 20:39, Ludovic Courtès wrote:
Hi,
I committed the /etc/group fix in
0d3bc50b0cffeae05beb12d0c270c6599186c0d7 together with a test.
keinflue <keinf...@posteo.net> writes:
I think this happens if the user running guix-daemon has supplementary
groups. These are not mapped via /proc/[pid]/gid_map in the build container
and therefore are reported as the overflow gid (65534) by getgroups.
The test cases assume that they can change ownership to this
additional group but that is not permitted on the overflow gid.
I think supplementary groups should be dropped in the user namespace
for the build container to make the behavior
reproducible. Unfortunately this may be impossible if the parent
namespace has set /proc/[...]/setgroups to "deny".
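To make the mapping behaviour described above concrete, here is a small Python model of how host GIDs appear from inside a user namespace. It is only a sketch, not guix-daemon code: the host GIDs 1000 and 998 and the helper names are hypothetical, and only the "inside outside count" line format of gid_map and the default overflow GID of 65534 are taken as given.

```python
OVERFLOW_GID = 65534  # kernel default; configurable via /proc/sys/kernel/overflowgid


def parse_gid_map(text):
    """Parse gid_map text into (inside_start, outside_start, count) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, count = (int(field) for field in line.split())
            entries.append((inside, outside, count))
    return entries


def gid_seen_inside(host_gid, entries, overflow=OVERFLOW_GID):
    """Return the GID that a process inside the namespace sees for host_gid."""
    for inside, outside, count in entries:
        if outside <= host_gid < outside + count:
            return inside + (host_gid - outside)
    return overflow  # unmapped GIDs are reported as the overflow GID


# Hypothetical host: primary GID 1000 is mapped to 30000, "kvm" (998) is not mapped.
gid_map = parse_gid_map("30000 1000 1\n")
groups_inside = [gid_seen_inside(g, gid_map) for g in (998, 1000)]
print(groups_inside)
```

With a single-line map like this one, every supplementary group of the daemon user collapses to the overflow GID, which matches the shape of the getgroups output in the unprivileged log further down.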
I came up with this test:
--8<---------------cut here---------------start------------->8---
(use-modules (guix)
             (gcrypt hash)
             (gnu packages bootstrap))

(computed-file "kvm-access"
               #~(begin
                   (pk '#$(gettimeofday))
                   (let ((st (stat "/dev/kvm")))
                     (pk '/dev/kvm st)
                     (pk '/dev/kvm:owner (stat:uid st) (stat:gid st))
                     (pk 'getgroups (getgroups))
                     ;; XXX: When running the daemon as root, /dev/kvm is
                     ;; owned by UID 0, which has no entry in /etc/passwd.
                     ;; (pk 'kvm-user (getpwuid (stat:uid st)))

                     ;; XXX: /etc/group never contained an entry to the "kvm"
                     ;; group so the thing below always failed.
                     ;; (pk 'kvm-group (getgrgid (stat:gid st)))
                     )
                   (when (open-fdes "/dev/kvm" O_RDWR)
                     (mkdir #$output)))
               #:guile %bootstrap-guile)
--8<---------------cut here---------------end--------------->8---
Privileged:
--8<---------------cut here---------------start------------->8---
$ guix build -f ~/src/guix-debugging/dev-kvm-access.scm
substitute: looking for substitutes on 'http://192.168.1.48:8123'...
0.0%guix substitute: warning: 192.168.1.48: connection failed:
Connection timed out
substitute:
substitute: looking for substitutes on 'https://ci.guix.gnu.org'...
100.0%
substitute: looking for substitutes on
'https://bordeaux.guix.gnu.org'... 100.0%
substitute: looking for substitutes on
'https://guix.bordeaux.inria.fr'... 100.0%
The following derivation will be built:
/gnu/store/vc5p6bfrzr7khgp9jha8h6kplixcl5h6-kvm-access.drv
substitute: looking for substitutes on 'http://192.168.1.48:8123'...
0.0%
building /gnu/store/vc5p6bfrzr7khgp9jha8h6kplixcl5h6-kvm-access.drv...
;;; ((1745606160 . 233876))
;;; (/dev/kvm #(6 483 8624 1 0 984 2792 0 1745359386 1745359386
1745359386 4096 0 char-special 432 382791307 382791307 1745359386))
;;; (/dev/kvm:owner 0 984)
;;; (getgroups #(984 30000))
successfully built
/gnu/store/vc5p6bfrzr7khgp9jha8h6kplixcl5h6-kvm-access.drv
/gnu/store/36fin1iw2fh9066jg0y2fjd78j9wyjwp-kvm-access
--8<---------------cut here---------------end--------------->8---
Unprivileged:
--8<---------------cut here---------------start------------->8---
$ ./test-env guix build -f ~/src/guix-debugging/dev-kvm-access.scm
accepted connection from pid 2591, user ludo
accepted connection from pid 2601, user ludo
substitute: guix substitute: warning: ACL for archive imports seems to
be uninitialized, substitutes may be unavailable
substitute: guix substitute: warning: authentication and authorization
of substitutes disabled!
The following derivation will be built:
/home/ludo/src/guix/test-tmp/store/5p4qn8d3bgnj60a2kwpliiwk81bvrcjp-kvm-access.drv
substitute: guix substitute: warning: authentication and authorization
of substitutes disabled!
building
/home/ludo/src/guix/test-tmp/store/5p4qn8d3bgnj60a2kwpliiwk81bvrcjp-kvm-access.drv...
;;; ((1745606200 . 636919))
;;; (/dev/kvm #(6 483 8624 1 65534 65534 2792 0 1745359386 1745359386
1745359386 4096 0 char-special 432 382791307 382791307 1745359386))
;;; (/dev/kvm:owner 65534 65534)
;;; (getgroups #(65534 65534 65534 65534 65534 65534 65534 30000
65534))
successfully built
/home/ludo/src/guix/test-tmp/store/5p4qn8d3bgnj60a2kwpliiwk81bvrcjp-kvm-access.drv
/home/ludo/src/guix/test-tmp/store/ffh8zaw279dgdsh6q54mlldh4nikxiqp-kvm-access
--8<---------------cut here---------------end--------------->8---
In both cases, /dev/kvm is accessible.
In both cases, only the primary group has an entry in /etc/group;
supplementary groups are lacking.
So:
1. I don’t think we need to map the “kvm” UID/GID into the user
namespace;
For the purpose of the passive permission checks that is not necessary,
yes. No uids or gids are being translated between the user
namespaces. However, if all supplementary groups were dropped, that
would include the kvm group, and then this test would fail to access
/dev/kvm. That was the problem I saw with that first suggestion.
2. I’m confused as to what makes the Coreutils test suite fail.
The result from getgroups includes both the primary gid 30000 and a
supplementary gid 65534 (where the repeated 65534 are the overflow gid
produced by viewing supplementary gids that aren't mapped into the user
namespace via /proc/[pid]/gid_map).
Coreutils sees this and so assumes that it can do the equivalent of
    touch testfile
    chgrp 65534 testfile
to create a file owned by group 30000 initially and to then change group
ownership of that file to 65534. Normally an unprivileged user is
allowed to change group ownership of files they own between groups that
they are a member of, so this would always succeed outside a user
namespace context.
However, any uid/gid used inside the user namespace is translated back
to the host namespace via the uid/gid_map before permission checks. But
in this case because 65534 doesn't map back to any gid in the host
namespace, the syscall will fail.
If there is no supplementary group reported by getgroups at all, then
coreutils just skips the test and it is ok again. Probably the coreutils
test case should remove any gid reported by getgroups that is equal to
the overflow gid before making that decision.
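The filtering suggested above could look roughly like the following. This is a Python sketch of the decision only, not the actual coreutils test logic (which I have not reproduced here); the function names are hypothetical, and the two group lists are the getgroups results from the build logs earlier in this thread.

```python
def usable_supplementary_gids(groups, primary_gid, overflow_gid=65534):
    """Drop the primary GID and any GID equal to the overflow GID."""
    return sorted({g for g in groups if g not in (primary_gid, overflow_gid)})


def should_run_chgrp_test(groups, primary_gid, overflow_gid=65534):
    """Run the test only if a mapped supplementary group is available."""
    return bool(usable_supplementary_gids(groups, primary_gid, overflow_gid))


# getgroups output from the unprivileged build log in this thread.
unprivileged = [65534, 65534, 65534, 65534, 65534, 65534, 65534, 30000, 65534]
# getgroups output from the privileged build log in this thread.
privileged = [984, 30000]

print(should_run_chgrp_test(unprivileged, primary_gid=30000))
print(should_run_chgrp_test(privileged, primary_gid=30000))
```

One caveat of this approach: a system that really has a group with GID 65534 and wants to test against it would also be skipped, so it trades a little coverage for robustness inside user namespaces.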
Dropping all supplementary groups from the build process (after unshare
and before writing "deny" to /proc/pid/setgroups) would make it so that
this test case is always skipped by having getgroups only report 30000,
however that would also drop the kvm group as mentioned above and is
also not permitted in all environments (e.g. when the parent namespace
already set /proc/[pid]/setgroups to "deny").
So I think that instead either all supplementary groups of the user or
at least the kvm group specifically needs to be mapped via
/proc/[pid]/gid_map. When doing so getgroups would report 30000 and 984
(assuming identity gid map for 984) in your test case above and the
coreutils test case would work again, because
    chgrp 984 testfile
would then succeed with 984 mapping back to the host namespace to a
supplementary group of the process.
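As a rough illustration of why the extra gid_map line changes the outcome, here is a simplified Python model of the reverse translation. It is an assumption-laden sketch: the host primary GID 1000 is made up, the function names are hypothetical, and the real kernel check involves more than this. It also ignores the question of who is allowed to write such a map in the first place.

```python
def to_host_gid(inside_gid, entries):
    """Translate a GID used inside the namespace back to a host GID, or None."""
    for inside, outside, count in entries:
        if inside <= inside_gid < inside + count:
            return outside + (inside_gid - inside)
    return None  # no mapping: the kernel cannot express this GID on the host


def chgrp_allowed(target_gid, entries, host_groups):
    """Model: chgrp works only if the target maps to a group the process has."""
    host_gid = to_host_gid(target_gid, entries)
    return host_gid is not None and host_gid in host_groups


host_groups = {1000, 984}  # hypothetical primary GID 1000, plus "kvm" (984)

primary_only = [(30000, 1000, 1)]             # entries are (inside, outside, count)
with_kvm = [(30000, 1000, 1), (984, 984, 1)]  # identity mapping for 984 added

print(chgrp_allowed(65534, primary_only, host_groups))  # overflow GID, unmapped
print(chgrp_allowed(984, primary_only, host_groups))    # kvm not mapped yet
print(chgrp_allowed(984, with_kvm, host_groups))        # kvm mapped by identity
```

In this model the first two calls are refused and only the third succeeds, which is the difference between the current behaviour and the proposed mapping.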
From the standpoint of reproducibility and information leakage into the build
container, however, I think it would be preferable not to retain
supplementary groups if possible. In contrast to the privileged build,
where a distinct build user can be given the desired supplementary
groups at will, the unprivileged environment may be one where the
supplementary groups of the user running the daemon can't easily be
changed to what is supposed to be seen in the build environment.
The contents of /etc/group are not relevant for this test case failure,
they are never consulted.
But a few other asides (for which I don't necessarily think anything
should be changed):
- I also noticed that the build container /etc/group is written with
65534 assumed as overflow gid. I am not sure whether anyone actually
does this, but the overflow uid/gid are technically configurable (and
retrievable) via sysctl entries (/proc/sys/kernel/overflow(uid|gid)).
65534 is just the default value.
- I also noticed that the operating-system defaults do not write an
entry for the overflow gid to /etc/group (while they do for the overflow
uid to /etc/passwd). I think such an entry should exist by default as
well. The entry for /etc/passwd also assumes the default overflow uid of
65534. This isn't only relevant in a user namespace context, but also
for file systems that can't map the whole range of Linux uids/gids.
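For the first aside, reading the configured overflow gid instead of assuming 65534 might look like the Python sketch below. The group name "overflow" and both helper names are made up for illustration and say nothing about what the daemon or the operating-system defaults actually write; only the sysctl path and the name:password:gid:members layout of /etc/group are taken as given.

```python
DEFAULT_OVERFLOW_ID = 65534  # kernel default when the sysctl was not changed


def read_overflow_id(path, default=DEFAULT_OVERFLOW_ID):
    """Read an overflow ID from a sysctl file, falling back to the default."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return default


def group_entry(name, gid):
    """Format an /etc/group line with no password and no members."""
    return f"{name}:x:{gid}:"


overflow_gid = read_overflow_id("/proc/sys/kernel/overflowgid")
print(group_entry("overflow", overflow_gid))
```

The fallback keeps the current behaviour on systems where the file cannot be read, so this would only change anything where the sysctl was actually modified.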
It would still be good to drop any supplementary group other than “kvm”
though.
WDYT?
Thanks,
Ludo’.