Thank you for this thorough investigation and for finding the workaround!
I just submitted a patch to the doc based on your email.

Cheers,

Edouard.

conjaroy writes:

> In an earlier bug comment [1] I corroborated that nscd was leaking
> /etc/passwd information from the host OS into the Guix container, and I
> wondered aloud why the container would use the host OS's nscd if there was
> a risk of this happening.
>
> I've looked into how Guix configures its own nscd, and it turns out that by
> default it enables lookups only for `hosts` and `services` - not for
> `passwd`, `group`, or `netgroup`. Presumably, then, this configuration is
> sufficient for nscd to prevent the glibc compatibility issues described in
> the manual [3].
>
> After adding the following 3 lines in nscd.conf on my foreign distro
> (Debian 10) and restarting nscd, my Guix system containers were able to
> boot successfully while talking to the daemon:
>
>     enable-cache passwd no
>     enable-cache group no
>     enable-cache netgroup no
>
> So I think the bug here is that the Guix manual page advising the use of
> nscd on a foreign distro [3] doesn't elaborate on which types of service
> lookups are safe to enable in the daemon. If Guix is used only to build and
> run binaries then perhaps it could use nscd for all lookups, but this is
> evidently not the case for Guix system containers.
>
> Cheers,
>
> Jason
>
> [1] https://www.mail-archive.com/bug-guix@gnu.org/msg19915.html
> [2] https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/services/base.scm?h=version-1.1.0#n1238
> [3] https://guix.gnu.org/manual/en/html_node/Application-Setup.html
>
> On Mon, Aug 24, 2020 at 11:15 PM conjaroy <conja...@gmail.com> wrote:
>
>> I've observed this error under similar circumstances: launching a guix
>> system container script with network sharing enabled, on a foreign distro
>> (Debian 10) with nscd running.
>>
>> Using `strace -f /gnu/store/...-run-container`, we can observe the
>> container's lookup of user accounts via the foreign distro's nscd socket:
>>
>> [pid 16582] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 11
>> [pid 16582] connect(11, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = 0
>> [pid 16582] sendto(11, "\2\0\0\0\0\0\0\0\t\0\0\0postgres\0", 21, MSG_NOSIGNAL, NULL, 0) = 21
>> [pid 16582] poll([{fd=11, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=11, revents=POLLIN}])
>> [pid 16582] read(11, "\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\0"..., 36) = 36
>> [pid 16582] close(11) = 0
>>
>> Since the user ("postgres") is indeed missing in the foreign distro, the
>> lookup fails. In this case, disabling nscd on the foreign distro allowed
>> the container script to run without error.
>>
>> Based on comments in https://issues.guix.info/issue/28128, I see that it
>> was a deliberate choice to bind-mount the foreign distro's nscd socket
>> inside the container (instead of starting a separate containerized nscd
>> instance). But I'm having trouble seeing why it's acceptable to leak state
>> from the foreign distro's user space into the container. Is there something
>> I'm missing?
>>
>> Cheers,
>>
>> Jason
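
For anyone reading the strace output quoted above, here is a rough Python sketch that decodes the two byte strings exchanged over the nscd socket. It is only an illustration under stated assumptions: the field layouts are what I recall from glibc's nscd-client.h (a request header of three 32-bit integers, and a passwd reply header starting with version, found, name length, password length, uid, gid, gecos length and home directory length), request type 0 is assumed to mean "passwd lookup by name", and the byte order is assumed to be little-endian as on x86-64. None of this is taken from the thread itself, so please check it against your glibc sources.

```python
import struct

# Bytes copied from the sendto() call in the quoted strace output.
request = b"\2\0\0\0\0\0\0\0\t\0\0\0postgres\0"

# Bytes copied from the read() call. strace truncated the 36-byte reply
# after 32 bytes, so the last 4-byte field is not visible here.
reply_prefix = (
    b"\2\0\0\0" b"\0\0\0\0" b"\0\0\0\0" b"\0\0\0\0"
    b"\377\377\377\377" b"\377\377\377\377" b"\0\0\0\0" b"\0\0\0\0"
)

# ASSUMPTION: request header = int32 version, int32 type, int32 key_len,
# followed by key_len bytes holding the NUL-terminated lookup key.
version, req_type, key_len = struct.unpack_from("<iii", request, 0)
key = request[12:12 + key_len]

# ASSUMPTION: reply header starts with these eight int32 fields.
reply_names = (
    "version", "found", "name_len", "passwd_len",
    "uid", "gid", "gecos_len", "dir_len",
)
reply = dict(zip(reply_names, struct.unpack("<8i", reply_prefix)))

print("request:", version, req_type, key_len, key)
print("reply:", reply)
```

Under those assumptions the request is a version-2 lookup of the 9-byte key "postgres" (including its trailing NUL), and the reply has found set to 0 with uid and gid both -1, which is consistent with Jason's reading that the host's nscd answered "no such user".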