On Wed, Jul 8, 2015 at 8:46 AM, Ludovic Courtès <l...@gnu.org> wrote:
> "Thompson, David" <dthomps...@worcester.edu> skribis:
>
>> On Tue, Jul 7, 2015 at 6:28 AM, Ludovic Courtès <l...@gnu.org> wrote:
>
> [...]
>
>>>>     (lambda ()
>>>>       (sethostname "guix-0.8.3"))
>>>
>>> Surprisingly, calling ‘getpid’ in the thunk returns the PID of the
>>> parent (I was expecting it to return 1.)  Not sure why that is the
>>> case.  I’m still amazed that this works as non-root, BTW.
>>
>> The first process created inside the PID namespace gets the honor of
>> being PID 1, not the process created with the 'clone' call.
>>
>> For more information, see: https://lwn.net/Articles/532748/
>
> To me, the thunk above is just like ‘childFunc’ in
> <https://lwn.net/Articles/533492/>–i.e., it’s the procedure that ‘clone’
> calls in the first child process of the new PID name space.
>
> What am I missing?

It's non-intuitive because PID namespaces are given special treatment.
The cloned process is like PID 1 in the sense that if you fork, the
new process is PID 2.  However, if you call 'getpid' in the cloned
process, it returns the PID in the context of the parent PID
namespace, whereas you are expecting PID 1.  In that example from LWN,
'childFunc' calls 'execvp', and *that* exec'd program runs as PID 1
(and 'getpid' agrees).  This is the usual pattern I see in all
container implementations: the cloned process sets up the environment
and then execs the real init system.  Is it clearer now?

>>> There’s an issue when the parent’s Guile is not mapped into the
>>> container’s file system: ‘use-modules’ forms and auto-loading will fail.
>>> For instance, I did (use-modules (ice-9 ftw)) in the parent and called
>>> ‘scandir’ in the child, but that failed because of an attempt to
>>> auto-load (ice-9 i18n), which is unavailable in the container.
>>
>> Hmm, I don't know of a way to deal with that other than the user being
>> careful to bind-mount in the Guile modules they need.
>
> Right.  Maybe the best we can do is to add a word of caution in the
> docstring or something.

Okay, I will do that.

>> Hmm, there's various reasons that EINVAL would be thrown.  Could you
>> readlink "those" files, that is /proc/<pid-outside-container>/ns/user
>> and /proc/<pid-inside-container>/ns/user, and tell me if the contents
>> are the same?  They shouldn't be, but this will eliminate one of the
>> possible causes of EINVAL.
>
> It turns out I was targeting the wrong PID.

Glad it's not totally broken on machines other than mine. :)

>>> Also, I think we should add --expose and --share as for ‘guix system’,
>>> though that can come later.
>>
>> Yes, I also really want that, but it's a task for another time.
>
> Sure.
>
>>>> Here's how you build it:
>>>>
>>>>     guix system container container.scm
>>>
>>> Very neat.
>>> I wonder if that should automatically override the
>>> ‘file-systems’ field to be ‘%container-file-systems’, so that one can
>>> reuse existing OS declarations unmodified.  WDYT?
>>
>> This would be a better user experience, for sure.  I thought about
>> this, but I don't know how to do it in a way that isn't surprising or
>> just broken.  Ideas?
>
> IMO it’d be fine to simply override the subset of ‘file-systems’ that
> clashes with ‘%container-file-systems’, similar to what
> ‘virtualized-operating-system’ does in (gnu system vm).

I will implement that.  Thanks!

- Dave