Quoting Michael H. Warfield (m...@wittsend.com): > On Tue, 2013-11-19 at 14:53 -0600, Serge Hallyn wrote: > > Quoting Michael H. Warfield (m...@wittsend.com): > > > If autodev is not specifically set to 0 or 1, attempts to determine if > > > systemd is being utilized and forces autodev=1 to prevent host system > > > conflicts and collisions. > > > > > > If autodev is enabled and the host /dev is mounted with devtmpfs > > > or /dev/.lxc is mounted with another file system... > > > > > > Each container created by a privileged user gets a /dev directory > > > mapped off the host /dev here: > > > > > > /dev/.lxc/${name}.$( hash $lxcpath/$name ) > > > > > > Each container created by a non-privileged user gets a /dev/directory > > > mapped off the host /dev here: > > > > > > /dev/.lxc/user/${name}.$( hash $lxcpath/$name ) > > > > > > The /dev/.lxc/user is mode 1777 to allow unpriv access. > > > > > > The /dev/.lxc/{containerdev} is bind mounted into the container /dev. > > > > > > Fallback on failure is to mount tmpfs into the container /dev. > > > > > > A symlink is created from $lxcpath/$name/rootfs.dev back to the /dev > > > relative directory to provid a code consistent reference for updating > > > container devs. > > > > Hi Michael, > > > > I haven't run it, but looking over the code in detail, it looks good > > to me. I know Stéphane in the past has objected to trying to autodetect > > systemd based containers. IMO it'd be fine to expect the templates to > > set lxc.autodev, as I worry about building distro-detecting hacks into > > core lxc, but unless Stéphane objects I don't care enough in this case > > to nack it. > > > Did you consider this patch good enough to apply, or was this still an > > RFC? I'm happy either way. > > I do consider it ready, which is why I, belatedly, signed off on it.
Hi Stéphane, Please let me know if you're ok with the check_autodev() below. -serge > > > Acked-by: Serge E. Hallyn <serge.hal...@ubuntu.com> > > Regards, > Mike > > > > > > --- > > > src/lxc/conf.c | 369 > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++-- > > > src/lxc/conf.h | 4 +- > > > src/lxc/start.c | 2 +- > > > 3 files changed, 363 insertions(+), 12 deletions(-) > > > > > > diff --git a/src/lxc/conf.c b/src/lxc/conf.c > > > index a756731..d17cb2a 100644 > > > --- a/src/lxc/conf.c > > > +++ b/src/lxc/conf.c > > > @@ -29,6 +29,7 @@ > > > #include <string.h> > > > #include <dirent.h> > > > #include <unistd.h> > > > +#include <inttypes.h> > > > #include <sys/wait.h> > > > #include <sys/syscall.h> > > > #include <time.h> > > > @@ -1164,20 +1165,275 @@ static int setup_rootfs_pivot_root(const char > > > *rootfs, const char *pivotdir) > > > return 0; > > > } > > > > > > + > > > +/* > > > + * Note: This is a verbatum copy of what is in monitor.c. We're just > > > + * usint it here to generate a safe subdirectory in /dev/ for the > > > + * containers /dev/ > > > + */ > > > + > > > +/* Note we don't use SHA-1 here as we don't want to depend on > > > HAVE_GNUTLS. > > > + * FNV has good anti collision properties and we're not worried > > > + * about pre-image resistance or one-way-ness, we're just trying to make > > > + * the name unique in the 108 bytes of space we have. > > > + */ > > > +#define FNV1A_64_INIT ((uint64_t)0xcbf29ce484222325ULL) > > > +static uint64_t fnv_64a_buf(void *buf, size_t len, uint64_t hval) > > > +{ > > > + unsigned char *bp; > > > + > > > + for(bp = buf; bp < (unsigned char *)buf + len; bp++) > > > + { > > > + /* xor the bottom with the current octet */ > > > + hval ^= (uint64_t)*bp; > > > + > > > + /* gcc optimised: > > > + * multiply by the 64 bit FNV magic prime mod 2^64 > > > + */ > > > + hval += (hval << 1) + (hval << 4) + (hval << 5) + > > > + (hval << 7) + (hval << 8) + (hval << 40); > > > + } > > > + > > > + return hval; > > > +} > > > + > > > +/* > > > + * Check to see if a directory has something mounted on it and, > > > + * if it does, return the fstype. > > > + * > > > + * Code largely based on detect_shared_rootfs below > > > + * > > > + * Returns: # of matching entries in /proc/self/mounts > > > + * if != 0 fstype is filled with the last filesystem value. > > > + * if == 0 no matches found, fstype unchanged. > > > + * > > > + * ToDo: Maybe return the mount options in another parameter... > > > + */ > > > + > > > +#define LINELEN 4096 > > > +#define MAX_FSTYPE_LEN 128 > > > +int mount_check_fs( const char *dir, char *fstype ) > > > +{ > > > + char buf[LINELEN], *p; > > > + struct stat s; > > > + FILE *f; > > > + int found_fs = 0; > > > + char *p2; > > > + > > > + DEBUG("entering mount_check_fs for %s\n", dir); > > > + > > > + if ( 0 != access(dir, F_OK) || 0 != stat(dir, &s) || 0 == > > > S_ISDIR(s.st_mode) ) { > > > + return 0; > > > + } > > > + > > > + process_lock(); > > > + f = fopen("/proc/self/mounts", "r"); > > > + process_unlock(); > > > + if (!f) > > > + return 0; > > > + while ((p = fgets(buf, LINELEN, f))) { > > > + p = index(buf, ' '); > > > + if( !p ) > > > + continue; > > > + *p = '\0'; > > > + p2 = p + 1; > > > + > > > + p = index(p2, ' '); > > > + if( !p ) > > > + continue; > > > + *p = '\0'; > > > + > > > + /* Compare the directory in the entry to desired */ > > > + if( strcmp( p2, dir ) ) { > > > + continue; > > > + } > > > + > > > + p2 = p + 1; > > > + p = index( p2, ' '); > > > + if( !p ) > > > + continue; > > > + *p = '\0'; > > > + > > > + ++found_fs; > > > + > > > + if( fstype ) { > > > + strncpy( fstype, p2, MAX_FSTYPE_LEN - 1 ); > > > + fstype [ MAX_FSTYPE_LEN - 1 ] = '\0'; > > > + } > > > + } > > > + > > > + process_lock(); > > > + fclose(f); > > > + process_unlock(); > > > + > > > + DEBUG("mount_check_fs returning %d last %s\n", found_fs, fstype); > > > + > > > + return found_fs; > > > +} > > > + > > > +/* > > > + * Locate a devtmpfs mount (should be on /dev) and create a container > > > + * subdirectory on it which we can then bind mount to the container > > > + * /dev instead of mounting a tmpfs there. > > > + * If we fail, return NULL. > > > + * Else return the pointer to the name buffer with the string to > > > + * the devtmpfs subdirectory. > > > + */ > > > + > > > +char *mk_devtmpfs(const char *name, char *path, const char *lxcpath) > > > +{ > > > + int ret; > > > + struct stat s; > > > + char tmp_path[MAXPATHLEN]; > > > + char fstype[MAX_FSTYPE_LEN]; > > > + char *base_path = "/dev/.lxc"; > > > + char *user_path = "/dev/.lxc/user"; > > > + uint64_t hash; > > > + > > > + if ( 0 != access(base_path, F_OK) || 0 != stat(base_path, &s) || 0 == > > > S_ISDIR(s.st_mode) ) { > > > + /* This is just making /dev/.lxc it better work or we're done */ > > > + ret = mkdir(base_path, S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | > > > S_IXOTH); > > > + if ( ret ) { > > > + SYSERROR( "Unable to create /dev/.lxc for autodev" ); > > > + return NULL; > > > + } > > > + } > > > + > > > + /* > > > + * Programmers notes: > > > + * We can not do mounts in this area of code that we want > > > + * to be visible in the host. Consequently, /dev/.lxc must > > > + * be set up earlier if we need a tmpfs mounted there. > > > + * That only affects the rare cases where autodev is enabled > > > + * for a container and devtmpfs is not mounted on /dev in the > > > + * host. In that case, we'll fall back to the old method > > > + * of mounting a tmpfs in the container and have no visibility > > > + * into the container /dev. > > > + */ > > > + if( ! mount_check_fs( "/dev", fstype ) > > > + || strcmp( "devtmpfs", fstype ) ) { > > > + /* Either /dev was not mounted or was not devtmpfs */ > > > + > > > + if ( ! mount_check_fs( "/dev/.lxc", NULL ) ) { > > > + /* > > > + * /dev/.lxc is not already mounted > > > + * Doing a mount here does no good, since > > > + * it's not visible in the host. > > > + */ > > > + > > > + ERROR("/dev/.lxc is not setup - taking fallback" ); > > > + return NULL; > > > + } > > > + } > > > + > > > + if ( 0 != access(user_path, F_OK) || 0 != stat(user_path, &s) || 0 == > > > S_ISDIR(s.st_mode) ) { > > > + /* > > > + * This is making /dev/.lxc/user path for non-priv users. > > > + * If this doesn't work, we'll have to fall back in the > > > + * case of non-priv users. It's mode 1777 like /tmp. > > > + */ > > > + ret = mkdir(user_path, S_IRWXU | S_IRWXG | S_IRWXO | S_ISVTX); > > > + if ( ret ) { > > > + /* Issue an error but don't fail yet! */ > > > + ERROR("Unable to create /dev/.lxc/user"); > > > + } > > > + /* Umask tends to screw us up here */ > > > + chmod(user_path, S_IRWXU | S_IRWXG | S_IRWXO | S_ISVTX); > > > + } > > > + > > > + /* > > > + * Since the container name must be unique within a given > > > + * lxcpath, we're going to use a hash of the path > > > + * /lxcpath/name as our hash name in /dev/.lxc/ > > > + */ > > > + > > > + ret = snprintf(tmp_path, MAXPATHLEN, "%s/%s", lxcpath, name); > > > + if (ret < 0 || ret >= MAXPATHLEN) > > > + return NULL; > > > + > > > + hash = fnv_64a_buf(tmp_path, ret, FNV1A_64_INIT); > > > + > > > + ret = snprintf(tmp_path, MAXPATHLEN, "%s/%s.%016" PRIx64, base_path, > > > name, hash); > > > + if (ret < 0 || ret >= MAXPATHLEN) > > > + return NULL; > > > + > > > + if ( 0 != access(tmp_path, F_OK) || 0 != stat(tmp_path, &s) || 0 == > > > S_ISDIR(s.st_mode) ) { > > > + ret = mkdir(tmp_path, S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | > > > S_IXOTH); > > > + if ( ret ) { > > > + /* Something must have failed with the base_path... > > > + * Maybe unpriv user. Try user_path now... */ > > > + INFO("Setup in /dev/.lxc failed. Trying > > > /dev/.lxc/user." ); > > > + > > > + ret = snprintf(tmp_path, MAXPATHLEN, "%s/%s.%016" > > > PRIx64, user_path, name, hash); > > > + if (ret < 0 || ret >= MAXPATHLEN) > > > + return NULL; > > > + > > > + if ( 0 != access(tmp_path, F_OK) || 0 != stat(tmp_path, > > > &s) || 0 == S_ISDIR(s.st_mode) ) { > > > + ret = mkdir(tmp_path, S_IRWXU | S_IRGRP | > > > S_IXGRP | S_IROTH | S_IXOTH); > > > + if ( ret ) { > > > + ERROR("Container /dev setup in host > > > /dev failed - taking fallback" ); > > > + return NULL; > > > + } > > > + } > > > + } > > > + } > > > + > > > + strcpy( path, tmp_path ); > > > + return path; > > > +} > > > + > > > + > > > /* > > > * Do we want to add options for max size of /dev and a file to > > > * specify which devices to create? > > > */ > > > -static int mount_autodev(char *root) > > > +static int mount_autodev(const char *name, char *root, const char > > > *lxcpath) > > > { > > > int ret; > > > + struct stat s; > > > char path[MAXPATHLEN]; > > > + char host_path[MAXPATHLEN]; > > > + char devtmpfs_path[MAXPATHLEN]; > > > > > > INFO("Mounting /dev under %s\n", root); > > > + > > > + ret = snprintf(host_path, MAXPATHLEN, "%s/%s/rootfs.dev", lxcpath, > > > name); > > > + if (ret < 0 || ret > MAXPATHLEN) > > > + return -1; > > > + > > > ret = snprintf(path, MAXPATHLEN, "%s/dev", root); > > > if (ret < 0 || ret > MAXPATHLEN) > > > return -1; > > > - ret = mount("none", path, "tmpfs", 0, "size=100000"); > > > + > > > + if (mk_devtmpfs( name, devtmpfs_path, lxcpath ) ) { > > > + /* > > > + * Get rid of old links and directoriess > > > + * This could be either a symlink and we remove it, > > > + * or an empty directory and we remove it, > > > + * or non-existant and we don't care, > > > + * or a non-empty directory, and we will then emit an error > > > + * but we will not fail out the process. > > > + */ > > > + unlink( host_path ); > > > + rmdir( host_path ); > > > + ret = symlink(devtmpfs_path, host_path); > > > + > > > + if ( ret < 0 ) { > > > + SYSERROR("WARNING: Failed to create symlink > > > '%s'->'%s'\n", host_path, devtmpfs_path); > > > + } > > > + DEBUG("Bind mounting %s to %s", devtmpfs_path , path ); > > > + ret = mount(devtmpfs_path, path, NULL, MS_BIND, 0 ); > > > + } else { > > > + /* Only mount a tmpfs on here if we don't already a mount */ > > > + if ( ! mount_check_fs( host_path, NULL ) ) { > > > + DEBUG("Mounting tmpfs to %s", host_path ); > > > + ret = mount("none", path, "tmpfs", 0, "size=100000"); > > > + } else { > > > + /* This allows someone to manually set up a mount */ > > > + DEBUG("Bind mounting %s to %s", host_path, path ); > > > + ret = mount(host_path , path, NULL, MS_BIND, 0 ); > > > + } > > > + } > > > if (ret) { > > > SYSERROR("Failed to mount /dev at %s\n", root); > > > return -1; > > > @@ -1185,10 +1441,16 @@ static int mount_autodev(char *root) > > > ret = snprintf(path, MAXPATHLEN, "%s/dev/pts", root); > > > if (ret < 0 || ret >= MAXPATHLEN) > > > return -1; > > > - ret = mkdir(path, S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH); > > > - if (ret) { > > > - SYSERROR("Failed to create /dev/pts in container"); > > > - return -1; > > > + /* > > > + * If we are running on a devtmpfs mapping, dev/pts may already exist. > > > + * If not, then create it and exit if that fails... > > > + */ > > > + if ( 0 != access(path, F_OK) || 0 != stat(path, &s) || 0 == > > > S_ISDIR(s.st_mode) ) { > > > + ret = mkdir(path, S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | > > > S_IXOTH); > > > + if (ret) { > > > + SYSERROR("Failed to create /dev/pts in container"); > > > + return -1; > > > + } > > > } > > > > > > INFO("Mounted /dev under %s\n", root); > > > @@ -2369,6 +2631,7 @@ struct lxc_conf *lxc_conf_init(void) > > > > > > new->loglevel = LXC_LOG_PRIORITY_NOTSET; > > > new->personality = -1; > > > + new->autodev = -1; > > > new->console.log_path = NULL; > > > new->console.log_fd = -1; > > > new->console.path = NULL; > > > @@ -3136,7 +3399,89 @@ int ttys_shift_ids(struct lxc_conf *c) > > > return 0; > > > } > > > > > > -int lxc_setup(const char *name, struct lxc_conf *lxc_conf, const char > > > *lxcpath, struct cgroup_process_info *cgroup_info) > > > +/* > > > + * This routine is called when the configuration does not already > > > specify a value > > > + * for autodev (mounting a file system on /dev and populating it in a > > > container). > > > + * If a hard override value has not be specified, then we try to apply > > > some > > > + * heuristics to determine if we should switch to autodev mode. > > > + * > > > + * For instance, if the container has an /etc/systemd/system directory > > > then it > > > + * is probably running systemd as the init process and it needs the > > > autodev > > > + * mount to prevent it from mounting devtmpfs on /dev on it's own > > > causing conflicts > > > + * in the host. > > > + * > > > + * We may also want to enable autodev if the host has devtmpfs mounted > > > on its > > > + * /dev as this then enable us to use subdirectories under /dev for the > > > container > > > + * /dev directories and we can fake udev devices. > > > + */ > > > +struct start_args { > > > + char *const *argv; > > > +}; > > > + > > > +#define MAX_SYMLINK_DEPTH 32 > > > + > > > +int check_autodev( const char *rootfs, void *data ) > > > +{ > > > + struct start_args *arg = data; > > > + int ret; > > > + int loop_count = 0; > > > + struct stat s; > > > + char absrootfs[MAXPATHLEN]; > > > + char path[MAXPATHLEN]; > > > + char abs_path[MAXPATHLEN]; > > > + char *command = "/sbin/init"; > > > + > > > + if (rootfs == NULL || strlen(rootfs) == 0) > > > + return -2; > > > + > > > + if (!realpath(rootfs, absrootfs)) > > > + return -2; > > > + > > > + if( arg && arg->argv[0] ) { > > > + command = arg->argv[0]; > > > + DEBUG("Set exec command to %s\n", command ); > > > + } > > > + > > > + strncpy( path, command, MAXPATHLEN-1 ); > > > + > > > + if ( 0 != access(path, F_OK) || 0 != stat(path, &s) ) > > > + return -2; > > > + > > > + /* Dereference down the symlink merry path testing as we go. */ > > > + /* If anything references systemd in the path - set autodev! */ > > > + /* Renormalize to the rootfs before each dereference */ > > > + /* Relative symlinks should fall out in the wash even with .. */ > > > + while( 1 ) { > > > + if ( strstr( path, "systemd" ) ) { > > > + INFO("Container with systemd init detected - enabling > > > autodev!"); > > > + return 1; > > > + } > > > + > > > + ret = snprintf(abs_path, MAXPATHLEN-1, "%s/%s", absrootfs, > > > path); > > > + if (ret < 0 || ret > MAXPATHLEN) > > > + return -2; > > > + > > > + ret = readlink( abs_path, path, MAXPATHLEN-1 ); > > > + > > > + if ( ( ret <= 0 ) || ( ++loop_count > MAX_SYMLINK_DEPTH ) ) { > > > + break; /* Break out for other tests */ > > > + } > > > + path[ret] = '\0'; > > > + } > > > + > > > + /* > > > + * Add future checks here. > > > + * Return positive if we should go autodev > > > + * Return 0 if we should NOT go autodev > > > + * Return negative if we encounter an error or can not determine... > > > + */ > > > + > > > + /* All else fails, we don't need autodev */ > > > + INFO("Autodev not required."); > > > + return 0; > > > +} > > > + > > > +int lxc_setup(const char *name, struct lxc_conf *lxc_conf, const char > > > *lxcpath, struct cgroup_process_info *cgroup_info, void *data) > > > { > > > if (setup_utsname(lxc_conf->utsname)) { > > > ERROR("failed to setup the utsname for '%s'", name); > > > @@ -3158,8 +3503,12 @@ int lxc_setup(const char *name, struct lxc_conf > > > *lxc_conf, const char *lxcpath, > > > return -1; > > > } > > > > > > - if (lxc_conf->autodev) { > > > - if (mount_autodev(lxc_conf->rootfs.mount)) { > > > + if (lxc_conf->autodev < 0) { > > > + lxc_conf->autodev = check_autodev(lxc_conf->rootfs.mount, data); > > > + } > > > + > > > + if (lxc_conf->autodev > 0) { > > > + if (mount_autodev(name, lxc_conf->rootfs.mount, lxcpath)) { > > > ERROR("failed to mount /dev in the container"); > > > return -1; > > > } > > > @@ -3197,7 +3546,7 @@ int lxc_setup(const char *name, struct lxc_conf > > > *lxc_conf, const char *lxcpath, > > > return -1; > > > } > > > > > > - if (lxc_conf->autodev) { > > > + if (lxc_conf->autodev > 0) { > > > if (run_lxc_hooks(name, "autodev", lxc_conf, lxcpath, NULL)) { > > > ERROR("failed to run autodev hooks for container > > > '%s'.", name); > > > return -1; > > > diff --git a/src/lxc/conf.h b/src/lxc/conf.h > > > index 940d493..090c5b3 100644 > > > --- a/src/lxc/conf.h > > > +++ b/src/lxc/conf.h > > > @@ -356,7 +356,9 @@ extern int lxc_clear_hooks(struct lxc_conf *c, const > > > char *key); > > > > > > struct cgroup_process_info; > > > extern int lxc_setup(const char *name, struct lxc_conf *lxc_conf, > > > - const char *lxcpath, struct cgroup_process_info > > > *cgroup_info); > > > + const char *lxcpath, > > > + struct cgroup_process_info *cgroup_info, > > > + void *data); > > > > > > extern void lxc_rename_phys_nics_on_shutdown(struct lxc_conf *conf); > > > > > > diff --git a/src/lxc/start.c b/src/lxc/start.c > > > index 3b2ba8f..3d2bc86 100644 > > > --- a/src/lxc/start.c > > > +++ b/src/lxc/start.c > > > @@ -553,7 +553,7 @@ static int do_start(void *data) > > > #endif > > > > > > /* Setup the container, ip, names, utsname, ... */ > > > - if (lxc_setup(handler->name, handler->conf, handler->lxcpath, > > > handler->cgroup)) { > > > + if (lxc_setup(handler->name, handler->conf, handler->lxcpath, > > > handler->cgroup, handler->data) ){ > > > ERROR("failed to setup the container"); > > > goto out_warn_father; > > > } > > > -- > > > 1.8.3.1 > > > > > > > > > -- > > > Michael H. Warfield (AI4NB) | (770) 978-7061 | m...@wittsend.com > > > /\/\|=mhw=|\/\/ | (678) 463-0932 | > > > http://www.wittsend.com/mhw/ > > > NIC whois: MHW9 | An optimist believes we live in the best of > > > all > > > PGP Key: 0x674627FF | possible worlds. A pessimist is sure of it! > > > > > > > > > > > > ------------------------------------------------------------------------------ > > > November Webinars for C, C++, Fortran Developers > > > Accelerate application performance with scalable programming models. > > > Explore > > > techniques for threading, error checking, porting, and tuning. Get the > > > most > > > from the latest Intel processors and coprocessors. See abstracts and > > > register > > > http://pubads.g.doubleclick.net/gampad/clk?id=60136231&iu=/4140/ostg.clktrk > > > > > _______________________________________________ > > > Lxc-devel mailing list > > > Lxc-devel@lists.sourceforge.net > > > https://lists.sourceforge.net/lists/listinfo/lxc-devel > > > > > > -- > Michael H. Warfield (AI4NB) | (770) 978-7061 | m...@wittsend.com > /\/\|=mhw=|\/\/ | (678) 463-0932 | http://www.wittsend.com/mhw/ > NIC whois: MHW9 | An optimist believes we live in the best of all > PGP Key: 0x674627FF | possible worlds. A pessimist is sure of it! > ------------------------------------------------------------------------------ Shape the Mobile Experience: Free Subscription Software experts and developers: Be at the forefront of tech innovation. Intel(R) Software Adrenaline delivers strategic insight and game-changing conversations that shape the rapidly evolving mobile landscape. Sign up now. http://pubads.g.doubleclick.net/gampad/clk?id=63431311&iu=/4140/ostg.clktrk _______________________________________________ Lxc-devel mailing list Lxc-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/lxc-devel