[lxc-devel] [PATCH 2/2] arguments: remove trailing slashes for the input lxcpath

2013-09-24 Thread Qiang Huang
In lxc_cmd(), we use
snprintf(path, len, "%s/%s/command", lxcpath ? lxcpath : inpath, name);
to fill in the socket name. This assumes that lxcpath has no trailing
slashes, so if we run
lxc-info -n test -P /usr/local/var/lib/lxc_anon/
to get a running container's state, we get "state: STOPPED", which is
wrong, because we have built the wrong socket name.

To fix this, just remove trailing slashes when parsing arguments.
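
The effect described above can be reproduced with a small sketch. It assumes only the format string quoted in the message; format_command_path() and trim_trailing_slashes() are hypothetical names used for illustration, and the trimming loop mirrors the one in patch 1/2.

```c
#include <stdio.h>
#include <string.h>

/* Same trimming logic as remove_trailing_slashes() from patch 1/2. */
static void trim_trailing_slashes(char *p)
{
	int l = strlen(p);

	while (--l >= 0 && (p[l] == '/' || p[l] == '\n'))
		p[l] = '\0';
}

/*
 * Hypothetical helper: builds the name with the same format string as
 * the snprintf() call quoted above.
 * Returns 0 on success, -1 if the result did not fit into the buffer.
 */
static int format_command_path(char *path, size_t len,
			       const char *lxcpath, const char *name)
{
	int ret = snprintf(path, len, "%s/%s/command", lxcpath, name);

	return (ret < 0 || (size_t)ret >= len) ? -1 : 0;
}
```

With "/usr/local/var/lib/lxc_anon/" as lxcpath and "test" as name, the helper produces "/usr/local/var/lib/lxc_anon//test/command". After trimming the input it produces "/usr/local/var/lib/lxc_anon/test/command". The two names differ as strings, which is presumably why the lookup of the running container's socket fails.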

Signed-off-by: Qiang Huang 
---
 src/lxc/arguments.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/lxc/arguments.c b/src/lxc/arguments.c
index c35dfd8..adcf8fe 100644
--- a/src/lxc/arguments.c
+++ b/src/lxc/arguments.c
@@ -197,6 +197,7 @@ extern int lxc_arguments_parse(struct lxc_arguments *args,
case 'l':   args->log_priority = optarg; break;
case 'q':   args->quiet = 1; break;
case 'P':
+   remove_trailing_slashes(optarg);
ret = lxc_arguments_lxcpath_add(args, optarg);
if (ret < 0)
return ret;
-- 
1.8.3


--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60133471&iu=/4140/ostg.clktrk
___
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel


[lxc-devel] [PATCH 1/2] utils: move remove_trailing_slashes to utils

2013-09-24 Thread Qiang Huang

Signed-off-by: Qiang Huang 
---
 src/lxc/lxccontainer.c | 7 ---
 src/lxc/utils.c| 7 +++
 src/lxc/utils.h| 1 +
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/lxc/lxccontainer.c b/src/lxc/lxccontainer.c
index ea5a84e..f5d41b3 100644
--- a/src/lxc/lxccontainer.c
+++ b/src/lxc/lxccontainer.c
@@ -69,13 +69,6 @@ static bool file_exists(char *f)
return stat(f, &statbuf) == 0;
 }

-static void remove_trailing_slashes(char *p)
-{
-   int l = strlen(p);
-   while (--l >= 0 && (p[l] == '/' || p[l] == '\n'))
-   p[l] = '\0';
-}
-
 /*
  * A few functions to help detect when a container creation failed.
  * If a container creation was killed partway through, then trying
diff --git a/src/lxc/utils.c b/src/lxc/utils.c
index a908b5c..dc94a3c 100644
--- a/src/lxc/utils.c
+++ b/src/lxc/utils.c
@@ -211,6 +211,13 @@ extern int mkdir_p(const char *dir, mode_t mode)
return 0;
 }

+extern void remove_trailing_slashes(char *p)
+{
+   int l = strlen(p);
+   while (--l >= 0 && (p[l] == '/' || p[l] == '\n'))
+   p[l] = '\0';
+}
+
 static char *copy_global_config_value(char *p)
 {
int len = strlen(p);
diff --git a/src/lxc/utils.h b/src/lxc/utils.h
index 55f98fa..87a914b 100644
--- a/src/lxc/utils.h
+++ b/src/lxc/utils.h
@@ -37,6 +37,7 @@ extern int lxc_rmdir_onedev(char *path);
 extern int lxc_setup_fs(void);
 extern int get_u16(unsigned short *val, const char *arg, int base);
 extern int mkdir_p(const char *dir, mode_t mode);
+extern void remove_trailing_slashes(char *p);
 extern const char *get_rundir(void);

 /*
-- 
1.8.3
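
One edge case of the moved function may be worth keeping in mind: for the input "/" it clears the only character and leaves an empty string. The following is a hypothetical variant, not part of this patch, that trims the same characters but never shortens a string below one character. The name trim_trailing_slashes_keep_root() is made up for this sketch.

```c
#include <string.h>

/*
 * Hypothetical variant, not part of the patch: trims trailing '/' and
 * '\n' like remove_trailing_slashes(), but stops at the first
 * character, so a lone "/" stays "/" instead of becoming "".
 * As a side effect any one-character string is left untouched.
 */
static void trim_trailing_slashes_keep_root(char *p)
{
	size_t l = strlen(p);

	while (l > 1 && (p[l - 1] == '/' || p[l - 1] == '\n'))
		p[--l] = '\0';
}
```

Whether the callers in lxc ever pass "/" is not something this thread establishes, so this is only an illustration of the trade-off.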




[lxc-devel] [PATCH] refactor AppArmor into LSM backend, add SELinux support

2013-09-24 Thread Dwight Engen
Currently, a maximum of one LSM within LXC will be initialized and
used. If in the future stacked LSMs become a reality, we can support it
without changing the configuration syntax and add support for more than
a single LSM at a time to the lsm code.

Generic LXC code should note that lsm_process_label_set() will take
effect "now" for AppArmor, and upon exec() for SELinux.

- fix Oracle template mounting of proc and sysfs, needed when using SELinux
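
The "at most one LSM" design can be pictured as a table of driver callbacks from which the first usable backend is selected. The sketch below is an illustration only: struct lsm_drv, its fields, lsm_init() and lsm_name() are assumptions, and the one-argument lsm_process_label_set() is a simplified stand-in, since the real lsm.h from this patch is not shown here. Only the nop backend is included so the sketch builds without AppArmor or SELinux libraries.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical driver table; the patch's real lsm.h is not shown here. */
struct lsm_drv {
	const char *name;
	int (*enabled)(void);
	int (*process_label_set)(const char *label);
};

/* The nop backend is always available and does nothing. */
static int nop_enabled(void)
{
	return 1;
}

static int nop_process_label_set(const char *label)
{
	(void)label;
	return 0;
}

static const struct lsm_drv nop_drv = {
	.name = "nop",
	.enabled = nop_enabled,
	.process_label_set = nop_process_label_set,
};

/* Candidate backends in priority order; real backends would precede nop. */
static const struct lsm_drv *const candidates[] = { &nop_drv };

static const struct lsm_drv *active;

/* Select at most one backend: the first that reports itself enabled. */
static void lsm_init(void)
{
	size_t i;

	if (active)
		return;
	for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
		if (candidates[i]->enabled()) {
			active = candidates[i];
			return;
		}
	}
}

static const char *lsm_name(void)
{
	lsm_init();
	return active ? active->name : "none";
}

/* Simplified signature; dispatches to whichever backend was selected. */
static int lsm_process_label_set(const char *label)
{
	lsm_init();
	return active ? active->process_label_set(label) : -1;
}
```

Generic code would call lsm_process_label_set() without knowing which backend is active, which is where the note above about "now" versus "upon exec()" matters to the caller.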

Signed-off-by: Dwight Engen 
---
 configure.ac|  14 
 doc/lxc.conf.sgml.in|  25 ++
 src/lxc/Makefile.am |  21 -
 src/lxc/apparmor.c  | 219 
 src/lxc/apparmor.h  |  56 -
 src/lxc/attach.c|  31 +++
 src/lxc/attach.h|   2 +-
 src/lxc/conf.c  |  43 +++---
 src/lxc/conf.h  |  12 +--
 src/lxc/confile.c   |  56 -
 src/lxc/lsm/apparmor.c  | 168 +
 src/lxc/lsm/lsm.c   | 156 ++
 src/lxc/lsm/lsm.h   |  52 
 src/lxc/lsm/nop.c   |  46 ++
 src/lxc/lsm/selinux.c   | 101 ++
 src/lxc/start.c |  15 +++-
 src/lxc/start.h |   3 -
 templates/lxc-oracle.in |   9 +-
 18 files changed, 663 insertions(+), 366 deletions(-)
 delete mode 100644 src/lxc/apparmor.c
 delete mode 100644 src/lxc/apparmor.h
 create mode 100644 src/lxc/lsm/apparmor.c
 create mode 100644 src/lxc/lsm/lsm.c
 create mode 100644 src/lxc/lsm/lsm.h
 create mode 100644 src/lxc/lsm/nop.c
 create mode 100644 src/lxc/lsm/selinux.c

diff --git a/configure.ac b/configure.ac
index cffbdac..9d77bb5 100644
--- a/configure.ac
+++ b/configure.ac
@@ -115,6 +115,20 @@ AM_COND_IF([ENABLE_APPARMOR],
AC_CHECK_LIB([apparmor], [aa_change_profile],[],[AC_MSG_ERROR([You must install the AppArmor development package in order to compile lxc])])
AC_SUBST([APPARMOR_LIBS], [-lapparmor])])
 
+# SELinux
+AC_ARG_ENABLE([selinux],
+   [AC_HELP_STRING([--enable-selinux], [enable SELinux support])],
+   [], [enable_selinux=check])
+
+if test "x$enable_selinux" = xcheck; then
+   AC_CHECK_LIB([selinux],[setexeccon_raw],[enable_selinux=yes],[enable_selinux=no])
+fi
+AM_CONDITIONAL([ENABLE_SELINUX], [test "x$enable_selinux" = "xyes"])
+AM_COND_IF([ENABLE_SELINUX],
+   [AC_CHECK_HEADER([selinux/selinux.h],[],[AC_MSG_ERROR([You must install the SELinux development package in order to compile lxc])])
+   AC_CHECK_LIB([selinux], [setexeccon_raw],[],[AC_MSG_ERROR([You must install the SELinux development package in order to compile lxc])])
+   AC_SUBST([SELINUX_LIBS])])
+
 # Seccomp syscall filter
 AC_ARG_ENABLE([seccomp],
[AC_HELP_STRING([--enable-seccomp], [enable seccomp])],
diff --git a/doc/lxc.conf.sgml.in b/doc/lxc.conf.sgml.in
index dc416e8..bad553c 100644
--- a/doc/lxc.conf.sgml.in
+++ b/doc/lxc.conf.sgml.in
@@ -811,6 +811,31 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 
 
 
+  SELinux context
+  
+   If lxc was compiled and installed with SELinux support, and the host
+   system has SELinux enabled, then the SELinux context under which the
+   container should be run can be specified in the container
+   configuration.  The default is unconfined_t,
+   which means that lxc will not attempt to change contexts.
+  
+  
+   
+ 
+   lxc.se_context
+ 
+ 
+   
+ Specify the SELinux context under which the container should
+ be run or unconfined_t. For example
+   
+   lxc.se_context = unconfined_u:unconfined_r:lxc_t:s0-s0:c0.c1023
+ 
+   
+  
+
+
+
   Seccomp configuration
   
 A container can be started with a reduced set of available
diff --git a/src/lxc/Makefile.am b/src/lxc/Makefile.am
index f19a994..873b97d 100644
--- a/src/lxc/Makefile.am
+++ b/src/lxc/Makefile.am
@@ -37,6 +37,18 @@ sodir=$(libdir)
 # use PROGRAMS to avoid complains from automake
 so_PROGRAMS = liblxc.so
 
+LSM_SOURCES = \
+   lsm/nop.c \
+   lsm/lsm.h lsm/lsm.c
+
+if ENABLE_APPARMOR
+LSM_SOURCES += lsm/apparmor.c
+endif
+
+if ENABLE_SELINUX
+LSM_SOURCES += lsm/selinux.c
+endif
+
 liblxc_so_SOURCES = \
arguments.c arguments.h \
bdev.c bdev.h \
@@ -73,10 +85,11 @@ liblxc_so_SOURCES = \
af_unix.c af_unix.h \
\
lxcutmp.c lxcutmp.h \
-   apparmor.c apparmor.h \
lxclock.h lxclock.c \
lxccontainer.c lxccontainer.h \
-   version.c version.h
+   version.c version.h \
+   \
+   $(LSM_SOURCES)
 
 if IS_BIONIC
 liblxc_so_SOURCES += \
@@ -107,6 +120,10 @@ if ENABLE_APPARMOR
 AM_CFLAGS += -DHAVE_APPARMOR
 endif
 
+if ENABLE_SELINUX
+AM_CFLAGS += -DHAVE_SELINUX
+endif
+
 if HAVE_NEWUIDMAP
 AM_CFLAGS += -DHAVE_NEWUIDMAP
 endif
diff --git a/src/lxc/apparmor.c b/src/lxc/apparm

Re: [lxc-devel] [PATCH 1/2] utils: move remove_trailing_slashes to utils

2013-09-24 Thread Serge Hallyn
Quoting Qiang Huang (h.huangqi...@huawei.com):
> 
> Signed-off-by: Qiang Huang 

Acked-by: Serge E. Hallyn 

> ---
>  src/lxc/lxccontainer.c | 7 ---
>  src/lxc/utils.c| 7 +++
>  src/lxc/utils.h| 1 +
>  3 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/src/lxc/lxccontainer.c b/src/lxc/lxccontainer.c
> index ea5a84e..f5d41b3 100644
> --- a/src/lxc/lxccontainer.c
> +++ b/src/lxc/lxccontainer.c
> @@ -69,13 +69,6 @@ static bool file_exists(char *f)
>   return stat(f, &statbuf) == 0;
>  }
> 
> -static void remove_trailing_slashes(char *p)
> -{
> - int l = strlen(p);
> - while (--l >= 0 && (p[l] == '/' || p[l] == '\n'))
> - p[l] = '\0';
> -}
> -
>  /*
>   * A few functions to help detect when a container creation failed.
>   * If a container creation was killed partway through, then trying
> diff --git a/src/lxc/utils.c b/src/lxc/utils.c
> index a908b5c..dc94a3c 100644
> --- a/src/lxc/utils.c
> +++ b/src/lxc/utils.c
> @@ -211,6 +211,13 @@ extern int mkdir_p(const char *dir, mode_t mode)
>   return 0;
>  }
> 
> +extern void remove_trailing_slashes(char *p)
> +{
> + int l = strlen(p);
> + while (--l >= 0 && (p[l] == '/' || p[l] == '\n'))
> + p[l] = '\0';
> +}
> +
>  static char *copy_global_config_value(char *p)
>  {
>   int len = strlen(p);
> diff --git a/src/lxc/utils.h b/src/lxc/utils.h
> index 55f98fa..87a914b 100644
> --- a/src/lxc/utils.h
> +++ b/src/lxc/utils.h
> @@ -37,6 +37,7 @@ extern int lxc_rmdir_onedev(char *path);
>  extern int lxc_setup_fs(void);
>  extern int get_u16(unsigned short *val, const char *arg, int base);
>  extern int mkdir_p(const char *dir, mode_t mode);
> +extern void remove_trailing_slashes(char *p);
>  extern const char *get_rundir(void);
> 
>  /*
> -- 
> 1.8.3
> 



Re: [lxc-devel] [PATCH 2/2] arguments: remove trailing slashes for the input lxcpath

2013-09-24 Thread Serge Hallyn
Quoting Qiang Huang (h.huangqi...@huawei.com):
> In lxc_cmd(), we use
> snprintf(path, len, "%s/%s/command", lxcpath ? lxcpath : inpath, name);
> to fill in the socket name. This assumes that lxcpath has no trailing
> slashes, so if we run
> lxc-info -n test -P /usr/local/var/lib/lxc_anon/
> to get a running container's state, we get "state: STOPPED", which is
> wrong, because we have built the wrong socket name.
> 
> To fix this, just remove trailing slashes when parsing arguments.
> 
> Signed-off-by: Qiang Huang 

Acked-by: Serge E. Hallyn 

> ---
>  src/lxc/arguments.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/lxc/arguments.c b/src/lxc/arguments.c
> index c35dfd8..adcf8fe 100644
> --- a/src/lxc/arguments.c
> +++ b/src/lxc/arguments.c
> @@ -197,6 +197,7 @@ extern int lxc_arguments_parse(struct lxc_arguments *args,
>   case 'l':   args->log_priority = optarg; break;
>   case 'q':   args->quiet = 1; break;
>   case 'P':
> + remove_trailing_slashes(optarg);
>   ret = lxc_arguments_lxcpath_add(args, optarg);
>   if (ret < 0)
>   return ret;
> -- 
> 1.8.3
> 



[lxc-devel] [lxc/lxc] e55500: arguments: remove trailing slashes for the input l...

2013-09-24 Thread GitHub
  Branch: refs/heads/master
  Home:   https://github.com/lxc/lxc
  Commit: e555005b15a1d8e95997bd2d72abd0bc230a541d
  https://github.com/lxc/lxc/commit/e555005b15a1d8e95997bd2d72abd0bc230a541d
  Author: Qiang Huang 
  Date:   2013-09-24 (Tue, 24 Sep 2013)

  Changed paths:
M src/lxc/arguments.c

  Log Message:
  ---
  arguments: remove trailing slashes for the input lxcpath

In lxc_cmd(), we use
snprintf(path, len, "%s/%s/command", lxcpath ? lxcpath : inpath, name);
to fill in the socket name. This assumes that lxcpath has no trailing
slashes, so if we run
lxc-info -n test -P /usr/local/var/lib/lxc_anon/
to get a running container's state, we get "state: STOPPED", which is
wrong, because we have built the wrong socket name.

To fix this, just remove trailing slashes when parsing arguments.

Signed-off-by: Qiang Huang 
Acked-by: Serge Hallyn 
Signed-off-by: Serge Hallyn 


  Commit: 89cd77934835d1f04edd17c718d77591974e01f5
  https://github.com/lxc/lxc/commit/89cd77934835d1f04edd17c718d77591974e01f5
  Author: Qiang Huang 
  Date:   2013-09-24 (Tue, 24 Sep 2013)

  Changed paths:
M src/lxc/lxccontainer.c
M src/lxc/utils.c
M src/lxc/utils.h

  Log Message:
  ---
  utils: move remove_trailing_slashes to utils

Signed-off-by: Qiang Huang 
Acked-by: Serge Hallyn 
Signed-off-by: Serge Hallyn 


Compare: https://github.com/lxc/lxc/compare/9d0cda4f22f7...89cd77934835


Re: [lxc-devel] [RFC] rootfs pinning

2013-09-24 Thread Serge Hallyn
Quoting Rob Landley (r...@landley.net):
> On 09/23/2013 11:19:17 AM, Serge Hallyn wrote:
> >Quoting Rob Landley (r...@landley.net):
> >> On 09/12/2013 01:27:07 PM, Christian Seiler wrote:
> >> > Hi there,
> >> >
> >> > just a quick question: currently, rootfs is pinned with a
> >.hold file
> >> > in
> >> > the parent directory (which btw. does not help against file
> >systems
> >> > that
> >> > are already mounted on the host but directly in the rootfs
> >directory).
> >> > The problem with the .hold file is that it doesn't make the
> >directory
> >> > necessarily pretty; I tend to mount all rootfs to
> >/srv/lxc/$container
> >> > (config remaining in /var/lib/lxc), and then when doing a ls
> >> > /srv/lxc, I
> >> > see tons of .hold files. (I'm not even sure that they are removed
> >> > after
> >> > container termination - but even if they are, the default
> >state of a
> >> > typical system tends to be that at least some containers are
> >> > running...)
> >> >
> >> > Couldn't we just open $rootfs/lxc.hold for writing, keep the
> >fd (as
> >> > current pinfd) and then unlink (!) the file directly? According to
> >> > POSIX
> >> > semantics, the file is then still open and the pinning should work
> >> > (now
> >> > also for the above case), but there are no files lying around
> >anymore.
> >> > (Note: I didn't test that, it could well be that that doesn't
> >work.)
> >> >
> >> > Thoughts?
> >>
> >> Why doesn't keeping a file open to the directory itself work? (I'm
> >> assuming it doesn't, I'm wondering why.)
> >
> >Tried it under tmpfs, and open("/mnt", O_RDWR) with tmpfs mounted
> >at /mnt does not work, gives EISDIR.  O_RDONLY does work, but that
> >doesn't prevent mount -o remount,ro.
> 
> The filesystem hitting an error (including one from the block
> device) can make most filesystems remount themselves read only,
> forcibly even with active writers. The permissions to do so from
> userspace should be roughly analogous to calling shutdown or "kill
> -1"? (I'm wondering what lxc's interest is in preventing the
> container-local root from doing something container-local
> dangerous?)

Some people have a block device mounted at /var/lib/lxc, and
keep all their containers and rootfs' there.

If they start a single container and shut it down, most distros
during shutdown will mount -o remount,ro /, which will end up
remounting /var/lib/lxc ro.  Now other containers can't start up.
So it's not actually container-local dangerous.

Now, it's possible that we should just make sure that any
directory-backed (or btrfs-backed) containers always bind-mount
$rootfs onto itself.  That might work and might be a cleaner
solution.

-serge



Re: [lxc-devel] [RFC] rootfs pinning

2013-09-24 Thread Stéphane Graber
On Tue, Sep 24, 2013 at 09:41:04AM -0500, Serge Hallyn wrote:
> Quoting Rob Landley (r...@landley.net):
> > On 09/23/2013 11:19:17 AM, Serge Hallyn wrote:
> > >Quoting Rob Landley (r...@landley.net):
> > >> On 09/12/2013 01:27:07 PM, Christian Seiler wrote:
> > >> > Hi there,
> > >> >
> > >> > just a quick question: currently, rootfs is pinned with a
> > >.hold file
> > >> > in
> > >> > the parent directory (which btw. does not help against file
> > >systems
> > >> > that
> > >> > are already mounted on the host but directly in the rootfs
> > >directory).
> > >> > The problem with the .hold file is that it doesn't make the
> > >directory
> > >> > necessarily pretty; I tend to mount all rootfs to
> > >/srv/lxc/$container
> > >> > (config remaining in /var/lib/lxc), and then when doing a ls
> > >> > /srv/lxc, I
> > >> > see tons of .hold files. (I'm not even sure that they are removed
> > >> > after
> > >> > container termination - but even if they are, the default
> > >state of a
> > >> > typical system tends to be that at least some containers are
> > >> > running...)
> > >> >
> > >> > Couldn't we just open $rootfs/lxc.hold for writing, keep the
> > >fd (as
> > >> > current pinfd) and then unlink (!) the file directly? According to
> > >> > POSIX
> > >> > semantics, the file is then still open and the pinning should work
> > >> > (now
> > >> > also for the above case), but there are no files lying around
> > >anymore.
> > >> > (Note: I didn't test that, it could well be that that doesn't
> > >work.)
> > >> >
> > >> > Thoughts?
> > >>
> > >> Why doesn't keeping a file open to the directory itself work? (I'm
> > >> assuming it doesn't, I'm wondering why.)
> > >
> > >Tried it under tmpfs, and open("/mnt", O_RDWR) with tmpfs mounted
> > >at /mnt does not work, gives EISDIR.  O_RDONLY does work, but that
> > >doesn't prevent mount -o remount,ro.
> > 
> > The filesystem hitting an error (including one from the block
> > device) can make most filesystems remount themselves read only,
> > forcibly even with active writers. The permissions to do so from
> > userspace should be roughly analogous to calling shutdown or "kill
> > -1"? (I'm wondering what lxc's interest is in preventing the
> > container-local root from doing something container-local
> > dangerous?)
> 
> Some people have a block device mounted at /var/lib/lxc, and
> keep all their containers and rootfs' there.
> 
> If they start a single container and shut it down, most distros
> during shutdown will mount -o remount,ro /, which will end up
> remounting /var/lib/lxc ro.  Now other containers can't start up.
> So it's not actually container-local dangerous.
> 
> Now, it's possible that we should just make sure that any
> directory-backed (or btrfs-backed) containers always bind-mount
> $rootfs onto itself.  That might work and might be a cleaner
> solution.
> 
> -serge

Yep, we discussed this at Plumbers and I think it's really the way to
go, basically remove all of that fs pinning code and just do a
bind-mount of the rootfs on itself in the container's mountns before
starting it.

That way if the container decides to remount / ro at any point, it'll
succeed and will give the user a read-only / but without affecting the
outside world.
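
The thread does not show the actual call, so the following is only a minimal sketch of a self bind-mount using mount(2). The function name is hypothetical, and MS_REC is an assumption made here so that submounts under the rootfs are carried along. The sketch does not create the mount namespace; it is meant to run after the container's mountns already exists.

```c
#include <stddef.h>
#include <sys/mount.h>

/*
 * Sketch: bind-mount rootfs onto itself. Intended to be called inside
 * the container's own mount namespace, which this sketch does not set
 * up. Requires CAP_SYS_ADMIN.
 * MS_REC is an assumption, not something the thread specifies.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int bind_mount_rootfs_on_itself(const char *rootfs)
{
	return mount(rootfs, rootfs, NULL, MS_BIND | MS_REC, NULL);
}
```

Without sufficient privileges or with a missing path the call simply fails and returns -1, so a caller can fall back or abort the container start.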

-- 
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com




[lxc-devel] [PATCH v2 rebased against github master 9d0cda4f] refactor AppArmor into LSM backend, add SELinux support

2013-09-24 Thread Dwight Engen
Currently, a maximum of one LSM within LXC will be initialized and
used. If in the future stacked LSMs become a reality, we can support it
without changing the configuration syntax and add support for more than
a single LSM at a time to the lsm code.

Generic LXC code should note that lsm_process_label_set() will take
effect "now" for AppArmor, and upon exec() for SELinux.

- fix Oracle template mounting of proc and sysfs, needed when using SELinux

Signed-off-by: Dwight Engen 
---
 configure.ac|  14 +++
 doc/lxc.conf.sgml.in|  25 ++
 src/lxc/Makefile.am |  21 -
 src/lxc/apparmor.c  | 230 
 src/lxc/apparmor.h  |  56 
 src/lxc/attach.c|  31 +++
 src/lxc/attach.h|   2 +-
 src/lxc/conf.c  |  43 +++--
 src/lxc/conf.h  |  12 +--
 src/lxc/confile.c   |  56 +++-
 src/lxc/lsm/apparmor.c  | 179 +
 src/lxc/lsm/lsm.c   | 156 
 src/lxc/lsm/lsm.h   |  52 +++
 src/lxc/lsm/nop.c   |  46 ++
 src/lxc/lsm/selinux.c   | 101 +
 src/lxc/start.c |  15 +++-
 src/lxc/start.h |   3 -
 templates/lxc-oracle.in |   9 +-
 18 files changed, 674 insertions(+), 377 deletions(-)
 delete mode 100644 src/lxc/apparmor.c
 delete mode 100644 src/lxc/apparmor.h
 create mode 100644 src/lxc/lsm/apparmor.c
 create mode 100644 src/lxc/lsm/lsm.c
 create mode 100644 src/lxc/lsm/lsm.h
 create mode 100644 src/lxc/lsm/nop.c
 create mode 100644 src/lxc/lsm/selinux.c

diff --git a/configure.ac b/configure.ac
index adc4e8a..1a5c8aa 100644
--- a/configure.ac
+++ b/configure.ac
@@ -116,6 +116,20 @@ AM_COND_IF([ENABLE_APPARMOR],
AC_CHECK_LIB([apparmor], [aa_change_profile],[],[AC_MSG_ERROR([You must install the AppArmor development package in order to compile lxc])])
AC_SUBST([APPARMOR_LIBS], [-lapparmor])])
 
+# SELinux
+AC_ARG_ENABLE([selinux],
+   [AC_HELP_STRING([--enable-selinux], [enable SELinux support])],
+   [], [enable_selinux=check])
+
+if test "x$enable_selinux" = xcheck; then
+   AC_CHECK_LIB([selinux],[setexeccon_raw],[enable_selinux=yes],[enable_selinux=no])
+fi
+AM_CONDITIONAL([ENABLE_SELINUX], [test "x$enable_selinux" = "xyes"])
+AM_COND_IF([ENABLE_SELINUX],
+   [AC_CHECK_HEADER([selinux/selinux.h],[],[AC_MSG_ERROR([You must install the SELinux development package in order to compile lxc])])
+   AC_CHECK_LIB([selinux], [setexeccon_raw],[],[AC_MSG_ERROR([You must install the SELinux development package in order to compile lxc])])
+   AC_SUBST([SELINUX_LIBS])])
+
 # Seccomp syscall filter
 AC_ARG_ENABLE([seccomp],
[AC_HELP_STRING([--enable-seccomp], [enable seccomp])],
diff --git a/doc/lxc.conf.sgml.in b/doc/lxc.conf.sgml.in
index dc416e8..bad553c 100644
--- a/doc/lxc.conf.sgml.in
+++ b/doc/lxc.conf.sgml.in
@@ -811,6 +811,31 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 
 
 
+  SELinux context
+  
+   If lxc was compiled and installed with SELinux support, and the host
+   system has SELinux enabled, then the SELinux context under which the
+   container should be run can be specified in the container
+   configuration.  The default is unconfined_t,
+   which means that lxc will not attempt to change contexts.
+  
+  
+   
+ 
+   lxc.se_context
+ 
+ 
+   
+ Specify the SELinux context under which the container should
+ be run or unconfined_t. For example
+   
+   lxc.se_context = unconfined_u:unconfined_r:lxc_t:s0-s0:c0.c1023
+ 
+   
+  
+
+
+
   Seccomp configuration
   
 A container can be started with a reduced set of available
diff --git a/src/lxc/Makefile.am b/src/lxc/Makefile.am
index f19a994..873b97d 100644
--- a/src/lxc/Makefile.am
+++ b/src/lxc/Makefile.am
@@ -37,6 +37,18 @@ sodir=$(libdir)
 # use PROGRAMS to avoid complains from automake
 so_PROGRAMS = liblxc.so
 
+LSM_SOURCES = \
+   lsm/nop.c \
+   lsm/lsm.h lsm/lsm.c
+
+if ENABLE_APPARMOR
+LSM_SOURCES += lsm/apparmor.c
+endif
+
+if ENABLE_SELINUX
+LSM_SOURCES += lsm/selinux.c
+endif
+
 liblxc_so_SOURCES = \
arguments.c arguments.h \
bdev.c bdev.h \
@@ -73,10 +85,11 @@ liblxc_so_SOURCES = \
af_unix.c af_unix.h \
\
lxcutmp.c lxcutmp.h \
-   apparmor.c apparmor.h \
lxclock.h lxclock.c \
lxccontainer.c lxccontainer.h \
-   version.c version.h
+   version.c version.h \
+   \
+   $(LSM_SOURCES)
 
 if IS_BIONIC
 liblxc_so_SOURCES += \
@@ -107,6 +120,10 @@ if ENABLE_APPARMOR
 AM_CFLAGS += -DHAVE_APPARMOR
 endif
 
+if ENABLE_SELINUX
+AM_CFLAGS += -DHAVE_SELINUX
+endif
+
 if HAVE_NEWUIDMAP
 AM_CFLAGS += -DHAVE_NEWUIDMAP
 endif
diff --git a/src/lxc/apparmor.c b/src/lxc/apparmor.c
del

Re: [lxc-devel] [RFC] rootfs pinning

2013-09-24 Thread Michael H. Warfield
On Tue, 2013-09-24 at 10:45 -0400, Stéphane Graber wrote: 
> On Tue, Sep 24, 2013 at 09:41:04AM -0500, Serge Hallyn wrote:
> > Quoting Rob Landley (r...@landley.net):
> > > On 09/23/2013 11:19:17 AM, Serge Hallyn wrote:
> > > >Quoting Rob Landley (r...@landley.net):
> > > >> On 09/12/2013 01:27:07 PM, Christian Seiler wrote:
> > > >> > Hi there,
> > > >> >
> > > >> > just a quick question: currently, rootfs is pinned with a
> > > >.hold file
> > > >> > in
> > > >> > the parent directory (which btw. does not help against file
> > > >systems
> > > >> > that
> > > >> > are already mounted on the host but directly in the rootfs
> > > >directory).
> > > >> > The problem with the .hold file is that it doesn't make the
> > > >directory
> > > >> > necessarily pretty; I tend to mount all rootfs to
> > > >/srv/lxc/$container
> > > >> > (config remaining in /var/lib/lxc), and then when doing a ls
> > > >> > /srv/lxc, I
> > > >> > see tons of .hold files. (I'm not even sure that they are removed
> > > >> > after
> > > >> > container termination - but even if they are, the default
> > > >state of a
> > > >> > typical system tends to be that at least some containers are
> > > >> > running...)
> > > >> >
> > > >> > Couldn't we just open $rootfs/lxc.hold for writing, keep the
> > > >fd (as
> > > >> > current pinfd) and then unlink (!) the file directly? According to
> > > >> > POSIX
> > > >> > semantics, the file is then still open and the pinning should work
> > > >> > (now
> > > >> > also for the above case), but there are no files lying around
> > > >anymore.
> > > >> > (Note: I didn't test that, it could well be that that doesn't
> > > >work.)
> > > >> >
> > > >> > Thoughts?
> > > >>
> > > >> Why doesn't keeping a file open to the directory itself work? (I'm
> > > >> assuming it doesn't, I'm wondering why.)
> > > >
> > > >Tried it under tmpfs, and open("/mnt", O_RDWR) with tmpfs mounted
> > > >at /mnt does not work, gives EISDIR.  O_RDONLY does work, but that
> > > >doesn't prevent mount -o remount,ro.
> > > 
> > > The filesystem hitting an error (including one from the block
> > > device) can make most filesystems remount themselves read only,
> > > forcibly even with active writers. The permissions to do so from
> > > userspace should be roughly analogous to calling shutdown or "kill
> > > -1"? (I'm wondering what lxc's interest is in preventing the
> > > container-local root from doing something container-local
> > > dangerous?)
> > 
> > Some people have a block device mounted at /var/lib/lxc, and
> > keep all their containers and rootfs' there.
> > 
> > If they start a single container and shut it down, most distros
> > during shutdown will mount -o remount,ro /, which will end up
> > remounting /var/lib/lxc ro.  Now other containers can't start up.
> > So it's not actually container-local dangerous.
> > 
> > Now, it's possible that we should just make sure that any
> > directory-backed (or btrfs-backed) containers always bind-mount
> > $rootfs onto itself.  That might work and might be a cleaner
> > solution.
> > 
> > -serge

> Yep, we discussed this at Plumbers and I think it's really the way to
> go, basically remove all of that fs pinning code and just do a
> bind-mount of the rootfs on itself in the container's mountns before
> starting it.

> That way if the container decides to remount / ro at any point, it'll
> succeed and will give the user a read-only / but without affecting the
> outside world.

Ideally, I think that's the way to go and I used to do that manually when
setting up my containers but I was thinking there was some breakage
between that and the way we were working around the pivot_root problem
introduced by systemd (Fedora, Suse, Arch, et al).  If we can verify
that works with all the init flavors without breaking, that could be
part of the general cleanup of the mount tables in the containers as
well, maybe...


Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  m...@wittsend.com
   /\/\|=mhw=|\/\/  | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9  | An optimist believes we live in the best of all
 PGP Key: 0x674627FF| possible worlds.  A pessimist is sure of it!



[lxc-devel] [PATCH 4/4] Automatic mounting: document options in lxc.conf(5) manpage

2013-09-24 Thread Christian Seiler
Signed-off-by: Christian Seiler 
---
 doc/lxc.conf.sgml.in |   99 ++
 1 file changed, 99 insertions(+)

diff --git a/doc/lxc.conf.sgml.in b/doc/lxc.conf.sgml.in
index dc416e8..d904b56 100644
--- a/doc/lxc.conf.sgml.in
+++ b/doc/lxc.conf.sgml.in
@@ -656,6 +656,105 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  

 
+   
+ 
+   lxc.mount.auto
+ 
+ 
+   
+ specify which standard kernel file systems should be
+ automatically mounted. This may dramatically simplify
+ the configuration. The file systems are:
+   
+   
+ 
+   proc:mixed (or proc):
+   mount /proc as read-write, but
+   remount /proc/sys and
+   /proc/sysrq-trigger read-only
+   for security / container isolation purposes.
+ 
+ 
+   proc:rw: mount
+   /proc as read-write
+ 
+ 
+   sys:ro (or sys):
+   mount /sys as read-only
+   for security / container isolation purposes.
+ 
+ 
+   sys:rw: mount
+   /sys as read-write
+ 
+ 
+   cgroup:mixed (or
+   cgroup):
+   mount a tmpfs to /sys/fs/cgroup,
+   create directories for all hierarchies to which
+   the container is added, create subdirectories
+   there with the name of the cgroup, and bind-mount
+   the container's own cgroup into that directory.
+   The container will be able to write to its own
+   cgroup directory, but not the parents, since they
+   will be remounted read-only
+ 
+ 
+   cgroup:ro: similar to
+   cgroup:mixed, but everything will
+   be mounted read-only.
+ 
+ 
+   cgroup:rw: similar to
+   cgroup:mixed, but everything will
+   be mounted read-write. Note that the paths leading
+   up to the container's own cgroup will be writable,
+   but will not be a cgroup filesystem but just part
+   of the tmpfs of /sys/fs/cgroup
+ 
+ 
+   cgroup-full:mixed (or
+   cgroup-full):
+   mount a tmpfs to /sys/fs/cgroup,
+   create directories for all hierarchies to which
+   the container is added, bind-mount the hierarchies
+   from the host to the container and make everything
+   read-only except the container's own cgroup. Note
+   that compared to cgroup, where
+   all paths leading up to the container's own cgroup
+   are just simple directories in the underlying
+   tmpfs, here
+   /sys/fs/cgroup/$hierarchy
+   will contain the host's full cgroup hierarchy,
+   albeit read-only outside the container's own cgroup.
+   This may leak quite a bit of information into the
+   container.
+ 
+ 
+   cgroup-full:ro: similar to
+   cgroup-full:mixed, but everything
+   will be mounted read-only.
+ 
+ 
+   cgroup-full:rw: similar to
+   cgroup-full:mixed, but everything
+   will be mounted read-write. Note that in this case,
+   the container may escape its own cgroup. (Note also
+   that if the container has CAP_SYS_ADMIN support
+   and can mount the cgroup filesystem itself, it may
+   do so anyway.)
+ 
+   
+   
+ Examples:
+   
+   
+ lxc.mount.auto = proc sys cgroup
+ lxc.mount.auto = proc:rw sys:rw cgroup-full:rw
+   
+ 
+   
+
   
 
 
-- 
1.7.10.4




[lxc-devel] [PATCH 0/4] Automatic mounting improvements

2013-09-24 Thread Christian Seiler
Hi there,

I've attached the automatic mounting improvements that were discussed
in the thread


The patches are against current github master and I've also pushed them
to my github:


The gist of it:

 - manpage updates
 - allow /sys etc. to be remounted read-write
 - allow for the full cgroup tree to be mounted (in mixed-mode, ro and
   rw)
 - write_config now honors lxc.mount.auto

-- Christian




[lxc-devel] [PATCH 3/4] Automatic mounting: add more ways to mount the cgroup filesystem

2013-09-24 Thread Christian Seiler
This adds quite a few more ways to mount the cgroup filesystem
automatically:

 - Specify ro/rw/mixed:
   - ro: everything mounted read-only
   - rw: everything mounted read-write
   - mixed: only container's own cgroup is rw, rest ro
            (default)
 - Add cgroup-full that mounts the entire cgroup tree to the
   corresponding directories. ro/rw/mixed also apply here.
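
As a rough illustration of how the six modes differ, the sketch below decodes a mode into the three decisions the mount code has to make. The enum values, `struct cg_plan` and `cg_plan_for()` are hypothetical names made up for this note; the patch itself works with the LXC_AUTO_CGROUP_* constants directly.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the LXC_AUTO_CGROUP_* constants. */
enum cg_mode {
	CG_RO, CG_RW, CG_MIXED,
	CG_FULL_RO, CG_FULL_RW, CG_FULL_MIXED
};

/* The decisions that follow from a mode (illustrative only). */
struct cg_plan {
	bool bind_host_tree;   /* cgroup-full: bind-mount the host hierarchy */
	bool parents_readonly; /* everything outside the own cgroup is ro */
	bool own_writable;     /* the container's own cgroup stays rw */
};

static struct cg_plan cg_plan_for(enum cg_mode m)
{
	struct cg_plan p;

	p.bind_host_tree = (m == CG_FULL_RO || m == CG_FULL_RW || m == CG_FULL_MIXED);
	p.parents_readonly = (m != CG_RW && m != CG_FULL_RW);
	p.own_writable = (m != CG_RO && m != CG_FULL_RO);
	return p;
}
```

With this reading, "mixed" is the only variant where the parents are read-only while the container's own cgroup remains writable.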

Signed-off-by: Christian Seiler 
---
 src/lxc/cgroup.c |   87 +++---
 src/lxc/cgroup.h |2 +-
 src/lxc/conf.c   |2 +-
 3 files changed, 71 insertions(+), 20 deletions(-)

diff --git a/src/lxc/cgroup.c b/src/lxc/cgroup.c
index 1270b8a..6025a4d 100644
--- a/src/lxc/cgroup.c
+++ b/src/lxc/cgroup.c
@@ -1215,7 +1215,7 @@ int lxc_setup_cgroup_devices(struct lxc_handler *h, struct lxc_list *cgroup_sett
 	return do_setup_cgroup(h, cgroup_settings, true);
 }
 
-int lxc_setup_mount_cgroup(const char *root, struct cgroup_process_info *base_info)
+int lxc_setup_mount_cgroup(const char *root, struct cgroup_process_info *base_info, int type)
 {
 	size_t bufsz = strlen(root) + sizeof("/sys/fs/cgroup");
 	char *path = NULL;
@@ -1226,6 +1226,12 @@ int lxc_setup_mount_cgroup(const char *root, struct cgroup_process_info *base_in
 	struct cgroup_process_info *info;
 	int r, saved_errno = 0;
 
+	if (type < LXC_AUTO_CGROUP_RO || type > LXC_AUTO_CGROUP_FULL_MIXED) {
+		ERROR("could not mount cgroups into container: invalid type specified internally");
+		errno = EINVAL;
+		return -1;
+	}
+
 	path = calloc(1, bufsz);
 	if (!path)
 		return -1;
@@ -1272,27 +1278,71 @@ int lxc_setup_mount_cgroup(const char *root, struct cgroup_process_info *base_in
 			goto out_error;
 		}
 
-		/* create path for container's cgroup */
 		abs_path2 = lxc_append_paths(abs_path, info->cgroup_path);
 		if (!abs_path2)
 			goto out_error;
-		r = mkdir_p(abs_path2, 0755);
-		if (r < 0 && errno != EEXIST) {
-			SYSERROR("could not create cgroup directory /sys/fs/cgroup/%s%s", dirname, info->cgroup_path);
-			goto out_error;
-		}
 
-		free(abs_path);
-		abs_path = NULL;
+		if (type == LXC_AUTO_CGROUP_FULL_RO || type == LXC_AUTO_CGROUP_FULL_RW || type == LXC_AUTO_CGROUP_FULL_MIXED) {
+			/* bind-mount the cgroup entire filesystem there */
+			if (strcmp(mp->mount_prefix, "/") != 0) {
+				/* FIXME: maybe we should just try to remount the entire hierarchy
+				 *        with a regular mount command? may that works? */
+				ERROR("could not automatically mount cgroup-full to /sys/fs/cgroup/%s: host has no mount point for this cgroup filesystem that has access to the root cgroup", dirname);
+				goto out_error;
+			}
+			r = mount(mp->mount_point, abs_path, "none", MS_BIND, 0);
+			if (r < 0) {
+				SYSERROR("error bind-mounting %s to %s", mp->mount_point, abs_path);
+				goto out_error;
+			}
+			/* main cgroup path should be read-only */
+			if (type == LXC_AUTO_CGROUP_FULL_RO || type == LXC_AUTO_CGROUP_FULL_MIXED) {
+				r = mount(NULL, abs_path, NULL, MS_REMOUNT|MS_BIND|MS_RDONLY, NULL);
+				if (r < 0) {
+					SYSERROR("error re-mounting %s readonly", abs_path);
+					goto out_error;
+				}
+			}
+			/* own cgroup should be read-write */
+			if (type == LXC_AUTO_CGROUP_FULL_MIXED) {
+				r = mount(abs_path2, abs_path2, NULL, MS_BIND, NULL);
+				if (r < 0) {
+					SYSERROR("error bind-mounting %s onto itself", abs_path2);
+					goto out_error;
+				}
+				r = mount(NULL, abs_path2, NULL, MS_REMOUNT|MS_BIND, NULL);
+				if (r < 0) {
+					SYSERROR("error re-mounting %s readwrite", abs_path2);
+					goto out_error;
+				}
+			}
+		} else {
+			/* create path for container's cgroup */
+			r = mkdir_p(abs_path2, 0755);
+			if (r < 0 && errno != EEXIST) {
+ 

[lxc-devel] [PATCH 1/4] Automatic mounts: improvements for /proc and /sys

2013-09-24 Thread Christian Seiler
Improve lxc.mount.auto code: allow the user to specify whether to mount
certain things read-only or read-write. Also make the code much more
easily extensible for the future.
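
To give an idea of the table-driven approach, here is a reduced sketch. It assumes that an entry applies when the flag bits selected by `match_mask` equal `match_flag`; the bit values, the shortened struct and `count_selected()` are hypothetical and only serve the illustration (the real constants live in conf.h).

```c
#include <stddef.h>

/* Hypothetical bit layout: two bits per subsystem. */
#define AUTO_PROC_MIXED 0x01
#define AUTO_PROC_RW    0x02
#define AUTO_PROC_MASK  0x03
#define AUTO_SYS_RO     0x04
#define AUTO_SYS_RW     0x08
#define AUTO_SYS_MASK   0x0C

struct auto_mount {
	int match_mask;          /* which bits of the flags to look at */
	int match_flag;          /* value those bits must have */
	const char *destination; /* shortened: the patch has more fields */
};

static const struct auto_mount auto_mounts[] = {
	{ AUTO_PROC_MASK, AUTO_PROC_MIXED, "/proc" },
	{ AUTO_PROC_MASK, AUTO_PROC_MIXED, "/proc/sys" },
	{ AUTO_PROC_MASK, AUTO_PROC_RW,    "/proc" },
	{ AUTO_SYS_MASK,  AUTO_SYS_RO,     "/sys" },
	{ AUTO_SYS_MASK,  AUTO_SYS_RW,     "/sys" },
};

/* Count the table entries that apply to a given flags value. */
static size_t count_selected(int flags)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(auto_mounts) / sizeof(auto_mounts[0]); i++)
		if ((flags & auto_mounts[i].match_mask) == auto_mounts[i].match_flag)
			n++;
	return n;
}
```

Adding a new automatic mount then only means adding a row to the table, which is the extensibility meant above.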

Signed-off-by: Christian Seiler 
---
 src/lxc/conf.c|  144 +
 src/lxc/conf.h|   21 ++--
 src/lxc/confile.c |   25 --
 3 files changed, 105 insertions(+), 85 deletions(-)

diff --git a/src/lxc/conf.c b/src/lxc/conf.c
index e933c9a..dd69a80 100644
--- a/src/lxc/conf.c
+++ b/src/lxc/conf.c
@@ -743,85 +743,85 @@ int pin_rootfs(const char *rootfs)
 
 static int lxc_mount_auto_mounts(struct lxc_conf *conf, int flags, struct cgroup_process_info *cgroup_info)
 {
-	char *path = NULL;
-	char *dev_null = NULL;
 	int r;
-
-	dev_null = lxc_append_paths(conf->rootfs.mount, "/dev/null");
-	if (!dev_null) {
-		SYSERROR("memory allocation error");
-		goto cleanup;
-	}
-
-	if (flags & LXC_AUTO_PROC) {
-		path = lxc_append_paths(conf->rootfs.mount, "/proc");
-		if (!path) {
-			SYSERROR("memory allocation error trying to automatically mount /proc");
-			goto cleanup;
-		}
-
-		r = mount("proc", path, "proc", MS_NODEV|MS_NOEXEC|MS_NOSUID, NULL);
-		if (r < 0) {
-			SYSERROR("error mounting /proc");
-			goto cleanup;
-		}
-
-		free(path);
-		path = NULL;
-	}
-
-	if (flags & LXC_AUTO_PROC_SYSRQ) {
-		path = lxc_append_paths(conf->rootfs.mount, "/proc/sysrq-trigger");
-		if (!path) {
-			SYSERROR("memory allocation error trying to automatically mount /proc");
-			goto cleanup;
-		}
-
-		/* safety measure, mount /dev/null over /proc/sysrq-trigger,
-		 * otherwise, a container may trigger a host reboot or such
+	size_t i;
+	static struct {
+		int match_mask;
+		int match_flag;
+		const char *source;
+		const char *destination;
+		const char *fstype;
+		unsigned long flags;
+		const char *options;
+	} default_mounts[] = {
+		/* Read-only bind-mounting... In older kernels, doing that required
+		 * to do one MS_BIND mount and then MS_REMOUNT|MS_RDONLY the same
+		 * one. According to mount(2) manpage, MS_BIND honors MS_RDONLY from
+		 * kernel 2.6.26 onwards. However, this apparently does not work on
+		 * kernel 3.8. Unfortunately, on that very same kernel, doing the
+		 * same trick as above doesn't seem to work either, there one needs
+		 * to ALSO specify MS_BIND for the remount, otherwise the entire
+		 * fs is remounted read-only or the mount fails because it's busy...
+		 * MS_REMOUNT|MS_BIND|MS_RDONLY seems to work for kernels as low as
+		 * 2.6.32...
 		 */
-		r = mount(dev_null, path, NULL, MS_BIND, NULL);
-		if (r < 0)
-			WARN("error mounting /dev/null over /proc/sysrq-trigger: %s", strerror(errno));
-
-		free(path);
-		path = NULL;
-	}
-
-	if (flags & LXC_AUTO_SYS) {
-		path = lxc_append_paths(conf->rootfs.mount, "/sys");
-		if (!path) {
-			SYSERROR("memory allocation error trying to automatically mount /sys");
-			goto cleanup;
-		}
+		{ LXC_AUTO_PROC_MASK, LXC_AUTO_PROC_MIXED, "proc",                  "%r/proc",               "proc",  MS_NODEV|MS_NOEXEC|MS_NOSUID, NULL },
+		{ LXC_AUTO_PROC_MASK, LXC_AUTO_PROC_MIXED, "%r/proc/sys",           "%r/proc/sys",           NULL,    MS_BIND,                      NULL },
+		{ LXC_AUTO_PROC_MASK, LXC_AUTO_PROC_MIXED, NULL,                    "%r/proc/sys",           NULL,    MS_REMOUNT|MS_BIND|MS_RDONLY, NULL },
+		{ LXC_AUTO_PROC_MASK, LXC_AUTO_PROC_MIXED, "%r/proc/sysrq-trigger", "%r/proc/sysrq-trigger", NULL,    MS_BIND,                      NULL },
+		{ LXC_AUTO_PROC_MASK, LXC_AUTO_PROC_MIXED, NULL,                    "%r/proc/sysrq-trigger", NULL,    MS_REMOUNT|MS_BIND|MS_RDONLY, NULL },
+		{ LXC_AUTO_PROC_MASK, LXC_AUTO_PROC_RW,    "proc",                  "%r/proc",               "proc",  MS_NODEV|MS_NOEXEC|MS_NOSUID, NULL },
+		{ LXC_AUTO_SYS_MASK,  LXC_AUTO_SYS_RW,     "sysfs",                 "%r/sys",                "sysfs", 0,                            NULL },
+		{ LXC_AUTO_SYS_MASK,  LXC_AUTO_SYS_RO,     "sysfs",                 "%r/sys",                "sysfs", MS_RDONLY,                    NULL },
+   { 0, 

[lxc-devel] [PATCH 2/4] Automatic mounting: write lxc.mount.auto in write_config

2013-09-24 Thread Christian Seiler
Signed-off-by: Christian Seiler 
---
 src/lxc/confile.c |   23 +++
 1 file changed, 23 insertions(+)

diff --git a/src/lxc/confile.c b/src/lxc/confile.c
index 04b8e57..0d5cf1f 100644
--- a/src/lxc/confile.c
+++ b/src/lxc/confile.c
@@ -2002,6 +2002,29 @@ void write_config(FILE *fout, struct lxc_conf *c)
 	lxc_list_for_each(it, &c->mount_list) {
 		fprintf(fout, "lxc.mount.entry = %s\n", (char *)it->elem);
 	}
+	if (c->auto_mounts & LXC_AUTO_ALL_MASK) {
+		fprintf(fout, "lxc.mount.auto =");
+		switch (c->auto_mounts & LXC_AUTO_PROC_MASK) {
+			case LXC_AUTO_PROC_MIXED:        fprintf(fout, " proc:mixed");        break;
+			case LXC_AUTO_PROC_RW:           fprintf(fout, " proc:rw");           break;
+			default:                         break;
+		}
+		switch (c->auto_mounts & LXC_AUTO_SYS_MASK) {
+			case LXC_AUTO_SYS_RO:            fprintf(fout, " sys:ro");            break;
+			case LXC_AUTO_SYS_RW:            fprintf(fout, " sys:rw");            break;
+			default:                         break;
+		}
+		switch (c->auto_mounts & LXC_AUTO_CGROUP_MASK) {
+			case LXC_AUTO_CGROUP_MIXED:      fprintf(fout, " cgroup:mixed");      break;
+			case LXC_AUTO_CGROUP_RO:         fprintf(fout, " cgroup:ro");         break;
+			case LXC_AUTO_CGROUP_RW:         fprintf(fout, " cgroup:rw");         break;
+			case LXC_AUTO_CGROUP_FULL_MIXED: fprintf(fout, " cgroup-full:mixed"); break;
+			case LXC_AUTO_CGROUP_FULL_RO:    fprintf(fout, " cgroup-full:ro");    break;
+			case LXC_AUTO_CGROUP_FULL_RW:    fprintf(fout, " cgroup-full:rw");    break;
+			default:                         break;
+		}
+		fprintf(fout, "\n");
+	}
 	if (c->tty)
 		fprintf(fout, "lxc.tty = %d\n", c->tty);
 	if (c->pts)
-- 
1.7.10.4




Re: [lxc-devel] [RFC] rootfs pinning

2013-09-24 Thread Christian Seiler
Hi there,

>> Yep, we discussed this at Plumbers and I think it's really the way to
>> go, basically remove all of that fs pinning code and just do a
>> bind-mount of the rootfs on itself in the container's mountns before
>> starting it.
>
>> That way if the container decides to remount / ro at any point, it'll
>> succeed and will give the user a read-only / but without affecting the
>> outside world.
>
> Ideally, I think that's the way to go and I used to do that manually when
> setting up my containers but I was thinking there was some breakage
> between that and the way we were working around the pivot_root problem
> introduced by systemd (Fedora, Suse, Arch, et al).  If we can verify
> that works with all the init flavors without breaking, that could be
> part of the general cleanup of the mount tables in the containers as
> well, maybe...

Just a short comment about what I found out when looking at the
auto-mount stuff I just sent to the list when it comes to
bind-mounts and remounting ro:

Take the following example:

mount --bind /foo /bar
mount -o remount,ro /bar

In kernels up to at least 3.2 (but not much later), this makes the
mount /bar read-only but keeps /foo read-write.

But: in kernels from 3.8 at the latest (possibly earlier), this actually
remounts the entire filesystem read-only or gives a busy message. There
was apparently some kind of change here.

In order to properly remount bind-mounts read-only in newer kernels,
you have to do the following:

mount -o remount,bind,ro /bar

This will also work in older kernels (I could only test 2.6.32, not
earlier), so in that sense it's portable.

BUT: the typical bind-mount trick one could use to keep the container
from remounting / ro at shutdown will apparently, as far as I can
tell, not work anymore in 3.8, possibly earlier, since typical
shutdown will do the equivalent of remount,ro and not add the bind
option there.

So unfortunately, I think we'll have to stick with pinning... :(

-- Christian




Re: [lxc-devel] [RFC] rootfs pinning

2013-09-24 Thread Serge Hallyn
Quoting Christian Seiler (christ...@iwakd.de):
> Hi there,
> 
> >> Yep, we discussed this at Plumbers and I think it's really the way to
> >> go, basically remove all of that fs pinning code and just do a
> >> bind-mount of the rootfs on itself in the container's mountns before
> >> starting it.
> >
> >> That way if the container decides to remount / ro at any point, it'll
> >> succeed and will give the user a read-only / but without affecting the
> >> outside world.
> >
> > Ideally, I think that's the way to go and I used to do that manually when
> > setting up my containers but I was thinking there was some breakage
> > between that and the way we were working around the pivot_root problem
> > introduced by systemd (Fedora, Suse, Arch, et al).  If we can verify
> > that works with all the init flavors without breaking, that could be
> > part of the general cleanup of the mount tables in the containers as
> > well, maybe...
> 
> Just a short comment about what I found out when looking at the
> auto-mount stuff I just sent to the list when it comes to
> bind-mounts and remounting ro:
> 
> Take the following example:
> 
> mount --bind /foo /bar
> mount -o remount,ro /bar
> 
> In kernels up to at least 3.2 (but not much later) this would make the
> mount /bar read-only, but keep /foo read-write.
> 
> But: in kernel from at most 3.8 (possibly earlier), this would actually
> remount the entire filesystem read-only or give a busy message. There
> was apparently some kind of change here.
> 
> In order to properly remount bind-mounts read-only in newer kernels,
> you have to do the following:
> 
> mount -o remount,bind,ro /bar
> 
> This will also work in older kernels (I could only test 2.6.32, not
> earlier), so in that sense it's portable.
> 
> BUT: the typical bind-mount trick one could use to keep the container
> from remounting / ro at shutdown will apparently, as far as I can
> tell, not work anymore in 3.8, possibly earlier, since typical
> shutdown will do the equivalent of remount,ro and not add the bind
> option there.
> 
> So unfortunately, I think we'll have to stick with pinning... :(

The following works for me both in 3.2 and 3.8:

sudo mkdir -p /tmp/a /tmp/b
sudo mount -t tmpfs tmpfs /tmp/a
sudo mount --bind /tmp/a /tmp/b
sudo mount -o remount,bind,rw /tmp/c /tmp/c
sudo mount -o remount,ro /tmp/c
sudo touch /tmp/b/a # succeeds
sudo touch /tmp/c/a # fails

-serge



Re: [lxc-devel] [PATCH 0/4] Automatic mounting improvements

2013-09-24 Thread Serge Hallyn
Quoting Christian Seiler (christ...@iwakd.de):
> Hi there,
> 
> I've attached the automatic mounting improvements that were discussed
> in the thread
> 
> 
> The patches are against current github master and I've also pushed them
> to my github:
> 
> 
> The gist of it:
> 
>  - manpage updates
>  - allow /sys etc. to be remounted read-write
>  - allow for the full cgroup tree to be mounted (in mixed-mode, ro and
>    rw)
>  - write_config now honors lxc.mount.auto

Thanks.  Before I get to these I want to handle Dwight's cgroup
mem-leak fixups.  I'm in the process of splitting up
lxc_cgroup_load_meta2() right now.

Should get to this set in the next few days.

thanks,
-serge



Re: [lxc-devel] [RFC] rootfs pinning

2013-09-24 Thread Michael H. Warfield
On Tue, 2013-09-24 at 21:51 +0100, Christian Seiler wrote: 
> Hi there,
> 
> >> Yep, we discussed this at Plumbers and I think it's really the way to
> >> go, basically remove all of that fs pinning code and just do a
> >> bind-mount of the rootfs on itself in the container's mountns before
> >> starting it.
> >
> >> That way if the container decides to remount / ro at any point, it'll
> >> succeed and will give the user a read-only / but without affecting the
> >> outside world.
> >
> > Ideally, I think that's the way to go and I used to do that manually when
> > setting up my containers but I was thinking there was some breakage
> > between that and the way we were working around the pivot_root problem
> > introduced by systemd (Fedora, Suse, Arch, et al).  If we can verify
> > that works with all the init flavors without breaking, that could be
> > part of the general cleanup of the mount tables in the containers as
> > well, maybe...

> Just a short comment about what I found out when looking at the
> auto-mount stuff I just sent to the list when it comes to
> bind-mounts and remounting ro:

> Take the following example:

> mount --bind /foo /bar
> mount -o remount,ro /bar

> In kernels up to at least 3.2 (but not much later) this would make the
> mount /bar read-only, but keep /foo read-write.

> But: in kernel from at most 3.8 (possibly earlier), this would actually
> remount the entire filesystem read-only or give a busy message. There
> was apparently some kind of change here.

No.  There's a change there, all right, and thank you for reminding me
of that, but (afaik) it's NOT in the kernel itself.  It's a mount
option.  It's that bloody MS_SHARED option and, to a lesser extent,
MS_SLAVE option that are behind how those things are propagated.
MS_SHARED will propagate certain things from a child mount to the mount
point and to other children, IIRC, while MS_SLAVE propagates in one
direction and MS_PRIVATE restricts it.  I think the trouble maker is
MS_SHARED and that's what caused all the "pivot_root" calls to face
plant when systemd started mounting everything with MS_SHARED in the
host system.  I was using bind mounts to avoid some of these problems
but then they changed systemd and its default mount options and broke a
number of things I had running.

> In order to properly remount bind-mounts read-only in newer kernels,
> you have to do the following:

> mount -o remount,bind,ro /bar

Check your mount point options and read the man page for mount and
"shared subtrees options".  Some of the distros have been changing the
defaults.  I don't believe it's a kernel default issue but I could be
wrong.
> This will also work in older kernels (I could only test 2.6.32, not
> earlier), so in that sense it's portable.
> 
> BUT: the typical bind-mount trick one could use to keep the container
> from remounting / ro at shutdown will apparently, as far as I can
> tell, not work anymore in 3.8, possibly earlier, since typical
> shutdown will do the equivalent of remount,ro and not add the bind
> option there.

> So unfortunately, I think we'll have to stick with pinning... :(

Actually, there, I think I agree with you, unfortunately.  I think we're
stuck with it due to ill behavior in some distros and their defaults, in
particular with regards to systemd based distros.  We need to do things
in a way that do not break on a distro running the host and in a way
that doesn't allow an arbitrary distro running in a container to
propagate random acts of terrorism to the host or other containers.  But
that's probably a good paradigm for us, anyways.

> -- Christian

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  m...@wittsend.com
   /\/\|=mhw=|\/\/  | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9  | An optimist believes we live in the best of all
 PGP Key: 0x674627FF| possible worlds.  A pessimist is sure of it!




Re: [lxc-devel] [RFC] rootfs pinning

2013-09-24 Thread Serge Hallyn
Quoting Michael H. Warfield (m...@wittsend.com):
> No.  There's a change there, all right, and thank you for reminding me
> of that, but (afaik) it's NOT in the kernel itself.  It's a mount
> option.  It's that bloody MS_SHARED option and, to a lesser extent,

There *is* a kernel change which dhansen was telling me about last
week - I believe it's commit 4ed5e82fe77f4147cf386327c9a63a2dd7eff518.
It allows you to now do

sudo mount -t tmpfs tmpfs /tmp/a
sudo mount -o bind,remount,ro /tmp/a /tmp/b

In the past you had to first create a bind mount before you could
mark it readonly, i.e.

sudo mount -t tmpfs tmpfs /tmp/a
sudo mount --bind /tmp/a /tmp/b
sudo mount -o remount,ro /tmp/b /tmp/b

In either case first making sure there is a bind-mount for us to mark
read-write seems to work.  (We'll have to, of course, make sure it
was actually read-write to begin with, and that the user *wants* it
read-write.  That'll be the only painful part of this patch)

-serge



Re: [lxc-devel] [PATCH] fix some larger memory leaks in cgroup code

2013-09-24 Thread Serge Hallyn
Quoting Dwight Engen (dwight.en...@oracle.com):
> Don't worry about saved_errno since none of the *_free routines will set it
> 
> Signed-off-by: Dwight Engen 

Acked-by: Serge E. Hallyn 

Thanks, sorry for taking so long.  I'll rebase this on top of my
split-up-meta2 patch and apply
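
For reference, the tokenizer change in the patch quoted below follows the pattern in this sketch. As I read the old loop, it set `line` itself to NULL while tokenizing, so the next getline() call allocated a fresh buffer and the previous one was lost. The function `count_tokens()` is a made-up name that only demonstrates the separate cursor.

```c
#include <string.h>

/* Count space-separated tokens in line. strtok_r() wants the buffer
 * on the first call and NULL afterwards; a separate cursor carries
 * that, so the caller's pointer to the buffer is never overwritten
 * and the buffer can still be reused or freed later. */
static size_t count_tokens(char *line)
{
	char *cursor, *token, *saveptr = NULL;
	size_t n = 0;

	for (cursor = line; (token = strtok_r(cursor, " ", &saveptr)); cursor = NULL)
		n++;
	return n;
}
```

Note that strtok_r() still writes NUL bytes into the buffer; only the pointer to its start is preserved.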

> ---
>  src/lxc/cgroup.c | 18 ++
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/src/lxc/cgroup.c b/src/lxc/cgroup.c
> index 101998b..bf7a2a2 100644
> --- a/src/lxc/cgroup.c
> +++ b/src/lxc/cgroup.c
> @@ -293,7 +293,7 @@ struct cgroup_meta_data *lxc_cgroup_load_meta2(const char **subsystem_whitelist)
>   }
>  
>   while (getline(&line, &sz, proc_self_mountinfo) != -1) {
> - char *token, *saveptr = NULL;
> + char *token, *line_tok, *saveptr = NULL;
>   size_t i, j, k;
>   struct cgroup_mount_point *mount_point;
>   struct cgroup_hierarchy *h;
> @@ -302,7 +302,7 @@ struct cgroup_meta_data *lxc_cgroup_load_meta2(const char **subsystem_whitelist)
>   if (line[0] && line[strlen(line) - 1] == '\n')
>   line[strlen(line) - 1] = '\0';
>  
> - for (i = 0; (token = strtok_r(line, " ", &saveptr)); line = NULL) {
> + for (i = 0, line_tok = line; (token = strtok_r(line_tok, " ", &saveptr)); line_tok = NULL) {
>   r = lxc_grow_array((void ***)&tokens, &token_capacity, i + 1, 64);
>   if (r < 0)
>   goto out4;
> @@ -441,6 +441,7 @@ struct cgroup_meta_data *lxc_cgroup_put_meta(struct cgroup_meta_data *meta_data)
>   lxc_cgroup_hierarchy_free(meta_data->hierarchies[i]);
>   }
>   free(meta_data->hierarchies);
> + free(meta_data);
>   return NULL;
>  }
>  
> @@ -1067,29 +1068,30 @@ char *lxc_cgroup_get_hierarchy_abs_path(const char *subsystem, const char *name,
>   struct cgroup_process_info *base_info, *info;
>   struct cgroup_mount_point *mp;
>   char *result = NULL;
> - int saved_errno;
>  
>   meta = lxc_cgroup_load_meta();
>   if (!meta)
>   return NULL;
>   base_info = lxc_cgroup_get_container_info(name, lxcpath, meta);
>   if (!base_info)
> - return NULL;
> + goto out1;
>   info = find_info_for_subsystem(base_info, subsystem);
>   if (!info)
> - return NULL;
> + goto out2;
>   if (info->designated_mount_point) {
>   mp = info->designated_mount_point; 
>   } else {
>   mp = lxc_cgroup_find_mount_point(info->hierarchy, info->cgroup_path, true);
>   if (!mp)
> - return NULL;
> + goto out3;
>   }
>   result = cgroup_to_absolute_path(mp, info->cgroup_path, NULL);
> - saved_errno = errno;
> +out3:
> + lxc_cgroup_process_info_free(info);
> +out2:
>   lxc_cgroup_process_info_free(base_info);
> +out1:
>   lxc_cgroup_put_meta(meta);
> - errno = saved_errno;
>   return result;
>  }
>  
> -- 
> 1.8.1.4
> 
> 



Re: [lxc-devel] [RFC] rootfs pinning

2013-09-24 Thread Michael H. Warfield
On Tue, 2013-09-24 at 17:19 -0500, Serge Hallyn wrote: 
> Quoting Michael H. Warfield (m...@wittsend.com):
> > No.  There's a change there, all right, and thank you for reminding me
> > of that, but (afaik) it's NOT in the kernel itself.  It's a mount
> > option.  It's that bloody MS_SHARED option and, to a lesser extent,
> 
> There *is* a kernel change which dhansen was telling me about last
> week - I believe it's commit 4ed5e82fe77f4147cf386327c9a63a2dd7eff518.
> It allows you to now do

>   sudo mount -t tmpfs tmpfs /tmp/a
>   sudo mount -o bind,remount,ro /tmp/a /tmp/b

> In the past you had to first create a bind mount before you could
> mark it readonly, i.e.

>   sudo mount -t tmpfs tmpfs /tmp/a
>   sudo mount --bind /tmp/a /tmp/b
>   sudo mount -o remount,ro /tmp/b /tmp/b

Interesting point.  Very interesting.  I guess I can dig into it and look
it up, but what rev did that commit show up in, and does it impact the
way we handle things dependent on kernel version?

> In either case first making sure there is a bind-mount for us to mark
> read-write seems to work.  (We'll have to, of course, make sure it
> was actually read-write to begin with, and that the user *wants* it
> read-write.  That'll be the only painful part of this patch)

Yeah...  That sounds about right.

> -serge

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  m...@wittsend.com
   /\/\|=mhw=|\/\/  | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9  | An optimist believes we live in the best of all
 PGP Key: 0x674627FF| possible worlds.  A pessimist is sure of it!




[lxc-devel] [PATCH 1/2] split up lxc_cgroup_load_meta2

2013-09-24 Thread Serge Hallyn
This one's easier to review by looking at the before and after files.  It
splits up lxc_cgroup_load_meta2() by adding 3 helpers.

The result seems easier to reason about.  One question I had: should
the kernel_subsystems array be freed in the success case?  I assumed it
was being used elsewhere, but I can't find where.  Currently it is only
freed in the error case.  I suspect we want to free it in the success
case as well.

Cc: Christian Seiler 
Cc: Dwight Engen 
Signed-off-by: Serge Hallyn 
---
 src/lxc/cgroup.c | 184 +--
 1 file changed, 110 insertions(+), 74 deletions(-)

diff --git a/src/lxc/cgroup.c b/src/lxc/cgroup.c
index 1270b8a..72abc2f 100644
--- a/src/lxc/cgroup.c
+++ b/src/lxc/cgroup.c
@@ -107,53 +107,22 @@ struct cgroup_meta_data *lxc_cgroup_load_meta()
return md;
 }
 
-struct cgroup_meta_data *lxc_cgroup_load_meta2(const char **subsystem_whitelist)
+/* Step 1: determine all kernel subsystems */
+static bool find_cgroup_subsystems(char ***kernel_subsystems)
 {
-   FILE *proc_cgroups = NULL;
-   FILE *proc_self_cgroup = NULL;
-   FILE *proc_self_mountinfo = NULL;
-   bool all_kernel_subsystems = true;
-   bool all_named_subsystems = false;
-   struct cgroup_meta_data *meta_data = NULL;
-   char **kernel_subsystems = NULL;
-   size_t kernel_subsystems_count = 0;
-   size_t kernel_subsystems_capacity = 0;
-   size_t hierarchy_capacity = 0;
-   size_t mount_point_capacity = 0;
-   size_t mount_point_count = 0;
-   char **tokens = NULL;
-   size_t token_capacity = 0;
+   FILE *proc_cgroups;
+   bool bret = false;
char *line = NULL;
size_t sz = 0;
-   int r, saved_errno = 0;
-
-   /* if the subsystem whitelist is not specified, include all
-* hierarchies that contain kernel subsystems by default but
-* no hierarchies that only contain named subsystems
-*
-* if it is specified, the specifier @all will select all
-* hierarchies, @kernel will select all hierarchies with
-* kernel subsystems and @named will select all named
-* hierarchies
-*/
-   all_kernel_subsystems = subsystem_whitelist ?
-   (lxc_string_in_array("@kernel", subsystem_whitelist) || lxc_string_in_array("@all", subsystem_whitelist)) :
-   true;
-   all_named_subsystems = subsystem_whitelist ?
-   (lxc_string_in_array("@named", subsystem_whitelist) || lxc_string_in_array("@all", subsystem_whitelist)) :
-   false;
-
-   meta_data = calloc(1, sizeof(struct cgroup_meta_data));
-   if (!meta_data)
-   return NULL;
-   meta_data->ref = 1;
+   size_t kernel_subsystems_count = 0;
+   size_t kernel_subsystems_capacity = 0;
+   int r;
 
-   /* Step 1: determine all kernel subsystems */
process_lock();
proc_cgroups = fopen_cloexec("/proc/cgroups", "r");
process_unlock();
if (!proc_cgroups)
-   goto out_error;
+   return false;
 
while (getline(&line, &sz, proc_cgroups) != -1) {
char *tab1;
@@ -180,24 +149,38 @@ struct cgroup_meta_data *lxc_cgroup_load_meta2(const char **subsystem_whitelist)
continue;
(void)hierarchy_number;
 
-   r = lxc_grow_array((void ***)&kernel_subsystems, &kernel_subsystems_capacity, kernel_subsystems_count + 1, 12);
+   r = lxc_grow_array((void ***)kernel_subsystems, &kernel_subsystems_capacity, kernel_subsystems_count + 1, 12);
if (r < 0)
-   goto out_error;
-   kernel_subsystems[kernel_subsystems_count] = strdup(line);
-   if (!kernel_subsystems[kernel_subsystems_count])
-   goto out_error;
+   goto out;
+   (*kernel_subsystems)[kernel_subsystems_count] = strdup(line);
+   if (!(*kernel_subsystems)[kernel_subsystems_count])
+   goto out;
kernel_subsystems_count++;
}
+   bret = true;
 
+out:
process_lock();
fclose(proc_cgroups);
process_unlock();
-   proc_cgroups = NULL;
+   return bret;
+}
+
+/* Step 2: determine all hierarchies (by reading /proc/self/cgroup),
+ * since mount points don't specify hierarchy number and
+ * /proc/cgroups does not contain named hierarchies
+ */
+static bool find_cgroup_hierarchies(struct cgroup_meta_data *meta_data,
+   bool all_kernel_subsystems, bool all_named_subsystems,
+   const char **subsystem_whitelist)
+{
+   FILE *proc_self_cgroup;
+   char *line = NULL;
+   size_t sz = 0;
+   int r;
+   bool bret = false;
+   size_t hierarchy_capacity = 0;
 
-   /* Step 2: determine all hierarchies (by reading /proc/self/cgroup),
-* since mount points don't specify hierarchy num

[lxc-devel] [PATCH 2/2] fix some larger memory leaks in cgroup code

2013-09-24 Thread Serge Hallyn
From: Dwight Engen 

Don't worry about saved_errno, since none of the *_free routines will set it.

Signed-off-by: Dwight Engen 
Signed-off-by: Serge Hallyn 
---
 src/lxc/cgroup.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/src/lxc/cgroup.c b/src/lxc/cgroup.c
index 72abc2f..730d3b7 100644
--- a/src/lxc/cgroup.c
+++ b/src/lxc/cgroup.c
@@ -296,7 +296,7 @@ static bool find_hierarchy_mountpts( struct cgroup_meta_data *meta_data, char **
return false;
 
while (getline(&line, &sz, proc_self_mountinfo) != -1) {
-   char *token, *saveptr = NULL;
+   char *token, *line_tok, *saveptr = NULL;
size_t i, j, k;
struct cgroup_mount_point *mount_point;
struct cgroup_hierarchy *h;
@@ -305,7 +305,7 @@ static bool find_hierarchy_mountpts( struct cgroup_meta_data *meta_data, char **
if (line[0] && line[strlen(line) - 1] == '\n')
line[strlen(line) - 1] = '\0';
 
-   for (i = 0; (token = strtok_r(line, " ", &saveptr)); line = NULL) {
+   for (i = 0, line_tok = line; (token = strtok_r(line_tok, " ", &saveptr)); line_tok = NULL) {
r = lxc_grow_array((void ***)&tokens, &token_capacity, i + 1, 64);
if (r < 0)
goto out;
@@ -477,6 +477,7 @@ struct cgroup_meta_data *lxc_cgroup_put_meta(struct cgroup_meta_data *meta_data)
lxc_cgroup_hierarchy_free(meta_data->hierarchies[i]);
}
free(meta_data->hierarchies);
+   free(meta_data);
return NULL;
 }
 
@@ -1103,29 +1104,30 @@ char *lxc_cgroup_get_hierarchy_abs_path(const char *subsystem, const char *name,
struct cgroup_process_info *base_info, *info;
struct cgroup_mount_point *mp;
char *result = NULL;
-   int saved_errno;
 
meta = lxc_cgroup_load_meta();
if (!meta)
return NULL;
base_info = lxc_cgroup_get_container_info(name, lxcpath, meta);
if (!base_info)
-   return NULL;
+   goto out1;
info = find_info_for_subsystem(base_info, subsystem);
if (!info)
-   return NULL;
+   goto out2;
if (info->designated_mount_point) {
mp = info->designated_mount_point;
} else {
mp = lxc_cgroup_find_mount_point(info->hierarchy, info->cgroup_path, true);
if (!mp)
-   return NULL;
+   goto out3;
}
result = cgroup_to_absolute_path(mp, info->cgroup_path, NULL);
-   saved_errno = errno;
+out3:
+   lxc_cgroup_process_info_free(info);
+out2:
lxc_cgroup_process_info_free(base_info);
+out1:
lxc_cgroup_put_meta(meta);
-   errno = saved_errno;
return result;
 }
 
-- 
1.8.3.2
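
A note on the strtok_r() hunk, since the commit message does not spell it out. As I read it, the old loop advanced by setting `line` itself to NULL, so the buffer that getline() had allocated was no longer reachable and a fresh one was allocated for every input line. The sketch below shows the same pattern in isolation, with a separate cursor so that `line` keeps pointing at the getline() buffer. The function name count_tokens() is hypothetical and only there to make the example self-contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count the space-separated tokens on all lines of fp.
 * `line` is owned by getline() and is never overwritten here, so the
 * single free() after the loop releases the only buffer allocated. */
static size_t count_tokens(FILE *fp)
{
	char *line = NULL;
	size_t sz = 0, total = 0;

	while (getline(&line, &sz, fp) != -1) {
		char *token, *line_tok, *saveptr = NULL;

		/* strip the trailing newline, as the patched function does */
		if (line[0] && line[strlen(line) - 1] == '\n')
			line[strlen(line) - 1] = '\0';

		/* line_tok is the cursor handed to strtok_r(): the real
		 * pointer on the first call, NULL on every later call */
		for (line_tok = line;
		     (token = strtok_r(line_tok, " ", &saveptr));
		     line_tok = NULL)
			total++;
	}
	free(line);
	return total;
}
```

Because getline() reuses and grows the buffer it is handed, keeping `line` intact also avoids one allocation per input line.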




Re: [lxc-devel] [RFC] rootfs pinning

2013-09-24 Thread Christian Seiler
Hi there,

> No.  There's a change there, all right, and thank you for reminding me
> of that, but (afaik) it's NOT in the kernel itself.  It's a mount
> option.  It's that bloody MS_SHARED option and, to a lesser extent,
> MS_SLAVE option that are behind how those things are propagated.
> MS_SHARED will propagate certain things from a child mount to the mount
> point and to other children, IIRC, while MS_SLAVE propagates in one
> direction and MS_PRIVATE restricts it.  I think the trouble maker is
> MS_SHARED and that's what caused all the "pivot_root" calls to face
> plant when systemd started mounting everything with MS_SHARED in the
> host system.  I was using bind mounts to avoid some of these problems
> but then they changed systemd and its default mount options and broke a
> number of things I had running.

This is not MS_SHARED. The 3.8 instance I'm testing this with is
a Debian Wheezy with a custom kernel (the 3.8 from Serge's and/or
Stéphane's repository for userns, which floated around here half
a year ago or so). I never had a chance to upgrade; it's in a KVM
so that I don't break my main system.

Look at the following:

root@lxcdev:~# mkdir /foo/bar /foo/baz -p
root@lxcdev:~# mount --bind /foo/bar /foo/baz
root@lxcdev:~# grep /foo /proc/self/mountinfo
25 20 253:1 /foo/bar /foo/baz rw,relatime - ext4 /dev/disk/by-uuid/b2e1ac13-e6d0-48e7-a3b0-9fcdf81db294 rw,errors=remount-ro,data=ordered
root@lxcdev:~# grep ^20 /proc/self/mountinfo
20 1 253:1 / / rw,relatime - ext4 /dev/disk/by-uuid/b2e1ac13-e6d0-48e7-a3b0-9fcdf81db294 rw,errors=remount-ro,data=ordered
root@lxcdev:~# mount /foo/baz -o remount,ro
mount: /foo/baz is busy
root@lxcdev:~# mount /foo/baz -o remount,bind,ro
root@lxcdev:~# grep /foo /proc/self/mountinfo
25 20 253:1 /foo/bar /foo/baz ro,relatime - ext4 /dev/disk/by-uuid/b2e1ac13-e6d0-48e7-a3b0-9fcdf81db294 rw,errors=remount-ro,data=ordered
root@lxcdev:~# uname -a
Linux lxcdev 3.8.0-rc3+ #1 SMP Sun Jan 27 16:39:34 CET 2013 x86_64 GNU/Linux

I don't see any shared: in /proc/self/mountinfo. Obviously,
this could be a side effect of the specific kernel I'm using,
but I don't recall the additional userns patches changing
anything in that regard.

Also note that a mount --make-private / doesn't change
anything, and that this isn't even in its own namespace.
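
For anyone wanting to reproduce the remount,bind,ro step from C rather than from the shell, a minimal sketch follows. It assumes the target is already a bind mount and uses only the documented mount(2) flags MS_REMOUNT, MS_BIND and MS_RDONLY. The helper name remount_bind_ro() is hypothetical, and the call needs CAP_SYS_ADMIN, so it will fail with EPERM for an unprivileged user.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical helper: make an existing bind mount read-only.
 * Only the per-mount-point flags change; the underlying filesystem
 * stays rw, which matches the mountinfo output above (ro in the mount
 * options, rw in the superblock options). */
static int remount_bind_ro(const char *target)
{
	/* source, filesystem type and data are not used for a
	 * MS_REMOUNT | MS_BIND call, so NULL is passed for each */
	if (mount(NULL, target, NULL,
		  MS_REMOUNT | MS_BIND | MS_RDONLY, NULL) < 0) {
		fprintf(stderr, "remount,bind,ro of %s failed: %s\n",
			target, strerror(errno));
		return -1;
	}
	return 0;
}
```

Called as remount_bind_ro("/foo/baz") after the bind mount has been set up, this should correspond to the `mount /foo/baz -o remount,bind,ro` line in the transcript.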

I don't have that much time atm, so I won't be able to
check with a current official kernel.

-- Christian

