[lxc-devel] [PATCH] Set high byte of mac addresses for host veth devices to 0xfe

2011-11-16 Thread Christian Seiler
Hi,

I've run into the same problem that was discussed in bug #3411497 [1] and on
the users mailing list [2]. To solve it, I've implemented the fix that was
proposed on the mailing list [3].

The attached patch is against current trunk. Since trunk currently doesn't
compile for me, I tested the patch against the current Debian package for
LXC version 0.7.2. There, it still applies and works as expected: the
bridge interface keeps its MAC address, and the high byte of the MAC
address of the host veth interface is correctly set to 0xfe.

It would be great if this patch or a slightly modified version could be
applied to LXC.

Thanks,

Christian

[1] http://sourceforge.net/tracker/index.php?func=detail&aid=3411497&group_id=163076&atid=826303
[2] http://thread.gmane.org/gmane.linux.kernel.containers.lxc.general/2709
[3] http://article.gmane.org/gmane.linux.kernel.containers.lxc.general/2796

From e1b4779a89964ec43fa2bc5f76fafd965c89f73f Mon Sep 17 00:00:00 2001
From: Christian Seiler 
Date: Tue, 15 Nov 2011 18:53:53 +0100
Subject: [PATCH] Set high byte of mac addresses for host veth devices to 0xfe

When used in conjunction with a bridge, veth devices with random addresses
may change the MAC address of the bridge itself if the MAC address of the
newly added interface is numerically lower than the bridge's current MAC
address. This is documented kernel behavior. To avoid changing the host's
MAC address back and forth when starting and/or stopping containers, this
patch sets the high byte of the MAC address of the host-side veth
interface to 0xfe.

Similar logic is also implemented in libvirt.

Fixes SF bug #3411497
See also: <http://thread.gmane.org/gmane.linux.kernel.containers.lxc.general/2709>
---
 src/lxc/conf.c |   40 
 1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/src/lxc/conf.c b/src/lxc/conf.c
index 613e476..a5d067b 100644
--- a/src/lxc/conf.c
+++ b/src/lxc/conf.c
@@ -1402,6 +1402,36 @@ static int setup_network(struct lxc_list *network)
 	return 0;
 }
 
+static int setup_private_host_hw_addr(char *veth1)
+{
+	struct ifreq ifr;
+	int err;
+	int sockfd;
+	
+	sockfd = socket(AF_INET, SOCK_DGRAM, 0);
+	if (sockfd < 0)
+		return -errno;
+	
+	snprintf((char *)ifr.ifr_name, IFNAMSIZ, "%s", veth1);
+	err = ioctl(sockfd, SIOCGIFHWADDR, &ifr);
+	if (err < 0) {
+		close(sockfd);
+		return -errno;
+	}
+	
+	ifr.ifr_hwaddr.sa_data[0] = 0xfe;
+	err = ioctl(sockfd, SIOCSIFHWADDR, &ifr);
+	close(sockfd);
+	if (err < 0)
+		return -errno;
+	
+	DEBUG("mac address of host interface '%s' changed to private %02x:%02x:%02x:%02x:%02x:%02x",
+	  veth1, ifr.ifr_hwaddr.sa_data[0] & 0xff, ifr.ifr_hwaddr.sa_data[1] & 0xff, ifr.ifr_hwaddr.sa_data[2] & 0xff,
+	  ifr.ifr_hwaddr.sa_data[3] & 0xff, ifr.ifr_hwaddr.sa_data[4] & 0xff, ifr.ifr_hwaddr.sa_data[5] & 0xff);
+	
+	return 0;
+}
+
 struct lxc_conf *lxc_conf_init(void)
 {
 	struct lxc_conf *new;
@@ -1455,6 +1485,16 @@ static int instanciate_veth(struct lxc_handler *handler, struct lxc_netdev *netd
 		  strerror(-err));
 		return -1;
 	}
+	
+	/* changing the high byte of the mac address to 0xfe, the bridge interface
+	 * will always keep the host's mac address and not take the mac address
+	 * of a container */
+	err = setup_private_host_hw_addr(veth1);
+	if (err) {
+		ERROR("failed to change mac address of host interface '%s' : %s",
+			veth1, strerror(-err));
+		goto out_delete;
+	}
 
 	if (netdev->mtu) {
 		err = lxc_netdev_set_mtu(veth1, atoi(netdev->mtu));
-- 
1.7.2.5
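
To illustrate the reasoning in the commit message, here is a small, self-contained model in C. It is only a sketch, not LXC code: `make_host_mac_private()`, `bridge_pick_mac()` and `MAC_LEN` are hypothetical names, and the bridge is reduced to the one rule the commit message depends on, namely that it takes on the numerically lowest MAC address among its ports. Setting the first byte to 0xfe also keeps the address unicast (bit 0 clear) and marks it as locally administered (bit 1 set).

```c
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6 /* hypothetical name for the length of an Ethernet address */

/* Force the high byte to 0xfe, as the patch does for the host-side veth. */
void make_host_mac_private(unsigned char mac[MAC_LEN])
{
	mac[0] = 0xfe;
}

/* Simplified bridge model: the bridge uses the lowest port MAC address. */
const unsigned char *bridge_pick_mac(unsigned char (*ports)[MAC_LEN], size_t n)
{
	const unsigned char *lowest = ports[0];
	size_t i;

	/* memcmp() compares byte by byte, i.e. numerically, high byte first. */
	for (i = 1; i < n; i++)
		if (memcmp(ports[i], lowest, MAC_LEN) < 0)
			lowest = ports[i];
	return lowest;
}
```

In this model the host NIC keeps winning: a universally administered unicast address has the two low bits of its first byte clear, so that byte is at most 0xfc and always sorts below a 0xfe-prefixed veth address.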

Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel


Re: [lxc-devel] [PATCH] Set high byte of mac addresses for host veth devices to 0xfe

2011-12-21 Thread Christian Seiler
Hi,

Sorry I didn't reply earlier.

> Thanks, sorry for the trouble.  Looks good, with one exception - if 
> ioctl failed, then you may end up returning the wrong errno (from the 
> close syscall).  With that fixed, please do apply.

Oh, yes, nice catch. I've attached a fixed patch.

Thanks!

Christian
From 0dce40ea882c560e0847a78058f962cd20fb4813 Mon Sep 17 00:00:00 2001
From: Christian Seiler 
Date: Tue, 15 Nov 2011 18:53:53 +0100
Subject: [PATCH] Set high byte of mac addresses for host veth devices to 0xfe

When used in conjunction with a bridge, veth devices with random addresses
may change the MAC address of the bridge itself if the MAC address of the
newly added interface is numerically lower than the bridge's current MAC
address. This is documented kernel behavior. To avoid changing the host's
MAC address back and forth when starting and/or stopping containers, this
patch sets the high byte of the MAC address of the host-side veth
interface to 0xfe.

Similar logic has previously been implemented in libvirt by the libvirt
developers.

Fixes SF bug #3411497
See also: <http://thread.gmane.org/gmane.linux.kernel.containers.lxc.general/2709>
---
 src/lxc/conf.c |   41 +
 1 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/src/lxc/conf.c b/src/lxc/conf.c
index 613e476..b9b25f9 100644
--- a/src/lxc/conf.c
+++ b/src/lxc/conf.c
@@ -1402,6 +1402,37 @@ static int setup_network(struct lxc_list *network)
 	return 0;
 }
 
+static int setup_private_host_hw_addr(char *veth1)
+{
+	struct ifreq ifr;
+	int err;
+	int sockfd;
+	
+	sockfd = socket(AF_INET, SOCK_DGRAM, 0);
+	if (sockfd < 0)
+		return -errno;
+	
+	snprintf((char *)ifr.ifr_name, IFNAMSIZ, "%s", veth1);
+	err = ioctl(sockfd, SIOCGIFHWADDR, &ifr);
+	if (err < 0) {
+		err = -errno;
+		close(sockfd);
+		return err;
+	}
+	
+	ifr.ifr_hwaddr.sa_data[0] = 0xfe;
+	err = ioctl(sockfd, SIOCSIFHWADDR, &ifr);
+	close(sockfd);
+	if (err < 0)
+		return -errno;
+	
+	DEBUG("mac address of host interface '%s' changed to private %02x:%02x:%02x:%02x:%02x:%02x",
+	  veth1, ifr.ifr_hwaddr.sa_data[0] & 0xff, ifr.ifr_hwaddr.sa_data[1] & 0xff, ifr.ifr_hwaddr.sa_data[2] & 0xff,
+	  ifr.ifr_hwaddr.sa_data[3] & 0xff, ifr.ifr_hwaddr.sa_data[4] & 0xff, ifr.ifr_hwaddr.sa_data[5] & 0xff);
+	
+	return 0;
+}
+
 struct lxc_conf *lxc_conf_init(void)
 {
 	struct lxc_conf *new;
@@ -1455,6 +1486,16 @@ static int instanciate_veth(struct lxc_handler *handler, struct lxc_netdev *netd
 		  strerror(-err));
 		return -1;
 	}
+	
+	/* changing the high byte of the mac address to 0xfe, the bridge interface
+	 * will always keep the host's mac address and not take the mac address
+	 * of a container */
+	err = setup_private_host_hw_addr(veth1);
+	if (err) {
+		ERROR("failed to change mac address of host interface '%s' : %s",
+			veth1, strerror(-err));
+		goto out_delete;
+	}
 
 	if (netdev->mtu) {
 		err = lxc_netdev_set_mtu(veth1, atoi(netdev->mtu));
-- 
1.7.2.5



Re: [lxc-devel] [PATCH] Set high byte of mac addresses for host veth devices to 0xfe

2011-12-21 Thread Christian Seiler
Hi again,

> Sorry I didn't reply earlier.
> 
>> Thanks, sorry for the trouble.  Looks good, with one exception - if 
>> ioctl failed, then you may end up returning the wrong errno (from the 
>> close syscall).  With that fixed, please do apply.
> 
> Oh, yes, nice catch. I've attached a fixed patch.

I shouldn't be fixing patches so late at night; I only caught the first
instance of this. I've now attached a version that really fixes the
problem. (Hopefully.)

Sorry about the confusion.

Christian
From 83f4ee619ed322be2845fd8ae988730cae2f36b5 Mon Sep 17 00:00:00 2001
From: Christian Seiler 
Date: Tue, 15 Nov 2011 18:53:53 +0100
Subject: [PATCH] Set high byte of mac addresses for host veth devices to 0xfe

When used in conjunction with a bridge, veth devices with random addresses
may change the MAC address of the bridge itself if the MAC address of the
newly added interface is numerically lower than the bridge's current MAC
address. This is documented kernel behavior. To avoid changing the host's
MAC address back and forth when starting and/or stopping containers, this
patch sets the high byte of the MAC address of the host-side veth
interface to 0xfe.

Similar logic has previously been implemented in libvirt by the libvirt
developers.

Fixes SF bug #3411497
See also: <http://thread.gmane.org/gmane.linux.kernel.containers.lxc.general/2709>
---
 src/lxc/conf.c |   43 +++
 1 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/src/lxc/conf.c b/src/lxc/conf.c
index 613e476..517edec 100644
--- a/src/lxc/conf.c
+++ b/src/lxc/conf.c
@@ -1402,6 +1402,39 @@ static int setup_network(struct lxc_list *network)
 	return 0;
 }
 
+static int setup_private_host_hw_addr(char *veth1)
+{
+	struct ifreq ifr;
+	int err;
+	int saved_errno;
+	int sockfd;
+	
+	sockfd = socket(AF_INET, SOCK_DGRAM, 0);
+	if (sockfd < 0)
+		return -errno;
+	
+	snprintf((char *)ifr.ifr_name, IFNAMSIZ, "%s", veth1);
+	err = ioctl(sockfd, SIOCGIFHWADDR, &ifr);
+	if (err < 0) {
+		saved_errno = errno;
+		close(sockfd);
+		return -saved_errno;
+	}
+	
+	ifr.ifr_hwaddr.sa_data[0] = 0xfe;
+	err = ioctl(sockfd, SIOCSIFHWADDR, &ifr);
+	saved_errno = errno;
+	close(sockfd);
+	if (err < 0)
+		return -saved_errno;
+	
+	DEBUG("mac address of host interface '%s' changed to private %02x:%02x:%02x:%02x:%02x:%02x",
+	  veth1, ifr.ifr_hwaddr.sa_data[0] & 0xff, ifr.ifr_hwaddr.sa_data[1] & 0xff, ifr.ifr_hwaddr.sa_data[2] & 0xff,
+	  ifr.ifr_hwaddr.sa_data[3] & 0xff, ifr.ifr_hwaddr.sa_data[4] & 0xff, ifr.ifr_hwaddr.sa_data[5] & 0xff);
+	
+	return 0;
+}
+
 struct lxc_conf *lxc_conf_init(void)
 {
 	struct lxc_conf *new;
@@ -1455,6 +1488,16 @@ static int instanciate_veth(struct lxc_handler *handler, struct lxc_netdev *netd
 		  strerror(-err));
 		return -1;
 	}
+	
+	/* changing the high byte of the mac address to 0xfe, the bridge interface
+	 * will always keep the host's mac address and not take the mac address
+	 * of a container */
+	err = setup_private_host_hw_addr(veth1);
+	if (err) {
+		ERROR("failed to change mac address of host interface '%s' : %s",
+			veth1, strerror(-err));
+		goto out_delete;
+	}
 
 	if (netdev->mtu) {
 		err = lxc_netdev_set_mtu(veth1, atoi(netdev->mtu));
-- 
1.7.2.5



[lxc-devel] lxc-attach and capabilities

2011-12-22 Thread Christian Seiler
Hi,

Using kernel 3.1 and the LXC patches[*] that make lxc-attach work, I noticed
that if I drop capabilities such as CAP_NET_ADMIN from a container and then
access the container with lxc-attach, the attached shell has the full
capabilities of my host shell, not the limited capabilities of the container.

Is this on purpose? In my opinion, the sensible behaviour would be to
acquire the same capabilities as configured for the container. On the
other hand, it could be useful to enter the container and keep the
capabilities if, for example, one wants to reconfigure parts of the
network (which cannot be done directly from the outside, since the
network namespace separates these devices).

The way I see it, the ideal solution would probably be that lxc-attach
drops its capabilities by default (according to the config of the
container specified with the -n option) and that there is an option
(e.g. --keep-capabilities) that overrides this, in case the admin wants
to execute something in the container with elevated privileges.

If you agree with me on the behaviour, I'd be happy to write a patch
that implements this.

Christian

[*] http://lxc.sourceforge.net/patches/linux/3.0.0/3.0.0-lxc1/
 By the way, they no longer apply cleanly against 3.1, but can
 be trivially modified. Are these patches going to be merged
 into the official kernel tree at some point?




Re: [lxc-devel] [GIT] lxc branch, master, updated. aa198728a83e7016cd02583349fce1f5b1a60c66

2012-01-05 Thread Christian Seiler
Hi there,

> commit 49684c0b43d79310429b314e484ac2b1ab4ac6a1
> Author: Christian Seiler 
> Date:   Tue Nov 15 18:53:53 2011 +0100

Thanks a lot for applying this. Unfortunately, you did not include the
errno handling fixes that came up later in the thread, when I resent the
patch. I've attached a patch against the current master branch that takes
care of this.

Christian
From 56a822e6f035fe37a67e45d499419376de5d8711 Mon Sep 17 00:00:00 2001
From: Christian Seiler 
Date: Fri, 6 Jan 2012 00:10:48 +0100
Subject: [PATCH] Fix errno handling in setup_private_host_hw_addr

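If ioctl() failed, the function called close() before reading errno, so it
could return the errno set by close() instead of the one set by ioctl().
Save errno right after each ioctl() call and return the saved value.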
---
 src/lxc/conf.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/lxc/conf.c b/src/lxc/conf.c
index 5e41d38..1a9851e 100644
--- a/src/lxc/conf.c
+++ b/src/lxc/conf.c
@@ -1406,6 +1406,7 @@ static int setup_private_host_hw_addr(char *veth1)
 {
 	struct ifreq ifr;
 	int err;
+	int saved_errno;
 	int sockfd;
 
 	sockfd = socket(AF_INET, SOCK_DGRAM, 0);
@@ -1415,15 +1416,17 @@ static int setup_private_host_hw_addr(char *veth1)
 	snprintf((char *)ifr.ifr_name, IFNAMSIZ, "%s", veth1);
 	err = ioctl(sockfd, SIOCGIFHWADDR, &ifr);
 	if (err < 0) {
+		saved_errno = errno;
 		close(sockfd);
-		return -errno;
+		return -saved_errno;
 	}
 
 	ifr.ifr_hwaddr.sa_data[0] = 0xfe;
 	err = ioctl(sockfd, SIOCSIFHWADDR, &ifr);
+	saved_errno = errno;
 	close(sockfd);
 	if (err < 0)
-		return -errno;
+		return -saved_errno;
 
 	DEBUG("mac address of host interface '%s' changed to private "
 	  "%02x:%02x:%02x:%02x:%02x:%02x", veth1,
-- 
1.7.2.5
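
The rule behind this fix is that errno is only meaningful immediately after the failing call; any later library call (here close()) may overwrite it. The same pattern can be packaged as a small helper. `close_keep_errno()` is a hypothetical name used for illustration, not an LXC function.

```c
#include <errno.h>
#include <unistd.h>

/* Close fd without clobbering the errno left behind by an earlier failed call. */
void close_keep_errno(int fd)
{
	int saved_errno = errno;

	close(fd);		/* may set errno, e.g. to EBADF */
	errno = saved_errno;	/* restore the error the caller cares about */
}
```

A caller can then write `if (ioctl(...) < 0) { close_keep_errno(sockfd); return -errno; }` and still report the ioctl() error.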



[lxc-devel] [PATCH 3/3] Accept numeric values for capabilities to drop

2012-02-01 Thread Christian Seiler
lxc.cap.drop now also accepts numeric values for capabilities. This allows
the user to specify capabilities LXC doesn't know about yet or capabilities
that were not part of the kernel headers LXC was compiled against.
---
 src/lxc/conf.c |   17 +
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/src/lxc/conf.c b/src/lxc/conf.c
index 3fbc0eb..d3c1052 100644
--- a/src/lxc/conf.c
+++ b/src/lxc/conf.c
@@ -60,6 +60,7 @@
 #include "conf.h"
 #include "log.h"
 #include "lxc.h"   /* for lxc_cgroup_set() */
+#include "caps.h"   /* for lxc_caps_last_cap() */
 
 lxc_log_define(lxc_conf, lxc);
 
@@ -1123,6 +1124,7 @@ static int setup_caps(struct lxc_list *caps)
 {
 	struct lxc_list *iterator;
 	char *drop_entry;
+	char *ptr;
 	int i, capid;
 
 	lxc_list_for_each(iterator, caps) {
@@ -1140,6 +1142,21 @@ static int setup_caps(struct lxc_list *caps)
 			break;
 		}
 
+		if (capid < 0) {
+			/* try to see if it's numeric, so the user may specify
+			 * capabilities  that the running kernel knows about but
+			 * we don't */
+			capid = strtol(drop_entry, &ptr, 10);
+			if (!ptr || *ptr != '\0' ||
+			    capid == LONG_MIN || capid == LONG_MAX)
+				/* not a valid number */
+				capid = -1;
+			else if (capid > lxc_caps_last_cap())
+				/* we have a number but it's not a valid
+				 * capability */
+				capid = -1;
+		}
+
 		if (capid < 0) {
 			ERROR("unknown capability %s", drop_entry);
 			return -1;
-- 
1.7.2.5
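
A standalone sketch of the numeric fallback, assuming the caller already knows the kernel's last capability number (in the patch, lxc_caps_last_cap() provides it). `cap_from_number()` is a hypothetical name. It deliberately differs from the hunk above in one respect: strtol()'s result stays in a long until after the range check. In the patch, capid is an int, so on LP64 systems the LONG_MIN/LONG_MAX comparison can never be true, and an out-of-range input may be truncated into a small, valid-looking capability number; that may be worth a second look.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a numeric lxc.cap.drop entry.
 * Returns the capability number, or -1 if the entry is not a plain
 * decimal number in [0, last_cap].
 */
int cap_from_number(const char *entry, int last_cap)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(entry, &end, 10);
	if (errno != 0 || end == entry || *end != '\0')
		return -1;	/* overflow, no digits, or trailing characters */
	if (val < 0 || val > last_cap)
		return -1;	/* a number, but not a capability this kernel knows */
	return (int)val;
}
```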




[lxc-devel] [PATCH 2/3] Add CAP_SYSLOG and CAP_WAKE_ALARM to list of capabilities

2012-02-01 Thread Christian Seiler
---
 src/lxc/conf.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/src/lxc/conf.c b/src/lxc/conf.c
index 1a9851e..3fbc0eb 100644
--- a/src/lxc/conf.c
+++ b/src/lxc/conf.c
@@ -201,6 +201,12 @@ static struct caps_opt caps_opt[] = {
 	{ "setfcap",          CAP_SETFCAP          },
 	{ "mac_override",     CAP_MAC_OVERRIDE     },
 	{ "mac_admin",        CAP_MAC_ADMIN        },
+#ifdef CAP_SYSLOG
+	{ "syslog",           CAP_SYSLOG           },
+#endif
+#ifdef CAP_WAKE_ALARM
+	{ "wake_alarm",       CAP_WAKE_ALARM       },
+#endif
 };
 
 static int run_script(const char *name, const char *section,
-- 
1.7.2.5
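
A reduced, self-contained version of such a name table, together with a lookup function, might look like the sketch below. The struct and function names are hypothetical and only a handful of entries are shown; the values come from <linux/capability.h>. The #ifdef guards let the table compile against older kernel headers that do not define the newer constants.

```c
#include <linux/capability.h>
#include <stddef.h>
#include <string.h>

struct cap_name_sketch {
	const char *name;
	int value;
};

/* Excerpt of a name table; newer capabilities are guarded so the table
 * still compiles against kernel headers that lack them. */
static const struct cap_name_sketch cap_names[] = {
	{ "net_admin",  CAP_NET_ADMIN  },
	{ "sys_admin",  CAP_SYS_ADMIN  },
	{ "mac_admin",  CAP_MAC_ADMIN  },
#ifdef CAP_SYSLOG
	{ "syslog",     CAP_SYSLOG     },
#endif
#ifdef CAP_WAKE_ALARM
	{ "wake_alarm", CAP_WAKE_ALARM },
#endif
};

/* Return the capability number for a lower-case name, or -1 if unknown. */
int cap_from_name(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(cap_names) / sizeof(cap_names[0]); i++)
		if (strcmp(name, cap_names[i].name) == 0)
			return cap_names[i].value;
	return -1;
}
```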




[lxc-devel] [PATCH 1/3] Add function to determine CAP_LAST_CAP of the current kernel dynamically

2012-02-01 Thread Christian Seiler
The function lxc_caps_last_cap() determines CAP_LAST_CAP of the current kernel
dynamically. It first tries to read /proc/sys/kernel/cap_last_cap. If that
fails because the kernel does not support this interface yet, it loops
through all capabilities and uses prctl(PR_CAPBSET_READ) to check whether
each one is part of the bounding set. The capability just before the first
one for which prctl() fails is considered to be CAP_LAST_CAP.
---
 src/lxc/caps.c |   46 ++
 src/lxc/caps.h |    2 ++
 2 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/src/lxc/caps.c b/src/lxc/caps.c
index 46a2766..d919353 100644
--- a/src/lxc/caps.c
+++ b/src/lxc/caps.c
@@ -23,6 +23,9 @@
 
 #define _GNU_SOURCE
 #include 
+#include 
+#include 
+#include 
 #include 
 #include 
 
@@ -167,3 +170,46 @@ int lxc_caps_init(void)
 
 	return 0;
 }
+
+static int _real_caps_last_cap(void)
+{
+	int fd;
+	int result = -1;
+
+	/* try to get the maximum capability over the kernel
+	 * interface introduced in v3.2 */
+	fd = open("/proc/sys/kernel/cap_last_cap", O_RDONLY);
+	if (fd >= 0) {
+		char buf[32];
+		char *ptr;
+		int n;
+
+		if ((n = read(fd, buf, 31)) >= 0) {
+			buf[n] = '\0';
+			result = strtol(buf, &ptr, 10);
+			if (!ptr || (*ptr != '\0' && *ptr != '\n') ||
+			    result == LONG_MIN || result == LONG_MAX)
+				result = -1;
+		}
+
+		close(fd);
+	}
+
+	/* try to get it manually by trying to get the status of
+	 * each capability individually from the kernel */
+	if (result < 0) {
+		int cap = 0;
+		while (prctl(PR_CAPBSET_READ, cap) >= 0) cap++;
+		result = cap - 1;
+	}
+
+	return result;
+}
+
+int lxc_caps_last_cap(void)
+{
+	static int last_cap = -1;
+	if (last_cap < 0) last_cap = _real_caps_last_cap();
+
+	return last_cap;
+}
diff --git a/src/lxc/caps.h b/src/lxc/caps.h
index 4c07b69..e4e0d42 100644
--- a/src/lxc/caps.h
+++ b/src/lxc/caps.h
@@ -28,6 +28,8 @@ extern int lxc_caps_down(void);
 extern int lxc_caps_up(void);
 extern int lxc_caps_init(void);
 
+extern int lxc_caps_last_cap(void);
+
 #define lxc_priv(__lxc_function)   \
 	({								\
 		int __ret, __ret2, __errno = 0;				\
-- 
1.7.2.5
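
For experimentation outside LXC, the same two-step strategy (read /proc/sys/kernel/cap_last_cap, otherwise probe the bounding set with prctl(PR_CAPBSET_READ)) can be sketched as follows. The function names are hypothetical. The parser is split out so it can be tested without /proc, and it parses into a long and checks errno rather than comparing an int against LONG_MIN/LONG_MAX, which on LP64 systems can never match.

```c
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Parse the text of /proc/sys/kernel/cap_last_cap (e.g. "40\n"); -1 on error. */
int parse_cap_last_cap(const char *buf)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(buf, &end, 10);
	if (errno != 0 || end == buf)
		return -1;	/* overflow or no digits at all */
	if (*end != '\0' && *end != '\n')
		return -1;	/* trailing garbage */
	if (val < 0 || val > INT_MAX)
		return -1;
	return (int)val;
}

/* Fallback for kernels without cap_last_cap: probe the bounding set. */
int probe_cap_last_cap(void)
{
	int cap = 0;

	/* PR_CAPBSET_READ fails (EINVAL) once cap is past the last valid one. */
	while (prctl(PR_CAPBSET_READ, cap, 0, 0, 0) >= 0)
		cap++;
	return cap - 1;
}

int caps_last_cap_sketch(void)
{
	char buf[32];
	ssize_t n;
	int fd, result = -1;

	fd = open("/proc/sys/kernel/cap_last_cap", O_RDONLY);
	if (fd >= 0) {
		n = read(fd, buf, sizeof(buf) - 1);
		if (n > 0) {
			buf[n] = '\0';
			result = parse_cap_last_cap(buf);
		}
		close(fd);
	}
	if (result < 0)
		result = probe_cap_last_cap();
	return result;
}
```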




[lxc-devel] [PATCH] Improve capability handling in LXC

2012-02-01 Thread Christian Seiler
Hi,

I've attached patches that improve capability handling in LXC. I ran into
this when I wanted to disable "dmesg" inside containers on a fairly recent
kernel. Instead of dropping CAP_SYS_ADMIN, as was the case with previous
kernel versions, one is now supposed to drop CAP_SYSLOG. Unfortunately,
LXC doesn't know about it yet.

The attached patches do the following:
 - add CAP_SYSLOG and CAP_WAKE_ALARM to the list of capabilities, since
   they are new
 - add a function that determines the highest capability number
   (CAP_LAST_CAP) supported by the currently running kernel (not the one
   LXC is compiled against)
 - support the specification of numerical IDs for capabilities when using
   lxc.cap.drop. Then, even if LXC doesn't understand the capability or
   was compiled against an older kernel, it is still possible to drop that
   specific capability.

Christian




[lxc-devel] [PATCH 3/4] Add lxc_setup_for_attach function

2012-02-03 Thread Christian Seiler
lxc_setup_for_attach changes the context of the currently running process so
that it matches that of the container it is supposed to attach to
(personality, capabilities).
---
 src/lxc/conf.c |   16 
 src/lxc/conf.h |    2 ++
 2 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/src/lxc/conf.c b/src/lxc/conf.c
index d3c1052..24f10e8 100644
--- a/src/lxc/conf.c
+++ b/src/lxc/conf.c
@@ -1945,3 +1945,19 @@ int lxc_setup(const char *name, struct lxc_conf *lxc_conf)
 
 	return 0;
 }
+
+int lxc_setup_for_attach(const char *name, struct lxc_conf *lxc_conf, int keep_capabilities)
+{
+	if (setup_personality(lxc_conf->personality)) {
+		ERROR("failed to setup personality");
+		return -1;
+	}
+
+	if (!keep_capabilities && setup_caps(&lxc_conf->caps)) {
+		ERROR("failed to drop capabilities");
+		return -1;
+	}
+
+	return 0;
+}
+
diff --git a/src/lxc/conf.h b/src/lxc/conf.h
index 973f694..745a840 100644
--- a/src/lxc/conf.h
+++ b/src/lxc/conf.h
@@ -232,4 +232,6 @@ extern void lxc_delete_tty(struct lxc_tty_info *tty_info);
  */
 
 extern int lxc_setup(const char *name, struct lxc_conf *lxc_conf);
+extern int lxc_setup_for_attach(const char *name, struct lxc_conf *lxc_conf, int keep_capabilities);
+
 #endif
-- 
1.7.2.5
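
Conceptually, lxc_setup_for_attach() does two things: set the personality and, unless told otherwise, remove the configured capabilities from the bounding set. Below is a hypothetical, LXC-independent sketch (`attach_context_sketch()` is not an LXC function) that uses the raw personality(2) and prctl(PR_CAPBSET_DROP) interfaces directly. It assumes the capabilities have already been resolved to numbers. Dropping from the bounding set requires CAP_SETPCAP and mainly limits what a subsequently exec'd program can gain.

```c
#include <stddef.h>
#include <sys/personality.h>
#include <sys/prctl.h>

/* Hypothetical sketch mirroring the shape of lxc_setup_for_attach(). */
int attach_context_sketch(unsigned long persona, const int *drop_caps,
			  size_t n_drop, int keep_capabilities)
{
	size_t i;

	/* personality() returns the previous persona, or -1 on error. */
	if (personality(persona) == -1)
		return -1;

	if (keep_capabilities)
		return 0;

	/* Remove each capability from the bounding set (needs CAP_SETPCAP). */
	for (i = 0; i < n_drop; i++)
		if (prctl(PR_CAPBSET_DROP, drop_caps[i], 0, 0, 0) < 0)
			return -1;
	return 0;
}
```

Passing 0xffffffff to personality() only queries the current persona, which is a convenient way to try the function without changing anything.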




[lxc-devel] [PATCH 4/4] lxc-attach: Change cgroup, personality and drop capabilities when attaching to container

2012-02-03 Thread Christian Seiler
lxc-attach is reworked so that it adds the newly attached process to the
cgroup of the container, changes the personality of the process to that
of the container and drops the capabilities listed in the container
configuration file. The latter can be overridden with a new option that
allows retaining capabilities.

In order to put the new process in the correct cgroup, lxc-attach now uses
synchronization logic similar to lxc-start, i.e. the parent process puts
the child into the cgroup and then tells the child to proceed. The child
then attaches itself to the container's namespaces, changes personality
and drops capabilities. This also implies that fork() is now called before
attaching to the container's namespaces, i.e. only the child process
enters them and the parent is left out.
---
 src/lxc/lxc_attach.c |  151 -
 1 files changed, 135 insertions(+), 16 deletions(-)

diff --git a/src/lxc/lxc_attach.c b/src/lxc/lxc_attach.c
index ed3d5a4..016806d 100644
--- a/src/lxc/lxc_attach.c
+++ b/src/lxc/lxc_attach.c
@@ -35,13 +35,35 @@
 #include "namespace.h"
 #include "caps.h"
 #include "log.h"
+#include "conf.h"
+#include "confile.h"
+#include "start.h"
+#include "sync.h"
+#include "cgroup.h"
 
 lxc_log_define(lxc_attach_ui, lxc);
 
+static struct lxc_list defines;
+
 static const struct option my_longopts[] = {
+	{"rcfile", required_argument, 0, 'f'},
+	{"define", required_argument, 0, 's'},
+	{"keep-capabilities", no_argument, 0, 'k'},
 	LXC_COMMON_OPTIONS
 };
 
+static int keep_capabilities = 0;
+
+static int my_parser(struct lxc_arguments* args, int c, char* arg)
+{
+	switch (c) {
+	case 'f': args->rcfile = arg; break;
+	case 's': return lxc_config_define_add(&defines, arg);
+	case 'k': keep_capabilities = 1; break;
+	}
+	return 0;
+}
+
 static struct lxc_arguments my_args = {
 	.progname = "lxc-attach",
 	.help     = "\
@@ -50,19 +72,31 @@ static struct lxc_arguments my_args = {
 Execute the specified command - enter the container NAME\n\
 \n\
 Options :\n\
-  -n, --name=NAME   NAME for name of the container\n",
+  -n, --name=NAME        NAME for name of the container\n\
+  -f, --rcfile=FILE      Load configuration file FILE\n\
+  -s, --define KEY=VAL   Assign VAL to configuration variable KEY\n\
+  -k, --keep-capabilities Don't drop capabilities when attaching to container\n\
+                         WARNING: This may leak capabilities into the\n\
+                         container, especially if using lxc-attach to\n\
+                         start programs such as sshd or cron.",
 	.options  = my_longopts,
-	.parser   = NULL,
+	.parser   = my_parser,
 	.checker  = NULL,
 };
 
 int main(int argc, char *argv[], char *envp[])
 {
+	int err = -1;
 	int ret;
 	pid_t pid;
 	struct passwd *passwd;
+	struct lxc_conf *conf;
+	struct lxc_handler *handler;
 	uid_t uid;
 	char *curdir;
+	char *rcfile = NULL;
+
+	lxc_list_init(&defines);
 
 	ret = lxc_caps_init();
 	if (ret)
@@ -77,40 +111,96 @@ int main(int argc, char *argv[], char *envp[])
 	if (ret)
 		return ret;
 
+	/* rcfile is specified in the cli option */
+	if (my_args.rcfile)
+		rcfile = (char *)my_args.rcfile;
+	else {
+		int rc;
+
+		rc = asprintf(&rcfile, LXCPATH "/%s/config", my_args.name);
+		if (rc == -1) {
+			SYSERROR("failed to allocate memory");
+			return err;
+		}
+
+		/* container configuration does not exist */
+		if (access(rcfile, F_OK)) {
+			free(rcfile);
+			rcfile = NULL;
+		}
+	}
+
+	conf = lxc_conf_init();
+	if (!conf) {
+		ERROR("failed to initialize configuration");
+		return err;
+	}
+
+	if (rcfile && lxc_config_read(rcfile, conf)) {
+		ERROR("failed to read configuration file");
+		return err;
+	}
+
+	if (lxc_config_define_load(&defines, conf))
+		return err;
+
 	pid = get_init_pid(my_args.name);
 	if (pid < 0) {
 		ERROR("failed to get the init pid");
 		return -1;
 	}
 
-	curdir = get_current_dir_name();
-
-	ret = lxc_attach(pid);
-	if (ret < 0) {
-		ERROR("failed to enter the namespace");
+	/* hack: we need sync.h infrastructure - and that needs a handler */
+	handler = malloc(sizeof(*handler));
+	if (!handler) {
+		ERROR("failed to allocate memory for internal handle");
 		return -1;
 	}
 
-	if (curdir && chdir(curdir))
-		WARN("could not change directory to '%s'", curdir);
+	memset(handler, 0, sizeof(*handler));
 
- 
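
The parent/child hand-off described in the commit message can be illustrated with a plain pipe. This is a hypothetical sketch, not the sync.h code LXC actually uses: the child blocks until the parent has finished its setup step (in lxc-attach, putting the child's pid into the container's cgroup) and only then continues. `fork_and_sync_sketch()` and the example callbacks are names made up for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork, let the parent run parent_setup(child_pid), then release the child,
 * which runs child_main(). Returns the child's exit status, or -1 on failure.
 */
int fork_and_sync_sketch(int (*parent_setup)(pid_t child), int (*child_main)(void))
{
	int fds[2], status;
	char token = 'g';
	pid_t pid;

	if (pipe(fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		close(fds[1]);
		/* read() returns 0 (EOF) if the parent gave up without signalling. */
		if (read(fds[0], &token, 1) != 1)
			_exit(127);
		close(fds[0]);
		/* lxc-attach would now enter the namespaces, set the
		 * personality and drop capabilities before exec'ing. */
		_exit(child_main());
	}

	close(fds[0]);
	if (parent_setup(pid) < 0 || write(fds[1], &token, 1) != 1) {
		close(fds[1]);	/* child sees EOF and exits */
		waitpid(pid, &status, 0);
		return -1;
	}
	close(fds[1]);

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Example callbacks for trying the sketch out. */
int example_setup_ok(pid_t child) { (void)child; return 0; }
int example_setup_fail(pid_t child) { (void)child; return -1; }
int example_child(void) { return 7; }
```

The important property is ordering: the child cannot run anything container-related until the parent has moved it into the cgroup, so there is no window in which it escapes the container's device restrictions.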

[lxc-devel] [PATCH 2/4] Add lxc_cgroup_attach function

2012-02-03 Thread Christian Seiler
This commit adds the lxc_cgroup_attach function that adds a pid to the tasks
file of a specific cgroup in all subsystems. This is required for lxc-attach
to be able to put newly started processes in the same cgroup as the
container.
---
 src/lxc/cgroup.c |   47 +++
 src/lxc/cgroup.h |    1 +
 2 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/src/lxc/cgroup.c b/src/lxc/cgroup.c
index 6ae67bd..db5d2fa 100644
--- a/src/lxc/cgroup.c
+++ b/src/lxc/cgroup.c
@@ -433,3 +433,50 @@ int lxc_cgroup_nrtasks(const char *name)
 
 	return count;
 }
+
+int lxc_cgroup_attach(const char *name, pid_t pid)
+{
+	char cgname[MAXPATHLEN];
+	struct mntent *mntent;
+	FILE *file = NULL;
+	int err = -1;
+	int found = 0;
+
+	file = setmntent(MTAB, "r");
+	if (!file) {
+		SYSERROR("failed to open %s", MTAB);
+		return -1;
+	}
+
+	while ((mntent = getmntent(file))) {
+
+		DEBUG("checking '%s' (%s)", mntent->mnt_dir, mntent->mnt_type);
+
+		if (!strcmp(mntent->mnt_type, "cgroup")) {
+
+			INFO("[%d] found cgroup mounted at '%s',opts='%s'",
+			     ++found, mntent->mnt_dir, mntent->mnt_opts);
+
+			snprintf(cgname, MAXPATHLEN, "%s/%s",
+				 mntent->mnt_dir, name);
+
+			if (access(cgname, F_OK)) {
+				ERROR("No cgroup '%s' found "
+				      "in subsystem mounted at '%s'",
+				      name, mntent->mnt_dir);
+				goto out;
+			}
+
+			err = cgroup_attach(cgname, pid);
+			if (err)
+				goto out;
+		}
+	};
+
+	if (!found)
+		ERROR("No cgroup mounted on the system");
+
+out:
+	endmntent(file);
+	return err;
+}
diff --git a/src/lxc/cgroup.h b/src/lxc/cgroup.h
index 31dd2de..3c90696 100644
--- a/src/lxc/cgroup.h
+++ b/src/lxc/cgroup.h
@@ -30,5 +30,6 @@ extern int lxc_cgroup_create(const char *name, pid_t pid);
 extern int lxc_cgroup_destroy(const char *name);
 extern int lxc_cgroup_path_get(char **path, const char *subsystem, const char *name);
 extern int lxc_cgroup_nrtasks(const char *name);
+extern int lxc_cgroup_attach(const char *name, pid_t pid);
 extern int lxc_ns_is_mounted(void);
 #endif
-- 
1.7.2.5




[lxc-devel] [PATCH 1/4] Add missing 'extern' keyword to functions defined in cgroup.h

2012-02-03 Thread Christian Seiler
---
 src/lxc/cgroup.h |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/lxc/cgroup.h b/src/lxc/cgroup.h
index 188d948..31dd2de 100644
--- a/src/lxc/cgroup.h
+++ b/src/lxc/cgroup.h
@@ -26,9 +26,9 @@
 #define MAXPRIOLEN 24
 
 struct lxc_handler;
-int lxc_cgroup_create(const char *name, pid_t pid);
-int lxc_cgroup_destroy(const char *name);
-int lxc_cgroup_path_get(char **path, const char *subsystem, const char *name);
-int lxc_cgroup_nrtasks(const char *name);
-int lxc_ns_is_mounted(void);
+extern int lxc_cgroup_create(const char *name, pid_t pid);
+extern int lxc_cgroup_destroy(const char *name);
+extern int lxc_cgroup_path_get(char **path, const char *subsystem, const char *name);
+extern int lxc_cgroup_nrtasks(const char *name);
+extern int lxc_ns_is_mounted(void);
 #endif
-- 
1.7.2.5




[lxc-devel] [PATCH] lxc-attach: Consider cgroup, personality and capabilities when attaching processes to a container

2012-02-03 Thread Christian Seiler
Hi,

As I didn't hear anything on this issue, I looked at it more closely and
found that not only are capabilities currently not dropped from
within lxc, but also the personality is not set correctly and the newly
started process is not put in the correct cgroup (circumventing e.g. device
restrictions!) when using lxc-attach.

I've created a set of patches that make sure that every attached process:

 - is put in the correct cgroup of the container
 - has the correct personality set
 - drops its capabilities

I also added the -f and -s switches to lxc-attach, because it now needs to
read the same configuration file as lxc-start to determine the capabilities
and personality. Additionally, lxc-attach now has a -k switch, which stops
it from dropping the capabilities, so an administrator on the outside can
use it to reconfigure things in the container that he would otherwise not
be able to.

I hope you are agreeable to this improvement being merged.

Thanks,
Christian

PS: I didn't get any reply to my previous email either: is there any
progress on pushing the last few patches required for lxc-attach to work
into the upstream Linux kernel?




Re: [lxc-devel] [PATCH 1/4] Add missing 'extern' keyword to functions defined in cgroup.h

2012-02-03 Thread Christian Seiler
Hi,

> Note that "extern" keyword on function declarations has no
> effect whatsoever.

Yes, but I personally think it's good practice to always put it there:
it doesn't cause any harm, and without that habit one may forget the
keyword on variable declarations, where it really matters. Also, lxc
uses 'extern' throughout the rest of the code base, so I see this part
of my patch as just harmonizing the file with the coding standards of the
rest of the project.

Regards,
Christian




Re: [lxc-devel] [PATCH] lxc-attach: Consider cgroup, personality and capabilities when attaching processes to a container

2012-02-05 Thread Christian Seiler
Hi Daniel,

> thanks for your patches and your analysis.
> 
> IMO, we have to take into account the process we want to attach could be 
> an admin task and this one may want to have the full permissions within 
> the container. Also that could be an external daemon with the same 
> permissions as the container's processes. So inheriting should be 
> optional as it is up to the administrator to do the right action.

Yes, that's why I added the --keep-capabilities option to lxc-attach, to
make it possible for the administrator to execute a process inside the
container with higher permissions.

However, I only included capabilities there; it's true that cgroups may
impose an additional constraint. (Especially the device cgroup
controller.) On the other hand, the personality (which in LXC context
essentially means the architecture such as x86-64 vs. x86-32) is not
something I see as a "permission", but rather as a general property of
the container.

So the approach would then be:

 - default behaviour: use same restrictions as container
 - command line flag that allows one to ignore cgroups and capabilities
 - command line option to choose any architecture that's supported by
   the current running kernel (defaults to the arch of the container)

I do strongly think the default behaviour should be to use the same
restrictions as the container, as I see that as the primary use case.
Take for example

lxc-attach -n container -- /etc/init.d/sshd restart

Run with elevated privileges, this could easily leak them into the
container through the restarted daemon; the admin should explicitly state
that he/she wants to use elevated privileges if required.

> The parsing of the configuration file is right at the moment the 
> container has a configuration file and we did not launched the container 
> with the -s lxc.. options, or we did not modify the configuration file 
> after the container is launched.
> 
> I think it is much more sane to retrieve the needed informations from:
> 
>   * /proc//status : for the capabilities
>   * /proc//cgroup
>   * /proc//personality
> 
> Where  is the init pid of the container we can get through 
> get_init_pid function.

Yes, that seems like a reasonable approach. I'd rework the patches as
follows:

No flags: container's privileges according to /proc
-e/--elevated-privileges: maximum privileges (cgroup, capabilities)
-a x86/--arch=x86: manually specify the architecture
  (defaults to the container's arch)

Is that agreeable?

Regards,
Christian



[lxc-devel] [PATCH 4/9] cgroup: Make cgroup_attach a public function

2012-02-09 Thread Christian Seiler
lxc-attach needs to be able to attach a process to a specific cgroup, so
cgroup_attach is renamed to lxc_cgroup_attach and now also defined in the
header file.
---
 src/lxc/cgroup.c |4 ++--
 src/lxc/cgroup.h |1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/lxc/cgroup.c b/src/lxc/cgroup.c
index 2e04c79..a8e6c27 100644
--- a/src/lxc/cgroup.c
+++ b/src/lxc/cgroup.c
@@ -170,7 +170,7 @@ static int cgroup_enable_clone_children(const char *path)
return ret;
 }
 
-static int cgroup_attach(const char *path, pid_t pid)
+int lxc_cgroup_attach(const char *path, pid_t pid)
 {
FILE *f;
char tasks[MAXPATHLEN];
@@ -250,7 +250,7 @@ static int lxc_one_cgroup_create(const char *name,
}
 
/* Let's add the pid to the 'tasks' file */
-   if (cgroup_attach(cgname, pid)) {
+   if (lxc_cgroup_attach(cgname, pid)) {
SYSERROR("failed to attach pid '%d' to '%s'", pid, cgname);
rmdir(cgname);
return -1;
diff --git a/src/lxc/cgroup.h b/src/lxc/cgroup.h
index 31dd2de..611d9f4 100644
--- a/src/lxc/cgroup.h
+++ b/src/lxc/cgroup.h
@@ -30,5 +30,6 @@ extern int lxc_cgroup_create(const char *name, pid_t pid);
 extern int lxc_cgroup_destroy(const char *name);
 extern int lxc_cgroup_path_get(char **path, const char *subsystem, const char *name);
 extern int lxc_cgroup_nrtasks(const char *name);
+extern int lxc_cgroup_attach(const char *path, pid_t pid);
 extern int lxc_ns_is_mounted(void);
 #endif
-- 
1.7.2.5




[lxc-devel] [PATCH v2] lxc-attach: Consider cgroups/personality/capabilities of container

2012-02-09 Thread Christian Seiler
Hi,

This is the new version of my patch that implements the features discussed
in the previous thread.

 - The current status of the container is now read from /proc/init_pid/*,
   where init_pid is the pid of the container's init process.
 - By default:
* The attached process acquires the personality of the container (i.e.
  architecture: 32bit vs. 64bit)
* The attached process drops its capabilities according to those of the
  container
* The attached process is put into the same cgroup as the container
  itself
 - Overrides:
* -a/--arch option to manually set the architecture which the attached
  process sees
* -e/--elevated-privileges option to stop the attached process from being
  put in the same cgroup as the container and to let it retain the
  capability bounding set it already possesses.
 - Add a manual page for lxc-attach(1)

Regards,
Christian




[lxc-devel] [PATCH 8/9] lxc-attach: Drop privileges when attaching to container unless requested otherwise

2012-02-09 Thread Christian Seiler
lxc-attach will now put the process that is attached to the container into
the correct cgroups corresponding to the container, set the correct
personality and drop the privileges.

The information is extracted from entries in /proc of the init process of
the container. Note that this relies on the (reasonable) assumption that the
init process does not in fact drop additional capabilities from its bounding
set.
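The bounding set read here comes from the `CapBnd:` line of `/proc/<pid>/status`, a hexadecimal mask. A minimal sketch of that parsing step, operating on the file's text so it can be exercised without a running container; `parse_capbnd` is a hypothetical name:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: locate the "CapBnd:" line in the text of a
 * /proc/<pid>/status file and parse its hexadecimal mask.
 * Returns 0 on success, -1 if the line is missing or malformed. */
int parse_capbnd(const char *status_text, unsigned long long *mask)
{
	const char *line = status_text;

	while (line && *line) {
		/* The space in the format also matches the kernel's tab. */
		if (strncmp(line, "CapBnd:", 7) == 0)
			return sscanf(line, "CapBnd: %llx", mask) == 1 ? 0 : -1;

		line = strchr(line, '\n');
		if (line)
			line++; /* advance to the start of the next line */
	}
	return -1;
}
```

In practice the text would be read from `/proc/<init_pid>/status`, for instance line by line with getline() as the patch series does.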

Additionally, two command line options are added to lxc-attach: one
(-e, --elevated-privileges) to prevent the capabilities from being dropped
and the process from being put into the cgroup, and a second one
(-a, --arch) to explicitly state the architecture which the process will
see, which defaults to the container's current architecture.
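The diff below orders parent and child with LXC's sync helpers: the child finishes configuring itself, the parent then moves it into the container's cgroups, and only then does the child continue. A minimal sketch of that handshake over a plain socketpair(); `run_handshake` and the one-byte messages are assumptions for illustration, not the protocol of LXC's sync.c:

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: returns 0 if the child completed both phases. */
int run_handshake(void)
{
	int sv[2], status;
	char msg;
	pid_t pid;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv))
		return -1;

	pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	if (pid == 0) {
		close(sv[0]);
		/* child: set personality etc. here */
		msg = 'C'; /* "configured" */
		if (write(sv[1], &msg, 1) != 1)
			_exit(1);
		/* Block until the parent has applied the cgroup restrictions. */
		if (read(sv[1], &msg, 1) != 1 || msg != 'G')
			_exit(1);
		/* child: drop capabilities and exec the command here */
		_exit(0);
	}

	close(sv[1]);
	if (read(sv[0], &msg, 1) != 1 || msg != 'C')
		goto fail;
	/* parent: attach pid to the container's cgroups here */
	msg = 'G'; /* "go ahead" */
	if (write(sv[0], &msg, 1) != 1)
		goto fail;
	close(sv[0]);

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFEXITED(status) && WEXITSTATUS(status) == 0 ? 0 : -1;

fail:
	close(sv[0]); /* child's pending read() then returns 0 and it exits */
	waitpid(pid, &status, 0);
	return -1;
}
```

Doing the cgroup step only after the child reports "configured" mirrors the patch's reasoning: the cgroup may restrict what the child could otherwise still set up.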
---
 src/lxc/lxc_attach.c |  115 --
 1 files changed, 102 insertions(+), 13 deletions(-)

diff --git a/src/lxc/lxc_attach.c b/src/lxc/lxc_attach.c
index c8643d1..3571b09 100644
--- a/src/lxc/lxc_attach.c
+++ b/src/lxc/lxc_attach.c
@@ -29,19 +29,45 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "attach.h"
 #include "commands.h"
 #include "arguments.h"
 #include "caps.h"
+#include "attach.h"
+#include "confile.h"
+#include "start.h"
+#include "sync.h"
 #include "log.h"
 
 lxc_log_define(lxc_attach_ui, lxc);
 
 static const struct option my_longopts[] = {
+   {"elevated-privileges", no_argument, 0, 'e'},
+   {"arch", required_argument, 0, 'a'},
LXC_COMMON_OPTIONS
 };
 
+static int elevated_privileges = 0;
+static signed long new_personality = -1;
+
+static int my_parser(struct lxc_arguments* args, int c, char* arg)
+{
+   switch (c) {
+   case 'e': elevated_privileges = 1; break;
+   case 'a':
+   new_personality = lxc_config_parse_arch(arg);
+   if (new_personality < 0) {
+   lxc_error(args, "invalid architecture specified: %s", arg);
+   return -1;
+   }
+   break;
+   }
+
+   return 0;
+}
+
 static struct lxc_arguments my_args = {
.progname = "lxc-attach",
.help = "\
@@ -50,17 +76,26 @@ static struct lxc_arguments my_args = {
 Execute the specified command - enter the container NAME\n\
 \n\
 Options :\n\
-  -n, --name=NAME   NAME for name of the container\n",
+  -n, --name=NAME   NAME for name of the container\n\
+  -e, --elevated-privileges\n\
+Use elevated privileges (capabilities, cgroup\n\
+restrictions) instead of those of the container.\n\
+WARNING: This may leak privileges into the container.\n\
+Use with care.\n\
+  -a, --arch=ARCH   Use ARCH for program instead of container's own\n\
+architecture.\n",
.options  = my_longopts,
-   .parser   = NULL,
+   .parser   = my_parser,
.checker  = NULL,
 };
 
 int main(int argc, char *argv[], char *envp[])
 {
int ret;
-   pid_t pid;
+   pid_t pid, init_pid;
struct passwd *passwd;
+   struct lxc_proc_context_info *init_ctx;
+   struct lxc_handler *handler;
uid_t uid;
char *curdir;
 
@@ -77,24 +112,25 @@ int main(int argc, char *argv[], char *envp[])
if (ret)
return ret;
 
-   pid = get_init_pid(my_args.name);
-   if (pid < 0) {
+   init_pid = get_init_pid(my_args.name);
+   if (init_pid < 0) {
ERROR("failed to get the init pid");
return -1;
}
 
-   curdir = get_current_dir_name();
-
-   ret = lxc_attach_to_ns(pid);
-   if (ret < 0) {
-   ERROR("failed to enter the namespace");
+   init_ctx = lxc_proc_get_context_info(init_pid);
+   if (!init_ctx) {
+   ERROR("failed to get context of the init process, pid = %d", init_pid);
return -1;
}
 
-   if (curdir && chdir(curdir))
-   WARN("could not change directory to '%s'", curdir);
+   /* hack: we need sync.h infrastructure - and that needs a handler */
+   handler = calloc(1, sizeof(*handler));
 
-   free(curdir);
+   if (lxc_sync_init(handler)) {
+   ERROR("failed to initialize synchronization socket");
+   return -1;
+   }
 
pid = fork();
 
@@ -106,6 +142,23 @@ int main(int argc, char *argv[], char *envp[])
if (pid) {
int status;
 
+   lxc_sync_fini_child(handler);
+
+   /* wait until the child has done configuring itself before
+* we put it in a cgroup that potentially limits these
+* possibilities */
+   if (lxc_sync_wait_child(handler, LXC_SYNC_CONFIGURE))
+   return -1;
+
+   if (!elevated_privileges && lxc_attach_proc_to_cgroups(pid, init_ctx))
+   return -1;
+
+   /* tell the child we are done initializing */
+   if (lxc_

[lxc-devel] [PATCH 7/9] Move lxc_attach from namespace.c to attach.c and rename it to lxc_attach_to_ns

2012-02-09 Thread Christian Seiler
Since the lxc-attach helper functions now have their own source file,
lxc_attach is moved from namespace.c to attach.c and is renamed to
lxc_attach_to_ns, because that better reflects what the function does
(attaching to a container can also involve setting the process's
personality, adding it to the corresponding cgroups and dropping specific
capabilities).
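For reference, a minimal sketch of the namespace-joining step in the style of lxc_attach_to_ns, using the glibc setns() wrapper (glibc 2.14 or later assumed) instead of a raw syscall. Unlike the function in this patch, it closes every descriptor on all paths; `join_namespaces` is a hypothetical name. Joining most of these namespaces requires CAP_SYS_ADMIN, so expect failures when unprivileged.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <limits.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int join_namespaces(pid_t pid)
{
	static const char *ns[] = { "pid", "mnt", "net", "ipc", "uts" };
	enum { NR_NS = sizeof(ns) / sizeof(ns[0]) };
	int fd[NR_NS];
	char path[PATH_MAX];
	int i, ret = 0;

	/* Open everything first: once the mount namespace is joined,
	 * the host's /proc/<pid> path may no longer resolve. */
	for (i = 0; i < NR_NS; i++) {
		snprintf(path, sizeof(path), "/proc/%d/ns/%s", (int)pid, ns[i]);
		fd[i] = open(path, O_RDONLY);
		if (fd[i] < 0) {
			while (--i >= 0)
				close(fd[i]);
			return -1;
		}
	}

	for (i = 0; i < NR_NS; i++) {
		/* nstype 0: accept whatever namespace type the fd refers to */
		if (ret == 0 && setns(fd[i], 0))
			ret = -1;
		close(fd[i]); /* always close, even after a failure */
	}
	return ret;
}
```

Joining a pid namespace only affects children created afterwards, which is consistent with lxc-attach forking after the namespaces have been entered.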
---
 src/lxc/attach.c |   35 +++
 src/lxc/attach.h |1 +
 src/lxc/lxc_attach.c |4 ++--
 src/lxc/namespace.c  |   47 ---
 src/lxc/namespace.h  |1 -
 5 files changed, 38 insertions(+), 50 deletions(-)

diff --git a/src/lxc/attach.c b/src/lxc/attach.c
index 9392116..0cd3a54 100644
--- a/src/lxc/attach.c
+++ b/src/lxc/attach.c
@@ -226,6 +226,41 @@ int lxc_attach_proc_to_cgroups(pid_t pid, struct lxc_proc_context_info *ctx)
return 0;
 }
 
+int lxc_attach_to_ns(pid_t pid)
+{
+   char path[MAXPATHLEN];
+   char *ns[] = { "pid", "mnt", "net", "ipc", "uts" };
+   const int size = sizeof(ns) / sizeof(char *);
+   int fd[size];
+   int i;
+
+   snprintf(path, MAXPATHLEN, "/proc/%d/ns", pid);
+   if (access(path, X_OK)) {
+   ERROR("Does this kernel version support 'attach' ?");
+   return -1;
+   }
+
+   for (i = 0; i < size; i++) {
+   snprintf(path, MAXPATHLEN, "/proc/%d/ns/%s", pid, ns[i]);
+   fd[i] = open(path, O_RDONLY);
+   if (fd[i] < 0) {
+   SYSERROR("failed to open '%s'", path);
+   return -1;
+   }
+   }
+
+   for (i = 0; i < size; i++) {
+   if (setns(fd[i], 0)) {
+   SYSERROR("failed to set namespace '%s'", ns[i]);
+   return -1;
+   }
+
+   close(fd[i]);
+   }
+
+   return 0;
+}
+
 int lxc_attach_drop_privs(struct lxc_proc_context_info *ctx)
 {
int last_cap = lxc_caps_last_cap();
diff --git a/src/lxc/attach.h b/src/lxc/attach.h
index 7e67455..d2b7533 100644
--- a/src/lxc/attach.h
+++ b/src/lxc/attach.h
@@ -42,6 +42,7 @@ extern struct lxc_proc_context_info *lxc_proc_get_context_info(pid_t pid);
 extern void lxc_proc_free_context_info(struct lxc_proc_context_info *info);
 
 extern int lxc_attach_proc_to_cgroups(pid_t pid, struct lxc_proc_context_info *ctx);
+extern int lxc_attach_to_ns(pid_t other_pid);
 extern int lxc_attach_drop_privs(struct lxc_proc_context_info *ctx);
 
 #endif
diff --git a/src/lxc/lxc_attach.c b/src/lxc/lxc_attach.c
index ed3d5a4..c8643d1 100644
--- a/src/lxc/lxc_attach.c
+++ b/src/lxc/lxc_attach.c
@@ -30,9 +30,9 @@
 #include 
 #include 
 
+#include "attach.h"
 #include "commands.h"
 #include "arguments.h"
-#include "namespace.h"
 #include "caps.h"
 #include "log.h"
 
@@ -85,7 +85,7 @@ int main(int argc, char *argv[], char *envp[])
 
curdir = get_current_dir_name();
 
-   ret = lxc_attach(pid);
+   ret = lxc_attach_to_ns(pid);
if (ret < 0) {
ERROR("failed to enter the namespace");
return -1;
diff --git a/src/lxc/namespace.c b/src/lxc/namespace.c
index 6512685..3e6fc3a 100644
--- a/src/lxc/namespace.c
+++ b/src/lxc/namespace.c
@@ -34,8 +34,6 @@
 #include "namespace.h"
 #include "log.h"
 
-#include "setns.h"
-
 lxc_log_define(lxc_namespace, lxc);
 
 struct clone_arg {
@@ -43,16 +41,6 @@ struct clone_arg {
void *arg;
 };
 
-int setns(int fd, int nstype)
-{
-#ifndef __NR_setns
-   errno = ENOSYS;
-   return -1;
-#else
-   return syscall(__NR_setns, fd, nstype);
-#endif
-}
-
 static int do_clone(void *arg)
 {
struct clone_arg *clone_arg = arg;
@@ -81,38 +69,3 @@ pid_t lxc_clone(int (*fn)(void *), void *arg, int flags)
 
return ret;
 }
-
-int lxc_attach(pid_t pid)
-{
-   char path[MAXPATHLEN];
-   char *ns[] = { "pid", "mnt", "net", "ipc", "uts" };
-   const int size = sizeof(ns) / sizeof(char *);
-   int fd[size];
-   int i;
-
-   sprintf(path, "/proc/%d/ns", pid);
-   if (access(path, X_OK)) {
-   ERROR("Does this kernel version support 'attach' ?");
-   return -1;
-   }
-
-   for (i = 0; i < size; i++) {
-   sprintf(path, "/proc/%d/ns/%s", pid, ns[i]);
-   fd[i] = open(path, O_RDONLY);
-   if (fd[i] < 0) {
-   SYSERROR("failed to open '%s'", path);
-   return -1;
-   }
-   }
-
-   for (i = 0; i < size; i++) {
-   if (setns(fd[i], 0)) {
-   SYSERROR("failed to set namespace '%s'", ns[i]);
-   return -1;
-   }
-
-   close(fd[i]);
-   }
-
-   return 0;
-}
diff --git a/src/lxc/namespace.h b/src/lxc/namespace.h
index 9c6b7ec..5442dd3 100644
--- a/src/lxc/namespace.h
+++ b/src/lxc/namespace.h
@@ -49,6 +49,5 @@
 #endif
 
 extern pid_t lxc_clone(int (*fn)(void *), vo

[lxc-devel] [PATCH 9/9] Add man page for lxc-attach

2012-02-09 Thread Christian Seiler
---
 configure.ac   |1 +
 doc/Makefile.am|1 +
 doc/lxc-attach.sgml.in |  189 
 doc/see_also.sgml.in   |5 ++
 4 files changed, 196 insertions(+), 0 deletions(-)
 create mode 100644 doc/lxc-attach.sgml.in

diff --git a/configure.ac b/configure.ac
index 02f652b..f43dc07 100644
--- a/configure.ac
+++ b/configure.ac
@@ -120,6 +120,7 @@ AC_CONFIG_FILES([
doc/lxc-ps.sgml
doc/lxc-cgroup.sgml
doc/lxc-kill.sgml
+   doc/lxc-attach.sgml
doc/lxc.conf.sgml
doc/lxc.sgml
doc/common_options.sgml
diff --git a/doc/Makefile.am b/doc/Makefile.am
index 8530ee9..b18c5eb 100644
--- a/doc/Makefile.am
+++ b/doc/Makefile.am
@@ -23,6 +23,7 @@ man_MANS = \
lxc-ps.1 \
lxc-cgroup.1 \
lxc-kill.1 \
+   lxc-attach.1 \
\
lxc.conf.5 \
\
diff --git a/doc/lxc-attach.sgml.in b/doc/lxc-attach.sgml.in
new file mode 100644
index 000..39181ba
--- /dev/null
+++ b/doc/lxc-attach.sgml.in
@@ -0,0 +1,189 @@
+
+
+
+
+]>
+
+
+
+  @LXC_GENERATE_DATE@
+
+  
+lxc-attach
+1
+  
+
+  
+lxc-attach
+
+
+  start a process inside a running container.
+
+  
+
+  
+lxc-attach -n
+name -a
+arch -e
+-- command
+  
+
+  
+Description
+
+
+  lxc-attach runs the specified
+  command inside the container
+  specified by name. The container
+  has to be running already.
+
+
+  If no command is specified, the
+  current default shell of the user running
+  lxc-attach will be looked up inside the
+  container and executed. This will fail if no such user exists
+  inside the container or the container does not have a working
+  nsswitch mechanism.
+
+
+  
+
+  
+
+Options
+
+
+
+  
+   
+ -a, --arch arch
+   
+   
+ 
+   Specify the architecture which the kernel should appear to be
+   running as to the command executed. This option will accept the
+   same settings as the lxc.arch option in
+   container configuration files, see
+   
+ lxc.conf
+ 5
+   . By default, the current architecture of the
+   running container will be used.
+ 
+   
+  
+
+  
+   
+ -e, --elevated-privileges
+   
+   
+ 
+   Do not drop privileges when running
+   command inside the container. If
+   this option is specified, the new process will
+   not be added to the container's cgroup(s)
+   and it will not drop its capabilities before executing.
+ 
+ 
+   Warning: This may leak privileges into the
+   container if the command starts subprocesses that remain active
+   after the main process that was attached is terminated. The
+   (re-)starting of daemons inside the container is problematic,
+   especially if the daemon starts a lot of subprocesses such as
+   cron or sshd.
+   Use with great care.
+ 
+   
+  
+
+
+
+  
+
+  &commonoptions;
+
+  
+Examples
+  
+To spawn a new shell running inside an existing container, use
+
+  lxc-attach -n container
+
+  
+  
+To restart the cron service of a running Debian container, use
+
+  lxc-attach -n container -- /etc/init.d/cron restart
+
+  
+  
+To delete the network link eth1 of a running container that
+does not have the NET_ADMIN capability, use the -e
+option to retain elevated capabilities:
+
+  lxc-attach -n container -e -- /sbin/ip link delete eth1
+
+  
+
+
+  
+
+  
+Security
+
+  The -e option should be used with care, as it may break
+  the isolation of the containers if used improperly.
+
+  
+
+  &seealso;
+
+  
+Author
+Daniel Lezcano daniel.lezc...@free.fr
+  
+
+
+
+
diff --git a/doc/see_also.sgml.in b/doc/see_also.sgml.in
index 78b99b4..e400e8b 100644
--- a/doc/see_also.sgml.in
+++ b/doc/see_also.sgml.in
@@ -108,6 +108,11 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
   ,
 
   
+   lxc-attach
+   1
+  ,
+
+  
lxc.conf
5
   
-- 
1.7.2.5




[lxc-devel] [PATCH 5/9] Add lxc_config_parse_arch to parse architecture strings

2012-02-09 Thread Christian Seiler
Add the function lxc_config_parse_arch that parses an architecture string
(x86, i686, x86_64, amd64) and returns the corresponding personality. This
is required for lxc-attach, which accepts architectures independently of
lxc.arch. The parsing of lxc.arch now also uses the same function to ensure
consistency.
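A minimal sketch of the same table-driven lookup, paired with how a caller might apply the result through personality(2), typically in the forked child before exec. `parse_arch` and `apply_arch` are hypothetical names; the architecture strings are the ones listed in this patch.

```c
#include <stddef.h>
#include <string.h>
#include <sys/personality.h>

/* Map an architecture name to a personality value, or -1 if unknown. */
long parse_arch(const char *arch)
{
	static const struct {
		const char *name;
		unsigned long per;
	} table[] = {
		{ "x86",    PER_LINUX32 },
		{ "i686",   PER_LINUX32 },
		{ "x86_64", PER_LINUX },
		{ "amd64",  PER_LINUX },
	};
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(table[i].name, arch) == 0)
			return (long)table[i].per;
	return -1;
}

/* Apply a parsed personality to the calling process.
 * Returns 0 on success, -1 on an unknown name or a failed syscall. */
int apply_arch(const char *arch)
{
	long per = parse_arch(arch);

	if (per < 0)
		return -1;
	return personality((unsigned long)per) == -1 ? -1 : 0;
}
```

Returning a signed value lets -1 signal "unknown", which works because the personalities used here are small non-negative constants.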
---
 src/lxc/confile.c |   52 +---
 src/lxc/confile.h |3 +++
 2 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/src/lxc/confile.c b/src/lxc/confile.c
index 550102c..1adce91 100644
--- a/src/lxc/confile.c
+++ b/src/lxc/confile.c
@@ -37,6 +37,7 @@
 #include 
 
 #include "parse.h"
+#include "confile.h"
 #include "utils.h"
 
 #include 
@@ -584,30 +585,12 @@ static int config_network_script(const char *key, char *value,
 static int config_personality(const char *key, char *value,
  struct lxc_conf *lxc_conf)
 {
-   struct per_name {
-   char *name;
-   int per;
-   } pername[4] = {
-   { "x86", PER_LINUX32 },
-   { "i686", PER_LINUX32 },
-   { "x86_64", PER_LINUX },
-   { "amd64", PER_LINUX },
-   };
-   size_t len = sizeof(pername) / sizeof(pername[0]);
+   signed long personality = lxc_config_parse_arch(value);
 
-   int i;
-
-   for (i = 0; i < len; i++) {
-
-   if (strcmp(pername[i].name, value))
-   continue;
-
-   lxc_conf->personality = pername[i].per;
-
-   return 0;
-   }
-
-   WARN("unsupported personality '%s'", value);
+   if (personality >= 0)
+   lxc_conf->personality = personality;
+   else
+   WARN("unsupported personality '%s'", value);
 
return 0;
 }
@@ -974,3 +957,26 @@ int lxc_config_define_load(struct lxc_list *defines, struct lxc_conf *conf)
 
return ret;
 }
+
+signed long lxc_config_parse_arch(const char *arch)
+{
+   struct per_name {
+   char *name;
+   unsigned long per;
+   } pername[4] = {
+   { "x86", PER_LINUX32 },
+   { "i686", PER_LINUX32 },
+   { "x86_64", PER_LINUX },
+   { "amd64", PER_LINUX },
+   };
+   size_t len = sizeof(pername) / sizeof(pername[0]);
+
+   int i;
+
+   for (i = 0; i < len; i++) {
+   if (!strcmp(pername[i].name, arch))
+   return pername[i].per;
+   }
+
+   return -1;
+}
diff --git a/src/lxc/confile.h b/src/lxc/confile.h
index f415e55..d2faa75 100644
--- a/src/lxc/confile.h
+++ b/src/lxc/confile.h
@@ -34,4 +34,7 @@ extern int lxc_config_define_add(struct lxc_list *defines, char* arg);
 extern int lxc_config_define_load(struct lxc_list *defines,
  struct lxc_conf *conf);
 
+/* needed for lxc-attach */
+extern signed long lxc_config_parse_arch(const char *arch);
+
 #endif
-- 
1.7.2.5




[lxc-devel] [PATCH 3/9] Enable get_cgroup_mount to search for mount points satisfying multiple subsystems at once

2012-02-09 Thread Christian Seiler
lxc-attach functionality reads /proc/init_pid/cgroup to determine the cgroup
of the container for a given subsystem. However, since subsystems may be
mounted together, we want to be on the safe side and be sure that we really
find the correct mount point, so we allow get_cgroup_mount to check for
*all* the subsystems; the subsystem parameter may now be a comma-separated
list.
---
 src/lxc/cgroup.c |   31 ++-
 1 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/src/lxc/cgroup.c b/src/lxc/cgroup.c
index 6ae67bd..2e04c79 100644
--- a/src/lxc/cgroup.c
+++ b/src/lxc/cgroup.c
@@ -52,6 +52,35 @@ enum {
CGROUP_CLONE_CHILDREN,
 };
 
+static char *hasmntopt_multiple(struct mntent *mntent, const char *options)
+{
+   const char *ptr = options;
+   const char *ptr2 = strchr(options, ',');
+   char *result;
+
+   while (ptr2 != NULL) {
+   char *option = strndup(ptr, ptr2 - ptr);
+   if (!option) {
+   SYSERROR("Temporary memory allocation error");
+   return NULL;
+   }
+
+   result = hasmntopt(mntent, option);
+   free(option);
+
+   if (!result) {
+   return NULL;
+   }
+
+   ptr = ptr2 + 1;
+   ptr2 = strchr(ptr, ',');
+   }
+
+   /* for multiple mount options, the return value is basically NULL
+* or non-NULL, so this should suffice for our purposes */
+   return hasmntopt(mntent, ptr);
+}
+
 static int get_cgroup_mount(const char *subsystem, char *mnt)
 {
struct mntent *mntent;
@@ -67,7 +96,7 @@ static int get_cgroup_mount(const char *subsystem, char *mnt)
 
if (strcmp(mntent->mnt_type, "cgroup"))
continue;
-   if (!subsystem || hasmntopt(mntent, subsystem)) {
+   if (!subsystem || hasmntopt_multiple(mntent, subsystem)) {
strcpy(mnt, mntent->mnt_dir);
fclose(file);
DEBUG("using cgroup mounted at '%s'", mnt);
-- 
1.7.2.5




[lxc-devel] [PATCH 1/9] Add missing 'extern' keyword to functions defined in cgroup.h

2012-02-09 Thread Christian Seiler
---
 src/lxc/cgroup.h |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/lxc/cgroup.h b/src/lxc/cgroup.h
index 188d948..31dd2de 100644
--- a/src/lxc/cgroup.h
+++ b/src/lxc/cgroup.h
@@ -26,9 +26,9 @@
 #define MAXPRIOLEN 24
 
 struct lxc_handler;
-int lxc_cgroup_create(const char *name, pid_t pid);
-int lxc_cgroup_destroy(const char *name);
-int lxc_cgroup_path_get(char **path, const char *subsystem, const char *name);
-int lxc_cgroup_nrtasks(const char *name);
-int lxc_ns_is_mounted(void);
+extern int lxc_cgroup_create(const char *name, pid_t pid);
+extern int lxc_cgroup_destroy(const char *name);
+extern int lxc_cgroup_path_get(char **path, const char *subsystem, const char *name);
+extern int lxc_cgroup_nrtasks(const char *name);
+extern int lxc_ns_is_mounted(void);
 #endif
-- 
1.7.2.5




[lxc-devel] [PATCH 6/9] Add attach.[ch]: Helper functions for lxc-attach

2012-02-09 Thread Christian Seiler
The following helper functions for lxc-attach are added to a new file
attach.c:
 - lxc_proc_get_context_info: Get cgroup memberships, personality and
   capability bounding set from /proc for a given process.
 - lxc_proc_free_context_info: Free the data structure returned by
   lxc_proc_get_context_info.
 - lxc_attach_proc_to_cgroups: Add the process specified by the pid
   parameter to the cgroups given by the ctx parameter.
 - lxc_attach_drop_privs: Drop capabilities to the capability mask given in
   the ctx parameter.
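The last helper boils down to removing from the bounding set every capability whose bit is clear in the container's mask. A minimal sketch, assuming the mask was read from `CapBnd` and that `last_cap` is the highest capability number the running kernel supports (the patch obtains it from lxc_caps_last_cap()). `caps_to_drop` and `drop_bounding_caps` are hypothetical names, and PR_CAPBSET_DROP requires CAP_SETPCAP.

```c
#include <sys/prctl.h>

#ifndef PR_CAPBSET_DROP
#define PR_CAPBSET_DROP 24
#endif

/* Count capabilities in [0, last_cap] that are absent from `mask`,
 * i.e. the ones the bounding set has to lose. */
int caps_to_drop(unsigned long long mask, int last_cap)
{
	int cap, n = 0;

	for (cap = 0; cap <= last_cap && cap < 64; cap++)
		if (!(mask & (1ULL << cap)))
			n++;
	return n;
}

/* Drop every capability not in `mask` from the bounding set.
 * Returns 0 on success, -1 with errno set (e.g. EPERM) on failure. */
int drop_bounding_caps(unsigned long long mask, int last_cap)
{
	int cap;

	for (cap = 0; cap <= last_cap && cap < 64; cap++) {
		if (mask & (1ULL << cap))
			continue; /* the container keeps this one */
		if (prctl(PR_CAPBSET_DROP, cap, 0, 0, 0))
			return -1;
	}
	return 0;
}
```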
---
 src/lxc/Makefile.am |4 +-
 src/lxc/attach.c|  247 +++
 src/lxc/attach.h|   47 ++
 3 files changed, 297 insertions(+), 1 deletions(-)
 create mode 100644 src/lxc/attach.c
 create mode 100644 src/lxc/attach.h

diff --git a/src/lxc/Makefile.am b/src/lxc/Makefile.am
index 924cf1d..e695883 100644
--- a/src/lxc/Makefile.am
+++ b/src/lxc/Makefile.am
@@ -12,7 +12,8 @@ pkginclude_HEADERS = \
conf.h \
list.h \
log.h \
-   state.h
+   state.h \
+   attach.h
 
 sodir=$(libdir)
 # use PROGRAMS to avoid complains from automake
@@ -41,6 +42,7 @@ liblxc_so_SOURCES = \
list.h \
state.c state.h \
log.c log.h \
+   attach.c attach.h \
\
network.c network.h \
 nl.c nl.h \
diff --git a/src/lxc/attach.c b/src/lxc/attach.c
new file mode 100644
index 000..9392116
--- /dev/null
+++ b/src/lxc/attach.c
@@ -0,0 +1,247 @@
+/*
+ * lxc: linux Container library
+ *
+ * (C) Copyright IBM Corp. 2007, 2008
+ *
+ * Authors:
+ * Daniel Lezcano 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#if !HAVE_DECL_PR_CAPBSET_DROP
+#define PR_CAPBSET_DROP 24
+#endif
+
+#include "namespace.h"
+#include "log.h"
+#include "attach.h"
+#include "caps.h"
+#include "cgroup.h"
+#include "config.h"
+
+#include "setns.h"
+
+lxc_log_define(lxc_attach, lxc);
+
+int setns(int fd, int nstype)
+{
+#ifndef __NR_setns
+   errno = ENOSYS;
+   return -1;
+#else
+   return syscall(__NR_setns, fd, nstype);
+#endif
+}
+
+struct lxc_proc_context_info *lxc_proc_get_context_info(pid_t pid)
+{
+   struct lxc_proc_context_info *info = calloc(1, sizeof(*info));
+   FILE *proc_file;
+   char proc_fn[MAXPATHLEN];
+   char *line = NULL, *ptr, *ptr2;
+   size_t line_bufsz = 0;
+   int ret, found, l;
+   int i;
+
+   if (!info) {
+   SYSERROR("Could not allocate memory.");
+   return NULL;
+   }
+
+   /* read capabilities */
+   snprintf(proc_fn, MAXPATHLEN, "/proc/%d/status", pid);
+
+   proc_file = fopen(proc_fn, "r");
+   if (!proc_file) {
+   SYSERROR("Could not open %s", proc_fn);
+   goto out_error;
+   }
+
+   found = 0;
+   while (getline(&line, &line_bufsz, proc_file) != -1) {
+   ret = sscanf(line, "CapBnd: %llx", &info->capability_mask);
+   if (ret != EOF && ret > 0) {
+   found = 1;
+   break;
+   }
+   }
+
+   fclose(proc_file);
+
+   if (!found) {
+   SYSERROR("Could not read capability bounding set from %s", proc_fn);
+   errno = ENOENT;
+   goto out_error;
+   }
+
+   /* read personality */
+   snprintf(proc_fn, MAXPATHLEN, "/proc/%d/personality", pid);
+
+   proc_file = fopen(proc_fn, "r");
+   if (!proc_file) {
+   SYSERROR("Could not open %s", proc_fn);
+   goto out_error;
+   }
+
+   ret = fscanf(proc_file, "%lx", &info->personality);
+   fclose(proc_file);
+
+   if (ret == EOF || ret == 0) {
+   SYSERROR("Could not read personality from %s", proc_fn);
+   errno = ENOENT;
+   goto out_error;
+   }
+
+   /* read cgroups */
+   snprintf(proc_fn, MAXPATHLEN, "/proc/%d/cgroup", pid);
+
+   proc_file = fopen(proc_fn, "r");
+   if (!proc_file) {
+   SYSERROR("Could not open %s", proc_fn);
+   goto out_error;
+   }
+
+   /* we don't really know how many cgroup subsystems there 

[lxc-devel] [PATCH 2/9] Add missing double-include #ifndef/#define/#endif to confile.h

2012-02-09 Thread Christian Seiler
---
 src/lxc/confile.h |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/src/lxc/confile.h b/src/lxc/confile.h
index 6698fb2..f415e55 100644
--- a/src/lxc/confile.h
+++ b/src/lxc/confile.h
@@ -21,6 +21,9 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
  */
 
+#ifndef _confile_h
+#define _confile_h
+
 struct lxc_conf;
 struct lxc_list;
 
@@ -30,3 +33,5 @@ extern int lxc_config_readline(char *buffer, struct lxc_conf *conf);
 extern int lxc_config_define_add(struct lxc_list *defines, char* arg);
 extern int lxc_config_define_load(struct lxc_list *defines,
  struct lxc_conf *conf);
+
+#endif
-- 
1.7.2.5


--
Virtualization & Cloud Management Using Capacity Planning
Cloud computing makes use of virtualization - but cloud computing 
also focuses on allowing computing to be delivered as a service.
http://www.accelacomm.com/jaw/sfnl/114/51521223/
___
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel


Re: [lxc-devel] [PATCH v2] lxc-attach: Consider cgroups/personality/capabilities of container

2012-02-17 Thread Christian Seiler
Hi there,

> This is the new version of my patch that implements the features discussed
> in the previous thread.
>
>  - The current status of the container is now read from /proc/init_pid/*,
>    where init_pid is the pid of the container's init process.
>  - By default:
>    * The attached process acquires the personality of the container (i.e.
>      architecture: 32bit vs. 64bit)
>    * The attached process drops its capabilities according to those of the
>      container
>    * The attached process is put into the same cgroup as the container
>      itself
>  - Overrides:
>    * -a/--arch option to manually set the architecture the attached
>      process sees
>    * -e/--elevated-privileges option to stop the attached process from
>      being put in the same cgroup as the container and to let it retain
>      the capability bounding set it already possesses.
>  - Add a manual page for lxc-attach(1)
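As a rough illustration of the capability handling described above, the sketch below parses the "CapBnd:" line of /proc/<pid>/status (the same sscanf pattern that lxc_proc_get_context_info() uses) and then drops from the caller's bounding set every capability the container lacks, via prctl(PR_CAPBSET_DROP, ...). The helper names parse_capbnd_line() and drop_caps_not_in() are hypothetical, and last_cap is assumed to be supplied by the caller (for example from /proc/sys/kernel/cap_last_cap); this is a sketch, not the code of the patch.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_CAPBSET_DROP
#define PR_CAPBSET_DROP 24	/* same fallback as attach.c */
#endif

/* Returns 1 and stores the mask if line is the "CapBnd:" entry of
 * /proc/<pid>/status, 0 otherwise. */
int parse_capbnd_line(const char *line, unsigned long long *mask)
{
	return sscanf(line, "CapBnd: %llx", mask) == 1;
}

/* Drop from the caller's bounding set every capability in 0..last_cap
 * that is not set in mask. Needs CAP_SETPCAP; returns 0 or -1. */
int drop_caps_not_in(unsigned long long mask, int last_cap)
{
	for (int cap = 0; cap <= last_cap && cap < 64; cap++) {
		if (mask & (1ULL << cap))
			continue;	/* the container keeps this capability */
		if (prctl(PR_CAPBSET_DROP, cap, 0, 0, 0) < 0)
			return -1;
	}
	return 0;
}
```

Reading the mask from the container's init process rather than from the attaching process is what makes the attached process end up with the container's bounding set instead of the host's.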

Any comments on this?

Regards,
Christian




Re: [lxc-devel] [PATCH v2] lxc-attach: Consider cgroups/personality/capabilities of container

2012-02-17 Thread Christian Seiler
Hi Daniel,

> your patchset is in my tree. I will do some tests and push it.

Thanks!

Regards,
Christian




[lxc-devel] [PATCH] Add lxc-net tool

2012-05-17 Thread Christian Seiler
Add a tool that switches context to enter the network namespace and then
execute an arbitrary command. Since we don't change mount / pid namespaces,
this allows the user to use the host's networking tools such as iputils,
iptables, netstat to query / configure the container from the outside. This
may be especially useful for administrators who have dropped network
configuration capabilities inside the container as a security measure.

Since network namespace switching via setns() was introduced in kernel 3.0,
this tool will work with any vanilla kernel version 3.0 or newer.
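The core of such a tool is small. The following is a minimal C sketch (not the lxc_net.c added by this patch, whose body is not shown here) that opens /proc/<pid>/ns/net, calls setns() with CLONE_NEWNET and then execs the command. The names run_in_netns() and netns_path() are hypothetical; the sketch assumes a libc that declares setns() in <sched.h> (glibc 2.14 or later) and a caller with CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "/proc/<pid>/ns/net"; returns 0, or -1 if buf is too small. */
int netns_path(char *buf, size_t len, pid_t pid)
{
	int n = snprintf(buf, len, "/proc/%d/ns/net", (int)pid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Enter only the network namespace of pid, then exec argv. Mount and
 * pid namespaces are untouched, so argv[0] is looked up on the host's
 * filesystem. Only returns on error (with errno set). */
int run_in_netns(pid_t pid, char *const argv[])
{
	char path[64];
	int fd, saved;

	if (netns_path(path, sizeof(path), pid) < 0) {
		errno = ENAMETOOLONG;
		return -1;
	}
	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;
	if (setns(fd, CLONE_NEWNET) < 0) {
		saved = errno;
		close(fd);		/* don't leak the fd on failure */
		errno = saved;
		return -1;
	}
	close(fd);
	execvp(argv[0], argv);
	return -1;			/* execvp failed */
}
```

Called with the pid of the container's init and the argument vector { "ip", "-4", "addr", "show", NULL }, this would roughly correspond to the "lxc-net -n container -- ip -4 addr show" example in the manual page of this patch.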
---
 configure.ac|1 +
 doc/Makefile.am |1 +
 doc/lxc-net.sgml.in |  131 +++
 src/lxc/Makefile.am |4 +-
 src/lxc/attach.c|   22 +
 src/lxc/attach.h|1 +
 src/lxc/lxc_net.c   |   92 
 7 files changed, 251 insertions(+), 1 deletions(-)
 create mode 100644 doc/lxc-net.sgml.in
 create mode 100644 src/lxc/lxc_net.c

diff --git a/configure.ac b/configure.ac
index 0c8aa69..8605102 100644
--- a/configure.ac
+++ b/configure.ac
@@ -120,6 +120,7 @@ AC_CONFIG_FILES([
doc/lxc-cgroup.sgml
doc/lxc-kill.sgml
doc/lxc-attach.sgml
+   doc/lxc-net.sgml
doc/lxc.conf.sgml
doc/lxc.sgml
doc/common_options.sgml
diff --git a/doc/Makefile.am b/doc/Makefile.am
index b18c5eb..fd1a58b 100644
--- a/doc/Makefile.am
+++ b/doc/Makefile.am
@@ -24,6 +24,7 @@ man_MANS = \
lxc-cgroup.1 \
lxc-kill.1 \
lxc-attach.1 \
+   lxc-net.1 \
\
lxc.conf.5 \
\
diff --git a/doc/lxc-net.sgml.in b/doc/lxc-net.sgml.in
new file mode 100644
index 000..b675b4f
--- /dev/null
+++ b/doc/lxc-net.sgml.in
@@ -0,0 +1,131 @@
+
+
+
+
+]>
+
+
+
+  @LXC_GENERATE_DATE@
+
+  
+lxc-net
+1
+  
+
+  
+lxc-net
+
+
+  run a process in the network context of a container
+
+  
+
+  
+lxc-net -n
+name -- command
+  
+
+  
+Description
+
+
+  lxc-net runs the specified
+  command inside the network
+  namespace of the container specified by
+  name. The container
+  has to be running already.
+
+
+  This may be used to query or modify the network
+  devices of a container from the outside. Note that
+  the executable itself must reside outside of the
+  container, since only the network namespace is
+  changed; the command will still see the file system
+  and process ids of the host, not those of the container.
+
+
+  
+
+  &commonoptions;
+
+  
+Examples
+  
+To show the ip addresses of the devices of a container,
+assuming iputils is installed outside, use
+
+  lxc-net -n container -- ip -4 addr show
+
+  
+  
+To show the connections inside a container via netstat
+
+  lxc-net -n container -- netstat
+
+  
+  
+To add an iptables filter rule
+
+  lxc-net -n container -- iptables -A INPUT -d someip -j DROP
+
+  
+  
+
+  
+Notes
+
+  This requires kernel version 3.0 or newer to work.
+
+  
+
+  &seealso;
+
+  
+Author
+Daniel Lezcano daniel.lezc...@free.fr
+  
+
+
+
+
diff --git a/src/lxc/Makefile.am b/src/lxc/Makefile.am
index 1c26952..7b6f795 100644
--- a/src/lxc/Makefile.am
+++ b/src/lxc/Makefile.am
@@ -95,7 +95,8 @@ bin_PROGRAMS = \
lxc-unfreeze \
lxc-checkpoint \
lxc-restart \
-   lxc-kill
+   lxc-kill \
+   lxc-net
 
 pkglibexec_PROGRAMS = \
lxc-init
@@ -122,6 +123,7 @@ lxc_unfreeze_SOURCES = lxc_unfreeze.c
 lxc_unshare_SOURCES = lxc_unshare.c
 lxc_wait_SOURCES = lxc_wait.c
 lxc_kill_SOURCES = lxc_kill.c
+lxc_net_SOURCES = lxc_net.c
 
 install-exec-local: install-soPROGRAMS
mv $(DESTDIR)$(libdir)/liblxc.so $(DESTDIR)$(libdir)/liblxc.so.$(VERSION)
diff --git a/src/lxc/attach.c b/src/lxc/attach.c
index a95b3d3..45a3fd3 100644
--- a/src/lxc/attach.c
+++ b/src/lxc/attach.c
@@ -156,6 +156,28 @@ int lxc_attach_to_ns(pid_t pid)
return 0;
 }
 
+int lxc_attach_to_ns_type(pid_t pid, char *type)
+{
+   char path[MAXPATHLEN];
+   int fd;
+   
+   snprintf(path, MAXPATHLEN, "/proc/%d/ns/%s", pid, type);
+   fd = open(path, O_RDONLY);
+   if (fd < 0) {
+   SYSERROR("failed to open '%s'", path);
+   return -1;
+   }
+   
+   if (setns(fd, 0)) {
+   SYSERROR("failed to set namespace '%s'", type);
+   close(fd);
+   return -1;
+   }
+
+   close(fd);
+   return 0;
+}
+
 int lxc_attach_drop_privs(struct lxc_proc_context_info *ctx)
 {
int last_cap = lxc_caps_last_cap();
diff --git a/src/lxc/attach.h b/src/lxc/attach.h
index 2d46c83..0cfd6ae 100644
--- a/src/lxc/attach.h
+++ b/src/lxc/attach.h
@@ -34,6 +34,7 @@ struct lxc_proc_context_info {
 extern struct lxc_proc_context_info *lxc_proc_get_context_info(pid_t pid);
 

Re: [lxc-devel] [PATCH] Add lxc-net tool

2012-05-17 Thread Christian Seiler
Hi,

> Until lxc-attach is extended (Serge Hallyn took that action item),

Extending lxc-attach would also be fine by me - is anybody already
working on this? If not, I'd be willing to do that.

> I suggest using this very simple script to switch network namespaces:
> http://paste.ubuntu.com/992744/

Thanks, I didn't know that current versions of iproute2's ip could do
that (the version I have installed actually can't, I'll have to
upgrade).

Regards,
Christian



Re: [lxc-devel] [PATCH] Add lxc-net tool

2012-05-17 Thread Christian Seiler
Hi again,

>> Extending lxc-attach would also be fine by me - is anybody already
>> working on this? If not, I'd be willing to do that.
>
> It's on the todo list at the moment, I don't believe anyone is
> actively working on it at the moment.
>
> So if you want to do it, it'd be much appreciated.

Ok, I'll get back to you on that.

> I believe the idea was to keep the same syntax as lxc-unshare for
> lxc-attach. Essentially introduce a -s flag accepting an ORed list of
> namespaces.

Yes, sounds reasonable to me. Just one question: now that you mentioned it,
I've taken a peek at iproute2's implementation, and ip netns exec actually
does two things:

 - they setns() to the new namespace
 - they unshare the mount namespace and remount /sys - apparently, in
   contrast to /proc, which depends on the current process's context,
   /sys depends on the context of the process mounting it

So for lxc-attach without mount namespaces but with network namespaces,
should we do the same? (i.e. catch that case) Or should we just ignore
/sys? Do any non-shell-script utilities actually use it? I am under the
impression that any network interface information is also available via
netlink or similar, and that all "real" tools use that, which does take
into account process context. (One could even argue that /sys should -
just as /proc does - reflect the current namespace for relevant things
and that this is a bug/missing feature in the kernel.) Thoughts?

> Currently only ipc, net and uts are usable on a standard kernel,
> patches for the remaining ones and the user namespace are currently
> on lkml with some of them in linux-next.

About only ipc, net and uts being usable, I know about that, I'm
currently using the patches under

(slightly modified to apply against 3.2) to be able to properly use
lxc-attach. I'm glad to hear that they are on their way into the kernel.

Regards,
Christian



Re: [lxc-devel] [PATCH] Add lxc-net tool

2012-05-17 Thread Christian Seiler
Hi,

>>  - they unshare the mount namespace and remount /sys - apparently, in
>>contrast to /proc, which depends on the current process's context,
>>/sys depends on the context of the process mounting it
> 
> Both actually depend on the context of the process mounting it.  If you
> do "lxc-unshare -s PID /bin/bash" and then do "echo $$" and "ls /proc",
> you'll see proc is still the old proc.

Actually, it's even more complicated than that. Try the following:

lxc-unshare -s NETWORK -- cat /proc/self/net/dev

I did a few simple tests and found the following:

Network namespaces:
  /proc/$pid/net          Context of process $pid
  /sys/class/net etc.     Context of process mounting /sys
PID namespaces:
  /proc                   Context of process mounting /proc
Mount namespaces:
  /proc/$pid/mountinfo    Context of process $pid

So - due to the /proc/self logic - for network namespaces, one only
needs to remount /sys, for PID namespaces, only /proc; and for mount
namespaces we don't really care since if we attach to a mount namespace
that belongs to a container, the corresponding file systems we see are
already mounted in the correct context.
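Expressed as code, these observations amount to a mapping from the set of attached namespaces to the pseudo-filesystems that would show the wrong context. This is a sketch assuming the namespaces are given as CLONE_NEW* flags; stale_pseudo_fs() and the REMOUNT_* constants are hypothetical names.

```c
#define _GNU_SOURCE
#include <sched.h>

enum { REMOUNT_PROC = 1 << 0, REMOUNT_SYS = 1 << 1 };

/* Which pseudo-filesystems reflect the wrong context after attaching
 * to the namespaces in attach_flags (per the table above). */
int stale_pseudo_fs(int attach_flags)
{
	int r = 0;

	if (attach_flags & CLONE_NEWNS)
		return 0;		/* container's own mounts are used */
	if (attach_flags & CLONE_NEWNET)
		r |= REMOUNT_SYS;	/* /proc/self/net already follows us */
	if (attach_flags & CLONE_NEWPID)
		r |= REMOUNT_PROC;
	return r;
}
```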

>> So for lxc-attach without mount namespaces but with network namespaces,
>> should we do the same? (i.e. catch that case) Or should we just ignore
> 
> I think we should let users do this themselves, but warn about it in
> the lxc-attach manpage.

I agree it may be wise not to do too much as a default in order not to
confuse users, however, I really would like lxc-attach to be able to
handle this stuff on its own if needed.

Suggestion:

1) Default behavior: Just attach to specified namespaces.
2) Additional command line flag -R (or something else, if you prefer)
   that does the following:

   a) If the process is to be attached to either the NETWORK or PID
      namespace
      -and-
   b) it is NOT to be attached to the MOUNT namespace

   then *additionally* unshare (not attach) the MOUNT namespace and
   remount /sys and /proc.

   Ignore the flag if those conditions are not met.
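The conditions in point 2) can be written down as a small predicate. This is a sketch only: remount_applies() is a hypothetical name, and the namespaces are represented by their CLONE_NEW* flags as lxc-unshare does.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Returns 1 if the proposed -R flag should take effect: it was
 * requested, NETWORK and/or PID is attached, and MOUNT is not. */
int remount_applies(int attach_flags, int remount_requested)
{
	if (!remount_requested)
		return 0;
	if (attach_flags & CLONE_NEWNS)
		return 0;		/* condition b) not met: ignore -R */
	return (attach_flags & (CLONE_NEWNET | CLONE_NEWPID)) != 0;
}
```

When it returns 1, the process would then call unshare(CLONE_NEWNS) and remount /sys and /proc in its private mount namespace before executing the command.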

Because if we leave that completely to the user, one really has to do
something along the lines of

lxc-attach -n container -s NETWORK -- \
   lxc-unshare -s MOUNT -- /bin/bash -c \
  "umount /sys ; mount -t sysfs none /sys ; \
   umount /proc ; mount -t proc none /proc ; \
   /some/complicated/command/that/uses//sys"

instead of simply

lxc-attach -n container -R -s NETWORK -- \
   /some/complicated/command/that/uses//sys

The first seems like too much of a mouthful to me. Thoughts?

Other than this issue and the man page, I have a patch for lxc-attach
ready; as soon as I get to update the man page I'll post it to the list.
(The /proc and /sys stuff can be added later IMHO.)

Regards,
Christian



[lxc-devel] [PATCH] Add option to lxc-attach to select specific namespaces

2012-05-18 Thread Christian Seiler
This patch adds the -s/--namespaces option to lxc-attach that works
analogously to lxc-unshare, allowing the user to select the namespaces the
process should be attached to.
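A minimal sketch of how such a pipe-separated argument (e.g. NETWORK|IPC) might be turned into CLONE_NEW* flags, using the same names as lxc-unshare's namespaces_list quoted later in this thread. parse_namespaces() and ns_name_to_flag() are hypothetical and are not the helpers this patch adds to namespace.c.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <string.h>

/* Map one namespace name to its CLONE_NEW* flag; 0 if unknown. */
static int ns_name_to_flag(const char *name)
{
	static const struct { const char *name; int flag; } map[] = {
		{ "MOUNT",   CLONE_NEWNS   },
		{ "PID",     CLONE_NEWPID  },
		{ "UTSNAME", CLONE_NEWUTS  },
		{ "IPC",     CLONE_NEWIPC  },
		{ "USER",    CLONE_NEWUSER },
		{ "NETWORK", CLONE_NEWNET  },
	};

	for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (strcmp(map[i].name, name) == 0)
			return map[i].flag;
	return 0;
}

/* Parse e.g. "NETWORK|IPC" into ORed flags. Returns -1 on an unknown
 * name or if no namespace is given; *flags is left unchanged then. */
int parse_namespaces(const char *arg, int *flags)
{
	char *copy = strdup(arg), *tok, *save = NULL;
	int result = 0;

	if (!copy)
		return -1;
	for (tok = strtok_r(copy, "|", &save); tok;
	     tok = strtok_r(NULL, "|", &save)) {
		int f = ns_name_to_flag(tok);

		if (!f) {
			free(copy);
			return -1;
		}
		result |= f;
	}
	free(copy);
	if (!result)
		return -1;
	*flags = result;
	return 0;
}
```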

Signed-off-by: Christian Seiler 
Cc: Stéphane Graber 
Cc: Daniel Lezcano 
Cc: Serge Hallyn 
---
 doc/lxc-attach.sgml.in |   92 +---
 src/lxc/attach.c   |   12 +-
 src/lxc/attach.h   |2 +-
 src/lxc/lxc_attach.c   |   36 +--
 src/lxc/lxc_unshare.c  |   46 
 src/lxc/namespace.c|   46 
 src/lxc/namespace.h|3 ++
 7 files changed, 180 insertions(+), 57 deletions(-)

diff --git a/doc/lxc-attach.sgml.in b/doc/lxc-attach.sgml.in
index 7092f16..4e462f9 100644
--- a/doc/lxc-attach.sgml.in
+++ b/doc/lxc-attach.sgml.in
@@ -49,7 +49,8 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
   
 lxc-attach -n
 name -a
-arch -e
+arch -e -s
+namespaces
 -- command
   
 
@@ -122,6 +123,28 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

   
 
+  
+   
+ -s, --namespaces namespaces
+   
+   
+ 
+   Specify the namespaces to attach to, as a pipe-separated list,
+   e.g. NETWORK|IPC. Allowed values are
+   MOUNT, PID,
+   UTSNAME, IPC
+   and NETWORK. This allows one to change
+   the context of the process to e.g. the network namespace of the
+   container while retaining the other namespaces as those of the
+   host.
+ 
+ 
+   Important: This option implies
+   -e.
+ 
+   
+  
+
 
 
   
@@ -144,19 +167,78 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
   
   
 To deactivate the network link eth1 of a running container that
-does not have the NET_ADMIN capability, use the -e
-option to use increased capabilities:
+does not have the NET_ADMIN capability, use either the
+-e option to use increased capabilities,
+assuming the ip tool is installed inside the container:
 
   lxc-attach -n container -e -- /sbin/ip link delete eth1
 
+Alternatively, use the -s option to use the
+tools installed on the host, outside the container:
+
+  lxc-attach -n container -s NETWORK -- /sbin/ip link delete eth1
+
   
   
 
   
+Compatibility
+
+  Attaching completely (including the pid and mount namespaces) to a
+  container requires a patched kernel; please see the lxc website for
+  details. lxc-attach will fail in that case if
+  used with an unpatched kernel.
+
+
+  Nevertheless, it will succeed on an unpatched kernel of version 3.0
+  or higher if the -s option is used to restrict attachment
+  to one or more of the namespaces
+  NETWORK, IPC
+  and UTSNAME.
+
+  
+
+  
+Notes
+
+  The Linux /proc and
+  /sys filesystems contain information
+  about some quantities that are affected by namespaces, such as
+  the directories named after process ids in
+  /proc or the network interface information
+  in /sys/class/net. The namespace of the
+  process mounting the pseudo-filesystems determines what information
+  is shown, not the namespace of the process
+  accessing /proc or
+  /sys.
+
+
+  If one uses the -s option to only attach to
+  the pid namespace of a container, but not its mount namespace
+  (which will contain the /proc of the
+  container and not the host), the contents of /proc
+  will reflect those of the host and not the container. Analogously,
+  the same issue occurs when reading the contents of
+  /sys/class/net and attaching to just
+  the network namespace.
+
+
+  A workaround is to use lxc-unshare to unshare
+  the mount namespace after using lxc-attach with
+  -s PID and/or -s
+  NETWORK and then unmount and remount both
+  pseudo-filesystems within that new mount namespace, before
+  executing a program/script that relies on this information to be
+  correct.
+
+  
+
+  
 Security
 
-  The -e should be used with care, as it may break
-  the isolation of the containers if used improperly.
+  The -e and -s options should
+  be used with care, as they may break the isolation of the containers
+  if used improperly.
 
   
 
diff --git a/src/lxc/attach.c b/src/lxc/attach.c
index a95b3d3..fcf47da 100644
--- a/src/lxc/attach.c
+++ b/src/lxc/attach.c
@@ -121,10 +121,11 @@ out_error:
return NULL;
 }
 
-int lxc_attach_to_ns(pid_t pid)
+int lxc_attach_to_ns(pid_t pid, int which)
 {
char path[MAXPATHLEN];
char *ns[] = { "pid", "mnt", "net", "ipc", "

[lxc-devel] [PATCH] [trivial] Add files to .gitignore

2012-05-18 Thread Christian Seiler
---
 .gitignore |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/.gitignore b/.gitignore
index 8c84a23..1171613 100644
--- a/.gitignore
+++ b/.gitignore
@@ -34,7 +34,12 @@ templates/lxc-altlinux
 templates/lxc-sshd
 templates/lxc-busybox
 templates/lxc-archlinux
+templates/lxc-lenny
+templates/lxc-opensuse
+templates/lxc-ubuntu
+templates/lxc-ubuntu-cloud
 
+src/lxc/liblxc.so.0
 src/lxc/lxc-attach
 src/lxc/lxc-cgroup
 src/lxc/lxc-checkconfig
@@ -63,6 +68,8 @@ src/lxc/lxc-unfreeze
 src/lxc/lxc-unshare
 src/lxc/lxc-version
 src/lxc/lxc-wait
+src/lxc/lxc-clone
+src/lxc/lxc-setuid
 
 config/compile
 config/config.guess
-- 
1.7.2.5




Re: [lxc-devel] [PATCH] Add option to lxc-attach to select specific namespaces

2012-05-18 Thread Christian Seiler
>> +int flags[] = { CLONE_NEWPID, CLONE_NEWNS, CLONE_NEWNET, CLONE_NEWIPC, CLONE_NEWUTS };
> ...
>> -static char *namespaces_list[] = {
>> -"MOUNT", "PID", "UTSNAME", "IPC",
>> -"USER", "NETWORK"
>> -};
>> -static int cloneflags_list[] = {
>> -CLONE_NEWNS, CLONE_NEWPID, CLONE_NEWUTS, CLONE_NEWIPC,
>> -CLONE_NEWUSER, CLONE_NEWNET
>> -};
> 
> These should be commonized.  I'm surprised this patch worked for you, as
> the indices for network don't match up.

Yes, they do, but you have to see which one matches against which:

namespaces_list <-> cloneflags_list

but in lxc_attach_to_ns() there's

char *ns[] = { "pid", "mnt", "net", "ipc", "uts" };

so we have

flags <-> ns. In the end, those are totally different arrays.

I've blacklisted USER because I don't know which file in
/proc/$pid/ns it will map to once the feature is inside the kernel (I
only see the 5 in the char *ns[] list on my system) - I'll happily
rearrange them and add CLONE_NEWUSER to the flags and ns lists in the
attach to pid function.

Regards,
Christian



[lxc-devel] [PATCH v2 1/2] Add option to lxc-attach to select specific namespaces

2012-05-21 Thread Christian Seiler
This patch adds the -s/--namespaces option to lxc-attach that works
analogously to lxc-unshare, allowing the user to select the namespaces the
process should be attached to.

User namespaces are supported, under the assumption that the file in
/proc/pid/ns will be called 'usr'. Currently, user namespaces will be
skipped (without having lxc-attach fail, unlike for other namespaces) if the
kernel lacks support.
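The skip-on-failure behaviour for user namespaces could look roughly like the sketch below, which keeps a single table pairing each CLONE_NEW* flag with its /proc/<pid>/ns entry, so the two arrays discussed earlier in the thread cannot drift apart. attach_to_ns() and ns_table are hypothetical names. The kernel eventually named the user namespace entry "user" rather than the "usr" this patch assumes, so the sketch uses "user". As a simplification, it ignores any failure to enter a user namespace, not only missing kernel support.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* One table for flag <-> /proc/<pid>/ns/<name>. */
static const struct {
	int flag;
	const char *proc_name;
} ns_table[] = {
	{ CLONE_NEWPID,  "pid"  },
	{ CLONE_NEWNS,   "mnt"  },
	{ CLONE_NEWNET,  "net"  },
	{ CLONE_NEWIPC,  "ipc"  },
	{ CLONE_NEWUTS,  "uts"  },
	{ CLONE_NEWUSER, "user" },
};

/* Attach to the namespaces of pid selected in which. A user namespace
 * that cannot be entered is skipped; any other failure returns -1. */
int attach_to_ns(pid_t pid, int which)
{
	char path[64];

	for (size_t i = 0; i < sizeof(ns_table) / sizeof(ns_table[0]); i++) {
		int fd, ret, saved;

		if (!(which & ns_table[i].flag))
			continue;
		snprintf(path, sizeof(path), "/proc/%d/ns/%s",
			 (int)pid, ns_table[i].proc_name);
		fd = open(path, O_RDONLY | O_CLOEXEC);
		if (fd < 0) {
			ret = -1;
		} else {
			ret = setns(fd, 0);
			saved = errno;
			close(fd);
			errno = saved;	/* report setns()'s error */
		}
		if (ret < 0) {
			if (ns_table[i].flag == CLONE_NEWUSER)
				continue;	/* skip, don't fail */
			return -1;
		}
	}
	return 0;
}
```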

Signed-off-by: Christian Seiler 
Cc: Stéphane Graber 
Cc: Daniel Lezcano 
Cc: Serge Hallyn 
---
 doc/lxc-attach.sgml.in |   99 +--
 src/lxc/attach.c   |   72 --
 src/lxc/attach.h   |2 +-
 src/lxc/lxc_attach.c   |   28 -
 src/lxc/lxc_unshare.c  |   46 --
 src/lxc/namespace.c|   46 ++
 src/lxc/namespace.h|3 +
 7 files changed, 236 insertions(+), 60 deletions(-)

diff --git a/doc/lxc-attach.sgml.in b/doc/lxc-attach.sgml.in
index 7092f16..d7fb223 100644
--- a/doc/lxc-attach.sgml.in
+++ b/doc/lxc-attach.sgml.in
@@ -49,7 +49,8 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
   
 lxc-attach -n
 name -a
-arch -e
+arch -e -s
+namespaces
 -- command
   
 
@@ -122,6 +123,29 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

   
 
+  
+   
+ -s, --namespaces namespaces
+   
+   
+ 
+   Specify the namespaces to attach to, as a pipe-separated list,
+   e.g. NETWORK|IPC. Allowed values are
+   MOUNT, PID,
+   UTSNAME, IPC,
+   USER and
+   NETWORK. This allows one to change
+   the context of the process to e.g. the network namespace of the
+   container while retaining the other namespaces as those of the
+   host.
+ 
+ 
+   Important: This option implies
+   -e.
+ 
+   
+  
+
 
 
   
@@ -144,19 +168,84 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
   
   
 To deactivate the network link eth1 of a running container that
-does not have the NET_ADMIN capability, use the -e
-option to use increased capabilities:
+does not have the NET_ADMIN capability, use either the
+-e option to use increased capabilities,
+assuming the ip tool is installed inside the container:
 
   lxc-attach -n container -e -- /sbin/ip link delete eth1
 
+Alternatively, use the -s option to use the
+tools installed on the host, outside the container:
+
+  lxc-attach -n container -s NETWORK -- /sbin/ip link delete eth1
+
   
   
 
   
+Compatibility
+
+  Attaching completely (including the pid and mount namespaces) to a
+  container requires a patched kernel; please see the lxc website for
+  details. lxc-attach will fail in that case if
+  used with an unpatched kernel.
+
+
+  Nevertheless, it will succeed on an unpatched kernel of version 3.0
+  or higher if the -s option is used to restrict attachment
+  to one or more of the namespaces
+  NETWORK, IPC
+  and UTSNAME.
+
+
+  Attaching to user namespaces is currently completely unsupported
+  by the kernel. User namespaces will be skipped (but will not cause
+  lxc-attach to fail) unless used with a future
+  version of the kernel that supports this.
+
+  
+
+  
+Notes
+
+  The Linux /proc and
+  /sys filesystems contain information
+  about some quantities that are affected by namespaces, such as
+  the directories named after process ids in
+  /proc or the network interface information
+  in /sys/class/net. The namespace of the
+  process mounting the pseudo-filesystems determines what information
+  is shown, not the namespace of the process
+  accessing /proc or
+  /sys.
+
+
+  If one uses the -s option to only attach to
+  the pid namespace of a container, but not its mount namespace
+  (which will contain the /proc of the
+  container and not the host), the contents of /proc
+  will reflect those of the host and not the container. Analogously,
+  the same issue occurs when reading the contents of
+  /sys/class/net and attaching to just
+  the network namespace.
+
+
+  A workaround is to use lxc-unshare to unshare
+  the mount namespace after using lxc-attach with
+  -s PID and/or -s
+  NETWORK and then unmount and remount both
+  pseudo-filesystems within that new mount namespace, before
+  executing a program/script that relies on this information to be
+  correct.
+
+  
+
+  
 Security
 
-  The -e should be used with care, as it may break
-  the isolation of the containers if used improperly.
+  

[lxc-devel] [PATCH v2 0/2] Partial namespaces for lxc-attach

2012-05-21 Thread Christian Seiler
Hi Serge,

I've updated my patch for lxc-attach in order to reflect your comments: The
ordering of the flags is now consistent across the source code and I've
added CLONE_NEWUSER to the list of flags. The only thing I wasn't clear
about was what the file in /proc/pid/ns will be called once setns() supports
user namespaces - I used 'usr' but added a comment that this might need to
be changed once the kernel actually supports this and the kernel developers
decide to use something else.

I've also added a patch to add the -R option to allow the remounting of
/proc and /sys when attaching to e.g. only a network namespace.

Regards, 
Christian




[lxc-devel] [PATCH v2 2/2] lxc-attach: Add -R option to remount /sys and /proc when only partially attaching

2012-05-21 Thread Christian Seiler
When attaching to only some namespaces of the container but not the mount
namespace, the contents of /sys and /proc of the host system do not properly
reflect the context of the container's pid and/or network namespaces, and
possibly others.

The introduced -R option adds the possibility to additionally unshare the
mount namespace (when it is not being attached) and remount /sys and /proc
in order for those filesystems to properly reflect the container's context
even when only attaching to some of the namespaces.

Signed-off-by: Christian Seiler 
Cc: Stéphane Graber 
Cc: Daniel Lezcano 
Cc: Serge Hallyn 
---
 doc/lxc-attach.sgml.in |   42 ++
 src/lxc/attach.c   |   44 
 src/lxc/attach.h   |1 +
 src/lxc/lxc_attach.c   |   27 ++-
 4 files changed, 105 insertions(+), 9 deletions(-)

diff --git a/doc/lxc-attach.sgml.in b/doc/lxc-attach.sgml.in
index d7fb223..1d7684e 100644
--- a/doc/lxc-attach.sgml.in
+++ b/doc/lxc-attach.sgml.in
@@ -50,7 +50,7 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 lxc-attach -n
 name -a
 arch -e -s
-namespaces
+namespaces -R
 -- command
   
 
@@ -146,6 +146,29 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

   
 
+  
+   
+ -R, --remount-sys-proc
+   
+   
+ 
+   When using -s and the mount namespace is not
+   included, this flag will cause lxc-attach
+   to remount /proc and
+   /sys to reflect the contexts of the other
+   namespaces the process is attached to.
+ 
+ 
+   Please see the Notes section for more
+   details.
+ 
+ 
+   This option will be ignored if one tries to attach to the
+   mount namespace anyway.
+ 
+   
+  
+
 
 
   
@@ -230,13 +253,16 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
   the network namespace.
 
 
-  A workaround is to use lxc-unshare to unshare
-  the mount namespace after using lxc-attach with
-  -s PID and/or -s
-  NETWORK and then unmount and then mount again both
-  pseudo-filesystems within that new mount namespace, before
-  executing a program/script that relies on this information to be
-  correct.
+  To work around this problem, the -R flag provides
+  the option to remount /proc and
+  /sys in order for them to reflect the
+  network/pid namespace context of the attached process. In order
+  not to interfere with the host's actual filesystem, the mount
+  namespace will be unshared (like lxc-unshare
+  does) before this is done, essentially giving the process a new
+  mount namespace, which is identical to the host's mount namespace
+  except for the /proc and
+  /sys filesystems.
 
   
 
diff --git a/src/lxc/attach.c b/src/lxc/attach.c
index 9d598f0..2ae9587 100644
--- a/src/lxc/attach.c
+++ b/src/lxc/attach.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #if !HAVE_DECL_PR_CAPBSET_DROP
@@ -218,6 +219,49 @@ int lxc_attach_to_ns(pid_t pid, int which)
return 0;
 }
 
+int lxc_attach_remount_sys_proc()
+{
+   int ret;
+
+   ret = unshare(CLONE_NEWNS);
+   if (ret < 0) {
+   SYSERROR("failed to unshare mount namespace: %s", strerror(errno));
+   return -1;
+   }
+
+   /* assume /proc is always mounted, so remount it */
+   ret = umount2("/proc", MNT_DETACH);
+   if (ret < 0) {
+   SYSERROR("failed to unmount /proc: %s", strerror(errno));
+   return -1;
+   }
+
+   ret = mount("none", "/proc", "proc", 0, NULL);
+   if (ret < 0) {
+   SYSERROR("failed to remount /proc: %s", strerror(errno));
+   return -1;
+   }
+
+   /* try to umount /sys - if it's not a mount point,
+* we'll get EINVAL, then we ignore it because it
+* may not have been mounted in the first place
+*/
+   ret = umount2("/sys", MNT_DETACH);
+   if (ret < 0 && errno != EINVAL) {
+   SYSERROR("failed to unmount /sys: %s", strerror(errno));
+   return -1;
+   } else if (ret == 0) {
+   /* remount it */
+   ret = mount("none", "/sys", "sysfs", 0, NULL);
+   if (ret < 0) {
+   SYSERROR("failed to remount /sys: %s", strerror(errno));
+   return -1;
+   }
+   }
+
+   return 0;
+}
+
 int lxc_attach_drop_privs(struct lxc_proc_context_info *ctx)
 {
int last_cap = lxc_caps_last_cap();
diff --git a/src/lxc/attach.h b/src/lxc/attach.h
index

Re: [lxc-devel] [PATCH v2 1/2] Add option to lxc-attach to select specific namespaces

2012-05-22 Thread Christian Seiler
Hi Serge,

> Note that for now the same thing will happen with pid.  I don't think
> CLONE_NEWUSER needs to be special cased.  Likewise, someone may want
> to use this lxc on an older kernel without any setns support at all.

I'm not sure this is wise: currently, kernel 3.0 supports all namespaces
except pid, mount and user for setns(). Since user namespaces are not
very well supported in general, lxc-start currently does not even set
one up when starting a container.

Therefore, I think the correct logic should be the following:

  * If the namespace is used by lxc-start when clone()ing to initialize
a container, FAIL if one wants to attach without specifying partial
namespaces. This follows the principle of least surprise: lxc-attach
without parameters will either work and one is completely attached
or it will fail.

For those administrators who don't care about pid/mount namespaces
on current vanilla kernels but want to do partial attachments, the
-s flag still allows for that.

  * If lxc-start does not use the namespace (currently only user
namespaces), still try to attach to it (making lxc-attach
future-proof), but ignore any failure, since it doesn't really matter
if it fails as long as user namespaces aren't used by lxc-start.

Thoughts?

> Your choices for behavior are good (print a msg for which == -1,
> and error out if the namespace was specially chosen), but I think
> you should simply do it for all namespaces.

Actually, ERROR() just prints an error message but does not terminate the
program (the only difference between ERROR and DEBUG is that ERROR will be
seen on stderr by default, while for DEBUG you need a log file), so missing
user namespaces will not cause the program to terminate. But thinking about
it, it's probably better if it did: if the user explicitly requested it,
this should really be an error condition. I'll update the patch.

Regards,
Christian


--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel


Re: [lxc-devel] [PATCH v2 1/2] Add option to lxc-attach to select specific namespaces

2012-05-22 Thread Christian Seiler
Hi Serge,

> That sounds good, but then to do it right the "which namespaces were
> unshared by the container" shouldn't be hardcoded in.  Unfortunately,
> without the /proc/self/ns/ links there's no way to tell, so we can't
> answer your question.
>
> So I think we should do your point 1, but not your point 2.  I'm still
> not happy about special casing user ns in the code.  What will happen
> when we get devices namespaces and most people, but not all, have
> /proc/self/ns/user?  More hard-coded exceptions?
>
> I don't have an answer right now, just not happy with any of the
> ones I can think of.  (Will keep thinking)

What about if we update the command interface to add an additional
command along the lines of LXC_COMMAND_GET_NSFLAGS or similar, which
returns the bitmask of CLONE_* used for starting the container? Then
we would have the logic:

  - no -s parameter for lxc-attach: attach to all namespaces found in
the bitmask retrieved via the command interface (and fail if the
kernel doesn't support it)
  - user-supplied -s parameter: try only those and fail if that doesn't
work

Then nothing would be hard-coded and it'd be completely future-proof.

Thoughts?

Regards,
Christian
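The selection rule proposed above can be sketched in C as follows. This is a minimal illustration, not lxc code: `select_attach_flags()`, `clone_flags_query_fn` and the two query stubs are hypothetical, with the stubs standing in for a command that asks the running lxc-start which clone flags it used.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Hypothetical stand-in for asking the running lxc-start: returns the
 * CLONE_* mask used to start container `name`, or -1 on failure. */
typedef int (*clone_flags_query_fn)(const char *name);

/* s_flags is the mask given with -s, or -1 if -s was not used.
 * Returns the CLONE_* mask to attach to, or -1 on error. */
static int select_attach_flags(const char *name, int s_flags,
                               clone_flags_query_fn query)
{
    int flags;

    /* -s given: try exactly those namespaces; a later setns() failure
     * for any of them is then an error */
    if (s_flags != -1)
        return s_flags;

    /* no -s: attach to every namespace the container was started with */
    flags = query(name);
    if (flags == -1)
        fprintf(stderr, "failed to determine the namespaces of '%s'\n",
                name);
    return flags;
}

/* Demonstration stubs: a running container and a stopped one. */
static int query_running(const char *name)
{
    (void)name;
    return CLONE_NEWUTS | CLONE_NEWPID | CLONE_NEWIPC | CLONE_NEWNS |
           CLONE_NEWNET;
}

static int query_stopped(const char *name)
{
    (void)name;
    return -1;
}
```

Nothing about the namespace set is hard-coded here: the default comes entirely from what the query reports.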




Re: [lxc-devel] [PATCH v2 1/2] Add option to lxc-attach to select specific namespaces

2012-05-22 Thread Christian Seiler
Hi Serge,

>> What about if we update the command interface to add an additional
>> command along the lines of LXC_COMMAND_GET_NSFLAGS or similar, which
>> returns the bitmask of CLONE_* used for starting the container? Then
>> we would have the logic:
>
> That works fine for persistent containers which were started without
> any command line changes.  But even with a persistent container with
> no network section, I could add a network section on the lxc-start
> command line with '-s' arguments, making the set of cloned namespaces
> different from what you'd expect from the config file.  So there is
> no good way I can think of, generally, to get that bitmask of CLONE_*
> flags used for starting the container.

You misunderstood me: I don't want to read the configuration file - I
want to ask the still-running lxc-start process (that listens on the
abstract socket for the container) to give me the flags it used when
it was run. Just as it may be asked to return a file descriptor for
the console or the PID of the init process. We don't have to generate
any file or store anything, we can just keep the information in a
simple variable that we return via the command interface in case
lxc-attach (or somebody else) asks.

Regards,
Christian




Re: [lxc-devel] [PATCH v2 1/2] Add option to lxc-attach to select specific namespaces

2012-05-22 Thread Christian Seiler
Hi Serge,

>> >>What about if we update the command interface to add an additional
>> >>command along the lines of LXC_COMMAND_GET_NSFLAGS or similar, which
>> >>returns the bitmask of CLONE_* used for starting the container? Then
>> >>we would have the logic:
>> >
>> >That works fine for persistent containers which were started without
>> >any command line changes.  But even with a persistent container with
>> >no network section, I could add a network section on the lxc-start
>> >command line with '-s' arguments, making the set of cloned namespaces
>> >different from what you'd expect from the config file.  So there is
>> >no good way I can think of, generally, to get that bitmask of CLONE_*
>> >flags used for starting the container.
>>
>> You misunderstood me: I don't want to read the configuration file - I
>> want to ask the still-running lxc-start process (that listens on the
>> abstract socket for the container) to give me the flags it used when
>> it was run. Just as it may be asked to return a file descriptor for
>> the console or the PID of the init process. We don't have to generate
>> any file or store anything, we can just keep the information in a
>> simple variable that we return via the command interface in case
>> lxc-attach (or somebody else) asks.
>
> That sounds good :)

Ok, then I'll update the patches and resend them to the list.

Regards,
Christian




[lxc-devel] [PATCH v3 4/6] lxc-unshare: Move functions to determine clone flags from command line options to namespace.c

2012-05-24 Thread Christian Seiler
In order to be able to reuse code in lxc-attach, the functions
lxc_namespace_2_cloneflag and lxc_fill_namespace_flags are moved from
lxc_unshare.c to namespace.c.

Signed-off-by: Christian Seiler 
Cc: Daniel Lezcano 
Cc: Serge Hallyn 
---
 src/lxc/lxc_unshare.c |   45 -
 src/lxc/namespace.c   |   45 +
 src/lxc/namespace.h   |3 +++
 3 files changed, 48 insertions(+), 45 deletions(-)

diff --git a/src/lxc/lxc_unshare.c b/src/lxc/lxc_unshare.c
index 0baccb0..fda2ed8 100644
--- a/src/lxc/lxc_unshare.c
+++ b/src/lxc/lxc_unshare.c
@@ -85,51 +85,6 @@ static uid_t lookup_user(const char *optarg)
return uid;
 }
 
-static char *namespaces_list[] = {
-   "MOUNT", "PID", "UTSNAME", "IPC",
-   "USER", "NETWORK"
-};
-static int cloneflags_list[] = {
-   CLONE_NEWNS, CLONE_NEWPID, CLONE_NEWUTS, CLONE_NEWIPC,
-   CLONE_NEWUSER, CLONE_NEWNET
-};
-
-static int lxc_namespace_2_cloneflag(char *namespace)
-{
-   int i, len;
-   len = sizeof(namespaces_list)/sizeof(namespaces_list[0]);
-   for (i = 0; i < len; i++)
-   if (!strcmp(namespaces_list[i], namespace))
-   return cloneflags_list[i];
-
-   ERROR("invalid namespace name %s", namespace);
-   return -1;
-}
-
-static int lxc_fill_namespace_flags(char *flaglist, int *flags)
-{
-   char *token, *saveptr = NULL;
-   int aflag;
-
-   if (!flaglist) {
-   ERROR("need at least one namespace to unshare");
-   return -1;
-   }
-
-   token = strtok_r(flaglist, "|", &saveptr);
-   while (token) {
-
-   aflag = lxc_namespace_2_cloneflag(token);
-   if (aflag < 0)
-   return -1;
-
-   *flags |= aflag;
-
-   token = strtok_r(NULL, "|", &saveptr);
-   }
-   return 0;
-}
-
 
 struct start_arg {
char ***args;
diff --git a/src/lxc/namespace.c b/src/lxc/namespace.c
index 3e6fc3a..3fa027b 100644
--- a/src/lxc/namespace.c
+++ b/src/lxc/namespace.c
@@ -69,3 +69,48 @@ pid_t lxc_clone(int (*fn)(void *), void *arg, int flags)
 
return ret;
 }
+
+static char *namespaces_list[] = {
+   "MOUNT", "PID", "UTSNAME", "IPC",
+   "USER", "NETWORK"
+};
+static int cloneflags_list[] = {
+   CLONE_NEWNS, CLONE_NEWPID, CLONE_NEWUTS, CLONE_NEWIPC,
+   CLONE_NEWUSER, CLONE_NEWNET
+};
+
+int lxc_namespace_2_cloneflag(char *namespace)
+{
+   int i, len;
+   len = sizeof(namespaces_list)/sizeof(namespaces_list[0]);
+   for (i = 0; i < len; i++)
+   if (!strcmp(namespaces_list[i], namespace))
+   return cloneflags_list[i];
+
+   ERROR("invalid namespace name %s", namespace);
+   return -1;
+}
+
+int lxc_fill_namespace_flags(char *flaglist, int *flags)
+{
+   char *token, *saveptr = NULL;
+   int aflag;
+
+   if (!flaglist) {
+   ERROR("need at least one namespace to unshare");
+   return -1;
+   }
+
+   token = strtok_r(flaglist, "|", &saveptr);
+   while (token) {
+
+   aflag = lxc_namespace_2_cloneflag(token);
+   if (aflag < 0)
+   return -1;
+
+   *flags |= aflag;
+
+   token = strtok_r(NULL, "|", &saveptr);
+   }
+   return 0;
+}
diff --git a/src/lxc/namespace.h b/src/lxc/namespace.h
index 5442dd3..04e81bb 100644
--- a/src/lxc/namespace.h
+++ b/src/lxc/namespace.h
@@ -50,4 +50,7 @@
 
 extern pid_t lxc_clone(int (*fn)(void *), void *arg, int flags);
 
+extern int lxc_namespace_2_cloneflag(char *namespace);
+extern int lxc_fill_namespace_flags(char *flaglist, int *flags);
+
 #endif
-- 
1.7.2.5
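Because `lxc_fill_namespace_flags()` tokenizes its argument with `strtok_r()`, which writes NUL bytes into the string, callers must pass writable memory (an `argv` entry is fine, a string literal is not). The following is a hedged, self-contained sketch of the same `-s` list parsing that works on a private copy instead. `parse_ns_list()` and `ns_name_to_flag()` are illustrative names, not lxc functions, and rejecting an empty list is an assumption based on the "need at least one namespace" check.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const char *const ns_names[] = {
    "MOUNT", "PID", "UTSNAME", "IPC", "USER", "NETWORK"
};
static const int ns_flags[] = {
    CLONE_NEWNS, CLONE_NEWPID, CLONE_NEWUTS, CLONE_NEWIPC,
    CLONE_NEWUSER, CLONE_NEWNET
};

/* Map a single namespace name to its CLONE_* flag, or -1 if unknown. */
static int ns_name_to_flag(const char *name)
{
    for (size_t i = 0; i < sizeof(ns_names) / sizeof(ns_names[0]); i++)
        if (strcmp(ns_names[i], name) == 0)
            return ns_flags[i];
    return -1;
}

/* Parse a list such as "NETWORK|IPC" into a CLONE_* mask.
 * Returns -1 for a NULL or empty list, or for an unknown name.
 * strtok_r() modifies its input, so it runs on a private copy and the
 * caller's string (even a literal) is left untouched. */
static int parse_ns_list(const char *list)
{
    char *copy, *token, *saveptr = NULL;
    int flags = 0, flag;

    if (!list || !(copy = strdup(list)))
        return -1;

    for (token = strtok_r(copy, "|", &saveptr); token;
         token = strtok_r(NULL, "|", &saveptr)) {
        flag = ns_name_to_flag(token);
        if (flag < 0) {
            fprintf(stderr, "invalid namespace name %s\n", token);
            free(copy);
            return -1;
        }
        flags |= flag;
    }

    free(copy);
    return flags ? flags : -1;   /* need at least one namespace */
}
```

For example, `parse_ns_list("NETWORK|IPC")` yields `CLONE_NEWNET | CLONE_NEWIPC`, which is the kind of mask `lxc_attach_to_ns()` later filters on.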




[lxc-devel] [PATCH v3 0/6] Partial namespaces for lxc-attach

2012-05-24 Thread Christian Seiler
Hi there,

this is my third (and hopefully final :)) patch series for partial
namespaces in lxc-attach. I've made the following changes to the previous
versions:

 - Split up the patches a tiny bit more, should make the changes a bit
   clearer.

 - I actually encountered a problem with pid namespaces that I introduced
   when I first added cgroup attaching support to lxc-attach: For pid
   namespaces, only the children of the process doing setns() are really
   100% in the namespace, so the process doing setns() won't get a new pid
   and if that process remounts /proc, it will still show the host's and not
   the container's contents. So I've changed it up a bit to make the setns()
   call again before the fork() - but then I had to adapt the cgroup logic.
   The current solution is the simplest I could come up with. This is
   patch #2.

 - lxc-start now has a command (patch #1) to retrieve the clone flags,
   which lxc-attach uses to attach only to those namespaces when it is run
   without any parameters (patch #3).

Regards,
Christian




[lxc-devel] [PATCH v3 1/6] lxc-start: Add command to retrieve the clone flags used to start the container.

2012-05-24 Thread Christian Seiler
Add the LXC_COMMAND_CLONE_FLAGS command, which retrieves the flags passed to
clone(2) when the container was started. This allows external programs to
determine which namespaces were unshared for the container.

Signed-off-by: Christian Seiler 
Cc: Daniel Lezcano 
Cc: Serge Hallyn 
---
 src/lxc/commands.c |   30 ++
 src/lxc/commands.h |2 ++
 src/lxc/start.c|   34 --
 src/lxc/start.h|1 +
 4 files changed, 57 insertions(+), 10 deletions(-)

diff --git a/src/lxc/commands.c b/src/lxc/commands.c
index 1d488ae..3e551ee 100644
--- a/src/lxc/commands.c
+++ b/src/lxc/commands.c
@@ -148,11 +148,32 @@ pid_t get_init_pid(const char *name)
return command.answer.pid;
 }
 
+int lxc_get_clone_flags(const char *name)
+{
+   struct lxc_command command = {
+   .request = { .type = LXC_COMMAND_CLONE_FLAGS },
+   };
+
+   int ret, stopped = 0;
+
+   ret = lxc_command(name, &command, &stopped);
+   if (ret < 0 && stopped)
+   return -1;
+
+   if (ret < 0) {
+   ERROR("failed to send command");
+   return -1;
+   }
+
+   return command.answer.ret;
+}
+
 extern void lxc_console_remove_fd(int, struct lxc_tty_info *);
 extern int  lxc_console_callback(int, struct lxc_request *, struct lxc_handler *);
 extern int  lxc_stop_callback(int, struct lxc_request *, struct lxc_handler *);
 extern int  lxc_state_callback(int, struct lxc_request *, struct lxc_handler *);
 extern int  lxc_pid_callback(int, struct lxc_request *, struct lxc_handler *);
+extern int  lxc_clone_flags_callback(int, struct lxc_request *, struct lxc_handler *);
 
 static int trigger_command(int fd, struct lxc_request *request,
   struct lxc_handler *handler)
@@ -160,10 +181,11 @@ static int trigger_command(int fd, struct lxc_request *request,
typedef int (*callback)(int, struct lxc_request *, struct lxc_handler *);
 
callback cb[LXC_COMMAND_MAX] = {
-   [LXC_COMMAND_TTY]   = lxc_console_callback,
-   [LXC_COMMAND_STOP]  = lxc_stop_callback,
-   [LXC_COMMAND_STATE] = lxc_state_callback,
-   [LXC_COMMAND_PID]   = lxc_pid_callback,
+   [LXC_COMMAND_TTY] = lxc_console_callback,
+   [LXC_COMMAND_STOP]= lxc_stop_callback,
+   [LXC_COMMAND_STATE]   = lxc_state_callback,
+   [LXC_COMMAND_PID] = lxc_pid_callback,
+   [LXC_COMMAND_CLONE_FLAGS] = lxc_clone_flags_callback,
};
 
if (request->type < 0 || request->type >= LXC_COMMAND_MAX)
diff --git a/src/lxc/commands.h b/src/lxc/commands.h
index d5c013f..3b0ac9a 100644
--- a/src/lxc/commands.h
+++ b/src/lxc/commands.h
@@ -28,6 +28,7 @@ enum {
LXC_COMMAND_STOP,
LXC_COMMAND_STATE,
LXC_COMMAND_PID,
+   LXC_COMMAND_CLONE_FLAGS,
LXC_COMMAND_MAX,
 };
 
@@ -48,6 +49,7 @@ struct lxc_command {
 };
 
 extern pid_t get_init_pid(const char *name);
+extern int lxc_get_clone_flags(const char *name);
 
 extern int lxc_command(const char *name, struct lxc_command *command,
int *stopped);
diff --git a/src/lxc/start.c b/src/lxc/start.c
index 920ff77..7e9913f 100644
--- a/src/lxc/start.c
+++ b/src/lxc/start.c
@@ -277,6 +277,29 @@ int lxc_pid_callback(int fd, struct lxc_request *request,
return 0;
 }
 
+int lxc_clone_flags_callback(int fd, struct lxc_request *request,
+struct lxc_handler *handler)
+{
+   struct lxc_answer answer;
+   int ret;
+
+   answer.pid = 0;
+   answer.ret = handler->clone_flags;
+
+   ret = send(fd, &answer, sizeof(answer), 0);
+   if (ret < 0) {
+   WARN("failed to send answer to the peer");
+   return -1;
+   }
+
+   if (ret != sizeof(answer)) {
+   ERROR("partial answer sent");
+   return -1;
+   }
+
+   return 0;
+}
+
 int lxc_set_state(const char *name, struct lxc_handler *handler, lxc_state_t state)
 {
handler->state = state;
@@ -531,17 +554,16 @@ out_warn_father:
 
 int lxc_spawn(struct lxc_handler *handler)
 {
-   int clone_flags;
int failed_before_rename = 0;
const char *name = handler->name;
 
if (lxc_sync_init(handler))
return -1;
 
-   clone_flags = CLONE_NEWUTS|CLONE_NEWPID|CLONE_NEWIPC|CLONE_NEWNS;
+   handler->clone_flags = CLONE_NEWUTS|CLONE_NEWPID|CLONE_NEWIPC|CLONE_NEWNS;
if (!lxc_list_empty(&handler->conf->network)) {
 
-   clone_flags |= CLONE_NEWNET;
+   handler->clone_flags |= CLONE_NEWNET;
 
/* Find gateway addresses from the link device, which is
 * no longer accessible inside the container. Do this
@@ -564,7 +586,7 @@ int lxc_spawn(struct lxc_handler *han

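The round trip this command performs can be illustrated without a running container. In the sketch below, a `socketpair()` stands in for the abstract Unix socket lxc-start listens on, and `struct flags_answer`, `send_clone_flags()` and `recv_clone_flags()` are hypothetical names that only loosely mirror `struct lxc_answer` and `lxc_clone_flags_callback()`; the real wire format is whatever lxc's `commands.h` defines.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical answer layout, loosely modelled on struct lxc_answer. */
struct flags_answer {
    pid_t pid;   /* unused for this request, always 0 */
    int ret;     /* clone(2) flags used to start the container */
};

/* Server side: what the lxc-start callback does with the stored flags. */
static int send_clone_flags(int fd, int clone_flags)
{
    struct flags_answer answer = { .pid = 0, .ret = clone_flags };
    ssize_t ret = send(fd, &answer, sizeof(answer), 0);

    if (ret < 0) {
        perror("send");
        return -1;
    }
    if ((size_t)ret != sizeof(answer)) {
        fprintf(stderr, "partial answer sent\n");
        return -1;
    }
    return 0;
}

/* Client side: what lxc-attach does after sending its request. */
static int recv_clone_flags(int fd)
{
    struct flags_answer answer;
    ssize_t ret = recv(fd, &answer, sizeof(answer), 0);

    if (ret != (ssize_t)sizeof(answer))
        return -1;   /* error, or lxc-start went away */
    return answer.ret;
}
```

Keeping the flags in the handler (as `lxc_spawn()` now does with `handler->clone_flags`) means the answer reflects what was actually passed to clone(), including any command-line overrides, rather than what the configuration file suggests.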
[lxc-devel] [PATCH v3 2/6] lxc-attach: Remodel cgroup attach logic and attach to namespaces again in parent process

2012-05-24 Thread Christian Seiler
With the introduction of lxc-attach's functionality to attach to cgroups,
the setns() calls were put in the child process after the fork() and not the
parent process before the fork() so the parent process remained outside the
namespaces and could add the child to the correct cgroup.

Unfortunately, the pid namespace really affects only children of the current
process and not the process itself, which has several drawbacks: The
attached program does not have a pid inside the container and the context
that is used when remounting /proc from that process is wrong. Thus, this
patch restores the previous logic of first setting the namespaces and then
forking, so that the child process (which then exec()s the desired program)
is a real member of the container.

However, inside the container, there is no guarantee that the cgroup
filesystem is still mounted and that we are allowed to write to it (which
is why the setns() was moved in the first place).

To work around both problems, we split the cgroup attach functionality
into two parts: preparing the attachment, which just opens the tasks
files of all cgroups and keeps the file descriptors open, and writing to
those file descriptors. This allows us to open all the tasks files in
lxc_attach, then call setns(), then fork; the child process closes them
all, while the parent process just writes the pid of the child process to
all those file descriptors.

Signed-off-by: Christian Seiler 
Cc: Daniel Lezcano 
Cc: Serge Hallyn 
---
 src/lxc/cgroup.c |  152 -
 src/lxc/cgroup.h |3 +
 src/lxc/lxc_attach.c |   62 +++-
 3 files changed, 186 insertions(+), 31 deletions(-)

diff --git a/src/lxc/cgroup.c b/src/lxc/cgroup.c
index e124499..f1461f4 100644
--- a/src/lxc/cgroup.c
+++ b/src/lxc/cgroup.c
@@ -254,13 +254,37 @@ static int cgroup_enable_clone_children(const char *path)
return ret;
 }
 
-static int lxc_one_cgroup_attach(const char *name,
-struct mntent *mntent, pid_t pid)
+static int lxc_one_cgroup_finish_attach(int fd, pid_t pid)
 {
-   FILE *f;
+   char buf[32];
+   int ret;
+
+   snprintf(buf, 32, "%ld", (long)pid);
+
+   ret = write(fd, buf, strlen(buf));
+   if (ret <= 0) {
+   SYSERROR("failed to write pid '%ld' to fd '%d'", (long)pid, fd);
+   ret = -1;
+   } else {
+   ret = 0;
+   }
+
+   close(fd);
+   return ret;
+}
+
+static int lxc_one_cgroup_dispose_attach(int fd)
+{
+   close(fd);
+   return 0;
+}
+
+static int lxc_one_cgroup_prepare_attach(const char *name, struct mntent *mntent)
+{
+   int fd;
char tasks[MAXPATHLEN], initcgroup[MAXPATHLEN];
char *cgmnt = mntent->mnt_dir;
-   int flags, ret = 0;
+   int flags;
 
flags = get_cgroup_flags(mntent);
 
@@ -269,31 +293,83 @@ static int lxc_one_cgroup_attach(const char *name,
 (flags & CGROUP_NS_CGROUP) ? "" : "/lxc",
 name);
 
-   f = fopen(tasks, "w");
-   if (!f) {
+   fd = open(tasks, O_WRONLY);
+   if (fd < 0) {
SYSERROR("failed to open '%s'", tasks);
return -1;
}
 
-   if (fprintf(f, "%d", pid) <= 0) {
-   SYSERROR("failed to write pid '%d' to '%s'", pid, tasks);
-   ret = -1;
+   return fd;
+}
+
+static int lxc_one_cgroup_attach(const char *name, struct mntent *mntent, pid_t pid)
+{
+   int fd;
+
+   fd = lxc_one_cgroup_prepare_attach(name, mntent);
+   if (fd < 0) {
+   return -1;
}
 
-   fclose(f);
+   return lxc_one_cgroup_finish_attach(fd, pid);
+}
+
+int lxc_cgroup_dispose_attach(void *data)
+{
+   int *fds = data;
+   int ret, err;
+
+   if (!fds) {
+   return 0;
+   }
+
+   ret = 0;
+
+   for (; *fds >= 0; fds++) {
+   err = lxc_one_cgroup_dispose_attach(*fds);
+   if (err) {
+   ret = err;
+   }
+   }
+
+   free(data);
 
return ret;
 }
 
-/*
- * for each mounted cgroup, attach a pid to the cgroup for the container
- */
-int lxc_cgroup_attach(const char *name, pid_t pid)
+int lxc_cgroup_finish_attach(void *data, pid_t pid)
+{
+   int *fds = data;
+   int err;
+
+   if (!fds) {
+   return 0;
+   }
+
+   for (; *fds >= 0; fds++) {
+   err = lxc_one_cgroup_finish_attach(*fds, pid);
+   if (err) {
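+   /* the failing fd was already closed by
+* lxc_one_cgroup_finish_attach(); close only
+* the ones not processed yet, then bail out
+*/
+   for (fds++; *fds >= 0; fds++)
+   close(*fds);
+   free(data);
+   return -1;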
+   }
+   *fds = -1;
+   }
+
+   free(data);
+
+   return 0;
+}
+
+int lxc_cgroup_prepare_attach(const char *name, void **data)
 {
   

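The prepare/finish split described above can be sketched with ordinary files standing in for the cgroup `tasks` files. `prepare_attach()`, `finish_attach()` and `dispose_attach()` are illustrative names, not the lxc functions. One design choice is deliberate: `finish_attach()` always walks the whole `-1`-terminated array and closes every descriptor exactly once, even after a write error, so nothing is leaked or closed twice.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Step 1 (before setns/fork): open every tasks file for writing.
 * Returns a -1-terminated array of descriptors, or NULL on error. */
static int *prepare_attach(const char *const *paths, size_t n)
{
    int *fds = malloc((n + 1) * sizeof(*fds));

    if (!fds)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        fds[i] = open(paths[i], O_WRONLY);
        if (fds[i] < 0) {
            perror(paths[i]);
            while (i-- > 0)        /* close what was opened so far */
                close(fds[i]);
            free(fds);
            return NULL;
        }
    }
    fds[n] = -1;
    return fds;
}

/* Child side: it must not keep the descriptors, just close them. */
static void dispose_attach(int *fds)
{
    for (int *p = fds; p && *p >= 0; p++)
        close(*p);
    free(fds);
}

/* Parent side: write the child's pid to every descriptor.
 * Every descriptor is closed exactly once, even after an error. */
static int finish_attach(int *fds, pid_t pid)
{
    char buf[32];
    int len, ret = 0;

    if (!fds)
        return 0;
    len = snprintf(buf, sizeof(buf), "%ld", (long)pid);
    for (int *p = fds; *p >= 0; p++) {
        if (ret == 0 && write(*p, buf, (size_t)len) != (ssize_t)len)
            ret = -1;
        close(*p);
    }
    free(fds);
    return ret;
}
```

In lxc-attach terms, `prepare_attach()` would run before `setns()` and `fork()`, the child would call `dispose_attach()`, and the parent would call `finish_attach()` with the child's pid.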
[lxc-devel] [PATCH v3 3/6] lxc-attach: Detect which namespaces to attach to dynamically

2012-05-24 Thread Christian Seiler
Use the command interface to contact lxc-start to receive the set of
flags passed to clone() when starting the container. This allows lxc-attach
to determine which namespaces were used for the container and select only
those to attach to.

Signed-off-by: Christian Seiler 
Cc: Daniel Lezcano 
Cc: Serge Hallyn 
---
 src/lxc/attach.c |   42 +-
 src/lxc/attach.h |2 +-
 src/lxc/lxc_attach.c |   16 +++-
 3 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/src/lxc/attach.c b/src/lxc/attach.c
index a95b3d3..37e667f 100644
--- a/src/lxc/attach.c
+++ b/src/lxc/attach.c
@@ -121,13 +121,22 @@ out_error:
return NULL;
 }
 
-int lxc_attach_to_ns(pid_t pid)
+int lxc_attach_to_ns(pid_t pid, int which)
 {
char path[MAXPATHLEN];
-   char *ns[] = { "pid", "mnt", "net", "ipc", "uts" };
-   const int size = sizeof(ns) / sizeof(char *);
+   /* according to <http://article.gmane.org/gmane.linux.kernel.containers.lxc.devel/1429>,
+* the file for user namespaces in /proc/$pid/ns will be called
+* 'user' once the kernel supports it
+*/
+   static char *ns[] = { "mnt", "pid", "uts", "ipc", "user", "net" };
+   static int flags[] = {
+   CLONE_NEWNS, CLONE_NEWPID, CLONE_NEWUTS, CLONE_NEWIPC,
+   CLONE_NEWUSER, CLONE_NEWNET
+   };
+   static const int size = sizeof(ns) / sizeof(char *);
int fd[size];
-   int i;
+   int i, j, saved_errno;
+
 
snprintf(path, MAXPATHLEN, "/proc/%d/ns", pid);
if (access(path, X_OK)) {
@@ -136,16 +145,39 @@ int lxc_attach_to_ns(pid_t pid)
}
 
for (i = 0; i < size; i++) {
+   /* ignore if we are not supposed to attach to that
+* namespace
+*/
+   if (which != -1 && !(which & flags[i])) {
+   fd[i] = -1;
+   continue;
+   }
+
snprintf(path, MAXPATHLEN, "/proc/%d/ns/%s", pid, ns[i]);
fd[i] = open(path, O_RDONLY);
if (fd[i] < 0) {
+   saved_errno = errno;
+
+   /* close all already opened file descriptors before
+* we return an error, so we don't leak them
+*/
+   for (j = 0; j < i; j++)
+   close(fd[j]);
+
+   errno = saved_errno;
SYSERROR("failed to open '%s'", path);
return -1;
}
}
 
for (i = 0; i < size; i++) {
-   if (setns(fd[i], 0)) {
+   if (fd[i] >= 0 && setns(fd[i], 0) != 0) {
+   saved_errno = errno;
+
+   for (j = i; j < size; j++)
+   close(fd[j]);
+
+   errno = saved_errno;
SYSERROR("failed to set namespace '%s'", ns[i]);
return -1;
}
diff --git a/src/lxc/attach.h b/src/lxc/attach.h
index 2d46c83..d96fdae 100644
--- a/src/lxc/attach.h
+++ b/src/lxc/attach.h
@@ -33,7 +33,7 @@ struct lxc_proc_context_info {
 
 extern struct lxc_proc_context_info *lxc_proc_get_context_info(pid_t pid);
 
-extern int lxc_attach_to_ns(pid_t other_pid);
+extern int lxc_attach_to_ns(pid_t other_pid, int which);
 extern int lxc_attach_drop_privs(struct lxc_proc_context_info *ctx);
 
 #endif
diff --git a/src/lxc/lxc_attach.c b/src/lxc/lxc_attach.c
index e4f604b..10d4a64 100644
--- a/src/lxc/lxc_attach.c
+++ b/src/lxc/lxc_attach.c
@@ -51,6 +51,7 @@ static const struct option my_longopts[] = {
 
 static int elevated_privileges = 0;
 static signed long new_personality = -1;
+static int namespace_flags = -1;
 
 static int my_parser(struct lxc_arguments* args, int c, char* arg)
 {
@@ -139,11 +140,24 @@ int main(int argc, char *argv[])
 
curdir = get_current_dir_name();
 
+   /* determine which namespaces the container was created with
+* by asking lxc-start
+*/
+   if (namespace_flags == -1) {
+   namespace_flags = lxc_get_clone_flags(my_args.name);
+   /* call failed */
+   if (namespace_flags == -1) {
+   ERROR("failed to automatically determine the "
+ "namespaces which the container unshared");
+   return -1;
+   }
+   }
+
/* we need to attach before we fork since certain namespaces
 * (such as pid namespaces) only really affect children of the
 * current process and not 

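The mask handling in `lxc_attach_to_ns()` can be illustrated with a standalone sketch. `open_ns_fds()` and `enter_ns_fds()` are hypothetical helpers; the namespace file names and their order mirror the patch. Unselected slots are set to -1 so that cleanup only ever closes descriptors that were really opened, and opening with `O_CLOEXEC` is an extra precaution not taken from the patch. Calling `setns()` generally requires `CAP_SYS_ADMIN`, so only the opening step can be exercised unprivileged.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define NS_COUNT 6

/* Same names and order as in the patch. */
static const char *const ns_files[NS_COUNT] = {
    "mnt", "pid", "uts", "ipc", "user", "net"
};
static const int ns_clone_flags[NS_COUNT] = {
    CLONE_NEWNS, CLONE_NEWPID, CLONE_NEWUTS, CLONE_NEWIPC,
    CLONE_NEWUSER, CLONE_NEWNET
};

/* Open /proc/<pid>/ns/<name> for each namespace selected in `which`
 * (-1 selects all). Unselected slots are set to -1. On failure, every
 * descriptor opened so far is closed and errno is preserved. */
static int open_ns_fds(pid_t pid, int which, int fd[NS_COUNT])
{
    char path[64];

    for (int i = 0; i < NS_COUNT; i++) {
        fd[i] = -1;
        if (which != -1 && !(which & ns_clone_flags[i]))
            continue;

        snprintf(path, sizeof(path), "/proc/%ld/ns/%s", (long)pid,
                 ns_files[i]);
        fd[i] = open(path, O_RDONLY | O_CLOEXEC);
        if (fd[i] < 0) {
            int saved_errno = errno;

            for (int j = 0; j < i; j++)
                if (fd[j] >= 0)
                    close(fd[j]);
            errno = saved_errno;
            return -1;
        }
    }
    return 0;
}

/* Enter each opened namespace, then close its descriptor. After the
 * first setns() failure the remaining descriptors are only closed. */
static int enter_ns_fds(int fd[NS_COUNT])
{
    int ret = 0;

    for (int i = 0; i < NS_COUNT; i++) {
        if (fd[i] < 0)
            continue;
        if (ret == 0 && setns(fd[i], 0) != 0) {
            perror(ns_files[i]);
            ret = -1;
        }
        close(fd[i]);
        fd[i] = -1;
    }
    return ret;
}
```

Opening all descriptors before entering any namespace matters because, once the mount namespace has been switched, `/proc/<pid>` may no longer refer to the same process view.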
[lxc-devel] [PATCH v3 5/6] lxc-attach: Add -s option to select namespaces to attach to

2012-05-24 Thread Christian Seiler
This patch allows the user to select any combination of namespaces (network,
pid, mount, uts, ipc, user) that lxc-attach should use when attaching to the
container; all other namespaces will not be attached to.

This allows the user, for example, to attach only to the network namespace
and use the host's (and not the container's) network tools to reconfigure
the network of the container.

Signed-off-by: Christian Seiler 
Cc: Daniel Lezcano 
Cc: Serge Hallyn 
---
 doc/lxc-attach.sgml.in |   98 +--
 src/lxc/lxc_attach.c   |   20 +-
 2 files changed, 112 insertions(+), 6 deletions(-)

diff --git a/doc/lxc-attach.sgml.in b/doc/lxc-attach.sgml.in
index 7092f16..035cd27 100644
--- a/doc/lxc-attach.sgml.in
+++ b/doc/lxc-attach.sgml.in
@@ -49,7 +49,8 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
   
 lxc-attach -n
 name -a
-arch -e
+arch -e -s
+namespaces
 -- command
   
 
@@ -122,6 +123,29 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

   
 
+  
+   
+ -s, --namespaces namespaces
+   
+   
+ 
+   Specify the namespaces to attach to, as a pipe-separated list,
+   e.g. NETWORK|IPC. Allowed values are
+   MOUNT, PID,
+   UTSNAME, IPC,
+   USER and
+   NETWORK. This allows one to change
+   the context of the process to e.g. the network namespace of the
+   container while retaining the other namespaces as those of the
+   host.
+ 
+ 
+   Important: This option implies
+   -e.
+ 
+   
+  
+
 
 
   
@@ -144,19 +168,83 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
   
   
 To deactivate the network link eth1 of a running container that
-does not have the NET_ADMIN capability, use the -e
-option to use increased capabilities:
+does not have the NET_ADMIN capability, use either the
+-e option to use increased capabilities,
+assuming the ip tool is installed inside the container:
 
   lxc-attach -n container -e -- /sbin/ip link delete eth1
 
+Or, alternatively, use the -s option to use the
+tools installed on the host outside the container:
+
+  lxc-attach -n container -s NETWORK -- /sbin/ip link delete eth1
+
   
   
 
   
+Compatibility
+
+  Attaching completely (including the pid and mount namespaces) to a
+  container requires a patched kernel; please see the lxc website for
+  details. lxc-attach will fail if used with an
+  unpatched kernel.
+
+
+  Nevertheless, it will succeed on an unpatched kernel of version 3.0
+  or higher if the -s option is used to restrict
+  attachment to one or more of the namespaces
+  NETWORK, IPC
+  and UTSNAME.
+
+
+  Attaching to user namespaces is currently not supported at all
+  by the kernel. lxc-attach should, however, be able
+  to do this once future kernel versions implement it.
+
+  
+
+  
+Notes
+
+  The Linux /proc and
+  /sys filesystems contain information
+  about some quantities that are affected by namespaces, such as
+  the directories named after process ids in
+  /proc or the network interface information
+  in /sys/class/net. The namespace of the
+  process mounting the pseudo-filesystems determines what information
+  is shown, not the namespace of the process
+  accessing /proc or
+  /sys.
+
+
+  If one uses the -s option to attach only to
+  the pid namespace of a container, but not its mount namespace
+  (which will contain the /proc of the
+  container and not the host), the contents of /proc
+  will reflect those of the host and not the container. Analogously,
+  the same issue occurs when reading the contents of
+  /sys/class/net and attaching to just
+  the network namespace.
+
+
+  A workaround is to use lxc-unshare to unshare
+  the mount namespace after using lxc-attach with
+  -s PID and/or -s
+  NETWORK and then unmount and then mount again both
+  pseudo-filesystems within that new mount namespace, before
+  executing a program/script that relies on this information to be
+  correct.
+
+  
+
+  
 Security
 
-  The -e should be used with care, as it may break
-  the isolation of the containers if used improperly.
+  The -e and -s options should
+  be used with care, as they may break the isolation of the containers
+  if used improperly.
 
   
 
diff --git a/src/lxc/lxc_attach.c b/src/lxc/lxc_attach.c
index 10d4a64..4f22752 100644
--- a/src/lxc/lxc_attach.c
+++ b/src/lxc/lxc_attach.c
@@ -40,12 +40,14 @@
 #include "start.h"
 #include "

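The option interplay documented in this series can be summarised in a small sketch: `-s` implies `-e`, the `-R` option from the following patch only takes effect when `-s` is used and the mount namespace is not attached, and on an unpatched kernel of version 3.0 or higher only `NETWORK`, `IPC` and `UTSNAME` can be attached. `struct attach_plan`, `plan_attach()` and `attachable_without_patched_kernel()` are hypothetical; they encode the documented rules, not the actual `lxc_attach.c` code, which is truncated here.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdbool.h>

/* Hypothetical summary of what lxc-attach should do. */
struct attach_plan {
    int ns_flags;   /* CLONE_NEW* mask to attach to */
    bool elevated;  /* keep elevated privileges, as with -e */
    bool remount;   /* remount /proc and /sys, as with -R */
};

static struct attach_plan plan_attach(bool s_given, int ns_flags,
                                      bool opt_e, bool opt_R)
{
    struct attach_plan p = { .ns_flags = ns_flags };

    /* -s implies -e */
    p.elevated = opt_e || s_given;
    /* -R only matters with -s and is ignored when the mount
     * namespace is attached anyway */
    p.remount = opt_R && s_given && !(ns_flags & CLONE_NEWNS);
    return p;
}

/* Per the Compatibility section: an unpatched kernel >= 3.0 can only
 * attach to the network, IPC and UTS namespaces. */
static bool attachable_without_patched_kernel(int ns_flags)
{
    const int supported = CLONE_NEWNET | CLONE_NEWIPC | CLONE_NEWUTS;

    return ns_flags != 0 && (ns_flags & ~supported) == 0;
}
```

For instance, `-s NETWORK -R` yields a plan that keeps elevated privileges and remounts `/proc` and `/sys`, while adding `MOUNT` to the list turns the remount off.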
[lxc-devel] [PATCH v3 6/6] lxc-attach: Add -R option to remount /sys and /proc when only partially attaching

2012-05-24 Thread Christian Seiler
When attaching to only some namespaces of the container but not the mount
namespace, the contents of /sys and /proc of the host system do not properly
reflect the context of the container's pid and/or network namespaces, and
possibly others.

The introduced -R option adds the possibility to additionally unshare the
mount namespace (when it is not being attached) and remount /sys and /proc
in order for those filesystems to properly reflect the container's context
even when only attaching to some of the namespaces.

Signed-off-by: Christian Seiler 
Acked-by: Serge Hallyn 
Cc: Daniel Lezcano 
---
 doc/lxc-attach.sgml.in |   44 +++-
 src/lxc/attach.c   |   44 
 src/lxc/attach.h   |1 +
 src/lxc/lxc_attach.c   |   22 +-
 4 files changed, 101 insertions(+), 10 deletions(-)

diff --git a/doc/lxc-attach.sgml.in b/doc/lxc-attach.sgml.in
index 035cd27..1724393 100644
--- a/doc/lxc-attach.sgml.in
+++ b/doc/lxc-attach.sgml.in
@@ -50,7 +50,7 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 lxc-attach -n
 name -a
 arch -e -s
-namespaces
+namespaces -R
 -- command
   
 
@@ -146,7 +146,30 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

   
 
-
+  
+   
+ -R, --remount-sys-proc
+   
+   
+ 
+   When using -s and the mount namespace is not
+   included, this flag will cause lxc-attach
+   to remount /proc and
+   /sys to reflect the contexts of the other
+   namespaces that were attached.
+ 
+ 
+   Please see the Notes section for more
+   details.
+ 
+ 
+   This option will be ignored if one tries to attach to the
+   mount namespace anyway.
+ 
+   
+  
+
+ 
 
   
 
@@ -229,13 +252,16 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
   the network namespace.
 
 
-  A workaround is to use lxc-unshare to unshare
-  the mount namespace after using lxc-attach with
-  -s PID and/or -s
-  NETWORK and then unmount and then mount again both
-  pseudo-filesystems within that new mount namespace, before
-  executing a program/script that relies on this information to be
-  correct.
+  To work around this problem, the -R flag provides
+  the option to remount /proc and
+  /sys in order for them to reflect the
+  network/pid namespace context of the attached process. In order
+  not to interfere with the host's actual filesystem, the mount
+  namespace will be unshared (like lxc-unshare
+  does) before this is done, essentially giving the process a new
+  mount namespace, which is identical to the host's mount namespace
+  except for the /proc and
+  /sys filesystems.
 
   
 
diff --git a/src/lxc/attach.c b/src/lxc/attach.c
index 37e667f..ec0e083 100644
--- a/src/lxc/attach.c
+++ b/src/lxc/attach.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #if !HAVE_DECL_PR_CAPBSET_DROP
@@ -188,6 +189,49 @@ int lxc_attach_to_ns(pid_t pid, int which)
return 0;
 }
 
+int lxc_attach_remount_sys_proc()
+{
+   int ret;
+
+   ret = unshare(CLONE_NEWNS);
+   if (ret < 0) {
+   SYSERROR("failed to unshare mount namespace");
+   return -1;
+   }
+
+   /* assume /proc is always mounted, so remount it */
+   ret = umount2("/proc", MNT_DETACH);
+   if (ret < 0) {
+   SYSERROR("failed to unmount /proc");
+   return -1;
+   }
+
+   ret = mount("none", "/proc", "proc", 0, NULL);
+   if (ret < 0) {
+   SYSERROR("failed to remount /proc");
+   return -1;
+   }
+
+   /* try to umount /sys - if it's not a mount point,
+* we'll get EINVAL, then we ignore it because it
+* may not have been mounted in the first place
+*/
+   ret = umount2("/sys", MNT_DETACH);
+   if (ret < 0 && errno != EINVAL) {
+   SYSERROR("failed to unmount /sys");
+   return -1;
+   } else if (ret == 0) {
+   /* remount it */
+   ret = mount("none", "/sys", "sysfs", 0, NULL);
+   if (ret < 0) {
+   SYSERROR("failed to remount /sys");
+   return -1;
+   }
+   }
+
+   return 0;
+}
+
 int lxc_attach_drop_privs(struct lxc_proc_context_info *ctx)
 {
int last_cap = lxc_caps_last_cap();
diff --git a/src/lxc/attach.h b/src/lxc/attach.h
index d96fdae..aab47e3 100644
--- a/src/lxc/attach.h
+++ b/src/lxc/attach.h
@@ -34,6 +34,7 @@ struct lxc_proc_context

Re: [lxc-devel] [PATCH v3 6/6] lxc-attach: Add -R option to remount /sys and /proc when only partially attaching

2012-05-24 Thread Christian Seiler
Hi Serge,

> Note there is no reason to resend this patch for this,

Actually, there were some trivial changes here due to patch #2, which
reordered some code in lxc_attach.c - that's why I resent it. Now it
should be trivial to apply all of the 6 patches to the current master
branch, before you'd have had to do a bit of merging.

> but do you think it would be worthwhile to warn if the user specified
> -R, but CLONE_NEWNS was already in the mount flags?

I don't think it's necessary (this is a very specialized feature
anyway), but I don't really care, so if you think this should be done,
I can update the patch.

Regards,
Christian


--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel


[lxc-devel] Shutting down containers properly

2012-05-25 Thread Christian Seiler
Hi,

Currently, lxc-stop sends SIGKILL to the init process of the container,
which causes all the other processes in the container to also receive
a SIGKILL. I don't think that is a good course of action, since sending
SIGKILL to for example a database server can lead to potential data
loss.

A much better way of stopping containers would be in my opinion to
first send the container a shutdown signal - and then wait for a
specified amount of time before really killing the container with a
KILL signal.

Unfortunately, no init system will react to SIGTERM and shut down the
container, so it is not quite as easy. I've looked a bit at different
init systems to see how to properly shut them down:

  - lxc application containers (lxc-execute): lxc-init will do a
kill(-1, SIGTERM) if it receives a SIGTERM itself, so sending
it a SIGTERM is sufficient to initiate a proper shutdown

  - sysvinit: open /run/initctl (newer Debian) or /dev/initctl (older
Debian and other distros) and write a binary message to it to switch
to runlevel 0

  - upstart: connect to DBus and tell it to switch to runlevel 0

  - systemd: either connect to DBus and tell it to switch runlevel or
send SIGRTMIN + 4, that will also cause a shutdown

  - sysvinit + upstart + systemd also all provide a 'telinit' binary,
where calling 'telinit 0' will initiate a shutdown
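For the sysvinit case above, the "binary message" is a fixed-size request record written to the initctl FIFO. The sketch below builds such a record for runlevel 0 and writes it to an already opened descriptor. The layout (magic 0x03091969, command 1 for a runlevel change, 384 bytes in total) is assumed to mirror sysvinit's initreq.h; please check it against the initreq.h of the sysvinit version actually running in the container. All `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Assumed to mirror sysvinit's initreq.h; verify before relying on it. */
#define SKETCH_INIT_MAGIC      0x03091969
#define SKETCH_INIT_CMD_RUNLVL 1

struct sketch_init_request {
	int magic;      /* SKETCH_INIT_MAGIC */
	int cmd;        /* SKETCH_INIT_CMD_RUNLVL for a runlevel change */
	int runlevel;   /* target runlevel as a character, e.g. '0' */
	int sleeptime;  /* not used for runlevel changes */
	char data[368]; /* pads the record to its fixed size */
};

/* init expects exactly 384 bytes per request (assumption, see above). */
static_assert(sizeof(struct sketch_init_request) == 384,
	      "init request must be 384 bytes");

/* Fill in a request asking init to switch to `runlevel`. */
static void sketch_make_runlevel_request(struct sketch_init_request *req,
					 char runlevel)
{
	memset(req, 0, sizeof(*req));
	req->magic = SKETCH_INIT_MAGIC;
	req->cmd = SKETCH_INIT_CMD_RUNLVL;
	req->runlevel = runlevel;
}

/* Write the whole request to an already opened initctl FIFO.
 * Returns 0 on success, -1 on a failed or short write. */
static int sketch_write_request(int fd, const struct sketch_init_request *req)
{
	ssize_t n = write(fd, req, sizeof(*req));

	return n == (ssize_t)sizeof(*req) ? 0 : -1;
}
```

Opening the right FIFO (inside the container's mount namespace, not the host's) is deliberately left out here; the writer only takes a descriptor.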

My proposal would be the following:

lxc-stop first sends a new SHUTDOWN command (instead of the current
STOP command), which initiates the shutdown and returns immediately.
The command handler in lxc-start will then initiate a shutdown of the
container (see below). lxc-stop will wait for a given number of seconds
and if the container is not stopped by then, it will send the current
STOP command to actually kill the container with SIGKILL.
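The SHUTDOWN-then-STOP sequence amounts to "ask politely, wait, then kill". A minimal sketch of that escalation for a single process could look like this. It assumes the target is a child of the caller so it can be reaped with waitpid(); lxc-stop itself would have to go through lxc-start's command handler instead. `sketch_stop` and its parameters are hypothetical, and polling with waitpid(WNOHANG) every 10 ms is just one simple way to implement the timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Poll a child for up to timeout_ms milliseconds.
 * Returns 1 if it exited (and was reaped), 0 if it is still
 * running, -1 on error (e.g. pid is not our child). */
static int sketch_wait_exit(pid_t pid, int timeout_ms)
{
	const struct timespec step = { 0, 10 * 1000 * 1000 }; /* 10 ms */

	for (int waited = 0; ; waited += 10) {
		pid_t r = waitpid(pid, NULL, WNOHANG);

		if (r == pid)
			return 1;
		if (r < 0 && errno != EINTR)
			return -1;
		if (waited >= timeout_ms)
			return 0;
		nanosleep(&step, NULL);
	}
}

/* Send shutdown_sig, wait up to timeout_ms, then fall back to SIGKILL.
 * Returns 0 for a clean exit, 1 if SIGKILL was needed, -1 on error. */
static int sketch_stop(pid_t pid, int shutdown_sig, int timeout_ms)
{
	int r;

	if (kill(pid, shutdown_sig) < 0)
		return -1;

	r = sketch_wait_exit(pid, timeout_ms);
	if (r != 0)
		return r == 1 ? 0 : -1;

	/* grace period over: behave like the current STOP command */
	if (kill(pid, SIGKILL) < 0)
		return -1;
	while (waitpid(pid, NULL, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}
	return 1;
}
```

For example, `sketch_stop(pid, SIGTERM, 30000)` would give the process 30 seconds before killing it; a --force option would simply skip straight to SIGKILL.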

In addition, add a --force option so that lxc-stop is still
able to kill all processes immediately.

Now how to shut down the container? In lxc.conf there should be a new
configuration option, lxc.shutdown_method, which can carry the
following values: "application", "sysvinit", "systemd" and "exec".
For application containers started with lxc-execute, it will default
to "application", for system containers started with lxc-start, it will
default to "sysvinit".

The following actions will be performed:

"application": send SIGTERM to init process of the container

"sysvinit": fork(), child process does setns() for mount namespace,
 tries to write the runlevel request to /run/initctl or /dev/initctl
 (whichever exists), but first checks whether st_dev and
 st_ino entries do NOT match those of the host's files,
 so we don't accidentally shut down the host (if the
 container shares a filesystem with the host)

"systemd": send SIGRTMIN + 4 to init process of the container

"exec": run lxc-attach for the container with the contents of the
 new option lxc.shutdown_command as parameter
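The proposed lxc.shutdown_method values and their defaults could be represented roughly as follows. This is only a sketch: the enum and function names are hypothetical, and only the two purely signal-based methods are mapped to a signal (SIGTERM for "application", SIGRTMIN + 4 for "systemd", as described above); "sysvinit" and "exec" need the extra work described in their entries.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical in-memory form of the proposed lxc.shutdown_method. */
enum sketch_shutdown_method {
	SHUTDOWN_INVALID = -1,
	SHUTDOWN_APPLICATION,
	SHUTDOWN_SYSVINIT,
	SHUTDOWN_SYSTEMD,
	SHUTDOWN_EXEC,
};

/* value == NULL means the option is absent: lxc-execute containers
 * default to "application", lxc-start containers to "sysvinit". */
static enum sketch_shutdown_method
sketch_parse_shutdown_method(const char *value, bool started_by_lxc_execute)
{
	static const struct {
		const char *name;
		enum sketch_shutdown_method method;
	} table[] = {
		{ "application", SHUTDOWN_APPLICATION },
		{ "sysvinit",    SHUTDOWN_SYSVINIT },
		{ "systemd",     SHUTDOWN_SYSTEMD },
		{ "exec",        SHUTDOWN_EXEC },
	};

	if (!value)
		return started_by_lxc_execute ? SHUTDOWN_APPLICATION
					      : SHUTDOWN_SYSVINIT;
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(table[i].name, value) == 0)
			return table[i].method;
	return SHUTDOWN_INVALID;
}

/* Signal for the purely signal-based methods; 0 means the method
 * needs more than a signal (initctl FIFO or lxc-attach). */
static int sketch_shutdown_signal(enum sketch_shutdown_method m)
{
	switch (m) {
	case SHUTDOWN_APPLICATION:
		return SIGTERM;
	case SHUTDOWN_SYSTEMD:
		return SIGRTMIN + 4;
	default:
		return 0;
	}
}
```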

I haven't included any explicit method for shutting down upstart, so
containers running upstart inside (assuming that's even possible, I
don't know much about upstart) should probably use method exec and
execute telinit 0 inside the container. Sending simple signals to the
init process as in application / systemd or opening a FIFO and writing
some bytes for sysvinit is still quite trivial, but implementing DBus
(esp. across container boundaries) - which would be required for native
upstart shutdown support - seems like overkill to me.

On the other hand, I do want to explicitly implement the sysvinit way,
since there we can check that we're definitely not going to shut down
the host accidentally (by checking the device/inode numbers of the
initctl FIFOs), which we can't be 100% sure of with exec.
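The safety check for the sysvinit method boils down to comparing device and inode numbers. A minimal sketch, assuming the host's initctl FIFO is stat()ed before the setns() call and the container's candidate path after it (the function name and its return convention are hypothetical):

```c
#include <stddef.h>
#include <sys/stat.h>

/* Decide whether `path` (as seen after setns() into the container's
 * mount namespace) may be written to. host_st is the result of stat()ing
 * the host's initctl before the setns(), or NULL if the host has none.
 * Returns 1 if safe, 0 if it is the host's own file, -1 if stat fails. */
static int sketch_initctl_is_safe(const struct stat *host_st, const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;
	if (host_st && st.st_dev == host_st->st_dev &&
	    st.st_ino == host_st->st_ino)
		return 0;
	return 1;
}
```

A return value of 0 means a request written to that path would reach the host's init, so it must not be sent.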

Caveats:

1. application / systemd methods should always work, since we just send
a signal to the init process; sysvinit will only work if attaching
mount namespaces is implemented in the kernel and exec only if full
lxc-attach works (so all namespaces). But the worst case scenario here
is that we still kill all processes in the container with lxc-stop if
the kernel doesn't support attach, so there is no loss for current
users.

2. If the container is frozen, the current logic first sends the KILL
signal and then unfreezes it, so the container immediately goes away.
However, how should we react if we just want to shut it down? Unfreeze
it and send the shutdown signal? Or just kill it immediately? Or do
nothing and report an error?

Thoughts?

(Note: I'd be willing to implement this feature, once a consensus is
reached on how to proceed.)

Regards,
Christian



Re: [lxc-devel] Shutting down containers properly

2012-05-25 Thread Christian Seiler
Hi,

> my lxc management script uses lxc-stop just for an emergency action
> called "fored-stop" and it will be also applied, if a normal "stop"
> (alias "halt") action will timeout after 5min. For this normal
> shutdown (or reboot) of a container, I'm sending just a SIGPWR (or
> SIGINT) to the containers init (sysvinit) process. This will result
> into the same actions as one will call 'poweroff' or 'reboot' inside
> it.

Yes, obviously that also works, but then you need to manually update
the inittab of a container and rewire reboot/power failure actions to
cause a shutdown as you describe.

I'd really prefer it if shutting down a container just worked out of the box
without any strange modifications to /etc/inittab for sysvinit. I
really think shutting down containers properly is functionality that
LXC should support out of the box.

Regards,
Christian




Re: [lxc-devel] Shutting down containers properly

2012-05-25 Thread Christian Seiler
Hi,

> Have you looked at the lxc-shutdown script we have in Ubuntu and the
> integration we have with upstart?

No, not yet, but I'll look at it later.

> lxc-shutdown sends two different signals:
>  reboot => SIGINT
>  shutdown => SIGPWR
>
> These are caught by upstart and will trigger a clean reboot or 
> shutdown
> of the container. It's what happens on shutdown of the host in 12.04 
> LTS.

On a Debian container I had lying around here it had no effect 
whatsoever, because there's nothing in the /etc/inittab catching it.

Is it documented behaviour that upstart shuts down on SIGPWR? (upstart 
has no /etc/inittab where this may be configured, right?) Because if it 
is that easy to cause upstart deterministically to shut down, then that 
is definitely something we should use.

Still, I think my initial rationale holds that lxc-stop should 
shut down by default, because I certainly didn't expect lxc-stop to kill 
everything with SIGKILL when I tried it for the first time. So basically 
all I'm saying is that Ubuntu's lxc-shutdown logic should be implemented 
in lxc-stop and that it should be a bit more generic with the 
possibility that the user can configure different methods in the config 
file.

Regards,
Christian




Re: [lxc-devel] [GIT] lxc branch, master, updated. 60a742e0afd34e02299f64536df35116d68d888d

2012-08-12 Thread Christian Seiler
Hi there,

I just wanted to ask what happened to my patches that improved upon
attach/unshare?

Final version:
http://thread.gmane.org/gmane.linux.kernel.containers.lxc.devel/1408/focus=1436

Thanks in advance!

Regards,
Christian



Re: [lxc-devel] [PATCH] lxc-wait: Add timeout option

2012-08-21 Thread Christian Seiler
Hi Serge,

> My github tree is my staging tree for things I'd like to have merged into
> lxc.sf.net, so hopefully when Daniel has time again he'll take it.  (I
> posted it to my tree after the last time Daniel merged)

Btw. could you also add my patchset under

http://thread.gmane.org/gmane.linux.kernel.containers.lxc.devel/1408/focus=1436

to your tree? Thanks!

Regards,
Christian



[lxc-devel] [PATCH v3 RESENT] Partial namespaces for lxc-attach

2012-08-21 Thread Christian Seiler
Serge,

Here you go, I've rebased the patches against Daniel's current master
branch, so that they properly apply.

Christian




[lxc-devel] [PATCH 5/6] lxc-attach: Add -s option to select namespaces to attach to

2012-08-21 Thread Christian Seiler
This patch allows the user to select any list of namespaces (network, pid,
mount, uts, ipc, user) that lxc-attach should use when attaching to the
container; all other namespaces will not be attached to.

For example, this allows the user to attach to just the network namespace and
use the host's (and not the container's) network tools to reconfigure the
network of the container.
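As a rough illustration of when -s lets lxc-attach work on an unpatched kernel, the check below accepts only selections made of the namespaces that the documentation in this patch lists as attachable on an unpatched 3.0+ kernel (NETWORK, IPC and UTSNAME). It is a hypothetical helper, not part of the patch, and it encodes the documented behaviour rather than probing the running kernel.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdbool.h>

/* Namespaces the lxc-attach documentation lists as attachable on an
 * unpatched kernel >= 3.0: NETWORK, IPC and UTSNAME. */
#define SKETCH_UNPATCHED_NS (CLONE_NEWNET | CLONE_NEWIPC | CLONE_NEWUTS)

/* True if a non-empty -s selection stays within that set, i.e. it
 * includes neither MOUNT, PID nor USER. */
static bool sketch_attach_works_unpatched(int flags)
{
	return flags != 0 && (flags & ~SKETCH_UNPATCHED_NS) == 0;
}
```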

Signed-off-by: Christian Seiler 
Cc: Daniel Lezcano 
Cc: Serge Hallyn 
---
 doc/lxc-attach.sgml.in |   98 +--
 src/lxc/lxc_attach.c   |   20 +-
 2 files changed, 112 insertions(+), 6 deletions(-)

diff --git a/doc/lxc-attach.sgml.in b/doc/lxc-attach.sgml.in
index 7092f16..035cd27 100644
--- a/doc/lxc-attach.sgml.in
+++ b/doc/lxc-attach.sgml.in
@@ -49,7 +49,8 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
   
 lxc-attach -n
 name -a
-arch -e
+arch -e -s
+namespaces
 -- command
   
 
@@ -122,6 +123,29 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

   
 
+  
+   
+ -s, --namespaces namespaces
+   
+   
+ 
+   Specify the namespaces to attach to, as a pipe-separated list,
+   e.g. NETWORK|IPC. Allowed values are
+   MOUNT, PID,
+   UTSNAME, IPC,
+   USER  and
+   NETWORK. This allows one to change
+   the context of the process to e.g. the network namespace of the
+   container while retaining the other namespaces as those of the
+   host.
+ 
+ 
+   Important: This option implies
+   -e.
+ 
+   
+  
+
 
 
   
@@ -144,19 +168,83 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
   
   
 To deactivate the network link eth1 of a running container that
-does not have the NET_ADMIN capability, use the -e
-option to use increased capabilities:
+does not have the NET_ADMIN capability, use either the
+-e option to use increased capabilities,
+assuming the ip tool is installed:
 
   lxc-attach -n container -e -- /sbin/ip link delete eth1
 
+Or, alternatively, use the -s option to use the
+tools installed on the host outside the container:
+
+  lxc-attach -n container -s NETWORK -- /sbin/ip link delete eth1
+
   
   
 
   
+Compatibility
+
+  Attaching completely (including the pid and mount namespaces) to a
+  container requires a patched kernel; please see the lxc website for
+  details. lxc-attach will fail in that case if
+  used with an unpatched kernel.
+
+
+  Nevertheless, it will succeed on an unpatched kernel of version 3.0
+  or higher if the -s option is used to restrict the
+  namespaces that the process is to be attached to to one or more of 
+  NETWORK, IPC
+  and UTSNAME.
+
+
+  Attaching to user namespaces is currently completely unsupported
+  by the kernel. lxc-attach should however be able
+  to do this once future kernel versions implement it.
+
+  
+
+  
+Notes
+
+  The Linux /proc and
+  /sys filesystems contain information
+  about some quantities that are affected by namespaces, such as
+  the directories named after process ids in
+  /proc or the network interface information
+  in /sys/class/net. The namespace of the
+  process mounting the pseudo-filesystems determines what information
+  is shown, not the namespace of the process
+  accessing /proc or
+  /sys.
+
+
+  If one uses the -s option to only attach to
+  the pid namespace of a container, but not its mount namespace
+  (which will contain the /proc of the
+  container and not the host), the contents of /proc
+  will reflect that of the host and not the container. Analogously,
+  the same issue occurs when reading the contents of
+  /sys/class/net and attaching to just
+  the network namespace.
+
+
+  A workaround is to use lxc-unshare to unshare
+  the mount namespace after using lxc-attach with
+  -s PID and/or -s
+  NETWORK and then unmount and then mount again both
+  pseudo-filesystems within that new mount namespace, before
+  executing a program/script that relies on this information to be
+  correct.
+
+  
+
+  
 Security
 
-  The -e should be used with care, as it may break
-  the isolation of the containers if used improperly.
+  The -e and -s options should
+  be used with care, as they may break the isolation of the containers
+  if used improperly.
 
   
 
diff --git a/src/lxc/lxc_attach.c b/src/lxc/lxc_attach.c
index 10d4a64..4f22752 100644
--- a/src/lxc/lxc_attach.c
+++ b/src/lxc/lxc_attach.c
@@ -40,12 +40,14 @@
 #include "start.h"
 #include "

[lxc-devel] [PATCH 2/6] lxc-attach: Remodel cgroup attach logic and attach to namespaces again in parent process

2012-08-21 Thread Christian Seiler
With the introduction of lxc-attach's functionality to attach to cgroups,
the setns() calls were put in the child process after the fork() and not the
parent process before the fork() so the parent process remained outside the
namespaces and could add the child to the correct cgroup.

Unfortunately, the pid namespace really affects only children of the current
process and not the process itself, which has several drawbacks: The
attached program does not have a pid inside the container and the context
that is used when remounting /proc from that process is wrong. Thus, we
return to the previous logic of first setting the namespaces and then
forking, so that the child process (which then exec()s to the desired
program) is a real member of the container.

However, inside the container, there is no guarantee that the cgroup
filesystem is still mounted and that we are allowed to write to it (which
is why the setns() was moved in the first place).

To work around both problems, we split the cgroup attach functionality
into two parts: preparing the attach, which just opens the tasks files of
all cgroups and keeps the file descriptors open, and writing the pid to
those fds. This allows us to open all the tasks files in lxc_attach, then
call setns(), then fork; the child process simply closes the fds, while
the parent process writes the pid of the child process to all of them.
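The prepare/finish split can be sketched in isolation as follows. Regular files stand in for the cgroup tasks files, and all function names are hypothetical; in lxc-attach, `sketch_prepare_attach` would run before setns(), the child would call `sketch_dispose_attach`, and the parent would call `sketch_finish_attach` with the child's pid. One design choice here: the finish step keeps going after a failed write so that every remaining descriptor is still closed.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Step 1, before setns()/fork(): open every tasks file up front.
 * Returns a -1 terminated fd array, or NULL on error (nothing leaks). */
static int *sketch_prepare_attach(const char *const *paths, size_t n)
{
	int *fds = malloc((n + 1) * sizeof(*fds));

	if (!fds)
		return NULL;
	for (size_t i = 0; i < n; i++) {
		fds[i] = open(paths[i], O_WRONLY);
		if (fds[i] < 0) {
			while (i-- > 0)
				close(fds[i]);
			free(fds);
			return NULL;
		}
	}
	fds[n] = -1;
	return fds;
}

/* Child side (or error path): close all fds and free the array. */
static void sketch_dispose_attach(int *fds)
{
	if (!fds)
		return;
	for (int *p = fds; *p >= 0; p++)
		close(*p);
	free(fds);
}

/* Step 2, parent side after fork(): write the child's pid to every fd.
 * Always consumes the array; returns 0 on success, -1 if a write failed. */
static int sketch_finish_attach(int *fds, pid_t pid)
{
	char buf[32];
	int len, ret = 0;

	if (!fds)
		return 0;
	len = snprintf(buf, sizeof(buf), "%ld", (long)pid);
	for (int *p = fds; *p >= 0; p++) {
		/* keep looping after a failure so every fd still gets closed */
		if (ret == 0 && write(*p, buf, (size_t)len) != (ssize_t)len)
			ret = -1;
		close(*p);
	}
	free(fds);
	return ret;
}
```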

Signed-off-by: Christian Seiler 
Cc: Daniel Lezcano 
Cc: Serge Hallyn 
---
 src/lxc/cgroup.c |  153 -
 src/lxc/cgroup.h |3 +
 src/lxc/lxc_attach.c |   62 +++-
 3 files changed, 187 insertions(+), 31 deletions(-)

diff --git a/src/lxc/cgroup.c b/src/lxc/cgroup.c
index 15f0212..2e6ee1e 100644
--- a/src/lxc/cgroup.c
+++ b/src/lxc/cgroup.c
@@ -254,13 +254,38 @@ static int cgroup_enable_clone_children(const char *path)
return ret;
 }
 
-static int lxc_one_cgroup_attach(const char *name,
-struct mntent *mntent, pid_t pid)
+static int lxc_one_cgroup_finish_attach(int fd, pid_t pid)
 {
-   FILE *f;
+   char buf[32];
+   int ret;
+
+   snprintf(buf, 32, "%ld", (long)pid);
+
+   ret = write(fd, buf, strlen(buf));
+   if (ret <= 0) {
+   SYSERROR("failed to write pid '%ld' to fd '%d'", (long)pid, fd);
+   ret = -1;
+   } else {
+   ret = 0;
+   }
+
+   close(fd);
+   return ret;
+}
+
+static int lxc_one_cgroup_dispose_attach(int fd)
+{
+   close(fd);
+   return 0;
+}
+
+static int lxc_one_cgroup_prepare_attach(const char *name,
+struct mntent *mntent)
+{
+   int fd;
char tasks[MAXPATHLEN], initcgroup[MAXPATHLEN];
char *cgmnt = mntent->mnt_dir;
-   int flags, ret = 0;
+   int flags;
int rc;
 
flags = get_cgroup_flags(mntent);
@@ -274,31 +299,83 @@ static int lxc_one_cgroup_attach(const char *name,
return -1;
}
 
-   f = fopen(tasks, "w");
-   if (!f) {
+   fd = open(tasks, O_WRONLY);
+   if (fd < 0) {
SYSERROR("failed to open '%s'", tasks);
return -1;
}
 
-   if (fprintf(f, "%d", pid) <= 0) {
-   SYSERROR("failed to write pid '%d' to '%s'", pid, tasks);
-   ret = -1;
+   return fd;
+}
+
+static int lxc_one_cgroup_attach(const char *name, struct mntent *mntent, pid_t pid)
+{
+   int fd;
+
+   fd = lxc_one_cgroup_prepare_attach(name, mntent);
+   if (fd < 0) {
+   return -1;
}
 
-   fclose(f);
+   return lxc_one_cgroup_finish_attach(fd, pid);
+}
+
+int lxc_cgroup_dispose_attach(void *data)
+{
+   int *fds = data;
+   int ret, err;
+
+   if (!fds) {
+   return 0;
+   }
+
+   ret = 0;
+
+   for (; *fds >= 0; fds++) {
+   err = lxc_one_cgroup_dispose_attach(*fds);
+   if (err) {
+   ret = err;
+   }
+   }
+
+   free(data);
 
return ret;
 }
 
-/*
- * for each mounted cgroup, attach a pid to the cgroup for the container
- */
-int lxc_cgroup_attach(const char *name, pid_t pid)
+int lxc_cgroup_finish_attach(void *data, pid_t pid)
+{
+   int *fds = data;
+   int err;
+
+   if (!fds) {
+   return 0;
+   }
+
+   for (; *fds >= 0; fds++) {
+   err = lxc_one_cgroup_finish_attach(*fds, pid);
+   if (err) {
+   /* get rid of the rest of them */
+   lxc_cgroup_dispose_attach(data);
+   return -1;
+   }
+   *fds = -1;
+   }
+
+   free(data);
+
+   return 0;
+}
+
+int lxc_cgroup_prepare_attach(const char *name, void **data)
 {
struct mnten

[lxc-devel] [PATCH 3/6] lxc-attach: Detect which namespaces to attach to dynamically

2012-08-21 Thread Christian Seiler
Use the command interface to contact lxc-start to receive the set of
flags passed to clone() when starting the container. This allows lxc-attach
to determine which namespaces were used for the container and select only
those to attach to.
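The selection logic in lxc_attach_to_ns can be illustrated without any privileges by only computing which /proc/PID/ns entries would be opened for a given set of clone flags. The table order matches the one used in the patch; the function name and the fixed 64-byte path buffers are assumptions made for this sketch.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Same order as the ns[]/flags[] tables in lxc_attach_to_ns(). */
static const char *const sketch_ns_names[] = {
	"mnt", "pid", "uts", "ipc", "user", "net"
};
static const int sketch_ns_flags[] = {
	CLONE_NEWNS, CLONE_NEWPID, CLONE_NEWUTS, CLONE_NEWIPC,
	CLONE_NEWUSER, CLONE_NEWNET
};

/* Fill out[] with the /proc/<pid>/ns/<name> paths that would be opened
 * for the clone flags in `which` (-1 meaning "all of them").
 * Returns the number of paths written, at most max. */
static size_t sketch_ns_paths(pid_t pid, int which, char out[][64], size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < sizeof(sketch_ns_flags) / sizeof(sketch_ns_flags[0]); i++) {
		if (which != -1 && !(which & sketch_ns_flags[i]))
			continue; /* namespace not selected: skip it */
		if (n == max)
			break;
		snprintf(out[n++], 64, "/proc/%d/ns/%s", (int)pid, sketch_ns_names[i]);
	}
	return n;
}
```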

Signed-off-by: Christian Seiler 
Cc: Daniel Lezcano 
Cc: Serge Hallyn 
---
 src/lxc/attach.c |   42 +-
 src/lxc/attach.h |2 +-
 src/lxc/lxc_attach.c |   16 +++-
 3 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/src/lxc/attach.c b/src/lxc/attach.c
index a95b3d3..37e667f 100644
--- a/src/lxc/attach.c
+++ b/src/lxc/attach.c
@@ -121,13 +121,22 @@ out_error:
return NULL;
 }
 
-int lxc_attach_to_ns(pid_t pid)
+int lxc_attach_to_ns(pid_t pid, int which)
 {
char path[MAXPATHLEN];
-   char *ns[] = { "pid", "mnt", "net", "ipc", "uts" };
-   const int size = sizeof(ns) / sizeof(char *);
+   /* according to <http://article.gmane.org/gmane.linux.kernel.containers.lxc.devel/1429>,
+* the file for user namespaces in /proc/$pid/ns will be called
+* 'user' once the kernel supports it
+*/
+   static char *ns[] = { "mnt", "pid", "uts", "ipc", "user", "net" };
+   static int flags[] = {
+   CLONE_NEWNS, CLONE_NEWPID, CLONE_NEWUTS, CLONE_NEWIPC,
+   CLONE_NEWUSER, CLONE_NEWNET
+   };
+   static const int size = sizeof(ns) / sizeof(char *);
int fd[size];
-   int i;
+   int i, j, saved_errno;
+
 
snprintf(path, MAXPATHLEN, "/proc/%d/ns", pid);
if (access(path, X_OK)) {
@@ -136,16 +145,39 @@ int lxc_attach_to_ns(pid_t pid)
}
 
for (i = 0; i < size; i++) {
+   /* ignore if we are not supposed to attach to that
+* namespace
+*/
+   if (which != -1 && !(which & flags[i])) {
+   fd[i] = -1;
+   continue;
+   }
+
snprintf(path, MAXPATHLEN, "/proc/%d/ns/%s", pid, ns[i]);
fd[i] = open(path, O_RDONLY);
if (fd[i] < 0) {
+   saved_errno = errno;
+
+   /* close all already opened file descriptors before
+* we return an error, so we don't leak them
+*/
+   for (j = 0; j < i; j++)
+   close(fd[j]);
+
+   errno = saved_errno;
SYSERROR("failed to open '%s'", path);
return -1;
}
}
 
for (i = 0; i < size; i++) {
-   if (setns(fd[i], 0)) {
+   if (fd[i] >= 0 && setns(fd[i], 0) != 0) {
+   saved_errno = errno;
+
+   for (j = i; j < size; j++)
+   close(fd[j]);
+
+   errno = saved_errno;
SYSERROR("failed to set namespace '%s'", ns[i]);
return -1;
}
diff --git a/src/lxc/attach.h b/src/lxc/attach.h
index 2d46c83..d96fdae 100644
--- a/src/lxc/attach.h
+++ b/src/lxc/attach.h
@@ -33,7 +33,7 @@ struct lxc_proc_context_info {
 
 extern struct lxc_proc_context_info *lxc_proc_get_context_info(pid_t pid);
 
-extern int lxc_attach_to_ns(pid_t other_pid);
+extern int lxc_attach_to_ns(pid_t other_pid, int which);
 extern int lxc_attach_drop_privs(struct lxc_proc_context_info *ctx);
 
 #endif
diff --git a/src/lxc/lxc_attach.c b/src/lxc/lxc_attach.c
index e4f604b..10d4a64 100644
--- a/src/lxc/lxc_attach.c
+++ b/src/lxc/lxc_attach.c
@@ -51,6 +51,7 @@ static const struct option my_longopts[] = {
 
 static int elevated_privileges = 0;
 static signed long new_personality = -1;
+static int namespace_flags = -1;
 
 static int my_parser(struct lxc_arguments* args, int c, char* arg)
 {
@@ -139,11 +140,24 @@ int main(int argc, char *argv[])
 
curdir = get_current_dir_name();
 
+   /* determine which namespaces the container was created with
+* by asking lxc-start
+*/
+   if (namespace_flags == -1) {
+   namespace_flags = lxc_get_clone_flags(my_args.name);
+   /* call failed */
+   if (namespace_flags == -1) {
+   ERROR("failed to automatically determine the "
+ "namespaces which the container unshared");
+   return -1;
+   }
+   }
+
/* we need to attach before we fork since certain namespaces
 * (such as pid namespaces) only really affect children of the
 * current process and not 

[lxc-devel] [PATCH 4/6] lxc-unshare: Move functions to determine clone flags from command line options to namespace.c

2012-08-21 Thread Christian Seiler
In order to be able to reuse code in lxc-attach, the functions
lxc_namespace_2_cloneflag and lxc_fill_namespace_flags are moved from
lxc_unshare.c to namespace.c.

Signed-off-by: Christian Seiler 
Cc: Daniel Lezcano 
Cc: Serge Hallyn 
---
 src/lxc/lxc_unshare.c |   45 -
 src/lxc/namespace.c   |   45 +
 src/lxc/namespace.h   |3 +++
 3 files changed, 48 insertions(+), 45 deletions(-)

diff --git a/src/lxc/lxc_unshare.c b/src/lxc/lxc_unshare.c
index 498d6e0..3a848b2 100644
--- a/src/lxc/lxc_unshare.c
+++ b/src/lxc/lxc_unshare.c
@@ -84,51 +84,6 @@ static uid_t lookup_user(const char *optarg)
return uid;
 }
 
-static char *namespaces_list[] = {
-   "MOUNT", "PID", "UTSNAME", "IPC",
-   "USER", "NETWORK"
-};
-static int cloneflags_list[] = {
-   CLONE_NEWNS, CLONE_NEWPID, CLONE_NEWUTS, CLONE_NEWIPC,
-   CLONE_NEWUSER, CLONE_NEWNET
-};
-
-static int lxc_namespace_2_cloneflag(char *namespace)
-{
-   int i, len;
-   len = sizeof(namespaces_list)/sizeof(namespaces_list[0]);
-   for (i = 0; i < len; i++)
-   if (!strcmp(namespaces_list[i], namespace))
-   return cloneflags_list[i];
-
-   ERROR("invalid namespace name %s", namespace);
-   return -1;
-}
-
-static int lxc_fill_namespace_flags(char *flaglist, int *flags)
-{
-   char *token, *saveptr = NULL;
-   int aflag;
-
-   if (!flaglist) {
-   ERROR("need at least one namespace to unshare");
-   return -1;
-   }
-
-   token = strtok_r(flaglist, "|", &saveptr);
-   while (token) {
-
-   aflag = lxc_namespace_2_cloneflag(token);
-   if (aflag < 0)
-   return -1;
-
-   *flags |= aflag;
-
-   token = strtok_r(NULL, "|", &saveptr);
-   }
-   return 0;
-}
-
 
 struct start_arg {
char ***args;
diff --git a/src/lxc/namespace.c b/src/lxc/namespace.c
index 3e6fc3a..3fa027b 100644
--- a/src/lxc/namespace.c
+++ b/src/lxc/namespace.c
@@ -69,3 +69,48 @@ pid_t lxc_clone(int (*fn)(void *), void *arg, int flags)
 
return ret;
 }
+
+static char *namespaces_list[] = {
+   "MOUNT", "PID", "UTSNAME", "IPC",
+   "USER", "NETWORK"
+};
+static int cloneflags_list[] = {
+   CLONE_NEWNS, CLONE_NEWPID, CLONE_NEWUTS, CLONE_NEWIPC,
+   CLONE_NEWUSER, CLONE_NEWNET
+};
+
+int lxc_namespace_2_cloneflag(char *namespace)
+{
+   int i, len;
+   len = sizeof(namespaces_list)/sizeof(namespaces_list[0]);
+   for (i = 0; i < len; i++)
+   if (!strcmp(namespaces_list[i], namespace))
+   return cloneflags_list[i];
+
+   ERROR("invalid namespace name %s", namespace);
+   return -1;
+}
+
+int lxc_fill_namespace_flags(char *flaglist, int *flags)
+{
+   char *token, *saveptr = NULL;
+   int aflag;
+
+   if (!flaglist) {
+   ERROR("need at least one namespace to unshare");
+   return -1;
+   }
+
+   token = strtok_r(flaglist, "|", &saveptr);
+   while (token) {
+
+   aflag = lxc_namespace_2_cloneflag(token);
+   if (aflag < 0)
+   return -1;
+
+   *flags |= aflag;
+
+   token = strtok_r(NULL, "|", &saveptr);
+   }
+   return 0;
+}
diff --git a/src/lxc/namespace.h b/src/lxc/namespace.h
index 5442dd3..04e81bb 100644
--- a/src/lxc/namespace.h
+++ b/src/lxc/namespace.h
@@ -50,4 +50,7 @@
 
 extern pid_t lxc_clone(int (*fn)(void *), void *arg, int flags);
 
+extern int lxc_namespace_2_cloneflag(char *namespace);
+extern int lxc_fill_namespace_flags(char *flaglist, int *flags);
+
 #endif
-- 
1.7.8.6




[lxc-devel] [PATCH 1/6] lxc-start: Add command to retrieve the clone flags used to start the container.

2012-08-21 Thread Christian Seiler
Add the LXC_COMMAND_CLONE_FLAGS command, which retrieves the flags passed to
clone(2) when the container was started. This allows external programs to
determine which namespaces the container has unshared.

Signed-off-by: Christian Seiler 
Cc: Daniel Lezcano 
Cc: Serge Hallyn 
---
 src/lxc/commands.c |   30 ++
 src/lxc/commands.h |2 ++
 src/lxc/start.c|   34 --
 src/lxc/start.h|1 +
 4 files changed, 57 insertions(+), 10 deletions(-)

diff --git a/src/lxc/commands.c b/src/lxc/commands.c
index cce24db..dc93815 100644
--- a/src/lxc/commands.c
+++ b/src/lxc/commands.c
@@ -154,11 +154,32 @@ pid_t get_init_pid(const char *name)
return command.answer.pid;
 }
 
+int lxc_get_clone_flags(const char *name)
+{
+   struct lxc_command command = {
+       .request = { .type = LXC_COMMAND_CLONE_FLAGS },
+   };
+
+   int ret, stopped = 0;
+
+   ret = lxc_command(name, &command, &stopped);
+   if (ret < 0 && stopped)
+       return -1;
+
+   if (ret < 0) {
+       ERROR("failed to send command");
+       return -1;
+   }
+
+   return command.answer.ret;
+}
+
 extern void lxc_console_remove_fd(int, struct lxc_tty_info *);
 extern int  lxc_console_callback(int, struct lxc_request *, struct lxc_handler *);
 extern int  lxc_stop_callback(int, struct lxc_request *, struct lxc_handler *);
 extern int  lxc_state_callback(int, struct lxc_request *, struct lxc_handler *);
 extern int  lxc_pid_callback(int, struct lxc_request *, struct lxc_handler *);
+extern int  lxc_clone_flags_callback(int, struct lxc_request *, struct lxc_handler *);
 
 static int trigger_command(int fd, struct lxc_request *request,
   struct lxc_handler *handler)
@@ -166,10 +187,11 @@ static int trigger_command(int fd, struct lxc_request *request,
typedef int (*callback)(int, struct lxc_request *, struct lxc_handler *);
 
callback cb[LXC_COMMAND_MAX] = {
-   [LXC_COMMAND_TTY]   = lxc_console_callback,
-   [LXC_COMMAND_STOP]  = lxc_stop_callback,
-   [LXC_COMMAND_STATE] = lxc_state_callback,
-   [LXC_COMMAND_PID]   = lxc_pid_callback,
+   [LXC_COMMAND_TTY]         = lxc_console_callback,
+   [LXC_COMMAND_STOP]        = lxc_stop_callback,
+   [LXC_COMMAND_STATE]       = lxc_state_callback,
+   [LXC_COMMAND_PID]         = lxc_pid_callback,
+   [LXC_COMMAND_CLONE_FLAGS] = lxc_clone_flags_callback,
};
 
if (request->type < 0 || request->type >= LXC_COMMAND_MAX)
diff --git a/src/lxc/commands.h b/src/lxc/commands.h
index d5c013f..3b0ac9a 100644
--- a/src/lxc/commands.h
+++ b/src/lxc/commands.h
@@ -28,6 +28,7 @@ enum {
LXC_COMMAND_STOP,
LXC_COMMAND_STATE,
LXC_COMMAND_PID,
+   LXC_COMMAND_CLONE_FLAGS,
LXC_COMMAND_MAX,
 };
 
@@ -48,6 +49,7 @@ struct lxc_command {
 };
 
 extern pid_t get_init_pid(const char *name);
+extern int lxc_get_clone_flags(const char *name);
 
 extern int lxc_command(const char *name, struct lxc_command *command,
int *stopped);
diff --git a/src/lxc/start.c b/src/lxc/start.c
index 48e9962..7dfe1ba 100644
--- a/src/lxc/start.c
+++ b/src/lxc/start.c
@@ -278,6 +278,29 @@ int lxc_pid_callback(int fd, struct lxc_request *request,
return 0;
 }
 
+int lxc_clone_flags_callback(int fd, struct lxc_request *request,
+                             struct lxc_handler *handler)
+{
+   struct lxc_answer answer;
+   int ret;
+
+   answer.pid = 0;
+   answer.ret = handler->clone_flags;
+
+   ret = send(fd, &answer, sizeof(answer), 0);
+   if (ret < 0) {
+       WARN("failed to send answer to the peer");
+       return -1;
+   }
+
+   if (ret != sizeof(answer)) {
+       ERROR("partial answer sent");
+       return -1;
+   }
+
+   return 0;
+}
+
 int lxc_set_state(const char *name, struct lxc_handler *handler, lxc_state_t state)
 {
handler->state = state;
@@ -542,7 +565,6 @@ out_warn_father:
 
 int lxc_spawn(struct lxc_handler *handler)
 {
-   int clone_flags;
int failed_before_rename = 0;
const char *name = handler->name;
int pinfd;
@@ -550,10 +572,10 @@ int lxc_spawn(struct lxc_handler *handler)
if (lxc_sync_init(handler))
return -1;
 
-   clone_flags = CLONE_NEWUTS|CLONE_NEWPID|CLONE_NEWIPC|CLONE_NEWNS;
+   handler->clone_flags = CLONE_NEWUTS|CLONE_NEWPID|CLONE_NEWIPC|CLONE_NEWNS;
if (!lxc_list_empty(&handler->conf->network)) {
 
-   clone_flags |= CLONE_NEWNET;
+   handler->clone_flags |= CLONE_NEWNET;
 
/* Find gateway addresses from the link device, which is
 * no longer access

[lxc-devel] [PATCH 6/6] lxc-attach: Add -R option to remount /sys and /proc when only partially attaching

2012-08-21 Thread Christian Seiler
When attaching to only some namespaces of the container but not the mount
namespace, the contents of /sys and /proc of the host system do not properly
reflect the context of the container's pid and/or network namespaces, and
possibly others.

The introduced -R option adds the possibility to additionally unshare the
mount namespace (when it is not being attached) and remount /sys and /proc
in order for those filesystems to properly reflect the container's context
even when only attaching to some of the namespaces.

Signed-off-by: Christian Seiler 
Acked-by: Serge Hallyn 
Cc: Daniel Lezcano 
---
 doc/lxc-attach.sgml.in |   44 +++-
 src/lxc/attach.c   |   44 
 src/lxc/attach.h   |1 +
 src/lxc/lxc_attach.c   |   22 +-
 4 files changed, 101 insertions(+), 10 deletions(-)

diff --git a/doc/lxc-attach.sgml.in b/doc/lxc-attach.sgml.in
index 035cd27..1724393 100644
--- a/doc/lxc-attach.sgml.in
+++ b/doc/lxc-attach.sgml.in
@@ -50,7 +50,7 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 lxc-attach -n
 name -a
 arch -e -s
-namespaces
+namespaces -R
 -- command
   
 
@@ -146,7 +146,30 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

   
 
-
+  
+   
+ -R, --remount-sys-proc
+   
+   
+ 
+   When using -s and the mount namespace is not
+   included, this flag will cause lxc-attach
+   to remount /proc and
+   /sys to reflect the contexts of the other
+   namespaces being attached to.
+ 
+ 
+   Please see the Notes section for more
+   details.
+ 
+ 
+   This option will be ignored if one tries to attach to the
+   mount namespace anyway.
+ 
+   
+  
+
+ 
 
   
 
@@ -229,13 +252,16 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
   the network namespace.
 
 
-  A workaround is to use lxc-unshare to unshare
-  the mount namespace after using lxc-attach with
-  -s PID and/or -s
-  NETWORK and then unmount and then mount again both
-  pseudo-filesystems within that new mount namespace, before
-  executing a program/script that relies on this information to be
-  correct.
+  To work around this problem, the -R flag provides
+  the option to remount /proc and
+  /sys in order for them to reflect the
+  network/pid namespace context of the attached process. In order
+  not to interfere with the host's actual filesystem, the mount
+  namespace will be unshared (like lxc-unshare
+  does) before this is done, essentially giving the process a new
+  mount namespace, which is identical to the host's mount namespace
+  except for the /proc and
+  /sys filesystems.
 
   
 
diff --git a/src/lxc/attach.c b/src/lxc/attach.c
index 37e667f..ec0e083 100644
--- a/src/lxc/attach.c
+++ b/src/lxc/attach.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #if !HAVE_DECL_PR_CAPBSET_DROP
@@ -188,6 +189,49 @@ int lxc_attach_to_ns(pid_t pid, int which)
return 0;
 }
 
+int lxc_attach_remount_sys_proc()
+{
+   int ret;
+
+   ret = unshare(CLONE_NEWNS);
+   if (ret < 0) {
+       SYSERROR("failed to unshare mount namespace");
+       return -1;
+   }
+
+   /* assume /proc is always mounted, so remount it */
+   ret = umount2("/proc", MNT_DETACH);
+   if (ret < 0) {
+       SYSERROR("failed to unmount /proc");
+       return -1;
+   }
+
+   ret = mount("none", "/proc", "proc", 0, NULL);
+   if (ret < 0) {
+       SYSERROR("failed to remount /proc");
+       return -1;
+   }
+
+   /* try to umount /sys - if it's not a mount point,
+    * we'll get EINVAL, then we ignore it because it
+    * may not have been mounted in the first place
+    */
+   ret = umount2("/sys", MNT_DETACH);
+   if (ret < 0 && errno != EINVAL) {
+       SYSERROR("failed to unmount /sys");
+       return -1;
+   } else if (ret == 0) {
+       /* remount it */
+       ret = mount("none", "/sys", "sysfs", 0, NULL);
+       if (ret < 0) {
+           SYSERROR("failed to remount /sys");
+           return -1;
+       }
+   }
+
+   return 0;
+}
+
 int lxc_attach_drop_privs(struct lxc_proc_context_info *ctx)
 {
int last_cap = lxc_caps_last_cap();
diff --git a/src/lxc/attach.h b/src/lxc/attach.h
index d96fdae..aab47e3 100644
--- a/src/lxc/attach.h
+++ b/src/lxc/attach.h
@@ -34,6 +34,7 @@ struct lxc_proc_context

[lxc-devel] [PATCH] Add mechanism for container to notify host about end of boot

2012-09-13 Thread Christian Seiler
This patch adds a simple notification system that allows the container to
notify the host (in particular, the lxc-start process) that the boot process
has been completed successfully. It also adds an additional status BOOTING
that lxc-info may return. This allows the administrator and scripts to
distinguish between a fully-running container and a container that is still
in the process of booting.

If nothing is added to the configuration file, the current behavior is not
changed, i.e. after lxc-start finishes the initialization, the container is
immediately put into the RUNNING state. This ensures backwards
compatibility.

If lxc.notification.type is set to 'fifo', after lxc-start initialization
the container is initially put into the state BOOTING. Also, the FIFO
/var/lib/lxc/%s/notification-fifo is created and bind-mounted into the
container, by default to /dev/lxc-notify, but this can be changed via the
lxc.notification.path configuration setting.

Inside the container one may execute 'echo RUNNING > /dev/lxc-notify' or an
equivalent command to notify lxc-start that the container has now booted.
Similarly, 'echo STOPPING > /dev/lxc-notify' will change the status to
STOPPING, which may be done on shutdown. Currently, only RUNNING and
STOPPING are allowed, other states are ignored.

This patch only provides the LXC part for the notification system, the
counterpart inside the container has to be provided separately. The
interface has been kept extremely simple to facilitate this.

The choice of the option lxc.notification.type, as opposed to
lxc.notification.enabled, is deliberate in order to make this extensible. If
at some point there is some kind of standardized system for these types of
notifications, it will be simple to just add a new value for the
lxc.notification.type option.

Signed-off-by: Christian Seiler 
Cc: Serge Hallyn 
Cc: Guido Jäkel 
---
 src/lxc/Makefile.am|1 +
 src/lxc/conf.c |8 +
 src/lxc/conf.h |3 +
 src/lxc/confile.c  |   34 +
 src/lxc/notification.c |  349 
 src/lxc/notification.h |   50 +++
 src/lxc/start.c|   22 +++-
 src/lxc/start.h|1 +
 src/lxc/state.c|1 +
 src/lxc/state.h|3 +-
 10 files changed, 468 insertions(+), 4 deletions(-)
 create mode 100644 src/lxc/notification.c
 create mode 100644 src/lxc/notification.h

diff --git a/src/lxc/Makefile.am b/src/lxc/Makefile.am
index 7d86ad6..d976bf7 100644
--- a/src/lxc/Makefile.am
+++ b/src/lxc/Makefile.am
@@ -32,6 +32,7 @@ liblxc_so_SOURCES = \
freezer.c \
checkpoint.c \
restart.c \
+   notification.h notification.c \
error.h error.c \
parse.c parse.h \
cgroup.c cgroup.h \
diff --git a/src/lxc/conf.c b/src/lxc/conf.c
index 1450ca6..422b742 100644
--- a/src/lxc/conf.c
+++ b/src/lxc/conf.c
@@ -61,6 +61,7 @@
 #include "log.h"
 #include "lxc.h"   /* for lxc_cgroup_set() */
 #include "caps.h"   /* for lxc_caps_last_cap() */
+#include "notification.h"
 
 #if HAVE_APPARMOR
 #include 
@@ -2253,6 +2254,11 @@ int lxc_setup(const char *name, struct lxc_conf *lxc_conf)
return -1;
}
 
+   if (lxc_notification_mount_hook(name, lxc_conf)) {
+       ERROR("failed to init notification mechanism for container '%s'.", name);
+       return -1;
+   }
+
+
if (setup_cgroup(name, &lxc_conf->cgroup)) {
ERROR("failed to setup the cgroups for '%s'", name);
return -1;
@@ -2540,6 +2546,8 @@ void lxc_conf_free(struct lxc_conf *conf)
if (conf->aa_profile)
free(conf->aa_profile);
 #endif
+   if (conf->notification_path)
+       free(conf->notification_path);
lxc_clear_config_caps(conf);
lxc_clear_cgroups(conf, "lxc.cgroup");
lxc_clear_hooks(conf);
diff --git a/src/lxc/conf.h b/src/lxc/conf.h
index dcf79fe..5ed67ec 100644
--- a/src/lxc/conf.h
+++ b/src/lxc/conf.h
@@ -31,6 +31,7 @@
 #include 
 
 #include  /* for lxc_handler */
+#include  /* for notification types */
 
 enum {
LXC_NET_EMPTY,
@@ -237,6 +238,8 @@ struct lxc_conf {
 #endif
char *seccomp;  // filename with the seccomp rules
int maincmd_fd;
+   lxc_notification_type_t notification_type;
+   char *notification_path;
 };
 
 int run_lxc_hooks(const char *name, char *hook, struct lxc_conf *conf);
diff --git a/src/lxc/confile.c b/src/lxc/confile.c
index 2d14e0f..f48b8c0 100644
--- a/src/lxc/confile.c
+++ b/src/lxc/confile.c
@@ -53,6 +53,8 @@ static int config_ttydir(const char *, char *, struct lxc_conf *);
 #if HAVE_APPARMOR
 static int config_aa_profile(const char *, char *, struct lxc_conf *);
 #endif
+static int config_notification_type(const char *, char *, struct lxc_conf *);
+static int c

Re: [lxc-devel] [PATCH] Add mechanism for container to notify host about end of boot

2012-09-13 Thread Christian Seiler
> I like the idea but haven't looked at the implementation yet as the
> patch is really quite large. Quickly scanning through I briefly noticed
> that the copyright headers for the new files are wrong (refer to IBM and
> Daniel instead of Christian).

I just copy&pasted them from the other files, most header files I saw
contained the same copyright. Just tell me what exactly to put there and
then I'll do that for the next version of the patch.

> I'm also wondering if we shouldn't try to keep the "protocol" a bit more
> generic to eventually allow the container to send/receive more than just
> its status?

If we want to have a back-channel, we'd need a socket, which makes just
doing echo RUNNING > /dev/lxc-notify impossible, you'd need a special
program for that. Having the template scripts dump an additional script
or upstart job or systemd unit file or whatever in the container when
creating it seems a lot easier than having to use a special program.

On the other hand, it wouldn't be too complicated to have two special
files lying around: One for simple status updates using the current text
interface (easily scriptable, not much hassle to get basic status
notification functionality right) and a socket that supports an
extensible (binary?) protocol, which currently also only allows one to
change the status. But because it's extensible, the interface would
already be there.

Christian



[lxc-devel] Fwd: Re: [PATCH] Add mechanism for container to notify host about end of boot

2012-09-13 Thread Christian Seiler
forwarding to the list because I forgot to use reply-to-all - sorry.

--- Begin Message ---
>> If we want to have a back-channel, we'd need a socket, which makes just
>> doing echo RUNNING > /dev/lxc-notify impossible, you'd need a special
>> program for that. Having the template scripts dump an additional script
>> or upstart job or systemd unit file or whatever in the container when
>> creating it seems a lot easier than having to use a special program.
> 
> Well, talking to a socket is really easy from the command line too:
> echo status STARTING | nc -U /dev/lxc_socket
> 
> It's depending on netcat but netcat is part of all distros and even
> exists in busybox (in a stripped down version).

Unfortunately, I know of at least 4 different netcat implementations (my
count includes Busybox's implementation), and only one of them supports
-U and/or UNIX domain sockets: the one from OpenBSD, which is NOT the
one installed by default on many distros, including Debian and Gentoo.
I don't think one should rely on that.

But I thought about it and came up with the following Perl "one-liner"
that should do the trick:

echo status RUNNING | perl -e \
'use IO::Socket;
 $client = IO::Socket::UNIX->new(
Peer => "FILENAME",
Type => SOCK_STREAM,
Timeout => 10);
  while (<>) { print $client $_; }'

So yeah, a socket would probably be the better choice, the question now
is what kind of protocol should be specified...

Christian
--- End Message ---


Re: [lxc-devel] [PATCH] Add mechanism for container to notify host about end of boot

2012-09-14 Thread Christian Seiler
>> If we want to have a back-channel, we'd need a socket, which makes just
>> doing echo RUNNING > /dev/lxc-notify impossible, you'd need a special
>> program for that. Having the template scripts dump an additional script
>> or upstart job or systemd unit file or whatever in the container when
>> creating it seems a lot easier than having to use a special program.
>
> FYI, the systemd team actually want to be able to expose a full socket
> from the container to the host, so that the host systemd/systemctl cmd
> can directly communicate with the container's systemd. So I don't think
> that /dev/lxc-notify would be useful for systemd.

First of all, you have to separate two things - I mentioned systemd
here in the sense that when the system reaches default.target,
/dev/lxc-notify should be pinged so that the lxc state now changes from
BOOTING to RUNNING. What you are talking about is a systemctl on the
outside of the container affecting the inside. I wanted to solve the
first problem, where /dev/lxc-notify is useful anyway, with or without
systemd.

The use case you are describing is a bit more complicated. You want to
expose a socket outside the container that is listened to by a program
inside the container. The problem here is that if you want to
bind-mount it before the pivot_root call, this will not work since
bind() for a socket will fail if the file already exists. But as soon
as you are already in the container, if systemd actually does listen to
a socket somewhere, you'll have a hard time bind-mounting it back to
the outside: how do you bridge the mount namespace? Obviously, if the
container's filesystem is mounted on the host anyway, you don't have a
problem, since you don't need to take care about the namespace; but
what if the lxc config specifies a block device that is then only
mounted inside the container's namespace?

That being said, if we actually implement /dev/lxc-notify (or whatever
one wants to call it, perhaps /run/lxc-host-interface?) as a socket
with an extensible protocol, it would be possible to add a command that
tells lxc to open a socket on the host and pass the fd back through the
connection. Then systemd on the inside would be in possession of a
socket that listens on the outside and that an outside systemctl could
connect to. So my proposal, with the modifications suggested by
Stéphane, would actually also be able to solve your use case.

However, first I'd like to have the basic version just for status
updates (because that is a useful feature anyway, independently of the
init system) in order to keep it simple - and once that is done, one
may think about how/whether to extend this to include other use cases
that are more specialized.




Re: [lxc-devel] [PATCH] Add mechanism for container to notify host about end of boot

2012-09-14 Thread Christian Seiler
> I'm very pleased about the discussion and efforts to implement such a
> feature because I already have asked for it in former times. In the
> one hand, this fifo approach may be used for more than the current
> task. But in the other hand, it's seems to need a bunch of
> dependencies.

You mean that you have to modify the container? Yes, sure, but the
modification is rather trivial - one should just ping the notification
FIFO/socket/whatever at boot (on systems with LSB init in a rc.local or
similar script for example) to notify lxc that the container is booted.

What you seem to want is a way for lxc to detect that automatically
without any intervention from the container. I don't think that's
possible in any kind of way that is not a complete and utter hack,
since only the container itself can have any concept of whether it's
done with booting or not. Of course, if you assume that you have
sysvinit and LSB scripts, you can check whether the command line of
init in /proc is now "init [3]" and that /etc/init.d/rc 3 (or whatever
it is called on your distro) has finished running inside the
container and the process doesn't exist anymore. But that doesn't take
into account upstart or systemd or any other kind of init system.

If you can guarantee a certain environment, you could probably hack
something together along the above lines, but I personally don't think
that it would be a good idea for the much more general lxc code to
include some hack like this.

Christian




Re: [lxc-devel] [PATCH] Add mechanism for container to notify host about end of boot

2012-09-14 Thread Christian Seiler
> I must admit the details aren't worked out, but the rough idea was
> something like the following. On the host have a directory per
> container, in which the socket is setup
>
>/var/lib/systemd/container/
>
> And bind '/var/lib/systemd/containerXXX' into the container in some
> location, lets say '/var/lib/systemd/self/'. The idea is that if
> systemd in the container now listens on 
> /var/lib/systemd/self/systemd.sock
> that a process in the host can connect via
>
>   /var/lib/systemd/container/systemd.sock

This you can already do in current lxc - just add an entry in the form

lxc.mount.entry = /var/lib/systemd/containerXXX var/lib/systemd/self none bind 0 0

to the lxc config file of your container. There's no need to change any
code for that. (You have to make sure both directories exist, however.)

OTOH, for the status updates I'm proposing, it's more LXC itself having
some form of indication as to whether the container is currently really
running, just booting or in the process of shutting down - that makes
lxc-info much more useful.

> I'm a little fuzzy on exactly how UNIX domain socket paths interact
> wrt mount namespaces

As long as you can see the socket, you can connect to it. If you
bind-mount a directory, any socket you create inside the container will
also appear on the host. What you can't do is bind-mount a socket
itself: the socket file would already have to exist, and since bind()
fails on an existing path, the program can't bind to it and listen
after that.

The only tricky ones are UNIX domain sockets in the abstract namespace,
i.e. the ones starting with a 0-byte in their name: they are tied to the
network namespace, so you can *never* see an abstract UNIX socket from
another network namespace (unless you manage to pass around the fd in
some way). But for sockets which are tied to a real object in the
filesystem, this restriction doesn't apply.

By the way, as a side-note for your idea for systemctl working from the
outside: if you really want to isolate your container from the host,
then you have to make sure that it can't DoS the host by filling up
/var. This is not possible if you just bind-mount a socket/FIFO, but
that doesn't work for your use-case, so you probably would want to
mount a tmpfs with a *very* small size limit on
/var/lib/systemd/containerXXX (in the pre-start lxc hook, for example)
and then bind-mount that instead of part of a real file system that may
be filled up.

Regards,
Christian




Re: [lxc-devel] [PATCH] Add mechanism for container to notify host about end of boot

2012-09-17 Thread Christian Seiler
Hi,

> It is a bit weird to bind mount this fifo.

IMHO, a lot of low-level stuff appears weird at first glance, until you
understand the reasoning behind it. Bind-mounting non-directories is
actually used a lot with namespaces. For example, if you want to keep a
network namespace around after the last process in that namespace exits,
you will want to bind-mount /proc/$some_pid_in_net_namespace/ns/net to
somewhere else - then, only when the bind-mount is removed AND the last
pid in the namespace has died is the namespace released.

> Furthermore, I would suggest
> to prevent using a fifo it is prone to problems and could hang the
> supervisor process (aka lxc-start).

Why is it by itself prone to problems and/or hangs? The file descriptor
is opened in non-blocking mode and added to the epoll logic in the
mainloop. I don't see how that in and by itself could cause a hang.
You should also keep in mind that the current mainloop code also
handles the user socket (and since it's abstract, any user from the
outside can connect and send at least one message before the code
notices via SCM_CREDENTIALS that the credentials are wrong) and the
console (arbitrary data from inside the container is manually piped to
the log file, if present). I don't see how one fifo / socket that the
container's root can write anything to would make any qualitative
difference. It's just one more fd to take care of.

> Maybe here a simple file in the rootfs let's say
> rootfs/var/run/lxc-notify would be sufficient.
> From lxc-start monitor this file and when it is created or modified or
> whatever, the system running the container is booted.

How would that work if a block device is mounted as rootfs solely in 
the container namespace? Or, even if you leave the rootfs to be 
accessible from the outside, most distributions mount /run (/var/run is 
usually a symlink to /run) as a tmpfs nowadays - which is done inside 
the mount namespace of the container. You won't be able to see that from 
lxc-start. That's why I want to bind-mount something into the container, 
it's the only reliable way I can see to make sure there is a backchannel 
from the container.

> I suggest to decorrelate the states sent by lxc-start to lxc-info and so
> from this notification mechanism.

I don't quite understand what you mean by 'decorrelate' here.

>> Inside the container one may execute 'echo RUNNING > /dev/lxc-notify' or an
>> equivalent command to notify lxc-start that the container has now booted.
>> Similarly, 'echo STOPPING > /dev/lxc-notify' will change the status to
>> STOPPING, which may be done on shutdown. Currently, only RUNNING and
>> STOPPING are allowed, other states are ignored.
>
> How the process writing the "STOPPING" string can know the container is
> shutting down ?

Let's say root inside a container runs 'shutdown -h now' - then the 
container is technically still running, even though init will exit 
really soon. I think that does qualify as STOPPING. If you think 
STOPPING is not completely accurate, we may introduce another status 
such as SHUTDOWN, but I think the principle applies.

That all said, in this thread quite a few people have said that they'd 
prefer a socket instead of a fifo, so if you are agreeable to the basic 
principle, the next version of my patch will use a socket instead.

Christian




Re: [lxc-devel] [PATCH 1/2] fix trivial off by one error

2012-09-18 Thread Christian Seiler
Hi,

Just a heads up:

> Since the if uses >=, the - 1 is not needed and the MAXFDS'th
> entry in the fds array can be used.

This was part of one of my patches regarding lxc-attach and it is
NOT an off-by-one error, it is meant to be this way. The problem is that
the array has to be traversed later (both for completing the attach
operation or aborting it), see lxc_cgroup_(dispose|finish)_attach:

| for (; *fds >= 0; fds++) {

This loop will result in a potential buffer overflow with your change.

Obviously, you could rewrite it to also check against the offset of the
beginning of the array and compare that to MAXFDS in those loops, but
that makes the traversal loop more complicated to read and whether the
maximum number of cgroup controllers mounted that lxc-attach supports is
255 or 256 shouldn't matter much. On the other hand, this means the
overflow wouldn't occur in practice - however, if at some later point
somebody should change MAXFDS to 16 to save some memory, for example on
embedded systems, then this could lead to undefined behaviour - and even
potential security holes if lxc-attach is installed as setuid root.

But because this appears to be something that needs clarification,
perhaps you could change your patch to just add a comment explaining the
situation?

Regards,
Christian



Re: [lxc-devel] [PATCH 1/2] fix trivial off by one error

2012-09-18 Thread Christian Seiler
Hi,

> Do you think mallocing an fd_set and using FD_SET() and friends
> would be better? The (dispose|finish) loops would visit FD_SETSIZE bits
> with an FD_ISSET() test, which is more work than you have currently
> with the early out, but we would probably save on the initialization
> with FD_ZERO(). I don't know if lxc_cgroup_(dispose|finish)_attach is
> performance critical.

I don't think performance is that much of an issue here, but to me it
seems that using fd_set logic would complicate things quite a bit
unnecessarily. The current logic is already a bit complicated because
the cgroup task files have to be opened before setns() but written to
only after the fork() call when we know the pid which happens after
setns(). Having a simple array with a loop over it appears to be much
more straight-forward to me, especially since iterating over an fd_set
is kind of convoluted.

> Or I can just add a comment :)

My suggestion would be to do just that unless someone has a good reason
to change the current logic.

All IMHO of course, I just wrote the initial patch, in the end other
people get to decide what goes in. ;-)

Regards,
Christian



Re: [lxc-devel] [PATCH] Add mechanism for container to notify host about end of boot

2012-09-19 Thread Christian Seiler
Hi there,

I've now updated my patch, there are now the following changes,
partially based on feedback from this list, partially from sorting
things in my head a bit.

  - socket instead of a FIFO
  - is now in /run instead of /dev
  - parent directories of socket inside container are automatically
created
  - extensible but very simple protocol, line-based, currently, only
'status NEWSTATUS' is supported, where NEWSTATUS may be either
RUNNING or STOPPING
  - now returns either 'OK' or 'ERR message' to the caller
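The line-based protocol in the list above could be handled on the lxc-start side roughly as follows. This is a hedged sketch, not the code from notification.c: `handle_notification_line`, `enum notify_state` and the exact error strings are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical states a container may request from the inside. */
enum notify_state { NOTIFY_NONE, NOTIFY_RUNNING, NOTIFY_STOPPING };

/* Parse one protocol line such as "status RUNNING" (trailing newline
 * already stripped), write the reply ("OK" or "ERR <message>") into
 * reply and return the requested state, or NOTIFY_NONE on error. */
static enum notify_state handle_notification_line(const char *line,
						  char *reply, size_t len)
{
	const char *arg;

	/* Only "status" exists so far; further commands would become
	 * additional prefix checks here, keeping the protocol extensible. */
	if (strncmp(line, "status ", 7) != 0) {
		snprintf(reply, len, "ERR unknown command");
		return NOTIFY_NONE;
	}

	arg = line + 7;
	if (strcmp(arg, "RUNNING") == 0) {
		snprintf(reply, len, "OK");
		return NOTIFY_RUNNING;
	}
	if (strcmp(arg, "STOPPING") == 0) {
		snprintf(reply, len, "OK");
		return NOTIFY_STOPPING;
	}

	/* Any other state (e.g. BOOTING) may not be set from inside. */
	snprintf(reply, len, "ERR status '%s' not allowed", arg);
	return NOTIFY_NONE;
}
```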

Regards,
Christian




[lxc-devel] [PATCH v2] Add mechanism for container to notify host about end of boot

2012-09-19 Thread Christian Seiler
This patch adds a simple notification system that allows the container to
notify the host (in particular, the lxc-start process) that the boot process
has been completed successfully. It also adds an additional status BOOTING
that lxc-info may return. This allows the administrator and scripts to
distinguish between a fully-running container and a container that is still
in the process of booting.

If nothing is added to the configuration file, the current behavior is not
changed, i.e. after lxc-start finishes the initialization, the container is
immediately put into the RUNNING state. This ensures backwards
compatibility.

If lxc.notification.enabled is set to yes, after lxc-start initialization
the container is initially put into the state BOOTING. Also, a UNIX domain
socket /var/lib/lxc/%s/notification-socket is created and bind-mounted into
the container, by default to /run/lxc-notify. (This can be changed by the
lxc.notification.path setting.) The default access mode of this socket is
0600, so only root may access it.

A program in the container may connect to the socket and use a very simple
protocol to notify lxc-start that the boot sequence of the container has
been completed. Sending 'status RUNNING' or 'status STOPPING' will cause the
container to change its status to RUNNING or STOPPING, respectively; no other
status may be set from within the container. If everything succeeds, lxc-start will
respond with 'OK', otherwise with 'ERR $error_message'.

The following script may be used from the shell to notify lxc-start from
within the container that boot has completed successfully:

echo status RUNNING | perl -e \
   'use IO::Socket;
$client = IO::Socket::UNIX->new(
   Peer => "/run/lxc-notify",
   Type => SOCK_STREAM,
   Timeout => 10
);
while (<>) {
  print $client $_;
}'

If OpenBSD's version of netcat is available, executing

echo status RUNNING | nc -U /run/lxc-notify

is also possible. In addition to RUNNING, STOPPING is also allowed from
within the container, so if somebody types 'shutdown -h now' in the
container, the status gets updated immediately.
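For init systems written in C, the same notification could be sent without Perl or netcat. A hedged sketch: `lxc_notify_connect` and `lxc_notify_status` are hypothetical names, and the sketch assumes lxc-start answers with a single newline-terminated line.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect to the notification socket (whatever path the container was
 * configured with). Returns a connected fd, or -1 on error. */
static int lxc_notify_connect(const char *path)
{
	struct sockaddr_un addr;
	int fd;

	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;
	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Send "status <state>\n" on fd and read one reply line into reply
 * (newline stripped). Returns 0 if the answer was "OK", else -1. */
static int lxc_notify_status(int fd, const char *state, char *reply, size_t len)
{
	char line[64];
	size_t used = 0;
	int n;

	if (len == 0)
		return -1;
	n = snprintf(line, sizeof(line), "status %s\n", state);
	if (n < 0 || (size_t)n >= sizeof(line) || write(fd, line, n) != n)
		return -1;

	/* Read one byte at a time so nothing after the newline is consumed. */
	while (used + 1 < len) {
		char c;
		if (read(fd, &c, 1) != 1 || c == '\n')
			break;
		reply[used++] = c;
	}
	reply[used] = '\0';
	return strcmp(reply, "OK") == 0 ? 0 : -1;
}
```

Connecting with `lxc_notify_connect()` and then calling `lxc_notify_status(fd, "RUNNING", buf, sizeof(buf))` would correspond to the netcat one-liner above.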

The mechanism is designed to be extensible, i.e. commands other than
'status' may be supported at a later point, if other needs arise for the
communication between inside and outside the container.

Signed-off-by: Christian Seiler 
Cc: Serge Hallyn 
---
 src/lxc/Makefile.am|1 +
 src/lxc/conf.c |8 +
 src/lxc/conf.h |2 +
 src/lxc/confile.c  |   35 
 src/lxc/notification.c |  435 
 src/lxc/notification.h |   40 +
 src/lxc/start.c|   22 ++-
 src/lxc/start.h|1 +
 src/lxc/state.c|1 +
 src/lxc/state.h|3 +-
 10 files changed, 544 insertions(+), 4 deletions(-)
 create mode 100644 src/lxc/notification.c
 create mode 100644 src/lxc/notification.h

diff --git a/src/lxc/Makefile.am b/src/lxc/Makefile.am
index 7d86ad6..d976bf7 100644
--- a/src/lxc/Makefile.am
+++ b/src/lxc/Makefile.am
@@ -32,6 +32,7 @@ liblxc_so_SOURCES = \
freezer.c \
checkpoint.c \
restart.c \
+   notification.h notification.c \
error.h error.c \
parse.c parse.h \
cgroup.c cgroup.h \
diff --git a/src/lxc/conf.c b/src/lxc/conf.c
index f3c2334..7c9e936 100644
--- a/src/lxc/conf.c
+++ b/src/lxc/conf.c
@@ -61,6 +61,7 @@
 #include "log.h"
 #include "lxc.h"   /* for lxc_cgroup_set() */
 #include "caps.h"   /* for lxc_caps_last_cap() */
+#include "notification.h"
 
 #if HAVE_APPARMOR
 #include 
@@ -2278,6 +2279,11 @@ int lxc_setup(const char *name, struct lxc_conf *lxc_conf)
return -1;
}
 
+   if (lxc_notification_mount_hook(name, lxc_conf)) {
+   ERROR("failed to init notification mechanism for container '%s'.", name);
+   return -1;
+   }
+
if (setup_cgroup(name, &lxc_conf->cgroup)) {
ERROR("failed to setup the cgroups for '%s'", name);
return -1;
@@ -2583,6 +2589,8 @@ void lxc_conf_free(struct lxc_conf *conf)
if (conf->aa_profile)
free(conf->aa_profile);
 #endif
+   if (conf->notification_path)
+   free(conf->notification_path);
lxc_clear_config_caps(conf);
lxc_clear_cgroups(conf, "lxc.cgroup");
lxc_clear_hooks(conf, "lxc.hook");
diff --git a/src/lxc/conf.h b/src/lxc/conf.h
index dccc176..af9a178 100644
--- a/src/lxc/conf.h
+++ b/src/lxc/conf.h
@@ -237,6 +237,8 @@ struct lxc_conf {
 #endif
char *seccomp;  // filename with the seccomp rules
int maincmd_fd;
+   int notification_enabled;
+   char *notification_path;
 };
 
 int run_lxc_hooks(const char *name, char *hook, struct lxc_c

Re: [lxc-devel] [PATCH] Add mechanism for container to notify host about end of boot

2012-09-19 Thread Christian Seiler
Hi,

>>   - is now in /run instead of /dev
>
> I don't think that part's going to work... Most distros mount /run as
> tmpfs at boot time which will hide anything you're putting in there
> before boot.

Hmmm, that is indeed a problem... Do you have any suggestions? Or
should we just keep it in /dev for now (path is configurable anyway)
and worry about this later?

Christian




[lxc-devel] [PATCH v2.1] Add mechanism for container to notify host about end of boot

2012-09-19 Thread Christian Seiler
This patch adds a simple notification system that allows the container to
notify the host (in particular, the lxc-start process) that the boot process
has been completed successfully. It also adds an additional status BOOTING
that lxc-info may return. This allows the administrator and scripts to
distinguish between a fully-running container and a container that is still
in the process of booting.

If nothing is added to the configuration file, the current behavior is not
changed, i.e. after lxc-start finishes the initialization, the container is
immediately put into the RUNNING state. This ensures backwards
compatibility.

If lxc.notification.enabled is set to yes, after lxc-start initialization
the container is initially put into the state BOOTING. Also, a UNIX domain
socket /var/lib/lxc/%s/notification-socket is created and bind-mounted into
the container, by default to /dev/lxc-notify. (This can be changed by the
lxc.notification.path setting.) The default access mode of this socket is
0600, so only root may access it.

A program in the container may connect to the socket and use a very simple
protocol to notify lxc-start that the boot sequence of the container has
been completed. Sending 'status RUNNING' or 'status STOPPING' will cause the
container to change its status to RUNNING or STOPPING, respectively; no other
status may be set from within the container. If everything succeeds, lxc-start will
respond with 'OK', otherwise with 'ERR $error_message'.

The following script may be used from the shell to notify lxc-start from
within the container that boot has completed successfully:

echo status RUNNING | perl -e \
   'use IO::Socket;
$client = IO::Socket::UNIX->new(
   Peer => "/dev/lxc-notify",
   Type => SOCK_STREAM,
   Timeout => 10
);
while (<>) {
  print $client $_;
}'

If OpenBSD's version of netcat is available, executing

echo status RUNNING | nc -U /dev/lxc-notify

is also possible. In addition to RUNNING, STOPPING is also allowed from
within the container, so if somebody types 'shutdown -h now' in the
container, the status gets updated immediately.

The mechanism is designed to be extensible, i.e. commands other than
'status' may be supported at a later point, if other needs arise for the
communication between inside and outside the container.

Signed-off-by: Christian Seiler 
Cc: Serge Hallyn 
---
 doc/lxc.conf.sgml.in   |   67 
 src/lxc/Makefile.am|1 +
 src/lxc/conf.c |8 +
 src/lxc/conf.h |2 +
 src/lxc/confile.c  |   35 
 src/lxc/notification.c |  435 
 src/lxc/notification.h |   40 +
 src/lxc/start.c|   22 ++-
 src/lxc/start.h|1 +
 src/lxc/state.c|1 +
 src/lxc/state.h|3 +-
 11 files changed, 611 insertions(+), 4 deletions(-)
 create mode 100644 src/lxc/notification.c
 create mode 100644 src/lxc/notification.h

diff --git a/doc/lxc.conf.sgml.in b/doc/lxc.conf.sgml.in
index 1428f25..a4950e8 100644
--- a/doc/lxc.conf.sgml.in
+++ b/doc/lxc.conf.sgml.in
@@ -738,6 +738,73 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
   
 
 
+
+  Notifications
+  
+   LXC supports a notification system that allows the container
+   to notify LXC when it has finished booting. If enabled, the
+   container has the status of BOOTING when control is given to
+   the init process and the container must notify lxc once the
+   boot sequence has completed. Since this requires intervention
+   from the inside of the container, it is disabled by default.
+  
+  
+   This system allows the administrator to distinguish, from the
+   outside, a container that is fully started (RUNNING) from a
+   container that is still in the process of initialization
+   (BOOTING).
+  
+  
+   The system is kept simple: LXC bind-mounts a UNIX socket into
+   the container (by default to
+   /dev/lxc-notify) and the container can
+   inform LXC via a very simple protocol. The following piece of
+   code may be used for the notification:
+  
+  
+   echo status RUNNING | perl -e 'use IO::Socket;
+   $client = IO::Socket::UNIX->new(
+  Peer => "/dev/lxc-notify",
+  Type => SOCK_STREAM,
+  Timeout => 10
+   );
+   while (<>) {
+ print $client $_;
+   }'
+  
+  
+   The following options control notifications:
+  
+  
+   
+ 
+   lxc.notification.enabled
+ 
+ 
+   
+ Whether the notification system is enabled. The values
+ true, yes,
+ on and 1 indicate that
+ it should be enabled, any

Re: [lxc-devel] [PATCH] Add mechanism for container to notify host about end of boot

2012-09-19 Thread Christian Seiler
Hi,

>> I think /dev is the safest at the moment. Arguably it's wrong as it's
>> not an actual device node, but it's the only directory that lxc already
>> requires all distros not to mess with (or we wouldn't have working
>> console, tty, ...).
> 
> What are some other alternatives?
> 
> We could use some sysvipc mechanism - just have the container share
> the ipcns with the monitor.

The socket approach has the advantage that it can still be scripted from
the shell - my guess would be that this becomes really hard with sysvipc.

> We could create a directory (default /container, specifiable in
> the container config) where such communication files will be
> mounted.  Let the per-distro templates set up the location and
> the distro-userspace to work together.

Since other people in this thread have expressed interest in having some
general kind of directory to communicate with the container, perhaps
this really is the best idea. Then this would consist of the following:

 1) Create a directory that is shared between host and container
More specifically:
  - mount a tmpfs with size=512k and nr_inodes=512
(should be MORE than sufficient to put a few sockets or
similar there) to /var/lib/lxc/$name/interface (or wherever)
  - just before pivot_root: bind-mount it to /container or
any place specified in the config

 2) Create a lxc-specific socket inside /var/lib/lxc/$name/interface
for status notifications when the mainloop is started.

 3) Other applications may choose to put sockets there for their own
purposes if they wish.

The small tmpfs will make sure that the container can't do a disk space
denial-of-service on the host.

Thoughts?

> Others?

My guess is that other methods would certainly be possible but unless
I'm missing something obvious, I don't think there's anything out there
that isn't quite a bit more complicated than all the solutions discussed
here.

Regards,
Christian



[lxc-devel] [PATCH] [trivial] lxc-ls: Scan cgroup mount points from fstype and not device

2012-09-24 Thread Christian Seiler
lxc-ls --active now scans mount points that have the 'cgroup' filesystem
type and not the 'cgroup' device name (which is ignored anyway and may be
anything).

Signed-off-by: Christian Seiler 
Cc: Serge Hallyn 
---
 src/lxc/lxc-ls.in |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/lxc/lxc-ls.in b/src/lxc/lxc-ls.in
index 4b9dc77..9293323 100644
--- a/src/lxc/lxc-ls.in
+++ b/src/lxc/lxc-ls.in
@@ -56,7 +56,7 @@ get_parent_cgroup()
init_cgroup=${fields#*:}
 
# Get the filesystem mountpoint of the hierarchy
-   mountpoint=$(grep -E "^cgroup [^ ]+ [^ ]+ ([^ ]+,)?$subsystems(,[^ ]+)? " /proc/self/mounts | cut -d ' ' -f 2)
+   mountpoint=$(grep -E "^[^ ]+ [^ ]+ cgroup ([^ ]+,)?$subsystems(,[^ ]+)? " /proc/self/mounts | cut -d ' ' -f 2)
if [ -z "$mountpoint" ]; then continue; fi
 
# Return the absolute path to the containers' parent cgroup
-- 
1.7.2.5
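The same rule (match on the filesystem type, the third field of /proc/self/mounts, never on the device name) applies when scanning mounts from C. A sketch using getmntent(3) and hasmntopt(3); the function name and the subsystem matching are illustrative assumptions, not code from lxc-ls.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: find the mount point of the cgroup hierarchy that
 * lists `subsystem` (e.g. "cpu") among its mount options. Matches on
 * mnt_type ("cgroup"), not on mnt_fsname, which may be anything.
 * Copies the mount point into out and returns 0, or -1 if not found. */
static int find_cgroup_mount(const char *mounts_file, const char *subsystem,
			     char *out, size_t len)
{
	struct mntent *m;
	FILE *f;
	int ret = -1;

	f = setmntent(mounts_file, "r");
	if (!f)
		return -1;

	while ((m = getmntent(f))) {
		if (strcmp(m->mnt_type, "cgroup") != 0)
			continue;
		if (!hasmntopt(m, subsystem))
			continue;
		snprintf(out, len, "%s", m->mnt_dir);
		ret = 0;
		break;
	}
	endmntent(f);
	return ret;
}
```

Called with "/proc/self/mounts", a tmpfs whose device happens to be named "cgroup" is skipped, while a real cgroup hierarchy mounted from device "none" is found.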




[lxc-devel] [PATCH] Multiple IP addresses: add them in the correct order

2013-01-15 Thread Christian Seiler
Make sure that when configuring containers that have interfaces containing
multiple IP addresses they are added in the order of the configuration file
(i.e. the first being the primary one) and not the reverse order.

Signed-off-by: Christian Seiler 
---
 src/lxc/confile.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

This problem hit me when I created a container with multiple IPv4 addresses
and they were added in the reverse order, making the last one in the config
file the primary address from which all outgoing connections were made
unless the program bound source IPs explicitly, which then caused an
IP-based filter on the other end to deny access.

diff --git a/src/lxc/confile.c b/src/lxc/confile.c
index 034136e..da87088 100644
--- a/src/lxc/confile.c
+++ b/src/lxc/confile.c
@@ -624,7 +624,7 @@ static int config_network_ipv4(const char *key, const char *value,
htonl(INADDR_BROADCAST >>  inetdev->prefix);
}
 
-   lxc_list_add(&netdev->ipv4, list);
+   lxc_list_add_tail(&netdev->ipv4, list);
 
free(addr);
return 0;
@@ -716,7 +716,7 @@ static int config_network_ipv6(const char *key, const char *value,
return -1;
}
 
-   lxc_list_add(&netdev->ipv6, list);
+   lxc_list_add_tail(&netdev->ipv6, list);
 
free(valdup);
return 0;
-- 
1.7.2.5
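Why this one-word change fixes the ordering: inserting every parsed address right after the list head reverses the configuration order, while appending at the tail preserves it. A self-contained sketch with a minimal circular doubly linked list; it mimics, but is not, LXC's `lxc_list` API.

```c
#include <stddef.h>

/* Minimal circular doubly linked list with a sentinel head node. */
struct node {
	struct node *prev, *next;
	int value;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

/* Insert right after the head: iteration order ends up reversed. */
static void list_add(struct node *head, struct node *n)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Insert right before the head, i.e. at the end: order is preserved. */
static void list_add_tail(struct node *head, struct node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Copy the values in iteration order into out; returns the count. */
static size_t list_values(const struct node *head, int *out, size_t max)
{
	size_t i = 0;
	const struct node *it;

	for (it = head->next; it != head && i < max; it = it->next)
		out[i++] = it->value;
	return i;
}
```

Adding addresses 1, 2, 3 with `list_add` iterates as 3, 2, 1, which is how the last configured address ended up as the primary one; `list_add_tail` yields 1, 2, 3.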




Re: [lxc-devel] [PATCH 1/1] lxc_attach: fix break with user namespaces (v2)

2013-01-21 Thread Christian Seiler
Hi Serge,

Just a few quick comments because I'm very interested in the lxc-attach
utility:

> + ret = lxc_cgroup_prepare_attach(my_args.name, &cgroup_data);
> + if (ret < 0) {
> + ERROR("failed to prepare attaching to cgroup");
> + return -1;
> + }
> +
> + ret = lxc_cgroup_finish_attach(cgroup_data, gchild);
> + if (ret < 0) {
> + ERROR("failed to attach process to cgroup");
> + return -1;
> + }

Note that I made the whole cgroup attach logic so complicated a while
back (i.e. prepare before setns/fork -> finish/dispose after fork),
precisely so that one wouldn't need a second fork and so I didn't have
to play around with IPC to get the PID of the process that is to be
added to the cgroup. If an additional fork is needed anyway due to user
namespaces (that reminds me that I should definitely try them out...),
the reason for making the cgroup attach logic so complicated disappears
and one could return to the direct approach from before, which would
probably make it quite a bit easier to read.

Side note 1: I would use pid_t rather than int as the data type of the
object sent through the pipe; that seems more portable to me.
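Following side note 1, a sketch of passing a pid_t through a pipe (or socket) as raw bytes, with short reads and writes handled. The helper names are hypothetical; sending the value in binary form is only safe because both ends run on the same host with the same ABI.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Read exactly len bytes; fails on EOF before len bytes have arrived. */
static int read_all(int fd, void *buf, size_t len)
{
	char *p = buf;

	while (len > 0) {
		ssize_t n = read(fd, p, len);
		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

static int send_pid(int fd, pid_t pid)
{
	return write_all(fd, &pid, sizeof(pid));
}

static int recv_pid(int fd, pid_t *pid)
{
	return read_all(fd, pid, sizeof(*pid));
}
```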

Side note 2: Idea, but untested and not completely thought through, just
wanted to put it out there: The "middle" process in your logic need not
necessarily be kept around if the kernel version is at least 3.4 -
because Linux then supports prctl(PR_SET_CHILD_SUBREAPER), which - if
I'm not completely mistaken - would reparent the "inner" process (that
will execve() to the requested program inside the container) to the
"outer" attach process after the "middle" process exits. That way we
might keep only one event loop around. On the other hand, this still
requires both implementations if kernels < 3.4 are still to be
supported... (as I said, not completely thought through, just wanted to
put the idea out there)
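The side note 2 idea can be tried out in isolation. In this sketch (Linux 3.4 or later assumed; `run_via_subreaper` is a hypothetical name), the calling process marks itself as a child subreaper, the middle process forks the inner one and exits immediately, and the orphaned inner process is reparented to the caller, which reaps it with waitpid() directly. Note that the flag stays set on the calling process afterwards.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the inner process's exit status, or -1 on error. */
static int run_via_subreaper(int inner_exit_code)
{
	int pipefd[2], status;
	pid_t middle, inner = -1;

	/* Orphaned descendants get reparented to us instead of to init. */
	if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) < 0)
		return -1;
	if (pipe(pipefd) < 0)
		return -1;

	middle = fork();
	if (middle < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}
	if (middle == 0) {
		/* Middle process: in lxc-attach, setns() would happen here. */
		close(pipefd[0]);
		inner = fork();
		if (inner < 0)
			_exit(1);
		if (inner == 0) {
			/* Inner process: would exec() the requested program. */
			close(pipefd[1]);
			_exit(inner_exit_code);
		}
		/* Report the inner pid to the outer process, then exit. */
		if (write(pipefd[1], &inner, sizeof(inner)) != (ssize_t)sizeof(inner))
			_exit(1);
		_exit(0);
	}

	close(pipefd[1]);
	if (waitpid(middle, &status, 0) < 0 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) != 0 ||
	    read(pipefd[0], &inner, sizeof(inner)) != (ssize_t)sizeof(inner)) {
		close(pipefd[0]);
		return -1;
	}
	close(pipefd[0]);

	/* The middle process is gone, so the inner one is now our child. */
	if (waitpid(inner, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```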

> + return -1;
> +
> + return 0;

This seems a bit weird... ;-)

> + /* XXX FIXME this should get the uid of the container init and setuid to that */
> + /* XXX FIXME or perhaps try to map in the lxc-attach caller's uid? */

I believe a sane default would be the first option (the uid of init;
however, that isn't so easy to obtain, because one has to assume /proc
is mounted and certain security features are turned off), and to let
the user specify a uid otherwise.

Btw. I noticed recently that if the glibc/nss implementations of host
and container are incompatible, even a plain lxc-attach without
specifying /bin/sh won't work: the lxc-attach code that tries to
determine the login shell uses the host's glibc, while the nss module
it loads comes from the container. (There probably should be a
default of /bin/sh in that case instead of failure.) The same issue
will bite even harder if lxc-attach tries to look up a user name
that the user has specified in order to map it to a given user id... I
don't see an easy solution for this in general... (Other than to simply
say: never do nss lookups from lxc-attach, i.e. only use numerical ids,
default to /bin/sh for the shell and let the user specify otherwise if
wanted.)
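A sketch of the fallback argued for here: prefer the shell from a passwd entry, but never fail just because the lookup did. `choose_shell` is a hypothetical name; passing NULL models a failed getpwuid() against the container's NSS setup.

```c
#include <pwd.h>
#include <stddef.h>
#include <string.h>

/* Return the login shell from pw if it looks like an absolute path,
 * otherwise fall back to "/bin/sh". */
static const char *choose_shell(const struct passwd *pw)
{
	if (pw && pw->pw_shell && pw->pw_shell[0] == '/')
		return pw->pw_shell;
	return "/bin/sh";
}
```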

Regards,
Christian



Re: [lxc-devel] [PATCH 1/1] lxc_attach: fix break with user namespaces (v2)

2013-01-22 Thread Christian Seiler
Hi Serge,

> Would you care to update the patch along these lines?

Will do, but it will take me a few days, since I have to set up an
environment where I can test user namespaces first.

Regards,
Christian




[lxc-devel] [PATCH 0/2] lxc_attach and user namespaces

2013-03-03 Thread Christian Seiler
As discussed earlier on this list with Serge, here is my first set of
patches that fix lxc_attach with user namespaces.

The first patch is basically Serge's patch v2 with the following changes:

 - use socketpair() instead of pipes because we need two-way
   communication; before we exec() we need to make sure that
   the process was added to cgroups; otherwise this can be
   racy (for example, if we execute something that fork()s
   immediately, the fork may happen before we have finished
   attaching the child to the cgroups; this is now fixed)

 - some minor cleanups

 - a large explanatory comment in the source code about the
   general logic

 - use lxc_cgroup_attach directly, don't use prepare/finish/dispose
   (We don't need them any more if we double-fork()!)
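The race described in the first item is avoided by a two-way handshake: the child reports its pid and blocks until the parent confirms that the cgroup step is done. A minimal sketch of that ordering with a single fork (lxc-attach has an extra middle process); `attach_to_cgroups` is a hypothetical stand-in for lxc_cgroup_attach, which needs real cgroup mounts.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for lxc_cgroup_attach(); only validates the pid. */
static int attach_to_cgroups(pid_t pid)
{
	return pid > 0 ? 0 : -1;
}

/* Returns the child's exit status: 0 means the child received the
 * go-ahead before reaching the point where it would exec(). */
static int handshake_demo(void)
{
	int sv[2], status;
	pid_t child, reported = -1;
	ssize_t sent;
	char ack;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	child = fork();
	if (child < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (child == 0) {
		pid_t self = getpid();
		char go = 0;

		close(sv[0]);
		/* Report our pid, then block until the parent is done. */
		if (write(sv[1], &self, sizeof(self)) != (ssize_t)sizeof(self))
			_exit(2);
		if (read(sv[1], &go, 1) != 1 || go != 'A')
			_exit(3);
		/* lxc-attach would exec() the requested program here. */
		_exit(0);
	}

	close(sv[1]);
	if (read(sv[0], &reported, sizeof(reported)) == (ssize_t)sizeof(reported) &&
	    attach_to_cgroups(reported) == 0)
		ack = 'A';
	else
		ack = 'N';
	sent = write(sv[0], &ack, 1);
	close(sv[0]); /* if the write failed, the child sees EOF instead */

	if (waitpid(child, &status, 0) < 0 || !WIFEXITED(status) || sent != 1)
		return -1;
	return WEXITSTATUS(status);
}
```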

The second patch just gets rid of the unnecessary
prepare/finish/dispose functions for cgroup attaching that were
introduced to avoid a triple-fork in the first place.

A few more patches will follow shortly, especially w.r.t. UID
and shell handling.

-- Christian

PS: As a side note: I currently get some weird error messages when the
attached process ends:
  /bin/sh: 0: Cannot set tty process group (No such process)
Apparently, upon exit, the shell of the container tries to reset the
controlling terminal to have the process group of its parent process be
the foreground process group. That fails (because the parent pid appears
to be 0 from the inside), so it prints this message. Strangely enough, I
only got this message recently; is this a new feature of the shell that
current Ubuntu versions use?

I don't see an easy way to suppress the message btw., so I'm open
to suggestions.




[lxc-devel] [PATCH 2/2] lxc_attach: Clean up cgroup attaching code

2013-03-03 Thread Christian Seiler
Since lxc_attach now works with two fork()s anyway due to user
namespaces, the code for attaching to cgroups can be simplified again.

This patch removes the prepare/finish/dispose functions for attaching
to cgroups and just keeps the lxc_cgroup_attach function.
---
 src/lxc/cgroup.c |  154 ++---
 src/lxc/cgroup.h |3 -
 2 files changed, 18 insertions(+), 139 deletions(-)

diff --git a/src/lxc/cgroup.c b/src/lxc/cgroup.c
index 6630d6c..8420e08 100644
--- a/src/lxc/cgroup.c
+++ b/src/lxc/cgroup.c
@@ -259,37 +259,12 @@ static int cgroup_enable_clone_children(const char *path)
return ret;
 }
 
-static int lxc_one_cgroup_finish_attach(int fd, pid_t pid)
-{
-   char buf[32];
-   int ret;
-
-   snprintf(buf, 32, "%ld", (long)pid);
-
-   ret = write(fd, buf, strlen(buf));
-   if (ret <= 0) {
-   SYSERROR("failed to write pid '%ld' to fd '%d'", (long)pid, fd);
-   ret = -1;
-   } else {
-   ret = 0;
-   }
-
-   close(fd);
-   return ret;
-}
-
-static int lxc_one_cgroup_dispose_attach(int fd)
-{
-   close(fd);
-   return 0;
-}
-
-static int lxc_one_cgroup_prepare_attach(const char *name,
-struct mntent *mntent)
+static int lxc_one_cgroup_attach(const char *name, struct mntent *mntent, pid_t pid)
 {
int fd;
char tasks[MAXPATHLEN], initcgroup[MAXPATHLEN];
char *cgmnt = mntent->mnt_dir;
+   char buf[32];
int flags;
int rc;
 
@@ -310,77 +285,26 @@ static int lxc_one_cgroup_prepare_attach(const char *name,
return -1;
}
 
-   return fd;
-}
-
-static int lxc_one_cgroup_attach(const char *name, struct mntent *mntent, pid_t pid)
-{
-   int fd;
-
-   fd = lxc_one_cgroup_prepare_attach(name, mntent);
-   if (fd < 0) {
-   return -1;
-   }
-
-   return lxc_one_cgroup_finish_attach(fd, pid);
-}
-
-int lxc_cgroup_dispose_attach(void *data)
-{
-   int *fds = data;
-   int ret, err;
-
-   if (!fds) {
-   return 0;
-   }
-
-   ret = 0;
-
-   for (; *fds >= 0; fds++) {
-   err = lxc_one_cgroup_dispose_attach(*fds);
-   if (err) {
-   ret = err;
-   }
-   }
-
-   free(data);
-
-   return ret;
-}
-
-int lxc_cgroup_finish_attach(void *data, pid_t pid)
-{
-   int *fds = data;
-   int err;
+   snprintf(buf, 32, "%ld", (long)pid);
 
-   if (!fds) {
-   return 0;
+   rc = write(fd, buf, strlen(buf));
+   if (rc <= 0) {
+   SYSERROR("failed to write pid '%ld' to fd '%d'", (long)pid, fd);
+   rc = -1;
+   } else {
+   rc = 0;
}
 
-   for (; *fds >= 0; fds++) {
-   err = lxc_one_cgroup_finish_attach(*fds, pid);
-   if (err) {
-   /* get rid of the rest of them */
-   lxc_cgroup_dispose_attach(data);
-   return -1;
-   }
-   *fds = -1;
-   }
-
-   free(data);
-
-   return 0;
+   close(fd);
+   return rc;
 }
 
-int lxc_cgroup_prepare_attach(const char *name, void **data)
+int lxc_cgroup_attach(const char *name, pid_t pid)
 {
struct mntent *mntent;
FILE *file = NULL;
-   int err = -1;
int found = 0;
-   int *fds;
-   int i;
-   static const int MAXFDS = 256;
+   int err = 0;
 
file = setmntent(MTAB, "r");
if (!file) {
@@ -388,29 +312,7 @@ int lxc_cgroup_prepare_attach(const char *name, void **data)
return -1;
}
 
-   /* create a large enough buffer for all practical
-* use cases
-*/
-   fds = malloc(sizeof(int) * MAXFDS);
-   if (!fds) {
-   err = -1;
-   goto out;
-   }
-   for (i = 0; i < MAXFDS; i++) {
-   fds[i] = -1;
-   }
-
-   err = 0;
-   i = 0;
while ((mntent = getmntent(file))) {
-   if (i >= MAXFDS - 1) {
-   ERROR("too many cgroups to attach to, aborting");
-   lxc_cgroup_dispose_attach(fds);
-   errno = ENOMEM;
-   err = -1;
-   goto out;
-   }
-
DEBUG("checking '%s' (%s)", mntent->mnt_dir, mntent->mnt_type);
 
if (strcmp(mntent->mnt_type, "cgroup"))
@@ -421,42 +323,22 @@ int lxc_cgroup_prepare_attach(const char *name, void **data)
INFO("[%d] found cgroup mounted at '%s',opts='%s'",
 ++found, mntent->mnt_dir, mntent->mnt_opts);
 
-   fds[i] = lxc_one_cgroup_prepare_attach(name, mntent);
-   if (fds[i] < 0) {
-   err = fds[i];
-   lxc_cgroup_dispose_attach(fds);
+   err = lxc_one_cgroup_attach(name, mnt

[lxc-devel] [PATCH 1/2] lxc_attach: fix break with user namespaces (v3)

2013-03-03 Thread Christian Seiler
When you clone a new user_ns, the child cannot write to the fds
opened by the parent.  Handle this by doing an extra fork.  The
grandparent hangs around and waits for its child to tell it the
pid of the grandchild, which will be the one attached to the
container.  The grandparent then moves the grandchild into the
right cgroup, then waits for the child who in turn is waiting on
the grandchild to complete.

Secondly, when attaching to a new user namespace, your old uid is
not valid, so you are uid -1.  This patch simply does setgid+setuid
to 0 if that is the case.  We probably want to be smarter, but
for now this allows lxc-attach to work.

Signed-off-by: Christian Seiler 
---
 src/lxc/lxc_attach.c |  178 ++
 1 files changed, 150 insertions(+), 28 deletions(-)

diff --git a/src/lxc/lxc_attach.c b/src/lxc/lxc_attach.c
index e1511ef..1f60266 100644
--- a/src/lxc/lxc_attach.c
+++ b/src/lxc/lxc_attach.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "attach.h"
@@ -128,9 +129,9 @@ int main(int argc, char *argv[])
struct passwd *passwd;
struct lxc_proc_context_info *init_ctx;
struct lxc_handler *handler;
-   void *cgroup_data = NULL;
uid_t uid;
char *curdir;
+   int cgroup_ipc_sockets[2];
 
ret = lxc_caps_init();
if (ret)
@@ -157,18 +158,6 @@ int main(int argc, char *argv[])
return -1;
}
 
-   if (!elevated_privileges) {
-   /* we have to do this now since /sys/fs/cgroup may not
-* be available inside the container or we may not have
-* the required permissions anymore
-*/
-   ret = lxc_cgroup_prepare_attach(my_args.name, &cgroup_data);
-   if (ret < 0) {
-   ERROR("failed to prepare attaching to cgroup");
-   return -1;
-   }
-   }
-
curdir = getcwd(NULL, 0);
 
/* determine which namespaces the container was created with
@@ -184,6 +173,106 @@ int main(int argc, char *argv[])
}
}
 
+   /* For the cgroup attaching logic to work in conjunction with pid and user namespaces,
+* we need to have the following hierarchy:
+*
+* lxc-attach [process executed externally]
+* | socketpair(cgroup_ipc_sockets)
+* | fork()            -> child
+* |                      | setns()
+* |                      | fork()            -> grandchild
+* |                      |                      | initialize
+* |                      |                      | signal parent
+* |                      |<---------------------+
+* |                      | signal parent        |
+* |<---------------------+                      |
+* | add to cgroups       |                      |
+* | signal child ------->|                      |
+* |                      | signal child ------->|
+* | waitpid()            | waitpid()            | exec()
+* |                      |<---------------------| exit()
+* |<---------------------| exit()
+* | exit()
+*
+* The rationale is the following: The first parent is needed because after
+* setns() (mount + user namespace) we can't access the cgroup filesystem
+* to add the pid to the corresponding cgroup. Therefore, we need to do that
+* in a process executed on the host, so that's why we need to fork and wait
+* for it to have done some initialization (cgroups may restrict certain
+* operations so we have to do that in the end) and use IPC for signaling.
+*
+* Then in the child process we do the setns(). However, a process is never
+* really attached to a pid namespace (never changes its pid, doesn't appear
+* in the pid namespace /proc); only child processes of that process are
+* truly inside the new pid namespace. That's why we need to fork() again
+* after setns() before performing final initializations, then signal our
+* parent, which signals the primary process, which does cgroup adding,
+* which then signals to the grandchild that it can exec().
+*/
+   ret = socketpair(PF_LOCAL, SOCK_STREAM, 0, cgroup_ipc_sockets);
+   if (ret < 0) {
+   SYSERROR("could not set up required IPC mechanism for attaching");
+   return -1;
+   }
+
+   pid = fork();
+   if (pid < 0) {
+   SYSERROR("failed to create first subprocess");
+   return -1;
+   }
+
+   if (pid) {
+

Re: [lxc-devel] [PATCH 2/2] lxc_attach: Clean up cgroup attaching code

2013-03-04 Thread Christian Seiler
Hi Serge,

> (Note - no signed-off-by in this patch.  How are you generating them?
> I'd recommend either using git-send-email, or get format-patch...)

Oh, I didn't know git format-patch had a --signoff option; I always
added the line manually when committing, and this time I just forgot it.
;-)

> Thanks, Christian.  Unfortunately this will clash badly with my cgroup
> update which does the same thing, so while I 100% ack the concept,
> Stéphane please do not apply this.

OK, I didn't know you were working on that. By the way, I'll be posting a
few other patches w.r.t. attach soon, but they shouldn't touch
cgroup.[ch], so they will probably apply cleanly regardless.

- Christian

--
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_feb
___
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel


[lxc-devel] [PATCH 1/3] lxc-attach: Default to /bin/sh if shell cannot be determined or exec'd

2013-03-04 Thread Christian Seiler
If the NSS implementation of the host and the container is
incompatible, getpwuid() will fail and the shell of the user in the
container cannot be determined. In that case, don't simply fail, but
rather default to /bin/sh. Since this code path is only executed when
attaching to a container without a command argument, this makes the
default behavior of lxc-attach a lot more robust.

Signed-off-by: Christian Seiler 
---
 src/lxc/lxc_attach.c |   22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/src/lxc/lxc_attach.c b/src/lxc/lxc_attach.c
index 1f60266..292b5b5 100644
--- a/src/lxc/lxc_attach.c
+++ b/src/lxc/lxc_attach.c
@@ -438,15 +438,26 @@ int main(int argc, char *argv[])
uid = getuid();
 
passwd = getpwuid(uid);
-   if (!passwd) {
-   SYSERROR("failed to get passwd "\
-"entry for uid '%d'", uid);
-   return -1;
+
+   if (passwd) {
+   char *const args[] = {
+   passwd->pw_shell,
+   NULL,
+   };
+
+   execvp(args[0], args);
}
 
+   /* executed if either no passwd entry or execvp fails,
+* we will fall back on /bin/sh as a default shell
+*
+* this will make lxc-attach work better out of the box,
+* esp. when attaching to a container that has an
+* incompatible nss implementation
+*/
{
char *const args[] = {
-   passwd->pw_shell,
+   "/bin/sh",
NULL,
};
 
@@ -454,7 +465,6 @@ int main(int argc, char *argv[])
SYSERROR("failed to exec '%s'", args[0]);
return -1;
}
-
}
 
return 0;
-- 
1.7.10.4

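The decision this patch encodes can be sketched as a small helper. This is a swapped-in variant, not the patch's code: it checks access(X_OK) up front, whereas the patch simply calls execvp() and falls through to /bin/sh if that returns. The patch's try-exec-then-fall-back structure remains the more robust one, since a shell can pass access() and still fail to exec. The name choose_shell() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pwd.h>
#include <stddef.h>
#include <unistd.h>

/* the fallback used when no usable login shell is known */
static const char fallback_shell[] = "/bin/sh";

/* Hypothetical helper: return the login shell from a passwd entry if it
 * names an executable file, otherwise the /bin/sh fallback. `pw` may be
 * NULL, e.g. when getpwuid() failed because of an NSS mismatch. */
static const char *choose_shell(const struct passwd *pw)
{
	if (pw && pw->pw_shell && pw->pw_shell[0] != '\0' &&
	    access(pw->pw_shell, X_OK) == 0)
		return pw->pw_shell;
	return fallback_shell;
}
```

Returning a pointer to a single static array (rather than a fresh literal) lets a caller compare the result against fallback_shell to tell which branch was taken, e.g. to log that the fallback was used.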



[lxc-devel] [PATCH 3/3] lxc-attach: Allow the user to request uid/gid when attaching

2013-03-04 Thread Christian Seiler
This patch implements the -u and -g options for lxc-attach, which allow
the user to ask for a specific user and group id when attaching to a
container.

NOTE: DO NOT APPLY THIS PATCH JUST YET, THERE ARE SECURITY IMPLICATIONS
THAT HAVE TO BE CONSIDERED BEFORE DOING SO. THIS IS JUST A DRAFT.
---
 src/lxc/lxc_attach.c |   52 +-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/src/lxc/lxc_attach.c b/src/lxc/lxc_attach.c
index 6095b54..d39f5db 100644
--- a/src/lxc/lxc_attach.c
+++ b/src/lxc/lxc_attach.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -55,6 +56,8 @@ static const struct option my_longopts[] = {
{"arch", required_argument, 0, 'a'},
{"namespaces", required_argument, 0, 's'},
{"remount-sys-proc", no_argument, 0, 'R'},
+   {"uid", required_argument, 0, 'u'},
+   {"gid", required_argument, 0, 'g'},
LXC_COMMON_OPTIONS
 };
 
@@ -62,10 +65,13 @@ static int elevated_privileges = 0;
 static signed long new_personality = -1;
 static int namespace_flags = -1;
 static int remount_sys_proc = 0;
+static long requested_uid = -1;
+static long requested_gid = -1;
 
 static int my_parser(struct lxc_arguments* args, int c, char* arg)
 {
int ret;
+   char *endptr;
 
switch (c) {
case 'e': elevated_privileges = 1; break;
@@ -85,6 +91,24 @@ static int my_parser(struct lxc_arguments* args, int c, char* arg)
/* -s implies -e */
elevated_privileges = 1;
break;
+   case 'u':
+   endptr = NULL;
+   requested_uid = strtol(arg, &endptr, 10);
+   if (requested_uid < 0 || requested_uid == LONG_MAX ||
+   !endptr || *endptr || !*arg) {
+   lxc_error(args, "invalid user id specified: %s", arg);
+   return -1;
+   }
+   break;
+   case 'g':
+   endptr = NULL;
+   requested_gid = strtol(arg, &endptr, 10);
+   if (requested_gid < 0 || requested_gid == LONG_MAX ||
+   !endptr || *endptr || !*arg) {
+   lxc_error(args, "invalid group id specified: %s", arg);
+   return -1;
+   }
+   break;
}
 
return 0;
@@ -116,7 +140,10 @@ Options :\n\
 Remount /sys and /proc if not attaching to the\n\
 mount namespace when using -s in order to properly\n\
 reflect the correct namespace context. See the\n\
-lxc-attach(1) manual page for details.\n",
+lxc-attach(1) manual page for details.\n\
+  -u, --uid=UID setuid(UID) when entering the container\n\
+  -g, --gid=GID setgid(GID) when entering the container\n",
+  
.options  = my_longopts,
.parser   = my_parser,
.checker  = NULL,
@@ -425,6 +452,12 @@ int main(int argc, char *argv[])
 */
(void) lxc_attach_get_init_uidgid(&init_uid, &init_gid);
 
+   /* if the user wished for different credentials, use them */
+   if (requested_uid != -1)
+   init_uid = (uid_t) requested_uid;
+   if (requested_gid != -1)
+   init_gid = (gid_t) requested_gid;
+
/* try to set the uid/gid combination */
if (setgid(init_gid)) {
SYSERROR("switching to container gid");
@@ -434,6 +467,23 @@ int main(int argc, char *argv[])
SYSERROR("switching to container uid");
return -1;
}
+   } else {
+   /* by default, with no user namespaces, we don't need
+* setgid()/setuid(), but we should use them if explicitly
+* requested
+*/
+   if (requested_gid != -1) {
+   if (setgid((gid_t) requested_gid)) {
+   SYSERROR("switching to container gid");
+   return -1;
+   }
+   }
+   if (requested_uid != -1) {
+   if (setuid((uid_t) requested_uid)) {
+   SYSERROR("switching to container uid");
+   return -1;
+   }
+   }
}
 
if (my_args.argc) {
-- 
1.7.10.4

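The -u/-g validation in this patch can be factored into one helper. A sketch under stated assumptions: it uses the conventional errno == ERANGE test for overflow instead of comparing against LONG_MAX (which would also reject a literal LONG_MAX typed by the user), and it additionally rejects (uid_t)-1, since that value means "leave unchanged" for setreuid() and friends and would collide with the patch's -1 "not requested" sentinel after the cast. The name parse_id_arg() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper for -u/-g: parse a non-negative decimal id.
 * Returns 0 and stores the value in *out on success, -1 otherwise
 * (*out is left untouched on failure). */
static int parse_id_arg(const char *arg, long *out)
{
	char *endptr = NULL;
	long value;

	if (!arg || *arg == '\0')
		return -1;

	errno = 0;
	value = strtol(arg, &endptr, 10);
	if (errno == ERANGE || endptr == arg || *endptr != '\0' || value < 0)
		return -1;

	/* (uid_t)-1 means "leave unchanged" for setreuid() and friends,
	 * so reject it along with anything that does not fit in uid_t */
	if ((unsigned long) value >= (unsigned long) (uid_t) -1)
		return -1;

	*out = value;
	return 0;
}
```

In my_parser() this would replace both strtol() blocks, e.g. `if (parse_id_arg(arg, &requested_uid) < 0) { lxc_error(...); return -1; }`.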


[lxc-devel] [PATCH 0/3] lxc-attach: Additional improvements

2013-03-04 Thread Christian Seiler
Hi,

I've attached three additional patches for possible improvements to
lxc-attach.

I think the first two should be applied directly; they do the
following:

  1) Create a sane fallback to /bin/sh if it is impossible to detect
 the container's shell because of incompatible nss implementations
 between host and container

  2) Detect the user & group id of PID 1 and use that for lxc-attach
 instead of root, when attaching to user namespaces.

I'm not really sure about the security implications of the third patch,
so I'm sending it as a draft; somebody who knows more about the
specifics should look it over.

  3) Add -u and -g options to lxc-attach to allow the user to specify
 user and group ids to setuid()/setgid() to when attaching.

 This feature could be really useful. On the other hand, I have
 only ever used lxc running as root (never tried lxc-setcap), so I
 have no idea whether this could pose a security problem. (When
 running as root, you have all the rights anyway, so then it's
 fine.) I'd like some feedback on this before I feel comfortable
 signing off on adding these options.

 Now, if somebody tells me that attach is only possible as root
 anyway so far, then I don't have any qualms, but I'd rather be
 safe than sorry.

-- Christian




[lxc-devel] [PATCH 2/3] lxc-attach: User namespaces: Use init's user & group id when attaching

2013-03-04 Thread Christian Seiler
When attaching to a container with a user namespace, try to detect the
user and group ids of init via /proc and attach as that same user. Only
if that is unsuccessful, fall back to (0, 0).

Signed-off-by: Christian Seiler 
---
 src/lxc/attach.c |   53 ++
 src/lxc/attach.h |2 ++
 src/lxc/lxc_attach.c |   15 ++
 3 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/src/lxc/attach.c b/src/lxc/attach.c
index af3d7a0..7845dda 100644
--- a/src/lxc/attach.c
+++ b/src/lxc/attach.c
@@ -275,3 +275,56 @@ int lxc_attach_drop_privs(struct lxc_proc_context_info *ctx)
 
return 0;
 }
+
+int lxc_attach_get_init_uidgid(uid_t* init_uid, gid_t* init_gid)
+{
+   FILE *proc_file;
+   char proc_fn[MAXPATHLEN];
+   char *line = NULL;
+   size_t line_bufsz = 0;
+   int ret;
+   long value = -1;
+   uid_t uid = (uid_t)-1;
+   gid_t gid = (gid_t)-1;
+
+   /* read init's credentials from /proc/1/status */
+   snprintf(proc_fn, MAXPATHLEN, "/proc/%d/status", 1);
+
+   proc_file = fopen(proc_fn, "r");
+   if (!proc_file)
+   return -1;
+
+   while (getline(&line, &line_bufsz, proc_file) != -1) {
+   /* format is: real, effective, saved set user, fs
+* we only care about real uid
+*/
+   ret = sscanf(line, "Uid: %ld", &value);
+   if (ret != EOF && ret > 0) {
+   uid = (uid_t) value;
+   } else {
+   ret = sscanf(line, "Gid: %ld", &value);
+   if (ret != EOF && ret > 0)
+   gid = (gid_t) value;
+   }
+   if (uid != (uid_t)-1 && gid != (gid_t)-1)
+   break;
+   }
+
+   fclose(proc_file);
+   free(line);
+
+   /* only override arguments if we found something */
+   if (uid != (uid_t)-1)
+   *init_uid = uid;
+   if (gid != (gid_t)-1)
+   *init_gid = gid;
+
+   /* TODO: we should also parse supplementary groups and use
+* setgroups() to set them */
+
+   /* at least some entries were not found, we return error */
+   if (uid == (uid_t)-1 || gid == (gid_t)-1)
+   return -1;
+
+   return 0;
+}
diff --git a/src/lxc/attach.h b/src/lxc/attach.h
index 4d4f719..fc630e2 100644
--- a/src/lxc/attach.h
+++ b/src/lxc/attach.h
@@ -38,4 +38,6 @@ extern int lxc_attach_to_ns(pid_t other_pid, int which);
 extern int lxc_attach_remount_sys_proc();
 extern int lxc_attach_drop_privs(struct lxc_proc_context_info *ctx);
 
+extern int lxc_attach_get_init_uidgid(uid_t* init_uid, gid_t* init_gid);
+
 #endif
diff --git a/src/lxc/lxc_attach.c b/src/lxc/lxc_attach.c
index 292b5b5..6095b54 100644
--- a/src/lxc/lxc_attach.c
+++ b/src/lxc/lxc_attach.c
@@ -417,13 +417,20 @@ int main(int argc, char *argv[])
lxc_sync_fini(handler);
 
if (namespace_flags & CLONE_NEWUSER) {
-   /* XXX FIXME this should get the uid of the container init and setuid to that */
-   /* XXX FIXME or perhaps try to map in the lxc-attach caller's uid? */
-   if (setgid(0)) {
+   uid_t init_uid = 0;
+   gid_t init_gid = 0;
+
+   /* ignore errors, we will fall back to root in that case
+* (/proc could be not mounted etc.)
+*/
+   (void) lxc_attach_get_init_uidgid(&init_uid, &init_gid);
+
+   /* try to set the uid/gid combination */
+   if (setgid(init_gid)) {
SYSERROR("switching to container gid");
return -1;
}
-   if (setuid(0)) {
+   if (setuid(init_uid)) {
SYSERROR("switching to container uid");
return -1;
}
-- 
1.7.10.4

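The /proc/1/status parsing in lxc_attach_get_init_uidgid() can be separated from the file handling so it is testable without a container. A sketch, assuming the usual layout where `Uid:` and `Gid:` are each followed by four whitespace-separated ids (real, effective, saved, fs) and taking only the first (real) value, as the patch does. It reads from any FILE *, so a test can feed it a string via fmemopen(). The name parse_status_ids() is hypothetical; like the patch, it only writes *uid and *gid when the matching line was found, so the caller's (0, 0) defaults survive.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: scan a /proc/<pid>/status-style stream and take
 * the first (real) id from the "Uid:" and "Gid:" lines. *uid and *gid
 * are only written when the matching line was found. Returns 0 if both
 * were found, -1 otherwise. */
static int parse_status_ids(FILE *f, uid_t *uid, gid_t *gid)
{
	char *line = NULL;
	size_t cap = 0;
	unsigned long value;
	int have_uid = 0, have_gid = 0;

	while ((!have_uid || !have_gid) && getline(&line, &cap, f) != -1) {
		/* "Uid: %lu" matches the literal tag, any whitespace, then a
		 * number; sscanf() returns 1 only if the number was read */
		if (!have_uid && sscanf(line, "Uid: %lu", &value) == 1) {
			*uid = (uid_t) value;
			have_uid = 1;
		} else if (!have_gid && sscanf(line, "Gid: %lu", &value) == 1) {
			*gid = (gid_t) value;
			have_gid = 1;
		}
	}
	free(line);
	return (have_uid && have_gid) ? 0 : -1;
}
```

In lxc-attach terms, the caller would fopen("/proc/1/status", "r") and pass the stream in, keeping its defaults when this returns -1.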



[lxc-devel] lxc-attach: NSS handling

2013-03-04 Thread Christian Seiler
Hi there,

I've run into the problem multiple times now that lxc-attach can't
detect the default shell of my current user properly, since the NSS
implementations of host and container are incompatible.

One of the patches I just sent to the list mitigates that by falling
back to /bin/sh. The only trouble is that a modern shell invoked as
/bin/sh will usually not provide a very user-friendly interface.

So my idea would be to introduce an additional fallback: glibc ships
the getent(1) utility, which allows one to query NSS directly. If
getpwuid() fails, lxc-attach could spawn "getent passwd %d" and parse
the output to figure out the user's correct login shell. That will not
work in all cases either, but we can still fall back on /bin/sh as a
last resort.

Do you think implementing that is worthwhile?

-- Christian

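Spawning getent(1) and reading its output boils down to the usual pipe() + fork() + execvp() + waitpid() pattern. A generic sketch with a hypothetical helper name, capture_first_line(), that returns the first line a command prints and treats a non-zero exit status as failure; for the idea above it would be called with an argv of "getent", "passwd", the uid as a string, and NULL. It drains the remaining output before waiting, so the child cannot block on a full pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (searched in $PATH) with stdout on a
 * pipe and return a malloc'd copy of its first output line without the
 * newline. Returns NULL on any error or a non-zero exit status. */
static char *capture_first_line(char *const argv[])
{
	int pipes[2];
	pid_t pid;
	FILE *f;
	char *line = NULL, *result = NULL;
	size_t cap = 0;
	int status;

	if (pipe(pipes) < 0)
		return NULL;

	pid = fork();
	if (pid < 0) {
		close(pipes[0]);
		close(pipes[1]);
		return NULL;
	}

	if (pid == 0) {
		/* child: send stdout into the pipe, then run the command */
		close(pipes[0]);
		if (dup2(pipes[1], STDOUT_FILENO) < 0)
			_exit(127);
		close(pipes[1]);
		execvp(argv[0], argv);
		_exit(127); /* only reached if execvp() failed */
	}

	/* parent: keep the first line, discard the rest */
	close(pipes[1]);
	f = fdopen(pipes[0], "r");
	if (f) {
		if (getline(&line, &cap, f) != -1) {
			line[strcspn(line, "\n")] = '\0';
			result = line;
			line = NULL;
			cap = 0;
		}
		while (getline(&line, &cap, f) != -1)
			; /* drain so the child never blocks on a full pipe */
		free(line);
		fclose(f);
	} else {
		close(pipes[0]);
	}

	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR) {
			free(result);
			return NULL;
		}
	}
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
		free(result);
		return NULL;
	}
	return result;
}
```

Checking the exit status matters here: getent exits non-zero when the key is not found, so a missing entry is reported as NULL rather than as an empty shell.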


[lxc-devel] Fwd: Re: lxc-attach: NSS handling

2013-03-04 Thread Christian Seiler
Sorry, I forgot to post to the list...

-------- Original Message --------
Subject: Re: [lxc-devel] lxc-attach: NSS handling
Date: Tue, 05 Mar 2013 00:01:55 +0100
From: Christian Seiler 
To: Serge Hallyn 

Hi Serge,

> So if you resend the patchset, I'd suggest this patch first, the
> /bin/sh as default one second, setuids ones next...

I've implemented the use of 'getent', which now makes my life a LOT
easier (I have quite a few containers lying around with incompatible nss
versions) and then manually rebased the previous patches.

> (Btw, do you have a github tree?  Reviewing/acking patches is easier
> on the list, but for actually pushing patches to staging, going from
> github tree is much nicer)

I've pushed my patches to:
https://github.com/chris-se/lxc/tree/attach-fixes-1

I've excluded the -u/-g patch for now (I realized that I should probably
include a man page update anyway), but the rest is in there.

Do you want me to send a pull request?

Regards,
Christian



[lxc-devel] [PATCH 2/3] lxc-attach: Default to /bin/sh if shell cannot be determined or exec'd

2013-03-05 Thread Christian Seiler
If getpwuid() fails, the fallback of spawning a 'getent' process also
fails, and the user specified no command to execute, default to /bin/sh
and only fail if even that is not available. This should ensure that,
unless the container is *really* weird, the user always ends up with a
shell when calling lxc-attach with no further arguments.
---
 src/lxc/lxc_attach.c |   16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/src/lxc/lxc_attach.c b/src/lxc/lxc_attach.c
index d84c3d8..9c86ffe 100644
--- a/src/lxc/lxc_attach.c
+++ b/src/lxc/lxc_attach.c
@@ -449,15 +449,21 @@ int main(int argc, char *argv[])
if (!passwd)
passwd = lxc_attach_getpwuid(uid);
 
-   if (!passwd) {
-   SYSERROR("failed to get passwd "\
-"entry for uid '%d'", uid);
-   return -1;
+   if (passwd) {
+   char *const args[] = {
+   passwd->pw_shell,
+   NULL,
+   };
+
+   (void) execvp(args[0], args);
}
 
+   /* executed if either no passwd entry or execvp fails,
+* we will fall back on /bin/sh as a default shell
+*/
{
char *const args[] = {
-   passwd->pw_shell,
+   "/bin/sh",
NULL,
};
 
-- 
1.7.10.4


--
Symantec Endpoint Protection 12 positioned as A LEADER in The Forrester  
Wave(TM): Endpoint Security, Q1 2013 and "remains a good choice" in the  
endpoint security space. For insight on selecting the right partner to 
tackle endpoint security challenges, access the full report. 
http://p.sf.net/sfu/symantec-dev2dev
___
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel


[lxc-devel] [PATCH 1/3] lxc-attach: Try really hard to determine login shell

2013-03-05 Thread Christian Seiler
If no command is specified, and using getpwuid() to determine the login
shell fails, try to spawn a process that executes the utility 'getent'.
getpwuid() may fail because of incompatibilities between the NSS
implementations on the host and in the container.

Signed-off-by: Christian Seiler 
---
 src/lxc/attach.c |  204 ++
 src/lxc/attach.h |3 +
 src/lxc/lxc_attach.c |   11 +++
 3 files changed, 218 insertions(+)

diff --git a/src/lxc/attach.c b/src/lxc/attach.c
index af3d7a0..88356e1 100644
--- a/src/lxc/attach.c
+++ b/src/lxc/attach.c
@@ -32,7 +32,9 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
 
 #if !HAVE_DECL_PR_CAPBSET_DROP
 #define PR_CAPBSET_DROP 24
@@ -275,3 +277,205 @@ int lxc_attach_drop_privs(struct lxc_proc_context_info *ctx)
 
return 0;
 }
+
+struct passwd *lxc_attach_getpwuid(uid_t uid)
+{
+   /* static variables for result, we assume that
+* 256 is large enough to hold the information
+* on username, passwd and gecos. for paths we
+* use MAXPATHLEN instead
+*/
+   static char pwstruct_name[256];
+   static char pwstruct_passwd[256];
+   static char pwstruct_gecos[256];
+   static char pwstruct_dir[MAXPATHLEN];
+   static char pwstruct_shell[MAXPATHLEN];
+   static struct passwd result = {
+   .pw_name = pwstruct_name,
+   .pw_passwd = pwstruct_passwd,
+   .pw_uid = (uid_t) -1,
+   .pw_gid = (gid_t) -1,
+   .pw_gecos = pwstruct_gecos,
+   .pw_dir = pwstruct_dir,
+   .pw_shell = pwstruct_shell,
+   };
+
+   /* local variables */
+   pid_t pid;
+   int pipes[2];
+   int ret;
+   int fd;
+
+   /* we need to fork off a process that runs the
+* getent program, and we need to capture its
+* output, so we use a pipe for that purpose
+*/
+   ret = pipe(pipes);
+   if (ret < 0)
+   return NULL;
+
+   pid = fork();
+   if (pid < 0) {
+   close(pipes[0]);
+   close(pipes[1]);
+   return NULL;
+   }
+
+   if (pid) {
+   /* parent process */
+   FILE *pipe_f;
+   char *line = NULL;
+   size_t line_bufsz = 0;
+   int found = 0;
+   int status;
+
+   close(pipes[1]);
+
+   pipe_f = fdopen(pipes[0], "r");
+   while (getline(&line, &line_bufsz, pipe_f) != -1) {
+   char *token;
+   char *saveptr = NULL;
+   long value;
+   char *endptr = NULL;
+   int i;
+
+   /* if we already found something, just continue
+* to read until the pipe doesn't deliver any more
+* data, but don't modify the existing data
+* structure
+*/
+   if (found)
+   continue;
+
+   /* trim line on the right hand side */
+   for (i = strlen(line); line && i > 0 && (line[i - 1] == '\n' || line[i - 1] == '\r'); --i)
+   line[i - 1] = '\0';
+
+   /* split into tokens: first user name */
+   token = strtok_r(line, ":", &saveptr);
+   if (!token)
+   continue;
+   snprintf(pwstruct_name, sizeof(pwstruct_name), "%s", token);
+
+   /* next: dummy password field */
+   token = strtok_r(NULL, ":", &saveptr);
+   if (!token)
+   continue;
+   snprintf(pwstruct_passwd, sizeof(pwstruct_passwd), "%s", token);
+
+   /* next: user id */
+   token = strtok_r(NULL, ":", &saveptr);
+   value = token ? strtol(token, &endptr, 10) : 0;
+   if (!token || !endptr || *endptr || value == LONG_MIN || value == LONG_MAX)
+   continue;
+   result.pw_uid = (uid_t) value;
+   /* dummy sanity check: user id matches */
+   if (result.pw_uid != uid)
+   continue;
+
+   /* next: gid */
+   token = strtok_r(NULL, ":", &saveptr);
+   value = token ? strtol(token, &endptr, 10) : 0;
+   if (!token || !endptr || *endptr || value == LONG_MIN || value == LONG_MAX)
+   continue;
+

[lxc-devel] [PATCH 3/3] lxc-attach: User namespaces: Use init's user & group id when attaching

2013-03-05 Thread Christian Seiler
When attaching to a container with a user namespace, try to detect the
user and group ids of init via /proc and attach as that same user. Only
if that is unsuccessful, fall back to (0, 0).
---
 src/lxc/attach.c |   53 ++
 src/lxc/attach.h |2 ++
 src/lxc/lxc_attach.c |   15 ++
 3 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/src/lxc/attach.c b/src/lxc/attach.c
index 88356e1..399ec36 100644
--- a/src/lxc/attach.c
+++ b/src/lxc/attach.c
@@ -479,3 +479,56 @@ struct passwd *lxc_attach_getpwuid(uid_t uid)
exit(-1);
}
 }
+
+int lxc_attach_get_init_uidgid(uid_t* init_uid, gid_t* init_gid)
+{
+   FILE *proc_file;
+   char proc_fn[MAXPATHLEN];
+   char *line = NULL;
+   size_t line_bufsz = 0;
+   int ret;
+   long value = -1;
+   uid_t uid = (uid_t)-1;
+   gid_t gid = (gid_t)-1;
+
+   /* read init's credentials from /proc/1/status */
+   snprintf(proc_fn, MAXPATHLEN, "/proc/%d/status", 1);
+
+   proc_file = fopen(proc_fn, "r");
+   if (!proc_file)
+   return -1;
+
+   while (getline(&line, &line_bufsz, proc_file) != -1) {
+   /* format is: real, effective, saved set user, fs
+* we only care about real uid
+*/
+   ret = sscanf(line, "Uid: %ld", &value);
+   if (ret != EOF && ret > 0) {
+   uid = (uid_t) value;
+   } else {
+   ret = sscanf(line, "Gid: %ld", &value);
+   if (ret != EOF && ret > 0)
+   gid = (gid_t) value;
+   }
+   if (uid != (uid_t)-1 && gid != (gid_t)-1)
+   break;
+   }
+
+   fclose(proc_file);
+   free(line);
+
+   /* only override arguments if we found something */
+   if (uid != (uid_t)-1)
+   *init_uid = uid;
+   if (gid != (gid_t)-1)
+   *init_gid = gid;
+
+   /* TODO: we should also parse supplementary groups and use
+* setgroups() to set them */
+
+   /* at least some entries were not found, we return error */
+   if (uid == (uid_t)-1 || gid == (gid_t)-1)
+   return -1;
+
+   return 0;
+}
diff --git a/src/lxc/attach.h b/src/lxc/attach.h
index 90e693a..1e9b87e 100644
--- a/src/lxc/attach.h
+++ b/src/lxc/attach.h
@@ -41,4 +41,6 @@ extern int lxc_attach_drop_privs(struct lxc_proc_context_info *ctx);
 struct passwd;
 extern struct passwd *lxc_attach_getpwuid(uid_t uid);
 
+extern int lxc_attach_get_init_uidgid(uid_t* init_uid, gid_t* init_gid);
+
 #endif
diff --git a/src/lxc/lxc_attach.c b/src/lxc/lxc_attach.c
index 9c86ffe..cdc1601 100644
--- a/src/lxc/lxc_attach.c
+++ b/src/lxc/lxc_attach.c
@@ -417,13 +417,20 @@ int main(int argc, char *argv[])
lxc_sync_fini(handler);
 
if (namespace_flags & CLONE_NEWUSER) {
-   /* XXX FIXME this should get the uid of the container init and setuid to that */
-   /* XXX FIXME or perhaps try to map in the lxc-attach caller's uid? */
-   if (setgid(0)) {
+   uid_t init_uid = 0;
+   gid_t init_gid = 0;
+
+   /* ignore errors, we will fall back to root in that case
+* (/proc was not mounted etc.)
+*/
+   (void) lxc_attach_get_init_uidgid(&init_uid, &init_gid);
+
+   /* try to set the uid/gid combination */
+   if (setgid(init_gid)) {
SYSERROR("switching to container gid");
return -1;
}
-   if (setuid(0)) {
+   if (setuid(init_uid)) {
SYSERROR("switching to container uid");
return -1;
}
-- 
1.7.10.4




[lxc-devel] [PATCH 0/3] lxc-attach improvements

2013-03-05 Thread Christian Seiler
Hi Serge,

here are my patches as emails to the lxc-devel list. The first patch
implements the getent(1) logic for trying much harder to determine the
correct login shell of the requested user (but only if getpwuid(3)
fails), the second patch uses /bin/sh as a fallback if even that fails
and the third patch tries to detect the user & group id of init when
attaching to a user namespace.

The patches can be found in the attach-fixes-1 branch over at github


-- Christian




Re: [lxc-devel] [PATCH 3/3] lxc-attach: User namespaces: Use init's user & group id when attaching

2013-03-06 Thread Christian Seiler
Hi Serge,

> But also...  you don't actually re-try with init_gid/init_uid of 0.
> If lxc_attach_get_init_uidgid() set one of those to -1, then you'll
> just fail here.

No, because lxc_attach_get_init_uidgid() doesn't modify them in that
case; see the code below the comment "only override arguments if we
found something".

But yes, I can make it return void instead if you like. I just thought
that if somebody else wanted to use it for some other purpose later on,
it would be nice to provide some feedback.

-- Christian



Re: [lxc-devel] [PATCH 1/3] lxc-attach: Try really hard to determine login shell

2013-03-06 Thread Christian Seiler
Hi Serge,

> Actually, I think it would be better to have lxc_attach_getpwuid()
> become lxc_attach_getpwshell(), and change the caller a bit.
> Would shorten up the code quite a bit.  What do you think?

Ok, will do.

-- Christian



[lxc-devel] [PATCH v2 0/3] lxc-attach improvements

2013-03-06 Thread Christian Seiler
Hi Serge,

here are the updated versions of my patches, which implement the changes
you requested. (I also have to resend the second one because of minor
code changes in the first patch.)

They can be found in the branch attach-fixes-1-v2 at github,


-- Christian




[lxc-devel] [PATCH 1/3] lxc-attach: Try really hard to determine login shell

2013-03-06 Thread Christian Seiler
If no command is specified, and using getpwuid() to determine the login
shell fails, try to spawn a process that executes the utility 'getent'.
getpwuid() may fail because of incompatibilities between the NSS
implementations on the host and in the container.

Signed-off-by: Christian Seiler 
---
 src/lxc/attach.c |  154 ++
 src/lxc/attach.h |2 +
 src/lxc/lxc_attach.c |   18 +-
 3 files changed, 172 insertions(+), 2 deletions(-)

diff --git a/src/lxc/attach.c b/src/lxc/attach.c
index af3d7a0..d1b3b0a 100644
--- a/src/lxc/attach.c
+++ b/src/lxc/attach.c
@@ -32,7 +32,9 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
 
 #if !HAVE_DECL_PR_CAPBSET_DROP
 #define PR_CAPBSET_DROP 24
@@ -275,3 +277,155 @@ int lxc_attach_drop_privs(struct lxc_proc_context_info *ctx)
 
return 0;
 }
+
+char *lxc_attach_getpwshell(uid_t uid)
+{
+   /* local variables */
+   pid_t pid;
+   int pipes[2];
+   int ret;
+   int fd;
+   char *result = NULL;
+
+   /* we need to fork off a process that runs the
+* getent program, and we need to capture its
+* output, so we use a pipe for that purpose
+*/
+   ret = pipe(pipes);
+   if (ret < 0)
+   return NULL;
+
+   pid = fork();
+   if (pid < 0) {
+   close(pipes[0]);
+   close(pipes[1]);
+   return NULL;
+   }
+
+   if (pid) {
+   /* parent process */
+   FILE *pipe_f;
+   char *line = NULL;
+   size_t line_bufsz = 0;
+   int found = 0;
+   int status;
+
+   close(pipes[1]);
+
+   pipe_f = fdopen(pipes[0], "r");
+   while (getline(&line, &line_bufsz, pipe_f) != -1) {
+   char *token;
+   char *saveptr = NULL;
+   long value;
+   char *endptr = NULL;
+   int i;
+
+   /* if we already found something, just continue
+* to read until the pipe doesn't deliver any more
+* data, but don't modify the existing data
+* structure
+*/
+   if (found)
+   continue;
+
+   /* trim line on the right hand side */
+   for (i = strlen(line); line && i > 0 && (line[i - 1] == '\n' || line[i - 1] == '\r'); --i)
+   line[i - 1] = '\0';
+
+   /* split into tokens: first user name */
+   token = strtok_r(line, ":", &saveptr);
+   if (!token)
+   continue;
+   /* next: dummy password field */
+   token = strtok_r(NULL, ":", &saveptr);
+   if (!token)
+   continue;
+   /* next: user id */
+   token = strtok_r(NULL, ":", &saveptr);
+   value = token ? strtol(token, &endptr, 10) : 0;
+   if (!token || !endptr || *endptr || value == LONG_MIN || value == LONG_MAX)
+   continue;
+   /* dummy sanity check: user id matches */
+   if ((uid_t) value != uid)
+   continue;
+   /* skip fields: gid, gecos, dir, go to next field 'shell' */
+   for (i = 0; i < 4; i++) {
+   token = strtok_r(NULL, ":", &saveptr);
+   if (!token)
+   break;
+   }
+   if (!token)
+   continue;
+   /* sanity check that there are no fields after that */
+   if (strtok_r(NULL, ":", &saveptr))
+   continue;
+
+   result = strdup(token);
+
+   found = 1;
+   }
+
+   free(line);
+   fclose(pipe_f);
+   again:
+   if (waitpid(pid, &status, 0) < 0) {
+   if (errno == EINTR)
+   goto again;
+   return NULL;
+   }
+
+   /* some sanity checks: if anything even hinted at going
+* wrong: we can't be sure we have a valid result, so
+* we assume we don't
+*/
+
+   if (!WIFEXITED(status))
+   return NULL;
+

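One caveat with the strtok_r() parsing above: strtok_r() treats consecutive delimiters as a single one, so a very common entry with an empty GECOS field, such as root:x:0:0::/root:/bin/bash, yields only six tokens and the shell is never found. A sketch of a parser that splits on every ':' instead, so empty fields keep their position. The name passwd_line_shell() is hypothetical, and it expects a single line of getent passwd output.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: given one line of `getent passwd` output
 * (name:passwd:uid:gid:gecos:dir:shell), return a malloc'd copy of the
 * shell if the line has exactly seven fields and its uid equals `uid`,
 * otherwise NULL. The caller frees the result. */
static char *passwd_line_shell(const char *line, uid_t uid)
{
	char *fields[8];
	char *copy, *cursor;
	char *shell = NULL;
	unsigned long value;
	int count = 0;

	copy = strdup(line);
	if (!copy)
		return NULL;
	copy[strcspn(copy, "\r\n")] = '\0'; /* drop the line terminator */

	/* split on every ':' by hand; unlike strtok_r() this keeps empty
	 * fields such as an empty GECOS in their proper position */
	cursor = copy;
	while (cursor && count < 8) {
		fields[count++] = cursor;
		cursor = strchr(cursor, ':');
		if (cursor)
			*cursor++ = '\0';
	}
	if (count != 7)
		goto out;

	/* the uid must be all digits and match the one we asked for */
	if (fields[2][0] == '\0' ||
	    strspn(fields[2], "0123456789") != strlen(fields[2]))
		goto out;
	errno = 0;
	value = strtoul(fields[2], NULL, 10);
	if (errno == ERANGE || (uid_t) value != uid)
		goto out;

	if (fields[6][0] != '\0')
		shell = strdup(fields[6]);
out:
	free(copy);
	return shell;
}
```

Working on a private copy keeps the caller's getline() buffer intact, and the eight-slot array is what makes a line with too many fields detectable.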
[lxc-devel] [PATCH 3/3] lxc-attach: User namespaces: Use init's user & group id when attaching

2013-03-06 Thread Christian Seiler
When attaching to a container with a user namespace, try to detect the
user and group ids of init via /proc and attach as that same user. Only
if that is unsuccessful, fall back to (0, 0).

Signed-off-by: Christian Seiler 
---
 src/lxc/attach.c |   47 +++
 src/lxc/attach.h |2 ++
 src/lxc/lxc_attach.c |   15 +++
 3 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/src/lxc/attach.c b/src/lxc/attach.c
index d1b3b0a..c74faf1 100644
--- a/src/lxc/attach.c
+++ b/src/lxc/attach.c
@@ -429,3 +429,50 @@ char *lxc_attach_getpwshell(uid_t uid)
exit(-1);
}
 }
+
+void lxc_attach_get_init_uidgid(uid_t* init_uid, gid_t* init_gid)
+{
+   FILE *proc_file;
+   char proc_fn[MAXPATHLEN];
+   char *line = NULL;
+   size_t line_bufsz = 0;
+   int ret;
+   long value = -1;
+   uid_t uid = (uid_t)-1;
+   gid_t gid = (gid_t)-1;
+
+   /* read init's credentials from /proc/1/status */
+   snprintf(proc_fn, MAXPATHLEN, "/proc/%d/status", 1);
+
+   proc_file = fopen(proc_fn, "r");
+   if (!proc_file)
+   return;
+
+   while (getline(&line, &line_bufsz, proc_file) != -1) {
+   /* format is: real, effective, saved set user, fs
+* we only care about real uid
+*/
+   ret = sscanf(line, "Uid: %ld", &value);
+   if (ret != EOF && ret > 0) {
+   uid = (uid_t) value;
+   } else {
+   ret = sscanf(line, "Gid: %ld", &value);
+   if (ret != EOF && ret > 0)
+   gid = (gid_t) value;
+   }
+   if (uid != (uid_t)-1 && gid != (gid_t)-1)
+   break;
+   }
+
+   fclose(proc_file);
+   free(line);
+
+   /* only override arguments if we found something */
+   if (uid != (uid_t)-1)
+   *init_uid = uid;
+   if (gid != (gid_t)-1)
+   *init_gid = gid;
+
+   /* TODO: we should also parse supplementary groups and use
+* setgroups() to set them */
+}
diff --git a/src/lxc/attach.h b/src/lxc/attach.h
index f448b1e..ef5d87b 100644
--- a/src/lxc/attach.h
+++ b/src/lxc/attach.h
@@ -40,4 +40,6 @@ extern int lxc_attach_drop_privs(struct lxc_proc_context_info *ctx);
 
 extern char *lxc_attach_getpwshell(uid_t uid);
 
+extern void lxc_attach_get_init_uidgid(uid_t* init_uid, gid_t* init_gid);
+
 #endif
diff --git a/src/lxc/lxc_attach.c b/src/lxc/lxc_attach.c
index 711e1de..25cc2d5 100644
--- a/src/lxc/lxc_attach.c
+++ b/src/lxc/lxc_attach.c
@@ -418,13 +418,20 @@ int main(int argc, char *argv[])
 	lxc_sync_fini(handler);
 
 	if (namespace_flags & CLONE_NEWUSER) {
-		/* XXX FIXME this should get the uid of the container init and setuid to that */
-		/* XXX FIXME or perhaps try to map in the lxc-attach caller's uid? */
-		if (setgid(0)) {
+		uid_t init_uid = 0;
+		gid_t init_gid = 0;
+
+		/* ignore errors, we will fall back to root in that case
+		 * (/proc was not mounted etc.)
+		 */
+		lxc_attach_get_init_uidgid(&init_uid, &init_gid);
+
+		/* try to set the uid/gid combination */
+		if (setgid(init_gid)) {
 			SYSERROR("switching to container gid");
 			return -1;
 		}
-		if (setuid(0)) {
+		if (setuid(init_uid)) {
 			SYSERROR("switching to container uid");
 			return -1;
 		}
-- 
1.7.10.4
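A standalone sketch of the /proc parsing that lxc_attach_get_init_uidgid() performs may help when reviewing the patch. It assumes a Linux procfs whose status file has tab-separated "Uid:" and "Gid:" lines (real, effective, saved set, filesystem ids); parse_status_ids() and get_pid_uidgid() are hypothetical names, not LXC functions. Taking a FILE * lets the parser be exercised on a temporary file instead of the live /proc/1/status.

```c
#define _GNU_SOURCE /* getline() on older glibc / strict -std modes */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper: scan a /proc/<pid>/status style stream and store
 * the real uid/gid (first field of the "Uid:"/"Gid:" lines). The output
 * parameters are only written when the matching line is found, so the
 * caller's defaults (e.g. 0, 0) survive a missing or truncated file.
 */
void parse_status_ids(FILE *f, uid_t *out_uid, gid_t *out_gid)
{
	char *line = NULL;
	size_t bufsz = 0;
	long value;
	int have_uid = 0, have_gid = 0;

	while ((!have_uid || !have_gid) && getline(&line, &bufsz, f) != -1) {
		/* "Uid:\t<real>\t<effective>\t<saved>\t<fs>"; only <real> is read */
		if (!have_uid && sscanf(line, "Uid: %ld", &value) == 1) {
			*out_uid = (uid_t)value;
			have_uid = 1;
		} else if (!have_gid && sscanf(line, "Gid: %ld", &value) == 1) {
			*out_gid = (gid_t)value;
			have_gid = 1;
		}
	}
	free(line);
}

/* Hypothetical wrapper: read the ids of an arbitrary pid (1 = init). */
void get_pid_uidgid(pid_t pid, uid_t *uid, gid_t *gid)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return; /* e.g. /proc not mounted: keep the caller's defaults */
	parse_status_ids(f, uid, gid);
	fclose(f);
}
```

The sketch tracks "found" with flags rather than the (uid_t)-1 sentinel used in the patch; the observable behaviour (override only what was found) is the same.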


___
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel
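The TODO at the end of lxc_attach_get_init_uidgid() mentions supplementary groups. One possible way to obtain them is to parse the "Groups:" line of the same status file; the sketch below assumes that line holds whitespace-separated decimal gids, and parse_groups_line() is a hypothetical helper, not an LXC function.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: parse a "Groups:" line from /proc/<pid>/status.
 * Stores at most max gids in groups[] and returns how many were stored,
 * or -1 if the line is not a "Groups:" line at all.
 */
int parse_groups_line(const char *line, gid_t *groups, int max)
{
	const char *prefix = "Groups:";
	const char *p;
	char *end;
	int n = 0;

	if (strncmp(line, prefix, strlen(prefix)) != 0)
		return -1;

	p = line + strlen(prefix);
	while (n < max) {
		long v = strtol(p, &end, 10); /* strtol skips leading whitespace */
		if (end == p)
			break; /* no further number on the line */
		groups[n++] = (gid_t)v;
		p = end;
	}
	return n;
}
```

The resulting array would have to be passed to setgroups() before setgid()/setuid(), since after switching to an unprivileged uid the process normally no longer holds the CAP_SETGID capability that setgroups() requires.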


[lxc-devel] [PATCH 2/3] lxc-attach: Default to /bin/sh if shell cannot be determined or exec'd

2013-03-06 Thread Christian Seiler
If the user specified no command to execute and the shell cannot be
determined (getpwuid() fails, and so does the fallback of spawning a
'getent' process) or cannot be executed, default to /bin/sh and only
fail if even that is not available. This should ensure that, unless
the container is *really* weird, the user always ends up with a shell
when calling lxc-attach with no further arguments.

Signed-off-by: Christian Seiler 
---
 src/lxc/lxc_attach.c |   16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/src/lxc/lxc_attach.c b/src/lxc/lxc_attach.c
index c129eb0..711e1de 100644
--- a/src/lxc/lxc_attach.c
+++ b/src/lxc/lxc_attach.c
@@ -452,15 +452,21 @@ int main(int argc, char *argv[])
 	else
 		user_shell = passwd->pw_shell;
 
-	if (!user_shell) {
-		SYSERROR("failed to get passwd "\
-			 "entry for uid '%d'", uid);
-		return -1;
+	if (user_shell) {
+		char *const args[] = {
+			user_shell,
+			NULL,
+		};
+
+		(void) execvp(args[0], args);
 	}
 
+	/* executed if either no passwd entry or execvp fails,
+	 * we will fall back on /bin/sh as a default shell
+	 */
 	{
 		char *const args[] = {
-			user_shell,
+			"/bin/sh",
 			NULL,
 		};
 
-- 
1.7.10.4
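The fallback order introduced by this patch (the user's shell first, then /bin/sh, failing only if both exec attempts return) can be sketched in isolation. exec_shell_with_fallback() and spawn_shell_with_fallback() are hypothetical names; the fork()/waitpid() wrapper is not taken from lxc-attach and only exists so the fallback can be exercised without replacing the calling process.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Try user_shell (may be NULL), then /bin/sh. execvp() only returns on
 * failure, so reaching the final return means neither could be executed.
 */
int exec_shell_with_fallback(const char *user_shell)
{
	if (user_shell) {
		char *const args[] = { (char *)user_shell, NULL };
		(void) execvp(args[0], args);
		fprintf(stderr, "failed to exec '%s', falling back to /bin/sh\n",
			user_shell);
	}

	{
		char *const args[] = { "/bin/sh", NULL };
		(void) execvp(args[0], args);
	}

	return -1; /* even /bin/sh could not be executed */
}

/* Run the fallback in a child and return its exit status, or -1. */
int spawn_shell_with_fallback(const char *user_shell)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		exec_shell_with_fallback(user_shell);
		_exit(127); /* conventional "could not run command" status */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With stdin redirected from /dev/null, the spawned /bin/sh reads end-of-file and exits, which makes the fallback path observable from the parent.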



