Re: [lxc-devel] RFC: Device Namespaces

2013-09-29 Thread Amir Goldstein
On Thu, Sep 26, 2013 at 12:47 AM, Eric W. Biederman
wrote:

> Jeremy Andrus  writes:
>
> > On Sep 25, 2013, at 4:23 PM, Eric W. Biederman wrote:
> >
> >> Janne Karhunen  writes:
> >>
> >>> That being said, is there a valid reason why binder is part of device
> >>> namespace here instead of IPC?
> >>
> >> I think the practical issue with binder was simply that binder only
> >> allows for a single instance and thus it does not play nicely with
> >> containers.
> >
> > It's true that there was a singleton paradigm in binder that had to be
> > overcome, but I agree with Janne. It really belongs in the IPC namespace,
> > and I don't see any technical reason not to move it there.
>
> *Blink* I missed the IPC namespace suggestion.
>
> The IPC namespace sounds reasonable.
>

A binder rewrite for the IPC namespace is in the works (by Oren).
We discussed this with Greg, and adding namespace support to binder (in
staging) seemed reasonable to him as well.


> Of course binder is still in staging because it has implementation and
> ABI problems.  Little things like a 64bit kernel and a 32bit userspace
> don't work particularly well.  So while fixing those problems it might
> be possible to fix the single container problem as well.  It would be a
> weird direction for cleanup of binder to come from but I don't see why
> that wouldn't work.
>
> Personally until binder is out of staging it seems reasonable to push
> for an API that sucks less, or for a more general solution that Android
> could use instead of binder.
>
> One of the uses of namespaces is to clean up after problematic kernel
> design decisions.  If we still have the option I would rather fix the
> problems than clean up after them.
>
> Eric
>
> ___
> Containers mailing list
> contain...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers
>
--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60133471&iu=/4140/ostg.clktrk
___
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel


Re: [lxc-devel] RFC: Device Namespaces

2013-09-29 Thread Amir Goldstein
On Wed, Sep 25, 2013 at 11:13 PM, Serge Hallyn wrote:

> Quoting Michael J Coss (michael.c...@alcatel-lucent.com):
> > I've been looking at this problem for some time to help solve my very
> > specific use case.   In our case we are using containers to provide
> > individual "desktops" to a number of users.  We want the desktop to run
> > X, and bind and unbind a display, keyboard, mouse to that X server
> > running in a particular container, and not be able to grab anyone else's
> > keyboard, mouse or display unless granted specific access to that from
> > the owner.  To that end, I started working on a udev solution.  I
> > understand that most containers don't/won't run udev.  And systemd won't
> > even start udev if the container doesn't have the mknod capability which
> > is a kinda odd cookie but I digress.
> >
> > Currently the kernel effectively broadcasts uevents to all network
> > namespaces, and this is an issue.  I don't want container A to see
> > container B's events.  It should see only what the admin has set for the
> > policy for that container.  This policy should be handled on the host
> > for the containers in userspace.  This daemon can get the events, and
> > then forward to the appropriate container(s) those events that are
> > pertinent, and disregard the rest.  To accomplish this, I had to change
> > the broadcast mechanism, and then provide a forwarding mechanism to
> > specific network namespaces.
> >
> > Back in the day, that would have been sufficient.  Udev running in the
> > container would have gotten the add event, and created the appropriate
> > devices and symlinks, and then cleaned up on remove/change events.  With
> > the introduction of devtmpfs, udev no longer actually creates the device
> > nodes.  It just handles links and name changes.   So, I'm still left
> > with needing to create/manage devtmpfs or some other solution.  This
> > leads me down the path of virtualizing devtmpfs.  I've been fooling
> > around with FUSE, to basically mirror the host /dev (filtered
>
> Rather than using FUSE, I'd recommend looking into doing it the same
> way as the devpts fs.  Might not work out (or be rejected) in the end,
> but at first glance it seems the right way to handle it.  So each new
> instance mount starts empty, changes in one are not reflected in
> another, but new devices which the kernel later creates may (subject
> to device cgroup of the process which mounted it?) be created in the
> new instances.
>

I was thinking it makes sense to tie unique instances of the devtmpfs sb
to the userns, if for no other reason than that any mounted sb already
has the knowledge of the userns that mounted it.
But also, I think devtmpfs needs to be userns friendly, so it can safely
get the FS_USERNS_DEV_MOUNT flag.
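
[Editorial sketch] The devpts-style semantics Serge describes above (each new instance starts empty, changes in one instance are not reflected in another, and devices the kernel creates later may appear subject to a filter) can be illustrated with a small user-space toy model. This is not kernel code and not anything proposed in the thread: the class name `DevtmpfsModel`, its methods, and the idea of keying instances by a userns label are assumptions made purely for illustration.

```python
class DevtmpfsModel:
    """Toy model of per-instance devtmpfs semantics (hypothetical, not a kernel API)."""

    def __init__(self):
        # instance id (e.g. a userns label) -> (allow predicate, set of node names)
        self._instances = {}

    def mount(self, instance_id, allow=lambda name: True):
        # Every new instance starts empty, as with a fresh devpts mount.
        self._instances[instance_id] = (allow, set())

    def mknod(self, instance_id, name):
        # A change made inside one instance is not reflected in any other.
        self._instances[instance_id][1].add(name)

    def kernel_add(self, name):
        # A device created later by the kernel shows up only in instances
        # whose filter (standing in for the device cgroup) allows it.
        for allow, nodes in self._instances.values():
            if allow(name):
                nodes.add(name)

    def listdir(self, instance_id):
        return sorted(self._instances[instance_id][1])
```

For example, mounting one instance with a filter that allows only `null` and `zero` and another with no filter, then calling `kernel_add("sda")`, leaves `sda` visible only in the unfiltered instance.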



> > appropriately), but there are many ugly security, and implementation
> > details that look bad to me.  I have been kicking around the notion that
> > the device cgroup might provide the list of "acceptable" devices, and
> > construct a filter/view for devtmpfs based on that.
> >
> > I do have these changes working on a mostly stock 3.10 kernel.  The
> > kernel changes are pretty small, and the daemon does pretty minimal
> > filtering, mostly to demonstrate functionality.  It does assume that the
> > containers are running in a separate network namespace, but that's about
> > it.
> >
> > Of course, that still leaves you with sysfs needing similar treatment.
> >
> > ---Michael J Coss
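
[Editorial sketch] The host-side policy step Michael describes (receive each uevent, then forward it only to the containers whose policy covers the device) can be sketched as follows. This is not his daemon. It assumes the kernel uevent wire format (an `action@devpath` header followed by NUL-separated `KEY=VALUE` pairs) and borrows the device cgroup rule syntax (`c 13:69 rwm`) as the per-container policy, as he suggests might work. The function names, the container names, and the shortcut of treating `SUBSYSTEM=block` as a block device and everything else as a character device are illustrative assumptions. Reading from the netlink socket and delivering into a container's network namespace are left out.

```python
def parse_uevent(payload: bytes) -> dict:
    """Parse a kernel uevent: 'action@devpath', then NUL-separated KEY=VALUE pairs."""
    parts = [p for p in payload.split(b"\0") if p]
    action, _, devpath = parts[0].decode().partition("@")
    event = {"ACTION": action, "DEVPATH": devpath}
    for raw in parts[1:]:
        key, sep, value = raw.decode().partition("=")
        if sep:
            event[key] = value
    return event


def rule_matches(rule: str, event: dict) -> bool:
    """Match one device-cgroup-style rule ('c 13:69 rwm') against an event."""
    if "MAJOR" not in event or "MINOR" not in event:
        return False  # the event does not describe a device node
    dev_type, numbers = rule.split()[:2]
    major, minor = numbers.split(":")
    # Assumption: block devices report SUBSYSTEM=block; all others are char.
    event_type = "b" if event.get("SUBSYSTEM") == "block" else "c"
    return (
        dev_type in ("a", event_type)
        and major in ("*", event["MAJOR"])
        and minor in ("*", event["MINOR"])
    )


def route(event: dict, policy: dict) -> list:
    """Return the containers (hypothetical names) that should receive the event."""
    return sorted(
        name
        for name, rules in policy.items()
        if any(rule_matches(rule, event) for rule in rules)
    )
```

With a policy such as `{"desk-a": ["c 13:69 rwm"], "desk-b": ["c 13:70 rwm"]}`, an add event for input device 13:69 is routed to `desk-a` only, and events that carry no MAJOR/MINOR are routed to no one.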

Re: [lxc-devel] Device Namespaces

2013-09-29 Thread Amir Goldstein
On Thu, Sep 26, 2013 at 8:33 AM, Greg Kroah-Hartman <
gre...@linuxfoundation.org> wrote:

> On Wed, Sep 25, 2013 at 02:34:54PM -0700, Eric W. Biederman wrote:
> > So the big issues for a device namespace to solve are filtering which
> > devices a container has access to and being able to dynamically change
> > which devices those are at run time (aka hotplug).
>
> As _all_ devices are hotpluggable now (look, there's no CONFIG_HOTPLUG
> anymore, because it was redundant), I think you need to really think
> this through better (pci, memory, cpus, etc.) before you do anything in
> the kernel.
>
> > After having thought about this for a bit I don't know if a pure
> > userspace solution is sufficient or actually a good idea.
> >
> > - We can manually manage a tmpfs with device nodes in userspace.
> >   (But that is deprecated functionality in the mainstream kernel).
>
> Yes, but I'm not going to namespace devtmpfs, as that is going to be an
> impossible task, right?
>

That sounds like a challenge ;-)
Seriously, as Serge correctly noted, it would not be that different from
devpts if you start from an empty devtmpfs and populate it with devices
that are "added in the context of that namespace".
The semantics in which devices are "added in the context of a namespace"
is the missing piece of the puzzle.

What we would really like to see is a setns() style API that can be used
to add a device in the context of a namespace in either a "shared" or
"private" mode.
This kind of API is a required building block for us to write device
drivers that are namespace aware in a way that gives userspace enough
flexibility for dynamic configuration.

We are trying to come up with a proposal for that sort of API.
When we have something decent, we shall post it.
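
[Editorial sketch] The thread does not define what "shared" and "private" would mean, so the toy model below invents one possible reading purely to make the idea concrete: a shared device (such as the /dev/null and /dev/zero case Eric raises later in this message) may be added to several namespaces, while a private device belongs to exactly one. `DeviceRegistry`, `Mode`, and the choice of `PermissionError` are all hypothetical, and nothing here corresponds to a real kernel interface.

```python
from enum import Enum


class Mode(Enum):
    SHARED = "shared"
    PRIVATE = "private"


class DeviceRegistry:
    """Toy model of adding devices 'in the context of a namespace'."""

    def __init__(self):
        self._devices = {}  # device name -> (mode, set of namespace labels)

    def add(self, device, namespace, mode):
        current = self._devices.get(device)
        if current is None:
            self._devices[device] = (mode, {namespace})
            return
        current_mode, namespaces = current
        if namespace in namespaces:
            return  # already present in this namespace, nothing to do
        # Assumed rule: a device held or requested as private cannot be
        # attached to a second namespace.
        if Mode.PRIVATE in (current_mode, mode):
            raise PermissionError(f"{device} is exclusively assigned")
        namespaces.add(namespace)

    def visible_in(self, namespace):
        return sorted(
            name
            for name, (_, namespaces) in self._devices.items()
            if namespace in namespaces
        )
```

Under these assumed rules, adding `null` as shared to two namespaces succeeds, while adding a framebuffer as private to one namespace makes a later attempt to add it to another namespace fail.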


> And remember, udev doesn't create device nodes anymore...
>
> > - We can manually export a subset of sysfs with bind mounts.
> >   (But that feels hacky, and is essentially incompatible with hotplug).
>
> True.
>
> > - We can relay a call of /sbin/hotplug from outside of a container
> >   to inside of a container based on policy.
> >   (But no one uses /sbin/hotplug anymore).
>
> That's right, they should be listening to libudev events, so why can't
> your daemon shuffle them off to the proper container, all in userspace?
>
> > - There is no way to fake netlink uevents for a container to see them.
> >   (The best we could do is replace udev everywhere with something that
> >    listens on a unix domain socket).
>
> You shouldn't need to do this.
>
> > - It would be nice to replace the device cgroup with a comprehensive
> >   solution that really works. (Among other things the device cgroup
> >   does not work in terms of struct device the underlying kernel
> >   abstraction for devices).
>
> I didn't even know there was a device cgroup.
>
> Which means that if there is one, odds are it's useless.
>
> > We must manage sysfs entries as well as device nodes because:
> > - Seeing more than we should has the real potential to confuse
> >   userspace, especially a userspace that replays uevents.
>
> You should never replay uevents.  If you don't do that, why can't you
> see all of sysfs?
>
> > - Some device control must happen through writing to sysfs files and
> >   if we don't remove all root privileges from a container only by
> >   exporting a subset of sysfs to that container can we limit which
> >   sysfs nodes can be written to.
>
> But you have the issue of controlling devices in a "shared" way, which
> isn't going to be usable for almost all devices.
>
> > The current kernel tagged sysfs entry support does not look like a good
> > match for implementing device filtering.   The common case will
> > be allowing devices like /dev/zero, and /dev/null that live in
> > /sys/devices/virtual and are the devices we are most likely to care
> > about.  Those devices need to live in multiple device namespaces so
> > everyone can use them.  Perhaps exclusive assignment will be the more
> > common paradigm for device namespaces like it is for network devices in
> > the network namespace but from what little I can see of this problem
> > right now I don't think so.
> >
> > I definitely think we should hold off on a kernel level implementation
> > until we really understand the issues and are ready to implement device
> > namespaces correctly.
>
> I agree, especially as I don't think this will ever work.
>
> > A userspace implementation looks like it can only do about 95% of what
> > is really needed, but at the same time looks like an easy way to
> > experiment until the problem is sufficiently well understood.
>
> 95% is probably way better than what you have today, and will fit the
> needs of almost everyone today, so why not do it?
>
> I'd argue that those last 5% either are custom solutions that never get
> merged, or candidates for true virtualization.
>
> > In summary the situation with device hotplug and containers sucks today,
> > and we need to do something.  Running a linu

Re: [lxc-devel] Device Namespaces

2013-09-29 Thread Greg Kroah-Hartman
On Sun, Sep 29, 2013 at 10:28:55PM +0300, Amir Goldstein wrote:
> On Thu, Sep 26, 2013 at 8:33 AM, Greg Kroah-Hartman wrote:
> 
> On Wed, Sep 25, 2013 at 02:34:54PM -0700, Eric W. Biederman wrote:
> > So the big issues for a device namespace to solve are filtering which
> > devices a container has access to and being able to dynamically change
> > which devices those are at run time (aka hotplug).
> 
> As _all_ devices are hotpluggable now (look, there's no CONFIG_HOTPLUG
> anymore, because it was redundant), I think you need to really think
> this through better (pci, memory, cpus, etc.) before you do anything in
> the kernel.
> 
> > After having thought about this for a bit I don't know if a pure
> > userspace solution is sufficient or actually a good idea.
> >
> > - We can manually manage a tmpfs with device nodes in userspace.
> >   (But that is deprecated functionality in the mainstream kernel).
> 
> Yes, but I'm not going to namespace devtmpfs, as that is going to be an
> impossible task, right?
> 
> 
> That sounds like a challenge ;-)
> Seriously, as Serge correctly noted, it would not be that different from 
> devpts
> if you start from an empty devtmpfs and populate it with devices that are
> "added in the context of that namespace".  The semantics in which
> devices are "added in the context of a namespace" is the missing piece
> of the puzzle.

And the fact that these devices are almost all created before userspace
starts up, is a non-trivial "piece of the puzzle" :)

Good luck,

greg k-h



[lxc-devel] [lxc/lxc] fe218c: Fix crasher in get_ips

2013-09-29 Thread GitHub
  Branch: refs/heads/master
  Home:   https://github.com/lxc/lxc
  Commit: fe218ca38358dd69dd51fca6433088ac631d6240
  https://github.com/lxc/lxc/commit/fe218ca38358dd69dd51fca6433088ac631d6240
  Author: Stéphane Graber 
  Date:   2013-09-29 (Sun, 29 Sep 2013)

  Changed paths:
M src/lxc/lxccontainer.c

  Log Message:
  ---
  Fix crasher in get_ips

Check that the interface structure is not NULL before trying to access
its members.

Signed-off-by: Stéphane Graber 


