I've been looking at this problem for some time to solve my very specific use case. We are using containers to provide individual "desktops" to a number of users. We want each desktop to run X, and to bind and unbind a display, keyboard, and mouse to the X server running in a particular container. A container should not be able to grab anyone else's keyboard, mouse, or display unless the owner has granted it specific access. To that end, I started working on a udev solution. I understand that most containers don't/won't run udev. And systemd won't even start udev if the container doesn't have the mknod capability (CAP_MKNOD), which is kind of odd, but I digress.
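As a rough illustration of the access rule described above (a device is visible to its owner plus anyone the owner has granted access), here is a minimal Python sketch. The names `DevicePolicy`, `parse_uevent`, `allowed_containers`, the container names, and the policy table are all hypothetical and exist only for this example. The one thing taken from the kernel is the uevent wire format: an `action@devpath` header followed by NUL-separated `KEY=VALUE` pairs.

```python
from dataclasses import dataclass, field


@dataclass
class DevicePolicy:
    """Hypothetical policy entry: who owns a device and who else may use it."""
    owner: str
    granted: set = field(default_factory=set)


def parse_uevent(payload: bytes) -> dict:
    """Parse a kernel uevent: 'action@devpath', then NUL-separated KEY=VALUE pairs."""
    parts = [p for p in payload.split(b"\0") if p]
    event = {}
    for part in parts[1:]:  # parts[0] is the 'action@devpath' header
        key, sep, value = part.decode("utf-8", "replace").partition("=")
        if sep:
            event[key] = value
    return event


def allowed_containers(event: dict, policy: dict) -> set:
    """Return the containers allowed to see this event (empty set if unknown device)."""
    entry = policy.get(event.get("DEVPATH"))
    if entry is None:
        return set()
    return {entry.owner} | entry.granted


# Example: alice owns a keyboard and has granted bob access to it (made-up data).
policy = {
    "/devices/virtual/input/input7": DevicePolicy(owner="alice", granted={"bob"}),
}
raw = (b"add@/devices/virtual/input/input7\0ACTION=add\0"
       b"DEVPATH=/devices/virtual/input/input7\0SUBSYSTEM=input\0")
print(sorted(allowed_containers(parse_uevent(raw), policy)))
```

Devices with no policy entry map to the empty set, so the default in this sketch is that no container sees the event.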
Currently the kernel effectively broadcasts uevents to all network namespaces, and this is an issue. I don't want container A to see container B's events. It should see only what the admin's policy for that container allows. This policy should be handled in userspace on the host, by a daemon that receives the events, forwards the pertinent ones to the appropriate container(s), and disregards the rest. To accomplish this, I had to change the broadcast mechanism and then provide a mechanism for forwarding events to specific network namespaces.

Back in the day, that would have been sufficient. Udev running in the container would have gotten the add event, created the appropriate devices and symlinks, and then cleaned up on remove/change events. With the introduction of devtmpfs, udev no longer actually creates the device nodes. It just handles links and name changes. So I'm still left needing to create/manage devtmpfs or find some other solution.

This leads me down the path of virtualizing devtmpfs. I've been fooling around with FUSE to basically mirror the host /dev (filtered appropriately), but there are many ugly security and implementation details that look bad to me. I have been kicking around the notion that the device cgroup might provide the list of "acceptable" devices, and that a filter/view for devtmpfs could be constructed from that.

I do have these changes working on a mostly stock 3.10 kernel. The kernel changes are pretty small, and the daemon does pretty minimal filtering, mostly to demonstrate functionality. It does assume that the containers are running in separate network namespaces, but that's about it.

Of course, that still leaves you with sysfs needing similar treatment.

---Michael J Coss
_______________________________________________
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel