On 15.05.2014 22:26, Serge E. Hallyn wrote:
> Quoting Richard Weinberger (rich...@nod.at):
>> On 15.05.2014 21:50, Serge Hallyn wrote:
>>> Quoting Richard Weinberger (richard.weinber...@gmail.com):
>>>> On Thu, May 15, 2014 at 4:08 PM, Greg Kroah-Hartman
>>>> <gre...@linuxfoundation.org> wrote:
>>>>> Then don't use a container to build such a thing, or fix the build
>>>>> scripts to not do that :)
>>>>
>>>> I second this.
>>>> To me it looks like some folks try to (ab)use Linux containers
>>>> for purposes where KVM would much better fit in.
>>>> Please don't put more complexity into containers. They are already
>>>> horribly complex and error prone.
>>>
>>> I, naturally, disagree :) The only use case which is inherently not
>>> valid for containers is running a kernel. Practically speaking there
>>> are other things which likely will never be possible, but if someone
>>> offers a way to do something in containers, "you can't do that in
>>> containers" is not an apropos response.
>>>
>>> "That abstraction is wrong" is certainly valid, as when vpids were
>>> originally proposed and rejected, resulting in the development of
>>> pid namespaces. "We have to work out (x) first" can be valid (and
>>> I can think of examples here), assuming it's not just trying to hide
>>> behind a catch-22/chicken-egg problem.
>>>
>>> Finally, saying "containers are complex and error prone" is conflating
>>> several large suites of userspace code and many kernel features which
>>> support them. Being more precise would, if the argument is valid,
>>> lend it a lot more weight.
>>
>> We (my company) have used Linux containers in production since 2011.
>> First LXC, now libvirt-lxc.
>> To understand the internals better I also wrote my own userspace to
>> create/start containers. There are so many things which can hurt you badly.
>> With user namespaces we expose a really big attack surface to regular users.
>> E.g., suddenly a user is allowed to mount filesystems.
>
> That is currently not the case. They can mount some virtual filesystems
> and do bind mounts, but cannot mount most real filesystems. This keeps
> us protected (for now) from potentially unsafe superblock readers in the
> kernel.

Yeah, I meant not only "real" filesystems. I had VFS issues in mind, where
an attacker could do bad things using bind mounts, for example.

>> Ask Andy, he has already found lots of nasty things...
>
> Yes, of course, and there may be more to come...
>
>> I agree that user namespaces are the way to go, all the papering with LSM
>> over security issues is much worse.
>> But we have to make sure that we don't add too many features too fast.
>
> Agreed. Like I said, 'we have to work (x) out first' could be valid,
> including 'we should wait (a year?) for user ns issues to fall out
> before relaxing any of the current user ns constraints.'
>
> On the other hand, not exercising the new code may only mean that
> existing flaws stick around longer, undetected (by most).

Fair point.

>> That said, I like containers a lot because they are cheap, but because
>> they are lightweight, their isolation level is lightweight too.
>> IMHO containers are not a cheap replacement for KVM.
>
> The building blocks for containers can also be used for entirely
> new, simpler use cases - e.g. perhaps a new fakeroot alternative based
> on user namespace mappings. Which is why "this is not a use case for
> containers" is not the right way to push back, whether or not the
> feature ends up being appropriate.

Agreed. Maybe I'm too pessimistic. We'll see. :-)

Thanks,
//richard