On Thu, May 07, 2015 at 01:17:27PM +0300, Vladimir Davydov wrote:
> > We're creating cgroups for container on ve0 but bindmount them
> > from inside of container, thus on userspace level (via config file)
> > we can setup which cgroups are allowed for use. Still we're not
> > limiting anyhow creating new sub-cgroups (via mkdir) inside
> > container, and this one should be performance penalty mainly
> > (new cgroup allocation is done via direct kzalloc without
> > any memory limits as far as I understand).
>
> Actually, it is accounted to memcg, just like any kmalloc, but the
> problem isn't that we miss accounting. The problem is that the more

I see, it's deep inside the slab/slub code, thanks.

> features we allow to use from inside a container, the more different
> types of kernel objects a container can create, the more potential
> security issues we have. E.g. on reclaim the kernel walks over all
> memory cgroups, as a result a container user can try to DOS the node by
> creating thousands of cgroups.

So maybe we should limit the number of nested cgroups in a container?
There is root->number_of_cgroups; maybe we could set up some limit
in the ve config.

> > Thus why we can limit cgroups set itself I don't see easy way to limit
> > nested cgroups/dirs without additional kernel modification. Ideas?
>
> Let me clarify. Currently, we agreed on the following scheme:
>
>  - There is a parameter in the config of a CT about which controllers to
>    bind mount inside the CT. By default, if there is no such a parameter
>    the userspace mounts all cgroups except our home-brewed ones (ve,
>    beancounter). Note, it is about the userspace only, the kernel knows
>    nothing about it.

yes

>  - If a cgroup is bind mounted, the user of the container can play with
>    cgroups without any limitations. It is all about trust, in fact. If
>    you cannot trust a container, just disable bind mounting altogether
>    in the config.
>
>  - There is the only exception to the previous rule though. Even if we
>    trust the container, we obviously don't want it to tweak its own
>    parameters that are set via cgroups (e.g. its memory and swap
>    limits), i.e. we should disallow it to write to files in its
>    bind-mounted root. This should be done unconditionally by the kernel.
>    Just disallow processes inside ve != ve0 to write files of any
>    top-level cgroup.

yes, I'm testing it

>
> Hope this clears things up.
>
> A question still remains what to do with the /proc/cgroups file - we
> should hide cgroups that are not bind mounted inside the CT there. This
> may be done by bind mounting this file itself. Again, up to the
> userspace.

OK, once I finish with the previous item I'll get back to this one.
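
For the /proc/cgroups part, here is a rough userspace sketch of how the
file to be bind mounted could be produced: keep the header line and drop
every row whose controller is not in the allowed set. This is only an
illustration, not what vzctl or the kernel does. The function name
filter_proc_cgroups, the allowed set, and the sample rows are all made up
for the example; only the four-column layout (subsys_name, hierarchy,
num_cgroups, enabled) comes from the real file.

```python
def filter_proc_cgroups(text, allowed):
    """Return /proc/cgroups-style text with only the allowed controllers.

    `text` is the content of a /proc/cgroups-like file, `allowed` is a
    set of controller names that are bind mounted inside the CT
    (hypothetical input, e.g. taken from the CT config).
    """
    kept = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Header line: always keep it so parsers still work.
            kept.append(line)
            continue
        fields = line.split()
        # First column is the controller (subsystem) name.
        if fields and fields[0] in allowed:
            kept.append(line)
    return "\n".join(kept) + "\n"


# Made-up sample data in the /proc/cgroups column layout.
SAMPLE = (
    "#subsys_name\thierarchy\tnum_cgroups\tenabled\n"
    "cpuset\t1\t4\t1\n"
    "memory\t2\t12\t1\n"
    "ve\t3\t7\t1\n"
    "beancounter\t4\t7\t1\n"
)

if __name__ == "__main__":
    # Hide the home-brewed controllers (ve, beancounter) from the CT view.
    print(filter_proc_cgroups(SAMPLE, {"cpuset", "memory"}), end="")
```

The result would be written to a regular file on ve0 and bind mounted
over /proc/cgroups inside the CT. Note that such a file is a static
snapshot, so the num_cgroups column would go stale unless it is
regenerated.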