I don't like this behavior:

[root@ptikh-vz7 ~]# mkdir /sys/fs/cgroup/devices/test1
[root@ptikh-vz7 ~]# echo "a *:* rwmM" > /sys/fs/cgroup/devices/test1/devices.deny
[root@ptikh-vz7 ~]# cat /sys/fs/cgroup/devices/test1/devices.list
[root@ptikh-vz7 ~]# echo "c *:* rwm" > /sys/fs/cgroup/devices/test1/devices.allow
[root@ptikh-vz7 ~]# mkdir /sys/fs/cgroup/devices/test1/test2
[root@ptikh-vz7 ~]# echo "b *:* rwm" > /sys/fs/cgroup/devices/test1/test2/devices.allow
[root@ptikh-vz7 ~]# cat /sys/fs/cgroup/devices/test1/test2/devices.list
c *:* rwm
b *:* rwm
[root@ptikh-vz7 ~]# echo "c *:* rwm" > /sys/fs/cgroup/devices/test1/devices.deny
[root@ptikh-vz7 ~]# cat /sys/fs/cgroup/devices/test1/test2/devices.list
[root@ptikh-vz7 ~]#

When we remove any exception in ancestors "fake"-wildcard rules disappear.

I don't like this behavior even more:

[root@ptikh-vz7 ~]# mkdir /sys/fs/cgroup/devices/test3
[root@ptikh-vz7 ~]# echo "a *:* rwmM" > /sys/fs/cgroup/devices/test3/devices.deny
[root@ptikh-vz7 ~]# cat /sys/fs/cgroup/devices/test3/devices.list
[root@ptikh-vz7 ~]# echo "c *:* rwm" > /sys/fs/cgroup/devices/test3/devices.allow
[root@ptikh-vz7 ~]# mkdir /sys/fs/cgroup/devices/test3/test4
[root@ptikh-vz7 ~]# cat /sys/fs/cgroup/devices/test3/test4/devices.list
c *:* rwm
[root@ptikh-vz7 ~]# echo $$ > /sys/fs/cgroup/devices/test3/test4/tasks
[root@ptikh-vz7 ~]# mknod test c 1 3
mknod: ‘test’: Operation not permitted

Cgroup which is allowed to mknod all char devices can't mknod /dev/null.

I believe there would be more corner cases which are working similarly bad. I don't feel safe to change match_exception as this can lead to security issues. And properly reviewing such a change takes too much time.

If you have a node with kernel with this patch installed I would like to play 
with it, so that it would be easier to review.

Next time just provide node with kernel with patches applied, so that I don't need to build it myself, please.

On 19.01.2023 18:15, nb wrote:


On 19.01.23 г. 17:02 ч., Pavel Tikhomirov wrote:
I believe this is not covering all cases, for instance it would break adding rules to second level nested cgroup, if second level nested cgroup is in "default deny" and it's parent is in "default deny" and none of them have CGRP_VE_ROOT set. In parent there is allowing wildcard rule and adding same allowing wildcard rule to child would fail as verify_new_ex -> match_exception would return false.

If we don't have CGRP_VE_ROOT then the code should remain as is currently in upstream (see my comment about using ve_is_super below). After all this is only a problem for our container setup.


A  (default deny)
  \
   B (default deny) (allow wild card)
    \
    C <--- Add an allow wild card will be ignored and check will be deferred to parent.

Can you describe visually what scenario you have in mind?



<snip>

@@ -445,10 +450,13 @@ static bool match_exception_partial(struct list_head *exceptions, short type,
   *
   * returns: true in case it matches an exception completely
   */
-static bool match_exception(struct list_head *exceptions, short type,
-                u32 major, u32 minor, short access)
+static bool match_exception(struct dev_cgroup *dev_cgroup, short type,
+                u32 major, u32 minor, short access, bool check_parent)
  {
      struct dev_exception_item *ex;
+    struct cgroup *cgrp = dev_cgroup->css.cgroup;
+    struct list_head *exceptions = &dev_cgroup->exceptions;
+    bool wildcard_skipped = false;

      list_for_each_entry_rcu(ex, exceptions, list) {
          short mismatched_bits;
@@ -462,6 +470,11 @@ static bool match_exception(struct list_head *exceptions, short type,
              continue;
          if (ex->minor != ~0 && ex->minor != minor)
              continue;
+        /* skip wildcard rule if we are child cg */
+        if (is_wildcard_exception(ex) && !test_bit(CGRP_VE_ROOT, &cgrp->flags)) {

Comment does not correspond to what you do, so either comment is wrong or what you do is wrong. Note: CGRP_VE_ROOT bit is only set on root container cgroups when container is running. O maybe I'm not getting "child cg" thing.

A child cg is really any cgroup under an CGRP_VE_ROOT group. What did you think it was? Actually thinking more about it I think in this condition I should also explicitly be checking if we are in a container context via ve_is_super().


+            wildcard_skipped = true;
+            continue;
+        }

          /* provided access cannot have more than the exception rule */
          mismatched_bits = access & (~ex->access) & ~ACC_MOUNT;


<snip>

--
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.
_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel

Reply via email to