The commit is pushed to "branch-rh7-3.10.0-1160.62.1.vz7.187.x-ovz" and
will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.62.1.vz7.187.3

------>
commit 129c9c7181a49aeca45f46ae2ccc731ac02fb32e
Author: Pavel Tikhomirov <ptikhomi...@virtuozzo.com>
Date:   Wed Jun 1 11:29:32 2022 +0300
cgroup: switch from cgroup_mutex to cgroup_root_mutex in proc_cgroup_show

We saw systemd reading /proc/<pid>/cgroup quite a lot, and we also see
that cgroup_mutex is contended enough to make its waiters wait for
several seconds. Assuming that cgroup_root_mutex should be less
contended, since roots change rarely, let's switch /proc/<pid>/cgroup
to it.

Why can we do this?

- the roots list is protected by both cgroup_mutex and cgroup_root_mutex
- without cgroup_mutex, ve_hide_cgroups() can be racy when the current
  task is moved to another ve cgroup, but that's not too bad
- for a root from the roots list (not the "rootnode" root),
  root->subsys_list is also protected by both mutexes
- __task_cgroup_from_root() iterates over task->cgroups->cg_links while
  holding css_set_lock, which should be safe, and to protect the found
  cgroup from disappearing after we release css_set_lock, we take
  rcu_read_lock()

khorenko@ notes:

- __put_css_set() also takes rcu_read_lock() before decrementing
  cgrp->count and states this is enough to keep the cgroup alive.
- the comment in mm/memcontrol.c (and Pasha says the callstack is still
  valid):

     * Cgroup destruction stack:
     * cgroup_offline_fn()
     *   offline_css()
     *   list_del_rcu()
     *   dput()
     *     ...
     *       cgroup_diput()
     *         call_rcu(&cgrp->rcu_head, cgroup_free_rcu)

  So this callstack also proves that holding rcu_read_lock() is enough
  to keep the cgroup alive.

https://jira.sw.ru/browse/PSBM-139206

Signed-off-by: Pavel Tikhomirov <ptikhomi...@virtuozzo.com>
---
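For reference, the rcu_read_lock() argument above is an instance of the
generic RCU deferred-free pattern; a minimal sketch of that pattern (not
code from this patch) follows. struct obj, global_obj, obj_free_rcu(),
obj_release() and obj_reader() are made-up names for illustration; only
the RCU and slab calls are real kernel APIs.

/*
 * Minimal sketch of the RCU deferred-free pattern: a reader holding
 * only rcu_read_lock() can safely use an object whose last reference
 * is dropped concurrently, because the call_rcu() callback cannot run
 * until the read-side critical section ends.
 */
#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct obj {
	int data;
	struct rcu_head rcu_head;
};

static struct obj __rcu *global_obj;

static void obj_free_rcu(struct rcu_head *head)
{
	/*
	 * Runs only after every rcu_read_lock() section that could
	 * still see the object has finished.
	 */
	kfree(container_of(head, struct obj, rcu_head));
}

/*
 * Writer side: unpublish the object, then defer the actual free, the
 * same way cgroup_diput() defers it via call_rcu(..., cgroup_free_rcu).
 */
static void obj_release(void)
{
	struct obj *o = rcu_dereference_protected(global_obj, 1);

	if (!o)
		return;
	rcu_assign_pointer(global_obj, NULL);
	call_rcu(&o->rcu_head, obj_free_rcu);
}

/*
 * Reader side: no mutex at all, only rcu_read_lock(), yet *o stays
 * valid for the whole critical section.
 */
static int obj_reader(void)
{
	struct obj *o;
	int val = -1;

	rcu_read_lock();
	o = rcu_dereference(global_obj);
	if (o)
		val = o->data;
	rcu_read_unlock();
	return val;
}

After the patch, proc_cgroup_show() plays the reader role: as the commit
message argues, once it is inside rcu_read_lock(), cgroup_free_rcu()
cannot run, so the cgroup returned by __task_cgroup_from_root() stays
valid for cgroup_path_ve() even though cgroup_mutex is no longer held.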
"," : "", root->name); seq_putc(m, ':'); - cgrp = task_cgroup_from_root(tsk, root); + + /* without cgroup_mutex we take rcu to make cgrp valid */ + rcu_read_lock(); + cgrp = __task_cgroup_from_root(tsk, root, true); retval = cgroup_path_ve(cgrp, buf, PAGE_SIZE); + rcu_read_unlock(); + if (retval < 0) goto out_unlock; seq_puts(m, buf); @@ -5603,7 +5616,7 @@ int proc_cgroup_show(struct seq_file *m, void *v) } out_unlock: - mutex_unlock(&cgroup_mutex); + mutex_unlock(&cgroup_root_mutex); put_task_struct(tsk); out_free: kfree(buf); _______________________________________________ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel