On Mon, May 13, 2019 at 05:27:47PM -0700, Cong Wang wrote:
> We have been consistently triggering the warning
> WARN_ON_ONCE(cpuctx->cgrp) in perf_cgroup_switch() for a rather
> long time, although we still have no clue on how to reproduce it.
>
> Looking into the code, it seems the only possibility here is that
> the process calling perf_event_open() with a cgroup target exits
> before the process in the target cgroup exits but after it gains
> CPU to run. This is because we use the atomic counter
> perf_cgroup_events as an indication of whether cgroup perf events
> are enabled or not, which is inaccurate, illustrated as below:
>
> CPU 0                                CPU 1
> // open perf events with a cgroup
> // target for all CPU's
> perf_event_open():
>   account_event_cpu()
>   // perf_cgroup_events == 1
>                                      // Schedule in a process in the target cgroup
>                                      perf_cgroup_switch()
> perf_event_release_kernel():
>   unaccount_event_cpu()
>   // perf_cgroup_events == 0
>                                      // schedule out
>                                      // but perf_cgroup_sched_out() is skipped
>                                      // cpuctx->cgrp left as non-NULL
which implies we observed: 'perf_cgroup_events == 0'

>                                      // schedule in another process
>                                      perf_cgroup_switch() // WARN triggered

which implies we observed: 'perf_cgroup_events == 1'

Which is impossible. It _might_ have been possible if the out and in
happened on different CPUs. But then I'm not sure that is enough to
trigger the problem.

> The proposed fix is kinda ugly,

Yes :-)

> Suggestions? Thoughts?

At perf_event_release time, when it is the last cgroup event, there
should not be any cgroup events running anymore, so ideally
perf_cgroup_switch() would not set state.

Furthermore; list_update_cgroup_event() will actually clear
cpuctx->cgrp on removal of the last cgroup event.

Also; perf_cgroup_switch() will WARN when there are not in fact any
cgroup events at all. I would expect that WARN to trigger too in your
scenario. But you're not seeing that?

I do however note that that check seems racy; we do that without
holding the ctx_lock.
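
For completeness, a minimal sketch of the two pieces being discussed --
the per-CPU perf_cgroup_events gate from the scenario above and the
nr_cgroups check in perf_cgroup_switch(). This is paraphrased from my
reading of kernel/events/core.c around that time, heavily trimmed, not
a verbatim excerpt:

	/*
	 * Minimal sketch, paraphrased from kernel/events/core.c of that
	 * era; trimmed to the two checks in question, not verbatim.
	 */

	/* Cgroup switch on the sched-out path is gated on the per-CPU
	 * counter maintained by account_event_cpu()/unaccount_event_cpu(): */
	void __perf_event_task_sched_out(struct task_struct *task,
					 struct task_struct *next)
	{
		/* ... task context sched-out elided ... */

		if (atomic_read(this_cpu_ptr(&perf_cgroup_events)))
			perf_cgroup_sched_out(task, next);
	}

	/* perf_cgroup_switch() warns about an empty cgroup context before
	 * it takes the ctx lock -- the check noted above as racy: */
	static void perf_cgroup_switch(struct task_struct *task, int mode)
	{
		struct perf_cpu_context *cpuctx;
		struct list_head *list = this_cpu_ptr(&cgrp_cpuctx_list);

		list_for_each_entry(cpuctx, list, cgrp_cpuctx_entry) {
			WARN_ON_ONCE(cpuctx->ctx.nr_cgroups == 0);

			perf_ctx_lock(cpuctx, cpuctx->task_ctx);
			/* ... SWOUT clears cpuctx->cgrp; SWIN hits
			 * WARN_ON_ONCE(cpuctx->cgrp) if it is still set ... */
			perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
		}
	}

If I remember correctly, the sched-in path (__perf_event_task_sched_in())
is gated on the same per-CPU read, which is why observing 0 on the way
out and then 1 on the way in on the same CPU looks contradictory.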