On 2020/7/3 0:02, Roman Gushchin wrote:
> On Wed, Jul 01, 2020 at 09:48:48PM -0700, Cong Wang wrote:
>> On Tue, Jun 30, 2020 at 3:48 PM Roman Gushchin <g...@fb.com> wrote:
>>>
>>> Btw if we want to backport the problem but can't blame a specific commit,
>>> we can always use something like "Cc: <sta...@vger.kernel.org> [3.1+]".
>>
>> Sure, but if we don't know which is the right commit to blame, then how
>> do we know which stable version should the patch target? :)
>>
>> I am open to all options here, including not backporting to stable at all.
>
> It seems to be that the issue was there from bd1060a1d671 ("sock, cgroup: add
> sock->sk_cgroup"), so I'd go with it. Otherwise we can go with 5.4+, as I
> understand before that it was hard to reproduce it.
>
Actually, I think the bug should be very easy to reproduce. Suppose the default
(cgroup2) hierarchy and the net_cls hierarchy are mounted at /cgroup/default and
/cgroup/netcls respectively. Then:

1. mkdir /cgroup/default/sub1
2. mkdir /cgroup/default/sub2
3. attach some tasks to sub1/ and sub2/
4. attach a bpf program to sub1/ and sub2/   # takes a bpf refcnt on those cgroups
5. echo 1 > /cgroup/netcls/net_cls.classid   # this disables cgroup_sk_alloc
6. kill all tasks in sub1/ and sub2/
7. rmdir sub1/ sub2/

The last step will drop the bpf refcnt of the default root cgroup instead of
those of sub1/ and sub2/, and should trigger the bug.

FYI, I've never used bpf, so I might be wrong.
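The steps above could be scripted roughly as follows. This is an untested sketch, not part of the original report: the mount points follow the mail, while the pinned program path /sys/fs/bpf/cg_prog, the python3 task that holds a socket, the "ingress" attach type and the DRY_RUN switch are all assumptions. By default it only prints the commands; DRY_RUN=0 actually runs them (root required, and preferably in a throwaway VM, since the point is to hit the bug).

```shell
#!/bin/sh
# Untested sketch of the 7-step reproducer. Assumptions (not in the mail):
#   - cgroup2 is mounted at $DFL, net_cls (cgroup v1) at $NETCLS
#   - a cgroup_skb program is already pinned at $PROG (hypothetical path)
#   - bpftool and python3 are installed; DRY_RUN=0 needs root
DFL=${DFL:-/cgroup/default}
NETCLS=${NETCLS:-/cgroup/netcls}
PROG=${PROG:-/sys/fs/bpf/cg_prog}

# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        sh -c "$*"
    fi
}

reproduce() {
    # Steps 1-2: create two child cgroups in the default hierarchy.
    for cg in sub1 sub2; do run "mkdir $DFL/$cg"; done

    # Step 3: a child shell moves itself into the cgroup, then execs a
    # long-lived task that opens a socket, so the socket belongs to $cg.
    for cg in sub1 sub2; do
        run "sh -c 'echo \$\$ > $DFL/$cg/cgroup.procs; exec python3 -c \"import socket, time; s = socket.socket(); time.sleep(10**6)\"' &"
    done
    run "sleep 1"    # give the tasks time to open their sockets

    # Step 4: attach the (assumed) pinned program to both cgroups.
    for cg in sub1 sub2; do
        run "bpftool cgroup attach $DFL/$cg ingress pinned $PROG"
    done

    # Step 5: per the mail, writing a classid disables cgroup_sk_alloc.
    run "echo 1 > $NETCLS/net_cls.classid"

    # Step 6: kill every task in both cgroups, then let them exit.
    for cg in sub1 sub2; do run "kill \$(cat $DFL/$cg/cgroup.procs)"; done
    run "sleep 1"

    # Step 7: remove the now-empty cgroups.
    run "rmdir $DFL/sub1 $DFL/sub2"
}

reproduce
```

Printing by default makes it easy to review exactly what would be executed before running it for real with `DRY_RUN=0 sh repro.sh`.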