On Sun, Mar 30, 2025 at 05:52:39PM -0400, Waiman Long wrote:
> There is a possible race between removing a cgroup directory that is
> a partition root and the creation of a new partition. The partition
> to be removed can be dying but still online; it does not currently
> participate in checking for exclusive CPUs conflict, but the exclusive
> CPUs are still there in subpartitions_cpus and isolated_cpus. These
> two cpumasks are global states that affect the operation of cpuset
> partitions. The exclusive CPUs in dying cpusets will only be removed
> when cpuset_css_offline() function is called after an RCU delay.
>
> As a result, it is possible that a new partition can be created with
> exclusive CPUs that overlap with those of a dying one. When that dying
> partition is finally offlined, it removes those overlapping exclusive
> CPUs from subpartitions_cpus and maybe isolated_cpus, resulting in an
> incorrect CPU configuration.
>
> This bug was found when a warning was triggered in
> remote_partition_disable() during testing because the subpartitions_cpus
> mask was empty.
>
> One possible way to fix this is to iterate the dying cpusets as well and
> avoid using the exclusive CPUs in those dying cpusets. However, this
> can still cause random partition creation failures or other anomalies
> due to racing. A better way to fix this race is to reset the partition
> state at the moment when a cpuset is being killed.
>
> Introduce a new css_killed() CSS function pointer and call it, if
> defined, before setting CSS_DYING flag in kill_css(). Also update the
> css_is_dying() helper to use the CSS_DYING flag introduced by commit
> 33c35aa48178 ("cgroup: Prevent kill_css() from being called more than
> once") for proper synchronization.
>
> Add a new cpuset_css_killed() function to reset the partition state of
> a valid partition root if it is being killed.
>
> Fixes: ee8dde0cd2ce ("cpuset: Add new v2 cpuset.sched.partition flag")
> Signed-off-by: Waiman Long <long...@redhat.com>

Applied to cgroup/for-6.15-fixes.

Thanks.

--
tejun
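
For readers following the thread, the ordering problem described in the quoted commit message can be modelled in a few lines of user-space C. This is only a toy sketch, not kernel code: toy_cpuset, toy_partition_create(), toy_kill(), toy_offline() and run_scenario() are hypothetical names invented for illustration, the cpumask is a plain 64-bit integer, and the RCU delay is represented simply by calling the offline step last. The reset_on_kill switch stands in for the idea of resetting partition state at kill time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a cpumask: one bit per CPU. */
typedef uint64_t toy_cpumask;

/* Hypothetical, heavily simplified cpuset. */
struct toy_cpuset {
	toy_cpumask exclusive_cpus;
	bool is_partition_root;
	bool dying;
};

/* Global state shared by all partitions, as in the commit message. */
static toy_cpumask subpartitions_cpus;

/* Create a partition; dying cpusets are skipped by the conflict check. */
static bool toy_partition_create(struct toy_cpuset *cs, toy_cpumask cpus,
				 struct toy_cpuset **others, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (others[i]->dying)
			continue;	/* dying cpusets do not participate */
		if (others[i]->exclusive_cpus & cpus)
			return false;	/* exclusive CPU conflict */
	}
	cs->exclusive_cpus = cpus;
	cs->is_partition_root = true;
	subpartitions_cpus |= cpus;
	return true;
}

/* Drop the partition's CPUs from the global mask. */
static void toy_partition_reset(struct toy_cpuset *cs)
{
	if (!cs->is_partition_root)
		return;
	subpartitions_cpus &= ~cs->exclusive_cpus;
	cs->exclusive_cpus = 0;
	cs->is_partition_root = false;
}

/* Kill step: optionally reset partition state before marking it dying. */
static void toy_kill(struct toy_cpuset *cs, bool reset_on_kill)
{
	if (reset_on_kill)
		toy_partition_reset(cs);
	cs->dying = true;
}

/* Offline step: runs later, after the (modelled) RCU delay. */
static void toy_offline(struct toy_cpuset *cs)
{
	toy_partition_reset(cs);
}

/* Replay: create A, kill A, create B on the same CPUs, offline A. */
static toy_cpumask run_scenario(bool reset_on_kill)
{
	struct toy_cpuset a = {0}, b = {0};
	struct toy_cpuset *existing[] = { &a };
	const toy_cpumask cpus = 0xC;	/* CPUs 2-3 */
	bool ok;

	subpartitions_cpus = 0;
	ok = toy_partition_create(&a, cpus, NULL, 0);
	assert(ok);
	toy_kill(&a, reset_on_kill);
	ok = toy_partition_create(&b, cpus, existing, 1);
	assert(ok);		/* A is dying, so B is accepted */
	toy_offline(&a);
	return subpartitions_cpus;	/* B is still a live partition */
}
```

In this model, run_scenario(false) ends with an empty subpartitions_cpus even though partition B still owns CPUs 2-3, which mirrors the empty-mask condition mentioned above. run_scenario(true) leaves 0xC in the mask, because the dying cpuset has nothing left to remove by the time it is offlined.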