This series introduces Dynamic Housekeeping Management (DHM) to the Linux kernel, enabling runtime reconfiguration of kernel-noise housekeeping (nohz_full tick suppression, RCU NOCB offloading, and managed IRQ migration) through the existing cgroup v2 cpuset isolated partition mechanism — no new kernel ABI required.
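
For orientation, the user-visible flow can be sketched with the existing cgroup v2 cpuset files. This is a minimal sketch, not part of the series: the helper name make_isolated_partition, the cgroup name and the CPU list are hypothetical, and the cgroup root is passed as a parameter (normally /sys/fs/cgroup) so the helper can also be tried on a scratch directory.

```sh
# make_isolated_partition ROOT NAME CPUS   (hypothetical helper)
#   ROOT  cgroup v2 mount point, normally /sys/fs/cgroup
#   NAME  child cgroup to create (placeholder chosen by the caller)
#   CPUS  cpulist to isolate, e.g. "2-3"
make_isolated_partition() {
	root=$1 name=$2 cpus=$3

	# Let children of ROOT use the cpuset controller.
	echo "+cpuset" > "$root/cgroup.subtree_control" || return 1

	mkdir -p "$root/$name" || return 1
	echo "$cpus" > "$root/$name/cpuset.cpus" || return 1

	# Existing ABI: turn the cgroup into an isolated partition root.
	# With this series, this write is also what triggers the
	# housekeeping mask update for the CPUs in the partition.
	echo "isolated" > "$root/$name/cpuset.cpus.partition"
}
```

Run as root on a kernel with the series applied, make_isolated_partition /sys/fs/cgroup rt 2-3 would be expected to take CPUs 2-3 out of the housekeeping masks. Writing "member" back to cpuset.cpus.partition, or removing the cgroup, should return them.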

When a cpuset partition is set to isolated mode, the CPUs in that
partition are removed from the kernel's global housekeeping masks. The
housekeeping subsystems (tick/nohz, RCU NOCB, genirq) react via explicit
registered callbacks, applying the new masks at runtime. Destroying the
partition restores the CPUs to all housekeeping masks.

The architecture uses a per-type callback table (struct housekeeping_cbs)
with pre_validate/apply hooks, replacing the previous notifier chain.
Housekeeping cpumask pointers are RCU-protected to allow lock-free readers
during updates.

Signed-off-by: Jing Wu <[email protected]>
Signed-off-by: Qiliang Yuan <[email protected]>
---
V2 -> V3:
- Replace notifier chain with explicit per-type callback interface
  (struct housekeeping_cbs with .name, .pre_validate, .apply fields).
- RCU-protect all housekeeping cpumask pointers; callers must hold
  rcu_read_lock() or use housekeeping_cpumask_rcu() in apply() callbacks.
- Drop 5 patches from v2: HK_TYPE enum separation (upstream aliases are
  already correct), no-op timer/hrtimer patches, kthread dead code, and
  workqueue double-update.
- Fix deadlock in rcu_hk_workfn(): remove cpus_read_lock() wrapper around
  remove_cpu()/add_cpu() which take cpu_hotplug_lock write side.
- Fix UAF in rcu_hk_apply(): snapshot the housekeeping cpumask inside the
  work function under rcu_read_lock(), not at apply() time where the old
  pointer may be freed by synchronize_rcu() before the work runs.
- Fix tick apply(): snapshot housekeeping_cpumask_rcu() under
  rcu_read_lock() as required by lockdep for runtime-mutable types.
- Activate context_tracking dynamically via ct_cpu_track_user() /
  ct_cpu_untrack_user() in tick apply(), eliminating the dependency on
  CONFIG_CONTEXT_TRACKING_USER_FORCE flagged by tglx.
- Fix genirq apply(): snapshot HK_TYPE_MANAGED_IRQ mask under
  rcu_read_lock() before the IRQ iteration loop.
- Simplify cpuset noise_types to
  BIT(HK_TYPE_KERNEL_NOISE) | BIT(HK_TYPE_MANAGED_IRQ), replacing the
  redundant per-alias bitmask.
- housekeeping_update_types(): always use cpu_possible_mask as base for
  HK_TYPE_KERNEL_NOISE, so de-isolation restores the mask to all possible
  CPUs rather than leaving it at its last non-trivial value.
- Initialize watchdog_cpumask from HK_TYPE_KERNEL_NOISE (not
  HK_TYPE_TIMER) at boot; keep it in sync at runtime via a new
  housekeeping_cbs callback.
- Add kernel-noise selftest to test_cpuset_prs.sh, including
  cpu_in_cpulist() for correct cpulist range membership detection and
  nohz_full sysfs verification when CONFIG_NO_HZ_FULL is active.
- Add RCU caller fixes: sched/core (HK_TYPE_KERNEL_NOISE) and drivers/hv
  (HK_TYPE_MANAGED_IRQ) are required because those types are updated at
  runtime; hrtimer (HK_TYPE_TIMER) and arm64/topology (HK_TYPE_TICK) are
  defensive fixes.
- Reorder patches so all subsystem callbacks are registered before the
  cpuset patch that triggers housekeeping_update_types().

V1 -> V2:
- Rebrand series from DHEI to DHM (Dynamic Housekeeping Management).
- Drop custom sysfs interface entirely.
- Integrate housekeeping control into cgroup v2 cpuset isolated partition
  mechanism.
- Add SMT-aware isolation constraints to prevent splitting SMT siblings.
- Add comprehensive documentation and cgroup functional selftests.
- Refactor mask transition logic to use RCU-safe handover.
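
As an illustration of the cpulist membership check mentioned in the V2 -> V3 notes, here is a minimal POSIX sh sketch. It is not the selftest's actual cpu_in_cpulist() implementation; only the name is borrowed, and the argument order and return convention are assumptions. It shows why range parsing is needed: a plain substring match would wrongly find CPU 1 in "10-11" and would miss CPU 2 in "0-3". Stride syntax such as "0-7:2" is deliberately not handled.

```sh
# cpu_in_cpulist CPU LIST
# Returns 0 if CPU is a member of LIST, a kernel-style cpulist such as
# "0-3,8,10-11", and 1 otherwise.
cpu_in_cpulist() {
	cpu=$1
	list=$2

	# Split LIST on commas into the positional parameters.
	old_ifs=$IFS
	IFS=,
	set -- $list
	IFS=$old_ifs

	for range in "$@"; do
		# "a-b" yields lo=a, hi=b; a single CPU "a" yields lo=hi=a.
		lo=${range%-*}
		hi=${range#*-}
		if [ "$cpu" -ge "$lo" ] && [ "$cpu" -le "$hi" ]; then
			return 0
		fi
	done
	return 1
}
```

A caller could combine it with a sysfs read, for example cpu_in_cpulist 2 "$(cat /sys/devices/system/cpu/nohz_full)", on kernels where that file exists.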

v2: https://lore.kernel.org/r/[email protected]
v1: https://lore.kernel.org/all/[email protected]

---
Jing Wu (13):
      sched/isolation: Replace notifier chain with explicit callback interface
      sched/isolation: Add housekeeping_update_types() for kernel-noise masks
      sched/isolation: RCU-protect all housekeeping cpumask readers
      sched/isolation: Fix RCU protection for runtime-mutable cpumask callers
      cpu/hotplug: Reserve CPUHP states for nohz_full and managed IRQ down-paths
      tick/nohz, context_tracking: Prepare for runtime nohz_full updates
      rcu/nocb: Add explicit housekeeping callback for runtime NOCB toggling
      genirq: Add explicit housekeeping callback for managed IRQ migration
      watchdog/lockup_detector: Register housekeeping callback for kernel-noise
      sched: Guard sched_tick_start/stop against uninitialized tick_work_cpu
      cgroup/cpuset: Extend isolated partition to trigger kernel-noise isolation
      docs: cgroup-v2: Document kernel-noise isolation via isolated partitions
      selftests/cgroup: Add kernel-noise isolation test to cpuset selftest

 Documentation/admin-guide/cgroup-v2.rst           |   8 +
 arch/arm64/kernel/topology.c                      |   9 +-
 drivers/hv/channel_mgmt.c                         |  50 +++--
 include/linux/context_tracking.h                  |   1 +
 include/linux/cpuhotplug.h                        |   2 +
 include/linux/sched/isolation.h                   |  41 ++++
 kernel/cgroup/cpuset.c                            |  23 +-
 kernel/context_tracking.c                         |  23 +-
 kernel/irq/manage.c                               |  86 ++++++++
 kernel/rcu/tree.c                                 | 104 +++++++++
 kernel/sched/core.c                               |   7 +-
 kernel/sched/isolation.c                          | 256 ++++++++++++++++++++--
 kernel/time/hrtimer.c                             |   5 +-
 kernel/time/tick-sched.c                          | 157 ++++++++++++-
 kernel/watchdog.c                                 |  56 ++++-
 tools/testing/selftests/cgroup/test_cpuset_prs.sh | 204 ++++++++++++++++-
 16 files changed, 968 insertions(+), 64 deletions(-)
---
base-commit: eb3f4b7426cfd2b79d65b7d37155480b32259a11
change-id: 20260408-wujing-dhm-8f43e2d49cd8

Best regards,
-- 
Jing Wu <[email protected]>

