Hi all,

On Wed, 2026-04-22 at 14:52 -0400, Aaron Tomlin wrote:
> Hi,
>
> I have decided to drive this series forward on behalf of Daniel Wagner,
> the original author. The series has been rebased on
> v7.0-12635-g6596a02b2078.
>
> Building upon prior iterations, this series introduces critical
> architectural refinements to the mapping and affinity spreading
> algorithms to guarantee thread safety and resilience against concurrent
> CPU-hotplug operations. Previously, the block layer relied on a shared
> global static mask (i.e., blk_hk_online_mask), which proved vulnerable
> to race conditions during rapid hotplug events. This vulnerability was
> highlighted by the kernel test robot, which encountered a NULL pointer
> dereference during rcutorture (cpuhotplug) stress testing due to
> concurrent mask modification.
>
> To resolve this, the architecture has been fundamentally hardened. The
> global static state has been eradicated. Instead, the IRQ affinity core
> now employs a newly introduced irq_spread_hk_filter(), which safely
> intersects the natively calculated affinity mask with the
> HK_TYPE_IO_QUEUE mask. Crucially, this is achieved using a local,
> hotplug-safe snapshot via data_race(cpu_online_mask). This approach
> circumvents the hotplug lock deadlocks previously identified by Thomas
> Gleixner, while explicitly avoiding CONFIG_CPUMASK_OFFSTACK stack bloat
> hazards on high-core-count systems. A robust fallback mechanism
> guarantees that should an interrupt vector be assigned exclusively to
> isolated cores, it is safely re-routed to the system's online
> housekeeping CPUs.
>
> Following rigorous testing of multiple queue maps (such as NVMe poll
> queues) alongside isolated CPUs, the tenth iteration resolved a
> critical page fault regression. The multi-queue mapping logic has been
> corrected to strictly maintain absolute hardware queue indices,
> ensuring faultless queue initialisation and preventing out-of-bounds
> memory access.
> Furthermore, following feedback from Ming Lei, the administrative
> documentation for isolcpus=io_queue has undergone a comprehensive
> overhaul to reflect this architectural reality. Previous iterations
> lacked the required technical precision regarding subsystem impact. The
> expanded kernel-parameters.txt now explicitly details that this
> parameter applies strictly to managed IRQs. It thoroughly documents how
> the block layer intercepts multiqueue allocation to match the
> housekeeping mask, actively preventing MSI-X vector exhaustion on
> massive topologies and forcing queue sharing. Most importantly, it
> cements the structural guarantee: while an application on an isolated
> CPU may freely submit I/O, the hardware completion interrupt is
> strictly and safely offloaded to a housekeeping core.
>
> Please let me know your thoughts.
This topic reminds me of a discussion started by Tobias [1] some time ago
about IRQ spreading in network drivers. The problem was (and still is)
that network drivers ignore any CPU isolation when spreading out device
IRQs.

In general we have two different CPU isolation mechanisms:

- The static one, via the isolcpus= command-line parameter
- The dynamic one, via the cgroup-v2 cpuset controller

This series only takes the static "world" into account, right? Are there
any plans to honour CPU isolation configured the dynamic way? It has been
a while since my last investigation, but the last time I went through the
code, the IRQ core was completely decoupled from the dynamic
configuration via cgroups. Are there any plans to close that gap?

Best regards,
Florian

[1] https://lore.kernel.org/all/[email protected]/

