On Fri, Aug 11, 2023 at 09:52:15PM +0200, Martin Pieuchot wrote: > When stopping a machine, with "halt -p" for example, secondary CPUs are > removed from the scheduler before smr_flush() is called. So there's no > need for the SMR thread to peg itself to such CPUs. This currently > isn't a problem because we use per-CPU runqueues but it doesn't work > with a global one. So the diff below skip halted CPUs. It should also > speed up rebooting/halting on machine with a huge number of CPUs.
Because SPCF_HALTED does not (?) imply that the CPU has stopped processing interrupts, this skipping is not safe as is. Interrupt handlers might access SMR-protected data. One possible solution is to spin. When smr_grace_wait() sees SPCF_HALTED, it should probably call cpu_unidle(ci) and spin until condition READ_ONCE(ci->ci_schedstate.spc_smrgp) == smrgp becomes true. However, for this to work, sched_idle() needs to invoke smr_idle(). Here is a potential problem since the cpu_idle_{enter,cycle,leave}() logic is not consistent between architectures. I think the intent in sched_idle() was that cpu_idle_enter() should block interrupts so that sched_idle() could check without races if the CPU can sleep. Now, on some architectures cpu_idle_enter() is a no-op. These architectures have to check the idle state in their cpu_idle_cycle() function before pausing the CPU. To avoid touching architecture-specific code, cpu_is_idle() could be redefined to ((ci)->ci_schedstate.spc_whichqs == 0 && (ci)->ci_schedstate.spc_smrgp == READ_ONCE(smr_grace_period)) Then the loop conditions while (!cpu_is_idle(curcpu())) { and while (spc->spc_whichqs == 0) { in sched_idle() would have to be changed to while (spc->spc_whichqs != 0) { and while (cpu_is_idle(ci)) { :( > Index: kern/kern_smr.c > =================================================================== > RCS file: /cvs/src/sys/kern/kern_smr.c,v > retrieving revision 1.16 > diff -u -p -r1.16 kern_smr.c > --- kern/kern_smr.c 14 Aug 2022 01:58:27 -0000 1.16 > +++ kern/kern_smr.c 11 Aug 2023 19:43:54 -0000 > @@ -158,6 +158,8 @@ smr_grace_wait(void) > CPU_INFO_FOREACH(cii, ci) { > if (!CPU_IS_RUNNING(ci)) > continue; > + if (ci->ci_schedstate.spc_schedflags & SPCF_HALTED) > + continue; > if (READ_ONCE(ci->ci_schedstate.spc_smrgp) == smrgp) > continue; > sched_peg_curproc(ci); >