On Wed, 2005-08-17 at 08:47 +0200, Ingo Molnar wrote: > but stop_machine() looks quite preempt-unsafe to begin with. The > local_irq_disable() would not be needed at all if prior the > for_each_online_cpu() loop we'd use set_cpus_allowed. The current method > of achieving 'no preemption' is simply racy even during normal > CONFIG_PREEMPT.
The code does look flakey, but I think it still works, and it may need to have a raw_local_irq_disable. The do_stop is bound to a CPU when it was created, so it doesn't need to have a set_cpus_allowed. I guess this is what is happening: --- You start do_stop in a thread on the cpu that you want to run a function on. (it's bound to that cpu) do_stop creates a thread on all the other cpus that will run stopmachine. While it creates the threads, the ones that are already created yield in case a thread wakes up on its cpu, so that that thread can migrate to its own cpu. (all the threads are at FIFO MAX_RT_PRIO-1, so they should never be preempted. After all the threads are created and acknowledge themselves, the do_stop changes the state to PREPARE. In stopmachine, when PREPARE is seen, it turns off preemption and stops yielding. do_stop then changes the state to DISABLE_IRQ. In stopmachine, when DISABLE_IRQ is seen, it disables IRQs. Then the do_stop runs the function that is expected to run in the STOPPED STATE. --- Notes: Each time the state is changed, the ack counter is zeroed and it wont continue until all the threads acknowledge they hit the expected point. I'm currious why it needs to go to preempt disable before going to irqs_disabled? It seems that once it is at the point to go, all processes should be locked on their CPUS and it would only need to goto irqs_disable. I'm also assuming that whatever function is being run expects to have interrupts off on all CPUs. So a raw_local_irq_disable may be in order. The comment above the stop_machine (called by do_stop) local_irq_disable looks incorrect. It says /* Don't schedule us away at this point, please */ I don't see how it can be scheduled out if it is running FIFO MAX_RT_PRIO-1. It just assume that it doesn't want any interrupts to go off. Do you still see a problem with stop_machine? -- Steve - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/