Re: CPU hotplug: system hang on CPU hot remove during `pfmon --system-wide'

Srivatsa Vaddagiri Wed, 30 May 2007 09:49:05 -0700

On Tue, May 29, 2007 at 01:56:24PM -0700, Linus Torvalds wrote:
> As far as I'm concerned, we should
>  - use "preempt_disable()" to protect against CPU's coming and going 
>  - use "stop_machine()" or similar that already honors preemption, and 
>    which I trust a whole lot  more than freezer.
>  - .. especially since this is already how we are supposed to be protected 
>    against CPU's going away, and we've already started doing that (for an 
>    example of this, see things like e18f3ffb9c from Andrew)
> 
> It really does seem fairly straightforward to make "__cpu_up()" be called 
> through stop_machine too. Looking at _cpu_down:
> 
>         mutex_lock(&cpu_bitmask_lock);
>         p = __stop_machine_run(take_cpu_down, NULL, cpu);
>         mutex_unlock(&cpu_bitmask_lock);
> 
> and then looking at _cpu_up:
> 
>         mutex_lock(&cpu_bitmask_lock);
>         ret = __cpu_up(cpu);
>         mutex_unlock(&cpu_bitmask_lock);
> 
> I just go "Aww, wouldn't it be nice to just make that "__cpu_up()" call be 
> done through __stop_machine_run() too?"
> 
> Hmm?
> 
> Then, you could get the "cpu_bitmask_lock" if you need to sleep,


and that's where all the problems started - sleepers needing to take that mutex 
recursively (which we did/do not support).

        foo() takes cpu_bitmask_lock and calls
          foo_bar() which also needs cpu_bitmask_lock

What is a solution to that?

        - Forget (hide?) this whole locking mess by using freezer, which
          is what Andrew wanted us to shoot for :) I am somewhat biased
          with Andrew here in that I think it will lead to more stabler cpu 
          hotplug code over time. Again I know some people will beg to differ 
          on this view.

        - extend mutexes to support recursion (which I gather Linux has 
          religiously avoided so far)

        - invent a special lock for cpu hotplug which supports recursion. 
          This is what Gautham tried doing with [1], with the bonus that it 
          made the lock extremely scalable for readers by using per-cpu 
          reference counters and RCU. He is preparing to resend those patches 
          against latest kernel atm

        - Anything else you can think of?

[1] http://lkml.org/lkml/2006/10/26/73


> but if you don't want to do that (and quite often you don't), just doing a 
> "preempt_disable()" or taking a spinlock will *also* guarantee that no new 
> CPU's suddenly show up, so it's safe to look at the CPU online bitmasks.
> 
> Do we really need anything else?

see above

> As mentioned, it's actually fairly easy to add verification calls to make 
> sure that certain accesses are done with preemption disabled, so..

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: CPU hotplug: system hang on CPU hot remove during `pfmon --system-wide'

Reply via email to