Hi Nathan,

On Thu, 27 Nov 2008 18:14:33 -0600 Nathan Lynch <[EMAIL PROTECTED]> wrote:
> Hi, I have some questions about this patch.
>
> Sebastien Dugue wrote:
> >
> >   Currently, pseries_cpu_die() calls msleep() while polling RTAS for
> > the status of the dying cpu.
> >
> >   However if the cpu that is going down also happens to be the one doing
> > the tick then we're hosed as the tick_do_timer_cpu 'baton' is only passed
> > later on in tick_shutdown() when _cpu_down() does the CPU_DEAD notification.
> > Therefore jiffies won't be updated anymore.
>
> I confess unfamiliarity with the tick/timer code, but this sounds like
> something that should be addressed earlier in the process of taking
> down a CPU.

  Maybe you're right; at the least, the tick_do_timer_cpu should be changed
earlier in the down process, but I'm not sure where we can do that.

> >   This patch replaces that msleep() with a cpu_relax() to make sure we're
> > not going to schedule at that point.
>
> This is a significant change in behavior. With the msleep(), we poll
> for at least five seconds before giving up; with the cpu_relax(), the
> period will almost certainly be much shorter and we're likely to give
> up too soon in some circumstances.

  Right, I realized a bit late that this would indeed change the behaviour.
On my test box (2 Power6) the msleep() call is hit in ~10% of the cases and
only loops once, but that may not be the case for all the pSeries machines
out there, where a longer delay might be needed.

> Could be addressed by using mdelay(), but...

  Yep, that would be better. I'm still wondering why the hang is not
systematic when offlining the tick_do_timer_cpu; I must have missed
something in my analysis and there might be a race somewhere I failed
to identify.

> It's just not clear to me how busy-waiting in the __cpu_die() path is
> a legitimate fix. Is sleeping in this path forbidden now?

  In that case, I think it is, if the cpu going down is also the one doing
the tick.

> (I notice at least native_cpu_die() in x86 does msleep(), btw.)
  Right, as most other arches do, but the thing is that on those you cannot
offline CPU0, whereas on power (and maybe others too) you can.

>  As it can take several milliseconds for RTAS to report a CPU
> offline, and the maximum latency of the operation is unspecified, it
> seems inappropriate to tie up the waiting CPU this way.

  Agreed.

> >   With this patch my test box survives a 100k-iteration hotplug stress
> > test on _all_ cpus, whereas without it, it quickly dies after ~50
> > iterations.
>
> What is the failure (e.g. stack trace, kernel messages)?

  No stack trace, no kernel messages, nothing :( When that happens, the cpus
get stuck in idle and won't reschedule at all. I verified that the
decrementer is still ticking and hrtimers are still running, but regular
timers are stuck. Changing the tick_do_timer_cpu manually under xmon
resurrects the box. Will have to look at this a bit more.

  Thanks,

  Sebastien.

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev