Thiago Jung Bauermann <bauer...@linux.vnet.ibm.com> writes: > Am Dienstag, 7. Februar 2017, 08:26:45 BRST schrieb Balbir Singh: >> On Mon, Feb 06, 2017 at 04:58:16PM -0200, Thiago Jung Bauermann wrote: >> > [ 447.714064] Querying DEAD? cpu 134 (134) shows 2 >> > cpu 0x86: Vector: 300 (Data Access) at [c000000007b0fd40] >> > >> > pc: 000000001ec3072c >> > lr: 000000001ec2fee0 >> > sp: 1faf6bd0 >> > >> > msr: 8000000102801000 >> > dar: 212d6c1a2a20c >> >> This looks like we accessed a bad address, but why? > > Am Dienstag, 7. Februar 2017, 13:10:22 BRST schrieb Michael Ellerman: >> We shouldn't be crashing. >> >> So we need to fix that. >> >> We may also need to increase the timeout, though it's pretty gross TBH. >> >> But step one is make sure we don't crash. > > I didn't analyze exactly what is causing the CPU to crash because the root > cause is the inconsistency between what the kernel thinks the CPU state is > and > reality. But if we have to be able to handle that inconsistency I will keep > digging and try to fix that.
Please do. It's obviously not a good situation, but it shouldn't crash the kernel. I realise that may be easier said than done, but that should be the goal. I see smp_ops->cpu_die() can't fail (returns void), which may be part of the problem. >> > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c >> > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c >> > @@ -206,7 +206,7 @@ static void pseries_cpu_die(unsigned int cpu) >> > >> > } >> > >> > } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) { >> > >> > - for (tries = 0; tries < 25; tries++) { >> > + for (tries = 0; tries < 5000; tries++) { >> >> This fixes some of the asymmetry between handling of CPU_STATE_INACTIVE >> and CPU_STATE_OFFLINE, but I think we can probably move the cpu_relax() >> to msleep(1). > > I didn't change it to msleep() because I thought it would introduce a > regression. commit b906cfa397fd ("powerpc/pseries: Fix cpu hotplug") changed > a > msleep(200) that was there to a cpu_relax() with this explanation: > > Currently, pseries_cpu_die() calls msleep() while polling RTAS for > the status of the dying cpu. > > However, if the cpu that is going down also happens to be the one > doing the tick then we're hosed as the tick_do_timer_cpu 'baton' is > only passed later on in tick_shutdown() when _cpu_down() does the > CPU_DEAD notification. Therefore jiffies won't be updated anymore. > > This replaces that msleep() with a cpu_relax() to make sure we're not > going to schedule at that point. > > With this patch my test box survives a 100k iterations hotplug stress > test on _all_ cpus, whereas without it, it quickly dies after ~50 > iterations. > > I can try to add it back and see what happens. Perhaps that situation won't > happen anymore with today's kernel. The relative timing of the events may have changed. But the fundamental problem still exists AFAIK. So we'd need a pretty solid guarantee that the problem can't happy anymore before I'd change it to msleep(). However it could/should at least use udelay(), which AFAIK is not reliant on jiffies. That way the loop will actually take a set amount of wall time, rather than just cycles. cheers