In message <4bcf78e5.9020...@linux.vnet.ibm.com> you wrote:
> On 04/21/2010 04:03 PM, Michael Neuling wrote:
> > In message <4bcf029b.1020...@linux.vnet.ibm.com> you wrote:
> >> On 04/21/2010 08:35 AM, Michael Ellerman wrote:
> >>> On Tue, 2010-04-20 at 22:15 -0500, Brian King wrote:
> >>>> On 04/20/2010 09:04 PM, Michael Neuling wrote:
> >>>>> In message <201004210154.o3l1sxar001...@d01av04.pok.ibm.com> you wrote:
> >>>>>>
> >>>>>> Since there is nothing to stop an IPI from occurring to an
> >>>>>> offline CPU, rather than printing a warning to the logs,
> >>>>>> just ignore the IPI. This was seen while stress testing
> >>>>>> SMT enable/disable.
> >>>>>
> >>>>> This seems like a recipe for disaster.  Do we at least need a
> >>>>> WARN_ON_ONCE?
> >>>>
> >>>> Actually we are only seeing it once per offlining of a CPU,
> >>>> and only once in a while.
> >>>>
> >>>> My guess is that once the CPU is marked offline fewer IPIs
> >>>> get sent to it since its no longer in the online mask.
> >>>
> >>> Hmm, right. Once it's offline it shouldn't get _any_ IPIs, AFAICS.
> >>>
> >>>> Perhaps we should be disabling IPIs to offline CPUs instead?
> >>>
> >>> You mean not sending them? We do:
> >>>
> >>> void smp_xics_message_pass(int target, int msg)
> >>> {
> >>>         unsigned int i;
> >>>
> >>>         if (target < NR_CPUS) {
> >>>                 smp_xics_do_message(target, msg);
> >>>         } else {
> >>>                 for_each_online_cpu(i) {
> >>>                         if (target == MSG_ALL_BUT_SELF
> >>>                             && i == smp_processor_id())
> >>>                                 continue;
> >>>                         smp_xics_do_message(i, msg);
> >>>                 }
> >>>         }
> >>> }
> >>>
> >>> So it does sound like the IPI was sent while the cpu was online (ie.
> >>> before pseries_cpu_disable(), but xics_migrate_irqs_away() has not
> >>> caused the IPI to be cancelled.
> >>>
> >>> Problem is I don't think we can just ignore the IPI. The IPI might have
> >>> been sent for a smp_call_function() which is waiting for the result, in
> >>> which case if we ignore it the caller will block for ever.
> >>>
> >>> I don't see how to fix it :/
> >>
> >> Any objections to just removing the warning?
> >
> > Well someone could be waiting for the result, so it could be a real
> > problem.
> >
> > IMHO the warning should stay.
>
> Looking in arch/powerpc/kernel/smp.c, there are four possible IPIs:
>
> void smp_message_recv(int msg)
> {
>         switch(msg) {
>         case PPC_MSG_CALL_FUNCTION:
>                 generic_smp_call_function_interrupt();
>                 break;
>         case PPC_MSG_RESCHEDULE:
>                 /* we notice need_resched on exit */
>                 break;
>         case PPC_MSG_CALL_FUNC_SINGLE:
>                 generic_smp_call_function_single_interrupt();
>                 break;
>         case PPC_MSG_DEBUGGER_BREAK:
>                 if (crash_ipi_function_ptr) {
>                         crash_ipi_function_ptr(get_irq_regs());
>                         break;
>                 }
> #ifdef CONFIG_DEBUGGER
>                 debugger_ipi(get_irq_regs());
>                 break;
> #endif /* CONFIG_DEBUGGER */
>                 /* FALLTHROUGH */
>
>
> Both generic_smp_call_function_interrupt and
> generic_smp_call_function_single_interrupt have
> WARN_ON(!cpu_online(cpu)); in them.  The debugger IPI, appears to
> ignore the IPI if the cpu is offline, which leaves the reschedule
> IPI.  This is likely the one I am seeing in test, since I'm not seeing
> the other WARN_ON's.
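
For reference, the per-message behaviour described in the quoted text
can be laid out as a small userspace sketch.  This is not kernel code:
the enum names, classify_ipi() and the warn_on_resched switch are all
made up for illustration, and the debugger case only encodes the quoted
reading that it "appears to" ignore the IPI on an offline CPU.  The
reschedule case is left as a parameter because that is the policy being
argued about here.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the four cases in the quoted smp_message_recv().
 * Names and values are illustrative, not the kernel's. */
enum ipi_msg {
    MSG_CALL_FUNCTION,
    MSG_RESCHEDULE,
    MSG_CALL_FUNC_SINGLE,
    MSG_DEBUGGER_BREAK,
};

/* Hypothetical outcome of receiving an IPI; not a kernel type. */
enum ipi_action {
    IPI_HANDLE,  /* CPU is online: process the message normally */
    IPI_WARN,    /* CPU is offline and the event is reported */
    IPI_IGNORE,  /* CPU is offline and the message is dropped quietly */
};

/* warn_on_resched selects between keeping the warning (true) and the
 * proposed change of silently ignoring the reschedule IPI (false). */
enum ipi_action classify_ipi(enum ipi_msg msg, bool cpu_online,
                             bool warn_on_resched)
{
    if (cpu_online)
        return IPI_HANDLE;

    switch (msg) {
    case MSG_CALL_FUNCTION:
    case MSG_CALL_FUNC_SINGLE:
        /* The generic handlers already WARN_ON(!cpu_online(cpu)). */
        return IPI_WARN;
    case MSG_DEBUGGER_BREAK:
        /* Assumption taken from the quoted analysis. */
        return IPI_IGNORE;
    case MSG_RESCHEDULE:
        return warn_on_resched ? IPI_WARN : IPI_IGNORE;
    }
    return IPI_WARN; /* unknown message: stay loud */
}

/* Small usage example covering the offline cases. */
void ipi_policy_selfcheck(void)
{
    assert(classify_ipi(MSG_CALL_FUNCTION, false, false) == IPI_WARN);
    assert(classify_ipi(MSG_CALL_FUNC_SINGLE, false, false) == IPI_WARN);
    assert(classify_ipi(MSG_DEBUGGER_BREAK, false, true) == IPI_IGNORE);
    assert(classify_ipi(MSG_RESCHEDULE, false, true) == IPI_WARN);
    assert(classify_ipi(MSG_RESCHEDULE, false, false) == IPI_IGNORE);
}
```

With warn_on_resched set to false, the reschedule IPI is the only
message whose offline behaviour changes, which is the inconsistency
with the other handlers in question.
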

I'm not sure what you are suggesting?  If the other methods produce the
warning when a CPU is offline, surely we should keep the warning?  Maybe
we need to add one to the debugger case too if we want to be consistent.

Mikey
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev