On Tue, 2010-04-20 at 22:15 -0500, Brian King wrote:
> On 04/20/2010 09:04 PM, Michael Neuling wrote:
> > In message <201004210154.o3l1sxar001...@d01av04.pok.ibm.com> you wrote:
> >>
> >> Since there is nothing to stop an IPI from occurring to an
> >> offline CPU, rather than printing a warning to the logs,
> >> just ignore the IPI. This was seen while stress testing
> >> SMT enable/disable.
> > 
> > This seems like a recipe for disaster.  Do we at least need a
> > WARN_ON_ONCE?
> 
> Actually we are only seeing it once per offlining of a CPU,
> and only once in a while.
>  
> My guess is that once the CPU is marked offline fewer IPIs
> get sent to it since it's no longer in the online mask.

Hmm, right. Once it's offline it shouldn't get _any_ IPIs, AFAICS.

> Perhaps we should be disabling IPIs to offline CPUs instead?

You mean not sending them? We do:

void smp_xics_message_pass(int target, int msg)
{
        unsigned int i;

        if (target < NR_CPUS) {
                smp_xics_do_message(target, msg);
        } else {
                for_each_online_cpu(i) {
                        if (target == MSG_ALL_BUT_SELF
                            && i == smp_processor_id())
                                continue;
                        smp_xics_do_message(i, msg);
                }
        }
}

So it does sound like the IPI was sent while the cpu was online (i.e.
before pseries_cpu_disable()), but xics_migrate_irqs_away() has not
caused the IPI to be cancelled.
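
A rough user-space sketch of that window, not kernel code: every sim_*
name below is made up for illustration, and the "pending" flag only
stands in for whatever state the interrupt controller latches. It just
shows that checking the online mask at send time does not help once the
IPI is already in flight.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 4   /* hypothetical size, not the kernel's NR_CPUS */

static bool sim_online[SIM_NR_CPUS];       /* stand-in for the online mask */
static bool sim_ipi_pending[SIM_NR_CPUS];  /* stand-in for a latched IPI */

/* Sender side: like for_each_online_cpu(), the mask is only checked now. */
static bool sim_send_ipi(int cpu)
{
        assert(cpu >= 0 && cpu < SIM_NR_CPUS);
        if (!sim_online[cpu])
                return false;           /* offline CPUs are never targeted */
        sim_ipi_pending[cpu] = true;
        return true;
}

/* Offline path (assumed): clears the online bit, leaves a latched IPI alone. */
static void sim_cpu_disable(int cpu)
{
        assert(cpu >= 0 && cpu < SIM_NR_CPUS);
        sim_online[cpu] = false;
}

/* Delivery: returns true if the IPI arrives at a CPU that is now offline. */
static bool sim_deliver(int cpu)
{
        assert(cpu >= 0 && cpu < SIM_NR_CPUS);
        bool stale = sim_ipi_pending[cpu] && !sim_online[cpu];

        sim_ipi_pending[cpu] = false;
        return stale;
}
```

Calling sim_send_ipi(), then sim_cpu_disable(), then sim_deliver() on
the same cpu reports a stale IPI; a send attempted after the disable is
refused up front.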

Problem is I don't think we can just ignore the IPI. The IPI might have
been sent for an smp_call_function() which is waiting for the result, in
which case if we ignore it the caller will block forever.
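
To make the concern concrete, here is a toy model in plain C. It is only
a sketch under assumptions: struct sim_call and the sim_* helpers are
hypothetical and are not the kernel's call-function data structures. The
real waiter would spin without bound; the spin count is capped here only
so the example terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for what a synchronous cross-call waits on. */
struct sim_call {
        void (*func)(void *info);
        void *info;
        bool done;              /* set by the target once func has run */
};

/* Target side. With ignore_when_offline set, an offline CPU drops the IPI. */
static void sim_ipi_handler(struct sim_call *call, bool cpu_online,
                            bool ignore_when_offline)
{
        assert(call != NULL && call->func != NULL);
        if (!cpu_online && ignore_when_offline)
                return;         /* dropped: call->done is never set */
        call->func(call->info);
        call->done = true;
}

/* Caller side: returns true if the call completed within max_spins. */
static bool sim_wait(const struct sim_call *call, unsigned long max_spins)
{
        while (max_spins--) {
                if (call->done)
                        return true;
        }
        return false;           /* an uncapped waiter would still be spinning */
}

/* Trivial payload used to observe whether the function ran. */
static void sim_bump(void *info)
{
        ++*(int *)info;
}
```

If the handler is run for an offline cpu with ignore_when_offline set,
sim_wait() never sees done and gives up at the cap; with the flag clear
the payload runs and the wait returns immediately.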

I don't see how to fix it :/

cheers

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev