On 12/24/13, Prarit Bhargava <pra...@redhat.com> wrote: > > > On 12/23/2013 11:41 PM, rui wang wrote: >> On 12/23/13, Prarit Bhargava <pra...@redhat.com> wrote: >> > I think I have root caused this to the IRR being set in the down'd cpu. >> > It >> > is >> > admittedly a rare occurrence in the kernel. I usually have to run >> > about >> > 1000 up >> > and down's before hitting it, however, on my current test system it >> > seems to >> > hit >> > much more frequently, almost 1 in 64 times. >> > >>> >> >> If that's the case, then it means stop_machine() doesn't manage to >> clear the IRR and ISR bits. But why not? Since this cpu is down it's >> not supposed to handle any further interrupts. > > ISR bits can be cleared by software (via EOI). IRR bits CANNOT be cleared > by > software. Once IRR bits are set the only thing that can clear them is the > processor itself. > > IMHO we're supposed to >> send EOIs repeatedly until all the APIC_IRR and APIC_ISR bits are >> cleared. If an IRR bit is set, it means that there's (maybe another >> vector) an APIC_ISR bit set with the highest interrupt priority > > I don't think that is quite right (admittedly I may be wrong on this and > I'll > have to dig out an Intel spec to double check). As I mentioned above the > ISR > can be cleared but the IRR cannot be cleared. > > My understanding (also other's understanding on clearing of the IRR) is > that > only the processor can clear the IRR. See arch/x86/kernel/apic/apic.c line > 1352 > and the code below it. Because of kexec boot, we "drain" the IRR by > waiting. I > remember talking to the people who wrote that code about the IRR and the > inability to affect a write to clear requested and pending IRQs. > >> Sending an EOI clears the highest priority APIC_ISR bit, so the LAPIC >> will then clear the next highest priority IRR bit and set the >> corresponding ISR bit... > > But the issue here is that the IRR has been set, but the processor is yet > to > execute the irq's handler. That's what makes this error so difficult to > hit. > It is a race with the hardware to catch the processor in exactly this > state. > > We can repeat the process. It's like handling >> interrups in polled mode. That's the right thing to do IMHO. > > But the IRR register bits are still set and once the processor is out of > interrupt context (ie, in the cpu_die() code) the processor will attempt to > access the vector for that IRR bit and we'll still get a do_IRQ() warning. > That > is different than handling the ISR. > > [Aside: Sorry Rui, but it feels like I repeated myself several times about > the > IRR :). I mean no disrespect towards you by that :) as I'm just answering > each > of your points and pointing out that the IRR is the problem here.] >
I just did some experiments. Yes you are right. The IRR is hard to clear. I thought the IRR could transfer into an ISR when there's no higher priority irq pending. But it's not designed in that way. >> >> The other unanswered question is why isn't cpu_online_mask() protected >> by a spin lock ? Being atomic isn't enough. > > Perhaps it is thought that the cpu_online_mask() can only be changed by the > cpu > hotplug code and that is done via stop_machine() and protected by > cpu_maps_update_begin() and cpu_maps_update_done() in cpu_down()? So an > additional lock is not necessary? > Yes. I missed that part. Thanks Rui >> >> Are these all well-known issues? Are there well-known answers already? > > I thought the bug was well-known because of the number of do_IRQ warning > reports > I stumbled across while searching for a solution. But as the code says in > fixup_irqs() > > /* > * We can remove mdelay() and then send spuriuous interrupts to > * new cpu targets for all the irqs that were handled previously by > * this cpu. While it works, I have seen spurious interrupt > messages > * (nothing wrong but still...). > > so I don't think anyone has realized what the bug actually was. > Yes that comment was what triggered me to think that the issue wasn't understood. You now have a clear enough explanation. Thanks Rui -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/