On Thu, 2012-10-18 at 10:54 +1100, Benjamin Herrenschmidt wrote: > On Wed, 2012-09-26 at 18:10 +0800, Li Zhong wrote: > > ../... > > Sorry got distracted, got back on this patch today: > > > > We might need to "sanitize" the enable state in the PACA before we > > > actually enter NAP or in the return from NAP code, like we do for normal > > > idle code... > > > > Hi Ben, > > > > After some further reading of the code, I updated the code as following, > > but I'm not very sure, and guess it most probably has some issues ... > > Could you please help to review and give your comments? > > I think it's still not right, see below > > > In extended_cede_processor(), if there is still lazy irq pending, I used > > local_irq_enable() to make sure the irq replayed and flags cleared, but > > I'm not sure whether it is a proper way. > > Right but that will break things for idle. In idle, if you have > re-enabled interrupts, you need to return and essentially abort the > attempt at going to nap mode, because the interrupt might have set need > resched. That's why we normally just check if something's pending and > return, letting the upper levels re-enable interrupts and do all the > dirty work for us. > > Now, hotplug might differ here in what it needs, but in any case, > extended_cede_processor doesn't seem to be the right place to handle it, > at best that function should return if it thinks there's something that > needs to be done and let the upper layers deal with it appropriately. > > > In pseries_mach_cpu_die(), I added local_irq_disable() after cede, and > > prepare for the start_secondary_resume(), but I'm not sure whether we > > also need a hard_irq_disable() here. > > You probably do if it's going to go back to the start secondary path. It > shouldn't hurt in any case as long as start_secondary_resume() > eventually does a local_irq_enable(). > > > I'm still a little confused by the meaning of PACA_IRQ_HARD_DIS in > > irq_happened. From the checking at the warning point, it seems only > > irq_happened equaling 0x1(PACA_IRQ_HARD_DIS) means hard irqs are > > disabled.
Hi Ben, Below are my current understandings and a few more questions, please correct me if there are any misunderstandings. Thank you. > No. They are disabled if any bit in there corresponding to a "level" > interrupt is set as well. Only the "edge" interrupts (and decrementer > which we treat as edge and reset to a high value when it kicks) are > ignored for the sake of HW irq state. >From the code, it seems that the hardware_interrupt/external_input (corresponding to PACA_IRQ_EE) are "level" interrupts. For "level" interrupts, is it because we will see it again, so we need to hard disable? or else we might enter into an infinite loop? > The reason we have this IRQ_HARD_DIS bit is to indicate that a 'manual' > hard disabling occurred (by opposition to one happening as a result of > an external interrupt). The external interrupt causing the hard disabling, is done in the code of masked_##_H##interrupt: for book3s, or masked_interrupt_book3e PACA_IRQ_EE 1 for book3e ? > We need that so we can avoid doing an mfspr() in local_irq_enable() and > entirely rely on the content of irq_happened to know whether interrupts > are hard enabled or hard disabled. Is this about the code in arch_local_irq_restore()? so if (! irq_happended), we could return directly as we know it is hard enabled. And here if (unlikely(irq_happened != PACA_IRQ_HARD_DIS)) __hard_irq_disable(); We don't need to hard disable if (irq_happened == PACA_IRQ_HARD_DIS), as we know it is hard disabled ('manual'). Then here, can we save a few more mtmsrd by also checking PACA_IRQ_EE bit? like following: - if (unlikely(irq_happened != PACA_IRQ_HARD_DIS)) + if (unlikely(!(irq_happened & (PACA_IRQ_HARD_DIS | PACA_IRQ_EE)))) > We do that because mfspr is a fairly expensive instruction. But that > means that we need to make sure we always have a consistent content in > irq_happened. That's also why I've added all those sanity checks if you > enable IRQ tracing. See. > > Is it possible to set this bit at anyplace the hard irqs are disabled, > > so then we could check whether this bit is set to know whether hard irqs > > are disabled? Then it seems that in MASKED_INTERRUPT, we need set this > > bit where MSR_EE is cleared for something other than decrementer. Maybe > > I missed too much things? > > Either that bit or PACA_IRQ_EE. Both indicate that interrupts are hard > disabled. > > There are some rare cases where do do change MSR:EE without touching > those bits, only when we're going to restore it shortly afterward in the > kernel asm exception entry/exit path for example. I don't get it very clearly here. I might need some more time to read and understand all the related asm codes. Currently, it seems to me in EXCEPTION_COMMON, SOFT_DISABLE_INTS is called to set PACA_IRQ_HARD_DIS, and other bits might be set when __SOFTEN_TEST (or masked_interrupt_book3e) is called. And in the exception exit path, something like SOFT_DISABLE_INTS, .restore_interrupts, restore_check_irq_replay, etc are called to handle the irq_happened bits. Thanks, Zhong > Cheers, > Ben. > > > Thanks, Zhong > > > > =================== > > diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c > > b/arch/powerpc/platforms/pseries/hotplug-cpu.c > > index 64c97d8..b5f7597 100644 > > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c > > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c > > @@ -130,6 +130,8 @@ static void pseries_mach_cpu_die(void) > > extended_cede_processor(cede_latency_hint); > > } > > > > + local_irq_disable(); > > + > > if (!get_lppaca()->shared_proc) > > get_lppaca()->donate_dedicated_cpu = 0; > > get_lppaca()->idle = 0; > > diff --git a/arch/powerpc/platforms/pseries/plpar_wrappers.h > > b/arch/powerpc/platforms/pseries/plpar_wrappers.h > > index 13e8cc4..07560d8 100644 > > --- a/arch/powerpc/platforms/pseries/plpar_wrappers.h > > +++ b/arch/powerpc/platforms/pseries/plpar_wrappers.h > > @@ -2,6 +2,7 @@ > > #define _PSERIES_PLPAR_WRAPPERS_H > > > > #include <linux/string.h> > > +#include <linux/irqflags.h> > > > > #include <asm/hvcall.h> > > #include <asm/paca.h> > > @@ -41,7 +42,19 @@ static inline long extended_cede_processor(unsigned long > > latency_hint) > > u8 old_latency_hint = get_cede_latency_hint(); > > > > set_cede_latency_hint(latency_hint); > > + > > + while (!prep_irq_for_idle()) { > > + local_irq_enable(); > > + local_irq_disable(); > > + } > > + > > rc = cede_processor(); > > +#ifdef CONFIG_TRACE_IRQFLAGS > > + /* Ensure that H_CEDE returns with IRQs on */ > > + if (WARN_ON(!(mfmsr() & MSR_EE))) > > + __hard_irq_enable(); > > +#endif > > + > > set_cede_latency_hint(old_latency_hint); > > > > return rc; > > =================== > > > > > > > > > > Cheers, > > > Ben. > > > > > > > [ 56.618846] WARNING: at arch/powerpc/kernel/irq.c:240 > > > > [ 56.618851] Modules linked in: rcutorture ipv6 dm_mod ext3 jbd > > > > mbcache sg sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt > > > > ibmveth > > > > [ 56.618883] NIP: c00000000000ff94 LR: c00000000067a5e0 CTR: > > > > 0000000000000001 > > > > [ 56.618889] REGS: c0000001fef6bbe0 TRAP: 0700 Not tainted > > > > (3.6.0-rc1-autokern1) > > > > [ 56.618894] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 42000082 > > > > XER: 20000000 > > > > [ 56.618913] SOFTE: 1 > > > > [ 56.618916] CFAR: c00000000067a5dc > > > > [ 56.618920] TASK = c0000001feed79a0[0] 'swapper/5' THREAD: > > > > c0000001fef68000 CPU: 5 > > > > GPR00: 0000000000000001 c0000001fef6be60 c000000000f9ca08 > > > > 0000000000000001 > > > > GPR04: 0000000000000001 0000000000000008 0000000000000001 > > > > 0000000000000000 > > > > GPR08: 0000000000000000 c0000001feed79a0 0008a80000000000 > > > > 0000000000000000 > > > > GPR12: 0000000022000082 c00000000f330f00 c0000001fef6bf90 > > > > 000000000f394b4c > > > > GPR16: 0000000000000000 0000000000000000 0000000000000000 > > > > 0000000000000000 > > > > GPR20: 0000000000000000 0000000000000000 0000000000000000 > > > > 0000000000000000 > > > > GPR24: c000000000fe8f80 0000000000000008 0000000000000028 > > > > 0000000000000000 > > > > GPR28: 0000000000000000 0000000000000020 c000000000f1ab40 > > > > 0000000000000001 > > > > [ 56.619014] NIP [c00000000000ff94] .arch_local_irq_restore+0x34/0xa0 > > > > [ 56.619020] LR [c00000000067a5e0] .start_secondary+0x368/0x37c > > > > [ 56.619025] Call Trace: > > > > [ 56.619030] [c0000001fef6be60] [c000000001ba0500] 0xc000000001ba0500 > > > > (unreliable) > > > > [ 56.619038] [c0000001fef6bed0] [c00000000067a5e0] > > > > .start_secondary+0x368/0x37c > > > > [ 56.619046] [c0000001fef6bf90] [c000000000009380] > > > > .start_secondary_resume+0x10/0x14 > > > > [ 56.619052] Instruction dump: > > > > [ 56.619056] f8010010 f821ff91 986d022a 2fa30000 419e0054 880d022b > > > > 78000621 41820048 > > > > [ 56.619071] 2f800001 40de0064 7c0000a6 78008fe2 <0b000000> 2fa00000 > > > > 40de0050 38000000 > > > > [ 56.619088] ---[ end trace 0199c0d783d7f9ba ]--- > > > > > > > > Reported-by: Paul E. McKenney <paul...@linux.vnet.ibm.com> > > > > Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com> > > > > --- > > > > arch/powerpc/platforms/pseries/hotplug-cpu.c | 2 ++ > > > > 1 files changed, 2 insertions(+), 0 deletions(-) > > > > > > > > diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c > > > > b/arch/powerpc/platforms/pseries/hotplug-cpu.c > > > > index 64c97d8..8de539a 100644 > > > > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c > > > > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c > > > > @@ -137,6 +137,8 @@ static void pseries_mach_cpu_die(void) > > > > if (get_preferred_offline_state(cpu) == > > > > CPU_STATE_ONLINE) { > > > > unregister_slb_shadow(hwcpu); > > > > > > > > + __hard_irq_disable(); > > > > + > > > > /* > > > > * Call to start_secondary_resume() will not > > > > return. > > > > * Kernel stack will be reset and > > > > start_secondary() > > > > > > > > > > _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev