Excerpts from Christophe Leroy's message of October 27, 2021 3:00 pm:
> 
> 
> Le 27/10/2021 à 06:10, Nicholas Piggin a écrit :
>> Excerpts from Jacques de Laval's message of October 26, 2021 6:07 am:
>>> Hi,
>>>
>>> We are trying to upgrade kernel from 5.10 to 5.14.11. We have a 
>>> Freescale/NXP
>>> T1023 SOC with two e5500 cores, and are running in 32-bit mode:
>>>
>>>     CONFIG_PPC32=y
>>>     # CONFIG_PPC64 is not set
>>>
>>>     #
>>>     # Processor support
>>>     #
>>>     # CONFIG_PPC_BOOK3S_32 is not set
>>>     CONFIG_PPC_85xx=y
>>>     # CONFIG_PPC_8xx is not set
>>>     # CONFIG_40x is not set
>>>     # CONFIG_44x is not set
>>>     CONFIG_GENERIC_CPU=y
>>>     # CONFIG_E5500_CPU is not set
>>>     # CONFIG_E6500_CPU is not set
>>>     CONFIG_E500=y
>>>     CONFIG_PPC_E500MC=y
>>>     CONFIG_PPC_FPU_REGS=y
>>>     CONFIG_PPC_FPU=y
>>>     CONFIG_FSL_EMB_PERFMON=y
>>>     CONFIG_FSL_EMB_PERF_EVENT=y
>>>     CONFIG_FSL_EMB_PERF_EVENT_E500=y
>>>     CONFIG_BOOKE=y
>>>     CONFIG_FSL_BOOKE=y
>>>     CONFIG_PPC_FSL_BOOK3E=y
>>>     CONFIG_PTE_64BIT=y
>>>     CONFIG_PHYS_64BIT=y
>>>     CONFIG_PPC_MMU_NOHASH=y
>>>     CONFIG_PPC_BOOK3E_MMU=y
>>>     # CONFIG_PMU_SYSFS is not set
>>>     CONFIG_SMP=y
>>>     CONFIG_NR_CPUS=2
>>>     CONFIG_PPC_DOORBELL=y
>>>     # end of Processor support
>>>
>>> We compile using 32-bit Bootlin PPC toolchain:
>>>
>>>     powerpc-e500mc glibc bleeding-edge 2020.08-1.
>>>
>>> When booting, and starting PID 1 we sometimes get a hang. Nothing but our 
>>> init
>>> process is running, and for debugging purposes our init currently looks like
>>> this:
>>>
>>>     int main(int argc, char *argv[]){
>>>             for (int i = 0; ; i++) {
>>>                     FILE *fp = fopen("/dev/kmsg", "w");
>>>                     if (fp) {
>>>                             fprintf(fp, "%d\n", i);
>>>                             fclose(fp);
>>>                     }
>>>                     sleep(1);
>>>             }
>>>             return 0;
>>>     }
>>>
>>> When the hangup occur we don't get any output at all from our PID 1.
>>> The last output is from the kernel:
>>>
>>>     Run /sbin/init as init process
>>>       with arguments:
>>>         /sbin/init
>>>       with environment:
>>>         HOME=/
>>>         TERM=linux
>>>         kgdboc=ttyS0,115200
>>>
>>> When issuing a backtrace on all active cpus we can see that the kernel is
>>> handling an instruction storage exception:
>>>
>>>     sysrq: Show backtrace of all active CPUs
>>>     sysrq: CPU0:
>>>     CPU: 0 PID: 1 Comm: init Not tainted 5.14.11 #1
>>>     NIP:  c02aac78 LR: c02aac2c CTR: 00000000
>>>     REGS: c1907d40 TRAP: 0500   Not tainted  (5.14.11)
>>>     MSR:  00029002 <CE,EE,ME>  CR: 82244284  XER: 20000000
>>>     GPR00: 0000000f c1907e20 c1910000 0065a000 000001d0 01100cca c1907e84 
>>> 0000000c
>>>     GPR08: d39a8000 000001d3 0000000c c1907f10 42244284 00000000 00740514 
>>> bfb71670
>>>     GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 
>>> b7c1c478
>>>     GPR24: 00708558 00701698 d3994040 00029002 c1907f20 0065a238 00000355 
>>> d39a2790
>>>     NIP [c02aac78] handle_mm_fault+0xf8/0x11f0
>>>     LR [c02aac2c] handle_mm_fault+0xac/0x11f0
>>>     Call Trace:
>>>     [c1907e20] [c02aac10] handle_mm_fault+0x90/0x11f0 (unreliable)
>>>     [c1907ec0] [c003078c] ___do_page_fault+0x26c/0x780
>>>     [c1907ef0] [c0030cd4] do_page_fault+0x34/0x100
>>>     [c1907f10] [c0000988] InstructionStorage+0x108/0x120
>>>     --- interrupt: 400 at 0x65a238
>>>     NIP:  0065a238 LR: 0052f26c CTR: 0052f260
>>>     REGS: c1907f20 TRAP: 0400   Not tainted  (5.14.11)
>>>     MSR:  0002d002 <CE,EE,PR,ME>  CR: 42242284  XER: 00000000
>>>     GPR00: b7be9914 bfb71620 b7c203a0 8c008000 0070400d b7c182a0 000b8260 
>>> 0052f260
>>>     GPR08: 0047d448 0052f260 0000000a 00000003 42242284 00000000 00740514 
>>> bfb71670
>>>     GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 
>>> b7c1c478
>>>     GPR24: 00708558 00701698 00700000 00000015 b7c1c2b0 00707e20 b7c1b8a8 
>>> bfb71660
>>>     NIP [0065a238] 0x65a238
>>>     LR [0052f26c] 0x52f26c
>>>     --- interrupt: 400
>>>     Instruction dump:
>>>     60a500c0 811f0020 57aa6cfa 813f0000 57a30026 809f004c 81080024 7d29e850
>>>     90a1002c 5529a33e 93c10038 7d244a14 <90610034> 7d485215 91210030 
>>> 41c203dc
>>>
>>> We have also observed that the CPU is continuously servicing the same 
>>> interrupt
>>> (north of 140k times per sec), it is not deadlocked.
>>>
>>> We have not yet been able to reproduce this behavior under QEMU system
>>> emulation.
>>>
>>> When bisecting between 5.10 and 5.14.11 we can see that this behavior 
>>> started
>>> with commit a01a3f2ddbcda83e8572787c0ec1dcbeba86915a:
>>>
>>>     powerpc: remove arguments from fault handler functions
>> 
>> Thank you for the excellent work to investigate and report this.
>> 
>>>
>>> Our best guess that the instruction storage exception is not properly 
>>> handled
>>> and the kernel is never able to recover from the page fault, but we don't
>>> really know how to proceed. Does anyone have any suggestions or insights?
>> 
>> Before my patch, zero was passed to the page fault error code, after
>> my patch it passes the contents of ESR SPR.
>> 
>> It looks like the BookE instruction access interrupt does not set ESR
>> (except for BO interrupts maybe?) so you're getting what was in the ESR
>> register from a previous interrupt, and maybe if that was a store then
>> access_error won't cause a segfault because is_exec is true so that
>> test bails out early, then it might just keep retrying the interrupt.
>> 
>> That could explain why you don't always see the same thing.
>> 
>> Now previous code still saved ESR in regs->esr/dsisr for some reason.
>> I can't quite see why that should have been necessary though. Does
>> this patch solve it for you?
>> 
>> Thanks,
>> Nick
>> --
>> 
>> 
>> diff --git a/arch/powerpc/kernel/head_booke.h 
>> b/arch/powerpc/kernel/head_booke.h
>> index e5503420b6c6..0e7cdc8716eb 100644
>> --- a/arch/powerpc/kernel/head_booke.h
>> +++ b/arch/powerpc/kernel/head_booke.h
>> @@ -467,10 +467,11 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
>>   
>>   #define INSTRUCTION_STORAGE_EXCEPTION                                      
>>       \
>>      START_EXCEPTION(InstructionStorage)                                   \
>> -    NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE);                 \
>> -    mfspr   r5,SPRN_ESR;            /* Grab the ESR and save it */        \
>> +    NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE);                         \
>> +    li      r5,0;                                                         \
>> +    mfspr   r5,SPRN_ESR;            /* Store 0 in regs->esr (dsisr) */    \
> 
> I can't see how that can help, you set r5 to 0 and immediately after you 
> reload ESR into r5 so you are still saving garbage into _ESR(r11)
> 

Oops, stupid mistake.

Thanks,
Nick

diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index e5503420b6c6..094c9790a490 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -467,10 +467,10 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
 
 #define INSTRUCTION_STORAGE_EXCEPTION                                        \
        START_EXCEPTION(InstructionStorage)                                   \
-       NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE);                 \
-       mfspr   r5,SPRN_ESR;            /* Grab the ESR and save it */        \
+       NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE);                         \
+       li      r5,0;                   /* Store 0 in regs->esr (dsisr) */    \
        stw     r5,_ESR(r11);                                                 \
-       stw     r12, _DEAR(r11);        /* Pass SRR0 as arg2 */               \
+       stw     r12, _DEAR(r11);        /* Set regs->dear (dar) */            \
        prepare_transfer_to_handler;                                          \
        bl      do_page_fault;                                                \
        b       interrupt_return

Reply via email to