Satheesh Rajendran wrote on April 8, 2019 5:32 pm:
> Hi,
> 
> Hit with below kernel crash during Power8 Host boot with this patch series
> on top of powerpc merge branch commit
> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=merge&id=6a821ffee18a6e6c0027c523fa8c958df98ca361
> 
> built with ppc64le_defconfig
> 
> Host Console log:
> [    0.454666] EEH: PCI Enhanced I/O Error Handling Enabled
> [    0.456524] create_dump_obj: New platform dump. ID = 0x4 Size 7457968
> [    0.457627] opal-power: OPAL EPOW, DPO support detected.
> [    0.457722] BUG: Unable to handle kernel data access at 0xffffffffff76184a
> [    0.457733] Faulting instruction address: 0xc00000000001a94c
> [    0.457740] Oops: Kernel access of bad area, sig: 11 [#1]
> [    0.457745] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
> [    0.457750] Modules linked in:
> [    0.457756] CPU: 58 PID: 0 Comm: swapper/58 Not tainted 
> 5.1.0-rc2-gd0ae6c548 #1
> [    0.457762] NIP:  c00000000001a94c LR: c0000000000a6e9c CTR: 
> c000000000008000
> [    0.457768] REGS: c000000f272b7b50 TRAP: 0380   Not tainted  
> (5.1.0-rc2-gd0ae6c548)
> [    0.457773] MSR:  9000000000001033 <SF,HV,ME,IR,DR,RI,LE>  CR: 24004222  
> XER: 00000000
> [    0.457781] CFAR: c0000000000a6e98 IRQMASK: 1 
> [    0.457781] GPR00: c0000000000a6e9c c000000f272b7de0 0000000000000004 
> 0000000000000006 
> [    0.457781] GPR04: c0000000000a5dd4 0000000024004222 c000000f272b7d48 
> 0000000000000001 
> [    0.457781] GPR08: 0000000000000002 ffffffffff761844 c000000f27250c00 
> 0000c3feb1676be1 
> [    0.457781] GPR12: 0000000000004400 c000000ffff9d380 c000000ffe60ff90 
> 0000000000000000 
> [    0.457781] GPR16: 0000000000000000 0000000000000000 c00000000004b4d0 
> c00000000004b4a0 
> [    0.457781] GPR20: c000000001526214 0000000000000800 0000000000000001 
> c000000001521b78 
> [    0.457781] GPR24: 000000000000003a 0000000000000000 0000000000080000 
> 0000000000000000 
> [    0.457781] GPR28: c000000001526140 0000000000000001 0400000000000000 
> c000000001525ce0 
> [    0.457829] NIP [c00000000001a94c] irq_set_pending_from_srr1+0x1c/0x50
> [    0.457835] LR [c0000000000a6e9c] power7_idle+0x3c/0x50
> [    0.457839] Call Trace:
> [    0.457843] [c000000f272b7de0] [c0000000000a6e98] power7_idle+0x38/0x50 
> (unreliable)
> [    0.457849] [c000000f272b7e00] [c0000000000210f4] arch_cpu_idle+0x54/0x160
> [    0.457856] [c000000f272b7e30] [c000000000c47bc4] 
> default_idle_call+0x74/0x88
> [    0.457862] [c000000f272b7e50] [c000000000158f54] do_idle+0x2f4/0x3d0
> [    0.457868] [c000000f272b7ec0] [c000000000159288] 
> cpu_startup_entry+0x38/0x40
> [    0.457874] [c000000f272b7ef0] [c00000000004dae4] 
> start_secondary+0x654/0x680
> [    0.457881] [c000000f272b7f90] [c00000000000b25c] 
> start_secondary_prolog+0x10/0x14
> [    0.457886] Instruction dump:
> [    0.457890] 992d098b 7c630034 5463d97e 4e800020 60000000 3c4c014d 38424dd0 
> 7c0802a6 
> [    0.457898] 60000000 3d22ff76 78637722 39291840 
> [    0.457900] BUG: Unable to handle kernel data access at 0xffffffffff76184a
> [    0.457901] <7d4918ae> 2b8a00ff 419e001c 892d098b 
> [    0.457907] Faulting instruction address: 0xc00000000001a94c
> [    0.457910] BUG: Unable to handle kernel data access at 0xffffffffff76184a
> [    0.457915] ---[ end trace fa7343cfd21c8798 ]---
> [    0.457919] Faulting instruction address: 0xc00000000001a94c
> [    0.458961] BUG: Unable to handle kernel data access at 0xffffffffff76184a
> [    0.458963] BUG: Unable to handle kernel data access at 0xffffffffff76184a
> [    0.458964] BUG: Unable to handle kernel data access at 0xffffffffff76184a
> [    0.458966] BUG: Unable to handle kernel data access at 0xffffffffff76184a
> [    0.458968] BUG: Unable to handle kernel data access at 0xffffffffff76184a
> [    0.458970] BUG: Unable to handle kernel data access at 0xffffffffff76184a
> [    0.458972] Faulting instruction address: 0xc00000000001a94c
> [    0.458973] Faulting instruction address: 0xc00000000001a94c
> [    0.458974] Faulting instruction address: 0xc00000000001a94c
> [    0.458975] Faulting instruction address: 0xc00000000001a94c
> [    0.458976] Faulting instruction address: 0xc00000000001a94c
> [    0.458978] initcall 
> __machine_initcall_powernv_pnv_init_idle_states+0x0/0xb30 returned 0 after 0 
> usecs
> [    0.458981] calling  __machine_initcall_powernv_opal_time_init+0x0/0x150 @ 
> 1
> [    0.458982] Faulting instruction address: 0xc00000000001a94c
> [    0.459022] BUG: Unable to handle kernel data access at 0xffffffffff76184a
> [    0.459040] Faulting instruction address: 0xc00000000001a94c
> [    0.459043] initcall __machine_initcall_powernv_opal_time_init+0x0/0x150 
> returned 0 after 0 usecs
> [    0.459044] BUG: Unable to handle kernel data access at 0xffffffffff76184c
> [    0.459045] Faulting instruction address: 0xc00000000001a94c
> [    0.459060] calling  __machine_initcall_powernv_rng_init+0x0/0x334 @ 1
> [    0.459084] powernv-rng: Registering arch random hook.
> [    0.459141] BUG: Unable to handle kernel data access at 0xffffffffff76184a
> [    0.459147] Faulting instruction address: 0xc00000000001a94c
> [    0.459191] BUG: Unable to handle kernel data access at 0xffffffffff76184a
> [    0.459199] Faulting instruction address: 0xc00000000001a94c
> [    0.459216] BUG: Unable to handle kernel data access at 0xffffffffff76184a
> [    0.459224] Faulting instruction address: 0xc00000000001a94c
> [    0.459228] BUG: Unable to handle kernel data access at 0xffffffffff76184a
> [    0.459234] Faulting instruction address: 0xc00000000001a94c
> [    0.459268] BUG: Unable to handle kernel data access at 0xffffffffff76184a
> [    0.459275] Faulting instruction address: 0xc00000000001a94c
> [    0.459375] 
> [    0.459380] Oops: Kernel access of bad area, sig: 11 [#2]
> [    0.459385] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
> [    0.459390] Modules linked in:
> [    0.459395] CPU: 63 PID: 0 Comm: swapper/63 Tainted: G      D           
> 5.1.0-rc2-gd0ae6c548 #1
> [    0.459401] NIP:  c00000000001a94c LR: c0000000000a6e9c CTR: 
> c000000000008000
> [    0.459407] REGS: c000000f272a3b50 TRAP: 0380   Tainted: G      D          
>   (5.1.0-rc2-gd0ae6c548)
> [    0.459414] MSR:  9000000000001033 <SF,HV,ME,IR,DR,RI,LE>  CR: 24004222  
> XER: 00000000
> [    0.459419] BUG: Unable to handle kernel data access at 0xffffffffff76184c
> [    0.459422] CFAR: c0000000000a6e98 IRQMASK: 1 
> [    0.459422] GPR00: c0000000000a6e9c c000000f272a3de0 0000000000000004 
> 0000000000000006 
> [    0.459422] GPR04: c0000000000a5dd4 0000000024004222 c000000f272a3d48 
> 0000000000000001 
> [    0.459422] GPR08: 0000000000000007 ffffffffff761844 c000000f27244e00 
> 0000c3feb18a5128 
> [    0.459422] GPR12: 0000000000004400 c000000ffff99080 c000000ffe623f90 
> 0000000000000000 
> [    0.459422] GPR16: 0000000000000000 0000000000000000 c00000000004b4d0 
> c00000000004b4a0 
> [    0.459422] GPR20: c000000001526214 0000000000000800 0000000000000001 
> c000000001521b78 
> [    0.459422] GPR24: 000000000000003f 0000000000000000 0000000000080000 
> 0000000000000000 
> [    0.459422] GPR28: c000000001526140 0000000000000001 8000000000000000 
> c000000001525ce0 
> [    0.459443] NIP [c00000000001a94c] irq_set_pending_from_srr1+0x1c/0x50
> [    0.459449] Faulting instruction address: 0xc00000000001a94c
> [    0.459483] LR [c0000000000a6e9c] power7_idle+0x3c/0x50
> [    0.459485] Call Trace:
> [    0.459490] initcall __machine_initcall_powernv_rng_init+0x0/0x334 
> returned 0 after 0 usecs
> [    0.459493] calling  __machine_initcall_pseries_init_ras_IRQ+0x0/0xf4 @ 1
> [    0.459497] [c000000f272a3de0] [c0000000000a6e98] power7_idle+0x38/0x50 
> (unreliable)
> [    0.459500] [c000000f272a3e00] [c0000000000210f4] arch_cpu_idle+0x54/0x160
> [    0.459503] [c000000f272a3e30] [c000000000c47bc4] 
> default_idle_call+0x74/0x88
> [    0.459507] initcall __machine_initcall_pseries_init_ras_IRQ+0x0/0xf4 
> returned 0 after 0 usecs
> [    0.459510] calling  __machine_initcall_pseries_rng_init+0x0/0xa4 @ 1
> [    0.459514] [c000000f272a3e50] [c000000000158f54] do_idle+0x2f4/0x3d0
> [    0.459518] [c000000f272a3ec0] [c000000000159288] 
> cpu_startup_entry+0x38/0x40
> [    0.459523] initcall __machine_initcall_pseries_rng_init+0x0/0xa4 returned 
> 0 after 0 usecs
> [    0.459527] [c000000f272a3ef0] [c00000000004dae4] 
> start_secondary+0x654/0x680
> [    0.459531] [c000000f272a3f90] [c00000000000b25c] 
> start_secondary_prolog+0x10/0x14
> [    0.459535] calling  __machine_initcall_pseries_ioei_init+0x0/0xd8 @ 1
> [    0.459539] Instruction dump:
> [    0.459542] 992d098b 7c630034 5463d97e 4e800020 60000000 3c4c014d 38424dd0 
> 7c0802a6 
> [    0.459549] initcall __machine_initcall_pseries_ioei_init+0x0/0xd8 
> returned 0 after 0 usecs
> [    0.459553] 60000000 3d22ff76 78637722 39291840 <7d4918ae> 2b8a00ff 
> 419e001c 892d098b 
> [    0.459559] calling  uid_cache_init+0x0/0x108 @ 1
> [    0.459564] ---[ end trace fa7343cfd21c8799 ]---
> [    0.459574] initcall uid_cache_init+0x0/0x108 returned 0 after 0 usecs
> [    0.459576] calling  param_sysfs_init+0x0/0x248 @ 1
> 

This is the problem: the nap sequence does a dummy store to 0(r1),
which clobbers our r2 save (-8*0(r1) is the same slot):

>> +#define IDLE_STATE_ENTER_SEQ_NORET(IDLE_INST)                       \
>> +    /* Magic NAP/SLEEP/WINKLE mode enter sequence */        \
>> +    std     r0,0(r1);                                       \
>> +    ptesync;                                                \
>> +    ld      r0,0(r1);                                       \
>> +236:        cmpd    cr0,r0,r0;                                      \
>> +    bne     236b;                                           \
>> +    IDLE_INST;                                              \
>> +    b       .       /* catch bugs */

vs

>> +_GLOBAL(isa206_idle_insn_mayloss)
>> +    std     r1,PACAR1(r13)
>> +    mflr    r4
>> +    mfcr    r5
>> +    /* use stack red zone rather than a new frame for saving regs */
>> +    std     r2,-8*0(r1)
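
To make the overlap concrete: -8*0 evaluates to 0, so the r2 save and
the dummy store both target 0(r1). Below is a toy Python model of that
one slot. The register values are made up for illustration, and the
reload of r2 from the same slot on wakeup is my assumption about the
restore path, which isn't quoted here.

```python
# Toy model of the stack slot at 0(r1). All register values are hypothetical.
def simulate_idle_entry(r0, r1, r2):
    mem = {}
    mem[r1 + (-8 * 0)] = r2    # std r2,-8*0(r1): the offset is 0, not -8
    mem[r1 + 0] = r0           # std r0,0(r1): dummy store hits the same slot
    return mem[r1 + (-8 * 0)]  # assumed wakeup restore: ld r2,-8*0(r1)

restored = simulate_idle_entry(r0=0x1234, r1=0x1000, r2=0xC000000001500000)
print(hex(restored))  # whatever r0 held, not the TOC pointer
```

With these inputs r2 comes back as the old r0 value rather than the
TOC pointer, which is the kind of corruption the oops shows.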

I'm not sure where I broke this. I may have been loading r2 from
PACATOC before.
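
As a sanity check that a bad r2 explains the oops, here is a rough
Python sketch that redoes the address arithmetic from the instruction
dump. The decoding of 3d22ff76, 39291840 and 7d4918ae as addis, addi
and lbzx is my own reading of the dump, so treat it as an assumption;
the register values are GPR02 and GPR03 from the first oops.

```python
MASK64 = (1 << 64) - 1

def sign_extend16(x):
    """Interpret a 16-bit immediate as a signed value."""
    return x - 0x10000 if x & 0x8000 else x

def fault_address(r2, r3):
    # 3d22ff76: addis r9,r2,-138   (assumed decoding)
    r9 = (r2 + (sign_extend16(0xFF76) << 16)) & MASK64
    # 39291840: addi r9,r9,0x1840  (assumed decoding)
    r9 = (r9 + sign_extend16(0x1840)) & MASK64
    # 7d4918ae: lbzx r10,r9,r3 loads from r9 + r3  (assumed decoding)
    return r9, (r9 + r3) & MASK64

r9, ea = fault_address(r2=0x4, r3=0x6)
print(hex(r9), hex(ea))  # 0xffffffffff761844 0xffffffffff76184a
```

Those match GPR09 and the faulting data address in the log, so the
TOC-relative load is simply being computed from r2 = 4.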

Thanks,
Nick
