On 27/10/2021 at 09:47, Nicholas Piggin wrote:
Excerpts from Christophe Leroy's message of October 27, 2021 3:51 pm:
On 27/10/2021 at 07:25, Nicholas Piggin wrote:
Excerpts from Christophe Leroy's message of October 27, 2021 3:00 pm:
On 27/10/2021 at 06:10, Nicholas Piggin wrote:
Excerpts from Jacques de Laval's message of October 26, 2021 6:07 am:
Hi,
We are trying to upgrade the kernel from 5.10 to 5.14.11. We have a
Freescale/NXP T1023 SoC with two e5500 cores, and we are running in 32-bit mode:
CONFIG_PPC32=y
# CONFIG_PPC64 is not set
#
# Processor support
#
# CONFIG_PPC_BOOK3S_32 is not set
CONFIG_PPC_85xx=y
# CONFIG_PPC_8xx is not set
# CONFIG_40x is not set
# CONFIG_44x is not set
CONFIG_GENERIC_CPU=y
# CONFIG_E5500_CPU is not set
# CONFIG_E6500_CPU is not set
CONFIG_E500=y
CONFIG_PPC_E500MC=y
CONFIG_PPC_FPU_REGS=y
CONFIG_PPC_FPU=y
CONFIG_FSL_EMB_PERFMON=y
CONFIG_FSL_EMB_PERF_EVENT=y
CONFIG_FSL_EMB_PERF_EVENT_E500=y
CONFIG_BOOKE=y
CONFIG_FSL_BOOKE=y
CONFIG_PPC_FSL_BOOK3E=y
CONFIG_PTE_64BIT=y
CONFIG_PHYS_64BIT=y
CONFIG_PPC_MMU_NOHASH=y
CONFIG_PPC_BOOK3E_MMU=y
# CONFIG_PMU_SYSFS is not set
CONFIG_SMP=y
CONFIG_NR_CPUS=2
CONFIG_PPC_DOORBELL=y
# end of Processor support
We compile using the 32-bit Bootlin PPC toolchain:
powerpc-e500mc glibc bleeding-edge 2020.08-1.
When booting and starting PID 1, we sometimes get a hang. Nothing but our init
process is running, and for debugging purposes our init currently looks like
this:
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	/* Write an incrementing counter to the kernel log once per second */
	for (int i = 0; ; i++) {
		FILE *fp = fopen("/dev/kmsg", "w");

		if (fp) {
			fprintf(fp, "%d\n", i);
			fclose(fp);
		}
		sleep(1);
	}
	return 0;
}
When the hang occurs we don't get any output at all from our PID 1.
The last output is from the kernel:
Run /sbin/init as init process
with arguments:
/sbin/init
with environment:
HOME=/
TERM=linux
kgdboc=ttyS0,115200
When issuing a backtrace on all active CPUs we can see that the kernel is
handling an instruction storage exception:
sysrq: Show backtrace of all active CPUs
sysrq: CPU0:
CPU: 0 PID: 1 Comm: init Not tainted 5.14.11 #1
NIP: c02aac78 LR: c02aac2c CTR: 00000000
REGS: c1907d40 TRAP: 0500 Not tainted (5.14.11)
MSR: 00029002 <CE,EE,ME> CR: 82244284 XER: 20000000
GPR00: 0000000f c1907e20 c1910000 0065a000 000001d0 01100cca c1907e84 0000000c
GPR08: d39a8000 000001d3 0000000c c1907f10 42244284 00000000 00740514 bfb71670
GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
GPR24: 00708558 00701698 d3994040 00029002 c1907f20 0065a238 00000355 d39a2790
NIP [c02aac78] handle_mm_fault+0xf8/0x11f0
LR [c02aac2c] handle_mm_fault+0xac/0x11f0
Call Trace:
[c1907e20] [c02aac10] handle_mm_fault+0x90/0x11f0 (unreliable)
[c1907ec0] [c003078c] ___do_page_fault+0x26c/0x780
[c1907ef0] [c0030cd4] do_page_fault+0x34/0x100
[c1907f10] [c0000988] InstructionStorage+0x108/0x120
--- interrupt: 400 at 0x65a238
NIP: 0065a238 LR: 0052f26c CTR: 0052f260
REGS: c1907f20 TRAP: 0400 Not tainted (5.14.11)
MSR: 0002d002 <CE,EE,PR,ME> CR: 42242284 XER: 00000000
GPR00: b7be9914 bfb71620 b7c203a0 8c008000 0070400d b7c182a0 000b8260 0052f260
GPR08: 0047d448 0052f260 0000000a 00000003 42242284 00000000 00740514 bfb71670
GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
GPR24: 00708558 00701698 00700000 00000015 b7c1c2b0 00707e20 b7c1b8a8 bfb71660
NIP [0065a238] 0x65a238
LR [0052f26c] 0x52f26c
--- interrupt: 400
Instruction dump:
60a500c0 811f0020 57aa6cfa 813f0000 57a30026 809f004c 81080024 7d29e850
90a1002c 5529a33e 93c10038 7d244a14 <90610034> 7d485215 91210030 41c203dc
We have also observed that the CPU is continuously servicing the same interrupt
(north of 140k times per second); it is not deadlocked.
We have not yet been able to reproduce this behavior under QEMU system
emulation.
When bisecting between 5.10 and 5.14.11 we found that this behavior started
with commit a01a3f2ddbcda83e8572787c0ec1dcbeba86915a ("powerpc: remove
arguments from fault handler functions").
Our best guess is that the instruction storage exception is not properly
handled and the kernel is never able to recover from the page fault, but we
don't really know how to proceed. Does anyone have any suggestions or
insights?
Thank you for the excellent work to investigate and report this.
Before my patch, zero was passed as the page fault error code; after my patch
it passes the contents of the ESR SPR.
It looks like the BookE instruction storage interrupt does not set ESR
(except for BO interrupts, maybe?), so you're getting whatever was in the ESR
register from a previous interrupt. If that was a store, then access_error()
won't cause a segfault, because is_exec is true and that test bails out
early; it might then just keep retrying the interrupt. That could explain
why you don't always see the same thing.
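To make that concrete, the ordering in access_error() (arch/powerpc/mm/fault.c)
that this theory relies on is roughly the following. This is a simplified
paraphrase of the 5.14 logic, not verbatim source; the real function handles
more cases:

#include <linux/mm.h>

/* Simplified paraphrase of access_error(), not verbatim source. An exec
 * fault is judged on VM_EXEC alone and returns early, so a stale "store"
 * bit in the error code never reaches the VM_WRITE check below, and no
 * segfault is raised for an executable mapping. */
static bool access_error(bool is_write, bool is_exec,
			 struct vm_area_struct *vma)
{
	if (is_exec)
		return !(vma->vm_flags & VM_EXEC);	/* bails out here */

	if (is_write)
		return !(vma->vm_flags & VM_WRITE);

	return !(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE));
}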
Now, the previous code still saved ESR in regs->esr/dsisr for some reason. I
can't quite see why that should have been necessary though. Does this patch,
which stores 0 into _ESR(r11) instead, help?
According to the e500 Reference Manual, on ISI:
BO is set if the instruction fetch caused a byte-ordering exception;
otherwise cleared. *All* other defined ESR bits are *cleared*.
You're right. In that case it shouldn't change anything unless there was a
BO fault. I'm not sure what the problem is then. Guessing based on the NIP
and instructions, it looks like it has probably got the correct user address
that it's storing into vmf on the stack, so it has got past the access
checks and my theory would be wrong anyway.
Must be something simple, but I can't see it yet.
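The store in question is presumably the on-stack struct vm_fault that
__handle_mm_fault() initializes; from memory of the 5.14 source it looks
roughly like this:

	struct vm_fault vmf = {
		.vma = vma,
		.address = address & PAGE_MASK,
		.flags = flags,
		.pgoff = linear_page_index(vma, address),
		.gfp_mask = __get_fault_gfp_mask(vma),
	};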
Anyway, I think it is still worth doing the test with 0 stored in _ESR(r11);
maybe the Reference Manual is wrong.
So Jacques, please do the test anyway if you can.
Thanks
Christophe