Hi,
On 27.05.23 21:34, Linus Torvalds wrote:
On Sat, May 27, 2023 at 11:41 AM Frank Scheiner <frank.schei...@web.de> wrote:
Ok, I put the decoded console messages on [2].
[2]: https://pastebin.com/dLYMijfS
Ugh. Apparently ia64 decoding isn't great. But at least it gives
multiple line numbers:
load_module (kernel/module/main.c:2291 kernel/module/main.c:2412 kernel/module/main.c:2868)
except your kernel obviously has those test-patches, so I still don't
know exactly where they are.
Erm, I see. I did recreate a vanilla v6.4-rc3 and ran that, decoded
result is on [1] - not sure if it makes it a little better.
[1]: https://pastebin.com/z5XzEnhq
I also tried to build and run an SP kernel to maybe get a clearer
picture in the traces, but that seems to require FLATMEM, which does not
seem to work on that machine, or at least not the way it is configured
(and yeah, I also used the wrong commit for it and it was patched...):
```
[ 0.000000] Linux version 6.4.0-rc3-933174ae28ba72ab8de5b35cb7c98fc211235096-patch3_sp (root@x4270) (ia64-linux-gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39) #1 Sat May 27 21:28:44 CEST 2023
[...]
[ 0.000000] ACPI: SSDT 0x000000003FE35BA8 00013C (v01 HP rx2620 00000006 INTL 20050309)
[ 0.000000] ACPI: Local APIC address (____ptrval____)
[ 0.000000] 1 CPUs available, 1 CPUs total
[...]
[ 0.000000] Kernel panic - not syncing: Cannot use FLATMEM with 246784MB hole
[ 0.000000] Please switch over to SPARSEMEM
[ 0.000000] ---[ end Kernel panic - not syncing: Cannot use FLATMEM with 246784MB hole
[ 0.000000] Please switch over to SPARSEMEM ]---
```
But it looks like it is in move_module(). Strange. I don't know how it
gets to "__copy_user" from there...
[ Looks at the ia64 code ]
Oh.
It turns out that it *says* __copy_user(), but the code is actually
shared with the regular memcpy() function, which does
    GLOBAL_ENTRY(memcpy)
            and     r28=0x7,in0
            and     r29=0x7,in1
            mov     f6=f0
            mov     retval=in0
            br.cond.sptk .common_code
            ;;
where that ".common_code" label is - surprise surprise - the common
copy code, and so when the oops reports that the problem happened in
__copy_user(), it actually is in this case just a normal memcpy.
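To illustrate why the oops names __copy_user() here: a backtrace resolves an address to the nearest preceding symbol, and a local label like .common_code is not a symbol of its own. The following is only a rough sketch of that lookup. The table, the addresses, the `next_function` entry and the `resolve()` helper are all made up for illustration, and the layout (memcpy stub first, shared copy code under __copy_user) is an assumption based on the description above, not the kernel's actual kallsyms code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry: start address and name. */
struct sym {
    uint64_t addr;
    const char *name;
};

/*
 * Made-up addresses, for illustration only. memcpy is a short stub that
 * branches into the shared copy code, which is assumed to live under the
 * __copy_user symbol (the .common_code label has no symbol of its own).
 */
static const struct sym symtab[] = {
    { 0x1000, "memcpy" },        /* stub: set up registers, branch away */
    { 0x1020, "__copy_user" },   /* .common_code sits somewhere in here */
    { 0x1800, "next_function" }, /* hypothetical following symbol */
};

/* Return the name of the last symbol whose start address is <= pc. */
static const char *resolve(uint64_t pc)
{
    const char *best = "?";

    for (size_t i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
        if (symtab[i].addr <= pc)
            best = symtab[i].name;
    }
    return best;
}
```

With this table, a fault at a made-up address such as 0x1400 inside the shared copy loop resolves to "__copy_user", even if the caller entered through the memcpy stub at 0x1000.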
Ok, so it's probably the
        memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
in move_module() that takes a fault. And looking at the registers,
the destination is in r17/r18, and your dump has
    unable to handle kernel paging request at virtual address 1000000000000000
    ...
    r17 : 0fffffffffffffff r18 : 1000000000000000
so it's almost certainly that 'dest' that is bad.
Which I guess shouldn't surprise anybody.
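As a sanity check on those register values, here is a small sketch in C. It assumes, as far as I understand the architecture, that ia64 uses the top three bits of a virtual address as the region number and that Linux puts its kernel mappings in regions 5 to 7 (addresses from 0xa000000000000000 upwards). The helper names `ia64_region()` and `looks_like_kernel_address()` are hypothetical and not taken from the kernel source.

```c
#include <stdint.h>

/* Assumption: bits 63..61 of an ia64 virtual address select the region. */
static unsigned int ia64_region(uint64_t vaddr)
{
    return (unsigned int)(vaddr >> 61);
}

/*
 * Assumption: Linux/ia64 keeps kernel mappings in regions 5..7, so a
 * module destination pointer would be expected to land there.
 */
static int looks_like_kernel_address(uint64_t vaddr)
{
    return ia64_region(vaddr) >= 5;
}
```

Fed with the values from the dump, r17 + 1 equals r18, which is also the faulting address, and `ia64_region(0x1000000000000000)` comes out as 0. Under the assumptions above that is not a kernel region at all, which fits the conclusion that 'dest' is the bad pointer.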
But that's where my knowledge of ia64 and the new module loader layout ends.
Thanks for your help and going as far as you could, that's greatly
appreciated. Running that stuff is surely easier than debugging it. :-)
Cheers,
Frank