Hi Pedro,
On 26.04.22 17:01, Pedro Miguel Justo wrote:
On 2022/Apr/26, at 06:34, Frank Scheiner <frank.schei...@web.de> wrote:
@Anton:
So maybe best to give `hardened_usercopy=off` a try on your rx2660, too.
From my testing on rx2660 and rx2620 this seems to unbreak the kernel
boot and maybe also makes it less likely to hit the problem post boot. I
don't know why Adrian's rx2660 seems to be unaffected by this, though.
I did. That is why I ended up compiling 5.17 with the entire thing turned off.
With 5.17, on my rx2660 Montvale with 8 cores the machine can’t get past early
boot even with hardened_usercopy=off.
Those 'warnings' are actually processes being killed. And they depend on the
direction in which the bad copy was happening.
Thanks for the clarification.
If you look at my prior responses, with the 4.19 kernel I was also running
along fine for hours and, after some time building the kernel (a benchmark in
itself), it would start producing these warnings and would not allow compilation
to continue any further. I would reboot the machine and that gave me a few more
hours. When I tried 'hardened_usercopy=off' on the 4.19 kernel, that worked. I
no longer got these process terminations after a few hours and the machine was
able to build the entire kernel from beginning to end.
So, 4.19 and 5.17 are different in many ways (symptom-wise):
- I never got a bugcheck (panic) level failure on 4.19. They were all
process termination level.
- On 4.19 these took quite some time to show up. This seemed to depend on the
number of processes created in the past and was mitigated by a reboot. On
5.17 it was very aggressive, showing up early in boot, even on system threads
like the crypto boot self test. Disabling the crypto boot self test made it go
farther, but not much. If the error is detected on a system thread, there is no
process to terminate: it is game over.
- hardened_usercopy=off was observed by 4.19 but ignored by 5.17
Well, it seems to make a difference for my rx2660, maybe because of
Montecitos instead of Montvales, I don't know. Or it depends on the
available memory (i.e. maybe it happens more/less often with less/more
memory available). Mine has 32 GiB in total.
I don’t exclude the possibility of human error in conducting all these
experiments (some of the process is error prone), but I did run these
experiments more than just a few times, so it would have to be a heck of a
coincidence to end up with consistent results.
Sure, my test results are also more anecdotal as it takes so much time
to boot and run things (`openssl speed -elapsed` takes around 23 mins).
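To make such runs a bit less anecdotal, one could at least count the reports per boot. A minimal sketch, assuming the failures show up in the kernel log with the `usercopy:` prefix that the hardened usercopy code normally uses; the function name `count_usercopy_reports` is made up for illustration, and I have not checked it against the actual messages on the affected machines:

```shell
#!/bin/sh
# Hypothetical helper: count kernel log lines that look like hardened
# usercopy reports. Reads log text on stdin and prints the count.
# Assumption: reports carry the "usercopy:" prefix. Matching on the colon
# keeps a logged "hardened_usercopy=off" command line from being counted.
count_usercopy_reports() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so mask the exit status to keep callers using "set -e" happy.
    grep -ci 'usercopy:' || true
}
```

It could be fed with `dmesg | count_usercopy_reports` after each test run, so the counts for different kernels and command lines can be compared side by side.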
I'll now look at my other Itanium gear, rx2800 i2 first,
Initial testing with 5.17.0-1-mckinley on my rx2800 i2 interestingly shows
no issues with memcopy at all, neither during kernel boot nor post boot. My
kernel cmdline is as follows:
```
root@rx2800-i2:~# cat /proc/cmdline
BOOT_IMAGE=net0:/AC10027B.vmlinuz root=/dev/nfs ip=:::::enp8s0f0:dhcp modprobe.blacklist=hpsa,radeon
```
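Note that the command line above carries no `hardened_usercopy=` parameter at all. To rule out human error when comparing machines, a small check like the following could be used. It is only a sketch: the function name `usercopy_setting` is made up for illustration, and it assumes that the last occurrence of the parameter is the one that counts.

```shell
#!/bin/sh
# Hypothetical helper: print the value of hardened_usercopy= found in a
# kernel command line string (first argument), or "unset" if it is absent.
# Assumption: if the parameter is given more than once, the last one wins.
usercopy_setting() {
    # Split the command line into one parameter per line, keep only the
    # values of hardened_usercopy=..., and take the last one.
    value=$(printf '%s\n' "$1" | tr ' ' '\n' \
        | sed -n 's/^hardened_usercopy=//p' | tail -n 1)
    # An absent (or empty) parameter is reported as "unset".
    printf '%s\n' "${value:-unset}"
}
```

On a running system it would be called as `usercopy_setting "$(cat /proc/cmdline)"`. This only shows what was passed to the kernel, not whether the kernel actually honoured it.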
It could well be that the Tukwilas behave differently in that case. After
all, they have their memory controller included in the processor and
not in the chipset like the older Montecitos or Montvales.
For reference:
firmware info:
```
[rx2800-i2-mp-ilo] CM:hpiLO-> sysrev
SYSREV
Revisions          Active      Pending
-------------------------------------
iLO FW          : 01.54.03
System FW       : 01.93
MHW FPGA        : 02.02
Power Mon FW    : 02.09
PRS HW          : 02.06
IOH HW          : 02.02
Power Supply 1  : 02.01
Power Supply 2  : 02.01
```
hardware info:
```
root@rx2800-i2:~# uname -a
Linux rx2800-i2 5.17.0-1-mckinley #1 SMP Debian 5.17.3-1 (2022-04-18) ia64 GNU/Linux
root@rx2800-i2:~# lscpu
Architecture: ia64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Itanium(R) Processor 9320
Model name: Intel(R) Itanium(R) Processor 9320
BIOS Model name: Intel(R) Itanium(R) Processor 9320
CPU family: 32
Model: 4
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
BogoMIPS: 2920.44
Flags: branchlong, 16-byte atomic ops, 0x8
Caches (sum of all):
L1d: 64 KiB (4 instances)
L1i: 64 KiB (4 instances)
L2d: 1 MiB (4 instances)
L2i: 4 MiB (8 instances)
L3: 32 MiB (8 instances)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
root@rx2800-i2:~# free -m
               total        used        free      shared  buff/cache   available
Mem:           24218         138       23983           2          96       23871
Swap:              0           0           0
```
Cheers,
Frank