Uploaded to bionic [0]. It is now waiting for SRU acceptance, after which valgrind will start building in bionic-proposed and the testing phase of the SRU can begin.

[0] [ubuntu/bionic-proposed] valgrind 1:3.13.0-2ubuntu2.2 (Waiting for approval)

- Eric

** Description changed:

  [Impact]

  valgrind on bionic dumps core and errors out as follows:

  ARM64 front end: branch_etc
  disInstr(arm64): unhandled instruction 0xD5380000
  disInstr(arm64): 1101'0101 0011'1000 0000'0000 0000'0000
  ==11950== valgrind: Unrecognised instruction at address 0x4014c90.
  ==11950==    at 0x4014C90: init_cpu_features (cpu-features.c:72)
  ==11950==    by 0x4014C90: dl_platform_init (dl-machine.h:208)
  ==11950==    by 0x4014C90: _dl_sysdep_start (dl-sysdep.c:231)
  ==11950==    by 0x40018C3: _dl_start_final (rtld.c:414)
  ==11950==    by 0x4001B47: _dl_start (rtld.c:523)
  ==11950==    by 0x40011C7: ??? (in /lib/aarch64-linux-gnu/ld-2.27.so)
  ==11950== Your program just tried to execute an instruction that Valgrind
  ==11950== did not recognise. There are two possible reasons for this.
  ==11950== 1. Your program has a bug and erroneously jumped to a non-code
  ==11950==    location. If you are running Memcheck and you just saw a
  ==11950==    warning about a bad jump, it's probably your program's fault.
  ==11950== 2. The instruction is legitimate but Valgrind doesn't handle it,
  ==11950==    i.e. it's Valgrind's fault. If you think this is the case or
  ==11950==    you are not sure, please let us know and we'll try to fix it.
  ==11950== Either way, Valgrind will now raise a SIGILL signal which will
  ==11950== probably kill your program.
  ==11950==
  ==11950== Process terminating with default action of signal 4 (SIGILL)
  ==11950==  Illegal opcode at address 0x4014C90
  ==11950==    at 0x4014C90: init_cpu_features (cpu-features.c:72)
  ==11950==    by 0x4014C90: dl_platform_init (dl-machine.h:208)
  ==11950==    by 0x4014C90: _dl_sysdep_start (dl-sysdep.c:231)
  ==11950==    by 0x40018C3: _dl_start_final (rtld.c:414)
  ==11950==    by 0x4001B47: _dl_start (rtld.c:523)
  ==11950==    by 0x40011C7: ??? (in /lib/aarch64-linux-gnu/ld-2.27.so)

  The crash occurs because Valgrind is trying to simulate the CPU instructions when debugging a specific process.
  Valgrind tries to disassemble every instruction the process executes
  and inserts its debugging instructions at run time. However, in this
  case, Valgrind cannot identify the MIDR_EL1 register access made by
  the "mrs %0, midr_el1" instruction. This instruction reads the CPU ID
  register into the %0 (id) variable.

  asm volatile ("mrs %0, midr_el1" : "=r"(id));

  So Valgrind does not recognize "midr_el1" and crashes.

  - https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt
  ....
  d) CPU Identification :
- MIDR_EL1 is exposed to help identify the processor. On a
- heterogeneous system, this could be racy (just like getcpu()). The
- process could be migrated to another CPU by the time it uses the
- register value, unless the CPU affinity is set. Hence, there is no
- guarantee that the value reflects the processor that it is
- currently executing on. The REVIDR is not exposed due to this
- constraint, as REVIDR makes sense only in conjunction with the
- MIDR. Alternately, MIDR_EL1 and REVIDR_EL1 are exposed via sysfs
- at:
+ MIDR_EL1 is exposed to help identify the processor. On a
+ heterogeneous system, this could be racy (just like getcpu()). The
+ process could be migrated to another CPU by the time it uses the
+ register value, unless the CPU affinity is set. Hence, there is no
+ guarantee that the value reflects the processor that it is
+ currently executing on. The REVIDR is not exposed due to this
+ constraint, as REVIDR makes sense only in conjunction with the
+ MIDR. Alternately, MIDR_EL1 and REVIDR_EL1 are exposed via sysfs
+ at:

- /sys/devices/system/cpu/cpu$ID/regs/identification/
- \- midr
- \- revidr
+ /sys/devices/system/cpu/cpu$ID/regs/identification/
+ \- midr
+ \- revidr

  [Test Case]

  1) Write a 'Hello World' program:
  ----
  #include <stdio.h>

  int main(void)
  {
      printf("Hello World!\n");
      return 0;
  }
  ----

  2) Build it:
  $ cc -o hello hello.c

  3) Then run valgrind on it:
  $ valgrind ./hello

  [Regression Potential]

  The regression risk should be low. The symptom occurs when Valgrind
  tries to disassemble code inside glibc
  (sysdeps/unix/sysv/linux/aarch64/cpu-features.c). Even if HWCAP_CPUID
  is not supported, the midr variable defaults to 0, so I think it's
  not an important feature to support.
+
+ As stated in the fix itself as a comment:
+
+ ++ /* Limit the AT_HWCAP to just those features we explicitly
+ ++ support in VEX. */
+
  Additionally, the fix is already in Ubuntu (disco and later).

  If a regression does occur for some reason, it will be limited to the
  ARM architecture and shouldn't affect other CPU architectures.

  [Other information]

  Upstream fix:
  https://sourceware.org/git/?p=valgrind.git;a=commit;h=fbbb696c5d1e93d4ac6cb548c68bb3f443ceef42

  * Only affecting Bionic:

  # git describe --contains fbbb696c5d1e93d4ac6cb548c68bb3f443ceef42
  VALGRIND_3_14_0~96

  # rmadison valgrind
  =>
  valgrind | 1:3.13.0-2ubuntu2.1 | bionic-updates
  valgrind | 1:3.14.0-2ubuntu6   | disco
  valgrind | 1:3.15.0-1ubuntu3.1 | eoan-updates
  valgrind | 1:3.15.0-1ubuntu5   | focal

  [Original Description]

  I'm performing Valgrind testing on an ElPotato running an Ubuntu
  Bionic Aarch64 image. My program is dying like in
  https://bugs.kde.org/show_bug.cgi?id=381556 :

  ```
  $ valgrind --track-origins=yes --suppressions=cryptopp.supp ./cryptest.exe v
  ==12969== Memcheck, a memory error detector
  ==12969== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
  ==12969== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
  ==12969== Command: ./cryptest.exe v
  ==12969==
  ARM64 front end: branch_etc
  disInstr(arm64): unhandled instruction 0xD5380000
  disInstr(arm64): 1101'0101 0011'1000 0000'0000 0000'0000
  ==12969== valgrind: Unrecognised instruction at address 0x4014c90.
  ==12969==    at 0x4014C90: init_cpu_features (cpu-features.c:72)
  ==12969==    by 0x4014C90: dl_platform_init (dl-machine.h:208)
  ==12969==    by 0x4014C90: _dl_sysdep_start (dl-sysdep.c:231)
  ==12969==    by 0x40018C3: _dl_start_final (rtld.c:414)
  ==12969==    by 0x4001B47: _dl_start (rtld.c:523)
  ==12969==    by 0x40011C7: ??? (in /lib/aarch64-linux-gnu/ld-2.27.so)
  ...
  ```

  Here's a similar Red Hat issue report:
  https://bugzilla.redhat.com/show_bug.cgi?id=1467952 .

  Please pick up the patch in the 381556 bug report.

  -----

  $ lsb_release -rd
  Description: Ubuntu 18.04.2 LTS
  Release: 18.04

  $ apt-cache policy valgrind
  valgrind:
    Installed: 1:3.13.0-2ubuntu2.1
    Candidate: 1:3.13.0-2ubuntu2.1
    Version table:
   *** 1:3.13.0-2ubuntu2.1 500
          500 http://ports.ubuntu.com bionic-updates/main arm64 Packages
          100 /var/lib/dpkg/status
       1:3.13.0-2ubuntu2 500
          500 http://ports.ubuntu.com bionic/main arm64 Packages

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1826811

Title:
  Valgrind unhandled instruction 0xD5380000 on Aarch64

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1826811/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs