On Wed, Jun 29, 2022 at 5:34 PM Wookey <woo...@wookware.org> wrote:
>
> On 2022-06-29 15:13 +0200, Mathieu Malaterre wrote:
> > On Wed, Jun 29, 2022 at 2:48 PM Wookey <woo...@wookware.org> wrote:
>
> > > What exactly is going wrong when you try to use valgrind?
> >
> > Well you should see something like this on abel.d.o:
> >
> > * https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=928224#27
> >
> > Basically anytime you build valgrind using gcc-11 or gcc-12 (debian
> > sid package), you get this weird illegal instruction:
> >
> > ```
> > % ./vg-in-place
> > Illegal instruction
> > ```
>
> I have a strong suspicion that this is neon-itis. The issue generally
> manifests as 'illegal instruction' (i.e. a neon instruction is issued on
> hardware that isn't able to execute it). It has always been the case
> that software should not assume neon is present on v7 (because it
> isn't on all hardware), and most code gets this right, but I've
> recently seen gcc putting those instructions into the startup code
> (where the C-environment is set up and variables allocated) which gets
> executed _before_ any functions checking for which HWCAPS to enable,
> and thus which code to run.
>
> You can check if a binary contains NEON instructions using
> readelf -A
>
> and look for
> Tag_Advanced_SIMD_arch: NEONv1
>
> However just because it's in the binary doesn't mean it's wrong. The
> binary may have been built using ifunc or other mechanisms to choose
> appropriate functions depending on whether or not neon hardware is available.
>
> A simple check for whether this is your issue is just to run the same test on
> harris.debian.org.
> If it works OK there that strongly suggests you have a neon problem.
>
> Also if you run the program under gdb (on abel) and when it barfs do:
> (gdb) disassemble
> and look for instructions that start with 'v', like 'vmov.i32'
> that will confirm which instruction is tripping it up.
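For checking more than one binary, the readelf step above can be scripted. This is only a rough sketch: the function names are made up for illustration, and it assumes `readelf -A` prints the attribute in the `Tag_Advanced_SIMD_arch: NEONv1` form quoted above. As noted, a hit only means NEON code is present somewhere in the binary, not that it is wrongly executed.

```python
import re
import subprocess
from typing import Optional

# Matches the attribute line quoted above, e.g. "  Tag_Advanced_SIMD_arch: NEONv1".
_SIMD_TAG = re.compile(r"^\s*Tag_Advanced_SIMD_arch:\s*(\S.*?)\s*$", re.MULTILINE)


def advanced_simd_tag(readelf_output: str) -> Optional[str]:
    """Return the Tag_Advanced_SIMD_arch value from `readelf -A` output, or None."""
    match = _SIMD_TAG.search(readelf_output)
    return match.group(1) if match else None


def check_binary(path: str) -> Optional[str]:
    """Run `readelf -A` on a file (hypothetical helper) and return its SIMD tag."""
    result = subprocess.run(
        ["readelf", "-A", path], capture_output=True, text=True, check=True
    )
    return advanced_simd_tag(result.stdout)
```

Something like `check_binary("./coregrind/valgrind")` would then return the tag value or `None`; the path here is just a placeholder.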
>
> This bug has an example of the problem:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=998043
>
> I got partway through a long followup with some details of possible
> fixes some months ago but got sidetracked (and oh look it's been
> pending for 6 months already).
>
> The reason this has broken appears to be that gcc has changed the way
> the fpu is specified/defaulted, so neon _and_ fp are enabled by
> default if no specific fpu option is given. (i.e. we just set
> -march=armv7). It used to be that -march=armv7 implied +nosimd. (or
> something like that - I never quite got to the bottom of it enough to
> be sure exactly what the right general or specific fix was).
>
> If you rebuild with
> -march=armv7-a+nosimd+nofp
> or
> -march=armv7-a+nosimd+fp
> you should be able to determine if being more explicit about the fp and
> simd(neon) instructions used makes it behave.

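Before doing a full rebuild, it may be quicker to ask the compiler what a given -march setting enables, by dumping its predefined macros with `-dM -E` and looking for the ACLE macro `__ARM_NEON`. A minimal sketch follows; the helper names are hypothetical, and I am assuming the compiler accepts the -march strings quoted above without further options.

```python
import subprocess
from typing import Dict, List


def parse_defines(macro_dump: str) -> Dict[str, str]:
    """Turn `#define NAME VALUE` lines (as printed by `gcc -dM -E`) into a dict."""
    defines = {}
    for line in macro_dump.splitlines():
        parts = line.split(None, 2)
        if len(parts) >= 2 and parts[0] == "#define":
            defines[parts[1]] = parts[2] if len(parts) == 3 else ""
    return defines


def neon_enabled(macro_dump: str) -> bool:
    """True if __ARM_NEON is predefined, i.e. the compiler is targeting NEON."""
    return "__ARM_NEON" in parse_defines(macro_dump)


def dump_macros(cc: str, flags: List[str]) -> str:
    """Run e.g. `gcc-11 <flags> -dM -E -x c /dev/null` and return its stdout."""
    cmd = [cc, *flags, "-dM", "-E", "-x", "c", "/dev/null"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

For example, `neon_enabled(dump_macros("gcc-11", ["-march=armv7-a+nosimd+fp"]))` could be compared against the same call with no -march flag at all.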
If I compare gcc-10 vs gcc-11 I see:

malat@abel ~ % gcc-10 --verbose
Using built-in specs.
COLLECT_GCC=gcc-10
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/10/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Debian 10.3.0-16' --with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-10 --program-prefix=arm-linux-gnueabihf- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-sjlj-exceptions --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.3.0 (Debian 10.3.0-16)

while

malat@abel ~ % gcc-11 --verbose
Using built-in specs.
COLLECT_GCC=gcc-11
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/11/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Debian 11.3.0-3' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=arm-linux-gnueabihf- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-sjlj-exceptions --with-arch=armv7-a+fp --with-float=hard --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.3.0 (Debian 11.3.0-3)

Could someone confirm that the spec file is accurate for Debian armhf (no
neon)? I fail to understand why the spec file would be different for us
(--with-arch=armv7-a --with-fpu=vfpv3-d16 suddenly became
--with-arch=armv7-a+fp).

If I read the doc online correctly:

https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html

states:

-mfpu=name
[...]
The setting ‘auto’ is the default and is special. It causes the
compiler to select the floating-point and Advanced SIMD instructions
based on the settings of -mcpu and -march.

In the case of valgrind I can see:

` -marm -mcpu=cortex-a8`

I cannot find in the doc what 'cortex-a8' stands for: neon or not neon?

> It seems likely that you have hit this problem.
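To compare the two `Configured with:` lines above without eyeballing them, something like the following could be used. It is only a sketch with made-up function names, and it assumes the `gcc-N --verbose` output (which gcc writes to stderr) has been captured into strings beforehand.

```python
import shlex
from typing import Dict, Optional, Tuple


def configure_options(verbose_output: str) -> Dict[str, str]:
    """Extract the --with-* options from the 'Configured with:' line."""
    prefix = "Configured with:"
    for line in verbose_output.splitlines():
        if line.startswith(prefix):
            # shlex keeps quoted values such as 'Debian 10.3.0-16' together.
            words = shlex.split(line[len(prefix):])
            pairs = (word.partition("=") for word in words)
            return {k: v for k, _, v in pairs if k.startswith("--with-")}
    return {}


def changed_options(
    old: Dict[str, str], new: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each option that differs to its (old, new) values; None means absent."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
```

Run against the two outputs above, `changed_options` should report `--with-arch` and `--with-fpu` (plus the package version and bug URL), which is the change I am asking about.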
> I think this is the same thing too:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=982794
> (Firefox dying with illegal instruction on non-neon hardware)
> I _suspect_ that debian needs to change the default flags to actually
> say 'armv7+fp+nosimd' by default so that we get what we expect (and
> define as the base ISA) and it doesn't depend on what hardware the
> build was done on.

Ah! Now it starts to make sense.

> Wookey
> --
> Principal hats: Debian, Wookware, ARM
> http://wookware.org/