http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

             Bug #: 51980
           Summary: ARM - Neon code polluted by useless stores to the
                    stack with vuzpq / vzipq / vtrnq
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: eric.ba...@allegorithmic.com


Created attachment 26442
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26442
Minimal repro case (C file)

When using UZP/ZIP/TRN Neon intrinsics, gcc-trunk generates a whole lot of
stack operations (and associated stack alignment operations) even if everything
can purely be done using Neon registers. 

Compiler used is GCC trunk, rev 183468, compiled with Android's build-gcc.sh
(arm-linux-androideabi).

Command line is:
arm-linux-androideabi-g++ -c -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=hard
-mfpu=vfp -flax-vector-conversions -mfpu=neon -O2 -o test.s test.c -S

Generated assembly code for attached C file is:
_Z13sqrlen4D_16u817__simd128_uint8_tS_:
    vabd.u8    q1, q0, q1
    stmfd    sp!, {r4, fp}       <= Unnecessary
    add    fp, sp, #4          <= Unnecessary
    sub    sp, sp, #48         <= Unnecessary
    add    r3, sp, #15         <= Unnecessary
    vmull.u8    q0, d2, d2
    bic    r3, r3, #15         <= Unnecessary
    vmull.u8    q8, d3, d3
    vuzp.32    q0, q8
    vstmia    r3, {d0-d1}         <= Unnecessary, caused by vuzp.32
    vstr    d16, [r3, #16]      <= Unnecessary, caused by vuzp.32
    vstr    d17, [r3, #24]      <= Unnecessary, caused by vuzp.32
    vpaddl.u16    q0, q0
    vpadal.u16    q0, q8
    sub    sp, fp, #4          <= Unnecessary
    ldmfd    sp!, {r4, fp}       <= Unnecessary
    bx    lr

As no stack operation is needed in this function, ideally the following should
be generated instead:
_Z13sqrlen4D_16u817__simd128_uint8_tS_:
    vabd.u8    q1, q0, q1
    vmull.u8    q0, d2, d2
    vmull.u8    q8, d3, d3
    vuzp.32    q0, q8
    vpaddl.u16    q0, q0
    vpadal.u16    q0, q8
    bx    lr

This makes even tight Neon functions written with intrinsics much larger and
slower than necessary, and makes it very hard to write performance-oriented
code with intrinsics in arm-gcc.

gcc -v yields:
Using built-in specs.
COLLECT_GCC=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/bin/arm-linux-androideabi-g++
COLLECT_LTO_WRAPPER=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/lto-wrapper
Target: arm-linux-androideabi
Configured with: /home/eb/android-ndk-r6/src/build/../gcc/gcc-4.7.0/configure
--prefix=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86
--target=arm-linux-androideabi --host=i386-linux-gnu --build=i386-linux-gnu
--with-gnu-as --with-gnu-ld --enable-languages=c,c++
--with-gmp=/tmp/ndk-eb/build/toolchain/temp-install
--with-mpfr=/tmp/ndk-eb/build/toolchain/temp-install
--with-mpc=/tmp/ndk-eb/build/toolchain/temp-install --disable-libssp
--enable-threads --disable-nls --disable-libmudflap --disable-libgomp
--disable-libstdc__-v3 --disable-sjlj-exceptions --disable-shared --disable-tls
--with-float=soft --with-fpu=vfp --with-arch=armv5te --enable-target-optspace
--enable-initfini-array --disable-nls
--prefix=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86
--with-sysroot=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot
--with-binutils-version=2.21.53 --with-mpfr-version=3.0.1
--with-gmp-version=5.0.2 --with-gcc-version=4.7.0 --with-gdb-version=6.6
--with-mpc-version=0.9 --with-arch=armv5te --enable-libstdc__-v3
--program-transform-name='s,^,arm-linux-androideabi-,'
Thread model: posix
gcc version 4.7.0 20120124 (experimental) (GCC) 
COLLECT_GCC_OPTIONS='-c' '-march=armv7-a' '-mcpu=cortex-a9' '-mfloat-abi=hard'
'-mfpu=vfp' '-flax-vector-conversions' '-mfpu=neon' '-O2' '-o' 'test.s' '-S'
'-v' '-mtls-dialect=gnu'

/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/cc1plus
-quiet -v -imultilib armv7-a -D_GNU_SOURCE test.c -mbionic -fPIC -quiet
-dumpbase test.c -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=hard -mfpu=vfp
-mfpu=neon -mtls-dialect=gnu -auxbase-strip test.s -O2 -version
-flax-vector-conversions -o test.s -fno-exceptions -fno-rtti
GNU C++ (GCC) version 4.7.0 20120124 (experimental) (arm-linux-androideabi)
    compiled by GNU C version 4.6.0 20110603 (Red Hat 4.6.0-10), GMP version
5.0.2, MPFR version 3.0.1, MPC version 0.9
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
ignoring nonexistent directory
"/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include/c++/4.7.0"
ignoring nonexistent directory
"/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include/c++/4.7.0/arm-linux-androideabi/armv7-a"
ignoring nonexistent directory
"/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include/c++/4.7.0/backward"
ignoring nonexistent directory
"/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot/usr/local/include"
#include "..." search starts here:
#include <...> search starts here:

/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/include

/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/include-fixed

/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include

/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot/usr/include
End of search list.
GNU C++ (GCC) version 4.7.0 20120124 (experimental) (arm-linux-androideabi)
    compiled by GNU C version 4.6.0 20110603 (Red Hat 4.6.0-10), GMP version
5.0.2, MPFR version 3.0.1, MPC version 0.9
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: d84173bb26a7319ac9d4c1278a6a7e04
COMPILER_PATH=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/bin/
LIBRARY_PATH=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/armv7-a/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/lib/armv7-a/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/lib/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot/usr/lib/
COLLECT_GCC_OPTIONS='-c' '-march=armv7-a' '-mcpu=cortex-a9' '-mfloat-abi=hard'
'-mfpu=vfp' '-flax-vector-conversions' '-mfpu=neon' '-O2' '-o' 'test.s' '-S'
'-v' '-mtls-dialect=gnu'

Reply via email to