http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56351
Bug #: 56351
Summary: ARM Big-Endian: storing local double to packed variable causes corruption
Classification: Unclassified
Product: gcc
Version: 4.7.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: set...@google.com

Created attachment 29478
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29478
Test case which demonstrates incorrect codegen

The attached code behaves incorrectly on my platform with gcc 4.7.2. In particular, the output is:

val is: 1.234567 (0x3FF3C0C9:539B8887)
Calling PrintAndStoreUnaligned: 57432423068808260924249171392224224725059031612325630140261797720764832869069412330679690067968.000000 (0x539B8887:3FF3C0C9)
unaligned_double.val is: 57432423068808260924249171392224224725059031612325630140261797720764832869069412330679690067968.000000 (0x539B8887:3FF3C0C9)

It appears that storing a double parameter into an unaligned variable can cause all accesses to that parameter within the function to have the upper and lower 32 bits swapped.

This code is being built for a TI TMS570-series processor, although I suspect the problem would occur with any big-endian ARM target with VFPv3 floating-point support.

Here's compiler info. To build the compiler with these flags requires a minor patch:
http://gcc.gnu.org/ml/gcc-patches/2013-02/msg00791.html

% third_party/car/embedded/toolchains/gcc_tms570/bin/armeb-unknown-eabi-gcc -v -save-temps -O1 -c gcc_bug.c -o gcc_bug.o -Wa,-adhlsn=gcc_bug.lst
Using built-in specs.
COLLECT_GCC=third_party/car/embedded/toolchains/gcc_tms570/bin/armeb-unknown-eabi-gcc
Target: armeb-unknown-eabi
Configured with: ../gcc-4.7.2/configure --prefix=/usr/local/google/armeb/toolchain --build=x86_64-cross-linux-gnu --target=armeb-unknown-eabi --host=x86_64-cross-linux-gnu --with-sysroot=/usr/local/google/armeb/sysroot --with-newlib --with-headers=../newlib-1.19.0/newlib/libc/include --disable-nls --enable-languages=c,c++ --enable-c99 --enable-long-long --with-mpfr=/usr/local/google/armeb/toolchain --with-gmp=/usr/local/google/armeb/toolchain --with-mpc=/usr/local/google/armeb/toolchain --disable-multilib --with-abi=aapcs --with-arch=armv7-r --with-mode=thumb --with-float=hard --with-fpu=vfpv3-d16 --disable-threads --disable-shared --disable-libgomp --disable-libmudflap --disable-libssp
Thread model: single
gcc version 4.7.2 (GCC)
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O1' '-c' '-o' 'gcc_bug.o' '-march=armv7-r' '-mfloat-abi=hard' '-mfpu=vfpv3-d16' '-mabi=aapcs' '-mthumb'
 /google/src/cloud/sethml/head2/google3/third_party/car/embedded/toolchains/gcc_tms570/bin/../libexec/gcc/armeb-unknown-eabi/4.7.2/cc1 -E -quiet -v -iprefix /google/src/cloud/sethml/head2/google3/third_party/car/embedded/toolchains/gcc_tms570/bin/../lib/gcc/armeb-unknown-eabi/4.7.2/ -D__USES_INITFINI__ gcc_bug.c -march=armv7-r -mfloat-abi=hard -mfpu=vfpv3-d16 -mabi=aapcs -mthumb -O1 -fpch-preprocess -o gcc_bug.i
ignoring duplicate directory "/google/src/cloud/sethml/head2/google3/third_party/car/embedded/toolchains/gcc_tms570/bin/../lib/gcc/../../lib/gcc/armeb-unknown-eabi/4.7.2/include"
ignoring nonexistent directory "/usr/local/google/armeb/sysroot/usr/local/include"
ignoring duplicate directory "/google/src/cloud/sethml/head2/google3/third_party/car/embedded/toolchains/gcc_tms570/bin/../lib/gcc/../../lib/gcc/armeb-unknown-eabi/4.7.2/include-fixed"
ignoring duplicate directory "/google/src/cloud/sethml/head2/google3/third_party/car/embedded/toolchains/gcc_tms570/bin/../lib/gcc/../../lib/gcc/armeb-unknown-eabi/4.7.2/../../../../armeb-unknown-eabi/include"
#include "..." search starts here:
#include <...> search starts here:
 /google/src/cloud/sethml/head2/google3/third_party/car/embedded/toolchains/gcc_tms570/bin/../lib/gcc/armeb-unknown-eabi/4.7.2/include
 /google/src/cloud/sethml/head2/google3/third_party/car/embedded/toolchains/gcc_tms570/bin/../lib/gcc/armeb-unknown-eabi/4.7.2/include-fixed
 /google/src/cloud/sethml/head2/google3/third_party/car/embedded/toolchains/gcc_tms570/bin/../lib/gcc/armeb-unknown-eabi/4.7.2/../../../../armeb-unknown-eabi/include
 /usr/local/google/armeb/sysroot/usr/include
End of search list.
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O1' '-c' '-o' 'gcc_bug.o' '-march=armv7-r' '-mfloat-abi=hard' '-mfpu=vfpv3-d16' '-mabi=aapcs' '-mthumb'
 /google/src/cloud/sethml/head2/google3/third_party/car/embedded/toolchains/gcc_tms570/bin/../libexec/gcc/armeb-unknown-eabi/4.7.2/cc1 -fpreprocessed gcc_bug.i -quiet -dumpbase gcc_bug.c -march=armv7-r -mfloat-abi=hard -mfpu=vfpv3-d16 -mabi=aapcs -mthumb -auxbase-strip gcc_bug.o -O1 -version -o gcc_bug.s
GNU C (GCC) version 4.7.2 (armeb-unknown-eabi)
 compiled by GNU C version 4.6.x-google 20120601 (prerelease), GMP version 5.0.5, MPFR version 3.1.1, MPC version 1.0.1
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C (GCC) version 4.7.2 (armeb-unknown-eabi)
 compiled by GNU C version 4.6.x-google 20120601 (prerelease), GMP version 5.0.5, MPFR version 3.1.1, MPC version 1.0.1
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 67327bcd17af73e1cc289bfa68add0a9
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O1' '-c' '-o' 'gcc_bug.o' '-march=armv7-r' '-mfloat-abi=hard' '-mfpu=vfpv3-d16' '-mabi=aapcs' '-mthumb'
 /google/src/cloud/sethml/head2/google3/third_party/car/embedded/toolchains/gcc_tms570/bin/../lib/gcc/armeb-unknown-eabi/4.7.2/../../../../armeb-unknown-eabi/bin/as -march=armv7-r -mfloat-abi=hard -mfpu=vfpv3-d16 -meabi=5 -adhlsn=gcc_bug.lst -o gcc_bug.o gcc_bug.s
COMPILER_PATH=/google/src/cloud/sethml/head2/google3/third_party/car/embedded/toolchains/gcc_tms570/bin/../libexec/gcc/armeb-unknown-eabi/4.7.2/:/google/src/cloud/sethml/head2/google3/third_party/car/embedded/toolchains/gcc_tms570/bin/../libexec/gcc/:/google/src/cloud/sethml/head2/google3/third_party/car/embedded/toolchains/gcc_tms570/bin/../lib/gcc/armeb-unknown-eabi/4.7.2/../../../../armeb-unknown-eabi/bin/
LIBRARY_PATH=/google/src/cloud/sethml/head2/google3/third_party/car/embedded/toolchains/gcc_tms570/bin/../lib/gcc/armeb-unknown-eabi/4.7.2/:/google/src/cloud/sethml/head2/google3/third_party/car/embedded/toolchains/gcc_tms570/bin/../lib/gcc/:/google/src/cloud/sethml/head2/google3/third_party/car/embedded/toolchains/gcc_tms570/bin/../lib/gcc/armeb-unknown-eabi/4.7.2/../../../../armeb-unknown-eabi/lib/
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O1' '-c' '-o' 'gcc_bug.o' '-march=armv7-r' '-mfloat-abi=hard' '-mfpu=vfpv3-d16' '-mabi=aapcs' '-mthumb'

Here's disassembly of the bad code. The problem seems to be the fmrs instructions copying from s0/s1. ARM document "DDI0363E ARM Cortex-R4-r1p3 technical reference" in section "12.2.1 FPU views of the register bank" says:

  The mapping between the registers is as follows:
  • S<2n> maps to the least significant half of D<n>
  • S<2n+1> maps to the most significant half of D<n>.

So, in the code below, r4 gets s0, which is the least significant half of d0, and r5 gets s1, which is the most significant half of d0. Then it stores r4 first in memory (at the lower address), which is incorrect for a big-endian architecture. Likewise, the fmdrr instruction is defined as taking the least significant half from its first core-register argument, so the fmdrr instruction on line 64 reassembles d0 with its halves swapped.
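The reasoning above can be checked on any host with a small model. This is only a sketch of the register mapping and store order as this report describes them, not target code: the type `DRegModel` and all function names are made up for illustration, and the bit pattern for 1.234567 is the one printed in the output above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Host-side model only (hypothetical names): one VFP double register
   viewed as two single registers, per the TRM mapping quoted above. */
typedef struct {
    uint32_t s_even; /* S<2n>:   least significant half of D<n> */
    uint32_t s_odd;  /* S<2n+1>: most significant half of D<n>  */
} DRegModel;

/* Reinterpret a double as its 64-bit IEEE 754 bit pattern. */
uint64_t double_to_bits(double v) {
    uint64_t bits;
    assert(sizeof v == sizeof bits);
    memcpy(&bits, &v, sizeof bits);
    return bits;
}

/* Split a 64-bit pattern into the two single-register views. */
DRegModel load_d(uint64_t bits) {
    DRegModel d = { (uint32_t)bits, (uint32_t)(bits >> 32) };
    return d;
}

/* Reassemble the 64-bit pattern held in the modeled D register. */
uint64_t read_d(DRegModel d) {
    return ((uint64_t)d.s_odd << 32) | d.s_even;
}

/* fmdrr dN, rA, rB: rA supplies the least significant half. */
DRegModel fmdrr(uint32_t ra, uint32_t rb) {
    DRegModel d = { ra, rb };
    return d;
}

/* Big-endian memory: the word at offset 0 is the most significant. */
uint64_t big_endian_image(uint32_t word_at_0, uint32_t word_at_4) {
    return ((uint64_t)word_at_0 << 32) | word_at_4;
}

/* Models lines 60-61 and 68-69 of the gcc 4.7.2 listing. */
uint64_t stored_by_472(uint64_t bits) {
    DRegModel d0 = load_d(bits);
    uint32_t r4 = d0.s_even;         /* fmrs r4, s0 */
    uint32_t r5 = d0.s_odd;          /* fmrs r5, s1 */
    return big_endian_image(r4, r5); /* str r4, [r3, #0]; str r5, [r3, #4] */
}

/* Models lines 62-64 of the gcc 4.7.2 listing. */
uint64_t passed_by_472(uint64_t bits) {
    DRegModel d0 = load_d(bits);
    uint32_t r2 = d0.s_even;         /* fmrs r2, s0 */
    uint32_t r3 = d0.s_odd;          /* fmrs r3, s1 */
    return read_d(fmdrr(r3, r2));    /* fmdrr d0, r3, r2 */
}
```

Feeding 0x3FF3C0C9:539B8887 into either `stored_by_472` or `passed_by_472` yields 0x539B8887:3FF3C0C9 in this model, which is the corrupted pattern shown in the program output.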
(It's also worth noting that the code below creates a lot of unnecessary temporaries, but that's not my bug.)

On gcc 4.7.2:

  56              _ZN3car22PrintAndStoreUnalignedEd:
  59 002c B538      push  {r3, r4, r5, lr}
  60 002e EE104A10  fmrs  r4, s0  @ int
  61 0032 EE105A90  fmrs  r5, s1  @ int
  62 0036 EE102A10  fmrs  r2, s0  @ int
  63 003a EE103A90  fmrs  r3, s1  @ int
  64 003e EC423B10  fmdrr d0, r3, r2
  65 0042 F7FFFFFE  bl    _ZN3car11PrintDoubleEd
  66 0046 F2400300  movw  r3, #:lower16:.LANCHOR0
  67 004a F2C00300  movt  r3, #:upper16:.LANCHOR0
  68 004e 605D      str   r5, [r3, #4]
  69 0050 601C      str   r4, [r3, #0]
  70 0052 BD38      pop   {r3, r4, r5, pc}

The latest gcc 4.8 snapshot produces correct code, although I'm not totally convinced that it's fixed the underlying problem, as opposed to just happening to avoid the problem by emitting slightly different instructions:

On gcc 4.8-20130210:

  44              PrintAndStoreUnaligned:
  47 0020 B538      push  {r3, r4, r5, lr}
  48 0022 EC523B10  fmrrd r3, r2, d0
  49 0026 4614      mov   r4, r2
  50 0028 461D      mov   r5, r3
  51 002a 4622      mov   r2, r4
  52 002c EC423B10  fmdrr d0, r3, r2
  53 0030 F7FFFFFE  bl    PrintDouble
  54 0034 F2400300  movw  r3, #:lower16:unaligned_double
  55 0038 F2C00300  movt  r3, #:upper16:unaligned_double
  56 003c 605D      str   r5, [r3, #4]
  57 003e 601C      str   r4, [r3]
  58 0040 BD38      pop   {r3, r4, r5, pc}

I'm currently building gcc 4.7-20130209 to see if the bug is already fixed in the 4.7 branch. I'll update this bug when my build completes.
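For readers without access to the attachment, the pattern under discussion looks roughly like the sketch below. This is not the attached test case: it is a guess at its shape, assuming a packed struct holding a single double and reusing only the names visible in the output and listings (`PrintDouble`, `PrintAndStoreUnaligned`, `unaligned_double`, `val`). The helper `StoredValueMatches` is hypothetical and added only so the result can be checked.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed layout: a packed struct, so the double has alignment 1.
   __attribute__((packed)) is a GCC/Clang extension. */
struct __attribute__((packed)) UnalignedDouble {
    double val;
};

struct UnalignedDouble unaligned_double;

/* Print a double followed by its upper and lower 32-bit words,
   in the same style as the output quoted in this report. */
void PrintDouble(double val) {
    uint64_t bits;
    assert(sizeof val == sizeof bits);
    memcpy(&bits, &val, sizeof bits);
    printf("%f (0x%08" PRIX32 ":%08" PRIX32 ")\n", val,
           (uint32_t)(bits >> 32), (uint32_t)bits);
}

/* Use the parameter once as a call argument and once as the source
   of a store to the packed (possibly unaligned) variable. */
void PrintAndStoreUnaligned(double val) {
    PrintDouble(val);
    unaligned_double.val = val;
}

/* Hypothetical helper: returns 1 if the packed variable holds `expected`. */
int StoredValueMatches(double expected) {
    return unaligned_double.val == expected;
}
```

With a correct compiler, calling `PrintAndStoreUnaligned(1.234567)` should print the same value and word pair as the first output line above (0x3FF3C0C9:539B8887), and `StoredValueMatches(1.234567)` should return 1 afterwards.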