Hi,

I'm cross-compiling for 32-bit bare-metal ARM targets (modern ones: Cortex-M4 and Cortex-M33) with GCC 12.3.1 (Arm GNU Toolchain 12.3.Rel1, which is the latest available from ARM; see the gcc -v output below). I have found that va_arg(..., double) (i.e. __builtin_va_arg()) assumes that doubles are 64-bit aligned, but the stack is not always so.

I searched the bug database but didn't see this, so I'm guessing this isn't a GCC bug--the ARM world would be on fire if it were. I've also searched the gcc command line options docs and the ARM architecture docs, to no avail. I'm hoping I didn't miss something obvious...

So, does gcc assume or require that doubles on the stack be 64-bit aligned? Or is there an option we should be passing to either allow 32-bit alignment or force 64-bit alignment? Or is the MCU vendor's startup code a wee buggy? (This is what I suspect, but I wanted to be damn sure before continuing.)

Here's the test code:

    void va_args_test(int i, ...)
    {
        va_list args;
        va_start(args, i);
        double d = (int)va_arg(args, double);
        va_end(args);
        // display code elided
    }

Here's the generated assembly, with commentary mine:

    void va_args_test(int i, ...)
    {
        3f60:  b40f       push   {r0, r1, r2, r3}
        3f62:  b580       push   {r7, lr}
        3f64:  b082       sub    sp, #8
        3f66:  af00       add    r7, sp, #0
    va_list args;
        3f68:  2300       movs   r3, #0
        3f6a:  607b       str    r3, [r7, #4]
    va_start(args, i);
        3f6c:  f107 0314  add.w  r3, r7, #20
        3f70:  607b       str    r3, [r7, #4]
    double d = (int)va_arg(args, double);
        3f72:  f107 031b  add.w  r3, r7, #27     ; Loads the address of the last byte of the low order word into r3.
        3f76:  f023 0307  bic.w  r3, r3, #7      ; Clears the low 3 bits, which works when the double is 64-bit aligned. Not so much otherwise.
        3f7a:  f103 0208  add.w  r2, r3, #8      ; Increments args' internal pointer
        3f7e:  607a       str    r2, [r7, #4]    ; Saves that pointer
        3f80:  e9d3 0100  ldrd   r0, r1, [r3]    ; Reads the double, right or wrong...
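
To make the arithmetic in the listing above concrete, here is a small host-side sketch of what the add.w/bic.w pair computes. It is only a model, not firmware code: the function names and the VA_START_OFFSET macro are mine, and any frame address fed to it is hypothetical (only its alignment matters).

```c
#include <stdint.h>

/* Offset of the first variadic slot from r7, taken from the
   "add.w r3, r7, #20" that implements va_start() in the listing. */
#define VA_START_OFFSET 20u

/* Models "add.w r3, r7, #27" followed by "bic.w r3, r3, #7":
   add 7 to the va_list pointer (r7 + 20), then clear the low three
   bits, i.e. round the pointer up to a multiple of 8. */
static uint32_t va_arg_double_addr(uint32_t r7)
{
    return (r7 + VA_START_OFFSET + 7u) & ~UINT32_C(7);
}

/* Offset from r7 of the slot that the ldrd ends up reading. */
static uint32_t va_arg_double_offset(uint32_t r7)
{
    return va_arg_double_addr(r7) - r7;
}
```

With an 8-byte-aligned frame such as 0x20001000, va_arg_double_offset() returns 24, which is where the pushed r2:r3 pair lives. With a frame that is only 4-byte aligned, such as 0x20001004, it returns 20, so the ldrd would read the pushed r1:r2 instead.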

Here's the call site assembly:

    va_args_test(0, (double)1.0);
        3fc2:  2200  movs  r2, #0
        3fc4:  4b09  ldr   r3, [pc, #36]   ; (3fec <main+0x44>)
        3fc6:  2000  movs  r0, #0
        3fc8:  4909  ldr   r1, [pc, #36]   ; (3ff0 <main+0x48>)
        3fca:  4788  blx   r1

This is using GCC 12.3.1 (Arm GNU Toolchain 12.3.Rel1), cross-compiling for ARM on x86_64 (gcc -v output below sig), with a command line like

    arm-none-eabi-gcc -o ../build/main/PAC5524/tmp/base/src/main.o base/src/main.c <<-I options elided>> -mcpu=cortex-m4 -march=armv7e-m -mfpu=fpv4-sp-d16 -std=gnu99 -ffunction-sections -fno-omit-frame-pointer -fno-strict-overflow -fsingle-precision-constant -ftrivial-auto-var-init=zero -mthumb -mlittle-endian -mlong-calls -mfloat-abi=hard -Og -c -MD -MP

Removing any one of the -f options happens to align the stack correctly in most cases. (I've elided the -f options that don't affect this issue, as far as I can tell.)

Many thanks,

Barrie

gcc -v output:

    Using built-in specs.
    COLLECT_GCC=arm-none-eabi-gcc
    COLLECT_LTO_WRAPPER=/usr/share/arm-gnu-toolchain-12.3.rel1-x86_64-arm-none-eabi/bin/../libexec/gcc/arm-none-eabi/12.3.1/lto-wrapper
    Target: arm-none-eabi
    Configured with: /data/jenkins/workspace/GNU-toolchain/arm-12/src/gcc/configure --target=arm-none-eabi --prefix=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/install --with-gmp=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools --with-mpfr=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools --with-mpc=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools --with-isl=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools --disable-shared --disable-nls --disable-threads --disable-tls --enable-checking=release --enable-languages=c,c++,fortran --with-newlib --with-gnu-as --with-gnu-ld --with-sysroot=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/install/arm-none-eabi --with-multilib-list=aprofile,rmprofile --with-pkgversion='Arm GNU Toolchain 12.3.Rel1 (Build arm-12.35)' --with-bugurl=https://bugs.linaro.org/
    Thread model: single
    Supported LTO compression algorithms: zlib
    gcc version 12.3.1 20230626 (Arm GNU Toolchain 12.3.Rel1 (Build arm-12.35))

Test code (the LED lights very prettily when va_arg() returns the correct value):

    void va_args_test(int i, ...)
    {
        va_list args;
        va_start(args, i);
        i = (int)va_arg(args, double);
        va_end(args);

        bal_init();
        bal_set_AUX_LED1(i == 1);
    }

    int main(void)
    {
        ...CPU initialization elided...
        va_args_test(0, (double)1.0);
        while (true) { }
    }
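
A possible diagnostic for the startup-code suspicion, sketched here rather than run on the target: light the same LED according to whether the frame is 8-byte aligned, instead of according to the va_arg() result. is_8_byte_aligned() is a hypothetical helper name. __builtin_frame_address(0) is a GCC builtin, and I am assuming (not having confirmed it) that with -fno-omit-frame-pointer it returns the r7 value used in the generated assembly.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the address is a multiple of 8, which is the alignment
   the generated va_arg(args, double) code relies on. */
static bool is_8_byte_aligned(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Assumed usage inside va_args_test(), replacing the LED line:
 *
 *     bal_init();
 *     bal_set_AUX_LED1(is_8_byte_aligned(__builtin_frame_address(0)));
 *
 * The builtin is called directly in va_args_test() so that it reports
 * that function's frame rather than a helper's.
 */
```

If the LED stays dark in exactly the builds where va_arg() returns the wrong value, that would point at the stack pointer handed to main() rather than at the compiler.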