Hi,

I'm cross-compiling for 32-bit bare-metal ARM targets (modern ones: Cortex-M4
and Cortex-M33) with GCC 12.3 (Arm GNU Toolchain 12.3.Rel1), which is the
latest available from Arm (see gcc -v output below), and have found that
va_arg(..., double) (i.e. __builtin_va_arg()) assumes that doubles are 64-bit
aligned, but the stack is not always so aligned.

I searched the bug database but didn't see this, so I'm guessing this isn't
a GCC bug--the ARM world would be on fire if it were. And I've searched the
gcc command line options docs, and the ARM architecture docs to no avail.
I'm hoping I didn't miss something obvious...

So, does gcc assume or require that doubles on the stack be 64-bit aligned,
or is there an option we should be passing to either allow 32-bit alignment
or force 64-bit alignment, or is the MCU vendor's startup code a wee buggy
(this is what I suspect, but wanted to be damn sure before continuing)?
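
One way to test the startup-code theory, sketched below under some assumptions: on Cortex-M the initial main stack pointer is the first word of the vector table, and the AAPCS expects SP to be a multiple of 8 at public interfaces. The helper names and the vector_table parameter are hypothetical, not anything from the vendor's code; the sketch only checks that word 0 is 8-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* AAPCS: SP must be a multiple of 8 at public interfaces. */
#define AAPCS_STACK_ALIGN 8u

/* True if `sp` has the 8-byte alignment that va_arg(..., double) relies on. */
static bool sp_is_8_byte_aligned(uint32_t sp)
{
    return (sp % AAPCS_STACK_ALIGN) == 0u;
}

/*
 * On Cortex-M, word 0 of the vector table is the initial main stack pointer.
 * `vector_table` is a hypothetical pointer to that table (wherever the
 * linker script places it); only word 0 is inspected here.
 */
static bool initial_sp_is_8_byte_aligned(const uint32_t *vector_table)
{
    assert(vector_table != NULL);
    return sp_is_8_byte_aligned(vector_table[0]);
}
```

If this returns false for the real vector table, every frame built from that initial SP starts out 4 bytes off, which would be consistent with what I'm seeing. It would not catch code that misaligns SP later, for example hand-written assembly that pushes an odd number of registers.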

Here's the test code:

void va_args_test(int i, ...) {
    va_list args;
    va_start(args, i);
    double d = (int)va_arg(args, double);
    va_end(args);
    // display code elided
}

Here's the generated assembly, with commentary mine:

void va_args_test(int i, ...) {
    3f60:→  b40f      → push→   {r0, r1, r2, r3}
    3f62:→  b580      → push→   {r7, lr}
    3f64:→  b082      → sub→sp, #8
    3f66:→  af00      → add→r7, sp, #0

    va_list args;
    3f68:→  2300      → movs→   r3, #0
    3f6a:→  607b      → str→r3, [r7, #4]

    va_start(args, i);
    3f6c:→  f107 0314 → add.w→  r3, r7, #20
    3f70:→  607b      → str→r3, [r7, #4]

    double d = (int)va_arg(args, double);
    3f72:→  f107 031b → add.w→  r3, r7, #27   ; Loads the address of the last byte of the low order word into r3.
    3f76:→  f023 0307 → bic.w→  r3, r3, #7    ; Clears the low 3 bits, which works when the double is 64-bit aligned. Not so much otherwise.
    3f7a:→  f103 0208 → add.w→  r2, r3, #8    ; Increments args' internal pointer
    3f7e:→  607a      → str→r2, [r7, #4]      ; Saves that pointer
    3f80:→  e9d3 0100 → ldrd→   r0, r1, [r3]  ; Reads the double, right or wrong...
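
To make the arithmetic in that listing concrete, here is a small host-side model of it. It is only a sketch: the function names are made up for illustration, and the offsets are read off the prologue above (after push {r0-r3}, r1 is saved at entry SP - 12 and r2:r3 at entry SP - 8; r7 + 20 equals entry SP - 12).

```c
#include <assert.h>
#include <stdint.h>

/* Round `addr` up to a multiple of `align` (must be a power of two). */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    assert(align != 0u && (align & (align - 1u)) == 0u);
    return (addr + (align - 1u)) & ~(align - 1u);
}

/*
 * Address the ldrd reads from, for a given SP at function entry.
 * va_start points at the saved r1 slot (entry_sp - 12); va_arg then
 * rounds that pointer up to 8 bytes (the +27 is +20 folded with +7).
 */
static uint32_t va_arg_double_read_addr(uint32_t entry_sp)
{
    uint32_t ap = entry_sp - 12u;
    return align_up(ap, 8u);
}

/* Where the caller's double actually ends up: the saved r2:r3 pair. */
static uint32_t saved_r2_addr(uint32_t entry_sp)
{
    return entry_sp - 8u;
}
```

With an 8-byte-aligned entry SP such as 0x20001000, both functions give 0x20000FF8 and the read is correct. With an entry SP of 0x20000FFC, the rounded pointer is 0x20000FF0, which is the saved r1 slot, so the ldrd picks up r1:r2 instead of r2:r3.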

Here's the call site assembly:

    va_args_test(0, (double)1.0);
    3fc2:→  2200      → movs→   r2, #0
    3fc4:→  4b09      → ldr→r3, [pc, #36]→  ; (3fec <main+0x44>)
    3fc6:→  2000      → movs→   r0, #0
    3fc8:→  4909      → ldr→r1, [pc, #36]→  ; (3ff0 <main+0x48>)
    3fca:→  4788      → blx→r1
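
As I read the call site, the double is passed in the core register pair r2:r3 (r1 is skipped so the pair starts on an even register), even with -mfloat-abi=hard, since the call is variadic. The sketch below shows how 1.0 splits into those two words on a little-endian target; split_double and struct reg_pair are illustrative names only, and the literal loaded into r3 is presumably the high word, since the listing does not show the pool contents.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct reg_pair {
    uint32_t lo;   /* even register of the pair: r2 in the call above */
    uint32_t hi;   /* odd register of the pair:  r3 in the call above */
};

/* Split a double into its low and high 32-bit words. */
static struct reg_pair split_double(double d)
{
    uint64_t bits;
    struct reg_pair p;

    assert(sizeof bits == sizeof d);   /* assumes a 64-bit IEEE 754 double */
    memcpy(&bits, &d, sizeof bits);    /* well-defined way to view the bits */
    p.lo = (uint32_t)(bits & 0xFFFFFFFFu);
    p.hi = (uint32_t)(bits >> 32);
    return p;
}
```

For 1.0 this gives lo = 0 and hi = 0x3FF00000, which matches the movs r2, #0 above.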

This is using the Arm GNU Toolchain 12.3.Rel1 (GCC 12.3.1), cross-compiling
for ARM on x86_64 (gcc -v output below my signature), with a command line like

arm-none-eabi-gcc -o ../build/main/PAC5524/tmp/base/src/main.o
base/src/main.c <<<-I options elided>>> -mcpu=cortex-m4 -march=armv7e-m
-mfpu=fpv4-sp-d16 -std=gnu99 -ffunction-sections -fno-omit-frame-pointer
-fno-strict-overflow -fsingle-precision-constant
-ftrivial-auto-var-init=zero -mthumb -mlittle-endian -mlong-calls
-mfloat-abi=hard -Og -c -MD -MP

Removing any one of the -f options happens to align the stack correctly in
most cases (I've elided the -f options that don't affect this issue as far
as I can tell).

Many thanks,

Barrie

gcc -v output:

Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
COLLECT_LTO_WRAPPER=/usr/share/arm-gnu-toolchain-12.3.rel1-x86_64-arm-none-eabi/bin/../libexec/gcc/arm-none-eabi/12.3.1/lto-wrapper
Target: arm-none-eabi
Configured with:
/data/jenkins/workspace/GNU-toolchain/arm-12/src/gcc/configure
--target=arm-none-eabi
--prefix=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/install
--with-gmp=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
--with-mpfr=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
--with-mpc=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
--with-isl=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
--disable-shared --disable-nls --disable-threads --disable-tls
--enable-checking=release --enable-languages=c,c++,fortran
--with-newlib --with-gnu-as --with-gnu-ld
--with-sysroot=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/install/arm-none-eabi
--with-multilib-list=aprofile,rmprofile --with-pkgversion='Arm GNU
Toolchain 12.3.Rel1 (Build arm-12.35)' --with-bugurl=https://bugs.linaro.org/
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 12.3.1 20230626 (Arm GNU Toolchain 12.3.Rel1 (Build arm-12.35))

Test code (the LED lights very prettily when va_arg() returns the correct
value):

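#include <stdarg.h>
#include <stdbool.h>

void va_args_test(int i, ...) {
    va_list args;
    va_start(args, i);
    i = (int)va_arg(args, double);   // should yield 1 for the call below
    va_end(args);
    bal_init();
    bal_set_AUX_LED1(i == 1);        // LED on only if the double was read correctly
}

int main(void) {
    // ...CPU initialization elided...
    va_args_test(0, (double)1.0);
    while (true) {
    }
}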
