https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104261
Bug ID: 104261
Summary: gcc uses fildq and fistpq on unaligned address for atomic accesses
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: mikulas at artax dot karlin.mff.cuni.cz
Target Milestone: ---

GCC uses the instructions fildq and fistpq to read and write atomic 8-byte
quantities. According to Intel documentation, these instructions are only
atomic if the address is aligned to 8 bytes.

When accessing an 8-byte atomic variable that is part of a structure, gcc
misaligns the atomic variable so that it is only aligned to 4 bytes and then
uses the fildq and fistpq instructions on it - these instructions are not
atomic because of the misalignment.

I've created this program that demonstrates the problem:

struct s {
        unsigned misalign;
        struct {
                _Atomic unsigned long long a;
        };
};

unsigned size(void)
{
        return sizeof(struct s);
}

unsigned align(void)
{
        return __alignof__(struct s);
}

unsigned long long atomic_load(struct s *s)
{
        return s->a;
}

void atomic_store(struct s *s, unsigned long long v)
{
        s->a = v;
}

void atomic_inc(struct s *s)
{
        s->a++;
}

If you compile it with -m32 -O2, you get this output:

atomic.c:5:9: note: the alignment of ‘_Atomic long long unsigned int’ fields changed in GCC 11.1
    5 |         };
      |         ^

        .file   "atomic.c"
        .text
        .p2align 4
        .globl  size
        .type   size, @function
size:
        movl    $12, %eax
        ret
        .size   size, .-size
        .p2align 4
        .globl  align
        .type   align, @function
align:
        movl    $4, %eax
        ret
        .size   align, .-align
        .p2align 4
        .globl  atomic_load
        .type   atomic_load, @function
atomic_load:
        subl    $12, %esp
        movl    16(%esp), %eax
        fildq   4(%eax)
        fistpq  (%esp)
        movl    (%esp), %eax
        movl    4(%esp), %edx
        addl    $12, %esp
        ret
        .size   atomic_load, .-atomic_load
        .p2align 4
        .globl  atomic_store
        .type   atomic_store, @function
atomic_store:
        pushl   %ebx
        subl    $8, %esp
        movl    24(%esp), %ebx
        movl    20(%esp), %ecx
        movl    %ecx, (%esp)
        movl    %ebx, 4(%esp)
        fildq   (%esp)
        movl    16(%esp), %eax
        fistpq  4(%eax)
        lock orl        $0, (%esp)
        addl    $8, %esp
        popl    %ebx
        ret
        .size   atomic_store, .-atomic_store
        .p2align 4
        .globl  atomic_inc
        .type   atomic_inc, @function
atomic_inc:
        pushl   %ebp
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        movl    20(%esp), %esi
        movl    4(%esi), %eax
        movl    8(%esi), %edx
.L9:
        movl    %eax, %ecx
        movl    %edx, %ebx
        addl    $1, %ecx
        adcl    $0, %ebx
        movl    %ebx, %ebp
        movl    %ecx, %ebx
        movl    %ebp, %ecx
        lock cmpxchg8b  4(%esi)
        jne     .L9
        popl    %ebx
        popl    %esi
        popl    %edi
        popl    %ebp
        ret
        .size   atomic_inc, .-atomic_inc
        .ident  "GCC: (Debian 12-20220116-1) 12.0.0 20220116 (experimental) [master r12-6611-g9d7e19255c0]"
        .section        .note.GNU-stack,"",@progbits

See the instructions "fildq 4(%eax)" and "fistpq 4(%eax)": they access a
misaligned address, and thus they are not atomic.

The problem was already partially fixed in gcc-10 by changing the ABI, so that
atomic variable alignment is 8 bytes. However, if we use nested structures, the
alignment is still incorrect in the current gcc-12.
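
For anyone who wants to check a given compiler and target, the layout can be inspected with offsetof. The following is only a sketch, not part of the original report: struct s_fixed and the two helper functions are hypothetical names introduced here, and the explicit alignas(8) is an assumed workaround that has not been verified against the affected GCC 12 snapshot.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas, alignof */
#include <stddef.h>   /* offsetof, size_t */

/* Same layout as struct s in the report. */
struct s {
        unsigned misalign;
        struct {
                _Atomic unsigned long long a;
        };
};

/*
 * Hypothetical variant (assumption, not from the report): request
 * 8-byte alignment for the atomic member explicitly instead of
 * relying on the ABI default for nested structures.
 */
struct s_fixed {
        unsigned misalign;
        struct {
                alignas(8) _Atomic unsigned long long a;
        };
};

/* With the explicit alignas, these should hold on every target. */
static_assert(offsetof(struct s_fixed, a) % 8 == 0,
              "atomic member must sit on an 8-byte boundary");
static_assert(alignof(struct s_fixed) >= 8,
              "enclosing structure must be at least 8-byte aligned");

/* Offset of the atomic member in the structure from the report. */
size_t offset_in_report_struct(void)
{
        return offsetof(struct s, a);
}

/* Offset of the atomic member in the explicitly aligned variant. */
size_t offset_in_fixed_struct(void)
{
        return offsetof(struct s_fixed, a);
}
```

Going by the assembly in the report, offset_in_report_struct() would be 4 when built with -m32 on the affected compiler, which is the misaligned case. The static_assert lines are meant to turn a misaligned layout of the explicitly aligned variant into a compile error rather than a silent loss of atomicity.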