https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116383
Bug ID: 116383
Summary: Value from __atomic_store not forwarded to non-atomic load at same address
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: redbeard0531 at gmail dot com
Target Milestone: ---

https://godbolt.org/z/1bbjoc87n

int test(int* i, int val) {
    __atomic_store_n(i, val, __ATOMIC_RELAXED);
    return *i;
}

The non-atomic load should be able to directly use the value stored by the atomic store, but instead GCC issues a new load:

        mov     DWORD PTR [rdi], esi
        mov     eax, DWORD PTR [rdi]
        ret

Clang recognizes that the load is unnecessary and propagates the value:

        mov     eax, esi
        mov     dword ptr [rdi], esi
        ret

Beyond simply being an unnecessary load, on most CPUs there is an additional penalty for reading a value that is still in the CPU's store buffer, which it almost certainly would be in this case. And of course this also blocks further optimizations, e.g. DSE and value propagation, where the compiler knows something about the value.

void blocking_further_optimizations(int* i) {
    if (test(i, 1) == 0) {
        __builtin_abort();
    }
}

generates the following with GCC:

        mov     DWORD PTR [rdi], 1
        mov     edx, DWORD PTR [rdi]
        test    edx, edx
        je      .L5
        ret
blocking_further_optimizations(int*) [clone .cold]:
.L5:
        push    rax
        call    abort

and this much better output with Clang:

        mov     dword ptr [rdi], 1
        ret

While I'm using a relaxed store here to show that GCC doesn't apply the optimization even in that case, I think the optimization should apply regardless of memory ordering (and Clang seems to agree).

Also, while the minimal example code is contrived, there are several real-world use cases where this pattern can come up. I would expect it in cases where there is a single writer thread but many reader threads. The writes and off-thread reads need to use __atomic ops to avoid data races, but on-thread reads should be safe using ordinary loads, and you would want them to be optimized as much as possible.