https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462
Wilco <wilco at gcc dot gnu.org> changed:

           What          |Removed     |Added
----------------------------------------------------------------------------
         Resolution      |INVALID     |FIXED

--- Comment #7 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #2)
> (In reply to Wilco from comment #1)
> > Even if you fix the aliasing bugs, it won't emulate a byte-oriented
> > cmpxchg correctly; there are bugs in the logic too.
>
> More than that, it will never be atomic. You can't do a byte-wise cmpxchg
> for a word size and think it will be atomic.

And since both Arm and AArch64 have a byte-wise __atomic_compare_exchange_n,
I can't see a reason to even try when the compiler already does it correctly
and efficiently. So my advice is to remove this function altogether, and all
related code, from openjdk - it's completely incorrect and counterproductive.

(In reply to Aleksei Voitylov from comment #4)
> (In reply to Wilco from comment #1)
> > (In reply to Aleksei Voitylov from comment #0)
> > > Created attachment 47212 [details]
> > > reduced testcase from the openjdk sources
> > >
> > > While compiling openjdk sources I bumped into a bug: when -ftree-pre is
> > > enabled the optimizer hoists out the reload of a variable, which
> > > subsequently leads to an infinite loop.
> > >
> > > Below is the relevant piece of code; "new_value" is the variable that
> > > gets hoisted out.
> > >
> > > template<typename T>
> > > inline T Atomic::CmpxchgByteUsingInt::operator()(T exchange_value,
> > >                                                  T volatile* dest,
> > >                                                  T compare_value,
> > >                                                  atomic_memory_order order) const {
> > >   printf ("Atomic::CmpxchgByteUsingInt::operator: 1: %d, 2: %d\n",
> > >           exchange_value, compare_value);
> > >   uint8_t canon_exchange_value = exchange_value;
> > >   uint8_t canon_compare_value = compare_value;
> > >   volatile uint32_t* aligned_dest
> > >     = reinterpret_cast<volatile uint32_t*>(align_down(dest, sizeof(uint32_t)));
> > >   size_t offset = (uintptr_t)dest - (uintptr_t)aligned_dest;
> > >   uint32_t cur = *aligned_dest;
> > >   uint8_t* cur_as_bytes = reinterpret_cast<uint8_t*>(&cur);
> > >   cur_as_bytes[offset] = canon_compare_value;
> > >   do {
> > >     uint32_t new_value = cur;
> > >     reinterpret_cast<uint8_t*>(&new_value)[offset] = canon_exchange_value;
> > >     printf ("Atomic::CmpxchgByteUsingInt::operator2: 1: %d, 2: %d\n",
> > >             new_value, cur);
> > >     uint32_t res = cmpxchg(new_value, aligned_dest, cur, order);
> > >     if (res == cur) break;
> > >     cur = res;
> > >   } while (cur_as_bytes[offset] == canon_compare_value);
> > >   return PrimitiveConversions::cast<T>(cur_as_bytes[offset]);
> > > }
> > >
> > > $ g++ -O1 -ftree-pre t.cpp
> > > $ ./a.out
> > > Atomic::CmpxchgByteUsingInt::operator: 1: 0, 2: 1
> > > Atomic::CmpxchgByteUsingInt::operator2: 1: 0, 2: 0
> > >
> > > $ g++ -O1 t.cpp
> > > $ ./a.out
> > > Atomic::CmpxchgByteUsingInt::operator: 1: 0, 2: 1
> > > Atomic::CmpxchgByteUsingInt::operator2: 1: 0, 2: 256
> > >
> > > Below is the assembler of the loop for the correct version:
> > >
> > > .L7:
> > >         ldr     r4, [sp]
> > >         str     r4, [sp, #4]
> > >         strb    r7, [r5, #-4]
> > >         mov     r2, r4
> > >         ldr     r1, [sp, #4]
> > >         mov     r0, r6
> > >         bl      printf
> > >         cbz     r4, .L6
> > >         movs    r3, #0
> > >         str     r3, [sp]
> > >         ldrb    r3, [r8]        @ zero_extendqisi2
> > >         cmp     r3, #1
> > >         beq     .L7
> > >
> > > and for the incorrect one:
> > >
> > > .L7:
> > >         str     r4, [sp, #4]
> > >         strb    r8, [r6]
> > >         mov     r2, r4
> > >         ldr     r1, [sp, #4]
> > >         mov     r0, r5
> > >         bl      printf
> > >         cbz     r4, .L6
> > >         movs    r4, #0
> > >         str     r4, [sp]
> > >         ldrb    r3, [r7]        @ zero_extendqisi2
> > >         cmp     r3, #1
> > >         beq     .L7
> >
> > There are serious aliasing bugs in the source - GCC is quite correct in
> > assuming that cur and cur_as_bytes[offset] never alias (obviously), and it
> > even optimizes away the cmpxchg (no idea why, that appears wrong).
>
> Isn't
>
>   uint32_t cur = *aligned_dest;
>   uint8_t* cur_as_bytes = reinterpret_cast<uint8_t*>(&cur);
>
> the very definition of pointer aliasing? Regardless of whether the function
> being called is doing atomic operations or not, the variable in question
> should not be hoisted. This is the essence of this bug.

The example as is is completely incorrect - it doesn't even return and ends
up executing random code.

> Yes, openjdk code is doing nasty things further on (and the code predates
> the builtin gcc operations and is compiled by other compilers as well,
> which may not be aware of the builtins), but the bug as it stands does not
> depend on that logic.

Arm and AArch64 compilers support a byte-wise cmpxchg, so my advice is to
remove this function altogether and all related code from openjdk. I can't
see a reason to even try when compilers already support it correctly and
efficiently.
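
For illustration only, a minimal sketch of what a byte-wise CAS could look
like when the GCC builtin is used directly. The name cmpxchg_byte is made up
here, and the seq_cst memory orders are an assumption rather than the openjdk
mapping of atomic_memory_order:

  #include <stdint.h>

  // Hypothetical replacement for the int-based emulation: let the compiler
  // emit the native byte-wise compare-exchange.  Returns the value that was
  // observed at *dest, matching the usual cmpxchg convention.
  inline uint8_t cmpxchg_byte(uint8_t exchange_value,
                              volatile uint8_t* dest,
                              uint8_t compare_value) {
    uint8_t expected = compare_value;
    // On failure the builtin writes the value actually seen into 'expected'.
    __atomic_compare_exchange_n(dest, &expected, exchange_value,
                                /*weak=*/false,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
  }

On AArch64 this should compile to a byte-sized CAS or an LDXRB/STXRB loop,
i.e. exactly the operation the int-wide emulation tries to reconstruct.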
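
If the int-wide emulation has to stay for other compilers (as comment #4
argues), the byte splicing can at least be written without reinterpret_cast,
e.g. via memcpy, which sidesteps the aliasing questions discussed above. A
rough sketch with helper names of my own choosing; note this does nothing
about the atomicity objection from comment #2:

  #include <stdint.h>
  #include <string.h>

  // Hypothetical helpers: patch / read one byte of a 32-bit word through
  // memcpy instead of forming a uint8_t* into the uint32_t object.
  inline uint32_t splice_byte(uint32_t word, size_t offset, uint8_t value) {
    unsigned char bytes[sizeof(uint32_t)];
    memcpy(bytes, &word, sizeof(word));   // take the word apart
    bytes[offset] = value;                // patch the selected byte
    memcpy(&word, bytes, sizeof(word));   // reassemble
    return word;
  }

  inline uint8_t extract_byte(uint32_t word, size_t offset) {
    unsigned char bytes[sizeof(uint32_t)];
    memcpy(bytes, &word, sizeof(word));
    return bytes[offset];
  }

The do/while loop would then compute
new_value = splice_byte(cur, offset, canon_exchange_value) and test
extract_byte(res, offset) against canon_compare_value, with no pointer into
cur at all.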