https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462
Bug ID: 92462 Summary: [arm32] -ftree-pre makes a variable to be wrongly hoisted out Product: gcc Version: 7.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: aleksei.voity...@bell-sw.com Target Milestone: --- Created attachment 47212 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47212&action=edit reduced testcase from the openjdk sources While compiling openjdk sources I bumped into a bug: when -ftree-pre is enabled the optimizer hoists out reload of a variable which subsequently leads to the infinite loop situation. Below is the relevant piece of code and "new_value" is the variable that gets hoisted out. template<typename T> inline T Atomic::CmpxchgByteUsingInt::operator()(T exchange_value, T volatile* dest, T compare_value, atomic_memory_order order) const { printf ("Atomic::CmpxchgByteUsingInt::operator: 1: %d, 2: %d\n", exchange_value, compare_value); uint8_t canon_exchange_value = exchange_value; uint8_t canon_compare_value = compare_value; volatile uint32_t* aligned_dest = reinterpret_cast<volatile uint32_t*>(align_down(dest, sizeof(uint32_t))); size_t offset = (uintptr_t)dest - (uintptr_t)aligned_dest; uint32_t cur = *aligned_dest; uint8_t* cur_as_bytes = reinterpret_cast<uint8_t*>(&cur); cur_as_bytes[offset] = canon_compare_value; do { uint32_t new_value = cur; reinterpret_cast<uint8_t*>(&new_value)[offset] = canon_exchange_value; printf ("Atomic::CmpxchgByteUsingInt::operator2: 1: %d, 2: %d\n", new_value, cur); uint32_t res = cmpxchg(new_value, aligned_dest, cur, order); if (res == cur) break; cur = res; } while (cur_as_bytes[offset] == canon_compare_value); return PrimitiveConversions::cast<T>(cur_as_bytes[offset]); } $ g++ -O1 -ftree-pre t.cpp $ ./a.out Atomic::CmpxchgByteUsingInt::operator: 1: 0, 2: 1 Atomic::CmpxchgByteUsingInt::operator2: 1: 0, 2: 0 $ g++ -O1 t.cpp $ ./a.out Atomic::CmpxchgByteUsingInt::operator: 1: 0, 2: 1 Atomic::CmpxchgByteUsingInt::operator2: 1: 0, 2: 256 Below is the assembler of the loop for the correct version: .L7: ldr r4, [sp] str r4, [sp, #4] strb r7, [r5, #-4] mov r2, r4 ldr r1, [sp, #4] mov r0, r6 bl printf cbz r4, .L6 movs r3, #0 str r3, [sp] ldrb r3, [r8] @ zero_extendqisi2 cmp r3, #1 beq .L7 and for the incorrect one: .L7: str r4, [sp, #4] strb r8, [r6] mov r2, r4 ldr r1, [sp, #4] mov r0, r5 bl printf cbz r4, .L6 movs r4, #0 str r4, [sp] ldrb r3, [r7] @ zero_extendqisi2 cmp r3, #1 beq .L7