https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462
Wilco <wilco at gcc dot gnu.org> changed:

           What          |Removed     |Added
----------------------------------------------------------------------------
         Resolution      |INVALID     |FIXED

--- Comment #7 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #2)
> (In reply to Wilco from comment #1)
> > Even if you fix the aliasing bugs, it won't emulate a byte-oriented
> > cmpxchg correctly; there are bugs in the logic too.
>
> More than that, it will never be atomic. You can't do a byte-wise cmpxchg
> for a word size and think it will be atomic.

And since both Arm and AArch64 have a byte-wise __atomic_compare_exchange_n,
I can't see a reason to even try when the compiler already does it correctly
and efficiently. So my advice is to remove this function altogether, and all
related code, from openjdk - it's completely incorrect and counterproductive.

(In reply to Aleksei Voitylov from comment #4)
> (In reply to Wilco from comment #1)
> > (In reply to Aleksei Voitylov from comment #0)
> > > Created attachment 47212 [details]
> > > reduced testcase from the openjdk sources
> > >
> > > While compiling openjdk sources I bumped into a bug: when -ftree-pre is
> > > enabled the optimizer hoists out the reload of a variable, which
> > > subsequently leads to an infinite loop.
> > >
> > > Below is the relevant piece of code; "new_value" is the variable that
> > > gets hoisted out.
> > >
> > > template<typename T>
> > > inline T Atomic::CmpxchgByteUsingInt::operator()(T exchange_value,
> > >                                                  T volatile* dest,
> > >                                                  T compare_value,
> > >                                                  atomic_memory_order order) const {
> > >   printf ("Atomic::CmpxchgByteUsingInt::operator: 1: %d, 2: %d\n",
> > >           exchange_value, compare_value);
> > >   uint8_t canon_exchange_value = exchange_value;
> > >   uint8_t canon_compare_value = compare_value;
> > >   volatile uint32_t* aligned_dest
> > >     = reinterpret_cast<volatile uint32_t*>(align_down(dest, sizeof(uint32_t)));
> > >   size_t offset = (uintptr_t)dest - (uintptr_t)aligned_dest;
> > >   uint32_t cur = *aligned_dest;
> > >   uint8_t* cur_as_bytes = reinterpret_cast<uint8_t*>(&cur);
> > >   cur_as_bytes[offset] = canon_compare_value;
> > >   do {
> > >     uint32_t new_value = cur;
> > >     reinterpret_cast<uint8_t*>(&new_value)[offset] = canon_exchange_value;
> > >     printf ("Atomic::CmpxchgByteUsingInt::operator2: 1: %d, 2: %d\n",
> > >             new_value, cur);
> > >     uint32_t res = cmpxchg(new_value, aligned_dest, cur, order);
> > >     if (res == cur) break;
> > >     cur = res;
> > >   } while (cur_as_bytes[offset] == canon_compare_value);
> > >   return PrimitiveConversions::cast<T>(cur_as_bytes[offset]);
> > > }
> > >
> > > $ g++ -O1 -ftree-pre t.cpp
> > > $ ./a.out
> > > Atomic::CmpxchgByteUsingInt::operator: 1: 0, 2: 1
> > > Atomic::CmpxchgByteUsingInt::operator2: 1: 0, 2: 0
> > >
> > > $ g++ -O1 t.cpp
> > > $ ./a.out
> > > Atomic::CmpxchgByteUsingInt::operator: 1: 0, 2: 1
> > > Atomic::CmpxchgByteUsingInt::operator2: 1: 0, 2: 256
> > >
> > > Below is the assembler of the loop for the correct version:
> > >
> > > .L7:
> > >         ldr     r4, [sp]
> > >         str     r4, [sp, #4]
> > >         strb    r7, [r5, #-4]
> > >         mov     r2, r4
> > >         ldr     r1, [sp, #4]
> > >         mov     r0, r6
> > >         bl      printf
> > >         cbz     r4, .L6
> > >         movs    r3, #0
> > >         str     r3, [sp]
> > >         ldrb    r3, [r8]        @ zero_extendqisi2
> > >         cmp     r3, #1
> > >         beq     .L7
> > >
> > > and for the incorrect one:
> > >
> > > .L7:
> > >         str     r4, [sp, #4]
> > >         strb    r8, [r6]
> > >         mov     r2, r4
> > >         ldr     r1, [sp, #4]
> > >         mov     r0, r5
> > >         bl      printf
> > >         cbz     r4, .L6
> > >         movs    r4, #0
> > >         str     r4, [sp]
> > >         ldrb    r3, [r7]        @ zero_extendqisi2
> > >         cmp     r3, #1
> > >         beq     .L7
> >
> > There are serious aliasing bugs in the source - GCC is quite correct in
> > assuming that cur and cur_as_bytes[offset] never alias (obviously), and it
> > even optimizes away the cmpxchg (no idea why, that appears wrong).
>
> Isn't
>
>   uint32_t cur = *aligned_dest;
>   uint8_t* cur_as_bytes = reinterpret_cast<uint8_t*>(&cur);
>
> the very definition of pointer aliasing? Regardless of whether the function
> being called is doing atomic operations or not, the variable in question
> should not be hoisted. This is the essence of this bug.

The example as is is completely incorrect - it doesn't even return and ends
up executing random code.

> Yes, openjdk code is doing nasty things further on (and the code predates
> the builtin gcc operations and is compiled by other compilers as well,
> which may not be aware of the builtins), but the bug as it stands does not
> depend on that logic.

Arm and AArch64 compilers support a byte-wise cmpxchg, so my advice is to
remove this function altogether and all related code from openjdk. I can't
see a reason to even try when compilers already support it correctly and
efficiently.
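
For illustration only, a minimal sketch of what a byte-wise CAS could look
like when the GCC builtin is used directly. The name cmpxchg_byte is made up
here, and the seq_cst memory orders are an assumption rather than the openjdk
mapping of atomic_memory_order:

  #include <stdint.h>

  // Hypothetical replacement for the int-based emulation: let the compiler
  // emit the native byte-wise compare-exchange.  Returns the value that was
  // observed at *dest, matching the usual cmpxchg convention.
  inline uint8_t cmpxchg_byte(uint8_t exchange_value,
                              volatile uint8_t* dest,
                              uint8_t compare_value) {
    uint8_t expected = compare_value;
    // On failure the builtin writes the value actually seen into 'expected'.
    __atomic_compare_exchange_n(dest, &expected, exchange_value,
                                /*weak=*/false,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
  }

On AArch64 this should compile to a byte-sized CAS or an LDXRB/STXRB loop,
i.e. exactly the operation the int-wide emulation tries to reconstruct.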
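
If the int-wide emulation has to stay for other compilers (as comment #4
argues), the byte splicing can at least be written without reinterpret_cast,
e.g. via memcpy, which sidesteps the aliasing questions discussed above. A
rough sketch with helper names of my own choosing; note this does nothing
about the atomicity objection from comment #2:

  #include <stdint.h>
  #include <string.h>

  // Hypothetical helpers: patch / read one byte of a 32-bit word through
  // memcpy instead of forming a uint8_t* into the uint32_t object.
  inline uint32_t splice_byte(uint32_t word, size_t offset, uint8_t value) {
    unsigned char bytes[sizeof(uint32_t)];
    memcpy(bytes, &word, sizeof(word));   // take the word apart
    bytes[offset] = value;                // patch the selected byte
    memcpy(&word, bytes, sizeof(word));   // reassemble
    return word;
  }

  inline uint8_t extract_byte(uint32_t word, size_t offset) {
    unsigned char bytes[sizeof(uint32_t)];
    memcpy(bytes, &word, sizeof(word));
    return bytes[offset];
  }

The do/while loop would then compute
new_value = splice_byte(cur, offset, canon_exchange_value) and test
extract_byte(res, offset) against canon_compare_value, with no pointer into
cur at all.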