Hi all,

According to the architecture pseudocode, the FEAT_MOPS sequences overwrite
the NZCV flags as part of their operation, so GCC needs to model that in the
relevant RTL patterns.
For the testcase:
#include <stddef.h>

void g ();
void foo (int a, size_t N, char *__restrict__ in,
         char *__restrict__ out)
{
  if (a != 3)
    __builtin_memcpy (out, in, N);
  if (a > 3)
    g ();
}

we will currently generate:
foo:
        cmp     w0, 3
        bne     .L6
.L1:
        ret
.L6:
        cpyfp   [x3]!, [x2]!, x1!
        cpyfm   [x3]!, [x2]!, x1!
        cpyfe   [x3]!, [x2]!, x1!
        ble     .L1 // Flags reused after CPYF* sequence
        b       g

This is wrong: the CPYF* sequence overwrites NZCV, so the result of the cmp
has to be recalculated after the MOPS sequence before the ble can use it.
With this patch we'll insert a "cmp w0, 3" before the ble, like clang does.
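
For anyone who wants to check the behaviour at run time rather than by
reading the assembly, here is a minimal sketch of a harness around the
testcase above. It is not one of the new tests in the patch: g_calls and
run_foo are hypothetical names introduced for illustration, and the
noinline attribute is only there so that N stays a runtime value inside
foo. Whether stale flags actually produce a wrong answer depends on what
the CPYF* sequence leaves in NZCV, so a pass on an unpatched compiler does
not prove the code is correct.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical call counter; not part of the original testcase.  */
static int g_calls;

void g (void) { g_calls++; }

__attribute__ ((noinline)) void
foo (int a, size_t N, char *__restrict__ in, char *__restrict__ out)
{
  if (a != 3)
    __builtin_memcpy (out, in, N);
  if (a > 3)
    g ();
}

/* Hypothetical helper: run foo once and return how many times g was
   called.  Also checks that the copy happened exactly when a != 3.  */
int
run_foo (int a)
{
  char src[16] = "mops-cc-check";
  char dst[16] = { 0 };

  g_calls = 0;
  foo (a, sizeof src, src, dst);

  int copied = memcmp (src, dst, sizeof src) == 0;
  assert (copied == (a != 3));
  return g_calls;
}
```

g must be called only for a > 3, so run_foo should return 0 for a == 2 and
a == 3, and 1 for a == 4. The a == 2 case is the interesting one, since it
takes the path where the ble follows the copy.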

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk and to the GCC 12 branch after some baking time.

Thanks,
Kyrill

gcc/ChangeLog:

        * config/aarch64/aarch64.md (aarch64_cpymemdi): Specify clobber of
        CC reg.
        (*aarch64_cpymemdi): Likewise.
        (aarch64_movmemdi): Likewise.
        (aarch64_setmemdi): Likewise.
        (*aarch64_setmemdi): Likewise.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/mops_5.c: New test.
        * gcc.target/aarch64/mops_6.c: Likewise.
        * gcc.target/aarch64/mops_7.c: Likewise.

Attachment: mops-cc.patch