https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68696

            Bug ID: 68696
           Summary: [6 Regression] FAIL: gcc.target/aarch64/vbslq_u64_1.c
                    scan-assembler-times bif\\tv 1
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*

After r231178 the above testcase started failing.
For the code:
typedef __Uint32x4_t uint32x4_t;

uint32x4_t
vbslq_dummy_u32 (uint32x4_t a, uint32x4_t b, uint32x4_t mask)
{
  return (mask & a) | (~mask & b);
}

at -O3 we started generating:
vbslq_dummy_u32:
        eor     v0.16b, v0.16b, v1.16b
        and     v0.16b, v0.16b, v2.16b
        eor     v0.16b, v0.16b, v1.16b
        ret

instead of:
vbslq_dummy_u32:
        bif     v0.16b, v1.16b, v2.16b
        ret
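
For reference, the eor/and/eor sequence is not wrong code: it computes the
same bitwise select through the identity
(mask & a) | (~mask & b) == ((a ^ b) & mask) ^ b.
The sketch below checks that identity on plain 32-bit scalars rather than
on vectors. The function names and sample values are illustrative only and
do not come from the testsuite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Select form, as written in the testcase: bits of a where mask is set,
   bits of b where mask is clear.  */
uint32_t
bsl_select_form (uint32_t a, uint32_t b, uint32_t mask)
{
  return (mask & a) | (~mask & b);
}

/* xor/and/xor form, corresponding to the three-instruction sequence.  */
uint32_t
bsl_xor_form (uint32_t a, uint32_t b, uint32_t mask)
{
  return ((a ^ b) & mask) ^ b;
}

/* Illustrative helper: asserts that both forms agree on every combination
   of a few sample values and returns the number of combinations checked.  */
unsigned
check_bsl_forms (void)
{
  static const uint32_t samples[] = {
    0u, 1u, 0xffffffffu, 0xdeadbeefu, 0x0f0f0f0fu, 0x80000000u
  };
  const size_t n = sizeof samples / sizeof samples[0];
  unsigned checked = 0;

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      for (size_t k = 0; k < n; k++)
        {
          assert (bsl_select_form (samples[i], samples[j], samples[k])
                  == bsl_xor_form (samples[i], samples[j], samples[k]));
          checked++;
        }
  return checked;
}
```

Where a mask bit is set, (a ^ b) ^ b gives a; where it is clear, 0 ^ b gives
b. So the regression is in code quality only, not in correctness.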

This is because the tree sequences, and hence the RTL insns generated from
them, are now slightly different. As a result, combine now tries and fails
to match:
(set (reg:V4SI 79)
    (xor:V4SI (and:V4SI (xor:V4SI (reg:V4SI 32 v0 [ a ])
                (reg/v:V4SI 77 [ b ]))
            (reg:V4SI 34 v2 [ mask ]))
        (reg/v:V4SI 77 [ b ])))

whereas before it successfully matched the aarch64_simd_bsl<mode>_internal
pattern in aarch64-simd.md with:
(set (reg:V4SI 79)
    (xor:V4SI (and:V4SI (xor:V4SI (reg/v:V4SI 77 [ b ])
                (reg:V4SI 32 v0 [ a ]))
            (reg:V4SI 34 v2 [ mask ]))
        (reg/v:V4SI 77 [ b ])))


Note that reg/v 77 and reg v0 have swapped places in the inner xor.
This is a deficiency in the aarch64 combine pattern.
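
Since xor is commutative, the two RTL shapes differ only in how the operands
are written, not in the value computed. A minimal scalar sketch of that
point follows. The function names are hypothetical and the code models only
the arithmetic, not how combine or the machine description matches
patterns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inner xor written as (b ^ a): the operand order that matched before.  */
uint32_t
bsl_inner_b_a (uint32_t a, uint32_t b, uint32_t mask)
{
  return ((b ^ a) & mask) ^ b;
}

/* Inner xor written as (a ^ b): the operand order combine now presents.  */
uint32_t
bsl_inner_a_b (uint32_t a, uint32_t b, uint32_t mask)
{
  return ((a ^ b) & mask) ^ b;
}

/* Illustrative helper: asserts that both operand orders give the same
   result on every combination of a few sample values and returns the
   number of combinations checked.  */
unsigned
check_operand_orders (void)
{
  static const uint32_t samples[] = {
    0u, 0xffffffffu, 0x12345678u, 0xa5a5a5a5u
  };
  const size_t n = sizeof samples / sizeof samples[0];
  unsigned checked = 0;

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      for (size_t k = 0; k < n; k++)
        {
          assert (bsl_inner_b_a (samples[i], samples[j], samples[k])
                  == bsl_inner_a_b (samples[i], samples[j], samples[k]));
          checked++;
        }
  return checked;
}
```

Both functions return identical results for all inputs, which is consistent
with treating the failed match as a gap in the pattern rather than a change
in semantics.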
