https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78020

            Bug ID: 78020
           Summary: [Aarch64, ARM64] vuzp{1,2}q_f64 implementation
                    identical to vzip{1,2}q_f64 in arm_neon.h and probably
                    incorrect
           Product: gcc
           Version: 6.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: christophe.monat at st dot com
  Target Milestone: ---

Created attachment 39829
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39829&action=edit
Suggested patch to fix the zip vs uzp intrinsic issue

I think that vuzp{1,2}q_f64 in gcc/config/aarch64/arm_neon.h are not correctly
implemented.

For instance:
vzip1q_f64 (float64x2_t __a, float64x2_t __b)
(snippage)
  return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});

and:
vuzp1q_f64 (float64x2_t __a, float64x2_t __b)
(snippage)
  return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});

But then, according to the "ARM Architecture Reference Manual, ARMv8, for
ARMv8-A architecture profile" (I am reading ARM DDI 0487A.i (ID 012816)), the
semantics of zip1 and uzp1 differ (C3.5.18 is a convenient starting point for
browsing the architectural descriptions).
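The difference between the two families is easiest to see with more than two lanes. Below is a minimal scalar sketch of ZIP1/ZIP2 and UZP1/UZP2, written from the architectural prose descriptions. The function names `zip_model` and `uzp_model`, and the use of `double` lanes for every element size, are illustrative assumptions and do not come from arm_neon.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar models of the AArch64 ZIP and UZP permutes.
   n is the lane count (2 for .2D, 4 for .4S); vn and vm are the first and
   second source registers; part selects the "1" (0) or "2" (1) variant.
   vd must not alias vn or vm.  */

/* ZIP1/ZIP2: interleave the low (part 0) or high (part 1) halves of vn, vm.  */
static void zip_model (const double *vn, const double *vm, double *vd,
                       size_t n, int part)
{
  assert (n >= 2 && n % 2 == 0 && vd != vn && vd != vm);
  size_t base = part ? n / 2 : 0;
  for (size_t p = 0; p < n / 2; p++)
    {
      vd[2 * p] = vn[base + p];
      vd[2 * p + 1] = vm[base + p];
    }
}

/* UZP1/UZP2: the even (part 0) or odd (part 1) lanes of the 2n-lane
   concatenation vn:vm, where vn supplies the lower-numbered lanes.  */
static void uzp_model (const double *vn, const double *vm, double *vd,
                       size_t n, int part)
{
  assert (n >= 2 && n % 2 == 0 && vd != vn && vd != vm);
  for (size_t e = 0; e < n; e++)
    {
      size_t i = 2 * e + (part ? 1 : 0);   /* index into vn:vm */
      vd[e] = i < n ? vn[i] : vm[i - n];
    }
}
```

With n = 4, vn = {0, 1, 2, 3} and vm = {4, 5, 6, 7}, `zip_model (..., 0)` yields {0, 4, 1, 5} while `uzp_model (..., 0)` yields {0, 2, 4, 6}. Calling both with n = 2 checks the float64x2_t case directly.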

It looks to me like the correct implementation would be:
vuzp1q_f64 (float64x2_t __a, float64x2_t __b)
(snippage)
  return __builtin_shuffle (__a, __b, (uint64x2_t) {2, 0});

that would generate
        zip1    v0.2d, v1.2d, v0.2d
instead of:
        zip1    v0.2d, v0.2d, v1.2d

and has the correct semantics, although it does not use the uzp1 mnemonic (I
initially expected uzp1 to appear, so I scratched my head a little and drew
some diagrams to convince myself that I was, hopefully, correct).
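Both shuffle masks and the two zip1 forms can be cross-checked lane by lane. This is a hedged sketch. `shuffle_f64x2` models the two-operand `__builtin_shuffle` numbering as described in the GCC manual: lanes of the first operand are 0..N-1, lanes of the second are N..2N-1, and mask values are taken modulo 2N. `zip1_2d` models the ZIP1 .2D instruction. Both names are hypothetical. The sketch also assumes the AAPCS64 convention that `__a` and `__b` arrive in v0 and v1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of __builtin_shuffle (a, b, {m0, m1}) on two-lane
   vectors: lanes 0..1 come from a, lanes 2..3 from b (mask modulo 4).  */
static void shuffle_f64x2 (const double a[2], const double b[2],
                           uint64_t m0, uint64_t m1, double out[2])
{
  const double cat[4] = { a[0], a[1], b[0], b[1] };  /* a:b concatenation */
  out[0] = cat[m0 % 4];
  out[1] = cat[m1 % 4];
}

/* Hypothetical model of ZIP1 Vd.2D, Vn.2D, Vm.2D: Vd = { Vn[0], Vm[0] }.  */
static void zip1_2d (const double vn[2], const double vm[2], double vd[2])
{
  double lo_n = vn[0], lo_m = vm[0];   /* read first so vd may alias a source */
  vd[0] = lo_n;
  vd[1] = lo_m;
}
```

Under these assumptions, mask {0, 2} matches `zip1 v0.2d, v0.2d, v1.2d` and selects {__a[0], __b[0]}. The proposed mask {2, 0} matches `zip1 v0.2d, v1.2d, v0.2d` and selects {__b[0], __a[0]}.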

I have also noticed that vtrn{1,2}q_f64 are implemented in terms of zip{1,2},
but that seems semantically correct (though I had to check it manually against
the architectural description, with more drawings, to convince myself).
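The vtrn check can be reproduced without drawings. Below is a minimal sketch of TRN1/TRN2 written from the architectural description. The name `trn_model` is hypothetical. n is the lane count, vn is the first source register, and vm is the second.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of TRN1 (part 0) and TRN2 (part 1).
   Each pair of result lanes takes the even (TRN1) or odd (TRN2) lane of the
   corresponding pair from vn, then the same lane from vm.
   vd must not alias vn or vm.  */
static void trn_model (const double *vn, const double *vm, double *vd,
                       size_t n, int part)
{
  assert (n >= 2 && n % 2 == 0 && vd != vn && vd != vm);
  size_t off = part ? 1 : 0;
  for (size_t p = 0; p < n / 2; p++)
    {
      vd[2 * p] = vn[2 * p + off];
      vd[2 * p + 1] = vm[2 * p + off];
    }
}
```

For n = 2, TRN1 gives {vn[0], vm[0]} and TRN2 gives {vn[1], vm[1]}. These are the same lanes as ZIP1 and ZIP2 (masks {0, 2} and {1, 3}), which is consistent with the observation above. With n = 4 the two families differ: TRN1 gives {vn[0], vm[0], vn[2], vm[2]}.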

I have attached a patch suggesting a change in arm_neon.h, in case this
analysis is correct.

I note that if https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70369 were
completed, issues like this one would be less likely to occur.
