Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> writes:
> On 26/01/18 13:31, Richard Sandiford wrote:
>> sve/extract_[12].c were relying on the target-independent optimisation
>> that removes a redundant vec_select, so that we don't end up with
>> things like:
>>
>>     dup v0.4s, v0.4s[0]
>>     ...use s0...
>>
>> But that optimisation rightly doesn't trigger for big-endian targets,
>> because GCC expects lane 0 to be in the high part of the register
>> rather than the low part.
>>
>> SVE breaks this assumption -- see the comment at the head of
>> aarch64-sve.md for details -- so the optimisation is valid for
>> both endiannesses.  Long term, we probably need some kind of target
>> hook to make GCC aware of this.
>>
>> But there's another problem with the current extract pattern: it doesn't
>> tell the register allocator how cheap an extraction of lane 0 is with
>> tied registers.  It seems better to split the lane 0 case out into
>> its own pattern and use tied operands for the FPR<-SIMD case,
>> so that using different registers has the cost of an extra reload.
>> I think we want this for both endiannesses, regardless of the hook
>> described above.
>>
>> Also, the gen_lowpart in this pattern fails for aarch64_be due to
>> TARGET_CAN_CHANGE_MODE_CLASS restrictions, so the patch uses gen_rtx_REG
>> instead.  We're only creating this rtl in order to print it, so there's
>> no need for anything fancier.
>>
>> Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?
>>
>> Richard
>>
>>
>> 2018-01-26  Richard Sandiford  <richard.sandif...@linaro.org>
>>
>> gcc/
>>         * config/aarch64/aarch64-sve.md (*vec_extract<mode><Vel>_0): New
>>         pattern.
>>         (*vec_extract<mode><Vel>_v128): Require a nonzero lane number.
>>         Use gen_rtx_REG rather than gen_lowpart.
>>
>> Index: gcc/config/aarch64/aarch64-sve.md
>> ===================================================================
>> --- gcc/config/aarch64/aarch64-sve.md   2018-01-13 18:01:51.232735405 +0000
>> +++ gcc/config/aarch64/aarch64-sve.md   2018-01-26 13:26:50.176756711 +0000
>> @@ -484,18 +484,52 @@ (define_expand "vec_extract<mode><Vel>"
>>    }
>>  )
>>
>> +;; Extract element zero.  This is a special case because we want to force
>> +;; the registers to be the same for the second alternative, and then
>> +;; split the instruction into nothing after RA.
>> +(define_insn_and_split "*vec_extract<mode><Vel>_0"
>> +  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
>> +       (vec_select:<VEL>
>> +         (match_operand:SVE_ALL 1 "register_operand" "w, 0, w")
>> +         (parallel [(const_int 0)])))]
>> +  "TARGET_SVE"
>> +  {
>> +    operands[1] = gen_rtx_REG (<V128>mode, REGNO (operands[1]));
>> +    switch (which_alternative)
>> +      {
>> +       case 0:
>> +         return "umov\\t%<vwcore>0, %1.<Vetype>[0]";
>> +       case 1:
>> +         return "#";
>> +       case 2:
>> +         return "st1\\t{%1.<Vetype>}[0], %0";
>> +       default:
>> +         gcc_unreachable ();
>> +      }
>> +  }
>> +  "&& reload_completed
>> +   && REG_P (operands[1])
>> +   && REGNO (operands[0]) == REGNO (operands[1])"
>
> Is it guaranteed that operands[0] will be a REG rtx here?
> The aarch64_simd_nonimmediate_operand predicate and Utv constraint might allow
> a MEM, which would cause an error in an RTL-checking build.
> If I'm not mistaken GCC may attempt the split even when matching alternative 2, in which case operands[0] would be a MEM.

Bah, good catch.  The REG_P check was on the wrong operand.
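
To spell out the failure mode: alternative 2 uses the Utv constraint, so
operand 0 can be a MEM, and calling REGNO on a MEM trips an assert in an
RTL-checking build.  An annotated restatement of the corrected split
condition (illustrative only; it mirrors the pattern in the patch below):

    && reload_completed
       /* Operand 0 may be a MEM (the Utv alternative), so guard the
          REGNO access; operand 1 matched register_operand, so testing
          it instead was the wrong check.  */
    && REG_P (operands[0])
       /* When source and destination end up in the same register, the
          lane-0 extract is a no-op and the split just deletes the insn.  */
    && REGNO (operands[0]) == REGNO (operands[1])
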
Fixed version below.

Richard


2018-01-26  Richard Sandiford  <richard.sandif...@linaro.org>

gcc/
        * config/aarch64/aarch64-sve.md (*vec_extract<mode><Vel>_0): New
        pattern.
        (*vec_extract<mode><Vel>_v128): Require a nonzero lane number.
        Use gen_rtx_REG rather than gen_lowpart.

Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md   2018-01-26 15:14:36.016144657 +0000
+++ gcc/config/aarch64/aarch64-sve.md   2018-01-26 15:14:36.171138212 +0000
@@ -484,18 +484,52 @@ (define_expand "vec_extract<mode><Vel>"
   }
 )
 
+;; Extract element zero.  This is a special case because we want to force
+;; the registers to be the same for the second alternative, and then
+;; split the instruction into nothing after RA.
+(define_insn_and_split "*vec_extract<mode><Vel>_0"
+  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
+       (vec_select:<VEL>
+         (match_operand:SVE_ALL 1 "register_operand" "w, 0, w")
+         (parallel [(const_int 0)])))]
+  "TARGET_SVE"
+  {
+    operands[1] = gen_rtx_REG (<V128>mode, REGNO (operands[1]));
+    switch (which_alternative)
+      {
+       case 0:
+         return "umov\\t%<vwcore>0, %1.<Vetype>[0]";
+       case 1:
+         return "#";
+       case 2:
+         return "st1\\t{%1.<Vetype>}[0], %0";
+       default:
+         gcc_unreachable ();
+      }
+  }
+  "&& reload_completed
+   && REG_P (operands[0])
+   && REGNO (operands[0]) == REGNO (operands[1])"
+  [(const_int 0)]
+  {
+    emit_note (NOTE_INSN_DELETED);
+    DONE;
+  }
+  [(set_attr "type" "neon_to_gp_q, untyped, neon_store1_one_lane_q")]
+)
+
 ;; Extract an element from the Advanced SIMD portion of the register.
 ;; We don't just reuse the aarch64-simd.md pattern because we don't
-;; want any chnage in lane number on big-endian targets.
+;; want any change in lane number on big-endian targets.
 (define_insn "*vec_extract<mode><Vel>_v128"
  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
        (vec_select:<VEL>
          (match_operand:SVE_ALL 1 "register_operand" "w, w, w")
          (parallel [(match_operand:SI 2 "const_int_operand")])))]
   "TARGET_SVE
-   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 0, 15)"
+   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 1, 15)"
   {
-    operands[1] = gen_lowpart (<V128>mode, operands[1]);
+    operands[1] = gen_rtx_REG (<V128>mode, REGNO (operands[1]));
     switch (which_alternative)
       {
        case 0:

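
For reference, a rough sketch of what the three alternatives of the new
lane-0 pattern are expected to produce (register numbers are invented for
the example; the alternatives follow the constraint string "=r, w, Utv"):

    umov    w0, v0.s[0]         // alt 0: lane 0 into a general register
    // alt 1: FPR destination tied to the source; after RA both operands
    //        name the same register, so the split removes the insn
    st1     {v0.s}[0], [x1]     // alt 2: store lane 0 straight to memory
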