Hi Segher, on 2019/11/20 1:29 AM, Segher Boessenkool wrote: > Hi! > > On Tue, Nov 12, 2019 at 06:41:07PM +0800, Kewen.Lin wrote: >> +;; code iterators and attributes for vector FP comparison operators: >> +(define_code_iterator vector_fp_comparison_operator [lt le ne >> + ungt unge unlt unle]) > > Let's indent that differently? > > (define_code_iterator > vector_fp_comparison_operator [lt le ne ungt unge unlt unle]) > > is nice I think. >
Done. >> +; There are 14 possible vector FP comparison operators, gt and eq of them >> have >> +; been expanded above, so just support 12 remaining operators here. >> >> +; 0. For ge: > > (Don't number comments please). > Done. >> +(define_expand "vector_ge<mode>" >> + [(set (match_operand:VEC_F 0 "vlogical_operand") >> + (ge:VEC_F (match_operand:VEC_F 1 "vlogical_operand") >> + (match_operand:VEC_F 2 "vlogical_operand")))] >> "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" >> + "") > > This already exists? Line 764 in vector.md? > Yes, that one is only for VEC_F (aka vector FP); I moved it here to centralize these patterns. Does that make sense? >> +; 1. For lt/le/ne/ungt/unge/unlt/unle: >> +; lt(a,b) = gt(b,a) >> +; le(a,b) = ge(b,a) >> +; unge(a,b) = ~lt(a,b) >> +; unle(a,b) = ~gt(a,b) >> +; ne(a,b) = ~eq(a,b) >> +; ungt(a,b) = ~le(a,b) >> +; unlt(a,b) = ~ge(a,b) >> +(define_insn_and_split "vector_<code><mode>" >> [(set (match_operand:VEC_F 0 "vfloat_operand") >> + (vector_fp_comparison_operator:VEC_F >> + (match_operand:VEC_F 1 "vfloat_operand") >> + (match_operand:VEC_F 2 "vfloat_operand")))] >> "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" >> "#" >> + "can_create_pseudo_p ()" >> + [(pc)] >> { >> + enum rtx_code cond = <CODE>; >> + rtx res = gen_reg_rtx (<MODE>mode); >> + bool need_invert = false; >> + switch (cond) >> + { >> + case LT: >> + case LE: >> + cond = swap_condition (cond); >> + std::swap (operands[1], operands[2]); >> + break; >> + case UNLE: >> + case UNLT: >> + case NE: >> + case UNGE: >> + case UNGT: >> + cond = reverse_condition_maybe_unordered (cond); >> + need_invert = true; >> + break; >> + default: >> + gcc_unreachable (); >> + } >> + >> + rtx comp = gen_rtx_fmt_ee (cond, <MODE>mode, operands[1], operands[2]); >> + emit_insn (gen_rtx_SET (res, comp)); >> + >> + if (need_invert) >> + emit_insn (gen_one_cmpl<mode>2 (res, res)); >> + >> + emit_insn (gen_rtx_SET (operands[0], res)); >> }) > > Does this work for (say) ungt? 
I would do two switch statements: the > first only to do the invert, and then the second to do the swap (with > the modified cond!) So: > Yes, it works, but it has to call further splitting for "le". It seems better to do the full split at once. Thanks a lot for your example! > { > enum rtx_code cond = <CODE>; > bool need_invert = false; > > if (cond == UNLE || cond == UNLT || cond == NE > || cond == UNGE || cond == UNGT) > { > cond = reverse_condition_maybe_unordered (cond); > need_invert = true; > } > > if (cond == LT || cond == LE) > { > cond = swap_condition (cond); > std::swap (operands[1], operands[2]); > } > I added an assert here to guard the code. > rtx comp = gen_rtx_fmt_ee (cond, <MODE>mode, operands[1], operands[2]); > > if (need_invert) > { > rtx res = gen_reg_rtx (<MODE>mode); > emit_insn (gen_rtx_SET (res, comp)); > emit_insn (gen_one_cmpl<mode>2 (operands[0], res)); > } > else > emit_insn (gen_rtx_SET (operands[0], comp)); > }) > >> +; 2. For ltgt/uneq/ordered/unordered: >> +; ltgt: gt(a,b) | gt(b,a) >> +; uneq: ~gt(a,b) & ~gt(b,a) >> +; ordered: ge(a,b) | ge(b,a) >> +; unordered: ~ge(a,b) & ~ge(b,a) > > You could write that as > ~(ge(a,b) | ge(b,a)) > which may show the structure better? > Done. >> +(define_insn_and_split "vector_<code><mode>" >> [(set (match_operand:VEC_F 0 "vfloat_operand") >> + (vector_fp_extra_comparison_operator:VEC_F >> + (match_operand:VEC_F 1 "vfloat_operand") >> + (match_operand:VEC_F 2 "vfloat_operand")))] >> "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" >> "#" >> + "can_create_pseudo_p ()" > > This should be "&& can_create_pseudo ()". > Done. > Also, can_create_pseudo in the split condition can fail, in theory > anyway. It should be part of the insn condition, instead, and even > then it can fail: after the last split pass, but before reload, such > an insn can in principle be created, and then it will never be split. 
> > In this case, maybe you should add a scratch register; in the previous > case, you can reuse operands[0] for res instead of making a new reg > for it, if !can_create_pseudo (). > I've updated the previous case to reuse operands[0] under a !can_create_pseudo check. But for the latter one I hit an ICE when I added two clobbers with scratch into the RTL pattern like: [(set (match_operand:VEC_F 0 "vfloat_operand") (vector_fp_extra_comparison_operator:VEC_F (match_operand:VEC_F 1 "vfloat_operand") (match_operand:VEC_F 2 "vfloat_operand"))) (clobber (match_scratch:VEC_F 3)) (clobber (match_scratch:VEC_F 4))] A pattern without the clobbers, as below, can't be recognized (early, in vregs): (insn 24 23 25 4 (set (reg:V4SF 142) (uneq:V4SF (reg:V4SF 141 [ vect__4.11 ]) (reg:V4SF 127 [ vect_cst__47 ]))) -1 (nil)) I may be missing something, but I can't find a way to satisfy both, i.e. to recognize the simplified pattern as well as the pattern with clobbers. So for now I kept it like the existing code, only handling can_create_pseudo (the old define_insn_and_split doesn't even check can_create_pseudo, but it should, since it calls gen_reg_rtx?). Looking forward to your further suggestions! >> { >> + enum rtx_code cond = <CODE>; >> + rtx res1 = gen_reg_rtx (<MODE>mode); >> + rtx res2 = gen_reg_rtx (<MODE>mode); >> + bool need_invert = false; >> + switch (cond) >> + { >> + case UNEQ: >> + need_invert = true; >> + /* Fall through. */ >> + case LTGT: >> + cond = GT; >> + break; >> + case UNORDERED: >> + need_invert = true; >> + /* Fall through. 
*/ >> + case ORDERED: >> + cond = GE; >> + break; >> + default: >> + gcc_unreachable (); >> + } >> + >> + rtx comp1 = gen_rtx_fmt_ee (cond, <MODE>mode, operands[1], operands[2]); >> + emit_insn (gen_rtx_SET (res1, comp1)); >> + rtx comp2 = gen_rtx_fmt_ee (cond, <MODE>mode, operands[2], operands[1]); >> + emit_insn (gen_rtx_SET (res2, comp2)); >> + >> + emit_insn (gen_ior<mode>3 (res1, res1, res2)); >> + >> + if (need_invert) >> + emit_insn (gen_one_cmpl<mode>2 (res1, res1)); >> + >> + emit_insn (gen_rtx_SET (operands[0], res1)); >> }) > > You can do this similarly: > > { > enum rtx_code cond = <CODE>; > bool need_invert = false; > > if (cond == UNORDERED || cond == UNEQ) > { > cond = reverse_condition_maybe_unordered (cond); > need_invert = true; > } > > switch (cond) > { > case LTGT: > cond = GT; > break; > case ORDERED: > cond = GE; > break; > default: > gcc_unreachable (); > } > > rtx comp1 = gen_rtx_fmt_ee (cond, <MODE>mode, operands[1], operands[2]); > rtx res1 = gen_reg_rtx (<MODE>mode); > emit_insn (gen_rtx_SET (res1, comp1)); > rtx comp2 = gen_rtx_fmt_ee (cond, <MODE>mode, operands[2], operands[1]); > rtx res2 = gen_reg_rtx (<MODE>mode); > emit_insn (gen_rtx_SET (res2, comp2)); > > if (need_invert) > { > rtx comp3 = gen_rtx_fmt_ee (IOR, <MODE>mode, res1, res2); > rtx comp4 = gen_rtx_fmt_e (NOT, <MODE>mode, comp3); I had to add one more gen_rtx_SET for the IOR here, otherwise I get an error that the pattern can't be recognized. > emit_insn (gen_rtx_SET (operands[0], comp4)); > } > else > emit_insn (gen_ior<mode>3 (operands[0], res1, res2)); > }) > > (the result of a split does not get combine etc. optimisations anymore, > so you have to generate pretty much ideal code directly). > > HTH, sorry for the late reply, Yeah, it helps a lot, thanks again! I have attached the updated version for your further review. BR, Kewen
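[Editorial aside, not part of the patch: the per-operator identities in the comments above (lt(a,b) = gt(b,a), unge(a,b) = ~lt(a,b), uneq(a,b) = ~(gt(a,b) | gt(b,a)), ordered(a,b) = ge(a,b) | ge(b,a), ...) can be sanity-checked on scalars under IEEE 754 NaN semantics, since each vector lane behaves the same way. A small Python sketch:]

```python
import itertools
import math

# Scalar stand-ins for the per-lane vector comparisons; under IEEE 754,
# any ordered comparison involving a NaN operand yields false.
def gt(a, b): return a > b
def ge(a, b): return a >= b

samples = [1.0, -2.5, 0.0, math.nan]
for a, b in itertools.product(samples, repeat=2):
    ordered = not (math.isnan(a) or math.isnan(b))
    # lt(a,b) = gt(b,a) and le(a,b) = ge(b,a)
    assert (a < b) == gt(b, a)
    assert (a <= b) == ge(b, a)
    # unge(a,b) = ~lt(a,b): "unordered or >=" is the negation of "<"
    assert ((not ordered) or a >= b) == (not (a < b))
    # ltgt(a,b) = gt(a,b) | gt(b,a): ordered and not equal
    assert (gt(a, b) or gt(b, a)) == (ordered and a != b)
    # uneq(a,b) = ~(gt(a,b) | gt(b,a)): unordered or equal
    assert ((not ordered) or a == b) == (not (gt(a, b) or gt(b, a)))
    # ordered(a,b) = ge(a,b) | ge(b,a)
    assert ordered == (ge(a, b) or ge(b, a))
```

All assertions hold for every pair drawn from the samples, including the NaN combinations, which is exactly why the splitters can implement the unordered operators via a complemented gt/ge.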
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md index b132037..deeab9f 100644 --- a/gcc/config/rs6000/vector.md +++ b/gcc/config/rs6000/vector.md @@ -107,6 +107,12 @@ (smin "smin") (smax "smax")]) +;; code iterators and attributes for vector FP comparison operators: +(define_code_iterator + vector_fp_comparison_operator [lt le ne ungt unge unlt unle]) +(define_code_iterator + vector_fp_extra_comparison_operator [ltgt uneq unordered ordered]) + ;; Vector move instructions. Little-endian VSX loads and stores require ;; special handling to circumvent "element endianness." @@ -665,88 +671,6 @@ DONE; }) -; lt(a,b) = gt(b,a) -(define_expand "vector_lt<mode>" - [(set (match_operand:VEC_F 0 "vfloat_operand") - (lt:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" -{ - emit_insn (gen_vector_gt<mode> (operands[0], operands[2], operands[1])); - DONE; -}) - -; le(a,b) = ge(b,a) -(define_expand "vector_le<mode>" - [(set (match_operand:VEC_F 0 "vfloat_operand") - (le:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" -{ - emit_insn (gen_vector_ge<mode> (operands[0], operands[2], operands[1])); - DONE; -}) - -; ne(a,b) = ~eq(a,b) -(define_expand "vector_ne<mode>" - [(set (match_operand:VEC_F 0 "vfloat_operand") - (ne:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" -{ - emit_insn (gen_vector_eq<mode> (operands[0], operands[1], operands[2])); - emit_insn (gen_one_cmpl<mode>2 (operands[0], operands[0])); - DONE; -}) - -; unge(a,b) = ~gt(b,a) -(define_expand "vector_unge<mode>" - [(set (match_operand:VEC_F 0 "vfloat_operand") - (unge:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" -{ - emit_insn 
(gen_vector_gt<mode> (operands[0], operands[2], operands[1])); - emit_insn (gen_one_cmpl<mode>2 (operands[0], operands[0])); - DONE; -}) - -; ungt(a,b) = ~ge(b,a) -(define_expand "vector_ungt<mode>" - [(set (match_operand:VEC_F 0 "vfloat_operand") - (ungt:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" -{ - emit_insn (gen_vector_ge<mode> (operands[0], operands[2], operands[1])); - emit_insn (gen_one_cmpl<mode>2 (operands[0], operands[0])); - DONE; -}) - -; unle(a,b) = ~gt(a,b) -(define_expand "vector_unle<mode>" - [(set (match_operand:VEC_F 0 "vfloat_operand") - (unle:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" -{ - emit_insn (gen_vector_gt<mode> (operands[0], operands[1], operands[2])); - emit_insn (gen_one_cmpl<mode>2 (operands[0], operands[0])); - DONE; -}) - -; unlt(a,b) = ~ge(a,b) -(define_expand "vector_unlt<mode>" - [(set (match_operand:VEC_F 0 "vfloat_operand") - (unlt:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" -{ - emit_insn (gen_vector_ge<mode> (operands[0], operands[1], operands[2])); - emit_insn (gen_one_cmpl<mode>2 (operands[0], operands[0])); - DONE; -}) - (define_expand "vector_eq<mode>" [(set (match_operand:VEC_C 0 "vlogical_operand") (eq:VEC_C (match_operand:VEC_C 1 "vlogical_operand") @@ -761,13 +685,6 @@ "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" "") -(define_expand "vector_ge<mode>" - [(set (match_operand:VEC_F 0 "vlogical_operand") - (ge:VEC_F (match_operand:VEC_F 1 "vlogical_operand") - (match_operand:VEC_F 2 "vlogical_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" - "") - ; >= for integer vectors: swap operands and apply not-greater-than (define_expand "vector_nlt<mode>" [(set (match_operand:VEC_I 3 "vlogical_operand") @@ -829,88 +746,118 @@ operands[3] 
= gen_reg_rtx_and_attrs (operands[0]); }) -(define_insn_and_split "vector_uneq<mode>" - [(set (match_operand:VEC_F 0 "vfloat_operand") - (uneq:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" - "#" - "" - [(set (match_dup 3) - (gt:VEC_F (match_dup 1) - (match_dup 2))) - (set (match_dup 4) - (gt:VEC_F (match_dup 2) - (match_dup 1))) - (set (match_dup 0) - (and:VEC_F (not:VEC_F (match_dup 3)) - (not:VEC_F (match_dup 4))))] -{ - operands[3] = gen_reg_rtx (<MODE>mode); - operands[4] = gen_reg_rtx (<MODE>mode); -}) +; There are 14 possible vector FP comparison operators, gt and eq of them have +; been expanded above, so just support 12 remaining operators here. -(define_insn_and_split "vector_ltgt<mode>" - [(set (match_operand:VEC_F 0 "vfloat_operand") - (ltgt:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] +; For ge: +(define_expand "vector_ge<mode>" + [(set (match_operand:VEC_F 0 "vlogical_operand") + (ge:VEC_F (match_operand:VEC_F 1 "vlogical_operand") + (match_operand:VEC_F 2 "vlogical_operand")))] "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" - "#" - "" - [(set (match_dup 3) - (gt:VEC_F (match_dup 1) - (match_dup 2))) - (set (match_dup 4) - (gt:VEC_F (match_dup 2) - (match_dup 1))) - (set (match_dup 0) - (ior:VEC_F (match_dup 3) - (match_dup 4)))] -{ - operands[3] = gen_reg_rtx (<MODE>mode); - operands[4] = gen_reg_rtx (<MODE>mode); -}) + "") -(define_insn_and_split "vector_ordered<mode>" +; For lt/le/ne/ungt/unge/unlt/unle: +; lt(a,b) = gt(b,a) +; le(a,b) = ge(b,a) +; unge(a,b) = ~lt(a,b) +; unle(a,b) = ~gt(a,b) +; ne(a,b) = ~eq(a,b) +; ungt(a,b) = ~le(a,b) +; unlt(a,b) = ~ge(a,b) +(define_insn_and_split "vector_<code><mode>" [(set (match_operand:VEC_F 0 "vfloat_operand") - (ordered:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] + (vector_fp_comparison_operator:VEC_F + 
(match_operand:VEC_F 1 "vfloat_operand") + (match_operand:VEC_F 2 "vfloat_operand")))] "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" "#" - "" - [(set (match_dup 3) - (ge:VEC_F (match_dup 1) - (match_dup 2))) - (set (match_dup 4) - (ge:VEC_F (match_dup 2) - (match_dup 1))) - (set (match_dup 0) - (ior:VEC_F (match_dup 3) - (match_dup 4)))] + "&& 1" + [(pc)] { - operands[3] = gen_reg_rtx (<MODE>mode); - operands[4] = gen_reg_rtx (<MODE>mode); + enum rtx_code cond = <CODE>; + bool need_invert = false; + + if (cond == UNLE || cond == UNLT || cond == NE || cond == UNGE + || cond == UNGT) + { + cond = reverse_condition_maybe_unordered (cond); + need_invert = true; + } + + if (cond == LT || cond == LE) + { + cond = swap_condition (cond); + std::swap (operands[1], operands[2]); + } + + gcc_assert (cond == EQ || cond == GE || cond == GT); + + rtx comp = gen_rtx_fmt_ee (cond, <MODE>mode, operands[1], operands[2]); + + if (need_invert) + { + rtx res + = (!can_create_pseudo_p () ? operands[0] : gen_reg_rtx (<MODE>mode)); + emit_insn (gen_rtx_SET (res, comp)); + emit_insn (gen_one_cmpl<mode>2 (operands[0], res)); + } + else + emit_insn (gen_rtx_SET (operands[0], comp)); }) -(define_insn_and_split "vector_unordered<mode>" +; For ltgt/uneq/ordered/unordered: +; ltgt: gt(a,b) | gt(b,a) +; uneq: ~(gt(a,b) | gt(b,a)) +; ordered: ge(a,b) | ge(b,a) +; unordered: ~(ge(a,b) | ge(b,a)) +(define_insn_and_split "vector_<code><mode>" [(set (match_operand:VEC_F 0 "vfloat_operand") - (unordered:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" + (vector_fp_extra_comparison_operator:VEC_F + (match_operand:VEC_F 1 "vfloat_operand") + (match_operand:VEC_F 2 "vfloat_operand")))] + "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode) && can_create_pseudo_p ()" "#" - "" - [(set (match_dup 3) - (ge:VEC_F (match_dup 1) - (match_dup 2))) - (set (match_dup 4) - (ge:VEC_F (match_dup 2) - (match_dup 1))) - (set (match_dup 0) - 
(and:VEC_F (not:VEC_F (match_dup 3)) - (not:VEC_F (match_dup 4))))] + "&& 1" + [(pc)] { - operands[3] = gen_reg_rtx (<MODE>mode); - operands[4] = gen_reg_rtx (<MODE>mode); + enum rtx_code cond = <CODE>; + bool need_invert = false; + + if (cond == UNORDERED || cond == UNEQ) + { + cond = reverse_condition_maybe_unordered (cond); + need_invert = true; + } + + switch (cond) + { + case LTGT: + cond = GT; + break; + case ORDERED: + cond = GE; + break; + default: + gcc_unreachable (); + } + + rtx comp1 = gen_rtx_fmt_ee (cond, <MODE>mode, operands[1], operands[2]); + rtx res1 = gen_reg_rtx (<MODE>mode); + emit_insn (gen_rtx_SET (res1, comp1)); + rtx comp2 = gen_rtx_fmt_ee (cond, <MODE>mode, operands[2], operands[1]); + rtx res2 = gen_reg_rtx (<MODE>mode); + emit_insn (gen_rtx_SET (res2, comp2)); + + if (need_invert) + { + rtx comp3 = gen_rtx_fmt_ee (IOR, <MODE>mode, res1, res2); + emit_insn (gen_rtx_SET (operands[0], comp3)); + rtx comp4 = gen_rtx_fmt_e (NOT, <MODE>mode, operands[0]); + emit_insn (gen_rtx_SET (operands[0], comp4)); + } + else + emit_insn (gen_ior<mode>3 (operands[0], res1, res2)); }) ;; Note the arguments for __builtin_altivec_vsel are op2, op1, mask
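[Editorial aside: the two-step condition rewrite performed by the first vector_<code><mode> splitter in the patch — first negate NaN-aware, then swap operands — can be modelled with plain tables. REVERSE and SWAP below are illustrative stand-ins for GCC's reverse_condition_maybe_unordered and swap_condition, not the real helpers:]

```python
# REVERSE maps a condition to its NaN-aware negation: c(a,b) == ~REVERSE[c](a,b).
# SWAP maps a condition to the one obtained by exchanging operands:
# c(a,b) == SWAP[c](b,a).  Hypothetical tables, covering only the codes
# this splitter handles.
REVERSE = {"unle": "gt", "unlt": "ge", "ne": "eq", "unge": "lt", "ungt": "le"}
SWAP = {"lt": "gt", "le": "ge"}

def split(cond):
    """Return (cond2, swapped, invert): cond(a,b) is computed as
    cond2(a,b) (or cond2(b,a) if swapped), complemented if invert."""
    invert = cond in REVERSE
    if invert:
        cond = REVERSE[cond]       # e.g. ungt -> ~le
    swapped = cond in SWAP
    if swapped:
        cond = SWAP[cond]          # e.g. le(a,b) -> ge(b,a)
    # Mirrors the gcc_assert in the splitter: only eq/ge/gt remain,
    # i.e. the comparisons the hardware provides directly.
    assert cond in ("eq", "ge", "gt")
    return cond, swapped, invert

# ungt(a,b) -> ~le(a,b) -> ~ge(b,a)
```

For example, split("ungt") yields ("ge", True, True), matching the ungt(a,b) = ~ge(b,a) identity the patch's comment lists.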