Hi Segher,

Thanks for the comments!

on 2022/11/17 02:44, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Nov 16, 2022 at 02:48:25PM +0800, Kewen.Lin wrote:
>>      * config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove
>>      float only comparison operators.
> 
> Why?  Is that correct?  Your mail says nothing about this :-(
> 
> Is there any testcase that covers this, and that shows things still
> generate the same code?
> 

Sorry for the unclear description, I thought mistakenly that it's
probably straightforward.

With the change in this patch, all 14 vector float comparison operators
(unordered/ordered/eq/ne/gt/lt/ge/le/ungt/unge/unlt/unle/uneq/ltgt)
would be handled early in rs6000_emit_vector_compare.

For unordered/ordered/ltgt/uneq, the new way is exactly the same
as what we do in rs6000_emit_vector_compare_inner, it means there is
no chance to get into rs6000_emit_vector_compare_inner with any of them.
For eq/ge/gt, it's the same too, but they are shared with vector integer
comparison, I just left them alone here.  Just noticed we can remove ge
safely too as it's guarded with !MODE_VECTOR_INT.

For ne/ungt/unlt/unge/unle, rs6000_emit_vector_compare changes the code
with reverse_condition_maybe_unordered and invert the result, it's the
same as what we have in vector.md.

; unge(a,b) = ~lt(a,b)
; unle(a,b) = ~gt(a,b)
; ne(a,b)   = ~eq(a,b)
; ungt(a,b) = ~le(a,b)
; unlt(a,b) = ~ge(a,b)

Then eq/ge/gt on the right side would match the cases that were mentioned
above.  So we just need to focus on lt and le then.

For lt, rs6000_emit_vector_compare swaps operands and the operator to gt,
it's the same as what we have in vector.md:

; lt(a,b)   = gt(b,a)

, and further matches the case mentioned above.

As to le, rs6000_emit_vector_compare tries to split it into lt IOR eq,
and further handle lt recursively, that is:
   le = lt(a,b) || eq(a,b)
      = gt(b,a) || eq(a,b)

actually this is worse than what vector.md supports:

; le(a,b)   = ge(b,a)

In short, the function rs6000_emit_vector_compare_inner is only called by
twice in rs6000_emit_vector_compare, there is no chance to enter
rs6000_emit_vector_compare_inner with codes unordered/ordered/ltgt/uneq
any more, I think it's safe to make the change in function
rs6000_emit_vector_compare_inner.  Besides, the proposed way to handle
vector float comparison can improve slightly for UNGT and LE handlings.

I constructed a test case, compiled with option -O2 -ftree-vectorize
-fno-vect-cost-model on ppc64le, which goes into this function
rs6000_emit_vector_compare with all 14 vector float comparison codes,
the assembly of most functions doesn't change after this patch,
excepting for test_UNGT_{float,double} and test_LE_{float,double}.

one example from 
before:

          lxvx 12,3,9
          lxvx 11,4,9
          xvcmpgtsp 0,11,12
          xvcmpeqsp 12,12,11
          xxlor 0,0,12
          xxlandc 0,32,0
          stxvx 0,5,9
          addi 9,9,16
          bdnz .L77

vs. 

after: (good to be unrolled)

          lxvx 0,4,10
          lxvx 12,3,10
          addi 9,10,16
          lxvx 11,3,9
          xvcmpgesp 12,0,12
          lxvx 0,4,9
          xvcmpgesp 0,0,11
          xxlandc 12,32,12
          stxvx 12,5,10
          addi 10,10,32
          xxlandc 0,32,0
          stxvx 0,5,9
          bdnz .L77


===============
$ cat test.h

#define UNORD(a, b) (__builtin_isunordered ((a), (b)))
#define ORD(a, b) (!__builtin_isunordered ((a), (b)))
#define LTGT(a, b) (__builtin_islessgreater ((a), (b)))
#define UNEQ(a, b) (!__builtin_islessgreater ((a), (b)))
#define UNGT(a, b) (!__builtin_islessequal ((a), (b)))
#define UNGE(a, b) (!__builtin_isless ((a), (b)))
#define UNLT(a, b) (!__builtin_isgreaterequal ((a), (b)))
#define UNLE(a, b) (!__builtin_isgreater ((a), (b)))
#define GT(a, b) (((a) > (b)))
#define GE(a, b) (((a) >= (b)))
#define LT(a, b) (((a) < (b)))
#define LE(a, b) (((a) <= (b)))
#define EQ(a, b) (((a) == (b)))
#define NE(a, b) (((a) != (b)))

#define TEST_VECT(NAME, TYPE)                                                  \
  __attribute__ ((noipa)) void test_##NAME##_##TYPE (TYPE *x, TYPE *y,         \
                                                     int *res, int n)          \
  {                                                                            \
    for (int i = 0; i < n; i++)                                                \
      res[i] = NAME (x[i], y[i]);                                              \
  }

===============
$ cat test.c

#include "test.h"

#define TEST(TYPE)                                                             \
  TEST_VECT (UNORD, TYPE)                                                      \
  TEST_VECT (ORD, TYPE)                                                        \
  TEST_VECT (LTGT, TYPE)                                                       \
  TEST_VECT (UNEQ, TYPE)                                                       \
  TEST_VECT (UNGT, TYPE)                                                       \
  TEST_VECT (UNGE, TYPE)                                                       \
  TEST_VECT (UNLT, TYPE)                                                       \
  TEST_VECT (UNLE, TYPE)                                                       \
  TEST_VECT (GT, TYPE)                                                         \
  TEST_VECT (GE, TYPE)                                                         \
  TEST_VECT (LT, TYPE)                                                         \
  TEST_VECT (LE, TYPE)                                                         \
  TEST_VECT (EQ, TYPE)                                                         \
  TEST_VECT (NE, TYPE)

TEST (float)
TEST (double)
===============

Maybe it's good to add one test case with function test_{UNGT,LE}_{float,double}
and scan not xvcmp{gt,eq}[sd]p.

With the above explanation, does this patch look good to you?

BR,
Kewen

Reply via email to