[I am still a little confused, sorry for the long email...]
On Tue, 2 Oct 2012, Richard Guenther wrote:
+  if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
+    {
+      int count = VECTOR_CST_NELTS (op0);
+      tree *elts = XALLOCAVEC (tree, count);
+      gcc_assert (TREE_CODE (type) == VECTOR_TYPE);
+
+      for (int i = 0; i < count; i++)
+        {
+          tree elem_type = TREE_TYPE (type);
+          tree elem0 = VECTOR_CST_ELT (op0, i);
+          tree elem1 = VECTOR_CST_ELT (op1, i);
+
+          elts[i] = fold_relational_const (code, elem_type,
+                                           elem0, elem1);
+
+          if (elts[i] == NULL_TREE)
+            return NULL_TREE;
+
+          elts[i] = fold_negate_const (elts[i], elem_type);
I think you need to invent something new similar to STORE_FLAG_VALUE
or use STORE_FLAG_VALUE here. With the above you try to map
{0, 1} to {0, -1} which is only true if the operation on the element types
returns {0, 1} (thus, STORE_FLAG_VALUE is 1).
Er, seems to me that constant folding of a scalar comparison in the
front/middle-end only returns {0, 1}.
[and later]
I'd say adjust your fold-const patch to not negate the scalar result
but build a proper -1 / 0 value based on integer_zerop().
I don't mind doing it that way, but I would like to understand first.
LT_EXPR on scalars is guaranteed (in generic.texi) to be 0 or 1. So
negating should be the same as testing with integer_zerop to build -1 or
0. Is it just a matter of style (then I am ok), or am I missing a reason
which makes the negation wrong?
The point is we need to define some semantics for vector comparison
results.
Yes. I think a documentation patch should come first: generic.texi is
missing an entry for VEC_COND_EXPR and the entry for LT_EXPR doesn't
mention vectors. But before that we need to decide what to put there...
One variant is to make it target independent, which in turn would
inhibit (or at least make more difficult) exploiting some target features.
You for example use {0, -1} for truth values - probably to exploit target
features -
Actually it was mostly because that is the meaning in the language. OpenCL
says that a<b is a vector of 0 and -1, and that ?: only looks at the MSB
of the elements in the condition. The fact that it matches what some
targets do is a simple consequence of the fact that OpenCL was based on
what hardware already did.
even though the most natural middle-end way would be to
use {0, 1} as for everything else
I agree that it would be natural and convenient in a number of places.
(caveat: there may be both signed and unsigned bools; we don't allow
vector components with non-mode precision, so you could argue that a
signed bool : 1 is just "sign-extended" for your solution).
Not sure how that would translate in the code.
A different variant is to make it target dependent to leverage
optimization opportunities
That's an interesting possibility...
that's why STORE_FLAG_VALUE exists.
AFAICS it only appears when we go from gimple to rtl, not before (and
there is already a VECTOR_STORE_FLAG_VALUE, although no target defines
it). Which doesn't mean we couldn't make it appear earlier for vectors.
For example, take a vector comparison result a < v: when performing
bitwise operations on it, you either have to make the target expand
code that produces {0, -1} even if the natural compare instruction
would, say, produce {0, 0x80000}, or leave the possible values of
its result unconstrained (as forwprop would do with your patch).
In general we
want constant folding to yield the same results as if the HW carried
out the operation to make -O0 code not diverge from -O1. Thus,
v4si g;
int main () { g = { 1, 2, 3, 4 } < { 4, 3, 2, 1 }; }
should not assign different values to g dependent on constant propagation
performed or not.
That one is clear, OpenCL constrains the answer to be {-1,-1,0,0}, whether
your target likes it or not. Depending on how things are handled,
comparisons could be constrained internally to only appear (possibly
indirectly) in the first argument of a vec_cond_expr.
The easiest way out is something like STORE_FLAG_VALUE
if there does not exist a middle-end choice for vector true / false components
that can be easily generated from what the target produces.
Like if you perform a FP comparison
int main () { double x = 1.0; static _Bool b; b = x < 3.0; }
you get without CCP on x86_64:
ucomisd -8(%rbp), %xmm0
seta %al
movb %al, b.1715(%rip)
thus the equivalent of
flag_reg = x < 3.0;
b = flag_reg ? 1 : 0;
where this expansion happens in the back-end.
for vector compares you get something similar:
flag_vec = x < y;
res = flag_vec ? { 1, ... } : { 0, ... };
which I think you can see being produced by generic vector lowering
(in do_compare), where I can see we indeed use {0, -1} ... which
would match your constant folding behavior.
We may not be able to easily recover from this intermediate step
with combine (I'm not sure), so a target dependent value may
be preferred.
Being able to optimize it is indeed a key point. Let's try on an example
(not assuming any specific representation in the middle-end for now). Say
I write this C/OpenCL code: ((a<b)&&(c<d))?x:y (not currently supported)
The front-end gives to the middle-end: ((((a<b)?-1:0)&((c<d)?-1:0))<0)?x:y
On an architecture like sse, neon or altivec where VECTOR_STORE_FLAG_VALUE
is -1 (well, should be), expansion of (a<b)?-1:0 would just be a<b. The <0
can also disappear if the vcond instruction only looks at the MSB (x86).
And we are left in the back-end with ((a<b)&(c<d))?x:y, as desired.
On other architectures, expecting the back-end to simplify everything does
seem hard. But it isn't obvious how to handle it in the middle end either.
Some other forms we could imagine the middle-end producing:
(a<b)?(c<d)?x:y:y
or assuming that VECTOR_STORE_FLAG_VALUE is defined:
(((a<b)&(c<d))!=0)?x:y (back-end would remove the != 0 on altivec)
Both would require special code to happen.
But then how do we handle for instance sparc, where IIUC comparing 2
vectors returns an integer, where bits 0, 1, etc of the integer represent
true/false for the comparisons of elements 0, 1, etc of the vectors (as in
vec_merge, but not constant)? Defining VECTOR_STORE_FLAG_VALUE is not
possible since comparisons don't return a vector, but we would still want
to compute a<b, c<d, and perform an AND of those 2 integers before calling
the usual code for the selection.
If we assume a -1/0 and MSB representation in the middle-end, the
front-end could just pass ((a<b)&(c<d))?x:y to the middle-end. When
moving to the back-end, "nothing" would happen on x86.
Comparing x86, neon and altivec, they all have comparisons that return a
vector of -1 and 0. On the other hand, they have different selection
instructions. x86 uses <0, altivec uses !=0 and neon has a bitwise select
and thus requires exactly -1 or 0. It thus seems to me that we should
decide in the middle-end that vector comparisons return vectors of -1 and
0. VEC_COND_EXPR is more complicated. We could for instance require that
it takes as first argument a vector of -1 and 0 (thus <0, !=0 and the neon
thing are equivalent). Which would leave to decide what the expansion of
vec_cond_expr passes to the targets when the first argument is not a
comparison, between !=0, <0, ==-1 or others (I vote for <0 because of
opencl). One issue is that targets wouldn't know if it was a dummy
comparison that can safely be ignored because the other part is the result
of logical operations on comparisons (thus composed of -1 and 0) or a
genuine comparison with an arbitrary vector, so a new optimization would
be needed (in the back-end I guess or we would need an alternate
instruction to vcond) to detect if a vector is a "signed boolean" vector.
We could instead say that vec_cond_expr really follows OpenCL's semantics
and looks at the MSB of each element. I am not sure that would change
much; it would mostly delay the appearance of <0 to RTL expansion time
(and thus make gimple slightly lighter).
+/* Return true if EXPR is an integer constant representing true.  */
+
+bool
+integer_truep (const_tree expr)
+{
+  STRIP_NOPS (expr);
+
+  switch (TREE_CODE (expr))
+    {
+    case INTEGER_CST:
+      /* Do not just test != 0; some places expect the value 1.  */
+      return (TREE_INT_CST_LOW (expr) == 1
+              && TREE_INT_CST_HIGH (expr) == 0);
I wonder if using STORE_FLAG_VALUE is better here (note that it
usually differs for FP vs. integral comparisons and the mode passed
to STORE_FLAG_VALUE is that of the comparison result).
I notice there is already a VECTOR_STORE_FLAG_VALUE (used only once in
simplify-rtx, in a way that seems a bit strange but I'll try to
understand that later). Thanks for showing me this macro, it seems
important indeed. However the STORE_FLAG_VALUE mechanism seems to be for
the RTL level.
It looks like it would be possible to have 3 different semantics:
source code is OpenCL, middle-end whatever we want (0 / 1 for instance),
and back-end is whatever the target wants. The front-end would generate
for a<b : vec_cond_expr(a<b,-1,0)
seems like the middle-end uses this for lowering vector compares,
a < b -> { a[0] < b[0] ? -1 : 0, ... }
and for a?b:c : vec_cond_expr(a<0,b,c)
it looks like ?: is not generally handled by tree-vect-generic, so it must
be either not supported by the frontend or lowered therein (ISTR
it is forced to appear as a != {0,...} ? ... : ...)
Not supported by the front-end yet (not even by the gimplifier), I have
(bad) patches but I can't really finish them before this conversation is
done.
I think there are rather few places in the middle-end that assume that
comparisons return a vector of -1/0, and even fewer that assume
vec_cond_expr only looks at the MSB of each element. So it is still time
to change that if
you want to. But if we want to change it, I think it should happen now
before even more vector code gets in (not particularly my patches, I am
thinking of cilk and others too).
Ok, that's long enough, I need to send it now...
--
Marc Glisse