On Fri, Oct 5, 2012 at 5:01 PM, Marc Glisse <marc.gli...@inria.fr> wrote:
> [I am still a little confused, sorry for the long email...]
>
>
> On Tue, 2 Oct 2012, Richard Guenther wrote:
>
>>>>> +  if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>>>>> +    {
>>>>> +      int count = VECTOR_CST_NELTS (op0);
>>>>> +      tree *elts =  XALLOCAVEC (tree, count);
>>>>> +      gcc_assert (TREE_CODE (type) == VECTOR_TYPE);
>>>>> +
>>>>> +      for (int i = 0; i < count; i++)
>>>>> +       {
>>>>> +         tree elem_type = TREE_TYPE (type);
>>>>> +         tree elem0 = VECTOR_CST_ELT (op0, i);
>>>>> +         tree elem1 = VECTOR_CST_ELT (op1, i);
>>>>> +
>>>>> +         elts[i] = fold_relational_const (code, elem_type,
>>>>> +                                          elem0, elem1);
>>>>> +
>>>>> +         if (elts[i] == NULL_TREE)
>>>>> +           return NULL_TREE;
>>>>> +
>>>>> +         elts[i] = fold_negate_const (elts[i], elem_type);
>>>>
>>>>
>>>>
>>>> I think you need to invent something new similar to STORE_FLAG_VALUE
>>>> or use STORE_FLAG_VALUE here.  With the above you try to map
>>>> {0, 1} to {0, -1} which is only true if the operation on the element
>>>> types
>>>> returns {0, 1} (thus, STORE_FLAG_VALUE is 1).
>>>
>>>
>>> Er, seems to me that constant folding of a scalar comparison in the
>>> front/middle-end only returns {0, 1}.
>
> [and later]
>
>> I'd say adjust your fold-const patch to not negate the scalar result
>> but build a proper -1 / 0 value based on integer_zerop().
>
>
> I don't mind doing it that way, but I would like to understand first.
> LT_EXPR on scalars is guaranteed (in generic.texi) to be 0 or 1. So negating
> should be the same as testing with integer_zerop to build -1 or 0. Is it
> just a matter of style (then I am ok), or am I missing a reason which makes
> the negation wrong?

Just a matter of style.  Negating is a lot less descriptive for the actual
set of return values we produce.
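
Something like this in the folding loop (an untested sketch, just to
illustrate what I mean - using integer_zerop to pick the -1 / 0 value
explicitly):

   /* Fold the scalar comparison, then build the vector truth value
      as -1 (true) or 0 (false) instead of negating the scalar
      result.  */
   tree tem = fold_relational_const (code, elem_type, elem0, elem1);
   if (tem == NULL_TREE)
     return NULL_TREE;
   elts[i] = build_int_cst (elem_type, integer_zerop (tem) ? 0 : -1);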

>> The point is we need to define some semantics for vector comparison
>> results.
>
>
> Yes. I think a documentation patch should come first: generic.texi is
> missing an entry for VEC_COND_EXPR and the entry for LT_EXPR doesn't mention
> vectors. But before that we need to decide what to put there...
>
>
>> One variant is to make it target independent, which in turn
>> would inhibit (or at least complicate) exploiting some target features.
>> You for example use {0, -1} for truth values - probably to exploit target
>> features -
>
>
> Actually it was mostly because that is the meaning in the language. OpenCL
> says that a<b is a vector of 0 and -1, and that ?: only looks at the MSB of
> the elements in the condition. The fact that it matches what some targets do
> is a simple consequence of the fact that OpenCL was based on what hardware
> already did.

Yes, it seems that the {0, -1} choice is most reasonable for GENERIC.  So
let's document that.
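
That is, something along the lines of (a sketch of what generic.texi
could say, using a v4si example):

   v4si a, b, r;
   r = a < b;   /* each element of r is -1 where the element-wise
                   comparison is true and 0 where it is false */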

>
>> even though the most natural middle-end way would be to
>> use {0, 1} as for everything else
>
>
> I agree that it would be natural and convenient in a number of places.
>
>
>> (caveat: there may be both signed and unsigned bools, we don't allow
>> vector components with non-mode precision, thus you could argue that a
>> signed bool : 1 is just "sign-extended" for your solution).
>
>
> Not sure how that would translate in the code.
>
>
>> A different variant is to make it target dependent to leverage
>> optimization opportunities
>
>
> That's an interesting possibility...
>
>
>> that's why STORE_FLAG_VALUE exists.
>
>
> AFAICS it only appears when we go from gimple to rtl, not before (and there
> is already a VECTOR_STORE_FLAG_VALUE, although no target defines it). Which
> doesn't mean we couldn't make it appear earlier for vectors.
>
>
>> For example, with a vector comparison result a < b, when
>> performing bitwise operations on it, you either have to make the target
>> expand code to produce {0, -1} even if the natural compare instruction
>> would, say, produce {0, 0x80000} - or not constrain the possible values
>> of its result (like forwprop would do with your patch).  In general we
>> want constant folding to yield the same results as if the HW carried
>> out the operation to make -O0 code not diverge from -O1.  Thus,
>>
>> v4si g;
>> int main() { g = { 1, 2, 3, 4 } < { 4, 3, 2, 1}; }
>>
>> should not assign different values to g dependent on constant propagation
>> performed or not.
>
>
> That one is clear, OpenCL constrains the answer to be {-1,-1,0,0}, whether
> your target likes it or not. Depending on how things are handled,
> comparisons could be constrained internally to only appear (possibly
> indirectly) in the first argument of a vec_cond_expr.

Yes, I realized that later.

>
>> The easiest way out is something like STORE_FLAG_VALUE
>> if there does not exist a middle-end choice for vector true / false
>> components
>> that can be easily generated from what the target produces.
>>
>> Like if you perform a FP comparison
>>
>> int main () { double x = 1.0; static _Bool b; b = x < 3.0; }
>>
>> you get without CCP on x86_64:
>>
>>        ucomisd -8(%rbp), %xmm0
>>        seta    %al
>>        movb    %al, b.1715(%rip)
>>
>> thus the equivalent of
>>
>>    flag_reg = x < 3.0;
>>    b = flag_reg ? 1 : 0;
>
>
> where this expansion happens in the back-end.

In the target specific expander for the comparison.

>
>> for vector compares you get something similar:
>>
>>    flag_vec = x < y;
>>    res = flag_vec ? { 1, ... } : { 0, ... };
>>
>> which I think you can see being produced by generic vector lowering
>> (in do_compare).  Where I can see we indeed use {0, -1} ... which
>> would match your constant folding behavior.
>>
>> We may not be able to easily recover from this intermediate step
>> with combine (I'm not sure), so a target dependent value may
>> be prefered.
>
>
> Being able to optimize it is indeed a key point. Let's try on an example
> (not assuming any specific representation in the middle-end for now). Say I
> write this C/OpenCL code: ((a<b)&&(c<d))?x:y (not currently supported)
>
> The front-end gives to the middle-end: ((((a<b)?-1:0)&((c<d)?-1:0))<0)?x:y
>
> On an architecture like sse, neon or altivec where VECTOR_STORE_FLAG_VALUE
> is -1 (well, should be), expansion of (a<b)?-1:0 would just be a<b. The <0
> can also disappear if the vcond instruction only looks at the MSB (x86). And
> we are left in the back-end with ((a<b)&(c<d))?x:y, as desired.
>
> On other architectures, expecting the back-end to simplify everything does
> seem hard. But it isn't obvious how to handle it in the middle end either.
> Some other forms we could imagine the middle-end producing:
> (a<b)?(c<d)?x:y:y
> or assuming that VECTOR_STORE_FLAG_VALUE is defined:
> (((a<b)&(c<d))!=0)?x:y (back-end would remove the != 0 on altivec)
> Both would require special code to happen.

True.

> But then how do we handle for instance sparc, where IIUC comparing 2 vectors
> returns an integer whose bits 0, 1, etc. represent true/false for the
> comparisons of elements 0, 1, etc. of the vectors (as in vec_merge,
> but not constant)? Defining VECTOR_STORE_FLAG_VALUE is not possible since
> comparisons don't return a vector, but we would still want to compute a<b,
> c<d, and perform an AND of those 2 integers before calling the usual code
> for the selection.

Yeah ... :/

>
> If we assume a -1/0 and MSB representation in the middle-end, the front-end
> could just pass ((a<b)&(c<d))?x:y to the middle-end. When moving to the
> back-end, "nothing" would happen on x86.

But the frontend needs to follow a language standard (which seems to
be OpenCL for C).  It of course could see the conditions are only
used in a COND_EXPR and try to optimize that.
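
That is (just restating your forms), for

   r = (a < b && c < d) ? x : y;

the strict lowering would be

   ((((a < b) ? -1 : 0) & ((c < d) ? -1 : 0)) < 0) ? x : y

but since the comparisons already yield -1 / 0 the frontend could emit

   ((a < b) & (c < d)) ? x : y

directly when it sees that the condition only feeds the selection.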

> Comparing x86, neon and altivec, they all have comparisons that return a
> vector of -1 and 0. On the other hand, they have different selection
> instructions. x86 uses <0, altivec uses !=0 and neon has a bitwise select
> and thus requires exactly -1 or 0. It thus seems to me that we should decide
> in the middle-end that vector comparisons return vectors of -1 and 0.

Yes, I think -1 and 0 are indeed the best choice.

> VEC_COND_EXPR is more complicated. We could for instance require that it
> takes as first argument a vector of -1 and 0 (thus <0, !=0 and the neon
> thing are equivalent). Which would leave to decide what the expansion of
> vec_cond_expr passes to the targets when the first argument is not a
> comparison, between !=0, <0, ==-1 or others (I vote for <0 because of
> OpenCL). One issue is that targets wouldn't know if it was a dummy
> comparison that can safely be ignored because the other part is the result
> of logical operations on comparisons (thus composed of -1 and 0) or a
> genuine comparison with an arbitrary vector, so a new optimization would be
> needed (in the back-end I guess or we would need an alternate instruction to
> vcond) to detect if a vector is a "signed boolean" vector.
> We could instead say that vec_cond_expr really follows OpenCL's semantics
> and looks at the MSB of each element. I am not sure that would change much,
> it would mostly delay the appearance of <0 until RTL expansion time (and thus
> make gimple slightly lighter).

I think we should delay the decision on how to optimize this.  It's indeed
not trivial, and in forwprop the GIMPLE middle-end already aggressively
forwards feeding comparisons into the VEC_COND_EXPR expressions
(somewhat defeating any CSE that might be possible here).
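
For example (a GIMPLE sketch), with a single-use mask forwprop turns

   mask_1 = a_2 < b_3;
   res_4 = VEC_COND_EXPR <mask_1, x_5, y_6>;

into

   res_4 = VEC_COND_EXPR <a_2 < b_3, x_5, y_6>;

so the mask value itself never becomes explicit.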

>
>
>>>>> +/* Return true if EXPR is an integer constant representing true.  */
>>>>> +
>>>>> +bool
>>>>> +integer_truep (const_tree expr)
>>>>> +{
>>>>> +  STRIP_NOPS (expr);
>>>>> +
>>>>> +  switch (TREE_CODE (expr))
>>>>> +    {
>>>>> +    case INTEGER_CST:
>>>>> +      /* Do not just test != 0, some places expect the value 1.  */
>>>>> +      return (TREE_INT_CST_LOW (expr) == 1
>>>>> +             && TREE_INT_CST_HIGH (expr) == 0);
>>>>
>>>>
>>>>
>>>> I wonder if using STORE_FLAG_VALUE is better here (note that it
>>>> usually differs for FP vs. integral comparisons and the mode passed
>>>> to STORE_FLAG_VALUE is that of the comparison result).
>>>
>>>
>>>
>>> I notice there is already a VECTOR_STORE_FLAG_VALUE (used only once in
>>> simplify-rtx, in a way that seems a bit strange but I'll try to
>>> understand that later). Thanks for showing me this macro, it seems
>>> important indeed. However the STORE_FLAG_VALUE mechanism seems to be for
>>> the RTL level.
>>>
>>> It looks like it would be possible to have 3 different semantics:
>>> source code is OpenCL, middle-end whatever we want (0 / 1 for instance),
>>> and back-end is whatever the target wants. The front-end would generate
>>> for a<b : vec_cond_expr(a<b,-1,0)
>>
>>
>> seems like the middle-end uses this for lowering vector compares,
>> a < b -> { a[0] < b[0] ? -1 : 0, ... }
>>
>>> and for a?b:c : vec_cond_expr(a<0,b,c)
>>
>>
>> it looks like ?: is not generally handled by tree-vect-generic, so it must
>> be either not supported by the frontend or lowered therein (ISTR
>> it is forced to appear as a != {0,...} ? ... : ...)
>
>
> Not supported by the front-end yet (not even by the gimplifier).  I have
> (bad) patches, but I can't really finish them before this conversation is
> done.
>
>
>
> I think there are only a few places in the middle-end that assume that
> comparisons return a vector of -1/0, and even fewer that assume vec_cond_expr only
> looks at the MSB of each element. So it is still time to change that if you
> want to. But if we want to change it, I think it should happen now before
> even more vector code gets in (not particularly my patches, I am thinking of
> cilk and others too).

I think we should document the -1/0 fact and stick to it.

Thanks,
Richard.

>
> Ok, that's long enough, I need to send it now...
>
> --
> Marc Glisse
