[I am still a little confused, sorry for the long email...]
On Tue, 2 Oct 2012, Richard Guenther wrote:
+  if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
+    {
+      int count = VECTOR_CST_NELTS (op0);
+      tree *elts = XALLOCAVEC (tree, count);
+      gcc_assert (TREE_CODE (type) == VECTOR_TYPE);
+
+      for (int i = 0; i < count; i++)
+        {
+          tree elem_type = TREE_TYPE (type);
+          tree elem0 = VECTOR_CST_ELT (op0, i);
+          tree elem1 = VECTOR_CST_ELT (op1, i);
+
+          elts[i] = fold_relational_const (code, elem_type,
+                                           elem0, elem1);
+
+          if (elts[i] == NULL_TREE)
+            return NULL_TREE;
+
+          elts[i] = fold_negate_const (elts[i], elem_type);
I think you need to invent something new similar to STORE_FLAG_VALUE
or use STORE_FLAG_VALUE here. With the above you try to map
{0, 1} to {0, -1} which is only true if the operation on the element types
returns {0, 1} (thus, STORE_FLAG_VALUE is 1).
Er, seems to me that constant folding of a scalar comparison in the
front/middle-end only returns {0, 1}.
[and later]
I'd say adjust your fold-const patch to not negate the scalar result
but build a proper -1 / 0 value based on integer_zerop().
I don't mind doing it that way, but I would like to understand first.
LT_EXPR on scalars is guaranteed (in generic.texi) to be 0 or 1. So
negating should be the same as testing with integer_zerop to build -1 or
0. Is it just a matter of style (then I am ok), or am I missing a reason
which makes the negation wrong?
The point is we need to define some semantics for vector comparison
results.
Yes. I think a documentation patch should come first: generic.texi is
missing an entry for VEC_COND_EXPR and the entry for LT_EXPR doesn't
mention vectors. But before that we need to decide what to put there...
One variant is to make it target independent, which in turn would
inhibit (or at least make more difficult) exploiting some target features.
You for example use {0, -1} for truth values - probably to exploit target
features -
Actually it was mostly because that is the meaning in the language. OpenCL
says that a<b is a vector of 0 and -1, and that ?: only looks at the MSB
of the elements in the condition. The fact that it matches what some
targets do is a simple consequence of the fact that OpenCL was based on
what hardware already did.
even though the most natural middle-end way would be to
use {0, 1} as for everything else
I agree that it would be natural and convenient in a number of places.
(caveat: there may be both signed and unsigned bools; we don't allow
vector components with non-mode precision, so you could argue that a
signed bool : 1 is just "sign-extended" for your solution).
Not sure how that would translate in the code.
A different variant is to make it target dependent to leverage
optimization opportunities
That's an interesting possibility...
that's why STORE_FLAG_VALUE exists.
AFAICS it only appears when we go from gimple to rtl, not before (and
there is already a VECTOR_STORE_FLAG_VALUE, although no target defines
it). Which doesn't mean we couldn't make it appear earlier for vectors.
For example, take a vector comparison result a < v: when performing
bitwise operations on it, you either have to make the target expand
code that produces {0, -1} even if the natural compare instruction
would, say, produce {0, 0x80000}, or leave the possible values of
its result unconstrained (as forwprop would do with your patch).
In general we
want constant folding to yield the same results as if the HW carried
out the operation to make -O0 code not diverge from -O1. Thus,
v4si g;
int main () { g = { 1, 2, 3, 4 } < { 4, 3, 2, 1 }; }
should not assign different values to g dependent on constant propagation
performed or not.
That one is clear, OpenCL constrains the answer to be {-1,-1,0,0}, whether
your target likes it or not. Depending on how things are handled,
comparisons could be constrained internally to only appear (possibly
indirectly) in the first argument of a vec_cond_expr.
The easiest way out is something like STORE_FLAG_VALUE
if there does not exist a middle-end choice for vector true / false components
that can be easily generated from what the target produces.
Like if you perform a FP comparison
int main () { double x = 1.0; static _Bool b; b = x < 3.0; }
you get without CCP on x86_64:
ucomisd -8(%rbp), %xmm0
seta %al
movb %al, b.1715(%rip)
thus the equivalent of
flag_reg = x < 3.0;
b = flag_reg ? 1 : 0;
where this expansion happens in the back-end.
for vector compares you get something similar:
flag_vec = x < y;
res = flag_vec ? { 1, ... } : { 0, ... };
which I think you can see being produced by generic vector lowering
(in do_compare), where I can see we indeed use {0, -1} ... which
would match your constant folding behavior.
We may not be able to easily recover from this intermediate step
with combine (I'm not sure), so a target dependent value may
be preferred.
Being able to optimize it is indeed a key point. Let's try on an example
(not assuming any specific representation in the middle-end for now). Say
I write this C/OpenCL code: ((a<b)&&(c<d))?x:y (not currently supported)
The front-end gives to the middle-end: ((((a<b)?-1:0)&((c<d)?-1:0))<0)?x:y
On an architecture like sse, neon or altivec where VECTOR_STORE_FLAG_VALUE
is -1 (well, should be), expansion of (a<b)?-1:0 would just be a<b. The <0
can also disappear if the vcond instruction only looks at the MSB (x86).
And we are left in the back-end with ((a<b)&(c<d))?x:y, as desired.
On other architectures, expecting the back-end to simplify everything does
seem hard. But it isn't obvious how to handle it in the middle end either.
Some other forms we could imagine the middle-end producing:
(a<b)?(c<d)?x:y:y
or assuming that VECTOR_STORE_FLAG_VALUE is defined:
(((a<b)&(c<d))!=0)?x:y (back-end would remove the != 0 on altivec)
Both would require special code to happen.
But then how do we handle for instance sparc, where IIUC comparing 2
vectors returns an integer, where bits 0, 1, etc of the integer represent
true/false for the comparisons of elements 0, 1, etc of the vectors (as in
vec_merge, but not constant)? Defining VECTOR_STORE_FLAG_VALUE is not
possible since comparisons don't return a vector, but we would still want
to compute a<b, c<d, and perform an AND of those 2 integers before calling
the usual code for the selection.
If we assume a -1/0 and MSB representation in the middle-end, the
front-end could just pass ((a<b)&(c<d))?x:y to the middle-end. When
moving to the back-end, "nothing" would happen on x86.
Comparing x86, neon and altivec, they all have comparisons that return a
vector of -1 and 0. On the other hand, they have different selection
instructions. x86 uses <0, altivec uses !=0 and neon has a bitwise select
and thus requires exactly -1 or 0. It thus seems to me that we should
decide in the middle-end that vector comparisons return vectors of -1 and
0. VEC_COND_EXPR is more complicated. We could for instance require that
it takes as first argument a vector of -1 and 0 (thus <0, !=0 and the neon
thing are equivalent). Which would leave to decide what the expansion of
vec_cond_expr passes to the targets when the first argument is not a
comparison, between !=0, <0, ==-1 or others (I vote for <0 because of
opencl). One issue is that targets wouldn't know if it was a dummy
comparison that can safely be ignored because the other part is the result
of logical operations on comparisons (thus composed of -1 and 0) or a
genuine comparison with an arbitrary vector, so a new optimization would
be needed (in the back-end I guess or we would need an alternate
instruction to vcond) to detect if a vector is a "signed boolean" vector.
We could instead say that vec_cond_expr really follows OpenCL's semantics
and looks at the MSB of each element. I am not sure that would change
much; it would mostly delay the appearance of <0 to RTL expansion time
(and thus make gimple slightly lighter).
+/* Return true if EXPR is an integer constant representing true.  */
+
+bool
+integer_truep (const_tree expr)
+{
+  STRIP_NOPS (expr);
+
+  switch (TREE_CODE (expr))
+    {
+    case INTEGER_CST:
+      /* Do not just test != 0; some places expect the value 1.  */
+      return (TREE_INT_CST_LOW (expr) == 1
+              && TREE_INT_CST_HIGH (expr) == 0);
I wonder if using STORE_FLAG_VALUE is better here (note that it
usually differs for FP vs. integral comparisons and the mode passed
to STORE_FLAG_VALUE is that of the comparison result).
I notice there is already a VECTOR_STORE_FLAG_VALUE (used only once in
simplify-rtx, in a way that seems a bit strange but I'll try to
understand that later). Thanks for showing me this macro, it seems
important indeed. However the STORE_FLAG_VALUE mechanism seems to be for
the RTL level.
It looks like it would be possible to have 3 different semantics:
source code is OpenCL, middle-end whatever we want (0 / 1 for instance),
and back-end is whatever the target wants. The front-end would generate
for a<b : vec_cond_expr(a<b,-1,0)
seems like the middle-end uses this for lowering vector compares,
a < b -> { a[0] < b[0] ? -1 : 0, ... }
and for a?b:c : vec_cond_expr(a<0,b,c)
it looks like ?: is not generally handled by tree-vect-generic, so it must
be either not supported by the frontend or lowered therein (ISTR
it is forced to appear as a != {0,...} ? ... : ...)
Not supported by the front-end yet (not even by the gimplifier), I have
(bad) patches but I can't really finish them before this conversation is
done.
I think there are rather few places in the middle-end that assume that
comparisons return a vector of -1/0, and even fewer that assume
vec_cond_expr only looks at the MSB of each element. So it is still time
to change that if
you want to. But if we want to change it, I think it should happen now
before even more vector code gets in (not particularly my patches, I am
thinking of cilk and others too).
Ok, that's long enough, I need to send it now...
--
Marc Glisse