On Fri, Oct 5, 2012 at 5:01 PM, Marc Glisse <marc.gli...@inria.fr> wrote:
> [I am still a little confused, sorry for the long email...]
>
> On Tue, 2 Oct 2012, Richard Guenther wrote:
>
>>>>> +  if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>>>>> +    {
>>>>> +      int count = VECTOR_CST_NELTS (op0);
>>>>> +      tree *elts = XALLOCAVEC (tree, count);
>>>>> +      gcc_assert (TREE_CODE (type) == VECTOR_TYPE);
>>>>> +
>>>>> +      for (int i = 0; i < count; i++)
>>>>> +        {
>>>>> +          tree elem_type = TREE_TYPE (type);
>>>>> +          tree elem0 = VECTOR_CST_ELT (op0, i);
>>>>> +          tree elem1 = VECTOR_CST_ELT (op1, i);
>>>>> +
>>>>> +          elts[i] = fold_relational_const (code, elem_type,
>>>>> +                                           elem0, elem1);
>>>>> +
>>>>> +          if (elts[i] == NULL_TREE)
>>>>> +            return NULL_TREE;
>>>>> +
>>>>> +          elts[i] = fold_negate_const (elts[i], elem_type);
>>>>
>>>> I think you need to invent something new similar to STORE_FLAG_VALUE
>>>> or use STORE_FLAG_VALUE here.  With the above you try to map
>>>> {0, 1} to {0, -1} which is only true if the operation on the element
>>>> types returns {0, 1} (thus, STORE_FLAG_VALUE is 1).
>>>
>>> Er, seems to me that constant folding of a scalar comparison in the
>>> front/middle-end only returns {0, 1}.
>
> [and later]
>
>> I'd say adjust your fold-const patch to not negate the scalar result
>> but build a proper -1 / 0 value based on integer_zerop().
>
> I don't mind doing it that way, but I would like to understand first.
> LT_EXPR on scalars is guaranteed (in generic.texi) to be 0 or 1, so
> negating should be the same as testing with integer_zerop to build -1
> or 0.  Is it just a matter of style (then I am ok), or am I missing a
> reason that makes the negation wrong?
Just a matter of style.  Negating is a lot less descriptive for the
actual set of return values we produce.

>> The point is we need to define some semantics for vector comparison
>> results.
>
> Yes.  I think a documentation patch should come first: generic.texi is
> missing an entry for VEC_COND_EXPR and the entry for LT_EXPR doesn't
> mention vectors.  But before that we need to decide what to put
> there...
>
>> One variant is to make it target independent which in turn
>> would inhibit (or make it more difficult) to exploit some target
>> features.  You for example use {0, -1} for truth values - probably
>> to exploit target features -
>
> Actually it was mostly because that is the meaning in the language.
> OpenCL says that a<b is a vector of 0 and -1, and that ?: only looks
> at the MSB of the elements in the condition.  The fact that it
> matches what some targets do is a simple consequence of the fact that
> OpenCL was based on what hardware already did.

Yes, it seems that the {0, -1} choice is most reasonable for GENERIC.
So let's document that.

>> even though the most natural middle-end way would be to
>> use {0, 1} as for everything else
>
> I agree that it would be natural and convenient in a number of places.
>
>> (caveat: there may be both signed and unsigned bools, we don't allow
>> vector components with non-mode precision, thus you could argue that
>> a signed bool : 1 is just "sign-extended" for your solution).
>
> Not sure how that would translate in the code.
>
>> A different variant is to make it target dependent to leverage
>> optimization opportunities
>
> That's an interesting possibility...
>
>> that's why STORE_FLAG_VALUE exists.
>
> AFAICS it only appears when we go from gimple to rtl, not before (and
> there is already a VECTOR_STORE_FLAG_VALUE, although no target
> defines it).  Which doesn't mean we couldn't make it appear earlier
> for vectors.
>> For example with a vector comparison result a < b, when
>> performing bitwise operations on it, you either have to make the
>> target expand code to produce {0, -1} even if the natural compare
>> instruction would, say, produce {0, 0x80000} - or not constrain the
>> possible values of its result (like forwprop would do with your
>> patch).  In general we want constant folding to yield the same
>> results as if the HW carried out the operation, to make -O0 code not
>> diverge from -O1.  Thus,
>>
>>   v4si g;
>>   int main() { g = { 1, 2, 3, 4 } < { 4, 3, 2, 1 }; }
>>
>> should not assign different values to g depending on whether
>> constant propagation is performed or not.
>
> That one is clear, OpenCL constrains the answer to be {-1,-1,0,0},
> whether your target likes it or not.  Depending on how things are
> handled, comparisons could be constrained internally to only appear
> (possibly indirectly) in the first argument of a vec_cond_expr.

Yes, I realized that later.

>> The easiest way out is something like STORE_FLAG_VALUE
>> if there does not exist a middle-end choice for vector true / false
>> components that can be easily generated from what the target
>> produces.
>>
>> Like if you perform an FP comparison
>>
>>   int main () { double x = 1.0; static _Bool b; b = x < 3.0; }
>>
>> you get without CCP on x86_64:
>>
>>   ucomisd -8(%rbp), %xmm0
>>   seta    %al
>>   movb    %al, b.1715(%rip)
>>
>> thus the equivalent of
>>
>>   flag_reg = x < 3.0;
>>   b = flag_reg ? 1 : 0;
>
> where this expansion happens in the back-end.

In the target-specific expander for the comparison.

>> for vector compares you get something similar:
>>
>>   flag_vec = x < y;
>>   res = flag_vec ? { 1, ... } : { 0, ... };
>>
>> which I think you can see being produced by generic vector lowering
>> (in do_compare).  Where I can see we indeed use {0, -1} ... which
>> would match your constant folding behavior.
>> We may not be able to easily recover from this intermediate step
>> with combine (I'm not sure), so a target dependent value may
>> be preferred.
>
> Being able to optimize it is indeed a key point.  Let's try on an
> example (not assuming any specific representation in the middle-end
> for now).  Say I write this C/OpenCL code: ((a<b)&&(c<d))?x:y (not
> currently supported).
>
> The front-end gives to the middle-end:
>   ((((a<b)?-1:0)&((c<d)?-1:0))<0)?x:y
>
> On an architecture like SSE, NEON or Altivec where
> VECTOR_STORE_FLAG_VALUE is -1 (well, should be), expansion of
> (a<b)?-1:0 would just be a<b.  The <0 can also disappear if the
> vcond instruction only looks at the MSB (x86).  And we are left in
> the back-end with ((a<b)&(c<d))?x:y, as desired.
>
> On other architectures, expecting the back-end to simplify everything
> does seem hard.  But it isn't obvious how to handle it in the middle
> end either.  Some other forms we could imagine the middle-end
> producing:
>   (a<b)?(c<d)?x:y:y
> or, assuming that VECTOR_STORE_FLAG_VALUE is defined:
>   (((a<b)&(c<d))!=0)?x:y  (back-end would remove the != 0 on Altivec)
> Both would require special code to happen.

True.

> But then how do we handle for instance SPARC, where IIUC comparing 2
> vectors returns an integer, where bits 0, 1, etc. of the integer
> represent true/false for the comparisons of elements 0, 1, etc. of
> the vectors (as in vec_merge, but not constant)?  Defining
> VECTOR_STORE_FLAG_VALUE is not possible since comparisons don't
> return a vector, but we would still want to compute a<b, c<d, and
> perform an AND of those 2 integers before calling the usual code for
> the selection.

Yeah ... :/

> If we assume a -1/0 and MSB representation in the middle-end, the
> front-end could just pass ((a<b)&(c<d))?x:y to the middle-end.  When
> moving to the back-end, "nothing" would happen on x86.

But the front-end needs to follow a language standard (which seems to
be OpenCL for C).
It of course could see that the conditions are only used in a
COND_EXPR and try to optimize that.

> Comparing x86, NEON and Altivec, they all have comparisons that
> return a vector of -1 and 0.  On the other hand, they have different
> selection instructions: x86 uses <0, Altivec uses !=0 and NEON has a
> bitwise select and thus requires exactly -1 or 0.  It thus seems to
> me that we should decide in the middle-end that vector comparisons
> return vectors of -1 and 0.

Yes, I think -1 and 0 are indeed the best choice.

> VEC_COND_EXPR is more complicated.  We could for instance require
> that it takes as first argument a vector of -1 and 0 (thus <0, !=0
> and the NEON thing are equivalent).  That would leave to decide what
> the expansion of vec_cond_expr passes to the targets when the first
> argument is not a comparison, between !=0, <0, ==-1 or others (I
> vote for <0 because of OpenCL).  One issue is that targets wouldn't
> know whether it was a dummy comparison that can safely be ignored,
> because the other part is the result of logical operations on
> comparisons (thus composed of -1 and 0), or a genuine comparison
> with an arbitrary vector, so a new optimization would be needed (in
> the back-end I guess, or we would need an alternate instruction to
> vcond) to detect whether a vector is a "signed boolean" vector.
>
> We could instead say that vec_cond_expr really follows OpenCL's
> semantics and looks at the MSB of each element.  I am not sure that
> would change much; it would mostly delay the appearance of <0 until
> RTL expansion time (and thus make gimple slightly lighter).

I think we should delay the decision on how to optimize this.  It's
indeed not trivial, and the GIMPLE middle-end already aggressively
forwards feeding comparisons into VEC_COND_EXPR expressions in
forwprop (somewhat defeating any CSE that might be possible here).

>>>>> +/* Return true if EXPR is an integer constant representing true.  */
>>>>> +
>>>>> +bool
>>>>> +integer_truep (const_tree expr)
>>>>> +{
>>>>> +  STRIP_NOPS (expr);
>>>>> +
>>>>> +  switch (TREE_CODE (expr))
>>>>> +    {
>>>>> +    case INTEGER_CST:
>>>>> +      /* Do not just test != 0, some places expect the value 1.  */
>>>>> +      return (TREE_INT_CST_LOW (expr) == 1
>>>>> +              && TREE_INT_CST_HIGH (expr) == 0);
>>>>
>>>> I wonder if using STORE_FLAG_VALUE is better here (note that it
>>>> usually differs for FP vs. integral comparisons and the mode passed
>>>> to STORE_FLAG_VALUE is that of the comparison result).
>>>
>>> I notice there is already a VECTOR_STORE_FLAG_VALUE (used only once
>>> in simplify-rtx, in a way that seems a bit strange but I'll try to
>>> understand that later).  Thanks for showing me this macro, it seems
>>> important indeed.  However the STORE_FLAG_VALUE mechanism seems to
>>> be for the RTL level.
>>>
>>> It looks like it would be possible to have 3 different semantics:
>>> source code is OpenCL, middle-end is whatever we want (0 / 1 for
>>> instance), and back-end is whatever the target wants.  The
>>> front-end would generate for a<b : vec_cond_expr(a<b,-1,0)
>>
>> seems like the middle-end uses this for lowering vector compares,
>> a < b  ->  { a[0] < b[0] ? -1 : 0, ... }
>>
>>> and for a?b:c : vec_cond_expr(a<0,b,c)
>>
>> it looks like ?: is not generally handled by tree-vect-generic, so
>> it must be either not supported by the front-end or lowered therein
>> (ISTR it is forced to appear as a != {0,...} ? ... : ...)
>
> Not supported by the front-end yet (not even by the gimplifier); I
> have (bad) patches, but I can't really finish them before this
> conversation is done.
>
> I think there are quite few places in the middle-end that assume
> that comparisons return a vector of -1/0, and even fewer that assume
> vec_cond_expr only looks at the MSB of each element.  So it is still
> time to change that if you want to.
> But if we want to change it, I think it should happen now, before
> even more vector code gets in (not particularly my patches, I am
> thinking of Cilk and others too).

I think we should document the -1/0 fact and stick to it.

Thanks,
Richard.

> Ok, that's long enough, I need to send it now...
>
> --
> Marc Glisse