On Mon, Aug 22, 2011 at 10:49 PM, Artem Shinkarov <artyom.shinkar...@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 9:42 PM, Richard Guenther <richard.guent...@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 5:58 PM, Artem Shinkarov <artyom.shinkar...@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 4:50 PM, Richard Guenther <richard.guent...@gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 5:43 PM, Artem Shinkarov <artyom.shinkar...@gmail.com> wrote:
>>>>> On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther <richard.guent...@gmail.com> wrote:
>>>>>> On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov <artyom.shinkar...@gmail.com> wrote:
>>>>>>> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther <richard.guent...@gmail.com> wrote:
>>>>>>>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov <artyom.shinkar...@gmail.com> wrote:
>>>>>>>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther <richard.guent...@gmail.com> wrote:
>>>>>>>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov <artyom.shinkar...@gmail.com> wrote:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> I formalized the approach a little bit; it now works without target
>>>>>>>>>>> hooks, but some polishing is still required. I want you to comment on
>>>>>>>>>>> the several important approaches that I use in the patch.
>>>>>>>>>>>
>>>>>>>>>>> So how does it work.
>>>>>>>>>>> 1) All the vector comparisons at the level of the type-checker are
>>>>>>>>>>> introduced using VEC_COND_EXPR with constant selection operands being
>>>>>>>>>>> {-1} and {0}. For example, v0 > v1 is transformed into
>>>>>>>>>>> VEC_COND_EXPR<v0 > v1, {-1}, {0}>.
>>>>>>>>>>>
>>>>>>>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered:
>>>>>>>>>>> 2.a) the first operand of VEC_COND_EXPR is a comparison; in that case
>>>>>>>>>>> nothing changes.
>>>>>>>>>>> 2.b) the first operand is something else; in that case we specially
>>>>>>>>>>> mark this case, recognize it in the backend, and do not create a
>>>>>>>>>>> comparison, but use the mask as if it were the result of some
>>>>>>>>>>> comparison.
>>>>>>>>>>>
>>>>>>>>>>> 3) In order to make sure that the mask in VEC_COND_EXPR<mask, v0, v1>
>>>>>>>>>>> is a vector comparison we use the is_vector_comparison function; if it
>>>>>>>>>>> returns false, then we replace mask with mask != {0}.
>>>>>>>>>>>
>>>>>>>>>>> So we end up with the following functionality:
>>>>>>>>>>> VEC_COND_EXPR<mask, v0, v1> -- if we know that mask is the result of a
>>>>>>>>>>> comparison of two vectors, we leave it as it is, otherwise we replace
>>>>>>>>>>> it with mask != {0}.
>>>>>>>>>>>
>>>>>>>>>>> A plain vector comparison a <op> b is represented with VEC_COND_EXPR,
>>>>>>>>>>> which expands correctly, without creating useless masking.
>>>>>>>>>>>
>>>>>>>>>>> Basically, for me there are two questions:
>>>>>>>>>>> 1) Can we perform information passing in optabs in a nicer way?
>>>>>>>>>>> 2) How could is_vector_comparison be improved? I have several ideas,
>>>>>>>>>>> like checking whether a constant vector consists only of 0s and -1s,
>>>>>>>>>>> and so on. But first, is it conceptually fine?
>>>>>>>>>>>
>>>>>>>>>>> P.S. I tried to put the functionality of is_vector_comparison into
>>>>>>>>>>> tree-ssa-forwprop, but the thing is that it is called only with -On,
>>>>>>>>>>> which I find inappropriate, and the functionality gets more
>>>>>>>>>>> complicated.
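[Editor's note: a minimal sketch, not part of the original mail, of the
source-level semantics that points 1-3 above describe, written against GCC's
generic vector extensions (the very feature this patch series implements);
the type name v4si is illustrative.]

typedef int v4si __attribute__ ((vector_size (16)));

v4si
compare (v4si v0, v4si v1)
{
  /* Internally represented as VEC_COND_EXPR <v0 > v1, {-1,...}, {0,...}>:
     each lane of the result is all-ones (-1) where the comparison holds
     and 0 where it does not.  */
  return v0 > v1;
}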
>>>>>>>>>> Why is it inappropriate to not optimize it at -O0? If the user
>>>>>>>>>> separates the comparison and the ?: expression, it's his own fault.
>>>>>>>>>
>>>>>>>>> Well, because all the information is there, and I can perfectly envision
>>>>>>>>> the case where the user expresses the comparison separately just to
>>>>>>>>> avoid code duplication.
>>>>>>>>>
>>>>>>>>> Like:
>>>>>>>>> mask = a > b;
>>>>>>>>> res1 = mask ? v0 : v1;
>>>>>>>>> res2 = mask ? v2 : v3;
>>>>>>>>>
>>>>>>>>> Which in this case would be different from
>>>>>>>>> res1 = a > b ? v0 : v1;
>>>>>>>>> res2 = a > b ? v2 : v3;
>>>>>>>>>
>>>>>>>>>> Btw, the new hook is still in the patch.
>>>>>>>>>>
>>>>>>>>>> I would simply always create != 0 if it isn't there and let optimizers
>>>>>>>>>> (tree-ssa-forwprop.c) optimize this. You still have to deal with
>>>>>>>>>> non-comparison operands during expansion though, but if you always
>>>>>>>>>> forced a != 0 from the start you can then simply interpret it as a
>>>>>>>>>> proper comparison result (in which case I'd modify the backends to have
>>>>>>>>>> an alternate pattern or directly expand to masking operations - using
>>>>>>>>>> the fake comparison RTX is too much of a hack).
>>>>>>>>>
>>>>>>>>> Richard, I think you didn't get the problem. I really need a way to pass
>>>>>>>>> the information that the expression in the first operand of vcond is an
>>>>>>>>> appropriate mask. I thought about it for quite a while and this hack is
>>>>>>>>> the only answer I found; is there a better way to do that? I could, for
>>>>>>>>> example, introduce another tree node, but that would be overkill as
>>>>>>>>> well.
>>>>>>>>>
>>>>>>>>> Now, why do I need it so much: I want to implement the comparison in a
>>>>>>>>> way that {1, 5, 0, -1} is actually {-1,-1,-1,-1}. So whenever I am not
>>>>>>>>> sure that the mask of a VEC_COND_EXPR is a real comparison I transform
>>>>>>>>> it to mask != {0} (not always). To check this, I use
>>>>>>>>> is_vector_comparison in tree-vect-generic.
>>>>>>>>>
>>>>>>>>> So I really have the difference between mask ? x : y and
>>>>>>>>> mask != {0} ? x : y; otherwise I could treat mask != {0} in the backend
>>>>>>>>> as just mask.
>>>>>>>>>
>>>>>>>>> If this link between optabs and the backend breaks, then the patch falls
>>>>>>>>> apart, because every time the comparison is taken out of VEC_COND_EXPR I
>>>>>>>>> will have to put in != {0}. Keep in mind that I cannot always put the
>>>>>>>>> comparison inside the VEC_COND_EXPR: what if the comparison is defined
>>>>>>>>> in a function, or comes from somewhere else?
>>>>>>>>>
>>>>>>>>> So what would be an appropriate way to connect optabs and the backend?
>>>>>>>>
>>>>>>>> Well, there is no problem in having the only valid mask operand for
>>>>>>>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,...} mask.
>>>>>>>> Just the C parser has to transform mask ? vec1 : vec2 to
>>>>>>>> mask != 0 ? vec1 : vec2.
>>>>>>>
>>>>>>> This already happens in the new version of the patch (not submitted yet).
>>>>>>>
>>>>>>>> This comparison can be eliminated by optimization passes that either
>>>>>>>> replace it by the real comparison computing the mask or just propagate
>>>>>>>> the information that this mask is already {-1,...} / {0,...} by simply
>>>>>>>> dropping the comparison against zero.
>>>>>>>
>>>>>>> This is not a problem, because the backend recognizes these patterns,
>>>>>>> so no optimization is needed in this part.
>>>>>>
>>>>>> I mean, for
>>>>>>
>>>>>> mask = v1 < v2 ? {-1,...} : {0,...};
>>>>>> val = VCOND_EXPR <mask != 0, v3, v4>;
>>>>>>
>>>>>> optimizers can see how mask is defined and drop the != 0 test or replace
>>>>>> it by v1 < v2.
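[Editor's note: a small sketch, not part of the original mail, of the "!= 0"
normalization and the simplification described just above, assuming GCC
vector extensions; the type and helper names (v4si, normalize) are
illustrative.]

typedef int v4si __attribute__ ((vector_size (16)));

v4si
normalize (v4si mask)
{
  v4si zero = { 0, 0, 0, 0 };
  /* An arbitrary mask such as {1, 5, 0, -1} becomes {-1, -1, 0, -1}:
     every nonzero lane turns into all-ones, so the mask is safe for
     whole-lane selection.  */
  return mask != zero;
}

v4si
already_a_comparison (v4si a, v4si b)
{
  /* The lanes of a comparison result are already 0 or -1, so the != 0
     test added by normalize is redundant; an optimizer that sees how the
     mask is defined can simply drop it (or replace it by a > b).  */
  v4si mask = a > b;
  return normalize (mask);
}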
>>>>>
>>>>> Yes, sure.
>>>>>
>>>>>>>> For the backends I'd have vcond patterns for both an embedded comparison
>>>>>>>> and for a mask. (Now we can rewind the discussion a bit and allow
>>>>>>>> arbitrary masks and define a vcond with a mask operand to do bitwise
>>>>>>>> selection - what matters is the C frontend semantics, which we need to
>>>>>>>> translate to what the middle-end thinks of as a VEC_COND_EXPR; they do
>>>>>>>> not have to agree.)
>>>>>>>
>>>>>>> But it seems like another combinatorial explosion here. Considering what
>>>>>>> Richard said in his e-mail, in order to support a "generic" vcond we
>>>>>>> just need to enumerate all the possible cases. Or did I misunderstand?
>>>>>>
>>>>>> Well, the question is still what VCOND_EXPR, and thus the vcond pattern,
>>>>>> semantically does for a non-comparison operand. I'd argue that using the
>>>>>> bitwise selection semantics gives us maximum flexibility and a native
>>>>>> instruction with AMD XOP. This non-comparison VCOND_EXPR is also easy to
>>>>>> implement in the middle-end expansion code if there is no native
>>>>>> instruction for it - by simply emitting the bitwise operations.
>>>>>>
>>>>>> But I have the feeling we are talking past each other ...?
>>>>>
>>>>> I am all for the bitwise behaviour in the backend pattern; that is
>>>>> something I rely on at the moment. What I don't want is the same behaviour
>>>>> in the frontend. So if we can guarantee that we add != 0 when we don't
>>>>> know the "nature" of the mask, then I am perfectly fine with the backend
>>>>> having bitwise-selection behaviour.
>>>>
>>>> Well, the C frontend would simply always add that != 0 (because it
>>>> doesn't know).
>>>>
>>>>>>> I mean, I don't mind of course, but it seems to me that it would be
>>>>>>> cleaner to have one generic enough pattern.
>>>>>>>
>>>>>>> Is there seriously no way to pass something from an optab into the
>>>>>>> backend?
>>>>>>
>>>>>> You can pass operands. And information is implicitly encoded in the name.
>>>>>
>>>>> I didn't quite get that, could you give an example?
>>>>
>>>> It was a larger variant of "no, apart from what is obvious".
>>>
>>> Ha, joking again. :)
>>>
>>>>>>>> If the mask is computed by a function you are of course out of luck,
>>>>>>>> but I don't see how you'd manage to infer knowledge from nowhere either.
>>>>>>>
>>>>>>> Well, take a simpler example:
>>>>>>>
>>>>>>> a = {0};
>>>>>>> for ( ; *p; p += 16)
>>>>>>>   a &= pattern > (vec)*p;
>>>>>>>
>>>>>>> res = a ? v0 : v1;
>>>>>>>
>>>>>>> In this case it is simple to analyse that a is a comparison, but you
>>>>>>> cannot embed the operations of a into the VEC_COND_EXPR.
>>>>>>
>>>>>> Sure, but if the above is C source, the frontend would initially generate
>>>>>> res = a != 0 ? v0 : v1;. An optimization pass could still track this
>>>>>> information and replace VEC_COND_EXPR <a != 0, v0, v1> with
>>>>>> VEC_COND_EXPR <a, v0, v1> (no existing one tracks vector contents,
>>>>>> though).
>>>>>
>>>>> Yeah, sure. My point is that we must be able to tell the backend that we
>>>>> have checked everything and we are sure that a is a correct mask; please
>>>>> don't add any != 0 to it.
>>>>
>>>> But all masks are correct as soon as they appear in a VEC_COND_EXPR.
>>>> That's the whole point of the bitwise semantics. It's only the C frontend
>>>> that needs to be careful to impose its stricter semantics.
>>>>
>>>> Richard.
>>>>
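[Editor's note: a minimal sketch, not part of the original mail, of the
bitwise-selection semantics discussed above, assuming GCC vector extensions;
the function and type names are illustrative, and the AMD XOP reference is
taken from the thread.]

typedef int v4si __attribute__ ((vector_size (16)));

/* What a vcond pattern with a mask operand would compute under the proposed
   semantics, and what the middle-end could open-code when no native
   instruction (such as AMD XOP's vector conditional move) is available:
   bits set in MASK select from V0, clear bits select from V1.  With a
   proper 0 / -1 mask this amounts to whole-lane selection.  */
v4si
bitwise_select (v4si mask, v4si v0, v4si v1)
{
  return (mask & v0) | (~mask & v1);
}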
>>>
>>> Ok, I see the last difference in the approaches we envision. I am assuming
>>> that the frontend does not put in the != 0, but that the later
>>> optimisations (veclower in my case) check every mask in VEC_COND_EXPR and
>>> provide the same functionality as you describe. So the philosophical
>>> question: why is it better to first add and then remove, rather than to
>>> add only if needed?
>>
>> Well, it's "better be right than sorry". Thus, default to the conservatively
>> correct way and let optimizers "optimize" it.
>
> How could we end up sorry here? It is impossible to skip the vcond during
> the optimisation. But whatever, it is not really so important when to add
> it. Currently I have a bigger problem, see below.
>
>>> In all the rest I think we agreed.
>>
>> Fine.
>>
>> Thanks,
>> Richard.
>>
>>>
>>> Artem.
>>>
>
> I found out that I cannot really gimplify the vcond<a > b, c, d> expression
> correctly when a > b is itself vcond<a > b, -1, 0>. The problem is that the
> gimplifier always pulls a > b out as a separate expression during
> gimplification, and I don't think that we can avoid it. So what happens is:
>
> vcond <a > b, c, d>
> is transformed to
> x = a > b;
> x1 = vcond <x, -1, 0>
> vcond <x1, c, d>
>
> and so on, infinitely long.
Sounds like a bug that should be possible to fix.

> In order to fix the problem we would need either to introduce a new code
> like VEC_COMP_LT, VEC_COMP_GT, and so on, or a builtin function which we
> would lower, or to go back to the idea of a hook.
>
> Anyway, representing a > b using vcond does not work.

Well, sure it will work, it just needs some work apparently.

> What would be your thinking here?

Do you have a patch that exposes this problem? I can have a look tomorrow.

Richard.

>
> Thanks,
> Artem.
>