On Fri, 25 Sep 2015, Kyrill Tkachov wrote: > > On 25/09/15 10:49, Richard Biener wrote: > > On Fri, 25 Sep 2015, Kyrill Tkachov wrote: > > > > > Hi all, > > > > > > On 23/09/15 11:10, Richard Biener wrote: > > > > On Wed, 23 Sep 2015, Kyrill Tkachov wrote: > > > > > > > > > On 23/09/15 10:09, Pinski, Andrew wrote: > > > > > > > On Sep 23, 2015, at 1:59 AM, Kyrill Tkachov > > > > > > > <kyrylo.tkac...@arm.com> > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > On 22/09/15 20:31, Jeff Law wrote: > > > > > > > > > On 09/22/2015 07:36 AM, Kyrill Tkachov wrote: > > > > > > > > > Hi all, > > > > > > > > > Unfortunately, I see a testsuite regression with this patch: > > > > > > > > > FAIL: gcc.dg/pr66299-2.c scan-tree-dump-not optimized "<<" > > > > > > > > > > > > > > > > > > The reduced part of that test is: > > > > > > > > > void > > > > > > > > > test1 (int x, unsigned u) > > > > > > > > > { > > > > > > > > > if ((1U << x) != 64 > > > > > > > > > || (2 << x) != u > > > > > > > > > || (x << x) != 384 > > > > > > > > > || (3 << x) == 9 > > > > > > > > > || (x << 14) != 98304U > > > > > > > > > || (1 << x) == 14 > > > > > > > > > || (3 << 2) != 12) > > > > > > > > > __builtin_abort (); > > > > > > > > > } > > > > > > > > > > > > > > > > > > The patched ifcombine pass works more or less as expected and > > > > > > > > > produces > > > > > > > > > fewer basic blocks. > > > > > > > > > Before this patch a relevant part of the ifcombine dump for > > > > > > > > > test1 > > > > > > > > > is: > > > > > > > > > ;; basic block 2, loop depth 0, count 0, freq 10000, maybe > > > > > > > > > hot > > > > > > > > > if (x_1(D) != 6) > > > > > > > > > goto <bb 6>; > > > > > > > > > else > > > > > > > > > goto <bb 3>; > > > > > > > > > > > > > > > > > > ;; basic block 3, loop depth 0, count 0, freq 9996, maybe > > > > > > > > > hot > > > > > > > > > _2 = 2 << x_1(D); > > > > > > > > > _3 = (unsigned intD.10) _2; > > > > > > > > > if (_3 != u_4(D)) > > > > > > > > > goto <bb 6>; > > > > > > > > > else > > > > > > > > > goto <bb 4>; > > > > > > > > > > > > > > > > > > > > > > > > > > > After this patch it is: > > > > > > > > > ;; basic block 2, loop depth 0, count 0, freq 10000, maybe > > > > > > > > > hot > > > > > > > > > _2 = 2 << x_1(D); > > > > > > > > > _3 = (unsigned intD.10) _2; > > > > > > > > > _9 = _3 != u_4(D); > > > > > > > > > _10 = x_1(D) != 6; > > > > > > > > > _11 = _9 | _10; > > > > > > > > > if (_11 != 0) > > > > > > > > > goto <bb 5>; > > > > > > > > > else > > > > > > > > > goto <bb 3>; > > > > > > > > > > > > > > > > > > The second form ends up generating worse codegen however, and > > > > > > > > > the > > > > > > > > > badness starts with the dom1 pass. > > > > > > > > > In the unpatched case it manages to deduce that x must be 6 by > > > > > > > > > the > > > > > > > > > time > > > > > > > > > it reaches basic block 3 and > > > > > > > > > uses that information to eliminate the shift in "_2 = 2 << > > > > > > > > > x_1(D)" > > > > > > > > > from > > > > > > > > > basic block 3 > > > > > > > > > In the patched case it is unable to make that call, I think > > > > > > > > > because > > > > > > > > > the > > > > > > > > > x != 6 condition is IORed > > > > > > > > > with another test. > > > > > > > > > > > > > > > > > > I'm not familiar with the internals of the dom pass, so I'm > > > > > > > > > not > > > > > > > > > sure > > > > > > > > > where to go looking for a fix for this. > > > > > > > > > Is the ifcombine change a step in the right direction? If so, > > > > > > > > > what > > > > > > > > > would > > > > > > > > > need to be done to fix the issue with > > > > > > > > > the dom pass? > > > > > > > > I don't see how you can reasonably fix this in DOM. if _9 or > > > > > > > > _10 is > > > > > > > > true, then _11 is true. But we can't reasonably record any kind > > > > > > > > of > > > > > > > > equivalence for _9 or _10 individually. > > > > > > > > > > > > > > > > If the statement > > > > > > > > _11 = _9 | _10; > > > > > > > > > > > > > > > > Were changed to > > > > > > > > > > > > > > > > _11 = _9 & _10; > > > > > > > > > > > > > > > > Then we could record something useful about _9 and _10. > > > > > > > > > > > > > > > > > > > > > > > > > I suppose what we want is to not combine basic blocks if the > > > > > > > > > sequence > > > > > > > > > and conditions of the basic blocks are > > > > > > > > > such that dom can potentially exploit them, but how do we > > > > > > > > > express > > > > > > > > > that? > > > > > > > > I don't think there's going to be a way to directly express > > > > > > > > that. > > > > > > > > You > > > > > > > > could essentially claim that TRUTH_OR is more expensive than > > > > > > > > TRUTH_AND > > > > > > > > because of the impact on DOM, but that in and of itself may not > > > > > > > > resolve > > > > > > > > the situation either. > > > > > > > > > > > > > > > > I think the question we need to answer is whether or not your > > > > > > > > changes > > > > > > > > are generally better, even if there's specific instances where > > > > > > > > they > > > > > > > > make > > > > > > > > things worse. If the benefits outweigh the negatives then we > > > > > > > > can > > > > > > > > xfail > > > > > > > > that test. > > > > > > > Ok, I'll investigate and benchmark some more. > > > > > > > Andrew, this transformation to ifcombine (together with the > > > > > > > restriction > > > > > > > that the inner condition block > > > > > > > has to be a single comparison) was added by you with r204194. > > > > > > > Is there a particular reason for that restriction and why it is > > > > > > > applied to > > > > > > > the inner block and not either? > > > > > > My reasoning at the time was there might be an "expensive" > > > > > > instruction > > > > > > or > > > > > > one that might trap (I did not check to see if the other part of the > > > > > > code > > > > > > was detecting that). > > > > > > The outer block did not need any checks as we have something like > > > > > > ... > > > > > > If (a) > > > > > > If (b) > > > > > > > > > > > > Or > > > > > > .... > > > > > > If (a) > > > > > > Goto f > > > > > > else if (b) > > > > > > .... > > > > > > Else > > > > > > { > > > > > > F: > > > > > > .... > > > > > > } > > > > > > > > > > > > And there was no need to check what was before the if (a) part just > > > > > > what > > > > > > is > > > > > > in between the two ifs. > > > > > Ah, because the code in outer_cond_bb would have to be executed anyway > > > > > whether > > > > > we perform the conversion or not, right? > > > > All ifcombine transforms make the outer condition unconditionally > > > > true/false thus the check should have been on whether the outer > > > > cond BB is "empty". Which would solve your problem, right? > > > > > > > > Note that other transforms (bit test recognition) don't care (sth > > > > we might want to fix?). > > > > > > > > In general this needs a better cost function, maybe simply use > > > > estimate_num_insns with speed estimates and compare against a > > > > new --param. > > > So I've played around with that code and I think I'd like to make it a bit > > > more intricate. Just comparing against estimate_num_insns is too > > > coarse-grained and > > > is just a comparison by a magic number number and I've been struggling to > > > make > > > this > > > aggressive enough without pulling in too much code into the unconditional > > > path. > > > > > > As far as aarch64 is concerned I want to make this transformation more > > > aggressive when > > > it can facilitate conditional comparison generation during expand. This > > > means > > > that I want > > > to allow the cases where the inner block contains comparisons, combined > > > using > > > logical operations > > > like TRUTH_AND_EXPR, TRUTH_IOR_EXPR, TRUTH_NOT_EXPR, their bitwise > > > variants > > > etc. > > > The expand code in ccmp.c knows how to handle these chains of comparisons > > > and > > > emit the appropriate > > > conditional compare patterns. If, however, the inner block contains other > > > types of operations like > > > arithmetic ops, pointer dereferencing etc I'd want to be conservative to > > > avoid > > > pulling in operations > > > that don't facilitate the ccmp.c expansions. > > So the inner block only contains stmts feeding a condition? As we're > > combining that condition with the outer one and the result simplified(?) > > it makes sense to allow that as "empty" block generally. > > My concern is that those stmts feeding the condition will > now be executed unconditionally (rather than only after jumping > to the inner_cond_bb) so we might end up doing redundant work > if the the outer condition would actually have jumped to the else block > rather than to the inner one.
That's true. But how is arm not affected by this? That is, what's so special about ccmps here? > Kyrill > > > > > No? > > > > Thanks, > > Richard. > > > > > So what I'm proposing is: > > > - If a target supports conditional comparisons through > > > TARGET_GEN_CCMP_FIRST > > > (currently only aarch64) then we allow the aforementioned > > > comparisons+logical-combinations blocks. > > > If the block also contains other types of operations we apply the > > > estimate_num_insns cost comparison > > > with the default value for the comparison picked to be such so that it > > > changes > > > codegen the least from > > > the current situation i.e. one instruction. This value will be a new param > > > that targets can increase > > > if they want to. > > > > > > - If the target does not support conditional comparisons we follow only > > > the > > > second scheme (the conservative > > > estimate_num_insns comparison). This should cause the minimal codegen > > > difference for the targets that > > > don't support conditional compares (which is all targets but aarch64) > > > while > > > allowing them to scale > > > the aggressiveness of this transformation if their benchmarking shows it > > > to be > > > beneficial. > > > > > > > > > I believe such a scheme would avoid pulling in too much code that > > > doesn't > > > facilitate conditional compares generation into the unconditional path and > > > would minimise > > > impact on existing targets that don't do conditional compares. > > > > > > Would that be an acceptable plan for the ifcombine_ifandif transformation > > > in > > > tree-ssa-ifcombine.c (pending benchmarking, of course)? > > > > > > Thanks, > > > Kyrill > > > > > > > Thanks, > > > > Richard. > > > > > > > > > Thanks, > > > > > Kyrill > > > > > > > > > > > What I mean by expensive for an example is division or some function > > > > > > call. > > > > > > > > > > > > Thanks, > > > > > > Andrew > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > Kyrill > > > > > > > > > > > > > > > > > > > > > > > > > > > > > jeff > > > > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)