On Wed, 23 Sep 2015, Kyrill Tkachov wrote: > > On 23/09/15 11:10, Richard Biener wrote: > > On Wed, 23 Sep 2015, Kyrill Tkachov wrote: > > > > > On 23/09/15 10:09, Pinski, Andrew wrote: > > > > > On Sep 23, 2015, at 1:59 AM, Kyrill Tkachov <kyrylo.tkac...@arm.com> > > > > > wrote: > > > > > > > > > > > > > > > > On 22/09/15 20:31, Jeff Law wrote: > > > > > > > On 09/22/2015 07:36 AM, Kyrill Tkachov wrote: > > > > > > > Hi all, > > > > > > > Unfortunately, I see a testsuite regression with this patch: > > > > > > > FAIL: gcc.dg/pr66299-2.c scan-tree-dump-not optimized "<<" > > > > > > > > > > > > > > The reduced part of that test is: > > > > > > > void > > > > > > > test1 (int x, unsigned u) > > > > > > > { > > > > > > > if ((1U << x) != 64 > > > > > > > || (2 << x) != u > > > > > > > || (x << x) != 384 > > > > > > > || (3 << x) == 9 > > > > > > > || (x << 14) != 98304U > > > > > > > || (1 << x) == 14 > > > > > > > || (3 << 2) != 12) > > > > > > > __builtin_abort (); > > > > > > > } > > > > > > > > > > > > > > The patched ifcombine pass works more or less as expected and > > > > > > > produces > > > > > > > fewer basic blocks. > > > > > > > Before this patch a relevant part of the ifcombine dump for test1 > > > > > > > is: > > > > > > > ;; basic block 2, loop depth 0, count 0, freq 10000, maybe hot > > > > > > > if (x_1(D) != 6) > > > > > > > goto <bb 6>; > > > > > > > else > > > > > > > goto <bb 3>; > > > > > > > > > > > > > > ;; basic block 3, loop depth 0, count 0, freq 9996, maybe hot > > > > > > > _2 = 2 << x_1(D); > > > > > > > _3 = (unsigned intD.10) _2; > > > > > > > if (_3 != u_4(D)) > > > > > > > goto <bb 6>; > > > > > > > else > > > > > > > goto <bb 4>; > > > > > > > > > > > > > > > > > > > > > After this patch it is: > > > > > > > ;; basic block 2, loop depth 0, count 0, freq 10000, maybe hot > > > > > > > _2 = 2 << x_1(D); > > > > > > > _3 = (unsigned intD.10) _2; > > > > > > > _9 = _3 != u_4(D); > > > > > > > _10 = x_1(D) != 6; > > > > > > > _11 = _9 | _10; > > > > > > > if (_11 != 0) > > > > > > > goto <bb 5>; > > > > > > > else > > > > > > > goto <bb 3>; > > > > > > > > > > > > > > The second form ends up generating worse codegen however, and the > > > > > > > badness starts with the dom1 pass. > > > > > > > In the unpatched case it manages to deduce that x must be 6 by the > > > > > > > time > > > > > > > it reaches basic block 3 and > > > > > > > uses that information to eliminate the shift in "_2 = 2 << x_1(D)" > > > > > > > from > > > > > > > basic block 3 > > > > > > > In the patched case it is unable to make that call, I think > > > > > > > because > > > > > > > the > > > > > > > x != 6 condition is IORed > > > > > > > with another test. > > > > > > > > > > > > > > I'm not familiar with the internals of the dom pass, so I'm not > > > > > > > sure > > > > > > > where to go looking for a fix for this. > > > > > > > Is the ifcombine change a step in the right direction? If so, what > > > > > > > would > > > > > > > need to be done to fix the issue with > > > > > > > the dom pass? > > > > > > I don't see how you can reasonably fix this in DOM. if _9 or _10 is > > > > > > true, then _11 is true. But we can't reasonably record any kind of > > > > > > equivalence for _9 or _10 individually. > > > > > > > > > > > > If the statement > > > > > > _11 = _9 | _10; > > > > > > > > > > > > Were changed to > > > > > > > > > > > > _11 = _9 & _10; > > > > > > > > > > > > Then we could record something useful about _9 and _10. > > > > > > > > > > > > > > > > > > > I suppose what we want is to not combine basic blocks if the > > > > > > > sequence > > > > > > > and conditions of the basic blocks are > > > > > > > such that dom can potentially exploit them, but how do we express > > > > > > > that? > > > > > > I don't think there's going to be a way to directly express that. > > > > > > You > > > > > > could essentially claim that TRUTH_OR is more expensive than > > > > > > TRUTH_AND > > > > > > because of the impact on DOM, but that in and of itself may not > > > > > > resolve > > > > > > the situation either. > > > > > > > > > > > > I think the question we need to answer is whether or not your > > > > > > changes > > > > > > are generally better, even if there's specific instances where they > > > > > > make > > > > > > things worse. If the benefits outweigh the negatives then we can > > > > > > xfail > > > > > > that test. > > > > > Ok, I'll investigate and benchmark some more. > > > > > Andrew, this transformation to ifcombine (together with the > > > > > restriction > > > > > that the inner condition block > > > > > has to be a single comparison) was added by you with r204194. > > > > > Is there a particular reason for that restriction and why it is > > > > > applied to > > > > > the inner block and not either? > > > > My reasoning at the time was there might be an "expensive" instruction > > > > or > > > > one that might trap (I did not check to see if the other part of the > > > > code > > > > was detecting that). > > > > The outer block did not need any checks as we have something like > > > > ... > > > > If (a) > > > > If (b) > > > > > > > > Or > > > > .... > > > > If (a) > > > > Goto f > > > > else if (b) > > > > .... > > > > Else > > > > { > > > > F: > > > > .... > > > > } > > > > > > > > And there was no need to check what was before the if (a) part just what > > > > is > > > > in between the two ifs. > > > Ah, because the code in outer_cond_bb would have to be executed anyway > > > whether > > > we perform the conversion or not, right? > > All ifcombine transforms make the outer condition unconditionally > > true/false thus the check should have been on whether the outer > > cond BB is "empty". Which would solve your problem, right? > > I'm not sure I follow. Why does cond bb has to be empty?
Sorry, I mixed it up. Of course the cond is at the end of cond bb... > > > > Note that other transforms (bit test recognition) don't care (sth > > we might want to fix?). > > > > In general this needs a better cost function, maybe simply use > > estimate_num_insns with speed estimates and compare against a > > new --param. > > Thanks, that looks like a starting point. > If we were add some kind of costing check here, would we even need > the checks mentioned above? I don't think it will affect correctness > (the inner cond bb is checked for no side-effects before entering this > function). No, we'd replace the check with the cost function. Richard. > Thanks, > Kyrill > > > > > Thanks, > > Richard. > > > > > Thanks, > > > Kyrill > > > > > > > What I mean by expensive for an example is division or some function > > > > call. > > > > > > > > Thanks, > > > > Andrew > > > > > > > > > > > > > Thanks, > > > > > Kyrill > > > > > > > > > > > > > > > > > > > > > jeff > > > > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)