Hi, Thanks Richard, Segher, Andrew and all.
Segher Boessenkool <seg...@kernel.crashing.org> writes: > Let me try to answer some of this... > > On Tue, May 07, 2019 at 03:31:27PM +0200, Richard Biener wrote: >> On Mon, 6 May 2019, Jiufu Guo wrote: >> > This patch implements the optimization in PR77820. The optimization >> > eliminates phi and phi's basic block, if the phi is used only by >> > condition branch, and the phi's incoming value in the result of a >> > CMP result. > >> I'm not sure I like a new pass here ;) The transform is basically >> tail-duplicating the PHI block because the exit conditional can >> be "simplfied" - that's something jump threading generally does >> though it relies on "simplified" being a little bit more simplified >> than above. > Adding a new pass is just because it may be relatively easy to extend and maintain. Adding this micor-optimization into other passes is also a good sugguestion. - Similar with jump threading, this new transform is replacing jump old destination with new destination. While for this case, the new destination can not be determined at compile time. - And as Andrew said: forwprop/backprop are similar, but this case is in opposite: it is spread common code(phi+gcond) into different preds. - And there is phiopt pass which focus on making 'phi' stmt better. it also do some similar thing: bb0: if (a != b) goto bb2; else goto bb1; bb1: bb2: x = PHI <a (bb1), b (bb0), ...>; tranform to: bb0: bb2: x = PHI <b (bb0), ...>; If I avoid to add new pass, any suggustions about which exiting pass (jumpthreading/forwprop/phiopt/...) would be more suitable to extend to support this new transform? > Right, but where in the pipeline does this fit in? > >> I suspect this transform was implemented because of some benchmark? > > Something in SPEC... 2006 iirc... Will need to dig it up, I forgot > the details. > >> I suspect the performance benefit is because of better branch >> prediction by not mangling both conditional branches into one? > > No, it is that previously a condition was moved to a GPR, and then compared > again. See PR77820. This is expensive, and serial, too. > This transform focusing on eliminating additional GPR saving and additional compare, then it would helps a lot if the comparasion in in hot path. >> The transform is also somewhat similar to tail-duplication done >> in path splitting or tracer. > > Yes. > >> The pass itself does quite strict pattern-matching but I wonder >> if more cases should be handled this way. > > Maybe. Probably. But which? Currently this pass handles the case if there is only one PHI and the phi is used by gcond. If there are other suitable cases, we can just extend this tranform. > >> Any specific reason for the pass placement between PRE and sinking? >> tracer and path splitting run much later, jump threading runs >> all over the place. > > Dunno. Jiufu, does the pass placement matter much? This pass would just need be inserted after ssa, and before the last dce and forwprop which could help to eliminate temp conversion from bool to int: _1 = a_8(D) < b_9(D); t_14 = (int) _1; if (_1 != 0) Putting it after PRE is just because some case is better be handled by PREx, like below case: if (x) t = a < b; else t = a < b; if (t) xx. While actually, this pass is ok to insert at other placement. > > > Segher