On Mon, 10 Jul 2023, Jan Hubicka wrote: > > On Fri, 7 Jul 2023, Jan Hubicka wrote: > > > > > > > > > > Looks good, but I wonder what we can do to at least make the > > > > multiple exit case behave reasonably? The vectorizer keeps track > > > > > > > of a "canonical" exit, would it be possible to pass in the main > > > > exit edge and use that instead of single_exit (), would other > > > > exits then behave somewhat reasonable or would we totally screw > > > > things up here? That is, the "canonical" exit would be the > > > > counting exit while the other exits are on data driven conditions > > > > and thus wouldn't change probability when we reduce the number > > > > of iterations(?) > > > > > > I can add canonical_exit parameter and make the function to direct flow > > > to it if possible. However overall I think fixup depends on what > > > transformation led to the change. > > > > I think the vectorizer knows there's a single counting IV and all > > other exits are dependent on data processed, so the scaling the > > vectorizer just changes the counting IV. So I think it makes > > sense to pass that exit to the function in all cases. > > It really seems to me that vectorized loop is like N loops happening > in parallel, so the probabilities of alternative exits grows as well. > But canonical exit is right thing to do for prologues - here we really > add extra conditions to the iteration counting exit. > > > > > Assuming that vectorizer did no prologues and apilogues and we > > > vectorized with factor N, then I think the update could be done more > > > specifically as follows. > > > > > > We know that header block count dropped by 4. So we can start from that > > > and each time we reach basic block with exit edge, we know the original > > > count of the edge. This count is unchanged, so one can rescale > > > probabilities out of that BB accordingly. If loop has no inner loops, > > > we can just walk the body in RPO and propagate scales downwards and we > > > sould arrive to right result > > > > That should work for alternate exits as well, no? > Yes, i think it could omstly work for acyclic bodies. I ended up > implementing a special case of this for loop-ch in order to handle > corectly loop invariant conditionals. Will send patch after some > cleanups. (There seems to be more loop invariant conditionals in real > code than I would tought) > > Tampering only with loop exit probabilities is not always enought. > If you have: > while (1) > if (test1) > { > if (test2) > break; > } > increasing count of exit may require increasing probablity of the outer > conditional. Do we support this in vectorization at all and if so, do > we know something here?
Tamar would need to answer this but without early break vectorization the if-conversion pass will flatten everything and I think even early breaks will be in the end a non-nested sequence of BBs with exit conds at the end (or a loopback branch). Note the (scalar) epilogue is copied from the original scalar loop body so it doesn't see any if-conversion. > For example if the test1 is triggered if test1 is true in one of > iterations packed togehter, its probability also increases by > vectorization factor. > > We run into this in peeling i.e. when we prove that test1 will trigger > undefined behaviour after one or two iterations but the orignal > esimtated profile believes in higher iteration count. I added special > case for this yesterday to avoid turning if (test2) to 100% in this case > as that triggers strange codegen in some of fortran testcases. > > We also can have > while (1) > while (test1) > { > if (test2) > break; > } > Which is harder because changing probability of test2 affects number > of iteraitons of the inner loop. So I am giving up on this. > I think currently it happens mostly with unlooping. What I saw most wrecking the profile is when passes turn if (cond) into if (0/1) leaving the CFG adjustment to CFG cleanup which then simply deletes one of the outgoing edges without doing anything to the (guessed) profile. > > > I originally added the bound parameter to handle prologues/epilogues > > > which gets new artificial bound. In prologue I think you are right that > > > the flow will be probably directed to the conditional counting > > > iterations. > > > > I suppose we'd need to scale both main and epilogue together since > > the epilogue "steals" from the main loop counts. Likewise if there's > > a skip edge around the vector loop. I think currently we simply > > set the edge probability of those skip conds rather than basing > > this off the niter values they work on. Aka if (niter < VF) goto > > epilogue; do {} while (niter / VF); epilogue: do {} while (niter); > > > > There's also the cost model which might require niter > VF to enter > > the main loop body. > > I think I mostly understand this since we was playing with it with Ondra's > histograms (that can be used to get some of the unknowns in the > transformation right). The unknowns (how many times we end up jumpig to > epilogue, for instance, probably can't be reasonably well guessed if we > do not know the loop histogram which currently we know only if we prove > that loop has constant number of iterations. So I am trying to get > right at least this case first. > > Theoretically correct approach would be to first determine entry counts > of prologue and epilogue, then produce what we believe to be correct > profile of those and subtract it from the main loop profile updating > also probabilities in basic blocks where we did nontrivial changes while > updating prologs/epilogs. Finally scale down the main loop profile and > increase exit probabilities.