On Mon, 10 Jul 2023, Jan Hubicka wrote:

> > On Fri, 7 Jul 2023, Jan Hubicka wrote:
> > 
> > > > 
> > > > Looks good, but I wonder what we can do to at least make the
> > > > multiple exit case behave reasonably?  The vectorizer keeps track
> > > 
> > > > of a "canonical" exit, would it be possible to pass in the main
> > > > exit edge and use that instead of single_exit (), would other
> > > > exits then behave somewhat reasonable or would we totally screw
> > > > things up here?  That is, the "canonical" exit would be the
> > > > counting exit while the other exits are on data driven conditions
> > > > and thus wouldn't change probability when we reduce the number
> > > > of iterations(?)
> > > 
> > > I can add canonical_exit parameter and make the function to direct flow
> > > to it if possible.  However overall I think fixup depends on what
> > > transformation led to the change.
> > 
> > I think the vectorizer knows there's a single counting IV and all
> > other exits are dependent on data processed, so the scaling the
> > vectorizer just changes the counting IV.  So I think it makes
> > sense to pass that exit to the function in all cases.
> 
> It really seems to me that vectorized loop is like N loops happening 
> in parallel, so the probabilities of alternative exits grows as well.
> But canonical exit is right thing to do for prologues - here we really
> add extra conditions to the iteration counting exit.
> > 
> > > Assuming that vectorizer did no prologues and apilogues and we
> > > vectorized with factor N, then I think the update could be done more
> > > specifically as follows.
> > > 
> > > We know that header block count dropped by 4. So we can start from that
> > > and each time we reach basic block with exit edge, we know the original
> > > count of the edge.  This count is unchanged, so one can rescale
> > > probabilities out of that BB accordingly.  If loop has no inner loops,
> > > we can just walk the body in RPO and propagate scales downwards and we
> > > sould arrive to right result
> > 
> > That should work for alternate exits as well, no?
> Yes, i think it could omstly work for acyclic bodies. I ended up
> implementing a special case of this for loop-ch in order to handle
> corectly loop invariant conditionals.  Will send patch after some
> cleanups. (There seems to be more loop invariant conditionals in real
> code than I would tought)
> 
> Tampering only with loop exit probabilities is not always enought.
> If you have:
>   while (1)
>     if (test1)
>       {
>         if (test2)
>         break;
>       }
> increasing count of exit may require increasing probablity of the outer
> conditional.   Do we support this in vectorization at all and if so, do
> we know something here?

Tamar would need to answer this but without early break vectorization
the if-conversion pass will flatten everything and I think even early
breaks will be in the end a non-nested sequence of BBs with
exit conds at the end (or a loopback branch).

Note the (scalar) epilogue is copied from the original scalar loop
body so it doesn't see any if-conversion.

> For example if the test1 is triggered if test1 is true in one of
> iterations packed togehter, its probability also increases by
> vectorization factor.  
> 
> We run into this in peeling i.e. when we prove that test1 will trigger
> undefined behaviour after one or two iterations but the orignal
> esimtated profile believes in higher iteration count.  I added special
> case for this yesterday to avoid turning if (test2) to 100% in this case
> as that triggers strange codegen in some of fortran testcases.
> 
> We also can have
>   while (1)
>     while (test1)
>       {
>         if (test2)
>         break;
>       }
> Which is harder because changing probability of test2 affects number
> of iteraitons of the inner loop.  So I am giving up on this.
> I think currently it happens mostly with unlooping.

What I saw most wrecking the profile is when passes turn
if (cond) into if (0/1) leaving the CFG adjustment to CFG cleanup
which then simply deletes one of the outgoing edges without doing
anything to the (guessed) profile.

> > > I originally added the bound parameter to handle prologues/epilogues
> > > which gets new artificial bound.  In prologue I think you are right that
> > > the flow will be probably directed to the conditional counting
> > > iterations.
> > 
> > I suppose we'd need to scale both main and epilogue together since
> > the epilogue "steals" from the main loop counts.  Likewise if there's
> > a skip edge around the vector loop.  I think currently we simply
> > set the edge probability of those skip conds rather than basing
> > this off the niter values they work on.  Aka if (niter < VF) goto
> > epilogue; do {} while (niter / VF); epilogue: do {} while (niter);
> > 
> > There's also the cost model which might require niter > VF to enter
> > the main loop body.
> 
> I think I mostly understand this since we was playing with it with Ondra's
> histograms (that can be used to get some of the unknowns in the
> transformation right). The unknowns (how many times we end up jumpig to
> epilogue, for instance, probably can't be reasonably well guessed if we
> do not know the loop histogram which currently we know only if we prove
> that loop has constant number of iterations.  So I am trying to get
> right at least this case first.
> 
> Theoretically correct approach would be to first determine entry counts
> of prologue and epilogue, then produce what we believe to be correct
> profile of those and subtract it from the main loop profile updating
> also probabilities in basic blocks where we did nontrivial changes while
> updating prologs/epilogs. Finally scale down the main loop profile and
> increase exit probabilities.

Reply via email to