[Bug tree-optimization/109154] [13/14 regression] jump threading de-optimizes nested floating point comparisons

rguenth at gcc dot gnu.org via Gcc-bugs Mon, 10 Jul 2023 04:27:11 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154


--- Comment #64 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #63)
> > > It looks like `-fno-tree-pre` does the trick, but then, of course, it
> > > messes things up elsewhere.  The conditional statements seem to stay in
> > > the most complicated form possible in scalar code.
> > > 
> > > I'll try to track down what to turn off and experiment with a pre2 after
> > > vect.  Is before predcom a good place?
> > 
> > I would avoid putting it into the loop pipeline.  Instead I'd turn the
> > FRE pass that runs after tracer into PRE.  Maybe conditional on whether
> > there are any loops.
> > 
> > Note it's not so easy to "tame" PRE, the existing things happen at
> > elimination time in eliminate_dom_walker::eliminate_stmt.  I would
> > experiment with restricting the use of inserted PHIs in innermost(!)
> > loops containing invariants, maybe only if the number of PHI args is
> > more than two ... (but that's somewhat artificial).
> > 
> > That said, I'm not really convinced this is a good idea.
> 
> I hear you... there's also the added complexity that this is likely only
> beneficial for fully masked architectures.  I wonder if it might be
> feasible and better to pass on additional information from pre to ifcvt to
> indicate that the operation was created from a common block.
> 
> In which case ifcvt could move the cond to just before the first shared
> statement?

I don't think PRE "knows" where the operation was created from, since its
transforms are derived from the solution of a global dataflow problem.
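
To illustrate the point, here is a minimal hand-written sketch in C of the
kind of transform PRE performs.  It is not GCC output; the function names
before_pre and after_pre and the temporary t are made up for the example.
The value used after the join is a merge of computations on both incoming
paths (the inserted PHI), so there is no single "originating" block to
report.

```c
/* Before: a + b is partially redundant.  It is computed on the
   'flag' path and then again unconditionally after the join.  */
static int before_pre(int a, int b, int flag)
{
    int x = 0;
    if (flag)
        x = a + b;
    int y = a + b;          /* redundant when flag != 0 */
    return x + y;
}

/* After: hand-written approximation of the result of PRE.  The
   expression is inserted on the path where it was missing, and the
   use after the join reads the merged value.  */
static int after_pre(int a, int b, int flag)
{
    int x = 0;
    int t;
    if (flag) {
        t = a + b;          /* original computation */
        x = t;
    } else {
        t = a + b;          /* computation inserted on the other path */
    }
    int y = t;              /* conceptually t = PHI <t_then, t_else> */
    return x + y;
}
```

Both functions compute the same result for every input; only the placement
of the a + b computation differs.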

Btw, what's the testcase your last examples are from?

[Bug tree-optimization/109154] [13/14 regression] jump threading de-optimizes nested floating point comparisons
