> Please try to keep this discussion on a civil level! I am (for a change, maybe) not the one who started making the discussion uncivil.
I'm sorry, but in my opinion that doesn't matter.  I don't call people
names or make personal attacks no matter what I'm responding to.

> > This is a very inaccurate characterization of CSE.  Yes, it does
> > those things, but eliminating common subexpressions is indeed the
> > major task it performs.
>
> Only on the first pass.  I have looked at how many actual common
> subexpressions it eliminates in the two passes following GCSE, and it
> is really almost nothing.

OK, but that's a completely different matter.  I thought you were
talking about the *complexity* of the passes and what the code was
*intending* to do.  It's certainly the case that each successive pass
finds less to do, and that also means that with the tree-ssa optimizers,
the RTL optimizers have less to do.  I see that as a *good thing*, but
one that doesn't at all address the complexity of any code.

> (Kenner mentioned CSE around loops, but that is already gone.)  It
> certainly wasn't a really problematic piece of code, I think.

Not in complexity, but certainly in time.

> But it turned out that CSE around basic blocks (-fcse-skip-blocks) was
> still a very useful thing to do (and it still was, when I looked at it
> again a couple of weeks ago).

And I would *very much* like to know why!  My view was always that any
global CSE at all should render it unnecessary, but GCSE did not.  Now
we're doing extensive global optimization at the tree level, but it's
*still* needed.  That shouldn't be the case.  I think we *really* need
to understand why it's still needed as part of the issue of replacing
optimizers.

> Agreed with "too machine-dependent".  But what is that, exactly?  No
> machine dependence at all?  Some, and if so, what?  I'm curious what
> you think about this, I have no idea yet, really.

I certainly don't have a precise idea, though my gut feeling is that
anything but the most simple parameterization is too much.  I think this
is one of the major issues facing GCC today.
> But a machine specific lowering pass, for example, would probably be a
> good thing.  Just to expose more expressions to the tree optimizers,
> so they can do more work and the RTL optimizers do not have to do that
> much work.

I'm not sure this requires much, if any, machine-specificity.  The major
source of expressions we lose is addressing, and that's pretty
machine-independent.  The problem on machines like the x86 is that if we
go too far, combine can't fix it up because the expression will be used
in multiple places.  I think this sort of thing is also one of the major
issues we need to face.

> But, much to the credit of their authors, many of the existing RTL
> passes just _work_ quite well, which is why it is so difficult to
> rewrite them :-/

More to the point, a lot of the complexity in the RTL passes reflects
the underlying complexity of the machines that code is being generated
for, so there's a limit to how simple they can become no matter what IL
is used.

> If you look at e.g. how regmove handles (or really does not handle)
> basic block boundaries,

To be honest, regmove is one of my least favorite RTL passes ...

> or at how CSE does its path following (which, no doubt, made sense
> when it was written that way)

It's not that it "made sense" when it was written that way, just that
there was no other way to do it at the time.  Now, one might be tempted
to rewrite it using the CFG infrastructure, except that it should be
eliminated entirely, so that would be a waste of time.

> Similarly, if you see how heavily some ports rely on reload to fix up
> instructions,

That's *always* been a mistake and one that I've been fighting for
years.  This is the distinction between predicates and constraints.  On
the ports I've done, I've been very careful to keep the amount of
rewriting by reload to an absolute minimum, and I know others have been
equally careful.
However, the ports that date back to before GCC 2 have had to have this
retrofitted, and that hasn't always been done as well as it could have
been.

> Combine finds the insns to combine in a rather random pick-and-try
> fashion, with at most two or three instructions.

No, exactly two or three.  It's not *random*.

> And it intermixes the actual instruction (or rather, insn) selection
> with a bunch of optimizations, which IMHO should be split out into a
> separate pass.

Agreed.  That's been on my list for a long time, and Roger and others
have come a long way there with simplify-rtx.c.