> Please try to keep this discussion on a civil level! I am (for a change, maybe) not the one who started making the discussion uncivil.
I'm sorry, but in my opinion that doesn't matter.  I don't call people
names or make personal attacks no matter what I'm responding to.

> > This is a very inaccurate characterization of CSE.  Yes, it does
> > those things, but eliminating common subexpressions is indeed the
> > major task it performs.
>
> Only on the first pass.  I have looked at how many actual common
> subexpressions it eliminates in the two passes following GCSE, and it
> is really almost nothing.

OK, but that's a completely different matter.  I thought you were
talking about the *complexity* of the passes and what the code was
*intending* to do.  It's certainly the case that each successive pass
finds less to do, and that also means that with the tree-ssa optimizers,
the RTL optimizers have less to do.  I see that as a *good thing*, but
one that doesn't at all address the complexity of any code.

> (Kenner mentioned CSE around loops, but that is already gone.)  It
> certainly wasn't a really problematic piece of code, I think.

Not in complexity, but certainly in time.

> But it turned out that CSE around basic blocks (-fcse-skip-blocks) was
> still a very useful thing to do (and it still was, when I looked at it
> again a couple of weeks ago).

And I would *very much* like to know why!  My view was always that any
global CSE at all should render it unnecessary, but GCSE did not.  Now
we're doing extensive global optimization at the tree level, but it's
*still* needed.  That shouldn't be the case.  I think we *really* need
to understand why it's still needed as part of the issue of replacing
optimizers.

> Agreed with "too machine-dependent".  But what is that, exactly?  No
> machine dependence at all?  Some, and if so, what?  I'm curious what
> you think about this, I have no idea yet, really.

I certainly don't have a precise idea, though my gut feeling is that
anything but the most simple parameterization is too much.  I think this
is one of the major issues facing GCC today.
> But a machine specific lowering pass, for example, would probably be a
> good thing.  Just to expose more expressions to the tree optimizers,
> so they can do more work and the RTL optimizers do not have to do that
> much work.

I'm not sure this requires much, if any, machine-specificity.  The major
source of expressions we lose is addressing, and that's pretty
machine-independent.  The problem on machines like the x86 is that if we
go too far, combine can't fix it up because the expression will be used
in multiple places.  I think this sort of thing is also one of the major
issues we need to face.

> But, much to the credit of their authors, many of the existing RTL
> passes just _work_ quite well, which is why it is so difficult to
> rewrite them :-/

More to the point, a lot of the complexity in the RTL passes reflects
the underlying complexity of the machines that code is being generated
for, so there's a limit to how simple they can become no matter what IL
is used.

> If you look at e.g. how regmove handles (or really does not handle)
> basic block boundaries,

To be honest, regmove is one of my least favorite RTL passes ...

> or at how CSE does its path following (which, no doubt, made sense
> when it was written that way)

It's not that it "made sense" when it was written that way, just that
there was no other way to do it at the time.  Now, one might be tempted
to rewrite it using the CFG infrastructure, except that it should be
eliminated entirely, so that would be a waste of time.

> Similarly, if you see how heavily some ports rely on reload to fix up
> instructions,

That's *always* been a mistake and one that I've been fighting for
years.  This is the distinction between predicates and constraints.  On
the ports I've done, I've been very careful to keep the amount of
rewriting by reload to an absolute minimum, and I know others have been
equally careful.
However, the ports that date back to before GCC 2 have had to have this
retrofitted, and that hasn't always been done as well as it could have
been.

> Combine finds the insns to combine in a rather random pick-and-try
> fashion, with at most two or three instructions.

No, exactly two or three.  It's not *random*.

> And it intermixes the actual instruction (or rather, insn) selection
> with a bunch of optimizations, which IMHO should be split out into a
> separate pass.

Agreed.  That's been on my list for a long time, and Roger and others
have come a long way there with simplify-rtx.c.