On Apr 18, 2005 02:51 PM, Richard Kenner <[EMAIL PROTECTED]> wrote: > > Unfortunately you appear to have little clue what you are really > > talking about. So let me provide you with some loud feedback as well. > > Please try to keep this discussion on a civil level!
I am (for a change, maybe) not the one who started making the discussion uncivil. And what I wrote is not untrue either. Roger didn't exactly display knowledge of what he was talking about, see the many false statements he made about the RTL optimizers. > > You seem to have a misguided view on what CSE does. What we call CSE > > is not a common subexpression elimination pass. It just happens to > > catch some CSEs as well, but the most important thing it does right > > now is propagation things forward to select better addressing modes > > (essentially a kind of instruction selection), and limited > > folding/simplifying to clean up the sometimes terrible things 'expand' > > can do. > > This is a very inaccurate characterization of CSE. Yes, it does those > things, but eliminating common subexpressions is indeed the major task > it performs. Only on the first pass. I have looked at how many actual common subexpressions it eliminates in the two passes following GCSE, and it is really almost nothing. CSE2 only catches a significant number if flag_loop_optimize2 and/or flag_tracer are set. CSE1 does catch some common subexpressions, most of those appear to come from expanding a tree expression to multiple RTL expressions (e.g. two tree MULT_EXPRs for which expand_* create a cse). It is undisputed that CSE _can_ remove common subexpressions. It really just does not appear to do that very often in practice. It is unfortunately hard to tell exactly what CSE does, because it is doing so many things and it does not report the replacements it makes. When I looked at this, I simply disabled various parts of CSE to see what the effect would be on the resulting code. I also disabled CSE1 completely and replaced it with a cselib based pass, which didn't catch many cses and missed many of the other things CSE does (interestingly, the effect was worst for -fPIC, but I have not yet figured out why). > > (Kenner mentioned CSE around loops, but that is already gone.) > > Sorry. But that is the cheaper of the two kludges (and I'm allowed to use > that word since I wrote them). :-) I didn't know you wrote that. It certainly wasn't a really problematic piece of code, I think. When I removed it, the plan was to make CSE work on extended basic blocks. But it turned out that CSE around basic blocks (-fcse-skip-blocks) was still a very useful thing to do (and it still was, when I looked at it again a couple of weeks ago). > > I agree that some of the details of the machine-dependent parts of the > > tree optimizers are not the most elegant. > > I think there's a serious conceptual issue in making the tree level too > machine-dependent. Agreed with "too machine-dependent". But what is that, exactly? No machine dependence at all? Some, and if so, what? I'm curious what you think about this, I have no idea yet, really. I tried a simple lowering pass last year, but it didn't work out very well. I didn't try very hard either, though. > The *whole point* of doing tree-level optimizations > is to do machine-*independent* optimizations. Trees are machine-independent > and RTL is machine-dependent. If we go too far away from that, I think > we miss the point. Some machine-dependence would not be bad, I think. Sure, you would never want to explicitly expose all the details about the target details. But a machine specific lowering pass, for example, would probably be a good thing. Just to expose more expressions to the tree optimizers, so they can do more work and the RTL optimizers do not have to do that much work. Many other compilers do this, also. I know that has never been a convincing argument on this list, but it is still a sign that it is not all that insane to do, at least. > > Besides, the RTL optimizers are not exactly a part of GCC to be proud > > of if "ugliness" is a measure. > > Really? Well, yes. I mean no offense to the people who wrote RTL optimizers in the past, when many pieces of infrastructure we take for granted now were simply not available. But many passes could certainly use a good cleanup or rewrite. But, much to the credit of their authors, many of the existing RTL passes just _work_ quite well, which is why it is so difficult to rewrite them :-/ If you look at e.g. how regmove handles (or really does not handle) basic block boundaries, or at how CSE does its path following (which, no doubt, made sense when it was written that way), or at the kludges in sched-ebb.c (didn't write it, still use the word ;-) to tear down and later on fix up basic block boundaries, then yes, that is ugly. Similarly, if you see how heavily some ports rely on reload to fix up instructions, and how far away from actual machine instructions RTL can sometimes be before reload, then, again, yes that is pretty ugly, too. Not to mention how _hard_ it is to change an RTL optimizer and not introduce a regression somewhere, because everything depends so strongly on each other, instead of being modular. > Of course GCC will always need a low-level IR. But, combine is > instruction selection in the worst possible way; > > It served GCC well for decades, so I hardly think that's a fair statement. It serves GCC well still today. That does not mean it is the right way to do it. Combine finds the insns to combine in a rather random pick-and-try fashion, with at most two or three instructions. And it intermixes the actual instruction (or rather, insn) selection with a bunch of optimizations, which IMHO should be split out into a separate pass. A more common instruction selection technique in modern compilers is to use a tree rewriting approach. For GCC, such a pass would rewrite (probably lowered) trees to RTL using a machine generated 'tree' parser. Google for BURS and iburg for examples. Unfortunately it is not even possible to try this approach for GCC because the actual asm instructions (the 'alternative' in GCC terms) are not selected until reload. > reload is register allocation in the worst possible way, > > Reload is not supposed to do register allocation. To the extent that > it does, I agree with you. But what this has to do with the issue of > tree vs. RTL optimization is something I don't follow. It has nothing to do with tree vs. RTL at all. It was part of my reply to a quote from Roger's mail that you cut out above. That quote suggests he thinks reload and register allocation is the same thing: > > > It's unlikely that RTL, or an instruction-based, representation of > > > a program will ever go away. And the RTL passes corresponding to > > > instruction selection (combine), register allocation (reload) and > > > scheduling will almost certainly be RTL-level tasks/passes. > Surely you > aren't suggesting doing register allocation at the tree level? Oh, definitely not, no. Heh. Now _that_ would be really ugly, if it is possible at all. Linus suggested once in this list that GCC should do that... ;-) Gr. Steven