But it turned out that CSE around basic blocks (-fcse-skip-blocks) was still a very useful thing to do (and it still was, when I looked at it again a couple of weeks ago).
And I would *very much* like to know why! My view was always that any global CSE at all should render it unnecessary, but GCSE did not. Now we're doing extensive global optimization at tree level, but it's *still* needed. That shouldn't be the case. I think we *really* need to understand why it's still needed as part of the issue of replacing optimizers.
You seem to be confused.
We've known *why* CSE does stuff that GCSE doesn't catch for almost as long as we've had GCSE.
It's because CSE *doesn't just do CSE*!
It does value numbering and a bunch of other things that are not implemented as separate passes at the RTL level. And reordering RTL passes, or running them multiple times, is not cheap or easy the way it is with most SSA-based tree passes.
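To make the value numbering point concrete, here is a rough sketch in Python. It is not GCC code and does not model cse.c or gcse.c. The tuple IR, the function names, and the assumption that every name is assigned only once in the block (so nothing ever needs invalidating) are all made up for illustration. It compares a matcher that only recognizes expressions with identical operand names against a simple local value numbering.

```python
import itertools

COMMUTATIVE = {"+", "*"}

# Toy straight-line block: (dest, op, arg1, arg2). Hypothetical IR, not RTL.
# Assumption: each name is assigned at most once, so no kills are needed.
BLOCK = [
    ("t1", "+", "a", "b"),
    ("c", "copy", "a", None),
    ("t2", "+", "c", "b"),   # same value as t1, reached through the copy
    ("t3", "+", "b", "a"),   # same value as t1, operands swapped
    ("t4", "+", "a", "b"),   # textually identical to t1
]

def syntactic_cse(block):
    """Find redundancies by matching operator and operand *names* only."""
    seen = {}   # (op, name1, name2) -> first dest that computed it
    hits = []
    for dest, op, a, b in block:
        if op == "copy":
            continue
        key = (op, a, b)
        if key in seen:
            hits.append((dest, seen[key]))
        else:
            seen[key] = dest
    return hits

def value_numbering(block):
    """Find redundancies by matching operator and operand *values*."""
    counter = itertools.count()
    vn = {}      # name -> value number
    table = {}   # (op, vn1, vn2) -> first dest that computed it
    hits = []

    def number_of(name):
        # Names that are live on entry get a fresh number on first use.
        if name not in vn:
            vn[name] = next(counter)
        return vn[name]

    for dest, op, a, b in block:
        if op == "copy":
            vn[dest] = number_of(a)   # a copy shares its source's value
            continue
        x, y = number_of(a), number_of(b)
        if op in COMMUTATIVE and x > y:
            x, y = y, x               # canonical order for commutative ops
        key = (op, x, y)
        if key in table:
            leader = table[key]
            vn[dest] = vn[leader]
            hits.append((dest, leader))
        else:
            vn[dest] = next(counter)
            table[key] = dest
    return hits

if __name__ == "__main__":
    print("name matching:  ", syntactic_cse(BLOCK))
    print("value numbering:", value_numbering(BLOCK))
```

On this block the name matcher only pairs t4 with t1, while value numbering also pairs t2 and t3 with t1. That is the flavor of thing a pass doing "just CSE" would leave behind.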
Also, the viewpoint that absolutely everything CSE currently does needs to be done in order to remove CSE is wrong.
The correct viewpoint is "we shouldn't remove CSE until every *profitable* transformation it makes is subsumed by something else".
Otherwise, you've started with the unproven assumption that every transformation CSE makes is profitable.
--Dan