On 6/27/24 3:56 PM, Palmer Dabbelt wrote:
This is really more of a question than a patch.
Looking at PR/115687 I managed to convince myself there's a general
class of problems here: splitting might produce constant subexpressions,
but as far as I can tell there's nothing to eliminate those constant
subexpressions. So I very quickly threw together a CSE that doesn't
fold expressions, and it does eliminate the high-part constants in
question.
At that point I realized the implementation here is bogus: it's not the
folding that's the problem, but introducing new expressions post-split
would break things -- or at least I think it would, we'd end up with
insns the backends don't expect to have that late. I'm not sure if
split2 would end up cleaning all that up at a functional level, but it
certainly seems like optimization would be pretty far off the rails at
that point and thus doesn't seem like a good idea. I'm also not sure
how effective this would be without doing the folding, as without
folding we can only eliminate the last insn in the constant sequence --
that's fine here, but it wouldn't work for more complicated stuff.
So I think if this was to go anywhere we'd want to have a CSE that
really only eliminates expressions (ie, doesn't do any of the other
juggling to try and produce more constant subexpressions). There's a
few places where new expressions can be introduced, so it'd probably be
better done as a new cse_insn-type function instead of just a flag. It
seems somewhat manageable to write, though.
That said, I really don't know what I'm doing here. So I figured I'd
just send out what I'd put together, mostly as a way to ask if it's
worth putting time into this?
The biggest problem with running CSE again is the cost. It's been a
while since I've dug into rtl compile-time issues, but traditionally CSE
is the biggest time hog in the RTL pipeline.
There's a natural tension between exposing the synthesis early which
improves CSE, but harms combine vs exposing it late which helps combine
but hurts CSE. Some (myself included) tend to lean towards improving
combine, while others lean towards improving CSE.
As I mentioned in the context of Vineet's recent changes for cactubssn,
I do think we want to go back and revisit the mvconst_internal pattern.
The idea was that it would help combine discover cases where it can
simplify logical/shift ops, without having to write some quite ugly
combiner patterns to rediscover the constants. But it has some
undesirable fallout as well, particularly in forcing us to write
define_insn_and_split patterns rather than define_split patterns which
has the undesirable effect of mucking up combine's costing model for
splitting.
The space you're poking at has been mined quite a bit through the years
by numerous sharp folks and there are no simple answers for how to
handle constants, splitting and the like. I fully expect that anything
we do is going to have negative fallout. It's inherent in this problem
space.
The other thing to remember is that constant synthesis is rarely a major
performance driver. They're constants :-) I spent months on a similar
project many years ago (customer contract) -- and while the end result
looked really good if you were staring at assembly code all day, but
from a real world performance standpoint it was undetectable. Certainly
wasn't worth the effort put in (though some of the infrastructure work
along the way really cleaned up warts in the tree data structures).
jeff
jeff