On 6/27/24 3:56 PM, Palmer Dabbelt wrote:
This is really more of a question than a patch.

Looking at PR/115687 I managed to convince myself there's a general
class of problems here: splitting might produce constant subexpressions,
but as far as I can tell there's nothing to eliminate those constant
subexpressions.  So I very quickly threw together a CSE that doesn't
fold expressions, and it does eliminate the high-part constants in
question.

At that point I realized the implementation here is bogus: it's not the
folding that's the problem, but introducing new expressions post-split
would break things -- or at least I think it would, we'd end up with
insns the backends don't expect to have that late.  I'm not sure if
split2 would end up cleaning all that up at a functional level, but it
certainly seems like optimization would be pretty far off the rails at
that point and thus doesn't seem like a good idea.  I'm also not sure
how effective this would be without doing the folding, as without
folding we can only eliminate the last insn in the constant sequence --
that's fine here, but it wouldn't work for more complicated stuff.

So I think if this was to go anywhere we'd want to have a CSE that
really only eliminates expressions (ie, doesn't do any of the other
juggling to try and produce more constant subexpressions).  There's a
few places where new expressions can be introduced, so it'd probably be
better done as a new cse_insn-type function instead of just a flag.  It
seems somewhat manageable to write, though.

That said, I really don't know what I'm doing here.  So I figured I'd
just send out what I'd put together, mostly as a way to ask if it's
worth putting time into this?
The biggest problem with running CSE again is the cost. It's been a while since I've dug into rtl compile-time issues, but traditionally CSE is the biggest time hog in the RTL pipeline.

There's a natural tension between exposing the synthesis early which improves CSE, but harms combine vs exposing it late which helps combine but hurts CSE. Some (myself included) tend to lean towards improving combine, while others lean towards improving CSE.


As I mentioned in the context of Vineet's recent changes for cactubssn, I do think we want to go back and revisit the mvconst_internal pattern. The idea was that it would help combine discover cases where it can simplify logical/shift ops, without having to write some quite ugly combiner patterns to rediscover the constants. But it has some undesirable fallout as well, particularly in forcing us to write define_insn_and_split patterns rather than define_split patterns which has the undesirable effect of mucking up combine's costing model for splitting.

The space you're poking at has been mined quite a bit through the years by numerous sharp folks and there are no simple answers for how to handle constants, splitting and the like. I fully expect that anything we do is going to have negative fallout. It's inherent in this problem space.

The other thing to remember is that constant synthesis is rarely a major performance driver. They're constants :-) I spent months on a similar project many years ago (customer contract) -- and while the end result looked really good if you were staring at assembly code all day, but from a real world performance standpoint it was undetectable. Certainly wasn't worth the effort put in (though some of the infrastructure work along the way really cleaned up warts in the tree data structures).

jeff

jeff

Reply via email to