Re: [RFC PATCH] cse: Add another CSE pass after split1

Jeff Law Sat, 29 Jun 2024 07:12:53 -0700



On 6/27/24 3:56 PM, Palmer Dabbelt wrote:

This is really more of a question than a patch.

Looking at PR/115687 I managed to convince myself there's a general
class of problems here: splitting might produce constant subexpressions,
but as far as I can tell there's nothing to eliminate those constant
subexpressions.  So I very quickly threw together a CSE that doesn't
fold expressions, and it does eliminate the high-part constants in
question.

At that point I realized the implementation here is bogus: it's not the
folding that's the problem, but introducing new expressions post-split
would break things -- or at least I think it would, we'd end up with
insns the backends don't expect to have that late.  I'm not sure if
split2 would end up cleaning all that up at a functional level, but it
certainly seems like optimization would be pretty far off the rails at
that point and thus doesn't seem like a good idea.  I'm also not sure
how effective this would be without doing the folding, as without
folding we can only eliminate the last insn in the constant sequence --
that's fine here, but it wouldn't work for more complicated stuff.

So I think if this was to go anywhere we'd want to have a CSE that
really only eliminates expressions (ie, doesn't do any of the other
juggling to try and produce more constant subexpressions).  There's a
few places where new expressions can be introduced, so it'd probably be
better done as a new cse_insn-type function instead of just a flag.  It
seems somewhat manageable to write, though.

That said, I really don't know what I'm doing here.  So I figured I'd
just send out what I'd put together, mostly as a way to ask if it's
worth putting time into this?

The biggest problem with running CSE again is the cost. It's been awhile since I've dug into rtl compile-time issues, but traditionally CSEis the biggest time hog in the RTL pipeline.

There's a natural tension between exposing the synthesis early whichimproves CSE, but harms combine vs exposing it late which helps combinebut hurts CSE. Some (myself included) tend to lean towards improvingcombine, while others lean towards improving CSE.

As I mentioned in the context of Vineet's recent changes for cactubssn,I do think we want to go back and revisit the mvconst_internal pattern.The idea was that it would help combine discover cases where it cansimplify logical/shift ops, without having to write some quite uglycombiner patterns to rediscover the constants. But it has someundesirable fallout as well, particularly in forcing us to writedefine_insn_and_split patterns rather than define_split patterns whichhas the undesirable effect of mucking up combine's costing model forsplitting.

The space you're poking at has been mined quite a bit through the yearsby numerous sharp folks and there are no simple answers for how tohandle constants, splitting and the like. I fully expect that anythingwe do is going to have negative fallout. It's inherent in this problemspace.

The other thing to remember is that constant synthesis is rarely a majorperformance driver. They're constants :-) I spent months on a similarproject many years ago (customer contract) -- and while the end resultlooked really good if you were staring at assembly code all day, butfrom a real world performance standpoint it was undetectable. Certainlywasn't worth the effort put in (though some of the infrastructure workalong the way really cleaned up warts in the tree data structures).


jeff

jeff

Re: [RFC PATCH] cse: Add another CSE pass after split1

Reply via email to