"Roger Sayle" <ro...@nextmovesoftware.com> writes: > Hi Richard, > >> "Roger Sayle" <ro...@nextmovesoftware.com> writes: >> > This patch addresses PR rtl-optimization/106594, a significant >> > performance regression affecting aarch64 recently introduced (exposed) >> > by one of my recent RTL simplification improvements. Firstly many >> > thanks to Tamar Christina for confirming that the core of this patch >> > provides ~5% performance improvement on some on his benchmarks. >> > >> > GCC's combine pass uses the function expand_compound_operation to >> > conceptually simplify/canonicalize SIGN_EXTEND and ZERO_EXTEND as a >> > pair of shift operations, as not all targets support extension >> > instructions [technically ZERO_EXTEND may potentially be simplified/ >> > canonicalized to an AND operation, but the theory remains the same]. >> >> Are you sure the reason is that not all targets support extension >> instructions? >> I thought in practice this routine would only tend to see ZERO_EXTEND etc. if >> those codes appeared in the original rtl insns. > > Excellent question. My current understanding is that this subroutine is used > for both SIGN_EXTEND/ZERO_EXTEND (for which may processors have instructions > and even addressing mode support) and also SIGN_EXTRACT/ZERO_EXTRACT for > which many platforms, really do need a pair of shift instructions. (or an > AND). > > The bit (to me) that that's suspicious is the exact wording of the comment... > >> > /* Convert sign extension to zero extension, if we know that the high >> > bit is not set, as this is easier to optimize. It will be converted >> > back to cheaper alternative in make_extraction. */ > > Notice that things are converted back in "make_extraction", and may not be > getting converted back (based on empirical evidence) for non-extractions, > i.e. extensions. This code is being called for ZERO_EXTEND on a path that > doesn't subsequently call make_compound. As shown in the PR, there are > code generation differences (that impact performance), but I agree there's > some ambiguity around the intent of the original code. > > My personal preference is to write backend patterns that contain ZERO_EXTEND > (or SIGN_EXTEND) rather than a pair of shifts, or an AND of a paradoxical > SUBREG. > For example, the new patterns added to i386.md look (to me) "cleaner" than the > forms that they complement/replace. The burden is then on the middle-end to > simplify {ZERO,SIGN}_EXTEND forms as efficiently as it would shifts or ANDs.
Yeah. I think the intention is still that .md patterns match the "compound" forms like zero_extend/sign_extend where possible. AIUI the "expanded" from is just an internal thing, to reduce the number of simplification rules needed, and combine is supposed to call make_compound_operation before recog()nising the result. But given that the simplification rules in simplify-rtx.cc (as opposed to combine.cc itself) have to cope with passes that *don't* do this canonicalisation, I'm not sure how much it really helps in practice. And the cost of make_compound_operation failing to recognise the (possibly quite convoluted) results of substituting into an expanded compound operation can be quite severe. > Interestingly, this is sometimes already the case, for example, we simplify > (ffs (zero_extend ...)) in cases where we wouldn't simplify the equivalent > (ffs (and ... 255)) [but these are perhaps also just missed optimizations]. Yeah, sounds like it. Thanks, Richard > Thoughts? I'll also dig more into the history of these (combine) functions. > > Cheers, > Roger > --