"Roger Sayle" <ro...@nextmovesoftware.com> writes:
> Hi Richard,
>
>> "Roger Sayle" <ro...@nextmovesoftware.com> writes:
>> > This patch addresses PR rtl-optimization/106594, a significant
>> > performance regression affecting aarch64 recently introduced (exposed)
>> > by one of my recent RTL simplification improvements.  Firstly many
>> > thanks to Tamar Christina for confirming that the core of this patch
>> > provides ~5% performance improvement on some on his benchmarks.
>> >
>> > GCC's combine pass uses the function expand_compound_operation to
>> > conceptually simplify/canonicalize SIGN_EXTEND and ZERO_EXTEND as a
>> > pair of shift operations, as not all targets support extension
>> > instructions [technically ZERO_EXTEND may potentially be simplified/
>> > canonicalized to an AND operation, but the theory remains the same].
>> 
>> Are you sure the reason is that not all targets support extension 
>> instructions?
>> I thought in practice this routine would only tend to see ZERO_EXTEND etc. if
>> those codes appeared in the original rtl insns.
>
> Excellent question.  My current understanding is that this subroutine is used
> for both SIGN_EXTEND/ZERO_EXTEND (for which may processors have instructions
> and even addressing mode support) and also SIGN_EXTRACT/ZERO_EXTRACT for
> which many platforms, really do need a pair of shift instructions. (or an 
> AND).
>
> The bit (to me) that that's suspicious is the exact wording of the comment...
>
>> >   /* Convert sign extension to zero extension, if we know that the high
>> >      bit is not set, as this is easier to optimize.  It will be converted
>> >      back to cheaper alternative in make_extraction.  */
>
> Notice that things are converted back in "make_extraction", and may not be
> getting converted back (based on empirical evidence) for non-extractions,
> i.e. extensions.  This code is being called for ZERO_EXTEND on a path that
> doesn't subsequently call make_compound.  As shown in the PR, there are
> code generation differences (that impact performance), but I agree there's
> some ambiguity around the intent of the original code.
>
> My personal preference is to write backend patterns that contain ZERO_EXTEND
> (or SIGN_EXTEND) rather than a pair of shifts, or an AND of a paradoxical 
> SUBREG.
> For example, the new patterns added to i386.md look (to me) "cleaner" than the
> forms that they complement/replace.  The burden is then on the middle-end to
> simplify {ZERO,SIGN}_EXTEND forms as efficiently as it would shifts or ANDs.

Yeah.  I think the intention is still that .md patterns match the
"compound" forms like zero_extend/sign_extend where possible.
AIUI the "expanded" from is just an internal thing, to reduce the
number of simplification rules needed, and combine is supposed to call
make_compound_operation before recog()nising the result.

But given that the simplification rules in simplify-rtx.cc (as opposed
to combine.cc itself) have to cope with passes that *don't* do this
canonicalisation, I'm not sure how much it really helps in practice.
And the cost of make_compound_operation failing to recognise the
(possibly quite convoluted) results of substituting into an expanded
compound operation can be quite severe.

> Interestingly, this is sometimes already the case, for example, we simplify
> (ffs (zero_extend ...)) in cases where we wouldn't simplify the equivalent
> (ffs (and ... 255)) [but these are perhaps also just missed optimizations].

Yeah, sounds like it.

Thanks,
Richard

> Thoughts?  I'll also dig more into the history of these (combine) functions.
>
> Cheers,
> Roger
> --

Reply via email to