> On 24 Jul 2024, at 13:38, Richard Sandiford <richard.sandif...@arm.com> wrote:
> 
> Kyrylo Tkachov <ktkac...@nvidia.com> writes:
>> Hi Jennifer,
>> 
>>> On 24 Jul 2024, at 10:52, Jennifer Schmitz <jschm...@nvidia.com> wrote:
>>> 
>>> The following typo was corrected, updated patch file below:
>>> 
>>> /* Fuse CMP and CSET. */
>>> - if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
>>> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
>>> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch>
>>> 
>>>> On 23 Jul 2024, at 12:16, Jennifer Schmitz <jschm...@nvidia.com> wrote:
>>>> 
>>>> According to the Neoverse V2 Software Optimization Guide (section 4.14),
>>>> the instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not
>>>> been implemented so far. This patch implements and tests the two fusion
>>>> pairs.
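>>>> 
>>>> For illustration only (function names made up here, not necessarily the
>>>> actual tests), C code along these lines should produce the fused pairs
>>>> at -O2 on aarch64:
>>>> 
>>>> int
>>>> use_csel (int a, int b, int c)
>>>> {
>>>>   return a > 0 ? b : c;   /* cmp w0, 0; csel w0, w1, w2, gt */
>>>> }
>>>> 
>>>> int
>>>> use_cset (int a)
>>>> {
>>>>   return a > 0;           /* cmp w0, 0; cset w0, gt */
>>>> }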
>>>> 
>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu with no
>>>> regressions. There was also no non-noise impact on the SPEC CPU2017
>>>> benchmarks.
>>>> OK for mainline?
>>>> 
>>>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>>>> 
>> 
>> I’ve bootstrapped and tested the patch myself as well and it looks good.
>> I think it can be extended a bit though...
>> 
>> 
>> 
>>>> gcc/
>>>> 
>>>> * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
>>>> fusion logic.
>>>> * config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry.
>>>> (cmp+cset): Likewise.
>>>> * config/aarch64/tuning_models/neoversev2.h: Enable logic in
>>>> field fusible_ops.
>>>> 
>>>> gcc/testsuite/
>>>> 
>>>> * gcc.target/aarch64/fuse_cmp_csel.c: New test.
>>>> * gcc.target/aarch64/fuse_cmp_cset.c: Likewise.
>>>> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch>
>> 
>> …
>> 
>> +  if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
>> +      && prev_set && curr_set
>> +      && GET_CODE (SET_SRC (prev_set)) == COMPARE
>> +      && GET_CODE (SET_SRC (curr_set)) == IF_THEN_ELSE
>> +      && REG_P (XEXP (SET_SRC (curr_set), 1))
>> +      && REG_P (XEXP (SET_SRC (curr_set), 2))
>> +      && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
>> +    return true;
>> +
>> +  /* Fuse CMP and CSET.  */
>> +  if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
>> +      && prev_set && curr_set
>> +      && GET_CODE (SET_SRC (prev_set)) == COMPARE
>> +      && GET_RTX_CLASS (GET_CODE (SET_SRC (curr_set))) == RTX_COMPARE
>> +      && REG_P (SET_DEST (curr_set))
>> +      && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
>> +    return true;
>> 
>> We have zero-extending forms of these patterns that I think we want to match
>> here as well: *cstoresi_insn_uxtw, *cmovsi_insn_uxtw and *cmovdi_insn_uxtw
>> in aarch64.md. They have zero_extends around their operands that the fusion
>> conditions would need to accept.
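>> 
>> Something along these lines might work for the CSEL case (untested sketch,
>> just stripping a zero_extend wrapper before the existing checks):
>> 
>>   if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
>>       && prev_set && curr_set
>>       && GET_CODE (SET_SRC (prev_set)) == COMPARE)
>>     {
>>       /* Also accept *cmovsi_insn_uxtw, where the if_then_else is wrapped
>>          in a zero_extend.  */
>>       rtx sel = SET_SRC (curr_set);
>>       if (GET_CODE (sel) == ZERO_EXTEND)
>>         sel = XEXP (sel, 0);
>>       if (GET_CODE (sel) == IF_THEN_ELSE
>>           && REG_P (XEXP (sel, 1))
>>           && REG_P (XEXP (sel, 2))
>>           && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
>>         return true;
>>     }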
>> 
>> Feel free to add them as a follow-up to this patch though as this patch is 
>> correct as is.
>> I’ll push it for you to trunk…
> 
> Sorry for the slow review; I was trying to think of a specific suggestion
> before replying, but didn't have time to come up with one.
> 
> Don't these conditions catch quite a bit more than just the CMP and
> CSEL, especially on the CMP side?  Maybe it's ok to have a liberal,
> fuzzy match, but I think it at least deserves a comment.

Maybe we can add a restriction to integer modes on the compare?
I don’t think FCSEL should be getting fused…
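
Something like this might express it for the CSEL case (untested sketch;
checking the mode of the first COMPARE operand is just one way to write
the restriction):

  if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
      && prev_set && curr_set
      && GET_CODE (SET_SRC (prev_set)) == COMPARE
      /* Only fuse integer compares; don't match FCMP+FCSEL.  */
      && SCALAR_INT_MODE_P (GET_MODE (XEXP (SET_SRC (prev_set), 0)))
      && GET_CODE (SET_SRC (curr_set)) == IF_THEN_ELSE
      && REG_P (XEXP (SET_SRC (curr_set), 1))
      && REG_P (XEXP (SET_SRC (curr_set), 2))
      && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
    return true;
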
Kyrill

> 
> Richard
