Hi Jennifer,

> On 24 Jul 2024, at 10:52, Jennifer Schmitz <jschm...@nvidia.com> wrote:
> 
> The following typo was corrected, updated patch file below:
> 
> /* Fuse CMP and CSET. */
> - if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch>
> 
>> On 23 Jul 2024, at 12:16, Jennifer Schmitz <jschm...@nvidia.com> wrote:
>> 
>> According to the Neoverse V2 Software Optimization Guide (section 4.14), the
>> instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been
>> implemented so far. This patch implements and tests the two fusion pairs.
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu with no regressions.
>> There was also no non-noise impact on the SPEC CPU2017 benchmarks.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>> 

I’ve bootstrapped and tested the patch myself as well and it looks good.
I think it can be extended a bit though; see below.
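For context, the kind of source that produces the two pairs looks roughly like
this (illustrative only; the function names are mine, not the testcases from
the patch):

/* Illustrative only: typical sources of the two fusible pairs at -O2.  */
int pick_max (int a, int b)
{
  return a > b ? a : b;   /* -> cmp + csel */
}

int is_greater (int a, int b)
{
  return a > b;           /* -> cmp + cset */
}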



>> gcc/
>> 
>> * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
>> fusion logic.
>> * config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry.
>> (cmp+cset): Likewise.
>> * config/aarch64/tuning_models/neoversev2.h: Enable logic in
>> field fusible_ops.
>> 
>> gcc/testsuite/
>> 
>> * gcc.target/aarch64/fuse_cmp_csel.c: New test.
>> * gcc.target/aarch64/fuse_cmp_cset.c: Likewise.
>> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch>

…

+  if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
+      && prev_set && curr_set
+      && GET_CODE (SET_SRC (prev_set)) == COMPARE
+      && GET_CODE (SET_SRC (curr_set)) == IF_THEN_ELSE
+      && REG_P (XEXP (SET_SRC (curr_set), 1))
+      && REG_P (XEXP (SET_SRC (curr_set), 2))
+      && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
+    return true;
+
+  /* Fuse CMP and CSET. */
+  if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
+      && prev_set && curr_set
+      && GET_CODE (SET_SRC (prev_set)) == COMPARE
+      && GET_RTX_CLASS (GET_CODE (SET_SRC (curr_set))) == RTX_COMPARE
+      && REG_P (SET_DEST (curr_set))
+      && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
+    return true;

We have zero-extending forms of these patterns that I think we want to match
here as well: *cstoresi_insn_uxtw, *cmovsi_insn_uxtw and *cmovdi_insn_uxtw in
aarch64.md. They wrap zero_extends around their operands, so the checks above
would need to look through those to recognise them.
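To make that concrete, here is a rough, untested sketch of what the CSET case
might look like (mirroring the hunk above; the CSEL case would similarly strip
the zero_extends around the IF_THEN_ELSE arms):

  /* Sketch only (untested): look through an optional zero_extend around
     the CSET source so that *cstoresi_insn_uxtw is matched too.  */
  if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
      && prev_set && curr_set
      && GET_CODE (SET_SRC (prev_set)) == COMPARE
      && REG_P (SET_DEST (curr_set))
      && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
    {
      rtx curr_src = SET_SRC (curr_set);
      if (GET_CODE (curr_src) == ZERO_EXTEND)
        curr_src = XEXP (curr_src, 0);
      if (GET_RTX_CLASS (GET_CODE (curr_src)) == RTX_COMPARE)
        return true;
    }

Something along those lines, plus matching tests, would cover the uxtw forms.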

Feel free to add them as a follow-up, though, since the patch is correct as is.
I’ll push it to trunk for you…
Thanks,
Kyrill
