Hi Jennifer, > On 24 Jul 2024, at 10:52, Jennifer Schmitz <jschm...@nvidia.com> wrote: > > The following typo was corrected, updated patch file below: > > /* Fuse CMP and CSET. */ > - if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL) > + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET) > <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch> > >> On 23 Jul 2024, at 12:16, Jennifer Schmitz <jschm...@nvidia.com> wrote: >> >> According to the Neoverse V2 Software Optimization Guide (section 4.14), the >> instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been >> implemented so far. This patch implements and tests the two fusion pairs. >> >> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. >> There was also no non-noise impact on SPEC CPU2017 benchmark. >> OK for mainline? >> >> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com> >>
I’ve bootstrapped and tested the patch myself as well and it looks good. I think it can be extended a bit though... >> gcc/ >> >> * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement >> fusion logic. >> * config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry. >> (cmp+cset): Likewise. >> * config/aarch64/tuning_models/neoversev2.h: Enable logic in >> field fusible_ops. >> >> gcc/testsuite/ >> >> * gcc.target/aarch64/fuse_cmp_csel.c: New test. >> * gcc.target/aarch64/fuse_cmp_cset.c: Likewise. >> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch> … + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL) + && prev_set && curr_set + && GET_CODE (SET_SRC (prev_set)) == COMPARE + && GET_CODE (SET_SRC (curr_set)) == IF_THEN_ELSE + && REG_P (XEXP (SET_SRC (curr_set), 1)) + && REG_P (XEXP (SET_SRC (curr_set), 2)) + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr))) + return true; + + /* Fuse CMP and CSET. */ + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET) + && prev_set && curr_set + && GET_CODE (SET_SRC (prev_set)) == COMPARE + && GET_RTX_CLASS (GET_CODE (SET_SRC (curr_set))) == RTX_COMPARE + && REG_P (SET_DEST (curr_set)) + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr))) + return true; We have zero-extending forms of these patterns that I think we want to match here as well: *cstoresi_insn_uxtw, *cmovsi_insn_uxtw and *cmovdi_insn_uxtw in aarch64.md. They have some zero_extends around their operands that we’d want to match here as well. Feel free to add them as a follow-up to this patch though as this patch is correct as is. I’ll push it for you to trunk… Thanks, Kyrill