On 24 Jul 2024, at 13:40, Kyrylo Tkachov <ktkac...@nvidia.com> wrote:
On 24 Jul 2024, at 13:34, Kyrylo Tkachov <ktkac...@nvidia.com> wrote: Hi Jennifer, On 24 Jul 2024, at 10:52, Jennifer Schmitz <jschm...@nvidia.com> wrote: The following typo was corrected, updated patch file below: /* Fuse CMP and CSET. */ - if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL) + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET) <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch> On 23 Jul 2024, at 12:16, Jennifer Schmitz <jschm...@nvidia.com> wrote: According to the Neoverse V2 Software Optimization Guide (section 4.14), the instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been implemented so far. This patch implements and tests the two fusion pairs. The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. There was also no non-noise impact on SPEC CPU2017 benchmark. OK for mainline? Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com> I’ve bootstrapped and tested the patch myself as well and it looks good. I think it can be extended a bit though... gcc/ * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement fusion logic. * config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry. (cmp+cset): Likewise. * config/aarch64/tuning_models/neoversev2.h: Enable logic in field fusible_ops. gcc/testsuite/ * gcc.target/aarch64/fuse_cmp_csel.c: New test. * gcc.target/aarch64/fuse_cmp_cset.c: Likewise. <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch> … + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL) + && prev_set && curr_set + && GET_CODE (SET_SRC (prev_set)) == COMPARE + && GET_CODE (SET_SRC (curr_set)) == IF_THEN_ELSE + && REG_P (XEXP (SET_SRC (curr_set), 1)) + && REG_P (XEXP (SET_SRC (curr_set), 2)) + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr))) + return true; + + /* Fuse CMP and CSET. */ + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET) + && prev_set && curr_set + && GET_CODE (SET_SRC (prev_set)) == COMPARE + && GET_RTX_CLASS (GET_CODE (SET_SRC (curr_set))) == RTX_COMPARE + && REG_P (SET_DEST (curr_set)) + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr))) + return true; We have zero-extending forms of these patterns that I think we want to match here as well: *cstoresi_insn_uxtw, *cmovsi_insn_uxtw and *cmovdi_insn_uxtw in aarch64.md. They have some zero_extends around their operands that we’d want to match here as well. Feel free to add them as a follow-up to this patch though as this patch is correct as is. I’ll push it for you to trunk… Patch pushed as 4c5eb66e701 after fixing up the names of the testsuite files in the ChangeLog to match the files in the patch. As discussed offline, I’ve reverted the patch for now to give Jennifer a chance to address the feedback. Apologies for jumping the gun a bit here. Thanks, Kyrill Thanks, Kyrill Thanks, Kyrill