> On 24 Jul 2024, at 13:38, Richard Sandiford <richard.sandif...@arm.com> wrote: > > External email: Use caution opening links or attachments > > > Kyrylo Tkachov <ktkac...@nvidia.com> writes: >> Hi Jennifer, >> >>> On 24 Jul 2024, at 10:52, Jennifer Schmitz <jschm...@nvidia.com> wrote: >>> >>> The following typo was corrected, updated patch file below: >>> >>> /* Fuse CMP and CSET. */ >>> - if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL) >>> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET) >>> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch> >>> >>>> On 23 Jul 2024, at 12:16, Jennifer Schmitz <jschm...@nvidia.com> wrote: >>>> >>>> According to the Neoverse V2 Software Optimization Guide (section 4.14), >>>> the >>>> instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been >>>> implemented so far. This patch implements and tests the two fusion pairs. >>>> >>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no >>>> regression. >>>> There was also no non-noise impact on SPEC CPU2017 benchmark. >>>> OK for mainline? >>>> >>>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com> >>>> >> >> I’ve bootstrapped and tested the patch myself as well and it looks good. >> I think it can be extended a bit though... >> >> >> >>>> gcc/ >>>> >>>> * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement >>>> fusion logic. >>>> * config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry. >>>> (cmp+cset): Likewise. >>>> * config/aarch64/tuning_models/neoversev2.h: Enable logic in >>>> field fusible_ops. >>>> >>>> gcc/testsuite/ >>>> >>>> * gcc.target/aarch64/fuse_cmp_csel.c: New test. >>>> * gcc.target/aarch64/fuse_cmp_cset.c: Likewise. >>>> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch> >> >> … >> >> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL) >> + && prev_set && curr_set >> + && GET_CODE (SET_SRC (prev_set)) == COMPARE >> + && GET_CODE (SET_SRC (curr_set)) == IF_THEN_ELSE >> + && REG_P (XEXP (SET_SRC (curr_set), 1)) >> + && REG_P (XEXP (SET_SRC (curr_set), 2)) >> + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr))) >> + return true; >> + >> + /* Fuse CMP and CSET. */ >> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET) >> + && prev_set && curr_set >> + && GET_CODE (SET_SRC (prev_set)) == COMPARE >> + && GET_RTX_CLASS (GET_CODE (SET_SRC (curr_set))) == RTX_COMPARE >> + && REG_P (SET_DEST (curr_set)) >> + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr))) >> + return true; >> >> We have zero-extending forms of these patterns that I think we want to match >> here as well: >> *cstoresi_insn_uxtw, *cmovsi_insn_uxtw and *cmovdi_insn_uxtw in aarch64.md. >> They have some zero_extends around their operands that we’d want to match >> here as well. >> >> Feel free to add them as a follow-up to this patch though as this patch is >> correct as is. >> I’ll push it for you to trunk… > > Sorry for the slow review, was trying to think of a specific suggestion > before replying, but didn't have time to come up with one. > > Don't these conditions catch quite a bit more than just the CMP and > CSEL, especially on the CMP side? Maybe it's ok to have a liberal, > fuzzy match, but I think it at least deserves a comment.
Maybe we can add a restriction to integer modes on the compare? I don’t think FCSEL should be getting fused… Kyrill > > Richard