Thank you for the feedback. I added checks for SCALAR_INT_MODE_P for the reg operands of the compare and if-then-else expressions. As it is not legal to have different modes in the operand registers, I only added one check for each of the expressions. The updated patch was bootstrapped and tested again. Best, Jennifer
0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch
Description: Binary data
> On 24 Jul 2024, at 13:57, Kyrylo Tkachov <ktkac...@nvidia.com> wrote: > > > >> On 24 Jul 2024, at 13:40, Kyrylo Tkachov <ktkac...@nvidia.com> wrote: >> >> >> >>> On 24 Jul 2024, at 13:34, Kyrylo Tkachov <ktkac...@nvidia.com> wrote: >>> >>> Hi Jennifer, >>> >>>> On 24 Jul 2024, at 10:52, Jennifer Schmitz <jschm...@nvidia.com> wrote: >>>> >>>> The following typo was corrected, updated patch file below: >>>> >>>> /* Fuse CMP and CSET. */ >>>> - if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL) >>>> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET) >>>> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch> >>>> >>>>> On 23 Jul 2024, at 12:16, Jennifer Schmitz <jschm...@nvidia.com> wrote: >>>>> >>>>> According to the Neoverse V2 Software Optimization Guide (section 4.14), >>>>> the >>>>> instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been >>>>> implemented so far. This patch implements and tests the two fusion pairs. >>>>> >>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no >>>>> regression. >>>>> There was also no non-noise impact on SPEC CPU2017 benchmark. >>>>> OK for mainline? >>>>> >>>>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com> >>>>> >>> >>> I’ve bootstrapped and tested the patch myself as well and it looks good. >>> I think it can be extended a bit though... >>> >>> >>> >>>>> gcc/ >>>>> >>>>> * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement >>>>> fusion logic. >>>>> * config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry. >>>>> (cmp+cset): Likewise. >>>>> * config/aarch64/tuning_models/neoversev2.h: Enable logic in >>>>> field fusible_ops. >>>>> >>>>> gcc/testsuite/ >>>>> >>>>> * gcc.target/aarch64/fuse_cmp_csel.c: New test. >>>>> * gcc.target/aarch64/fuse_cmp_cset.c: Likewise. >>>>> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch> >>> >>> … >>> >>> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL) >>> + && prev_set && curr_set >>> + && GET_CODE (SET_SRC (prev_set)) == COMPARE >>> + && GET_CODE (SET_SRC (curr_set)) == IF_THEN_ELSE >>> + && REG_P (XEXP (SET_SRC (curr_set), 1)) >>> + && REG_P (XEXP (SET_SRC (curr_set), 2)) >>> + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr))) >>> + return true; >>> + >>> + /* Fuse CMP and CSET. */ >>> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET) >>> + && prev_set && curr_set >>> + && GET_CODE (SET_SRC (prev_set)) == COMPARE >>> + && GET_RTX_CLASS (GET_CODE (SET_SRC (curr_set))) == RTX_COMPARE >>> + && REG_P (SET_DEST (curr_set)) >>> + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr))) >>> + return true; >>> >>> We have zero-extending forms of these patterns that I think we want to >>> match here as well: >>> *cstoresi_insn_uxtw, *cmovsi_insn_uxtw and *cmovdi_insn_uxtw in aarch64.md. >>> They have some zero_extends around their operands that we’d want to match >>> here as well. >>> >>> Feel free to add them as a follow-up to this patch though as this patch is >>> correct as is. >>> I’ll push it for you to trunk… >> >> Patch pushed as 4c5eb66e701 after fixing up the names of the testsuite files >> in the ChangeLog to match the files in the patch. The test-file names match in the updated version. > > As discussed offline, I’ve reverted the patch for now to give Jennifer a > chance to address the feedback. > Apologies for jumping the gun a bit here. > Thanks, > Kyrill > >> >> Thanks, >> Kyrill >> >> >>> Thanks, >>> Kyrill
smime.p7s
Description: S/MIME cryptographic signature