Thank you for the feedback. I added checks for SCALAR_INT_MODE_P for the reg 
operands of the compare and if-then-else expressions. As it is not legal to 
have different modes in the operand registers, I only added one check for each 
of the expressions.
The updated patch was bootstrapped and tested again.
Best,
Jennifer

Attachment: 0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch
Description: Binary data



> On 24 Jul 2024, at 13:57, Kyrylo Tkachov <ktkac...@nvidia.com> wrote:
> 
> 
> 
>> On 24 Jul 2024, at 13:40, Kyrylo Tkachov <ktkac...@nvidia.com> wrote:
>> 
>> 
>> 
>>> On 24 Jul 2024, at 13:34, Kyrylo Tkachov <ktkac...@nvidia.com> wrote:
>>> 
>>> Hi Jennifer,
>>> 
>>>> On 24 Jul 2024, at 10:52, Jennifer Schmitz <jschm...@nvidia.com> wrote:
>>>> 
>>>> The following typo was corrected, updated patch file below:
>>>> 
>>>> /* Fuse CMP and CSET. */
>>>> - if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
>>>> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
>>>> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch>
>>>> 
>>>>> On 23 Jul 2024, at 12:16, Jennifer Schmitz <jschm...@nvidia.com> wrote:
>>>>> 
>>>>> According to the Neoverse V2 Software Optimization Guide (section 4.14), 
>>>>> the
>>>>> instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been
>>>>> implemented so far. This patch implements and tests the two fusion pairs.
>>>>> 
>>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>>>> regression.
>>>>> There was also no non-noise impact on SPEC CPU2017 benchmark.
>>>>> OK for mainline?
>>>>> 
>>>>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>>>>> 
>>> 
>>> I’ve bootstrapped and tested the patch myself as well and it looks good.
>>> I think it can be extended a bit though...
>>> 
>>> 
>>> 
>>>>> gcc/
>>>>> 
>>>>> * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
>>>>> fusion logic.
>>>>> * config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry.
>>>>> (cmp+cset): Likewise.
>>>>> * config/aarch64/tuning_models/neoversev2.h: Enable logic in
>>>>> field fusible_ops.
>>>>> 
>>>>> gcc/testsuite/
>>>>> 
>>>>> * gcc.target/aarch64/fuse_cmp_csel.c: New test.
>>>>> * gcc.target/aarch64/fuse_cmp_cset.c: Likewise.
>>>>> <0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch>
>>> 
>>> …
>>> 
>>> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
>>> + && prev_set && curr_set
>>> + && GET_CODE (SET_SRC (prev_set)) == COMPARE
>>> + && GET_CODE (SET_SRC (curr_set)) == IF_THEN_ELSE
>>> + && REG_P (XEXP (SET_SRC (curr_set), 1))
>>> + && REG_P (XEXP (SET_SRC (curr_set), 2))
>>> + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
>>> + return true;
>>> +
>>> + /* Fuse CMP and CSET. */
>>> + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
>>> + && prev_set && curr_set
>>> + && GET_CODE (SET_SRC (prev_set)) == COMPARE
>>> + && GET_RTX_CLASS (GET_CODE (SET_SRC (curr_set))) == RTX_COMPARE
>>> + && REG_P (SET_DEST (curr_set))
>>> + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
>>> + return true;
>>> 
>>> We have zero-extending forms of these patterns that I think we want to 
>>> match here as well:
>>> *cstoresi_insn_uxtw, *cmovsi_insn_uxtw and *cmovdi_insn_uxtw in aarch64.md.
>>> They have some zero_extends around their operands that we’d want to match 
>>> here as well.
>>> 
>>> Feel free to add them as a follow-up to this patch though as this patch is 
>>> correct as is.
>>> I’ll push it for you to trunk…
>> 
>> Patch pushed as 4c5eb66e701 after fixing up the names of the testsuite files 
>> in the ChangeLog to match the files in the patch.
The test-file names match in the updated version.
> 
> As discussed offline, I’ve reverted the patch for now to give Jennifer a 
> chance to address the feedback.
> Apologies for jumping the gun a bit here.
> Thanks,
> Kyrill
> 
>> 
>> Thanks,
>> Kyrill
>> 
>> 
>>> Thanks,
>>> Kyrill


Attachment: smime.p7s
Description: S/MIME cryptographic signature

Reply via email to