On 2023-10-31 03:16  Jeff Law <jeffreya...@gmail.com> wrote:
>
>
>
>On 10/30/23 01:25, Fei Gao wrote:
>> Conditional add, if zero
>> rd = (rc == 0) ? (rs1 + rs2) : rs1
>> -->
>> czero.nez rd, rs2, rc
>> add rd, rs1, rd
>>
>> Conditional add, if non-zero
>> rd = (rc != 0) ? (rs1 + rs2) : rs1
>> -->
>> czero.eqz rd, rs2, rc
>> add rd, rs1, rd
>>
>> Co-authored-by: Xiao Zeng<zengx...@eswincomputing.com>
>>
>> gcc/ChangeLog:
>>
>>          * ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
>>          (noce_try_cond_zero_arith): handler for condtional zero op
>>          (noce_process_if_block): add noce_try_cond_zero_arith with hook 
>>control
>>
>> gcc/testsuite/ChangeLog:
>>
>>          * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
>> ---
>>   gcc/ifcvt.cc                                  | 112 +++++++++++++++
>>   .../gcc.target/riscv/zicond_ifcvt_opt.c       | 130 ++++++++++++++++++
>>   2 files changed, 242 insertions(+)
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
>>
>> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
>> index a0af553b9ff..4f98c1c7bf9 100644
>> --- a/gcc/ifcvt.cc
>> +++ b/gcc/ifcvt.cc
>> +static rtx
>> +noce_emit_czero (struct noce_if_info *if_info, enum rtx_code czero_code, 
>> rtx z, rtx target)
>> +{
>> +  machine_mode mode = GET_MODE (target);
>> +  rtx cond_op0 = XEXP (if_info->cond, 0);
>> +  rtx czero_cond
>> +    = gen_rtx_fmt_ee (czero_code, GET_MODE (cond_op0), cond_op0, 
>> const0_rtx);
>> +  rtx if_then_else = gen_rtx_IF_THEN_ELSE (mode, czero_cond, const0_rtx, z);
>> +  rtx set = gen_rtx_SET (target, if_then_else);
>> +
>> +  start_sequence ();
>> +  rtx_insn *insn = emit_insn (set);
>> +
>> +  if (recog_memoized (insn) >= 0)
>> +    {
>> +      rtx_insn *seq = get_insns ();
>> +      end_sequence ();
>> +      emit_insn (seq);
>> +
>> +      return target;
>> +    }
>> +
>> +  end_sequence ();
>> +  return NULL_RTX;
>> +}
>So just a few notes to further illustrate why I'm currently looking to
>take the VRULL+Ventana implementation.  The code above would be much
>better handled by just calling noce_emit_cmove.  noce_emit_cmove will go
>through the conditional move expander.  So any improvement we make in
>the expander "just work" when called from the if-converter. 
noce_emit_czero is used here to make sure czero insns are emited. 
noce_emit_cmove includes SFB and Thead movcc, which will take precedence
over zicond in RISCV if enabled. Unfortunately we have products with SFB and 
Zicond
both available and saw such conflict. 
And that is also the reason to add hook TARGET_HAVE_COND_ZERO
in [PATCH 1/4] to disallow ineffient code emited by SFB enable and Zicond 
disabled case. 

>> +
>>   /* Try only simple constants and registers here.  More complex cases
>>      are handled in noce_try_cmove_arith after noce_try_store_flag_arith
>>      has had a go at it.  */
>> @@ -2880,6 +2908,88 @@ noce_try_sign_mask (struct noce_if_info *if_info)
>>     return true;
>>   }
>>  
>> +/* Convert x = c ? y + z : y or x = c ? y : y + z. */
>> +
>> +static bool
>> +noce_try_cond_zero_arith (struct noce_if_info *if_info)
>> +{
>> +  rtx target;
>> +  rtx_insn *seq;
>> +  machine_mode mode = GET_MODE (if_info->x);
>> +  rtx common = NULL_RTX;
>> +  enum rtx_code czero_code = UNKNOWN;
>> +  rtx a = if_info->a;
>> +  rtx b = if_info->b;
>> +  rtx z = NULL_RTX;
>> +  rtx cond = if_info->cond;
>> +
>> +  if (!noce_simple_bbs (if_info))
>> +    return false;
>[ ... ]
>So the internal code we have does a bit of canonicalization before the
>optimizing transformations.  In particular we may be presented with
>
>(a == 0) ? b : a which we transform into (a != 0 ? a : b) which allows
>us to pick up more cases.  (b != 0 ? b : a) gets similar handling.
>
>As I mentioned earlier, the VRULL+Ventana code handles wrapping
>extensions & subregs.  Our code also handles if-converting shifts/rotates. 
Cool and waiting for your submit. Shifts/rotates can be added in 
noce_try_cond_zero_arith.
I tried to keep noce_try_cond_zero_arith simple without introducing SCC and 
other stuff
as addtional insns will be generated for greater than like comparision
but may not be generated for branch-insn based SFB.
IMHO, the earlier the noce_try* function emerges in noce_process_if_block, the 
simpler
optimization scenarios are and more efficent codes shall be generated,
then the later function like noce_try_cmove_arith will handle the more general 
case.

BR, 
Fei
>
>Hopefully that explains a bit more why I think cleaning up the
>VRULL+Ventana code is a better choice. 

>
>jeff

Reply via email to