Richard Sandiford <richard.sandif...@arm.com> writes:
> Tamar Christina <tamar.christ...@arm.com> writes:
>> Hi All,
>>
>> This adds an implementation for conditional branch optab for AArch64.
>>
>> For e.g.
>>
>> void f1 ()
>> {
>>   for (int i = 0; i < N; i++)
>>     {
>>       b[i] += a[i];
>>       if (a[i] > 0)
>>         break;
>>     }
>> }
>>
>> For 128-bit vectors we generate:
>>
>>         cmgt    v1.4s, v1.4s, #0
>>         umaxp   v1.4s, v1.4s, v1.4s
>>         fmov    x3, d1
>>         cbnz    x3, .L8
>>
>> and of 64-bit vector we can omit the compression:
>>
>>         cmgt    v1.2s, v1.2s, #0
>>         fmov    x2, d1
>>         cbz     x2, .L13
>>
>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>>
>> Ok for master?
>>
>> Thanks,
>> Tamar
>>
>> gcc/ChangeLog:
>>
>>         * config/aarch64/aarch64-simd.md (cbranch<mode>4): New.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         * gcc.target/aarch64/vect-early-break-cbranch.c: New test.
>>
>> --- inline copy of patch --
>> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
>> index 90118c6348e9614bef580d1dc94c0c1841dd5204..cd5ec35c3f53028f14828bd70a92924f62524c15 100644
>> --- a/gcc/config/aarch64/aarch64-simd.md
>> +++ b/gcc/config/aarch64/aarch64-simd.md
>> @@ -3830,6 +3830,46 @@ (define_expand "vcond_mask_<mode><v_int_equiv>"
>>    DONE;
>>  })
>>
>> +;; Patterns comparing two vectors and conditionally jump
>> +
>> +(define_expand "cbranch<mode>4"
>> +  [(set (pc)
>> +        (if_then_else
>> +          (match_operator 0 "aarch64_equality_operator"
>> +            [(match_operand:VDQ_I 1 "register_operand")
>> +             (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
>> +          (label_ref (match_operand 3 ""))
>> +          (pc)))]
>> +  "TARGET_SIMD"
>> +{
>> +  auto code = GET_CODE (operands[0]);
>> +  rtx tmp = operands[1];
>> +
>> +  /* If comparing against a non-zero vector we have to do a comparison first
>> +     so we can have a != 0 comparison with the result.  */
>> +  if (operands[2] != CONST0_RTX (<MODE>mode))
>> +    emit_insn (gen_vec_cmp<mode><mode> (tmp, operands[0], operands[1],
>> +                                        operands[2]));
>> +
>> +  /* For 64-bit vectors we need no reductions.  */
>> +  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
>> +    {
>> +      /* Always reduce using a V4SI.  */
>> +      rtx reduc = gen_lowpart (V4SImode, tmp);
>> +      rtx res = gen_reg_rtx (V4SImode);
>> +      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
>> +      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
>> +    }
>> +
>> +  rtx val = gen_reg_rtx (DImode);
>> +  emit_move_insn (val, gen_lowpart (DImode, tmp));
>> +
>> +  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
>> +  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
>> +  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
>> +  DONE;
>
> Are you sure this is correct for the operands[2] != const0_rtx case?
> It looks like it uses the same comparison code for the vector comparison
> and the scalar comparison.
>
> E.g. if the pattern is passed a comparison:
>
>   (eq (reg:V2SI x) (reg:V2SI y))
>
> it looks like we'd generate a CMEQ for the x and y, then branch
> when the DImode bitcast of the CMEQ result equals zero.  This means
> that we branch when no elements of x and y are equal, rather than
> when all elements of x and y are equal.
>
> E.g. for:
>
>   { 1, 2 } == { 1, 2 }
>
> CMEQ will produce { -1, -1 }, the scalar comparison will be -1 == 0,
> and the branch won't be taken.
>
> ISTM it would be easier for the operands[2] != const0_rtx case to use
> EOR instead of a comparison.  That gives a zero result if the input
> vectors are equal and a nonzero result if the input vectors are
> different.  We can then branch on the result using CODE and const0_rtx.
>
> (Hope I've got that right.)
>
> Maybe that also removes the need for patch 18.
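Roughly, for the operands[2] != const0_rtx case I had something like the
following in mind (completely untested sketch, and it assumes the existing
xor<mode>3 pattern can be used directly here):

  rtx tmp = operands[1];
  if (operands[2] != CONST0_RTX (<MODE>mode))
    {
      /* EOR the two inputs into a fresh register; the result is zero
         iff the vectors are equal, so CODE against const0_rtx still
         gives the right branch condition.  */
      tmp = gen_reg_rtx (<MODE>mode);
      emit_insn (gen_xor<mode>3 (tmp, operands[1], operands[2]));
    }
  /* ...then reduce TMP and branch on it as in the rest of the expander.  */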
Sorry, I forgot to say: we can't use operands[1] as a temporary,
since it's only an input to the pattern.  The EOR destination would
need to be a fresh register.

Thanks,
Richard