After several tries, this is what I came up with:

(define_mode_iterator VF_AUTO [
  (VNx1HF "TARGET_ZVFH && TARGET_MIN_VLEN < 128")
  (VNx2HF "TARGET_ZVFH")
  (VNx4HF "TARGET_ZVFH")
  (VNx8HF "TARGET_ZVFH")
  (VNx16HF "TARGET_ZVFH")
  (VNx32HF "TARGET_ZVFH && TARGET_MIN_VLEN > 32")
  (VNx64HF "TARGET_ZVFH && TARGET_MIN_VLEN >= 128")

  (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
  (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
  (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
  (VNx8SF "TARGET_VECTOR_ELEN_FP_32")
  (VNx16SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
  (VNx32SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
  (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
  (VNx2DF "TARGET_VECTOR_ELEN_FP_64")
  (VNx4DF "TARGET_VECTOR_ELEN_FP_64")
  (VNx8DF "TARGET_VECTOR_ELEN_FP_64")
  (VNx16DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN >= 128")
])


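I think we should add a VF_AUTO iterator for the auto-vectorization patterns,
with the FP16 modes gated on TARGET_ZVFH instead of TARGET_VECTOR_ELEN_FP_16.
With that change it also works: with zvfhmin in -march there is no
auto-vectorization, and with zvfh there is auto-vectorization.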
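
To make the FP16 conditions of VF_AUTO easier to read, here is a minimal C
sketch of them. It is only a model, not GCC code: the helper name
vf_auto_hf_enabled and its parameters are hypothetical, with zvfh standing in
for TARGET_ZVFH and min_vlen for TARGET_MIN_VLEN.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the FP16 entries of VF_AUTO: returns whether
   VNx<nunits>HF would be enabled for the given target settings.  */
static bool
vf_auto_hf_enabled (int nunits, bool zvfh, int min_vlen)
{
  /* Every FP16 entry requires TARGET_ZVFH, so zvfhmin alone enables none.  */
  if (!zvfh)
    return false;

  switch (nunits)
    {
    case 1:
      return min_vlen < 128;
    case 2:
    case 4:
    case 8:
    case 16:
      return true;
    case 32:
      return min_vlen > 32;
    case 64:
      return min_vlen >= 128;
    default:
      /* Not an FP16 mode listed in the iterator.  */
      return false;
    }
}
```

With zvfh set to false no FP16 mode is available in this model, which matches
the observed behaviour that zvfhmin alone gets no FP16 auto-vectorization.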

Feel free to suggest other solutions.

Thanks.


juzhe.zh...@rivai.ai
 
From: 钟居哲
Date: 2023-06-15 05:15
To: Jeff Law; rdapp.gcc; gcc-patches; palmer; kito.cheng
Subject: Re: Re: [PATCH] RISC-V: Add autovec FP unary operations.
Hi, Jeff.  Thanks for the quick approval.

While reviewing the patch, I looked at this expander:
(define_expand "<optab><mode>2"
  [(set (match_operand:VF 0 "register_operand")
    (any_float_unop_nofrm:VF
     (match_operand:VF 1 "register_operand")))]
  "TARGET_VECTOR"
{
  insn_code icode = code_for_pred (<CODE>, <MODE>mode);
  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
  DONE;
})

There could be an issue here with FP16 vectors.
Consider the VF iterator:
(define_mode_iterator VF [
  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
....

As you can see, all FP16 modes use the predicate "TARGET_VECTOR_ELEN_FP_16",
which is true when either TARGET_ZVFH or TARGET_ZVFHMIN is enabled.
We do that because most floating-point instruction patterns share the same
iterators, so we can't simply put TARGET_ZVFHMIN or TARGET_ZVFH on the modes.
Some instruction patterns that use VF, for example vle16.v, should be enabled
as long as TARGET_ZVFHMIN holds, whereas instructions like vfneg.v need
TARGET_ZVFH.

So I ran an experiment:
void
f (_Float16 *restrict a, _Float16 *restrict b)
{
  for (int i = 0; i < 100; ++i)
    {
      a[i] = -b[i];
    }
}

with the compile options:
-march=rv64gcv_zvfhmin --param=riscv-autovec-preference=fixed-vlmax -O3

An ICE happens:
auto.c:26:1: error: unable to generate reloads for:
(insn 8 7 9 2 (set (reg:VNx8HF 186 [ vect__6.7 ])
        (if_then_else:VNx8HF (unspec:VNx8BI [
                    (const_vector:VNx8BI [
                            (const_int 1 [0x1]) repeated x8
                        ])
                    (const_int 8 [0x8])
                    (const_int 2 [0x2]) repeated x2
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (neg:VNx8HF (reg:VNx8HF 134 [ vect__4.6 ]))
            (unspec:VNx8HF [
                    (reg:SI 0 zero)
                ] UNSPEC_VUNDEF))) "auto.c":24:14 6631 {pred_negvnx8hf}
     (expr_list:REG_DEAD (reg:VNx8HF 134 [ vect__4.6 ])
        (nil)))

The reason for the ICE is that the VF iterator enables the auto-vectorization
expander for vfneg.v when TARGET_ZVFHMIN, while the vfneg.v instruction
pattern itself is correctly disabled and only enabled when TARGET_ZVFH,
since we have this attribute for each RVV instruction pattern:
(define_attr "fp_vector_disabled" "no,yes"
  (cond [
    (and (eq_attr "type" "vfmov,vfalu,vfmul,vfdiv,
        vfwalu,vfwmul,vfmuladd,vfwmuladd,
        vfsqrt,vfrecp,vfminmax,vfsgnj,vfcmp,
        vfclass,vfmerge,
        vfncvtitof,vfwcvtftoi,vfcvtftoi,vfcvtitof,
        vfredo,vfredu,vfwredo,vfwredu,
        vfslide1up,vfslide1down")
   (and (eq_attr "mode" "VNx1HF,VNx2HF,VNx4HF,VNx8HF,VNx16HF,VNx32HF,VNx64HF")
        (match_test "!TARGET_ZVFH")))
    (const_string "yes")

    ;; The mode records as QI for the FP16 <=> INT8 instruction.
    (and (eq_attr "type" "vfncvtftoi,vfwcvtitof")
   (and (eq_attr "mode" "VNx1QI,VNx2QI,VNx4QI,VNx8QI,VNx16QI,VNx32QI,VNx64QI")
        (match_test "!TARGET_ZVFH")))
    (const_string "yes")
  ]
  (const_string "no")))
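
The mismatch between the expander condition and the instruction pattern can be
sketched in C as follows. This is only an illustrative model, not GCC code:
all function names are hypothetical, and the three bool parameters stand in
for TARGET_VECTOR, TARGET_ZVFH and TARGET_ZVFHMIN.

```c
#include <assert.h>
#include <stdbool.h>

/* Models TARGET_VECTOR_ELEN_FP_16 as described above: true with either
   extension.  */
static bool
elen_fp_16 (bool zvfh, bool zvfhmin)
{
  return zvfh || zvfhmin;
}

/* Expander for an FP16 mode: the VF iterator condition plus the
   expander's own "TARGET_VECTOR" condition.  */
static bool
expander_available_hf (bool vector, bool zvfh, bool zvfhmin)
{
  return vector && elen_fp_16 (zvfh, zvfhmin);
}

/* FP16 arithmetic insn pattern (e.g. vfneg.v): additionally switched off
   by fp_vector_disabled when !TARGET_ZVFH.  */
static bool
insn_enabled_hf_arith (bool vector, bool zvfh, bool zvfhmin)
{
  bool fp_vector_disabled = !zvfh;
  return vector && elen_fp_16 (zvfh, zvfhmin) && !fp_vector_disabled;
}

/* The problematic combination: the expander emits an insn that no
   enabled pattern can match.  */
static bool
expander_insn_mismatch (bool vector, bool zvfh, bool zvfhmin)
{
  return expander_available_hf (vector, zvfh, zvfhmin)
	 && !insn_enabled_hf_arith (vector, zvfh, zvfhmin);
}
```

In this model the mismatch is true only for the zvfhmin-without-zvfh case,
which is the configuration that hits the ICE above.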

When I slightly change the pattern as follows:
(define_expand "<optab><mode>2"
  [(set (match_operand:VF 0 "register_operand")
    (any_float_unop_nofrm:VF
     (match_operand:VF 1 "register_operand")))]
  "TARGET_VECTOR && !(GET_MODE_INNER (<MODE>mode) == HFmode && !TARGET_ZVFH)"
{
  insn_code icode = code_for_pred (<CODE>, <MODE>mode);
  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
  DONE;
})

That is, I add && !(GET_MODE_INNER (<MODE>mode) == HFmode && !TARGET_ZVFH)
to the condition.
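
For readers who find the double negation hard to parse: by De Morgan's law the
added term is the same as "the inner mode is not HFmode, or TARGET_ZVFH is
set". A small C sketch that checks this over all inputs (the function names
are hypothetical; inner_is_hf stands in for
GET_MODE_INNER (<MODE>mode) == HFmode and zvfh for TARGET_ZVFH):

```c
#include <assert.h>
#include <stdbool.h>

/* The added term exactly as written in the expander condition.  */
static bool
cond_as_written (bool inner_is_hf, bool zvfh)
{
  return !(inner_is_hf && !zvfh);
}

/* Equivalent form after applying De Morgan's law.  */
static bool
cond_simplified (bool inner_is_hf, bool zvfh)
{
  return !inner_is_hf || zvfh;
}

/* Compare both forms on all four input combinations.  */
static bool
forms_agree (void)
{
  for (int hf = 0; hf <= 1; ++hf)
    for (int z = 0; z <= 1; ++z)
      if (cond_as_written (hf != 0, z != 0)
	  != cond_simplified (hf != 0, z != 0))
	return false;
  return true;
}
```

The only rejected combination is an FP16 inner mode without TARGET_ZVFH.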

Then it works for both TARGET_ZVFH and TARGET_ZVFHMIN.
-march=rv64gcv_zvfhmin:
f:
        li      a4,2147450880
        li      a5,-2147450880
        addi    a4,a4,-1
        addi    a5,a5,1
        slli    a3,a5,32
        slli    a2,a4,32
        mv      a5,a4
        li      a4,-2147450880
        addi    a6,a1,200
        add     a3,a3,a4
        add     a2,a2,a5
.L2:
        ld      a5,0(a1)
        addi    a0,a0,8
        addi    a1,a1,8
        not     a4,a5
        and     a5,a5,a2
        and     a4,a4,a3
        sub     a5,a3,a5
        xor     a5,a4,a5
        sd      a5,-8(a0)
        bne     a1,a6,.L2
        ret

-march=rv64gcv_zvfh:
f:
        vsetivli        zero,8,e16,m1,ta,ma
        addi    a4,a1,16
        addi    a5,a0,16
        vle16.v v1,0(a1)
        vfneg.v v1,v1
        vse16.v v1,0(a0)
        addi    a2,a1,32
        addi    a3,a0,32
        vle16.v v1,0(a4)
        vfneg.v v1,v1
        vse16.v v1,0(a5)
        addi    a4,a1,48
        addi    a5,a0,48
        vle16.v v1,0(a2)
        vfneg.v v1,v1
        vse16.v v1,0(a3)
        addi    a2,a1,64
        addi    a3,a0,64
        vle16.v v1,0(a4)
        vfneg.v v1,v1
        vse16.v v1,0(a5)
        addi    a4,a1,80
        addi    a5,a0,80
        vle16.v v1,0(a2)
        vfneg.v v1,v1
        vse16.v v1,0(a3)
....


This is what we expect: TARGET_ZVFH enables auto-vectorization, whereas there
is no auto-vectorization with TARGET_ZVFHMIN, since
vfneg.v is not allowed under TARGET_ZVFHMIN.

However, I think adding !(GET_MODE_INNER (<MODE>mode) == HFmode && !TARGET_ZVFH)
is an ugly implementation and not easy to maintain, since we would need to add
this condition to every floating-point pattern.

So, give me some time to figure out an elegant way to support 
auto-vectorization.

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-15 03:43
To: Robin Dapp; gcc-patches; palmer; Kito Cheng; juzhe.zh...@rivai.ai
Subject: Re: [PATCH] RISC-V: Add autovec FP unary operations.
 
 
On 6/14/23 09:31, Robin Dapp wrote:
> Hi,
> 
> this patch adds floating-point autovec expanders for vfneg, vfabs as well as
> vfsqrt and the accompanying tests.  vfrsqrt7 will be added at a later time.
So with vfrsqrt7 I think the question turns into whether we will be able to use 
it effectively.  With its limited initial accuracy, we'll be stuck with 
another round of Newton-Raphson or Goldschmidt, so we're not likely 
going to beat the latency of a standard vsqrt.  We can use it to improve 
throughput though since it does pipeline (using the fmacs of course, so 
there's a definite trade-off if the fmacs are already saturated).
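
For reference, the refinement step mentioned above can be sketched in scalar C.
This is only an illustration of the Newton-Raphson iteration for a reciprocal
square root, not the vector instruction or any GCC expansion; the function
names are hypothetical and the initial estimate y0 is passed in by the caller
as a stand-in for a low-precision hardware estimate.

```c
#include <assert.h>

/* One Newton-Raphson step for y ~ 1/sqrt(x):
   y' = y * (1.5 - 0.5 * x * y * y).  */
static double
rsqrt_refine (double x, double y)
{
  return y * (1.5 - 0.5 * x * y * y);
}

/* Approximate sqrt(x) as x * (1/sqrt(x)), refining the initial
   estimate y0 for the given number of steps.  */
static double
sqrt_from_estimate (double x, double y0, int steps)
{
  double y = y0;
  for (int i = 0; i < steps; ++i)
    y = rsqrt_refine (x, y);
  return x * y;
}
```

Each step costs a few multiplies and a multiply-add, which is where the
trade-off against the latency of a plain square root comes from.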
 
 
> 
> Similarly to the binop tests, there are flavors for zvfh now.  Prerequisites
> as before.
> 
> Regards
>   Robin
> 
> gcc/ChangeLog:
> 
> * config/riscv/autovec.md (<optab><mode>2): Add unop expanders.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/riscv/rvv/autovec/unop/abs-run.c: Add FP.
> * gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Add FP.
> * gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Add FP.
> * gcc.target/riscv/rvv/autovec/unop/abs-template.h: Add FP.
> * gcc.target/riscv/rvv/autovec/unop/vneg-run.c: Add FP.
> * gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Add FP.
> * gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Add FP.
> * gcc.target/riscv/rvv/autovec/unop/vneg-template.h: Add FP.
> * gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c: New test.
> * gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c: New test.
> * gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c: New test.
> * gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c: New test.
> * gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h: New test.
> * gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c: New test.
> * gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c: New test.
LGTM.  So if Juzhe is happy with it, then it's good to go once 
dependencies are resolved.
 
jeff
 
 
