Hi, Jeff. Thanks for the quick approval.
When I reviewed the patch:
(define_expand "<optab><mode>2"
  [(set (match_operand:VF 0 "register_operand")
    (any_float_unop_nofrm:VF
     (match_operand:VF 1 "register_operand")))]
  "TARGET_VECTOR"
{
  insn_code icode = code_for_pred (<CODE>, <MODE>mode);
  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
  DONE;
})
There could be an issue here with FP16 vectors.
Let's look at the VF iterator:
(define_mode_iterator VF [
  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
  ....
As you can see, every FP16 mode uses the predicate
"TARGET_VECTOR_ELEN_FP_16", which is true when either TARGET_ZVFH or
TARGET_ZVFHMIN is enabled.
We do that because most floating-point instruction patterns share the
same iterators, so we can't simply add TARGET_ZVFHMIN or TARGET_ZVFH
to the iterator in a naive way. Some instruction patterns that use VF,
for example vle16.v, should be enabled as long as TARGET_ZVFHMIN is set,
whereas instructions like vfneg.v need TARGET_ZVFH.
So I did an experiment:
void
f (_Float16 *restrict a, _Float16 *restrict b)
{
  for (int i = 0; i < 100; ++i)
    {
      a[i] = -b[i];
    }
}
with the compile options:
-march=rv64gcv_zvfhmin --param=riscv-autovec-preference=fixed-vlmax -O3
An ICE happens:
auto.c:26:1: error: unable to generate reloads for:
(insn 8 7 9 2 (set (reg:VNx8HF 186 [ vect__6.7 ])
        (if_then_else:VNx8HF (unspec:VNx8BI [
                    (const_vector:VNx8BI [
                            (const_int 1 [0x1]) repeated x8
                        ])
                    (const_int 8 [0x8])
                    (const_int 2 [0x2]) repeated x2
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (neg:VNx8HF (reg:VNx8HF 134 [ vect__4.6 ]))
            (unspec:VNx8HF [
                    (reg:SI 0 zero)
                ] UNSPEC_VUNDEF))) "auto.c":24:14 6631 {pred_negvnx8hf}
     (expr_list:REG_DEAD (reg:VNx8HF 134 [ vect__4.6 ])
        (nil)))
The reason for the ICE is that, according to the VF iterator, the
auto-vectorization pattern for vfneg.v is enabled when TARGET_ZVFHMIN,
but the instruction pattern of vfneg.v is (correctly) disabled and only
enabled when TARGET_ZVFH, since we have this attribute for each
RVV instruction pattern:
(define_attr "fp_vector_disabled" "no,yes"
  (cond [
    (and (eq_attr "type" "vfmov,vfalu,vfmul,vfdiv,
                          vfwalu,vfwmul,vfmuladd,vfwmuladd,
                          vfsqrt,vfrecp,vfminmax,vfsgnj,vfcmp,
                          vfclass,vfmerge,
                          vfncvtitof,vfwcvtftoi,vfcvtftoi,vfcvtitof,
                          vfredo,vfredu,vfwredo,vfwredu,
                          vfslide1up,vfslide1down")
         (and (eq_attr "mode"
                "VNx1HF,VNx2HF,VNx4HF,VNx8HF,VNx16HF,VNx32HF,VNx64HF")
              (match_test "!TARGET_ZVFH")))
    (const_string "yes")
    ;; The mode records as QI for the FP16 <=> INT8 instruction.
    (and (eq_attr "type" "vfncvtftoi,vfwcvtitof")
         (and (eq_attr "mode"
                "VNx1QI,VNx2QI,VNx4QI,VNx8QI,VNx16QI,VNx32QI,VNx64QI")
              (match_test "!TARGET_ZVFH")))
    (const_string "yes")
  ]
  (const_string "no")))
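To make the mismatch concrete, here is a minimal C model of the two
conditions. It is only a sketch: the struct and function names are made
up for illustration and are not GCC's, and non-FP16 modes are simplified
to "always available under TARGET_VECTOR".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the target macros; not GCC's real data.  */
struct target_flags
{
  bool vector;   /* models TARGET_VECTOR */
  bool zvfh;     /* models TARGET_ZVFH */
  bool zvfhmin;  /* models TARGET_ZVFHMIN */
};

/* Models TARGET_VECTOR_ELEN_FP_16 as described above: ZVFH or ZVFHMIN.  */
bool
elen_fp_16 (struct target_flags t)
{
  return t.zvfh || t.zvfhmin;
}

/* Expander: "TARGET_VECTOR" plus the VF iterator condition for HF modes.  */
bool
expander_enabled (struct target_flags t, bool is_hf_mode)
{
  return t.vector && (!is_hf_mode || elen_fp_16 (t));
}

/* Insn pattern: fp_vector_disabled is "yes" for HF modes without ZVFH.  */
bool
insn_enabled (struct target_flags t, bool is_hf_mode)
{
  return t.vector && !(is_hf_mode && !t.zvfh);
}

/* True when the expander emits an insn whose pattern is disabled.  */
bool
mismatch (struct target_flags t, bool is_hf_mode)
{
  return expander_enabled (t, is_hf_mode) && !insn_enabled (t, is_hf_mode);
}

/* The zvfhmin-only, FP16 case is exactly the one that mismatches.  */
void
self_check (void)
{
  struct target_flags zvfhmin_only = { true, false, true };
  assert (mismatch (zvfhmin_only, true));
}
```

In this model the only inconsistent combination is an HF mode with
ZVFHMIN but without ZVFH, which matches the -march=rv64gcv_zvfhmin
case above.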
When I slightly change the pattern as follows:
(define_expand "<optab><mode>2"
  [(set (match_operand:VF 0 "register_operand")
    (any_float_unop_nofrm:VF
     (match_operand:VF 1 "register_operand")))]
  "TARGET_VECTOR && !(GET_MODE_INNER (<MODE>mode) == HFmode && !TARGET_ZVFH)"
{
  insn_code icode = code_for_pred (<CODE>, <MODE>mode);
  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
  DONE;
})
That is, I added && !(GET_MODE_INNER (<MODE>mode) == HFmode && !TARGET_ZVFH)
to the condition.
It then works for both TARGET_ZVFH and TARGET_ZVFHMIN.
-march=rv64gcv_zvfhmin:
f:
        li a4,2147450880
        li a5,-2147450880
        addi a4,a4,-1
        addi a5,a5,1
        slli a3,a5,32
        slli a2,a4,32
        mv a5,a4
        li a4,-2147450880
        addi a6,a1,200
        add a3,a3,a4
        add a2,a2,a5
.L2:
        ld a5,0(a1)
        addi a0,a0,8
        addi a1,a1,8
        not a4,a5
        and a5,a5,a2
        and a4,a4,a3
        sub a5,a3,a5
        xor a5,a4,a5
        sd a5,-8(a0)
        bne a1,a6,.L2
        ret
-march=rv64gcv_zvfh:
f:
        vsetivli zero,8,e16,m1,ta,ma
        addi a4,a1,16
        addi a5,a0,16
        vle16.v v1,0(a1)
        vfneg.v v1,v1
        vse16.v v1,0(a0)
        addi a2,a1,32
        addi a3,a0,32
        vle16.v v1,0(a4)
        vfneg.v v1,v1
        vse16.v v1,0(a5)
        addi a4,a1,48
        addi a5,a0,48
        vle16.v v1,0(a2)
        vfneg.v v1,v1
        vse16.v v1,0(a3)
        addi a2,a1,64
        addi a3,a0,64
        vle16.v v1,0(a4)
        vfneg.v v1,v1
        vse16.v v1,0(a5)
        addi a4,a1,80
        addi a5,a0,80
        vle16.v v1,0(a2)
        vfneg.v v1,v1
        vse16.v v1,0(a3)
        ....
This is what we expect: TARGET_ZVFH enables auto-vectorization, whereas
there is no auto-vectorization with TARGET_ZVFHMIN, since
vfneg.v is not allowed under TARGET_ZVFHMIN.
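As a sanity check on the extra condition, the following C sketch
enumerates every combination of flags and confirms that the patched
expander never fires for a mode whose insn pattern is disabled. The
names are hypothetical stand-ins for the target macros, not GCC code,
and non-FP16 modes are again simplified to "always available".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for TARGET_VECTOR, TARGET_ZVFH, TARGET_ZVFHMIN.  */
struct flags
{
  bool vector;
  bool zvfh;
  bool zvfhmin;
};

/* Patched expander: iterator condition plus the added HFmode check.  */
bool
patched_expander_ok (struct flags f, bool is_hf)
{
  bool in_iterator = !is_hf || f.zvfh || f.zvfhmin;
  return f.vector && in_iterator && !(is_hf && !f.zvfh);
}

/* Insn pattern: disabled for HF modes when ZVFH is not available.  */
bool
insn_ok (struct flags f, bool is_hf)
{
  return f.vector && !(is_hf && !f.zvfh);
}

/* Count combinations where the expander fires but the insn is disabled.  */
int
count_mismatches (void)
{
  int bad = 0;
  for (int bits = 0; bits < 16; ++bits)
    {
      struct flags f = { (bits & 1) != 0, (bits & 2) != 0, (bits & 4) != 0 };
      bool is_hf = (bits & 8) != 0;
      if (patched_expander_ok (f, is_hf) && !insn_ok (f, is_hf))
        ++bad;
    }
  return bad;
}

/* With the added condition, no combination is inconsistent.  */
void
self_check (void)
{
  assert (count_mismatches () == 0);
}
```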
However, I think adding
!(GET_MODE_INNER (<MODE>mode) == HFmode && !TARGET_ZVFH)
is an ugly implementation and not easy to maintain, since we would need
to add this condition to every floating-point pattern.
So, give me some time to figure out an elegant way to support
auto-vectorization.