Hi Juzhe, > +;; We can't expand FMA for the following reasons:
But we do :) We just haven't selected the proper alternative yet. > +;; 1. Before RA, we don't know which multiply-add instruction is the ideal > one. > +;; The vmacc is the ideal instruction when operands[3] overlaps > operands[0]. > +;; The vmadd is the ideal instruction when operands[1|2] overlaps > operands[0]. > +;; 2. According to vector.md, the multiply-add patterns has 'merge' operand > which > +;; is the operands[5]. Since operands[5] should overlap operands[0], this > operand > +;; should be allocated the same regno as operands[1|2|3]. > +;; 3. The 'merge' operand is always a real merge operand and we don't allow > undefined > +;; operand. > +;; 3. The operation of FMA pattern needs VLMAX vsetlvi which needs a VL > operand. Can you explain these two points (3 and 4, maybe 2) a bit in the comments? I.e. what makes fma different from a normal insn? > +(define_insn_and_split "*fma<mode>" > + [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?&vr") > + (plus:VI > + (mult:VI > + (match_operand:VI 1 "register_operand" " %0, vr, vr") > + (match_operand:VI 2 "register_operand" " vr, vr, vr")) > + (match_operand:VI 3 "register_operand" " vr, 0, vr"))) > + (clobber (match_scratch:SI 4 "=r,r,r"))] > + "TARGET_VECTOR" > + "#" > + "&& reload_completed" > + [(const_int 0)] > + { > + PUT_MODE (operands[4], Pmode); > + riscv_vector::emit_vlmax_vsetvl (<MODE>mode, operands[4]); > + if (which_alternative == 3) We only have three alternatives here. > + emit_insn (gen_rtx_SET (operands[0], operands[3])); > + rtx ops[] = {operands[0], operands[1], operands[2], operands[3], > operands[0]}; > + riscv_vector::emit_vlmax_ternop_insn (code_for_pred_mul_plus > (<MODE>mode), > + riscv_vector::RVV_TERNOP, ops, > operands[4]); > + DONE; > + } > + [(set_attr "type" "vimuladd") > + (set_attr "mode" "<MODE>")]) > diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h > index 36419c95bbd..86b2798fb5e 100644 > --- a/gcc/config/riscv/riscv-protos.h > +++ b/gcc/config/riscv/riscv-protos.h > @@ -140,6 +140,7 @@ enum insn_type > RVV_MERGE_OP = 4, > RVV_CMP_OP = 4, > RVV_CMP_MU_OP = RVV_CMP_OP + 2, /* +2 means mask and maskoff operand. */ > + RVV_TERNOP = 5, > }; > +emit_vlmax_ternop_insn (unsigned icode, int op_num, rtx *ops, rtx vl) We have a bit of naming overlap between "insn" an "op" already. I would go with just ternay_insn or tern_insn here. That the insn_types have OP in their name is unfortunate but let's keep that for now. > + machine_mode data_mode = GET_MODE (ops[0]); > + machine_mode mask_mode = get_mask_mode (data_mode).require (); > + /* We have a maximum of 11 operands for RVV instruction patterns according > to > + * vector.md. */ > + insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true, > + /*FULLY_UNMASKED_P*/ true, > + /*USE_REAL_MERGE_P*/ true, /*HAS_AVL_P*/ true, > + /*VLMAX_P*/ true, > + /*DEST_MODE*/ data_mode, /*MASK_MODE*/ mask_mode); Can we call data_mode dest_mode here? data_mode imho only makes sense in the context of conditionals where we have a comparison mode and a data mode. I mean you could argue we always have a data mode and a mask mode so the naming makes sense again but then we should get rid of dest_mode. > + /* According to LRA mov pattern in vector.md, we have a clobber operand > + to be used ad VL operand. */ > + e.set_vl (vl); How does the LRA mov pattern (mov_lra?) come into play here? I know the same line is already in emit_vlmax_insn but it also is odd there. What we actually do is pass either NULL as length (before lra/reload) or a pre-allocated scratch that we can use as vlmax length. > +#include <stdint-gcc.h> > + > +#define TEST_TYPE(TYPE) > \ > + __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst, > \ > + TYPE *__restrict a, \ > + TYPE *__restrict b, int n) \ > +#define TEST_TYPE(TYPE) > \ > + __attribute__ ((noipa)) void ternop_##TYPE (TYPE *restrict dest1, > \ > + TYPE *restrict dest2, \ > + TYPE *restrict dest3, \ > + TYPE *restrict src1, \ > + TYPE *restrict src2, int n) __restrict vs restrict. > +int __attribute__ ((optimize (1))) main () > +int __attribute__ ((optimize (0))) main () Why the difference here? Why do we need to restrict the optimization here anyway? Btw. any reason why you don't include fms, vnmsac in the patch? Wouldn't the patterns be really similar or do you have other plans for those? Not needed for this patch, just curious. Regards Robin