Hi Juzhe,

> +;; We can't expand FMA for the following reasons:

But we do :)  We just haven't selected the proper alternative yet.

> +;; 1. Before RA, we don't know which multiply-add instruction is the ideal 
> one.
> +;;    The vmacc is the ideal instruction when operands[3] overlaps 
> operands[0].
> +;;    The vmadd is the ideal instruction when operands[1|2] overlaps 
> operands[0].
> +;; 2. According to vector.md, the multiply-add patterns has 'merge' operand 
> which
> +;;    is the operands[5]. Since operands[5] should overlap operands[0], this 
> operand
> +;;    should be allocated the same regno as operands[1|2|3].
> +;; 3. The 'merge' operand is always a real merge operand and we don't allow 
> undefined
> +;;    operand.
> +;; 3. The operation of FMA pattern needs VLMAX vsetlvi which needs a VL 
> operand.

Can you explain these two points (3 and 4, maybe 2) a bit in the comments?
I.e. what makes fma different from a normal insn?

> +(define_insn_and_split "*fma<mode>"
> +  [(set (match_operand:VI 0 "register_operand"     "=vr, vr, ?&vr")
> +     (plus:VI
> +       (mult:VI
> +         (match_operand:VI 1 "register_operand" " %0, vr,   vr")
> +         (match_operand:VI 2 "register_operand" " vr, vr,   vr"))
> +       (match_operand:VI 3 "register_operand"   " vr,  0,   vr")))
> +   (clobber (match_scratch:SI 4 "=r,r,r"))]
> +  "TARGET_VECTOR"
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]
> +  {
> +    PUT_MODE (operands[4], Pmode);
> +    riscv_vector::emit_vlmax_vsetvl (<MODE>mode, operands[4]);
> +    if (which_alternative == 3)

We only have three alternatives here.

> +      emit_insn (gen_rtx_SET (operands[0], operands[3]));
> +    rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
> operands[0]};
> +    riscv_vector::emit_vlmax_ternop_insn (code_for_pred_mul_plus 
> (<MODE>mode),
> +                                       riscv_vector::RVV_TERNOP, ops, 
> operands[4]);
> +    DONE;
> +  }
> +  [(set_attr "type" "vimuladd")
> +   (set_attr "mode" "<MODE>")])
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 36419c95bbd..86b2798fb5e 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -140,6 +140,7 @@ enum insn_type
>    RVV_MERGE_OP = 4,
>    RVV_CMP_OP = 4,
>    RVV_CMP_MU_OP = RVV_CMP_OP + 2, /* +2 means mask and maskoff operand.  */
> +  RVV_TERNOP = 5,
>  };

> +emit_vlmax_ternop_insn (unsigned icode, int op_num, rtx *ops, rtx vl)

We have a bit of naming overlap between "insn" an "op" already.  I would go
with just ternay_insn or tern_insn here.  That the insn_types have OP in
their name is unfortunate but let's keep that for now. 

> +  machine_mode data_mode = GET_MODE (ops[0]);
> +  machine_mode mask_mode = get_mask_mode (data_mode).require ();
> +  /* We have a maximum of 11 operands for RVV instruction patterns according 
> to
> +   * vector.md.  */
> +  insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
> +                    /*FULLY_UNMASKED_P*/ true,
> +                    /*USE_REAL_MERGE_P*/ true, /*HAS_AVL_P*/ true,
> +                    /*VLMAX_P*/ true,
> +                    /*DEST_MODE*/ data_mode, /*MASK_MODE*/ mask_mode);

Can we call data_mode dest_mode here?  data_mode imho only makes sense in
the context of conditionals where we have a comparison mode and a data mode.
I mean you could argue we always have a data mode and a mask mode so the
naming makes sense again but then we should get rid of dest_mode.

> +  /* According to LRA mov pattern in vector.md, we have a clobber operand
> +     to be used ad VL operand.  */
> +  e.set_vl (vl);

How does the LRA mov pattern (mov_lra?) come into play here?  I know the same
line is already in emit_vlmax_insn but it also is odd there.  What we actually
do is pass either NULL as length (before lra/reload) or a pre-allocated scratch
that we can use as vlmax length.

> +#include <stdint-gcc.h>
> +
> +#define TEST_TYPE(TYPE)                                                      
>   \
> +  __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst,          
>   \
> +                                           TYPE *__restrict a,              \
> +                                           TYPE *__restrict b, int n)       \

> +#define TEST_TYPE(TYPE)                                                      
>   \
> +  __attribute__ ((noipa)) void ternop_##TYPE (TYPE *restrict dest1,          
>   \
> +                                           TYPE *restrict dest2,            \
> +                                           TYPE *restrict dest3,            \
> +                                           TYPE *restrict src1,             \
> +                                           TYPE *restrict src2, int n)      

__restrict vs restrict.

> +int __attribute__ ((optimize (1))) main ()
> +int __attribute__ ((optimize (0))) main ()
Why the difference here?  Why do we need to restrict the optimization here
anyway?

Btw. any reason why you don't include fms, vnmsac in the patch?  Wouldn't the
patterns be really similar or do you have other plans for those?  Not needed
for this patch, just curious.

Regards
 Robin

Reply via email to