Hi Robert, Thanks for the update and detailed comments. There are a couple of things which I think still need addressing based solely on your comments but generally the changes seem to have simplified things which is nice to see.
I haven't read the patch again yet and given the changes it looks like a complete re-read is in order so it will be a few days. All being well there won't be much to say. > > > > >+(define_expand "msa_ldi<mode>" > > >+ [(match_operand:IMSA 0 "register_operand") > > >+ (match_operand 1 "const_imm10_operand")] > > >+ "ISA_HAS_MSA" > > >+{ > > >+ unsigned n_elts = GET_MODE_NUNITS (<MODE>mode); > > >+ rtvec v = rtvec_alloc (n_elts); > > >+ HOST_WIDE_INT val = INTVAL (operands[1]); > > >+ unsigned int i; > > >+ > > >+ if (<MODE>mode != V16QImode) > > >+ { > > >+ unsigned shift = HOST_BITS_PER_WIDE_INT - 10; > > >+ val = trunc_int_for_mode ((val << shift) >> shift, <UNITMODE>mode); > > >+ } > > >+ else > > >+ val = trunc_int_for_mode (val, <UNITMODE>mode); > > >+ > > >+ for (i = 0; i < n_elts; i++) > > >+ RTVEC_ELT (v, i) = GEN_INT (val); > > >+ emit_move_insn (operands[0], > > >+ gen_rtx_CONST_VECTOR (<MODE>mode, v)); > > >+ DONE; > > >+}) > > > > This is really weird. We shouldn't be simply discarding bits that don't fit. > > This needs to accept all immediates and generate the correct code to > > get a replicated constant of that value into a register. I think it is > > probably OK to trunc_int_for_mode on the original 'val' for the > > <UNIT>mode but anything out of range for V*HI/SI/DI needs to be expanded > > properly. > > > > Please do not gen_msa_ldi anywhere other than from MSA builtins. There is > > no need just emit a move directly. > > AFAICS, the truncation for everything except V16QImode is not needed > since we have the predicate here. Truncating the immediate for bytes may make > life easier for users when debugging. Although the extra bits are ignored by > the hardware, it doesn't stop us from encoding numbers out of range. > The RTL doesn't seem to have validation of ranges of constants and modes. > I did a small test and could output any number within the allowable range > in the predicate. This is still giving me cause for concern. I think we need to expand this fully so that constants larger than s10 can be loaded albeit via a longer sequence. I am not a fan of simplistic builtins that simply map 1:1 with hardware instructions when that means there is unexpected impact if you don't know the exact details of the underlying instruction. > > > > >+;; Offset load > > >+(define_expand "msa_ld_<msafmt_f>" > > >+ [(match_operand:MSA 0 "register_operand") > > >+ (match_operand 1 "pmode_register_operand") > > >+ (match_operand 2 "aq10<msafmt>_operand")] > > >+ "ISA_HAS_MSA" > > >+{ > > >+ rtx addr = plus_constant (GET_MODE (operands[1]), operands[1], > > >+ INTVAL (operands[2])); > > >+ mips_emit_move (operands[0], gen_rtx_MEM (<MODE>mode, addr)); > > >+ DONE; > > >+}) > > >+ > > >+;; Offset store > > >+(define_expand "msa_st_<msafmt_f>" > > >+ [(match_operand:MSA 0 "register_operand") > > >+ (match_operand 1 "pmode_register_operand") > > >+ (match_operand 2 "aq10<msafmt>_operand")] > > >+ "ISA_HAS_MSA" > > >+{ > > >+ rtx addr = plus_constant (GET_MODE (operands[1]), operands[1], > > >+ INTVAL (operands[2])); > > >+ mips_emit_move (gen_rtx_MEM (<MODE>mode, addr), operands[0]); > > >+ DONE; > > >+}) > > > > There's no real need to expand these in C code. The patterns can be used > > to create the RTL. As an aside, I don't really see the point in intrinsics > > to load and store data the same thing can be done from straight C. > > The patterns also can't be used for const or volatile data as their > > builtin prototypes are neither. I suspect they should at least support > > pointers to const data. > > I was going to remove the expansion in C code but the problem is that > we need the Pmode. I could duplicate the patterns and override the icode > e.g for TARGET_64BIT when expanding the built-ins but this doesn't look > cleaner. Unless I'm missing something here? > Since the intrinsics have been present for a while it would probably confuse > if we removed them now. Indeed, the intrinsics are not really needed as we > can do loads and stores from C. I changed the type in the prototype > to CVPOINTER so qualifiers like const/volatile are not discarded. The volatile is however not propagated to the memref so is effectively lost I believe. The constant in the offset here has a similar problem to msa_ldi as an out-of-range value will just go wrong. I'll look for other cases when reading this time. > > > > >+ "ISA_HAS_MSA" > > >+ "andi.b\t%w0,%w1,%E2" > > >+ [(set_attr "type" "simd_logic") > > >+ (set_attr "mode" "V16QI")]) > > >+ > > > > END USEFUL commnets Oops, thanks for reading on. That was my marker for how far I'd got in one session!. > > >+(define_insn "msa_bclr_<msafmt>" > > >+ [(set (match_operand:IMSA 0 "register_operand" "=f") > > >+ (unspec:IMSA [(match_operand:IMSA 1 "register_operand" "f") > > >+ (match_operand:IMSA 2 "register_operand" "f")] > > >+ UNSPEC_MSA_BCLR))] > > >+ "ISA_HAS_MSA" > > >+ "bclr.<msafmt>\t%w0,%w1,%w2" > > >+ [(set_attr "type" "simd_bit") > > >+ (set_attr "mode" "<MODE>")]) > > >+ > > >+(define_insn "msa_bclri_<msafmt>" > > >+ [(set (match_operand:IMSA 0 "register_operand" "=f") > > >+ (unspec:IMSA [(match_operand:IMSA 1 "register_operand" "f") > > >+ (match_operand 2 "const_<bitimm>_operand" "")] > > >+ UNSPEC_MSA_BCLRI))] > > >+ "ISA_HAS_MSA" > > >+ "bclri.<msafmt>\t%w0,%w1,%2" > > >+ [(set_attr "type" "simd_bit") > > >+ (set_attr "mode" "<MODE>")]) > > >+ > > >+(define_insn "msa_binsl_<msafmt>" > > >+ [(set (match_operand:IMSA 0 "register_operand" "=f") > > >+ (unspec:IMSA [(match_operand:IMSA 1 "register_operand" "0") > > >+ (match_operand:IMSA 2 "register_operand" "f") > > >+ (match_operand:IMSA 3 "register_operand" "f")] > > >+ UNSPEC_MSA_BINSL))] > > >+ "ISA_HAS_MSA" > > >+ "binsl.<msafmt>\t%w0,%w2,%w3" > > >+ [(set_attr "type" "simd_bitins") > > >+ (set_attr "mode" "<MODE>")]) > > >+ > > >+(define_insn "msa_binsli_<msafmt>" > > >+ [(set (match_operand:IMSA 0 "register_operand" "=f") > > >+ (unspec:IMSA [(match_operand:IMSA 1 "register_operand" "0") > > >+ (match_operand:IMSA 2 "register_operand" "f") > > >+ (match_operand 3 "const_<bitimm>_operand" "")] > > >+ UNSPEC_MSA_BINSLI))] > > >+ "ISA_HAS_MSA" > > >+ "binsli.<msafmt>\t%w0,%w2,%3" > > >+ [(set_attr "type" "simd_bitins") > > >+ (set_attr "mode" "<MODE>")]) > > >+ > > >+(define_insn "msa_binsr_<msafmt>" > > >+ [(set (match_operand:IMSA 0 "register_operand" "=f") > > >+ (unspec:IMSA [(match_operand:IMSA 1 "register_operand" "0") > > >+ (match_operand:IMSA 2 "register_operand" "f") > > >+ (match_operand:IMSA 3 "register_operand" "f")] > > >+ UNSPEC_MSA_BINSR))] > > >+ "ISA_HAS_MSA" > > >+ "binsr.<msafmt>\t%w0,%w2,%w3" > > >+ [(set_attr "type" "simd_bitins") > > >+ (set_attr "mode" "<MODE>")]) > > >+ > > >+(define_insn "msa_binsri_<msafmt>" > > >+ [(set (match_operand:IMSA 0 "register_operand" "=f") > > >+ (unspec:IMSA [(match_operand:IMSA 1 "register_operand" "0") > > >+ (match_operand:IMSA 2 "register_operand" "f") > > >+ (match_operand 3 "const_<bitimm>_operand" "")] > > >+ UNSPEC_MSA_BINSRI))] > > >+ "ISA_HAS_MSA" > > >+ "binsri.<msafmt>\t%w0,%w2,%3" > > >+ [(set_attr "type" "simd_bitins") > > >+ (set_attr "mode" "<MODE>")]) > > > > As a follow up these instructions may be represenatble with standard > > RTL. > > The "insvm" appears to be a good candidate but I'm not sure if this > is going to work out of the box for vector modes. This would apply > only for instructions with a replicated immediate. Let's leave it for follow on work. > > >+(define_insn "msa_<FCC:fcc>_<FMSA:msafmt>" > > >+ [(set (match_operand:<VIMODE> 0 "register_operand" "=f") > > >+ (FCC:<VIMODE> (match_operand:FMSA 1 "register_operand" "f") > > >+ (match_operand:FMSA 2 "register_operand" "f")))] > > >+ "ISA_HAS_MSA" > > >+ "<FCC:fcc>.<FMSA:msafmt>\t%w0,%w1,%w2" > > >+ [(set_attr "type" "simd_fcmp") > > >+ (set_attr "mode" "<MODE>")]) > > > > Can't the msa builtins target these patterns directly? Follow on work should > > implement "mov<mode>cc" for vector modes. > > > > >+ > > >+(define_insn "msa_<fsc>_<FMSA:msafmt>" > > >+ [(set (match_operand:<VIMODE> 0 "register_operand" "=f") > > >+ (unspec:<VIMODE> [(match_operand:FMSA 1 "register_operand" "f") > > >+ (match_operand:FMSA 2 "register_operand" "f")] > > >+ FSC_UNS))] > > >+ "ISA_HAS_MSA" > > >+ "<fsc>.<FMSA:msafmt>\t%w0,%w1,%w2" > > >+ [(set_attr "type" "simd_fcmp") > > >+ (set_attr "mode" "<MODE>")]) > > > > i.e. this is not necessary. > > The difference between this and above is the signalling vs quiet NaNs floating > point comparison instructions. The quiet NaNs are represented as standard RTL > and > signalling as UNSPEC. "mov<mode>cc" for vector modes is questionable as > "vcondmn" > is meant to be used for conditional vector moves unless I misunderstood > something. Ah, my mistake. I didn't check if mov<mode>cc was applicable to vector modes. Same thing applies that this kind of thing is good for future work. > > > > >+(define_insn "msa_max_a_<msafmt>" > > >+ [(set (match_operand:IMSA 0 "register_operand" "=f") > > >+ (unspec:IMSA [(match_operand:IMSA 1 "register_operand" "f") > > >+ (match_operand:IMSA 2 "register_operand" "f")] > > >+ UNSPEC_MSA_MAX_A))] > > >+ "ISA_HAS_MSA" > > >+ "max_a.<msafmt>\t%w0,%w1,%w2" > > >+ [(set_attr "type" "simd_int_arith") > > >+ (set_attr "mode" "<MODE>")]) > > > > This could be used as part of mov<mode>cc support. > > AFAUI, mov<mode>cc is used for non-vector modes and the support > should go into vcond[u]mn SPNs. However, for the above and floating-point > version I'm not sure if we can add it there as we don't have enough > information as it appears to be simplified to a basic comparison. > > I haven't confirmed it yet but it would appear that the support > for these operations would have to be explicitly added to > the auto-vectorizer to recognize these patterns. OK. We will have to see if the work Prachi and Sameera are doing will enable these to be targeted. > > > > >+(define_insn "msa_nloc_<msafmt>" > > >+ [(set (match_operand:IMSA 0 "register_operand" "=f") > > >+ (unspec:IMSA [(match_operand:IMSA 1 "register_operand" "f")] > > >+ UNSPEC_MSA_NLOC))] > > >+ "ISA_HAS_MSA" > > >+ "nloc.<msafmt>\t%w0,%w1" > > >+ [(set_attr "type" "simd_bit") > > >+ (set_attr "mode" "<MODE>")]) > > >+ > > >+(define_insn "clz<mode>2" > > >+ [(set (match_operand:IMSA 0 "register_operand" "=f") > > >+ (clz:IMSA (match_operand:IMSA 1 "register_operand" "f")))] > > >+ "ISA_HAS_MSA" > > >+ "nlzc.<msafmt>\t%w0,%w1" > > >+ [(set_attr "type" "simd_bit") > > >+ (set_attr "mode" "<MODE>")]) > > > > Can you confirm that nlzc has a natural value when given an operand of zero? > > I.e. 8 for B 16 for H 32 for W and 64 for D? > > > > Also CLZ_DEFINED_VALUE_AT_ZERO looks like it needs updating to know the > > bitsize of elements in a vector rather than the whole vector. I.e. I think > > it will say the result is 128 for any vector mode. > > I'm not so sure about this one. > > The above pattern as it stands doesn't accept the zero operand, would the > macro > would still matter? The built-in also rejects the zero argument. I guess not then. > It appears that the code around, where the macro is used, doesn't handle > vector modes > and the modification of this macro may potentially lead to strange errors. > AFAICS, it's very unlikely that we ever hit those paths so it appears we are > safe. > > For the sake of consistency, I think that we still should consider > vector elements so I updated the macro to use GET_MODE_UNIT_BITSIZE. > > OK. I'll take a read when I go through this again. Thanks, Matthew