There seems to be some problem with the email server, so I am trying
another email address to send this email.
On 2023/8/29 00:57, Richard Henderson wrote:
On 8/28/23 08:19, Jiajie Chen wrote:
+static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
+                             TCGReg rd, int64_t v64)
+{
+    /* Try vldi if imm can fit */
+    if (vece <= MO_32 && (-0x200 <= v64 && v64 <= 0x1FF)) {
+        uint32_t imm = (vece << 10) | ((uint32_t)v64 & 0x3FF);
+        tcg_out_opc_vldi(s, rd, imm);
+        return;
+    }
v64 has the value replicated across 64 bits.
In order to do the comparison above, you'll want
    int64_t vale = sextract64(v64, 0, 8 << vece);
    if (-0x200 <= vale && vale <= 0x1ff)
        ...
Since the only documentation for LSX is qemu's own translator code,
why are you testing vece <= MO_32? MO_64 should be available as
well? Or is there a bug in trans_vldi()?
Sorry, my mistake. I was confusing MO_64 with bit 12 of the vldi imm.
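A minimal sketch of the adjusted check, following the sextract64
suggestion and allowing MO_64 as well (it uses only names already in
the patch):

    /* vldi: i13[11:10] selects the element size, i13[9:0] is the value. */
    int64_t vale = sextract64(v64, 0, 8 << vece);
    if (-0x200 <= vale && vale <= 0x1ff) {
        uint32_t imm = (vece << 10) | ((uint32_t)vale & 0x3ff);
        tcg_out_opc_vldi(s, rd, imm);
        return;
    }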
It might be nice to leave a to-do for vldi imm bit 12 set, for the
patterns expanded by vldi_get_value(). In particular, mode == 9 is
likely to be useful, and modes {1,2,3,5} are easy to test for.
Sure, I was thinking about the complexity of pattern matching on those
modes, and decided to skip the hard part in the first patch series.
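For the record, a rough sketch of what the mode == 9 case might look
like, under the assumption that mode 9 expands each bit of imm[7:0]
into a byte of all zeros or all ones; the exact i13 layout below is an
assumption and would need to be checked against vldi_get_value():

    /*
     * Sketch only (not in this series): if every byte of the replicated
     * value is 0x00 or 0xff, the constant is a byte mask and could be
     * produced by a single vldi, assuming mode 9 of the bit-12 encodings
     * expands imm bit i into byte i of the result.
     */
    uint8_t mask = 0;
    bool is_byte_mask = true;
    for (int i = 0; i < 8; i++) {
        uint8_t byte = v64 >> (i * 8);
        if (byte == 0xff) {
            mask |= 1 << i;
        } else if (byte != 0x00) {
            is_byte_mask = false;
            break;
        }
    }
    if (is_byte_mask) {
        /* Assumed layout: i13[12] set, mode in i13[11:8], data in i13[7:0]. */
        tcg_out_opc_vldi(s, rd, (1 << 12) | (9 << 8) | mask);
        return;
    }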
+
+    /* Fallback to vreplgr2vr */
+    tcg_out_movi(s, type, TCG_REG_TMP0, v64);
type is a vector type; you can't use it here.
Correct would be TCG_TYPE_I64.
Better to load vale instead, since that will take fewer insns in
tcg_out_movi.
Sure.
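For concreteness, a minimal sketch of the adjusted fallback; the
tcg_out_opc_vreplgr2vr_{b,h,w,d} emitter names are an assumption,
presumed to follow the same generated naming as tcg_out_opc_vldi:

    /* Fall back: move the element value through a GPR and replicate it. */
    tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP0, vale);
    switch (vece) {
    case MO_8:
        tcg_out_opc_vreplgr2vr_b(s, rd, TCG_REG_TMP0);
        break;
    case MO_16:
        tcg_out_opc_vreplgr2vr_h(s, rd, TCG_REG_TMP0);
        break;
    case MO_32:
        tcg_out_opc_vreplgr2vr_w(s, rd, TCG_REG_TMP0);
        break;
    case MO_64:
        tcg_out_opc_vreplgr2vr_d(s, rd, TCG_REG_TMP0);
        break;
    default:
        g_assert_not_reached();
    }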
+static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
+                           unsigned vecl, unsigned vece,
+                           const TCGArg args[TCG_MAX_OP_ARGS],
+                           const int const_args[TCG_MAX_OP_ARGS])
+{
+    TCGType type = vecl + TCG_TYPE_V64;
+    TCGArg a0, a1, a2;
+    TCGReg base;
+    TCGReg temp = TCG_REG_TMP0;
+    int32_t offset;
+
+    a0 = args[0];
+    a1 = args[1];
+    a2 = args[2];
+
+    /* Currently only supports V128 */
+    tcg_debug_assert(type == TCG_TYPE_V128);
+
+    switch (opc) {
+    case INDEX_op_st_vec:
+        /* Try to fit vst imm */
+        if (-0x800 <= a2 && a2 <= 0x7ff) {
+            base = a1;
+            offset = a2;
+        } else {
+            tcg_out_addi(s, TCG_TYPE_I64, temp, a1, a2);
+            base = temp;
+            offset = 0;
+        }
+        tcg_out_opc_vst(s, a0, base, offset);
+        break;
+    case INDEX_op_ld_vec:
+        /* Try to fit vld imm */
+        if (-0x800 <= a2 && a2 <= 0x7ff) {
+            base = a1;
+            offset = a2;
+        } else {
+            tcg_out_addi(s, TCG_TYPE_I64, temp, a1, a2);
+            base = temp;
+            offset = 0;
+        }
+        tcg_out_opc_vld(s, a0, base, offset);
tcg_out_addi has a hole in bits [15:12], and can take an extra insn if
those bits are set. Better to load the offset with tcg_out_movi and
then use VLDX/VSTX instead of VLD/VST.
Sure.
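A sketch of the store side with that change; the tcg_out_opc_vstx
emitter name is an assumption, mirroring tcg_out_opc_vst, and the load
side would use vldx the same way:

    case INDEX_op_st_vec:
        if (-0x800 <= a2 && a2 <= 0x7ff) {
            /* Offset fits the signed 12-bit immediate of vst. */
            tcg_out_opc_vst(s, a0, a1, a2);
        } else {
            /* Materialize the full offset and use the indexed form. */
            tcg_out_movi(s, TCG_TYPE_I64, temp, a2);
            tcg_out_opc_vstx(s, a0, a1, temp);
        }
        break;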
@@ -159,6 +170,30 @@ typedef enum {
#define TCG_TARGET_HAS_mulsh_i64 1
#define TCG_TARGET_HAS_qemu_ldst_i128 0
+#define TCG_TARGET_HAS_v64 0
+#define TCG_TARGET_HAS_v128 use_lsx_instructions
+#define TCG_TARGET_HAS_v256 0
Perhaps reserve for a follow-up, but TCG_TARGET_HAS_v64 can easily be
supported using the same instructions.
The only difference is load/store, where you could use FLD.D/FST.D to
load the lower 64 bits of the fp/vector register, or VLDREPL.D to load
and initialize all bits and VSTELM.D to store the lower 64 bits.
I tend to think the float insns are more flexible, having a larger
displacement, and the availability of FLDX/FSTX as well.
Sure.
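If V64 gets added along those lines later, the load case might look
roughly like this; the tcg_out_opc_fld_d emitter name is an assumption,
analogous to the other generated emitters, and the store case would use
fst.d the same way:

    case INDEX_op_ld_vec:
        /* base/offset computed as in the V128 path above. */
        if (type == TCG_TYPE_V64) {
            /* fld.d fills only the low 64 bits of the fp/vector register. */
            tcg_out_opc_fld_d(s, a0, base, offset);
        } else {
            tcg_out_opc_vld(s, a0, base, offset);
        }
        break;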
r~