On Fri, 08 Mar 2019 10:24:23 +0900, Richard Henderson wrote:
> On 3/1/19 10:21 PM, Yoshinori Sato wrote:
> > My git repository is below.
> > git://git.pf.osdn.net/gitroot/y/ys/ysato/qemu.git
>
> Somehow patch 1 did not arrive, so I am reviewing based on
> rebasing this branch against master, and then looking at the diff.
>
> > +struct CCop;
>
> Unused?
Yes, I forgot to remove it. Removed.

> > +static inline void cpu_get_tb_cpu_state(CPURXState *env, target_ulong *pc,
> > +                                        target_ulong *cs_base, uint32_t *flags)
> > +{
> > +    *pc = env->pc;
> > +    *cs_base = 0;
> > +    *flags = 0;
> > +}
>
> *flags should contain PSW[PM], I think, so that knowledge of the
> privilege level is given to the translator.
>
> Looking forward I see that you're currently testing cpu_psw_pm dynamically for
> instructions that require privs, so what you have is not wrong, but this is
> exactly the sort of thing that TB flags are for.

OK. I will copy PSW.PM into DisasContextBase.flags.

> > +#define RX_PSW_OP_NEG 4
>
> Unused?

Yes. Removed.

> > +#define RX_BYTE 0
> > +#define RX_WORD 1
> > +#define RX_LONG 2
>
> Redundant with TCGMemOps: MO_8, MO_16, MO_32?

OK. Converted.

> > +++ b/target/rx/insns.decode
>
> Should have a copyright + license notice.
>
> > +BCnd_b          0010 cd:4 dsp:8 &bcnd
> ...
> > +#BRA_b          0010 1110 dsp:8 # overlaps BCnd_b
>
> FYI, using pattern groups this can be written
>
>   {
>     BRA_b   0010 1110 dsp:8
>     Bcnd_b  0010 cd:4 dsp:8
>   }
>
> I expect to have pattern groups merged this week.

OK. I will update it to use pattern groups.

> > +static void rx_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
> > +{
> > +    DisasContext *ctx = container_of(dcbase, DisasContext, base);
> > +
> > +    ctx->src = tcg_temp_new();
> > +}
>
> This allocation will not survive across a branch or label.
> So, after any SHIFTR_REG, for instance.
>
> I think you need to allocate this on demand each insn, and
> free it as necessary after each insn.
>
> (Although avoiding branches in tcg is always helpful.)

OK. Removed.
> > +/* generic load / store wrapper */
> > +static inline void rx_gen_ldst(unsigned int size, unsigned int dir,
> > +                               TCGv reg, TCGv mem)
> > +{
> > +    if (dir) {
> > +        tcg_gen_qemu_ld_i32(reg, mem, 0, size | MO_SIGN | MO_TE);
> > +    } else {
> > +        tcg_gen_qemu_st_i32(reg, mem, 0, size | MO_TE);
> > +    }
> > +}
>
> It would probably be worthwhile to split this function, and drop the "dir"
> parameter.  It is always a constant, so instead of
>
>     rx_gen_ldst(mi, RX_MEMORY_LD, ctx->src, ctx->src);
>     rx_gen_ldst(a->sz, RX_MEMORY_ST, cpu_regs[a->rs], ctx->src);
>
> use the shorter
>
>     rx_gen_ld(mi, ctx->src, ctx->src);
>     rx_gen_st(a->sz, cpu_regs[a->rs], ctx->src);

OK. Fixed.

> > +/* mov.l #uimm4,rd */
> > +/* mov.l #uimm8,rd */
> > +static bool trans_MOV_ri(DisasContext *ctx, arg_MOV_ri *a)
> > +{
> > +    tcg_gen_movi_i32(cpu_regs[a->rd], a->imm & 0xff);
> > +    return true;
> > +}
>
> a->imm will already have been masked by the decode.
> You can drop the & 0xff here.

You are right. It seems I had misunderstood that.

> > +/* mov.l #imm,rd */
> > +static bool trans_MOV_rli(DisasContext *ctx, arg_MOV_rli *a)
> > +{
> > +    tcg_gen_movi_i32(cpu_regs[a->rd], a->imm);
> > +    return true;
> > +}
>
> As written, this function is redundant.  We should be using the same MOV_ri,
> with the immediate loaded from %b2_li_2.
>
> Likewise for trans_MOV_mi vs trans_MOV_mli.  That said...

OK. Unified the various integer instructions.

> > +static bool trans_MOV_mli(DisasContext *ctx, arg_MOV_mli *a)
> > +{
> > +    TCGv imm = tcg_const_i32(a->imm);
> > +    if (a->ld == 2) {
> > +        a->dsp = bswap_16(a->dsp);
> > +    }
>
> This suggests that the decode is incorrect.  Or perhaps the construction of the
> 32-bit insn passed into decode.  In decode_load_bytes, we construct a
> big-endian value, so it would seem this dsp field should be loaded as a
> little-endian value.
> This could be fixed by not attempting to load the LI constant in decodetree
> (for this insn), which would in turn not require that you decode the LD operand
> by hand in decodetree.  E.g.
>
> static bool trans_MOV_mli(DisasContext *ctx, arg_MOV_mli *a)
> {
>     TCGv imm;
>
>     /* dsp operand comes before immediate operand */
>     rx_index_addr(a->ld, a->sz, a->rd, s);
>     imm = tcg_const_i32(li(s, a->li));
>     rx_gen_st(a->sz, imm, ctx->src);
>     tcg_temp_free_i32(imm);
>     return true;
> }
>
> This will be easiest if you ever support the big-endian version of RX.

OK. Fixed in another way. RX big-endian mode applies only to data access,
so operand values are always in little-endian order.

> > +/* push rs */
> > +static bool trans_PUSH_r(DisasContext *ctx, arg_PUSH_r *a)
> > +{
> > +    if (a->rs != 0) {
> > +        tcg_gen_subi_i32(cpu_regs[0], cpu_regs[0], 4);
> > +        rx_gen_ldst(a->sz, RX_MEMORY_ST, cpu_regs[a->rs], cpu_regs[0]);
> > +    } else {
> > +        tcg_gen_mov_i32(ctx->src, cpu_regs[a->rs]);
> > +        tcg_gen_subi_i32(cpu_regs[0], cpu_regs[0], 4);
> > +        rx_gen_ldst(a->sz, RX_MEMORY_ST, ctx->src, cpu_regs[0]);
> > +    }
> > +    return true;
>
> As far as I can see the THEN and ELSE cases have identical operation.

It is a little different. In the case of r0, the value before decrementing
is stored to memory. I added a comment.

> > +static bool trans_XCHG_rl(DisasContext *ctx, arg_XCHG_rl *a)
> > +{
> > +    int sz;
> > +    TCGv tmp;
> > +    tmp = tcg_temp_new();
> > +    if (a->ld == 3) {
> > +        /* xchg rs,rd */
> > +        tcg_gen_mov_i32(tmp, cpu_regs[a->rs]);
> > +        tcg_gen_mov_i32(cpu_regs[a->rs], cpu_regs[a->rd]);
> > +        tcg_gen_mov_i32(cpu_regs[a->rd], tmp);
> > +    } else {
> > +        switch (a->mi) {
> > +        case 0 ... 2:
> > +            rx_index_addr(a->ld, a->mi, a->rs, ctx);
> > +            rx_gen_ldst(a->mi, RX_MEMORY_LD, tmp, ctx->src);
> > +            sz = a->mi;
> > +            break;
> > +        case 3:
> > +            rx_index_addr(a->ld, RX_MEMORY_WORD, a->rs, ctx);
> > +            rx_gen_ldu(RX_MEMORY_WORD, tmp, ctx->src);
> > +            sz = RX_MEMORY_WORD;
> > +            break;
> > +        case 4:
> > +            rx_index_addr(a->ld, RX_MEMORY_BYTE, a->rs, ctx);
> > +            rx_gen_ldu(RX_MEMORY_BYTE, tmp, ctx->src);
> > +            sz = RX_MEMORY_BYTE;
> > +            break;
> > +        }
> > +        rx_gen_ldst(sz, RX_MEMORY_ST, cpu_regs[a->rd], ctx->src);
> > +        tcg_gen_mov_i32(cpu_regs[a->rd], tmp);
> > +    }
>
> While I doubt that there is an SMP RX cpu, I think this should still implement
> an atomic xchg operation.  This will allow rx-linux-user to run on SMP host
> cpus.  Thus use
>
>     static inline TCGMemOp mi_to_mop(unsigned mi)
>     {
>         static const TCGMemOp mop[5] = { MO_SB, MO_SW, MO_UL, MO_UH, MO_UB };
>         tcg_debug_assert(mi < 5);
>         return mop[mi];
>     }
>
>     tcg_gen_atomic_xchg_i32(cpu_regs[a->rd], ctx->src, cpu_regs[a->rd],
>                             0, mi_to_mop(a->mi));

OK.

> > +#define STCOND(cond) \
> > +    do { \
> > +        tcg_gen_movi_i32(ctx->imm, a->imm); \
> > +        tcg_gen_movi_i32(ctx->one, 1); \
> > +        tcg_gen_movcond_i32(cond, cpu_regs[a->rd], cpu_psw_z, \
> > +                            ctx->one, ctx->imm, cpu_regs[a->rd]); \
> > +        return true; \
> > +    } while (0)
>
> Unused?  This seems close to what you wanted instead of...

Yes. Removed.

> > +#define STZFN(maskop) \
> > +    do { \
> > +        TCGv mask, imm; \
> > +        mask = tcg_temp_new(); \
> > +        imm = tcg_temp_new(); \
> > +        maskop; \
> > +        tcg_gen_andi_i32(imm, mask, a->imm); \
> > +        tcg_gen_andc_i32(cpu_regs[a->rd], cpu_regs[a->rd], mask); \
> > +        tcg_gen_or_i32(cpu_regs[a->rd], cpu_regs[a->rd], imm); \
> > +        tcg_temp_free(mask); \
> > +        tcg_temp_free(imm); \
> > +        return true; \
> > +    } while (0)
>
> ... this.
>     TCGv zero = tcg_const_i32(0);
>     TCGv imm = tcg_const_i32(a->imm);
>     tcg_gen_movcond(COND, cpu_regs[a->rd], cpu_psw_z, zero,
>                     imm, cpu_regs[a->rd]);
>
> where trans_STZ passes TCG_COND_NE and STNZ passes TCG_COND_EQ.
>
> In addition, there's no point in using a #define when a function would work
> just as well.

OK. Fixed.

> > +#define GEN_LOGIC_OP(opr, ret, arg1, arg2) \
> > +    do { \
> > +        opr(ret, arg1, arg2); \
> > +        tcg_gen_setcondi_i32(TCG_COND_EQ, cpu_psw_z, ret, 0); \
> > +        tcg_gen_setcondi_i32(TCG_COND_GEU, cpu_psw_s, ret, 0x80000000UL); \
> > +    } while (0)
>
> Note that the second is the same as
>
>     tcg_gen_setcondi_i32(TCG_COND_LT, cpu_psw_s, ret, 0);
> or
>     tcg_gen_shri_i32(cpu_psw_s, ret, 31);
>
> That said, this extra computation is exactly why ARM (and others) represent the
> flag differently in the field.  The inline setting would be
>
>     tcg_gen_mov_i32(cpu_psw_z, ret);
>     tcg_gen_mov_i32(cpu_psw_s, ret);
>
> And then whenever we *use* these flags, we only look at the sign bit of S, and
> we see if the whole of Z is zero/non-zero.  Similarly, the overflow flag is
> also stored in the sign bit of O, so that is computed more easily:
>
>     O = (ret ^ in1) & ~(in1 ^ in2)
>
> with no right-shift required at the end.
>
> This also means that we've reduced the amount of inline computation to the
> point that there is no point in *not* doing the computation every time it is
> required, and so update_psw_o and cpu->psw_v[] goes away.

OK. I reworked the psw operations.

> Again, no reason for a macro when a function will do.
> They are easier to debug.

OK.

> > +    rs = (a->rs2 < 16) ? a->rs2 : a->rd;
>
> Is this due to
>
> > @b2_rds_uimm4   .... .... imm:4 rd:4   &rri rs2=255 len=2
>
> this?  A better decode is
>
>   @b2_rds_uimm4   .... .... imm:4 rd:4   &rri rs2=%b2_r_0 len=2
>
> i.e. extract the rd field twice.  Once directly into rd and again from the same
> bits into rs2.

OK. It works fine.
> > +/* revw rs, rd */
> > +static bool trans_REVW(DisasContext *ctx, arg_REVW *a)
> > +{
> > +    TCGv hi, lo;
> > +
> > +    hi = tcg_temp_new();
> > +    lo = tcg_temp_new();
> > +    tcg_gen_shri_i32(hi, cpu_regs[a->rs], 16);
> > +    tcg_gen_bswap16_i32(hi, hi);
> > +    tcg_gen_shli_i32(hi, hi, 16);
> > +    tcg_gen_bswap16_i32(lo, cpu_regs[a->rs]);
> > +    tcg_gen_or_i32(cpu_regs[a->rd], hi, lo);
> > +    tcg_temp_free(hi);
> > +    tcg_temp_free(lo);
> > +    return true;
> > +}
>
> The bswap16 primitive requires the input to be zero-extended.
> Thus this second bswap16 may fail.  Perhaps better as
>
>     T0 = 0x00ff00ff;
>     T1 = RS & T0;
>     T2 = RS >> 8;
>     T1 = T1 << 8;
>     T2 = T2 & T0;
>     RD = T1 | T2;

OK.

> > +/* conditional branch helper */
> > +static void rx_bcnd_main(DisasContext *ctx, int cd, int dst, int len)
> > +{
> > +    TCGv t, f, cond;
> > +    switch (cd) {
> > +    case 0 ... 13:
>
> Indentation is off?

checkpatch.pl said "switch and case should be at the same indent".

> > +static bool trans_BCnd_s(DisasContext *ctx, arg_BCnd_s *a)
> > +{
> > +    if (a->dsp < 3)
> > +        a->dsp += 8;
>
> This bit should probably be in the decode, via !function.

OK.

> > +static int16_t rev16(uint16_t dsp)
> > +{
> > +    return ((dsp << 8) & 0xff00) | ((dsp >> 8) & 0x00ff);
> > +}
> > +
> > +static int32_t rev24(uint32_t dsp)
> > +{
> > +    dsp = ((dsp << 16) & 0xff0000) |
> > +          (dsp & 0x00ff00) |
> > +          ((dsp >> 16) & 0x0000ff);
> > +    dsp |= (dsp & 0x00800000) ? 0xff000000 : 0x00000000;
> > +    return dsp;
> > +}
>
> These could also be in the decode, probably with %some_field.
> But as with MOV above, maybe ought to be done via an explicit
> call to li(), in order to get a (potential) big-endian RX to work.

This field is always little-endian, so these are needed.

> Also, watch out for CODING_STYLE violations.
> Please recheck everything with ./scripts/checkpatch.pl.

OK.
> > +/* mvtachi rs */
> > +static bool trans_MVTACHI(DisasContext *ctx, arg_MVTACHI *a)
> > +{
> > +    TCGv_i32 hi, lo;
> > +    hi = tcg_temp_new_i32();
> > +    lo = tcg_temp_new_i32();
> > +    tcg_gen_extr_i64_i32(lo, hi, cpu_acc);
> > +    tcg_gen_concat_i32_i64(cpu_acc, lo, cpu_regs[a->rs]);
> > +    tcg_temp_free_i32(hi);
> > +    tcg_temp_free_i32(lo);
> > +    return true;
> > +}
>
> Hmm.  How about
>
>     TCGv_i64 rs64 = tcg_temp_new_i64();
>     tcg_gen_extu_i32_i64(rs64, cpu_regs[a->rs]);
>     tcg_gen_deposit_i64(cpu_acc, cpu_acc, rs64, 32, 32);

OK. Fixed.

> > +static inline void clrsetpsw(int dst, int mode)
> > +{
> > +    TCGv psw[] = {
> > +        cpu_psw_c, cpu_psw_z, cpu_psw_s, cpu_psw_o,
> > +        NULL, NULL, NULL, NULL,
> > +        cpu_psw_i, cpu_psw_u, NULL, NULL,
> > +        NULL, NULL, NULL, NULL
> > +    };
> > +    TCGLabel *skip;
> > +
> > +    skip = gen_new_label();
> > +    if (dst >= 8)
> > +        tcg_gen_brcondi_i32(TCG_COND_NE, cpu_psw_pm, 0, skip);
> > +    tcg_gen_movi_i32(psw[dst], mode);
> > +    if (dst == 3)
> > +        tcg_gen_movi_i32(cpu_pswop, RX_PSW_OP_NONE);
> > +    gen_set_label(skip);
> > +}
>
> When clearing psw_i, you'll need to exit to the main loop so that any pending
> interrupts are recognized.  Otherwise you risk going around in a loop and never
> delivering the interrupt at all.

OK.

> > +/* rtfi */
> > +static bool trans_RTFI(DisasContext *ctx, arg_RTFI *a)
> > +{
> > +    check_previleged();
> > +    tcg_gen_mov_i32(cpu_pc, cpu_bpc);
> > +    tcg_gen_mov_i32(cpu_psw, cpu_bpsw);
> > +    gen_helper_unpack_psw(cpu_env);
> > +    ctx->base.is_jmp = DISAS_JUMP;
> > +    return true;
> > +}
>
> Likewise, you need to exit to the main loop, because psw_i changed.

OK. All instructions that change psw_i now exit to the main loop.

> > +    cpu_pswop = tcg_global_mem_new_i32(cpu_env,
> > +                                       offsetof(CPURXState, psw_op), "");
> > +    for (i = 0; i < 3; i++) {
> > +        cpu_pswop_v[i] = tcg_global_mem_new_i32(cpu_env,
> > +                                                offsetof(CPURXState, psw_v[i]), "");
>
> Using the empty string for the name of a temp is cruel.
> That'll be very hard to debug.

OK. Removed this variable.

> > +uint32_t psw_cond(CPURXState *env, uint32_t cond)
>
> In general this can be inlined fairly easily.
> See, for instance, target/arm/translate.c, arm_test_cc.

OK. That was very helpful. I reworked the psw operations.

> > +uint32_t helper_div(CPURXState *env, uint32_t num, uint32_t den)
> > +{
> > +    uint32_t ret = num;
> > +    if (den != 0) {
> > +        ret = (int32_t)num / (int32_t)den;
> > +    }
> > +    env->psw_o = (ret == 0 || den == 0);
>
> That is not the correct overflow condition.
> ret == 0 is irrelevant, but INT_MIN / -1 is relevant.

Yes, that was my mistake. Fixed to set psw_o for den == 0 and
for INT_MIN / -1.

> Finally, this was way too big to review properly.  This patch set will need to
> be broken up into smaller pieces.  Perhaps introducing just one, or a few
> related instructions with each patch.

Thanks. I have reworked the code based on your comments and will send a
v4 patch.

> r~

-- 
Yosinori Sato