On Fri, 08 Mar 2019 10:24:23 +0900, Richard Henderson wrote:
> On 3/1/19 10:21 PM, Yoshinori Sato wrote:
> > My git repository is below.
> > git://git.pf.osdn.net/gitroot/y/ys/ysato/qemu.git
>
> Somehow patch 1 did not arrive, so I am reviewing based on
> rebasing this branch against master, and then looking at the diff.
>
> > +struct CCop;
>
> Unused?
Yes, I forgot to remove it. Removed.

> > +static inline void cpu_get_tb_cpu_state(CPURXState *env, target_ulong *pc,
> > +                                        target_ulong *cs_base, uint32_t *flags)
> > +{
> > +    *pc = env->pc;
> > +    *cs_base = 0;
> > +    *flags = 0;
> > +}
>
> *flags should contain PSW[PM], I think, so that knowledge of the
> privilege level is given to the translator.
>
> Looking forward I see that you're currently testing cpu_psw_pm dynamically for
> instructions that require privs, so what you have is not wrong, but this is
> exactly the sort of thing that TB flags are for.

OK. I will copy PSW.PM into DisasContextBase.flags.

> > +#define RX_PSW_OP_NEG 4
>
> Unused?

Yes. Removed.

> > +#define RX_BYTE 0
> > +#define RX_WORD 1
> > +#define RX_LONG 2
>
> Redundant with TCGMemOps: MO_8, MO_16, MO_32?

OK. Converted.

> > +++ b/target/rx/insns.decode
>
> Should have a copyright + license notice.
>
> > +BCnd_b          0010 cd:4 dsp:8 &bcnd
> ...
> > +#BRA_b          0010 1110 dsp:8 # overlaps BCnd_b
>
> FYI, using pattern groups this can be written
>
>   {
>     BRA_b   0010 1110 dsp:8
>     Bcnd_b  0010 cd:4 dsp:8
>   }
>
> I expect to have pattern groups merged this week.

OK. I will update it to use pattern groups.

> > +static void rx_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
> > +{
> > +    DisasContext *ctx = container_of(dcbase, DisasContext, base);
> > +
> > +    ctx->src = tcg_temp_new();
> > +}
>
> This allocation will not survive across a branch or label.
> So, after any SHIFTR_REG, for instance.
>
> I think you need to allocate this on demand each insn, and
> free it as necessary after each insn.
>
> (Although avoiding branches in tcg is always helpful.)

OK. Removed.
> > +/* generic load / store wrapper */
> > +static inline void rx_gen_ldst(unsigned int size, unsigned int dir,
> > +                               TCGv reg, TCGv mem)
> > +{
> > +    if (dir) {
> > +        tcg_gen_qemu_ld_i32(reg, mem, 0, size | MO_SIGN | MO_TE);
> > +    } else {
> > +        tcg_gen_qemu_st_i32(reg, mem, 0, size | MO_TE);
> > +    }
> > +}
>
> It would probably be worthwhile to split this function, and drop the "dir"
> parameter.  It is always a constant, so instead of
>
>     rx_gen_ldst(mi, RX_MEMORY_LD, ctx->src, ctx->src);
>     rx_gen_ldst(a->sz, RX_MEMORY_ST, cpu_regs[a->rs], ctx->src);
>
> use the shorter
>
>     rx_gen_ld(mi, ctx->src, ctx->src);
>     rx_gen_st(a->sz, cpu_regs[a->rs], ctx->src);

OK. Fixed.

> > +/* mov.l #uimm4,rd */
> > +/* mov.l #uimm8,rd */
> > +static bool trans_MOV_ri(DisasContext *ctx, arg_MOV_ri *a)
> > +{
> > +    tcg_gen_movi_i32(cpu_regs[a->rd], a->imm & 0xff);
> > +    return true;
> > +}
>
> a->imm will already have been masked by the decode.
> You can drop the & 0xff here.

You are right. It seems I had misunderstood that.

> > +/* mov.l #imm,rd */
> > +static bool trans_MOV_rli(DisasContext *ctx, arg_MOV_rli *a)
> > +{
> > +    tcg_gen_movi_i32(cpu_regs[a->rd], a->imm);
> > +    return true;
> > +}
>
> As written, this function is redundant.  We should be using the same MOV_ri,
> with the immediate loaded from %b2_li_2.
>
> Likewise for trans_MOV_mi vs trans_MOV_mli.  That said...

OK. Unified the various integer instructions.

> > +static bool trans_MOV_mli(DisasContext *ctx, arg_MOV_mli *a)
> > +{
> > +    TCGv imm = tcg_const_i32(a->imm);
> > +    if (a->ld == 2) {
> > +        a->dsp = bswap_16(a->dsp);
> > +    }
>
> This suggests that the decode is incorrect.  Or perhaps the construction of the
> 32-bit insn passed into decode.  In decode_load_bytes, we construct a
> big-endian value, so it would seem this dsp field should be loaded as a
> little-endian value.
> This could be fixed by not attempting to load the LI constant in decodetree
> (for this insn), which would in turn not require that you decode the LD operand
> by hand in decodetree.  E.g.
>
> static bool trans_MOV_mli(DisasContext *ctx, arg_MOV_mli *a)
> {
>     TCGv imm;
>
>     /* dsp operand comes before immediate operand */
>     rx_index_addr(a->ld, a->sz, a->rd, s);
>     imm = tcg_const_i32(li(s, a->li));
>     rx_gen_st(a->sz, imm, ctx->src);
>     tcg_temp_free_i32(imm);
>     return true;
> }
>
> This will be easiest if you ever support the big-endian version of RX.

OK. Fixed in another way. RX big-endian mode applies only to data access,
so operand values are always in little-endian order.

> > +/* push rs */
> > +static bool trans_PUSH_r(DisasContext *ctx, arg_PUSH_r *a)
> > +{
> > +    if (a->rs != 0) {
> > +        tcg_gen_subi_i32(cpu_regs[0], cpu_regs[0], 4);
> > +        rx_gen_ldst(a->sz, RX_MEMORY_ST, cpu_regs[a->rs], cpu_regs[0]);
> > +    } else {
> > +        tcg_gen_mov_i32(ctx->src, cpu_regs[a->rs]);
> > +        tcg_gen_subi_i32(cpu_regs[0], cpu_regs[0], 4);
> > +        rx_gen_ldst(a->sz, RX_MEMORY_ST, ctx->src, cpu_regs[0]);
> > +    }
> > +    return true;
>
> As far as I can see the THEN and ELSE cases have identical operation.

It is a little different. In the case of r0, the value before decrementing
is stored to memory. I added a comment.

> > +static bool trans_XCHG_rl(DisasContext *ctx, arg_XCHG_rl *a)
> > +{
> > +    int sz;
> > +    TCGv tmp;
> > +    tmp = tcg_temp_new();
> > +    if (a->ld == 3) {
> > +        /* xchg rs,rd */
> > +        tcg_gen_mov_i32(tmp, cpu_regs[a->rs]);
> > +        tcg_gen_mov_i32(cpu_regs[a->rs], cpu_regs[a->rd]);
> > +        tcg_gen_mov_i32(cpu_regs[a->rd], tmp);
> > +    } else {
> > +        switch (a->mi) {
> > +        case 0 ... 2:
> > +            rx_index_addr(a->ld, a->mi, a->rs, ctx);
> > +            rx_gen_ldst(a->mi, RX_MEMORY_LD, tmp, ctx->src);
> > +            sz = a->mi;
> > +            break;
> > +        case 3:
> > +            rx_index_addr(a->ld, RX_MEMORY_WORD, a->rs, ctx);
> > +            rx_gen_ldu(RX_MEMORY_WORD, tmp, ctx->src);
> > +            sz = RX_MEMORY_WORD;
> > +            break;
> > +        case 4:
> > +            rx_index_addr(a->ld, RX_MEMORY_BYTE, a->rs, ctx);
> > +            rx_gen_ldu(RX_MEMORY_BYTE, tmp, ctx->src);
> > +            sz = RX_MEMORY_BYTE;
> > +            break;
> > +        }
> > +        rx_gen_ldst(sz, RX_MEMORY_ST, cpu_regs[a->rd], ctx->src);
> > +        tcg_gen_mov_i32(cpu_regs[a->rd], tmp);
> > +    }
>
> While I doubt that there is an SMP RX cpu, I think this should still implement
> an atomic xchg operation.  This will allow rx-linux-user to run on SMP host
> cpus.  Thus use
>
>     static inline TCGMemOp mi_to_mop(unsigned mi)
>     {
>         static const TCGMemOp mop[5] = { MO_SB, MO_SW, MO_UL, MO_UH, MO_UB };
>         tcg_debug_assert(mi < 5);
>         return mop[mi];
>     }
>
>     tcg_gen_atomic_xchg_i32(cpu_regs[a->rd], ctx->src, cpu_regs[a->rd],
>                             0, mi_to_mop(a->mi));

OK.

> > +#define STCOND(cond) \
> > +    do { \
> > +        tcg_gen_movi_i32(ctx->imm, a->imm); \
> > +        tcg_gen_movi_i32(ctx->one, 1); \
> > +        tcg_gen_movcond_i32(cond, cpu_regs[a->rd], cpu_psw_z, \
> > +                            ctx->one, ctx->imm, cpu_regs[a->rd]); \
> > +        return true; \
> > +    } while (0)
>
> Unused?  This seems close to what you wanted instead of...

Yes. Removed.

> > +#define STZFN(maskop) \
> > +    do { \
> > +        TCGv mask, imm; \
> > +        mask = tcg_temp_new(); \
> > +        imm = tcg_temp_new(); \
> > +        maskop; \
> > +        tcg_gen_andi_i32(imm, mask, a->imm); \
> > +        tcg_gen_andc_i32(cpu_regs[a->rd], cpu_regs[a->rd], mask); \
> > +        tcg_gen_or_i32(cpu_regs[a->rd], cpu_regs[a->rd], imm); \
> > +        tcg_temp_free(mask); \
> > +        tcg_temp_free(imm); \
> > +        return true; \
> > +    } while (0)
>
> ... this.
>     TCGv zero = tcg_const_i32(0);
>     TCGv imm = tcg_const_i32(a->imm);
>     tcg_gen_movcond(COND, cpu_regs[a->rd], cpu_psw_z, zero,
>                     imm, cpu_regs[a->rd]);
>
> where trans_STZ passes TCG_COND_NE and STNZ passes TCG_COND_EQ.
>
> In addition, there's no point in using a #define when a function would work
> just as well.

OK. Fixed.

> > +#define GEN_LOGIC_OP(opr, ret, arg1, arg2) \
> > +    do { \
> > +        opr(ret, arg1, arg2); \
> > +        tcg_gen_setcondi_i32(TCG_COND_EQ, cpu_psw_z, ret, 0); \
> > +        tcg_gen_setcondi_i32(TCG_COND_GEU, cpu_psw_s, ret, 0x80000000UL); \
> > +    } while (0)
>
> Note that the second is the same as
>
>     tcg_gen_setcondi_i32(TCG_COND_LT, cpu_psw_s, ret, 0);
> or
>     tcg_gen_shri_i32(cpu_psw_s, ret, 31);
>
> That said, this extra computation is exactly why ARM (and others) represent the
> flag differently in the field.  The inline setting would be
>
>     tcg_gen_mov_i32(cpu_psw_z, ret);
>     tcg_gen_mov_i32(cpu_psw_s, ret);
>
> And then whenever we *use* these flags, we only look at the sign bit of S, and
> we see if the whole of Z is zero/non-zero.  Similarly, the overflow flag is
> also stored in the sign bit of O, so that is computed more easily:
>
>     O = (ret ^ in1) & ~(in1 ^ in2)
>
> with no right-shift required at the end.
>
> This also means that we've reduced the amount of inline computation to the
> point that there is no point in *not* doing the computation every time it is
> required, and so update_psw_o and cpu->psw_v[] goes away.

OK. I reworked the psw operations.

> Again, no reason for a macro when a function will do.
> They are easier to debug.

OK.

> > +    rs = (a->rs2 < 16) ? a->rs2 : a->rd;
>
> Is this due to
>
> > @b2_rds_uimm4   .... .... imm:4 rd:4   &rri rs2=255 len=2
>
> this?  A better decode is
>
>   @b2_rds_uimm4   .... .... imm:4 rd:4   &rri rs2=%b2_r_0 len=2
>
> i.e. extract the rd field twice.  Once directly into rd and again from the same
> bits into rs2.

OK. It works fine.
> > +/* revw rs, rd */
> > +static bool trans_REVW(DisasContext *ctx, arg_REVW *a)
> > +{
> > +    TCGv hi, lo;
> > +
> > +    hi = tcg_temp_new();
> > +    lo = tcg_temp_new();
> > +    tcg_gen_shri_i32(hi, cpu_regs[a->rs], 16);
> > +    tcg_gen_bswap16_i32(hi, hi);
> > +    tcg_gen_shli_i32(hi, hi, 16);
> > +    tcg_gen_bswap16_i32(lo, cpu_regs[a->rs]);
> > +    tcg_gen_or_i32(cpu_regs[a->rd], hi, lo);
> > +    tcg_temp_free(hi);
> > +    tcg_temp_free(lo);
> > +    return true;
> > +}
>
> The bswap16 primitive requires the input to be zero-extended.
> Thus this second bswap16 may fail.  Perhaps better as
>
>     T0 = 0x00ff00ff;
>     T1 = RS & T0;
>     T2 = RS >> 8;
>     T1 = T1 << 8;
>     T2 = T2 & T0;
>     RD = T1 | T2;

OK.

> > +/* conditional branch helper */
> > +static void rx_bcnd_main(DisasContext *ctx, int cd, int dst, int len)
> > +{
> > +    TCGv t, f, cond;
> > +    switch (cd) {
> > +    case 0 ... 13:
>
> Indentation is off?

checkpatch.pl said "switch and case should be at the same indent".

> > +static bool trans_BCnd_s(DisasContext *ctx, arg_BCnd_s *a)
> > +{
> > +    if (a->dsp < 3)
> > +        a->dsp += 8;
>
> This bit should probably be in the decode, via !function.

OK.

> > +static int16_t rev16(uint16_t dsp)
> > +{
> > +    return ((dsp << 8) & 0xff00) | ((dsp >> 8) & 0x00ff);
> > +}
> > +
> > +static int32_t rev24(uint32_t dsp)
> > +{
> > +    dsp = ((dsp << 16) & 0xff0000) |
> > +          (dsp & 0x00ff00) |
> > +          ((dsp >> 16) & 0x0000ff);
> > +    dsp |= (dsp & 0x00800000) ? 0xff000000 : 0x00000000;
> > +    return dsp;
> > +}
>
> These could also be in the decode, probably with %some_field.
> But as with MOV above, maybe ought to be done via an explicit
> call to li(), in order to get a (potential) big-endian RX to work.

This field is always little-endian, so these are needed.

> Also, watch out for CODING_STYLE violations.
> Please recheck everything with ./scripts/checkpatch.pl.

OK.
> > +/* mvtachi rs */
> > +static bool trans_MVTACHI(DisasContext *ctx, arg_MVTACHI *a)
> > +{
> > +    TCGv_i32 hi, lo;
> > +    hi = tcg_temp_new_i32();
> > +    lo = tcg_temp_new_i32();
> > +    tcg_gen_extr_i64_i32(lo, hi, cpu_acc);
> > +    tcg_gen_concat_i32_i64(cpu_acc, lo, cpu_regs[a->rs]);
> > +    tcg_temp_free_i32(hi);
> > +    tcg_temp_free_i32(lo);
> > +    return true;
> > +}
>
> Hmm.  How about
>
>     TCGv_i64 rs64 = tcg_temp_new_i64();
>     tcg_gen_extu_i32_i64(rs64, cpu_regs[a->rs]);
>     tcg_gen_deposit_i64(cpu_acc, cpu_acc, rs64, 32, 32);

OK. Fixed.

> > +static inline void clrsetpsw(int dst, int mode)
> > +{
> > +    TCGv psw[] = {
> > +        cpu_psw_c, cpu_psw_z, cpu_psw_s, cpu_psw_o,
> > +        NULL, NULL, NULL, NULL,
> > +        cpu_psw_i, cpu_psw_u, NULL, NULL,
> > +        NULL, NULL, NULL, NULL
> > +    };
> > +    TCGLabel *skip;
> > +
> > +    skip = gen_new_label();
> > +    if (dst >= 8)
> > +        tcg_gen_brcondi_i32(TCG_COND_NE, cpu_psw_pm, 0, skip);
> > +    tcg_gen_movi_i32(psw[dst], mode);
> > +    if (dst == 3)
> > +        tcg_gen_movi_i32(cpu_pswop, RX_PSW_OP_NONE);
> > +    gen_set_label(skip);
> > +}
>
> When clearing psw_i, you'll need to exit to the main loop so that any pending
> interrupts are recognized.  Otherwise you risk going around in a loop and never
> delivering the interrupt at all.

OK.

> > +/* rtfi */
> > +static bool trans_RTFI(DisasContext *ctx, arg_RTFI *a)
> > +{
> > +    check_previleged();
> > +    tcg_gen_mov_i32(cpu_pc, cpu_bpc);
> > +    tcg_gen_mov_i32(cpu_psw, cpu_bpsw);
> > +    gen_helper_unpack_psw(cpu_env);
> > +    ctx->base.is_jmp = DISAS_JUMP;
> > +    return true;
> > +}
>
> Likewise, you need to exit to the main loop, because psw_i changed.

OK. All instructions that change psw_i now exit to the main loop.

> > +    cpu_pswop = tcg_global_mem_new_i32(cpu_env,
> > +                                       offsetof(CPURXState, psw_op), "");
> > +    for (i = 0; i < 3; i++) {
> > +        cpu_pswop_v[i] = tcg_global_mem_new_i32(cpu_env,
> > +                                                offsetof(CPURXState, psw_v[i]), "");
>
> Using the empty string for the name of a temp is cruel.
> That'll be very hard to debug.

OK. Removed this variable.

> > +uint32_t psw_cond(CPURXState *env, uint32_t cond)
>
> In general this can be inlined fairly easily.
> See, for instance, target/arm/translate.c, arm_test_cc.

OK. That was very helpful. I reworked the psw operations.

> > +uint32_t helper_div(CPURXState *env, uint32_t num, uint32_t den)
> > +{
> > +    uint32_t ret = num;
> > +    if (den != 0) {
> > +        ret = (int32_t)num / (int32_t)den;
> > +    }
> > +    env->psw_o = (ret == 0 || den == 0);
>
> That is not the correct overflow condition.
> ret == 0 is irrelevant, but INT_MIN / -1 is relevant.

Yes, that was my mistake. Fixed to set psw_o for den == 0 and
for INT_MIN / -1.

> Finally, this was way too big to review properly.  This patch set will need to
> be broken up into smaller pieces.  Perhaps introducing just one, or a few
> related instructions with each patch.

Thanks. I have reworked the code based on your comments and will send a
v4 patch.

> r~

-- 
Yosinori Sato