On 4/9/24 06:43, Paolo Bonzini wrote:
+static void gen_ARPL(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+ TCGLabel *label1 = gen_new_label();
+ TCGv rpl_adj = tcg_temp_new();
+ TCGv flags = tcg_temp_new();
+
+ gen_mov_eflags(s, flags);
+ tcg_gen_andi_tl(flags, flags, ~CC_Z);
+
+ /* Compute dest[rpl] - src[rpl], adjust if result <0. */
+ tcg_gen_andi_tl(rpl_adj, s->T0, 3);
+ tcg_gen_andi_tl(s->T1, s->T1, 3);
+ tcg_gen_sub_tl(rpl_adj, rpl_adj, s->T1);
+
+ tcg_gen_brcondi_tl(TCG_COND_LT, rpl_adj, 0, label1);
Comment is right, but branch condition is wrong.
I think this might be better as:
/* SRC = DST with SRC[RPL] */
tcg_gen_deposit_tl(s->T1, s->T0, s->T1, 0, 2);
/* Z flag set if DST < SRC */
tcg_gen_setcond_tl(TCG_COND_LTU, tmp, s->T0, s->T1);
/* Install Z */
tcg_gen_deposit_tl(flags, flags, tmp, ctz(CC_Z), 1);
/* DST with maximum RPL */
tcg_gen_umax_tl(s->T0, s->T0, s->T1);
+ case MO_32:
+#ifdef TARGET_X86_64
+ /*
+ * This could also use the same algorithm as MO_16. It produces fewer
+ * TCG ops and better code if flags are needed, but it requires a
64-bit
+ * multiply even if they are not (and thus the high part of the
multiply
+ * is dead).
+ */
Is 64-bit multiply ever slower these days?
My intuition says "slow" multiply is at least a decade out of date.
+ tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
+ tcg_gen_trunc_tl_i32(s->tmp3_i32, s->T1);
Avoid s->tmp*, especially in new code.
+ tcg_gen_muls2_i32(s->tmp2_i32, s->tmp3_i32,
+ s->tmp2_i32, s->tmp3_i32);
+ tcg_gen_extu_i32_tl(s->T0, s->tmp2_i32);
+
+ cc_src_rhs = tcg_temp_new();
+ tcg_gen_extu_i32_tl(cc_src_rhs, s->tmp3_i32);
+ /* Compare the high part to the sign bit of the truncated result */
+ tcg_gen_negsetcondi_i32(TCG_COND_LT, s->tmp2_i32, s->tmp2_i32, 0);
This seems like something the optimizer should handle, but doesn't.
I'd write this as
tcg_gen_sari_i32(tmp, tmp, 31);
or
tcg_gen_sextract_i32(tmp, tmp, 31, 1);
which I know will expand to the same thing.
+ case MO_64:
+#endif
+ cc_src_rhs = tcg_temp_new();
+ tcg_gen_muls2_tl(s->T0, cc_src_rhs, s->T0, s->T1);
+ /* Compare the high part to the sign bit of the truncated result */
+ tcg_gen_negsetcondi_tl(TCG_COND_LT, s->T1, s->T0, 0);
Similarly.
r~