Re: [PATCH v11 06/26] target/loongarch: Add fixed point bit instruction translation
On 11/19/21 7:13 AM, Song Gao wrote:

+static bool gen_rr(DisasContext *ctx, arg_rr *a,
+                   DisasExtend src_ext, DisasExtend dst_ext,
+                   void (*func)(TCGv, TCGv))
+{
+    TCGv dest = gpr_dst(ctx, a->rd, dst_ext);
+    TCGv src1 = gpr_src(ctx, a->rj, src_ext);
+
+    func(dest, src1);
+
+    if (dst_ext) {
+        gen_set_gpr(a->rd, dest, dst_ext);
+    }

Again, I think you should call gen_set_gpr unconditionally.

+static bool trans_bytepick_w(DisasContext *ctx, arg_bytepick_w *a)
+{
+    TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
+    TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
+    TCGv src2 = gpr_src(ctx, a->rk, EXT_NONE);
+
+    tcg_gen_concat_tl_i64(dest, src1, src2);
+    tcg_gen_sextract_i64(dest, dest, (32 - (a->sa) * 8), 32);
+
+    return true;
+}

Better to use gen_rrr_sa.

+static bool trans_bytepick_d(DisasContext *ctx, arg_bytepick_d *a)
+{
+    TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
+    TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
+    TCGv src2 = gpr_src(ctx, a->rk, EXT_NONE);
+
+    tcg_gen_extract2_i64(dest, src1, src2, (64 - (a->sa) * 8));
+    return true;
+}

Likewise.

+static void gen_ctz_w(TCGv dest, TCGv src1)
+{
+    tcg_gen_ori_tl(dest, src1, (target_ulong)MAKE_64BIT_MASK(32, 32));
+    tcg_gen_ctzi_tl(dest, dest, 32);

This should be TARGET_LONG_BITS. It will never happen, because the value is not zero per the OR, but it's what is most efficient for a tcg backend that naturally produces TARGET_LONG_BITS for a TL-sized ctz.

+}
+
+static void gen_cto_w(TCGv dest, TCGv src1)
+{
+    tcg_gen_not_tl(dest, src1);
+    tcg_gen_ext32u_tl(dest, dest);
+    gen_ctz_w(dest, dest);
+}

The EXT32U here is useless, as the OR within gen_ctz_w overrides it.

+&rr_2bw rd rj msbw lsbw
+&rr_2bd rd rj msbd lsbd

Merge these.

r~
Re: [PATCH v11 07/26] target/loongarch: Add fixed point load/store instruction translation
On 11/19/21 7:13 AM, Song Gao wrote:

 DEF_HELPER_FLAGS_1(bitswap, TCG_CALL_NO_RWG_SE, tl, tl)
+
+DEF_HELPER_3(asrtle_d, void, env, tl, tl)
+DEF_HELPER_3(asrtgt_d, void, env, tl, tl)

Use DEF_HELPER_FLAGS_3 and TCG_CALL_NO_WG. (They do not write globals, but do implicitly read them via the exception path, and of course the exception is a side-effect.)

+static bool gen_load_gt(DisasContext *ctx, arg_rrr *a, MemOp mop)
+{
+    TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
+    TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
+    TCGv src2 = gpr_src(ctx, a->rk, EXT_NONE);
+    TCGv addr = tcg_temp_new();
+
+    gen_helper_asrtgt_d(cpu_env, src1, src2);
+    tcg_gen_add_tl(addr, src1, src2);
+    tcg_gen_qemu_ld_tl(dest, addr, ctx->mem_idx, mop);

This add is incorrect. The address is rj (src1). Likewise for the rest of the bound-check memory ops.

+static bool gen_ldptr(DisasContext *ctx, arg_rr_i *a, MemOp mop)
+{
+    TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
+    TCGv addr = gpr_src(ctx, a->rj, EXT_NONE);
+    TCGv temp = NULL;
+
+    if (a->imm) {
+        temp = tcg_temp_new();
+        tcg_gen_addi_tl(temp, addr, a->imm << 2);
+        addr = temp;
+    }

I think it would be cleaner to have this immediate shift handled by decodetree, so that gen_ldptr is dropped in favor of gen_load (and also for stores). It would also print the correct logical offset on the disassembly side.

%i14s2     10:s14 !function=shl_2
@rr_i14s2  .. rj:5 rd:5 &rr_i imm=%i14s2

+/* loongarch sync */
+static void gen_loongarch_sync(int stype)
+{
+    TCGBar tcg_mo = TCG_BAR_SC;
+
+    switch (stype) {
+    case 0x4: /* SYNC_WMB */
+        tcg_mo |= TCG_MO_ST_ST;
+        break;
+    case 0x10: /* SYNC_MB */
+        tcg_mo |= TCG_MO_ALL;
+        break;
+    case 0x11: /* SYNC_ACQUIRE */
+        tcg_mo |= TCG_MO_LD_LD | TCG_MO_LD_ST;
+        break;
+    case 0x12: /* SYNC_RELEASE */
+        tcg_mo |= TCG_MO_ST_ST | TCG_MO_LD_ST;
+        break;
+    case 0x13: /* SYNC_RMB */
+        tcg_mo |= TCG_MO_LD_LD;
+        break;
+    default:
+        tcg_mo |= TCG_MO_ALL;
+        break;
+    }
+
+    tcg_gen_mb(tcg_mo);
+}

This is copied from mips, I think. The only defined hint for dbar is 0.
I think this function should be removed, and just emit the tcg barrier directly within trans_dbar. r~
Re: [PATCH v11 08/26] target/loongarch: Add fixed point atomic instruction translation
On 11/19/21 7:13 AM, Song Gao wrote:

This includes:
- LL.{W/D}, SC.{W/D}
- AM{SWAP/ADD/AND/OR/XOR/MAX/MIN}[_DB].{W/D}
- AM{MAX/MIN}[_DB].{WU/DU}

Signed-off-by: Song Gao
Signed-off-by: Xiaojuan Yang
Reviewed-by: Richard Henderson
---
 target/loongarch/insn_trans/trans_atomic.c.inc | 130 +
 target/loongarch/insns.decode                  |  44 +
 target/loongarch/translate.c                   |   1 +
 3 files changed, 175 insertions(+)
 create mode 100644 target/loongarch/insn_trans/trans_atomic.c.inc

diff --git a/target/loongarch/insn_trans/trans_atomic.c.inc b/target/loongarch/insn_trans/trans_atomic.c.inc
new file mode 100644
index 000..96957bb
--- /dev/null
+++ b/target/loongarch/insn_trans/trans_atomic.c.inc
@@ -0,0 +1,130 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ */
+
+static bool gen_ll(DisasContext *ctx, arg_rr_i *a,
+                   void (*func)(TCGv, TCGv, int))
+{
+    TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
+    TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
+    TCGv t0 = tcg_temp_new();
+
+    tcg_gen_addi_tl(t0, src1, a->imm << 2);

The ll/sc instructions would of course use the same pre-shifted immediate as for ldptr/stptr, as discussed wrt the previous patch.

+static bool gen_am(DisasContext *ctx, arg_rrr *a,
+                   void (*func)(TCGv, TCGv, TCGv, TCGArg, MemOp),
+                   MemOp mop)
+{
+    TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
+    TCGv addr = gpr_src(ctx, a->rj, EXT_NONE);
+    TCGv val = gpr_src(ctx, a->rk, EXT_NONE);
+
+    if ((a->rd != 0) && ((a->rj == a->rd) || (a->rk == a->rd))) {

BTW, you don't need to overdo it with the parentheses.

    if (a != b && (c == d || e == f)) {

is fine.
+static bool gen_am_db(DisasContext *ctx, arg_rrr *a,
+                      void (*func)(TCGv, TCGv, TCGv, TCGArg, MemOp),
+                      MemOp mop)
+{
+    TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
+    TCGv addr = gpr_src(ctx, a->rj, EXT_NONE);
+    TCGv val = gpr_src(ctx, a->rk, EXT_NONE);
+
+    if ((a->rd != 0) && ((a->rj == a->rd) || (a->rk == a->rd))) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "Warning: source register overlaps destination register"
+                      "in atomic insn at pc=0x" TARGET_FMT_lx "\n",
+                      ctx->base.pc_next - 4);
+        return false;
+    }
+
+    gen_loongarch_sync(0x10);
+    func(dest, addr, val, ctx->mem_idx, mop);

All tcg atomic ops are sequentially consistent, so you don't need the extra sync, and thus this entire function.

r~
Re: [PATCH v11 09/26] target/loongarch: Add fixed point extra instruction translation
On 11/19/21 7:13 AM, Song Gao wrote:

+DEF_HELPER_3(crc32, tl, tl, tl, tl)
+DEF_HELPER_3(crc32c, tl, tl, tl, tl)
+DEF_HELPER_2(cpucfg, tl, env, tl)

DEF_HELPER_FLAGS_N, TCG_CALL_NO_RWG_SE.

+target_ulong helper_cpucfg(CPULoongArchState *env, target_ulong rj)
+{
+    return env->cpucfg[rj];
+}

The value of the source register should be bounded by ARRAY_SIZE(env->cpucfg); out-of-bounds indices read 0.

r~
Re: [PATCH v11 04/26] target/loongarch: Add fixed point arithmetic instruction translation
Hi Richard,

On 2021/11/20 3:17 PM, Richard Henderson wrote:

On 11/19/21 7:13 AM, Song Gao wrote:

+static bool gen_rrr(DisasContext *ctx, arg_rrr *a,
+                    DisasExtend src1_ext, DisasExtend src2_ext,
+                    DisasExtend dst_ext, void (*func)(TCGv, TCGv, TCGv))
+{
+    TCGv dest = gpr_dst(ctx, a->rd, dst_ext);
+    TCGv src1 = gpr_src(ctx, a->rj, src1_ext);
+    TCGv src2 = gpr_src(ctx, a->rk, src2_ext);
+
+    func(dest, src1, src2);
+
+    /* dst_ext is EXT_NONE and input is dest, We don't run gen_set_gpr. */
+    if (dst_ext) {
+        gen_set_gpr(a->rd, dest, dst_ext);
+    }

Why the (incomplete) condition around gen_set_gpr? I think it's a bug to not name EXT_NONE in the test (just because EXT_NONE == 0 now...), but I also think you should not have added the test at all. We will not generate any code in the end within gen_set_gpr, but it allows the routines to be self-contained. You shouldn't assume what gpr_dst returned.

You're right: gen_set_gpr does not need EXT_NONE at all, and we needn't put a condition around gen_set_gpr. I think that if we know dst_ext is EXT_NONE, we don't need gen_set_gpr. I'll correct them in v12.

Thanks
Song Gao

r~
Re: [PATCH v11 10/26] target/loongarch: Add floating point arithmetic instruction translation
On 11/19/21 7:13 AM, Song Gao wrote:

+static void update_fcsr0_mask(CPULoongArchState *env, uintptr_t pc, int mask)
+{
+    int flags = get_float_exception_flags(&env->fp_status);
+
+    set_float_exception_flags(0, &env->fp_status);
+
+    if (~mask) {
+        flags = flags & (~mask);
+    }

No need for if -- mask will always be valid. flags &= ~mask.

+    if (!flags) {
+        SET_FP_CAUSE(env->fcsr0, flags);
+        return;
+    }
+
+    flags = ieee_ex_to_loongarch(flags);
+    SET_FP_CAUSE(env->fcsr0, flags);

It looks like this could be hoisted above the !flags check to unify the two statements.

+/* Floating-point helper */
+DEF_HELPER_3(fadd_s, i64, env, i64, i64)
+DEF_HELPER_3(fadd_d, i64, env, i64, i64)
+DEF_HELPER_3(fsub_s, i64, env, i64, i64)
+DEF_HELPER_3(fsub_d, i64, env, i64, i64)
+DEF_HELPER_3(fmul_s, i64, env, i64, i64)
+DEF_HELPER_3(fmul_d, i64, env, i64, i64)
+DEF_HELPER_3(fdiv_s, i64, env, i64, i64)
+DEF_HELPER_3(fdiv_d, i64, env, i64, i64)
+DEF_HELPER_3(fmax_s, i64, env, i64, i64)
+DEF_HELPER_3(fmax_d, i64, env, i64, i64)
+DEF_HELPER_3(fmin_s, i64, env, i64, i64)
+DEF_HELPER_3(fmin_d, i64, env, i64, i64)
+DEF_HELPER_3(fmaxa_s, i64, env, i64, i64)
+DEF_HELPER_3(fmaxa_d, i64, env, i64, i64)
+DEF_HELPER_3(fmina_s, i64, env, i64, i64)
+DEF_HELPER_3(fmina_d, i64, env, i64, i64)
+
+DEF_HELPER_5(fmuladd_s, i64, env, i64, i64, i64, i32)
+DEF_HELPER_5(fmuladd_d, i64, env, i64, i64, i64, i32)
+
+DEF_HELPER_3(fscaleb_s, i64, env, i64, i64)
+DEF_HELPER_3(fscaleb_d, i64, env, i64, i64)
+
+DEF_HELPER_2(flogb_s, i64, env, i64)
+DEF_HELPER_2(flogb_d, i64, env, i64)
+
+DEF_HELPER_2(fsqrt_s, i64, env, i64)
+DEF_HELPER_2(fsqrt_d, i64, env, i64)
+DEF_HELPER_2(frsqrt_s, i64, env, i64)
+DEF_HELPER_2(frsqrt_d, i64, env, i64)
+DEF_HELPER_2(frecip_s, i64, env, i64)
+DEF_HELPER_2(frecip_d, i64, env, i64)

DEF_HELPER_FLAGS_N, TCG_CALL_NO_WG.

r~
Re: [PATCH v11 04/26] target/loongarch: Add fixed point arithmetic instruction translation
On 11/20/21 9:52 AM, gaosong wrote:

You're right: gen_set_gpr does not need EXT_NONE at all, and we needn't put a condition around gen_set_gpr. I think that if we know dst_ext is EXT_NONE, we don't need gen_set_gpr.

But that assumes that gpr_dst did not return a temporary. I think it's cleaner to assume that gen_set_gpr is needed.

r~
Re: [RFC PATCH v3 0/5] QMP support for cold-plugging devices
Damien Hedde writes: > Hi all, > > This series adds support for cold-plugging devices using QMP > commands. It is a step towards machine configuration using QMP, but > it does not allow the user to add more devices than he could do with > the CLI options before. > > Right now we can add a device using 2 ways: > + giving "-device" CLI option. In that case the device is > cold-plugged: it is created before the machine becomes ready. > + issuing device_add QMP command. In that case the device is > hot-plugged (the command can not be issued before the machine is > ready). > > This series allows to issue device_add QMP to cold-plug a device > like we do with "-device" CLI option. All added QMP commands are > marked as 'unstable' in qapi as they are part of preconfig. > We achieve this by introducing a new 'x-machine-init' command to > stop the VM creation at a point where we can cold-plug devices. > > We are aware of the ongoing discussion about preconfig future (see > [1]). This patchset includes no major modifications from the v2 (but > the scope is reduced) and "x-machine-init" simply stops the > configuration between qemu_board_init() and qemu_create_cli_devices() > function calls. > > As a consequence, in the current state, the timeline is: "current state" = with this series applied? > + "x-machine-init" command > + "device_add" cold-plug commands (no fw_cfg legacy order support) > + "x-exit-preconfig" command will then trigger the following > + "-soundhw" CLI options > + "-fw_cfg" CLI options > + usb devices creation > + "-device" CLI options (with fw_cfg legacy order support) > + some other devices creation (with fw_cfg legacy order support) > > We don't know if the differences between -device/device_add are > acceptable or not. To reduce/remove them we could move the > "x-machine-init" stopping point. What do you think ? I'm not sure I understand this paragraph. I understand the difference between -device and device_add in master: cold vs. hot plug. 
Your patch series makes "cold" device_add possible, i.e. it reduces (eliminates?) the difference between -device and device_add when the latter is "cold". What difference remains that moving 'the "x-machine-init" stopping point' would 'reduce/remove'? > Patches 1, 3 and 5 miss a review. > > The series is organized as follow: > > + Patches 1 and 2 converts the MachinePhase enum to a qapi definition > and add the 'query-machine-phase'. It allows to introspect the > current machine phase during preconfig as we will now be able to > reach several machine phases using QMP. If we fold MachinePhase into RunState, we can reuse query-status. Having two state machines run one after the other feels like one too many. > + Patch 3 adds the 'x-machine-init' QMP command to stop QEMU at > machine-initialized phase during preconfig. > + Patch 4 allows issuing device_add QMP command during the > machine-initialized phase. > + Patch 5 improves the doc about preconfig in consequence. I understand you want to make progress towards machine configuration with QMP. However, QEMU startup is (in my educated opinion) a hole, and we should be wary of digging deeper. The "timeline" you gave above illustrates this. It's a complicated shuffling of command line options and QMP commands that basically nobody can keep in working memory. We have reshuffled it / made it more complicated quite a few times already to support new features. Based on your cover letter, I figure you're making it more complicated once more. At some point, we need to stop digging us deeper into the hole. This is not an objection to merging your work. It's a call to stop and think. Let me quote the sketch I posted to the "Stabilize preconfig" thread: 1. Start event loop 2. Feed it CLI left to right. Each option runs a handler just like each QMP command does. Options that read a configuration file inject the file into the feed. Options that create a monitor create it suspended. 
Options may advance the phase / run state, and they may require certain phase(s). 3. When we're done with CLI, resume any monitors we created. 4. Monitors now feed commands to the event loop. Commands may advance the phase / run state, and they may require certain phase(s). Users can do as much or as little with the CLI as they want. You'd probably want to set up a QMP monitor and no more. device_add becomes possible at a certain state of the phase / run state machine. It changes from cold to hot plug at a certain later state. > [1]: > https://lore.kernel.org/qemu-devel/b31f442d28920447690a6b8cee865bdbacde1283.1635160056.git.mpriv...@redhat.com > > Thanks for your feedback.
Re: [PATCH v11 11/26] target/loongarch: Add floating point comparison instruction translation
On 11/19/21 7:13 AM, Song Gao wrote:

+static bool trans_fcmp_cond_s(DisasContext *ctx, arg_fcmp_cond_s *a)
+{
+    TCGv var = tcg_temp_new();
+    uint32_t flags;
+    void (*fn)(TCGv, TCGv_env, TCGv, TCGv, TCGv_i32);
+
+    fn = (a->fcond & 1 ? gen_helper_fcmp_s_s : gen_helper_fcmp_c_s);
+    flags = get_fcmp_flags(a->fcond >> 1);
+
+    fn(var, cpu_env, cpu_fpr[a->fj], cpu_fpr[a->fk], tcg_constant_i32(flags));
+
+    tcg_gen_st8_tl(var, cpu_env, offsetof(CPULoongArchState, cf[a->cd & 0x7]));

No need to mask cd; the decode took care of that.

+#define FCMP_LT 0x0001  /* fp0 < fp1 */
+#define FCMP_EQ 0x0010  /* fp0 = fp1 */
+#define FCMP_UN 0x0100  /* unordered */
+#define FCMP_GT 0x1000  /* fp0 > fp1 */

Any reason why these bits are not sequential?

r~
Re: [PATCH v11 14/26] target/loongarch: Add floating point load/store instruction translation
On 11/19/21 7:13 AM, Song Gao wrote:

+static bool gen_fload_imm(DisasContext *ctx, arg_fr_i *a,
+                          MemOp mop, bool nanbox)

Don't pass nanbox, as it can be determined from mop. I think you should split out

    static void maybe_nanbox_load(TCGv freg, MemOp mop)
    {
        if ((mop & MO_SIZE) == MO_32) {
            gen_nanbox_s(freg, freg);
        }
    }

for use in the 4 different fload functions.

+static bool gen_fstore_imm(DisasContext *ctx, arg_fr_i *a,
+                           MemOp mop, bool nanbox)

Don't pass nanbox, because it's useless for stores.

+    if (nanbox) {
+        gen_nanbox_s(cpu_fpr[a->fd], cpu_fpr[a->fd]);
+    }

(1) nanboxing not needed for store, (2) incorrect to modify fd.

+static bool gen_fload_tl(DisasContext *ctx, arg_frr *a,
+                         MemOp mop, bool nanbox)

Similarly. Since the integer version is called gen_loadx, should this one be called gen_floadx?

+static bool gen_fstore_tl(DisasContext *ctx, arg_frr *a,
+                          MemOp mop, bool nanbox)
...
+static bool gen_fload_gt(DisasContext *ctx, arg_frr *a,
+                         MemOp mop, bool nanbox)
...
+static bool gen_fstore_gt(DisasContext *ctx, arg_frr *a,
+                          MemOp mop, bool nanbox)
...
+static bool gen_fload_le(DisasContext *ctx, arg_frr *a,
+                         MemOp mop, bool nanbox)
...
+static bool gen_fstore_le(DisasContext *ctx, arg_frr *a,
+                          MemOp mop, bool nanbox)

Similarly.

+TRANS(fld_s, gen_fload_imm, MO_TESL, true)

Use TEUL for everything here, because you don't need sign extension.

r~
Re: [PATCH v11 07/26] target/loongarch: Add fixed point load/store instruction translation
On 11/20/21 9:20 AM, Richard Henderson wrote: %i14s2 10:s14 !function=shl_2 Of course you have a times_4 function introduced later which could be used for this. r~
Re: [PATCH v11 05/26] target/loongarch: Add fixed point shift instruction translation
Hi Richard,

On 2021/11/20 3:42 PM, Richard Henderson wrote:

On 11/19/21 7:13 AM, Song Gao wrote:

+static bool gen_shift(DisasContext *ctx, arg_rr_i *a,
+                      void (*func)(TCGv, TCGv, TCGv))
+{
+    TCGv dest = gpr_dst(ctx, a->rd, EXT_SIGN);
+    TCGv src1 = gpr_src(ctx, a->rj, EXT_ZERO);
+    TCGv src2 = tcg_constant_tl(a->imm);
+
+    func(dest, src1, src2);
+    gen_set_gpr(a->rd, dest, EXT_SIGN);
+
+    return true;
+}

This is no longer generic; it's specific to word operations. But there's nothing in here that can't be done with gen_rr_i, so I think you should remove it.

OK.

+
+static bool gen_shift_i(DisasContext *ctx, arg_rr_i *a,
+                        void (*func)(TCGv, TCGv, target_long))
+{
+    TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
+    TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
+
+    func(dest, src1, a->imm);
+
+    return true;
+}

This one has dropped gen_set_gpr.

We needn't gen_set_gpr: the dst_ext is EXT_NONE.

I think that your current gen_rr_i should be named gen_rri_v (variable) and this one should regain the DisasExtend and be named gen_rri_c (constant). Then, in the previous,

TRANS(addi_w, gen_rri_c, EXT_NONE, EXT_SIGN, tcg_gen_addi_tl)
TRANS(addi_d, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_addi_tl)
TRANS(andi, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_andi_tl)
TRANS(ori, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_ori_tl)
TRANS(xori, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_xori_tl)

There are a few identity tests within these tcg_gen_opi_tl functions which would be nice to apply, particularly because the canonical "nop" instruction for loongarch is "andi r0,r0,0".
+TRANS(slli_w, gen_shift, tcg_gen_shl_tl)
+TRANS(slli_d, gen_shift_i, tcg_gen_shli_tl)
+TRANS(srli_w, gen_shift, tcg_gen_shr_tl)
+TRANS(srli_d, gen_shift_i, tcg_gen_shri_tl)
+TRANS(srai_d, gen_shift_i, tcg_gen_sari_tl)
+TRANS(rotri_w, gen_shift, gen_rotr_w)
+TRANS(rotri_d, gen_shift_i, tcg_gen_rotri_tl)

Then these become

TRANS(slli_w, gen_rri_c, EXT_NONE, EXT_SIGN, tcg_gen_shli_tl)
TRANS(slli_d, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_shli_tl)
TRANS(srli_w, gen_rri_c, EXT_SIGN, EXT_SIGN, tcg_gen_shri_tl)
TRANS(srli_d, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_shri_tl)
TRANS(srai_d, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_sari_tl)
TRANS(rotri_w, gen_rri_v, EXT_NONE, EXT_NONE, gen_rotr_w)
TRANS(rotri_d, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_rotri_tl)

Very nice, very clean.

Thanks
Song Gao

r~
Re: [PATCH v11 15/26] target/loongarch: Add branch instruction translation
On 11/19/21 7:13 AM, Song Gao wrote:

+    tcg_gen_ld8u_tl(src1, cpu_env,
+                    offsetof(CPULoongArchState, cf[a->cj & 0x7]));

Mask of cj not needed; it's done by decode.

+&rr_dj_offs rd rj offs
+&rr_offs    rj rd offs
...
+@rr_dj_offs16 .. rj:5 rd:5 &rr_dj_offs offs=%offs16
+@rr_offs16    .. rj:5 rd:5 &rr_offs offs=%offs16

These two are the same.

r~
Re: [PATCH v11 16/26] target/loongarch: Add disassembler
On 11/19/21 7:13 AM, Song Gao wrote:

+static void output_r_offs(DisasContext *ctx, arg_r_offs *a,
+                          const char *mnemonic)
+{
+    output(ctx, mnemonic, "r%d, %d # 0x%lx", a->rj, a->offs,

%lx is wrong; use PRIx64. Many instances. If you use

    make docker-test-build@fedora-i386-cross

you should see errors here.

r~
Re: [PATCH v11 17/26] linux-user: Add LoongArch generic header files
On 11/19/21 7:13 AM, Song Gao wrote: This includes: - sockbits.h - target_errno_defs.h - target_fcntl.h - termbits.h Signed-off-by: Song Gao Signed-off-by: Xiaojuan Yang --- linux-user/loongarch64/sockbits.h | 11 +++ linux-user/loongarch64/target_errno_defs.h | 12 linux-user/loongarch64/target_fcntl.h | 11 +++ linux-user/loongarch64/termbits.h | 11 +++ 4 files changed, 45 insertions(+) create mode 100644 linux-user/loongarch64/sockbits.h create mode 100644 linux-user/loongarch64/target_errno_defs.h create mode 100644 linux-user/loongarch64/target_fcntl.h create mode 100644 linux-user/loongarch64/termbits.h Reviewed-by: Richard Henderson r~
Re: [PATCH v11 18/26] linux-user: Add LoongArch specific structures
On 11/19/21 7:13 AM, Song Gao wrote: Signed-off-by: Song Gao Signed-off-by: Xiaojuan Yang --- linux-user/loongarch64/target_structs.h | 48 + 1 file changed, 48 insertions(+) create mode 100644 linux-user/loongarch64/target_structs.h diff --git a/linux-user/loongarch64/target_structs.h b/linux-user/loongarch64/target_structs.h new file mode 100644 index 000..cc7928a --- /dev/null +++ b/linux-user/loongarch64/target_structs.h @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * LoongArch specific structures for linux-user + * + * Copyright (c) 2021 Loongson Technology Corporation Limited + */ + +#ifndef LOONGARCH_TARGET_STRUCTS_H +#define LOONGARCH_TARGET_STRUCTS_H + +struct target_ipc_perm { +abi_int __key; /* Key. */ +abi_uint uid; /* Owner's user ID. */ +abi_uint gid; /* Owner's group ID. */ +abi_uint cuid; /* Creator's user ID. */ +abi_uint cgid; /* Creator's group ID. */ +abi_uint mode; /* Read/write permission. */ +abi_ushort __seq; /* Sequence number. */ +abi_ushort __pad1; +abi_ulong __unused1; +abi_ulong __unused2; +}; We should most definitely create a generic version of this file before we do any more replication. The aarch64 version is generic, with copies in cris, hexagon, i386, m68k, microblaze, nios2, openrisc, riscv, and sh4. r~
[PATCH] scsi-generic: replace logical block count of response of READ CAPACITY
While using SCSI passthrough, the following scenario leaves QEMU unaware of the capacity change of the remote SCSI target:

1. online resize the scsi target.
2. issue 'rescan-scsi-bus.sh -s ...' in host.
3. issue 'rescan-scsi-bus.sh -s ...' in vm.

In the above scenario I experienced errors while accessing the additional disk space in the VM. I think the reasonable sequence of operations should be:

1. online resize the scsi target.
2. issue 'rescan-scsi-bus.sh -s ...' in host.
3. issue 'block_resize' via qmp to notify qemu.
4. issue 'rescan-scsi-bus.sh -s ...' in vm.

The errors disappear once I notify QEMU by block_resize via QMP. So this patch replaces the logical block count in the READ CAPACITY response from the SCSI target with QEMU's bs->total_sectors. If the user in the VM wants to access the additional disk space, the administrator of the host must notify QEMU after resizing the SCSI target.

A bonus is that libvirt's domblkinfo can reflect consistent capacity information between host and VM in case block_resize was not issued to QEMU. E.g:

...
...

Before:
1. online resize the scsi target.
2. host:~ # rescan-scsi-bus.sh -s /dev/sdc
3. guest:~ # rescan-scsi-bus.sh -s /dev/sda
4. host:~ # virsh domblkinfo --domain $DOMAIN --human --device sda
   Capacity:   4.000 GiB
   Allocation: 0.000 B
   Physical:   8.000 GiB
5. guest:~ # lsblk /dev/sda
   NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
   sda      8:0    0   8G  0 disk
   └─sda1   8:1    0   2G  0 part

After:
1. online resize the scsi target.
2. host:~ # rescan-scsi-bus.sh -s /dev/sdc
3. guest:~ # rescan-scsi-bus.sh -s /dev/sda
4. host:~ # virsh domblkinfo --domain $DOMAIN --human --device sda
   Capacity:   4.000 GiB
   Allocation: 0.000 B
   Physical:   8.000 GiB
5.
guest:~ # lsblk /dev/sda
   NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
   sda      8:0    0   4G  0 disk
   └─sda1   8:1    0   2G  0 part

Signed-off-by: Lin Ma
---
 hw/scsi/scsi-generic.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 0306ccc7b1..343b51c2c0 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -315,11 +315,17 @@ static void scsi_read_complete(void * opaque, int ret)
     if (r->req.cmd.buf[0] == READ_CAPACITY_10 &&
         (ldl_be_p(&r->buf[0]) != 0xU || s->max_lba == 0)) {
         s->blocksize = ldl_be_p(&r->buf[4]);
-        s->max_lba = ldl_be_p(&r->buf[0]) & 0xULL;
+        BlockBackend *blk = s->conf.blk;
+        BlockDriverState *bs = blk_bs(blk);
+        s->max_lba = bs->total_sectors - 1;
+        stl_be_p(&r->buf[0], s->max_lba);
     } else if (r->req.cmd.buf[0] == SERVICE_ACTION_IN_16 &&
                (r->req.cmd.buf[1] & 31) == SAI_READ_CAPACITY_16) {
         s->blocksize = ldl_be_p(&r->buf[8]);
-        s->max_lba = ldq_be_p(&r->buf[0]);
+        BlockBackend *blk = s->conf.blk;
+        BlockDriverState *bs = blk_bs(blk);
+        s->max_lba = bs->total_sectors - 1;
+        stq_be_p(&r->buf[0], s->max_lba);
     }
     blk_set_guest_block_size(s->conf.blk, s->blocksize);
--
2.26.2
Re: [PATCH v11 19/26] linux-user: Add LoongArch signal support
On 11/19/21 7:13 AM, Song Gao wrote:

Signed-off-by: Song Gao
Signed-off-by: Xiaojuan Yang
---
 linux-user/loongarch64/signal.c        | 162 +
 linux-user/loongarch64/target_signal.h |  29 ++
 2 files changed, 191 insertions(+)
 create mode 100644 linux-user/loongarch64/signal.c
 create mode 100644 linux-user/loongarch64/target_signal.h

diff --git a/linux-user/loongarch64/signal.c b/linux-user/loongarch64/signal.c
new file mode 100644
index 000..8fbc827
--- /dev/null
+++ b/linux-user/loongarch64/signal.c
@@ -0,0 +1,162 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * LoongArch emulation of Linux signals
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ */
+
+#include "qemu/osdep.h"
+#include "qemu.h"
+#include "signal-common.h"
+#include "user-internals.h"
+#include "linux-user/trace.h"
+
+struct target_sigcontext {
+    uint64_t sc_pc;
+    uint64_t sc_gpr[32];
+    uint64_t sc_fpr[32];
+    uint64_t sc_fcc;
+    uint32_t sc_fcsr;
+    uint32_t sc_flags;
+};

Does not match
https://github.com/loongson/linux/blob/loongarch-next/arch/loongarch/include/uapi/asm/sigcontext.h

+
+struct target_ucontext {
+    target_ulong tuc_flags;
+    target_ulong tuc_link;
+    target_stack_t tuc_stack;
+    target_ulong pad0;
+    struct target_sigcontext tuc_mcontext;
+    target_sigset_t tuc_sigmask;
+};

Does not match
https://github.com/loongson/linux/blob/loongarch-next/arch/loongarch/include/uapi/asm/ucontext.h

+static inline void setup_sigcontext(CPULoongArchState *env,
+                                    struct target_sigcontext *sc)

Drop all of the inline markers.

+{
+    int i;
+
+    __put_user(env->pc, &sc->sc_pc);
+
+    __put_user(0, &sc->sc_gpr[0]);
+    for (i = 1; i < 32; ++i) {
+        __put_user(env->gpr[i], &sc->sc_gpr[i]);
+    }
+
+    for (i = 0; i < 32; ++i) {
+        __put_user(env->fpr[i], &sc->sc_fpr[i]);
+    }
+}

Missing fcsr and fcc. I'll note that the kernel is missing sets of vscr and scr[0-3]. IMO they should at least be zeroed in advance of supporting the vector extension.
+static inline void
+restore_sigcontext(CPULoongArchState *env, struct target_sigcontext *sc)
+{
+    int i;
+
+    __get_user(env->pc, &sc->sc_pc);
+
+    for (i = 1; i < 32; ++i) {
+        __get_user(env->gpr[i], &sc->sc_gpr[i]);
+    }
+
+    for (i = 0; i < 32; ++i) {
+        __get_user(env->fpr[i], &sc->sc_fpr[i]);
+    }
+}

Similarly.

+    return (sp - frame_size) & ~7;

include/asm/asm.h:  #define ALMASK  ~15
kernel/signal.c:    return (void __user *)((sp - frame_size) & ALMASK);

+    env->pc = env->gpr[20] = ka->_sa_handler;

There is no set of gpr[20].

+void setup_sigtramp(abi_ulong sigtramp_page)
+{
+    uint32_t *tramp = lock_user(VERIFY_WRITE, sigtramp_page, 8, 0);
+    assert(tramp != NULL);
+
+    __put_user(0x03822c0b, tramp + 0);    /* ori a7, a7, 0x8b */

The comment is incorrect: "ori a7, zero, 0x8b", but the hex is right.

+/* this struct defines a stack used during syscall handling */
+typedef struct target_sigaltstack {
+    abi_long ss_sp;
+    abi_int ss_flags;
+    abi_ulong ss_size;
+} target_stack_t;
+
+/*
+ * sigaltstack controls
+ */
+#define TARGET_SS_ONSTACK 1
+#define TARGET_SS_DISABLE 2

+#define TARGET_MINSIGSTKSZ 2048
+#define TARGET_SIGSTKSZ    8192

We should move these to generic/signal.h.

r~
Re: [PATCH v11 20/26] linux-user: Add LoongArch elf support
On 11/19/21 7:13 AM, Song Gao wrote: Signed-off-by: Song Gao Signed-off-by: Xiaojuan Yang --- include/elf.h | 2 ++ linux-user/elfload.c| 58 + linux-user/loongarch64/target_elf.h | 12 3 files changed, 72 insertions(+) create mode 100644 linux-user/loongarch64/target_elf.h Reviewed-by: Richard Henderson r~
[PATCH v3 0/3] support dirty restraint on vCPU
From: Hyman Huang(黄勇)

v3:
- rebase on master
- modify the following points according to the advice given by Markus
  1. remove the DirtyRateQuotaVcpu and use its field as option directly
  2. add comments to show details of what dirtylimit setup does
  3. explain how to use dirtylimit in combination with the existing qmp
     commands "calc-dirty-rate" and "query-dirty-rate" in documentation.

Thanks for the careful reviews made by Markus.

Please review, thanks!

Hyman

v2:
- rebase on master
- modify the following points according to the advice given by Juan
  1. rename dirtyrestraint to dirtylimit
  2. implement the full lifecycle functions of dirtylimit_calc, including
     dirtylimit_calc and dirtylimit_calc_quit
  3. introduce 'quit' field in dirtylimit_calc_state to implement
     dirtylimit_calc_quit
  4. remove the ready_cond and ready_mtx since they may not be suitable
  5. put the 'record_dirtypage' function code at the beginning of the file
  6. remove the unnecessary return;
- other modifications have been made after code review
  1. introduce 'bmap' and 'nr' fields in dirtylimit_state to record the
     number of running threads forked by dirtylimit
  2. stop the dirtyrate calculation thread if all the dirtylimit threads
     are stopped
  3. do some renaming work:
     dirtyrate calculation thread -> dirtylimit-calc
     dirtylimit thread -> dirtylimit-{cpu_index}
     function name do_dirtyrestraint -> dirtylimit_check
     qmp command dirty-restraint -> set-dirty-limit
     qmp command dirty-restraint-cancel -> cancel-dirty-limit
     header file dirtyrestraint.h -> dirtylimit.h

Please review, thanks!

Thanks for the accurate and timely advice given by Juan. We would really appreciate it if corrections and suggestions about this patchset are proposed.

Best Regards!

Hyman

v1:
This patchset introduces a mechanism to impose a dirty restraint on vCPUs, aiming to keep each vCPU running at a certain dirty rate given by the user.
A dirty restraint on vCPUs may be an alternative way to implement convergence logic for live migration, which could, in theory, improve guest memory performance during migration compared with the traditional method.

The current live migration implementation's convergence logic throttles all vCPUs of the VM, which has some side effects:
- 'read processes' on a vCPU are unnecessarily penalized
- the throttle percentage increases step by step, which seems to struggle to find the optimal throttle percentage when the dirty rate is high
- it is hard to predict the remaining migration time if the throttling percentage reaches 99%

To a certain extent, the dirty restraint mechanism can fix these effects by throttling at vCPU granularity during migration.

The implementation is rather straightforward: we calculate the vCPU dirty rate via the dirty ring mechanism periodically, as commit 0e21bf246 "implement dirty-ring dirtyrate calculation" does. For a vCPU on which a dirty restraint is imposed, we throttle it periodically as auto-converge does; after each round of throttling, we compare the quota dirty rate with the current dirty rate, and if the current dirty rate is not under the quota, increase the throttling percentage until it is.

This patchset is the basis of implementing a new auto-converge method for live migration. We introduce two qmp commands to impose/cancel the dirty restraint on a specified vCPU, so it can also serve as an independent API for upper applications such as libvirt, which can use it to implement the convergence logic during live migration, supplemented with the qmp 'calc-dirty-rate' command or whatever.

We post this patchset for RFC, and any corrections and suggestions about the implementation, API, throttling algorithm or whatever are very appreciated!

Please review, thanks!

Best Regards!
Hyman Huang (3):
  migration/dirtyrate: implement vCPU dirtyrate calculation periodically
  cpu-throttle: implement vCPU throttle
  cpus-common: implement dirty limit on vCPU

 cpus-common.c                 |  41 ++
 include/exec/memory.h         |   5 +-
 include/hw/core/cpu.h         |   9 ++
 include/sysemu/cpu-throttle.h |  23 +++
 include/sysemu/dirtylimit.h   |  44 ++
 migration/dirtyrate.c         | 139 --
 migration/dirtyrate.h         |   2 +
 qapi/misc.json                |  39 ++
 softmmu/cpu-throttle.c        | 319 ++
 softmmu/trace-events          |   5 +
 softmmu/vl.c                  |   1 +
 11 files changed, 616 insertions(+), 11 deletions(-)
 create mode 100644 include/sysemu/dirtylimit.h

-- 
1.8.3.1
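The adjustment loop described in the cover letter (compare the measured dirtyrate against the quota and raise the throttle percentage until the rate falls under the quota) can be sketched as a simple feedback step. The names and step size below are illustrative, not taken from the series:

```c
#include <assert.h>

#define THROTTLE_PCT_MAX 99   /* mirrors CPU_THROTTLE_PCT_MAX */
#define THROTTLE_STEP     5   /* hypothetical adjustment step */

/*
 * One adjustment step of the feedback loop: if the measured dirtyrate
 * is still above the quota, raise the throttle percentage; if it has
 * dropped below the quota, back off.  Returns the new percentage.
 */
static int throttle_adjust(int pct, unsigned long current_rate,
                           unsigned long quota_rate)
{
    if (current_rate > quota_rate) {
        pct += THROTTLE_STEP;
        if (pct > THROTTLE_PCT_MAX) {
            pct = THROTTLE_PCT_MAX;
        }
    } else if (current_rate < quota_rate && pct >= THROTTLE_STEP) {
        pct -= THROTTLE_STEP;
    }
    return pct;
}
```

Repeating this step each sampling period converges the vCPU toward the quota without touching the other vCPUs, which is the core difference from the existing auto-converge logic.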
[PATCH v3 3/3] cpus-common: implement dirty limit on vCPU
From: Hyman Huang(黄勇)

Implement dirtyrate calculation periodically based on dirty-ring and throttle the vCPU until it reaches the quota dirtyrate given by the user.

Introduce qmp commands set-dirty-limit/cancel-dirty-limit to set/cancel the dirty limit on a vCPU.

Signed-off-by: Hyman Huang(黄勇) --- cpus-common.c | 41 + include/hw/core/cpu.h | 9 + qapi/misc.json| 39 +++ softmmu/vl.c | 1 + 4 files changed, 90 insertions(+) diff --git a/cpus-common.c b/cpus-common.c index 6e73d3e..e32612b 100644 --- a/cpus-common.c +++ b/cpus-common.c @@ -23,6 +23,11 @@ #include "hw/core/cpu.h" #include "sysemu/cpus.h" #include "qemu/lockable.h" +#include "sysemu/dirtylimit.h" +#include "sysemu/cpu-throttle.h" +#include "sysemu/kvm.h" +#include "qapi/error.h" +#include "qapi/qapi-commands-misc.h" static QemuMutex qemu_cpu_list_lock; static QemuCond exclusive_cond; @@ -352,3 +357,39 @@ void process_queued_cpu_work(CPUState *cpu) qemu_mutex_unlock(&cpu->work_mutex); qemu_cond_broadcast(&qemu_work_cond); } + +void qmp_set_dirty_limit(int64_t idx, + uint64_t dirtyrate, + Error **errp) +{ +if (!kvm_dirty_ring_enabled()) { +error_setg(errp, "dirty ring not enable, needed by dirty restraint!"); +return; +} + +dirtylimit_calc(); +dirtylimit_vcpu(idx, dirtyrate); +} + +void qmp_cancel_dirty_limit(int64_t idx, +Error **errp) +{ +if (!kvm_dirty_ring_enabled()) { +error_setg(errp, "dirty ring not enable, needed by dirty restraint!"); +return; +} + +if (unlikely(!dirtylimit_cancel_vcpu(idx))) { +dirtylimit_calc_quit(); +} +} + +void dirtylimit_setup(int max_cpus) +{ +if (!kvm_dirty_ring_enabled()) { +return; +} + +dirtylimit_calc_state_init(max_cpus); +dirtylimit_state_init(max_cpus); +} diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h index e948e81..11df012 100644 --- a/include/hw/core/cpu.h +++ b/include/hw/core/cpu.h @@ -881,6 +881,15 @@ void end_exclusive(void); */ void qemu_init_vcpu(CPUState *cpu); +/** + * dirtylimit_setup: + * + * Initializes the global state of dirtylimit calculation and +
dirtylimit itself. This is prepared for vCPU dirtylimit which + could be triggered during vm lifecycle. + */ +void dirtylimit_setup(int max_cpus); + #define SSTEP_ENABLE 0x1 /* Enable simulated HW single stepping */ #define SSTEP_NOIRQ 0x2 /* Do not use IRQ while single stepping */ #define SSTEP_NOTIMER 0x4 /* Do not Timers while single stepping */ diff --git a/qapi/misc.json b/qapi/misc.json index 358548a..98e6001 100644 --- a/qapi/misc.json +++ b/qapi/misc.json @@ -527,3 +527,42 @@ 'data': { '*option': 'str' }, 'returns': ['CommandLineOptionInfo'], 'allow-preconfig': true } + +## +# @set-dirty-limit: +# +# This command can be used to cap the vCPU memory load, which is also +# referred to as dirtyrate. One should use "calc-dirty-rate" with "dirty-ring" +# mode to calculate the vCPU dirtyrate and query it with "query-dirty-rate". +# Once the current vCPU dirtyrate is obtained, "set-dirty-limit" can be used +# to set the upper limit of dirtyrate for the vCPU of interest. +# +# @idx: vCPU index to set dirtylimit. +# +# @dirtyrate: upper limit of dirtyrate the specified vCPU could reach (MB/s) +# +# Since: 6.3 +# +# Example: +# { "execute": "set-dirty-limit", +#"arguments": { "idx": 0, +# "dirtyrate": 200 } } +# +## +{ 'command': 'set-dirty-limit', + 'data': { 'idx': 'int', 'dirtyrate': 'uint64' } } + +## +# @cancel-dirty-limit: +# +# @idx: vCPU index to cancel the dirtylimit for +# +# Since: 6.3 +# +# Example: +# { "execute": "cancel-dirty-limit", +#"arguments": { "idx": 0 } } +# +## +{ 'command': 'cancel-dirty-limit', + 'data': { 'idx': 'int' } } diff --git a/softmmu/vl.c b/softmmu/vl.c index 1159a64..170ee23 100644 --- a/softmmu/vl.c +++ b/softmmu/vl.c @@ -3776,5 +3776,6 @@ void qemu_init(int argc, char **argv, char **envp) qemu_init_displays(); accel_setup_post(current_machine); os_setup_post(); +dirtylimit_setup(current_machine->smp.max_cpus); resume_mux_open(); } -- 1.8.3.1
[PATCH v3 2/3] cpu-throttle: implement vCPU throttle
From: Hyman Huang(黄勇)

Impose a dirty restraint on a vCPU by kicking it and making it sleep, as auto-converge does during migration, but kick only the specified vCPU instead of all vCPUs of the VM.

Start a thread to track the dirtylimit status and adjust the throttle percentage dynamically depending on the current and quota dirtyrates.

Introduce the utility functions in the header for the dirtylimit implementation.

Signed-off-by: Hyman Huang(黄勇) --- include/sysemu/cpu-throttle.h | 23 +++ softmmu/cpu-throttle.c| 319 ++ softmmu/trace-events | 5 + 3 files changed, 347 insertions(+) diff --git a/include/sysemu/cpu-throttle.h b/include/sysemu/cpu-throttle.h index d65bdef..726c1ce 100644 --- a/include/sysemu/cpu-throttle.h +++ b/include/sysemu/cpu-throttle.h @@ -65,4 +65,27 @@ bool cpu_throttle_active(void); */ int cpu_throttle_get_percentage(void); +/** + * dirtylimit_state_init: + * + * initialize global state for dirtylimit + */ +void dirtylimit_state_init(int max_cpus); + +/** + * dirtylimit_vcpu: + * + * impose dirtylimit on vcpu until reaching the quota dirtyrate + */ +void dirtylimit_vcpu(int cpu_index, + uint64_t quota); +/** + * dirtylimit_cancel_vcpu: + * + * cancel dirtylimit for the specified vcpu + * + * Returns: the number of running threads for dirtylimit + */ +int dirtylimit_cancel_vcpu(int cpu_index); + #endif /* SYSEMU_CPU_THROTTLE_H */ diff --git a/softmmu/cpu-throttle.c b/softmmu/cpu-throttle.c index 8c2144a..a5e67c9 100644 --- a/softmmu/cpu-throttle.c +++ b/softmmu/cpu-throttle.c @@ -29,6 +29,8 @@ #include "qemu/main-loop.h" #include "sysemu/cpus.h" #include "sysemu/cpu-throttle.h" +#include "sysemu/dirtylimit.h" +#include "trace.h" /* vcpu throttling controls */ static QEMUTimer *throttle_timer; @@ -38,6 +40,323 @@ static unsigned int throttle_percentage; #define CPU_THROTTLE_PCT_MAX 99 #define CPU_THROTTLE_TIMESLICE_NS 1000 +#define DIRTYLIMIT_TOLERANCE_RANGE 15 /* 15MB/s */ + +#define DIRTYLIMIT_THROTTLE_HEAVY_WATERMARK 75 +#define DIRTYLIMIT_THROTTLE_SLIGHT_WATERMARK90 + 
+#define DIRTYLIMIT_THROTTLE_HEAVY_STEP_SIZE 5 +#define DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE2 + +typedef enum { +RESTRAIN_KEEP, +RESTRAIN_RATIO, +RESTRAIN_HEAVY, +RESTRAIN_SLIGHT, +} RestrainPolicy; + +typedef struct DirtyLimitState { +int cpu_index; +bool enabled; +uint64_t quota; /* quota dirtyrate MB/s */ +QemuThread thread; +char *name; /* thread name */ +} DirtyLimitState; + +struct { +DirtyLimitState *states; +int max_cpus; +unsigned long *bmap; /* running thread bitmap */ +unsigned long nr; +} *dirtylimit_state; + +static inline bool dirtylimit_enabled(int cpu_index) +{ +return qatomic_read(&dirtylimit_state->states[cpu_index].enabled); +} + +static inline void dirtylimit_set_quota(int cpu_index, uint64_t quota) +{ +qatomic_set(&dirtylimit_state->states[cpu_index].quota, quota); +} + +static inline uint64_t dirtylimit_quota(int cpu_index) +{ +return qatomic_read(&dirtylimit_state->states[cpu_index].quota); +} + +static int64_t dirtylimit_current(int cpu_index) +{ +return dirtylimit_calc_current(cpu_index); +} + +static void dirtylimit_vcpu_thread(CPUState *cpu, run_on_cpu_data data) +{ +double pct; +double throttle_ratio; +int64_t sleeptime_ns, endtime_ns; +int *percentage = (int *)data.host_ptr; + +pct = (double)(*percentage) / 100; +throttle_ratio = pct / (1 - pct); +/* Add 1ns to fix double's rounding error (like 0.999...) 
*/ +sleeptime_ns = (int64_t)(throttle_ratio * CPU_THROTTLE_TIMESLICE_NS + 1); +endtime_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + sleeptime_ns; +while (sleeptime_ns > 0 && !cpu->stop) { +if (sleeptime_ns > SCALE_MS) { +qemu_cond_timedwait_iothread(cpu->halt_cond, + sleeptime_ns / SCALE_MS); +} else { +qemu_mutex_unlock_iothread(); +g_usleep(sleeptime_ns / SCALE_US); +qemu_mutex_lock_iothread(); +} +sleeptime_ns = endtime_ns - qemu_clock_get_ns(QEMU_CLOCK_REALTIME); +} +qatomic_set(&cpu->throttle_thread_scheduled, 0); + +free(percentage); +} + +static void dirtylimit_check(int cpu_index, + int percentage) +{ +CPUState *cpu; +int64_t sleeptime_ns, starttime_ms, currenttime_ms; +int *pct_parameter; +double pct; + +pct = (double) percentage / 100; + +starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); + +while (true) { +CPU_FOREACH(cpu) { +if ((cpu_index == cpu->cpu_index) && +(!qatomic_xchg(&cpu->throttle_thread_scheduled, 1))) { +pct_parameter = malloc(sizeof(*pct_parameter)); +*pct_parameter = percentage; +async_run_on_cpu(cpu, dirtylimit_vcpu_thread, + RUN_ON_CPU_H
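The sleep-time arithmetic in dirtylimit_vcpu_thread above converts a throttle percentage into a sleep/run ratio per timeslice. It can be checked in isolation; the helper below is a sketch with an illustrative 10 ms timeslice, not the series' code:

```c
#include <assert.h>
#include <stdint.h>

#define TIMESLICE_NS 10000000LL   /* illustrative 10 ms run timeslice */

/*
 * For a throttle percentage pct, the vCPU should sleep pct% of total
 * time, i.e. sleep/run = pct/(100-pct).  With a run slice of
 * TIMESLICE_NS, sleep = TIMESLICE_NS * pct / (100 - pct).  The +1
 * compensates for the double's rounding toward zero, as the patch's
 * comment notes.
 */
static int64_t throttle_sleep_ns(int pct)
{
    double p = (double)pct / 100;
    double throttle_ratio = p / (1 - p);
    return (int64_t)(throttle_ratio * TIMESLICE_NS + 1);
}
```

At 50% the vCPU sleeps as long as it runs; at 99% (the CPU_THROTTLE_PCT_MAX cap) it sleeps 99 timeslices for every one it runs, which is why the cap exists.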
[PATCH v3 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically
From: Hyman Huang(黄勇)

Introduce a third dirty tracking method, GLOBAL_DIRTY_LIMIT, to calculate the dirtyrate periodically for dirty restraint.

Implement a thread to calculate the dirtyrate periodically, which will be used for dirty restraint.

Add dirtylimit.h to introduce the utility functions for the dirty limit implementation.

Signed-off-by: Hyman Huang(黄勇) --- include/exec/memory.h | 5 +- include/sysemu/dirtylimit.h | 44 ++ migration/dirtyrate.c | 139 migration/dirtyrate.h | 2 + 4 files changed, 179 insertions(+), 11 deletions(-) create mode 100644 include/sysemu/dirtylimit.h diff --git a/include/exec/memory.h b/include/exec/memory.h index 20f1b27..606bec8 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -69,7 +69,10 @@ static inline void fuzz_dma_read_cb(size_t addr, /* Dirty tracking enabled because measuring dirty rate */ #define GLOBAL_DIRTY_DIRTY_RATE (1U << 1) -#define GLOBAL_DIRTY_MASK (0x3) +/* Dirty tracking enabled because dirty limit */ +#define GLOBAL_DIRTY_LIMIT (1U << 2) + +#define GLOBAL_DIRTY_MASK (0x7) extern unsigned int global_dirty_tracking; diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h new file mode 100644 index 000..3a1c74a --- /dev/null +++ b/include/sysemu/dirtylimit.h @@ -0,0 +1,44 @@ +/* + * dirty limit helper functions + * + * Copyright (c) 2021 CHINA TELECOM CO.,LTD. + * + * Authors: + * Hyman Huang(黄勇) + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ +#ifndef QEMU_DIRTYRLIMIT_H +#define QEMU_DIRTYRLIMIT_H + +#define DIRTYLIMIT_CALC_PERIOD_TIME_S 15 /* 15s */ + +/** + * dirtylimit_calc_current: + * + * get current dirty rate of specified vCPU. + */ +int64_t dirtylimit_calc_current(int cpu_index); + +/** + * dirtylimit_calc: + * + * start dirty rate calculation thread. + */ +void dirtylimit_calc(void); + +/** + * dirtylimit_calc_quit: + * + * quit dirty rate calculation thread. 
+ */ +void dirtylimit_calc_quit(void); + +/** + * dirtylimit_calc_state_init: + * + * initialize dirty rate calculation state. + */ +void dirtylimit_calc_state_init(int max_cpus); +#endif diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c index d65e744..d370a21 100644 --- a/migration/dirtyrate.c +++ b/migration/dirtyrate.c @@ -27,6 +27,7 @@ #include "qapi/qmp/qdict.h" #include "sysemu/kvm.h" #include "sysemu/runstate.h" +#include "sysemu/dirtylimit.h" #include "exec/memory.h" /* @@ -46,6 +47,134 @@ static struct DirtyRateStat DirtyStat; static DirtyRateMeasureMode dirtyrate_mode = DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING; +#define DIRTYLIMIT_CALC_TIME_MS 1000/* 1000ms */ + +struct { +DirtyRatesData data; +int64_t period; +bool quit; +} *dirtylimit_calc_state; + +static void dirtylimit_global_dirty_log_start(void) +{ +qemu_mutex_lock_iothread(); +memory_global_dirty_log_start(GLOBAL_DIRTY_LIMIT); +qemu_mutex_unlock_iothread(); +} + +static void dirtylimit_global_dirty_log_stop(void) +{ +qemu_mutex_lock_iothread(); +memory_global_dirty_log_sync(); +memory_global_dirty_log_stop(GLOBAL_DIRTY_LIMIT); +qemu_mutex_unlock_iothread(); +} + +static inline void record_dirtypages(DirtyPageRecord *dirty_pages, + CPUState *cpu, bool start) +{ +if (start) { +dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages; +} else { +dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages; +} +} + +static void dirtylimit_calc_func(void) +{ +CPUState *cpu; +DirtyPageRecord *dirty_pages; +int64_t start_time, end_time, calc_time; +DirtyRateVcpu rate; +int i = 0; + +dirty_pages = g_malloc0(sizeof(*dirty_pages) * +dirtylimit_calc_state->data.nvcpu); + +dirtylimit_global_dirty_log_start(); + +CPU_FOREACH(cpu) { +record_dirtypages(dirty_pages, cpu, true); +} + +start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); +g_usleep(DIRTYLIMIT_CALC_TIME_MS * 1000); +end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); +calc_time = end_time - start_time; + +dirtylimit_global_dirty_log_stop(); + 
+CPU_FOREACH(cpu) { +record_dirtypages(dirty_pages, cpu, false); +} + +for (i = 0; i < dirtylimit_calc_state->data.nvcpu; i++) { +uint64_t increased_dirty_pages = +dirty_pages[i].end_pages - dirty_pages[i].start_pages; +uint64_t memory_size_MB = +(increased_dirty_pages * TARGET_PAGE_SIZE) >> 20; +int64_t dirtyrate = (memory_size_MB * 1000) / calc_time; + +rate.id = i; +rate.dirty_rate = dirtyrate; +dirtylimit_calc_state->data.rates[i] = rate; + +trace_dirtyrate_do_calculate_vcpu(i, +dirtylimit_calc_state->data.rates[i].dirty_rate); +} +} + +static void *dirtylimit_calc_thread(void *opaqu
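The per-vCPU rate computation in dirtylimit_calc_func above (dirty pages accumulated over a sampling window, converted to MB/s) reduces to a small formula. A self-contained sketch, with an assumed 4 KiB target page size:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096   /* assumed stand-in for TARGET_PAGE_SIZE */

/*
 * Dirtyrate in MB/s over a sampling window of calc_time_ms
 * milliseconds, given the dirty-page counters at the start and end
 * of the window (mirrors the arithmetic in dirtylimit_calc_func).
 */
static int64_t dirtyrate_mb_per_s(uint64_t start_pages,
                                  uint64_t end_pages,
                                  int64_t calc_time_ms)
{
    uint64_t increased_dirty_pages = end_pages - start_pages;
    uint64_t memory_size_mb = (increased_dirty_pages * PAGE_SIZE) >> 20;

    return (int64_t)(memory_size_mb * 1000 / calc_time_ms);
}
```

For example, 256 dirtied 4 KiB pages over a 1000 ms window is 1 MiB, i.e. a rate of 1 MB/s; the >> 20 truncation means sub-megabyte remainders are dropped, which is acceptable at the 1 s granularity the series uses.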
Re: [PATCH v11 21/26] linux-user: Add LoongArch syscall support
On 11/19/21 7:13 AM, Song Gao wrote: +static inline abi_ulong target_shmlba(CPULoongArchState *env) +{ +return 0x4; +} include/asm/shmparam.h:#define SHMLBA SZ_64K Otherwise, Reviewed-by: Richard Henderson r~
Re: [PATCH v11 22/26] linux-user: Add LoongArch cpu_loop support
On 11/19/21 7:13 AM, Song Gao wrote: +case EXCP_ADE: +force_sig_fault(TARGET_SIGSEGV, TARGET_SEGV_MAPERR, env->badaddr); +break; No longer required; should be handled correctly by cpu_loop_exit_sigsegv. Otherwise, Reviewed-by: Richard Henderson r~
Re: [PATCH 01/11] migration: Remove is_zero_range()
On 11/19/21 5:58 PM, Juan Quintela wrote: It just calls buffer_is_zero(). Just change the callers. Signed-off-by: Juan Quintela --- migration/ram.c | 9 ++--- 1 file changed, 2 insertions(+), 7 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH 02/11] dump: Remove is_zero_page()
On 11/19/21 5:58 PM, Juan Quintela wrote: It just calls buffer_is_zero(). Just change the callers. Signed-off-by: Juan Quintela --- dump/dump.c | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH v1 08/12] target/riscv: Handle KVM_EXIT_RISCV_SBI exit
Hi, On 11/20/21 08:46, Yifei Jiang wrote: > Use char-fe to handle console sbi call, which implement early > console io while apply 'earlycon=sbi' into kernel parameters. > > Signed-off-by: Yifei Jiang > Signed-off-by: Mingwang Li > --- > target/riscv/kvm.c | 42 - > target/riscv/sbi_ecall_interface.h | 72 ++ > 2 files changed, 113 insertions(+), 1 deletion(-) > create mode 100644 target/riscv/sbi_ecall_interface.h > > diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c > index 8da2648d1a..6d419ba02e 100644 > --- a/target/riscv/kvm.c > +++ b/target/riscv/kvm.c > @@ -38,6 +38,8 @@ > #include "qemu/log.h" > #include "hw/loader.h" > #include "kvm_riscv.h" > +#include "sbi_ecall_interface.h" > +#include "chardev/char-fe.h" > > static uint64_t kvm_riscv_reg_id(CPURISCVState *env, uint64_t type, uint64_t > idx) > { > @@ -440,9 +442,47 @@ bool kvm_arch_stop_on_emulation_error(CPUState *cs) > return true; > } > > +static int kvm_riscv_handle_sbi(struct kvm_run *run) > +{ > +int ret = 0; > +unsigned char ch; > +switch (run->riscv_sbi.extension_id) { > +case SBI_EXT_0_1_CONSOLE_PUTCHAR: > +ch = run->riscv_sbi.args[0]; > +qemu_chr_fe_write(serial_hd(0)->be, &ch, sizeof(ch)); > +break; > +case SBI_EXT_0_1_CONSOLE_GETCHAR: > +ret = qemu_chr_fe_read_all(serial_hd(0)->be, &ch, sizeof(ch)); > +if (ret == sizeof(ch)) { > +run->riscv_sbi.args[0] = ch; > +} else { > +run->riscv_sbi.args[0] = -1; > +} > +break; Shouldn't this code use the Semihosting Console API from "semihosting/console.h" instead?
Re: [PATCH 03/11] multifd: Fill offset and block for reception
On 11/19/21 5:58 PM, Juan Quintela wrote: We were using the iov directly, but we will need this info on the following patch. Signed-off-by: Juan Quintela --- migration/multifd.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/migration/multifd.c b/migration/multifd.c index 7c9deb1921..e2adcdffa1 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -364,6 +364,8 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp) offset, block->used_length); return -1; } +p->pages->offset[i] = offset; +p->pages->block = block; p->pages->iov[i].iov_base = block->host + offset; p->pages->iov[i].iov_len = qemu_target_page_size(); } Block should be stored once, outside the loop. r~
Re: [PATCH 06/11] migration: Move iov from pages to params
On 11/19/21 5:58 PM, Juan Quintela wrote: static int nocomp_send_prepare(MultiFDSendParams *p, uint32_t used, Error **errp) { +MultiFDPages_t *pages = p->pages; + +for (int i = 0; i < used; i++) { +p->iov[p->iovs_used].iov_base = pages->block->host + pages->offset[i]; +p->iov[p->iovs_used].iov_len = qemu_target_page_size(); +p->iovs_used++; +} + p->next_packet_size = used * qemu_target_page_size(); Compute qemu_target_page_size once in the function. Hoist p->iovs_used to a local variable around the loop. @@ -154,7 +162,11 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, uint32_t used, Error **errp) p->id, flags, MULTIFD_FLAG_NOCOMP); return -1; } -return qio_channel_readv_all(p->c, p->pages->iov, used, errp); +for (int i = 0; i < p->pages->used; i++) { +p->iov[i].iov_base = p->pages->block->host + p->pages->offset[i]; +p->iov[i].iov_len = qemu_target_page_size(); +} Similarly. r~
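The restructure suggested in the two reviews above (compute the page size once, keep the iov index in a local variable around the loop) might look like the following self-contained sketch; the names mirror the multifd code but are simplified stand-ins, not the actual QEMU types:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define PAGE_SIZE 4096   /* stand-in for qemu_target_page_size() */

/*
 * Fill iov[] entries for `used` pages starting at index iovs_used;
 * returns the new number of iovs in use.  The loop invariants
 * (page size, running index) are hoisted out of the loop, as the
 * review suggests.
 */
static size_t fill_iovs(struct iovec *iov, size_t iovs_used,
                        unsigned char *host, const size_t *offset,
                        size_t used)
{
    size_t page_size = PAGE_SIZE;   /* computed once */
    size_t n = iovs_used;           /* local copy, stored back at the end */

    for (size_t i = 0; i < used; i++) {
        iov[n].iov_base = host + offset[i];
        iov[n].iov_len = page_size;
        n++;
    }
    return n;
}
```

Besides avoiding the repeated function call, keeping the index in a local lets the compiler hold it in a register instead of reloading it through the params pointer on every iteration.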
Re: [RFC PATCH] gdbstub: handle a potentially racing TaskState
On 11/19/21 3:51 PM, Alex Bennée wrote: When dealing with multi-threaded userspace programs there is a race condition with the addition of cpu->opaque (aka TaskState). This is due to cpu_copy calling cpu_create which updates the global vCPU list. However the task state isn't set until later. This shouldn't be a problem because the new thread can't have executed anything yet but the gdbstub code does liberally iterate through the CPU list in various places. This sticking plaster ensures the not yet fully realized vCPU is given a pid of -1 which should be enough to ensure it doesn't show up anywhere else. In the longer term I think the code that manages the association between vCPUs and attached GDB processes could do with a clean-up and re-factor. Signed-off-by: Alex Bennée Cc: Richard Henderson Resolves: https://gitlab.com/qemu-project/qemu/-/issues/730 --- gdbstub.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gdbstub.c b/gdbstub.c index 23baaef40e..141d7bc4ec 100644 --- a/gdbstub.c +++ b/gdbstub.c @@ -94,7 +94,7 @@ static inline int cpu_gdb_index(CPUState *cpu) { #if defined(CONFIG_USER_ONLY) TaskState *ts = (TaskState *) cpu->opaque; -return ts->ts_tid; +return ts ? ts->ts_tid : -1; #else return cpu->cpu_index + 1; #endif Tested-by: Richard Henderson Reviewed-by: Richard Henderson r~
Re: [RFC v2 PATCH 07/13] KVM: Handle page fault for fd based memslot
On Fri, Nov 19, 2021 at 09:47:33PM +0800, Chao Peng wrote: > Current code assume the private memory is persistent and KVM can check > with backing store to see if private memory exists at the same address > by calling get_pfn(alloc=false). > > Signed-off-by: Yu Zhang > Signed-off-by: Chao Peng > --- > arch/x86/kvm/mmu/mmu.c | 75 -- > 1 file changed, 73 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index 40377901598b..cd5d1f923694 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c > @@ -3277,6 +3277,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, > if (max_level == PG_LEVEL_4K) > return PG_LEVEL_4K; > > + if (memslot_is_memfd(slot)) > + return max_level; > + > host_level = host_pfn_mapping_level(kvm, gfn, pfn, slot); > return min(host_level, max_level); > } > @@ -4555,6 +4558,65 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu > *vcpu, gpa_t cr2_or_gpa, > kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch); > } > > +static bool kvm_faultin_pfn_memfd(struct kvm_vcpu *vcpu, > + struct kvm_page_fault *fault, int *r) > +{int order; > + kvm_pfn_t pfn; > + struct kvm_memory_slot *slot = fault->slot; > + bool priv_gfn = kvm_vcpu_is_private_gfn(vcpu, fault->addr >> > PAGE_SHIFT); > + bool priv_slot_exists = memslot_has_private(slot); > + bool priv_gfn_exists = false; > + int mem_convert_type; > + > + if (priv_gfn && !priv_slot_exists) { > + *r = RET_PF_INVALID; > + return true; > + } > + > + if (priv_slot_exists) { > + pfn = slot->memfd_ops->get_pfn(slot, slot->priv_file, > +fault->gfn, false, &order); > + if (pfn >= 0) > + priv_gfn_exists = true; Need "fault->pfn = pfn" here if actual pfn is returned in get_pfn(alloc=false) case for private page case. 
> + } > + > + if (priv_gfn && !priv_gfn_exists) { > + mem_convert_type = KVM_EXIT_MEM_MAP_PRIVATE; > + goto out_convert; > + } > + > + if (!priv_gfn && priv_gfn_exists) { > + slot->memfd_ops->put_pfn(pfn); > + mem_convert_type = KVM_EXIT_MEM_MAP_SHARED; > + goto out_convert; > + } > + > + if (!priv_gfn) { > + pfn = slot->memfd_ops->get_pfn(slot, slot->file, > +fault->gfn, true, &order); Need "fault->pfn = pfn" here as well, because the pfn for the shared page is only obtained here. > + if (fault->pfn < 0) { > + *r = RET_PF_INVALID; > + return true; > + } > + } > + > + if (slot->flags & KVM_MEM_READONLY) > + fault->map_writable = false; > + if (order == 0) > + fault->max_level = PG_LEVEL_4K; > + > + return false; > + > +out_convert: > + vcpu->run->exit_reason = KVM_EXIT_MEMORY_ERROR; > + vcpu->run->mem.type = mem_convert_type; > + vcpu->run->mem.u.map.gpa = fault->gfn << PAGE_SHIFT; > + vcpu->run->mem.u.map.size = PAGE_SIZE; > + fault->pfn = -1; > + *r = -1; > + return true; > +} > + > static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault > *fault, int *r) > { > struct kvm_memory_slot *slot = fault->slot; > @@ -4596,6 +4658,9 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, > struct kvm_page_fault *fault, > } > } > > + if (memslot_is_memfd(slot)) > + return kvm_faultin_pfn_memfd(vcpu, fault, r); > + > async = false; > fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async, > fault->write, &fault->map_writable, > @@ -4660,7 +4725,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, > struct kvm_page_fault *fault > else > write_lock(&vcpu->kvm->mmu_lock); > > - if (fault->slot && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, > fault->hva)) > + if (fault->slot && !memslot_is_memfd(fault->slot) && > + mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva)) > goto out_unlock; > r = make_mmu_pages_available(vcpu); > if (r) > @@ -4676,7 +4742,12 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, > struct kvm_page_fault *fault > 
read_unlock(&vcpu->kvm->mmu_lock); > else > write_unlock(&vcpu->kvm->mmu_lock); > - kvm_release_pfn_clean(fault->pfn); > + > + if (memslot_is_memfd(fault->slot)) > + fault->slot->memfd_ops->put_pfn(fault->pfn); > + else > + kvm_release_pfn_clean(fault->pfn); > + > return r; > } > > -- > 2.17.1 >
[PATCH] MAINTAINERS: Add myself as a reviewer for Hyper-V VMBus
From: "Maciej S. Szmigiero" This way there is at least some contact point for incoming patches. We'll see whether the code still gets just a random patch a few times a year or whether it requires a permanent maintainer to take care of it. Signed-off-by: Maciej S. Szmigiero --- MAINTAINERS | 6 ++ 1 file changed, 6 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index d3879aa3c12c..7f57e7fda73b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1729,6 +1729,12 @@ F: include/hw/block/fdc.h F: tests/qtest/fdc-test.c T: git https://gitlab.com/jsnow/qemu.git ide +Hyper-V VMBus +S: Odd Fixes +R: Maciej S. Szmigiero +F: hw/hyperv/vmbus.c +F: include/hw/hyperv/vmbus*.h + OMAP M: Peter Maydell L: qemu-...@nongnu.org
Re: [PATCH v1 0/9] hw/mos6522: VIA timer emulation fixes and improvements
On 19/11/2021 08:39, Finn Thain wrote: On Thu, 18 Nov 2021, Mark Cave-Ayland wrote: Hi Finn, I've not forgotten about this series - we're now in 6.2 freeze, but it's on my TODO list to revisit in the next development cycle this along with the ESP stress-ng changes which I've also been looking at. As mentioned in my previous reviews the patch will need some further analysis: particularly the logic in mos6522_read() that can generate a spurious interrupt on a register read needs to be removed, If mos6522 fails to raise its IRQ then the counter value observed by the guest will not make sense. This relationship was explained in the description of patch 8. If you have a problem with that patch, please reply there so that your misapprehension can be placed in context. It is the existing code in mos6522_read() which is doing the wrong thing here, which I mentioned in https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg02883.html. and also testing is required to ensure that these changes don't affect the CUDA clock warping which allows OS X to boot under qemu-system-ppc. I'm not expecting any issues. What is required in order to confirm this? Would it be sufficient to boot a Mac OS X 10.4 installer DVD? Possibly: I've only been testing various images since after the timing workaround was added so I'm not sure exactly what the symptoms are, but the links sent earlier in the thread are still valid. I'm confident that qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) is monotonic, As I mentioned, it is monotonic here. Phew :) since if it were not then there would be huge numbers of complaints from QEMU users. It appears that Linux can potentially alter the ticks in mac_read_clk() at https://github.com/torvalds/linux/blob/master/arch/m68k/mac/via.c#L624 which suggests the issue is related to timer wraparound. I'd like to confirm exactly which part of your series fixes the specific problem of the clock jumping backwards before merging these changes. 
You really only need a good datasheet to review these patches. You don't need a deep understanding of any particular guest, and you shouldn't be targeting any particular guest. One of the purposes of this patch series is to allow the guest to change to better exploit actual, physical hardware. Since QEMU regression testing is part of the kernel development process, regressions need to be avoided. That means QEMU's shortcomings hinder Linux development. Therefore, QEMU should not target the via timer driver in Linux v2.6 or the one in v5.15 or the one in NetBSD etc. QEMU should target correctness -- especially when that can be had for cheap. Wouldn't you agree? QEMU deviates in numerous ways from the behaviour of the real mos6522 timer. This patch series does not address all of these deviations (see patch 8). Instead, this patch series addresses only the most egregious ones, and they do impact existing guests. That is true, but as per the link above there is an existing bug in the mos6522 device, and the patchset builds on this in its current form, including adding a state field which shouldn't be required. From looking at mac_read_clk() presumably the problem here is that the timer IRQ fires on 0 rather than on 0x when it overflows? If so, this should be a very small and simple patch. Once these 2 fixes are in place, it will be much easier to test the remaining changes. I realized that I could easily cross-compile a 5.14 kernel to test this theory with the test root image and .config you supplied at https://gitlab.com/qemu-project/qemu/-/issues/611 using the QEMU docker-m68k-cross image to avoid having to build a complete toolchain by hand. The kernel builds successfully using this method, but during boot it hangs sending the first SCSI CDB to the ESP device, failing to send the last byte using PDMA. Are there known issues cross-compiling an m68k kernel on an x86 host? Not that I'm aware of. 
Or are your current kernels being built from a separate branch outside of mainline Linux git? I use mainline Linux when testing QEMU. I provided a mainline build, attached to the same bug report, so you don't have to build it. The problem here is that I have no way to reproduce your results and test any patches other than to try and build a kernel with your extra warning and run it. Even then I don't know how long to wait for the clock to jump, how much it jumps by, or if there is anything else that needs to be done to trigger the warning. Any help with being able to build a working cross-m68k kernel to test this would be gratefully received, otherwise I don't feel I have enough knowledge of the m68k kernel to be able to validate the fixes and review the changes in order to merge them. ATB, Mark.
Re: [PATCH v1 03/12] target/riscv: Implement function kvm_arch_init_vcpu
On 11/20/21 8:46 AM, Yifei Jiang wrote: +id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CONFIG, KVM_REG_RISCV_CONFIG_REG(isa)); +ret = kvm_get_one_reg(cs, id, &isa); +if (ret) { +return ret; +} +env->misa_mxl |= isa; This doesn't look right. I'm sure you meant env->misa_ext = isa; r~
Re: [PATCH v1 12/12] target/riscv: Support virtual time context synchronization
On 11/20/21 8:46 AM, Yifei Jiang wrote: const VMStateDescription vmstate_riscv_cpu = { .name = "cpu", .version_id = 3, .minimum_version_id = 3, +.post_load = cpu_post_load, .fields = (VMStateField[]) { VMSTATE_UINTTL_ARRAY(env.gpr, RISCVCPU, 32), VMSTATE_UINT64_ARRAY(env.fpr, RISCVCPU, 32), @@ -211,6 +221,10 @@ const VMStateDescription vmstate_riscv_cpu = { VMSTATE_UINT64(env.mtohost, RISCVCPU), VMSTATE_UINT64(env.timecmp, RISCVCPU), +VMSTATE_UINT64(env.kvm_timer_time, RISCVCPU), +VMSTATE_UINT64(env.kvm_timer_compare, RISCVCPU), +VMSTATE_UINT64(env.kvm_timer_state, RISCVCPU), + VMSTATE_END_OF_LIST() }, Can't alter VMStateDescription.fields without bumping version. If this is really kvm-only state, consider placing it into a subsection. But I worry about kvm-only state because ideally we'd be able to migrate between tcg and kvm (if only for debugging). r~
Re: [PATCH v1 0/9] hw/mos6522: VIA timer emulation fixes and improvements
On Sat, 20 Nov 2021, Mark Cave-Ayland wrote: > On 19/11/2021 08:39, Finn Thain wrote: > > > On Thu, 18 Nov 2021, Mark Cave-Ayland wrote: > > > >> > >> Hi Finn, > >> > >> I've not forgotten about this series - we're now in 6.2 freeze, but it's > >> on my TODO list to revisit in the next development cycle this along with > >> the ESP stress-ng changes which I've also been looking at. As mentioned > >> in my previous reviews the patch will need some further analysis: > >> particularly the logic in mos6522_read() that can generate a spurious > >> interrupt on a register read needs to be removed, > > > > If mos6522 fails to raise its IRQ then the counter value observed by the > > guest will not make sense. This relationship was explained in the > > description of patch 8. If you have a problem with that patch, please > > reply there so that your misapprehension can be placed in context. > > It is the existing code in mos6522_read() which is doing the wrong thing here, > which I mentioned in > https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg02883.html. > How disingenuous. I responded to that message 2 months ago and you completely ignored my response. Basically, you found a bug in your own modified version of mainline code, and you claimed that this is somehow relevant to this patch series. (Apparently you failed to find that bug in my code.) Once again, if you have an objection to existing code in mainline QEMU, please take it up with the author of that code (commit cd8843ff25) which is Laurent. This patch series is a separate issue. It doesn't add anything objectionable (commit cd8843ff25 was not objected to), it fixes bugs, improves emulation fidelity, improves performance and reduces guest malfunctions. > > That is true, but as per the link above there is an existing bug in the > mos6522 device, and the patchset builds on this in its current form, > including adding a state field which shouldn't be required. 
The state enum is required for the oneshot feature (patch 9). It is also
needed to produce the correct relationship between irq and counter (patch
8), and between interrupt flag set and clear operations. Finally, it is
also useful for debugging purposes.

> From looking at mac_read_clk() presumably the problem here is that the
> timer IRQ fires on 0 rather than on 0x when it overflows?

Guests depend on the correct relationship between timer irq flag and
timer counter value. If QEMU can't get that right, the Linux clocksource
can't work without race conditions. This problem is almost certain to
affect other guests too. You are being foolish to insist that this is
somehow a Linux quirk.

> If so, this should be a very small and simple patch. Once these 2 fixes
> are in place, it will be much easier to test the remaining changes.
>
> > > I realized that I could easily cross-compile a 5.14 kernel to test
> > > this theory with the test root image and .config you supplied at
> > > https://gitlab.com/qemu-project/qemu/-/issues/611 using the QEMU
> > > docker-m68k-cross image to avoid having to build a complete toolchain
> > > by hand. The kernel builds successfully using this method, but during
> > > boot it hangs sending the first SCSI CDB to the ESP device, failing
> > > to send the last byte using PDMA.
> > >
> > > Are there known issues cross-compiling an m68k kernel on an x86 host?
> >
> > Not that I'm aware of.
> >
> > > Or are your current kernels being built from a separate branch
> > > outside of mainline Linux git?
> >
> > I use mainline Linux when testing QEMU. I provided a mainline build,
> > attached to the same bug report, so you don't have to build it.
>
> The problem here is that I have no way to reproduce your results and
> test any patches other than to try and build a kernel with your extra
> warning and run it.

Nonsense. Any programmer can easily observe the gettimeofday problem.
Just run it in a loop. (How else would you establish monotonicity?)
Similarly, anyone who can understand mos6522.c and can read patches could
easily add assertions to establish any of the deficiencies claimed in
these patches.

The problem isn't that you "have no way to reproduce results". The
problem is that you are unwilling or unable to understand the datasheet
and the patches.

> Even then I don't know how long to wait for the clock to jump, how much
> it jumps by, or if there is anything else that needs to be done to
> trigger the warning.
>
> Any help with being able to build a working cross-m68k kernel to test
> this would be gratefully received, otherwise I don't feel I have enough
> knowledge of the m68k kernel to be able to validate the fixes and review
> the changes in order to merge them.

I've already helped you by supplying a mainline vmlinux binary. But you
don't even need that. If you don't believe what I've said about mos6522.c
behaviour, just install Debian/m68k.

Anyway, thanks for taking the time to write. A competent reviewe
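[Editor's note: the "run it in a loop" monotonicity test referred to above can be sketched as a small user-space check. This is an illustrative harness, not code from the patch series; the function name and iteration count are arbitrary.]

```c
#include <stdio.h>
#include <sys/time.h>

/* Sample gettimeofday() repeatedly and verify the clock never moves
 * backwards.  On a guest whose timer IRQ/counter relationship is wrong,
 * the clocksource can race and this check eventually fails. */
int check_monotonic(long iterations)
{
    struct timeval prev, cur;

    gettimeofday(&prev, NULL);
    for (long i = 0; i < iterations; i++) {
        gettimeofday(&cur, NULL);
        /* Compare (sec, usec) lexicographically. */
        if (cur.tv_sec < prev.tv_sec ||
            (cur.tv_sec == prev.tv_sec && cur.tv_usec < prev.tv_usec)) {
            fprintf(stderr, "time went backwards: %ld.%06ld -> %ld.%06ld\n",
                    (long)prev.tv_sec, (long)prev.tv_usec,
                    (long)cur.tv_sec, (long)cur.tv_usec);
            return -1;
        }
        prev = cur;
    }
    return 0;
}
```

On a correct clocksource this returns 0 no matter how long it runs; a single backwards jump is direct evidence of the counter/IRQ-flag mismatch being discussed.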
Re: [RFC v2 PATCH 01/13] mm/shmem: Introduce F_SEAL_GUEST
On Sat, Nov 20, 2021 at 01:23:16AM +, Sean Christopherson wrote:
> On Fri, Nov 19, 2021, Jason Gunthorpe wrote:
> > On Fri, Nov 19, 2021 at 10:21:39PM +, Sean Christopherson wrote:
> > > On Fri, Nov 19, 2021, Jason Gunthorpe wrote:
> > > > On Fri, Nov 19, 2021 at 07:18:00PM +, Sean Christopherson wrote:
> > > > > No ideas for the kernel API, but that's also less concerning since
> > > > > it's not set in stone. I'm also not sure that dedicated APIs for
> > > > > each high-ish level use case would be a bad thing, as the
> > > > > semantics are unlikely to be different to some extent. E.g. for
> > > > > the KVM use case, there can be at most one guest associated with
> > > > > the fd, but there can be any number of VFIO devices attached to
> > > > > the fd.
> > > >
> > > > Even the kvm thing is not a hard restriction when you take away
> > > > confidential compute.
> > > >
> > > > Why can't we have multiple KVMs linked to the same FD if the memory
> > > > isn't encrypted? Sure it isn't actually useful but it should work
> > > > fine.
> > >
> > > Hmm, true, but I want the KVM semantics to be 1:1 even if memory
> > > isn't encrypted.
> >
> > That is policy and it doesn't belong hardwired into the kernel.
>
> Agreed. I had a blurb typed up about that policy just being an
> "exclusive" flag in the kernel API that KVM would set when creating a
> confidential VM,

I still think that is policy in the kernel, what is wrong with userspace
doing it?

> > Your explanation makes me think that the F_SEAL_XX isn't defined
> > properly. It should be a userspace trap door to prevent any new
> > external accesses, including establishing new kvms, iommu's, rdmas,
> > mmaps, read/write, etc.
>
> Hmm, the way I was thinking of it is that it the F_SEAL_XX itself would
> prevent mapping/accessing it from userspace, and that any policy beyond
> that would be done via kernel APIs and thus handled by whatever
> in-kernel agent can access the memory. E.g. in the confidential VM case,
> without support for trusted devices, KVM would require that it be the
> sole owner of the file.

And how would kvm know if there is support for trusted devices?

Again seems like policy choices that should be left in userspace.
Especially for what could be a general in-kernel mechanism with many
users and not tightly linked to KVM as imagined here.

Jason
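[Editor's note: for context, the mechanism being extended in this thread is the existing memfd sealing interface. The sketch below uses only seals that already exist in mainline; F_SEAL_GUEST itself is the proposal under discussion and is deliberately not used.]

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a memfd and seal it against future writes and resizing.
 * Returns the sealed fd, or -1 on error.  The proposed F_SEAL_GUEST
 * would be one more bit in the same F_ADD_SEALS mask, cutting off new
 * userspace accesses while in-kernel consumers (KVM, VFIO, ...) keep
 * access through kernel APIs. */
int make_sealed_memfd(size_t size)
{
    int fd = memfd_create("guest-mem", MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, size) < 0)
        goto err;
    /* Once added, seals can never be removed for the life of the file. */
    if (fcntl(fd, F_ADD_SEALS,
              F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE) < 0)
        goto err;
    return fd;
err:
    close(fd);
    return -1;
}
```

The "userspace trap door" framing above maps naturally onto this model: sealing is one-way, so once the flag is set no new mappings or accesses of the sealed kind can ever be established.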