date:20220208

[PATCH v6 3/8] tcg/sparc: Add scratch argument to tcg_out_movi_int

2022-02-08 Richard Henderson

This will allow us to control exactly what scratch register is
used for loading the constant.

Signed-off-by: Richard Henderson 
---
 tcg/sparc/tcg-target.c.inc | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/tcg/sparc/tcg-target.c.inc b/tcg/sparc/tcg-target.c.inc
index 576903e0d8..7b970d58e3 100644
--- a/tcg/sparc/tcg-target.c.inc
+++ b/tcg/sparc/tcg-target.c.inc
@@ -428,7 +428,8 @@ static void tcg_out_movi_imm32(TCGContext *s, TCGReg ret, int32_t arg)
 }
 
 static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
- tcg_target_long arg, bool in_prologue)
+ tcg_target_long arg, bool in_prologue,
+ TCGReg scratch)
 {
 tcg_target_long hi, lo = (int32_t)arg;
 tcg_target_long test, lsb;
@@ -483,16 +484,17 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
 } else {
 hi = arg >> 32;
 tcg_out_movi_imm32(s, ret, hi);
-tcg_out_movi_imm32(s, TCG_REG_T2, lo);
+tcg_out_movi_imm32(s, scratch, lo);
 tcg_out_arithi(s, ret, ret, 32, SHIFT_SLLX);
-tcg_out_arith(s, ret, ret, TCG_REG_T2, ARITH_OR);
+tcg_out_arith(s, ret, ret, scratch, ARITH_OR);
 }
 }
 
 static void tcg_out_movi(TCGContext *s, TCGType type,
  TCGReg ret, tcg_target_long arg)
 {
-tcg_out_movi_int(s, type, ret, arg, false);
+tcg_debug_assert(ret != TCG_REG_T2);
+tcg_out_movi_int(s, type, ret, arg, false, TCG_REG_T2);
 }
 
 static void tcg_out_ldst_rr(TCGContext *s, TCGReg data, TCGReg a1,
@@ -847,7 +849,7 @@ static void tcg_out_call_nodelay(TCGContext *s, const tcg_insn_unit *dest,
 } else {
 uintptr_t desti = (uintptr_t)dest;
 tcg_out_movi_int(s, TCG_TYPE_PTR, TCG_REG_T1,
- desti & ~0xfff, in_prologue);
+ desti & ~0xfff, in_prologue, TCG_REG_O7);
 tcg_out_arithi(s, TCG_REG_O7, TCG_REG_T1, desti & 0xfff, JMPL);
 }
 }
@@ -1023,7 +1025,8 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 
 #ifndef CONFIG_SOFTMMU
 if (guest_base != 0) {
-tcg_out_movi_int(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, guest_base, true);
+tcg_out_movi_int(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG,
+ guest_base, true, TCG_REG_T1);
 tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
 }
 #endif
-- 
2.25.1
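A small C model of the final "else" branch of tcg_out_movi_int() shows why the scratch register must differ from ret, which is what the new tcg_debug_assert(ret != TCG_REG_T2) in tcg_out_movi() guards. The register array, the register indices and the model_* helpers are hypothetical, and the sketch assumes the slow path of tcg_out_movi_imm32() (sethi + or) leaves a zero-extended 32-bit value in its destination, as sethi does on SPARC V9.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file and indices; not QEMU's TCGReg values. */
static uint64_t regs[32];
#define MODEL_REG_T2 15   /* plays the role of TCG_REG_T2 */

/* Slow path of tcg_out_movi_imm32(): sethi + or, assumed to leave a
 * zero-extended 32-bit value in rd. */
static void model_movi_imm32(int rd, uint32_t val)
{
    regs[rd] = val;
}

/* Final "else" branch of tcg_out_movi_int() with an explicit scratch. */
static void model_movi_int(int ret, uint64_t arg, int scratch)
{
    model_movi_imm32(ret, (uint32_t)(arg >> 32));  /* ret = hi */
    model_movi_imm32(scratch, (uint32_t)arg);      /* scratch = lo */
    regs[ret] <<= 32;                              /* sllx ret, 32, ret */
    regs[ret] |= regs[scratch];                    /* or ret, scratch, ret */
}

/* Mirrors the new tcg_out_movi(): the fixed scratch must not be ret. */
static void model_movi(int ret, uint64_t arg)
{
    assert(ret != MODEL_REG_T2);
    model_movi_int(ret, arg, MODEL_REG_T2);
}
```

With ret == scratch, the low half overwrites the high half before the shift, so model_movi_int(8, 0x123456789abcdef0, 8) yields 0x9abcdef000000000 instead of the requested constant.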

Re: [PATCH v9 00/23] QEMU RISC-V AIA support

2022-02-08 Anup Patel

On Tue, Feb 8, 2022 at 12:27 PM Alistair Francis  wrote:
>
> On Tue, Feb 8, 2022 at 2:16 PM Alistair Francis  wrote:
> >
> > On Sat, Feb 5, 2022 at 3:47 AM Anup Patel  wrote:
> > >
> > > From: Anup Patel 
> > >
> > > The advanced interrupt architecture (AIA) extends the per-HART local
> > > interrupt support. Along with this, it also adds IMSIC (MSI controller)
> > > and Advanced PLIC (wired interrupt controller).
> > >
> > > The latest AIA draft specification can be found here:
> > > https://github.com/riscv/riscv-aia/releases/download/0.2-draft.28/riscv-interrupts-028.pdf
> > >
> > > This series adds RISC-V AIA support in QEMU which includes emulating all
> > > AIA local CSRs, APLIC, and IMSIC. Only AIA local interrupt filtering is
> > > not implemented because we don't have any local interrupt greater than 12.
> > >
> > > To enable AIA in QEMU, use one of the following:
> > > 1) Only AIA local interrupt CSRs: Pass "x-aia=true" as CPU parameter
> > >    in the QEMU command-line
> > > 2) Only APLIC for virt machine: Pass "aia=aplic" as machine parameter
> > >    in the QEMU command-line
> > > 3) Both APLIC and IMSIC for virt machine: Pass "aia=aplic-imsic" as
> > >    machine parameter in the QEMU command-line
> > > 4) Both APLIC and IMSIC with 2 guest files for virt machine: Pass
> > >    "aia=aplic-imsic,aia-guests=2" as machine parameter in the QEMU
> > >    command-line
> > >
> > > To test this series, we require OpenSBI and Linux with AIA support, which can
> > > be found in:
> > > riscv_aia_v2 branch at https://github.com/avpatel/opensbi.git
> > > riscv_aia_v1 branch at https://github.com/avpatel/linux.git
> > >
> > > This series can be found in the riscv_aia_v9 branch at:
> > > https://github.com/avpatel/qemu.git
> > >
> > > Changes since v8:
> > >  - Use error_setg() in riscv_imsic_realize() added by PATCH20
> > >
> > > Changes since v7:
> > >  - Rebased on latest riscv-to-apply.next branch of Alistair's repo
> > >  - Improved default priority assignment in PATCH9
> > >
> > > Changes since v6:
> > >  - Fixed priority comparison in riscv_cpu_pending_to_irq() of PATCH9
> > >  - Fixed typos in comments added by PATCH11
> > >  - Added "pend = true;" for CSR_MSETEIPNUM case of rmw_xsetclreinum()
> > >    in PATCH15
> > >  - Handle ithreshold == 0 case in riscv_aplic_idc_topi() of PATCH18
> > >  - Allow setting pending bit for Level0 or Level1 interrupts in
> > >    riscv_aplic_set_pending() of PATCH18
> > >  - Force DOMAINCFG[31:24] bits to 0x80 in riscv_aplic_read() of PATCH18
> > >  - For APLIC direct mode, set target.iprio to 1 when zero is written
> > >    in PATCH18
> > >  - Handle eithreshold == 0 case in riscv_imsic_topei() of PATCH20
> > >
> > > Changes since v5:
> > >  - Moved VSTOPI_NUM_SRCS define to top of the file in PATCH13
> > >  - Fixed typo in PATCH16
> > >
> > > Changes since v4:
> > >  - Changed IRQ_LOCAL_MAX to 16 in PATCH2
> > >  - Fixed typo in PATCH10
> > >  - Replaced TARGET_LONG_BITS with riscv_cpu_mxl_bits(env) in PATCH11
> > >  - Replaced TARGET_LONG_BITS with riscv_cpu_mxl_bits(env) in PATCH14
> > >  - Replaced TARGET_LONG_BITS with riscv_cpu_mxl_bits(env) in PATCH15
> > >  - Replaced TARGET_LONG_BITS with xlen passed via ireg callback in PATCH20
> > >  - Restrict maximum IMSIC guest files per-HART of virt machine to 7 in
> > >    PATCH21.
> > >  - Added separate PATCH23 to increase maximum number of allowed CPUs
> > >    for virt machine
> > >
> > > Changes since v3:
> > >  - Replaced "aplic,xyz" and "imsic,xyz" DT properties with "riscv,xyz"
> > >    DT properties because "aplic" and "imsic" are not valid vendor names
> > >    required by Linux DT schema checker.
> > >
> > > Changes since v2:
> > >  - Update PATCH4 to check and inject interrupt after V=1 when
> > >    transitioning from V=0 to V=1
> > >
> > > Changes since v1:
> > >  - Revamped whole series and created more granular patches
> > >  - Added HGEIE and HGEIP CSR emulation for H-extension
> > >  - Added APLIC emulation
> > >  - Added IMSIC emulation
> > >
> > > Anup Patel (23):
> > >   target/riscv: Fix trap cause for RV32 HS-mode CSR access from RV64
> > > HS-mode
> > >   target/riscv: Implement SGEIP bit in hip and hie CSRs
> > >   target/riscv: Implement hgeie and hgeip CSRs
> > >   target/riscv: Improve delivery of guest external interrupts
> > >   target/riscv: Allow setting CPU feature from machine/device emulation
> > >   target/riscv: Add AIA cpu feature
> > >   target/riscv: Add defines for AIA CSRs
> > >   target/riscv: Allow AIA device emulation to set ireg rmw callback
> > >   target/riscv: Implement AIA local interrupt priorities
> > >   target/riscv: Implement AIA CSRs for 64 local interrupts on RV32
> > >   target/riscv: Implement AIA hvictl and hviprioX CSRs
> > >   target/riscv: Implement AIA interrupt filtering CSRs
> > >   target/riscv: Implement AIA mtopi, stopi, and vstopi CSRs
> > >   target/riscv: Implement AIA xiselect and xireg CSRs
> > >   target/riscv: Implement AIA IMSIC interface CSRs
> > >

[PATCH 4/5] linux-user: Move sparc/host-signal.h to sparc64/host-signal.h

2022-02-08 Richard Henderson

We do not support sparc32 as a host, so there's no point in
sparc64 redirecting to sparc.

Signed-off-by: Richard Henderson 
---
 linux-user/include/host/sparc/host-signal.h   | 71 ---
 linux-user/include/host/sparc64/host-signal.h | 64 -
 2 files changed, 63 insertions(+), 72 deletions(-)
 delete mode 100644 linux-user/include/host/sparc/host-signal.h

diff --git a/linux-user/include/host/sparc/host-signal.h b/linux-user/include/host/sparc/host-signal.h
deleted file mode 100644
index 871b6bb269..00
--- a/linux-user/include/host/sparc/host-signal.h
+++ /dev/null
@@ -1,71 +0,0 @@
-/*
- * host-signal.h: signal info dependent on the host architecture
- *
- * Copyright (c) 2003-2005 Fabrice Bellard
- * Copyright (c) 2021 Linaro Limited
- *
- * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
- * See the COPYING file in the top-level directory.
- */
-
-#ifndef SPARC_HOST_SIGNAL_H
-#define SPARC_HOST_SIGNAL_H
-
-/* FIXME: the third argument to a SA_SIGINFO handler is *not* ucontext_t. */
-typedef ucontext_t host_sigcontext;
-
-static inline uintptr_t host_signal_pc(host_sigcontext *uc)
-{
-#ifdef __arch64__
-return uc->uc_mcontext.mc_gregs[MC_PC];
-#else
-return uc->uc_mcontext.gregs[REG_PC];
-#endif
-}
-
-static inline void host_signal_set_pc(host_sigcontext *uc, uintptr_t pc)
-{
-#ifdef __arch64__
-uc->uc_mcontext.mc_gregs[MC_PC] = pc;
-#else
-uc->uc_mcontext.gregs[REG_PC] = pc;
-#endif
-}
-
-static inline void *host_signal_mask(host_sigcontext *uc)
-{
-return &uc->uc_sigmask;
-}
-
-static inline bool host_signal_write(siginfo_t *info, host_sigcontext *uc)
-{
-uint32_t insn = *(uint32_t *)host_signal_pc(uc);
-
-if ((insn >> 30) == 3) {
-switch ((insn >> 19) & 0x3f) {
-case 0x05: /* stb */
-case 0x15: /* stba */
-case 0x06: /* sth */
-case 0x16: /* stha */
-case 0x04: /* st */
-case 0x14: /* sta */
-case 0x07: /* std */
-case 0x17: /* stda */
-case 0x0e: /* stx */
-case 0x1e: /* stxa */
-case 0x24: /* stf */
-case 0x34: /* stfa */
-case 0x27: /* stdf */
-case 0x37: /* stdfa */
-case 0x26: /* stqf */
-case 0x36: /* stqfa */
-case 0x25: /* stfsr */
-case 0x3c: /* casa */
-case 0x3e: /* casxa */
-return true;
-}
-}
-return false;
-}
-
-#endif
diff --git a/linux-user/include/host/sparc64/host-signal.h b/linux-user/include/host/sparc64/host-signal.h
index 1191fe2d40..f8a8a4908d 100644
--- a/linux-user/include/host/sparc64/host-signal.h
+++ b/linux-user/include/host/sparc64/host-signal.h
@@ -1 +1,63 @@
-#include "../sparc/host-signal.h"
+/*
+ * host-signal.h: signal info dependent on the host architecture
+ *
+ * Copyright (c) 2003-2005 Fabrice Bellard
+ * Copyright (c) 2021 Linaro Limited
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef SPARC64_HOST_SIGNAL_H
+#define SPARC64_HOST_SIGNAL_H
+
+/* FIXME: the third argument to a SA_SIGINFO handler is *not* ucontext_t. */
+typedef ucontext_t host_sigcontext;
+
+static inline uintptr_t host_signal_pc(host_sigcontext *uc)
+{
+return uc->uc_mcontext.mc_gregs[MC_PC];
+}
+
+static inline void host_signal_set_pc(host_sigcontext *uc, uintptr_t pc)
+{
+uc->uc_mcontext.mc_gregs[MC_PC] = pc;
+}
+
+static inline void *host_signal_mask(host_sigcontext *uc)
+{
+return &uc->uc_sigmask;
+}
+
+static inline bool host_signal_write(siginfo_t *info, host_sigcontext *uc)
+{
+uint32_t insn = *(uint32_t *)host_signal_pc(uc);
+
+if ((insn >> 30) == 3) {
+switch ((insn >> 19) & 0x3f) {
+case 0x05: /* stb */
+case 0x15: /* stba */
+case 0x06: /* sth */
+case 0x16: /* stha */
+case 0x04: /* st */
+case 0x14: /* sta */
+case 0x07: /* std */
+case 0x17: /* stda */
+case 0x0e: /* stx */
+case 0x1e: /* stxa */
+case 0x24: /* stf */
+case 0x34: /* stfa */
+case 0x27: /* stdf */
+case 0x37: /* stdfa */
+case 0x26: /* stqf */
+case 0x36: /* stqfa */
+case 0x25: /* stfsr */
+case 0x3c: /* casa */
+case 0x3e: /* casxa */
+return true;
+}
+}
+return false;
+}
+
+#endif
-- 
2.25.1
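The store detection in host_signal_write() can be exercised without a SPARC host. The sketch below copies its opcode test (op in bits 31:30, op3 in bits 24:19) into a standalone function and adds a hypothetical encoder for register+immediate format-3 instructions, so that a stx and an ldx can be checked directly. The op3 values are those listed in the patch; 0x0b for ldx is an assumption taken from the SPARC V9 opcode map.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same opcode test as host_signal_write(): memory instructions have
 * op == 3 in bits 31:30 and op3 in bits 24:19. */
static bool sparc_insn_is_store(uint32_t insn)
{
    if ((insn >> 30) != 3) {
        return false;
    }
    switch ((insn >> 19) & 0x3f) {
    case 0x05: case 0x15:  /* stb, stba */
    case 0x06: case 0x16:  /* sth, stha */
    case 0x04: case 0x14:  /* st, sta */
    case 0x07: case 0x17:  /* std, stda */
    case 0x0e: case 0x1e:  /* stx, stxa */
    case 0x24: case 0x34:  /* stf, stfa */
    case 0x27: case 0x37:  /* stdf, stdfa */
    case 0x26: case 0x36:  /* stqf, stqfa */
    case 0x25:             /* stfsr */
    case 0x3c: case 0x3e:  /* casa, casxa */
        return true;
    default:
        return false;
    }
}

/* Hypothetical helper: encode "op3 rd, [rs1 + simm13]" (i bit set). */
static uint32_t sparc_fmt3_imm(unsigned op3, unsigned rd, unsigned rs1,
                               int simm13)
{
    return (3u << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14)
           | (1u << 13) | ((uint32_t)simm13 & 0x1fff);
}
```

For example, sparc_fmt3_imm(0x0e, 8, 9, 0) encodes stx %o0, [%o1] and is classified as a store, while the same operands with op3 0x0b (ldx) are not.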

[PATCH v6 6/8] tcg/sparc: Use the constant pool for 64-bit constants

2022-02-08 Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/sparc/tcg-target.c.inc | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/tcg/sparc/tcg-target.c.inc b/tcg/sparc/tcg-target.c.inc
index ae809c9941..21b0dd6734 100644
--- a/tcg/sparc/tcg-target.c.inc
+++ b/tcg/sparc/tcg-target.c.inc
@@ -336,6 +336,13 @@ static bool patch_reloc(tcg_insn_unit *src_rw, int type,
 insn &= ~INSN_OFF19(-1);
 insn |= INSN_OFF19(pcrel);
 break;
+case R_SPARC_13:
+if (!check_fit_ptr(value, 13)) {
+return false;
+}
+insn &= ~INSN_IMM13(-1);
+insn |= INSN_IMM13(value);
+break;
 default:
 g_assert_not_reached();
 }
@@ -479,6 +486,14 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
 return;
 }
 
+/* Use the constant pool, if possible. */
+if (!in_prologue && USE_REG_TB) {
+new_pool_label(s, arg, R_SPARC_13, s->code_ptr,
+   tcg_tbrel_diff(s, NULL));
+tcg_out32(s, LDX | INSN_RD(ret) | INSN_RS1(TCG_REG_TB));
+return;
+}
+
 /* A 64-bit constant decomposed into 2 32-bit pieces.  */
 if (check_fit_i32(lo, 13)) {
 hi = (arg - lo) >> 32;
-- 
2.25.1
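The new R_SPARC_13 case in patch_reloc() refuses offsets outside the signed 13-bit range and otherwise rewrites the immediate field of the ldx emitted for the pool entry. A standalone sketch of that logic follows. MODEL_INSN_IMM13 is assumed to mirror the tcg/sparc INSN_IMM13 macro (the i bit plus a 13-bit field), and fits_signed() is a hypothetical stand-in for check_fit_ptr().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed equivalent of INSN_IMM13: i bit (bit 13) plus bits 12:0. */
#define MODEL_INSN_IMM13(x)  ((1u << 13) | ((uint32_t)(x) & 0x1fff))

/* True if value fits in a signed integer of `bits` bits. */
static bool fits_signed(int64_t value, unsigned bits)
{
    int64_t lim = (int64_t)1 << (bits - 1);
    return value >= -lim && value < lim;
}

/* Sketch of the R_SPARC_13 case: reject values that do not fit,
 * otherwise replace the immediate field of *insn in place. */
static bool model_patch_imm13(uint32_t *insn, int64_t value)
{
    if (!fits_signed(value, 13)) {
        return false;               /* *insn is left untouched */
    }
    *insn &= ~MODEL_INSN_IMM13(-1); /* clear i bit and old immediate */
    *insn |= MODEL_INSN_IMM13(value);
    return true;
}
```

Returning false rather than asserting matters here: in TCG a failed relocation is expected to make the caller retry code generation, so the sketch keeps that shape instead of aborting.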

[PATCH v6 7/8] tcg/sparc: Add tcg_out_jmpl_const for better tail calls

2022-02-08 Richard Henderson

Due to mapping changes, we now rarely place the code_gen_buffer
near the main executable, which means that direct calls will
now rarely be in range.

So, always use indirect calls for tail calls, which allows us to
avoid clobbering %o7, and therefore we need not save and restore it.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/sparc/tcg-target.c.inc | 37 +++--
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/tcg/sparc/tcg-target.c.inc b/tcg/sparc/tcg-target.c.inc
index 21b0dd6734..ed83e2dcd7 100644
--- a/tcg/sparc/tcg-target.c.inc
+++ b/tcg/sparc/tcg-target.c.inc
@@ -858,6 +858,19 @@ static void tcg_out_addsub2_i64(TCGContext *s, TCGReg rl, TCGReg rh,
 tcg_out_mov(s, TCG_TYPE_I64, rl, tmp);
 }
 
+static void tcg_out_jmpl_const(TCGContext *s, const tcg_insn_unit *dest,
+   bool in_prologue, bool tail_call)
+{
+uintptr_t desti = (uintptr_t)dest;
+
+/* Be careful not to clobber %o7 for a tail call. */
+tcg_out_movi_int(s, TCG_TYPE_PTR, TCG_REG_T1,
+ desti & ~0xfff, in_prologue,
+ tail_call ? TCG_REG_G2 : TCG_REG_O7);
+tcg_out_arithi(s, tail_call ? TCG_REG_G0 : TCG_REG_O7,
+   TCG_REG_T1, desti & 0xfff, JMPL);
+}
+
 static void tcg_out_call_nodelay(TCGContext *s, const tcg_insn_unit *dest,
  bool in_prologue)
 {
@@ -866,10 +879,7 @@ static void tcg_out_call_nodelay(TCGContext *s, const tcg_insn_unit *dest,
 if (disp == (int32_t)disp) {
 tcg_out32(s, CALL | (uint32_t)disp >> 2);
 } else {
-uintptr_t desti = (uintptr_t)dest;
-tcg_out_movi_int(s, TCG_TYPE_PTR, TCG_REG_T1,
- desti & ~0xfff, in_prologue, TCG_REG_O7);
-tcg_out_arithi(s, TCG_REG_O7, TCG_REG_T1, desti & 0xfff, JMPL);
+tcg_out_jmpl_const(s, dest, in_prologue, false);
 }
 }
 
@@ -960,11 +970,10 @@ static void build_trampolines(TCGContext *s)
 
 /* Set the retaddr operand.  */
 tcg_out_mov(s, TCG_TYPE_PTR, ra, TCG_REG_O7);
-/* Set the env operand.  */
-tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_O0, TCG_AREG0);
 /* Tail call.  */
-tcg_out_call_nodelay(s, qemu_ld_helpers[i], true);
-tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_O7, ra);
+tcg_out_jmpl_const(s, qemu_ld_helpers[i], true, true);
+/* delay slot -- set the env argument */
+tcg_out_mov_delay(s, TCG_REG_O0, TCG_AREG0);
 }
 
 for (i = 0; i < ARRAY_SIZE(qemu_st_helpers); ++i) {
@@ -1006,14 +1015,14 @@ static void build_trampolines(TCGContext *s)
 if (ra >= TCG_REG_O6) {
 tcg_out_st(s, TCG_TYPE_PTR, TCG_REG_O7, TCG_REG_CALL_STACK,
TCG_TARGET_CALL_STACK_OFFSET);
-ra = TCG_REG_G1;
+} else {
+tcg_out_mov(s, TCG_TYPE_PTR, ra, TCG_REG_O7);
 }
-tcg_out_mov(s, TCG_TYPE_PTR, ra, TCG_REG_O7);
-/* Set the env operand.  */
-tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_O0, TCG_AREG0);
+
 /* Tail call.  */
-tcg_out_call_nodelay(s, qemu_st_helpers[i], true);
-tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_O7, ra);
+tcg_out_jmpl_const(s, qemu_st_helpers[i], true, true);
+/* delay slot -- set the env argument */
+tcg_out_mov_delay(s, TCG_REG_O0, TCG_AREG0);
 }
 }
 #endif
-- 
2.25.1
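tcg_out_jmpl_const() splits the destination into desti & ~0xfff, which tcg_out_movi_int() loads into TCG_REG_T1, and desti & 0xfff, which becomes the jmpl immediate. The sketch below (the struct and function names are hypothetical) checks that the two halves recombine to the original address and that the low half, being in 0..4095, always fits a signed 13-bit immediate without being sign-extended into a borrow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two halves used by tcg_out_jmpl_const(). */
typedef struct {
    uint64_t base;    /* desti & ~0xfff: loaded into TCG_REG_T1 */
    int32_t offset;   /* desti & 0xfff: the jmpl immediate */
} JmplTarget;

static JmplTarget split_jmpl_target(uint64_t desti)
{
    JmplTarget t;

    t.base = desti & ~(uint64_t)0xfff;
    t.offset = (int32_t)(desti & 0xfff);
    /* 0..4095 is always a valid, non-negative simm13 (-4096..4095). */
    assert(t.offset >= -4096 && t.offset <= 4095);
    return t;
}
```

For a tail call the patch passes TCG_REG_G0 as the jmpl destination, so no return address is written to %o7, and TCG_REG_G2 as the scratch, so %o7 also survives the constant load.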

Re: [PATCH 4/5] vduse-blk: Add vduse-blk resize support

2022-02-08 Yongji Xie

On Mon, Feb 7, 2022 at 10:18 PM Stefan Hajnoczi  wrote:
>
> On Tue, Jan 25, 2022 at 09:17:59PM +0800, Xie Yongji wrote:
> > To support block resize, this uses vduse_dev_update_config()
> > to update the capacity field in configuration space and inject
> > config interrupt on the block resize callback.
> >
> > Signed-off-by: Xie Yongji 
> > ---
> >  block/export/vduse-blk.c | 19 +++
> >  1 file changed, 19 insertions(+)
> >
> > diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
> > index 5a8d289685..83845e9a9a 100644
> > --- a/block/export/vduse-blk.c
> > +++ b/block/export/vduse-blk.c
> > @@ -297,6 +297,23 @@ static void blk_aio_detach(void *opaque)
> >  vblk_exp->export.ctx = NULL;
> >  }
> >
> > +static void vduse_blk_resize(void *opaque)
> > +{
> > +BlockExport *exp = opaque;
> > +VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
> > +struct virtio_blk_config config;
> > +
> > +config.capacity =
> > +cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS);
> > +vduse_dev_update_config(vblk_exp->dev, sizeof(config.capacity),
> > +offsetof(struct virtio_blk_config, capacity),
> > +(char *)&config.capacity);
> > +}
> > +
> > +static const BlockDevOps vduse_block_ops = {
> > +.resize_cb = vduse_blk_resize,
> > +};
> > +
> >  static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
> >  Error **errp)
> >  {
> > @@ -387,6 +404,8 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
> >  blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
> >   vblk_exp);
> >
> > +blk_set_dev_ops(exp->blk, &vduse_block_ops, exp);
>
> Detach is missing, so BlockBackend->dev_ops will become stale after the
> export is deleted. Please add code to detach when the export is deleted.

OK.

Thanks,
Yongji
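The value vduse_blk_resize() writes into the capacity field is the export length in 512-byte sectors (VIRTIO_BLK_SECTOR_BITS is 9), converted with cpu_to_le64(). The sketch below reproduces that computation byte by byte. The helper name is hypothetical, and rejecting a negative length is this sketch's own choice, since blk_getlength() reports errors as negative values.

```c
#include <stdint.h>

#define MODEL_VIRTIO_BLK_SECTOR_BITS 9   /* 512-byte sectors */

/* Hypothetical helper: fill `out` with the little-endian capacity that
 * vduse_blk_resize() would store. Returns -1 for a negative length. */
static int model_capacity_le(int64_t length_bytes, uint8_t out[8])
{
    uint64_t sectors;

    if (length_bytes < 0) {
        return -1;
    }
    sectors = (uint64_t)length_bytes >> MODEL_VIRTIO_BLK_SECTOR_BITS;
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(sectors >> (8 * i));   /* least significant byte first */
    }
    return 0;
}
```

A 1 GiB export gives 2097152 (0x200000) sectors, stored as 00 00 20 00 00 00 00 00.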

Re: [PATCH v12 1/5] accel/kvm/kvm-all: refactor per-vcpu dirty ring reaping

2022-02-08 Peter Xu

On Mon, Jan 24, 2022 at 10:10:36PM +0800, huang...@chinatelecom.cn wrote:
> @@ -2956,7 +2959,7 @@ int kvm_cpu_exec(CPUState *cpu)
>   */
>  trace_kvm_dirty_ring_full(cpu->cpu_index);
>  qemu_mutex_lock_iothread();
> -kvm_dirty_ring_reap(kvm_state);
> +kvm_dirty_ring_reap(kvm_state, cpu);

Shall we keep passing in NULL in this patch, and only pass the cpu
parameter conditionally, when the dirty limit is enabled?

Ring reset can still be expensive, so ideally we should still try our best
to reap as many PFNs as possible, as long as we don't need accuracy on
RING_FULL exit events.

>  qemu_mutex_unlock_iothread();
>  ret = 0;
>  break;
> -- 
> 1.8.3.1
> 

-- 
Peter Xu

Re: [PATCH 08/11] mos6522: add "info via" HMP command for debugging

2022-02-08 Markus Armbruster

Philippe Mathieu-Daudé  writes:

> On 7/2/22 20:34, Peter Maydell wrote:
>> On Thu, 27 Jan 2022 at 21:03, Mark Cave-Ayland
>>  wrote:
>>>
>>> This displays detailed information about the device registers and timers to aid
>>> debugging problems with timers and interrupts.
>>>
>>> Signed-off-by: Mark Cave-Ayland 
>>> ---
>>>   hmp-commands-info.hx | 12 ++
>>>   hw/misc/mos6522.c| 92 
>>>   2 files changed, 104 insertions(+)
>> 
>> I'm not sure how keen we are on adding new device-specific
>> HMP info commands, but it's not my area of expertise. Markus ?
>
> HMP is David :)

Yes.

> IIRC it is OK as long as HMP is a QMP wrapper.

That's "how to do it", and I'll get back to it in a jiffie, but Peter
was wondering about the "whether to do it".

Most HMP commands are always there.

We have a few specific to compile-time configurable features: TCG, VNC,
Spice, Slirp, Linux.  Does not apply here.

We have a few specific to targets, such as dump-skeys and info skeys for
s390.  Target-specific is not quite the same as device-specific.

We have no device-specific commands so far.  However, dump-skeys and
info skeys appear to be about the skeys *device*, not the s390 target.
Perhaps any s390 target has such a device?  I don't know.  My point is
we already have device-specific commands, they're just masquerading as
target-specific commands.

The proposed device-specific command uses a mechanism originally made
for modules instead (more on that below).

I think we should make up our minds which way we want device-specific
commands done, then do *all* of them that way.

On to "how to do it", part 1.

Most of the time, the command handler is declared with the command in
hmp-commands{,-info}.hx, possibly with compile-time conditionals.

But it can also be left null there, and set with monitor_register_hmp()
or monitor_register_hmp_info_hrt().  This is intended for modules; see
commit f0e48cbd791^..bca6eb34f03.

Aside: can modules be unloaded?  If yes, we better zap the handler
then.

The proposed "info via" uses monitor_register_hmp_info_hrt().  No
objection from me, requires David's ACK.

"How to do it", part 2, in reply to Philippe's remark.

Ideally, HMP commands wrap around QMP commands, but we accept exceptions
for certain use cases where the wrapping is more trouble than it's
worth, with justification.  I've explained this several times, and I'm
happy to dig up a reference or explain it again if there's a need.

Justifying an exception is bothersome, too.  Daniel Berrangé recently
created a way to reduce the wrapping trouble (merge commit
e86e00a2493).  The proposed "info via" makes use of it.

>> (patch below for context)
>> thanks
>> -- PMM
>> 
>>>
>>> diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
>>> index e90f20a107..4e714e79a2 100644
>>> --- a/hmp-commands-info.hx
>>> +++ b/hmp-commands-info.hx
>>> @@ -879,3 +879,15 @@ SRST
>>> ``info sgx``
>>>   Show intel SGX information.
>>>   ERST
>>> +
>>> +{
>>> +.name   = "via",
>>> +.args_type  = "",
>>> +.params = "",
>>> +.help   = "show guest 6522 VIA devices",
>>> +},
>>> +
>>> +SRST
>>> +  ``info via``
>>> +Show guest 6522 VIA devices.
>>> +ERST

Should this be conditional on the targets where we actually link the
device, like info skeys?

[...]

[RFC 1/8] ioregionfd: introduce a syscall and memory API

2022-02-08 Elena Ufimtseva

Signed-off-by: Elena Ufimtseva 
---
 include/exec/memory.h |  50 +++
 include/sysemu/kvm.h  |  15 +
 linux-headers/linux/kvm.h |  25 
 accel/kvm/kvm-all.c   | 132 ++
 accel/stubs/kvm-stub.c|   1 +
 5 files changed, 223 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 20f1b27377..2ce7f35cc2 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -712,6 +712,7 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
 typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
+typedef struct MemoryRegionIoregionfd MemoryRegionIoregionfd;
 
 /** MemoryRegion:
  *
@@ -756,6 +757,8 @@ struct MemoryRegion {
 const char *name;
 unsigned ioeventfd_nb;
 MemoryRegionIoeventfd *ioeventfds;
+unsigned ioregionfd_nb;
+MemoryRegionIoregionfd *ioregionfds;
 RamDiscardManager *rdm; /* Only for RAM */
 };
 
@@ -974,6 +977,38 @@ struct MemoryListener {
  */
 void (*eventfd_del)(MemoryListener *listener, MemoryRegionSection *section,
 bool match_data, uint64_t data, EventNotifier *e);
+/**
+ * @ioregionfd_add:
+ *
+ * Called during an address space update transaction,
+ * for a section of the address space that has had a new ioregionfd
+ * registration since the last transaction.
+ *
+ * @listener: The #MemoryListener.
+ * @section: The new #MemoryRegionSection.
+ * @data: The @data parameter for the new ioregionfd.
+ * @fd: The file descriptor parameter for the new ioregionfd.
+ */
+void (*ioregionfd_add)(MemoryListener *listener,
+   MemoryRegionSection *section,
+   uint64_t data, int fd);
+
+/**
+ * @ioregionfd_del:
+ *
+ * Called during an address space update transaction,
+ * for a section of the address space that has dropped an ioregionfd
+ * registration since the last transaction.
+ *
+ * @listener: The #MemoryListener.
+ * @section: The new #MemoryRegionSection.
+ * @data: The @data parameter for the dropped ioregionfd.
+ * @fd: The file descriptor parameter for the dropped ioregionfd.
+ */
+void (*ioregionfd_del)(MemoryListener *listener,
+   MemoryRegionSection *section,
+   uint64_t data, int fd);
+
 
 /**
  * @coalesced_io_add:
@@ -1041,6 +1076,8 @@ struct AddressSpace {
 
 int ioeventfd_nb;
 struct MemoryRegionIoeventfd *ioeventfds;
+int ioregionfd_nb;
+struct MemoryRegionIoregionfd *ioregionfds;
 QTAILQ_HEAD(, MemoryListener) listeners;
 QTAILQ_ENTRY(AddressSpace) address_spaces_link;
 };
@@ -2175,6 +2212,19 @@ void memory_region_del_eventfd(MemoryRegion *mr,
uint64_t data,
EventNotifier *e);
 
+void memory_region_add_ioregionfd(MemoryRegion *mr,
+  hwaddr addr,
+  unsigned size,
+  uint64_t data,
+  int fd,
+  bool pio);
+
+void memory_region_del_ioregionfd(MemoryRegion *mr,
+  hwaddr addr,
+  unsigned size,
+  uint64_t data,
+  int fd);
+
 /**
  * memory_region_add_subregion: Add a subregion to a container.
  *
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 7b22aeb6ae..fea77b5185 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -46,6 +46,7 @@ extern bool kvm_readonly_mem_allowed;
 extern bool kvm_direct_msi_allowed;
 extern bool kvm_ioeventfd_any_length_allowed;
 extern bool kvm_msi_use_devid;
+extern bool kvm_ioregionfds_allowed;
 
 #define kvm_enabled()   (kvm_allowed)
 /**
@@ -167,6 +168,15 @@ extern bool kvm_msi_use_devid;
  */
 #define kvm_msi_devid_required() (kvm_msi_use_devid)
 
+/**
+ * kvm_ioregionfds_enabled:
+ *
+ * Returns: true if we can use ioregionfd to receive the MMIO/PIO
+ * dispatches from KVM (ie the kernel supports ioregionfd and we are running
+ * with a configuration where it is meaningful to use them).
+ */
+#define kvm_ioregionfds_enabled() (kvm_ioregionfds_allowed)
+
 #else
 
 #define kvm_enabled()   (0)
@@ -184,12 +194,14 @@ extern bool kvm_msi_use_devid;
 #define kvm_direct_msi_enabled() (false)
 #define kvm_ioeventfd_any_length_enabled() (false)
 #define kvm_msi_devid_required() (false)
+#define kvm_ioregionfds_enabled() (false)
 
 #endif  /* CONFIG_KVM_IS_POSSIBLE */
 
 struct kvm_run;
 struct kvm_lapic_state;
 struct kvm_irq_routing_entry;
+struct kvm_ioregion;
 
 typedef struct KVMCapabilityInfo {
 const char *name;
@@ -548,4 +560,7 @@ bool kvm_cpu_check_are_resettable(void);
 bool kvm_arch_cpu_check_are_rese

[PATCH 0/5] linux-user: Fixes for sparc64 host

2022-02-08 Richard Henderson

Brown bag time, since both of these problems are my fault, and I
ostensibly tested them.  Ho hum.  Anyway, this brings linux-test
back to working.


r~


Richard Henderson (5):
  common-user/host/sparc64: Fix safe_syscall_base
  linux-user: Introduce host_signal_mask
  linux-user: Introduce host_sigcontext
  linux-user: Move sparc/host-signal.h to sparc64/host-signal.h
  linux-user/include/host/sparc64: Fix host_sigcontext

 linux-user/include/host/aarch64/host-signal.h | 16 +++--
 linux-user/include/host/alpha/host-signal.h   | 14 +++-
 linux-user/include/host/arm/host-signal.h | 14 +++-
 linux-user/include/host/i386/host-signal.h| 14 +++-
 .../include/host/loongarch64/host-signal.h| 14 +++-
 linux-user/include/host/mips/host-signal.h| 14 +++-
 linux-user/include/host/ppc/host-signal.h | 14 +++-
 linux-user/include/host/riscv/host-signal.h   | 14 +++-
 linux-user/include/host/s390/host-signal.h| 14 +++-
 linux-user/include/host/sparc/host-signal.h   | 63 --
 linux-user/include/host/sparc64/host-signal.h | 65 ++-
 linux-user/include/host/x86_64/host-signal.h  | 14 +++-
 linux-user/signal.c   | 22 +++
 common-user/host/sparc64/safe-syscall.inc.S   |  5 +-
 14 files changed, 188 insertions(+), 109 deletions(-)
 delete mode 100644 linux-user/include/host/sparc/host-signal.h

-- 
2.25.1

[RFC 2/8] multiprocess: place RemoteObject definition in a header file

2022-02-08 Elena Ufimtseva

This will be needed later. No functional changes.

Signed-off-by: Elena Ufimtseva 
---
 include/hw/remote/remote.h | 28 
 hw/remote/remote-obj.c | 16 +---
 MAINTAINERS|  1 +
 3 files changed, 30 insertions(+), 15 deletions(-)
 create mode 100644 include/hw/remote/remote.h

diff --git a/include/hw/remote/remote.h b/include/hw/remote/remote.h
new file mode 100644
index 00..a2d23178b9
--- /dev/null
+++ b/include/hw/remote/remote.h
@@ -0,0 +1,28 @@
+/*
+ * RemoteObject header.
+ *
+ * Copyright © 2018, 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#ifndef REMOTE_H
+#define REMOTE_H
+
+struct RemoteObject {
+/* private */
+Object parent;
+
+Notifier machine_done;
+
+int32_t fd;
+char *devid;
+
+QIOChannel *ioc;
+
+DeviceState *dev;
+DeviceListener listener;
+};
+
+#endif
diff --git a/hw/remote/remote-obj.c b/hw/remote/remote-obj.c
index 4f21254219..f0da696662 100644
--- a/hw/remote/remote-obj.c
+++ b/hw/remote/remote-obj.c
@@ -23,6 +23,7 @@
 #include "hw/pci/pci.h"
 #include "qemu/sockets.h"
 #include "monitor/monitor.h"
+#include "hw/remote/remote.h"
 
 #define TYPE_REMOTE_OBJECT "x-remote-object"
 OBJECT_DECLARE_TYPE(RemoteObject, RemoteObjectClass, REMOTE_OBJECT)
@@ -34,21 +35,6 @@ struct RemoteObjectClass {
 unsigned int max_devs;
 };
 
-struct RemoteObject {
-/* private */
-Object parent;
-
-Notifier machine_done;
-
-int32_t fd;
-char *devid;
-
-QIOChannel *ioc;
-
-DeviceState *dev;
-DeviceListener listener;
-};
-
 static void remote_object_set_fd(Object *obj, const char *str, Error **errp)
 {
 RemoteObject *o = REMOTE_OBJECT(obj);
diff --git a/MAINTAINERS b/MAINTAINERS
index 7543eb4d59..3c60a29760 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3453,6 +3453,7 @@ F: hw/remote/proxy-memory-listener.c
 F: include/hw/remote/proxy-memory-listener.h
 F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
+F: include/hw/remote/remote.h
 
 EBPF:
 M: Jason Wang 
-- 
2.25.1

Re: [PATCH v12 2/5] migration/dirtyrate: refactor dirty page rate calculation

2022-02-08 Peter Xu

On Mon, Jan 24, 2022 at 10:10:37PM +0800, huang...@chinatelecom.cn wrote:
> diff --git a/cpus-common.c b/cpus-common.c
> index 6e73d3e..63159d6 100644
> --- a/cpus-common.c
> +++ b/cpus-common.c
> @@ -73,6 +73,7 @@ static int cpu_get_free_index(void)
>  }
>  
>  CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);
> +unsigned int cpu_list_generation_id;
>  
>  void cpu_list_add(CPUState *cpu)
>  {
> @@ -84,6 +85,7 @@ void cpu_list_add(CPUState *cpu)
>  assert(!cpu_index_auto_assigned);
>  }
>  QTAILQ_INSERT_TAIL_RCU(&cpus, cpu, node);
> +cpu_list_generation_id++;
>  }
>  
>  void cpu_list_remove(CPUState *cpu)
> @@ -96,6 +98,7 @@ void cpu_list_remove(CPUState *cpu)
>  
>  QTAILQ_REMOVE_RCU(&cpus, cpu, node);
>  cpu->cpu_index = UNASSIGNED_CPU_INDEX;
> +cpu_list_generation_id++;
>  }

Could you move the cpu list gen id changes into a separate patch?

>  
>  CPUState *qemu_get_cpu(int index)
> diff --git a/include/sysemu/dirtyrate.h b/include/sysemu/dirtyrate.h
> new file mode 100644
> index 000..ea4785f
> --- /dev/null
> +++ b/include/sysemu/dirtyrate.h
> @@ -0,0 +1,31 @@
> +/*
> + * dirty page rate helper functions
> + *
> + * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
> + *
> + * Authors:
> + *  Hyman Huang(黄勇) 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_DIRTYRATE_H
> +#define QEMU_DIRTYRATE_H
> +
> +extern unsigned int cpu_list_generation_id;

How about exporting a function cpu_list_generation_id_get() from the cpu code,
rather than referencing it directly?

> +int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
> + int64_t init_time_ms,
> + VcpuStat *stat,
> + unsigned int flag,
> + bool one_shot)
> +{
> +DirtyPageRecord *records;
> +int64_t duration;
> +int64_t dirtyrate;
> +int i = 0;
> +unsigned int gen_id;
> +
> +retry:
> +cpu_list_lock();
> +gen_id = cpu_list_generation_id;
> +records = vcpu_dirty_stat_alloc(stat);
> +vcpu_dirty_stat_collect(stat, records, true);
> +
> +duration = dirty_stat_wait(calc_time_ms, init_time_ms);
> +cpu_list_unlock();

Should we release the lock before sleeping (dirty_stat_wait)?

> +
> +global_dirty_log_sync(flag, one_shot);
> +
> +cpu_list_lock();
> +if (gen_id != cpu_list_generation_id) {
> +g_free(records);
> +g_free(stat->rates);
> +cpu_list_unlock();
> +goto retry;
> +}
> +vcpu_dirty_stat_collect(stat, records, false);
> +cpu_list_unlock();
> +
> +for (i = 0; i < stat->nvcpu; i++) {
> +dirtyrate = do_calculate_dirtyrate(records[i], duration);
> +
> +stat->rates[i].id = i;
> +stat->rates[i].dirty_rate = dirtyrate;
> +
> +trace_dirtyrate_do_calculate_vcpu(i, dirtyrate);
> +}
> +
> +g_free(records);
> +
> +return duration;
> +}

Thanks,

-- 
Peter Xu

[RFC 7/8] multiprocess: add ioregionfd memory region in proxy

2022-02-08 Thread Elena Ufimtseva

Signed-off-by: Elena Ufimtseva 
---
 include/hw/remote/proxy.h |  1 +
 hw/remote/proxy.c | 66 ---
 2 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/include/hw/remote/proxy.h b/include/hw/remote/proxy.h
index 741def71f1..9efef0b935 100644
--- a/include/hw/remote/proxy.h
+++ b/include/hw/remote/proxy.h
@@ -29,6 +29,7 @@ struct PCIProxyDev {
 PCIDevice parent_dev;
 char *fd;
 
+char *ioregfd;
 /*
  * Mutex used to protect the QIOChannel fd from
  * the concurrent access by the VCPUs since proxy
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
index bad164299d..ba1aa20d78 100644
--- a/hw/remote/proxy.c
+++ b/hw/remote/proxy.c
@@ -146,6 +146,33 @@ static void pci_proxy_dev_exit(PCIDevice *pdev)
 event_notifier_cleanup(&dev->resample);
 }
 
+static void config_get_ioregionfd_info(PCIProxyDev *pdev, uint32_t reg_num,
+   uint32_t *val, bool memory)
+{
+MPQemuMsg msg = { 0 };
+Error *local_err = NULL;
+uint64_t ret = -EINVAL;
+
+memset(&msg, 0, sizeof(MPQemuMsg));
+msg.cmd = MPQEMU_CMD_BAR_INFO;
+msg.num_fds = 0;
+msg.data.u64 = (uint64_t)reg_num & MAKE_64BIT_MASK(0, 32);
+
+msg.data.u64 |= memory ? (1ULL << 32) : 0;
+msg.size = sizeof(msg.data.u64);
+
+ret = mpqemu_msg_send_and_await_reply(&msg, pdev, &local_err);
+if (local_err) {
+error_report_err(local_err);
+error_report("Error while receiving reply from remote about fd");
+}
+if (ret == UINT64_MAX) {
+error_report("Failed to request bar info for %d", reg_num);
+}
+
+*val = (uint32_t)ret;
+}
+
 static void config_op_send(PCIProxyDev *pdev, uint32_t addr, uint32_t *val,
int len, unsigned int op)
 {
@@ -198,6 +225,7 @@ static void pci_proxy_write_config(PCIDevice *d, uint32_t addr, uint32_t val,
 
 static Property proxy_properties[] = {
 DEFINE_PROP_STRING("fd", PCIProxyDev, fd),
+DEFINE_PROP_STRING("ioregfd", PCIProxyDev, ioregfd),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -297,7 +325,7 @@ const MemoryRegionOps proxy_mr_ops = {
 static void probe_pci_info(PCIDevice *dev, Error **errp)
 {
 PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
-uint32_t orig_val, new_val, base_class, val;
+uint32_t orig_val, new_val, base_class, val, ioregionfd_bar;
 PCIProxyDev *pdev = PCI_PROXY_DEV(dev);
 DeviceClass *dc = DEVICE_CLASS(pc);
 uint8_t type;
@@ -342,6 +370,9 @@ static void probe_pci_info(PCIDevice *dev, Error **errp)
 }
 
 for (i = 0; i < PCI_NUM_REGIONS; i++) {
+bool init_ioregionfd = false;
+int fd = -1;
+
 config_op_send(pdev, PCI_BASE_ADDRESS_0 + (4 * i), &orig_val, 4,
MPQEMU_CMD_PCI_CFGREAD);
 new_val = 0x;
@@ -362,9 +393,36 @@ static void probe_pci_info(PCIDevice *dev, Error **errp)
 if (type == PCI_BASE_ADDRESS_SPACE_MEMORY) {
 pdev->region[i].memory = true;
 }
-memory_region_init_io(&pdev->region[i].mr, OBJECT(pdev),
-  &proxy_mr_ops, &pdev->region[i],
-  name, size);
+#ifdef CONFIG_IOREGIONFD
+/*
+ * Currently, only one fd per device is supported.
+ * TODO: Drop this limit.
+ */
+if (pdev->ioregfd) {
+fd = monitor_fd_param(monitor_cur(), pdev->ioregfd, errp);
+if (fd == -1) {
+error_prepend(errp, "Could not parse ioregionfd fd %s:",
+  pdev->ioregfd);
+}
+
+config_get_ioregionfd_info(pdev, i, &ioregionfd_bar,
+   pdev->region[i].memory);
+if (ioregionfd_bar == i) {
+init_ioregionfd = true;
+}
+}
+#endif
+if (init_ioregionfd) {
+memory_region_init_io(&pdev->region[i].mr, OBJECT(pdev),
+  NULL, &pdev->region[i],
+  name, size);
+memory_region_add_ioregionfd(&pdev->region[i].mr, 0, size, i,
+ fd, false);
+} else {
+memory_region_init_io(&pdev->region[i].mr, OBJECT(pdev),
+  &proxy_mr_ops, &pdev->region[i],
+  name, size);
+}
 pci_register_bar(dev, i, type, &pdev->region[i].mr);
 }
 }
-- 
2.25.1

Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ

2022-02-08 Thread Jason Wang




On 2022/2/1 7:45 PM, Eugenio Perez Martin wrote:

On Sun, Jan 30, 2022 at 7:50 AM Jason Wang  wrote:


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:

SVQ is able to log the dirty bits by itself, so let's use it to avoid
blocking migration.

Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
enabled. Even if the device supports it, the reports would be nonsense
because SVQ memory is in the qemu region.

The log region is still allocated. Future changes might skip that, but
this series is already long enough.

Signed-off-by: Eugenio Pérez 
---
   hw/virtio/vhost-vdpa.c | 20 
   1 file changed, 20 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index fb0a338baa..75090d65e8 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
   if (ret == 0 && v->shadow_vqs_enabled) {
   /* Filter only features that SVQ can offer to guest */
   vhost_svq_valid_guest_features(features);
+
+/* Add SVQ logging capabilities */
+*features |= BIT_ULL(VHOST_F_LOG_ALL);
   }

   return ret;
@@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,

   if (v->shadow_vqs_enabled) {
   uint64_t dev_features, svq_features, acked_features;
+uint8_t status = 0;
   bool ok;

+ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
+if (unlikely(ret)) {
+return ret;
+}
+
+if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+/*
+ * vhost is trying to enable or disable _F_LOG, and the device
+ * would report wrong dirty pages. SVQ handles it.
+ */


I fail to understand this comment; I'd think there's no way to disable
dirty page tracking for SVQ.


vhost_log_global_{start,stop} are called at the beginning and end of
migration. To inform the device that it should start logging, they set
or clear VHOST_F_LOG_ALL at vhost_dev_set_log.



Yes, but for SVQ, we can't disable dirty page tracking, can we? The 
only thing we can do is ignore or filter out F_LOG_ALL and pretend it is 
enabled or disabled.





While SVQ does not use VHOST_F_LOG_ALL, it exports the feature bit so
vhost does not block migration. Maybe we need to look for another way
to do this?



I'm fine with filtering since it's much simpler, but I fail to 
understand why we need to check DRIVER_OK.


Thanks




Thanks!


Thanks



+return 0;
+}
+
+/* We must not ack _F_LOG if SVQ is enabled */
+features &= ~BIT_ULL(VHOST_F_LOG_ALL);
+
   ret = vhost_vdpa_get_dev_features(dev, &dev_features);
   if (ret != 0) {
   error_report("Can't get vdpa device features, got (%d)", ret);

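The feature handling in PATCH 28/31 above boils down to advertising VHOST_F_LOG_ALL upward while never acking it to the device. A minimal sketch of that filtering, assuming VHOST_F_LOG_ALL is bit 26 as in the Linux vhost UAPI headers and defining BIT_ULL locally the way QEMU's bitops do; `svq_get_features()` and `svq_features_to_ack()` are hypothetical names, not functions from the patch.

```c
#include <stdint.h>

#define BIT_ULL(nr)      (1ULL << (nr))
#define VHOST_F_LOG_ALL  26   /* value from the Linux vhost UAPI headers */

/*
 * What SVQ reports upward: the device's features plus logging, so that
 * vhost does not block migration (SVQ tracks dirty memory itself).
 */
static uint64_t svq_get_features(uint64_t dev_features)
{
    return dev_features | BIT_ULL(VHOST_F_LOG_ALL);
}

/*
 * What SVQ acks to the device: never _F_LOG_ALL, since per the commit
 * message the device's reports would refer to SVQ memory in the qemu
 * region rather than to guest pages.
 */
static uint64_t svq_features_to_ack(uint64_t requested)
{
    return requested & ~BIT_ULL(VHOST_F_LOG_ALL);
}
```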
[PATCH 1/5] common-user/host/sparc64: Fix safe_syscall_base

2022-02-08 Thread Richard Henderson

Use the "retl" instead of "ret" instruction alias, since we
do not allocate a register window in this function.

Fix the offset to the first stacked parameter, which lies
beyond the register window save area.

Fixes: 95c021dac835 ("linux-user/host/sparc64: Add safe-syscall.inc.S")
Signed-off-by: Richard Henderson 
---
 common-user/host/sparc64/safe-syscall.inc.S | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/common-user/host/sparc64/safe-syscall.inc.S b/common-user/host/sparc64/safe-syscall.inc.S
index a2f2b9c967..c7be8f2d25 100644
--- a/common-user/host/sparc64/safe-syscall.inc.S
+++ b/common-user/host/sparc64/safe-syscall.inc.S
@@ -24,7 +24,8 @@
 .type   safe_syscall_end, @function
 
 #define STACK_BIAS  2047
-#define PARAM(N)STACK_BIAS + N*8
+#define WINDOW_SIZE 16 * 8
+#define PARAM(N)STACK_BIAS + WINDOW_SIZE + N * 8
 
 /*
  * This is the entry point for making a system call. The calling
@@ -74,7 +75,7 @@ safe_syscall_end:
 /* code path for having successfully executed the syscall */
 bcs,pn  %xcc, 1f
  nop
-ret
+retl
  nop
 
 /* code path when we didn't execute the syscall */
-- 
2.25.1

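The offset fix in PATCH 1/5 above is plain arithmetic. On sparc64 the stack pointer carries a bias of 2047, and the first 16 * 8 bytes above the biased %sp are the register window save area (8 %l plus 8 %i registers), so a stacked parameter has to be addressed beyond it. A small C sketch mirroring the old and new PARAM() macros; the function names are hypothetical, and WINDOW_SIZE is parenthesized here as is usual for C macros.

```c
#define STACK_BIAS   2047
#define WINDOW_SIZE  (16 * 8)   /* 8 %l + 8 %i registers, 8 bytes each */

/* Offset from %sp of stacked parameter n, before and after the fix. */
static int param_offset_old(int n) { return STACK_BIAS + n * 8; }
static int param_offset_new(int n) { return STACK_BIAS + WINDOW_SIZE + n * 8; }

/*
 * Nonzero if an offset falls inside the register window save area,
 * i.e. it would read a saved register instead of a parameter.
 */
static int in_window_save_area(int offset)
{
    return offset >= STACK_BIAS && offset < STACK_BIAS + WINDOW_SIZE;
}
```

With the old macro, parameter 6 (under the usual SPARC V9 convention the first one passed on the stack rather than in %o0-%o5) would be read from offset 2095, inside the save area; the new macro places it at 2223, i.e. 176 bytes past the bias.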
[RFC 0/8] ioregionfd introduction

2022-02-08 Thread Elena Ufimtseva

This patchset is an RFC version for the ioregionfd implementation
in QEMU. The kernel patches are to be posted with some fixes as a v4.

This implementation uses version 3 of the posted kernel patches:
https://lore.kernel.org/kvm/cover.1613828726.git.eafanas...@gmail.com/

A future version will include support for vfio/libvfio-user.
Please refer to the design discussion proposed by Stefan here:
https://lore.kernel.org/all/YXpb1f3KicZxj1oj@stefanha-x1.localdomain/T/

The vfio-user version needed some bug fixing, so it was decided to send
the multiprocess version first.

The ioregionfd is currently configured through the command line, and each
ioregionfd represents an object. This allows for easy parsing and does
not require modifications to the device/remote object command line options.

The following command line can be used to specify ioregionfd:

  '-object', 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()),\
  '-object', 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1',\
  '-object', 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2',\


The proxy side of ioregionfd in this version uses only one file descriptor:

  '-device', 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()),\


This was done for the RFC version; my thought was that the next version
would be for vfio-user, so I have not dedicated much effort to these
command line options.

The multiprocess messaging protocol was extended to support inquiries
by the proxy about whether the device has any ioregionfds.
This RFC implements proxy inquiries about the type of each BAR (ioregionfd
or not) and its address space (memory/io).
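In RFC 7/8, config_get_ioregionfd_info() packs the MPQEMU_CMD_BAR_INFO query into the 64-bit message payload: the BAR index in the low 32 bits and a memory/io flag in bit 32. A minimal sketch of that encoding, plus a decoder for the remote side; the remote handler lives in patch 6/8, which is not shown here, so the decoder and the BAR_INFO_MEMORY_FLAG name are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define BAR_INFO_MEMORY_FLAG  (1ULL << 32)   /* hypothetical name: set for memory BARs */

/* Proxy side: mirrors the payload built in config_get_ioregionfd_info(). */
static uint64_t bar_info_encode(uint32_t bar, bool memory)
{
    return (uint64_t)bar | (memory ? BAR_INFO_MEMORY_FLAG : 0);
}

/* Remote side (hypothetical): split the payload back into its fields. */
static void bar_info_decode(uint64_t payload, uint32_t *bar, bool *memory)
{
    *bar = (uint32_t)payload;                 /* low 32 bits */
    *memory = (payload & BAR_INFO_MEMORY_FLAG) != 0;
}
```

Per RFC 7/8, the proxy then treats a BAR as ioregionfd-backed when the 32-bit reply equals the BAR index it asked about.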

Currently there are a few limitations in this version of ioregionfd:
 - one ioregionfd per bar, only full bar size is supported;
 - one file descriptor per device for all of its ioregionfds;
 - each remote device runs fd handler for all its BARs in one IOThread;
 - proxy supports only one fd.

Some of these limitations will be dropped in a future version.
This RFC is meant to gather feedback and suggestions from the community
on the general approach.

A quick performance test was done with the fio tool on the remote lsi
device, with and without ioregionfd, for both memory BARs (1 and 2):

Random R/W:

                read IOPS   read BW      write IOPS   write BW
no ioregionfd   889         3559KiB/s    890          3561KiB/s
ioregionfd      938         3756KiB/s    939          3757KiB/s


Sequential Read and Sequential Write:

                Sequential read          Sequential write
                read IOPS   read BW      write IOPS   write BW
no ioregionfd   367k        1434MiB/s    76k          297MiB/s
ioregionfd      374k        1459MiB/s    77.3k        302MiB/s


Please review and send your feedback.

Thank you!
Elena

Elena Ufimtseva (8):
  ioregionfd: introduce a syscall and memory API
  multiprocess: place RemoteObject definition in a header file
  ioregionfd: introduce memory API functions
  ioregionfd: Introduce IORegionDFObject type
  multiprocess: prepare ioregionfds for remote device
  multiprocess: add MPQEMU_CMD_BAR_INFO
  multiprocess: add ioregionfd memory region in proxy
  multiprocess: handle ioregionfd commands

 meson.build |  15 +-
 qapi/qom.json   |  32 ++-
 include/exec/memory.h   |  50 +
 include/hw/remote/ioregionfd.h  |  45 
 include/hw/remote/machine.h |   1 +
 include/hw/remote/mpqemu-link.h |   2 +
 include/hw/remote/proxy.h   |   1 +
 include/hw/remote/remote.h  |  31 +++
 include/sysemu/kvm.h|  15 ++
 linux-headers/ioregionfd.h  |  30 +++
 linux-headers/linux/kvm.h   |  25 +++
 accel/kvm/kvm-all.c | 132 
 accel/stubs/kvm-stub.c  |   1 +
 hw/remote/ioregionfd.c  | 361 
 hw/remote/message.c |  38 
 hw/remote/proxy.c   |  66 +-
 hw/remote/remote-obj.c  | 154 --
 softmmu/memory.c| 207 ++
 Kconfig.host|   3 +
 MAINTAINERS |   3 +
 hw/remote/Kconfig   |   4 +
 hw/remote/meson.build   |   1 +
 meson_options.txt   |   2 +
 scripts/meson-buildoptions.sh   |   3 +
 24 files changed, 1199 insertions(+), 23 deletions(-)
 create mode 100644 include/hw/remote/ioregionfd.h
 create mode 100644 include/hw/remote/remote.h
 create mode 100644 linux-headers/ioregionfd.h
 create mode 100644 hw/remote/ioregionfd.c

-- 
2.25.1

[PATCH 3/5] linux-user: Introduce host_sigcontext

2022-02-08 Thread Richard Henderson

Do not directly access ucontext_t as the third signal parameter.
This is preparation for a sparc64 fix.

Signed-off-by: Richard Henderson 
---
 linux-user/include/host/aarch64/host-signal.h | 13 -
 linux-user/include/host/alpha/host-signal.h   | 11 +++
 linux-user/include/host/arm/host-signal.h | 11 +++
 linux-user/include/host/i386/host-signal.h| 11 +++
 linux-user/include/host/loongarch64/host-signal.h | 11 +++
 linux-user/include/host/mips/host-signal.h| 11 +++
 linux-user/include/host/ppc/host-signal.h | 11 +++
 linux-user/include/host/riscv/host-signal.h   | 11 +++
 linux-user/include/host/s390/host-signal.h| 11 +++
 linux-user/include/host/sparc/host-signal.h   | 11 +++
 linux-user/include/host/x86_64/host-signal.h  | 11 +++
 linux-user/signal.c   |  4 ++--
 12 files changed, 80 insertions(+), 47 deletions(-)

diff --git a/linux-user/include/host/aarch64/host-signal.h b/linux-user/include/host/aarch64/host-signal.h
index 76ab078069..be079684a2 100644
--- a/linux-user/include/host/aarch64/host-signal.h
+++ b/linux-user/include/host/aarch64/host-signal.h
@@ -11,6 +11,9 @@
 #ifndef AARCH64_HOST_SIGNAL_H
 #define AARCH64_HOST_SIGNAL_H
 
+/* The third argument to a SA_SIGINFO handler is ucontext_t. */
+typedef ucontext_t host_sigcontext;
+
 /* Pre-3.16 kernel headers don't have these, so provide fallback definitions */
 #ifndef ESR_MAGIC
 #define ESR_MAGIC 0x45535201
@@ -20,7 +23,7 @@ struct esr_context {
 };
 #endif
 
-static inline struct _aarch64_ctx *first_ctx(ucontext_t *uc)
+static inline struct _aarch64_ctx *first_ctx(host_sigcontext *uc)
 {
 return (struct _aarch64_ctx *)&uc->uc_mcontext.__reserved;
 }
@@ -30,22 +33,22 @@ static inline struct _aarch64_ctx *next_ctx(struct _aarch64_ctx *hdr)
 return (struct _aarch64_ctx *)((char *)hdr + hdr->size);
 }
 
-static inline uintptr_t host_signal_pc(ucontext_t *uc)
+static inline uintptr_t host_signal_pc(host_sigcontext *uc)
 {
 return uc->uc_mcontext.pc;
 }
 
-static inline void host_signal_set_pc(ucontext_t *uc, uintptr_t pc)
+static inline void host_signal_set_pc(host_sigcontext *uc, uintptr_t pc)
 {
 uc->uc_mcontext.pc = pc;
 }
 
-static inline void *host_signal_mask(ucontext_t *uc)
+static inline void *host_signal_mask(host_sigcontext *uc)
 {
 return &uc->uc_sigmask;
 }
 
-static inline bool host_signal_write(siginfo_t *info, ucontext_t *uc)
+static inline bool host_signal_write(siginfo_t *info, host_sigcontext *uc)
 {
 struct _aarch64_ctx *hdr;
 uint32_t insn;
diff --git a/linux-user/include/host/alpha/host-signal.h b/linux-user/include/host/alpha/host-signal.h
index a44d670f2b..4f9e2abc4b 100644
--- a/linux-user/include/host/alpha/host-signal.h
+++ b/linux-user/include/host/alpha/host-signal.h
@@ -11,22 +11,25 @@
 #ifndef ALPHA_HOST_SIGNAL_H
 #define ALPHA_HOST_SIGNAL_H
 
-static inline uintptr_t host_signal_pc(ucontext_t *uc)
+/* The third argument to a SA_SIGINFO handler is ucontext_t. */
+typedef ucontext_t host_sigcontext;
+
+static inline uintptr_t host_signal_pc(host_sigcontext *uc)
 {
 return uc->uc_mcontext.sc_pc;
 }
 
-static inline void host_signal_set_pc(ucontext_t *uc, uintptr_t pc)
+static inline void host_signal_set_pc(host_sigcontext *uc, uintptr_t pc)
 {
 uc->uc_mcontext.sc_pc = pc;
 }
 
-static inline void *host_signal_mask(ucontext_t *uc)
+static inline void *host_signal_mask(host_sigcontext *uc)
 {
 return &uc->uc_sigmask;
 }
 
-static inline bool host_signal_write(siginfo_t *info, ucontext_t *uc)
+static inline bool host_signal_write(siginfo_t *info, host_sigcontext *uc)
 {
 uint32_t *pc = (uint32_t *)host_signal_pc(uc);
 uint32_t insn = *pc;
diff --git a/linux-user/include/host/arm/host-signal.h b/linux-user/include/host/arm/host-signal.h
index bbeb4ffefb..faba496d24 100644
--- a/linux-user/include/host/arm/host-signal.h
+++ b/linux-user/include/host/arm/host-signal.h
@@ -11,22 +11,25 @@
 #ifndef ARM_HOST_SIGNAL_H
 #define ARM_HOST_SIGNAL_H
 
-static inline uintptr_t host_signal_pc(ucontext_t *uc)
+/* The third argument to a SA_SIGINFO handler is ucontext_t. */
+typedef ucontext_t host_sigcontext;
+
+static inline uintptr_t host_signal_pc(host_sigcontext *uc)
 {
 return uc->uc_mcontext.arm_pc;
 }
 
-static inline void host_signal_set_pc(ucontext_t *uc, uintptr_t pc)
+static inline void host_signal_set_pc(host_sigcontext *uc, uintptr_t pc)
 {
 uc->uc_mcontext.arm_pc = pc;
 }
 
-static inline void *host_signal_mask(ucontext_t *uc)
+static inline void *host_signal_mask(host_sigcontext *uc)
 {
 return &uc->uc_sigmask;
 }
 
-static inline bool host_signal_write(siginfo_t *info, ucontext_t *uc)
+static inline bool host_signal_write(siginfo_t *info, host_sigcontext *uc)
 {
 /*
  * In the FSR, bit 11 is WnR, assuming a v6 or
diff --git a/linux-user/include/host/i386/host-signal.h b/linux

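PATCH 3/5 above routes every accessor through a per-host `host_sigcontext` typedef instead of naming ucontext_t directly, so a host whose third SA_SIGINFO argument is something else only has to change the typedef. A minimal sketch of the pattern on a glibc/Linux host, reusing the patch's typedef and its host_signal_mask() accessor; the handler, `install_handler()` and the `seen_mask` flag are illustrative only.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <ucontext.h>

/* The third argument to a SA_SIGINFO handler is ucontext_t. */
typedef ucontext_t host_sigcontext;

static inline void *host_signal_mask(host_sigcontext *uc)
{
    return &uc->uc_sigmask;
}

static volatile sig_atomic_t seen_mask;

/* Generic code converts the void * exactly once, to host_sigcontext. */
static void handler(int sig, siginfo_t *info, void *puc)
{
    host_sigcontext *uc = puc;

    (void)sig;
    (void)info;
    seen_mask = host_signal_mask(uc) != NULL;
}

static bool install_handler(int sig)
{
    struct sigaction act;

    memset(&act, 0, sizeof(act));
    act.sa_sigaction = handler;
    act.sa_flags = SA_SIGINFO;
    sigemptyset(&act.sa_mask);
    return sigaction(sig, &act, NULL) == 0;
}
```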
Re: [PATCH v2] hw/smbios: fix memory corruption for large guests due to handle overlap

2022-02-08 Thread Igor Mammedov

On Mon, 7 Feb 2022 20:42:35 +0530 (IST)
Ani Sinha  wrote:

> >
> > So the question is whether it is worth keeping the legacy SMBIOS code
> > and introducing a new handle layout + memory_region re-sizable SMBIOS
> > tables like we did with the ACPI ones.
> >
> > That way we will be free to change SMBIOS tables at will without the
> > risk of breaking migration and without the need to add a compat knob for
> > every change to keep src and dst binary compatible.
> >  
> 
> Could you please point me to the change on the acpi side so that I can
> study it and look into the refactoring for the smbios side?
> 

I'd suggest starting with acpi_add_rom_blob() and how it evolved into
the current code. Eventually you should find the commit by Michael that
introduced resizable memory_regions; if I recall correctly, it was around
the time we switched ACPI tables to memory regions.

[PATCH v6 0/8] tcg/sparc: Unaligned access for user-only

2022-02-08 Thread Richard Henderson

Changes from v5:
  * Use tcg_out_movi_imm13 from tcg_out_addsub2_i64.
  * Split out tcg_out_movi_imm32 to avoid recursion.
  * Reinstate the assert vs TCG_REG_T2 in tcg_out_movi.

Changes from v4:
  * Remove assert from tcg_out_movi; rely on the one in tcg_out_movi_int (pmm).
  * Finish conversion of patch_reloc (pmm).
  * Simplify unaligned access loads.

Changes from v3:
  * Rebase on master, two patches merged.


r~

Richard Henderson (8):
  tcg/sparc: Use tcg_out_movi_imm13 in tcg_out_addsub2_i64
  tcg/sparc: Split out tcg_out_movi_imm32
  tcg/sparc: Add scratch argument to tcg_out_movi_int
  tcg/sparc: Improve code gen for shifted 32-bit constants
  tcg/sparc: Convert patch_reloc to return bool
  tcg/sparc: Use the constant pool for 64-bit constants
  tcg/sparc: Add tcg_out_jmpl_const for better tail calls
  tcg/sparc: Support unaligned access for user-only

 tcg/sparc/tcg-target.c.inc | 348 +++--
 1 file changed, 296 insertions(+), 52 deletions(-)

-- 
2.25.1

[RFC 8/8] multiprocess: handle ioregionfd commands

2022-02-08 Thread Elena Ufimtseva

Signed-off-by: Elena Ufimtseva 
---
 include/hw/remote/ioregionfd.h |   2 +
 include/hw/remote/remote.h |   2 +
 linux-headers/ioregionfd.h |  30 +
 hw/remote/ioregionfd.c | 111 +
 hw/remote/remote-obj.c |  44 +
 5 files changed, 189 insertions(+)
 create mode 100644 linux-headers/ioregionfd.h

diff --git a/include/hw/remote/ioregionfd.h b/include/hw/remote/ioregionfd.h
index 66bb459f76..8021eed6f1 100644
--- a/include/hw/remote/ioregionfd.h
+++ b/include/hw/remote/ioregionfd.h
@@ -40,4 +40,6 @@ typedef struct IORegionFDObject IORegionFDObject;
 GSList *ioregionfd_get_obj_list(void);
 IORegionFD *ioregionfd_get_by_bar(GSList *list, uint32_t bar);
 void ioregionfd_set_bar_type(GSList *list, uint32_t bar, bool memory);
+int qio_channel_ioregionfd_read(QIOChannel *ioc, gpointer opaque,
+Error **errp);
 #endif /* IOREGIONFD_H */
diff --git a/include/hw/remote/remote.h b/include/hw/remote/remote.h
index 46390c7934..53b570e1ac 100644
--- a/include/hw/remote/remote.h
+++ b/include/hw/remote/remote.h
@@ -23,6 +23,8 @@ struct RemoteObject {
 
 DeviceState *dev;
 DeviceListener listener;
+QIOChannel *ioregfd_ioc;
+AioContext *ioregfd_ctx;
 GHashTable *ioregionfd_hash;
 };
 
diff --git a/linux-headers/ioregionfd.h b/linux-headers/ioregionfd.h
new file mode 100644
index 00..58f9b5ba61
--- /dev/null
+++ b/linux-headers/ioregionfd.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: ((GPL-2.0-only WITH Linux-syscall-note) OR BSD-3-Clause) */
+#ifndef _UAPI_LINUX_IOREGION_H
+#define _UAPI_LINUX_IOREGION_H
+
+/* Wire protocol */
+
+struct ioregionfd_cmd {
+   __u8 cmd;
+   __u8 size_exponent : 4;
+   __u8 resp : 1;
+   __u8 padding[6];
+   __u64 user_data;
+   __u64 offset;
+   __u64 data;
+};
+
+struct ioregionfd_resp {
+   __u64 data;
+   __u8 pad[24];
+};
+
+#define IOREGIONFD_CMD_READ0
+#define IOREGIONFD_CMD_WRITE   1
+
+#define IOREGIONFD_SIZE_8BIT   0
+#define IOREGIONFD_SIZE_16BIT  1
+#define IOREGIONFD_SIZE_32BIT  2
+#define IOREGIONFD_SIZE_64BIT  3
+
+#endif
diff --git a/hw/remote/ioregionfd.c b/hw/remote/ioregionfd.c
index 1d371357c6..dd04c39e25 100644
--- a/hw/remote/ioregionfd.c
+++ b/hw/remote/ioregionfd.c
@@ -26,6 +26,7 @@
 #include "hw/pci/pci.h"
 #include "qapi/qapi-visit-qom.h"
 #include "hw/remote/remote.h"
+#include "ioregionfd.h"
 
 #define TYPE_IOREGIONFD_OBJECT "ioregionfd-object"
 OBJECT_DECLARE_TYPE(IORegionFDObject, IORegionFDObjectClass, IOREGIONFD_OBJECT)
@@ -91,6 +92,116 @@ void ioregionfd_set_bar_type(GSList *list, uint32_t bar, bool memory)
 }
 }
 
+int qio_channel_ioregionfd_read(QIOChannel *ioc, gpointer opaque,
+Error **errp)
+{
+struct RemoteObject *o = (struct RemoteObject *)opaque;
+struct ioregionfd_cmd cmd = {};
+struct iovec iov = {
+.iov_base = &cmd,
+.iov_len = sizeof(struct ioregionfd_cmd),
+};
+IORegionFDObject *ioregfd_obj;
+PCIDevice *pci_dev;
+hwaddr addr;
+struct ioregionfd_resp resp = {};
+int bar = 0;
+Error *local_err = NULL;
+uint64_t val = UINT64_MAX;
+AddressSpace *as;
+int ret = -EINVAL;
+
+ERRP_GUARD();
+
+if (!ioc) {
+return -EINVAL;
+}
+ret = qio_channel_readv_full(ioc, &iov, 1, NULL, 0, &local_err);
+
+if (ret == QIO_CHANNEL_ERR_BLOCK) {
+return -EINVAL;
+}
+
+if (ret <= 0) {
+/* read error or other side closed connection */
+if (local_err) {
+error_report_err(local_err);
+}
+error_setg(errp, "ioregionfd receive error");
+return -EINVAL;
+}
+
+bar = cmd.user_data;
+pci_dev = PCI_DEVICE(o->dev);
+addr = (hwaddr)(pci_get_bar_addr(pci_dev, bar) + cmd.offset);
+IORegionFDObject key = {.ioregfd = {.bar = bar} };
+ioregfd_obj = g_hash_table_lookup(o->ioregionfd_hash, &key);
+
+if (!ioregfd_obj) {
+error_setg(errp, "Could not find IORegionFDObject");
+return -EINVAL;
+}
+if (ioregfd_obj->ioregfd.memory) {
+as = &address_space_memory;
+} else {
+as = &address_space_io;
+}
+
+if (ret > 0 && pci_dev) {
+switch (cmd.cmd) {
+case IOREGIONFD_CMD_READ:
+ret = address_space_rw(as, addr, MEMTXATTRS_UNSPECIFIED,
+   (void *)&val, 1 << cmd.size_exponent,
+   false);
+if (ret != MEMTX_OK) {
+ret = -EINVAL;
+error_setg(errp, "Bad address %"PRIx64" in mem read", addr);
+val = UINT64_MAX;
+}
+
+memset(&resp, 0, sizeof(resp));
+resp.data = val;
+if (qio_channel_write_all(ioc, (char *)&resp, sizeof(resp),
+  &local_err)) {
+error_propagate(errp, local_err);
+goto fatal;
+

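The ioregionfd wire protocol in RFC 8/8 above encodes the access width as a power of two in the 4-bit size_exponent field, which qio_channel_ioregionfd_read() expands with `1 << cmd.size_exponent`. A minimal sketch of that mapping, copying the header's struct with <stdint.h> types; the helper names are hypothetical, and the 32-byte size check assumes GCC/Clang packing the two bit-fields into a single byte.

```c
#include <stdint.h>

struct ioregionfd_cmd {
    uint8_t cmd;                 /* IOREGIONFD_CMD_READ / _WRITE */
    uint8_t size_exponent : 4;   /* access size is 1 << size_exponent */
    uint8_t resp : 1;
    uint8_t padding[6];
    uint64_t user_data;          /* carries the BAR index in RFC 8/8 */
    uint64_t offset;
    uint64_t data;
};

#define IOREGIONFD_SIZE_8BIT   0
#define IOREGIONFD_SIZE_64BIT  3

/* Bytes covered by one command. */
static unsigned ioregionfd_access_size(const struct ioregionfd_cmd *c)
{
    return 1u << c->size_exponent;
}

/* Inverse mapping for a sender; -1 for widths the protocol lacks. */
static int ioregionfd_size_exponent(unsigned len)
{
    for (int e = IOREGIONFD_SIZE_8BIT; e <= IOREGIONFD_SIZE_64BIT; e++) {
        if (len == 1u << e) {
            return e;
        }
    }
    return -1;
}
```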
Re: [PATCH 00/31] vDPA shadow virtqueue

2022-02-08 Thread Jason Wang




On 2022/1/31 5:15 PM, Eugenio Perez Martin wrote:

On Fri, Jan 28, 2022 at 7:02 AM Jason Wang  wrote:


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:

This series enables shadow virtqueue (SVQ) for vhost-vdpa devices. This
is intended as a new method of tracking the memory the devices touch
during a migration process: instead of relying on the vhost device's dirty
logging capability, SVQ intercepts the VQ dataplane, forwarding the
descriptors between VM and device. This way qemu is the effective
writer of the guest's memory, as in qemu's emulated virtio device
operation.

When SVQ is enabled, qemu offers a new virtual address space to the
device to read and write into, and it maps new vrings and the guest
memory in it. SVQ also intercepts kicks and calls between the device
and the guest. Relaying used buffers would cause dirty memory to be
tracked, but in this RFC SVQ is not enabled automatically on migration.

Thanks to being a buffer relay system, SVQ can also be used to
bridge devices and drivers with different capabilities, like
devices that only support packed vrings (not split) and old guests with
no packed driver support.

It is based on the ideas of DPDK SW-assisted LM, in the DPDK series
https://patchwork.dpdk.org/cover/48370/ . However, this series does
not map the shadow vq in the guest's VA, but in qemu's.

This version of SVQ is limited in the number of features it can use with
guest and device, because this series is already very big otherwise.
Features like indirect or event_idx will be addressed in future series.

SVQ needs to be enabled with cmdline parameter x-svq, like:

-netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-svq=true

In this version it cannot be enabled or disabled at runtime. Further
series will remove this limitation and enable it only at migration
time.

Some patches are intentionally very small to ease review, but they can
be squashed if preferred.

Patches 1-10 prepare the SVQ and QEMU to support both guest-to-device
and device-to-guest notification forwarding, with the extra qemu hop.
That part can be tested in isolation if the cmdline change is reproduced.

Patches 11 to 18 implement the actual buffer forwarding, but with
no IOMMU support. It requires a vdpa device capable of addressing all
qemu vaddr.

Patches 19 to 23 add the iommu support, so devices with address
range limitations can access SVQ through this newly created virtual
address space.

The rest of the series adds the last pieces needed for migration.

Comments are welcome.


I wonder about the performance impact, so performance numbers are more
than welcome.


Sure, I'll do it for the next revision. Since this one brings a decent
amount of changes, I chose to collect the feedback first.



A simple single TCP_STREAM netperf test should be sufficient to give 
some basic understanding about the performance impact.


Thanks




Thanks!


Thanks



TODO:
* Event, indirect, packed, and other features of virtio.
* Separate buffer forwarding into its own AIO context, so we can
throw more threads at that task and don't need to stop the main
event loop.
* Support virtio-net control vq.
* Proper documentation.

Changes from v5 RFC:
* Remove dynamic enablement of SVQ, making it less dependent on the device.
* Enable live migration if SVQ is enabled.
* Fix SVQ when driver reset.
* Comments addressed, especially in the iova area.
* Rebase on latest master, adding multiqueue support (but no networking
control vq processing).
v5 link:
https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg07250.html

Changes from v4 RFC:
* Support of allocating / freeing iova ranges in IOVA tree. Extending
already present iova-tree for that.
* Proper validation of guest features. Now SVQ can negotiate a
different set of features with the device when enabled.
* Support of host notifiers memory regions
* Handling of SVQ full queue in case guest's descriptors span to
different memory regions (qemu's VA chunks).
* Flush pending used buffers at end of SVQ operation.
* QMP command now looks up by NetClientState name. Other devices will need
to implement their own way to enable vdpa.
* Rename QMP command to set, so it looks more like a way of working
* Better use of qemu error system
* Make a few assertions proper error-handling paths.
* Add more documentation
* Less coupling of virtio / vhost, that could cause friction on changes
* Addressed many other small comments and small fixes.

Changes from v3 RFC:
* Move everything to vhost-vdpa backend. A big change, this allowed
  some cleanup but more code has been added in other places.
* More use of glib utilities, especially to manage memory.
v3 link:
https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html

Changes from v2 RFC:
* Adding vhost-vdpa devices support
* Fixed some memory leaks pointed out by different comments
v2 link:
https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html

Changes from v1 RFC:
* Use QMP inste

Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding

2022-02-08 Thread Jason Wang




On 2022/2/2 1:08 AM, Eugenio Perez Martin wrote:

On Sun, Jan 30, 2022 at 5:43 AM Jason Wang  wrote:


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:

Initial version of shadow virtqueue that actually forwards buffers. There
is no iommu support at the moment, and that will be addressed in future
patches of this series. Since all vhost-vdpa devices use forced IOMMU,
this means that SVQ is not usable at this point of the series on any
device.

For simplicity it only supports modern devices, which expect the vring
in little endian, with split ring and no event idx or indirect
descriptors. Support for them will not be added in this series.

It reuses the VirtQueue code for the device part. The driver part is
based on Linux's virtio_ring driver, but with stripped functionality
and optimizations so it's easier to review.

However, forwarding buffers has some particular pieces: one of the most
unexpected ones is that a guest's buffer can span more than
one descriptor in SVQ. While this is handled gracefully by qemu's
emulated virtio devices, it may cause unexpected SVQ queue full. This
patch also solves it by checking for this condition at both guest's
kicks and device's calls. The code may be more elegant in the future if
SVQ code runs in its own iocontext.

Signed-off-by: Eugenio Pérez 
---
   hw/virtio/vhost-shadow-virtqueue.h |   2 +
   hw/virtio/vhost-shadow-virtqueue.c | 365 -
   hw/virtio/vhost-vdpa.c | 111 -
   3 files changed, 462 insertions(+), 16 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 39aef5ffdf..19c934af49 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -33,6 +33,8 @@ uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq);
   size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
   size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);

+void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
+ VirtQueue *vq);
   void vhost_svq_stop(VhostShadowVirtqueue *svq);

   VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 7c168075d7..a1a404f68f 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -9,6 +9,8 @@

   #include "qemu/osdep.h"
   #include "hw/virtio/vhost-shadow-virtqueue.h"
+#include "hw/virtio/vhost.h"
+#include "hw/virtio/virtio-access.h"
   #include "standard-headers/linux/vhost_types.h"

   #include "qemu/error-report.h"
@@ -36,6 +38,33 @@ typedef struct VhostShadowVirtqueue {

   /* Guest's call notifier, where SVQ calls guest. */
   EventNotifier svq_call;
+
+/* Virtio queue shadowing */
+VirtQueue *vq;
+
+/* Virtio device */
+VirtIODevice *vdev;
+
+/* Map for returning guest's descriptors */
+VirtQueueElement **ring_id_maps;
+
+/* Next VirtQueue element that guest made available */
+VirtQueueElement *next_guest_avail_elem;
+
+/* Next head to expose to device */
+uint16_t avail_idx_shadow;
+
+/* Next free descriptor */
+uint16_t free_head;
+
+/* Last seen used idx */
+uint16_t shadow_used_idx;
+
+/* Next head to consume from device */
+uint16_t last_used_idx;
+
+/* Cache for the exposed notification flag */
+bool notification;
   } VhostShadowVirtqueue;

   #define INVALID_SVQ_KICK_FD -1
@@ -148,30 +177,294 @@ bool vhost_svq_ack_guest_features(uint64_t dev_features,
   return true;
   }

-/* Forward guest notifications */
-static void vhost_handle_guest_kick(EventNotifier *n)
+/**
+ * Number of descriptors that SVQ can make available from the guest.
+ *
+ * @svq   The svq
+ */
+static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
   {
-VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
- svq_kick);
+return svq->vring.num - (svq->avail_idx_shadow - svq->shadow_used_idx);
+}
+
+static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
+{
+uint16_t notification_flag;

-if (unlikely(!event_notifier_test_and_clear(n))) {
+if (svq->notification == enable) {
+return;
+}
+
+notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
+
+svq->notification = enable;
+if (enable) {
+svq->vring.avail->flags &= ~notification_flag;
+} else {
+svq->vring.avail->flags |= notification_flag;
+}
+}
+
+static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
+const struct iovec *iovec,
+size_t num, bool more_descs, bool write)
+{
+uint16_t i = svq->free_head, last = svq->free_head;
+unsigned n;
+uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
+vring_desc_t *descs = svq->vring.desc;
+
+if (num == 0) {
+

[PATCH v6 4/8] tcg/sparc: Improve code gen for shifted 32-bit constants

2022-02-08 Thread Richard Henderson

We had code for checking for 13 and 21-bit shifted constants,
but we can do better and allow 32-bit shifted constants.
This is still 2 insns shorter than the full 64-bit sequence.
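The check described above can be sketched in standalone C: strip the trailing zero bits with a count-trailing-zeros, then see whether what remains survives a round trip through a 32-bit signed or unsigned integer. If it does, the constant can be built as a 32-bit load followed by a single SLLX. This is an illustrative model only: the function names are hypothetical, it ignores the 13-bit and SETHI cases that the real tcg_out_movi_int() tries first, and it relies on the GCC/Clang `__builtin_ctzll` builtin and on `>>` of a negative value being an arithmetic shift (as it is on GCC and Clang).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v is unchanged by a round trip through int32_t or uint32_t. */
static bool fits_32(int64_t v)
{
    return v == (int64_t)(int32_t)v || v == (int64_t)(uint32_t)v;
}

/*
 * Hypothetical classifier: if arg is a 32-bit value shifted left by lsb,
 * return lsb (emit a 32-bit load, then SLLX by lsb); otherwise return -1
 * (fall back to the full 64-bit sequence).
 * Precondition: arg != 0, since the trailing-zero count of 0 is undefined.
 */
static int shifted_32bit_lsb(int64_t arg)
{
    assert(arg != 0);
    int lsb = __builtin_ctzll((uint64_t)arg);
    int64_t test = arg >> lsb;        /* arithmetic shift keeps the sign */

    return fits_32(test) ? lsb : -1;
}
```

For example, `0x12345678 << 20` classifies as a 32-bit value shifted by 23 (the low three bits of 0x12345678 are zero), while an odd constant with significant bits above bit 31 still needs the full sequence.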

Reviewed-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/sparc/tcg-target.c.inc | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tcg/sparc/tcg-target.c.inc b/tcg/sparc/tcg-target.c.inc
index 7b970d58e3..088c680f37 100644
--- a/tcg/sparc/tcg-target.c.inc
+++ b/tcg/sparc/tcg-target.c.inc
@@ -462,17 +462,17 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
 return;
 }
 
-/* A 21-bit constant, shifted.  */
+/* A 32-bit constant, shifted.  */
 lsb = ctz64(arg);
 test = (tcg_target_long)arg >> lsb;
-if (check_fit_tl(test, 13)) {
-tcg_out_movi_imm13(s, ret, test);
-tcg_out_arithi(s, ret, ret, lsb, SHIFT_SLLX);
-return;
-} else if (lsb > 10 && test == extract64(test, 0, 21)) {
+if (lsb > 10 && test == extract64(test, 0, 21)) {
 tcg_out_sethi(s, ret, test << 10);
 tcg_out_arithi(s, ret, ret, lsb - 10, SHIFT_SLLX);
 return;
+} else if (test == (uint32_t)test || test == (int32_t)test) {
+tcg_out_movi_int(s, TCG_TYPE_I64, ret, test, in_prologue, scratch);
+tcg_out_arithi(s, ret, ret, lsb, SHIFT_SLLX);
+return;
 }
 
 /* A 64-bit constant decomposed into 2 32-bit pieces.  */
-- 
2.25.1

[PATCH v6 5/8] tcg/sparc: Convert patch_reloc to return bool

2022-02-08 Thread Richard Henderson

Since 7ecd02a06f8, if patch_reloc fails we restart translation
with a smaller TB.  Sparc had its function signature changed,
but not the logic.  Replace assert with return false.
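The range check that replaces the assert can be illustrated with a small standalone sketch. `fits_signed()` and `patch_disp19()` are hypothetical stand-ins, not QEMU's check_fit_ptr()/INSN_OFF19() helpers; the point is only that an out-of-range displacement is reported to the caller instead of aborting, so translation can be retried with a smaller TB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if val fits in a signed field of 'bits' bits. */
static bool fits_signed(int64_t val, unsigned bits)
{
    assert(bits > 0 && bits < 64);
    int64_t min = -((int64_t)1 << (bits - 1));
    int64_t max = ((int64_t)1 << (bits - 1)) - 1;

    return val >= min && val <= max;
}

/*
 * Hypothetical relocation patcher for a 19-bit word displacement held in
 * the low 19 bits of the instruction. Returns false, leaving the
 * instruction untouched, when the displacement is out of range.
 */
static bool patch_disp19(uint32_t *insn, int64_t pcrel_bytes)
{
    int64_t disp = pcrel_bytes >> 2;  /* displacements count 4-byte words */

    if (!fits_signed(disp, 19)) {
        return false;                 /* let the caller retry with a smaller TB */
    }
    *insn &= ~(uint32_t)0x7ffff;      /* clear the old displacement field */
    *insn |= (uint32_t)disp & 0x7ffff;
    return true;
}
```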

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/sparc/tcg-target.c.inc | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tcg/sparc/tcg-target.c.inc b/tcg/sparc/tcg-target.c.inc
index 088c680f37..ae809c9941 100644
--- a/tcg/sparc/tcg-target.c.inc
+++ b/tcg/sparc/tcg-target.c.inc
@@ -323,12 +323,16 @@ static bool patch_reloc(tcg_insn_unit *src_rw, int type,
 
 switch (type) {
 case R_SPARC_WDISP16:
-assert(check_fit_ptr(pcrel >> 2, 16));
+if (!check_fit_ptr(pcrel >> 2, 16)) {
+return false;
+}
 insn &= ~INSN_OFF16(-1);
 insn |= INSN_OFF16(pcrel);
 break;
 case R_SPARC_WDISP19:
-assert(check_fit_ptr(pcrel >> 2, 19));
+if (!check_fit_ptr(pcrel >> 2, 19)) {
+return false;
+}
 insn &= ~INSN_OFF19(-1);
 insn |= INSN_OFF19(pcrel);
 break;
-- 
2.25.1

Re: [PATCH v4 02/12] mm/memfd: Introduce MFD_INACCESSIBLE flag

2022-02-08 Thread David Hildenbrand

On 07.02.22 19:51, Vlastimil Babka wrote:
> On 1/18/22 14:21, Chao Peng wrote:
>> Introduce a new memfd_create() flag indicating the content of the
>> created memfd is inaccessible from userspace. It does this by force-setting
>> the F_SEAL_INACCESSIBLE seal when the file is created. It also sets
>> F_SEAL_SEAL to prevent future sealing, which means it cannot coexist
>> with MFD_ALLOW_SEALING.
>>
>> The pages backed by such memfd will be used as guest private memory in
>> confidential computing environments such as Intel TDX/AMD SEV. Since
>> page migration/swapping is not yet supported for such usages, these
>> pages are currently marked as UNMOVABLE and UNEVICTABLE, which makes
>> them behave like long-term pinned pages.
> 
> Shouldn't the amount of such memory allocations be restricted? E.g. similar
> to secretmem_mmap() doing mlock_future_check().

I've raised this already in the past and Kirill wanted to look into it [1].

We'll most certainly need a way to limit/control the amount of
unswappable + unmovable ("worse than mlock" memory) a user/process can
consume via this mechanism.


[1]
https://lkml.kernel.org/r/20211122135933.arjxpl7wyskkw...@box.shutemov.name


-- 
Thanks,

David / dhildenb

Re: [PATCH 22/31] vhost: Add VhostIOVATree

2022-02-08 Thread Jason Wang




On 2022/2/2 1:27 AM, Eugenio Perez Martin wrote:

On Sun, Jan 30, 2022 at 6:21 AM Jason Wang  wrote:


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:

This tree is able to look for a translated address from an IOVA address.

At first glance it is similar to util/iova-tree. However, SVQ working on
devices with limited IOVA space needs more capabilities,


So does the IOVA tree (e.g. l2 vtd can only work in the range of GAW and
without RMRRs).



   like allocating
IOVA chunks or performing reverse translations (qemu addresses to iova).


This looks like a general request as well. So I wonder if we can simply
extend iova tree instead.


While both are true, I don't see code that performs allocations or
qemu vaddr to iova translations. But if the changes can be integrated
into iova-tree that would be great for sure.

The main drawback I see is the need to maintain two trees instead of
one for users of iova-tree. While the complexity does not grow, it
doubles the amount of work needed.



If you care about performance, we can disable the reverse mapping
during the allocation. vIOMMU users won't notice any performance
penalty.


Thanks




Thanks!


Thanks



The allocation capability, as in "assign a free IOVA address to this chunk
of memory in qemu's address space", allows the shadow virtqueue to create a
new address space that is not restricted by the guest's addressable one, so
we can allocate the shadow vqs' vrings outside of it.
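As a rough illustration of the allocation capability, the sketch below does a first-fit search for a free hole in the device's IOVA window. It assumes a hypothetical sorted array of non-overlapping ranges instead of the IOVATree/GTree pair used by the patch, and uses the same inclusive-size convention as DMAMap (size is length minus one). The patch clamps the window start to at least the host page size because some devices dislike address 0; a caller would pass that clamped value as `first`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping covering [iova, iova + size] inclusive. */
typedef struct {
    uint64_t iova;
    uint64_t size;
} IovaRange;

/*
 * First-fit search for a free hole of (size + 1) bytes inside the device
 * window [first, last]. 'maps' must be sorted by iova and non-overlapping.
 * Wrap-around at UINT64_MAX is not handled in this sketch.
 */
static bool iova_alloc_first_fit(const IovaRange *maps, size_t n,
                                 uint64_t first, uint64_t last,
                                 uint64_t size, uint64_t *out)
{
    uint64_t start = first;            /* lowest candidate start address */

    assert(first <= last);
    for (size_t i = 0; i <= n; i++) {
        uint64_t hole_end;             /* inclusive end of the current hole */

        if (i == n) {
            hole_end = last;           /* hole after the last mapping */
        } else if (maps[i].iova + maps[i].size < start) {
            continue;                  /* mapping ends below the candidate */
        } else if (maps[i].iova <= start) {
            start = maps[i].iova + maps[i].size + 1;   /* skip over it */
            continue;
        } else {
            hole_end = maps[i].iova - 1;   /* hole ends before this mapping */
        }
        if (hole_end > last) {
            hole_end = last;
        }
        if (start <= hole_end && hole_end - start >= size) {
            *out = start;
            return true;
        }
        if (i < n) {
            start = maps[i].iova + maps[i].size + 1;
        }
    }
    return false;
}
```

A linear scan is the simplest way to show the idea; a tree, as in the patch, avoids walking every mapping on each allocation.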

It duplicates the tree so it can search efficiently in both directions,
and it will signal overlap if the iova or the translated address is
present in either tree.
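The overlap rule can be sketched as follows: a new mapping is rejected if its IOVA range or its translated range intersects an existing entry, using the same inclusive comparison as vhost_iova_tree_cmp_taddr(). The `IovaTaddrMap` type and the linear scan are hypothetical simplifications; keeping one tree per direction, as the patch does, is what makes both lookups efficient.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping in both address spaces; ranges are inclusive. */
typedef struct {
    uint64_t iova;
    uint64_t taddr;     /* translated (QEMU virtual) address */
    uint64_t size;      /* length minus one, as with DMAMap */
} IovaTaddrMap;

/* Two inclusive ranges overlap unless one starts after the other ends. */
static bool ranges_overlap(uint64_t a, uint64_t a_size,
                           uint64_t b, uint64_t b_size)
{
    return !(a > b + b_size || a + a_size < b);
}

/* Reject a new map if it collides in EITHER address space. */
static bool map_conflicts(const IovaTaddrMap *maps, size_t n,
                          const IovaTaddrMap *m)
{
    assert(m != NULL);
    for (size_t i = 0; i < n; i++) {
        if (ranges_overlap(m->iova, m->size, maps[i].iova, maps[i].size) ||
            ranges_overlap(m->taddr, m->size, maps[i].taddr, maps[i].size)) {
            return true;
        }
    }
    return false;
}
```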

Signed-off-by: Eugenio Pérez 
---
   hw/virtio/vhost-iova-tree.h |  27 +++
   hw/virtio/vhost-iova-tree.c | 157 
   hw/virtio/meson.build   |   2 +-
   3 files changed, 185 insertions(+), 1 deletion(-)
   create mode 100644 hw/virtio/vhost-iova-tree.h
   create mode 100644 hw/virtio/vhost-iova-tree.c

diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
new file mode 100644
index 00..610394eaf1
--- /dev/null
+++ b/hw/virtio/vhost-iova-tree.h
@@ -0,0 +1,27 @@
+/*
+ * vhost software live migration ring
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HW_VIRTIO_VHOST_IOVA_TREE_H
+#define HW_VIRTIO_VHOST_IOVA_TREE_H
+
+#include "qemu/iova-tree.h"
+#include "exec/memory.h"
+
+typedef struct VhostIOVATree VhostIOVATree;
+
+VhostIOVATree *vhost_iova_tree_new(uint64_t iova_first, uint64_t iova_last);
+void vhost_iova_tree_delete(VhostIOVATree *iova_tree);
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_delete);
+
+const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
+const DMAMap *map);
+int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map);
+void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map);
+
+#endif
diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
new file mode 100644
index 00..0021dbaf54
--- /dev/null
+++ b/hw/virtio/vhost-iova-tree.c
@@ -0,0 +1,157 @@
+/*
+ * vhost software live migration ring
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/iova-tree.h"
+#include "vhost-iova-tree.h"
+
+#define iova_min_addr qemu_real_host_page_size
+
+/**
+ * VhostIOVATree, able to:
+ * - Translate iova address
+ * - Reverse translate iova address (from translated to iova)
+ * - Allocate IOVA regions for translated range (potentially slow operation)
+ *
+ * Note that it cannot remove nodes.
+ */
+struct VhostIOVATree {
+/* First addressable iova address in the device */
+uint64_t iova_first;
+
+/* Last addressable iova address in the device */
+uint64_t iova_last;
+
+/* IOVA address to qemu memory maps. */
+IOVATree *iova_taddr_map;
+
+/* QEMU virtual memory address to iova maps */
+GTree *taddr_iova_map;
+};
+
+static gint vhost_iova_tree_cmp_taddr(gconstpointer a, gconstpointer b,
+  gpointer data)
+{
+const DMAMap *m1 = a, *m2 = b;
+
+if (m1->translated_addr > m2->translated_addr + m2->size) {
+return 1;
+}
+
+if (m1->translated_addr + m1->size < m2->translated_addr) {
+return -1;
+}
+
+/* Overlapped */
+return 0;
+}
+
+/**
+ * Create a new IOVA tree
+ *
+ * Returns the new IOVA tree
+ */
+VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
+{
+VhostIOVATree *tree = g_new(VhostIOVATree, 1);
+
+/* Some devices do not like 0 addresses */
+tree->iova_first = MAX(iova_first, iova_min_addr);
+tree->iova_last = iova_last;
+
+tree->iova_taddr_map = iova_tree_new();
+tree->taddr_iova_map = g_tree_new_full(vhost_iova_tree_cmp_taddr, NULL,
+

[PATCH v6 8/8] tcg/sparc: Support unaligned access for user-only

2022-02-08 Thread Richard Henderson

This is kinda sorta the opposite of the other tcg hosts, where
we get (normal) alignment checks for free with host SIGBUS and
need to add code to support unaligned accesses.

This inline code expansion is somewhat large, but it takes quite
a few instructions to make a function call to a helper anyway.
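The decision described above can be modelled in plain C. This is not the instruction sequence the patch emits: `align_test_mask()` mirrors the ANDCC test on the low address bits (including the limit that the mask must fit a 13-bit signed immediate), and `load_be_bytewise()` is one simple, hypothetical way to assemble a misaligned big-endian value without trapping. It does not claim to match how the patch's inline expansion combines the bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Mask tested with ANDCC: the fast path is taken when (addr & mask) == 0.
 * t_bits must stay below 13 so the mask fits SPARC's 13-bit signed
 * immediate, mirroring the tcg_debug_assert() in the patch. */
static uint64_t align_test_mask(unsigned a_bits, unsigned s_bits)
{
    unsigned t_bits = a_bits > s_bits ? a_bits : s_bits;

    assert(t_bits < 13);
    return (UINT64_C(1) << t_bits) - 1;
}

/* Slow-path model: build a big-endian value of (1 << s_bits) bytes one
 * byte at a time; byte loads never trap on misalignment. */
static uint64_t load_be_bytewise(const uint8_t *p, unsigned s_bits)
{
    uint64_t v = 0;

    for (unsigned i = 0; i < (1u << s_bits); i++) {
        v = (v << 8) | p[i];
    }
    return v;
}
```

Keeping a single full-width load on the aligned path is what preserves atomicity there; only the rare misaligned case pays for the longer sequence.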

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/sparc/tcg-target.c.inc | 219 +++--
 1 file changed, 211 insertions(+), 8 deletions(-)

diff --git a/tcg/sparc/tcg-target.c.inc b/tcg/sparc/tcg-target.c.inc
index ed83e2dcd7..f227572857 100644
--- a/tcg/sparc/tcg-target.c.inc
+++ b/tcg/sparc/tcg-target.c.inc
@@ -211,6 +211,7 @@ static const int tcg_target_call_oarg_regs[] = {
 #define ARITH_ADD  (INSN_OP(2) | INSN_OP3(0x00))
 #define ARITH_ADDCC (INSN_OP(2) | INSN_OP3(0x10))
 #define ARITH_AND  (INSN_OP(2) | INSN_OP3(0x01))
+#define ARITH_ANDCC (INSN_OP(2) | INSN_OP3(0x11))
 #define ARITH_ANDN (INSN_OP(2) | INSN_OP3(0x05))
 #define ARITH_OR   (INSN_OP(2) | INSN_OP3(0x02))
 #define ARITH_ORCC (INSN_OP(2) | INSN_OP3(0x12))
@@ -1025,6 +1026,38 @@ static void build_trampolines(TCGContext *s)
 tcg_out_mov_delay(s, TCG_REG_O0, TCG_AREG0);
 }
 }
+#else
+static const tcg_insn_unit *qemu_unalign_ld_trampoline;
+static const tcg_insn_unit *qemu_unalign_st_trampoline;
+
+static void build_trampolines(TCGContext *s)
+{
+for (int ld = 0; ld < 2; ++ld) {
+void *helper;
+
+while ((uintptr_t)s->code_ptr & 15) {
+tcg_out_nop(s);
+}
+
+if (ld) {
+helper = helper_unaligned_ld;
+qemu_unalign_ld_trampoline = tcg_splitwx_to_rx(s->code_ptr);
+} else {
+helper = helper_unaligned_st;
+qemu_unalign_st_trampoline = tcg_splitwx_to_rx(s->code_ptr);
+}
+
+if (!SPARC64 && TARGET_LONG_BITS == 64) {
+/* Install the high part of the address.  */
+tcg_out_arithi(s, TCG_REG_O1, TCG_REG_O2, 32, SHIFT_SRLX);
+}
+
+/* Tail call.  */
+tcg_out_jmpl_const(s, helper, true, true);
+/* delay slot -- set the env argument */
+tcg_out_mov_delay(s, TCG_REG_O0, TCG_AREG0);
+}
+}
 #endif
 
 /* Generate global QEMU prologue and epilogue code */
@@ -1075,9 +1108,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 /* delay slot */
 tcg_out_movi_imm13(s, TCG_REG_O0, 0);
 
-#ifdef CONFIG_SOFTMMU
 build_trampolines(s);
-#endif
 }
 
 static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
@@ -1162,18 +1193,22 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addr, int mem_index,
 static const int qemu_ld_opc[(MO_SSIZE | MO_BSWAP) + 1] = {
 [MO_UB]   = LDUB,
 [MO_SB]   = LDSB,
+[MO_UB | MO_LE] = LDUB,
+[MO_SB | MO_LE] = LDSB,
 
 [MO_BEUW] = LDUH,
 [MO_BESW] = LDSH,
 [MO_BEUL] = LDUW,
 [MO_BESL] = LDSW,
 [MO_BEUQ] = LDX,
+[MO_BESQ] = LDX,
 
 [MO_LEUW] = LDUH_LE,
 [MO_LESW] = LDSH_LE,
 [MO_LEUL] = LDUW_LE,
 [MO_LESL] = LDSW_LE,
 [MO_LEUQ] = LDX_LE,
+[MO_LESQ] = LDX_LE,
 };
 
 static const int qemu_st_opc[(MO_SIZE | MO_BSWAP) + 1] = {
@@ -1192,11 +1227,12 @@ static void tcg_out_qemu_ld(TCGContext *s, TCGReg data, TCGReg addr,
 MemOpIdx oi, bool is_64)
 {
 MemOp memop = get_memop(oi);
+tcg_insn_unit *label_ptr;
+
 #ifdef CONFIG_SOFTMMU
 unsigned memi = get_mmuidx(oi);
 TCGReg addrz, param;
 const tcg_insn_unit *func;
-tcg_insn_unit *label_ptr;
 
 addrz = tcg_out_tlb_load(s, addr, memi, memop,
  offsetof(CPUTLBEntry, addr_read));
@@ -1260,13 +1296,99 @@ static void tcg_out_qemu_ld(TCGContext *s, TCGReg data, TCGReg addr,
 
 *label_ptr |= INSN_OFF19(tcg_ptr_byte_diff(s->code_ptr, label_ptr));
 #else
+TCGReg index = (guest_base ? TCG_GUEST_BASE_REG : TCG_REG_G0);
+unsigned a_bits = get_alignment_bits(memop);
+unsigned s_bits = memop & MO_SIZE;
+unsigned t_bits;
+
 if (SPARC64 && TARGET_LONG_BITS == 32) {
 tcg_out_arithi(s, TCG_REG_T1, addr, 0, SHIFT_SRL);
 addr = TCG_REG_T1;
 }
-tcg_out_ldst_rr(s, data, addr,
-(guest_base ? TCG_GUEST_BASE_REG : TCG_REG_G0),
+
+/*
+ * Normal case: alignment equal to access size.
+ */
+if (a_bits == s_bits) {
+tcg_out_ldst_rr(s, data, addr, index,
+qemu_ld_opc[memop & (MO_BSWAP | MO_SSIZE)]);
+return;
+}
+
+/*
+ * Test for at least natural alignment, and assume most accesses
+ * will be aligned -- perform a straight load in the delay slot.
+ * This is required to preserve atomicity for aligned accesses.
+ */
+t_bits = MAX(a_bits, s_bits);
+tcg_debug_assert(t_bits < 13);
+tcg_out_arithi(s, TCG_REG_G0, addr, (1u << t_bits) - 1, ARITH_ANDCC);
+
+/* beq,a,pt %icc, label */
+label_ptr = s->code_ptr;
+tcg_out_bpcc0(s, COND_E, BPCC_A | BPCC_PT | BPCC_ICC, 0);

Re: [PATCH 2/5] libvduse: Add VDUSE (vDPA Device in Userspace) library

2022-02-08 Thread Stefan Hajnoczi

On Tue, Feb 08, 2022 at 02:42:41PM +0800, Yongji Xie wrote:
> On Mon, Feb 7, 2022 at 10:01 PM Stefan Hajnoczi  wrote:
> >
> > On Tue, Jan 25, 2022 at 09:17:57PM +0800, Xie Yongji wrote:
> > > +int vduse_dev_handler(VduseDev *dev)
> > > +{
> > > +struct vduse_dev_request req;
> > > +struct vduse_dev_response resp = { 0 };
> > > +VduseVirtq *vq;
> > > +int i, ret;
> > > +
> > > +ret = read(dev->fd, &req, sizeof(req));
> >
> > This file descriptor is blocking? I guess the assumption is that the
> > kernel VDUSE code always enqueues at least one struct vduse_dev_request,
> > so userspace will not block when the file descriptor becomes readable?
> >
> 
> Yes, that's true. We can always get one entire request if the file
> descriptor becomes readable.

Okay, then the code is fine. We trust the kernel not to block us. While
it is possible to get spurious select(2)/poll(2) ready file descriptors
in the general case (e.g. multiple processes monitoring the same file),
I don't think that can happen in this case.
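A defensive version of the read being discussed can be sketched as below. `struct demo_request` is a hypothetical stand-in for struct vduse_dev_request; the helper retries on EINTR and treats a short read as an error instead of processing a partial request, which costs nothing if the kernel guarantee described above holds.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical fixed-size message standing in for struct vduse_dev_request. */
struct demo_request {
    uint32_t type;
    uint32_t request_id;
};

/* Read exactly one request from a blocking fd. Returns 0 on success,
 * -errno on a read error, or -EIO if only part of a request arrived. */
static int read_one_request(int fd, struct demo_request *req)
{
    ssize_t n;

    assert(req != NULL);
    do {
        n = read(fd, req, sizeof(*req));
    } while (n < 0 && errno == EINTR);   /* retry if interrupted by a signal */

    if (n < 0) {
        return -errno;
    }
    if ((size_t)n != sizeof(*req)) {
        return -EIO;                     /* short read: never process it */
    }
    return 0;
}
```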



[RFC 3/8] ioregionfd: introduce memory API functions

2022-02-08 Thread Elena Ufimtseva

Similar to ioeventfd, introduce the ioregionfd
functions to add and delete ioregionfds.
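address_space_add_del_ioregionfds() in this patch is a merge walk over two sorted lists, following the existing ioeventfd code. The sketch below isolates that algorithm, with plain `int` keys standing in for MemoryRegionIoregionfd and `<` standing in for memory_region_ioregionfd_before(); `diff_sorted()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Merge-walk two sorted, duplicate-free arrays: entries only in 'old'
 * are deleted, entries only in 'cur' are added, and entries present in
 * both are left alone.
 */
static void diff_sorted(const int *old, size_t n_old,
                        const int *cur, size_t n_cur,
                        int *dels, size_t *n_dels,
                        int *adds, size_t *n_adds)
{
    size_t iold = 0, icur = 0;

    assert(n_dels != NULL && n_adds != NULL);
    *n_dels = *n_adds = 0;
    while (iold < n_old || icur < n_cur) {
        if (iold < n_old && (icur == n_cur || old[iold] < cur[icur])) {
            dels[(*n_dels)++] = old[iold++];     /* gone: ioregionfd_del */
        } else if (icur < n_cur && (iold == n_old || cur[icur] < old[iold])) {
            adds[(*n_adds)++] = cur[icur++];     /* new: ioregionfd_add */
        } else {
            iold++;                              /* in both: unchanged */
            icur++;
        }
    }
}
```

Because both inputs are sorted, each element is visited once, so an update costs O(old + new) listener calls at most.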

Signed-off-by: Elena Ufimtseva 
---
 softmmu/memory.c | 207 +++
 1 file changed, 207 insertions(+)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7340e19ff5..3618c5d1cf 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -40,6 +40,7 @@ static unsigned memory_region_transaction_depth;
 static bool memory_region_update_pending;
 static bool ioeventfd_update_pending;
 unsigned int global_dirty_tracking;
+static bool ioregionfd_update_pending;
 
 static QTAILQ_HEAD(, MemoryListener) memory_listeners
 = QTAILQ_HEAD_INITIALIZER(memory_listeners);
@@ -170,6 +171,13 @@ struct MemoryRegionIoeventfd {
 EventNotifier *e;
 };
 
+struct MemoryRegionIoregionfd {
+AddrRange addr;
+uint64_t data;
+int fd;
+bool pio;
+};
+
 static bool memory_region_ioeventfd_before(MemoryRegionIoeventfd *a,
MemoryRegionIoeventfd *b)
 {
@@ -214,6 +222,33 @@ static bool memory_region_ioeventfd_equal(MemoryRegionIoeventfd *a,
 return false;
 }
 
+static bool memory_region_ioregionfd_before(MemoryRegionIoregionfd *a,
+   MemoryRegionIoregionfd *b)
+{
+if (int128_lt(a->addr.start, b->addr.start)) {
+return true;
+} else if (int128_gt(a->addr.start, b->addr.start)) {
+return false;
+} else if (int128_lt(a->addr.size, b->addr.size)) {
+return true;
+} else if (int128_gt(a->addr.size, b->addr.size)) {
+return false;
+}
+return false;
+}
+
+static bool memory_region_ioregionfd_equal(MemoryRegionIoregionfd *a,
+  MemoryRegionIoregionfd *b)
+{
+if (int128_eq(a->addr.start, b->addr.start) &&
+(!int128_nz(a->addr.size) || !int128_nz(b->addr.size) ||
+ (int128_eq(a->addr.size, b->addr.size) &&
+  (a->fd == b->fd))))
+return true;
+
+return false;
+}
+
 /* Range of memory in the global map.  Addresses are absolute. */
 struct FlatRange {
 MemoryRegion *mr;
@@ -800,6 +835,52 @@ static void address_space_add_del_ioeventfds(AddressSpace *as,
 }
 }
 
+static void address_space_add_del_ioregionfds(AddressSpace *as,
+  MemoryRegionIoregionfd *fds_new,
+  unsigned fds_new_nb,
+  MemoryRegionIoregionfd *fds_old,
+  unsigned fds_old_nb)
+{
+unsigned iold, inew;
+MemoryRegionIoregionfd *fd;
+MemoryRegionSection section;
+
+iold = inew = 0;
+while (iold < fds_old_nb || inew < fds_new_nb) {
+if (iold < fds_old_nb
+&& (inew == fds_new_nb
+|| memory_region_ioregionfd_before(&fds_old[iold],
+  &fds_new[inew]))) {
+fd = &fds_old[iold];
+section = (MemoryRegionSection) {
+.fv = address_space_to_flatview(as),
+.offset_within_address_space = int128_get64(fd->addr.start),
+.size = fd->addr.size,
+};
+MEMORY_LISTENER_CALL(as, ioregionfd_del, Forward, &section,
+ fd->data, fd->fd);
+++iold;
+
+} else if (inew < fds_new_nb
+   && (iold == fds_old_nb
+   || memory_region_ioregionfd_before(&fds_new[inew],
+ &fds_old[iold]))) {
+fd = &fds_new[inew];
+section = (MemoryRegionSection) {
+.fv = address_space_to_flatview(as),
+.offset_within_address_space = int128_get64(fd->addr.start),
+.size = fd->addr.size,
+};
+MEMORY_LISTENER_CALL(as, ioregionfd_add, Reverse, &section,
+ fd->data, fd->fd);
+++inew;
+} else {
+++iold;
+++inew;
+}
+}
+}
+
 FlatView *address_space_get_flatview(AddressSpace *as)
 {
 FlatView *view;
@@ -814,6 +895,52 @@ FlatView *address_space_get_flatview(AddressSpace *as)
 return view;
 }
 
+static void address_space_update_ioregionfds(AddressSpace *as)
+{
+FlatView *view;
+FlatRange *fr;
+unsigned ioregionfd_nb = 0;
+unsigned ioregionfd_max;
+MemoryRegionIoregionfd *ioregionfds;
+AddrRange tmp;
+unsigned i;
+
+/*
+ * It is likely that the number of ioregionfds hasn't changed much, so use
+ * the previous size as the starting value, with some headroom to avoid
+ * gratuitous reallocations.
+ */
+ioregionfd_max = QEMU_ALIGN_UP(as->ioregionfd_nb, 4);
+ioregionfds = g_new(MemoryRegionIoregionfd, ioregionfd_max);
+
+view = address_space_get_flatview(as);
+FOR_EACH_FLAT_RANGE(fr, view) {
+for (i = 0; i < fr->mr->

Re: [PATCH 5/5] libvduse: Add support for reconnecting

2022-02-08 Thread Stefan Hajnoczi

On Tue, Feb 08, 2022 at 03:35:27PM +0800, Yongji Xie wrote:
> On Mon, Feb 7, 2022 at 10:39 PM Stefan Hajnoczi  wrote:
> >
> > On Tue, Jan 25, 2022 at 09:18:00PM +0800, Xie Yongji wrote:
> > > +static void *vduse_log_get(const char *dir, const char *name, size_t size)
> > > +{
> > > +void *ptr = MAP_FAILED;
> > > +char *path;
> > > +int fd;
> > > +
> > > +path = (char *)malloc(strlen(dir) + strlen(name) +
> > > +  strlen("/vduse-log-") + 1);
> > > +if (!path) {
> > > +return ptr;
> > > +}
> > > +sprintf(path, "%s/vduse-log-%s", dir, name);
> >
> > Please use g_strdup_printf() and g_autofree in QEMU code. In libvduse
> > code it's okay to use malloc(3), but regular QEMU code should use glib.
> >
> 
> But this code resides in libvduse currently.

Oops, I thought we were in block/export/vduse-blk.c. Then it's fine to
use malloc(3).

> > > +static int vduse_queue_check_inflights(VduseVirtq *vq)
> > > +{
> > > +int i = 0;
> > > +VduseDev *dev = vq->dev;
> > > +
> > > +vq->used_idx = vq->vring.used->idx;
> >
> > Is this reading struct vring_used->idx without le16toh()?
> >
> > > +vq->resubmit_num = 0;
> > > +vq->resubmit_list = NULL;
> > > +vq->counter = 0;
> > > +
> > > +if (unlikely(vq->log->inflight.used_idx != vq->used_idx)) {
> > > +vq->log->inflight.desc[vq->log->inflight.last_batch_head].inflight = 0;
> >
> > I suggest validating vq->log->inflight fields before using them.
> > last_batch_head must be less than the virtqueue size. Although the log
> > file is somewhat trusted, there may still be ways to corrupt it or
> > confuse the new process that loads it.
> >
> 
> I can validate the last_batch_head field. But it's hard to validate
> the inflight field, so we might still meet some issues if the file is
> corrupted.

It's okay if the log tells us to resubmit virtqueue buffers that have
garbage vring descriptors because the vring code needs to handle garbage
descriptors anyway.

But we cannot load dest[untrusted_input] or do anything else that could
crash, corrupt memory, etc.

> > > @@ -988,6 +1212,12 @@ VduseDev *vduse_dev_create(const char *name, uint32_t device_id,
> > >  vqs[i].index = i;
> > >  vqs[i].dev = dev;
> > >  vqs[i].fd = -1;
> > > +if (log) {
> > > +vqs[i].log = log;
> > > +vqs[i].log->inflight.desc_num = VIRTQUEUE_MAX_SIZE;
> > > +log = (void *)((char *)log +
> > > +  vduse_vq_log_size(VIRTQUEUE_MAX_SIZE));
> >
> > The size of the log needs to be verified. The file is mmapped but
> > there's no guarantee that the size matches num_queues *
> > vduse_vq_log_size(VIRTQUEUE_MAX_SIZE).
> >
> 
> We will call ftruncate() in vduse_log_get(). Is it enough?

Yes, I think so.
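Besides resizing with ftruncate(2), the size concern can be expressed as an explicit check, for example against `st_size` from fstat(2) before calling mmap(2). A sketch, where `per_queue` plays the role of vduse_vq_log_size(VIRTQUEUE_MAX_SIZE) and `log_size_matches()` is a hypothetical helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Check that a log file of 'file_size' bytes holds exactly 'num_queues'
 * per-queue logs of 'per_queue' bytes each, rejecting products that
 * would overflow 64 bits. */
static bool log_size_matches(uint64_t file_size, uint32_t num_queues,
                             uint64_t per_queue)
{
    assert(per_queue != 0);   /* a zero-sized per-queue log makes no sense */
    if (num_queues > UINT64_MAX / per_queue) {
        return false;         /* expected size is not representable */
    }
    return file_size == (uint64_t)num_queues * per_queue;
}
```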

Thanks,
Stefan



Re: [PATCH v3] i386/cpu: Remove the deprecated cpu model 'Icelake-Client'

2022-02-08 Thread Robert Hoo

Hi,

Can we remove the deprecated 'Icelake-Client' CPU model now? If so, I
can rebase the patch onto the latest tree and resend.

Thanks.

On Sat, 2021-05-08 at 11:16 +0800, Robert Hoo wrote:
> Hi,
> 
> Ping...
> 
> Thanks.
> 
> On Thu, 2021-04-29 at 09:35 +0800, Robert Hoo wrote:
> > As it has been marked deprecated since v5.2, I think it's now time to
> > remove it from the code.
> > 
> > Signed-off-by: Robert Hoo 
> > ---
> > Changelog:
> > v3:
> > Update deprecated.rst. (Sorry for my carelessness in the last
> > search. I swear I did search.)
> > v2:
> > Update removed-features.rst.
> > ---
> >  docs/system/deprecated.rst       |   6 --
> >  docs/system/removed-features.rst |   5 ++
> >  target/i386/cpu.c                | 118 ---
> >  3 files changed, 5 insertions(+), 124 deletions(-)
> > 
> > diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
> > index 80cae86..780b756 100644
> > --- a/docs/system/deprecated.rst
> > +++ b/docs/system/deprecated.rst
> > @@ -222,12 +222,6 @@ a future version of QEMU. Support for this CPU was removed from the
> >  upstream Linux kernel, and there is no available upstream toolchain
> >  to build binaries for it.
> >  
> > -``Icelake-Client`` CPU Model (since 5.2.0)
> > -''
> > -
> > -``Icelake-Client`` CPU Models are deprecated. Use ``Icelake-Server`` CPU
> > -Models instead.
> > -
> >  MIPS ``I7200`` CPU Model (since 5.2)
> >  
> >  
> > diff --git a/docs/system/removed-features.rst b/docs/system/removed-features.rst
> > index 29e9060..f1b5a16 100644
> > --- a/docs/system/removed-features.rst
> > +++ b/docs/system/removed-features.rst
> > @@ -285,6 +285,11 @@ The RISC-V no MMU cpus have been removed. The two CPUs: ``rv32imacu-nommu`` and
> >  ``rv64imacu-nommu`` can no longer be used. Instead the MMU status can be specified
> >  via the CPU ``mmu`` option when using the ``rv32`` or ``rv64`` CPUs.
> >  
> > +x86 Icelake-Client CPU (removed in 6.1)
> > +'''
> > +
> > +``Icelake-Client`` cpu can no longer be used. Use ``Icelake-Server`` instead.
> > +
> >  System emulator machines
> >  
> >  
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index ad99cad..75f2ad1 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -3338,124 +3338,6 @@ static X86CPUDefinition builtin_x86_defs[] = {
> >  .model_id = "Intel Xeon Processor (Cooperlake)",
> >  },
> >  {
> > -.name = "Icelake-Client",
> > -.level = 0xd,
> > -.vendor = CPUID_VENDOR_INTEL,
> > -.family = 6,
> > -.model = 126,
> > -.stepping = 0,
> > -.features[FEAT_1_EDX] =
> > -CPUID_VME | CPUID_SSE2 | CPUID_SSE | CPUID_FXSR | CPUID_MMX |
> > -CPUID_CLFLUSH | CPUID_PSE36 | CPUID_PAT | CPUID_CMOV | CPUID_MCA |
> > -CPUID_PGE | CPUID_MTRR | CPUID_SEP | CPUID_APIC | CPUID_CX8 |
> > -CPUID_MCE | CPUID_PAE | CPUID_MSR | CPUID_TSC | CPUID_PSE |
> > -CPUID_DE | CPUID_FP87,
> > -.features[FEAT_1_ECX] =
> > -CPUID_EXT_AVX | CPUID_EXT_XSAVE | CPUID_EXT_AES |
> > -CPUID_EXT_POPCNT | CPUID_EXT_X2APIC | CPUID_EXT_SSE42 |
> > -CPUID_EXT_SSE41 | CPUID_EXT_CX16 | CPUID_EXT_SSSE3 |
> > -CPUID_EXT_PCLMULQDQ | CPUID_EXT_SSE3 |
> > -CPUID_EXT_TSC_DEADLINE_TIMER | CPUID_EXT_FMA | CPUID_EXT_MOVBE |
> > -CPUID_EXT_PCID | CPUID_EXT_F16C | CPUID_EXT_RDRAND,
> > -.features[FEAT_8000_0001_EDX] =
> > -CPUID_EXT2_LM | CPUID_EXT2_RDTSCP | CPUID_EXT2_NX |
> > -CPUID_EXT2_SYSCALL,
> > -.features[FEAT_8000_0001_ECX] =
> > -CPUID_EXT3_ABM | CPUID_EXT3_LAHF_LM | CPUID_EXT3_3DNOWPREFETCH,
> > -.features[FEAT_8000_0008_EBX] =
> > -CPUID_8000_0008_EBX_WBNOINVD,
> > -.features[FEAT_7_0_EBX] =
> > -CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 |
> > -CPUID_7_0_EBX_HLE | CPUID_7_0_EBX_AVX2 | CPUID_7_0_EBX_SMEP |
> > -CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_ERMS | CPUID_7_0_EBX_INVPCID |
> > -CPUID_7_0_EBX_RTM | CPUID_7_0_EBX_RDSEED | CPUID_7_0_EBX_ADX |
> > -CPUID_7_0_EBX_SMAP,
> > -.features[FEAT_7_0_ECX] =
> > -CPUID_7_0_ECX_AVX512_VBMI | CPUID_7_0_ECX_UMIP | CPUID_7_0_ECX_PKU |
> > -CPUID_7_0_ECX_AVX512_VBMI2 | CPUID_7_0_ECX_GFNI |
> > -CPUID_7_0_ECX_VAES | CPUID_7_0_ECX_VPCLMULQDQ |
> > -CPUID_7_0_ECX_AVX512VNNI | CPUID_7_0_ECX_AVX512BITALG |
> > -CPUID_7_0_ECX_AVX512_VPOPCNTDQ,
> > -.features[FEAT_7_0_EDX] =
> > -CPUID_7_0_EDX_SPEC_CTRL | CPUID_7_0_EDX_SPEC_CTRL_SSBD,
> > -/* Missing: XSAVES (not supported by some L

[RFC 4/8] ioregionfd: Introduce IORegionDFObject type

2022-02-08 Thread Elena Ufimtseva

Signed-off-by: Elena Ufimtseva 
---
 meson.build|  15 ++-
 qapi/qom.json  |  32 +-
 include/hw/remote/ioregionfd.h |  40 +++
 hw/remote/ioregionfd.c | 196 +
 Kconfig.host   |   3 +
 MAINTAINERS|   2 +
 hw/remote/Kconfig  |   4 +
 hw/remote/meson.build  |   1 +
 meson_options.txt  |   2 +
 scripts/meson-buildoptions.sh  |   3 +
 10 files changed, 294 insertions(+), 4 deletions(-)
 create mode 100644 include/hw/remote/ioregionfd.h
 create mode 100644 hw/remote/ioregionfd.c

diff --git a/meson.build b/meson.build
index 96de1a6ef9..6483e754bd 100644
--- a/meson.build
+++ b/meson.build
@@ -258,6 +258,17 @@ if targetos != 'linux' and get_option('multiprocess').enabled()
 endif
 multiprocess_allowed = targetos == 'linux' and not get_option('multiprocess').disabled()
 
+# TODO: drop this limitation
+if not multiprocess_allowed and get_option('ioregionfd').enabled()
+  error('To enable ioregionfd support, enable the multiprocess option.')
+endif
+ioregionfd_allowed = multiprocess_allowed and not get_option('ioregionfd').disabled()
+if ioregionfd_allowed
+config_host += { 'CONFIG_IOREGIONFD': 'y' }
+else
+config_host += { 'CONFIG_IOREGIONFD': 'n' }
+endif
+
 libm = cc.find_library('m', required: false)
 threads = dependency('threads')
 util = cc.find_library('util', required: false)
@@ -1837,7 +1848,8 @@ host_kconfig = \
   (have_virtfs ? ['CONFIG_VIRTFS=y'] : []) + \
   ('CONFIG_LINUX' in config_host ? ['CONFIG_LINUX=y'] : []) + \
   ('CONFIG_PVRDMA' in config_host ? ['CONFIG_PVRDMA=y'] : []) + \
-  (multiprocess_allowed ? ['CONFIG_MULTIPROCESS_ALLOWED=y'] : [])
+  (multiprocess_allowed ? ['CONFIG_MULTIPROCESS_ALLOWED=y'] : []) + \
+  (ioregionfd_allowed ? ['CONFIG_IOREGIONFD=y'] : [])
 
 ignored = [ 'TARGET_XML_FILES', 'TARGET_ABI_DIR', 'TARGET_ARCH' ]
 
@@ -3315,6 +3327,7 @@ summary_info += {'target list':   ' '.join(target_dirs)}
 if have_system
   summary_info += {'default devices':   get_option('default_devices')}
   summary_info += {'out of process emulation': multiprocess_allowed}
+  summary_info += {'ioregionfd support': ioregionfd_allowed}
 endif
 summary(summary_info, bool_yn: true, section: 'Targets and accelerators')
 
diff --git a/qapi/qom.json b/qapi/qom.json
index eeb5395ff3..439fb94c93 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -689,6 +689,29 @@
 'data': { 'chardev': 'str',
   '*log': 'str' } }
 
+##
+# @IORegionFDObjectProperties:
+#
+# Describes ioregionfd for the device
+#
+# @devid: the id of the device to be associated with the ioregionfd
+#
+# @iofd: File descriptor
+#
+# @bar: BAR number to use with ioregionfd
+#
+# @start: offset from the BAR start address of ioregionfd
+#
+# @size: size of the ioregionfd
+##
+# Since: 2.9
+{ 'struct': 'IORegionFDObjectProperties',
+  'data': { 'devid': 'str',
+'iofd': 'str',
+'bar': 'int',
+'*start': 'int',
+'*size':'int' } }
+
 ##
 # @RemoteObjectProperties:
 #
@@ -842,8 +865,10 @@
 'tls-creds-psk',
 'tls-creds-x509',
 'tls-cipher-suites',
-{ 'name': 'x-remote-object', 'features': [ 'unstable' ] }
-  ] }
+{ 'name': 'x-remote-object', 'features': [ 'unstable' ] },
+{ 'name': 'ioregionfd-object',
+  'if': 'CONFIG_IOREGIONFD' }
+ ] }
 
 ##
 # @ObjectOptions:
@@ -905,7 +930,8 @@
   'tls-creds-psk':  'TlsCredsPskProperties',
   'tls-creds-x509': 'TlsCredsX509Properties',
   'tls-cipher-suites':  'TlsCredsProperties',
-  'x-remote-object':'RemoteObjectProperties'
+  'x-remote-object':'RemoteObjectProperties',
+  'ioregionfd-object':  'IORegionFDObjectProperties'
   } }
 
 ##
diff --git a/include/hw/remote/ioregionfd.h b/include/hw/remote/ioregionfd.h
new file mode 100644
index 00..c8a8b32ee0
--- /dev/null
+++ b/include/hw/remote/ioregionfd.h
@@ -0,0 +1,40 @@
+/*
+ * Ioregionfd headers
+ *
+ * Copyright © 2018, 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef IOREGIONFD_H
+#define IOREGIONFD_H
+
+#define PCI_BARS_NR 6
+
+typedef struct {
+uint64_t val;
+bool memory;
+} IORegionFDOp;
+
+typedef struct {
+int fd;
+char *devid;
+uint32_t bar;
+uint32_t start;
+uint32_t size;
+bool memory;
+} IORegionFD;
+
+struct IORegionFDObject {
+/* private */
+Object parent;
+
+IORegionFD ioregfd;
+QTAILQ_ENTRY(IORegionFDObject) next;
+};
+
+typedef struct IORegionFDObject IORegionFDObject;
+
+#endif /* IOREGIONFD_H */
diff --git a/hw/remote/ioregionfd.c b/hw/remote/ioregionfd.c
new file mode 100644
index 00..ae95f702a6
--- /dev/null
+++ b/hw/remote/ioregionfd.c
@@ -0,0 +1,196 @@
+/*
+ * Memory manager for remote device
+

Re: [PATCH] memory: Fix qemu crash on starting dirty log twice with stopped VM

2022-02-08 Thread Paolo Bonzini


On 2/7/22 11:36, Peter Xu wrote:

Yeah, I can do that. Though the latter "if (!flags)" check will also start to allow
nesting of memory_global_dirty_log_start(), and it'll make this assert useless:

 assert(!(global_dirty_tracking & flags));

I'll probably drop it too, then.

Curious: do we have any real case of nested calls to start the dirty log? I
always thought there were none, but I could be missing something.


I don't think so, but I think there's no disadvantage in allowing it.

Paolo
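For illustration, here is a minimal sketch of the nesting behaviour under discussion. The names are hypothetical and this is not QEMU's memory_global_dirty_log_start(). Flags that are already active are masked out, and the call returns early if nothing new remains, which is what the "if (!flags)" check would allow. A repeated start then becomes a no-op, so an assert(!(tracking & flags)) placed after the masking could never fire.

```c
#include <assert.h>

/* Hypothetical stand-ins; not QEMU's actual globals. */
static unsigned int dirty_tracking; /* bitmask of active tracking reasons */
static unsigned int log_starts;     /* times logging was actually switched on */

static void dirty_log_start(unsigned int flags)
{
    assert(flags != 0);
    /* Ignore reasons that are already being tracked. */
    flags &= ~dirty_tracking;
    if (!flags) {
        return; /* nested start: nothing new to enable */
    }
    if (!dirty_tracking) {
        log_starts++; /* first reason: this is where logging would begin */
    }
    dirty_tracking |= flags;
}
```

Starting with the same flag twice enables logging only once; adding a second flag extends the mask without restarting.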

Re: [PATCH v2 0/5] Misc OHCI clean ups

2022-02-08 Thread BALATON Zoltan


On Tue, 25 Jan 2022, BALATON Zoltan wrote:

v2 - Fixed checkpatch errors

Hello,


Ping?

Regards,
BALATON Zoltan


I have had these patches since last October, when we looked at what
causes problems with mac99 and USB. We found that the main problem is
likely not allowing pending packets per endpoint, which we did not fix,
but these patches came out of debugging that and trying to improve the
device model so that the real problem can eventually be fixed more
easily. So these are just clean ups, plus a fix for one potential issue
where isochronous transfers break a pending async packet; they do not
solve all the problems OHCI currently has. I'm sending them anyway as I
don't plan to work further on this, so this series could be taken as is
for now.

Regards,

BALATON Zoltan (5):
 usb/ohci: Move trace point and log ep number to help debugging
 usb/ohci: Move cancelling async packet to ohci_stop_endpoints()
 usb/ohci: Move USBPortOps related functions together
 usb/ohci: Merge ohci_async_cancel_device() into ohci_child_detach()
 usb/ohci: Don't use packet from OHCIState for isochronous transfers

hw/usb/hcd-ohci.c   | 297 +---
hw/usb/trace-events |   2 +-
2 files changed, 146 insertions(+), 153 deletions(-)

[RFC 6/8] multiprocess: add MPQEMU_CMD_BAR_INFO

2022-02-08 Thread Elena Ufimtseva

This command is used to request BAR type information from the
remote device.

Signed-off-by: Elena Ufimtseva 
---
 include/hw/remote/ioregionfd.h  |  2 ++
 include/hw/remote/machine.h |  1 +
 include/hw/remote/mpqemu-link.h |  2 ++
 hw/remote/ioregionfd.c  | 28 
 hw/remote/message.c | 38 +
 hw/remote/remote-obj.c  |  1 +
 6 files changed, 72 insertions(+)

diff --git a/include/hw/remote/ioregionfd.h b/include/hw/remote/ioregionfd.h
index 85a2ef2c4f..66bb459f76 100644
--- a/include/hw/remote/ioregionfd.h
+++ b/include/hw/remote/ioregionfd.h
@@ -38,4 +38,6 @@ struct IORegionFDObject {
 typedef struct IORegionFDObject IORegionFDObject;
 
 GSList *ioregionfd_get_obj_list(void);
+IORegionFD *ioregionfd_get_by_bar(GSList *list, uint32_t bar);
+void ioregionfd_set_bar_type(GSList *list, uint32_t bar, bool memory);
 #endif /* IOREGIONFD_H */
diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
index 2a2a33c4b2..71c53ba0d7 100644
--- a/include/hw/remote/machine.h
+++ b/include/hw/remote/machine.h
@@ -28,6 +28,7 @@ struct RemoteMachineState {
 typedef struct RemoteCommDev {
 PCIDevice *dev;
 QIOChannel *ioc;
+GSList *ioregions_list;
 } RemoteCommDev;
 
 #define TYPE_REMOTE_MACHINE "x-remote-machine"
diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
index 4ec0915885..be546e4586 100644
--- a/include/hw/remote/mpqemu-link.h
+++ b/include/hw/remote/mpqemu-link.h
@@ -17,6 +17,7 @@
 #include "exec/hwaddr.h"
 #include "io/channel-socket.h"
 #include "hw/remote/proxy.h"
+#include "hw/remote/ioregionfd.h"
 
 #define REMOTE_MAX_FDS 8
 
@@ -41,6 +42,7 @@ typedef enum {
 MPQEMU_CMD_BAR_READ,
 MPQEMU_CMD_SET_IRQFD,
 MPQEMU_CMD_DEVICE_RESET,
+MPQEMU_CMD_BAR_INFO,
 MPQEMU_CMD_MAX,
 } MPQemuCmd;
 
diff --git a/hw/remote/ioregionfd.c b/hw/remote/ioregionfd.c
index 85ec0f7d38..1d371357c6 100644
--- a/hw/remote/ioregionfd.c
+++ b/hw/remote/ioregionfd.c
@@ -63,6 +63,34 @@ GSList *ioregionfd_get_obj_list(void)
 return list;
 }
 
+IORegionFD *ioregionfd_get_by_bar(GSList *list, uint32_t bar)
+{
+IORegionFDObject *ioregionfd;
+GSList *elem;
+
+for (elem = list; elem; elem = elem->next) {
+ioregionfd = elem->data;
+
+if (ioregionfd->ioregfd.bar == bar) {
+return &ioregionfd->ioregfd;
+}
+}
+return NULL;
+}
+
+void ioregionfd_set_bar_type(GSList *list, uint32_t bar, bool memory)
+{
+IORegionFDObject *ioregionfd;
+GSList *elem;
+
+for (elem = list; elem; elem = elem->next) {
+ioregionfd = elem->data;
+if (ioregionfd->ioregfd.bar == bar) {
+ioregionfd->ioregfd.memory = memory;
+}
+}
+}
+
 static void ioregionfd_object_init(Object *obj)
 {
 IORegionFDObjectClass *k = IOREGIONFD_OBJECT_GET_CLASS(obj);
diff --git a/hw/remote/message.c b/hw/remote/message.c
index 11d729845c..a8fb9764ba 100644
--- a/hw/remote/message.c
+++ b/hw/remote/message.c
@@ -29,6 +29,8 @@ static void process_bar_write(QIOChannel *ioc, MPQemuMsg *msg, Error **errp);
 static void process_bar_read(QIOChannel *ioc, MPQemuMsg *msg, Error **errp);
 static void process_device_reset_msg(QIOChannel *ioc, PCIDevice *dev,
  Error **errp);
+static void process_device_get_reg_info(QIOChannel *ioc, RemoteCommDev *com,
+MPQemuMsg *msg, Error **errp);
 
 void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
 {
@@ -75,6 +77,9 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
 case MPQEMU_CMD_DEVICE_RESET:
 process_device_reset_msg(com->ioc, pci_dev, &local_err);
 break;
+case MPQEMU_CMD_BAR_INFO:
+process_device_get_reg_info(com->ioc, com, &msg, &local_err);
+break;
 default:
 error_setg(&local_err,
"Unknown command (%d) received for device %s"
@@ -91,6 +96,39 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
 }
 }
 
+static void process_device_get_reg_info(QIOChannel *ioc, RemoteCommDev *com,
+MPQemuMsg *msg, Error **errp)
+{
+ERRP_GUARD();
+uint32_t bar = (uint32_t)(msg->data.u64 & MAKE_64BIT_MASK(0, 32));
+bool memory;
+
+memory = (msg->data.u64 && MAKE_64BIT_MASK(32, 32)) == 1 ?  true : false;
+
+IORegionFD *ioregfd;
+MPQemuMsg ret = { 0 };
+
+error_report("Bar is %d, mem %s", bar, memory ? "true" : "false");
+
+memset(&ret, 0, sizeof(MPQemuMsg));
+ret.cmd = MPQEMU_CMD_RET;
+ret.size = sizeof(ret.data.u64);
+
+ioregfd = ioregionfd_get_by_bar(com->ioregions_list, bar);
+if (ioregfd) {
+ret.data.u64 = ioregfd->bar;
+if (ioregfd->memory != memory) {
+ioregionfd_set_bar_type(com->ioregions_list, bar, memory);
+}
+} else {
+ret.data.u64 = UINT64_MAX;
+}
+if (!mpqemu_msg_send
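The patch packs the BAR index into the low 32 bits of msg->data.u64 and the memory flag into the high 32 bits. As posted, the decode uses logical && instead of a bitwise operation, so a payload with bar = 2 and memory = false would still decode as memory = true, because both operands are nonzero. Below is a minimal sketch of a pack/unpack pair for that layout. The helper names are hypothetical and the layout is inferred from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; layout inferred from the patch:
 * bits 0-31 hold the BAR index, bits 32-63 hold the memory flag. */
static uint64_t bar_info_pack(uint32_t bar, bool memory)
{
    return ((uint64_t)memory << 32) | bar;
}

static void bar_info_unpack(uint64_t data, uint32_t *bar, bool *memory)
{
    *bar = (uint32_t)(data & 0xffffffffULL);
    /* Shift the upper half down and test it; a logical && here would
     * only report whether both operands are nonzero. */
    *memory = (data >> 32) != 0;
}
```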

Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding

2022-02-08 Thread Jason Wang




On 2022/2/1 7:25 PM, Eugenio Perez Martin wrote:

On Sun, Jan 30, 2022 at 7:47 AM Jason Wang  wrote:


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:

@@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
   void vhost_svq_stop(VhostShadowVirtqueue *svq)
   {
   event_notifier_set_handler(&svq->svq_kick, NULL);
+g_autofree VirtQueueElement *next_avail_elem = NULL;
+
+if (!svq->vq) {
+return;
+}
+
+/* Send all pending used descriptors to guest */
+vhost_svq_flush(svq, false);


Do we need to wait for all the pending descriptors to be completed here?


No, this function does not wait; it only completes the forwarding of
the *used* descriptors.

The best example is the net rx queue in my opinion. This call will
check SVQ's vring used_idx and will forward the last used descriptors
if any, but all available descriptors will remain as available for
qemu's VQ code.

Skipping it would miss those last RX descriptors during migration.

Thanks!



So it's probably not the best place to ask. It's more about the
inflight descriptors, so it should be TX instead of RX.


I can imagine that in the last phase of migration we should stop
vhost-vDPA before calling vhost_svq_stop(). Then we should be fine
regardless of inflight descriptors.


Thanks





Thanks



+
+for (unsigned i = 0; i < svq->vring.num; ++i) {
+g_autofree VirtQueueElement *elem = NULL;
+elem = g_steal_pointer(&svq->ring_id_maps[i]);
+if (elem) {
+virtqueue_detach_element(svq->vq, elem, elem->len);
+}
+}
+
+next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
+if (next_avail_elem) {
+virtqueue_detach_element(svq->vq, next_avail_elem,
+ next_avail_elem->len);
+}
   }
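Eugenio's point is that vhost_svq_flush() only forwards buffers the device has already marked used. It does not wait for outstanding ones, which vhost_svq_stop() then hands back with virtqueue_detach_element(). The toy model below shows that split. Its types are hypothetical and it has no real vring layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QSIZE 8

/* Hypothetical toy model of a shadow virtqueue; not the QEMU structures. */
typedef struct {
    uint16_t shadow_used_idx;  /* advanced by the device */
    uint16_t last_used_idx;    /* how far the SVQ has forwarded */
    int inflight[QSIZE];       /* 1 while a head is owned by the device */
    uint16_t used_ring[QSIZE]; /* heads the device has completed */
    unsigned forwarded;        /* buffers returned to the guest as used */
    unsigned detached;         /* buffers handed back to the VirtQueue code */
} ToySvq;

/* Forward whatever the device has already marked used; never waits. */
static void toy_flush(ToySvq *svq)
{
    while (svq->last_used_idx != svq->shadow_used_idx) {
        uint16_t head = svq->used_ring[svq->last_used_idx % QSIZE];
        assert(head < QSIZE);
        svq->inflight[head] = 0;
        svq->forwarded++;
        svq->last_used_idx++;
    }
}

/* Stop: flush completed buffers, then detach the ones still in flight. */
static void toy_stop(ToySvq *svq)
{
    toy_flush(svq);
    for (size_t i = 0; i < QSIZE; i++) {
        if (svq->inflight[i]) {
            svq->inflight[i] = 0;
            svq->detached++;
        }
    }
}
```

With three buffers in flight and one of them already used, stopping forwards one buffer and detaches the other two.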

[RFC 5/8] multiprocess: prepare ioregionfds for remote device

2022-02-08 Thread Elena Ufimtseva

Signed-off-by: Elena Ufimtseva 
---
 include/hw/remote/ioregionfd.h |  1 +
 include/hw/remote/remote.h |  1 +
 hw/remote/ioregionfd.c | 26 ++
 hw/remote/remote-obj.c | 93 ++
 4 files changed, 121 insertions(+)

diff --git a/include/hw/remote/ioregionfd.h b/include/hw/remote/ioregionfd.h
index c8a8b32ee0..85a2ef2c4f 100644
--- a/include/hw/remote/ioregionfd.h
+++ b/include/hw/remote/ioregionfd.h
@@ -37,4 +37,5 @@ struct IORegionFDObject {
 
 typedef struct IORegionFDObject IORegionFDObject;
 
+GSList *ioregionfd_get_obj_list(void);
 #endif /* IOREGIONFD_H */
diff --git a/include/hw/remote/remote.h b/include/hw/remote/remote.h
index a2d23178b9..46390c7934 100644
--- a/include/hw/remote/remote.h
+++ b/include/hw/remote/remote.h
@@ -23,6 +23,7 @@ struct RemoteObject {
 
 DeviceState *dev;
 DeviceListener listener;
+GHashTable *ioregionfd_hash;
 };
 
 #endif
diff --git a/hw/remote/ioregionfd.c b/hw/remote/ioregionfd.c
index ae95f702a6..85ec0f7d38 100644
--- a/hw/remote/ioregionfd.c
+++ b/hw/remote/ioregionfd.c
@@ -37,6 +37,32 @@ struct IORegionFDObjectClass {
 unsigned int max_ioregfds;
 };
 
+static int ioregionfd_obj_list(Object *obj, void *opaque)
+{
+GSList **list = opaque;
+
+if (object_dynamic_cast(obj, TYPE_IOREGIONFD_OBJECT)) {
+*list = g_slist_append(*list, obj);
+}
+
+object_child_foreach(obj, ioregionfd_obj_list, opaque);
+return 0;
+}
+
+/*
+ * inquire ioregionfd objects and link them into the list which is
+ * returned to the caller.
+ *
+ * Caller must free the list.
+ */
+GSList *ioregionfd_get_obj_list(void)
+{
+GSList *list = NULL;
+
+object_child_foreach(object_get_root(), ioregionfd_obj_list, &list);
+return list;
+}
+
 static void ioregionfd_object_init(Object *obj)
 {
 IORegionFDObjectClass *k = IOREGIONFD_OBJECT_GET_CLASS(obj);
diff --git a/hw/remote/remote-obj.c b/hw/remote/remote-obj.c
index f0da696662..9bb61c3a2d 100644
--- a/hw/remote/remote-obj.c
+++ b/hw/remote/remote-obj.c
@@ -24,6 +24,10 @@
 #include "qemu/sockets.h"
 #include "monitor/monitor.h"
 #include "hw/remote/remote.h"
+#include "hw/remote/ioregionfd.h"
+#include "qemu/cutils.h"
+#include "qapi/qapi-visit-qom.h"
+#include "qapi/string-output-visitor.h"
 
 #define TYPE_REMOTE_OBJECT "x-remote-object"
 OBJECT_DECLARE_TYPE(RemoteObject, RemoteObjectClass, REMOTE_OBJECT)
@@ -74,6 +78,80 @@ static void remote_object_unrealize_listener(DeviceListener *listener,
 }
 }
 
+static GSList *ioregions_list;
+
+static unsigned int ioregionfd_bar_hash(const void *key)
+{
+const IORegionFDObject *o = key;
+
+return g_int_hash(&o->ioregfd.bar);
+}
+
+/* TODO: allow for multiple ioregionfds per BAR. */
+static gboolean ioregionfd_bar_equal(const void *a, const void *b)
+{
+const IORegionFDObject *oa = a;
+const IORegionFDObject *ob = b;
+
+error_report("BARS comparing %d %d", oa->ioregfd.bar, ob->ioregfd.bar);
+if (oa->ioregfd.bar == ob->ioregfd.bar) {
+return TRUE;
+}
+return FALSE;
+}
+
+static void ioregionfd_prepare_for_dev(RemoteObject *o, PCIDevice *dev)
+{
+IORegionFDObject *ioregfd_obj = NULL;
+GSList *obj_list, *list;
+
+list = ioregionfd_get_obj_list();
+
+o->ioregionfd_hash = g_hash_table_new(ioregionfd_bar_hash,
+   ioregionfd_bar_equal);
+
+for (obj_list = list; obj_list; obj_list = obj_list->next) {
+ioregfd_obj = obj_list->data;
+if (strcmp(ioregfd_obj->ioregfd.devid, o->devid) != 0) {
+list = g_slist_remove(list, ioregfd_obj);
+error_report("No my dev remove");
+continue;
+}
+if (!g_hash_table_add(o->ioregionfd_hash, ioregfd_obj)) {
+error_report("Cannot use more than one ioregionfd per bar");
+list = g_slist_remove(list, ioregfd_obj);
+object_unparent(OBJECT(ioregfd_obj));
+} else {
+error_report("Added to hash");
+}
+}
+
+if (!list) {
+error_report("Remote device %s will not have ioregionfds.",
+ o->devid);
+goto fatal;
+}
+
+/*
+ * Take first element in the list of ioregions and use its fd
+ * for all regions for this device.
+ * TODO: make this more flexible and allow different fd for the
+ * device.
+ */
+ioregfd_obj = list->data;
+
+/* This is default and will be changed when proxy requests region info. */
+ioregfd_obj->ioregfd.memory = true;
+
+ioregions_list = list;
+return;
+
+ fatal:
+g_slist_free(list);
+g_hash_table_destroy(o->ioregionfd_hash);
+return;
+}
+
 static void remote_object_machine_done(Notifier *notifier, void *data)
 {
 RemoteObject *o = container_of(notifier, RemoteObject, machine_done);
@@ -98,6 +176,10 @@ static void remote_object_machine_done(Notifier *notifier, void *data)
 
 o->dev = dev;
 
+#if CONFIG_IOREGIONFD
+ioregionfd_prepare_f
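In ioregionfd_prepare_for_dev() as posted, g_slist_remove() frees the link that obj_list points to, and the loop then reads obj_list->next from that freed link. A common pattern for dropping entries while walking a singly linked list is to unlink through a pointer-to-pointer before freeing. The plain-C sketch below shows that pattern. The Node type is a hypothetical stand-in for GSList, and no GLib calls are used.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal list node standing in for GSList. */
typedef struct Node {
    const char *devid;
    struct Node *next;
} Node;

/* Remove every node whose devid differs from want; returns the new head.
 * Each node is unlinked before it is freed, and iteration advances through
 * the link pointer, so freed memory is never read. */
static Node *filter_by_devid(Node *head, const char *want)
{
    Node **link = &head;

    assert(want != NULL);
    while (*link) {
        Node *cur = *link;
        if (strcmp(cur->devid, want) != 0) {
            *link = cur->next; /* unlink first */
            free(cur);
        } else {
            link = &cur->next;
        }
    }
    return head;
}
```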

Re: [PATCH 5/5] libvduse: Add support for reconnecting

2022-02-08 Thread Yongji Xie

On Mon, Feb 7, 2022 at 10:39 PM Stefan Hajnoczi  wrote:
>
> On Tue, Jan 25, 2022 at 09:18:00PM +0800, Xie Yongji wrote:
> > To support reconnecting after restart or crash, the VDUSE backend
> > might need to resubmit inflight I/Os. This stores metadata such as
> > the indexes of inflight I/O descriptors in a shm file so that the
> > VDUSE backend can restore them during reconnecting.
> >
> > Signed-off-by: Xie Yongji 
> > ---
> >  block/export/vduse-blk.c|   4 +-
> >  subprojects/libvduse/libvduse.c | 254 +++-
> >  subprojects/libvduse/libvduse.h |   4 +-
> >  3 files changed, 254 insertions(+), 8 deletions(-)
> >
> > diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
> > index 83845e9a9a..bc14fd798b 100644
> > --- a/block/export/vduse-blk.c
> > +++ b/block/export/vduse-blk.c
> > @@ -232,6 +232,8 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
> >
> >  aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
> > true, on_vduse_vq_kick, NULL, NULL, NULL, vq);
> > +/* Make sure we don't miss any kick after reconnecting */
> > +eventfd_write(vduse_queue_get_fd(vq), 1);
> >  }
> >
> >  static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
> > @@ -388,7 +390,7 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
> >   features, num_queues,
> >   sizeof(struct virtio_blk_config),
> >   (char *)&config, &vduse_blk_ops,
> > - vblk_exp);
> > + g_get_tmp_dir(), vblk_exp);
> >  if (!vblk_exp->dev) {
> >  error_setg(errp, "failed to create vduse device");
> >  return -ENOMEM;
> > diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
> > index 7671864bca..ce2f6c7949 100644
> > --- a/subprojects/libvduse/libvduse.c
> > +++ b/subprojects/libvduse/libvduse.c
> > @@ -41,6 +41,8 @@
> >  #define VDUSE_VQ_ALIGN 4096
> >  #define MAX_IOVA_REGIONS 256
> >
> > +#define LOG_ALIGNMENT 64
> > +
> >  /* Round number down to multiple */
> >  #define ALIGN_DOWN(n, m) ((n) / (m) * (m))
> >
> > @@ -51,6 +53,31 @@
> >  #define unlikely(x)   __builtin_expect(!!(x), 0)
> >  #endif
> >
> > +typedef struct VduseDescStateSplit {
> > +uint8_t inflight;
> > +uint8_t padding[5];
> > +uint16_t next;
> > +uint64_t counter;
> > +} VduseDescStateSplit;
> > +
> > +typedef struct VduseVirtqLogInflight {
> > +uint64_t features;
> > +uint16_t version;
> > +uint16_t desc_num;
> > +uint16_t last_batch_head;
> > +uint16_t used_idx;
> > +VduseDescStateSplit desc[];
> > +} VduseVirtqLogInflight;
> > +
> > +typedef struct VduseVirtqLog {
> > +VduseVirtqLogInflight inflight;
> > +} VduseVirtqLog;
> > +
> > +typedef struct VduseVirtqInflightDesc {
> > +uint16_t index;
> > +uint64_t counter;
> > +} VduseVirtqInflightDesc;
> > +
> >  typedef struct VduseRing {
> >  unsigned int num;
> >  uint64_t desc_addr;
> > @@ -73,6 +100,10 @@ struct VduseVirtq {
> >  bool ready;
> >  int fd;
> >  VduseDev *dev;
> > +VduseVirtqInflightDesc *resubmit_list;
> > +uint16_t resubmit_num;
> > +uint64_t counter;
> > +VduseVirtqLog *log;
> >  };
> >
> >  typedef struct VduseIovaRegion {
> > @@ -96,8 +127,67 @@ struct VduseDev {
> >  int fd;
> >  int ctrl_fd;
> >  void *priv;
> > +char *shm_log_dir;
> > +void *log;
> > +bool reconnect;
> >  };
> >
> > +static inline size_t vduse_vq_log_size(uint16_t queue_size)
> > +{
> > +return ALIGN_UP(sizeof(VduseDescStateSplit) * queue_size +
> > +sizeof(VduseVirtqLogInflight), LOG_ALIGNMENT);
> > +}
> > +
> > +static void *vduse_log_get(const char *dir, const char *name, size_t size)
> > +{
> > +void *ptr = MAP_FAILED;
> > +char *path;
> > +int fd;
> > +
> > +path = (char *)malloc(strlen(dir) + strlen(name) +
> > +  strlen("/vduse-log-") + 1);
> > +if (!path) {
> > +return ptr;
> > +}
> > +sprintf(path, "%s/vduse-log-%s", dir, name);
>
> Please use g_strdup_printf() and g_autofree in QEMU code. In libvduse
> code it's okay to use malloc(3), but regular QEMU code should use glib.
>

But this code resides in libvduse currently.

> > +
> > +fd = open(path, O_RDWR | O_CREAT, 0600);
> > +if (fd == -1) {
> > +goto out;
> > +}
> > +
> > +if (ftruncate(fd, size) == -1) {
> > +goto out;
> > +}
> > +
> > +ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> > +if (ptr == MAP_FAILED) {
> > +goto out;
> > +}
> > +out:
> > +if (fd > 0) {
> > +close(fd);
> > +}
> > +free(path);
> > +
> > +return ptr;
> > +}
> > +
> > +static void vduse_log_destroy(const char *dir, const char *name)
> > +{
> > +
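vduse_vq_log_size() reserves one 16-byte VduseDescStateSplit per descriptor plus the VduseVirtqLogInflight header, rounded up to LOG_ALIGNMENT. The sketch below copies the two structs from the patch and works through the arithmetic. It defines a local ALIGN_UP_SKETCH() because the patch's ALIGN_UP is not shown in this excerpt. The sizes assume a common ABI such as x86-64, where the header also comes out at 16 bytes. For a 256-entry queue, 16 * 256 + 16 = 4112 bytes, which rounds up to 4160.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_ALIGNMENT 64
/* Round n up to a multiple of m (local helper for this sketch only). */
#define ALIGN_UP_SKETCH(n, m) (((n) + (m) - 1) / (m) * (m))

/* Struct layouts copied from the patch under review. */
typedef struct VduseDescStateSplit {
    uint8_t inflight;
    uint8_t padding[5];
    uint16_t next;
    uint64_t counter;
} VduseDescStateSplit;

typedef struct VduseVirtqLogInflight {
    uint64_t features;
    uint16_t version;
    uint16_t desc_num;
    uint16_t last_batch_head;
    uint16_t used_idx;
    VduseDescStateSplit desc[];
} VduseVirtqLogInflight;

/* Same formula as vduse_vq_log_size() in the patch. */
static size_t vq_log_size(uint16_t queue_size)
{
    assert(queue_size > 0);
    return ALIGN_UP_SKETCH(sizeof(VduseDescStateSplit) * queue_size +
                           sizeof(VduseVirtqLogInflight), LOG_ALIGNMENT);
}
```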

Re: [PATCH 04/31] vdpa: Add vhost_svq_set_svq_kick_fd

2022-02-08 Thread Jason Wang




On 2022/1/31 6:18 PM, Eugenio Perez Martin wrote:

On Fri, Jan 28, 2022 at 7:29 AM Jason Wang  wrote:


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:

This function allows the vhost-vdpa backend to override kick_fd.

Signed-off-by: Eugenio Pérez 
---
   hw/virtio/vhost-shadow-virtqueue.h |  1 +
   hw/virtio/vhost-shadow-virtqueue.c | 45 ++
   2 files changed, 46 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 400effd9f2..a56ecfc09d 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -15,6 +15,7 @@

   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;

+void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
   const EventNotifier *vhost_svq_get_dev_kick_notifier(
  const VhostShadowVirtqueue *svq);

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index bd87110073..21534bc94d 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -11,6 +11,7 @@
   #include "hw/virtio/vhost-shadow-virtqueue.h"

   #include "qemu/error-report.h"
+#include "qemu/main-loop.h"

   /* Shadow virtqueue to relay notifications */
   typedef struct VhostShadowVirtqueue {
@@ -18,8 +19,20 @@ typedef struct VhostShadowVirtqueue {
   EventNotifier hdev_kick;
   /* Shadow call notifier, sent to vhost */
   EventNotifier hdev_call;
+
+/*
+ * Borrowed virtqueue's guest to host notifier.
+ * To borrow it in this event notifier allows to register on the event
+ * loop and access the associated shadow virtqueue easily. If we use the
+ * VirtQueue, we don't have an easy way to retrieve it.
+ *
+ * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
+ */
+EventNotifier svq_kick;
   } VhostShadowVirtqueue;

+#define INVALID_SVQ_KICK_FD -1
+
   /**
* The notifier that SVQ will use to notify the device.
*/
@@ -29,6 +42,35 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
   return &svq->hdev_kick;
   }

+/**
+ * Set a new file descriptor for the guest to kick SVQ and notify for avail
+ *
+ * @svq  The svq
+ * @svq_kick_fd  The new svq kick fd
+ */
+void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
+{
+EventNotifier tmp;
+bool check_old = INVALID_SVQ_KICK_FD !=
+ event_notifier_get_fd(&svq->svq_kick);
+
+if (check_old) {
+event_notifier_set_handler(&svq->svq_kick, NULL);
+event_notifier_init_fd(&tmp, event_notifier_get_fd(&svq->svq_kick));
+}


It looks to me like we don't do similar things in vhost-net. Is there any
reason to care about the old svq_kick?


Do you mean to check for old kick_fd in case we miss notifications,
and explicitly omit the INVALID_SVQ_KICK_FD?



Yes.




If you mean qemu's vhost-net, I guess it's because the device's kick
fd is never changed during the whole vhost device lifecycle; it's only
set at the beginning. The previous RFC also depended on that, but in
your v4 feedback you suggested not depending on it, if I understood
correctly [1]. Or am I missing something?



No, I forgot that. But in this case we should handle the conversion
from a valid fd to -1 better, by disabling the handler.





Qemu's vhost-net does not need to use this because it is not polling
it. For the kernel's vhost, I guess the closest is the use of pollstop
and pollstart in vhost_vring_ioctl.

In my opinion, SVQ code size could benefit from not allowing kick_fd
to be overridden after the start of the operation (not from
initialization, but from the start). But I can see the benefits of
taking the change into account from this moment, so it's more resilient
in the future.


+
+/*
+ * event_notifier_set_handler already checks for guest's notifications if
+ * they arrive to the new file descriptor in the switch, so there is no
+ * need to explicitly check for them.
+ */
+event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
+
+if (!check_old || event_notifier_test_and_clear(&tmp)) {
+event_notifier_set(&svq->hdev_kick);


Any reason we need to kick the device directly here?


At this point of the series only notifications are forwarded, not
buffers. If kick_fd is set, we need to check the old one, the same way
as vhost checks the masked notifier in case of change.



I meant: don't we need to kick the svq instead of vhost-vdpa in this case?

Thanks




Thanks!

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg03152.html
, from "I'd suggest to not depend on this since it:"



Thanks



+}
+}
+
   /**
* Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
* methods and file descriptors.
@@ -52,6 +94,9 @@ VhostShadowVirtqueue *vhost_svq_new(void)
   goto err_init_hdev_call;
   }

+/* Placeholder descriptor, it should be deleted at set_kick_fd */
+e
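The hand-off in vhost_svq_set_svq_kick_fd() works like this. If there was an old fd, any kick still pending on it is consumed. The device is then notified if such a kick existed, or if there was no old fd to check. The toy model below shows that logic. It uses a hypothetical ToyNotifier in place of EventNotifier, with no real file descriptors. It does not cover the valid-fd-to-(-1) transition Jason raises and simply rejects -1 with an assert.

```c
#include <assert.h>
#include <stdbool.h>

#define INVALID_FD (-1)

/* Hypothetical stand-in for an EventNotifier: an fd plus a pending flag. */
typedef struct {
    int fd;
    bool pending;
} ToyNotifier;

static bool toy_test_and_clear(ToyNotifier *n)
{
    bool was = n->pending;
    n->pending = false;
    return was;
}

/* Swap in a new guest kick fd.  If the old fd had an unconsumed kick, or
 * there was no old fd to check, notify the device once so that no guest
 * notification is lost during the switch. */
static void toy_set_kick_fd(ToyNotifier *svq_kick, ToyNotifier *dev_kick,
                            int new_fd)
{
    bool check_old = svq_kick->fd != INVALID_FD;
    bool old_pending = check_old && toy_test_and_clear(svq_kick);

    assert(new_fd != INVALID_FD);
    svq_kick->fd = new_fd;
    svq_kick->pending = false;
    if (!check_old || old_pending) {
        dev_kick->pending = true;
    }
}
```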

Re: [PATCH RFC 07/15] migration: Introduce postcopy channels on dest node

2022-02-08 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> On Thu, Feb 03, 2022 at 03:08:39PM +, Dr. David Alan Gilbert wrote:
> > * Peter Xu (pet...@redhat.com) wrote:
> > > Postcopy handles huge pages in a special way that currently we can only have
> > > one "channel" to transfer the page.
> > > 
> > > It's because when we install pages using UFFDIO_COPY, we need to have the whole
> > > huge page ready, which also means we need to have a temp huge page when trying
> > > to receive the whole content of the page.
> > > 
> > > Currently all maintenance around this tmp page is global: firstly we'll
> > > allocate a temp huge page, then we maintain its status mostly within
> > > ram_load_postcopy().
> > > 
> > > To enable multiple channels for postcopy, the first thing we need to do is to
> > > prepare N temp huge pages as caching, one for each channel.
> > > 
> > > Meanwhile we need to maintain the tmp huge page status per-channel too.
> > > 
> > > To give some examples, some local variables maintained in ram_load_postcopy()
> > > are listed; they are responsible for maintaining temp huge page status:
> > > 
> > >   - all_zero: this keeps whether this huge page contains all zeros
> > >   - target_pages: this counts how many target pages have been copied
> > >   - host_page: this keeps the host ptr for the page to install
> > > 
> > > Move all these fields to be together with the temp huge pages to form a new
> > > structure called PostcopyTmpPage.  Then for each (future) postcopy channel, we
> > > need one structure to keep the state around.
> > > 
> > > For vanilla postcopy, obviously there's only one channel.  It contains both
> > > precopy and postcopy pages.
> > > 
> > > This patch teaches the dest migration node to start to realize the possible
> > > number of postcopy channels by introducing the "postcopy_channels" variable.
> > > Its value is calculated when setting up postcopy on the dest node (during the
> > > POSTCOPY_LISTEN phase).
> > > 
> > > Vanilla postcopy will have channels=1, but when the postcopy-preempt
> > > capability is enabled (in the future), we will boost it to 2 because even
> > > during partial sending of a precopy huge page we still want to preempt it and
> > > start sending the postcopy requested page right away (so we start to keep two
> > > temp huge pages; more if we want to enable multifd).  In this patch there's a
> > > TODO marked for that; so far channels is always set to 1.
> > > 
> > > We need to send one "host huge page" on one channel only and we cannot split
> > > it, because otherwise the data of the same huge page could be located on more
> > > than one channel, which would need more complicated logic to manage.  One temp
> > > host huge page for each channel will be enough for us for now.
> > > 
> > > Postcopy will still always use the index=0 huge page even after this patch.
> > > However it prepares for the later patches where it can start to use multiple
> > > channels (which needs src intervention, because only src knows which channel
> > > we should use).
> > 
> > Generally OK, some minor nits.
> > 
> > > Signed-off-by: Peter Xu 
> > > ---
> > >  migration/migration.h| 35 +++-
> > >  migration/postcopy-ram.c | 50 +---
> > >  migration/ram.c  | 43 +-
> > >  3 files changed, 91 insertions(+), 37 deletions(-)
> > > 
> > > diff --git a/migration/migration.h b/migration/migration.h
> > > index 8130b703eb..8bb2931312 100644
> > > --- a/migration/migration.h
> > > +++ b/migration/migration.h
> > > @@ -45,6 +45,24 @@ struct PostcopyBlocktimeContext;
> > >   */
> > >  #define CLEAR_BITMAP_SHIFT_MAX31
> > >  
> > > +/* This is an abstraction of a "temp huge page" for postcopy's purpose */
> > > +typedef struct {
> > > +/*
> > > + * This points to a temporary huge page as a buffer for UFFDIO_COPY.  It's
> > > + * mmap()ed and needs to be freed when cleanup.
> > > + */
> > > +void *tmp_huge_page;
> > > +/*
> > > + * This points to the host page we're going to install for this temp page.
> > > + * It tells us after we've received the whole page, where we should put it.
> > > + */
> > > +void *host_addr;
> > > +/* Number of small pages copied (in size of TARGET_PAGE_SIZE) */
> > > +int target_pages;
> > 
> > Can we take the opportunity to convert this to an unsigned?
> 
> Sure.
> 
> > 
> > > +/* Whether this page contains all zeros */
> > > +bool all_zero;
> > > +} PostcopyTmpPage;
> > > +
> > >  /* State for the incoming migration */
> > >  struct MigrationIncomingState {
> > >  QEMUFile *from_src_file;
> > > @@ -81,7 +99,22 @@ struct MigrationIncomingState {
> > >  QemuMutex rp_mutex;/* We send replies from multiple threads */
> > >  /* RAMBlock of last request sent t
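To illustrate why each channel needs its own temp huge page state, the sketch below keeps target_pages and all_zero per channel and reports when a full host page has been assembled. target_pages is unsigned, as suggested in review. TmpPage and tmp_page_receive() are hypothetical. Only the field meanings follow the patch's PostcopyTmpPage, and host_addr is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-channel temp page state, modelled on PostcopyTmpPage. */
typedef struct {
    unsigned char *tmp_huge_page; /* buffer holding one host page */
    unsigned target_pages;        /* target pages copied so far */
    bool all_zero;                /* whether every byte received was zero */
} TmpPage;

/* Copy one target page into the channel's temp huge page.  Returns true
 * once the whole host page has arrived and could be installed; all_zero
 * then describes that complete page. */
static bool tmp_page_receive(TmpPage *t, const unsigned char *data,
                             size_t target_size, size_t host_size)
{
    assert(((size_t)t->target_pages + 1) * target_size <= host_size);
    if (t->target_pages == 0) {
        t->all_zero = true; /* start of a new host page */
    }
    memcpy(t->tmp_huge_page + (size_t)t->target_pages * target_size,
           data, target_size);
    for (size_t i = 0; i < target_size && t->all_zero; i++) {
        if (data[i]) {
            t->all_zero = false;
        }
    }
    t->target_pages++;
    if ((size_t)t->target_pages * target_size == host_size) {
        t->target_pages = 0; /* ready for the next host page */
        return true;
    }
    return false;
}
```

Interleaving target pages for two channels shows that neither channel's progress or zero-page tracking disturbs the other.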

Re: [PATCH v3 0/2] python: a few improvements to qmp-shell

2022-02-08 Thread Daniel P . Berrangé

On Mon, Feb 07, 2022 at 04:05:47PM -0500, John Snow wrote:
> On Fri, Jan 28, 2022 at 11:12 AM Daniel P. Berrangé wrote:
> >
> > This makes the qmp-shell program a little more pleasant to use when you
> > are just trying to spawn a throw-away QEMU process to query some info
> > from.
> >
> > First it introduces a 'qmp-shell-wrap' command that takes a QEMU command
> > line instead of QMP socket, and spawns QEMU automatically, so its life
> > is tied to that of the shell.
> >
> > Second it adds ability to log QMP commands/responses to a file that can
> > be queried with 'jq' to extract information. This is good for commands
> > which return huge JSON docs.
> >
> > In v3:
> >
> >  - Add qmp-shell-wrap to setup.cfg entry points
> >
> > In v2:
> >
> >  - Unlink unix socket path on exit
> >  - Fix default command name
> >  - Deal with flake8/pylint warnings
> >
> > Daniel P. Berrangé (2):
> >   python: introduce qmp-shell-wrap convenience tool
> >   python: support recording QMP session to a file
> >
> >  python/qemu/aqmp/qmp_shell.py | 88 ---
> >  python/setup.cfg  |  4 ++
> >  scripts/qmp/qmp-shell-wrap| 11 +
> >  3 files changed, 96 insertions(+), 7 deletions(-)
> >  create mode 100755 scripts/qmp/qmp-shell-wrap
> >
> > --
> > 2.34.1
> >
> >
> 
> Great, thanks! I rebased patch 1/2 myself as a courtesy and have staged these.
> 
> --js
> 
> (fwiw: using pip, it seems like the wrapper script works just fine. it
> appears as though using 'python3 setup.py install' does indeed cause
> issues here. I have a patch I'll send soon that discourages the direct
> setup.py invocation to avoid frustration in the future.)

I've only ever used  pip to install from pypi or remote git archives.
How do you use it to install from your local git checkout?

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH 5/5] libvduse: Add support for reconnecting

2022-02-08 Thread Yongji Xie

On Tue, Feb 8, 2022 at 4:09 PM Stefan Hajnoczi  wrote:
>
> On Tue, Feb 08, 2022 at 03:35:27PM +0800, Yongji Xie wrote:
> > On Mon, Feb 7, 2022 at 10:39 PM Stefan Hajnoczi  wrote:
> > >
> > > On Tue, Jan 25, 2022 at 09:18:00PM +0800, Xie Yongji wrote:
> > > > +static void *vduse_log_get(const char *dir, const char *name, size_t size)
> > > > +{
> > > > +void *ptr = MAP_FAILED;
> > > > +char *path;
> > > > +int fd;
> > > > +
> > > > +path = (char *)malloc(strlen(dir) + strlen(name) +
> > > > +  strlen("/vduse-log-") + 1);
> > > > +if (!path) {
> > > > +return ptr;
> > > > +}
> > > > +sprintf(path, "%s/vduse-log-%s", dir, name);
> > >
> > > Please use g_strdup_printf() and g_autofree in QEMU code. In libvduse
> > > code it's okay to use malloc(3), but regular QEMU code should use glib.
> > >
> >
> > But this code resides in libvduse currently.
>
> Oops, I thought we were in block/export/vduse-blk.c. Then it's fine to
> use malloc(3).
>
> > > > +static int vduse_queue_check_inflights(VduseVirtq *vq)
> > > > +{
> > > > +int i = 0;
> > > > +VduseDev *dev = vq->dev;
> > > > +
> > > > +vq->used_idx = vq->vring.used->idx;
> > >
> > > Is this reading struct vring_used->idx without le16toh()?
> > >
> > > > +vq->resubmit_num = 0;
> > > > +vq->resubmit_list = NULL;
> > > > +vq->counter = 0;
> > > > +
> > > > +if (unlikely(vq->log->inflight.used_idx != vq->used_idx)) {
> > > > +vq->log->inflight.desc[vq->log->inflight.last_batch_head].inflight = 0;
> > >
> > > I suggest validating vq->log->inflight fields before using them.
> > > last_batch_head must be less than the virtqueue size. Although the log
> > > file is somewhat trusted, there may still be ways to corrupt it or
> > > confuse the new process that loads it.
> > >
> >
> > I can validate the last_batch_head field. But it's hard to validate
> > the inflight field, so we might still meet some issues if the file is
> > corrupted.
>
> It's okay if the log tells us to resubmit virtqueue buffers that have
> garbage vring descriptors because the vring code needs to handle garbage
> descriptors anyway.
>
> But we cannot load dest[untrusted_input] or do anything else that could
> crash, corrupt memory, etc.
>

Makes sense to me.

Thanks,
Yongji

Re: [PATCH RFC 07/15] migration: Introduce postcopy channels on dest node

2022-02-08 Thread Peter Xu

On Tue, Feb 08, 2022 at 09:43:49AM +, Dr. David Alan Gilbert wrote:
> > It'll be cleaned up later here:
> > 
> >   loadvm_postcopy_handle_listen
> > postcopy_ram_incoming_setup
> >   postcopy_temp_pages_setup
> > postcopy_ram_incoming_cleanup  <-- if fail above, go here
> >   postcopy_temp_pages_cleanup
> 
> Ah OK, it might still be worth a comment.

Will do.

-- 
Peter Xu

[PATCH v2] migration/rdma: set the REUSEADDR option for destination

2022-02-08 Thread Jack Wang

We hit the following error while testing the RDMA transport: in case of
a migration error, the mgmt daemon picks a migration port and gets:
incoming rdma:[::]:8089: RDMA ERROR: Error: could not rdma_bind_addr

It then tries another port (-incoming rdma:[::]:8103); sometimes that
works, and sometimes it needs yet another try with a different port
number.

Set the REUSEADDR option for the destination. This allows the address
to be reused and avoids the rdma_bind_addr error.

Signed-off-by: Jack Wang 
---
v2: extend commit message as discussed with Pankaj and David
---
 migration/rdma.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index c7c7a384875b..663e1fbb096d 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2705,6 +2705,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 char ip[40] = "unknown";
 struct rdma_addrinfo *res, *e;
 char port_str[16];
+int reuse = 1;
 
 for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
 rdma->wr_data[idx].control_len = 0;
@@ -2740,6 +2741,12 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 goto err_dest_init_bind_addr;
 }
 
+ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
+ &reuse, sizeof reuse);
+if (ret) {
+ERROR(errp, "Error: could not set REUSEADDR option");
+goto err_dest_init_bind_addr;
+}
 for (e = res; e != NULL; e = e->ai_next) {
 inet_ntop(e->ai_family,
 &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof ip);
-- 
2.25.1

Re: [PATCH 23/31] vdpa: Add custom IOTLB translations to SVQ

2022-02-08 Thread Jason Wang




On 2022/2/1 3:11 AM, Eugenio Perez Martin wrote:

+return false;
+}
+
+/*
+ * Map->iova chunk size is ignored. What to do if descriptor
+ * (addr, size) does not fit is delegated to the device.
+ */

I think we need to at least check the size and fail if it doesn't
match here. Or is it possible that we have a buffer that crosses two
memory regions?


It should be impossible, since both iova_tree and VirtQueue should be
in sync regarding memory region updates. If a VirtQueue buffer
crosses several memory regions, the iovec has more entries.

I can add a return false, but I'm not able to trigger that situation
even with a malformed driver.



Ok, but it wouldn't hurt to add a warning here.

Thanks
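
The warning asked for here could look roughly like the sketch below. It assumes a hypothetical mapping entry whose size is an exclusive length. QEMU's DMAMap stores an inclusive size, so the real check would be off by one from this. svq_translate_one is not a function from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical map entry; size is an exclusive length here. */
typedef struct {
    uint64_t translated_addr;  /* start of the QEMU VA range */
    uint64_t iova;             /* start of the device-visible range */
    uint64_t size;             /* length in bytes (exclusive) */
} DMAMapSketch;

/*
 * Translate one buffer (addr, len) to an IOVA. Instead of silently
 * handing a range that overruns the map to the device, warn and fail.
 */
static bool svq_translate_one(const DMAMapSketch *map, uint64_t addr,
                              uint64_t len, uint64_t *iova)
{
    assert(map != NULL && iova != NULL);

    if (addr < map->translated_addr) {
        fprintf(stderr, "warning: buffer 0x%llx is below the map\n",
                (unsigned long long)addr);
        return false;
    }

    uint64_t off = addr - map->translated_addr;
    /* Two comparisons avoid overflow in off + len. */
    if (off > map->size || len > map->size - off) {
        fprintf(stderr, "warning: buffer 0x%llx+0x%llx does not fit the map\n",
                (unsigned long long)addr, (unsigned long long)len);
        return false;
    }

    *iova = map->iova + off;
    return true;
}
```

If the invariant described above holds (iova_tree and VirtQueue stay in sync), the warning path should never trigger. It only makes a broken invariant visible instead of letting the device see a truncated range.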

Re: [PATCH v12 4/5] softmmu/dirtylimit: implement virtual CPU throttle

2022-02-08 Thread Peter Xu

On Mon, Jan 24, 2022 at 10:10:39PM +0800, huang...@chinatelecom.cn wrote:
> From: Hyman Huang(黄勇) 
> 
> Set up a negative feedback system for when the vCPU thread
> handles a KVM_EXIT_DIRTY_RING_FULL exit, by introducing a
> throttle_us_per_full field in struct CPUState. Sleep
> throttle_us_per_full microseconds to throttle the vCPU
> if dirtylimit is enabled.
> 
> Start a thread to track the current dirty page rates and
> tune throttle_us_per_full dynamically until the current
> dirty page rate reaches the quota.
> 
> Introduce the util functions in the header for the dirtylimit
> implementation.
> 
> Signed-off-by: Hyman Huang(黄勇) 
> ---
>  accel/kvm/kvm-all.c         |  13 ++
>  accel/stubs/kvm-stub.c      |   5 +
>  include/hw/core/cpu.h       |   6 +
>  include/sysemu/dirtylimit.h |  16 +++
>  include/sysemu/kvm.h        |   2 +
>  softmmu/dirtylimit.c        | 308 
>  softmmu/trace-events        |   8 ++
>  7 files changed, 358 insertions(+)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 1a5f1d1..60f51fd 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -45,6 +45,7 @@
>  #include "qemu/guest-random.h"
>  #include "sysemu/hw_accel.h"
>  #include "kvm-cpus.h"
> +#include "sysemu/dirtylimit.h"
>  
>  #include "hw/boards.h"
>  
> @@ -476,6 +477,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
>  cpu->kvm_state = s;
>  cpu->vcpu_dirty = true;
>  cpu->dirty_pages = 0;
> +cpu->throttle_us_per_full = 0;
>  
>  mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
>  if (mmap_size < 0) {
> @@ -1469,6 +1471,11 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
>   */
>  sleep(1);
>  
> +/* keep sleeping in order not to interfere with the dirtylimit */
> +if (dirtylimit_in_service()) {
> +continue;
> +}
> +
>  trace_kvm_dirty_ring_reaper("wakeup");
>  r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
>  
> @@ -2312,6 +2319,11 @@ bool kvm_dirty_ring_enabled(void)
>  return kvm_state->kvm_dirty_ring_size ? true : false;
>  }
>  
> +uint32_t kvm_dirty_ring_size(void)
> +{
> +return kvm_state->kvm_dirty_ring_size;
> +}

Please consider moving this into a small patch too along with the stub.

> +
>  static int kvm_init(MachineState *ms)
>  {
>  MachineClass *mc = MACHINE_GET_CLASS(ms);
> @@ -2961,6 +2973,7 @@ int kvm_cpu_exec(CPUState *cpu)
>  qemu_mutex_lock_iothread();
>  kvm_dirty_ring_reap(kvm_state, cpu);
>  qemu_mutex_unlock_iothread();
> +dirtylimit_vcpu_execute(cpu);
>  ret = 0;
>  break;
>  case KVM_EXIT_SYSTEM_EVENT:
> diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
> index 5319573..1128cb2 100644
> --- a/accel/stubs/kvm-stub.c
> +++ b/accel/stubs/kvm-stub.c
> @@ -152,4 +152,9 @@ bool kvm_dirty_ring_enabled(void)
>  {
>  return false;
>  }
> +
> +uint32_t kvm_dirty_ring_size(void)
> +{
> +return 0;
> +}
>  #endif
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index 76ab3b8..dbeb31a 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -411,6 +411,12 @@ struct CPUState {
>   */
>  bool throttle_thread_scheduled;
>  
> +/*
> + * Sleep throttle_us_per_full microseconds once dirty ring is full
> + * if dirty page rate limit is enabled.
> + */
> +int64_t throttle_us_per_full;
> +
>  bool ignore_memory_transaction_failures;
>  
>  /* Used for user-only emulation of prctl(PR_SET_UNALIGN). */
> diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
> index da459f0..37e634b 100644
> --- a/include/sysemu/dirtylimit.h
> +++ b/include/sysemu/dirtylimit.h
> @@ -19,4 +19,20 @@ void vcpu_dirty_rate_stat_start(void);
>  void vcpu_dirty_rate_stat_stop(void);
>  void vcpu_dirty_rate_stat_initialize(void);
>  void vcpu_dirty_rate_stat_finalize(void);
> +
> +void dirtylimit_state_lock(void);
> +void dirtylimit_state_unlock(void);
> +void dirtylimit_state_initialize(void);
> +void dirtylimit_state_finalize(void);
> +void dirtylimit_thread_finalize(void);
> +bool dirtylimit_in_service(void);
> +bool dirtylimit_vcpu_index_valid(int cpu_index);
> +void dirtylimit_start(void);
> +void dirtylimit_stop(void);
> +void dirtylimit_set_vcpu(int cpu_index,
> + uint64_t quota,
> + bool enable);
> +void dirtylimit_set_all(uint64_t quota,
> +bool enable);
> +void dirtylimit_vcpu_execute(CPUState *cpu);
>  #endif
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 6eb39a0..bc3f0b5 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -563,4 +563,6 @@ bool kvm_cpu_check_are_resettable(void);
>  bool kvm_arch_cpu_check_are_resettable(void);
>  
>  bool kvm_dirty_ring_enabled(void);
> +
> +uint32_t kvm_dirty_ring_size(void);
>  #endif
> diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
> ind
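
The negative feedback loop described in the commit message can be pictured as a simple proportional controller. On each sample, grow throttle_us_per_full when the measured dirty rate is above the quota, and shrink it when the rate is below. The step rule, the bounds, and the name dirtylimit_adjust_throttle are assumptions for illustration only. The actual algorithm in softmmu/dirtylimit.c is truncated above and is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tuning constants (microseconds). */
#define THROTTLE_US_MAX      1000000  /* cap: at most 1s of sleep per exit */
#define THROTTLE_US_STEP_MIN 100      /* smallest adjustment per sample */
#define THROTTLE_US_PER_ERR  1000     /* extra sleep for a 100% rate error */

/*
 * One step of a negative feedback loop. current_rate and quota are in
 * the same unit (e.g. MB/s). If the vCPU dirties memory faster than the
 * quota, sleep longer on each ring-full exit; if slower, sleep less.
 * The step is proportional to the relative error, with a floor so the
 * loop keeps converging, and the result is clamped to [0, MAX].
 */
static int64_t dirtylimit_adjust_throttle(int64_t throttle_us,
                                          uint64_t current_rate,
                                          uint64_t quota)
{
    assert(quota > 0);
    assert(throttle_us >= 0 && throttle_us <= THROTTLE_US_MAX);

    if (current_rate == quota) {
        return throttle_us;
    }

    uint64_t diff = current_rate > quota ? current_rate - quota
                                         : quota - current_rate;
    uint64_t step = diff * THROTTLE_US_PER_ERR / quota;
    if (step < THROTTLE_US_STEP_MIN) {
        step = THROTTLE_US_STEP_MIN;
    }
    if (step > THROTTLE_US_MAX) {
        step = THROTTLE_US_MAX;
    }

    if (current_rate > quota) {
        throttle_us += (int64_t)step;   /* too fast: throttle harder */
    } else {
        throttle_us -= (int64_t)step;   /* under quota: back off */
    }

    if (throttle_us < 0) {
        throttle_us = 0;
    } else if (throttle_us > THROTTLE_US_MAX) {
        throttle_us = THROTTLE_US_MAX;
    }
    return throttle_us;
}
```

A real controller also has to decide how often to sample and whether to damp oscillation. This sketch leaves those choices open.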

Re: [PATCH v12 1/5] accel/kvm/kvm-all: refactor per-vcpu dirty ring reaping

2022-02-08 Thread Hyman Huang





On 2022/2/8 15:20, Peter Xu wrote:

On Mon, Jan 24, 2022 at 10:10:36PM +0800, huang...@chinatelecom.cn wrote:

@@ -2956,7 +2959,7 @@ int kvm_cpu_exec(CPUState *cpu)
   */
  trace_kvm_dirty_ring_full(cpu->cpu_index);
  qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(kvm_state);
+kvm_dirty_ring_reap(kvm_state, cpu);


Shall we keep passing in NULL in this patch, and make it conditionally take
the cpu parameter if dirty limit is enabled?


Ok, so we should pass the cpu parameter only if dirtylimit is in service.

Ring reset can still be expensive, so ideally we should still try our best to
reap as many PFNs as possible, as long as we don't need accuracy on RING_FULL
exit events.


  qemu_mutex_unlock_iothread();
  ret = 0;
  break;
--
1.8.3.1





--
Best regards

Hyman Huang(黄勇)

[PATCH 2/6] tests/qemu-iotests/meson.build: Improve the indentation

2022-02-08 Thread Thomas Huth

By using subdir_done(), we can get rid of one level of indentation
in this file. This will make it easier to add more conditions to
skip the iotests in future patches.

Signed-off-by: Thomas Huth 
---
 tests/qemu-iotests/meson.build | 61 ++
 1 file changed, 32 insertions(+), 29 deletions(-)

diff --git a/tests/qemu-iotests/meson.build b/tests/qemu-iotests/meson.build
index 5be3c74127..e1832c90e0 100644
--- a/tests/qemu-iotests/meson.build
+++ b/tests/qemu-iotests/meson.build
@@ -1,30 +1,33 @@
-if have_tools and targetos != 'windows'
-  qemu_iotests_binaries = [qemu_img, qemu_io, qemu_nbd, qsd]
-  qemu_iotests_env = {'PYTHON': python.full_path()}
-  qemu_iotests_formats = {
-    'qcow2': 'quick',
-    'raw': 'slow',
-    'qed': 'thorough',
-    'vmdk': 'thorough',
-    'vpc': 'thorough'
-  }
-
-  foreach k, v : emulators
-    if k.startswith('qemu-system-')
-      qemu_iotests_binaries += v
-    endif
-  endforeach
-  foreach format, speed: qemu_iotests_formats
-    if speed == 'quick'
-      suites = 'block'
-    else
-      suites = ['block-' + speed, speed]
-    endif
-    test('qemu-iotests ' + format, sh, args: [files('../check-block.sh'), format],
-         depends: qemu_iotests_binaries, env: qemu_iotests_env,
-         protocol: 'tap',
-         suite: suites,
-         timeout: 0,
-         is_parallel: false)
-  endforeach
+if not have_tools or targetos == 'windows'
+  subdir_done()
 endif
+
+qemu_iotests_binaries = [qemu_img, qemu_io, qemu_nbd, qsd]
+qemu_iotests_env = {'PYTHON': python.full_path()}
+qemu_iotests_formats = {
+  'qcow2': 'quick',
+  'raw': 'slow',
+  'qed': 'thorough',
+  'vmdk': 'thorough',
+  'vpc': 'thorough'
+}
+
+foreach k, v : emulators
+  if k.startswith('qemu-system-')
+    qemu_iotests_binaries += v
+  endif
+endforeach
+
+foreach format, speed: qemu_iotests_formats
+  if speed == 'quick'
+    suites = 'block'
+  else
+    suites = ['block-' + speed, speed]
+  endif
+  test('qemu-iotests ' + format, sh, args: [files('../check-block.sh'), format],
+       depends: qemu_iotests_binaries, env: qemu_iotests_env,
+       protocol: 'tap',
+       suite: suites,
+       timeout: 0,
+       is_parallel: false)
+endforeach
-- 
2.27.0

[PATCH 5/6] tests: Do not treat the iotests as separate meson test target anymore

2022-02-08 Thread Thomas Huth

Now that we add the individual iotests directly in meson.build, we do
not have to separate the block suite from the other suites anymore.

Signed-off-by: Thomas Huth 
---
 meson.build| 6 +++---
 scripts/mtest2make.py  | 4 
 tests/Makefile.include | 9 +
 3 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/meson.build b/meson.build
index 5f43355071..b203402ee1 100644
--- a/meson.build
+++ b/meson.build
@@ -3,9 +3,9 @@ project('qemu', ['c'], meson_version: '>=0.58.2',
   'b_staticpic=false', 'stdsplit=false'],
 version: files('VERSION'))
 
-add_test_setup('quick', exclude_suites: ['block', 'slow', 'thorough'], is_default: true)
-add_test_setup('slow', exclude_suites: ['block', 'thorough'], env: ['G_TEST_SLOW=1', 'SPEED=slow'])
-add_test_setup('thorough', exclude_suites: ['block'], env: ['G_TEST_SLOW=1', 'SPEED=thorough'])
+add_test_setup('quick', exclude_suites: ['slow', 'thorough'], is_default: true)
+add_test_setup('slow', exclude_suites: ['thorough'], env: ['G_TEST_SLOW=1', 'SPEED=slow'])
+add_test_setup('thorough', env: ['G_TEST_SLOW=1', 'SPEED=thorough'])
 
 not_found = dependency('', required: false)
 keyval = import('keyval')
diff --git a/scripts/mtest2make.py b/scripts/mtest2make.py
index 4d542e8aaa..304634b71e 100644
--- a/scripts/mtest2make.py
+++ b/scripts/mtest2make.py
@@ -101,10 +101,6 @@ def emit_suite(name, suite, prefix):
 testsuites = defaultdict(Suite)
 for test in introspect['tests']:
 process_tests(test, targets, testsuites)
-# HACK: check-block is a separate target so that it runs with --verbose;
-# only write the dependencies
-emit_suite_deps('block', testsuites['block'], 'check')
-del testsuites['block']
 emit_prolog(testsuites, 'check')
 for name, suite in testsuites.items():
 emit_suite(name, suite, 'check')
diff --git a/tests/Makefile.include b/tests/Makefile.include
index 9157a57b1a..f93ae5b479 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -151,16 +151,9 @@ check-acceptance: check-acceptance-deprecated-warning | check-avocado
 
 # Consolidated targets
 
-.PHONY: check-block check check-clean get-vm-images
+.PHONY: check check-clean get-vm-images
 check:
 
-ifeq ($(CONFIG_TOOLS)$(CONFIG_POSIX),yy)
-check: check-block
-check-block: run-ninja
-   $(if $(MAKE.n),,+)$(MESON) test $(MTESTARGS) $(.mtestargs) --verbose \
-   --logbase iotestslog $(call .speed.$(SPEED), block block-slow block-thorough)
-endif
-
 check-build: run-ninja
 
 check-clean:
-- 
2.27.0

Re: [PATCH v12 2/5] migration/dirtyrate: refactor dirty page rate calculation

2022-02-08 Thread Hyman Huang





On 2022/2/8 16:18, Peter Xu wrote:

On Mon, Jan 24, 2022 at 10:10:37PM +0800, huang...@chinatelecom.cn wrote:

diff --git a/cpus-common.c b/cpus-common.c
index 6e73d3e..63159d6 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -73,6 +73,7 @@ static int cpu_get_free_index(void)
  }
  
  CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);

+unsigned int cpu_list_generation_id;
  
  void cpu_list_add(CPUState *cpu)

  {
@@ -84,6 +85,7 @@ void cpu_list_add(CPUState *cpu)
  assert(!cpu_index_auto_assigned);
  }
  QTAILQ_INSERT_TAIL_RCU(&cpus, cpu, node);
+cpu_list_generation_id++;
  }
  
  void cpu_list_remove(CPUState *cpu)

@@ -96,6 +98,7 @@ void cpu_list_remove(CPUState *cpu)
  
  QTAILQ_REMOVE_RCU(&cpus, cpu, node);

  cpu->cpu_index = UNASSIGNED_CPU_INDEX;
+cpu_list_generation_id++;
  }


Could you move the cpu list gen id changes into a separate patch?

Yes, of course


  
  CPUState *qemu_get_cpu(int index)

diff --git a/include/sysemu/dirtyrate.h b/include/sysemu/dirtyrate.h
new file mode 100644
index 000..ea4785f
--- /dev/null
+++ b/include/sysemu/dirtyrate.h
@@ -0,0 +1,31 @@
+/*
+ * dirty page rate helper functions
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_DIRTYRATE_H
+#define QEMU_DIRTYRATE_H
+
+extern unsigned int cpu_list_generation_id;


How about exporting a function cpu_list_generation_id_get() from the cpu code,
rather than referencing it directly?
Ok, this will be done along with the "cpu list gen id changes" in a separate
patch.



+int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
+ int64_t init_time_ms,
+ VcpuStat *stat,
+ unsigned int flag,
+ bool one_shot)
+{
+DirtyPageRecord *records;
+int64_t duration;
+int64_t dirtyrate;
+int i = 0;
+unsigned int gen_id;
+
+retry:
+cpu_list_lock();
+gen_id = cpu_list_generation_id;
+records = vcpu_dirty_stat_alloc(stat);
+vcpu_dirty_stat_collect(stat, records, true);
+
+duration = dirty_stat_wait(calc_time_ms, init_time_ms);
+cpu_list_unlock();


Should we release the lock before sleeping (dirty_stat_wait)?
Good point. Since we have introduced cpu_list_generation_id and made
sure that we can handle the plug/unplug scenario, we can let CPU
plug/unplug proceed as fast as it can. :)



+
+global_dirty_log_sync(flag, one_shot);
+
+cpu_list_lock();
+if (gen_id != cpu_list_generation_id) {
+g_free(records);
+g_free(stat->rates);
+cpu_list_unlock();
+goto retry;
+}
+vcpu_dirty_stat_collect(stat, records, false);
+cpu_list_unlock();
+
+for (i = 0; i < stat->nvcpu; i++) {
+dirtyrate = do_calculate_dirtyrate(records[i], duration);
+
+stat->rates[i].id = i;
+stat->rates[i].dirty_rate = dirtyrate;
+
+trace_dirtyrate_do_calculate_vcpu(i, dirtyrate);
+}
+
+g_free(records);
+
+return duration;
+}


Thanks,



--
Best regards

Hyman Huang(黄勇)
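
The agreed change is to drop cpu_list_lock() before the sleep in dirty_stat_wait() and rely on cpu_list_generation_id to detect hotplug. That is an optimistic snapshot-and-retry pattern. Below is a self-contained sketch of it using a pthread mutex: snapshot the generation under the lock, do the slow work unlocked, then re-take the lock and retry if the generation moved. list_lock, list_generation, measure_with_retry and the callback signature are placeholders, not QEMU APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Placeholders for cpu_list_lock()/cpu_list_generation_id. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int list_generation;

/* Writers (CPU hot-plug/unplug) bump the generation under the lock. */
static void list_changed(void)
{
    pthread_mutex_lock(&list_lock);
    list_generation++;
    pthread_mutex_unlock(&list_lock);
}

/*
 * Reader: take the "start" snapshot and the generation under the lock,
 * run the slow part (e.g. sleeping for the measurement period) with the
 * lock dropped, then re-take the lock and retry if the list changed.
 * Returns the number of attempts needed.
 */
static int measure_with_retry(void (*slow_work)(void *), void *opaque)
{
    int attempts = 0;

    assert(slow_work != NULL);
    for (;;) {
        unsigned int gen;
        bool stable;

        attempts++;
        pthread_mutex_lock(&list_lock);
        gen = list_generation;
        /* ... allocate records and collect the start snapshot ... */
        pthread_mutex_unlock(&list_lock);

        slow_work(opaque);              /* lock is NOT held here */

        pthread_mutex_lock(&list_lock);
        stable = (gen == list_generation);
        /* ... if stable, collect the end snapshot ... */
        pthread_mutex_unlock(&list_lock);

        if (stable) {
            return attempts;
        }
        /* ... free the partial records, then start over ... */
    }
}

/* Demo hook: simulate one hot-plug during the first measurement window. */
static void hotplug_during_first_window(void *opaque)
{
    int *remaining = opaque;

    if (*remaining > 0) {
        (*remaining)--;
        list_changed();
    }
}
```

With one simulated hot-plug, the first window is discarded and the second one succeeds. Hot-plug never has to wait for a whole measurement period.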

[PATCH 3/6] tests/qemu-iotests: Allow to run "./check -n" from the source directory, too

2022-02-08 Thread Thomas Huth

For better integration of the iotests into the meson build system, it
would be very helpful to get the list of the tests in the "auto" group
during the "configure" step already. However, "check -n -g auto"
currently only works if the binaries have already been built. Re-order
the code in the "check" a little bit so that we can use the -n option
without building the binaries first.

Signed-off-by: Thomas Huth 
---
 tests/qemu-iotests/check | 52 ++--
 1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 75de1b4691..0fa75abf13 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -120,6 +120,30 @@ def make_argparser() -> argparse.ArgumentParser:
 if __name__ == '__main__':
     args = make_argparser().parse_args()
 
+    if os.path.islink(sys.argv[0]):
+        # called from the build tree
+        source_iotests = os.path.dirname(os.readlink(sys.argv[0]))
+    else:
+        source_iotests = os.getcwd()
+
+    testfinder = TestFinder(source_iotests)
+
+    groups = args.groups.split(',') if args.groups else None
+    x_groups = args.exclude_groups.split(',') if args.exclude_groups else None
+
+    try:
+        tests = testfinder.find_tests(groups=groups, exclude_groups=x_groups,
+                                      tests=args.tests,
+                                      start_from=args.start_from)
+        if not tests:
+            raise ValueError('No tests selected')
+    except ValueError as e:
+        sys.exit(e)
+
+    if args.dry_run:
+        print('\n'.join(tests))
+        sys.exit(0)
+
     env = TestEnv(imgfmt=args.imgfmt, imgproto=args.imgproto,
                   aiomode=args.aiomode, cachemode=args.cachemode,
                   imgopts=args.imgopts, misalign=args.misalign,
@@ -140,11 +164,6 @@ if __name__ == '__main__':
 os.chdir(exec_path.parent)
 os.execve(cmd[0], cmd, full_env)
 
-    testfinder = TestFinder(test_dir=env.source_iotests)
-
-    groups = args.groups.split(',') if args.groups else None
-    x_groups = args.exclude_groups.split(',') if args.exclude_groups else None
-
 group_local = os.path.join(env.source_iotests, 'group.local')
 if os.path.isfile(group_local):
 try:
@@ -152,21 +171,8 @@ if __name__ == '__main__':
 except ValueError as e:
 sys.exit(f"Failed to parse group file '{group_local}': {e}")
 
-    try:
-        tests = testfinder.find_tests(groups=groups, exclude_groups=x_groups,
-                                      tests=args.tests,
-                                      start_from=args.start_from)
-        if not tests:
-            raise ValueError('No tests selected')
-    except ValueError as e:
-        sys.exit(e)
-
-    if args.dry_run:
-        print('\n'.join(tests))
-    else:
-        with TestRunner(env, tap=args.tap,
-                        color=args.color) as tr:
-            paths = [os.path.join(env.source_iotests, t) for t in tests]
-            ok = tr.run_tests(paths, args.jobs)
-            if not ok:
-                sys.exit(1)
+    with TestRunner(env, tap=args.tap, color=args.color) as tr:
+        paths = [os.path.join(env.source_iotests, t) for t in tests]
+        ok = tr.run_tests(paths, args.jobs)
+        if not ok:
+            sys.exit(1)
-- 
2.27.0

Re: [PATCH 1/4] target/ppc: Remove powerpc_excp_legacy

2022-02-08 Thread Cédric Le Goater


On 2/7/22 19:30, Fabiano Rosas wrote:

Now that all CPU families have their own separate exception
dispatching code we can remove powerpc_excp_legacy.

Signed-off-by: Fabiano Rosas 


Super :)

Reviewed-by: Cédric Le Goater 

Thanks,

C.


---
  target/ppc/excp_helper.c | 477 +--
  1 file changed, 3 insertions(+), 474 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 0050c8447f..c6646503aa 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -163,7 +163,7 @@ static void ppc_excp_debug_sw_tlb(CPUPPCState *env, int excp)
   env->error_code);
  }
  
-

+#if defined(TARGET_PPC64)
  static int powerpc_reset_wakeup(CPUState *cs, CPUPPCState *env, int excp,
  target_ulong *msr)
  {
@@ -267,7 +267,6 @@ static void ppc_excp_apply_ail(PowerPCCPU *cpu, int excp_model, int excp,
target_ulong *new_msr,
target_ulong *vector)
  {
-#if defined(TARGET_PPC64)
  CPUPPCState *env = &cpu->env;
  bool mmu_all_on = ((msr >> MSR_IR) & 1) && ((msr >> MSR_DR) & 1);
  bool hv_escalation = !(msr & MSR_HVB) && (*new_msr & MSR_HVB);
@@ -356,8 +355,8 @@ static void ppc_excp_apply_ail(PowerPCCPU *cpu, int excp_model, int excp,
  *vector |= 0xc0003000ull; /* Apply scv's AIL=3 offset */
  }
  }
-#endif
  }
+#endif
  
  static void powerpc_set_excp_state(PowerPCCPU *cpu,
target_ulong vector, target_ulong msr)
@@ -1641,476 +1640,6 @@ static inline void powerpc_excp_books(PowerPCCPU *cpu, int excp)
  }
  #endif
  
-/*
- * Note that this function should be greatly optimized when called
- * with a constant excp, from ppc_hw_interrupt
- */
-static inline void powerpc_excp_legacy(PowerPCCPU *cpu, int excp)
-{
-CPUState *cs = CPU(cpu);
-CPUPPCState *env = &cpu->env;
-int excp_model = env->excp_model;
-target_ulong msr, new_msr, vector;
-int srr0, srr1, lev = -1;
-
-if (excp <= POWERPC_EXCP_NONE || excp >= POWERPC_EXCP_NB) {
-cpu_abort(cs, "Invalid PowerPC exception %d. Aborting\n", excp);
-}
-
-qemu_log_mask(CPU_LOG_INT, "Raise exception at " TARGET_FMT_lx
-  " => %s (%d) error=%02x\n", env->nip, powerpc_excp_name(excp),
-  excp, env->error_code);
-
-/* new srr1 value excluding must-be-zero bits */
-if (excp_model == POWERPC_EXCP_BOOKE) {
-msr = env->msr;
-} else {
-msr = env->msr & ~0x783fULL;
-}
-
-/*
- * new interrupt handler msr preserves existing HV and ME unless
- * explicitly overriden
- */
-new_msr = env->msr & (((target_ulong)1 << MSR_ME) | MSR_HVB);
-
-/* target registers */
-srr0 = SPR_SRR0;
-srr1 = SPR_SRR1;
-
-/*
- * check for special resume at 0x100 from doze/nap/sleep/winkle on
- * P7/P8/P9
- */
-if (env->resume_as_sreset) {
-excp = powerpc_reset_wakeup(cs, env, excp, &msr);
-}
-
-/*
- * Hypervisor emulation assistance interrupt only exists on server
- * arch 2.05 server or later. We also don't want to generate it if
- * we don't have HVB in msr_mask (PAPR mode).
- */
-if (excp == POWERPC_EXCP_HV_EMU
-#if defined(TARGET_PPC64)
-&& !(mmu_is_64bit(env->mmu_model) && (env->msr_mask & MSR_HVB))
-#endif /* defined(TARGET_PPC64) */
-
-) {
-excp = POWERPC_EXCP_PROGRAM;
-}
-
-#ifdef TARGET_PPC64
-/*
- * SPEU and VPU share the same IVOR but they exist in different
- * processors. SPEU is e500v1/2 only and VPU is e6500 only.
- */
-if (excp_model == POWERPC_EXCP_BOOKE && excp == POWERPC_EXCP_VPU) {
-excp = POWERPC_EXCP_SPEU;
-}
-#endif
-
-vector = env->excp_vectors[excp];
-if (vector == (target_ulong)-1ULL) {
-cpu_abort(cs, "Raised an exception without defined vector %d\n",
-  excp);
-}
-
-vector |= env->excp_prefix;
-
-switch (excp) {
-case POWERPC_EXCP_CRITICAL:/* Critical input */
-switch (excp_model) {
-case POWERPC_EXCP_40x:
-srr0 = SPR_40x_SRR2;
-srr1 = SPR_40x_SRR3;
-break;
-case POWERPC_EXCP_BOOKE:
-srr0 = SPR_BOOKE_CSRR0;
-srr1 = SPR_BOOKE_CSRR1;
-break;
-case POWERPC_EXCP_6xx:
-break;
-default:
-goto excp_invalid;
-}
-break;
-case POWERPC_EXCP_MCHECK:/* Machine check exception  */
-if (msr_me == 0) {
-/*
- * Machine check exception is not enabled.  Enter
- * checkstop state.
- */
-fprintf(stderr, "Machine check while not allowed. "
-"Entering checkstop state\n");
-if (qemu_log_separate()) {
-qemu_log("Machine check while

Re: [PATCH v5 0/9] virtiofsd: Add support for file security context at file creation

2022-02-08 Thread Daniel P . Berrangé

On Mon, Feb 07, 2022 at 04:19:38PM -0500, Vivek Goyal wrote:
> On Mon, Feb 07, 2022 at 01:05:16PM +, Daniel P. Berrangé wrote:
> > On Wed, Feb 02, 2022 at 02:39:26PM -0500, Vivek Goyal wrote:
> > > Hi,
> > > 
> > > This is V5 of the patches. I posted V4 here.
> > > 
> > > https://listman.redhat.com/archives/virtio-fs/2022-January/msg00041.html
> > > 
> > > These patches will allow us to support SELinux with virtiofs. They
> > > send the SELinux context at file creation to the server, and the
> > > server can set it on the file.
> > 
> > I've not entirely figured it out from the code, so it's easier for me
> > to ask...
> > 
> > How is the SELinux label stored on the host side? Is it stored
> > directly in the security.* xattr namespace,
> 
> [ CC Dan Walsh ]
> 
> I just tried to test the mode where I don't do xattr remapping and
> instead set /proc/pid/attr/fscreate with the context I want to set. It
> sets the security.selinux xattr on the host.
> 
> But the write to /proc/pid/attr/fscreate fails if the host does not
> recognize the label sent by the guest. I am running virtiofsd with
> unconfined_t, but it still fails because the guest is trying to create
> a file with "test_filesystem_filetranscon_t" and the host does not
> recognize this label. I see the following in the audit logs.
> 
> type=SELINUX_ERR msg=audit(1644268262.666:8111): op=fscreate 
> invalid_context="unconfined_u:object_r:test_filesystem_filetranscon_t:s0"

Yes, that's to be expected if the host policy doesn't know about the
label that the guest is using.

IOW, the non-mapping case is only useful if you have a very good match
between host + guest OS policy. This could be useful for an app
like Kata because their guest is not a full OS, it is something
special purpose and tightly controlled.

> So if we don't remap xattrs and the host has SELinux enabled, then it
> probably works only in very limited circumstances where host and guest
> policies don't conflict. I guess it's like running a Fedora 34 guest on
> a Fedora 34 host. I suspect that this will see very limited use, though
> I have put the code in for the sake of completeness.

For general purpose guest OS virtualization, remapping is going to be
effectively mandatory. The non-mapped case is only usable when you tightly
control the guest OS packages from the host.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: target/arm: cp15.dacr migration

2022-02-08 Thread Peter Maydell

On Tue, 8 Feb 2022 at 04:56, Pavel Dovgalyuk  wrote:
>
> On 07.02.2022 16:44, Peter Maydell wrote:
> > On Mon, 7 Feb 2022 at 12:13, Pavel Dovgalyuk  wrote:
> >>
> >> I recently encountered a problem with the cp15.dacr register.
> >> It has _s and _ns versions. During migration only dacr_ns is
> >> saved/loaded.
> >> But both of the values are used in the get_phys_addr_v5 and
> >> get_phys_addr_v6 functions. Therefore VM behavior becomes incorrect
> >> after loading the vmstate.
> >
> > Yes, we don't correctly save and restore the Secure banked
> > registers. This is a long standing bug (eg it is the
> > cause of https://gitlab.com/qemu-project/qemu/-/issues/467).
> > Almost nobody notices this, because almost nobody both runs
> > Secure-world AArch32 code and also tries migration or save/restore.
>
> We actually did it for reverse debugging of custom firmware.
>
> >> I found that kvm_to_cpreg_id is responsible for disabling dacr_s
> >> migration, because it always selects ns variant.
> >
> >> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> >> index c6a4d50e82..d3ffef3640 100644
> >> --- a/target/arm/cpu.h
> >> +++ b/target/arm/cpu.h
> >> @@ -2510,11 +2510,6 @@ static inline uint32_t kvm_to_cpreg_id(uint64_t kvmid)
> >>if ((kvmid & CP_REG_SIZE_MASK) == CP_REG_SIZE_U64) {
> >>cpregid |= (1 << 15);
> >>}
> >> -
> >> -/* KVM is always non-secure so add the NS flag on AArch32 register
> >> - * entries.
> >> - */
> >> - cpregid |= 1 << CP_REG_NS_SHIFT;
> >>}
> >>return cpregid;
> >>}
> >
> > This change is wrong, or at least incomplete -- as the comment notes,
> > a guest running under KVM is always NonSecure, so when KVM says "this is
> > DACR" (or whatever) it always means "this is the NS banked DACR".
> > (Though now AArch32 KVM support has been dropped we have some flexibility
> > to not necessarily use KVM register ID values that exactly match what
> > the kernel uses, if we need to do that.)
>
> Unfortunately, I can't test anything with AArch32 KVM.

As I say, it doesn't exist any more, so you don't need to.
In any case, this patch isn't sufficient on its own.

thanks
-- PMM

Re: [PATCH 2/4] target/ppc: powerpc_excp: Move common code to the caller function

2022-02-08 Thread Cédric Le Goater


On 2/7/22 19:30, Fabiano Rosas wrote:

Make the cpu-specific powerpc_excp_* functions a bit simpler by moving
the bounds check and logging to powerpc_excp.

Signed-off-by: Fabiano Rosas 


Reviewed-by: Cédric Le Goater 

Thanks,

C.


---
  target/ppc/excp_helper.c | 57 +++-
  1 file changed, 9 insertions(+), 48 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index c6646503aa..206314aaa2 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -396,14 +396,6 @@ static void powerpc_excp_40x(PowerPCCPU *cpu, int excp)
  target_ulong msr, new_msr, vector;
  int srr0, srr1;
  
-if (excp <= POWERPC_EXCP_NONE || excp >= POWERPC_EXCP_NB) {
-cpu_abort(cs, "Invalid PowerPC exception %d. Aborting\n", excp);
-}
-
-qemu_log_mask(CPU_LOG_INT, "Raise exception at " TARGET_FMT_lx
-  " => %s (%d) error=%02x\n", env->nip, powerpc_excp_name(excp),
-  excp, env->error_code);
-
  /* new srr1 value excluding must-be-zero bits */
  msr = env->msr & ~0x783fULL;
  
@@ -554,14 +546,6 @@ static void powerpc_excp_6xx(PowerPCCPU *cpu, int excp)
  CPUPPCState *env = &cpu->env;
  target_ulong msr, new_msr, vector;
  
-if (excp <= POWERPC_EXCP_NONE || excp >= POWERPC_EXCP_NB) {
-cpu_abort(cs, "Invalid PowerPC exception %d. Aborting\n", excp);
-}
-
-qemu_log_mask(CPU_LOG_INT, "Raise exception at " TARGET_FMT_lx
-  " => %s (%d) error=%02x\n", env->nip, powerpc_excp_name(excp),
powerpc_excp_name(excp),
-  excp, env->error_code);
-
  /* new srr1 value excluding must-be-zero bits */
  msr = env->msr & ~0x783fULL;
  
@@ -746,14 +730,6 @@ static void powerpc_excp_7xx(PowerPCCPU *cpu, int excp)
  CPUPPCState *env = &cpu->env;
  target_ulong msr, new_msr, vector;
  
-if (excp <= POWERPC_EXCP_NONE || excp >= POWERPC_EXCP_NB) {
-cpu_abort(cs, "Invalid PowerPC exception %d. Aborting\n", excp);
-}
-
-qemu_log_mask(CPU_LOG_INT, "Raise exception at " TARGET_FMT_lx
-  " => %s (%d) error=%02x\n", env->nip, powerpc_excp_name(excp),
powerpc_excp_name(excp),
-  excp, env->error_code);
-
  /* new srr1 value excluding must-be-zero bits */
  msr = env->msr & ~0x783fULL;
  
@@ -926,14 +902,6 @@ static void powerpc_excp_74xx(PowerPCCPU *cpu, int excp)
  CPUPPCState *env = &cpu->env;
  target_ulong msr, new_msr, vector;
  
-if (excp <= POWERPC_EXCP_NONE || excp >= POWERPC_EXCP_NB) {
-cpu_abort(cs, "Invalid PowerPC exception %d. Aborting\n", excp);
-}
-
-qemu_log_mask(CPU_LOG_INT, "Raise exception at " TARGET_FMT_lx
-  " => %s (%d) error=%02x\n", env->nip, powerpc_excp_name(excp),
powerpc_excp_name(excp),
-  excp, env->error_code);
-
  /* new srr1 value excluding must-be-zero bits */
  msr = env->msr & ~0x783fULL;
  
@@ -1121,14 +1089,6 @@ static void powerpc_excp_booke(PowerPCCPU *cpu, int excp)
  target_ulong msr, new_msr, vector;
  int srr0, srr1;
  
-if (excp <= POWERPC_EXCP_NONE || excp >= POWERPC_EXCP_NB) {
-cpu_abort(cs, "Invalid PowerPC exception %d. Aborting\n", excp);
-}
-
-qemu_log_mask(CPU_LOG_INT, "Raise exception at " TARGET_FMT_lx
-  " => %s (%d) error=%02x\n", env->nip, powerpc_excp_name(excp),
powerpc_excp_name(excp),
-  excp, env->error_code);
-
  msr = env->msr;
  
  /*
@@ -1348,14 +1308,6 @@ static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
  target_ulong msr, new_msr, vector;
  int srr0, srr1, lev = -1;
  
-if (excp <= POWERPC_EXCP_NONE || excp >= POWERPC_EXCP_NB) {
-cpu_abort(cs, "Invalid PowerPC exception %d. Aborting\n", excp);
-}
-
-qemu_log_mask(CPU_LOG_INT, "Raise exception at " TARGET_FMT_lx
-  " => %s (%d) error=%02x\n", env->nip, powerpc_excp_name(excp),
powerpc_excp_name(excp),
-  excp, env->error_code);
-
  /* new srr1 value excluding must-be-zero bits */
  msr = env->msr & ~0x783fULL;
  
@@ -1642,8 +1594,17 @@ static inline void powerpc_excp_books(PowerPCCPU *cpu, int excp)
  
  static void powerpc_excp(PowerPCCPU *cpu, int excp)
  {
+CPUState *cs = CPU(cpu);
  CPUPPCState *env = &cpu->env;
  
+if (excp <= POWERPC_EXCP_NONE || excp >= POWERPC_EXCP_NB) {
+cpu_abort(cs, "Invalid PowerPC exception %d. Aborting\n", excp);
+}
+
+qemu_log_mask(CPU_LOG_INT, "Raise exception at " TARGET_FMT_lx
+  " => %s (%d) error=%02x\n", env->nip, powerpc_excp_name(excp),
+  excp, env->error_code);
+
  switch (env->excp_model) {
  case POWERPC_EXCP_40x:
  powerpc_excp_40x(cpu, excp);

Re: [PATCH v5 11/11] 9p: darwin: Adjust assumption on virtio-9p-test

2022-02-08 Thread Greg Kurz

On Mon,  7 Feb 2022 17:40:24 -0500
Will Cohen  wrote:

> The previous test depended on the assumption that P9_DOTL_AT_REMOVEDIR
> and AT_REMOVEDIR have the same value.
> 
> While this is true on Linux, it is not true everywhere, and leads to an
> incorrect test failure on unlink_at, noticed when adding 9p to darwin:
> 
> Received response 7 (RLERROR) instead of 77 (RUNLINKAT)
> Rlerror has errno 22 (Invalid argument)
> **
> 
> ERROR:../tests/qtest/virtio-9p-test.c:305:v9fs_req_recv: assertion
> failed (hdr.id == id): (7 == 77) Bail out!
> 
> ERROR:../tests/qtest/virtio-9p-test.c:305:v9fs_req_recv: assertion
> failed (hdr.id == id): (7 == 77)
> 
> Signed-off-by: Fabian Franz 
> [Will Cohen: - Add explanation of patch and description
>of pre-patch test failure]
> Signed-off-by: Will Cohen 
> Acked-by: Thomas Huth 
> ---

LGTM but this patch should go before patch 10 that enables
Darwin host support to avoid qtest breakage while bisecting.

Reviewed-by: Greg Kurz 

>  tests/qtest/virtio-9p-test.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
> index 41fed41de1..6bcf89f0f8 100644
> --- a/tests/qtest/virtio-9p-test.c
> +++ b/tests/qtest/virtio-9p-test.c
> @@ -1270,7 +1270,7 @@ static void fs_unlinkat_dir(void *obj, void *data, QGuestAllocator *t_alloc)
>  /* ... and is actually a directory */
>  g_assert((st.st_mode & S_IFMT) == S_IFDIR);
>  
> -do_unlinkat(v9p, "/", "02", AT_REMOVEDIR);
> +do_unlinkat(v9p, "/", "02", P9_DOTL_AT_REMOVEDIR);
>  /* directory should be gone now */
>  g_assert(stat(new_dir, &st) != 0);
>
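
Why the constant matters: the unlinkat flags travel over the 9P2000.L wire, so the test must send the protocol's value and the server translates it into whatever the host defines. The sketch below assumes a wire value of 0x200 for P9_DOTL_AT_REMOVEDIR (the same as Linux's AT_REMOVEDIR) and a host value of 0x80 (assumed here to be the macOS value; verify against the host's <fcntl.h>); the function name is hypothetical. Sending the host constant on the wire, as the old test effectively did on a non-Linux host, is rejected, which is consistent with the RLERROR/EINVAL failure quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 9P2000.L wire value (equal to Linux's AT_REMOVEDIR). */
#define WIRE_AT_REMOVEDIR 0x200u
/* Assumed host value on a non-Linux host such as macOS. */
#define HOST_AT_REMOVEDIR 0x80u

/*
 * Translate protocol unlinkat flags to host flags.
 * Returns -1 (the server would answer EINVAL) for unknown bits.
 */
static int wire_to_host_unlinkat_flags(uint32_t wire, int *host)
{
    if (wire & ~WIRE_AT_REMOVEDIR) {
        return -1;
    }
    *host = (wire & WIRE_AT_REMOVEDIR) ? (int)HOST_AT_REMOVEDIR : 0;
    return 0;
}
```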

Re: [PATCH 06/31] vhost: Route guest->host notification through shadow virtqueue

2022-02-08 Thread Jason Wang




On 2022/1/31 7:33 PM, Eugenio Perez Martin wrote:

On Fri, Jan 28, 2022 at 7:57 AM Jason Wang  wrote:


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:

At this moment no buffer forwarding will be performed in SVQ mode: QEMU
just forwards the guest's kicks to the device. This commit also sets up
SVQs in the vhost device.

Host memory notifier regions are left out for simplicity, and they will
not be addressed in this series.


I wonder if it's better to squash this into patch 5 since it gives us a
full guest->host forwarding.


I'm fine with that if you think it makes the review easier.



Yes please.





Signed-off-by: Eugenio Pérez 
---
   include/hw/virtio/vhost-vdpa.h |   4 ++
   hw/virtio/vhost-vdpa.c | 122 -
   2 files changed, 124 insertions(+), 2 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 3ce79a646d..009a9f3b6b 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -12,6 +12,8 @@
   #ifndef HW_VIRTIO_VHOST_VDPA_H
   #define HW_VIRTIO_VHOST_VDPA_H

+#include 
+
   #include "hw/virtio/virtio.h"
   #include "standard-headers/linux/vhost_types.h"

@@ -27,6 +29,8 @@ typedef struct vhost_vdpa {
   bool iotlb_batch_begin_sent;
   MemoryListener listener;
   struct vhost_vdpa_iova_range iova_range;
+bool shadow_vqs_enabled;
+GPtrArray *shadow_vqs;
   struct vhost_dev *dev;
   VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
   } VhostVDPA;
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 6c10a7f05f..18de14f0fb 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -17,12 +17,14 @@
   #include "hw/virtio/vhost.h"
   #include "hw/virtio/vhost-backend.h"
   #include "hw/virtio/virtio-net.h"
+#include "hw/virtio/vhost-shadow-virtqueue.h"
   #include "hw/virtio/vhost-vdpa.h"
   #include "exec/address-spaces.h"
   #include "qemu/main-loop.h"
   #include "cpu.h"
   #include "trace.h"
   #include "qemu-common.h"
+#include "qapi/error.h"

   /*
* Return one past the end of the end of section. Be careful with uint64_t
@@ -409,8 +411,14 @@ err:

   static void vhost_vdpa_host_notifiers_init(struct vhost_dev *dev)
   {
+struct vhost_vdpa *v = dev->opaque;
   int i;

+if (v->shadow_vqs_enabled) {
+/* SVQ is not compatible with host notifiers mr */


I guess there should be a TODO or FIXME here.


Sure I can add it.


+return;
+}
+
   for (i = dev->vq_index; i < dev->vq_index + dev->nvqs; i++) {
   if (vhost_vdpa_host_notifier_init(dev, i)) {
   goto err;
@@ -424,6 +432,17 @@ err:
   return;
   }

+static void vhost_vdpa_svq_cleanup(struct vhost_dev *dev)
+{
+struct vhost_vdpa *v = dev->opaque;
+size_t idx;
+
+for (idx = 0; idx < v->shadow_vqs->len; ++idx) {
+vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, idx));
+}
+g_ptr_array_free(v->shadow_vqs, true);
+}
+
   static int vhost_vdpa_cleanup(struct vhost_dev *dev)
   {
   struct vhost_vdpa *v;
@@ -432,6 +451,7 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
   trace_vhost_vdpa_cleanup(dev, v);
   vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
   memory_listener_unregister(&v->listener);
+vhost_vdpa_svq_cleanup(dev);

   dev->opaque = NULL;
   ram_block_discard_disable(false);
@@ -507,9 +527,15 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev,

   static int vhost_vdpa_reset_device(struct vhost_dev *dev)
   {
+struct vhost_vdpa *v = dev->opaque;
   int ret;
   uint8_t status = 0;

+for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
+VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
+vhost_svq_stop(svq);
+}
+
   ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
   trace_vhost_vdpa_reset_device(dev, status);
   return ret;
@@ -639,13 +665,28 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
   return ret;
   }

-static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
-   struct vhost_vring_file *file)
+static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
+ struct vhost_vring_file *file)
   {
   trace_vhost_vdpa_set_vring_kick(dev, file->index, file->fd);
   return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
   }

+static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
+   struct vhost_vring_file *file)
+{
+struct vhost_vdpa *v = dev->opaque;
+int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
+
+if (v->shadow_vqs_enabled) {
+VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
+vhost_svq_set_svq_kick_fd(svq, file->fd);
+return 0;
+} else {
+return vhost_vdpa_set_vring_dev_kick(dev, file);
+}
+}
+
   static int vhost_vdpa_set_vring_call(struct vhost_dev *

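The new vhost_vdpa_set_vring_kick() above decides, per virtqueue, whether the guest's kick eventfd goes to the device (VHOST_SET_VRING_KICK) or to the shadow virtqueue, which then forwards kicks itself. A simplified model of that routing follows; the types ShadowVq and VdpaDev are hypothetical stand-ins for VhostShadowVirtqueue and struct vhost_vdpa, and a plain field stands in for the ioctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NQUEUES 4

/* Hypothetical, simplified stand-in for VhostShadowVirtqueue. */
typedef struct {
    int guest_kick_fd;           /* fd the shadow VQ polls for guest kicks */
} ShadowVq;

/* Hypothetical, simplified stand-in for struct vhost_vdpa. */
typedef struct {
    bool shadow_vqs_enabled;
    ShadowVq svq[NQUEUES];
    int device_kick_fd[NQUEUES]; /* what VHOST_SET_VRING_KICK would program */
} VdpaDev;

/* Route the guest's kick eventfd either to the shadow VQ or to the device. */
static int set_vring_kick(VdpaDev *v, size_t idx, int fd)
{
    if (idx >= NQUEUES) {
        return -1;
    }
    if (v->shadow_vqs_enabled) {
        /* QEMU intercepts the kick and forwards it to the device itself. */
        v->svq[idx].guest_kick_fd = fd;
        return 0;
    }
    /* Pass-through: the device is notified directly by the guest's fd. */
    v->device_kick_fd[idx] = fd;
    return 0;
}
```
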
Re: [PATCH] hvf: arm: Handle ID_AA64ISAR2_EL1 reads

2022-02-08 Thread Peter Maydell

On Mon, 7 Feb 2022 at 22:52, Alexander Graf  wrote:
>
> Recent Linux versions added support to read ID_AA64ISAR2_EL1. On M1,
> those reads trap into QEMU which handles them as faults.
>
> However, according to the ARMv8 spec (issue D17783), reads on this
> register in older ARMv8 revisions should be RES0. So let's treat it
> as such instead.
>
> Reported-by: Ivan Babrou 
> Signed-off-by: Alexander Graf 
> ---
>  target/arm/hvf/hvf.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
> index 92ad0d29c4..045ec69c7c 100644
> --- a/target/arm/hvf/hvf.c
> +++ b/target/arm/hvf/hvf.c
> @@ -54,6 +54,7 @@
>  #define SYSREG_PMCEID1_EL0SYSREG(3, 3, 9, 12, 7)
>  #define SYSREG_PMCCNTR_EL0SYSREG(3, 3, 9, 13, 0)
>  #define SYSREG_PMCCFILTR_EL0  SYSREG(3, 3, 14, 15, 7)
> +#define SYSREG_ID_AA64ISAR2_EL1 SYSREG(3, 0, 0, 6, 2)
>
>  #define WFX_IS_WFE (1 << 0)
>
> @@ -780,6 +781,10 @@ static int hvf_sysreg_read(CPUState *cpu, uint32_t reg, uint32_t rt)
>  case SYSREG_OSDLR_EL1:
>  /* Dummy register */
>  break;
> +case SYSREG_ID_AA64ISAR2_EL1:
> +/* We do not support any of the ISAR2 features yet */
> +val = 0;
> +break;
>  default:

We should handle all the architected "this should RAZ/WI"
ID register space, if hvf doesn't do the right thing internally.

thanks
-- PMM

Re: [PATCH 5/6] tests: Do not treat the iotests as separate meson test target anymore

2022-02-08 Thread Peter Maydell

On Tue, 8 Feb 2022 at 10:18, Thomas Huth  wrote:
>
> Now that we add the single iotests directly in meson.build, we do
> not have to separate the block suite from the other suites anymore.
>
> Signed-off-by: Thomas Huth 
> ---
>  meson.build| 6 +++---
>  scripts/mtest2make.py  | 4 
>  tests/Makefile.include | 9 +
>  3 files changed, 4 insertions(+), 15 deletions(-)
>
> diff --git a/meson.build b/meson.build
> index 5f43355071..b203402ee1 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -3,9 +3,9 @@ project('qemu', ['c'], meson_version: '>=0.58.2',
>'b_staticpic=false', 'stdsplit=false'],
>  version: files('VERSION'))
>
> -add_test_setup('quick', exclude_suites: ['block', 'slow', 'thorough'], is_default: true)
> -add_test_setup('slow', exclude_suites: ['block', 'thorough'], env: ['G_TEST_SLOW=1', 'SPEED=slow'])
> -add_test_setup('thorough', exclude_suites: ['block'], env: ['G_TEST_SLOW=1', 'SPEED=thorough'])
> +add_test_setup('quick', exclude_suites: ['slow', 'thorough'], is_default: true)
> +add_test_setup('slow', exclude_suites: ['thorough'], env: ['G_TEST_SLOW=1', 'SPEED=slow'])
> +add_test_setup('thorough', env: ['G_TEST_SLOW=1', 'SPEED=thorough'])
>
>  not_found = dependency('', required: false)
>  keyval = import('keyval')
> diff --git a/scripts/mtest2make.py b/scripts/mtest2make.py
> index 4d542e8aaa..304634b71e 100644
> --- a/scripts/mtest2make.py
> +++ b/scripts/mtest2make.py
> @@ -101,10 +101,6 @@ def emit_suite(name, suite, prefix):
>  testsuites = defaultdict(Suite)
>  for test in introspect['tests']:
>  process_tests(test, targets, testsuites)
> -# HACK: check-block is a separate target so that it runs with --verbose;
> -# only write the dependencies
> -emit_suite_deps('block', testsuites['block'], 'check')
> -del testsuites['block']

This code being deleted claims to be doing something to ensure that
the tests get run and output the useful messages on failure.
What is the mechanism for this in the new meson setup ?
(As far as I can tell at the moment this is broken. At some
point I will start agitating for reverting that conversion if
it isn't fixed :-))

-- PMM

[PATCH qemu] spapr/vof: Install rom and nvram binaries

2022-02-08 Thread Alexey Kardashevskiy

This installs VOF-related binaries (the firmware and the preformatted
NVRAM) as those were left out when the VOF was submitted initially.

Fixes: fc8c745d5015 ("spapr: Implement Open Firmware client interface")
Signed-off-by: Alexey Kardashevskiy 
---
 pc-bios/meson.build | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/pc-bios/meson.build b/pc-bios/meson.build
index 4ac7a5509b69..c86dedf7dff9 100644
--- a/pc-bios/meson.build
+++ b/pc-bios/meson.build
@@ -81,6 +81,8 @@ blobs = files(
   'opensbi-riscv32-generic-fw_dynamic.bin',
   'opensbi-riscv64-generic-fw_dynamic.bin',
   'npcm7xx_bootrom.bin',
+  'vof.bin',
+  'vof-nvram.bin',
 )
 
 if get_option('install_blobs')
-- 
2.30.2

[PATCH 0/6] Improve integration of iotests in the meson test harness

2022-02-08 Thread Thomas Huth

Though "make check-block" is currently already run via the meson test
runner, it still looks like an oddball in the output of "make check" since
the tests are still run separately via the check-block.sh script. It would
be nicer if the iotests showed up like the other test suites. For this
we have to tweak the tests/qemu-iotests/check script so that it can already
be run with "-g auto -n" during the configuration step [*], then we can
directly add the individual tests in the tests/qemu-iotests/meson.build file
already and finally get rid of the check-block.sh script.

[*] Alternatively, I think we could also get rid of the "auto" group
and add the test list to the tests/qemu-iotests/meson.build file
directly ... not sure whether that's so much nicer, though.

Thomas Huth (6):
  tests/qemu-iotests: Improve the check for GNU sed
  tests/qemu-iotests/meson.build: Improve the indentation
  tests/qemu-iotests: Allow to run "./check -n" from the source
directory, too
  tests/qemu-iotests/meson.build: Call the 'check' script directly
  tests: Do not treat the iotests as separate meson test target anymore
  tests: Remove check-block.sh

 meson.build|  6 +--
 scripts/mtest2make.py  |  4 --
 tests/Makefile.include |  9 +---
 tests/check-block.sh   | 85 --
 tests/qemu-iotests/check   | 52 -
 tests/qemu-iotests/common.rc   | 26 +--
 tests/qemu-iotests/meson.build | 84 ++---
 7 files changed, 104 insertions(+), 162 deletions(-)
 delete mode 100755 tests/check-block.sh

-- 
2.27.0

[PATCH 1/6] tests/qemu-iotests: Improve the check for GNU sed

2022-02-08 Thread Thomas Huth

Instead of failing the iotests if GNU sed is not available (or skipping
them completely in the check-block.sh script), it would be better to
simply skip the bash-based tests, so that the python-based tests could
still be run. Thus add the check for BusyBox sed to common.rc and mark
the tests as "not run" if GNU sed is not available. Then we can also
remove the sed checks from the check-block.sh script.

Signed-off-by: Thomas Huth 
---
 tests/check-block.sh | 12 
 tests/qemu-iotests/common.rc | 26 +-
 2 files changed, 13 insertions(+), 25 deletions(-)

diff --git a/tests/check-block.sh b/tests/check-block.sh
index 720a46bc36..af0c574812 100755
--- a/tests/check-block.sh
+++ b/tests/check-block.sh
@@ -52,18 +52,6 @@ if LANG=C bash --version | grep -q 'GNU bash, version [123]' ; then
 skip "bash version too old ==> Not running the qemu-iotests."
 fi
 
-if ! (sed --version | grep 'GNU sed') > /dev/null 2>&1 ; then
-if ! command -v gsed >/dev/null 2>&1; then
-skip "GNU sed not available ==> Not running the qemu-iotests."
-fi
-else
-# Double-check that we're not using BusyBox' sed which says
-# that "This is not GNU sed version 4.0" ...
-if sed --version | grep -q 'not GNU sed' ; then
-skip "BusyBox sed not supported ==> Not running the qemu-iotests."
-fi
-fi
-
 cd tests/qemu-iotests
 
 # QEMU_CHECK_BLOCK_AUTO is used to disable some unstable sub-tests
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 9885030b43..9ea504810c 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -17,17 +17,27 @@
 # along with this program.  If not, see .
 #
 
+# bail out, setting up .notrun file
+_notrun()
+{
+echo "$*" >"$OUTPUT_DIR/$seq.notrun"
+echo "$seq not run: $*"
+status=0
+exit
+}
+
+# We need GNU sed for the iotests. Make sure to not use BusyBox sed
+# which says that "This is not GNU sed version 4.0"
 SED=
 for sed in sed gsed; do
-($sed --version | grep 'GNU sed') > /dev/null 2>&1
+($sed --version | grep -v "not GNU sed" | grep 'GNU sed') > /dev/null 2>&1
 if [ "$?" -eq 0 ]; then
 SED=$sed
 break
 fi
 done
 if [ -z "$SED" ]; then
-echo "$0: GNU sed not found"
-exit 1
+_notrun "GNU sed not found"
 fi
 
 dd()
@@ -722,16 +732,6 @@ _img_info()
 done
 }
 
-# bail out, setting up .notrun file
-#
-_notrun()
-{
-echo "$*" >"$OUTPUT_DIR/$seq.notrun"
-echo "$seq not run: $*"
-status=0
-exit
-}
-
 # bail out, setting up .casenotrun file
 # The function _casenotrun() is used as a notifier. It is the
 # caller's responsibility to make skipped a particular test.
-- 
2.27.0

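The reworked detection in common.rc accepts a sed binary only if some line of its `--version` output contains "GNU sed" once lines mentioning "not GNU sed" have been filtered out, which is what rejects BusyBox. The same per-line logic in C, as a sketch (the function name and the 256-byte line limit are assumptions):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Mirrors `sed --version | grep -v "not GNU sed" | grep 'GNU sed'`:
 * true if some line mentions "GNU sed" without saying "not GNU sed".
 */
static bool looks_like_gnu_sed(const char *version_output)
{
    const char *line = version_output;

    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        char buf[256];

        if (len >= sizeof(buf)) {
            len = sizeof(buf) - 1;      /* overly long lines are truncated */
        }
        memcpy(buf, line, len);
        buf[len] = '\0';

        if (!strstr(buf, "not GNU sed") && strstr(buf, "GNU sed")) {
            return true;
        }
        if (!end) {
            break;
        }
        line = end + 1;
    }
    return false;
}
```
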
Re: [PATCH 2/5] linux-user: Introduce host_signal_mask

2022-02-08 Thread Philippe Mathieu-Daudé via


On 8/2/22 08:12, Richard Henderson wrote:

Do not directly access the uc_sigmask member.
This is preparation for a sparc64 fix.

Signed-off-by: Richard Henderson 
---
  linux-user/include/host/aarch64/host-signal.h  |  5 +
  linux-user/include/host/alpha/host-signal.h|  5 +
  linux-user/include/host/arm/host-signal.h  |  5 +
  linux-user/include/host/i386/host-signal.h |  5 +
  .../include/host/loongarch64/host-signal.h |  5 +
  linux-user/include/host/mips/host-signal.h |  5 +
  linux-user/include/host/ppc/host-signal.h  |  5 +
  linux-user/include/host/riscv/host-signal.h|  5 +
  linux-user/include/host/s390/host-signal.h |  5 +
  linux-user/include/host/sparc/host-signal.h|  5 +
  linux-user/include/host/x86_64/host-signal.h   |  5 +
  linux-user/signal.c| 18 --
  12 files changed, 63 insertions(+), 10 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé
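
The idea of the patch is to route every access to the saved signal mask through a small per-host accessor, so that a host whose signal frame keeps the mask somewhere other than uc_sigmask (the sparc64 case this prepares for) only needs its own host-signal.h to change. A sketch of the generic-host version plus one caller; the exact signature and return type used in the patch may differ, and block_in_context() is hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <ucontext.h>

/*
 * Generic-host accessor (sketch): callers never name uc_sigmask directly,
 * so a host with a different frame layout can override just this helper.
 */
static inline sigset_t *host_signal_mask(ucontext_t *uc)
{
    return &uc->uc_sigmask;
}

/* Hypothetical caller: block a signal in the context to be restored. */
static void block_in_context(ucontext_t *uc, int sig)
{
    sigaddset(host_signal_mask(uc), sig);
}
```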

[PATCH 6/6] tests: Remove check-block.sh

2022-02-08 Thread Thomas Huth

Now that the iotests are added by the meson.build file already,
we do not need the check-block.sh wrapper script anymore.

Signed-off-by: Thomas Huth 
---
 tests/check-block.sh | 73 
 1 file changed, 73 deletions(-)
 delete mode 100755 tests/check-block.sh

diff --git a/tests/check-block.sh b/tests/check-block.sh
deleted file mode 100755
index af0c574812..00
--- a/tests/check-block.sh
+++ /dev/null
@@ -1,73 +0,0 @@
-#!/bin/sh
-
-if [ "$#" -eq 0 ]; then
-echo "Usage: $0 fmt..." >&2
-exit 99
-fi
-
-# Honor the SPEED environment variable, just like we do it for "meson test"
-format_list="$@"
-if [ "$SPEED" = "slow" ] || [ "$SPEED" = "thorough" ]; then
-group=
-else
-group="-g auto"
-fi
-
-skip() {
-echo "1..0 #SKIP $*"
-exit 0
-}
-
-if grep -q "CONFIG_GPROF=y" config-host.mak 2>/dev/null ; then
-skip "GPROF is enabled ==> Not running the qemu-iotests."
-fi
-
-# Disable tests with any sanitizer except for specific ones
-SANITIZE_FLAGS=$( grep "CFLAGS.*-fsanitize" config-host.mak 2>/dev/null )
-ALLOWED_SANITIZE_FLAGS="safe-stack cfi-icall"
-#Remove all occurrencies of allowed Sanitize flags
-for j in ${ALLOWED_SANITIZE_FLAGS}; do
-TMP_FLAGS=${SANITIZE_FLAGS}
-SANITIZE_FLAGS=""
-for i in ${TMP_FLAGS}; do
-if ! echo ${i} | grep -q "${j}" 2>/dev/null; then
-SANITIZE_FLAGS="${SANITIZE_FLAGS} ${i}"
-fi
-done
-done
-if echo ${SANITIZE_FLAGS} | grep -q "\-fsanitize" 2>/dev/null; then
-# Have a sanitize flag that is not allowed, stop
-skip "Sanitizers are enabled ==> Not running the qemu-iotests."
-fi
-
-if [ -z "$(find . -name 'qemu-system-*' -print)" ]; then
-skip "No qemu-system binary available ==> Not running the qemu-iotests."
-fi
-
-if ! command -v bash >/dev/null 2>&1 ; then
-skip "bash not available ==> Not running the qemu-iotests."
-fi
-
-if LANG=C bash --version | grep -q 'GNU bash, version [123]' ; then
-skip "bash version too old ==> Not running the qemu-iotests."
-fi
-
-cd tests/qemu-iotests
-
-# QEMU_CHECK_BLOCK_AUTO is used to disable some unstable sub-tests
-export QEMU_CHECK_BLOCK_AUTO=1
-export PYTHONUTF8=1
-# If make was called with -jN we want to call ./check with -j N. Extract the
-# flag from MAKEFLAGS, so that if it absent (or MAKEFLAGS is not defined), JOBS
-# would be an empty line otherwise JOBS is prepared string of flag with value:
-# "-j N"
-# Note, that the following works even if make was called with "-j N" or even
-# "--jobs N", as all these variants becomes simply "-jN" in MAKEFLAGS variable.
-JOBS=$(echo "$MAKEFLAGS" | sed -n 's/\(^\|.* \)-j\([0-9]\+\)\( .*\|$\)/-j \2/p')
-
-ret=0
-for fmt in $format_list ; do
-${PYTHON} ./check $JOBS -tap -$fmt $group || ret=1
-done
-
-exit $ret
-- 
2.27.0

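The deleted script forwarded make's job count to ./check by pulling a "-jN" word out of MAKEFLAGS with sed. For reference, the same extraction in C; this is a sketch, the function name is hypothetical, and words are assumed to be space-separated as in MAKEFLAGS.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/*
 * Return N if MAKEFLAGS contains a word of the form "-jN", else 0.
 * Like the greedy sed expression, the last matching word wins.
 */
static long jobs_from_makeflags(const char *makeflags)
{
    long jobs = 0;
    const char *p = makeflags;

    if (!p) {
        return 0;                       /* MAKEFLAGS not defined */
    }
    while (*p) {
        while (*p == ' ') {
            p++;
        }
        const char *word = p;
        while (*p && *p != ' ') {
            p++;
        }
        size_t len = (size_t)(p - word);

        if (len > 2 && word[0] == '-' && word[1] == 'j') {
            size_t i = 2;
            while (i < len && isdigit((unsigned char)word[i])) {
                i++;
            }
            if (i == len) {             /* whole word is "-j<digits>" */
                jobs = strtol(word + 2, NULL, 10);
            }
        }
    }
    return jobs;
}
```
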
[PATCH 4/6] tests/qemu-iotests/meson.build: Call the 'check' script directly

2022-02-08 Thread Thomas Huth

We can get a nicer progress indication if we add the iotests
individually via the 'check' script instead of going through
the check-block.sh wrapper.

For this, we have to add some of the sanity checks that have
originally been done in the tests/check-block.sh script (whether
"bash" is available or whether CFLAGS contain -fsanitize switches)
to the meson.build file now, and add the environment variables
that have been set up by the tests/check-block.sh script before.

Signed-off-by: Thomas Huth 
---
 tests/qemu-iotests/meson.build | 45 --
 1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/tests/qemu-iotests/meson.build b/tests/qemu-iotests/meson.build
index e1832c90e0..5a6ccd35d8 100644
--- a/tests/qemu-iotests/meson.build
+++ b/tests/qemu-iotests/meson.build
@@ -1,9 +1,29 @@
-if not have_tools or targetos == 'windows'
+if not have_tools or targetos == 'windows' or \
+   config_host.has_key('CONFIG_GPROF')
   subdir_done()
 endif
 
+bash = find_program('bash', required: false)
+if not bash.found() or \
+   run_command(bash, ['--version']).stdout().contains('GNU bash, version 3')
+  message('bash >= v4.0 not available ==> Disabled the qemu-iotests.')
+  subdir_done()
+endif
+
+foreach cflag: config_host['QEMU_CFLAGS'].split()
+  if cflag.startswith('-fsanitize') and \
+ not cflag.contains('safe-stack') and not cflag.contains('cfi-icall')
+message('Sanitizers are enabled ==> Disabled the qemu-iotests.')
+subdir_done()
+  endif
+endforeach
+
 qemu_iotests_binaries = [qemu_img, qemu_io, qemu_nbd, qsd]
-qemu_iotests_env = {'PYTHON': python.full_path()}
+qemu_iotests_env = {
+  'PYTHON': python.full_path(),
+  'PYTHONUTF8': '1',
+  'QEMU_CHECK_BLOCK_AUTO': '1'
+}
 qemu_iotests_formats = {
   'qcow2': 'quick',
   'raw': 'slow',
@@ -18,16 +38,25 @@ foreach k, v : emulators
   endif
 endforeach
 
+check_script = find_program(meson.current_build_dir() / 'check')
+iotests = run_command(python, [check_script.full_path(), '-g', 'auto', '-n'],
+  check: true).stdout().strip().replace('tests/', '').split('\n')
+
 foreach format, speed: qemu_iotests_formats
   if speed == 'quick'
 suites = 'block'
   else
 suites = ['block-' + speed, speed]
   endif
-  test('qemu-iotests ' + format, sh, args: [files('../check-block.sh'), format],
-   depends: qemu_iotests_binaries, env: qemu_iotests_env,
-   protocol: 'tap',
-   suite: suites,
-   timeout: 0,
-   is_parallel: false)
+  foreach tst: iotests
+test('iotest-' + format + '-' + tst,
+ python, args: [check_script.full_path(), '-tap', '-' + format, tst],
+ depends: qemu_iotests_binaries,
+ env: qemu_iotests_env + \
+  { 'TEST_DIR':
+meson.current_build_dir() / 'scratch' / format + '-' + tst },
+ protocol: 'tap',
+ suite: suites,
+ timeout: 0)
+  endforeach
 endforeach
-- 
2.27.0

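Patch 4/6 moves the sanitizer check from shell into meson: any CFLAGS word that starts with -fsanitize but mentions neither safe-stack nor cfi-icall disables the iotests. The same predicate as a C sketch; the function names are hypothetical, and words are split on single spaces here, whereas meson's split() splits on any whitespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if needle occurs within the first len bytes of word. */
static bool word_contains(const char *word, size_t len, const char *needle)
{
    size_t n = strlen(needle);

    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(word + i, needle, n) == 0) {
            return true;
        }
    }
    return false;
}

/*
 * Mirrors the meson.build loop: the iotests are skipped if any CFLAGS
 * word starts with "-fsanitize" and mentions neither allowed sanitizer.
 */
static bool sanitizers_disable_iotests(const char *cflags)
{
    static const char prefix[] = "-fsanitize";
    const size_t plen = sizeof(prefix) - 1;
    const char *p = cflags;

    while (*p) {
        while (*p == ' ') {
            p++;
        }
        const char *word = p;
        while (*p && *p != ' ') {
            p++;
        }
        size_t len = (size_t)(p - word);

        if (len >= plen && memcmp(word, prefix, plen) == 0 &&
            !word_contains(word, len, "safe-stack") &&
            !word_contains(word, len, "cfi-icall")) {
            return true;
        }
    }
    return false;
}
```
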
Re: [PATCH v6 02/33] include/block/block: split header into I/O and global state API

2022-02-08 Thread Emanuele Giuseppe Esposito




On 07/02/2022 17:53, Kevin Wolf wrote:
> Am 01.02.2022 um 11:30 hat Paolo Bonzini geschrieben:
>> On 2/1/22 10:45, Emanuele Giuseppe Esposito wrote:
>>>> That said, even if they are a different category, I think it makes sense
>>>> to leave them in the same header file as I/O functions, because I/O
>>>> functions are locked out between drained_begin and drained_end.
>>>
>>> Proposed category description:
>>> /*
>>>   * "Global OR I/O" API functions. These functions can run without
>>>   * the BQL, but only in one specific iothread/main loop.
>>>   *
>>>   * More specifically, these functions use BDRV_POLL_WHILE(bs), which
>>>   * requires the caller to be either in the main thread and hold
>>>   * the BlockdriverState (bs) AioContext lock, or directly in the
>>>   * home thread that runs the bs AioContext. Calling them from
>>>   * another thread in another AioContext would cause deadlocks.
>>>   *
>>>   * Therefore, these functions are not proper I/O, because they
>>>   * can't run in *any* iothreads, but only in a specific one.
>>>   */
>>>
>>> Functions that will surely go under this category:
>>>
>>> BDRV_POLL_WHILE
>>> bdrv_parent_drained_begin_single
>>> bdrv_parent_drained_end_single
>>> bdrv_drain_poll
>>> bdrv_drained_begin
>>> bdrv_do_drained_begin_quiesce
>>> bdrv_subtree_drained_begin
>>> bdrv_drained_end
>>> bdrv_drained_end_no_poll
>>> bdrv_subtree_drained_end
>>>
>>> (all generated_co_wrapper)
>>> bdrv_truncate
>>> bdrv_check
>>> bdrv_invalidate_cache
>>> bdrv_flush
>>> bdrv_pdiscard
>>> bdrv_readv_vmstate
>>> bdrv_writev_vmstate
>>>
>>>
>>> What I am not sure:
>>>
>>> * bdrv_drain_all_begin - bdrv_drain_all_end - bdrv_drain_all: these were
>>> classified as GS, because they are always called from the main loop.
>>> Should they go in this new category?
>>
>> 1) They look at the list of BDS's, and 2) you can't in general be sure that
>> all BDS's are in *your* AioContext if you call them from a specific
>> AioContext.
>>
>> So they should be GS.
> 
> I agree, calling drain_all functions can only work from the main thread,
> so they are GS.
> 
>>> * how should I interpret "all the callers of BDRV_POLL_WHILE"?
>>> Meaning, if I consider also the callers of the callers, we end up
>>> covering much much more functions. Should I only consider the direct
>>> callers (ie the above)?
>>
>> In general it is safe to make a function GS even if it is potentially "GS or
>> I/O", because that _reduces_ the number of places you can call it from.
>> It's likewise safe to make it I/O-only, but probably it makes less sense.
> 
> Basically, we have a hierarchy of categories where you can always call
> functions in other categories with less restrictions, but never the
> opposite direction.

Added in the respective category documentation:
> 1. Common functions
 * These functions must never call any function from the other categories
 * (I/O, "I/O or GS", Global State), only from this one, but can be invoked
 * by all of them.

> 2. I/O functions
 * These functions can only call functions from I/O and common categories,
 * but can be invoked by GS, "I/O or GS" and I/O APIs.

> 3. I/O or GS functions
 * These functions can call any function from this, the I/O and the common
 * categories, but must be invoked only by other "I/O or GS" and GS APIs.

> 4. GS functions
 * These functions can call any function from this and other categories
 * (I/O, "I/O or GS", common), but must be invoked only by other GS APIs.

Emanuele
> 
> So common functions must never call any of the other categories. Global
> state functions can call functions in any category. And "I/O or GS"
> functions like BDRV_POLL_WHILE() can be called by other "I/O or GS" or
> just GS functions, but if it's ever (directly or indirectly) called by
> an I/O or common function, that would be a bug.
> 
> Kevin
>
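
The four categories above form a strict hierarchy: a function may call into its own category or any less restricted one, never upward. Encoding that as a rank makes the rule mechanical. The sketch below is illustrative only; the enum and function names are hypothetical and not part of the block layer.

```c
#include <assert.h>
#include <stdbool.h>

/* The four block-layer API categories discussed in this thread. */
typedef enum {
    CAT_COMMON,     /* may only call common functions */
    CAT_IO,         /* may call I/O and common */
    CAT_IO_OR_GS,   /* e.g. BDRV_POLL_WHILE() users */
    CAT_GS,         /* global state: may call every category */
} ApiCategory;

/* Higher rank means more restricted callers and more permitted callees. */
static int category_rank(ApiCategory c)
{
    switch (c) {
    case CAT_COMMON:   return 0;
    case CAT_IO:       return 1;
    case CAT_IO_OR_GS: return 2;
    case CAT_GS:       return 3;
    }
    return -1;
}

/* A call is allowed only downward (or sideways) in the hierarchy. */
static bool may_call(ApiCategory caller, ApiCategory callee)
{
    return category_rank(callee) <= category_rank(caller);
}
```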

Re: [PATCH v6 5/8] tcg/sparc: Convert patch_reloc to return bool

2022-02-08 Thread Philippe Mathieu-Daudé via


On 8/2/22 08:17, Richard Henderson wrote:

Since 7ecd02a06f8, if patch_reloc fails we restart translation
with a smaller TB.  Sparc had its function signature changed,


"SPARC"?


but not the logic.  Replace assert with return false.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
  tcg/sparc/tcg-target.c.inc | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé
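
Returning false rather than asserting lets the translator restart with a smaller TB when a relocation does not fit, as described above. A sketch of that contract for a word-scaled, PC-relative SPARC branch displacement (19 bits for BPcc, 22 for Bicc); patch_disp() and fits_signed() are hypothetical helpers, not the functions in tcg/sparc/tcg-target.c.inc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if val fits in a signed two's-complement field of 'bits' bits. */
static bool fits_signed(int64_t val, unsigned bits)
{
    int64_t min = -((int64_t)1 << (bits - 1));
    int64_t max = ((int64_t)1 << (bits - 1)) - 1;

    return val >= min && val <= max;
}

/*
 * Patch a word-scaled PC-relative displacement into the low 'bits' bits
 * of *insn. Returns false, leaving *insn untouched, if it cannot be encoded.
 */
static bool patch_disp(uint32_t *insn, intptr_t insn_addr, intptr_t target,
                       unsigned bits)
{
    intptr_t byte_disp = target - insn_addr;
    uint32_t mask;
    int64_t disp;

    assert(bits > 0 && bits < 32);
    mask = (UINT32_C(1) << bits) - 1;

    if (byte_disp % 4 != 0) {
        return false;               /* branch targets are word aligned */
    }
    disp = (int64_t)byte_disp / 4;
    if (!fits_signed(disp, bits)) {
        return false;               /* caller retries with a smaller TB */
    }
    *insn = (*insn & ~mask) | ((uint32_t)disp & mask);
    return true;
}
```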

Re: [PATCH v2 1/2] hw/misc: Supporting AST2600 HACE accumulative mode

2022-02-08 Thread Joel Stanley

Hello Troy,

On Wed, 12 Jan 2022 at 08:10, Troy Lee  wrote:
>
> Accumulative mode supplies an initial state and appends padding bits at
> the end of the hash stream.  However, the crypto library adds that
> padding automatically, so strip it from the iov array.
>
> The Aspeed AST2600 accumulative mode is described in datasheet
> ast2600v10.pdf section 25.6.4:
>  1. Allocating and initializing the accumulative hash digest write
> buffer with the initial state.
> * Since the QEMU crypto/hash API doesn't provide a way to set the
>   initial state of the hash library, and the initial state is already
>   set by the crypto library (gcrypt/glib/...), skip this step.
>  2. Calculating the accumulative hash digest.
> (a) When receiving the last accumulative data, software needs to add
> a padding message at the end of the accumulative data. The padding
> message is described in the specifications of MD5, SHA-1, SHA224,
> SHA256, SHA512, SHA512/224, SHA512/256.
> * The crypto library (gcrypt/glib) already adds the padding
>   message internally.
> * This patch removes the padding message fed by the guest
>   machine driver.


I tested the latest aspeed SDK u-boot, loaded from mmc (with our mmc
model that lives in Cedric's tree) and qemu crashed:

#0  0x7fe867d44932 in ?? () from /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#1  0x557aba2b6e22 in qcrypto_glib_hash_bytesv (alg=, iov=0x7fe8662ee0b0, niov=1, result=0x7fe8662ee0a8,
resultlen=0x7fe8662ee0a0, errp=0x0) at ../crypto/hash-glib.c:68
#2  0x557ab9f549ea in do_hash_operation (s=s@entry=0x7fe866e1b3b0,
algo=5, sg_mode=sg_mode@entry=true, acc_mode=acc_mode@entry=true) at
../hw/misc/aspeed_hace.c:161
#3  0x557ab9f54dd1 in aspeed_hace_write (opaque=,
addr=12, data=262504, size=) at
../hw/misc/aspeed_hace.c:260

Without your patch applied, the HACE operation fails, as we do not have
support for accumulative mode, but we do not crash.

>
> Changes in v2:
> - Coding style
> - Add accumulative mode description in comment
>
> Signed-off-by: Troy Lee 
> ---
>  hw/misc/aspeed_hace.c | 43 ---
>  include/hw/misc/aspeed_hace.h |  1 +
>  2 files changed, 36 insertions(+), 8 deletions(-)
>
> diff --git a/hw/misc/aspeed_hace.c b/hw/misc/aspeed_hace.c
> index 10f00e65f4..0710f44621 100644
> --- a/hw/misc/aspeed_hace.c
> +++ b/hw/misc/aspeed_hace.c
> @@ -11,6 +11,7 @@
>  #include "qemu/osdep.h"
>  #include "qemu/log.h"
>  #include "qemu/error-report.h"
> +#include "qemu/bswap.h"
>  #include "hw/misc/aspeed_hace.h"
>  #include "qapi/error.h"
>  #include "migration/vmstate.h"
> @@ -27,6 +28,7 @@
>
>  #define R_HASH_SRC  (0x20 / 4)
>  #define R_HASH_DEST (0x24 / 4)
> +#define R_HASH_KEY_BUFF (0x28 / 4)
>  #define R_HASH_SRC_LEN  (0x2c / 4)
>
>  #define R_HASH_CMD  (0x30 / 4)
> @@ -94,7 +96,8 @@ static int hash_algo_lookup(uint32_t reg)
>  return -1;
>  }
>
> -static void do_hash_operation(AspeedHACEState *s, int algo, bool sg_mode)
> +static void do_hash_operation(AspeedHACEState *s, int algo, bool sg_mode,
> +  bool acc_mode)
>  {
>  struct iovec iov[ASPEED_HACE_MAX_SG];
>  g_autofree uint8_t *digest_buf;
> @@ -103,6 +106,7 @@ static void do_hash_operation(AspeedHACEState *s, int algo, bool sg_mode)
>
>  if (sg_mode) {
>  uint32_t len = 0;
> +uint32_t total_len = 0;
>
>  for (i = 0; !(len & SG_LIST_LEN_LAST); i++) {
>  uint32_t addr, src;
> @@ -123,10 +127,26 @@ static void do_hash_operation(AspeedHACEState *s, int algo, bool sg_mode)
>  MEMTXATTRS_UNSPECIFIED, NULL);
>  addr &= SG_LIST_ADDR_MASK;
>
> -iov[i].iov_len = len & SG_LIST_LEN_MASK;
> -plen = iov[i].iov_len;
> +plen = len & SG_LIST_LEN_MASK;
> +iov[i].iov_base = address_space_map(&s->dram_as, addr, &plen, false,
>  MEMTXATTRS_UNSPECIFIED);
> +
> +if (acc_mode) {
> +total_len += plen;
> +
> +if (len & SG_LIST_LEN_LAST) {
> +/*
> + * In the padding message, the last 64/128 bit represents
> + * the total length of bitstream in big endian.
> + * SHA-224, SHA-256 are 64 bit
> + * SHA-384, SHA-512, SHA-512/224, SHA-512/256 are 128 bit
> + * However, we would not process such a huge bit stream.
> + */
> +plen -= total_len - (ldq_be_p(iov[i].iov_base + plen - 8) / 8);
> +}
> +}
> +
> +iov[i].iov_len = plen;
>  }
>  } else {
>  hwaddr len = s->regs[R_HASH_SRC_LEN];
> @@ -210,6 +230,9 @@ static void aspeed_hace_write(void *opaque, hwaddr addr, uint64_t data,
>  case R_HASH_DEST:
>  data &= ahc->dest_mask;
>

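The accumulative-mode change relies on the padding layout: a final 0x80 byte, zero fill, and (for SHA-1/SHA-224/SHA-256) a 64-bit big-endian bit count in the last 8 bytes. Because that bit count comes from guest memory, it is worth bounds-checking before using it to shorten an iovec. The sketch below strips the padding from a single contiguous buffer; it ignores the scatter-gather case and the 128-bit length field of SHA-384/SHA-512, and its function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Big-endian 64-bit load, playing the role of QEMU's ldq_be_p(). */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * For a padded SHA-1/SHA-224/SHA-256 stream (64-byte blocks, 64-bit length
 * field), return the unpadded payload length in bytes, or -1 if the
 * guest-supplied length field is inconsistent with the buffer.
 */
static long strip_sha256_padding(const uint8_t *buf, size_t total_len)
{
    uint64_t bits, bytes;

    if (total_len < 64 || total_len % 64 != 0) {
        return -1;                  /* not a whole number of blocks */
    }
    bits = load_be64(buf + total_len - 8);
    if (bits % 8 != 0) {
        return -1;                  /* sketch handles whole bytes only */
    }
    bytes = bits / 8;
    /* Padding is 0x80, 0..63 zero bytes, then the 8-byte length: 9..72. */
    if (bytes > total_len - 9 || total_len - bytes > 72) {
        return -1;
    }
    if (buf[bytes] != 0x80) {
        return -1;                  /* padding must start with a 1 bit */
    }
    return (long)bytes;
}
```
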
[PATCH v2] hvf: arm: Handle unknown ID registers as RES0

2022-02-08 Thread Alexander Graf

Recent Linux versions added support to read ID_AA64ISAR2_EL1. On M1,
those reads trap into QEMU which handles them as faults.

However, AArch64 ID registers should always read as RES0. Let's
handle them accordingly.

This fixes booting Linux 5.17 guests.

Cc: qemu-sta...@nongnu.org
Reported-by: Ivan Babrou 
Signed-off-by: Alexander Graf 
---
 target/arm/hvf/hvf.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 92ad0d29c4..39c3e0d85f 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -729,6 +729,17 @@ static bool hvf_handle_psci_call(CPUState *cpu)
 return true;
 }
 
+static bool is_id_sysreg(uint32_t reg)
+{
+uint32_t op0 = (reg >> 20) & 0x3;
+uint32_t op1 = (reg >> 14) & 0x7;
+uint32_t crn = (reg >> 10) & 0xf;
+uint32_t crm = (reg >> 1) & 0xf;
+uint32_t op2 = (reg >> 7) & 0x7;
+
+return op0 == 3 && op1 == 0 && crn == 0 && crm >= 1 && crm < 8 && op2 < 8;
+}
+
 static int hvf_sysreg_read(CPUState *cpu, uint32_t reg, uint32_t rt)
 {
 ARMCPU *arm_cpu = ARM_CPU(cpu);
@@ -781,6 +792,11 @@ static int hvf_sysreg_read(CPUState *cpu, uint32_t reg, uint32_t rt)
 /* Dummy register */
 break;
 default:
+if (is_id_sysreg(reg)) {
+/* ID system registers read as RES0 */
+val = 0;
+break;
+}
 cpu_synchronize_state(cpu);
 trace_hvf_unhandled_sysreg_read(env->pc, reg,
 (reg >> 20) & 0x3,
-- 
2.32.0 (Apple Git-132)
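
Peter's suggestion and the v2 patch both rely on the architected ID register space: op0=3, op1=0, CRn=0, CRm=1..7, any op2, whose unallocated encodings read as zero. The sketch below assumes the SYSREG() packing used in hvf.c (Op0 at bit 20, Op2 at bit 17, Op1 at bit 14, CRn at bit 10, CRm at bit 1, following the ISS layout of a trapped MRS/MSR); since every op2 value qualifies, the predicate does not need to look at it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed packing of a trapped system-register access, as in hvf.c. */
#define SYSREG(op0, op1, crn, crm, op2) \
    (((uint32_t)(op0) << 20) | ((uint32_t)(op2) << 17) | \
     ((uint32_t)(op1) << 14) | ((uint32_t)(crn) << 10) | \
     ((uint32_t)(crm) << 1))

#define SYSREG_OP0(r) (((r) >> 20) & 0x3)
#define SYSREG_OP2(r) (((r) >> 17) & 0x7)
#define SYSREG_OP1(r) (((r) >> 14) & 0x7)
#define SYSREG_CRN(r) (((r) >> 10) & 0xf)
#define SYSREG_CRM(r) (((r) >> 1) & 0xf)

/* op0=3, op1=0, CRn=0, CRm=1..7 (any op2): the ID register space. */
static bool is_id_sysreg(uint32_t reg)
{
    return SYSREG_OP0(reg) == 3 && SYSREG_OP1(reg) == 0 &&
           SYSREG_CRN(reg) == 0 && SYSREG_CRM(reg) >= 1 &&
           SYSREG_CRM(reg) < 8;
}
```

With this, ID_AA64ISAR2_EL1 (SYSREG(3, 0, 0, 6, 2)) is covered, while MIDR_EL1 (CRm=0) and the PMU registers fall through to the normal handling.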

Re: [PATCH 08/11] mos6522: add "info via" HMP command for debugging

2022-02-08 Thread Dr. David Alan Gilbert

* Markus Armbruster (arm...@redhat.com) wrote:
> Philippe Mathieu-Daudé  writes:
> 
> > On 7/2/22 20:34, Peter Maydell wrote:
> >> On Thu, 27 Jan 2022 at 21:03, Mark Cave-Ayland
> >>  wrote:
> >>>
> >>> This displays detailed information about the device registers and timers to aid
> >>> debugging problems with timers and interrupts.
> >>>
> >>> Signed-off-by: Mark Cave-Ayland 
> >>> ---
> >>>   hmp-commands-info.hx | 12 ++
> >>>   hw/misc/mos6522.c| 92 
> >>>   2 files changed, 104 insertions(+)
> >> 
> >> I'm not sure how keen we are on adding new device-specific
> >> HMP info commands, but it's not my area of expertise. Markus ?
> >
> > HMP is David :)
> 
> Yes.

So let me start with an:

Acked-by: Dr. David Alan Gilbert 
(If it's useful info for the author of the device, then I'm happy for
HMP to have that), but then - (moving the reply around a bit):


> Should this be conditional on the targets where we actually link the
> device, like info skeys?
> 

Yes, I think so; it's a reasonably old/obscure device, so there's no reason
for everyone to have it built in.

> > IIRC it is OK as long as HMP is a QMP wrapper.
> 
> That's "how to do it", and I'll get back to it in a jiffie, but Peter
> was wondering about the "whether to do it".
> 
> Most HMP commands are always there.
> 
> We have a few specific to compile-time configurable features: TCG, VNC,
> Spice, Slirp, Linux.  Does not apply here.
> 
> We have a few specific to targets, such as dump-skeys and info skeys for
> s390.  Target-specific is not quite the same as device-specific.
> 
> We have no device-specific commands so far.  However, dump-skeys and
> info skeys appear to be about the skeys *device*, not the s390 target.
> Perhaps any s390 target has such a device?  I don't know.  My point is
> we already have device-specific commands, they're just masquerading as
> target-specific commands.

Yeh we've got info lapic/ioapic as well.

> The proposed device-specific command uses a mechanism originally made
> for modules instead (more on that below).
> 
> I think we should make up our minds which way we want device-specific
> commands done, then do *all* of them that way.

I think device specific commands make sense, but it would probably be
better if we had an 'info dev $name' and made that a method on the
device rather than registering each one separately.
I'd assume that this would be a QMP level thing that got unwrapped at
HMP.

But that's not a problem for this contribution; someone else can figure
that out later.

Dave


> 
> On to "how to do it", part 1.
> 
> Most of the time, the command handler is declared with the command in
> hmp-commands{,-info}.hx, possibly with compile-time conditionals.
> 
> But it can also be left null there, and set with monitor_register_hmp()
> or monitor_register_hmp_info_hrt().  This is intended for modules; see
> commit f0e48cbd791^..bca6eb34f03.
> 
> Aside: can modules be unloaded?  If yes, we'd better zap the handler
> then.
> 
> The proposed "info via" uses monitor_register_hmp_info_hrt().  No
> objection from me, requires David's ACK.
> 
> 
> "How to do it", part 2, in reply to Philippe's remark.
> 
> Ideally, HMP commands wrap around QMP commands, but we accept exceptions
> for certain use cases where the wrapping is more trouble than it's
> worth, with justification.  I've explained this several times, and I'm
> happy to dig up a reference or explain it again if there's a need.
> 
> Justifying an exception is bothersome, too.  Daniel Berrangé recently
> created a way to reduce the wrapping trouble (merge commit
> e86e00a2493).  The proposed "info via" makes use of it.
> 
> >> (patch below for context)
> >> thanks
> >> -- PMM
> >> 
> >>>
> >>> diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
> >>> index e90f20a107..4e714e79a2 100644
> >>> --- a/hmp-commands-info.hx
> >>> +++ b/hmp-commands-info.hx
> >>> @@ -879,3 +879,15 @@ SRST
> >>> ``info sgx``
> >>>   Show intel SGX information.
> >>>   ERST
> >>> +
> >>> +{
> >>> +.name   = "via",
> >>> +.args_type  = "",
> >>> +.params = "",
> >>> +.help   = "show guest 6522 VIA devices",
> >>> +},
> >>> +
> >>> +SRST
> >>> +  ``info via``
> >>> +Show guest 6522 VIA devices.
> >>> +ERST
> 
> [...]
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
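David's 'info dev $name' suggestion amounts to one generic command that looks up a per-device info method, instead of registering one HMP command per device. The following is a hypothetical sketch in plain C. The names `dev_info_fn`, `dev_info_table` and `info_dev`, and the `"mos6522"` key, are all made up. In QEMU such a hook would more naturally be a method on the device class than a static table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-device hook: write human-readable state into buf. */
typedef void (*dev_info_fn)(char *buf, size_t len);

struct dev_info_entry {
    const char *type_name;
    dev_info_fn info;
};

static void via_info(char *buf, size_t len)
{
    snprintf(buf, len, "6522 VIA: register and timer dump goes here");
}

static const struct dev_info_entry dev_info_table[] = {
    { "mos6522", via_info },
};

/* Returns 0 on success, -1 if the device type has no info method. */
static int info_dev(const char *type_name, char *buf, size_t len)
{
    size_t n = sizeof dev_info_table / sizeof dev_info_table[0];

    for (size_t i = 0; i < n; i++) {
        if (strcmp(dev_info_table[i].type_name, type_name) == 0) {
            dev_info_table[i].info(buf, len);
            return 0;
        }
    }
    return -1;
}
```

Adding a new device would then mean providing one callback, with no new HMP command or .hx entry.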

Re: [PATCH 3/5] linux-user: Introduce host_sigcontext

2022-02-08 Thread Peter Maydell

On Tue, 8 Feb 2022 at 08:57, Richard Henderson
 wrote:
>
> Do not directly access ucontext_t as the third signal parameter.
> This is preparation for a sparc64 fix.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [PATCH v6 13/33] include/block/blockjob.h: global state API

2022-02-08 Thread Emanuele Giuseppe Esposito




On 07/02/2022 18:26, Kevin Wolf wrote:
> Am 21.01.2022 um 18:05 hat Emanuele Giuseppe Esposito geschrieben:
>> blockjob functions always run under the BQL lock.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito 
>> Reviewed-by: Stefan Hajnoczi 
>> ---
>>  include/block/blockjob.h | 9 +
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
>> index 87fbb3985f..2373dfeb07 100644
>> --- a/include/block/blockjob.h
>> +++ b/include/block/blockjob.h
>> @@ -74,6 +74,13 @@ typedef struct BlockJob {
>>  GSList *nodes;
>>  } BlockJob;
>>  
>> +/*
>> + * Global state (GS) API. These functions run under the BQL lock.
>> + *
>> + * See include/block/block-global-state.h for more information about
>> + * the GS API.
>> + */
>> +
>>  /**
>>   * block_job_next:
>>   * @job: A block job, or %NULL.
>> @@ -155,6 +162,8 @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp);
>>   */
>>  void block_job_iostatus_reset(BlockJob *job);
>>  
>> +/* Common functions that are neither I/O nor Global State */
>> +
>>  /**
>>   * block_job_is_internal:
>>   * @job: The job to determine if it is user-visible or not.
> 
> It's a bit random to comment on it for this patch specifically, but I
> feel that the comments that separate different categories of interfaces
> in a single file are not very easy to visually register.
> 
> I don't think we're doing this anywhere yet, but I wonder if it wouldn't
> be helpful to use a comment style like this which gives more visibility
> to the start and end of sections:
> 
> /***
>  * Common functions that are neither I/O nor Global State
>  */
> 
> Not sure what checkpatch thinks about it either... ;-)

Checkpatch does not like that:

WARNING: Block comments use a leading /* on a separate line

+/**

But writing it like this would be OK:
+/*
+ **

I honestly don't find it very beautiful, but let me know if I should add it.

Emanuele

> 
> Kevin
>

Re: [PATCH 4/5] linux-user: Move sparc/host-signal.h to sparc64/host-signal.h

2022-02-08 Thread Philippe Mathieu-Daudé via


On 8/2/22 08:12, Richard Henderson wrote:

We do not support sparc32 as a host, so there's no point in
sparc64 redirecting to sparc.

Signed-off-by: Richard Henderson 
---
  linux-user/include/host/sparc/host-signal.h   | 71 ---
  linux-user/include/host/sparc64/host-signal.h | 64 -
  2 files changed, 63 insertions(+), 72 deletions(-)
  delete mode 100644 linux-user/include/host/sparc/host-signal.h


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2] migration/rdma: set the REUSEADDR option for destination

2022-02-08 Thread Dr. David Alan Gilbert

* Jack Wang (jinpu.w...@ionos.com) wrote:
> We hit the following error while testing the RDMA transport:
> in case of a migration error, the mgmt daemon picks one migration port,
> incoming rdma:[::]:8089: RDMA ERROR: Error: could not rdma_bind_addr
> 
> It then tries another, -incoming rdma:[::]:8103; sometimes that worked,
> sometimes it needed yet another try with other port numbers.
> 
> Set the REUSEADDR option for the destination. This allows the address
> to be reused and avoids the rdma_bind_addr error.
> 
> Signed-off-by: Jack Wang 

Reviewed-by: Dr. David Alan Gilbert 

> ---
> v2: extend commit message as discussed with Pankaj and David
> ---
>  migration/rdma.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index c7c7a384875b..663e1fbb096d 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2705,6 +2705,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>  char ip[40] = "unknown";
>  struct rdma_addrinfo *res, *e;
>  char port_str[16];
> +int reuse = 1;
>  
>  for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
>  rdma->wr_data[idx].control_len = 0;
> @@ -2740,6 +2741,12 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>  goto err_dest_init_bind_addr;
>  }
>  
> +ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
> +   &reuse, sizeof reuse);
> +if (ret) {
> +ERROR(errp, "Error: could not set REUSEADDR option");
> +goto err_dest_init_bind_addr;
> +}
>  for (e = res; e != NULL; e = e->ai_next) {
>  inet_ntop(e->ai_family,
>  &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof ip);
> -- 
> 2.25.1
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
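RDMA_OPTION_ID_REUSEADDR plays the same role for an rdma_cm listen ID that SO_REUSEADDR plays for a TCP socket. The patch sets it on the listen ID before the loop that binds it. Exercising librdmacm needs RDMA hardware, so the sketch below shows the analogous pattern with plain POSIX sockets instead. It is not the rdma_cm API, and the helper names are hypothetical. For sockets, SO_REUSEADDR has to be set before bind() to take effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns a TCP socket with SO_REUSEADDR enabled, or -1 on error. */
static int make_reusable_socket(void)
{
    int reuse = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }
    /* Must happen before bind(), just as the patch sets it before binding. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof reuse) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Reads the option back to confirm it is enabled. */
static bool reuseaddr_enabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;

    if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, &len) != 0) {
        return false;
    }
    return val != 0;
}
```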

Re: [PATCH v6 1/8] tcg/sparc: Use tcg_out_movi_imm13 in tcg_out_addsub2_i64

2022-02-08 Thread Richard Henderson


On 2/8/22 21:40, Peter Maydell wrote:

On Tue, 8 Feb 2022 at 07:17, Richard Henderson
 wrote:


When BH is constant, it is constrained to 10 bits for use in MOVCC.


Where does this happen? I assumed it was going to be done
by the constraint encodings, but tcg_out_addsub2_i64()
is called for the add2_i64 and sub2_i64 ops, which get

 return C_O2_I4(r, r, rZ, rZ, rJ, rJ);
and constraint J is
CONST('J', TCG_CT_CONST_S13).
(and indeed there is no "constrain to 10 bits" letter).


Typo/thinko with 10 bit vs 11 bit:

CONST('I', TCG_CT_CONST_S11)

But there are different constraints for add2_i32 and add2_i64:

case INDEX_op_add2_i32:
case INDEX_op_sub2_i32:
return C_O2_I4(r, r, rZ, rZ, rJ, rJ);
...
case INDEX_op_add2_i64:
case INDEX_op_sub2_i64:
return C_O2_I4(R, R, RZ, RZ, RJ, RI);


r~
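In other words, BH for add2_i64/sub2_i64 (the last input, RI) is limited to a signed 11-bit immediate, 'I' / TCG_CT_CONST_S11, which matches the simm11 field of the SPARC V9 MOVcc instruction. The other constant inputs use the signed 13-bit 'J'. A minimal sketch of the two ranges follows. `fits_simm()` is a hypothetical helper, not the backend's own check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if val fits a signed two's-complement field of `bits` bits (1..63). */
static bool fits_simm(int64_t val, unsigned bits)
{
    int64_t min = -((int64_t)1 << (bits - 1));
    int64_t max = ((int64_t)1 << (bits - 1)) - 1;

    return val >= min && val <= max;
}
```

So an RI operand accepts -1024..1023, while RJ accepts -4096..4095.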

Re: [PATCH v5 09/11] 9p: darwin: Implement compatibility for mknodat

2022-02-08 Thread Philippe Mathieu-Daudé via


On 7/2/22 23:40, Will Cohen wrote:

From: Keno Fischer 

Darwin does not support mknodat. However, to avoid race conditions
with later setting the permissions, we must avoid using mknod on
the full path instead. We could try to fchdir, but that would cause
problems if multiple threads try to call mknodat at the same time.
However, luckily there is a solution: Darwin includes a function
that sets the cwd for the current thread only.
This should suffice to use mknod safely.

This function (pthread_fchdir_np) is protected by a check in
meson in a patch later in this series.

Signed-off-by: Keno Fischer 
Signed-off-by: Michael Roitzsch 
[Will Cohen: - Adjust coding style
  - Replace clang references with gcc
  - Note radar filed with Apple for missing syscall
  - Replace direct syscall with pthread_fchdir_np and
adjust patch notes accordingly
  - Move qemu_mknodat from 9p-util to osdep and os-posix]
Signed-off-by: Will Cohen 
---
  hw/9pfs/9p-local.c   |  4 ++--
  include/qemu/osdep.h | 10 ++
  os-posix.c   | 34 ++
  3 files changed, 46 insertions(+), 2 deletions(-)



diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index d1660d67fa..f3a8367ece 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -810,3 +810,13 @@ static inline int platform_does_not_support_system(const char *command)
  #endif
  
  #endif

+
+/*
+ * As long as mknodat is not available on macOS, this workaround
+ * using pthread_fchdir_np is needed. qemu_mknodat is defined in
+ * os-posix.c
+ */
+#ifdef CONFIG_DARWIN
+int pthread_fchdir_np(int fd);
+#endif
+int qemu_mknodat(int dirfd, const char *filename, mode_t mode, dev_t dev);


Misplaced. You want the declaration before the __cplusplus guard.

Re: [PATCH 3/5] linux-user: Introduce host_sigcontext

2022-02-08 Thread Philippe Mathieu-Daudé via


On 8/2/22 08:12, Richard Henderson wrote:

Do not directly access ucontext_t as the third signal parameter.
This is preparation for a sparc64 fix.

Signed-off-by: Richard Henderson 
---
  linux-user/include/host/aarch64/host-signal.h | 13 -
  linux-user/include/host/alpha/host-signal.h   | 11 +++
  linux-user/include/host/arm/host-signal.h | 11 +++
  linux-user/include/host/i386/host-signal.h| 11 +++
  linux-user/include/host/loongarch64/host-signal.h | 11 +++
  linux-user/include/host/mips/host-signal.h| 11 +++
  linux-user/include/host/ppc/host-signal.h | 11 +++
  linux-user/include/host/riscv/host-signal.h   | 11 +++
  linux-user/include/host/s390/host-signal.h| 11 +++
  linux-user/include/host/sparc/host-signal.h   | 11 +++
  linux-user/include/host/x86_64/host-signal.h  | 11 +++
  linux-user/signal.c   |  4 ++--
  12 files changed, 80 insertions(+), 47 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 5/5] linux-user/include/host/sparc64: Fix host_sigcontext

2022-02-08 Thread Peter Maydell

On Tue, 8 Feb 2022 at 07:46, Richard Henderson
 wrote:
>
> Sparc64 is unique on linux in *not* passing ucontext_t as
> the third argument to a SA_SIGINFO handler.  It passes the
> old struct sigcontext instead.t log

Stray bit of text at the end of the commit message here.

>
> Fixes: 8b5bd461935b ("linux-user/host/sparc: Populate host_signal.h")
> Signed-off-by: Richard Henderson 
> ---
>  linux-user/include/host/sparc64/host-signal.h | 17 +
>  1 file changed, 9 insertions(+), 8 deletions(-)
>

Otherwise


Reviewed-by: Peter Maydell 

thanks
-- PMM
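With host_sigcontext, handler code names the third SA_SIGINFO argument only through one per-host alias. That way sparc64 can map it to the old `struct sigcontext` while other hosts keep `ucontext_t`. A minimal sketch for a non-sparc64 Linux host follows. `host_sigcontext` follows the patch's name. The handler and `install_and_raise()` are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <ucontext.h>

/* Assumption: on this (non-sparc64) host the third argument is a ucontext_t *. */
typedef ucontext_t host_sigcontext;

static volatile sig_atomic_t handler_saw_context;

static void handler(int sig, siginfo_t *info, void *puc)
{
    host_sigcontext *uc = puc;   /* the only place the host type is named */

    (void)sig;
    (void)info;
    handler_saw_context = (uc != NULL);
}

/* Install the handler for SIGUSR1, raise it, report what the handler saw. */
static bool install_and_raise(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0) {
        return false;
    }
    raise(SIGUSR1);   /* delivered to the calling thread before raise returns */
    return handler_saw_context;
}
```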

Re: [PATCH 20/27] qga/vss-win32: use widl if available

2022-02-08 Thread Konstantin Kostiuk

Signed-off-by: Konstantin Kostiuk 

On Thu, Feb 3, 2022 at 8:08 PM Paolo Bonzini  wrote:

> From: Marc-André Lureau 
>
> widl from mingw64-tools and wine can compile a TLB file.
>
> Signed-off-by: Marc-André Lureau 
> Signed-off-by: Paolo Bonzini 
> ---
>  qga/vss-win32/meson.build | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/qga/vss-win32/meson.build b/qga/vss-win32/meson.build
> index 78bdf5e74a..8f3aff5fe3 100644
> --- a/qga/vss-win32/meson.build
> +++ b/qga/vss-win32/meson.build
> @@ -18,15 +18,18 @@ if add_languages('cpp', required: false)
>all_qga += qga_vss
>  endif
>
> -# rules to build qga-vss.tlb
> -# Currently, only native build is supported because building .tlb
> -# (TypeLibrary) from .idl requires WindowsSDK and MIDL (and cl.exe in VC++).
>  midl = find_program('midl', required: false)
> +widl = find_program('widl', required: false)
>  if midl.found()
>gen_tlb = custom_target('gen-tlb',
>input: 'qga-vss.idl',
>output: 'qga-vss.tlb',
>command: [midl, '@INPUT@', '/tlb', '@OUTPUT@'])
> +elif widl.found()
> +  gen_tlb = custom_target('gen-tlb',
> +  input: 'qga-vss.idl',
> +  output: 'qga-vss.tlb',
> +  command: [widl, '-t', '@INPUT@', '-o', '@OUTPUT@'])
>  else
>gen_tlb = custom_target('gen-tlb',
>input: 'qga-vss.tlb',
> --
> 2.34.1
>
>
>
>

Re: [PATCH v6 1/8] tcg/sparc: Use tcg_out_movi_imm13 in tcg_out_addsub2_i64

2022-02-08 Thread Peter Maydell

On Tue, 8 Feb 2022 at 07:17, Richard Henderson
 wrote:
>
> When BH is constant, it is constrained to 10 bits for use in MOVCC.

Where does this happen? I assumed it was going to be done
by the constraint encodings, but tcg_out_addsub2_i64()
is called for the add2_i64 and sub2_i64 ops, which get

return C_O2_I4(r, r, rZ, rZ, rJ, rJ);
and constraint J is
CONST('J', TCG_CT_CONST_S13).
(and indeed there is no "constrain to 10 bits" letter).

thanks
-- PMM

Re: [PATCH v2] migration/rdma: set the REUSEADDR option for destination

2022-02-08 Thread Pankaj Gupta

> We hit the following error while testing the RDMA transport:
> in case of a migration error, the mgmt daemon picks one migration port,
> incoming rdma:[::]:8089: RDMA ERROR: Error: could not rdma_bind_addr
>
> It then tries another, -incoming rdma:[::]:8103; sometimes that worked,
> sometimes it needed yet another try with other port numbers.
>
> Set the REUSEADDR option for the destination. This allows the address
> to be reused and avoids the rdma_bind_addr error.
>
> Signed-off-by: Jack Wang 
> ---
> v2: extend commit message as discussed with Pankaj and David
> ---
>  migration/rdma.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index c7c7a384875b..663e1fbb096d 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2705,6 +2705,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>  char ip[40] = "unknown";
>  struct rdma_addrinfo *res, *e;
>  char port_str[16];
> +int reuse = 1;
>
>  for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
>  rdma->wr_data[idx].control_len = 0;
> @@ -2740,6 +2741,12 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>  goto err_dest_init_bind_addr;
>  }
>
> +ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
> + &reuse, sizeof reuse);
> +if (ret) {
> +ERROR(errp, "Error: could not set REUSEADDR option");
> +goto err_dest_init_bind_addr;
> +}
>  for (e = res; e != NULL; e = e->ai_next) {
>  inet_ntop(e->ai_family,
>  &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof ip);

Reviewed-by: Pankaj Gupta

Re: [PATCH v2] memory: Fix qemu crash on starting dirty log twice with stopped VM

2022-02-08 Thread Paolo Bonzini


On 2/7/22 13:30, Peter Xu wrote:

QEMU can now easily crash when two migrations are carried out in a row:

(qemu) migrate -d exec:cat>out
(qemu) migrate_cancel
(qemu) migrate -d exec:cat>out
[crash] ../softmmu/memory.c:2782: memory_global_dirty_log_start: Assertion `!(global_dirty_tracking & flags)' failed.

This is because the memory API provides a way to postpone a dirty log stop if
the VM is stopped; the stop is then redone at the next VM start.  It was added
in 2017 with commit 1931076077 ("migration: optimize the downtime", 2017-08-01).

However, the recent work on turning dirty tracking into a bitmask broke it,
namely commit 63b41db4bc ("memory: make global_dirty_tracking a bitmask",
2021-11-01).

The fix proposed in this patch contains two things:

   (1) Instead of passing the flags through to the postponed dirty tracking
   stop, we add a global variable (alongside the existing vmstate_change
   variable) to record which flags to stop dirty tracking for.

   (2) When starting dirty tracking, instead of removing the vmstate hook
   directly, we also execute the postponed stop process so that all starts
   and stops are guaranteed to be paired.

This procedure was overlooked in the 2021 bitmask-ify work.

Cc: Hyman Huang 
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044818
Fixes: 63b41db4bc ("memory: make global_dirty_tracking a bitmask")
Signed-off-by: Peter Xu 
---
  softmmu/memory.c | 61 +++-
  1 file changed, 45 insertions(+), 16 deletions(-)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index 678dc62f06..8060c6de78 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2790,19 +2790,32 @@ void memory_global_after_dirty_log_sync(void)
  MEMORY_LISTENER_CALL_GLOBAL(log_global_after_sync, Forward);
  }
  
+/*
+ * Dirty track stop flags that are postponed due to VM being stopped.  Should
+ * only be used within vmstate_change hook.
+ */
+static unsigned int postponed_stop_flags;
  static VMChangeStateEntry *vmstate_change;
+static void memory_global_dirty_log_stop_postponed_run(void);
  
  void memory_global_dirty_log_start(unsigned int flags)
  {
-unsigned int old_flags = global_dirty_tracking;
+unsigned int old_flags;
+
+assert(flags && !(flags & (~GLOBAL_DIRTY_MASK)));
  
  if (vmstate_change) {
-qemu_del_vm_change_state_handler(vmstate_change);
-vmstate_change = NULL;
+/* If there is postponed stop(), operate on it first */
+postponed_stop_flags &= ~flags;
+memory_global_dirty_log_stop_postponed_run();
  }
  
-assert(flags && !(flags & (~GLOBAL_DIRTY_MASK)));
-assert(!(global_dirty_tracking & flags));
+flags &= ~global_dirty_tracking;
+if (!flags) {
+return;
+}
+
+old_flags = global_dirty_tracking;
  global_dirty_tracking |= flags;
  trace_global_dirty_changed(global_dirty_tracking);
  
@@ -2830,29 +2843,45 @@ static void memory_global_dirty_log_do_stop(unsigned int flags)
  }
  }
  
+/*
+ * Execute the postponed dirty log stop operations if there is, then reset
+ * everything (including the flags and the vmstate change hook).
+ */
+static void memory_global_dirty_log_stop_postponed_run(void)
+{
+/* This must be called with the vmstate handler registered */
+assert(vmstate_change);
+
+/* Note: postponed_stop_flags can be cleared in log start routine */
+if (postponed_stop_flags) {
+memory_global_dirty_log_do_stop(postponed_stop_flags);
+postponed_stop_flags = 0;
+}
+
+qemu_del_vm_change_state_handler(vmstate_change);
+vmstate_change = NULL;
+}
+
  static void memory_vm_change_state_handler(void *opaque, bool running,
 RunState state)
  {
-unsigned int flags = (unsigned int)(uintptr_t)opaque;
  if (running) {
-memory_global_dirty_log_do_stop(flags);
-
-if (vmstate_change) {
-qemu_del_vm_change_state_handler(vmstate_change);
-vmstate_change = NULL;
-}
+memory_global_dirty_log_stop_postponed_run();
  }
  }
  
  void memory_global_dirty_log_stop(unsigned int flags)
  {
  if (!runstate_is_running()) {
+/* Postpone the dirty log stop, e.g., to when VM starts again */
  if (vmstate_change) {
-return;
+/* Batch with previous postponed flags */
+postponed_stop_flags |= flags;
+} else {
+postponed_stop_flags = flags;
+vmstate_change = qemu_add_vm_change_state_handler(
+memory_vm_change_state_handler, NULL);
  }
-vmstate_change = qemu_add_vm_change_state_handler(
-memory_vm_change_state_handler,
-(void *)(uintptr_t)flags);
  return;
  }
  


Queued, thanks.

Paolo
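The start/stop pairing that the fix restores can be checked with a small standalone model. This is a simplified sketch, not the QEMU code. The VM change-state hook is reduced to a boolean, and the MemoryListener calls are dropped. All names below are hypothetical stand-ins for global_dirty_tracking, postponed_stop_flags and vmstate_change.

```c
#include <assert.h>
#include <stdbool.h>

#define DIRTY_MIGRATION   (1u << 0)
#define DIRTY_DIRTY_RATE  (1u << 1)
#define DIRTY_MASK        (DIRTY_MIGRATION | DIRTY_DIRTY_RATE)

static unsigned int tracking;        /* models global_dirty_tracking */
static unsigned int postponed_stop;  /* models postponed_stop_flags */
static bool hook_registered;         /* models vmstate_change != NULL */
static bool vm_running;

static void do_stop(unsigned int flags)
{
    assert((tracking & flags) == flags);
    tracking &= ~flags;
}

/* Run any postponed stop, then drop the hook (always paired). */
static void postponed_run(void)
{
    assert(hook_registered);
    if (postponed_stop) {
        do_stop(postponed_stop);
        postponed_stop = 0;
    }
    hook_registered = false;
}

static void log_start(unsigned int flags)
{
    assert(flags && !(flags & ~DIRTY_MASK));
    if (hook_registered) {
        postponed_stop &= ~flags;   /* a start cancels a pending stop */
        postponed_run();
    }
    flags &= ~tracking;             /* already tracking these: no-op */
    if (!flags) {
        return;
    }
    tracking |= flags;
}

static void log_stop(unsigned int flags)
{
    if (!vm_running) {
        postponed_stop |= flags;    /* batch with earlier postponed stops */
        hook_registered = true;
        return;
    }
    do_stop(flags);
}

static void vm_start(void)
{
    vm_running = true;
    if (hook_registered) {
        postponed_run();
    }
}
```

With the pre-fix behaviour, the second `log_start()` in the test below would have hit the `!(tracking & flags)` assertion. In the model, the pending stop is cancelled instead, and tracking simply continues.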

Re: [PATCH v6 2/8] tcg/sparc: Split out tcg_out_movi_imm32

2022-02-08 Thread Peter Maydell

On Tue, 8 Feb 2022 at 07:17, Richard Henderson
 wrote:
>
> Handle 32-bit constants with a separate function, so that
> tcg_out_movi_int does not need to recurse.  This slightly
> rearranges the order of tests for small constants, but
> produces the same output.
>
> Signed-off-by: Richard Henderson 
> ---

Dropping the recursion makes this function definitely
easier to reason about.

Reviewed-by: Peter Maydell 

thanks
-- PMM
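For reference, a 32-bit constant that does not fit a signed 13-bit immediate is usually built on SPARC with SETHI followed by OR. SETHI sets bits 31..10 and clears bits 9..0, and the OR then supplies the low 10 bits. The sketch below models that basic split in plain C. It is not the backend's emitter, and it ignores the other cases tcg_out_movi_imm32 may handle. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sethi_or {
    uint32_t imm22;   /* SETHI operand: bits 31..10 of the constant */
    uint32_t low10;   /* OR operand: bits 9..0 of the constant */
};

/* A single "or %g0, simm13, rd" suffices when the value fits simm13. */
static bool fits_simm13(int32_t value)
{
    return value >= -4096 && value <= 4095;
}

static struct sethi_or split_imm32(uint32_t value)
{
    struct sethi_or s = { value >> 10, value & 0x3ff };
    return s;
}

/* What the SETHI + OR pair leaves in the destination register. */
static uint32_t materialize(struct sethi_or s)
{
    return (s.imm22 << 10) | s.low10;
}
```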

Re: [PATCH 23/27] meson: do not make qga/vss-win32/meson.build conditional on C++ presence

2022-02-08 Thread Konstantin Kostiuk

Reviewed-by: Konstantin Kostiuk 

On Tue, Feb 8, 2022 at 1:14 PM Konstantin Kostiuk 
wrote:

> Signed-off-by: Konstantin Kostiuk 
>
> On Thu, Feb 3, 2022 at 8:14 PM Paolo Bonzini  wrote:
>
>> From: Marc-André Lureau 
>>
>> C++ presence is checked by the qga/ directory, so it can be assumed
>> when building VSS module.
>>
>> Signed-off-by: Marc-André Lureau 
>> Signed-off-by: Paolo Bonzini 
>> ---
>>  qga/vss-win32/meson.build | 41 +++
>>  1 file changed, 24 insertions(+), 17 deletions(-)
>>
>> diff --git a/qga/vss-win32/meson.build b/qga/vss-win32/meson.build
>> index 8f3aff5fe3..8d4c5708d8 100644
>> --- a/qga/vss-win32/meson.build
>> +++ b/qga/vss-win32/meson.build
>> @@ -1,22 +1,29 @@
>> -if add_languages('cpp', required: false)
>> -  glib_dynamic = dependency('glib-2.0', static: false)
>> -  link_args = cc.get_supported_link_arguments(['-fstack-protector-all', '-fstack-protector-strong',
>> -   '-Wl,--add-stdcall-alias', '-Wl,--enable-stdcall-fixup'])
>> +glib_dynamic = dependency('glib-2.0', static: false)
>> +link_args = cc.get_supported_link_arguments([
>> +  '-fstack-protector-all',
>> +  '-fstack-protector-strong',
>> +  '-Wl,--add-stdcall-alias',
>> +  '-Wl,--enable-stdcall-fixup'
>> +])
>>
>> -  qga_vss = shared_module('qga-vss', ['requester.cpp', 'provider.cpp', 'install.cpp'],
>> -name_prefix: '',
>> -cpp_args: ['-Wno-unknown-pragmas', '-Wno-delete-non-virtual-dtor', '-Wno-non-virtual-dtor'],
>> -link_args: link_args,
>> -vs_module_defs: 'qga-vss.def',
>> -dependencies: [glib_dynamic, socket,
>> -   cc.find_library('ole32'),
>> -   cc.find_library('oleaut32'),
>> -   cc.find_library('shlwapi'),
>> -   cc.find_library('uuid'),
>> -   cc.find_library('intl')])
>> +qga_vss = shared_module(
>> +  'qga-vss',
>> +  ['requester.cpp', 'provider.cpp', 'install.cpp'],
>> +  name_prefix: '',
>> +  cpp_args: ['-Wno-unknown-pragmas', '-Wno-delete-non-virtual-dtor', '-Wno-non-virtual-dtor'],
>> +  link_args: link_args,
>> +  vs_module_defs: 'qga-vss.def',
>> +  dependencies: [
>> +glib_dynamic, socket,
>> +cc.find_library('ole32'),
>> +cc.find_library('oleaut32'),
>> +cc.find_library('shlwapi'),
>> +cc.find_library('uuid'),
>> +cc.find_library('intl')
>> +  ]
>> +)
>>
>> -  all_qga += qga_vss
>> -endif
>> +all_qga += qga_vss
>>
>>  midl = find_program('midl', required: false)
>>  widl = find_program('widl', required: false)
>> --
>> 2.34.1
>>
>>
>>
>>

Re: [PATCH v6 31/33] include/qemu/job.h: introduce job->pre_run() and use it in amend

2022-02-08 Thread Emanuele Giuseppe Esposito




On 07/02/2022 19:14, Kevin Wolf wrote:
> Am 21.01.2022 um 18:05 hat Emanuele Giuseppe Esposito geschrieben:
>> Introduce .pre_run() job callback. This cb will run in job_start,
>> before the coroutine is created and runs run() in the job aiocontext.
>>
>> Therefore, .pre_run() always runs in the main loop.
>> We can use this function together with clean() cb to replace
>> bdrv_child_refresh_perms in block_crypto_amend_options_generic_luks(),
>> since that function can also be called from an iothread via
>> .bdrv_co_amend().
> 
> How is this different from having the same code in the function that
> creates the job, i.e. qmp_x_blockdev_amend()?
> 
> Almost all block jobs have some setup code in the function that creates
> the job instead of doing everything in .run(), precisely because they
> know this code runs in the main thread.
> 
> Is amend really so different from the other block jobs in this respect
> that it needs a different solution?
> 

Are you suggesting to simply call .bdrv_amend_pre_run before job_start()
and just leave JobDriver .clean() to call .bdrv_amend_clean?

Yes, that will work too. I will delete .pre_run().

>> In addition, doing so we check for permissions in all bdrv
>> in amend, not only crypto.
>>
>> .pre_run() and .clean() take care of calling bdrv_amend_pre_run()
>> and bdrv_amend_clean() respectively, to set up driver-specific flags
>> and allow the crypto driver to temporarily provide the WRITE
>> perm to qcrypto_block_amend_options().
>>
>> .pre_run() is not yet invoked by job_start, but .clean() is.
>> This is not a problem, since it will just be a redundant check
>> and crypto will have the update->keys flag == false anyway.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito 
> 
> I find the way how you split the patches a bit confusing because the
> patches aren't self-contained, but always refer to what the code will do
> in the future, because after the patch it's dead code that isn't even
> theoretically called until the final patch comes in.
> 
> Can we restructure this a bit? First a patch that adds a new JobDriver
> callback (if really needed) along with the actual calls for it and
> everything else that needs to be touched in the generic job
> infrastructure. Second, new BlockDriver callbacks with all of the
> plumbing code. Third, the amend job changes with a patch that doesn't
> touch anything but block/amend.c and potentially block/crypto.c (the
> latter could also be another separate patch).

It is more or less what Hanna also suggested; I have it ready for the next
version.
> 
> This change with three or four patches could also be a candidate to be
> split out into a separate smaller series.

Makes sense.

Emanuele
> 
> Kevin
>

Re: [PATCH RFC 09/15] migration: Add postcopy_thread_create()

2022-02-08 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> On Thu, Feb 03, 2022 at 03:19:48PM +, Dr. David Alan Gilbert wrote:
> > * Peter Xu (pet...@redhat.com) wrote:
> > > Postcopy creates threads. A common pattern is that we init a sem and use
> > > it to sync with the thread.  Namely, we have fault_thread_sem and
> > > listen_thread_sem and they're only used for this.
> > > 
> > > Make it a shared infrastructure so it's easier to create yet another
> > > thread.
> > > 
> > 
> > It might be worth a note saying you now share that sem, so you can't
> > start two threads in parallel.
> 
> I'll squash this into the patch:

Thanks

> ---8<---
> diff --git a/migration/migration.h b/migration/migration.h
> index 845be3463c..2a311fd8d6 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -72,7 +72,10 @@ struct MigrationIncomingState {
>  /* A hook to allow cleanup at the end of incoming migration */
>  void *transport_data;
>  void (*transport_cleanup)(void *data);
> -/* Used to sync thread creations */
> +/*
> + * Used to sync thread creations.  Note that we can't create threads in
> + * parallel with this sem.
> + */
>  QemuSemaphore  thread_sync_sem;
>  /*
>   * Free at the start of the main state load, set as the main thread finishes
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 099d8ed478..1a3ba1db84 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -79,6 +79,10 @@ int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp)
>  &pnd);
>  }
>  
> +/*
> + * NOTE: this routine is not thread safe, we can't call it concurrently. But it
> + * should be good enough for migration's purposes.
> + */
>  void postcopy_thread_create(MigrationIncomingState *mis,
>  QemuThread *thread, const char *name,
>  void *(*fn)(void *), int joinable)
> ---8<---
> 
> > 
> > Reviewed-by: Dr. David Alan Gilbert 
> 
> Thanks,
> 
> -- 
> Peter Xu
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
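The pattern being consolidated is simple: create the thread, then block on a semaphore until the new thread posts it. The sketch below uses POSIX threads and semaphores in place of QemuThread and QemuSemaphore. The struct and function names are hypothetical. The single shared semaphore is what makes concurrent callers unsafe, which is the point of the added comments.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical stand-in for MigrationIncomingState. */
struct incoming_state {
    sem_t thread_sync_sem;   /* shared by every thread creation */
    int worker_started;
};

static void *worker(void *opaque)
{
    struct incoming_state *mis = opaque;

    mis->worker_started = 1;
    sem_post(&mis->thread_sync_sem);   /* tell the creator setup is done */
    /* ... the thread's real work would continue here ... */
    return NULL;
}

/*
 * Create a thread and wait until it signals readiness.  Not safe to call
 * concurrently: two creators waiting on the same semaphore could each be
 * woken by the other's thread.
 */
static bool thread_create_sync(struct incoming_state *mis, pthread_t *thread,
                               void *(*fn)(void *))
{
    if (pthread_create(thread, NULL, fn, mis) != 0) {
        return false;
    }
    while (sem_wait(&mis->thread_sync_sem) != 0) {
        if (errno != EINTR) {   /* retry only if a signal interrupted us */
            return false;
        }
    }
    return true;
}
```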

Re: [PATCH v12 4/5] softmmu/dirtylimit: implement virtual CPU throttle

2022-02-08 Thread Hyman Huang





On 2022/2/8 16:59, Peter Xu wrote:

On Mon, Jan 24, 2022 at 10:10:39PM +0800, huang...@chinatelecom.cn wrote:

From: Hyman Huang(黄勇) 

Set up a negative feedback system for when the vCPU thread
handles a KVM_EXIT_DIRTY_RING_FULL exit, by introducing a
throttle_us_per_full field in struct CPUState. Sleep
throttle_us_per_full microseconds to throttle the vCPU
if dirtylimit is enabled.

Start a thread to track current dirty page rates and
tune throttle_us_per_full dynamically until the current
dirty page rate reaches the quota.

Introduce the util functions in the header for the dirtylimit
implementation.

Signed-off-by: Hyman Huang(黄勇) 
---
  accel/kvm/kvm-all.c |  13 ++
  accel/stubs/kvm-stub.c  |   5 +
  include/hw/core/cpu.h   |   6 +
  include/sysemu/dirtylimit.h |  16 +++
  include/sysemu/kvm.h|   2 +
  softmmu/dirtylimit.c| 308 
  softmmu/trace-events|   8 ++
  7 files changed, 358 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 1a5f1d1..60f51fd 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -45,6 +45,7 @@
  #include "qemu/guest-random.h"
  #include "sysemu/hw_accel.h"
  #include "kvm-cpus.h"
+#include "sysemu/dirtylimit.h"
  
  #include "hw/boards.h"
  
@@ -476,6 +477,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
  cpu->kvm_state = s;
  cpu->vcpu_dirty = true;
  cpu->dirty_pages = 0;
+cpu->throttle_us_per_full = 0;
  
  mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
  if (mmap_size < 0) {
@@ -1469,6 +1471,11 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
   */
  sleep(1);
  
+/* keep sleeping in order not to interfere with the dirtylimit */
+if (dirtylimit_in_service()) {
+continue;
+}
+
  trace_kvm_dirty_ring_reaper("wakeup");
  r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
  
@@ -2312,6 +2319,11 @@ bool kvm_dirty_ring_enabled(void)
  return kvm_state->kvm_dirty_ring_size ? true : false;
  }
  
+uint32_t kvm_dirty_ring_size(void)
+{
+return kvm_state->kvm_dirty_ring_size;
+}


Please consider moving this into a small patch too along with the stub.

Ok



+
  static int kvm_init(MachineState *ms)
  {
  MachineClass *mc = MACHINE_GET_CLASS(ms);
@@ -2961,6 +2973,7 @@ int kvm_cpu_exec(CPUState *cpu)
  qemu_mutex_lock_iothread();
  kvm_dirty_ring_reap(kvm_state, cpu);
  qemu_mutex_unlock_iothread();
+dirtylimit_vcpu_execute(cpu);
  ret = 0;
  break;
  case KVM_EXIT_SYSTEM_EVENT:
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 5319573..1128cb2 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -152,4 +152,9 @@ bool kvm_dirty_ring_enabled(void)
  {
  return false;
  }
+
+uint32_t kvm_dirty_ring_size(void)
+{
+return 0;
+}
  #endif
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 76ab3b8..dbeb31a 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -411,6 +411,12 @@ struct CPUState {
   */
  bool throttle_thread_scheduled;
  
+/*
+ * Sleep throttle_us_per_full microseconds once dirty ring is full
+ * if dirty page rate limit is enabled.
+ */
+int64_t throttle_us_per_full;
+
  bool ignore_memory_transaction_failures;
  
  /* Used for user-only emulation of prctl(PR_SET_UNALIGN). */
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
index da459f0..37e634b 100644
--- a/include/sysemu/dirtylimit.h
+++ b/include/sysemu/dirtylimit.h
@@ -19,4 +19,20 @@ void vcpu_dirty_rate_stat_start(void);
  void vcpu_dirty_rate_stat_stop(void);
  void vcpu_dirty_rate_stat_initialize(void);
  void vcpu_dirty_rate_stat_finalize(void);
+
+void dirtylimit_state_lock(void);
+void dirtylimit_state_unlock(void);
+void dirtylimit_state_initialize(void);
+void dirtylimit_state_finalize(void);
+void dirtylimit_thread_finalize(void);
+bool dirtylimit_in_service(void);
+bool dirtylimit_vcpu_index_valid(int cpu_index);
+void dirtylimit_start(void);
+void dirtylimit_stop(void);
+void dirtylimit_set_vcpu(int cpu_index,
+ uint64_t quota,
+ bool enable);
+void dirtylimit_set_all(uint64_t quota,
+bool enable);
+void dirtylimit_vcpu_execute(CPUState *cpu);
  #endif
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 6eb39a0..bc3f0b5 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -563,4 +563,6 @@ bool kvm_cpu_check_are_resettable(void);
  bool kvm_arch_cpu_check_are_resettable(void);
  
  bool kvm_dirty_ring_enabled(void);
+
+uint32_t kvm_dirty_ring_size(void);
  #endif
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
index a10ac6f..cf20020 100644
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -18,6 +18,32 @@
  #include "sysemu/dirtylimit.h"
  #include "exec/memory.h"
  #include "hw/boards.h"
+#in

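The patch adds CPUState::throttle_us_per_full and exports kvm_dirty_ring_size(), but the part of softmmu/dirtylimit.c that computes the throttle value is cut off in this excerpt, and the excerpt does not say what unit the quota argument uses. As a back-of-the-envelope illustration only, here is a minimal sketch of how a ring size and a per-vCPU quota could be turned into a sleep per dirty-ring-full exit, assuming a simplified model in which each ring-full exit means the vCPU dirtied ring_size pages and the quota is in pages per second. The function name and the model are hypothetical; this is not the algorithm in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model, not QEMU code: every dirty-ring-full exit means the
 * vCPU has dirtied ring_size pages since the previous exit. To keep the
 * vCPU at or below quota_pages_per_sec, one "fill the ring + sleep" cycle
 * must last at least ring_size / quota seconds. fill_time_us is how long
 * the guest took to fill the ring this time. Returns the extra sleep in
 * microseconds, or 0 if the vCPU is already under its quota.
 */
static int64_t sketch_throttle_us_per_full(uint32_t ring_size,
                                           uint64_t quota_pages_per_sec,
                                           int64_t fill_time_us)
{
    assert(quota_pages_per_sec > 0);

    /* Minimum cycle length permitted by the quota, in microseconds. */
    int64_t min_cycle_us =
        (int64_t)((uint64_t)ring_size * 1000000ULL / quota_pages_per_sec);
    int64_t extra_us = min_cycle_us - fill_time_us;

    return extra_us > 0 ? extra_us : 0;
}
```

For example, a 4096-entry ring, a quota of 4096 pages/s and a ring that filled in 250 ms give a 750000 us sleep in this model. With a ring size of 0 (what the kvm-stub.c version of kvm_dirty_ring_size() returns), the sketch never asks for a sleep.
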
Re: [PATCH 5/6] tests: Do not treat the iotests as separate meson test target anymore

2022-02-08 Thread Thomas Huth


On 08/02/2022 11.26, Peter Maydell wrote:

On Tue, 8 Feb 2022 at 10:18, Thomas Huth  wrote:


Now that we add the single iotests directly in meson.build, we do
not have to separate the block suite from the other suites anymore.

Signed-off-by: Thomas Huth 
---
  meson.build| 6 +++---
  scripts/mtest2make.py  | 4 
  tests/Makefile.include | 9 +
  3 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/meson.build b/meson.build
index 5f43355071..b203402ee1 100644
--- a/meson.build
+++ b/meson.build
@@ -3,9 +3,9 @@ project('qemu', ['c'], meson_version: '>=0.58.2',
'b_staticpic=false', 'stdsplit=false'],
  version: files('VERSION'))

-add_test_setup('quick', exclude_suites: ['block', 'slow', 'thorough'], is_default: true)
-add_test_setup('slow', exclude_suites: ['block', 'thorough'], env: ['G_TEST_SLOW=1', 'SPEED=slow'])
-add_test_setup('thorough', exclude_suites: ['block'], env: ['G_TEST_SLOW=1', 'SPEED=thorough'])
+add_test_setup('quick', exclude_suites: ['slow', 'thorough'], is_default: true)
+add_test_setup('slow', exclude_suites: ['thorough'], env: ['G_TEST_SLOW=1', 'SPEED=slow'])
+add_test_setup('thorough', env: ['G_TEST_SLOW=1', 'SPEED=thorough'])

  not_found = dependency('', required: false)
  keyval = import('keyval')
diff --git a/scripts/mtest2make.py b/scripts/mtest2make.py
index 4d542e8aaa..304634b71e 100644
--- a/scripts/mtest2make.py
+++ b/scripts/mtest2make.py
@@ -101,10 +101,6 @@ def emit_suite(name, suite, prefix):
  testsuites = defaultdict(Suite)
  for test in introspect['tests']:
  process_tests(test, targets, testsuites)
-# HACK: check-block is a separate target so that it runs with --verbose;
-# only write the dependencies
-emit_suite_deps('block', testsuites['block'], 'check')
-del testsuites['block']


This code being deleted claims to be doing something to ensure that
the tests get run and output the useful messages on failure.


No, AFAIK that --verbose switch just influences how meson prints the 
progress during the test runs (i.e. either a brief or a slightly more 
verbose output).



What is the mechanism for this in the new meson setup ?


cat meson-logs/testlog.txt

... I guess we should either dump that to stdout or publish that file as a 
test artifact?


 Thomas

Re: [PATCH 23/27] meson: do not make qga/vss-win32/meson.build conditional on C++ presence

2022-02-08 Thread Konstantin Kostiuk

Signed-off-by: Konstantin Kostiuk 

On Thu, Feb 3, 2022 at 8:14 PM Paolo Bonzini  wrote:

> From: Marc-André Lureau 
>
> C++ presence is checked by the qga/ directory, so it can be assumed
> when building VSS module.
>
> Signed-off-by: Marc-André Lureau 
> Signed-off-by: Paolo Bonzini 
> ---
>  qga/vss-win32/meson.build | 41 +++
>  1 file changed, 24 insertions(+), 17 deletions(-)
>
> diff --git a/qga/vss-win32/meson.build b/qga/vss-win32/meson.build
> index 8f3aff5fe3..8d4c5708d8 100644
> --- a/qga/vss-win32/meson.build
> +++ b/qga/vss-win32/meson.build
> @@ -1,22 +1,29 @@
> -if add_languages('cpp', required: false)
> -  glib_dynamic = dependency('glib-2.0', static: false)
> -  link_args = cc.get_supported_link_arguments(['-fstack-protector-all', '-fstack-protector-strong',
> -   '-Wl,--add-stdcall-alias', '-Wl,--enable-stdcall-fixup'])
> +glib_dynamic = dependency('glib-2.0', static: false)
> +link_args = cc.get_supported_link_arguments([
> +  '-fstack-protector-all',
> +  '-fstack-protector-strong',
> +  '-Wl,--add-stdcall-alias',
> +  '-Wl,--enable-stdcall-fixup'
> +])
>
> -  qga_vss = shared_module('qga-vss', ['requester.cpp', 'provider.cpp', 'install.cpp'],
> -name_prefix: '',
> -cpp_args: ['-Wno-unknown-pragmas', '-Wno-delete-non-virtual-dtor', '-Wno-non-virtual-dtor'],
> -link_args: link_args,
> -vs_module_defs: 'qga-vss.def',
> -dependencies: [glib_dynamic, socket,
> -   cc.find_library('ole32'),
> -   cc.find_library('oleaut32'),
> -   cc.find_library('shlwapi'),
> -   cc.find_library('uuid'),
> -   cc.find_library('intl')])
> +qga_vss = shared_module(
> +  'qga-vss',
> +  ['requester.cpp', 'provider.cpp', 'install.cpp'],
> +  name_prefix: '',
> +  cpp_args: ['-Wno-unknown-pragmas', '-Wno-delete-non-virtual-dtor', '-Wno-non-virtual-dtor'],
> +  link_args: link_args,
> +  vs_module_defs: 'qga-vss.def',
> +  dependencies: [
> +glib_dynamic, socket,
> +cc.find_library('ole32'),
> +cc.find_library('oleaut32'),
> +cc.find_library('shlwapi'),
> +cc.find_library('uuid'),
> +cc.find_library('intl')
> +  ]
> +)
>
> -  all_qga += qga_vss
> -endif
> +all_qga += qga_vss
>
>  midl = find_program('midl', required: false)
>  widl = find_program('widl', required: false)
> --
> 2.34.1
>
>
>
>

Re: [PATCH v6 3/8] tcg/sparc: Add scratch argument to tcg_out_movi_int

2022-02-08 Thread Peter Maydell

On Tue, 8 Feb 2022 at 07:17, Richard Henderson
 wrote:
>
> This will allow us to control exactly what scratch register is
> used for loading the constant.
>
> Signed-off-by: Richard Henderson 
> ---
>  tcg/sparc/tcg-target.c.inc | 15 +--
>  1 file changed, 9 insertions(+), 6 deletions(-)

Reviewed-by: Peter Maydell 

thanks
-- PMM

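The commit message says the new argument lets the caller control exactly which scratch register is used when loading the constant. The hazard it addresses is easiest to see in a toy model of a two-half 64-bit load: the high 32 bits go into the destination, the low 32 bits into the scratch register, and the two are combined with a 32-bit left shift and an OR. Everything below (ToyRegs, toy_movi_imm32, toy_movi_i64) is a hypothetical C simulation, not QEMU's TCG API, and it only models the case where each half is loaded zero-extended.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy register file standing in for host integer registers. */
enum { TOY_NUM_REGS = 32 };
typedef struct {
    uint64_t r[TOY_NUM_REGS];
} ToyRegs;

/* Toy 32-bit immediate load: the value is zero-extended into the register. */
static void toy_movi_imm32(ToyRegs *s, int reg, uint32_t val)
{
    s->r[reg] = val;
}

/*
 * Toy 64-bit constant load through an explicit scratch register:
 *   ret     = hi32
 *   scratch = lo32
 *   ret     = (ret << 32) | scratch
 * The scratch register is clobbered, and it must differ from ret:
 * if they were the same, loading lo32 would overwrite hi32.
 */
static void toy_movi_i64(ToyRegs *s, int ret, uint64_t arg, int scratch)
{
    assert(ret != scratch);
    toy_movi_imm32(s, ret, (uint32_t)(arg >> 32));
    toy_movi_imm32(s, scratch, (uint32_t)arg);
    s->r[ret] <<= 32;
    s->r[ret] |= s->r[scratch];
}
```

Passing the scratch register explicitly means a caller can pick one whose previous value is already dead, instead of always clobbering one fixed temporary.
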
Re: [PATCH 25/27] meson: require dynamic linking for VSS support

2022-02-08 Thread Konstantin Kostiuk

Reviewed-by: Konstantin Kostiuk 

On Fri, Feb 4, 2022 at 7:23 AM Philippe Mathieu-Daudé via <
qemu-devel@nongnu.org> wrote:

> On 3/2/22 18:33, Paolo Bonzini wrote:
> > From: Marc-André Lureau 
> >
> > The glib_dynamic detection does not work because the dependency is
> > overridden in the main meson.build.
> >
> > Signed-off-by: Marc-André Lureau 
> > [Rewritten commit message, added requirement in qga/meson.build - Paolo]
> > Signed-off-by: Paolo Bonzini 
> > ---
> >   qga/meson.build   | 2 ++
> >   qga/vss-win32/meson.build | 4 ++--
> >   2 files changed, 4 insertions(+), 2 deletions(-)
>
> Reviewed-by: Philippe Mathieu-Daudé 
>
>

Re: [PATCH 4/5] linux-user: Move sparc/host-signal.h to sparc64/host-signal.h

2022-02-08 Thread Richard Henderson


On 2/8/22 22:01, Peter Maydell wrote:

On Tue, 8 Feb 2022 at 08:17, Richard Henderson
 wrote:


We do not support sparc32 as a host, so there's no point in
sparc64 redirecting to sparc.


Where do we enforce that ? I couldn't see anything in
configure or meson.build that forbids linux-user with
a 32-bit sparc host, but I probably missed it.


The common-user/host/sparc directory is missing; meson will error out.


r~

Re: [PATCH 2/6] tests/qemu-iotests/meson.build: Improve the indentation

2022-02-08 Thread Philippe Mathieu-Daudé via


On 8/2/22 11:13, Thomas Huth wrote:

By using subdir_done(), we can get rid of one level of indentation
in this file. This will make it easier to add more conditions to
skip the iotests in future patches.

Signed-off-by: Thomas Huth 
---
  tests/qemu-iotests/meson.build | 61 ++
  1 file changed, 32 insertions(+), 29 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 24/27] qga/vss-win32: require widl/midl, remove pre-built TLB file

2022-02-08 Thread Konstantin Kostiuk

Reviewed-by: Konstantin Kostiuk 

On Fri, Feb 4, 2022 at 7:20 AM Philippe Mathieu-Daudé via <
qemu-devel@nongnu.org> wrote:

> On 3/2/22 18:33, Paolo Bonzini wrote:
> > From: Marc-André Lureau 
> >
> > There is no good reason anymore to keep a pre-built file in the
> > repository.
> >
> > Signed-off-by: Marc-André Lureau 
> > Signed-off-by: Paolo Bonzini 
> > ---
> >   meson.build   |   4 
> >   qga/meson.build   |   2 ++
> >   qga/vss-win32/meson.build |   9 +
> >   qga/vss-win32/qga-vss.tlb | Bin 1528 -> 0 bytes
> >   4 files changed, 7 insertions(+), 8 deletions(-)
> >   delete mode 100644 qga/vss-win32/qga-vss.tlb
>
> Reviewed-by: Philippe Mathieu-Daudé 
>
>

Re: [PATCH RFC 14/15] migration: Postcopy preemption on separate channel

2022-02-08 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> On Thu, Feb 03, 2022 at 05:45:32PM +, Dr. David Alan Gilbert wrote:
> > * Peter Xu (pet...@redhat.com) wrote:
> > > This patch enables postcopy-preempt feature.
> > > 
> > > It contains two major changes to the migration logic:
> > > 
> > >   (1) Postcopy requests are now sent via a different socket from the precopy
> > >   background migration stream, so that they are isolated from very high
> > >   page request delays.
> > > 
> > >   (2) For hosts with huge pages enabled: when there are postcopy requests,
> > >   they can now interrupt the partial sending of huge host pages on the
> > >   source QEMU.
> > > 
> > > After this patch, we'll have two "channels" (or rather, sockets, because
> > > this is only supported on socket-based channels) for postcopy: (1) the
> > > PRECOPY channel (the default channel, which transfers background pages),
> > > and (2) the POSTCOPY channel (which only transfers requested pages).
> > > 
> > > On the source QEMU, when we find a postcopy request, we interrupt the
> > > PRECOPY channel's sending process and quickly switch to the POSTCOPY
> > > channel. After we have serviced all the high-priority postcopy pages, we
> > > switch back to the PRECOPY channel and continue sending the interrupted
> > > huge page. No new thread is introduced.
> > > 
> > > On the destination QEMU, one new thread is introduced to receive page
> > > data from the postcopy-specific socket.
> > > 
> > > This patch has a side effect. Previously, after sending postcopy pages,
> > > we assumed the guest would access the following pages, so we kept sending
> > > from there. Now that has changed: instead of continuing from a
> > > postcopy-requested page, we go back and continue sending the precopy huge
> > > page (which may already have been partially sent before a postcopy
> > > request interrupted it).
> > > 
> > > Whether that's a problem is debatable, because "assuming the guest will
> > > continue to access the next page" doesn't really hold when huge pages are
> > > used, especially if the huge page is large (e.g. 1GB pages). So that
> > > locality hint is largely meaningless if huge pages are used.
> > > 
> > > If postcopy preempt is enabled, a separate channel is created for it so
> > > that it can be used later for postcopy-specific page requests. On the
> > > destination node, a standalone thread is used to receive postcopy-requested
> > > pages. The thread is created along with the ram listen thread during the
> > > POSTCOPY_LISTEN phase.
> > 
> > I think this patch could do with being split into two; the first one that
> > deals with closing/opening channels; and the second that handles the
> > data on the two channels and does the preemption.
> 
> Sounds good, I'll give it a shot on the split.
> 
> > 
> > Another thought is whether, if in the future we allow multifd +
> > postcopy, the multifd code would change - I think it would end up closer
> > to using multiple channels taking different pages on each one.
> 
> Right, so potentially the postcopy channels could be multi-threaded themselves too.
> 
> We've had a quick discussion on IRC; just to recap: I didn't reuse the multifd
> infrastructure because IMO multifd is designed with the following ideas in mind:
> 
>   (1) Every multifd thread is equal
>   (2) Throughput oriented
> 
> However, I found that postcopy needs something different when it is mixed
> with multifd.
> 
> Firstly, we will have some channels sending as much as they can, where latency
> is not an issue (i.e. background pages). However, that is not suitable for
> page requests, so we could also have channels that service page faults from
> the destination. In short, there are two types of channels/threads we want,
> and we may want to treat them differently.
> 
> The current model is that we only have 1 postcopy channel and 1 precopy
> channel, but it should be easier to make it N post + 1 pre based on this
> series.

It's not clear to me whether we need to be able to do N post + M pre, or
whether we have a rule like: always at least 1 post, but if there are more
page faults in the queue then you can steal all of the pre channels.

> So far all send() calls are still made in the migration thread, so there is
> no new sender thread, only one more receiver thread. If we want to grow that
> from 1 to N for postcopy channels, we may want to move sending out of the
> migration thread too, just as we do with multifd. I'm not sure whether
> anything there can be reused; I haven't explored that yet, but this series
> should already share some common refactored code, such as the temporary huge
> page on the destination node, so that it can receive into multiple huge pages.

Right; it makes me think the multifd+postcopy should just use channels.

> This also reminded me: instead of a new capability, should I simply expose
> a parameter "postcopy-channels=N" on the CLI, so that we are prepared for
> multiple postcopy channels?

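The source-side behaviour described in the quoted patch (interrupt the PRECOPY channel while a huge page is being sent, service all pending requests on the POSTCOPY channel, then resume the interrupted huge page, without a new sender thread) can be illustrated with a single-threaded toy simulation. This is a sketch under stated assumptions, not the code in migration/ram.c: all names are hypothetical, requests are assumed to be sorted by arrival, arrival is modelled as "before small page i of the huge page", and requests arriving after the huge page is finished are left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical channel ids mirroring the two sockets described above. */
enum { CH_PRECOPY = 0, CH_POSTCOPY = 1 };

/* One page sent on one channel, recorded in send order. */
typedef struct {
    int channel;
    int page;
} SketchSend;

/* A postcopy page request that arrives before small page `arrives_at`. */
typedef struct {
    int page;
    int arrives_at;
} SketchReq;

/*
 * Send one huge page, made of `nsub` small pages starting at `first`, on
 * the PRECOPY channel. Before each small page, drain every request that has
 * arrived so far onto the POSTCOPY channel, then resume the huge page where
 * it was interrupted. `reqs` must be sorted by arrives_at. Returns the
 * number of entries written to `out`.
 */
static size_t sketch_send_huge_page(int first, int nsub,
                                    const SketchReq *reqs, size_t nreqs,
                                    SketchSend *out, size_t cap)
{
    size_t n = 0;
    size_t next_req = 0;

    for (int i = 0; i < nsub; i++) {
        /* Preempt: urgent pages go out first, on their own channel. */
        while (next_req < nreqs && reqs[next_req].arrives_at <= i) {
            assert(n < cap);
            out[n].channel = CH_POSTCOPY;
            out[n].page = reqs[next_req].page;
            n++;
            next_req++;
        }
        /* Resume the interrupted huge page on the background channel. */
        assert(n < cap);
        out[n].channel = CH_PRECOPY;
        out[n].page = first + i;
        n++;
    }
    return n;
}
```

With a four-page huge page starting at 100 and two requests (pages 500 and 501) arriving before the third small page, the send order is 100, 101 on PRECOPY, then 500, 501 on POSTCOPY, then 102, 103 on PRECOPY.
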
Re: [PATCH 26/27] meson, configure: move ntddscsi API check to meson

2022-02-08 Thread Konstantin Kostiuk

Reviewed-by: Konstantin Kostiuk 

On Tue, Feb 8, 2022 at 1:15 PM Konstantin Kostiuk 
wrote:

> Signed-off-by: Konstantin Kostiuk 
>
> On Thu, Feb 3, 2022 at 8:03 PM Paolo Bonzini  wrote:
>
>> From: Marc-André Lureau 
>>
>> Signed-off-by: Marc-André Lureau 
>> Signed-off-by: Paolo Bonzini 
>> ---
>>  configure| 23 ---
>>  meson.build  | 18 +-
>>  qga/commands-win32.c |  6 +++---
>>  qga/meson.build  |  2 +-
>>  4 files changed, 21 insertions(+), 28 deletions(-)
>>
>> diff --git a/configure b/configure
>> index f67088044f..f6b9e5a1cd 100755
>> --- a/configure
>> +++ b/configure
>> @@ -2289,26 +2289,6 @@ EOF
>>fi
>>  fi
>>
>> -##
>> -# check if mingw environment provides a recent ntddscsi.h
>> -guest_agent_ntddscsi="no"
>> -if test "$mingw32" = "yes"; then
>> -  cat > $TMPC << EOF
>> -#include 
>> -#include 
>> -int main(void) {
>> -#if !defined(IOCTL_SCSI_GET_ADDRESS)
>> -#error Missing required ioctl definitions
>> -#endif
>> -  SCSI_ADDRESS addr = { .Lun = 0, .TargetId = 0, .PathId = 0 };
>> -  return addr.Lun;
>> -}
>> -EOF
>> -  if compile_prog "" "" ; then
>> -guest_agent_ntddscsi=yes
>> -  fi
>> -fi
>> -
>>  ##
>>  # capstone
>>
>> @@ -2818,9 +2798,6 @@ if test "$debug_tcg" = "yes" ; then
>>  fi
>>  if test "$mingw32" = "yes" ; then
>>echo "CONFIG_WIN32=y" >> $config_host_mak
>> -  if test "$guest_agent_ntddscsi" = "yes" ; then
>> -echo "CONFIG_QGA_NTDDSCSI=y" >> $config_host_mak
>> -  fi
>>echo "QEMU_GA_MSI_MINGW_DLL_PATH=${QEMU_GA_MSI_MINGW_DLL_PATH}" >> $config_host_mak
>>echo "QEMU_GA_MANUFACTURER=${QEMU_GA_MANUFACTURER}" >> $config_host_mak
>>echo "QEMU_GA_DISTRO=${QEMU_GA_DISTRO}" >> $config_host_mak
>> diff --git a/meson.build b/meson.build
>> index 999d2c8bd1..98e795d21a 100644
>> --- a/meson.build
>> +++ b/meson.build
>> @@ -1944,6 +1944,22 @@ if targetos == 'windows' and link_language == 'cpp'
>>  int main(void) { return VSS_CTX_BACKUP; }''')
>>  endif
>>
>> +have_ntddscsi = false
>> +if targetos == 'windows'
>> +  have_ntddscsi = cc.compiles('''
>> +#include 
>> +#include 
>> +int main(void) {
>> +#if !defined(IOCTL_SCSI_GET_ADDRESS)
>> +#error Missing required ioctl definitions
>> +#endif
>> +  SCSI_ADDRESS addr = { .Lun = 0, .TargetId = 0, .PathId = 0 };
>> +  return addr.Lun;
>> +}
>> +''')
>> +endif
>> +config_host_data.set('HAVE_NTDDSCSI', have_ntddscsi)
>> +
>>  ignored = ['CONFIG_QEMU_INTERP_PREFIX', # actually per-target
>>  'HAVE_GDB_BIN']
>>  arrays = ['CONFIG_BDRV_RW_WHITELIST', 'CONFIG_BDRV_RO_WHITELIST']
>> @@ -3615,7 +3631,7 @@ summary_info += {'libnfs support':libnfs}
>>  if targetos == 'windows'
>>if have_ga
>>  summary_info += {'QGA VSS support':   have_qga_vss}
>> -summary_info += {'QGA w32 disk info': config_host.has_key('CONFIG_QGA_NTDDSCSI')}
>> +summary_info += {'QGA w32 disk info': have_ntddscsi}
>>endif
>>  endif
>>  summary_info += {'seccomp support':   seccomp}
>> diff --git a/qga/commands-win32.c b/qga/commands-win32.c
>> index 484cb1c6bd..4fbbad793f 100644
>> --- a/qga/commands-win32.c
>> +++ b/qga/commands-win32.c
>> @@ -18,7 +18,7 @@
>>  #include 
>>  #include 
>>  #include 
>> -#ifdef CONFIG_QGA_NTDDSCSI
>> +#ifdef HAVE_NTDDSCSI
>>  #include 
>>  #include 
>>  #endif
>> @@ -474,7 +474,7 @@ void qmp_guest_file_flush(int64_t handle, Error **errp)
>>  }
>>  }
>>
>> -#ifdef CONFIG_QGA_NTDDSCSI
>> +#ifdef HAVE_NTDDSCSI
>>
>>  static GuestDiskBusType win2qemu[] = {
>>  [BusTypeUnknown] = GUEST_DISK_BUS_TYPE_UNKNOWN,
>> @@ -,7 +,7 @@ GuestDiskInfoList *qmp_guest_get_disks(Error **errp)
>>  return NULL;
>>  }
>>
>> -#endif /* CONFIG_QGA_NTDDSCSI */
>> +#endif /* HAVE_NTDDSCSI */
>>
>>  static GuestFilesystemInfo *build_guest_fsinfo(char *guid, Error **errp)
>>  {
>> diff --git a/qga/meson.build b/qga/meson.build
>> index 8c177435ac..fe0bfc295f 100644
>> --- a/qga/meson.build
>> +++ b/qga/meson.build
>> @@ -88,7 +88,7 @@ if targetos == 'windows'
>>  qga_libs += ['-lole32', '-loleaut32', '-lshlwapi', '-lstdc++', '-Wl,--enable-stdcall-fixup']
>>  subdir('vss-win32')
>>endif
>> -  if 'CONFIG_QGA_NTDDSCSI' in config_host
>> +  if have_ntddscsi
>>  qga_libs += ['-lsetupapi', '-lcfgmgr32']
>>endif
>>  endif
>> --
>> 2.34.1
>>
>>
>>
>>

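With this patch, the result of the meson compile test reaches C code through config_host_data.set('HAVE_NTDDSCSI', have_ntddscsi) instead of CONFIG_QGA_NTDDSCSI in config-host.mak, and the C side keeps using a plain #ifdef. A minimal sketch of that pattern, assuming HAVE_NTDDSCSI is defined in the generated config header when the check passes and left undefined otherwise (which is what the #ifdef HAVE_NTDDSCSI guards in the patch rely on). The function name is hypothetical; the real qga code guards whole command implementations rather than a helper like this.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * HAVE_NTDDSCSI is expected to come from the meson-generated config
 * header; it is not defined anywhere in this standalone sketch.
 */
static bool sketch_disk_info_supported(void)
{
#ifdef HAVE_NTDDSCSI
    return true;   /* ntddscsi.h-based disk info can be compiled in */
#else
    return false;  /* fall back to an "unsupported" stub */
#endif
}
```
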
Re: [PATCH 2/5] linux-user: Introduce host_signal_mask

2022-02-08 Thread Peter Maydell

On Tue, 8 Feb 2022 at 08:03, Richard Henderson
 wrote:
>
> Do not directly access the uc_sigmask member.
> This is preparation for a sparc64 fix.
>
> Signed-off-by: Richard Henderson 
> ---
>  linux-user/include/host/aarch64/host-signal.h  |  5 +
>  linux-user/include/host/alpha/host-signal.h|  5 +
>  linux-user/include/host/arm/host-signal.h  |  5 +
>  linux-user/include/host/i386/host-signal.h |  5 +
>  .../include/host/loongarch64/host-signal.h |  5 +
>  linux-user/include/host/mips/host-signal.h |  5 +
>  linux-user/include/host/ppc/host-signal.h  |  5 +
>  linux-user/include/host/riscv/host-signal.h|  5 +
>  linux-user/include/host/s390/host-signal.h |  5 +
>  linux-user/include/host/sparc/host-signal.h|  5 +
>  linux-user/include/host/x86_64/host-signal.h   |  5 +
>  linux-user/signal.c| 18 --
>  12 files changed, 63 insertions(+), 10 deletions(-)
>
> diff --git a/linux-user/include/host/aarch64/host-signal.h b/linux-user/include/host/aarch64/host-signal.h
> index 9770b36dc1..76ab078069 100644
> --- a/linux-user/include/host/aarch64/host-signal.h
> +++ b/linux-user/include/host/aarch64/host-signal.h
> @@ -40,6 +40,11 @@ static inline void host_signal_set_pc(ucontext_t *uc, uintptr_t pc)
>  uc->uc_mcontext.pc = pc;
>  }
>
> +static inline void *host_signal_mask(ucontext_t *uc)
> +{
> +return &uc->uc_sigmask;
> +}

Why void* rather than sigset_t* ?

thanks
-- PMM

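Peter's question points at the obvious typed alternative. The commit message says the accessor is preparation for a sparc64 fix that is not shown here and that may be the reason for void *; the sketch below only illustrates the sigset_t * form on a host where ucontext_t has a uc_sigmask member of type sigset_t, as POSIX specifies and as glibc-based Linux hosts provide. host_signal_blocked() is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <signal.h>
#include <ucontext.h>

/*
 * Typed variant of the accessor: returning sigset_t * lets the compiler
 * check that callers pass the result to sigset_t-based APIs.
 */
static inline sigset_t *host_signal_mask(ucontext_t *uc)
{
    return &uc->uc_sigmask;
}

/* Hypothetical helper: nonzero if sig was blocked in the interrupted context. */
static inline int host_signal_blocked(ucontext_t *uc, int sig)
{
    int r = sigismember(host_signal_mask(uc), sig);

    assert(r >= 0); /* -1 means sig is not a valid signal number */
    return r;
}
```
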
Re: [PATCH 21/27] qga/vss: use standard windows headers location

2022-02-08 Thread Konstantin Kostiuk

Reviewed-by: Konstantin Kostiuk 

On Fri, Feb 4, 2022 at 7:18 AM Philippe Mathieu-Daudé via <
qemu-devel@nongnu.org> wrote:

> On 3/2/22 18:33, Paolo Bonzini wrote:
> > From: Marc-André Lureau 
> >
> > Stop using special paths with outdated headers from an old SDK.
> >
> > Instead, use standard include paths.
> >
> > You can still build against the old SDK by running configure with
> > --extra-cxxflags="-isystem `/path/to/inc/win2003/"
>
> Superfluous back quote.
>
> Reviewed-by: Philippe Mathieu-Daudé 
>
> > (this also allows building against MinGW headers, which are currently
> > broken as in 9.0)
> >
> > Signed-off-by: Marc-André Lureau 
> > Signed-off-by: Paolo Bonzini 
> > ---
> >   qga/vss-win32/install.cpp   | 2 +-
> >   qga/vss-win32/provider.cpp  | 4 ++--
> >   qga/vss-win32/requester.cpp | 4 ++--
> >   qga/vss-win32/vss-common.h  | 6 +-
> >   4 files changed, 6 insertions(+), 10 deletions(-)
>
>

Re: [PATCH 5/6] tests: Do not treat the iotests as separate meson test target anymore

2022-02-08 Thread Peter Maydell

On Tue, 8 Feb 2022 at 11:16, Thomas Huth  wrote:
>
> On 08/02/2022 11.26, Peter Maydell wrote:
> > What is the mechanism for this in the new meson setup ?
>
> cat meson-logs/testlog.txt
>
> ... I guess we should either dump that to stdout

Yes, it needs to actually appear in the stdout for CI jobs,
otherwise it is inaccessible and might as well not exist.
V=1 is the switch we have for "be verbose", and meson's
test facility should honour it.

thanks
-- PMM

Re: [PATCH 4/5] linux-user: Move sparc/host-signal.h to sparc64/host-signal.h

2022-02-08 Thread Peter Maydell

On Tue, 8 Feb 2022 at 08:17, Richard Henderson
 wrote:
>
> We do not support sparc32 as a host, so there's no point in
> sparc64 redirecting to sparc.

Where do we enforce that ? I couldn't see anything in
configure or meson.build that forbids linux-user with
a 32-bit sparc host, but I probably missed it.

thanks
-- PMM
