[PATCH] target/loongarch: Add timer information dump support
Timer emulation sometimes is problematic especially when vm is running in kvm mode. This patch adds registers dump support relative with timer hardware, so that it is easier to find the problems. Signed-off-by: Bibo Mao --- target/loongarch/cpu.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c index fc075952e6..db9a421cc4 100644 --- a/target/loongarch/cpu.c +++ b/target/loongarch/cpu.c @@ -762,6 +762,8 @@ void loongarch_cpu_dump_state(CPUState *cs, FILE *f, int flags) qemu_fprintf(f, "TLBRENTRY=%016" PRIx64 "\n", env->CSR_TLBRENTRY); qemu_fprintf(f, "TLBRBADV=%016" PRIx64 "\n", env->CSR_TLBRBADV); qemu_fprintf(f, "TLBRERA=%016" PRIx64 "\n", env->CSR_TLBRERA); +qemu_fprintf(f, "TCFG=%016" PRIx64 "\n", env->CSR_TCFG); +qemu_fprintf(f, "TVAL=%016" PRIx64 "\n", env->CSR_TVAL); /* fpr */ if (flags & CPU_DUMP_FPU) { -- 2.39.3
Re: [PATCH for-8.2] ui/vnc-clipboard: fix inflate_buffer
Am 04.12.23 um 08:30 schrieb Marc-André Lureau: > Hi > > On Tue, Nov 28, 2023 at 2:52 PM Marc-André Lureau > wrote: >> >> >> It's hard to make the best decision. >> >> We could return the uncompressed data so far, that would fix the >> regression. But potentially, we have incomplete data returned. Clients >> should be fixed to include Z_STREAM_END marker (using Z_FINISH). >> > > I'll queue your patch for 8.2. thanks > Thanks to you for looking into the issue! A colleague of mine now opened a bug report for noVNC and according to the the maintainer there, the protocol does not require setting an end marker: https://github.com/novnc/noVNC/issues/1820#issuecomment-1841014968 Best Regards, Fiona
Re: [PATCH] target/loongarch: Add timer information dump support
在 2023/12/6 下午4:18, Bibo Mao 写道: Timer emulation sometimes is problematic especially when vm is running in kvm mode. This patch adds registers dump support relative with timer hardware, so that it is easier to find the problems. Signed-off-by: Bibo Mao --- target/loongarch/cpu.c | 2 ++ 1 file changed, 2 insertions(+) Reviewed-by: Song Gao Thanks. Song Gao diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c index fc075952e6..db9a421cc4 100644 --- a/target/loongarch/cpu.c +++ b/target/loongarch/cpu.c @@ -762,6 +762,8 @@ void loongarch_cpu_dump_state(CPUState *cs, FILE *f, int flags) qemu_fprintf(f, "TLBRENTRY=%016" PRIx64 "\n", env->CSR_TLBRENTRY); qemu_fprintf(f, "TLBRBADV=%016" PRIx64 "\n", env->CSR_TLBRBADV); qemu_fprintf(f, "TLBRERA=%016" PRIx64 "\n", env->CSR_TLBRERA); +qemu_fprintf(f, "TCFG=%016" PRIx64 "\n", env->CSR_TCFG); +qemu_fprintf(f, "TVAL=%016" PRIx64 "\n", env->CSR_TVAL); /* fpr */ if (flags & CPU_DUMP_FPU) {
[PATCH] migration: using qapi_free_SocketAddress instead of g_free
use unified function qapi_free_SocketAddress to free SocketAddress object. Signed-off-by: lijiejun --- migration/migration.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/migration/migration.c b/migration/migration.c index 3ce04b2aaf..e78d31bbbf 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -493,7 +493,7 @@ bool migrate_uri_parse(const char *uri, MigrationChannel **channel, addr->u.socket.type = saddr->type; addr->u.socket.u = saddr->u; /* Don't free the objects inside; their ownership moved to "addr" */ -g_free(saddr); +qapi_free_SocketAddress(saddr); } else if (strstart(uri, "file:", NULL)) { addr->transport = MIGRATION_ADDRESS_TYPE_FILE; addr->u.file.filename = g_strdup(uri + strlen("file:")); -- 2.25.1
Re: [PATCH v6 2/3] hw/ppc: Add N1 chiplet model
On 28-11-2023 12:18, Cédric Le Goater wrote: On 11/27/23 18:13, Chalapathi V wrote: The N1 chiplet handle the high speed i/o traffic over PCIe and others. The N1 chiplet consists of PowerBus Fabric controller, nest Memory Management Unit, chiplet control unit and more. This commit creates a N1 chiplet model and initialize and realize the pervasive chiplet model where chiplet control registers are implemented. This commit also implement the read/write method for the powerbus scom registers Signed-off-by: Chalapathi V --- include/hw/ppc/pnv_n1_chiplet.h | 35 +++ include/hw/ppc/pnv_xscom.h | 6 ++ hw/ppc/pnv_n1_chiplet.c | 171 hw/ppc/meson.build | 1 + 4 files changed, 213 insertions(+) create mode 100644 include/hw/ppc/pnv_n1_chiplet.h create mode 100644 hw/ppc/pnv_n1_chiplet.c diff --git a/include/hw/ppc/pnv_n1_chiplet.h b/include/hw/ppc/pnv_n1_chiplet.h new file mode 100644 index 00..3c42ada7f4 --- /dev/null +++ b/include/hw/ppc/pnv_n1_chiplet.h @@ -0,0 +1,35 @@ +/* + * QEMU PowerPC N1 chiplet model + * + * Copyright (c) 2023, IBM Corporation. + * + * SPDX-License-Identifier: GPL-2.0-or-later + * + * This code is licensed under the GPL version 2 or later. See the + * COPYING file in the top-level directory. + * + */ + +#ifndef PPC_PNV_N1_CHIPLET_H +#define PPC_PNV_N1_CHIPLET_H + +#include "hw/ppc/pnv_nest_pervasive.h" + +#define TYPE_PNV_N1_CHIPLET "pnv-N1-chiplet" +#define PNV_N1_CHIPLET(obj) OBJECT_CHECK(PnvN1Chiplet, (obj), TYPE_PNV_N1_CHIPLET) + +typedef struct pb_scom { + uint64_t mode; + uint64_t hp_mode2_curr; +} pb_scom; Please use CamelCase coding style. Sure. Thank You + +typedef struct PnvN1Chiplet { + DeviceState parent; + MemoryRegion xscom_pb_eq_regs; + MemoryRegion xscom_pb_es_regs; the MemoryRegion are generally called _mr, _iomem. Sure. Updated in V7. Thank You. + /* common pervasive chiplet unit */ + PnvNestChipletPervasive nest_pervasive; + pb_scom eq[8]; + pb_scom es[4]; are these arrays the registers ? Yes, These are array of pb_scom EQ and ES registers. For now just PB_SCOM_EQ0_HP_MODE2_CURR and PB_SCOM_ES3_MODE registers are modeled. +} PnvN1Chiplet; +#endif /*PPC_PNV_N1_CHIPLET_H */ diff --git a/include/hw/ppc/pnv_xscom.h b/include/hw/ppc/pnv_xscom.h index 3e15706dec..535ae1dab0 100644 --- a/include/hw/ppc/pnv_xscom.h +++ b/include/hw/ppc/pnv_xscom.h @@ -173,6 +173,12 @@ struct PnvXScomInterfaceClass { #define PNV10_XSCOM_N1_CHIPLET_CTRL_REGS_BASE 0x300 #define PNV10_XSCOM_CHIPLET_CTRL_REGS_SIZE 0x400 +#define PNV10_XSCOM_N1_PB_SCOM_EQ_BASE 0x3011000 +#define PNV10_XSCOM_N1_PB_SCOM_EQ_SIZE 0x200 + +#define PNV10_XSCOM_N1_PB_SCOM_ES_BASE 0x3011300 +#define PNV10_XSCOM_N1_PB_SCOM_ES_SIZE 0x100 + #define PNV10_XSCOM_PEC_NEST_BASE 0x3011800 /* index goes downwards ... */ #define PNV10_XSCOM_PEC_NEST_SIZE 0x100 diff --git a/hw/ppc/pnv_n1_chiplet.c b/hw/ppc/pnv_n1_chiplet.c new file mode 100644 index 00..8e4c21dbf6 --- /dev/null +++ b/hw/ppc/pnv_n1_chiplet.c @@ -0,0 +1,171 @@ +/* + * QEMU PowerPC N1 chiplet model + * + * Copyright (c) 2023, IBM Corporation. + * + * SPDX-License-Identifier: GPL-2.0-or-later + * + * This code is licensed under the GPL version 2 or later. See the + * COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" +#include "qemu/log.h" +#include "hw/qdev-properties.h" +#include "hw/ppc/pnv.h" +#include "hw/ppc/pnv_xscom.h" +#include "hw/ppc/pnv_n1_chiplet.h" +#include "hw/ppc/pnv_nest_pervasive.h" + +/* + * The n1 chiplet contains chiplet control unit, + * PowerBus/RaceTrack/Bridge logic, nest Memory Management Unit(nMMU) + * and more. + * + * In this model Nest1 chiplet control registers are modelled via common + * nest pervasive model and few PowerBus racetrack registers are modelled. + */ + +#define PB_SCOM_EQ0_HP_MODE2_CURR 0xe +#define PB_SCOM_ES3_MODE 0x8a + +static uint64_t pnv_n1_chiplet_pb_scom_eq_read(void *opaque, hwaddr addr, + unsigned size) +{ + PnvN1Chiplet *n1_chiplet = PNV_N1_CHIPLET(opaque); + int reg = addr >> 3; + uint64_t val = ~0ull; + + switch (reg) { + case PB_SCOM_EQ0_HP_MODE2_CURR: + val = n1_chiplet->eq[0].hp_mode2_curr; + break; + default: + qemu_log_mask(LOG_UNIMP, "%s: Invalid xscom read at 0x%" PRIx32 "\n", + __func__, reg); + } + return val; +} + +static void pnv_n1_chiplet_pb_scom_eq_write(void *opaque, hwaddr addr, + uint64_t val, unsigned size) +{ + PnvN1Chiplet *n1_chiplet = PNV_N1_CHIPLET(opaque); + int reg = addr >> 3; + + switch (reg) { + case PB_SCOM_EQ0_HP_MODE2_CURR: + n1_chiplet->eq[0].hp_mode2_curr = val; + break; + default: + qemu_log_mask(LOG_UNIMP, "%s: Invalid xscom write at 0x%
Re: [PULL 20/20] tracing: install trace events file only if necessary
On Thu, Apr 20, 2023 at 9:10 AM Stefan Hajnoczi wrote: > > From: Carlos Santos > > It is not useful when configuring with --enable-trace-backends=nop. > > Signed-off-by: Carlos Santos > Signed-off-by: Stefan Hajnoczi > Message-Id: <20230408010410.281263-1-casan...@redhat.com> > --- > trace/meson.build | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/trace/meson.build b/trace/meson.build > index 8e80be895c..30b1d942eb 100644 > --- a/trace/meson.build > +++ b/trace/meson.build > @@ -64,7 +64,7 @@ trace_events_all = custom_target('trace-events-all', > input: trace_events_files, > command: [ 'cat', '@INPUT@' ], > capture: true, > - install: true, > + install: get_option('trace_backends') != [ > 'nop' ], > install_dir: qemu_datadir) > > if 'ust' in get_option('trace_backends') > -- > 2.39.2 > Hello, I still don't see this in the master branch. Is there something preventing it from being applied? -- Carlos Santos Senior Software Maintenance Engineer Red Hat casan...@redhat.comT: +55-11-3534-6186
Re: [PATCH] migration: using qapi_free_SocketAddress instead of g_free
On Wed, Dec 06, 2023 at 06:01:11PM +0800, lijiejun wrote: > use unified function qapi_free_SocketAddress to free SocketAddress > object. > > Signed-off-by: lijiejun > --- > migration/migration.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/migration/migration.c b/migration/migration.c > index 3ce04b2aaf..e78d31bbbf 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -493,7 +493,7 @@ bool migrate_uri_parse(const char *uri, MigrationChannel > **channel, > addr->u.socket.type = saddr->type; > addr->u.socket.u = saddr->u; > /* Don't free the objects inside; their ownership moved to "addr" */ > -g_free(saddr); > +qapi_free_SocketAddress(saddr); The comment on the line above that explains we only want to free 'saddr', not its contents. > } else if (strstart(uri, "file:", NULL)) { > addr->transport = MIGRATION_ADDRESS_TYPE_FILE; > addr->u.file.filename = g_strdup(uri + strlen("file:")); > -- > 2.25.1 > > With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH 02/11] tests/avocado: fix typo in replay_linux
On 5/12/23 21:40, Alex Bennée wrote: Signed-off-by: Alex Bennée --- tests/avocado/replay_linux.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH 04/11] scripts/replay_dump: track total number of instructions
On 5/12/23 21:40, Alex Bennée wrote: This will help in tracking where we are in the stream when debugging. Signed-off-by: Alex Bennée --- scripts/replay-dump.py | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH 06/11] replay: add proper kdoc for ReplayState
On 5/12/23 21:41, Alex Bennée wrote: Remove the non-standard comment formatting and move the descriptions into a proper kdoc comment. Signed-off-by: Alex Bennée --- replay/replay-internal.h | 27 -- ~~~ roms/SLOF | 2 +- tests/tcg/i386/Makefile.softmmu-target | 19 ++ 3 files changed, 37 insertions(+), 11 deletions(-) diff --git a/replay/replay-internal.h b/replay/replay-internal.h index 516147ddbc..98ca3748ed 100644 --- a/replay/replay-internal.h +++ b/replay/replay-internal.h @@ -63,24 +63,31 @@ enum ReplayEvents { EVENT_COUNT }; +/** + * typedef ReplayState - global tracking Replay state + * + * This structure tracks where we are in the current ReplayState + * including the logged events from the recorded replay stream. Some + * of the data is also stored/restored from VMStateDescription when VM + * save/restore events take place. + * + * @cached_clock: Cached clocks values + * @current_icount: number of processed instructions + * @instruction_count: number of instructions until next event + * @data_kind: current event + * @has_unread_data: 1 if event not yet processed + * @file_offset: offset into replay log at replay snapshot + * @block_request_id: current serialised block request id + * @read_event_id: current async read event id + */ typedef struct ReplayState { -/*! Cached clock values. */ int64_t cached_clock[REPLAY_CLOCK_COUNT]; -/*! Current icount - number of processed instructions. */ uint64_t current_icount; -/*! Number of instructions to be executed before other events happen. */ int instruction_count; -/*! Type of the currently executed event. */ unsigned int data_kind; -/*! Flag which indicates that event is not processed yet. */ unsigned int has_unread_data; -/*! Temporary variable for saving current log offset. */ uint64_t file_offset; -/*! Next block operation id. -This counter is global, because requests from different -block devices should not get overlapping ids. */ uint64_t block_request_id; -/*! Asynchronous event id read from the log */ uint64_t read_event_id; } ReplayState; extern ReplayState replay_state; Up to here: Reviewed-by: Philippe Mathieu-Daudé The rest doesn't belong to this commit: diff --git a/roms/SLOF b/roms/SLOF index 3a259df244..6b6c16b4b4 16 --- a/roms/SLOF +++ b/roms/SLOF @@ -1 +1 @@ -Subproject commit 3a259df2449fc4a4e43ab5f33f0b2c66484b4bc3 +Subproject commit 6b6c16b4b40763507cf1f518096f3c3883c5cf2d diff --git a/tests/tcg/i386/Makefile.softmmu-target b/tests/tcg/i386/Makefile.softmmu-target index 5266f2335a..b9bef72dcf 100644 --- a/tests/tcg/i386/Makefile.softmmu-target +++ b/tests/tcg/i386/Makefile.softmmu-target @@ -33,5 +33,24 @@ EXTRA_RUNS+=$(MULTIARCH_RUNS) memory: CFLAGS+=-DCHECK_UNALIGNED=1 +# Simple Record/Replay Test +.PHONY: memory-record +run-memory-record: memory-record memory + $(call run-test, $<, \ + $(QEMU) -monitor none -display none \ + -chardev file$(COMMA)path=$<.out$(COMMA)id=output \ + -icount shift=5$(COMMA)rr=record$(COMMA)rrfile=record.bin \ + $(QEMU_OPTS) memory) + +.PHONY: memory-replay +run-memory-replay: memory-replay run-memory-record + $(call run-test, $<, \ + $(QEMU) -monitor none -display none \ + -chardev file$(COMMA)path=$<.out$(COMMA)id=output \ + -icount shift=5$(COMMA)rr=replay$(COMMA)rrfile=record.bin \ + $(QEMU_OPTS) memory) + +EXTRA_RUNS+=run-memory-replay + # Running QEMU_OPTS+=-device isa-debugcon,chardev=output -device isa-debug-exit,iobase=0xf4,iosize=0x4 -kernel
Re: [PATCH 07/11] replay: make has_unread_data a bool
On 5/12/23 21:41, Alex Bennée wrote: For clarity given it only has two states. Signed-off-by: Alex Bennée --- replay/replay-internal.h | 4 ++-- replay/replay-internal.c | 4 ++-- replay/replay-snapshot.c | 6 +++--- replay/replay.c | 2 +- 4 files changed, 8 insertions(+), 8 deletions(-) Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH 05/11] replay: remove host_clock_last
On 5/12/23 21:41, Alex Bennée wrote: Fixes: a02fe2ca70 (replay: Remove host_clock_last) Signed-off-by: Alex Bennée --- replay/replay-internal.h | 2 -- 1 file changed, 2 deletions(-) Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH 08/11] replay: introduce a central report point for sync errors
Hi Alex, On 5/12/23 21:41, Alex Bennée wrote: Figuring out why replay has failed is tricky at the best of times. Lets centralise the reporting of a replay sync error and add a little bit of extra information to help with debugging. Signed-off-by: Alex Bennée --- replay/replay-internal.h | 12 replay/replay-char.c | 6 ++ replay/replay-internal.c | 1 + replay/replay.c | 9 + 4 files changed, 24 insertions(+), 4 deletions(-) diff --git a/replay/replay-internal.h b/replay/replay-internal.h index 1bc8fd5086..709e2eb4cb 100644 --- a/replay/replay-internal.h +++ b/replay/replay-internal.h @@ -74,6 +74,7 @@ enum ReplayEvents { * @cached_clock: Cached clocks values * @current_icount: number of processed instructions * @instruction_count: number of instructions until next event + * @current_event: current event index * @data_kind: current event * @has_unread_data: true if event not yet processed * @file_offset: offset into replay log at replay snapshot @@ -84,6 +85,7 @@ typedef struct ReplayState { int64_t cached_clock[REPLAY_CLOCK_COUNT]; uint64_t current_icount; int instruction_count; +unsigned int current_event; unsigned int data_kind; bool has_unread_data; uint64_t file_offset; Shouldn't this field be migrated?
Re: [PATCH v2 for-8.2?] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit
Hi Michael, (Cc'ing Lara, Vitaly and Maxim) On 5/12/23 23:28, Michael Roth wrote: Commit 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors") added error checking for KVM_SET_SREGS/KVM_SET_SREGS2. In doing so, it exposed a long-running bug in current KVM support for SEV-ES where the kernel assumes that MSR_EFER_LMA will be set explicitly by the guest kernel, in which case EFER write traps would result in KVM eventually seeing MSR_EFER_LMA get set and recording it in such a way that it would be subsequently visible when accessing it via KVM_GET_SREGS/etc. However, guests kernels currently rely on MSR_EFER_LMA getting set automatically when MSR_EFER_LME is set and paging is enabled via CR0_PG_MASK. As a result, the EFER write traps don't actually expose the MSR_EFER_LMA even though it is set internally, and when QEMU subsequently tries to pass this EFER value back to KVM via KVM_SET_SREGS* it will fail various sanity checks and return -EINVAL, which is now considered fatal due to the aforementioned QEMU commit. This can be addressed by inferring the MSR_EFER_LMA bit being set when paging is enabled and MSR_EFER_LME is set, and synthesizing it to ensure the expected bits are all present in subsequent handling on the host side. Ultimately, this handling will be implemented in the host kernel, but to avoid breaking QEMU's SEV-ES support when using older host kernels, the same handling can be done in QEMU just after fetching the register values via KVM_GET_SREGS*. Implement that here. Cc: Paolo Bonzini Cc: Marcelo Tosatti Cc: Tom Lendacky Cc: Akihiko Odaki Cc: k...@vger.kernel.org Fixes: 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors") This 'Fixes:' tag is misleading, since as you mentioned this commit only exposes the issue. Commit d499f196fe ("target/i386: Added consistency checks for EFER") or around it seems more appropriate. Is this feature easily testable on our CI, on a x86 runner with KVM access? Signed-off-by: Michael Roth --- v2: - Add handling for KVM_GET_SREGS, not just KVM_GET_SREGS2 target/i386/kvm/kvm.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 11b8177eff..8721c1bf8f 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -3610,6 +3610,7 @@ static int kvm_get_sregs(X86CPU *cpu) { CPUX86State *env = &cpu->env; struct kvm_sregs sregs; +target_ulong cr0_old; int ret; ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS, &sregs); @@ -3637,12 +3638,18 @@ static int kvm_get_sregs(X86CPU *cpu) env->gdt.limit = sregs.gdt.limit; env->gdt.base = sregs.gdt.base; +cr0_old = env->cr[0]; env->cr[0] = sregs.cr0; env->cr[2] = sregs.cr2; env->cr[3] = sregs.cr3; env->cr[4] = sregs.cr4; env->efer = sregs.efer; +if (sev_es_enabled() && env->efer & MSR_EFER_LME) { +if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) { +env->efer |= MSR_EFER_LMA; +} +} /* changes to apic base and cr8/tpr are read back via kvm_arch_post_run */ x86_update_hflags(env); @@ -3654,6 +3661,7 @@ static int kvm_get_sregs2(X86CPU *cpu) { CPUX86State *env = &cpu->env; struct kvm_sregs2 sregs; +target_ulong cr0_old; int i, ret; ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS2, &sregs); @@ -3676,12 +3684,18 @@ static int kvm_get_sregs2(X86CPU *cpu) env->gdt.limit = sregs.gdt.limit; env->gdt.base = sregs.gdt.base; +cr0_old = env->cr[0]; env->cr[0] = sregs.cr0; env->cr[2] = sregs.cr2; env->cr[3] = sregs.cr3; env->cr[4] = sregs.cr4; env->efer = sregs.efer; +if (sev_es_enabled() && env->efer & MSR_EFER_LME) { +if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) { +env->efer |= MSR_EFER_LMA; +} +} env->pdptrs_valid = sregs.flags & KVM_SREGS2_FLAGS_PDPTRS_VALID;
Re: [PATCH 06/11] replay: add proper kdoc for ReplayState
Philippe Mathieu-Daudé writes: > On 5/12/23 21:41, Alex Bennée wrote: >> Remove the non-standard comment formatting and move the descriptions >> into a proper kdoc comment. >> Signed-off-by: Alex Bennée >> --- >> replay/replay-internal.h | 27 -- > > ~~~ > >> roms/SLOF | 2 +- >> tests/tcg/i386/Makefile.softmmu-target | 19 ++ >> 3 files changed, 37 insertions(+), 11 deletions(-) >> diff --git a/replay/replay-internal.h b/replay/replay-internal.h >> index 516147ddbc..98ca3748ed 100644 >> --- a/replay/replay-internal.h >> +++ b/replay/replay-internal.h >> @@ -63,24 +63,31 @@ enum ReplayEvents { >> EVENT_COUNT >> }; >> +/** >> + * typedef ReplayState - global tracking Replay state >> + * >> + * This structure tracks where we are in the current ReplayState >> + * including the logged events from the recorded replay stream. Some >> + * of the data is also stored/restored from VMStateDescription when VM >> + * save/restore events take place. >> + * >> + * @cached_clock: Cached clocks values >> + * @current_icount: number of processed instructions >> + * @instruction_count: number of instructions until next event >> + * @data_kind: current event >> + * @has_unread_data: 1 if event not yet processed >> + * @file_offset: offset into replay log at replay snapshot >> + * @block_request_id: current serialised block request id >> + * @read_event_id: current async read event id >> + */ >> typedef struct ReplayState { >> -/*! Cached clock values. */ >> int64_t cached_clock[REPLAY_CLOCK_COUNT]; >> -/*! Current icount - number of processed instructions. */ >> uint64_t current_icount; >> -/*! Number of instructions to be executed before other events happen. */ >> int instruction_count; >> -/*! Type of the currently executed event. */ >> unsigned int data_kind; >> -/*! Flag which indicates that event is not processed yet. */ >> unsigned int has_unread_data; >> -/*! Temporary variable for saving current log offset. */ >> uint64_t file_offset; >> -/*! Next block operation id. >> -This counter is global, because requests from different >> -block devices should not get overlapping ids. */ >> uint64_t block_request_id; >> -/*! Asynchronous event id read from the log */ >> uint64_t read_event_id; >> } ReplayState; >> extern ReplayState replay_state; > > Up to here: > Reviewed-by: Philippe Mathieu-Daudé > > The rest doesn't belong to this commit: Oops, I missed that when rushing this out last night... will delete. -- Alex Bennée Virtualisation Tech Lead @ Linaro
Re: [RFC PATCH-for-8.2?] accel/tcg: Implement tcg_unregister_thread()
On Tue, Dec 05, 2023 at 11:09:59AM +0100, Michal Suchánek wrote: > On Mon, Dec 04, 2023 at 03:02:45PM -0800, Richard Henderson wrote: > > On 12/4/23 12:09, Michal Suchánek wrote: > > > On Mon, Dec 04, 2023 at 02:50:17PM -0500, Stefan Hajnoczi wrote: > > > > On Mon, 4 Dec 2023 at 14:40, Philippe Mathieu-Daudé > > > > wrote: > > > > > +void tcg_unregister_thread(void) > > > > > +{ > > > > > +unsigned int n; > > > > > + > > > > > +n = qatomic_fetch_dec(&tcg_cur_ctxs); > > > > > +g_free(tcg_ctxs[n]); > > > > > +qatomic_set(&tcg_ctxs[n], NULL); > > > > > +} > > > > > > > > tcg_ctxs[n] may not be our context, so this looks like it could free > > > > another thread's context and lead to undefined behavior. > > > > Correct. > > > > > There is cpu->thread_id so perhaps cpu->thread_ctx could be added as > > > well. That would require a bitmap of used threads contexts rather than a > > > counter, though. > > > > Or don't free the context at all, but re-use it when incrementing and > > tcg_ctxs[n] != null (i.e. plugging in a repacement vcpu). After all, there > > can only be tcg_max_ctxs contexts. > > But you would not know which contexts are in use and which aren't without > tracking the association of contexts to CPUs. > > Unless there is a cpu array somewhere and you can use the same index for > both to create the association. I tried to use cpu_index for correlating the tcg_ctx with cpu. I added some asserts that only null contexts are allocated and non-null contexts released but qemu crashes somewhere in tcg sometime after the guest gets to switch root. Thanks Michal
Re: [PATCH v2 for-8.2?] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit
On Wed, Dec 06, 2023 at 12:48:35PM +0100, Philippe Mathieu-Daudé wrote: > Hi Michael, > > (Cc'ing Lara, Vitaly and Maxim) > > On 5/12/23 23:28, Michael Roth wrote: > > Commit 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors") > > added error checking for KVM_SET_SREGS/KVM_SET_SREGS2. In doing so, it > > exposed a long-running bug in current KVM support for SEV-ES where the > > kernel assumes that MSR_EFER_LMA will be set explicitly by the guest > > kernel, in which case EFER write traps would result in KVM eventually > > seeing MSR_EFER_LMA get set and recording it in such a way that it would > > be subsequently visible when accessing it via KVM_GET_SREGS/etc. > > > > However, guests kernels currently rely on MSR_EFER_LMA getting set > > automatically when MSR_EFER_LME is set and paging is enabled via > > CR0_PG_MASK. As a result, the EFER write traps don't actually expose the > > MSR_EFER_LMA even though it is set internally, and when QEMU > > subsequently tries to pass this EFER value back to KVM via > > KVM_SET_SREGS* it will fail various sanity checks and return -EINVAL, > > which is now considered fatal due to the aforementioned QEMU commit. > > > > This can be addressed by inferring the MSR_EFER_LMA bit being set when > > paging is enabled and MSR_EFER_LME is set, and synthesizing it to ensure > > the expected bits are all present in subsequent handling on the host > > side. > > > > Ultimately, this handling will be implemented in the host kernel, but to > > avoid breaking QEMU's SEV-ES support when using older host kernels, the > > same handling can be done in QEMU just after fetching the register > > values via KVM_GET_SREGS*. Implement that here. > > > > Cc: Paolo Bonzini > > Cc: Marcelo Tosatti > > Cc: Tom Lendacky > > Cc: Akihiko Odaki > > Cc: k...@vger.kernel.org > > Fixes: 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors") > Hi Philippe, > This 'Fixes:' tag is misleading, since as you mentioned this commit > only exposes the issue. That's true, a "Workaround-for: " tag or something like that might be more appropriate. I just wanted to make it clear that SEV-ES support is no longer working with that patch applied, so I used Fixes: and elaborated on the commit message. I can change it if there's a better way to convey this though. > > Commit d499f196fe ("target/i386: Added consistency checks for EFER") > or around it seems more appropriate. Those checks seem to be more for TCG. The actual bug is in the host kernel, and it seems to have been there basically since the original SEV-ES host support went in in 2020. I've also sent a patch to address this in KVM: https://lore.kernel.org/lkml/20231205234956.1156210-1-michael.r...@amd.com/T/#u but in the meantime it means that QEMU 8.2+ SEV-ES support would no longer work for any current/older host kernels, so I'm hoping a targeted workaround is warranted to cover that gap. > > Is this feature easily testable on our CI, on a x86 runner with KVM > access? SEV-ES support was introduced with EPYC Zen2 architecture (EPYC 7002 series processors, aka "Rome"). If there are any systems in the test pool that are Zen2 or greater, then a simple boot of a SEV-ES linux guest would be enough to trigger the QEMU crash. I'm not sure if there are any systems of that sort in the pool though. Thanks, Mike
[PATCH] target/i386: Add new CPU model SierraForest
SierraForest is Intel's first generation E-core based Xeon server processor, which will be released in the first half of 2024. SierraForest mainly adds the following new features based on GraniteRapids: - CMPCCXADD CPUID.(EAX=7,ECX=1):EAX[bit 7] - AVX-IFMA CPUID.(EAX=7,ECX=1):EAX[bit 23] - AVX-VNNI-INT8 CPUID.(EAX=7,ECX=1):EDX[bit 4] - AVX-NE-CONVERT CPUID.(EAX=7,ECX=1):EDX[bit 5] - LAM CPUID.(EAX=7,ECX=1):EAX[bit 26] - LASS CPUID.(EAX=7,ECX=1):EAX[bit 6] and removes the following features based on GraniteRapids: - HLE CPUID.(EAX=7,ECX=0):EBX[bit 4] - RTM CPUID.(EAX=7,ECX=0):EBX[bit 11] - AVX512F CPUID.(EAX=7,ECX=0):EBX[bit 16] - AVX512DQ CPUID.(EAX=7,ECX=0):EBX[bit 17] - AVX512_IFMA CPUID.(EAX=7,ECX=0):EBX[bit 21] - AVX512CD CPUID.(EAX=7,ECX=0):EBX[bit 28] - AVX512BW CPUID.(EAX=7,ECX=0):EBX[bit 30] - AVX512VL CPUID.(EAX=7,ECX=0):EBX[bit 31] - AVX512_VBMI CPUID.(EAX=7,ECX=0):ECX[bit 1] - AVX512_VBMI2 CPUID.(EAX=7,ECX=0):ECX[bit 6] - AVX512_VNNI CPUID.(EAX=7,ECX=0):ECX[bit 11] - AVX512_BITALG CPUID.(EAX=7,ECX=0):ECX[bit 12] - AVX512_VPOPCNTDQ CPUID.(EAX=7,ECX=0):ECX[bit 14] - LA57 CPUID.(EAX=7,ECX=0):ECX[bit 16] - TSXLDTRK CPUID.(EAX=7,ECX=0):EDX[bit 16] - AMX-BF16 CPUID.(EAX=7,ECX=0):EDX[bit 22] - AVX512_FP16 CPUID.(EAX=7,ECX=0):EDX[bit 23] - AMX-TILE CPUID.(EAX=7,ECX=0):EDX[bit 24] - AMX-INT8 CPUID.(EAX=7,ECX=0):EDX[bit 25] - AVX512_BF16 CPUID.(EAX=7,ECX=1):EAX[bit 5] - fast zero-length MOVSB CPUID.(EAX=7,ECX=1):EAX[bit 10] - fast short CMPSB, SCASB CPUID.(EAX=7,ECX=1):EAX[bit 12] - AMX-FP16 CPUID.(EAX=7,ECX=1):EAX[bit 21] - PREFETCHI CPUID.(EAX=7,ECX=1):EDX[bit 14] - XFD CPUID.(EAX=0xD,ECX=1):EAX[bit 4] - EPT_PAGE_WALK_LENGTH_5 VMX_EPT_VPID_CAP(0x48c)[bit 7] SierraForest doesn't support TSX, so TSX Async Abort(TAA) vulnerabilities don't exist on SierraForest. On KVM side, if host doesn't enumerate RTM or RTM gets disabled, ARCH_CAP_TAA_NO is reported as unsupported. To avoid the confusing warning: warning: host doesn't support requested feature: MSR(10AH).taa-no [bit 8] just don't include TAA_NO in SierraForest CPU model. Currently LAM and LASS are not enabled in KVM mainline yet, will add them after merged. Signed-off-by: Tao Su --- The new features can be found in Intel ISE[1]. LAM has just been accepted by KVM[2]. Although we would like to include all SierraForest features in the first version of the CPU model, SierraForest will be released in the first half of 2024[3], we would want user can have a first usable SierraForest CPU model in the QEMU when they have the hardware in their hand. [1] https://cdrdv2.intel.com/v1/dl/getContent/671368 [2] https://lore.kernel.org/all/169810442917.2499338.3440694989716170017.b4...@google.com/ [3] https://www.intel.com/content/www/us/en/newsroom/news/tackling-throughput-computing-sierra-forest.html --- target/i386/cpu.c | 126 ++ 1 file changed, 126 insertions(+) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index cd16cb893d..2405c9e407 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -4099,6 +4099,132 @@ static const X86CPUDefinition builtin_x86_defs[] = { { /* end of list */ }, }, }, +{ +.name = "SierraForest", +.level = 0x23, +.vendor = CPUID_VENDOR_INTEL, +.family = 6, +.model = 175, +.stepping = 0, +/* + * please keep the ascending order so that we can have a clear view of + * bit position of each feature. + */ +.features[FEAT_1_EDX] = +CPUID_FP87 | CPUID_VME | CPUID_DE | CPUID_PSE | CPUID_TSC | +CPUID_MSR | CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC | +CPUID_SEP | CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV | +CPUID_PAT | CPUID_PSE36 | CPUID_CLFLUSH | CPUID_MMX | CPUID_FXSR | +CPUID_SSE | CPUID_SSE2, +.features[FEAT_1_ECX] = +CPUID_EXT_SSE3 | CPUID_EXT_PCLMULQDQ | CPUID_EXT_SSSE3 | +CPUID_EXT_FMA | CPUID_EXT_CX16 | CPUID_EXT_PCID | CPUID_EXT_SSE41 | +CPUID_EXT_SSE42 | CPUID_EXT_X2APIC | CPUID_EXT_MOVBE | +CPUID_EXT_POPCNT | CPUID_EXT_TSC_DEADLINE_TIMER | CPUID_EXT_AES | +CPUID_EXT_XSAVE | CPUID_EXT_AVX | CPUID_EXT_F16C | CPUID_EXT_RDRAND, +.features[FEAT_8000_0001_EDX] = +CPUID_EXT2_SYSCALL | CPUID_EXT2_NX | CPUID_EXT2_PDPE1GB | +CPUID_EXT2_RDTSCP | CPUID_EXT2_LM, +.features[FEAT_8000_0001_ECX] = +CPUID_EXT3_LAHF_LM | CPUID_EXT3_ABM | CPUID_EXT3_3DNOWPREFETCH, +.features[FEAT_8000_0008_EBX] = +CPUID_8000_0008_EBX_WBNOINVD, +.features[FEAT_7_0_EBX] = +CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_AVX2 | +CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_ERMS | +CPUID_7_0_EBX_INVPCID | CPUID_7_0_EBX_RDSEED | CPUID_7_0_EBX_ADX | +CPUID_7_0_EBX_SMAP | CPUID_7_0_EBX_CLFLUSHOPT | CPUID_7_0
Re: [PATCH v2 for-8.2?] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit
On Tue, Dec 5, 2023 at 11:28 PM Michael Roth wrote: > @@ -3637,12 +3638,18 @@ static int kvm_get_sregs(X86CPU *cpu) > env->gdt.limit = sregs.gdt.limit; > env->gdt.base = sregs.gdt.base; > > +cr0_old = env->cr[0]; > env->cr[0] = sregs.cr0; > env->cr[2] = sregs.cr2; > env->cr[3] = sregs.cr3; > env->cr[4] = sregs.cr4; > > env->efer = sregs.efer; > +if (sev_es_enabled() && env->efer & MSR_EFER_LME) { > +if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) { > +env->efer |= MSR_EFER_LMA; > +} > +} There is no need to check cr0_old or sev_es_enabled(); EFER.LMA is simply EFER.LME && CR0.PG. Alternatively, sev_es_enabled() could be an assertion, that is: if ((env->efer & MSR_EFER_LME) && (env->cr[0] & CR0_PG_MASK) && !(env->efer & MSR_EFER_LMA)) { /* Workaround for... */ assert(sev_es_enabled()); env->efer |= MSR_EFER_LMA; } What do you think? Thanks, Paolo > /* changes to apic base and cr8/tpr are read back via kvm_arch_post_run > */ > x86_update_hflags(env); > @@ -3654,6 +3661,7 @@ static int kvm_get_sregs2(X86CPU *cpu) > { > CPUX86State *env = &cpu->env; > struct kvm_sregs2 sregs; > +target_ulong cr0_old; > int i, ret; > > ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS2, &sregs); > @@ -3676,12 +3684,18 @@ static int kvm_get_sregs2(X86CPU *cpu) > env->gdt.limit = sregs.gdt.limit; > env->gdt.base = sregs.gdt.base; > > +cr0_old = env->cr[0]; > env->cr[0] = sregs.cr0; > env->cr[2] = sregs.cr2; > env->cr[3] = sregs.cr3; > env->cr[4] = sregs.cr4; > > env->efer = sregs.efer; > +if (sev_es_enabled() && env->efer & MSR_EFER_LME) { > +if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) { > +env->efer |= MSR_EFER_LMA; > +} > +} > > env->pdptrs_valid = sregs.flags & KVM_SREGS2_FLAGS_PDPTRS_VALID; > > -- > 2.25.1 >
Re: [PATCH v2 for-8.2?] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit
On Wed, Dec 6, 2023 at 2:13 PM Michael Roth wrote: > > This 'Fixes:' tag is misleading, since as you mentioned this commit > > only exposes the issue. > > That's true, a "Workaround-for: " tag or something like that might be more > appropriate. I just wanted to make it clear that SEV-ES support is no longer > working with that patch applied, so I used Fixes: and elaborated on the > commit message. I can change it if there's a better way to convey this > though. That's fine, Fixes is also for automated checks, like "if you have this commit you also want this one". > > > > Commit d499f196fe ("target/i386: Added consistency checks for EFER") > > or around it seems more appropriate. > > Those checks seem to be more for TCG. Yes, that's 100% TCG code. > The actual bug is in the host > kernel, and it seems to have been there basically since the original > SEV-ES host support went in in 2020. I've also sent a patch to address > this in KVM: > > > https://lore.kernel.org/lkml/20231205234956.1156210-1-michael.r...@amd.com/T/#u Thanks, looking at it. Paolo
[PATCH] [PATCH v4] ui/gtk-clipboard: async owner_change clipboard_request
Previous implementation of both functions was blocking and caused guest freezes / crashes on host clipboard owner change. * use callbacks instead of waiting for GTK to deliver clipboard content type evaluation and contents * evaluate a serial in the info struct to discard old events Fixes: d11ebe2ca257 ("ui/gtk: add clipboard support") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1150 Signed-off-by: Edmund Raile --- Gitlab user kolAflash is to credit for determining that the main issue of the QEMU-UI-GTK clipboard is the call to the blocking function gtk_clipboard_wait_is_text_available in gd_owner_change, causing guests to freeze / crash when GTK takes too long. Marc-André Lureau suggested: * gd_clipboard_request might express the same issue due to using gtk_clipboard_wait_for_text * the callbacks could use the QemuClipboardInfo struct's serial field to discard old events * storing the serial for the owner change callback inside the GtkDisplay struct This patch implements asynchronous gd_clipboard_request and gd_owner_change with serial checking. What I haven't implemented is gd_clipboard_notify's QEMU_CLIPBOARD_RESET_SERIAL handling, I don't know how to. Please help me test this patch. The issue mentions the conditions, so far it has been stable. Note that you will need to build QEMU with `enable-gtk-clipboard`. command line options for qemu-vdagent: -device virtio-serial,packed=on,ioeventfd=on \ -device virtserialport,name=com.redhat.spice.0,chardev=vdagent0 \ -chardev qemu-vdagent,id=vdagent0,name=vdagent,clipboard=on,mouse=off \ The guests spice-vdagent user service may have to be started manually. If testing is sufficient and shows no way to break this, we could undo or modify 29e0bfffab87d89c65c0890607e203b1579590a3 to have the GTK UI's clipboard built-in by default again. Previous threads: * https://lists.gnu.org/archive/html/qemu-devel/2023-09/msg06027.html * https://lists.gnu.org/archive/html/qemu-devel/2023-10/msg04397.html * https://lists.gnu.org/archive/html/qemu-devel/2023-10/msg05755.html include/ui/gtk.h | 1 + ui/gtk-clipboard.c | 79 ++ 2 files changed, 66 insertions(+), 14 deletions(-) diff --git a/include/ui/gtk.h b/include/ui/gtk.h index aa3d637029..ac44609770 100644 --- a/include/ui/gtk.h +++ b/include/ui/gtk.h @@ -147,6 +147,7 @@ struct GtkDisplayState { uint32_t cbpending[QEMU_CLIPBOARD_SELECTION__COUNT]; GtkClipboard *gtkcb[QEMU_CLIPBOARD_SELECTION__COUNT]; bool cbowner[QEMU_CLIPBOARD_SELECTION__COUNT]; +uint32_t cb_serial_owner_change; DisplayOptions *opts; }; diff --git a/ui/gtk-clipboard.c b/ui/gtk-clipboard.c index 8d8a636fd1..6b2c32abee 100644 --- a/ui/gtk-clipboard.c +++ b/ui/gtk-clipboard.c @@ -133,26 +133,81 @@ static void gd_clipboard_notify(Notifier *notifier, void *data) } } +/* + * asynchronous clipboard text transfer callback + * called when host (gtk) is ready to deliver to guest + */ +static void gd_clipboard_request_text_callback +(GtkClipboard *clipboard, const gchar *text, gpointer data) +{ +QemuClipboardInfo *info = data; + +if (!text || !qemu_clipboard_check_serial(info, true)) { +qemu_clipboard_info_unref(info); +return; +} + +qemu_clipboard_set_data(info->owner, info, QEMU_CLIPBOARD_TYPE_TEXT, +strlen(text), text, true); +qemu_clipboard_info_unref(info); +} + +/* + * asynchronous clipboard data transfer initiator + * guest requests, host delivers when ready + */ static void gd_clipboard_request(QemuClipboardInfo *info, QemuClipboardType type) { GtkDisplayState *gd = container_of(info->owner, GtkDisplayState, cbpeer); -char *text; switch (type) { case QEMU_CLIPBOARD_TYPE_TEXT: -text = gtk_clipboard_wait_for_text(gd->gtkcb[info->selection]); -if (text) { -qemu_clipboard_set_data(&gd->cbpeer, info, type, -strlen(text), text, true); -g_free(text); -} +gtk_clipboard_request_text +(gd->gtkcb[info->selection], + gd_clipboard_request_text_callback, info); break; default: break; } } +/* + * asynchronous clipboard text availability notification callback + * called when host (gtk) is ready to notify guest + */ +static void gd_owner_change_text_callback +(GtkClipboard *clipboard, const gchar *text, gpointer data) +{ +QemuClipboardInfo *info = data; +GtkDisplayState *gd = container_of(info->owner, GtkDisplayState, cbpeer); + +/* + * performing the subtraction of uints as ints + * is a neat trick to guard against rollover issues + */ +if (!text || +(((int32_t)(info->serial - gd->cb_serial_owner_change)) < 0)) +{ +goto end; +} +gd->cb_serial_owner_change = info->serial; + +info->types[QEMU_CLIPBOARD_TYPE_TEXT].available = t
Re: [RFC PATCH-for-8.2?] accel/tcg: Implement tcg_unregister_thread()
Hi! On 04/12/2023 18:40, Philippe Mathieu-Daudé wrote: > Unplugging vCPU triggers the following assertion in > tcg_register_thread(): > > 796 void tcg_register_thread(void) > 797 { > ... > 812 /* Claim an entry in tcg_ctxs */ > 813 n = qatomic_fetch_inc(&tcg_cur_ctxs); > 814 g_assert(n < tcg_max_ctxs); > > Implement and use tcg_unregister_thread() so when a > vCPU is unplugged, the tcg_cur_ctxs refcount is > decremented. I've had addressed this issue before (posted at [1]) and had exercised it with vCPU hotplug/unplug patches for ARM although I was not sure about what would be needed to be done regarding plugins on the context of tcg_unregister_thread. I guess they would need to be freed also? > Reported-by: Michal Suchánek > Suggested-by: Stefan Hajnoczi > Signed-off-by: Philippe Mathieu-Daudé > --- > RFC: untested > Report: > https://lore.kernel.org/qemu-devel/20231204183638.gz9...@kitsune.suse.cz/ > --- > include/tcg/startup.h | 5 + > accel/tcg/tcg-accel-ops-mttcg.c | 1 + > accel/tcg/tcg-accel-ops-rr.c| 1 + > tcg/tcg.c | 17 + > 4 files changed, 24 insertions(+) > > diff --git a/include/tcg/startup.h b/include/tcg/startup.h > index f71305765c..520942a4a1 100644 > --- a/include/tcg/startup.h > +++ b/include/tcg/startup.h > @@ -45,6 +45,11 @@ void tcg_init(size_t tb_size, int splitwx, unsigned > max_cpus); > */ > void tcg_register_thread(void); > > +/** > + * tcg_unregister_thread: Unregister this thread with the TCG runtime > + */ > +void tcg_unregister_thread(void); > + > /** > * tcg_prologue_init(): Generate the code for the TCG prologue > * > diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c > index fac80095bb..88d7427aad 100644 > --- a/accel/tcg/tcg-accel-ops-mttcg.c > +++ b/accel/tcg/tcg-accel-ops-mttcg.c > @@ -120,6 +120,7 @@ static void *mttcg_cpu_thread_fn(void *arg) > > tcg_cpus_destroy(cpu); > qemu_mutex_unlock_iothread(); > +tcg_unregister_thread(); > rcu_remove_force_rcu_notifier(&force_rcu.notifier); > rcu_unregister_thread(); > return NULL; > diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c > index 611932f3c3..c2af3aad21 100644 > --- a/accel/tcg/tcg-accel-ops-rr.c > +++ b/accel/tcg/tcg-accel-ops-rr.c > @@ -302,6 +302,7 @@ static void *rr_cpu_thread_fn(void *arg) > rr_deal_with_unplugged_cpus(); > } > > +tcg_unregister_thread(); > rcu_remove_force_rcu_notifier(&force_rcu); > rcu_unregister_thread(); > return NULL; > diff --git a/tcg/tcg.c b/tcg/tcg.c > index d2ea22b397..5125342d70 100644 > --- a/tcg/tcg.c > +++ b/tcg/tcg.c > @@ -781,11 +781,18 @@ static void alloc_tcg_plugin_context(TCGContext *s) > * modes. > */ > #ifdef CONFIG_USER_ONLY > + > void tcg_register_thread(void) > { > tcg_ctx = &tcg_init_ctx; > } > + > +void tcg_unregister_thread(void) > +{ > +} > + > #else > + > void tcg_register_thread(void) > { > TCGContext *s = g_malloc(sizeof(*s)); > @@ -814,6 +821,16 @@ void tcg_register_thread(void) > > tcg_ctx = s; > } > + > +void tcg_unregister_thread(void) > +{ > +unsigned int n; > + > +n = qatomic_fetch_dec(&tcg_cur_ctxs); > +g_free(tcg_ctxs[n]); Is the above off-by-one? > +qatomic_set(&tcg_ctxs[n], NULL); > +} > + Thank you Miguel [1]: https://lore.kernel.org/qemu-devel/20230926103654.34424-5-salil.me...@huawei.com/ > #endif /* !CONFIG_USER_ONLY */ > > /* pool based memory allocation */
[PATCH v4] ui/gtk-clipboard: async owner_change clipboard_request
Previous implementation of both functions was blocking and caused guest freezes / crashes on host clipboard owner change. * use callbacks instead of waiting for GTK to deliver clipboard content type evaluation and contents * evaluate a serial in the info struct to discard old events Fixes: d11ebe2ca257 ("ui/gtk: add clipboard support") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1150 Signed-off-by: Edmund Raile --- Gitlab user kolAflash is to credit for determining that the main issue of the QEMU-UI-GTK clipboard is the call to the blocking function gtk_clipboard_wait_is_text_available in gd_owner_change, causing guests to freeze / crash when GTK takes too long. Marc-André Lureau suggested: * gd_clipboard_request might express the same issue due to using gtk_clipboard_wait_for_text * the callbacks could use the QemuClipboardInfo struct's serial field to discard old events * storing the serial for the owner change callback inside the GtkDisplay struct This patch implements asynchronous gd_clipboard_request and gd_owner_change with serial checking. What I haven't implemented is gd_clipboard_notify's QEMU_CLIPBOARD_RESET_SERIAL handling, I don't know how to. Please help me test this patch. The issue mentions the conditions, so far it has been stable. Note that you will need to build QEMU with `enable-gtk-clipboard`. command line options for qemu-vdagent: -device virtio-serial,packed=on,ioeventfd=on \ -device virtserialport,name=com.redhat.spice.0,chardev=vdagent0 \ -chardev qemu-vdagent,id=vdagent0,name=vdagent,clipboard=on,mouse=off \ The guests spice-vdagent user service may have to be started manually. If testing is sufficient and shows no way to break this, we could undo or modify 29e0bfffab87d89c65c0890607e203b1579590a3 to have the GTK UI's clipboard built-in by default again. Previous threads: * https://lists.gnu.org/archive/html/qemu-devel/2023-09/msg06027.html * https://lists.gnu.org/archive/html/qemu-devel/2023-10/msg04397.html * https://lists.gnu.org/archive/html/qemu-devel/2023-10/msg05755.html include/ui/gtk.h | 1 + ui/gtk-clipboard.c | 79 ++ 2 files changed, 66 insertions(+), 14 deletions(-) diff --git a/include/ui/gtk.h b/include/ui/gtk.h index aa3d637029..ac44609770 100644 --- a/include/ui/gtk.h +++ b/include/ui/gtk.h @@ -147,6 +147,7 @@ struct GtkDisplayState { uint32_t cbpending[QEMU_CLIPBOARD_SELECTION__COUNT]; GtkClipboard *gtkcb[QEMU_CLIPBOARD_SELECTION__COUNT]; bool cbowner[QEMU_CLIPBOARD_SELECTION__COUNT]; +uint32_t cb_serial_owner_change; DisplayOptions *opts; }; diff --git a/ui/gtk-clipboard.c b/ui/gtk-clipboard.c index 8d8a636fd1..6b2c32abee 100644 --- a/ui/gtk-clipboard.c +++ b/ui/gtk-clipboard.c @@ -133,26 +133,81 @@ static void gd_clipboard_notify(Notifier *notifier, void *data) } } +/* + * asynchronous clipboard text transfer callback + * called when host (gtk) is ready to deliver to guest + */ +static void gd_clipboard_request_text_callback +(GtkClipboard *clipboard, const gchar *text, gpointer data) +{ +QemuClipboardInfo *info = data; + +if (!text || !qemu_clipboard_check_serial(info, true)) { +qemu_clipboard_info_unref(info); +return; +} + +qemu_clipboard_set_data(info->owner, info, QEMU_CLIPBOARD_TYPE_TEXT, +strlen(text), text, true); +qemu_clipboard_info_unref(info); +} + +/* + * asynchronous clipboard data transfer initiator + * guest requests, host delivers when ready + */ static void gd_clipboard_request(QemuClipboardInfo *info, QemuClipboardType type) { GtkDisplayState *gd = container_of(info->owner, GtkDisplayState, cbpeer); -char *text; switch (type) { case QEMU_CLIPBOARD_TYPE_TEXT: -text = gtk_clipboard_wait_for_text(gd->gtkcb[info->selection]); -if (text) { -qemu_clipboard_set_data(&gd->cbpeer, info, type, -strlen(text), text, true); -g_free(text); -} +gtk_clipboard_request_text +(gd->gtkcb[info->selection], + gd_clipboard_request_text_callback, info); break; default: break; } } +/* + * asynchronous clipboard text availability notification callback + * called when host (gtk) is ready to notify guest + */ +static void gd_owner_change_text_callback +(GtkClipboard *clipboard, const gchar *text, gpointer data) +{ +QemuClipboardInfo *info = data; +GtkDisplayState *gd = container_of(info->owner, GtkDisplayState, cbpeer); + +/* + * performing the subtraction of uints as ints + * is a neat trick to guard against rollover issues + */ +if (!text || +(((int32_t)(info->serial - gd->cb_serial_owner_change)) < 0)) +{ +goto end; +} +gd->cb_serial_owner_change = info->serial; + +info->types[QEMU_CLIPBOARD_TYPE_TEXT].available =
Re: [PATCH 10/11] chardev: force write all when recording replay logs
On 5/12/23 21:41, Alex Bennée wrote: This is mostly a problem within avocado as serial generally isn't busy enough to overfill pipes. However the consequences of recording a failed write will haunt us on replay when causing the log to go out of sync. Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2010 Signed-off-by: Alex Bennée Cc: Pavel Dovgalyuk --- chardev/char.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/chardev/char.c b/chardev/char.c index 996a024c7a..6e5b4d7345 100644 --- a/chardev/char.c +++ b/chardev/char.c @@ -171,7 +171,8 @@ int qemu_chr_write(Chardev *s, const uint8_t *buf, int len, bool write_all) return res; } Please add a comment explaining why in the code. Maybe simpler using: if (replay_mode == REPLAY_MODE_RECORD) { /* $explanation */ write_all = true; } -res = qemu_chr_write_buffer(s, buf, len, &offset, write_all); +res = qemu_chr_write_buffer(s, buf, len, &offset, +replay_mode == REPLAY_MODE_RECORD ? true : write_all); With a comment: Reviewed-by: Philippe Mathieu-Daudé
Re: [RFC PATCH-for-8.2?] accel/tcg: Implement tcg_unregister_thread()
Hi Stefan, On 6/12/23 13:56, Michal Suchánek wrote: On Tue, Dec 05, 2023 at 11:09:59AM +0100, Michal Suchánek wrote: On Mon, Dec 04, 2023 at 03:02:45PM -0800, Richard Henderson wrote: On 12/4/23 12:09, Michal Suchánek wrote: On Mon, Dec 04, 2023 at 02:50:17PM -0500, Stefan Hajnoczi wrote: On Mon, 4 Dec 2023 at 14:40, Philippe Mathieu-Daudé wrote: +void tcg_unregister_thread(void) +{ +unsigned int n; + +n = qatomic_fetch_dec(&tcg_cur_ctxs); +g_free(tcg_ctxs[n]); +qatomic_set(&tcg_ctxs[n], NULL); +} tcg_ctxs[n] may not be our context, so this looks like it could free another thread's context and lead to undefined behavior. Correct. There is cpu->thread_id so perhaps cpu->thread_ctx could be added as well. That would require a bitmap of used threads contexts rather than a counter, though. Or don't free the context at all, but re-use it when incrementing and tcg_ctxs[n] != null (i.e. plugging in a repacement vcpu). After all, there can only be tcg_max_ctxs contexts. But you would not know which contexts are in use and which aren't without tracking the association of contexts to CPUs. Unless there is a cpu array somewhere and you can use the same index for both to create the association. I tried to use cpu_index for correlating the tcg_ctx with cpu. I added some asserts that only null contexts are allocated and non-null contexts released but qemu crashes somewhere in tcg sometime after the guest gets to switch root. Since this isn't trivial and is a long standing issue, let's not block the 8.2 release with it. Regards, Phil.
Re: [PATCH 1/1] hw/i2c: add pca9543 i2c-mux switch
On Tue, Dec 05, 2023 at 11:05:33AM -0800, Patrick Venture wrote: > On Tue, Nov 14, 2023 at 3:30 PM Corey Minyard wrote: > > > On Mon, Nov 13, 2023 at 02:31:56PM +0800, Potin Lai wrote: > > > Add pca9543 2-channel i2c-mux switch support. > > > > > > Signed-off-by: Potin Lai > > > > Looks good to me. > > > > Acked-by: Corey Minyard > > > > > --- > > > hw/i2c/i2c_mux_pca954x.c | 12 > > > include/hw/i2c/i2c_mux_pca954x.h | 1 + > > > 2 files changed, 13 insertions(+) > > > > > > diff --git a/hw/i2c/i2c_mux_pca954x.c b/hw/i2c/i2c_mux_pca954x.c > > > index db5db956a6..6aace0fc47 100644 > > > --- a/hw/i2c/i2c_mux_pca954x.c > > > +++ b/hw/i2c/i2c_mux_pca954x.c > > > @@ -30,6 +30,7 @@ > > > > > > #define PCA9548_CHANNEL_COUNT 8 > > > #define PCA9546_CHANNEL_COUNT 4 > > > +#define PCA9543_CHANNEL_COUNT 2 > > > > > > /* > > > * struct Pca954xState - The pca954x state object. > > > @@ -172,6 +173,12 @@ I2CBus *pca954x_i2c_get_bus(I2CSlave *mux, uint8_t > > channel) > > > return pca954x->bus[channel]; > > > } > > > > > > +static void pca9543_class_init(ObjectClass *klass, void *data) > > > +{ > > > +Pca954xClass *s = PCA954X_CLASS(klass); > > > +s->nchans = PCA9543_CHANNEL_COUNT; > > > +} > > > + > > > static void pca9546_class_init(ObjectClass *klass, void *data) > > > { > > > Pca954xClass *s = PCA954X_CLASS(klass); > > > @@ -246,6 +253,11 @@ static const TypeInfo pca954x_info[] = { > > > .class_init= pca954x_class_init, > > > .abstract = true, > > > }, > > > +{ > > > +.name = TYPE_PCA9543, > > > +.parent= TYPE_PCA954X, > > > +.class_init= pca9543_class_init, > > > +}, > > > { > > > .name = TYPE_PCA9546, > > > .parent= TYPE_PCA954X, > > > diff --git a/include/hw/i2c/i2c_mux_pca954x.h > > b/include/hw/i2c/i2c_mux_pca954x.h > > > index 3dd25ec983..1da5508ed5 100644 > > > --- a/include/hw/i2c/i2c_mux_pca954x.h > > > +++ b/include/hw/i2c/i2c_mux_pca954x.h > > > @@ -3,6 +3,7 @@ > > > > > > #include "hw/i2c/i2c.h" > > > > > > +#define TYPE_PCA9543 "pca9543" > > > #define TYPE_PCA9546 "pca9546" > > > #define TYPE_PCA9548 "pca9548" > > > > > > -- > > > 2.31.1 > > > > > > > > > Corey, can you include this in a pull email? I don't quite have that set up > at present, Titus is going to help me figure it out. Ok, I'm running tests now, I'll get a pull request in assuming everything is ok. -corey
Re: [PATCH v2 for-8.2?] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit
On Wed, Dec 06, 2023 at 02:41:13PM +0100, Paolo Bonzini wrote: > On Tue, Dec 5, 2023 at 11:28 PM Michael Roth wrote: > > @@ -3637,12 +3638,18 @@ static int kvm_get_sregs(X86CPU *cpu) > > env->gdt.limit = sregs.gdt.limit; > > env->gdt.base = sregs.gdt.base; > > > > +cr0_old = env->cr[0]; > > env->cr[0] = sregs.cr0; > > env->cr[2] = sregs.cr2; > > env->cr[3] = sregs.cr3; > > env->cr[4] = sregs.cr4; > > > > env->efer = sregs.efer; > > +if (sev_es_enabled() && env->efer & MSR_EFER_LME) { > > +if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) { > > +env->efer |= MSR_EFER_LMA; > > +} > > +} > > There is no need to check cr0_old or sev_es_enabled(); EFER.LMA is > simply EFER.LME && CR0.PG. Yah, I originally had it like that, but svm_set_cr0() in the kernel only sets it in vcpu->arch.efer it when setting CR0.PG, so I thought it might be safer to be more selective and mirror that handling on the QEMU side so we can leave as much of any other sanity checks on kernel/QEMU side intact as possible. E.g., if some other bug in the kernel ends up unsetting EFER.LMA while paging is still enabled, we'd still notice that when passing it back in via KVM_SET_SREGS*. But agree it's simpler to just always set it based on CR0.PG and EFER.LMA and can send a v3 if that's preferred. > > Alternatively, sev_es_enabled() could be an assertion, that is: > > if ((env->efer & MSR_EFER_LME) && (env->cr[0] & CR0_PG_MASK) && >!(env->efer & MSR_EFER_LMA)) { > /* Workaround for... */ > assert(sev_es_enabled()); > env->efer |= MSR_EFER_LMA; > } > > What do you think? I'm a little apprehensive about this approach for similar reasons as above. The current patch is guaranteed to only affect SEV-ES, whereas this approach could trigger assertions for other edge-cases we aren't aware of that could further impact the release. For instance, "in theory", QEMU or KVM might have some handling where EFER.LMA is set somewhere after (or outside of) KVM_GET_SREGS, but now with this proposed change QEMU would become more restrictive and generate an assert for those cases. I don't think that's actually the case, but in the off-chance that such a case exists there could be more fall-out such as further delays, or the need for a stable fix. But no strong opinion there either if that ends up being the preferred approach, just trying to consider the pros/cons. Thanks, Mike > > Thanks, > > Paolo > > > /* changes to apic base and cr8/tpr are read back via > > kvm_arch_post_run */ > > x86_update_hflags(env); > > @@ -3654,6 +3661,7 @@ static int kvm_get_sregs2(X86CPU *cpu) > > { > > CPUX86State *env = &cpu->env; > > struct kvm_sregs2 sregs; > > +target_ulong cr0_old; > > int i, ret; > > > > ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS2, &sregs); > > @@ -3676,12 +3684,18 @@ static int kvm_get_sregs2(X86CPU *cpu) > > env->gdt.limit = sregs.gdt.limit; > > env->gdt.base = sregs.gdt.base; > > > > +cr0_old = env->cr[0]; > > env->cr[0] = sregs.cr0; > > env->cr[2] = sregs.cr2; > > env->cr[3] = sregs.cr3; > > env->cr[4] = sregs.cr4; > > > > env->efer = sregs.efer; > > +if (sev_es_enabled() && env->efer & MSR_EFER_LME) { > > +if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) { > > +env->efer |= MSR_EFER_LMA; > > +} > > +} > > > > env->pdptrs_valid = sregs.flags & KVM_SREGS2_FLAGS_PDPTRS_VALID; > > > > -- > > 2.25.1 > > >
Re: [PATCH v2 for-8.2?] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit
On Wed, Dec 6, 2023 at 3:46 PM Michael Roth wrote: > > There is no need to check cr0_old or sev_es_enabled(); EFER.LMA is > > simply EFER.LME && CR0.PG. > > Yah, I originally had it like that, but svm_set_cr0() in the kernel only > sets it in vcpu->arch.efer it when setting CR0.PG, so I thought it might > be safer to be more selective and mirror that handling on the QEMU side > so we can leave as much of any other sanity checks on kernel/QEMU side > intact as possible. E.g., if some other bug in the kernel ends up > unsetting EFER.LMA while paging is still enabled, we'd still notice that > when passing it back in via KVM_SET_SREGS*. > > But agree it's simpler to just always set it based on CR0.PG and EFER.LMA > and can send a v3 if that's preferred. Yeah, in this case I think the chance of something breaking is really, really small. The behavior of svm_set_cr0() is more due to how the surrounding code looks like, than anything else. > > Alternatively, sev_es_enabled() could be an assertion, that is: > > > > if ((env->efer & MSR_EFER_LME) && (env->cr[0] & CR0_PG_MASK) && > >!(env->efer & MSR_EFER_LMA)) { > > /* Workaround for... */ > > assert(sev_es_enabled()); > > env->efer |= MSR_EFER_LMA; > > } > > > > What do you think? > > I'm a little apprehensive about this approach for similar reasons as > above I agree on this. I think it's worth in general to have clear expectations, though. If you think it's worrisome, we can commit it without assertion now and add it in 9.0. Paolo
Re: [PATCH] target/i386: Add new CPU model SierraForest
On Wed, Dec 06, 2023 at 09:19:23PM +0800, Tao Su wrote: > Date: Wed, 6 Dec 2023 21:19:23 +0800 > From: Tao Su > Subject: [PATCH] target/i386: Add new CPU model SierraForest > X-Mailer: git-send-email 2.34.1 > > SierraForest is Intel's first generation E-core based Xeon server > processor, which will be released in the first half of 2024. > > SierraForest mainly adds the following new features based on > GraniteRapids: > > - CMPCCXADD CPUID.(EAX=7,ECX=1):EAX[bit 7] > - AVX-IFMA CPUID.(EAX=7,ECX=1):EAX[bit 23] > - AVX-VNNI-INT8 CPUID.(EAX=7,ECX=1):EDX[bit 4] > - AVX-NE-CONVERT CPUID.(EAX=7,ECX=1):EDX[bit 5] > - LAM CPUID.(EAX=7,ECX=1):EAX[bit 26] > - LASS CPUID.(EAX=7,ECX=1):EAX[bit 6] > > and removes the following features based on GraniteRapids: > > - HLE CPUID.(EAX=7,ECX=0):EBX[bit 4] > - RTM CPUID.(EAX=7,ECX=0):EBX[bit 11] > - AVX512F CPUID.(EAX=7,ECX=0):EBX[bit 16] > - AVX512DQ CPUID.(EAX=7,ECX=0):EBX[bit 17] > - AVX512_IFMA CPUID.(EAX=7,ECX=0):EBX[bit 21] > - AVX512CD CPUID.(EAX=7,ECX=0):EBX[bit 28] > - AVX512BW CPUID.(EAX=7,ECX=0):EBX[bit 30] > - AVX512VL CPUID.(EAX=7,ECX=0):EBX[bit 31] > - AVX512_VBMI CPUID.(EAX=7,ECX=0):ECX[bit 1] > - AVX512_VBMI2 CPUID.(EAX=7,ECX=0):ECX[bit 6] > - AVX512_VNNI CPUID.(EAX=7,ECX=0):ECX[bit 11] > - AVX512_BITALG CPUID.(EAX=7,ECX=0):ECX[bit 12] > - AVX512_VPOPCNTDQ CPUID.(EAX=7,ECX=0):ECX[bit 14] > - LA57 CPUID.(EAX=7,ECX=0):ECX[bit 16] > - TSXLDTRK CPUID.(EAX=7,ECX=0):EDX[bit 16] > - AMX-BF16 CPUID.(EAX=7,ECX=0):EDX[bit 22] > - AVX512_FP16 CPUID.(EAX=7,ECX=0):EDX[bit 23] > - AMX-TILE CPUID.(EAX=7,ECX=0):EDX[bit 24] > - AMX-INT8 CPUID.(EAX=7,ECX=0):EDX[bit 25] > - AVX512_BF16 CPUID.(EAX=7,ECX=1):EAX[bit 5] > - fast zero-length MOVSB CPUID.(EAX=7,ECX=1):EAX[bit 10] > - fast short CMPSB, SCASB CPUID.(EAX=7,ECX=1):EAX[bit 12] > - AMX-FP16 CPUID.(EAX=7,ECX=1):EAX[bit 21] > - PREFETCHI CPUID.(EAX=7,ECX=1):EDX[bit 14] > - XFD CPUID.(EAX=0xD,ECX=1):EAX[bit 4] > - EPT_PAGE_WALK_LENGTH_5 VMX_EPT_VPID_CAP(0x48c)[bit 7] > > SierraForest doesn't support TSX, so TSX Async Abort(TAA) vulnerabilities > don't exist on SierraForest. On KVM side, if host doesn't enumerate RTM > or RTM gets disabled, ARCH_CAP_TAA_NO is reported as unsupported. To > avoid the confusing warning: > warning: host doesn't support requested feature: MSR(10AH).taa-no > [bit 8] > > just don't include TAA_NO in SierraForest CPU model. > > Currently LAM and LASS are not enabled in KVM mainline yet, will add > them after merged. > > Signed-off-by: Tao Su > --- Reviewed-by: Zhao Liu
Re: [RFC PATCH-for-8.2?] accel/tcg: Implement tcg_unregister_thread()
On Wed, 6 Dec 2023 at 09:29, Philippe Mathieu-Daudé wrote: > > Hi Stefan, > > On 6/12/23 13:56, Michal Suchánek wrote: > > On Tue, Dec 05, 2023 at 11:09:59AM +0100, Michal Suchánek wrote: > >> On Mon, Dec 04, 2023 at 03:02:45PM -0800, Richard Henderson wrote: > >>> On 12/4/23 12:09, Michal Suchánek wrote: > On Mon, Dec 04, 2023 at 02:50:17PM -0500, Stefan Hajnoczi wrote: > > On Mon, 4 Dec 2023 at 14:40, Philippe Mathieu-Daudé > > wrote: > >> +void tcg_unregister_thread(void) > >> +{ > >> +unsigned int n; > >> + > >> +n = qatomic_fetch_dec(&tcg_cur_ctxs); > >> +g_free(tcg_ctxs[n]); > >> +qatomic_set(&tcg_ctxs[n], NULL); > >> +} > > > > tcg_ctxs[n] may not be our context, so this looks like it could free > > another thread's context and lead to undefined behavior. > >>> > >>> Correct. > >>> > There is cpu->thread_id so perhaps cpu->thread_ctx could be added as > well. That would require a bitmap of used threads contexts rather than a > counter, though. > >>> > >>> Or don't free the context at all, but re-use it when incrementing and > >>> tcg_ctxs[n] != null (i.e. plugging in a repacement vcpu). After all, > >>> there > >>> can only be tcg_max_ctxs contexts. > >> > >> But you would not know which contexts are in use and which aren't without > >> tracking the association of contexts to CPUs. > >> > >> Unless there is a cpu array somewhere and you can use the same index for > >> both to create the association. > > > > I tried to use cpu_index for correlating the tcg_ctx with cpu. I added > > some asserts that only null contexts are allocated and non-null contexts > > released but qemu crashes somewhere in tcg sometime after the guest gets > > to switch root. > > Since this isn't trivial and is a long standing issue, let's not > block the 8.2 release with it. Sounds good. Thanks, Stefan
Re: [PATCH v2 for-8.2?] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit
On Wed, Dec 06, 2023 at 04:04:43PM +0100, Paolo Bonzini wrote: > On Wed, Dec 6, 2023 at 3:46 PM Michael Roth wrote: > > > There is no need to check cr0_old or sev_es_enabled(); EFER.LMA is > > > simply EFER.LME && CR0.PG. > > > > Yah, I originally had it like that, but svm_set_cr0() in the kernel only > > sets it in vcpu->arch.efer it when setting CR0.PG, so I thought it might > > be safer to be more selective and mirror that handling on the QEMU side > > so we can leave as much of any other sanity checks on kernel/QEMU side > > intact as possible. E.g., if some other bug in the kernel ends up > > unsetting EFER.LMA while paging is still enabled, we'd still notice that > > when passing it back in via KVM_SET_SREGS*. > > > > But agree it's simpler to just always set it based on CR0.PG and EFER.LMA > > and can send a v3 if that's preferred. > > Yeah, in this case I think the chance of something breaking is really, > really small. > > The behavior of svm_set_cr0() is more due to how the surrounding code > looks like, than anything else. Ok, seems reasonable. I'll plan to send a v3 with the old_cr0 stuff dropped. > > > > Alternatively, sev_es_enabled() could be an assertion, that is: > > > > > > if ((env->efer & MSR_EFER_LME) && (env->cr[0] & CR0_PG_MASK) && > > >!(env->efer & MSR_EFER_LMA)) { > > > /* Workaround for... */ > > > assert(sev_es_enabled()); > > > env->efer |= MSR_EFER_LMA; > > > } > > > > > > What do you think? > > > > I'm a little apprehensive about this approach for similar reasons as > > above > > I agree on this. I think it's worth in general to have clear > expectations, though. If you think it's worrisome, we can commit it > without assertion now and add it in 9.0. I think that seems like a good approach. That would give us more time to discuss the fix/handling on the kernel side, and then as a follow-up we can tighten down the QEMU handling/expectations based on that. Thanks, Mike > > Paolo > >
Re: [RFC PATCH-for-8.2?] accel/tcg: Implement tcg_unregister_thread()
On Wed, Dec 06, 2023 at 01:17:08PM -0100, Miguel Luis wrote: > Hi! > > On 04/12/2023 18:40, Philippe Mathieu-Daudé wrote: > > Unplugging vCPU triggers the following assertion in > > tcg_register_thread(): > > > > 796 void tcg_register_thread(void) > > 797 { > > ... > > 812 /* Claim an entry in tcg_ctxs */ > > 813 n = qatomic_fetch_inc(&tcg_cur_ctxs); > > 814 g_assert(n < tcg_max_ctxs); > > > > Implement and use tcg_unregister_thread() so when a > > vCPU is unplugged, the tcg_cur_ctxs refcount is > > decremented. > > > I've had addressed this issue before (posted at [1]) and had exercised > it with vCPU hotplug/unplug patches for ARM although I was not sure about what > would be needed to be done regarding plugins on the context of > tcg_unregister_thread. I guess they would need to be freed also? Doesn't it have the same problem that it will randomly free some context which is not necessarily associated with the unplugged CPU? Consider machine with 4 CPUs, they are likely added in order - cpu0 getting context0, cpu1 context1, etc. Unplug CPU 1. Given that context 3 is top the would be unallocated by the decrement, or am I missing something? Thanks Michal > > Thank you > > Miguel > > [1]: > https://lore.kernel.org/qemu-devel/20230926103654.34424-5-salil.me...@huawei.com/
Re: [RFC PATCH-for-8.2?] accel/tcg: Implement tcg_unregister_thread()
Miguel Luis writes: > Hi! > > On 04/12/2023 18:40, Philippe Mathieu-Daudé wrote: >> Unplugging vCPU triggers the following assertion in >> tcg_register_thread(): >> >> 796 void tcg_register_thread(void) >> 797 { >> ... >> 812 /* Claim an entry in tcg_ctxs */ >> 813 n = qatomic_fetch_inc(&tcg_cur_ctxs); >> 814 g_assert(n < tcg_max_ctxs); >> >> Implement and use tcg_unregister_thread() so when a >> vCPU is unplugged, the tcg_cur_ctxs refcount is >> decremented. > > > I've had addressed this issue before (posted at [1]) and had exercised > it with vCPU hotplug/unplug patches for ARM although I was not sure about what > would be needed to be done regarding plugins on the context of > tcg_unregister_thread. I guess they would need to be freed also? Good catch. They should indeed. > > >> Reported-by: Michal Suchánek >> Suggested-by: Stefan Hajnoczi >> Signed-off-by: Philippe Mathieu-Daudé >> --- >> void tcg_register_thread(void) >> { >> TCGContext *s = g_malloc(sizeof(*s)); >> @@ -814,6 +821,16 @@ void tcg_register_thread(void) >> >> tcg_ctx = s; >> } >> + >> +void tcg_unregister_thread(void) >> +{ >> +unsigned int n; >> + >> +n = qatomic_fetch_dec(&tcg_cur_ctxs); >> +g_free(tcg_ctxs[n]); Perhaps a bit of re-factoring and we could have a tcg_alloc_context and tcg_free_context to keep everything together? > > > Is the above off-by-one? > > >> +qatomic_set(&tcg_ctxs[n], NULL); >> +} >> + > > Thank you > > Miguel > > [1]: > https://lore.kernel.org/qemu-devel/20230926103654.34424-5-salil.me...@huawei.com/ > > >> #endif /* !CONFIG_USER_ONLY */ >> >> /* pool based memory allocation */ -- Alex Bennée Virtualisation Tech Lead @ Linaro
Re: [RFC PATCH-for-8.2?] accel/tcg: Implement tcg_unregister_thread()
> On 6 Dec 2023, at 14:25, Michal Suchánek wrote: > > On Wed, Dec 06, 2023 at 01:17:08PM -0100, Miguel Luis wrote: >> Hi! >> >> On 04/12/2023 18:40, Philippe Mathieu-Daudé wrote: >>> Unplugging vCPU triggers the following assertion in >>> tcg_register_thread(): >>> >>> 796 void tcg_register_thread(void) >>> 797 { >>> ... >>> 812 /* Claim an entry in tcg_ctxs */ >>> 813 n = qatomic_fetch_inc(&tcg_cur_ctxs); >>> 814 g_assert(n < tcg_max_ctxs); >>> >>> Implement and use tcg_unregister_thread() so when a >>> vCPU is unplugged, the tcg_cur_ctxs refcount is >>> decremented. >> >> >> I've had addressed this issue before (posted at [1]) and had exercised >> it with vCPU hotplug/unplug patches for ARM although I was not sure about >> what >> would be needed to be done regarding plugins on the context of >> tcg_unregister_thread. I guess they would need to be freed also? > > Doesn't it have the same problem that it will randomly free some context > which is not necessarily associated with the unplugged CPU? > > Consider machine with 4 CPUs, they are likely added in order - cpu0 > getting context0, cpu1 context1, etc. > > Unplug CPU 1. Given that context 3 is top the would be unallocated by > the decrement, or am I missing something? > I think you’re right and I share of the same opinion that matching a tcg thread to a vCPU would be handy to solve this and maybe sorting tcg_ctxs after unregistering the thread. Thank you Miguel > Thanks > > Michal > >> >> Thank you >> >> Miguel >> >> [1]: >> https://lore.kernel.org/qemu-devel/20230926103654.34424-5-salil.me...@huawei.com/
Re: [PATCH qemu v3 02/20] Fixing the basic functionality of STM32 timers
~lbryndza writes: > From: Lucjan Bryndza > > The current implementation of timers does not work properly > even in basic functionality. A counter configured to report > an interrupt every 10ms reports the first interrupts after a > few seconds. There are also no properly implemented count up an > count down modes. This commit fixes bugs with interrupt > reporting and implements the basic modes of the counter's > time-base block. > > Remove wrong qemu timer implementation I suspect this breaks bisectability of the series. Each point in the series should still be able to compile and at least function as well as it did before. So in this case I think this patch needs to be merged with the patch that brings in the replacement functionality. > > Signed-off-by: Lucjan Bryndza > --- > hw/timer/stm32f2xx_timer.c | 55 -- > 1 file changed, 5 insertions(+), 50 deletions(-) > > diff --git a/hw/timer/stm32f2xx_timer.c b/hw/timer/stm32f2xx_timer.c > index ba8694dcd3..f03f594a17 100644 > --- a/hw/timer/stm32f2xx_timer.c > +++ b/hw/timer/stm32f2xx_timer.c > @@ -23,12 +23,17 @@ > */ > > #include "qemu/osdep.h" > +#include "qapi/error.h" > #include "hw/irq.h" > #include "hw/qdev-properties.h" > #include "hw/timer/stm32f2xx_timer.h" > #include "migration/vmstate.h" > #include "qemu/log.h" > #include "qemu/module.h" > +#include "qemu/typedefs.h" > +#include "qemu/timer.h" > +#include "qemu/main-loop.h" > +#include "sysemu/dma.h" Seems odd to increase the includes needed when the rest of the patch just deletes code. -- Alex Bennée Virtualisation Tech Lead @ Linaro
[PATCH v3 for-8.2] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit
Commit 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors") added error checking for KVM_SET_SREGS/KVM_SET_SREGS2. In doing so, it exposed a long-running bug in current KVM support for SEV-ES where the kernel assumes that MSR_EFER_LMA will be set explicitly by the guest kernel, in which case EFER write traps would result in KVM eventually seeing MSR_EFER_LMA get set and recording it in such a way that it would be subsequently visible when accessing it via KVM_GET_SREGS/etc. However, guest kernels currently rely on MSR_EFER_LMA getting set automatically when MSR_EFER_LME is set and paging is enabled via CR0_PG_MASK. As a result, the EFER write traps don't actually expose the MSR_EFER_LMA bit, even though it is set internally, and when QEMU subsequently tries to pass this EFER value back to KVM via KVM_SET_SREGS* it will fail various sanity checks and return -EINVAL, which is now considered fatal due to the aforementioned QEMU commit. This can be addressed by inferring the MSR_EFER_LMA bit being set when paging is enabled and MSR_EFER_LME is set, and synthesizing it to ensure the expected bits are all present in subsequent handling on the host side. Ultimately, this handling will be implemented in the host kernel, but to avoid breaking QEMU's SEV-ES support when using older host kernels, the same handling can be done in QEMU just after fetching the register values via KVM_GET_SREGS*. Implement that here. Cc: Paolo Bonzini Cc: Marcelo Tosatti Cc: Tom Lendacky Cc: Akihiko Odaki Cc: Philippe Mathieu-Daudé Cc: Lara Lazier Cc: Vitaly Kuznetsov Cc: Maxim Levitsky Cc: k...@vger.kernel.org Fixes: 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors") Signed-off-by: Michael Roth --- target/i386/kvm/kvm.c | 8 1 file changed, 8 insertions(+) diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 11b8177eff..4ce80555b4 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -3643,6 +3643,10 @@ static int kvm_get_sregs(X86CPU *cpu) env->cr[4] = sregs.cr4; env->efer = sregs.efer; +if (sev_es_enabled() && env->efer & MSR_EFER_LME && +env->cr[0] & CR0_PG_MASK) { +env->efer |= MSR_EFER_LMA; +} /* changes to apic base and cr8/tpr are read back via kvm_arch_post_run */ x86_update_hflags(env); @@ -3682,6 +3686,10 @@ static int kvm_get_sregs2(X86CPU *cpu) env->cr[4] = sregs.cr4; env->efer = sregs.efer; +if (sev_es_enabled() && env->efer & MSR_EFER_LME && +env->cr[0] & CR0_PG_MASK) { +env->efer |= MSR_EFER_LMA; +} env->pdptrs_valid = sregs.flags & KVM_SREGS2_FLAGS_PDPTRS_VALID; -- 2.25.1
Re: [PATCH qemu 1/2] hw/arm: Add minimal support for the STM32L4x5 SoC
Alistair Francis writes: > On Mon, Nov 27, 2023 at 12:44 AM ~inesvarhol wrote: >> >> From: Inès Varhol >> >> This patch adds a new STM32L4x5 SoC, it is necessary to add support for >> the B-L475E-IOT01A board. >> The implementation is derived from the STM32F405 SoC. >> The implementation contains no peripherals, only memory regions are >> implemented. >> >> Reviewed-by: Philippe Mathieu-Daudé >> >> Signed-off-by: Arnaud Minier >> Signed-off-by: Inès Varhol >> --- >> MAINTAINERS| 8 + >> hw/arm/Kconfig | 5 + >> hw/arm/meson.build | 1 + >> hw/arm/stm32l4x5_soc.c | 277 + >> include/hw/arm/stm32l4x5_soc.h | 68 >> 5 files changed, 359 insertions(+) >> create mode 100644 hw/arm/stm32l4x5_soc.c >> create mode 100644 include/hw/arm/stm32l4x5_soc.h >> >> diff --git a/MAINTAINERS b/MAINTAINERS >> index ff1238bb98..32458d41dd 100644 >> --- a/MAINTAINERS >> +++ b/MAINTAINERS >> @@ -1122,6 +1122,14 @@ L: qemu-...@nongnu.org >> S: Maintained >> F: hw/arm/olimex-stm32-h405.c >> >> +STM32L4x5 SoC Family >> +M: Arnaud Minier >> +M: Inès Varhol >> +L: qemu-...@nongnu.org >> +S: Maintained >> +F: hw/arm/stm32l4x5_soc.c >> +F: include/hw/arm/stm32l4x5_soc.h >> + >> SmartFusion2 >> M: Subbaraya Sundeep >> M: Peter Maydell >> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig >> index 3ada335a24..d2b94d9a47 100644 >> --- a/hw/arm/Kconfig >> +++ b/hw/arm/Kconfig >> @@ -448,6 +448,11 @@ config STM32F405_SOC >> select STM32F4XX_SYSCFG >> select STM32F4XX_EXTI >> >> +config STM32L4X5_SOC >> +bool >> +select ARM_V7M >> +select OR_IRQ >> + >> config XLNX_ZYNQMP_ARM >> bool >> default y if PIXMAN >> diff --git a/hw/arm/meson.build b/hw/arm/meson.build >> index 68245d3ad1..9766da10c4 100644 >> --- a/hw/arm/meson.build >> +++ b/hw/arm/meson.build >> @@ -42,6 +42,7 @@ arm_ss.add(when: 'CONFIG_RASPI', if_true: >> files('bcm2836.c', 'raspi.c')) >> arm_ss.add(when: 'CONFIG_STM32F100_SOC', if_true: files('stm32f100_soc.c')) >> arm_ss.add(when: 'CONFIG_STM32F205_SOC', if_true: files('stm32f205_soc.c')) >> arm_ss.add(when: 'CONFIG_STM32F405_SOC', if_true: files('stm32f405_soc.c')) >> +arm_ss.add(when: 'CONFIG_STM32L4X5_SOC', if_true: files('stm32l4x5_soc.c')) >> arm_ss.add(when: 'CONFIG_XLNX_ZYNQMP_ARM', if_true: files('xlnx-zynqmp.c', >> 'xlnx-zcu102.c')) >> arm_ss.add(when: 'CONFIG_XLNX_VERSAL', if_true: files('xlnx-versal.c', >> 'xlnx-versal-virt.c')) >> arm_ss.add(when: 'CONFIG_FSL_IMX25', if_true: files('fsl-imx25.c', >> 'imx25_pdk.c')) >> diff --git a/hw/arm/stm32l4x5_soc.c b/hw/arm/stm32l4x5_soc.c >> new file mode 100644 >> index 00..f476878b2c >> --- /dev/null >> +++ b/hw/arm/stm32l4x5_soc.c >> @@ -0,0 +1,277 @@ >> +/* >> + * STM32L4x5 SoC family >> + * >> + * SPDX-License-Identifier: MIT > > I'm pretty sure this must be GPL to be accepted Does it? A quick grep of the code shows we have quite a lot of hw emulation files that are MIT licensed. Although IANAL MIT is very permissive and easily combined with GPL as long at the final product is under GPL. I think there is some LGPL code about for TCG but I'm not sure how well tagged that is. -- Alex Bennée Virtualisation Tech Lead @ Linaro
Re: [PATCH 01/11] tests/avocado: add a simple i386 replay kernel test
On 12/5/23 12:40, Alex Bennée wrote: There are a number of bugs against 32 bit x86 on the tracker. Lets at least establish a baseline pure kernel boot can do record/replay before we start looking at the devices. Signed-off-by: Alex Bennée --- tests/avocado/replay_kernel.py | 16 1 file changed, 16 insertions(+) Acked-by: Richard Henderson r~
Re: [PATCH 02/11] tests/avocado: fix typo in replay_linux
On 12/5/23 12:40, Alex Bennée wrote: Signed-off-by: Alex Bennée --- tests/avocado/replay_linux.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Richard Henderson r~
Re: [PATCH 1/4] scsi: only access SCSIDevice->requests from one thread
On Tue, 5 Dec 2023 at 05:01, Kevin Wolf wrote: > > Am 04.12.2023 um 17:30 hat Stefan Hajnoczi geschrieben: > > On Fri, Dec 01, 2023 at 05:03:13PM +0100, Kevin Wolf wrote: > > > Am 23.11.2023 um 20:49 hat Stefan Hajnoczi geschrieben: > > > > Stop depending on the AioContext lock and instead access > > > > SCSIDevice->requests from only one thread at a time: > > > > - When the VM is running only the BlockBackend's AioContext may access > > > > the requests list. > > > > - When the VM is stopped only the main loop may access the requests > > > > list. > > > > > > > > These constraints protect the requests list without the need for locking > > > > in the I/O code path. > > > > > > > > Note that multiple IOThreads are not supported yet because the code > > > > assumes all SCSIRequests are executed from a single AioContext. Leave > > > > that as future work. > > > > > > > > Signed-off-by: Stefan Hajnoczi > > > > --- > > > > include/hw/scsi/scsi.h | 7 +- > > > > hw/scsi/scsi-bus.c | 174 - > > > > 2 files changed, 124 insertions(+), 57 deletions(-) > > > > > > > > diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h > > > > index 3692ca82f3..10c4e8288d 100644 > > > > --- a/include/hw/scsi/scsi.h > > > > +++ b/include/hw/scsi/scsi.h > > > > @@ -69,14 +69,19 @@ struct SCSIDevice > > > > { > > > > DeviceState qdev; > > > > VMChangeStateEntry *vmsentry; > > > > -QEMUBH *bh; > > > > uint32_t id; > > > > BlockConf conf; > > > > SCSISense unit_attention; > > > > bool sense_is_ua; > > > > uint8_t sense[SCSI_SENSE_BUF_SIZE]; > > > > uint32_t sense_len; > > > > + > > > > +/* > > > > + * The requests list is only accessed from the AioContext that > > > > executes > > > > + * requests or from the main loop when IOThread processing is > > > > stopped. > > > > + */ > > > > QTAILQ_HEAD(, SCSIRequest) requests; > > > > + > > > > uint32_t channel; > > > > uint32_t lun; > > > > int blocksize; > > > > diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c > > > > index fc4b77fdb0..b8bfde9565 100644 > > > > --- a/hw/scsi/scsi-bus.c > > > > +++ b/hw/scsi/scsi-bus.c > > > > @@ -85,6 +85,82 @@ SCSIDevice *scsi_device_get(SCSIBus *bus, int > > > > channel, int id, int lun) > > > > return d; > > > > } > > > > > > > > +/* > > > > + * Invoke @fn() for each enqueued request in device @s. Must be called > > > > from the > > > > + * main loop thread while the guest is stopped. This is only suitable > > > > for > > > > + * vmstate ->put(), use scsi_device_for_each_req_async() for other > > > > cases. > > > > + */ > > > > +static void scsi_device_for_each_req_sync(SCSIDevice *s, > > > > + void (*fn)(SCSIRequest *, > > > > void *), > > > > + void *opaque) > > > > +{ > > > > +SCSIRequest *req; > > > > +SCSIRequest *next_req; > > > > + > > > > +assert(!runstate_is_running()); > > > > +assert(qemu_in_main_thread()); > > > > + > > > > +QTAILQ_FOREACH_SAFE(req, &s->requests, next, next_req) { > > > > +fn(req, opaque); > > > > +} > > > > +} > > > > + > > > > +typedef struct { > > > > +SCSIDevice *s; > > > > +void (*fn)(SCSIRequest *, void *); > > > > +void *fn_opaque; > > > > +} SCSIDeviceForEachReqAsyncData; > > > > + > > > > +static void scsi_device_for_each_req_async_bh(void *opaque) > > > > +{ > > > > +g_autofree SCSIDeviceForEachReqAsyncData *data = opaque; > > > > +SCSIDevice *s = data->s; > > > > +SCSIRequest *req; > > > > +SCSIRequest *next; > > > > + > > > > +/* > > > > + * It is unlikely that the AioContext will change before this BH > > > > is called, > > > > + * but if it happens then ->requests must not be accessed from this > > > > + * AioContext. > > > > + */ > > > > > > What is the scenario where this happens? I would have expected that > > > switching the AioContext of a node involves draining the node first, > > > which would execute this BH before the context changes. > > > > I don't think aio_poll() is invoked by bdrv_drained_begin() when there > > are no requests in flight. In that case the BH could remain pending > > across bdrv_drained_begin()/bdrv_drained_end(). > > Hm, I wonder if that is actually a bug. Without executing pending BHs, > you can't be sure that nothing touches the node any more. > > Before commit 5e8ac217, we always polled at least once, though I think > that was an unintentional side effect. It makes the programming model easier and safer if aio_bh_poll() is guaranteed to be called by bdrv_drained_begin(). Then I could convert this conditional into an assertion and assume it never happens. > > > The other option I see is an empty BlockBackend, which can change its > > > AioContext without polling BHs, but in that case there is no connection > > > to other users, so the only change could come from
Re: [PATCH 03/11] scripts/replay-dump: update to latest format
On 12/5/23 12:40, Alex Bennée wrote: @@ -268,6 +279,49 @@ def decode_clock(eid, name, dumpfile): Decoder(28, "EVENT_CP_RESET", decode_checkpoint), ] +# Shutdown cause added +v12_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction), This comment applied to the v7 changes. Otherwise, Reviewed-by: Richard Henderson r~
Re: [RFC PATCH-for-8.2?] accel/tcg: Implement tcg_unregister_thread()
On Wed, Dec 06, 2023 at 03:49:28PM +, Miguel Luis wrote: > > > > On 6 Dec 2023, at 14:25, Michal Suchánek wrote: > > > > On Wed, Dec 06, 2023 at 01:17:08PM -0100, Miguel Luis wrote: > >> Hi! > >> > >> On 04/12/2023 18:40, Philippe Mathieu-Daudé wrote: > >>> Unplugging vCPU triggers the following assertion in > >>> tcg_register_thread(): > >>> > >>> 796 void tcg_register_thread(void) > >>> 797 { > >>> ... > >>> 812 /* Claim an entry in tcg_ctxs */ > >>> 813 n = qatomic_fetch_inc(&tcg_cur_ctxs); > >>> 814 g_assert(n < tcg_max_ctxs); > >>> > >>> Implement and use tcg_unregister_thread() so when a > >>> vCPU is unplugged, the tcg_cur_ctxs refcount is > >>> decremented. > >> > >> > >> I've had addressed this issue before (posted at [1]) and had exercised > >> it with vCPU hotplug/unplug patches for ARM although I was not sure about > >> what > >> would be needed to be done regarding plugins on the context of > >> tcg_unregister_thread. I guess they would need to be freed also? > > > > Doesn't it have the same problem that it will randomly free some context > > which is not necessarily associated with the unplugged CPU? > > > > Consider machine with 4 CPUs, they are likely added in order - cpu0 > > getting context0, cpu1 context1, etc. > > > > Unplug CPU 1. Given that context 3 is top the would be unallocated by > > the decrement, or am I missing something? > > > > I think you’re right and I share of the same opinion that matching a tcg > thread > to a vCPU would be handy to solve this and maybe sorting tcg_ctxs after > unregistering the thread. Tried to apply the patch. It does not crash right away, and due to virsh limitation I get only one (8-thread) core to hotplug so it did survive a few hotplug cycles. After a while of hotplugging it crashed, anyway. Given the atomic_dec there is probably no expectation that the processing is sequential. Thanks Michal
Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching
On 11/30/23 05:33, Peter Zijlstra wrote: > On Wed, Nov 29, 2023 at 03:07:15PM -0600, Madhavan T. Venkataraman wrote: > >> Kernel Lockdown >> --- >> >> But, we must provide at least some security in V2. Otherwise, it is useless. >> >> So, we have implemented what we call a kernel lockdown. At the end of kernel >> boot, Heki establishes permissions in the extended page table as mentioned >> before. Also, it adds an immutable attribute for kernel text and kernel RO >> data. >> Beyond that point, guest requests that attempt to modify permissions on any >> of >> the immutable pages will be denied. >> >> This means that features like FTrace and KProbes will not work on kernel text >> in V2. This is a temporary limitation. Once authentication is in place, the >> limitation will go away. > > So either you're saying your patch 17 / text_poke is broken (so why > include it ?!?) or your statement above is incorrect. Pick one. > It has been included so that people can be aware of the changes. I will remove the text_poke() changes from the patchset and send it later when I have some authentication in place. It will make sense then. > >> __text_poke() >> This function is called by various features to patch text. >> This calls heki_text_poke_start() and heki_text_poke_end(). >> >> heki_text_poke_start() is called to add write permissions to the >> extended page table so that text can be patched. heki_text_poke_end() >> is called to revert write permissions in the extended page table. > > This, if text_poke works, then static_call / jump_label / ftrace and > everything else should work, they all rely on this. > > >> Peter mentioned the following: >> >> "if you want to mirror the native PTEs why don't you hook into the >> paravirt page-table muck and get all that for free?" >> >> We did consider using a shadow page table kind of approach so that guest >> page table >> modifications can be intercepted and reflected in the page table entry. We >> did not >> do this for two reasons: >> >> - there are bits in the page table entry that are not permission bits. We >> would like >> the guest kernel to be able to modify them directly. > > This statement makes no sense. > >> - we cannot tell a genuine request from an attack. > > Why not? How is an explicit call different from an explicit call in a > paravirt hook? > >>From a maintenance pov we already hate paravirt with a passion, but it > is ever so much better than sprinkling yet another pile of crap > (heki_*) around. I only said that the idea was considered. We can resume the discussion on this topic when I send the text_poke() changes in a later version of the Heki patchset. Madhavan
Re: [PATCH 04/11] scripts/replay_dump: track total number of instructions
On 12/5/23 12:40, Alex Bennée wrote: This will help in tracking where we are in the stream when debugging. Signed-off-by: Alex Bennée --- scripts/replay-dump.py | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) Reviewed-by: Richard Henderson r~
Re: [PATCH 05/11] replay: remove host_clock_last
On 12/5/23 12:41, Alex Bennée wrote: Fixes: a02fe2ca70 (replay: Remove host_clock_last) Signed-off-by: Alex Bennée --- replay/replay-internal.h | 2 -- 1 file changed, 2 deletions(-) Reviewed-by: Richard Henderson r~
[v2 1/4] crypto: Introduce option and structure for detached LUKS header
Add the "header" option for the LUKS format. This field would be used to identify the blockdev's position where a detachable LUKS header is stored. In addition, introduce header field in struct BlockCrypto Signed-off-by: Hyman Huang --- block/crypto.c | 1 + qapi/block-core.json | 6 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/block/crypto.c b/block/crypto.c index 921933a5e5..f82b13d32b 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -39,6 +39,7 @@ typedef struct BlockCrypto BlockCrypto; struct BlockCrypto { QCryptoBlock *block; bool updating_keys; +BdrvChild *header; /* Reference to the detached LUKS header */ }; diff --git a/qapi/block-core.json b/qapi/block-core.json index ca390c5700..10be08d08f 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -3352,11 +3352,15 @@ # decryption key (since 2.6). Mandatory except when doing a # metadata-only probe of the image. # +# @header: optional reference to the location of a blockdev +# storing a detached LUKS header. (since 9.0) +# # Since: 2.9 ## { 'struct': 'BlockdevOptionsLUKS', 'base': 'BlockdevOptionsGenericFormat', - 'data': { '*key-secret': 'str' } } + 'data': { '*key-secret': 'str', +'*header': 'BlockdevRef'} } ## # @BlockdevOptionsGenericCOWFormat: -- 2.39.1
[v2 4/4] block: Support detached LUKS header creation for blockdev-create
Provide the "detached-mode" option for detached LUKS header formatting. To format the LUKS header on the pre-creating disk, example as follows: 1. add a protocol blockdev node of LUKS header $ virsh qemu-monitor-command vm '{"execute":"blockdev-add", > "arguments":{"node-name":"libvirt-1-storage", "driver":"file", > "filename":"/path/to/cipher.gluks" }}' 2. add the secret for encrypting the cipher stored in LUKS header above $ virsh qemu-monitor-command vm '{"execute":"object-add", > "arguments":{"qom-type": "secret", "id": > "libvirt-1-storage-secret0", "data": "abc123"}}' 3. format the disk node $ virsh qemu-monitor-command vm '{"execute":"blockdev-create", > "arguments":{"job-id":"job0", "options":{"driver":"luks", > "size":0, "file":"libvirt-1-storage", "detached-mode":true, > "cipher-alg":"aes-256", > "key-secret":"libvirt-3-storage-encryption-secret0"}}}' Signed-off-by: Hyman Huang --- block/crypto.c | 8 +++- qapi/block-core.json | 5 - 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/block/crypto.c b/block/crypto.c index 7d70349463..e77c49bd0c 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -667,10 +667,12 @@ block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp) BlockDriverState *bs = NULL; QCryptoBlockCreateOptions create_opts; PreallocMode preallocation = PREALLOC_MODE_OFF; +int64_t size; int ret; assert(create_options->driver == BLOCKDEV_DRIVER_LUKS); luks_opts = &create_options->u.luks; +size = luks_opts->size; bs = bdrv_co_open_blockdev_ref(luks_opts->file, errp); if (bs == NULL) { @@ -686,7 +688,11 @@ block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp) preallocation = luks_opts->preallocation; } -ret = block_crypto_co_create_generic(bs, luks_opts->size, &create_opts, +if (luks_opts->detached_mode) { +size = 0; +} + +ret = block_crypto_co_create_generic(bs, size, &create_opts, preallocation, errp); if (ret < 0) { goto fail; diff --git a/qapi/block-core.json b/qapi/block-core.json index 10be08d08f..1e7a7e1b05 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -4952,13 +4952,16 @@ # @preallocation: Preallocation mode for the new image (since: 4.2) # (default: off; allowed values: off, metadata, falloc, full) # +# @detached-mode: create a detached LUKS header. (since 9.0) +# # Since: 2.12 ## { 'struct': 'BlockdevCreateOptionsLUKS', 'base': 'QCryptoBlockCreateOptionsLUKS', 'data': { 'file': 'BlockdevRef', 'size': 'size', -'*preallocation': 'PreallocMode' } } +'*preallocation': 'PreallocMode', +'*detached-mode': 'bool'}} ## # @BlockdevCreateOptionsNfs: -- 2.39.1
[v2 3/4] crypto: Support generic LUKS encryption
By enhancing the LUKS driver, it is possible to enable the detachable LUKS header and, as a result, achieve general encryption for any disk format that QEMU has supported. Take the qcow2 as an example, the usage of the generic LUKS encryption as follows: 1. add a protocol blockdev node of data disk $ virsh qemu-monitor-command vm '{"execute":"blockdev-add", > "arguments":{"node-name":"libvirt-1-storage", "driver":"file", > "filename":"/path/to/test_disk.qcow2"}}' 2. add a protocol blockdev node of LUKS header as above. $ virsh qemu-monitor-command vm '{"execute":"blockdev-add", > "arguments":{"node-name":"libvirt-2-storage", "driver":"file", > "filename": "/path/to/cipher.gluks" }}' 3. add the secret for decrypting the cipher stored in LUKS header above $ virsh qemu-monitor-command vm '{"execute":"object-add", > "arguments":{"qom-type":"secret", "id": > "libvirt-2-storage-secret0", "data":"abc123"}}' 4. add the qcow2-drived blockdev format node $ virsh qemu-monitor-command vm '{"execute":"blockdev-add", > "arguments":{"node-name":"libvirt-1-format", "driver":"qcow2", > "file":"libvirt-1-storage"}}' 5. add the luks-drived blockdev to link the qcow2 disk with LUKS header by specifying the field "header" $ virsh qemu-monitor-command vm '{"execute":"blockdev-add", > "arguments":{"node-name":"libvirt-2-format", "driver":"luks", > "file":"libvirt-1-format", "header":"libvirt-2-storage", > "key-secret":"libvirt-2-format-secret0"}}' 6. add the virtio-blk device finally $ virsh qemu-monitor-command vm '{"execute":"device_add", > "arguments": {"num-queues":"1", "driver":"virtio-blk-pci", > "drive": "libvirt-2-format", "id":"virtio-disk2"}}' The generic LUKS encryption method of starting a virtual machine (VM) is somewhat similar to hot-plug in that both maintaining the same json command while the starting VM changes the "blockdev-add/device_add" parameters to "blockdev/device". Signed-off-by: Hyman Huang --- block/crypto.c | 38 +- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/block/crypto.c b/block/crypto.c index f82b13d32b..7d70349463 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -40,6 +40,7 @@ struct BlockCrypto { QCryptoBlock *block; bool updating_keys; BdrvChild *header; /* Reference to the detached LUKS header */ +bool detached_mode; /* If true, LUKS plays a detached header role */ }; @@ -64,12 +65,16 @@ static int block_crypto_read_func(QCryptoBlock *block, Error **errp) { BlockDriverState *bs = opaque; +BlockCrypto *crypto = bs->opaque; ssize_t ret; GLOBAL_STATE_CODE(); GRAPH_RDLOCK_GUARD_MAINLOOP(); -ret = bdrv_pread(bs->file, offset, buflen, buf, 0); +if (crypto->detached_mode) +ret = bdrv_pread(crypto->header, offset, buflen, buf, 0); +else +ret = bdrv_pread(bs->file, offset, buflen, buf, 0); if (ret < 0) { error_setg_errno(errp, -ret, "Could not read encryption header"); return ret; @@ -269,6 +274,8 @@ static int block_crypto_open_generic(QCryptoBlockFormat format, QCryptoBlockOpenOptions *open_opts = NULL; unsigned int cflags = 0; QDict *cryptoopts = NULL; +const char *header_bdref = +qdict_get_try_str(options, "header"); GLOBAL_STATE_CODE(); @@ -277,6 +284,16 @@ static int block_crypto_open_generic(QCryptoBlockFormat format, return ret; } +if (header_bdref) { +crypto->detached_mode = true; +crypto->header = bdrv_open_child(NULL, options, "header", bs, + &child_of_bds, BDRV_CHILD_METADATA, + false, errp); +if (!crypto->header) { +return -EINVAL; +} +} + GRAPH_RDLOCK_GUARD_MAINLOOP(); bs->supported_write_flags = BDRV_REQ_FUA & @@ -312,6 +329,14 @@ static int block_crypto_open_generic(QCryptoBlockFormat format, goto cleanup; } +if (crypto->detached_mode) { +/* + * Set payload offset to zero as the file bdref has no LUKS + * header under detached mode. + */ +qcrypto_block_set_payload_offset(crypto->block, 0); +} + bs->encrypted = true; ret = 0; @@ -903,6 +928,17 @@ block_crypto_child_perms(BlockDriverState *bs, BdrvChild *c, BlockCrypto *crypto = bs->opaque; +if (role == (role & BDRV_CHILD_METADATA)) { +/* Assign read permission only */ +perm |= BLK_PERM_CONSISTENT_READ; +/* Share all permissions */ +shared |= BLK_PERM_ALL; + +*nperm = perm; +*nshared = shared; +return; +} + bdrv_default_perms(bs, c, role, reopen_queue, perm, shared, nperm, nshared); /* -- 2.39.1
[v2 2/4] crypto: Introduce payload offset set function
Signed-off-by: Hyman Huang --- crypto/block.c | 4 include/crypto/block.h | 1 + 2 files changed, 5 insertions(+) diff --git a/crypto/block.c b/crypto/block.c index 7bb4b74a37..3dcf22a69f 100644 --- a/crypto/block.c +++ b/crypto/block.c @@ -319,6 +319,10 @@ QCryptoHashAlgorithm qcrypto_block_get_kdf_hash(QCryptoBlock *block) return block->kdfhash; } +void qcrypto_block_set_payload_offset(QCryptoBlock *block, uint64_t offset) +{ +block->payload_offset = offset; +} uint64_t qcrypto_block_get_payload_offset(QCryptoBlock *block) { diff --git a/include/crypto/block.h b/include/crypto/block.h index 4f63a37872..b47a90c529 100644 --- a/include/crypto/block.h +++ b/include/crypto/block.h @@ -312,4 +312,5 @@ void qcrypto_block_free(QCryptoBlock *block); G_DEFINE_AUTOPTR_CLEANUP_FUNC(QCryptoBlock, qcrypto_block_free) +void qcrypto_block_set_payload_offset(QCryptoBlock *block, uint64_t offset); #endif /* QCRYPTO_BLOCK_H */ -- 2.39.1
[v2 0/4] Support generic Luks encryption
v2: - Simplify the design by reusing the LUKS driver to implement the generic Luks encryption, thank Daniel for the insightful advice. - rebase on master. This functionality was motivated by the following to-do list seen in crypto documents: https://wiki.qemu.org/Features/Block/Crypto The last chapter says we should "separate header volume": The LUKS format has ability to store the header in a separate volume from the payload. We should extend the LUKS driver in QEMU to support this use case. By enhancing the LUKS driver, it is possible to enable the detachable LUKS header and, as a result, achieve general encryption for any disk format that QEMU has supported. Take the qcow2 as an example, the usage of the generic LUKS encryption as follows: 1. add a protocol blockdev node of data disk $ virsh qemu-monitor-command vm '{"execute":"blockdev-add", > "arguments":{"node-name":"libvirt-1-storage", "driver":"file", > "filename":"/path/to/test_disk.qcow2"}}' 2. add a protocol blockdev node of LUKS header as above. $ virsh qemu-monitor-command vm '{"execute":"blockdev-add", > "arguments":{"node-name":"libvirt-2-storage", "driver":"file", > "filename": "/path/to/cipher.gluks" }}' 3. add the secret for decrypting the cipher stored in LUKS header above $ virsh qemu-monitor-command vm '{"execute":"object-add", > "arguments":{"qom-type":"secret", "id": > "libvirt-2-storage-secret0", "data":"abc123"}}' 4. add the qcow2-drived blockdev format node $ virsh qemu-monitor-command vm '{"execute":"blockdev-add", > "arguments":{"node-name":"libvirt-1-format", "driver":"qcow2", > "file":"libvirt-1-storage"}}' 5. add the luks-drived blockdev to link the qcow2 disk with LUKS header by specifying the field "header" $ virsh qemu-monitor-command vm '{"execute":"blockdev-add", > "arguments":{"node-name":"libvirt-2-format", "driver":"luks", > "file":"libvirt-1-format", "header":"libvirt-2-storage", > "key-secret":"libvirt-2-format-secret0"}}' 6. add the virtio-blk device finally $ virsh qemu-monitor-command vm '{"execute":"device_add", > "arguments": {"num-queues":"1", "driver":"virtio-blk-pci", > "drive": "libvirt-2-format", "id":"virtio-disk2"}}' The generic LUKS encryption method of starting a virtual machine (VM) is somewhat similar to hot-plug in that both maintaining the same json command while the starting VM changes the "blockdev-add/device_add" parameters to "blockdev/device". Please review, thanks Best regared, Yong Hyman Huang (4): crypto: Introduce option and structure for detached LUKS header crypto: Introduce payload offset set function crypto: Support generic LUKS encryption block: Support detached LUKS header creation for blockdev-create block/crypto.c | 47 -- crypto/block.c | 4 include/crypto/block.h | 1 + qapi/block-core.json | 11 -- 4 files changed, 59 insertions(+), 4 deletions(-) -- 2.39.1
Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching
On 11/30/23 18:45, Edgecombe, Rick P wrote: > On Wed, 2023-11-29 at 15:07 -0600, Madhavan T. Venkataraman wrote: >> Threat Model >> >> >> In the threat model in Heki, the attacker is a user space attacker >> who exploits >> a kernel vulnerability to gain more privileges or bypass the kernel's >> access >> control and self-protection mechanisms. >> >> In the context of the guest page table, one of the things that the >> threat model translates >> to is a hacker gaining access to a guest page with RWX permissions. >> E.g., by adding execute >> permissions to a writable page or by adding write permissions to an >> executable page. >> >> Today, the permissions for a guest page in the extended page table >> are RWX by >> default. So, if a hacker manages to establish RWX for a page in the >> guest page >> table, then that is all he needs to do some damage. > > I had a few random comments from watching the plumbers talk online: > > Is there really a big difference between a page that is RWX, and a RW > page that is about to become RX? I realize that there is an addition of > timing, but when executable code is getting loaded it can be written to > then and later executed. I think that gap could be addressed in two > different ways, both pretty difficult: > 1. Verifying the loaded code before it gets marked > executable. This is difficult because the kernel does lots of > tweaks on the code it is loading (alternatives, etc). It can't > just check a signature. > 2. Loading the code in a protected environment. In this model the > (for example) module signature would be checked, then the code > would be loaded in some sort of protected environment. This way > integrity of the loaded code would be enforced. But extracting > module loading into a separate domain would be difficult. > Various scattered features all have their hands in the loading. > > Secondly, I wonder if another way to look at the memory parts of HEKI > could be that this is a way to protect certain page table bits from > stay writes. The RWX bits in the EPT are not directly writable, so more > steps are needed to change things than just a stray write (instead the > helpers involved in the operations need to be called). If that is a > fair way of looking at it, then I wonder how HEKI compares to a > solution like this security-wise: > https://lore.kernel.org/lkml/20210830235927.6443-1-rick.p.edgeco...@intel.com/ > > Functional-wise it had the benefit of working on bare metal and > supporting the normal kernel features. Thanks for the comments. I will think about what you have said and will respond soon. Madhavan
Re: [PATCH 07/11] replay: make has_unread_data a bool
On 12/5/23 12:41, Alex Bennée wrote: For clarity given it only has two states. Signed-off-by: Alex Bennée --- replay/replay-internal.h | 4 ++-- replay/replay-internal.c | 4 ++-- replay/replay-snapshot.c | 6 +++--- replay/replay.c | 2 +- 4 files changed, 8 insertions(+), 8 deletions(-) Reviewed-by: Richard Henderson r~
[PATCH] meson: sort C warning flags alphabetically
When scanning the list of warning flags to see if one is present, it is helpful if they are in alphabetical order. It is further helpful to separate out the 'no-' prefixed warnings. Signed-off-by: Daniel P. Berrangé --- The diff looks horrendous, so look at the resulting meson.build to see the benefit. meson.build | 40 +--- 1 file changed, 21 insertions(+), 19 deletions(-) diff --git a/meson.build b/meson.build index d2c4c2adb3..5c884cf7ce 100644 --- a/meson.build +++ b/meson.build @@ -433,36 +433,38 @@ endif add_global_arguments(qemu_common_flags, native: false, language: all_languages) add_global_link_arguments(qemu_ldflags, native: false, language: all_languages) -# Collect warnings that we want to enable - +# Collect warning flags we want to set, sorted alphabetically warn_flags = [ - '-Wundef', - '-Wwrite-strings', - '-Wmissing-prototypes', - '-Wstrict-prototypes', - '-Wredundant-decls', - '-Wold-style-declaration', - '-Wold-style-definition', - '-Wtype-limits', - '-Wformat-security', - '-Wformat-y2k', - '-Winit-self', - '-Wignored-qualifiers', + # First enable interesting warnings '-Wempty-body', - '-Wnested-externs', '-Wendif-labels', '-Wexpansion-to-defined', + '-Wformat-security', + '-Wformat-y2k', + '-Wignored-qualifiers', '-Wimplicit-fallthrough=2', + '-Winit-self', '-Wmissing-format-attribute', + '-Wmissing-prototypes', + '-Wnested-externs', + '-Wold-style-declaration', + '-Wold-style-definition', + '-Wredundant-decls', + '-Wshadow=local', + '-Wstrict-prototypes', + '-Wtype-limits', + '-Wundef', + '-Wwrite-strings', + + # Then disable some undesirable warnings + '-Wno-gnu-variable-sized-type-not-at-end', '-Wno-initializer-overrides', '-Wno-missing-include-dirs', + '-Wno-psabi', '-Wno-shift-negative-value', '-Wno-string-plus-int', - '-Wno-typedef-redefinition', '-Wno-tautological-type-limit-compare', - '-Wno-psabi', - '-Wno-gnu-variable-sized-type-not-at-end', - '-Wshadow=local', + '-Wno-typedef-redefinition', ] if targetos != 'darwin' -- 2.43.0
Re: [PATCH 08/11] replay: introduce a central report point for sync errors
On 12/5/23 12:41, Alex Bennée wrote: Figuring out why replay has failed is tricky at the best of times. Lets centralise the reporting of a replay sync error and add a little bit of extra information to help with debugging. Signed-off-by: Alex Bennée --- replay/replay-internal.h | 12 replay/replay-char.c | 6 ++ replay/replay-internal.c | 1 + replay/replay.c | 9 + 4 files changed, 24 insertions(+), 4 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH 08/11] replay: introduce a central report point for sync errors
On 12/6/23 03:35, Philippe Mathieu-Daudé wrote: Hi Alex, On 5/12/23 21:41, Alex Bennée wrote: Figuring out why replay has failed is tricky at the best of times. Lets centralise the reporting of a replay sync error and add a little bit of extra information to help with debugging. Signed-off-by: Alex Bennée --- replay/replay-internal.h | 12 replay/replay-char.c | 6 ++ replay/replay-internal.c | 1 + replay/replay.c | 9 + 4 files changed, 24 insertions(+), 4 deletions(-) diff --git a/replay/replay-internal.h b/replay/replay-internal.h index 1bc8fd5086..709e2eb4cb 100644 --- a/replay/replay-internal.h +++ b/replay/replay-internal.h @@ -74,6 +74,7 @@ enum ReplayEvents { * @cached_clock: Cached clocks values * @current_icount: number of processed instructions * @instruction_count: number of instructions until next event + * @current_event: current event index * @data_kind: current event * @has_unread_data: true if event not yet processed * @file_offset: offset into replay log at replay snapshot @@ -84,6 +85,7 @@ typedef struct ReplayState { int64_t cached_clock[REPLAY_CLOCK_COUNT]; uint64_t current_icount; int instruction_count; + unsigned int current_event; unsigned int data_kind; bool has_unread_data; uint64_t file_offset; Shouldn't this field be migrated? No, it's for diagnostic use only. r~
Re: [PATCH 09/11] replay: stop us hanging in rr_wait_io_event
On 12/5/23 12:41, Alex Bennée wrote: A lot of the hang I see are when we end up spinning in rr_wait_io_event for an event that will never come in playback. As a new check functions which can see if we are in PLAY mode and kick us us the wait function so the event can be processed. This fixes most of the failures in replay_kernel.py Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2013 Signed-off-by: Alex Bennée Cc: Pavel Dovgalyuk --- include/sysemu/replay.h | 5 + accel/tcg/tcg-accel-ops-rr.c | 2 +- replay/replay.c | 24 3 files changed, 30 insertions(+), 1 deletion(-) diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h index 08aae5869f..83995ae4bd 100644 --- a/include/sysemu/replay.h +++ b/include/sysemu/replay.h @@ -70,6 +70,11 @@ int replay_get_instructions(void); /*! Updates instructions counter in replay mode. */ void replay_account_executed_instructions(void); +/** + * replay_can_wait: check if we should pause for wait-io + */ +bool replay_can_wait(void); + /* Processing clocks and other time sources */ /*! Save the specified clock */ diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c index 611932f3c3..825e35b3dc 100644 --- a/accel/tcg/tcg-accel-ops-rr.c +++ b/accel/tcg/tcg-accel-ops-rr.c @@ -109,7 +109,7 @@ static void rr_wait_io_event(void) { CPUState *cpu; -while (all_cpu_threads_idle()) { +while (all_cpu_threads_idle() && replay_can_wait()) { rr_stop_kick_timer(); qemu_cond_wait_iothread(first_cpu->halt_cond); } diff --git a/replay/replay.c b/replay/replay.c index e83c01285c..042a6a9636 100644 --- a/replay/replay.c +++ b/replay/replay.c @@ -347,6 +347,30 @@ void replay_start(void) replay_enable_events(); } +/* + * For none/record the answer is yes. + */ +bool replay_can_wait(void) +{ +if (replay_mode == REPLAY_MODE_PLAY) { +/* + * For playback we shouldn't ever be at a point we wait. If + * the instruction count has reached zero and we have an + * unconsumed event we should go around again and consume it. + */ +if (replay_state.instruction_count == 0 && replay_state.has_unread_data) { +return false; +} else { +fprintf(stderr, "Error: Invalid replay state\n"); +fprintf(stderr,"instruction_count = %d, has = %d, event_kind = %d\n", +replay_state.instruction_count, replay_state.has_unread_data, replay_state.data_kind); +abort(); error_report. r~ +} +} +return true; +} + + void replay_finish(void) { if (replay_mode == REPLAY_MODE_NONE) {
Re: [PATCH V7 3/8] hw/acpi: Update ACPI GED framework to support vCPU Hotplug
Hi Salil, On Mon, Nov 13, 2023 at 08:12:31PM +, Salil Mehta via wrote: > Date: Mon, 13 Nov 2023 20:12:31 + > From: Salil Mehta via > Subject: [PATCH V7 3/8] hw/acpi: Update ACPI GED framework to support vCPU > Hotplug > X-Mailer: git-send-email 2.8.3 > [snip] > @@ -400,6 +411,12 @@ static void acpi_ged_initfn(Object *obj) > memory_region_init_io(&ged_st->regs, obj, &ged_regs_ops, ged_st, >TYPE_ACPI_GED "-regs", ACPI_GED_REG_COUNT); > sysbus_init_mmio(sbd, &ged_st->regs); > + > +memory_region_init(&s->container_cpuhp, OBJECT(dev), "cpuhp container", > + ACPI_CPU_HOTPLUG_REG_LEN); > +sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->container_cpuhp); > +cpu_hotplug_hw_init(&s->container_cpuhp, OBJECT(dev), > +&s->cpuhp_state, 0); I find this cpu_hotplug_hw_init() can still cause qtest errors on x86 platforms as you mentioned in v6: https://lore.kernel.org/qemu-devel/15e70616-6abb-63a4-17d0-820f4a254...@opnsrc.net/T/#m108f102b2fe92b7dd7218f2f942f7b233a9d6af3 IIUC, microvm machine has its own 'possible_cpus_arch_ids' and that is inherited from its parent x86 machine. The above error is because device-introspect-test sets the none-machine: # starting QEMU: exec ./qemu-system-i386 -qtest unix:/tmp/qtest-3094820.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-3094820.qmp,id=char0 -mon chardev=char0,mode=control -display none -audio none -nodefaults -machine none -accel qtest So what about just checking mc->possible_cpu_arch_ids instead of an assert in cpu_hotplug_hw_init()? diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c index 4b24a2500361..303f1f1f57bc 100644 --- a/hw/acpi/cpu.c +++ b/hw/acpi/cpu.c @@ -221,7 +221,10 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner, const CPUArchIdList *id_list; int i; -assert(mc->possible_cpu_arch_ids); +if (!mc->possible_cpu_arch_ids) { +return; +} + id_list = mc->possible_cpu_arch_ids(machine); state->dev_count = id_list->len; state->devs = g_new0(typeof(*state->devs), state->dev_count); This check seems to be acceptable in the general code path? Not all machines have possible_cpu_arch_ids, after all. Thanks, Zhao > } > > static void acpi_ged_class_init(ObjectClass *class, void *data) > diff --git a/include/hw/acpi/generic_event_device.h > b/include/hw/acpi/generic_event_device.h > index ba84ce0214..90fc41cbb8 100644 > --- a/include/hw/acpi/generic_event_device.h > +++ b/include/hw/acpi/generic_event_device.h > @@ -60,6 +60,7 @@ > #define HW_ACPI_GENERIC_EVENT_DEVICE_H > > #include "hw/sysbus.h" > +#include "hw/acpi/cpu_hotplug.h" > #include "hw/acpi/memory_hotplug.h" > #include "hw/acpi/ghes.h" > #include "qom/object.h" > @@ -95,6 +96,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED) > #define ACPI_GED_MEM_HOTPLUG_EVT 0x1 > #define ACPI_GED_PWR_DOWN_EVT 0x2 > #define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4 > +#define ACPI_GED_CPU_HOTPLUG_EVT0x8 > > typedef struct GEDState { > MemoryRegion evt; > @@ -106,6 +108,8 @@ struct AcpiGedState { > SysBusDevice parent_obj; > MemHotplugState memhp_state; > MemoryRegion container_memhp; > +CPUHotplugState cpuhp_state; > +MemoryRegion container_cpuhp; > GEDState ged_state; > uint32_t ged_event_bitmap; > qemu_irq irq; > -- > 2.34.1 > >
Re: [PATCH] meson: sort C warning flags alphabetically
On 6/12/23 17:44, Daniel P. Berrangé wrote: When scanning the list of warning flags to see if one is present, it is helpful if they are in alphabetical order. It is further helpful to separate out the 'no-' prefixed warnings. I was sure this was already done eh... Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Daniel P. Berrangé --- The diff looks horrendous, so look at the resulting meson.build to see the benefit. meson.build | 40 +--- 1 file changed, 21 insertions(+), 19 deletions(-)
Re: [PATCH v4 2/6] xen: backends: don't overwrite XenStore nodes created by toolstack
On Sat, Dec 02, 2023 at 01:41:21AM +, Volodymyr Babchuk wrote: > Xen PV devices in QEMU can be created in two ways: either by QEMU > itself, if they were passed via command line, or by Xen toolstack. In > the latter case, QEMU scans XenStore entries and configures devices > accordingly. > > In the second case we don't want QEMU to write/delete front-end > entries for two reasons: it might have no access to those entries if > it is running in un-privileged domain and it is just incorrect to > overwrite entries already provided by Xen toolstack, because toolstack > manages those nodes. For example, it might read backend- or frontend- > state to be sure that they are both disconnected and it is safe to > destroy a domain. > > This patch checks presence of xendev->backend to check if Xen PV > device was configured by Xen toolstack to decide if it should touch > frontend entries in XenStore. Also, when we need to remove XenStore > entries during device teardown only if they weren't created by Xen > toolstack. If they were created by toolstack, then it is toolstack's > job to do proper clean-up. > > Suggested-by: Paul Durrant > Suggested-by: David Woodhouse > Co-Authored-by: Oleksandr Tyshchenko > Signed-off-by: Volodymyr Babchuk > Reviewed-by: David Woodhouse > Hi Volodymyr, There's something wrong with this patch. The block backend doesn't work when creating a guest via libxl, an x86 hvm guest with qdisk. Error from guest kernel: "2 reading backend fields at /local/domain/0/backend/qdisk/23/768" It seems that "sector-size" is missing for the disk. Thanks, -- Anthony PERARD
[PATCH V7 04/12] cpus: vm_resume
Define the vm_resume helper, for use in subsequent patches. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu --- include/sysemu/runstate.h | 9 + system/cpus.c | 9 + 2 files changed, 18 insertions(+) diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h index 867e53c..315046b 100644 --- a/include/sysemu/runstate.h +++ b/include/sysemu/runstate.h @@ -53,6 +53,15 @@ void vm_start(void); * @step_pending: whether any of the CPUs is about to be single-stepped by gdb */ int vm_prepare_start(bool step_pending); + +/** + * vm_resume: If @state is a live state, start the vm and set the state, + * else just set the state. + * + * @state: the state to restore + */ +void vm_resume(RunState state); + int vm_stop(RunState state); int vm_stop_force_state(RunState state); int vm_shutdown(void); diff --git a/system/cpus.c b/system/cpus.c index f162435..7d2c28b 100644 --- a/system/cpus.c +++ b/system/cpus.c @@ -748,6 +748,15 @@ void vm_start(void) } } +void vm_resume(RunState state) +{ +if (runstate_is_live(state)) { +vm_start(); +} else { +runstate_set(state); +} +} + /* does a state transition even if the VM is already stopped, current state is forgotten forever */ int vm_stop_force_state(RunState state) -- 1.8.3.1
[PATCH 05/14] cpus: check running not RUN_STATE_RUNNING
When a vm transitions from running to suspended, runstate notifiers are not called, so the notifiers still think the vm is running. Hence, when we call vm_start to restore the suspended state, we call vm_state_notify with running=1. However, some notifiers check for RUN_STATE_RUNNING. They must check the running boolean instead. No functional change. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu --- backends/tpm/tpm_emulator.c | 2 +- hw/usb/hcd-ehci.c | 2 +- hw/usb/redirect.c | 2 +- hw/xen/xen-hvm-common.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/backends/tpm/tpm_emulator.c b/backends/tpm/tpm_emulator.c index f7f1b4a..254fce7 100644 --- a/backends/tpm/tpm_emulator.c +++ b/backends/tpm/tpm_emulator.c @@ -904,7 +904,7 @@ static void tpm_emulator_vm_state_change(void *opaque, bool running, trace_tpm_emulator_vm_state_change(running, state); -if (!running || state != RUN_STATE_RUNNING || !tpm_emu->relock_storage) { +if (!running || !tpm_emu->relock_storage) { return; } diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c index 19b4534..10c82ce 100644 --- a/hw/usb/hcd-ehci.c +++ b/hw/usb/hcd-ehci.c @@ -2451,7 +2451,7 @@ static void usb_ehci_vm_state_change(void *opaque, bool running, RunState state) * USB-devices which have async handled packages have a packet in the * ep queue to match the completion with. */ -if (state == RUN_STATE_RUNNING) { +if (running) { ehci_advance_async_state(ehci); } diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c index c9893df..3785bb0 100644 --- a/hw/usb/redirect.c +++ b/hw/usb/redirect.c @@ -1403,7 +1403,7 @@ static void usbredir_vm_state_change(void *priv, bool running, RunState state) { USBRedirDevice *dev = priv; -if (state == RUN_STATE_RUNNING && dev->parser != NULL) { +if (running && dev->parser != NULL) { usbredirparser_do_write(dev->parser); /* Flush any pending writes */ } } diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c index 565dc39..47e6cb1 100644 --- a/hw/xen/xen-hvm-common.c +++ b/hw/xen/xen-hvm-common.c @@ -623,7 +623,7 @@ void xen_hvm_change_state_handler(void *opaque, bool running, xen_set_ioreq_server_state(xen_domid, state->ioservid, - (rstate == RUN_STATE_RUNNING)); + running); } void xen_exit_notifier(Notifier *n, void *data) -- 1.8.3.1
[PATCH V7 00/12] fix migration of suspended runstate
Migration of a guest in the suspended runstate is broken. The incoming migration code automatically tries to wake the guest, which is wrong; the guest should end migration in the same runstate it started. Further, after saving a snapshot in the suspended state and loading it, the vm_start fails. The runstate is RUNNING, but the guest is not. See the commit messages for the details. Changes in V2: * simplify "start on wakeup request" * fix postcopy, snapshot, and background migration * refactor fixes for each type of migration * explicitly handled suspended events and runstate in tests * add test for postcopy and background migration Changes in V3: * rebase to tip * fix hang in new function migrate_wait_for_dirty_mem Changes in V4: * rebase to tip * add patch for vm_prepare_start (thanks Peter) * add patch to preserve cpu ticks Changes in V5: * rebase to tip * added patches to completely stop vm in suspended state: cpus: refactor vm_stop cpus: stop vm in suspended state * added patch to partially resume vm in suspended state: cpus: start vm in suspended state * modified "preserve suspended ..." patches to use the above. * deleted patch "preserve cpu ticks if suspended". stop ticks in vm_stop_force_state instead. * deleted patch "add runstate function". defined new helper function migrate_new_runstate in "preserve suspended runstate" * Added some RB's, but removed other RB's because the patches changed. Changes in V6: * all vm_stop calls completely stop the suspended state * refactored and updated the "cpus" patches * simplified the "preserve suspended" patches * added patch "bootfile per vm" Changes in V7: * rebase to tip, add RB-s * fix backwards compatibility for global_state.vm_was_suspended * delete vm_prepare_start state argument, and rename patch "pass runstate to vm_prepare_start" to "check running not RUN_STATE_RUNNING" * drop patches: tests/qtest: bootfile per vm tests/qtest: background migration with suspend * rename runstate_is_started to runstate_is_live * move wait_for_suspend in tests Steve Sistare (12): cpus: vm_was_suspended cpus: stop vm in suspended runstate cpus: check running not RUN_STATE_RUNNING cpus: vm_resume migration: propagate suspended runstate migration: preserve suspended runstate migration: preserve suspended for snapshot migration: preserve suspended for bg_migration tests/qtest: migration events tests/qtest: option to suspend during migration tests/qtest: precopy migration with suspend tests/qtest: postcopy migration with suspend backends/tpm/tpm_emulator.c | 2 +- hw/usb/hcd-ehci.c| 2 +- hw/usb/redirect.c| 2 +- hw/xen/xen-hvm-common.c | 2 +- include/migration/snapshot.h | 7 ++ include/sysemu/runstate.h| 16 migration/global_state.c | 35 ++- migration/migration-hmp-cmds.c | 8 +- migration/migration.c| 15 +-- migration/savevm.c | 23 +++-- qapi/misc.json | 10 +- system/cpus.c| 47 +++-- system/runstate.c| 9 ++ system/vl.c | 2 + tests/migration/i386/Makefile| 5 +- tests/migration/i386/a-b-bootblock.S | 50 +- tests/migration/i386/a-b-bootblock.h | 26 +++-- tests/qtest/migration-helpers.c | 27 ++ tests/qtest/migration-helpers.h | 11 ++- tests/qtest/migration-test.c | 181 +-- 20 files changed, 354 insertions(+), 126 deletions(-) -- 1.8.3.1
[PATCH V7 05/12] migration: propagate suspended runstate
If the outgoing machine was previously suspended, propagate that to the incoming side via global_state, so a subsequent vm_start restores the suspended state. To maintain backward and forward compatibility, reclaim some space from the runstate member. Signed-off-by: Steve Sistare --- migration/global_state.c | 35 +-- 1 file changed, 33 insertions(+), 2 deletions(-) diff --git a/migration/global_state.c b/migration/global_state.c index 4e2a9d8..d4f61a1 100644 --- a/migration/global_state.c +++ b/migration/global_state.c @@ -22,7 +22,16 @@ typedef struct { uint32_t size; -uint8_t runstate[100]; + +/* + * runstate was 100 bytes, zero padded, but we trimmed it to add a + * few fields and maintain backwards compatibility. + */ +uint8_t runstate[32]; +uint8_t has_vm_was_suspended; +uint8_t vm_was_suspended; +uint8_t unused[66]; + RunState state; bool received; } GlobalState; @@ -35,6 +44,10 @@ static void global_state_do_store(RunState state) assert(strlen(state_str) < sizeof(global_state.runstate)); strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate), state_str, '\0'); +global_state.has_vm_was_suspended = true; +global_state.vm_was_suspended = vm_get_suspended(); + +memset(global_state.unused, 0, sizeof(global_state.unused)); } void global_state_store(void) @@ -68,6 +81,12 @@ static bool global_state_needed(void *opaque) return true; } +/* If the suspended state must be remembered, it is needed */ + +if (vm_get_suspended()) { +return true; +} + /* If state is running or paused, it is not needed */ if (strcmp(runstate, "running") == 0 || @@ -85,6 +104,7 @@ static int global_state_post_load(void *opaque, int version_id) Error *local_err = NULL; int r; char *runstate = (char *)s->runstate; +bool vm_was_suspended = s->has_vm_was_suspended && s->vm_was_suspended; s->received = true; trace_migrate_global_state_post_load(runstate); @@ -93,7 +113,7 @@ static int global_state_post_load(void *opaque, int version_id) sizeof(s->runstate)) == sizeof(s->runstate)) { /* * This condition should never happen during migration, because - * all runstate names are shorter than 100 bytes (the size of + * all runstate names are shorter than 32 bytes (the size of * s->runstate). However, a malicious stream could overflow * the qapi_enum_parse() call, so we force the last character * to a NUL byte. @@ -110,6 +130,14 @@ static int global_state_post_load(void *opaque, int version_id) } s->state = r; +/* + * global_state is saved on the outgoing side before forcing a stopped + * state, so it may have saved state=suspended and vm_was_suspended=0. + * Now we are in a paused state, and when we later call vm_start, it must + * restore the suspended state, so we must set vm_was_suspended=1 here. + */ +vm_set_suspended(vm_was_suspended || r == RUN_STATE_SUSPENDED); + return 0; } @@ -134,6 +162,9 @@ static const VMStateDescription vmstate_globalstate = { .fields = (VMStateField[]) { VMSTATE_UINT32(size, GlobalState), VMSTATE_BUFFER(runstate, GlobalState), +VMSTATE_UINT8(has_vm_was_suspended, GlobalState), +VMSTATE_UINT8(vm_was_suspended, GlobalState), +VMSTATE_BUFFER(unused, GlobalState), VMSTATE_END_OF_LIST() }, }; -- 1.8.3.1
[PATCH V7 09/12] tests/qtest: migration events
Define a state object to capture events seen by migration tests, to allow more events to be captured in a subsequent patch, and simplify event checking in wait_for_migration_pass. No functional change. Signed-off-by: Steve Sistare Reviewed-by: Fabiano Rosas Reviewed-by: Daniel P. Berrangé --- tests/qtest/migration-helpers.c | 24 --- tests/qtest/migration-helpers.h | 9 ++-- tests/qtest/migration-test.c| 91 +++-- 3 files changed, 56 insertions(+), 68 deletions(-) diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c index 24fb7b3..fd3b94e 100644 --- a/tests/qtest/migration-helpers.c +++ b/tests/qtest/migration-helpers.c @@ -24,26 +24,16 @@ */ #define MIGRATION_STATUS_WAIT_TIMEOUT 120 -bool migrate_watch_for_stop(QTestState *who, const char *name, -QDict *event, void *opaque) -{ -bool *seen = opaque; - -if (g_str_equal(name, "STOP")) { -*seen = true; -return true; -} - -return false; -} - -bool migrate_watch_for_resume(QTestState *who, const char *name, +bool migrate_watch_for_events(QTestState *who, const char *name, QDict *event, void *opaque) { -bool *seen = opaque; +QTestMigrationState *state = opaque; -if (g_str_equal(name, "RESUME")) { -*seen = true; +if (g_str_equal(name, "STOP")) { +state->stop_seen = true; +return true; +} else if (g_str_equal(name, "RESUME")) { +state->resume_seen = true; return true; } diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h index e31dc85..3d32699 100644 --- a/tests/qtest/migration-helpers.h +++ b/tests/qtest/migration-helpers.h @@ -15,9 +15,12 @@ #include "libqtest.h" -bool migrate_watch_for_stop(QTestState *who, const char *name, -QDict *event, void *opaque); -bool migrate_watch_for_resume(QTestState *who, const char *name, +typedef struct QTestMigrationState { +bool stop_seen; +bool resume_seen; +} QTestMigrationState; + +bool migrate_watch_for_events(QTestState *who, const char *name, QDict *event, void *opaque); G_GNUC_PRINTF(3, 4) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index 0fbaa6a..05c0740 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -43,8 +43,8 @@ unsigned start_address; unsigned end_address; static bool uffd_feature_thread_id; -static bool got_src_stop; -static bool got_dst_resume; +static QTestMigrationState src_state; +static QTestMigrationState dst_state; /* * An initial 3 MB offset is used as that corresponds @@ -230,6 +230,20 @@ static void wait_for_serial(const char *side) } while (true); } +static void wait_for_stop(QTestState *who, QTestMigrationState *state) +{ +if (!state->stop_seen) { +qtest_qmp_eventwait(who, "STOP"); +} +} + +static void wait_for_resume(QTestState *who, QTestMigrationState *state) +{ +if (!state->resume_seen) { +qtest_qmp_eventwait(who, "RESUME"); +} +} + /* * It's tricky to use qemu's migration event capability with qtest, * events suddenly appearing confuse the qmp()/hmp() responses. @@ -277,21 +291,19 @@ static void read_blocktime(QTestState *who) qobject_unref(rsp_return); } +/* + * Wait for two changes in the migration pass count, but bail if we stop. + */ static void wait_for_migration_pass(QTestState *who) { -uint64_t initial_pass = get_migration_pass(who); -uint64_t pass; +uint64_t pass, prev_pass = 0, changes = 0; -/* Wait for the 1st sync */ -while (!got_src_stop && !initial_pass) { -usleep(1000); -initial_pass = get_migration_pass(who); -} - -do { +while (changes < 2 && !src_state.stop_seen) { usleep(1000); pass = get_migration_pass(who); -} while (pass == initial_pass && !got_src_stop); +changes += (pass != prev_pass); +prev_pass = pass; +} } static void check_guests_ram(QTestState *who) @@ -617,10 +629,7 @@ static void migrate_postcopy_start(QTestState *from, QTestState *to) { qtest_qmp_assert_success(from, "{ 'execute': 'migrate-start-postcopy' }"); -if (!got_src_stop) { -qtest_qmp_eventwait(from, "STOP"); -} - +wait_for_stop(from, &src_state); qtest_qmp_eventwait(to, "RESUME"); } @@ -756,8 +765,8 @@ static int test_migrate_start(QTestState **from, QTestState **to, } } -got_src_stop = false; -got_dst_resume = false; +dst_state = (QTestMigrationState) { }; +src_state = (QTestMigrationState) { }; if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) { memory_size = "150M"; @@ -848,8 +857,8 @@ static int test_migrate_start(QTestState **from, QTestState **to, if (!args->only_target) { *from = qtest_init_with_env(QEMU_ENV_SRC, cmd_source);
[PATCH 11/14] tests/qtest: migration events
Define a state object to capture events seen by migration tests, to allow more events to be captured in a subsequent patch, and simplify event checking in wait_for_migration_pass. No functional change. Signed-off-by: Steve Sistare Reviewed-by: Fabiano Rosas Reviewed-by: Daniel P. Berrangé --- tests/qtest/migration-helpers.c | 24 --- tests/qtest/migration-helpers.h | 9 ++-- tests/qtest/migration-test.c| 91 +++-- 3 files changed, 56 insertions(+), 68 deletions(-) diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c index 24fb7b3..fd3b94e 100644 --- a/tests/qtest/migration-helpers.c +++ b/tests/qtest/migration-helpers.c @@ -24,26 +24,16 @@ */ #define MIGRATION_STATUS_WAIT_TIMEOUT 120 -bool migrate_watch_for_stop(QTestState *who, const char *name, -QDict *event, void *opaque) -{ -bool *seen = opaque; - -if (g_str_equal(name, "STOP")) { -*seen = true; -return true; -} - -return false; -} - -bool migrate_watch_for_resume(QTestState *who, const char *name, +bool migrate_watch_for_events(QTestState *who, const char *name, QDict *event, void *opaque) { -bool *seen = opaque; +QTestMigrationState *state = opaque; -if (g_str_equal(name, "RESUME")) { -*seen = true; +if (g_str_equal(name, "STOP")) { +state->stop_seen = true; +return true; +} else if (g_str_equal(name, "RESUME")) { +state->resume_seen = true; return true; } diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h index e31dc85..3d32699 100644 --- a/tests/qtest/migration-helpers.h +++ b/tests/qtest/migration-helpers.h @@ -15,9 +15,12 @@ #include "libqtest.h" -bool migrate_watch_for_stop(QTestState *who, const char *name, -QDict *event, void *opaque); -bool migrate_watch_for_resume(QTestState *who, const char *name, +typedef struct QTestMigrationState { +bool stop_seen; +bool resume_seen; +} QTestMigrationState; + +bool migrate_watch_for_events(QTestState *who, const char *name, QDict *event, void *opaque); G_GNUC_PRINTF(3, 4) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index 5752412..59fbbef 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -43,8 +43,8 @@ unsigned start_address; unsigned end_address; static bool uffd_feature_thread_id; -static bool got_src_stop; -static bool got_dst_resume; +static QTestMigrationState src_state; +static QTestMigrationState dst_state; /* * An initial 3 MB offset is used as that corresponds @@ -230,6 +230,20 @@ static void wait_for_serial(const char *side) } while (true); } +static void wait_for_stop(QTestState *who, QTestMigrationState *state) +{ +if (!state->stop_seen) { +qtest_qmp_eventwait(who, "STOP"); +} +} + +static void wait_for_resume(QTestState *who, QTestMigrationState *state) +{ +if (!state->resume_seen) { +qtest_qmp_eventwait(who, "RESUME"); +} +} + /* * It's tricky to use qemu's migration event capability with qtest, * events suddenly appearing confuse the qmp()/hmp() responses. @@ -277,21 +291,19 @@ static void read_blocktime(QTestState *who) qobject_unref(rsp_return); } +/* + * Wait for two changes in the migration pass count, but bail if we stop. + */ static void wait_for_migration_pass(QTestState *who) { -uint64_t initial_pass = get_migration_pass(who); -uint64_t pass; +uint64_t pass, prev_pass = 0, changes = 0; -/* Wait for the 1st sync */ -while (!got_src_stop && !initial_pass) { -usleep(1000); -initial_pass = get_migration_pass(who); -} - -do { +while (changes < 2 && !src_state.stop_seen) { usleep(1000); pass = get_migration_pass(who); -} while (pass == initial_pass && !got_src_stop); +changes += (pass != prev_pass); +prev_pass = pass; +} } static void check_guests_ram(QTestState *who) @@ -617,10 +629,7 @@ static void migrate_postcopy_start(QTestState *from, QTestState *to) { qtest_qmp_assert_success(from, "{ 'execute': 'migrate-start-postcopy' }"); -if (!got_src_stop) { -qtest_qmp_eventwait(from, "STOP"); -} - +wait_for_stop(from, &src_state); qtest_qmp_eventwait(to, "RESUME"); } @@ -756,8 +765,8 @@ static int test_migrate_start(QTestState **from, QTestState **to, } } -got_src_stop = false; -got_dst_resume = false; +dst_state = (QTestMigrationState) { }; +src_state = (QTestMigrationState) { }; if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) { memory_size = "150M"; @@ -848,8 +857,8 @@ static int test_migrate_start(QTestState **from, QTestState **to, if (!args->only_target) { *from = qtest_init_with_env(QEMU_ENV_SRC, cmd_source);
[PATCH V7 03/12] cpus: check running not RUN_STATE_RUNNING
When a vm transitions from running to suspended, runstate notifiers are not called, so the notifiers still think the vm is running. Hence, when we call vm_start to restore the suspended state, we call vm_state_notify with running=1. However, some notifiers check for RUN_STATE_RUNNING. They must check the running boolean instead. No functional change. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu --- backends/tpm/tpm_emulator.c | 2 +- hw/usb/hcd-ehci.c | 2 +- hw/usb/redirect.c | 2 +- hw/xen/xen-hvm-common.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/backends/tpm/tpm_emulator.c b/backends/tpm/tpm_emulator.c index f7f1b4a..254fce7 100644 --- a/backends/tpm/tpm_emulator.c +++ b/backends/tpm/tpm_emulator.c @@ -904,7 +904,7 @@ static void tpm_emulator_vm_state_change(void *opaque, bool running, trace_tpm_emulator_vm_state_change(running, state); -if (!running || state != RUN_STATE_RUNNING || !tpm_emu->relock_storage) { +if (!running || !tpm_emu->relock_storage) { return; } diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c index 19b4534..10c82ce 100644 --- a/hw/usb/hcd-ehci.c +++ b/hw/usb/hcd-ehci.c @@ -2451,7 +2451,7 @@ static void usb_ehci_vm_state_change(void *opaque, bool running, RunState state) * USB-devices which have async handled packages have a packet in the * ep queue to match the completion with. */ -if (state == RUN_STATE_RUNNING) { +if (running) { ehci_advance_async_state(ehci); } diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c index c9893df..3785bb0 100644 --- a/hw/usb/redirect.c +++ b/hw/usb/redirect.c @@ -1403,7 +1403,7 @@ static void usbredir_vm_state_change(void *priv, bool running, RunState state) { USBRedirDevice *dev = priv; -if (state == RUN_STATE_RUNNING && dev->parser != NULL) { +if (running && dev->parser != NULL) { usbredirparser_do_write(dev->parser); /* Flush any pending writes */ } } diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c index 565dc39..47e6cb1 100644 --- a/hw/xen/xen-hvm-common.c +++ b/hw/xen/xen-hvm-common.c @@ -623,7 +623,7 @@ void xen_hvm_change_state_handler(void *opaque, bool running, xen_set_ioreq_server_state(xen_domid, state->ioservid, - (rstate == RUN_STATE_RUNNING)); + running); } void xen_exit_notifier(Notifier *n, void *data) -- 1.8.3.1
[PATCH 09/14] migration: preserve suspended for snapshot
Restoring a snapshot can break a suspended guest. Snapshots suffer from the same suspended-state issues that affect live migration, plus they must handle an additional problematic scenario, which is that a running vm must remain running if it loads a suspended snapshot. To save, the existing vm_stop call now completely stops the suspended state. Finish with vm_resume to leave the vm in the state it had prior to the save, correctly restoring the suspended state. To load, if the snapshot is not suspended, then vm_stop + vm_resume correctly handles all states, and leaves the vm in the state it had prior to the load. However, if the snapshot is suspended, restoration is trickier. First, call vm_resume to restore the state to suspended so the current state matches the saved state. Then, if the pre-load state is running, call wakeup to resume running. Prior to these changes, the vm_stop to RUN_STATE_SAVE_VM and RUN_STATE_RESTORE_VM did not change runstate if the current state was suspended, but now it does, so allow these transitions. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu --- include/migration/snapshot.h | 7 +++ migration/migration-hmp-cmds.c | 8 +--- migration/savevm.c | 23 +-- system/runstate.c | 5 + system/vl.c| 2 ++ 5 files changed, 32 insertions(+), 13 deletions(-) diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h index e72083b..9e4dcaa 100644 --- a/include/migration/snapshot.h +++ b/include/migration/snapshot.h @@ -16,6 +16,7 @@ #define QEMU_MIGRATION_SNAPSHOT_H #include "qapi/qapi-builtin-types.h" +#include "qapi/qapi-types-run-state.h" /** * save_snapshot: Save an internal snapshot. @@ -61,4 +62,10 @@ bool delete_snapshot(const char *name, bool has_devices, strList *devices, Error **errp); +/** + * load_snapshot_resume: Restore runstate after loading snapshot. + * @state: state to restore + */ +void load_snapshot_resume(RunState state); + #endif diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c index 86ae832..c8d70bc 100644 --- a/migration/migration-hmp-cmds.c +++ b/migration/migration-hmp-cmds.c @@ -399,15 +399,17 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict) void hmp_loadvm(Monitor *mon, const QDict *qdict) { -int saved_vm_running = runstate_is_running(); +RunState saved_state = runstate_get(); + const char *name = qdict_get_str(qdict, "name"); Error *err = NULL; vm_stop(RUN_STATE_RESTORE_VM); -if (load_snapshot(name, NULL, false, NULL, &err) && saved_vm_running) { -vm_start(); +if (load_snapshot(name, NULL, false, NULL, &err)) { +load_snapshot_resume(saved_state); } + hmp_handle_error(mon, err); } diff --git a/migration/savevm.c b/migration/savevm.c index eec5503..26866f0 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -3046,7 +3046,7 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate, QEMUSnapshotInfo sn1, *sn = &sn1; int ret = -1, ret2; QEMUFile *f; -int saved_vm_running; +RunState saved_state = runstate_get(); uint64_t vm_state_size; g_autoptr(GDateTime) now = g_date_time_new_now_local(); AioContext *aio_context; @@ -3094,8 +3094,6 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate, } aio_context = bdrv_get_aio_context(bs); -saved_vm_running = runstate_is_running(); - global_state_store(); vm_stop(RUN_STATE_SAVE_VM); @@ -3163,9 +3161,7 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate, bdrv_drain_all_end(); -if (saved_vm_running) { -vm_start(); -} +vm_resume(saved_state); return ret == 0; } @@ -3339,6 +3335,14 @@ err_drain: return false; } +void load_snapshot_resume(RunState state) +{ +vm_resume(state); +if (state == RUN_STATE_RUNNING && runstate_get() == RUN_STATE_SUSPENDED) { +qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, &error_abort); +} +} + bool delete_snapshot(const char *name, bool has_devices, strList *devices, Error **errp) { @@ -3403,16 +3407,15 @@ static void snapshot_load_job_bh(void *opaque) { Job *job = opaque; SnapshotJob *s = container_of(job, SnapshotJob, common); -int orig_vm_running; +RunState orig_state = runstate_get(); job_progress_set_remaining(&s->common, 1); -orig_vm_running = runstate_is_running(); vm_stop(RUN_STATE_RESTORE_VM); s->ret = load_snapshot(s->tag, s->vmstate, true, s->devices, s->errp); -if (s->ret && orig_vm_running) { -vm_start(); +if (s->ret) { +load_snapshot_resume(orig_state); } job_progress_update(&s->common, 1); diff --git a/system/runstate.c b/system/runstate.c index e2fa204..ca9eb54 100644 --- a/system/runstate.c +++ b
[PATCH V7 02/12] cpus: stop vm in suspended runstate
Currently, a vm in the suspended state is not completely stopped. The VCPUs have been paused, but the cpu clock still runs, and runstate notifiers for the transition to stopped have not been called. This causes problems for live migration. Stale cpu timers_state is saved to the migration stream, causing time errors in the guest when it wakes from suspend, and state that would have been modified by runstate notifiers is wrong. Modify vm_stop to completely stop the vm if the current state is suspended, transition to RUN_STATE_PAUSED, and remember that the machine was suspended. Modify vm_start to restore the suspended state. This affects all callers of vm_stop and vm_start, notably, the qapi stop and cont commands. For example: (qemu) info status VM status: paused (suspended) (qemu) stop (qemu) info status VM status: paused (qemu) system_wakeup Error: Unable to wake up: guest is not in suspended state (qemu) cont (qemu) info status VM status: paused (suspended) (qemu) system_wakeup (qemu) info status VM status: running Suggested-by: Peter Xu Signed-off-by: Steve Sistare Reviewed-by: Peter Xu --- include/sysemu/runstate.h | 5 + qapi/misc.json| 10 -- system/cpus.c | 23 +++ system/runstate.c | 3 +++ 4 files changed, 31 insertions(+), 10 deletions(-) diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h index 88a67e2..867e53c 100644 --- a/include/sysemu/runstate.h +++ b/include/sysemu/runstate.h @@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause cause) return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN; } +static inline bool runstate_is_live(RunState state) +{ +return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED; +} + void vm_start(void); /** diff --git a/qapi/misc.json b/qapi/misc.json index cda2eff..efb8d44 100644 --- a/qapi/misc.json +++ b/qapi/misc.json @@ -134,7 +134,7 @@ ## # @stop: # -# Stop all guest VCPU execution. +# Stop all guest VCPU and VM execution. # # Since: 0.14 # @@ -143,6 +143,9 @@ # the guest remains paused once migration finishes, as if the -S # option was passed on the command line. # +# In the "suspended" state, it will completely stop the VM and +# cause a transition to the "paused" state. (Since 9.0) +# # Example: # # -> { "execute": "stop" } @@ -153,7 +156,7 @@ ## # @cont: # -# Resume guest VCPU execution. +# Resume guest VCPU and VM execution. # # Since: 0.14 # @@ -165,6 +168,9 @@ # guest starts once migration finishes, removing the effect of the # -S command line option if it was passed. # +# If the VM was previously suspended, and not been reset or woken, +# this command will transition back to the "suspended" state. (Since 9.0) +# # Example: # # -> { "execute": "cont" } diff --git a/system/cpus.c b/system/cpus.c index 9f631ab..f162435 100644 --- a/system/cpus.c +++ b/system/cpus.c @@ -277,11 +277,15 @@ bool vm_get_suspended(void) static int do_vm_stop(RunState state, bool send_stop) { int ret = 0; +RunState oldstate = runstate_get(); -if (runstate_is_running()) { +if (runstate_is_live(oldstate)) { +vm_was_suspended = (oldstate == RUN_STATE_SUSPENDED); runstate_set(state); cpu_disable_ticks(); -pause_all_vcpus(); +if (oldstate == RUN_STATE_RUNNING) { +pause_all_vcpus(); +} vm_state_notify(0, state); if (send_stop) { qapi_event_send_stop(); @@ -694,11 +698,13 @@ int vm_stop(RunState state) /** * Prepare for (re)starting the VM. - * Returns -1 if the vCPUs are not to be restarted (e.g. if they are already - * running or in case of an error condition), 0 otherwise. + * Returns 0 if the vCPUs should be restarted, -1 on an error condition, + * and 1 otherwise. */ int vm_prepare_start(bool step_pending) { +int ret = vm_was_suspended ? 1 : 0; +RunState state = vm_was_suspended ? RUN_STATE_SUSPENDED : RUN_STATE_RUNNING; RunState requested; qemu_vmstop_requested(&requested); @@ -729,9 +735,10 @@ int vm_prepare_start(bool step_pending) qapi_event_send_resume(); cpu_enable_ticks(); -runstate_set(RUN_STATE_RUNNING); -vm_state_notify(1, RUN_STATE_RUNNING); -return 0; +runstate_set(state); +vm_state_notify(1, state); +vm_was_suspended = false; +return ret; } void vm_start(void) @@ -745,7 +752,7 @@ void vm_start(void) current state is forgotten forever */ int vm_stop_force_state(RunState state) { -if (runstate_is_running()) { +if (runstate_is_live(runstate_get())) { return vm_stop(state); } else { int ret; diff --git a/system/runstate.c b/system/runstate.c index ea9d6c2..e2fa204 100644 --- a/system/runstate.c +++ b/system/runstate.c @@ -108,6 +108,7 @@ static const RunStateTransition runstate_transitions_def[] = { {
[PATCH 01/14] temporary: define CLONE_NEWCGROUP qemu-8.2.0
--- system/qemu-seccomp.c | 1 + 1 file changed, 1 insertion(+) diff --git a/system/qemu-seccomp.c b/system/qemu-seccomp.c index 4d7439e..9e1ff85 100644 --- a/system/qemu-seccomp.c +++ b/system/qemu-seccomp.c @@ -22,6 +22,7 @@ #include #include "sysemu/seccomp.h" #include +#define CLONE_NEWCGROUP 0x0200 /* For some architectures (notably ARM) cacheflush is not supported until * libseccomp 2.2.3, but configure enforces that we are using a more recent -- 1.8.3.1
[PATCH 10/14] migration: preserve suspended for bg_migration
Do not wake a suspended guest during bg_migration, and restore the prior state at finish rather than unconditionally running. Allow the additional state transitions that occur. Signed-off-by: Steve Sistare Reviewed-by: Fabiano Rosas Reviewed-by: Peter Xu --- migration/migration.c | 7 +-- system/runstate.c | 1 + 2 files changed, 2 insertions(+), 6 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 52ba0bc..b30313a 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -3389,7 +3389,7 @@ static void bg_migration_vm_start_bh(void *opaque) qemu_bh_delete(s->vm_start_bh); s->vm_start_bh = NULL; -vm_start(); +vm_resume(s->vm_old_state); migration_downtime_end(s); } @@ -3461,11 +3461,6 @@ static void *bg_migration_thread(void *opaque) qemu_mutex_lock_iothread(); -/* - * If VM is currently in suspended state, then, to make a valid runstate - * transition in vm_stop_force_state() we need to wakeup it up. - */ -qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); s->vm_old_state = runstate_get(); global_state_store(); diff --git a/system/runstate.c b/system/runstate.c index ca9eb54..621a023 100644 --- a/system/runstate.c +++ b/system/runstate.c @@ -168,6 +168,7 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_SUSPENDED, RUN_STATE_PAUSED}, { RUN_STATE_SUSPENDED, RUN_STATE_SAVE_VM }, { RUN_STATE_SUSPENDED, RUN_STATE_RESTORE_VM }, +{ RUN_STATE_SUSPENDED, RUN_STATE_SHUTDOWN }, { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE }, -- 1.8.3.1
[PATCH 07/14] migration: propagate suspended runstate
If the outgoing machine was previously suspended, propagate that to the incoming side via global_state, so a subsequent vm_start restores the suspended state. To maintain backward and forward compatibility, reclaim some space from the runstate member. Signed-off-by: Steve Sistare --- migration/global_state.c | 35 +-- 1 file changed, 33 insertions(+), 2 deletions(-) diff --git a/migration/global_state.c b/migration/global_state.c index 4e2a9d8..d4f61a1 100644 --- a/migration/global_state.c +++ b/migration/global_state.c @@ -22,7 +22,16 @@ typedef struct { uint32_t size; -uint8_t runstate[100]; + +/* + * runstate was 100 bytes, zero padded, but we trimmed it to add a + * few fields and maintain backwards compatibility. + */ +uint8_t runstate[32]; +uint8_t has_vm_was_suspended; +uint8_t vm_was_suspended; +uint8_t unused[66]; + RunState state; bool received; } GlobalState; @@ -35,6 +44,10 @@ static void global_state_do_store(RunState state) assert(strlen(state_str) < sizeof(global_state.runstate)); strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate), state_str, '\0'); +global_state.has_vm_was_suspended = true; +global_state.vm_was_suspended = vm_get_suspended(); + +memset(global_state.unused, 0, sizeof(global_state.unused)); } void global_state_store(void) @@ -68,6 +81,12 @@ static bool global_state_needed(void *opaque) return true; } +/* If the suspended state must be remembered, it is needed */ + +if (vm_get_suspended()) { +return true; +} + /* If state is running or paused, it is not needed */ if (strcmp(runstate, "running") == 0 || @@ -85,6 +104,7 @@ static int global_state_post_load(void *opaque, int version_id) Error *local_err = NULL; int r; char *runstate = (char *)s->runstate; +bool vm_was_suspended = s->has_vm_was_suspended && s->vm_was_suspended; s->received = true; trace_migrate_global_state_post_load(runstate); @@ -93,7 +113,7 @@ static int global_state_post_load(void *opaque, int version_id) sizeof(s->runstate)) == sizeof(s->runstate)) { /* * This condition should never happen during migration, because - * all runstate names are shorter than 100 bytes (the size of + * all runstate names are shorter than 32 bytes (the size of * s->runstate). However, a malicious stream could overflow * the qapi_enum_parse() call, so we force the last character * to a NUL byte. @@ -110,6 +130,14 @@ static int global_state_post_load(void *opaque, int version_id) } s->state = r; +/* + * global_state is saved on the outgoing side before forcing a stopped + * state, so it may have saved state=suspended and vm_was_suspended=0. + * Now we are in a paused state, and when we later call vm_start, it must + * restore the suspended state, so we must set vm_was_suspended=1 here. + */ +vm_set_suspended(vm_was_suspended || r == RUN_STATE_SUSPENDED); + return 0; } @@ -134,6 +162,9 @@ static const VMStateDescription vmstate_globalstate = { .fields = (VMStateField[]) { VMSTATE_UINT32(size, GlobalState), VMSTATE_BUFFER(runstate, GlobalState), +VMSTATE_UINT8(has_vm_was_suspended, GlobalState), +VMSTATE_UINT8(vm_was_suspended, GlobalState), +VMSTATE_BUFFER(unused, GlobalState), VMSTATE_END_OF_LIST() }, }; -- 1.8.3.1
[PATCH V7 08/12] migration: preserve suspended for bg_migration
Do not wake a suspended guest during bg_migration, and restore the prior state at finish rather than unconditionally running. Allow the additional state transitions that occur. Signed-off-by: Steve Sistare Reviewed-by: Fabiano Rosas Reviewed-by: Peter Xu --- migration/migration.c | 7 +-- system/runstate.c | 1 + 2 files changed, 2 insertions(+), 6 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 8124811..2cc7e2a 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -3390,7 +3390,7 @@ static void bg_migration_vm_start_bh(void *opaque) qemu_bh_delete(s->vm_start_bh); s->vm_start_bh = NULL; -vm_start(); +vm_resume(s->vm_old_state); migration_downtime_end(s); } @@ -3462,11 +3462,6 @@ static void *bg_migration_thread(void *opaque) qemu_mutex_lock_iothread(); -/* - * If VM is currently in suspended state, then, to make a valid runstate - * transition in vm_stop_force_state() we need to wakeup it up. - */ -qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); s->vm_old_state = runstate_get(); global_state_store(); diff --git a/system/runstate.c b/system/runstate.c index ca9eb54..621a023 100644 --- a/system/runstate.c +++ b/system/runstate.c @@ -168,6 +168,7 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_SUSPENDED, RUN_STATE_PAUSED}, { RUN_STATE_SUSPENDED, RUN_STATE_SAVE_VM }, { RUN_STATE_SUSPENDED, RUN_STATE_RESTORE_VM }, +{ RUN_STATE_SUSPENDED, RUN_STATE_SHUTDOWN }, { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE }, -- 1.8.3.1
[PATCH 04/14] cpus: stop vm in suspended runstate
Currently, a vm in the suspended state is not completely stopped. The VCPUs have been paused, but the cpu clock still runs, and runstate notifiers for the transition to stopped have not been called. This causes problems for live migration. Stale cpu timers_state is saved to the migration stream, causing time errors in the guest when it wakes from suspend, and state that would have been modified by runstate notifiers is wrong. Modify vm_stop to completely stop the vm if the current state is suspended, transition to RUN_STATE_PAUSED, and remember that the machine was suspended. Modify vm_start to restore the suspended state. This affects all callers of vm_stop and vm_start, notably, the qapi stop and cont commands. For example: (qemu) info status VM status: paused (suspended) (qemu) stop (qemu) info status VM status: paused (qemu) system_wakeup Error: Unable to wake up: guest is not in suspended state (qemu) cont (qemu) info status VM status: paused (suspended) (qemu) system_wakeup (qemu) info status VM status: running Suggested-by: Peter Xu Signed-off-by: Steve Sistare Reviewed-by: Peter Xu --- include/sysemu/runstate.h | 5 + qapi/misc.json| 10 -- system/cpus.c | 23 +++ system/runstate.c | 3 +++ 4 files changed, 31 insertions(+), 10 deletions(-) diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h index 88a67e2..867e53c 100644 --- a/include/sysemu/runstate.h +++ b/include/sysemu/runstate.h @@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause cause) return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN; } +static inline bool runstate_is_live(RunState state) +{ +return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED; +} + void vm_start(void); /** diff --git a/qapi/misc.json b/qapi/misc.json index cda2eff..efb8d44 100644 --- a/qapi/misc.json +++ b/qapi/misc.json @@ -134,7 +134,7 @@ ## # @stop: # -# Stop all guest VCPU execution. +# Stop all guest VCPU and VM execution. # # Since: 0.14 # @@ -143,6 +143,9 @@ # the guest remains paused once migration finishes, as if the -S # option was passed on the command line. # +# In the "suspended" state, it will completely stop the VM and +# cause a transition to the "paused" state. (Since 9.0) +# # Example: # # -> { "execute": "stop" } @@ -153,7 +156,7 @@ ## # @cont: # -# Resume guest VCPU execution. +# Resume guest VCPU and VM execution. # # Since: 0.14 # @@ -165,6 +168,9 @@ # guest starts once migration finishes, removing the effect of the # -S command line option if it was passed. # +# If the VM was previously suspended, and not been reset or woken, +# this command will transition back to the "suspended" state. (Since 9.0) +# # Example: # # -> { "execute": "cont" } diff --git a/system/cpus.c b/system/cpus.c index 9baf096..73b0e3e 100644 --- a/system/cpus.c +++ b/system/cpus.c @@ -270,11 +270,15 @@ bool vm_get_suspended(void) static int do_vm_stop(RunState state, bool send_stop) { int ret = 0; +RunState oldstate = runstate_get(); -if (runstate_is_running()) { +if (runstate_is_live(oldstate)) { +vm_was_suspended = (oldstate == RUN_STATE_SUSPENDED); runstate_set(state); cpu_disable_ticks(); -pause_all_vcpus(); +if (oldstate == RUN_STATE_RUNNING) { +pause_all_vcpus(); +} vm_state_notify(0, state); if (send_stop) { qapi_event_send_stop(); @@ -687,11 +691,13 @@ int vm_stop(RunState state) /** * Prepare for (re)starting the VM. - * Returns -1 if the vCPUs are not to be restarted (e.g. if they are already - * running or in case of an error condition), 0 otherwise. + * Returns 0 if the vCPUs should be restarted, -1 on an error condition, + * and 1 otherwise. */ int vm_prepare_start(bool step_pending) { +int ret = vm_was_suspended ? 1 : 0; +RunState state = vm_was_suspended ? RUN_STATE_SUSPENDED : RUN_STATE_RUNNING; RunState requested; qemu_vmstop_requested(&requested); @@ -722,9 +728,10 @@ int vm_prepare_start(bool step_pending) qapi_event_send_resume(); cpu_enable_ticks(); -runstate_set(RUN_STATE_RUNNING); -vm_state_notify(1, RUN_STATE_RUNNING); -return 0; +runstate_set(state); +vm_state_notify(1, state); +vm_was_suspended = false; +return ret; } void vm_start(void) @@ -738,7 +745,7 @@ void vm_start(void) current state is forgotten forever */ int vm_stop_force_state(RunState state) { -if (runstate_is_running()) { +if (runstate_is_live(runstate_get())) { return vm_stop(state); } else { int ret; diff --git a/system/runstate.c b/system/runstate.c index ea9d6c2..e2fa204 100644 --- a/system/runstate.c +++ b/system/runstate.c @@ -108,6 +108,7 @@ static const RunStateTransition runstate_transitions_def[] = { {
[PATCH V7 01/12] cpus: vm_was_suspended
Add a state variable to remember if a vm previously transitioned into a suspended state. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu --- include/sysemu/runstate.h | 2 ++ system/cpus.c | 15 +++ 2 files changed, 17 insertions(+) diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h index c8c2bd8..88a67e2 100644 --- a/include/sysemu/runstate.h +++ b/include/sysemu/runstate.h @@ -51,6 +51,8 @@ int vm_prepare_start(bool step_pending); int vm_stop(RunState state); int vm_stop_force_state(RunState state); int vm_shutdown(void); +void vm_set_suspended(bool suspended); +bool vm_get_suspended(void); typedef enum WakeupReason { /* Always keep QEMU_WAKEUP_REASON_NONE = 0 */ diff --git a/system/cpus.c b/system/cpus.c index a444a74..9f631ab 100644 --- a/system/cpus.c +++ b/system/cpus.c @@ -259,6 +259,21 @@ void cpu_interrupt(CPUState *cpu, int mask) } } +/* + * True if the vm was previously suspended, and has not been woken or reset. + */ +static int vm_was_suspended; + +void vm_set_suspended(bool suspended) +{ +vm_was_suspended = suspended; +} + +bool vm_get_suspended(void) +{ +return vm_was_suspended; +} + static int do_vm_stop(RunState state, bool send_stop) { int ret = 0; -- 1.8.3.1
[PATCH 03/14] cpus: vm_was_suspended
Add a state variable to remember if a vm previously transitioned into a suspended state. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu --- include/sysemu/runstate.h | 2 ++ system/cpus.c | 15 +++ 2 files changed, 17 insertions(+) diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h index c8c2bd8..88a67e2 100644 --- a/include/sysemu/runstate.h +++ b/include/sysemu/runstate.h @@ -51,6 +51,8 @@ int vm_prepare_start(bool step_pending); int vm_stop(RunState state); int vm_stop_force_state(RunState state); int vm_shutdown(void); +void vm_set_suspended(bool suspended); +bool vm_get_suspended(void); typedef enum WakeupReason { /* Always keep QEMU_WAKEUP_REASON_NONE = 0 */ diff --git a/system/cpus.c b/system/cpus.c index 0848e0d..9baf096 100644 --- a/system/cpus.c +++ b/system/cpus.c @@ -252,6 +252,21 @@ void cpu_interrupt(CPUState *cpu, int mask) } } +/* + * True if the vm was previously suspended, and has not been woken or reset. + */ +static int vm_was_suspended; + +void vm_set_suspended(bool suspended) +{ +vm_was_suspended = suspended; +} + +bool vm_get_suspended(void) +{ +return vm_was_suspended; +} + static int do_vm_stop(RunState state, bool send_stop) { int ret = 0; -- 1.8.3.1
[PATCH 12/14] tests/qtest: option to suspend during migration
Add an option to suspend the src in a-b-bootblock.S, which puts the guest in S3 state after one round of writing to memory. The option is enabled by poking a 1 into the suspend_me word in the boot block prior to starting the src vm. Generate symbol offsets in a-b-bootblock.h so that the suspend_me offset is known. Generate the bootblock for each test, because suspend_me may differ for each. Signed-off-by: Steve Sistare Acked-by: Peter Xu Reviewed-by: Fabiano Rosas --- tests/migration/i386/Makefile| 5 ++-- tests/migration/i386/a-b-bootblock.S | 50 +--- tests/migration/i386/a-b-bootblock.h | 26 +-- tests/qtest/migration-test.c | 12 ++--- 4 files changed, 77 insertions(+), 16 deletions(-) diff --git a/tests/migration/i386/Makefile b/tests/migration/i386/Makefile index 5c03241..37a72ae 100644 --- a/tests/migration/i386/Makefile +++ b/tests/migration/i386/Makefile @@ -4,9 +4,10 @@ .PHONY: all clean all: a-b-bootblock.h -a-b-bootblock.h: x86.bootsect +a-b-bootblock.h: x86.bootsect x86.o echo "$$__note" > header.tmp xxd -i $< | sed -e 's/.*int.*//' >> header.tmp + nm x86.o | awk '{print "#define SYM_"$$3" 0x"$$1}' >> header.tmp mv header.tmp $@ x86.bootsect: x86.boot @@ -16,7 +17,7 @@ x86.boot: x86.o $(CROSS_PREFIX)objcopy -O binary $< $@ x86.o: a-b-bootblock.S - $(CROSS_PREFIX)gcc -m32 -march=i486 -c $< -o $@ + $(CROSS_PREFIX)gcc -I.. -m32 -march=i486 -c $< -o $@ clean: @rm -rf *.boot *.o *.bootsect diff --git a/tests/migration/i386/a-b-bootblock.S b/tests/migration/i386/a-b-bootblock.S index 6bb..6f39eb6 100644 --- a/tests/migration/i386/a-b-bootblock.S +++ b/tests/migration/i386/a-b-bootblock.S @@ -9,6 +9,23 @@ # # Author: dgilb...@redhat.com +#include "migration-test.h" + +#define ACPI_ENABLE 0xf1 +#define ACPI_PORT_SMI_CMD 0xb2 +#define ACPI_PM_BASE0x600 +#define PM1A_CNT_OFFSET 4 + +#define ACPI_SCI_ENABLE 0x0001 +#define ACPI_SLEEP_TYPE 0x0400 +#define ACPI_SLEEP_ENABLE 0x2000 +#define SLEEP (ACPI_SCI_ENABLE + ACPI_SLEEP_TYPE + ACPI_SLEEP_ENABLE) + +#define LOW_ADDRX86_TEST_MEM_START +#define HIGH_ADDR X86_TEST_MEM_END + +/* Save the suspended status at an address that is not written in the loop. */ +#define suspended (X86_TEST_MEM_START + 4) .code16 .org 0x7c00 @@ -35,8 +52,8 @@ start: # at 0x7c00 ? mov %eax,%ds # Start from 1MB -.set TEST_MEM_START, (1024*1024) -.set TEST_MEM_END, (100*1024*1024) +.set TEST_MEM_START, X86_TEST_MEM_START +.set TEST_MEM_END, X86_TEST_MEM_END mov $65,%ax mov $0x3f8,%dx @@ -69,7 +86,30 @@ innerloop: mov $0x3f8,%dx outb %al,%dx -jmp mainloop +# should this test suspend? +mov (suspend_me),%eax +cmp $0,%eax +je mainloop + +# are we waking after suspend? do not suspend again. +mov $suspended,%eax +mov (%eax),%eax +cmp $1,%eax +je mainloop + +# enable acpi +mov $ACPI_ENABLE,%al +outb %al,$ACPI_PORT_SMI_CMD + +# suspend to ram +mov $suspended,%eax +movl $1,(%eax) +mov $SLEEP,%ax +mov $(ACPI_PM_BASE + PM1A_CNT_OFFSET),%dx +outw %ax,%dx +# not reached. The wakeup causes reset and restart at 0x7c00, and we +# do not save and restore registers as a real kernel would do. + # GDT magic from old (GPLv2) Grub startup.S .p2align2 /* force 4-byte alignment */ @@ -95,6 +135,10 @@ gdtdesc: .word 0x27/* limit */ .long gdt /* addr */ +/* test launcher can poke a 1 here to exercise suspend */ +suspend_me: +.int 0 + /* I'm a bootable disk */ .org 0x7dfe .byte 0x55 diff --git a/tests/migration/i386/a-b-bootblock.h b/tests/migration/i386/a-b-bootblock.h index 5b52391..c83f871 100644 --- a/tests/migration/i386/a-b-bootblock.h +++ b/tests/migration/i386/a-b-bootblock.h @@ -4,7 +4,7 @@ * the header and the assembler differences in your patch submission. */ unsigned char x86_bootsect[] = { - 0xfa, 0x0f, 0x01, 0x16, 0x8c, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00, + 0xfa, 0x0f, 0x01, 0x16, 0xb8, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00, 0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02, 0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41, @@ -13,13 +13,13 @@ unsigned char x86_bootsect[] = { 0x40, 0x06, 0x7c, 0xf1, 0xb8, 0x00, 0x00, 0x10, 0x00, 0xfe, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, 0x06, 0x7c, 0xf2, 0xfe, 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, 0x42, 0x00, 0x66, 0xba, - 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00, - 0x00, 0x00
Re: [PATCH V7 00/12] fix migration of suspended runstate
Gack, there was cruft in my send mail directory. Please ignore this threaded series, and I will resend. Sorry for the noise. - Steve On 12/6/2023 12:12 PM, Steve Sistare wrote: > Migration of a guest in the suspended runstate is broken. The incoming > migration code automatically tries to wake the guest, which is wrong; > the guest should end migration in the same runstate it started. Further, > after saving a snapshot in the suspended state and loading it, the vm_start > fails. The runstate is RUNNING, but the guest is not. > > See the commit messages for the details. > > Changes in V2: > * simplify "start on wakeup request" > * fix postcopy, snapshot, and background migration > * refactor fixes for each type of migration > * explicitly handled suspended events and runstate in tests > * add test for postcopy and background migration > > Changes in V3: > * rebase to tip > * fix hang in new function migrate_wait_for_dirty_mem > > Changes in V4: > * rebase to tip > * add patch for vm_prepare_start (thanks Peter) > * add patch to preserve cpu ticks > > Changes in V5: > * rebase to tip > * added patches to completely stop vm in suspended state: > cpus: refactor vm_stop > cpus: stop vm in suspended state > * added patch to partially resume vm in suspended state: > cpus: start vm in suspended state > * modified "preserve suspended ..." patches to use the above. > * deleted patch "preserve cpu ticks if suspended". stop ticks in > vm_stop_force_state instead. > * deleted patch "add runstate function". defined new helper function > migrate_new_runstate in "preserve suspended runstate" > * Added some RB's, but removed other RB's because the patches changed. > > Changes in V6: > * all vm_stop calls completely stop the suspended state > * refactored and updated the "cpus" patches > * simplified the "preserve suspended" patches > * added patch "bootfile per vm" > > Changes in V7: > * rebase to tip, add RB-s > * fix backwards compatibility for global_state.vm_was_suspended > * delete vm_prepare_start state argument, and rename patch > "pass runstate to vm_prepare_start" to > "check running not RUN_STATE_RUNNING" > * drop patches: > tests/qtest: bootfile per vm > tests/qtest: background migration with suspend > * rename runstate_is_started to runstate_is_live > * move wait_for_suspend in tests > > Steve Sistare (12): > cpus: vm_was_suspended > cpus: stop vm in suspended runstate > cpus: check running not RUN_STATE_RUNNING > cpus: vm_resume > migration: propagate suspended runstate > migration: preserve suspended runstate > migration: preserve suspended for snapshot > migration: preserve suspended for bg_migration > tests/qtest: migration events > tests/qtest: option to suspend during migration > tests/qtest: precopy migration with suspend > tests/qtest: postcopy migration with suspend > > backends/tpm/tpm_emulator.c | 2 +- > hw/usb/hcd-ehci.c| 2 +- > hw/usb/redirect.c| 2 +- > hw/xen/xen-hvm-common.c | 2 +- > include/migration/snapshot.h | 7 ++ > include/sysemu/runstate.h| 16 > migration/global_state.c | 35 ++- > migration/migration-hmp-cmds.c | 8 +- > migration/migration.c| 15 +-- > migration/savevm.c | 23 +++-- > qapi/misc.json | 10 +- > system/cpus.c| 47 +++-- > system/runstate.c| 9 ++ > system/vl.c | 2 + > tests/migration/i386/Makefile| 5 +- > tests/migration/i386/a-b-bootblock.S | 50 +- > tests/migration/i386/a-b-bootblock.h | 26 +++-- > tests/qtest/migration-helpers.c | 27 ++ > tests/qtest/migration-helpers.h | 11 ++- > tests/qtest/migration-test.c | 181 > +-- > 20 files changed, 354 insertions(+), 126 deletions(-) >
[PATCH V7 12/12] tests/qtest: postcopy migration with suspend
Add a test case to verify that the suspended state is handled correctly by live migration postcopy. The test suspends the src, migrates, then wakes the dest. Signed-off-by: Steve Sistare --- tests/qtest/migration-test.c | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index f57a978..b7c094e 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -1347,6 +1347,7 @@ static int migrate_postcopy_prepare(QTestState **from_ptr, /* Wait for the first serial output from the source */ wait_for_serial("src_serial"); +wait_for_suspend(from, &src_state); g_autofree char *uri = migrate_get_socket_address(to, "socket-address"); migrate_qmp(from, uri, "{}"); @@ -1364,6 +1365,11 @@ static void migrate_postcopy_complete(QTestState *from, QTestState *to, { wait_for_migration_complete(from); +if (args->start.suspend_me) { +/* wakeup succeeds only if guest is suspended */ +qtest_qmp_assert_success(to, "{'execute': 'system_wakeup'}"); +} + /* Make sure we get at least one "B" on destination */ wait_for_serial("dest_serial"); @@ -1397,6 +1403,15 @@ static void test_postcopy(void) test_postcopy_common(&args); } +static void test_postcopy_suspend(void) +{ +MigrateCommon args = { +.start.suspend_me = true, +}; + +test_postcopy_common(&args); +} + static void test_postcopy_compress(void) { MigrateCommon args = { @@ -3412,7 +3427,10 @@ int main(int argc, char **argv) qtest_add_func("/migration/postcopy/recovery/double-failures", test_postcopy_recovery_double_fail); #endif /* _WIN32 */ - +if (is_x86) { +qtest_add_func("/migration/postcopy/suspend", + test_postcopy_suspend); +} } qtest_add_func("/migration/bad_dest", test_baddest); -- 1.8.3.1
[PATCH 14/14] tests/qtest: postcopy migration with suspend
Add a test case to verify that the suspended state is handled correctly by live migration postcopy. The test suspends the src, migrates, then wakes the dest. Signed-off-by: Steve Sistare --- tests/qtest/migration-test.c | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index 03da984..2033085 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -1347,6 +1347,7 @@ static int migrate_postcopy_prepare(QTestState **from_ptr, /* Wait for the first serial output from the source */ wait_for_serial("src_serial"); +wait_for_suspend(from, &src_state); g_autofree char *uri = migrate_get_socket_address(to, "socket-address"); migrate_qmp(from, uri, "{}"); @@ -1364,6 +1365,11 @@ static void migrate_postcopy_complete(QTestState *from, QTestState *to, { wait_for_migration_complete(from); +if (args->start.suspend_me) { +/* wakeup succeeds only if guest is suspended */ +qtest_qmp_assert_success(to, "{'execute': 'system_wakeup'}"); +} + /* Make sure we get at least one "B" on destination */ wait_for_serial("dest_serial"); @@ -1397,6 +1403,15 @@ static void test_postcopy(void) test_postcopy_common(&args); } +static void test_postcopy_suspend(void) +{ +MigrateCommon args = { +.start.suspend_me = true, +}; + +test_postcopy_common(&args); +} + static void test_postcopy_compress(void) { MigrateCommon args = { @@ -3412,7 +3427,10 @@ int main(int argc, char **argv) qtest_add_func("/migration/postcopy/recovery/double-failures", test_postcopy_recovery_double_fail); #endif /* _WIN32 */ - +if (is_x86) { +qtest_add_func("/migration/postcopy/suspend", + test_postcopy_suspend); +} } qtest_add_func("/migration/bad_dest", test_baddest); -- 1.8.3.1
[PATCH V7 07/12] migration: preserve suspended for snapshot
Restoring a snapshot can break a suspended guest. Snapshots suffer from the same suspended-state issues that affect live migration, plus they must handle an additional problematic scenario, which is that a running vm must remain running if it loads a suspended snapshot. To save, the existing vm_stop call now completely stops the suspended state. Finish with vm_resume to leave the vm in the state it had prior to the save, correctly restoring the suspended state. To load, if the snapshot is not suspended, then vm_stop + vm_resume correctly handles all states, and leaves the vm in the state it had prior to the load. However, if the snapshot is suspended, restoration is trickier. First, call vm_resume to restore the state to suspended so the current state matches the saved state. Then, if the pre-load state is running, call wakeup to resume running. Prior to these changes, the vm_stop to RUN_STATE_SAVE_VM and RUN_STATE_RESTORE_VM did not change runstate if the current state was suspended, but now it does, so allow these transitions. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu --- include/migration/snapshot.h | 7 +++ migration/migration-hmp-cmds.c | 8 +--- migration/savevm.c | 23 +-- system/runstate.c | 5 + system/vl.c| 2 ++ 5 files changed, 32 insertions(+), 13 deletions(-) diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h index e72083b..9e4dcaa 100644 --- a/include/migration/snapshot.h +++ b/include/migration/snapshot.h @@ -16,6 +16,7 @@ #define QEMU_MIGRATION_SNAPSHOT_H #include "qapi/qapi-builtin-types.h" +#include "qapi/qapi-types-run-state.h" /** * save_snapshot: Save an internal snapshot. @@ -61,4 +62,10 @@ bool delete_snapshot(const char *name, bool has_devices, strList *devices, Error **errp); +/** + * load_snapshot_resume: Restore runstate after loading snapshot. + * @state: state to restore + */ +void load_snapshot_resume(RunState state); + #endif diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c index 86ae832..c8d70bc 100644 --- a/migration/migration-hmp-cmds.c +++ b/migration/migration-hmp-cmds.c @@ -399,15 +399,17 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict) void hmp_loadvm(Monitor *mon, const QDict *qdict) { -int saved_vm_running = runstate_is_running(); +RunState saved_state = runstate_get(); + const char *name = qdict_get_str(qdict, "name"); Error *err = NULL; vm_stop(RUN_STATE_RESTORE_VM); -if (load_snapshot(name, NULL, false, NULL, &err) && saved_vm_running) { -vm_start(); +if (load_snapshot(name, NULL, false, NULL, &err)) { +load_snapshot_resume(saved_state); } + hmp_handle_error(mon, err); } diff --git a/migration/savevm.c b/migration/savevm.c index eec5503..26866f0 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -3046,7 +3046,7 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate, QEMUSnapshotInfo sn1, *sn = &sn1; int ret = -1, ret2; QEMUFile *f; -int saved_vm_running; +RunState saved_state = runstate_get(); uint64_t vm_state_size; g_autoptr(GDateTime) now = g_date_time_new_now_local(); AioContext *aio_context; @@ -3094,8 +3094,6 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate, } aio_context = bdrv_get_aio_context(bs); -saved_vm_running = runstate_is_running(); - global_state_store(); vm_stop(RUN_STATE_SAVE_VM); @@ -3163,9 +3161,7 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate, bdrv_drain_all_end(); -if (saved_vm_running) { -vm_start(); -} +vm_resume(saved_state); return ret == 0; } @@ -3339,6 +3335,14 @@ err_drain: return false; } +void load_snapshot_resume(RunState state) +{ +vm_resume(state); +if (state == RUN_STATE_RUNNING && runstate_get() == RUN_STATE_SUSPENDED) { +qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, &error_abort); +} +} + bool delete_snapshot(const char *name, bool has_devices, strList *devices, Error **errp) { @@ -3403,16 +3407,15 @@ static void snapshot_load_job_bh(void *opaque) { Job *job = opaque; SnapshotJob *s = container_of(job, SnapshotJob, common); -int orig_vm_running; +RunState orig_state = runstate_get(); job_progress_set_remaining(&s->common, 1); -orig_vm_running = runstate_is_running(); vm_stop(RUN_STATE_RESTORE_VM); s->ret = load_snapshot(s->tag, s->vmstate, true, s->devices, s->errp); -if (s->ret && orig_vm_running) { -vm_start(); +if (s->ret) { +load_snapshot_resume(orig_state); } job_progress_update(&s->common, 1); diff --git a/system/runstate.c b/system/runstate.c index e2fa204..ca9eb54 100644 --- a/system/runstate.c +++ b
[PATCH V7 11/12] tests/qtest: precopy migration with suspend
Add a test case to verify that the suspended state is handled correctly during live migration precopy. The test suspends the src, migrates, then wakes the dest. Signed-off-by: Steve Sistare --- tests/qtest/migration-helpers.c | 3 ++ tests/qtest/migration-helpers.h | 2 ++ tests/qtest/migration-test.c| 62 +++-- 3 files changed, 64 insertions(+), 3 deletions(-) diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c index fd3b94e..37e8e81 100644 --- a/tests/qtest/migration-helpers.c +++ b/tests/qtest/migration-helpers.c @@ -32,6 +32,9 @@ bool migrate_watch_for_events(QTestState *who, const char *name, if (g_str_equal(name, "STOP")) { state->stop_seen = true; return true; +} else if (g_str_equal(name, "SUSPEND")) { +state->suspend_seen = true; +return true; } else if (g_str_equal(name, "RESUME")) { state->resume_seen = true; return true; diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h index 3d32699..b478549 100644 --- a/tests/qtest/migration-helpers.h +++ b/tests/qtest/migration-helpers.h @@ -18,6 +18,8 @@ typedef struct QTestMigrationState { bool stop_seen; bool resume_seen; +bool suspend_seen; +bool suspend_me; } QTestMigrationState; bool migrate_watch_for_events(QTestState *who, const char *name, diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index e10d5a4..f57a978 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -178,7 +178,7 @@ static void bootfile_delete(void) /* * Wait for some output in the serial output file, * we get an 'A' followed by an endless string of 'B's - * but on the destination we won't have the A. + * but on the destination we won't have the A (unless we enabled suspend/resume) */ static void wait_for_serial(const char *side) { @@ -245,6 +245,13 @@ static void wait_for_resume(QTestState *who, QTestMigrationState *state) } } +static void wait_for_suspend(QTestState *who, QTestMigrationState *state) +{ +if (state->suspend_me && !state->suspend_seen) { +qtest_qmp_eventwait(who, "SUSPEND"); +} +} + /* * It's tricky to use qemu's migration event capability with qtest, * events suddenly appearing confuse the qmp()/hmp() responses. @@ -299,7 +306,7 @@ static void wait_for_migration_pass(QTestState *who) { uint64_t pass, prev_pass = 0, changes = 0; -while (changes < 2 && !src_state.stop_seen) { +while (changes < 2 && !src_state.stop_seen && !src_state.suspend_seen) { usleep(1000); pass = get_migration_pass(who); changes += (pass != prev_pass); @@ -584,6 +591,12 @@ static void migrate_wait_for_dirty_mem(QTestState *from, usleep(1000 * 10); } while (qtest_readq(to, marker_address) != MAGIC_MARKER); + +/* If suspended, src only iterates once, and watch_byte may never change */ +if (src_state.suspend_me) { +return; +} + /* * Now ensure that already transferred bytes are * dirty again from the guest workload. Note the @@ -771,6 +784,7 @@ static int test_migrate_start(QTestState **from, QTestState **to, dst_state = (QTestMigrationState) { }; src_state = (QTestMigrationState) { }; bootfile_create(tmpfs, args->suspend_me); +src_state.suspend_me = args->suspend_me; if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) { memory_size = "150M"; @@ -1717,6 +1731,7 @@ static void test_precopy_common(MigrateCommon *args) /* Wait for the first serial output from the source */ if (args->result == MIG_TEST_SUCCEED) { wait_for_serial("src_serial"); +wait_for_suspend(from, &src_state); } if (args->live) { @@ -1793,6 +1808,11 @@ static void test_precopy_common(MigrateCommon *args) wait_for_resume(to, &dst_state); +if (args->start.suspend_me) { +/* wakeup succeeds only if guest is suspended */ +qtest_qmp_assert_success(to, "{'execute': 'system_wakeup'}"); +} + wait_for_serial("dest_serial"); } @@ -1879,6 +1899,34 @@ static void test_precopy_unix_plain(void) test_precopy_common(&args); } +static void test_precopy_unix_suspend_live(void) +{ +g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs); +MigrateCommon args = { +.listen_uri = uri, +.connect_uri = uri, +/* + * despite being live, the test is fast because the src + * suspends immediately. + */ +.live = true, +.start.suspend_me = true, +}; + +test_precopy_common(&args); +} + +static void test_precopy_unix_suspend_notlive(void) +{ +g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs); +MigrateCommon args = { +.listen_uri = uri, +.connect_uri = uri, +.start.suspend_me = true, +}; + +test
[PATCH V7 10/12] tests/qtest: option to suspend during migration
Add an option to suspend the src in a-b-bootblock.S, which puts the guest in S3 state after one round of writing to memory. The option is enabled by poking a 1 into the suspend_me word in the boot block prior to starting the src vm. Generate symbol offsets in a-b-bootblock.h so that the suspend_me offset is known. Generate the bootblock for each test, because suspend_me may differ for each. Signed-off-by: Steve Sistare Acked-by: Peter Xu Reviewed-by: Fabiano Rosas --- tests/migration/i386/Makefile| 5 ++-- tests/migration/i386/a-b-bootblock.S | 50 +--- tests/migration/i386/a-b-bootblock.h | 26 +-- tests/qtest/migration-test.c | 12 ++--- 4 files changed, 77 insertions(+), 16 deletions(-) diff --git a/tests/migration/i386/Makefile b/tests/migration/i386/Makefile index 5c03241..37a72ae 100644 --- a/tests/migration/i386/Makefile +++ b/tests/migration/i386/Makefile @@ -4,9 +4,10 @@ .PHONY: all clean all: a-b-bootblock.h -a-b-bootblock.h: x86.bootsect +a-b-bootblock.h: x86.bootsect x86.o echo "$$__note" > header.tmp xxd -i $< | sed -e 's/.*int.*//' >> header.tmp + nm x86.o | awk '{print "#define SYM_"$$3" 0x"$$1}' >> header.tmp mv header.tmp $@ x86.bootsect: x86.boot @@ -16,7 +17,7 @@ x86.boot: x86.o $(CROSS_PREFIX)objcopy -O binary $< $@ x86.o: a-b-bootblock.S - $(CROSS_PREFIX)gcc -m32 -march=i486 -c $< -o $@ + $(CROSS_PREFIX)gcc -I.. -m32 -march=i486 -c $< -o $@ clean: @rm -rf *.boot *.o *.bootsect diff --git a/tests/migration/i386/a-b-bootblock.S b/tests/migration/i386/a-b-bootblock.S index 6bb..6f39eb6 100644 --- a/tests/migration/i386/a-b-bootblock.S +++ b/tests/migration/i386/a-b-bootblock.S @@ -9,6 +9,23 @@ # # Author: dgilb...@redhat.com +#include "migration-test.h" + +#define ACPI_ENABLE 0xf1 +#define ACPI_PORT_SMI_CMD 0xb2 +#define ACPI_PM_BASE0x600 +#define PM1A_CNT_OFFSET 4 + +#define ACPI_SCI_ENABLE 0x0001 +#define ACPI_SLEEP_TYPE 0x0400 +#define ACPI_SLEEP_ENABLE 0x2000 +#define SLEEP (ACPI_SCI_ENABLE + ACPI_SLEEP_TYPE + ACPI_SLEEP_ENABLE) + +#define LOW_ADDRX86_TEST_MEM_START +#define HIGH_ADDR X86_TEST_MEM_END + +/* Save the suspended status at an address that is not written in the loop. */ +#define suspended (X86_TEST_MEM_START + 4) .code16 .org 0x7c00 @@ -35,8 +52,8 @@ start: # at 0x7c00 ? mov %eax,%ds # Start from 1MB -.set TEST_MEM_START, (1024*1024) -.set TEST_MEM_END, (100*1024*1024) +.set TEST_MEM_START, X86_TEST_MEM_START +.set TEST_MEM_END, X86_TEST_MEM_END mov $65,%ax mov $0x3f8,%dx @@ -69,7 +86,30 @@ innerloop: mov $0x3f8,%dx outb %al,%dx -jmp mainloop +# should this test suspend? +mov (suspend_me),%eax +cmp $0,%eax +je mainloop + +# are we waking after suspend? do not suspend again. +mov $suspended,%eax +mov (%eax),%eax +cmp $1,%eax +je mainloop + +# enable acpi +mov $ACPI_ENABLE,%al +outb %al,$ACPI_PORT_SMI_CMD + +# suspend to ram +mov $suspended,%eax +movl $1,(%eax) +mov $SLEEP,%ax +mov $(ACPI_PM_BASE + PM1A_CNT_OFFSET),%dx +outw %ax,%dx +# not reached. The wakeup causes reset and restart at 0x7c00, and we +# do not save and restore registers as a real kernel would do. + # GDT magic from old (GPLv2) Grub startup.S .p2align2 /* force 4-byte alignment */ @@ -95,6 +135,10 @@ gdtdesc: .word 0x27/* limit */ .long gdt /* addr */ +/* test launcher can poke a 1 here to exercise suspend */ +suspend_me: +.int 0 + /* I'm a bootable disk */ .org 0x7dfe .byte 0x55 diff --git a/tests/migration/i386/a-b-bootblock.h b/tests/migration/i386/a-b-bootblock.h index 5b52391..c83f871 100644 --- a/tests/migration/i386/a-b-bootblock.h +++ b/tests/migration/i386/a-b-bootblock.h @@ -4,7 +4,7 @@ * the header and the assembler differences in your patch submission. */ unsigned char x86_bootsect[] = { - 0xfa, 0x0f, 0x01, 0x16, 0x8c, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00, + 0xfa, 0x0f, 0x01, 0x16, 0xb8, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00, 0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02, 0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41, @@ -13,13 +13,13 @@ unsigned char x86_bootsect[] = { 0x40, 0x06, 0x7c, 0xf1, 0xb8, 0x00, 0x00, 0x10, 0x00, 0xfe, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, 0x06, 0x7c, 0xf2, 0xfe, 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, 0x42, 0x00, 0x66, 0xba, - 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00, - 0x00, 0x00
[PATCH 08/14] migration: preserve suspended runstate
A guest that is migrated in the suspended state automaticaly wakes and continues execution. This is wrong; the guest should end migration in the same state it started. The root cause is that the outgoing migration code automatically wakes the guest, then saves the RUNNING runstate in global_state_store(), hence the incoming migration code thinks the guest is running and continues the guest if autostart is true. On the outgoing side, delete the call to qemu_system_wakeup_request(). Now that vm_stop completely stops a vm in the suspended state (from the preceding patches), the existing call to vm_stop_force_state is sufficient to correctly migrate all vmstate. On the incoming side, call vm_start if the pre-migration state was running or suspended. For the latter, vm_start correctly restores the suspended state, and a future system_wakeup monitor request will cause the vm to resume running. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu --- migration/migration.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 28a34c9..52ba0bc 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -603,7 +603,7 @@ static void process_incoming_migration_bh(void *opaque) */ if (!migrate_late_block_activate() || (autostart && (!global_state_received() || -global_state_get_runstate() == RUN_STATE_RUNNING))) { +runstate_is_live(global_state_get_runstate() { /* Make sure all file formats throw away their mutable metadata. * If we get an error here, just don't restart the VM yet. */ bdrv_activate_all(&local_err); @@ -627,7 +627,7 @@ static void process_incoming_migration_bh(void *opaque) dirty_bitmap_mig_before_vm_start(); if (!global_state_received() || -global_state_get_runstate() == RUN_STATE_RUNNING) { +runstate_is_live(global_state_get_runstate())) { if (autostart) { vm_start(); } else { @@ -2415,7 +2415,6 @@ static int postcopy_start(MigrationState *ms, Error **errp) migration_downtime_start(ms); -qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); global_state_store(); ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE); if (ret < 0) { @@ -2614,7 +2613,6 @@ static int migration_completion_precopy(MigrationState *s, qemu_mutex_lock_iothread(); migration_downtime_start(s); -qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); s->vm_old_state = runstate_get(); global_state_store(); @@ -3135,7 +3133,7 @@ static void migration_iteration_finish(MigrationState *s) case MIGRATION_STATUS_FAILED: case MIGRATION_STATUS_CANCELLED: case MIGRATION_STATUS_CANCELLING: -if (s->vm_old_state == RUN_STATE_RUNNING) { +if (runstate_is_live(s->vm_old_state)) { if (!runstate_check(RUN_STATE_SHUTDOWN)) { vm_start(); } -- 1.8.3.1
[PATCH 13/14] tests/qtest: precopy migration with suspend
Add a test case to verify that the suspended state is handled correctly during live migration precopy. The test suspends the src, migrates, then wakes the dest. Signed-off-by: Steve Sistare --- tests/qtest/migration-helpers.c | 3 ++ tests/qtest/migration-helpers.h | 2 ++ tests/qtest/migration-test.c| 62 +++-- 3 files changed, 64 insertions(+), 3 deletions(-) diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c index fd3b94e..37e8e81 100644 --- a/tests/qtest/migration-helpers.c +++ b/tests/qtest/migration-helpers.c @@ -32,6 +32,9 @@ bool migrate_watch_for_events(QTestState *who, const char *name, if (g_str_equal(name, "STOP")) { state->stop_seen = true; return true; +} else if (g_str_equal(name, "SUSPEND")) { +state->suspend_seen = true; +return true; } else if (g_str_equal(name, "RESUME")) { state->resume_seen = true; return true; diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h index 3d32699..b478549 100644 --- a/tests/qtest/migration-helpers.h +++ b/tests/qtest/migration-helpers.h @@ -18,6 +18,8 @@ typedef struct QTestMigrationState { bool stop_seen; bool resume_seen; +bool suspend_seen; +bool suspend_me; } QTestMigrationState; bool migrate_watch_for_events(QTestState *who, const char *name, diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index 84c78e3..03da984 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -178,7 +178,7 @@ static void bootfile_delete(void) /* * Wait for some output in the serial output file, * we get an 'A' followed by an endless string of 'B's - * but on the destination we won't have the A. + * but on the destination we won't have the A (unless we enabled suspend/resume) */ static void wait_for_serial(const char *side) { @@ -245,6 +245,13 @@ static void wait_for_resume(QTestState *who, QTestMigrationState *state) } } +static void wait_for_suspend(QTestState *who, QTestMigrationState *state) +{ +if (state->suspend_me && !state->suspend_seen) { +qtest_qmp_eventwait(who, "SUSPEND"); +} +} + /* * It's tricky to use qemu's migration event capability with qtest, * events suddenly appearing confuse the qmp()/hmp() responses. @@ -299,7 +306,7 @@ static void wait_for_migration_pass(QTestState *who) { uint64_t pass, prev_pass = 0, changes = 0; -while (changes < 2 && !src_state.stop_seen) { +while (changes < 2 && !src_state.stop_seen && !src_state.suspend_seen) { usleep(1000); pass = get_migration_pass(who); changes += (pass != prev_pass); @@ -584,6 +591,12 @@ static void migrate_wait_for_dirty_mem(QTestState *from, usleep(1000 * 10); } while (qtest_readq(to, marker_address) != MAGIC_MARKER); + +/* If suspended, src only iterates once, and watch_byte may never change */ +if (src_state.suspend_me) { +return; +} + /* * Now ensure that already transferred bytes are * dirty again from the guest workload. Note the @@ -771,6 +784,7 @@ static int test_migrate_start(QTestState **from, QTestState **to, dst_state = (QTestMigrationState) { }; src_state = (QTestMigrationState) { }; bootfile_create(tmpfs, args->suspend_me); +src_state.suspend_me = args->suspend_me; if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) { memory_size = "150M"; @@ -1717,6 +1731,7 @@ static void test_precopy_common(MigrateCommon *args) /* Wait for the first serial output from the source */ if (args->result == MIG_TEST_SUCCEED) { wait_for_serial("src_serial"); +wait_for_suspend(from, &src_state); } if (args->live) { @@ -1793,6 +1808,11 @@ static void test_precopy_common(MigrateCommon *args) wait_for_resume(to, &dst_state); +if (args->start.suspend_me) { +/* wakeup succeeds only if guest is suspended */ +qtest_qmp_assert_success(to, "{'execute': 'system_wakeup'}"); +} + wait_for_serial("dest_serial"); } @@ -1879,6 +1899,34 @@ static void test_precopy_unix_plain(void) test_precopy_common(&args); } +static void test_precopy_unix_suspend_live(void) +{ +g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs); +MigrateCommon args = { +.listen_uri = uri, +.connect_uri = uri, +/* + * despite being live, the test is fast because the src + * suspends immediately. + */ +.live = true, +.start.suspend_me = true, +}; + +test_precopy_common(&args); +} + +static void test_precopy_unix_suspend_notlive(void) +{ +g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs); +MigrateCommon args = { +.listen_uri = uri, +.connect_uri = uri, +.start.suspend_me = true, +}; + +test
[PATCH 06/14] cpus: vm_resume
Define the vm_resume helper, for use in subsequent patches. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu --- include/sysemu/runstate.h | 9 + system/cpus.c | 9 + 2 files changed, 18 insertions(+) diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h index 867e53c..315046b 100644 --- a/include/sysemu/runstate.h +++ b/include/sysemu/runstate.h @@ -53,6 +53,15 @@ void vm_start(void); * @step_pending: whether any of the CPUs is about to be single-stepped by gdb */ int vm_prepare_start(bool step_pending); + +/** + * vm_resume: If @state is a live state, start the vm and set the state, + * else just set the state. + * + * @state: the state to restore + */ +void vm_resume(RunState state); + int vm_stop(RunState state); int vm_stop_force_state(RunState state); int vm_shutdown(void); diff --git a/system/cpus.c b/system/cpus.c index 73b0e3e..45e4b3e 100644 --- a/system/cpus.c +++ b/system/cpus.c @@ -741,6 +741,15 @@ void vm_start(void) } } +void vm_resume(RunState state) +{ +if (runstate_is_live(state)) { +vm_start(); +} else { +runstate_set(state); +} +} + /* does a state transition even if the VM is already stopped, current state is forgotten forever */ int vm_stop_force_state(RunState state) -- 1.8.3.1
[PATCH V7 06/12] migration: preserve suspended runstate
A guest that is migrated in the suspended state automaticaly wakes and continues execution. This is wrong; the guest should end migration in the same state it started. The root cause is that the outgoing migration code automatically wakes the guest, then saves the RUNNING runstate in global_state_store(), hence the incoming migration code thinks the guest is running and continues the guest if autostart is true. On the outgoing side, delete the call to qemu_system_wakeup_request(). Now that vm_stop completely stops a vm in the suspended state (from the preceding patches), the existing call to vm_stop_force_state is sufficient to correctly migrate all vmstate. On the incoming side, call vm_start if the pre-migration state was running or suspended. For the latter, vm_start correctly restores the suspended state, and a future system_wakeup monitor request will cause the vm to resume running. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu --- migration/migration.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 3ce04b2..8124811 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -604,7 +604,7 @@ static void process_incoming_migration_bh(void *opaque) */ if (!migrate_late_block_activate() || (autostart && (!global_state_received() || -global_state_get_runstate() == RUN_STATE_RUNNING))) { +runstate_is_live(global_state_get_runstate() { /* Make sure all file formats throw away their mutable metadata. * If we get an error here, just don't restart the VM yet. */ bdrv_activate_all(&local_err); @@ -628,7 +628,7 @@ static void process_incoming_migration_bh(void *opaque) dirty_bitmap_mig_before_vm_start(); if (!global_state_received() || -global_state_get_runstate() == RUN_STATE_RUNNING) { +runstate_is_live(global_state_get_runstate())) { if (autostart) { vm_start(); } else { @@ -2416,7 +2416,6 @@ static int postcopy_start(MigrationState *ms, Error **errp) migration_downtime_start(ms); -qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); global_state_store(); ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE); if (ret < 0) { @@ -2615,7 +2614,6 @@ static int migration_completion_precopy(MigrationState *s, qemu_mutex_lock_iothread(); migration_downtime_start(s); -qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); s->vm_old_state = runstate_get(); global_state_store(); @@ -3136,7 +3134,7 @@ static void migration_iteration_finish(MigrationState *s) case MIGRATION_STATUS_FAILED: case MIGRATION_STATUS_CANCELLED: case MIGRATION_STATUS_CANCELLING: -if (s->vm_old_state == RUN_STATE_RUNNING) { +if (runstate_is_live(s->vm_old_state)) { if (!runstate_check(RUN_STATE_SHUTDOWN)) { vm_start(); } -- 1.8.3.1
Re: [PATCH v2 for-8.2?] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit
On Tue, 2023-12-05 at 16:28 -0600, Michael Roth wrote: > Commit 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors") > added error checking for KVM_SET_SREGS/KVM_SET_SREGS2. In doing so, it > exposed a long-running bug in current KVM support for SEV-ES where the > kernel assumes that MSR_EFER_LMA will be set explicitly by the guest > kernel, in which case EFER write traps would result in KVM eventually > seeing MSR_EFER_LMA get set and recording it in such a way that it would > be subsequently visible when accessing it via KVM_GET_SREGS/etc. > > However, guests kernels currently rely on MSR_EFER_LMA getting set > automatically when MSR_EFER_LME is set and paging is enabled via > CR0_PG_MASK. As a result, the EFER write traps don't actually expose the > MSR_EFER_LMA even though it is set internally, and when QEMU > subsequently tries to pass this EFER value back to KVM via > KVM_SET_SREGS* it will fail various sanity checks and return -EINVAL, > which is now considered fatal due to the aforementioned QEMU commit. > > This can be addressed by inferring the MSR_EFER_LMA bit being set when > paging is enabled and MSR_EFER_LME is set, and synthesizing it to ensure > the expected bits are all present in subsequent handling on the host > side. > > Ultimately, this handling will be implemented in the host kernel, but to > avoid breaking QEMU's SEV-ES support when using older host kernels, the > same handling can be done in QEMU just after fetching the register > values via KVM_GET_SREGS*. Implement that here. > > Cc: Paolo Bonzini > Cc: Marcelo Tosatti > Cc: Tom Lendacky > Cc: Akihiko Odaki > Cc: k...@vger.kernel.org > Fixes: 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors") > Signed-off-by: Michael Roth > --- > v2: > - Add handling for KVM_GET_SREGS, not just KVM_GET_SREGS2 > > target/i386/kvm/kvm.c | 14 ++ > 1 file changed, 14 insertions(+) > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c > index 11b8177eff..8721c1bf8f 100644 > --- a/target/i386/kvm/kvm.c > +++ b/target/i386/kvm/kvm.c > @@ -3610,6 +3610,7 @@ static int kvm_get_sregs(X86CPU *cpu) > { > CPUX86State *env = &cpu->env; > struct kvm_sregs sregs; > +target_ulong cr0_old; > int ret; > > ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS, &sregs); > @@ -3637,12 +3638,18 @@ static int kvm_get_sregs(X86CPU *cpu) > env->gdt.limit = sregs.gdt.limit; > env->gdt.base = sregs.gdt.base; > > +cr0_old = env->cr[0]; > env->cr[0] = sregs.cr0; > env->cr[2] = sregs.cr2; > env->cr[3] = sregs.cr3; > env->cr[4] = sregs.cr4; > > env->efer = sregs.efer; > +if (sev_es_enabled() && env->efer & MSR_EFER_LME) { > +if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) { > +env->efer |= MSR_EFER_LMA; > +} > +} I think that we should not check that CR0_PG has changed, and just blindly assume that if EFER.LME is set and CR0.PG is set, then EFER.LMA must be set as defined in x86 spec. Otherwise, suppose qemu calls kvm_get_sregs twice: First time it will work, but second time CR0.PG will match one that is stored in the env, and thus the workaround will not be executed, and instead we will revert back to wrong EFER value reported by the kernel. How about something like that: if (sev_es_enabled() && env->efer & MSR_EFER_LME && env->cr[0] & CR0_PG_MASK) { /* * Workaround KVM bug, because of which KVM might not be aware of the * fact that EFER.LMA was toggled by the hardware */ env->efer |= MSR_EFER_LMA; } Best regards, Maxim Levitsky > > /* changes to apic base and cr8/tpr are read back via kvm_arch_post_run > */ > x86_update_hflags(env); > @@ -3654,6 +3661,7 @@ static int kvm_get_sregs2(X86CPU *cpu) > { > CPUX86State *env = &cpu->env; > struct kvm_sregs2 sregs; > +target_ulong cr0_old; > int i, ret; > > ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS2, &sregs); > @@ -3676,12 +3684,18 @@ static int kvm_get_sregs2(X86CPU *cpu) > env->gdt.limit = sregs.gdt.limit; > env->gdt.base = sregs.gdt.base; > > +cr0_old = env->cr[0]; > env->cr[0] = sregs.cr0; > env->cr[2] = sregs.cr2; > env->cr[3] = sregs.cr3; > env->cr[4] = sregs.cr4; > > env->efer = sregs.efer; > +if (sev_es_enabled() && env->efer & MSR_EFER_LME) { > +if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) { > +env->efer |= MSR_EFER_LMA; > +} > +} > > env->pdptrs_valid = sregs.flags & KVM_SREGS2_FLAGS_PDPTRS_VALID; >
[PATCH V7 00/12] fix migration of suspended runstate
Migration of a guest in the suspended runstate is broken. The incoming migration code automatically tries to wake the guest, which is wrong; the guest should end migration in the same runstate it started. Further, after saving a snapshot in the suspended state and loading it, the vm_start fails. The runstate is RUNNING, but the guest is not. See the commit messages for the details. Changes in V2: * simplify "start on wakeup request" * fix postcopy, snapshot, and background migration * refactor fixes for each type of migration * explicitly handled suspended events and runstate in tests * add test for postcopy and background migration Changes in V3: * rebase to tip * fix hang in new function migrate_wait_for_dirty_mem Changes in V4: * rebase to tip * add patch for vm_prepare_start (thanks Peter) * add patch to preserve cpu ticks Changes in V5: * rebase to tip * added patches to completely stop vm in suspended state: cpus: refactor vm_stop cpus: stop vm in suspended state * added patch to partially resume vm in suspended state: cpus: start vm in suspended state * modified "preserve suspended ..." patches to use the above. * deleted patch "preserve cpu ticks if suspended". stop ticks in vm_stop_force_state instead. * deleted patch "add runstate function". defined new helper function migrate_new_runstate in "preserve suspended runstate" * Added some RB's, but removed other RB's because the patches changed. Changes in V6: * all vm_stop calls completely stop the suspended state * refactored and updated the "cpus" patches * simplified the "preserve suspended" patches * added patch "bootfile per vm" Changes in V7: * rebase to tip, add RB-s * fix backwards compatibility for global_state.vm_was_suspended * delete vm_prepare_start state argument, and rename patch "pass runstate to vm_prepare_start" to "check running not RUN_STATE_RUNNING" * drop patches: tests/qtest: bootfile per vm tests/qtest: background migration with suspend * rename runstate_is_started to runstate_is_live * move wait_for_suspend in tests Steve Sistare (12): cpus: vm_was_suspended cpus: stop vm in suspended runstate cpus: check running not RUN_STATE_RUNNING cpus: vm_resume migration: propagate suspended runstate migration: preserve suspended runstate migration: preserve suspended for snapshot migration: preserve suspended for bg_migration tests/qtest: migration events tests/qtest: option to suspend during migration tests/qtest: precopy migration with suspend tests/qtest: postcopy migration with suspend backends/tpm/tpm_emulator.c | 2 +- hw/usb/hcd-ehci.c| 2 +- hw/usb/redirect.c| 2 +- hw/xen/xen-hvm-common.c | 2 +- include/migration/snapshot.h | 7 ++ include/sysemu/runstate.h| 16 migration/global_state.c | 35 ++- migration/migration-hmp-cmds.c | 8 +- migration/migration.c| 15 +-- migration/savevm.c | 23 +++-- qapi/misc.json | 10 +- system/cpus.c| 47 +++-- system/runstate.c| 9 ++ system/vl.c | 2 + tests/migration/i386/Makefile| 5 +- tests/migration/i386/a-b-bootblock.S | 50 +- tests/migration/i386/a-b-bootblock.h | 26 +++-- tests/qtest/migration-helpers.c | 27 ++ tests/qtest/migration-helpers.h | 11 ++- tests/qtest/migration-test.c | 181 +-- 20 files changed, 354 insertions(+), 126 deletions(-) -- 1.8.3.1
[PATCH V7 10/12] tests/qtest: option to suspend during migration
Add an option to suspend the src in a-b-bootblock.S, which puts the guest in S3 state after one round of writing to memory. The option is enabled by poking a 1 into the suspend_me word in the boot block prior to starting the src vm. Generate symbol offsets in a-b-bootblock.h so that the suspend_me offset is known. Generate the bootblock for each test, because suspend_me may differ for each. Signed-off-by: Steve Sistare Acked-by: Peter Xu Reviewed-by: Fabiano Rosas --- tests/migration/i386/Makefile| 5 ++-- tests/migration/i386/a-b-bootblock.S | 50 +--- tests/migration/i386/a-b-bootblock.h | 26 +-- tests/qtest/migration-test.c | 12 ++--- 4 files changed, 77 insertions(+), 16 deletions(-) diff --git a/tests/migration/i386/Makefile b/tests/migration/i386/Makefile index 5c03241..37a72ae 100644 --- a/tests/migration/i386/Makefile +++ b/tests/migration/i386/Makefile @@ -4,9 +4,10 @@ .PHONY: all clean all: a-b-bootblock.h -a-b-bootblock.h: x86.bootsect +a-b-bootblock.h: x86.bootsect x86.o echo "$$__note" > header.tmp xxd -i $< | sed -e 's/.*int.*//' >> header.tmp + nm x86.o | awk '{print "#define SYM_"$$3" 0x"$$1}' >> header.tmp mv header.tmp $@ x86.bootsect: x86.boot @@ -16,7 +17,7 @@ x86.boot: x86.o $(CROSS_PREFIX)objcopy -O binary $< $@ x86.o: a-b-bootblock.S - $(CROSS_PREFIX)gcc -m32 -march=i486 -c $< -o $@ + $(CROSS_PREFIX)gcc -I.. -m32 -march=i486 -c $< -o $@ clean: @rm -rf *.boot *.o *.bootsect diff --git a/tests/migration/i386/a-b-bootblock.S b/tests/migration/i386/a-b-bootblock.S index 6bb..6f39eb6 100644 --- a/tests/migration/i386/a-b-bootblock.S +++ b/tests/migration/i386/a-b-bootblock.S @@ -9,6 +9,23 @@ # # Author: dgilb...@redhat.com +#include "migration-test.h" + +#define ACPI_ENABLE 0xf1 +#define ACPI_PORT_SMI_CMD 0xb2 +#define ACPI_PM_BASE0x600 +#define PM1A_CNT_OFFSET 4 + +#define ACPI_SCI_ENABLE 0x0001 +#define ACPI_SLEEP_TYPE 0x0400 +#define ACPI_SLEEP_ENABLE 0x2000 +#define SLEEP (ACPI_SCI_ENABLE + ACPI_SLEEP_TYPE + ACPI_SLEEP_ENABLE) + +#define LOW_ADDRX86_TEST_MEM_START +#define HIGH_ADDR X86_TEST_MEM_END + +/* Save the suspended status at an address that is not written in the loop. */ +#define suspended (X86_TEST_MEM_START + 4) .code16 .org 0x7c00 @@ -35,8 +52,8 @@ start: # at 0x7c00 ? mov %eax,%ds # Start from 1MB -.set TEST_MEM_START, (1024*1024) -.set TEST_MEM_END, (100*1024*1024) +.set TEST_MEM_START, X86_TEST_MEM_START +.set TEST_MEM_END, X86_TEST_MEM_END mov $65,%ax mov $0x3f8,%dx @@ -69,7 +86,30 @@ innerloop: mov $0x3f8,%dx outb %al,%dx -jmp mainloop +# should this test suspend? +mov (suspend_me),%eax +cmp $0,%eax +je mainloop + +# are we waking after suspend? do not suspend again. +mov $suspended,%eax +mov (%eax),%eax +cmp $1,%eax +je mainloop + +# enable acpi +mov $ACPI_ENABLE,%al +outb %al,$ACPI_PORT_SMI_CMD + +# suspend to ram +mov $suspended,%eax +movl $1,(%eax) +mov $SLEEP,%ax +mov $(ACPI_PM_BASE + PM1A_CNT_OFFSET),%dx +outw %ax,%dx +# not reached. The wakeup causes reset and restart at 0x7c00, and we +# do not save and restore registers as a real kernel would do. + # GDT magic from old (GPLv2) Grub startup.S .p2align2 /* force 4-byte alignment */ @@ -95,6 +135,10 @@ gdtdesc: .word 0x27/* limit */ .long gdt /* addr */ +/* test launcher can poke a 1 here to exercise suspend */ +suspend_me: +.int 0 + /* I'm a bootable disk */ .org 0x7dfe .byte 0x55 diff --git a/tests/migration/i386/a-b-bootblock.h b/tests/migration/i386/a-b-bootblock.h index 5b52391..c83f871 100644 --- a/tests/migration/i386/a-b-bootblock.h +++ b/tests/migration/i386/a-b-bootblock.h @@ -4,7 +4,7 @@ * the header and the assembler differences in your patch submission. */ unsigned char x86_bootsect[] = { - 0xfa, 0x0f, 0x01, 0x16, 0x8c, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00, + 0xfa, 0x0f, 0x01, 0x16, 0xb8, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00, 0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02, 0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41, @@ -13,13 +13,13 @@ unsigned char x86_bootsect[] = { 0x40, 0x06, 0x7c, 0xf1, 0xb8, 0x00, 0x00, 0x10, 0x00, 0xfe, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, 0x06, 0x7c, 0xf2, 0xfe, 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, 0x42, 0x00, 0x66, 0xba, - 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00, - 0x00, 0x00
[PATCH V7 04/12] cpus: vm_resume
Define the vm_resume helper, for use in subsequent patches. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu --- include/sysemu/runstate.h | 9 + system/cpus.c | 9 + 2 files changed, 18 insertions(+) diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h index 867e53c..315046b 100644 --- a/include/sysemu/runstate.h +++ b/include/sysemu/runstate.h @@ -53,6 +53,15 @@ void vm_start(void); * @step_pending: whether any of the CPUs is about to be single-stepped by gdb */ int vm_prepare_start(bool step_pending); + +/** + * vm_resume: If @state is a live state, start the vm and set the state, + * else just set the state. + * + * @state: the state to restore + */ +void vm_resume(RunState state); + int vm_stop(RunState state); int vm_stop_force_state(RunState state); int vm_shutdown(void); diff --git a/system/cpus.c b/system/cpus.c index f162435..7d2c28b 100644 --- a/system/cpus.c +++ b/system/cpus.c @@ -748,6 +748,15 @@ void vm_start(void) } } +void vm_resume(RunState state) +{ +if (runstate_is_live(state)) { +vm_start(); +} else { +runstate_set(state); +} +} + /* does a state transition even if the VM is already stopped, current state is forgotten forever */ int vm_stop_force_state(RunState state) -- 1.8.3.1
[PATCH V7 12/12] tests/qtest: postcopy migration with suspend
Add a test case to verify that the suspended state is handled correctly by live migration postcopy. The test suspends the src, migrates, then wakes the dest. Signed-off-by: Steve Sistare --- tests/qtest/migration-test.c | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index f57a978..b7c094e 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -1347,6 +1347,7 @@ static int migrate_postcopy_prepare(QTestState **from_ptr, /* Wait for the first serial output from the source */ wait_for_serial("src_serial"); +wait_for_suspend(from, &src_state); g_autofree char *uri = migrate_get_socket_address(to, "socket-address"); migrate_qmp(from, uri, "{}"); @@ -1364,6 +1365,11 @@ static void migrate_postcopy_complete(QTestState *from, QTestState *to, { wait_for_migration_complete(from); +if (args->start.suspend_me) { +/* wakeup succeeds only if guest is suspended */ +qtest_qmp_assert_success(to, "{'execute': 'system_wakeup'}"); +} + /* Make sure we get at least one "B" on destination */ wait_for_serial("dest_serial"); @@ -1397,6 +1403,15 @@ static void test_postcopy(void) test_postcopy_common(&args); } +static void test_postcopy_suspend(void) +{ +MigrateCommon args = { +.start.suspend_me = true, +}; + +test_postcopy_common(&args); +} + static void test_postcopy_compress(void) { MigrateCommon args = { @@ -3412,7 +3427,10 @@ int main(int argc, char **argv) qtest_add_func("/migration/postcopy/recovery/double-failures", test_postcopy_recovery_double_fail); #endif /* _WIN32 */ - +if (is_x86) { +qtest_add_func("/migration/postcopy/suspend", + test_postcopy_suspend); +} } qtest_add_func("/migration/bad_dest", test_baddest); -- 1.8.3.1
[PATCH V7 07/12] migration: preserve suspended for snapshot
Restoring a snapshot can break a suspended guest. Snapshots suffer from the same suspended-state issues that affect live migration, plus they must handle an additional problematic scenario, which is that a running vm must remain running if it loads a suspended snapshot. To save, the existing vm_stop call now completely stops the suspended state. Finish with vm_resume to leave the vm in the state it had prior to the save, correctly restoring the suspended state. To load, if the snapshot is not suspended, then vm_stop + vm_resume correctly handles all states, and leaves the vm in the state it had prior to the load. However, if the snapshot is suspended, restoration is trickier. First, call vm_resume to restore the state to suspended so the current state matches the saved state. Then, if the pre-load state is running, call wakeup to resume running. Prior to these changes, the vm_stop to RUN_STATE_SAVE_VM and RUN_STATE_RESTORE_VM did not change runstate if the current state was suspended, but now it does, so allow these transitions. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu --- include/migration/snapshot.h | 7 +++ migration/migration-hmp-cmds.c | 8 +--- migration/savevm.c | 23 +-- system/runstate.c | 5 + system/vl.c| 2 ++ 5 files changed, 32 insertions(+), 13 deletions(-) diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h index e72083b..9e4dcaa 100644 --- a/include/migration/snapshot.h +++ b/include/migration/snapshot.h @@ -16,6 +16,7 @@ #define QEMU_MIGRATION_SNAPSHOT_H #include "qapi/qapi-builtin-types.h" +#include "qapi/qapi-types-run-state.h" /** * save_snapshot: Save an internal snapshot. @@ -61,4 +62,10 @@ bool delete_snapshot(const char *name, bool has_devices, strList *devices, Error **errp); +/** + * load_snapshot_resume: Restore runstate after loading snapshot. + * @state: state to restore + */ +void load_snapshot_resume(RunState state); + #endif diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c index 86ae832..c8d70bc 100644 --- a/migration/migration-hmp-cmds.c +++ b/migration/migration-hmp-cmds.c @@ -399,15 +399,17 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict) void hmp_loadvm(Monitor *mon, const QDict *qdict) { -int saved_vm_running = runstate_is_running(); +RunState saved_state = runstate_get(); + const char *name = qdict_get_str(qdict, "name"); Error *err = NULL; vm_stop(RUN_STATE_RESTORE_VM); -if (load_snapshot(name, NULL, false, NULL, &err) && saved_vm_running) { -vm_start(); +if (load_snapshot(name, NULL, false, NULL, &err)) { +load_snapshot_resume(saved_state); } + hmp_handle_error(mon, err); } diff --git a/migration/savevm.c b/migration/savevm.c index eec5503..26866f0 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -3046,7 +3046,7 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate, QEMUSnapshotInfo sn1, *sn = &sn1; int ret = -1, ret2; QEMUFile *f; -int saved_vm_running; +RunState saved_state = runstate_get(); uint64_t vm_state_size; g_autoptr(GDateTime) now = g_date_time_new_now_local(); AioContext *aio_context; @@ -3094,8 +3094,6 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate, } aio_context = bdrv_get_aio_context(bs); -saved_vm_running = runstate_is_running(); - global_state_store(); vm_stop(RUN_STATE_SAVE_VM); @@ -3163,9 +3161,7 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate, bdrv_drain_all_end(); -if (saved_vm_running) { -vm_start(); -} +vm_resume(saved_state); return ret == 0; } @@ -3339,6 +3335,14 @@ err_drain: return false; } +void load_snapshot_resume(RunState state) +{ +vm_resume(state); +if (state == RUN_STATE_RUNNING && runstate_get() == RUN_STATE_SUSPENDED) { +qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, &error_abort); +} +} + bool delete_snapshot(const char *name, bool has_devices, strList *devices, Error **errp) { @@ -3403,16 +3407,15 @@ static void snapshot_load_job_bh(void *opaque) { Job *job = opaque; SnapshotJob *s = container_of(job, SnapshotJob, common); -int orig_vm_running; +RunState orig_state = runstate_get(); job_progress_set_remaining(&s->common, 1); -orig_vm_running = runstate_is_running(); vm_stop(RUN_STATE_RESTORE_VM); s->ret = load_snapshot(s->tag, s->vmstate, true, s->devices, s->errp); -if (s->ret && orig_vm_running) { -vm_start(); +if (s->ret) { +load_snapshot_resume(orig_state); } job_progress_update(&s->common, 1); diff --git a/system/runstate.c b/system/runstate.c index e2fa204..ca9eb54 100644 --- a/system/runstate.c +++ b
[PATCH V7 02/12] cpus: stop vm in suspended runstate
Currently, a vm in the suspended state is not completely stopped. The VCPUs have been paused, but the cpu clock still runs, and runstate notifiers for the transition to stopped have not been called. This causes problems for live migration. Stale cpu timers_state is saved to the migration stream, causing time errors in the guest when it wakes from suspend, and state that would have been modified by runstate notifiers is wrong. Modify vm_stop to completely stop the vm if the current state is suspended, transition to RUN_STATE_PAUSED, and remember that the machine was suspended. Modify vm_start to restore the suspended state. This affects all callers of vm_stop and vm_start, notably, the qapi stop and cont commands. For example: (qemu) info status VM status: paused (suspended) (qemu) stop (qemu) info status VM status: paused (qemu) system_wakeup Error: Unable to wake up: guest is not in suspended state (qemu) cont (qemu) info status VM status: paused (suspended) (qemu) system_wakeup (qemu) info status VM status: running Suggested-by: Peter Xu Signed-off-by: Steve Sistare Reviewed-by: Peter Xu --- include/sysemu/runstate.h | 5 + qapi/misc.json| 10 -- system/cpus.c | 23 +++ system/runstate.c | 3 +++ 4 files changed, 31 insertions(+), 10 deletions(-) diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h index 88a67e2..867e53c 100644 --- a/include/sysemu/runstate.h +++ b/include/sysemu/runstate.h @@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause cause) return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN; } +static inline bool runstate_is_live(RunState state) +{ +return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED; +} + void vm_start(void); /** diff --git a/qapi/misc.json b/qapi/misc.json index cda2eff..efb8d44 100644 --- a/qapi/misc.json +++ b/qapi/misc.json @@ -134,7 +134,7 @@ ## # @stop: # -# Stop all guest VCPU execution. +# Stop all guest VCPU and VM execution. # # Since: 0.14 # @@ -143,6 +143,9 @@ # the guest remains paused once migration finishes, as if the -S # option was passed on the command line. # +# In the "suspended" state, it will completely stop the VM and +# cause a transition to the "paused" state. (Since 9.0) +# # Example: # # -> { "execute": "stop" } @@ -153,7 +156,7 @@ ## # @cont: # -# Resume guest VCPU execution. +# Resume guest VCPU and VM execution. # # Since: 0.14 # @@ -165,6 +168,9 @@ # guest starts once migration finishes, removing the effect of the # -S command line option if it was passed. # +# If the VM was previously suspended, and not been reset or woken, +# this command will transition back to the "suspended" state. (Since 9.0) +# # Example: # # -> { "execute": "cont" } diff --git a/system/cpus.c b/system/cpus.c index 9f631ab..f162435 100644 --- a/system/cpus.c +++ b/system/cpus.c @@ -277,11 +277,15 @@ bool vm_get_suspended(void) static int do_vm_stop(RunState state, bool send_stop) { int ret = 0; +RunState oldstate = runstate_get(); -if (runstate_is_running()) { +if (runstate_is_live(oldstate)) { +vm_was_suspended = (oldstate == RUN_STATE_SUSPENDED); runstate_set(state); cpu_disable_ticks(); -pause_all_vcpus(); +if (oldstate == RUN_STATE_RUNNING) { +pause_all_vcpus(); +} vm_state_notify(0, state); if (send_stop) { qapi_event_send_stop(); @@ -694,11 +698,13 @@ int vm_stop(RunState state) /** * Prepare for (re)starting the VM. - * Returns -1 if the vCPUs are not to be restarted (e.g. if they are already - * running or in case of an error condition), 0 otherwise. + * Returns 0 if the vCPUs should be restarted, -1 on an error condition, + * and 1 otherwise. */ int vm_prepare_start(bool step_pending) { +int ret = vm_was_suspended ? 1 : 0; +RunState state = vm_was_suspended ? RUN_STATE_SUSPENDED : RUN_STATE_RUNNING; RunState requested; qemu_vmstop_requested(&requested); @@ -729,9 +735,10 @@ int vm_prepare_start(bool step_pending) qapi_event_send_resume(); cpu_enable_ticks(); -runstate_set(RUN_STATE_RUNNING); -vm_state_notify(1, RUN_STATE_RUNNING); -return 0; +runstate_set(state); +vm_state_notify(1, state); +vm_was_suspended = false; +return ret; } void vm_start(void) @@ -745,7 +752,7 @@ void vm_start(void) current state is forgotten forever */ int vm_stop_force_state(RunState state) { -if (runstate_is_running()) { +if (runstate_is_live(runstate_get())) { return vm_stop(state); } else { int ret; diff --git a/system/runstate.c b/system/runstate.c index ea9d6c2..e2fa204 100644 --- a/system/runstate.c +++ b/system/runstate.c @@ -108,6 +108,7 @@ static const RunStateTransition runstate_transitions_def[] = { {
[PATCH V7 09/12] tests/qtest: migration events
Define a state object to capture events seen by migration tests, to allow more events to be captured in a subsequent patch, and simplify event checking in wait_for_migration_pass. No functional change. Signed-off-by: Steve Sistare Reviewed-by: Fabiano Rosas Reviewed-by: Daniel P. Berrangé --- tests/qtest/migration-helpers.c | 24 --- tests/qtest/migration-helpers.h | 9 ++-- tests/qtest/migration-test.c| 91 +++-- 3 files changed, 56 insertions(+), 68 deletions(-) diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c index 24fb7b3..fd3b94e 100644 --- a/tests/qtest/migration-helpers.c +++ b/tests/qtest/migration-helpers.c @@ -24,26 +24,16 @@ */ #define MIGRATION_STATUS_WAIT_TIMEOUT 120 -bool migrate_watch_for_stop(QTestState *who, const char *name, -QDict *event, void *opaque) -{ -bool *seen = opaque; - -if (g_str_equal(name, "STOP")) { -*seen = true; -return true; -} - -return false; -} - -bool migrate_watch_for_resume(QTestState *who, const char *name, +bool migrate_watch_for_events(QTestState *who, const char *name, QDict *event, void *opaque) { -bool *seen = opaque; +QTestMigrationState *state = opaque; -if (g_str_equal(name, "RESUME")) { -*seen = true; +if (g_str_equal(name, "STOP")) { +state->stop_seen = true; +return true; +} else if (g_str_equal(name, "RESUME")) { +state->resume_seen = true; return true; } diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h index e31dc85..3d32699 100644 --- a/tests/qtest/migration-helpers.h +++ b/tests/qtest/migration-helpers.h @@ -15,9 +15,12 @@ #include "libqtest.h" -bool migrate_watch_for_stop(QTestState *who, const char *name, -QDict *event, void *opaque); -bool migrate_watch_for_resume(QTestState *who, const char *name, +typedef struct QTestMigrationState { +bool stop_seen; +bool resume_seen; +} QTestMigrationState; + +bool migrate_watch_for_events(QTestState *who, const char *name, QDict *event, void *opaque); G_GNUC_PRINTF(3, 4) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index 0fbaa6a..05c0740 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -43,8 +43,8 @@ unsigned start_address; unsigned end_address; static bool uffd_feature_thread_id; -static bool got_src_stop; -static bool got_dst_resume; +static QTestMigrationState src_state; +static QTestMigrationState dst_state; /* * An initial 3 MB offset is used as that corresponds @@ -230,6 +230,20 @@ static void wait_for_serial(const char *side) } while (true); } +static void wait_for_stop(QTestState *who, QTestMigrationState *state) +{ +if (!state->stop_seen) { +qtest_qmp_eventwait(who, "STOP"); +} +} + +static void wait_for_resume(QTestState *who, QTestMigrationState *state) +{ +if (!state->resume_seen) { +qtest_qmp_eventwait(who, "RESUME"); +} +} + /* * It's tricky to use qemu's migration event capability with qtest, * events suddenly appearing confuse the qmp()/hmp() responses. @@ -277,21 +291,19 @@ static void read_blocktime(QTestState *who) qobject_unref(rsp_return); } +/* + * Wait for two changes in the migration pass count, but bail if we stop. + */ static void wait_for_migration_pass(QTestState *who) { -uint64_t initial_pass = get_migration_pass(who); -uint64_t pass; +uint64_t pass, prev_pass = 0, changes = 0; -/* Wait for the 1st sync */ -while (!got_src_stop && !initial_pass) { -usleep(1000); -initial_pass = get_migration_pass(who); -} - -do { +while (changes < 2 && !src_state.stop_seen) { usleep(1000); pass = get_migration_pass(who); -} while (pass == initial_pass && !got_src_stop); +changes += (pass != prev_pass); +prev_pass = pass; +} } static void check_guests_ram(QTestState *who) @@ -617,10 +629,7 @@ static void migrate_postcopy_start(QTestState *from, QTestState *to) { qtest_qmp_assert_success(from, "{ 'execute': 'migrate-start-postcopy' }"); -if (!got_src_stop) { -qtest_qmp_eventwait(from, "STOP"); -} - +wait_for_stop(from, &src_state); qtest_qmp_eventwait(to, "RESUME"); } @@ -756,8 +765,8 @@ static int test_migrate_start(QTestState **from, QTestState **to, } } -got_src_stop = false; -got_dst_resume = false; +dst_state = (QTestMigrationState) { }; +src_state = (QTestMigrationState) { }; if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) { memory_size = "150M"; @@ -848,8 +857,8 @@ static int test_migrate_start(QTestState **from, QTestState **to, if (!args->only_target) { *from = qtest_init_with_env(QEMU_ENV_SRC, cmd_source);
[PATCH V7 03/12] cpus: check running not RUN_STATE_RUNNING
When a vm transitions from running to suspended, runstate notifiers are not called, so the notifiers still think the vm is running. Hence, when we call vm_start to restore the suspended state, we call vm_state_notify with running=1. However, some notifiers check for RUN_STATE_RUNNING. They must check the running boolean instead. No functional change. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu --- backends/tpm/tpm_emulator.c | 2 +- hw/usb/hcd-ehci.c | 2 +- hw/usb/redirect.c | 2 +- hw/xen/xen-hvm-common.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/backends/tpm/tpm_emulator.c b/backends/tpm/tpm_emulator.c index f7f1b4a..254fce7 100644 --- a/backends/tpm/tpm_emulator.c +++ b/backends/tpm/tpm_emulator.c @@ -904,7 +904,7 @@ static void tpm_emulator_vm_state_change(void *opaque, bool running, trace_tpm_emulator_vm_state_change(running, state); -if (!running || state != RUN_STATE_RUNNING || !tpm_emu->relock_storage) { +if (!running || !tpm_emu->relock_storage) { return; } diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c index 19b4534..10c82ce 100644 --- a/hw/usb/hcd-ehci.c +++ b/hw/usb/hcd-ehci.c @@ -2451,7 +2451,7 @@ static void usb_ehci_vm_state_change(void *opaque, bool running, RunState state) * USB-devices which have async handled packages have a packet in the * ep queue to match the completion with. */ -if (state == RUN_STATE_RUNNING) { +if (running) { ehci_advance_async_state(ehci); } diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c index c9893df..3785bb0 100644 --- a/hw/usb/redirect.c +++ b/hw/usb/redirect.c @@ -1403,7 +1403,7 @@ static void usbredir_vm_state_change(void *priv, bool running, RunState state) { USBRedirDevice *dev = priv; -if (state == RUN_STATE_RUNNING && dev->parser != NULL) { +if (running && dev->parser != NULL) { usbredirparser_do_write(dev->parser); /* Flush any pending writes */ } } diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c index 565dc39..47e6cb1 100644 --- a/hw/xen/xen-hvm-common.c +++ b/hw/xen/xen-hvm-common.c @@ -623,7 +623,7 @@ void xen_hvm_change_state_handler(void *opaque, bool running, xen_set_ioreq_server_state(xen_domid, state->ioservid, - (rstate == RUN_STATE_RUNNING)); + running); } void xen_exit_notifier(Notifier *n, void *data) -- 1.8.3.1
[PATCH V7 11/12] tests/qtest: precopy migration with suspend
Add a test case to verify that the suspended state is handled correctly during live migration precopy. The test suspends the src, migrates, then wakes the dest. Signed-off-by: Steve Sistare --- tests/qtest/migration-helpers.c | 3 ++ tests/qtest/migration-helpers.h | 2 ++ tests/qtest/migration-test.c| 62 +++-- 3 files changed, 64 insertions(+), 3 deletions(-) diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c index fd3b94e..37e8e81 100644 --- a/tests/qtest/migration-helpers.c +++ b/tests/qtest/migration-helpers.c @@ -32,6 +32,9 @@ bool migrate_watch_for_events(QTestState *who, const char *name, if (g_str_equal(name, "STOP")) { state->stop_seen = true; return true; +} else if (g_str_equal(name, "SUSPEND")) { +state->suspend_seen = true; +return true; } else if (g_str_equal(name, "RESUME")) { state->resume_seen = true; return true; diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h index 3d32699..b478549 100644 --- a/tests/qtest/migration-helpers.h +++ b/tests/qtest/migration-helpers.h @@ -18,6 +18,8 @@ typedef struct QTestMigrationState { bool stop_seen; bool resume_seen; +bool suspend_seen; +bool suspend_me; } QTestMigrationState; bool migrate_watch_for_events(QTestState *who, const char *name, diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index e10d5a4..f57a978 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -178,7 +178,7 @@ static void bootfile_delete(void) /* * Wait for some output in the serial output file, * we get an 'A' followed by an endless string of 'B's - * but on the destination we won't have the A. + * but on the destination we won't have the A (unless we enabled suspend/resume) */ static void wait_for_serial(const char *side) { @@ -245,6 +245,13 @@ static void wait_for_resume(QTestState *who, QTestMigrationState *state) } } +static void wait_for_suspend(QTestState *who, QTestMigrationState *state) +{ +if (state->suspend_me && !state->suspend_seen) { +qtest_qmp_eventwait(who, "SUSPEND"); +} +} + /* * It's tricky to use qemu's migration event capability with qtest, * events suddenly appearing confuse the qmp()/hmp() responses. @@ -299,7 +306,7 @@ static void wait_for_migration_pass(QTestState *who) { uint64_t pass, prev_pass = 0, changes = 0; -while (changes < 2 && !src_state.stop_seen) { +while (changes < 2 && !src_state.stop_seen && !src_state.suspend_seen) { usleep(1000); pass = get_migration_pass(who); changes += (pass != prev_pass); @@ -584,6 +591,12 @@ static void migrate_wait_for_dirty_mem(QTestState *from, usleep(1000 * 10); } while (qtest_readq(to, marker_address) != MAGIC_MARKER); + +/* If suspended, src only iterates once, and watch_byte may never change */ +if (src_state.suspend_me) { +return; +} + /* * Now ensure that already transferred bytes are * dirty again from the guest workload. Note the @@ -771,6 +784,7 @@ static int test_migrate_start(QTestState **from, QTestState **to, dst_state = (QTestMigrationState) { }; src_state = (QTestMigrationState) { }; bootfile_create(tmpfs, args->suspend_me); +src_state.suspend_me = args->suspend_me; if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) { memory_size = "150M"; @@ -1717,6 +1731,7 @@ static void test_precopy_common(MigrateCommon *args) /* Wait for the first serial output from the source */ if (args->result == MIG_TEST_SUCCEED) { wait_for_serial("src_serial"); +wait_for_suspend(from, &src_state); } if (args->live) { @@ -1793,6 +1808,11 @@ static void test_precopy_common(MigrateCommon *args) wait_for_resume(to, &dst_state); +if (args->start.suspend_me) { +/* wakeup succeeds only if guest is suspended */ +qtest_qmp_assert_success(to, "{'execute': 'system_wakeup'}"); +} + wait_for_serial("dest_serial"); } @@ -1879,6 +1899,34 @@ static void test_precopy_unix_plain(void) test_precopy_common(&args); } +static void test_precopy_unix_suspend_live(void) +{ +g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs); +MigrateCommon args = { +.listen_uri = uri, +.connect_uri = uri, +/* + * despite being live, the test is fast because the src + * suspends immediately. + */ +.live = true, +.start.suspend_me = true, +}; + +test_precopy_common(&args); +} + +static void test_precopy_unix_suspend_notlive(void) +{ +g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs); +MigrateCommon args = { +.listen_uri = uri, +.connect_uri = uri, +.start.suspend_me = true, +}; + +test
[PATCH V7 01/12] cpus: vm_was_suspended
Add a state variable to remember if a vm previously transitioned into a suspended state. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu --- include/sysemu/runstate.h | 2 ++ system/cpus.c | 15 +++ 2 files changed, 17 insertions(+) diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h index c8c2bd8..88a67e2 100644 --- a/include/sysemu/runstate.h +++ b/include/sysemu/runstate.h @@ -51,6 +51,8 @@ int vm_prepare_start(bool step_pending); int vm_stop(RunState state); int vm_stop_force_state(RunState state); int vm_shutdown(void); +void vm_set_suspended(bool suspended); +bool vm_get_suspended(void); typedef enum WakeupReason { /* Always keep QEMU_WAKEUP_REASON_NONE = 0 */ diff --git a/system/cpus.c b/system/cpus.c index a444a74..9f631ab 100644 --- a/system/cpus.c +++ b/system/cpus.c @@ -259,6 +259,21 @@ void cpu_interrupt(CPUState *cpu, int mask) } } +/* + * True if the vm was previously suspended, and has not been woken or reset. + */ +static int vm_was_suspended; + +void vm_set_suspended(bool suspended) +{ +vm_was_suspended = suspended; +} + +bool vm_get_suspended(void) +{ +return vm_was_suspended; +} + static int do_vm_stop(RunState state, bool send_stop) { int ret = 0; -- 1.8.3.1
[PATCH V7 06/12] migration: preserve suspended runstate
A guest that is migrated in the suspended state automaticaly wakes and continues execution. This is wrong; the guest should end migration in the same state it started. The root cause is that the outgoing migration code automatically wakes the guest, then saves the RUNNING runstate in global_state_store(), hence the incoming migration code thinks the guest is running and continues the guest if autostart is true. On the outgoing side, delete the call to qemu_system_wakeup_request(). Now that vm_stop completely stops a vm in the suspended state (from the preceding patches), the existing call to vm_stop_force_state is sufficient to correctly migrate all vmstate. On the incoming side, call vm_start if the pre-migration state was running or suspended. For the latter, vm_start correctly restores the suspended state, and a future system_wakeup monitor request will cause the vm to resume running. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu --- migration/migration.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 3ce04b2..8124811 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -604,7 +604,7 @@ static void process_incoming_migration_bh(void *opaque) */ if (!migrate_late_block_activate() || (autostart && (!global_state_received() || -global_state_get_runstate() == RUN_STATE_RUNNING))) { +runstate_is_live(global_state_get_runstate() { /* Make sure all file formats throw away their mutable metadata. * If we get an error here, just don't restart the VM yet. */ bdrv_activate_all(&local_err); @@ -628,7 +628,7 @@ static void process_incoming_migration_bh(void *opaque) dirty_bitmap_mig_before_vm_start(); if (!global_state_received() || -global_state_get_runstate() == RUN_STATE_RUNNING) { +runstate_is_live(global_state_get_runstate())) { if (autostart) { vm_start(); } else { @@ -2416,7 +2416,6 @@ static int postcopy_start(MigrationState *ms, Error **errp) migration_downtime_start(ms); -qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); global_state_store(); ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE); if (ret < 0) { @@ -2615,7 +2614,6 @@ static int migration_completion_precopy(MigrationState *s, qemu_mutex_lock_iothread(); migration_downtime_start(s); -qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); s->vm_old_state = runstate_get(); global_state_store(); @@ -3136,7 +3134,7 @@ static void migration_iteration_finish(MigrationState *s) case MIGRATION_STATUS_FAILED: case MIGRATION_STATUS_CANCELLED: case MIGRATION_STATUS_CANCELLING: -if (s->vm_old_state == RUN_STATE_RUNNING) { +if (runstate_is_live(s->vm_old_state)) { if (!runstate_check(RUN_STATE_SHUTDOWN)) { vm_start(); } -- 1.8.3.1
[PATCH V7 05/12] migration: propagate suspended runstate
If the outgoing machine was previously suspended, propagate that to the incoming side via global_state, so a subsequent vm_start restores the suspended state. To maintain backward and forward compatibility, reclaim some space from the runstate member. Signed-off-by: Steve Sistare --- migration/global_state.c | 35 +-- 1 file changed, 33 insertions(+), 2 deletions(-) diff --git a/migration/global_state.c b/migration/global_state.c index 4e2a9d8..d4f61a1 100644 --- a/migration/global_state.c +++ b/migration/global_state.c @@ -22,7 +22,16 @@ typedef struct { uint32_t size; -uint8_t runstate[100]; + +/* + * runstate was 100 bytes, zero padded, but we trimmed it to add a + * few fields and maintain backwards compatibility. + */ +uint8_t runstate[32]; +uint8_t has_vm_was_suspended; +uint8_t vm_was_suspended; +uint8_t unused[66]; + RunState state; bool received; } GlobalState; @@ -35,6 +44,10 @@ static void global_state_do_store(RunState state) assert(strlen(state_str) < sizeof(global_state.runstate)); strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate), state_str, '\0'); +global_state.has_vm_was_suspended = true; +global_state.vm_was_suspended = vm_get_suspended(); + +memset(global_state.unused, 0, sizeof(global_state.unused)); } void global_state_store(void) @@ -68,6 +81,12 @@ static bool global_state_needed(void *opaque) return true; } +/* If the suspended state must be remembered, it is needed */ + +if (vm_get_suspended()) { +return true; +} + /* If state is running or paused, it is not needed */ if (strcmp(runstate, "running") == 0 || @@ -85,6 +104,7 @@ static int global_state_post_load(void *opaque, int version_id) Error *local_err = NULL; int r; char *runstate = (char *)s->runstate; +bool vm_was_suspended = s->has_vm_was_suspended && s->vm_was_suspended; s->received = true; trace_migrate_global_state_post_load(runstate); @@ -93,7 +113,7 @@ static int global_state_post_load(void *opaque, int version_id) sizeof(s->runstate)) == sizeof(s->runstate)) { /* * This condition should never happen during migration, because - * all runstate names are shorter than 100 bytes (the size of + * all runstate names are shorter than 32 bytes (the size of * s->runstate). However, a malicious stream could overflow * the qapi_enum_parse() call, so we force the last character * to a NUL byte. @@ -110,6 +130,14 @@ static int global_state_post_load(void *opaque, int version_id) } s->state = r; +/* + * global_state is saved on the outgoing side before forcing a stopped + * state, so it may have saved state=suspended and vm_was_suspended=0. + * Now we are in a paused state, and when we later call vm_start, it must + * restore the suspended state, so we must set vm_was_suspended=1 here. + */ +vm_set_suspended(vm_was_suspended || r == RUN_STATE_SUSPENDED); + return 0; } @@ -134,6 +162,9 @@ static const VMStateDescription vmstate_globalstate = { .fields = (VMStateField[]) { VMSTATE_UINT32(size, GlobalState), VMSTATE_BUFFER(runstate, GlobalState), +VMSTATE_UINT8(has_vm_was_suspended, GlobalState), +VMSTATE_UINT8(vm_was_suspended, GlobalState), +VMSTATE_BUFFER(unused, GlobalState), VMSTATE_END_OF_LIST() }, }; -- 1.8.3.1
[PATCH V7 08/12] migration: preserve suspended for bg_migration
Do not wake a suspended guest during bg_migration, and restore the prior state at finish rather than unconditionally running. Allow the additional state transitions that occur. Signed-off-by: Steve Sistare Reviewed-by: Fabiano Rosas Reviewed-by: Peter Xu --- migration/migration.c | 7 +-- system/runstate.c | 1 + 2 files changed, 2 insertions(+), 6 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 8124811..2cc7e2a 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -3390,7 +3390,7 @@ static void bg_migration_vm_start_bh(void *opaque) qemu_bh_delete(s->vm_start_bh); s->vm_start_bh = NULL; -vm_start(); +vm_resume(s->vm_old_state); migration_downtime_end(s); } @@ -3462,11 +3462,6 @@ static void *bg_migration_thread(void *opaque) qemu_mutex_lock_iothread(); -/* - * If VM is currently in suspended state, then, to make a valid runstate - * transition in vm_stop_force_state() we need to wakeup it up. - */ -qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); s->vm_old_state = runstate_get(); global_state_store(); diff --git a/system/runstate.c b/system/runstate.c index ca9eb54..621a023 100644 --- a/system/runstate.c +++ b/system/runstate.c @@ -168,6 +168,7 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_SUSPENDED, RUN_STATE_PAUSED}, { RUN_STATE_SUSPENDED, RUN_STATE_SAVE_VM }, { RUN_STATE_SUSPENDED, RUN_STATE_RESTORE_VM }, +{ RUN_STATE_SUSPENDED, RUN_STATE_SHUTDOWN }, { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE }, -- 1.8.3.1