Re: [PATCH v2 5/9] block/io: expand in_flight inc/dec section: simple cases
27.04.2020 17:39, Vladimir Sementsov-Ogievskiy wrote: It's safer to expand in_flight request to start before enter to coroutine in synchronous wrappers, due to the following (theoretical) problem: Consider write. It's possible, that qemu_coroutine_enter only schedules execution, assume such case. Then we may possibly have the following: 1. Somehow check that we are not in drained section in outer code. 2. Call bdrv_pwritev(), assuming that it will increase in_flight, which will protect us from starting drained section. 3. It calls bdrv_prwv_co() -> bdrv_coroutine_enter() (not yet increased in_flight). 4. Assume coroutine not yet actually entered, only scheduled, and we go to some code, which starts drained section (as in_flight is zero). 5. Scheduled coroutine starts, and blindly increases in_flight, and we are in drained section with in_flight request. Signed-off-by: Vladimir Sementsov-Ogievskiy Very interesting: this patch breaks test-replication. It hangs: (gdb) thr a a bt Thread 2 (Thread 0x7eff256cd700 (LWP 2843)): #0 0x7eff2f5fd1fd in syscall () from /lib64/libc.so.6 #1 0x55af9a9a4f11 in qemu_futex_wait (f=0x55af9aa6f758 , val=4294967295) at /work/src/qemu/up-expand-bdrv-in_flight-bounds/include/qemu/futex.h:29 #2 0x55af9a9a50d5 in qemu_event_wait (ev=0x55af9aa6f758 ) at util/qemu-thread-posix.c:459 #3 0x55af9a9bd20d in call_rcu_thread (opaque=0x0) at util/rcu.c:260 #4 0x55af9a9a5288 in qemu_thread_start (args=0x55af9c4f1b80) at util/qemu-thread-posix.c:519 #5 0x7eff2f6d44c0 in start_thread () from /lib64/libpthread.so.0 #6 0x7eff2f602553 in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x7eff25820a80 (LWP 2842)): #0 0x7eff2f5f7bd6 in ppoll () from /lib64/libc.so.6 #1 0x55af9a99e405 in qemu_poll_ns (fds=0x55af9c52a830, nfds=1, timeout=-1) at util/qemu-timer.c:335 #2 0x55af9a9a1cab in fdmon_poll_wait (ctx=0x55af9c526890, ready_list=0x7ffc73e8c5d0, timeout=-1) at util/fdmon-poll.c:79 #3 0x55af9a9a160c in aio_poll (ctx=0x55af9c526890, blocking=true) at util/aio-posix.c:600 #4 0x55af9a8f0bb0 in bdrv_do_drained_begin (bs=0x55af9c52a8d0, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:429 #5 0x55af9a8f0c95 in bdrv_drained_begin (bs=0x55af9c52a8d0) at block/io.c:435 #6 0x55af9a8dc6a8 in blk_drain (blk=0x55af9c542c10) at block/block-backend.c:1681 #7 0x55af9a8da0b6 in blk_unref (blk=0x55af9c542c10) at block/block-backend.c:473 #8 0x55af9a8eb5e7 in mirror_exit_common (job=0x55af9c6c45c0) at block/mirror.c:667 #9 0x55af9a8eb9c1 in mirror_prepare (job=0x55af9c6c45c0) at block/mirror.c:765 #10 0x55af9a87cd65 in job_prepare (job=0x55af9c6c45c0) at job.c:781 #11 0x55af9a87b62a in job_txn_apply (job=0x55af9c6c45c0, fn=0x55af9a87cd28 ) at job.c:158 #12 0x55af9a87cdee in job_do_finalize (job=0x55af9c6c45c0) at job.c:798 #13 0x55af9a87cfb5 in job_completed_txn_success (job=0x55af9c6c45c0) at job.c:852 #14 0x55af9a87d055 in job_completed (job=0x55af9c6c45c0) at job.c:865 #15 0x55af9a87d0a8 in job_exit (opaque=0x55af9c6c45c0) at job.c:885 #16 0x55af9a99b981 in aio_bh_call (bh=0x55af9c547440) at util/async.c:136 #17 0x55af9a99ba8b in aio_bh_poll (ctx=0x55af9c526890) at util/async.c:164 #18 0x55af9a9a17ff in aio_poll (ctx=0x55af9c526890, blocking=true) at util/aio-posix.c:650 #19 0x55af9a8f7011 in bdrv_flush (bs=0x55af9c53b900) at block/io.c:3019 #20 0x55af9a874351 in bdrv_close (bs=0x55af9c53b900) at block.c:4252 #21 0x55af9a874ca3 in bdrv_delete (bs=0x55af9c53b900) at block.c:4498 #22 0x55af9a877862 in bdrv_unref (bs=0x55af9c53b900) at block.c:5866 #23 0x55af9a870837 in bdrv_root_unref_child (child=0x55af9c6c4430) at block.c:2684 #24 0x55af9a8da9a2 in blk_remove_bs (blk=0x55af9c547bd0) at block/block-backend.c:803 #25 0x55af9a8d9e54 in blk_delete (blk=0x55af9c547bd0) at block/block-backend.c:422 #26 0x55af9a8da0f8 in blk_unref (blk=0x55af9c547bd0) at block/block-backend.c:477 #27 0x55af9a86a6f1 in teardown_secondary () at tests/test-replication.c:392 #28 0x55af9a86aac1 in test_secondary_stop () at tests/test-replication.c:490 #29 0x7eff2fd7df7e in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0 #30 0x7eff2fd7dd24 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0 #31 0x7eff2fd7dd24 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0 #32 0x7eff2fd7e46a in g_test_run_suite () from /lib64/libglib-2.0.so.0 #33 0x7eff2fd7e485 in g_test_run () from /lib64/libglib-2.0.so.0 #34 0x55af9a86b19c in main (argc=1, argv=0x7ffc73e8d088) at tests/test-replication.c:645 (gdb) p ((BlockBackend *)0x55af9c547bd0)->in_flight $5 = 0 (gdb) p ((BlockBackend *)0x55af9c542c10)->in_flight $6 = 0 (gdb) p ((BlockDriverState *)0x55af9c53b900)->in_flight $7 = 1 (gdb) p ((BlockDriverState *)0x55af9c52a8d0)->in_flight $8 = 0 (gdb) fr 20 #20 0x55af9a874351 in bdrv
Re: [PATCH v2 04/18] qom: Simplify object_property_get_enum()
Philippe Mathieu-Daudé writes: > On 5/5/20 5:29 PM, Markus Armbruster wrote: >> Reuse object_property_get_str(). Switches from the string to the >> qobject visitor under the hood. >> >> Signed-off-by: Markus Armbruster >> --- >> qom/object.c | 11 ++- >> 1 file changed, 2 insertions(+), 9 deletions(-) >> >> diff --git a/qom/object.c b/qom/object.c >> index 3d65658059..b374af302c 100644 >> --- a/qom/object.c >> +++ b/qom/object.c >> @@ -1521,8 +1521,6 @@ typedef struct EnumProperty { >> int object_property_get_enum(Object *obj, const char *name, >>const char *typename, Error **errp) >> { >> -Error *err = NULL; >> -Visitor *v; >> char *str; >> int ret; >> ObjectProperty *prop = object_property_find(obj, name, errp); >> @@ -1541,15 +1539,10 @@ int object_property_get_enum(Object *obj, const char >> *name, >> enumprop = prop->opaque; >> -v = string_output_visitor_new(false, &str); >> -object_property_get(obj, v, name, &err); >> -if (err) { >> -error_propagate(errp, err); >> -visit_free(v); >> +str = object_property_get_str(obj, name, errp); >> +if (!str) { > > Patch looks good but I'm not confident enough to add a R-b tag :) Teaching opportunity! >> return 0; >> } >> -visit_complete(v, &str); >> -visit_free(v); >> ret = qapi_enum_parse(enumprop->lookup, str, -1, errp); >> g_free(str); >> The core function for getting properties is object_property_get(). To be used like this: v = ... new output visitor of your choice ... object_property_get(obj, v, name, &err); if (!err) { visit_complete(v, &ret); } visit_free(v); Delivers the result in @ret and @err. The type of @ret depends on the visitor. It typically needs to be converted to the appropriate C type. Life's too short to write that much code every time you want to get a property value. So we provide two levels of common helpers. Level 1: the output visitor commonly used is the QObject output visitor. Combining object_property_get() with it yields QObject *object_property_get_qobject(Object *obj, const char *name, Error **errp) { QObject *ret = NULL; Error *local_err = NULL; Visitor *v; v = qobject_output_visitor_new(&ret); object_property_get(obj, v, name, &local_err); if (!local_err) { visit_complete(v, &ret); } error_propagate(errp, local_err); visit_free(v); return ret; } The use I showed above becomes ret = object_property_get_qobject(obj, name, &err); Again, result is in @ret and @err. You commonly need to convert @ret from QObject to the property's C type, and handle conversion errors. Still too much code, so we provide convenience functions for common types. Here's the one for strings: char *object_property_get_str(Object *obj, const char *name, Error **errp) { QObject *ret = object_property_get_qobject(obj, name, errp); char *retval; if (!ret) { return NULL; } retval = g_strdup(qobject_get_try_str(ret)); if (!retval) { error_setg(errp, QERR_INVALID_PARAMETER_TYPE, name, "string"); } qobject_unref(ret); return retval; } Now back to my patch. Before the patch, object_property_get_enum() is odd: it uses the string output visitor. Works, but why do it by hand when we can simply reuse existing object_property_get_str()? All we need is the (checked) conversion from string to enum. Clearer now? Bonus: one fewer use of a string visitor. These need to die, but that's another story.
Re: [PATCH v2 05/18] qom: Drop convenience method object_property_get_uint16List()
Paolo Bonzini writes: > On 05/05/20 17:29, Markus Armbruster wrote: >> qom/object.c provides object_property_get_TYPE() and >> object_property_set_TYPE() for a number of common types. These are >> all convenience wrappers around object_property_get_qobject() and >> object_property_set_qobject(). >> >> Except for object_property_get_uint16List(), which is unusual in two ways: >> >> * It bypasses object_property_get_qobject(). Fixable; the previous >> commit did it for object_property_get_enum()) >> >> * It stores the value through a parameter. Its contract claims it >> returns the value, like the other functions do. Also fixable. >> >> Fixing is not worthwhile, though: object_property_get_uint16List() has >> seen exactly one user in six years. >> >> Convert the lone user to do its job with the generic >> object_property_get_qobject(), and drop object_property_get_qobject(). > > Typo, otherwise Will fix. > Reviewed-by: Paolo Bonzini Thanks!
Re: [PATCH v2 00/13] microvm: add acpi support
On Tue, May 05, 2020 at 04:16:00PM +0200, Philippe Mathieu-Daudé wrote: > On 5/5/20 4:04 PM, Michael S. Tsirkin wrote: > > On Tue, May 05, 2020 at 03:42:52PM +0200, Gerd Hoffmann wrote: > > > I know that not supporting ACPI in microvm is intentional. If you still > > > don't want ACPI this is perfectly fine, you can use the usual -no-acpi > > > switch to toggle ACPI support. > > > > > > These are the advantages you are going to loose then: > > > > > >(1) virtio-mmio device discovery without command line hacks (tweaking > > >the command line is a problem when not using direct kernel boot). > > >(2) Better IO-APIC support, we can use IRQ lines 16-23. > > >(3) ACPI power button (aka powerdown request) works. > > >(4) machine poweroff (aka S5 state) works. > > > > Questions > > > > - what's the tradeoff in startup time? > > - what should be the default? > > > > Based on above I'd be inclined to say default should stay no acpi and > > users should enable acpi with an option. > > As this machine was added to have the least minimum hardware required, I'd > keep the default with no ACPI and have user requiring it to use an option. > My 2 cents obviously. I also share this opinion. And I would prefer it to be a machine type option, defaulting to "off", but I guess that's a matter of taste. Sergio. > > > > > Together with seabios patches for virtio-mmio support this allows to > > > boot standard fedora images (cloud, coreos, workstation live) with the > > > microvm machine type. > > > > > > git branch for testing (including updated seabios): > > > https://git.kraxel.org/cgit/qemu/log/?h=sirius/microvm > > > > > > changes in v2: > > >* some acpi cleanups are an separate patch series now. > > >* switched to hw reduced acpi & generic event device. > > >* misc fixes here and there. > > > > > > cheers, > > >Gerd > > > > > > Gerd Hoffmann (13): > > >acpi: make build_madt() more generic. > > >acpi: create acpi-common.c and move madt code > > >acpi: madt: skip pci override on pci-less systems (microvm) > > >acpi: move acpi_build_facs to acpi-common.c > > >acpi: move acpi_init_common_fadt_data to acpi-common.c > > >acpi: move acpi_align_size to acpi-common.h > > >acpi: fadt: add hw-reduced sleep register support > > >acpi: generic event device for x86 > > >microvm: add minimal acpi support > > >microvm: disable virtio-mmio cmdline hack > > >microvm: add acpi_dsdt_add_virtio() for x86 > > >microvm: make virtio irq base runtime configurable > > >microvm/acpi: use GSI 16-23 for virtio > > > > > > hw/i386/acpi-common.h | 38 > > > hw/i386/acpi-microvm.h | 6 + > > > include/hw/acpi/acpi-defs.h| 2 + > > > include/hw/acpi/generic_event_device.h | 10 + > > > include/hw/i386/microvm.h | 10 +- > > > hw/acpi/aml-build.c| 4 +- > > > hw/i386/acpi-build.c | 198 +--- > > > hw/i386/acpi-common.c | 206 > > > hw/i386/acpi-microvm.c | 249 + > > > hw/i386/generic_event_device_x86.c | 114 +++ > > > hw/i386/microvm.c | 36 +++- > > > hw/i386/Kconfig| 1 + > > > hw/i386/Makefile.objs | 3 + > > > 13 files changed, 676 insertions(+), 201 deletions(-) > > > create mode 100644 hw/i386/acpi-common.h > > > create mode 100644 hw/i386/acpi-microvm.h > > > create mode 100644 hw/i386/acpi-common.c > > > create mode 100644 hw/i386/acpi-microvm.c > > > create mode 100644 hw/i386/generic_event_device_x86.c > > > > > > -- > > > 2.18.4 > > > > > signature.asc Description: PGP signature
Re: [PATCH v2 11/13] microvm: add acpi_dsdt_add_virtio() for x86
On Tue, May 05, 2020 at 03:43:03PM +0200, Gerd Hoffmann wrote: > Makes x86 linux kernel find virtio-mmio devices automatically. > > Signed-off-by: Gerd Hoffmann > --- > hw/i386/acpi-microvm.c | 51 ++ > 1 file changed, 51 insertions(+) Reviewed-by: Sergio Lopez > diff --git a/hw/i386/acpi-microvm.c b/hw/i386/acpi-microvm.c > index ce5ab86d642c..4d91ac9360ce 100644 > --- a/hw/i386/acpi-microvm.c > +++ b/hw/i386/acpi-microvm.c > @@ -21,6 +21,7 @@ > */ > > #include "qemu/osdep.h" > +#include "qemu/cutils.h" > #include "qapi/error.h" > > #include "exec/memory.h" > @@ -32,6 +33,7 @@ > #include "hw/boards.h" > #include "hw/i386/fw_cfg.h" > #include "hw/i386/microvm.h" > +#include "hw/virtio/virtio-mmio.h" > > #include "acpi-common.h" > #include "acpi-microvm.h" > @@ -45,6 +47,54 @@ static void acpi_dsdt_add_power_button(Aml *scope) > aml_append(scope, dev); > } > > +static void acpi_dsdt_add_virtio(Aml *scope) > +{ > +gchar *separator; > +long int index; > +BusState *bus; > +BusChild *kid; > + > +bus = sysbus_get_default(); > +QTAILQ_FOREACH(kid, &bus->children, sibling) { > +DeviceState *dev = kid->child; > +Object *obj = object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MMIO); > + > +if (obj) { > +VirtIOMMIOProxy *mmio = VIRTIO_MMIO(obj); > +VirtioBusState *mmio_virtio_bus = &mmio->bus; > +BusState *mmio_bus = &mmio_virtio_bus->parent_obj; > + > +if (QTAILQ_EMPTY(&mmio_bus->children)) { > +continue; > +} > +separator = g_strrstr(mmio_bus->name, "."); > +if (!separator) { > +continue; > +} > +if (qemu_strtol(separator + 1, NULL, 10, &index) != 0) { > +continue; > +} > + > +uint32_t irq = VIRTIO_IRQ_BASE + index; > +hwaddr base = VIRTIO_MMIO_BASE + index * 512; > +hwaddr size = 512; > + > +Aml *dev = aml_device("VR%02u", (unsigned)index); > +aml_append(dev, aml_name_decl("_HID", aml_string("LNRO0005"))); > +aml_append(dev, aml_name_decl("_UID", aml_int(index))); > +aml_append(dev, aml_name_decl("_CCA", aml_int(1))); > + > +Aml *crs = aml_resource_template(); > +aml_append(crs, aml_memory32_fixed(base, size, AML_READ_WRITE)); > +aml_append(crs, > + aml_interrupt(AML_CONSUMER, AML_LEVEL, > AML_ACTIVE_HIGH, > + AML_EXCLUSIVE, &irq, 1)); > +aml_append(dev, aml_name_decl("_CRS", crs)); > +aml_append(scope, dev); > +} > +} > +} > + > static void > build_dsdt_microvm(GArray *table_data, BIOSLinker *linker, > MicrovmMachineState *mms) > @@ -69,6 +119,7 @@ build_dsdt_microvm(GArray *table_data, BIOSLinker *linker, > build_ged_aml(sb_scope, GED_DEVICE, HOTPLUG_HANDLER(mms->acpi_dev), >GED_MMIO_IRQ, AML_SYSTEM_MEMORY, GED_MMIO_BASE); > acpi_dsdt_add_power_button(sb_scope); > +acpi_dsdt_add_virtio(sb_scope); > aml_append(dsdt, sb_scope); > > scope = aml_scope("\\"); > -- > 2.18.4 > signature.asc Description: PGP signature
Re: [PATCH v2 13/13] microvm/acpi: use GSI 16-23 for virtio
On Tue, May 05, 2020 at 03:43:05PM +0200, Gerd Hoffmann wrote: > With ACPI enabled and IO-APIC being properly declared in the ACPI tables > we can use interrupt lines 16-23 for virtio and avoid shared interrupts. > > With acpi disabled we continue to use lines 8-15. > > Signed-off-by: Gerd Hoffmann > --- > hw/i386/microvm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Sergio Lopez > diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c > index 2aa2804e4ca0..08ed2a17f2ca 100644 > --- a/hw/i386/microvm.c > +++ b/hw/i386/microvm.c > @@ -124,7 +124,7 @@ static void microvm_devices_init(MicrovmMachineState *mms) > > kvmclock_create(); > > -mms->virtio_irq_base = 8; > +mms->virtio_irq_base = x86_machine_is_acpi_enabled(x86ms) ? 16 : 8; > for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) { > sysbus_create_simple("virtio-mmio", > VIRTIO_MMIO_BASE + i * 512, > -- > 2.18.4 > signature.asc Description: PGP signature
Re: [PATCH v2 12/13] microvm: make virtio irq base runtime configurable
On Tue, May 05, 2020 at 03:43:04PM +0200, Gerd Hoffmann wrote: > Signed-off-by: Gerd Hoffmann > --- > include/hw/i386/microvm.h | 2 +- > hw/i386/acpi-microvm.c| 6 +++--- > hw/i386/microvm.c | 11 +++ > 3 files changed, 11 insertions(+), 8 deletions(-) Reviewed-by: Sergio Lopez > diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h > index 55f5984cfaa1..878d2a8011f4 100644 > --- a/include/hw/i386/microvm.h > +++ b/include/hw/i386/microvm.h > @@ -28,7 +28,6 @@ > > /* Platform virtio definitions */ > #define VIRTIO_MMIO_BASE 0xc000 > -#define VIRTIO_IRQ_BASE 5 > #define VIRTIO_NUM_TRANSPORTS 8 > #define VIRTIO_CMDLINE_MAXLEN 64 > > @@ -63,6 +62,7 @@ typedef struct { > bool auto_kernel_cmdline; > > /* Machine state */ > +uint32_t virtio_irq_base; > bool kernel_cmdline_fixed; > Notifier machine_done; > AcpiDeviceIf *acpi_dev; > diff --git a/hw/i386/acpi-microvm.c b/hw/i386/acpi-microvm.c > index 4d91ac9360ce..1230080c45cd 100644 > --- a/hw/i386/acpi-microvm.c > +++ b/hw/i386/acpi-microvm.c > @@ -47,7 +47,7 @@ static void acpi_dsdt_add_power_button(Aml *scope) > aml_append(scope, dev); > } > > -static void acpi_dsdt_add_virtio(Aml *scope) > +static void acpi_dsdt_add_virtio(Aml *scope, MicrovmMachineState *mms) > { > gchar *separator; > long int index; > @@ -75,7 +75,7 @@ static void acpi_dsdt_add_virtio(Aml *scope) > continue; > } > > -uint32_t irq = VIRTIO_IRQ_BASE + index; > +uint32_t irq = mms->virtio_irq_base + index; > hwaddr base = VIRTIO_MMIO_BASE + index * 512; > hwaddr size = 512; > > @@ -119,7 +119,7 @@ build_dsdt_microvm(GArray *table_data, BIOSLinker *linker, > build_ged_aml(sb_scope, GED_DEVICE, HOTPLUG_HANDLER(mms->acpi_dev), >GED_MMIO_IRQ, AML_SYSTEM_MEMORY, GED_MMIO_BASE); > acpi_dsdt_add_power_button(sb_scope); > -acpi_dsdt_add_virtio(sb_scope); > +acpi_dsdt_add_virtio(sb_scope, mms); > aml_append(dsdt, sb_scope); > > scope = aml_scope("\\"); > diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c > index a3708fdf1e39..2aa2804e4ca0 100644 > --- a/hw/i386/microvm.c > +++ b/hw/i386/microvm.c > @@ -124,10 +124,11 @@ static void microvm_devices_init(MicrovmMachineState > *mms) > > kvmclock_create(); > > +mms->virtio_irq_base = 8; > for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) { > sysbus_create_simple("virtio-mmio", > VIRTIO_MMIO_BASE + i * 512, > - x86ms->gsi[VIRTIO_IRQ_BASE + i]); > + x86ms->gsi[mms->virtio_irq_base + i]); > } > > /* Optional and legacy devices */ > @@ -274,7 +275,7 @@ static void microvm_memory_init(MicrovmMachineState *mms) > x86ms->ioapic_as = &address_space_memory; > } > > -static gchar *microvm_get_mmio_cmdline(gchar *name) > +static gchar *microvm_get_mmio_cmdline(gchar *name, uint32_t virtio_irq_base) > { > gchar *cmdline; > gchar *separator; > @@ -294,7 +295,7 @@ static gchar *microvm_get_mmio_cmdline(gchar *name) > ret = g_snprintf(cmdline, VIRTIO_CMDLINE_MAXLEN, > " virtio_mmio.device=512@0x%lx:%ld", > VIRTIO_MMIO_BASE + index * 512, > - VIRTIO_IRQ_BASE + index); > + virtio_irq_base + index); > if (ret < 0 || ret >= VIRTIO_CMDLINE_MAXLEN) { > g_free(cmdline); > return NULL; > @@ -306,6 +307,7 @@ static gchar *microvm_get_mmio_cmdline(gchar *name) > static void microvm_fix_kernel_cmdline(MachineState *machine) > { > X86MachineState *x86ms = X86_MACHINE(machine); > +MicrovmMachineState *mms = MICROVM_MACHINE(machine); > BusState *bus; > BusChild *kid; > char *cmdline; > @@ -329,7 +331,8 @@ static void microvm_fix_kernel_cmdline(MachineState > *machine) > BusState *mmio_bus = &mmio_virtio_bus->parent_obj; > > if (!QTAILQ_EMPTY(&mmio_bus->children)) { > -gchar *mmio_cmdline = > microvm_get_mmio_cmdline(mmio_bus->name); > +gchar *mmio_cmdline = microvm_get_mmio_cmdline > +(mmio_bus->name, mms->virtio_irq_base); > if (mmio_cmdline) { > char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, > NULL); > g_free(mmio_cmdline); > -- > 2.18.4 > signature.asc Description: PGP signature
[PATCH] icount: fix shift=auto for record/replay
This patch fixes shift=auto when record/replay is enabled. Now user does not need to guess the best shift value. Signed-off-by: Pavel Dovgalyuk --- cpus.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/cpus.c b/cpus.c index 5670c96bcf..dfb9f4717f 100644 --- a/cpus.c +++ b/cpus.c @@ -379,7 +379,8 @@ static void icount_adjust(void) seqlock_write_lock(&timers_state.vm_clock_seqlock, &timers_state.vm_clock_lock); -cur_time = cpu_get_clock_locked(); +cur_time = REPLAY_CLOCK_LOCKED(REPLAY_CLOCK_VIRTUAL_RT, + cpu_get_clock_locked()); cur_icount = cpu_get_icount_locked(); delta = cur_icount - cur_time; @@ -685,6 +686,7 @@ static const VMStateDescription icount_vmstate_timers = { .fields = (VMStateField[]) { VMSTATE_INT64(qemu_icount_bias, TimersState), VMSTATE_INT64(qemu_icount, TimersState), +VMSTATE_INT16(icount_time_shift, TimersState), VMSTATE_END_OF_LIST() }, .subsections = (const VMStateDescription*[]) {
Re: [PATCH v23 0/4] implement zstd cluster compression method
On 05.05.2020 15:03, Max Reitz wrote: On 05.05.20 12:26, Max Reitz wrote: On 30.04.20 12:19, Denis Plotnikov wrote: v23: Undecided: whether to add zstd(zlib) compression details to the qcow2 spec 03: tighten assertion on zstd decompression [Eric] 04: use _rm_test_img appropriately [Max] Thanks, applied to my block branch: I’m afraid I have to unqueue this series again, because it makes many iotests fail due to an additional “compression_type=zlib” output when images are created, an additional “compression type” line in qemu-img info output where format-specific information is not suppressed, and an additional line in qemu-img create -f qcow2 -o help. Max Hmm, this is strange. I made some modifications for the tests in 0001 of the series (qcow2: introduce compression type feature). Among the other test related changes, the patch contains the hunk: +++ b/tests/qemu-iotests/common.filter @@ -152,7 +152,8 @@ _filter_img_create() -e "s# refcount_bits=[0-9]\\+##g" \ -e "s# key-secret=[a-zA-Z0-9]\\+##g" \ -e "s# iter-time=[0-9]\\+##g" \ - -e "s# force_size=\\(on\\|off\\)##g" + -e "s# force_size=\\(on\\|off\\)##g" \ + -e "s# compression_type=[a-zA-Z0-9]\\+##g" } which has to filter out "compression_type" on image creation. But you say that you can see the "compression_type" on the image creation. May be the patch wasn't fully applied? Or the test related modification were omitted? I've just re-based the series on top of: 681b07f4e76dbb700c16918d (vanilla/master, mainstream) Merge: a2261b2754 714eb0dbc5 Author: Peter Maydell Date: Tue May 5 15:47:44 2020 +0100 and run the tests with 'make check-block' and got the following: Not run: 071 099 184 220 259 267 Some cases not run in: 030 040 041 Passed all 113 iotests May be I do something wrong? Denis
Re: [PATCH v2 3/4] backup: Make sure that source and target size match
Am 06.05.2020 um 08:07 hat Vladimir Sementsov-Ogievskiy geschrieben: > 05.05.2020 13:03, Kevin Wolf wrote: > > Am 30.04.2020 um 20:21 hat Vladimir Sementsov-Ogievskiy geschrieben: > > > 30.04.2020 17:27, Kevin Wolf wrote: > > > > Since the introduction of a backup filter node in commit 00e30f05d, the > > > > backup block job crashes when the target image is smaller than the > > > > source image because it will try to write after the end of the target > > > > node without having BLK_PERM_RESIZE. (Previously, the BlockBackend layer > > > > would have caught this and errored out gracefully.) > > > > > > > > We can fix this and even do better than the old behaviour: Check that > > > > source and target have the same image size at the start of the block job > > > > and unshare BLK_PERM_RESIZE. (This permission was already unshared > > > > before the same commit 00e30f05d, but the BlockBackend that was used to > > > > make the restriction was removed without a replacement.) This will > > > > immediately error out when starting the job instead of only when writing > > > > to a block that doesn't exist in the target. > > > > > > > > Longer target than source would technically work because we would never > > > > write to blocks that don't exist, but semantically these are invalid, > > > > too, because a backup is supposed to create a copy, not just an image > > > > that starts with a copy. > > > > > > > > Fixes: 00e30f05de1d19586345ec373970ef4c192c6270 > > > > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1778593 > > > > Cc: qemu-sta...@nongnu.org > > > > Signed-off-by: Kevin Wolf > > > > > > I'm OK with it as is, as it fixes bug: > > > > > > Reviewed-by: Vladimir Sementsov-Ogievskiy > > > > > > still, some notes below > > > > > > > > > > --- > > > >block/backup-top.c | 14 +- > > > >block/backup.c | 14 +- > > > >2 files changed, 22 insertions(+), 6 deletions(-) > > > > > > > > diff --git a/block/backup-top.c b/block/backup-top.c > > > > index 3b50c06e2c..79b268e6dc 100644 > > > > --- a/block/backup-top.c > > > > +++ b/block/backup-top.c > > > > @@ -148,8 +148,10 @@ static void backup_top_child_perm(BlockDriverState > > > > *bs, BdrvChild *c, > > > > * > > > > * Share write to target (child_file), to not interfere > > > > * with guest writes to its disk which may be in target > > > > backing chain. > > > > + * Can't resize during a backup block job because we check the > > > > size > > > > + * only upfront. > > > > */ > > > > -*nshared = BLK_PERM_ALL; > > > > +*nshared = BLK_PERM_ALL & ~BLK_PERM_RESIZE; > > > >*nperm = BLK_PERM_WRITE; > > > >} else { > > > >/* Source child */ > > > > @@ -159,7 +161,7 @@ static void backup_top_child_perm(BlockDriverState > > > > *bs, BdrvChild *c, > > > >if (perm & BLK_PERM_WRITE) { > > > >*nperm = *nperm | BLK_PERM_CONSISTENT_READ; > > > >} > > > > -*nshared &= ~BLK_PERM_WRITE; > > > > +*nshared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE); > > > >} > > > >} > > > > @@ -192,11 +194,13 @@ BlockDriverState > > > > *bdrv_backup_top_append(BlockDriverState *source, > > > >{ > > > >Error *local_err = NULL; > > > >BDRVBackupTopState *state; > > > > -BlockDriverState *top = > > > > bdrv_new_open_driver(&bdrv_backup_top_filter, > > > > - filter_node_name, > > > > - BDRV_O_RDWR, errp); > > > > +BlockDriverState *top; > > > >bool appended = false; > > > > +assert(source->total_sectors == target->total_sectors); > > > > > > May be better to error-out, just to keep backup-top independent. Still, > > > now it's not > > > really needed, as we have only one caller. And this function have to be > > > refactored > > > anyway, when publishing this filter (open() and close() should appear, so > > > this code > > > will be rewritten anyway.) > > > > Yes, the whole function only works because it's used in this restricted > > context today. For example, we only know that total_sectors is up to > > date because the caller has called bdrv_getlength() just a moment ago. > > > > I think fixing this would be beyond the scope of this patch, but > > certainly a good idea anyway. > > > > > And the other thought: the permissions we declared above, will be > > > activated only after > > > successful bdrv_child_refresh_perms(). I think some kind of race is > > > possible, so that > > > size is changed actual permission activation. So, may be good to double > > > check sizes after > > > bdrv_child_refresh_perms().. But it's a kind of paranoia. > > > > We're not in coroutine context, so we can't yield. I don't see who could > > change the size in parallel (apart from an external process, but an > > external process can mess up anything). > >
[Bug 1866870] Re: KVM Guest pauses after upgrade to Ubuntu 20.04
Hi Christian. Just filed bug: #1877052 -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1866870 Title: KVM Guest pauses after upgrade to Ubuntu 20.04 Status in QEMU: Invalid Status in qemu package in Ubuntu: Invalid Status in seabios package in Ubuntu: Fix Released Bug description: Symptom: Error unpausing domain: internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required Traceback (most recent call last): File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper callback(asyncjob, *args, **kwargs) File "/usr/share/virt-manager/virtManager/asyncjob.py", line 111, in tmpcb callback(*args, **kwargs) File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 66, in newfn ret = fn(self, *args, **kwargs) File "/usr/share/virt-manager/virtManager/object/domain.py", line 1311, in resume self._backend.resume() File "/usr/lib/python3/dist-packages/libvirt.py", line 2174, in resume if ret == -1: raise libvirtError ('virDomainResume() failed', dom=self) libvirt.libvirtError: internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required --- As outlined here: https://bugs.launchpad.net/qemu/+bug/1813165/comments/15 After upgrade, all KVM guests are in a default pause state. Even after forcing them off via virsh, and restarting them the guests are paused. These Guests are not nested. A lot of diganostic information are outlined in the previous bug report link provided. The solution mentioned in previous report had been allegedly integrated into the downstream updates. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1866870/+subscriptions
Re: [PATCH v7 0/7] reference implementation of RSS and hash report
I'll send v8 soon Thanks, Yuri On Wed, May 6, 2020 at 8:37 AM Jason Wang wrote: > > On 2020/5/1 下午12:01, Yuri Benditovich wrote: > > Michael/Jason, > > > > As Linux headers was updated in qemu and now include RSC/RSS/Hash > > definitions, please let me know what you prefer: > > 1. You apply this series as is, then I submit clean-up series that > > will remove all the redundant defines from virtio-net.c > > 2. I post v8 of this series with cleanup of all the redundant defines > > and also RSC ones > > 3. Something other > > > Hi Yuri: > > Though I've queued this series but consider we have new headers, I think > it's better to post v8. > > Thanks > >
Re: [PATCH v1 3/4] .cirrus.yml: bump FreeBSD to the current stable release
Li-Wen Hsu writes: > On Fri, May 1, 2020 at 7:15 PM Alex Bennée wrote: >> >> Hopefully this will un-stick the test which has been broken for a long >> time. >> >> Signed-off-by: Alex Bennée >> --- >> .cirrus.yml | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/.cirrus.yml b/.cirrus.yml >> index 90645fede6..f06f5af2b9 100644 >> --- a/.cirrus.yml >> +++ b/.cirrus.yml >> @@ -3,7 +3,7 @@ env: >> >> freebsd_12_task: >>freebsd_instance: >> -image: freebsd-12-0-release-amd64 >> +image_family: freebsd-12-1 >> cpu: 8 >> memory: 8G >>install_script: pkg install -y >> -- >> 2.20.1 >> > > Reviewed-by: Li-Wen Hsu > Tested-by: Li-Wen Hsu > > I would be nice to also add this patch: > https://github.com/lwhsu/qemu/commit/ac699f79b4d86d8195d76c3befada65ade449cc0.patch > To prevent problems in the future. Done. I took the liberty of apply your s-o-b tag as it wasn't on the github commit but it came from your repo. > > The error was due to the pkg version got "fixed" when building image, > and was too old when VM got provisioned, then it cannot be not > compatible with the package repository. Ref: > https://lists.freebsd.org/pipermail/freebsd-cloud/2020-April/000234.html > > Best, > Li-Wen -- Alex Bennée
[PATCH] replay: synchronize on every virtual timer callback
Sometimes virtual timer callbacks depend on order of virtual timer processing and warping of virtual clock. Therefore every callback should be logged to make replay deterministic. This patch creates a checkpoint before every virtual timer callback. With these checkpoints virtual timers processing and clock warping events order is completely deterministic. Signed-off-by: Pavel Dovgalyuk --- util/qemu-timer.c |5 + 1 file changed, 5 insertions(+) diff --git a/util/qemu-timer.c b/util/qemu-timer.c index d548d3c1ad..47833f338f 100644 --- a/util/qemu-timer.c +++ b/util/qemu-timer.c @@ -588,6 +588,11 @@ bool timerlist_run_timers(QEMUTimerList *timer_list) qemu_mutex_lock(&timer_list->active_timers_lock); progress = true; +/* + * Callback may insert new checkpoints, therefore add new checkpoint + * for the virtual timers. + */ +need_replay_checkpoint = timer_list->clock->type == QEMU_CLOCK_VIRTUAL; } qemu_mutex_unlock(&timer_list->active_timers_lock);
[Bug 1877052] [NEW] KVM Win 10 guest pauses after kernel upgrade
Public bug reported: Hello! Unfortunately the bug has apparently reappeared. I have a Windows 10 running in a VM, which after my today's "apt upgrade" goes into pause mode after a few seconds of running time. Until yesterday it used to work and I was able to boot the VM. During the kernel update (from 5.4.0-28.33 to 5.4.0-29.34) the VM was active and then went into pause mode. Even after a reboot of my host system the problem still persists: the VM boots for a few seconds and then switches to pause mode. Current Kernel: Linux andreas-laptop 5.4.0-29-generic #33-Ubuntu SMP Wed Apr 29 14:32:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux Maybe relevant logfile lines: 2020-05-06T07:46:42.857574Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12] 2020-05-06T07:46:42.857718Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13] 2020-05-06T07:46:42.860567Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12] 2020-05-06T07:46:42.860582Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13] 2020-05-06T07:47:22.901057Z qemu-system-x86_64: terminating on signal 15 from pid 1593 (/usr/sbin/libvirtd) 2020-05-06 07:47:23.101+: shutting down, reason=destroyed Kind regards, Andreas ** Affects: qemu Importance: Undecided Status: New ** Affects: qemu (Ubuntu) Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1877052 Title: KVM Win 10 guest pauses after kernel upgrade Status in QEMU: New Status in qemu package in Ubuntu: New Bug description: Hello! Unfortunately the bug has apparently reappeared. I have a Windows 10 running in a VM, which after my today's "apt upgrade" goes into pause mode after a few seconds of running time. Until yesterday it used to work and I was able to boot the VM. During the kernel update (from 5.4.0-28.33 to 5.4.0-29.34) the VM was active and then went into pause mode. Even after a reboot of my host system the problem still persists: the VM boots for a few seconds and then switches to pause mode. Current Kernel: Linux andreas-laptop 5.4.0-29-generic #33-Ubuntu SMP Wed Apr 29 14:32:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux Maybe relevant logfile lines: 2020-05-06T07:46:42.857574Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12] 2020-05-06T07:46:42.857718Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13] 2020-05-06T07:46:42.860567Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12] 2020-05-06T07:46:42.860582Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13] 2020-05-06T07:47:22.901057Z qemu-system-x86_64: terminating on signal 15 from pid 1593 (/usr/sbin/libvirtd) 2020-05-06 07:47:23.101+: shutting down, reason=destroyed Kind regards, Andreas To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1877052/+subscriptions
[Bug 1877052] Re: KVM Win 10 guest pauses after kernel upgrade
** Attachment added: "VM's XML configuration file" https://bugs.launchpad.net/qemu/+bug/1877052/+attachment/5367249/+files/win10.xml ** Also affects: qemu (Ubuntu) Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1877052 Title: KVM Win 10 guest pauses after kernel upgrade Status in QEMU: New Status in qemu package in Ubuntu: New Bug description: Hello! Unfortunately the bug has apparently reappeared. I have a Windows 10 running in a VM, which after my today's "apt upgrade" goes into pause mode after a few seconds of running time. Until yesterday it used to work and I was able to boot the VM. During the kernel update (from 5.4.0-28.33 to 5.4.0-29.34) the VM was active and then went into pause mode. Even after a reboot of my host system the problem still persists: the VM boots for a few seconds and then switches to pause mode. Current Kernel: Linux andreas-laptop 5.4.0-29-generic #33-Ubuntu SMP Wed Apr 29 14:32:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux Maybe relevant logfile lines: 2020-05-06T07:46:42.857574Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12] 2020-05-06T07:46:42.857718Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13] 2020-05-06T07:46:42.860567Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12] 2020-05-06T07:46:42.860582Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13] 2020-05-06T07:47:22.901057Z qemu-system-x86_64: terminating on signal 15 from pid 1593 (/usr/sbin/libvirtd) 2020-05-06 07:47:23.101+: shutting down, reason=destroyed Kind regards, Andreas To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1877052/+subscriptions
[Bug 1877052] Re: KVM Win 10 guest pauses after kernel upgrade
** Attachment added: "Libvirt Logfile" https://bugs.launchpad.net/qemu/+bug/1877052/+attachment/5367244/+files/win10.log -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1877052 Title: KVM Win 10 guest pauses after kernel upgrade Status in QEMU: New Status in qemu package in Ubuntu: New Bug description: Hello! Unfortunately the bug has apparently reappeared. I have a Windows 10 running in a VM, which after my today's "apt upgrade" goes into pause mode after a few seconds of running time. Until yesterday it used to work and I was able to boot the VM. During the kernel update (from 5.4.0-28.33 to 5.4.0-29.34) the VM was active and then went into pause mode. Even after a reboot of my host system the problem still persists: the VM boots for a few seconds and then switches to pause mode. Current Kernel: Linux andreas-laptop 5.4.0-29-generic #33-Ubuntu SMP Wed Apr 29 14:32:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux Maybe relevant logfile lines: 2020-05-06T07:46:42.857574Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12] 2020-05-06T07:46:42.857718Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13] 2020-05-06T07:46:42.860567Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12] 2020-05-06T07:46:42.860582Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13] 2020-05-06T07:47:22.901057Z qemu-system-x86_64: terminating on signal 15 from pid 1593 (/usr/sbin/libvirtd) 2020-05-06 07:47:23.101+: shutting down, reason=destroyed Kind regards, Andreas To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1877052/+subscriptions
Re: [PATCH v2 3/4] backup: Make sure that source and target size match
06.05.2020 11:02, Kevin Wolf wrote: Am 06.05.2020 um 08:07 hat Vladimir Sementsov-Ogievskiy geschrieben: 05.05.2020 13:03, Kevin Wolf wrote: Am 30.04.2020 um 20:21 hat Vladimir Sementsov-Ogievskiy geschrieben: 30.04.2020 17:27, Kevin Wolf wrote: Since the introduction of a backup filter node in commit 00e30f05d, the backup block job crashes when the target image is smaller than the source image because it will try to write after the end of the target node without having BLK_PERM_RESIZE. (Previously, the BlockBackend layer would have caught this and errored out gracefully.) We can fix this and even do better than the old behaviour: Check that source and target have the same image size at the start of the block job and unshare BLK_PERM_RESIZE. (This permission was already unshared before the same commit 00e30f05d, but the BlockBackend that was used to make the restriction was removed without a replacement.) This will immediately error out when starting the job instead of only when writing to a block that doesn't exist in the target. Longer target than source would technically work because we would never write to blocks that don't exist, but semantically these are invalid, too, because a backup is supposed to create a copy, not just an image that starts with a copy. Fixes: 00e30f05de1d19586345ec373970ef4c192c6270 Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1778593 Cc: qemu-sta...@nongnu.org Signed-off-by: Kevin Wolf I'm OK with it as is, as it fixes bug: Reviewed-by: Vladimir Sementsov-Ogievskiy still, some notes below --- block/backup-top.c | 14 +- block/backup.c | 14 +- 2 files changed, 22 insertions(+), 6 deletions(-) diff --git a/block/backup-top.c b/block/backup-top.c index 3b50c06e2c..79b268e6dc 100644 --- a/block/backup-top.c +++ b/block/backup-top.c @@ -148,8 +148,10 @@ static void backup_top_child_perm(BlockDriverState *bs, BdrvChild *c, * * Share write to target (child_file), to not interfere * with guest writes to its disk which may be in target backing chain. + * Can't resize during a backup block job because we check the size + * only upfront. */ -*nshared = BLK_PERM_ALL; +*nshared = BLK_PERM_ALL & ~BLK_PERM_RESIZE; *nperm = BLK_PERM_WRITE; } else { /* Source child */ @@ -159,7 +161,7 @@ static void backup_top_child_perm(BlockDriverState *bs, BdrvChild *c, if (perm & BLK_PERM_WRITE) { *nperm = *nperm | BLK_PERM_CONSISTENT_READ; } -*nshared &= ~BLK_PERM_WRITE; +*nshared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE); } } @@ -192,11 +194,13 @@ BlockDriverState *bdrv_backup_top_append(BlockDriverState *source, { Error *local_err = NULL; BDRVBackupTopState *state; -BlockDriverState *top = bdrv_new_open_driver(&bdrv_backup_top_filter, - filter_node_name, - BDRV_O_RDWR, errp); +BlockDriverState *top; bool appended = false; +assert(source->total_sectors == target->total_sectors); May be better to error-out, just to keep backup-top independent. Still, now it's not really needed, as we have only one caller. And this function have to be refactored anyway, when publishing this filter (open() and close() should appear, so this code will be rewritten anyway.) Yes, the whole function only works because it's used in this restricted context today. For example, we only know that total_sectors is up to date because the caller has called bdrv_getlength() just a moment ago. I think fixing this would be beyond the scope of this patch, but certainly a good idea anyway. And the other thought: the permissions we declared above, will be activated only after successful bdrv_child_refresh_perms(). I think some kind of race is possible, so that size is changed actual permission activation. So, may be good to double check sizes after bdrv_child_refresh_perms().. But it's a kind of paranoia. We're not in coroutine context, so we can't yield. I don't see who could change the size in parallel (apart from an external process, but an external process can mess up anything). When we make backup-top an independent driver, instead of double checking (what would you do on error?), maybe we could move the size initialisation (then with bdrv_getlength()) to after bdrv_child_refresh_perms(). Also, third thought: the restricted permissions doesn't save us from resizing of the source through exactly this node, does it? Hmm, but your test works somehow. But (I assume) it worked in a previous patch version without unsharing on source.. Ha, but bdrv_co_truncate just can't work on backup-top, because it doesn't have file child. But, if we fix bdrv_co_truncate to skip filters, we'll need to define .bdrv_co_truncate in backup_top, which will return
[Bug 1877052] Re: KVM Win 10 guest pauses after kernel upgrade
** Attachment added: "History.log from apt" https://bugs.launchpad.net/qemu/+bug/1877052/+attachment/5367245/+files/history.log ** Description changed: - Hello! Unfortunately the bug has apparently reappeared. I have a Windows 10 running in a VM, which after my today's "apt upgrade" goes into pause mode after a few seconds of running time. Until yesterday it used to work and I was able to boot the VM. During the kernel update (from 5.4.0-28.33 to 5.4.0-29.34) the VM was active and then went into pause mode. Even after a reboot of my host system the problem still persists: the VM boots for a few seconds and then switches to pause mode. + Current Kernel: Linux andreas-laptop 5.4.0-29-generic #33-Ubuntu SMP Wed + Apr 29 14:32:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux + Kind regards, -Andreas + Andreas -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1877052 Title: KVM Win 10 guest pauses after kernel upgrade Status in QEMU: New Status in qemu package in Ubuntu: New Bug description: Hello! Unfortunately the bug has apparently reappeared. I have a Windows 10 running in a VM, which after my today's "apt upgrade" goes into pause mode after a few seconds of running time. Until yesterday it used to work and I was able to boot the VM. During the kernel update (from 5.4.0-28.33 to 5.4.0-29.34) the VM was active and then went into pause mode. Even after a reboot of my host system the problem still persists: the VM boots for a few seconds and then switches to pause mode. Current Kernel: Linux andreas-laptop 5.4.0-29-generic #33-Ubuntu SMP Wed Apr 29 14:32:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux Maybe relevant logfile lines: 2020-05-06T07:46:42.857574Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12] 2020-05-06T07:46:42.857718Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13] 2020-05-06T07:46:42.860567Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12] 2020-05-06T07:46:42.860582Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13] 2020-05-06T07:47:22.901057Z qemu-system-x86_64: terminating on signal 15 from pid 1593 (/usr/sbin/libvirtd) 2020-05-06 07:47:23.101+: shutting down, reason=destroyed Kind regards, Andreas To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1877052/+subscriptions
[Bug 1877052] Re: KVM Win 10 guest pauses after kernel upgrade
** Attachment added: "Dpkg -l Output" https://bugs.launchpad.net/qemu/+bug/1877052/+attachment/5367247/+files/dpkg-list.txt ** Description changed: Hello! Unfortunately the bug has apparently reappeared. I have a Windows 10 running in a VM, which after my today's "apt upgrade" goes into pause mode after a few seconds of running time. Until yesterday it used to work and I was able to boot the VM. During the kernel update (from 5.4.0-28.33 to 5.4.0-29.34) the VM was active and then went into pause mode. Even after a reboot of my host system the problem still persists: the VM boots for a few seconds and then switches to pause mode. Current Kernel: Linux andreas-laptop 5.4.0-29-generic #33-Ubuntu SMP Wed Apr 29 14:32:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux + Maybe relevant logfile lines: + 2020-05-06T07:46:42.857574Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12] + 2020-05-06T07:46:42.857718Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13] + 2020-05-06T07:46:42.860567Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12] + 2020-05-06T07:46:42.860582Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13] + 2020-05-06T07:47:22.901057Z qemu-system-x86_64: terminating on signal 15 from pid 1593 (/usr/sbin/libvirtd) + 2020-05-06 07:47:23.101+: shutting down, reason=destroyed + Kind regards, Andreas -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1877052 Title: KVM Win 10 guest pauses after kernel upgrade Status in QEMU: New Status in qemu package in Ubuntu: New Bug description: Hello! Unfortunately the bug has apparently reappeared. I have a Windows 10 running in a VM, which after my today's "apt upgrade" goes into pause mode after a few seconds of running time. Until yesterday it used to work and I was able to boot the VM. During the kernel update (from 5.4.0-28.33 to 5.4.0-29.34) the VM was active and then went into pause mode. Even after a reboot of my host system the problem still persists: the VM boots for a few seconds and then switches to pause mode. Current Kernel: Linux andreas-laptop 5.4.0-29-generic #33-Ubuntu SMP Wed Apr 29 14:32:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux Maybe relevant logfile lines: 2020-05-06T07:46:42.857574Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12] 2020-05-06T07:46:42.857718Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13] 2020-05-06T07:46:42.860567Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12] 2020-05-06T07:46:42.860582Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13] 2020-05-06T07:47:22.901057Z qemu-system-x86_64: terminating on signal 15 from pid 1593 (/usr/sbin/libvirtd) 2020-05-06 07:47:23.101+: shutting down, reason=destroyed Kind regards, Andreas To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1877052/+subscriptions
Re: [PATCH Kernel v18 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.
On Mon, May 04, 2020 at 11:58:56PM +0800, Kirti Wankhede wrote: > VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations: > - Start dirty pages tracking while migration is active > - Stop dirty pages tracking. > - Get dirty pages bitmap. Its user space application's responsibility to > copy content of dirty pages from source to destination during migration. > > To prevent DoS attack, memory for bitmap is allocated per vfio_dma > structure. Bitmap size is calculated considering smallest supported page > size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled > > Bitmap is populated for already pinned pages when bitmap is allocated for > a vfio_dma with the smallest supported page size. Update bitmap from > pinning functions when tracking is enabled. When user application queries > bitmap, check if requested page size is same as page size used to > populated bitmap. If it is equal, copy bitmap, but if not equal, return > error. > > Fixed below error by changing pgsize type from uint64_t to size_t. > Reported-by: kbuild test robot > > All errors: > drivers/vfio/vfio_iommu_type1.c:197: undefined reference to `__udivdi3' > > drivers/vfio/vfio_iommu_type1.c:225: undefined reference to `__udivdi3' > > Signed-off-by: Kirti Wankhede > Reviewed-by: Neo Jia > --- > drivers/vfio/vfio_iommu_type1.c | 266 > +++- > 1 file changed, 260 insertions(+), 6 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index fa735047b04d..01dcb417836f 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -71,6 +71,7 @@ struct vfio_iommu { > unsigned intdma_avail; > boolv2; > boolnesting; > + booldirty_page_tracking; > }; > > struct vfio_domain { > @@ -91,6 +92,7 @@ struct vfio_dma { > boollock_cap; /* capable(CAP_IPC_LOCK) */ > struct task_struct *task; > struct rb_root pfn_list; /* Ex-user pinned pfn list */ > + unsigned long *bitmap; > }; > > struct vfio_group { > @@ -125,7 +127,21 @@ struct vfio_regions { > #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) \ > (!list_empty(&iommu->domain_list)) > > +#define DIRTY_BITMAP_BYTES(n)(ALIGN(n, BITS_PER_TYPE(u64)) / > BITS_PER_BYTE) > + > +/* > + * Input argument of number of bits to bitmap_set() is unsigned integer, > which > + * further casts to signed integer for unaligned multi-bit operation, > + * __bitmap_set(). > + * Then maximum bitmap size supported is 2^31 bits divided by 2^3 bits/byte, > + * that is 2^28 (256 MB) which maps to 2^31 * 2^12 = 2^43 (8TB) on 4K page > + * system. > + */ > +#define DIRTY_BITMAP_PAGES_MAX((u64)INT_MAX) > +#define DIRTY_BITMAP_SIZE_MAX > DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX) > + > static int put_pfn(unsigned long pfn, int prot); > +static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu); > > /* > * This code handles mapping and unmapping of user data buffers > @@ -175,6 +191,77 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, > struct vfio_dma *old) > rb_erase(&old->node, &iommu->dma_list); > } > > + > +static int vfio_dma_bitmap_alloc(struct vfio_dma *dma, size_t pgsize) > +{ > + uint64_t npages = dma->size / pgsize; > + > + if (npages > DIRTY_BITMAP_PAGES_MAX) > + return -EINVAL; > + > + dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL); > + if (!dma->bitmap) > + return -ENOMEM; > + > + return 0; > +} > + > +static void vfio_dma_bitmap_free(struct vfio_dma *dma) > +{ > + kfree(dma->bitmap); > + dma->bitmap = NULL; > +} > + > +static void vfio_dma_populate_bitmap(struct vfio_dma *dma, size_t pgsize) > +{ > + struct rb_node *p; > + > + if (RB_EMPTY_ROOT(&dma->pfn_list)) > + return; > + > + for (p = rb_first(&dma->pfn_list); p; p = rb_next(p)) { > + struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn, node); > + > + bitmap_set(dma->bitmap, (vpfn->iova - dma->iova) / pgsize, 1); > + } > +} > + > +static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, size_t pgsize) > +{ > + struct rb_node *n = rb_first(&iommu->dma_list); > + > + for (; n; n = rb_next(n)) { > + struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node); > + int ret; > + > + ret = vfio_dma_bitmap_alloc(dma, pgsize); > + if (ret) { > + struct rb_node *p = rb_prev(n); > + > + for (; p; p = rb_prev(p)) { > + struct vfio_dma *dma = rb_entry(n, > + struct vfio_dma, node); > + > + vfio_dma_bitmap_free(dma); > + } > +
[PATCH v2 1/9] hw/net/xilinx_axienet: Auto-clear PHY Autoneg
From: "Edgar E. Iglesias" Auto-clear PHY CR Autoneg bits. This makes this model work with recent Linux kernels. Reviewed-by: Francisco Iglesias Signed-off-by: Edgar E. Iglesias --- hw/net/xilinx_axienet.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c index 704788811a..0f97510d8a 100644 --- a/hw/net/xilinx_axienet.c +++ b/hw/net/xilinx_axienet.c @@ -149,8 +149,8 @@ tdk_write(struct PHY *phy, unsigned int req, unsigned int data) break; } -/* Unconditionally clear regs[BMCR][BMCR_RESET] */ -phy->regs[0] &= ~0x8000; +/* Unconditionally clear regs[BMCR][BMCR_RESET] and auto-neg */ +phy->regs[0] &= ~0x8200; } static void -- 2.20.1
[PATCH v2 0/9] hw/core: stream: Add end-of-packet flag
From: "Edgar E. Iglesias" Hi, When modeling pipelines of processing nodes that communicate through streaming interfaces (e.g AXI-Stream), some of these nodes send packets while others may just stream unpacketized data. The purpose of this series is to add an end-of-packet flag, e.g what AXI-Stream calls tlast. This is in preparation for modeling future nodes that may use huge packets that we wouldn't be able to buffer and also to handle nodes that don't use packets. Along the way I fixed a few things in the petalinux-ml605 eth setup. Cheers, Edgar ChangeLog: v1 -> v2: * Check that packets fit c_txmem Edgar E. Iglesias (9): hw/net/xilinx_axienet: Auto-clear PHY Autoneg hw/net/xilinx_axienet: Cleanup stream->push assignment hw/net/xilinx_axienet: Remove unncessary cast hw/dma/xilinx_axidma: Add DMA memory-region property hw/core: stream: Add an end-of-packet flag hw/net/xilinx_axienet: Handle fragmented packets from DMA hw/dma/xilinx_axidma: mm2s: Stream descriptor by descriptor hw/dma/xilinx_axidma: s2mm: Support stream fragments MAINTAINERS: Add myself as streams maintainer include/hw/stream.h | 5 +-- hw/core/stream.c| 4 +-- hw/dma/xilinx_axidma.c | 75 ++--- hw/net/xilinx_axienet.c | 70 -- hw/ssi/xilinx_spips.c | 2 +- MAINTAINERS | 6 6 files changed, 113 insertions(+), 49 deletions(-) -- 2.20.1
[PATCH v2 8/9] hw/dma/xilinx_axidma: s2mm: Support stream fragments
From: "Edgar E. Iglesias" Add support for stream fragments. Reviewed-by: Alistair Francis Signed-off-by: Edgar E. Iglesias --- hw/dma/xilinx_axidma.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c index 101d32a965..87be9cade7 100644 --- a/hw/dma/xilinx_axidma.c +++ b/hw/dma/xilinx_axidma.c @@ -110,6 +110,7 @@ struct Stream { int nr; +bool sof; struct SDesc desc; unsigned int complete_cnt; uint32_t regs[R_MAX]; @@ -174,6 +175,7 @@ static void stream_reset(struct Stream *s) { s->regs[R_DMASR] = DMASR_HALTED; /* starts up halted. */ s->regs[R_DMACR] = 1 << 16; /* Starts with one in compl threshold. */ +s->sof = true; } /* Map an offset addr into a channel index. */ @@ -321,12 +323,11 @@ static void stream_process_mem2s(struct Stream *s, StreamSlave *tx_data_dev, } static size_t stream_process_s2mem(struct Stream *s, unsigned char *buf, - size_t len) + size_t len, bool eop) { uint32_t prev_d; unsigned int rxlen; size_t pos = 0; -int sof = 1; if (!stream_running(s) || stream_idle(s)) { return 0; @@ -352,16 +353,16 @@ static size_t stream_process_s2mem(struct Stream *s, unsigned char *buf, pos += rxlen; /* Update the descriptor. */ -if (!len) { +if (eop) { stream_complete(s); memcpy(s->desc.app, s->app, sizeof(s->desc.app)); s->desc.status |= SDESC_STATUS_EOF; } -s->desc.status |= sof << SDESC_STATUS_SOF_BIT; +s->desc.status |= s->sof << SDESC_STATUS_SOF_BIT; s->desc.status |= SDESC_STATUS_COMPLETE; stream_desc_store(s, s->regs[R_CURDESC]); -sof = 0; +s->sof = eop; /* Advance. */ prev_d = s->regs[R_CURDESC]; @@ -426,8 +427,7 @@ xilinx_axidma_data_stream_push(StreamSlave *obj, unsigned char *buf, size_t len, struct Stream *s = &ds->dma->streams[1]; size_t ret; -assert(eop); -ret = stream_process_s2mem(s, buf, len); +ret = stream_process_s2mem(s, buf, len, eop); stream_update_irq(s); return ret; } -- 2.20.1
[PATCH v2 2/9] hw/net/xilinx_axienet: Cleanup stream->push assignment
From: "Edgar E. Iglesias" Split the shared stream_class_init function to assign stream->push with better type-safety. Reviewed-by: Alistair Francis Reviewed-by: Francisco Iglesias Signed-off-by: Edgar E. Iglesias --- hw/net/xilinx_axienet.c | 18 -- 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c index 0f97510d8a..84073753d7 100644 --- a/hw/net/xilinx_axienet.c +++ b/hw/net/xilinx_axienet.c @@ -1029,11 +1029,19 @@ static void xilinx_enet_class_init(ObjectClass *klass, void *data) dc->reset = xilinx_axienet_reset; } -static void xilinx_enet_stream_class_init(ObjectClass *klass, void *data) +static void xilinx_enet_control_stream_class_init(ObjectClass *klass, + void *data) { StreamSlaveClass *ssc = STREAM_SLAVE_CLASS(klass); -ssc->push = data; +ssc->push = xilinx_axienet_control_stream_push; +} + +static void xilinx_enet_data_stream_class_init(ObjectClass *klass, void *data) +{ +StreamSlaveClass *ssc = STREAM_SLAVE_CLASS(klass); + +ssc->push = xilinx_axienet_data_stream_push; } static const TypeInfo xilinx_enet_info = { @@ -1048,8 +1056,7 @@ static const TypeInfo xilinx_enet_data_stream_info = { .name = TYPE_XILINX_AXI_ENET_DATA_STREAM, .parent= TYPE_OBJECT, .instance_size = sizeof(struct XilinxAXIEnetStreamSlave), -.class_init= xilinx_enet_stream_class_init, -.class_data= xilinx_axienet_data_stream_push, +.class_init= xilinx_enet_data_stream_class_init, .interfaces = (InterfaceInfo[]) { { TYPE_STREAM_SLAVE }, { } @@ -1060,8 +1067,7 @@ static const TypeInfo xilinx_enet_control_stream_info = { .name = TYPE_XILINX_AXI_ENET_CONTROL_STREAM, .parent= TYPE_OBJECT, .instance_size = sizeof(struct XilinxAXIEnetStreamSlave), -.class_init= xilinx_enet_stream_class_init, -.class_data= xilinx_axienet_control_stream_push, +.class_init= xilinx_enet_control_stream_class_init, .interfaces = (InterfaceInfo[]) { { TYPE_STREAM_SLAVE }, { } -- 2.20.1
[PATCH v2 3/9] hw/net/xilinx_axienet: Remove unncessary cast
From: "Edgar E. Iglesias" Remove unncessary cast, buf is already uint8_t *. No functional change. Reviewed-by: Alistair Francis Reviewed-by: Francisco Iglesias Signed-off-by: Edgar E. Iglesias --- hw/net/xilinx_axienet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c index 84073753d7..c8dfcda3ee 100644 --- a/hw/net/xilinx_axienet.c +++ b/hw/net/xilinx_axienet.c @@ -918,7 +918,7 @@ xilinx_axienet_data_stream_push(StreamSlave *obj, uint8_t *buf, size_t size) uint16_t csum; tmp_csum = net_checksum_add(size - start_off, -(uint8_t *)buf + start_off); +buf + start_off); /* Accumulate the seed. */ tmp_csum += s->hdr[2] & 0x; -- 2.20.1
[PATCH v2 9/9] MAINTAINERS: Add myself as streams maintainer
From: "Edgar E. Iglesias" Since we're missing a maintainer, add myself. Reviewed-by: Alistair Francis Signed-off-by: Edgar E. Iglesias --- MAINTAINERS | 6 ++ 1 file changed, 6 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 1f84e3ae2c..d3663d6c9a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2315,6 +2315,12 @@ F: net/slirp.c F: include/net/slirp.h T: git https://people.debian.org/~sthibault/qemu.git slirp +Streams +M: Edgar E. Iglesias +S: Maintained +F: hw/core/stream.c +F: include/hw/stream.h + Stubs M: Paolo Bonzini S: Maintained -- 2.20.1
[PATCH v2 5/9] hw/core: stream: Add an end-of-packet flag
From: "Edgar E. Iglesias" Some stream clients stream an endless stream of data while other clients stream data in packets. Stream interfaces usually have a way to signal the end of a packet or the last beat of a transfer. This adds an end-of-packet flag to the push interface. Reviewed-by: Alistair Francis Reviewed-by: Francisco Iglesias Signed-off-by: Edgar E. Iglesias --- include/hw/stream.h | 5 +++-- hw/core/stream.c| 4 ++-- hw/dma/xilinx_axidma.c | 10 ++ hw/net/xilinx_axienet.c | 14 ++ hw/ssi/xilinx_spips.c | 2 +- 5 files changed, 22 insertions(+), 13 deletions(-) diff --git a/include/hw/stream.h b/include/hw/stream.h index d02f62ca89..ed09e83683 100644 --- a/include/hw/stream.h +++ b/include/hw/stream.h @@ -39,12 +39,13 @@ typedef struct StreamSlaveClass { * @obj: Stream slave to push to * @buf: Data to write * @len: Maximum number of bytes to write + * @eop: End of packet flag */ -size_t (*push)(StreamSlave *obj, unsigned char *buf, size_t len); +size_t (*push)(StreamSlave *obj, unsigned char *buf, size_t len, bool eop); } StreamSlaveClass; size_t -stream_push(StreamSlave *sink, uint8_t *buf, size_t len); +stream_push(StreamSlave *sink, uint8_t *buf, size_t len, bool eop); bool stream_can_push(StreamSlave *sink, StreamCanPushNotifyFn notify, diff --git a/hw/core/stream.c b/hw/core/stream.c index 39b1e595cd..a65ad1208d 100644 --- a/hw/core/stream.c +++ b/hw/core/stream.c @@ -3,11 +3,11 @@ #include "qemu/module.h" size_t -stream_push(StreamSlave *sink, uint8_t *buf, size_t len) +stream_push(StreamSlave *sink, uint8_t *buf, size_t len, bool eop) { StreamSlaveClass *k = STREAM_SLAVE_GET_CLASS(sink); -return k->push(sink, buf, len); +return k->push(sink, buf, len, eop); } bool diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c index 4540051448..a770e12c96 100644 --- a/hw/dma/xilinx_axidma.c +++ b/hw/dma/xilinx_axidma.c @@ -283,7 +283,7 @@ static void stream_process_mem2s(struct Stream *s, StreamSlave *tx_data_dev, if (stream_desc_sof(&s->desc)) { s->pos = 0; -stream_push(tx_control_dev, s->desc.app, sizeof(s->desc.app)); +stream_push(tx_control_dev, s->desc.app, sizeof(s->desc.app), true); } txlen = s->desc.control & SDESC_CTRL_LEN_MASK; @@ -298,7 +298,7 @@ static void stream_process_mem2s(struct Stream *s, StreamSlave *tx_data_dev, s->pos += txlen; if (stream_desc_eof(&s->desc)) { -stream_push(tx_data_dev, s->txbuf, s->pos); +stream_push(tx_data_dev, s->txbuf, s->pos, true); s->pos = 0; stream_complete(s); } @@ -384,7 +384,7 @@ static void xilinx_axidma_reset(DeviceState *dev) static size_t xilinx_axidma_control_stream_push(StreamSlave *obj, unsigned char *buf, - size_t len) + size_t len, bool eop) { XilinxAXIDMAStreamSlave *cs = XILINX_AXI_DMA_CONTROL_STREAM(obj); struct Stream *s = &cs->dma->streams[1]; @@ -416,12 +416,14 @@ xilinx_axidma_data_stream_can_push(StreamSlave *obj, } static size_t -xilinx_axidma_data_stream_push(StreamSlave *obj, unsigned char *buf, size_t len) +xilinx_axidma_data_stream_push(StreamSlave *obj, unsigned char *buf, size_t len, + bool eop) { XilinxAXIDMAStreamSlave *ds = XILINX_AXI_DMA_DATA_STREAM(obj); struct Stream *s = &ds->dma->streams[1]; size_t ret; +assert(eop); ret = stream_process_s2mem(s, buf, len); stream_update_irq(s); return ret; diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c index c8dfcda3ee..bd48305577 100644 --- a/hw/net/xilinx_axienet.c +++ b/hw/net/xilinx_axienet.c @@ -697,14 +697,14 @@ static void axienet_eth_rx_notify(void *opaque) axienet_eth_rx_notify, s)) { size_t ret = stream_push(s->tx_control_dev, (void *)s->rxapp + CONTROL_PAYLOAD_SIZE - - s->rxappsize, s->rxappsize); + - s->rxappsize, s->rxappsize, true); s->rxappsize -= ret; } while (s->rxsize && stream_can_push(s->tx_data_dev, axienet_eth_rx_notify, s)) { size_t ret = stream_push(s->tx_data_dev, (void *)s->rxmem + s->rxpos, - s->rxsize); + s->rxsize, true); s->rxsize -= ret; s->rxpos += ret; if (!s->rxsize) { @@ -874,12 +874,14 @@ static ssize_t eth_rx(NetClientState *nc, const uint8_t *buf, size_t size) } static size_t -xilinx_axienet_control_stream_push(StreamSlave *obj, uint8_t *buf, size_t len) +xilinx_axienet_control_stream_push(StreamSlave *obj, uint8_t *buf, size_t len, + bool eop) { int i; Xi
[PATCH v2 4/9] hw/dma/xilinx_axidma: Add DMA memory-region property
From: "Edgar E. Iglesias" Add DMA memory-region property to externally control what address-space this DMA operates on. Reviewed-by: Alistair Francis Reviewed-by: Francisco Iglesias Signed-off-by: Edgar E. Iglesias --- hw/dma/xilinx_axidma.c | 30 +++--- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c index 018f36991b..4540051448 100644 --- a/hw/dma/xilinx_axidma.c +++ b/hw/dma/xilinx_axidma.c @@ -33,6 +33,7 @@ #include "qemu/log.h" #include "qemu/module.h" +#include "sysemu/dma.h" #include "hw/stream.h" #define D(x) @@ -103,6 +104,7 @@ enum { }; struct Stream { +struct XilinxAXIDMA *dma; ptimer_state *ptimer; qemu_irq irq; @@ -125,6 +127,9 @@ struct XilinxAXIDMAStreamSlave { struct XilinxAXIDMA { SysBusDevice busdev; MemoryRegion iomem; +MemoryRegion *dma_mr; +AddressSpace as; + uint32_t freqhz; StreamSlave *tx_data_dev; StreamSlave *tx_control_dev; @@ -186,7 +191,7 @@ static void stream_desc_load(struct Stream *s, hwaddr addr) { struct SDesc *d = &s->desc; -cpu_physical_memory_read(addr, d, sizeof *d); +address_space_read(&s->dma->as, addr, MEMTXATTRS_UNSPECIFIED, d, sizeof *d); /* Convert from LE into host endianness. */ d->buffer_address = le64_to_cpu(d->buffer_address); @@ -204,7 +209,8 @@ static void stream_desc_store(struct Stream *s, hwaddr addr) d->nxtdesc = cpu_to_le64(d->nxtdesc); d->control = cpu_to_le32(d->control); d->status = cpu_to_le32(d->status); -cpu_physical_memory_write(addr, d, sizeof *d); +address_space_write(&s->dma->as, addr, MEMTXATTRS_UNSPECIFIED, +d, sizeof *d); } static void stream_update_irq(struct Stream *s) @@ -286,8 +292,9 @@ static void stream_process_mem2s(struct Stream *s, StreamSlave *tx_data_dev, txlen + s->pos); } -cpu_physical_memory_read(s->desc.buffer_address, - s->txbuf + s->pos, txlen); +address_space_read(&s->dma->as, s->desc.buffer_address, + MEMTXATTRS_UNSPECIFIED, + s->txbuf + s->pos, txlen); s->pos += txlen; if (stream_desc_eof(&s->desc)) { @@ -336,7 +343,8 @@ static size_t stream_process_s2mem(struct Stream *s, unsigned char *buf, rxlen = len; } -cpu_physical_memory_write(s->desc.buffer_address, buf + pos, rxlen); +address_space_write(&s->dma->as, s->desc.buffer_address, +MEMTXATTRS_UNSPECIFIED, buf + pos, rxlen); len -= rxlen; pos += rxlen; @@ -525,6 +533,7 @@ static void xilinx_axidma_realize(DeviceState *dev, Error **errp) XilinxAXIDMAStreamSlave *cs = XILINX_AXI_DMA_CONTROL_STREAM( &s->rx_control_dev); Error *local_err = NULL; +int i; object_property_add_link(OBJECT(ds), "dma", TYPE_XILINX_AXI_DMA, (Object **)&ds->dma, @@ -545,17 +554,19 @@ static void xilinx_axidma_realize(DeviceState *dev, Error **errp) goto xilinx_axidma_realize_fail; } -int i; - for (i = 0; i < 2; i++) { struct Stream *st = &s->streams[i]; +st->dma = s; st->nr = i; st->ptimer = ptimer_init(timer_hit, st, PTIMER_POLICY_DEFAULT); ptimer_transaction_begin(st->ptimer); ptimer_set_freq(st->ptimer, s->freqhz); ptimer_transaction_commit(st->ptimer); } + +address_space_init(&s->as, + s->dma_mr ? s->dma_mr : get_system_memory(), "dma"); return; xilinx_axidma_realize_fail: @@ -575,6 +586,11 @@ static void xilinx_axidma_init(Object *obj) &s->rx_control_dev, sizeof(s->rx_control_dev), TYPE_XILINX_AXI_DMA_CONTROL_STREAM, &error_abort, NULL); +object_property_add_link(obj, "dma", TYPE_MEMORY_REGION, + (Object **)&s->dma_mr, + qdev_prop_allow_set_link_before_realize, + OBJ_PROP_LINK_STRONG, + &error_abort); sysbus_init_irq(sbd, &s->streams[0].irq); sysbus_init_irq(sbd, &s->streams[1].irq); -- 2.20.1
Re: [PATCH v23 0/4] implement zstd cluster compression method
06.05.2020 11:01, Denis Plotnikov wrote: On 05.05.2020 15:03, Max Reitz wrote: On 05.05.20 12:26, Max Reitz wrote: On 30.04.20 12:19, Denis Plotnikov wrote: v23: Undecided: whether to add zstd(zlib) compression details to the qcow2 spec 03: tighten assertion on zstd decompression [Eric] 04: use _rm_test_img appropriately [Max] Thanks, applied to my block branch: I’m afraid I have to unqueue this series again, because it makes many iotests fail due to an additional “compression_type=zlib” output when images are created, an additional “compression type” line in qemu-img info output where format-specific information is not suppressed, and an additional line in qemu-img create -f qcow2 -o help. Max Hmm, this is strange. I made some modifications for the tests in 0001 of the series (qcow2: introduce compression type feature). Among the other test related changes, the patch contains the hunk: +++ b/tests/qemu-iotests/common.filter @@ -152,7 +152,8 @@ _filter_img_create() -e "s# refcount_bits=[0-9]\\+##g" \ -e "s# key-secret=[a-zA-Z0-9]\\+##g" \ -e "s# iter-time=[0-9]\\+##g" \ - -e "s# force_size=\\(on\\|off\\)##g" + -e "s# force_size=\\(on\\|off\\)##g" \ + -e "s# compression_type=[a-zA-Z0-9]\\+##g" } which has to filter out "compression_type" on image creation. But you say that you can see the "compression_type" on the image creation. May be the patch wasn't fully applied? Or the test related modification were omitted? I've just re-based the series on top of: 681b07f4e76dbb700c16918d (vanilla/master, mainstream) Merge: a2261b2754 714eb0dbc5 Author: Peter Maydell Date: Tue May 5 15:47:44 2020 +0100 and run the tests with 'make check-block' and got the following: Not run: 071 099 184 220 259 267 Some cases not run in: 030 040 041 Passed all 113 iotests Hmm, we definitely have a lot more iotests. So, I assume make check-block doesn't run all of them. To run all: cd tests/qemu-iotests ./check -qcow2 ./check -raw ... and so for any format you want to test I also recommend to do export TEST_DIR=/something before running tests, where something is tmpfs or ssd, which will decrease testing time a lot. Still, some (very small subset) of tests doesn't run on tmpfs, you can rerun them with TEST_DIR=/normal/hdd/directory -- Best regards, Vladimir
[PATCH v2 6/9] hw/net/xilinx_axienet: Handle fragmented packets from DMA
From: "Edgar E. Iglesias" Add support for fragmented packets from the DMA. Reviewed-by: Francisco Iglesias Signed-off-by: Edgar E. Iglesias --- hw/net/xilinx_axienet.c | 38 +++--- 1 file changed, 31 insertions(+), 7 deletions(-) diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c index bd48305577..498afe2f54 100644 --- a/hw/net/xilinx_axienet.c +++ b/hw/net/xilinx_axienet.c @@ -402,6 +402,9 @@ struct XilinxAXIEnet { uint32_t hdr[CONTROL_PAYLOAD_WORDS]; +uint8_t *txmem; +uint32_t txpos; + uint8_t *rxmem; uint32_t rxsize; uint32_t rxpos; @@ -421,6 +424,7 @@ static void axienet_rx_reset(XilinxAXIEnet *s) static void axienet_tx_reset(XilinxAXIEnet *s) { s->tc = TC_JUM | TC_TX | TC_VLAN; +s->txpos = 0; } static inline int axienet_rx_resetting(XilinxAXIEnet *s) @@ -902,17 +906,35 @@ xilinx_axienet_data_stream_push(StreamSlave *obj, uint8_t *buf, size_t size, XilinxAXIEnetStreamSlave *ds = XILINX_AXI_ENET_DATA_STREAM(obj); XilinxAXIEnet *s = ds->enet; -/* We don't support fragmented packets yet. */ -assert(eop); - /* TX enable ? */ if (!(s->tc & TC_TX)) { return size; } +if (s->txpos + size > s->c_txmem) { +qemu_log_mask(LOG_GUEST_ERROR, "%s: Packet larger than txmem\n", + TYPE_XILINX_AXI_ENET); +s->txpos = 0; +return size; +} + +if (s->txpos == 0 && eop) { +/* Fast path single fragment. */ +s->txpos = size; +} else { +memcpy(s->txmem + s->txpos, buf, size); +buf = s->txmem; +s->txpos += size; + +if (!eop) { +return size; +} +} + /* Jumbo or vlan sizes ? */ if (!(s->tc & TC_JUM)) { -if (size > 1518 && size <= 1522 && !(s->tc & TC_VLAN)) { +if (s->txpos > 1518 && s->txpos <= 1522 && !(s->tc & TC_VLAN)) { +s->txpos = 0; return size; } } @@ -923,7 +945,7 @@ xilinx_axienet_data_stream_push(StreamSlave *obj, uint8_t *buf, size_t size, uint32_t tmp_csum; uint16_t csum; -tmp_csum = net_checksum_add(size - start_off, +tmp_csum = net_checksum_add(s->txpos - start_off, buf + start_off); /* Accumulate the seed. */ tmp_csum += s->hdr[2] & 0x; @@ -936,12 +958,13 @@ xilinx_axienet_data_stream_push(StreamSlave *obj, uint8_t *buf, size_t size, buf[write_off + 1] = csum & 0xff; } -qemu_send_packet(qemu_get_queue(s->nic), buf, size); +qemu_send_packet(qemu_get_queue(s->nic), buf, s->txpos); -s->stats.tx_bytes += size; +s->stats.tx_bytes += s->txpos; s->regs[R_IS] |= IS_TX_COMPLETE; enet_update_irq(s); +s->txpos = 0; return size; } @@ -989,6 +1012,7 @@ static void xilinx_enet_realize(DeviceState *dev, Error **errp) s->TEMAC.parent = s; s->rxmem = g_malloc(s->c_rxmem); +s->txmem = g_malloc(s->c_txmem); return; xilinx_enet_realize_fail: -- 2.20.1
[PATCH v2 7/9] hw/dma/xilinx_axidma: mm2s: Stream descriptor by descriptor
From: "Edgar E. Iglesias" Stream descriptor by descriptor from memory instead of buffering entire packets before pushing. This enables non-packet streaming clients to work and also lifts the limitation that our internal DMA buffer needs to be able to hold entire packets. Reviewed-by: Alistair Francis Signed-off-by: Edgar E. Iglesias --- hw/dma/xilinx_axidma.c | 31 +-- 1 file changed, 17 insertions(+), 14 deletions(-) diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c index a770e12c96..101d32a965 100644 --- a/hw/dma/xilinx_axidma.c +++ b/hw/dma/xilinx_axidma.c @@ -111,7 +111,6 @@ struct Stream { int nr; struct SDesc desc; -int pos; unsigned int complete_cnt; uint32_t regs[R_MAX]; uint8_t app[20]; @@ -267,7 +266,9 @@ static void stream_process_mem2s(struct Stream *s, StreamSlave *tx_data_dev, StreamSlave *tx_control_dev) { uint32_t prev_d; -unsigned int txlen; +uint32_t txlen; +uint64_t addr; +bool eop; if (!stream_running(s) || stream_idle(s)) { return; @@ -282,24 +283,26 @@ static void stream_process_mem2s(struct Stream *s, StreamSlave *tx_data_dev, } if (stream_desc_sof(&s->desc)) { -s->pos = 0; stream_push(tx_control_dev, s->desc.app, sizeof(s->desc.app), true); } txlen = s->desc.control & SDESC_CTRL_LEN_MASK; -if ((txlen + s->pos) > sizeof s->txbuf) { -hw_error("%s: too small internal txbuf! %d\n", __func__, - txlen + s->pos); -} -address_space_read(&s->dma->as, s->desc.buffer_address, - MEMTXATTRS_UNSPECIFIED, - s->txbuf + s->pos, txlen); -s->pos += txlen; +eop = stream_desc_eof(&s->desc); +addr = s->desc.buffer_address; +while (txlen) { +unsigned int len; + +len = txlen > sizeof s->txbuf ? sizeof s->txbuf : txlen; +address_space_read(&s->dma->as, addr, + MEMTXATTRS_UNSPECIFIED, + s->txbuf, len); +stream_push(tx_data_dev, s->txbuf, len, eop && len == txlen); +txlen -= len; +addr += len; +} -if (stream_desc_eof(&s->desc)) { -stream_push(tx_data_dev, s->txbuf, s->pos, true); -s->pos = 0; +if (eop) { stream_complete(s); } -- 2.20.1
Re: [PATCH v4 03/13] acpi: rtc: use a single crs range
Hi, > > crs = aml_resource_template(); > > aml_append(crs, aml_io(AML_DECODE16, RTC_ISA_BASE, RTC_ISA_BASE, > > - 0x10, 0x02)); > > + 0x10, 0x08)); > > aml_append(crs, aml_irq_no_flags(RTC_ISA_IRQ)); > > -aml_append(crs, aml_io(AML_DECODE16, RTC_ISA_BASE + 2, RTC_ISA_BASE + > > 2, > > - 0x02, 0x06)); > can we just drop the later range as unused? (I don't see where it's actually > initialized) I'd rather follow what physical hardware is doing here for better compatibility ... take care, Gerd
Re: [PATCH 1/2] migration/multifd: fix memleaks in multifd_new_send_channel_async
Pan Nengyuan wrote: > When error happen in multifd_new_send_channel_async, 'sioc' will not be used > to create the multifd_send_thread. Let's free it to avoid a memleak. And also > do error_free after migrate_set_error() to avoid another leak in the same > place. > > The leak stack: > Direct leak of 2880 byte(s) in 8 object(s) allocated from: > #0 0x7f20b5118ae8 in __interceptor_malloc (/lib64/libasan.so.5+0xefae8) > #1 0x7f20b44df1d5 in g_malloc (/lib64/libglib-2.0.so.0+0x531d5) > #2 0x564133bce18b in object_new_with_type > /mnt/sdb/backup/qemu/qom/object.c:683 > #3 0x564133eea950 in qio_channel_socket_new > /mnt/sdb/backup/qemu/io/channel-socket.c:56 > #4 0x5641339cfe4f in socket_send_channel_create > /mnt/sdb/backup/qemu/migration/socket.c:37 > #5 0x564133a10328 in multifd_save_setup > /mnt/sdb/backup/qemu/migration/multifd.c:772 > #6 0x5641339cebed in migrate_fd_connect > /mnt/sdb/backup/qemu/migration/migration.c:3530 > #7 0x5641339d15e4 in migration_channel_connect > /mnt/sdb/backup/qemu/migration/channel.c:92 > #8 0x5641339cf5b7 in socket_outgoing_migration > /mnt/sdb/backup/qemu/migration/socket.c:108 > > Direct leak of 384 byte(s) in 8 object(s) allocated from: > #0 0x7f20b5118cf0 in calloc (/lib64/libasan.so.5+0xefcf0) > #1 0x7f20b44df22d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5322d) > #2 0x56413406fc17 in error_setv /mnt/sdb/backup/qemu/util/error.c:61 > #3 0x564134070464 in error_setg_errno_internal > /mnt/sdb/backup/qemu/util/error.c:109 > #4 0x5641340851be in inet_connect_addr > /mnt/sdb/backup/qemu/util/qemu-sockets.c:379 > #5 0x5641340851be in inet_connect_saddr > /mnt/sdb/backup/qemu/util/qemu-sockets.c:458 > #6 0x5641340870ab in socket_connect > /mnt/sdb/backup/qemu/util/qemu-sockets.c:1105 > #7 0x564133eeaabf in qio_channel_socket_connect_sync > /mnt/sdb/backup/qemu/io/channel-socket.c:145 > #8 0x564133eeabf5 in qio_channel_socket_connect_worker > /mnt/sdb/backup/qemu/io/channel-socket.c:168 > > Indirect leak of 360 byte(s) in 8 object(s) allocated from: > #0 0x7f20b5118ae8 in __interceptor_malloc (/lib64/libasan.so.5+0xefae8) > #1 0x7f20af901817 in __GI___vasprintf_chk (/lib64/libc.so.6+0x10d817) > #2 0x7f20b451fa6c in g_vasprintf (/lib64/libglib-2.0.so.0+0x93a6c) > #3 0x7f20b44f8cd0 in g_strdup_vprintf (/lib64/libglib-2.0.so.0+0x6ccd0) > #4 0x7f20b44f8d8c in g_strdup_printf (/lib64/libglib-2.0.so.0+0x6cd8c) > #5 0x56413406fc86 in error_setv /mnt/sdb/backup/qemu/util/error.c:65 > #6 0x564134070464 in error_setg_errno_internal > /mnt/sdb/backup/qemu/util/error.c:109 > #7 0x5641340851be in inet_connect_addr > /mnt/sdb/backup/qemu/util/qemu-sockets.c:379 > #8 0x5641340851be in inet_connect_saddr > /mnt/sdb/backup/qemu/util/qemu-sockets.c:458 > #9 0x5641340870ab in socket_connect > /mnt/sdb/backup/qemu/util/qemu-sockets.c:1105 > #10 0x564133eeaabf in qio_channel_socket_connect_sync > /mnt/sdb/backup/qemu/io/channel-socket.c:145 > #11 0x564133eeabf5 in qio_channel_socket_connect_worker > /mnt/sdb/backup/qemu/io/channel-socket.c:168 > > Reported-by: Euler Robot > Signed-off-by: Pan Nengyuan Reviewed-by: Juan Quintela I am not sure that this are the only possible error cases, but they are a step on the right direction. Thanks, Juan.
Re: [PATCH 2/2] migration/multifd: Do error_free after migrate_set_error to avoid memleaks
Pan Nengyuan wrote: > When error happen in multifd_send_thread, it use error_copy to set migrate > error in > multifd_send_terminate_threads(). We should call error_free after it. > > Similarly, fix another two places in multifd_recv_thread/multifd_save_cleanup. > > The leak stack: > Direct leak of 48 byte(s) in 1 object(s) allocated from: > #0 0x7f781af07cf0 in calloc (/lib64/libasan.so.5+0xefcf0) > #1 0x7f781a2ce22d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5322d) > #2 0x55ee1d075c17 in error_setv /mnt/sdb/backup/qemu/util/error.c:61 > #3 0x55ee1d076464 in error_setg_errno_internal > /mnt/sdb/backup/qemu/util/error.c:109 > #4 0x55ee1cef066e in qio_channel_socket_writev > /mnt/sdb/backup/qemu/io/channel-socket.c:569 > #5 0x55ee1cee806b in qio_channel_writev > /mnt/sdb/backup/qemu/io/channel.c:207 > #6 0x55ee1cee806b in qio_channel_writev_all > /mnt/sdb/backup/qemu/io/channel.c:171 > #7 0x55ee1cee8248 in qio_channel_write_all > /mnt/sdb/backup/qemu/io/channel.c:257 > #8 0x55ee1ca12c9a in multifd_send_thread > /mnt/sdb/backup/qemu/migration/multifd.c:657 > #9 0x55ee1d0607fc in qemu_thread_start > /mnt/sdb/backup/qemu/util/qemu-thread-posix.c:519 > #10 0x7f78159ae2dd in start_thread (/lib64/libpthread.so.0+0x82dd) > #11 0x7f78156df4b2 in __GI___clone (/lib64/libc.so.6+0xfc4b2) > > Indirect leak of 52 byte(s) in 1 object(s) allocated from: > #0 0x7f781af07f28 in __interceptor_realloc (/lib64/libasan.so.5+0xeff28) > #1 0x7f78156f07d9 in __GI___vasprintf_chk (/lib64/libc.so.6+0x10d7d9) > #2 0x7f781a30ea6c in g_vasprintf (/lib64/libglib-2.0.so.0+0x93a6c) > #3 0x7f781a2e7cd0 in g_strdup_vprintf (/lib64/libglib-2.0.so.0+0x6ccd0) > #4 0x7f781a2e7d8c in g_strdup_printf (/lib64/libglib-2.0.so.0+0x6cd8c) > #5 0x55ee1d075c86 in error_setv /mnt/sdb/backup/qemu/util/error.c:65 > #6 0x55ee1d076464 in error_setg_errno_internal > /mnt/sdb/backup/qemu/util/error.c:109 > #7 0x55ee1cef066e in qio_channel_socket_writev > /mnt/sdb/backup/qemu/io/channel-socket.c:569 > #8 0x55ee1cee806b in qio_channel_writev > /mnt/sdb/backup/qemu/io/channel.c:207 > #9 0x55ee1cee806b in qio_channel_writev_all > /mnt/sdb/backup/qemu/io/channel.c:171 > #10 0x55ee1cee8248 in qio_channel_write_all > /mnt/sdb/backup/qemu/io/channel.c:257 > #11 0x55ee1ca12c9a in multifd_send_thread > /mnt/sdb/backup/qemu/migration/multifd.c:657 > #12 0x55ee1d0607fc in qemu_thread_start > /mnt/sdb/backup/qemu/util/qemu-thread-posix.c:519 > #13 0x7f78159ae2dd in start_thread (/lib64/libpthread.so.0+0x82dd) > #14 0x7f78156df4b2 in __GI___clone (/lib64/libc.so.6+0xfc4b2) > > Reported-by: Euler Robot > Signed-off-by: Pan Nengyuan Reviewed-by: Juan Quintela
Re: [PATCH v4 07/13] acpi: move aml builder code for parallel device
> > +static void parallel_isa_build_aml(ISADevice *isadev, Aml *scope) > > +{ > > +ISAParallelState *isa = ISA_PARALLEL(isadev); > > +int i, uid = 0; > > +Aml *dev; > > +Aml *crs; > > + > > +for (i = 0; i < ARRAY_SIZE(isa_parallel_io); i++) { > > +if (isa->iobase == isa_parallel_io[i]) { > > +uid = i + 1; > > I'm not sure about this check, as we can create a ISA device setting > manually index & iobase. What about using simply "uid = isa->index + 1" > instead? Looking at the code I see isa->index is assigned unconditionally. I misremembered that detail. So, yes, simply using isa->index should work fine even with '-device isa-serial,iobase=". I'll fix it for both serial and parallel. cheers, Gerd
Re: [PATCH] s390x: pv: Fix KVM_PV_PREP_RESET command wrapper name
On Tue, 5 May 2020 08:41:59 -0400 Janosch Frank wrote: > s390_pv_perf_clear_reset() is not a very helpful name since that > function needs to be called for a normal and a clear reset via > diag308. > > Let's instead name it s390_pv_prep_reset() which reflects the purpose > of the function a bit better. > > Signed-off-by: Janosch Frank > --- > hw/s390x/pv.c | 2 +- > hw/s390x/s390-virtio-ccw.c | 2 +- > include/hw/s390x/pv.h | 4 ++-- > 3 files changed, 4 insertions(+), 4 deletions(-) Thanks, applied.
Re: [PATCH v2 3/5] vhost-user-blk: add mechanism to track the guest notifiers init state
On Sun, May 03, 2020 at 09:06:38PM -0400, Raphael Norwitz wrote: > Apologies for mixing up patches last time. This looks good from a > vhost-user-blk perspective, but I worry that some of these changes > could impact other vhost device types. > > I agree with adding notifiers_set to struct vhost_dev, and setting it in > vhost_dev_enable/disable notifiers, but is there any reason notifiers_set > can’t be checked inside vhost-user-blk? Thanks for your review. I also have some concerns about changing current API, but my idea was that these issues will be triggered for all vhost-user/reconnect devices. But maybe you are right and first we should fix vhost-user-blk issues. I'll try to modify patch 2 and 3 in my patchset, so new notifiers_set field will be added, but no API change will be made. Will see how it looks.
Re: [PATCH v2 2/5] vhost: introduce wrappers to set guest notifiers for virtio device
On Sun, May 03, 2020 at 08:36:45PM -0400, Raphael Norwitz wrote: > I’m happy from the vhost, vhost-user-blk and vhost-user-scsi side. For > other device types it looks pretty straightforward, but their maintainers > should probably confirm. > > Since you plan to change the behavior of these helpers in subsequent > patches, maybe consider sending the other device types separately > after the rest of the series has been merged? That way the changes to > individual devices will be much easier to review. Thanks for comments. Agree, will make a more straightforward fix only for vhost-user-blk. After it we can figure out how to propogate this change to other devices. > > On Thu, Apr 30, 2020 at 9:48 AM Dima Stepanov wrote: > > > > Introduce new wrappers to set/reset guest notifiers for the virtio > > device in the vhost device module: > > vhost_dev_assign_guest_notifiers > > ->set_guest_notifiers(..., ..., true); > > vhost_dev_drop_guest_notifiers > > ->set_guest_notifiers(..., ..., false); > > This is a preliminary step to refactor code, so the set_guest_notifiers > > methods could be called based on the vhost device state. > > Update all vhost used devices to use these wrappers instead of direct > > method call. > > > > Signed-off-by: Dima Stepanov > > --- > > backends/cryptodev-vhost.c | 26 +++--- > > backends/vhost-user.c | 16 +--- > > hw/block/vhost-user-blk.c | 15 +-- > > hw/net/vhost_net.c | 30 +- > > hw/scsi/vhost-scsi-common.c | 15 +-- > > hw/virtio/vhost-user-fs.c | 17 +++-- > > hw/virtio/vhost-vsock.c | 18 -- > > hw/virtio/vhost.c | 38 ++ > > hw/virtio/virtio.c | 13 + > > include/hw/virtio/vhost.h | 4 > > include/hw/virtio/virtio.h | 1 + > > 11 files changed, 118 insertions(+), 75 deletions(-) > >
Re: [PATCH v2 1/5] char-socket: return -1 in case of disconnect during tcp_chr_write
Thanks, Feng Li Dima Stepanov 于2020年4月30日周四 下午9:36写道: > > During testing of the vhost-user-blk reconnect functionality the qemu > SIGSEGV was triggered: > start qemu as: > x86_64-softmmu/qemu-system-x86_64 -m 1024M -M q35 \ >-object > memory-backend-file,id=ram-node0,size=1024M,mem-path=/dev/shm/qemu,share=on \ >-numa node,cpus=0,memdev=ram-node0 \ >-chardev socket,id=chardev0,path=./vhost.sock,noserver,reconnect=1 \ >-device vhost-user-blk-pci,chardev=chardev0,num-queues=4 --enable-kvm > start vhost-user-blk daemon: > ./vhost-user-blk -s ./vhost.sock -b test-img.raw > > If vhost-user-blk will be killed during the vhost initialization > process, for instance after getting VHOST_SET_VRING_CALL command, then > QEMU will fail with the following backtrace: > > Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault. > 0x559272bb in vhost_user_read (dev=0x7fffef2d53e0, msg=0x7fffd5b0) > at ./hw/virtio/vhost-user.c:260 > 260 CharBackend *chr = u->user->chr; > > #0 0x559272bb in vhost_user_read (dev=0x7fffef2d53e0, > msg=0x7fffd5b0) > at ./hw/virtio/vhost-user.c:260 > #1 0x5592acb8 in vhost_user_get_config (dev=0x7fffef2d53e0, > config=0x7fffef2d5394 "", config_len=60) > at ./hw/virtio/vhost-user.c:1645 > #2 0x55925525 in vhost_dev_get_config (hdev=0x7fffef2d53e0, > config=0x7fffef2d5394 "", config_len=60) > at ./hw/virtio/vhost.c:1490 > #3 0x558cc46b in vhost_user_blk_device_realize (dev=0x7fffef2d51a0, > errp=0x7fffd8f0) > at ./hw/block/vhost-user-blk.c:429 > #4 0x55920090 in virtio_device_realize (dev=0x7fffef2d51a0, > errp=0x7fffd948) > at ./hw/virtio/virtio.c:3615 > #5 0x55a9779c in device_set_realized (obj=0x7fffef2d51a0, > value=true, errp=0x7fffdb88) > at ./hw/core/qdev.c:891 > ... > > The problem is that vhost_user_write doesn't get an error after > disconnect and try to call vhost_user_read(). The tcp_chr_write() > routine should return -1 in case of disconnect. Indicate the EIO error > if this routine is called in the disconnected state. > > Signed-off-by: Dima Stepanov > --- > chardev/char-socket.c | 8 +--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/chardev/char-socket.c b/chardev/char-socket.c > index 185fe38..c128cca 100644 > --- a/chardev/char-socket.c > +++ b/chardev/char-socket.c > @@ -175,14 +175,16 @@ static int tcp_chr_write(Chardev *chr, const uint8_t > *buf, int len) > if (ret < 0 && errno != EAGAIN) { > if (tcp_chr_read_poll(chr) <= 0) { > tcp_chr_disconnect_locked(chr); > -return len; > +/* Return an error since we made a disconnect. */ > +return ret; The `return` statement could be deleted. The outside has a return statement. > } /* else let the read handler finish it properly */ > } > > return ret; > } else { > -/* XXX: indicate an error ? */ > -return len; > +/* Indicate an error. */ > +errno = EIO; > +return -1; > } > } > > -- > 2.7.4 >
[PATCH 0/8] drop unallocated_blocks_are_zero
Hi all! This is first step to block-status refactoring, and solves most simple problem mentioned in my investigation of block-status described in the thread "backing chain & block status & filters": https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg04706.html unallocated_blocks_are_zero doesn't simplify all the logic about block-status, and happily it's not needed, as shown by the following patches. So, let's get rid of it. Vladimir Sementsov-Ogievskiy (8): block/vdi: return ZERO block-status when appropriate block/vpc: return ZERO block-status when appropriate block/crypto: drop unallocated_blocks_are_zero block/iscsi: drop unallocated_blocks_are_zero block/file-posix: drop unallocated_blocks_are_zero block/vhdx: drop unallocated_blocks_are_zero qemu-img: convert: don't use unallocated_blocks_are_zero block: drop unallocated_blocks_are_zero include/block/block.h | 6 -- include/block/block_int.h | 13 - block.c | 15 --- block/crypto.c| 1 - block/file-posix.c| 3 --- block/io.c| 8 block/iscsi.c | 1 - block/qcow2.c | 1 - block/qed.c | 1 - block/vdi.c | 3 +-- block/vhdx.c | 3 --- block/vpc.c | 3 +-- qemu-img.c| 4 +--- 13 files changed, 19 insertions(+), 43 deletions(-) -- 2.21.0
[PATCH 2/8] block/vpc: return ZERO block-status when appropriate
In case when get_image_offset() returns -1, we do zero out the corresponding chunk of qiov. So, this should be reported as ZERO. After block-status update, it never reports 0, so setting unallocated_blocks_are_zero doesn't make sense. Drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/vpc.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/block/vpc.c b/block/vpc.c index 2d1eade146..555f9d8587 100644 --- a/block/vpc.c +++ b/block/vpc.c @@ -606,7 +606,6 @@ static int vpc_get_info(BlockDriverState *bs, BlockDriverInfo *bdi) bdi->cluster_size = s->block_size; } -bdi->unallocated_blocks_are_zero = true; return 0; } @@ -745,7 +744,7 @@ static int coroutine_fn vpc_co_block_status(BlockDriverState *bs, image_offset = get_image_offset(bs, offset, false, NULL); allocated = (image_offset != -1); *pnum = 0; -ret = 0; +ret = BDRV_BLOCK_ZERO; do { /* All sectors in a block are contiguous (without using the bitmap) */ -- 2.21.0
[PATCH 5/8] block/file-posix: drop unallocated_blocks_are_zero
raw_co_block_status() in block/file-posix.c never returns 0, so unallocated_blocks_are_zero is useless. Drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/file-posix.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/block/file-posix.c b/block/file-posix.c index 05e094be29..5c01735108 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -2878,9 +2878,6 @@ static int coroutine_fn raw_co_pwrite_zeroes( static int raw_get_info(BlockDriverState *bs, BlockDriverInfo *bdi) { -BDRVRawState *s = bs->opaque; - -bdi->unallocated_blocks_are_zero = s->discard_zeroes; return 0; } -- 2.21.0
[PATCH 4/8] block/iscsi: drop unallocated_blocks_are_zero
We set bdi->unallocated_blocks_are_zero = iscsilun->lbprz, but iscsi_co_block_status doesn't return 0 in case of iscsilun->lbprz, it returns ZERO when appropriate. So actually unallocated_blocks_are_zero is useless. Drop it now. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/iscsi.c | 1 - 1 file changed, 1 deletion(-) diff --git a/block/iscsi.c b/block/iscsi.c index a8b76979d8..767e3e75fd 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -2163,7 +2163,6 @@ static int coroutine_fn iscsi_co_truncate(BlockDriverState *bs, int64_t offset, static int iscsi_get_info(BlockDriverState *bs, BlockDriverInfo *bdi) { IscsiLun *iscsilun = bs->opaque; -bdi->unallocated_blocks_are_zero = iscsilun->lbprz; bdi->cluster_size = iscsilun->cluster_size; return 0; } -- 2.21.0
[PATCH 1/8] block/vdi: return ZERO block-status when appropriate
In case of !VDI_IS_ALLOCATED[], we do zero out the corresponding chunk of qiov. So, this should be reported as ZERO. After block-status update, it never reports 0, so setting unallocated_blocks_are_zero doesn't make sense. Drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/vdi.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/block/vdi.c b/block/vdi.c index 0c7835ae70..83471528d2 100644 --- a/block/vdi.c +++ b/block/vdi.c @@ -334,7 +334,6 @@ static int vdi_get_info(BlockDriverState *bs, BlockDriverInfo *bdi) logout("\n"); bdi->cluster_size = s->block_size; bdi->vm_state_offset = 0; -bdi->unallocated_blocks_are_zero = true; return 0; } @@ -536,7 +535,7 @@ static int coroutine_fn vdi_co_block_status(BlockDriverState *bs, *pnum = MIN(s->block_size - index_in_block, bytes); result = VDI_IS_ALLOCATED(bmap_entry); if (!result) { -return 0; +return BDRV_BLOCK_ZERO; } *map = s->header.offset_data + (uint64_t)bmap_entry * s->block_size + -- 2.21.0
[PATCH 8/8] block: drop unallocated_blocks_are_zero
Currently this field only set by qed and qcow2. But in fact, all backing-supporting formats (parallels, qcow, qcow2, qed, vmdk) share this semantics: on unallocated blocks, if there is no backing file they just memset the buffer with zeroes. So, document this behavior for .supports_backing and drop .unallocated_blocks_are_zero Signed-off-by: Vladimir Sementsov-Ogievskiy --- include/block/block.h | 6 -- include/block/block_int.h | 13 - block.c | 15 --- block/io.c| 8 block/qcow2.c | 1 - block/qed.c | 1 - 6 files changed, 16 insertions(+), 28 deletions(-) diff --git a/include/block/block.h b/include/block/block.h index 8b62429aa4..db1cb503ec 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -21,11 +21,6 @@ typedef struct BlockDriverInfo { /* offset at which the VM state can be saved (0 if not possible) */ int64_t vm_state_offset; bool is_dirty; -/* - * True if unallocated blocks read back as zeroes. This is equivalent - * to the LBPRZ flag in the SCSI logical block provisioning page. - */ -bool unallocated_blocks_are_zero; /* * True if this block driver only supports compressed writes */ @@ -431,7 +426,6 @@ int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes); int bdrv_has_zero_init_1(BlockDriverState *bs); int bdrv_has_zero_init(BlockDriverState *bs); int bdrv_has_zero_init_truncate(BlockDriverState *bs); -bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs); bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs); int bdrv_block_status(BlockDriverState *bs, int64_t offset, int64_t bytes, int64_t *pnum, int64_t *map, diff --git a/include/block/block_int.h b/include/block/block_int.h index 92335f33c7..c156b22c6b 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -115,7 +115,18 @@ struct BlockDriver { */ bool bdrv_needs_filename; -/* Set if a driver can support backing files */ +/* + * Set if a driver can support backing files. This also implies the + * following semantics: + * + * - Return status 0 of .bdrv_co_block_status means that corresponding + *blocks are not allocated in this layer of backing-chain + * - For such (unallocated) blocks, read will: + *- read from backing file if there is one and big enough + *- fill buffer with zeroes if there is no backing file + *- space after EOF of the backing file considered as zero + * (corresponding part of read-buffer must be zeroed by driver) + */ bool supports_backing; /* For handling image reopen for split or non-split files */ diff --git a/block.c b/block.c index cf5c19b1db..0283fdecbc 100644 --- a/block.c +++ b/block.c @@ -5305,21 +5305,6 @@ int bdrv_has_zero_init_truncate(BlockDriverState *bs) return 0; } -bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs) -{ -BlockDriverInfo bdi; - -if (bs->backing) { -return false; -} - -if (bdrv_get_info(bs, &bdi) == 0) { -return bdi.unallocated_blocks_are_zero; -} - -return false; -} - bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs) { if (!(bs->open_flags & BDRV_O_UNMAP)) { diff --git a/block/io.c b/block/io.c index a4f9714230..484adec5a1 100644 --- a/block/io.c +++ b/block/io.c @@ -2385,16 +2385,16 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs, if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) { ret |= BDRV_BLOCK_ALLOCATED; -} else if (want_zero) { -if (bdrv_unallocated_blocks_are_zero(bs)) { -ret |= BDRV_BLOCK_ZERO; -} else if (bs->backing) { +} else if (want_zero && bs->drv->supports_backing) { +if (bs->backing) { BlockDriverState *bs2 = bs->backing->bs; int64_t size2 = bdrv_getlength(bs2); if (size2 >= 0 && offset >= size2) { ret |= BDRV_BLOCK_ZERO; } +} else { +ret |= BDRV_BLOCK_ZERO; } } diff --git a/block/qcow2.c b/block/qcow2.c index 2ba0b17c39..dc3c0aac2b 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -4858,7 +4858,6 @@ err: static int qcow2_get_info(BlockDriverState *bs, BlockDriverInfo *bdi) { BDRVQcow2State *s = bs->opaque; -bdi->unallocated_blocks_are_zero = true; bdi->cluster_size = s->cluster_size; bdi->vm_state_offset = qcow2_vm_state_offset(s); return 0; diff --git a/block/qed.c b/block/qed.c index b0fdb8f565..fb7833dc8b 100644 --- a/block/qed.c +++ b/block/qed.c @@ -1514,7 +1514,6 @@ static int bdrv_qed_get_info(BlockDriverState *bs, BlockDriverInfo *bdi) memset(bdi, 0, sizeof(*bdi)); bdi->cluster_size = s->header.cluster_size; bdi->is_dirty = s->header.features & QED_F_NEED_CHECK; -bdi->unallocated_blocks_are_zero = true; return 0;
[PATCH 3/8] block/crypto: drop unallocated_blocks_are_zero
It's false by default, no needs to set it. We are going to drop this variable at all, so drop it now here, it doesn't hurt. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/crypto.c | 1 - 1 file changed, 1 deletion(-) diff --git a/block/crypto.c b/block/crypto.c index e02f343590..7685e61844 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -694,7 +694,6 @@ static int block_crypto_get_info_luks(BlockDriverState *bs, return ret; } -bdi->unallocated_blocks_are_zero = false; bdi->cluster_size = subbdi.cluster_size; return 0; -- 2.21.0
[PATCH 6/8] block/vhdx: drop unallocated_blocks_are_zero
vhdx doesn't have .bdrv_co_block_status handler, so DATA|ALLOCATED is always assumed for it. unallocated_blocks_are_zero is useless, drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/vhdx.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/block/vhdx.c b/block/vhdx.c index aedd782604..45963a3166 100644 --- a/block/vhdx.c +++ b/block/vhdx.c @@ -1164,9 +1164,6 @@ static int vhdx_get_info(BlockDriverState *bs, BlockDriverInfo *bdi) bdi->cluster_size = s->block_size; -bdi->unallocated_blocks_are_zero = -(s->params.data_bits & VHDX_PARAMS_HAS_PARENT) == 0; - return 0; } -- 2.21.0
[PATCH 7/8] qemu-img: convert: don't use unallocated_blocks_are_zero
qemu-img convert wants to distinguish ZERO which comes from short backing files. unallocated_blocks_are_zero field of bdi is unrelated: space after EOF is always considered to be zero anyway. So, just make post_backing_zero true in case of short backing file. Signed-off-by: Vladimir Sementsov-Ogievskiy --- qemu-img.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index 6a4327aaba..4fe751878b 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -1632,7 +1632,6 @@ typedef struct ImgConvertState { BlockBackend *target; bool has_zero_init; bool compressed; -bool unallocated_blocks_are_zero; bool target_is_new; bool target_has_backing; int64_t target_backing_sectors; /* negative if unknown */ @@ -1677,7 +1676,7 @@ static int convert_iteration_sectors(ImgConvertState *s, int64_t sector_num) if (s->target_backing_sectors >= 0) { if (sector_num >= s->target_backing_sectors) { -post_backing_zero = s->unallocated_blocks_are_zero; +post_backing_zero = true; } else if (sector_num + n > s->target_backing_sectors) { /* Split requests around target_backing_sectors (because * starting from there, zeros are handled differently) */ @@ -2580,7 +2579,6 @@ static int img_convert(int argc, char **argv) } else { s.compressed = s.compressed || bdi.needs_compressed_writes; s.cluster_sectors = bdi.cluster_size / BDRV_SECTOR_SIZE; -s.unallocated_blocks_are_zero = bdi.unallocated_blocks_are_zero; } ret = convert_do_copy(&s); -- 2.21.0
Re: [PATCH v2 1/5] char-socket: return -1 in case of disconnect during tcp_chr_write
On Thu, Apr 30, 2020 at 3:37 PM Dima Stepanov wrote: > > During testing of the vhost-user-blk reconnect functionality the qemu > SIGSEGV was triggered: > start qemu as: > x86_64-softmmu/qemu-system-x86_64 -m 1024M -M q35 \ >-object > memory-backend-file,id=ram-node0,size=1024M,mem-path=/dev/shm/qemu,share=on \ >-numa node,cpus=0,memdev=ram-node0 \ >-chardev socket,id=chardev0,path=./vhost.sock,noserver,reconnect=1 \ >-device vhost-user-blk-pci,chardev=chardev0,num-queues=4 --enable-kvm > start vhost-user-blk daemon: > ./vhost-user-blk -s ./vhost.sock -b test-img.raw > > If vhost-user-blk will be killed during the vhost initialization > process, for instance after getting VHOST_SET_VRING_CALL command, then > QEMU will fail with the following backtrace: > > Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault. > 0x559272bb in vhost_user_read (dev=0x7fffef2d53e0, msg=0x7fffd5b0) > at ./hw/virtio/vhost-user.c:260 > 260 CharBackend *chr = u->user->chr; > > #0 0x559272bb in vhost_user_read (dev=0x7fffef2d53e0, > msg=0x7fffd5b0) > at ./hw/virtio/vhost-user.c:260 > #1 0x5592acb8 in vhost_user_get_config (dev=0x7fffef2d53e0, > config=0x7fffef2d5394 "", config_len=60) > at ./hw/virtio/vhost-user.c:1645 > #2 0x55925525 in vhost_dev_get_config (hdev=0x7fffef2d53e0, > config=0x7fffef2d5394 "", config_len=60) > at ./hw/virtio/vhost.c:1490 > #3 0x558cc46b in vhost_user_blk_device_realize (dev=0x7fffef2d51a0, > errp=0x7fffd8f0) > at ./hw/block/vhost-user-blk.c:429 > #4 0x55920090 in virtio_device_realize (dev=0x7fffef2d51a0, > errp=0x7fffd948) > at ./hw/virtio/virtio.c:3615 > #5 0x55a9779c in device_set_realized (obj=0x7fffef2d51a0, > value=true, errp=0x7fffdb88) > at ./hw/core/qdev.c:891 > ... > > The problem is that vhost_user_write doesn't get an error after > disconnect and try to call vhost_user_read(). The tcp_chr_write() > routine should return -1 in case of disconnect. Indicate the EIO error > if this routine is called in the disconnected state. > > Signed-off-by: Dima Stepanov Reviewed-by: Marc-André Lureau > --- > chardev/char-socket.c | 8 +--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/chardev/char-socket.c b/chardev/char-socket.c > index 185fe38..c128cca 100644 > --- a/chardev/char-socket.c > +++ b/chardev/char-socket.c > @@ -175,14 +175,16 @@ static int tcp_chr_write(Chardev *chr, const uint8_t > *buf, int len) > if (ret < 0 && errno != EAGAIN) { > if (tcp_chr_read_poll(chr) <= 0) { > tcp_chr_disconnect_locked(chr); > -return len; > +/* Return an error since we made a disconnect. */ > +return ret; > } /* else let the read handler finish it properly */ > } > > return ret; > } else { > -/* XXX: indicate an error ? */ > -return len; > +/* Indicate an error. */ > +errno = EIO; > +return -1; > } > } > > -- > 2.7.4 >
[PATCH v1 01/17] exec: Introduce ram_block_discard_set_(unreliable|required)()
We want to replace qemu_balloon_inhibit() by something more generic. Especially, we want to make sure that technologies that really rely on RAM block discards to work reliably to run mutual exclusive with technologies that break it. E.g., vfio will usually pin all guest memory, turning the virtio-balloon basically useless and make the VM consume more memory than reported via the balloon. While the balloon is special already (=> no guarantees, same behavior possible afer reboots and with huge pages), this will be different, especially, with virtio-mem. Let's implement a way such that we can make both types of technology run mutually exclusive. We'll convert existing balloon inhibitors in successive patches and add some new ones. Add the check to qemu_balloon_is_inhibited() for now. We might want to make virtio-balloon an acutal inhibitor in the future - however, that requires more thought to not break existing setups. Cc: "Michael S. Tsirkin" Cc: Richard Henderson Cc: Paolo Bonzini Signed-off-by: David Hildenbrand --- balloon.c | 3 ++- exec.c| 48 +++ include/exec/memory.h | 41 3 files changed, 91 insertions(+), 1 deletion(-) diff --git a/balloon.c b/balloon.c index f104b42961..c49f57c27b 100644 --- a/balloon.c +++ b/balloon.c @@ -40,7 +40,8 @@ static int balloon_inhibit_count; bool qemu_balloon_is_inhibited(void) { -return atomic_read(&balloon_inhibit_count) > 0; +return atomic_read(&balloon_inhibit_count) > 0 || + ram_block_discard_is_broken(); } void qemu_balloon_inhibit(bool state) diff --git a/exec.c b/exec.c index 2874bb5088..52a6e40e99 100644 --- a/exec.c +++ b/exec.c @@ -4049,4 +4049,52 @@ void mtree_print_dispatch(AddressSpaceDispatch *d, MemoryRegion *root) } } +static int ram_block_discard_broken; + +int ram_block_discard_set_broken(bool state) +{ +int old; + +if (!state) { +atomic_dec(&ram_block_discard_broken); +return 0; +} + +do { +old = atomic_read(&ram_block_discard_broken); +if (old < 0) { +return -EBUSY; +} +} while (atomic_cmpxchg(&ram_block_discard_broken, old, old + 1) != old); +return 0; +} + +int ram_block_discard_set_required(bool state) +{ +int old; + +if (!state) { +atomic_inc(&ram_block_discard_broken); +return 0; +} + +do { +old = atomic_read(&ram_block_discard_broken); +if (old > 0) { +return -EBUSY; +} +} while (atomic_cmpxchg(&ram_block_discard_broken, old, old - 1) != old); +return 0; +} + +bool ram_block_discard_is_broken(void) +{ +return atomic_read(&ram_block_discard_broken) > 0; +} + +bool ram_block_discard_is_required(void) +{ +return atomic_read(&ram_block_discard_broken) < 0; +} + #endif diff --git a/include/exec/memory.h b/include/exec/memory.h index e000bd2f97..9bb5ced38d 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -2463,6 +2463,47 @@ static inline MemOp devend_memop(enum device_endian end) } #endif +/* + * Inhibit technologies that rely on discarding of parts of RAM blocks to work + * reliably, e.g., to manage the actual amount of memory consumed by the VM + * (then, the memory provided by RAM blocks might be bigger than the desired + * memory consumption). This *must* be set if: + * - Discarding parts of a RAM blocks does not result in the change being + * reflected in the VM and the pages getting freed. + * - All memory in RAM blocks is pinned or duplicated, invaldiating any previous + * discards blindly. + * - Discarding parts of a RAM blocks will result in integrity issues (e.g., + * encrypted VMs). + * Technologies that only temporarily pin the current working set of a + * driver are fine, because we don't expect such pages to be discarded + * (esp. based on guest action like balloon inflation). + * + * This is *not* to be used to protect from concurrent discards (esp., + * postcopy). + * + * Returns 0 if successful. Returns -EBUSY if a technology that relies on + * discards to work reliably is active. + */ +int ram_block_discard_set_broken(bool state); + +/* + * Inhibit technologies that will break discarding of pages in RAM blocks. + * + * Returns 0 if successful. Returns -EBUSY if discards are already set to + * broken. + */ +int ram_block_discard_set_required(bool state); + +/* + * Test if discarding of memory in ram blocks is broken. + */ +bool ram_block_discard_is_broken(void); + +/* + * Test if discarding of memory in ram blocks is required to work reliably. + */ +bool ram_block_discard_is_required(void); + #endif #endif -- 2.25.3
[PATCH v1 03/17] accel/kvm: Convert to ram_block_discard_set_broken()
Discarding memory does not work as expected. At the time this is called, we cannot have anyone active that relies on discards to work properly. Cc: Paolo Bonzini Signed-off-by: David Hildenbrand --- accel/kvm/kvm-all.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 439a4efe52..33421184ac 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -40,7 +40,6 @@ #include "trace.h" #include "hw/irq.h" #include "sysemu/sev.h" -#include "sysemu/balloon.h" #include "qapi/visitor.h" #include "qapi/qapi-types-common.h" #include "qapi/qapi-visit-common.h" @@ -2107,7 +2106,7 @@ static int kvm_init(MachineState *ms) s->sync_mmu = !!kvm_vm_check_extension(kvm_state, KVM_CAP_SYNC_MMU); if (!s->sync_mmu) { -qemu_balloon_inhibit(true); +g_assert(ram_block_discard_set_broken(true)); } return 0; -- 2.25.3
[PATCH v1 02/17] vfio: Convert to ram_block_discard_set_broken()
VFIO is (except devices without a physical IOMMU or some mediated devices) incompatible ram_block_discard_set_broken. The kernel will pin basically all VM memory. Let's convert to ram_block_discard_set_broke(), which can now fail, in contrast to qemu_balloon_inhibit(). Leave "x-balloon-allowed" named as it is for now. Cc: Cornelia Huck Cc: Alex Williamson Cc: Christian Borntraeger Cc: Tony Krowiak Cc: Halil Pasic Cc: Pierre Morel Cc: Eric Farman Signed-off-by: David Hildenbrand --- hw/vfio/ap.c | 10 +++ hw/vfio/ccw.c | 11 hw/vfio/common.c | 53 +++ hw/vfio/pci.c | 6 ++-- include/hw/vfio/vfio-common.h | 4 +-- 5 files changed, 45 insertions(+), 39 deletions(-) diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c index 8649ac15f9..b51546d67a 100644 --- a/hw/vfio/ap.c +++ b/hw/vfio/ap.c @@ -105,12 +105,12 @@ static void vfio_ap_realize(DeviceState *dev, Error **errp) vapdev->vdev.dev = dev; /* - * vfio-ap devices operate in a way compatible with - * memory ballooning, as no pages are pinned in the host. - * This needs to be set before vfio_get_device() for vfio common to - * handle the balloon inhibitor. + * vfio-ap devices operate in a way compatible discarding of memory in + * RAM blocks, as no pages are pinned in the host. This needs to be + * set before vfio_get_device() for vfio common to handle + * ram_block_discard_set_broken(). */ -vapdev->vdev.balloon_allowed = true; +vapdev->vdev.ram_block_discard_allowed = true; ret = vfio_get_device(vfio_group, mdevid, &vapdev->vdev, errp); if (ret) { diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c index 50cc2ec75c..0dd6c3f2ab 100644 --- a/hw/vfio/ccw.c +++ b/hw/vfio/ccw.c @@ -425,12 +425,13 @@ static void vfio_ccw_get_device(VFIOGroup *group, VFIOCCWDevice *vcdev, /* * All vfio-ccw devices are believed to operate in a way compatible with - * memory ballooning, ie. pages pinned in the host are in the current - * working set of the guest driver and therefore never overlap with pages - * available to the guest balloon driver. This needs to be set before - * vfio_get_device() for vfio common to handle the balloon inhibitor. + * discarding of memory in RAM blocks, ie. pages pinned in the host are + * in the current working set of the guest driver and therefore never + * overlap e.g., with pages available to the guest balloon driver. This + * needs to be set before vfio_get_device() for vfio common to handle + * ram_block_discard_set_broken(). */ -vcdev->vdev.balloon_allowed = true; +vcdev->vdev.ram_block_discard_allowed = true; if (vfio_get_device(group, vcdev->cdev.mdevid, &vcdev->vdev, errp)) { goto out_err; diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 0b3593b3c0..98b2573ae6 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -33,7 +33,6 @@ #include "qemu/error-report.h" #include "qemu/main-loop.h" #include "qemu/range.h" -#include "sysemu/balloon.h" #include "sysemu/kvm.h" #include "sysemu/reset.h" #include "trace.h" @@ -1215,31 +1214,36 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as, space = vfio_get_address_space(as); /* - * VFIO is currently incompatible with memory ballooning insofar as the + * VFIO is currently incompatible with discarding of RAM insofar as the * madvise to purge (zap) the page from QEMU's address space does not * interact with the memory API and therefore leaves stale virtual to * physical mappings in the IOMMU if the page was previously pinned. We - * therefore add a balloon inhibit for each group added to a container, + * therefore set discarding broken for each group added to a container, * whether the container is used individually or shared. This provides * us with options to allow devices within a group to opt-in and allow - * ballooning, so long as it is done consistently for a group (for instance + * discarding, so long as it is done consistently for a group (for instance * if the device is an mdev device where it is known that the host vendor * driver will never pin pages outside of the working set of the guest - * driver, which would thus not be ballooning candidates). + * driver, which would thus not be discarding candidates). * * The first opportunity to induce pinning occurs here where we attempt to * attach the group to existing containers within the AddressSpace. If any - * pages are already zapped from the virtual address space, such as from a - * previous ballooning opt-in, new pinning will cause valid mappings to be + * pages are already zapped from the virtual address space, such as from + * previous discards, new pinning will cause valid mappings to be * re-established. Likewise, when the overal
[PATCH v1 00/17] virtio-mem: Paravirtualized memory hot(un)plug
This is the very basic, initial version of virtio-mem. More info on virtio-mem in general can be found in the Linux kernel driver posting [1] and in patch #10. "The basic idea of virtio-mem is to provide a flexible, cross-architecture memory hot(un)plug solution that avoids many limitations imposed by existing technologies, architectures, and interfaces." There are a lot of addons in the works (esp. protection of unplugged memory, better hugepage support (esp. when reading unplugged memory), resizeable memory backends, migration optimizations, support for more architectures, ...), this is the very basic version to get the ball rolling. The first 8 patches make sure we don't have any sudden surprises e.g., if somebody tries to pin all memory in RAM blocks, resulting in a higher memory consumption than desired. The remaining patches add basic virtio-mem along with support for x86-64. [1] https://lkml.kernel.org/r/20200311171422.10484-1-da...@redhat.com David Hildenbrand (17): exec: Introduce ram_block_discard_set_(unreliable|required)() vfio: Convert to ram_block_discard_set_broken() accel/kvm: Convert to ram_block_discard_set_broken() s390x/pv: Convert to ram_block_discard_set_broken() virtio-balloon: Rip out qemu_balloon_inhibit() target/i386: sev: Use ram_block_discard_set_broken() migration/rdma: Use ram_block_discard_set_broken() migration/colo: Use ram_block_discard_set_broken() linux-headers: update to contain virtio-mem virtio-mem: Paravirtualized memory hot(un)plug virtio-pci: Proxy for virtio-mem MAINTAINERS: Add myself as virtio-mem maintainer hmp: Handle virtio-mem when printing memory device info numa: Handle virtio-mem in NUMA stats pc: Support for virtio-mem-pci virtio-mem: Allow notifiers for size changes virtio-pci: Send qapi events when the virtio-mem size changes MAINTAINERS | 8 + accel/kvm/kvm-all.c | 3 +- balloon.c | 17 - exec.c | 48 ++ hw/core/numa.c | 6 + hw/i386/Kconfig | 1 + hw/i386/pc.c| 49 +- hw/s390x/s390-virtio-ccw.c | 22 +- hw/vfio/ap.c| 10 +- hw/vfio/ccw.c | 11 +- hw/vfio/common.c| 53 +- hw/vfio/pci.c | 6 +- hw/virtio/Kconfig | 11 + hw/virtio/Makefile.objs | 2 + hw/virtio/virtio-balloon.c | 12 +- hw/virtio/virtio-mem-pci.c | 159 hw/virtio/virtio-mem-pci.h | 34 + hw/virtio/virtio-mem.c | 781 include/exec/memory.h | 41 + include/hw/pci/pci.h| 1 + include/hw/vfio/vfio-common.h | 4 +- include/hw/virtio/virtio-mem.h | 85 +++ include/migration/colo.h| 2 +- include/standard-headers/linux/virtio_ids.h | 1 + include/standard-headers/linux/virtio_mem.h | 208 ++ include/sysemu/balloon.h| 2 - migration/migration.c | 8 +- migration/postcopy-ram.c| 23 - migration/rdma.c| 18 +- migration/savevm.c | 11 +- monitor/hmp-cmds.c | 16 + monitor/monitor.c | 1 + qapi/misc.json | 64 +- target/i386/sev.c | 1 + 34 files changed, 1598 insertions(+), 121 deletions(-) create mode 100644 hw/virtio/virtio-mem-pci.c create mode 100644 hw/virtio/virtio-mem-pci.h create mode 100644 hw/virtio/virtio-mem.c create mode 100644 include/hw/virtio/virtio-mem.h create mode 100644 include/standard-headers/linux/virtio_mem.h -- 2.25.3
[PATCH v1 15/17] pc: Support for virtio-mem-pci
Let's wire it up similar to virtio-pmem. Also disallow unplug, so it's harder for users to shoot themselves into the foot. Cc: "Michael S. Tsirkin" Cc: Marcel Apfelbaum Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Cc: Eric Blake Cc: Markus Armbruster Signed-off-by: David Hildenbrand --- hw/i386/Kconfig | 1 + hw/i386/pc.c| 49 - 2 files changed, 29 insertions(+), 21 deletions(-) diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig index c93f32f657..03e347b207 100644 --- a/hw/i386/Kconfig +++ b/hw/i386/Kconfig @@ -35,6 +35,7 @@ config PC select ACPI_PCI select ACPI_VMGENID select VIRTIO_PMEM_SUPPORTED +select VIRTIO_MEM_SUPPORTED config PC_PCI bool diff --git a/hw/i386/pc.c b/hw/i386/pc.c index f6b8431c8b..588804f895 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -86,6 +86,7 @@ #include "hw/net/ne2000-isa.h" #include "standard-headers/asm-x86/bootparam.h" #include "hw/virtio/virtio-pmem-pci.h" +#include "hw/virtio/virtio-mem-pci.h" #include "hw/mem/memory-device.h" #include "sysemu/replay.h" #include "qapi/qmp/qerror.h" @@ -1654,8 +1655,8 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev, numa_cpu_pre_plug(cpu_slot, dev, errp); } -static void pc_virtio_pmem_pci_pre_plug(HotplugHandler *hotplug_dev, -DeviceState *dev, Error **errp) +static void pc_virtio_md_pci_pre_plug(HotplugHandler *hotplug_dev, + DeviceState *dev, Error **errp) { HotplugHandler *hotplug_dev2 = qdev_get_bus_hotplug_handler(dev); Error *local_err = NULL; @@ -1666,7 +1667,8 @@ static void pc_virtio_pmem_pci_pre_plug(HotplugHandler *hotplug_dev, * order. This should never be the case on x86, however better add * a safety net. */ -error_setg(errp, "virtio-pmem-pci not supported on this bus."); +error_setg(errp, + "virtio based memory devices not supported on this bus."); return; } /* @@ -1681,8 +1683,8 @@ static void pc_virtio_pmem_pci_pre_plug(HotplugHandler *hotplug_dev, error_propagate(errp, local_err); } -static void pc_virtio_pmem_pci_plug(HotplugHandler *hotplug_dev, -DeviceState *dev, Error **errp) +static void pc_virtio_md_pci_plug(HotplugHandler *hotplug_dev, + DeviceState *dev, Error **errp) { HotplugHandler *hotplug_dev2 = qdev_get_bus_hotplug_handler(dev); Error *local_err = NULL; @@ -1700,17 +1702,17 @@ static void pc_virtio_pmem_pci_plug(HotplugHandler *hotplug_dev, error_propagate(errp, local_err); } -static void pc_virtio_pmem_pci_unplug_request(HotplugHandler *hotplug_dev, - DeviceState *dev, Error **errp) +static void pc_virtio_md_pci_unplug_request(HotplugHandler *hotplug_dev, +DeviceState *dev, Error **errp) { -/* We don't support virtio pmem hot unplug */ -error_setg(errp, "virtio pmem device unplug not supported."); +/* We don't support hot unplug of virtio based memory devices */ +error_setg(errp, "virtio based memory devices cannot be unplugged."); } -static void pc_virtio_pmem_pci_unplug(HotplugHandler *hotplug_dev, - DeviceState *dev, Error **errp) +static void pc_virtio_md_pci_unplug(HotplugHandler *hotplug_dev, +DeviceState *dev, Error **errp) { -/* We don't support virtio pmem hot unplug */ +/* We don't support hot unplug of virtio based memory devices */ } static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev, @@ -1720,8 +1722,9 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev, pc_memory_pre_plug(hotplug_dev, dev, errp); } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) { pc_cpu_pre_plug(hotplug_dev, dev, errp); -} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) { -pc_virtio_pmem_pci_pre_plug(hotplug_dev, dev, errp); +} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) || + object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) { +pc_virtio_md_pci_pre_plug(hotplug_dev, dev, errp); } } @@ -1732,8 +1735,9 @@ static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev, pc_memory_plug(hotplug_dev, dev, errp); } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) { pc_cpu_plug(hotplug_dev, dev, errp); -} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) { -pc_virtio_pmem_pci_plug(hotplug_dev, dev, errp); +} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) || + object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) { +pc_virtio_md_pci_plug(hotplug_dev, dev, errp); } } @@ -1744,8 +1748,9 @@ sta
[PATCH v1 09/17] linux-headers: update to contain virtio-mem
To be merged hopefully soon. Then, we can replace this by a proper header sync. Cc: "Michael S. Tsirkin" Signed-off-by: David Hildenbrand --- include/standard-headers/linux/virtio_ids.h | 1 + include/standard-headers/linux/virtio_mem.h | 208 2 files changed, 209 insertions(+) create mode 100644 include/standard-headers/linux/virtio_mem.h diff --git a/include/standard-headers/linux/virtio_ids.h b/include/standard-headers/linux/virtio_ids.h index ecc27a1740..b052355ac7 100644 --- a/include/standard-headers/linux/virtio_ids.h +++ b/include/standard-headers/linux/virtio_ids.h @@ -44,6 +44,7 @@ #define VIRTIO_ID_VSOCK19 /* virtio vsock transport */ #define VIRTIO_ID_CRYPTO 20 /* virtio crypto */ #define VIRTIO_ID_IOMMU23 /* virtio IOMMU */ +#define VIRTIO_ID_MEM 24 /* virtio mem */ #define VIRTIO_ID_FS 26 /* virtio filesystem */ #define VIRTIO_ID_PMEM 27 /* virtio pmem */ #define VIRTIO_ID_MAC80211_HWSIM 29 /* virtio mac80211-hwsim */ diff --git a/include/standard-headers/linux/virtio_mem.h b/include/standard-headers/linux/virtio_mem.h new file mode 100644 index 00..c28dd40870 --- /dev/null +++ b/include/standard-headers/linux/virtio_mem.h @@ -0,0 +1,208 @@ +/* SPDX-License-Identifier: BSD-3-Clause */ +/* + * Virtio Mem Device + * + * Copyright Red Hat, Inc. 2020 + * + * Authors: + * David Hildenbrand + * + * This header is BSD licensed so anyone can use the definitions + * to implement compatible drivers/servers: + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + *notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in the + *documentation and/or other materials provided with the distribution. + * 3. Neither the name of IBM nor the names of its contributors + *may be used to endorse or promote products derived from this software + *without specific prior written permission. + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS + * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL IBM OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#ifndef _LINUX_VIRTIO_MEM_H +#define _LINUX_VIRTIO_MEM_H + +#include "standard-headers/linux/types.h" +#include "standard-headers/linux/virtio_types.h" +#include "standard-headers/linux/virtio_ids.h" +#include "standard-headers/linux/virtio_config.h" + +/* + * Each virtio-mem device manages a dedicated region in physical address + * space. Each device can belong to a single NUMA node, multiple devices + * for a single NUMA node are possible. A virtio-mem device is like a + * "resizable DIMM" consisting of small memory blocks that can be plugged + * or unplugged. The device driver is responsible for (un)plugging memory + * blocks on demand. + * + * Virtio-mem devices can only operate on their assigned memory region in + * order to (un)plug memory. A device cannot (un)plug memory belonging to + * other devices. + * + * The "region_size" corresponds to the maximum amount of memory that can + * be provided by a device. The "size" corresponds to the amount of memory + * that is currently plugged. "requested_size" corresponds to a request + * from the device to the device driver to (un)plug blocks. The + * device driver should try to (un)plug blocks in order to reach the + * "requested_size". It is impossible to plug more memory than requested. + * + * The "usable_region_size" represents the memory region that can actually + * be used to (un)plug memory. It is always at least as big as the + * "requested_size" and will grow dynamically. It will only shrink when + * explicitly triggered (VIRTIO_MEM_REQ_UNPLUG). + * + * There are no guarantees what will happen if unplugged memory is + * read/written. Such memory should, in general, not be touched. E.g., + * even writing might succeed, but the values will simply be discarded at + * random points in time. + * + * It can happen that the device cannot process a request, because it is + * busy. The device driver has to retry later. + * + * Usually, during system resets all memory wi
[PATCH v1 06/17] target/i386: sev: Use ram_block_discard_set_broken()
AMD SEV will pin all guest memory, mark discarding of RAM broken. At the time this is called, we cannot have anyone active that relies on discards to work properly. Cc: "Michael S. Tsirkin" Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Signed-off-by: David Hildenbrand --- target/i386/sev.c | 1 + 1 file changed, 1 insertion(+) diff --git a/target/i386/sev.c b/target/i386/sev.c index 846018a12d..608225f9ba 100644 --- a/target/i386/sev.c +++ b/target/i386/sev.c @@ -722,6 +722,7 @@ sev_guest_init(const char *id) ram_block_notifier_add(&sev_ram_notifier); qemu_add_machine_init_done_notifier(&sev_machine_done_notify); qemu_add_vm_change_state_handler(sev_vm_state_change, s); +g_assert(!ram_block_discard_set_broken(true)); return s; err: -- 2.25.3
[PATCH v1 17/17] virtio-pci: Send qapi events when the virtio-mem size changes
Let's register the notifier and trigger the qapi event with the right device id. MEMORY_DEVICE_SIZE_CHANGE is similar to BALLOON_CHANGE, however on a memory device level. Don't unregister the notifier (we neither have finalize() nor unrealize() for VirtIOPCIProxy, so it's not that simple to do it) - both devices are expected to vanish at the same time. Cc: "Michael S. Tsirkin" Cc: Markus Armbruster Cc: "Dr. David Alan Gilbert" Cc: Eric Blake Cc: Igor Mammedov Signed-off-by: David Hildenbrand --- hw/virtio/virtio-mem-pci.c | 28 hw/virtio/virtio-mem-pci.h | 1 + hw/virtio/virtio-mem.c | 2 +- monitor/monitor.c | 1 + qapi/misc.json | 25 + 5 files changed, 56 insertions(+), 1 deletion(-) diff --git a/hw/virtio/virtio-mem-pci.c b/hw/virtio/virtio-mem-pci.c index a47d21c81f..780d7b4af7 100644 --- a/hw/virtio/virtio-mem-pci.c +++ b/hw/virtio/virtio-mem-pci.c @@ -15,6 +15,7 @@ #include "virtio-mem-pci.h" #include "hw/mem/memory-device.h" #include "qapi/error.h" +#include "qapi/qapi-events-misc.h" static void virtio_mem_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp) { @@ -75,6 +76,21 @@ static void virtio_mem_pci_fill_device_info(const MemoryDeviceState *md, info->type = MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM; } +static void virtio_mem_pci_size_change_notify(Notifier *notifier, void *data) +{ +VirtIOMEMPCI *pci_mem = container_of(notifier, VirtIOMEMPCI, + size_change_notifier); +DeviceState *dev = DEVICE(pci_mem); +const uint64_t * const size_p = data; +const char *id = NULL; + +if (dev->id) { +id = g_strdup(dev->id); +} + +qapi_event_send_memory_device_size_change(!!id, id, *size_p); +} + static void virtio_mem_pci_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); @@ -99,9 +115,21 @@ static void virtio_mem_pci_class_init(ObjectClass *klass, void *data) static void virtio_mem_pci_instance_init(Object *obj) { VirtIOMEMPCI *dev = VIRTIO_MEM_PCI(obj); +VirtIOMEMClass *vmc; +VirtIOMEM *vmem; virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev), TYPE_VIRTIO_MEM); + +dev->size_change_notifier.notify = virtio_mem_pci_size_change_notify; +vmem = VIRTIO_MEM(&dev->vdev); +vmc = VIRTIO_MEM_GET_CLASS(vmem); +/* + * We never remove the notifier again, as we expect both devices to + * disappear at the same time. + */ +vmc->add_size_change_notifier(vmem, &dev->size_change_notifier); + object_property_add_alias(obj, VIRTIO_MEM_BLOCK_SIZE_PROP, OBJECT(&dev->vdev), VIRTIO_MEM_BLOCK_SIZE_PROP, &error_abort); diff --git a/hw/virtio/virtio-mem-pci.h b/hw/virtio/virtio-mem-pci.h index 8820cd6628..b51a28b275 100644 --- a/hw/virtio/virtio-mem-pci.h +++ b/hw/virtio/virtio-mem-pci.h @@ -28,6 +28,7 @@ typedef struct VirtIOMEMPCI VirtIOMEMPCI; struct VirtIOMEMPCI { VirtIOPCIProxy parent_obj; VirtIOMEM vdev; +Notifier size_change_notifier; }; #endif /* QEMU_VIRTIO_MEM_PCI_H */ diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c index 88a99a0d90..eb5cf66855 100644 --- a/hw/virtio/virtio-mem.c +++ b/hw/virtio/virtio-mem.c @@ -491,7 +491,7 @@ static void virtio_mem_device_unrealize(DeviceState *dev, Error **errp) virtio_del_queue(vdev, 0); virtio_cleanup(vdev); g_free(vmem->bitmap); -ramblock_discard_set_required(false); +ram_block_discard_set_required(false); } static int virtio_mem_pre_save(void *opaque) diff --git a/monitor/monitor.c b/monitor/monitor.c index 125494410a..19dcb8fbe3 100644 --- a/monitor/monitor.c +++ b/monitor/monitor.c @@ -235,6 +235,7 @@ static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = { [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS }, [QAPI_EVENT_QUORUM_FAILURE]= { 1000 * SCALE_MS }, [QAPI_EVENT_VSERPORT_CHANGE] = { 1000 * SCALE_MS }, +[QAPI_EVENT_MEMORY_DEVICE_SIZE_CHANGE] = { 1000 * SCALE_MS }, }; /* diff --git a/qapi/misc.json b/qapi/misc.json index feaeacec22..58b073562b 100644 --- a/qapi/misc.json +++ b/qapi/misc.json @@ -1432,6 +1432,31 @@ ## { 'command': 'query-memory-devices', 'returns': ['MemoryDeviceInfo'] } +## +# @MEMORY_DEVICE_SIZE_CHANGE: +# +# Emitted when the size of a memory device changes. Only emitted for memory +# devices that can actually change the size (e.g., virtio-mem due to guest +# action). +# +# @id: device's ID +# @size: the new size of memory that the device provides +# +# Note: this event is rate-limited. +# +# Since: 5.1 +# +# Example: +# +# <- { "event": "MEMORY_DEVICE_SIZE_CHANGE", +# "data": { "id": "vm0", "size": 1073741824}, +# "timestamp": { "seconds": 1588168529, "microseconds": 201316 } } +# +## +{ 'event': 'MEMORY_DEVICE_SIZE_CHANGE', + 'data': { '*id': 'str', 'size': 'siz
[PATCH v1 04/17] s390x/pv: Convert to ram_block_discard_set_broken()
Discarding RAM does not work as expected with protected VMs. Let's switch to ram_block_discard_set_broken() for now, as we want to get rid of qemu_balloon_inhibit(). Note that it will currently never fail, but might fail in the future with new technologies (e.g., virtio-mem). Cc: Richard Henderson Cc: Cornelia Huck Cc: Halil Pasic Cc: Christian Borntraeger Cc: Janosch Frank Signed-off-by: David Hildenbrand --- hw/s390x/s390-virtio-ccw.c | 22 +- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c index 45292fb5a8..883ea392bc 100644 --- a/hw/s390x/s390-virtio-ccw.c +++ b/hw/s390x/s390-virtio-ccw.c @@ -43,7 +43,6 @@ #include "hw/qdev-properties.h" #include "hw/s390x/tod.h" #include "sysemu/sysemu.h" -#include "sysemu/balloon.h" #include "hw/s390x/pv.h" #include "migration/blocker.h" @@ -329,7 +328,7 @@ static void s390_machine_unprotect(S390CcwMachineState *ms) ms->pv = false; migrate_del_blocker(pv_mig_blocker); error_free_or_abort(&pv_mig_blocker); -qemu_balloon_inhibit(false); +ram_block_discard_set_broken(false); } static int s390_machine_protect(S390CcwMachineState *ms) @@ -338,17 +337,22 @@ static int s390_machine_protect(S390CcwMachineState *ms) int rc; /* -* Ballooning on protected VMs needs support in the guest for -* sharing and unsharing balloon pages. Block ballooning for -* now, until we have a solution to make at least Linux guests -* either support it or fail gracefully. +* Discarding of memory in RAM blocks does not work as expected with +* protected VMs. Sharing and unsharing pages would be required. Mark it as +* broken for now, until until we have a solution to make at least Linux +* guests either support it (e.g., virtio-balloon) or fail gracefully. */ -qemu_balloon_inhibit(true); +rc = ram_block_discard_set_broken(true); +if (rc) { +error_report("protected VMs: cannot set discarding of RAM broken"); +return rc; +} + error_setg(&pv_mig_blocker, "protected VMs are currently not migrateable."); rc = migrate_add_blocker(pv_mig_blocker, &local_err); if (rc) { -qemu_balloon_inhibit(false); +ram_block_discard_set_broken(false); error_report_err(local_err); error_free_or_abort(&pv_mig_blocker); return rc; @@ -357,7 +361,7 @@ static int s390_machine_protect(S390CcwMachineState *ms) /* Create SE VM */ rc = s390_pv_vm_enable(); if (rc) { -qemu_balloon_inhibit(false); +ram_block_discard_set_broken(false); migrate_del_blocker(pv_mig_blocker); error_free_or_abort(&pv_mig_blocker); return rc; -- 2.25.3
[PATCH v1 07/17] migration/rdma: Use ram_block_discard_set_broken()
RDMA will pin all guest memory (as documented in docs/rdma.txt). We want to mark RAM block discards to be broken - however, to keep it simple use ram_block_discard_is_required() instead of inhibiting. Cc: "Michael S. Tsirkin" Cc: Juan Quintela Cc: "Dr. David Alan Gilbert" Signed-off-by: David Hildenbrand --- migration/rdma.c | 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/migration/rdma.c b/migration/rdma.c index f61587891b..029adbb950 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -29,6 +29,7 @@ #include "qemu/sockets.h" #include "qemu/bitmap.h" #include "qemu/coroutine.h" +#include "exec/memory.h" #include #include #include @@ -4017,8 +4018,14 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp) Error *local_err = NULL; trace_rdma_start_incoming_migration(); -rdma = qemu_rdma_data_init(host_port, &local_err); +/* Avoid ram_block_discard_set_broken(), cannot change during migration. */ +if (ram_block_discard_is_required()) { +error_setg(errp, "RDMA: cannot set discarding of RAM broken"); +return; +} + +rdma = qemu_rdma_data_init(host_port, &local_err); if (rdma == NULL) { goto err; } @@ -4064,10 +4071,17 @@ void rdma_start_outgoing_migration(void *opaque, const char *host_port, Error **errp) { MigrationState *s = opaque; -RDMAContext *rdma = qemu_rdma_data_init(host_port, errp); RDMAContext *rdma_return_path = NULL; +RDMAContext *rdma; int ret = 0; +/* Avoid ram_block_discard_set_broken(), cannot change during migration. */ +if (ram_block_discard_is_required()) { +error_setg(errp, "RDMA: cannot set discarding of RAM broken"); +return; +} + +rdma = qemu_rdma_data_init(host_port, errp); if (rdma == NULL) { goto err; } -- 2.25.3
[PATCH v1 16/17] virtio-mem: Allow notifiers for size changes
We want to send qapi events in case the size of a virtio-mem device changes. This allows upper layers to always know how much memory is actually currently consumed via a virtio-mem device. Unfortuantely, we have to report the id of our proxy device. Let's provide an easy way for our proxy device to register, so it can send the qapi events. Piggy-backing on the notifier infrastructure (although we'll only ever have one notifier registered) seems to be an easy way. Cc: "Michael S. Tsirkin" Cc: "Dr. David Alan Gilbert" Cc: Igor Mammedov Signed-off-by: David Hildenbrand --- hw/virtio/virtio-mem.c | 21 - include/hw/virtio/virtio-mem.h | 5 + 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c index e25b2c74f2..88a99a0d90 100644 --- a/hw/virtio/virtio-mem.c +++ b/hw/virtio/virtio-mem.c @@ -198,6 +198,7 @@ static int virtio_mem_state_change_request(VirtIOMEM *vmem, uint64_t gpa, } else { vmem->size -= size; } +notifier_list_notify(&vmem->size_change_notifiers, &vmem->size); return VIRTIO_MEM_RESP_ACK; } @@ -253,7 +254,10 @@ static int virtio_mem_unplug_all(VirtIOMEM *vmem) return -EBUSY; } bitmap_clear(vmem->bitmap, 0, vmem->bitmap_size); -vmem->size = 0; +if (vmem->size != 0) { +vmem->size = 0; +notifier_list_notify(&vmem->size_change_notifiers, &vmem->size); +} virtio_mem_resize_usable_region(vmem, vmem->requested_size, true); return 0; @@ -594,6 +598,18 @@ static MemoryRegion *virtio_mem_get_memory_region(VirtIOMEM *vmem, Error **errp) return &vmem->memdev->mr; } +static void virtio_mem_add_size_change_notifier(VirtIOMEM *vmem, +Notifier *notifier) +{ +notifier_list_add(&vmem->size_change_notifiers, notifier); +} + +static void virtio_mem_remove_size_change_notifier(VirtIOMEM *vmem, + Notifier *notifier) +{ +notifier_remove(notifier); +} + static void virtio_mem_get_size(Object *obj, Visitor *v, const char *name, void *opaque, Error **errp) { @@ -705,6 +721,7 @@ static void virtio_mem_instance_init(Object *obj) VirtIOMEM *vmem = VIRTIO_MEM(obj); vmem->block_size = VIRTIO_MEM_MIN_BLOCK_SIZE; +notifier_list_init(&vmem->size_change_notifiers); object_property_add(obj, VIRTIO_MEM_SIZE_PROP, "size", virtio_mem_get_size, NULL, NULL, NULL, &error_abort); @@ -743,6 +760,8 @@ static void virtio_mem_class_init(ObjectClass *klass, void *data) vmc->fill_device_info = virtio_mem_fill_device_info; vmc->get_memory_region = virtio_mem_get_memory_region; +vmc->add_size_change_notifier = virtio_mem_add_size_change_notifier; +vmc->remove_size_change_notifier = virtio_mem_remove_size_change_notifier; } static const TypeInfo virtio_mem_info = { diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h index 27158cb611..5820b5c23e 100644 --- a/include/hw/virtio/virtio-mem.h +++ b/include/hw/virtio/virtio-mem.h @@ -66,6 +66,9 @@ typedef struct VirtIOMEM { /* block size and alignment */ uint32_t block_size; uint32_t migration_block_size; + +/* notifiers to notify when "size" changes */ +NotifierList size_change_notifiers; } VirtIOMEM; typedef struct VirtIOMEMClass { @@ -75,6 +78,8 @@ typedef struct VirtIOMEMClass { /* public */ void (*fill_device_info)(const VirtIOMEM *vmen, VirtioMEMDeviceInfo *vi); MemoryRegion *(*get_memory_region)(VirtIOMEM *vmem, Error **errp); +void (*add_size_change_notifier)(VirtIOMEM *vmem, Notifier *notifier); +void (*remove_size_change_notifier)(VirtIOMEM *vmem, Notifier *notifier); } VirtIOMEMClass; #endif -- 2.25.3
Re: [PATCH v2 1/3] acpi: Move build_tpm2() in the generic part
Hi, On 5/6/20 8:33 AM, Andrew Jones wrote: > On Tue, May 05, 2020 at 04:44:17PM +0200, Eric Auger wrote: >> We plan to build the tpm2 table on ARM too. In order to reuse the >> generation code, let's move build_tpm2() to aml-build.c. >> >> No change in the implementation. >> >> Signed-off-by: Eric Auger >> --- >> include/hw/acpi/aml-build.h | 2 ++ >> hw/acpi/aml-build.c | 30 ++ >> hw/i386/acpi-build.c| 30 -- >> 3 files changed, 32 insertions(+), 30 deletions(-) >> >> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h >> index 0f4ed53d7f..a67ab4618a 100644 >> --- a/include/hw/acpi/aml-build.h >> +++ b/include/hw/acpi/aml-build.h >> @@ -437,4 +437,6 @@ void build_slit(GArray *table_data, BIOSLinker *linker, >> MachineState *ms); >> >> void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f, >> const char *oem_id, const char *oem_table_id); >> + >> +void build_tpm2(GArray *table_data, BIOSLinker *linker, GArray *tcpalog); >> #endif >> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c >> index 2c3702b882..1f7fd09112 100644 >> --- a/hw/acpi/aml-build.c >> +++ b/hw/acpi/aml-build.c >> @@ -26,6 +26,7 @@ >> #include "qemu/bitops.h" >> #include "sysemu/numa.h" >> #include "hw/boards.h" >> +#include "hw/acpi/tpm.h" >> >> static GArray *build_alloc_array(void) >> { >> @@ -1875,6 +1876,35 @@ build_hdr: >> "FACP", tbl->len - fadt_start, f->rev, oem_id, >> oem_table_id); >> } >> >> +void build_tpm2(GArray *table_data, BIOSLinker *linker, GArray *tcpalog) >> +{ >> +Acpi20TPM2 *tpm2_ptr = acpi_data_push(table_data, sizeof *tpm2_ptr); >> +unsigned log_addr_size = sizeof(tpm2_ptr->log_area_start_address); >> +unsigned log_addr_offset = >> +(char *)&tpm2_ptr->log_area_start_address - table_data->data; >> + >> +tpm2_ptr->platform_class = cpu_to_le16(TPM2_ACPI_CLASS_CLIENT); >> +if (TPM_IS_TIS_ISA(tpm_find())) { >> +tpm2_ptr->control_area_address = cpu_to_le64(0); >> +tpm2_ptr->start_method = cpu_to_le32(TPM2_START_METHOD_MMIO); >> +} else if (TPM_IS_CRB(tpm_find())) { >> +tpm2_ptr->control_area_address = cpu_to_le64(TPM_CRB_ADDR_CTRL); >> +tpm2_ptr->start_method = cpu_to_le32(TPM2_START_METHOD_CRB); >> +} else { >> +g_warn_if_reached(); >> +} >> + >> +tpm2_ptr->log_area_minimum_length = >> +cpu_to_le32(TPM_LOG_AREA_MINIMUM_SIZE); >> + >> +/* log area start address to be filled by Guest linker */ >> +bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE, >> + log_addr_offset, log_addr_size, >> + ACPI_BUILD_TPMLOG_FILE, 0); >> +build_header(linker, table_data, >> + (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, >> NULL); >> +} >> + > > I'll let Igor and mst confirm/deny this, but my understanding was that the > build_append* API was the preferred way to create the table. Indeed, I > don't see too many table.field = cpu_to_le(...) lines in aml-build.c > > I realize this function is just getting moved, but maybe it should get > converted to the build_append* API while being moved? The reason I did not convert is that the struct is as follows struct Acpi20TPM2 { ACPI_TABLE_HEADER_DEF uint16_t platform_class; uint16_t reserved; uint64_t control_area_address; uint32_t start_method; uint8_t start_method_params[12]; uint32_t log_area_minimum_length; uint64_t log_area_start_address; } QEMU_PACKED; If I understand correctly the build_append* adds the fields contiguously. It was not straightforward to me how to skip the start_method_params array. While we are at it the tcpalog arg is not used. Shall I remove it? Thanks Eric > > Thanks, > drew > >
[PATCH v1 08/17] migration/colo: Use ram_block_discard_set_broken()
COLO will copy all memory in a RAM block, mark discarding of RAM broken. Cc: "Michael S. Tsirkin" Cc: Hailiang Zhang Cc: Juan Quintela Cc: "Dr. David Alan Gilbert" Signed-off-by: David Hildenbrand --- include/migration/colo.h | 2 +- migration/migration.c| 8 +++- migration/savevm.c | 11 +-- 3 files changed, 17 insertions(+), 4 deletions(-) diff --git a/include/migration/colo.h b/include/migration/colo.h index 1636e6f907..768e1f04c3 100644 --- a/include/migration/colo.h +++ b/include/migration/colo.h @@ -25,7 +25,7 @@ void migrate_start_colo_process(MigrationState *s); bool migration_in_colo_state(void); /* loadvm */ -void migration_incoming_enable_colo(void); +int migration_incoming_enable_colo(void); void migration_incoming_disable_colo(void); bool migration_incoming_colo_enabled(void); void *colo_process_incoming_thread(void *opaque); diff --git a/migration/migration.c b/migration/migration.c index 177cce9e95..f6830e4620 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -338,12 +338,18 @@ bool migration_incoming_colo_enabled(void) void migration_incoming_disable_colo(void) { +ram_block_discard_set_broken(false); migration_colo_enabled = false; } -void migration_incoming_enable_colo(void) +int migration_incoming_enable_colo(void) { +if (ram_block_discard_set_broken(true)) { +error_report("COLO: cannot set discarding of RAM broken"); +return -EBUSY; +} migration_colo_enabled = true; +return 0; } void migrate_add_address(SocketAddress *address) diff --git a/migration/savevm.c b/migration/savevm.c index c00a6807d9..19b4f9600d 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2111,8 +2111,15 @@ static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis, static int loadvm_process_enable_colo(MigrationIncomingState *mis) { -migration_incoming_enable_colo(); -return colo_init_ram_cache(); +int ret = migration_incoming_enable_colo(); + +if (!ret) { +ret = colo_init_ram_cache(); +if (ret) { +migration_incoming_disable_colo(); +} +} +return ret; } /* -- 2.25.3
[PATCH v1 05/17] virtio-balloon: Rip out qemu_balloon_inhibit()
The only remaining special case is postcopy. It cannot handle concurrent discards yet, which would result in requesting already sent pages from the source. Special-case it in virtio-balloon instead. Cc: "Michael S. Tsirkin" Cc: Juan Quintela Cc: "Dr. David Alan Gilbert" Signed-off-by: David Hildenbrand --- balloon.c | 18 -- hw/virtio/virtio-balloon.c | 12 +++- include/sysemu/balloon.h | 2 -- migration/postcopy-ram.c | 23 --- 4 files changed, 11 insertions(+), 44 deletions(-) diff --git a/balloon.c b/balloon.c index c49f57c27b..354408c6ea 100644 --- a/balloon.c +++ b/balloon.c @@ -36,24 +36,6 @@ static QEMUBalloonEvent *balloon_event_fn; static QEMUBalloonStatus *balloon_stat_fn; static void *balloon_opaque; -static int balloon_inhibit_count; - -bool qemu_balloon_is_inhibited(void) -{ -return atomic_read(&balloon_inhibit_count) > 0 || - ram_block_discard_is_broken(); -} - -void qemu_balloon_inhibit(bool state) -{ -if (state) { -atomic_inc(&balloon_inhibit_count); -} else { -atomic_dec(&balloon_inhibit_count); -} - -assert(atomic_read(&balloon_inhibit_count) >= 0); -} static bool have_balloon(Error **errp) { diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index a4729f7fc9..aa5b89fb47 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -29,6 +29,7 @@ #include "trace.h" #include "qemu/error-report.h" #include "migration/misc.h" +#include "migration/postcopy-ram.h" #include "hw/virtio/virtio-bus.h" #include "hw/virtio/virtio-access.h" @@ -63,6 +64,15 @@ static bool virtio_balloon_pbp_matches(PartiallyBalloonedPage *pbp, return pbp->base_gpa == base_gpa; } +static bool virtio_balloon_inhibited(void) +{ +PostcopyState ps = postcopy_state_get(); + +/* Postcopy cannot deal with concurrent discards (yet), so it's special. */ +return ram_block_discard_is_broken() || + (ps >= POSTCOPY_INCOMING_DISCARD && ps < POSTCOPY_INCOMING_END); +} + static void balloon_inflate_page(VirtIOBalloon *balloon, MemoryRegion *mr, hwaddr mr_offset, PartiallyBalloonedPage *pbp) @@ -360,7 +370,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq) trace_virtio_balloon_handle_output(memory_region_name(section.mr), pa); -if (!qemu_balloon_is_inhibited()) { +if (!virtio_balloon_inhibited()) { if (vq == s->ivq) { balloon_inflate_page(s, section.mr, section.offset_within_region, &pbp); diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h index aea0c44985..20a2defe3a 100644 --- a/include/sysemu/balloon.h +++ b/include/sysemu/balloon.h @@ -23,7 +23,5 @@ typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info); int qemu_add_balloon_handler(QEMUBalloonEvent *event_func, QEMUBalloonStatus *stat_func, void *opaque); void qemu_remove_balloon_handler(void *opaque); -bool qemu_balloon_is_inhibited(void); -void qemu_balloon_inhibit(bool state); #endif diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c index a36402722b..b41a9fe2fd 100644 --- a/migration/postcopy-ram.c +++ b/migration/postcopy-ram.c @@ -27,7 +27,6 @@ #include "qemu/notify.h" #include "qemu/rcu.h" #include "sysemu/sysemu.h" -#include "sysemu/balloon.h" #include "qemu/error-report.h" #include "trace.h" #include "hw/boards.h" @@ -520,20 +519,6 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis) return 0; } -/* - * Manage a single vote to the QEMU balloon inhibitor for all postcopy usage, - * last caller wins. - */ -static void postcopy_balloon_inhibit(bool state) -{ -static bool cur_state = false; - -if (state != cur_state) { -qemu_balloon_inhibit(state); -cur_state = state; -} -} - /* * At the end of a migration where postcopy_ram_incoming_init was called. */ @@ -565,8 +550,6 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis) mis->have_fault_thread = false; } -postcopy_balloon_inhibit(false); - if (enable_mlock) { if (os_mlock() < 0) { error_report("mlock: %s", strerror(errno)); @@ -1160,12 +1143,6 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis) } memset(mis->postcopy_tmp_zero_page, '\0', mis->largest_page_size); -/* - * Ballooning can mark pages as absent while we're postcopying - * that would cause false userfaults. - */ -postcopy_balloon_inhibit(true); - trace_postcopy_ram_enable_notify(); return 0; -- 2.25.3
[PATCH v1 13/17] hmp: Handle virtio-mem when printing memory device info
Print the memory device info just like for other memory devices. Cc: "Dr. David Alan Gilbert" Cc: "Michael S. Tsirkin" Signed-off-by: David Hildenbrand --- monitor/hmp-cmds.c | 16 1 file changed, 16 insertions(+) diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c index 7f6e982dc8..4b3638a2a6 100644 --- a/monitor/hmp-cmds.c +++ b/monitor/hmp-cmds.c @@ -1805,6 +1805,7 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict) MemoryDeviceInfoList *info_list = qmp_query_memory_devices(&err); MemoryDeviceInfoList *info; VirtioPMEMDeviceInfo *vpi; +VirtioMEMDeviceInfo *vmi; MemoryDeviceInfo *value; PCDIMMDeviceInfo *di; @@ -1839,6 +1840,21 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict) monitor_printf(mon, " size: %" PRIu64 "\n", vpi->size); monitor_printf(mon, " memdev: %s\n", vpi->memdev); break; +case MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM: +vmi = value->u.virtio_mem.data; +monitor_printf(mon, "Memory device [%s]: \"%s\"\n", + MemoryDeviceInfoKind_str(value->type), + vmi->id ? vmi->id : ""); +monitor_printf(mon, " memaddr: 0x%" PRIx64 "\n", vmi->memaddr); +monitor_printf(mon, " node: %" PRId64 "\n", vmi->node); +monitor_printf(mon, " requested-size: %" PRIu64 "\n", + vmi->requested_size); +monitor_printf(mon, " size: %" PRIu64 "\n", vmi->size); +monitor_printf(mon, " max-size: %" PRIu64 "\n", vmi->max_size); +monitor_printf(mon, " block-size: %" PRIu64 "\n", + vmi->block_size); +monitor_printf(mon, " memdev: %s\n", vmi->memdev); +break; default: g_assert_not_reached(); } -- 2.25.3
[PATCH v1 11/17] virtio-pci: Proxy for virtio-mem
Let's add a proxy for virtio-mem, make it a memory device, and pass-through the properties. Cc: "Michael S. Tsirkin" Cc: Marcel Apfelbaum Cc: "Dr. David Alan Gilbert" Cc: Igor Mammedov Signed-off-by: David Hildenbrand --- hw/virtio/Makefile.objs| 1 + hw/virtio/virtio-mem-pci.c | 131 + hw/virtio/virtio-mem-pci.h | 33 ++ include/hw/pci/pci.h | 1 + 4 files changed, 166 insertions(+) create mode 100644 hw/virtio/virtio-mem-pci.c create mode 100644 hw/virtio/virtio-mem-pci.h diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs index 7df70e977e..b9661f9c01 100644 --- a/hw/virtio/Makefile.objs +++ b/hw/virtio/Makefile.objs @@ -19,6 +19,7 @@ obj-$(call land,$(CONFIG_VHOST_USER_FS),$(CONFIG_VIRTIO_PCI)) += vhost-user-fs-p obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o obj-$(CONFIG_VIRTIO_MEM) += virtio-mem.o +common-obj-$(call land,$(CONFIG_VIRTIO_MEM),$(CONFIG_VIRTIO_PCI)) += virtio-mem-pci.o ifeq ($(CONFIG_VIRTIO_PCI),y) obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock-pci.o diff --git a/hw/virtio/virtio-mem-pci.c b/hw/virtio/virtio-mem-pci.c new file mode 100644 index 00..a47d21c81f --- /dev/null +++ b/hw/virtio/virtio-mem-pci.c @@ -0,0 +1,131 @@ +/* + * Virtio MEM PCI device + * + * Copyright (C) 2020 Red Hat, Inc. + * + * Authors: + * David Hildenbrand + * + * This work is licensed under the terms of the GNU GPL, version 2. + * See the COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" + +#include "virtio-mem-pci.h" +#include "hw/mem/memory-device.h" +#include "qapi/error.h" + +static void virtio_mem_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp) +{ +VirtIOMEMPCI *mem_pci = VIRTIO_MEM_PCI(vpci_dev); +DeviceState *vdev = DEVICE(&mem_pci->vdev); + +qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus)); +object_property_set_bool(OBJECT(vdev), true, "realized", errp); +} + +static void virtio_mem_pci_set_addr(MemoryDeviceState *md, uint64_t addr, +Error **errp) +{ +object_property_set_uint(OBJECT(md), addr, VIRTIO_MEM_ADDR_PROP, errp); +} + +static uint64_t virtio_mem_pci_get_addr(const MemoryDeviceState *md) +{ +return object_property_get_uint(OBJECT(md), VIRTIO_MEM_ADDR_PROP, +&error_abort); +} + +static MemoryRegion *virtio_mem_pci_get_memory_region(MemoryDeviceState *md, + Error **errp) +{ +VirtIOMEMPCI *pci_mem = VIRTIO_MEM_PCI(md); +VirtIOMEM *vmem = VIRTIO_MEM(&pci_mem->vdev); +VirtIOMEMClass *vmc = VIRTIO_MEM_GET_CLASS(vmem); + +return vmc->get_memory_region(vmem, errp); +} + +static uint64_t virtio_mem_pci_get_plugged_size(const MemoryDeviceState *md, +Error **errp) +{ +return object_property_get_uint(OBJECT(md), VIRTIO_MEM_SIZE_PROP, +errp); +} + +static void virtio_mem_pci_fill_device_info(const MemoryDeviceState *md, +MemoryDeviceInfo *info) +{ +VirtioMEMDeviceInfo *vi = g_new0(VirtioMEMDeviceInfo, 1); +VirtIOMEMPCI *pci_mem = VIRTIO_MEM_PCI(md); +VirtIOMEM *vmem = VIRTIO_MEM(&pci_mem->vdev); +VirtIOMEMClass *vpc = VIRTIO_MEM_GET_CLASS(vmem); +DeviceState *dev = DEVICE(md); + +if (dev->id) { +vi->has_id = true; +vi->id = g_strdup(dev->id); +} + +/* let the real device handle everything else */ +vpc->fill_device_info(vmem, vi); + +info->u.virtio_mem.data = vi; +info->type = MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM; +} + +static void virtio_mem_pci_class_init(ObjectClass *klass, void *data) +{ +DeviceClass *dc = DEVICE_CLASS(klass); +VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass); +PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass); +MemoryDeviceClass *mdc = MEMORY_DEVICE_CLASS(klass); + +k->realize = virtio_mem_pci_realize; +set_bit(DEVICE_CATEGORY_MISC, dc->categories); +pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET; +pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_MEM; +pcidev_k->revision = VIRTIO_PCI_ABI_VERSION; +pcidev_k->class_id = PCI_CLASS_OTHERS; + +mdc->get_addr = virtio_mem_pci_get_addr; +mdc->set_addr = virtio_mem_pci_set_addr; +mdc->get_plugged_size = virtio_mem_pci_get_plugged_size; +mdc->get_memory_region = virtio_mem_pci_get_memory_region; +mdc->fill_device_info = virtio_mem_pci_fill_device_info; +} + +static void virtio_mem_pci_instance_init(Object *obj) +{ +VirtIOMEMPCI *dev = VIRTIO_MEM_PCI(obj); + +virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev), +TYPE_VIRTIO_MEM); +object_property_add_alias(obj, VIRTIO_MEM_BLOCK_SIZE_PROP, + OBJECT(&dev->vdev), + VIRTIO_MEM_BLOCK_SIZE_PROP, &error_abort); +object_property_add_a
[PATCH v1 14/17] numa: Handle virtio-mem in NUMA stats
Account the memory to the configured nid. Cc: Eduardo Habkost Cc: Marcel Apfelbaum Cc: "Michael S. Tsirkin" Signed-off-by: David Hildenbrand --- hw/core/numa.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/hw/core/numa.c b/hw/core/numa.c index 316bc50d75..06960918e7 100644 --- a/hw/core/numa.c +++ b/hw/core/numa.c @@ -812,6 +812,7 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[]) MemoryDeviceInfoList *info; PCDIMMDeviceInfo *pcdimm_info; VirtioPMEMDeviceInfo *vpi; +VirtioMEMDeviceInfo *vmi; for (info = info_list; info; info = info->next) { MemoryDeviceInfo *value = info->value; @@ -832,6 +833,11 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[]) node_mem[0].node_mem += vpi->size; node_mem[0].node_plugged_mem += vpi->size; break; +case MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM: +vmi = value->u.virtio_mem.data; +node_mem[vmi->node].node_mem += vmi->size; +node_mem[vmi->node].node_plugged_mem += vmi->size; +break; default: g_assert_not_reached(); } -- 2.25.3
Re: [PATCH v2 1/3] acpi: Move build_tpm2() in the generic part
On Wed, May 06, 2020 at 08:33:14AM +0200, Andrew Jones wrote: > I realize this function is just getting moved, but maybe it should get > converted to the build_append* API while being moved? I'd rather refactoring was done in a separate patch - easier to review. -- MST
Re: [PATCH RFC 1/4] migration: Export migration_bitmap_sync_precopy()
* Peter Xu (pet...@redhat.com) wrote: > Make it usable outside migration. To make it easier to use, remove the > RAMState parameter since after all ram.c has the reference of ram_state > directly from its context. > > Signed-off-by: Peter Xu Reviewed-by: Dr. David Alan Gilbert > --- > include/migration/misc.h | 1 + > migration/ram.c | 10 +- > 2 files changed, 6 insertions(+), 5 deletions(-) > > diff --git a/include/migration/misc.h b/include/migration/misc.h > index d2762257aa..e338be8c30 100644 > --- a/include/migration/misc.h > +++ b/include/migration/misc.h > @@ -66,6 +66,7 @@ void remove_migration_state_change_notifier(Notifier > *notify); > bool migration_in_setup(MigrationState *); > bool migration_has_finished(MigrationState *); > bool migration_has_failed(MigrationState *); > +void migration_bitmap_sync_precopy(void); > /* ...and after the device transmission */ > bool migration_in_postcopy_after_devices(MigrationState *); > void migration_global_dump(Monitor *mon); > diff --git a/migration/ram.c b/migration/ram.c > index 04f13feb2e..d737175d4e 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -970,7 +970,7 @@ static void migration_bitmap_sync(RAMState *rs) > } > } > > -static void migration_bitmap_sync_precopy(RAMState *rs) > +void migration_bitmap_sync_precopy(void) > { > Error *local_err = NULL; > > @@ -983,7 +983,7 @@ static void migration_bitmap_sync_precopy(RAMState *rs) > local_err = NULL; > } > > -migration_bitmap_sync(rs); > +migration_bitmap_sync(ram_state); > > if (precopy_notify(PRECOPY_NOTIFY_AFTER_BITMAP_SYNC, &local_err)) { > error_report_err(local_err); > @@ -2303,7 +2303,7 @@ static void ram_init_bitmaps(RAMState *rs) > WITH_RCU_READ_LOCK_GUARD() { > ram_list_init_bitmaps(); > memory_global_dirty_log_start(); > -migration_bitmap_sync_precopy(rs); > +migration_bitmap_sync_precopy(); > } > qemu_mutex_unlock_ramlist(); > qemu_mutex_unlock_iothread(); > @@ -2592,7 +2592,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque) > > WITH_RCU_READ_LOCK_GUARD() { > if (!migration_in_postcopy()) { > -migration_bitmap_sync_precopy(rs); > +migration_bitmap_sync_precopy(); > } > > ram_control_before_iterate(f, RAM_CONTROL_FINISH); > @@ -2642,7 +2642,7 @@ static void ram_save_pending(QEMUFile *f, void *opaque, > uint64_t max_size, > remaining_size < max_size) { > qemu_mutex_lock_iothread(); > WITH_RCU_READ_LOCK_GUARD() { > -migration_bitmap_sync_precopy(rs); > +migration_bitmap_sync_precopy(); > } > qemu_mutex_unlock_iothread(); > remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE; > -- > 2.24.1 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
RE: [PATCH v2 01/10] net: cadence_gem: Fix debug statements
Hi Edgar, Below comments will be taken care in V3. Thanks, Sai Pavan > -Original Message- > From: Edgar E. Iglesias > Sent: Monday, May 4, 2020 8:09 PM > To: Sai Pavan Boddu > Cc: Alistair Francis ; Peter Maydell > ; Jason Wang ; Markus > Armbruster ; Philippe Mathieu-Daudé > ; Tong Ho ; Ramon Fried > ; qemu-...@nongnu.org; qemu- > de...@nongnu.org > Subject: Re: [PATCH v2 01/10] net: cadence_gem: Fix debug statements > > On Mon, May 04, 2020 at 07:35:59PM +0530, Sai Pavan Boddu wrote: > > Enabling debug breaks the build, Fix them and make debug statements > > always compilable. Fix few statements to use sized integer casting. > > > > Signed-off-by: Sai Pavan Boddu > > --- > > hw/net/cadence_gem.c | 28 ++-- > > 1 file changed, 14 insertions(+), 14 deletions(-) > > > > diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c index > > 22a0b1b..2f244eb 100644 > > --- a/hw/net/cadence_gem.c > > +++ b/hw/net/cadence_gem.c > > @@ -35,14 +35,13 @@ > > #include "sysemu/dma.h" > > #include "net/checksum.h" > > > > -#ifdef CADENCE_GEM_ERR_DEBUG > > -#define DB_PRINT(...) do { \ > > -fprintf(stderr, ": %s: ", __func__); \ > > -fprintf(stderr, ## __VA_ARGS__); \ > > -} while (0) > > -#else > > -#define DB_PRINT(...) > > -#endif > > +#define CADENCE_GEM_ERR_DEBUG 0 > > +#define DB_PRINT(...) do {\ > > +if (CADENCE_GEM_ERR_DEBUG) { \ > > +qemu_log(": %s: ", __func__); \ > > +qemu_log(__VA_ARGS__); \ > > +} \ > > +} while (0) > > > > #define GEM_NWCTRL(0x/4) /* Network Control reg */ > > #define GEM_NWCFG (0x0004/4) /* Network Config reg */ > > @@ -979,7 +978,8 @@ static ssize_t gem_receive(NetClientState *nc, > const uint8_t *buf, size_t size) > > size += 4; > > } > > > > -DB_PRINT("config bufsize: %d packet size: %ld\n", rxbufsize, size); > > +DB_PRINT("config bufsize: %" PRIu64 " packet size: %" PRIu64 "\n", > > + (uint64_t) rxbufsize, (uint64_t) size); > > Shouldn't these be %u and %zd rather than casting to uint64_t? > > > > > > /* Find which queue we are targeting */ > > q = get_queue_from_screen(s, rxbuf_ptr, rxbufsize); @@ -992,9 > > +992,9 @@ static ssize_t gem_receive(NetClientState *nc, const uint8_t > *buf, size_t size) > > return -1; > > } > > > > -DB_PRINT("copy %u bytes to 0x%" PRIx64 "\n", > > - MIN(bytes_to_copy, rxbufsize), > > - rx_desc_get_buffer(s, s->rx_desc[q])); > > +DB_PRINT("copy %" PRIu32 " bytes to 0x%" PRIx64 "\n", > > +MIN(bytes_to_copy, rxbufsize), > > +rx_desc_get_buffer(s, s->rx_desc[q] + rxbuf_offset)); > > Looks like this is changing what we print (+ rxbuf_offset), was that > intentional? (it was not mentioned in the commit message) > > > > > > /* Copy packet data to emulated DMA buffer */ > > address_space_write(&s->dma_as, rx_desc_get_buffer(s, > > s->rx_desc[q]) + @@ -1160,8 +1160,8 @@ static void > gem_transmit(CadenceGEMState *s) > > */ > > if ((tx_desc_get_buffer(s, desc) == 0) || > > (tx_desc_get_length(desc) == 0)) { > > -DB_PRINT("Invalid TX descriptor @ 0x%x\n", > > - (unsigned)packet_desc_addr); > > +DB_PRINT("Invalid TX descriptor @ 0x%" HWADDR_PRIx "\n", > > + packet_desc_addr); > > break; > > } > > > > -- > > 2.7.4 > >
[PATCH v1 10/17] virtio-mem: Paravirtualized memory hot(un)plug
This is the very basic/initial version of virtio-mem. An introduction to virtio-mem can be found in the Linux kernel driver [1]. While it can be used in the current state for hotplug of a smaller amount of memory, it will heavily benefit from resizeable memory regions in the future. Each virtio-mem device manages a memory region (provided via a memory backend). After requested by the hypervisor ("requested-size"), the guest can try to plug/unplug blocks of memory within that region, in order to reach the requested size. Initially, and after a reboot, all memory is unplugged (except in special cases - reboot during postcopy). The guest may only try to plug/unplug blocks of memory within the usable region size. The usable region size is a little bigger than the requested size, to give the device driver some flexibility. The usable region size will only grow, except on reboots or when all memory is requested to get unplugged. The guest can never plug more memory than requested. Unplugged memory will get zapped/discarded, similar to in a balloon device. The block size is variable, however, it is always chosen in a way such that THP splits are avoided (e.g., 2MB). The state of each block (plugged/unplugged) is tracked in a bitmap. As virtio-mem devices (e.g., virtio-mem-pci) will be memory devices, we now expose "VirtioMEMDeviceInfo" via "query-memory-devices". -- There are two important follow-up items that are in the works: 1. Resizeable memory regions: Use resizeable allocations/RAM blocks to grow/shrink along with the usable region size. This avoids creating initially very big VMAs, RAM blocks, and KVM slots. 2. Protection of unplugged memory: Make sure the gust cannot actually make use of unplugged memory. Other follow-up items that are in the works: 1. Exclude unplugged memory during migration (via precopy notifier). 2. Handle remapping of memory. 3. Support for other architectures. -- Example usage (virtio-mem-pci is introduced in follow-up patches): Start QEMU with two virtio-mem devices (one per NUMA node): $ qemu-system-x86_64 -m 4G,maxmem=20G \ -smp sockets=2,cores=2 \ -numa node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 \ [...] -object memory-backend-ram,id=mem0,size=8G \ -device virtio-mem-pci,id=vm0,memdev=mem0,node=0,requested-size=0M \ -object memory-backend-ram,id=mem1,size=8G \ -device virtio-mem-pci,id=vm1,memdev=mem1,node=1,requested-size=1G Query the configuration: (qemu) info memory-devices Memory device [virtio-mem]: "vm0" memaddr: 0x14000 node: 0 requested-size: 0 size: 0 max-size: 8589934592 block-size: 2097152 memdev: /objects/mem0 Memory device [virtio-mem]: "vm1" memaddr: 0x34000 node: 1 requested-size: 1073741824 size: 1073741824 max-size: 8589934592 block-size: 2097152 memdev: /objects/mem1 Add some memory to node 0: (qemu) qom-set vm0 requested-size 500M Remove some memory from node 1: (qemu) qom-set vm1 requested-size 200M Query the configuration again: (qemu) info memory-devices Memory device [virtio-mem]: "vm0" memaddr: 0x14000 node: 0 requested-size: 524288000 size: 524288000 max-size: 8589934592 block-size: 2097152 memdev: /objects/mem0 Memory device [virtio-mem]: "vm1" memaddr: 0x34000 node: 1 requested-size: 209715200 size: 209715200 max-size: 8589934592 block-size: 2097152 memdev: /objects/mem1 [1] https://lkml.kernel.org/r/20200311171422.10484-1-da...@redhat.com Cc: "Michael S. Tsirkin" Cc: Eric Blake Cc: Markus Armbruster Cc: "Dr. David Alan Gilbert" Cc: Igor Mammedov Signed-off-by: David Hildenbrand --- hw/virtio/Kconfig | 11 + hw/virtio/Makefile.objs| 1 + hw/virtio/virtio-mem.c | 762 + include/hw/virtio/virtio-mem.h | 80 qapi/misc.json | 39 +- 5 files changed, 892 insertions(+), 1 deletion(-) create mode 100644 hw/virtio/virtio-mem.c create mode 100644 include/hw/virtio/virtio-mem.h diff --git a/hw/virtio/Kconfig b/hw/virtio/Kconfig index 83122424fa..0eda25c4e1 100644 --- a/hw/virtio/Kconfig +++ b/hw/virtio/Kconfig @@ -47,3 +47,14 @@ config VIRTIO_PMEM depends on VIRTIO depends on VIRTIO_PMEM_SUPPORTED select MEM_DEVICE + +config VIRTIO_MEM_SUPPORTED +bool + +config VIRTIO_MEM +bool +default y +depends on VIRTIO +depends on LINUX +depends on VIRTIO_MEM_SUPPORTED +select MEM_DEVICE diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs index 4e4d39a0a4..7df70e977e 100644 --- a/hw/virtio/Makefile.objs +++ b/hw/virtio/Makefile.objs @@ -18,6 +18,7 @@ common-obj-$(call land,$(CONFIG_VIRTIO_PMEM),$(CONFIG_VIRTIO_PCI)) += virtio-pme obj-$(call land,$(CONFIG_VHOST_USER_FS),$(CONFIG_VIRTIO_PCI)) += vhost-user-fs-pci.o obj-$(CONFIG_V
Re: [PATCH 1/5] roms/opensbi: Update to support building bios images for generic platform
Hi Alistair, On Tue, May 5, 2020 at 12:00 AM Alistair Francis wrote: > > On Mon, May 4, 2020 at 1:05 AM Bin Meng wrote: > > > > On Mon, May 4, 2020 at 4:00 PM Bin Meng wrote: > > > > > > Hi Anup, > > > > > > On Mon, May 4, 2020 at 3:52 PM Anup Patel wrote: > > > > > > > > On Mon, May 4, 2020 at 12:46 PM Bin Meng wrote: > > > > > > > > > > Hi Anup, > > > > > > > > > > On Sun, May 3, 2020 at 12:38 PM Anup Patel > > > > > wrote: > > > > > > > > > > > > On Fri, May 1, 2020 at 9:26 PM Bin Meng wrote: > > > > > > > > > > > > > > From: Bin Meng > > > > > > > > > > > > > > The RISC-V generic platform is a flattened device tree (FDT) based > > > > > > > platform where all platform specific functionality is provided > > > > > > > based > > > > > > > on FDT passed by previous booting stage. The support was added in > > > > > > > upstream opensbi recently. > > > > > > > > > > > > > > Bring the QEMU roms/opensbi submodule to the upstream opensbi > > > > > > > commit: > > > > > > > commit 4f18c6e55049 ("platform: generic: Add Sifive FU540 TLB > > > > > > > flush range limit override") > > > > > > > with the following changes since v0.7 release: > > > > > > > > > > > > > > 1bb00ab lib: No need to provide default PMP region using > > > > > > > platform callbacks > > > > > > > a9eac67 include: sbi_platform: Combine reboot and shutdown into > > > > > > > one callback > > > > > > > 6585fab lib: utils: Add SiFive test device > > > > > > > 4781545 platform: Add Nuclei UX600 platform > > > > > > > 3a326af scripts: adapt binary archive script for Nuclei UX600 > > > > > > > 5bdf022 firmware: fw_base: Remove CSR_MTVEC update check > > > > > > > e6c1345 lib: utils/serial: Skip baudrate config if input > > > > > > > frequency is zero > > > > > > > 01a8c8e lib: utils: Improve fdt_parse_uart8250() API > > > > > > > 0a0093b lib: utils: Add fdt_parse_uart8250_node() function > > > > > > > 243b0d0 lib: utils: Remove redundant clint_ipi_sync() > > > > > > > declaration > > > > > > > e3ad7c1 lib: utils: Rename fdt_parse_clint() to > > > > > > > fdt_parse_compat_addr() > > > > > > > a39cd6f lib: utils: Add FDT match table based node lookup > > > > > > > dd33b9e lib: utils: Make fdt_get_node_addr_size() public > > > > > > > function > > > > > > > 66185b3 lib: utils: Add fdt_parse_sifive_uart_node() function > > > > > > > 19e966b lib: utils: Add fdt_parse_hart_id() function > > > > > > > 44dd7be lib: utils: Add fdt_parse_max_hart_id() API > > > > > > > f0eb503 lib: utils: Add fdt_parse_plic_node() function > > > > > > > 1ac794c include: Add array_size() macro > > > > > > > 8ff2b94 lib: utils: Add simple FDT timer framework > > > > > > > 76f0f81 lib: utils: Add simple FDT ipi framework > > > > > > > 75322a6 lib: utils: Add simple FDT irqchip framework > > > > > > > 76a8940 lib: utils: Add simple FDT serial framework > > > > > > > 7cc6fa4 lib: utils: Add simple FDT reset framework > > > > > > > 4d06353 firmware: fw_base: Introduce optional fw_platform_init() > > > > > > > f1aa9e5 platform: Add generic FDT based platform support > > > > > > > 1f21b99 lib: sbi: Print platform hart count at boot time > > > > > > > 2ba7087 scripts: Add generic platform to > > > > > > > create-binary-archive.sh > > > > > > > 4f18c6e platform: generic: Add Sifive FU540 TLB flush range > > > > > > > limit override > > > > > > > > > > > > > > Update our Makefile to build the generic platform instead of > > > > > > > building > > > > > > > virt and sifive_u separately. > > Hey Bin, > > Thanks for the patch! > > I don't think we can take this update though, as we should stick with > OpenSBI releases. Sure, will delay this series once OpenSBI v0.8 release is out. > > > > > > > > > > > > > > > Signed-off-by: Bin Meng > > > > > > > --- > > > > > > > > > > > > > > roms/Makefile | 30 -- > > > > > > > roms/opensbi | 2 +- > > > > > > > 2 files changed, 9 insertions(+), 23 deletions(-) > > > > > > > > > > > > > > diff --git a/roms/Makefile b/roms/Makefile > > > > > > > index f9acf39..cb00628 100644 > > > > > > > --- a/roms/Makefile > > > > > > > +++ b/roms/Makefile > > > > > > > @@ -64,10 +64,8 @@ default help: > > > > > > > @echo " u-boot.e500-- update u-boot.e500" > > > > > > > @echo " u-boot.sam460 -- update u-boot.sam460" > > > > > > > @echo " efi-- update UEFI (edk2) > > > > > > > platform firmware" > > > > > > > - @echo " opensbi32-virt -- update OpenSBI for 32-bit > > > > > > > virt machine" > > > > > > > - @echo " opensbi64-virt -- update OpenSBI for 64-bit > > > > > > > virt machine" > > > > > > > - @echo " opensbi32-sifive_u -- update OpenSBI for 32-bit > > > > > > > sifive_u machine" > > > > > > > - @echo " opensbi64-sifive_u -- update OpenSBI for 64-bit > > > > > > > sifive_u machine" > > > > > > > + @echo " opensbi32-generic -- update OpenSBI for 32-b
[PATCH v1 12/17] MAINTAINERS: Add myself as virtio-mem maintainer
Let's make sure patches/bug reports find the right person. Cc: "Michael S. Tsirkin" Cc: Peter Maydell Cc: Markus Armbruster Signed-off-by: David Hildenbrand --- MAINTAINERS | 8 1 file changed, 8 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 1f84e3ae2c..09fff9e1bd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1734,6 +1734,14 @@ F: hw/virtio/virtio-crypto.c F: hw/virtio/virtio-crypto-pci.c F: include/hw/virtio/virtio-crypto.h +virtio-mem +M: David Hildenbrand +S: Supported +F: hw/virtio/virtio-mem.c +F: hw/virtio/virtio-mem-pci.h +F: hw/virtio/virtio-mem-pci.c +F: include/hw/virtio/virtio-mem.h + nvme M: Keith Busch L: qemu-bl...@nongnu.org -- 2.25.3
Re: [PATCH 2/2] qemu-img: Add --start-offset and --max-length to map
Thanks for the review. I will send V2 based on QEMU version 5.0. On 29/04/2020, 18:06, "Eric Blake" wrote: On 3/22/20 4:11 AM, Eyal Moscovici wrote: > The mapping operation of large disks especially ones stored over a > long chain of QCOW2 files can take a long time to finish. > Additionally when mapping fails there was no way recover by > restarting the mapping from the failed location. > > The new options, --start-offset and --max-length allows the user to > divide these type of map operations into shorter independent tasks. > > Acked-by: Mark Kanda > Co-developed-by: Yoav Elnekave > Signed-off-by: Yoav Elnekave > Signed-off-by: Eyal Moscovici > --- > docs/tools/qemu-img.rst | 2 +- > qemu-img-cmds.hx| 4 ++-- > qemu-img.c | 30 +- > 3 files changed, 32 insertions(+), 4 deletions(-) > > diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst > index 0080f83a76..924e89f679 100644 > --- a/docs/tools/qemu-img.rst > +++ b/docs/tools/qemu-img.rst > @@ -519,7 +519,7 @@ Command description: > ``ImageInfoSpecific*`` QAPI object (e.g. ``ImageInfoSpecificQCow2`` > for qcow2 images). > > -.. option:: map [--object OBJECTDEF] [--image-opts] [-f FMT] [--output=OFMT] [-U] FILENAME > +.. option:: map [--object OBJECTDEF] [--image-opts] [-f FMT] [--start-offset=offset] [--max-length=len] [--output=OFMT] [-U] FILENAME Consistency with the rest of the line says this should be [--start-offset=OFFSET] [--max-length=LEN] Will fix. > > Dump the metadata of image *FILENAME* and its backing file chain. > In particular, this commands dumps the allocation state of every sector > diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx > index c9c54de1df..35f832816f 100644 > --- a/qemu-img-cmds.hx > +++ b/qemu-img-cmds.hx > @@ -63,9 +63,9 @@ SRST > ERST > > DEF("map", img_map, > -"map [--object objectdef] [--image-opts] [-f fmt] [--output=ofmt] [-U] filename") > +"map [--object objectdef] [--image-opts] [-f fmt] [--start-offset=offset] [--max-length=len] [--output=ofmt] [-U] filename") this one is fine, > SRST > -.. option:: map [--object OBJECTDEF] [--image-opts] [-f FMT] [--output=OFMT] [-U] FILENAME > +.. option:: map [--object OBJECTDEF] [--image-opts] [-f FMT] [--start-offset=OFFSET] [--max-length=LEN] [--output=OFMT] [-U] FILENAME this one has the same problem as the .rst. > @@ -3005,6 +3009,26 @@ static int img_map(int argc, char **argv) > case OPTION_OUTPUT: > output = optarg; > break; > +case 's': > +start_offset = cvtnum(optarg); > +if (start_offset < 0) { > +error_report("Invalid start offset specified! You may use " > + "k, M, G, T, P or E suffixes for "); > +error_report("kilobytes, megabytes, gigabytes, terabytes, " > + "petabytes and exabytes."); Pre-existing elsewhere in the file, but this seems rather verbose - shouldn't we have cvtnum() (or another wrapper function) give this extra information about what is valid, rather than open-coding it at every client of cvtnum()? You are probably right I will send a patch that adds the error message about the units to cvtnum(). > +return 1; > +} > +break; > +case 'l': > +max_length = cvtnum(optarg); > +if (max_length < 0) { > +error_report("Invalid max length specified! You may use " > + "k, M, G, T, P or E suffixes for "); > +error_report("kilobytes, megabytes, gigabytes, terabytes, " > + "petabytes and exabytes."); > +return 1; > +} > +break; > case OPTION_OBJECT: { > QemuOpts *opts; > opts = qemu_opts_parse_noisily(&qemu_object_opts, > @@ -3050,7 +3074,11 @@ static int img_map(int argc, char **argv) > printf("["); > } > > +curr.start = start_offset; > length = blk_getlength(blk); > +if (max_length != -1) { > +length = MIN(start_offset + max_length, length); > +} Pre-existing, but where does this code check for length == -1? But your MIN() doesn't make it any worse (if we fail to get length, we merely skip the loop). I don't think there is a check. I will add a check in a different patch. > while (curr.start + curr.length < length) { > int64_t offset = curr.start + curr.length;
Re: [PATCH v2 05/13] acpi: move acpi_init_common_fadt_data to acpi-common.c
On Tue, May 05, 2020 at 04:25:55PM +0200, Igor Mammedov wrote: > On Tue, 5 May 2020 15:42:57 +0200 > Gerd Hoffmann wrote: > > the same question like with FACS, why legacy data are needed for with reduced > profile? > it mostly initializes data for fixed-hw model. > > I'd preffer if you drop FACS and use minimal FADT like build_fadt_rev5() does > without pulling along a bunch of legacy junk (unless there is a good > justification for it). Leftover from the isa-acpi version, it is indeed not needed any more (same for facs). take care, Gerd
Re: [PATCH 1/2] qemu-img: refactor dump_map_entry JSON format output
On 29/04/2020, 17:58, "Eric Blake" wrote: On 3/22/20 4:11 AM, Eyal Moscovici wrote: > Previously dump_map_entry identified whether we need to start a new JSON > array based on whether start address == 0. In this refactor we remove > this assumption as in following patches we will allow map to start from > an arbitrary position. > > Acked-by: Mark Kanda > Signed-off-by: Eyal Moscovici > --- > qemu-img.c | 12 > 1 file changed, 8 insertions(+), 4 deletions(-) > > @@ -2871,8 +2870,8 @@ static int dump_map_entry(OutputFormat output_format, MapEntry *e, > } > putchar('}'); > > -if (!next) { > -printf("]\n"); > +if (next) { > +printf(",\n"); As long as you're touching this, puts(",") is slightly more efficient than printf(). But what you have is not wrong. Thanks, will fix. Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [PATCH v16 QEMU 08/16] vfio: Register SaveVMHandlers for VFIO device
On Wed, 6 May 2020 02:38:46 -0400 Yan Zhao wrote: > On Tue, May 05, 2020 at 12:37:26PM +0800, Alex Williamson wrote: > > It's been a long time, but that doesn't seem like what I was asking. > > The sysfs version checking is used to select a target that is likely to > > succeed, but the migration stream is still generated by a user and the > > vendor driver is still ultimately responsible for validating that > > stream. I would hope that a vendor migration stream therefore starts > > with information similar to that found in the sysfs interface, allowing > > the receiving vendor driver to validate the source device and vendor > > software version, such that we can fail an incoming migration that the > > vendor driver deems incompatible. Ideally the vendor driver might also > > include consistency and sequence checking throughout the stream to > > prevent a malicious user from exploiting the internal operation of the > > vendor driver. Thanks, Some kind of somewhat standardized marker for driver/version seems like a good idea. Further checking is also a good idea, but I think the details of that need to be left to the individual drivers. > > > maybe we can add a rw field migration_version in > struct vfio_device_migration_info besides sysfs interface ? > > when reading it in src, it gets the same string as that from sysfs; > when writing it in target, it returns success or not to check > compatibility and fails the migration early in setup phase. Getting both populated from the same source seems like a good idea. Not sure if a string is the best value to put into a migration stream; maybe the sysfs interface can derive a human-readable string from a more compact value to be put into the migration region (and ultimately the stream)? Might be overengineering, just thinking out aloud here.
Re: [PATCH RFC 2/4] migration: Introduce migrate_is_precopy()
* Peter Xu (pet...@redhat.com) wrote: > Export a helper globally to check whether we're during a precopy. > > Signed-off-by: Peter Xu Can you change this to 'migration_in_precopy' to match the existing 'migration_in_postcopy'. Dave > --- > include/migration/misc.h | 1 + > migration/migration.c| 7 +++ > 2 files changed, 8 insertions(+) > > diff --git a/include/migration/misc.h b/include/migration/misc.h > index e338be8c30..b4f6bf7842 100644 > --- a/include/migration/misc.h > +++ b/include/migration/misc.h > @@ -61,6 +61,7 @@ void migration_shutdown(void); > void qemu_start_incoming_migration(const char *uri, Error **errp); > bool migration_is_idle(void); > bool migration_is_active(MigrationState *); > +bool migration_is_precopy(void); > void add_migration_state_change_notifier(Notifier *notify); > void remove_migration_state_change_notifier(Notifier *notify); > bool migration_in_setup(MigrationState *); > diff --git a/migration/migration.c b/migration/migration.c > index 187ac0410c..0082880279 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -1795,6 +1795,13 @@ bool migration_is_active(MigrationState *s) > s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE); > } > > +bool migration_is_precopy(void) > +{ > +MigrationState *s = migrate_get_current(); > + > +return s && s->state == MIGRATION_STATUS_ACTIVE; > +} > + > void migrate_init(MigrationState *s) > { > /* > -- > 2.24.1 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [PATCH v5 11/31] qcow2: Add offset_into_subcluster() and size_to_subclusters()
On Tue 05 May 2020 09:42:08 PM CEST, Eric Blake wrote: > On 5/5/20 12:38 PM, Alberto Garcia wrote: >> Like offset_into_cluster() and size_to_clusters(), but for >> subclusters. >> >> Signed-off-by: Alberto Garcia >> --- >> block/qcow2.h | 10 ++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/block/qcow2.h b/block/qcow2.h >> index e68febb15b..8b1ed1cbcf 100644 >> --- a/block/qcow2.h >> +++ b/block/qcow2.h >> @@ -537,11 +537,21 @@ static inline int64_t >> offset_into_cluster(BDRVQcow2State *s, int64_t offset) >> return offset & (s->cluster_size - 1); >> } >> >> +static inline int64_t offset_into_subcluster(BDRVQcow2State *s, int64_t >> offset) >> +{ >> +return offset & (s->subcluster_size - 1); >> +} >> + >> static inline uint64_t size_to_clusters(BDRVQcow2State *s, uint64_t size) >> { >> return (size + (s->cluster_size - 1)) >> s->cluster_bits; >> } > > Pre-existing, but this could use DIV_ROUND_UP. Yeah but it would be nicer to have a version of the macro optimized for powers of two. Berto
Re: [PATCH v2 08/13] acpi: generic event device for x86
On Tue, May 05, 2020 at 05:03:16PM +0200, Igor Mammedov wrote: > On Tue, 5 May 2020 15:43:00 +0200 > Gerd Hoffmann wrote: > > > Uses TYPE_ACPI_GED as QOM parent for code reuse. > > Add registers for sleep states and reset. > > Add powerdown notifier for power button events. > registers aren't x86 specific per spec, > can we put these registers into TYPE_ACPI_GED > and enable the optionally like is done with memory hotplug registers > in acpi_ged_initfn() Sure, will do. > > Set AcpiDeviceIfClass->madt_cpu. > that's the only reason I could justify adding x86 specific flavour. Also the powerdown notifier. Seems the workflow is slightly different on x86 (acpi device registers notifier) and arm (machine registers notifier and calls acpidev->send_event). take care, Gerd
Re: [PATCH v18 QEMU 16/18] vfio: Add ioctl to get dirty pages bitmap during dma unmap.
On Tue, 5 May 2020 04:14:51 +0530 Kirti Wankhede wrote: > With vIOMMU, IO virtual address range can get unmapped while in pre-copy > phase of migration. In that case, unmap ioctl should return pages pinned > in that range and QEMU should find its correcponding guest physical > addresses and report those dirty. > > Note: This patch is not yet tested. I'm trying to see how I can test this > code path. This remark should go beneath the '---' line, so that it does not end up in the final commit. > > Suggested-by: Alex Williamson > Signed-off-by: Kirti Wankhede > Reviewed-by: Neo Jia > --- > hw/vfio/common.c | 79 > +--- > 1 file changed, 75 insertions(+), 4 deletions(-) > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > index 4277b275ca21..b94e2bcb1178 100644 > --- a/hw/vfio/common.c > +++ b/hw/vfio/common.c > @@ -311,11 +311,77 @@ static bool vfio_devices_are_stopped_and_saving(void) > return true; > } > > +static bool vfio_devices_are_running_and_saving(void) Maybe s/are/all/ to make it sure that the scope is *all* vfio devices here? Is there any global state for this which we could use to check this in a simpler way? > +{ > +VFIOGroup *group; > +VFIODevice *vbasedev; > + > +QLIST_FOREACH(group, &vfio_group_list, next) { > +QLIST_FOREACH(vbasedev, &group->device_list, next) { > +if ((vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) && > +(vbasedev->device_state & VFIO_DEVICE_STATE_RUNNING)) { > +continue; > +} else { > +return false; > +} > +} > +} > +return true; > +} > + > +static int vfio_dma_unmap_bitmap(VFIOContainer *container, > + hwaddr iova, ram_addr_t size, > + IOMMUTLBEntry *iotlb) > +{ > +struct vfio_iommu_type1_dma_unmap *unmap; > +struct vfio_bitmap *bitmap; > +uint64_t pages = TARGET_PAGE_ALIGN(size) >> TARGET_PAGE_BITS; > +int ret; > + > +unmap = g_malloc0(sizeof(*unmap) + sizeof(*bitmap)); g_malloc0 cannot fail (it will abort). If you want to be able to tolerate memory allocation failure, you should use g_try_malloc0(). > +if (!unmap) { > +return -ENOMEM; > +} > + > +unmap->argsz = sizeof(*unmap) + sizeof(*bitmap); > +unmap->flags |= VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP; > +bitmap = (struct vfio_bitmap *)&unmap->data; > + > +/* > + * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of > + * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap_pgsize to > + * TARGET_PAGE_SIZE. > + */ > + > +bitmap->pgsize = TARGET_PAGE_SIZE; > +bitmap->size = ROUND_UP(pages / 8, sizeof(uint64_t)); > +bitmap->data = g_malloc0(bitmap->size); > +if (!bitmap->data) { > +error_report("UNMAP: Error allocating bitmap of size 0x%llx", > + bitmap->size); > +g_free(unmap); > +return -ENOMEM; > +} > + > +ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, unmap); > +if (!ret) { > +cpu_physical_memory_set_dirty_lebitmap((uint64_t *)bitmap->data, > +iotlb->translated_addr, pages); > +} else { > +error_report("VFIO_UNMAP_DMA with DIRTY_BITMAP : %d", -errno); > +} > + > +g_free(bitmap->data); > +g_free(unmap); > +return ret; > +} > +
Re: [PATCH v2 09/13] microvm: add minimal acpi support
> > +/* FIXME: copy & paste */ > > +static void acpi_dsdt_add_power_button(Aml *scope) > > +{ > > +Aml *dev = aml_device(ACPI_POWER_BUTTON_DEVICE); > > +aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C0C"))); > > +aml_append(dev, aml_name_decl("_UID", aml_int(0))); > > +aml_append(scope, dev); > > +} > > could be unified with ARM's version Yep. Suggestions for a good place? hw/acpi/aml-build.c ? > > +acpi_align_size(tables_blob, ACPI_BUILD_TABLE_SIZE); > > +acpi_align_size(tables->linker->cmd_blob, ACPI_BUILD_ALIGN_SIZE); > I'd drop these as that was mostly to counter various migration issues on > legacy Dropped. take care, Gerd
Re: [PATCH v3 09/33] block: Add generic bdrv_inherited_options()
Am 18.02.2020 um 13:42 hat Max Reitz geschrieben: > After the series this patch belongs to, we want to have a common > BdrvChildClass that encompasses all of child_file, child_format, and > child_backing. Such a single class needs a single .inherit_options() > implementation, and this patch introduces it. > > The next patch will show how the existing implementations can fall back > to it just by passing appropriate BdrvChildRole and parent_is_format > values. > > Signed-off-by: Max Reitz > Reviewed-by: Eric Blake > --- > block.c | 84 + > 1 file changed, 84 insertions(+) > > diff --git a/block.c b/block.c > index c33f0e9b42..9179b9b604 100644 > --- a/block.c > +++ b/block.c > @@ -998,6 +998,90 @@ static void bdrv_temp_snapshot_options(int *child_flags, > QDict *child_options, > *child_flags &= ~BDRV_O_NATIVE_AIO; > } > > +/* > + * Returns the options and flags that a generic child of a BDS should > + * get, based on the given options and flags for the parent BDS. > + */ > +static void __attribute__((unused)) > +bdrv_inherited_options(BdrvChildRole role, bool parent_is_format, > + int *child_flags, QDict *child_options, > + int parent_flags, QDict *parent_options) > +{ > +int flags = parent_flags; > + > +/* > + * First, decide whether to set, clear, or leave BDRV_O_PROTOCOL. > + * Generally, the question to answer is: Should this child be > + * format-probed by default? > + */ > + > +/* > + * Pure and non-filtered data children of non-format nodes should > + * be probed by default (even when the node itself has BDRV_O_PROTOCOL > + * set). This only affects a very limited set of drivers (namely > + * quorum and blkverify when this comment was written). > + * Force-clear BDRV_O_PROTOCOL then. > + */ > +if (!parent_is_format && > +(role & (BDRV_CHILD_DATA | BDRV_CHILD_METADATA | > + BDRV_CHILD_FILTERED)) == > +BDRV_CHILD_DATA) You could avoid the odd indentation (I can't decide whether or not it should be one space more to align correctly) and probably also make the expression more readable if you split it into: (role & BDRV_CHILD_DATA) && !(role & (BDRV_CHILD_METADATA | BDRV_CHILD_FILTERED)) > +{ > +flags &= ~BDRV_O_PROTOCOL; > +} > + > +/* > + * All children of format nodes (except for COW children) and all > + * metadata children in general should never be format-probed. > + * Force-set BDRV_O_PROTOCOL then. > + */ > +if ((parent_is_format && !(role & BDRV_CHILD_COW)) || > +(role & BDRV_CHILD_METADATA)) > +{ > +flags |= BDRV_O_PROTOCOL; > +} > + > +/* > + * If the cache mode isn't explicitly set, inherit direct and no-flush > from > + * the parent. > + */ > +qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_DIRECT); > +qdict_copy_default(child_options, parent_options, > BDRV_OPT_CACHE_NO_FLUSH); > +qdict_copy_default(child_options, parent_options, BDRV_OPT_FORCE_SHARE); > + > +if (role & BDRV_CHILD_COW) { > +/* backing files are always opened read-only */ Not "always", just by default. > +qdict_set_default_str(child_options, BDRV_OPT_READ_ONLY, "on"); > +qdict_set_default_str(child_options, BDRV_OPT_AUTO_READ_ONLY, "off"); > +} else { > +/* Inherit the read-only option from the parent if it's not set */ > +qdict_copy_default(child_options, parent_options, > BDRV_OPT_READ_ONLY); > +qdict_copy_default(child_options, parent_options, > + BDRV_OPT_AUTO_READ_ONLY); > +} > + > +if (parent_is_format && !(role & BDRV_CHILD_COW)) { > +/* > + * Our format drivers take care to send flushes and respect > + * unmap policy, so we can default to enable both on lower > + * layers regardless of the corresponding parent options. > + */ > +qdict_set_default_str(child_options, BDRV_OPT_DISCARD, "unmap"); > +} Why the restriction to format here? Don't we break "unmap" propagation through filters with this? It would probably also be a good question why we don't propagate it to the backing file, but this is preexisting. > +/* Clear flags that only apply to the top layer */ > +flags &= ~(BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING | BDRV_O_COPY_ON_READ); > + > +if (role & BDRV_CHILD_METADATA) { > +flags &= ~BDRV_O_NO_IO; > +} Hm... This is interesting, but I guess it makes sense. It just never was a hard rule that a format driver must not access data even internally with NO_IO, but I think it holds true. > +if (role & BDRV_CHILD_COW) { > +flags &= ~BDRV_O_TEMPORARY; > +} We could probably have a long discussion about whether this is right in theory, but in practice BDRV_O_TEMPORARY only exists for snapshot=on, for whi
RE: [PATCH v2 04/10] net: cadence_gem: Define access permission for interrupt registers
Hi Edgar, > -Original Message- > From: Edgar E. Iglesias > Sent: Monday, May 4, 2020 8:27 PM > To: Sai Pavan Boddu > Cc: Alistair Francis ; Peter Maydell > ; Jason Wang ; Markus > Armbruster ; Philippe Mathieu-Daudé > ; Tong Ho ; Ramon Fried > ; qemu-...@nongnu.org; qemu- > de...@nongnu.org > Subject: Re: [PATCH v2 04/10] net: cadence_gem: Define access permission > for interrupt registers > > On Mon, May 04, 2020 at 07:36:02PM +0530, Sai Pavan Boddu wrote: > > Q1 to Q7 ISR's are clear-on-read, IER/IDR registers are write-only, > > mask reg are read-only. > > > > Signed-off-by: Sai Pavan Boddu > > --- > > hw/net/cadence_gem.c | 14 ++ > > 1 file changed, 14 insertions(+) > > > > diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c index > > a930bf1..c532a14 100644 > > --- a/hw/net/cadence_gem.c > > +++ b/hw/net/cadence_gem.c > > @@ -458,6 +458,7 @@ static const uint8_t broadcast_addr[] = { 0xFF, 0xFF, > 0xFF, 0xFF, 0xFF, 0xFF }; > > */ > > static void gem_init_register_masks(CadenceGEMState *s) { > > +unsigned int i; > > /* Mask of register bits which are read only */ > > memset(&s->regs_ro[0], 0, sizeof(s->regs_ro)); > > s->regs_ro[GEM_NWCTRL] = 0xFFF8; > > @@ -470,10 +471,19 @@ static void > gem_init_register_masks(CadenceGEMState *s) > > s->regs_ro[GEM_ISR] = 0x; > > s->regs_ro[GEM_IMR] = 0x; > > s->regs_ro[GEM_MODID]= 0x; > > +for (i = 0; i < s->num_priority_queues; i++) { > > +s->regs_ro[GEM_INT_Q1_STATUS + i] = 0x; > > +s->regs_ro[GEM_INT_Q1_ENABLE + i] = 0xE319; > > +s->regs_ro[GEM_INT_Q1_DISABLE + i] = 0xE319; > > Shouldn't these be 0xf319? [Sai Pavan Boddu] This one is right. I would fix it thanks. Regards, Sai Pavan > Perhaps I'm looking at old specs but mine says bits upper bits [31:12] are > reserved and read-only. > > > With that fixed: > Reviewed-by: Edgar E. Iglesias > > > > > > > +s->regs_ro[GEM_INT_Q1_MASK + i] = 0x; > > > +} > > > > /* Mask of register bits which are clear on read */ > > memset(&s->regs_rtc[0], 0, sizeof(s->regs_rtc)); > > s->regs_rtc[GEM_ISR] = 0x; > > +for (i = 0; i < s->num_priority_queues; i++) { > > +s->regs_rtc[GEM_INT_Q1_STATUS + i] = 0x0CE6; > > +} > > > > /* Mask of register bits which are write 1 to clear */ > > memset(&s->regs_w1c[0], 0, sizeof(s->regs_w1c)); @@ -485,6 > > +495,10 @@ static void gem_init_register_masks(CadenceGEMState *s) > > s->regs_wo[GEM_NWCTRL] = 0x00073E60; > > s->regs_wo[GEM_IER] = 0x07FF; > > s->regs_wo[GEM_IDR] = 0x07FF; > > +for (i = 0; i < s->num_priority_queues; i++) { > > +s->regs_wo[GEM_INT_Q1_ENABLE + i] = 0x0CE6; > > +s->regs_wo[GEM_INT_Q1_DISABLE + i] = 0x0CE6; > > +} > > } > > > > /* > > -- > > 2.7.4 > >
Re: [PATCH Kernel v18 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.
On Mon, 4 May 2020 21:28:56 +0530 Kirti Wankhede wrote: > VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations: > - Start dirty pages tracking while migration is active > - Stop dirty pages tracking. > - Get dirty pages bitmap. Its user space application's responsibility to > copy content of dirty pages from source to destination during migration. > > To prevent DoS attack, memory for bitmap is allocated per vfio_dma > structure. Bitmap size is calculated considering smallest supported page > size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled > > Bitmap is populated for already pinned pages when bitmap is allocated for > a vfio_dma with the smallest supported page size. Update bitmap from > pinning functions when tracking is enabled. When user application queries > bitmap, check if requested page size is same as page size used to > populated bitmap. If it is equal, copy bitmap, but if not equal, return > error. > > Fixed below error by changing pgsize type from uint64_t to size_t. > Reported-by: kbuild test robot > > All errors: > drivers/vfio/vfio_iommu_type1.c:197: undefined reference to `__udivdi3' > > drivers/vfio/vfio_iommu_type1.c:225: undefined reference to `__udivdi3' Move that below the '---' delimiter so that it does not end up in the commit? (Crediting the build bot is fine, but the details are not really useful when you look at the code later.) > > Signed-off-by: Kirti Wankhede > Reviewed-by: Neo Jia > --- > drivers/vfio/vfio_iommu_type1.c | 266 > +++- > 1 file changed, 260 insertions(+), 6 deletions(-) > @@ -2278,6 +2435,93 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, > > return copy_to_user((void __user *)arg, &unmap, minsz) ? > -EFAULT : 0; > + } else if (cmd == VFIO_IOMMU_DIRTY_PAGES) { > + struct vfio_iommu_type1_dirty_bitmap dirty; > + uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START | > + VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP | > + VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP; > + int ret = 0; > + > + if (!iommu->v2) > + return -EACCES; > + > + minsz = offsetofend(struct vfio_iommu_type1_dirty_bitmap, > + flags); > + > + if (copy_from_user(&dirty, (void __user *)arg, minsz)) > + return -EFAULT; > + > + if (dirty.argsz < minsz || dirty.flags & ~mask) > + return -EINVAL; > + > + /* only one flag should be set at a time */ > + if (__ffs(dirty.flags) != __fls(dirty.flags)) > + return -EINVAL; > + Shouldn't you also check whether the flag that is set is actually valid? (maybe dirty.flags & ~VFIO_IOMMU_DIRTY_PAGES_FLAG_MASK and do a switch/case over dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_MASK)
Re: [PATCH RFC 3/4] vl: Sync dirty bits for system resets during precopy
* Peter Xu (pet...@redhat.com) wrote: > System resets will also reset system memory layout. Although the memory > layout > after the reset should probably the same as before the reset, we still need to > do frequent memory section removals and additions during the reset process. > Those operations could accidentally lose per-mem-section information like KVM > memslot dirty bitmaps. > > Previously we keep those dirty bitmaps by sync it during memory removal. > However that's hard to make it right after all [1]. Instead, we sync dirty > pages before system reset if we know we're during a precopy migration. This > should solve the same problem explicitly. > > [1] https://lore.kernel.org/qemu-devel/20200327150425.GJ422390@xz-x1/ I think the problem is knowing whether this is sufficient or whether you definitely need to do it for other cases of removal/readd. However, assuming we need to do it during reset, how do we know this is the right point to do it, and not something further inside the reset process? Is this just on the basis that vcpus are stopped? Dave > Signed-off-by: Peter Xu > --- > softmmu/vl.c | 16 > 1 file changed, 16 insertions(+) > > diff --git a/softmmu/vl.c b/softmmu/vl.c > index 32c0047889..8f864fee43 100644 > --- a/softmmu/vl.c > +++ b/softmmu/vl.c > @@ -1387,6 +1387,22 @@ void qemu_system_reset(ShutdownCause reason) > > cpu_synchronize_all_states(); > > +/* > + * System reboot could reset memory layout. Although the final status of > + * the memory layout should be the same as before the reset, the memory > + * sections can still be removed and added back frequently due to the > reset > + * process. This could potentially drop dirty bits in track for those > + * memory sections before the reset. > + * > + * Do a global dirty sync before the reset happens if we are during a > + * precopy, so we don't lose the dirty bits during the memory shuffles. > + */ > +if (migration_is_precopy()) { > +WITH_RCU_READ_LOCK_GUARD() { > +migration_bitmap_sync_precopy(); > +} > +} > + > if (mc && mc->reset) { > mc->reset(current_machine); > } else { > -- > 2.24.1 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
RE: [PATCH v2 05/10] net: cadence_gem: Set ISR according to queue in use
Hi Edgar, > -Original Message- > From: Edgar E. Iglesias > Sent: Monday, May 4, 2020 8:32 PM > To: Sai Pavan Boddu > Cc: Alistair Francis ; Peter Maydell > ; Jason Wang ; Markus > Armbruster ; Philippe Mathieu-Daudé > ; Tong Ho ; Ramon Fried > ; qemu-...@nongnu.org; qemu- > de...@nongnu.org > Subject: Re: [PATCH v2 05/10] net: cadence_gem: Set ISR according to queue > in use > > On Mon, May 04, 2020 at 07:36:03PM +0530, Sai Pavan Boddu wrote: > > Set ISR according to queue in use, added interrupt support for all > > queues. > > Would it help to add a gem_set_isr(CadenceGEMState *s, int q, uint32_t > flag) ? > Instead of open coding these if (q == 0) else... all over the place... [Sai Pavan Boddu] Yeah, it would be nice. Will try to include this in v3 Thanks, Sai Pavan > > Anyway, the logic looks good to me: > Reviewed-by: Edgar E. Iglesias > > > > > > > Signed-off-by: Sai Pavan Boddu > > --- > > hw/net/cadence_gem.c | 31 ++- > > 1 file changed, 22 insertions(+), 9 deletions(-) > > > > diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c index > > c532a14..beb38ec 100644 > > --- a/hw/net/cadence_gem.c > > +++ b/hw/net/cadence_gem.c > > @@ -896,7 +896,13 @@ static void gem_get_rx_desc(CadenceGEMState > *s, int q) > > if (rx_desc_get_ownership(s->rx_desc[q]) == 1) { > > DB_PRINT("descriptor 0x%" HWADDR_PRIx " owned by sw.\n", > desc_addr); > > s->regs[GEM_RXSTATUS] |= GEM_RXSTATUS_NOBUF; > > -s->regs[GEM_ISR] |= GEM_INT_RXUSED & ~(s->regs[GEM_IMR]); > > +if (q == 0) { > > +s->regs[GEM_ISR] |= GEM_INT_RXUSED & ~(s->regs[GEM_IMR]); > > +} else { > > +s->regs[GEM_INT_Q1_STATUS + q - 1] |= GEM_INT_RXUSED & > > + ~(s->regs[GEM_INT_Q1_MASK + q - > > 1]); > > +} > > + > > /* Handle interrupt consequences */ > > gem_update_int_status(s); > > } > > @@ -1071,8 +1077,12 @@ static ssize_t gem_receive(NetClientState *nc, > const uint8_t *buf, size_t size) > > gem_receive_updatestats(s, buf, size); > > > > s->regs[GEM_RXSTATUS] |= GEM_RXSTATUS_FRMRCVD; > > -s->regs[GEM_ISR] |= GEM_INT_RXCMPL & ~(s->regs[GEM_IMR]); > > - > > +if (q == 0) { > > +s->regs[GEM_ISR] |= GEM_INT_RXCMPL & ~(s->regs[GEM_IMR]); > > +} else { > > +s->regs[GEM_INT_Q1_STATUS + q - 1] |= GEM_INT_RXCMPL & > > + ~(s->regs[GEM_INT_Q1_MASK + q - 1]); > > +} > > /* Handle interrupt consequences */ > > gem_update_int_status(s); > > > > @@ -1223,12 +1233,12 @@ static void gem_transmit(CadenceGEMState > *s) > > DB_PRINT("TX descriptor next: 0x%08x\n", > > s->tx_desc_addr[q]); > > > > s->regs[GEM_TXSTATUS] |= GEM_TXSTATUS_TXCMPL; > > -s->regs[GEM_ISR] |= GEM_INT_TXCMPL & ~(s->regs[GEM_IMR]); > > - > > +if (q == 0) { > > +s->regs[GEM_ISR] |= GEM_INT_TXCMPL & ~(s- > >regs[GEM_IMR]); > > +} else { > > /* Update queue interrupt status */ > > -if (s->num_priority_queues > 1) { > > -s->regs[GEM_INT_Q1_STATUS + q] |= > > -GEM_INT_TXCMPL & ~(s->regs[GEM_INT_Q1_MASK + > > q]); > > +s->regs[GEM_INT_Q1_STATUS + q - 1] |= > > +GEM_INT_TXCMPL & ~s->regs[GEM_INT_Q1_MASK > > + + q - 1]; > > } > > > > /* Handle interrupt consequences */ @@ -1280,7 > > +1290,10 @@ static void gem_transmit(CadenceGEMState *s) > > > > if (tx_desc_get_used(desc)) { > > s->regs[GEM_TXSTATUS] |= GEM_TXSTATUS_USED; > > -s->regs[GEM_ISR] |= GEM_INT_TXUSED & ~(s->regs[GEM_IMR]); > > +/* IRQ TXUSED is defined only for queue 0 */ > > +if (q == 0) { > > +s->regs[GEM_ISR] |= GEM_INT_TXUSED & ~(s- > >regs[GEM_IMR]); > > +} > > gem_update_int_status(s); > > } > > } > > -- > > 2.7.4 > >
Re: [PATCH v25 08/10] ACPI: Record Generic Error Status Block(GESB) table
On 2020/5/5 18:44, Igor Mammedov wrote: >> Signed-off-by: Dongjiu Geng >> Signed-off-by: Xiang Zheng > Reviewed-by: Igor Mammedov > > Also we should ratelimit error messages that could be triggered at runtime > from acpi_ghes_record_errors() and functions it's calling. > It could be a patch on top. Ok, thanks Igor. I can make another patch for that after this series is applied. > >
Re: [INFO] Some preliminary performance data
Aleksandar Markovic writes: Some preliminary thoughts >> Hi, all. >> >> I just want to share with you some bits and pieces of data that I got >> while doing some preliminary experimentation for the GSoC project "TCG >> Continuous Benchmarking", that Ahmed Karaman, a student of the fourth final >> year of Electical Engineering Faculty in Cairo, will execute. >> >> *User Mode* >> >>* As expected, for any program dealing with any substantional >> floating-point calculation, softfloat library will be the the heaviest CPU >> cycles consumer. >>* We plan to examine the performance behaviour of non-FP programs >> (integer arithmetic), or even non-numeric programs (sorting strings, for >> example). Emilio was the last person to do extensive bench-marking on TCG and he used a mild fork of the venerable nbench: https://github.com/cota/dbt-bench as the hot code is fairly small it offers a good way of testing quality of the output. Larger programs will differ as they can involve more code generation. >> >> *System Mode* >> >>* I did profiling of booting several machines using a tool called >> callgrind (a part of valgrind). The tool offers pletora of information, >> however it looks it is little confused by usage of coroutines, and that >> makes some of its reports look very illogical, or plain ugly. Doesn't running through valgrind inherently serialise execution anyway? If you are looking for latency caused by locks we have support for the QEMU sync profiler built into the code. See "help sync-profile" on the HMP. >> Still, it >> seems valid data can be extracted from it. Without going into details, here >> is what it says for one machine (bear in mind that results may vary to a >> great extent between machines): You can also use perf to use sampling to find hot points in the code. One of last years GSoC student wrote some patches that included the ability to dump a jit info file for perf to consume. We never got it merged in the end but it might be worth having a go at pulling the relevant bits out from: Subject: [PATCH v9 00/13] TCG code quality tracking and perf integration Date: Mon, 7 Oct 2019 16:28:26 +0100 Message-Id: <20191007152839.30804-1-alex.ben...@linaro.org> >> ** The booting involved six threads, one for display handling, one >> for emulations, and four more. The last four did almost nothing during >> boot, just almost entire time siting idle, waiting for something. As far as >> "Total Instruction Fetch Count" (this is the main measure used in >> callgrind), they were distributed in proportion 1:3 between display thread >> and emulation thread (the rest of threads were negligible) (but, >> interestingly enough, for another machine that proportion was 1:20). >> ** The display thread is dominated by vga_update_display() function >> (21.5% "self" time, and 51.6% "self + callees" time, called almost 4 >> times). Other functions worth mentioning are >> cpu_physical_memory_snapshot_get_dirty() and >> memory_region_snapshot_get_dirty(), which are very small functions, but are >> both invoked over 26 000 000 times, and contribute with over 20% of display >> thread instruction fetch count together. The memory region tracking code will end up forcing the slow path for a lot of memory accesses to video memory via softmmu. You may want to measure if there is a difference using one of the virtio based graphics displays. >> ** Focusing now on emulation thread, "Total Instruction Fetch Counts" >> were roughly distributed this way: >>- 15.7% is execution of GIT-ed code from translation block >> buffer >>- 39.9% is execution of helpers >>- 44.4% is code translation stage, including some coroutine >> activities >> Top two among helpers: >> - helper_le_stl_memory() I assume that is the MMU slow-path being called from the generated code. >> - helper_lookup_tb_ptr() (this one is invoked whopping 36 000 >> 000 times) This is an optimisation to avoid exiting the run-loop to find the next block. From memory I think the two main cases you'll see are: - computed jumps (i.e. target not known at JIT time) - jumps outside of the current page >> Single largest instruction consumer of code translation: >> - liveness_pass_1(), that constitutes 21.5% of the entire >> "emulation thread" consumption, or, in other way, almost half of code >> translation stage (that sits at 44.4%) This is very much driven by how much code generation vs running you see. In most of my personal benchmarks I never really notice code generation because I give my machines large amounts of RAM so code tends to stay resident so not need to be re-translated. When the optimiser shows up it's usually accompanied by high TB flush and invalidate counts in "info jit" because we are doing more translation that we usually do. I'll also mention my foray into tracking down the performance regression of DOSBox Doom: https
Re: [PATCH v2 09/18] s390x/cpumodel: Fix UI to CPU features pcc-cmac-{aes,eaes}-256
On Tue, 5 May 2020 17:29:17 +0200 Markus Armbruster wrote: > Both s390_features[S390_FEAT_PCC_CMAC_AES_256].name and > s390_features[S390_FEAT_PCC_CMAC_EAES_256].name is > "pcc-cmac-eaes-256". The former is obviously a pasto. > > Impact: > > * s390_feat_bitmap_to_ascii() misidentifies S390_FEAT_PCC_CMAC_AES_256 > as "pcc-cmac-eaes-256". Affects QMP commands query-cpu-definitions, > query-cpu-model-expansion, query-cpu-model-baseline, > query-cpu-model-comparison, and the error message when > s390_realize_cpu_model() fails in check_compatibility(). > > * s390_cpu_list() also misidentifies it. Affects -cpu help. > > * s390_cpu_model_register_props() creates CPU property > "pcc-cmac-eaes-256" twice. The second one fails, but the error is > ignored (a later commit will change that). Results in a single > property "pcc-cmac-eaes-256" with the description for > S390_FEAT_PCC_CMAC_AES_256, and no property for > S390_FEAT_PCC_CMAC_EAES_256. CPU properties are visible in CLI -cpu > and -device, QMP & HMP device_add, QMP device-list-properties, and > QOM introspection. > > Fix by deleting the wayward 'e'. > > Fixes: 782417446279717aa85320191a519b51f6d5dd31 I like the more standard Fixes: 782417446279 ("s390x/cpumodel: introduce CPU features") for that. > Cc: Halil Pasic > Cc: Cornelia Huck > Cc: Christian Borntraeger > Cc: Richard Henderson > Cc: David Hildenbrand > Cc: qemu-s3...@nongnu.org > Signed-off-by: Markus Armbruster > Reviewed-by: David Hildenbrand > Tested-by: Christian Borntraeger > --- > target/s390x/cpu_features_def.inc.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Cornelia Huck I assume you'll take this one together with the rest of the series?
Re: [PATCH v2 2/9] hw/net/xilinx_axienet: Cleanup stream->push assignment
On 5/6/20 10:25 AM, Edgar E. Iglesias wrote: From: "Edgar E. Iglesias" Split the shared stream_class_init function to assign stream->push with better type-safety. Reviewed-by: Alistair Francis Reviewed-by: Francisco Iglesias Signed-off-by: Edgar E. Iglesias Reviewed-by: Philippe Mathieu-Daudé --- hw/net/xilinx_axienet.c | 18 -- 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c index 0f97510d8a..84073753d7 100644 --- a/hw/net/xilinx_axienet.c +++ b/hw/net/xilinx_axienet.c @@ -1029,11 +1029,19 @@ static void xilinx_enet_class_init(ObjectClass *klass, void *data) dc->reset = xilinx_axienet_reset; } -static void xilinx_enet_stream_class_init(ObjectClass *klass, void *data) +static void xilinx_enet_control_stream_class_init(ObjectClass *klass, + void *data) { StreamSlaveClass *ssc = STREAM_SLAVE_CLASS(klass); -ssc->push = data; +ssc->push = xilinx_axienet_control_stream_push; +} + +static void xilinx_enet_data_stream_class_init(ObjectClass *klass, void *data) +{ +StreamSlaveClass *ssc = STREAM_SLAVE_CLASS(klass); + +ssc->push = xilinx_axienet_data_stream_push; } static const TypeInfo xilinx_enet_info = { @@ -1048,8 +1056,7 @@ static const TypeInfo xilinx_enet_data_stream_info = { .name = TYPE_XILINX_AXI_ENET_DATA_STREAM, .parent= TYPE_OBJECT, .instance_size = sizeof(struct XilinxAXIEnetStreamSlave), -.class_init= xilinx_enet_stream_class_init, -.class_data= xilinx_axienet_data_stream_push, +.class_init= xilinx_enet_data_stream_class_init, .interfaces = (InterfaceInfo[]) { { TYPE_STREAM_SLAVE }, { } @@ -1060,8 +1067,7 @@ static const TypeInfo xilinx_enet_control_stream_info = { .name = TYPE_XILINX_AXI_ENET_CONTROL_STREAM, .parent= TYPE_OBJECT, .instance_size = sizeof(struct XilinxAXIEnetStreamSlave), -.class_init= xilinx_enet_stream_class_init, -.class_data= xilinx_axienet_control_stream_push, +.class_init= xilinx_enet_control_stream_class_init, .interfaces = (InterfaceInfo[]) { { TYPE_STREAM_SLAVE }, { }
Re: [PATCH v2 3/9] hw/net/xilinx_axienet: Remove unncessary cast
On 5/6/20 10:25 AM, Edgar E. Iglesias wrote: From: "Edgar E. Iglesias" Remove unncessary cast, buf is already uint8_t *. Typo "unnecessary" here and in patch subject. Otherwise: Reviewed-by: Philippe Mathieu-Daudé No functional change. Reviewed-by: Alistair Francis Reviewed-by: Francisco Iglesias Signed-off-by: Edgar E. Iglesias --- hw/net/xilinx_axienet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c index 84073753d7..c8dfcda3ee 100644 --- a/hw/net/xilinx_axienet.c +++ b/hw/net/xilinx_axienet.c @@ -918,7 +918,7 @@ xilinx_axienet_data_stream_push(StreamSlave *obj, uint8_t *buf, size_t size) uint16_t csum; tmp_csum = net_checksum_add(size - start_off, -(uint8_t *)buf + start_off); +buf + start_off); /* Accumulate the seed. */ tmp_csum += s->hdr[2] & 0x;
Re: [PATCH v2 09/18] s390x/cpumodel: Fix UI to CPU features pcc-cmac-{aes,eaes}-256
On 05.05.20 17:29, Markus Armbruster wrote: > Both s390_features[S390_FEAT_PCC_CMAC_AES_256].name and > s390_features[S390_FEAT_PCC_CMAC_EAES_256].name is > "pcc-cmac-eaes-256". The former is obviously a pasto. > > Impact: > > * s390_feat_bitmap_to_ascii() misidentifies S390_FEAT_PCC_CMAC_AES_256 > as "pcc-cmac-eaes-256". Affects QMP commands query-cpu-definitions, > query-cpu-model-expansion, query-cpu-model-baseline, > query-cpu-model-comparison, and the error message when > s390_realize_cpu_model() fails in check_compatibility(). > > * s390_cpu_list() also misidentifies it. Affects -cpu help. > > * s390_cpu_model_register_props() creates CPU property > "pcc-cmac-eaes-256" twice. The second one fails, but the error is > ignored (a later commit will change that). Results in a single > property "pcc-cmac-eaes-256" with the description for > S390_FEAT_PCC_CMAC_AES_256, and no property for > S390_FEAT_PCC_CMAC_EAES_256. CPU properties are visible in CLI -cpu > and -device, QMP & HMP device_add, QMP device-list-properties, and > QOM introspection. > > Fix by deleting the wayward 'e'. You dropped the comment regarding msa4, was that intended? -- Thanks, David / dhildenb
Re: [PATCH v25 00/10] Add ARMv8 RAS virtualization support in QEMU
On 2020/4/17 21:32, Peter Maydell wrote: > On Fri, 10 Apr 2020 at 12:46, Dongjiu Geng wrote: >> >> In the ARMv8 platform, the CPU error types includes synchronous external >> abort(SEA) >> and SError Interrupt (SEI). If exception happens in guest, host does not >> know the detailed >> information of guest, so it is expected that guest can do the recovery. For >> example, if an >> exception happens in a guest user-space application, host does not know >> which application >> encounters errors, only guest knows it. >> >> For the ARMv8 SEA/SEI, KVM or host kernel delivers SIGBUS to notify >> userspace. >> After user space gets the notification, it will record the CPER into guest >> GHES >> buffer and inject an exception or IRQ to guest. >> >> In the current implementation, if the type of SIGBUS is BUS_MCEERR_AR, we >> will >> treat it as a synchronous exception, and notify guest with ARMv8 SEA >> notification type after recording CPER into guest. > > Hi. I left a comment on patch 1. The other 3 patches unreviewed > are 5, 6 and 8, which are all ACPI core code, so that's for > MST, Igor or Shannon to review. > > Once those have been reviewed, please ping me if you want this > to go via target-arm.next. Hi Peter, Igor have reviewed all ACPI core code. whether you can apply this series to target-arm.next? I can make another patches to solve your comments on patch1 and another APCI comment. Thanks very much in advance. > > thanks > -- PMM > > . >
Re: [PATCH v2 4/9] hw/dma/xilinx_axidma: Add DMA memory-region property
Hi Edgar, On 5/6/20 10:25 AM, Edgar E. Iglesias wrote: From: "Edgar E. Iglesias" Add DMA memory-region property to externally control what address-space this DMA operates on. Reviewed-by: Alistair Francis Reviewed-by: Francisco Iglesias Signed-off-by: Edgar E. Iglesias --- hw/dma/xilinx_axidma.c | 30 +++--- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c index 018f36991b..4540051448 100644 --- a/hw/dma/xilinx_axidma.c +++ b/hw/dma/xilinx_axidma.c @@ -33,6 +33,7 @@ #include "qemu/log.h" #include "qemu/module.h" +#include "sysemu/dma.h" #include "hw/stream.h" #define D(x) @@ -103,6 +104,7 @@ enum { }; struct Stream { +struct XilinxAXIDMA *dma; ptimer_state *ptimer; qemu_irq irq; @@ -125,6 +127,9 @@ struct XilinxAXIDMAStreamSlave { struct XilinxAXIDMA { SysBusDevice busdev; MemoryRegion iomem; +MemoryRegion *dma_mr; +AddressSpace as; + uint32_t freqhz; StreamSlave *tx_data_dev; StreamSlave *tx_control_dev; @@ -186,7 +191,7 @@ static void stream_desc_load(struct Stream *s, hwaddr addr) { struct SDesc *d = &s->desc; -cpu_physical_memory_read(addr, d, sizeof *d); +address_space_read(&s->dma->as, addr, MEMTXATTRS_UNSPECIFIED, d, sizeof *d); /* Convert from LE into host endianness. */ d->buffer_address = le64_to_cpu(d->buffer_address); @@ -204,7 +209,8 @@ static void stream_desc_store(struct Stream *s, hwaddr addr) d->nxtdesc = cpu_to_le64(d->nxtdesc); d->control = cpu_to_le32(d->control); d->status = cpu_to_le32(d->status); -cpu_physical_memory_write(addr, d, sizeof *d); +address_space_write(&s->dma->as, addr, MEMTXATTRS_UNSPECIFIED, +d, sizeof *d); } static void stream_update_irq(struct Stream *s) @@ -286,8 +292,9 @@ static void stream_process_mem2s(struct Stream *s, StreamSlave *tx_data_dev, txlen + s->pos); } -cpu_physical_memory_read(s->desc.buffer_address, - s->txbuf + s->pos, txlen); +address_space_read(&s->dma->as, s->desc.buffer_address, + MEMTXATTRS_UNSPECIFIED, + s->txbuf + s->pos, txlen); s->pos += txlen; if (stream_desc_eof(&s->desc)) { @@ -336,7 +343,8 @@ static size_t stream_process_s2mem(struct Stream *s, unsigned char *buf, rxlen = len; } -cpu_physical_memory_write(s->desc.buffer_address, buf + pos, rxlen); +address_space_write(&s->dma->as, s->desc.buffer_address, +MEMTXATTRS_UNSPECIFIED, buf + pos, rxlen); len -= rxlen; pos += rxlen; @@ -525,6 +533,7 @@ static void xilinx_axidma_realize(DeviceState *dev, Error **errp) XilinxAXIDMAStreamSlave *cs = XILINX_AXI_DMA_CONTROL_STREAM( &s->rx_control_dev); Error *local_err = NULL; +int i; object_property_add_link(OBJECT(ds), "dma", TYPE_XILINX_AXI_DMA, (Object **)&ds->dma, @@ -545,17 +554,19 @@ static void xilinx_axidma_realize(DeviceState *dev, Error **errp) goto xilinx_axidma_realize_fail; } -int i; - for (i = 0; i < 2; i++) { struct Stream *st = &s->streams[i]; +st->dma = s; st->nr = i; st->ptimer = ptimer_init(timer_hit, st, PTIMER_POLICY_DEFAULT); ptimer_transaction_begin(st->ptimer); ptimer_set_freq(st->ptimer, s->freqhz); ptimer_transaction_commit(st->ptimer); } + +address_space_init(&s->as, + s->dma_mr ? s->dma_mr : get_system_memory(), "dma"); I'd instead return an error (earlier in this function) if dma_mr property is not set. If you ever respin... Reviewed-by: Philippe Mathieu-Daudé return; xilinx_axidma_realize_fail: @@ -575,6 +586,11 @@ static void xilinx_axidma_init(Object *obj) &s->rx_control_dev, sizeof(s->rx_control_dev), TYPE_XILINX_AXI_DMA_CONTROL_STREAM, &error_abort, NULL); +object_property_add_link(obj, "dma", TYPE_MEMORY_REGION, + (Object **)&s->dma_mr, + qdev_prop_allow_set_link_before_realize, + OBJ_PROP_LINK_STRONG, + &error_abort); sysbus_init_irq(sbd, &s->streams[0].irq); sysbus_init_irq(sbd, &s->streams[1].irq);
Re: [PATCH v2 00/13] microvm: add acpi support
On Tue, May 05, 2020 at 10:04:02AM -0400, Michael S. Tsirkin wrote: > On Tue, May 05, 2020 at 03:42:52PM +0200, Gerd Hoffmann wrote: > > I know that not supporting ACPI in microvm is intentional. If you still > > don't want ACPI this is perfectly fine, you can use the usual -no-acpi > > switch to toggle ACPI support. > > > > These are the advantages you are going to loose then: > > > > (1) virtio-mmio device discovery without command line hacks (tweaking > > the command line is a problem when not using direct kernel boot). > > (2) Better IO-APIC support, we can use IRQ lines 16-23. > > (3) ACPI power button (aka powerdown request) works. > > (4) machine poweroff (aka S5 state) works. > > Questions > > - what's the tradeoff in startup time? In the noise. 0.28-0.29 seconds on my hardware to the "i8042: PNP: No PS/2 controller found" message, no matter whenever acpi is on or off. With "quiet" (acpi prints more and logging to the serial console is slow). At that point -no-acpi takes one second to figure the ps2 controller really isn't there (as discussed before). Another interesting difference is interrupt handling. The -no-acpi version: CPU0 2: 0XT-PIC cascade 4:284 IO-APIC 4-edge ttyS0 8: 0 IO-APIC 8-edge rtc0 14: 5399 IO-APIC 14-edge virtio1 15: 58 IO-APIC 15-edge virtio0 NMI: 0 Non-maskable interrupts [ ... ] The acpi version: CPU0 1: 0 IO-APIC 9-edge ACPI:Ged 2:231 IO-APIC 23-fasteoi virtio0 3: 6291 IO-APIC 22-fasteoi virtio1 4: 1758 IO-APIC 4-edge ttyS0 5: 0 IO-APIC 8-edge rtc0 NMI: 0 Non-maskable interrupts [ ... ] > - what should be the default? IMO it makes sense to enable it by default. You get working power management. You can boot stock cloud images (patched seabios parses the dsdt to find virtio-mmio devices to boot from virtio-mmio disks). It's easier to leave behind legacy stuff: The kernel trusts the firmware and doesn't go into "trying harder to find ps2 kbd" mode. Also what is this "cascade" thing in /proc/interrupts above? [1] I expect dropping the rtc is easier with acpi too, the kernel probably wouldn't try to find it then. Right now seabios needs rtc cmos for ram size probing, so I didn't test that yet. On the other hand I don't really see any disadvantages. The tables are small ... # find /sys/firmware/acpi/tables/ -type f | xargs ls -l -r. 1 root root 70 May 6 06:48 /sys/firmware/acpi/tables/APIC -r. 1 root root 472 May 6 06:48 /sys/firmware/acpi/tables/DSDT -r. 1 root root 268 May 6 06:48 /sys/firmware/acpi/tables/FACP ... and simple (no methods) so you can hardly call that "bloat". > Based on above I'd be inclined to say default should stay no acpi and > users should enable acpi with an option. I disagree, but I can live with off by default too. We already have acpi=OnOffAuto for X86MachineState, so it is just a matter of handling microvm.acpi=auto accordingly in x86_machine_is_acpi_enabled(). take care, Gerd [1] Rhetorical question, I know what it is. [2] [2] I don't want remember though.
Re: [PATCH v2 00/13] microvm: add acpi support
On Wed, May 06, 2020 at 01:46:35PM +0200, Gerd Hoffmann wrote: > On Tue, May 05, 2020 at 10:04:02AM -0400, Michael S. Tsirkin wrote: > > On Tue, May 05, 2020 at 03:42:52PM +0200, Gerd Hoffmann wrote: > > > I know that not supporting ACPI in microvm is intentional. If you still > > > don't want ACPI this is perfectly fine, you can use the usual -no-acpi > > > switch to toggle ACPI support. > > > > > > These are the advantages you are going to loose then: > > > > > > (1) virtio-mmio device discovery without command line hacks (tweaking > > > the command line is a problem when not using direct kernel boot). > > > (2) Better IO-APIC support, we can use IRQ lines 16-23. > > > (3) ACPI power button (aka powerdown request) works. > > > (4) machine poweroff (aka S5 state) works. > > > > Questions > > > > - what's the tradeoff in startup time? > > In the noise. 0.28-0.29 seconds on my hardware to the "i8042: PNP: No > PS/2 controller found" message, no matter whenever acpi is on or off. > With "quiet" (acpi prints more and logging to the serial console is > slow). > > At that point -no-acpi takes one second to figure the ps2 controller > really isn't there (as discussed before). > > Another interesting difference is interrupt handling. > > The -no-acpi version: > >CPU0 > 2: 0XT-PIC cascade > 4:284 IO-APIC 4-edge ttyS0 > 8: 0 IO-APIC 8-edge rtc0 > 14: 5399 IO-APIC 14-edge virtio1 > 15: 58 IO-APIC 15-edge virtio0 > NMI: 0 Non-maskable interrupts > [ ... ] > > The acpi version: > >CPU0 > 1: 0 IO-APIC 9-edge ACPI:Ged > 2:231 IO-APIC 23-fasteoi virtio0 > 3: 6291 IO-APIC 22-fasteoi virtio1 > 4: 1758 IO-APIC 4-edge ttyS0 > 5: 0 IO-APIC 8-edge rtc0 > NMI: 0 Non-maskable interrupts > [ ... ] > > > - what should be the default? > > IMO it makes sense to enable it by default. You get working > power management. You can boot stock cloud images (patched > seabios parses the dsdt to find virtio-mmio devices to boot > from virtio-mmio disks). > > It's easier to leave behind legacy stuff: The kernel trusts the > firmware and doesn't go into "trying harder to find ps2 kbd" mode. > Also what is this "cascade" thing in /proc/interrupts above? [1] > > I expect dropping the rtc is easier with acpi too, the kernel probably > wouldn't try to find it then. Right now seabios needs rtc cmos for > ram size probing, so I didn't test that yet. > > On the other hand I don't really see any disadvantages. The tables are > small ... > > # find /sys/firmware/acpi/tables/ -type f | xargs ls -l > -r. 1 root root 70 May 6 06:48 /sys/firmware/acpi/tables/APIC > -r. 1 root root 472 May 6 06:48 /sys/firmware/acpi/tables/DSDT > -r. 1 root root 268 May 6 06:48 /sys/firmware/acpi/tables/FACP > > ... and simple (no methods) so you can hardly call that "bloat". > > > Based on above I'd be inclined to say default should stay no acpi and > > users should enable acpi with an option. > > I disagree, but I can live with off by default too. We already have > acpi=OnOffAuto for X86MachineState, so it is just a matter of handling > microvm.acpi=auto accordingly in x86_machine_is_acpi_enabled(). > > take care, > Gerd > > [1] Rhetorical question, I know what it is. [2] > [2] I don't want remember though. Let's leave flipping the default as a separate patch, to be decided on merits after a bunch of people test with/without. -- MST
Re: [PATCH v2 5/9] hw/core: stream: Add an end-of-packet flag
Hi Edgar, On 5/6/20 10:25 AM, Edgar E. Iglesias wrote: From: "Edgar E. Iglesias" Some stream clients stream an endless stream of data while other clients stream data in packets. Stream interfaces usually have a way to signal the end of a packet or the last beat of a transfer. This adds an end-of-packet flag to the push interface. Reviewed-by: Alistair Francis Reviewed-by: Francisco Iglesias Signed-off-by: Edgar E. Iglesias --- include/hw/stream.h | 5 +++-- hw/core/stream.c| 4 ++-- hw/dma/xilinx_axidma.c | 10 ++ hw/net/xilinx_axienet.c | 14 ++ hw/ssi/xilinx_spips.c | 2 +- 5 files changed, 22 insertions(+), 13 deletions(-) diff --git a/include/hw/stream.h b/include/hw/stream.h index d02f62ca89..ed09e83683 100644 --- a/include/hw/stream.h +++ b/include/hw/stream.h @@ -39,12 +39,13 @@ typedef struct StreamSlaveClass { * @obj: Stream slave to push to * @buf: Data to write * @len: Maximum number of bytes to write + * @eop: End of packet flag */ -size_t (*push)(StreamSlave *obj, unsigned char *buf, size_t len); +size_t (*push)(StreamSlave *obj, unsigned char *buf, size_t len, bool eop); I'd split this patch, first add EOP in the push handler, keeping current code working, then the following patches (implementing the feature in the backend handlers), then ... } StreamSlaveClass; size_t -stream_push(StreamSlave *sink, uint8_t *buf, size_t len); +stream_push(StreamSlave *sink, uint8_t *buf, size_t len, bool eop); ... this final patch, enable the feature and let the frontends use it. bool stream_can_push(StreamSlave *sink, StreamCanPushNotifyFn notify, diff --git a/hw/core/stream.c b/hw/core/stream.c index 39b1e595cd..a65ad1208d 100644 --- a/hw/core/stream.c +++ b/hw/core/stream.c @@ -3,11 +3,11 @@ #include "qemu/module.h" size_t -stream_push(StreamSlave *sink, uint8_t *buf, size_t len) +stream_push(StreamSlave *sink, uint8_t *buf, size_t len, bool eop) { StreamSlaveClass *k = STREAM_SLAVE_GET_CLASS(sink); -return k->push(sink, buf, len); +return k->push(sink, buf, len, eop); So in this first part patch I'd use 'false' here, and update by 'eop' in the other part (last patch in series). Does it make sense? Regards, Phil. } bool diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c index 4540051448..a770e12c96 100644 --- a/hw/dma/xilinx_axidma.c +++ b/hw/dma/xilinx_axidma.c @@ -283,7 +283,7 @@ static void stream_process_mem2s(struct Stream *s, StreamSlave *tx_data_dev, if (stream_desc_sof(&s->desc)) { s->pos = 0; -stream_push(tx_control_dev, s->desc.app, sizeof(s->desc.app)); +stream_push(tx_control_dev, s->desc.app, sizeof(s->desc.app), true); } txlen = s->desc.control & SDESC_CTRL_LEN_MASK; @@ -298,7 +298,7 @@ static void stream_process_mem2s(struct Stream *s, StreamSlave *tx_data_dev, s->pos += txlen; if (stream_desc_eof(&s->desc)) { -stream_push(tx_data_dev, s->txbuf, s->pos); +stream_push(tx_data_dev, s->txbuf, s->pos, true); s->pos = 0; stream_complete(s); } @@ -384,7 +384,7 @@ static void xilinx_axidma_reset(DeviceState *dev) static size_t xilinx_axidma_control_stream_push(StreamSlave *obj, unsigned char *buf, - size_t len) + size_t len, bool eop) { XilinxAXIDMAStreamSlave *cs = XILINX_AXI_DMA_CONTROL_STREAM(obj); struct Stream *s = &cs->dma->streams[1]; @@ -416,12 +416,14 @@ xilinx_axidma_data_stream_can_push(StreamSlave *obj, } static size_t -xilinx_axidma_data_stream_push(StreamSlave *obj, unsigned char *buf, size_t len) +xilinx_axidma_data_stream_push(StreamSlave *obj, unsigned char *buf, size_t len, + bool eop) { XilinxAXIDMAStreamSlave *ds = XILINX_AXI_DMA_DATA_STREAM(obj); struct Stream *s = &ds->dma->streams[1]; size_t ret; +assert(eop); ret = stream_process_s2mem(s, buf, len); stream_update_irq(s); return ret; diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c index c8dfcda3ee..bd48305577 100644 --- a/hw/net/xilinx_axienet.c +++ b/hw/net/xilinx_axienet.c @@ -697,14 +697,14 @@ static void axienet_eth_rx_notify(void *opaque) axienet_eth_rx_notify, s)) { size_t ret = stream_push(s->tx_control_dev, (void *)s->rxapp + CONTROL_PAYLOAD_SIZE - - s->rxappsize, s->rxappsize); + - s->rxappsize, s->rxappsize, true); s->rxappsize -= ret; } while (s->rxsize && stream_can_push(s->tx_data_dev, axienet_eth_rx_notify, s)) { size_t ret = stream_push(s->tx_data_dev, (void *)s
Re: [PATCH v2 9/9] MAINTAINERS: Add myself as streams maintainer
On 5/6/20 10:25 AM, Edgar E. Iglesias wrote: From: "Edgar E. Iglesias" Since we're missing a maintainer, add myself. Reviewed-by: Alistair Francis Signed-off-by: Edgar E. Iglesias --- MAINTAINERS | 6 ++ 1 file changed, 6 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 1f84e3ae2c..d3663d6c9a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2315,6 +2315,12 @@ F: net/slirp.c F: include/net/slirp.h T: git https://people.debian.org/~sthibault/qemu.git slirp +Streams +M: Edgar E. Iglesias +S: Maintained +F: hw/core/stream.c +F: include/hw/stream.h + Stubs M: Paolo Bonzini S: Maintained Reviewed-by: Philippe Mathieu-Daudé
[PULL 02/14] .travis.yml: drop MacOSX
This keeps breaking on Travis so lets just fall back to the Cirrus CI builds which seem to be better maintained. Fix up the comments while we are doing this as we never had a windows build. Signed-off-by: Alex Bennée Reviewed-by: Philippe Mathieu-Daudé Acked-by: Richard Henderson Message-Id: <2020050505.4225-3-alex.ben...@linaro.org> diff --git a/.travis.yml b/.travis.yml index a4c3c6c8058..49267b73b36 100644 --- a/.travis.yml +++ b/.travis.yml @@ -9,9 +9,8 @@ compiler: cache: # There is one cache per branch and compiler version. # characteristics of each job are used to identify the cache: - # - OS name (currently, linux, osx, or windows) + # - OS name (currently only linux) # - OS distribution (for Linux, xenial, trusty, or precise) - # - macOS image name (e.g., xcode7.2) # - Names and values of visible environment variables set in .travis.yml or Settings panel timeout: 1200 ccache: true @@ -271,31 +270,6 @@ jobs: - TEST_CMD="" -# MacOSX builds - cirrus.yml also tests some MacOS builds including latest Xcode - -- name: "OSX Xcode 10.3" - env: -- BASE_CONFIG="--disable-docs --enable-tools" -- CONFIG="--target-list=i386-softmmu,ppc-softmmu,ppc64-softmmu,m68k-softmmu,x86_64-softmmu" - os: osx - osx_image: xcode10.3 - compiler: clang - addons: -homebrew: - packages: -- ccache -- glib -- pixman -- gnu-sed -- python - update: true - before_script: -- brew link --overwrite python -- export PATH="/usr/local/opt/ccache/libexec:$PATH" -- mkdir -p ${BUILD_DIR} && cd ${BUILD_DIR} -- ${SRC_DIR}/configure ${BASE_CONFIG} ${CONFIG} || { cat config.log && exit 1; } - - # Python builds - name: "GCC Python 3.5 (x86_64-softmmu)" env: -- 2.20.1