date:20200506

Re: [PATCH v2 5/9] block/io: expand in_flight inc/dec section: simple cases

2020-05-06 Thread Vladimir Sementsov-Ogievskiy


27.04.2020 17:39, Vladimir Sementsov-Ogievskiy wrote:

It's safer to expand in_flight request to start before enter to
coroutine in synchronous wrappers, due to the following (theoretical)
problem:

Consider write.
It's possible, that qemu_coroutine_enter only schedules execution,
assume such case.

Then we may possibly have the following:

1. Somehow check that we are not in drained section in outer code.

2. Call bdrv_pwritev(), assuming that it will increase in_flight, which
will protect us from starting drained section.

3. It calls bdrv_prwv_co() -> bdrv_coroutine_enter() (not yet increased
in_flight).

4. Assume coroutine not yet actually entered, only scheduled, and we go
to some code, which starts drained section (as in_flight is zero).

5. Scheduled coroutine starts, and blindly increases in_flight, and we
are in drained section with in_flight request.

Signed-off-by: Vladimir Sementsov-Ogievskiy 


Very interesting: this patch breaks test-replication. It hangs:

(gdb) thr a a bt

Thread 2 (Thread 0x7eff256cd700 (LWP 2843)):
#0  0x7eff2f5fd1fd in syscall () from /lib64/libc.so.6
#1  0x55af9a9a4f11 in qemu_futex_wait (f=0x55af9aa6f758 
, val=4294967295) at 
/work/src/qemu/up-expand-bdrv-in_flight-bounds/include/qemu/futex.h:29
#2  0x55af9a9a50d5 in qemu_event_wait (ev=0x55af9aa6f758 
) at util/qemu-thread-posix.c:459
#3  0x55af9a9bd20d in call_rcu_thread (opaque=0x0) at util/rcu.c:260
#4  0x55af9a9a5288 in qemu_thread_start (args=0x55af9c4f1b80) at 
util/qemu-thread-posix.c:519
#5  0x7eff2f6d44c0 in start_thread () from /lib64/libpthread.so.0
#6  0x7eff2f602553 in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7eff25820a80 (LWP 2842)):
#0  0x7eff2f5f7bd6 in ppoll () from /lib64/libc.so.6
#1  0x55af9a99e405 in qemu_poll_ns (fds=0x55af9c52a830, nfds=1, timeout=-1) 
at util/qemu-timer.c:335
#2  0x55af9a9a1cab in fdmon_poll_wait (ctx=0x55af9c526890, 
ready_list=0x7ffc73e8c5d0, timeout=-1) at util/fdmon-poll.c:79
#3  0x55af9a9a160c in aio_poll (ctx=0x55af9c526890, blocking=true) at 
util/aio-posix.c:600
#4  0x55af9a8f0bb0 in bdrv_do_drained_begin (bs=0x55af9c52a8d0, 
recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at 
block/io.c:429
#5  0x55af9a8f0c95 in bdrv_drained_begin (bs=0x55af9c52a8d0) at 
block/io.c:435
#6  0x55af9a8dc6a8 in blk_drain (blk=0x55af9c542c10) at 
block/block-backend.c:1681
#7  0x55af9a8da0b6 in blk_unref (blk=0x55af9c542c10) at 
block/block-backend.c:473
#8  0x55af9a8eb5e7 in mirror_exit_common (job=0x55af9c6c45c0) at 
block/mirror.c:667
#9  0x55af9a8eb9c1 in mirror_prepare (job=0x55af9c6c45c0) at 
block/mirror.c:765
#10 0x55af9a87cd65 in job_prepare (job=0x55af9c6c45c0) at job.c:781
#11 0x55af9a87b62a in job_txn_apply (job=0x55af9c6c45c0, fn=0x55af9a87cd28 
) at job.c:158
#12 0x55af9a87cdee in job_do_finalize (job=0x55af9c6c45c0) at job.c:798
#13 0x55af9a87cfb5 in job_completed_txn_success (job=0x55af9c6c45c0) at 
job.c:852
#14 0x55af9a87d055 in job_completed (job=0x55af9c6c45c0) at job.c:865
#15 0x55af9a87d0a8 in job_exit (opaque=0x55af9c6c45c0) at job.c:885
#16 0x55af9a99b981 in aio_bh_call (bh=0x55af9c547440) at util/async.c:136
#17 0x55af9a99ba8b in aio_bh_poll (ctx=0x55af9c526890) at util/async.c:164
#18 0x55af9a9a17ff in aio_poll (ctx=0x55af9c526890, blocking=true) at 
util/aio-posix.c:650
#19 0x55af9a8f7011 in bdrv_flush (bs=0x55af9c53b900) at block/io.c:3019
#20 0x55af9a874351 in bdrv_close (bs=0x55af9c53b900) at block.c:4252
#21 0x55af9a874ca3 in bdrv_delete (bs=0x55af9c53b900) at block.c:4498
#22 0x55af9a877862 in bdrv_unref (bs=0x55af9c53b900) at block.c:5866
#23 0x55af9a870837 in bdrv_root_unref_child (child=0x55af9c6c4430) at 
block.c:2684
#24 0x55af9a8da9a2 in blk_remove_bs (blk=0x55af9c547bd0) at 
block/block-backend.c:803
#25 0x55af9a8d9e54 in blk_delete (blk=0x55af9c547bd0) at 
block/block-backend.c:422
#26 0x55af9a8da0f8 in blk_unref (blk=0x55af9c547bd0) at 
block/block-backend.c:477
#27 0x55af9a86a6f1 in teardown_secondary () at tests/test-replication.c:392
#28 0x55af9a86aac1 in test_secondary_stop () at tests/test-replication.c:490
#29 0x7eff2fd7df7e in g_test_run_suite_internal () from 
/lib64/libglib-2.0.so.0
#30 0x7eff2fd7dd24 in g_test_run_suite_internal () from 
/lib64/libglib-2.0.so.0
#31 0x7eff2fd7dd24 in g_test_run_suite_internal () from 
/lib64/libglib-2.0.so.0
#32 0x7eff2fd7e46a in g_test_run_suite () from /lib64/libglib-2.0.so.0
#33 0x7eff2fd7e485 in g_test_run () from /lib64/libglib-2.0.so.0
#34 0x55af9a86b19c in main (argc=1, argv=0x7ffc73e8d088) at 
tests/test-replication.c:645


(gdb) p ((BlockBackend *)0x55af9c547bd0)->in_flight
$5 = 0
(gdb) p ((BlockBackend *)0x55af9c542c10)->in_flight
$6 = 0
(gdb) p ((BlockDriverState *)0x55af9c53b900)->in_flight
$7 = 1
(gdb) p ((BlockDriverState *)0x55af9c52a8d0)->in_flight
$8 = 0
(gdb) fr 20
#20 0x55af9a874351 in bdrv

Re: [PATCH v2 04/18] qom: Simplify object_property_get_enum()

2020-05-06 Thread Markus Armbruster

Philippe Mathieu-Daudé  writes:

> On 5/5/20 5:29 PM, Markus Armbruster wrote:
>> Reuse object_property_get_str().  Switches from the string to the
>> qobject visitor under the hood.
>>
>> Signed-off-by: Markus Armbruster 
>> ---
>>   qom/object.c | 11 ++-
>>   1 file changed, 2 insertions(+), 9 deletions(-)
>>
>> diff --git a/qom/object.c b/qom/object.c
>> index 3d65658059..b374af302c 100644
>> --- a/qom/object.c
>> +++ b/qom/object.c
>> @@ -1521,8 +1521,6 @@ typedef struct EnumProperty {
>>   int object_property_get_enum(Object *obj, const char *name,
>>const char *typename, Error **errp)
>>   {
>> -Error *err = NULL;
>> -Visitor *v;
>>   char *str;
>>   int ret;
>>   ObjectProperty *prop = object_property_find(obj, name, errp);
>> @@ -1541,15 +1539,10 @@ int object_property_get_enum(Object *obj, const char 
>> *name,
>> enumprop = prop->opaque;
>>   -v = string_output_visitor_new(false, &str);
>> -object_property_get(obj, v, name, &err);
>> -if (err) {
>> -error_propagate(errp, err);
>> -visit_free(v);
>> +str = object_property_get_str(obj, name, errp);
>> +if (!str) {
>
> Patch looks good but I'm not confident enough to add a R-b tag :)

Teaching opportunity!

>>   return 0;
>>   }
>> -visit_complete(v, &str);
>> -visit_free(v);
>> ret = qapi_enum_parse(enumprop->lookup, str, -1, errp);
>>   g_free(str);
>>

The core function for getting properties is object_property_get().  To
be used like this:

v = ... new output visitor of your choice ...
object_property_get(obj, v, name, &err);
if (!err) {
visit_complete(v, &ret);
}
visit_free(v);

Delivers the result in @ret and @err.

The type of @ret depends on the visitor.  It typically needs to be
converted to the appropriate C type.

Life's too short to write that much code every time you want to get a
property value.  So we provide two levels of common helpers.

Level 1: the output visitor commonly used is the QObject output
visitor.  Combining object_property_get() with it yields

QObject *object_property_get_qobject(Object *obj, const char *name,
 Error **errp)
{
QObject *ret = NULL;
Error *local_err = NULL;
Visitor *v;

v = qobject_output_visitor_new(&ret);
object_property_get(obj, v, name, &local_err);
if (!local_err) {
visit_complete(v, &ret);
}
error_propagate(errp, local_err);
visit_free(v);
return ret;
}

The use I showed above becomes

ret = object_property_get_qobject(obj, name, &err);

Again, result is in @ret and @err.  You commonly need to convert @ret
from QObject to the property's C type, and handle conversion errors.

Still too much code, so we provide convenience functions for common
types.  Here's the one for strings:

char *object_property_get_str(Object *obj, const char *name,
  Error **errp)
{
QObject *ret = object_property_get_qobject(obj, name, errp);
char *retval;

if (!ret) {
return NULL;
}

retval = g_strdup(qobject_get_try_str(ret));
if (!retval) {
error_setg(errp, QERR_INVALID_PARAMETER_TYPE, name, "string");
}

qobject_unref(ret);
return retval;
}

Now back to my patch.  Before the patch, object_property_get_enum() is
odd: it uses the string output visitor.  Works, but why do it by hand
when we can simply reuse existing object_property_get_str()?  All we
need is the (checked) conversion from string to enum.

Clearer now?

Bonus: one fewer use of a string visitor.  These need to die, but that's
another story.

Re: [PATCH v2 05/18] qom: Drop convenience method object_property_get_uint16List()

2020-05-06 Thread Markus Armbruster

Paolo Bonzini  writes:

> On 05/05/20 17:29, Markus Armbruster wrote:
>> qom/object.c provides object_property_get_TYPE() and
>> object_property_set_TYPE() for a number of common types.  These are
>> all convenience wrappers around object_property_get_qobject() and
>> object_property_set_qobject().
>> 
>> Except for object_property_get_uint16List(), which is unusual in two ways:
>> 
>> * It bypasses object_property_get_qobject().  Fixable; the previous
>>   commit did it for object_property_get_enum())
>> 
>> * It stores the value through a parameter.  Its contract claims it
>>   returns the value, like the other functions do.  Also fixable.
>> 
>> Fixing is not worthwhile, though: object_property_get_uint16List() has
>> seen exactly one user in six years.
>> 
>> Convert the lone user to do its job with the generic
>> object_property_get_qobject(), and drop object_property_get_qobject().
>
> Typo, otherwise

Will fix.

> Reviewed-by: Paolo Bonzini 

Thanks!

Re: [PATCH v2 00/13] microvm: add acpi support

2020-05-06 Thread Sergio Lopez

On Tue, May 05, 2020 at 04:16:00PM +0200, Philippe Mathieu-Daudé wrote:
> On 5/5/20 4:04 PM, Michael S. Tsirkin wrote:
> > On Tue, May 05, 2020 at 03:42:52PM +0200, Gerd Hoffmann wrote:
> > > I know that not supporting ACPI in microvm is intentional.  If you still
> > > don't want ACPI this is perfectly fine, you can use the usual -no-acpi
> > > switch to toggle ACPI support.
> > > 
> > > These are the advantages you are going to loose then:
> > > 
> > >(1) virtio-mmio device discovery without command line hacks (tweaking
> > >the command line is a problem when not using direct kernel boot).
> > >(2) Better IO-APIC support, we can use IRQ lines 16-23.
> > >(3) ACPI power button (aka powerdown request) works.
> > >(4) machine poweroff (aka S5 state) works.
> > 
> > Questions
> > 
> > - what's the tradeoff in startup time?
> > - what should be the default?
> > 
> > Based on above I'd be inclined to say default should stay no acpi and
> > users should enable acpi with an option.
> 
> As this machine was added to have the least minimum hardware required, I'd
> keep the default with no ACPI and have user requiring it to use an option.
> My 2 cents obviously.

I also share this opinion. And I would prefer it to be a machine type
option, defaulting to "off", but I guess that's a matter of taste.

Sergio.

> > 
> > > Together with seabios patches for virtio-mmio support this allows to
> > > boot standard fedora images (cloud, coreos, workstation live) with the
> > > microvm machine type.
> > > 
> > > git branch for testing (including updated seabios):
> > >   https://git.kraxel.org/cgit/qemu/log/?h=sirius/microvm
> > > 
> > > changes in v2:
> > >* some acpi cleanups are an separate patch series now.
> > >* switched to hw reduced acpi & generic event device.
> > >* misc fixes here and there.
> > > 
> > > cheers,
> > >Gerd
> > > 
> > > Gerd Hoffmann (13):
> > >acpi: make build_madt() more generic.
> > >acpi: create acpi-common.c and move madt code
> > >acpi: madt: skip pci override on pci-less systems (microvm)
> > >acpi: move acpi_build_facs to acpi-common.c
> > >acpi: move acpi_init_common_fadt_data to acpi-common.c
> > >acpi: move acpi_align_size to acpi-common.h
> > >acpi: fadt: add hw-reduced sleep register support
> > >acpi: generic event device for x86
> > >microvm: add minimal acpi support
> > >microvm: disable virtio-mmio cmdline hack
> > >microvm: add acpi_dsdt_add_virtio() for x86
> > >microvm: make virtio irq base runtime configurable
> > >microvm/acpi: use GSI 16-23 for virtio
> > > 
> > >   hw/i386/acpi-common.h  |  38 
> > >   hw/i386/acpi-microvm.h |   6 +
> > >   include/hw/acpi/acpi-defs.h|   2 +
> > >   include/hw/acpi/generic_event_device.h |  10 +
> > >   include/hw/i386/microvm.h  |  10 +-
> > >   hw/acpi/aml-build.c|   4 +-
> > >   hw/i386/acpi-build.c   | 198 +---
> > >   hw/i386/acpi-common.c  | 206 
> > >   hw/i386/acpi-microvm.c | 249 +
> > >   hw/i386/generic_event_device_x86.c | 114 +++
> > >   hw/i386/microvm.c  |  36 +++-
> > >   hw/i386/Kconfig|   1 +
> > >   hw/i386/Makefile.objs  |   3 +
> > >   13 files changed, 676 insertions(+), 201 deletions(-)
> > >   create mode 100644 hw/i386/acpi-common.h
> > >   create mode 100644 hw/i386/acpi-microvm.h
> > >   create mode 100644 hw/i386/acpi-common.c
> > >   create mode 100644 hw/i386/acpi-microvm.c
> > >   create mode 100644 hw/i386/generic_event_device_x86.c
> > > 
> > > -- 
> > > 2.18.4
> > 
> > 
> 


signature.asc
Description: PGP signature

Re: [PATCH v2 11/13] microvm: add acpi_dsdt_add_virtio() for x86

2020-05-06 Thread Sergio Lopez

On Tue, May 05, 2020 at 03:43:03PM +0200, Gerd Hoffmann wrote:
> Makes x86 linux kernel find virtio-mmio devices automatically.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/i386/acpi-microvm.c | 51 ++
>  1 file changed, 51 insertions(+)

Reviewed-by: Sergio Lopez 

> diff --git a/hw/i386/acpi-microvm.c b/hw/i386/acpi-microvm.c
> index ce5ab86d642c..4d91ac9360ce 100644
> --- a/hw/i386/acpi-microvm.c
> +++ b/hw/i386/acpi-microvm.c
> @@ -21,6 +21,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/cutils.h"
>  #include "qapi/error.h"
>  
>  #include "exec/memory.h"
> @@ -32,6 +33,7 @@
>  #include "hw/boards.h"
>  #include "hw/i386/fw_cfg.h"
>  #include "hw/i386/microvm.h"
> +#include "hw/virtio/virtio-mmio.h"
>  
>  #include "acpi-common.h"
>  #include "acpi-microvm.h"
> @@ -45,6 +47,54 @@ static void acpi_dsdt_add_power_button(Aml *scope)
>  aml_append(scope, dev);
>  }
>  
> +static void acpi_dsdt_add_virtio(Aml *scope)
> +{
> +gchar *separator;
> +long int index;
> +BusState *bus;
> +BusChild *kid;
> +
> +bus = sysbus_get_default();
> +QTAILQ_FOREACH(kid, &bus->children, sibling) {
> +DeviceState *dev = kid->child;
> +Object *obj = object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MMIO);
> +
> +if (obj) {
> +VirtIOMMIOProxy *mmio = VIRTIO_MMIO(obj);
> +VirtioBusState *mmio_virtio_bus = &mmio->bus;
> +BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
> +
> +if (QTAILQ_EMPTY(&mmio_bus->children)) {
> +continue;
> +}
> +separator = g_strrstr(mmio_bus->name, ".");
> +if (!separator) {
> +continue;
> +}
> +if (qemu_strtol(separator + 1, NULL, 10, &index) != 0) {
> +continue;
> +}
> +
> +uint32_t irq = VIRTIO_IRQ_BASE + index;
> +hwaddr base = VIRTIO_MMIO_BASE + index * 512;
> +hwaddr size = 512;
> +
> +Aml *dev = aml_device("VR%02u", (unsigned)index);
> +aml_append(dev, aml_name_decl("_HID", aml_string("LNRO0005")));
> +aml_append(dev, aml_name_decl("_UID", aml_int(index)));
> +aml_append(dev, aml_name_decl("_CCA", aml_int(1)));
> +
> +Aml *crs = aml_resource_template();
> +aml_append(crs, aml_memory32_fixed(base, size, AML_READ_WRITE));
> +aml_append(crs,
> +   aml_interrupt(AML_CONSUMER, AML_LEVEL, 
> AML_ACTIVE_HIGH,
> + AML_EXCLUSIVE, &irq, 1));
> +aml_append(dev, aml_name_decl("_CRS", crs));
> +aml_append(scope, dev);
> +}
> +}
> +}
> +
>  static void
>  build_dsdt_microvm(GArray *table_data, BIOSLinker *linker,
> MicrovmMachineState *mms)
> @@ -69,6 +119,7 @@ build_dsdt_microvm(GArray *table_data, BIOSLinker *linker,
>  build_ged_aml(sb_scope, GED_DEVICE, HOTPLUG_HANDLER(mms->acpi_dev),
>GED_MMIO_IRQ, AML_SYSTEM_MEMORY, GED_MMIO_BASE);
>  acpi_dsdt_add_power_button(sb_scope);
> +acpi_dsdt_add_virtio(sb_scope);
>  aml_append(dsdt, sb_scope);
>  
>  scope = aml_scope("\\");
> -- 
> 2.18.4
> 


signature.asc
Description: PGP signature

Re: [PATCH v2 13/13] microvm/acpi: use GSI 16-23 for virtio

2020-05-06 Thread Sergio Lopez

On Tue, May 05, 2020 at 03:43:05PM +0200, Gerd Hoffmann wrote:
> With ACPI enabled and IO-APIC being properly declared in the ACPI tables
> we can use interrupt lines 16-23 for virtio and avoid shared interrupts.
> 
> With acpi disabled we continue to use lines 8-15.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/i386/microvm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Sergio Lopez 

> diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
> index 2aa2804e4ca0..08ed2a17f2ca 100644
> --- a/hw/i386/microvm.c
> +++ b/hw/i386/microvm.c
> @@ -124,7 +124,7 @@ static void microvm_devices_init(MicrovmMachineState *mms)
>  
>  kvmclock_create();
>  
> -mms->virtio_irq_base = 8;
> +mms->virtio_irq_base = x86_machine_is_acpi_enabled(x86ms) ? 16 : 8;
>  for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
>  sysbus_create_simple("virtio-mmio",
>   VIRTIO_MMIO_BASE + i * 512,
> -- 
> 2.18.4
> 


signature.asc
Description: PGP signature

Re: [PATCH v2 12/13] microvm: make virtio irq base runtime configurable

2020-05-06 Thread Sergio Lopez

On Tue, May 05, 2020 at 03:43:04PM +0200, Gerd Hoffmann wrote:
> Signed-off-by: Gerd Hoffmann 
> ---
>  include/hw/i386/microvm.h |  2 +-
>  hw/i386/acpi-microvm.c|  6 +++---
>  hw/i386/microvm.c | 11 +++
>  3 files changed, 11 insertions(+), 8 deletions(-)

Reviewed-by: Sergio Lopez 

> diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
> index 55f5984cfaa1..878d2a8011f4 100644
> --- a/include/hw/i386/microvm.h
> +++ b/include/hw/i386/microvm.h
> @@ -28,7 +28,6 @@
>  
>  /* Platform virtio definitions */
>  #define VIRTIO_MMIO_BASE  0xc000
> -#define VIRTIO_IRQ_BASE   5
>  #define VIRTIO_NUM_TRANSPORTS 8
>  #define VIRTIO_CMDLINE_MAXLEN 64
>  
> @@ -63,6 +62,7 @@ typedef struct {
>  bool auto_kernel_cmdline;
>  
>  /* Machine state */
> +uint32_t virtio_irq_base;
>  bool kernel_cmdline_fixed;
>  Notifier machine_done;
>  AcpiDeviceIf *acpi_dev;
> diff --git a/hw/i386/acpi-microvm.c b/hw/i386/acpi-microvm.c
> index 4d91ac9360ce..1230080c45cd 100644
> --- a/hw/i386/acpi-microvm.c
> +++ b/hw/i386/acpi-microvm.c
> @@ -47,7 +47,7 @@ static void acpi_dsdt_add_power_button(Aml *scope)
>  aml_append(scope, dev);
>  }
>  
> -static void acpi_dsdt_add_virtio(Aml *scope)
> +static void acpi_dsdt_add_virtio(Aml *scope, MicrovmMachineState *mms)
>  {
>  gchar *separator;
>  long int index;
> @@ -75,7 +75,7 @@ static void acpi_dsdt_add_virtio(Aml *scope)
>  continue;
>  }
>  
> -uint32_t irq = VIRTIO_IRQ_BASE + index;
> +uint32_t irq = mms->virtio_irq_base + index;
>  hwaddr base = VIRTIO_MMIO_BASE + index * 512;
>  hwaddr size = 512;
>  
> @@ -119,7 +119,7 @@ build_dsdt_microvm(GArray *table_data, BIOSLinker *linker,
>  build_ged_aml(sb_scope, GED_DEVICE, HOTPLUG_HANDLER(mms->acpi_dev),
>GED_MMIO_IRQ, AML_SYSTEM_MEMORY, GED_MMIO_BASE);
>  acpi_dsdt_add_power_button(sb_scope);
> -acpi_dsdt_add_virtio(sb_scope);
> +acpi_dsdt_add_virtio(sb_scope, mms);
>  aml_append(dsdt, sb_scope);
>  
>  scope = aml_scope("\\");
> diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
> index a3708fdf1e39..2aa2804e4ca0 100644
> --- a/hw/i386/microvm.c
> +++ b/hw/i386/microvm.c
> @@ -124,10 +124,11 @@ static void microvm_devices_init(MicrovmMachineState 
> *mms)
>  
>  kvmclock_create();
>  
> +mms->virtio_irq_base = 8;
>  for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
>  sysbus_create_simple("virtio-mmio",
>   VIRTIO_MMIO_BASE + i * 512,
> - x86ms->gsi[VIRTIO_IRQ_BASE + i]);
> + x86ms->gsi[mms->virtio_irq_base + i]);
>  }
>  
>  /* Optional and legacy devices */
> @@ -274,7 +275,7 @@ static void microvm_memory_init(MicrovmMachineState *mms)
>  x86ms->ioapic_as = &address_space_memory;
>  }
>  
> -static gchar *microvm_get_mmio_cmdline(gchar *name)
> +static gchar *microvm_get_mmio_cmdline(gchar *name, uint32_t virtio_irq_base)
>  {
>  gchar *cmdline;
>  gchar *separator;
> @@ -294,7 +295,7 @@ static gchar *microvm_get_mmio_cmdline(gchar *name)
>  ret = g_snprintf(cmdline, VIRTIO_CMDLINE_MAXLEN,
>   " virtio_mmio.device=512@0x%lx:%ld",
>   VIRTIO_MMIO_BASE + index * 512,
> - VIRTIO_IRQ_BASE + index);
> + virtio_irq_base + index);
>  if (ret < 0 || ret >= VIRTIO_CMDLINE_MAXLEN) {
>  g_free(cmdline);
>  return NULL;
> @@ -306,6 +307,7 @@ static gchar *microvm_get_mmio_cmdline(gchar *name)
>  static void microvm_fix_kernel_cmdline(MachineState *machine)
>  {
>  X86MachineState *x86ms = X86_MACHINE(machine);
> +MicrovmMachineState *mms = MICROVM_MACHINE(machine);
>  BusState *bus;
>  BusChild *kid;
>  char *cmdline;
> @@ -329,7 +331,8 @@ static void microvm_fix_kernel_cmdline(MachineState 
> *machine)
>  BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
>  
>  if (!QTAILQ_EMPTY(&mmio_bus->children)) {
> -gchar *mmio_cmdline = 
> microvm_get_mmio_cmdline(mmio_bus->name);
> +gchar *mmio_cmdline = microvm_get_mmio_cmdline
> +(mmio_bus->name, mms->virtio_irq_base);
>  if (mmio_cmdline) {
>  char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, 
> NULL);
>  g_free(mmio_cmdline);
> -- 
> 2.18.4
> 


signature.asc
Description: PGP signature

[PATCH] icount: fix shift=auto for record/replay

2020-05-06 Thread Pavel Dovgalyuk

This patch fixes shift=auto when record/replay is enabled.
Now user does not need to guess the best shift value.

Signed-off-by: Pavel Dovgalyuk 
---
 cpus.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 5670c96bcf..dfb9f4717f 100644
--- a/cpus.c
+++ b/cpus.c
@@ -379,7 +379,8 @@ static void icount_adjust(void)
 
 seqlock_write_lock(&timers_state.vm_clock_seqlock,
&timers_state.vm_clock_lock);
-cur_time = cpu_get_clock_locked();
+cur_time = REPLAY_CLOCK_LOCKED(REPLAY_CLOCK_VIRTUAL_RT,
+   cpu_get_clock_locked());
 cur_icount = cpu_get_icount_locked();
 
 delta = cur_icount - cur_time;
@@ -685,6 +686,7 @@ static const VMStateDescription icount_vmstate_timers = {
 .fields = (VMStateField[]) {
 VMSTATE_INT64(qemu_icount_bias, TimersState),
 VMSTATE_INT64(qemu_icount, TimersState),
+VMSTATE_INT16(icount_time_shift, TimersState),
 VMSTATE_END_OF_LIST()
 },
 .subsections = (const VMStateDescription*[]) {

Re: [PATCH v23 0/4] implement zstd cluster compression method

2020-05-06 Thread Denis Plotnikov





On 05.05.2020 15:03, Max Reitz wrote:

On 05.05.20 12:26, Max Reitz wrote:

On 30.04.20 12:19, Denis Plotnikov wrote:

v23:
Undecided: whether to add zstd(zlib) compression
   details to the qcow2 spec
03: tighten assertion on zstd decompression [Eric]
04: use _rm_test_img appropriately [Max]

Thanks, applied to my block branch:

I’m afraid I have to unqueue this series again, because it makes many
iotests fail due to an additional “compression_type=zlib” output when
images are created, an additional “compression type” line in
qemu-img info output where format-specific information is not
suppressed, and an additional line in qemu-img create -f qcow2 -o help.

Max



Hmm, this is strange. I made some modifications for the tests
in 0001 of the series (qcow2: introduce compression type feature).

Among the other test related changes, the patch contains the hunk:

+++ b/tests/qemu-iotests/common.filter
@@ -152,7 +152,8 @@ _filter_img_create()
 -e "s# refcount_bits=[0-9]\\+##g" \
 -e "s# key-secret=[a-zA-Z0-9]\\+##g" \
 -e "s# iter-time=[0-9]\\+##g" \
-    -e "s# force_size=\\(on\\|off\\)##g"
+    -e "s# force_size=\\(on\\|off\\)##g" \
+    -e "s# compression_type=[a-zA-Z0-9]\\+##g"
 }

which has to filter out "compression_type" on image creation.

But you say that you can see the "compression_type" on the image creation.
May be the patch wasn't fully applied? Or the test related modification 
were omitted?


I've just re-based the series on top of:

681b07f4e76dbb700c16918d (vanilla/master, mainstream)
Merge: a2261b2754 714eb0dbc5
Author: Peter Maydell 
Date:   Tue May 5 15:47:44 2020 +0100

and run the tests with 'make check-block'

and got the following:

Not run: 071 099 184 220 259 267
Some cases not run in: 030 040 041
Passed all 113 iotests

May be I do something wrong?

Denis

Re: [PATCH v2 3/4] backup: Make sure that source and target size match

2020-05-06 Thread Kevin Wolf

Am 06.05.2020 um 08:07 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 05.05.2020 13:03, Kevin Wolf wrote:
> > Am 30.04.2020 um 20:21 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > 30.04.2020 17:27, Kevin Wolf wrote:
> > > > Since the introduction of a backup filter node in commit 00e30f05d, the
> > > > backup block job crashes when the target image is smaller than the
> > > > source image because it will try to write after the end of the target
> > > > node without having BLK_PERM_RESIZE. (Previously, the BlockBackend layer
> > > > would have caught this and errored out gracefully.)
> > > > 
> > > > We can fix this and even do better than the old behaviour: Check that
> > > > source and target have the same image size at the start of the block job
> > > > and unshare BLK_PERM_RESIZE. (This permission was already unshared
> > > > before the same commit 00e30f05d, but the BlockBackend that was used to
> > > > make the restriction was removed without a replacement.) This will
> > > > immediately error out when starting the job instead of only when writing
> > > > to a block that doesn't exist in the target.
> > > > 
> > > > Longer target than source would technically work because we would never
> > > > write to blocks that don't exist, but semantically these are invalid,
> > > > too, because a backup is supposed to create a copy, not just an image
> > > > that starts with a copy.
> > > > 
> > > > Fixes: 00e30f05de1d19586345ec373970ef4c192c6270
> > > > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1778593
> > > > Cc: qemu-sta...@nongnu.org
> > > > Signed-off-by: Kevin Wolf 
> > > 
> > > I'm OK with it as is, as it fixes bug:
> > > 
> > > Reviewed-by: Vladimir Sementsov-Ogievskiy 
> > > 
> > > still, some notes below
> > > 
> > > 
> > > > ---
> > > >block/backup-top.c | 14 +-
> > > >block/backup.c | 14 +-
> > > >2 files changed, 22 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/block/backup-top.c b/block/backup-top.c
> > > > index 3b50c06e2c..79b268e6dc 100644
> > > > --- a/block/backup-top.c
> > > > +++ b/block/backup-top.c
> > > > @@ -148,8 +148,10 @@ static void backup_top_child_perm(BlockDriverState 
> > > > *bs, BdrvChild *c,
> > > > *
> > > > * Share write to target (child_file), to not interfere
> > > > * with guest writes to its disk which may be in target 
> > > > backing chain.
> > > > + * Can't resize during a backup block job because we check the 
> > > > size
> > > > + * only upfront.
> > > > */
> > > > -*nshared = BLK_PERM_ALL;
> > > > +*nshared = BLK_PERM_ALL & ~BLK_PERM_RESIZE;
> > > >*nperm = BLK_PERM_WRITE;
> > > >} else {
> > > >/* Source child */
> > > > @@ -159,7 +161,7 @@ static void backup_top_child_perm(BlockDriverState 
> > > > *bs, BdrvChild *c,
> > > >if (perm & BLK_PERM_WRITE) {
> > > >*nperm = *nperm | BLK_PERM_CONSISTENT_READ;
> > > >}
> > > > -*nshared &= ~BLK_PERM_WRITE;
> > > > +*nshared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
> > > >}
> > > >}
> > > > @@ -192,11 +194,13 @@ BlockDriverState 
> > > > *bdrv_backup_top_append(BlockDriverState *source,
> > > >{
> > > >Error *local_err = NULL;
> > > >BDRVBackupTopState *state;
> > > > -BlockDriverState *top = 
> > > > bdrv_new_open_driver(&bdrv_backup_top_filter,
> > > > - filter_node_name,
> > > > - BDRV_O_RDWR, errp);
> > > > +BlockDriverState *top;
> > > >bool appended = false;
> > > > +assert(source->total_sectors == target->total_sectors);
> > > 
> > > May be better to error-out, just to keep backup-top independent. Still, 
> > > now it's not
> > > really needed, as we have only one caller. And this function have to be 
> > > refactored
> > > anyway, when publishing this filter (open() and close() should appear, so 
> > > this code
> > > will be rewritten anyway.)
> > 
> > Yes, the whole function only works because it's used in this restricted
> > context today. For example, we only know that total_sectors is up to
> > date because the caller has called bdrv_getlength() just a moment ago.
> > 
> > I think fixing this would be beyond the scope of this patch, but
> > certainly a good idea anyway.
> > 
> > > And the other thought: the permissions we declared above, will be 
> > > activated only after
> > > successful bdrv_child_refresh_perms(). I think some kind of race is 
> > > possible, so that
> > > size is changed actual permission activation. So, may be good to double 
> > > check sizes after
> > > bdrv_child_refresh_perms().. But it's a kind of paranoia.
> > 
> > We're not in coroutine context, so we can't yield. I don't see who could
> > change the size in parallel (apart from an external process, but an
> > external process can mess up anything).
> >

[Bug 1866870] Re: KVM Guest pauses after upgrade to Ubuntu 20.04

2020-05-06 Thread Andreas Weller

Hi Christian.

Just filed bug: #1877052

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1866870

Title:
  KVM Guest pauses after upgrade to Ubuntu 20.04

Status in QEMU:
  Invalid
Status in qemu package in Ubuntu:
  Invalid
Status in seabios package in Ubuntu:
  Fix Released

Bug description:
  Symptom:
  Error unpausing domain: internal error: unable to execute QEMU command 
'cont': Resetting the Virtual Machine is required

  Traceback (most recent call last):
File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in 
cb_wrapper
  callback(asyncjob, *args, **kwargs)
File "/usr/share/virt-manager/virtManager/asyncjob.py", line 111, in tmpcb
  callback(*args, **kwargs)
File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 
66, in newfn
  ret = fn(self, *args, **kwargs)
File "/usr/share/virt-manager/virtManager/object/domain.py", line 1311, in 
resume
  self._backend.resume()
File "/usr/lib/python3/dist-packages/libvirt.py", line 2174, in resume
  if ret == -1: raise libvirtError ('virDomainResume() failed', dom=self)
  libvirt.libvirtError: internal error: unable to execute QEMU command 'cont': 
Resetting the Virtual Machine is required

  
  ---

  As outlined here:
  https://bugs.launchpad.net/qemu/+bug/1813165/comments/15

  After upgrade, all KVM guests are in a default pause state. Even after
  forcing them off via virsh, and restarting them the guests are paused.

  These Guests are not nested.

  A lot of diganostic information are outlined in the previous bug
  report link provided. The solution mentioned in previous report had
  been allegedly integrated into the downstream updates.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1866870/+subscriptions

Re: [PATCH v7 0/7] reference implementation of RSS and hash report

2020-05-06 Thread Yuri Benditovich

I'll send v8 soon

Thanks,
Yuri

On Wed, May 6, 2020 at 8:37 AM Jason Wang  wrote:

>
> On 2020/5/1 下午12:01, Yuri Benditovich wrote:
> > Michael/Jason,
> >
> > As Linux headers was updated in qemu and now include RSC/RSS/Hash
> > definitions, please let me know what you prefer:
> > 1. You apply this series as is, then I submit clean-up series that
> > will remove all the redundant defines from virtio-net.c
> > 2. I post v8 of this series with cleanup of all the redundant defines
> > and also RSC ones
> > 3. Something other
>
>
> Hi Yuri:
>
> Though I've queued this series but consider we have new headers, I think
> it's better to post v8.
>
> Thanks
>
>

Re: [PATCH v1 3/4] .cirrus.yml: bump FreeBSD to the current stable release

2020-05-06 Thread Alex Bennée



Li-Wen Hsu  writes:

> On Fri, May 1, 2020 at 7:15 PM Alex Bennée  wrote:
>>
>> Hopefully this will un-stick the test which has been broken for a long
>> time.
>>
>> Signed-off-by: Alex Bennée 
>> ---
>>  .cirrus.yml | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/.cirrus.yml b/.cirrus.yml
>> index 90645fede6..f06f5af2b9 100644
>> --- a/.cirrus.yml
>> +++ b/.cirrus.yml
>> @@ -3,7 +3,7 @@ env:
>>
>>  freebsd_12_task:
>>freebsd_instance:
>> -image: freebsd-12-0-release-amd64
>> +image_family: freebsd-12-1
>>  cpu: 8
>>  memory: 8G
>>install_script: pkg install -y
>> --
>> 2.20.1
>>
>
> Reviewed-by: Li-Wen Hsu 
> Tested-by: Li-Wen Hsu 
>
> I would be nice to also add this patch:
> https://github.com/lwhsu/qemu/commit/ac699f79b4d86d8195d76c3befada65ade449cc0.patch
> To prevent problems in the future.

Done. I took the liberty of apply your s-o-b tag as it wasn't on the
github commit but it came from your repo.

>
> The error was due to the pkg version got "fixed" when building image,
> and was too old when VM got provisioned, then it cannot be not
> compatible with the package repository. Ref:
> https://lists.freebsd.org/pipermail/freebsd-cloud/2020-April/000234.html
>
> Best,
> Li-Wen


-- 
Alex Bennée

[PATCH] replay: synchronize on every virtual timer callback

2020-05-06 Thread Pavel Dovgalyuk

Sometimes virtual timer callbacks depend on order
of virtual timer processing and warping of virtual clock.
Therefore every callback should be logged to make replay deterministic.
This patch creates a checkpoint before every virtual timer callback.
With these checkpoints virtual timers processing and clock warping
events order is completely deterministic.

Signed-off-by: Pavel Dovgalyuk 
---
 util/qemu-timer.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/util/qemu-timer.c b/util/qemu-timer.c
index d548d3c1ad..47833f338f 100644
--- a/util/qemu-timer.c
+++ b/util/qemu-timer.c
@@ -588,6 +588,11 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
 qemu_mutex_lock(&timer_list->active_timers_lock);
 
 progress = true;
+/*
+ * Callback may insert new checkpoints, therefore add new checkpoint
+ * for the virtual timers.
+ */
+need_replay_checkpoint = timer_list->clock->type == QEMU_CLOCK_VIRTUAL;
 }
 qemu_mutex_unlock(&timer_list->active_timers_lock);

[Bug 1877052] [NEW] KVM Win 10 guest pauses after kernel upgrade

2020-05-06 Thread Andreas Weller

Public bug reported:

Hello!
Unfortunately the bug has apparently reappeared. I have a Windows 10 running in 
a VM, which after my today's "apt upgrade" goes into pause mode after a few 
seconds of running time.

Until yesterday it used to work and I was able to boot the VM. During
the kernel update (from 5.4.0-28.33 to 5.4.0-29.34) the VM was active
and then went into pause mode. Even after a reboot of my host system the
problem still persists: the VM boots for a few seconds and then switches
to pause mode.

Current Kernel: Linux andreas-laptop 5.4.0-29-generic #33-Ubuntu SMP Wed
Apr 29 14:32:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Maybe relevant logfile lines:
2020-05-06T07:46:42.857574Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12]
2020-05-06T07:46:42.857718Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13]
2020-05-06T07:46:42.860567Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12]
2020-05-06T07:46:42.860582Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13]
2020-05-06T07:47:22.901057Z qemu-system-x86_64: terminating on signal 15 from 
pid 1593 (/usr/sbin/libvirtd)
2020-05-06 07:47:23.101+: shutting down, reason=destroyed


Kind regards,
   Andreas

** Affects: qemu
 Importance: Undecided
 Status: New

** Affects: qemu (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1877052

Title:
  KVM Win 10 guest pauses after kernel upgrade

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  New

Bug description:
  Hello!
  Unfortunately the bug has apparently reappeared. I have a Windows 10 running 
in a VM, which after my today's "apt upgrade" goes into pause mode after a few 
seconds of running time.

  Until yesterday it used to work and I was able to boot the VM. During
  the kernel update (from 5.4.0-28.33 to 5.4.0-29.34) the VM was active
  and then went into pause mode. Even after a reboot of my host system
  the problem still persists: the VM boots for a few seconds and then
  switches to pause mode.

  Current Kernel: Linux andreas-laptop 5.4.0-29-generic #33-Ubuntu SMP
  Wed Apr 29 14:32:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  Maybe relevant logfile lines:
  2020-05-06T07:46:42.857574Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12]
  2020-05-06T07:46:42.857718Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13]
  2020-05-06T07:46:42.860567Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12]
  2020-05-06T07:46:42.860582Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13]
  2020-05-06T07:47:22.901057Z qemu-system-x86_64: terminating on signal 15 from 
pid 1593 (/usr/sbin/libvirtd)
  2020-05-06 07:47:23.101+: shutting down, reason=destroyed


  Kind regards,
     Andreas

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1877052/+subscriptions

[Bug 1877052] Re: KVM Win 10 guest pauses after kernel upgrade

2020-05-06 Thread Andreas Weller

** Attachment added: "VM's XML configuration file"
   
https://bugs.launchpad.net/qemu/+bug/1877052/+attachment/5367249/+files/win10.xml

** Also affects: qemu (Ubuntu)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1877052

Title:
  KVM Win 10 guest pauses after kernel upgrade

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  New

Bug description:
  Hello!
  Unfortunately the bug has apparently reappeared. I have a Windows 10 running 
in a VM, which after my today's "apt upgrade" goes into pause mode after a few 
seconds of running time.

  Until yesterday it used to work and I was able to boot the VM. During
  the kernel update (from 5.4.0-28.33 to 5.4.0-29.34) the VM was active
  and then went into pause mode. Even after a reboot of my host system
  the problem still persists: the VM boots for a few seconds and then
  switches to pause mode.

  Current Kernel: Linux andreas-laptop 5.4.0-29-generic #33-Ubuntu SMP
  Wed Apr 29 14:32:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  Maybe relevant logfile lines:
  2020-05-06T07:46:42.857574Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12]
  2020-05-06T07:46:42.857718Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13]
  2020-05-06T07:46:42.860567Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12]
  2020-05-06T07:46:42.860582Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13]
  2020-05-06T07:47:22.901057Z qemu-system-x86_64: terminating on signal 15 from 
pid 1593 (/usr/sbin/libvirtd)
  2020-05-06 07:47:23.101+: shutting down, reason=destroyed


  Kind regards,
     Andreas

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1877052/+subscriptions

[Bug 1877052] Re: KVM Win 10 guest pauses after kernel upgrade

2020-05-06 Thread Andreas Weller

** Attachment added: "Libvirt Logfile"
   
https://bugs.launchpad.net/qemu/+bug/1877052/+attachment/5367244/+files/win10.log

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1877052

Title:
  KVM Win 10 guest pauses after kernel upgrade

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  New

Bug description:
  Hello!
  Unfortunately the bug has apparently reappeared. I have a Windows 10 running 
in a VM, which after my today's "apt upgrade" goes into pause mode after a few 
seconds of running time.

  Until yesterday it used to work and I was able to boot the VM. During
  the kernel update (from 5.4.0-28.33 to 5.4.0-29.34) the VM was active
  and then went into pause mode. Even after a reboot of my host system
  the problem still persists: the VM boots for a few seconds and then
  switches to pause mode.

  Current Kernel: Linux andreas-laptop 5.4.0-29-generic #33-Ubuntu SMP
  Wed Apr 29 14:32:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  Maybe relevant logfile lines:
  2020-05-06T07:46:42.857574Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12]
  2020-05-06T07:46:42.857718Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13]
  2020-05-06T07:46:42.860567Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12]
  2020-05-06T07:46:42.860582Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13]
  2020-05-06T07:47:22.901057Z qemu-system-x86_64: terminating on signal 15 from 
pid 1593 (/usr/sbin/libvirtd)
  2020-05-06 07:47:23.101+: shutting down, reason=destroyed


  Kind regards,
     Andreas

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1877052/+subscriptions

Re: [PATCH v2 3/4] backup: Make sure that source and target size match

2020-05-06 Thread Vladimir Sementsov-Ogievskiy


06.05.2020 11:02, Kevin Wolf wrote:

Am 06.05.2020 um 08:07 hat Vladimir Sementsov-Ogievskiy geschrieben:

05.05.2020 13:03, Kevin Wolf wrote:

Am 30.04.2020 um 20:21 hat Vladimir Sementsov-Ogievskiy geschrieben:

30.04.2020 17:27, Kevin Wolf wrote:

Since the introduction of a backup filter node in commit 00e30f05d, the
backup block job crashes when the target image is smaller than the
source image because it will try to write after the end of the target
node without having BLK_PERM_RESIZE. (Previously, the BlockBackend layer
would have caught this and errored out gracefully.)

We can fix this and even do better than the old behaviour: Check that
source and target have the same image size at the start of the block job
and unshare BLK_PERM_RESIZE. (This permission was already unshared
before the same commit 00e30f05d, but the BlockBackend that was used to
make the restriction was removed without a replacement.) This will
immediately error out when starting the job instead of only when writing
to a block that doesn't exist in the target.

Longer target than source would technically work because we would never
write to blocks that don't exist, but semantically these are invalid,
too, because a backup is supposed to create a copy, not just an image
that starts with a copy.

Fixes: 00e30f05de1d19586345ec373970ef4c192c6270
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1778593
Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf 


I'm OK with it as is, as it fixes bug:

Reviewed-by: Vladimir Sementsov-Ogievskiy 

still, some notes below



---
block/backup-top.c | 14 +-
block/backup.c | 14 +-
2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/block/backup-top.c b/block/backup-top.c
index 3b50c06e2c..79b268e6dc 100644
--- a/block/backup-top.c
+++ b/block/backup-top.c
@@ -148,8 +148,10 @@ static void backup_top_child_perm(BlockDriverState *bs, 
BdrvChild *c,
 *
 * Share write to target (child_file), to not interfere
 * with guest writes to its disk which may be in target backing 
chain.
+ * Can't resize during a backup block job because we check the size
+ * only upfront.
 */
-*nshared = BLK_PERM_ALL;
+*nshared = BLK_PERM_ALL & ~BLK_PERM_RESIZE;
*nperm = BLK_PERM_WRITE;
} else {
/* Source child */
@@ -159,7 +161,7 @@ static void backup_top_child_perm(BlockDriverState *bs, 
BdrvChild *c,
if (perm & BLK_PERM_WRITE) {
*nperm = *nperm | BLK_PERM_CONSISTENT_READ;
}
-*nshared &= ~BLK_PERM_WRITE;
+*nshared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
}
}
@@ -192,11 +194,13 @@ BlockDriverState *bdrv_backup_top_append(BlockDriverState 
*source,
{
Error *local_err = NULL;
BDRVBackupTopState *state;
-BlockDriverState *top = bdrv_new_open_driver(&bdrv_backup_top_filter,
- filter_node_name,
- BDRV_O_RDWR, errp);
+BlockDriverState *top;
bool appended = false;
+assert(source->total_sectors == target->total_sectors);


May be better to error-out, just to keep backup-top independent. Still, now 
it's not
really needed, as we have only one caller. And this function have to be 
refactored
anyway, when publishing this filter (open() and close() should appear, so this 
code
will be rewritten anyway.)


Yes, the whole function only works because it's used in this restricted
context today. For example, we only know that total_sectors is up to
date because the caller has called bdrv_getlength() just a moment ago.

I think fixing this would be beyond the scope of this patch, but
certainly a good idea anyway.


And the other thought: the permissions we declared above, will be activated 
only after
successful bdrv_child_refresh_perms(). I think some kind of race is possible, 
so that
size is changed actual permission activation. So, may be good to double check 
sizes after
bdrv_child_refresh_perms().. But it's a kind of paranoia.


We're not in coroutine context, so we can't yield. I don't see who could
change the size in parallel (apart from an external process, but an
external process can mess up anything).

When we make backup-top an independent driver, instead of double
checking (what would you do on error?), maybe we could move the size
initialisation (then with bdrv_getlength()) to after
bdrv_child_refresh_perms().


Also, third thought: the restricted permissions doesn't save us from resizing
of the source through exactly this node, does it? Hmm, but your test works 
somehow. But
(I assume) it worked in a previous patch version without unsharing on source..

Ha, but bdrv_co_truncate just can't work on backup-top, because it doesn't have 
file child.
But, if we fix bdrv_co_truncate to skip filters, we'll need to define 
.bdrv_co_truncate in
backup_top, which will return

[Bug 1877052] Re: KVM Win 10 guest pauses after kernel upgrade

2020-05-06 Thread Andreas Weller

** Attachment added: "History.log from apt"
   
https://bugs.launchpad.net/qemu/+bug/1877052/+attachment/5367245/+files/history.log

** Description changed:

- 
  Hello!
  Unfortunately the bug has apparently reappeared. I have a Windows 10 running 
in a VM, which after my today's "apt upgrade" goes into pause mode after a few 
seconds of running time.
  
  Until yesterday it used to work and I was able to boot the VM. During
  the kernel update (from 5.4.0-28.33 to 5.4.0-29.34) the VM was active
  and then went into pause mode. Even after a reboot of my host system the
  problem still persists: the VM boots for a few seconds and then switches
  to pause mode.
  
+ Current Kernel: Linux andreas-laptop 5.4.0-29-generic #33-Ubuntu SMP Wed
+ Apr 29 14:32:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
+ 
  
  Kind regards,
-Andreas
+    Andreas

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1877052

Title:
  KVM Win 10 guest pauses after kernel upgrade

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  New

Bug description:
  Hello!
  Unfortunately the bug has apparently reappeared. I have a Windows 10 running 
in a VM, which after my today's "apt upgrade" goes into pause mode after a few 
seconds of running time.

  Until yesterday it used to work and I was able to boot the VM. During
  the kernel update (from 5.4.0-28.33 to 5.4.0-29.34) the VM was active
  and then went into pause mode. Even after a reboot of my host system
  the problem still persists: the VM boots for a few seconds and then
  switches to pause mode.

  Current Kernel: Linux andreas-laptop 5.4.0-29-generic #33-Ubuntu SMP
  Wed Apr 29 14:32:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  Maybe relevant logfile lines:
  2020-05-06T07:46:42.857574Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12]
  2020-05-06T07:46:42.857718Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13]
  2020-05-06T07:46:42.860567Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12]
  2020-05-06T07:46:42.860582Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13]
  2020-05-06T07:47:22.901057Z qemu-system-x86_64: terminating on signal 15 from 
pid 1593 (/usr/sbin/libvirtd)
  2020-05-06 07:47:23.101+: shutting down, reason=destroyed


  Kind regards,
     Andreas

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1877052/+subscriptions

[Bug 1877052] Re: KVM Win 10 guest pauses after kernel upgrade

2020-05-06 Thread Andreas Weller

** Attachment added: "Dpkg -l Output"
   
https://bugs.launchpad.net/qemu/+bug/1877052/+attachment/5367247/+files/dpkg-list.txt

** Description changed:

  Hello!
  Unfortunately the bug has apparently reappeared. I have a Windows 10 running 
in a VM, which after my today's "apt upgrade" goes into pause mode after a few 
seconds of running time.
  
  Until yesterday it used to work and I was able to boot the VM. During
  the kernel update (from 5.4.0-28.33 to 5.4.0-29.34) the VM was active
  and then went into pause mode. Even after a reboot of my host system the
  problem still persists: the VM boots for a few seconds and then switches
  to pause mode.
  
  Current Kernel: Linux andreas-laptop 5.4.0-29-generic #33-Ubuntu SMP Wed
  Apr 29 14:32:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  
+ Maybe relevant logfile lines:
+ 2020-05-06T07:46:42.857574Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12]
+ 2020-05-06T07:46:42.857718Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13]
+ 2020-05-06T07:46:42.860567Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12]
+ 2020-05-06T07:46:42.860582Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13]
+ 2020-05-06T07:47:22.901057Z qemu-system-x86_64: terminating on signal 15 from 
pid 1593 (/usr/sbin/libvirtd)
+ 2020-05-06 07:47:23.101+: shutting down, reason=destroyed
+ 
  
  Kind regards,
     Andreas

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1877052

Title:
  KVM Win 10 guest pauses after kernel upgrade

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  New

Bug description:
  Hello!
  Unfortunately the bug has apparently reappeared. I have a Windows 10 running 
in a VM, which after my today's "apt upgrade" goes into pause mode after a few 
seconds of running time.

  Until yesterday it used to work and I was able to boot the VM. During
  the kernel update (from 5.4.0-28.33 to 5.4.0-29.34) the VM was active
  and then went into pause mode. Even after a reboot of my host system
  the problem still persists: the VM boots for a few seconds and then
  switches to pause mode.

  Current Kernel: Linux andreas-laptop 5.4.0-29-generic #33-Ubuntu SMP
  Wed Apr 29 14:32:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  Maybe relevant logfile lines:
  2020-05-06T07:46:42.857574Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12]
  2020-05-06T07:46:42.857718Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13]
  2020-05-06T07:46:42.860567Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12]
  2020-05-06T07:46:42.860582Z qemu-system-x86_64: warning: host doesn't support 
requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13]
  2020-05-06T07:47:22.901057Z qemu-system-x86_64: terminating on signal 15 from 
pid 1593 (/usr/sbin/libvirtd)
  2020-05-06 07:47:23.101+: shutting down, reason=destroyed


  Kind regards,
     Andreas

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1877052/+subscriptions

Re: [PATCH Kernel v18 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-05-06 Thread Yan Zhao

On Mon, May 04, 2020 at 11:58:56PM +0800, Kirti Wankhede wrote:
> VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
> - Start dirty pages tracking while migration is active
> - Stop dirty pages tracking.
> - Get dirty pages bitmap. Its user space application's responsibility to
>   copy content of dirty pages from source to destination during migration.
> 
> To prevent DoS attack, memory for bitmap is allocated per vfio_dma
> structure. Bitmap size is calculated considering smallest supported page
> size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled
> 
> Bitmap is populated for already pinned pages when bitmap is allocated for
> a vfio_dma with the smallest supported page size. Update bitmap from
> pinning functions when tracking is enabled. When user application queries
> bitmap, check if requested page size is same as page size used to
> populated bitmap. If it is equal, copy bitmap, but if not equal, return
> error.
> 
> Fixed below error by changing pgsize type from uint64_t to size_t.
> Reported-by: kbuild test robot 
> 
> All errors:
> drivers/vfio/vfio_iommu_type1.c:197: undefined reference to `__udivdi3'
> 
> drivers/vfio/vfio_iommu_type1.c:225: undefined reference to `__udivdi3'
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 266 
> +++-
>  1 file changed, 260 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index fa735047b04d..01dcb417836f 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -71,6 +71,7 @@ struct vfio_iommu {
>   unsigned intdma_avail;
>   boolv2;
>   boolnesting;
> + booldirty_page_tracking;
>  };
>  
>  struct vfio_domain {
> @@ -91,6 +92,7 @@ struct vfio_dma {
>   boollock_cap;   /* capable(CAP_IPC_LOCK) */
>   struct task_struct  *task;
>   struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
> + unsigned long   *bitmap;
>  };
>  
>  struct vfio_group {
> @@ -125,7 +127,21 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
>   (!list_empty(&iommu->domain_list))
>  
> +#define DIRTY_BITMAP_BYTES(n)(ALIGN(n, BITS_PER_TYPE(u64)) / 
> BITS_PER_BYTE)
> +
> +/*
> + * Input argument of number of bits to bitmap_set() is unsigned integer, 
> which
> + * further casts to signed integer for unaligned multi-bit operation,
> + * __bitmap_set().
> + * Then maximum bitmap size supported is 2^31 bits divided by 2^3 bits/byte,
> + * that is 2^28 (256 MB) which maps to 2^31 * 2^12 = 2^43 (8TB) on 4K page
> + * system.
> + */
> +#define DIRTY_BITMAP_PAGES_MAX((u64)INT_MAX)
> +#define DIRTY_BITMAP_SIZE_MAX 
> DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
> +
>  static int put_pfn(unsigned long pfn, int prot);
> +static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
>  
>  /*
>   * This code handles mapping and unmapping of user data buffers
> @@ -175,6 +191,77 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, 
> struct vfio_dma *old)
>   rb_erase(&old->node, &iommu->dma_list);
>  }
>  
> +
> +static int vfio_dma_bitmap_alloc(struct vfio_dma *dma, size_t pgsize)
> +{
> + uint64_t npages = dma->size / pgsize;
> +
> + if (npages > DIRTY_BITMAP_PAGES_MAX)
> + return -EINVAL;
> +
> + dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
> + if (!dma->bitmap)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +static void vfio_dma_bitmap_free(struct vfio_dma *dma)
> +{
> + kfree(dma->bitmap);
> + dma->bitmap = NULL;
> +}
> +
> +static void vfio_dma_populate_bitmap(struct vfio_dma *dma, size_t pgsize)
> +{
> + struct rb_node *p;
> +
> + if (RB_EMPTY_ROOT(&dma->pfn_list))
> + return;
> +
> + for (p = rb_first(&dma->pfn_list); p; p = rb_next(p)) {
> + struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn, node);
> +
> + bitmap_set(dma->bitmap, (vpfn->iova - dma->iova) / pgsize, 1);
> + }
> +}
> +
> +static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, size_t pgsize)
> +{
> + struct rb_node *n = rb_first(&iommu->dma_list);
> +
> + for (; n; n = rb_next(n)) {
> + struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
> + int ret;
> +
> + ret = vfio_dma_bitmap_alloc(dma, pgsize);
> + if (ret) {
> + struct rb_node *p = rb_prev(n);
> +
> + for (; p; p = rb_prev(p)) {
> + struct vfio_dma *dma = rb_entry(n,
> + struct vfio_dma, node);
> +
> + vfio_dma_bitmap_free(dma);
> + }
> +

[PATCH v2 1/9] hw/net/xilinx_axienet: Auto-clear PHY Autoneg

2020-05-06 Thread Edgar E. Iglesias

From: "Edgar E. Iglesias" 

Auto-clear PHY CR Autoneg bits. This makes this model
work with recent Linux kernels.

Reviewed-by: Francisco Iglesias 
Signed-off-by: Edgar E. Iglesias 
---
 hw/net/xilinx_axienet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c
index 704788811a..0f97510d8a 100644
--- a/hw/net/xilinx_axienet.c
+++ b/hw/net/xilinx_axienet.c
@@ -149,8 +149,8 @@ tdk_write(struct PHY *phy, unsigned int req, unsigned int 
data)
 break;
 }
 
-/* Unconditionally clear regs[BMCR][BMCR_RESET] */
-phy->regs[0] &= ~0x8000;
+/* Unconditionally clear regs[BMCR][BMCR_RESET] and auto-neg */
+phy->regs[0] &= ~0x8200;
 }
 
 static void
-- 
2.20.1

[PATCH v2 0/9] hw/core: stream: Add end-of-packet flag

2020-05-06 Thread Edgar E. Iglesias

From: "Edgar E. Iglesias" 

Hi,

When modeling pipelines of processing nodes that communicate
through streaming interfaces (e.g AXI-Stream), some of these
nodes send packets while others may just stream unpacketized data.

The purpose of this series is to add an end-of-packet flag, e.g
what AXI-Stream calls tlast. This is in preparation for modeling
future nodes that may use huge packets that we wouldn't be able
to buffer and also to handle nodes that don't use packets.

Along the way I fixed a few things in the petalinux-ml605 eth setup.

Cheers,
Edgar

ChangeLog:

v1 -> v2:
* Check that packets fit c_txmem

Edgar E. Iglesias (9):
  hw/net/xilinx_axienet: Auto-clear PHY Autoneg
  hw/net/xilinx_axienet: Cleanup stream->push assignment
  hw/net/xilinx_axienet: Remove unncessary cast
  hw/dma/xilinx_axidma: Add DMA memory-region property
  hw/core: stream: Add an end-of-packet flag
  hw/net/xilinx_axienet: Handle fragmented packets from DMA
  hw/dma/xilinx_axidma: mm2s: Stream descriptor by descriptor
  hw/dma/xilinx_axidma: s2mm: Support stream fragments
  MAINTAINERS: Add myself as streams maintainer

 include/hw/stream.h |  5 +--
 hw/core/stream.c|  4 +--
 hw/dma/xilinx_axidma.c  | 75 ++---
 hw/net/xilinx_axienet.c | 70 --
 hw/ssi/xilinx_spips.c   |  2 +-
 MAINTAINERS |  6 
 6 files changed, 113 insertions(+), 49 deletions(-)

-- 
2.20.1

[PATCH v2 8/9] hw/dma/xilinx_axidma: s2mm: Support stream fragments

2020-05-06 Thread Edgar E. Iglesias

From: "Edgar E. Iglesias" 

Add support for stream fragments.

Reviewed-by: Alistair Francis 
Signed-off-by: Edgar E. Iglesias 
---
 hw/dma/xilinx_axidma.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c
index 101d32a965..87be9cade7 100644
--- a/hw/dma/xilinx_axidma.c
+++ b/hw/dma/xilinx_axidma.c
@@ -110,6 +110,7 @@ struct Stream {
 
 int nr;
 
+bool sof;
 struct SDesc desc;
 unsigned int complete_cnt;
 uint32_t regs[R_MAX];
@@ -174,6 +175,7 @@ static void stream_reset(struct Stream *s)
 {
 s->regs[R_DMASR] = DMASR_HALTED;  /* starts up halted.  */
 s->regs[R_DMACR] = 1 << 16; /* Starts with one in compl threshold.  */
+s->sof = true;
 }
 
 /* Map an offset addr into a channel index.  */
@@ -321,12 +323,11 @@ static void stream_process_mem2s(struct Stream *s, 
StreamSlave *tx_data_dev,
 }
 
 static size_t stream_process_s2mem(struct Stream *s, unsigned char *buf,
-   size_t len)
+   size_t len, bool eop)
 {
 uint32_t prev_d;
 unsigned int rxlen;
 size_t pos = 0;
-int sof = 1;
 
 if (!stream_running(s) || stream_idle(s)) {
 return 0;
@@ -352,16 +353,16 @@ static size_t stream_process_s2mem(struct Stream *s, 
unsigned char *buf,
 pos += rxlen;
 
 /* Update the descriptor.  */
-if (!len) {
+if (eop) {
 stream_complete(s);
 memcpy(s->desc.app, s->app, sizeof(s->desc.app));
 s->desc.status |= SDESC_STATUS_EOF;
 }
 
-s->desc.status |= sof << SDESC_STATUS_SOF_BIT;
+s->desc.status |= s->sof << SDESC_STATUS_SOF_BIT;
 s->desc.status |= SDESC_STATUS_COMPLETE;
 stream_desc_store(s, s->regs[R_CURDESC]);
-sof = 0;
+s->sof = eop;
 
 /* Advance.  */
 prev_d = s->regs[R_CURDESC];
@@ -426,8 +427,7 @@ xilinx_axidma_data_stream_push(StreamSlave *obj, unsigned 
char *buf, size_t len,
 struct Stream *s = &ds->dma->streams[1];
 size_t ret;
 
-assert(eop);
-ret = stream_process_s2mem(s, buf, len);
+ret = stream_process_s2mem(s, buf, len, eop);
 stream_update_irq(s);
 return ret;
 }
-- 
2.20.1

[PATCH v2 2/9] hw/net/xilinx_axienet: Cleanup stream->push assignment

2020-05-06 Thread Edgar E. Iglesias

From: "Edgar E. Iglesias" 

Split the shared stream_class_init function to assign
stream->push with better type-safety.

Reviewed-by: Alistair Francis 
Reviewed-by: Francisco Iglesias 
Signed-off-by: Edgar E. Iglesias 
---
 hw/net/xilinx_axienet.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c
index 0f97510d8a..84073753d7 100644
--- a/hw/net/xilinx_axienet.c
+++ b/hw/net/xilinx_axienet.c
@@ -1029,11 +1029,19 @@ static void xilinx_enet_class_init(ObjectClass *klass, 
void *data)
 dc->reset = xilinx_axienet_reset;
 }
 
-static void xilinx_enet_stream_class_init(ObjectClass *klass, void *data)
+static void xilinx_enet_control_stream_class_init(ObjectClass *klass,
+  void *data)
 {
 StreamSlaveClass *ssc = STREAM_SLAVE_CLASS(klass);
 
-ssc->push = data;
+ssc->push = xilinx_axienet_control_stream_push;
+}
+
+static void xilinx_enet_data_stream_class_init(ObjectClass *klass, void *data)
+{
+StreamSlaveClass *ssc = STREAM_SLAVE_CLASS(klass);
+
+ssc->push = xilinx_axienet_data_stream_push;
 }
 
 static const TypeInfo xilinx_enet_info = {
@@ -1048,8 +1056,7 @@ static const TypeInfo xilinx_enet_data_stream_info = {
 .name  = TYPE_XILINX_AXI_ENET_DATA_STREAM,
 .parent= TYPE_OBJECT,
 .instance_size = sizeof(struct XilinxAXIEnetStreamSlave),
-.class_init= xilinx_enet_stream_class_init,
-.class_data= xilinx_axienet_data_stream_push,
+.class_init= xilinx_enet_data_stream_class_init,
 .interfaces = (InterfaceInfo[]) {
 { TYPE_STREAM_SLAVE },
 { }
@@ -1060,8 +1067,7 @@ static const TypeInfo xilinx_enet_control_stream_info = {
 .name  = TYPE_XILINX_AXI_ENET_CONTROL_STREAM,
 .parent= TYPE_OBJECT,
 .instance_size = sizeof(struct XilinxAXIEnetStreamSlave),
-.class_init= xilinx_enet_stream_class_init,
-.class_data= xilinx_axienet_control_stream_push,
+.class_init= xilinx_enet_control_stream_class_init,
 .interfaces = (InterfaceInfo[]) {
 { TYPE_STREAM_SLAVE },
 { }
-- 
2.20.1

[PATCH v2 3/9] hw/net/xilinx_axienet: Remove unncessary cast

2020-05-06 Thread Edgar E. Iglesias

From: "Edgar E. Iglesias" 

Remove unncessary cast, buf is already uint8_t *.
No functional change.

Reviewed-by: Alistair Francis 
Reviewed-by: Francisco Iglesias 
Signed-off-by: Edgar E. Iglesias 
---
 hw/net/xilinx_axienet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c
index 84073753d7..c8dfcda3ee 100644
--- a/hw/net/xilinx_axienet.c
+++ b/hw/net/xilinx_axienet.c
@@ -918,7 +918,7 @@ xilinx_axienet_data_stream_push(StreamSlave *obj, uint8_t 
*buf, size_t size)
 uint16_t csum;
 
 tmp_csum = net_checksum_add(size - start_off,
-(uint8_t *)buf + start_off);
+buf + start_off);
 /* Accumulate the seed.  */
 tmp_csum += s->hdr[2] & 0x;
 
-- 
2.20.1

[PATCH v2 9/9] MAINTAINERS: Add myself as streams maintainer

2020-05-06 Thread Edgar E. Iglesias

From: "Edgar E. Iglesias" 

Since we're missing a maintainer, add myself.

Reviewed-by: Alistair Francis 
Signed-off-by: Edgar E. Iglesias 
---
 MAINTAINERS | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1f84e3ae2c..d3663d6c9a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2315,6 +2315,12 @@ F: net/slirp.c
 F: include/net/slirp.h
 T: git https://people.debian.org/~sthibault/qemu.git slirp
 
+Streams
+M: Edgar E. Iglesias 
+S: Maintained
+F: hw/core/stream.c
+F: include/hw/stream.h
+
 Stubs
 M: Paolo Bonzini 
 S: Maintained
-- 
2.20.1

[PATCH v2 5/9] hw/core: stream: Add an end-of-packet flag

2020-05-06 Thread Edgar E. Iglesias

From: "Edgar E. Iglesias" 

Some stream clients stream an endless stream of data while
other clients stream data in packets. Stream interfaces
usually have a way to signal the end of a packet or the
last beat of a transfer.

This adds an end-of-packet flag to the push interface.

Reviewed-by: Alistair Francis 
Reviewed-by: Francisco Iglesias 
Signed-off-by: Edgar E. Iglesias 
---
 include/hw/stream.h |  5 +++--
 hw/core/stream.c|  4 ++--
 hw/dma/xilinx_axidma.c  | 10 ++
 hw/net/xilinx_axienet.c | 14 ++
 hw/ssi/xilinx_spips.c   |  2 +-
 5 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/include/hw/stream.h b/include/hw/stream.h
index d02f62ca89..ed09e83683 100644
--- a/include/hw/stream.h
+++ b/include/hw/stream.h
@@ -39,12 +39,13 @@ typedef struct StreamSlaveClass {
  * @obj: Stream slave to push to
  * @buf: Data to write
  * @len: Maximum number of bytes to write
+ * @eop: End of packet flag
  */
-size_t (*push)(StreamSlave *obj, unsigned char *buf, size_t len);
+size_t (*push)(StreamSlave *obj, unsigned char *buf, size_t len, bool eop);
 } StreamSlaveClass;
 
 size_t
-stream_push(StreamSlave *sink, uint8_t *buf, size_t len);
+stream_push(StreamSlave *sink, uint8_t *buf, size_t len, bool eop);
 
 bool
 stream_can_push(StreamSlave *sink, StreamCanPushNotifyFn notify,
diff --git a/hw/core/stream.c b/hw/core/stream.c
index 39b1e595cd..a65ad1208d 100644
--- a/hw/core/stream.c
+++ b/hw/core/stream.c
@@ -3,11 +3,11 @@
 #include "qemu/module.h"
 
 size_t
-stream_push(StreamSlave *sink, uint8_t *buf, size_t len)
+stream_push(StreamSlave *sink, uint8_t *buf, size_t len, bool eop)
 {
 StreamSlaveClass *k =  STREAM_SLAVE_GET_CLASS(sink);
 
-return k->push(sink, buf, len);
+return k->push(sink, buf, len, eop);
 }
 
 bool
diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c
index 4540051448..a770e12c96 100644
--- a/hw/dma/xilinx_axidma.c
+++ b/hw/dma/xilinx_axidma.c
@@ -283,7 +283,7 @@ static void stream_process_mem2s(struct Stream *s, 
StreamSlave *tx_data_dev,
 
 if (stream_desc_sof(&s->desc)) {
 s->pos = 0;
-stream_push(tx_control_dev, s->desc.app, sizeof(s->desc.app));
+stream_push(tx_control_dev, s->desc.app, sizeof(s->desc.app), 
true);
 }
 
 txlen = s->desc.control & SDESC_CTRL_LEN_MASK;
@@ -298,7 +298,7 @@ static void stream_process_mem2s(struct Stream *s, 
StreamSlave *tx_data_dev,
 s->pos += txlen;
 
 if (stream_desc_eof(&s->desc)) {
-stream_push(tx_data_dev, s->txbuf, s->pos);
+stream_push(tx_data_dev, s->txbuf, s->pos, true);
 s->pos = 0;
 stream_complete(s);
 }
@@ -384,7 +384,7 @@ static void xilinx_axidma_reset(DeviceState *dev)
 
 static size_t
 xilinx_axidma_control_stream_push(StreamSlave *obj, unsigned char *buf,
-  size_t len)
+  size_t len, bool eop)
 {
 XilinxAXIDMAStreamSlave *cs = XILINX_AXI_DMA_CONTROL_STREAM(obj);
 struct Stream *s = &cs->dma->streams[1];
@@ -416,12 +416,14 @@ xilinx_axidma_data_stream_can_push(StreamSlave *obj,
 }
 
 static size_t
-xilinx_axidma_data_stream_push(StreamSlave *obj, unsigned char *buf, size_t 
len)
+xilinx_axidma_data_stream_push(StreamSlave *obj, unsigned char *buf, size_t 
len,
+   bool eop)
 {
 XilinxAXIDMAStreamSlave *ds = XILINX_AXI_DMA_DATA_STREAM(obj);
 struct Stream *s = &ds->dma->streams[1];
 size_t ret;
 
+assert(eop);
 ret = stream_process_s2mem(s, buf, len);
 stream_update_irq(s);
 return ret;
diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c
index c8dfcda3ee..bd48305577 100644
--- a/hw/net/xilinx_axienet.c
+++ b/hw/net/xilinx_axienet.c
@@ -697,14 +697,14 @@ static void axienet_eth_rx_notify(void *opaque)
axienet_eth_rx_notify, s)) {
 size_t ret = stream_push(s->tx_control_dev,
  (void *)s->rxapp + CONTROL_PAYLOAD_SIZE
- - s->rxappsize, s->rxappsize);
+ - s->rxappsize, s->rxappsize, true);
 s->rxappsize -= ret;
 }
 
 while (s->rxsize && stream_can_push(s->tx_data_dev,
 axienet_eth_rx_notify, s)) {
 size_t ret = stream_push(s->tx_data_dev, (void *)s->rxmem + s->rxpos,
- s->rxsize);
+ s->rxsize, true);
 s->rxsize -= ret;
 s->rxpos += ret;
 if (!s->rxsize) {
@@ -874,12 +874,14 @@ static ssize_t eth_rx(NetClientState *nc, const uint8_t 
*buf, size_t size)
 }
 
 static size_t
-xilinx_axienet_control_stream_push(StreamSlave *obj, uint8_t *buf, size_t len)
+xilinx_axienet_control_stream_push(StreamSlave *obj, uint8_t *buf, size_t len,
+   bool eop)
 {
 int i;
 Xi

[PATCH v2 4/9] hw/dma/xilinx_axidma: Add DMA memory-region property

2020-05-06 Thread Edgar E. Iglesias

From: "Edgar E. Iglesias" 

Add DMA memory-region property to externally control what
address-space this DMA operates on.

Reviewed-by: Alistair Francis 
Reviewed-by: Francisco Iglesias 
Signed-off-by: Edgar E. Iglesias 
---
 hw/dma/xilinx_axidma.c | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c
index 018f36991b..4540051448 100644
--- a/hw/dma/xilinx_axidma.c
+++ b/hw/dma/xilinx_axidma.c
@@ -33,6 +33,7 @@
 #include "qemu/log.h"
 #include "qemu/module.h"
 
+#include "sysemu/dma.h"
 #include "hw/stream.h"
 
 #define D(x)
@@ -103,6 +104,7 @@ enum {
 };
 
 struct Stream {
+struct XilinxAXIDMA *dma;
 ptimer_state *ptimer;
 qemu_irq irq;
 
@@ -125,6 +127,9 @@ struct XilinxAXIDMAStreamSlave {
 struct XilinxAXIDMA {
 SysBusDevice busdev;
 MemoryRegion iomem;
+MemoryRegion *dma_mr;
+AddressSpace as;
+
 uint32_t freqhz;
 StreamSlave *tx_data_dev;
 StreamSlave *tx_control_dev;
@@ -186,7 +191,7 @@ static void stream_desc_load(struct Stream *s, hwaddr addr)
 {
 struct SDesc *d = &s->desc;
 
-cpu_physical_memory_read(addr, d, sizeof *d);
+address_space_read(&s->dma->as, addr, MEMTXATTRS_UNSPECIFIED, d, sizeof 
*d);
 
 /* Convert from LE into host endianness.  */
 d->buffer_address = le64_to_cpu(d->buffer_address);
@@ -204,7 +209,8 @@ static void stream_desc_store(struct Stream *s, hwaddr addr)
 d->nxtdesc = cpu_to_le64(d->nxtdesc);
 d->control = cpu_to_le32(d->control);
 d->status = cpu_to_le32(d->status);
-cpu_physical_memory_write(addr, d, sizeof *d);
+address_space_write(&s->dma->as, addr, MEMTXATTRS_UNSPECIFIED,
+d, sizeof *d);
 }
 
 static void stream_update_irq(struct Stream *s)
@@ -286,8 +292,9 @@ static void stream_process_mem2s(struct Stream *s, 
StreamSlave *tx_data_dev,
  txlen + s->pos);
 }
 
-cpu_physical_memory_read(s->desc.buffer_address,
- s->txbuf + s->pos, txlen);
+address_space_read(&s->dma->as, s->desc.buffer_address,
+   MEMTXATTRS_UNSPECIFIED,
+   s->txbuf + s->pos, txlen);
 s->pos += txlen;
 
 if (stream_desc_eof(&s->desc)) {
@@ -336,7 +343,8 @@ static size_t stream_process_s2mem(struct Stream *s, 
unsigned char *buf,
 rxlen = len;
 }
 
-cpu_physical_memory_write(s->desc.buffer_address, buf + pos, rxlen);
+address_space_write(&s->dma->as, s->desc.buffer_address,
+MEMTXATTRS_UNSPECIFIED, buf + pos, rxlen);
 len -= rxlen;
 pos += rxlen;
 
@@ -525,6 +533,7 @@ static void xilinx_axidma_realize(DeviceState *dev, Error 
**errp)
 XilinxAXIDMAStreamSlave *cs = XILINX_AXI_DMA_CONTROL_STREAM(
 
&s->rx_control_dev);
 Error *local_err = NULL;
+int i;
 
 object_property_add_link(OBJECT(ds), "dma", TYPE_XILINX_AXI_DMA,
  (Object **)&ds->dma,
@@ -545,17 +554,19 @@ static void xilinx_axidma_realize(DeviceState *dev, Error 
**errp)
 goto xilinx_axidma_realize_fail;
 }
 
-int i;
-
 for (i = 0; i < 2; i++) {
 struct Stream *st = &s->streams[i];
 
+st->dma = s;
 st->nr = i;
 st->ptimer = ptimer_init(timer_hit, st, PTIMER_POLICY_DEFAULT);
 ptimer_transaction_begin(st->ptimer);
 ptimer_set_freq(st->ptimer, s->freqhz);
 ptimer_transaction_commit(st->ptimer);
 }
+
+address_space_init(&s->as,
+   s->dma_mr ? s->dma_mr : get_system_memory(), "dma");
 return;
 
 xilinx_axidma_realize_fail:
@@ -575,6 +586,11 @@ static void xilinx_axidma_init(Object *obj)
 &s->rx_control_dev, sizeof(s->rx_control_dev),
 TYPE_XILINX_AXI_DMA_CONTROL_STREAM, &error_abort,
 NULL);
+object_property_add_link(obj, "dma", TYPE_MEMORY_REGION,
+ (Object **)&s->dma_mr,
+ qdev_prop_allow_set_link_before_realize,
+ OBJ_PROP_LINK_STRONG,
+ &error_abort);
 
 sysbus_init_irq(sbd, &s->streams[0].irq);
 sysbus_init_irq(sbd, &s->streams[1].irq);
-- 
2.20.1

Re: [PATCH v23 0/4] implement zstd cluster compression method

2020-05-06 Thread Vladimir Sementsov-Ogievskiy


06.05.2020 11:01, Denis Plotnikov wrote:



On 05.05.2020 15:03, Max Reitz wrote:

On 05.05.20 12:26, Max Reitz wrote:

On 30.04.20 12:19, Denis Plotnikov wrote:

v23:
    Undecided: whether to add zstd(zlib) compression
   details to the qcow2 spec
    03: tighten assertion on zstd decompression [Eric]
    04: use _rm_test_img appropriately [Max]

Thanks, applied to my block branch:

I’m afraid I have to unqueue this series again, because it makes many
iotests fail due to an additional “compression_type=zlib” output when
images are created, an additional “compression type” line in
qemu-img info output where format-specific information is not
suppressed, and an additional line in qemu-img create -f qcow2 -o help.

Max



Hmm, this is strange. I made some modifications for the tests
in 0001 of the series (qcow2: introduce compression type feature).

Among the other test related changes, the patch contains the hunk:

+++ b/tests/qemu-iotests/common.filter
@@ -152,7 +152,8 @@ _filter_img_create()
  -e "s# refcount_bits=[0-9]\\+##g" \
  -e "s# key-secret=[a-zA-Z0-9]\\+##g" \
  -e "s# iter-time=[0-9]\\+##g" \
-    -e "s# force_size=\\(on\\|off\\)##g"
+    -e "s# force_size=\\(on\\|off\\)##g" \
+    -e "s# compression_type=[a-zA-Z0-9]\\+##g"
  }

which has to filter out "compression_type" on image creation.

But you say that you can see the "compression_type" on the image creation.
May be the patch wasn't fully applied? Or the test related modification were 
omitted?

I've just re-based the series on top of:

681b07f4e76dbb700c16918d (vanilla/master, mainstream)
Merge: a2261b2754 714eb0dbc5
Author: Peter Maydell 
Date:   Tue May 5 15:47:44 2020 +0100

and run the tests with 'make check-block'

and got the following:

Not run: 071 099 184 220 259 267
Some cases not run in: 030 040 041
Passed all 113 iotests



Hmm, we definitely have a lot more iotests. So, I assume make check-block 
doesn't run all of them.

To run all:

cd tests/qemu-iotests
./check -qcow2
./check -raw
... and so for any format you want to test

I also recommend to do
  export TEST_DIR=/something
before running tests, where something is tmpfs or ssd, which will decrease 
testing time a lot. Still, some (very small subset) of tests doesn't run on 
tmpfs, you can rerun them with TEST_DIR=/normal/hdd/directory

--
Best regards,
Vladimir

[PATCH v2 6/9] hw/net/xilinx_axienet: Handle fragmented packets from DMA

2020-05-06 Thread Edgar E. Iglesias

From: "Edgar E. Iglesias" 

Add support for fragmented packets from the DMA.

Reviewed-by: Francisco Iglesias 
Signed-off-by: Edgar E. Iglesias 
---
 hw/net/xilinx_axienet.c | 38 +++---
 1 file changed, 31 insertions(+), 7 deletions(-)

diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c
index bd48305577..498afe2f54 100644
--- a/hw/net/xilinx_axienet.c
+++ b/hw/net/xilinx_axienet.c
@@ -402,6 +402,9 @@ struct XilinxAXIEnet {
 
 uint32_t hdr[CONTROL_PAYLOAD_WORDS];
 
+uint8_t *txmem;
+uint32_t txpos;
+
 uint8_t *rxmem;
 uint32_t rxsize;
 uint32_t rxpos;
@@ -421,6 +424,7 @@ static void axienet_rx_reset(XilinxAXIEnet *s)
 static void axienet_tx_reset(XilinxAXIEnet *s)
 {
 s->tc = TC_JUM | TC_TX | TC_VLAN;
+s->txpos = 0;
 }
 
 static inline int axienet_rx_resetting(XilinxAXIEnet *s)
@@ -902,17 +906,35 @@ xilinx_axienet_data_stream_push(StreamSlave *obj, uint8_t 
*buf, size_t size,
 XilinxAXIEnetStreamSlave *ds = XILINX_AXI_ENET_DATA_STREAM(obj);
 XilinxAXIEnet *s = ds->enet;
 
-/* We don't support fragmented packets yet.  */
-assert(eop);
-
 /* TX enable ?  */
 if (!(s->tc & TC_TX)) {
 return size;
 }
 
+if (s->txpos + size > s->c_txmem) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Packet larger than txmem\n",
+  TYPE_XILINX_AXI_ENET);
+s->txpos = 0;
+return size;
+}
+
+if (s->txpos == 0 && eop) {
+/* Fast path single fragment.  */
+s->txpos = size;
+} else {
+memcpy(s->txmem + s->txpos, buf, size);
+buf = s->txmem;
+s->txpos += size;
+
+if (!eop) {
+return size;
+}
+}
+
 /* Jumbo or vlan sizes ?  */
 if (!(s->tc & TC_JUM)) {
-if (size > 1518 && size <= 1522 && !(s->tc & TC_VLAN)) {
+if (s->txpos > 1518 && s->txpos <= 1522 && !(s->tc & TC_VLAN)) {
+s->txpos = 0;
 return size;
 }
 }
@@ -923,7 +945,7 @@ xilinx_axienet_data_stream_push(StreamSlave *obj, uint8_t 
*buf, size_t size,
 uint32_t tmp_csum;
 uint16_t csum;
 
-tmp_csum = net_checksum_add(size - start_off,
+tmp_csum = net_checksum_add(s->txpos - start_off,
 buf + start_off);
 /* Accumulate the seed.  */
 tmp_csum += s->hdr[2] & 0x;
@@ -936,12 +958,13 @@ xilinx_axienet_data_stream_push(StreamSlave *obj, uint8_t 
*buf, size_t size,
 buf[write_off + 1] = csum & 0xff;
 }
 
-qemu_send_packet(qemu_get_queue(s->nic), buf, size);
+qemu_send_packet(qemu_get_queue(s->nic), buf, s->txpos);
 
-s->stats.tx_bytes += size;
+s->stats.tx_bytes += s->txpos;
 s->regs[R_IS] |= IS_TX_COMPLETE;
 enet_update_irq(s);
 
+s->txpos = 0;
 return size;
 }
 
@@ -989,6 +1012,7 @@ static void xilinx_enet_realize(DeviceState *dev, Error 
**errp)
 s->TEMAC.parent = s;
 
 s->rxmem = g_malloc(s->c_rxmem);
+s->txmem = g_malloc(s->c_txmem);
 return;
 
 xilinx_enet_realize_fail:
-- 
2.20.1

[PATCH v2 7/9] hw/dma/xilinx_axidma: mm2s: Stream descriptor by descriptor

2020-05-06 Thread Edgar E. Iglesias

From: "Edgar E. Iglesias" 

Stream descriptor by descriptor from memory instead of
buffering entire packets before pushing. This enables
non-packet streaming clients to work and also lifts the
limitation that our internal DMA buffer needs to be able
to hold entire packets.

Reviewed-by: Alistair Francis 
Signed-off-by: Edgar E. Iglesias 
---
 hw/dma/xilinx_axidma.c | 31 +--
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c
index a770e12c96..101d32a965 100644
--- a/hw/dma/xilinx_axidma.c
+++ b/hw/dma/xilinx_axidma.c
@@ -111,7 +111,6 @@ struct Stream {
 int nr;
 
 struct SDesc desc;
-int pos;
 unsigned int complete_cnt;
 uint32_t regs[R_MAX];
 uint8_t app[20];
@@ -267,7 +266,9 @@ static void stream_process_mem2s(struct Stream *s, 
StreamSlave *tx_data_dev,
  StreamSlave *tx_control_dev)
 {
 uint32_t prev_d;
-unsigned int txlen;
+uint32_t txlen;
+uint64_t addr;
+bool eop;
 
 if (!stream_running(s) || stream_idle(s)) {
 return;
@@ -282,24 +283,26 @@ static void stream_process_mem2s(struct Stream *s, 
StreamSlave *tx_data_dev,
 }
 
 if (stream_desc_sof(&s->desc)) {
-s->pos = 0;
 stream_push(tx_control_dev, s->desc.app, sizeof(s->desc.app), 
true);
 }
 
 txlen = s->desc.control & SDESC_CTRL_LEN_MASK;
-if ((txlen + s->pos) > sizeof s->txbuf) {
-hw_error("%s: too small internal txbuf! %d\n", __func__,
- txlen + s->pos);
-}
 
-address_space_read(&s->dma->as, s->desc.buffer_address,
-   MEMTXATTRS_UNSPECIFIED,
-   s->txbuf + s->pos, txlen);
-s->pos += txlen;
+eop = stream_desc_eof(&s->desc);
+addr = s->desc.buffer_address;
+while (txlen) {
+unsigned int len;
+
+len = txlen > sizeof s->txbuf ? sizeof s->txbuf : txlen;
+address_space_read(&s->dma->as, addr,
+   MEMTXATTRS_UNSPECIFIED,
+   s->txbuf, len);
+stream_push(tx_data_dev, s->txbuf, len, eop && len == txlen);
+txlen -= len;
+addr += len;
+}
 
-if (stream_desc_eof(&s->desc)) {
-stream_push(tx_data_dev, s->txbuf, s->pos, true);
-s->pos = 0;
+if (eop) {
 stream_complete(s);
 }
 
-- 
2.20.1

Re: [PATCH v4 03/13] acpi: rtc: use a single crs range

2020-05-06 Thread Gerd Hoffmann

  Hi,

> >  crs = aml_resource_template();
> >  aml_append(crs, aml_io(AML_DECODE16, RTC_ISA_BASE, RTC_ISA_BASE,
> > -   0x10, 0x02));
> > +   0x10, 0x08));
> >  aml_append(crs, aml_irq_no_flags(RTC_ISA_IRQ));
> > -aml_append(crs, aml_io(AML_DECODE16, RTC_ISA_BASE + 2, RTC_ISA_BASE + 
> > 2,
> > -   0x02, 0x06));
> can we just drop the later range as unused? (I don't see where it's actually 
> initialized)

I'd rather follow what physical hardware is doing here
for better compatibility ...

take care,
  Gerd

Re: [PATCH 1/2] migration/multifd: fix memleaks in multifd_new_send_channel_async

2020-05-06 Thread Juan Quintela

Pan Nengyuan  wrote:
> When error happen in multifd_new_send_channel_async, 'sioc' will not be used
> to create the multifd_send_thread. Let's free it to avoid a memleak. And also
> do error_free after migrate_set_error() to avoid another leak in the same 
> place.
>
> The leak stack:
> Direct leak of 2880 byte(s) in 8 object(s) allocated from:
> #0 0x7f20b5118ae8 in __interceptor_malloc (/lib64/libasan.so.5+0xefae8)
> #1 0x7f20b44df1d5 in g_malloc (/lib64/libglib-2.0.so.0+0x531d5)
> #2 0x564133bce18b in object_new_with_type 
> /mnt/sdb/backup/qemu/qom/object.c:683
> #3 0x564133eea950 in qio_channel_socket_new 
> /mnt/sdb/backup/qemu/io/channel-socket.c:56
> #4 0x5641339cfe4f in socket_send_channel_create 
> /mnt/sdb/backup/qemu/migration/socket.c:37
> #5 0x564133a10328 in multifd_save_setup 
> /mnt/sdb/backup/qemu/migration/multifd.c:772
> #6 0x5641339cebed in migrate_fd_connect 
> /mnt/sdb/backup/qemu/migration/migration.c:3530
> #7 0x5641339d15e4 in migration_channel_connect 
> /mnt/sdb/backup/qemu/migration/channel.c:92
> #8 0x5641339cf5b7 in socket_outgoing_migration 
> /mnt/sdb/backup/qemu/migration/socket.c:108
>
> Direct leak of 384 byte(s) in 8 object(s) allocated from:
> #0 0x7f20b5118cf0 in calloc (/lib64/libasan.so.5+0xefcf0)
> #1 0x7f20b44df22d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5322d)
> #2 0x56413406fc17 in error_setv /mnt/sdb/backup/qemu/util/error.c:61
> #3 0x564134070464 in error_setg_errno_internal 
> /mnt/sdb/backup/qemu/util/error.c:109
> #4 0x5641340851be in inet_connect_addr 
> /mnt/sdb/backup/qemu/util/qemu-sockets.c:379
> #5 0x5641340851be in inet_connect_saddr 
> /mnt/sdb/backup/qemu/util/qemu-sockets.c:458
> #6 0x5641340870ab in socket_connect 
> /mnt/sdb/backup/qemu/util/qemu-sockets.c:1105
> #7 0x564133eeaabf in qio_channel_socket_connect_sync 
> /mnt/sdb/backup/qemu/io/channel-socket.c:145
> #8 0x564133eeabf5 in qio_channel_socket_connect_worker 
> /mnt/sdb/backup/qemu/io/channel-socket.c:168
>
> Indirect leak of 360 byte(s) in 8 object(s) allocated from:
> #0 0x7f20b5118ae8 in __interceptor_malloc (/lib64/libasan.so.5+0xefae8)
> #1 0x7f20af901817 in __GI___vasprintf_chk (/lib64/libc.so.6+0x10d817)
> #2 0x7f20b451fa6c in g_vasprintf (/lib64/libglib-2.0.so.0+0x93a6c)
> #3 0x7f20b44f8cd0 in g_strdup_vprintf (/lib64/libglib-2.0.so.0+0x6ccd0)
> #4 0x7f20b44f8d8c in g_strdup_printf (/lib64/libglib-2.0.so.0+0x6cd8c)
> #5 0x56413406fc86 in error_setv /mnt/sdb/backup/qemu/util/error.c:65
> #6 0x564134070464 in error_setg_errno_internal 
> /mnt/sdb/backup/qemu/util/error.c:109
> #7 0x5641340851be in inet_connect_addr 
> /mnt/sdb/backup/qemu/util/qemu-sockets.c:379
> #8 0x5641340851be in inet_connect_saddr 
> /mnt/sdb/backup/qemu/util/qemu-sockets.c:458
> #9 0x5641340870ab in socket_connect 
> /mnt/sdb/backup/qemu/util/qemu-sockets.c:1105
> #10 0x564133eeaabf in qio_channel_socket_connect_sync 
> /mnt/sdb/backup/qemu/io/channel-socket.c:145
> #11 0x564133eeabf5 in qio_channel_socket_connect_worker 
> /mnt/sdb/backup/qemu/io/channel-socket.c:168
>
> Reported-by: Euler Robot 
> Signed-off-by: Pan Nengyuan 

Reviewed-by: Juan Quintela 

I am not sure that this are the only possible error cases, but they are
a step on the right direction.

Thanks, Juan.

Re: [PATCH 2/2] migration/multifd: Do error_free after migrate_set_error to avoid memleaks

2020-05-06 Thread Juan Quintela

Pan Nengyuan  wrote:
> When error happen in multifd_send_thread, it use error_copy to set migrate 
> error in
> multifd_send_terminate_threads(). We should call error_free after it.
>
> Similarly, fix another two places in multifd_recv_thread/multifd_save_cleanup.
>
> The leak stack:
> Direct leak of 48 byte(s) in 1 object(s) allocated from:
> #0 0x7f781af07cf0 in calloc (/lib64/libasan.so.5+0xefcf0)
> #1 0x7f781a2ce22d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5322d)
> #2 0x55ee1d075c17 in error_setv /mnt/sdb/backup/qemu/util/error.c:61
> #3 0x55ee1d076464 in error_setg_errno_internal 
> /mnt/sdb/backup/qemu/util/error.c:109
> #4 0x55ee1cef066e in qio_channel_socket_writev 
> /mnt/sdb/backup/qemu/io/channel-socket.c:569
> #5 0x55ee1cee806b in qio_channel_writev 
> /mnt/sdb/backup/qemu/io/channel.c:207
> #6 0x55ee1cee806b in qio_channel_writev_all 
> /mnt/sdb/backup/qemu/io/channel.c:171
> #7 0x55ee1cee8248 in qio_channel_write_all 
> /mnt/sdb/backup/qemu/io/channel.c:257
> #8 0x55ee1ca12c9a in multifd_send_thread 
> /mnt/sdb/backup/qemu/migration/multifd.c:657
> #9 0x55ee1d0607fc in qemu_thread_start 
> /mnt/sdb/backup/qemu/util/qemu-thread-posix.c:519
> #10 0x7f78159ae2dd in start_thread (/lib64/libpthread.so.0+0x82dd)
> #11 0x7f78156df4b2 in __GI___clone (/lib64/libc.so.6+0xfc4b2)
>
> Indirect leak of 52 byte(s) in 1 object(s) allocated from:
> #0 0x7f781af07f28 in __interceptor_realloc (/lib64/libasan.so.5+0xeff28)
> #1 0x7f78156f07d9 in __GI___vasprintf_chk (/lib64/libc.so.6+0x10d7d9)
> #2 0x7f781a30ea6c in g_vasprintf (/lib64/libglib-2.0.so.0+0x93a6c)
> #3 0x7f781a2e7cd0 in g_strdup_vprintf (/lib64/libglib-2.0.so.0+0x6ccd0)
> #4 0x7f781a2e7d8c in g_strdup_printf (/lib64/libglib-2.0.so.0+0x6cd8c)
> #5 0x55ee1d075c86 in error_setv /mnt/sdb/backup/qemu/util/error.c:65
> #6 0x55ee1d076464 in error_setg_errno_internal 
> /mnt/sdb/backup/qemu/util/error.c:109
> #7 0x55ee1cef066e in qio_channel_socket_writev 
> /mnt/sdb/backup/qemu/io/channel-socket.c:569
> #8 0x55ee1cee806b in qio_channel_writev 
> /mnt/sdb/backup/qemu/io/channel.c:207
> #9 0x55ee1cee806b in qio_channel_writev_all 
> /mnt/sdb/backup/qemu/io/channel.c:171
> #10 0x55ee1cee8248 in qio_channel_write_all 
> /mnt/sdb/backup/qemu/io/channel.c:257
> #11 0x55ee1ca12c9a in multifd_send_thread 
> /mnt/sdb/backup/qemu/migration/multifd.c:657
> #12 0x55ee1d0607fc in qemu_thread_start 
> /mnt/sdb/backup/qemu/util/qemu-thread-posix.c:519
> #13 0x7f78159ae2dd in start_thread (/lib64/libpthread.so.0+0x82dd)
> #14 0x7f78156df4b2 in __GI___clone (/lib64/libc.so.6+0xfc4b2)
>
> Reported-by: Euler Robot 
> Signed-off-by: Pan Nengyuan 

Reviewed-by: Juan Quintela

Re: [PATCH v4 07/13] acpi: move aml builder code for parallel device

2020-05-06 Thread Gerd Hoffmann

> > +static void parallel_isa_build_aml(ISADevice *isadev, Aml *scope)
> > +{
> > +ISAParallelState *isa = ISA_PARALLEL(isadev);
> > +int i, uid = 0;
> > +Aml *dev;
> > +Aml *crs;
> > +
> > +for (i = 0; i < ARRAY_SIZE(isa_parallel_io); i++) {
> > +if (isa->iobase == isa_parallel_io[i]) {
> > +uid = i + 1;
> 
> I'm not sure about this check, as we can create a ISA device setting
> manually index & iobase. What about using simply "uid = isa->index + 1"
> instead?

Looking at the code I see isa->index is assigned unconditionally.  I
misremembered that detail.  So, yes, simply using isa->index should work
fine even with '-device isa-serial,iobase=".  I'll fix it for
both serial and parallel.

cheers,
  Gerd

Re: [PATCH] s390x: pv: Fix KVM_PV_PREP_RESET command wrapper name

2020-05-06 Thread Cornelia Huck

On Tue,  5 May 2020 08:41:59 -0400
Janosch Frank  wrote:

> s390_pv_perf_clear_reset() is not a very helpful name since that
> function needs to be called for a normal and a clear reset via
> diag308.
> 
> Let's instead name it s390_pv_prep_reset() which reflects the purpose
> of the function a bit better.
> 
> Signed-off-by: Janosch Frank 
> ---
>  hw/s390x/pv.c  | 2 +-
>  hw/s390x/s390-virtio-ccw.c | 2 +-
>  include/hw/s390x/pv.h  | 4 ++--
>  3 files changed, 4 insertions(+), 4 deletions(-)

Thanks, applied.

Re: [PATCH v2 3/5] vhost-user-blk: add mechanism to track the guest notifiers init state

2020-05-06 Thread Dima Stepanov

On Sun, May 03, 2020 at 09:06:38PM -0400, Raphael Norwitz wrote:
> Apologies for mixing up patches last time. This looks good from a
> vhost-user-blk perspective, but I worry that some of these changes
> could impact other vhost device types.
> 
> I agree with adding notifiers_set to struct vhost_dev, and setting it in
> vhost_dev_enable/disable notifiers, but is there any reason notifiers_set
> can’t be checked inside vhost-user-blk?
Thanks for your review. I also have some concerns about changing current
API, but my idea was that these issues will be triggered for all
vhost-user/reconnect devices. But maybe you are right and first we
should fix vhost-user-blk issues.
I'll try to modify patch 2 and 3 in my patchset, so new notifiers_set
field will be added, but no API change will be made. Will see how it
looks.

Re: [PATCH v2 2/5] vhost: introduce wrappers to set guest notifiers for virtio device

2020-05-06 Thread Dima Stepanov

On Sun, May 03, 2020 at 08:36:45PM -0400, Raphael Norwitz wrote:
> I’m happy from the vhost, vhost-user-blk and vhost-user-scsi side. For
> other device types it looks pretty straightforward, but their maintainers
> should probably confirm.
> 
> Since you plan to change the behavior of these helpers in subsequent
> patches, maybe consider sending the other device types separately
> after the rest of the series has been merged? That way the changes to
> individual devices will be much easier to review.

Thanks for comments.
Agree, will make a more straightforward fix only for vhost-user-blk.
After it we can figure out how to propogate this change to other
devices.

> 
> On Thu, Apr 30, 2020 at 9:48 AM Dima Stepanov  wrote:
> >
> > Introduce new wrappers to set/reset guest notifiers for the virtio
> > device in the vhost device module:
> >   vhost_dev_assign_guest_notifiers
> > ->set_guest_notifiers(..., ..., true);
> >   vhost_dev_drop_guest_notifiers
> > ->set_guest_notifiers(..., ..., false);
> > This is a preliminary step to refactor code, so the set_guest_notifiers
> > methods could be called based on the vhost device state.
> > Update all vhost used devices to use these wrappers instead of direct
> > method call.
> >
> > Signed-off-by: Dima Stepanov 
> > ---
> >  backends/cryptodev-vhost.c  | 26 +++---
> >  backends/vhost-user.c   | 16 +---
> >  hw/block/vhost-user-blk.c   | 15 +--
> >  hw/net/vhost_net.c  | 30 +-
> >  hw/scsi/vhost-scsi-common.c | 15 +--
> >  hw/virtio/vhost-user-fs.c   | 17 +++--
> >  hw/virtio/vhost-vsock.c | 18 --
> >  hw/virtio/vhost.c   | 38 ++
> >  hw/virtio/virtio.c  | 13 +
> >  include/hw/virtio/vhost.h   |  4 
> >  include/hw/virtio/virtio.h  |  1 +
> >  11 files changed, 118 insertions(+), 75 deletions(-)
> >

Re: [PATCH v2 1/5] char-socket: return -1 in case of disconnect during tcp_chr_write

2020-05-06 Thread Li Feng

Thanks,

Feng Li

Dima Stepanov  于2020年4月30日周四 下午9:36写道：
>
> During testing of the vhost-user-blk reconnect functionality the qemu
> SIGSEGV was triggered:
>  start qemu as:
>  x86_64-softmmu/qemu-system-x86_64 -m 1024M -M q35 \
>-object 
> memory-backend-file,id=ram-node0,size=1024M,mem-path=/dev/shm/qemu,share=on \
>-numa node,cpus=0,memdev=ram-node0 \
>-chardev socket,id=chardev0,path=./vhost.sock,noserver,reconnect=1 \
>-device vhost-user-blk-pci,chardev=chardev0,num-queues=4 --enable-kvm
>  start vhost-user-blk daemon:
>  ./vhost-user-blk -s ./vhost.sock -b test-img.raw
>
> If vhost-user-blk will be killed during the vhost initialization
> process, for instance after getting VHOST_SET_VRING_CALL command, then
> QEMU will fail with the following backtrace:
>
> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> 0x559272bb in vhost_user_read (dev=0x7fffef2d53e0, msg=0x7fffd5b0)
> at ./hw/virtio/vhost-user.c:260
> 260 CharBackend *chr = u->user->chr;
>
>  #0  0x559272bb in vhost_user_read (dev=0x7fffef2d53e0, 
> msg=0x7fffd5b0)
> at ./hw/virtio/vhost-user.c:260
>  #1  0x5592acb8 in vhost_user_get_config (dev=0x7fffef2d53e0, 
> config=0x7fffef2d5394 "", config_len=60)
> at ./hw/virtio/vhost-user.c:1645
>  #2  0x55925525 in vhost_dev_get_config (hdev=0x7fffef2d53e0, 
> config=0x7fffef2d5394 "", config_len=60)
> at ./hw/virtio/vhost.c:1490
>  #3  0x558cc46b in vhost_user_blk_device_realize (dev=0x7fffef2d51a0, 
> errp=0x7fffd8f0)
> at ./hw/block/vhost-user-blk.c:429
>  #4  0x55920090 in virtio_device_realize (dev=0x7fffef2d51a0, 
> errp=0x7fffd948)
> at ./hw/virtio/virtio.c:3615
>  #5  0x55a9779c in device_set_realized (obj=0x7fffef2d51a0, 
> value=true, errp=0x7fffdb88)
> at ./hw/core/qdev.c:891
>  ...
>
> The problem is that vhost_user_write doesn't get an error after
> disconnect and try to call vhost_user_read(). The tcp_chr_write()
> routine should return -1 in case of disconnect. Indicate the EIO error
> if this routine is called in the disconnected state.
>
> Signed-off-by: Dima Stepanov 
> ---
>  chardev/char-socket.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index 185fe38..c128cca 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -175,14 +175,16 @@ static int tcp_chr_write(Chardev *chr, const uint8_t 
> *buf, int len)
>  if (ret < 0 && errno != EAGAIN) {
>  if (tcp_chr_read_poll(chr) <= 0) {
>  tcp_chr_disconnect_locked(chr);
> -return len;
> +/* Return an error since we made a disconnect. */
> +return ret;
The `return` statement could be deleted.
The outside has a return statement.

>  } /* else let the read handler finish it properly */
>  }
>
>  return ret;
>  } else {
> -/* XXX: indicate an error ? */
> -return len;
> +/* Indicate an error. */
> +errno = EIO;
> +return -1;
>  }
>  }
>
> --
> 2.7.4
>

[PATCH 0/8] drop unallocated_blocks_are_zero

2020-05-06 Thread Vladimir Sementsov-Ogievskiy

Hi all!

This is first step to block-status refactoring, and solves most simple
problem mentioned in my investigation of block-status described in
the thread "backing chain & block status & filters":
  https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg04706.html


unallocated_blocks_are_zero doesn't simplify all the logic about
block-status, and happily it's not needed, as shown by the following
patches. So, let's get rid of it.

Vladimir Sementsov-Ogievskiy (8):
  block/vdi: return ZERO block-status when appropriate
  block/vpc: return ZERO block-status when appropriate
  block/crypto: drop unallocated_blocks_are_zero
  block/iscsi: drop unallocated_blocks_are_zero
  block/file-posix: drop unallocated_blocks_are_zero
  block/vhdx: drop unallocated_blocks_are_zero
  qemu-img: convert: don't use unallocated_blocks_are_zero
  block: drop unallocated_blocks_are_zero

 include/block/block.h |  6 --
 include/block/block_int.h | 13 -
 block.c   | 15 ---
 block/crypto.c|  1 -
 block/file-posix.c|  3 ---
 block/io.c|  8 
 block/iscsi.c |  1 -
 block/qcow2.c |  1 -
 block/qed.c   |  1 -
 block/vdi.c   |  3 +--
 block/vhdx.c  |  3 ---
 block/vpc.c   |  3 +--
 qemu-img.c|  4 +---
 13 files changed, 19 insertions(+), 43 deletions(-)

-- 
2.21.0

[PATCH 2/8] block/vpc: return ZERO block-status when appropriate

2020-05-06 Thread Vladimir Sementsov-Ogievskiy

In case when get_image_offset() returns -1, we do zero out the
corresponding chunk of qiov. So, this should be reported as ZERO.

After block-status update, it never reports 0, so setting
unallocated_blocks_are_zero doesn't make sense. Drop it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/vpc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index 2d1eade146..555f9d8587 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -606,7 +606,6 @@ static int vpc_get_info(BlockDriverState *bs, 
BlockDriverInfo *bdi)
 bdi->cluster_size = s->block_size;
 }
 
-bdi->unallocated_blocks_are_zero = true;
 return 0;
 }
 
@@ -745,7 +744,7 @@ static int coroutine_fn 
vpc_co_block_status(BlockDriverState *bs,
 image_offset = get_image_offset(bs, offset, false, NULL);
 allocated = (image_offset != -1);
 *pnum = 0;
-ret = 0;
+ret = BDRV_BLOCK_ZERO;
 
 do {
 /* All sectors in a block are contiguous (without using the bitmap) */
-- 
2.21.0

[PATCH 5/8] block/file-posix: drop unallocated_blocks_are_zero

2020-05-06 Thread Vladimir Sementsov-Ogievskiy

raw_co_block_status() in block/file-posix.c never returns 0, so
unallocated_blocks_are_zero is useless. Drop it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/file-posix.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 05e094be29..5c01735108 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2878,9 +2878,6 @@ static int coroutine_fn raw_co_pwrite_zeroes(
 
 static int raw_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
 {
-BDRVRawState *s = bs->opaque;
-
-bdi->unallocated_blocks_are_zero = s->discard_zeroes;
 return 0;
 }
 
-- 
2.21.0

[PATCH 4/8] block/iscsi: drop unallocated_blocks_are_zero

2020-05-06 Thread Vladimir Sementsov-Ogievskiy

We set bdi->unallocated_blocks_are_zero = iscsilun->lbprz, but
iscsi_co_block_status doesn't return 0 in case of iscsilun->lbprz, it
returns ZERO when appropriate. So actually unallocated_blocks_are_zero
is useless. Drop it now.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/iscsi.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index a8b76979d8..767e3e75fd 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -2163,7 +2163,6 @@ static int coroutine_fn 
iscsi_co_truncate(BlockDriverState *bs, int64_t offset,
 static int iscsi_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
 {
 IscsiLun *iscsilun = bs->opaque;
-bdi->unallocated_blocks_are_zero = iscsilun->lbprz;
 bdi->cluster_size = iscsilun->cluster_size;
 return 0;
 }
-- 
2.21.0

[PATCH 1/8] block/vdi: return ZERO block-status when appropriate

2020-05-06 Thread Vladimir Sementsov-Ogievskiy

In case of !VDI_IS_ALLOCATED[], we do zero out the corresponding chunk
of qiov. So, this should be reported as ZERO.

After block-status update, it never reports 0, so setting
unallocated_blocks_are_zero doesn't make sense. Drop it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/vdi.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 0c7835ae70..83471528d2 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -334,7 +334,6 @@ static int vdi_get_info(BlockDriverState *bs, 
BlockDriverInfo *bdi)
 logout("\n");
 bdi->cluster_size = s->block_size;
 bdi->vm_state_offset = 0;
-bdi->unallocated_blocks_are_zero = true;
 return 0;
 }
 
@@ -536,7 +535,7 @@ static int coroutine_fn 
vdi_co_block_status(BlockDriverState *bs,
 *pnum = MIN(s->block_size - index_in_block, bytes);
 result = VDI_IS_ALLOCATED(bmap_entry);
 if (!result) {
-return 0;
+return BDRV_BLOCK_ZERO;
 }
 
 *map = s->header.offset_data + (uint64_t)bmap_entry * s->block_size +
-- 
2.21.0

[PATCH 8/8] block: drop unallocated_blocks_are_zero

2020-05-06 Thread Vladimir Sementsov-Ogievskiy

Currently this field only set by qed and qcow2. But in fact, all
backing-supporting formats (parallels, qcow, qcow2, qed, vmdk) share
this semantics: on unallocated blocks, if there is no backing file they
just memset the buffer with zeroes.

So, document this behavior for .supports_backing and drop
.unallocated_blocks_are_zero

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block.h |  6 --
 include/block/block_int.h | 13 -
 block.c   | 15 ---
 block/io.c|  8 
 block/qcow2.c |  1 -
 block/qed.c   |  1 -
 6 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 8b62429aa4..db1cb503ec 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -21,11 +21,6 @@ typedef struct BlockDriverInfo {
 /* offset at which the VM state can be saved (0 if not possible) */
 int64_t vm_state_offset;
 bool is_dirty;
-/*
- * True if unallocated blocks read back as zeroes. This is equivalent
- * to the LBPRZ flag in the SCSI logical block provisioning page.
- */
-bool unallocated_blocks_are_zero;
 /*
  * True if this block driver only supports compressed writes
  */
@@ -431,7 +426,6 @@ int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, 
int64_t bytes);
 int bdrv_has_zero_init_1(BlockDriverState *bs);
 int bdrv_has_zero_init(BlockDriverState *bs);
 int bdrv_has_zero_init_truncate(BlockDriverState *bs);
-bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs);
 bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
 int bdrv_block_status(BlockDriverState *bs, int64_t offset,
   int64_t bytes, int64_t *pnum, int64_t *map,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 92335f33c7..c156b22c6b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -115,7 +115,18 @@ struct BlockDriver {
  */
 bool bdrv_needs_filename;
 
-/* Set if a driver can support backing files */
+/*
+ * Set if a driver can support backing files. This also implies the
+ * following semantics:
+ *
+ *  - Return status 0 of .bdrv_co_block_status means that corresponding
+ *blocks are not allocated in this layer of backing-chain
+ *  - For such (unallocated) blocks, read will:
+ *- read from backing file if there is one and big enough
+ *- fill buffer with zeroes if there is no backing file
+ *- space after EOF of the backing file considered as zero
+ *  (corresponding part of read-buffer must be zeroed by driver)
+ */
 bool supports_backing;
 
 /* For handling image reopen for split or non-split files */
diff --git a/block.c b/block.c
index cf5c19b1db..0283fdecbc 100644
--- a/block.c
+++ b/block.c
@@ -5305,21 +5305,6 @@ int bdrv_has_zero_init_truncate(BlockDriverState *bs)
 return 0;
 }
 
-bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs)
-{
-BlockDriverInfo bdi;
-
-if (bs->backing) {
-return false;
-}
-
-if (bdrv_get_info(bs, &bdi) == 0) {
-return bdi.unallocated_blocks_are_zero;
-}
-
-return false;
-}
-
 bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs)
 {
 if (!(bs->open_flags & BDRV_O_UNMAP)) {
diff --git a/block/io.c b/block/io.c
index a4f9714230..484adec5a1 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2385,16 +2385,16 @@ static int coroutine_fn 
bdrv_co_block_status(BlockDriverState *bs,
 
 if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
 ret |= BDRV_BLOCK_ALLOCATED;
-} else if (want_zero) {
-if (bdrv_unallocated_blocks_are_zero(bs)) {
-ret |= BDRV_BLOCK_ZERO;
-} else if (bs->backing) {
+} else if (want_zero && bs->drv->supports_backing) {
+if (bs->backing) {
 BlockDriverState *bs2 = bs->backing->bs;
 int64_t size2 = bdrv_getlength(bs2);
 
 if (size2 >= 0 && offset >= size2) {
 ret |= BDRV_BLOCK_ZERO;
 }
+} else {
+ret |= BDRV_BLOCK_ZERO;
 }
 }
 
diff --git a/block/qcow2.c b/block/qcow2.c
index 2ba0b17c39..dc3c0aac2b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -4858,7 +4858,6 @@ err:
 static int qcow2_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
 {
 BDRVQcow2State *s = bs->opaque;
-bdi->unallocated_blocks_are_zero = true;
 bdi->cluster_size = s->cluster_size;
 bdi->vm_state_offset = qcow2_vm_state_offset(s);
 return 0;
diff --git a/block/qed.c b/block/qed.c
index b0fdb8f565..fb7833dc8b 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1514,7 +1514,6 @@ static int bdrv_qed_get_info(BlockDriverState *bs, 
BlockDriverInfo *bdi)
 memset(bdi, 0, sizeof(*bdi));
 bdi->cluster_size = s->header.cluster_size;
 bdi->is_dirty = s->header.features & QED_F_NEED_CHECK;
-bdi->unallocated_blocks_are_zero = true;
 return 0;

[PATCH 3/8] block/crypto: drop unallocated_blocks_are_zero

2020-05-06 Thread Vladimir Sementsov-Ogievskiy

It's false by default, no needs to set it. We are going to drop this
variable at all, so drop it now here, it doesn't hurt.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/crypto.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/crypto.c b/block/crypto.c
index e02f343590..7685e61844 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -694,7 +694,6 @@ static int block_crypto_get_info_luks(BlockDriverState *bs,
 return ret;
 }
 
-bdi->unallocated_blocks_are_zero = false;
 bdi->cluster_size = subbdi.cluster_size;
 
 return 0;
-- 
2.21.0

[PATCH 6/8] block/vhdx: drop unallocated_blocks_are_zero

2020-05-06 Thread Vladimir Sementsov-Ogievskiy

vhdx doesn't have .bdrv_co_block_status handler, so DATA|ALLOCATED is
always assumed for it. unallocated_blocks_are_zero is useless, drop it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/vhdx.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/block/vhdx.c b/block/vhdx.c
index aedd782604..45963a3166 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1164,9 +1164,6 @@ static int vhdx_get_info(BlockDriverState *bs, 
BlockDriverInfo *bdi)
 
 bdi->cluster_size = s->block_size;
 
-bdi->unallocated_blocks_are_zero =
-(s->params.data_bits & VHDX_PARAMS_HAS_PARENT) == 0;
-
 return 0;
 }
 
-- 
2.21.0

[PATCH 7/8] qemu-img: convert: don't use unallocated_blocks_are_zero

2020-05-06 Thread Vladimir Sementsov-Ogievskiy

qemu-img convert wants to distinguish ZERO which comes from short
backing files. unallocated_blocks_are_zero field of bdi is unrelated:
space after EOF is always considered to be zero anyway. So, just make
post_backing_zero true in case of short backing file.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 qemu-img.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 6a4327aaba..4fe751878b 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1632,7 +1632,6 @@ typedef struct ImgConvertState {
 BlockBackend *target;
 bool has_zero_init;
 bool compressed;
-bool unallocated_blocks_are_zero;
 bool target_is_new;
 bool target_has_backing;
 int64_t target_backing_sectors; /* negative if unknown */
@@ -1677,7 +1676,7 @@ static int convert_iteration_sectors(ImgConvertState *s, 
int64_t sector_num)
 
 if (s->target_backing_sectors >= 0) {
 if (sector_num >= s->target_backing_sectors) {
-post_backing_zero = s->unallocated_blocks_are_zero;
+post_backing_zero = true;
 } else if (sector_num + n > s->target_backing_sectors) {
 /* Split requests around target_backing_sectors (because
  * starting from there, zeros are handled differently) */
@@ -2580,7 +2579,6 @@ static int img_convert(int argc, char **argv)
 } else {
 s.compressed = s.compressed || bdi.needs_compressed_writes;
 s.cluster_sectors = bdi.cluster_size / BDRV_SECTOR_SIZE;
-s.unallocated_blocks_are_zero = bdi.unallocated_blocks_are_zero;
 }
 
 ret = convert_do_copy(&s);
-- 
2.21.0

Re: [PATCH v2 1/5] char-socket: return -1 in case of disconnect during tcp_chr_write

2020-05-06 Thread Marc-André Lureau

On Thu, Apr 30, 2020 at 3:37 PM Dima Stepanov  wrote:
>
> During testing of the vhost-user-blk reconnect functionality the qemu
> SIGSEGV was triggered:
>  start qemu as:
>  x86_64-softmmu/qemu-system-x86_64 -m 1024M -M q35 \
>-object 
> memory-backend-file,id=ram-node0,size=1024M,mem-path=/dev/shm/qemu,share=on \
>-numa node,cpus=0,memdev=ram-node0 \
>-chardev socket,id=chardev0,path=./vhost.sock,noserver,reconnect=1 \
>-device vhost-user-blk-pci,chardev=chardev0,num-queues=4 --enable-kvm
>  start vhost-user-blk daemon:
>  ./vhost-user-blk -s ./vhost.sock -b test-img.raw
>
> If vhost-user-blk will be killed during the vhost initialization
> process, for instance after getting VHOST_SET_VRING_CALL command, then
> QEMU will fail with the following backtrace:
>
> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> 0x559272bb in vhost_user_read (dev=0x7fffef2d53e0, msg=0x7fffd5b0)
> at ./hw/virtio/vhost-user.c:260
> 260 CharBackend *chr = u->user->chr;
>
>  #0  0x559272bb in vhost_user_read (dev=0x7fffef2d53e0, 
> msg=0x7fffd5b0)
> at ./hw/virtio/vhost-user.c:260
>  #1  0x5592acb8 in vhost_user_get_config (dev=0x7fffef2d53e0, 
> config=0x7fffef2d5394 "", config_len=60)
> at ./hw/virtio/vhost-user.c:1645
>  #2  0x55925525 in vhost_dev_get_config (hdev=0x7fffef2d53e0, 
> config=0x7fffef2d5394 "", config_len=60)
> at ./hw/virtio/vhost.c:1490
>  #3  0x558cc46b in vhost_user_blk_device_realize (dev=0x7fffef2d51a0, 
> errp=0x7fffd8f0)
> at ./hw/block/vhost-user-blk.c:429
>  #4  0x55920090 in virtio_device_realize (dev=0x7fffef2d51a0, 
> errp=0x7fffd948)
> at ./hw/virtio/virtio.c:3615
>  #5  0x55a9779c in device_set_realized (obj=0x7fffef2d51a0, 
> value=true, errp=0x7fffdb88)
> at ./hw/core/qdev.c:891
>  ...
>
> The problem is that vhost_user_write doesn't get an error after
> disconnect and try to call vhost_user_read(). The tcp_chr_write()
> routine should return -1 in case of disconnect. Indicate the EIO error
> if this routine is called in the disconnected state.
>
> Signed-off-by: Dima Stepanov 


Reviewed-by: Marc-André Lureau 

> ---
>  chardev/char-socket.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index 185fe38..c128cca 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -175,14 +175,16 @@ static int tcp_chr_write(Chardev *chr, const uint8_t 
> *buf, int len)
>  if (ret < 0 && errno != EAGAIN) {
>  if (tcp_chr_read_poll(chr) <= 0) {
>  tcp_chr_disconnect_locked(chr);
> -return len;
> +/* Return an error since we made a disconnect. */
> +return ret;
>  } /* else let the read handler finish it properly */
>  }
>
>  return ret;
>  } else {
> -/* XXX: indicate an error ? */
> -return len;
> +/* Indicate an error. */
> +errno = EIO;
> +return -1;
>  }
>  }
>
> --
> 2.7.4
>

[PATCH v1 01/17] exec: Introduce ram_block_discard_set_(unreliable|required)()

2020-05-06 Thread David Hildenbrand

We want to replace qemu_balloon_inhibit() by something more generic.
Especially, we want to make sure that technologies that really rely on
RAM block discards to work reliably to run mutual exclusive with
technologies that break it.

E.g., vfio will usually pin all guest memory, turning the virtio-balloon
basically useless and make the VM consume more memory than reported via
the balloon. While the balloon is special already (=> no guarantees, same
behavior possible afer reboots and with huge pages), this will be
different, especially, with virtio-mem.

Let's implement a way such that we can make both types of technology run
mutually exclusive. We'll convert existing balloon inhibitors in successive
patches and add some new ones. Add the check to
qemu_balloon_is_inhibited() for now. We might want to make
virtio-balloon an acutal inhibitor in the future - however, that
requires more thought to not break existing setups.

Cc: "Michael S. Tsirkin" 
Cc: Richard Henderson 
Cc: Paolo Bonzini 
Signed-off-by: David Hildenbrand 
---
 balloon.c |  3 ++-
 exec.c| 48 +++
 include/exec/memory.h | 41 
 3 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/balloon.c b/balloon.c
index f104b42961..c49f57c27b 100644
--- a/balloon.c
+++ b/balloon.c
@@ -40,7 +40,8 @@ static int balloon_inhibit_count;
 
 bool qemu_balloon_is_inhibited(void)
 {
-return atomic_read(&balloon_inhibit_count) > 0;
+return atomic_read(&balloon_inhibit_count) > 0 ||
+   ram_block_discard_is_broken();
 }
 
 void qemu_balloon_inhibit(bool state)
diff --git a/exec.c b/exec.c
index 2874bb5088..52a6e40e99 100644
--- a/exec.c
+++ b/exec.c
@@ -4049,4 +4049,52 @@ void mtree_print_dispatch(AddressSpaceDispatch *d, 
MemoryRegion *root)
 }
 }
 
+static int ram_block_discard_broken;
+
+int ram_block_discard_set_broken(bool state)
+{
+int old;
+
+if (!state) {
+atomic_dec(&ram_block_discard_broken);
+return 0;
+}
+
+do {
+old = atomic_read(&ram_block_discard_broken);
+if (old < 0) {
+return -EBUSY;
+}
+} while (atomic_cmpxchg(&ram_block_discard_broken, old, old + 1) != old);
+return 0;
+}
+
+int ram_block_discard_set_required(bool state)
+{
+int old;
+
+if (!state) {
+atomic_inc(&ram_block_discard_broken);
+return 0;
+}
+
+do {
+old = atomic_read(&ram_block_discard_broken);
+if (old > 0) {
+return -EBUSY;
+}
+} while (atomic_cmpxchg(&ram_block_discard_broken, old, old - 1) != old);
+return 0;
+}
+
+bool ram_block_discard_is_broken(void)
+{
+return atomic_read(&ram_block_discard_broken) > 0;
+}
+
+bool ram_block_discard_is_required(void)
+{
+return atomic_read(&ram_block_discard_broken) < 0;
+}
+
 #endif
diff --git a/include/exec/memory.h b/include/exec/memory.h
index e000bd2f97..9bb5ced38d 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2463,6 +2463,47 @@ static inline MemOp devend_memop(enum device_endian end)
 }
 #endif
 
+/*
+ * Inhibit technologies that rely on discarding of parts of RAM blocks to work
+ * reliably, e.g., to manage the actual amount of memory consumed by the VM
+ * (then, the memory provided by RAM blocks might be bigger than the desired
+ * memory consumption). This *must* be set if:
+ * - Discarding parts of a RAM blocks does not result in the change being
+ *   reflected in the VM and the pages getting freed.
+ * - All memory in RAM blocks is pinned or duplicated, invaldiating any 
previous
+ *   discards blindly.
+ * - Discarding parts of a RAM blocks will result in integrity issues (e.g.,
+ *   encrypted VMs).
+ * Technologies that only temporarily pin the current working set of a
+ * driver are fine, because we don't expect such pages to be discarded
+ * (esp. based on guest action like balloon inflation).
+ *
+ * This is *not* to be used to protect from concurrent discards (esp.,
+ * postcopy).
+ *
+ * Returns 0 if successful. Returns -EBUSY if a technology that relies on
+ * discards to work reliably is active.
+ */
+int ram_block_discard_set_broken(bool state);
+
+/*
+ * Inhibit technologies that will break discarding of pages in RAM blocks.
+ *
+ * Returns 0 if successful. Returns -EBUSY if discards are already set to
+ * broken.
+ */
+int ram_block_discard_set_required(bool state);
+
+/*
+ * Test if discarding of memory in ram blocks is broken.
+ */
+bool ram_block_discard_is_broken(void);
+
+/*
+ * Test if discarding of memory in ram blocks is required to work reliably.
+ */
+bool ram_block_discard_is_required(void);
+
 #endif
 
 #endif
-- 
2.25.3

[PATCH v1 03/17] accel/kvm: Convert to ram_block_discard_set_broken()

2020-05-06 Thread David Hildenbrand

Discarding memory does not work as expected. At the time this is called,
we cannot have anyone active that relies on discards to work properly.

Cc: Paolo Bonzini 
Signed-off-by: David Hildenbrand 
---
 accel/kvm/kvm-all.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 439a4efe52..33421184ac 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -40,7 +40,6 @@
 #include "trace.h"
 #include "hw/irq.h"
 #include "sysemu/sev.h"
-#include "sysemu/balloon.h"
 #include "qapi/visitor.h"
 #include "qapi/qapi-types-common.h"
 #include "qapi/qapi-visit-common.h"
@@ -2107,7 +2106,7 @@ static int kvm_init(MachineState *ms)
 
 s->sync_mmu = !!kvm_vm_check_extension(kvm_state, KVM_CAP_SYNC_MMU);
 if (!s->sync_mmu) {
-qemu_balloon_inhibit(true);
+g_assert(ram_block_discard_set_broken(true));
 }
 
 return 0;
-- 
2.25.3

[PATCH v1 02/17] vfio: Convert to ram_block_discard_set_broken()

2020-05-06 Thread David Hildenbrand

VFIO is (except devices without a physical IOMMU or some mediated devices)
incompatible ram_block_discard_set_broken. The kernel will pin basically
all VM memory. Let's convert to ram_block_discard_set_broke(), which can
now fail, in contrast to qemu_balloon_inhibit().

Leave "x-balloon-allowed" named as it is for now.

Cc: Cornelia Huck 
Cc: Alex Williamson 
Cc: Christian Borntraeger 
Cc: Tony Krowiak 
Cc: Halil Pasic 
Cc: Pierre Morel 
Cc: Eric Farman 
Signed-off-by: David Hildenbrand 
---
 hw/vfio/ap.c  | 10 +++
 hw/vfio/ccw.c | 11 
 hw/vfio/common.c  | 53 +++
 hw/vfio/pci.c |  6 ++--
 include/hw/vfio/vfio-common.h |  4 +--
 5 files changed, 45 insertions(+), 39 deletions(-)

diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index 8649ac15f9..b51546d67a 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -105,12 +105,12 @@ static void vfio_ap_realize(DeviceState *dev, Error 
**errp)
 vapdev->vdev.dev = dev;
 
 /*
- * vfio-ap devices operate in a way compatible with
- * memory ballooning, as no pages are pinned in the host.
- * This needs to be set before vfio_get_device() for vfio common to
- * handle the balloon inhibitor.
+ * vfio-ap devices operate in a way compatible discarding of memory in
+ * RAM blocks, as no pages are pinned in the host. This needs to be
+ * set before vfio_get_device() for vfio common to handle
+ * ram_block_discard_set_broken().
  */
-vapdev->vdev.balloon_allowed = true;
+vapdev->vdev.ram_block_discard_allowed = true;
 
 ret = vfio_get_device(vfio_group, mdevid, &vapdev->vdev, errp);
 if (ret) {
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 50cc2ec75c..0dd6c3f2ab 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -425,12 +425,13 @@ static void vfio_ccw_get_device(VFIOGroup *group, 
VFIOCCWDevice *vcdev,
 
 /*
  * All vfio-ccw devices are believed to operate in a way compatible with
- * memory ballooning, ie. pages pinned in the host are in the current
- * working set of the guest driver and therefore never overlap with pages
- * available to the guest balloon driver.  This needs to be set before
- * vfio_get_device() for vfio common to handle the balloon inhibitor.
+ * discarding of memory in RAM blocks, ie. pages pinned in the host are
+ * in the current working set of the guest driver and therefore never
+ * overlap e.g., with pages available to the guest balloon driver.  This
+ * needs to be set before vfio_get_device() for vfio common to handle
+ * ram_block_discard_set_broken().
  */
-vcdev->vdev.balloon_allowed = true;
+vcdev->vdev.ram_block_discard_allowed = true;
 
 if (vfio_get_device(group, vcdev->cdev.mdevid, &vcdev->vdev, errp)) {
 goto out_err;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 0b3593b3c0..98b2573ae6 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -33,7 +33,6 @@
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qemu/range.h"
-#include "sysemu/balloon.h"
 #include "sysemu/kvm.h"
 #include "sysemu/reset.h"
 #include "trace.h"
@@ -1215,31 +1214,36 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
 space = vfio_get_address_space(as);
 
 /*
- * VFIO is currently incompatible with memory ballooning insofar as the
+ * VFIO is currently incompatible with discarding of RAM insofar as the
  * madvise to purge (zap) the page from QEMU's address space does not
  * interact with the memory API and therefore leaves stale virtual to
  * physical mappings in the IOMMU if the page was previously pinned.  We
- * therefore add a balloon inhibit for each group added to a container,
+ * therefore set discarding broken for each group added to a container,
  * whether the container is used individually or shared.  This provides
  * us with options to allow devices within a group to opt-in and allow
- * ballooning, so long as it is done consistently for a group (for instance
+ * discarding, so long as it is done consistently for a group (for instance
  * if the device is an mdev device where it is known that the host vendor
  * driver will never pin pages outside of the working set of the guest
- * driver, which would thus not be ballooning candidates).
+ * driver, which would thus not be discarding candidates).
  *
  * The first opportunity to induce pinning occurs here where we attempt to
  * attach the group to existing containers within the AddressSpace.  If any
- * pages are already zapped from the virtual address space, such as from a
- * previous ballooning opt-in, new pinning will cause valid mappings to be
+ * pages are already zapped from the virtual address space, such as from
+ * previous discards, new pinning will cause valid mappings to be
  * re-established.  Likewise, when the overal

[PATCH v1 00/17] virtio-mem: Paravirtualized memory hot(un)plug

2020-05-06 Thread David Hildenbrand

This is the very basic, initial version of virtio-mem. More info on
virtio-mem in general can be found in the Linux kernel driver posting [1]
and in patch #10.

"The basic idea of virtio-mem is to provide a flexible,
cross-architecture memory hot(un)plug solution that avoids many limitations
imposed by existing technologies, architectures, and interfaces."

There are a lot of addons in the works (esp. protection of unplugged
memory, better hugepage support (esp. when reading unplugged memory),
resizeable memory backends, migration optimizations, support for more
architectures, ...), this is the very basic version to get the ball
rolling.

The first 8 patches make sure we don't have any sudden surprises e.g., if
somebody tries to pin all memory in RAM blocks, resulting in a higher
memory consumption than desired. The remaining patches add basic virtio-mem
along with support for x86-64.

[1] https://lkml.kernel.org/r/20200311171422.10484-1-da...@redhat.com

David Hildenbrand (17):
  exec: Introduce ram_block_discard_set_(unreliable|required)()
  vfio: Convert to ram_block_discard_set_broken()
  accel/kvm: Convert to ram_block_discard_set_broken()
  s390x/pv: Convert to ram_block_discard_set_broken()
  virtio-balloon: Rip out qemu_balloon_inhibit()
  target/i386: sev: Use ram_block_discard_set_broken()
  migration/rdma: Use ram_block_discard_set_broken()
  migration/colo: Use ram_block_discard_set_broken()
  linux-headers: update to contain virtio-mem
  virtio-mem: Paravirtualized memory hot(un)plug
  virtio-pci: Proxy for virtio-mem
  MAINTAINERS: Add myself as virtio-mem maintainer
  hmp: Handle virtio-mem when printing memory device info
  numa: Handle virtio-mem in NUMA stats
  pc: Support for virtio-mem-pci
  virtio-mem: Allow notifiers for size changes
  virtio-pci: Send qapi events when the virtio-mem size changes

 MAINTAINERS |   8 +
 accel/kvm/kvm-all.c |   3 +-
 balloon.c   |  17 -
 exec.c  |  48 ++
 hw/core/numa.c  |   6 +
 hw/i386/Kconfig |   1 +
 hw/i386/pc.c|  49 +-
 hw/s390x/s390-virtio-ccw.c  |  22 +-
 hw/vfio/ap.c|  10 +-
 hw/vfio/ccw.c   |  11 +-
 hw/vfio/common.c|  53 +-
 hw/vfio/pci.c   |   6 +-
 hw/virtio/Kconfig   |  11 +
 hw/virtio/Makefile.objs |   2 +
 hw/virtio/virtio-balloon.c  |  12 +-
 hw/virtio/virtio-mem-pci.c  | 159 
 hw/virtio/virtio-mem-pci.h  |  34 +
 hw/virtio/virtio-mem.c  | 781 
 include/exec/memory.h   |  41 +
 include/hw/pci/pci.h|   1 +
 include/hw/vfio/vfio-common.h   |   4 +-
 include/hw/virtio/virtio-mem.h  |  85 +++
 include/migration/colo.h|   2 +-
 include/standard-headers/linux/virtio_ids.h |   1 +
 include/standard-headers/linux/virtio_mem.h | 208 ++
 include/sysemu/balloon.h|   2 -
 migration/migration.c   |   8 +-
 migration/postcopy-ram.c|  23 -
 migration/rdma.c|  18 +-
 migration/savevm.c  |  11 +-
 monitor/hmp-cmds.c  |  16 +
 monitor/monitor.c   |   1 +
 qapi/misc.json  |  64 +-
 target/i386/sev.c   |   1 +
 34 files changed, 1598 insertions(+), 121 deletions(-)
 create mode 100644 hw/virtio/virtio-mem-pci.c
 create mode 100644 hw/virtio/virtio-mem-pci.h
 create mode 100644 hw/virtio/virtio-mem.c
 create mode 100644 include/hw/virtio/virtio-mem.h
 create mode 100644 include/standard-headers/linux/virtio_mem.h

-- 
2.25.3

[PATCH v1 15/17] pc: Support for virtio-mem-pci

2020-05-06 Thread David Hildenbrand

Let's wire it up similar to virtio-pmem. Also disallow unplug, so it's
harder for users to shoot themselves into the foot.

Cc: "Michael S. Tsirkin" 
Cc: Marcel Apfelbaum 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Eric Blake 
Cc: Markus Armbruster 
Signed-off-by: David Hildenbrand 
---
 hw/i386/Kconfig |  1 +
 hw/i386/pc.c| 49 -
 2 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index c93f32f657..03e347b207 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -35,6 +35,7 @@ config PC
 select ACPI_PCI
 select ACPI_VMGENID
 select VIRTIO_PMEM_SUPPORTED
+select VIRTIO_MEM_SUPPORTED
 
 config PC_PCI
 bool
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index f6b8431c8b..588804f895 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -86,6 +86,7 @@
 #include "hw/net/ne2000-isa.h"
 #include "standard-headers/asm-x86/bootparam.h"
 #include "hw/virtio/virtio-pmem-pci.h"
+#include "hw/virtio/virtio-mem-pci.h"
 #include "hw/mem/memory-device.h"
 #include "sysemu/replay.h"
 #include "qapi/qmp/qerror.h"
@@ -1654,8 +1655,8 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
 numa_cpu_pre_plug(cpu_slot, dev, errp);
 }
 
-static void pc_virtio_pmem_pci_pre_plug(HotplugHandler *hotplug_dev,
-DeviceState *dev, Error **errp)
+static void pc_virtio_md_pci_pre_plug(HotplugHandler *hotplug_dev,
+  DeviceState *dev, Error **errp)
 {
 HotplugHandler *hotplug_dev2 = qdev_get_bus_hotplug_handler(dev);
 Error *local_err = NULL;
@@ -1666,7 +1667,8 @@ static void pc_virtio_pmem_pci_pre_plug(HotplugHandler 
*hotplug_dev,
  * order. This should never be the case on x86, however better add
  * a safety net.
  */
-error_setg(errp, "virtio-pmem-pci not supported on this bus.");
+error_setg(errp,
+   "virtio based memory devices not supported on this bus.");
 return;
 }
 /*
@@ -1681,8 +1683,8 @@ static void pc_virtio_pmem_pci_pre_plug(HotplugHandler 
*hotplug_dev,
 error_propagate(errp, local_err);
 }
 
-static void pc_virtio_pmem_pci_plug(HotplugHandler *hotplug_dev,
-DeviceState *dev, Error **errp)
+static void pc_virtio_md_pci_plug(HotplugHandler *hotplug_dev,
+  DeviceState *dev, Error **errp)
 {
 HotplugHandler *hotplug_dev2 = qdev_get_bus_hotplug_handler(dev);
 Error *local_err = NULL;
@@ -1700,17 +1702,17 @@ static void pc_virtio_pmem_pci_plug(HotplugHandler 
*hotplug_dev,
 error_propagate(errp, local_err);
 }
 
-static void pc_virtio_pmem_pci_unplug_request(HotplugHandler *hotplug_dev,
-  DeviceState *dev, Error **errp)
+static void pc_virtio_md_pci_unplug_request(HotplugHandler *hotplug_dev,
+DeviceState *dev, Error **errp)
 {
-/* We don't support virtio pmem hot unplug */
-error_setg(errp, "virtio pmem device unplug not supported.");
+/* We don't support hot unplug of virtio based memory devices */
+error_setg(errp, "virtio based memory devices cannot be unplugged.");
 }
 
-static void pc_virtio_pmem_pci_unplug(HotplugHandler *hotplug_dev,
-  DeviceState *dev, Error **errp)
+static void pc_virtio_md_pci_unplug(HotplugHandler *hotplug_dev,
+DeviceState *dev, Error **errp)
 {
-/* We don't support virtio pmem hot unplug */
+/* We don't support hot unplug of virtio based memory devices */
 }
 
 static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
@@ -1720,8 +1722,9 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler 
*hotplug_dev,
 pc_memory_pre_plug(hotplug_dev, dev, errp);
 } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
 pc_cpu_pre_plug(hotplug_dev, dev, errp);
-} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
-pc_virtio_pmem_pci_pre_plug(hotplug_dev, dev, errp);
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
+   object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
+pc_virtio_md_pci_pre_plug(hotplug_dev, dev, errp);
 }
 }
 
@@ -1732,8 +1735,9 @@ static void pc_machine_device_plug_cb(HotplugHandler 
*hotplug_dev,
 pc_memory_plug(hotplug_dev, dev, errp);
 } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
 pc_cpu_plug(hotplug_dev, dev, errp);
-} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
-pc_virtio_pmem_pci_plug(hotplug_dev, dev, errp);
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
+   object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
+pc_virtio_md_pci_plug(hotplug_dev, dev, errp);
 }
 }
 
@@ -1744,8 +1748,9 @@ sta

[PATCH v1 09/17] linux-headers: update to contain virtio-mem

2020-05-06 Thread David Hildenbrand

To be merged hopefully soon. Then, we can replace this by a proper
header sync.

Cc: "Michael S. Tsirkin" 
Signed-off-by: David Hildenbrand 
---
 include/standard-headers/linux/virtio_ids.h |   1 +
 include/standard-headers/linux/virtio_mem.h | 208 
 2 files changed, 209 insertions(+)
 create mode 100644 include/standard-headers/linux/virtio_mem.h

diff --git a/include/standard-headers/linux/virtio_ids.h 
b/include/standard-headers/linux/virtio_ids.h
index ecc27a1740..b052355ac7 100644
--- a/include/standard-headers/linux/virtio_ids.h
+++ b/include/standard-headers/linux/virtio_ids.h
@@ -44,6 +44,7 @@
 #define VIRTIO_ID_VSOCK19 /* virtio vsock transport */
 #define VIRTIO_ID_CRYPTO   20 /* virtio crypto */
 #define VIRTIO_ID_IOMMU23 /* virtio IOMMU */
+#define VIRTIO_ID_MEM  24 /* virtio mem */
 #define VIRTIO_ID_FS   26 /* virtio filesystem */
 #define VIRTIO_ID_PMEM 27 /* virtio pmem */
 #define VIRTIO_ID_MAC80211_HWSIM 29 /* virtio mac80211-hwsim */
diff --git a/include/standard-headers/linux/virtio_mem.h 
b/include/standard-headers/linux/virtio_mem.h
new file mode 100644
index 00..c28dd40870
--- /dev/null
+++ b/include/standard-headers/linux/virtio_mem.h
@@ -0,0 +1,208 @@
+/* SPDX-License-Identifier: BSD-3-Clause */
+/*
+ * Virtio Mem Device
+ *
+ * Copyright Red Hat, Inc. 2020
+ *
+ * Authors:
+ * David Hildenbrand 
+ *
+ * This header is BSD licensed so anyone can use the definitions
+ * to implement compatible drivers/servers:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of IBM nor the names of its contributors
+ *may be used to endorse or promote products derived from this software
+ *without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#ifndef _LINUX_VIRTIO_MEM_H
+#define _LINUX_VIRTIO_MEM_H
+
+#include "standard-headers/linux/types.h"
+#include "standard-headers/linux/virtio_types.h"
+#include "standard-headers/linux/virtio_ids.h"
+#include "standard-headers/linux/virtio_config.h"
+
+/*
+ * Each virtio-mem device manages a dedicated region in physical address
+ * space. Each device can belong to a single NUMA node, multiple devices
+ * for a single NUMA node are possible. A virtio-mem device is like a
+ * "resizable DIMM" consisting of small memory blocks that can be plugged
+ * or unplugged. The device driver is responsible for (un)plugging memory
+ * blocks on demand.
+ *
+ * Virtio-mem devices can only operate on their assigned memory region in
+ * order to (un)plug memory. A device cannot (un)plug memory belonging to
+ * other devices.
+ *
+ * The "region_size" corresponds to the maximum amount of memory that can
+ * be provided by a device. The "size" corresponds to the amount of memory
+ * that is currently plugged. "requested_size" corresponds to a request
+ * from the device to the device driver to (un)plug blocks. The
+ * device driver should try to (un)plug blocks in order to reach the
+ * "requested_size". It is impossible to plug more memory than requested.
+ *
+ * The "usable_region_size" represents the memory region that can actually
+ * be used to (un)plug memory. It is always at least as big as the
+ * "requested_size" and will grow dynamically. It will only shrink when
+ * explicitly triggered (VIRTIO_MEM_REQ_UNPLUG).
+ *
+ * There are no guarantees what will happen if unplugged memory is
+ * read/written. Such memory should, in general, not be touched. E.g.,
+ * even writing might succeed, but the values will simply be discarded at
+ * random points in time.
+ *
+ * It can happen that the device cannot process a request, because it is
+ * busy. The device driver has to retry later.
+ *
+ * Usually, during system resets all memory wi

[PATCH v1 06/17] target/i386: sev: Use ram_block_discard_set_broken()

2020-05-06 Thread David Hildenbrand

AMD SEV will pin all guest memory, mark discarding of RAM broken. At the
time this is called, we cannot have anyone active that relies on discards
to work properly.

Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Signed-off-by: David Hildenbrand 
---
 target/i386/sev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 846018a12d..608225f9ba 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -722,6 +722,7 @@ sev_guest_init(const char *id)
 ram_block_notifier_add(&sev_ram_notifier);
 qemu_add_machine_init_done_notifier(&sev_machine_done_notify);
 qemu_add_vm_change_state_handler(sev_vm_state_change, s);
+g_assert(!ram_block_discard_set_broken(true));
 
 return s;
 err:
-- 
2.25.3

[PATCH v1 17/17] virtio-pci: Send qapi events when the virtio-mem size changes

2020-05-06 Thread David Hildenbrand

Let's register the notifier and trigger the qapi event with the right
device id.

MEMORY_DEVICE_SIZE_CHANGE is similar to BALLOON_CHANGE, however on a
memory device level.

Don't unregister the notifier (we neither have finalize() nor unrealize()
for VirtIOPCIProxy, so it's not that simple to do it) - both devices are
expected to vanish at the same time.

Cc: "Michael S. Tsirkin" 
Cc: Markus Armbruster 
Cc: "Dr. David Alan Gilbert" 
Cc: Eric Blake 
Cc: Igor Mammedov 
Signed-off-by: David Hildenbrand 
---
 hw/virtio/virtio-mem-pci.c | 28 
 hw/virtio/virtio-mem-pci.h |  1 +
 hw/virtio/virtio-mem.c |  2 +-
 monitor/monitor.c  |  1 +
 qapi/misc.json | 25 +
 5 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-mem-pci.c b/hw/virtio/virtio-mem-pci.c
index a47d21c81f..780d7b4af7 100644
--- a/hw/virtio/virtio-mem-pci.c
+++ b/hw/virtio/virtio-mem-pci.c
@@ -15,6 +15,7 @@
 #include "virtio-mem-pci.h"
 #include "hw/mem/memory-device.h"
 #include "qapi/error.h"
+#include "qapi/qapi-events-misc.h"
 
 static void virtio_mem_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 {
@@ -75,6 +76,21 @@ static void virtio_mem_pci_fill_device_info(const 
MemoryDeviceState *md,
 info->type = MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM;
 }
 
+static void virtio_mem_pci_size_change_notify(Notifier *notifier, void *data)
+{
+VirtIOMEMPCI *pci_mem = container_of(notifier, VirtIOMEMPCI,
+ size_change_notifier);
+DeviceState *dev = DEVICE(pci_mem);
+const uint64_t * const size_p = data;
+const char *id = NULL;
+
+if (dev->id) {
+id = g_strdup(dev->id);
+}
+
+qapi_event_send_memory_device_size_change(!!id, id, *size_p);
+}
+
 static void virtio_mem_pci_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -99,9 +115,21 @@ static void virtio_mem_pci_class_init(ObjectClass *klass, 
void *data)
 static void virtio_mem_pci_instance_init(Object *obj)
 {
 VirtIOMEMPCI *dev = VIRTIO_MEM_PCI(obj);
+VirtIOMEMClass *vmc;
+VirtIOMEM *vmem;
 
 virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
 TYPE_VIRTIO_MEM);
+
+dev->size_change_notifier.notify = virtio_mem_pci_size_change_notify;
+vmem = VIRTIO_MEM(&dev->vdev);
+vmc = VIRTIO_MEM_GET_CLASS(vmem);
+/*
+ * We never remove the notifier again, as we expect both devices to
+ * disappear at the same time.
+ */
+vmc->add_size_change_notifier(vmem, &dev->size_change_notifier);
+
 object_property_add_alias(obj, VIRTIO_MEM_BLOCK_SIZE_PROP,
   OBJECT(&dev->vdev),
   VIRTIO_MEM_BLOCK_SIZE_PROP, &error_abort);
diff --git a/hw/virtio/virtio-mem-pci.h b/hw/virtio/virtio-mem-pci.h
index 8820cd6628..b51a28b275 100644
--- a/hw/virtio/virtio-mem-pci.h
+++ b/hw/virtio/virtio-mem-pci.h
@@ -28,6 +28,7 @@ typedef struct VirtIOMEMPCI VirtIOMEMPCI;
 struct VirtIOMEMPCI {
 VirtIOPCIProxy parent_obj;
 VirtIOMEM vdev;
+Notifier size_change_notifier;
 };
 
 #endif /* QEMU_VIRTIO_MEM_PCI_H */
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 88a99a0d90..eb5cf66855 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -491,7 +491,7 @@ static void virtio_mem_device_unrealize(DeviceState *dev, 
Error **errp)
 virtio_del_queue(vdev, 0);
 virtio_cleanup(vdev);
 g_free(vmem->bitmap);
-ramblock_discard_set_required(false);
+ram_block_discard_set_required(false);
 }
 
 static int virtio_mem_pre_save(void *opaque)
diff --git a/monitor/monitor.c b/monitor/monitor.c
index 125494410a..19dcb8fbe3 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -235,6 +235,7 @@ static MonitorQAPIEventConf 
monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
 [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS },
 [QAPI_EVENT_QUORUM_FAILURE]= { 1000 * SCALE_MS },
 [QAPI_EVENT_VSERPORT_CHANGE]   = { 1000 * SCALE_MS },
+[QAPI_EVENT_MEMORY_DEVICE_SIZE_CHANGE] = { 1000 * SCALE_MS },
 };
 
 /*
diff --git a/qapi/misc.json b/qapi/misc.json
index feaeacec22..58b073562b 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -1432,6 +1432,31 @@
 ##
 { 'command': 'query-memory-devices', 'returns': ['MemoryDeviceInfo'] }
 
+##
+# @MEMORY_DEVICE_SIZE_CHANGE:
+#
+# Emitted when the size of a memory device changes. Only emitted for memory
+# devices that can actually change the size (e.g., virtio-mem due to guest
+# action).
+#
+# @id: device's ID
+# @size: the new size of memory that the device provides
+#
+# Note: this event is rate-limited.
+#
+# Since: 5.1
+#
+# Example:
+#
+# <- { "event": "MEMORY_DEVICE_SIZE_CHANGE",
+#  "data": { "id": "vm0", "size": 1073741824},
+#  "timestamp": { "seconds": 1588168529, "microseconds": 201316 } }
+#
+##
+{ 'event': 'MEMORY_DEVICE_SIZE_CHANGE',
+  'data': { '*id': 'str', 'size': 'siz

[PATCH v1 04/17] s390x/pv: Convert to ram_block_discard_set_broken()

2020-05-06 Thread David Hildenbrand

Discarding RAM does not work as expected with protected VMs. Let's
switch to ram_block_discard_set_broken() for now, as we want to get rid
of qemu_balloon_inhibit(). Note that it will currently never fail, but
might fail in the future with new technologies (e.g., virtio-mem).

Cc: Richard Henderson 
Cc: Cornelia Huck 
Cc: Halil Pasic 
Cc: Christian Borntraeger 
Cc: Janosch Frank 
Signed-off-by: David Hildenbrand 
---
 hw/s390x/s390-virtio-ccw.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 45292fb5a8..883ea392bc 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -43,7 +43,6 @@
 #include "hw/qdev-properties.h"
 #include "hw/s390x/tod.h"
 #include "sysemu/sysemu.h"
-#include "sysemu/balloon.h"
 #include "hw/s390x/pv.h"
 #include "migration/blocker.h"
 
@@ -329,7 +328,7 @@ static void s390_machine_unprotect(S390CcwMachineState *ms)
 ms->pv = false;
 migrate_del_blocker(pv_mig_blocker);
 error_free_or_abort(&pv_mig_blocker);
-qemu_balloon_inhibit(false);
+ram_block_discard_set_broken(false);
 }
 
 static int s390_machine_protect(S390CcwMachineState *ms)
@@ -338,17 +337,22 @@ static int s390_machine_protect(S390CcwMachineState *ms)
 int rc;
 
/*
-* Ballooning on protected VMs needs support in the guest for
-* sharing and unsharing balloon pages. Block ballooning for
-* now, until we have a solution to make at least Linux guests
-* either support it or fail gracefully.
+* Discarding of memory in RAM blocks does not work as expected with
+* protected VMs. Sharing and unsharing pages would be required. Mark it as
+* broken for now, until until we have a solution to make at least Linux
+* guests either support it (e.g., virtio-balloon) or fail gracefully.
 */
-qemu_balloon_inhibit(true);
+rc = ram_block_discard_set_broken(true);
+if (rc) {
+error_report("protected VMs: cannot set discarding of RAM broken");
+return rc;
+}
+
 error_setg(&pv_mig_blocker,
"protected VMs are currently not migrateable.");
 rc = migrate_add_blocker(pv_mig_blocker, &local_err);
 if (rc) {
-qemu_balloon_inhibit(false);
+ram_block_discard_set_broken(false);
 error_report_err(local_err);
 error_free_or_abort(&pv_mig_blocker);
 return rc;
@@ -357,7 +361,7 @@ static int s390_machine_protect(S390CcwMachineState *ms)
 /* Create SE VM */
 rc = s390_pv_vm_enable();
 if (rc) {
-qemu_balloon_inhibit(false);
+ram_block_discard_set_broken(false);
 migrate_del_blocker(pv_mig_blocker);
 error_free_or_abort(&pv_mig_blocker);
 return rc;
-- 
2.25.3

[PATCH v1 07/17] migration/rdma: Use ram_block_discard_set_broken()

2020-05-06 Thread David Hildenbrand

RDMA will pin all guest memory (as documented in docs/rdma.txt). We want
to mark RAM block discards to be broken - however, to keep it simple
use ram_block_discard_is_required() instead of inhibiting.

Cc: "Michael S. Tsirkin" 
Cc: Juan Quintela 
Cc: "Dr. David Alan Gilbert" 
Signed-off-by: David Hildenbrand 
---
 migration/rdma.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index f61587891b..029adbb950 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -29,6 +29,7 @@
 #include "qemu/sockets.h"
 #include "qemu/bitmap.h"
 #include "qemu/coroutine.h"
+#include "exec/memory.h"
 #include 
 #include 
 #include 
@@ -4017,8 +4018,14 @@ void rdma_start_incoming_migration(const char 
*host_port, Error **errp)
 Error *local_err = NULL;
 
 trace_rdma_start_incoming_migration();
-rdma = qemu_rdma_data_init(host_port, &local_err);
 
+/* Avoid ram_block_discard_set_broken(), cannot change during migration. */
+if (ram_block_discard_is_required()) {
+error_setg(errp, "RDMA: cannot set discarding of RAM broken");
+return;
+}
+
+rdma = qemu_rdma_data_init(host_port, &local_err);
 if (rdma == NULL) {
 goto err;
 }
@@ -4064,10 +4071,17 @@ void rdma_start_outgoing_migration(void *opaque,
 const char *host_port, Error **errp)
 {
 MigrationState *s = opaque;
-RDMAContext *rdma = qemu_rdma_data_init(host_port, errp);
 RDMAContext *rdma_return_path = NULL;
+RDMAContext *rdma;
 int ret = 0;
 
+/* Avoid ram_block_discard_set_broken(), cannot change during migration. */
+if (ram_block_discard_is_required()) {
+error_setg(errp, "RDMA: cannot set discarding of RAM broken");
+return;
+}
+
+rdma = qemu_rdma_data_init(host_port, errp);
 if (rdma == NULL) {
 goto err;
 }
-- 
2.25.3

[PATCH v1 16/17] virtio-mem: Allow notifiers for size changes

2020-05-06 Thread David Hildenbrand

We want to send qapi events in case the size of a virtio-mem device
changes. This allows upper layers to always know how much memory is
actually currently consumed via a virtio-mem device.

Unfortuantely, we have to report the id of our proxy device. Let's provide
an easy way for our proxy device to register, so it can send the qapi
events. Piggy-backing on the notifier infrastructure (although we'll
only ever have one notifier registered) seems to be an easy way.

Cc: "Michael S. Tsirkin" 
Cc: "Dr. David Alan Gilbert" 
Cc: Igor Mammedov 
Signed-off-by: David Hildenbrand 
---
 hw/virtio/virtio-mem.c | 21 -
 include/hw/virtio/virtio-mem.h |  5 +
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index e25b2c74f2..88a99a0d90 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -198,6 +198,7 @@ static int virtio_mem_state_change_request(VirtIOMEM *vmem, 
uint64_t gpa,
 } else {
 vmem->size -= size;
 }
+notifier_list_notify(&vmem->size_change_notifiers, &vmem->size);
 return VIRTIO_MEM_RESP_ACK;
 }
 
@@ -253,7 +254,10 @@ static int virtio_mem_unplug_all(VirtIOMEM *vmem)
 return -EBUSY;
 }
 bitmap_clear(vmem->bitmap, 0, vmem->bitmap_size);
-vmem->size = 0;
+if (vmem->size != 0) {
+vmem->size = 0;
+notifier_list_notify(&vmem->size_change_notifiers, &vmem->size);
+}
 
 virtio_mem_resize_usable_region(vmem, vmem->requested_size, true);
 return 0;
@@ -594,6 +598,18 @@ static MemoryRegion 
*virtio_mem_get_memory_region(VirtIOMEM *vmem, Error **errp)
 return &vmem->memdev->mr;
 }
 
+static void virtio_mem_add_size_change_notifier(VirtIOMEM *vmem,
+Notifier *notifier)
+{
+notifier_list_add(&vmem->size_change_notifiers, notifier);
+}
+
+static void virtio_mem_remove_size_change_notifier(VirtIOMEM *vmem,
+   Notifier *notifier)
+{
+notifier_remove(notifier);
+}
+
 static void virtio_mem_get_size(Object *obj, Visitor *v, const char *name,
 void *opaque, Error **errp)
 {
@@ -705,6 +721,7 @@ static void virtio_mem_instance_init(Object *obj)
 VirtIOMEM *vmem = VIRTIO_MEM(obj);
 
 vmem->block_size = VIRTIO_MEM_MIN_BLOCK_SIZE;
+notifier_list_init(&vmem->size_change_notifiers);
 
 object_property_add(obj, VIRTIO_MEM_SIZE_PROP, "size", virtio_mem_get_size,
 NULL, NULL, NULL, &error_abort);
@@ -743,6 +760,8 @@ static void virtio_mem_class_init(ObjectClass *klass, void 
*data)
 
 vmc->fill_device_info = virtio_mem_fill_device_info;
 vmc->get_memory_region = virtio_mem_get_memory_region;
+vmc->add_size_change_notifier = virtio_mem_add_size_change_notifier;
+vmc->remove_size_change_notifier = virtio_mem_remove_size_change_notifier;
 }
 
 static const TypeInfo virtio_mem_info = {
diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h
index 27158cb611..5820b5c23e 100644
--- a/include/hw/virtio/virtio-mem.h
+++ b/include/hw/virtio/virtio-mem.h
@@ -66,6 +66,9 @@ typedef struct VirtIOMEM {
 /* block size and alignment */
 uint32_t block_size;
 uint32_t migration_block_size;
+
+/* notifiers to notify when "size" changes */
+NotifierList size_change_notifiers;
 } VirtIOMEM;
 
 typedef struct VirtIOMEMClass {
@@ -75,6 +78,8 @@ typedef struct VirtIOMEMClass {
 /* public */
 void (*fill_device_info)(const VirtIOMEM *vmen, VirtioMEMDeviceInfo *vi);
 MemoryRegion *(*get_memory_region)(VirtIOMEM *vmem, Error **errp);
+void (*add_size_change_notifier)(VirtIOMEM *vmem, Notifier *notifier);
+void (*remove_size_change_notifier)(VirtIOMEM *vmem, Notifier *notifier);
 } VirtIOMEMClass;
 
 #endif
-- 
2.25.3

Re: [PATCH v2 1/3] acpi: Move build_tpm2() in the generic part

2020-05-06 Thread Auger Eric

Hi,

On 5/6/20 8:33 AM, Andrew Jones wrote:
> On Tue, May 05, 2020 at 04:44:17PM +0200, Eric Auger wrote:
>> We plan to build the tpm2 table on ARM too. In order to reuse the
>> generation code, let's move build_tpm2() to aml-build.c.
>>
>> No change in the implementation.
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  include/hw/acpi/aml-build.h |  2 ++
>>  hw/acpi/aml-build.c | 30 ++
>>  hw/i386/acpi-build.c| 30 --
>>  3 files changed, 32 insertions(+), 30 deletions(-)
>>
>> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
>> index 0f4ed53d7f..a67ab4618a 100644
>> --- a/include/hw/acpi/aml-build.h
>> +++ b/include/hw/acpi/aml-build.h
>> @@ -437,4 +437,6 @@ void build_slit(GArray *table_data, BIOSLinker *linker, 
>> MachineState *ms);
>>  
>>  void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
>>  const char *oem_id, const char *oem_table_id);
>> +
>> +void build_tpm2(GArray *table_data, BIOSLinker *linker, GArray *tcpalog);
>>  #endif
>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>> index 2c3702b882..1f7fd09112 100644
>> --- a/hw/acpi/aml-build.c
>> +++ b/hw/acpi/aml-build.c
>> @@ -26,6 +26,7 @@
>>  #include "qemu/bitops.h"
>>  #include "sysemu/numa.h"
>>  #include "hw/boards.h"
>> +#include "hw/acpi/tpm.h"
>>  
>>  static GArray *build_alloc_array(void)
>>  {
>> @@ -1875,6 +1876,35 @@ build_hdr:
>>   "FACP", tbl->len - fadt_start, f->rev, oem_id, 
>> oem_table_id);
>>  }
>>  
>> +void build_tpm2(GArray *table_data, BIOSLinker *linker, GArray *tcpalog)
>> +{
>> +Acpi20TPM2 *tpm2_ptr = acpi_data_push(table_data, sizeof *tpm2_ptr);
>> +unsigned log_addr_size = sizeof(tpm2_ptr->log_area_start_address);
>> +unsigned log_addr_offset =
>> +(char *)&tpm2_ptr->log_area_start_address - table_data->data;
>> +
>> +tpm2_ptr->platform_class = cpu_to_le16(TPM2_ACPI_CLASS_CLIENT);
>> +if (TPM_IS_TIS_ISA(tpm_find())) {
>> +tpm2_ptr->control_area_address = cpu_to_le64(0);
>> +tpm2_ptr->start_method = cpu_to_le32(TPM2_START_METHOD_MMIO);
>> +} else if (TPM_IS_CRB(tpm_find())) {
>> +tpm2_ptr->control_area_address = cpu_to_le64(TPM_CRB_ADDR_CTRL);
>> +tpm2_ptr->start_method = cpu_to_le32(TPM2_START_METHOD_CRB);
>> +} else {
>> +g_warn_if_reached();
>> +}
>> +
>> +tpm2_ptr->log_area_minimum_length =
>> +cpu_to_le32(TPM_LOG_AREA_MINIMUM_SIZE);
>> +
>> +/* log area start address to be filled by Guest linker */
>> +bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
>> +   log_addr_offset, log_addr_size,
>> +   ACPI_BUILD_TPMLOG_FILE, 0);
>> +build_header(linker, table_data,
>> + (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, 
>> NULL);
>> +}
>> +
> 
> I'll let Igor and mst confirm/deny this, but my understanding was that the
> build_append* API was the preferred way to create the table. Indeed, I
> don't see too many table.field = cpu_to_le(...) lines in aml-build.c
> 
> I realize this function is just getting moved, but maybe it should get
> converted to the build_append* API while being moved?

The reason I did not convert is that the struct is as follows

struct Acpi20TPM2 {
ACPI_TABLE_HEADER_DEF
uint16_t platform_class;
uint16_t reserved;
uint64_t control_area_address;
uint32_t start_method;
uint8_t start_method_params[12];
uint32_t log_area_minimum_length;
uint64_t log_area_start_address;
} QEMU_PACKED;


If I understand correctly the build_append* adds the fields
contiguously. It was not straightforward to me how to skip the
start_method_params array.

While we are at it the tcpalog arg is not used. Shall I remove it?

Thanks

Eric

> 
> Thanks,
> drew
> 
>

[PATCH v1 08/17] migration/colo: Use ram_block_discard_set_broken()

2020-05-06 Thread David Hildenbrand

COLO will copy all memory in a RAM block, mark discarding of RAM broken.

Cc: "Michael S. Tsirkin" 
Cc: Hailiang Zhang 
Cc: Juan Quintela 
Cc: "Dr. David Alan Gilbert" 
Signed-off-by: David Hildenbrand 
---
 include/migration/colo.h |  2 +-
 migration/migration.c|  8 +++-
 migration/savevm.c   | 11 +--
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index 1636e6f907..768e1f04c3 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -25,7 +25,7 @@ void migrate_start_colo_process(MigrationState *s);
 bool migration_in_colo_state(void);
 
 /* loadvm */
-void migration_incoming_enable_colo(void);
+int migration_incoming_enable_colo(void);
 void migration_incoming_disable_colo(void);
 bool migration_incoming_colo_enabled(void);
 void *colo_process_incoming_thread(void *opaque);
diff --git a/migration/migration.c b/migration/migration.c
index 177cce9e95..f6830e4620 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -338,12 +338,18 @@ bool migration_incoming_colo_enabled(void)
 
 void migration_incoming_disable_colo(void)
 {
+ram_block_discard_set_broken(false);
 migration_colo_enabled = false;
 }
 
-void migration_incoming_enable_colo(void)
+int migration_incoming_enable_colo(void)
 {
+if (ram_block_discard_set_broken(true)) {
+error_report("COLO: cannot set discarding of RAM broken");
+return -EBUSY;
+}
 migration_colo_enabled = true;
+return 0;
 }
 
 void migrate_add_address(SocketAddress *address)
diff --git a/migration/savevm.c b/migration/savevm.c
index c00a6807d9..19b4f9600d 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2111,8 +2111,15 @@ static int 
loadvm_handle_recv_bitmap(MigrationIncomingState *mis,
 
 static int loadvm_process_enable_colo(MigrationIncomingState *mis)
 {
-migration_incoming_enable_colo();
-return colo_init_ram_cache();
+int ret = migration_incoming_enable_colo();
+
+if (!ret) {
+ret = colo_init_ram_cache();
+if (ret) {
+migration_incoming_disable_colo();
+}
+}
+return ret;
 }
 
 /*
-- 
2.25.3

[PATCH v1 05/17] virtio-balloon: Rip out qemu_balloon_inhibit()

2020-05-06 Thread David Hildenbrand

The only remaining special case is postcopy. It cannot handle
concurrent discards yet, which would result in requesting already sent
pages from the source. Special-case it in virtio-balloon instead.

Cc: "Michael S. Tsirkin" 
Cc: Juan Quintela 
Cc: "Dr. David Alan Gilbert" 
Signed-off-by: David Hildenbrand 
---
 balloon.c  | 18 --
 hw/virtio/virtio-balloon.c | 12 +++-
 include/sysemu/balloon.h   |  2 --
 migration/postcopy-ram.c   | 23 ---
 4 files changed, 11 insertions(+), 44 deletions(-)

diff --git a/balloon.c b/balloon.c
index c49f57c27b..354408c6ea 100644
--- a/balloon.c
+++ b/balloon.c
@@ -36,24 +36,6 @@
 static QEMUBalloonEvent *balloon_event_fn;
 static QEMUBalloonStatus *balloon_stat_fn;
 static void *balloon_opaque;
-static int balloon_inhibit_count;
-
-bool qemu_balloon_is_inhibited(void)
-{
-return atomic_read(&balloon_inhibit_count) > 0 ||
-   ram_block_discard_is_broken();
-}
-
-void qemu_balloon_inhibit(bool state)
-{
-if (state) {
-atomic_inc(&balloon_inhibit_count);
-} else {
-atomic_dec(&balloon_inhibit_count);
-}
-
-assert(atomic_read(&balloon_inhibit_count) >= 0);
-}
 
 static bool have_balloon(Error **errp)
 {
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a4729f7fc9..aa5b89fb47 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -29,6 +29,7 @@
 #include "trace.h"
 #include "qemu/error-report.h"
 #include "migration/misc.h"
+#include "migration/postcopy-ram.h"
 
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
@@ -63,6 +64,15 @@ static bool 
virtio_balloon_pbp_matches(PartiallyBalloonedPage *pbp,
 return pbp->base_gpa == base_gpa;
 }
 
+static bool virtio_balloon_inhibited(void)
+{
+PostcopyState ps = postcopy_state_get();
+
+/* Postcopy cannot deal with concurrent discards (yet), so it's special. */
+return ram_block_discard_is_broken() ||
+   (ps >= POSTCOPY_INCOMING_DISCARD && ps < POSTCOPY_INCOMING_END);
+}
+
 static void balloon_inflate_page(VirtIOBalloon *balloon,
  MemoryRegion *mr, hwaddr mr_offset,
  PartiallyBalloonedPage *pbp)
@@ -360,7 +370,7 @@ static void virtio_balloon_handle_output(VirtIODevice 
*vdev, VirtQueue *vq)
 
 trace_virtio_balloon_handle_output(memory_region_name(section.mr),
pa);
-if (!qemu_balloon_is_inhibited()) {
+if (!virtio_balloon_inhibited()) {
 if (vq == s->ivq) {
 balloon_inflate_page(s, section.mr,
  section.offset_within_region, &pbp);
diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
index aea0c44985..20a2defe3a 100644
--- a/include/sysemu/balloon.h
+++ b/include/sysemu/balloon.h
@@ -23,7 +23,5 @@ typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo 
*info);
 int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
  QEMUBalloonStatus *stat_func, void *opaque);
 void qemu_remove_balloon_handler(void *opaque);
-bool qemu_balloon_is_inhibited(void);
-void qemu_balloon_inhibit(bool state);
 
 #endif
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index a36402722b..b41a9fe2fd 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -27,7 +27,6 @@
 #include "qemu/notify.h"
 #include "qemu/rcu.h"
 #include "sysemu/sysemu.h"
-#include "sysemu/balloon.h"
 #include "qemu/error-report.h"
 #include "trace.h"
 #include "hw/boards.h"
@@ -520,20 +519,6 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis)
 return 0;
 }
 
-/*
- * Manage a single vote to the QEMU balloon inhibitor for all postcopy usage,
- * last caller wins.
- */
-static void postcopy_balloon_inhibit(bool state)
-{
-static bool cur_state = false;
-
-if (state != cur_state) {
-qemu_balloon_inhibit(state);
-cur_state = state;
-}
-}
-
 /*
  * At the end of a migration where postcopy_ram_incoming_init was called.
  */
@@ -565,8 +550,6 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState 
*mis)
 mis->have_fault_thread = false;
 }
 
-postcopy_balloon_inhibit(false);
-
 if (enable_mlock) {
 if (os_mlock() < 0) {
 error_report("mlock: %s", strerror(errno));
@@ -1160,12 +1143,6 @@ int postcopy_ram_incoming_setup(MigrationIncomingState 
*mis)
 }
 memset(mis->postcopy_tmp_zero_page, '\0', mis->largest_page_size);
 
-/*
- * Ballooning can mark pages as absent while we're postcopying
- * that would cause false userfaults.
- */
-postcopy_balloon_inhibit(true);
-
 trace_postcopy_ram_enable_notify();
 
 return 0;
-- 
2.25.3

[PATCH v1 13/17] hmp: Handle virtio-mem when printing memory device info

2020-05-06 Thread David Hildenbrand

Print the memory device info just like for other memory devices.

Cc: "Dr. David Alan Gilbert" 
Cc: "Michael S. Tsirkin" 
Signed-off-by: David Hildenbrand 
---
 monitor/hmp-cmds.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 7f6e982dc8..4b3638a2a6 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1805,6 +1805,7 @@ void hmp_info_memory_devices(Monitor *mon, const QDict 
*qdict)
 MemoryDeviceInfoList *info_list = qmp_query_memory_devices(&err);
 MemoryDeviceInfoList *info;
 VirtioPMEMDeviceInfo *vpi;
+VirtioMEMDeviceInfo *vmi;
 MemoryDeviceInfo *value;
 PCDIMMDeviceInfo *di;
 
@@ -1839,6 +1840,21 @@ void hmp_info_memory_devices(Monitor *mon, const QDict 
*qdict)
 monitor_printf(mon, "  size: %" PRIu64 "\n", vpi->size);
 monitor_printf(mon, "  memdev: %s\n", vpi->memdev);
 break;
+case MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM:
+vmi = value->u.virtio_mem.data;
+monitor_printf(mon, "Memory device [%s]: \"%s\"\n",
+   MemoryDeviceInfoKind_str(value->type),
+   vmi->id ? vmi->id : "");
+monitor_printf(mon, "  memaddr: 0x%" PRIx64 "\n", 
vmi->memaddr);
+monitor_printf(mon, "  node: %" PRId64 "\n", vmi->node);
+monitor_printf(mon, "  requested-size: %" PRIu64 "\n",
+   vmi->requested_size);
+monitor_printf(mon, "  size: %" PRIu64 "\n", vmi->size);
+monitor_printf(mon, "  max-size: %" PRIu64 "\n", 
vmi->max_size);
+monitor_printf(mon, "  block-size: %" PRIu64 "\n",
+   vmi->block_size);
+monitor_printf(mon, "  memdev: %s\n", vmi->memdev);
+break;
 default:
 g_assert_not_reached();
 }
-- 
2.25.3

[PATCH v1 11/17] virtio-pci: Proxy for virtio-mem

2020-05-06 Thread David Hildenbrand

Let's add a proxy for virtio-mem, make it a memory device, and
pass-through the properties.

Cc: "Michael S. Tsirkin" 
Cc: Marcel Apfelbaum 
Cc: "Dr. David Alan Gilbert" 
Cc: Igor Mammedov 
Signed-off-by: David Hildenbrand 
---
 hw/virtio/Makefile.objs|   1 +
 hw/virtio/virtio-mem-pci.c | 131 +
 hw/virtio/virtio-mem-pci.h |  33 ++
 include/hw/pci/pci.h   |   1 +
 4 files changed, 166 insertions(+)
 create mode 100644 hw/virtio/virtio-mem-pci.c
 create mode 100644 hw/virtio/virtio-mem-pci.h

diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index 7df70e977e..b9661f9c01 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -19,6 +19,7 @@ obj-$(call land,$(CONFIG_VHOST_USER_FS),$(CONFIG_VIRTIO_PCI)) 
+= vhost-user-fs-p
 obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
 obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o
 obj-$(CONFIG_VIRTIO_MEM) += virtio-mem.o
+common-obj-$(call land,$(CONFIG_VIRTIO_MEM),$(CONFIG_VIRTIO_PCI)) += 
virtio-mem-pci.o
 
 ifeq ($(CONFIG_VIRTIO_PCI),y)
 obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock-pci.o
diff --git a/hw/virtio/virtio-mem-pci.c b/hw/virtio/virtio-mem-pci.c
new file mode 100644
index 00..a47d21c81f
--- /dev/null
+++ b/hw/virtio/virtio-mem-pci.c
@@ -0,0 +1,131 @@
+/*
+ * Virtio MEM PCI device
+ *
+ * Copyright (C) 2020 Red Hat, Inc.
+ *
+ * Authors:
+ *  David Hildenbrand 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "virtio-mem-pci.h"
+#include "hw/mem/memory-device.h"
+#include "qapi/error.h"
+
+static void virtio_mem_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
+{
+VirtIOMEMPCI *mem_pci = VIRTIO_MEM_PCI(vpci_dev);
+DeviceState *vdev = DEVICE(&mem_pci->vdev);
+
+qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
+object_property_set_bool(OBJECT(vdev), true, "realized", errp);
+}
+
+static void virtio_mem_pci_set_addr(MemoryDeviceState *md, uint64_t addr,
+Error **errp)
+{
+object_property_set_uint(OBJECT(md), addr, VIRTIO_MEM_ADDR_PROP, errp);
+}
+
+static uint64_t virtio_mem_pci_get_addr(const MemoryDeviceState *md)
+{
+return object_property_get_uint(OBJECT(md), VIRTIO_MEM_ADDR_PROP,
+&error_abort);
+}
+
+static MemoryRegion *virtio_mem_pci_get_memory_region(MemoryDeviceState *md,
+  Error **errp)
+{
+VirtIOMEMPCI *pci_mem = VIRTIO_MEM_PCI(md);
+VirtIOMEM *vmem = VIRTIO_MEM(&pci_mem->vdev);
+VirtIOMEMClass *vmc = VIRTIO_MEM_GET_CLASS(vmem);
+
+return vmc->get_memory_region(vmem, errp);
+}
+
+static uint64_t virtio_mem_pci_get_plugged_size(const MemoryDeviceState *md,
+Error **errp)
+{
+return object_property_get_uint(OBJECT(md), VIRTIO_MEM_SIZE_PROP,
+errp);
+}
+
+static void virtio_mem_pci_fill_device_info(const MemoryDeviceState *md,
+MemoryDeviceInfo *info)
+{
+VirtioMEMDeviceInfo *vi = g_new0(VirtioMEMDeviceInfo, 1);
+VirtIOMEMPCI *pci_mem = VIRTIO_MEM_PCI(md);
+VirtIOMEM *vmem = VIRTIO_MEM(&pci_mem->vdev);
+VirtIOMEMClass *vpc = VIRTIO_MEM_GET_CLASS(vmem);
+DeviceState *dev = DEVICE(md);
+
+if (dev->id) {
+vi->has_id = true;
+vi->id = g_strdup(dev->id);
+}
+
+/* let the real device handle everything else */
+vpc->fill_device_info(vmem, vi);
+
+info->u.virtio_mem.data = vi;
+info->type = MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM;
+}
+
+static void virtio_mem_pci_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
+PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
+MemoryDeviceClass *mdc = MEMORY_DEVICE_CLASS(klass);
+
+k->realize = virtio_mem_pci_realize;
+set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
+pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_MEM;
+pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
+pcidev_k->class_id = PCI_CLASS_OTHERS;
+
+mdc->get_addr = virtio_mem_pci_get_addr;
+mdc->set_addr = virtio_mem_pci_set_addr;
+mdc->get_plugged_size = virtio_mem_pci_get_plugged_size;
+mdc->get_memory_region = virtio_mem_pci_get_memory_region;
+mdc->fill_device_info = virtio_mem_pci_fill_device_info;
+}
+
+static void virtio_mem_pci_instance_init(Object *obj)
+{
+VirtIOMEMPCI *dev = VIRTIO_MEM_PCI(obj);
+
+virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+TYPE_VIRTIO_MEM);
+object_property_add_alias(obj, VIRTIO_MEM_BLOCK_SIZE_PROP,
+  OBJECT(&dev->vdev),
+  VIRTIO_MEM_BLOCK_SIZE_PROP, &error_abort);
+object_property_add_a

[PATCH v1 14/17] numa: Handle virtio-mem in NUMA stats

2020-05-06 Thread David Hildenbrand

Account the memory to the configured nid.

Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: "Michael S. Tsirkin" 
Signed-off-by: David Hildenbrand 
---
 hw/core/numa.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 316bc50d75..06960918e7 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -812,6 +812,7 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
 MemoryDeviceInfoList *info;
 PCDIMMDeviceInfo *pcdimm_info;
 VirtioPMEMDeviceInfo *vpi;
+VirtioMEMDeviceInfo *vmi;
 
 for (info = info_list; info; info = info->next) {
 MemoryDeviceInfo *value = info->value;
@@ -832,6 +833,11 @@ static void numa_stat_memory_devices(NumaNodeMem 
node_mem[])
 node_mem[0].node_mem += vpi->size;
 node_mem[0].node_plugged_mem += vpi->size;
 break;
+case MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM:
+vmi = value->u.virtio_mem.data;
+node_mem[vmi->node].node_mem += vmi->size;
+node_mem[vmi->node].node_plugged_mem += vmi->size;
+break;
 default:
 g_assert_not_reached();
 }
-- 
2.25.3

Re: [PATCH v2 1/3] acpi: Move build_tpm2() in the generic part

2020-05-06 Thread Michael S. Tsirkin

On Wed, May 06, 2020 at 08:33:14AM +0200, Andrew Jones wrote:
> I realize this function is just getting moved, but maybe it should get
> converted to the build_append* API while being moved?

I'd rather refactoring was done in a separate patch -
easier to review.

-- 
MST

Re: [PATCH RFC 1/4] migration: Export migration_bitmap_sync_precopy()

2020-05-06 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> Make it usable outside migration.  To make it easier to use, remove the
> RAMState parameter since after all ram.c has the reference of ram_state
> directly from its context.
> 
> Signed-off-by: Peter Xu 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  include/migration/misc.h |  1 +
>  migration/ram.c  | 10 +-
>  2 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index d2762257aa..e338be8c30 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -66,6 +66,7 @@ void remove_migration_state_change_notifier(Notifier 
> *notify);
>  bool migration_in_setup(MigrationState *);
>  bool migration_has_finished(MigrationState *);
>  bool migration_has_failed(MigrationState *);
> +void migration_bitmap_sync_precopy(void);
>  /* ...and after the device transmission */
>  bool migration_in_postcopy_after_devices(MigrationState *);
>  void migration_global_dump(Monitor *mon);
> diff --git a/migration/ram.c b/migration/ram.c
> index 04f13feb2e..d737175d4e 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -970,7 +970,7 @@ static void migration_bitmap_sync(RAMState *rs)
>  }
>  }
>  
> -static void migration_bitmap_sync_precopy(RAMState *rs)
> +void migration_bitmap_sync_precopy(void)
>  {
>  Error *local_err = NULL;
>  
> @@ -983,7 +983,7 @@ static void migration_bitmap_sync_precopy(RAMState *rs)
>  local_err = NULL;
>  }
>  
> -migration_bitmap_sync(rs);
> +migration_bitmap_sync(ram_state);
>  
>  if (precopy_notify(PRECOPY_NOTIFY_AFTER_BITMAP_SYNC, &local_err)) {
>  error_report_err(local_err);
> @@ -2303,7 +2303,7 @@ static void ram_init_bitmaps(RAMState *rs)
>  WITH_RCU_READ_LOCK_GUARD() {
>  ram_list_init_bitmaps();
>  memory_global_dirty_log_start();
> -migration_bitmap_sync_precopy(rs);
> +migration_bitmap_sync_precopy();
>  }
>  qemu_mutex_unlock_ramlist();
>  qemu_mutex_unlock_iothread();
> @@ -2592,7 +2592,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>  
>  WITH_RCU_READ_LOCK_GUARD() {
>  if (!migration_in_postcopy()) {
> -migration_bitmap_sync_precopy(rs);
> +migration_bitmap_sync_precopy();
>  }
>  
>  ram_control_before_iterate(f, RAM_CONTROL_FINISH);
> @@ -2642,7 +2642,7 @@ static void ram_save_pending(QEMUFile *f, void *opaque, 
> uint64_t max_size,
>  remaining_size < max_size) {
>  qemu_mutex_lock_iothread();
>  WITH_RCU_READ_LOCK_GUARD() {
> -migration_bitmap_sync_precopy(rs);
> +migration_bitmap_sync_precopy();
>  }
>  qemu_mutex_unlock_iothread();
>  remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
> -- 
> 2.24.1
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

RE: [PATCH v2 01/10] net: cadence_gem: Fix debug statements

2020-05-06 Thread Sai Pavan Boddu

Hi Edgar,

Below comments will be taken care in V3.

Thanks,
Sai Pavan

> -Original Message-
> From: Edgar E. Iglesias 
> Sent: Monday, May 4, 2020 8:09 PM
> To: Sai Pavan Boddu 
> Cc: Alistair Francis ; Peter Maydell
> ; Jason Wang ; Markus
> Armbruster ; Philippe Mathieu-Daudé
> ; Tong Ho ; Ramon Fried
> ; qemu-...@nongnu.org; qemu-
> de...@nongnu.org
> Subject: Re: [PATCH v2 01/10] net: cadence_gem: Fix debug statements
> 
> On Mon, May 04, 2020 at 07:35:59PM +0530, Sai Pavan Boddu wrote:
> > Enabling debug breaks the build, Fix them and make debug statements
> > always compilable. Fix few statements to use sized integer casting.
> >
> > Signed-off-by: Sai Pavan Boddu 
> > ---
> >  hw/net/cadence_gem.c | 28 ++--
> >  1 file changed, 14 insertions(+), 14 deletions(-)
> >
> > diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c index
> > 22a0b1b..2f244eb 100644
> > --- a/hw/net/cadence_gem.c
> > +++ b/hw/net/cadence_gem.c
> > @@ -35,14 +35,13 @@
> >  #include "sysemu/dma.h"
> >  #include "net/checksum.h"
> >
> > -#ifdef CADENCE_GEM_ERR_DEBUG
> > -#define DB_PRINT(...) do { \
> > -fprintf(stderr,  ": %s: ", __func__); \
> > -fprintf(stderr, ## __VA_ARGS__); \
> > -} while (0)
> > -#else
> > -#define DB_PRINT(...)
> > -#endif
> > +#define CADENCE_GEM_ERR_DEBUG 0
> > +#define DB_PRINT(...) do {\
> > +if (CADENCE_GEM_ERR_DEBUG) {   \
> > +qemu_log(": %s: ", __func__); \
> > +qemu_log(__VA_ARGS__); \
> > +} \
> > +} while (0)
> >
> >  #define GEM_NWCTRL(0x/4) /* Network Control reg */
> >  #define GEM_NWCFG (0x0004/4) /* Network Config reg */
> > @@ -979,7 +978,8 @@ static ssize_t gem_receive(NetClientState *nc,
> const uint8_t *buf, size_t size)
> >  size += 4;
> >  }
> >
> > -DB_PRINT("config bufsize: %d packet size: %ld\n", rxbufsize, size);
> > +DB_PRINT("config bufsize: %" PRIu64 " packet size: %" PRIu64 "\n",
> > + (uint64_t) rxbufsize, (uint64_t) size);
> 
> Shouldn't these be %u and %zd rather than casting to uint64_t?
> 
> 
> >
> >  /* Find which queue we are targeting */
> >  q = get_queue_from_screen(s, rxbuf_ptr, rxbufsize); @@ -992,9
> > +992,9 @@ static ssize_t gem_receive(NetClientState *nc, const uint8_t
> *buf, size_t size)
> >  return -1;
> >  }
> >
> > -DB_PRINT("copy %u bytes to 0x%" PRIx64 "\n",
> > - MIN(bytes_to_copy, rxbufsize),
> > - rx_desc_get_buffer(s, s->rx_desc[q]));
> > +DB_PRINT("copy %" PRIu32 " bytes to 0x%" PRIx64 "\n",
> > +MIN(bytes_to_copy, rxbufsize),
> > +rx_desc_get_buffer(s, s->rx_desc[q] + rxbuf_offset));
> 
> Looks like this is changing what we print (+ rxbuf_offset), was that
> intentional? (it was not mentioned in the commit message)
> 
> 
> >
> >  /* Copy packet data to emulated DMA buffer */
> >  address_space_write(&s->dma_as, rx_desc_get_buffer(s,
> > s->rx_desc[q]) + @@ -1160,8 +1160,8 @@ static void
> gem_transmit(CadenceGEMState *s)
> >   */
> >  if ((tx_desc_get_buffer(s, desc) == 0) ||
> >  (tx_desc_get_length(desc) == 0)) {
> > -DB_PRINT("Invalid TX descriptor @ 0x%x\n",
> > - (unsigned)packet_desc_addr);
> > +DB_PRINT("Invalid TX descriptor @ 0x%" HWADDR_PRIx "\n",
> > + packet_desc_addr);
> >  break;
> >  }
> >
> > --
> > 2.7.4
> >

[PATCH v1 10/17] virtio-mem: Paravirtualized memory hot(un)plug

2020-05-06 Thread David Hildenbrand

This is the very basic/initial version of virtio-mem. An introduction to
virtio-mem can be found in the Linux kernel driver [1]. While it can be
used in the current state for hotplug of a smaller amount of memory, it
will heavily benefit from resizeable memory regions in the future.

Each virtio-mem device manages a memory region (provided via a memory
backend). After requested by the hypervisor ("requested-size"), the
guest can try to plug/unplug blocks of memory within that region, in order
to reach the requested size. Initially, and after a reboot, all memory is
unplugged (except in special cases - reboot during postcopy).

The guest may only try to plug/unplug blocks of memory within the usable
region size. The usable region size is a little bigger than the
requested size, to give the device driver some flexibility. The usable
region size will only grow, except on reboots or when all memory is
requested to get unplugged. The guest can never plug more memory than
requested. Unplugged memory will get zapped/discarded, similar to in a
balloon device.

The block size is variable, however, it is always chosen in a way such that
THP splits are avoided (e.g., 2MB). The state of each block
(plugged/unplugged) is tracked in a bitmap.

As virtio-mem devices (e.g., virtio-mem-pci) will be memory devices, we now
expose "VirtioMEMDeviceInfo" via "query-memory-devices".

--

There are two important follow-up items that are in the works:
1. Resizeable memory regions: Use resizeable allocations/RAM blocks to
   grow/shrink along with the usable region size. This avoids creating
   initially very big VMAs, RAM blocks, and KVM slots.
2. Protection of unplugged memory: Make sure the gust cannot actually
   make use of unplugged memory.

Other follow-up items that are in the works:
1. Exclude unplugged memory during migration (via precopy notifier).
2. Handle remapping of memory.
3. Support for other architectures.

--

Example usage (virtio-mem-pci is introduced in follow-up patches):

Start QEMU with two virtio-mem devices (one per NUMA node):
 $ qemu-system-x86_64 -m 4G,maxmem=20G \
  -smp sockets=2,cores=2 \
  -numa node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 \
  [...]
  -object memory-backend-ram,id=mem0,size=8G \
  -device virtio-mem-pci,id=vm0,memdev=mem0,node=0,requested-size=0M \
  -object memory-backend-ram,id=mem1,size=8G \
  -device virtio-mem-pci,id=vm1,memdev=mem1,node=1,requested-size=1G

Query the configuration:
 (qemu) info memory-devices
 Memory device [virtio-mem]: "vm0"
   memaddr: 0x14000
   node: 0
   requested-size: 0
   size: 0
   max-size: 8589934592
   block-size: 2097152
   memdev: /objects/mem0
 Memory device [virtio-mem]: "vm1"
   memaddr: 0x34000
   node: 1
   requested-size: 1073741824
   size: 1073741824
   max-size: 8589934592
   block-size: 2097152
   memdev: /objects/mem1

Add some memory to node 0:
 (qemu) qom-set vm0 requested-size 500M

Remove some memory from node 1:
 (qemu) qom-set vm1 requested-size 200M

Query the configuration again:
 (qemu) info memory-devices
 Memory device [virtio-mem]: "vm0"
   memaddr: 0x14000
   node: 0
   requested-size: 524288000
   size: 524288000
   max-size: 8589934592
   block-size: 2097152
   memdev: /objects/mem0
 Memory device [virtio-mem]: "vm1"
   memaddr: 0x34000
   node: 1
   requested-size: 209715200
   size: 209715200
   max-size: 8589934592
   block-size: 2097152
   memdev: /objects/mem1

[1] https://lkml.kernel.org/r/20200311171422.10484-1-da...@redhat.com

Cc: "Michael S. Tsirkin" 
Cc: Eric Blake 
Cc: Markus Armbruster 
Cc: "Dr. David Alan Gilbert" 
Cc: Igor Mammedov 
Signed-off-by: David Hildenbrand 
---
 hw/virtio/Kconfig  |  11 +
 hw/virtio/Makefile.objs|   1 +
 hw/virtio/virtio-mem.c | 762 +
 include/hw/virtio/virtio-mem.h |  80 
 qapi/misc.json |  39 +-
 5 files changed, 892 insertions(+), 1 deletion(-)
 create mode 100644 hw/virtio/virtio-mem.c
 create mode 100644 include/hw/virtio/virtio-mem.h

diff --git a/hw/virtio/Kconfig b/hw/virtio/Kconfig
index 83122424fa..0eda25c4e1 100644
--- a/hw/virtio/Kconfig
+++ b/hw/virtio/Kconfig
@@ -47,3 +47,14 @@ config VIRTIO_PMEM
 depends on VIRTIO
 depends on VIRTIO_PMEM_SUPPORTED
 select MEM_DEVICE
+
+config VIRTIO_MEM_SUPPORTED
+bool
+
+config VIRTIO_MEM
+bool
+default y
+depends on VIRTIO
+depends on LINUX
+depends on VIRTIO_MEM_SUPPORTED
+select MEM_DEVICE
diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index 4e4d39a0a4..7df70e977e 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -18,6 +18,7 @@ common-obj-$(call 
land,$(CONFIG_VIRTIO_PMEM),$(CONFIG_VIRTIO_PCI)) += virtio-pme
 obj-$(call land,$(CONFIG_VHOST_USER_FS),$(CONFIG_VIRTIO_PCI)) += 
vhost-user-fs-pci.o
 obj-$(CONFIG_V

Re: [PATCH 1/5] roms/opensbi: Update to support building bios images for generic platform

2020-05-06 Thread Bin Meng

Hi Alistair,

On Tue, May 5, 2020 at 12:00 AM Alistair Francis  wrote:
>
> On Mon, May 4, 2020 at 1:05 AM Bin Meng  wrote:
> >
> > On Mon, May 4, 2020 at 4:00 PM Bin Meng  wrote:
> > >
> > > Hi Anup,
> > >
> > > On Mon, May 4, 2020 at 3:52 PM Anup Patel  wrote:
> > > >
> > > > On Mon, May 4, 2020 at 12:46 PM Bin Meng  wrote:
> > > > >
> > > > > Hi Anup,
> > > > >
> > > > > On Sun, May 3, 2020 at 12:38 PM Anup Patel  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, May 1, 2020 at 9:26 PM Bin Meng  wrote:
> > > > > > >
> > > > > > > From: Bin Meng 
> > > > > > >
> > > > > > > The RISC-V generic platform is a flattened device tree (FDT) based
> > > > > > > platform where all platform specific functionality is provided 
> > > > > > > based
> > > > > > > on FDT passed by previous booting stage. The support was added in
> > > > > > > upstream opensbi recently.
> > > > > > >
> > > > > > > Bring the QEMU roms/opensbi submodule to the upstream opensbi 
> > > > > > > commit:
> > > > > > > commit 4f18c6e55049 ("platform: generic: Add Sifive FU540 TLB 
> > > > > > > flush range limit override")
> > > > > > > with the following changes since v0.7 release:
> > > > > > >
> > > > > > >   1bb00ab lib: No need to provide default PMP region using 
> > > > > > > platform callbacks
> > > > > > >   a9eac67 include: sbi_platform: Combine reboot and shutdown into 
> > > > > > > one callback
> > > > > > >   6585fab lib: utils: Add SiFive test device
> > > > > > >   4781545 platform: Add Nuclei UX600 platform
> > > > > > >   3a326af scripts: adapt binary archive script for Nuclei UX600
> > > > > > >   5bdf022 firmware: fw_base: Remove CSR_MTVEC update check
> > > > > > >   e6c1345 lib: utils/serial: Skip baudrate config if input 
> > > > > > > frequency is zero
> > > > > > >   01a8c8e lib: utils: Improve fdt_parse_uart8250() API
> > > > > > >   0a0093b lib: utils: Add fdt_parse_uart8250_node() function
> > > > > > >   243b0d0 lib: utils: Remove redundant clint_ipi_sync() 
> > > > > > > declaration
> > > > > > >   e3ad7c1 lib: utils: Rename fdt_parse_clint() to 
> > > > > > > fdt_parse_compat_addr()
> > > > > > >   a39cd6f lib: utils: Add FDT match table based node lookup
> > > > > > >   dd33b9e lib: utils: Make fdt_get_node_addr_size() public 
> > > > > > > function
> > > > > > >   66185b3 lib: utils: Add fdt_parse_sifive_uart_node() function
> > > > > > >   19e966b lib: utils: Add fdt_parse_hart_id() function
> > > > > > >   44dd7be lib: utils: Add fdt_parse_max_hart_id() API
> > > > > > >   f0eb503 lib: utils: Add fdt_parse_plic_node() function
> > > > > > >   1ac794c include: Add array_size() macro
> > > > > > >   8ff2b94 lib: utils: Add simple FDT timer framework
> > > > > > >   76f0f81 lib: utils: Add simple FDT ipi framework
> > > > > > >   75322a6 lib: utils: Add simple FDT irqchip framework
> > > > > > >   76a8940 lib: utils: Add simple FDT serial framework
> > > > > > >   7cc6fa4 lib: utils: Add simple FDT reset framework
> > > > > > >   4d06353 firmware: fw_base: Introduce optional fw_platform_init()
> > > > > > >   f1aa9e5 platform: Add generic FDT based platform support
> > > > > > >   1f21b99 lib: sbi: Print platform hart count at boot time
> > > > > > >   2ba7087 scripts: Add generic platform to 
> > > > > > > create-binary-archive.sh
> > > > > > >   4f18c6e platform: generic: Add Sifive FU540 TLB flush range 
> > > > > > > limit override
> > > > > > >
> > > > > > > Update our Makefile to build the generic platform instead of 
> > > > > > > building
> > > > > > > virt and sifive_u separately.
>
> Hey Bin,
>
> Thanks for the patch!
>
> I don't think we can take this update though, as we should stick with
> OpenSBI releases.

Sure, will delay this series once OpenSBI v0.8 release is out.

>
> > > > > > >
> > > > > > > Signed-off-by: Bin Meng 
> > > > > > > ---
> > > > > > >
> > > > > > >  roms/Makefile | 30 --
> > > > > > >  roms/opensbi  |  2 +-
> > > > > > >  2 files changed, 9 insertions(+), 23 deletions(-)
> > > > > > >
> > > > > > > diff --git a/roms/Makefile b/roms/Makefile
> > > > > > > index f9acf39..cb00628 100644
> > > > > > > --- a/roms/Makefile
> > > > > > > +++ b/roms/Makefile
> > > > > > > @@ -64,10 +64,8 @@ default help:
> > > > > > > @echo "  u-boot.e500-- update u-boot.e500"
> > > > > > > @echo "  u-boot.sam460  -- update u-boot.sam460"
> > > > > > > @echo "  efi-- update UEFI (edk2) 
> > > > > > > platform firmware"
> > > > > > > -   @echo "  opensbi32-virt -- update OpenSBI for 32-bit 
> > > > > > > virt machine"
> > > > > > > -   @echo "  opensbi64-virt -- update OpenSBI for 64-bit 
> > > > > > > virt machine"
> > > > > > > -   @echo "  opensbi32-sifive_u -- update OpenSBI for 32-bit 
> > > > > > > sifive_u machine"
> > > > > > > -   @echo "  opensbi64-sifive_u -- update OpenSBI for 64-bit 
> > > > > > > sifive_u machine"
> > > > > > > +   @echo "  opensbi32-generic  -- update OpenSBI for 32-b

[PATCH v1 12/17] MAINTAINERS: Add myself as virtio-mem maintainer

2020-05-06 Thread David Hildenbrand

Let's make sure patches/bug reports find the right person.

Cc: "Michael S. Tsirkin" 
Cc: Peter Maydell 
Cc: Markus Armbruster 
Signed-off-by: David Hildenbrand 
---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1f84e3ae2c..09fff9e1bd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1734,6 +1734,14 @@ F: hw/virtio/virtio-crypto.c
 F: hw/virtio/virtio-crypto-pci.c
 F: include/hw/virtio/virtio-crypto.h
 
+virtio-mem
+M: David Hildenbrand 
+S: Supported
+F: hw/virtio/virtio-mem.c
+F: hw/virtio/virtio-mem-pci.h
+F: hw/virtio/virtio-mem-pci.c
+F: include/hw/virtio/virtio-mem.h
+
 nvme
 M: Keith Busch 
 L: qemu-bl...@nongnu.org
-- 
2.25.3

Re: [PATCH 2/2] qemu-img: Add --start-offset and --max-length to map

2020-05-06 Thread Eyal Moscovici

Thanks for the review. I will send V2 based on QEMU version 5.0.

On 29/04/2020, 18:06, "Eric Blake"  wrote:

On 3/22/20 4:11 AM, Eyal Moscovici wrote:
> The mapping operation of large disks especially ones stored over a
> long chain of QCOW2 files can take a long time to finish.
> Additionally when mapping fails there was no way recover by
> restarting the mapping from the failed location.
> 
> The new options, --start-offset and --max-length allows the user to
> divide these type of map operations into shorter independent tasks.
> 
> Acked-by: Mark Kanda 
> Co-developed-by: Yoav Elnekave 
> Signed-off-by: Yoav Elnekave 
> Signed-off-by: Eyal Moscovici 
> ---
>   docs/tools/qemu-img.rst |  2 +-
>   qemu-img-cmds.hx|  4 ++--
>   qemu-img.c  | 30 +-
>   3 files changed, 32 insertions(+), 4 deletions(-)
> 
> diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> index 0080f83a76..924e89f679 100644
> --- a/docs/tools/qemu-img.rst
> +++ b/docs/tools/qemu-img.rst
> @@ -519,7 +519,7 @@ Command description:
>   ``ImageInfoSpecific*`` QAPI object (e.g. ``ImageInfoSpecificQCow2``
>   for qcow2 images).
>   
> -.. option:: map [--object OBJECTDEF] [--image-opts] [-f FMT] 
[--output=OFMT] [-U] FILENAME
> +.. option:: map [--object OBJECTDEF] [--image-opts] [-f FMT] 
[--start-offset=offset] [--max-length=len] [--output=OFMT] [-U] FILENAME

Consistency with the rest of the line says this should be 
[--start-offset=OFFSET] [--max-length=LEN]

Will fix.

>   
> Dump the metadata of image *FILENAME* and its backing file chain.
> In particular, this commands dumps the allocation state of every 
sector
> diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
> index c9c54de1df..35f832816f 100644
> --- a/qemu-img-cmds.hx
> +++ b/qemu-img-cmds.hx
> @@ -63,9 +63,9 @@ SRST
>   ERST
>   
>   DEF("map", img_map,
> -"map [--object objectdef] [--image-opts] [-f fmt] [--output=ofmt] 
[-U] filename")
> +"map [--object objectdef] [--image-opts] [-f fmt] 
[--start-offset=offset] [--max-length=len] [--output=ofmt] [-U] filename")

this one is fine,

>   SRST
> -.. option:: map [--object OBJECTDEF] [--image-opts] [-f FMT] 
[--output=OFMT] [-U] FILENAME
> +.. option:: map [--object OBJECTDEF] [--image-opts] [-f FMT] 
[--start-offset=OFFSET] [--max-length=LEN] [--output=OFMT] [-U] FILENAME

this one has the same problem as the .rst.

> @@ -3005,6 +3009,26 @@ static int img_map(int argc, char **argv)
>   case OPTION_OUTPUT:
>   output = optarg;
>   break;
> +case 's':
> +start_offset = cvtnum(optarg);
> +if (start_offset < 0) {
> +error_report("Invalid start offset specified! You may 
use "
> + "k, M, G, T, P or E suffixes for ");
> +error_report("kilobytes, megabytes, gigabytes, 
terabytes, "
> + "petabytes and exabytes.");

Pre-existing elsewhere in the file, but this seems rather verbose - 
shouldn't we have cvtnum() (or another wrapper function) give this extra 
information about what is valid, rather than open-coding it at every 
client of cvtnum()?

You are probably right I will send a patch that adds the error message  about 
the units to cvtnum().

> +return 1;
> +}
> +break;
> +case 'l':
> +max_length = cvtnum(optarg);
> +if (max_length < 0) {
> +error_report("Invalid max length specified! You may use "
> + "k, M, G, T, P or E suffixes for ");
> +error_report("kilobytes, megabytes, gigabytes, 
terabytes, "
> + "petabytes and exabytes.");
> +return 1;
> +}
> +break;
>   case OPTION_OBJECT: {
>   QemuOpts *opts;
>   opts = qemu_opts_parse_noisily(&qemu_object_opts,
> @@ -3050,7 +3074,11 @@ static int img_map(int argc, char **argv)
>   printf("[");
>   }
>   
> +curr.start = start_offset;
>   length = blk_getlength(blk);
> +if (max_length != -1) {
> +length = MIN(start_offset + max_length, length);
> +}

Pre-existing, but where does this code check for length == -1?  But your 
MIN() doesn't make it any worse (if we fail to get length, we merely 
skip the loop).

I don't think there is a check. I will add a check in a different patch.

>   while (curr.start + curr.length < length) {
>   int64_t offset = curr.start + curr.length;

Re: [PATCH v2 05/13] acpi: move acpi_init_common_fadt_data to acpi-common.c

2020-05-06 Thread Gerd Hoffmann

On Tue, May 05, 2020 at 04:25:55PM +0200, Igor Mammedov wrote:
> On Tue,  5 May 2020 15:42:57 +0200
> Gerd Hoffmann  wrote:
> 
> the same question like with FACS, why legacy data are needed for with reduced 
> profile?
> it mostly initializes data for fixed-hw model.
> 
> I'd preffer if you drop FACS and use minimal FADT like build_fadt_rev5() does
> without pulling along a bunch of legacy junk (unless there is a good 
> justification for it).

Leftover from the isa-acpi version, it is indeed not needed any more
(same for facs).

take care,
  Gerd

Re: [PATCH 1/2] qemu-img: refactor dump_map_entry JSON format output

2020-05-06 Thread Eyal Moscovici




On 29/04/2020, 17:58, "Eric Blake"  wrote:

On 3/22/20 4:11 AM, Eyal Moscovici wrote:
> Previously dump_map_entry identified whether we need to start a new JSON
> array based on whether start address == 0. In this refactor we remove
> this assumption as in following patches we will allow map to start from
> an arbitrary position.
> 
> Acked-by: Mark Kanda 
> Signed-off-by: Eyal Moscovici 
> ---
>   qemu-img.c | 12 
>   1 file changed, 8 insertions(+), 4 deletions(-)
> 

> @@ -2871,8 +2870,8 @@ static int dump_map_entry(OutputFormat 
output_format, MapEntry *e,
>   }
>   putchar('}');
>   
> -if (!next) {
> -printf("]\n");
> +if (next) {
> +printf(",\n");

As long as you're touching this, puts(",") is slightly more efficient 
than printf().  But what you have is not wrong.

Thanks, will fix.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v16 QEMU 08/16] vfio: Register SaveVMHandlers for VFIO device

2020-05-06 Thread Cornelia Huck

On Wed, 6 May 2020 02:38:46 -0400
Yan Zhao  wrote:

> On Tue, May 05, 2020 at 12:37:26PM +0800, Alex Williamson wrote:
> > It's been a long time, but that doesn't seem like what I was asking.
> > The sysfs version checking is used to select a target that is likely to
> > succeed, but the migration stream is still generated by a user and the
> > vendor driver is still ultimately responsible for validating that
> > stream.  I would hope that a vendor migration stream therefore starts
> > with information similar to that found in the sysfs interface, allowing
> > the receiving vendor driver to validate the source device and vendor
> > software version, such that we can fail an incoming migration that the
> > vendor driver deems incompatible.  Ideally the vendor driver might also
> > include consistency and sequence checking throughout the stream to
> > prevent a malicious user from exploiting the internal operation of the
> > vendor driver.  Thanks,

Some kind of somewhat standardized marker for driver/version seems like
a good idea. Further checking is also a good idea, but I think the
details of that need to be left to the individual drivers.

> >   
> maybe we can add a rw field migration_version in
> struct vfio_device_migration_info besides sysfs interface ?
> 
> when reading it in src, it gets the same string as that from sysfs;
> when writing it in target, it returns success or not to check
> compatibility and fails the migration early in setup phase.

Getting both populated from the same source seems like a good idea.

Not sure if a string is the best value to put into a migration stream;
maybe the sysfs interface can derive a human-readable string from a
more compact value to be put into the migration region (and ultimately
the stream)? Might be overengineering, just thinking out aloud here.

Re: [PATCH RFC 2/4] migration: Introduce migrate_is_precopy()

2020-05-06 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> Export a helper globally to check whether we're during a precopy.
> 
> Signed-off-by: Peter Xu 

Can you change this to 'migration_in_precopy' to match the existing
'migration_in_postcopy'.

Dave
> ---
>  include/migration/misc.h | 1 +
>  migration/migration.c| 7 +++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index e338be8c30..b4f6bf7842 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -61,6 +61,7 @@ void migration_shutdown(void);
>  void qemu_start_incoming_migration(const char *uri, Error **errp);
>  bool migration_is_idle(void);
>  bool migration_is_active(MigrationState *);
> +bool migration_is_precopy(void);
>  void add_migration_state_change_notifier(Notifier *notify);
>  void remove_migration_state_change_notifier(Notifier *notify);
>  bool migration_in_setup(MigrationState *);
> diff --git a/migration/migration.c b/migration/migration.c
> index 187ac0410c..0082880279 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1795,6 +1795,13 @@ bool migration_is_active(MigrationState *s)
>  s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
>  }
>  
> +bool migration_is_precopy(void)
> +{
> +MigrationState *s = migrate_get_current();
> +
> +return s && s->state == MIGRATION_STATUS_ACTIVE;
> +}
> +
>  void migrate_init(MigrationState *s)
>  {
>  /*
> -- 
> 2.24.1
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH v5 11/31] qcow2: Add offset_into_subcluster() and size_to_subclusters()

2020-05-06 Thread Alberto Garcia

On Tue 05 May 2020 09:42:08 PM CEST, Eric Blake wrote:
> On 5/5/20 12:38 PM, Alberto Garcia wrote:
>> Like offset_into_cluster() and size_to_clusters(), but for
>> subclusters.
>> 
>> Signed-off-by: Alberto Garcia 
>> ---
>>   block/qcow2.h | 10 ++
>>   1 file changed, 10 insertions(+)
>> 
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index e68febb15b..8b1ed1cbcf 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
>> @@ -537,11 +537,21 @@ static inline int64_t 
>> offset_into_cluster(BDRVQcow2State *s, int64_t offset)
>>   return offset & (s->cluster_size - 1);
>>   }
>>   
>> +static inline int64_t offset_into_subcluster(BDRVQcow2State *s, int64_t 
>> offset)
>> +{
>> +return offset & (s->subcluster_size - 1);
>> +}
>> +
>>   static inline uint64_t size_to_clusters(BDRVQcow2State *s, uint64_t size)
>>   {
>>   return (size + (s->cluster_size - 1)) >> s->cluster_bits;
>>   }
>
> Pre-existing, but this could use DIV_ROUND_UP.

Yeah but it would be nicer to have a version of the macro optimized for
powers of two.

Berto

Re: [PATCH v2 08/13] acpi: generic event device for x86

2020-05-06 Thread Gerd Hoffmann

On Tue, May 05, 2020 at 05:03:16PM +0200, Igor Mammedov wrote:
> On Tue,  5 May 2020 15:43:00 +0200
> Gerd Hoffmann  wrote:
> 
> > Uses TYPE_ACPI_GED as QOM parent for code reuse.
> > Add registers for sleep states and reset.
> > Add powerdown notifier for power button events.
> registers aren't x86 specific per spec,
> can we put these registers into TYPE_ACPI_GED
> and enable the optionally like is done with memory hotplug registers
> in acpi_ged_initfn()

Sure, will do.

> > Set AcpiDeviceIfClass->madt_cpu.
> that's the only reason I could justify adding x86 specific flavour.

Also the powerdown notifier.  Seems the workflow is slightly different
on x86 (acpi device registers notifier) and arm (machine registers
notifier and calls acpidev->send_event).

take care,
  Gerd

Re: [PATCH v18 QEMU 16/18] vfio: Add ioctl to get dirty pages bitmap during dma unmap.

2020-05-06 Thread Cornelia Huck

On Tue, 5 May 2020 04:14:51 +0530
Kirti Wankhede  wrote:

> With vIOMMU, IO virtual address range can get unmapped while in pre-copy
> phase of migration. In that case, unmap ioctl should return pages pinned
> in that range and QEMU should find its correcponding guest physical
> addresses and report those dirty.
> 
> Note: This patch is not yet tested. I'm trying to see how I can test this
> code path.

This remark should go beneath the '---' line, so that it does not end
up in the final commit.

> 
> Suggested-by: Alex Williamson 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  hw/vfio/common.c | 79 
> +---
>  1 file changed, 75 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 4277b275ca21..b94e2bcb1178 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -311,11 +311,77 @@ static bool vfio_devices_are_stopped_and_saving(void)
>  return true;
>  }
>  
> +static bool vfio_devices_are_running_and_saving(void)

Maybe s/are/all/ to make it sure that the scope is *all* vfio devices
here?

Is there any global state for this which we could use to check this in
a simpler way?

> +{
> +VFIOGroup *group;
> +VFIODevice *vbasedev;
> +
> +QLIST_FOREACH(group, &vfio_group_list, next) {
> +QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +if ((vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) &&
> +(vbasedev->device_state & VFIO_DEVICE_STATE_RUNNING)) {
> +continue;
> +} else {
> +return false;
> +}
> +}
> +}
> +return true;
> +}
> +
> +static int vfio_dma_unmap_bitmap(VFIOContainer *container,
> + hwaddr iova, ram_addr_t size,
> + IOMMUTLBEntry *iotlb)
> +{
> +struct vfio_iommu_type1_dma_unmap *unmap;
> +struct vfio_bitmap *bitmap;
> +uint64_t pages = TARGET_PAGE_ALIGN(size) >> TARGET_PAGE_BITS;
> +int ret;
> +
> +unmap = g_malloc0(sizeof(*unmap) + sizeof(*bitmap));

g_malloc0 cannot fail (it will abort). If you want to be able to
tolerate memory allocation failure, you should use g_try_malloc0().

> +if (!unmap) {
> +return -ENOMEM;
> +}
> +
> +unmap->argsz = sizeof(*unmap) + sizeof(*bitmap);
> +unmap->flags |= VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP;
> +bitmap = (struct vfio_bitmap *)&unmap->data;
> +
> +/*
> + * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of
> + * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap_pgsize to
> + * TARGET_PAGE_SIZE.
> + */
> +
> +bitmap->pgsize = TARGET_PAGE_SIZE;
> +bitmap->size = ROUND_UP(pages / 8, sizeof(uint64_t));
> +bitmap->data = g_malloc0(bitmap->size);
> +if (!bitmap->data) {
> +error_report("UNMAP: Error allocating bitmap of size 0x%llx",
> + bitmap->size);
> +g_free(unmap);
> +return -ENOMEM;
> +}
> +
> +ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, unmap);
> +if (!ret) {
> +cpu_physical_memory_set_dirty_lebitmap((uint64_t *)bitmap->data,
> +iotlb->translated_addr, pages);
> +} else {
> +error_report("VFIO_UNMAP_DMA with DIRTY_BITMAP : %d", -errno);
> +}
> +
> +g_free(bitmap->data);
> +g_free(unmap);
> +return ret;
> +}
> +

Re: [PATCH v2 09/13] microvm: add minimal acpi support

2020-05-06 Thread Gerd Hoffmann

> > +/* FIXME: copy & paste */
> > +static void acpi_dsdt_add_power_button(Aml *scope)
> > +{
> > +Aml *dev = aml_device(ACPI_POWER_BUTTON_DEVICE);
> > +aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C0C")));
> > +aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> > +aml_append(scope, dev);
> > +}
> 
> could be unified with ARM's version

Yep.  Suggestions for a good place?  hw/acpi/aml-build.c ?

> > +acpi_align_size(tables_blob, ACPI_BUILD_TABLE_SIZE);
> > +acpi_align_size(tables->linker->cmd_blob, ACPI_BUILD_ALIGN_SIZE);
> I'd drop these as that was mostly to counter various migration issues on 
> legacy

Dropped.

take care,
  Gerd

Re: [PATCH v3 09/33] block: Add generic bdrv_inherited_options()

2020-05-06 Thread Kevin Wolf

Am 18.02.2020 um 13:42 hat Max Reitz geschrieben:
> After the series this patch belongs to, we want to have a common
> BdrvChildClass that encompasses all of child_file, child_format, and
> child_backing.  Such a single class needs a single .inherit_options()
> implementation, and this patch introduces it.
> 
> The next patch will show how the existing implementations can fall back
> to it just by passing appropriate BdrvChildRole and parent_is_format
> values.
> 
> Signed-off-by: Max Reitz 
> Reviewed-by: Eric Blake 
> ---
>  block.c | 84 +
>  1 file changed, 84 insertions(+)
> 
> diff --git a/block.c b/block.c
> index c33f0e9b42..9179b9b604 100644
> --- a/block.c
> +++ b/block.c
> @@ -998,6 +998,90 @@ static void bdrv_temp_snapshot_options(int *child_flags, 
> QDict *child_options,
>  *child_flags &= ~BDRV_O_NATIVE_AIO;
>  }
>  
> +/*
> + * Returns the options and flags that a generic child of a BDS should
> + * get, based on the given options and flags for the parent BDS.
> + */
> +static void __attribute__((unused))
> +bdrv_inherited_options(BdrvChildRole role, bool parent_is_format,
> +   int *child_flags, QDict *child_options,
> +   int parent_flags, QDict *parent_options)
> +{
> +int flags = parent_flags;
> +
> +/*
> + * First, decide whether to set, clear, or leave BDRV_O_PROTOCOL.
> + * Generally, the question to answer is: Should this child be
> + * format-probed by default?
> + */
> +
> +/*
> + * Pure and non-filtered data children of non-format nodes should
> + * be probed by default (even when the node itself has BDRV_O_PROTOCOL
> + * set).  This only affects a very limited set of drivers (namely
> + * quorum and blkverify when this comment was written).
> + * Force-clear BDRV_O_PROTOCOL then.
> + */
> +if (!parent_is_format &&
> +(role & (BDRV_CHILD_DATA | BDRV_CHILD_METADATA |
> + BDRV_CHILD_FILTERED)) ==
> +BDRV_CHILD_DATA)

You could avoid the odd indentation (I can't decide whether or not it
should be one space more to align correctly) and probably also make the
expression more readable if you split it into:

(role & BDRV_CHILD_DATA) &&
!(role & (BDRV_CHILD_METADATA | BDRV_CHILD_FILTERED))

> +{
> +flags &= ~BDRV_O_PROTOCOL;
> +}
> +
> +/*
> + * All children of format nodes (except for COW children) and all
> + * metadata children in general should never be format-probed.
> + * Force-set BDRV_O_PROTOCOL then.
> + */
> +if ((parent_is_format && !(role & BDRV_CHILD_COW)) ||
> +(role & BDRV_CHILD_METADATA))
> +{
> +flags |= BDRV_O_PROTOCOL;
> +}
> +
> +/*
> + * If the cache mode isn't explicitly set, inherit direct and no-flush 
> from
> + * the parent.
> + */
> +qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_DIRECT);
> +qdict_copy_default(child_options, parent_options, 
> BDRV_OPT_CACHE_NO_FLUSH);
> +qdict_copy_default(child_options, parent_options, BDRV_OPT_FORCE_SHARE);
> +
> +if (role & BDRV_CHILD_COW) {
> +/* backing files are always opened read-only */

Not "always", just by default.

> +qdict_set_default_str(child_options, BDRV_OPT_READ_ONLY, "on");
> +qdict_set_default_str(child_options, BDRV_OPT_AUTO_READ_ONLY, "off");
> +} else {
> +/* Inherit the read-only option from the parent if it's not set */
> +qdict_copy_default(child_options, parent_options, 
> BDRV_OPT_READ_ONLY);
> +qdict_copy_default(child_options, parent_options,
> +   BDRV_OPT_AUTO_READ_ONLY);
> +}
> +
> +if (parent_is_format && !(role & BDRV_CHILD_COW)) {
> +/*
> + * Our format drivers take care to send flushes and respect
> + * unmap policy, so we can default to enable both on lower
> + * layers regardless of the corresponding parent options.
> + */
> +qdict_set_default_str(child_options, BDRV_OPT_DISCARD, "unmap");
> +}

Why the restriction to format here? Don't we break "unmap" propagation
through filters with this?

It would probably also be a good question why we don't propagate it to
the backing file, but this is preexisting.

> +/* Clear flags that only apply to the top layer */
> +flags &= ~(BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING | BDRV_O_COPY_ON_READ);
> +
> +if (role & BDRV_CHILD_METADATA) {
> +flags &= ~BDRV_O_NO_IO;
> +}

Hm... This is interesting, but I guess it makes sense. It just never was
a hard rule that a format driver must not access data even internally
with NO_IO, but I think it holds true.

> +if (role & BDRV_CHILD_COW) {
> +flags &= ~BDRV_O_TEMPORARY;
> +}

We could probably have a long discussion about whether this is right in
theory, but in practice BDRV_O_TEMPORARY only exists for snapshot=on,
for whi

RE: [PATCH v2 04/10] net: cadence_gem: Define access permission for interrupt registers

2020-05-06 Thread Sai Pavan Boddu

Hi Edgar,

> -Original Message-
> From: Edgar E. Iglesias 
> Sent: Monday, May 4, 2020 8:27 PM
> To: Sai Pavan Boddu 
> Cc: Alistair Francis ; Peter Maydell
> ; Jason Wang ; Markus
> Armbruster ; Philippe Mathieu-Daudé
> ; Tong Ho ; Ramon Fried
> ; qemu-...@nongnu.org; qemu-
> de...@nongnu.org
> Subject: Re: [PATCH v2 04/10] net: cadence_gem: Define access permission
> for interrupt registers
> 
> On Mon, May 04, 2020 at 07:36:02PM +0530, Sai Pavan Boddu wrote:
> > Q1 to Q7 ISR's are clear-on-read, IER/IDR registers are write-only,
> > mask reg are read-only.
> >
> > Signed-off-by: Sai Pavan Boddu 
> > ---
> >  hw/net/cadence_gem.c | 14 ++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c index
> > a930bf1..c532a14 100644
> > --- a/hw/net/cadence_gem.c
> > +++ b/hw/net/cadence_gem.c
> > @@ -458,6 +458,7 @@ static const uint8_t broadcast_addr[] = { 0xFF, 0xFF,
> 0xFF, 0xFF, 0xFF, 0xFF };
> >   */
> >  static void gem_init_register_masks(CadenceGEMState *s)  {
> > +unsigned int i;
> >  /* Mask of register bits which are read only */
> >  memset(&s->regs_ro[0], 0, sizeof(s->regs_ro));
> >  s->regs_ro[GEM_NWCTRL]   = 0xFFF8;
> > @@ -470,10 +471,19 @@ static void
> gem_init_register_masks(CadenceGEMState *s)
> >  s->regs_ro[GEM_ISR]  = 0x;
> >  s->regs_ro[GEM_IMR]  = 0x;
> >  s->regs_ro[GEM_MODID]= 0x;
> > +for (i = 0; i < s->num_priority_queues; i++) {
> > +s->regs_ro[GEM_INT_Q1_STATUS + i] = 0x;
> > +s->regs_ro[GEM_INT_Q1_ENABLE + i] = 0xE319;
> > +s->regs_ro[GEM_INT_Q1_DISABLE + i] = 0xE319;
> 
> Shouldn't these be 0xf319?
[Sai Pavan Boddu] This one is right. I would fix it thanks.

Regards,
Sai Pavan
> Perhaps I'm looking at old specs but mine says bits upper bits [31:12] are
> reserved and read-only.
> 
> 
> With that fixed:
> Reviewed-by: Edgar E. Iglesias 
> 
> 
> 
> 
> 
> > +s->regs_ro[GEM_INT_Q1_MASK + i] = 0x;
> 
> > +}
> >
> >  /* Mask of register bits which are clear on read */
> >  memset(&s->regs_rtc[0], 0, sizeof(s->regs_rtc));
> >  s->regs_rtc[GEM_ISR]  = 0x;
> > +for (i = 0; i < s->num_priority_queues; i++) {
> > +s->regs_rtc[GEM_INT_Q1_STATUS + i] = 0x0CE6;
> > +}
> >
> >  /* Mask of register bits which are write 1 to clear */
> >  memset(&s->regs_w1c[0], 0, sizeof(s->regs_w1c)); @@ -485,6
> > +495,10 @@ static void gem_init_register_masks(CadenceGEMState *s)
> >  s->regs_wo[GEM_NWCTRL]   = 0x00073E60;
> >  s->regs_wo[GEM_IER]  = 0x07FF;
> >  s->regs_wo[GEM_IDR]  = 0x07FF;
> > +for (i = 0; i < s->num_priority_queues; i++) {
> > +s->regs_wo[GEM_INT_Q1_ENABLE + i] = 0x0CE6;
> > +s->regs_wo[GEM_INT_Q1_DISABLE + i] = 0x0CE6;
> > +}
> >  }
> >
> >  /*
> > --
> > 2.7.4
> >

Re: [PATCH Kernel v18 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-05-06 Thread Cornelia Huck

On Mon, 4 May 2020 21:28:56 +0530
Kirti Wankhede  wrote:

> VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
> - Start dirty pages tracking while migration is active
> - Stop dirty pages tracking.
> - Get dirty pages bitmap. Its user space application's responsibility to
>   copy content of dirty pages from source to destination during migration.
> 
> To prevent DoS attack, memory for bitmap is allocated per vfio_dma
> structure. Bitmap size is calculated considering smallest supported page
> size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled
> 
> Bitmap is populated for already pinned pages when bitmap is allocated for
> a vfio_dma with the smallest supported page size. Update bitmap from
> pinning functions when tracking is enabled. When user application queries
> bitmap, check if requested page size is same as page size used to
> populated bitmap. If it is equal, copy bitmap, but if not equal, return
> error.
> 
> Fixed below error by changing pgsize type from uint64_t to size_t.
> Reported-by: kbuild test robot 
> 
> All errors:
> drivers/vfio/vfio_iommu_type1.c:197: undefined reference to `__udivdi3'
> 
> drivers/vfio/vfio_iommu_type1.c:225: undefined reference to `__udivdi3'

Move that below the '---' delimiter so that it does not end up in the
commit? (Crediting the build bot is fine, but the details are not
really useful when you look at the code later.)

> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 266 
> +++-
>  1 file changed, 260 insertions(+), 6 deletions(-)

> @@ -2278,6 +2435,93 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>   return copy_to_user((void __user *)arg, &unmap, minsz) ?
>   -EFAULT : 0;
> + } else if (cmd == VFIO_IOMMU_DIRTY_PAGES) {
> + struct vfio_iommu_type1_dirty_bitmap dirty;
> + uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START |
> + VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP |
> + VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
> + int ret = 0;
> +
> + if (!iommu->v2)
> + return -EACCES;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_dirty_bitmap,
> + flags);
> +
> + if (copy_from_user(&dirty, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (dirty.argsz < minsz || dirty.flags & ~mask)
> + return -EINVAL;
> +
> + /* only one flag should be set at a time */
> + if (__ffs(dirty.flags) != __fls(dirty.flags))
> + return -EINVAL;
> +

Shouldn't you also check whether the flag that is set is actually
valid? (maybe dirty.flags & ~VFIO_IOMMU_DIRTY_PAGES_FLAG_MASK and do a
switch/case over dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_MASK)

Re: [PATCH RFC 3/4] vl: Sync dirty bits for system resets during precopy

2020-05-06 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> System resets will also reset system memory layout.  Although the memory 
> layout
> after the reset should probably the same as before the reset, we still need to
> do frequent memory section removals and additions during the reset process.
> Those operations could accidentally lose per-mem-section information like KVM
> memslot dirty bitmaps.
> 
> Previously we keep those dirty bitmaps by sync it during memory removal.
> However that's hard to make it right after all [1].  Instead, we sync dirty
> pages before system reset if we know we're during a precopy migration.  This
> should solve the same problem explicitly.
> 
> [1] https://lore.kernel.org/qemu-devel/20200327150425.GJ422390@xz-x1/

I think the problem is knowing whether this is sufficient or whether you
definitely need to do it for other cases of removal/readd.

However, assuming we need to do it during reset, how do we know this is
the right point to do it, and not something further inside the reset
process?  Is this just on the basis that vcpus are stopped?

Dave

> Signed-off-by: Peter Xu 
> ---
>  softmmu/vl.c | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 32c0047889..8f864fee43 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -1387,6 +1387,22 @@ void qemu_system_reset(ShutdownCause reason)
>  
>  cpu_synchronize_all_states();
>  
> +/*
> + * System reboot could reset memory layout.  Although the final status of
> + * the memory layout should be the same as before the reset, the memory
> + * sections can still be removed and added back frequently due to the 
> reset
> + * process.  This could potentially drop dirty bits in track for those
> + * memory sections before the reset.
> + *
> + * Do a global dirty sync before the reset happens if we are during a
> + * precopy, so we don't lose the dirty bits during the memory shuffles.
> + */
> +if (migration_is_precopy()) {
> +WITH_RCU_READ_LOCK_GUARD() {
> +migration_bitmap_sync_precopy();
> +}
> +}
> +
>  if (mc && mc->reset) {
>  mc->reset(current_machine);
>  } else {
> -- 
> 2.24.1
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

RE: [PATCH v2 05/10] net: cadence_gem: Set ISR according to queue in use

2020-05-06 Thread Sai Pavan Boddu

Hi Edgar,

> -Original Message-
> From: Edgar E. Iglesias 
> Sent: Monday, May 4, 2020 8:32 PM
> To: Sai Pavan Boddu 
> Cc: Alistair Francis ; Peter Maydell
> ; Jason Wang ; Markus
> Armbruster ; Philippe Mathieu-Daudé
> ; Tong Ho ; Ramon Fried
> ; qemu-...@nongnu.org; qemu-
> de...@nongnu.org
> Subject: Re: [PATCH v2 05/10] net: cadence_gem: Set ISR according to queue
> in use
> 
> On Mon, May 04, 2020 at 07:36:03PM +0530, Sai Pavan Boddu wrote:
> > Set ISR according to queue in use, added interrupt support for all
> > queues.
> 
> Would it help to add a gem_set_isr(CadenceGEMState *s, int q, uint32_t
> flag) ?
> Instead of open coding these if (q == 0) else... all over the place...
[Sai Pavan Boddu] Yeah, it would be nice. Will try to include this in v3

Thanks,
Sai Pavan
> 
> Anyway, the logic looks good to me:
> Reviewed-by: Edgar E. Iglesias 
> 
> 
> 
> >
> > Signed-off-by: Sai Pavan Boddu 
> > ---
> >  hw/net/cadence_gem.c | 31 ++-
> >  1 file changed, 22 insertions(+), 9 deletions(-)
> >
> > diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c index
> > c532a14..beb38ec 100644
> > --- a/hw/net/cadence_gem.c
> > +++ b/hw/net/cadence_gem.c
> > @@ -896,7 +896,13 @@ static void gem_get_rx_desc(CadenceGEMState
> *s, int q)
> >  if (rx_desc_get_ownership(s->rx_desc[q]) == 1) {
> >  DB_PRINT("descriptor 0x%" HWADDR_PRIx " owned by sw.\n",
> desc_addr);
> >  s->regs[GEM_RXSTATUS] |= GEM_RXSTATUS_NOBUF;
> > -s->regs[GEM_ISR] |= GEM_INT_RXUSED & ~(s->regs[GEM_IMR]);
> > +if (q == 0) {
> > +s->regs[GEM_ISR] |= GEM_INT_RXUSED & ~(s->regs[GEM_IMR]);
> > +} else {
> > +s->regs[GEM_INT_Q1_STATUS + q - 1] |= GEM_INT_RXUSED &
> > +  ~(s->regs[GEM_INT_Q1_MASK + q - 
> > 1]);
> > +}
> > +
> >  /* Handle interrupt consequences */
> >  gem_update_int_status(s);
> >  }
> > @@ -1071,8 +1077,12 @@ static ssize_t gem_receive(NetClientState *nc,
> const uint8_t *buf, size_t size)
> >  gem_receive_updatestats(s, buf, size);
> >
> >  s->regs[GEM_RXSTATUS] |= GEM_RXSTATUS_FRMRCVD;
> > -s->regs[GEM_ISR] |= GEM_INT_RXCMPL & ~(s->regs[GEM_IMR]);
> > -
> > +if (q == 0) {
> > +s->regs[GEM_ISR] |= GEM_INT_RXCMPL & ~(s->regs[GEM_IMR]);
> > +} else {
> > +s->regs[GEM_INT_Q1_STATUS + q - 1] |= GEM_INT_RXCMPL &
> > +  ~(s->regs[GEM_INT_Q1_MASK + q - 1]);
> > +}
> >  /* Handle interrupt consequences */
> >  gem_update_int_status(s);
> >
> > @@ -1223,12 +1233,12 @@ static void gem_transmit(CadenceGEMState
> *s)
> >  DB_PRINT("TX descriptor next: 0x%08x\n",
> > s->tx_desc_addr[q]);
> >
> >  s->regs[GEM_TXSTATUS] |= GEM_TXSTATUS_TXCMPL;
> > -s->regs[GEM_ISR] |= GEM_INT_TXCMPL & ~(s->regs[GEM_IMR]);
> > -
> > +if (q == 0) {
> > +s->regs[GEM_ISR] |= GEM_INT_TXCMPL & ~(s-
> >regs[GEM_IMR]);
> > +} else {
> >  /* Update queue interrupt status */
> > -if (s->num_priority_queues > 1) {
> > -s->regs[GEM_INT_Q1_STATUS + q] |=
> > -GEM_INT_TXCMPL & ~(s->regs[GEM_INT_Q1_MASK + 
> > q]);
> > +s->regs[GEM_INT_Q1_STATUS + q - 1] |=
> > +GEM_INT_TXCMPL & ~s->regs[GEM_INT_Q1_MASK
> > + + q - 1];
> >  }
> >
> >  /* Handle interrupt consequences */ @@ -1280,7
> > +1290,10 @@ static void gem_transmit(CadenceGEMState *s)
> >
> >  if (tx_desc_get_used(desc)) {
> >  s->regs[GEM_TXSTATUS] |= GEM_TXSTATUS_USED;
> > -s->regs[GEM_ISR] |= GEM_INT_TXUSED & ~(s->regs[GEM_IMR]);
> > +/* IRQ TXUSED is defined only for queue 0 */
> > +if (q == 0) {
> > +s->regs[GEM_ISR] |= GEM_INT_TXUSED & ~(s-
> >regs[GEM_IMR]);
> > +}
> >  gem_update_int_status(s);
> >  }
> >  }
> > --
> > 2.7.4
> >

Re: [PATCH v25 08/10] ACPI: Record Generic Error Status Block(GESB) table

2020-05-06 Thread gengdongjiu




On 2020/5/5 18:44, Igor Mammedov wrote:
>> Signed-off-by: Dongjiu Geng 
>> Signed-off-by: Xiang Zheng 
> Reviewed-by: Igor Mammedov 
> 
> Also we should ratelimit error messages that could be triggered at runtime
> from acpi_ghes_record_errors() and functions it's calling.
> It could be a patch on top.

Ok, thanks Igor.
I can make another patch for that after this series is applied.

> 
>

Re: [INFO] Some preliminary performance data

2020-05-06 Thread Alex Bennée

Aleksandar Markovic  writes:

Some preliminary thoughts

>> Hi, all.
>>
>> I just want to share with you some bits and pieces of data that I got
>> while doing some preliminary experimentation for the GSoC project "TCG
>> Continuous Benchmarking", that Ahmed Karaman, a student of the fourth final
>> year of Electical Engineering Faculty in Cairo, will execute.
>>
>> *User Mode*
>>
>>* As expected, for any program dealing with any substantional
>> floating-point calculation, softfloat library will be the the heaviest CPU
>> cycles consumer.
>>* We plan to examine the performance behaviour of non-FP programs
>> (integer arithmetic), or even non-numeric programs (sorting strings, for
>> example).

Emilio was the last person to do extensive bench-marking on TCG and he
used a mild fork of the venerable nbench:

  https://github.com/cota/dbt-bench

as the hot code is fairly small it offers a good way of testing quality
of the output. Larger programs will differ as they can involve more code
generation.

>>
>> *System Mode*
>>
>>* I did profiling of booting several machines using a tool called
>> callgrind (a part of valgrind). The tool offers pletora of information,
>> however it looks it is little confused by usage of coroutines, and that
>> makes some of its reports look very illogical, or plain ugly.

Doesn't running through valgrind inherently serialise execution anyway?
If you are looking for latency caused by locks we have support for the
QEMU sync profiler built into the code. See "help sync-profile" on the HMP.

>> Still, it
>> seems valid data can be extracted from it. Without going into details, here
>> is what it says for one machine (bear in mind that results may vary to a
>> great extent between machines):

You can also use perf to use sampling to find hot points in the code.
One of last years GSoC student wrote some patches that included the
ability to dump a jit info file for perf to consume. We never got it
merged in the end but it might be worth having a go at pulling the
relevant bits out from:

  Subject: [PATCH  v9 00/13] TCG code quality tracking and perf integration
  Date: Mon,  7 Oct 2019 16:28:26 +0100
  Message-Id: <20191007152839.30804-1-alex.ben...@linaro.org>

>>  ** The booting involved six threads, one for display handling, one
>> for emulations, and four more. The last four did almost nothing during
>> boot, just almost entire time siting idle, waiting for something. As far as
>> "Total Instruction Fetch Count" (this is the main measure used in
>> callgrind), they were distributed in proportion 1:3 between display thread
>> and emulation thread (the rest of threads were negligible) (but,
>> interestingly enough, for another machine that proportion was 1:20).
>>  ** The display thread is dominated by vga_update_display() function
>> (21.5% "self" time, and 51.6% "self + callees" time, called almost 4
>> times). Other functions worth mentioning are
>> cpu_physical_memory_snapshot_get_dirty() and
>> memory_region_snapshot_get_dirty(), which are very small functions, but are
>> both invoked over 26 000 000 times, and contribute with over 20% of display
>> thread instruction fetch count together.

The memory region tracking code will end up forcing the slow path for a
lot of memory accesses to video memory via softmmu. You may want to
measure if there is a difference using one of the virtio based graphics
displays.

>>  ** Focusing now on emulation thread, "Total Instruction Fetch Counts"
>> were roughly distributed this way:
>>- 15.7% is execution of GIT-ed code from translation block
>> buffer
>>- 39.9% is execution of helpers
>>- 44.4% is code translation stage, including some coroutine
>> activities
>> Top two among helpers:
>>   - helper_le_stl_memory()

I assume that is the MMU slow-path being called from the generated code.

>>   - helper_lookup_tb_ptr() (this one is invoked whopping 36 000
>> 000 times)

This is an optimisation to avoid exiting the run-loop to find the next
block. From memory I think the two main cases you'll see are:

 - computed jumps (i.e. target not known at JIT time)
 - jumps outside of the current page

>> Single largest instruction consumer of code translation:
>>   - liveness_pass_1(), that constitutes 21.5% of the entire
>> "emulation thread" consumption, or, in other way, almost half of code
>> translation stage (that sits at 44.4%)

This is very much driven by how much code generation vs running you see.
In most of my personal benchmarks I never really notice code generation
because I give my machines large amounts of RAM so code tends to stay
resident so not need to be re-translated. When the optimiser shows up
it's usually accompanied by high TB flush and invalidate counts in "info
jit" because we are doing more translation that we usually do.

I'll also mention my foray into tracking down the performance regression
of DOSBox Doom:

  https

Re: [PATCH v2 09/18] s390x/cpumodel: Fix UI to CPU features pcc-cmac-{aes,eaes}-256

2020-05-06 Thread Cornelia Huck

On Tue,  5 May 2020 17:29:17 +0200
Markus Armbruster  wrote:

> Both s390_features[S390_FEAT_PCC_CMAC_AES_256].name and
> s390_features[S390_FEAT_PCC_CMAC_EAES_256].name is
> "pcc-cmac-eaes-256".  The former is obviously a pasto.
> 
> Impact:
> 
> * s390_feat_bitmap_to_ascii() misidentifies S390_FEAT_PCC_CMAC_AES_256
>   as "pcc-cmac-eaes-256".  Affects QMP commands query-cpu-definitions,
>   query-cpu-model-expansion, query-cpu-model-baseline,
>   query-cpu-model-comparison, and the error message when
>   s390_realize_cpu_model() fails in check_compatibility().
> 
> * s390_cpu_list() also misidentifies it.  Affects -cpu help.
> 
> * s390_cpu_model_register_props() creates CPU property
>   "pcc-cmac-eaes-256" twice.  The second one fails, but the error is
>   ignored (a later commit will change that).  Results in a single
>   property "pcc-cmac-eaes-256" with the description for
>   S390_FEAT_PCC_CMAC_AES_256, and no property for
>   S390_FEAT_PCC_CMAC_EAES_256.  CPU properties are visible in CLI -cpu
>   and -device, QMP & HMP device_add, QMP device-list-properties, and
>   QOM introspection.
> 
> Fix by deleting the wayward 'e'.
> 
> Fixes: 782417446279717aa85320191a519b51f6d5dd31

I like the more standard

Fixes: 782417446279 ("s390x/cpumodel: introduce CPU features")

for that.

> Cc: Halil Pasic 
> Cc: Cornelia Huck 
> Cc: Christian Borntraeger 
> Cc: Richard Henderson 
> Cc: David Hildenbrand 
> Cc: qemu-s3...@nongnu.org
> Signed-off-by: Markus Armbruster 
> Reviewed-by: David Hildenbrand 
> Tested-by: Christian Borntraeger 
> ---
>  target/s390x/cpu_features_def.inc.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Cornelia Huck 

I assume you'll take this one together with the rest of the series?

Re: [PATCH v2 2/9] hw/net/xilinx_axienet: Cleanup stream->push assignment

2020-05-06 Thread Philippe Mathieu-Daudé


On 5/6/20 10:25 AM, Edgar E. Iglesias wrote:

From: "Edgar E. Iglesias" 

Split the shared stream_class_init function to assign
stream->push with better type-safety.

Reviewed-by: Alistair Francis 
Reviewed-by: Francisco Iglesias 
Signed-off-by: Edgar E. Iglesias 


Reviewed-by: Philippe Mathieu-Daudé 


---
  hw/net/xilinx_axienet.c | 18 --
  1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c
index 0f97510d8a..84073753d7 100644
--- a/hw/net/xilinx_axienet.c
+++ b/hw/net/xilinx_axienet.c
@@ -1029,11 +1029,19 @@ static void xilinx_enet_class_init(ObjectClass *klass, 
void *data)
  dc->reset = xilinx_axienet_reset;
  }
  
-static void xilinx_enet_stream_class_init(ObjectClass *klass, void *data)

+static void xilinx_enet_control_stream_class_init(ObjectClass *klass,
+  void *data)
  {
  StreamSlaveClass *ssc = STREAM_SLAVE_CLASS(klass);
  
-ssc->push = data;

+ssc->push = xilinx_axienet_control_stream_push;
+}
+
+static void xilinx_enet_data_stream_class_init(ObjectClass *klass, void *data)
+{
+StreamSlaveClass *ssc = STREAM_SLAVE_CLASS(klass);
+
+ssc->push = xilinx_axienet_data_stream_push;
  }
  
  static const TypeInfo xilinx_enet_info = {

@@ -1048,8 +1056,7 @@ static const TypeInfo xilinx_enet_data_stream_info = {
  .name  = TYPE_XILINX_AXI_ENET_DATA_STREAM,
  .parent= TYPE_OBJECT,
  .instance_size = sizeof(struct XilinxAXIEnetStreamSlave),
-.class_init= xilinx_enet_stream_class_init,
-.class_data= xilinx_axienet_data_stream_push,
+.class_init= xilinx_enet_data_stream_class_init,
  .interfaces = (InterfaceInfo[]) {
  { TYPE_STREAM_SLAVE },
  { }
@@ -1060,8 +1067,7 @@ static const TypeInfo xilinx_enet_control_stream_info = {
  .name  = TYPE_XILINX_AXI_ENET_CONTROL_STREAM,
  .parent= TYPE_OBJECT,
  .instance_size = sizeof(struct XilinxAXIEnetStreamSlave),
-.class_init= xilinx_enet_stream_class_init,
-.class_data= xilinx_axienet_control_stream_push,
+.class_init= xilinx_enet_control_stream_class_init,
  .interfaces = (InterfaceInfo[]) {
  { TYPE_STREAM_SLAVE },
  { }

Re: [PATCH v2 3/9] hw/net/xilinx_axienet: Remove unncessary cast

2020-05-06 Thread Philippe Mathieu-Daudé


On 5/6/20 10:25 AM, Edgar E. Iglesias wrote:

From: "Edgar E. Iglesias" 

Remove unncessary cast, buf is already uint8_t *.


Typo "unnecessary" here and in patch subject.
Otherwise:
Reviewed-by: Philippe Mathieu-Daudé 


No functional change.

Reviewed-by: Alistair Francis 
Reviewed-by: Francisco Iglesias 
Signed-off-by: Edgar E. Iglesias 
---
  hw/net/xilinx_axienet.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c
index 84073753d7..c8dfcda3ee 100644
--- a/hw/net/xilinx_axienet.c
+++ b/hw/net/xilinx_axienet.c
@@ -918,7 +918,7 @@ xilinx_axienet_data_stream_push(StreamSlave *obj, uint8_t 
*buf, size_t size)
  uint16_t csum;
  
  tmp_csum = net_checksum_add(size - start_off,

-(uint8_t *)buf + start_off);
+buf + start_off);
  /* Accumulate the seed.  */
  tmp_csum += s->hdr[2] & 0x;

Re: [PATCH v2 09/18] s390x/cpumodel: Fix UI to CPU features pcc-cmac-{aes,eaes}-256

2020-05-06 Thread David Hildenbrand

On 05.05.20 17:29, Markus Armbruster wrote:
> Both s390_features[S390_FEAT_PCC_CMAC_AES_256].name and
> s390_features[S390_FEAT_PCC_CMAC_EAES_256].name is
> "pcc-cmac-eaes-256".  The former is obviously a pasto.
> 
> Impact:
> 
> * s390_feat_bitmap_to_ascii() misidentifies S390_FEAT_PCC_CMAC_AES_256
>   as "pcc-cmac-eaes-256".  Affects QMP commands query-cpu-definitions,
>   query-cpu-model-expansion, query-cpu-model-baseline,
>   query-cpu-model-comparison, and the error message when
>   s390_realize_cpu_model() fails in check_compatibility().
> 
> * s390_cpu_list() also misidentifies it.  Affects -cpu help.
> 
> * s390_cpu_model_register_props() creates CPU property
>   "pcc-cmac-eaes-256" twice.  The second one fails, but the error is
>   ignored (a later commit will change that).  Results in a single
>   property "pcc-cmac-eaes-256" with the description for
>   S390_FEAT_PCC_CMAC_AES_256, and no property for
>   S390_FEAT_PCC_CMAC_EAES_256.  CPU properties are visible in CLI -cpu
>   and -device, QMP & HMP device_add, QMP device-list-properties, and
>   QOM introspection.
> 
> Fix by deleting the wayward 'e'.

You dropped the comment regarding msa4, was that intended?

-- 
Thanks,

David / dhildenb

Re: [PATCH v25 00/10] Add ARMv8 RAS virtualization support in QEMU

2020-05-06 Thread gengdongjiu

On 2020/4/17 21:32, Peter Maydell wrote:
> On Fri, 10 Apr 2020 at 12:46, Dongjiu Geng  wrote:
>>
>> In the ARMv8 platform, the CPU error types includes synchronous external 
>> abort(SEA)
>> and SError Interrupt (SEI). If exception happens in guest, host does not 
>> know the detailed
>> information of guest, so it is expected that guest can do the recovery. For 
>> example, if an
>> exception happens in a guest user-space application, host does not know 
>> which application
>> encounters errors, only guest knows it.
>>
>> For the ARMv8 SEA/SEI, KVM or host kernel delivers SIGBUS to notify 
>> userspace.
>> After user space gets the notification, it will record the CPER into guest 
>> GHES
>> buffer and inject an exception or IRQ to guest.
>>
>> In the current implementation, if the type of SIGBUS is BUS_MCEERR_AR, we 
>> will
>> treat it as a synchronous exception, and notify guest with ARMv8 SEA
>> notification type after recording CPER into guest.
> 
> Hi. I left a comment on patch 1. The other 3 patches unreviewed
> are 5, 6 and 8, which are all ACPI core code, so that's for
> MST, Igor or Shannon to review.
> 
> Once those have been reviewed, please ping me if you want this
> to go via target-arm.next.

Hi Peter,
   Igor have reviewed all ACPI core code. whether you can apply this series to 
target-arm.next? I can make another patches to solve your comments on patch1 
and another APCI comment.
Thanks very much in advance.

> 
> thanks
> -- PMM
> 
> .
>

Re: [PATCH v2 4/9] hw/dma/xilinx_axidma: Add DMA memory-region property

2020-05-06 Thread Philippe Mathieu-Daudé


Hi Edgar,

On 5/6/20 10:25 AM, Edgar E. Iglesias wrote:

From: "Edgar E. Iglesias" 

Add DMA memory-region property to externally control what
address-space this DMA operates on.

Reviewed-by: Alistair Francis 
Reviewed-by: Francisco Iglesias 
Signed-off-by: Edgar E. Iglesias 
---
  hw/dma/xilinx_axidma.c | 30 +++---
  1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c
index 018f36991b..4540051448 100644
--- a/hw/dma/xilinx_axidma.c
+++ b/hw/dma/xilinx_axidma.c
@@ -33,6 +33,7 @@
  #include "qemu/log.h"
  #include "qemu/module.h"
  
+#include "sysemu/dma.h"

  #include "hw/stream.h"
  
  #define D(x)

@@ -103,6 +104,7 @@ enum {
  };
  
  struct Stream {

+struct XilinxAXIDMA *dma;
  ptimer_state *ptimer;
  qemu_irq irq;
  
@@ -125,6 +127,9 @@ struct XilinxAXIDMAStreamSlave {

  struct XilinxAXIDMA {
  SysBusDevice busdev;
  MemoryRegion iomem;
+MemoryRegion *dma_mr;
+AddressSpace as;
+
  uint32_t freqhz;
  StreamSlave *tx_data_dev;
  StreamSlave *tx_control_dev;
@@ -186,7 +191,7 @@ static void stream_desc_load(struct Stream *s, hwaddr addr)
  {
  struct SDesc *d = &s->desc;
  
-cpu_physical_memory_read(addr, d, sizeof *d);

+address_space_read(&s->dma->as, addr, MEMTXATTRS_UNSPECIFIED, d, sizeof 
*d);
  
  /* Convert from LE into host endianness.  */

  d->buffer_address = le64_to_cpu(d->buffer_address);
@@ -204,7 +209,8 @@ static void stream_desc_store(struct Stream *s, hwaddr addr)
  d->nxtdesc = cpu_to_le64(d->nxtdesc);
  d->control = cpu_to_le32(d->control);
  d->status = cpu_to_le32(d->status);
-cpu_physical_memory_write(addr, d, sizeof *d);
+address_space_write(&s->dma->as, addr, MEMTXATTRS_UNSPECIFIED,
+d, sizeof *d);
  }
  
  static void stream_update_irq(struct Stream *s)

@@ -286,8 +292,9 @@ static void stream_process_mem2s(struct Stream *s, 
StreamSlave *tx_data_dev,
   txlen + s->pos);
  }
  
-cpu_physical_memory_read(s->desc.buffer_address,

- s->txbuf + s->pos, txlen);
+address_space_read(&s->dma->as, s->desc.buffer_address,
+   MEMTXATTRS_UNSPECIFIED,
+   s->txbuf + s->pos, txlen);
  s->pos += txlen;
  
  if (stream_desc_eof(&s->desc)) {

@@ -336,7 +343,8 @@ static size_t stream_process_s2mem(struct Stream *s, 
unsigned char *buf,
  rxlen = len;
  }
  
-cpu_physical_memory_write(s->desc.buffer_address, buf + pos, rxlen);

+address_space_write(&s->dma->as, s->desc.buffer_address,
+MEMTXATTRS_UNSPECIFIED, buf + pos, rxlen);
  len -= rxlen;
  pos += rxlen;
  
@@ -525,6 +533,7 @@ static void xilinx_axidma_realize(DeviceState *dev, Error **errp)

  XilinxAXIDMAStreamSlave *cs = XILINX_AXI_DMA_CONTROL_STREAM(
  
&s->rx_control_dev);
  Error *local_err = NULL;
+int i;
  
  object_property_add_link(OBJECT(ds), "dma", TYPE_XILINX_AXI_DMA,

   (Object **)&ds->dma,
@@ -545,17 +554,19 @@ static void xilinx_axidma_realize(DeviceState *dev, Error 
**errp)
  goto xilinx_axidma_realize_fail;
  }
  
-int i;

-
  for (i = 0; i < 2; i++) {
  struct Stream *st = &s->streams[i];
  
+st->dma = s;

  st->nr = i;
  st->ptimer = ptimer_init(timer_hit, st, PTIMER_POLICY_DEFAULT);
  ptimer_transaction_begin(st->ptimer);
  ptimer_set_freq(st->ptimer, s->freqhz);
  ptimer_transaction_commit(st->ptimer);
  }
+
+address_space_init(&s->as,
+   s->dma_mr ? s->dma_mr : get_system_memory(), "dma");


I'd instead return an error (earlier in this function) if dma_mr 
property is not set. If you ever respin...


Reviewed-by: Philippe Mathieu-Daudé 



  return;
  
  xilinx_axidma_realize_fail:

@@ -575,6 +586,11 @@ static void xilinx_axidma_init(Object *obj)
  &s->rx_control_dev, sizeof(s->rx_control_dev),
  TYPE_XILINX_AXI_DMA_CONTROL_STREAM, &error_abort,
  NULL);
+object_property_add_link(obj, "dma", TYPE_MEMORY_REGION,
+ (Object **)&s->dma_mr,
+ qdev_prop_allow_set_link_before_realize,
+ OBJ_PROP_LINK_STRONG,
+ &error_abort);
  
  sysbus_init_irq(sbd, &s->streams[0].irq);

  sysbus_init_irq(sbd, &s->streams[1].irq);

Re: [PATCH v2 00/13] microvm: add acpi support

2020-05-06 Thread Gerd Hoffmann

On Tue, May 05, 2020 at 10:04:02AM -0400, Michael S. Tsirkin wrote:
> On Tue, May 05, 2020 at 03:42:52PM +0200, Gerd Hoffmann wrote:
> > I know that not supporting ACPI in microvm is intentional.  If you still
> > don't want ACPI this is perfectly fine, you can use the usual -no-acpi
> > switch to toggle ACPI support.
> > 
> > These are the advantages you are going to loose then:
> > 
> >   (1) virtio-mmio device discovery without command line hacks (tweaking
> >   the command line is a problem when not using direct kernel boot).
> >   (2) Better IO-APIC support, we can use IRQ lines 16-23.
> >   (3) ACPI power button (aka powerdown request) works.
> >   (4) machine poweroff (aka S5 state) works.
> 
> Questions
> 
> - what's the tradeoff in startup time?

In the noise.  0.28-0.29 seconds on my hardware to the "i8042: PNP: No
PS/2 controller found" message, no matter whenever acpi is on or off.
With "quiet" (acpi prints more and logging to the serial console is
slow).

At that point -no-acpi takes one second to figure the ps2 controller
really isn't there (as discussed before).

Another interesting difference is interrupt handling.

The -no-acpi version:

   CPU0   
  2:  0XT-PIC  cascade
  4:284   IO-APIC   4-edge  ttyS0
  8:  0   IO-APIC   8-edge  rtc0
 14:   5399   IO-APIC  14-edge  virtio1
 15: 58   IO-APIC  15-edge  virtio0
NMI:  0   Non-maskable interrupts
[ ... ]

The acpi version:

   CPU0   
  1:  0   IO-APIC   9-edge  ACPI:Ged
  2:231   IO-APIC  23-fasteoi   virtio0
  3:   6291   IO-APIC  22-fasteoi   virtio1
  4:   1758   IO-APIC   4-edge  ttyS0
  5:  0   IO-APIC   8-edge  rtc0
NMI:  0   Non-maskable interrupts
[ ... ]

> - what should be the default?

IMO it makes sense to enable it by default.  You get working
power management.  You can boot stock cloud images (patched
seabios parses the dsdt to find virtio-mmio devices to boot
from virtio-mmio disks).

It's easier to leave behind legacy stuff:  The kernel trusts the
firmware and doesn't go into "trying harder to find ps2 kbd" mode.
Also what is this "cascade" thing in /proc/interrupts above? [1]

I expect dropping the rtc is easier with acpi too, the kernel probably
wouldn't try to find it then.  Right now seabios needs rtc cmos for
ram size probing, so I didn't test that yet.

On the other hand I don't really see any disadvantages.  The tables are
small ...

# find /sys/firmware/acpi/tables/ -type f | xargs ls -l
-r. 1 root root  70 May  6 06:48 /sys/firmware/acpi/tables/APIC
-r. 1 root root 472 May  6 06:48 /sys/firmware/acpi/tables/DSDT
-r. 1 root root 268 May  6 06:48 /sys/firmware/acpi/tables/FACP

... and simple (no methods) so you can hardly call that "bloat".

> Based on above I'd be inclined to say default should stay no acpi and
> users should enable acpi with an option.

I disagree, but I can live with off by default too.  We already have
acpi=OnOffAuto for X86MachineState, so it is just a matter of handling
microvm.acpi=auto accordingly in x86_machine_is_acpi_enabled().

take care,
  Gerd

[1] Rhetorical question, I know what it is. [2]
[2] I don't want remember though.

Re: [PATCH v2 00/13] microvm: add acpi support

2020-05-06 Thread Michael S. Tsirkin

On Wed, May 06, 2020 at 01:46:35PM +0200, Gerd Hoffmann wrote:
> On Tue, May 05, 2020 at 10:04:02AM -0400, Michael S. Tsirkin wrote:
> > On Tue, May 05, 2020 at 03:42:52PM +0200, Gerd Hoffmann wrote:
> > > I know that not supporting ACPI in microvm is intentional.  If you still
> > > don't want ACPI this is perfectly fine, you can use the usual -no-acpi
> > > switch to toggle ACPI support.
> > > 
> > > These are the advantages you are going to loose then:
> > > 
> > >   (1) virtio-mmio device discovery without command line hacks (tweaking
> > >   the command line is a problem when not using direct kernel boot).
> > >   (2) Better IO-APIC support, we can use IRQ lines 16-23.
> > >   (3) ACPI power button (aka powerdown request) works.
> > >   (4) machine poweroff (aka S5 state) works.
> > 
> > Questions
> > 
> > - what's the tradeoff in startup time?
> 
> In the noise.  0.28-0.29 seconds on my hardware to the "i8042: PNP: No
> PS/2 controller found" message, no matter whenever acpi is on or off.
> With "quiet" (acpi prints more and logging to the serial console is
> slow).
> 
> At that point -no-acpi takes one second to figure the ps2 controller
> really isn't there (as discussed before).
> 
> Another interesting difference is interrupt handling.
> 
> The -no-acpi version:
> 
>CPU0   
>   2:  0XT-PIC  cascade
>   4:284   IO-APIC   4-edge  ttyS0
>   8:  0   IO-APIC   8-edge  rtc0
>  14:   5399   IO-APIC  14-edge  virtio1
>  15: 58   IO-APIC  15-edge  virtio0
> NMI:  0   Non-maskable interrupts
> [ ... ]
> 
> The acpi version:
> 
>CPU0   
>   1:  0   IO-APIC   9-edge  ACPI:Ged
>   2:231   IO-APIC  23-fasteoi   virtio0
>   3:   6291   IO-APIC  22-fasteoi   virtio1
>   4:   1758   IO-APIC   4-edge  ttyS0
>   5:  0   IO-APIC   8-edge  rtc0
> NMI:  0   Non-maskable interrupts
> [ ... ]
> 
> > - what should be the default?
> 
> IMO it makes sense to enable it by default.  You get working
> power management.  You can boot stock cloud images (patched
> seabios parses the dsdt to find virtio-mmio devices to boot
> from virtio-mmio disks).
> 
> It's easier to leave behind legacy stuff:  The kernel trusts the
> firmware and doesn't go into "trying harder to find ps2 kbd" mode.
> Also what is this "cascade" thing in /proc/interrupts above? [1]
> 
> I expect dropping the rtc is easier with acpi too, the kernel probably
> wouldn't try to find it then.  Right now seabios needs rtc cmos for
> ram size probing, so I didn't test that yet.
> 
> On the other hand I don't really see any disadvantages.  The tables are
> small ...
> 
> # find /sys/firmware/acpi/tables/ -type f | xargs ls -l
> -r. 1 root root  70 May  6 06:48 /sys/firmware/acpi/tables/APIC
> -r. 1 root root 472 May  6 06:48 /sys/firmware/acpi/tables/DSDT
> -r. 1 root root 268 May  6 06:48 /sys/firmware/acpi/tables/FACP
> 
> ... and simple (no methods) so you can hardly call that "bloat".
> 
> > Based on above I'd be inclined to say default should stay no acpi and
> > users should enable acpi with an option.
> 
> I disagree, but I can live with off by default too.  We already have
> acpi=OnOffAuto for X86MachineState, so it is just a matter of handling
> microvm.acpi=auto accordingly in x86_machine_is_acpi_enabled().
> 
> take care,
>   Gerd
> 
> [1] Rhetorical question, I know what it is. [2]
> [2] I don't want remember though.

Let's leave flipping the default as a separate patch, to be
decided on merits after a bunch of people test with/without.

-- 
MST

Re: [PATCH v2 5/9] hw/core: stream: Add an end-of-packet flag

2020-05-06 Thread Philippe Mathieu-Daudé


Hi Edgar,

On 5/6/20 10:25 AM, Edgar E. Iglesias wrote:

From: "Edgar E. Iglesias" 

Some stream clients stream an endless stream of data while
other clients stream data in packets. Stream interfaces
usually have a way to signal the end of a packet or the
last beat of a transfer.

This adds an end-of-packet flag to the push interface.

Reviewed-by: Alistair Francis 
Reviewed-by: Francisco Iglesias 
Signed-off-by: Edgar E. Iglesias 
---
  include/hw/stream.h |  5 +++--
  hw/core/stream.c|  4 ++--
  hw/dma/xilinx_axidma.c  | 10 ++
  hw/net/xilinx_axienet.c | 14 ++
  hw/ssi/xilinx_spips.c   |  2 +-
  5 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/include/hw/stream.h b/include/hw/stream.h
index d02f62ca89..ed09e83683 100644
--- a/include/hw/stream.h
+++ b/include/hw/stream.h
@@ -39,12 +39,13 @@ typedef struct StreamSlaveClass {
   * @obj: Stream slave to push to
   * @buf: Data to write
   * @len: Maximum number of bytes to write
+ * @eop: End of packet flag
   */
-size_t (*push)(StreamSlave *obj, unsigned char *buf, size_t len);
+size_t (*push)(StreamSlave *obj, unsigned char *buf, size_t len, bool eop);


I'd split this patch, first add EOP in the push handler, keeping current 
code working, then the following patches (implementing the feature in 
the backend handlers), then ...



  } StreamSlaveClass;
  
  size_t

-stream_push(StreamSlave *sink, uint8_t *buf, size_t len);
+stream_push(StreamSlave *sink, uint8_t *buf, size_t len, bool eop);


... this final patch, enable the feature and let the frontends use it.

  
  bool

  stream_can_push(StreamSlave *sink, StreamCanPushNotifyFn notify,
diff --git a/hw/core/stream.c b/hw/core/stream.c
index 39b1e595cd..a65ad1208d 100644
--- a/hw/core/stream.c
+++ b/hw/core/stream.c
@@ -3,11 +3,11 @@
  #include "qemu/module.h"
  
  size_t

-stream_push(StreamSlave *sink, uint8_t *buf, size_t len)
+stream_push(StreamSlave *sink, uint8_t *buf, size_t len, bool eop)
  {
  StreamSlaveClass *k =  STREAM_SLAVE_GET_CLASS(sink);
  
-return k->push(sink, buf, len);

+return k->push(sink, buf, len, eop);


So in this first part patch I'd use 'false' here, and update by 'eop' in 
the other part (last patch in series). Does it make sense?


Regards,

Phil.


  }
  
  bool

diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c
index 4540051448..a770e12c96 100644
--- a/hw/dma/xilinx_axidma.c
+++ b/hw/dma/xilinx_axidma.c
@@ -283,7 +283,7 @@ static void stream_process_mem2s(struct Stream *s, 
StreamSlave *tx_data_dev,
  
  if (stream_desc_sof(&s->desc)) {

  s->pos = 0;
-stream_push(tx_control_dev, s->desc.app, sizeof(s->desc.app));
+stream_push(tx_control_dev, s->desc.app, sizeof(s->desc.app), 
true);
  }
  
  txlen = s->desc.control & SDESC_CTRL_LEN_MASK;

@@ -298,7 +298,7 @@ static void stream_process_mem2s(struct Stream *s, 
StreamSlave *tx_data_dev,
  s->pos += txlen;
  
  if (stream_desc_eof(&s->desc)) {

-stream_push(tx_data_dev, s->txbuf, s->pos);
+stream_push(tx_data_dev, s->txbuf, s->pos, true);
  s->pos = 0;
  stream_complete(s);
  }
@@ -384,7 +384,7 @@ static void xilinx_axidma_reset(DeviceState *dev)
  
  static size_t

  xilinx_axidma_control_stream_push(StreamSlave *obj, unsigned char *buf,
-  size_t len)
+  size_t len, bool eop)
  {
  XilinxAXIDMAStreamSlave *cs = XILINX_AXI_DMA_CONTROL_STREAM(obj);
  struct Stream *s = &cs->dma->streams[1];
@@ -416,12 +416,14 @@ xilinx_axidma_data_stream_can_push(StreamSlave *obj,
  }
  
  static size_t

-xilinx_axidma_data_stream_push(StreamSlave *obj, unsigned char *buf, size_t 
len)
+xilinx_axidma_data_stream_push(StreamSlave *obj, unsigned char *buf, size_t 
len,
+   bool eop)
  {
  XilinxAXIDMAStreamSlave *ds = XILINX_AXI_DMA_DATA_STREAM(obj);
  struct Stream *s = &ds->dma->streams[1];
  size_t ret;
  
+assert(eop);

  ret = stream_process_s2mem(s, buf, len);
  stream_update_irq(s);
  return ret;
diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c
index c8dfcda3ee..bd48305577 100644
--- a/hw/net/xilinx_axienet.c
+++ b/hw/net/xilinx_axienet.c
@@ -697,14 +697,14 @@ static void axienet_eth_rx_notify(void *opaque)
 axienet_eth_rx_notify, s)) {
  size_t ret = stream_push(s->tx_control_dev,
   (void *)s->rxapp + CONTROL_PAYLOAD_SIZE
- - s->rxappsize, s->rxappsize);
+ - s->rxappsize, s->rxappsize, true);
  s->rxappsize -= ret;
  }
  
  while (s->rxsize && stream_can_push(s->tx_data_dev,

  axienet_eth_rx_notify, s)) {
  size_t ret = stream_push(s->tx_data_dev, (void *)s

Re: [PATCH v2 9/9] MAINTAINERS: Add myself as streams maintainer

2020-05-06 Thread Philippe Mathieu-Daudé


On 5/6/20 10:25 AM, Edgar E. Iglesias wrote:

From: "Edgar E. Iglesias" 

Since we're missing a maintainer, add myself.

Reviewed-by: Alistair Francis 
Signed-off-by: Edgar E. Iglesias 
---
  MAINTAINERS | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1f84e3ae2c..d3663d6c9a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2315,6 +2315,12 @@ F: net/slirp.c
  F: include/net/slirp.h
  T: git https://people.debian.org/~sthibault/qemu.git slirp
  
+Streams

+M: Edgar E. Iglesias 
+S: Maintained
+F: hw/core/stream.c
+F: include/hw/stream.h
+
  Stubs
  M: Paolo Bonzini 
  S: Maintained



Reviewed-by: Philippe Mathieu-Daudé

[PULL 02/14] .travis.yml: drop MacOSX

2020-05-06 Thread Alex Bennée

This keeps breaking on Travis so lets just fall back to the Cirrus CI
builds which seem to be better maintained. Fix up the comments while
we are doing this as we never had a windows build.

Signed-off-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Acked-by: Richard Henderson 
Message-Id: <2020050505.4225-3-alex.ben...@linaro.org>

diff --git a/.travis.yml b/.travis.yml
index a4c3c6c8058..49267b73b36 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -9,9 +9,8 @@ compiler:
 cache:
   # There is one cache per branch and compiler version.
   # characteristics of each job are used to identify the cache:
-  # - OS name (currently, linux, osx, or windows)
+  # - OS name (currently only linux)
   # - OS distribution (for Linux, xenial, trusty, or precise)
-  # - macOS image name (e.g., xcode7.2)
   # - Names and values of visible environment variables set in .travis.yml or 
Settings panel
   timeout: 1200
   ccache: true
@@ -271,31 +270,6 @@ jobs:
 - TEST_CMD=""
 
 
-# MacOSX builds - cirrus.yml also tests some MacOS builds including latest 
Xcode
-
-- name: "OSX Xcode 10.3"
-  env:
-- BASE_CONFIG="--disable-docs --enable-tools"
-- 
CONFIG="--target-list=i386-softmmu,ppc-softmmu,ppc64-softmmu,m68k-softmmu,x86_64-softmmu"
-  os: osx
-  osx_image: xcode10.3
-  compiler: clang
-  addons:
-homebrew:
-  packages:
-- ccache
-- glib
-- pixman
-- gnu-sed
-- python
-  update: true
-  before_script:
-- brew link --overwrite python
-- export PATH="/usr/local/opt/ccache/libexec:$PATH"
-- mkdir -p ${BUILD_DIR} && cd ${BUILD_DIR}
-- ${SRC_DIR}/configure ${BASE_CONFIG} ${CONFIG} || { cat config.log && 
exit 1; }
-
-
 # Python builds
 - name: "GCC Python 3.5 (x86_64-softmmu)"
   env:
-- 
2.20.1

1 2 3 >

1 - 100 of 286 matches

Mail list logo