date:20190913

Re: [Qemu-devel] [PATCH v3 3/6] vmstate: replace DeviceState with VMStateIf

2019-09-13 Thread Marc-André Lureau

Hi

On Thu, Sep 12, 2019 at 8:19 PM Halil Pasic  wrote:
>
> On Thu, 12 Sep 2019 16:25:11 +0400
> Marc-André Lureau  wrote:
>
> > diff --git a/hw/s390x/s390-skeys.c b/hw/s390x/s390-skeys.c
> > index d4807f..16b9bbf04d 100644
> > --- a/hw/s390x/s390-skeys.c
> > +++ b/hw/s390x/s390-skeys.c
> > @@ -392,7 +392,7 @@ static inline void 
> > s390_skeys_set_migration_enabled(Object *obj, bool value,
> >  register_savevm_live(NULL, TYPE_S390_SKEYS, 0, 1,
> >   &savevm_s390_storage_keys, ss);
> >  } else {
> > -unregister_savevm(DEVICE(ss), TYPE_S390_SKEYS, ss);
> > +unregister_savevm(VMSTATE_IF(ss), TYPE_S390_SKEYS, ss);
> >  }
> >  }
>
> Acked-by: Halil Pasic 
>

thanks

> BTW what does the 'f' in VMStateIf stand for? (I've had a look
> at 2/6 but didn't figure out the answer).

"If" stands for interface, and seems to be more common in qemu.
gobject code usually uses "Iface", and other prefer "I" prefix.

Re: [Qemu-devel] [PATCH v2 1/5] rcu: Add automatically released rcu_read_lock variant

2019-09-13 Thread Paolo Bonzini

On 12/09/19 19:45, Dr. David Alan Gilbert wrote:
> Do you think it's best to use the block version for all cases
> or to use the non-block version by taste?
> The block version is quite nice, but that turns most of the patches
> into 'indent everything'.

I don't really know myself.

On first glance I didn't like too much the non-block version and thought
it was because our coding standards does not include variables declared
in the middle of a block.  However, I think what really bothering me is
"AUTO" in the name.  What do you think about "RCU_READ_LOCK_GUARD()"?
The block version would have the additional prefix "WITH_".

We could also add LOCK_GUARD(lock) and WITH_LOCK_GUARD(lock), using
QemuLockable for polymorphism.  I even had patches a while ago (though
they used something like LOCK_GUARD(guard_var, lock).  I think we
dropped them because of fear that the API was a bit over-engineered.

Paolo

Re: [Qemu-devel] [PATCH] target/arm: fix CBAR register for AArch64 CPUs

2019-09-13 Thread Luc Michel

Hi Peter,

On 9/12/19 6:03 PM, Peter Maydell wrote:
> On Thu, 12 Sep 2019 at 12:01, Luc Michel  wrote:
>>
>> For AArch64 CPUs with a CBAR register, we have two views for it:
>>   - in AArch64 state, the CBAR_EL1 register (S3_1_C15_C3_0), returns the
>> full 64 bits CBAR value
>>   - in AArch32 state, the CBAR register (cp15, opc1=1, CRn=15, CRm=3, opc2=0)
>> returns a 32 bits view such that:
>>   CBAR = CBAR_EL1[31:18] 0..0 CBAR_EL1[43:32]
>>
>> This commit fixes the current implementation where:
>>   - CBAR_EL1 was returning the 32 bits view instead of the full 64 bits
>> value,
>>   - CBAR was returning a truncated 32 bits version of the full 64 bits
>> one, instead of the 32 bits view
>>   - CBAR was declared as cp15, opc1=4, CRn=15, CRm=0, opc2=0, which is
>> the CBAR register found in the ARMv7 Cortex-Ax CPUs, but not in
>> ARMv8 CPUs.
>>
>> Signed-off-by: Luc Michel 
>> ---
>>  target/arm/helper.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/target/arm/helper.c b/target/arm/helper.c
>> index 507026c915..755aa18a2d 100644
>> --- a/target/arm/helper.c
>> +++ b/target/arm/helper.c
>> @@ -6740,12 +6740,12 @@ void register_cp_regs_for_features(ARMCPU *cpu)
>>  ARMCPRegInfo cbar_reginfo[] = {
>>  { .name = "CBAR",
>>.type = ARM_CP_CONST,
>> -  .cp = 15, .crn = 15, .crm = 0, .opc1 = 4, .opc2 = 0,
>> -  .access = PL1_R, .resetvalue = cpu->reset_cbar },
>> +  .cp = 15, .crn = 15, .crm = 3, .opc1 = 1, .opc2 = 0,
>> +  .access = PL1_R, .resetvalue = cbar32 },
> 
> This will break the Cortex-A9 &c which use the 15/0/4/0 encoding
> and the un-rearranged value for this register.
I don't think so because we are in the "if (arm_feature(env,
ARM_FEATURE_AARCH64))" branch of the code. The else branch still maps
15/0/4/0 for non-AArch64 CPUs.

> 
> I think we need to check through the TRMs to confirm which CPUs use
> which format for the CBAR, and have a different feature bit for the
> newer format/sysreg encoding, so we can provide the right sysregs for
> the right cores.
I checked all the AArch64 Cortex's TRMs, for those having a PERIPHBASE
signal and CBAR register (namely Cortex-A53, 57, 72, 73), they all match
the mapping I put in this patch, so I think we don't need to split the
CBAR feature further. I believe more recent Cortex's address the GIC
using coprocessor registers, and CBAR does not exist in those ones.

-- 
Luc

> 
> thanks
> -- PMM
>

Re: [Qemu-devel] [RFC PATCH] virtio-blk: schedule virtio_notify_config to run on main context

2019-09-13 Thread Sergio Lopez


Michael S. Tsirkin  writes:

> On Thu, Sep 12, 2019 at 08:19:25PM +0200, Sergio Lopez wrote:
>> Another AioContext-related issue, and this is a tricky one.
>> 
>> Executing a QMP block_resize request for a virtio-blk device running
>> on an iothread may cause a deadlock involving the following mutexes:
>> 
>>  - main thead
>>   * Has acquired: qemu_mutex_global.
>>   * Is trying the acquire: iothread AioContext lock via
>> AIO_WAIT_WHILE (after aio_poll).
>> 
>>  - iothread
>>   * Has acquired: AioContext lock.
>>   * Is trying to acquire: qemu_mutex_global (via
>> virtio_notify_config->prepare_mmio_access).
>
> Hmm is this really the only case iothread takes qemu mutex?

Not the only one that takes the mutex, but the only one so far we found
doing so upon request from a job running on the main thread (should be
quite noticeable, due to the deadlock).

> If any such access can deadlock, don't we need a generic
> solution? Maybe main thread can drop qemu mutex
> before taking io thread AioContext lock?

The mutex is acquired very early at os_host_main_loop_wait(), so I
assume there may be many assumptions in multiple code paths that it has
been acquired.

>> With this change, virtio_blk_resize checks if it's being called from a
>> coroutine context running on a non-main thread, and if that's the
>> case, creates a new coroutine and schedules it to be run on the main
>> thread.
>> 
>> This works, but means the actual operation is done
>> asynchronously, perhaps opening a window in which a "device_del"
>> operation may fit and remove the VirtIODevice before
>> virtio_notify_config() is executed.
>> 
>> I *think* it shouldn't be possible, as BHs will be processed before
>> any new QMP/monitor command, but I'm open to a different approach.
>> 
>> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1744955
>> Signed-off-by: Sergio Lopez 
>> ---
>>  hw/block/virtio-blk.c | 25 -
>>  1 file changed, 24 insertions(+), 1 deletion(-)
>> 
>> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
>> index 18851601cb..c763d071f6 100644
>> --- a/hw/block/virtio-blk.c
>> +++ b/hw/block/virtio-blk.c
>> @@ -16,6 +16,7 @@
>>  #include "qemu/iov.h"
>>  #include "qemu/module.h"
>>  #include "qemu/error-report.h"
>> +#include "qemu/main-loop.h"
>>  #include "trace.h"
>>  #include "hw/block/block.h"
>>  #include "hw/qdev-properties.h"
>> @@ -1086,11 +1087,33 @@ static int virtio_blk_load_device(VirtIODevice 
>> *vdev, QEMUFile *f,
>>  return 0;
>>  }
>>  
>> +static void coroutine_fn virtio_resize_co_entry(void *opaque)
>> +{
>> +VirtIODevice *vdev = opaque;
>> +
>> +assert(qemu_get_current_aio_context() == qemu_get_aio_context());
>> +virtio_notify_config(vdev);
>> +aio_wait_kick();
>> +}
>> +
>>  static void virtio_blk_resize(void *opaque)
>>  {
>>  VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
>> +Coroutine *co;
>>  
>> -virtio_notify_config(vdev);
>> +if (qemu_in_coroutine() &&
>> +qemu_get_current_aio_context() != qemu_get_aio_context()) {
>> +/*
>> + * virtio_notify_config() needs to acquire the global mutex,
>> + * so calling it from a coroutine running on a non-main context
>> + * may cause a deadlock. Instead, create a new coroutine and
>> + * schedule it to be run on the main thread.
>> + */
>> +co = qemu_coroutine_create(virtio_resize_co_entry, vdev);
>> +aio_co_schedule(qemu_get_aio_context(), co);
>> +} else {
>> +virtio_notify_config(vdev);
>> +}
>>  }
>>  
>>  static const BlockDevOps virtio_block_ops = {
>> -- 
>> 2.21.0



signature.asc
Description: PGP signature

[Qemu-devel] [PATCH] target/s390x/kvm: Officially require at least kernel 3.15

2019-09-13 Thread Thomas Huth

Since QEMU v2.10, the KVM acceleration does not work on older kernels
anymore since the code accidentally requires the KVM_CAP_DEVICE_CTRL
capability now - it should have been optional instead.
Instead of fixing the bug, we asked in the ChangeLog of QEMU 2.11 - 3.0
that people should speak up if they still need support of QEMU running
with KVM on older kernels, but seems like nobody really complained.
Thus let's make this official now and turn it into a proper error
message, telling the users to use at least kernel 3.15 now.

Signed-off-by: Thomas Huth 
---
 hw/intc/s390_flic_kvm.c | 6 --
 target/s390x/kvm.c  | 7 +++
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/hw/intc/s390_flic_kvm.c b/hw/intc/s390_flic_kvm.c
index 819aa5e198..cedccba8a9 100644
--- a/hw/intc/s390_flic_kvm.c
+++ b/hw/intc/s390_flic_kvm.c
@@ -589,12 +589,6 @@ static void kvm_s390_flic_realize(DeviceState *dev, Error 
**errp)
 goto fail;
 }
 flic_state->fd = -1;
-if (!kvm_check_extension(kvm_state, KVM_CAP_DEVICE_CTRL)) {
-error_setg_errno(&errp_local, errno, "KVM is missing capability"
- " KVM_CAP_DEVICE_CTRL");
-trace_flic_no_device_api(errno);
-goto fail;
-}
 
 cd.type = KVM_DEV_TYPE_FLIC;
 ret = kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd);
diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
index cea71ac7c3..97a662ad0e 100644
--- a/target/s390x/kvm.c
+++ b/target/s390x/kvm.c
@@ -316,6 +316,13 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 MachineClass *mc = MACHINE_GET_CLASS(ms);
 
 mc->default_cpu_type = S390_CPU_TYPE_NAME("host");
+
+if (!kvm_check_extension(kvm_state, KVM_CAP_DEVICE_CTRL)) {
+error_report("KVM is missing capability KVM_CAP_DEVICE_CTRL - "
+ "please use kernel 3.15 or newer");
+return -1;
+}
+
 cap_sync_regs = kvm_check_extension(s, KVM_CAP_SYNC_REGS);
 cap_async_pf = kvm_check_extension(s, KVM_CAP_ASYNC_PF);
 cap_mem_op = kvm_check_extension(s, KVM_CAP_S390_MEM_OP);
-- 
2.18.1

Re: [Qemu-devel] [PATCH] target/s390x/kvm: Officially require at least kernel 3.15

2019-09-13 Thread David Hildenbrand

On 13.09.19 09:46, Thomas Huth wrote:
> Since QEMU v2.10, the KVM acceleration does not work on older kernels
> anymore since the code accidentally requires the KVM_CAP_DEVICE_CTRL
> capability now - it should have been optional instead.
> Instead of fixing the bug, we asked in the ChangeLog of QEMU 2.11 - 3.0
> that people should speak up if they still need support of QEMU running
> with KVM on older kernels, but seems like nobody really complained.
> Thus let's make this official now and turn it into a proper error
> message, telling the users to use at least kernel 3.15 now.
> 
> Signed-off-by: Thomas Huth 
> ---
>  hw/intc/s390_flic_kvm.c | 6 --
>  target/s390x/kvm.c  | 7 +++
>  2 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/intc/s390_flic_kvm.c b/hw/intc/s390_flic_kvm.c
> index 819aa5e198..cedccba8a9 100644
> --- a/hw/intc/s390_flic_kvm.c
> +++ b/hw/intc/s390_flic_kvm.c
> @@ -589,12 +589,6 @@ static void kvm_s390_flic_realize(DeviceState *dev, 
> Error **errp)
>  goto fail;
>  }
>  flic_state->fd = -1;
> -if (!kvm_check_extension(kvm_state, KVM_CAP_DEVICE_CTRL)) {
> -error_setg_errno(&errp_local, errno, "KVM is missing capability"
> - " KVM_CAP_DEVICE_CTRL");
> -trace_flic_no_device_api(errno);
> -goto fail;
> -}
>  
>  cd.type = KVM_DEV_TYPE_FLIC;
>  ret = kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd);
> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
> index cea71ac7c3..97a662ad0e 100644
> --- a/target/s390x/kvm.c
> +++ b/target/s390x/kvm.c
> @@ -316,6 +316,13 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>  MachineClass *mc = MACHINE_GET_CLASS(ms);
>  
>  mc->default_cpu_type = S390_CPU_TYPE_NAME("host");
> +
> +if (!kvm_check_extension(kvm_state, KVM_CAP_DEVICE_CTRL)) {
> +error_report("KVM is missing capability KVM_CAP_DEVICE_CTRL - "
> + "please use kernel 3.15 or newer");
> +return -1;
> +}
> +
>  cap_sync_regs = kvm_check_extension(s, KVM_CAP_SYNC_REGS);
>  cap_async_pf = kvm_check_extension(s, KVM_CAP_ASYNC_PF);
>  cap_mem_op = kvm_check_extension(s, KVM_CAP_S390_MEM_OP);
> 

The tracepoint has to go as well as fat as I can see.

Apart from that, looks good to me.

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH] blockdev: avoid acquiring AioContext lock twice at do_drive_backup()

2019-09-13 Thread Max Reitz

On 12.09.19 18:16, Sergio Lopez wrote:
> do_drive_backup() acquires the AioContext lock of the corresponding
> BlockDriverState. This is not a problem when it's called from
> qmp_drive_backup(), but drive_backup_prepare() also acquires the lock
> before calling it.
> 
> This change adds a BlockDriverState argument to do_drive_backup(),
> which is used to signal that the context lock is already acquired and
> to save a couple of redundant calls.

But those redundant calls don’t really hurt (it’s just bdrv_lookup_bs(),
as far as I can tell).  Wouldn’t it be simpler to just release the
context lock in drive_backup_prepare() before calling do_drive_backup()?
 The BDS is drained anyway.

On top of that, do_backup_common() calls bdrv_try_set_aio_context() to
bring the target into the source’s AioContext.  However, this function
must be called with the old AioContext held, and the new context not held.

Currently, it’s called exactly the other way around: With the new
context held, but the old one not held.

So I think it indeed actually makes more sense to release the AioContext
before calling do_drive_backup(), and to move the
bdrv_try_set_aio_context() call for target_bs to the callers of
do_backup_common() (where they have not yet taken the AioContext lock).

Max

> Signed-off-by: Sergio Lopez 
> ---
>  blockdev.c | 54 ++
>  1 file changed, 38 insertions(+), 16 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index fbef6845c8..0cc6c69ceb 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1762,8 +1762,10 @@ typedef struct DriveBackupState {
>  BlockJob *job;
>  } DriveBackupState;
>  
> -static BlockJob *do_drive_backup(DriveBackup *backup, JobTxn *txn,
> -Error **errp);
> +static BlockJob *do_drive_backup(DriveBackup *backup,
> + BlockDriverState *backup_bs,
> + JobTxn *txn,
> + Error **errp);
>  
>  static void drive_backup_prepare(BlkActionState *common, Error **errp)
>  {
> @@ -1781,6 +1783,11 @@ static void drive_backup_prepare(BlkActionState 
> *common, Error **errp)
>  return;
>  }
>  
> +if (!bs->drv) {
> +error_setg(errp, "Device has no medium");
> +return;
> +}
> +
>  aio_context = bdrv_get_aio_context(bs);
>  aio_context_acquire(aio_context);
>  
> @@ -1789,7 +1796,9 @@ static void drive_backup_prepare(BlkActionState 
> *common, Error **errp)
>  
>  state->bs = bs;
>  
> -state->job = do_drive_backup(backup, common->block_job_txn, &local_err);
> +state->job = do_drive_backup(backup, bs,
> + common->block_job_txn,
> + &local_err);
>  if (local_err) {
>  error_propagate(errp, local_err);
>  goto out;
> @@ -3607,7 +3616,9 @@ static BlockJob *do_backup_common(BackupCommon *backup,
>  return job;
>  }
>  
> -static BlockJob *do_drive_backup(DriveBackup *backup, JobTxn *txn,
> +static BlockJob *do_drive_backup(DriveBackup *backup,
> + BlockDriverState *backup_bs,
> + JobTxn *txn,
>   Error **errp)
>  {
>  BlockDriverState *bs;
> @@ -3625,18 +3636,27 @@ static BlockJob *do_drive_backup(DriveBackup *backup, 
> JobTxn *txn,
>  backup->mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
>  }
>  
> -bs = bdrv_lookup_bs(backup->device, backup->device, errp);
> -if (!bs) {
> -return NULL;
> -}
> +if (backup_bs) {
> +bs = backup_bs;
> +/*
> + * If the caller passes us a BDS, we assume it has already
> + * acquired the context lock.
> + */
> +aio_context = bdrv_get_aio_context(bs);
> +} else {
> +bs = bdrv_lookup_bs(backup->device, backup->device, errp);
> +if (!bs) {
> +return NULL;
> +}
>  
> -if (!bs->drv) {
> -error_setg(errp, "Device has no medium");
> -return NULL;
> -}
> +if (!bs->drv) {
> +error_setg(errp, "Device has no medium");
> +return NULL;
> +}
>  
> -aio_context = bdrv_get_aio_context(bs);
> -aio_context_acquire(aio_context);
> +aio_context = bdrv_get_aio_context(bs);
> +aio_context_acquire(aio_context);
> +}
>  
>  if (!backup->has_format) {
>  backup->format = backup->mode == NEW_IMAGE_MODE_EXISTING ?
> @@ -3713,7 +3733,9 @@ static BlockJob *do_drive_backup(DriveBackup *backup, 
> JobTxn *txn,
>  unref:
>  bdrv_unref(target_bs);
>  out:
> -aio_context_release(aio_context);
> +if (!backup_bs) {
> +aio_context_release(aio_context);
> +}
>  return job;
>  }
>  
> @@ -3721,7 +3743,7 @@ void qmp_drive_backup(DriveBackup *arg, Error **errp)
>  {
>  
>  BlockJob *job;
> -job = do_drive_backup(arg, NULL, errp);
> +job = do_drive_ba

Re: [Qemu-devel] [PATCH v2 3/3] xen: perform XenDevice clean-up in XenBus watch handler

2019-09-13 Thread Paul Durrant

> -Original Message-
> From: Xen-devel  On Behalf Of Paul 
> Durrant
> Sent: 12 September 2019 16:16
> To: Anthony Perard 
> Cc: xen-de...@lists.xenproject.org; Stefano Stabellini 
> ; qemu-devel@nongnu.org
> Subject: Re: [Xen-devel] [PATCH v2 3/3] xen: perform XenDevice clean-up in 
> XenBus watch handler
> 
> > -Original Message-
> > From: Anthony PERARD 
> > Sent: 12 September 2019 16:04
> > To: Paul Durrant 
> > Cc: qemu-devel@nongnu.org; xen-de...@lists.xenproject.org; Stefano 
> > Stabellini
> 
> > Subject: Re: [PATCH v2 3/3] xen: perform XenDevice clean-up in XenBus watch 
> > handler
> >
> > On Thu, Sep 12, 2019 at 01:16:46PM +0100, Paul Durrant wrote:
> > > Cleaning up offine XenDevice objects directly in
> >   ^ offline
> >
> > > xen_device_backend_changed() is dangerous as xen_device_unrealize() will
> > > modify the watch list that is being walked. Even the QLIST_FOREACH_SAFE()
> > > used in notifier_list_notify() is insufficient as *two* notifiers (for
> > > the frontend and backend watches) are removed, thus potentially rendering
> > > the 'next' pointer unsafe.
> > >
> > > The solution is to use the XenBus backend_watch handler to do the clean-up
> > > instead, as it is invoked whilst walking a separate watch list.
> > >
> > > This patch therefore adds a new 'offline_devices' list to XenBus, to which
> > > offline devices are added by xen_device_backend_changed(). The XenBus
> > > backend_watch registration is also changed to not only invoke
> > > xen_bus_enumerate() but also a new xen_bus_cleanup() function, which will
> > > walk 'offline_devices' and perform the necessary actions.
> > > For safety a an extra 'online' check is also added to
> >  ^ one 'a' too many?
> >
> > > xen_bus_type_enumerate() to make sure that no attempt is made to create a
> > > new XenDevice object for a backend that is offline.
> > >
> > > NOTE: This patch also include some cosmetic changes:
> > >   - substitute the local variable name 'backend_state'
> > > in xen_bus_type_enumerate() with 'state', since there
> > > is no ambiguity with any other state in that context.
> > >   - change xen_device_state_is_active() to
> > > xen_device_frontend_is_active() (and pass a XenDevice directly)
> > > since the state tests contained therein only apply to a frontend.
> > >   - use 'state' rather then 'xendev->backend_state' in
> > > xen_device_backend_changed() to shorten the code.
> > >
> > > Signed-off-by: Paul Durrant 
> > > ---
> > >
> > > v2:
> > >  - Make sure we don't try to add a XenDevice to 'offline_devices' more 
> > > than
> > >once
> > > ---
> > >
> > >  /*
> > >   * If a backend is still 'online' then we should leave it alone but,
> > > - * if a backend is not 'online', then the device should be destroyed
> > > - * once the state is Closed.
> > > + * if a backend is not 'online', then the device is a candidate
> > > + * for destruction. Hence add it to the 'offline' list to be cleaned
> > > + * by xen_bus_cleanup().
> > >   */
> > > -if (!xendev->backend_online &&
> > > -(xendev->backend_state == XenbusStateClosed ||
> > > - xendev->backend_state == XenbusStateInitialising ||
> > > - xendev->backend_state == XenbusStateInitWait ||
> > > - xendev->backend_state == XenbusStateUnknown)) {
> > > -Error *local_err = NULL;
> > > +if (!online &&
> > > +(state == XenbusStateClosed ||  state == XenbusStateInitialising 
> > > ||
> > > + state == XenbusStateInitWait || state == XenbusStateUnknown) &&
> > > +!QLIST_NEXT(xendev, list)) {
> >
> > Could you add a note about this QLIST_NEXT? I don't think it's going to
> > be obvious enough why we check that there are no next item. I might only
> > understand it just because of your reply to the v1 of the patch.
> > (Well the changelog of the patch also point out what it's for.)
> >
> 
> Sure, it's worth a comment.

Actually, on closer inspection !QLIST_NEXT() is an insufficient checked. I had 
assumed that QLISTs were full doubly-linked lists but they are not; the last 
element on a list will still have a NULL next pointer. It will be sufficient, 
and also clearer to the reader, if I add a boolean to XenDevice which is set 
when it is added to the list.

  Paul

[Qemu-devel] [PATCH v3 1/3] xen / notify: introduce a new XenWatchList abstraction

2019-09-13 Thread Paul Durrant

Xenstore watch call-backs are already abstracted away from XenBus using
the XenWatch data structure but the associated NotifierList manipulation
and file handle registration is still open coded in various xen_bus_...()
functions.
This patch creates a new XenWatchList data structure to allow these
interactions to be abstracted away from XenBus as well. This is in
preparation for a subsequent patch which will introduce separate watch lists
for XenBus and XenDevice objects.

NOTE: This patch also introduces a new notifier_list_empty() helper function
  for the purposes of adding an assertion that a XenWatchList is not
  freed whilst its associated NotifierList is still occupied.

Signed-off-by: Paul Durrant 
Reviewed-by: Anthony Perard 
---
Cc: Stefano Stabellini 
---
 hw/xen/trace-events  |   5 +-
 hw/xen/xen-bus.c | 117 +--
 include/hw/xen/xen-bus.h |   3 +-
 include/qemu/notify.h|   2 +
 util/notify.c|   5 ++
 5 files changed, 87 insertions(+), 45 deletions(-)

diff --git a/hw/xen/trace-events b/hw/xen/trace-events
index bc82ecb1a5..ac8d9c20d2 100644
--- a/hw/xen/trace-events
+++ b/hw/xen/trace-events
@@ -19,9 +19,8 @@ xen_bus_unrealize(void) ""
 xen_bus_enumerate(void) ""
 xen_bus_type_enumerate(const char *type) "type: %s"
 xen_bus_backend_create(const char *type, const char *path) "type: %s path: %s"
-xen_bus_add_watch(const char *node, const char *key, char *token) "node: %s 
key: %s token: %s"
-xen_bus_remove_watch(const char *node, const char *key, char *token) "node: %s 
key: %s token: %s"
-xen_bus_watch(const char *token) "token: %s"
+xen_bus_add_watch(const char *node, const char *key) "node: %s key: %s"
+xen_bus_remove_watch(const char *node, const char *key) "node: %s key: %s"
 xen_device_realize(const char *type, char *name) "type: %s name: %s"
 xen_device_unrealize(const char *type, char *name) "type: %s name: %s"
 xen_device_backend_state(const char *type, char *name, const char *state) 
"type: %s name: %s -> %s"
diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index 025df5e59f..28efaccff2 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -157,18 +157,60 @@ static void free_watch(XenWatch *watch)
 g_free(watch);
 }
 
-static XenWatch *xen_bus_add_watch(XenBus *xenbus, const char *node,
-   const char *key, XenWatchHandler handler,
-   void *opaque, Error **errp)
+struct XenWatchList {
+struct xs_handle *xsh;
+NotifierList notifiers;
+};
+
+static void watch_list_event(void *opaque)
+{
+XenWatchList *watch_list = opaque;
+char **v;
+const char *token;
+
+v = xs_check_watch(watch_list->xsh);
+if (!v) {
+return;
+}
+
+token = v[XS_WATCH_TOKEN];
+
+notifier_list_notify(&watch_list->notifiers, (void *)token);
+
+free(v);
+}
+
+static XenWatchList *watch_list_create(struct xs_handle *xsh)
+{
+XenWatchList *watch_list = g_new0(XenWatchList, 1);
+
+g_assert(xsh);
+
+watch_list->xsh = xsh;
+notifier_list_init(&watch_list->notifiers);
+qemu_set_fd_handler(xs_fileno(watch_list->xsh), watch_list_event, NULL,
+watch_list);
+
+return watch_list;
+}
+
+static void watch_list_destroy(XenWatchList *watch_list)
+{
+g_assert(notifier_list_empty(&watch_list->notifiers));
+qemu_set_fd_handler(xs_fileno(watch_list->xsh), NULL, NULL, NULL);
+g_free(watch_list);
+}
+
+static XenWatch *watch_list_add(XenWatchList *watch_list, const char *node,
+const char *key, XenWatchHandler handler,
+void *opaque, Error **errp)
 {
 XenWatch *watch = new_watch(node, key, handler, opaque);
 Error *local_err = NULL;
 
-trace_xen_bus_add_watch(watch->node, watch->key, watch->token);
+notifier_list_add(&watch_list->notifiers, &watch->notifier);
 
-notifier_list_add(&xenbus->watch_notifiers, &watch->notifier);
-
-xs_node_watch(xenbus->xsh, node, key, watch->token, &local_err);
+xs_node_watch(watch_list->xsh, node, key, watch->token, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 
@@ -181,18 +223,34 @@ static XenWatch *xen_bus_add_watch(XenBus *xenbus, const 
char *node,
 return watch;
 }
 
-static void xen_bus_remove_watch(XenBus *xenbus, XenWatch *watch,
- Error **errp)
+static void watch_list_remove(XenWatchList *watch_list, XenWatch *watch,
+  Error **errp)
 {
-trace_xen_bus_remove_watch(watch->node, watch->key, watch->token);
-
-xs_node_unwatch(xenbus->xsh, watch->node, watch->key, watch->token,
+xs_node_unwatch(watch_list->xsh, watch->node, watch->key, watch->token,
 errp);
 
 notifier_remove(&watch->notifier);
 free_watch(watch);
 }
 
+static XenWatch *xen_bus_add_watch(XenBus *xenbus, const char *node,
+   const char *key

[Qemu-devel] [PATCH v3 0/3] xen: fix a potential crash in xen-bus

2019-09-13 Thread Paul Durrant

This series fixes a potential segfault caused by NotifierList corruption
in xen-bus. The first two patches lay the groundwork and the third
actually fixes the problem.

Paul Durrant (3):
  xen / notify: introduce a new XenWatchList abstraction
  xen: introduce separate XenWatchList for XenDevice objects
  xen: perform XenDevice clean-up in XenBus watch handler

 hw/xen/trace-events  |   9 +-
 hw/xen/xen-bus.c | 277 ---
 include/hw/xen/xen-bus.h |   8 +-
 include/qemu/notify.h|   2 +
 util/notify.c|   5 +
 5 files changed, 220 insertions(+), 81 deletions(-)
---
Cc: Anthony Perard 
Cc: Stefano Stabellini 
-- 
2.20.1.2.gb21ebb6

[Qemu-devel] [PATCH v3 3/3] xen: perform XenDevice clean-up in XenBus watch handler

2019-09-13 Thread Paul Durrant

Cleaning up offline XenDevice objects directly in
xen_device_backend_changed() is dangerous as xen_device_unrealize() will
modify the watch list that is being walked. Even the QLIST_FOREACH_SAFE()
used in notifier_list_notify() is insufficient as *two* notifiers (for
the frontend and backend watches) are removed, thus potentially rendering
the 'next' pointer unsafe.

The solution is to use the XenBus backend_watch handler to do the clean-up
instead, as it is invoked whilst walking a separate watch list.

This patch therefore adds a new 'inactive_devices' list to XenBus, to which
offline devices are added by xen_device_backend_changed(). The XenBus
backend_watch registration is also changed to not only invoke
xen_bus_enumerate() but also a new xen_bus_cleanup() function, which will
walk 'inactive_devices' and perform the necessary actions.
For safety an extra 'online' check is also added to xen_bus_type_enumerate()
to make sure that no attempt is made to create a new XenDevice object for a
backend that is offline.

NOTE: This patch also includes some cosmetic changes:
  - substitute the local variable name 'backend_state'
in xen_bus_type_enumerate() with 'state', since there
is no ambiguity with any other state in that context.
  - change xen_device_state_is_active() to
xen_device_frontend_is_active() (and pass a XenDevice directly)
since the state tests contained therein only apply to a frontend.
  - use 'state' rather then 'xendev->backend_state' in
xen_device_backend_changed() to shorten the code.

Signed-off-by: Paul Durrant 
---
Cc: Stefano Stabellini 
Cc: Anthony Perard 

v3:
 - s/offline_devices/inactive_devices/g
 - Also add an 'inactive' boolean to XenDevice which is set when the
   device is added to the inactive list so we really can make sure that it
   doesn't happen more than once

v2:
 - Make sure we don't try to add a XenDevice to 'offline_devices' more than
   once
---
 hw/xen/trace-events  |  2 +
 hw/xen/xen-bus.c | 94 +---
 include/hw/xen/xen-bus.h |  3 ++
 3 files changed, 74 insertions(+), 25 deletions(-)

diff --git a/hw/xen/trace-events b/hw/xen/trace-events
index 80ce3dafad..e6885bc751 100644
--- a/hw/xen/trace-events
+++ b/hw/xen/trace-events
@@ -17,8 +17,10 @@ xen_domid_restrict(int err) "err: %u"
 xen_bus_realize(void) ""
 xen_bus_unrealize(void) ""
 xen_bus_enumerate(void) ""
+xen_bus_cleanup(void) ""
 xen_bus_type_enumerate(const char *type) "type: %s"
 xen_bus_backend_create(const char *type, const char *path) "type: %s path: %s"
+xen_bus_device_cleanup(const char *type, char *name) "type: %s name: %s"
 xen_bus_add_watch(const char *node, const char *key) "node: %s key: %s"
 xen_bus_remove_watch(const char *node, const char *key) "node: %s key: %s"
 xen_device_realize(const char *type, char *name) "type: %s name: %s"
diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index 810a4e2df3..55c157393d 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -340,13 +340,18 @@ static void xen_bus_type_enumerate(XenBus *xenbus, const 
char *type)
 for (i = 0; i < n; i++) {
 char *backend_path = g_strdup_printf("%s/%s", domain_path,
  backend[i]);
-enum xenbus_state backend_state;
+enum xenbus_state state;
+unsigned int online;
 
 if (xs_node_scanf(xenbus->xsh, XBT_NULL, backend_path, "state",
-  NULL, "%u", &backend_state) != 1)
-backend_state = XenbusStateUnknown;
+  NULL, "%u", &state) != 1)
+state = XenbusStateUnknown;
 
-if (backend_state == XenbusStateInitialising) {
+if (xs_node_scanf(xenbus->xsh, XBT_NULL, backend_path, "online",
+  NULL, "%u", &online) != 1)
+online = 0;
+
+if (online && state == XenbusStateInitialising) {
 Error *local_err = NULL;
 
 xen_bus_backend_create(xenbus, type, backend[i], backend_path,
@@ -365,9 +370,8 @@ out:
 g_free(domain_path);
 }
 
-static void xen_bus_enumerate(void *opaque)
+static void xen_bus_enumerate(XenBus *xenbus)
 {
-XenBus *xenbus = opaque;
 char **type;
 unsigned int i, n;
 
@@ -385,6 +389,45 @@ static void xen_bus_enumerate(void *opaque)
 free(type);
 }
 
+static void xen_bus_device_cleanup(XenDevice *xendev)
+{
+const char *type = object_get_typename(OBJECT(xendev));
+Error *local_err = NULL;
+
+trace_xen_bus_device_cleanup(type, xendev->name);
+
+g_assert(!xendev->backend_online);
+
+if (!xen_backend_try_device_destroy(xendev, &local_err)) {
+object_unparent(OBJECT(xendev));
+}
+
+if (local_err) {
+error_report_err(local_err);
+}
+}
+
+static void xen_bus_cleanup(XenBus *xenbus)
+{
+XenDevice *xendev, *next;
+
+trace_xen_bus_cleanup();
+
+QLIST_FOREACH_SAFE(xendev, &xenbus->inactive_devices, list, next) {
+g_as

Re: [Qemu-devel] [PATCH v3 0/3] xen: fix a potential crash in xen-bus

2019-09-13 Thread Paul Durrant

I typo-ed 'xen-devel'. I'll re-send.

  Paul

> -Original Message-
> From: Paul Durrant 
> Sent: 13 September 2019 09:21
> To: qemu-devel@nongnu.org; xen-dev...@lists.xenproject.org
> Cc: Paul Durrant ; Anthony Perard 
> ; Stefano
> Stabellini 
> Subject: [PATCH v3 0/3] xen: fix a potential crash in xen-bus
> 
> This series fixes a potential segfault caused by NotifierList corruption
> in xen-bus. The first two patches lay the groundwork and the third
> actually fixes the problem.
> 
> Paul Durrant (3):
>   xen / notify: introduce a new XenWatchList abstraction
>   xen: introduce separate XenWatchList for XenDevice objects
>   xen: perform XenDevice clean-up in XenBus watch handler
> 
>  hw/xen/trace-events  |   9 +-
>  hw/xen/xen-bus.c | 277 ---
>  include/hw/xen/xen-bus.h |   8 +-
>  include/qemu/notify.h|   2 +
>  util/notify.c|   5 +
>  5 files changed, 220 insertions(+), 81 deletions(-)
> ---
> Cc: Anthony Perard 
> Cc: Stefano Stabellini 
> --
> 2.20.1.2.gb21ebb6

[Qemu-devel] [PATCH v3 2/3] xen: introduce separate XenWatchList for XenDevice objects

2019-09-13 Thread Paul Durrant

This patch uses the XenWatchList abstraction to add a separate watch list
for each device. This is more scalable than walking a single notifier
list for all watches and is also necessary to implement a bug-fix in a
subsequent patch.

Signed-off-by: Paul Durrant 
Reviewed-by: Anthony Perard 
---
Cc: Stefano Stabellini 
---
 hw/xen/trace-events  |  2 ++
 hw/xen/xen-bus.c | 72 
 include/hw/xen/xen-bus.h |  2 ++
 3 files changed, 62 insertions(+), 14 deletions(-)

diff --git a/hw/xen/trace-events b/hw/xen/trace-events
index ac8d9c20d2..80ce3dafad 100644
--- a/hw/xen/trace-events
+++ b/hw/xen/trace-events
@@ -29,6 +29,8 @@ xen_device_backend_changed(const char *type, char *name) 
"type: %s name: %s"
 xen_device_frontend_state(const char *type, char *name, const char *state) 
"type: %s name: %s -> %s"
 xen_device_frontend_changed(const char *type, char *name) "type: %s name: %s"
 xen_device_unplug(const char *type, char *name) "type: %s name: %s"
+xen_device_add_watch(const char *type, char *name, const char *node, const 
char *key) "type: %s name: %s node: %s key: %s"
+xen_device_remove_watch(const char *type, char *name, const char *node, const 
char *key) "type: %s name: %s node: %s key: %s"
 
 # xen-bus-helper.c
 xs_node_create(const char *node) "%s"
diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index 28efaccff2..810a4e2df3 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -235,11 +235,11 @@ static void watch_list_remove(XenWatchList *watch_list, 
XenWatch *watch,
 
 static XenWatch *xen_bus_add_watch(XenBus *xenbus, const char *node,
const char *key, XenWatchHandler handler,
-   void *opaque, Error **errp)
+   Error **errp)
 {
 trace_xen_bus_add_watch(node, key);
 
-return watch_list_add(xenbus->watch_list, node, key, handler, opaque,
+return watch_list_add(xenbus->watch_list, node, key, handler, xenbus,
   errp);
 }
 
@@ -433,7 +433,7 @@ static void xen_bus_realize(BusState *bus, Error **errp)
 
 xenbus->backend_watch =
 xen_bus_add_watch(xenbus, "", /* domain root node */
-  "backend", xen_bus_enumerate, xenbus, &local_err);
+  "backend", xen_bus_enumerate, &local_err);
 if (local_err) {
 /* This need not be treated as a hard error so don't propagate */
 error_reportf_err(local_err,
@@ -621,6 +621,31 @@ static void xen_device_backend_changed(void *opaque)
 }
 }
 
+static XenWatch *xen_device_add_watch(XenDevice *xendev, const char *node,
+  const char *key,
+  XenWatchHandler handler,
+  Error **errp)
+{
+const char *type = object_get_typename(OBJECT(xendev));
+
+trace_xen_device_add_watch(type, xendev->name, node, key);
+
+return watch_list_add(xendev->watch_list, node, key, handler, xendev,
+  errp);
+}
+
+static void xen_device_remove_watch(XenDevice *xendev, XenWatch *watch,
+Error **errp)
+{
+const char *type = object_get_typename(OBJECT(xendev));
+
+trace_xen_device_remove_watch(type, xendev->name, watch->node,
+  watch->key);
+
+watch_list_remove(xendev->watch_list, watch, errp);
+}
+
+
 static void xen_device_backend_create(XenDevice *xendev, Error **errp)
 {
 XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev)));
@@ -645,9 +670,9 @@ static void xen_device_backend_create(XenDevice *xendev, 
Error **errp)
 }
 
 xendev->backend_state_watch =
-xen_bus_add_watch(xenbus, xendev->backend_path,
-  "state", xen_device_backend_changed,
-  xendev, &local_err);
+xen_device_add_watch(xendev, xendev->backend_path,
+ "state", xen_device_backend_changed,
+ &local_err);
 if (local_err) {
 error_propagate_prepend(errp, local_err,
 "failed to watch backend state: ");
@@ -655,9 +680,9 @@ static void xen_device_backend_create(XenDevice *xendev, 
Error **errp)
 }
 
 xendev->backend_online_watch =
-xen_bus_add_watch(xenbus, xendev->backend_path,
-  "online", xen_device_backend_changed,
-  xendev, &local_err);
+xen_device_add_watch(xendev, xendev->backend_path,
+ "online", xen_device_backend_changed,
+ &local_err);
 if (local_err) {
 error_propagate_prepend(errp, local_err,
 "failed to watch backend online: ");
@@ -671,12 +696,12 @@ static void xen_device_backend_destroy(XenDevice *xendev)
 Error *local_err = NULL;
 
 if (xendev->backend_online_watch) {
-xen_b

[Qemu-devel] [PATCH v3 1/3] xen / notify: introduce a new XenWatchList abstraction

2019-09-13 Thread Paul Durrant

Xenstore watch call-backs are already abstracted away from XenBus using
the XenWatch data structure but the associated NotifierList manipulation
and file handle registration is still open coded in various xen_bus_...()
functions.
This patch creates a new XenWatchList data structure to allow these
interactions to be abstracted away from XenBus as well. This is in
preparation for a subsequent patch which will introduce separate watch lists
for XenBus and XenDevice objects.

NOTE: This patch also introduces a new notifier_list_empty() helper function
  for the purposes of adding an assertion that a XenWatchList is not
  freed whilst its associated NotifierList is still occupied.

Signed-off-by: Paul Durrant 
Reviewed-by: Anthony Perard 
---
Cc: Stefano Stabellini 
---
 hw/xen/trace-events  |   5 +-
 hw/xen/xen-bus.c | 117 +--
 include/hw/xen/xen-bus.h |   3 +-
 include/qemu/notify.h|   2 +
 util/notify.c|   5 ++
 5 files changed, 87 insertions(+), 45 deletions(-)

diff --git a/hw/xen/trace-events b/hw/xen/trace-events
index bc82ecb1a5..ac8d9c20d2 100644
--- a/hw/xen/trace-events
+++ b/hw/xen/trace-events
@@ -19,9 +19,8 @@ xen_bus_unrealize(void) ""
 xen_bus_enumerate(void) ""
 xen_bus_type_enumerate(const char *type) "type: %s"
 xen_bus_backend_create(const char *type, const char *path) "type: %s path: %s"
-xen_bus_add_watch(const char *node, const char *key, char *token) "node: %s 
key: %s token: %s"
-xen_bus_remove_watch(const char *node, const char *key, char *token) "node: %s 
key: %s token: %s"
-xen_bus_watch(const char *token) "token: %s"
+xen_bus_add_watch(const char *node, const char *key) "node: %s key: %s"
+xen_bus_remove_watch(const char *node, const char *key) "node: %s key: %s"
 xen_device_realize(const char *type, char *name) "type: %s name: %s"
 xen_device_unrealize(const char *type, char *name) "type: %s name: %s"
 xen_device_backend_state(const char *type, char *name, const char *state) 
"type: %s name: %s -> %s"
diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index 025df5e59f..28efaccff2 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -157,18 +157,60 @@ static void free_watch(XenWatch *watch)
 g_free(watch);
 }
 
-static XenWatch *xen_bus_add_watch(XenBus *xenbus, const char *node,
-   const char *key, XenWatchHandler handler,
-   void *opaque, Error **errp)
+struct XenWatchList {
+struct xs_handle *xsh;
+NotifierList notifiers;
+};
+
+static void watch_list_event(void *opaque)
+{
+XenWatchList *watch_list = opaque;
+char **v;
+const char *token;
+
+v = xs_check_watch(watch_list->xsh);
+if (!v) {
+return;
+}
+
+token = v[XS_WATCH_TOKEN];
+
+notifier_list_notify(&watch_list->notifiers, (void *)token);
+
+free(v);
+}
+
+static XenWatchList *watch_list_create(struct xs_handle *xsh)
+{
+XenWatchList *watch_list = g_new0(XenWatchList, 1);
+
+g_assert(xsh);
+
+watch_list->xsh = xsh;
+notifier_list_init(&watch_list->notifiers);
+qemu_set_fd_handler(xs_fileno(watch_list->xsh), watch_list_event, NULL,
+watch_list);
+
+return watch_list;
+}
+
+static void watch_list_destroy(XenWatchList *watch_list)
+{
+g_assert(notifier_list_empty(&watch_list->notifiers));
+qemu_set_fd_handler(xs_fileno(watch_list->xsh), NULL, NULL, NULL);
+g_free(watch_list);
+}
+
+static XenWatch *watch_list_add(XenWatchList *watch_list, const char *node,
+const char *key, XenWatchHandler handler,
+void *opaque, Error **errp)
 {
 XenWatch *watch = new_watch(node, key, handler, opaque);
 Error *local_err = NULL;
 
-trace_xen_bus_add_watch(watch->node, watch->key, watch->token);
+notifier_list_add(&watch_list->notifiers, &watch->notifier);
 
-notifier_list_add(&xenbus->watch_notifiers, &watch->notifier);
-
-xs_node_watch(xenbus->xsh, node, key, watch->token, &local_err);
+xs_node_watch(watch_list->xsh, node, key, watch->token, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 
@@ -181,18 +223,34 @@ static XenWatch *xen_bus_add_watch(XenBus *xenbus, const 
char *node,
 return watch;
 }
 
-static void xen_bus_remove_watch(XenBus *xenbus, XenWatch *watch,
- Error **errp)
+static void watch_list_remove(XenWatchList *watch_list, XenWatch *watch,
+  Error **errp)
 {
-trace_xen_bus_remove_watch(watch->node, watch->key, watch->token);
-
-xs_node_unwatch(xenbus->xsh, watch->node, watch->key, watch->token,
+xs_node_unwatch(watch_list->xsh, watch->node, watch->key, watch->token,
 errp);
 
 notifier_remove(&watch->notifier);
 free_watch(watch);
 }
 
+static XenWatch *xen_bus_add_watch(XenBus *xenbus, const char *node,
+   const char *key

[Qemu-devel] [PATCH v3 3/3] xen: perform XenDevice clean-up in XenBus watch handler

2019-09-13 Thread Paul Durrant

Cleaning up offline XenDevice objects directly in
xen_device_backend_changed() is dangerous as xen_device_unrealize() will
modify the watch list that is being walked. Even the QLIST_FOREACH_SAFE()
used in notifier_list_notify() is insufficient as *two* notifiers (for
the frontend and backend watches) are removed, thus potentially rendering
the 'next' pointer unsafe.

The solution is to use the XenBus backend_watch handler to do the clean-up
instead, as it is invoked whilst walking a separate watch list.

This patch therefore adds a new 'inactive_devices' list to XenBus, to which
offline devices are added by xen_device_backend_changed(). The XenBus
backend_watch registration is also changed to not only invoke
xen_bus_enumerate() but also a new xen_bus_cleanup() function, which will
walk 'inactive_devices' and perform the necessary actions.
For safety an extra 'online' check is also added to xen_bus_type_enumerate()
to make sure that no attempt is made to create a new XenDevice object for a
backend that is offline.

NOTE: This patch also includes some cosmetic changes:
  - substitute the local variable name 'backend_state'
in xen_bus_type_enumerate() with 'state', since there
is no ambiguity with any other state in that context.
  - change xen_device_state_is_active() to
xen_device_frontend_is_active() (and pass a XenDevice directly)
since the state tests contained therein only apply to a frontend.
  - use 'state' rather then 'xendev->backend_state' in
xen_device_backend_changed() to shorten the code.

Signed-off-by: Paul Durrant 
---
Cc: Stefano Stabellini 
Cc: Anthony Perard 

v3:
 - s/offline_devices/inactive_devices/g
 - Also add an 'inactive' boolean to XenDevice which is set when the
   device is added to the inactive list so we really can make sure that it
   doesn't happen more than once

v2:
 - Make sure we don't try to add a XenDevice to 'offline_devices' more than
   once
---
 hw/xen/trace-events  |  2 +
 hw/xen/xen-bus.c | 94 +---
 include/hw/xen/xen-bus.h |  3 ++
 3 files changed, 74 insertions(+), 25 deletions(-)

diff --git a/hw/xen/trace-events b/hw/xen/trace-events
index 80ce3dafad..e6885bc751 100644
--- a/hw/xen/trace-events
+++ b/hw/xen/trace-events
@@ -17,8 +17,10 @@ xen_domid_restrict(int err) "err: %u"
 xen_bus_realize(void) ""
 xen_bus_unrealize(void) ""
 xen_bus_enumerate(void) ""
+xen_bus_cleanup(void) ""
 xen_bus_type_enumerate(const char *type) "type: %s"
 xen_bus_backend_create(const char *type, const char *path) "type: %s path: %s"
+xen_bus_device_cleanup(const char *type, char *name) "type: %s name: %s"
 xen_bus_add_watch(const char *node, const char *key) "node: %s key: %s"
 xen_bus_remove_watch(const char *node, const char *key) "node: %s key: %s"
 xen_device_realize(const char *type, char *name) "type: %s name: %s"
diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index 810a4e2df3..55c157393d 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -340,13 +340,18 @@ static void xen_bus_type_enumerate(XenBus *xenbus, const 
char *type)
 for (i = 0; i < n; i++) {
 char *backend_path = g_strdup_printf("%s/%s", domain_path,
  backend[i]);
-enum xenbus_state backend_state;
+enum xenbus_state state;
+unsigned int online;
 
 if (xs_node_scanf(xenbus->xsh, XBT_NULL, backend_path, "state",
-  NULL, "%u", &backend_state) != 1)
-backend_state = XenbusStateUnknown;
+  NULL, "%u", &state) != 1)
+state = XenbusStateUnknown;
 
-if (backend_state == XenbusStateInitialising) {
+if (xs_node_scanf(xenbus->xsh, XBT_NULL, backend_path, "online",
+  NULL, "%u", &online) != 1)
+online = 0;
+
+if (online && state == XenbusStateInitialising) {
 Error *local_err = NULL;
 
 xen_bus_backend_create(xenbus, type, backend[i], backend_path,
@@ -365,9 +370,8 @@ out:
 g_free(domain_path);
 }
 
-static void xen_bus_enumerate(void *opaque)
+static void xen_bus_enumerate(XenBus *xenbus)
 {
-XenBus *xenbus = opaque;
 char **type;
 unsigned int i, n;
 
@@ -385,6 +389,45 @@ static void xen_bus_enumerate(void *opaque)
 free(type);
 }
 
+static void xen_bus_device_cleanup(XenDevice *xendev)
+{
+const char *type = object_get_typename(OBJECT(xendev));
+Error *local_err = NULL;
+
+trace_xen_bus_device_cleanup(type, xendev->name);
+
+g_assert(!xendev->backend_online);
+
+if (!xen_backend_try_device_destroy(xendev, &local_err)) {
+object_unparent(OBJECT(xendev));
+}
+
+if (local_err) {
+error_report_err(local_err);
+}
+}
+
+static void xen_bus_cleanup(XenBus *xenbus)
+{
+XenDevice *xendev, *next;
+
+trace_xen_bus_cleanup();
+
+QLIST_FOREACH_SAFE(xendev, &xenbus->inactive_devices, list, next) {
+g_as

[Qemu-devel] [PATCH v3 0/3] xen: fix a potential crash in xen-bus

2019-09-13 Thread Paul Durrant

This series fixes a potential segfault caused by NotifierList corruption
in xen-bus. The first two patches lay the groundwork and the third
actually fixes the problem.

Paul Durrant (3):
  xen / notify: introduce a new XenWatchList abstraction
  xen: introduce separate XenWatchList for XenDevice objects
  xen: perform XenDevice clean-up in XenBus watch handler

 hw/xen/trace-events  |   9 +-
 hw/xen/xen-bus.c | 277 ---
 include/hw/xen/xen-bus.h |   8 +-
 include/qemu/notify.h|   2 +
 util/notify.c|   5 +
 5 files changed, 220 insertions(+), 81 deletions(-)
---
Cc: Anthony Perard 
Cc: Stefano Stabellini 
-- 
2.20.1.2.gb21ebb6

[Qemu-devel] [PATCH v3 2/3] xen: introduce separate XenWatchList for XenDevice objects

2019-09-13 Thread Paul Durrant

This patch uses the XenWatchList abstraction to add a separate watch list
for each device. This is more scalable than walking a single notifier
list for all watches and is also necessary to implement a bug-fix in a
subsequent patch.

Signed-off-by: Paul Durrant 
Reviewed-by: Anthony Perard 
---
Cc: Stefano Stabellini 
---
 hw/xen/trace-events  |  2 ++
 hw/xen/xen-bus.c | 72 
 include/hw/xen/xen-bus.h |  2 ++
 3 files changed, 62 insertions(+), 14 deletions(-)

diff --git a/hw/xen/trace-events b/hw/xen/trace-events
index ac8d9c20d2..80ce3dafad 100644
--- a/hw/xen/trace-events
+++ b/hw/xen/trace-events
@@ -29,6 +29,8 @@ xen_device_backend_changed(const char *type, char *name) 
"type: %s name: %s"
 xen_device_frontend_state(const char *type, char *name, const char *state) 
"type: %s name: %s -> %s"
 xen_device_frontend_changed(const char *type, char *name) "type: %s name: %s"
 xen_device_unplug(const char *type, char *name) "type: %s name: %s"
+xen_device_add_watch(const char *type, char *name, const char *node, const 
char *key) "type: %s name: %s node: %s key: %s"
+xen_device_remove_watch(const char *type, char *name, const char *node, const 
char *key) "type: %s name: %s node: %s key: %s"
 
 # xen-bus-helper.c
 xs_node_create(const char *node) "%s"
diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index 28efaccff2..810a4e2df3 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -235,11 +235,11 @@ static void watch_list_remove(XenWatchList *watch_list, 
XenWatch *watch,
 
 static XenWatch *xen_bus_add_watch(XenBus *xenbus, const char *node,
const char *key, XenWatchHandler handler,
-   void *opaque, Error **errp)
+   Error **errp)
 {
 trace_xen_bus_add_watch(node, key);
 
-return watch_list_add(xenbus->watch_list, node, key, handler, opaque,
+return watch_list_add(xenbus->watch_list, node, key, handler, xenbus,
   errp);
 }
 
@@ -433,7 +433,7 @@ static void xen_bus_realize(BusState *bus, Error **errp)
 
 xenbus->backend_watch =
 xen_bus_add_watch(xenbus, "", /* domain root node */
-  "backend", xen_bus_enumerate, xenbus, &local_err);
+  "backend", xen_bus_enumerate, &local_err);
 if (local_err) {
 /* This need not be treated as a hard error so don't propagate */
 error_reportf_err(local_err,
@@ -621,6 +621,31 @@ static void xen_device_backend_changed(void *opaque)
 }
 }
 
+static XenWatch *xen_device_add_watch(XenDevice *xendev, const char *node,
+  const char *key,
+  XenWatchHandler handler,
+  Error **errp)
+{
+const char *type = object_get_typename(OBJECT(xendev));
+
+trace_xen_device_add_watch(type, xendev->name, node, key);
+
+return watch_list_add(xendev->watch_list, node, key, handler, xendev,
+  errp);
+}
+
+static void xen_device_remove_watch(XenDevice *xendev, XenWatch *watch,
+Error **errp)
+{
+const char *type = object_get_typename(OBJECT(xendev));
+
+trace_xen_device_remove_watch(type, xendev->name, watch->node,
+  watch->key);
+
+watch_list_remove(xendev->watch_list, watch, errp);
+}
+
+
 static void xen_device_backend_create(XenDevice *xendev, Error **errp)
 {
 XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev)));
@@ -645,9 +670,9 @@ static void xen_device_backend_create(XenDevice *xendev, 
Error **errp)
 }
 
 xendev->backend_state_watch =
-xen_bus_add_watch(xenbus, xendev->backend_path,
-  "state", xen_device_backend_changed,
-  xendev, &local_err);
+xen_device_add_watch(xendev, xendev->backend_path,
+ "state", xen_device_backend_changed,
+ &local_err);
 if (local_err) {
 error_propagate_prepend(errp, local_err,
 "failed to watch backend state: ");
@@ -655,9 +680,9 @@ static void xen_device_backend_create(XenDevice *xendev, 
Error **errp)
 }
 
 xendev->backend_online_watch =
-xen_bus_add_watch(xenbus, xendev->backend_path,
-  "online", xen_device_backend_changed,
-  xendev, &local_err);
+xen_device_add_watch(xendev, xendev->backend_path,
+ "online", xen_device_backend_changed,
+ &local_err);
 if (local_err) {
 error_propagate_prepend(errp, local_err,
 "failed to watch backend online: ");
@@ -671,12 +696,12 @@ static void xen_device_backend_destroy(XenDevice *xendev)
 Error *local_err = NULL;
 
 if (xendev->backend_online_watch) {
-xen_b

[Qemu-devel] [PATCH v1 0/6] Allow memory_region_register_iommu_notifier() to fail

2019-09-13 Thread Eric Auger

This series allows the memory_region_register_iommu_notifier()
to fail. As of now, when a MAP notifier is attempted to be
registered along with SMMUv3, Intel iommu without caching mode
or AMD IOMMU, we exit in the IOMMU MR notify_flag_changed()
callback. In case of VFIO assigned device hotplug, this could be
handled more nicely directly within the VFIO code, simply rejecting
the hotplug without exiting. This is what the series achieves
by handling the memory_region_register_iommu_notifier() returned
value.

Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu/tree/v4.1.0_register_iommu_notifier_fail_v1

History:

Follow-up of "VFIO/SMMUv3: Fail on VFIO/HW nested paging detection"
https://patchew.org/QEMU/20190829090141.21821-1-eric.au...@redhat.com/

Eric Auger (6):
  memory: allow memory_region_register_iommu_notifier() to fail
  vfio/common: Handle memory_region_register_iommu_notifier() failure
  exec: assert on memory_region_register_iommu_notifier() failure
  vhost: assert on memory_region_register_iommu_notifier() failure
  intel_iommu: Let vtd_iommu_notify_flag_changed() fail
  amd_iommu: Let amdvi_iommu_notify_flag_changed() fail

 exec.c|  3 ++-
 hw/arm/smmuv3.c   |  8 +---
 hw/i386/amd_iommu.c   |  9 +
 hw/i386/intel_iommu.c |  9 +
 hw/ppc/spapr_iommu.c  |  7 ---
 hw/vfio/common.c  |  8 ++--
 hw/virtio/vhost.c |  2 +-
 include/exec/memory.h | 18 +-
 memory.c  | 28 ++--
 9 files changed, 59 insertions(+), 33 deletions(-)

-- 
2.20.1

[Qemu-devel] [PATCH v1 1/6] memory: allow memory_region_register_iommu_notifier() to fail

2019-09-13 Thread Eric Auger

Currently, when a notifier is attempted to be registered and its
flags are not supported (especially the MAP one) by the IOMMU MR,
we generally abruptly exit in the IOMMU code. The failure could be
handled more nicely in the caller and especially in the VFIO code.

So let's allow memory_region_register_iommu_notifier() to fail as
well as notify_flag_changed() callback.

All sites implementing the callback are updated. This patch does
not yet remove the exit(1) in the intel_iommu and amd_iommu code.

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3.c   |  8 +---
 hw/i386/amd_iommu.c   |  7 ---
 hw/i386/intel_iommu.c |  7 ---
 hw/ppc/spapr_iommu.c  |  7 ---
 include/exec/memory.h | 18 +-
 memory.c  | 28 ++--
 6 files changed, 48 insertions(+), 27 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index db051dcac8..d7a86fc505 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1469,9 +1469,9 @@ static void smmuv3_class_init(ObjectClass *klass, void 
*data)
 dc->realize = smmu_realize;
 }
 
-static void smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
-   IOMMUNotifierFlag old,
-   IOMMUNotifierFlag new)
+static int smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
+  IOMMUNotifierFlag old,
+  IOMMUNotifierFlag new)
 {
 SMMUDevice *sdev = container_of(iommu, SMMUDevice, iommu);
 SMMUv3State *s3 = sdev->smmu;
@@ -1483,6 +1483,7 @@ static void smmuv3_notify_flag_changed(IOMMUMemoryRegion 
*iommu,
 
 warn_report("SMMUv3 does not support notification on MAP: "
  "device %s will not function properly", pcidev->name);
+return -EINVAL;
 }
 
 if (old == IOMMU_NOTIFIER_NONE) {
@@ -1492,6 +1493,7 @@ static void smmuv3_notify_flag_changed(IOMMUMemoryRegion 
*iommu,
 trace_smmuv3_notify_flag_del(iommu->parent_obj.name);
 QLIST_REMOVE(sdev, next);
 }
+return 0;
 }
 
 static void smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 08884523e2..137ba367db 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1466,9 +1466,9 @@ static const MemoryRegionOps mmio_mem_ops = {
 }
 };
 
-static void amdvi_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
-IOMMUNotifierFlag old,
-IOMMUNotifierFlag new)
+static int amdvi_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
+   IOMMUNotifierFlag old,
+   IOMMUNotifierFlag new)
 {
 AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
 
@@ -1478,6 +1478,7 @@ static void 
amdvi_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
  PCI_FUNC(as->devfn));
 exit(1);
 }
+return 0;
 }
 
 static void amdvi_init(AMDVIState *s)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 75ca6f9c70..7a89ea9ba1 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2921,9 +2921,9 @@ static IOMMUTLBEntry 
vtd_iommu_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
 return iotlb;
 }
 
-static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
-  IOMMUNotifierFlag old,
-  IOMMUNotifierFlag new)
+static int vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
+ IOMMUNotifierFlag old,
+ IOMMUNotifierFlag new)
 {
 VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
 IntelIOMMUState *s = vtd_as->iommu_state;
@@ -2942,6 +2942,7 @@ static void 
vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
 } else if (new == IOMMU_NOTIFIER_NONE) {
 QLIST_REMOVE(vtd_as, next);
 }
+return 0;
 }
 
 static int vtd_post_load(void *opaque, int version_id)
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index e87b3d50f7..8d3b38f90b 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -205,9 +205,9 @@ static int spapr_tce_get_attr(IOMMUMemoryRegion *iommu,
 return -EINVAL;
 }
 
-static void spapr_tce_notify_flag_changed(IOMMUMemoryRegion *iommu,
-  IOMMUNotifierFlag old,
-  IOMMUNotifierFlag new)
+static int spapr_tce_notify_flag_changed(IOMMUMemoryRegion *iommu,
+ IOMMUNotifierFlag old,
+ IOMMUNotifierFlag new)
 {
 struct SpaprTceTable *tbl = container_of(iommu, SpaprTceTable, iommu);
 
@@ -216,6 +216,7 @@ static void spapr_tce_notify_flag_changed(IOMMUMemoryRegion 
*iommu,
 } else if (old

[Qemu-devel] [PATCH v1 4/6] vhost: assert on memory_region_register_iommu_notifier() failure

2019-09-13 Thread Eric Auger

memory_region_register_iommu_notifier now returns an error
in case of failure. Assert in such a case.

Signed-off-by: Eric Auger 
---
 hw/virtio/vhost.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 34accdf615..9b5551fc4d 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -696,7 +696,7 @@ static void vhost_iommu_region_add(MemoryListener *listener,
 iommu->iommu_offset = section->offset_within_address_space -
   section->offset_within_region;
 iommu->hdev = dev;
-memory_region_register_iommu_notifier(section->mr, &iommu->n);
+assert(!memory_region_register_iommu_notifier(section->mr, &iommu->n));
 QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next);
 /* TODO: can replay help performance here? */
 }
-- 
2.20.1

[Qemu-devel] [PATCH v1 6/6] amd_iommu: Let amdvi_iommu_notify_flag_changed() fail

2019-09-13 Thread Eric Auger

In case a MAP notifier is attempted to be registered,
let's simply return an error. This latter now is
handled in the VFIO code.

Signed-off-by: Eric Auger 
---
 hw/i386/amd_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 137ba367db..09dee48fac 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1476,7 +1476,7 @@ static int 
amdvi_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
 error_report("device %02x.%02x.%x requires iommu notifier which is not 
"
  "currently supported", as->bus_num, PCI_SLOT(as->devfn),
  PCI_FUNC(as->devfn));
-exit(1);
+return -EINVAL;
 }
 return 0;
 }
-- 
2.20.1

[Qemu-devel] [PATCH v1 2/6] vfio/common: Handle memory_region_register_iommu_notifier() failure

2019-09-13 Thread Eric Auger

Now memory_region_register_iommu_notifier() is allowed to fail,
let's handle the returned value in vfio_listener_region_add().

This will allow to remove the error handling (exit) in the
IOMMUs that implement a notify_flag_changed() that sometimes
cannot accept the MAP flag.

Signed-off-by: Eric Auger 
---
 hw/vfio/common.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 3e03c495d8..d57d72cfb9 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -630,9 +630,13 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 section->offset_within_region,
 int128_get64(llend),
 iommu_idx);
-QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
 
-memory_region_register_iommu_notifier(section->mr, &giommu->n);
+ret = memory_region_register_iommu_notifier(section->mr, &giommu->n);
+if (ret) {
+g_free(giommu);
+goto fail;
+}
+QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
 memory_region_iommu_replay(giommu->iommu, &giommu->n);
 
 return;
-- 
2.20.1

[Qemu-devel] [Bug 1843852] Re: QEMU does not express a dependency on perl-Test-Harness

2019-09-13 Thread Alex Bennée

Given we require python perhaps the simplest solution would be to re-
write the tap-driver as a python script rather than adding another
configure check?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1843852

Title:
  QEMU does not express a dependency on perl-Test-Harness

Status in QEMU:
  New

Bug description:
  This is a minor thing; in Fedora you can install most of the developer
  dependencies by issuing something like `dnf builddep qemu-kvm` and
  this takes care of just about everything such that you can run
  ./configure and make.

  For "make check" though, configure doesn't catch that you'll need
  perl-Test-Harness; so it fails halfway through the check routine, and
  you'll see this:

  ```
  Can't locate TAP/Parser.pm in @INC (you may need to install the TAP::Parser 
module) (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 
/usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 
/usr/share/perl5) at ./scripts/tap-driver.pl line 30.
  BEGIN failed--compilation aborted at ./scripts/tap-driver.pl line 30.
  make: *** [/home/jhuston/src/qemu/tests/Makefile.include:905: check-unit] 
Error 2
  ```

  I'm not sure how we should express this dependency; it shouldn't be a
  requirement for building, but it IS a dependency for testing. We
  probably ought not let users skip the qapi tests just because they
  don't have the perl requirement met.

  (And, separately, the Fedora package should list this as a builddep,
  but that's not an issue for here.)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1843852/+subscriptions

[Qemu-devel] [PATCH v1 3/6] exec: assert on memory_region_register_iommu_notifier() failure

2019-09-13 Thread Eric Auger

memory_region_register_iommu_notifier now returns an error
in case of failure. Assert in such a case.

Signed-off-by: Eric Auger 
---
 exec.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/exec.c b/exec.c
index 235d6bc883..da30251a2b 100644
--- a/exec.c
+++ b/exec.c
@@ -692,7 +692,8 @@ static void tcg_register_iommu_notifier(CPUState *cpu,
 0,
 HWADDR_MAX,
 iommu_idx);
-memory_region_register_iommu_notifier(notifier->mr, ¬ifier->n);
+assert(!memory_region_register_iommu_notifier(notifier->mr,
+  ¬ifier->n));
 }
 
 if (!notifier->active) {
-- 
2.20.1

[Qemu-devel] [PATCH v1 5/6] intel_iommu: Let vtd_iommu_notify_flag_changed() fail

2019-09-13 Thread Eric Auger

In case a MAP notifier is attempted to be registered without
caching mode, let's simply return an error. This latter now is
handled in the VFIO code.

Signed-off-by: Eric Auger 
---
 hw/i386/intel_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 7a89ea9ba1..2f66d6882c 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2931,7 +2931,7 @@ static int 
vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
 if (!s->caching_mode && new & IOMMU_NOTIFIER_MAP) {
 error_report("We need to set caching-mode=on for intel-iommu to enable 
"
  "device assignment with IOMMU protection.");
-exit(1);
+return -EINVAL;
 }
 
 /* Update per-address-space notifier flags */
-- 
2.20.1

[Qemu-devel] [Bug 1841592] Re: ppc: softfloat float implementation issues

2019-09-13 Thread Alex Bennée

It looks like the test case isn't properly exercising the code that is
likely to be wrong. It sounds like we need a proper comprehensive
testcase for fused operations (along the line of the ARM fcvt test
case). This can probably be a multiarch testcase which we can build for
all the various targets.

** Changed in: qemu
   Status: Fix Committed => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1841592

Title:
  ppc: softfloat float implementation issues

Status in QEMU:
  Incomplete

Bug description:
  Per bug #1841491, Richard Henderson (rth) said:
  > The float test failure is part of a larger problem for target/powerpc
  > in which all float routines are implemented incorrectly. They are all
  > implemented as double operations with rounding to float as a second
  > step. Which not only produces incorrect exceptions, as in this case,
  > but incorrect numerical results from the double rounding.
  >
  > This should probably be split to a separate bug...

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1841592/+subscriptions

Re: [Qemu-devel] [PATCH v4 0/5] qcow2: async handling of fragmented io

2019-09-13 Thread Max Reitz

On 16.08.19 17:30, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> Here is an asynchronous scheme for handling fragmented qcow2
> reads and writes. Both qcow2 read and write functions loops through
> sequential portions of data. The series aim it to parallelize these
> loops iterations.
> It improves performance for fragmented qcow2 images, I've tested it
> as described below.

Thanks, I’ve changed two things:
- Replaced assert((x & (BDRV_SECTOR_SIZE - 1)) == 0) by
  assert(QEMU_IS_ALIGNED(x, BDRV_SECTOR_SIZE)) in patch 3 (conflict with
  “block: Use QEMU_IS_ALIGNED”), and
- Replaced the remaining instance of “qcow2_co_do_pwritev()” by
  “qcow2_co_pwritev_task()” in a comment in patch 4

and applied the series to my block branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [RFC PATCH] virtio-blk: schedule virtio_notify_config to run on main context

2019-09-13 Thread Kevin Wolf

Am 12.09.2019 um 21:51 hat Michael S. Tsirkin geschrieben:
> On Thu, Sep 12, 2019 at 08:19:25PM +0200, Sergio Lopez wrote:
> > Another AioContext-related issue, and this is a tricky one.
> > 
> > Executing a QMP block_resize request for a virtio-blk device running
> > on an iothread may cause a deadlock involving the following mutexes:
> > 
> >  - main thead
> >   * Has acquired: qemu_mutex_global.
> >   * Is trying the acquire: iothread AioContext lock via
> > AIO_WAIT_WHILE (after aio_poll).
> > 
> >  - iothread
> >   * Has acquired: AioContext lock.
> >   * Is trying to acquire: qemu_mutex_global (via
> > virtio_notify_config->prepare_mmio_access).
> 
> Hmm is this really the only case iothread takes qemu mutex?
> If any such access can deadlock, don't we need a generic
> solution? Maybe main thread can drop qemu mutex
> before taking io thread AioContext lock?

The rule is that iothreads must not take the qemu mutex. If they do
(like in this case), it's a bug.

Maybe we could actually assert this in qemu_mutex_lock_iothread()?

> > With this change, virtio_blk_resize checks if it's being called from a
> > coroutine context running on a non-main thread, and if that's the
> > case, creates a new coroutine and schedules it to be run on the main
> > thread.
> > 
> > This works, but means the actual operation is done
> > asynchronously, perhaps opening a window in which a "device_del"
> > operation may fit and remove the VirtIODevice before
> > virtio_notify_config() is executed.
> > 
> > I *think* it shouldn't be possible, as BHs will be processed before
> > any new QMP/monitor command, but I'm open to a different approach.
> > 
> > RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1744955
> > Signed-off-by: Sergio Lopez 
> > ---
> >  hw/block/virtio-blk.c | 25 -
> >  1 file changed, 24 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> > index 18851601cb..c763d071f6 100644
> > --- a/hw/block/virtio-blk.c
> > +++ b/hw/block/virtio-blk.c
> > @@ -16,6 +16,7 @@
> >  #include "qemu/iov.h"
> >  #include "qemu/module.h"
> >  #include "qemu/error-report.h"
> > +#include "qemu/main-loop.h"
> >  #include "trace.h"
> >  #include "hw/block/block.h"
> >  #include "hw/qdev-properties.h"
> > @@ -1086,11 +1087,33 @@ static int virtio_blk_load_device(VirtIODevice 
> > *vdev, QEMUFile *f,
> >  return 0;
> >  }
> >  
> > +static void coroutine_fn virtio_resize_co_entry(void *opaque)
> > +{
> > +VirtIODevice *vdev = opaque;
> > +
> > +assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> > +virtio_notify_config(vdev);
> > +aio_wait_kick();
> > +}
> > +
> >  static void virtio_blk_resize(void *opaque)
> >  {
> >  VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
> > +Coroutine *co;
> >  
> > -virtio_notify_config(vdev);
> > +if (qemu_in_coroutine() &&
> > +qemu_get_current_aio_context() != qemu_get_aio_context()) {
> > +/*
> > + * virtio_notify_config() needs to acquire the global mutex,
> > + * so calling it from a coroutine running on a non-main context
> > + * may cause a deadlock. Instead, create a new coroutine and
> > + * schedule it to be run on the main thread.
> > + */
> > +co = qemu_coroutine_create(virtio_resize_co_entry, vdev);
> > +aio_co_schedule(qemu_get_aio_context(), co);
> > +} else {
> > +virtio_notify_config(vdev);
> > +}
> >  }

Wouldn't a simple BH suffice (aio_bh_schedule_oneshot)? I don't see why
you need a coroutine when you never yield.

The reason why it deadlocks also has nothing to do with whether we are
called from a coroutine or not. The important part is that we're running
in an iothread.

Kevin

Re: [Qemu-devel] [PATCH 2/6] exec.c: remove an unnecessary assert on PHYS_MAP_NODE_NIL in phys_map_node_alloc()

2019-09-13 Thread Paolo Bonzini

On 13/09/19 01:02, Wei Yang wrote:
> It shows PHYS_MAP_NODE_NIL may represents more node the tree could hold.

Which is good, it means the assert can trigger.

> The assert here is not harmful, while maybe we can have a better way to handle
> it?

I don't know... The assert just says "careful, someone treats
PHYS_MAP_NODE_NIL specially!".  It's documentation too.

Paolo

Re: [Qemu-devel] [PATCH v4 0/5] qcow2: async handling of fragmented io

2019-09-13 Thread Vladimir Sementsov-Ogievskiy

13.09.2019 11:58, Max Reitz wrote:
> On 16.08.19 17:30, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> Here is an asynchronous scheme for handling fragmented qcow2
>> reads and writes. Both qcow2 read and write functions loops through
>> sequential portions of data. The series aim it to parallelize these
>> loops iterations.
>> It improves performance for fragmented qcow2 images, I've tested it
>> as described below.
> 
> Thanks, I’ve changed two things:
> - Replaced assert((x & (BDRV_SECTOR_SIZE - 1)) == 0) by
>assert(QEMU_IS_ALIGNED(x, BDRV_SECTOR_SIZE)) in patch 3 (conflict with
>“block: Use QEMU_IS_ALIGNED”), and
> - Replaced the remaining instance of “qcow2_co_do_pwritev()” by
>“qcow2_co_pwritev_task()” in a comment in patch 4
> 
> and applied the series to my block branch:
> 
> https://git.xanclic.moe/XanClic/qemu/commits/branch/block
> 
> Max
> 

Thank you!!!

-- 
Best regards,
Vladimir

[Qemu-devel] [PATCH v2] target/s390x/kvm: Officially require at least kernel 3.15

2019-09-13 Thread Thomas Huth

Since QEMU v2.10, the KVM acceleration does not work on older kernels
anymore since the code accidentally requires the KVM_CAP_DEVICE_CTRL
capability now - it should have been optional instead.
Instead of fixing the bug, we asked in the ChangeLog of QEMU 2.11 - 3.0
that people should speak up if they still need support of QEMU running
with KVM on older kernels, but seems like nobody really complained.
Thus let's make this official now and turn it into a proper error
message, telling the users to use at least kernel 3.15 now.

Signed-off-by: Thomas Huth 
---
 v2: Remove also the entry in trace-events

 hw/intc/s390_flic_kvm.c | 6 --
 hw/intc/trace-events| 1 -
 target/s390x/kvm.c  | 7 +++
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/intc/s390_flic_kvm.c b/hw/intc/s390_flic_kvm.c
index 819aa5e198..cedccba8a9 100644
--- a/hw/intc/s390_flic_kvm.c
+++ b/hw/intc/s390_flic_kvm.c
@@ -589,12 +589,6 @@ static void kvm_s390_flic_realize(DeviceState *dev, Error 
**errp)
 goto fail;
 }
 flic_state->fd = -1;
-if (!kvm_check_extension(kvm_state, KVM_CAP_DEVICE_CTRL)) {
-error_setg_errno(&errp_local, errno, "KVM is missing capability"
- " KVM_CAP_DEVICE_CTRL");
-trace_flic_no_device_api(errno);
-goto fail;
-}
 
 cd.type = KVM_DEV_TYPE_FLIC;
 ret = kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd);
diff --git a/hw/intc/trace-events b/hw/intc/trace-events
index 90c9d07c1a..719f46b516 100644
--- a/hw/intc/trace-events
+++ b/hw/intc/trace-events
@@ -75,7 +75,6 @@ xics_ics_simple_eoi(int nr) "ics_eoi: irq 0x%x"
 
 # s390_flic_kvm.c
 flic_create_device(int err) "flic: create device failed %d"
-flic_no_device_api(int err) "flic: no Device Contral API support %d"
 flic_reset_failed(int err) "flic: reset failed %d"
 
 # s390_flic.c
diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
index cea71ac7c3..97a662ad0e 100644
--- a/target/s390x/kvm.c
+++ b/target/s390x/kvm.c
@@ -316,6 +316,13 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 MachineClass *mc = MACHINE_GET_CLASS(ms);
 
 mc->default_cpu_type = S390_CPU_TYPE_NAME("host");
+
+if (!kvm_check_extension(kvm_state, KVM_CAP_DEVICE_CTRL)) {
+error_report("KVM is missing capability KVM_CAP_DEVICE_CTRL - "
+ "please use kernel 3.15 or newer");
+return -1;
+}
+
 cap_sync_regs = kvm_check_extension(s, KVM_CAP_SYNC_REGS);
 cap_async_pf = kvm_check_extension(s, KVM_CAP_ASYNC_PF);
 cap_mem_op = kvm_check_extension(s, KVM_CAP_S390_MEM_OP);
-- 
2.18.1

Re: [Qemu-devel] [PATCH v2] target/s390x/kvm: Officially require at least kernel 3.15

2019-09-13 Thread David Hildenbrand

I'd use the title "s390x/kvm: ..."

Reviewed-by: David Hildenbrand 

On 13.09.19 11:14, Thomas Huth wrote:
> Since QEMU v2.10, the KVM acceleration does not work on older kernels
> anymore since the code accidentally requires the KVM_CAP_DEVICE_CTRL
> capability now - it should have been optional instead.
> Instead of fixing the bug, we asked in the ChangeLog of QEMU 2.11 - 3.0
> that people should speak up if they still need support of QEMU running
> with KVM on older kernels, but seems like nobody really complained.
> Thus let's make this official now and turn it into a proper error
> message, telling the users to use at least kernel 3.15 now.
> 
> Signed-off-by: Thomas Huth 
> ---
>  v2: Remove also the entry in trace-events
> 
>  hw/intc/s390_flic_kvm.c | 6 --
>  hw/intc/trace-events| 1 -
>  target/s390x/kvm.c  | 7 +++
>  3 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/intc/s390_flic_kvm.c b/hw/intc/s390_flic_kvm.c
> index 819aa5e198..cedccba8a9 100644
> --- a/hw/intc/s390_flic_kvm.c
> +++ b/hw/intc/s390_flic_kvm.c
> @@ -589,12 +589,6 @@ static void kvm_s390_flic_realize(DeviceState *dev, 
> Error **errp)
>  goto fail;
>  }
>  flic_state->fd = -1;
> -if (!kvm_check_extension(kvm_state, KVM_CAP_DEVICE_CTRL)) {
> -error_setg_errno(&errp_local, errno, "KVM is missing capability"
> - " KVM_CAP_DEVICE_CTRL");
> -trace_flic_no_device_api(errno);
> -goto fail;
> -}
>  
>  cd.type = KVM_DEV_TYPE_FLIC;
>  ret = kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd);
> diff --git a/hw/intc/trace-events b/hw/intc/trace-events
> index 90c9d07c1a..719f46b516 100644
> --- a/hw/intc/trace-events
> +++ b/hw/intc/trace-events
> @@ -75,7 +75,6 @@ xics_ics_simple_eoi(int nr) "ics_eoi: irq 0x%x"
>  
>  # s390_flic_kvm.c
>  flic_create_device(int err) "flic: create device failed %d"
> -flic_no_device_api(int err) "flic: no Device Contral API support %d"
>  flic_reset_failed(int err) "flic: reset failed %d"
>  
>  # s390_flic.c
> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
> index cea71ac7c3..97a662ad0e 100644
> --- a/target/s390x/kvm.c
> +++ b/target/s390x/kvm.c
> @@ -316,6 +316,13 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>  MachineClass *mc = MACHINE_GET_CLASS(ms);
>  
>  mc->default_cpu_type = S390_CPU_TYPE_NAME("host");
> +
> +if (!kvm_check_extension(kvm_state, KVM_CAP_DEVICE_CTRL)) {
> +error_report("KVM is missing capability KVM_CAP_DEVICE_CTRL - "
> + "please use kernel 3.15 or newer");
> +return -1;
> +}
> +
>  cap_sync_regs = kvm_check_extension(s, KVM_CAP_SYNC_REGS);
>  cap_async_pf = kvm_check_extension(s, KVM_CAP_ASYNC_PF);
>  cap_mem_op = kvm_check_extension(s, KVM_CAP_S390_MEM_OP);
> 


-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PULL] RISC-V Patches for the 4.2 Soft Freeze, Part 1

2019-09-13 Thread Peter Maydell

On Wed, 11 Sep 2019 at 09:24, Palmer Dabbelt  wrote:
>
> The following changes since commit 89ea03a7dc83ca36b670ba7f787802791fcb04b1:
>
>   Merge remote-tracking branch 
> 'remotes/huth-gitlab/tags/m68k-pull-2019-09-07' into staging (2019-09-09 
> 09:48:34 +0100)
>
> are available in the Git repository at:
>
>   git://github.com/palmer-dabbelt/qemu.git tags/riscv-for-master-4.2-sf1
>
> for you to fetch changes up to 1b2d0961bfaaa2db3a237f53273527b6c5e3498a:
>
>   target/riscv: Use TB_FLAGS_MSTATUS_FS for floating point (2019-09-10 
> 06:08:42 -0700)
>
> 
> RISC-V Patches for the 4.2 Soft Freeze, Part 1
>
> This contains quite a few patches that I'd like to target for 4.2.
> They're mostly emulation fixes for the sifive_u board, which now much
> more closely matches the hardware and can therefor run the same fireware
> as what gets loaded onto the board.  Additional user-visible
> improvements include:
>
> * support for loading initrd files from the command line into Linux, via
>   /chosen/linux,initrd-{start,end} device tree nodes.
> * The conversion of LOG_TRACE to trace events.
> * The addition of clock DT nodes for our uart and ethernet.
>
> This also includes some preliminary work for the H extension patches,
> but does not include the H extension patches as I haven't had time to
> review them yet.
>
> This passes my OE boot test on 32-bit and 64-bit virt machines, as well
> as a 64-bit upstream Linux boot on the sifive_u machine.
>

Hi; this fails 'make check' on all platforms:

MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
QTEST_QEMU_BINARY=riscv32-softmmu/qemu-system-riscv32
QTEST_QEMU_IMG=qemu-img t
ests/qom-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl
--test-name="qom-test"
PASS 1 qom-test /riscv32/qom/spike
PASS 2 qom-test /riscv32/qom/virt
PASS 3 qom-test /riscv32/qom/none
PASS 4 qom-test /riscv32/qom/spike_v1.10
qemu-system-riscv32: Invalid SMP CPUs 1. The min CPUs supported by
machine 'sifive_u' is 2
socket_accept failed: Resource temporarily unavailable
**
ERROR:/home/petmay01/linaro/qemu-for-merges/tests/libqtest.c:266:qtest_init_without_qmp_handshake:
assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)
/home/petmay01/linaro/qemu-for-merges/tests/libqtest.c:135:
kill_qemu() tried to terminate QEMU process but encountered exit
status 1
ERROR - Bail out!
ERROR:/home/petmay01/linaro/qemu-for-merges/tests/libqtest.c:266:qtest_init_without_qmp_handshake:
assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)
Aborted (core dumped)
/home/petmay01/linaro/qemu-for-merges/tests/Makefile.include:900:
recipe for target 'check-qtest-riscv32' failed

thanks
-- PMM

Re: [Qemu-devel] [PATCH v1 3/6] exec: assert on memory_region_register_iommu_notifier() failure

2019-09-13 Thread Andrew Jones

On Fri, Sep 13, 2019 at 10:36:12AM +0200, Eric Auger wrote:
> memory_region_register_iommu_notifier now returns an error
> in case of failure. Assert in such a case.
> 
> Signed-off-by: Eric Auger 
> ---
>  exec.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/exec.c b/exec.c
> index 235d6bc883..da30251a2b 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -692,7 +692,8 @@ static void tcg_register_iommu_notifier(CPUState *cpu,
>  0,
>  HWADDR_MAX,
>  iommu_idx);
> -memory_region_register_iommu_notifier(notifier->mr, ¬ifier->n);
> +assert(!memory_region_register_iommu_notifier(notifier->mr,
> +  ¬ifier->n));

 ret = memory_region_register_iommu_notifier(notifier->mr, ¬ifier->n);
 assert(!ret);

to avoid functions with side effects being called inside assert()'s, as
assert()'s could be compiled as no-ops.

Same comment for next patch.

Thanks,
drew

>  }
>  
>  if (!notifier->active) {
> -- 
> 2.20.1
> 
>

[Qemu-devel] [PATCH v2 0/2] ati: fix ati_cursor_define bug.

2019-09-13 Thread Gerd Hoffmann




Gerd Hoffmann (2):
  vga: move access helpers to separate include file
  ati: use vga_read_byte in ati_cursor_define

 hw/display/vga-access.h  | 49 
 hw/display/vga-helpers.h | 26 -
 hw/display/ati.c | 19 
 hw/display/vga.c |  1 +
 4 files changed, 60 insertions(+), 35 deletions(-)
 create mode 100644 hw/display/vga-access.h

-- 
2.18.1

[Qemu-devel] [PATCH v2 1/2] vga: move access helpers to separate include file

2019-09-13 Thread Gerd Hoffmann

Signed-off-by: Gerd Hoffmann 
---
 hw/display/vga-access.h  | 49 
 hw/display/vga-helpers.h | 26 -
 hw/display/vga.c |  1 +
 3 files changed, 50 insertions(+), 26 deletions(-)
 create mode 100644 hw/display/vga-access.h

diff --git a/hw/display/vga-access.h b/hw/display/vga-access.h
new file mode 100644
index ..c0fbd9958b2e
--- /dev/null
+++ b/hw/display/vga-access.h
@@ -0,0 +1,49 @@
+/*
+ * QEMU VGA Emulator templates
+ *
+ * Copyright (c) 2003 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+static inline uint8_t vga_read_byte(VGACommonState *vga, uint32_t addr)
+{
+return vga->vram_ptr[addr & vga->vbe_size_mask];
+}
+
+static inline uint16_t vga_read_word_le(VGACommonState *vga, uint32_t addr)
+{
+uint32_t offset = addr & vga->vbe_size_mask & ~1;
+uint16_t *ptr = (uint16_t *)(vga->vram_ptr + offset);
+return lduw_le_p(ptr);
+}
+
+static inline uint16_t vga_read_word_be(VGACommonState *vga, uint32_t addr)
+{
+uint32_t offset = addr & vga->vbe_size_mask & ~1;
+uint16_t *ptr = (uint16_t *)(vga->vram_ptr + offset);
+return lduw_be_p(ptr);
+}
+
+static inline uint32_t vga_read_dword_le(VGACommonState *vga, uint32_t addr)
+{
+uint32_t offset = addr & vga->vbe_size_mask & ~3;
+uint32_t *ptr = (uint32_t *)(vga->vram_ptr + offset);
+return ldl_le_p(ptr);
+}
diff --git a/hw/display/vga-helpers.h b/hw/display/vga-helpers.h
index 5a752b3f9efd..10e9cfd40a04 100644
--- a/hw/display/vga-helpers.h
+++ b/hw/display/vga-helpers.h
@@ -95,32 +95,6 @@ static void vga_draw_glyph9(uint8_t *d, int linesize,
 } while (--h);
 }
 
-static inline uint8_t vga_read_byte(VGACommonState *vga, uint32_t addr)
-{
-return vga->vram_ptr[addr & vga->vbe_size_mask];
-}
-
-static inline uint16_t vga_read_word_le(VGACommonState *vga, uint32_t addr)
-{
-uint32_t offset = addr & vga->vbe_size_mask & ~1;
-uint16_t *ptr = (uint16_t *)(vga->vram_ptr + offset);
-return lduw_le_p(ptr);
-}
-
-static inline uint16_t vga_read_word_be(VGACommonState *vga, uint32_t addr)
-{
-uint32_t offset = addr & vga->vbe_size_mask & ~1;
-uint16_t *ptr = (uint16_t *)(vga->vram_ptr + offset);
-return lduw_be_p(ptr);
-}
-
-static inline uint32_t vga_read_dword_le(VGACommonState *vga, uint32_t addr)
-{
-uint32_t offset = addr & vga->vbe_size_mask & ~3;
-uint32_t *ptr = (uint32_t *)(vga->vram_ptr + offset);
-return ldl_le_p(ptr);
-}
-
 /*
  * 4 color mode
  */
diff --git a/hw/display/vga.c b/hw/display/vga.c
index 573d223d46f0..82ebe5361096 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -1009,6 +1009,7 @@ void vga_mem_writeb(VGACommonState *s, hwaddr addr, 
uint32_t val)
 typedef void vga_draw_line_func(VGACommonState *s1, uint8_t *d,
 uint32_t srcaddr, int width);
 
+#include "vga-access.h"
 #include "vga-helpers.h"
 
 /* return true if the palette was modified */
-- 
2.18.1

[Qemu-devel] [PATCH v2 2/2] ati: use vga_read_byte in ati_cursor_define

2019-09-13 Thread Gerd Hoffmann

This makes sure reads are confined to vga video memory.

v2: fix ati_cursor_draw_line too.

Reported-by: xu hang 
Signed-off-by: Gerd Hoffmann 
---
 hw/display/ati.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/hw/display/ati.c b/hw/display/ati.c
index 8f940eee221a..f6ae27c0c710 100644
--- a/hw/display/ati.c
+++ b/hw/display/ati.c
@@ -19,6 +19,7 @@
 #include "qemu/osdep.h"
 #include "ati_int.h"
 #include "ati_regs.h"
+#include "vga-access.h"
 #include "hw/qdev-properties.h"
 #include "vga_regs.h"
 #include "qemu/log.h"
@@ -135,19 +136,19 @@ static void ati_vga_switch_mode(ATIVGAState *s)
 static void ati_cursor_define(ATIVGAState *s)
 {
 uint8_t data[1024];
-uint8_t *src;
+unsigned srcoff;
 int i, j, idx = 0;
 
 if ((s->regs.cur_offset & BIT(31)) || s->cursor_guest_mode) {
 return; /* Do not update cursor if locked or rendered by guest */
 }
 /* FIXME handle cur_hv_offs correctly */
-src = s->vga.vram_ptr + s->regs.cur_offset -
-  (s->regs.cur_hv_offs >> 16) - (s->regs.cur_hv_offs & 0x) * 16;
+srcoff = s->regs.cur_offset -
+(s->regs.cur_hv_offs >> 16) - (s->regs.cur_hv_offs & 0x) * 16;
 for (i = 0; i < 64; i++) {
 for (j = 0; j < 8; j++, idx++) {
-data[idx] = src[i * 16 + j];
-data[512 + idx] = src[i * 16 + j + 8];
+data[idx] = vga_read_byte(&s->vga, srcoff + i * 16 + j);
+data[512 + idx] = vga_read_byte(&s->vga, srcoff + i * 16 + j + 8);
 }
 }
 if (!s->cursor) {
@@ -189,7 +190,7 @@ static void ati_cursor_invalidate(VGACommonState *vga)
 static void ati_cursor_draw_line(VGACommonState *vga, uint8_t *d, int scr_y)
 {
 ATIVGAState *s = container_of(vga, ATIVGAState, vga);
-uint8_t *src;
+uint32_t srcoff;
 uint32_t *dp = (uint32_t *)d;
 int i, j, h;
 
@@ -199,13 +200,13 @@ static void ati_cursor_draw_line(VGACommonState *vga, 
uint8_t *d, int scr_y)
 return;
 }
 /* FIXME handle cur_hv_offs correctly */
-src = s->vga.vram_ptr + s->cursor_offset + (scr_y - vga->hw_cursor_y) * 16;
+srcoff = s->cursor_offset + (scr_y - vga->hw_cursor_y) * 16;
 dp = &dp[vga->hw_cursor_x];
 h = ((s->regs.crtc_h_total_disp >> 16) + 1) * 8;
 for (i = 0; i < 8; i++) {
 uint32_t color;
-uint8_t abits = src[i];
-uint8_t xbits = src[i + 8];
+uint8_t abits = vga_read_byte(vga, srcoff + i);
+uint8_t xbits = vga_read_byte(vga, srcoff + i);
 for (j = 0; j < 8; j++, abits <<= 1, xbits <<= 1) {
 if (abits & BIT(7)) {
 if (xbits & BIT(7)) {
-- 
2.18.1

Re: [Qemu-devel] [RFC PATCH] virtio-blk: schedule virtio_notify_config to run on main context

2019-09-13 Thread Sergio Lopez


Kevin Wolf  writes:

> Am 12.09.2019 um 21:51 hat Michael S. Tsirkin geschrieben:
>> On Thu, Sep 12, 2019 at 08:19:25PM +0200, Sergio Lopez wrote:
>> > Another AioContext-related issue, and this is a tricky one.
>> > 
>> > Executing a QMP block_resize request for a virtio-blk device running
>> > on an iothread may cause a deadlock involving the following mutexes:
>> > 
>> >  - main thead
>> >   * Has acquired: qemu_mutex_global.
>> >   * Is trying the acquire: iothread AioContext lock via
>> > AIO_WAIT_WHILE (after aio_poll).
>> > 
>> >  - iothread
>> >   * Has acquired: AioContext lock.
>> >   * Is trying to acquire: qemu_mutex_global (via
>> > virtio_notify_config->prepare_mmio_access).
>> 
>> Hmm is this really the only case iothread takes qemu mutex?
>> If any such access can deadlock, don't we need a generic
>> solution? Maybe main thread can drop qemu mutex
>> before taking io thread AioContext lock?
>
> The rule is that iothreads must not take the qemu mutex. If they do
> (like in this case), it's a bug.
>
> Maybe we could actually assert this in qemu_mutex_lock_iothread()?
>
>> > With this change, virtio_blk_resize checks if it's being called from a
>> > coroutine context running on a non-main thread, and if that's the
>> > case, creates a new coroutine and schedules it to be run on the main
>> > thread.
>> > 
>> > This works, but means the actual operation is done
>> > asynchronously, perhaps opening a window in which a "device_del"
>> > operation may fit and remove the VirtIODevice before
>> > virtio_notify_config() is executed.
>> > 
>> > I *think* it shouldn't be possible, as BHs will be processed before
>> > any new QMP/monitor command, but I'm open to a different approach.
>> > 
>> > RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1744955
>> > Signed-off-by: Sergio Lopez 
>> > ---
>> >  hw/block/virtio-blk.c | 25 -
>> >  1 file changed, 24 insertions(+), 1 deletion(-)
>> > 
>> > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
>> > index 18851601cb..c763d071f6 100644
>> > --- a/hw/block/virtio-blk.c
>> > +++ b/hw/block/virtio-blk.c
>> > @@ -16,6 +16,7 @@
>> >  #include "qemu/iov.h"
>> >  #include "qemu/module.h"
>> >  #include "qemu/error-report.h"
>> > +#include "qemu/main-loop.h"
>> >  #include "trace.h"
>> >  #include "hw/block/block.h"
>> >  #include "hw/qdev-properties.h"
>> > @@ -1086,11 +1087,33 @@ static int virtio_blk_load_device(VirtIODevice 
>> > *vdev, QEMUFile *f,
>> >  return 0;
>> >  }
>> >  
>> > +static void coroutine_fn virtio_resize_co_entry(void *opaque)
>> > +{
>> > +VirtIODevice *vdev = opaque;
>> > +
>> > +assert(qemu_get_current_aio_context() == qemu_get_aio_context());
>> > +virtio_notify_config(vdev);
>> > +aio_wait_kick();
>> > +}
>> > +
>> >  static void virtio_blk_resize(void *opaque)
>> >  {
>> >  VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
>> > +Coroutine *co;
>> >  
>> > -virtio_notify_config(vdev);
>> > +if (qemu_in_coroutine() &&
>> > +qemu_get_current_aio_context() != qemu_get_aio_context()) {
>> > +/*
>> > + * virtio_notify_config() needs to acquire the global mutex,
>> > + * so calling it from a coroutine running on a non-main context
>> > + * may cause a deadlock. Instead, create a new coroutine and
>> > + * schedule it to be run on the main thread.
>> > + */
>> > +co = qemu_coroutine_create(virtio_resize_co_entry, vdev);
>> > +aio_co_schedule(qemu_get_aio_context(), co);
>> > +} else {
>> > +virtio_notify_config(vdev);
>> > +}
>> >  }
>
> Wouldn't a simple BH suffice (aio_bh_schedule_oneshot)? I don't see why
> you need a coroutine when you never yield.

You're right, that's actually simpler, haven't thought of it.

Do you see any drawbacks or should I send a non-RFC fixed version of
this patch?

> The reason why it deadlocks also has nothing to do with whether we are
> called from a coroutine or not. The important part is that we're running
> in an iothread.
>
> Kevin



signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH] blockdev: avoid acquiring AioContext lock twice at do_drive_backup()

2019-09-13 Thread Sergio Lopez


Max Reitz  writes:

> On 12.09.19 18:16, Sergio Lopez wrote:
>> do_drive_backup() acquires the AioContext lock of the corresponding
>> BlockDriverState. This is not a problem when it's called from
>> qmp_drive_backup(), but drive_backup_prepare() also acquires the lock
>> before calling it.
>> 
>> This change adds a BlockDriverState argument to do_drive_backup(),
>> which is used to signal that the context lock is already acquired and
>> to save a couple of redundant calls.
>
> But those redundant calls don’t really hurt (it’s just bdrv_lookup_bs(),
> as far as I can tell).  Wouldn’t it be simpler to just release the
> context lock in drive_backup_prepare() before calling do_drive_backup()?
>  The BDS is drained anyway.

Redundant calls rarely hurt, they're just redundant ;-)

> On top of that, do_backup_common() calls bdrv_try_set_aio_context() to
> bring the target into the source’s AioContext.  However, this function
> must be called with the old AioContext held, and the new context not held.

Is this documented somewhere? I see nothing in the function declaration
nor definition.

I'm starting to get the feeling that the block layer is riddled with
unwritten rules and assumptions that makes every change a lot harder
than it should be.

> Currently, it’s called exactly the other way around: With the new
> context held, but the old one not held.
>
> So I think it indeed actually makes more sense to release the AioContext
> before calling do_drive_backup(), and to move the
> bdrv_try_set_aio_context() call for target_bs to the callers of
> do_backup_common() (where they have not yet taken the AioContext lock).

OK. I see this also happens in external_snapshot_prepare() and
qmp_drive_mirror() too. I guess we should fix these too.

In qmp_drive_mirror(), would it be safe to delay the acquisition of any
context until just before the blockdev_mirror_common()?

> Max
>
>> Signed-off-by: Sergio Lopez 
>> ---
>>  blockdev.c | 54 ++
>>  1 file changed, 38 insertions(+), 16 deletions(-)
>> 
>> diff --git a/blockdev.c b/blockdev.c
>> index fbef6845c8..0cc6c69ceb 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -1762,8 +1762,10 @@ typedef struct DriveBackupState {
>>  BlockJob *job;
>>  } DriveBackupState;
>>  
>> -static BlockJob *do_drive_backup(DriveBackup *backup, JobTxn *txn,
>> -Error **errp);
>> +static BlockJob *do_drive_backup(DriveBackup *backup,
>> + BlockDriverState *backup_bs,
>> + JobTxn *txn,
>> + Error **errp);
>>  
>>  static void drive_backup_prepare(BlkActionState *common, Error **errp)
>>  {
>> @@ -1781,6 +1783,11 @@ static void drive_backup_prepare(BlkActionState 
>> *common, Error **errp)
>>  return;
>>  }
>>  
>> +if (!bs->drv) {
>> +error_setg(errp, "Device has no medium");
>> +return;
>> +}
>> +
>>  aio_context = bdrv_get_aio_context(bs);
>>  aio_context_acquire(aio_context);
>>  
>> @@ -1789,7 +1796,9 @@ static void drive_backup_prepare(BlkActionState 
>> *common, Error **errp)
>>  
>>  state->bs = bs;
>>  
>> -state->job = do_drive_backup(backup, common->block_job_txn, &local_err);
>> +state->job = do_drive_backup(backup, bs,
>> + common->block_job_txn,
>> + &local_err);
>>  if (local_err) {
>>  error_propagate(errp, local_err);
>>  goto out;
>> @@ -3607,7 +3616,9 @@ static BlockJob *do_backup_common(BackupCommon *backup,
>>  return job;
>>  }
>>  
>> -static BlockJob *do_drive_backup(DriveBackup *backup, JobTxn *txn,
>> +static BlockJob *do_drive_backup(DriveBackup *backup,
>> + BlockDriverState *backup_bs,
>> + JobTxn *txn,
>>   Error **errp)
>>  {
>>  BlockDriverState *bs;
>> @@ -3625,18 +3636,27 @@ static BlockJob *do_drive_backup(DriveBackup 
>> *backup, JobTxn *txn,
>>  backup->mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
>>  }
>>  
>> -bs = bdrv_lookup_bs(backup->device, backup->device, errp);
>> -if (!bs) {
>> -return NULL;
>> -}
>> +if (backup_bs) {
>> +bs = backup_bs;
>> +/*
>> + * If the caller passes us a BDS, we assume it has already
>> + * acquired the context lock.
>> + */
>> +aio_context = bdrv_get_aio_context(bs);
>> +} else {
>> +bs = bdrv_lookup_bs(backup->device, backup->device, errp);
>> +if (!bs) {
>> +return NULL;
>> +}
>>  
>> -if (!bs->drv) {
>> -error_setg(errp, "Device has no medium");
>> -return NULL;
>> -}
>> +if (!bs->drv) {
>> +error_setg(errp, "Device has no medium");
>> +return NULL;
>> +}
>>  
>> -aio_context = bdrv_get_aio_context(bs);
>> -aio_contex

Re: [Qemu-devel] [RFC PATCH] virtio-blk: schedule virtio_notify_config to run on main context

2019-09-13 Thread Kevin Wolf

Am 13.09.2019 um 11:28 hat Sergio Lopez geschrieben:
> 
> Kevin Wolf  writes:
> 
> > Am 12.09.2019 um 21:51 hat Michael S. Tsirkin geschrieben:
> >> On Thu, Sep 12, 2019 at 08:19:25PM +0200, Sergio Lopez wrote:
> >> > Another AioContext-related issue, and this is a tricky one.
> >> > 
> >> > Executing a QMP block_resize request for a virtio-blk device running
> >> > on an iothread may cause a deadlock involving the following mutexes:
> >> > 
> >> >  - main thead
> >> >   * Has acquired: qemu_mutex_global.
> >> >   * Is trying the acquire: iothread AioContext lock via
> >> > AIO_WAIT_WHILE (after aio_poll).
> >> > 
> >> >  - iothread
> >> >   * Has acquired: AioContext lock.
> >> >   * Is trying to acquire: qemu_mutex_global (via
> >> > virtio_notify_config->prepare_mmio_access).
> >> 
> >> Hmm is this really the only case iothread takes qemu mutex?
> >> If any such access can deadlock, don't we need a generic
> >> solution? Maybe main thread can drop qemu mutex
> >> before taking io thread AioContext lock?
> >
> > The rule is that iothreads must not take the qemu mutex. If they do
> > (like in this case), it's a bug.
> >
> > Maybe we could actually assert this in qemu_mutex_lock_iothread()?
> >
> >> > With this change, virtio_blk_resize checks if it's being called from a
> >> > coroutine context running on a non-main thread, and if that's the
> >> > case, creates a new coroutine and schedules it to be run on the main
> >> > thread.
> >> > 
> >> > This works, but means the actual operation is done
> >> > asynchronously, perhaps opening a window in which a "device_del"
> >> > operation may fit and remove the VirtIODevice before
> >> > virtio_notify_config() is executed.
> >> > 
> >> > I *think* it shouldn't be possible, as BHs will be processed before
> >> > any new QMP/monitor command, but I'm open to a different approach.
> >> > 
> >> > RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1744955
> >> > Signed-off-by: Sergio Lopez 
> >> > ---
> >> >  hw/block/virtio-blk.c | 25 -
> >> >  1 file changed, 24 insertions(+), 1 deletion(-)
> >> > 
> >> > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> >> > index 18851601cb..c763d071f6 100644
> >> > --- a/hw/block/virtio-blk.c
> >> > +++ b/hw/block/virtio-blk.c
> >> > @@ -16,6 +16,7 @@
> >> >  #include "qemu/iov.h"
> >> >  #include "qemu/module.h"
> >> >  #include "qemu/error-report.h"
> >> > +#include "qemu/main-loop.h"
> >> >  #include "trace.h"
> >> >  #include "hw/block/block.h"
> >> >  #include "hw/qdev-properties.h"
> >> > @@ -1086,11 +1087,33 @@ static int virtio_blk_load_device(VirtIODevice 
> >> > *vdev, QEMUFile *f,
> >> >  return 0;
> >> >  }
> >> >  
> >> > +static void coroutine_fn virtio_resize_co_entry(void *opaque)
> >> > +{
> >> > +VirtIODevice *vdev = opaque;
> >> > +
> >> > +assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> >> > +virtio_notify_config(vdev);
> >> > +aio_wait_kick();
> >> > +}
> >> > +
> >> >  static void virtio_blk_resize(void *opaque)
> >> >  {
> >> >  VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
> >> > +Coroutine *co;
> >> >  
> >> > -virtio_notify_config(vdev);
> >> > +if (qemu_in_coroutine() &&
> >> > +qemu_get_current_aio_context() != qemu_get_aio_context()) {
> >> > +/*
> >> > + * virtio_notify_config() needs to acquire the global mutex,
> >> > + * so calling it from a coroutine running on a non-main context
> >> > + * may cause a deadlock. Instead, create a new coroutine and
> >> > + * schedule it to be run on the main thread.
> >> > + */
> >> > +co = qemu_coroutine_create(virtio_resize_co_entry, vdev);
> >> > +aio_co_schedule(qemu_get_aio_context(), co);
> >> > +} else {
> >> > +virtio_notify_config(vdev);
> >> > +}
> >> >  }
> >
> > Wouldn't a simple BH suffice (aio_bh_schedule_oneshot)? I don't see why
> > you need a coroutine when you never yield.
> 
> You're right, that's actually simpler, haven't thought of it.
> 
> Do you see any drawbacks or should I send a non-RFC fixed version of
> this patch?

Sending a fixed non-RFC version sounds good to me.

Kevin


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH V2 1/2] block/nfs: tear down aio before nfs_close

2019-09-13 Thread Max Reitz

On 10.09.19 17:41, Peter Lieven wrote:
> nfs_close is a sync call from libnfs and has its own event
> handler polling on the nfs FD. Avoid that both QEMU and libnfs
> are intefering here.
> 
> CC: qemu-sta...@nongnu.org
> Signed-off-by: Peter Lieven 
> ---
>  block/nfs.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)

I’ve just seen that Kevin has already included the second patch (in its
v1) in his pull request.

So all that I can do is take this patch, which sounds good to me,
especially since Ronnie has agreed that we should remove our FD handler
there.

(So I’ve rebased the patch on top of Kevin’s pull request, and I’ve
taken it to my block branch.)

Max

signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [RFC v3 0/3] KVM/ARM: Fix >256 vcpus

2019-09-13 Thread Eric Auger

Since 4.18, KVM/ARM exposes a KVM_MAX_VCPUS equal to 512. However it was
reported [1] that a VM with more than 256 vcpus cannot be launched. 5.4
is about to fix the situation with 2 patches:
- one upgrade of the KVM_IRQ_LINE API [2] supporting a vcpu id encoded
  on 12 bits,
- the reduction of KVM IO devices consumed by each GICv3 redistributor [3]

This series uses the new KVM_IRQ_LINE API and also checks the associated
capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2) in machvirt.

Without the series, as soon as the -smp arguments exceeds 256, QEMU exits
with "kvm_set_irq: Invalid argument".

Best Regards

Eric

References:
[1] Can we boot a 512U kvm guest?
https://patchwork.kernel.org/patch/11091501/
[2] [PATCH] KVM: arm/arm64: vgic: Allow more than 256 vcpus for KVM_IRQ_LINE
https://patchwork.kernel.org/patch/11099609/
[3] [PATCH] KVM: arm/arm64: vgic: Use a single IO device per redistributor
https://patchwork.kernel.org/patch/2141/

This series can be found at:
https://github.com/eauger/qemu/tree/v4.1.0-256fix-rfc-v3

History:
v2 -> v3:
- simplifications in kvm_arm_gic_set_irq
- Implement KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 check in kvm_arch_init

v1 -> v2:
- New layout set for kvm_arm_gic_set_irq and
  arm_cpu_kvm_set_irq through kvm_arm_set_irq
- Introduced kvm_arm_irq_line_layout_mismatch()

Eric Auger (3):
  linux headers: update for KVM_CAP_ARM_IRQ_LINE_LAYOUT_2
  intc/arm_gic: Support IRQ injection for more than 256 vpus
  ARM: KVM: Check KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 for smp_cpus > 256

 hw/intc/arm_gic_kvm.c|  7 ++---
 include/standard-headers/asm-x86/bootparam.h |  2 ++
 include/standard-headers/asm-x86/kvm_para.h  |  1 +
 include/standard-headers/linux/ethtool.h |  2 ++
 include/standard-headers/linux/pci_regs.h|  4 +++
 include/standard-headers/linux/virtio_ids.h  |  1 +
 include/standard-headers/linux/virtio_pmem.h |  6 ++---
 linux-headers/asm-arm/kvm.h  | 16 ++-
 linux-headers/asm-arm/unistd-common.h|  2 ++
 linux-headers/asm-arm64/kvm.h| 21 ++-
 linux-headers/asm-generic/mman-common.h  | 15 ++-
 linux-headers/asm-generic/mman.h | 10 +++
 linux-headers/asm-generic/unistd.h   |  8 +-
 linux-headers/asm-mips/unistd_n32.h  |  1 +
 linux-headers/asm-mips/unistd_n64.h  |  1 +
 linux-headers/asm-mips/unistd_o32.h  |  1 +
 linux-headers/asm-powerpc/mman.h |  6 +
 linux-headers/asm-powerpc/unistd_32.h|  2 ++
 linux-headers/asm-powerpc/unistd_64.h|  2 ++
 linux-headers/asm-s390/unistd_32.h   |  2 ++
 linux-headers/asm-s390/unistd_64.h   |  2 ++
 linux-headers/asm-x86/kvm.h  | 28 +++-
 linux-headers/asm-x86/unistd_32.h|  2 ++
 linux-headers/asm-x86/unistd_64.h|  2 ++
 linux-headers/asm-x86/unistd_x32.h   |  2 ++
 linux-headers/linux/kvm.h| 12 ++---
 linux-headers/linux/psp-sev.h|  5 +---
 target/arm/cpu.c | 10 +++
 target/arm/kvm.c | 22 ++-
 target/arm/kvm_arm.h |  1 +
 30 files changed, 147 insertions(+), 49 deletions(-)

-- 
2.20.1

[Qemu-devel] [RFC v3 3/3] ARM: KVM: Check KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 for smp_cpus > 256

2019-09-13 Thread Eric Auger

Host kernel within [4.18, 5.3] report an erroneous KVM_MAX_VCPUS=512
for ARM. The actual capability to instantiate more than 256 vcpus
was fixed in 5.4 with the upgrade of the KVM_IRQ_LINE ABI to support
vcpu id encoded on 12 bits instead of 8 and a redistributor consuming
a single KVM IO device instead of 2.

So let's check this capability when attempting to use more than 256
vcpus within any ARM kvm accelerated machine.

Signed-off-by: Eric Auger 

---

v2 -> v3:
- Implement the check in kvm_arch_init as suggested by Peter
---
 target/arm/kvm.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index b10581fa06..b473c63edb 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -182,6 +182,7 @@ int kvm_arm_get_max_vm_ipa_size(MachineState *ms)
 
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
+int ret = 0;
 /* For ARM interrupt delivery is always asynchronous,
  * whether we are using an in-kernel VGIC or not.
  */
@@ -195,7 +196,14 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 
 cap_has_mp_state = kvm_check_extension(s, KVM_CAP_MP_STATE);
 
-return 0;
+if (ms->smp.cpus > 256 &&
+!kvm_check_extension(s, KVM_CAP_ARM_IRQ_LINE_LAYOUT_2)) {
+error_report("Using more than 256 vcpus requires a host kernel "
+ "with KVM_CAP_ARM_IRQ_LINE_LAYOUT_2");
+ret = -EINVAL;
+}
+
+return ret;
 }
 
 unsigned long kvm_arch_vcpu_id(CPUState *cpu)
-- 
2.20.1

[Qemu-devel] [RFC v3 2/3] intc/arm_gic: Support IRQ injection for more than 256 vpus

2019-09-13 Thread Eric Auger

Host kernels that expose the KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 capability
allow injection of interrupts along with vcpu ids larger than 255.
Let's encode the vpcu id on 12 bits according to the upgraded KVM_IRQ_LINE
ABI when needed.

Given that we have two callsites that need to assemble
the value for kvm_set_irq(), a new helper routine, kvm_arm_set_irq
is introduced.

Without that patch qemu exits with "kvm_set_irq: Invalid argument"
message.

Signed-off-by: Eric Auger 
Reported-by: Zenghui Yu 

---

v2 -> v3:
- remove if (cpu !=0), drop mask, as per Drew's suggestions
---
 hw/intc/arm_gic_kvm.c |  7 ++-
 target/arm/cpu.c  | 10 --
 target/arm/kvm.c  | 12 
 target/arm/kvm_arm.h  |  1 +
 4 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c
index b56fda144f..9deb15e7e6 100644
--- a/hw/intc/arm_gic_kvm.c
+++ b/hw/intc/arm_gic_kvm.c
@@ -55,7 +55,7 @@ void kvm_arm_gic_set_irq(uint32_t num_irq, int irq, int level)
  * has separate fields in the irq number for type,
  * CPU number and interrupt number.
  */
-int kvm_irq, irqtype, cpu;
+int irqtype, cpu;
 
 if (irq < (num_irq - GIC_INTERNAL)) {
 /* External interrupt. The kernel numbers these like the GIC
@@ -72,10 +72,7 @@ void kvm_arm_gic_set_irq(uint32_t num_irq, int irq, int 
level)
 cpu = irq / GIC_INTERNAL;
 irq %= GIC_INTERNAL;
 }
-kvm_irq = (irqtype << KVM_ARM_IRQ_TYPE_SHIFT)
-| (cpu << KVM_ARM_IRQ_VCPU_SHIFT) | irq;
-
-kvm_set_irq(kvm_state, kvm_irq, !!level);
+kvm_arm_set_irq(cpu, irqtype, irq, !!level);
 }
 
 static void kvm_arm_gicv2_set_irq(void *opaque, int irq, int level)
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 2399c14471..13813fb213 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -576,16 +576,16 @@ static void arm_cpu_kvm_set_irq(void *opaque, int irq, 
int level)
 ARMCPU *cpu = opaque;
 CPUARMState *env = &cpu->env;
 CPUState *cs = CPU(cpu);
-int kvm_irq = KVM_ARM_IRQ_TYPE_CPU << KVM_ARM_IRQ_TYPE_SHIFT;
 uint32_t linestate_bit;
+int irq_id;
 
 switch (irq) {
 case ARM_CPU_IRQ:
-kvm_irq |= KVM_ARM_IRQ_CPU_IRQ;
+irq_id = KVM_ARM_IRQ_CPU_IRQ;
 linestate_bit = CPU_INTERRUPT_HARD;
 break;
 case ARM_CPU_FIQ:
-kvm_irq |= KVM_ARM_IRQ_CPU_FIQ;
+irq_id = KVM_ARM_IRQ_CPU_FIQ;
 linestate_bit = CPU_INTERRUPT_FIQ;
 break;
 default:
@@ -597,9 +597,7 @@ static void arm_cpu_kvm_set_irq(void *opaque, int irq, int 
level)
 } else {
 env->irq_line_state &= ~linestate_bit;
 }
-
-kvm_irq |= cs->cpu_index << KVM_ARM_IRQ_VCPU_SHIFT;
-kvm_set_irq(kvm_state, kvm_irq, level ? 1 : 0);
+kvm_arm_set_irq(cs->cpu_index, KVM_ARM_IRQ_TYPE_CPU, irq_id, !!level);
 #endif
 }
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index b2eaa50b8d..b10581fa06 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -744,6 +744,18 @@ int kvm_arm_vgic_probe(void)
 }
 }
 
+int kvm_arm_set_irq(int cpu, int irqtype, int irq, int level)
+{
+int kvm_irq = (irqtype << KVM_ARM_IRQ_TYPE_SHIFT) | irq;
+int cpu_idx1 = cpu % 256;
+int cpu_idx2 = cpu / 256;
+
+kvm_irq |= (cpu_idx1 << KVM_ARM_IRQ_VCPU_SHIFT) |
+   (cpu_idx2 << KVM_ARM_IRQ_VCPU2_SHIFT);
+
+return kvm_set_irq(kvm_state, kvm_irq, !!level);
+}
+
 int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
  uint64_t address, uint32_t data, PCIDevice *dev)
 {
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index b3106c8600..b4e19457a0 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -253,6 +253,7 @@ int kvm_arm_vgic_probe(void);
 
 void kvm_arm_pmu_set_irq(CPUState *cs, int irq);
 void kvm_arm_pmu_init(CPUState *cs);
+int kvm_arm_set_irq(int cpu, int irqtype, int irq, int level);
 
 #else
 
-- 
2.20.1

[Qemu-devel] [RFC v3 1/3] linux headers: update for KVM_CAP_ARM_IRQ_LINE_LAYOUT_2

2019-09-13 Thread Eric Auger

Temporary header update against 5.3-rc5 + Marc's patch:
"[PATCH] KVM: arm/arm64: vgic: Allow more than 256 vcpus for
KVM_IRQ_LINE"

Signed-off-by: Eric Auger 
---
 include/standard-headers/asm-x86/bootparam.h |  2 ++
 include/standard-headers/asm-x86/kvm_para.h  |  1 +
 include/standard-headers/linux/ethtool.h |  2 ++
 include/standard-headers/linux/pci_regs.h|  4 +++
 include/standard-headers/linux/virtio_ids.h  |  1 +
 include/standard-headers/linux/virtio_pmem.h |  6 ++---
 linux-headers/asm-arm/kvm.h  | 16 ++-
 linux-headers/asm-arm/unistd-common.h|  2 ++
 linux-headers/asm-arm64/kvm.h| 21 ++-
 linux-headers/asm-generic/mman-common.h  | 15 ++-
 linux-headers/asm-generic/mman.h | 10 +++
 linux-headers/asm-generic/unistd.h   |  8 +-
 linux-headers/asm-mips/unistd_n32.h  |  1 +
 linux-headers/asm-mips/unistd_n64.h  |  1 +
 linux-headers/asm-mips/unistd_o32.h  |  1 +
 linux-headers/asm-powerpc/mman.h |  6 +
 linux-headers/asm-powerpc/unistd_32.h|  2 ++
 linux-headers/asm-powerpc/unistd_64.h|  2 ++
 linux-headers/asm-s390/unistd_32.h   |  2 ++
 linux-headers/asm-s390/unistd_64.h   |  2 ++
 linux-headers/asm-x86/kvm.h  | 28 +++-
 linux-headers/asm-x86/unistd_32.h|  2 ++
 linux-headers/asm-x86/unistd_64.h|  2 ++
 linux-headers/asm-x86/unistd_x32.h   |  2 ++
 linux-headers/linux/kvm.h| 12 ++---
 linux-headers/linux/psp-sev.h|  5 +---
 26 files changed, 119 insertions(+), 37 deletions(-)

diff --git a/include/standard-headers/asm-x86/bootparam.h 
b/include/standard-headers/asm-x86/bootparam.h
index 67d4f0119f..a6f7cf535e 100644
--- a/include/standard-headers/asm-x86/bootparam.h
+++ b/include/standard-headers/asm-x86/bootparam.h
@@ -29,6 +29,8 @@
 #define XLF_EFI_HANDOVER_32(1<<2)
 #define XLF_EFI_HANDOVER_64(1<<3)
 #define XLF_EFI_KEXEC  (1<<4)
+#define XLF_5LEVEL (1<<5)
+#define XLF_5LEVEL_ENABLED (1<<6)
 
 
 #endif /* _ASM_X86_BOOTPARAM_H */
diff --git a/include/standard-headers/asm-x86/kvm_para.h 
b/include/standard-headers/asm-x86/kvm_para.h
index e1715143fd..90604a8fb7 100644
--- a/include/standard-headers/asm-x86/kvm_para.h
+++ b/include/standard-headers/asm-x86/kvm_para.h
@@ -30,6 +30,7 @@
 #define KVM_FEATURE_ASYNC_PF_VMEXIT10
 #define KVM_FEATURE_PV_SEND_IPI11
 #define KVM_FEATURE_POLL_CONTROL   12
+#define KVM_FEATURE_PV_SCHED_YIELD 13
 
 #define KVM_HINTS_REALTIME  0
 
diff --git a/include/standard-headers/linux/ethtool.h 
b/include/standard-headers/linux/ethtool.h
index 9b9919a8f6..16d0eeea86 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -1483,6 +1483,8 @@ enum ethtool_link_mode_bit_indices {
ETHTOOL_LINK_MODE_20baseLR4_ER4_FR4_Full_BIT = 64,
ETHTOOL_LINK_MODE_20baseDR4_Full_BIT = 65,
ETHTOOL_LINK_MODE_20baseCR4_Full_BIT = 66,
+   ETHTOOL_LINK_MODE_100baseT1_Full_BIT = 67,
+   ETHTOOL_LINK_MODE_1000baseT1_Full_BIT= 68,
 
/* must be last entry */
__ETHTOOL_LINK_MODE_MASK_NBITS
diff --git a/include/standard-headers/linux/pci_regs.h 
b/include/standard-headers/linux/pci_regs.h
index 27164769d1..f28e562d7c 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -528,6 +528,7 @@
 #define  PCI_EXP_LNKCAP_SLS_5_0GB 0x0002 /* LNKCAP2 SLS Vector bit 1 */
 #define  PCI_EXP_LNKCAP_SLS_8_0GB 0x0003 /* LNKCAP2 SLS Vector bit 2 */
 #define  PCI_EXP_LNKCAP_SLS_16_0GB 0x0004 /* LNKCAP2 SLS Vector bit 3 */
+#define  PCI_EXP_LNKCAP_SLS_32_0GB 0x0005 /* LNKCAP2 SLS Vector bit 4 */
 #define  PCI_EXP_LNKCAP_MLW0x03f0 /* Maximum Link Width */
 #define  PCI_EXP_LNKCAP_ASPMS  0x0c00 /* ASPM Support */
 #define  PCI_EXP_LNKCAP_L0SEL  0x7000 /* L0s Exit Latency */
@@ -556,6 +557,7 @@
 #define  PCI_EXP_LNKSTA_CLS_5_0GB 0x0002 /* Current Link Speed 5.0GT/s */
 #define  PCI_EXP_LNKSTA_CLS_8_0GB 0x0003 /* Current Link Speed 8.0GT/s */
 #define  PCI_EXP_LNKSTA_CLS_16_0GB 0x0004 /* Current Link Speed 16.0GT/s */
+#define  PCI_EXP_LNKSTA_CLS_32_0GB 0x0005 /* Current Link Speed 32.0GT/s */
 #define  PCI_EXP_LNKSTA_NLW0x03f0  /* Negotiated Link Width */
 #define  PCI_EXP_LNKSTA_NLW_X1 0x0010  /* Current Link Width x1 */
 #define  PCI_EXP_LNKSTA_NLW_X2 0x0020  /* Current Link Width x2 */
@@ -661,6 +663,7 @@
 #define  PCI_EXP_LNKCAP2_SLS_5_0GB 0x0004 /* Supported Speed 5GT/s */
 #define  PCI_EXP_LNKCAP2_SLS_8_0GB 0x0008 /* Supported Speed 8GT/s */
 #define  PCI_EXP_LNKCAP2_SLS_16_0GB0x0010 /* Supported Speed 16GT/s */
+#define  PCI_EXP_LNKCAP2_SLS_32_0GB0x0020 /* Supported Speed 32GT/s */
 #d

Re: [Qemu-devel] [PATCH] blockdev: avoid acquiring AioContext lock twice at do_drive_backup()

2019-09-13 Thread Max Reitz

On 13.09.19 11:37, Sergio Lopez wrote:
> 
> Max Reitz  writes:
> 
>> On 12.09.19 18:16, Sergio Lopez wrote:
>>> do_drive_backup() acquires the AioContext lock of the corresponding
>>> BlockDriverState. This is not a problem when it's called from
>>> qmp_drive_backup(), but drive_backup_prepare() also acquires the lock
>>> before calling it.
>>>
>>> This change adds a BlockDriverState argument to do_drive_backup(),
>>> which is used to signal that the context lock is already acquired and
>>> to save a couple of redundant calls.
>>
>> But those redundant calls don’t really hurt (it’s just bdrv_lookup_bs(),
>> as far as I can tell).  Wouldn’t it be simpler to just release the
>> context lock in drive_backup_prepare() before calling do_drive_backup()?
>>  The BDS is drained anyway.
> 
> Redundant calls rarely hurt, they're just redundant ;-)

If they’re expensive and in a hot path, they hurt.

>> On top of that, do_backup_common() calls bdrv_try_set_aio_context() to
>> bring the target into the source’s AioContext.  However, this function
>> must be called with the old AioContext held, and the new context not held.
> 
> Is this documented somewhere? I see nothing in the function declaration
> nor definition.
> 
> I'm starting to get the feeling that the block layer is riddled with
> unwritten rules and assumptions that makes every change a lot harder
> than it should be.

It is written, it’s just that it’s written in
bdrv_set_aio_context_ignore()’s definition.

Yes, we should document it directly for bdrv_try_set_aio_context(), too,
because that’s what external callers are much more likely to use.

>> Currently, it’s called exactly the other way around: With the new
>> context held, but the old one not held.
>>
>> So I think it indeed actually makes more sense to release the AioContext
>> before calling do_drive_backup(), and to move the
>> bdrv_try_set_aio_context() call for target_bs to the callers of
>> do_backup_common() (where they have not yet taken the AioContext lock).
> 
> OK. I see this also happens in external_snapshot_prepare() and
> qmp_drive_mirror() too. I guess we should fix these too.
> 
> In qmp_drive_mirror(), would it be safe to delay the acquisition of any
> context until just before the blockdev_mirror_common()?

From mirror’s perspective I think so, but I don’t think it’s safe to
access any of a BDS’s fields without having acquired its AioContext.
(In fact, I wonder whether we should acquire the context even before
bdrv_op_is_blocked()...)

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v4 3/5] block/qcow2: refactor qcow2_co_preadv_part

2019-09-13 Thread Kevin Wolf

Am 16.08.2019 um 17:30 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Further patch will run partial requests of iterations of
> qcow2_co_preadv in parallel for performance reasons. To prepare for
> this, separate part which may be parallelized into separate function
> (qcow2_co_preadv_task).
> 
> While being here, also separate encrypted clusters reading to own
> function, like it is done for compressed reading.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Max Reitz 
> ---
>  qapi/block-core.json |   2 +-
>  block/qcow2.c| 205 +++
>  2 files changed, 111 insertions(+), 96 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 0d43d4f37c..dd80aa11db 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -3266,7 +3266,7 @@
>  'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
>  'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
>  'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
> -'cor_write', 'cluster_alloc_space', 'none'] }
> +'cor_write', 'cluster_alloc_space', 'none', 'read_encrypted'] }

What's the point of this new blkdebug event?

Obviously, read_aio for an encrypted image must mean a read of encrypted
data. The same image can never trigger both read_aio and
read_encrypted, so why do we need to distinguish them as two different
events?

Kevin

Re: [Qemu-devel] [PATCH V2 2/2] block/nfs: add support for nfs_umount

2019-09-13 Thread ronnie sahlberg

On Wed, Sep 11, 2019 at 5:48 PM Max Reitz  wrote:
>
> On 10.09.19 17:41, Peter Lieven wrote:
> > libnfs recently added support for unmounting. Add support
> > in Qemu too.
> >
> > Signed-off-by: Peter Lieven 
> > ---
> >  block/nfs.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/block/nfs.c b/block/nfs.c
> > index 2c98508275..f39acfdb28 100644
> > --- a/block/nfs.c
> > +++ b/block/nfs.c
> > @@ -398,6 +398,9 @@ static void nfs_client_close(NFSClient *client)
> >  nfs_close(client->context, client->fh);
> >  client->fh = NULL;
> >  }
> > +#ifdef LIBNFS_FEATURE_UMOUNT
> > +nfs_umount(client->context);
> > +#endif
> >  nfs_destroy_context(client->context);
> >  client->context = NULL;
> >  }
>
> I don’t understand what unmounting means in this context.  Is it just
> generic clean-up for NFSv3 (it appears that it’s a no-op for NFSv4)?
> Why isn’t that done by nfs_destroy_context()?

Umount is weird since there isn't actually any state in NFSv3 and
"mounting" in nfsv3 is really just a matter of converting the path to
be mounted into a filehandle.
That is all the mount protocol is really used for.

This is all handled in a separate protocol/server called rpc.mountd
that is separate from NFSd. Running as a different process and
listening to a different port.
And the only purpose of rpc.mountd is to take a path to a share and
return a nfsv3 filehandle to the root of that path.
As a side-effect, rpc.mountd also keeps track of which clients have
called MNT but not yet UMNT and thus showmount -a
can give a lost of all client that have "mounted" the share but not
yet called "UMNT".

It has no effect at all on NFSv3 and is purely cosmetic. This ONLY
affects "showmount -a" output.
NFSv4 does away with all these separate protocols such as mount,
statd, nlm and even portmapper
so there is not even a concept of showmount -a for nfsv4.

As the libnfs maintainer, why did I do nfs_umount() the way I did?
First of all, I think of nfs UMNT as really just cosmetic and didn't
want to put too much work into it. But people wanted it.

I implemented it as a sync function since I think few people would
actually use it at all and it meant that I just didn't have to invest
in having to build an async piupelinje.

I did NOT implement it inside nfs_destroy_context() since that
function is supposed to be, in my view, non-blocking andn should just
tear the connection and immediately return.
As unmount would be
* close the tcp socket to the nfs server
* open new socket to portmapper and ask "where is rpc.mountd"
* close socket to portmapper, then open new socket to rpc.mountd
 * tell rpc.mountd to remove us from the showmount -a list
* close socket

I just took the cheap and easy path. Do it as a sync function with my
own eventloop.

Again, UMNT has no real effect on anything related to NFS except what
showmount -a will return. That is one big reason why
I was just not much motivated enough to build it as an async function.

Once we all switch to NFSv4 this will all be moot since the MOUNT
protocol no longer exists and neither does rpc.mountd.

>
> Max
>

Re: [Qemu-devel] [PATCH V2 1/2] block/nfs: tear down aio before nfs_close

2019-09-13 Thread Peter Lieven

Am 13.09.19 um 11:51 schrieb Max Reitz:
> On 10.09.19 17:41, Peter Lieven wrote:
>> nfs_close is a sync call from libnfs and has its own event
>> handler polling on the nfs FD. Avoid that both QEMU and libnfs
>> are intefering here.
>>
>> CC: qemu-sta...@nongnu.org
>> Signed-off-by: Peter Lieven 
>> ---
>>  block/nfs.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
> I’ve just seen that Kevin has already included the second patch (in its
> v1) in his pull request.
>
> So all that I can do is take this patch, which sounds good to me,
> especially since Ronnie has agreed that we should remove our FD handler
> there.
>
> (So I’ve rebased the patch on top of Kevin’s pull request, and I’ve
> taken it to my block branch.)


Thank you. After I discovered that we had this bug also before I added the 
nfs_umount call I thought

it would be good to have this patch first and the add the umount call because 
the fix should also go into

stable because in theory we could also run into trouble with just the *sync* 
nfs_clsoe call.


Peter

Re: [Qemu-devel] [PATCH] blockdev: avoid acquiring AioContext lock twice at do_drive_backup()

2019-09-13 Thread Sergio Lopez


Max Reitz  writes:

> On 13.09.19 11:37, Sergio Lopez wrote:
>> 
>> Max Reitz  writes:
>> 
>>> On 12.09.19 18:16, Sergio Lopez wrote:
 do_drive_backup() acquires the AioContext lock of the corresponding
 BlockDriverState. This is not a problem when it's called from
 qmp_drive_backup(), but drive_backup_prepare() also acquires the lock
 before calling it.

 This change adds a BlockDriverState argument to do_drive_backup(),
 which is used to signal that the context lock is already acquired and
 to save a couple of redundant calls.
>>>
>>> But those redundant calls don’t really hurt (it’s just bdrv_lookup_bs(),
>>> as far as I can tell).  Wouldn’t it be simpler to just release the
>>> context lock in drive_backup_prepare() before calling do_drive_backup()?
>>>  The BDS is drained anyway.
>> 
>> Redundant calls rarely hurt, they're just redundant ;-)
>
> If they’re expensive and in a hot path, they hurt.
>
>>> On top of that, do_backup_common() calls bdrv_try_set_aio_context() to
>>> bring the target into the source’s AioContext.  However, this function
>>> must be called with the old AioContext held, and the new context not held.
>> 
>> Is this documented somewhere? I see nothing in the function declaration
>> nor definition.
>> 
>> I'm starting to get the feeling that the block layer is riddled with
>> unwritten rules and assumptions that makes every change a lot harder
>> than it should be.
>
> It is written, it’s just that it’s written in
> bdrv_set_aio_context_ignore()’s definition.
>
> Yes, we should document it directly for bdrv_try_set_aio_context(), too,
> because that’s what external callers are much more likely to use.
>
>>> Currently, it’s called exactly the other way around: With the new
>>> context held, but the old one not held.
>>>
>>> So I think it indeed actually makes more sense to release the AioContext
>>> before calling do_drive_backup(), and to move the
>>> bdrv_try_set_aio_context() call for target_bs to the callers of
>>> do_backup_common() (where they have not yet taken the AioContext lock).
>> 
>> OK. I see this also happens in external_snapshot_prepare() and
>> qmp_drive_mirror() too. I guess we should fix these too.
>> 
>> In qmp_drive_mirror(), would it be safe to delay the acquisition of any
>> context until just before the blockdev_mirror_common()?
>
> From mirror’s perspective I think so, but I don’t think it’s safe to
> access any of a BDS’s fields without having acquired its AioContext.
> (In fact, I wonder whether we should acquire the context even before
> bdrv_op_is_blocked()...)

In that case, I wonder if we can safely release the context to honor
bdrv_try_set_aio_context() requirements, knowing we aren't in a drained
section.

Sergio.



signature.asc
Description: PGP signature

[Qemu-devel] [PATCH 0/2] HPPA tcg fixes

2019-09-13 Thread Sven Schnelle

Hi Richard,

here are two patches for HPPA tcg. QEMU was crashing with a tcg assert
because dead temporaries where used. This could be observed at the end·
of a HP-UX 11.11 installation, or by running the STARBASE X11 demos in
HP-UX 10.20.

Thanks,
Sven

Sven Schnelle (2):
  target/hppa: prevent trashing of temporary in trans_mtctl()
  target/hppa: prevent trashing of temporary in do_depw_sar()

 target/hppa/translate.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

-- 
2.23.0.rc1

[Qemu-devel] [PATCH 2/2] target/hppa: prevent trashing of temporary in do_depw_sar()

2019-09-13 Thread Sven Schnelle

nullify_over() calls brcond which destroys all temporaries.

Signed-off-by: Sven Schnelle 
---
 target/hppa/translate.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index b12525d535..c1b2822f60 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -3404,10 +3404,6 @@ static bool do_depw_sar(DisasContext *ctx, unsigned rt, 
unsigned c,
 TCGv_reg mask, tmp, shift, dest;
 unsigned msb = 1U << (len - 1);
 
-if (c) {
-nullify_over(ctx);
-}
-
 dest = dest_gpr(ctx, rt);
 shift = tcg_temp_new();
 tmp = tcg_temp_new();
@@ -3440,11 +3436,17 @@ static bool do_depw_sar(DisasContext *ctx, unsigned rt, 
unsigned c,
 
 static bool trans_depw_sar(DisasContext *ctx, arg_depw_sar *a)
 {
+if (a->c) {
+nullify_over(ctx);
+}
 return do_depw_sar(ctx, a->t, a->c, a->nz, a->clen, load_gpr(ctx, a->r));
 }
 
 static bool trans_depwi_sar(DisasContext *ctx, arg_depwi_sar *a)
 {
+if (a->c) {
+nullify_over(ctx);
+}
 return do_depw_sar(ctx, a->t, a->c, a->nz, a->clen, load_const(ctx, a->i));
 }
 
-- 
2.23.0.rc1

[Qemu-devel] [PATCH 1/2] target/hppa: prevent trashing of temporary in trans_mtctl()

2019-09-13 Thread Sven Schnelle

nullify_over() calls brcond which destroys all temporaries.

Signed-off-by: Sven Schnelle 
---
 target/hppa/translate.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 53e17d8963..b12525d535 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -2214,10 +2214,11 @@ static bool trans_mtsp(DisasContext *ctx, arg_mtsp *a)
 static bool trans_mtctl(DisasContext *ctx, arg_mtctl *a)
 {
 unsigned ctl = a->t;
-TCGv_reg reg = load_gpr(ctx, a->r);
+TCGv_reg reg;
 TCGv_reg tmp;
 
 if (ctl == CR_SAR) {
+reg = load_gpr(ctx, a->r);
 tmp = tcg_temp_new();
 tcg_gen_andi_reg(tmp, reg, TARGET_REGISTER_BITS - 1);
 save_or_nullify(ctx, cpu_sar, tmp);
@@ -2232,6 +2233,8 @@ static bool trans_mtctl(DisasContext *ctx, arg_mtctl *a)
 
 #ifndef CONFIG_USER_ONLY
 nullify_over(ctx);
+reg = load_gpr(ctx, a->r);
+
 switch (ctl) {
 case CR_IT:
 gen_helper_write_interval_timer(cpu_env, reg);
-- 
2.23.0.rc1

Re: [Qemu-devel] [PATCH V2 1/2] block/nfs: tear down aio before nfs_close

2019-09-13 Thread Kevin Wolf

Am 13.09.2019 um 11:51 hat Max Reitz geschrieben:
> On 10.09.19 17:41, Peter Lieven wrote:
> > nfs_close is a sync call from libnfs and has its own event
> > handler polling on the nfs FD. Avoid that both QEMU and libnfs
> > are intefering here.
> > 
> > CC: qemu-sta...@nongnu.org
> > Signed-off-by: Peter Lieven 
> > ---
> >  block/nfs.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> I’ve just seen that Kevin has already included the second patch (in its
> v1) in his pull request.

Oops, sorry about this. Peter hasn't pulled yet, so I'll update the tag
for the pull request. Let's see which version gets merged in the end.

Kevin


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v2 1/5] rcu: Add automatically released rcu_read_lock variant

2019-09-13 Thread Dr. David Alan Gilbert

* Paolo Bonzini (pbonz...@redhat.com) wrote:
> On 12/09/19 19:45, Dr. David Alan Gilbert wrote:
> > Do you think it's best to use the block version for all cases
> > or to use the non-block version by taste?
> > The block version is quite nice, but that turns most of the patches
> > into 'indent everything'.
> 
> I don't really know myself.

OK, new version coming up with a mix - the diffs do look a lot more
hectic with the block version.

> On first glance I didn't like too much the non-block version and thought
> it was because our coding standards does not include variables declared
> in the middle of a block.

I took that as being a coding standard to avoid confusing humans and
since it wasn't visible it didn't matter too much.

> However, I think what really bothering me is
> "AUTO" in the name.  What do you think about "RCU_READ_LOCK_GUARD()"?
> The block version would have the additional prefix "WITH_".

Oh well, if it's just the name we can fix that.

> We could also add LOCK_GUARD(lock) and WITH_LOCK_GUARD(lock), using
> QemuLockable for polymorphism.  I even had patches a while ago (though
> they used something like LOCK_GUARD(guard_var, lock).  I think we
> dropped them because of fear that the API was a bit over-engineered.

Probably a separate set.

Dave

> Paolo
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

[Qemu-devel] [PATCH v3 3/5] migration: Use automatic rcu_read unlock in ram.c

2019-09-13 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

Use the automatic read unlocker in migration/ram.c

Signed-off-by: Dr. David Alan Gilbert 
---
 migration/ram.c | 260 ++--
 1 file changed, 121 insertions(+), 139 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index cff35477ec..6c5f0199fd 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -181,14 +181,14 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void 
*opaque)
 RAMBlock *block;
 int ret = 0;
 
-rcu_read_lock();
+RCU_READ_LOCK_GUARD();
+
 RAMBLOCK_FOREACH_NOT_IGNORED(block) {
 ret = func(block, opaque);
 if (ret) {
 break;
 }
 }
-rcu_read_unlock();
 return ret;
 }
 
@@ -1849,12 +1849,12 @@ static void migration_bitmap_sync(RAMState *rs)
 memory_global_dirty_log_sync();
 
 qemu_mutex_lock(&rs->bitmap_mutex);
-rcu_read_lock();
-RAMBLOCK_FOREACH_NOT_IGNORED(block) {
-ramblock_sync_dirty_bitmap(rs, block);
+WITH_RCU_READ_LOCK_GUARD() {
+RAMBLOCK_FOREACH_NOT_IGNORED(block) {
+ramblock_sync_dirty_bitmap(rs, block);
+}
+ram_counters.remaining = ram_bytes_remaining();
 }
-ram_counters.remaining = ram_bytes_remaining();
-rcu_read_unlock();
 qemu_mutex_unlock(&rs->bitmap_mutex);
 
 memory_global_after_dirty_log_sync();
@@ -2398,13 +2398,12 @@ static void migration_page_queue_free(RAMState *rs)
 /* This queue generally should be empty - but in the case of a failed
  * migration might have some droppings in.
  */
-rcu_read_lock();
+RCU_READ_LOCK_GUARD();
 QSIMPLEQ_FOREACH_SAFE(mspr, &rs->src_page_requests, next_req, next_mspr) {
 memory_region_unref(mspr->rb->mr);
 QSIMPLEQ_REMOVE_HEAD(&rs->src_page_requests, next_req);
 g_free(mspr);
 }
-rcu_read_unlock();
 }
 
 /**
@@ -2425,7 +2424,8 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t 
start, ram_addr_t len)
 RAMState *rs = ram_state;
 
 ram_counters.postcopy_requests++;
-rcu_read_lock();
+RCU_READ_LOCK_GUARD();
+
 if (!rbname) {
 /* Reuse last RAMBlock */
 ramblock = rs->last_req_rb;
@@ -2467,12 +2467,10 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t 
start, ram_addr_t len)
 QSIMPLEQ_INSERT_TAIL(&rs->src_page_requests, new_entry, next_req);
 migration_make_urgent_request();
 qemu_mutex_unlock(&rs->src_page_req_mutex);
-rcu_read_unlock();
 
 return 0;
 
 err:
-rcu_read_unlock();
 return -1;
 }
 
@@ -2712,7 +2710,8 @@ static uint64_t ram_bytes_total_common(bool count_ignored)
 RAMBlock *block;
 uint64_t total = 0;
 
-rcu_read_lock();
+RCU_READ_LOCK_GUARD();
+
 if (count_ignored) {
 RAMBLOCK_FOREACH_MIGRATABLE(block) {
 total += block->used_length;
@@ -2722,7 +2721,6 @@ static uint64_t ram_bytes_total_common(bool count_ignored)
 total += block->used_length;
 }
 }
-rcu_read_unlock();
 return total;
 }
 
@@ -3086,7 +3084,7 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms)
 RAMBlock *block;
 int ret;
 
-rcu_read_lock();
+RCU_READ_LOCK_GUARD();
 
 /* This should be our last sync, the src is now paused */
 migration_bitmap_sync(rs);
@@ -3107,13 +3105,11 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms)
  * point.
  */
 error_report("migration ram resized during precopy phase");
-rcu_read_unlock();
 return -EINVAL;
 }
 /* Deal with TPS != HPS and huge pages */
 ret = postcopy_chunk_hostpages(ms, block);
 if (ret) {
-rcu_read_unlock();
 return ret;
 }
 
@@ -3128,7 +3124,6 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms)
 trace_ram_postcopy_send_discard_bitmap();
 
 ret = postcopy_each_ram_send_discard(ms);
-rcu_read_unlock();
 
 return ret;
 }
@@ -3149,7 +3144,7 @@ int ram_discard_range(const char *rbname, uint64_t start, 
size_t length)
 
 trace_ram_discard_range(rbname, start, length);
 
-rcu_read_lock();
+RCU_READ_LOCK_GUARD();
 RAMBlock *rb = qemu_ram_block_by_name(rbname);
 
 if (!rb) {
@@ -3169,8 +3164,6 @@ int ram_discard_range(const char *rbname, uint64_t start, 
size_t length)
 ret = ram_block_discard_range(rb, start, length);
 
 err:
-rcu_read_unlock();
-
 return ret;
 }
 
@@ -3303,13 +3296,12 @@ static void ram_init_bitmaps(RAMState *rs)
 /* For memory_global_dirty_log_start below.  */
 qemu_mutex_lock_iothread();
 qemu_mutex_lock_ramlist();
-rcu_read_lock();
 
-ram_list_init_bitmaps();
-memory_global_dirty_log_start();
-migration_bitmap_sync_precopy(rs);
-
-rcu_read_unlock();
+WITH_RCU_READ_LOCK_GUARD() {
+ram_list_init_bitmaps();
+memory_global_dirty_log_start();
+migration_bitmap_sync_precopy(rs);
+}
 qemu_mute

[Qemu-devel] [PATCH v3 1/5] rcu: Add automatically released rcu_read_lock variants

2019-09-13 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

RCU_READ_LOCK_GUARD() takes the rcu_read_lock and then uses glib's
g_auto infrastructure (and thus whatever the compiler's hooks are) to
release it on all exits of the block.

WITH_RCU_READ_LOCK_GUARD() is similar but is used as a wrapper for the
lock, i.e.:

   WITH_RCU_READ_LOCK_GUARD() {
   stuff under lock
   }

Signed-off-by: Dr. David Alan Gilbert 
---
 docs/devel/rcu.txt | 16 
 include/qemu/rcu.h | 25 +
 2 files changed, 41 insertions(+)

diff --git a/docs/devel/rcu.txt b/docs/devel/rcu.txt
index c84e7f42b2..d83fed2f79 100644
--- a/docs/devel/rcu.txt
+++ b/docs/devel/rcu.txt
@@ -187,6 +187,22 @@ The following APIs must be used before RCU is used in a 
thread:
 Note that these APIs are relatively heavyweight, and should _not_ be
 nested.
 
+Convenience macros
+==
+
+Two macros are provided that automatically release the read lock at the
+end of the scope.
+
+  RCU_READ_LOCK_GUARD()
+
+ Takes the lock and will release it at the end of the block it's
+ used in.
+
+  WITH_RCU_READ_LOCK_GUARD()  { code }
+
+ Is used at the head of a block to protect the code within the block.
+
+Note that 'goto'ing out of the guarded block will also drop the lock.
 
 DIFFERENCES WITH LINUX
 ==
diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index 22876d1428..3a8d4cf28b 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -154,6 +154,31 @@ extern void call_rcu1(struct rcu_head *head, RCUCBFunc 
*func);
   }),\
   (RCUCBFunc *)g_free);
 
+typedef void RCUReadAuto;
+static inline RCUReadAuto *rcu_read_auto_lock(void)
+{
+rcu_read_lock();
+/* Anything non-NULL causes the cleanup function to be called */
+return (void *)(uintptr_t)0x1;
+}
+
+static inline void rcu_read_auto_unlock(RCUReadAuto *r)
+{
+rcu_read_unlock();
+}
+
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(RCUReadAuto, rcu_read_auto_unlock)
+
+#define WITH_RCU_READ_LOCK_GUARD() \
+WITH_RCU_READ_LOCK_GUARD_(_rcu_read_auto##__COUNTER__)
+
+#define WITH_RCU_READ_LOCK_GUARD_(var) \
+for (g_autoptr(RCUReadAuto) var = rcu_read_auto_lock(); \
+(var); rcu_read_auto_unlock(var), (var) = NULL)
+
+#define RCU_READ_LOCK_GUARD() \
+g_autoptr(RCUReadAuto) _rcu_read_auto = rcu_read_auto_lock()
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.21.0

[Qemu-devel] [PATCH v3 0/5] Automatic RCU read unlock

2019-09-13 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

This patch uses glib's g_auto mechanism to automatically free
rcu_read_lock's at the end of the block.  Given that humans
have a habit of forgetting an error path somewhere it's
best to leave it to the compiler.

v3
  Add block-head version of macro
  Rename
  Add docs
  Convert more cases using the block-head version

Dr. David Alan Gilbert (5):
  rcu: Add automatically released rcu_read_lock variants
  migration: Fix missing rcu_read_unlock
  migration: Use automatic rcu_read unlock in ram.c
  migration: Use automatic rcu_read unlock in rdma.c
  rcu: Use automatic rc_read unlock in core memory/exec code

 docs/devel/rcu.txt  |  16 +++
 exec.c  | 120 +++-
 include/exec/ram_addr.h | 138 +--
 include/qemu/rcu.h  |  25 
 memory.c|  15 +-
 migration/ram.c | 295 +++-
 migration/rdma.c|  57 ++--
 7 files changed, 310 insertions(+), 356 deletions(-)

-- 
2.21.0

[Qemu-devel] [PATCH v3 2/5] migration: Fix missing rcu_read_unlock

2019-09-13 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

Use the automatic rcu_read unlocker to fix a missing unlock.

Signed-off-by: Dr. David Alan Gilbert 
---
 migration/ram.c | 35 +--
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index b2bd618a89..cff35477ec 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3445,28 +3445,27 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 }
 (*rsp)->f = f;
 
-rcu_read_lock();
-
-qemu_put_be64(f, ram_bytes_total_common(true) | RAM_SAVE_FLAG_MEM_SIZE);
+WITH_RCU_READ_LOCK_GUARD() {
+qemu_put_be64(f, ram_bytes_total_common(true) | 
RAM_SAVE_FLAG_MEM_SIZE);
 
-RAMBLOCK_FOREACH_MIGRATABLE(block) {
-if (!block->idstr[0]) {
-error_report("%s: RAMBlock with empty name", __func__);
-return -1;
-}
-qemu_put_byte(f, strlen(block->idstr));
-qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
-qemu_put_be64(f, block->used_length);
-if (migrate_postcopy_ram() && block->page_size != qemu_host_page_size) 
{
-qemu_put_be64(f, block->page_size);
-}
-if (migrate_ignore_shared()) {
-qemu_put_be64(f, block->mr->addr);
+RAMBLOCK_FOREACH_MIGRATABLE(block) {
+if (!block->idstr[0]) {
+error_report("%s: RAMBlock with empty name", __func__);
+return -1;
+}
+qemu_put_byte(f, strlen(block->idstr));
+qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
+qemu_put_be64(f, block->used_length);
+if (migrate_postcopy_ram() && block->page_size !=
+  qemu_host_page_size) {
+qemu_put_be64(f, block->page_size);
+}
+if (migrate_ignore_shared()) {
+qemu_put_be64(f, block->mr->addr);
+}
 }
 }
 
-rcu_read_unlock();
-
 ram_control_before_iterate(f, RAM_CONTROL_SETUP);
 ram_control_after_iterate(f, RAM_CONTROL_SETUP);
 
-- 
2.21.0

[Qemu-devel] [PATCH v3 5/5] rcu: Use automatic rc_read unlock in core memory/exec code

2019-09-13 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

Signed-off-by: Dr. David Alan Gilbert 
---
 exec.c  | 120 +++---
 include/exec/ram_addr.h | 138 +++-
 memory.c|  15 ++---
 3 files changed, 120 insertions(+), 153 deletions(-)

diff --git a/exec.c b/exec.c
index 235d6bc883..e75be06819 100644
--- a/exec.c
+++ b/exec.c
@@ -1034,16 +1034,14 @@ void tb_invalidate_phys_addr(AddressSpace *as, hwaddr 
addr, MemTxAttrs attrs)
 return;
 }
 
-rcu_read_lock();
+RCU_READ_LOCK_GUARD();
 mr = address_space_translate(as, addr, &addr, &l, false, attrs);
 if (!(memory_region_is_ram(mr)
   || memory_region_is_romd(mr))) {
-rcu_read_unlock();
 return;
 }
 ram_addr = memory_region_get_ram_addr(mr) + addr;
 tb_invalidate_phys_page_range(ram_addr, ram_addr + 1, 0);
-rcu_read_unlock();
 }
 
 static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
@@ -1329,14 +1327,13 @@ static void tlb_reset_dirty_range_all(ram_addr_t start, 
ram_addr_t length)
 end = TARGET_PAGE_ALIGN(start + length);
 start &= TARGET_PAGE_MASK;
 
-rcu_read_lock();
+RCU_READ_LOCK_GUARD();
 block = qemu_get_ram_block(start);
 assert(block == qemu_get_ram_block(end - 1));
 start1 = (uintptr_t)ramblock_ptr(block, start - block->offset);
 CPU_FOREACH(cpu) {
 tlb_reset_dirty(cpu, start1, length);
 }
-rcu_read_unlock();
 }
 
 /* Note: start and end must be within the same ram block.  */
@@ -1357,30 +1354,29 @@ bool 
cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
 end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
 page = start >> TARGET_PAGE_BITS;
 
-rcu_read_lock();
-
-blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
-ramblock = qemu_get_ram_block(start);
-/* Range sanity check on the ramblock */
-assert(start >= ramblock->offset &&
-   start + length <= ramblock->offset + ramblock->used_length);
-
-while (page < end) {
-unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
-unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
-unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - offset);
+WITH_RCU_READ_LOCK_GUARD() {
+blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
+ramblock = qemu_get_ram_block(start);
+/* Range sanity check on the ramblock */
+assert(start >= ramblock->offset &&
+   start + length <= ramblock->offset + ramblock->used_length);
+
+while (page < end) {
+unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
+unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
+unsigned long num = MIN(end - page,
+DIRTY_MEMORY_BLOCK_SIZE - offset);
+
+dirty |= bitmap_test_and_clear_atomic(blocks->blocks[idx],
+  offset, num);
+page += num;
+}
 
-dirty |= bitmap_test_and_clear_atomic(blocks->blocks[idx],
-  offset, num);
-page += num;
+mr_offset = (ram_addr_t)(page << TARGET_PAGE_BITS) - ramblock->offset;
+mr_size = (end - page) << TARGET_PAGE_BITS;
+memory_region_clear_dirty_bitmap(ramblock->mr, mr_offset, mr_size);
 }
 
-mr_offset = (ram_addr_t)(page << TARGET_PAGE_BITS) - ramblock->offset;
-mr_size = (end - page) << TARGET_PAGE_BITS;
-memory_region_clear_dirty_bitmap(ramblock->mr, mr_offset, mr_size);
-
-rcu_read_unlock();
-
 if (dirty && tcg_enabled()) {
 tlb_reset_dirty_range_all(start, length);
 }
@@ -1408,28 +1404,27 @@ DirtyBitmapSnapshot 
*cpu_physical_memory_snapshot_and_clear_dirty
 end  = last  >> TARGET_PAGE_BITS;
 dest = 0;
 
-rcu_read_lock();
-
-blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
+WITH_RCU_READ_LOCK_GUARD() {
+blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
 
-while (page < end) {
-unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
-unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
-unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - offset);
+while (page < end) {
+unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
+unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
+unsigned long num = MIN(end - page,
+DIRTY_MEMORY_BLOCK_SIZE - offset);
 
-assert(QEMU_IS_ALIGNED(offset, (1 << BITS_PER_LEVEL)));
-assert(QEMU_IS_ALIGNED(num,(1 << BITS_PER_LEVEL)));
-offset >>= BITS_PER_LEVEL;
+assert(QEMU_IS_ALIGNED(offset, (1 << BITS_PER_LEVEL)));
+assert(QEMU_IS_ALIGNED(num,(1 << BITS_PER_LEVEL)));
+offset >>= BITS_PER_LEVEL;
 
-bitmap_copy_and_clear_atomic(snap->dirty + des

[Qemu-devel] [PATCH v3 4/5] migration: Use automatic rcu_read unlock in rdma.c

2019-09-13 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

Use the automatic read unlocker in migration/rdma.c.

Signed-off-by: Dr. David Alan Gilbert 
---
 migration/rdma.c | 57 ++--
 1 file changed, 11 insertions(+), 46 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 78e6b72bac..5c9054721d 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -88,7 +88,6 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
 " to abort!"); \
 rdma->error_reported = 1; \
 } \
-rcu_read_unlock(); \
 return rdma->error_state; \
 } \
 } while (0)
@@ -2678,11 +2677,10 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 size_t i;
 size_t len = 0;
 
-rcu_read_lock();
+RCU_READ_LOCK_GUARD();
 rdma = atomic_rcu_read(&rioc->rdmaout);
 
 if (!rdma) {
-rcu_read_unlock();
 return -EIO;
 }
 
@@ -2695,7 +2693,6 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 ret = qemu_rdma_write_flush(f, rdma);
 if (ret < 0) {
 rdma->error_state = ret;
-rcu_read_unlock();
 return ret;
 }
 
@@ -2715,7 +2712,6 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 
 if (ret < 0) {
 rdma->error_state = ret;
-rcu_read_unlock();
 return ret;
 }
 
@@ -2724,7 +2720,6 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 }
 }
 
-rcu_read_unlock();
 return done;
 }
 
@@ -2764,11 +2759,10 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
 ssize_t i;
 size_t done = 0;
 
-rcu_read_lock();
+RCU_READ_LOCK_GUARD();
 rdma = atomic_rcu_read(&rioc->rdmain);
 
 if (!rdma) {
-rcu_read_unlock();
 return -EIO;
 }
 
@@ -2805,7 +2799,6 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
 
 if (ret < 0) {
 rdma->error_state = ret;
-rcu_read_unlock();
 return ret;
 }
 
@@ -2819,14 +2812,12 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
 /* Still didn't get enough, so lets just return */
 if (want) {
 if (done == 0) {
-rcu_read_unlock();
 return QIO_CHANNEL_ERR_BLOCK;
 } else {
 break;
 }
 }
 }
-rcu_read_unlock();
 return done;
 }
 
@@ -2882,7 +2873,7 @@ qio_channel_rdma_source_prepare(GSource *source,
 GIOCondition cond = 0;
 *timeout = -1;
 
-rcu_read_lock();
+RCU_READ_LOCK_GUARD();
 if (rsource->condition == G_IO_IN) {
 rdma = atomic_rcu_read(&rsource->rioc->rdmain);
 } else {
@@ -2891,7 +2882,6 @@ qio_channel_rdma_source_prepare(GSource *source,
 
 if (!rdma) {
 error_report("RDMAContext is NULL when prepare Gsource");
-rcu_read_unlock();
 return FALSE;
 }
 
@@ -2900,7 +2890,6 @@ qio_channel_rdma_source_prepare(GSource *source,
 }
 cond |= G_IO_OUT;
 
-rcu_read_unlock();
 return cond & rsource->condition;
 }
 
@@ -2911,7 +2900,7 @@ qio_channel_rdma_source_check(GSource *source)
 RDMAContext *rdma;
 GIOCondition cond = 0;
 
-rcu_read_lock();
+RCU_READ_LOCK_GUARD();
 if (rsource->condition == G_IO_IN) {
 rdma = atomic_rcu_read(&rsource->rioc->rdmain);
 } else {
@@ -2920,7 +2909,6 @@ qio_channel_rdma_source_check(GSource *source)
 
 if (!rdma) {
 error_report("RDMAContext is NULL when check Gsource");
-rcu_read_unlock();
 return FALSE;
 }
 
@@ -2929,7 +2917,6 @@ qio_channel_rdma_source_check(GSource *source)
 }
 cond |= G_IO_OUT;
 
-rcu_read_unlock();
 return cond & rsource->condition;
 }
 
@@ -2943,7 +2930,7 @@ qio_channel_rdma_source_dispatch(GSource *source,
 RDMAContext *rdma;
 GIOCondition cond = 0;
 
-rcu_read_lock();
+RCU_READ_LOCK_GUARD();
 if (rsource->condition == G_IO_IN) {
 rdma = atomic_rcu_read(&rsource->rioc->rdmain);
 } else {
@@ -2952,7 +2939,6 @@ qio_channel_rdma_source_dispatch(GSource *source,
 
 if (!rdma) {
 error_report("RDMAContext is NULL when dispatch Gsource");
-rcu_read_unlock();
 return FALSE;
 }
 
@@ -2961,7 +2947,6 @@ qio_channel_rdma_source_dispatch(GSource *source,
 }
 cond |= G_IO_OUT;
 
-rcu_read_unlock();
 return (*func)(QIO_CHANNEL(rsource->rioc),
(cond & rsource->condition),
user_data);
@@ -3058,7 +3043,7 @@ qio_channel_rdma_shutdown(QIOChannel *ioc,
 QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
 RDMAContext *rdmain, *rdmaout;
 
-rcu_read_lock();
+RCU_READ_LOCK_GUARD();
 
 rdmain = atomic_rcu_read(&rioc->rdmain);
 rdmaout = atomic_rcu_read(&rioc->rdmain);
@@ -3085,7 +3070,6 @@ qio_channel_rdma_shutdown(QIOChannel *ioc,
 break;
 }
 
-rcu_read_un

Re: [Qemu-devel] [PATCH v2 1/5] rcu: Add automatically released rcu_read_lock variant

2019-09-13 Thread Paolo Bonzini

On 13/09/19 12:24, Dr. David Alan Gilbert wrote:
>> We could also add LOCK_GUARD(lock) and WITH_LOCK_GUARD(lock), using
>> QemuLockable for polymorphism.  I even had patches a while ago (though
>> they used something like LOCK_GUARD(guard_var, lock).  I think we
>> dropped them because of fear that the API was a bit over-engineered.
> Probably a separate set.

Definitely so.  Thanks!

Paolo

Re: [Qemu-devel] [PATCH v3 1/3] block/qcow2: refactoring of threaded encryption code

2019-09-13 Thread Max Reitz

On 13.09.19 00:37, Maxim Levitsky wrote:
> This commit tries to clarify few function arguments,
> and add comments describing the encrypt/decrypt interface
> 
> Signed-off-by: Maxim Levitsky 
> ---
>  block/qcow2-cluster.c |  8 +++---
>  block/qcow2-threads.c | 63 ++-
>  2 files changed, 54 insertions(+), 17 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index f09cc992af..b95e64c237 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -463,8 +463,8 @@ static int coroutine_fn 
> do_perform_cow_read(BlockDriverState *bs,
>  }
>  
>  static bool coroutine_fn do_perform_cow_encrypt(BlockDriverState *bs,
> -uint64_t src_cluster_offset,
> -uint64_t cluster_offset,
> +uint64_t 
> guest_cluster_offset,
> +uint64_t host_cluster_offset,
>  unsigned offset_in_cluster,
>  uint8_t *buffer,
>  unsigned bytes)
> @@ -474,8 +474,8 @@ static bool coroutine_fn 
> do_perform_cow_encrypt(BlockDriverState *bs,
>  assert((offset_in_cluster & ~BDRV_SECTOR_MASK) == 0);
>  assert((bytes & ~BDRV_SECTOR_MASK) == 0);
>  assert(s->crypto);
> -if (qcow2_co_encrypt(bs, cluster_offset,
> - src_cluster_offset + offset_in_cluster,
> +if (qcow2_co_encrypt(bs, host_cluster_offset,
> + guest_cluster_offset + offset_in_cluster,
>   buffer, bytes) < 0) {
>  return false;
>  }
> diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
> index 3b1e63fe41..6da1838e95 100644
> --- a/block/qcow2-threads.c
> +++ b/block/qcow2-threads.c
> @@ -234,15 +234,19 @@ static int qcow2_encdec_pool_func(void *opaque)
>  }
>  
>  static int coroutine_fn
> -qcow2_co_encdec(BlockDriverState *bs, uint64_t file_cluster_offset,
> -  uint64_t offset, void *buf, size_t len, Qcow2EncDecFunc 
> func)
> +qcow2_co_encdec(BlockDriverState *bs, uint64_t host_cluster_offset,
> +uint64_t guest_offset, void *buf, size_t len,
> +Qcow2EncDecFunc func)
>  {
>  BDRVQcow2State *s = bs->opaque;
> +
> +uint64_t offset = s->crypt_physical_offset ?
> +host_cluster_offset + offset_into_cluster(s, guest_offset) :
> +guest_offset;
> +
>  Qcow2EncDecData arg = {
>  .block = s->crypto,
> -.offset = s->crypt_physical_offset ?
> -  file_cluster_offset + offset_into_cluster(s, offset) :
> -  offset,
> +.offset = offset,
>  .buf = buf,
>  .len = len,
>  .func = func,
> @@ -251,18 +255,51 @@ qcow2_co_encdec(BlockDriverState *bs, uint64_t 
> file_cluster_offset,
>  return qcow2_co_process(bs, qcow2_encdec_pool_func, &arg);
>  }
>  
> +
> +/*
> + * qcow2_co_encrypt()
> + *
> + * Encrypts one or more contiguous aligned sectors
> + *
> + * @host_cluster_offset - underlying storage offset of the first cluster
> + * in which the encrypted data will be written
> + * Used as an initialization vector for encryption

s/as an/for generating the/

(AFAIU)

> + *
> + * @guest_offset - guest (virtual) offset of the first sector of the
> + * data to be encrypted
> + * Used as an initialization vector for older, qcow2 native encryption

I wouldn’t be so specific here.  I think it’d be better to just have a
common sentence like “Depending on the encryption method, either of
these offsets may be used for generating the initialization vector for
encryption.”

Well, technically, host_cluster_offset will not be used itself.  Before
we can use it, of course we have to add the in-cluster offset to it
(which qcow2_co_encdec() does).

This makes me wonder whether it wouldn’t make more sense to pass a
host_offset instead of a host_cluster_offset (i.e. make the callers add
the in-cluster offset to the host offset already)?

> + *
> + * @buf - buffer with the data to encrypt, that after encryption
> + *will be written to the underlying storage device at
> + *@host_cluster_offset

I think just “buffer with the data to encrypt” is sufficient.  (The rest
is just the same as above.)

> + *
> + * @len - length of the buffer (in sector size multiplies)

“In sector size multiples” to me means that it is a sector count (in
that “one sector size multiple” is equal to 512 byes).

Maybe “must be a BDRV_SECTOR_SIZE multiple” instead?

> + *
> + * Note that the group of the sectors, don't have to be aligned
> + * on cluster boundary and can also cross a cluster boundary.

Maybe “Note that while the whole range must be aligned on sectors, it
does not have to be aligned on clusters and can also cr

Re: [Qemu-devel] [PULL 0/1] Filemon test patches

2019-09-13 Thread Peter Maydell

On Wed, 11 Sep 2019 at 10:33, Daniel P. Berrangé  wrote:
>
> The following changes since commit 89ea03a7dc83ca36b670ba7f787802791fcb04b1:
>
>   Merge remote-tracking branch 
> 'remotes/huth-gitlab/tags/m68k-pull-2019-09-07' into staging (2019-09-09 
> 09:48:34 +0100)
>
> are available in the Git repository at:
>
>   https://github.com/berrange/qemu tags/filemon-test-pull-request
>
> for you to fetch changes up to bf9e0313c27d8e6ecd7f7de3d63e1cb25d8f6311:
>
>   tests: make filemonitor test more robust to event ordering (2019-09-11 
> 10:29:27 +0100)
>
> 
> Fix filemonitor test broken with newest Linux kernel
>
> 
>
> Daniel P. Berrangé (1):
>   tests: make filemonitor test more robust to event ordering
>

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.2
for any user-visible changes.

-- PMM

[Qemu-devel] [PATCH 1/2] trace: Remove trailing newline in events

2019-09-13 Thread Philippe Mathieu-Daudé

While the tracing frawework does not forbid trailing newline in
events format string, using them lead to confuse output.
It is the responsibility of the backend to properly end an event
line.

Some of our formats have trailing newlines, remove them.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/misc/trace-events | 10 +-
 hw/scsi/trace-events |  2 +-
 hw/sd/trace-events   |  2 +-
 nbd/trace-events |  4 ++--
 net/trace-events |  6 +++---
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/hw/misc/trace-events b/hw/misc/trace-events
index c1ea1aa437..74276225f8 100644
--- a/hw/misc/trace-events
+++ b/hw/misc/trace-events
@@ -118,11 +118,11 @@ iotkit_secctl_ns_read(uint32_t offset, uint64_t data, 
unsigned size) "IoTKit Sec
 iotkit_secctl_ns_write(uint32_t offset, uint64_t data, unsigned size) "IoTKit 
SecCtl NS regs write: offset 0x%x data 0x%" PRIx64 " size %u"
 
 # imx6ul_ccm.c
-ccm_entry(void) "\n"
-ccm_freq(uint32_t freq) "freq = %d\n"
-ccm_clock_freq(uint32_t clock, uint32_t freq) "(Clock = %d) = %d\n"
-ccm_read_reg(const char *reg_name, uint32_t value) "reg[%s] <= 0x%" PRIx32 "\n"
-ccm_write_reg(const char *reg_name, uint32_t value) "reg[%s] => 0x%" PRIx32 
"\n"
+ccm_entry(void) ""
+ccm_freq(uint32_t freq) "freq = %d"
+ccm_clock_freq(uint32_t clock, uint32_t freq) "(Clock = %d) = %d"
+ccm_read_reg(const char *reg_name, uint32_t value) "reg[%s] <= 0x%" PRIx32
+ccm_write_reg(const char *reg_name, uint32_t value) "reg[%s] => 0x%" PRIx32
 
 # iotkit-sysinfo.c
 iotkit_sysinfo_read(uint64_t offset, uint64_t data, unsigned size) "IoTKit 
SysInfo read: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u"
diff --git a/hw/scsi/trace-events b/hw/scsi/trace-events
index 452b5994e6..b0820052f8 100644
--- a/hw/scsi/trace-events
+++ b/hw/scsi/trace-events
@@ -28,7 +28,7 @@ mptsas_mmio_read(void *dev, uint32_t addr, uint32_t val) "dev 
%p addr 0x%08x val
 mptsas_mmio_unhandled_read(void *dev, uint32_t addr) "dev %p addr 0x%08x"
 mptsas_mmio_unhandled_write(void *dev, uint32_t addr, uint32_t val) "dev %p 
addr 0x%08x value 0x%x"
 mptsas_mmio_write(void *dev, uint32_t addr, uint32_t val) "dev %p addr 0x%08x 
value 0x%x"
-mptsas_process_message(void *dev, int msg, uint32_t ctx) "dev %p cmd %d 
context 0x%08x\n"
+mptsas_process_message(void *dev, int msg, uint32_t ctx) "dev %p cmd %d 
context 0x%08x"
 mptsas_process_scsi_io_request(void *dev, int bus, int target, int lun, 
uint64_t len) "dev %p dev %d:%d:%d length %"PRIu64""
 mptsas_reset(void *dev) "dev %p "
 mptsas_scsi_overflow(void *dev, uint32_t ctx, uint64_t req, uint64_t found) 
"dev %p context 0x%08x: %"PRIu64"/%"PRIu64""
diff --git a/hw/sd/trace-events b/hw/sd/trace-events
index 52971dc033..efcff666a2 100644
--- a/hw/sd/trace-events
+++ b/hw/sd/trace-events
@@ -4,7 +4,7 @@
 bcm2835_sdhost_read(uint64_t offset, uint64_t data, unsigned size) "offset 
0x%" PRIx64 " data 0x%" PRIx64 " size %u"
 bcm2835_sdhost_write(uint64_t offset, uint64_t data, unsigned size) "offset 
0x%" PRIx64 " data 0x%" PRIx64 " size %u"
 bcm2835_sdhost_edm_change(const char *why, uint32_t edm) "(%s) EDM now 0x%x"
-bcm2835_sdhost_update_irq(uint32_t irq) "IRQ bits 0x%x\n"
+bcm2835_sdhost_update_irq(uint32_t irq) "IRQ bits 0x%x"
 
 # core.c
 sdbus_command(const char *bus_name, uint8_t cmd, uint32_t arg) "@%s CMD%02d 
arg 0x%08x"
diff --git a/nbd/trace-events b/nbd/trace-events
index f6cde96790..a955918e97 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -61,8 +61,8 @@ nbd_negotiate_begin(void) "Beginning negotiation"
 nbd_negotiate_new_style_size_flags(uint64_t size, unsigned flags) "advertising 
size %" PRIu64 " and flags 0x%x"
 nbd_negotiate_success(void) "Negotiation succeeded"
 nbd_receive_request(uint32_t magic, uint16_t flags, uint16_t type, uint64_t 
from, uint32_t len) "Got request: { magic = 0x%" PRIx32 ", .flags = 0x%" PRIx16 
", .type = 0x%" PRIx16 ", from = %" PRIu64 ", len = %" PRIu32 " }"
-nbd_blk_aio_attached(const char *name, void *ctx) "Export %s: Attaching 
clients to AIO context %p\n"
-nbd_blk_aio_detach(const char *name, void *ctx) "Export %s: Detaching clients 
from AIO context %p\n"
+nbd_blk_aio_attached(const char *name, void *ctx) "Export %s: Attaching 
clients to AIO context %p"
+nbd_blk_aio_detach(const char *name, void *ctx) "Export %s: Detaching clients 
from AIO context %p"
 nbd_co_send_simple_reply(uint64_t handle, uint32_t error, const char *errname, 
int len) "Send simple reply: handle = %" PRIu64 ", error = %" PRIu32 " (%s), 
len = %d"
 nbd_co_send_structured_done(uint64_t handle) "Send structured reply done: 
handle = %" PRIu64
 nbd_co_send_structured_read(uint64_t handle, uint64_t offset, void *data, 
size_t size) "Send structured read data reply: handle = %" PRIu64 ", offset = 
%" PRIu64 ", data = %p, len = %zu"
diff --git a/net/trace-events b/net/trace-events
index ac57056497..02c13fd0ba 100644
--- a/net/trace-events
+++ b/net/trace-events
@@ -17,9 +17,9 @@ colo_compare_icmp_miscompare(const char *sta, int size) ": %s 
= %d"

[Qemu-devel] [PATCH 0/2] trace: Forbid trailing newline in event format

2019-09-13 Thread Philippe Mathieu-Daudé

Hi Stefan,

I'v been confused by trailing newline in trace reports,
so this series aims to fix this, by cleaning current
formats and add a check to catch new one introduced.

Regards,

Phil.

Philippe Mathieu-Daudé (2):
  trace: Remove trailing newline in events
  trace: Forbid event format ending with newline character

 docs/devel/tracing.txt|  2 ++
 hw/misc/trace-events  | 10 +-
 hw/scsi/trace-events  |  2 +-
 hw/sd/trace-events|  2 +-
 nbd/trace-events  |  4 ++--
 net/trace-events  |  6 +++---
 scripts/tracetool/__init__.py |  3 +++
 7 files changed, 17 insertions(+), 12 deletions(-)

-- 
2.20.1

[Qemu-devel] [PATCH 2/2] trace: Forbid event format ending with newline character

2019-09-13 Thread Philippe Mathieu-Daudé

Event format ending with newlines confuse the trace reports.
Forbid them.

Add a check to refuse new format added with trailing newline:

  $ make
  [...]
GEN hw/misc/trace.h
  Traceback (most recent call last):
File "scripts/tracetool.py", line 152, in 
  main(sys.argv)
File "scripts/tracetool.py", line 143, in main
  events.extend(tracetool.read_events(fh, arg))
File "scripts/tracetool/__init__.py", line 367, in read_events
  event = Event.build(line)
File "scripts/tracetool/__init__.py", line 281, in build
  raise ValueError("Event format can not end with a newline character")
  ValueError: Error at hw/misc/trace-events:121: Event format can not end with 
a newline character

Signed-off-by: Philippe Mathieu-Daudé 
---
 docs/devel/tracing.txt| 2 ++
 scripts/tracetool/__init__.py | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/docs/devel/tracing.txt b/docs/devel/tracing.txt
index 76e492a489..8231bbf5d1 100644
--- a/docs/devel/tracing.txt
+++ b/docs/devel/tracing.txt
@@ -112,6 +112,8 @@ Trace events should use types as follows:
 Format strings should reflect the types defined in the trace event.  Take
 special care to use PRId64 and PRIu64 for int64_t and uint64_t types,
 respectively.  This ensures portability between 32- and 64-bit platforms.
+Format strings must not end with a newline character.  It is the responsibility
+of backends to adapt line ending for proper logging.
 
 Each event declaration will start with the event name, then its arguments,
 finally a format string for pretty-printing. For example:
diff --git a/scripts/tracetool/__init__.py b/scripts/tracetool/__init__.py
index 6fca674936..57df74e67c 100644
--- a/scripts/tracetool/__init__.py
+++ b/scripts/tracetool/__init__.py
@@ -277,6 +277,9 @@ class Event(object):
 if fmt.find("%m") != -1 or fmt_trans.find("%m") != -1:
 raise ValueError("Event format '%m' is forbidden, pass the error "
  "as an explicit trace argument")
+if fmt.endswith("\\n\""):
+raise ValueError("Event format must not end with a newline "
+ "character")
 
 if len(fmt_trans) > 0:
 fmt = [fmt_trans, fmt]
-- 
2.20.1

Re: [Qemu-devel] [PATCH v3 2/3] block/qcow2: fix the corruption when rebasing luks encrypted files

2019-09-13 Thread Max Reitz

On 13.09.19 00:37, Maxim Levitsky wrote:
> This fixes subtle corruption introduced by luks threaded encryption
> in commit 8ac0f15f335
> 
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1745922
> 
> The corruption happens when we do a write that
>* writes to two or more unallocated clusters at once
>* doesn't fully cover the first sector
>* doesn't fully cover the last sector
> 
> In this case, when allocating the new clusters we COW both areas
> prior to the write and after the write, and we encrypt them.
> 
> The above mentioned commit accidentally made it so we encrypt the
> second COW area using the physical cluster offset of the first area.
> 
> Fix this by:
>  * Remove the offset_in_cluster parameter of do_perform_cow_encrypt,
>since it is misleading. That offset can be larger than cluster size
>currently.
> 
>Instead just add the start and the end COW area offsets to both host
>and guest offsets that do_perform_cow_encrypt receives.
> 
> *  in do_perform_cow_encrypt, remove the cluster offset from the host_offset,
>and thus pass correctly to the qcow2_co_encrypt, the host cluster offset
>and full guest offset
> 
> In the bugreport that was triggered by rebasing a luks image to new,
> zero filled base, which lot of such writes, and causes some files
> with zero areas to contain garbage there instead.
> But as described above it can happen elsewhere as well
> 
> 
> Signed-off-by: Maxim Levitsky 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/qcow2-cluster.c | 29 +
>  1 file changed, 17 insertions(+), 12 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v4 3/5] block/qcow2: refactor qcow2_co_preadv_part

2019-09-13 Thread Vladimir Sementsov-Ogievskiy

13.09.2019 13:01, Kevin Wolf wrote:
> Am 16.08.2019 um 17:30 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> Further patch will run partial requests of iterations of
>> qcow2_co_preadv in parallel for performance reasons. To prepare for
>> this, separate part which may be parallelized into separate function
>> (qcow2_co_preadv_task).
>>
>> While being here, also separate encrypted clusters reading to own
>> function, like it is done for compressed reading.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> Reviewed-by: Max Reitz 
>> ---
>>   qapi/block-core.json |   2 +-
>>   block/qcow2.c| 205 +++
>>   2 files changed, 111 insertions(+), 96 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 0d43d4f37c..dd80aa11db 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -3266,7 +3266,7 @@
>>   'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
>>   'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
>>   'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
>> -'cor_write', 'cluster_alloc_space', 'none'] }
>> +'cor_write', 'cluster_alloc_space', 'none', 'read_encrypted'] }
> 
> What's the point of this new blkdebug event?
> 
> Obviously, read_aio for an encrypted image must mean a read of encrypted
> data. The same image can never trigger both read_aio and
> read_encrypted, so why do we need to distinguish them as two different
> events?
> 

Seems I just done it looking at qcow2_co_preadv_compressed..

Anyway, I think you are right, so, I don't mind if Max drops this new event
and use read_aio in his branch, or I can resend the series or send a follow-up,
whichever you prefer.


-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH 2/2] target/hppa: prevent trashing of temporary in do_depw_sar()

2019-09-13 Thread Philippe Mathieu-Daudé

Hi Sven,

On 9/13/19 12:17 PM, Sven Schnelle wrote:
> nullify_over() calls brcond which destroys all temporaries.
> 
> Signed-off-by: Sven Schnelle 
> ---
>  target/hppa/translate.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/target/hppa/translate.c b/target/hppa/translate.c
> index b12525d535..c1b2822f60 100644
> --- a/target/hppa/translate.c
> +++ b/target/hppa/translate.c
> @@ -3404,10 +3404,6 @@ static bool do_depw_sar(DisasContext *ctx, unsigned 
> rt, unsigned c,
>  TCGv_reg mask, tmp, shift, dest;
>  unsigned msb = 1U << (len - 1);
>  
> -if (c) {
> -nullify_over(ctx);
> -}
> -
>  dest = dest_gpr(ctx, rt);
>  shift = tcg_temp_new();
>  tmp = tcg_temp_new();
> @@ -3440,11 +3436,17 @@ static bool do_depw_sar(DisasContext *ctx, unsigned 
> rt, unsigned c,
>  
>  static bool trans_depw_sar(DisasContext *ctx, arg_depw_sar *a)
>  {
> +if (a->c) {
> +nullify_over(ctx);
> +}
>  return do_depw_sar(ctx, a->t, a->c, a->nz, a->clen, load_gpr(ctx, a->r));
>  }
>  
>  static bool trans_depwi_sar(DisasContext *ctx, arg_depwi_sar *a)
>  {
> +if (a->c) {
> +nullify_over(ctx);
> +}

I don't see how this patch helps or change anything, isn't it the same?
You clean in the caller rather than the callee.

>  return do_depw_sar(ctx, a->t, a->c, a->nz, a->clen, load_const(ctx, 
> a->i));
>  }
>  
>

[Qemu-devel] [PATCH v2] virtio-blk: schedule virtio_notify_config to run on main context

2019-09-13 Thread Sergio Lopez

virtio_notify_config() needs to acquire the global mutex, which isn't
allowed from an iothread, and may lead to a deadlock like this:

 - main thead
  * Has acquired: qemu_global_mutex.
  * Is trying the acquire: iothread AioContext lock via
AIO_WAIT_WHILE (after aio_poll).

 - iothread
  * Has acquired: AioContext lock.
  * Is trying to acquire: qemu_global_mutex (via
virtio_notify_config->prepare_mmio_access).

If virtio_blk_resize() is called from an iothread, schedule
virtio_notify_config() to be run in the main context BH.

Signed-off-by: Sergio Lopez 
---
Changelog

v2:
 - Use aio_bh_schedule_oneshot instead of scheduling a coroutine
   (thanks Kevin Wolf).
 - Switch from RFC to v2 patch.
---
 hw/block/virtio-blk.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 18851601cb..669dc60f5b 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -16,6 +16,7 @@
 #include "qemu/iov.h"
 #include "qemu/module.h"
 #include "qemu/error-report.h"
+#include "qemu/main-loop.h"
 #include "trace.h"
 #include "hw/block/block.h"
 #include "hw/qdev-properties.h"
@@ -1086,11 +1087,29 @@ static int virtio_blk_load_device(VirtIODevice *vdev, 
QEMUFile *f,
 return 0;
 }
 
+static void virtio_resize_cb(void *opaque)
+{
+VirtIODevice *vdev = opaque;
+
+assert(qemu_get_current_aio_context() == qemu_get_aio_context());
+virtio_notify_config(vdev);
+}
+
 static void virtio_blk_resize(void *opaque)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
 
-virtio_notify_config(vdev);
+if (qemu_get_current_aio_context() != qemu_get_aio_context()) {
+/*
+ * virtio_notify_config() needs to acquire the global mutex,
+ * so it can't be called from an iothread. Instead, schedule
+ * it to be run in the main context BH.
+ */
+aio_bh_schedule_oneshot(qemu_get_aio_context(),
+virtio_resize_cb, vdev);
+} else {
+virtio_notify_config(vdev);
+}
 }
 
 static const BlockDevOps virtio_block_ops = {
-- 
2.21.0

Re: [Qemu-devel] [PATCH v3 3/3] qemu-iotests: Add test for bz #1745922

2019-09-13 Thread Max Reitz

On 13.09.19 00:37, Maxim Levitsky wrote:
> Signed-off-by: Maxim Levitsky 
> ---
>  tests/qemu-iotests/263 | 91 ++
>  tests/qemu-iotests/263.out | 40 +
>  tests/qemu-iotests/group   |  1 +
>  3 files changed, 132 insertions(+)
>  create mode 100755 tests/qemu-iotests/263
>  create mode 100644 tests/qemu-iotests/263.out
> 
> diff --git a/tests/qemu-iotests/263 b/tests/qemu-iotests/263
> new file mode 100755
> index 00..afbd668cda
> --- /dev/null
> +++ b/tests/qemu-iotests/263
> @@ -0,0 +1,91 @@
> +#!/usr/bin/env bash
> +#
> +# Test encrypted write that crosses cluster boundary of two unallocated 
> clusters
> +# Based on 188
> +#
> +# Copyright (C) 2019 Red Hat, Inc.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see .
> +#
> +
> +# creator
> +owner=mlevi...@redhat.com
> +
> +seq=`basename $0`
> +echo "QA output created by $seq"
> +
> +status=1 # failure is the default!
> +
> +_cleanup()
> +{
> + _cleanup_test_img
> +}
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +# get standard environment, filters and checks
> +. ./common.rc
> +. ./common.filter
> +
> +_supported_fmt qcow2
> +_supported_proto generic
> +_supported_os Linux
> +
> +
> +size=1M
> +
> +SECRET="secret,id=sec0,data=astrochicken"
> +QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
> +
> +
> +_run_test()
> +{
> + echo "== reading the whole image =="
> + $QEMU_IO --object $SECRET -c "read -P 0 0 $size" --image-opts $1 | 
> _filter_qemu_io | _filter_testdir
> +
> + echo
> + echo "== write two 512 byte sectors on a cluster boundary =="
> + $QEMU_IO --object $SECRET -c "write -P 0xAA 0xFE00 0x400" --image-opts 
> $1 | _filter_qemu_io | _filter_testdir
> +
> + echo
> + echo "== verify that the rest of the image is not changed =="
> + $QEMU_IO --object $SECRET -c "read -P 0x00 0x0 0xFE00" --image-opts 
> $1 | _filter_qemu_io | _filter_testdir
> + $QEMU_IO --object $SECRET -c "read -P 0xAA 0x0FE00 0x400" --image-opts 
> $1 | _filter_qemu_io | _filter_testdir
> + $QEMU_IO --object $SECRET -c "read -P 0x00 0x10200 0xEFE00" 
> --image-opts $1 | _filter_qemu_io | _filter_testdir

Looks good to me overall, but in case you respin, I don’t think there’s
reason not to quote the $1.

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] target/arm: Fix sign-extension for SMLAL*

2019-09-13 Thread Philippe Mathieu-Daudé

On 9/12/19 8:30 PM, Richard Henderson wrote:
> The 32-bit product should be sign-extended, not zero-extended.
> 
> Fixes: ea96b374641b
> Reported-by: Laurent Desnogues 
> Signed-off-by: Richard Henderson 

Reviewed-by: Philippe Mathieu-Daudé 

> ---
>  target/arm/translate.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index 34bb280e3d..fd2f0e3048 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -8045,7 +8045,9 @@ static bool op_smlaxxx(DisasContext *s, arg_ *a,
>  case 2:
>  tl = load_reg(s, a->ra);
>  th = load_reg(s, a->rd);
> -t1 = tcg_const_i32(0);
> +/* Sign-extend the 32-bit product to 64 bits.  */
> +t1 = tcg_temp_new_i32();
> +tcg_gen_sari_i32(t1, t0, 31);
>  tcg_gen_add2_i32(tl, th, tl, th, t0, t1);
>  tcg_temp_free_i32(t0);
>  tcg_temp_free_i32(t1);
>

Re: [Qemu-devel] [PATCH] build: Don't ignore qapi-visit-core.c

2019-09-13 Thread Philippe Mathieu-Daudé

On 9/12/19 8:46 PM, Eric Blake wrote:
> This file is version-controlled, and not generated from a .json file.
> 
> Fixes: bf582c3461b
> Reported-by: Thomas Huth 

Good catch!

Reviewed-by: Philippe Mathieu-Daudé 

> Signed-off-by: Eric Blake 
> ---
>  .gitignore | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/.gitignore b/.gitignore
> index e9bbc006d39e..7de868d1eab4 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -41,6 +41,7 @@
>  /qapi/qapi-types-*.[ch]
>  /qapi/qapi-types.[ch]
>  /qapi/qapi-visit-*.[ch]
> +!/qapi/qapi-visit-core.c
>  /qapi/qapi-visit.[ch]
>  /qapi/qapi-doc.texi
>  /qemu-doc.html
>

Re: [Qemu-devel] [PATCH v4 3/5] block/qcow2: refactor qcow2_co_preadv_part

2019-09-13 Thread Max Reitz

On 13.09.19 12:53, Vladimir Sementsov-Ogievskiy wrote:
> 13.09.2019 13:01, Kevin Wolf wrote:
>> Am 16.08.2019 um 17:30 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> Further patch will run partial requests of iterations of
>>> qcow2_co_preadv in parallel for performance reasons. To prepare for
>>> this, separate part which may be parallelized into separate function
>>> (qcow2_co_preadv_task).
>>>
>>> While being here, also separate encrypted clusters reading to own
>>> function, like it is done for compressed reading.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> Reviewed-by: Max Reitz 
>>> ---
>>>   qapi/block-core.json |   2 +-
>>>   block/qcow2.c| 205 +++
>>>   2 files changed, 111 insertions(+), 96 deletions(-)
>>>
>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>> index 0d43d4f37c..dd80aa11db 100644
>>> --- a/qapi/block-core.json
>>> +++ b/qapi/block-core.json
>>> @@ -3266,7 +3266,7 @@
>>>   'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
>>>   'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
>>>   'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
>>> -'cor_write', 'cluster_alloc_space', 'none'] }
>>> +'cor_write', 'cluster_alloc_space', 'none', 'read_encrypted'] }
>>
>> What's the point of this new blkdebug event?
>>
>> Obviously, read_aio for an encrypted image must mean a read of encrypted
>> data. The same image can never trigger both read_aio and
>> read_encrypted, so why do we need to distinguish them as two different
>> events?
>>
> 
> Seems I just done it looking at qcow2_co_preadv_compressed..
> 
> Anyway, I think you are right, so, I don't mind if Max drops this new event
> and use read_aio in his branch, or I can resend the series or send a 
> follow-up,
> whichever you prefer.

Should I squash this in?

diff --git a/qapi/block-core.json b/qapi/block-core.json
index d9ae73a43c..e6edd641f1 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3264,7 +3264,7 @@
 'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
 'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
 'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
-'cor_write', 'cluster_alloc_space', 'none', 'read_encrypted'] }
+'cor_write', 'cluster_alloc_space', 'none'] }

 ##
 # @BlkdebugIOType:
diff --git a/block/qcow2.c b/block/qcow2.c
index b5fe014b20..c07ce84d54 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2001,7 +2001,7 @@ qcow2_co_preadv_encrypted(BlockDriverState *bs,
 return -ENOMEM;
 }

-BLKDBG_EVENT(bs->file, BLKDBG_READ_ENCRYPTED);
+BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
 ret = bdrv_co_pread(s->data_file,
 file_cluster_offset + offset_into_cluster(s,
offset),
 bytes, buf, 0);



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 2/2] target/hppa: prevent trashing of temporary in do_depw_sar()

2019-09-13 Thread Sven Schnelle

Hi Philippe,

On Fri, Sep 13, 2019 at 12:58:14PM +0200, Philippe Mathieu-Daudé wrote:
> Hi Sven,
> 
> On 9/13/19 12:17 PM, Sven Schnelle wrote:
> > nullify_over() calls brcond which destroys all temporaries.
> > 
> > Signed-off-by: Sven Schnelle 
> > ---
> >  target/hppa/translate.c | 10 ++
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> > 
> > diff --git a/target/hppa/translate.c b/target/hppa/translate.c
> > index b12525d535..c1b2822f60 100644
> > --- a/target/hppa/translate.c
> > +++ b/target/hppa/translate.c
> > @@ -3404,10 +3404,6 @@ static bool do_depw_sar(DisasContext *ctx, unsigned 
> > rt, unsigned c,
> >  TCGv_reg mask, tmp, shift, dest;
> >  unsigned msb = 1U << (len - 1);
> >  
> > -if (c) {
> > -nullify_over(ctx);
> > -}
> > -
> >  dest = dest_gpr(ctx, rt);
> >  shift = tcg_temp_new();
> >  tmp = tcg_temp_new();
> > @@ -3440,11 +3436,17 @@ static bool do_depw_sar(DisasContext *ctx, unsigned 
> > rt, unsigned c,
> >  
> >  static bool trans_depw_sar(DisasContext *ctx, arg_depw_sar *a)
> >  {
> > +if (a->c) {
> > +nullify_over(ctx);
> > +}
> >  return do_depw_sar(ctx, a->t, a->c, a->nz, a->clen, load_gpr(ctx, 
> > a->r));
> >  }
> >  
> >  static bool trans_depwi_sar(DisasContext *ctx, arg_depwi_sar *a)
> >  {
> > +if (a->c) {
> > +nullify_over(ctx);
> > +}
> 
> I don't see how this patch helps or change anything, isn't it the same?
> You clean in the caller rather than the callee.

The Problem is that load_gpr()/load_const() allocate a temporary, which
gets destroyed in do_depw_sar() when nullify_over() is called. If we
move the nullify_over() before doing the load_gpr()/load_const() this
doesn't happen.

> >  return do_depw_sar(ctx, a->t, a->c, a->nz, a->clen, load_const(ctx, 
> > a->i));
> >  }
> >  
> >

Regards
Sven

Re: [Qemu-devel] [PATCH] blockdev: avoid acquiring AioContext lock twice at do_drive_backup()

2019-09-13 Thread Max Reitz

On 13.09.19 12:15, Sergio Lopez wrote:
> 
> Max Reitz  writes:
> 
>> On 13.09.19 11:37, Sergio Lopez wrote:
>>>
>>> Max Reitz  writes:
>>>
 On 12.09.19 18:16, Sergio Lopez wrote:
> do_drive_backup() acquires the AioContext lock of the corresponding
> BlockDriverState. This is not a problem when it's called from
> qmp_drive_backup(), but drive_backup_prepare() also acquires the lock
> before calling it.
>
> This change adds a BlockDriverState argument to do_drive_backup(),
> which is used to signal that the context lock is already acquired and
> to save a couple of redundant calls.

 But those redundant calls don’t really hurt (it’s just bdrv_lookup_bs(),
 as far as I can tell).  Wouldn’t it be simpler to just release the
 context lock in drive_backup_prepare() before calling do_drive_backup()?
  The BDS is drained anyway.
>>>
>>> Redundant calls rarely hurt, they're just redundant ;-)
>>
>> If they’re expensive and in a hot path, they hurt.
>>
 On top of that, do_backup_common() calls bdrv_try_set_aio_context() to
 bring the target into the source’s AioContext.  However, this function
 must be called with the old AioContext held, and the new context not held.
>>>
>>> Is this documented somewhere? I see nothing in the function declaration
>>> nor definition.
>>>
>>> I'm starting to get the feeling that the block layer is riddled with
>>> unwritten rules and assumptions that makes every change a lot harder
>>> than it should be.
>>
>> It is written, it’s just that it’s written in
>> bdrv_set_aio_context_ignore()’s definition.
>>
>> Yes, we should document it directly for bdrv_try_set_aio_context(), too,
>> because that’s what external callers are much more likely to use.
>>
 Currently, it’s called exactly the other way around: With the new
 context held, but the old one not held.

 So I think it indeed actually makes more sense to release the AioContext
 before calling do_drive_backup(), and to move the
 bdrv_try_set_aio_context() call for target_bs to the callers of
 do_backup_common() (where they have not yet taken the AioContext lock).
>>>
>>> OK. I see this also happens in external_snapshot_prepare() and
>>> qmp_drive_mirror() too. I guess we should fix these too.
>>>
>>> In qmp_drive_mirror(), would it be safe to delay the acquisition of any
>>> context until just before the blockdev_mirror_common()?
>>
>> From mirror’s perspective I think so, but I don’t think it’s safe to
>> access any of a BDS’s fields without having acquired its AioContext.
>> (In fact, I wonder whether we should acquire the context even before
>> bdrv_op_is_blocked()...)
> 
> In that case, I wonder if we can safely release the context to honor
> bdrv_try_set_aio_context() requirements, knowing we aren't in a drained
> section.

Hm.  I suppose it depends.  From the main context, it is OK to access
all fields that only the main context accesses.  So
bdrv_try_set_aio_context() should be safe because they do that.

I suppose the same goes for bdrv_op_is_blocked(), because op blockers
are only set up or removed in the main context.  So it should be safe as
it is after all.

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH V2 2/2] block/nfs: add support for nfs_umount

2019-09-13 Thread Max Reitz

On 13.09.19 12:09, ronnie sahlberg wrote:
> On Wed, Sep 11, 2019 at 5:48 PM Max Reitz  wrote:
>>
>> On 10.09.19 17:41, Peter Lieven wrote:
>>> libnfs recently added support for unmounting. Add support
>>> in Qemu too.
>>>
>>> Signed-off-by: Peter Lieven 
>>> ---
>>>  block/nfs.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/block/nfs.c b/block/nfs.c
>>> index 2c98508275..f39acfdb28 100644
>>> --- a/block/nfs.c
>>> +++ b/block/nfs.c
>>> @@ -398,6 +398,9 @@ static void nfs_client_close(NFSClient *client)
>>>  nfs_close(client->context, client->fh);
>>>  client->fh = NULL;
>>>  }
>>> +#ifdef LIBNFS_FEATURE_UMOUNT
>>> +nfs_umount(client->context);
>>> +#endif
>>>  nfs_destroy_context(client->context);
>>>  client->context = NULL;
>>>  }
>>
>> I don’t understand what unmounting means in this context.  Is it just
>> generic clean-up for NFSv3 (it appears that it’s a no-op for NFSv4)?
>> Why isn’t that done by nfs_destroy_context()?
> 
> Umount is weird since there isn't actually any state in NFSv3 and
> "mounting" in nfsv3 is really just a matter of converting the path to
> be mounted into a filehandle.
> That is all the mount protocol is really used for.
> 
> This is all handled in a separate protocol/server called rpc.mountd
> that is separate from NFSd. Running as a different process and
> listening to a different port.
> And the only purpose of rpc.mountd is to take a path to a share and
> return a nfsv3 filehandle to the root of that path.
> As a side-effect, rpc.mountd also keeps track of which clients have
> called MNT but not yet UMNT and thus showmount -a
> can give a lost of all client that have "mounted" the share but not
> yet called "UMNT".
> 
> It has no effect at all on NFSv3 and is purely cosmetic. This ONLY
> affects "showmount -a" output.
> NFSv4 does away with all these separate protocols such as mount,
> statd, nlm and even portmapper
> so there is not even a concept of showmount -a for nfsv4.
> 
> 
> As the libnfs maintainer, why did I do nfs_umount() the way I did?
> First of all, I think of nfs UMNT as really just cosmetic and didn't
> want to put too much work into it. But people wanted it.
> 
> I implemented it as a sync function since I think few people would
> actually use it at all and it meant that I just didn't have to invest
> in having to build an async piupelinje.
> 
> I did NOT implement it inside nfs_destroy_context() since that
> function is supposed to be, in my view, non-blocking andn should just
> tear the connection and immediately return.
> As unmount would be
> * close the tcp socket to the nfs server
> * open new socket to portmapper and ask "where is rpc.mountd"
> * close socket to portmapper, then open new socket to rpc.mountd
>  * tell rpc.mountd to remove us from the showmount -a list
> * close socket
> 
> I just took the cheap and easy path. Do it as a sync function with my
> own eventloop.
> 
> Again, UMNT has no real effect on anything related to NFS except what
> showmount -a will return. That is one big reason why
> I was just not much motivated enough to build it as an async function.
> 
> Once we all switch to NFSv4 this will all be moot since the MOUNT
> protocol no longer exists and neither does rpc.mountd.

OK.  Thanks a lot for the detailed explanation! :-)

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 2/2] target/hppa: prevent trashing of temporary in do_depw_sar()

2019-09-13 Thread Philippe Mathieu-Daudé

On 9/13/19 1:08 PM, Sven Schnelle wrote:
> Hi Philippe,
> 
> On Fri, Sep 13, 2019 at 12:58:14PM +0200, Philippe Mathieu-Daudé wrote:
>> Hi Sven,
>>
>> On 9/13/19 12:17 PM, Sven Schnelle wrote:
>>> nullify_over() calls brcond which destroys all temporaries.
>>>
>>> Signed-off-by: Sven Schnelle 
>>> ---
>>>  target/hppa/translate.c | 10 ++
>>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/target/hppa/translate.c b/target/hppa/translate.c
>>> index b12525d535..c1b2822f60 100644
>>> --- a/target/hppa/translate.c
>>> +++ b/target/hppa/translate.c
>>> @@ -3404,10 +3404,6 @@ static bool do_depw_sar(DisasContext *ctx, unsigned 
>>> rt, unsigned c,
>>>  TCGv_reg mask, tmp, shift, dest;
>>>  unsigned msb = 1U << (len - 1);
>>>  
>>> -if (c) {
>>> -nullify_over(ctx);
>>> -}
>>> -
>>>  dest = dest_gpr(ctx, rt);
>>>  shift = tcg_temp_new();
>>>  tmp = tcg_temp_new();
>>> @@ -3440,11 +3436,17 @@ static bool do_depw_sar(DisasContext *ctx, unsigned 
>>> rt, unsigned c,
>>>  
>>>  static bool trans_depw_sar(DisasContext *ctx, arg_depw_sar *a)
>>>  {
>>> +if (a->c) {
>>> +nullify_over(ctx);
>>> +}
>>>  return do_depw_sar(ctx, a->t, a->c, a->nz, a->clen, load_gpr(ctx, 
>>> a->r));
>>>  }
>>>  
>>>  static bool trans_depwi_sar(DisasContext *ctx, arg_depwi_sar *a)
>>>  {
>>> +if (a->c) {
>>> +nullify_over(ctx);
>>> +}
>>
>> I don't see how this patch helps or change anything, isn't it the same?
>> You clean in the caller rather than the callee.
> 
> The Problem is that load_gpr()/load_const() allocate a temporary, which
> gets destroyed in do_depw_sar() when nullify_over() is called. If we
> move the nullify_over() before doing the load_gpr()/load_const() this
> doesn't happen.

Ah! The 'val' argument... I missed that, thanks!

Maybe we can add a comment to make it clearer:

   if (a->c) {
   /*
* nullify here to not free the load_gpr() arg before
* calling depw_sar.
*/
   nullify_over(ctx);
   }
   return do_depw_sar(ctx, a->t, a->c, a->nz, a->clen, load_gpr(ctx, a->r));

(also in the other function).

Reviewed-by: Philippe Mathieu-Daudé 

>>>  return do_depw_sar(ctx, a->t, a->c, a->nz, a->clen, load_const(ctx, 
>>> a->i));
>>>  }
>>>  
>>>
> 
> Regards
> Sven
>

Re: [Qemu-devel] [Qemu-block] [PATCH v2 6/7] curl: Handle success in multi_check_completion

2019-09-13 Thread Max Reitz

On 10.09.19 18:13, Maxim Levitsky wrote:
> On Tue, 2019-09-10 at 14:41 +0200, Max Reitz wrote:
>> Background: As of cURL 7.59.0, it verifies that several functions are
>> not called from within a callback.  Among these functions is
>> curl_multi_add_handle().
>>
>> curl_read_cb() is a callback from cURL and not a coroutine.  Waking up
>> acb->co will lead to entering it then and there, which means the current
>> request will settle and the caller (if it runs in the same coroutine)
>> may then issue the next request.  In such a case, we will enter
>> curl_setup_preadv() effectively from within curl_read_cb().
>>
>> Calling curl_multi_add_handle() will then fail and the new request will
>> not be processed.
>>
>> Fix this by not letting curl_read_cb() wake up acb->co.  Instead, leave
>> the whole business of settling the AIOCB objects to
>> curl_multi_check_completion() (which is called from our timer callback
>> and our FD handler, so not from any cURL callbacks).
>>
>> Reported-by: Natalie Gavrielov 
>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1740193
>> Cc: qemu-sta...@nongnu.org
>> Signed-off-by: Max Reitz 
>> ---
>>  block/curl.c | 69 ++--
>>  1 file changed, 29 insertions(+), 40 deletions(-)
>>
>> diff --git a/block/curl.c b/block/curl.c
>> index fd70f1ebc4..c343c7ed3d 100644
>> --- a/block/curl.c
>> +++ b/block/curl.c
>> @@ -229,7 +229,6 @@ static size_t curl_read_cb(void *ptr, size_t size, 
>> size_t nmemb, void *opaque)
>>  {
>>  CURLState *s = ((CURLState*)opaque);
>>  size_t realsize = size * nmemb;
>> -int i;
>>  
>>  trace_curl_read_cb(realsize);
>>  
>> @@ -245,32 +244,6 @@ static size_t curl_read_cb(void *ptr, size_t size, 
>> size_t nmemb, void *opaque)
>>  memcpy(s->orig_buf + s->buf_off, ptr, realsize);
>>  s->buf_off += realsize;
>>  
>> -for(i=0; i> -CURLAIOCB *acb = s->acb[i];
>> -
>> -if (!acb)
>> -continue;
>> -
>> -if ((s->buf_off >= acb->end)) {
>> -size_t request_length = acb->bytes;
>> -
>> -qemu_iovec_from_buf(acb->qiov, 0, s->orig_buf + acb->start,
>> -acb->end - acb->start);
>> -
>> -if (acb->end - acb->start < request_length) {
>> -size_t offset = acb->end - acb->start;
>> -qemu_iovec_memset(acb->qiov, offset, 0,
>> -  request_length - offset);
>> -}
>> -
>> -acb->ret = 0;
>> -s->acb[i] = NULL;
>> -qemu_mutex_unlock(&s->s->mutex);
>> -aio_co_wake(acb->co);
>> -qemu_mutex_lock(&s->s->mutex);
>> -}
>> -}
>> -
>>  read_end:
>>  /* curl will error out if we do not return this value */
>>  return size * nmemb;
>> @@ -351,13 +324,14 @@ static void curl_multi_check_completion(BDRVCURLState 
>> *s)
>>  break;
>>  
>>  if (msg->msg == CURLMSG_DONE) {
>> +int i;
>>  CURLState *state = NULL;
>> +bool error = msg->data.result != CURLE_OK;
>> +
>>  curl_easy_getinfo(msg->easy_handle, CURLINFO_PRIVATE,
>>(char **)&state);
>>  
>> -/* ACBs for successful messages get completed in curl_read_cb */
>> -if (msg->data.result != CURLE_OK) {
>> -int i;
>> +if (error) {
>>  static int errcount = 100;
>>  
>>  /* Don't lose the original error message from curl, since
>> @@ -369,20 +343,35 @@ static void curl_multi_check_completion(BDRVCURLState 
>> *s)
>>  error_report("curl: further errors suppressed");
>>  }
>>  }
>> +}
>>  
>> -for (i = 0; i < CURL_NUM_ACB; i++) {
>> -CURLAIOCB *acb = state->acb[i];
>> +for (i = 0; i < CURL_NUM_ACB; i++) {
>> +CURLAIOCB *acb = state->acb[i];
>>  
>> -if (acb == NULL) {
>> -continue;
>> -}
>> +if (acb == NULL) {
>> +continue;
>> +}
>> +
>> +if (!error) {
>> +/* Assert that we have read all data */
>> +assert(state->buf_off >= acb->end);
>> +
>> +qemu_iovec_from_buf(acb->qiov, 0,
>> +state->orig_buf + acb->start,
>> +acb->end - acb->start);
>>  
>> -acb->ret = -EIO;
>> -state->acb[i] = NULL;
>> -qemu_mutex_unlock(&s->mutex);
>> -aio_co_wake(acb->co);
>> -qemu_mutex_lock(&s->mutex);
>> +if (acb->end - acb->start < acb->bytes) {
>> +size_t offset = acb->end - acb->start;
>> +qemu_io

Re: [Qemu-devel] [PATCH v3] blockjob: update nodes head while removing all bdrv

2019-09-13 Thread Max Reitz

On 11.09.19 12:03, Max Reitz wrote:
> From: Sergio Lopez 
> 
> block_job_remove_all_bdrv() iterates through job->nodes, calling
> bdrv_root_unref_child() for each entry. The call to the latter may
> reach child_job_[can_]set_aio_ctx(), which will also attempt to
> traverse job->nodes, potentially finding entries that where freed
> on previous iterations.
> 
> To avoid this situation, update job->nodes head on each iteration to
> ensure that already freed entries are no longer linked to the list.
> 
> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1746631
> Signed-off-by: Sergio Lopez 
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Max Reitz 
> ---
> v3:
> - Rewrote the loop to make the whole function a bit simpler
>   (Also, remove the node from the job->nodes list before unref'ing it,
>   just to be extra-safe)
> ---
>  blockjob.c | 17 +
>  1 file changed, 13 insertions(+), 4 deletions(-)

Thanks Sergio for tracking down the bug’s cause, the original patch, and
the review; I’ve applied the patch to my block branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] build: Don't ignore qapi-visit-core.c

2019-09-13 Thread Eric Blake

cc: qemu-trivial

On 9/12/19 1:46 PM, Eric Blake wrote:
> This file is version-controlled, and not generated from a .json file.
> 
> Fixes: bf582c3461b
> Reported-by: Thomas Huth 
> Signed-off-by: Eric Blake 
> ---
>  .gitignore | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/.gitignore b/.gitignore
> index e9bbc006d39e..7de868d1eab4 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -41,6 +41,7 @@
>  /qapi/qapi-types-*.[ch]
>  /qapi/qapi-types.[ch]
>  /qapi/qapi-visit-*.[ch]
> +!/qapi/qapi-visit-core.c
>  /qapi/qapi-visit.[ch]
>  /qapi/qapi-doc.texi
>  /qemu-doc.html
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v4 3/5] block/qcow2: refactor qcow2_co_preadv_part

2019-09-13 Thread Kevin Wolf

Am 13.09.2019 um 13:06 hat Max Reitz geschrieben:
> On 13.09.19 12:53, Vladimir Sementsov-Ogievskiy wrote:
> > 13.09.2019 13:01, Kevin Wolf wrote:
> >> Am 16.08.2019 um 17:30 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>> Further patch will run partial requests of iterations of
> >>> qcow2_co_preadv in parallel for performance reasons. To prepare for
> >>> this, separate part which may be parallelized into separate function
> >>> (qcow2_co_preadv_task).
> >>>
> >>> While being here, also separate encrypted clusters reading to own
> >>> function, like it is done for compressed reading.
> >>>
> >>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> >>> Reviewed-by: Max Reitz 
> >>> ---
> >>>   qapi/block-core.json |   2 +-
> >>>   block/qcow2.c| 205 +++
> >>>   2 files changed, 111 insertions(+), 96 deletions(-)
> >>>
> >>> diff --git a/qapi/block-core.json b/qapi/block-core.json
> >>> index 0d43d4f37c..dd80aa11db 100644
> >>> --- a/qapi/block-core.json
> >>> +++ b/qapi/block-core.json
> >>> @@ -3266,7 +3266,7 @@
> >>>   'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
> >>>   'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
> >>>   'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
> >>> -'cor_write', 'cluster_alloc_space', 'none'] }
> >>> +'cor_write', 'cluster_alloc_space', 'none', 
> >>> 'read_encrypted'] }
> >>
> >> What's the point of this new blkdebug event?
> >>
> >> Obviously, read_aio for an encrypted image must mean a read of encrypted
> >> data. The same image can never trigger both read_aio and
> >> read_encrypted, so why do we need to distinguish them as two different
> >> events?
> >>
> > 
> > Seems I just done it looking at qcow2_co_preadv_compressed..
> > 
> > Anyway, I think you are right, so, I don't mind if Max drops this new event
> > and use read_aio in his branch, or I can resend the series or send a 
> > follow-up,
> > whichever you prefer.
> 
> Should I squash this in?

Looks good to me.



signature.asc
Description: PGP signature

Re: [Qemu-devel] [Qemu-block] [PATCH v2 6/7] curl: Handle success in multi_check_completion

2019-09-13 Thread Maxim Levitsky

On Fri, 2019-09-13 at 13:20 +0200, Max Reitz wrote:
> On 10.09.19 18:13, Maxim Levitsky wrote:
> > On Tue, 2019-09-10 at 14:41 +0200, Max Reitz wrote:
> > > Background: As of cURL 7.59.0, it verifies that several functions are
> > > not called from within a callback.  Among these functions is
> > > curl_multi_add_handle().
> > > 
> > > curl_read_cb() is a callback from cURL and not a coroutine.  Waking up
> > > acb->co will lead to entering it then and there, which means the current
> > > request will settle and the caller (if it runs in the same coroutine)
> > > may then issue the next request.  In such a case, we will enter
> > > curl_setup_preadv() effectively from within curl_read_cb().
> > > 
> > > Calling curl_multi_add_handle() will then fail and the new request will
> > > not be processed.
> > > 
> > > Fix this by not letting curl_read_cb() wake up acb->co.  Instead, leave
> > > the whole business of settling the AIOCB objects to
> > > curl_multi_check_completion() (which is called from our timer callback
> > > and our FD handler, so not from any cURL callbacks).
> > > 
> > > Reported-by: Natalie Gavrielov 
> > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1740193
> > > Cc: qemu-sta...@nongnu.org
> > > Signed-off-by: Max Reitz 
> > > ---
> > >  block/curl.c | 69 ++--
> > >  1 file changed, 29 insertions(+), 40 deletions(-)
> > > 
> > > diff --git a/block/curl.c b/block/curl.c
> > > index fd70f1ebc4..c343c7ed3d 100644
> > > --- a/block/curl.c
> > > +++ b/block/curl.c
> > > @@ -229,7 +229,6 @@ static size_t curl_read_cb(void *ptr, size_t size, 
> > > size_t nmemb, void *opaque)
> > >  {
> > >  CURLState *s = ((CURLState*)opaque);
> > >  size_t realsize = size * nmemb;
> > > -int i;
> > >  
> > >  trace_curl_read_cb(realsize);
> > >  
> > > @@ -245,32 +244,6 @@ static size_t curl_read_cb(void *ptr, size_t size, 
> > > size_t nmemb, void *opaque)
> > >  memcpy(s->orig_buf + s->buf_off, ptr, realsize);
> > >  s->buf_off += realsize;
> > >  
> > > -for(i=0; i > > -CURLAIOCB *acb = s->acb[i];
> > > -
> > > -if (!acb)
> > > -continue;
> > > -
> > > -if ((s->buf_off >= acb->end)) {
> > > -size_t request_length = acb->bytes;
> > > -
> > > -qemu_iovec_from_buf(acb->qiov, 0, s->orig_buf + acb->start,
> > > -acb->end - acb->start);
> > > -
> > > -if (acb->end - acb->start < request_length) {
> > > -size_t offset = acb->end - acb->start;
> > > -qemu_iovec_memset(acb->qiov, offset, 0,
> > > -  request_length - offset);
> > > -}
> > > -
> > > -acb->ret = 0;
> > > -s->acb[i] = NULL;
> > > -qemu_mutex_unlock(&s->s->mutex);
> > > -aio_co_wake(acb->co);
> > > -qemu_mutex_lock(&s->s->mutex);
> > > -}
> > > -}
> > > -
> > >  read_end:
> > >  /* curl will error out if we do not return this value */
> > >  return size * nmemb;
> > > @@ -351,13 +324,14 @@ static void 
> > > curl_multi_check_completion(BDRVCURLState *s)
> > >  break;
> > >  
> > >  if (msg->msg == CURLMSG_DONE) {
> > > +int i;
> > >  CURLState *state = NULL;
> > > +bool error = msg->data.result != CURLE_OK;
> > > +
> > >  curl_easy_getinfo(msg->easy_handle, CURLINFO_PRIVATE,
> > >(char **)&state);
> > >  
> > > -/* ACBs for successful messages get completed in 
> > > curl_read_cb */
> > > -if (msg->data.result != CURLE_OK) {
> > > -int i;
> > > +if (error) {
> > >  static int errcount = 100;
> > >  
> > >  /* Don't lose the original error message from curl, since
> > > @@ -369,20 +343,35 @@ static void 
> > > curl_multi_check_completion(BDRVCURLState *s)
> > >  error_report("curl: further errors suppressed");
> > >  }
> > >  }
> > > +}
> > >  
> > > -for (i = 0; i < CURL_NUM_ACB; i++) {
> > > -CURLAIOCB *acb = state->acb[i];
> > > +for (i = 0; i < CURL_NUM_ACB; i++) {
> > > +CURLAIOCB *acb = state->acb[i];
> > >  
> > > -if (acb == NULL) {
> > > -continue;
> > > -}
> > > +if (acb == NULL) {
> > > +continue;
> > > +}
> > > +
> > > +if (!error) {
> > > +/* Assert that we have read all data */
> > > +assert(state->buf_off >= acb->end);
> > > +
> > > +qemu_iovec_from_buf(acb->qiov, 0,
> > > +state->orig_buf + acb->start,
> > > +acb->end - acb

Re: [Qemu-devel] [PATCH v2 0/7] block/curl: Fix hang and potential crash

2019-09-13 Thread Max Reitz

On 10.09.19 14:41, Max Reitz wrote:
> Hi,
> 
> As reported in https://bugzilla.redhat.com/show_bug.cgi?id=1740193, our
> curl block driver can spontaneously hang.  This becomes visible e.g.
> when reading compressed qcow2 images:
> 
> $ qemu-img convert -p -O raw -n \
>   https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img \
>   null-co://
> 
> (Hangs at 74.21 %, usually.)
> 
> A more direct way is:
> 
> $ qemu-img bench -f raw http://download.qemu.org/qemu-4.1.0.tar.xz \
> -d 1 -S 524288 -c 2
> 
> (Which simply performs two requests, and the second one hangs.  You can
> use any HTTP resource (probably FTP, too) you’d like that is at least
> 1 MB in size.)
> 
> It turns out that this is because cURL 7.59.0 has added a protective
> feature against some misuse we had in our code: curl_multi_add_handle()
> must not be called from within a cURL callback, but in some cases we
> did.  As of 7.59.0, this fails, our new request is not registered and
> the I/O request stalls.  This is fixed by patch 6.
> 
> Patch 7 makes us check for curl_multi_add_handle()’s return code,
> because if we had done that before, debugging would have been much
> simpler.
> 
> 
> On the way to fixing it, I had a look over the whole cURL code and found
> a suspicious QLIST_FOREACH_SAFE() loop that actually does not seem very
> safe at all.  I think this may lead to crashes, although I have never
> seen any myself.  https://bugzilla.redhat.com/show_bug.cgi?id=1744602#c5
> shows one in exactly the function in question, so I think it actually is
> a problem.
> 
> This is fixed by patch 5, patches 1, 2, and 4 prepare for it.
> 
> (Patch 3 is kind of a misc patch that should ensure that we always end
> up calling curl_multi_check_completion() whenever a request might have
> been completed.)

Thanks for the review, applied to my block branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v4 00/14] block: Try to create well-typed json:{} filenames

2019-09-13 Thread Max Reitz

Another gentle ping.

Max


On 24.06.19 19:39, Max Reitz wrote:
> Hi,
> 
> There are two explanations of this cover letter, a relative one (to v3)
> and an absolute one.
> 
> 
> *** Important note ***
> 
> The final patch in this series is an example that converts most of
> block-core.json to use default values where possible.  We may decide to
> take it or not.  It isn’t important for the main purpose of this series,
> so I’d be very much fine with chopping it off.
> 
> (It does have a nice diff stat, though.)
> 
> *** Important note end ***
> 
> 
> Relative explanation:
> 
> The actual functional goal of this series is to allow all blockdev
> options that can be represented with -drive to have an equivalent with
> -blockdev (safe for rbd’s =keyvalue-pairs).
> 
> To this end, qcow(2)’s encryption needs an “auto” format which can
> automatically deduce the format from the image header.  To make things
> nicer, I decided (already in v1) to make this format optional so users
> could just specify encrypt.secret and let the format driver figure out
> the rest.
> 
> Until v3, this was implemented by letting the discriminator of flat
> unions be optional, as long as a default-value is provided.  Markus
> (rightfully) complained that this is very specific and would be covered
> by just having default values for QAPI struct members in general.
> So now this version implements this.  This is a bit more complicated
> than just implementing a default-variant, mainly because the latter only
> needs to accept enum values, whereas a generally usable “default” should
> accept values of all QAPI types (to the extent what is reasonable).
> 
> So what was (until v3)
> 
>   { 'union': 'Foo',
> 'base': { '*discr': 'SomeEnum' },
> 'discriminator': 'discr',
> 'default-variant': 'value1',
> 'data': { 'value1': 'Bar', 'value2': 'Baz' } }
> 
> becomes
> 
>   { 'union': 'Foo',
> 'base': { '*discr': { 'type': 'SomeEnum', 'default': 'value1' } },
> 'discriminator': 'discr',
> 'data': { 'value1': 'Bar', 'value2': 'Baz' } }
> 
> 
> 
> Absolute explanation:
> 
> When qemu reports json:{} filename, it just uses whatever type you gave
> an option in.  With -drive, all options are strings and they do not have
> to pass the test of the typing firewall of the QAPI schema, so you just
> get strings thrown back at you even if that does not match the schema.
> (Also, if you use json:{} yourself, you’re free to give the options as
> strings as well.)
> 
> Example:
> 
> $ ./qemu-img info --image-opts driver=raw,size=512,file.driver=null-co
> image: json:{"driver": "raw", "size": "512", "file": {"driver": "null-co"}}
> 
> @size is supposed to be an integer, according to the schema, so the
> correct result would be (which is what you get after this series):
> 
> $ ./qemu-img info --image-opts driver=raw,size=512,file.driver=null-co
> image: json:{"driver": "raw", "size": 512, "file": {"driver": "null-co"}}
> 
> 
> This is achieved by patch 11, which makes bdrv_refresh_filename() run
> the options through the flat-confused input visitor, and then through
> the output visitor to get all to the correct type.  If anything fails,
> the result is as before (hence the “Try” in the title).
> 
> There are cases where this cannot work.  Those are the ones where -drive
> accepts something that is not allowed by the QAPI schema.  One of these
> cases is rbd’s =keyvalue-pairs, which is just broken altogether, so
> let’s simply ignore that.  (I don’t think anybody’s going to complain
> that the json:{} filename they get is not correctly typed after they’ve
> used that option.)
> 
> The other case (I know of) is qcow(2)’s encryption.  In the QAPI schema,
> encrypt.format is not optional because it is the discriminator for which
> kind of options to use.  However, for -drive, it is optional because the
> qcow2 driver can infer the encryption format from the image header.
> 
> The solution that’s proposed by this series is to make flat union
> discriminators optional and provide a default.  This is accomplished by
> generally allowing default values to be provided for QAPI struct
> members.
> 
> Both AES and LUKS encryption allow only a key-secret option, so we can
> add a new pseudo-format “auto” that accepts exactly that option and
> makes the qcow2 driver read the real format from the image header.  This
> pseudo-format is made the default for encrypt.format, and thus you can
> then specify encrypt.key-secret without having to specify
> encrypt.format (while still adhering to the QAPI schema).
> 
> 
> So, in this series:
> - The QAPI code generator is modified to allow default values for
>   optional struct members.  This in turn allows flat union
>   discriminators be optional, too, but only if a default value is
>   provided.
>   - Accordingly, documentation, tests, and introspection are adjusted.
> 
> - This is used to make qcow’s and qcow2’s encrypt.format parameter
>   optional.  It now defaults to “from-image” which is a new
>

Re: [Qemu-devel] [PATCH v3 3/3] qemu-iotests: Add test for bz #1745922

2019-09-13 Thread Maxim Levitsky

On Fri, 2019-09-13 at 13:01 +0200, Max Reitz wrote:
> On 13.09.19 00:37, Maxim Levitsky wrote:
> > Signed-off-by: Maxim Levitsky 
> > ---
> >  tests/qemu-iotests/263 | 91 ++
> >  tests/qemu-iotests/263.out | 40 +
> >  tests/qemu-iotests/group   |  1 +
> >  3 files changed, 132 insertions(+)
> >  create mode 100755 tests/qemu-iotests/263
> >  create mode 100644 tests/qemu-iotests/263.out
> > 
> > diff --git a/tests/qemu-iotests/263 b/tests/qemu-iotests/263
> > new file mode 100755
> > index 00..afbd668cda
> > --- /dev/null
> > +++ b/tests/qemu-iotests/263
> > @@ -0,0 +1,91 @@
> > +#!/usr/bin/env bash
> > +#
> > +# Test encrypted write that crosses cluster boundary of two unallocated 
> > clusters
> > +# Based on 188
> > +#
> > +# Copyright (C) 2019 Red Hat, Inc.
> > +#
> > +# This program is free software; you can redistribute it and/or modify
> > +# it under the terms of the GNU General Public License as published by
> > +# the Free Software Foundation; either version 2 of the License, or
> > +# (at your option) any later version.
> > +#
> > +# This program is distributed in the hope that it will be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program.  If not, see .
> > +#
> > +
> > +# creator
> > +owner=mlevi...@redhat.com
> > +
> > +seq=`basename $0`
> > +echo "QA output created by $seq"
> > +
> > +status=1   # failure is the default!
> > +
> > +_cleanup()
> > +{
> > +   _cleanup_test_img
> > +}
> > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > +
> > +# get standard environment, filters and checks
> > +. ./common.rc
> > +. ./common.filter
> > +
> > +_supported_fmt qcow2
> > +_supported_proto generic
> > +_supported_os Linux
> > +
> > +
> > +size=1M
> > +
> > +SECRET="secret,id=sec0,data=astrochicken"
> > +QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
> > +
> > +
> > +_run_test()
> > +{
> > +   echo "== reading the whole image =="
> > +   $QEMU_IO --object $SECRET -c "read -P 0 0 $size" --image-opts $1 | 
> > _filter_qemu_io | _filter_testdir
> > +
> > +   echo
> > +   echo "== write two 512 byte sectors on a cluster boundary =="
> > +   $QEMU_IO --object $SECRET -c "write -P 0xAA 0xFE00 0x400" --image-opts 
> > $1 | _filter_qemu_io | _filter_testdir
> > +
> > +   echo
> > +   echo "== verify that the rest of the image is not changed =="
> > +   $QEMU_IO --object $SECRET -c "read -P 0x00 0x0 0xFE00" --image-opts 
> > $1 | _filter_qemu_io | _filter_testdir
> > +   $QEMU_IO --object $SECRET -c "read -P 0xAA 0x0FE00 0x400" --image-opts 
> > $1 | _filter_qemu_io | _filter_testdir
> > +   $QEMU_IO --object $SECRET -c "read -P 0x00 0x10200 0xEFE00" 
> > --image-opts $1 | _filter_qemu_io | _filter_testdir
> 
> Looks good to me overall, but in case you respin, I don’t think there’s
> reason not to quote the $1.
> 
> Max
> 
Fixed, good idea.

Best regards,
Maxim Levitsky

Re: [Qemu-devel] [PATCH v4 3/5] block/qcow2: refactor qcow2_co_preadv_part

2019-09-13 Thread Max Reitz

On 13.09.19 13:34, Kevin Wolf wrote:
> Am 13.09.2019 um 13:06 hat Max Reitz geschrieben:
>> On 13.09.19 12:53, Vladimir Sementsov-Ogievskiy wrote:
>>> 13.09.2019 13:01, Kevin Wolf wrote:
 Am 16.08.2019 um 17:30 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Further patch will run partial requests of iterations of
> qcow2_co_preadv in parallel for performance reasons. To prepare for
> this, separate part which may be parallelized into separate function
> (qcow2_co_preadv_task).
>
> While being here, also separate encrypted clusters reading to own
> function, like it is done for compressed reading.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Max Reitz 
> ---
>   qapi/block-core.json |   2 +-
>   block/qcow2.c| 205 +++
>   2 files changed, 111 insertions(+), 96 deletions(-)
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 0d43d4f37c..dd80aa11db 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -3266,7 +3266,7 @@
>   'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
>   'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
>   'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
> -'cor_write', 'cluster_alloc_space', 'none'] }
> +'cor_write', 'cluster_alloc_space', 'none', 
> 'read_encrypted'] }

 What's the point of this new blkdebug event?

 Obviously, read_aio for an encrypted image must mean a read of encrypted
 data. The same image can never trigger both read_aio and
 read_encrypted, so why do we need to distinguish them as two different
 events?

>>>
>>> Seems I just done it looking at qcow2_co_preadv_compressed..
>>>
>>> Anyway, I think you are right, so, I don't mind if Max drops this new event
>>> and use read_aio in his branch, or I can resend the series or send a 
>>> follow-up,
>>> whichever you prefer.
>>
>> Should I squash this in?
> 
> Looks good to me.

OK, done.

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 1/5] rcu: Add automatically released rcu_read_lock variants

2019-09-13 Thread Paolo Bonzini

On 13/09/19 12:25, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> RCU_READ_LOCK_GUARD() takes the rcu_read_lock and then uses glib's
> g_auto infrastructure (and thus whatever the compiler's hooks are) to
> release it on all exits of the block.
> 
> WITH_RCU_READ_LOCK_GUARD() is similar but is used as a wrapper for the
> lock, i.e.:
> 
>WITH_RCU_READ_LOCK_GUARD() {
>stuff under lock
>}
> 
> Signed-off-by: Dr. David Alan Gilbert 
> ---
>  docs/devel/rcu.txt | 16 
>  include/qemu/rcu.h | 25 +
>  2 files changed, 41 insertions(+)
> 
> diff --git a/docs/devel/rcu.txt b/docs/devel/rcu.txt
> index c84e7f42b2..d83fed2f79 100644
> --- a/docs/devel/rcu.txt
> +++ b/docs/devel/rcu.txt
> @@ -187,6 +187,22 @@ The following APIs must be used before RCU is used in a 
> thread:
>  Note that these APIs are relatively heavyweight, and should _not_ be
>  nested.
>  
> +Convenience macros
> +==
> +
> +Two macros are provided that automatically release the read lock at the
> +end of the scope.
> +
> +  RCU_READ_LOCK_GUARD()
> +
> + Takes the lock and will release it at the end of the block it's
> + used in.
> +
> +  WITH_RCU_READ_LOCK_GUARD()  { code }
> +
> + Is used at the head of a block to protect the code within the block.
> +
> +Note that 'goto'ing out of the guarded block will also drop the lock.
>  
>  DIFFERENCES WITH LINUX
>  ==
> diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
> index 22876d1428..3a8d4cf28b 100644
> --- a/include/qemu/rcu.h
> +++ b/include/qemu/rcu.h
> @@ -154,6 +154,31 @@ extern void call_rcu1(struct rcu_head *head, RCUCBFunc 
> *func);
>}),\
>(RCUCBFunc *)g_free);
>  
> +typedef void RCUReadAuto;
> +static inline RCUReadAuto *rcu_read_auto_lock(void)
> +{
> +rcu_read_lock();
> +/* Anything non-NULL causes the cleanup function to be called */
> +return (void *)(uintptr_t)0x1;
> +}
> +
> +static inline void rcu_read_auto_unlock(RCUReadAuto *r)
> +{
> +rcu_read_unlock();
> +}
> +
> +G_DEFINE_AUTOPTR_CLEANUP_FUNC(RCUReadAuto, rcu_read_auto_unlock)
> +
> +#define WITH_RCU_READ_LOCK_GUARD() \
> +WITH_RCU_READ_LOCK_GUARD_(_rcu_read_auto##__COUNTER__)
> +
> +#define WITH_RCU_READ_LOCK_GUARD_(var) \
> +for (g_autoptr(RCUReadAuto) var = rcu_read_auto_lock(); \
> +(var); rcu_read_auto_unlock(var), (var) = NULL)
> +
> +#define RCU_READ_LOCK_GUARD() \
> +g_autoptr(RCUReadAuto) _rcu_read_auto = rcu_read_auto_lock()
> +
>  #ifdef __cplusplus
>  }
>  #endif
> 

Acked-by: Paolo Bonzini

[Qemu-devel] [PATCH v3] virtio-mmio: implement modern (v2) personality (virtio-1)

2019-09-13 Thread Sergio Lopez

Implement the modern (v2) personality, according to the VirtIO 1.0
specification.

Support for v2 among guests is not as widespread as it'd be
desirable. While the Linux driver has had it for a while, support is
missing, at least, from Tianocore EDK II, NetBSD and FreeBSD.

For this reason, the v2 personality is disabled, keeping the legacy
behavior as default. Machine types willing to use v2, can enable it
using MachineClass's compat_props.

Signed-off-by: Sergio Lopez 
---
Changelog:

v3:
 - Use %HWADDR_PRIx instead of %x. (Stefan Hajnoczi)
 - Return 0 if host_features_sel > 0 for legacy mode. (Cornelia Huck)
 - Mask out legacy features in non-legacy mode. (Cornelia Huck)
 - Log an error in guest attempts to write guest_features with
   guest_features_sel > 0 in legacy mode. (Cornelia Huck)

v2:
 - Switch from RFC to PATCH.
 - Avoid the modern vs. legacy dichotomy. Use legacy or non-legacy
   instead. (Andrea Bolognani, Cornelia Huck)
 - Include the register offset in the warning messages. (Stefan
   Hajnoczi)
 - Fix device endianness for the non-legacy mode. (Michael S. Tsirkin)
 - Honor the specs in VIRTIO_MMIO_QUEUE_READY. (Michael S. Tsirkin)
---
 hw/virtio/virtio-mmio.c | 342 +---
 1 file changed, 319 insertions(+), 23 deletions(-)

diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index 97b7f35496..3578ae37be 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -47,14 +47,24 @@
 OBJECT_CHECK(VirtIOMMIOProxy, (obj), TYPE_VIRTIO_MMIO)
 
 #define VIRT_MAGIC 0x74726976 /* 'virt' */
-#define VIRT_VERSION 1
+#define VIRT_VERSION 2
+#define VIRT_VERSION_LEGACY 1
 #define VIRT_VENDOR 0x554D4551 /* 'QEMU' */
 
+typedef struct VirtIOMMIOQueue {
+uint16_t num;
+bool enabled;
+uint32_t desc[2];
+uint32_t avail[2];
+uint32_t used[2];
+} VirtIOMMIOQueue;
+
 typedef struct {
 /* Generic */
 SysBusDevice parent_obj;
 MemoryRegion iomem;
 qemu_irq irq;
+bool legacy;
 /* Guest accessible state needing migration and reset */
 uint32_t host_features_sel;
 uint32_t guest_features_sel;
@@ -62,6 +72,9 @@ typedef struct {
 /* virtio-bus */
 VirtioBusState bus;
 bool format_transport_address;
+/* Fields only used for non-legacy (v2) devices */
+uint32_t guest_features[2];
+VirtIOMMIOQueue vqs[VIRTIO_QUEUE_MAX];
 } VirtIOMMIOProxy;
 
 static bool virtio_mmio_ioeventfd_enabled(DeviceState *d)
@@ -115,7 +128,11 @@ static uint64_t virtio_mmio_read(void *opaque, hwaddr 
offset, unsigned size)
 case VIRTIO_MMIO_MAGIC_VALUE:
 return VIRT_MAGIC;
 case VIRTIO_MMIO_VERSION:
-return VIRT_VERSION;
+if (proxy->legacy) {
+return VIRT_VERSION_LEGACY;
+} else {
+return VIRT_VERSION;
+}
 case VIRTIO_MMIO_VENDOR_ID:
 return VIRT_VENDOR;
 default:
@@ -146,28 +163,64 @@ static uint64_t virtio_mmio_read(void *opaque, hwaddr 
offset, unsigned size)
 case VIRTIO_MMIO_MAGIC_VALUE:
 return VIRT_MAGIC;
 case VIRTIO_MMIO_VERSION:
-return VIRT_VERSION;
+if (proxy->legacy) {
+return VIRT_VERSION_LEGACY;
+} else {
+return VIRT_VERSION;
+}
 case VIRTIO_MMIO_DEVICE_ID:
 return vdev->device_id;
 case VIRTIO_MMIO_VENDOR_ID:
 return VIRT_VENDOR;
 case VIRTIO_MMIO_DEVICE_FEATURES:
-if (proxy->host_features_sel) {
-return 0;
+if (proxy->legacy) {
+if (proxy->host_features_sel) {
+return 0;
+} else {
+return vdev->host_features;
+}
+} else {
+VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
+return (vdev->host_features & ~vdc->legacy_features)
+>> (32 * proxy->host_features_sel);
 }
-return vdev->host_features;
 case VIRTIO_MMIO_QUEUE_NUM_MAX:
 if (!virtio_queue_get_num(vdev, vdev->queue_sel)) {
 return 0;
 }
 return VIRTQUEUE_MAX_SIZE;
 case VIRTIO_MMIO_QUEUE_PFN:
+if (!proxy->legacy) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: read from legacy register (0x%"
+  HWADDR_PRIx ") in non-legacy mode\n",
+  __func__, offset);
+return 0;
+}
 return virtio_queue_get_addr(vdev, vdev->queue_sel)
 >> proxy->guest_page_shift;
+case VIRTIO_MMIO_QUEUE_READY:
+if (proxy->legacy) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: read from non-legacy register (0x%"
+  HWADDR_PRIx ") in legacy mode\n",
+  __func__, offset);
+return 0;
+}
+return proxy->vqs[vdev->queue_sel].enabled;
 case VIRTIO_MMIO_INTERRUPT_STATUS:
 return atomic_read(

[Qemu-devel] [PATCH] MAINTAINERS: update my email address

2019-09-13 Thread Paul Durrant

My Citrix email address will expire shortly.

Signed-off-by: Paul Durrant 
---
Cc: Anthony Perard 
Cc: Stefano Stabellini 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 50eaf005f4..3cabb9e449 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -406,7 +406,7 @@ Guest CPU Cores (Xen)
 X86 Xen CPUs
 M: Stefano Stabellini 
 M: Anthony Perard 
-M: Paul Durrant 
+M: Paul Durrant 
 L: xen-de...@lists.xenproject.org
 S: Supported
 F: */xen*
-- 
2.20.1.2.gb21ebb6

Re: [Qemu-devel] [PATCH 2/2] roms/Makefile.edk2: don't pull in submodules when building from tarball

2019-09-13 Thread Laszlo Ersek

On 09/13/19 01:12, Michael Roth wrote:
> Currently the `make efi` target pulls submodules nested under the
> roms/edk2 submodule as dependencies. However, when we attempt to build
> from a tarball this fails since we are no longer in a git tree.
> 
> A preceding patch will pre-populate these submodules in the tarball,
> so assume this build dependency is only needed when building from a
> git tree.
> 
> Reported-by: Bruce Rogers 
> Cc: Laszlo Ersek 
> Cc: Bruce Rogers 
> Cc: qemu-sta...@nongnu.org # v4.1.0
> Signed-off-by: Michael Roth 
> ---
>  roms/Makefile.edk2 | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/roms/Makefile.edk2 b/roms/Makefile.edk2
> index c2f2ff59d5..33a074d3a4 100644
> --- a/roms/Makefile.edk2
> +++ b/roms/Makefile.edk2
> @@ -46,8 +46,13 @@ all: $(foreach 
> flashdev,$(flashdevs),../pc-bios/edk2-$(flashdev).fd.bz2) \
>  # files.
>  .INTERMEDIATE: $(foreach 
> flashdev,$(flashdevs),../pc-bios/edk2-$(flashdev).fd)
>  
> +# Fetch edk2 submodule's submodules. If it is not in a git tree, assume
> +# we're building from a tarball and that they've already been fetched by
> +# make-release/tarball scripts.
>  submodules:
> - cd edk2 && git submodule update --init --force
> + if test -d edk2/.git; then \
> + cd edk2 && git submodule update --init --force; \
> + fi
>  
>  # See notes on the ".NOTPARALLEL" target and the "+" indicator in
>  # "tests/uefi-test-tools/Makefile".
> 

Reviewed-by: Laszlo Ersek 

Thank you,
Laszlo

Re: [Qemu-devel] [PATCH v2 16/28] s390x/tcg: Fault-safe memmove

2019-09-13 Thread David Hildenbrand

On 12.09.19 00:03, Richard Henderson wrote:
> On 9/6/19 3:57 AM, David Hildenbrand wrote:
>> +static void access_memmove_idx(CPUS390XState *env, vaddr dest, vaddr src,
>> +   int size, int dest_idx, int src_idx,
>> +   uintptr_t ra)
>> +{
>> +S390Access srca = access_prepare_idx(env, src, size, MMU_DATA_LOAD, 
>> src_idx,
>> + ra);
>> +S390Access desta = access_prepare_idx(env, dest, size, MMU_DATA_STORE,
>> +  dest_idx, ra);
> 
> I was just thinking that it might be worth passing in these Access structures.
>  It seems that usually we wind up computing them in several locations.
> 
> Hoisting it up it would make MVC look like
> 
> midx = cpu_mmu_index(env);
> srca = access_prepare_idx(env, src, size, LOAD, midx, ra);
> dsta = access_prepare_idx(env, dst, size, STORE, midx, ra);
> 
> if (dst == src + 1) {
> uint8_t x = access_get_byte(env, &srca, 0, ra);
> access_memset(env, &dsta, x, size, ra);
> } else if (!is_destructive_overlap(env, dst, src, size)) {
> access_memmove(env, &dsta, &srca, size, ra);
> } else {
> // byte by byte loop, but still need srca, dsta.
> }
> 
> which seems even More Correct, since the current access_memset path does not
> check for read watchpoints or faults on all of [src, src+size-1].
> 

I had precisely that in previous versions :) Can switch to that model.

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PULL 0/1] Block patches

2019-09-13 Thread Peter Maydell

On Wed, 11 Sep 2019 at 15:36, Stefan Hajnoczi  wrote:
>
> The following changes since commit cc6613e244e86c66f83467eab5284825d7057cea:
>
>   Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' 
> into staging (2019-09-03 11:06:09 +0100)
>
> are available in the Git repository at:
>
>   https://github.com/stefanha/qemu.git tags/block-pull-request
>
> for you to fetch changes up to ebb6ff25cd888a52a64a9adc3692541c6d1d9a42:
>
>   virtio-blk: Cancel the pending BH when the dataplane is reset (2019-09-03 
> 16:11:18 +0100)
>
> 
> Pull request
>
> 
>
> Philippe Mathieu-Daudé (1):
>   virtio-blk: Cancel the pending BH when the dataplane is reset
>
>  hw/block/dataplane/virtio-blk.c | 3 +++
>  1 file changed, 3 insertions(+)


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.2
for any user-visible changes.

-- PMM

Re: [Qemu-devel] [PATCH v3 6/8] iotests: Test driver whitelisting in 093

2019-09-13 Thread Max Reitz

On 20.08.19 23:32, John Snow wrote:
> 
> 
> On 8/19/19 4:18 PM, Max Reitz wrote:
>> null-aio may not be whitelisted.  Skip all test cases that require it.
>>
>> Signed-off-by: Max Reitz 
>> ---
>>  tests/qemu-iotests/093 | 12 +---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/tests/qemu-iotests/093 b/tests/qemu-iotests/093
>> index 50c1e7f2ec..f03fa24a07 100755
>> --- a/tests/qemu-iotests/093
>> +++ b/tests/qemu-iotests/093
>> @@ -24,7 +24,7 @@ import iotests
>>  nsec_per_sec = 10
>>  
>>  class ThrottleTestCase(iotests.QMPTestCase):
>> -test_img = "null-aio://"
>> +test_driver = "null-aio"
>>  max_drives = 3
>>  
>>  def blockstats(self, device):
>> @@ -35,10 +35,14 @@ class ThrottleTestCase(iotests.QMPTestCase):
>>  return stat['rd_bytes'], stat['rd_operations'], 
>> stat['wr_bytes'], stat['wr_operations']
>>  raise Exception("Device not found for blockstats: %s" % device)
>>  
>> +def required_drivers(self):
>> +return [self.test_driver]
>> +
>> +@iotests.skip_if_unsupported(required_drivers)
> 
> Oh, I see why you're passing args[0] back to the callback now. Why not
> just pass self.required_drivers and call it with no arguments instead?
> 
> You can get a bound version that way that doesn't need additional
> arguments, and then the callback is free to take generic callables of
> any kind.

Am I doing something wrong?

I just get

+Traceback (most recent call last):
+  File "093", line 26, in 
+class ThrottleTestCase(iotests.QMPTestCase):
+  File "093", line 41, in ThrottleTestCase
+@iotests.skip_if_unsupported(self.required_drivers)
+NameError: name 'self' is not defined

this way.

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2] qapi/qmp-dispatch: Fix error class for reporting disabled commands

2019-09-13 Thread Markus Armbruster

Michal Privoznik  writes:

> If a command is disabled an error is reported. But due to usage
> of error_setg() the class of the error is GenericError which does
> not help callers in distinguishing this case from a case where a
> qmp command fails regularly due to other reasons. Use
> CommandNotFound error class which is much closer to the actual
> root cause.
>
> Signed-off-by: Michal Privoznik 
> ---

I'd like to tweak the commit message a bit:

  qmp-dispatch: Use CommandNotFound error for disabled commands

  If a command is disabled an error is reported.  But due to usage of
  error_setg() the class of the error is GenericError which does not
  help callers in distinguishing this case from a case where a qmp
  command fails regularly due to other reasons.

  We used to use class CommandDisabled until the great error
  simplification (commit de253f1491 for QMP and commit 93b91c59db for
  qemu-ga, both v1.2.0).

  Use CommandNotFound error class, which is close enough.

Objections?

Re: [Qemu-devel] [PATCH v6 25/42] mirror: Deal with filters

2019-09-13 Thread Kevin Wolf

Am 09.08.2019 um 18:13 hat Max Reitz geschrieben:
> This includes some permission limiting (for example, we only need to
> take the RESIZE permission for active commits where the base is smaller
> than the top).
> 
> Signed-off-by: Max Reitz 
> ---
>  block/mirror.c | 117 ++---
>  blockdev.c |  47 +---
>  2 files changed, 131 insertions(+), 33 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 54bafdf176..6ddbfb9708 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -42,6 +42,7 @@ typedef struct MirrorBlockJob {
>  BlockBackend *target;
>  BlockDriverState *mirror_top_bs;
>  BlockDriverState *base;
> +BlockDriverState *base_overlay;
>  
>  /* The name of the graph node to replace */
>  char *replaces;
> @@ -665,8 +666,10 @@ static int mirror_exit_common(Job *job)
>   &error_abort);
>  if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
>  BlockDriverState *backing = s->is_none_mode ? src : s->base;
> -if (backing_bs(target_bs) != backing) {
> -bdrv_set_backing_hd(target_bs, backing, &local_err);
> +BlockDriverState *unfiltered_target = 
> bdrv_skip_rw_filters(target_bs);
> +
> +if (bdrv_filtered_cow_bs(unfiltered_target) != backing) {
> +bdrv_set_backing_hd(unfiltered_target, backing, &local_err);
>  if (local_err) {
>  error_report_err(local_err);
>  ret = -EPERM;
> @@ -715,7 +718,7 @@ static int mirror_exit_common(Job *job)
>   * valid.
>   */
>  block_job_remove_all_bdrv(bjob);
> -bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), 
> &error_abort);
> +bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, 
> &error_abort);
>  
>  /* We just changed the BDS the job BB refers to (with either or both of 
> the
>   * bdrv_replace_node() calls), so switch the BB back so the cleanup does
> @@ -812,7 +815,8 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob 
> *s)
>  return 0;
>  }
>  
> -ret = bdrv_is_allocated_above(bs, base, false, offset, bytes, 
> &count);
> +ret = bdrv_is_allocated_above(bs, s->base_overlay, true, offset, 
> bytes,
> +  &count);
>  if (ret < 0) {
>  return ret;
>  }
> @@ -908,7 +912,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>  } else {
>  s->target_cluster_size = BDRV_SECTOR_SIZE;
>  }
> -if (backing_filename[0] && !target_bs->backing &&
> +if (backing_filename[0] && !bdrv_backing_chain_next(target_bs) &&
>  s->granularity < s->target_cluster_size) {
>  s->buf_size = MAX(s->buf_size, s->target_cluster_size);
>  s->cow_bitmap = bitmap_new(length);
> @@ -1088,8 +1092,9 @@ static void mirror_complete(Job *job, Error **errp)
>  if (s->backing_mode == MIRROR_OPEN_BACKING_CHAIN) {
>  int ret;
>  
> -assert(!target->backing);
> -ret = bdrv_open_backing_file(target, NULL, "backing", errp);
> +assert(!bdrv_backing_chain_next(target));
> +ret = bdrv_open_backing_file(bdrv_skip_rw_filters(target), NULL,
> + "backing", errp);
>  if (ret < 0) {
>  return;
>  }
> @@ -1531,8 +1536,8 @@ static BlockJob *mirror_start_job(
>  MirrorBlockJob *s;
>  MirrorBDSOpaque *bs_opaque;
>  BlockDriverState *mirror_top_bs;
> -bool target_graph_mod;
>  bool target_is_backing;
> +uint64_t target_perms, target_shared_perms;
>  Error *local_err = NULL;
>  int ret;
>  
> @@ -1551,7 +1556,7 @@ static BlockJob *mirror_start_job(
>  buf_size = DEFAULT_MIRROR_BUF_SIZE;
>  }
>  
> -if (bs == target) {
> +if (bdrv_skip_rw_filters(bs) == bdrv_skip_rw_filters(target)) {
>  error_setg(errp, "Can't mirror node into itself");
>  return NULL;
>  }
> @@ -1615,15 +1620,50 @@ static BlockJob *mirror_start_job(
>   * In the case of active commit, things look a bit different, though,
>   * because the target is an already populated backing file in active use.
>   * We can allow anything except resize there.*/
> +
> +target_perms = BLK_PERM_WRITE;
> +target_shared_perms = BLK_PERM_WRITE_UNCHANGED;
> +
>  target_is_backing = bdrv_chain_contains(bs, target);
> -target_graph_mod = (backing_mode != MIRROR_LEAVE_BACKING_CHAIN);
> +if (target_is_backing) {
> +int64_t bs_size, target_size;
> +bs_size = bdrv_getlength(bs);
> +if (bs_size < 0) {
> +error_setg_errno(errp, -bs_size,
> + "Could not inquire top image size");
> +goto fail;
> +}
> +
> +target_size = bdrv_getlength(target);
> +if (target_size < 0) {
> +error_setg_errno(errp, -target_size,
> +

[Qemu-devel] [PATCH v4 0/3] Fix qcow2+luks corruption introduced by commit 8ac0f15f335

2019-09-13 Thread Maxim Levitsky

Commit 8ac0f15f335 accidently broke the COW of non changed areas
of newly allocated clusters, when the write spans multiple clusters,
and needs COW both prior and after the write.
This results in 'after' COW area being encrypted with wrong
sector address, which render it corrupted.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1745922

CC: qemu-stable 

V2: grammar, spelling and code style fixes.
V3: more fixes after the review.
V4: addressed review comments from Max Reitz,
and futher refactored the qcow2_co_encrypt to just take full host and guest 
offset
which simplifies everything.

Best regards,
Maxim Levitsky

Maxim Levitsky (3):
  block/qcow2: refactoring of threaded encryption code
  block/qcow2: fix the corruption when rebasing luks encrypted files
  qemu-iotests: Add test for bz #1745922

 block/qcow2-cluster.c  | 29 +++-
 block/qcow2-threads.c  | 62 --
 block/qcow2.c  |  5 ++-
 block/qcow2.h  |  8 ++--
 tests/qemu-iotests/263 | 91 ++
 tests/qemu-iotests/263.out | 40 +
 tests/qemu-iotests/group   |  1 +
 7 files changed, 205 insertions(+), 31 deletions(-)
 create mode 100755 tests/qemu-iotests/263
 create mode 100644 tests/qemu-iotests/263.out

-- 
2.17.2

[Qemu-devel] [PATCH v4 2/3] block/qcow2: fix the corruption when rebasing luks encrypted files

2019-09-13 Thread Maxim Levitsky

This fixes subtle corruption introduced by luks threaded encryption
in commit 8ac0f15f335

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1745922

The corruption happens when we do a write that
   * writes to two or more unallocated clusters at once
   * doesn't fully cover the first sector
   * doesn't fully cover the last sector

In this case, when allocating the new clusters we COW both areas
prior to the write and after the write, and we encrypt them.

The above mentioned commit accidentally made it so we encrypt the
second COW area using the physical cluster offset of the first area.

Fix this by:
 * Remove the offset_in_cluster parameter of do_perform_cow_encrypt,
   since it is misleading. That offset can be larger than cluster size
   currently.

   Instead just add the start and the end COW area offsets to both host
   and guest offsets that do_perform_cow_encrypt receives.

*  in do_perform_cow_encrypt, remove the cluster offset from the host_offset,
   and thus pass correctly to the qcow2_co_encrypt, the host cluster offset
   and full guest offset

In the bugreport that was triggered by rebasing a luks image to new,
zero filled base, which lot of such writes, and causes some files
with zero areas to contain garbage there instead.
But as described above it can happen elsewhere as well


Signed-off-by: Maxim Levitsky 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 46b0854d7e..47ec3511a5 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -463,21 +463,21 @@ static int coroutine_fn 
do_perform_cow_read(BlockDriverState *bs,
 }
 
 static bool coroutine_fn do_perform_cow_encrypt(BlockDriverState *bs,
-uint64_t guest_cluster_offset,
-uint64_t host_cluster_offset,
-unsigned offset_in_cluster,
+uint64_t guest_offset,
+uint64_t host_offset,
 uint8_t *buffer,
 unsigned bytes)
 {
 if (bytes && bs->encrypted) {
 BDRVQcow2State *s = bs->opaque;
-assert((offset_in_cluster & ~BDRV_SECTOR_MASK) == 0);
-assert((bytes & ~BDRV_SECTOR_MASK) == 0);
+
+assert(QEMU_IS_ALIGNED(guest_offset, BDRV_SECTOR_SIZE));
+assert(QEMU_IS_ALIGNED(host_offset, BDRV_SECTOR_SIZE));
+assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
 assert(s->crypto);
-if (qcow2_co_encrypt(bs,
- host_cluster_offset + offset_in_cluster,
- guest_cluster_offset + offset_in_cluster,
- buffer, bytes) < 0) {
+
+if (qcow2_co_encrypt(bs, host_offset, guest_offset,
+buffer, bytes) < 0) {
 return false;
 }
 }
@@ -891,11 +891,15 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta 
*m)
 
 /* Encrypt the data if necessary before writing it */
 if (bs->encrypted) {
-if (!do_perform_cow_encrypt(bs, m->offset, m->alloc_offset,
-start->offset, start_buffer,
+if (!do_perform_cow_encrypt(bs,
+m->offset + start->offset,
+m->alloc_offset + start->offset,
+start_buffer,
 start->nb_bytes) ||
-!do_perform_cow_encrypt(bs, m->offset, m->alloc_offset,
-end->offset, end_buffer, end->nb_bytes)) {
+!do_perform_cow_encrypt(bs,
+m->offset + end->offset,
+m->alloc_offset + end->offset,
+end_buffer, end->nb_bytes)) {
 ret = -EIO;
 goto fail;
 }
-- 
2.17.2

[Qemu-devel] [PATCH v4 1/3] block/qcow2: refactoring of threaded encryption code

2019-09-13 Thread Maxim Levitsky

This commit tries to clarify few function arguments,
and add comments describing the encrypt/decrypt interface

Signed-off-by: Maxim Levitsky 
---
 block/qcow2-cluster.c |  9 ---
 block/qcow2-threads.c | 62 ++-
 block/qcow2.c |  5 ++--
 block/qcow2.h |  8 +++---
 4 files changed, 61 insertions(+), 23 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index f09cc992af..46b0854d7e 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -463,8 +463,8 @@ static int coroutine_fn 
do_perform_cow_read(BlockDriverState *bs,
 }
 
 static bool coroutine_fn do_perform_cow_encrypt(BlockDriverState *bs,
-uint64_t src_cluster_offset,
-uint64_t cluster_offset,
+uint64_t guest_cluster_offset,
+uint64_t host_cluster_offset,
 unsigned offset_in_cluster,
 uint8_t *buffer,
 unsigned bytes)
@@ -474,8 +474,9 @@ static bool coroutine_fn 
do_perform_cow_encrypt(BlockDriverState *bs,
 assert((offset_in_cluster & ~BDRV_SECTOR_MASK) == 0);
 assert((bytes & ~BDRV_SECTOR_MASK) == 0);
 assert(s->crypto);
-if (qcow2_co_encrypt(bs, cluster_offset,
- src_cluster_offset + offset_in_cluster,
+if (qcow2_co_encrypt(bs,
+ host_cluster_offset + offset_in_cluster,
+ guest_cluster_offset + offset_in_cluster,
  buffer, bytes) < 0) {
 return false;
 }
diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index 3b1e63fe41..9646243a9b 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -234,15 +234,15 @@ static int qcow2_encdec_pool_func(void *opaque)
 }
 
 static int coroutine_fn
-qcow2_co_encdec(BlockDriverState *bs, uint64_t file_cluster_offset,
-  uint64_t offset, void *buf, size_t len, Qcow2EncDecFunc func)
+qcow2_co_encdec(BlockDriverState *bs, uint64_t host_offset,
+uint64_t guest_offset, void *buf, size_t len,
+Qcow2EncDecFunc func)
 {
 BDRVQcow2State *s = bs->opaque;
+
 Qcow2EncDecData arg = {
 .block = s->crypto,
-.offset = s->crypt_physical_offset ?
-  file_cluster_offset + offset_into_cluster(s, offset) :
-  offset,
+.offset = s->crypt_physical_offset ? host_offset : guest_offset,
 .buf = buf,
 .len = len,
 .func = func,
@@ -251,18 +251,54 @@ qcow2_co_encdec(BlockDriverState *bs, uint64_t 
file_cluster_offset,
 return qcow2_co_process(bs, qcow2_encdec_pool_func, &arg);
 }
 
+
+/*
+ * qcow2_co_encrypt()
+ *
+ * Encrypts one or more contiguous aligned sectors
+ *
+ * @host_offset - underlying storage offset of the first sector of the
+ * data to be encrypted
+
+ * @guest_offset - guest (virtual) offset of the first sector of the
+ * data to be encrypted
+ *
+ * @buf - buffer with the data to encrypt, that after encryption
+ *will be written to the underlying storage device at
+ *@host_offset
+ *
+ * @len - length of the buffer (must be a BDRV_SECTOR_SIZE multiple)
+ *
+ * Depending on the encryption method, @host_cluster_offset and/or 
@guest_offset
+ * may be used for generating the initialization vector for
+ * encryption.
+ *
+ * Note that while the whole range must be aligned on sectors, it
+ * does not have to be aligned on clusters and can also cross cluster
+ * boundaries
+ *
+ */
 int coroutine_fn
-qcow2_co_encrypt(BlockDriverState *bs, uint64_t file_cluster_offset,
- uint64_t offset, void *buf, size_t len)
+qcow2_co_encrypt(BlockDriverState *bs, uint64_t host_offset,
+ uint64_t guest_offset, void *buf, size_t len)
 {
-return qcow2_co_encdec(bs, file_cluster_offset, offset, buf, len,
- qcrypto_block_encrypt);
+return qcow2_co_encdec(bs, host_offset, guest_offset, buf, len,
+   qcrypto_block_encrypt);
 }
 
+
+/*
+ * qcow2_co_decrypt()
+ *
+ * Decrypts one or more contiguous aligned sectors
+ * Similar to qcow2_co_encrypt
+ *
+ */
+
 int coroutine_fn
-qcow2_co_decrypt(BlockDriverState *bs, uint64_t file_cluster_offset,
- uint64_t offset, void *buf, size_t len)
+qcow2_co_decrypt(BlockDriverState *bs, uint64_t host_offset,
+ uint64_t guest_offset, void *buf, size_t len)
 {
-return qcow2_co_encdec(bs, file_cluster_offset, offset, buf, len,
- qcrypto_block_decrypt);
+return qcow2_co_encdec(bs, host_offset, guest_offset, buf, len,
+   qcrypto_block_decrypt);
 }
diff --git a/block/qcow2.c b/b

1 2 3 >

1 - 100 of 257 matches

Mail list logo