Re: [PATCH] acpi/nvdimm: Define trace events for NVDIMM and substitute nvdimm_debug()
On Thu, 2022-07-07 at 11:21 +0200, Igor Mammedov wrote: > On Mon, 4 Jul 2022 16:58:52 +0800 > Robert Hoo wrote: > > > Signed-off-by: Robert Hoo > > Reviewed-by: Jingqi Liu > > Reviewed-by: Igor Mammedov Thanks for review Igor. BTW, during the unit test, I met some bios-table test error, https://gitlab.com/qemu-project/qemu/-/issues/1098, perhaps related to your patches in June. https://patchwork.kernel.org/project/qemu-devel/cover/20220608135340.3304695-1-imamm...@redhat.com/ > > > ---
Re: [RFC PATCH 01/12] vhost: Get vring base from vq, not svq
On Mon, Jul 18, 2022 at 7:48 AM Jason Wang wrote: > > On Sat, Jul 16, 2022 at 7:34 PM Eugenio Pérez wrote: > > > > The SVQ vring used idx usually match with the guest visible one, as long > > as all the guest buffers (GPA) maps to exactly one buffer within qemu's > > VA. However, as we can see in virtqueue_map_desc, a single guest buffer > > could map to many buffers in SVQ vring. > > > > The solution is to stop using the device's used idx and check for the > > last avail idx. Since we cannot report in-flight descriptors with vdpa, > > let's rewind all of them. > > > > Fixes: 6d0b22266633 ("vdpa: Adapt vhost_vdpa_get_vring_base to SVQ") > > Signed-off-by: Eugenio Pérez > > --- > > hw/virtio/vhost-vdpa.c | 8 +--- > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > > index 795ed5a049..18820498b3 100644 > > --- a/hw/virtio/vhost-vdpa.c > > +++ b/hw/virtio/vhost-vdpa.c > > @@ -1194,11 +1194,10 @@ static int vhost_vdpa_get_vring_base(struct > > vhost_dev *dev, > > struct vhost_vring_state *ring) > > { > > struct vhost_vdpa *v = dev->opaque; > > -int vdpa_idx = ring->index - dev->vq_index; > > int ret; > > > > if (v->shadow_vqs_enabled) { > > -VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, > > vdpa_idx); > > +VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index); > > > > /* > > * Setting base as last used idx, so destination will see as > > available > > @@ -1208,7 +1207,10 @@ static int vhost_vdpa_get_vring_base(struct > > vhost_dev *dev, > > * TODO: This is ok for networking, but other kinds of devices > > might > > * have problems with these retransmissions. > > */ > > -ring->num = svq->last_used_idx; > > +while (virtqueue_rewind(vq, 1)) { > > +continue; > > +} > > Will this leak mapped VirtQueueElement? > vhost_get_vring_base op is called only on the device's teardown path, so they should be free by vhost_svq_stop. 
However, you have a point that maybe vhost_get_vring_base should not trust that cleanup, even for -stable. So I think it would be better to squash this patch and the next one into a single fix. The squashed patch would be doing two things at once, though: one is using the right state (as vring_base), and the other is no longer reverting in-flight descriptors. Thanks! > Thanks > > > +ring->num = virtio_queue_get_last_avail_idx(dev->vdev, > > ring->index); > > return 0; > > } > > > > -- > > 2.31.1 > > >
Re: [RFC PATCH 02/12] vhost: Move SVQ queue rewind to the destination
On Mon, Jul 18, 2022 at 7:50 AM Jason Wang wrote: > > On Sat, Jul 16, 2022 at 7:34 PM Eugenio Pérez wrote: > > > > Migration with SVQ already migrate the inflight descriptors, > > How is this done? > Migration between vhost-vdpa with x-svq=on maintain the guest's visible state in VirtQueues, and they already have all the protocols to migrate them. We cannot migrate if we cannot restore the state / inflight in the destination, but since we need x-svq=on at this point, we can restore both of them. Extra care will be needed when ASID is introduced. > > so the > > destination can perform the work. > > > > This makes easier to migrate between backends or to recover them in > > vhost devices that support set in flight descriptors. > > > > Signed-off-by: Eugenio Pérez > > --- > > hw/virtio/vhost-vdpa.c | 24 +++- > > 1 file changed, 11 insertions(+), 13 deletions(-) > > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > > index 18820498b3..4458c8d23e 100644 > > --- a/hw/virtio/vhost-vdpa.c > > +++ b/hw/virtio/vhost-vdpa.c > > @@ -1178,7 +1178,18 @@ static int vhost_vdpa_set_vring_base(struct > > vhost_dev *dev, > > struct vhost_vring_state *ring) > > { > > struct vhost_vdpa *v = dev->opaque; > > +VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index); > > > > +/* > > + * vhost-vdpa devices does not support in-flight requests. Set all of > > them > > + * as available. > > + * > > + * TODO: This is ok for networking, but other kinds of devices might > > + * have problems with these retransmissions. > > + */ > > +while (virtqueue_rewind(vq, 1)) { > > +continue; > > +} > > if (v->shadow_vqs_enabled) { > > /* > > * Device vring base was set at device start. 
SVQ base is handled > > by > > @@ -1197,19 +1208,6 @@ static int vhost_vdpa_get_vring_base(struct > > vhost_dev *dev, > > int ret; > > > > if (v->shadow_vqs_enabled) { > > -VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index); > > - > > -/* > > - * Setting base as last used idx, so destination will see as > > available > > - * all the entries that the device did not use, including the > > in-flight > > - * processing ones. > > - * > > - * TODO: This is ok for networking, but other kinds of devices > > might > > - * have problems with these retransmissions. > > - */ > > -while (virtqueue_rewind(vq, 1)) { > > -continue; > > -} > > I think it's not a good idea to partially revert the code that was > just introduced in the previous patch (unless it can be backported to > -stable independently). > > We probably need to squash the changes. > You definitely have a point. Maybe it's better to remove the "Fixes" tag? -stable version cannot enable SVQ anyway (it doesn't have the parameter). I'll send a v2 when we have all the comments sorted out. Maybe a better alternative is to delay the x-svq parameter to the top of both series if both of them go in? Thanks! > Thanks > > > ring->num = virtio_queue_get_last_avail_idx(dev->vdev, > > ring->index); > > return 0; > > } > > -- > > 2.31.1 > > >
Re: [RFC PATCH 03/12] vdpa: Small rename of error labels
On Mon, Jul 18, 2022 at 7:50 AM Jason Wang wrote: > > On Sat, Jul 16, 2022 at 7:34 PM Eugenio Pérez wrote: > > > > So later patches are cleaner > > > > Signed-off-by: Eugenio Pérez > > --- > > hw/virtio/vhost-vdpa.c | 5 ++--- > > 1 file changed, 2 insertions(+), 3 deletions(-) > > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > > index 4458c8d23e..906c365036 100644 > > --- a/hw/virtio/vhost-vdpa.c > > +++ b/hw/virtio/vhost-vdpa.c > > @@ -1039,7 +1039,7 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev > > *dev) > > int r; > > bool ok = vhost_vdpa_svq_setup(dev, svq, i, &err); > > if (unlikely(!ok)) { > > -goto err; > > +goto err_svq_setup; > > } > > > > vhost_svq_start(svq, dev->vdev, vq); > > @@ -1064,8 +1064,7 @@ err_set_addr: > > err_map: > > vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, i)); > > > > -err: > > -error_reportf_err(err, "Cannot setup SVQ %u: ", i); > > This does not look like a simple rename. > That's right. I'll drop his patch from the series, it's not really useful and only adds noise. Thanks! > Thanks > > > +err_svq_setup: > > for (unsigned j = 0; j < i; ++j) { > > VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, j); > > vhost_vdpa_svq_unmap_rings(dev, svq); > > -- > > 2.31.1 > > >
Re: Building tools on unsupported cpu/arch
On 30/04/2022 16.11, Michael Tokarev wrote:

Hello!

Previously, it was possible to build the qemu tools (such as qemu-img or qemu-ga) on an unsupported cpu/architecture, in a hackish way, by specifying --enable-tcg-interpreter on the ./configure line. Today (with 7.0), it does not work anymore, with the following error during configure:

  common-user/meson.build:1:0: ERROR: Include dir host/unknown does not exist.

This is with --disable-system --disable-linux-user --disable-user. And without --enable-tcg-interpreter, it gives:

  meson.build:390:6: ERROR: Problem encountered: Unsupported CPU m68k, try --enable-tcg-interpreter

What's the way to build tools on an unsupported architecture these days?

Hi Michael,

I think all the required patches have now been merged - could you please check whether building the tools works for you again on unsupported CPU architectures?

Thanks,
 Thomas
RE: [PATCH] i386: Disable BTS and PEBS
>-Original Message- >From: Like Xu >Sent: Monday, July 18, 2022 11:57 AM >To: Duan, Zhenzhong >Cc: pbonz...@redhat.com; mtosa...@redhat.com; Christopherson,, Sean >; Ma, XiangfeiX ; qemu- >de...@nongnu.org >Subject: Re: [PATCH] i386: Disable BTS and PEBS > >On 18/7/2022 11:22 am, Zhenzhong Duan wrote: >> Since below KVM commit, KVM hided BTS as it's not supported yet. >> b9181c8ef356 ("KVM: x86/pmu: Avoid exposing Intel BTS feature") >> >> After below KVM commit, it gave control of MSR_IA32_MISC_ENABLES to >userspace. >> 9fc222967a39 ("KVM: x86: Give host userspace full control of >> MSR_IA32_MISC_ENABLES") >> >> So qemu takes the responsibility to hide BTS. >> Without fix, we get below warning in guest kernel: >> >> [] unchecked MSR access error: WRMSR to 0x1d9 (tried to write >> 0x01c0) at rIP: 0xaa070644 >(native_write_msr+0x4/0x20) [] Call Trace: >> [] >> [] intel_pmu_enable_bts+0x5d/0x70 >> [] bts_event_add+0x77/0x90 >> [] event_sched_in.isra.135+0x99/0x1e0 >> >> Tested-by: Xiangfei Ma >> Signed-off-by: Zhenzhong Duan >> --- >> target/i386/cpu.h | 6 -- >> target/i386/kvm/kvm.c | 4 >> 2 files changed, 8 insertions(+), 2 deletions(-) >> >> diff --git a/target/i386/cpu.h b/target/i386/cpu.h index >> 82004b65b944..8a83d0995c66 100644 >> --- a/target/i386/cpu.h >> +++ b/target/i386/cpu.h >> @@ -434,8 +434,10 @@ typedef enum X86Seg { >> >> #define MSR_IA32_MISC_ENABLE0x1a0 >> /* Indicates good rep/movs microcode on some processors: */ >> -#define MSR_IA32_MISC_ENABLE_DEFAULT1 >> -#define MSR_IA32_MISC_ENABLE_MWAIT (1ULL << 18) >> +#define MSR_IA32_MISC_ENABLE_DEFAULT 1 >> +#define MSR_IA32_MISC_ENABLE_BTS_UNAVAIL (1ULL << 11) #define >> +MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL (1ULL << 12) >> +#define MSR_IA32_MISC_ENABLE_MWAIT(1ULL << 18) >> >> #define MSR_MTRRphysBase(reg) (0x200 + 2 * (reg)) >> #define MSR_MTRRphysMask(reg) (0x200 + 2 * (reg) + 1) >> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index >> f148a6d52fa4..002e0520dd76 100644 >> --- 
a/target/i386/kvm/kvm.c >> +++ b/target/i386/kvm/kvm.c >> @@ -2180,6 +2180,10 @@ void kvm_arch_reset_vcpu(X86CPU *cpu) >> { >> CPUX86State *env = &cpu->env; >> >> +/* Disable BTS feature which is unsupported on KVM */ >> +env->msr_ia32_misc_enable |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL; >> +env->msr_ia32_misc_enable |= >MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL; > >Would it be more readable to group msr_ia32_misc_enable code into this >function: > > static void x86_cpu_reset(DeviceState *dev) OK, will do. Initially I thought BTS/PEBS stuffs will only be implemented in KVM, so thought putting it in kvm_arch_reset_vcpu() may be better. > >and, why disable PEBS (we need it at least for "-cpu host,migratable=no") ? Will fix. I see in below commit in KVM side, PEBS is initialized as unavailable together with BTS. 9fc222967a39 ("KVM: x86: Give host userspace full control of MSR_IA32_MISC_ENABLES"). Then I thought PEBS is unsupported too. > >Also, the behavior of MISC_ENABLE_EMON is also inconsistent with >"pmu=off”. Will init EMON bit based on pmu setting. Thanks! Zhenzhong
Re: [PATCH v4 00/12] Improve reliability of VM tests
On 12/07/2022 20.34, John Snow wrote:
> On Fri, Jul 8, 2022 at 11:35 AM John Snow wrote:
> > Note: patches 10-12 are included for testing simplicity, they shouldn't
> > be merged. They will be included in a forthcoming block PR. Patches 1-9
> > are fully reviewed.
>
> Whose tree should this go in?

I can take them - unless Alex beats me with his next testing pull request ;-)

Queued to my testing-next now:
https://gitlab.com/thuth/qemu/-/commits/testing-next

 Thomas
Re: [PATCH v4 00/12] Improve reliability of VM tests
On 08/07/2022 17.34, John Snow wrote:
> Note: patches 10-12 are included for testing simplicity, they shouldn't
> be merged. They will be included in a forthcoming block PR.
>
> V4:
>  - Addressed concern by Marc-Andre in patch 01.
>  - Squashed Ubuntu patches (rth)
>
> This patch series attempts to improve the reliability of several of the
> VM test targets. In particular, both CentOS 8 tests are non-functional
> because CentOS 8 was EOL at the beginning of this calendar year, with
> repositories and mirrors going offline. I also remove the ubuntu.i386
> test because we no longer support Ubuntu 18.04, nor do we have explicit
> need of an i386 build test.
>
> After this series, I am able to successfully run every VM target on an
> x86_64 host, except:
>
> - ubuntu.aarch64: Hangs often during testing, see below.
> - centos.aarch64: Hangs often during testing, see below.
> - haiku.x86_64: Build failures not addressed by this series, see
>   https://lists.gnu.org/archive/html/qemu-devel/2022-06/msg02103.html
>
> The unit tests that I see fail most often under aarch64 are:
>
> - virtio-net-failover: Seems to like to hang on openbsd
> - migration-test: Tends to hang under aarch64 tcg
>
> Future work (next version? next series?):
>
> - Try to get centos.aarch64 working reliably under TCG
> - Upgrade ubuntu.aarch64 to 20.04 after fixing centos.aarch64
> - Fix the Haiku build test, if possible.
> - Ensure I can reliably run and pass "make vm-build-all".
>   (Remove VMs from this recipe if necessary.)

Not sure whether it's related to your patches, but when testing these I just got an error while running 'vm-build-openbsd':

  VM-BUILD openbsd
fatal: not a valid object name:
failed to archive qemu
Failed to prepare guest environment
Traceback (most recent call last):
  File "/home/thuth/devel/qemu/tests/vm/basevm.py", line 641, in main
    vm.add_source_dir(args.build_qemu)
  File "/home/thuth/devel/qemu/tests/vm/basevm.py", line 277, in add_source_dir
    stdout=self._stdout, stderr=self._stderr)
  File "/usr/lib64/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['./scripts/archive-source.sh', '/home/thuth/tmp/qemu-build/vm-test-rm_z92hq.tmp/data-44e42.tar']' returned non-zero exit status 1.

The error did not occur again when running the command again, though.

 Thomas
[PATCH] x86: cpu: Error out if memory exceeds addressable range
Currently it is possible to start a guest with memory that is beyond the
addressable range of the CPU, and QEMU does not even warn about it. The
default phys_bits is 40, which can address 1 TiB, yet QEMU allows
starting a guest with more than 1 TiB of memory. Prevent this by
erroring out in such a scenario.

Reported-by: Shaju Abraham
Signed-off-by: Nikunj A Dadhania
---
 target/i386/cpu.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6a57ef13af..1afbdbac7d 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6376,6 +6376,7 @@ static void x86_cpu_hyperv_realize(X86CPU *cpu)
 static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 {
+    MachineState *machine = MACHINE(qdev_get_machine());
     CPUState *cs = CPU(dev);
     X86CPU *cpu = X86_CPU(dev);
     X86CPUClass *xcc = X86_CPU_GET_CLASS(dev);
@@ -6541,6 +6542,15 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
         }
     }
 
+    if (BIT_ULL(cpu->phys_bits) < machine->maxram_size) {
+        error_setg(&local_err, "cannot setup guest memory: "
+                   "%s memory(%lu MiB) exceeds addressable limit(%llu MiB)",
+                   machine->maxram_size == machine->ram_size ? "" : "max",
+                   machine->maxram_size / MiB,
+                   BIT_ULL(cpu->phys_bits) / MiB);
+        goto out;
+    }
+
     /* Cache information initialization */
     if (!cpu->legacy_cache) {
         if (!xcc->model || !xcc->model->cpudef->cache_info) {
--
2.32.0
Re: [PATCH 1/4] Modifying ‘migrate’ qmp command to add multi-FD socket on particular source and destination pair
Het Gala writes: > i) Modified the format of the qemu monitor command : 'migrate' by adding a > list, >each element in the list consists of multi-FD connection parameters: source >and destination uris and of the number of multi-fd channels between each > pair. > > ii) Information of all multi-FD connection parameters’ list, length of the > list > and total number of multi-fd channels for all the connections together is > stored in ‘OutgoingArgs’ struct. > > Suggested-by: Manish Mishra > Signed-off-by: Het Gala > --- > include/qapi/util.h | 9 > migration/migration.c | 47 ++ > migration/socket.c| 53 --- > migration/socket.h| 17 +- > monitor/hmp-cmds.c| 22 -- > qapi/migration.json | 43 +++ > 6 files changed, 170 insertions(+), 21 deletions(-) > > diff --git a/include/qapi/util.h b/include/qapi/util.h > index 81a2b13a33..3041feb3d9 100644 > --- a/include/qapi/util.h > +++ b/include/qapi/util.h > @@ -56,4 +56,13 @@ int parse_qapi_name(const char *name, bool complete); > (tail) = &(*(tail))->next; \ > } while (0) > > +#define QAPI_LIST_LENGTH(list) ({ \ > +int _len = 0; \ > +typeof(list) _elem; \ > +for (_elem = list; _elem != NULL; _elem = _elem->next) { \ > +_len++; \ > +} \ > +_len; \ > +}) > + Unless there is a compelling reason for open-coding this, make it a (non-inline) function. 
> #endif > diff --git a/migration/migration.c b/migration/migration.c > index 31739b2af9..c408175aeb 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -2328,13 +2328,14 @@ static bool migrate_prepare(MigrationState *s, bool > blk, bool blk_inc, > return true; > } > > -void qmp_migrate(const char *uri, bool has_blk, bool blk, > +void qmp_migrate(const char *uri, bool has_multi_fd_uri_list, > + MigrateUriParameterList *cap, bool has_blk, bool blk, > bool has_inc, bool inc, bool has_detach, bool detach, > bool has_resume, bool resume, Error **errp) > { > Error *local_err = NULL; > MigrationState *s = migrate_get_current(); > -const char *p = NULL; > +const char *dst_ptr = NULL; > > if (!migrate_prepare(s, has_blk && blk, has_inc && inc, > has_resume && resume, errp)) { > @@ -2348,20 +2349,46 @@ void qmp_migrate(const char *uri, bool has_blk, bool > blk, > } > } > > +/* > + * In case of Multi-FD migration parameters, if uri is provided, > + * supports only tcp network protocol. > + */ > +if (has_multi_fd_uri_list) { > +int length = QAPI_LIST_LENGTH(cap); > +init_multifd_array(length); > +for (int i = 0; i < length; i++) { > +const char *p1 = NULL, *p2 = NULL; > +const char *multifd_dst_uri = cap->value->destination_uri; > +const char *multifd_src_uri = cap->value->source_uri; > +uint8_t multifd_channels = cap->value->multifd_channels; > +if (!strstart(multifd_dst_uri, "tcp:", &p1) || > +!strstart(multifd_src_uri, "tcp:", &p2)) { > +error_setg(errp, "multi-fd destination and multi-fd source " > +"uri, both should be present and follows tcp protocol only"); > +break; > +} else { > +store_multifd_migration_params(p1 ? p1 : multifd_dst_uri, > +p2 ? 
p2 : multifd_src_uri, > +multifd_channels, i, &local_err); > +} > +cap = cap->next; > +} > +} > + > migrate_protocol_allow_multi_channels(false); > -if (strstart(uri, "tcp:", &p) || > +if (strstart(uri, "tcp:", &dst_ptr) || > strstart(uri, "unix:", NULL) || > strstart(uri, "vsock:", NULL)) { > migrate_protocol_allow_multi_channels(true); > -socket_start_outgoing_migration(s, p ? p : uri, &local_err); > +socket_start_outgoing_migration(s, dst_ptr ? dst_ptr : uri, > &local_err); > #ifdef CONFIG_RDMA > -} else if (strstart(uri, "rdma:", &p)) { > -rdma_start_outgoing_migration(s, p, &local_err); > +} else if (strstart(uri, "rdma:", &dst_ptr)) { > +rdma_start_outgoing_migration(s, dst_ptr, &local_err); > #endif > -} else if (strstart(uri, "exec:", &p)) { > -exec_start_outgoing_migration(s, p, &local_err); > -} else if (strstart(uri, "fd:", &p)) { > -fd_start_outgoing_migration(s, p, &local_err); > +} else if (strstart(uri, "exec:", &dst_ptr)) { > +exec_start_outgoing_migration(s, dst_ptr, &local_err); > +} else if (strstart(uri, "fd:", &dst_ptr)) { > +fd_start_outgoing_migration(s, dst_ptr, &local_err); > } else { > if (!(has_resume && resume)) { > yank_
Re: [RFC PATCH 8/8] block: remove all unused ->can_set_aio_ctx and ->set_aio_ctx callbacks
Am 15/07/2022 um 16:34 schrieb Hanna Reitz: > On 12.07.22 23:19, Emanuele Giuseppe Esposito wrote: >> Together with all _can_set_ and _set_ APIs, as they are not needed >> anymore. >> >> Signed-off-by: Emanuele Giuseppe Esposito >> --- >> block.c | 196 - >> block/block-backend.c | 33 - >> blockjob.c | 35 -- >> include/block/block-global-state.h | 9 -- >> include/block/block_int-common.h | 4 - >> 5 files changed, 277 deletions(-) > > Looks good! I’d just like a follow-up commit that also drops > bdrv_try_set_aio_context(), so it’s all gone (I think that’s the final > remnant?). > It's the same for me, I thought renaming bdrv_try_set_aio_context was a little bit unnecessary. You want to rename it to something else, or directly call bdrv_child_try_change_aio_context? I agree with the rest of comments in this series :) Emanuele
Re: [RFC PATCH 3/8] RFC: block: use transactions as a replacement of ->{can_}set_aio_context()
Am 14/07/2022 um 18:45 schrieb Hanna Reitz:
> On 12.07.22 23:19, Emanuele Giuseppe Esposito wrote:
>> -
>> RFC because I am not sure about the AioContext locks.
>> - Do we need to take the new AioContext lock? what does it protect?
>> - Taking the old AioContext lock is required now, because of
>>   bdrv_drained_begin calling AIO_WAIT_WHILE that unlocks the
>>   aiocontext. If we replace it with AIO_WAIT_WHILE_UNLOCKED,
>>   could we get rid of taking every time the old AioContext?
>>   drain would be enough to protect the graph modification.
>
> It's been a while, but as far as I remember (which may be wrong), the
> reason for how the locks are supposed to be taken was mostly that we
> need some defined state so that we know how to invoke
> bdrv_drained_begin() and bdrv_drained_end() (i.e. call the first one
> as-is, and switch the locks around for the latter one).

Right, so bdrv_drained_begin must always be called under the old AioContext lock, and bdrv_drained_end always under the new one.

If so, then I think I can make the whole thing even cleaner:

1. bdrv_drained_begin is taken in bdrv_change_aio_context, and not in a
   transaction. This was the initial idea, but somehow it didn't work;
   something was failing. However, I must have changed something now,
   because all tests are passing :) The old AioContext lock is assumed
   to be taken by the caller.
2. The old AioContext lock is released.
3. set_aio_context (the actual detach + attach operations) is kept in a
   commit transaction, as it is now. It doesn't need any AioContext
   lock, as far as I understand. bdrv_drained_end is implemented in a
   .clean transaction.
4. The new AioContext lock is taken.
5. Either tran_abort or tran_commit is called, depending on the result
   of the recursion.

Disadvantage: the unlock/lock mess is still there, but at least we now know why we need it.
Advantage: nothing should break (because it is similar to how it was before), no double lock is held, and I can always add an additional patch where I use the _UNLOCKED functions and see what happens.
In addition, no tran_add_tail needed. > > The idea of using _UNLOCKED sounds interesting, almost too obvious. I > can’t see why that wouldn’t work, actually. > >> -- >> >> Simplify the way the aiocontext can be changed in a BDS graph. >> There are currently two problems in bdrv_try_set_aio_context: >> - There is a confusion of AioContext locks taken and released, because >> we assume that old aiocontext is always taken and new one is >> taken inside. > > Yep, and that assumption is just broken in some cases, which is the main > pain point I’m feeling with it right now. > > For example, look at bdrv_attach_child_common(): Here, we attach a child > to a parent, so we need to get them into a common AioContext. So first > we try to put the child into the parent’s context, and if that fails, > we’ll try the other way, putting the parent into the child’s context. > > The problem is clear: The bdrv_try_set_aio_context() call requires us to > hold the child’s current context but not the parent’s, and the > child_class->set_aio_ctx() call requires the exact opposite. But we > never switch the context we have acquired, so this can’t both work. > Better yet, nowhere is it defined what context a caller to > bdrv_attach_child_common() will hold. > > In practice, what happens here most of the time is that something will > be moved from the main context to some other context, and since we’re in > the main context already, that’ll just work. But you can construct > cases where something is attempted to be moved from an I/O thread into a > different thread and then you’ll get a crash. > > I’d be happy if we could do away with the requirement of having to hold > any lock for changing a node’s AioContext. > >> - It doesn't look very safe to call bdrv_drained_begin while some >> nodes have already switched to the new aiocontext and others haven't. 
>> This could be especially dangerous because bdrv_drained_begin >> polls, so >> something else could be executed while graph is in an inconsistent >> state. >> >> Additional minor nitpick: can_set and set_ callbacks both traverse the >> graph, both using the ignored list of visited nodes in a different way. >> >> Therefore, get rid of all of this and introduce a new callback, >> change_aio_context, that uses transactions to efficiently, cleanly >> and most importantly safely change the aiocontext of a graph. >> >> This new callback is a "merge" of the two previous ones: >> - Just like can_set_aio_context, recursively traverses the graph. >> Marks all nodes that are visited using a GList, and checks if >> they *could* change the aio_context. >> - For each node that passes the above check, add a new transaction >> that implements a callback that effectively changes the aiocontext. >> - If a node is a BDS, add two transactions: one taking care of draining >> the node at the beginning of the list (so that will be executed first) >> and one at t
[PATCH v2 0/1] MAINTAINERS: Add myself as Guest Agent co-maintainer
v1 -> v2: Rebase to current master

v1: https://patchew.org/QEMU/20220713133249.2229623-1-kkost...@redhat.com/

Konstantin Kostiuk (1):
  MAINTAINERS: Add myself as Guest Agent co-maintainer

 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.25.1
[PATCH v2 1/1] MAINTAINERS: Add myself as Guest Agent co-maintainer
Signed-off-by: Konstantin Kostiuk
Acked-by: Michael Roth
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index ead2bed652..6af9cd985c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2881,7 +2881,7 @@ T: git https://repo.or.cz/qemu/armbru.git qapi-next
 QEMU Guest Agent
 M: Michael Roth
-R: Konstantin Kostiuk
+M: Konstantin Kostiuk
 S: Maintained
 F: qga/
 F: docs/interop/qemu-ga.rst
--
2.25.1
Re: [RFC PATCH 07/12] vhost: Add VhostVDPAStartOp operation
On Sat, Jul 16, 2022 at 7:34 PM Eugenio Pérez wrote: > > It allows to run commands at start of the device, before it have enabled > any queue. > > Signed-off-by: Eugenio Pérez > --- > hw/virtio/vhost-shadow-virtqueue.h | 3 +++ > include/hw/virtio/vhost-vdpa.h | 5 + > hw/virtio/vhost-vdpa.c | 8 > 3 files changed, 16 insertions(+) > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h > b/hw/virtio/vhost-shadow-virtqueue.h > index 03eb7ff670..210fe393cd 100644 > --- a/hw/virtio/vhost-shadow-virtqueue.h > +++ b/hw/virtio/vhost-shadow-virtqueue.h > @@ -26,6 +26,8 @@ typedef struct SVQDescState { > } SVQDescState; > > typedef struct VhostShadowVirtqueue VhostShadowVirtqueue; > +typedef int (*ShadowVirtQueueStart)(VhostShadowVirtqueue *svq, > +void *opaque); > > /** > * Callback to handle an avail buffer. > @@ -43,6 +45,7 @@ typedef int (*VirtQueueAvailCallback)(VhostShadowVirtqueue > *svq, >void *vq_callback_opaque); > > typedef struct VhostShadowVirtqueueOps { > +ShadowVirtQueueStart start; What's the difference between this and start_op? 
Thanks > VirtQueueAvailCallback avail_handler; > } VhostShadowVirtqueueOps; > > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h > index d10a89303e..b7d18b4e30 100644 > --- a/include/hw/virtio/vhost-vdpa.h > +++ b/include/hw/virtio/vhost-vdpa.h > @@ -24,6 +24,10 @@ typedef struct VhostVDPAHostNotifier { > void *addr; > } VhostVDPAHostNotifier; > > +struct vhost_vdpa; > +/* Called after send DRIVER_OK but after enabling vrings */ > +typedef int (*VhostVDPAStartOp)(struct vhost_vdpa *v); > + > typedef struct vhost_vdpa { > int device_fd; > int index; > @@ -39,6 +43,7 @@ typedef struct vhost_vdpa { > GPtrArray *shadow_vqs; > const VhostShadowVirtqueueOps *shadow_vq_ops; > void *shadow_vq_ops_opaque; > +VhostVDPAStartOp start_op; > struct vhost_dev *dev; > VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX]; > } VhostVDPA; > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > index 1d8829c619..48f031b8c0 100644 > --- a/hw/virtio/vhost-vdpa.c > +++ b/hw/virtio/vhost-vdpa.c > @@ -1136,6 +1136,14 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, > bool started) > if (unlikely(r)) { > return r; > } > + > +if (v->start_op) { > +r = v->start_op(v); > +if (unlikely(r)) { > +return r; > +} > +} > + > vhost_vdpa_set_vring_ready(dev); > } else { > vhost_vdpa_reset_device(dev); > -- > 2.31.1 >
Re: [RFC PATCH 09/12] vdpa: Extract vhost_vdpa_net_svq_add from vhost_vdpa_net_handle_ctrl_avail
在 2022/7/16 19:34, Eugenio Pérez 写道: So we can reuse to inject state messages. Signed-off-by: Eugenio Pérez --- net/vhost-vdpa.c | 89 +++- 1 file changed, 51 insertions(+), 38 deletions(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 211bd0468b..aaae51a778 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -334,6 +334,54 @@ static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s, return true; } +static virtio_net_ctrl_ack vhost_vdpa_net_svq_add(VhostShadowVirtqueue *svq, + const struct iovec *dev_buffers) The name should be tweaked since it is used only for cvq. +{ +/* in buffer used for device model */ +virtio_net_ctrl_ack status; +const struct iovec in = { +.iov_base = &status, +.iov_len = sizeof(status), +}; +size_t dev_written; +int r; +void *unused = (void *)1; + +r = vhost_svq_add(svq, &dev_buffers[0], 1, &dev_buffers[1], 1, unused); +if (unlikely(r != 0)) { +if (unlikely(r == -ENOSPC)) { +qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n", + __func__); +} +return VIRTIO_NET_ERR; +} + +/* + * We can poll here since we've had BQL from the time we sent the + * descriptor. Also, we need to take the answer before SVQ pulls by itself, + * when BQL is released + */ This reminds me that, do we need a upper limit of the time on the polling here. (Avoid taking BQL for too long time). 
Thanks +dev_written = vhost_svq_poll(svq); +if (unlikely(dev_written < sizeof(status))) { +error_report("Insufficient written data (%zu)", dev_written); +return VIRTIO_NET_ERR; +} + +memcpy(&status, dev_buffers[1].iov_base, sizeof(status)); +if (status != VIRTIO_NET_OK) { +return VIRTIO_NET_ERR; +} + +status = VIRTIO_NET_ERR; +virtio_net_handle_ctrl_iov(svq->vdev, &in, 1, dev_buffers, 1); +if (status != VIRTIO_NET_OK) { +error_report("Bad CVQ processing in model"); +return VIRTIO_NET_ERR; +} + +return VIRTIO_NET_OK; +} + static int vhost_vdpa_start_control_svq(struct vhost_vdpa *v) { struct vhost_vring_state state = { @@ -392,19 +440,13 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq, void *opaque) { VhostVDPAState *s = opaque; -size_t in_len, dev_written; +size_t in_len; virtio_net_ctrl_ack status = VIRTIO_NET_ERR; /* out and in buffers sent to the device */ struct iovec dev_buffers[2] = { { .iov_base = s->cvq_cmd_out_buffer }, { .iov_base = s->cvq_cmd_in_buffer }, }; -/* in buffer used for device model */ -const struct iovec in = { -.iov_base = &status, -.iov_len = sizeof(status), -}; -int r; bool ok; ok = vhost_vdpa_net_cvq_map_elem(s, elem, dev_buffers); @@ -417,36 +459,7 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq, goto out; } -r = vhost_svq_add(svq, &dev_buffers[0], 1, &dev_buffers[1], 1, elem); -if (unlikely(r != 0)) { -if (unlikely(r == -ENOSPC)) { -qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n", - __func__); -} -goto out; -} - -/* - * We can poll here since we've had BQL from the time we sent the - * descriptor. 
Also, we need to take the answer before SVQ pulls by itself, - * when BQL is released - */ -dev_written = vhost_svq_poll(svq); -if (unlikely(dev_written < sizeof(status))) { -error_report("Insufficient written data (%zu)", dev_written); -goto out; -} - -memcpy(&status, dev_buffers[1].iov_base, sizeof(status)); -if (status != VIRTIO_NET_OK) { -goto out; -} - -status = VIRTIO_NET_ERR; -virtio_net_handle_ctrl_iov(svq->vdev, &in, 1, dev_buffers, 1); -if (status != VIRTIO_NET_OK) { -error_report("Bad CVQ processing in model"); -} +status = vhost_vdpa_net_svq_add(svq, dev_buffers); out: in_len = iov_from_buf(elem->in_sg, elem->in_num, 0, &status, @@ -462,7 +475,7 @@ out: if (dev_buffers[1].iov_base) { vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, dev_buffers[1].iov_base); } -return r; +return status == VIRTIO_NET_OK ? 0 : 1; } static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
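Jason's question about bounding the poll while the BQL is held could be answered with an iteration or deadline cap. Below is a minimal, self-contained sketch of that idea; the `poll_done_fn` callback and all names are illustrative stand-ins (in QEMU this would be a check on the SVQ used ring, and a real implementation would more likely compare a monotonic clock against a deadline than count spins):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for "has the device marked the descriptor used?" */
typedef bool (*poll_done_fn)(void *opaque);

/*
 * Poll for CVQ completion, but bail out after max_iters spins so a stuck
 * device cannot pin the BQL forever.  Returns 0 on completion, -1 on
 * timeout (the caller would then fail the control command).
 */
static int poll_with_limit(poll_done_fn done, void *opaque, unsigned max_iters)
{
    for (unsigned i = 0; i < max_iters; i++) {
        if (done(opaque)) {
            return 0;
        }
    }
    return -1;
}

/* Demo device that completes after a fixed number of polls. */
struct demo_dev {
    unsigned polls_until_done;
};

static bool demo_done(void *opaque)
{
    struct demo_dev *d = opaque;

    if (d->polls_until_done == 0) {
        return true;
    }
    d->polls_until_done--;
    return false;
}

static int demo_poll(unsigned polls_until_done, unsigned max_iters)
{
    struct demo_dev d = { polls_until_done };

    return poll_with_limit(demo_done, &d, max_iters);
}
```

The timeout path still leaves the question of what to do with the in-flight descriptor, which is why a deadline-based variant plus a device reset is the more likely real-world shape.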
Re: [PATCH v2 1/1] python/machine: Fix AF_UNIX path too long on macOS
On Sat, Jul 16, 2022 at 10:34:34AM -0700, Peter Delevoryas wrote: > On macOS, private $TMPDIR's are the default. These $TMPDIR's are > generated from a user's unix UID and UUID [1], which can create a > relatively long path: > > /var/folders/d7/rz20f6hd709c1ty8f6_6y_z4gn/T/ > > QEMU's avocado tests create a temporary directory prefixed by > "avo_qemu_sock_", and create QMP sockets within _that_ as well. > The QMP socket is unnecessarily long, because a temporary directory > is created for every QEMUMachine object. > > /avo_qemu_sock_uh3w_dgc/qemu-37331-10bacf110-monitor.sock Looking at this again, I realize my suggestion for dealing with the second part of the path was mistaken. The "qemu-37331-10bacf110-monitor.sock" part combines two pieces. First, the result of f"qemu-{os.getpid()}-{id(self):02x}" is qemu-37331-10bacf110, and the code later appends '-monitor.sock'. So... > > The path limit for unix sockets on macOS is 104: [2] > > /* > * [XSI] Definitions for UNIX IPC domain. > */ > struct sockaddr_un { > unsigned char sun_len; /* sockaddr len including null */ > sa_family_t sun_family; /* [XSI] AF_UNIX */ > char sun_path[104]; /* [XSI] path name (gag) */ > }; > > This results in avocado tests failing on macOS because the QMP unix > socket can't be created, because the path is too long: > > ERROR| Failed to establish connection: OSError: AF_UNIX path too long > > This change reduces the size of both paths, and removes the unique > identification information from the socket name, since it seems to be > unnecessary.
> > This commit produces paths like the following: > > pdel@pdel-mbp:/var/folders/d7/rz20f6hd709c1ty8f6_6y_z4gn/T > $ tree qemu* > qemu_oc7h7f3u > ├── qmp-console.sock > └── qmp-monitor.sock > > [1] > https://apple.stackexchange.com/questions/353832/why-is-mac-osx-temp-directory-in-weird-path > [2] > /Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk/usr/include/sys/un.h > > Signed-off-by: Peter Delevoryas > --- > python/qemu/machine/machine.py | 2 +- > tests/avocado/avocado_qemu/__init__.py | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py > index 37191f433b..b1823966b3 100644 > --- a/python/qemu/machine/machine.py > +++ b/python/qemu/machine/machine.py > @@ -157,7 +157,7 @@ def __init__(self, > self._wrapper = wrapper > self._qmp_timer = qmp_timer > > -self._name = name or f"qemu-{os.getpid()}-{id(self):02x}" > +self._name = name or "qmp" ...my suggestion here was wrong. We don't need the os.getpid() uniqueness because the tmpdir already ensures that it is safe, but keeping 'id(self)' is a good idea if the test case creates multiple machines concurrently. Bearing in mind we later append '-monitor.sock', we don't need 'qmp' in the self._name.
So on reflection I think I should have suggested using: self._name = name or f"{id(self):02x}" And *in addition*, a few lines later change: self._monitor_address = os.path.join( self.sock_dir, f"{self._name}-monitor.sock" ) To self._monitor_address = os.path.join( self.sock_dir, f"{self._name}.qmp" ) > diff --git a/tests/avocado/avocado_qemu/__init__.py > b/tests/avocado/avocado_qemu/__init__.py > index ed4853c805..43b8c8848c 100644 > --- a/tests/avocado/avocado_qemu/__init__.py > +++ b/tests/avocado/avocado_qemu/__init__.py > @@ -296,7 +296,7 @@ def require_accelerator(self, accelerator): > "available" % accelerator) > > def _new_vm(self, name, *args): > -self._sd = tempfile.TemporaryDirectory(prefix="avo_qemu_sock_") > +self._sd = tempfile.TemporaryDirectory(prefix="qemu_") This bit is fine. > vm = QEMUMachine(self.qemu_bin, base_temp_dir=self.workdir, > sock_dir=self._sd.name, log_dir=self.logdir) > self.log.debug('QEMUMachine "%s" created', name) With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
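The arithmetic behind the failure can be made explicit. A hedged C sketch of the length check, using the 104-byte `sun_path` bound from the macOS header quoted above (the helper name and the long socket name in the test are illustrative, not real QEMU code):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * macOS's struct sockaddr_un has char sun_path[104]; Linux allows 108.
 * Use the smaller bound to stay portable.
 */
#define SUN_PATH_MAX 104

/* Returns true if "dir/name" plus its terminating NUL fits in sun_path. */
static bool af_unix_path_fits(const char *dir, const char *name)
{
    return strlen(dir) + 1 /* '/' */ + strlen(name) + 1 /* NUL */
           <= SUN_PATH_MAX;
}
```

Shortening both the tmpdir prefix and the per-machine socket name, as the patch does, attacks both terms of the sum.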
Re: [RFC PATCH 11/12] vdpa: Add virtio-net mac address via CVQ at start
On 2022/7/16 19:34, Eugenio Pérez wrote: This is needed so the destination vdpa device sees the same state as the guest set in the source. Signed-off-by: Eugenio Pérez --- net/vhost-vdpa.c | 53 +++- 1 file changed, 52 insertions(+), 1 deletion(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 0183fce353..2873be2ba4 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -383,7 +383,7 @@ static virtio_net_ctrl_ack vhost_vdpa_net_svq_add(VhostShadowVirtqueue *svq, return VIRTIO_NET_OK; } -static int vhost_vdpa_start_control_svq(struct vhost_vdpa *v) +static int vhost_vdpa_enable_control_svq(struct vhost_vdpa *v) { struct vhost_vring_state state = { .index = v->dev->vq_index, @@ -395,6 +395,57 @@ static int vhost_vdpa_start_control_svq(struct vhost_vdpa *v) return r < 0 ? -errno : r; } +static int vhost_vdpa_start_control_svq(struct vhost_vdpa *v) +{ + +VirtIONet *n = VIRTIO_NET(v->dev->vdev); +uint64_t features = v->dev->vdev->host_features; +VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, 0); +VhostVDPAState *s = container_of(v, VhostVDPAState, vhost_vdpa); +int r; + +r = vhost_vdpa_enable_control_svq(v); +if (unlikely(r < 0)) { +return r; +} + +if (features & BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR)) { +const struct virtio_net_ctrl_hdr ctrl = { +.class = VIRTIO_NET_CTRL_MAC, +.cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET, +}; +uint8_t mac[6]; +const struct iovec out[] = { +{ +.iov_base = (void *)&ctrl, +.iov_len = sizeof(ctrl), +},{ +.iov_base = mac, +.iov_len = sizeof(mac), +}, +}; +struct iovec dev_buffers[2] = { +{ .iov_base = s->cvq_cmd_out_buffer }, +{ .iov_base = s->cvq_cmd_in_buffer }, +}; +bool ok; +virtio_net_ctrl_ack state; + +ok = vhost_vdpa_net_cvq_map_sg(s, out, ARRAY_SIZE(out), dev_buffers); +if (unlikely(!ok)) { +return -1; +} + +memcpy(mac, n->mac, sizeof(mac)); +state = vhost_vdpa_net_svq_add(svq, dev_buffers); +vhost_vdpa_cvq_unmap_buf(v, dev_buffers[0].iov_base); +vhost_vdpa_cvq_unmap_buf(v, dev_buffers[1].iov_base); Any reason we do per buffer
unmap instead of the sg unmap here? Thanks +return state == VIRTIO_NET_OK ? 0 : 1; +} + +return 0; +} + /** * Do not forward commands not supported by SVQ. Otherwise, the device could * accept it and qemu would not know how to update the device model.
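For readers following the patch, the start-time command being built is simply a two-byte control header followed by the 6-byte MAC, expressed above as a two-element iovec. A standalone sketch of the equivalent flat layout (toy struct; per the virtio spec, VIRTIO_NET_CTRL_MAC and VIRTIO_NET_CTRL_MAC_ADDR_SET are both 1 — treat the literals below as hedged values, not QEMU definitions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy virtio-net control header. */
struct toy_ctrl_hdr {
    uint8_t class;
    uint8_t cmd;
};

/*
 * Flatten "control header + MAC" into one buffer, mirroring what the
 * mapped cvq_cmd_out_buffer ends up holding.  Returns bytes written,
 * or 0 if the buffer is too small.
 */
static size_t pack_mac_set_cmd(uint8_t *out, size_t n, const uint8_t mac[6])
{
    const struct toy_ctrl_hdr hdr = { .class = 1, .cmd = 1 };

    if (n < sizeof(hdr) + 6) {
        return 0;
    }
    memcpy(out, &hdr, sizeof(hdr));
    memcpy(out + sizeof(hdr), mac, 6);
    return sizeof(hdr) + 6;
}

static int demo_pack(void)
{
    uint8_t buf[8];
    const uint8_t mac[6] = { 0x52, 0x54, 0x00, 0x12, 0x34, 0x56 };

    return pack_mac_set_cmd(buf, sizeof(buf), mac) == 8 && buf[2] == 0x52;
}
```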
Re: Internal MAC addresses list (mac_table) usage
On Thu, Jul 14, 2022 at 7:54 PM Ovchinnikov, Vitalii wrote: > > Hi Jason, > > Thanks for pointing out that corner case with "52:54:00:12:34:XX". > > In the NIC model I'm developing qemu_macaddr_default_if_unset is called every > time MAC is updated in the NIC registers. > This way a just assigned "52:54:00:12:34:XX" MAC is at least marked as used > in the mac_table. > > However it doesn't cover the case when "52:54:00:12:34:XX" MAC being assigned > through NIC registers has already been assigned to another NIC by QEMU. This should be fine, and it needs to be addressed in a separate patch. Thanks > So one more improvement the code might need is a way to check whether MAC is > free or used from within NIC model. > Returning bool from qemu_macaddr_default_if_unset may well do the trick. > Moreover it might also help to spot an error when -1 is returned from > qemu_macaddr_get_free (for the time being it's silently interpreted as 0xFF > MAC LSB). > > BR, > Vitalii > > From: Jason Wang > Sent: Thursday, July 14, 2022 9:44 > To: Ovchinnikov, Vitalii > Cc: qemu-devel@nongnu.org > Subject: Re: Internal MAC addresses list (mac_table) usage > > On Tue, Jul 12, 2022 at 4:43 PM Ovchinnikov, Vitalii > wrote: > > > > Hi folks, > > > > While developing an Ethernet NIC model I noticed that QEMU maintains the > > following internal array which marks used/free MAC addresses in net/net.c: > > > > static int mac_table[256] = {0}; > > > > with three private (static) functions accessing it: qemu_macaddr_set_used, > > qemu_macaddr_set_free, qemu_macaddr_get_free. > > Public (non-static) interface to this array includes two functions: > > qemu_macaddr_default_if_unset and qemu_del_nic. > > > > The vast majority of existing NIC models calls > > qemu_macaddr_default_if_unset in their *_realize functions replacing > > zeroed-out MAC address with the free one returned by QEMU, for instance > > (lan9118_realize functions from hw/net/lan9118.c): > > > >... 
> > qemu_macaddr_default_if_unset(&s->conf.macaddr); > > > > s->nic = qemu_new_nic(&net_lan9118_info, &s->conf, > > object_get_typename(OBJECT(dev)), dev->id, s); > > qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a); > >... > > > > qemu_del_nic is being called from net_cleanup function right before QEMU > > finishes execution. > > > > What appears to be a possible SW architecture gap is that NIC models have > > no means to inform QEMU about changing their MAC addresses during execution > > (again from hw/net/lan9118.c, do_mac_write function): > > > > case MAC_ADDRH: > > s->conf.macaddr.a[4] = val & 0xff; > > s->conf.macaddr.a[5] = (val >> 8) & 0xff; > > lan9118_mac_changed(s); > > break; > > case MAC_ADDRL: > > s->conf.macaddr.a[0] = val & 0xff; > > s->conf.macaddr.a[1] = (val >> 8) & 0xff; > > s->conf.macaddr.a[2] = (val >> 16) & 0xff; > > s->conf.macaddr.a[3] = (val >> 24) & 0xff; > > lan9118_mac_changed(s); > > break; > > > > lan9118_mac_changed function here simply changes NIC info string using > > qemu_format_nic_info_str, hence stale MAC address stays marked as used in > > the mac_table whereas it's not actually in use any more. > > > > Am I right in thinking of it as a SW architecture gap/bug that needs to be > > addressed? > > I think so. Note that the code can not deal with the case when > "52:54:00:12:34:XX" was passed from cli. > > Thanks > > > > > BR, > > Vitalii > > > > >
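The gap described here — a NIC model has no way to release its stale address, and allocation failures are silent — can be sketched with a toy version of the table, including the bool-returning variant Vitalii proposes. This is a simplified model (only the last byte of a 52:54:00:12:34:XX address is tracked, loosely following net/net.c), not the actual QEMU code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model of net/net.c's mac_table: a use count per last-byte value. */
static int mac_table[256];

static void macaddr_set_used(const uint8_t mac[6]) { mac_table[mac[5]]++; }

static void macaddr_set_free(const uint8_t mac[6])
{
    if (mac_table[mac[5]] > 0) {
        mac_table[mac[5]]--;
    }
}

static int macaddr_get_free(void)
{
    for (int i = 0; i < 256; i++) {
        if (mac_table[i] == 0) {
            return i;
        }
    }
    return -1; /* exhausted; callers must not silently use 0xFF */
}

/*
 * bool-returning variant suggested in the thread: report failure
 * instead of silently assigning a bogus address when exhausted.
 */
static bool macaddr_default_if_unset(uint8_t mac[6])
{
    static const uint8_t zero[6];
    int idx;

    if (memcmp(mac, zero, sizeof(zero)) != 0) {
        macaddr_set_used(mac);
        return true;
    }
    idx = macaddr_get_free();
    if (idx < 0) {
        return false;
    }
    mac[0] = 0x52; mac[1] = 0x54; mac[2] = 0x00;
    mac[3] = 0x12; mac[4] = 0x34; mac[5] = (uint8_t)idx;
    macaddr_set_used(mac);
    return true;
}

/* Demo: two defaults differ, and freeing the stale one (the missing
 * piece in lan9118's do_mac_write) makes it available again. */
static bool demo_assign_and_recycle(void)
{
    uint8_t a[6] = {0}, b[6] = {0};

    if (!macaddr_default_if_unset(a) || !macaddr_default_if_unset(b)) {
        return false;
    }
    if (memcmp(a, b, 6) == 0) {
        return false;
    }
    macaddr_set_free(a); /* NIC changed its MAC at runtime */
    return macaddr_get_free() == a[5];
}
```

The `macaddr_set_free` call in the demo is exactly the hook the thread says NIC models currently lack when their MAC registers are rewritten.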
Re: [PATCH v4 2/2] ui/gtk: a new array param monitor to specify the target displays
Dongwon Kim writes: > On Tue, Jul 12, 2022 at 08:11:08AM +0200, Markus Armbruster wrote: >> Dongwon Kim writes: >> >> > New integer array parameter, 'monitor' is for specifying the target >> > monitors where individual GTK windows are placed upon launching. >> > >> > Monitor numbers in the array are associated with virtual consoles >> > in the order of [VC0, VC1, VC2 ... VCn]. >> > >> > Every GTK window containing each VC will be placed in the region >> > of corresponding monitors. >> > >> > Usage: -display gtk,monitor.=,.. >> >ex)-display gtk,monitor.0=1,monitor.1=0 >> > >> > Cc: Daniel P. Berrangé >> > Cc: Markus Armbruster >> > Cc: Philippe Mathieu-Daudé >> > Cc: Paolo Bonzini >> > Cc: Gerd Hoffmann >> > Cc: Vivek Kasireddy >> > Signed-off-by: Dongwon Kim >> > --- >> > qapi/ui.json| 9 - >> > qemu-options.hx | 3 ++- >> > ui/gtk.c| 30 -- >> > 3 files changed, 38 insertions(+), 4 deletions(-) >> > >> > diff --git a/qapi/ui.json b/qapi/ui.json >> > index 413371d5e8..ee0f9244ef 100644 >> > --- a/qapi/ui.json >> > +++ b/qapi/ui.json >> > @@ -1195,12 +1195,19 @@ >> > # assuming the guest will resize the display to match >> > # the window size then. Otherwise it defaults to "off". >> > # Since 3.1 >> > +# @monitor: Array of numbers, each of which represents physical >> > +# monitor where GTK window containing a given VC will be >> > +# placed. Each monitor number in the array will be >> > +# associated with a virtual console starting from VC0. >> > +# >> > +# since 7.1 >> >> I dislike repeating the type (here: array of numbers) in the >> description. >> >> Suggest something like >> >># @monitor: List of physical monitor numbers where the GTK windows >># containing the virtual consoles VC0, VC1, ... are to be >># placed. (Since 7.1) >> >> Missing: what happens when there are more VCs than list elements. Can >> you tell us? > > # @monitor: List of physical monitor numbers where the GTK windows > # containing the virtual consoles VC0, VC1, ... are to be > # placed. 
If a mapping exists for a VC, then it'd be > # placed on that specific physical monitor; otherwise, > # it'd default to the monitor from where it was launched > # since 7.1 > > How does this look? Pretty good! Nitpicks: replace "it'd" by "it will" or "it is to be", end your second sentence with a period, and format "since" like we do elsewhere. Together: # @monitor: List of physical monitor numbers where the GTK windows # containing the virtual consoles VC0, VC1, ... are to be # placed. If a mapping exists for a VC, then it is to be # placed on that specific physical monitor; otherwise, # it defaults to the monitor from where it was launched. # (Since 7.1) >> >> > # >> > # Since: 2.12 >> > ## >> > { 'struct' : 'DisplayGTK', >> >'data': { '*grab-on-hover' : 'bool', >> > -'*zoom-to-fit' : 'bool' } } >> > +'*zoom-to-fit' : 'bool', >> > +'*monitor' : ['uint16'] } } >> > >> > ## >> > # @DisplayEGLHeadless: >> >> [...] >>
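The agreed documentation wording reduces to a small lookup with a fallback. An illustrative sketch of those semantics (toy function, not the actual ui/gtk.c code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * VCs with an entry in the @monitor list go to that physical monitor;
 * VCs beyond the end of the list fall back to the monitor the process
 * was launched from.
 */
static int vc_target_monitor(const uint16_t *map, size_t map_len,
                             size_t vc_index, int launch_monitor)
{
    return vc_index < map_len ? map[vc_index] : launch_monitor;
}

/* Demo for "-display gtk,monitor.0=1,monitor.1=0" with three VCs,
 * launched from (hypothetical) monitor 3. */
static bool demo_vc_mapping(void)
{
    const uint16_t map[] = { 1, 0 };

    return vc_target_monitor(map, 2, 0, 3) == 1 &&  /* VC0 -> monitor 1 */
           vc_target_monitor(map, 2, 1, 3) == 0 &&  /* VC1 -> monitor 0 */
           vc_target_monitor(map, 2, 2, 3) == 3;    /* VC2 -> launch monitor */
}
```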
Re: [RFC PATCH 11/12] vdpa: Add virtio-net mac address via CVQ at start
On Mon, Jul 18, 2022 at 10:55 AM Jason Wang wrote: > > > 在 2022/7/16 19:34, Eugenio Pérez 写道: > > This is needed so the destination vdpa device see the same state a the > > guest set in the source. > > > > Signed-off-by: Eugenio Pérez > > --- > > net/vhost-vdpa.c | 53 +++- > > 1 file changed, 52 insertions(+), 1 deletion(-) > > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c > > index 0183fce353..2873be2ba4 100644 > > --- a/net/vhost-vdpa.c > > +++ b/net/vhost-vdpa.c > > @@ -383,7 +383,7 @@ static virtio_net_ctrl_ack > > vhost_vdpa_net_svq_add(VhostShadowVirtqueue *svq, > > return VIRTIO_NET_OK; > > } > > > > -static int vhost_vdpa_start_control_svq(struct vhost_vdpa *v) > > +static int vhost_vdpa_enable_control_svq(struct vhost_vdpa *v) > > { > > struct vhost_vring_state state = { > > .index = v->dev->vq_index, > > @@ -395,6 +395,57 @@ static int vhost_vdpa_start_control_svq(struct > > vhost_vdpa *v) > > return r < 0 ? -errno : r; > > } > > > > +static int vhost_vdpa_start_control_svq(struct vhost_vdpa *v) > > +{ > > + > > +VirtIONet *n = VIRTIO_NET(v->dev->vdev); > > +uint64_t features = v->dev->vdev->host_features; > > +VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, 0); > > +VhostVDPAState *s = container_of(v, VhostVDPAState, vhost_vdpa); > > +int r; > > + > > +r = vhost_vdpa_enable_control_svq(v); > > +if (unlikely(r < 0)) { > > +return r; > > +} > > + > > +if (features & BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR)) { > > +const struct virtio_net_ctrl_hdr ctrl = { > > +.class = VIRTIO_NET_CTRL_MAC, > > +.cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET, > > +}; > > +uint8_t mac[6]; > > +const struct iovec out[] = { > > +{ > > +.iov_base = (void *)&ctrl, > > +.iov_len = sizeof(ctrl), > > +},{ > > +.iov_base = mac, > > +.iov_len = sizeof(mac), > > +}, > > +}; > > +struct iovec dev_buffers[2] = { > > +{ .iov_base = s->cvq_cmd_out_buffer }, > > +{ .iov_base = s->cvq_cmd_in_buffer }, > > +}; > > +bool ok; > > +virtio_net_ctrl_ack state; > > + > > +ok = 
vhost_vdpa_net_cvq_map_sg(s, out, ARRAY_SIZE(out), > > dev_buffers); > > +if (unlikely(!ok)) { > > +return -1; > > +} > > + > > +memcpy(mac, n->mac, sizeof(mac)); > > +state = vhost_vdpa_net_svq_add(svq, dev_buffers); > > +vhost_vdpa_cvq_unmap_buf(v, dev_buffers[0].iov_base); > > +vhost_vdpa_cvq_unmap_buf(v, dev_buffers[1].iov_base); > > > Any reason we do per buffer unmap instead of the sg unmap here? > I don't think I get this comment. vhost_vdpa_net_handle_ctrl_avail also unmaps each buffer individually, and I need a function to unmap one of them at a time. I could create a function to unmap a whole sg, but I'm not sure how much value it adds. Thanks! > Thanks > > > > +return state == VIRTIO_NET_OK ? 0 : 1; > > +} > > + > > +return 0; > > +} > > + > > /** > >* Do not forward commands not supported by SVQ. Otherwise, the device > > could > >* accept it and qemu would not know how to update the device model. >
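For what it's worth, the whole-sg helper Jason alludes to and Eugenio weighs would be a short loop over the iovec array; whether it adds value over two explicit calls is largely taste. A sketch with hypothetical names (not QEMU API — the counting callback is only demo plumbing):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

typedef void (*unmap_buf_fn)(void *vaddr);

/*
 * Unmap every mapped buffer of an iovec array in one call, instead of
 * one call per buffer at each call site.
 */
static void cvq_unmap_sg(unmap_buf_fn unmap, const struct iovec *sg,
                         size_t num)
{
    for (size_t i = 0; i < num; i++) {
        if (sg[i].iov_base) {
            unmap(sg[i].iov_base);
        }
    }
}

/* Demo plumbing: count unmap calls instead of touching an IOMMU. */
static int unmap_calls;

static void counting_unmap(void *vaddr)
{
    (void)vaddr;
    unmap_calls++;
}

static int demo_unmap_two_of_three(void)
{
    char a, b;
    struct iovec sg[3] = { { &a, 1 }, { NULL, 0 }, { &b, 1 } };

    unmap_calls = 0;
    cvq_unmap_sg(counting_unmap, sg, 3);
    return unmap_calls;
}
```

Skipping NULL entries keeps the helper usable on partially mapped arrays, which matches the error paths in the patch.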
Re: [RFC PATCH 3/8] RFC: block: use transactions as a replacement of ->{can_}set_aio_context()
On 14/07/2022 at 18:45, Hanna Reitz wrote: >> + * First, go recursively through all nodes in the graph, and see if they >> + * can change their AioContext. >> + * If so, add for each node a new transaction with a callback to >> change the >> + * AioContext with the new one. >> + * Once recursion is completed, if all nodes are allowed to change their >> + * AioContext then commit the transaction, running all callbacks in >> order. >> + * Otherwise don't do anything. > > Nice explanation, but I’d start with something more simple to describe > the external interface, like “Change bs's and recursively all of its > parents' and children's AioContext to the given new context, returning > an error if that isn't possible. If ignore_child is not NULL, that > child (and its subgraph) will not be touched.” Do you want your suggestion as a replacement for mine, or in addition to it? > >> + * >> + * This function still requires the caller to take the bs current >> + * AioContext lock, otherwise draining could fail since AIO_WAIT_WHILE > > Well, let’s just say “will” instead of “could”. Callers don’t need to > know they can get lucky when they ignore the rules. > >> + * assumes the lock is always held if bs is in another AioContext. >> + */ >> +int bdrv_child_try_change_aio_context(BlockDriverState *bs, >> AioContext *ctx, >> + BdrvChild *ignore_child, Error >> **errp) > > Do the other functions need to be public? Outside of this file, this > one seems sufficient to me, but I may well be overlooking something. > > Also, why is this function called bdrv_child_*, and not just > bdrv_try_change_aio_context()? (Or maybe have a > bdrv_try_change_aio_context() variant without the ignore_child > parameter, just like bdrv_try_set_aio_context()?) > > Actually, wouldn’t just bdrv_change_aio_context() be sufficient? I > mean, it isn’t called bdrv_try_co_pwritev(). Most functions try to do > something and return errors if they can’t. Not sure if we need to make > that clear in the name.
> > (I may be wrong, but I suspect bdrv_try_set_aio_context() is only named > such that the compiler will check that the bdrv_set_aio_context() > interface is completely unused; > 42a65f02f9b380bd8074882d5844d4ea033389cc’s commit message does seem to > confirm that.) Ok, I will change the name in bdrv_change_aio_context. And I now understand your suggestion on the last patch to remove bdrv_try_set_aio_context. Emanuele
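The check-then-commit mechanism being reviewed follows a fixed pattern: every node registers a commit callback while the graph is walked, and only if all nodes agree are the callbacks run, in order (QEMU has real transaction helpers for this; the sketch below is a deliberately tiny toy version, not that API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*commit_fn)(void *opaque);

enum { TRAN_MAX = 16 };

typedef struct Transaction {
    commit_fn cb[TRAN_MAX];
    void *opaque[TRAN_MAX];
    size_t n;
} Transaction;

static bool tran_add(Transaction *t, commit_fn cb, void *opaque)
{
    if (t->n == TRAN_MAX) {
        return false;
    }
    t->cb[t->n] = cb;
    t->opaque[t->n] = opaque;
    t->n++;
    return true;
}

static void tran_commit(Transaction *t)
{
    for (size_t i = 0; i < t->n; i++) {
        t->cb[i](t->opaque[i]);
    }
}

/* Demo: two nodes each register a context switch; commit runs both. */
static void switch_ctx(void *opaque)
{
    (*(int *)opaque)++;
}

static int demo_commit_runs_all(void)
{
    Transaction t = { .n = 0 };
    int switched = 0;

    tran_add(&t, switch_ctx, &switched);
    tran_add(&t, switch_ctx, &switched);
    tran_commit(&t);
    return switched;
}
```

Aborting in this toy model simply means never calling `tran_commit()` — which matches the "otherwise don't do anything" wording in the comment under review.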
Re: [RFC PATCH 07/12] vhost: Add VhostVDPAStartOp operation
On Mon, Jul 18, 2022 at 10:50 AM Jason Wang wrote: > > On Sat, Jul 16, 2022 at 7:34 PM Eugenio Pérez wrote: > > > > It allows running commands at device start, before it has enabled > > any queue. > > > > Signed-off-by: Eugenio Pérez > > --- > > hw/virtio/vhost-shadow-virtqueue.h | 3 +++ > > include/hw/virtio/vhost-vdpa.h | 5 + > > hw/virtio/vhost-vdpa.c | 8 > > 3 files changed, 16 insertions(+) > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h > > b/hw/virtio/vhost-shadow-virtqueue.h > > index 03eb7ff670..210fe393cd 100644 > > --- a/hw/virtio/vhost-shadow-virtqueue.h > > +++ b/hw/virtio/vhost-shadow-virtqueue.h > > @@ -26,6 +26,8 @@ typedef struct SVQDescState { > > } SVQDescState; > > > > typedef struct VhostShadowVirtqueue VhostShadowVirtqueue; > > +typedef int (*ShadowVirtQueueStart)(VhostShadowVirtqueue *svq, > > +void *opaque); > > > > /** > > * Callback to handle an avail buffer. > > @@ -43,6 +45,7 @@ typedef int > > (*VirtQueueAvailCallback)(VhostShadowVirtqueue *svq, > >void *vq_callback_opaque); > > > > typedef struct VhostShadowVirtqueueOps { > > +ShadowVirtQueueStart start; > > What's the difference between this and start_op? > This is a leftover; I'll delete it for the next series. Thanks!
> Thanks > > > VirtQueueAvailCallback avail_handler; > > } VhostShadowVirtqueueOps; > > > > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h > > index d10a89303e..b7d18b4e30 100644 > > --- a/include/hw/virtio/vhost-vdpa.h > > +++ b/include/hw/virtio/vhost-vdpa.h > > @@ -24,6 +24,10 @@ typedef struct VhostVDPAHostNotifier { > > void *addr; > > } VhostVDPAHostNotifier; > > > > +struct vhost_vdpa; > > +/* Called after send DRIVER_OK but after enabling vrings */ > > +typedef int (*VhostVDPAStartOp)(struct vhost_vdpa *v); > > + > > typedef struct vhost_vdpa { > > int device_fd; > > int index; > > @@ -39,6 +43,7 @@ typedef struct vhost_vdpa { > > GPtrArray *shadow_vqs; > > const VhostShadowVirtqueueOps *shadow_vq_ops; > > void *shadow_vq_ops_opaque; > > +VhostVDPAStartOp start_op; > > struct vhost_dev *dev; > > VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX]; > > } VhostVDPA; > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > > index 1d8829c619..48f031b8c0 100644 > > --- a/hw/virtio/vhost-vdpa.c > > +++ b/hw/virtio/vhost-vdpa.c > > @@ -1136,6 +1136,14 @@ static int vhost_vdpa_dev_start(struct vhost_dev > > *dev, bool started) > > if (unlikely(r)) { > > return r; > > } > > + > > +if (v->start_op) { > > +r = v->start_op(v); > > +if (unlikely(r)) { > > +return r; > > +} > > +} > > + > > vhost_vdpa_set_vring_ready(dev); > > } else { > > vhost_vdpa_reset_device(dev); > > -- > > 2.31.1 > > >
Re: [PATCH 1/2] block/parallels: Fix buffer-based write call
On 14.07.2022 15:28, Hanna Reitz wrote: Commit a4072543ccdddbd241d5962d9237b8b41fd006bf has changed the I/O here from working on a local one-element I/O vector to just using the buffer directly (using the bdrv_co_pread()/bdrv_co_pwrite() helper functions introduced shortly before). However, it only changed the bdrv_co_preadv() call to bdrv_co_pread() - the subsequent bdrv_co_pwritev() call stayed this way, and so still expects a QEMUIOVector pointer instead of a plain buffer. We must change that to be a bdrv_co_pwrite() call. Fixes: a4072543ccdddbd241d5962d ("block/parallels: use buffer-based io") Signed-off-by: Hanna Reitz --- block/parallels.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/block/parallels.c b/block/parallels.c index 8b235b9505..a229c06f25 100644 --- a/block/parallels.c +++ b/block/parallels.c @@ -241,8 +241,8 @@ static int64_t allocate_clusters(BlockDriverState *bs, int64_t sector_num, return ret; } -ret = bdrv_co_pwritev(bs->file, s->data_end * BDRV_SECTOR_SIZE, - nb_cow_bytes, buf, 0); +ret = bdrv_co_pwrite(bs->file, s->data_end * BDRV_SECTOR_SIZE, + nb_cow_bytes, buf, 0); qemu_vfree(buf); if (ret < 0) { return ret; Reviewed-by: Denis V. Lunev
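The bug class fixed here is easy to reproduce in miniature: the buffer-based helper is just a one-element-vector wrapper around the vector-based primitive, so buffer and vector arguments are not interchangeable. A simplified sketch with toy functions (not the QEMU block layer; offsets and flags are omitted for brevity):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Toy stand-in for the vector-based primitive: gathers the iovec into
 * dst.  In QEMU, bdrv_co_pwritev() expects a QEMUIOVector, which is why
 * passing a plain buffer there reinterprets arbitrary bytes as iovec
 * metadata.
 */
static size_t toy_pwritev(char *dst, const struct iovec *iov, int niov)
{
    size_t total = 0;

    for (int i = 0; i < niov; i++) {
        memcpy(dst + total, iov[i].iov_base, iov[i].iov_len);
        total += iov[i].iov_len;
    }
    return total;
}

/*
 * Buffer-based wrapper in the style of bdrv_co_pwrite(): wrap the plain
 * buffer in a one-element vector, then delegate.
 */
static size_t toy_pwrite(char *dst, const void *buf, size_t bytes)
{
    struct iovec v = { .iov_base = (void *)buf, .iov_len = bytes };

    return toy_pwritev(dst, &v, 1);
}

static bool demo_roundtrip(void)
{
    char dst[16] = { 0 };

    return toy_pwrite(dst, "abcd", 4) == 4 && memcmp(dst, "abcd", 4) == 0;
}
```

Because the wrapper hides the vector construction, mixing up the two call styles compiles in C whenever the pointer types line up — which is exactly how the original `buf`-as-`QEMUIOVector` slip survived review.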
Re: [PATCH 2/2] iotests/131: Add parallels regression test
On 14.07.2022 15:28, Hanna Reitz wrote: Test an allocating write to a parallels image that has a backing node. Before HEAD^, doing so used to give me a failed assertion (when the backing node contains only `42` bytes; the results varies with the value chosen, for `0` bytes, for example, all I get is EIO). Signed-off-by: Hanna Reitz --- tests/qemu-iotests/131 | 35 ++- tests/qemu-iotests/131.out | 13 + 2 files changed, 47 insertions(+), 1 deletion(-) diff --git a/tests/qemu-iotests/131 b/tests/qemu-iotests/131 index d7456cae5b..a847692b4c 100755 --- a/tests/qemu-iotests/131 +++ b/tests/qemu-iotests/131 @@ -43,7 +43,7 @@ _supported_os Linux inuse_offset=$((0x2c)) -size=64M +size=$((64 * 1024 * 1024)) CLUSTER_SIZE=64k IMGFMT=parallels _make_test_img $size @@ -70,6 +70,39 @@ _check_test_img _check_test_img -r all { $QEMU_IO -c "read -P 0x11 64k 64k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir +echo "== allocate with backing ==" +# Verify that allocating clusters works fine even when there is a backing image. +# Regression test for a bug where we would pass a buffer read from the backing +# node as a QEMUIOVector object, which could cause anything from I/O errors over +# assertion failures to invalid reads from memory. 
+ +# Clear image +_make_test_img $size +# Create base image +TEST_IMG="$TEST_IMG.base" _make_test_img $size + +# Write some data to the base image (which would trigger an assertion failure if +# interpreted as a QEMUIOVector) +$QEMU_IO -c 'write -P 42 0 64k' "$TEST_IMG.base" | _filter_qemu_io + +# Parallels does not seem to support storing a backing filename in the image +# itself, so we need to build our backing chain on the command line +imgopts="driver=$IMGFMT,file.driver=$IMGPROTO,file.filename=$TEST_IMG" +imgopts+=",backing.driver=$IMGFMT" +imgopts+=",backing.file.driver=$IMGPROTO,backing.file.filename=$TEST_IMG.base" + +# Cause allocation in the top image +QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT \ +$QEMU_IO --image-opts "$imgopts" -c 'write -P 1 0 64' | _filter_qemu_io + +# Verify +QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT \ +$QEMU_IO --image-opts "$imgopts" \ +-c 'read -P 1 0 64' \ +-c "read -P 42 64 $((64 * 1024 - 64))" \ +-c "read -P 0 64k $((size - 64 * 1024))" \ +| _filter_qemu_io + # success, all done echo "*** done" rm -f $seq.full diff --git a/tests/qemu-iotests/131.out b/tests/qemu-iotests/131.out index 70da03dee5..de5ef7a8f5 100644 --- a/tests/qemu-iotests/131.out +++ b/tests/qemu-iotests/131.out @@ -37,4 +37,17 @@ Double checking the fixed image now... No errors were found on the image. 
read 65536/65536 bytes at offset 65536 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +== allocate with backing == +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 +Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=67108864 +wrote 65536/65536 bytes at offset 0 +64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 64/64 bytes at offset 0 +64 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 64/64 bytes at offset 0 +64 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 65472/65472 bytes at offset 64 +63.938 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 67043328/67043328 bytes at offset 65536 +63.938 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) *** done Reviewed-by: Denis V. Lunev
Re: [PATCH 0/3] target: RFC: display deprecation note for '-cpu help'
On 14/07/2022 17.07, Daniel P. Berrangé wrote: When querying '-cpu help' there is no presentation of the fact that a CPU may be deprecated. The user just has to try it and see if they get a deprecation message at runtime. The QMP command for querying CPUs reports a deprecation bool flag, but not the explanatory reason. The Icelake-Client CPU (removed in 6df39f5e583ca0f67bd934d1327f9ead2e3bd49c) handled this by modifying the '.notes' section to add the word 'deprecated': { .version = 2, .note = "no TSX, deprecated", .alias = "Icelake-Client-noTSX", .props = (PropValue[]) { { "hle", "off" }, { "rtm", "off" }, { /* end of list */ } }, }, This relies on the person deprecating the CPU to remember to do this, and is redundant when this info is already expressed in the '.deprecation_note' field. This short series suggests just modifying the '-cpu help' formatter so that it displays the full deprecation message, e.g. $ qemu-system-x86_64 -cpu help: Available CPUs: x86 486 (alias configured by machine type) (deprecated: use at least 'Nehalem' / 'Opteron_G4', or 'host' / 'max') I wonder if this is too verbose, and we should just do a concise flag-like approach, similar to QMP: $ qemu-system-x86_64 -cpu help: Available CPUs: x86 486 (alias configured by machine type) (deprecated) leaving the full message to be displayed at runtime? I'm slightly inclined to the simpler, more concise output. I'd prefer to keep it short here and just write "deprecated" without the reason. Otherwise this will overflow the lines and break the readability of the output. And it's also what we're doing for "-machine", e.g.: $ ./qemu-system-ppc64 -M help | grep deprecate taihu taihu (deprecated) $ ./qemu-system-ppc64 -M taihu qemu-system-ppc64: warning: Machine type 'taihu' is deprecated: incomplete, use 'ref405ep' instead Thomas
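The concise variant both maintainers lean toward amounts to keying a fixed marker off the presence of a deprecation note and leaving the note itself for the runtime warning. An illustrative sketch (not the actual per-target cpu list code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one '-cpu help' line: only a "(deprecated)" marker, keeping the
 * full .deprecation_note text for the runtime warning.
 */
static void format_cpu_help_line(char *out, size_t n, const char *name,
                                 const char *deprecation_note)
{
    snprintf(out, n, "%s%s", name, deprecation_note ? " (deprecated)" : "");
}

static bool demo_format(void)
{
    char line[64];

    format_cpu_help_line(line, sizeof(line), "486",
                         "use at least 'Nehalem' / 'Opteron_G4'");
    return strcmp(line, "486 (deprecated)") == 0;
}
```

This mirrors the `-machine` behavior quoted above: the help output stays one short line per entry, and the reason only appears when the deprecated entity is actually used.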
[PULL 0/1] MAINTAINERS: Add myself as Guest Agent co-maintainer
The following changes since commit 0ebf76aae58324b8f7bf6af798696687f5f4c2a9: Merge tag 'nvme-next-pull-request' of git://git.infradead.org/qemu-nvme into staging (2022-07-15 15:38:13 +0100) are available in the Git repository at: g...@github.com:kostyanf14/qemu.git tags/qga-win32-pull-2022-07-18 for you to fetch changes up to e8cbe5842bad80bc2df692dea70134da0d13c556: MAINTAINERS: Add myself as Guest Agent co-maintainer (2022-07-18 11:56:09 +0300) qga-win32-pull-2022-07-18 Konstantin Kostiuk (1): MAINTAINERS: Add myself as Guest Agent co-maintainer MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- 2.25.1
[PULL 1/1] MAINTAINERS: Add myself as Guest Agent co-maintainer
Signed-off-by: Konstantin Kostiuk Acked-by: Michael Roth --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index ead2bed652..6af9cd985c 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2881,7 +2881,7 @@ T: git https://repo.or.cz/qemu/armbru.git qapi-next QEMU Guest Agent M: Michael Roth -R: Konstantin Kostiuk +M: Konstantin Kostiuk S: Maintained F: qga/ F: docs/interop/qemu-ga.rst -- 2.25.1
Re: [PATCH 0/3] target: RFC: display deprecation note for '-cpu help'
On Mon, Jul 18 2022, Thomas Huth wrote: > On 14/07/2022 17.07, Daniel P. Berrangé wrote: >> $ qemu-system-x86_64 -cpu help: >> Available CPUs: >> x86 486 (alias configured by machine type) (deprecated: >> use at least 'Nehalem' / 'Opteron_G4', or 'host' / 'max') >> >> I wonder if this is too verbose, and we should just do a >> concise flag like approach, similar to QMP: >> >> $ qemu-system-x86_64 -cpu help: >> Available CPUs: >> x86 486 (alias configured by machine type) (deprecated) >> >> leaving the full message to be displayed at runtime ? I'm slightly >> inclined to the simpler more concise output. > > I'd prefer to keep it short here and just write "deprecated" without the > reason. Otherwise this will overflow the lines and break the readability of > the output. And it's also what we're also doing for "-machine", e.g.: > > $ ./qemu-system-ppc64 -M help | grep deprecate > taihutaihu (deprecated) > $ ./qemu-system-ppc64 -M taihu > qemu-system-ppc64: warning: Machine type 'taihu' is deprecated: incomplete, > use 'ref405ep' instead Ok, following what -machine does is certainly a good point. Is it easy enough to figure out the deprecation note? I think you either have to actually start something with the deprecated entity, or use QMP (which is not that straightforward)?
Re: [PATCH 0/3] target: RFC: display deprecation note for '-cpu help'
On Mon, Jul 18, 2022 at 11:37:35AM +0200, Cornelia Huck wrote: > On Mon, Jul 18 2022, Thomas Huth wrote: > > > On 14/07/2022 17.07, Daniel P. Berrangé wrote: > >> $ qemu-system-x86_64 -cpu help: > >> Available CPUs: > >> x86 486 (alias configured by machine type) (deprecated: > >> use at least 'Nehalem' / 'Opteron_G4', or 'host' / 'max') > >> > >> I wonder if this is too verbose, and we should just do a > >> concise flag like approach, similar to QMP: > >> > >> $ qemu-system-x86_64 -cpu help: > >> Available CPUs: > >> x86 486 (alias configured by machine type) (deprecated) > >> > >> leaving the full message to be displayed at runtime ? I'm slightly > >> inclined to the simpler more concise output. > > > > I'd prefer to keep it short here and just write "deprecated" without the > > reason. Otherwise this will overflow the lines and break the readability of > > the output. And it's also what we're also doing for "-machine", e.g.: > > > > $ ./qemu-system-ppc64 -M help | grep deprecate > > taihutaihu (deprecated) > > $ ./qemu-system-ppc64 -M taihu > > qemu-system-ppc64: warning: Machine type 'taihu' is deprecated: incomplete, > > use 'ref405ep' instead > > Ok, following what -machine does is certainly a good point. Yes, I should have thought to check what -machine does, it makes sense to be consistent. > Is it easy enough the figure out the deprecation note? I think you > either have to actually start something with the deprecated entity, or > use qmp (which is not that straightforward)? QMP doesn't tell you the note, just a boolean deprecation flag. It is only printed on startup only right now. In the context of libvirt what happens is that libvirt can report that something is deprecated (based on the QMP response). If you go ahead and use it anyway, you'll get the deprecation message in the logfile for the VM, and the VM gets marked tainted by libvirt, which serves as a guide to look in the logfile.
With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH v5 5/8] blockdev: Add a new IF type IF_OTHER
Hao Wu writes: > This type is used to represent block devs that are not suitable to > be represented by other existing types. > > A sample use is to represent an at24c eeprom device defined in > hw/nvram/eeprom_at24c.c. The block device can be used to contain the > content of the said eeprom device. > > Signed-off-by: Hao Wu Let me add a bit more history in the hope of helping other reviewers. v3 of this series[1] used IF_NONE for the EEPROM device. Peter Maydell questioned that[2], and we concluded it's wrong. I wrote [A] board can use any traditional interface type. If none of them matches, and common decency prevents use of a non-matching one, invent a new one. We could do IF_OTHER. Thomas Huth cleaned up the existing abuse of IF_NONE to use IF_PFLASH instead, in commit 6b717a8d44: hw/misc/sifive_u_otp: Use IF_PFLASH for the OTP device instead of IF_NONE Configuring a drive with "if=none" is meant for creation of a backend only, it should not get automatically assigned to a device frontend. Use "if=pflash" for the One-Time-Programmable device instead (like it is e.g. also done for the efuse device in hw/arm/xlnx-zcu102.c). Since the old way of configuring the device has already been published with the previous QEMU versions, we cannot remove this immediately, but have to deprecate it and support it for at least two more releases. Signed-off-by: Thomas Huth Acked-by: Philippe Mathieu-Daudé Reviewed-by: Markus Armbruster Reviewed-by: Alistair Francis Message-id: 2029102549.217755-1-th...@redhat.com Signed-off-by: Alistair Francis An OTP device isn't really a parallel flash, and neither are eFuses. More fast-and-loose use of IF_PFLASH may exist in the tree, and maybe of other interface types, too. This patch introduces IF_OTHER. The patch after next uses it for an EEPROM device. Do we want IF_OTHER? If no, I guess we get to abuse IF_PFLASH some more. If yes, I guess we should use IF_PFLASH only for actual parallel flash memory going forward.
Cleaning up existing abuse of IF_PFLASH may not be worth the trouble, though. Thoughts? > --- > blockdev.c| 4 +++- > include/sysemu/blockdev.h | 1 + > 2 files changed, 4 insertions(+), 1 deletion(-) > > diff --git a/blockdev.c b/blockdev.c > index 9230888e34..befd69ac5f 100644 > --- a/blockdev.c > +++ b/blockdev.c > @@ -82,6 +82,7 @@ static const char *const if_name[IF_COUNT] = { > [IF_MTD] = "mtd", > [IF_SD] = "sd", > [IF_VIRTIO] = "virtio", > +[IF_OTHER] = "other", > [IF_XEN] = "xen", > }; > > @@ -726,7 +727,8 @@ QemuOptsList qemu_legacy_drive_opts = { > },{ > .name = "if", > .type = QEMU_OPT_STRING, > -.help = "interface (ide, scsi, sd, mtd, floppy, pflash, virtio)", > +.help = "interface (ide, scsi, sd, mtd, floppy, pflash, virtio," > +" other)", > },{ > .name = "file", > .type = QEMU_OPT_STRING, > diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h > index 3211b16513..d9dd5af291 100644 > --- a/include/sysemu/blockdev.h > +++ b/include/sysemu/blockdev.h > @@ -21,6 +21,7 @@ typedef enum { > */ > IF_NONE = 0, > IF_IDE, IF_SCSI, IF_FLOPPY, IF_PFLASH, IF_MTD, IF_SD, IF_VIRTIO, IF_XEN, > +IF_OTHER, > IF_COUNT > } BlockInterfaceType; [1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg00372.html [2] Subject: does drive_get_next(IF_NONE) make sense? Message-ID: https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg00710.html
Re: [PATCH v5 7/8] hw/arm: Set drive property for at24c eeprom
Hao Wu writes: > This patch allows the user to attach an external drive as a property > for an onboard at24c eeprom device. What are the contents of the EEPROM before the patch? I guess the patch lets users replace those contents. Why would a user want to do that? > It uses a unit number to > distinguish different devices. > > Signed-off-by: Hao Wu
Re: [PATCH v5 5/8] blockdev: Add a new IF type IF_OTHER
On 18/07/2022 11.49, Markus Armbruster wrote: [...] An OTP device isn't really a parallel flash, and neither are eFuses. More fast-and-loose use of IF_PFLASH may exist in the tree, and maybe of other interface types, too. This patch introduces IF_OTHER. The patch after next uses it for an EEPROM device. Do we want IF_OTHER? If no, I guess we get to abuse IF_PFLASH some more. If yes, I guess we should use IF_PFLASH only for actual parallel flash memory going forward. Cleaning up existing abuse of IF_PFLASH may not be worth the trouble, though. Thoughts? Maybe we should simply rename IF_PFLASH to IF_EPROM or IF_FLASH to make it clear that it is not only about parallel flashes anymore? Just my 0.02 €. Thomas
Re: [PATCH v2] gtk: Add show_tabs=on|off command line option.
On 12/07/2022 15.37, Felix xq Queißner wrote: The patch adds "show_tabs" command line option for GTK ui similar to "grab_on_hover". This option allows tabbed view mode to not have to be enabled by hand at each start of the VM. Nit: In case you have to respin again, please replace "show_tabs" with "show-tabs" and "grab_on_hover" with "grab-on-hover". Anyway: Reviewed-by: Thomas Huth
Re: [PATCH 0/3] target: RFC: display deprecation note for '-cpu help'
On Mon, Jul 18 2022, Daniel P. Berrangé wrote: > On Mon, Jul 18, 2022 at 11:37:35AM +0200, Cornelia Huck wrote: >> Is it easy enough to figure out the deprecation note? I think you >> either have to actually start something with the deprecated entity, or >> use qmp (which is not that straightforward)? > > QMP doesn't tell you the note, just a boolean deprecation flag. It is > only printed at startup right now. > > In the context of libvirt what happens is that libvirt can report that > something is deprecated (based on the QMP response). If you go ahead > and use it anyway, you'll get the deprecation message in the logfile > for the VM, and the VM gets marked tainted by libvirt, which serves > as a guide to look in the logfile. Hm... so, a user who notes via -help that 'foo' is deprecated does not really have a good way to figure out what they should use instead, other than actually trying to use 'foo'? Is that a use case worth spending some effort on, or do we consider it more of a niche case?
[PATCH] target/arm: Add MO_128 entry to pred_esz_masks[]
In commit 7390e0e9ab8475, we added support for SME loads and stores. Unlike SVE loads and stores, these include handling of 128-bit elements. The SME load/store functions call down into the existing sve_cont_ldst_elements() function, which uses the element size MO_* value as an index into the pred_esz_masks[] array. Because this code path now has to handle MO_128, we need to add an extra element to the array. This bug was spotted by Coverity because it meant we were reading off the end of the array. Resolves: Coverity CID 1490539, 1490541, 1490543, 1490544, 1490545, 1490546, 1490548, 1490549, 1490550, 1490551, 1490555, 1490557, 1490558, 1490560, 1490561, 1490563 Fixes: 7390e0e9ab8475 ("target/arm: Implement SME LD1, ST1") Signed-off-by: Peter Maydell --- target/arm/cpu.h | 2 +- target/arm/translate-sve.c | 5 +++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index 1e36a839ee4..3123488014d 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -3374,7 +3374,7 @@ static inline uint64_t *aa64_vfp_qreg(CPUARMState *env, unsigned regno) } /* Shared between translate-sve.c and sve_helper.c. */ -extern const uint64_t pred_esz_masks[4]; +extern const uint64_t pred_esz_masks[5]; /* Helper for the macros below, validating the argument type. */ static inline MemTxAttrs *typecheck_memtxattrs(MemTxAttrs *x) diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 41f8b12259e..621a2abb22f 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -529,9 +529,10 @@ static void do_predtest(DisasContext *s, int dofs, int gofs, int words) } /* For each element size, the bits within a predicate word that are active. */ -const uint64_t pred_esz_masks[4] = { +const uint64_t pred_esz_masks[5] = { 0xffffffffffffffffull, 0x5555555555555555ull, -0x1111111111111111ull, 0x0101010101010101ull +0x1111111111111111ull, 0x0101010101010101ull, +0x0001000100010001ull, }; static bool trans_INVALID(DisasContext *s, arg_INVALID *a) -- 2.25.1
Re: [PATCH 1/7] pci: designware: add 64-bit viewport limit
On 13/07/2022 17:54, Ben Dooks wrote: Versions 4 and above add support for 64-bit viewport limit. Add support for the DESIGNWARE_PCIE_ATU_UPPER_LIMIT register where supported. Signed-off-by: Ben Dooks Whoops, just noticed this was my old ct address. --- hw/pci-host/designware.c | 22 +- include/hw/pci-host/designware.h | 2 +- 2 files changed, 18 insertions(+), 6 deletions(-) diff --git a/hw/pci-host/designware.c b/hw/pci-host/designware.c index bde3a343a2..296f1b9760 100644 --- a/hw/pci-host/designware.c +++ b/hw/pci-host/designware.c @@ -54,6 +54,7 @@ #define DESIGNWARE_PCIE_ATU_BUS(x) (((x) >> 24) & 0xff) #define DESIGNWARE_PCIE_ATU_DEVFN(x) (((x) >> 16) & 0xff) #define DESIGNWARE_PCIE_ATU_UPPER_TARGET 0x91C +#define DESIGNWARE_PCIE_ATU_UPPER_LIMIT0x924 #define DESIGNWARE_PCIE_IRQ_MSI3 @@ -196,6 +197,10 @@ designware_pcie_root_config_read(PCIDevice *d, uint32_t address, int len) val = viewport->target >> 32; break; +case DESIGNWARE_PCIE_ATU_UPPER_LIMIT: +val = viewport->limit >> 32; +break; + case DESIGNWARE_PCIE_ATU_LIMIT: val = viewport->limit; break; @@ -269,7 +274,7 @@ static void designware_pcie_update_viewport(DesignwarePCIERoot *root, { const uint64_t target = viewport->target; const uint64_t base = viewport->base; -const uint64_t size = (uint64_t)viewport->limit - base + 1; +const uint64_t size = viewport->limit - base + 1; const bool enabled= viewport->cr[1] & DESIGNWARE_PCIE_ATU_ENABLE; MemoryRegion *current, *other; @@ -363,14 +368,21 @@ static void designware_pcie_root_config_write(PCIDevice *d, uint32_t address, viewport->target |= val; break; +case DESIGNWARE_PCIE_ATU_UPPER_LIMIT: +viewport->limit &= 0xUL; +viewport->limit |= (uint64_t)val << 32; +break; + case DESIGNWARE_PCIE_ATU_LIMIT: -viewport->limit = val; +viewport->limit = 0xULL; +viewport->limit |= val; break; case DESIGNWARE_PCIE_ATU_CR1: viewport->cr[0] = val; break; case DESIGNWARE_PCIE_ATU_CR2: +//printf("CR2: value %08x\n", val); viewport->cr[1] = val;
designware_pcie_update_viewport(root, viewport); break; @@ -429,7 +441,7 @@ static void designware_pcie_root_realize(PCIDevice *dev, Error **errp) viewport->inbound = true; viewport->base= 0xULL; viewport->target = 0xULL; -viewport->limit = UINT32_MAX; +viewport->limit = UINT64_MAX-1; viewport->cr[0] = DESIGNWARE_PCIE_ATU_TYPE_MEM; source = &host->pci.address_space_root; @@ -453,7 +465,7 @@ static void designware_pcie_root_realize(PCIDevice *dev, Error **errp) viewport->inbound = false; viewport->base= 0xULL; viewport->target = 0xULL; -viewport->limit = UINT32_MAX; +viewport->limit = UINT64_MAX-1; viewport->cr[0] = DESIGNWARE_PCIE_ATU_TYPE_MEM; destination = &host->pci.memory; @@ -560,7 +572,7 @@ static const VMStateDescription vmstate_designware_pcie_viewport = { .fields = (VMStateField[]) { VMSTATE_UINT64(base, DesignwarePCIEViewport), VMSTATE_UINT64(target, DesignwarePCIEViewport), -VMSTATE_UINT32(limit, DesignwarePCIEViewport), +VMSTATE_UINT64(limit, DesignwarePCIEViewport), VMSTATE_UINT32_ARRAY(cr, DesignwarePCIEViewport, 2), VMSTATE_END_OF_LIST() } diff --git a/include/hw/pci-host/designware.h b/include/hw/pci-host/designware.h index 6d9b51ae67..bd4dd49aec 100644 --- a/include/hw/pci-host/designware.h +++ b/include/hw/pci-host/designware.h @@ -44,7 +44,7 @@ typedef struct DesignwarePCIEViewport { uint64_t base; uint64_t target; -uint32_t limit; +uint64_t limit; uint32_t cr[2]; bool inbound;
Re: [PATCH] gpio: designware gpio driver
On 13/07/2022 18:20, Ben Dooks wrote: A model for the DesignWare GPIO (v1) block. Is there anyone else who should be reviewing these that was missed off the original list? I'd like to get an idea if there is any work to do. I've got a couple more drivers to submit and was waiting on feedback from this before getting these submitted. -- Ben
Re: [PATCH] gpio: designware gpio driver
On Mon, 18 Jul 2022 at 11:05, Ben Dooks wrote: > > On 13/07/2022 18:20, Ben Dooks wrote: > > A model for the DesignWare GPIO (v1) block. > > Is there anyone else who should be reviewing these that > was missed off the original list? I'd like to get an idea > if there is any work to do. I've got a couple more drivers > to submit and was waiting on feedback from this before > getting these submitted. My overall feedback is: this isn't a pluggable device (PCI, etc), so what's it intended to be used by? Generally we don't take device models except when there's a board model that's using them. thanks -- PMM
Re: [RFC PATCH 09/12] vdpa: Extract vhost_vdpa_net_svq_add from vhost_vdpa_net_handle_ctrl_avail
On Mon, Jul 18, 2022 at 10:53 AM Jason Wang wrote: > > > 在 2022/7/16 19:34, Eugenio Pérez 写道: > > So we can reuse to inject state messages. > > > > Signed-off-by: Eugenio Pérez > > --- > > net/vhost-vdpa.c | 89 +++- > > 1 file changed, 51 insertions(+), 38 deletions(-) > > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c > > index 211bd0468b..aaae51a778 100644 > > --- a/net/vhost-vdpa.c > > +++ b/net/vhost-vdpa.c > > @@ -334,6 +334,54 @@ static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState > > *s, > > return true; > > } > > > > +static virtio_net_ctrl_ack vhost_vdpa_net_svq_add(VhostShadowVirtqueue > > *svq, > > + const struct iovec > > *dev_buffers) > > > The name should be tweaked since it is used only for cvq. > Right, I'll change. > > > +{ > > +/* in buffer used for device model */ > > +virtio_net_ctrl_ack status; > > +const struct iovec in = { > > +.iov_base = &status, > > +.iov_len = sizeof(status), > > +}; > > +size_t dev_written; > > +int r; > > +void *unused = (void *)1; > > + > > +r = vhost_svq_add(svq, &dev_buffers[0], 1, &dev_buffers[1], 1, unused); > > +if (unlikely(r != 0)) { > > +if (unlikely(r == -ENOSPC)) { > > +qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device > > queue\n", > > + __func__); > > +} > > +return VIRTIO_NET_ERR; > > +} > > + > > +/* > > + * We can poll here since we've had BQL from the time we sent the > > + * descriptor. Also, we need to take the answer before SVQ pulls by > > itself, > > + * when BQL is released > > + */ > > > This reminds me that, do we need a upper limit of the time on the > polling here. (Avoid taking BQL for too long time). > Sending a new version of rx filters here. But we have other parts where we can have BQL forever because we trust the device, like vring enable syscalls for example. Thanks!
Re: [PATCH] gpio: designware gpio driver
On 18/07/2022 11:15, Peter Maydell wrote: On Mon, 18 Jul 2022 at 11:05, Ben Dooks wrote: On 13/07/2022 18:20, Ben Dooks wrote: A model for the DesignWare GPIO (v1) block. Is there anyone else who should be reviewing these that was missed off the original list? I'd like to get an idea if there is any work to do. I've got a couple more drivers to submit and was waiting on feedback from this before getting these submitted. My overall feedback is: this isn't a pluggable device (PCI, etc), so what's it intended to be used by? Generally we don't take device models except when there's a board model that's using them. I have a board file, but that's currently under NDA, so we're not allowed to release it at the moment. However we've done a few drivers which we'd like to get out of our development tree which other people might find useful (GPIO, SPI, I2C). -- Ben
[PATCH v4 00/19] vdpa net devices Rx filter change notification with Shadow VQ
The control virtqueue is used by the networking device to accept various commands from the driver. It's a must for supporting advanced configurations. An rx filtering event is issued by qemu when the device's MAC address has changed and the previous one has not been queried by external agents. Shadow VirtQueue (SVQ) already makes it possible to track the state of virtqueues, effectively intercepting them so qemu can track what regions of memory are dirty because of device action and need migration. However, this does not cover the networking device state seen by the driver, because of CVQ messages such as MAC address changes from the driver. This series uses the SVQ infrastructure to intercept the networking control messages used by the device. This way, qemu is able to update the VirtIONet device model and react to them. In particular, this series enables rx filter change notification. This is a prerequisite to achieving a net vdpa device with CVQ live migration. It's a stripped-down version of [1], with error paths checked and no migration enabled. The first nine patches reorder and clean up the code base so it is easier to apply the later ones. No functional change should be noticed from these changes. Patches 11 to 14 enable the SVQ API so that other parts of qemu can interact with it. In particular, they will be used by vhost-vdpa net to handle CVQ messages. Patches 15 to 17 enable the update of the virtio-net device model for each CVQ message acknowledged by the device. The last patches enable the x-svq parameter, forbidding device migration since the state is not restored in the destination's vdpa device yet. This will be added in a later series, using this work. Comments are welcome. v4: - Add a timeout to vhost_svq_poll to not hold BQL forever v3: - Replace SVQElement with SVQDescState v2: - (Comments from series [1]).
- Active poll for CVQ answer instead of relay on async used callback - Do not offer a new buffer to device but reuse qemu's - Use vhost_svq_add instead of not needed vhost_svq_inject - Delete used and detach callbacks, not needed anymore - Embed members of SVQElement in VirtQueueElement - Reuse the same buffers for all CVQ commands [1] https://patchwork.kernel.org/project/qemu-devel/cover/20220706184008.1649478-1-epere...@redhat.com/ Eugenio Pérez (19): vhost: move descriptor translation to vhost_svq_vring_write_descs virtio-net: Expose MAC_TABLE_ENTRIES virtio-net: Expose ctrl virtqueue logic vhost: Reorder vhost_svq_kick vhost: Move vhost_svq_kick call to vhost_svq_add vhost: Check for queue full at vhost_svq_add vhost: Decouple vhost_svq_add from VirtQueueElement vhost: Add SVQDescState vhost: Track number of descs in SVQDescState vhost: add vhost_svq_push_elem vhost: Expose vhost_svq_add vhost: add vhost_svq_poll vhost: Add svq avail_handler callback vdpa: Export vhost_vdpa_dma_map and unmap calls vdpa: manual forward CVQ buffers vdpa: Buffer CVQ support on shadow virtqueue vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs vdpa: Add device migration blocker vdpa: Add x-svq to NetdevVhostVDPAOptions qapi/net.json | 9 +- hw/virtio/vhost-shadow-virtqueue.h | 52 - include/hw/virtio/vhost-vdpa.h | 8 + include/hw/virtio/virtio-net.h | 7 + hw/net/virtio-net.c| 85 --- hw/virtio/vhost-shadow-virtqueue.c | 207 - hw/virtio/vhost-vdpa.c | 25 +- net/vhost-vdpa.c | 357 +++-- 8 files changed, 632 insertions(+), 118 deletions(-) -- 2.31.1
[PATCH v4 01/19] vhost: move descriptor translation to vhost_svq_vring_write_descs
It's done for both in and out descriptors so it's better placed here. Acked-by: Jason Wang Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.c | 38 +- 1 file changed, 27 insertions(+), 11 deletions(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index 56c96ebd13..e2184a4481 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -122,17 +122,35 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq, return true; } -static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg, -const struct iovec *iovec, size_t num, -bool more_descs, bool write) +/** + * Write descriptors to SVQ vring + * + * @svq: The shadow virtqueue + * @sg: Cache for hwaddr + * @iovec: The iovec from the guest + * @num: iovec length + * @more_descs: True if more descriptors come in the chain + * @write: True if they are writeable descriptors + * + * Return true if success, false otherwise and print error. + */ +static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg, +const struct iovec *iovec, size_t num, +bool more_descs, bool write) { uint16_t i = svq->free_head, last = svq->free_head; unsigned n; uint16_t flags = write ? 
cpu_to_le16(VRING_DESC_F_WRITE) : 0; vring_desc_t *descs = svq->vring.desc; +bool ok; if (num == 0) { -return; +return true; +} + +ok = vhost_svq_translate_addr(svq, sg, iovec, num); +if (unlikely(!ok)) { +return false; } for (n = 0; n < num; n++) { @@ -150,6 +168,7 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg, } svq->free_head = le16_to_cpu(svq->desc_next[last]); +return true; } static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, @@ -169,21 +188,18 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, return false; } -ok = vhost_svq_translate_addr(svq, sgs, elem->out_sg, elem->out_num); +ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num, + elem->in_num > 0, false); if (unlikely(!ok)) { return false; } -vhost_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num, -elem->in_num > 0, false); - -ok = vhost_svq_translate_addr(svq, sgs, elem->in_sg, elem->in_num); +ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false, + true); if (unlikely(!ok)) { return false; } -vhost_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false, true); - /* * Put the entry in the available array (but don't update avail->idx until * they do sync). -- 2.31.1
[PATCH v4 03/19] virtio-net: Expose ctrl virtqueue logic
This allows external vhost-net devices to modify the state of the VirtIO device model once the vhost-vdpa device has acknowledged the control commands. Signed-off-by: Eugenio Pérez --- include/hw/virtio/virtio-net.h | 4 ++ hw/net/virtio-net.c| 84 -- 2 files changed, 53 insertions(+), 35 deletions(-) diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h index cce1c554f7..ef234ffe7e 100644 --- a/include/hw/virtio/virtio-net.h +++ b/include/hw/virtio/virtio-net.h @@ -221,6 +221,10 @@ struct VirtIONet { struct EBPFRSSContext ebpf_rss; }; +size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev, + const struct iovec *in_sg, unsigned in_num, + const struct iovec *out_sg, + unsigned out_num); void virtio_net_set_netclient_name(VirtIONet *n, const char *name, const char *type); diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index f83e96e4ce..dd0d056fde 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -1433,57 +1433,71 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd, return VIRTIO_NET_OK; } -static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq) +size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev, + const struct iovec *in_sg, unsigned in_num, + const struct iovec *out_sg, + unsigned out_num) { VirtIONet *n = VIRTIO_NET(vdev); struct virtio_net_ctrl_hdr ctrl; virtio_net_ctrl_ack status = VIRTIO_NET_ERR; -VirtQueueElement *elem; size_t s; struct iovec *iov, *iov2; -unsigned int iov_cnt; + +if (iov_size(in_sg, in_num) < sizeof(status) || +iov_size(out_sg, out_num) < sizeof(ctrl)) { +virtio_error(vdev, "virtio-net ctrl missing headers"); +return 0; +} + +iov2 = iov = g_memdup2(out_sg, sizeof(struct iovec) * out_num); +s = iov_to_buf(iov, out_num, 0, &ctrl, sizeof(ctrl)); +iov_discard_front(&iov, &out_num, sizeof(ctrl)); +if (s != sizeof(ctrl)) { +status = VIRTIO_NET_ERR; +} else if (ctrl.class == VIRTIO_NET_CTRL_RX) { +status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, out_num); +} else if (ctrl.class == 
VIRTIO_NET_CTRL_MAC) { +status = virtio_net_handle_mac(n, ctrl.cmd, iov, out_num); +} else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) { +status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, out_num); +} else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) { +status = virtio_net_handle_announce(n, ctrl.cmd, iov, out_num); +} else if (ctrl.class == VIRTIO_NET_CTRL_MQ) { +status = virtio_net_handle_mq(n, ctrl.cmd, iov, out_num); +} else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) { +status = virtio_net_handle_offloads(n, ctrl.cmd, iov, out_num); +} + +s = iov_from_buf(in_sg, in_num, 0, &status, sizeof(status)); +assert(s == sizeof(status)); + +g_free(iov2); +return sizeof(status); +} + +static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq) +{ +VirtQueueElement *elem; for (;;) { +size_t written; elem = virtqueue_pop(vq, sizeof(VirtQueueElement)); if (!elem) { break; } -if (iov_size(elem->in_sg, elem->in_num) < sizeof(status) || -iov_size(elem->out_sg, elem->out_num) < sizeof(ctrl)) { -virtio_error(vdev, "virtio-net ctrl missing headers"); + +written = virtio_net_handle_ctrl_iov(vdev, elem->in_sg, elem->in_num, + elem->out_sg, elem->out_num); +if (written > 0) { +virtqueue_push(vq, elem, written); +virtio_notify(vdev, vq); +g_free(elem); +} else { virtqueue_detach_element(vq, elem, 0); g_free(elem); break; } - -iov_cnt = elem->out_num; -iov2 = iov = g_memdup2(elem->out_sg, - sizeof(struct iovec) * elem->out_num); -s = iov_to_buf(iov, iov_cnt, 0, &ctrl, sizeof(ctrl)); -iov_discard_front(&iov, &iov_cnt, sizeof(ctrl)); -if (s != sizeof(ctrl)) { -status = VIRTIO_NET_ERR; -} else if (ctrl.class == VIRTIO_NET_CTRL_RX) { -status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, iov_cnt); -} else if (ctrl.class == VIRTIO_NET_CTRL_MAC) { -status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt); -} else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) { -status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt); -} else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) { 
-status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt); -} else if (ctrl.class == VIRTIO_NET_
[PATCH v4 11/19] vhost: Expose vhost_svq_add
This allows external parts of SVQ to forward custom buffers to the device. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.h | 3 +++ hw/virtio/vhost-shadow-virtqueue.c | 6 +++--- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h index d9fc1f1799..dd78f4bec2 100644 --- a/hw/virtio/vhost-shadow-virtqueue.h +++ b/hw/virtio/vhost-shadow-virtqueue.h @@ -86,6 +86,9 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp); void vhost_svq_push_elem(VhostShadowVirtqueue *svq, const VirtQueueElement *elem, uint32_t len); +int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg, + size_t out_num, const struct iovec *in_sg, size_t in_num, + VirtQueueElement *elem); void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd); void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd); diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index b377e125e7..406a823c81 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -238,9 +238,9 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq) * * Return -EINVAL if element is invalid, -ENOSPC if dev queue is full */ -static int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg, - size_t out_num, const struct iovec *in_sg, - size_t in_num, VirtQueueElement *elem) +int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg, + size_t out_num, const struct iovec *in_sg, size_t in_num, + VirtQueueElement *elem) { unsigned qemu_head; unsigned ndescs = in_num + out_num; -- 2.31.1
[PATCH v4 02/19] virtio-net: Expose MAC_TABLE_ENTRIES
vhost-vdpa control virtqueue needs to know the maximum entries supported by the virtio-net device, so we know if it is possible to apply the filter. Signed-off-by: Eugenio Pérez --- include/hw/virtio/virtio-net.h | 3 +++ hw/net/virtio-net.c| 1 - 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h index eb87032627..cce1c554f7 100644 --- a/include/hw/virtio/virtio-net.h +++ b/include/hw/virtio/virtio-net.h @@ -35,6 +35,9 @@ OBJECT_DECLARE_SIMPLE_TYPE(VirtIONet, VIRTIO_NET) * and latency. */ #define TX_BURST 256 +/* Maximum VIRTIO_NET_CTRL_MAC_TABLE_SET unicast + multicast entries. */ +#define MAC_TABLE_ENTRIES64 + typedef struct virtio_net_conf { uint32_t txtimer; diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 7ad948ee7c..f83e96e4ce 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -49,7 +49,6 @@ #define VIRTIO_NET_VM_VERSION11 -#define MAC_TABLE_ENTRIES64 #define MAX_VLAN(1 << 12) /* Per 802.1Q definition */ /* previously fixed value */ -- 2.31.1
[PATCH v4 15/19] vdpa: manual forward CVQ buffers
Do a simple forwarding of CVQ buffers, the same work SVQ could do but through callbacks. No functional change intended. Signed-off-by: Eugenio Pérez --- include/hw/virtio/vhost-vdpa.h | 3 ++ hw/virtio/vhost-vdpa.c | 3 +- net/vhost-vdpa.c | 58 ++ 3 files changed, 63 insertions(+), 1 deletion(-) diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h index 7214eb47dc..d85643 100644 --- a/include/hw/virtio/vhost-vdpa.h +++ b/include/hw/virtio/vhost-vdpa.h @@ -15,6 +15,7 @@ #include #include "hw/virtio/vhost-iova-tree.h" +#include "hw/virtio/vhost-shadow-virtqueue.h" #include "hw/virtio/virtio.h" #include "standard-headers/linux/vhost_types.h" @@ -35,6 +36,8 @@ typedef struct vhost_vdpa { /* IOVA mapping used by the Shadow Virtqueue */ VhostIOVATree *iova_tree; GPtrArray *shadow_vqs; +const VhostShadowVirtqueueOps *shadow_vq_ops; +void *shadow_vq_ops_opaque; struct vhost_dev *dev; VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX]; } VhostVDPA; diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index 96997210be..beaaa7049a 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -419,7 +419,8 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v, for (unsigned n = 0; n < hdev->nvqs; ++n) { g_autoptr(VhostShadowVirtqueue) svq; -svq = vhost_svq_new(v->iova_tree, NULL, NULL); +svq = vhost_svq_new(v->iova_tree, v->shadow_vq_ops, +v->shadow_vq_ops_opaque); if (unlikely(!svq)) { error_setg(errp, "Cannot create svq %u", n); return -1; diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index df1e69ee72..2e3b6b10d8 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -11,11 +11,14 @@ #include "qemu/osdep.h" #include "clients.h" +#include "hw/virtio/virtio-net.h" #include "net/vhost_net.h" #include "net/vhost-vdpa.h" #include "hw/virtio/vhost-vdpa.h" #include "qemu/config-file.h" #include "qemu/error-report.h" +#include "qemu/log.h" +#include "qemu/memalign.h" #include "qemu/option.h" #include "qapi/error.h" 
#include @@ -187,6 +190,57 @@ static NetClientInfo net_vhost_vdpa_info = { .check_peer_type = vhost_vdpa_check_peer_type, }; +/** + * Forward buffer for the moment. + */ +static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq, +VirtQueueElement *elem, +void *opaque) +{ +unsigned int n = elem->out_num + elem->in_num; +g_autofree struct iovec *dev_buffers = g_new(struct iovec, n); +size_t in_len, dev_written; +virtio_net_ctrl_ack status = VIRTIO_NET_ERR; +int r; + +memcpy(dev_buffers, elem->out_sg, elem->out_num); +memcpy(dev_buffers + elem->out_num, elem->in_sg, elem->in_num); + +r = vhost_svq_add(svq, &dev_buffers[0], elem->out_num, &dev_buffers[1], + elem->in_num, elem); +if (unlikely(r != 0)) { +if (unlikely(r == -ENOSPC)) { +qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n", + __func__); +} +goto out; +} + +/* + * We can poll here since we've had BQL from the time we sent the + * descriptor. Also, we need to take the answer before SVQ pulls by itself, + * when BQL is released + */ +dev_written = vhost_svq_poll(svq); +if (unlikely(dev_written < sizeof(status))) { +error_report("Insufficient written data (%zu)", dev_written); +} + +out: +in_len = iov_from_buf(elem->in_sg, elem->in_num, 0, &status, + sizeof(status)); +if (unlikely(in_len < sizeof(status))) { +error_report("Bad device CVQ written length"); +} +vhost_svq_push_elem(svq, elem, MIN(in_len, sizeof(status))); +g_free(elem); +return r; +} + +static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = { +.avail_handler = vhost_vdpa_net_handle_ctrl_avail, +}; + static NetClientState *net_vhost_vdpa_init(NetClientState *peer, const char *device, const char *name, @@ -211,6 +265,10 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer, s->vhost_vdpa.device_fd = vdpa_device_fd; s->vhost_vdpa.index = queue_pair_index; +if (!is_datapath) { +s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops; +s->vhost_vdpa.shadow_vq_ops_opaque = s; +} ret = vhost_vdpa_add(nc, (void 
*)&s->vhost_vdpa, queue_pair_index, nvqs); if (ret) { qemu_del_net_client(nc); -- 2.31.1
[PATCH v4 04/19] vhost: Reorder vhost_svq_kick
Future code needs to call it from vhost_svq_add. No functional change intended. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.c | 28 ++-- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index e2184a4481..fd1839cec5 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -215,6 +215,20 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, return true; } +static void vhost_svq_kick(VhostShadowVirtqueue *svq) +{ +/* + * We need to expose the available array entries before checking the used + * flags + */ +smp_mb(); +if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) { +return; +} + +event_notifier_set(&svq->hdev_kick); +} + /** * Add an element to a SVQ. * @@ -235,20 +249,6 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem) return true; } -static void vhost_svq_kick(VhostShadowVirtqueue *svq) -{ -/* - * We need to expose the available array entries before checking the used - * flags - */ -smp_mb(); -if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) { -return; -} - -event_notifier_set(&svq->hdev_kick); -} - /** * Forward available buffers. * -- 2.31.1
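[The vhost_svq_kick() being reordered above implements the split-ring notification-suppression protocol: expose the avail entries, then a full barrier, then only kick if the device did not set VRING_USED_F_NO_NOTIFY. A standalone sketch of that logic follows; MiniVring and all names here are illustrative stand-ins, not QEMU types.]

```c
#include <assert.h>
#include <stdint.h>

#define MINI_USED_F_NO_NOTIFY 1

typedef struct MiniVring {
    uint16_t used_flags; /* stands in for svq->vring.used->flags */
    int kicks;           /* stands in for event_notifier_set(&svq->hdev_kick) */
} MiniVring;

static void mini_svq_kick(MiniVring *v)
{
    /*
     * We need to expose the available array entries before checking the
     * used flags; smp_mb() in QEMU proper.
     */
    __sync_synchronize();
    if (v->used_flags & MINI_USED_F_NO_NOTIFY) {
        /* Device is polling; skip the expensive eventfd write. */
        return;
    }
    v->kicks++;
}
```

The barrier pairs with the device reading the avail ring after clearing or setting the flag; without it, the flag check could be reordered before the avail-ring store.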
[PATCH v4 06/19] vhost: Check for queue full at vhost_svq_add
The series needs to expose vhost_svq_add with full functionality, including the check for a full queue. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.c | 59 +- 1 file changed, 33 insertions(+), 26 deletions(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index e5a4a62daa..aee9891a67 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -233,21 +233,29 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq) * Add an element to a SVQ. * * The caller must check that there is enough slots for the new element. It - * takes ownership of the element: In case of failure, it is free and the SVQ - * is considered broken. + * takes ownership of the element: In case of failure not ENOSPC, it is free. + * + * Return -EINVAL if element is invalid, -ENOSPC if dev queue is full */ -static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem) +static int vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem) { unsigned qemu_head; -bool ok = vhost_svq_add_split(svq, elem, &qemu_head); +unsigned ndescs = elem->in_num + elem->out_num; +bool ok; + +if (unlikely(ndescs > vhost_svq_available_slots(svq))) { +return -ENOSPC; +} + +ok = vhost_svq_add_split(svq, elem, &qemu_head); if (unlikely(!ok)) { g_free(elem); -return false; +return -EINVAL; } svq->ring_id_maps[qemu_head] = elem; vhost_svq_kick(svq); -return true; +return 0; } /** @@ -274,7 +282,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq) while (true) { VirtQueueElement *elem; -bool ok; +int r; if (svq->next_guest_avail_elem) { elem = g_steal_pointer(&svq->next_guest_avail_elem); @@ -286,25 +294,24 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq) break; } -if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) { -/* - * This condition is possible since a contiguous buffer in GPA - * does not imply a contiguous buffer in qemu's VA - * scatter-gather segments. 
If that happens, the buffer exposed - * to the device needs to be a chain of descriptors at this - * moment. - * - * SVQ cannot hold more available buffers if we are here: - * queue the current guest descriptor and ignore further kicks - * until some elements are used. - */ -svq->next_guest_avail_elem = elem; -return; -} - -ok = vhost_svq_add(svq, elem); -if (unlikely(!ok)) { -/* VQ is broken, just return and ignore any other kicks */ +r = vhost_svq_add(svq, elem); +if (unlikely(r != 0)) { +if (r == -ENOSPC) { +/* + * This condition is possible since a contiguous buffer in + * GPA does not imply a contiguous buffer in qemu's VA + * scatter-gather segments. If that happens, the buffer + * exposed to the device needs to be a chain of descriptors + * at this moment. + * + * SVQ cannot hold more available buffers if we are here: + * queue the current guest descriptor and ignore kicks + * until some elements are used. + */ +svq->next_guest_avail_elem = elem; +} + +/* VQ is full or broken, just return and ignore kicks */ return; } } -- 2.31.1
[PATCH v4 08/19] vhost: Add SVQDescState
This will allow SVQ to add context to the different queue elements. This patch only stores the actual element; no functional change intended. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.h | 8 ++-- hw/virtio/vhost-shadow-virtqueue.c | 16 2 files changed, 14 insertions(+), 10 deletions(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h index c132c994e9..d646c35054 100644 --- a/hw/virtio/vhost-shadow-virtqueue.h +++ b/hw/virtio/vhost-shadow-virtqueue.h @@ -15,6 +15,10 @@ #include "standard-headers/linux/vhost_types.h" #include "hw/virtio/vhost-iova-tree.h" +typedef struct SVQDescState { +VirtQueueElement *elem; +} SVQDescState; + /* Shadow virtqueue to relay notifications */ typedef struct VhostShadowVirtqueue { /* Shadow vring */ @@ -47,8 +51,8 @@ typedef struct VhostShadowVirtqueue { /* IOVA mapping */ VhostIOVATree *iova_tree; -/* Map for use the guest's descriptors */ -VirtQueueElement **ring_id_maps; +/* SVQ vring descriptors state */ +SVQDescState *desc_state; /* Next VirtQueue element that guest made available */ VirtQueueElement *next_guest_avail_elem; diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index b005a457c6..d12f5afffb 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -256,7 +256,7 @@ static int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg, return -EINVAL; } -svq->ring_id_maps[qemu_head] = elem; +svq->desc_state[qemu_head].elem = elem; vhost_svq_kick(svq); return 0; } @@ -410,21 +410,21 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, return NULL; } -if (unlikely(!svq->ring_id_maps[used_elem.id])) { +if (unlikely(!svq->desc_state[used_elem.id].elem)) { qemu_log_mask(LOG_GUEST_ERROR, "Device %s says index %u is used, but it was not available", svq->vdev->name, used_elem.id); return NULL; } -num = svq->ring_id_maps[used_elem.id]->in_num + - 
svq->ring_id_maps[used_elem.id]->out_num; +num = svq->desc_state[used_elem.id].elem->in_num + + svq->desc_state[used_elem.id].elem->out_num; last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id); svq->desc_next[last_used_chain] = svq->free_head; svq->free_head = used_elem.id; *len = used_elem.len; -return g_steal_pointer(&svq->ring_id_maps[used_elem.id]); +return g_steal_pointer(&svq->desc_state[used_elem.id].elem); } static void vhost_svq_flush(VhostShadowVirtqueue *svq, @@ -594,7 +594,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev, memset(svq->vring.desc, 0, driver_size); svq->vring.used = qemu_memalign(qemu_real_host_page_size(), device_size); memset(svq->vring.used, 0, device_size); -svq->ring_id_maps = g_new0(VirtQueueElement *, svq->vring.num); +svq->desc_state = g_new0(SVQDescState, svq->vring.num); svq->desc_next = g_new0(uint16_t, svq->vring.num); for (unsigned i = 0; i < svq->vring.num - 1; i++) { svq->desc_next[i] = cpu_to_le16(i + 1); @@ -619,7 +619,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq) for (unsigned i = 0; i < svq->vring.num; ++i) { g_autofree VirtQueueElement *elem = NULL; -elem = g_steal_pointer(&svq->ring_id_maps[i]); +elem = g_steal_pointer(&svq->desc_state[i].elem); if (elem) { virtqueue_detach_element(svq->vq, elem, 0); } @@ -631,7 +631,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq) } svq->vq = NULL; g_free(svq->desc_next); -g_free(svq->ring_id_maps); +g_free(svq->desc_state); qemu_vfree(svq->vring.desc); qemu_vfree(svq->vring.used); } -- 2.31.1
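[The vhost_svq_get_buf() hunk above relies on the descriptor free list being threaded through desc_next[]: the last descriptor of a used chain is linked to the old free_head, and the chain head becomes the new free_head. A self-contained sketch of that bookkeeping (array size and names are illustrative):]

```c
#include <assert.h>
#include <stdint.h>

#define MINI_VRING_NUM 8

static uint16_t desc_next[MINI_VRING_NUM];
static uint16_t free_head;

/* Walk num - 1 links from id, as vhost_svq_last_desc_of_chain() does. */
static uint16_t mini_last_desc_of_chain(uint16_t num, uint16_t id)
{
    while (--num) {
        id = desc_next[id];
    }
    return id;
}

/* Prepend a used chain of num descriptors starting at id to the free list. */
static void mini_return_chain(uint16_t num, uint16_t id)
{
    uint16_t last = mini_last_desc_of_chain(num, id);
    desc_next[last] = free_head;
    free_head = id;
}
```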
[PATCH v4 12/19] vhost: add vhost_svq_poll
It allows the Shadow Control VirtQueue to wait for the device to use the available buffers. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.h | 1 + hw/virtio/vhost-shadow-virtqueue.c | 27 +++ 2 files changed, 28 insertions(+) diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h index dd78f4bec2..cf442f7dea 100644 --- a/hw/virtio/vhost-shadow-virtqueue.h +++ b/hw/virtio/vhost-shadow-virtqueue.h @@ -89,6 +89,7 @@ void vhost_svq_push_elem(VhostShadowVirtqueue *svq, int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg, size_t out_num, const struct iovec *in_sg, size_t in_num, VirtQueueElement *elem); +size_t vhost_svq_poll(VhostShadowVirtqueue *svq); void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd); void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd); diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index 406a823c81..3c26781cf7 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -484,6 +484,33 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq, } while (!vhost_svq_enable_notification(svq)); } +/** + * Poll the SVQ for one device used buffer. + * + * This function race with main event loop SVQ polling, so extra + * synchronization is needed. + * + * Return the length written by the device. + */ +size_t vhost_svq_poll(VhostShadowVirtqueue *svq) +{ +int64_t start_us = g_get_monotonic_time(); +do { +uint32_t len; +VirtQueueElement *elem = vhost_svq_get_buf(svq, &len); +if (elem) { +return len; +} + +if (unlikely(g_get_monotonic_time() - start_us > 10e6)) { +return 0; +} + +/* Make sure we read new used_idx */ +smp_rmb(); +} while (true); +} + /** * Forward used buffers. * -- 2.31.1
[PATCH v4 16/19] vdpa: Buffer CVQ support on shadow virtqueue
Introduce the control virtqueue support for vDPA shadow virtqueue. This is needed for advanced networking features like rx filtering. Virtio-net control VQ copies the descriptors to qemu's VA, so we avoid TOCTOU with the guest's or device's memory every time there is a device model change. Otherwise, the guest could change the memory content in the time between qemu and the device read it. To demonstrate command handling, VIRTIO_NET_F_CTRL_MACADDR is implemented. If the virtio-net driver changes MAC the virtio-net device model will be updated with the new one, and a rx filtering change event will be raised. More cvq commands could be added here straightforwardly but they have not been tested. Signed-off-by: Eugenio Pérez --- net/vhost-vdpa.c | 211 +-- 1 file changed, 204 insertions(+), 7 deletions(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 2e3b6b10d8..3915b148c4 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -33,6 +33,9 @@ typedef struct VhostVDPAState { NetClientState nc; struct vhost_vdpa vhost_vdpa; VHostNetState *vhost_net; + +/* Control commands shadow buffers */ +void *cvq_cmd_out_buffer, *cvq_cmd_in_buffer; bool started; } VhostVDPAState; @@ -131,6 +134,8 @@ static void vhost_vdpa_cleanup(NetClientState *nc) { VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc); +qemu_vfree(s->cvq_cmd_out_buffer); +qemu_vfree(s->cvq_cmd_in_buffer); if (s->vhost_net) { vhost_net_cleanup(s->vhost_net); g_free(s->vhost_net); @@ -190,24 +195,191 @@ static NetClientInfo net_vhost_vdpa_info = { .check_peer_type = vhost_vdpa_check_peer_type, }; +static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr) +{ +VhostIOVATree *tree = v->iova_tree; +DMAMap needle = { +/* + * No need to specify size or to look for more translations since + * this contiguous chunk was allocated by us. 
+ */ +.translated_addr = (hwaddr)(uintptr_t)addr, +}; +const DMAMap *map = vhost_iova_tree_find_iova(tree, &needle); +int r; + +if (unlikely(!map)) { +error_report("Cannot locate expected map"); +return; +} + +r = vhost_vdpa_dma_unmap(v, map->iova, map->size + 1); +if (unlikely(r != 0)) { +error_report("Device cannot unmap: %s(%d)", g_strerror(r), r); +} + +vhost_iova_tree_remove(tree, map); +} + +static size_t vhost_vdpa_net_cvq_cmd_len(void) +{ +/* + * MAC_TABLE_SET is the ctrl command that produces the longer out buffer. + * In buffer is always 1 byte, so it should fit here + */ +return sizeof(struct virtio_net_ctrl_hdr) + + 2 * sizeof(struct virtio_net_ctrl_mac) + + MAC_TABLE_ENTRIES * ETH_ALEN; +} + +static size_t vhost_vdpa_net_cvq_cmd_page_len(void) +{ +return ROUND_UP(vhost_vdpa_net_cvq_cmd_len(), qemu_real_host_page_size()); +} + +/** Copy and map a guest buffer. */ +static bool vhost_vdpa_cvq_map_buf(struct vhost_vdpa *v, + const struct iovec *out_data, + size_t out_num, size_t data_len, void *buf, + size_t *written, bool write) +{ +DMAMap map = {}; +int r; + +if (unlikely(!data_len)) { +qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid legnth of %s buffer\n", + __func__, write ? "in" : "out"); +return false; +} + +*written = iov_to_buf(out_data, out_num, 0, buf, data_len); +map.translated_addr = (hwaddr)(uintptr_t)buf; +map.size = vhost_vdpa_net_cvq_cmd_page_len() - 1; +map.perm = write ? IOMMU_RW : IOMMU_RO, +r = vhost_iova_tree_map_alloc(v->iova_tree, &map); +if (unlikely(r != IOVA_OK)) { +error_report("Cannot map injected element"); +return false; +} + +r = vhost_vdpa_dma_map(v, map.iova, vhost_vdpa_net_cvq_cmd_page_len(), buf, + !write); +if (unlikely(r < 0)) { +goto dma_map_err; +} + +return true; + +dma_map_err: +vhost_iova_tree_remove(v->iova_tree, &map); +return false; +} + /** - * Forward buffer for the moment. 
+ * Copy the guest element into a dedicated buffer suitable to be sent to NIC + * + * @iov: [0] is the out buffer, [1] is the in one + */ +static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s, +VirtQueueElement *elem, +struct iovec *iov) +{ +size_t in_copied; +bool ok; + +iov[0].iov_base = s->cvq_cmd_out_buffer; +ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, elem->out_sg, elem->out_num, +vhost_vdpa_net_cvq_cmd_len(), iov[0].iov_base, +&iov[0].iov_len, false); +if (unlikely(!ok)) { +return false; +} + +iov[1].iov_base = s->cvq_cmd_in_buffer; +ok = vhost_vdpa_cvq_map_buf(&s->
[PATCH v4 05/19] vhost: Move vhost_svq_kick call to vhost_svq_add
The series needs to expose vhost_svq_add with full functionality, including the kick. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index fd1839cec5..e5a4a62daa 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -246,6 +246,7 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem) } svq->ring_id_maps[qemu_head] = elem; +vhost_svq_kick(svq); return true; } @@ -306,7 +307,6 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq) /* VQ is broken, just return and ignore any other kicks */ return; } -vhost_svq_kick(svq); } virtio_queue_set_notification(svq->vq, true); -- 2.31.1
[PATCH v4 10/19] vhost: add vhost_svq_push_elem
This function allows external SVQ users to return guest's available buffers. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.h | 3 +++ hw/virtio/vhost-shadow-virtqueue.c | 16 2 files changed, 19 insertions(+) diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h index 5c7e7cbab6..d9fc1f1799 100644 --- a/hw/virtio/vhost-shadow-virtqueue.h +++ b/hw/virtio/vhost-shadow-virtqueue.h @@ -84,6 +84,9 @@ typedef struct VhostShadowVirtqueue { bool vhost_svq_valid_features(uint64_t features, Error **errp); +void vhost_svq_push_elem(VhostShadowVirtqueue *svq, + const VirtQueueElement *elem, uint32_t len); + void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd); void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd); void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq, diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index ae5bd6efa8..b377e125e7 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -427,6 +427,22 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, return g_steal_pointer(&svq->desc_state[used_elem.id].elem); } +/** + * Push an element to SVQ, returning it to the guest. + */ +void vhost_svq_push_elem(VhostShadowVirtqueue *svq, + const VirtQueueElement *elem, uint32_t len) +{ +virtqueue_push(svq->vq, elem, len); +if (svq->next_guest_avail_elem) { +/* + * Avail ring was full when vhost_svq_flush was called, so it's a + * good moment to make more descriptors available if possible. + */ +vhost_handle_guest_kick(svq); +} +} + static void vhost_svq_flush(VhostShadowVirtqueue *svq, bool check_for_avail_queue) { -- 2.31.1
[PATCH v4 18/19] vdpa: Add device migration blocker
Since the vhost-vdpa device is exposing _F_LOG, add a migration blocker if it uses CVQ. Signed-off-by: Eugenio Pérez --- include/hw/virtio/vhost-vdpa.h | 1 + hw/virtio/vhost-vdpa.c | 14 ++ 2 files changed, 15 insertions(+) diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h index d85643..d10a89303e 100644 --- a/include/hw/virtio/vhost-vdpa.h +++ b/include/hw/virtio/vhost-vdpa.h @@ -35,6 +35,7 @@ typedef struct vhost_vdpa { bool shadow_vqs_enabled; /* IOVA mapping used by the Shadow Virtqueue */ VhostIOVATree *iova_tree; +Error *migration_blocker; GPtrArray *shadow_vqs; const VhostShadowVirtqueueOps *shadow_vq_ops; void *shadow_vq_ops_opaque; diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index beaaa7049a..795ed5a049 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -20,6 +20,7 @@ #include "hw/virtio/vhost-shadow-virtqueue.h" #include "hw/virtio/vhost-vdpa.h" #include "exec/address-spaces.h" +#include "migration/blocker.h" #include "qemu/cutils.h" #include "qemu/main-loop.h" #include "cpu.h" @@ -1022,6 +1023,13 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev) return true; } +if (v->migration_blocker) { +int r = migrate_add_blocker(v->migration_blocker, &err); +if (unlikely(r < 0)) { +goto err_migration_blocker; +} +} + for (i = 0; i < v->shadow_vqs->len; ++i) { VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i); VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i); @@ -1064,6 +1072,9 @@ err: vhost_svq_stop(svq); } +err_migration_blocker: +error_reportf_err(err, "Cannot setup SVQ %u: ", i); + return false; } @@ -1083,6 +1094,9 @@ static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev) } } +if (v->migration_blocker) { +migrate_del_blocker(v->migration_blocker); +} return true; } -- 2.31.1
[PATCH v4 14/19] vdpa: Export vhost_vdpa_dma_map and unmap calls
Shadow CVQ will copy buffers on qemu VA, so we avoid TOCTOU attacks from the guest that could set a different state in qemu device model and vdpa device. To do so, it needs to be able to map these new buffers to the device. Signed-off-by: Eugenio Pérez Acked-by: Jason Wang --- include/hw/virtio/vhost-vdpa.h | 4 hw/virtio/vhost-vdpa.c | 7 +++ 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h index a29dbb3f53..7214eb47dc 100644 --- a/include/hw/virtio/vhost-vdpa.h +++ b/include/hw/virtio/vhost-vdpa.h @@ -39,4 +39,8 @@ typedef struct vhost_vdpa { VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX]; } VhostVDPA; +int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size, + void *vaddr, bool readonly); +int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size); + #endif diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index 0b13e98471..96997210be 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -71,8 +71,8 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section, return false; } -static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size, - void *vaddr, bool readonly) +int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size, + void *vaddr, bool readonly) { struct vhost_msg_v2 msg = {}; int fd = v->device_fd; @@ -97,8 +97,7 @@ static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size, return ret; } -static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, -hwaddr size) +int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size) { struct vhost_msg_v2 msg = {}; int fd = v->device_fd; -- 2.31.1
[PATCH v4 13/19] vhost: Add svq avail_handler callback
This allows external handlers to be aware of new buffers that the guest places in the virtqueue. When this callback is defined the ownership of the guest's virtqueue element is transferred to the callback. This means that if the user wants to forward the descriptor it needs to manually inject it. The callback is also free to process the command by itself and use the element with svq_push. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.h | 31 +- hw/virtio/vhost-shadow-virtqueue.c | 14 -- hw/virtio/vhost-vdpa.c | 3 ++- 3 files changed, 44 insertions(+), 4 deletions(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h index cf442f7dea..d04c34a589 100644 --- a/hw/virtio/vhost-shadow-virtqueue.h +++ b/hw/virtio/vhost-shadow-virtqueue.h @@ -25,6 +25,27 @@ typedef struct SVQDescState { unsigned int ndescs; } SVQDescState; +typedef struct VhostShadowVirtqueue VhostShadowVirtqueue; + +/** + * Callback to handle an avail buffer. + * + * @svq: Shadow virtqueue + * @elem: Element placed in the queue by the guest + * @vq_callback_opaque: Opaque + * + * Returns 0 if the vq is running as expected. + * + * Note that ownership of elem is transferred to the callback. 
+ */ +typedef int (*VirtQueueAvailCallback)(VhostShadowVirtqueue *svq, + VirtQueueElement *elem, + void *vq_callback_opaque); + +typedef struct VhostShadowVirtqueueOps { +VirtQueueAvailCallback avail_handler; +} VhostShadowVirtqueueOps; + /* Shadow virtqueue to relay notifications */ typedef struct VhostShadowVirtqueue { /* Shadow vring */ @@ -69,6 +90,12 @@ typedef struct VhostShadowVirtqueue { */ uint16_t *desc_next; +/* Caller callbacks */ +const VhostShadowVirtqueueOps *ops; + +/* Caller callbacks opaque */ +void *ops_opaque; + /* Next head to expose to the device */ uint16_t shadow_avail_idx; @@ -102,7 +129,9 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev, VirtQueue *vq); void vhost_svq_stop(VhostShadowVirtqueue *svq); -VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree); +VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree, +const VhostShadowVirtqueueOps *ops, +void *ops_opaque); void vhost_svq_free(gpointer vq); G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free); diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index 3c26781cf7..1646d3cb40 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -306,7 +306,11 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq) break; } -r = vhost_svq_add_element(svq, elem); +if (svq->ops) { +r = svq->ops->avail_handler(svq, elem, svq->ops_opaque); +} else { +r = vhost_svq_add_element(svq, elem); +} if (unlikely(r != 0)) { if (r == -ENOSPC) { /* @@ -684,12 +688,16 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq) * shadow methods and file descriptors. * * @iova_tree: Tree to perform descriptors translations + * @ops: SVQ owner callbacks + * @ops_opaque: ops opaque pointer * * Returns the new virtqueue or NULL. * * In case of error, reason is reported through error_report. 
*/ -VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree) +VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree, +const VhostShadowVirtqueueOps *ops, +void *ops_opaque) { g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1); int r; @@ -711,6 +719,8 @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree) event_notifier_init_fd(&svq->svq_kick, VHOST_FILE_UNBIND); event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call); svq->iova_tree = iova_tree; +svq->ops = ops; +svq->ops_opaque = ops_opaque; return g_steal_pointer(&svq); err_init_hdev_call: diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index 66f054a12c..0b13e98471 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -418,8 +418,9 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v, shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free); for (unsigned n = 0; n < hdev->nvqs; ++n) { -g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree); +g_autoptr(VhostShadowVirtqueue) svq; +svq = vhost_svq_new(v->iova_tree, NULL, NULL); if (unlikely(!svq)) { error_setg(errp, "Cannot create svq %u", n); return -1; -- 2.31.1
[PATCH v4 19/19] vdpa: Add x-svq to NetdevVhostVDPAOptions
Finally offering the possibility to enable SVQ from the command line. Signed-off-by: Eugenio Pérez Acked-by: Markus Armbruster --- qapi/net.json| 9 +- net/vhost-vdpa.c | 72 ++-- 2 files changed, 77 insertions(+), 4 deletions(-) diff --git a/qapi/net.json b/qapi/net.json index 9af11e9a3b..75ba2cb989 100644 --- a/qapi/net.json +++ b/qapi/net.json @@ -445,12 +445,19 @@ # @queues: number of queues to be created for multiqueue vhost-vdpa # (default: 1) # +# @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.1) +# (default: false) +# +# Features: +# @unstable: Member @x-svq is experimental. +# # Since: 5.1 ## { 'struct': 'NetdevVhostVDPAOptions', 'data': { '*vhostdev': 'str', -'*queues': 'int' } } +'*queues': 'int', +'*x-svq':{'type': 'bool', 'features' : [ 'unstable'] } } } ## # @NetdevVmnetHostOptions: diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 0afa60bb51..986e6414b4 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -75,6 +75,28 @@ const int vdpa_feature_bits[] = { VHOST_INVALID_FEATURE_BIT }; +/** Supported device specific feature bits with SVQ */ +static const uint64_t vdpa_svq_device_features = +BIT_ULL(VIRTIO_NET_F_CSUM) | +BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) | +BIT_ULL(VIRTIO_NET_F_MTU) | +BIT_ULL(VIRTIO_NET_F_MAC) | +BIT_ULL(VIRTIO_NET_F_GUEST_TSO4) | +BIT_ULL(VIRTIO_NET_F_GUEST_TSO6) | +BIT_ULL(VIRTIO_NET_F_GUEST_ECN) | +BIT_ULL(VIRTIO_NET_F_GUEST_UFO) | +BIT_ULL(VIRTIO_NET_F_HOST_TSO4) | +BIT_ULL(VIRTIO_NET_F_HOST_TSO6) | +BIT_ULL(VIRTIO_NET_F_HOST_ECN) | +BIT_ULL(VIRTIO_NET_F_HOST_UFO) | +BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) | +BIT_ULL(VIRTIO_NET_F_STATUS) | +BIT_ULL(VIRTIO_NET_F_CTRL_VQ) | +BIT_ULL(VIRTIO_F_ANY_LAYOUT) | +BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) | +BIT_ULL(VIRTIO_NET_F_RSC_EXT) | +BIT_ULL(VIRTIO_NET_F_STANDBY); + VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc) { VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc); @@ -133,9 +155,13 @@ err_init: static void vhost_vdpa_cleanup(NetClientState *nc) { 
VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc); +struct vhost_dev *dev = &s->vhost_net->dev; qemu_vfree(s->cvq_cmd_out_buffer); qemu_vfree(s->cvq_cmd_in_buffer); +if (dev->vq_index + dev->nvqs == dev->vq_index_end) { +g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete); +} if (s->vhost_net) { vhost_net_cleanup(s->vhost_net); g_free(s->vhost_net); @@ -437,7 +463,9 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer, int vdpa_device_fd, int queue_pair_index, int nvqs, - bool is_datapath) + bool is_datapath, + bool svq, + VhostIOVATree *iova_tree) { NetClientState *nc = NULL; VhostVDPAState *s; @@ -455,6 +483,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer, s->vhost_vdpa.device_fd = vdpa_device_fd; s->vhost_vdpa.index = queue_pair_index; +s->vhost_vdpa.shadow_vqs_enabled = svq; +s->vhost_vdpa.iova_tree = iova_tree; if (!is_datapath) { s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(), vhost_vdpa_net_cvq_cmd_page_len()); @@ -465,6 +495,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer, s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops; s->vhost_vdpa.shadow_vq_ops_opaque = s; +error_setg(&s->vhost_vdpa.migration_blocker, + "Migration disabled: vhost-vdpa uses CVQ."); } ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs); if (ret) { @@ -474,6 +506,14 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer, return nc; } +static int vhost_vdpa_get_iova_range(int fd, + struct vhost_vdpa_iova_range *iova_range) +{ +int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range); + +return ret < 0 ? 
-errno : 0; +} + static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp) { int ret = ioctl(fd, VHOST_GET_FEATURES, features); @@ -524,6 +564,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name, uint64_t features; int vdpa_device_fd; g_autofree NetClientState **ncs = NULL; +g_autoptr(VhostIOVATree) iova_tree = NULL; NetClientState *nc; int queue_pairs, r, i, has_cvq = 0; @@ -551,22 +592,45 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name, return queue_pairs; } +if (
[PATCH v4 07/19] vhost: Decouple vhost_svq_add from VirtQueueElement
VirtQueueElement comes from the guest, but we're heading SVQ to be able to modify the element presented to the device without the guest's knowledge. To do so, make SVQ accept sg buffers directly, instead of using VirtQueueElement. Add vhost_svq_add_element to maintain element convenience. Signed-off-by: Eugenio Pérez Acked-by: Jason Wang --- hw/virtio/vhost-shadow-virtqueue.c | 33 -- 1 file changed, 22 insertions(+), 11 deletions(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index aee9891a67..b005a457c6 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -172,30 +172,31 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg, } static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, -VirtQueueElement *elem, unsigned *head) +const struct iovec *out_sg, size_t out_num, +const struct iovec *in_sg, size_t in_num, +unsigned *head) { unsigned avail_idx; vring_avail_t *avail = svq->vring.avail; bool ok; -g_autofree hwaddr *sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num)); +g_autofree hwaddr *sgs = g_new(hwaddr, MAX(out_num, in_num)); *head = svq->free_head; /* We need some descriptors here */ -if (unlikely(!elem->out_num && !elem->in_num)) { +if (unlikely(!out_num && !in_num)) { qemu_log_mask(LOG_GUEST_ERROR, "Guest provided element with no descriptors"); return false; } -ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num, - elem->in_num > 0, false); +ok = vhost_svq_vring_write_descs(svq, sgs, out_sg, out_num, in_num > 0, + false); if (unlikely(!ok)) { return false; } -ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false, - true); +ok = vhost_svq_vring_write_descs(svq, sgs, in_sg, in_num, false, true); if (unlikely(!ok)) { return false; } @@ -237,17 +238,19 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq) * * Return -EINVAL if element is invalid, -ENOSPC if dev queue is full */ -static int 
vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem) +static int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg, + size_t out_num, const struct iovec *in_sg, + size_t in_num, VirtQueueElement *elem) { unsigned qemu_head; -unsigned ndescs = elem->in_num + elem->out_num; +unsigned ndescs = in_num + out_num; bool ok; if (unlikely(ndescs > vhost_svq_available_slots(svq))) { return -ENOSPC; } -ok = vhost_svq_add_split(svq, elem, &qemu_head); +ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num, &qemu_head); if (unlikely(!ok)) { g_free(elem); return -EINVAL; @@ -258,6 +261,14 @@ static int vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem) return 0; } +/* Convenience wrapper to add a guest's element to SVQ */ +static int vhost_svq_add_element(VhostShadowVirtqueue *svq, + VirtQueueElement *elem) +{ +return vhost_svq_add(svq, elem->out_sg, elem->out_num, elem->in_sg, + elem->in_num, elem); +} + /** * Forward available buffers. * @@ -294,7 +305,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq) break; } -r = vhost_svq_add(svq, elem); +r = vhost_svq_add_element(svq, elem); if (unlikely(r != 0)) { if (r == -ENOSPC) { /* -- 2.31.1
[PATCH v4 17/19] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
Knowing the device features is needed for CVQ SVQ, so SVQ knows whether it can handle all commands or not. Extract this from vhost_vdpa_get_max_queue_pairs so we can reuse it. Signed-off-by: Eugenio Pérez Acked-by: Jason Wang --- net/vhost-vdpa.c | 30 -- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 3915b148c4..0afa60bb51 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -474,20 +474,24 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer, return nc; } -static int vhost_vdpa_get_max_queue_pairs(int fd, int *has_cvq, Error **errp) +static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp) +{ +int ret = ioctl(fd, VHOST_GET_FEATURES, features); +if (unlikely(ret < 0)) { +error_setg_errno(errp, errno, + "Fail to query features from vhost-vDPA device"); +} +return ret; +} + +static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features, + int *has_cvq, Error **errp) { unsigned long config_size = offsetof(struct vhost_vdpa_config, buf); g_autofree struct vhost_vdpa_config *config = NULL; __virtio16 *max_queue_pairs; -uint64_t features; int ret; -ret = ioctl(fd, VHOST_GET_FEATURES, &features); -if (ret) { -error_setg(errp, "Fail to query features from vhost-vDPA device"); -return ret; -} - if (features & (1 << VIRTIO_NET_F_CTRL_VQ)) { *has_cvq = 1; } else { @@ -517,10 +521,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name, NetClientState *peer, Error **errp) { const NetdevVhostVDPAOptions *opts; +uint64_t features; int vdpa_device_fd; g_autofree NetClientState **ncs = NULL; NetClientState *nc; -int queue_pairs, i, has_cvq = 0; +int queue_pairs, r, i, has_cvq = 0; assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA); opts = &netdev->u.vhost_vdpa; @@ -534,7 +539,12 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name, return -errno; } -queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd,
errp); +if (unlikely(r < 0)) { +return r; +} + +queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd, features, &has_cvq, errp); if (queue_pairs < 0) { qemu_close(vdpa_device_fd); -- 2.31.1
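The shape of the refactor — pull the one-off feature query into its own helper and thread the cached value through to the former caller — looks roughly like this. The fake query function below is only a placeholder for the VHOST_GET_FEATURES ioctl:

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder for ioctl(fd, VHOST_GET_FEATURES, ...); not the real call. */
static int toy_query_features(int fd, uint64_t *features)
{
    if (fd < 0) {
        return -1;              /* query failed */
    }
    *features = 1ULL << 3;      /* pretend the "has CVQ" bit is set */
    return 0;
}

/* Extracted helper: single place that performs (and reports) the query. */
static int toy_get_features(int fd, uint64_t *features)
{
    return toy_query_features(fd, features);
}

/* Former caller now receives the features instead of querying again. */
static int toy_max_queue_pairs(uint64_t features, int *has_cvq)
{
    *has_cvq = (features & (1ULL << 3)) ? 1 : 0;
    return 1;                   /* pretend: one queue pair */
}
```

The benefit is the one the commit message states: the feature word is queried once, and any other consumer (here, CVQ SVQ setup) reuses the same value.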
Re: [PATCH] gpio: designware gpio driver
On Mon, 18 Jul 2022 at 11:25, Ben Dooks wrote: > > On 18/07/2022 11:15, Peter Maydell wrote: > > On Mon, 18 Jul 2022 at 11:05, Ben Dooks wrote: > >> > >> On 13/07/2022 18:20, Ben Dooks wrote: > >>> A model for the DesignWare GPIO (v1) block. > >> > >> Is there anyone else who should be reviewing these that > >> was missed off the original list? I'd like to get an idea > >> if there is any work to do. I've got a couple more drivers > >> to submit and was waiting on feedback from this before > >> getting these submitted. > > > > > > My overall feedback is: this isn't a pluggable device (PCI, etc), > > so what's it intended to be used by? Generally we don't take > > device models except when there's a board model that's using them. > > I have a board file, but that's currently under NDA, so we're not > allowed to release it at the moment. However we've done a few drivers > which we'd like to get out of our development tree which other people > might find useful (GPIO, SPI, I2C). As I say, we don't really accept those, because we have no way of testing them upstream unless they're used in a board file: they're dead code from our point of view. When you have a board model you're ready to submit you can send them in the same patchseries as the board model. thanks -- PMM
[PATCH v4 09/19] vhost: Track number of descs in SVQDescState
A guest's buffer that is contiguous in GPA may need multiple descriptors in qemu's VA, so SVQ should track its length separately. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.h | 6 ++ hw/virtio/vhost-shadow-virtqueue.c | 4 ++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h index d646c35054..5c7e7cbab6 100644 --- a/hw/virtio/vhost-shadow-virtqueue.h +++ b/hw/virtio/vhost-shadow-virtqueue.h @@ -17,6 +17,12 @@ typedef struct SVQDescState { VirtQueueElement *elem; + +/* + * Number of descriptors exposed to the device. May or may not match + * guest's + */ +unsigned int ndescs; } SVQDescState; /* Shadow virtqueue to relay notifications */ diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index d12f5afffb..ae5bd6efa8 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -257,6 +257,7 @@ static int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg, } svq->desc_state[qemu_head].elem = elem; +svq->desc_state[qemu_head].ndescs = ndescs; vhost_svq_kick(svq); return 0; } @@ -417,8 +418,7 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, return NULL; } -num = svq->desc_state[used_elem.id].elem->in_num + - svq->desc_state[used_elem.id].elem->out_num; +num = svq->desc_state[used_elem.id].ndescs; last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id); svq->desc_next[last_used_chain] = svq->free_head; svq->free_head = used_elem.id; -- 2.31.1
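The point of the new field is that the number of device-facing descriptors can exceed the element's own sg count, so the free path must use the recorded value rather than recounting. A minimal sketch with toy names (not QEMU's types):

```c
#include <assert.h>
#include <stddef.h>

/* Mirror of the patched SVQDescState idea: remember how many descriptors
 * the entry actually occupied on the device-facing ring. */
typedef struct ToyDescState {
    size_t elem_sg_count;  /* entries in the guest's element */
    unsigned int ndescs;   /* descriptors exposed to the device */
} ToyDescState;

static unsigned int toy_descs_to_free(const ToyDescState *s)
{
    /* Recounting the element's sg entries would under-free whenever one
     * guest buffer was split into several descriptors in qemu's VA. */
    return s->ndescs;
}
```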
Re: [RFC PATCH] qobject: Rewrite implementation of QDict for in-order traversal
Peter Maydell writes: > On Mon, 11 Jul 2022 at 12:09, Daniel P. Berrangé wrote: >> >> On Mon, Jul 11, 2022 at 11:32:35AM +0100, Peter Maydell wrote: >> > I'm pretty sure that nothing needs sparse array elements like >> > that. The major reason for the len-PROP field is an implementation >> > one: because there is currently no way for a QOM object to >> > say "call this method if somebody tries to set a non-existent >> > property", the way array properties work is that the 'set' >> > method for the len-PROP property is the place where we then >> > add the PROP[0], PROP[1], ... properties. >> >> Ahhh, I see what you mean. I totally missed this subtle detail. >> >> IIUC, there's essentially no such thing as array properties >> in QOM. 'prop[0]', 'prop[1]', 'prop[2]', etc are all simply >> scalar properties from QOM's, that just happen to follow a >> common naming scheme, but QOM doesn't care about that. > > Yeah. The exception is the bit Markus pointed out where somebody > has later added special case support for setting "prop[*]" to > mean "whatever the next index is". > >> > If we either had a "call this for any property set/get attempt >> > where there is no specific method set" or else had array >> > properties supported by the core QOM code, we could avoid >> > having to set len-PROP first. >> >> Techically arrays are already supported at the core QOM level, because >> you can use any QAPI type as a property. The authz/list.c object >> has a 'rules' property that is an array of QAuthzListRule objects: >> >> { 'struct': 'AuthZListProperties', >> 'data': { '*policy': 'QAuthZListPolicy', >> '*rules': ['QAuthZListRule'] } } >> >> At the time I wrote that, we couldn't express it on the CLI though, >> without using JSON syntax for -object. I don't think we've ever >> made it possible to use the opts_visitor with non-scalar properties >> though. The opts visitor is flat by design. Avoid it in new code, use qobject_input_visitor_new_keyval() instead. 
Typical use is via a wrapper such as qobject_input_visitor_new_str(). > Mmm, if we had started the array-property implementation starting > from a QAPI API (or even with the idea of setting properties from > command line arguments) and working from there we'd probably have ended > up with something different. The primary use case though has > never involved QAPI or the command line, it's just C code for setting > property values on devices created within QEMU. Trouble with QOM is that things meant for internal use bleed to the external interface so easily. We've been quite cavalier about that. Should we change our attitude?
Re: [RISU PATCH v4 15/29] Rearrange reginfo and memblock buffers
On Fri, 8 Jul 2022 at 17:06, Richard Henderson wrote: > > For send_register_info from master_sigill, do not keep a > reginfo buffer on the stack. At the moment, this struct > is quite large for aarch64. > > Put the two reginfo buffers into an array, for the benefit > of future dumping. For recv_and_compare_register_info, > index this array with constants, so it's a simple rename. > > Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell thanks -- PMM
Re: [RISU PATCH v4 14/29] Merge reginfo.c into risu.c
On Fri, 8 Jul 2022 at 17:02, Richard Henderson wrote: > > The distinction between the two is artificial. Following > patches will rearrange the functions involved to make it > easier for dumping of the trace file. > > Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell thanks -- PMM
Re: [RISU PATCH v4 16/29] Split out recv_register_info
On Fri, 8 Jul 2022 at 17:36, Richard Henderson wrote: > > We will want to share this code when dumping. > > Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell thanks -- PMM
Re: [RFC v4 0/9] Add support for zoned device
Sam Li writes: > Markus Armbruster 于2022年7月12日周二 13:47写道: >> >> Sam Li writes: >> >> > This patch series adds support for zoned device to virtio-blk emulation. >> > Zoned >> > Storage can support sequential writes, which reduces write amplification >> > in SSD, >> > leading to higher write throughput and increased capacity. >> >> Forgive me if this has already been discussed, or is explained deeper in >> the patch series... >> >> The commit message sounds like you're extending virtio-blk to optionally >> emulate zoned storage. Correct? > > Yes! The main purpose is to emulate zoned storage only for the zoned > device files. Right now, QEMU sees those as regular block devices. > >> PATCH 1 adds a new block block device driver 'zoned_host_device', and >> PATCH 9 exposes it in QAPI. This is for passing through a zoned host >> device, correct? > > Yes! It allows the guest os see zoned host device. It is still in > development. Maybe the implementations will change later. Your cover letter only mentions the virtio-blk part, not the pass-through part. Please correct that if you need to respin.
Re: [RISU PATCH v4 17/29] Add magic and size to the trace header
On Fri, 8 Jul 2022 at 17:08, Richard Henderson wrote: > > Sanity check that we're not getting out of sync with > the trace stream. This will be especially bad with > the change in size of the sve save data. > > Signed-off-by: Richard Henderson > --- Reviewed-by: Peter Maydell thanks -- PMM
Re: [PATCH v2 1/6] target/riscv: add check for supported privilege modes conbinations
On 2022/7/18 5:02 PM, Andrew Jones wrote: On Tue, Jul 12, 2022 at 02:32:31PM +0800, Weiwei Li wrote: - There are 3 suggested privilege modes conbinations listed in the spec: No need for '-' here. s/modes/mode/ s/conbinations/combinations/ (Same typos in $SUBJECT, also please capitalize 'add' in $SUBJECT.) When referencing the spec it's nice to point out the doc/version/section. 1) M, 2) M, U 3) M, S, U Signed-off-by: Weiwei Li Signed-off-by: Junqiang Wang Reviewed-by: Alistair Francis --- target/riscv/cpu.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c index db2b8e4d30..36c1b26fb3 100644 --- a/target/riscv/cpu.c +++ b/target/riscv/cpu.c @@ -726,6 +726,12 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp) return; } +if (cpu->cfg.ext_s && !cpu->cfg.ext_u) { +error_setg(errp, + "Setting S extension without U extension is illegal"); +return; +} + if (cpu->cfg.ext_f && !cpu->cfg.ext_icsr) { error_setg(errp, "F extension requires Zicsr"); return; -- 2.17.1 Besides the commit message issues Reviewed-by: Andrew Jones Thanks for your comments. I'll fix the issues in commit messages in the next version. Regards, Weiwei Li
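The patch follows the usual realize-time pattern of validating feature dependencies before the CPU comes up. A standalone sketch of the same rule (S implies U), with a toy flags struct standing in for RISCVCPUConfig:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the relevant RISCVCPUConfig bits. */
typedef struct ToyCfg {
    bool ext_s;   /* supervisor mode */
    bool ext_u;   /* user mode */
} ToyCfg;

/* Returns true if the privilege-mode combination is one the spec
 * suggests: M only, M+U, or M+S+U -- i.e. S without U is rejected. */
static bool toy_priv_modes_valid(const ToyCfg *cfg)
{
    if (cfg->ext_s && !cfg->ext_u) {
        return false;
    }
    return true;
}
```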
Re: [RISU PATCH v4 19/29] aarch64: Assume system support for SVE
On Fri, 8 Jul 2022 at 17:09, Richard Henderson wrote: > > SVE support is no longer new, assume it's present. > > Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell thanks -- PMM
Re: [RISU PATCH v4 18/29] Compute reginfo_size based on the reginfo
On Fri, 8 Jul 2022 at 17:35, Richard Henderson wrote: > > This will allow dumping of SVE frames without having > to know the SVE vector length beforehand. > > Signed-off-by: Richard Henderson > --- Reviewed-by: Peter Maydell thanks -- PMM
Re: [RISU PATCH v4 20/29] aarch64: Reorg sve reginfo to save space
On Fri, 8 Jul 2022 at 17:38, Richard Henderson wrote: > > Mirror the signal frame by storing all of the registers > as a lump. Use the signal macros to pull out the values. > > Signed-off-by: Richard Henderson > --- > risu_reginfo_aarch64.h | 45 ++- > risu_reginfo_aarch64.c | 171 - > 2 files changed, 108 insertions(+), 108 deletions(-) Reviewed-by: Peter Maydell thanks -- PMM
Re: [RISU PATCH v4 21/29] aarch64: Use arch_init to configure sve
On Fri, 8 Jul 2022 at 17:41, Richard Henderson wrote: > > Adjust some of the aarch64 code to look at the reginfo struct > instead of looking at test_sve, so that we do not need to pass > the --test-sve option in order to dump sve trace files. > > Diagnose EINVAL as either cpu or kernel does not support SVE. > > Signed-off-by: Richard Henderson > --- Reviewed-by: Peter Maydell thanks -- PMM
[PATCH v2 7/9] target/arm: Add PMSAv8r registers
From: Tobias Röhmel Signed-off-by: Tobias Röhmel --- target/arm/cpu.h| 10 +++ target/arm/helper.c | 171 2 files changed, 181 insertions(+) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index 86e06116a9..632d0d13c6 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -726,8 +726,18 @@ typedef struct CPUArchState { */ uint32_t *rbar[M_REG_NUM_BANKS]; uint32_t *rlar[M_REG_NUM_BANKS]; +uint32_t prbarn[255]; +uint32_t prlarn[255]; +uint32_t hprbarn[255]; +uint32_t hprlarn[255]; uint32_t mair0[M_REG_NUM_BANKS]; uint32_t mair1[M_REG_NUM_BANKS]; +uint32_t prbar; +uint32_t prlar; +uint32_t prselr; +uint32_t hprbar; +uint32_t hprlar; +uint32_t hprselr; } pmsav8; /* v8M SAU */ diff --git a/target/arm/helper.c b/target/arm/helper.c index 03bdc3d149..f9ed2bd5c3 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -7422,6 +7422,78 @@ static CPAccessResult access_joscr_jmcr(CPUARMState *env, return CP_ACCESS_OK; } +static void prbar_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ +env->pmsav8.prbarn[env->pmsav8.prselr] = value; +} + +static void prlar_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ +env->pmsav8.prlarn[env->pmsav8.prselr] = value; +} + +static uint64_t prbar_read(CPUARMState *env, const ARMCPRegInfo *ri) +{ +return env->pmsav8.prbarn[env->pmsav8.prselr]; +} + +static uint64_t prlar_read(CPUARMState *env, const ARMCPRegInfo *ri) +{ +return env->pmsav8.prlarn[env->pmsav8.prselr]; +} + +static void hprbar_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ +env->pmsav8.hprbarn[env->pmsav8.hprselr] = value; +} + +static void hprlar_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ +env->pmsav8.hprlarn[env->pmsav8.hprselr] = value; +} + +static void hprenr_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ +uint32_t n; +ARMCPU *cpu = env_archcpu(env); +for (n = 0; n < (int)cpu->pmsav7_dregion; ++n) { +if (value & (1 << n)) { +env->pmsav8.hprlarn[n] |= 0x1; +} 
else { +env->pmsav8.hprlarn[n] &= (~0x1); +} +} +} + +static uint64_t hprbar_read(CPUARMState *env, const ARMCPRegInfo *ri) +{ +return env->pmsav8.hprbarn[env->pmsav8.hprselr]; +} + +static uint64_t hprlar_read(CPUARMState *env, const ARMCPRegInfo *ri) +{ +return env->pmsav8.hprlarn[env->pmsav8.hprselr]; +} + +static uint64_t hprenr_read(CPUARMState *env, const ARMCPRegInfo *ri) +{ +uint32_t n; +uint32_t result = 0x0; +ARMCPU *cpu = env_archcpu(env); + +for (n = 0; n < (int)cpu->pmsav7_dregion; ++n) { +if (env->pmsav8.hprlarn[n] & 0x1) { +result |= (0x1 << n); +} +} +return result; +} + static const ARMCPRegInfo jazelle_regs[] = { { .name = "JIDR", .cp = 14, .crn = 0, .crm = 0, .opc1 = 7, .opc2 = 0, @@ -8235,6 +8307,46 @@ void register_cp_regs_for_features(ARMCPU *cpu) .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = cpu->pmsav7_dregion << 8 }; +/* PMSAv8-R registers*/ +ARMCPRegInfo id_pmsav8_r_reginfo[] = { +{ .name = "HMPUIR", + .cp = 15, .crn = 0, .crm = 0, .opc1 = 4, .opc2 = 4, + .access = PL2_R, .type = ARM_CP_CONST, + .resetvalue = cpu->pmsav7_dregion}, + /* PMSAv8-R registers */ +{ .name = "PRBAR", + .cp = 15, .opc1 = 0, .crn = 6, .crm = 3, .opc2 = 0, + .access = PL1_RW, .resetvalue = 0, + .readfn = prbar_read, .writefn = prbar_write, + .fieldoffset = offsetof(CPUARMState, pmsav8.prbar)}, +{ .name = "PRLAR", + .cp = 15, .opc1 = 0, .crn = 6, .crm = 3, .opc2 = 1, + .access = PL1_RW, .resetvalue = 0, + .readfn = prlar_read, .writefn = prlar_write, + .fieldoffset = offsetof(CPUARMState, pmsav8.prlar)}, +{ .name = "PRSELR", .resetvalue = 0, + .cp = 15, .opc1 = 0, .crn = 6, .crm = 2, .opc2 = 1, + .access = PL1_RW, .accessfn = access_tvm_trvm, + .fieldoffset = offsetof(CPUARMState, pmsav8.prselr)}, +{ .name = "HPRBAR", .resetvalue = 0, + .readfn = hprbar_read, .writefn = hprbar_write, + .cp = 15, .opc1 = 4, .crn = 6, .crm = 3, .opc2 = 0, + .access = PL2_RW, .resetvalue = 0, + .fieldoffset = offsetof(CPUARMState, pmsav8.hprbar)}, +{ .name = "HPRLAR", + .cp = 15, 
.opc1 = 4, .crn = 6, .crm = 3, .opc2 = 1, + .access = PL2_RW, .resetvalue = 0, + .readfn = hprlar_r
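The access pattern these registers implement — a select register (PRSELR/HPRSELR) that steers subsequent PRBAR/PRLAR accesses to one entry of a per-region array, plus an enable-bitmap view over the region limits — can be sketched on its own. The structures below are toy models; the region count and field layout are illustrative, not the Cortex-R52's:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NUM_REGIONS 16

/* Toy model of the banked MPU region registers. */
typedef struct ToyMpu {
    uint32_t prselr;                  /* region select */
    uint32_t prbarn[TOY_NUM_REGIONS]; /* per-region base */
    uint32_t prlarn[TOY_NUM_REGIONS]; /* per-region limit (bit 0 = enable) */
} ToyMpu;

/* PRBAR accesses are indirected through the currently selected region.
 * The sketch bounds the index; the real registers are defined only for
 * valid region numbers. */
static void toy_prbar_write(ToyMpu *m, uint32_t value)
{
    m->prbarn[m->prselr % TOY_NUM_REGIONS] = value;
}

static uint32_t toy_prbar_read(const ToyMpu *m)
{
    return m->prbarn[m->prselr % TOY_NUM_REGIONS];
}

/* HPRENR-style view: bit n mirrors the enable bit of region n's limit. */
static uint32_t toy_prenr_read(const ToyMpu *m)
{
    uint32_t result = 0;
    for (uint32_t n = 0; n < TOY_NUM_REGIONS; n++) {
        if (m->prlarn[n] & 0x1) {
            result |= 1u << n;
        }
    }
    return result;
}
```

This is the same indirection the patch's prbar_write/prbar_read helpers perform against env->pmsav8.prselr, and the same gather loop as its hprenr_read.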
[PATCH v2 3/9] target/arm: Make RVBAR available for all ARMv8 CPUs
From: Tobias Röhmel Signed-off-by: Tobias Röhmel --- target/arm/cpu.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/target/arm/cpu.c b/target/arm/cpu.c index 1b5d535788..9007768418 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -258,6 +258,10 @@ static void arm_cpu_reset(DeviceState *dev) env->cp15.cpacr_el1 = FIELD_DP64(env->cp15.cpacr_el1, CPACR, CP11, 3); #endif +if (arm_feature(env, ARM_FEATURE_V8)) { +env->cp15.rvbar = cpu->rvbar_prop; +env->regs[15] = cpu->rvbar_prop; +} } #if defined(CONFIG_USER_ONLY) @@ -1273,7 +1277,7 @@ void arm_cpu_post_init(Object *obj) qdev_property_add_static(DEVICE(obj), &arm_cpu_reset_hivecs_property); } -if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) { +if (arm_feature(&cpu->env, ARM_FEATURE_V8)) { object_property_add_uint64_ptr(obj, "rvbar", &cpu->rvbar_prop, OBJ_PROP_FLAG_READWRITE); -- 2.25.1
[PATCH v2 6/9] target/arm: Enable TTBCR_EAE for ARMv8-R AArch32
From: Tobias Röhmel ARMv8-R AArch32 CPUs behave as if TTBCR.EAE is always 1 even though they don't have the TTBCR register. See ARM Architecture Reference Manual Supplement - ARMv8, for the ARMv8-R AArch32 architecture profile Version:A.c section C1.2. Signed-off-by: Tobias Röhmel --- target/arm/debug_helper.c | 3 ++- target/arm/internals.h| 3 ++- target/arm/tlb_helper.c | 3 ++- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c index b18a6bd3a2..44b1e32974 100644 --- a/target/arm/debug_helper.c +++ b/target/arm/debug_helper.c @@ -434,7 +434,8 @@ static uint32_t arm_debug_exception_fsr(CPUARMState *env) using_lpae = true; } else { if (arm_feature(env, ARM_FEATURE_LPAE) && -(env->cp15.tcr_el[target_el].raw_tcr & TTBCR_EAE)) { +((env->cp15.tcr_el[target_el].raw_tcr & TTBCR_EAE) +|| arm_feature(env, ARM_FEATURE_V8_R))) { using_lpae = true; } } diff --git a/target/arm/internals.h b/target/arm/internals.h index b03049d920..e2a2b03d41 100644 --- a/target/arm/internals.h +++ b/target/arm/internals.h @@ -254,7 +254,8 @@ static inline bool extended_addresses_enabled(CPUARMState *env) { TCR *tcr = &env->cp15.tcr_el[arm_is_secure(env) ? 3 : 1]; return arm_el_is_aa64(env, 1) || - (arm_feature(env, ARM_FEATURE_LPAE) && (tcr->raw_tcr & TTBCR_EAE)); + (arm_feature(env, ARM_FEATURE_LPAE) && ((tcr->raw_tcr & TTBCR_EAE) + || arm_feature(env, ARM_FEATURE_V8_R))); } /* Update a QEMU watchpoint based on the information the guest has set in the diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c index 7d8a86b3c4..891326edb8 100644 --- a/target/arm/tlb_helper.c +++ b/target/arm/tlb_helper.c @@ -20,7 +20,8 @@ bool regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx) return true; } if (arm_feature(env, ARM_FEATURE_LPAE) -&& (regime_tcr(env, mmu_idx)->raw_tcr & TTBCR_EAE)) { +&& ((regime_tcr(env, mmu_idx)->raw_tcr & TTBCR_EAE) +|| arm_feature(env, ARM_FEATURE_V8_R))) { return true; } return false; -- 2.25.1
[PATCH v2 0/9] Add Cortex-R52
v2:

PATCH 1: I have left the flag in for now because there is a lot of use for it in the MPU translation code. The PMSAv8r differs in quite a few ways from the implementation in the Cortex-M. I think using !ARM_FEATURE_M in all of those cases might run into problems down the road when new things are added. But I'll gladly change that if those concerns are not valid.

PATCH 2: Patch was moved and I removed the ATCM... registers.

PATCH 3: The issue here is that the R52 has the MPUIR register which shares the encoding with one of the MIDR alias registers. It's now changed to not add that register for ARM_FEATURE_V8_R.

PATCH 4: Added RVBAR for all v8 CPUs instead of just ARMv8r.

PATCH 7: Instead of setting TTBCR_EAE to 1, change all functions that rely on that value and are relevant for Cortex-R52.

PATCH 10: SPSR_hyp changes removed.

PATCH 11: Removed the r52_machine. The issue with adding the Cortex-R52 to the virt board is that target_page.bits is expected to be 12 but is set to 10 for ARM_FEATURE_PMSA. Simply changing that or using virt-2.7 (which doesn't have that restriction) leads to problems with memory accesses. It might be best to model an existing board.

These patches add the ARM Cortex-R52. The biggest addition is an implementation of the ARMv8-R MPU. All information is taken from:
- ARM Cortex-R52 TRM revision r1p3
- ARM Architecture Reference Manual Supplement - ARMv8 for the ARMv8-R AArch32 architecture profile Version A.c

Functionality that is not implemented:
- Changing between single and double precision floats
- Some hypervisor related functionality (HCR.T(R)VM, HADFSR, ...)
Tobias Röhmel (9): target/arm: Add ARM_FEATURE_V8_R target/arm: Don't add all MIDR aliases for Cortex-R target/arm: Make RVBAR available for all ARMv8 CPUs target/arm: Make stage_2_format for cache attributes optional target/arm: Add ARMCacheAttrs to the signature of pmsav8_mpu_lookup target/arm: Enable TTBCR_EAE for ARMv8-R AArch32 target/arm: Add PMSAv8r registers target/arm: Add PMSAv8r functionality target/arm: Add ARM Cortex-R52 cpu target/arm/cpu.c | 6 +- target/arm/cpu.h | 11 +++ target/arm/cpu_tcg.c | 42 + target/arm/debug_helper.c | 3 +- target/arm/helper.c | 183 +++- target/arm/internals.h| 16 ++-- target/arm/m_helper.c | 3 +- target/arm/ptw.c | 191 -- target/arm/tlb_helper.c | 3 +- 9 files changed, 417 insertions(+), 41 deletions(-) -- 2.25.1
[PATCH v2 2/9] target/arm: Don't add all MIDR aliases for Cortex-R
From: Tobias Röhmel Cortex-R52 has the MPUIR register which has the same encoding as the MIDR alias with opc2=4. So we only add that alias when we are not realizing a Cortex-R. Signed-off-by: Tobias Röhmel --- target/arm/helper.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/target/arm/helper.c b/target/arm/helper.c index 6457e6301c..03bdc3d149 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -8189,9 +8189,6 @@ void register_cp_regs_for_features(ARMCPU *cpu) .fieldoffset = offsetof(CPUARMState, cp15.c0_cpuid), .readfn = midr_read }, /* crn = 0 op1 = 0 crm = 0 op2 = 4,7 : AArch32 aliases of MIDR */ -{ .name = "MIDR", .type = ARM_CP_ALIAS | ARM_CP_CONST, - .cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 4, - .access = PL1_R, .resetvalue = cpu->midr }, { .name = "MIDR", .type = ARM_CP_ALIAS | ARM_CP_CONST, .cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 7, .access = PL1_R, .resetvalue = cpu->midr }, @@ -8201,6 +8198,11 @@ void register_cp_regs_for_features(ARMCPU *cpu) .accessfn = access_aa64_tid1, .type = ARM_CP_CONST, .resetvalue = cpu->revidr }, }; +ARMCPRegInfo id_v8_midr_alias_cp_reginfo[] = { +{ .name = "MIDR", .type = ARM_CP_ALIAS | ARM_CP_CONST, + .cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 4, + .access = PL1_R, .resetvalue = cpu->midr }, +}; ARMCPRegInfo id_cp_reginfo[] = { /* These are common to v8 and pre-v8 */ { .name = "CTR", @@ -8264,8 +8266,12 @@ void register_cp_regs_for_features(ARMCPU *cpu) id_mpuir_reginfo.access = PL1_RW; id_tlbtr_reginfo.access = PL1_RW; } + if (arm_feature(env, ARM_FEATURE_V8)) { define_arm_cp_regs(cpu, id_v8_midr_cp_reginfo); +if (!arm_feature(env, ARM_FEATURE_V8_R)) { +define_arm_cp_regs(cpu, id_v8_midr_alias_cp_reginfo); +} } else { define_arm_cp_regs(cpu, id_pre_v8_midr_cp_reginfo); } -- 2.25.1
[PATCH v2 4/9] target/arm: Make stage_2_format for cache attributes optional
From: Tobias Röhmel The Cortex-R52 has a 2 stage MPU translation process but doesn't have the FEAT_S2FWB feature. This makes it necessary to allow for the old cache attribute combination. This is facilitated by changing the control path of combine_cacheattrs instead of failing if the second cache attributes struct is not in that format. Signed-off-by: Tobias Röhmel --- target/arm/ptw.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/target/arm/ptw.c b/target/arm/ptw.c index 4d97a24808..8b037c1f55 100644 --- a/target/arm/ptw.c +++ b/target/arm/ptw.c @@ -2108,7 +2108,11 @@ static uint8_t combined_attrs_nofwb(CPUARMState *env, { uint8_t s1lo, s2lo, s1hi, s2hi, s2_mair_attrs, ret_attrs; -s2_mair_attrs = convert_stage2_attrs(env, s2.attrs); +if (s2.is_s2_format) { +s2_mair_attrs = convert_stage2_attrs(env, s2.attrs); +} else { +s2_mair_attrs = s2.attrs; +} s1lo = extract32(s1.attrs, 0, 4); s2lo = extract32(s2_mair_attrs, 0, 4); @@ -2166,6 +2170,8 @@ static uint8_t force_cacheattr_nibble_wb(uint8_t attr) static uint8_t combined_attrs_fwb(CPUARMState *env, ARMCacheAttrs s1, ARMCacheAttrs s2) { +assert(s2.is_s2_format && !s1.is_s2_format); + switch (s2.attrs) { case 7: /* Use stage 1 attributes */ @@ -2215,7 +2221,6 @@ static ARMCacheAttrs combine_cacheattrs(CPUARMState *env, ARMCacheAttrs ret; bool tagged = false; -assert(s2.is_s2_format && !s1.is_s2_format); ret.is_s2_format = false; if (s1.attrs == 0xf0) { -- 2.25.1
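The control-path change amounts to branching on is_s2_format before converting, instead of asserting that the second set is always stage-2 encoded. A reduced sketch of that decision — the attribute encodings here are placeholders, not the real MAIR/stage-2 encodings:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct ToyCacheAttrs {
    uint8_t attrs;
    bool is_s2_format;
} ToyCacheAttrs;

/* Placeholder for convert_stage2_attrs(): the real code maps stage-2
 * encodings to MAIR-style ones; here we just tag the conversion. */
static uint8_t toy_convert_stage2(uint8_t attrs)
{
    return attrs | 0x80;
}

static uint8_t toy_combined_attrs_nofwb(ToyCacheAttrs s2)
{
    uint8_t s2_mair_attrs;

    if (s2.is_s2_format) {
        s2_mair_attrs = toy_convert_stage2(s2.attrs); /* old path */
    } else {
        s2_mair_attrs = s2.attrs; /* PMSAv8-R: already MAIR-style */
    }
    return s2_mair_attrs;
}
```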
[PATCH v2 9/9] target/arm: Add ARM Cortex-R52 cpu
From: Tobias Röhmel All constants are taken from the ARM Cortex-R52 Processor TRM Revision: r1p3 Signed-off-by: Tobias Röhmel --- target/arm/cpu_tcg.c | 42 ++ 1 file changed, 42 insertions(+) diff --git a/target/arm/cpu_tcg.c b/target/arm/cpu_tcg.c index b751a19c8a..e0f445dc91 100644 --- a/target/arm/cpu_tcg.c +++ b/target/arm/cpu_tcg.c @@ -843,6 +843,47 @@ static void cortex_r5_initfn(Object *obj) define_arm_cp_regs(cpu, cortexr5_cp_reginfo); } +static void cortex_r52_initfn(Object *obj) +{ +ARMCPU *cpu = ARM_CPU(obj); + +set_feature(&cpu->env, ARM_FEATURE_V8); +set_feature(&cpu->env, ARM_FEATURE_V8_R); +set_feature(&cpu->env, ARM_FEATURE_EL2); +set_feature(&cpu->env, ARM_FEATURE_PMSA); +set_feature(&cpu->env, ARM_FEATURE_NEON); +set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER); +cpu->midr = 0x411fd133; /* r1p3 */ +cpu->revidr = 0x; +cpu->reset_fpsid = 0x41034023; +cpu->isar.mvfr0 = 0x10110222; +cpu->isar.mvfr1 = 0x1211; +cpu->isar.mvfr2 = 0x0043; +cpu->ctr = 0x8144c004; +cpu->reset_sctlr = 0x30c50838; +cpu->isar.id_pfr0 = 0x0131; +cpu->isar.id_pfr1 = 0x10111001; +cpu->isar.id_dfr0 = 0x03010006; +cpu->id_afr0 = 0x; +cpu->isar.id_mmfr0 = 0x00211040; +cpu->isar.id_mmfr1 = 0x4000; +cpu->isar.id_mmfr2 = 0x0120; +cpu->isar.id_mmfr3 = 0xf0102211; +cpu->isar.id_mmfr4 = 0x0010; +cpu->isar.id_isar0 = 0x02101110; +cpu->isar.id_isar1 = 0x13112111; +cpu->isar.id_isar2 = 0x21232142; +cpu->isar.id_isar3 = 0x01112131; +cpu->isar.id_isar4 = 0x00010142; +cpu->isar.id_isar5 = 0x00010001; +cpu->isar.dbgdidr = 0x77168000; +cpu->clidr = (1 << 27) | (1 << 24) | 0x3; +cpu->ccsidr[0] = 0x700fe01a; /* 32KB L1 dcache */ +cpu->ccsidr[1] = 0x201fe00a; /* 32KB L1 icache */ + +cpu->pmsav7_dregion = 16; +} + static void cortex_r5f_initfn(Object *obj) { ARMCPU *cpu = ARM_CPU(obj); @@ -1149,6 +1190,7 @@ static const ARMCPUInfo arm_tcg_cpus[] = { .class_init = arm_v7m_class_init }, { .name = "cortex-r5", .initfn = cortex_r5_initfn }, { .name = "cortex-r5f", .initfn = cortex_r5f_initfn }, +{ .name 
= "cortex-r52", .initfn = cortex_r52_initfn }, { .name = "ti925t", .initfn = ti925t_initfn }, { .name = "sa1100", .initfn = sa1100_initfn }, { .name = "sa1110", .initfn = sa1110_initfn }, -- 2.25.1
[PATCH v2 1/9] target/arm: Add ARM_FEATURE_V8_R
From: Tobias Röhmel This flag is necessary to add features for the Cortex-R52. Signed-off-by: Tobias Röhmel --- target/arm/cpu.h | 1 + 1 file changed, 1 insertion(+) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index df677b2d5d..86e06116a9 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -2287,6 +2287,7 @@ enum arm_features { ARM_FEATURE_M_SECURITY, /* M profile Security Extension */ ARM_FEATURE_M_MAIN, /* M profile Main Extension */ ARM_FEATURE_V8_1M, /* M profile extras only in v8.1M and later */ +ARM_FEATURE_V8_R, }; static inline int arm_feature(CPUARMState *env, int feature) -- 2.25.1
Re: [PATCH v7 07/13] multifd: Make flags field thread local
* Juan Quintela (quint...@redhat.com) wrote: > Use of flags with respect to locking was incensistant. For the > sending side: > - it was set to 0 with mutex held on the multifd channel. > - MULTIFD_FLAG_SYNC was set with mutex held on the migration thread. > - Everything else was done without the mutex held on the multifd channel. > > On the reception side, it is not used on the migration thread, only on > the multifd channels threads. > > So we move it to the multifd channels thread only variables, and we > introduce a new bool sync_needed on the send side to pass that information. Does this need rework after Leo's recent changes? Dave > Signed-off-by: Juan Quintela > --- > migration/multifd.h | 10 ++ > migration/multifd.c | 23 +-- > 2 files changed, 19 insertions(+), 14 deletions(-) > > diff --git a/migration/multifd.h b/migration/multifd.h > index 8a45dda58c..af8ce8921d 100644 > --- a/migration/multifd.h > +++ b/migration/multifd.h > @@ -98,12 +98,12 @@ typedef struct { > bool running; > /* should this thread finish */ > bool quit; > -/* multifd flags for each packet */ > -uint32_t flags; > /* global number of generated multifd packets */ > uint64_t packet_num; > /* How many bytes have we sent on the last packet */ > uint64_t sent_bytes; > +/* Do we need to do an iteration sync */ > +bool sync_needed; > /* thread has work to do */ > int pending_job; > /* array of pages to sent. 
> @@ -117,6 +117,8 @@ typedef struct { > > /* pointer to the packet */ > MultiFDPacket_t *packet; > +/* multifd flags for each packet */ > +uint32_t flags; > /* size of the next packet that contains pages */ > uint32_t next_packet_size; > /* packets sent through this channel */ > @@ -163,8 +165,6 @@ typedef struct { > bool running; > /* should this thread finish */ > bool quit; > -/* multifd flags for each packet */ > -uint32_t flags; > /* global number of generated multifd packets */ > uint64_t packet_num; > > @@ -172,6 +172,8 @@ typedef struct { > > /* pointer to the packet */ > MultiFDPacket_t *packet; > +/* multifd flags for each packet */ > +uint32_t flags; > /* size of the next packet that contains pages */ > uint32_t next_packet_size; > /* packets sent through this channel */ > diff --git a/migration/multifd.c b/migration/multifd.c > index eef47c274f..69b9d7cf98 100644 > --- a/migration/multifd.c > +++ b/migration/multifd.c > @@ -602,7 +602,7 @@ int multifd_send_sync_main(QEMUFile *f) > } > > p->packet_num = multifd_send_state->packet_num++; > -p->flags |= MULTIFD_FLAG_SYNC; > +p->sync_needed = true; > p->pending_job++; > qemu_mutex_unlock(&p->mutex); > qemu_sem_post(&p->sem); > @@ -656,7 +656,11 @@ static void *multifd_send_thread(void *opaque) > > if (p->pending_job) { > uint64_t packet_num = p->packet_num; > -uint32_t flags = p->flags; > +p->flags = 0; > +if (p->sync_needed) { > +p->flags |= MULTIFD_FLAG_SYNC; > +p->sync_needed = false; > +} > p->normal_num = 0; > > if (use_zero_copy_send) { > @@ -678,14 +682,13 @@ static void *multifd_send_thread(void *opaque) > } > } > multifd_send_fill_packet(p); > -p->flags = 0; > p->num_packets++; > p->total_normal_pages += p->normal_num; > p->pages->num = 0; > p->pages->block = NULL; > qemu_mutex_unlock(&p->mutex); > > -trace_multifd_send(p->id, packet_num, p->normal_num, flags, > +trace_multifd_send(p->id, packet_num, p->normal_num, p->flags, > p->next_packet_size); > > if (use_zero_copy_send) { > @@ -713,7 +716,7 
@@ static void *multifd_send_thread(void *opaque) > p->pending_job--; > qemu_mutex_unlock(&p->mutex); > > -if (flags & MULTIFD_FLAG_SYNC) { > +if (p->flags & MULTIFD_FLAG_SYNC) { > qemu_sem_post(&p->sem_sync); > } > qemu_sem_post(&multifd_send_state->channels_ready); > @@ -1090,7 +1093,7 @@ static void *multifd_recv_thread(void *opaque) > rcu_register_thread(); > > while (true) { > -uint32_t flags; > +bool sync_needed = false; > > if (p->quit) { > break; > @@ -1112,11 +1115,11 @@ static void *multifd_recv_thread(void *opaque) > break; > } > > -flags = p->flags; > +trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->flags, > + p->next_packet_size); > +sync_needed = p->flags & MULTIFD_FLAG_SYNC; > /* recv methods don't know how to handle the SYNC flag *
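The change above replaces the per-channel flags snapshot with a per-packet latch: multifd_send_sync_main() only sets sync_needed, and the send thread converts it into the wire flag while holding the channel mutex, clearing the latch so exactly one packet carries the sync. A minimal sketch of that latch (the MULTIFD_FLAG_SYNC value matches migration/multifd.h; the helper name is illustrative, not QEMU's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MULTIFD_FLAG_SYNC (1 << 0)

/* Illustrative helper, not QEMU code: consume the sync request latched
 * by the migration thread and translate it into per-packet wire flags.
 * Exactly one packet picks up the SYNC flag per request. */
static uint32_t multifd_build_flags(bool *sync_needed)
{
    uint32_t flags = 0;

    if (*sync_needed) {
        flags |= MULTIFD_FLAG_SYNC;
        *sync_needed = false;   /* request consumed by this packet */
    }
    return flags;
}
```

The point of the refactor is that the latch is only read and cleared under the mutex, so the flags actually sent always match what the trace point later prints.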
[PATCH v2 5/9] target/arm: Add ARMCacheAttrs to the signature of pmsav8_mpu_lookup
From: Tobias Röhmel Add ARMCacheAttrs to the signature of pmsav8_mpu_lookup to prepare for the Cortex-R52 MPU which uses and combines cache attributes of different translation levels. Signed-off-by: Tobias Röhmel --- target/arm/internals.h | 13 +++-- target/arm/m_helper.c | 3 ++- target/arm/ptw.c | 11 +++ 3 files changed, 16 insertions(+), 11 deletions(-) diff --git a/target/arm/internals.h b/target/arm/internals.h index 6f94f3019d..b03049d920 100644 --- a/target/arm/internals.h +++ b/target/arm/internals.h @@ -1109,12 +1109,6 @@ void v8m_security_lookup(CPUARMState *env, uint32_t address, MMUAccessType access_type, ARMMMUIdx mmu_idx, V8M_SAttributes *sattrs); -bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address, - MMUAccessType access_type, ARMMMUIdx mmu_idx, - hwaddr *phys_ptr, MemTxAttrs *txattrs, - int *prot, bool *is_subpage, - ARMMMUFaultInfo *fi, uint32_t *mregion); - /* Cacheability and shareability attributes for a memory access */ typedef struct ARMCacheAttrs { /* @@ -1126,6 +1120,13 @@ typedef struct ARMCacheAttrs { bool is_s2_format:1; } ARMCacheAttrs; +bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address, + MMUAccessType access_type, ARMMMUIdx mmu_idx, + hwaddr *phys_ptr, MemTxAttrs *txattrs, + int *prot, bool *is_subpage, + ARMMMUFaultInfo *fi, uint32_t *mregion, + ARMCacheAttrs *cacheattrs); + bool get_phys_addr(CPUARMState *env, target_ulong address, MMUAccessType access_type, ARMMMUIdx mmu_idx, hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot, diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c index a740c3e160..44c80d733a 100644 --- a/target/arm/m_helper.c +++ b/target/arm/m_helper.c @@ -2829,10 +2829,11 @@ uint32_t HELPER(v7m_tt)(CPUARMState *env, uint32_t addr, uint32_t op) * inspecting the other MPU state. 
*/ if (arm_current_el(env) != 0 || alt) { +ARMCacheAttrs cacheattrs = {0}; /* We can ignore the return value as prot is always set */ pmsav8_mpu_lookup(env, addr, MMU_DATA_LOAD, mmu_idx, &phys_addr, &attrs, &prot, &is_subpage, - &fi, &mregion); + &fi, &mregion, &cacheattrs); if (mregion == -1) { mrvalid = false; mregion = 0; diff --git a/target/arm/ptw.c b/target/arm/ptw.c index 8b037c1f55..c4f5721012 100644 --- a/target/arm/ptw.c +++ b/target/arm/ptw.c @@ -1702,7 +1702,8 @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address, MMUAccessType access_type, ARMMMUIdx mmu_idx, hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot, bool *is_subpage, - ARMMMUFaultInfo *fi, uint32_t *mregion) + ARMMMUFaultInfo *fi, uint32_t *mregion, + ARMCacheAttrs *cacheattrs) { /* * Perform a PMSAv8 MPU lookup (without also doing the SAU check @@ -1968,7 +1969,7 @@ static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address, MMUAccessType access_type, ARMMMUIdx mmu_idx, hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot, target_ulong *page_size, - ARMMMUFaultInfo *fi) + ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs) { uint32_t secure = regime_is_secure(env, mmu_idx); V8M_SAttributes sattrs = {}; @@ -2036,7 +2037,8 @@ static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address, } ret = pmsav8_mpu_lookup(env, address, access_type, mmu_idx, phys_ptr, -txattrs, prot, &mpu_is_subpage, fi, NULL); +txattrs, prot, &mpu_is_subpage, fi, +NULL, cacheattrs); *page_size = sattrs.subpage || mpu_is_subpage ? 1 : TARGET_PAGE_SIZE; return ret; } @@ -2416,7 +2418,8 @@ bool get_phys_addr(CPUARMState *env, target_ulong address, if (arm_feature(env, ARM_FEATURE_V8)) { /* PMSAv8 */ ret = get_phys_addr_pmsav8(env, address, access_type, mmu_idx, - phys_ptr, attrs, prot, page_size, fi); + phys_ptr, attrs, prot, page_size, + fi, cacheattrs); } else if (arm_feature(env, ARM_FEATURE_V7)) { /* PMSAv7 */ ret = get_phys_addr_pmsav7(env, address, access_type, mmu_idx, -- 2.25.1
Re: [RISU PATCH v4 24/29] Add --fulldump and --diffdup options
On Fri, 8 Jul 2022 at 17:51, Richard Henderson wrote: > > These allow the inspection of the trace files. > > Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell thanks -- PMM
[PATCH v2 8/9] target/arm: Add PMSAv8r functionality
From: Tobias Röhmel Add PMSAv8r translation that is used by the ARM Cortex-R52. Signed-off-by: Tobias Röhmel --- target/arm/ptw.c | 171 +-- 1 file changed, 150 insertions(+), 21 deletions(-) diff --git a/target/arm/ptw.c b/target/arm/ptw.c index c4f5721012..c7e37c66d0 100644 --- a/target/arm/ptw.c +++ b/target/arm/ptw.c @@ -140,6 +140,9 @@ static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx) */ return true; } +} else if (arm_feature(env, ARM_FEATURE_V8_R)) { +return !(regime_sctlr(env, mmu_idx) & SCTLR_M) || +(!(regime_el(env, mmu_idx) == 2) && arm_hcr_el2_eff(env) & HCR_TGE); } hcr_el2 = arm_hcr_el2_eff(env); @@ -1504,6 +1507,8 @@ static bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx, if (arm_feature(env, ARM_FEATURE_M)) { return env->v7m.mpu_ctrl[regime_is_secure(env, mmu_idx)] & R_V7M_MPU_CTRL_PRIVDEFENA_MASK; +} else if (arm_feature(env, ARM_FEATURE_V8_R)) { +return false; } else { return regime_sctlr(env, mmu_idx) & SCTLR_BR; } @@ -1698,6 +1703,77 @@ static bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address, return !(*prot & (1 << access_type)); } +static uint32_t *regime_rbar(CPUARMState *env, ARMMMUIdx mmu_idx, + uint32_t secure) +{ +if (arm_feature(env, ARM_FEATURE_V8_R)) { +if (regime_el(env, mmu_idx) == 2) { +return env->pmsav8.hprbarn; +} else { +return env->pmsav8.prbarn; +} +} else { + return env->pmsav8.rbar[secure]; +} +} + +static uint32_t *regime_rlar(CPUARMState *env, ARMMMUIdx mmu_idx, + uint32_t secure) +{ +if (arm_feature(env, ARM_FEATURE_V8_R)) { +if (regime_el(env, mmu_idx) == 2) { +return env->pmsav8.hprlarn; +} else { +return env->pmsav8.prlarn; +} +} else { +return env->pmsav8.rlar[secure]; +} +} + +static inline void get_phys_addr_pmsav8_default(CPUARMState *env, +ARMMMUIdx mmu_idx, +uint32_t address, int *prot) +{ +if (arm_feature(env, ARM_FEATURE_V8_R)) { +*prot = PAGE_READ | PAGE_WRITE; +if (address <= 0x7FFF) { +*prot |= PAGE_EXEC; +} +if ((regime_el(env, mmu_idx) == 2) +&& 
(regime_sctlr(env, mmu_idx) & SCTLR_WXN) +&& (regime_sctlr(env, mmu_idx) & SCTLR_M)) { +*prot &= ~PAGE_EXEC; +} +} else { +get_phys_addr_pmsav7_default(env, mmu_idx, address, prot); +} +} + +static bool pmsav8_fault(bool hit, CPUARMState *env, ARMMMUIdx mmu_idx) +{ +if (arm_feature(env, ARM_FEATURE_V8_R)) { +if (regime_el(env, mmu_idx) == 2) { +if (!hit && (mmu_idx != ARMMMUIdx_E2)) { +return true; +} else if (!hit && (mmu_idx == ARMMMUIdx_E2) + &&!(regime_sctlr(env, mmu_idx) & SCTLR_BR)) { +return true; +} +} else { +if (!hit && (mmu_idx != ARMMMUIdx_Stage1_E1)) { +return true; +} else if (!hit && (mmu_idx == ARMMMUIdx_Stage1_E1) + &&!(regime_sctlr(env, mmu_idx) & SCTLR_BR)) { +return true; +} +} +return false; +} else { +return !hit; +} +} + bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address, MMUAccessType access_type, ARMMMUIdx mmu_idx, hwaddr *phys_ptr, MemTxAttrs *txattrs, @@ -1730,6 +1806,12 @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address, *mregion = -1; } +if (arm_feature(env, ARM_FEATURE_V8_R)) { +if (mmu_idx == ARMMMUIdx_Stage2) { +fi->stage2 = true; +} +} + /* * Unlike the ARM ARM pseudocode, we don't need to check whether this * was an exception vector read from the vector table (which is always @@ -1746,17 +1828,26 @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address, hit = true; } +uint32_t bitmask; +if (arm_feature(env, ARM_FEATURE_V8_R)) { +bitmask = 0x3f; +} else { +bitmask = 0x1f; +} + + for (n = (int)cpu->pmsav7_dregion - 1; n >= 0; n--) { /* region search */ /* - * Note that the base address is bits [31:5] from the register - * with bits [4:0] all zeroes, but the limit address is bits - * [31:5] from the register with bits [4:0] all ones. + * Note that the base address is bits [31:x] from the register + * with bits [x-1:0] all zeroes, but the limit address is bits + * [31:x] from the register with bits [
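The widened mask in the diff reflects the region granule: PMSAv8 bases/limits are 32-byte aligned (bits [4:0] implied), while PMSAv8r uses 64-byte alignment (bits [5:0] implied), hence 0x1f versus 0x3f. A hedged sketch of the base/limit decoding and hit test the comment describes (helper names are illustrative, not QEMU's):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative decoding of PRBAR/PRLAR-style registers: the base is the
 * register value with the low alignment bits forced to zero, the limit
 * is the value with them forced to one.  The mask is 0x3f on PMSAv8r
 * (64-byte granule) and 0x1f on PMSAv8 (32-byte granule). */
static uint32_t region_base(uint32_t rbar, uint32_t bitmask)
{
    return rbar & ~bitmask;
}

static uint32_t region_limit(uint32_t rlar, uint32_t bitmask)
{
    return rlar | bitmask;
}

static bool region_hit(uint32_t addr, uint32_t rbar, uint32_t rlar,
                       bool v8r)
{
    uint32_t bitmask = v8r ? 0x3f : 0x1f;

    return addr >= region_base(rbar, bitmask) &&
           addr <= region_limit(rlar, bitmask);
}
```

For example, a PMSAv8r region with rbar = 0x1000 and rlar = 0x1fc0 covers the inclusive range [0x1000, 0x1fff].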
Re: [RISU PATCH v4 27/29] aarch64: Tidy reginfo dumping ahead of ZA state
On Fri, 8 Jul 2022 at 17:14, Richard Henderson wrote: > > A misalignment for sve_vl, plus add a bit more space > on the left for the ZA[n] field name. > > Signed-off-by: Richard Henderson > --- > risu_reginfo_aarch64.c | 29 ++--- > 1 file changed, 18 insertions(+), 11 deletions(-) Reviewed-by: Peter Maydell thanks -- PMM
Re: [RISU PATCH v4 28/29] aarch64: Add support for ZA storage
On Fri, 8 Jul 2022 at 18:07, Richard Henderson wrote: > > Require NVL == SVL on startup, to make it easier to manage reginfo. > Most of the time PSTATE.SM would be active with PSTATE.ZA anyway, > for any non-trivial SME testing. > > Extend saved storage only when PSTATE.ZA is active. > Use a carefully reserved uint16_t for saving SVCR. > > Signed-off-by: Richard Henderson > --- Reviewed-by: Peter Maydell thanks -- PMM
Re: [RISU PATCH v4 23/29] Standardize reginfo_dump_mismatch printing
On Fri, 8 Jul 2022 at 17:39, Richard Henderson wrote: > > Hoist the "master vs apprentice" label to apprentice(), since > we will want different labels for dumping. Remove all of the > "mismatch" text from reginfo_dump_mismatch -- just print "vs". > > Signed-off-by: Richard Henderson > diff --git a/risu_reginfo_ppc64.c b/risu_reginfo_ppc64.c > index 9899b36..e96dc48 100644 > --- a/risu_reginfo_ppc64.c > +++ b/risu_reginfo_ppc64.c > @@ -27,6 +27,15 @@ > const struct option * const arch_long_opts; > const char * const arch_extra_help; > > +static const char * const greg_names[NGREG] = { > +"r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", > +"r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15", > + "r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23", > + "r24", "r25", "r26", "r27", "r28", "r29", "r30", "r31", > + [XER] = "xer", > + [CCR] = "ccr", > +}; > + > void process_arch_opt(int opt, const char *arg) > { > abort(); > @@ -147,35 +156,21 @@ int reginfo_dump(struct reginfo *ri, FILE * f) > return !ferror(f); > } > > -int reginfo_dump_mismatch(struct reginfo *m, struct reginfo *a, FILE *f) > +void reginfo_dump_mismatch(struct reginfo *m, struct reginfo *a, FILE *f) > { > int i; > -for (i = 0; i < 32; i++) { > -if (i == 1 || i == 13) { > -continue; > + > +for (i = 0; i < NGREG; i++) { > +if (greg_names[i] != NULL && m->gregs[i] != a->gregs[i]) { > +fprintf(f, "%6s: %016lx vs %016lx\n", > +greg_names[i], m->gregs[i], a->gregs[i]); > } This used to skip r1 and r13, and now it does not. > - > -if (m->gregs[i] != a->gregs[i]) { > -fprintf(f, "Mismatch: Register r%d\n", i); > -fprintf(f, "master: [%lx] - apprentice: [%lx]\n", > -m->gregs[i], a->gregs[i]); > -} > -} > - > -if (m->gregs[XER] != a->gregs[XER]) { > -fprintf(f, "Mismatch: XER\n"); > -fprintf(f, "m: [%lx] != a: [%lx]\n", m->gregs[XER], a->gregs[XER]); > -} > - > -if (m->gregs[CCR] != a->gregs[CCR]) { > -fprintf(f, "Mismatch: Cond. 
Register\n"); > -fprintf(f, "m: [%lx] != a: [%lx]\n", m->gregs[CCR], a->gregs[CCR]); > } > > for (i = 0; i < 32; i++) { > if (m->fpregs[i] != a->fpregs[i]) { > -fprintf(f, "Mismatch: Register f%d\n", i); > -fprintf(f, "m: [%016lx] != a: [%016lx]\n", > +fprintf(f, "%*s%d: %016lx vs %016lx\n", > +6 - (i < 10 ? 1 : 2), "f", i, > m->fpregs[i], a->fpregs[i]); > } > } thanks -- PMM
Re: [RISU PATCH v4 25/29] Remove return value from reginfo_dump
On Fri, 8 Jul 2022 at 18:05, Richard Henderson wrote: > > No uses actually checked the error indication. Even if we wanted > to check ferror on the stream, we should do that generically rather > than per arch. > > Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell thanks -- PMM
Re: [PATCH] block/rbd: support driver-specific reopen
On 7/1/22 11:41, Hanna Reitz wrote: On 13.04.22 14:26, Raphael Pour wrote: >> } - return ret; + /* + * Remove all keys from the generic layer which + * can't be converted by rbd + * > Does any other driver do this? Shouldn’t we leave them there so that the generic layer can verify that they aren’t changed? + qdict_del(state->options, "driver"); + qdict_del(state->options, "node-name"); + qdict_del(state->options, "auto-read-only"); + qdict_del(state->options, "discard"); + qdict_del(state->options, "cache"); Because AFAIU this would mean that users could attempt to change e.g. the @cache option, and wouldn’t receive an error back, even though there is no support for changing it. + + /* + * To maintain the compatibility prior the rbd-reopen, + * where the generic layer can be altered without any + * rbd argument given, What does “the generic layer can be altered” mean? As far as I understand, it was only possible to change between read/write and read-only access. + + keypairs = g_strdup(qdict_get_try_str(state->options, "=keyvalue-pairs")); + if (keypairs) { + qdict_del(state->options, "=keyvalue-pairs"); + } + + secretid = g_strdup(qdict_get_try_str(state->options, "password-secret")); + if (secretid) { + qdict_del(state->options, "password-secret"); + } + + r = qemu_rbd_convert_options(state->options, &opts, &local_err); + if (local_err) { + /* + * If keypairs are present, that means some options are present in + * the modern option format. Don't attempt to parse legacy option + * formats, as we won't support mixed usage. + */ + if (keypairs) { + error_propagate(errp, local_err); + goto out; + } + + /* + * If the initial attempt to convert and process the options failed, + * we may be attempting to open an image file that has the rbd options + * specified in the older format consisting of all key/value pairs + * encoded in the filename. Go ahead and attempt to parse the + * filename, and see if we can pull out the required options. 
+ */ + r = qemu_rbd_attempt_legacy_options(state->options, &opts, &keypairs); + if (r < 0) { + /* + * Propagate the original error, not the legacy parsing fallback + * error, as the latter was just a best-effort attempt. + */ + error_propagate(errp, local_err); + goto out; + } + /* + * Take care whenever deciding to actually deprecate; once this ability + * is removed, we will not be able to open any images with legacy-styled + * backing image strings. + */ + warn_report("RBD options encoded in the filename as keyvalue pairs " + "is deprecated"); + } + + /* + * Remove the processed options from the QDict (the visitor processes + * _all_ options in the QDict) + */ + while ((e = qdict_first(state->options))) { + qdict_del(state->options, e->key); + } > > OK, I see why you’d then want to remove all non-RBD options before this > point. Other block drivers seem to solve this by having an > X_runtime_opts QemuOptsList, which contains all driver-handled options, > so they can then use qemu_opts_absorb_qdict() to extract the options > they can handle. (Compare e.g. qcow2_update_options_prepare().) > Looking through all the drivers, rbd seems to be the only one not absorbing its options via runtime_opts but instead using qemu_rbd_convert_options. The converter visits all the options from BlockdevOptionsRbd defined in block-core.json. If there is an unknown option the conversion fails with EINVAL. The runtime_opts in contrast to the already defined json schema with the visitor-code-generation seem to be redundant. Do you have some insights why and when this redundancy is reasonable? I came up with several alternatives: 1. add own runtime_opts: - pro: "the qemu way", it behaves like most of the drivers - con: redundancy to qemu_rbd_convert_options which does the same except skipping the generic block layer options and adds +1k lines - con: I couldn't figure out how to add non-primitive options to the runtime_opts like encrypt or server 2. 
tell visitor to ignore unknown keys (?) 3. parse rbd options manually (opposite of deleting the generic block keys) I agree deleting the generic block opts isn't optimal. What do you think? Your remaining points are also reasonable and I'll submit their fix along with the solution to the issue above.
[PATCH v2] vhost: Get vring base from vq, not svq
The SVQ vring used idx usually matches the guest-visible one, as long as all guest buffers (GPA) map to exactly one buffer within qemu's VA. However, as we can see in virtqueue_map_desc, a single guest buffer could map to many buffers in the SVQ vring. Also, it is a mistake to rewind them at the migration source. Since VirtQueue is able to migrate the in-flight descriptors, it is the destination's responsibility to perform the rewind in case it cannot report the in-flight descriptors to the device. This makes it easier to migrate between backends or to recover them in vhost devices that support setting in-flight descriptors. Fixes: 6d0b22266633 ("vdpa: Adapt vhost_vdpa_get_vring_base to SVQ") Signed-off-by: Eugenio Pérez -- v2: Squash both fixes in one. --- hw/virtio/vhost-vdpa.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index 795ed5a049..4458c8d23e 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -1178,7 +1178,18 @@ static int vhost_vdpa_set_vring_base(struct vhost_dev *dev, struct vhost_vring_state *ring) { struct vhost_vdpa *v = dev->opaque; +VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index); +/* + * vhost-vdpa devices does not support in-flight requests. Set all of them + * as available. + * + * TODO: This is ok for networking, but other kinds of devices might + * have problems with these retransmissions. + */ +while (virtqueue_rewind(vq, 1)) { +continue; +} if (v->shadow_vqs_enabled) { /* * Device vring base was set at device start. 
SVQ base is handled by @@ -1194,21 +1205,10 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev, struct vhost_vring_state *ring) { struct vhost_vdpa *v = dev->opaque; -int vdpa_idx = ring->index - dev->vq_index; int ret; if (v->shadow_vqs_enabled) { -VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx); - -/* - * Setting base as last used idx, so destination will see as available - * all the entries that the device did not use, including the in-flight - * processing ones. - * - * TODO: This is ok for networking, but other kinds of devices might - * have problems with these retransmissions. - */ -ring->num = svq->last_used_idx; +ring->num = virtio_queue_get_last_avail_idx(dev->vdev, ring->index); return 0; } -- 2.31.1
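virtqueue_rewind() puts the requested number of in-flight (popped but not yet used) elements back on the available ring, failing once fewer than that remain, so the one-at-a-time loop in the patch stops exactly when nothing is left in flight. A toy model of that loop (struct and names are illustrative, not QEMU's VirtQueue):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a virtqueue: only the two counters the rewind loop
 * cares about. */
struct toy_vq {
    unsigned last_avail_idx;   /* next descriptor the driver would pop */
    unsigned inuse;            /* popped but not yet marked used */
};

/* Toy model of virtqueue_rewind(): make up to @n in-flight elements
 * available again; fail if fewer than @n are in flight. */
static bool toy_virtqueue_rewind(struct toy_vq *vq, unsigned n)
{
    if (n > vq->inuse) {
        return false;
    }
    vq->inuse -= n;
    vq->last_avail_idx -= n;
    return true;
}

/* The loop from the patch: rewind everything still in flight. */
static void toy_rewind_all(struct toy_vq *vq)
{
    while (toy_virtqueue_rewind(vq, 1)) {
        continue;
    }
}
```

After the loop, last_avail_idx has moved back over every in-flight element, which is exactly the base the destination should see as available.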
Re: [RISU PATCH v4 00/29] risu cleanups and improvements
On Fri, 8 Jul 2022 at 16:49, Richard Henderson wrote: > > If you can imagine, v3 was back in 2020: > https://lore.kernel.org/qemu-devel/20200522023440.26261-1-richard.hender...@linaro.org/ > > I've refreshed that, not that risu has changed much in that time, > and then also added support for SME to aarch64, i.e. SVCR and ZA > storage are now present in the reginfo, and compared. > > I include a small test case, which allows one to see that ZA > storage is being handled properly. When run with > > ./risu --test-sve=1 --test-za=1 --master -t test_sme_aarch64.{out,bin} > ./risu --fulldump -t test_sme_aarch64.out > > one can see the 16x16 bytes filled with row major then > column major indexes. I've applied patches 1-22 to risu upstream git, and left review comments/reviewed-by tags on the rest. thanks -- PMM
Re: [PATCH 2/2] iotests/131: Add parallels regression test
On 7/14/22 16:28, Hanna Reitz wrote: Test an allocating write to a parallels image that has a backing node. Before HEAD^, doing so used to give me a failed assertion (when the backing node contains only `42` bytes; the results varies with the value chosen, for `0` bytes, for example, all I get is EIO). Signed-off-by: Hanna Reitz Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [PATCH 1/2] block/parallels: Fix buffer-based write call
On 7/14/22 16:28, Hanna Reitz wrote: Commit a4072543ccdddbd241d5962d9237b8b41fd006bf has changed the I/O here from working on a local one-element I/O vector to just using the buffer directly (using the bdrv_co_pread()/bdrv_co_pwrite() helper functions introduced shortly before). However, it only changed the bdrv_co_preadv() call to bdrv_co_pread() - the subsequent bdrv_co_pwritev() call stayed this way, and so still expects a QEMUIOVector pointer instead of a plain buffer. We must change that to be a bdrv_co_pwrite() call. Fixes: a4072543ccdddbd241d5962d ("block/parallels: use buffer-based io") Signed-off-by: Hanna Reitz Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [RISU PATCH v4 29/29] aarch64: Trivial SME test
On Fri, 8 Jul 2022 at 17:59, Richard Henderson wrote: > > Signed-off-by: Richard Henderson > --- > test_sme_aarch64.s | 55 ++ > 1 file changed, 55 insertions(+) > create mode 100644 test_sme_aarch64.s > > diff --git a/test_sme_aarch64.s b/test_sme_aarch64.s > new file mode 100644 > index 000..a5ef909 > --- /dev/null > +++ b/test_sme_aarch64.s > @@ -0,0 +1,55 @@ > + .arch_extension sme Can we have the usual copyright-and-license comment at the top of the file, please? thanks -- PMM
Re: [PATCH v8 00/12] s390x: CPU Topology
On 7/15/22 20:28, Janis Schoetterl-Glausch wrote: On 7/15/22 15:47, Pierre Morel wrote: On 7/15/22 11:31, Janis Schoetterl-Glausch wrote: On 7/14/22 22:05, Pierre Morel wrote: On 7/14/22 20:43, Janis Schoetterl-Glausch wrote: On 6/20/22 16:03, Pierre Morel wrote: Hi, This new spin is essentially for coherence with the last Linux CPU Topology patch, function testing and coding style modifications. Foreword === The goal of this series is to implement CPU topology for S390; it improves the preceding series with the implementation of books and drawers, of non-uniform CPU topology and with documentation. To use these patches, you will need the Linux series version 10. You find it there: https://lkml.org/lkml/2022/6/20/590 Currently this code is for KVM only, I have no idea if it is interesting to provide a TCG patch. If ever, it will be done in another series. To have a better understanding of the S390x CPU Topology and its implementation in QEMU you can have a look at the documentation in the last patch or follow the introduction here under. A short introduction CPU Topology is described in the S390 POP with essentially the description of two instructions: PTF Perform Topology Function, used to poll for topology change and used to set the polarization, but this part is not part of this item. STSI Store System Information and the SYSIB 15.1.x providing the Topology configuration. S390 Topology is a 6-level hierarchical topology with up to 5 levels of containers. The last topology level specifies the CPU cores. This patch series only uses the two lower levels, sockets and cores. To get the information on the topology, S390 provides the STSI instruction, which stores a structure providing the list of the containers used in the Machine topology: the SYSIB. A selector within the STSI instruction allows choosing how many topology levels will be provided in the SYSIB. 
Using the Topology List Entries (TLE) provided inside the SYSIB, the Linux kernel is able to compute the information about the cache distance between two cores and can use this information to make scheduling decisions. Do the socket, book, ... metaphors and looking at STSI from the existing smp infrastructure even make sense? Sorry, I do not understand. I admit the cover-letter is old and I did not rewrite it really well since the first patch series. What we do is: compute the STSI from the SMP + numa + device QEMU parameters. STSI 15.1.x reports the topology to the guest and for a virtual machine, this topology can be very dynamic. So a CPU can move from one topology container to another, but the socket of a cpu changing while it's running seems a bit strange. And this isn't supported by this patch series as far as I understand, the only topology changes are on hotplug. A CPU changing from one socket to another socket is the only case where the PTF instruction reports a change in the topology, along with the case where a new CPU is plugged in. Can a CPU actually change between sockets right now? To be exact, what I understand is that a shared CPU can be scheduled to another real CPU exactly as a guest vCPU can be scheduled by the host to another host CPU. Ah, ok, this is what I'm forgetting, and what made communication harder, there are two ways by which the topology can change: 1. the host topology changes 2. the vCPU threads are scheduled on another host CPU I've been only thinking about 2. I assumed some outside entity (libvirt?) pins vCPU threads, and so it would be the responsibility of that entity to set the topology which then is reported to the guest. So if you pin vCPUs for the whole lifetime of the vm then you could do that by specifying the topology up front with -devices. If you want to support migration, then the outside entity would need a way to tell qemu the updated topology. Yes The socket-id is computed from the core-id, so it's fixed, is it not? 
the virtual socket-id is computed from the virtual core-id Meaning cpu.env.core_id, correct? (which is the same as cpu.cpu_index which is the same as ms->possible_cpus->cpus[core_id].props.core_id) And a cpu's core id doesn't change during the lifetime of the vm, right? right And so it's socket id doesn't either. Yes It is not expected to appear often but it does appear. The code has been removed from the kernel in spin 10 for 2 reasons: 1) we decided to first support only dedicated and pinned CPU> 2) Christian fears it may happen too often due to Linux host scheduling and could be a performance problem This seems sensible, but now it seems too static. For example after migration, you cannot tell the guest which CPUs are in the same socket, book, ..., unless I'm misunderstanding something. No, to do this we would need to ask the kernel about it. You mean polling /sys/devices/system/cpu/cpu*/topology/*_id
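As the thread states, the virtual socket-id is derived from the virtual core-id and therefore fixed for the VM's lifetime. With a fixed cores-per-socket count that derivation is plain integer division; the helper below only illustrates that statement and is not code from the series:

```c
/* Illustrative only: derive the (virtual) socket a core belongs to from
 * its core-id, given a fixed cores-per-socket topology.  Since core_id
 * never changes for a vCPU, neither does its socket. */
static int topo_socket_id(int core_id, int cores_per_socket)
{
    return core_id / cores_per_socket;
}
```

With 4 cores per socket, cores 0-3 land in socket 0, cores 4-7 in socket 1, and so on.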
Re: [PATCH v7 08/13] multifd: Prepare to send a packet without the mutex held
* Juan Quintela (quint...@redhat.com) wrote: > We do the send_prepare() and the fill of the head packet without the > mutex held. It will help a lot for compression and later in the > series for zero pages. > > Notice that we can use p->pages without holding p->mutex because > p->pending_job == 1. > > Signed-off-by: Juan Quintela > --- > migration/multifd.h | 2 ++ > migration/multifd.c | 11 ++- > 2 files changed, 8 insertions(+), 5 deletions(-) > > diff --git a/migration/multifd.h b/migration/multifd.h > index af8ce8921d..d48597a1ea 100644 > --- a/migration/multifd.h > +++ b/migration/multifd.h > @@ -109,7 +109,9 @@ typedef struct { > /* array of pages to sent. > * The owner of 'pages' depends of 'pending_job' value: > * pending_job == 0 -> migration_thread can use it. > + * No need for mutex lock. > * pending_job != 0 -> multifd_channel can use it. > + * No need for mutex lock. > */ > MultiFDPages_t *pages; > > diff --git a/migration/multifd.c b/migration/multifd.c > index 69b9d7cf98..056599cbaf 100644 > --- a/migration/multifd.c > +++ b/migration/multifd.c > @@ -661,6 +661,8 @@ static void *multifd_send_thread(void *opaque) > p->flags |= MULTIFD_FLAG_SYNC; > p->sync_needed = false; > } > +qemu_mutex_unlock(&p->mutex); > + > p->normal_num = 0; > > if (use_zero_copy_send) { > @@ -682,11 +684,6 @@ static void *multifd_send_thread(void *opaque) Looking at my source tree, without the rest of your series, I see an unlock just before this pair of }'s; doesn't that need to go? 
Dave > } > } > multifd_send_fill_packet(p); > -p->num_packets++; > -p->total_normal_pages += p->normal_num; > -p->pages->num = 0; > -p->pages->block = NULL; > -qemu_mutex_unlock(&p->mutex); > > trace_multifd_send(p->id, packet_num, p->normal_num, p->flags, > p->next_packet_size); > @@ -711,6 +708,10 @@ static void *multifd_send_thread(void *opaque) > } > > qemu_mutex_lock(&p->mutex); > +p->num_packets++; > +p->total_normal_pages += p->normal_num; > +p->pages->num = 0; > +p->pages->block = NULL; > p->sent_bytes += p->packet_len;; > p->sent_bytes += p->next_packet_size; > p->pending_job--; > -- > 2.35.3 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [PATCH] x86: cpu: Error out if memory exceeds addressable range
On Mon, 18 Jul 2022 13:47:34 +0530 Nikunj A Dadhania wrote: > Currently it is possible to start a guest with memory that is beyond > the addressable range of CPU and QEMU does not even warn about it. > The default phys_bits is 40 and can address 1TB. However it allows to > start a guest with greater than 1TB memory. > > Prevent this by erroring out in such a scenario. > > Reported-by: Shaju Abraham > Signed-off-by: Nikunj A Dadhania Following shall care of your issue: https://www.mail-archive.com/qemu-devel@nongnu.org/msg900136.html > --- > target/i386/cpu.c | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c > index 6a57ef13af..1afbdbac7d 100644 > --- a/target/i386/cpu.c > +++ b/target/i386/cpu.c > @@ -6376,6 +6376,7 @@ static void x86_cpu_hyperv_realize(X86CPU *cpu) > > static void x86_cpu_realizefn(DeviceState *dev, Error **errp) > { > +MachineState *machine = MACHINE(qdev_get_machine()); > CPUState *cs = CPU(dev); > X86CPU *cpu = X86_CPU(dev); > X86CPUClass *xcc = X86_CPU_GET_CLASS(dev); > @@ -6541,6 +6542,15 @@ static void x86_cpu_realizefn(DeviceState *dev, Error > **errp) > } > } > > +if (BIT_ULL(cpu->phys_bits) < machine->maxram_size) { > +error_setg(&local_err, "cannot setup guest memory: " > + "%s memory(%lu MiB) exceeds addressable limit(%llu MiB)", > + machine->maxram_size == machine->ram_size ? "" : "max", > + machine->maxram_size / MiB, > + BIT_ULL(cpu->phys_bits) / MiB); > +goto out; > +} > + > /* Cache information initialization */ > if (!cpu->legacy_cache) { > if (!xcc->model || !xcc->model->cpudef->cache_info) {
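The condition the patch adds reduces to comparing the CPU's addressable limit, BIT_ULL(phys_bits), with the configured maximum RAM; with the default phys_bits of 40 that limit is exactly 1 TiB. A hedged sketch of just that check (the helper name is illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(nr) (1ULL << (nr))
#define MiB (1024ULL * 1024)
#define GiB (1024 * MiB)
#define TiB (1024 * GiB)

/* Sketch of the condition from the patch: the guest's (max) RAM must fit
 * within what phys_bits can address, otherwise realize should error out. */
static bool ram_fits_phys_bits(unsigned phys_bits, uint64_t maxram_size)
{
    return BIT_ULL(phys_bits) >= maxram_size;
}
```

Note the patch only rejects memory strictly beyond the limit, so exactly 1 TiB with 40 phys bits still passes.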
Re: [PULL 05/20] meson: Prefix each element of firmware path
On 14/07/2022 11.01, Paolo Bonzini wrote: From: Akihiko Odaki Signed-off-by: Akihiko Odaki Message-Id: <20220624154042.51512-1-akihiko.od...@gmail.com> [Rewrite shell function without using Bash extensions. - Paolo] Signed-off-by: Paolo Bonzini --- configure | 15 +++ meson.build | 11 +-- meson_options.txt | 2 +- scripts/meson-buildoptions.py | 7 +-- scripts/meson-buildoptions.sh | 4 ++-- softmmu/datadir.c | 8 +--- 6 files changed, 37 insertions(+), 10 deletions(-) diff --git a/configure b/configure index e8cc850727..f02635b087 100755 --- a/configure +++ b/configure @@ -676,6 +676,21 @@ fi werror="" +meson_option_build_array() { + printf '[' + (if test "$targetos" == windows; then +IFS=\; + else +IFS=: + fi + for e in $1; do +e=${e/'\'/'\\'} +e=${e/\"/'\"'} +printf '"""%s""",' "$e" + done) + printf ']\n' +} Hi! This seems to break the NetBSD VM builds: $ make vm-build-netbsd J=4 TARGET_LIST=x86_64-softmmu GIT ui/keycodemapdb meson tests/fp/berkeley-testfloat-3 tests/fp/berkeley-softfloat-3 dtc slirp roms/SLOF VM-BUILD netbsd ../src/configure: 687: Syntax error: Bad substitution Could you please have a look? Thanks, Thomas
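The "Bad substitution" comes from NetBSD's /bin/sh rejecting ${e/pat/rep}, which is a bash/ksh extension rather than POSIX parameter expansion. One portable way to do the same backslash/quote escaping, shown here only as an illustrative rewrite (not necessarily the fix that was merged):

```shell
# Escape backslashes and double quotes for embedding in a meson string
# literal, using only POSIX sh constructs (printf + sed) instead of the
# bash-only ${var/pat/rep} substitution.
escape_element() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

escape_element 'C:\Program Files'
```

The same caveat applies to `test "$targetos" == windows` in the quoted hunk: POSIX test only specifies `=`, not `==`.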
Re: [PATCH v8 06/11] i386/pc: factor out cxl range start to helper
On Fri, 15 Jul 2022 18:16:23 +0100 Joao Martins wrote: > Factor out the calculation of the base address of the memory region. > It will be used later on for the cxl range end counterpart calculation > and as well in pc_memory_init() CXL memory region initialization, thus > avoiding duplication. > > Cc: Jonathan Cameron > Signed-off-by: Joao Martins Acked-by: Igor Mammedov PS: see note below in case series respin > --- > hw/i386/pc.c | 26 +- > 1 file changed, 17 insertions(+), 9 deletions(-) > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > index 1f42f194d7b7..3fdcab4bb4f3 100644 > --- a/hw/i386/pc.c > +++ b/hw/i386/pc.c > @@ -825,6 +825,22 @@ static hwaddr pc_above_4g_end(PCMachineState *pcms) > return x86ms->above_4g_mem_start + x86ms->above_4g_mem_size; > } > > +static uint64_t pc_get_cxl_range_start(PCMachineState *pcms) > +{ > +PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms); > +MachineState *machine = MACHINE(pcms); > +hwaddr cxl_base; > + > +if (pcmc->has_reserved_memory && machine->device_memory->base) { > +cxl_base = machine->device_memory->base > ++ memory_region_size(&machine->device_memory->mr); > +} else { > +cxl_base = pc_above_4g_end(pcms); > +} > + > +return cxl_base; > +} > + > static uint64_t pc_get_cxl_range_end(PCMachineState *pcms) > { > uint64_t start = 0; > @@ -946,15 +962,7 @@ void pc_memory_init(PCMachineState *pcms, > MemoryRegion *mr = &pcms->cxl_devices_state.host_mr; > hwaddr cxl_size = MiB; > > -if (pcmc->has_reserved_memory && machine->device_memory->base) { > -cxl_base = machine->device_memory->base > -+ memory_region_size(&machine->device_memory->mr); > -} else if (pcms->sgx_epc.size != 0) { > -cxl_base = sgx_epc_above_4g_end(&pcms->sgx_epc); > -} else { shouldn't be this hunk be a part of 4/11? 
(otherwise it looks like it's been dropped by mistake) end result is fine as pc_above_4g_end() already has this hunk (hence Ack) > -cxl_base = pc_above_4g_end(pcms); > -} > - > +cxl_base = pc_get_cxl_range_start(pcms); > e820_add_entry(cxl_base, cxl_size, E820_RESERVED); > memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size); > memory_region_add_subregion(system_memory, cxl_base, mr);
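For reference, the factored-out calculation amounts to: start the CXL window right after the device-memory region when one exists, otherwise at the end of above-4G memory. A hedged sketch with plain integers standing in for the MemoryRegion accessors (names are illustrative, not QEMU's):

```c
#include <stdint.h>

/* Sketch of pc_get_cxl_range_start(): the CXL host region begins where
 * device memory ends, falling back to the end of above-4G RAM when no
 * device-memory region is configured. */
static uint64_t cxl_range_start(uint64_t device_memory_base,
                                uint64_t device_memory_size,
                                uint64_t above_4g_end)
{
    if (device_memory_base) {
        return device_memory_base + device_memory_size;
    }
    return above_4g_end;
}
```

Factoring this out lets pc_memory_init() and the range-end calculation share one definition instead of duplicating the if/else.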