[PATCH 2/2] target/riscv: Auto set elen from vector extension by default

2022-07-08 Thread Kito Cheng
The default ELEN is currently set to 64, which is an incorrect setting
for Zve32*: the spec chapter "Zve*: Vector Extensions for Embedded
Processors" says the minimum VLEN and supported EEW for Zve32 is 32.

ELEN can actually be derived from which extensions are enabled, so this
patch sets the default ELEN to 0, meaning auto-detect, while keeping the
ability for the user to configure it.
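
For example, a CPU with only Zve32f enabled now gets ELEN = 32, one
with Zve64f or V enabled gets ELEN = 64, and an explicit elen property
on the command line still overrides the detection.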

Signed-off-by: Kito Cheng 
---
 target/riscv/cpu.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 487d0faa63..c1b96da7da 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -751,13 +751,22 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 "Vector extension ELEN must be power of 2");
 return;
 }
-if (cpu->cfg.elen > 64 || cpu->cfg.elen < 8) {
+if (cpu->cfg.elen == 0) {
+  if (cpu->cfg.ext_zve32f) {
+cpu->cfg.elen = 32;
+  }
+  if (cpu->cfg.ext_zve64f || cpu->cfg.ext_v) {
+cpu->cfg.elen = 64;
+  }
+}
+if (cpu->cfg.elen != 0 && (cpu->cfg.elen > 64 ||
+   cpu->cfg.elen < 8)) {
 error_setg(errp,
 "Vector extension implementation only supports ELEN "
 "in the range [8, 64]");
 return;
 }
-if (cpu->cfg.vlen < cpu->cfg.elen) {
+if (cpu->cfg.elen != 0 && cpu->cfg.vlen < cpu->cfg.elen) {
 error_setg(errp,
 "Vector extension VLEN must be greater than or equal "
 "to ELEN");
@@ -901,7 +910,8 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_STRING("priv_spec", RISCVCPU, cfg.priv_spec),
 DEFINE_PROP_STRING("vext_spec", RISCVCPU, cfg.vext_spec),
 DEFINE_PROP_UINT16("vlen", RISCVCPU, cfg.vlen, 128),
-DEFINE_PROP_UINT16("elen", RISCVCPU, cfg.elen, 64),
+/* elen = 0 means set from v or zve* extension */
+DEFINE_PROP_UINT16("elen", RISCVCPU, cfg.elen, 0),
 
 DEFINE_PROP_BOOL("svinval", RISCVCPU, cfg.ext_svinval, false),
 DEFINE_PROP_BOOL("svnapot", RISCVCPU, cfg.ext_svnapot, false),
-- 
2.34.0




[PATCH 1/2] target/riscv: Lower bound of VLEN is 32, and check VLEN >= ELEN

2022-07-08 Thread Kito Cheng
According to RVV spec 1.0, VLEN must be greater than or equal to ELEN,
and the minimal possible ELEN is 32; the spec also says the `Minimum
VLEN` for Zve32* is 32. So I think the lower bound of VLEN is 32.
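
For example, vlen=32 with elen=32 now passes the checks, while vlen=32
with elen=64 is rejected by the new VLEN >= ELEN check.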

[1] 
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#2-implementation-defined-constant-parameters
[2] 
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#182-zve-vector-extensions-for-embedded-processors

Signed-off-by: Kito Cheng 
---
 target/riscv/cpu.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1bb3973806..487d0faa63 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -740,10 +740,10 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 "Vector extension VLEN must be power of 2");
 return;
 }
-if (cpu->cfg.vlen > RV_VLEN_MAX || cpu->cfg.vlen < 128) {
+if (cpu->cfg.vlen > RV_VLEN_MAX || cpu->cfg.vlen < 32) {
 error_setg(errp,
 "Vector extension implementation only supports VLEN "
-"in the range [128, %d]", RV_VLEN_MAX);
+"in the range [32, %d]", RV_VLEN_MAX);
 return;
 }
 if (!is_power_of_2(cpu->cfg.elen)) {
@@ -757,6 +757,12 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 "in the range [8, 64]");
 return;
 }
+if (cpu->cfg.vlen < cpu->cfg.elen) {
+error_setg(errp,
+"Vector extension VLEN must be greater than or equal "
+"to ELEN");
+return;
+}
 if (cpu->cfg.vext_spec) {
 if (!g_strcmp0(cpu->cfg.vext_spec, "v1.0")) {
 vext_version = VEXT_VERSION_1_00_0;
-- 
2.34.0




Re: [PATCH] hw/riscv: virt: pass random seed to fdt

2022-07-08 Thread Alistair Francis
On Thu, Jul 7, 2022 at 11:04 AM Jason A. Donenfeld  wrote:
>
> Hey Alistair,
>
> On Tue, Jul 05, 2022 at 03:09:09AM +0200, Jason A. Donenfeld wrote:
> > Hi Alistair,
> >
> > On Wed, Jun 29, 2022 at 4:09 AM Alistair Francis  
> > wrote:
> > > I have a Linux 5.8 test case that is failing due to this patch.
> >
> > Before I started fixing things in random.c, there were a lot of early
> > boot bugs with the RNG in Linux. I backported the fixes for these to
> > all stable kernels. It's a bummer that risc-v got hit by these bugs,
> > but I think that's just the way things go unfortunately.

Hmm... That's a pain. So there is a bug in older kernels where they
won't boot if we specify this?

Can you point to the fixes?

> >
> > Jason
> >
>
> By the way, I still can't find this in your github tree. I was hoping we
> could get this in for 7.1.

Yeah, it's hard to accept when it will break users. I would rather
avoid someone upgrading to QEMU 7.1 and the kernel failing to boot
with no information.

>
> As for your 5.8 issue, I've been trying to reproduce that to understand
> more about it, but I'm unable to. I've been trying with
> nommu_virt_defconfig using my patch ontop of qemu master. Maybe it's
> possible in testing this out you were testing the wrong branch? Anyway,
> it'd be nice to get this queued up...

Hmm... you can't reproduce it?

Alistair

>
> Jason



AioContext lock removal: help needed

2022-07-08 Thread Emanuele Giuseppe Esposito
Hello everyone,

As you all know, I am trying to find a way to replace the well known
AioContext lock with something else that makes sense and provides the
same (or even better) guarantees as this lock.

The reasons for this change have been explained over and over, and I
don't really want to repeat them. Please read the various series I
posted in the past [1] for more information.

The end goal is to get rid of the AioContext lock and have
fine-granularity locks in the various components, to make the whole
block layer more multi-thread friendly and eventually be able to assign
multiple virtual queues to a single iothread.

The AioContext lock is used everywhere to protect a huge variety of data.
This greatly limits the level of multithreading that iothreads can achieve.

Before digging into the problem itself and possible solutions, I would
like to also add that we are having a weekly (or bi-weekly, we'll see)
public meeting where we plan to discuss this project. Anyone
interested is very welcome to join. The event invitation is here:

https://calendar.google.com/event?action=TEMPLATE&tmeid=NTdja2VwMDFyYm9nNjNyc25pdXU5bm8wb3FfMjAyMjA3MTRUMDgwMDAwWiBlZXNwb3NpdEByZWRoYXQuY29t&tmsrc=eesposit%40redhat.com&scp=ALL

One huge blocker we are having is removing the AioContext lock from the
block API (bdrv_* and friends).
I identified two initial and main candidates that need to lose the
AioContext protection:
- bdrv_replace_child_noperm
- bdrv_try_set_aio_context

When these two functions can safely run without the AioContext lock, we
will have gotten rid of the majority of its usage.
The main issue is: what can we use as a replacement?

Let's analyze bdrv_replace_child_noperm (probably the tougher of the
two): this function performs a graph modification, removing a child from
one bs and putting it under another. It modifies the bs's ->parents and
->children node lists, and it definitely needs protection because these
lists are also read from iothreads in parallel.

Possible candidates to use as replacement:

- rwlock. With the help of Paolo, I implemented an rwlock optimized for
many fast readers and few writers. Ideal for
bdrv_replace_child_noperm. However, the problem here is that when a
writer has to wait for other readers to finish (since it has exclusive
access), it should call a nested event loop to allow others (readers
included) to progress.
And this brings us into serious complications, because polling with the
wlock taken is prone to a lot of deadlocks, including the fact that the
AioContext lock is still needed in AIO_WAIT_WHILE. The solution would be
to run everything, readers included, in coroutines. However, this is not
easy either: long story short, switching BlockDriverState callbacks to
coroutines is a big problem, as the AioContext lock is still taken in
many of the callbacks' callers, and therefore switching from a coroutine
creates a mixture of locks taken that simply results in deadlocks.
Ideally we want to first get rid of the AioContext lock and then switch
to coroutines, but that's the whole point of the rwlock.
More on this here:
https://patchew.org/QEMU/20220426085114.199647-1-eespo...@redhat.com/#cc5e12d1-d25f-d338-bff2-0d3f5cc0d...@redhat.com

But I would say this is not an ideal candidate to replace the AioContext
lock. At least not in the immediate future.

- drains. This was the initial and still main lead. Using
bdrv_drained_begin/end we are sure that a node and all its parents will
be paused (jobs included), no further I/O will come in since it is
temporarily disabled, and all in-flight requests are guaranteed to have
finished by the time bdrv_drained_begin returns.
Even better than plain bdrv_drained, I proposed using
bdrv_subtree_drained_begin, which also stops and protects the children
of a node.
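
Roughly, the protected graph change would then look like this (a minimal
sketch, assuming today's bdrv_subtree_drained_begin/end API; error
handling omitted):

    /* writer side: no requests and no jobs touch the subtree here */
    bdrv_subtree_drained_begin(bs);
    /* graph modification, i.e. what bdrv_replace_child_noperm() does
     * today: updating the ->parents and ->children lists */
    bdrv_subtree_drained_end(bs);
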
I think the major drawback of this is that we need to be sure that there
are no cases where drains are not enough. Together with Kevin and Stefan
we identified that we need to prevent drain from being called in
coroutines, regardless of which AioContext they run in. That's because
coroutines could allow parallel drain/graph reading to happen: for
example (thinking about the general case), a coroutine yielding after
calling drain_begin, in the middle of a graph modification, could allow
another coroutine to drain/read the graph.
Note that draining itself also involves reading the graph.

We thought the only usage of draining in coroutines was in mirror's
run() callback. However, that is just the tip of the iceberg.
Other functions, like the .bdrv_open callbacks (qcow2_open for example),
take care of creating coroutines to execute part of their logic, for
valid performance reasons (we don't want to wait when we could simply
yield and allow something else to run).

So another question is: what could we do to solve this coroutine issue?
Ideas?

Main drain series:
https://patchew.org/QEMU/20220118162738.1366281-1-eespo...@redhat.com/
[1]



[1] = https://patchew.org/QEMU/20220301142113.163174-1-eespo...@redhat.com/

Thank you,
Emanuele





Re: [RFC PATCH v9 01/23] vhost: Return earlier if used buffers overrun

2022-07-08 Thread Jason Wang
On Thu, Jul 7, 2022 at 2:40 AM Eugenio Pérez  wrote:
>
> Previous function misses the just picked avail buffer from the queue.
> This way keeps blocking the used queue forever, but is cleaner to check
> before calling to vhost_svq_get_buf.
>
> Fixes: 100890f7cad50 ("vhost: Shadow virtqueue buffers forwarding")
> Signed-off-by: Eugenio Pérez 

Acked-by: Jason Wang 

> ---
>  hw/virtio/vhost-shadow-virtqueue.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
> b/hw/virtio/vhost-shadow-virtqueue.c
> index 56c96ebd13..9280285435 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -405,19 +405,21 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>  vhost_svq_disable_notification(svq);
>  while (true) {
>  uint32_t len;
> -g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq, &len);
> -if (!elem) {
> -break;
> -}
> +g_autofree VirtQueueElement *elem = NULL;
>
>  if (unlikely(i >= svq->vring.num)) {
>  qemu_log_mask(LOG_GUEST_ERROR,
>   "More than %u used buffers obtained in a %u size 
> SVQ",
>   i, svq->vring.num);
> -virtqueue_fill(vq, elem, len, i);
> -virtqueue_flush(vq, i);
> +virtqueue_flush(vq, svq->vring.num);
>  return;
>  }
> +
> +elem = vhost_svq_get_buf(svq, &len);
> +if (!elem) {
> +break;
> +}
> +
>  virtqueue_fill(vq, elem, len, i++);
>  }
>
> --
> 2.31.1
>




[PATCH 2/2] target/riscv: Implement dump content of vector register

2022-07-08 Thread Kito Cheng
Implement -d cpu,vu to dump the content of the vector registers.
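
For example, running with "-d cpu,vu" prints, on each CPU state dump,
every vector register as a single hex string per line, most-significant
64-bit chunk first.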

Signed-off-by: Kito Cheng 
---
 target/riscv/cpu.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index c1b96da7da..97b289d277 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -72,6 +72,15 @@ const char * const riscv_fpr_regnames[] = {
   "f30/ft10", "f31/ft11"
 };
 
+const char * const riscv_vr_regnames[] = {
+  "v0",  "v1",  "v2",  "v3",  "v4",  "v5",
+  "v6",  "v7",  "v8",  "v9",  "v10", "v11",
+  "v12", "v13", "v14", "v15", "v16", "v17",
+  "v18", "v19", "v20", "v21", "v22", "v23",
+  "v24", "v25", "v26", "v27", "v28", "v29",
+  "v30", "v31"
+};
+
 static const char * const riscv_excp_names[] = {
 "misaligned_fetch",
 "fault_fetch",
@@ -375,6 +384,28 @@ static void riscv_cpu_dump_state(CPUState *cs, FILE *f, int flags)
 }
 }
 }
+if (flags & CPU_DUMP_VU) {
+int vlen = cpu->cfg.vlen;
+int n_chunk = vlen / 64;
+if (vlen == 32) {
+for (i = 0; i < 32; i++) {
+qemu_fprintf(f, "0x%08" PRIx64 "\n", env->vreg[i]);
+}
+} else {
+for (i = 0; i < 32; i++) {
+qemu_fprintf(f, " %-8s ",
+ riscv_vr_regnames[i]);
+
+int vec_reg_offset = i * vlen / 64;
+qemu_fprintf(f, "0x");
+for (int j = n_chunk - 1; j >= 0; --j) {
+qemu_fprintf(f, "%016" PRIx64,
+ env->vreg[vec_reg_offset + j]);
+}
+qemu_fprintf(f, "\n");
+}
+}
+}
 }
 
 static void riscv_cpu_set_pc(CPUState *cs, vaddr value)
-- 
2.34.0




[PATCH 1/2] util/log: Add vu to dump content of vector unit

2022-07-08 Thread Kito Cheng
Add a new option, -d vu, to dump the content of the vector unit. Many
targets have vector registers, but there is no easy way to dump their
content. We have used this downstream for a while to help debugging,
and I feel it's really useful, so I think it would be great to upstream
it and save debugging time for other people :)
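
With this, "-d cpu,vu" makes the existing 'cpu' logging also include
the vector registers on targets that implement the new CPU_DUMP_VU flag
(patch 2/2 does this for RISC-V).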

Signed-off-by: Kito Cheng 
---
 accel/tcg/cpu-exec.c  | 3 +++
 include/hw/core/cpu.h | 2 ++
 include/qemu/log.h    | 1 +
 util/log.c            | 2 ++
 4 files changed, 8 insertions(+)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index a565a3f8ec..2cbec0a6ed 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -221,6 +221,9 @@ static inline void log_cpu_exec(target_ulong pc, CPUState *cpu,
 if (qemu_loglevel_mask(CPU_LOG_TB_FPU)) {
 flags |= CPU_DUMP_FPU;
 }
+if (qemu_loglevel_mask(CPU_LOG_TB_VU)) {
+flags |= CPU_DUMP_VU;
+}
 #if defined(TARGET_I386)
 flags |= CPU_DUMP_CCOP;
 #endif
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 996f94059f..7a767e17cd 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -535,11 +535,13 @@ GuestPanicInformation *cpu_get_crash_info(CPUState *cpu);
  * @CPU_DUMP_CODE:
  * @CPU_DUMP_FPU: dump FPU register state, not just integer
  * @CPU_DUMP_CCOP: dump info about TCG QEMU's condition code optimization state
+ * @CPU_DUMP_VU: dump vector register state
  */
 enum CPUDumpFlags {
 CPU_DUMP_CODE = 0x0001,
 CPU_DUMP_FPU  = 0x0002,
 CPU_DUMP_CCOP = 0x0004,
+CPU_DUMP_VU   = 0x0008,
 };
 
 /**
diff --git a/include/qemu/log.h b/include/qemu/log.h
index c5643d8dd5..49bd0b0fbc 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -35,6 +35,7 @@ bool qemu_log_separate(void);
 /* LOG_STRACE is used for user-mode strace logging. */
 #define LOG_STRACE (1 << 19)
 #define LOG_PER_THREAD (1 << 20)
+#define CPU_LOG_TB_VU  (1 << 21)
 
 /* Lock/unlock output. */
 
diff --git a/util/log.c b/util/log.c
index d6eb0378c3..775d122c2e 100644
--- a/util/log.c
+++ b/util/log.c
@@ -441,6 +441,8 @@ const QEMULogItem qemu_log_items[] = {
 #ifdef CONFIG_PLUGIN
 { CPU_LOG_PLUGIN, "plugin", "output from TCG plugins\n"},
 #endif
+{ CPU_LOG_TB_VU, "vu",
+  "include vector unit registers in the 'cpu' logging" },
 { LOG_STRACE, "strace",
   "log every user-mode syscall, its input, and its result" },
 { LOG_PER_THREAD, "tid",
-- 
2.34.0




Re: [PATCH 1/8] virtio_queue_aio_attach_host_notifier: remove AioContext lock

2022-07-08 Thread Emanuele Giuseppe Esposito



On 05/07/2022 16:11, Stefan Hajnoczi wrote:
> On Thu, Jun 09, 2022 at 10:37:20AM -0400, Emanuele Giuseppe Esposito wrote:
>> @@ -146,7 +147,6 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
>>  
>>  s->dataplane_starting = false;
>>  s->dataplane_started = true;
>> -aio_context_release(s->ctx);
>>  return 0;
> 
> This looks risky because s->dataplane_started is accessed by IO code and
> there is a race condition here. Maybe you can refactor the code along
> the lines of virtio-blk to avoid the race.
> 

Uhmm, could you explain why virtio-blk is also safe here?
And what is currently protecting dataplane_started (in both blk and
scsi, as I don't see any other AioContext lock taken)?

I see that, for example, virtio_blk_req_complete is IO_CODE, so it could
theoretically read dataplane_started while it is being changed in
dataplane_stop? Though I guess it doesn't, because we disable and clean
the host notifier before modifying it?

But if so, I don't get what the difference with the scsi code is, and
why we only need to protect that instance with the AioContext lock?

Thank you,
Emanuele




Re: [RFC PATCH v9 03/23] vdpa: delay set_vring_ready after DRIVER_OK

2022-07-08 Thread Jason Wang
On Thu, Jul 7, 2022 at 2:40 AM Eugenio Pérez  wrote:
>
> To restore the device in the destination of a live migration we send the
> commands through control virtqueue. For a device to read CVQ it must
> have received DRIVER_OK status bit.
>
> However this open a window where the device could start receiving
> packets in rx queue 0 before it receive the RSS configuration. To avoid
> that, we will not send vring_enable until all configuration is used by
> the device.
>
> As a first step, reverse the DRIVER_OK and SET_VRING_ENABLE steps.

I wonder if it's better to delay this to the series that implements
migration, since the shadow cvq doesn't depend on this?

>
> Signed-off-by: Eugenio Pérez 
> ---
>  hw/virtio/vhost-vdpa.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 66f054a12c..2ee8009594 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -728,13 +728,18 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev 
> *dev, int idx)
>  return idx;
>  }
>
> +/**
> + * Set ready all vring of the device
> + *
> + * @dev: Vhost device
> + */
>  static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>  {
>  int i;
>  trace_vhost_vdpa_set_vring_ready(dev);
> -for (i = 0; i < dev->nvqs; ++i) {
> +for (i = 0; i < dev->vq_index_end; ++i) {
>  struct vhost_vring_state state = {
> -.index = dev->vq_index + i,
> +.index = i,

Looks like a cleanup or bugfix which deserves a separate patch?

>  .num = 1,
>  };
>  vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
> @@ -1097,7 +1102,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, 
> bool started)
>  if (unlikely(!ok)) {
>  return -1;
>  }
> -vhost_vdpa_set_vring_ready(dev);
>  } else {
>  ok = vhost_vdpa_svqs_stop(dev);
>  if (unlikely(!ok)) {
> @@ -1111,16 +1115,22 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>  }
>
>  if (started) {
> +int r;
> +
>  memory_listener_register(&v->listener, &address_space_memory);
> -return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> +r = vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> +if (unlikely(r)) {
> +return r;
> +}
> +vhost_vdpa_set_vring_ready(dev);

Interesting, does this mean we only enable the last two queues without
this patch?

Thanks

>  } else {
>  vhost_vdpa_reset_device(dev);
>  vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> VIRTIO_CONFIG_S_DRIVER);
>  memory_listener_unregister(&v->listener);
> -
> -return 0;
>  }
> +
> +return 0;
>  }
>
>  static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
> --
> 2.31.1
>




Re: [PATCH 3/8] virtio_blk_process_queued_requests: always run in a bh

2022-07-08 Thread Emanuele Giuseppe Esposito



On 05/07/2022 16:23, Stefan Hajnoczi wrote:
> On Thu, Jun 09, 2022 at 10:37:22AM -0400, Emanuele Giuseppe Esposito wrote:
>> diff --git a/hw/block/dataplane/virtio-blk.c 
>> b/hw/block/dataplane/virtio-blk.c
>> index f9224f23d2..03e10a36a4 100644
>> --- a/hw/block/dataplane/virtio-blk.c
>> +++ b/hw/block/dataplane/virtio-blk.c
>> @@ -234,8 +234,16 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
>>  goto fail_aio_context;
>>  }
>>  
>> +blk_inc_in_flight(s->conf->conf.blk);
> 
> Missing comment explaining why the in-flight counter is incremented and
> where the matching decrement operation is located.
> 
> I think you can get away without a comment if blk_inc_in_flight() is
> right next to aio_bh_new(), but in this case there are a few lines of
> code in between and it becomes unclear if there is a connection.

I will simply add:

/*
 * virtio_blk_restart_bh() code will take care of decrementing
 * in_flight counter.
 */

should make sense.

> 
>> +/*
>> + * vblk->bh is only set in virtio_blk_dma_restart_cb, which
>> + * is called only on vcpu start or stop.
>> + * Therefore it must be null.
>> + */
>> +assert(vblk->bh == NULL);
>>  /* Process queued requests before the ones in vring */
> 
> This comment makes an assumption about the order of file descriptor
> handlers vs BHs in the event loop. I suggest removing the comment. There
> is no reason for processing queued requests first anyway since
> virtio-blk devices can complete requests in any order.
> 

Ok, I guess you mean in a separate patch.

Thank you,
Emanuele




Re: [RFC PATCH v9 04/23] vhost: Get vring base from vq, not svq

2022-07-08 Thread Jason Wang
On Thu, Jul 7, 2022 at 2:40 AM Eugenio Pérez  wrote:
>
> The used idx used to match with this, but it will not match from the
> moment we introduce svq_inject.

It might be better to explain what "svq_inject" means here.

> Rewind all the descriptors not used by
> vdpa device and get the vq state properly.
>
> Signed-off-by: Eugenio Pérez 
> ---
>  include/hw/virtio/virtio.h | 1 +
> >  hw/virtio/vhost-vdpa.c | 7 +++----
> >  hw/virtio/virtio.c | 5 +++++
>  3 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index db1c0ddf6b..4b51ab9d06 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -302,6 +302,7 @@ hwaddr virtio_queue_get_desc_size(VirtIODevice *vdev, int 
> n);
>  hwaddr virtio_queue_get_avail_size(VirtIODevice *vdev, int n);
>  hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n);
>  unsigned int virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n);
> +unsigned int virtio_queue_get_in_use(const VirtQueue *vq);
>  void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n,
>   unsigned int idx);
>  void virtio_queue_restore_last_avail_idx(VirtIODevice *vdev, int n);
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 2ee8009594..de76128030 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1189,12 +1189,10 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev 
> *dev,
> struct vhost_vring_state *ring)
>  {
>  struct vhost_vdpa *v = dev->opaque;
> -int vdpa_idx = ring->index - dev->vq_index;
>  int ret;
>
>  if (v->shadow_vqs_enabled) {
> -VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, 
> vdpa_idx);
> -
> +const VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index);
>  /*
>   * Setting base as last used idx, so destination will see as 
> available
>   * all the entries that the device did not use, including the 
> in-flight
> @@ -1203,7 +1201,8 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev 
> *dev,
>   * TODO: This is ok for networking, but other kinds of devices might
>   * have problems with these retransmissions.
>   */
> -ring->num = svq->last_used_idx;
> +ring->num = virtio_queue_get_last_avail_idx(dev->vdev, ring->index) -
> +virtio_queue_get_in_use(vq);

I think we need to change the above comment as well, otherwise readers
might get confused.

I wonder why we need to bother with this at this time. Is this an issue
for networking devices? And for block devices it's not sufficient, since
there's no guarantee that the descriptors are handled in order?

Thanks

>  return 0;
>  }
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 5d607aeaa0..e02656f7a2 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -3420,6 +3420,11 @@ unsigned int 
> virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n)
>  }
>  }
>
> +unsigned int virtio_queue_get_in_use(const VirtQueue *vq)
> +{
> +return vq->inuse;
> +}
> +
>  static void virtio_queue_packed_set_last_avail_idx(VirtIODevice *vdev,
> int n, unsigned int idx)
>  {
> --
> 2.31.1
>




Re: [PATCH 6/8] virtio-blk: mark IO_CODE functions

2022-07-08 Thread Emanuele Giuseppe Esposito



On 05/07/2022 16:39, Stefan Hajnoczi wrote:
> On Thu, Jun 09, 2022 at 10:37:25AM -0400, Emanuele Giuseppe Esposito wrote:
>> Just as done in the block API, mark functions in virtio-blk
>> that are called also from iothread(s).
>>
>> We know such functions are IO because many are blk_* callbacks,
>> running always in the device iothread, and remaining are propagated
>> from the leaf IO functions (if a function calls a IO_CODE function,
>> itself is categorized as IO_CODE too).
>>
>> Signed-off-by: Emanuele Giuseppe Esposito 
>> ---
>>  hw/block/dataplane/virtio-blk.c |  4 
>>  hw/block/virtio-blk.c   | 35 +
>>  2 files changed, 39 insertions(+)
> 
> The definition of IO_CODE() is:
> 
>   I/O API functions. These functions are thread-safe, and therefore
>   can run in any thread as long as the thread has called
>   aio_context_acquire/release().
> 
> I'm not sure it matches with the exact semantics you have in mind. Are
> they really allowed to be called from any thread and even from multiple
> threads? Or maybe just from the BlockBackend's AioContext thread?

I think it is just from the BlockBackend's AioContext thread. But I
classified blk_* functions as IO_CODE.

What is your opinion on that?

> 
> We need to be very careful to define these terms precisely and avoid
> applying them in cases that are similar but different as that will cause
> problems in the future.
> 
> Otherwise:
> Reviewed-by: Stefan Hajnoczi 
> 




Re: [PATCH 7/8] VirtIOBlock: protect rq with its own lock

2022-07-08 Thread Emanuele Giuseppe Esposito



On 05/07/2022 16:45, Stefan Hajnoczi wrote:
> On Thu, Jun 09, 2022 at 10:37:26AM -0400, Emanuele Giuseppe Esposito wrote:
>> @@ -946,17 +955,20 @@ static void virtio_blk_reset(VirtIODevice *vdev)
>>   * stops all Iothreads.
>>   */
>>  blk_drain(s->blk);
>> +aio_context_release(ctx);
>>  
>>  /* We drop queued requests after blk_drain() because blk_drain() itself 
>> can
>>   * produce them. */
>> +qemu_mutex_lock(&s->req_mutex);
>>  while (s->rq) {
>>  req = s->rq;
>>  s->rq = req->next;
>> +qemu_mutex_unlock(&s->req_mutex);
>>  virtqueue_detach_element(req->vq, &req->elem, 0);
>>  virtio_blk_free_request(req);
>> +qemu_mutex_lock(&s->req_mutex);
> 
> Why is req_mutex dropped temporarily? At this point we don't really need
> the req_mutex (all I/O should be stopped and drained), but maybe we
> should do:

Agreed, it is probably not useful to drop the mutex temporarily.

Regarding why req_mutex is not needed: yes, I guess it isn't. Should I
get rid of this hunk altogether, and maybe leave a comment like "no
synchronization needed, due to drain + ->stop_ioeventfd()"?

> 
>   WITH_QEMU_MUTEX(&s->req_mutex) {
>   req = s->rq;
>   s->rq = NULL;
>   }
> 
>   ...process req list...

Not sure what you mean here: we are looping on s->rq, so do we need to
protect that too? And why set it to NULL? Sorry, I am a little lost
here. Is the idea to steal the whole list under the lock and then
process it without holding req_mutex, roughly like this (a sketch; I am
assuming you meant the WITH_QEMU_LOCK_GUARD macro)?
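
    VirtIOBlockReq *req;

    WITH_QEMU_LOCK_GUARD(&s->req_mutex) {
        req = s->rq;    /* steal the whole queued-request list */
        s->rq = NULL;
    }
    /* walk the detached list without holding req_mutex */
    while (req) {
        VirtIOBlockReq *next = req->next;
        virtqueue_detach_element(req->vq, &req->elem, 0);
        virtio_blk_free_request(req);
        req = next;
    }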

Thank you,
Emanuele

> 
> Otherwise:
> Reviewed-by: Stefan Hajnoczi 
> 




Re: AioContext lock removal: help needed

2022-07-08 Thread Emanuele Giuseppe Esposito



On 08/07/2022 10:42, Emanuele Giuseppe Esposito wrote:
> Hello everyone,
> 
> As you all know, I am trying to find a way to replace the well known
> AioContext lock with something else that makes sense and provides the
> same (or even better) guarantees than using this lock.
> 
> The reason for this change have been explained over and over and I don't
> really want to repeat them. Please read the various series I posted in
> the past [1] for more information.
> 
> The end goal is to get rid of the AioContext, and have fine-granularity
> locks in the various components, to make the whole block layer more
> multi-thread friendly and eventually be able to assign multiple virtual
> queues to a single iothread.
> 
> AioContext lock is used everywhere, to protect a huge variety of data.
> This limits a lot the level of multithreading that iothreads can achieve.
> 
> Before digging into the problem itself and possible solutions, I would
> like to also add that we are having a weekly (or bi-weekly, we'll see)
> public meeting where we plan to discuss about this project. Anyone
> interested is very welcome to join. Event invitation is here:
> 
> https://calendar.google.com/event?action=TEMPLATE&tmeid=NTdja2VwMDFyYm9nNjNyc25pdXU5bm8wb3FfMjAyMjA3MTRUMDgwMDAwWiBlZXNwb3NpdEByZWRoYXQuY29t&tmsrc=eesposit%40redhat.com&scp=ALL
> 
> One huge blocker we are having is removing the AioContext from the block
> API (bdrv_* and friends).
> I identified two initial and main candidates that need to lose the
> aiocontext protection:
> - bdrv_replace_child_noperm
> - bdtv_try_set_aio_context
> 
> When these two functions can safely run without AioContext lock, then we
> are getting rid of the majority of its usage.
> The main issue is: what can we use as replacement?
> 
> Let's analyze bdrv_replace_child_noperm (probably the toughest of the
> two): this function performs a graph modification, removing a child from
> a bs and putting it under another. It modifies the bs' ->parents and
> ->children nodes list, and it definitely needs protection because these
> lists are also read from iothreads in parallel.
> 
> Possible candidates to use as replacement:
> 
> - rwlock. With the help of Paolo, I implemented a rwlock optimized for
> many and fast readers, and few writers. Ideal for
> bdrv_replace_child_noperm. However, the problem here is that when a
> writer has to wait other readers to finish (since it has exclusive
> access), it should call a nested event loop to allow others (reader
> included) to progress.
> And this brings us into serious complications, because polling with a
> wlock taken is prone to a lot of deadlocks, including the fact that the
> AioContext lock is still needed in AIO_WAIT_WHILE. The solution would be
> to run everything, readers included, in coroutines. However, this is not
> easy either: long story short, switching BlockDriverState callbacks to
> coroutines is a big problem, as the AioContext lock is still being taken
> in many of the callbacks caller and therefore switching from a coroutine
> creates a mixture of locks taken that simply results in deadlocks.
> Ideally we want to first get rid of the AioContext lock and then switch
> to coroutines, but that's the whole point of the rwlock.
> More on this here:
> https://patchew.org/QEMU/20220426085114.199647-1-eespo...@redhat.com/#cc5e12d1-d25f-d338-bff2-0d3f5cc0d...@redhat.com

This is also very useful (on the same thread as above):
https://patchew.org/QEMU/20220426085114.199647-1-eespo...@redhat.com/#6fc3e40e-7682-b9dc-f789-3ca95e043...@redhat.com

> 
> But I would say this is not an ideal candidate to replace the AioContext
> lock. At least not in the immediate future.
> 
> - drains. This was the initial and still main lead. Using
> bdrv_drained_begin/end we are sure that a node and all its parents will
> be paused (included jobs), no io will further come since it will be
> temporarily disabled and all processing requests are ensured to be
> finished by the end bdrv_drained_begin returns.
> Even better than bdrv_drained, I proposed using
> bdrv_subtree_drained_begin, which also stops and protects the child of a
> node.
> I think the major drawback of this is that we need to be sure that there
> are no cases where drains is not enough. Together with Kevin and Stefan
> we identified that we need to prevent drain to be called in coroutines,
> regardless on which AioContext they are run. That's because they could
> allow parallel drain/graph reading to happen, for example (thinking
> about the general case) a coroutine yielding after calling drain_begin
> and in the middle of a graph modification could allow another coroutine
> to drain/read the graph.
> Note that draining itself also involves reading the graph too.
> 
> We thought the only usage of coroutines draining was in mirror run()
> callback. However, that is just the tip of the iceberg.
> Other functions like .bdrv_open callbacks (like qcow2_open) take care of
> creating coroutines to execute p

Re: [PATCH] hw/riscv: virt: pass random seed to fdt

2022-07-08 Thread Jason A. Donenfeld
Hi Alistair,

On 7/8/22, Alistair Francis  wrote:

>> > but I think that's just the way things go unfortunately.
>
> Hmm... That's a pain. So there is a bug in older kernels where they
> won't boot if we specify this?
>
> Can you point to the fixes?

Actually, in trying to reproduce this, I don't think this is
affected by those old random.c bugs.


>> As for your 5.8 issue, I've been trying to reproduce that to understand
>> more about it, but I'm unable to. I've been trying with
>> nommu_virt_defconfig using my patch ontop of qemu master. Maybe it's
>> possible in testing this out you were testing the wrong branch? Anyway,
>> it'd be nice to get this queued up...
>
> Hmm... you can't reproduce it?

No, I can't, and I'm now no longer convinced that there *is* a bug.
Can you try to repro again and send me detailed reproduction steps?

Thanks,
Jason



Re: [PATCH] tests: migration-test: Allow test to run without uffd

2022-07-08 Thread Daniel P . Berrangé
On Thu, Jul 07, 2022 at 02:46:00PM -0400, Peter Xu wrote:
> We used to stop running all tests if uffd is not detected.  However
> logically that's only needed for postcopy not the rest of tests.
> 
> Keep running the rest when still possible.
> 
> Signed-off-by: Peter Xu 
> ---
>  tests/qtest/migration-test.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




Re: [RFC PATCH v9 03/23] vdpa: delay set_vring_ready after DRIVER_OK

2022-07-08 Thread Eugenio Perez Martin
On Fri, Jul 8, 2022 at 11:06 AM Jason Wang  wrote:
>
> On Thu, Jul 7, 2022 at 2:40 AM Eugenio Pérez  wrote:
> >
> > To restore the device in the destination of a live migration we send the
> > commands through control virtqueue. For a device to read CVQ it must
> > have received DRIVER_OK status bit.
> >
> > However this open a window where the device could start receiving
> > packets in rx queue 0 before it receive the RSS configuration. To avoid
> > that, we will not send vring_enable until all configuration is used by
> > the device.
> >
> > As a first step, reverse the DRIVER_OK and SET_VRING_ENABLE steps.
>
> I wonder if it's better to delay this to the series that implements
> migration since the shadow cvq doesn't depends on this?
>
> >
> > Signed-off-by: Eugenio Pérez 
> > ---
> >  hw/virtio/vhost-vdpa.c | 22 ++++++++++++++++------
> >  1 file changed, 16 insertions(+), 6 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 66f054a12c..2ee8009594 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -728,13 +728,18 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev 
> > *dev, int idx)
> >  return idx;
> >  }
> >
> > +/**
> > + * Set ready all vring of the device
> > + *
> > + * @dev: Vhost device
> > + */
> >  static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> >  {
> >  int i;
> >  trace_vhost_vdpa_set_vring_ready(dev);
> > -for (i = 0; i < dev->nvqs; ++i) {
> > +for (i = 0; i < dev->vq_index_end; ++i) {
> >  struct vhost_vring_state state = {
> > -.index = dev->vq_index + i,
> > +.index = i,
>
> Looks like a cleanup or bugfix which deserves a separate patch?
>
> >  .num = 1,
> >  };
> >  vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
> > @@ -1097,7 +1102,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev 
> > *dev, bool started)
> >  if (unlikely(!ok)) {
> >  return -1;
> >  }
> > -vhost_vdpa_set_vring_ready(dev);
> >  } else {
> >  ok = vhost_vdpa_svqs_stop(dev);
> >  if (unlikely(!ok)) {
> > @@ -1111,16 +1115,22 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >  }
> >
> >  if (started) {
> > +int r;
> > +
> >  memory_listener_register(&v->listener, &address_space_memory);
> > -return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > +r = vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > +if (unlikely(r)) {
> > +return r;
> > +}
> > +vhost_vdpa_set_vring_ready(dev);
>
> Interesting, does this mean we only enable the last two queues without
> this patch?
>

The function vhost_vdpa_set_vring_ready is changed in this patch.
Instead of enabling only the vrings belonging to this vhost_dev, it
enables all the vrings of the device, from 0 to dev->vq_index_end.

In the case of networking, vq_index_end changes depending on whether
CVQ and MQ have been negotiated, so we should be safe here.
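
(Concretely: for a net device with n data queue pairs plus CVQ,
vq_index_end would be 2n + 1 when MQ is negotiated and 3 otherwise.)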

Based on your comments it's clear that this is an unexpected change
and I need to add that description to the patch message :).

Thanks!




Re: [RFC PATCH v9 03/23] vdpa: delay set_vring_ready after DRIVER_OK

2022-07-08 Thread Eugenio Perez Martin
On Fri, Jul 8, 2022 at 11:06 AM Jason Wang  wrote:
>
> On Thu, Jul 7, 2022 at 2:40 AM Eugenio Pérez  wrote:
> >
> > To restore the device in the destination of a live migration we send the
> > commands through control virtqueue. For a device to read CVQ it must
> > have received DRIVER_OK status bit.
> >
> > However this open a window where the device could start receiving
> > packets in rx queue 0 before it receive the RSS configuration. To avoid
> > that, we will not send vring_enable until all configuration is used by
> > the device.
> >
> > As a first step, reverse the DRIVER_OK and SET_VRING_ENABLE steps.
>
> I wonder if it's better to delay this to the series that implements
> migration since the shadow cvq doesn't depends on this?
>

(Forgot to add) this series is already capable of doing migration with
CVQ. It's just that it must use SVQ from the moment the source VM
boots up, which is far from ideal.

Thanks!




Re: [RFC PATCH v9 04/23] vhost: Get vring base from vq, not svq

2022-07-08 Thread Eugenio Perez Martin
On Fri, Jul 8, 2022 at 11:12 AM Jason Wang  wrote:
>
> On Thu, Jul 7, 2022 at 2:40 AM Eugenio Pérez  wrote:
> >
> > The used idx used to match with this, but it will not match from the
> > moment we introduce svq_inject.
>
> It might be better to explain what "svq_inject" means here.
>

Good point, I'll change for the next version.

> > Rewind all the descriptors not used by
> > vdpa device and get the vq state properly.
> >
> > Signed-off-by: Eugenio Pérez 
> > ---
> >  include/hw/virtio/virtio.h | 1 +
> > >  hw/virtio/vhost-vdpa.c | 7 +++----
> > >  hw/virtio/virtio.c | 5 +++++
> >  3 files changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > index db1c0ddf6b..4b51ab9d06 100644
> > --- a/include/hw/virtio/virtio.h
> > +++ b/include/hw/virtio/virtio.h
> > @@ -302,6 +302,7 @@ hwaddr virtio_queue_get_desc_size(VirtIODevice *vdev, 
> > int n);
> >  hwaddr virtio_queue_get_avail_size(VirtIODevice *vdev, int n);
> >  hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n);
> >  unsigned int virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n);
> > +unsigned int virtio_queue_get_in_use(const VirtQueue *vq);
> >  void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n,
> >   unsigned int idx);
> >  void virtio_queue_restore_last_avail_idx(VirtIODevice *vdev, int n);
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 2ee8009594..de76128030 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -1189,12 +1189,10 @@ static int vhost_vdpa_get_vring_base(struct 
> > vhost_dev *dev,
> > struct vhost_vring_state *ring)
> >  {
> >  struct vhost_vdpa *v = dev->opaque;
> > -int vdpa_idx = ring->index - dev->vq_index;
> >  int ret;
> >
> >  if (v->shadow_vqs_enabled) {
> > -VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, 
> > vdpa_idx);
> > -
> > +const VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index);
> >  /*
> >   * Setting base as last used idx, so destination will see as 
> > available
> >   * all the entries that the device did not use, including the 
> > in-flight
> > @@ -1203,7 +1201,8 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev 
> > *dev,
> >   * TODO: This is ok for networking, but other kinds of devices 
> > might
> >   * have problems with these retransmissions.
> >   */
> > -ring->num = svq->last_used_idx;
> > +ring->num = virtio_queue_get_last_avail_idx(dev->vdev, 
> > ring->index) -
> > +virtio_queue_get_in_use(vq);
>
> I think we need to change the above comment as well otherwise readers
> might get confused.
>

Re-thinking this: This part has always been buggy, so this is actually
a fix. I'll tag it for next versions or, even better, send it
separately.

But the comment still holds: we cannot use the device's used idx, since
it may not match the guest-visible one. This is actually easy to
trigger if we migrate a guest many times with traffic.
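
(A concrete example with made-up numbers: if the guest-visible
last_avail_idx is 10 and 3 descriptors are still in flight, we report a
base of 7, so the destination re-exposes entries 7, 8 and 9 as available
and the device processes them again; hence the retransmission note in
the comment.)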

Maybe it's cleaner to export used_idx directly from VirtQueue? Extra
care is needed with packed vq, but SVQ does not support that yet. I
didn't want to duplicate that logic in the virtio ring handling.

> I wonder why we need to bother at this time. Is this an issue for
> networking devices?

Every device has this issue when migrating as soon as the device's
used index is not the same as the guest's one.

> And for block device, it's not sufficient since
> there's no guarantee that the descriptor is handled in order?
>

Right, that part still hold here.

Thanks!




Re: Support for Gaisler multicore LEONx SoCs

2022-07-08 Thread Frederic Konrad
Hi Gregg,

AFAIK the leon3-generic machine can emulate the GR712RC with some small
differences in the memory map and/or timer/CPU count. (You should be
able to boot the Gaisler monocore Linux with it.)

About the SMP support, AdaCore had a few patches for it; I'll let Fabien
answer.

Regards,
Fred

On Thursday, July 7, 2022 at 22:30:46 UTC+2, Peter Maydell wrote:

On Thu, 7 Jul 2022 at 20:54, Gregg Allison wrote:

> We are considering the Gaisler GR712RC (2 core LEON3) and GR740 (4 core
> LEON4) SoCs for a new deep space mission.
>
> Does QEMU support these two multicore configurations at present? Is
> there an effort planned to provide multicore LEONx emulation?

I've cc'd the people listed in MAINTAINERS for Leon, but as far as I can
see there have been no Leon-related commits for a few years, so I don't
think this area of QEMU is being actively developed. We currently have
LEON2 and LEON3 CPU support, and one machine type, the "Leon-3 generic"
machine.

thanks
-- PMM

Re: [RFC PATCH] qobject: Rewrite implementation of QDict for in-order traversal

2022-07-08 Thread Markus Armbruster
Alex Bennée  writes:

> Markus Armbruster  writes:
>
>> QDict is implemented as a simple hash table of fixed size.  Observe:
>>
>> * Slow for large n.  Not sure this matters.
>>
>> * A QDict with n entries takes 4120 + n * 32 bytes on my box.  Wastes
>>   space for small n, which is a common case.
>>
>> * Order of traversal depends on the hash function and on insertion
>>   order, because it iterates first over buckets, then collision
>>   chains.
>>
>> * Special code ensures qdict_size() takes constant time.
>>
>> Replace the hash table by a linked list.  Observe:
>>
>> * Even slower for large n.  Might be bad enough to matter.
>>
>> * A QDict with n entries takes 32 + n * 24 bytes.
>>
>> * Traversal is in insertion order.
>>
>> * qdict_size() is linear in the number of entries.
>>
>> This is an experiment.  Do not commit to master as is.
>
> Did you consider just using a straight array? What is the usual size of
> a QDict - how many entries do you expect to scale to?

I like the way you think :)

Let me hazard an educated guess.

QDict's intended purpose is "JSON AST for QMP".

Output and syntactically correct input satisfy the QAPI schema.  JSON
objects correspond to a complex type in the schema (struct or union).
The number of members in the schema limits the number of members in the
JSON object ("limits" because members can be optional).

BlockDeviceInfo has 32 members.  As far as I can tell, no type has more.
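
(Plugging in the numbers from the patch description: at n = 32 the hash
table costs 4120 + 32 * 32 = 5144 bytes, the linked list 32 + 32 * 24 =
800 bytes.)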

Exception 1: the 'any' type, currently used for QOM properties

Exception 2: the "'gen': false" schema backdoor, currently used for
 device_add arguments, so basically QOM properties again

I can't be bothered to go fishing for the QOM object with the most
properties.  Could exceed 32, but exceeding it by much would surprise
me.

For incorrect input, all bets are off.  We may hand-wave this away, but
only as long as input is trusted.

QDict is used for other purposes in places.  Can't say how many keys to
expect there.  Can say I wish it wasn't.

>> The change of traversal order affects expected test output.  I updated
>> only the tests covered by "make check" so far.  I expect some more to
>> hide under tests/qemu-iotests/.
>>
>> Signed-off-by: Markus Armbruster 




[PATCH 00/22] vdpa net devices Rx filter change notification with Shadow VQ

2022-07-08 Thread Eugenio Pérez
The control virtqueue is used by the networking device to accept various
commands from the driver. It is a must for supporting advanced
configurations.

The rx filtering event is issued by qemu when the device's MAC address
changes and the previous one has not yet been queried by external agents.

Shadow VirtQueue (SVQ) already makes it possible to track the state of
virtqueues, effectively intercepting them so qemu can track which regions
of memory are dirtied by device action and need migration. However, this
does not cover the networking device state seen by the driver, which is
changed by CVQ messages such as MAC address updates from the driver.

This series uses the SVQ infrastructure to intercept the networking
control messages used by the device. This way, qemu is able to update the
VirtIONet device model and react to them. In particular, this series
enables rx filter change notification.

This is a prerequisite for a net vdpa device with CVQ live migration.
It's a stripped-down version of [1], with error paths checked and no
migration enabled.

The first patch solves a memory leak that occurs if the device manages to
trick qemu into thinking it has returned more buffers than the SVQ size.
This should not be possible, but we're a bit safer this way.

The next nine patches reorder and clean the code base so it's easier to
apply the later ones. No functional change should be noticed.

Patches 11 to 16 add an SVQ API that lets other parts of qemu interact
with it. In particular, it will be used by vhost-vdpa net to handle CVQ
messages.

Patches 17 to 19 enable the update of the virtio-net device model for
each CVQ message acknowledged by the device.

The last patches enable the x-svq parameter, forbidding device migration
since the state is not yet restored in the destination's vdpa device.
That will be added in a later series, building on this work.

Comments are welcome.

[1] 
https://patchwork.kernel.org/project/qemu-devel/cover/20220706184008.1649478-1-epere...@redhat.com/

Eugenio Pérez (22):
  vhost: Return earlier if used buffers overrun
  vhost: move descriptor translation to vhost_svq_vring_write_descs
  vdpa: Clean vhost_vdpa_dev_start(dev, false)
  virtio-net: Expose ctrl virtqueue logic
  vhost: Decouple vhost_svq_add_split from VirtQueueElement
  vhost: Reorder vhost_svq_last_desc_of_chain
  vhost: Add SVQElement
  vhost: Move last chain id to SVQ element
  vhost: Add opaque member to SVQElement
  vdpa: Small rename of error labels
  vhost: add vhost_svq_push_elem
  vhost: Add vhost_svq_inject
  vhost: add vhost_svq_poll
  vhost: Add custom used buffer callback
  vhost: Add svq avail_handler callback
  vhost: add detach SVQ operation
  vdpa: Export vhost_vdpa_dma_map and unmap calls
  vdpa: manual forward CVQ buffers
  vdpa: Buffer CVQ support on shadow virtqueue
  vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
  vdpa: Add device migration blocker
  vdpa: Add x-svq to NetdevVhostVDPAOptions

 qapi/net.json  |   9 +-
 hw/virtio/vhost-shadow-virtqueue.h |  64 +++-
 include/hw/virtio/vhost-vdpa.h |   8 +
 include/hw/virtio/virtio-net.h |   4 +
 hw/net/virtio-net.c|  84 ++---
 hw/virtio/vhost-shadow-virtqueue.c | 287 +
 hw/virtio/vhost-vdpa.c |  63 ++--
 net/vhost-vdpa.c   | 473 -
 8 files changed, 855 insertions(+), 137 deletions(-)

-- 
2.31.1





[PATCH 01/22] vhost: Return earlier if used buffers overrun

2022-07-08 Thread Eugenio Pérez
The previous code missed the avail buffer just picked from the queue,
which kept blocking the used queue forever. It is also cleaner to check
for the overrun before calling vhost_svq_get_buf.

Fixes: 100890f7cad50 ("vhost: Shadow virtqueue buffers forwarding")
Acked-by: Jason Wang 
Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 56c96ebd13..9280285435 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -405,19 +405,21 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 vhost_svq_disable_notification(svq);
 while (true) {
 uint32_t len;
-g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq, &len);
-if (!elem) {
-break;
-}
+g_autofree VirtQueueElement *elem = NULL;
 
 if (unlikely(i >= svq->vring.num)) {
 qemu_log_mask(LOG_GUEST_ERROR,
  "More than %u used buffers obtained in a %u size SVQ",
  i, svq->vring.num);
-virtqueue_fill(vq, elem, len, i);
-virtqueue_flush(vq, i);
+virtqueue_flush(vq, svq->vring.num);
 return;
 }
+
+elem = vhost_svq_get_buf(svq, &len);
+if (!elem) {
+break;
+}
+
 virtqueue_fill(vq, elem, len, i++);
 }
 
-- 
2.31.1




[PATCH 04/22] virtio-net: Expose ctrl virtqueue logic

2022-07-08 Thread Eugenio Pérez
This allows external vhost-net devices to modify the state of the
VirtIO device model once the vhost-vdpa device has acknowledged the
control commands.
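
For instance, an external caller could push one command through the
exported helper roughly like this (a sketch; in_sg/out_sg stand for
whatever iovecs the caller, e.g. the vhost-vdpa CVQ code later in this
series, has prepared):

    size_t written = virtio_net_handle_ctrl_iov(vdev, in_sg, in_num,
                                                out_sg, out_num);
    if (written > 0) {
        /* the device model processed the command and wrote the
         * virtio_net_ctrl_ack status into in_sg */
    }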

Signed-off-by: Eugenio Pérez 
---
 include/hw/virtio/virtio-net.h |  4 ++++
 hw/net/virtio-net.c| 84 --
 2 files changed, 53 insertions(+), 35 deletions(-)

diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index eb87032627..42caea0d1d 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -218,6 +218,10 @@ struct VirtIONet {
 struct EBPFRSSContext ebpf_rss;
 };
 
+size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
+  const struct iovec *in_sg, unsigned in_num,
+  const struct iovec *out_sg,
+  unsigned out_num);
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
const char *type);
 
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7ad948ee7c..53bb92c9f1 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1434,57 +1434,71 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t 
cmd,
 return VIRTIO_NET_OK;
 }
 
-static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
+size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
+  const struct iovec *in_sg, unsigned in_num,
+  const struct iovec *out_sg,
+  unsigned out_num)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
 struct virtio_net_ctrl_hdr ctrl;
 virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
-VirtQueueElement *elem;
 size_t s;
 struct iovec *iov, *iov2;
-unsigned int iov_cnt;
+
+if (iov_size(in_sg, in_num) < sizeof(status) ||
+iov_size(out_sg, out_num) < sizeof(ctrl)) {
+virtio_error(vdev, "virtio-net ctrl missing headers");
+return 0;
+}
+
+iov2 = iov = g_memdup2(out_sg, sizeof(struct iovec) * out_num);
+s = iov_to_buf(iov, out_num, 0, &ctrl, sizeof(ctrl));
+iov_discard_front(&iov, &out_num, sizeof(ctrl));
+if (s != sizeof(ctrl)) {
+status = VIRTIO_NET_ERR;
+} else if (ctrl.class == VIRTIO_NET_CTRL_RX) {
+status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, out_num);
+} else if (ctrl.class == VIRTIO_NET_CTRL_MAC) {
+status = virtio_net_handle_mac(n, ctrl.cmd, iov, out_num);
+} else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
+status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, out_num);
+} else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
+status = virtio_net_handle_announce(n, ctrl.cmd, iov, out_num);
+} else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
+status = virtio_net_handle_mq(n, ctrl.cmd, iov, out_num);
+} else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
+status = virtio_net_handle_offloads(n, ctrl.cmd, iov, out_num);
+}
+
+s = iov_from_buf(in_sg, in_num, 0, &status, sizeof(status));
+assert(s == sizeof(status));
+
+g_free(iov2);
+return sizeof(status);
+}
+
+static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
+{
+VirtQueueElement *elem;
 
 for (;;) {
+size_t written;
 elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
 if (!elem) {
 break;
 }
-if (iov_size(elem->in_sg, elem->in_num) < sizeof(status) ||
-iov_size(elem->out_sg, elem->out_num) < sizeof(ctrl)) {
-virtio_error(vdev, "virtio-net ctrl missing headers");
+
+written = virtio_net_handle_ctrl_iov(vdev, elem->in_sg, elem->in_num,
+ elem->out_sg, elem->out_num);
+if (written > 0) {
+virtqueue_push(vq, elem, written);
+virtio_notify(vdev, vq);
+g_free(elem);
+} else {
 virtqueue_detach_element(vq, elem, 0);
 g_free(elem);
 break;
 }
-
-iov_cnt = elem->out_num;
-iov2 = iov = g_memdup2(elem->out_sg,
-   sizeof(struct iovec) * elem->out_num);
-s = iov_to_buf(iov, iov_cnt, 0, &ctrl, sizeof(ctrl));
-iov_discard_front(&iov, &iov_cnt, sizeof(ctrl));
-if (s != sizeof(ctrl)) {
-status = VIRTIO_NET_ERR;
-} else if (ctrl.class == VIRTIO_NET_CTRL_RX) {
-status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, iov_cnt);
-} else if (ctrl.class == VIRTIO_NET_CTRL_MAC) {
-status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
-} else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
-status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
-} else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
-status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
-} else if (ctrl.class == VIRTIO_NET_CTRL_

[PATCH 07/22] vhost: Add SVQElement

2022-07-08 Thread Eugenio Pérez
This will allow SVQ to add metadata to the different queue elements. To
simplify the changes, this patch only stores the actual element.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.h |  8 --
 hw/virtio/vhost-shadow-virtqueue.c | 41 --
 2 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index c132c994e9..0b34f48037 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -15,6 +15,10 @@
 #include "standard-headers/linux/vhost_types.h"
 #include "hw/virtio/vhost-iova-tree.h"
 
+typedef struct SVQElement {
+VirtQueueElement *elem;
+} SVQElement;
+
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
 /* Shadow vring */
@@ -47,8 +51,8 @@ typedef struct VhostShadowVirtqueue {
 /* IOVA mapping */
 VhostIOVATree *iova_tree;
 
-/* Map for use the guest's descriptors */
-VirtQueueElement **ring_id_maps;
+/* Each element context */
+SVQElement *ring_id_maps;
 
 /* Next VirtQueue element that guest made available */
 VirtQueueElement *next_guest_avail_elem;
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index a4d5d7bae0..d50e1383f5 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -246,7 +246,7 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
 return false;
 }
 
-svq->ring_id_maps[qemu_head] = elem;
+svq->ring_id_maps[qemu_head].elem = elem;
 return true;
 }
 
@@ -384,15 +384,25 @@ static void vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
 svq->vring.avail->flags |= cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
 }
 
-static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
-   uint32_t *len)
+static bool vhost_svq_is_empty_elem(SVQElement elem)
+{
+return elem.elem == NULL;
+}
+
+static SVQElement vhost_svq_empty_elem(void)
+{
+return (SVQElement){};
+}
+
+static SVQElement vhost_svq_get_buf(VhostShadowVirtqueue *svq, uint32_t *len)
 {
 const vring_used_t *used = svq->vring.used;
 vring_used_elem_t used_elem;
+SVQElement svq_elem = vhost_svq_empty_elem();
 uint16_t last_used, last_used_chain, num;
 
 if (!vhost_svq_more_used(svq)) {
-return NULL;
+return svq_elem;
 }
 
 /* Only get used array entries after they have been exposed by dev */
@@ -405,24 +415,25 @@ static VirtQueueElement 
*vhost_svq_get_buf(VhostShadowVirtqueue *svq,
 if (unlikely(used_elem.id >= svq->vring.num)) {
 qemu_log_mask(LOG_GUEST_ERROR, "Device %s says index %u is used",
   svq->vdev->name, used_elem.id);
-return NULL;
+return svq_elem;
 }
 
-if (unlikely(!svq->ring_id_maps[used_elem.id])) {
+svq_elem = svq->ring_id_maps[used_elem.id];
+svq->ring_id_maps[used_elem.id] = vhost_svq_empty_elem();
+if (unlikely(vhost_svq_is_empty_elem(svq_elem))) {
 qemu_log_mask(LOG_GUEST_ERROR,
 "Device %s says index %u is used, but it was not available",
 svq->vdev->name, used_elem.id);
-return NULL;
+return svq_elem;
 }
 
-num = svq->ring_id_maps[used_elem.id]->in_num +
-  svq->ring_id_maps[used_elem.id]->out_num;
+num = svq_elem.elem->in_num + svq_elem.elem->out_num;
 last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
 svq->desc_next[last_used_chain] = svq->free_head;
 svq->free_head = used_elem.id;
 
 *len = used_elem.len;
-return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
+return svq_elem;
 }
 
 static void vhost_svq_flush(VhostShadowVirtqueue *svq,
@@ -437,6 +448,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 vhost_svq_disable_notification(svq);
 while (true) {
 uint32_t len;
+SVQElement svq_elem;
 g_autofree VirtQueueElement *elem = NULL;
 
 if (unlikely(i >= svq->vring.num)) {
@@ -447,11 +459,12 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 return;
 }
 
-elem = vhost_svq_get_buf(svq, &len);
-if (!elem) {
+svq_elem = vhost_svq_get_buf(svq, &len);
+if (vhost_svq_is_empty_elem(svq_elem)) {
 break;
 }
 
+elem = g_steal_pointer(&svq_elem.elem);
 virtqueue_fill(vq, elem, len, i++);
 }
 
@@ -594,7 +607,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, 
VirtIODevice *vdev,
 memset(svq->vring.desc, 0, driver_size);
 svq->vring.used = qemu_memalign(qemu_real_host_page_size(), device_size);
 memset(svq->vring.used, 0, device_size);
-svq->ring_id_maps = g_new0(VirtQueueElement *, svq->vring.num);
+svq->ring_id_maps = g_new0(SVQElement, svq->v

[PATCH 02/22] vhost: move descriptor translation to vhost_svq_vring_write_descs

2022-07-08 Thread Eugenio Pérez
Address translation is done for both in and out descriptors, so it is
better placed here.

Acked-by: Jason Wang 
Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.c | 38 +-
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index 9280285435..115d769b86 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -122,17 +122,35 @@ static bool vhost_svq_translate_addr(const 
VhostShadowVirtqueue *svq,
 return true;
 }
 
-static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
-const struct iovec *iovec, size_t num,
-bool more_descs, bool write)
+/**
+ * Write descriptors to SVQ vring
+ *
+ * @svq: The shadow virtqueue
+ * @sg: Cache for hwaddr
+ * @iovec: The iovec from the guest
+ * @num: iovec length
+ * @more_descs: True if more descriptors come in the chain
+ * @write: True if they are writeable descriptors
+ *
+ * Return true on success, false otherwise and print an error.
+ */
+static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
+const struct iovec *iovec, size_t num,
+bool more_descs, bool write)
 {
 uint16_t i = svq->free_head, last = svq->free_head;
 unsigned n;
 uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
 vring_desc_t *descs = svq->vring.desc;
+bool ok;
 
 if (num == 0) {
-return;
+return true;
+}
+
+ok = vhost_svq_translate_addr(svq, sg, iovec, num);
+if (unlikely(!ok)) {
+return false;
 }
 
 for (n = 0; n < num; n++) {
@@ -150,6 +168,7 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue 
*svq, hwaddr *sg,
 }
 
 svq->free_head = le16_to_cpu(svq->desc_next[last]);
+return true;
 }
 
 static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
@@ -169,21 +188,18 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
 return false;
 }
 
-ok = vhost_svq_translate_addr(svq, sgs, elem->out_sg, elem->out_num);
+ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
+ elem->in_num > 0, false);
 if (unlikely(!ok)) {
 return false;
 }
-vhost_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
-elem->in_num > 0, false);
-
 
-ok = vhost_svq_translate_addr(svq, sgs, elem->in_sg, elem->in_num);
+ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, 
false,
+ true);
 if (unlikely(!ok)) {
 return false;
 }
 
-vhost_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false, true);
-
 /*
  * Put the entry in the available array (but don't update avail->idx until
  * they do sync).
-- 
2.31.1




[PATCH 05/22] vhost: Decouple vhost_svq_add_split from VirtQueueElement

2022-07-08 Thread Eugenio Pérez
VirtQueueElement comes from the guest, but we are heading towards SVQ
being able to inject elements without the guest's knowledge.

To do so, make this accept sg buffers directly, instead of using
VirtQueueElement.

Add vhost_svq_add_element to keep the VirtQueueElement convenience.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.c | 38 +-
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index 115d769b86..2d70f832e9 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -172,30 +172,32 @@ static bool 
vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
 }
 
 static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
-VirtQueueElement *elem, unsigned *head)
+const struct iovec *out_sg, size_t out_num,
+const struct iovec *in_sg, size_t in_num,
+unsigned *head)
 {
 unsigned avail_idx;
 vring_avail_t *avail = svq->vring.avail;
 bool ok;
-g_autofree hwaddr *sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
+g_autofree hwaddr *sgs = NULL;
 
 *head = svq->free_head;
 
 /* We need some descriptors here */
-if (unlikely(!elem->out_num && !elem->in_num)) {
+if (unlikely(!out_num && !in_num)) {
 qemu_log_mask(LOG_GUEST_ERROR,
   "Guest provided element with no descriptors");
 return false;
 }
 
-ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
- elem->in_num > 0, false);
+sgs = g_new(hwaddr, MAX(out_num, in_num));
+ok = vhost_svq_vring_write_descs(svq, sgs, out_sg, out_num, in_num > 0,
+ false);
 if (unlikely(!ok)) {
 return false;
 }
 
-ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, 
false,
- true);
+ok = vhost_svq_vring_write_descs(svq, sgs, in_sg, in_num, false, true);
 if (unlikely(!ok)) {
 return false;
 }
@@ -222,10 +224,13 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
 * takes ownership of the element: In case of failure, it is freed and the SVQ
  * is considered broken.
  */
-static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
+static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec 
*out_sg,
+  size_t out_num, const struct iovec *in_sg,
+  size_t in_num, VirtQueueElement *elem)
 {
 unsigned qemu_head;
-bool ok = vhost_svq_add_split(svq, elem, &qemu_head);
+bool ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num,
+  &qemu_head);
 if (unlikely(!ok)) {
 g_free(elem);
 return false;
@@ -249,6 +254,18 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
 event_notifier_set(&svq->hdev_kick);
 }
 
+static bool vhost_svq_add_element(VhostShadowVirtqueue *svq,
+  VirtQueueElement *elem)
+{
+bool ok = vhost_svq_add(svq, elem->out_sg, elem->out_num, elem->in_sg,
+elem->in_num, elem);
+if (ok) {
+vhost_svq_kick(svq);
+}
+
+return ok;
+}
+
 /**
  * Forward available buffers.
  *
@@ -301,12 +318,11 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue 
*svq)
 return;
 }
 
-ok = vhost_svq_add(svq, elem);
+ok = vhost_svq_add_element(svq, g_steal_pointer(&elem));
 if (unlikely(!ok)) {
 /* VQ is broken, just return and ignore any other kicks */
 return;
 }
-vhost_svq_kick(svq);
 }
 
 virtio_queue_set_notification(svq->vq, true);
-- 
2.31.1
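
A usage sketch of the decoupled interface (not part of the patch; buffer
names are hypothetical): a caller with plain sg buffers, and no guest
VirtQueueElement describing them, can now make them available directly.
At this point in the series the element is still passed for ownership,
and SVQ frees it on failure.

    struct iovec out = { .iov_base = cmd_buf, .iov_len = cmd_len };
    struct iovec in = { .iov_base = status_buf, .iov_len = status_len };

    if (!vhost_svq_add(svq, &out, 1, &in, 1, elem)) {
        return; /* SVQ is broken; elem has already been freed */
    }
    vhost_svq_kick(svq); /* notify the device */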




[PATCH 09/22] vhost: Add opaque member to SVQElement

2022-07-08 Thread Eugenio Pérez
When qemu injects buffers into the vdpa device, this member will be used
to maintain contextual data. If SVQ has no custom operation, it will be
used to maintain the VirtQueueElement pointer.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.h |  3 ++-
 hw/virtio/vhost-shadow-virtqueue.c | 13 +++--
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
b/hw/virtio/vhost-shadow-virtqueue.h
index 5646d875cb..3e1bea12ca 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -16,7 +16,8 @@
 #include "hw/virtio/vhost-iova-tree.h"
 
 typedef struct SVQElement {
-VirtQueueElement *elem;
+/* Opaque data */
+void *opaque;
 
 /* Last descriptor of the chain */
 uint32_t last_chain_id;
diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index 635b6b359f..01caa5887e 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -236,7 +236,7 @@ static uint16_t vhost_svq_last_desc_of_chain(const 
VhostShadowVirtqueue *svq,
  */
 static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec 
*out_sg,
   size_t out_num, const struct iovec *in_sg,
-  size_t in_num, VirtQueueElement *elem)
+  size_t in_num, void *opaque)
 {
 SVQElement *svq_elem;
 unsigned qemu_head;
@@ -244,13 +244,12 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, 
const struct iovec *out_sg,
 bool ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num,
   &qemu_head);
 if (unlikely(!ok)) {
-g_free(elem);
 return false;
 }
 
 n = out_num + in_num;
 svq_elem = &svq->ring_id_maps[qemu_head];
-svq_elem->elem = elem;
+svq_elem->opaque = opaque;
 svq_elem->last_chain_id = vhost_svq_last_desc_of_chain(svq, n, qemu_head);
 return true;
 }
@@ -276,6 +275,8 @@ static bool vhost_svq_add_element(VhostShadowVirtqueue *svq,
 elem->in_num, elem);
 if (ok) {
 vhost_svq_kick(svq);
+} else {
+g_free(elem);
 }
 
 return ok;
@@ -391,7 +392,7 @@ static void 
vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
 
 static bool vhost_svq_is_empty_elem(SVQElement elem)
 {
-return elem.elem == NULL;
+return elem.opaque == NULL;
 }
 
 static SVQElement vhost_svq_empty_elem(void)
@@ -466,7 +467,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 break;
 }
 
-elem = g_steal_pointer(&svq_elem.elem);
+elem = g_steal_pointer(&svq_elem.opaque);
 virtqueue_fill(vq, elem, len, i++);
 }
 
@@ -634,7 +635,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
 
 for (unsigned i = 0; i < svq->vring.num; ++i) {
 g_autofree VirtQueueElement *elem = NULL;
-elem = g_steal_pointer(&svq->ring_id_maps[i].elem);
+elem = g_steal_pointer(&svq->ring_id_maps[i].opaque);
 if (elem) {
 virtqueue_detach_element(svq->vq, elem, 0);
 }
-- 
2.31.1
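
A sketch of what the opaque member permits (hypothetical caller-defined
context type; SVQ itself treats it as a plain pointer and hands it back
at used/stop time):

    typedef struct MyCvqCtx {
        void *bounce_buf; /* qemu-VA copy of the guest buffer */
        size_t len;
    } MyCvqCtx;

    MyCvqCtx *ctx = g_new0(MyCvqCtx, 1);
    ...
    /* ctx is stored in SVQElement.opaque instead of a VirtQueueElement */
    ok = vhost_svq_add(svq, out_sg, out_num, in_sg, in_num, ctx);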




[PATCH 03/22] vdpa: Clean vhost_vdpa_dev_start(dev, false)

2022-07-08 Thread Eugenio Pérez
The return value is never checked and this is a cleanup path, so assume
success.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-vdpa.c | 33 ++---
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 66f054a12c..d6ba4a492a 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -872,41 +872,35 @@ static int vhost_vdpa_svq_set_fds(struct vhost_dev *dev,
 /**
  * Unmap a SVQ area in the device
  */
-static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v,
+static void vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v,
   const DMAMap *needle)
 {
 const DMAMap *result = vhost_iova_tree_find_iova(v->iova_tree, needle);
 hwaddr size;
-int r;
 
 if (unlikely(!result)) {
 error_report("Unable to find SVQ address to unmap");
-return false;
+return;
 }
 
 size = ROUND_UP(result->size, qemu_real_host_page_size());
-r = vhost_vdpa_dma_unmap(v, result->iova, size);
-return r == 0;
+vhost_vdpa_dma_unmap(v, result->iova, size);
 }
 
-static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
+static void vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
const VhostShadowVirtqueue *svq)
 {
 DMAMap needle = {};
 struct vhost_vdpa *v = dev->opaque;
 struct vhost_vring_addr svq_addr;
-bool ok;
 
 vhost_svq_get_vring_addr(svq, &svq_addr);
 
 needle.translated_addr = svq_addr.desc_user_addr;
-ok = vhost_vdpa_svq_unmap_ring(v, &needle);
-if (unlikely(!ok)) {
-return false;
-}
+vhost_vdpa_svq_unmap_ring(v, &needle);
 
 needle.translated_addr = svq_addr.used_user_addr;
-return vhost_vdpa_svq_unmap_ring(v, &needle);
+vhost_vdpa_svq_unmap_ring(v, &needle);
 }
 
 /**
@@ -1066,23 +1060,19 @@ err:
 return false;
 }
 
-static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev)
+static void vhost_vdpa_svqs_stop(struct vhost_dev *dev)
 {
 struct vhost_vdpa *v = dev->opaque;
 
 if (!v->shadow_vqs) {
-return true;
+return;
 }
 
 for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
 VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
-bool ok = vhost_vdpa_svq_unmap_rings(dev, svq);
-if (unlikely(!ok)) {
-return false;
-}
+vhost_vdpa_svq_unmap_rings(dev, svq);
 }
 
-return true;
 }
 
 static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
@@ -1099,10 +1089,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, 
bool started)
 }
 vhost_vdpa_set_vring_ready(dev);
 } else {
-ok = vhost_vdpa_svqs_stop(dev);
-if (unlikely(!ok)) {
-return -1;
-}
+vhost_vdpa_svqs_stop(dev);
 vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
 }
 
-- 
2.31.1




[PATCH 12/22] vhost: Add vhost_svq_inject

2022-07-08 Thread Eugenio Pérez
This allows qemu to inject buffers into the device.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.h |  2 ++
 hw/virtio/vhost-shadow-virtqueue.c | 37 ++
 2 files changed, 39 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
b/hw/virtio/vhost-shadow-virtqueue.h
index 855fa82e3e..09b87078af 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -84,6 +84,8 @@ bool vhost_svq_valid_features(uint64_t features, Error 
**errp);
 
 void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
  const VirtQueueElement *elem, uint32_t len);
+int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
+ size_t out_num, size_t in_num, void *opaque);
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
 void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index 2b0a268655..4d59954f1b 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -282,6 +282,43 @@ static bool vhost_svq_add_element(VhostShadowVirtqueue 
*svq,
 return ok;
 }
 
+/**
+ * Inject a chain of buffers to the device
+ *
+ * @svq: Shadow VirtQueue
+ * @iov: I/O vector
+ * @out_num: Number of leading device-readable (out) descriptors
+ * @in_num: Number of trailing device-writable (in) descriptors
+ * @opaque: Contextual data to store in descriptor
+ *
+ * Return 0 on success, -ENOMEM if cannot inject
+ */
+int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
+ size_t out_num, size_t in_num, void *opaque)
+{
+bool ok;
+size_t num = out_num + in_num;
+
+/*
+ * All vhost_svq_inject calls are controlled by qemu so we won't hit this
+ * assertion.
+ */
+assert(out_num || in_num);
+
+if (unlikely(num > vhost_svq_available_slots(svq))) {
+error_report("Injecting in a full queue");
+return -ENOMEM;
+}
+
+ok = vhost_svq_add(svq, iov, out_num, iov + out_num, in_num, opaque);
+if (unlikely(!ok)) {
+return -EINVAL;
+}
+
+vhost_svq_kick(svq);
+return 0;
+}
+
 /**
  * Forward available buffers.
  *
-- 
2.31.1
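
A minimal usage sketch, assuming a contiguous iovec array with the
device-readable buffer first (the real user is the CVQ code later in
this series; buffer names are hypothetical):

    struct iovec iov[] = {
        { .iov_base = out_buf, .iov_len = out_len }, /* device-readable */
        { .iov_base = in_buf,  .iov_len = in_len  }, /* device-writable */
    };
    int r = vhost_svq_inject(svq, iov, 1, 1, opaque);
    if (unlikely(r != 0)) {
        /* -ENOMEM: queue was full; -EINVAL: SVQ is broken */
    }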




[PATCH 08/22] vhost: Move last chain id to SVQ element

2022-07-08 Thread Eugenio Pérez
We will allow the SVQ user to store opaque data for each element, so it
is easier if we store this kind of information at avail time.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.h |  3 +++
 hw/virtio/vhost-shadow-virtqueue.c | 14 --
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
b/hw/virtio/vhost-shadow-virtqueue.h
index 0b34f48037..5646d875cb 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -17,6 +17,9 @@
 
 typedef struct SVQElement {
 VirtQueueElement *elem;
+
+/* Last descriptor of the chain */
+uint32_t last_chain_id;
 } SVQElement;
 
 /* Shadow virtqueue to relay notifications */
diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index d50e1383f5..635b6b359f 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -238,7 +238,9 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, const 
struct iovec *out_sg,
   size_t out_num, const struct iovec *in_sg,
   size_t in_num, VirtQueueElement *elem)
 {
+SVQElement *svq_elem;
 unsigned qemu_head;
+size_t n;
 bool ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num,
   &qemu_head);
 if (unlikely(!ok)) {
@@ -246,7 +248,10 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, const 
struct iovec *out_sg,
 return false;
 }
 
-svq->ring_id_maps[qemu_head].elem = elem;
+n = out_num + in_num;
+svq_elem = &svq->ring_id_maps[qemu_head];
+svq_elem->elem = elem;
+svq_elem->last_chain_id = vhost_svq_last_desc_of_chain(svq, n, qemu_head);
 return true;
 }
 
@@ -399,7 +404,7 @@ static SVQElement vhost_svq_get_buf(VhostShadowVirtqueue 
*svq, uint32_t *len)
 const vring_used_t *used = svq->vring.used;
 vring_used_elem_t used_elem;
 SVQElement svq_elem = vhost_svq_empty_elem();
-uint16_t last_used, last_used_chain, num;
+uint16_t last_used;
 
 if (!vhost_svq_more_used(svq)) {
 return svq_elem;
@@ -427,11 +432,8 @@ static SVQElement vhost_svq_get_buf(VhostShadowVirtqueue 
*svq, uint32_t *len)
 return svq_elem;
 }
 
-num = svq_elem.elem->in_num + svq_elem.elem->out_num;
-last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
-svq->desc_next[last_used_chain] = svq->free_head;
+svq->desc_next[svq_elem.last_chain_id] = svq->free_head;
 svq->free_head = used_elem.id;
-
 *len = used_elem.len;
 return svq_elem;
 }
-- 
2.31.1
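
The effect, condensed from the hunks above: the O(chain length) walk of
vhost_svq_last_desc_of_chain moves from the used path to the avail path,
where qemu already owns the chain, and the used path becomes a
constant-time lookup:

    /* avail side (vhost_svq_add): computed once per chain */
    svq_elem->last_chain_id = vhost_svq_last_desc_of_chain(svq, n, qemu_head);

    /* used side (vhost_svq_get_buf): no walk needed anymore */
    svq->desc_next[svq_elem.last_chain_id] = svq->free_head;
    svq->free_head = used_elem.id;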




[PATCH 13/22] vhost: add vhost_svq_poll

2022-07-08 Thread Eugenio Pérez
It allows the Shadow Control VirtQueue to wait for the device to use the
commands that restore the net device state after a live migration.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.h |  1 +
 hw/virtio/vhost-shadow-virtqueue.c | 54 --
 2 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
b/hw/virtio/vhost-shadow-virtqueue.h
index 09b87078af..57ff97ce4f 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -86,6 +86,7 @@ void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
  const VirtQueueElement *elem, uint32_t len);
 int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
  size_t out_num, size_t in_num, void *opaque);
+ssize_t vhost_svq_poll(VhostShadowVirtqueue *svq);
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
 void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index 4d59954f1b..f4affa52ee 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -10,6 +10,8 @@
 #include "qemu/osdep.h"
 #include "hw/virtio/vhost-shadow-virtqueue.h"
 
+#include 
+
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "qemu/main-loop.h"
@@ -492,10 +494,11 @@ void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
 }
 }
 
-static void vhost_svq_flush(VhostShadowVirtqueue *svq,
-bool check_for_avail_queue)
+static size_t vhost_svq_flush(VhostShadowVirtqueue *svq,
+  bool check_for_avail_queue)
 {
 VirtQueue *vq = svq->vq;
+size_t ret = 0;
 
 /* Forward as many used buffers as possible. */
 do {
@@ -512,7 +515,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
  "More than %u used buffers obtained in a %u size SVQ",
  i, svq->vring.num);
 virtqueue_flush(vq, svq->vring.num);
-return;
+return ret;
 }
 
 svq_elem = vhost_svq_get_buf(svq, &len);
@@ -522,6 +525,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 
 elem = g_steal_pointer(&svq_elem.opaque);
 virtqueue_fill(vq, elem, len, i++);
+ret++;
 }
 
 virtqueue_flush(vq, i);
@@ -535,6 +539,50 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 vhost_handle_guest_kick(svq);
 }
 } while (!vhost_svq_enable_notification(svq));
+
+return ret;
+}
+
+/**
+ * Poll the SVQ for device used buffers.
+ *
+ * This function races with the main event loop SVQ polling, so extra
+ * synchronization is needed.
+ *
+ * Return the number of descriptors read from the device.
+ */
+ssize_t vhost_svq_poll(VhostShadowVirtqueue *svq)
+{
+int fd = event_notifier_get_fd(&svq->hdev_call);
+GPollFD poll_fd = {
+.fd = fd,
+.events = G_IO_IN,
+};
+assert(fd >= 0);
+int r = g_poll(&poll_fd, 1, -1);
+
+if (unlikely(r < 0)) {
+error_report("Cannot poll device call fd "G_POLLFD_FORMAT": (%d) %s",
+ poll_fd.fd, errno, g_strerror(errno));
+return -errno;
+}
+
+if (r == 0) {
+return 0;
+}
+
+if (unlikely(poll_fd.revents & ~(G_IO_IN))) {
+error_report(
+"Error polling device call fd "G_POLLFD_FORMAT": revents=%d",
+poll_fd.fd, poll_fd.revents);
+return -1;
+}
+
+/*
+ * Max return value of vhost_svq_flush is (uint16_t)-1, so it's safe to
+ * convert to ssize_t.
+ */
+return vhost_svq_flush(svq, false);
 }
 
 /**
-- 
2.31.1
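
Combined with vhost_svq_inject, this gives a simple synchronous pattern
for control commands (a sketch; it assumes the caller is not racing with
the event loop handler, as the comment above requires, and the error
mapping is illustrative):

    int r = vhost_svq_inject(svq, iov, 1, 1, opaque);
    if (unlikely(r != 0)) {
        return r;
    }

    /* block until the device marks the injected buffers as used */
    ssize_t used = vhost_svq_poll(svq);
    if (unlikely(used < 1)) {
        return -EIO; /* illustrative error code */
    }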




[PATCH 06/22] vhost: Reorder vhost_svq_last_desc_of_chain

2022-07-08 Thread Eugenio Pérez
SVQ is going to store it in SVQElement, so we need it defined before the
add functions.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index 2d70f832e9..a4d5d7bae0 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -217,6 +217,16 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
 return true;
 }
 
+static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
+ uint16_t num, uint16_t i)
+{
+for (uint16_t j = 0; j < (num - 1); ++j) {
+i = le16_to_cpu(svq->desc_next[i]);
+}
+
+return i;
+}
+
 /**
  * Add an element to a SVQ.
  *
@@ -374,16 +384,6 @@ static void 
vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
 svq->vring.avail->flags |= cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
 }
 
-static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
- uint16_t num, uint16_t i)
-{
-for (uint16_t j = 0; j < (num - 1); ++j) {
-i = le16_to_cpu(svq->desc_next[i]);
-}
-
-return i;
-}
-
 static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
uint32_t *len)
 {
-- 
2.31.1




[PATCH 10/22] vdpa: Small rename of error labels

2022-07-08 Thread Eugenio Pérez
So that later patches are cleaner.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-vdpa.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index d6ba4a492a..fccfc832ea 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1024,7 +1024,7 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
 int r;
 bool ok = vhost_vdpa_svq_setup(dev, svq, i, &err);
 if (unlikely(!ok)) {
-goto err;
+goto err_svq_setup;
 }
 
 vhost_svq_start(svq, dev->vdev, vq);
@@ -1049,8 +1049,7 @@ err_set_addr:
 err_map:
 vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, i));
 
-err:
-error_reportf_err(err, "Cannot setup SVQ %u: ", i);
+err_svq_setup:
 for (unsigned j = 0; j < i; ++j) {
 VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, j);
 vhost_vdpa_svq_unmap_rings(dev, svq);
-- 
2.31.1




[PATCH 11/22] vhost: add vhost_svq_push_elem

2022-07-08 Thread Eugenio Pérez
This function allows external SVQ users to return the guest's available
buffers.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.h |  2 ++
 hw/virtio/vhost-shadow-virtqueue.c | 16 
 2 files changed, 18 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
b/hw/virtio/vhost-shadow-virtqueue.h
index 3e1bea12ca..855fa82e3e 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -82,6 +82,8 @@ typedef struct VhostShadowVirtqueue {
 
 bool vhost_svq_valid_features(uint64_t features, Error **errp);
 
+void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
+ const VirtQueueElement *elem, uint32_t len);
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
 void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index 01caa5887e..2b0a268655 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -439,6 +439,22 @@ static SVQElement vhost_svq_get_buf(VhostShadowVirtqueue 
*svq, uint32_t *len)
 return svq_elem;
 }
 
+/**
+ * Push an element to SVQ, returning it to the guest.
+ */
+void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
+ const VirtQueueElement *elem, uint32_t len)
+{
+virtqueue_push(svq->vq, elem, len);
+if (svq->next_guest_avail_elem) {
+/*
+ * Avail ring was full when vhost_svq_flush was called, so it's a
+ * good moment to make more descriptors available if possible.
+ */
+vhost_handle_guest_kick(svq);
+}
+}
+
 static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 bool check_for_avail_queue)
 {
-- 
2.31.1
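
A sketch of the intended use from a device-used callback, returning a
control status buffer to the guest (this mirrors the net CVQ user added
later in the series; the callback type itself also arrives later):

    static void example_handle_ctrl_used(VhostShadowVirtqueue *svq,
                                         void *opaque, uint32_t written)
    {
        g_autofree VirtQueueElement *elem = opaque;

        vhost_svq_push_elem(svq, elem, sizeof(virtio_net_ctrl_ack));
    }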




[PATCH 17/22] vdpa: Export vhost_vdpa_dma_map and unmap calls

2022-07-08 Thread Eugenio Pérez
Shadow CVQ will copy buffers into qemu's VA, so we avoid TOCTOU attacks
that could set a different state in the qemu device model and the vdpa
device.

Signed-off-by: Eugenio Pérez 
---
 include/hw/virtio/vhost-vdpa.h | 4 
 hw/virtio/vhost-vdpa.c | 7 +++
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index a29dbb3f53..7214eb47dc 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -39,4 +39,8 @@ typedef struct vhost_vdpa {
 VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
 
+int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
+   void *vaddr, bool readonly);
+int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size);
+
 #endif
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 9a4f00c114..7d2922ccbf 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -71,8 +71,8 @@ static bool 
vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
 return false;
 }
 
-static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
-  void *vaddr, bool readonly)
+int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
+   void *vaddr, bool readonly)
 {
 struct vhost_msg_v2 msg = {};
 int fd = v->device_fd;
@@ -97,8 +97,7 @@ static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr 
iova, hwaddr size,
 return ret;
 }
 
-static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
-hwaddr size)
+int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size)
 {
 struct vhost_msg_v2 msg = {};
 int fd = v->device_fd;
-- 
2.31.1
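
A sketch of the now-exported calls from outside vhost-vdpa.c, assuming
an IOVA is allocated from the IOVA tree first (the CVQ patches later in
the series follow this pattern; names are illustrative):

    DMAMap map = {
        .translated_addr = (hwaddr)(uintptr_t)buf,
        .size = buf_len - 1,
        .perm = write ? IOMMU_RW : IOMMU_RO,
    };
    if (vhost_iova_tree_map_alloc(v->iova_tree, &map) != IOVA_OK) {
        return;
    }

    r = vhost_vdpa_dma_map(v, map.iova, buf_len, buf, !write);
    if (unlikely(r != 0)) {
        error_report("Cannot map buffer: %s(%d)", g_strerror(r), r);
    }
    ...
    /* and the counterpart on cleanup */
    vhost_vdpa_dma_unmap(v, map.iova, buf_len);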




[PATCH 16/22] vhost: add detach SVQ operation

2022-07-08 Thread Eugenio Pérez
To notify the caller that it needs to discard the element.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.h | 11 +++
 hw/virtio/vhost-shadow-virtqueue.c | 11 ++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
b/hw/virtio/vhost-shadow-virtqueue.h
index cfc891e2e8..dc0059adc6 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -44,9 +44,20 @@ typedef void (*VirtQueueUsedCallback)(VhostShadowVirtqueue 
*svq,
   void *used_elem_opaque,
   uint32_t written);
 
+/**
+ * Detach the element from the shadow virtqueue.  SVQ needs to free it and it
+ * can no longer be pushed or discarded.
+ *
+ * @elem_opaque: The element opaque
+ *
+ * Return the guest element to detach and free if any.
+ */
+typedef VirtQueueElement *(*VirtQueueDetachCallback)(void *elem_opaque);
+
 typedef struct VhostShadowVirtqueueOps {
 VirtQueueAvailCallback avail_handler;
 VirtQueueUsedCallback used_handler;
+VirtQueueDetachCallback detach_handler;
 } VhostShadowVirtqueueOps;
 
 /* Shadow virtqueue to relay notifications */
diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index 78579b9e0b..626691ac4e 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -749,7 +749,16 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
 
 for (unsigned i = 0; i < svq->vring.num; ++i) {
 g_autofree VirtQueueElement *elem = NULL;
-elem = g_steal_pointer(&svq->ring_id_maps[i].opaque);
+void *opaque = g_steal_pointer(&svq->ring_id_maps[i].opaque);
+
+if (!opaque) {
+continue;
+} else if (svq->ops) {
+elem = svq->ops->detach_handler(opaque);
+} else {
+elem = opaque;
+}
+
 if (elem) {
 virtqueue_detach_element(svq->vq, elem, 0);
 }
-- 
2.31.1




[PATCH 15/22] vhost: Add svq avail_handler callback

2022-07-08 Thread Eugenio Pérez
This allows external handlers to be aware of new buffers that the guest
places in the virtqueue.

When this callback is defined, the ownership of the guest's virtqueue
element is transferred to the callback. This means that if the user
wants to forward the descriptor, it needs to manually inject it. The
callback is also free to process the command by itself and return the
element with svq_push.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.h | 23 ++-
 hw/virtio/vhost-shadow-virtqueue.c | 13 +++--
 hw/virtio/vhost-vdpa.c |  2 +-
 3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
b/hw/virtio/vhost-shadow-virtqueue.h
index 96ce7aa62e..cfc891e2e8 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -24,11 +24,28 @@ typedef struct SVQElement {
 } SVQElement;
 
 typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
+
+/**
+ * Callback to handle an avail buffer.
+ *
+ * @svq:  Shadow virtqueue
+ * @elem:  Element placed in the queue by the guest
+ * @vq_callback_opaque:  Caller-provided opaque pointer
+ *
+ * Returns true if the vq is running as expected, false otherwise.
+ *
+ * Note that ownership of elem is transferred to the callback.
+ */
+typedef bool (*VirtQueueAvailCallback)(VhostShadowVirtqueue *svq,
+   VirtQueueElement *elem,
+   void *vq_callback_opaque);
+
 typedef void (*VirtQueueUsedCallback)(VhostShadowVirtqueue *svq,
   void *used_elem_opaque,
   uint32_t written);
 
 typedef struct VhostShadowVirtqueueOps {
+VirtQueueAvailCallback avail_handler;
 VirtQueueUsedCallback used_handler;
 } VhostShadowVirtqueueOps;
 
@@ -79,6 +96,9 @@ typedef struct VhostShadowVirtqueue {
 /* Caller callbacks */
 const VhostShadowVirtqueueOps *ops;
 
+/* Caller callbacks opaque */
+void *ops_opaque;
+
 /* Next head to expose to the device */
 uint16_t shadow_avail_idx;
 
@@ -111,7 +131,8 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, 
VirtIODevice *vdev,
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
 VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
-const VhostShadowVirtqueueOps *ops);
+const VhostShadowVirtqueueOps *ops,
+void *ops_opaque);
 
 void vhost_svq_free(gpointer vq);
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index 40183f8afd..78579b9e0b 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -374,7 +374,13 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue 
*svq)
 return;
 }
 
-ok = vhost_svq_add_element(svq, g_steal_pointer(&elem));
+if (svq->ops) {
+ok = svq->ops->avail_handler(svq, g_steal_pointer(&elem),
+ svq->ops_opaque);
+} else {
+ok = vhost_svq_add_element(svq, g_steal_pointer(&elem));
+}
+
 if (unlikely(!ok)) {
 /* VQ is broken, just return and ignore any other kicks */
 return;
@@ -766,13 +772,15 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
  *
  * @iova_tree: Tree to perform descriptors translations
  * @ops: SVQ owner callbacks
+ * @ops_opaque: ops opaque pointer
  *
  * Returns the new virtqueue or NULL.
  *
  * In case of error, reason is reported through error_report.
  */
 VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
-const VhostShadowVirtqueueOps *ops)
+const VhostShadowVirtqueueOps *ops,
+void *ops_opaque)
 {
 g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
 int r;
@@ -795,6 +803,7 @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree 
*iova_tree,
 event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
 svq->iova_tree = iova_tree;
 svq->ops = ops;
+svq->ops_opaque = ops_opaque;
 return g_steal_pointer(&svq);
 
 err_init_hdev_call:
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 25f7146fe4..9a4f00c114 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -420,7 +420,7 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, 
struct vhost_vdpa *v,
 for (unsigned n = 0; n < hdev->nvqs; ++n) {
 g_autoptr(VhostShadowVirtqueue) svq;
 
-svq = vhost_svq_new(v->iova_tree, NULL);
+svq = vhost_svq_new(v->iova_tree, NULL, NULL);
 if (unlikely(!svq)) {
 error_setg(errp, "Cannot create svq %u", n);
 return -1;
-- 
2.31.1
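
A sketch of how an owner wires the callbacks at creation time, mirroring
the net CVQ user added later in the series (at this point in the series
the ops struct carries the avail and used handlers):

    static const VhostShadowVirtqueueOps example_svq_ops = {
        .avail_handler = example_handle_ctrl_avail,
        .used_handler = example_handle_ctrl_used,
    };
    ...
    svq = vhost_svq_new(v->iova_tree, &example_svq_ops, opaque);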




[PATCH 18/22] vdpa: manual forward CVQ buffers

2022-07-08 Thread Eugenio Pérez
Do a simple forwarding of CVQ buffers, the same work SVQ would do by
default but through the new callbacks. No functional change intended.

Signed-off-by: Eugenio Pérez 
---
 include/hw/virtio/vhost-vdpa.h |  3 ++
 hw/virtio/vhost-vdpa.c |  3 +-
 net/vhost-vdpa.c   | 59 ++
 3 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 7214eb47dc..d85643 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -15,6 +15,7 @@
 #include 
 
 #include "hw/virtio/vhost-iova-tree.h"
+#include "hw/virtio/vhost-shadow-virtqueue.h"
 #include "hw/virtio/virtio.h"
 #include "standard-headers/linux/vhost_types.h"
 
@@ -35,6 +36,8 @@ typedef struct vhost_vdpa {
 /* IOVA mapping used by the Shadow Virtqueue */
 VhostIOVATree *iova_tree;
 GPtrArray *shadow_vqs;
+const VhostShadowVirtqueueOps *shadow_vq_ops;
+void *shadow_vq_ops_opaque;
 struct vhost_dev *dev;
 VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 7d2922ccbf..c1162daecc 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -419,7 +419,8 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, 
struct vhost_vdpa *v,
 for (unsigned n = 0; n < hdev->nvqs; ++n) {
 g_autoptr(VhostShadowVirtqueue) svq;
 
-svq = vhost_svq_new(v->iova_tree, NULL, NULL);
+svq = vhost_svq_new(v->iova_tree, v->shadow_vq_ops,
+v->shadow_vq_ops_opaque);
 if (unlikely(!svq)) {
 error_setg(errp, "Cannot create svq %u", n);
 return -1;
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index df1e69ee72..8558ad7a01 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -11,11 +11,14 @@
 
 #include "qemu/osdep.h"
 #include "clients.h"
+#include "hw/virtio/virtio-net.h"
 #include "net/vhost_net.h"
 #include "net/vhost-vdpa.h"
 #include "hw/virtio/vhost-vdpa.h"
 #include "qemu/config-file.h"
 #include "qemu/error-report.h"
+#include "qemu/log.h"
+#include "qemu/memalign.h"
 #include "qemu/option.h"
 #include "qapi/error.h"
 #include 
@@ -187,6 +190,58 @@ static NetClientInfo net_vhost_vdpa_info = {
 .check_peer_type = vhost_vdpa_check_peer_type,
 };
 
+/**
+ * Forward the buffer for the moment.
+ */
+static bool vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
+ VirtQueueElement *guest_elem,
+ void *opaque)
+{
+g_autofree VirtQueueElement *elem = guest_elem;
+unsigned int n = elem->out_num + elem->in_num;
+g_autofree struct iovec *iov = g_new(struct iovec, n);
+size_t in_len;
+virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
+int r;
+
+memcpy(iov, elem->out_sg, elem->out_num);
+memcpy(iov + elem->out_num, elem->in_sg, elem->in_num);
+
+r = vhost_svq_inject(svq, iov, elem->out_num, elem->in_num, elem);
+if (unlikely(r != 0)) {
+goto err;
+}
+
+/* Now elem belongs to SVQ */
+g_steal_pointer(&elem);
+return true;
+
+err:
+in_len = iov_from_buf(elem->in_sg, elem->in_num, 0, &status,
+  sizeof(status));
+vhost_svq_push_elem(svq, elem, in_len);
+return true;
+}
+
+static VirtQueueElement *vhost_vdpa_net_handle_ctrl_detach(void *elem_opaque)
+{
+return elem_opaque;
+}
+
+static void vhost_vdpa_net_handle_ctrl_used(VhostShadowVirtqueue *svq,
+void *vq_elem_opaque,
+uint32_t dev_written)
+{
+g_autofree VirtQueueElement *guest_elem = vq_elem_opaque;
+vhost_svq_push_elem(svq, guest_elem, sizeof(virtio_net_ctrl_ack));
+}
+
+static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
+.avail_handler = vhost_vdpa_net_handle_ctrl_avail,
+.used_handler = vhost_vdpa_net_handle_ctrl_used,
+.detach_handler = vhost_vdpa_net_handle_ctrl_detach,
+};
+
 static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
const char *device,
const char *name,
@@ -211,6 +266,10 @@ static NetClientState *net_vhost_vdpa_init(NetClientState 
*peer,
 
 s->vhost_vdpa.device_fd = vdpa_device_fd;
 s->vhost_vdpa.index = queue_pair_index;
+if (!is_datapath) {
+s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
+s->vhost_vdpa.shadow_vq_ops_opaque = s;
+}
 ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
 if (ret) {
 qemu_del_net_client(nc);
-- 
2.31.1




[PATCH 19/22] vdpa: Buffer CVQ support on shadow virtqueue

2022-07-08 Thread Eugenio Pérez
Introduce the control virtqueue support for vDPA shadow virtqueue. This
is needed for advanced networking features like rx filtering.

The virtio-net control VQ now copies the descriptors to qemu's VA, so we
avoid TOCTOU with the guest's or device's memory every time there is a
device model change. Otherwise, the guest could change the memory
content in the time between qemu and the device reading it.

Likewise, qemu does not share the memory of the command with the device:
it exposes another copy to it.

To demonstrate command handling, VIRTIO_NET_F_CTRL_MAC_ADDR is
implemented.  If the virtio-net driver changes the MAC, the virtio-net
device model will be updated with the new one, and an rx filtering
change event will be raised.

Other CVQ commands could be added here straightforwardly, but they have
not been tested.

Signed-off-by: Eugenio Pérez 
---
 net/vhost-vdpa.c | 334 +--
 1 file changed, 322 insertions(+), 12 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 8558ad7a01..3ae74f7fb5 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -28,6 +28,26 @@
 #include "monitor/monitor.h"
 #include "hw/virtio/vhost.h"
 
+typedef struct CVQElement {
+/* Device's in and out buffer */
+void *in_buf, *out_buf;
+
+/* Optional guest element from which this CVQElement was created */
+VirtQueueElement *guest_elem;
+
+/* Control header sent by the guest. */
+struct virtio_net_ctrl_hdr ctrl;
+
+/* vhost-vdpa device, for cleanup reasons */
+struct vhost_vdpa *vdpa;
+
+/* Length of out data */
+size_t out_len;
+
+/* Copy of the out data sent by the guest excluding ctrl. */
+uint8_t out_data[];
+} CVQElement;
+
 /* Todo:need to add the multiqueue support here */
 typedef struct VhostVDPAState {
 NetClientState nc;
@@ -191,29 +211,277 @@ static NetClientInfo net_vhost_vdpa_info = {
 };
 
 /**
- * Forward buffer for the moment.
+ * Unmap a buffer that was mapped into the device for a CVQ command
+ *
+ * @elem: CVQElement that owns the buffer
+ * @addr: qemu VA of the buffer to unmap and free
+ *
+ * TODO: Use me! and adapt to net/vhost-vdpa format
+ * Print an error message in case of error
+ */
+static void vhost_vdpa_cvq_unmap_buf(CVQElement *elem, void *addr)
+{
+struct vhost_vdpa *v = elem->vdpa;
+VhostIOVATree *tree = v->iova_tree;
+DMAMap needle = {
+/*
+ * No need to specify size or to look for more translations since
+ * this contiguous chunk was allocated by us.
+ */
+.translated_addr = (hwaddr)(uintptr_t)addr,
+};
+const DMAMap *map = vhost_iova_tree_find_iova(tree, &needle);
+int r;
+
+if (unlikely(!map)) {
+error_report("Cannot locate expected map");
+goto err;
+}
+
+r = vhost_vdpa_dma_unmap(v, map->iova, map->size + 1);
+if (unlikely(r != 0)) {
+error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
+}
+
+vhost_iova_tree_remove(tree, map);
+
+err:
+qemu_vfree(addr);
+}
+
+static void vhost_vdpa_cvq_delete_elem(CVQElement *elem)
+{
+if (elem->out_buf) {
+vhost_vdpa_cvq_unmap_buf(elem, g_steal_pointer(&elem->out_buf));
+}
+
+if (elem->in_buf) {
+vhost_vdpa_cvq_unmap_buf(elem, g_steal_pointer(&elem->in_buf));
+}
+
+/* Guest element must have been returned to the guest or freed otherwise */
+assert(!elem->guest_elem);
+
+g_free(elem);
+}
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(CVQElement, vhost_vdpa_cvq_delete_elem);
+
+static int vhost_vdpa_net_cvq_svq_inject(VhostShadowVirtqueue *svq,
+ CVQElement *cvq_elem,
+ size_t out_len)
+{
+const struct iovec iov[] = {
+{
+.iov_base = cvq_elem->out_buf,
+.iov_len = out_len,
+},{
+.iov_base = cvq_elem->in_buf,
+.iov_len = sizeof(virtio_net_ctrl_ack),
+}
+};
+
+return vhost_svq_inject(svq, iov, 1, 1, cvq_elem);
+}
+
+static void *vhost_vdpa_cvq_alloc_buf(struct vhost_vdpa *v,
+  const uint8_t *out_data, size_t data_len,
+  bool write)
+{
+DMAMap map = {};
+size_t buf_len = ROUND_UP(data_len, qemu_real_host_page_size());
+void *buf = qemu_memalign(qemu_real_host_page_size(), buf_len);
+int r;
+
+if (!write) {
+memcpy(buf, out_data, data_len);
+memset(buf + data_len, 0, buf_len - data_len);
+} else {
+memset(buf, 0, data_len);
+}
+
+map.translated_addr = (hwaddr)(uintptr_t)buf;
+map.size = buf_len - 1;
+map.perm = write ? IOMMU_RW : IOMMU_RO,
+r = vhost_iova_tree_map_alloc(v->iova_tree, &map);
+if (unlikely(r != IOVA_OK)) {
+error_report("Cannot map injected element");
+goto err;
+}
+
+  

[PATCH 20/22] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs

2022-07-08 Thread Eugenio Pérez
Knowing the device features is needed for CVQ SVQ, so SVQ knows whether
it can handle all commands or not. Extract this from
vhost_vdpa_get_max_queue_pairs so we can reuse it.

Signed-off-by: Eugenio Pérez 
---
 net/vhost-vdpa.c | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 3ae74f7fb5..b6ed30bec3 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -588,20 +588,24 @@ static NetClientState *net_vhost_vdpa_init(NetClientState 
*peer,
 return nc;
 }
 
-static int vhost_vdpa_get_max_queue_pairs(int fd, int *has_cvq, Error **errp)
+static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
+{
+int ret = ioctl(fd, VHOST_GET_FEATURES, features);
+if (unlikely(ret < 0)) {
+error_setg_errno(errp, errno,
+ "Fail to query features from vhost-vDPA device");
+}
+return ret;
+}
+
+static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features,
+  int *has_cvq, Error **errp)
 {
 unsigned long config_size = offsetof(struct vhost_vdpa_config, buf);
 g_autofree struct vhost_vdpa_config *config = NULL;
 __virtio16 *max_queue_pairs;
-uint64_t features;
 int ret;
 
-ret = ioctl(fd, VHOST_GET_FEATURES, &features);
-if (ret) {
-error_setg(errp, "Fail to query features from vhost-vDPA device");
-return ret;
-}
-
 if (features & (1 << VIRTIO_NET_F_CTRL_VQ)) {
 *has_cvq = 1;
 } else {
@@ -631,10 +635,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char 
*name,
 NetClientState *peer, Error **errp)
 {
 const NetdevVhostVDPAOptions *opts;
+uint64_t features;
 int vdpa_device_fd;
 g_autofree NetClientState **ncs = NULL;
 NetClientState *nc;
-int queue_pairs, i, has_cvq = 0;
+int queue_pairs, r, i, has_cvq = 0;
 
 assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 opts = &netdev->u.vhost_vdpa;
@@ -648,7 +653,12 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char 
*name,
 return -errno;
 }
 
-queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd,
+r = vhost_vdpa_get_features(vdpa_device_fd, &features, errp);
+if (unlikely(r < 0)) {
+return r;
+}
+
+queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd, features,
  &has_cvq, errp);
 if (queue_pairs < 0) {
 qemu_close(vdpa_device_fd);
-- 
2.31.1




[PATCH 14/22] vhost: Add custom used buffer callback

2022-07-08 Thread Eugenio Pérez
The callback allows SVQ users to know the VirtQueue requests and
responses. QEMU can use this to synchronize the virtio device model
state, allowing it to be migrated with minimal changes to the migration
code.

If callbacks are specified at svq creation, the buffers need to be
injected into the device using vhost_svq_inject. Opaque data must be
given with them, and it is returned to the callback at used_handler
call time.

In the case of networking, this will be used to inspect the status of
control virtqueue messages from the device.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.h | 15 ++-
 hw/virtio/vhost-shadow-virtqueue.c | 22 --
 hw/virtio/vhost-vdpa.c |  3 ++-
 3 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
b/hw/virtio/vhost-shadow-virtqueue.h
index 57ff97ce4f..96ce7aa62e 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -23,6 +23,15 @@ typedef struct SVQElement {
 uint32_t last_chain_id;
 } SVQElement;
 
+typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
+typedef void (*VirtQueueUsedCallback)(VhostShadowVirtqueue *svq,
+  void *used_elem_opaque,
+  uint32_t written);
+
+typedef struct VhostShadowVirtqueueOps {
+VirtQueueUsedCallback used_handler;
+} VhostShadowVirtqueueOps;
+
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
 /* Shadow vring */
@@ -67,6 +76,9 @@ typedef struct VhostShadowVirtqueue {
  */
 uint16_t *desc_next;
 
+/* Caller callbacks */
+const VhostShadowVirtqueueOps *ops;
+
 /* Next head to expose to the device */
 uint16_t shadow_avail_idx;
 
@@ -98,7 +110,8 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice 
*vdev,
  VirtQueue *vq);
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
-VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree);
+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
+const VhostShadowVirtqueueOps *ops);
 
 void vhost_svq_free(gpointer vq);
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index f4affa52ee..40183f8afd 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -306,6 +306,7 @@ int vhost_svq_inject(VhostShadowVirtqueue *svq, const 
struct iovec *iov,
  * assertions.
  */
 assert(out_num || in_num);
+assert(svq->ops);
 
 if (unlikely(num > vhost_svq_available_slots(svq))) {
 error_report("Injecting in a full queue");
@@ -508,7 +509,6 @@ static size_t vhost_svq_flush(VhostShadowVirtqueue *svq,
 while (true) {
 uint32_t len;
 SVQElement svq_elem;
-g_autofree VirtQueueElement *elem = NULL;
 
 if (unlikely(i >= svq->vring.num)) {
 qemu_log_mask(LOG_GUEST_ERROR,
@@ -523,13 +523,20 @@ static size_t vhost_svq_flush(VhostShadowVirtqueue *svq,
 break;
 }
 
-elem = g_steal_pointer(&svq_elem.opaque);
-virtqueue_fill(vq, elem, len, i++);
+if (svq->ops) {
+svq->ops->used_handler(svq, svq_elem.opaque, len);
+} else {
+g_autofree VirtQueueElement *elem = NULL;
+elem = g_steal_pointer(&svq_elem.opaque);
+virtqueue_fill(vq, elem, len, i++);
+}
 ret++;
 }
 
-virtqueue_flush(vq, i);
-event_notifier_set(&svq->svq_call);
+if (i > 0) {
+virtqueue_flush(vq, i);
+event_notifier_set(&svq->svq_call);
+}
 
 if (check_for_avail_queue && svq->next_guest_avail_elem) {
 /*
@@ -758,12 +765,14 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
  * shadow methods and file descriptors.
  *
  * @iova_tree: Tree to perform descriptors translations
+ * @ops: SVQ owner callbacks
  *
  * Returns the new virtqueue or NULL.
  *
  * In case of error, reason is reported through error_report.
  */
-VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree)
+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
+const VhostShadowVirtqueueOps *ops)
 {
 g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
 int r;
@@ -785,6 +794,7 @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree 
*iova_tree)
 event_notifier_init_fd(&svq->svq_kick, VHOST_FILE_UNBIND);
 event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
 svq->iova_tree = iova_tree;
+svq->ops = ops;
 return g_steal_pointer(&svq);
 
 err_init_hdev_call:
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index fccfc832ea..25f7146fe4 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/

[PATCH 21/22] vdpa: Add device migration blocker

2022-07-08 Thread Eugenio Pérez
The device may need to add migration blockers. For example, if the vdpa
device uses features that are not compatible with migration.

Add the possibility here.

Signed-off-by: Eugenio Pérez 
---
 include/hw/virtio/vhost-vdpa.h |  1 +
 hw/virtio/vhost-vdpa.c | 14 ++
 2 files changed, 15 insertions(+)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index d85643..d10a89303e 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -35,6 +35,7 @@ typedef struct vhost_vdpa {
 bool shadow_vqs_enabled;
 /* IOVA mapping used by the Shadow Virtqueue */
 VhostIOVATree *iova_tree;
+Error *migration_blocker;
 GPtrArray *shadow_vqs;
 const VhostShadowVirtqueueOps *shadow_vq_ops;
 void *shadow_vq_ops_opaque;
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index c1162daecc..764a81b57f 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -20,6 +20,7 @@
 #include "hw/virtio/vhost-shadow-virtqueue.h"
 #include "hw/virtio/vhost-vdpa.h"
 #include "exec/address-spaces.h"
+#include "migration/blocker.h"
 #include "qemu/cutils.h"
 #include "qemu/main-loop.h"
 #include "cpu.h"
@@ -1016,6 +1017,13 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
 return true;
 }
 
+if (v->migration_blocker) {
+int r = migrate_add_blocker(v->migration_blocker, &err);
+if (unlikely(r < 0)) {
+goto err_migration_blocker;
+}
+}
+
 for (i = 0; i < v->shadow_vqs->len; ++i) {
 VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
 VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
@@ -1057,6 +1065,9 @@ err_svq_setup:
 vhost_svq_stop(svq);
 }
 
+err_migration_blocker:
+error_reportf_err(err, "Cannot setup SVQ %u: ", i);
+
 return false;
 }
 
@@ -1073,6 +1084,9 @@ static void vhost_vdpa_svqs_stop(struct vhost_dev *dev)
 vhost_vdpa_svq_unmap_rings(dev, svq);
 }
 
+if (v->migration_blocker) {
+migrate_del_blocker(v->migration_blocker);
+}
 }
 
 static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
-- 
2.31.1
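
A sketch of how a backend fills the blocker before the shadow virtqueues
start; this is exactly what the next patch does for the CVQ case:

    error_setg(&s->vhost_vdpa.migration_blocker,
               "Migration disabled: vhost-vdpa uses CVQ.");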




[PATCH 22/22] vdpa: Add x-svq to NetdevVhostVDPAOptions

2022-07-08 Thread Eugenio Pérez
Finally offering the possibility to enable SVQ from the command line.

Signed-off-by: Eugenio Pérez 
---
 qapi/net.json|  9 +-
 net/vhost-vdpa.c | 74 ++--
 2 files changed, 79 insertions(+), 4 deletions(-)

diff --git a/qapi/net.json b/qapi/net.json
index 9af11e9a3b..75ba2cb989 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -445,12 +445,19 @@
 # @queues: number of queues to be created for multiqueue vhost-vdpa
 #  (default: 1)
 #
+# @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.1)
+# (default: false)
+#
+# Features:
+# @unstable: Member @x-svq is experimental.
+#
 # Since: 5.1
 ##
 { 'struct': 'NetdevVhostVDPAOptions',
   'data': {
 '*vhostdev': 'str',
-'*queues':   'int' } }
+'*queues':   'int',
+'*x-svq':{'type': 'bool', 'features' : [ 'unstable'] } } }
 
 ##
 # @NetdevVmnetHostOptions:
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index b6ed30bec3..a6ebc234c0 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -92,6 +92,30 @@ const int vdpa_feature_bits[] = {
 VHOST_INVALID_FEATURE_BIT
 };
 
+/** Supported device specific feature bits with SVQ */
+static const uint64_t vdpa_svq_device_features =
+BIT_ULL(VIRTIO_NET_F_CSUM) |
+BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) |
+BIT_ULL(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) |
+BIT_ULL(VIRTIO_NET_F_MTU) |
+BIT_ULL(VIRTIO_NET_F_MAC) |
+BIT_ULL(VIRTIO_NET_F_GUEST_TSO4) |
+BIT_ULL(VIRTIO_NET_F_GUEST_TSO6) |
+BIT_ULL(VIRTIO_NET_F_GUEST_ECN) |
+BIT_ULL(VIRTIO_NET_F_GUEST_UFO) |
+BIT_ULL(VIRTIO_NET_F_HOST_TSO4) |
+BIT_ULL(VIRTIO_NET_F_HOST_TSO6) |
+BIT_ULL(VIRTIO_NET_F_HOST_ECN) |
+BIT_ULL(VIRTIO_NET_F_HOST_UFO) |
+BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) |
+BIT_ULL(VIRTIO_NET_F_STATUS) |
+BIT_ULL(VIRTIO_NET_F_CTRL_VQ) |
+BIT_ULL(VIRTIO_NET_F_MQ) |
+BIT_ULL(VIRTIO_F_ANY_LAYOUT) |
+BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) |
+BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
+BIT_ULL(VIRTIO_NET_F_STANDBY);
+
 VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
 {
 VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
@@ -150,7 +174,11 @@ err_init:
 static void vhost_vdpa_cleanup(NetClientState *nc)
 {
 VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+struct vhost_dev *dev = &s->vhost_net->dev;
 
+if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
+g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
+}
 if (s->vhost_net) {
 vhost_net_cleanup(s->vhost_net);
 g_free(s->vhost_net);
@@ -398,6 +426,14 @@ static uint64_t vhost_vdpa_net_iov_len(const struct iovec 
*iov,
 return len;
 }
 
+static int vhost_vdpa_get_iova_range(int fd,
+ struct vhost_vdpa_iova_range *iova_range)
+{
+int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
+
+return ret < 0 ? -errno : 0;
+}
+
 static CVQElement *vhost_vdpa_net_cvq_copy_elem(VhostVDPAState *s,
 VirtQueueElement *elem)
 {
@@ -558,7 +594,9 @@ static NetClientState *net_vhost_vdpa_init(NetClientState 
*peer,
int vdpa_device_fd,
int queue_pair_index,
int nvqs,
-   bool is_datapath)
+   bool is_datapath,
+   bool svq,
+   VhostIOVATree *iova_tree)
 {
 NetClientState *nc = NULL;
 VhostVDPAState *s;
@@ -576,9 +614,13 @@ static NetClientState *net_vhost_vdpa_init(NetClientState 
*peer,
 
 s->vhost_vdpa.device_fd = vdpa_device_fd;
 s->vhost_vdpa.index = queue_pair_index;
+s->vhost_vdpa.shadow_vqs_enabled = svq;
+s->vhost_vdpa.iova_tree = iova_tree;
 if (!is_datapath) {
 s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
 s->vhost_vdpa.shadow_vq_ops_opaque = s;
+error_setg(&s->vhost_vdpa.migration_blocker,
+   "Migration disabled: vhost-vdpa uses CVQ.");
 }
 ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
 if (ret) {
@@ -638,6 +680,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char 
*name,
 uint64_t features;
 int vdpa_device_fd;
 g_autofree NetClientState **ncs = NULL;
+g_autoptr(VhostIOVATree) iova_tree = NULL;
 NetClientState *nc;
 int queue_pairs, r, i, has_cvq = 0;
 
@@ -665,22 +708,45 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char 
*name,
 return queue_pairs;
 }
 
+if (opts->x_svq) {
+struct vhost_vdpa_iova_range iova_range;
+
+uint64_t invalid_dev_features =
+features & ~vdpa_svq_device_features &
+/* Transport features are all accepted at this point */
+~MAKE_64BIT_MASK(VIRTIO_TRANSP
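
For reference, a command-line usage sketch of the new option (device
path and ids are examples):

    $ qemu-system-x86_64 \
        -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vdpa0,x-svq=on \
        -device virtio-net-pci,netdev=vdpa0 ...
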

Re: [PATCH v2 2/4] target/s390x: Remove DISAS_PC_STALE

2022-07-08 Thread David Hildenbrand
On 02.07.22 08:02, Richard Henderson wrote:
> There is nothing to distinguish this from DISAS_TOO_MANY.
> 
> Signed-off-by: Richard Henderson 
> ---
>  target/s390x/tcg/translate.c | 13 -
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
> index e38ae9ce09..a3422c0eb0 100644
> --- a/target/s390x/tcg/translate.c
> +++ b/target/s390x/tcg/translate.c
> @@ -1126,10 +1126,6 @@ typedef struct {
>  /* We have updated the PC and CC values.  */
>  #define DISAS_PC_CC_UPDATED DISAS_TARGET_2
>  
> -/* We are exiting the TB, but have neither emitted a goto_tb, nor
> -   updated the PC for the next instruction to be executed.  */
> -#define DISAS_PC_STALE  DISAS_TARGET_3
> -
>  /* We are exiting the TB to the main loop.  */
>  #define DISAS_PC_STALE_NOCHAIN  DISAS_TARGET_4
>  
> @@ -3993,7 +3989,7 @@ static DisasJumpType op_sacf(DisasContext *s, DisasOps 
> *o)
>  {
>  gen_helper_sacf(cpu_env, o->in2);
>  /* Addressing mode has changed, so end the block.  */
> -return DISAS_PC_STALE;
> +return DISAS_TOO_MANY;
>  }
>  #endif
>  
> @@ -4029,7 +4025,7 @@ static DisasJumpType op_sam(DisasContext *s, DisasOps *o)
>  tcg_temp_free_i64(tsam);
>  
>  /* Always exit the TB, since we (may have) changed execution mode.  */
> -return DISAS_PC_STALE;
> +return DISAS_TOO_MANY;
>  }
>  
>  static DisasJumpType op_sar(DisasContext *s, DisasOps *o)
> @@ -6562,13 +6558,13 @@ static DisasJumpType translate_one(CPUS390XState *env, DisasContext *s)
>  
>  /* io should be the last instruction in tb when icount is enabled */
>  if (unlikely(icount && ret == DISAS_NEXT)) {
> -ret = DISAS_PC_STALE;
> +ret = DISAS_TOO_MANY;
>  }
>  
>  #ifndef CONFIG_USER_ONLY
>  if (s->base.tb->flags & FLAG_MASK_PER) {
>  /* An exception might be triggered, save PSW if not already done.  */
> -if (ret == DISAS_NEXT || ret == DISAS_PC_STALE) {
> +if (ret == DISAS_NEXT || ret == DISAS_TOO_MANY) {
>  tcg_gen_movi_i64(psw_addr, s->pc_tmp);
>  }
>  
> @@ -6634,7 +6630,6 @@ static void s390x_tr_tb_stop(DisasContextBase *dcbase, CPUState *cs)
>  case DISAS_NORETURN:
>  break;
>  case DISAS_TOO_MANY:
> -case DISAS_PC_STALE:
>  case DISAS_PC_STALE_NOCHAIN:
>  update_psw_addr(dc);
>  /* FALLTHRU */

Reviewed-by: David Hildenbrand 

-- 
Thanks,

David / dhildenb




Re: [RFC PATCH v9 23/23] vdpa: Add x-svq to NetdevVhostVDPAOptions

2022-07-08 Thread Eugenio Perez Martin
On Thu, Jul 7, 2022 at 8:23 AM Markus Armbruster  wrote:
>
> Eugenio Pérez  writes:
>
> > Finally offering the possibility to enable SVQ from the command line.
>
> QMP, too, I guess.
>

Hi Markus,

I'm not sure what you mean. Dynamic enabling / disabling of SVQ was
delayed, and now it's only possible to enable or disable it at the
start of the QEMU run. Do you mean to enable SVQ before starting the
guest somehow?

Thanks!
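
For completeness, that means choosing it when the netdev is created,
e.g. (the device path below is illustrative):

    -netdev vhost-vdpa,id=vdpa0,vhostdev=/dev/vhost-vdpa-0,x-svq=on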

> >
> > Signed-off-by: Eugenio Pérez 
> > ---
> >  qapi/net.json|  9 +-
> >  net/vhost-vdpa.c | 72 ++--
> >  2 files changed, 77 insertions(+), 4 deletions(-)
> >
> > diff --git a/qapi/net.json b/qapi/net.json
> > index 9af11e9a3b..75ba2cb989 100644
> > --- a/qapi/net.json
> > +++ b/qapi/net.json
> > @@ -445,12 +445,19 @@
> >  # @queues: number of queues to be created for multiqueue vhost-vdpa
> >  #  (default: 1)
> >  #
> > +# @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.1)
> > +# (default: false)
> > +#
> > +# Features:
> > +# @unstable: Member @x-svq is experimental.
> > +#
> >  # Since: 5.1
> >  ##
> >  { 'struct': 'NetdevVhostVDPAOptions',
> >'data': {
> >  '*vhostdev': 'str',
> > -'*queues':   'int' } }
> > +'*queues':   'int',
> > +'*x-svq':{'type': 'bool', 'features' : [ 'unstable'] } } }
> >
> >  ##
>
> QAPI schema:
> Acked-by: Markus Armbruster 
>
> [...]
>




Re: [RFC PATCH] qobject: Rewrite implementation of QDict for in-order traversal

2022-07-08 Thread Daniel P . Berrangé
On Wed, Jul 06, 2022 at 01:35:22PM +0200, Markus Armbruster wrote:
> Markus Armbruster  writes:
> 
> > QDict is implemented as a simple hash table of fixed size.  Observe:
> >
> > * Slow for large n.  Not sure this matters.
> >
> > * A QDict with n entries takes 4120 + n * 32 bytes on my box.  Wastes
> >   space for small n, which is a common case.
> >
> > * Order of traversal depends on the hash function and on insertion
> >   order, because it iterates first over buckets, then collision
> >   chains.
> >
> > * Special code ensures qdict_size() takes constant time.
> >
> > Replace the hash table by a linked list.  Observe:
> >
> > * Even slower for large n.  Might be bad enough to matter.
> >
> > * A QDict with n entries takes 32 + n * 24 bytes.
> >
> > * Traversal is in insertion order.
> >
> > * qdict_size() is linear in the number of entries.
> >
> > This is an experiment.  Do not commit to master as is.
> 
> Forgot to mention: see also
> 
> Subject: Re: [PULL 14/15] qdev: Base object creation on QDict rather 
> than QemuOpts
> Message-ID: <87wnctzdl9@pond.sub.org>
> https://lists.nongnu.org/archive/html/qemu-devel/2022-07/msg00358.html

What alternative options do we have for addressing this scenario?

I can think of

  - Auto-create array elements, if seeing an element set before length.

This is based on the theory that the 'len-PROP' field is largely
redundant. It is only needed if you want to create a sparse
array, with empty elements /after/ the last one explicitly
set, or if you want to get error reporting for an app setting
element 3 after saying it wanted a 2-element list. IMHO the
error reporting benefit is dubious, because the error scenario
only exists because we made the app set this redundant 'len-PROP'
attribute. Does anything actually need the 'sparse array'
facility?

  - Special case array properties

Modify object_set_properties_from_qdict, so that it has a special
case first iterating over any properties with a 'len-' prefix in
their name, then iterating over everything else (a minimal sketch
follows after this list).

Assuming this 'len-' property is the only case where we genuinely
have ordering dependencies between properties, this is quite a
simple fix, and avoids imposing ordering requirements on either
clients or QEMU in general.


  - Insertion order preserving QDict

What you've done here, pushing the ordering problem off to be
the caller's responsibility to get right. The caller could
easily have the same problem though. For example, for CLI args
these days, libvirt will populate a data structure based on
QAPI, and then serialize that to CLI args. I don't know offhand
if our code is insertion order preserving, or will hit this
exact same problem. Luckily we don't support the 'rocker'
object so we haven't hit this precise issue.


  - Any other options?
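
A minimal sketch of the "special case array properties" option, for
concreteness (the helper name is made up; it assumes each entry is
applied via object_property_set_qobject(), as
object_set_properties_from_qdict() ultimately does):

    static void set_props_arrays_first(Object *obj, const QDict *qdict,
                                       Error **errp)
    {
        const QDictEntry *e;

        /* Pass 1: set every "len-*" property, so the arrays exist */
        for (e = qdict_first(qdict); e; e = qdict_next(qdict, e)) {
            if (g_str_has_prefix(qdict_entry_key(e), "len-") &&
                !object_property_set_qobject(obj, qdict_entry_key(e),
                                             qdict_entry_value(e), errp)) {
                return;
            }
        }

        /* Pass 2: everything else, in whatever order the dict yields */
        for (e = qdict_first(qdict); e; e = qdict_next(qdict, e)) {
            if (!g_str_has_prefix(qdict_entry_key(e), "len-") &&
                !object_property_set_qobject(obj, qdict_entry_key(e),
                                             qdict_entry_value(e), errp)) {
                return;
            }
        }
    }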


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v2 1/4] target/s390x: Remove DISAS_GOTO_TB

2022-07-08 Thread David Hildenbrand
On 02.07.22 08:02, Richard Henderson wrote:
> There is nothing to distinguish this from DISAS_NORETURN.
> 
> Signed-off-by: Richard Henderson 
> ---
>  target/s390x/tcg/translate.c | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
> index fd2433d625..e38ae9ce09 100644
> --- a/target/s390x/tcg/translate.c
> +++ b/target/s390x/tcg/translate.c
> @@ -1123,9 +1123,6 @@ typedef struct {
> exiting the TB.  */
>  #define DISAS_PC_UPDATEDDISAS_TARGET_0
>  
> -/* We have emitted one or more goto_tb.  No fixup required.  */
> -#define DISAS_GOTO_TB   DISAS_TARGET_1
> -
>  /* We have updated the PC and CC values.  */
>  #define DISAS_PC_CC_UPDATED DISAS_TARGET_2
>  
> @@ -1189,7 +1186,7 @@ static DisasJumpType help_goto_direct(DisasContext *s, 
> uint64_t dest)
>  tcg_gen_goto_tb(0);
>  tcg_gen_movi_i64(psw_addr, dest);
>  tcg_gen_exit_tb(s->base.tb, 0);
> -return DISAS_GOTO_TB;
> +return DISAS_NORETURN;
>  } else {
>  tcg_gen_movi_i64(psw_addr, dest);
>  per_branch(s, false);
> @@ -1258,7 +1255,7 @@ static DisasJumpType help_branch(DisasContext *s, 
> DisasCompare *c,
>  tcg_gen_movi_i64(psw_addr, dest);
>  tcg_gen_exit_tb(s->base.tb, 1);
>  
> -ret = DISAS_GOTO_TB;
> +ret = DISAS_NORETURN;
>  } else {
>  /* Fallthru can use goto_tb, but taken branch cannot.  */
>  /* Store taken branch destination before the brcond.  This
> @@ -6634,7 +6631,6 @@ static void s390x_tr_tb_stop(DisasContextBase *dcbase, 
> CPUState *cs)
>  DisasContext *dc = container_of(dcbase, DisasContext, base);
>  
>  switch (dc->base.is_jmp) {
> -case DISAS_GOTO_TB:
>  case DISAS_NORETURN:
>  break;
>  case DISAS_TOO_MANY:

Reviewed-by: David Hildenbrand 

-- 
Thanks,

David / dhildenb




Re: [PATCH v2 3/4] target/s390x: Remove DISAS_PC_STALE_NOCHAIN

2022-07-08 Thread David Hildenbrand
On 02.07.22 08:02, Richard Henderson wrote:
> Replace this with a flag: exit_to_mainloop.
> We can now control the exit for each of DISAS_TOO_MANY,
> DISAS_PC_UPDATED, and DISAS_PC_CC_UPDATED, and fold in
> the check for PER.
> 
> Signed-off-by: Richard Henderson 

Reviewed-by: David Hildenbrand 


-- 
Thanks,

David / dhildenb




Re: [PATCH v2 4/4] target/s390x: Exit tb after executing ex_value

2022-07-08 Thread David Hildenbrand
On 02.07.22 08:02, Richard Henderson wrote:
> When EXECUTE sets ex_value to interrupt the constructed instruction,
> we implicitly disable interrupts so that the value is not corrupted.
> Exit to the main loop after execution, so that we re-evaluate any
> pending interrupts.
> 
> Reported-by: Sven Schnelle 
> Signed-off-by: Richard Henderson 
> ---
>  target/s390x/tcg/translate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
> index eac59c3dd1..e2ee005671 100644
> --- a/target/s390x/tcg/translate.c
> +++ b/target/s390x/tcg/translate.c
> @@ -6593,7 +6593,7 @@ static void s390x_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
>  
>  dc->cc_op = CC_OP_DYNAMIC;
>  dc->ex_value = dc->base.tb->cs_base;
> -dc->exit_to_mainloop = (dc->base.tb->flags & FLAG_MASK_PER);
> -dc->exit_to_mainloop = (dc->base.tb->flags & FLAG_MASK_PER);
> +dc->exit_to_mainloop = (dc->base.tb->flags & FLAG_MASK_PER) || dc->ex_value;
>  }
>  
>  static void s390x_tr_tb_start(DisasContextBase *db, CPUState *cs)

Reviewed-by: David Hildenbrand 

-- 
Thanks,

David / dhildenb




Re: [PATCH 7/8] VirtIOBlock: protect rq with its own lock

2022-07-08 Thread Emanuele Giuseppe Esposito



On 08/07/2022 at 11:33, Emanuele Giuseppe Esposito wrote:
> 
> 
> On 05/07/2022 at 16:45, Stefan Hajnoczi wrote:
>> On Thu, Jun 09, 2022 at 10:37:26AM -0400, Emanuele Giuseppe Esposito wrote:
>>> @@ -946,17 +955,20 @@ static void virtio_blk_reset(VirtIODevice *vdev)
>>>   * stops all Iothreads.
>>>   */
>>>  blk_drain(s->blk);
>>> +aio_context_release(ctx);
>>>  
>>>  /* We drop queued requests after blk_drain() because blk_drain() itself can
>>>  * produce them. */
>>> +qemu_mutex_lock(&s->req_mutex);
>>>  while (s->rq) {
>>>  req = s->rq;
>>>  s->rq = req->next;
>>> +qemu_mutex_unlock(&s->req_mutex);
>>>  virtqueue_detach_element(req->vq, &req->elem, 0);
>>>  virtio_blk_free_request(req);
>>> +qemu_mutex_lock(&s->req_mutex);
>>
>> Why is req_mutex dropped temporarily? At this point we don't really need
>> the req_mutex (all I/O should be stopped and drained), but maybe we
>> should do:
> 
> Agree that maybe it is not useful to drop the mutex temporarily.
> 
> Regarding why req_mutex is not needed, yes I guess it isn't. Should I
> get rid of this hunk altogether, and maybe leave a comment like "no
> synchronization needed, due to drain + ->stop_ioeventfd()"?

Actually, regarding this, I found why I added the lock:

https://patchew.org/QEMU/20220426085114.199647-1-eespo...@redhat.com/#584d7d1a-94cc-9ebb-363b-2fddb8d79...@redhat.com

So maybe it's better to add it.

> 
>>
>>   WITH_QEMU_MUTEX(&s->req_mutex) {
>>   req = s->rq;
>>   s->rq = NULL;
>>   }
>>
>>   ...process req list...
> 
> Not sure what you mean here: we are looping on s->rq, so do we need to
> protect that too? And why set it to NULL? Sorry, I am a little bit
> lost here.
> 
> Thank you,
> Emanuele
> 
>>
>> Otherwise:
>> Reviewed-by: Stefan Hajnoczi 
>>
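
For reference, the shape being suggested is the usual "detach under
lock, process outside" pattern. A rough sketch, assuming the guard
macro meant is WITH_QEMU_LOCK_GUARD from qemu/lockable.h (the other
names are from the patch itself):

    VirtIOBlockReq *req;

    /* Steal the whole queued-request list while holding the lock */
    WITH_QEMU_LOCK_GUARD(&s->req_mutex) {
        req = s->rq;
        s->rq = NULL;
    }

    /* The detached list is now private; walk it without the lock */
    while (req) {
        VirtIOBlockReq *next = req->next;
        virtqueue_detach_element(req->vq, &req->elem, 0);
        virtio_blk_free_request(req);
        req = next;
    }

Setting s->rq to NULL under the lock is what makes the unlocked walk
safe: no other thread can reach the detached requests anymore.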




The case for array properties (was: [PULL 14/15] qdev: Base object creation on QDict rather than QemuOpts)

2022-07-08 Thread Markus Armbruster
Cc'ing QOM maintainers.

Peter Maydell  writes:

> On Mon, 4 Jul 2022 at 05:50, Markus Armbruster  wrote:
>> My initial (knee-jerk) reaction to breaking array properties: Faster,
>> Pussycat! Kill! Kill!
>
> In an ideal world, what would you replace them with?

Let's first recapitulate their intended purpose.

commit 339659041f87a76f8b71ad3d12cadfc5f89b4bb3
Author: Peter Crosthwaite 
Date:   Tue Aug 19 23:55:52 2014 -0700

qom: Add automatic arrayification to object_property_add()

If "[*]" is given as the last part of a QOM property name, treat that
as an array property. The added property is given the first available
name, replacing the * with a decimal number counting from 0.

First add with name "foo[*]" will be "foo[0]". Second "foo[1]" and so
on.

Callers may inspect the ObjectProperty * return value to see what
number the added property was given.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Andreas Färber 

This describes how they work, but sadly not why we want them.  For such
arcane lore, we need to consult a guru.  Possibly via the mailing list
archive.

Digression: when you add a feature, please, please, *please* explain why
you need it right in the commit message.  Such rationale is useful
information, tends to age well, and can be quite laborious to
reconstruct later.

Even though I'm sure we discussed the intended purpose(s) of array
properties before, a quick grep of my list archive comes up mostly
empty, so I'm falling back to (foggy) memory.  Please correct mistakes
and fill in omissions.

We occasionally have a need for an array of properties where the length
of the array is not fixed at compile time.  Say in code common to
several related devices, where some have two frobs, some four, and a
future one may have some other number.

We could define properties frob0, frob1, ... frobN for some fixed N.
Users have to set them like frob0=...,frob1=... and so forth.  We need
code to reject frobI=... for I exceeding the actual limit.

Array properties spare developers picking a fixed N, and users adding an
index to the property name.  Whether the latter is a good idea is
unclear.  We need code to reject usage exceeding the actual limit.
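
For concreteness, the mechanism looks roughly like this ('frob' and
its accessors are made-up names; object_property_add() is the real
QOM entry point):

    int i;

    /* Each call grabs the first free index: frob[0], frob[1], ... */
    for (i = 0; i < nr_frobs; i++) {
        object_property_add(obj, "frob[*]", "uint32",
                            frob_get, frob_set, NULL, &s->frobs[i]);
    }

The device code never picks a fixed N; the limit is simply however
many times it calls object_property_add().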

A secondary use is (was?) avoiding memory region name clashes in code we
don't want to touch.  Discussed in the review of my attempt to strangle
array properties in 2014:

Message-ID: <87sihn9nji@blackfin.pond.sub.org>
https://lists.gnu.org/archive/html/qemu-devel/2014-11/msg02103.html




[PATCH v2] hw/i386: pass RNG seed to e820 setup table

2022-07-08 Thread Jason A. Donenfeld
Tiny machines optimized for fast boot time generally don't use EFI,
which means a random seed has to be supplied some other way, in this
case by the e820 setup table, which supplies a place for one. This
commit adds passing this random seed via the table. It is confirmed to
be working with the Linux patch in the link.

Link: https://lore.kernel.org/lkml/20220708113907.891319-1-ja...@zx2c4.com/
Signed-off-by: Jason A. Donenfeld 
---
 hw/i386/x86.c| 19 ++-
 include/standard-headers/asm-x86/bootparam.h |  1 +
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 6003b4b2df..0724759eec 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -26,6 +26,7 @@
 #include "qemu/cutils.h"
 #include "qemu/units.h"
 #include "qemu/datadir.h"
+#include "qemu/guest-random.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
 #include "qapi/qapi-visit-common.h"
@@ -1045,6 +1046,16 @@ void x86_load_linux(X86MachineState *x86ms,
 }
 fclose(f);
 
+setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
+kernel_size = setup_data_offset + sizeof(struct setup_data) + 32;
+kernel = g_realloc(kernel, kernel_size);
+stq_p(header + 0x250, prot_addr + setup_data_offset);
+setup_data = (struct setup_data *)(kernel + setup_data_offset);
+setup_data->next = 0;
+setup_data->type = cpu_to_le32(SETUP_RNG_SEED);
+setup_data->len = cpu_to_le32(32);
+qemu_guest_getrandom_nofail(setup_data->data, 32);
+
 /* append dtb to kernel */
 if (dtb_filename) {
 if (protocol < 0x209) {
@@ -1059,13 +1070,11 @@ void x86_load_linux(X86MachineState *x86ms,
 exit(1);
 }
 
-setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-kernel_size = setup_data_offset + sizeof(struct setup_data) + dtb_size;
+kernel_size += sizeof(struct setup_data) + dtb_size;
 kernel = g_realloc(kernel, kernel_size);
 
-stq_p(header + 0x250, prot_addr + setup_data_offset);
-
-setup_data = (struct setup_data *)(kernel + setup_data_offset);
+setup_data->next = prot_addr + setup_data_offset + sizeof(*setup_data) + setup_data->len;
+++setup_data;
 setup_data->next = 0;
 setup_data->type = cpu_to_le32(SETUP_DTB);
 setup_data->len = cpu_to_le32(dtb_size);
diff --git a/include/standard-headers/asm-x86/bootparam.h b/include/standard-headers/asm-x86/bootparam.h
index 072e2ed546..b2aaad10e5 100644
--- a/include/standard-headers/asm-x86/bootparam.h
+++ b/include/standard-headers/asm-x86/bootparam.h
@@ -10,6 +10,7 @@
 #define SETUP_EFI  4
 #define SETUP_APPLE_PROPERTIES 5
 #define SETUP_JAILHOUSE6
+#define SETUP_RNG_SEED 9
 
 #define SETUP_INDIRECT (1<<31)
 
-- 
2.35.1




Re: The case for array properties (was: [PULL 14/15] qdev: Base object creation on QDict rather than QemuOpts)

2022-07-08 Thread Daniel P . Berrangé
On Fri, Jul 08, 2022 at 01:40:43PM +0200, Markus Armbruster wrote:
> Cc'ing QOM maintainers.
> 
> Peter Maydell  writes:
> 
> > On Mon, 4 Jul 2022 at 05:50, Markus Armbruster  wrote:
> >> My initial (knee-jerk) reaction to breaking array properties: Faster,
> >> Pussycat! Kill! Kill!
> >
> > In an ideal world, what would you replace them with?
> 
> Let's first recapitulate their intended purpose.
> 
> commit 339659041f87a76f8b71ad3d12cadfc5f89b4bb3
> Author: Peter Crosthwaite 
> Date:   Tue Aug 19 23:55:52 2014 -0700
> 
> qom: Add automatic arrayification to object_property_add()
> 
> If "[*]" is given as the last part of a QOM property name, treat that
> as an array property. The added property is given the first available
> name, replacing the * with a decimal number counting from 0.
> 
> First add with name "foo[*]" will be "foo[0]". Second "foo[1]" and so
> on.
> 
> Callers may inspect the ObjectProperty * return value to see what
> number the added property was given.
> 
> Signed-off-by: Peter Crosthwaite 
> Signed-off-by: Andreas Färber 
> 
> This describes how they work, but sadly not why we want them.  For such
> arcane lore, we need to consult a guru.  Possibly via the mailing list
> archive.

It also doesn't describe why we need to explicitly set the array length
upfront, rather than inferring it from the set of elements that are
specified, auto-extending the array bounds as we set each property.

> Digression: when you add a feature, please, please, *please* explain why
> you need it right in the commit message.  Such rationale is useful
> information, tends to age well, and can be quite laborious to
> reconstruct later.
> 
> Even though I'm sure we discussed the intended purpose(s) of array
> properties before, a quick grep of my list archive comes up mostly
> empty, so I'm falling back to (foggy) memory.  Please correct mistakes
> and fill in omissions.
> 
> We occasionally have a need for an array of properties where the length
> of the array is not fixed at compile time.  Say in code common to
> several related devices, where some have two frobs, some four, and a
> future one may have some other number.
> 
> We could define properties frob0, frob1, ... frobN for some fixed N.
> Users have to set them like frob0=...,frob1=... and so forth.  We need
> code to reject frobI=... for I exceeding the actual limit.
> 
> Array properties spare developers picking a fixed N, and users adding an
> index to the property name.  Whether the latter is a good idea is
> unclear.  We need code to reject usage exceeding the actual limit.

If we consider that our canonical representation is aiming to be QAPI,
and QAPI has unbounded arrays, then by implication if we want a mapping
to a flat CLI syntax, then we need some mechanism for unbounded arrays.

It would be valid to argue that we shouldn't be trying to map the full
expressiveness of QAPI into a flat CLI syntax though, and should just
strive for full JSON everywhere.
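
Concretely, a QAPI list has to flatten to indexed keys in the keyval
CLI syntax ('frob' is illustrative):

    JSON:    {"frob": ["a", "b"]}
    keyval:  frob.0=a,frob.1=b

which is exactly the kind of unbounded-array mechanism at issue here.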

Indeed every time we have these discussions, I wish we were already at
the "full JSON everywhere" point, so we can stop consuming our time
debating how to flatten JSON structure into CLI options. But since
these array props already exist, we need to find a way out of the
problem...

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] hw/i386: pass RNG seed to e820 setup table

2022-07-08 Thread Daniel P . Berrangé
On Thu, Jun 30, 2022 at 01:37:17PM +0200, Jason A. Donenfeld wrote:
> Tiny machines optimized for fast boot time generally don't use EFI,
> which means a random seed has to be supplied some other way, in this
> case by the e820 setup table, which supplies a place for one. This
> commit adds passing this random seed via the table. It is confirmed to
> be working with the Linux patch in the link.

IIUC, this approach will only expose the random seed when QEMU
is booted using -kernel + -initrd args.

I agree with what you say about most VMs not using UEFI right now.
I'd say the majority of general-purpose VMs are still using SeaBIOS.
The usage of -kernel + -initrd is typically for more specialized use
cases.

IOW, exposing a random seed via the setup table feels like it'll
have a somewhat limited benefit.

Can we get an approach that exposes a random seed regardless of
whether we're using -kernel, or seabios, or uefi, or $whatever firmware?

Perhaps (ab)use 'fw_cfg', which is exposed for any x86 VM no matter
what config it has for booting?
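
(Roughly, and only as a sketch: the file name below is invented, but
fw_cfg_add_file() and qemu_guest_getrandom_nofail() are existing APIs.)

    uint8_t *seed = g_malloc(32);

    qemu_guest_getrandom_nofail(seed, 32);
    /* any firmware or kernel could then read "etc/random-seed" */
    fw_cfg_add_file(fw_cfg, "etc/random-seed", seed, 32);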

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] hw/i386: pass RNG seed to e820 setup table

2022-07-08 Thread Jason A. Donenfeld
Hi Daniel,

On Fri, Jul 8, 2022 at 2:00 PM Daniel P. Berrangé  wrote:
>
> On Thu, Jun 30, 2022 at 01:37:17PM +0200, Jason A. Donenfeld wrote:
> > Tiny machines optimized for fast boot time generally don't use EFI,
> > which means a random seed has to be supplied some other way, in this
> > case by the e820 setup table, which supplies a place for one. This
> > commit adds passing this random seed via the table. It is confirmed to
> > be working with the Linux patch in the link.
>
> IIUC, this approach will only expose the random seed when QEMU
> is booted using -kernel + -initrd args.
>
> I agree with what you say about most VMs not using UEFI right now.
> I'd say the majority of general purpose VMs are using SeaBIOS
> still. The usage of -kernel + -initrd, is typically for more
> specialized use cases.

Highly disagree, based on seeing a lot of real world deployment.
Furthermore, this is going to be used within Linux itself for kexec,
so it makes sense to use it here too.

> Can we get an approach that exposes a random seed regardless of
> whether using -kernel, or seabios, or uefi, or $whatever firmware ?

No.

> Perhaps (ab)use 'fw_cfg', which is exposed for any x86 VM no matter
> what config it has for booting ?

That approach is super messy and doesn't work. I've already gone down
that route.

The entire point here is to include the seed on this part of the boot
protocol. There might be other opportunities for doing it elsewhere.
For example, EFI already has a thing.

Please don't sink a good idea because it doesn't handle every possible
use case. That type of mentality is just going to result in nothing
ever getting done anywhere, making a decades old problem last for
another decade. This patch here is simple and makes a tangible
incremental advance toward something good, and fits the pattern of how
it's done on all other platforms.

Thanks,
Jason



Re: The case for array properties

2022-07-08 Thread Markus Armbruster
Daniel P. Berrangé  writes:

> On Fri, Jul 08, 2022 at 01:40:43PM +0200, Markus Armbruster wrote:
>> Cc'ing QOM maintainers.
>> 
>> Peter Maydell  writes:
>> 
>> > On Mon, 4 Jul 2022 at 05:50, Markus Armbruster  wrote:
>> >> My initial (knee-jerk) reaction to breaking array properties: Faster,
>> >> Pussycat! Kill! Kill!
>> >
>> > In an ideal world, what would you replace them with?
>> 
>> Let's first recapitulate their intended purpose.
>> 
>> commit 339659041f87a76f8b71ad3d12cadfc5f89b4bb3
>> Author: Peter Crosthwaite 
>> Date:   Tue Aug 19 23:55:52 2014 -0700
>> 
>> qom: Add automatic arrayification to object_property_add()
>> 
>> If "[*]" is given as the last part of a QOM property name, treat that
>> as an array property. The added property is given the first available
>> name, replacing the * with a decimal number counting from 0.
>> 
>> First add with name "foo[*]" will be "foo[0]". Second "foo[1]" and so
>> on.
>> 
>> Callers may inspect the ObjectProperty * return value to see what
>> number the added property was given.
>> 
>> Signed-off-by: Peter Crosthwaite 
>> Signed-off-by: Andreas Färber 
>> 
>> This describes how they work, but sadly not why we want them.  For such
>> arcane lore, we need to consult a guru.  Possibly via the mailing list
>> archive.
>
> Also doesn't describe why we need to explicitly set the array length
> upfront, rather than inferring it from the set of elements that are
> specified, auto-extending the array bounds as we set each property.
>
>> Digression: when you add a feature, please, please, *please* explain why
>> you need it right in the commit message.  Such rationale is useful
>> information, tends to age well, and can be quite laborious to
>> reconstruct later.
>> 
>> Even though I'm sure we discussed the intended purpose(s) of array
>> properties before, a quick grep of my list archive comes up mostly
>> empty, so I'm falling back to (foggy) memory.  Please correct mistakes
>> and fill in omissions.
>> 
>> We occasionally have a need for an array of properties where the length
>> of the array is not fixed at compile time.  Say in code common to
>> several related devices, where some have two frobs, some four, and a
>> future one may have some other number.
>> 
>> We could define properties frob0, frob1, ... frobN for some fixed N.
>> Users have to set them like frob0=...,frob1=... and so forth.  We need
>> code to reject frobI=... for I exceeding the actual limit.
>> 
>> Array properties spare developers picking a fixed N, and users adding an
>> index to the property name.  Whether the latter is a good idea is
>> unclear.  We need code to reject usage exceeding the actual limit.
>
> If we consider that our canonical representation is aiming to be QAPI,
> and QAPI has unbounded arrays, then by implication if we want a mapping
> to a flat CLI syntax, then we need some mechanism for unbounded arrays.
>
>> It would be valid to argue that we shouldn't be trying to map the full
> expressiveness of QAPI into a flat CLI syntax though, and should just
> strive for full JSON everywhere.
>
> Indeed every time we have these discussions, I wish we were already at
> the "full JSON everywhere" point, so we can stop consuming our time
> debating how to flatten JSON structure into CLI options. But since
> these array props already exist, we need to find a way out of the
> problem...

This isn't just a CLI problem, it's worse: we have property-setting code
that relies on "automatic arrayification".




Re: Intermittent meson failures on msys2

2022-07-08 Thread Marc-André Lureau
Hi

On Mon, Jun 27, 2022 at 6:41 AM Richard Henderson <
richard.hender...@linaro.org> wrote:

> Hi guys,
>
> There's an occasional failure on msys2, where meson fails to capture the
> output of a build
> script.  E.g.
>
> https://gitlab.com/qemu-project/qemu/-/jobs/2642051161
>
> FAILED: ui/input-keymap-qcode-to-linux.c.inc
> "C:/GitLab-Runner/builds/qemu-project/qemu/msys64/mingw64/bin/python3.exe"
> "C:/GitLab-Runner/builds/qemu-project/qemu/meson/meson.py" "--internal"
> "exe" "--capture"
> "ui/input-keymap-qcode-to-linux.c.inc" "--"
> "C:/GitLab-Runner/builds/qemu-project/qemu/msys64/mingw64/bin/python3.exe"
> "../ui/keycodemapdb/tools/keymap-gen" "code-map" "--lang" "glib2"
> "--varname"
> "qemu_input_map_qcode_to_linux" "../ui/keycodemapdb/data/keymaps.csv"
> "qcode" "linux"
> [301/1665] Generating input-keymap-qcode-to-qnum.c.inc with a custom
> command (wrapped by
> meson to capture output)
> ninja: build stopped: subcommand failed.
>
>
> https://gitlab.com/qemu-project/qemu/-/jobs/2625836697
>
> FAILED: ui/shader/texture-blit-frag.h
> "C:/GitLab-Runner/builds/qemu-project/qemu/msys64/mingw64/bin/python3.exe"
> "C:/GitLab-Runner/builds/qemu-project/qemu/meson/meson.py" "--internal"
> "exe" "--capture"
> "ui/shader/texture-blit-frag.h" "--" "perl"
> "C:/GitLab-Runner/builds/qemu-project/qemu/scripts/shaderinclude.pl"
> "../ui/shader/texture-blit.frag"
> [313/1663] Generating texture-blit-vert.h with a custom command (wrapped
> by meson to
> capture output)
> ninja: build stopped: subcommand failed.
>
>
> Could you have a look please?
>
>
>
Ah, we don't have artifacts for msys2 builds, it seems; those would perhaps
help. It would make sense to at least take meson-logs/*.txt. I'll work on a
patch.

My guess is that CI randomly fails with "too many open files", as I have
seen that regularly on various projects with Windows runners. And here,
it's probably reaching limits when running python/perl scripts
simultaneously... I don't see an easy way to solve that if that's the issue.

-- 
Marc-André Lureau


Re: [RFC PATCH v9 23/23] vdpa: Add x-svq to NetdevVhostVDPAOptions

2022-07-08 Thread Markus Armbruster
Eugenio Perez Martin  writes:

> On Thu, Jul 7, 2022 at 8:23 AM Markus Armbruster  wrote:
>>
>> Eugenio Pérez  writes:
>>
>> > Finally offering the possibility to enable SVQ from the command line.
>>
>> QMP, too, I guess.
>>
>
> Hi Markus,
>
> I'm not sure what you mean. Dynamic enabling / disabling of SVQ was
> delayed, and now it's only possible to enable or disable it at the
> start of the QEMU run. Do you mean to enable SVQ before starting the
> guest somehow?

QMP command netdev_add takes a Netdev argument.  Branch 'vhost-vdpa' has
member x-svq.  Are you telling me it doesn't work there?  Or only before
the guest runs?

[...]




Re: [PATCH 22/22] vdpa: Add x-svq to NetdevVhostVDPAOptions

2022-07-08 Thread Markus Armbruster
Eugenio Pérez  writes:

> Finally offering the possibility to enable SVQ from the command line.
>
> Signed-off-by: Eugenio Pérez 

Please carry forward the Acked-by and Reviewed-by tags you received for
prior revisions unless you change something that invalidates them.  This
ensures reviewers get credit, and also saves them time: if the tag is
still there, nothing much changed, and no need to look at it again.




Re: Intermittent meson failures on msys2

2022-07-08 Thread Richard Henderson

On 7/8/22 18:11, Marc-André Lureau wrote:
My guess is that CI randomly fails with "too many open files", as I have seen that 
regularly on various projects with Windows runners. And here, it's probably reaching 
limits when running python/perl scripts simultaneously... I don't see an easy way to solve 
that if that's the issue.


If that's really the issue, with no solution, then we should turn these jobs off.


r~



Re: Intermittent meson failures on msys2

2022-07-08 Thread Daniel P . Berrangé
On Fri, Jul 08, 2022 at 04:41:48PM +0400, Marc-André Lureau wrote:
> Hi
> 
> On Mon, Jun 27, 2022 at 6:41 AM Richard Henderson <
> richard.hender...@linaro.org> wrote:
> 
> > Hi guys,
> >
> > There's an occasional failure on msys2, where meson fails to capture the
> > output of a build
> > script.  E.g.
> >
> > https://gitlab.com/qemu-project/qemu/-/jobs/2642051161
> >
> > FAILED: ui/input-keymap-qcode-to-linux.c.inc
> > "C:/GitLab-Runner/builds/qemu-project/qemu/msys64/mingw64/bin/python3.exe"
> > "C:/GitLab-Runner/builds/qemu-project/qemu/meson/meson.py" "--internal"
> > "exe" "--capture"
> > "ui/input-keymap-qcode-to-linux.c.inc" "--"
> > "C:/GitLab-Runner/builds/qemu-project/qemu/msys64/mingw64/bin/python3.exe"
> > "../ui/keycodemapdb/tools/keymap-gen" "code-map" "--lang" "glib2"
> > "--varname"
> > "qemu_input_map_qcode_to_linux" "../ui/keycodemapdb/data/keymaps.csv"
> > "qcode" "linux"
> > [301/1665] Generating input-keymap-qcode-to-qnum.c.inc with a custom
> > command (wrapped by
> > meson to capture output)
> > ninja: build stopped: subcommand failed.
> >
> >
> > https://gitlab.com/qemu-project/qemu/-/jobs/2625836697
> >
> > FAILED: ui/shader/texture-blit-frag.h
> > "C:/GitLab-Runner/builds/qemu-project/qemu/msys64/mingw64/bin/python3.exe"
> > "C:/GitLab-Runner/builds/qemu-project/qemu/meson/meson.py" "--internal"
> > "exe" "--capture"
> > "ui/shader/texture-blit-frag.h" "--" "perl"
> > "C:/GitLab-Runner/builds/qemu-project/qemu/scripts/shaderinclude.pl"
> > "../ui/shader/texture-blit.frag"
> > [313/1663] Generating texture-blit-vert.h with a custom command (wrapped
> > by meson to
> > capture output)
> > ninja: build stopped: subcommand failed.
> >
> >
> > Could you have a look please?
> >
> >
> >
> Ah, we don't have artifacts for msys2 builds it seems, that would perhaps
> help. It would make sense to at least take meson-logs/*.txt. I'll work on a
> patch.
> 
> My guess is that CI randomly fails with "too many open files", as I have
> seen that regularly on various projects with Windows runners. And here,
> it's probably reaching limits when running python/perl scripts
> simultaneously... I don't see an easy way to solve that if that's the issue.

There shouldn't be very much parallelism even taking place, because

https://docs.gitlab.com/ee/ci/runners/saas/windows_saas_runner.html

says  "Windows runners execute your CI/CD jobs on n1-standard-2 
   instances with 2 vCPUs and 7.5 GB RAM. "

unless ninja is setting a parallelism much higher than nCPUs?


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH 00/10] enable pnv-phb user created devices

2022-07-08 Thread Daniel Henrique Barboza
This series is built on top of

"[PATCH v3 00/12] powernv: introduce pnv-phb base/proxy devices" [1]

that is under review in [1]. I'm sending this last part of the pnv-phb
rework to allow everyone to see what's the endgame I'm planning with
this work.

The main differences between the approach taken here and the one taken
by "[PATCH v2 00/16] powernv: introduce pnv-phb base/proxy
devices" [2] are:

- the root bus objects now inherit phb-id and chip-id. This
turned out to be a clean way of keeping the code QOM compliant, without
having to do things like dev->parent_bus->parent. All the attributes
that the root port needs are found in its bus parent;

- the logic exclusive to user-created devices is all centralized in a
single helper inside pnv-phb realize(). PHB3/PHB4 realize() are
oblivious to whether the device is user-created or not. I believe this
approach is clearer than what I was doing before (a usage example
follows below).
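
With the series applied, a PHB can be created from the command line
roughly like this (values illustrative, powernv8 as an example):

    qemu-system-ppc64 -machine powernv8 -nodefaults \
        -device pnv-phb,chip-id=0,index=1 ...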

I'll respin/rebase these patches depending on the amount of changes
made during the pnv-phb proxy device review.

[1] https://lists.gnu.org/archive/html/qemu-devel/2022-06/msg04347.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2022-05/msg06254.html

Daniel Henrique Barboza (10):
  ppc/pnv: add phb-id/chip-id PnvPHB3RootBus properties
  ppc/pnv: add phb-id/chip-id PnvPHB4RootBus properties
  ppc/pnv: set root port chassis and slot using Bus properties
  ppc/pnv: add helpers for pnv-phb user devices
  ppc/pnv: turn chip8->phbs[] into a PnvPHB* array
  ppc/pnv: enable user created pnv-phb for powernv8
  ppc/pnv: add PHB4 helpers for user created pnv-phb
  ppc/pnv: enable user created pnv-phb powernv9
  ppc/pnv: change pnv_phb4_get_pec() to also retrieve chip10->pecs
  ppc/pnv: user creatable pnv-phb for powernv10

 hw/pci-host/pnv_phb.c  | 166 ++---
 hw/pci-host/pnv_phb3.c |  50 ++
 hw/pci-host/pnv_phb4.c |  51 ++
 hw/pci-host/pnv_phb4_pec.c |   6 +-
 hw/ppc/pnv.c   |  30 +-
 include/hw/pci-host/pnv_phb3.h |   9 +-
 include/hw/pci-host/pnv_phb4.h |  10 ++
 include/hw/ppc/pnv.h   |   6 +-
 8 files changed, 308 insertions(+), 20 deletions(-)

-- 
2.36.1




[PATCH 02/10] ppc/pnv: add phb-id/chip-id PnvPHB4RootBus properties

2022-07-08 Thread Daniel Henrique Barboza
The same rationale provided in the PHB3 bus case applies here.

Note: we could have merged both buses into a single object, like we did
with the root ports, and spared some boilerplate. The reason we opted to
preserve both bus objects is twofold:

- there's no user-side advantage in doing so. Unifying the root ports
brings a clear user QOL improvement when we re-enable user-created
devices. The bus objects, aside from having a different QOM name, are
transparent to the user;

- we leave the door open in case we want to increase the root port limit
for phb4/5 later on without having to deal with phb3 code.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/pci-host/pnv_phb4.c | 51 ++
 include/hw/pci-host/pnv_phb4.h | 10 +++
 2 files changed, 61 insertions(+)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index fefdd3ad89..b4f750bf6d 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -1567,6 +1567,12 @@ void pnv_phb4_bus_init(DeviceState *dev, PnvPHB4 *phb)
  pnv_phb4_set_irq, pnv_phb4_map_irq, phb,
  &phb->pci_mmio, &phb->pci_io,
  0, 4, TYPE_PNV_PHB4_ROOT_BUS);
+
+object_property_set_int(OBJECT(pci->bus), "phb-id", phb->phb_id,
+&error_abort);
+object_property_set_int(OBJECT(pci->bus), "chip-id", phb->chip_id,
+&error_abort);
+
 pci_setup_iommu(pci->bus, pnv_phb4_dma_iommu, phb);
 pci->bus->flags |= PCI_BUS_EXTENDED_CONFIG_SPACE;
 }
@@ -1724,10 +1730,55 @@ static const TypeInfo pnv_phb5_type_info = {
 .instance_size = sizeof(PnvPHB4),
 };
 
+static void pnv_phb4_root_bus_get_prop(Object *obj, Visitor *v,
+   const char *name,
+   void *opaque, Error **errp)
+{
+PnvPHB4RootBus *bus = PNV_PHB4_ROOT_BUS(obj);
+uint64_t value = 0;
+
+if (strcmp(name, "phb-id") == 0) {
+value = bus->phb_id;
+} else {
+value = bus->chip_id;
+}
+
+visit_type_size(v, name, &value, errp);
+}
+
+static void pnv_phb4_root_bus_set_prop(Object *obj, Visitor *v,
+   const char *name,
+   void *opaque, Error **errp)
+
+{
+PnvPHB4RootBus *bus = PNV_PHB4_ROOT_BUS(obj);
+uint64_t value;
+
+if (!visit_type_size(v, name, &value, errp)) {
+return;
+}
+
+if (strcmp(name, "phb-id") == 0) {
+bus->phb_id = value;
+} else {
+bus->chip_id = value;
+}
+}
+
 static void pnv_phb4_root_bus_class_init(ObjectClass *klass, void *data)
 {
 BusClass *k = BUS_CLASS(klass);
 
+object_class_property_add(klass, "phb-id", "int",
+  pnv_phb4_root_bus_get_prop,
+  pnv_phb4_root_bus_set_prop,
+  NULL, NULL);
+
+object_class_property_add(klass, "chip-id", "int",
+  pnv_phb4_root_bus_get_prop,
+  pnv_phb4_root_bus_set_prop,
+  NULL, NULL);
+
 /*
  * PHB4 has only a single root complex. Enforce the limit on the
  * parent bus
diff --git a/include/hw/pci-host/pnv_phb4.h b/include/hw/pci-host/pnv_phb4.h
index 20aa4819d3..50d4faa001 100644
--- a/include/hw/pci-host/pnv_phb4.h
+++ b/include/hw/pci-host/pnv_phb4.h
@@ -45,7 +45,17 @@ typedef struct PnvPhb4DMASpace {
 QLIST_ENTRY(PnvPhb4DMASpace) list;
 } PnvPhb4DMASpace;
 
+/*
+ * PHB4 PCIe Root Bus
+ */
 #define TYPE_PNV_PHB4_ROOT_BUS "pnv-phb4-root"
+struct PnvPHB4RootBus {
+PCIBus parent;
+
+uint32_t chip_id;
+uint32_t phb_id;
+};
+OBJECT_DECLARE_SIMPLE_TYPE(PnvPHB4RootBus, PNV_PHB4_ROOT_BUS)
 
 /*
  * PHB4 PCIe Host Bridge for PowerNV machines (POWER9)
-- 
2.36.1




[PATCH 04/10] ppc/pnv: add helpers for pnv-phb user devices

2022-07-08 Thread Daniel Henrique Barboza
pnv_parent_qom_fixup() and pnv_parent_bus_fixup() are versions of the
helpers that were reverted by commit 9c10d86fee "ppc/pnv: Remove
user-created PHB{3,4,5} devices". They are needed to amend the QOM and
bus hierarchies of user-created pnv-phbs, matching them with default
pnv-phbs.

A new helper, pnv_phb_user_device_init(), is created to handle
user-created device setup. We're going to call it inside
pnv_phb_realize() in case we're realizing a user-created device. This
will centralize all user-created device handling in a single spot,
leaving the realize functions of the phb3/phb4 backends untouched.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/pci-host/pnv_phb.c | 69 +++
 1 file changed, 69 insertions(+)

diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index 826c0c144e..da779dc298 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -18,6 +18,37 @@
 #include "hw/qdev-properties.h"
 #include "qom/object.h"
 
+
+/*
+ * Set the QOM parent of an object child. If the device state
+ * associated with the child has an id, use it as QOM id. Otherwise
+ * use object_typename[index] as QOM id.
+ */
+static void pnv_parent_qom_fixup(Object *parent, Object *child, int index)
+{
+g_autofree char *default_id =
+g_strdup_printf("%s[%d]", object_get_typename(child), index);
+const char *dev_id = DEVICE(child)->id;
+
+if (child->parent == parent) {
+return;
+}
+
+object_ref(child);
+object_unparent(child);
+object_property_add_child(parent, dev_id ? dev_id : default_id, child);
+object_unref(child);
+}
+
+static void pnv_parent_bus_fixup(DeviceState *parent, DeviceState *child)
+{
+BusState *parent_bus = qdev_get_parent_bus(parent);
+
+if (!qdev_set_parent_bus(child, parent_bus, &error_fatal)) {
+return;
+}
+}
+
 /*
  * Attach a root port device.
  *
@@ -41,6 +72,36 @@ static void pnv_phb_attach_root_port(PCIHostState *pci)
 pci_realize_and_unref(root, pci->bus, &error_fatal);
 }
 
+/*
+ * User created devices won't have the initial setup that default
+ * devices have. This setup consists of assigning a parent device
+ * (chip for PHB3, PEC for PHB4/5) that will be the QOM/bus parent
+ * of the PHB.
+ */
+static void pnv_phb_user_device_init(PnvPHB *phb)
+{
+PnvMachineState *pnv = PNV_MACHINE(qdev_get_machine());
+PnvChip *chip = pnv_get_chip(pnv, phb->chip_id);
+Object *parent = NULL;
+
+if (!chip) {
+error_setg(&error_fatal, "invalid chip id: %d", phb->chip_id);
+return;
+}
+
+/*
+ * Reparent user created devices to the chip to build
+ * correctly the device tree. pnv_xscom_dt() needs every
+ * PHB to be a child of the chip to build the DT correctly.
+ *
+ * TODO: for version 3 we're still parenting the PHB with the
+ * chip. We should parent with a (so far not implemented)
+ * PHB3 PEC device.
+ */
+pnv_parent_qom_fixup(parent, OBJECT(phb), phb->phb_id);
+pnv_parent_bus_fixup(DEVICE(chip), DEVICE(phb));
+}
+
 static void pnv_phb_realize(DeviceState *dev, Error **errp)
 {
 PnvPHB *phb = PNV_PHB(dev);
@@ -74,6 +135,14 @@ static void pnv_phb_realize(DeviceState *dev, Error **errp)
 object_property_set_uint(phb->backend, "chip-id", phb->chip_id, errp);
 object_property_set_link(phb->backend, "phb-base", OBJECT(phb), errp);
 
+/*
+ * Handle user created devices. User devices will not have a
+ * pointer to a chip (PHB3) and a PEC (PHB4/5).
+ */
+if (!phb->chip && !phb->pec) {
+pnv_phb_user_device_init(phb);
+}
+
 if (phb->version == 3) {
 object_property_set_link(phb->backend, "chip",
  OBJECT(phb->chip), errp);
-- 
2.36.1




[PATCH 06/10] ppc/pnv: enable user created pnv-phb for powernv8

2022-07-08 Thread Daniel Henrique Barboza
The bulk of the work was already done by previous patches.

Use defaults_enabled() to determine whether we need to create the
default devices or not.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/pci-host/pnv_phb.c | 9 +++--
 hw/ppc/pnv.c  | 6 ++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index 077f391d59..953c384bf6 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -17,6 +17,7 @@
 #include "hw/ppc/pnv.h"
 #include "hw/qdev-properties.h"
 #include "qom/object.h"
+#include "sysemu/sysemu.h"
 
 
 /*
@@ -171,6 +172,10 @@ static void pnv_phb_realize(DeviceState *dev, Error **errp)
 pnv_phb4_bus_init(dev, PNV_PHB4(phb->backend));
 }
 
+if (phb->version == 3 && !defaults_enabled()) {
+return;
+}
+
 pnv_phb_attach_root_port(pci);
 }
 
@@ -206,7 +211,7 @@ static void pnv_phb_class_init(ObjectClass *klass, void *data)
 dc->realize = pnv_phb_realize;
 device_class_set_props(dc, pnv_phb_properties);
 set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
-dc->user_creatable = false;
+dc->user_creatable = true;
 }
 
 static void pnv_phb_root_port_reset(DeviceState *dev)
@@ -297,7 +302,7 @@ static void pnv_phb_root_port_class_init(ObjectClass *klass, void *data)
 device_class_set_parent_reset(dc, pnv_phb_root_port_reset,
   &rpc->parent_reset);
 dc->reset = &pnv_phb_root_port_reset;
-dc->user_creatable = false;
+dc->user_creatable = true;
 
 k->vendor_id = PCI_VENDOR_ID_IBM;
 /* device_id will be written during realize() */
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index a94f269644..f5af40ce39 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -1155,6 +1155,10 @@ static void pnv_chip_power8_instance_init(Object *obj)
 
 object_initialize_child(obj, "homer", &chip8->homer, TYPE_PNV8_HOMER);
 
+if (!defaults_enabled()) {
+return;
+}
+
 chip8->num_phbs = pcc->num_phbs;
 
 for (i = 0; i < chip8->num_phbs; i++) {
@@ -2107,6 +2111,8 @@ static void pnv_machine_power8_class_init(ObjectClass *oc, void *data)
 
 pmc->compat = compat;
 pmc->compat_size = sizeof(compat);
+
+machine_class_allow_dynamic_sysbus_dev(mc, TYPE_PNV_PHB);
 }
 
 static void pnv_machine_power9_class_init(ObjectClass *oc, void *data)
-- 
2.36.1




[PATCH 01/10] ppc/pnv: add phb-id/chip-id PnvPHB3RootBus properties

2022-07-08 Thread Daniel Henrique Barboza
We rely on the phb-id and chip-id, which are PHB properties, to assign
chassis and slot to the root port. For default devices this is no big
deal: the root port is being created under pnv_phb_realize() and the
values are being passed on via the 'index' and 'chip-id' of the
pnv_phb_attach_root_port() helper.

If we want to implement user-created root ports we have a problem. The
user-created root port will not be aware of which PHB it belongs to,
unless we're willing to violate QOM best practices and access the PHB
via dev->parent_bus->parent. What we can do instead is access the root
port's parent bus, i.e. the root bus itself.

Since we're already assigning the root port as a QOM child of the bus,
and the bus is initialized using PHB properties, let's add phb-id and
chip-id as properties of the bus. This will give us trivial access to
them, for both user-created and default root ports, without doing
anything too shady with QOM.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/pci-host/pnv_phb3.c | 50 ++
 include/hw/pci-host/pnv_phb3.h |  9 +-
 2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/hw/pci-host/pnv_phb3.c b/hw/pci-host/pnv_phb3.c
index 2966374008..b8e5b2423e 100644
--- a/hw/pci-host/pnv_phb3.c
+++ b/hw/pci-host/pnv_phb3.c
@@ -1006,6 +1006,11 @@ void pnv_phb3_bus_init(DeviceState *dev, PnvPHB3 *phb)
  &phb->pci_mmio, &phb->pci_io,
  0, 4, TYPE_PNV_PHB3_ROOT_BUS);
 
+object_property_set_int(OBJECT(pci->bus), "phb-id", phb->phb_id,
+&error_abort);
+object_property_set_int(OBJECT(pci->bus), "chip-id", phb->chip_id,
+&error_abort);
+
 pci_setup_iommu(pci->bus, pnv_phb3_dma_iommu, phb);
 }
 
@@ -1105,10 +1110,55 @@ static const TypeInfo pnv_phb3_type_info = {
 .instance_init = pnv_phb3_instance_init,
 };
 
+static void pnv_phb3_root_bus_get_prop(Object *obj, Visitor *v,
+   const char *name,
+   void *opaque, Error **errp)
+{
+PnvPHB3RootBus *bus = PNV_PHB3_ROOT_BUS(obj);
+uint64_t value = 0;
+
+if (strcmp(name, "phb-id") == 0) {
+value = bus->phb_id;
+} else {
+value = bus->chip_id;
+}
+
+visit_type_size(v, name, &value, errp);
+}
+
+static void pnv_phb3_root_bus_set_prop(Object *obj, Visitor *v,
+   const char *name,
+   void *opaque, Error **errp)
+
+{
+PnvPHB3RootBus *bus = PNV_PHB3_ROOT_BUS(obj);
+uint64_t value;
+
+if (!visit_type_size(v, name, &value, errp)) {
+return;
+}
+
+if (strcmp(name, "phb-id") == 0) {
+bus->phb_id = value;
+} else {
+bus->chip_id = value;
+}
+}
+
 static void pnv_phb3_root_bus_class_init(ObjectClass *klass, void *data)
 {
 BusClass *k = BUS_CLASS(klass);
 
+object_class_property_add(klass, "phb-id", "int",
+  pnv_phb3_root_bus_get_prop,
+  pnv_phb3_root_bus_set_prop,
+  NULL, NULL);
+
+object_class_property_add(klass, "chip-id", "int",
+  pnv_phb3_root_bus_get_prop,
+  pnv_phb3_root_bus_set_prop,
+  NULL, NULL);
+
 /*
  * PHB3 has only a single root complex. Enforce the limit on the
  * parent bus
diff --git a/include/hw/pci-host/pnv_phb3.h b/include/hw/pci-host/pnv_phb3.h
index bff69201d9..4854f6d2f6 100644
--- a/include/hw/pci-host/pnv_phb3.h
+++ b/include/hw/pci-host/pnv_phb3.h
@@ -104,9 +104,16 @@ struct PnvPBCQState {
 };
 
 /*
- * PHB3 PCIe Root port
+ * PHB3 PCIe Root Bus
  */
 #define TYPE_PNV_PHB3_ROOT_BUS "pnv-phb3-root"
+struct PnvPHB3RootBus {
+PCIBus parent;
+
+uint32_t chip_id;
+uint32_t phb_id;
+};
+OBJECT_DECLARE_SIMPLE_TYPE(PnvPHB3RootBus, PNV_PHB3_ROOT_BUS)
 
 /*
  * PHB3 PCIe Host Bridge for PowerNV machines (POWER8)
-- 
2.36.1




[PATCH 07/10] ppc/pnv: add PHB4 helpers for user created pnv-phb

2022-07-08 Thread Daniel Henrique Barboza
The PHB4 backend relies on a link with the corresponding PEC element.
This is trivial to do during machine_init() time for default devices,
but not so much for user created ones.

pnv_phb4_get_pec() is a small variation of the function that was
reverted by commit 9c10d86fee "ppc/pnv: Remove user-created PHB{3,4,5}
devices". We'll use it to determine the appropriate PEC for a given user
created pnv-phb that uses a PHB4 backend.

This is done during realize() time, in pnv_phb_user_device_init().

Signed-off-by: Daniel Henrique Barboza 
---
 hw/pci-host/pnv_phb.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index 953c384bf6..9807d093f5 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -50,6 +50,34 @@ static void pnv_parent_bus_fixup(DeviceState *parent, DeviceState *child)
 }
 }
 
+static PnvPhb4PecState *pnv_phb4_get_pec(PnvChip *chip, PnvPHB4 *phb,
+ Error **errp)
+{
+Pnv9Chip *chip9 = PNV9_CHIP(chip);
+int chip_id = phb->chip_id;
+int index = phb->phb_id;
+int i, j;
+
+for (i = 0; i < chip->num_pecs; i++) {
+/*
+ * For each PEC, check the amount of phbs it supports
+ * and see if the given phb4 index matches an index.
+ */
+PnvPhb4PecState *pec = &chip9->pecs[i];
+
+for (j = 0; j < pec->num_phbs; j++) {
+if (index == pnv_phb4_pec_get_phb_id(pec, j)) {
+return pec;
+}
+}
+}
+error_setg(errp,
+   "pnv-phb4 chip-id %d index %d didn't match any existing PEC",
+   chip_id, index);
+
+return NULL;
+}
+
 /*
  * Attach a root port device.
  *
@@ -99,6 +127,17 @@ static void pnv_phb_user_device_init(PnvPHB *phb)
 chip8->num_phbs++;
 
 parent = OBJECT(phb->chip);
+} else {
+Error *local_err = NULL;
+
+phb->pec = pnv_phb4_get_pec(chip, PNV_PHB4(phb->backend), &local_err);
+
+if (local_err) {
+error_propagate(&error_fatal, local_err);
+return;
+}
+
+parent = OBJECT(phb->pec);
 }
 
 /*
-- 
2.36.1




[PATCH 08/10] ppc/pnv: enable user created pnv-phb powernv9

2022-07-08 Thread Daniel Henrique Barboza
Enable user-created pnv-phb devices for powernv9 now that we have
everything in place.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/pci-host/pnv_phb.c  | 2 +-
 hw/pci-host/pnv_phb4_pec.c | 6 --
 hw/ppc/pnv.c   | 2 ++
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index 9807d093f5..c241e90036 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -211,7 +211,7 @@ static void pnv_phb_realize(DeviceState *dev, Error **errp)
 pnv_phb4_bus_init(dev, PNV_PHB4(phb->backend));
 }
 
-if (phb->version == 3 && !defaults_enabled()) {
+if (!defaults_enabled()) {
 return;
 }
 
diff --git a/hw/pci-host/pnv_phb4_pec.c b/hw/pci-host/pnv_phb4_pec.c
index 8dc363d69c..9871f462cd 100644
--- a/hw/pci-host/pnv_phb4_pec.c
+++ b/hw/pci-host/pnv_phb4_pec.c
@@ -146,8 +146,10 @@ static void pnv_pec_realize(DeviceState *dev, Error **errp)
 pec->num_phbs = pecc->num_phbs[pec->index];
 
 /* Create PHBs if running with defaults */
-for (i = 0; i < pec->num_phbs; i++) {
-pnv_pec_default_phb_realize(pec, i, errp);
+if (defaults_enabled()) {
+for (i = 0; i < pec->num_phbs; i++) {
+pnv_pec_default_phb_realize(pec, i, errp);
+}
 }
 
 /* Initialize the XSCOM regions for the PEC registers */
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index f5af40ce39..32040a52c8 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -2138,6 +2138,8 @@ static void pnv_machine_power9_class_init(ObjectClass *oc, void *data)
 pmc->compat = compat;
 pmc->compat_size = sizeof(compat);
 pmc->dt_power_mgt = pnv_dt_power_mgt;
+
+machine_class_allow_dynamic_sysbus_dev(mc, TYPE_PNV_PHB);
 }
 
 static void pnv_machine_power10_class_init(ObjectClass *oc, void *data)
-- 
2.36.1




[PATCH 03/10] ppc/pnv: set root port chassis and slot using Bus properties

2022-07-08 Thread Daniel Henrique Barboza
For default root ports we have a way of accessing chassis and slot,
before root_port_realize(), via pnv_phb_attach_root_port(). For the
future user-created root ports this won't be the case: we can't use
this helper because we don't have access to the PHB phb-id/chip-id
values.

In earlier patches we've added phb-id and chip-id to pnv-phb-root-bus
objects. We're now able to use the bus to retrieve them. The bus is
reachable for both user-created and default devices, so we're changing
all the code paths. This also allows us to validate these changes with
the existing default devices.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/pci-host/pnv_phb.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index c47ed92462..826c0c144e 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -25,21 +25,19 @@
  * QOM id. 'chip_id' is going to be used as PCIE chassis for the
  * root port.
  */
-static void pnv_phb_attach_root_port(PCIHostState *pci, int index, int chip_id)
+static void pnv_phb_attach_root_port(PCIHostState *pci)
 {
 PCIDevice *root = pci_new(PCI_DEVFN(0, 0), TYPE_PNV_PHB_ROOT_PORT);
-g_autofree char *default_id = g_strdup_printf("%s[%d]",
-  TYPE_PNV_PHB_ROOT_PORT,
-  index);
 const char *dev_id = DEVICE(root)->id;
+g_autofree char *default_id = NULL;
+int index;
+
+index = object_property_get_int(OBJECT(pci->bus), "phb-id", &error_fatal);
+default_id = g_strdup_printf("%s[%d]", TYPE_PNV_PHB_ROOT_PORT, index);
 
 object_property_add_child(OBJECT(pci->bus), dev_id ? dev_id : default_id,
   OBJECT(root));
 
-/* Set unique chassis/slot values for the root port */
-qdev_prop_set_uint8(DEVICE(root), "chassis", chip_id);
-qdev_prop_set_uint16(DEVICE(root), "slot", index);
-
 pci_realize_and_unref(root, pci->bus, &error_fatal);
 }
 
@@ -93,7 +91,7 @@ static void pnv_phb_realize(DeviceState *dev, Error **errp)
 pnv_phb4_bus_init(dev, PNV_PHB4(phb->backend));
 }
 
-pnv_phb_attach_root_port(pci, phb->phb_id, phb->chip_id);
+pnv_phb_attach_root_port(pci);
 }
 
 static const char *pnv_phb_root_bus_path(PCIHostState *host_bridge,
@@ -162,9 +160,18 @@ static void pnv_phb_root_port_realize(DeviceState *dev, Error **errp)
 {
 PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(dev);
 PnvPHBRootPort *phb_rp = PNV_PHB_ROOT_PORT(dev);
+PCIBus *bus = PCI_BUS(qdev_get_parent_bus(dev));
 PCIDevice *pci = PCI_DEVICE(dev);
 uint16_t device_id = 0;
 Error *local_err = NULL;
+int chip_id, index;
+
+chip_id = object_property_get_int(OBJECT(bus), "chip-id", &error_fatal);
+index = object_property_get_int(OBJECT(bus), "phb-id", &error_fatal);
+
+/* Set unique chassis/slot values for the root port */
+qdev_prop_set_uint8(dev, "chassis", chip_id);
+qdev_prop_set_uint16(dev, "slot", index);
 
 rpc->parent_realize(dev, &local_err);
 if (local_err) {
-- 
2.36.1




[PATCH 05/10] ppc/pnv: turn chip8->phbs[] into a PnvPHB* array

2022-07-08 Thread Daniel Henrique Barboza
When enabling user-created PHBs (a change reverted by commit 9c10d86fee)
we were handling PHBs created by default versus by the user in different
manners. The only difference between these PHBs is that the default one
has a valid phb3->chip assigned during pnv_chip_power8_realize(), while
a user-created one needs to search for the chip it belongs to.

Aside from that there shouldn't be any difference. Making the default
PHBs behave in line with the user-created ones will make it easier to
re-introduce them later on. It will also make the code easier to follow,
since we deal with both in the same manner.

The first step is to turn chip8->phbs[] into a PnvPHB pointer array.
This will allow us to assign user-created PHBs into it later on. The way
we initialize the default case is now more in line with what happens in
the user-created case: the object is created, parented by the chip
(because pnv_xscom_dt() relies on it), and then assigned to the array.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/pci-host/pnv_phb.c | 11 +++
 hw/ppc/pnv.c  | 20 +++-
 include/hw/ppc/pnv.h  |  6 +-
 3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index da779dc298..077f391d59 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -89,6 +89,17 @@ static void pnv_phb_user_device_init(PnvPHB *phb)
 return;
 }
 
+if (phb->version == 3) {
+Pnv8Chip *chip8 = PNV8_CHIP(chip);
+
+phb->chip = chip;
+
+chip8->phbs[chip8->num_phbs] = phb;
+chip8->num_phbs++;
+
+parent = OBJECT(phb->chip);
+}
+
 /*
  * Reparent user created devices to the chip to build
  * correctly the device tree. pnv_xscom_dt() needs every
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index d649ed6b1b..a94f269644 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -661,7 +661,7 @@ static void pnv_chip_power8_pic_print_info(PnvChip *chip, 
Monitor *mon)
 ics_pic_print_info(&chip8->psi.ics, mon);
 
 for (i = 0; i < chip8->num_phbs; i++) {
-PnvPHB *phb = &chip8->phbs[i];
+PnvPHB *phb = chip8->phbs[i];
 PnvPHB3 *phb3 = PNV_PHB3(phb->backend);
 
 pnv_phb3_msi_pic_print_info(&phb3->msis, mon);
@@ -1158,7 +1158,17 @@ static void pnv_chip_power8_instance_init(Object *obj)
 chip8->num_phbs = pcc->num_phbs;
 
 for (i = 0; i < chip8->num_phbs; i++) {
-object_initialize_child(obj, "phb[*]", &chip8->phbs[i], TYPE_PNV_PHB);
+PnvPHB *phb = PNV_PHB(object_new(TYPE_PNV_PHB));
+
+/*
+ * We need the chip to parent the PHB to allow the DT
+ * to build correctly (via pnv_xscom_dt()).
+ *
+ * TODO: the PHB should be parented by a PEC device that, at
+ * this moment, is not modelled powernv8/phb3.
+ */
+object_property_add_child(obj, "phb[*]", OBJECT(phb));
+chip8->phbs[i] = phb;
 }
 
 }
@@ -1274,7 +1284,7 @@ static void pnv_chip_power8_realize(DeviceState *dev, 
Error **errp)
 
 /* PHB controllers */
 for (i = 0; i < chip8->num_phbs; i++) {
-PnvPHB *phb = &chip8->phbs[i];
+PnvPHB *phb = chip8->phbs[i];
 
 object_property_set_int(OBJECT(phb), "index", i, &error_fatal);
 object_property_set_int(OBJECT(phb), "chip-id", chip->chip_id,
@@ -1942,7 +1952,7 @@ static ICSState *pnv_ics_get(XICSFabric *xi, int irq)
 }
 
 for (j = 0; j < chip8->num_phbs; j++) {
-PnvPHB *phb = &chip8->phbs[j];
+PnvPHB *phb = chip8->phbs[j];
 PnvPHB3 *phb3 = PNV_PHB3(phb->backend);
 
 if (ics_valid_irq(&phb3->lsis, irq)) {
@@ -1981,7 +1991,7 @@ static void pnv_ics_resend(XICSFabric *xi)
 ics_resend(&chip8->psi.ics);
 
 for (j = 0; j < chip8->num_phbs; j++) {
-PnvPHB *phb = &chip8->phbs[j];
+PnvPHB *phb = chip8->phbs[j];
 PnvPHB3 *phb3 = PNV_PHB3(phb->backend);
 
 ics_resend(&phb3->lsis);
diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h
index 033d907287..aea6128e7f 100644
--- a/include/hw/ppc/pnv.h
+++ b/include/hw/ppc/pnv.h
@@ -81,7 +81,11 @@ struct Pnv8Chip {
 PnvHomer homer;
 
 #define PNV8_CHIP_PHB3_MAX 4
-PnvPHB   phbs[PNV8_CHIP_PHB3_MAX];
+/*
+ * The array is used to allow quick access to the phbs by
+ * pnv_ics_get_child() and pnv_ics_resend_child().
+ */
+PnvPHB   *phbs[PNV8_CHIP_PHB3_MAX];
 uint32_t num_phbs;
 
 XICSFabric*xics;
-- 
2.36.1




[PATCH 09/10] ppc/pnv: change pnv_phb4_get_pec() to also retrieve chip10->pecs

2022-07-08 Thread Daniel Henrique Barboza
The function assumes that we're always dealing with a PNV9_CHIP()
object. This is not the case when the pnv-phb device belongs to a
powernv10 machine.

Change pnv_phb4_get_pec() to be able to work with PNV10_CHIP() if
necessary.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/pci-host/pnv_phb.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index c241e90036..a5f3a8d256 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -53,17 +53,30 @@ static void pnv_parent_bus_fixup(DeviceState *parent, 
DeviceState *child)
 static PnvPhb4PecState *pnv_phb4_get_pec(PnvChip *chip, PnvPHB4 *phb,
  Error **errp)
 {
-Pnv9Chip *chip9 = PNV9_CHIP(chip);
+PnvPHB *phb_base = phb->phb_base;
+PnvPhb4PecState *pecs = NULL;
 int chip_id = phb->chip_id;
 int index = phb->phb_id;
 int i, j;
 
+if (phb_base->version == 4) {
+Pnv9Chip *chip9 = PNV9_CHIP(chip);
+
+pecs = chip9->pecs;
+} else if (phb_base->version == 5) {
+Pnv10Chip *chip10 = PNV10_CHIP(chip);
+
+pecs = chip10->pecs;
+} else {
+return NULL;
+}
+
 for (i = 0; i < chip->num_pecs; i++) {
 /*
  * For each PEC, check the amount of phbs it supports
  * and see if the given phb4 index matches an index.
  */
-PnvPhb4PecState *pec = &chip9->pecs[i];
+PnvPhb4PecState *pec = &pecs[i];
 
 for (j = 0; j < pec->num_phbs; j++) {
 if (index == pnv_phb4_pec_get_phb_id(pec, j)) {
-- 
2.36.1




[PATCH 10/10] ppc/pnv: user creatable pnv-phb for powernv10

2022-07-08 Thread Daniel Henrique Barboza
Given that powernv9 and powernv10 use the same pnv-phb backend, the
logic to allow user created pnv-phbs for powernv10 is already in place.
Let's flip the switch.
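
As a purely illustrative example (not part of this patch), once the
switch is flipped a PHB should be creatable from the command line,
along the lines of:

    qemu-system-ppc64 -machine powernv10 \
        -device pnv-phb,chip-id=0,index=1

assuming the device exposes the 'chip-id' and 'index' properties that
the machine code sets elsewhere in this series; treat the exact option
names as a sketch rather than a documented interface.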

Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/pnv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 32040a52c8..b9c1bbaa84 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -2163,6 +2163,8 @@ static void pnv_machine_power10_class_init(ObjectClass 
*oc, void *data)
 pmc->dt_power_mgt = pnv_dt_power_mgt;
 
 xfc->match_nvt = pnv10_xive_match_nvt;
+
+machine_class_allow_dynamic_sysbus_dev(mc, TYPE_PNV_PHB);
 }
 
 static bool pnv_machine_get_hb(Object *obj, Error **errp)
-- 
2.36.1




Re: [PATCH 1/9] monitor: make error_vprintf_unless_qmp() static

2022-07-08 Thread Markus Armbruster
Marc-André Lureau  writes:

> Hi
>
> On Thu, Jul 7, 2022 at 4:25 PM Markus Armbruster  wrote:
>
>> marcandre.lur...@redhat.com writes:
>>
>> > From: Marc-André Lureau 
>> >
>> > Not needed outside monitor.c. Remove the needless stub.
>> >
>> > Signed-off-by: Marc-André Lureau 
>> > ---
>> >  include/monitor/monitor.h | 1 -
>> >  monitor/monitor.c | 3 ++-
>> >  stubs/error-printf.c  | 5 -
>> >  3 files changed, 2 insertions(+), 7 deletions(-)
>> >
>> > diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
>> > index a4b40e8391db..44653e195b45 100644
>> > --- a/include/monitor/monitor.h
>> > +++ b/include/monitor/monitor.h
>> > @@ -56,7 +56,6 @@ void monitor_register_hmp(const char *name, bool info,
>> >  void monitor_register_hmp_info_hrt(const char *name,
>> > HumanReadableText *(*handler)(Error 
>> > **errp));
>> >
>> > -int error_vprintf_unless_qmp(const char *fmt, va_list ap) 
>> > G_GNUC_PRINTF(1, 0);
>> >  int error_printf_unless_qmp(const char *fmt, ...) G_GNUC_PRINTF(1, 2);
>> >
>> >  #endif /* MONITOR_H */
>> > diff --git a/monitor/monitor.c b/monitor/monitor.c
>> > index 86949024f643..ba4c1716a48a 100644
>> > --- a/monitor/monitor.c
>> > +++ b/monitor/monitor.c
>> > @@ -273,7 +273,8 @@ int error_vprintf(const char *fmt, va_list ap)
>> >  return vfprintf(stderr, fmt, ap);
>> >  }
>> >
>> > -int error_vprintf_unless_qmp(const char *fmt, va_list ap)
>> > +G_GNUC_PRINTF(1, 0)
>> > +static int error_vprintf_unless_qmp(const char *fmt, va_list ap)
>> >  {
>> >  Monitor *cur_mon = monitor_cur();
>> >
>> > diff --git a/stubs/error-printf.c b/stubs/error-printf.c
>> > index 0e326d801059..1afa0f62ca26 100644
>> > --- a/stubs/error-printf.c
>> > +++ b/stubs/error-printf.c
>> > @@ -16,8 +16,3 @@ int error_vprintf(const char *fmt, va_list ap)
>> >  }
>> >  return vfprintf(stderr, fmt, ap);
>> >  }
>> > -
>> > -int error_vprintf_unless_qmp(const char *fmt, va_list ap)
>> > -{
>> > -return error_vprintf(fmt, ap);
>> > -}
>>
>> When I write a printf-like utility function, I habitually throw in a
>> vprintf-like function.
>>
>> Any particular reason for hiding this one?  To avoid misunderstandings:
>> I'm fine with hiding it if it's causing you trouble.
>
> I don't think I had an issue with it, only that I wrote tests for the
> error-report.h API, and didn't see the need to cover a function that isn't
> used outside the unit.

I'd keep it and not worry about missing tests; the tests of
error_printf_unless_qmp() exercise it fine.

>> Except I think we'd better delete than hide then: inline into
>> error_printf_unless_qmp().  Makes sense?
>
> It can't be easily inlined because of the surrounding va_start/va_end

Easily enough, I think:

diff --git a/monitor/monitor.c b/monitor/monitor.c
index 86949024f6..201a672ac6 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -273,27 +273,22 @@ int error_vprintf(const char *fmt, va_list ap)
 return vfprintf(stderr, fmt, ap);
 }
 
-int error_vprintf_unless_qmp(const char *fmt, va_list ap)
-{
-Monitor *cur_mon = monitor_cur();
-
-if (!cur_mon) {
-return vfprintf(stderr, fmt, ap);
-}
-if (!monitor_cur_is_qmp()) {
-return monitor_vprintf(cur_mon, fmt, ap);
-}
-return -1;
-}
-
 int error_printf_unless_qmp(const char *fmt, ...)
 {
+Monitor *cur_mon = monitor_cur();
 va_list ap;
 int ret;
 
 va_start(ap, fmt);
-ret = error_vprintf_unless_qmp(fmt, ap);
+if (!cur_mon) {
+ret = vfprintf(stderr, fmt, ap);
+} else if (!monitor_cur_is_qmp()) {
+ret = monitor_vprintf(cur_mon, fmt, ap);
+} else {
+ret = -1;
+}
 va_end(ap);
+
 return ret;
 }
 




Re: [PATCH 1/9] monitor: make error_vprintf_unless_qmp() static

2022-07-08 Thread Marc-André Lureau
Hi

On Fri, Jul 8, 2022 at 5:56 PM Markus Armbruster  wrote:

> Marc-André Lureau  writes:
>
> > Hi
> >
> > On Thu, Jul 7, 2022 at 4:25 PM Markus Armbruster 
> wrote:
> >
> >> marcandre.lur...@redhat.com writes:
> >>
> >> > From: Marc-André Lureau 
> >> >
> >> > Not needed outside monitor.c. Remove the needless stub.
> >> >
> >> > Signed-off-by: Marc-André Lureau 
> >> > ---
> >> >  include/monitor/monitor.h | 1 -
> >> >  monitor/monitor.c | 3 ++-
> >> >  stubs/error-printf.c  | 5 -
> >> >  3 files changed, 2 insertions(+), 7 deletions(-)
> >> >
> >> > diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> >> > index a4b40e8391db..44653e195b45 100644
> >> > --- a/include/monitor/monitor.h
> >> > +++ b/include/monitor/monitor.h
> >> > @@ -56,7 +56,6 @@ void monitor_register_hmp(const char *name, bool
> info,
> >> >  void monitor_register_hmp_info_hrt(const char *name,
> >> > HumanReadableText
> *(*handler)(Error **errp));
> >> >
> >> > -int error_vprintf_unless_qmp(const char *fmt, va_list ap)
> G_GNUC_PRINTF(1, 0);
> >> >  int error_printf_unless_qmp(const char *fmt, ...) G_GNUC_PRINTF(1,
> 2);
> >> >
> >> >  #endif /* MONITOR_H */
> >> > diff --git a/monitor/monitor.c b/monitor/monitor.c
> >> > index 86949024f643..ba4c1716a48a 100644
> >> > --- a/monitor/monitor.c
> >> > +++ b/monitor/monitor.c
> >> > @@ -273,7 +273,8 @@ int error_vprintf(const char *fmt, va_list ap)
> >> >  return vfprintf(stderr, fmt, ap);
> >> >  }
> >> >
> >> > -int error_vprintf_unless_qmp(const char *fmt, va_list ap)
> >> > +G_GNUC_PRINTF(1, 0)
> >> > +static int error_vprintf_unless_qmp(const char *fmt, va_list ap)
> >> >  {
> >> >  Monitor *cur_mon = monitor_cur();
> >> >
> >> > diff --git a/stubs/error-printf.c b/stubs/error-printf.c
> >> > index 0e326d801059..1afa0f62ca26 100644
> >> > --- a/stubs/error-printf.c
> >> > +++ b/stubs/error-printf.c
> >> > @@ -16,8 +16,3 @@ int error_vprintf(const char *fmt, va_list ap)
> >> >  }
> >> >  return vfprintf(stderr, fmt, ap);
> >> >  }
> >> > -
> >> > -int error_vprintf_unless_qmp(const char *fmt, va_list ap)
> >> > -{
> >> > -return error_vprintf(fmt, ap);
> >> > -}
> >>
> >> When I write a printf-like utility function, I habitually throw in a
> >> vprintf-like function.
> >>
> >> Any particular reason for hiding this one?  To avoid misunderstandings:
> >> I'm fine with hiding it if it's causing you trouble.
> >
> > I don't think I had an issue with it, only that I wrote tests for the
> > error-report.h API, and didn't see the need to cover a function that
> isn't
> > used outside the unit.
>
> I'd keep it and not worry about missing tests; the tests of
> error_printf_unless_qmp() exercise it fine.
>

ok


>
> >> Except I think we'd better delete than hide then: inline into
> >> error_printf_unless_qmp().  Makes sense?
> >
> > It can't be easily inlined because of the surrounding va_start/va_end
>
> Easily enough, I think:
>

ah yes indeed! :)


>
> diff --git a/monitor/monitor.c b/monitor/monitor.c
> index 86949024f6..201a672ac6 100644
> --- a/monitor/monitor.c
> +++ b/monitor/monitor.c
> @@ -273,27 +273,22 @@ int error_vprintf(const char *fmt, va_list ap)
>  return vfprintf(stderr, fmt, ap);
>  }
>
> -int error_vprintf_unless_qmp(const char *fmt, va_list ap)
> -{
> -Monitor *cur_mon = monitor_cur();
> -
> -if (!cur_mon) {
> -return vfprintf(stderr, fmt, ap);
> -}
> -if (!monitor_cur_is_qmp()) {
> -return monitor_vprintf(cur_mon, fmt, ap);
> -}
> -return -1;
> -}
> -
>  int error_printf_unless_qmp(const char *fmt, ...)
>  {
> +Monitor *cur_mon = monitor_cur();
>  va_list ap;
>  int ret;
>
>  va_start(ap, fmt);
> -ret = error_vprintf_unless_qmp(fmt, ap);
> +if (!cur_mon) {
> +ret = vfprintf(stderr, fmt, ap);
> +} else if (!monitor_cur_is_qmp()) {
> +ret = monitor_vprintf(cur_mon, fmt, ap);
> +} else {
> +ret = -1;
> +}
>  va_end(ap);
> +
>  return ret;
>  }
>
>
>

-- 
Marc-André Lureau


Re: [PATCH] hw/i386: pass RNG seed to e820 setup table

2022-07-08 Thread Daniel P. Berrangé
On Fri, Jul 08, 2022 at 02:04:40PM +0200, Jason A. Donenfeld wrote:
> Hi Daniel,
> 
> On Fri, Jul 8, 2022 at 2:00 PM Daniel P. Berrangé  wrote:
> >
> > On Thu, Jun 30, 2022 at 01:37:17PM +0200, Jason A. Donenfeld wrote:
> > > Tiny machines optimized for fast boot time generally don't use EFI,
> > > which means a random seed has to be supplied some other way, in this
> > > case by the e820 setup table, which supplies a place for one. This
> > > commit adds passing this random seed via the table. It is confirmed to
> > > be working with the Linux patch in the link.
> >
> > IIUC, this approach will only expose the random seed when QEMU
> > is booted using -kernel + -initrd args.
> >
> > I agree with what you say about most VMs not using UEFI right now.
> > I'd say the majority of general purpose VMs are using SeaBIOS
> > still. The usage of -kernel + -initrd, is typically for more
> > specialized use cases.
> 
> Highly disagree, based on seeing a lot of real world deployment.

I guess we're looking at different places then, as all of the large
scale virt mgmt apps I've experienced with KVM (OpenStack, oVirt,
KubeVirt), along with the small scale ones (GNOME Boxes, virt-manager,
virt-install, Cockpit), etc. all primarily use SeaBIOS, and more
recently a bit of UEFI.  Direct kernel/initrd boot is usually
reserved for special cases, since users like to be able to manage
their kernel/initrd inside the guest image.

> Furthermore, this is going to be used within Linux itself for kexec,
> so it makes sense to use it here too.

Ok, useful info.

> > Can we get an approach that exposes a random seed regardless of
> > whether using -kernel, or seabios, or uefi, or $whatever firmware ?
> 
> No.
> 
> > Perhaps (ab)use 'fw_cfg', which is exposed for any x86 VM no matter
> > what config it has for booting ?
> 
> That approach is super messy and doesn't work. I've already gone down
> that route.

What's the problem with it? fw_cfg is a pretty straightforward
mechanism for injecting data into the guest OS, one that we already
use for a lot of stuff.
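
For concreteness, a minimal sketch of what the fw_cfg route could look
like on the QEMU side. The 'etc/rng-seed' file name is made up for
illustration; fw_cfg_add_file() and qemu_guest_getrandom_nofail() are
existing APIs:

    #include "qemu/osdep.h"
    #include "qemu/guest-random.h"
    #include "hw/nvram/fw_cfg.h"

    /* Expose a fresh 32-byte seed to the guest as a fw_cfg file. */
    static void fw_cfg_add_rng_seed(FWCfgState *s)
    {
        uint8_t *seed = g_malloc(32);

        qemu_guest_getrandom_nofail(seed, 32);
        /* fw_cfg keeps the pointer, so the buffer must stay live;
         * "etc/rng-seed" is an illustrative name, not an ABI. */
        fw_cfg_add_file(s, "etc/rng-seed", seed, 32);
    }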

> The entire point here is to include the seed on this part of the boot
> protocol. There might be other opportunities for doing it elsewhere.
> For example, EFI already has a thing.
> 
> Please don't sink a good idea because it doesn't handle every possible
> use case. That type of mentality is just going to result in nothing
ever getting done anywhere, making a decades-old problem last for
> another decade. This patch here is simple and makes a tangible
> incremental advance toward something good, and fits the pattern of how
> it's done on all other platforms.

I'm not trying to sink an idea. If this turns out to be the best
idea, I've no problem with that.

I merely asked some reasonable questions about whether there were
alternative approaches that could solve more broadly useful scenarios,
given the narrow usage of direct kernel boot, in the context of the
common VM deployments I've seen at large scale. You can't expect
reviewers to blindly accept any proposal without considering its
broader context.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v5 25/45] target/arm: Implement BFMOPA, BFMOPS

2022-07-08 Thread Richard Henderson

On 7/7/22 15:12, Peter Maydell wrote:

+static inline uint32_t f16mop_adj_pair(uint32_t pair, uint32_t pg, uint32_t 
neg)
+{
+pair ^= neg;


You seem to be negating element 1 of row and col ('neg' here is
1 << 15 unless I've misread something, and it gets passed to
the calls for both the row and column data), but the pseudocode
says we want to negate element 0 and element 1 of row, and not
negate the col elements.


Yep, thanks.


+if (!(pg & 1)) {
+pair &= 0xffff0000u;
+}
+if (!(pg & 4)) {
+pair &= 0x0000ffffu;
+}


The pseudocode sets the element to 0 if it is not
predicated, and then applies the negation second.


Yes.  However, the negation is predicated too -- the squashed FPZero is never negated.  I 
found it simpler to unconditionally negate and then conditionally squash to zero.
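
Putting both points together, the adjusted helper (a sketch of the
negate-first approach described above, not necessarily the final
committed form) would read:

    static inline uint32_t f16mop_adj_pair(uint32_t pair, uint32_t pg,
                                           uint32_t neg)
    {
        /*
         * Negate first, then squash: the element zeroed by the
         * predicate is never seen as negated.
         */
        pair ^= neg;
        if (!(pg & 1)) {
            pair &= 0xffff0000u;    /* element 0 inactive: clear low half */
        }
        if (!(pg & 4)) {
            pair &= 0x0000ffffu;    /* element 1 inactive: clear high half */
        }
        return pair;
    }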



+uint32_t n = *(uint32_t *)(vzn + row);


More missing H macros ?


Yep.


+if ((pa & 0b0101) == 0b0101 || (pb & 0b0101) == 0b0101) {


The pseudocode test for "do we do anything" is
  (prow_0 && pcol_0) || (prow_1 && pcol_1)

but isn't this C expression doing
  (prow_0 && prow_1) || (pcol_0 && pcol_1) ?


Yep, thanks.
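
For reference, with prow_0/prow_1 in bits 0 and 2 of 'pa' and
pcol_0/pcol_1 in bits 0 and 2 of 'pb', the intended test collapses to
a single mask (a sketch of the corrected condition):

    /* (prow_0 && pcol_0) || (prow_1 && pcol_1) */
    if (pa & pb & 0b0101) {
        ...
    }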


r~



[PATCH] target/riscv: fix right shifts shamt value for rv128c

2022-07-08 Thread Frédéric Pétrot
For rv128c right shifts, the 6-bit shamt is sign extended to 7 bits.
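
As a quick worked example of the arithmetic used below (illustrative
values only):

    int imm = 33;                  /* raw 6-bit shamt 0b100001 */
    imm = imm | (imm & 32) << 1;   /* replicate bit 5 into bit 6: 0b1100001 == 97 */
    imm = imm ? imm : 64;          /* a raw shamt of 0 still means a shift by 64 */

so rv128c right shifts can encode shamts 1-31, 64 and 96-127.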

Signed-off-by: Frédéric Pétrot 
---
 target/riscv/insn16.decode |  7 ---
 disas/riscv.c  | 27 +--
 target/riscv/translate.c   | 12 +++-
 3 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/target/riscv/insn16.decode b/target/riscv/insn16.decode
index 02c8f61b48..ea3c5a0411 100644
--- a/target/riscv/insn16.decode
+++ b/target/riscv/insn16.decode
@@ -31,7 +31,8 @@
%imm_cb    12:s1 5:2 2:1 10:2 3:2   !function=ex_shift_1
%imm_cj    12:s1 8:1 9:2 6:1 7:1 2:1 11:1 3:3 !function=ex_shift_1
 
-%shimm_6bit   12:1 2:5   !function=ex_rvc_shifti
+%shlimm_6bit   12:1 2:5  !function=ex_rvc_shiftli
+%shrimm_6bit   12:1 2:5  !function=ex_rvc_shiftri
 %uimm_6bit_lq 2:4 12:1 6:1   !function=ex_shift_4
 %uimm_6bit_ld 2:3 12:1 5:2   !function=ex_shift_3
 %uimm_6bit_lw 2:2 12:1 4:3   !function=ex_shift_2
@@ -82,9 +83,9 @@
 @c_addi16sp ... .  . . .. &i imm=%imm_addi16sp rs1=2 rd=2
 
 @c_shift... . .. ... . .. \
-&shift rd=%rs1_3 rs1=%rs1_3 shamt=%shimm_6bit
+&shift rd=%rs1_3 rs1=%rs1_3 shamt=%shrimm_6bit
 @c_shift2   ... . .. ... . .. \
-&shift rd=%rd rs1=%rd shamt=%shimm_6bit
+&shift rd=%rd rs1=%rd shamt=%shlimm_6bit
 
 @c_andi ... . .. ... . .. &i imm=%imm_ci rs1=%rs1_3 rd=%rs1_3
 
diff --git a/disas/riscv.c b/disas/riscv.c
index 7af6afc8fa..489c2ae5e8 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -2402,10 +2402,25 @@ static int32_t operand_sbimm12(rv_inst inst)
 ((inst << 56) >> 63) << 11;
 }
 
-static uint32_t operand_cimmsh6(rv_inst inst)
+static uint32_t operand_cimmshl6(rv_inst inst, rv_isa isa)
 {
-return ((inst << 51) >> 63) << 5 |
+int imm = ((inst << 51) >> 63) << 5 |
 (inst << 57) >> 59;
+if (isa == rv128) {
+imm = imm ? imm : 64;
+}
+return imm;
+}
+
+static uint32_t operand_cimmshr6(rv_inst inst, rv_isa isa)
+{
+int imm = ((inst << 51) >> 63) << 5 |
+(inst << 57) >> 59;
+if (isa == rv128) {
+imm = imm | (imm & 32) << 1;
+imm = imm ? imm : 64;
+}
+return imm;
 }
 
 static int32_t operand_cimmi(rv_inst inst)
@@ -2529,7 +2544,7 @@ static uint32_t operand_rnum(rv_inst inst)
 
 /* decode operands */
 
-static void decode_inst_operands(rv_decode *dec)
+static void decode_inst_operands(rv_decode *dec, rv_isa isa)
 {
 rv_inst inst = dec->inst;
 dec->codec = opcode_data[dec->op].codec;
@@ -2652,7 +2667,7 @@ static void decode_inst_operands(rv_decode *dec)
 case rv_codec_cb_sh6:
 dec->rd = dec->rs1 = operand_crs1rdq(inst) + 8;
 dec->rs2 = rv_ireg_zero;
-dec->imm = operand_cimmsh6(inst);
+dec->imm = operand_cimmshr6(inst, isa);
 break;
 case rv_codec_ci:
 dec->rd = dec->rs1 = operand_crs1rd(inst);
@@ -2667,7 +2682,7 @@ static void decode_inst_operands(rv_decode *dec)
 case rv_codec_ci_sh6:
 dec->rd = dec->rs1 = operand_crs1rd(inst);
 dec->rs2 = rv_ireg_zero;
-dec->imm = operand_cimmsh6(inst);
+dec->imm = operand_cimmshl6(inst, isa);
 break;
 case rv_codec_ci_16sp:
 dec->rd = rv_ireg_sp;
@@ -3193,7 +3208,7 @@ disasm_inst(char *buf, size_t buflen, rv_isa isa, 
uint64_t pc, rv_inst inst)
 dec.pc = pc;
 dec.inst = inst;
 decode_inst_opcode(&dec, isa);
-decode_inst_operands(&dec);
+decode_inst_operands(&dec, isa);
 decode_inst_decompress(&dec, isa);
 decode_inst_lift_pseudo(&dec);
 format_inst(buf, buflen, 16, &dec);
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 63b04e8a94..af3a2cd68c 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -705,12 +705,22 @@ static int ex_rvc_register(DisasContext *ctx, int reg)
 return 8 + reg;
 }
 
-static int ex_rvc_shifti(DisasContext *ctx, int imm)
+static int ex_rvc_shiftli(DisasContext *ctx, int imm)
 {
 /* For RV128 a shamt of 0 means a shift by 64. */
 return imm ? imm : 64;
 }
 
+static int ex_rvc_shiftri(DisasContext *ctx, int imm)
+{
+/*
+ * For RV128 a shamt of 0 means a shift by 64, furthermore, for right
+ * shifts, the shamt is sign-extended.
+ */
+imm = imm | (imm & 32) << 1;
+return imm ? imm : 64;
+}
+
 /* Include the auto-generated decoder for 32 bit insn */
 #include "decode-insn32.c.inc"
 
-- 
2.36.1




[PATCH v6 02/45] target/arm: Add infrastructure for disas_sme

2022-07-08 Thread Richard Henderson
This includes the build rules for the decoder, and the
new file for translation, but excludes any instructions.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.h |  1 +
 target/arm/sme.decode  | 20 
 target/arm/translate-a64.c |  7 ++-
 target/arm/translate-sme.c | 35 +++
 target/arm/meson.build |  2 ++
 5 files changed, 64 insertions(+), 1 deletion(-)
 create mode 100644 target/arm/sme.decode
 create mode 100644 target/arm/translate-sme.c

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index f0970c6b8c..789b6e8e78 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -146,6 +146,7 @@ static inline int pred_gvec_reg_size(DisasContext *s)
 }
 
 bool disas_sve(DisasContext *, uint32_t);
+bool disas_sme(DisasContext *, uint32_t);
 
 void gen_gvec_rax1(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
new file mode 100644
index 00..c25c031a71
--- /dev/null
+++ b/target/arm/sme.decode
@@ -0,0 +1,20 @@
+# AArch64 SME instruction descriptions
+#
+#  Copyright (c) 2022 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index c86b97b1d4..a5f8a6c771 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -14806,7 +14806,12 @@ static void aarch64_tr_translate_insn(DisasContextBase 
*dcbase, CPUState *cpu)
 }
 
 switch (extract32(insn, 25, 4)) {
-case 0x0: case 0x1: case 0x3: /* UNALLOCATED */
+case 0x0:
+if (!extract32(insn, 31, 1) || !disas_sme(s, insn)) {
+unallocated_encoding(s);
+}
+break;
+case 0x1: case 0x3: /* UNALLOCATED */
 unallocated_encoding(s);
 break;
 case 0x2:
diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
new file mode 100644
index 00..786c93fb2d
--- /dev/null
+++ b/target/arm/translate-sme.c
@@ -0,0 +1,35 @@
+/*
+ * AArch64 SME translation
+ *
+ * Copyright (c) 2022 Linaro, Ltd
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "tcg/tcg-op.h"
+#include "tcg/tcg-op-gvec.h"
+#include "tcg/tcg-gvec-desc.h"
+#include "translate.h"
+#include "exec/helper-gen.h"
+#include "translate-a64.h"
+#include "fpu/softfloat.h"
+
+
+/*
+ * Include the generated decoder.
+ */
+
+#include "decode-sme.c.inc"
diff --git a/target/arm/meson.build b/target/arm/meson.build
index 43dc600547..6dd7e93643 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -1,5 +1,6 @@
 gen = [
   decodetree.process('sve.decode', extra_args: '--decode=disas_sve'),
+  decodetree.process('sme.decode', extra_args: '--decode=disas_sme'),
   decodetree.process('neon-shared.decode', extra_args: 
'--decode=disas_neon_shared'),
   decodetree.process('neon-dp.decode', extra_args: '--decode=disas_neon_dp'),
   decodetree.process('neon-ls.decode', extra_args: '--decode=disas_neon_ls'),
@@ -50,6 +51,7 @@ arm_ss.add(when: 'TARGET_AARCH64', if_true: files(
   'sme_helper.c',
   'translate-a64.c',
   'translate-sve.c',
+  'translate-sme.c',
 ))
 
 arm_softmmu_ss = ss.source_set()
-- 
2.34.1




[PATCH v6 01/45] target/arm: Handle SME in aarch64_cpu_dump_state

2022-07-08 Thread Richard Henderson
Dump SVCR, plus use the correct access check for Streaming Mode.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/cpu.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index ae6dca2f01..9c58be8b14 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -878,6 +878,7 @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, 
int flags)
 int i;
 int el = arm_current_el(env);
 const char *ns_status;
+bool sve;
 
 qemu_fprintf(f, " PC=%016" PRIx64 " ", env->pc);
 for (i = 0; i < 32; i++) {
@@ -904,6 +905,12 @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, 
int flags)
  el,
  psr & PSTATE_SP ? 'h' : 't');
 
+if (cpu_isar_feature(aa64_sme, cpu)) {
+qemu_fprintf(f, "  SVCR=%08" PRIx64 " %c%c",
+ env->svcr,
+ (FIELD_EX64(env->svcr, SVCR, ZA) ? 'Z' : '-'),
+ (FIELD_EX64(env->svcr, SVCR, SM) ? 'S' : '-'));
+}
 if (cpu_isar_feature(aa64_bti, cpu)) {
 qemu_fprintf(f, "  BTYPE=%d", (psr & PSTATE_BTYPE) >> 10);
 }
@@ -918,7 +925,15 @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, 
int flags)
 qemu_fprintf(f, " FPCR=%08x FPSR=%08x\n",
  vfp_get_fpcr(env), vfp_get_fpsr(env));
 
-if (cpu_isar_feature(aa64_sve, cpu) && sve_exception_el(env, el) == 0) {
+if (cpu_isar_feature(aa64_sme, cpu) && FIELD_EX64(env->svcr, SVCR, SM)) {
+sve = sme_exception_el(env, el) == 0;
+} else if (cpu_isar_feature(aa64_sve, cpu)) {
+sve = sve_exception_el(env, el) == 0;
+} else {
+sve = false;
+}
+
+if (sve) {
 int j, zcr_len = sve_vqm1_for_el(env, el);
 
 for (i = 0; i <= FFR_PRED_NUM; i++) {
-- 
2.34.1




[PATCH v6 03/45] target/arm: Trap non-streaming usage when Streaming SVE is active

2022-07-08 Thread Richard Henderson
This new behaviour is in the ARM pseudocode function
AArch64.CheckFPAdvSIMDEnabled, which applies to AArch32
via AArch32.CheckAdvSIMDOrFPEnabled when the EL to which
the trap would be delivered is in AArch64 mode.

Given that ARMv9 drops support for AArch32 outside EL0, the trap EL
detection ought to be trivially true, but the pseudocode still contains
a number of conditions, and QEMU has not yet committed to dropping A32
support for EL[12] when v9 features are present.

Since the computation of SME_TRAP_NONSTREAMING is necessarily different
for the two modes, we might as well preserve bits within TBFLAG_ANY and
allocate separate bits within TBFLAG_A32 and TBFLAG_A64 instead.
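
In translator terms, the pair of flags turns into a per-instruction
check roughly like the following (a sketch based on this patch's
approach; exact exception helper names and signatures may differ):

    if (s->sme_trap_nonstreaming && s->is_nonstreaming) {
        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
                           syn_smetrap(SYN_SMETRAP_ET_STREAMING, false));
        return false;
    }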

Note that DDI0616A.a has typos for bits [22:21] of LD1RO in the table
of instructions illegal in streaming mode.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h   |  7 +++
 target/arm/translate.h |  4 ++
 target/arm/sme-fa64.decode | 90 ++
 target/arm/helper.c| 41 +
 target/arm/translate-a64.c | 40 -
 target/arm/translate-vfp.c | 12 +
 target/arm/translate.c |  2 +
 target/arm/meson.build |  1 +
 8 files changed, 195 insertions(+), 2 deletions(-)
 create mode 100644 target/arm/sme-fa64.decode

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 1f4f3e0485..1e36a839ee 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3158,6 +3158,11 @@ FIELD(TBFLAG_A32, HSTR_ACTIVE, 9, 1)
  * the same thing as the current security state of the processor!
  */
 FIELD(TBFLAG_A32, NS, 10, 1)
+/*
+ * Indicates that SME Streaming mode is active, and SMCR_ELx.FA64 is not.
+ * This requires an SME trap from AArch32 mode when using NEON.
+ */
+FIELD(TBFLAG_A32, SME_TRAP_NONSTREAMING, 11, 1)
 
 /*
  * Bit usage when in AArch32 state, for M-profile only.
@@ -3195,6 +3200,8 @@ FIELD(TBFLAG_A64, SMEEXC_EL, 20, 2)
 FIELD(TBFLAG_A64, PSTATE_SM, 22, 1)
 FIELD(TBFLAG_A64, PSTATE_ZA, 23, 1)
 FIELD(TBFLAG_A64, SVL, 24, 4)
+/* Indicates that SME Streaming mode is active, and SMCR_ELx.FA64 is not. */
+FIELD(TBFLAG_A64, SME_TRAP_NONSTREAMING, 28, 1)
 
 /*
  * Helpers for using the above.
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 22fd882368..cbc907c751 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -102,6 +102,10 @@ typedef struct DisasContext {
 bool pstate_sm;
 /* True if PSTATE.ZA is set. */
 bool pstate_za;
+/* True if non-streaming insns should raise an SME Streaming exception. */
+bool sme_trap_nonstreaming;
+/* True if the current instruction is non-streaming. */
+bool is_nonstreaming;
 /* True if MVE insns are definitely not predicated by VPR or LTPSIZE */
 bool mve_no_pred;
 /*
diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
new file mode 100644
index 00..3d90837fc7
--- /dev/null
+++ b/target/arm/sme-fa64.decode
@@ -0,0 +1,90 @@
+# AArch64 SME allowed instruction decoding
+#
+#  Copyright (c) 2022 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+
+# These patterns are taken from Appendix E1.1 of DDI0616 A.a,
+# Arm Architecture Reference Manual Supplement,
+# The Scalable Matrix Extension (SME), for Armv9-A
+
+{
+  [
+OK  0-00 1110  0001 0010 11--     # SMOV W|Xd,Vn.B[0]
+OK  0-00 1110  0010 0010 11--     # SMOV W|Xd,Vn.H[0]
+OK  0100 1110  0100 0010 11--     # SMOV Xd,Vn.S[0]
+OK   1110  0001 0011 11--     # UMOV Wd,Vn.B[0]
+OK   1110  0010 0011 11--     # UMOV Wd,Vn.H[0]
+OK   1110  0100 0011 11--     # UMOV Wd,Vn.S[0]
+OK  0100 1110  1000 0011 11--     # UMOV Xd,Vn.D[0]
+  ]
+  FAIL  0--0 111-         # Advanced SIMD vector 
operations
+}
+
+{
+  [
+OK  0101 1110 --1-  11-1 11--     # FMULX/FRECPS/FRSQRTS 
(scalar)
+OK  0101 1110 -10-  00-1 11--     # FMULX/FRECPS/FRSQRTS 
(scalar, FP16)
+OK  01-1 1110 1-10 0001 11-1 10--     # FRECPE/FRSQRTE/FRECPX 
(scalar)
+OK  01-1 1110  1001 11-1 10--     # FRECPE/FRSQRTE/FRECPX 
(scalar, FP16)
+  ]
+  FAIL  01-1 111-         # Advanced SIMD 
single-element 

[PATCH v6 07/45] target/arm: Mark PMULL, FMMLA as non-streaming

2022-07-08 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap
if full a64 support is not enabled in streaming mode.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode |  2 --
 target/arm/translate-sve.c | 24 +++-
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index 4f515939d9..4ff2df82e5 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -59,8 +59,6 @@ FAIL0001 1110 0111 1110  00--     # FJCVTZS
 #   --11 1100 --1-     --10   # Load/store FP register 
(register offset)
 #   --11 1101         # Load/store FP register 
(scaled imm)
 
-FAIL0100 0101 000-  0110 1---     # PMULLB, PMULLT (128b 
result)
-FAIL0110 0100 --1-  1110 01--     # FMMLA, BFMMLA
 FAIL0110 0101 --0-   11--     # FTSMUL
 FAIL0110 0101 --01 0--- 100-      # FTMAD
 FAIL0110 0101 --01 1--- 001-      # FADDA
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index ae48040aa4..4ff2102fc8 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -6186,9 +6186,13 @@ static bool do_trans_pmull(DisasContext *s, arg_rrr_esz 
*a, bool sel)
 gen_helper_gvec_pmull_q, gen_helper_sve2_pmull_h,
 NULL,gen_helper_sve2_pmull_d,
 };
-if (a->esz == 0
-? !dc_isar_feature(aa64_sve2_pmull128, s)
-: !dc_isar_feature(aa64_sve, s)) {
+
+if (a->esz == 0) {
+if (!dc_isar_feature(aa64_sve2_pmull128, s)) {
+return false;
+}
+s->is_nonstreaming = true;
+} else if (!dc_isar_feature(aa64_sve, s)) {
 return false;
 }
 return gen_gvec_ool_arg_zzz(s, fns[a->esz], a, sel);
@@ -7125,10 +7129,12 @@ DO_ZPZZ_FP(FMINP, aa64_sve2, sve2_fminp_zpzz)
  * SVE Integer Multiply-Add (unpredicated)
  */
 
-TRANS_FEAT(FMMLA_s, aa64_sve_f32mm, gen_gvec_fpst_, gen_helper_fmmla_s,
-   a->rd, a->rn, a->rm, a->ra, 0, FPST_FPCR)
-TRANS_FEAT(FMMLA_d, aa64_sve_f64mm, gen_gvec_fpst_, gen_helper_fmmla_d,
-   a->rd, a->rn, a->rm, a->ra, 0, FPST_FPCR)
+TRANS_FEAT_NONSTREAMING(FMMLA_s, aa64_sve_f32mm, gen_gvec_fpst_,
+gen_helper_fmmla_s, a->rd, a->rn, a->rm, a->ra,
+0, FPST_FPCR)
+TRANS_FEAT_NONSTREAMING(FMMLA_d, aa64_sve_f64mm, gen_gvec_fpst_,
+gen_helper_fmmla_d, a->rd, a->rn, a->rm, a->ra,
+0, FPST_FPCR)
 
 static gen_helper_gvec_4 * const sqdmlal_zzzw_fns[] = {
 NULL,   gen_helper_sve2_sqdmlal_zzzw_h,
@@ -7301,8 +7307,8 @@ TRANS_FEAT(BFDOT_, aa64_sve_bf16, 
gen_gvec_ool_arg_,
 TRANS_FEAT(BFDOT_zzxz, aa64_sve_bf16, gen_gvec_ool_arg_zzxz,
gen_helper_gvec_bfdot_idx, a)
 
-TRANS_FEAT(BFMMLA, aa64_sve_bf16, gen_gvec_ool_arg_,
-   gen_helper_gvec_bfmmla, a, 0)
+TRANS_FEAT_NONSTREAMING(BFMMLA, aa64_sve_bf16, gen_gvec_ool_arg_,
+gen_helper_gvec_bfmmla, a, 0)
 
 static bool do_BFMLAL_zzzw(DisasContext *s, arg__esz *a, bool sel)
 {
-- 
2.34.1




[PATCH v6 05/45] target/arm: Mark RDFFR, WRFFR, SETFFR as non-streaming

2022-07-08 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap
if full a64 support is not enabled in streaming mode.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode | 2 --
 target/arm/translate-sve.c | 9 ++---
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index 73c71abc46..fa2b5cbf1a 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -61,8 +61,6 @@ FAIL0001 1110 0111 1110  00--     # FJCVTZS
 
 FAIL 0100 --1-  1011 -0--     # FTSSEL, FEXPA
 FAIL 0101 --10 0001 100-      # COMPACT
-FAIL0010 0101 --01 100-  000- ---0    # RDFFR, RDFFRS
-FAIL0010 0101 --10 1--- 1001      # WRFFR, SETFFR
 FAIL0100 0101 --0-  1011      # BDEP, BEXT, BGRP
 FAIL0100 0101 000-  0110 1---     # PMULLB, PMULLT (128b 
result)
 FAIL0110 0100 --1-  1110 01--     # FMMLA, BFMMLA
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 5d1db0d3ff..d6faec15fe 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -1785,7 +1785,8 @@ static bool do_predset(DisasContext *s, int esz, int rd, 
int pat, bool setflag)
 TRANS_FEAT(PTRUE, aa64_sve, do_predset, a->esz, a->rd, a->pat, a->s)
 
 /* Note pat == 31 is #all, to set all elements.  */
-TRANS_FEAT(SETFFR, aa64_sve, do_predset, 0, FFR_PRED_NUM, 31, false)
+TRANS_FEAT_NONSTREAMING(SETFFR, aa64_sve,
+do_predset, 0, FFR_PRED_NUM, 31, false)
 
 /* Note pat == 32 is #unimp, to set no elements.  */
 TRANS_FEAT(PFALSE, aa64_sve, do_predset, 0, a->rd, 32, false)
@@ -1799,11 +1800,13 @@ static bool trans_RDFFR_p(DisasContext *s, arg_RDFFR_p 
*a)
 .rd = a->rd, .pg = a->pg, .s = a->s,
 .rn = FFR_PRED_NUM, .rm = FFR_PRED_NUM,
 };
+
+s->is_nonstreaming = true;
 return trans_AND_(s, &alt_a);
 }
 
-TRANS_FEAT(RDFFR, aa64_sve, do_mov_p, a->rd, FFR_PRED_NUM)
-TRANS_FEAT(WRFFR, aa64_sve, do_mov_p, FFR_PRED_NUM, a->rn)
+TRANS_FEAT_NONSTREAMING(RDFFR, aa64_sve, do_mov_p, a->rd, FFR_PRED_NUM)
+TRANS_FEAT_NONSTREAMING(WRFFR, aa64_sve, do_mov_p, FFR_PRED_NUM, a->rn)
 
 static bool do_pfirst_pnext(DisasContext *s, arg_rr_esz *a,
 void (*gen_fn)(TCGv_i32, TCGv_ptr,
-- 
2.34.1




[PATCH v6 00/45] target/arm: Scalable Matrix Extension

2022-07-08 Thread Richard Henderson
Changes for v6:
  * Some sub-word big-endian addressing fixups (pmm).
  * Logic errors for BFMOPA/FMOPA (pmm).
  * Fix for PR_SME_SET_VL hflags rebuild.

r~

Richard Henderson (45):
  target/arm: Handle SME in aarch64_cpu_dump_state
  target/arm: Add infrastructure for disas_sme
  target/arm: Trap non-streaming usage when Streaming SVE is active
  target/arm: Mark ADR as non-streaming
  target/arm: Mark RDFFR, WRFFR, SETFFR as non-streaming
  target/arm: Mark BDEP, BEXT, BGRP, COMPACT, FEXPA, FTSSEL as
non-streaming
  target/arm: Mark PMULL, FMMLA as non-streaming
  target/arm: Mark FTSMUL, FTMAD, FADDA as non-streaming
  target/arm: Mark SMMLA, UMMLA, USMMLA as non-streaming
  target/arm: Mark string/histo/crypto as non-streaming
  target/arm: Mark gather/scatter load/store as non-streaming
  target/arm: Mark gather prefetch as non-streaming
  target/arm: Mark LDFF1 and LDNF1 as non-streaming
  target/arm: Mark LD1RO as non-streaming
  target/arm: Add SME enablement checks
  target/arm: Handle SME in sve_access_check
  target/arm: Implement SME RDSVL, ADDSVL, ADDSPL
  target/arm: Implement SME ZERO
  target/arm: Implement SME MOVA
  target/arm: Implement SME LD1, ST1
  target/arm: Export unpredicated ld/st from translate-sve.c
  target/arm: Implement SME LDR, STR
  target/arm: Implement SME ADDHA, ADDVA
  target/arm: Implement FMOPA, FMOPS (non-widening)
  target/arm: Implement BFMOPA, BFMOPS
  target/arm: Implement FMOPA, FMOPS (widening)
  target/arm: Implement SME integer outer product
  target/arm: Implement PSEL
  target/arm: Implement REVD
  target/arm: Implement SCLAMP, UCLAMP
  target/arm: Reset streaming sve state on exception boundaries
  target/arm: Enable SME for -cpu max
  linux-user/aarch64: Clear tpidr2_el0 if CLONE_SETTLS
  linux-user/aarch64: Reset PSTATE.SM on syscalls
  linux-user/aarch64: Add SM bit to SVE signal context
  linux-user/aarch64: Tidy target_restore_sigframe error return
  linux-user/aarch64: Do not allow duplicate or short sve records
  linux-user/aarch64: Verify extra record lock succeeded
  linux-user/aarch64: Move sve record checks into restore
  linux-user/aarch64: Implement SME signal handling
  linux-user: Rename sve prctls
  linux-user/aarch64: Implement PR_SME_GET_VL, PR_SME_SET_VL
  target/arm: Only set ZEN in reset if SVE present
  target/arm: Enable SME for user-only
  linux-user/aarch64: Add SME related hwcap entries

 docs/system/arm/emulation.rst |4 +
 linux-user/aarch64/target_cpu.h   |5 +-
 linux-user/aarch64/target_prctl.h |   62 +-
 target/arm/cpu.h  |7 +
 target/arm/helper-sme.h   |  126 
 target/arm/helper-sve.h   |4 +
 target/arm/helper.h   |   18 +
 target/arm/translate-a64.h|   45 ++
 target/arm/translate.h|   16 +
 target/arm/sme-fa64.decode|   60 ++
 target/arm/sme.decode |   88 +++
 target/arm/sve.decode |   41 +-
 linux-user/aarch64/cpu_loop.c |9 +
 linux-user/aarch64/signal.c   |  243 +-
 linux-user/elfload.c  |   20 +
 linux-user/syscall.c  |   28 +-
 target/arm/cpu.c  |   35 +-
 target/arm/cpu64.c|   11 +
 target/arm/helper.c   |   56 +-
 target/arm/sme_helper.c   | 1140 +
 target/arm/sve_helper.c   |   28 +
 target/arm/translate-a64.c|  103 ++-
 target/arm/translate-sme.c|  373 ++
 target/arm/translate-sve.c|  393 --
 target/arm/translate-vfp.c|   12 +
 target/arm/translate.c|2 +
 target/arm/vec_helper.c   |   24 +
 target/arm/meson.build|3 +
 28 files changed, 2821 insertions(+), 135 deletions(-)
 create mode 100644 target/arm/sme-fa64.decode
 create mode 100644 target/arm/sme.decode
 create mode 100644 target/arm/translate-sme.c

-- 
2.34.1




[PATCH v6 10/45] target/arm: Mark string/histo/crypto as non-streaming

2022-07-08 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap
if full a64 support is not enabled in streaming mode.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode |  1 -
 target/arm/translate-sve.c | 35 ++-
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index 3260ea2d64..fe462d2ccc 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -59,7 +59,6 @@ FAIL0001 1110 0111 1110  00--     # FJCVTZS
 #   --11 1100 --1-     --10   # Load/store FP register 
(register offset)
 #   --11 1101         # Load/store FP register 
(scaled imm)
 
-FAIL0100 0101 --1-  1---      # SVE2 string/histo/crypto 
instructions
 FAIL1000 010- -00-  10--      # SVE2 32-bit gather NT load 
(vector+scalar)
 FAIL1000 010- -00-  111-      # SVE 32-bit gather prefetch 
(vector+imm)
 FAIL1000 0100 0-1-  0---      # SVE 32-bit gather prefetch 
(scalar+vector)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 9bbf44f008..f8e0716474 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -7110,21 +7110,21 @@ DO_SVE2_ZZZ_NARROW(RSUBHNT, rsubhnt)
 static gen_helper_gvec_flags_4 * const match_fns[4] = {
 gen_helper_sve2_match_ppzz_b, gen_helper_sve2_match_ppzz_h, NULL, NULL
 };
-TRANS_FEAT(MATCH, aa64_sve2, do_ppzz_flags, a, match_fns[a->esz])
+TRANS_FEAT_NONSTREAMING(MATCH, aa64_sve2, do_ppzz_flags, a, match_fns[a->esz])
 
 static gen_helper_gvec_flags_4 * const nmatch_fns[4] = {
 gen_helper_sve2_nmatch_ppzz_b, gen_helper_sve2_nmatch_ppzz_h, NULL, NULL
 };
-TRANS_FEAT(NMATCH, aa64_sve2, do_ppzz_flags, a, nmatch_fns[a->esz])
+TRANS_FEAT_NONSTREAMING(NMATCH, aa64_sve2, do_ppzz_flags, a, 
nmatch_fns[a->esz])
 
 static gen_helper_gvec_4 * const histcnt_fns[4] = {
 NULL, NULL, gen_helper_sve2_histcnt_s, gen_helper_sve2_histcnt_d
 };
-TRANS_FEAT(HISTCNT, aa64_sve2, gen_gvec_ool_arg_zpzz,
-   histcnt_fns[a->esz], a, 0)
+TRANS_FEAT_NONSTREAMING(HISTCNT, aa64_sve2, gen_gvec_ool_arg_zpzz,
+histcnt_fns[a->esz], a, 0)
 
-TRANS_FEAT(HISTSEG, aa64_sve2, gen_gvec_ool_arg_zzz,
-   a->esz == 0 ? gen_helper_sve2_histseg : NULL, a, 0)
+TRANS_FEAT_NONSTREAMING(HISTSEG, aa64_sve2, gen_gvec_ool_arg_zzz,
+a->esz == 0 ? gen_helper_sve2_histseg : NULL, a, 0)
 
 DO_ZPZZ_FP(FADDP, aa64_sve2, sve2_faddp_zpzz)
 DO_ZPZZ_FP(FMAXNMP, aa64_sve2, sve2_fmaxnmp_zpzz)
@@ -7238,20 +7238,21 @@ TRANS_FEAT(SQRDCMLAH_, aa64_sve2, gen_gvec_ool_,
 TRANS_FEAT(USDOT_, aa64_sve_i8mm, gen_gvec_ool_arg_,
a->esz == 2 ? gen_helper_gvec_usdot_b : NULL, a, 0)
 
-TRANS_FEAT(AESMC, aa64_sve2_aes, gen_gvec_ool_zz,
-   gen_helper_crypto_aesmc, a->rd, a->rd, a->decrypt)
+TRANS_FEAT_NONSTREAMING(AESMC, aa64_sve2_aes, gen_gvec_ool_zz,
+gen_helper_crypto_aesmc, a->rd, a->rd, a->decrypt)
 
-TRANS_FEAT(AESE, aa64_sve2_aes, gen_gvec_ool_arg_zzz,
-   gen_helper_crypto_aese, a, false)
-TRANS_FEAT(AESD, aa64_sve2_aes, gen_gvec_ool_arg_zzz,
-   gen_helper_crypto_aese, a, true)
+TRANS_FEAT_NONSTREAMING(AESE, aa64_sve2_aes, gen_gvec_ool_arg_zzz,
+gen_helper_crypto_aese, a, false)
+TRANS_FEAT_NONSTREAMING(AESD, aa64_sve2_aes, gen_gvec_ool_arg_zzz,
+gen_helper_crypto_aese, a, true)
 
-TRANS_FEAT(SM4E, aa64_sve2_sm4, gen_gvec_ool_arg_zzz,
-   gen_helper_crypto_sm4e, a, 0)
-TRANS_FEAT(SM4EKEY, aa64_sve2_sm4, gen_gvec_ool_arg_zzz,
-   gen_helper_crypto_sm4ekey, a, 0)
+TRANS_FEAT_NONSTREAMING(SM4E, aa64_sve2_sm4, gen_gvec_ool_arg_zzz,
+gen_helper_crypto_sm4e, a, 0)
+TRANS_FEAT_NONSTREAMING(SM4EKEY, aa64_sve2_sm4, gen_gvec_ool_arg_zzz,
+gen_helper_crypto_sm4ekey, a, 0)
 
-TRANS_FEAT(RAX1, aa64_sve2_sha3, gen_gvec_fn_arg_zzz, gen_gvec_rax1, a)
+TRANS_FEAT_NONSTREAMING(RAX1, aa64_sve2_sha3, gen_gvec_fn_arg_zzz,
+gen_gvec_rax1, a)
 
 TRANS_FEAT(FCVTNT_sh, aa64_sve2, gen_gvec_fpst_arg_zpz,
gen_helper_sve2_fcvtnt_sh, a, 0, FPST_FPCR)
-- 
2.34.1




[PATCH v6 04/45] target/arm: Mark ADR as non-streaming

2022-07-08 Thread Richard Henderson
Mark ADR as a non-streaming instruction, which should trap
if full a64 support is not enabled in streaming mode.

Removing entries from sme-fa64.decode is an easy way to see
what remains to be done.
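
For reference, the TRANS_FEAT_NONSTREAMING macro added below expands,
for the ADR_p32 case, to approximately (mechanical substitution):

    static bool trans_ADR_p32(DisasContext *s, arg_ADR_p32 *a)
    {
        s->is_nonstreaming = true;
        return dc_isar_feature(aa64_sve, s) &&
               do_adr(s, a, gen_helper_sve_adr_p32);
    }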

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/translate.h | 7 +++
 target/arm/sme-fa64.decode | 1 -
 target/arm/translate-sve.c | 8 
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index cbc907c751..e2e619dab2 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -566,4 +566,11 @@ uint64_t asimd_imm_const(uint32_t imm, int cmode, int op);
 static bool trans_##NAME(DisasContext *s, arg_##NAME *a) \
 { return dc_isar_feature(FEAT, s) && FUNC(s, __VA_ARGS__); }
 
+#define TRANS_FEAT_NONSTREAMING(NAME, FEAT, FUNC, ...)\
+static bool trans_##NAME(DisasContext *s, arg_##NAME *a)  \
+{ \
+s->is_nonstreaming = true;\
+return dc_isar_feature(FEAT, s) && FUNC(s, __VA_ARGS__);  \
+}
+
 #endif /* TARGET_ARM_TRANSLATE_H */
diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index 3d90837fc7..73c71abc46 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -59,7 +59,6 @@ FAIL0001 1110 0111 1110  00--     # FJCVTZS
 #   --11 1100 --1-     --10   # Load/store FP register 
(register offset)
 #   --11 1101         # Load/store FP register 
(scaled imm)
 
-FAIL 0100 --1-  1010      # ADR
 FAIL 0100 --1-  1011 -0--     # FTSSEL, FEXPA
 FAIL 0101 --10 0001 100-      # COMPACT
 FAIL0010 0101 --01 100-  000- ---0    # RDFFR, RDFFRS
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 62b5f3040c..5d1db0d3ff 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -1320,10 +1320,10 @@ static bool do_adr(DisasContext *s, arg_rrri *a, 
gen_helper_gvec_3 *fn)
 return gen_gvec_ool_zzz(s, fn, a->rd, a->rn, a->rm, a->imm);
 }
 
-TRANS_FEAT(ADR_p32, aa64_sve, do_adr, a, gen_helper_sve_adr_p32)
-TRANS_FEAT(ADR_p64, aa64_sve, do_adr, a, gen_helper_sve_adr_p64)
-TRANS_FEAT(ADR_s32, aa64_sve, do_adr, a, gen_helper_sve_adr_s32)
-TRANS_FEAT(ADR_u32, aa64_sve, do_adr, a, gen_helper_sve_adr_u32)
+TRANS_FEAT_NONSTREAMING(ADR_p32, aa64_sve, do_adr, a, gen_helper_sve_adr_p32)
+TRANS_FEAT_NONSTREAMING(ADR_p64, aa64_sve, do_adr, a, gen_helper_sve_adr_p64)
+TRANS_FEAT_NONSTREAMING(ADR_s32, aa64_sve, do_adr, a, gen_helper_sve_adr_s32)
+TRANS_FEAT_NONSTREAMING(ADR_u32, aa64_sve, do_adr, a, gen_helper_sve_adr_u32)
 
 /*
  *** SVE Integer Misc - Unpredicated Group
-- 
2.34.1




[PATCH v6 11/45] target/arm: Mark gather/scatter load/store as non-streaming

2022-07-08 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap
if full a64 support is not enabled in streaming mode.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode | 9 -
 target/arm/translate-sve.c | 6 ++
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index fe462d2ccc..1acc3ae080 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -59,19 +59,10 @@ FAIL0001 1110 0111 1110  00--     # FJCVTZS
 #   --11 1100 --1-     --10   # Load/store FP register 
(register offset)
 #   --11 1101         # Load/store FP register 
(scaled imm)
 
-FAIL1000 010- -00-  10--      # SVE2 32-bit gather NT load 
(vector+scalar)
 FAIL1000 010- -00-  111-      # SVE 32-bit gather prefetch 
(vector+imm)
 FAIL1000 0100 0-1-  0---      # SVE 32-bit gather prefetch 
(scalar+vector)
-FAIL1000 010- -01-  1---      # SVE 32-bit gather load 
(vector+imm)
-FAIL1000 0100 0-0-  0---      # SVE 32-bit gather load 
byte (scalar+vector)
-FAIL1000 0100 1---  0---      # SVE 32-bit gather load 
half (scalar+vector)
-FAIL1000 0101 0---  0---      # SVE 32-bit gather load 
word (scalar+vector)
 FAIL1010 010-   011-      # SVE contiguous FF load 
(scalar+scalar)
 FAIL1010 010- ---1  101-      # SVE contiguous NF load 
(scalar+imm)
 FAIL1010 010- -01-  000-      # SVE load & replicate 32 
bytes (scalar+scalar)
 FAIL1010 010- -010  001-      # SVE load & replicate 32 
bytes (scalar+imm)
 FAIL1100 010-         # SVE 64-bit gather 
load/prefetch
-FAIL1110 010- -00-  001-      # SVE2 64-bit scatter NT 
store (vector+scalar)
-FAIL1110 010- -10-  001-      # SVE2 32-bit scatter NT 
store (vector+scalar)
-FAIL1110 010-   1-0-      # SVE scatter store 
(scalar+32-bit vector)
-FAIL1110 010-   101-      # SVE scatter store (misc)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index f8e0716474..b23c6aa0bf 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -5669,6 +5669,7 @@ static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz 
*a)
 if (!dc_isar_feature(aa64_sve, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (!sve_access_check(s)) {
 return true;
 }
@@ -5700,6 +5701,7 @@ static bool trans_LD1_zpiz(DisasContext *s, arg_LD1_zpiz 
*a)
 if (!dc_isar_feature(aa64_sve, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (!sve_access_check(s)) {
 return true;
 }
@@ -5734,6 +5736,7 @@ static bool trans_LDNT1_zprz(DisasContext *s, 
arg_LD1_zprz *a)
 if (!dc_isar_feature(aa64_sve2, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (!sve_access_check(s)) {
 return true;
 }
@@ -5857,6 +5860,7 @@ static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz 
*a)
 if (!dc_isar_feature(aa64_sve, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (!sve_access_check(s)) {
 return true;
 }
@@ -5887,6 +5891,7 @@ static bool trans_ST1_zpiz(DisasContext *s, arg_ST1_zpiz 
*a)
 if (!dc_isar_feature(aa64_sve, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (!sve_access_check(s)) {
 return true;
 }
@@ -5921,6 +5926,7 @@ static bool trans_STNT1_zprz(DisasContext *s, 
arg_ST1_zprz *a)
 if (!dc_isar_feature(aa64_sve2, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (!sve_access_check(s)) {
 return true;
 }
-- 
2.34.1




[PATCH v6 06/45] target/arm: Mark BDEP, BEXT, BGRP, COMPACT, FEXPA, FTSSEL as non-streaming

2022-07-08 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap
if full a64 support is not enabled in streaming mode.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode |  3 ---
 target/arm/translate-sve.c | 22 --
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index fa2b5cbf1a..4f515939d9 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -59,9 +59,6 @@ FAIL0001 1110 0111 1110  00--     # FJCVTZS
 #   --11 1100 --1-     --10   # Load/store FP register 
(register offset)
 #   --11 1101         # Load/store FP register 
(scaled imm)
 
-FAIL 0100 --1-  1011 -0--     # FTSSEL, FEXPA
-FAIL 0101 --10 0001 100-      # COMPACT
-FAIL0100 0101 --0-  1011      # BDEP, BEXT, BGRP
 FAIL0100 0101 000-  0110 1---     # PMULLB, PMULLT (128b 
result)
 FAIL0110 0100 --1-  1110 01--     # FMMLA, BFMMLA
 FAIL0110 0101 --0-   11--     # FTSMUL
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index d6faec15fe..ae48040aa4 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -1333,14 +1333,15 @@ static gen_helper_gvec_2 * const fexpa_fns[4] = {
 NULL,   gen_helper_sve_fexpa_h,
 gen_helper_sve_fexpa_s, gen_helper_sve_fexpa_d,
 };
-TRANS_FEAT(FEXPA, aa64_sve, gen_gvec_ool_zz,
-   fexpa_fns[a->esz], a->rd, a->rn, 0)
+TRANS_FEAT_NONSTREAMING(FEXPA, aa64_sve, gen_gvec_ool_zz,
+fexpa_fns[a->esz], a->rd, a->rn, 0)
 
 static gen_helper_gvec_3 * const ftssel_fns[4] = {
 NULL,gen_helper_sve_ftssel_h,
 gen_helper_sve_ftssel_s, gen_helper_sve_ftssel_d,
 };
-TRANS_FEAT(FTSSEL, aa64_sve, gen_gvec_ool_arg_zzz, ftssel_fns[a->esz], a, 0)
+TRANS_FEAT_NONSTREAMING(FTSSEL, aa64_sve, gen_gvec_ool_arg_zzz,
+ftssel_fns[a->esz], a, 0)
 
 /*
  *** SVE Predicate Logical Operations Group
@@ -2536,7 +2537,8 @@ TRANS_FEAT(TRN2_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz,
 static gen_helper_gvec_3 * const compact_fns[4] = {
 NULL, NULL, gen_helper_sve_compact_s, gen_helper_sve_compact_d
 };
-TRANS_FEAT(COMPACT, aa64_sve, gen_gvec_ool_arg_zpz, compact_fns[a->esz], a, 0)
+TRANS_FEAT_NONSTREAMING(COMPACT, aa64_sve, gen_gvec_ool_arg_zpz,
+compact_fns[a->esz], a, 0)
 
 /* Call the helper that computes the ARM LastActiveElement pseudocode
  * function, scaled by the element size.  This includes the not found
@@ -6374,22 +6376,22 @@ static gen_helper_gvec_3 * const bext_fns[4] = {
 gen_helper_sve2_bext_b, gen_helper_sve2_bext_h,
 gen_helper_sve2_bext_s, gen_helper_sve2_bext_d,
 };
-TRANS_FEAT(BEXT, aa64_sve2_bitperm, gen_gvec_ool_arg_zzz,
-   bext_fns[a->esz], a, 0)
+TRANS_FEAT_NONSTREAMING(BEXT, aa64_sve2_bitperm, gen_gvec_ool_arg_zzz,
+bext_fns[a->esz], a, 0)
 
 static gen_helper_gvec_3 * const bdep_fns[4] = {
 gen_helper_sve2_bdep_b, gen_helper_sve2_bdep_h,
 gen_helper_sve2_bdep_s, gen_helper_sve2_bdep_d,
 };
-TRANS_FEAT(BDEP, aa64_sve2_bitperm, gen_gvec_ool_arg_zzz,
-   bdep_fns[a->esz], a, 0)
+TRANS_FEAT_NONSTREAMING(BDEP, aa64_sve2_bitperm, gen_gvec_ool_arg_zzz,
+bdep_fns[a->esz], a, 0)
 
 static gen_helper_gvec_3 * const bgrp_fns[4] = {
 gen_helper_sve2_bgrp_b, gen_helper_sve2_bgrp_h,
 gen_helper_sve2_bgrp_s, gen_helper_sve2_bgrp_d,
 };
-TRANS_FEAT(BGRP, aa64_sve2_bitperm, gen_gvec_ool_arg_zzz,
-   bgrp_fns[a->esz], a, 0)
+TRANS_FEAT_NONSTREAMING(BGRP, aa64_sve2_bitperm, gen_gvec_ool_arg_zzz,
+bgrp_fns[a->esz], a, 0)
 
 static gen_helper_gvec_3 * const cadd_fns[4] = {
 gen_helper_sve2_cadd_b, gen_helper_sve2_cadd_h,
-- 
2.34.1




[PATCH v6 09/45] target/arm: Mark SMMLA, UMMLA, USMMLA as non-streaming

2022-07-08 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap
if full a64 support is not enabled in streaming mode.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode |  1 -
 target/arm/translate-sve.c | 12 ++--
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index b5eaa2d0fa..3260ea2d64 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -59,7 +59,6 @@ FAIL0001 1110 0111 1110  00--     # FJCVTZS
 #   --11 1100 --1-     --10   # Load/store FP register 
(register offset)
 #   --11 1101         # Load/store FP register 
(scaled imm)
 
-FAIL0100 0101 --0-  1001 10--     # SMMLA, UMMLA, USMMLA
 FAIL0100 0101 --1-  1---      # SVE2 string/histo/crypto 
instructions
 FAIL1000 010- -00-  10--      # SVE2 32-bit gather NT load 
(vector+scalar)
 FAIL1000 010- -00-  111-      # SVE 32-bit gather prefetch 
(vector+imm)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index d5aad53923..9bbf44f008 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -7302,12 +7302,12 @@ TRANS_FEAT(FMLALT_zzxw, aa64_sve2, do_FMLAL_zzxw, a, 
false, true)
 TRANS_FEAT(FMLSLB_zzxw, aa64_sve2, do_FMLAL_zzxw, a, true, false)
 TRANS_FEAT(FMLSLT_zzxw, aa64_sve2, do_FMLAL_zzxw, a, true, true)
 
-TRANS_FEAT(SMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_,
-   gen_helper_gvec_smmla_b, a, 0)
-TRANS_FEAT(USMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_,
-   gen_helper_gvec_usmmla_b, a, 0)
-TRANS_FEAT(UMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_,
-   gen_helper_gvec_ummla_b, a, 0)
+TRANS_FEAT_NONSTREAMING(SMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_,
+gen_helper_gvec_smmla_b, a, 0)
+TRANS_FEAT_NONSTREAMING(USMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_,
+gen_helper_gvec_usmmla_b, a, 0)
+TRANS_FEAT_NONSTREAMING(UMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_,
+gen_helper_gvec_ummla_b, a, 0)
 
 TRANS_FEAT(BFDOT_, aa64_sve_bf16, gen_gvec_ool_arg_,
gen_helper_gvec_bfdot, a, 0)
-- 
2.34.1




[PATCH v6 12/45] target/arm: Mark gather prefetch as non-streaming

2022-07-08 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap if full
a64 support is not enabled in streaming mode.  In this case, introduce
PRF_ns (prefetch non-streaming) to handle the checks.
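
The rename is enough for decodetree to route the gather forms to a
separate handler; sketched, the resulting dispatch is:

    # pattern name              generated call
    # PRF    (contiguous)   ->  trans_PRF(s, a)      streaming-compatible
    # PRF_ns (gather)       ->  trans_PRF_ns(s, a)   sets is_nonstreaming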

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode |  3 ---
 target/arm/sve.decode  | 10 +-
 target/arm/translate-sve.c | 11 +++
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index 1acc3ae080..7d4c33fb5b 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -59,10 +59,7 @@ FAIL0001 1110 0111 1110  00--     # FJCVTZS
#   --11 1100 --1-     --10   # Load/store FP register (register offset)
#   --11 1101         # Load/store FP register (scaled imm)
 
-FAIL1000 010- -00-  111-      # SVE 32-bit gather prefetch (vector+imm)
-FAIL1000 0100 0-1-  0---      # SVE 32-bit gather prefetch (scalar+vector)
 FAIL1010 010-   011-      # SVE contiguous FF load (scalar+scalar)
 FAIL1010 010- ---1  101-      # SVE contiguous NF load (scalar+imm)
 FAIL1010 010- -01-  000-      # SVE load & replicate 32 bytes (scalar+scalar)
 FAIL1010 010- -010  001-      # SVE load & replicate 32 bytes (scalar+imm)
-FAIL1100 010-         # SVE 64-bit gather load/prefetch
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index a54feb2f61..908643d7d9 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -1183,10 +1183,10 @@ LD1RO_zpri  1010010 .. 01 0 001 ... . . \
 @rpri_load_msz nreg=0
 
 # SVE 32-bit gather prefetch (scalar plus 32-bit scaled offsets)
-PRF 110 00 -1 - 0-- --- - 0 
+PRF_ns  110 00 -1 - 0-- --- - 0 
 
 # SVE 32-bit gather prefetch (vector plus immediate)
-PRF 110 -- 00 - 111 --- - 0 
+PRF_ns  110 -- 00 - 111 --- - 0 
 
 # SVE contiguous prefetch (scalar plus immediate)
 PRF 110 11 1- - 0-- --- - 0 
@@ -1223,13 +1223,13 @@ LD1_zpiz    1100010 .. 01 . 1.. ... . . \
 @rpri_g_load esz=3
 
 # SVE 64-bit gather prefetch (scalar plus 64-bit scaled offsets)
-PRF 1100010 00 11 - 1-- --- - 0 
+PRF_ns  1100010 00 11 - 1-- --- - 0 
 
 # SVE 64-bit gather prefetch (scalar plus unpacked 32-bit scaled offsets)
-PRF 1100010 00 -1 - 0-- --- - 0 
+PRF_ns  1100010 00 -1 - 0-- --- - 0 
 
 # SVE 64-bit gather prefetch (vector plus immediate)
-PRF 1100010 -- 00 - 111 --- - 0 
+PRF_ns  1100010 -- 00 - 111 --- - 0 
 
 ### SVE Memory Store Group
 
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index b23c6aa0bf..bbf3bf2119 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -5971,6 +5971,17 @@ static bool trans_PRF_rr(DisasContext *s, arg_PRF_rr *a)
 return true;
 }
 
+static bool trans_PRF_ns(DisasContext *s, arg_PRF_ns *a)
+{
+if (!dc_isar_feature(aa64_sve, s)) {
+return false;
+}
+/* Prefetch is a nop within QEMU.  */
+s->is_nonstreaming = true;
+(void)sve_access_check(s);
+return true;
+}
+
 /*
  * Move Prefix
  *
-- 
2.34.1




[PATCH v6 08/45] target/arm: Mark FTSMUL, FTMAD, FADDA as non-streaming

2022-07-08 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap
if full a64 support is not enabled in streaming mode.
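
FADDA keeps its hand-written trans function, so the flag is set inline
rather than via TRANS_FEAT_NONSTREAMING.  The shape, as a minimal
sketch with trans_EXAMPLE standing in for any such handler:

    static bool trans_EXAMPLE(DisasContext *s, arg_rprr_esz *a)
    {
        if (!dc_isar_feature(aa64_sve, s)) {
            return false;
        }
        s->is_nonstreaming = true;   /* must precede sve_access_check() */
        if (!sve_access_check(s)) {
            return true;             /* exception already raised */
        }
        /* ... emit TCG ops ... */
        return true;
    }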

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode |  3 ---
 target/arm/translate-sve.c | 15 +++
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index 4ff2df82e5..b5eaa2d0fa 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -59,9 +59,6 @@ FAIL0001 1110 0111 1110  00--     # FJCVTZS
#   --11 1100 --1-     --10   # Load/store FP register (register offset)
#   --11 1101         # Load/store FP register (scaled imm)
 
-FAIL0110 0101 --0-   11--     # FTSMUL
-FAIL0110 0101 --01 0--- 100-      # FTMAD
-FAIL0110 0101 --01 1--- 001-      # FADDA
 FAIL0100 0101 --0-  1001 10--     # SMMLA, UMMLA, USMMLA
 FAIL0100 0101 --1-  1---      # SVE2 string/histo/crypto instructions
 FAIL1000 010- -00-  10--      # SVE2 32-bit gather NT load (vector+scalar)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 4ff2102fc8..d5aad53923 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3861,9 +3861,9 @@ static gen_helper_gvec_3_ptr * const ftmad_fns[4] = {
 NULL,   gen_helper_sve_ftmad_h,
 gen_helper_sve_ftmad_s, gen_helper_sve_ftmad_d,
 };
-TRANS_FEAT(FTMAD, aa64_sve, gen_gvec_fpst_zzz,
-   ftmad_fns[a->esz], a->rd, a->rn, a->rm, a->imm,
-   a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR)
+TRANS_FEAT_NONSTREAMING(FTMAD, aa64_sve, gen_gvec_fpst_zzz,
+ftmad_fns[a->esz], a->rd, a->rn, a->rm, a->imm,
+a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR)
 
 /*
  *** SVE Floating Point Accumulating Reduction Group
@@ -3886,6 +3886,7 @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a)
 if (a->esz == 0 || !dc_isar_feature(aa64_sve, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (!sve_access_check(s)) {
 return true;
 }
@@ -3923,12 +3924,18 @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a)
 DO_FP3(FADD_zzz, fadd)
 DO_FP3(FSUB_zzz, fsub)
 DO_FP3(FMUL_zzz, fmul)
-DO_FP3(FTSMUL, ftsmul)
 DO_FP3(FRECPS, recps)
 DO_FP3(FRSQRTS, rsqrts)
 
 #undef DO_FP3
 
+static gen_helper_gvec_3_ptr * const ftsmul_fns[4] = {
+NULL, gen_helper_gvec_ftsmul_h,
+gen_helper_gvec_ftsmul_s, gen_helper_gvec_ftsmul_d
+};
+TRANS_FEAT_NONSTREAMING(FTSMUL, aa64_sve, gen_gvec_fpst_arg_zzz,
+ftsmul_fns[a->esz], a, 0)
+
 /*
  *** SVE Floating Point Arithmetic - Predicated Group
  */
-- 
2.34.1




[PATCH v6 25/45] target/arm: Implement BFMOPA, BFMOPS

2022-07-08 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/helper-sme.h    |  2 ++
 target/arm/sme.decode      |  2 ++
 target/arm/sme_helper.c    | 56 ++
 target/arm/translate-sme.c | 30 
 4 files changed, 90 insertions(+)

diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
index f50d0fe1d6..1d68fb8c74 100644
--- a/target/arm/helper-sme.h
+++ b/target/arm/helper-sme.h
@@ -125,3 +125,5 @@ DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sme_bfmopa, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
index ba4774d174..afd9c0dffd 100644
--- a/target/arm/sme.decode
+++ b/target/arm/sme.decode
@@ -73,3 +73,5 @@ ADDVA_d 1100 11 01000 1 ... ... . 00 ...   @adda_64
 
 FMOPA_s 1000 100 . ... ... . . 00 ..@op_32
 FMOPA_d 1000 110 . ... ... . . 0 ...@op_64
+
+BFMOPA  1001 100 . ... ... . . 00 ..@op_32
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
index 7dc76b6a1c..690a53eee2 100644
--- a/target/arm/sme_helper.c
+++ b/target/arm/sme_helper.c
@@ -987,3 +987,59 @@ void HELPER(sme_fmopa_d)(void *vza, void *vzn, void *vzm, void *vpn,
 }
 }
 }
+
+/*
+ * Alter PAIR as needed for controlling predicates being false,
+ * and for NEG on an enabled row element.
+ */
+static inline uint32_t f16mop_adj_pair(uint32_t pair, uint32_t pg, uint32_t neg)
+{
+/*
+ * The pseudocode uses a conditional negate after the conditional zero.
+ * It is simpler here to unconditionally negate before conditional zero.
+ */
+pair ^= neg;
+if (!(pg & 1)) {
+pair &= 0xu;
+}
+if (!(pg & 4)) {
+pair &= 0xu;
+}
+return pair;
+}
+
+void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, void *vpn,
+void *vpm, uint32_t desc)
+{
+intptr_t row, col, oprsz = simd_maxsz(desc);
+uint32_t neg = simd_data(desc) * 0x80008000u;
+uint16_t *pn = vpn, *pm = vpm;
+
+for (row = 0; row < oprsz; ) {
+uint16_t prow = pn[H2(row >> 4)];
+do {
+void *vza_row = vza + tile_vslice_offset(row);
+uint32_t n = *(uint32_t *)(vzn + H1_4(row));
+
+n = f16mop_adj_pair(n, prow, neg);
+
+for (col = 0; col < oprsz; ) {
+uint16_t pcol = pm[H2(col >> 4)];
+do {
+if (prow & pcol & 0b0101) {
+uint32_t *a = vza_row + H1_4(col);
+uint32_t m = *(uint32_t *)(vzm + H1_4(col));
+
+m = f16mop_adj_pair(m, pcol, 0);
+*a = bfdotadd(*a, n, m);
+}
+col += 4;
+pcol >>= 4;
+} while (col & 15);
+}
+row += 4;
+prow >>= 4;
+} while (row & 15);
+}
+}
diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
index fa8f343a7d..ecb7583c55 100644
--- a/target/arm/translate-sme.c
+++ b/target/arm/translate-sme.c
@@ -299,6 +299,33 @@ TRANS_FEAT(ADDVA_s, aa64_sme, do_adda, a, MO_32, gen_helper_sme_addva_s)
 TRANS_FEAT(ADDHA_d, aa64_sme_i16i64, do_adda, a, MO_64, gen_helper_sme_addha_d)
 TRANS_FEAT(ADDVA_d, aa64_sme_i16i64, do_adda, a, MO_64, gen_helper_sme_addva_d)
 
+static bool do_outprod(DisasContext *s, arg_op *a, MemOp esz,
+   gen_helper_gvec_5 *fn)
+{
+int svl = streaming_vec_reg_size(s);
+uint32_t desc = simd_desc(svl, svl, a->sub);
+TCGv_ptr za, zn, zm, pn, pm;
+
+if (!sme_smza_enabled_check(s)) {
+return true;
+}
+
+/* Sum XZR+zad to find ZAd. */
+za = get_tile_rowcol(s, esz, 31, a->zad, false);
+zn = vec_full_reg_ptr(s, a->zn);
+zm = vec_full_reg_ptr(s, a->zm);
+pn = pred_full_reg_ptr(s, a->pn);
+pm = pred_full_reg_ptr(s, a->pm);
+
+fn(za, zn, zm, pn, pm, tcg_constant_i32(desc));
+
+tcg_temp_free_ptr(za);
+tcg_temp_free_ptr(zn);
+tcg_temp_free_ptr(zm);
+tcg_temp_free_ptr(pn);
+tcg_temp_free_ptr(pm);
+return true;
+}
+
 static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz,
 gen_helper_gvec_5_ptr *fn)
 {
@@ -330,3 +357,6 @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz,
 
 TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, gen_helper_sme_fmopa_s)
 TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, gen_helper_sme_fmopa_d)
+
+/* TODO: FEAT_EBF16 */
+TRANS_FEAT(BFMOPA, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_bfmopa)
-- 
2.34.1




[PATCH v6 14/45] target/arm: Mark LD1RO as non-streaming

2022-07-08 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap
if full a64 support is not enabled in streaming mode.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode | 3 ---
 target/arm/translate-sve.c | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index 2b5432bf85..47708ccc8d 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -58,6 +58,3 @@ FAIL0001 1110 0111 1110  00--     # FJCVTZS
 #   --11 1100 --0-        # Load/store FP register (unscaled imm)
 #   --11 1100 --1-     --10   # Load/store FP register (register offset)
 #   --11 1101         # Load/store FP register (scaled imm)
-
-FAIL1010 010- -01-  000-      # SVE load & replicate 32 bytes (scalar+scalar)
-FAIL1010 010- -010  001-      # SVE load & replicate 32 bytes (scalar+imm)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 5182ee4c06..96e934c1ea 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -5062,6 +5062,7 @@ static bool trans_LD1RO_zprr(DisasContext *s, arg_rprr_load *a)
 if (a->rm == 31) {
 return false;
 }
+s->is_nonstreaming = true;
 if (sve_access_check(s)) {
 TCGv_i64 addr = new_tmp_a64(s);
 tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype));
@@ -5076,6 +5077,7 @@ static bool trans_LD1RO_zpri(DisasContext *s, arg_rpri_load *a)
 if (!dc_isar_feature(aa64_sve_f64mm, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (sve_access_check(s)) {
 TCGv_i64 addr = new_tmp_a64(s);
 tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), a->imm * 32);
-- 
2.34.1




[PATCH v6 13/45] target/arm: Mark LDFF1 and LDNF1 as non-streaming

2022-07-08 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap
if full a64 support is not enabled in streaming mode.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode | 2 --
 target/arm/translate-sve.c | 2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index 7d4c33fb5b..2b5432bf85 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -59,7 +59,5 @@ FAIL0001 1110 0111 1110  00--     # FJCVTZS
 #   --11 1100 --1-     --10   # Load/store FP register (register offset)
 #   --11 1101         # Load/store FP register (scaled imm)
 
-FAIL1010 010-   011-      # SVE contiguous FF load (scalar+scalar)
-FAIL1010 010- ---1  101-      # SVE contiguous NF load (scalar+imm)
 FAIL1010 010- -01-  000-      # SVE load & replicate 32 bytes (scalar+scalar)
 FAIL1010 010- -010  001-      # SVE load & replicate 32 bytes (scalar+imm)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index bbf3bf2119..5182ee4c06 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4805,6 +4805,7 @@ static bool trans_LDFF1_zprr(DisasContext *s, arg_rprr_load *a)
 if (!dc_isar_feature(aa64_sve, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (sve_access_check(s)) {
 TCGv_i64 addr = new_tmp_a64(s);
 tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype));
@@ -4906,6 +4907,7 @@ static bool trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a)
 if (!dc_isar_feature(aa64_sve, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (sve_access_check(s)) {
 int vsz = vec_full_reg_size(s);
 int elements = vsz >> dtype_esz[a->dtype];
-- 
2.34.1




[PATCH v6 36/45] linux-user/aarch64: Tidy target_restore_sigframe error return

2022-07-08 Thread Richard Henderson
Fold the return value setting into the goto, so each
point of failure need not do both.
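
A minimal sketch of the idiom, independent of the signal-frame code:

    static int parse_records(const void *ctx)   /* hypothetical */
    {
        if (ctx == NULL) {
            goto err;       /* no per-site "err = true" needed */
        }
        return 0;           /* success path returns directly */
     err:
        return 1;           /* failure value is set exactly once */
    }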

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 linux-user/aarch64/signal.c | 26 +++---
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
index 3cef2f44cf..8b352abb97 100644
--- a/linux-user/aarch64/signal.c
+++ b/linux-user/aarch64/signal.c
@@ -287,7 +287,6 @@ static int target_restore_sigframe(CPUARMState *env,
 struct target_sve_context *sve = NULL;
 uint64_t extra_datap = 0;
 bool used_extra = false;
-bool err = false;
 int vq = 0, sve_size = 0;
 
 target_restore_general_frame(env, sf);
@@ -301,8 +300,7 @@ static int target_restore_sigframe(CPUARMState *env,
 switch (magic) {
 case 0:
 if (size != 0) {
-err = true;
-goto exit;
+goto err;
 }
 if (used_extra) {
 ctx = NULL;
@@ -314,8 +312,7 @@ static int target_restore_sigframe(CPUARMState *env,
 
 case TARGET_FPSIMD_MAGIC:
 if (fpsimd || size != sizeof(struct target_fpsimd_context)) {
-err = true;
-goto exit;
+goto err;
 }
 fpsimd = (struct target_fpsimd_context *)ctx;
 break;
@@ -329,13 +326,11 @@ static int target_restore_sigframe(CPUARMState *env,
 break;
 }
 }
-err = true;
-goto exit;
+goto err;
 
 case TARGET_EXTRA_MAGIC:
 if (extra || size != sizeof(struct target_extra_context)) {
-err = true;
-goto exit;
+goto err;
 }
 __get_user(extra_datap,
&((struct target_extra_context *)ctx)->datap);
@@ -348,8 +343,7 @@ static int target_restore_sigframe(CPUARMState *env,
 /* Unknown record -- we certainly didn't generate it.
  * Did we in fact get out of sync?
  */
-err = true;
-goto exit;
+goto err;
 }
 ctx = (void *)ctx + size;
 }
@@ -358,17 +352,19 @@ static int target_restore_sigframe(CPUARMState *env,
 if (fpsimd) {
 target_restore_fpsimd_record(env, fpsimd);
 } else {
-err = true;
+goto err;
 }
 
 /* SVE data, if present, overwrites FPSIMD data.  */
 if (sve) {
 target_restore_sve_record(env, sve, vq);
 }
-
- exit:
 unlock_user(extra, extra_datap, 0);
-return err;
+return 0;
+
+ err:
+unlock_user(extra, extra_datap, 0);
+return 1;
 }
 
 static abi_ulong get_sigframe(struct target_sigaction *ka,
-- 
2.34.1




[PATCH v6 16/45] target/arm: Handle SME in sve_access_check

2022-07-08 Thread Richard Henderson
The pseudocode for CheckSVEEnabled gains a check for Streaming
SVE mode, and for SME present but SVE absent.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index b16d81bf19..b7b64f7358 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1183,21 +1183,31 @@ static bool fp_access_check(DisasContext *s)
 return true;
 }
 
-/* Check that SVE access is enabled.  If it is, return true.
+/*
+ * Check that SVE access is enabled.  If it is, return true.
  * If not, emit code to generate an appropriate exception and return false.
+ * This function corresponds to CheckSVEEnabled().
  */
 bool sve_access_check(DisasContext *s)
 {
-if (s->sve_excp_el) {
-assert(!s->sve_access_checked);
-s->sve_access_checked = true;
-
+if (s->pstate_sm || !dc_isar_feature(aa64_sve, s)) {
+assert(dc_isar_feature(aa64_sme, s));
+if (!sme_sm_enabled_check(s)) {
+goto fail_exit;
+}
+} else if (s->sve_excp_el) {
 gen_exception_insn_el(s, s->pc_curr, EXCP_UDEF,
   syn_sve_access_trap(), s->sve_excp_el);
-return false;
+goto fail_exit;
 }
 s->sve_access_checked = true;
 return fp_access_check(s);
+
+ fail_exit:
+/* Assert that we only raise one exception per instruction. */
+assert(!s->sve_access_checked);
+s->sve_access_checked = true;
+return false;
 }
 
 /*
-- 
2.34.1




[PATCH v6 24/45] target/arm: Implement FMOPA, FMOPS (non-widening)

2022-07-08 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/helper-sme.h    |  5 +++
 target/arm/sme.decode      |  9 +
 target/arm/sme_helper.c    | 69 ++
 target/arm/translate-sme.c | 32 ++
 4 files changed, 115 insertions(+)

diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
index 753e9e624c..f50d0fe1d6 100644
--- a/target/arm/helper-sme.h
+++ b/target/arm/helper-sme.h
@@ -120,3 +120,8 @@ DEF_HELPER_FLAGS_5(sme_addha_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sme_addva_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sme_addha_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sme_addva_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
index 8cb6c4053c..ba4774d174 100644
--- a/target/arm/sme.decode
+++ b/target/arm/sme.decode
@@ -64,3 +64,12 @@ ADDHA_s 1100 10 01000 0 ... ... . 000 ..   @adda_32
 ADDVA_s 1100 10 01000 1 ... ... . 000 ..@adda_32
 ADDHA_d 1100 11 01000 0 ... ... . 00 ...@adda_64
 ADDVA_d 1100 11 01000 1 ... ... . 00 ...@adda_64
+
+### SME Outer Product
+
+&op zad zn zm pm pn sub:bool
+@op_32   ... zm:5 pm:3 pn:3 zn:5 sub:1 .. zad:2 &op
+@op_64   ... zm:5 pm:3 pn:3 zn:5 sub:1 .  zad:3 &op
+
+FMOPA_s 1000 100 . ... ... . . 00 ..@op_32
+FMOPA_d 1000 110 . ... ... . . 0 ...@op_64
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
index f1e924db74..7dc76b6a1c 100644
--- a/target/arm/sme_helper.c
+++ b/target/arm/sme_helper.c
@@ -25,6 +25,7 @@
 #include "exec/cpu_ldst.h"
 #include "exec/exec-all.h"
 #include "qemu/int128.h"
+#include "fpu/softfloat.h"
 #include "vec_internal.h"
 #include "sve_ldst_internal.h"
 
@@ -918,3 +919,71 @@ void HELPER(sme_addva_d)(void *vzda, void *vzn, void *vpn,
 }
 }
 }
+
+void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn,
+ void *vpm, void *vst, uint32_t desc)
+{
+intptr_t row, col, oprsz = simd_maxsz(desc);
+uint32_t neg = simd_data(desc) << 31;
+uint16_t *pn = vpn, *pm = vpm;
+float_status fpst;
+
+/*
+ * Make a copy of float_status because this operation does not
+ * update the cumulative fp exception status.  It also produces
+ * default nans.
+ */
+fpst = *(float_status *)vst;
+set_default_nan_mode(true, &fpst);
+
+for (row = 0; row < oprsz; ) {
+uint16_t pa = pn[H2(row >> 4)];
+do {
+if (pa & 1) {
+void *vza_row = vza + tile_vslice_offset(row);
+uint32_t n = *(uint32_t *)(vzn + H1_4(row)) ^ neg;
+
+for (col = 0; col < oprsz; ) {
+uint16_t pb = pm[H2(col >> 4)];
+do {
+if (pb & 1) {
+uint32_t *a = vza_row + H1_4(col);
+uint32_t *m = vzm + H1_4(col);
+*a = float32_muladd(n, *m, *a, 0, &fpst);
+}
+col += 4;
+pb >>= 4;
+} while (col & 15);
+}
+}
+row += 4;
+pa >>= 4;
+} while (row & 15);
+}
+}
+
+void HELPER(sme_fmopa_d)(void *vza, void *vzn, void *vzm, void *vpn,
+ void *vpm, void *vst, uint32_t desc)
+{
+intptr_t row, col, oprsz = simd_oprsz(desc) / 8;
+uint64_t neg = (uint64_t)simd_data(desc) << 63;
+uint64_t *za = vza, *zn = vzn, *zm = vzm;
+uint8_t *pn = vpn, *pm = vpm;
+float_status fpst = *(float_status *)vst;
+
+set_default_nan_mode(true, &fpst);
+
+for (row = 0; row < oprsz; ++row) {
+if (pn[H1(row)] & 1) {
+uint64_t *za_row = &za[tile_vslice_index(row)];
+uint64_t n = zn[row] ^ neg;
+
+for (col = 0; col < oprsz; ++col) {
+if (pm[H1(col)] & 1) {
+uint64_t *a = &za_row[col];
+*a = float64_muladd(n, zm[col], *a, 0, &fpst);
+}
+}
+}
+}
+}
diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
index d3b9cdd5c4..fa8f343a7d 100644
--- a/target/arm/translate-sme.c
+++ b/target/arm/translate-sme.c
@@ -298,3 +298,35 @@ TRANS_FEAT(ADDHA_s, aa64_sme, do_adda, a, MO_32, gen_helper_sme_addha_s)
 TRANS_FEAT(ADDVA_s, aa64_sme, do_adda, a, MO_32, gen_helper_sme_addva_s)
 TRANS_FEAT(ADDHA_d, aa64_sme_i16i64, do_adda, a, MO_64, gen_helper_sme_addha_d)
 TRANS_FEAT(ADDVA_d, aa64_sme_i16i64, do_adda, a, MO_64, gen_helper_sme_addva_d)

[PATCH v6 15/45] target/arm: Add SME enablement checks

2022-07-08 Thread Richard Henderson
These functions will be used to verify that the cpu
is in the correct state for a given instruction.
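
A typical caller, sketched with a hypothetical instruction FOO (real
users appear later in the series), gates code emission on the check:

    static bool trans_FOO(DisasContext *s, arg_FOO *a)
    {
        if (!dc_isar_feature(aa64_sme, s)) {
            return false;       /* not provided by this cpu */
        }
        if (sme_smza_enabled_check(s)) {
            /* emit TCG ops: SM and ZA are known enabled here */
        }
        return true;            /* exception raised on check failure */
    }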

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.h | 21 +
 target/arm/translate-a64.c | 34 ++
 2 files changed, 55 insertions(+)

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index 789b6e8e78..02fb95e019 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -29,6 +29,27 @@ void write_fp_dreg(DisasContext *s, int reg, TCGv_i64 v);
 bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
 unsigned int imms, unsigned int immr);
 bool sve_access_check(DisasContext *s);
+bool sme_enabled_check(DisasContext *s);
+bool sme_enabled_check_with_svcr(DisasContext *s, unsigned);
+
+/* This function corresponds to CheckStreamingSVEEnabled. */
+static inline bool sme_sm_enabled_check(DisasContext *s)
+{
+return sme_enabled_check_with_svcr(s, R_SVCR_SM_MASK);
+}
+
+/* This function corresponds to CheckSMEAndZAEnabled. */
+static inline bool sme_za_enabled_check(DisasContext *s)
+{
+return sme_enabled_check_with_svcr(s, R_SVCR_ZA_MASK);
+}
+
+/* Note that this function corresponds to CheckStreamingSVEAndZAEnabled. */
+static inline bool sme_smza_enabled_check(DisasContext *s)
+{
+return sme_enabled_check_with_svcr(s, R_SVCR_SM_MASK | R_SVCR_ZA_MASK);
+}
+
 TCGv_i64 clean_data_tbi(DisasContext *s, TCGv_i64 addr);
 TCGv_i64 gen_mte_check1(DisasContext *s, TCGv_i64 addr, bool is_write,
 bool tag_checked, int log2_size);
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 7fab7f64f8..b16d81bf19 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1216,6 +1216,40 @@ static bool sme_access_check(DisasContext *s)
 return true;
 }
 
+/* This function corresponds to CheckSMEEnabled. */
+bool sme_enabled_check(DisasContext *s)
+{
+/*
+ * Note that unlike sve_excp_el, we have not constrained sme_excp_el
+ * to be zero when fp_excp_el has priority.  This is because we need
+ * sme_excp_el by itself for cpregs access checks.
+ */
+if (!s->fp_excp_el || s->sme_excp_el < s->fp_excp_el) {
+s->fp_access_checked = true;
+return sme_access_check(s);
+}
+return fp_access_check_only(s);
+}
+
+/* Common subroutine for CheckSMEAnd*Enabled. */
+bool sme_enabled_check_with_svcr(DisasContext *s, unsigned req)
+{
+if (!sme_enabled_check(s)) {
+return false;
+}
+if (FIELD_EX64(req, SVCR, SM) && !s->pstate_sm) {
+gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+   syn_smetrap(SME_ET_NotStreaming, false));
+return false;
+}
+if (FIELD_EX64(req, SVCR, ZA) && !s->pstate_za) {
+gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+   syn_smetrap(SME_ET_InactiveZA, false));
+return false;
+}
+return true;
+}
+
 /*
  * This utility function is for doing register extension with an
  * optional shift. You will likely want to pass a temporary for the
-- 
2.34.1




[PATCH v6 27/45] target/arm: Implement SME integer outer product

2022-07-08 Thread Richard Henderson
This is SMOPA, SUMOPA, USMOPA, and UMOPA, for both Int8 and Int16.
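
Per-element semantics, as a standalone sketch of what the DEF_IMOP_32
expansion below computes for SMOPA_s (predication, handled with
expand_pred_b() in the real helper, is omitted): each 32-bit ZA
element accumulates a four-way byte dot product, subtracted for the
MOPS form.

    static uint32_t smopa_s_elt(uint32_t a, uint32_t n, uint32_t m, bool neg)
    {
        int32_t sum = 0;
        for (int i = 0; i < 32; i += 8) {
            sum += (int8_t)(n >> i) * (int8_t)(m >> i);  /* signed x signed */
        }
        return neg ? a - sum : a + sum;
    }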

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/helper-sme.h    | 16 
 target/arm/sme.decode      | 10 +
 target/arm/sme_helper.c    | 82 ++
 target/arm/translate-sme.c | 10 +
 4 files changed, 118 insertions(+)

diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
index 4d5d05db3a..d2d544a696 100644
--- a/target/arm/helper-sme.h
+++ b/target/arm/helper-sme.h
@@ -129,3 +129,19 @@ DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_6(sme_bfmopa, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sme_smopa_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sme_umopa_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sme_sumopa_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sme_usmopa_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sme_smopa_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sme_umopa_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sme_sumopa_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sme_usmopa_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
index e8d27fd8a0..628804e37a 100644
--- a/target/arm/sme.decode
+++ b/target/arm/sme.decode
@@ -76,3 +76,13 @@ FMOPA_d 1000 110 . ... ... . . 0 ...   @op_64
 
 BFMOPA  1001 100 . ... ... . . 00 ..@op_32
 FMOPA_h 1001 101 . ... ... . . 00 ..@op_32
+
+SMOPA_s 101 0 10 0 . ... ... . . 00 ..  @op_32
+SUMOPA_s101 0 10 1 . ... ... . . 00 ..  @op_32
+USMOPA_s101 1 10 0 . ... ... . . 00 ..  @op_32
+UMOPA_s 101 1 10 1 . ... ... . . 00 ..  @op_32
+
+SMOPA_d 101 0 11 0 . ... ... . . 0 ...  @op_64
+SUMOPA_d101 0 11 1 . ... ... . . 0 ...  @op_64
+USMOPA_d101 1 11 0 . ... ... . . 0 ...  @op_64
+UMOPA_d 101 1 11 1 . ... ... . . 0 ...  @op_64
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
index 302f89c30b..f891306bb9 100644
--- a/target/arm/sme_helper.c
+++ b/target/arm/sme_helper.c
@@ -1117,3 +1117,85 @@ void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, void *vpn,
 } while (row & 15);
 }
 }
+
+typedef uint64_t IMOPFn(uint64_t, uint64_t, uint64_t, uint8_t, bool);
+
+static inline void do_imopa(uint64_t *za, uint64_t *zn, uint64_t *zm,
+uint8_t *pn, uint8_t *pm,
+uint32_t desc, IMOPFn *fn)
+{
+intptr_t row, col, oprsz = simd_oprsz(desc) / 8;
+bool neg = simd_data(desc);
+
+for (row = 0; row < oprsz; ++row) {
+uint8_t pa = pn[H1(row)];
+uint64_t *za_row = &za[tile_vslice_index(row)];
+uint64_t n = zn[row];
+
+for (col = 0; col < oprsz; ++col) {
+uint8_t pb = pm[H1(col)];
+uint64_t *a = &za_row[col];
+
+*a = fn(n, zm[col], *a, pa & pb, neg);
+}
+}
+}
+
+#define DEF_IMOP_32(NAME, NTYPE, MTYPE) \
+static uint64_t NAME(uint64_t n, uint64_t m, uint64_t a, uint8_t p, bool neg) \
+{   \
+uint32_t sum0 = 0, sum1 = 0;\
+/* Apply P to N as a mask, making the inactive elements 0. */   \
+n &= expand_pred_b(p);  \
+sum0 += (NTYPE)(n >> 0) * (MTYPE)(m >> 0);  \
+sum0 += (NTYPE)(n >> 8) * (MTYPE)(m >> 8);  \
+sum0 += (NTYPE)(n >> 16) * (MTYPE)(m >> 16);\
+sum0 += (NTYPE)(n >> 24) * (MTYPE)(m >> 24);\
+sum1 += (NTYPE)(n >> 32) * (MTYPE)(m >> 32);\
+sum1 += (NTYPE)(n >> 40) * (MTYPE)(m >> 40);\
+sum1 += (NTYPE)(n >> 48) * (MTYPE)(m >> 48);\
+sum1 += (NTYPE)(n >> 56) * (MTYPE)(m >> 56);\
+if (neg) {  \
+sum0 = (uint32_t)a - sum0, sum1 = (uint32_t)(a >> 32) - sum1;   \
+} else {\
+sum0 = (uint32_t)a + sum0, sum1 = (uint32_t)(a >> 32) + sum1;   \
+}   \
+return ((uint64_t)sum1 << 32) | sum0;   \
+}

[PATCH v6 17/45] target/arm: Implement SME RDSVL, ADDSVL, ADDSPL

2022-07-08 Thread Richard Henderson
These SME instructions are nominally within the SVE decode space,
so we add them to sve.decode and translate-sve.c.
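
Their semantics mirror ADDVL/ADDPL/RDVL but scale by the streaming
vector length SVL (in bits) rather than VL; sketched, with imm the
signed immediate from the encoding:

    ADDSVL:  Xd = Xn + imm * (SVL / 8)    /* vector-register bytes */
    ADDSPL:  Xd = Xn + imm * (SVL / 64)   /* predicate-register bytes */
    RDSVL:   Xd =      imm * (SVL / 8)

hence the streaming_vec_reg_size() and streaming_pred_reg_size()
helpers added alongside.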

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
v4: Add streaming_{vec,pred}_reg_size.
---
 target/arm/translate-a64.h | 12 
 target/arm/sve.decode  |  5 -
 target/arm/translate-sve.c | 38 ++
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index 02fb95e019..099d3d11d6 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -128,6 +128,12 @@ static inline int vec_full_reg_size(DisasContext *s)
 return s->vl;
 }
 
+/* Return the byte size of the vector register, SVL / 8. */
+static inline int streaming_vec_reg_size(DisasContext *s)
+{
+return s->svl;
+}
+
 /*
  * Return the offset info CPUARMState of the predicate vector register Pn.
  * Note for this purpose, FFR is P16.
@@ -143,6 +149,12 @@ static inline int pred_full_reg_size(DisasContext *s)
 return s->vl >> 3;
 }
 
+/* Return the byte size of the predicate register, SVL / 64.  */
+static inline int streaming_pred_reg_size(DisasContext *s)
+{
+return s->svl >> 3;
+}
+
 /*
  * Round up the size of a register to a size allowed by
  * the tcg vector infrastructure.  Any operation which uses this
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 908643d7d9..95af08c139 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -449,14 +449,17 @@ INDEX_ri    0100 esz:2 1 imm:s5 010001 rn:5 rd:5
 # SVE index generation (register start, register increment)
 INDEX_rr    0100 .. 1 . 010011 . .  @rd_rn_rm
 
-### SVE Stack Allocation Group
+### SVE / Streaming SVE Stack Allocation Group
 
 # SVE stack frame adjustment
 ADDVL   0100 001 . 01010 .. .   @rd_rn_i6
+ADDSVL  0100 001 . 01011 .. .   @rd_rn_i6
 ADDPL   0100 011 . 01010 .. .   @rd_rn_i6
+ADDSPL  0100 011 . 01011 .. .   @rd_rn_i6
 
 # SVE stack frame size
 RDVL0100 101 1 01010 imm:s6 rd:5
+RDSVL   0100 101 1 01011 imm:s6 rd:5
 
 ### SVE Bitwise Shift - Unpredicated Group
 
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 96e934c1ea..95016e49e9 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -1286,6 +1286,19 @@ static bool trans_ADDVL(DisasContext *s, arg_ADDVL *a)
 return true;
 }
 
+static bool trans_ADDSVL(DisasContext *s, arg_ADDSVL *a)
+{
+if (!dc_isar_feature(aa64_sme, s)) {
+return false;
+}
+if (sme_enabled_check(s)) {
+TCGv_i64 rd = cpu_reg_sp(s, a->rd);
+TCGv_i64 rn = cpu_reg_sp(s, a->rn);
+tcg_gen_addi_i64(rd, rn, a->imm * streaming_vec_reg_size(s));
+}
+return true;
+}
+
 static bool trans_ADDPL(DisasContext *s, arg_ADDPL *a)
 {
 if (!dc_isar_feature(aa64_sve, s)) {
@@ -1299,6 +1312,19 @@ static bool trans_ADDPL(DisasContext *s, arg_ADDPL *a)
 return true;
 }
 
+static bool trans_ADDSPL(DisasContext *s, arg_ADDSPL *a)
+{
+if (!dc_isar_feature(aa64_sme, s)) {
+return false;
+}
+if (sme_enabled_check(s)) {
+TCGv_i64 rd = cpu_reg_sp(s, a->rd);
+TCGv_i64 rn = cpu_reg_sp(s, a->rn);
+tcg_gen_addi_i64(rd, rn, a->imm * streaming_pred_reg_size(s));
+}
+return true;
+}
+
 static bool trans_RDVL(DisasContext *s, arg_RDVL *a)
 {
 if (!dc_isar_feature(aa64_sve, s)) {
@@ -1311,6 +1337,18 @@ static bool trans_RDVL(DisasContext *s, arg_RDVL *a)
 return true;
 }
 
+static bool trans_RDSVL(DisasContext *s, arg_RDSVL *a)
+{
+if (!dc_isar_feature(aa64_sme, s)) {
+return false;
+}
+if (sme_enabled_check(s)) {
+TCGv_i64 reg = cpu_reg(s, a->rd);
+tcg_gen_movi_i64(reg, a->imm * streaming_vec_reg_size(s));
+}
+return true;
+}
+
 /*
  *** SVE Compute Vector Address Group
  */
-- 
2.34.1



