date:20231009

Re: [PATCH] scripts/xml-preprocess: Make sure this script is invoked via the right Python

2023-10-09 Thread Paolo Bonzini

Queued, thanks.

Paolo

vIOMMU - PCI pass through to Layer 2 VMs (Nested Virtualization)

2023-10-09 Thread Markus Frank


Hello,

I have already sent this email to qemu-discuss but I did not get a reply.
https://lists.nongnu.org/archive/html/qemu-discuss/2023-09/msg00034.html
Maybe someone here could help me and reply to this email or the one on 
qemu-discuss?

I would like to pass through PCI devices to Layer-2 VMs via Nested 
Virtualization.

Is there current documentation for this topic somewhere?

I used these parameters:
-machine ...,kernel-irqchip=split
-device intel-iommu

With these parameters PCI pass through to L2-VMs worked fine.


Now I come to the part where I get confused.

https://wiki.qemu.org/Features/VT-d#With_Virtio_Devices
Is this documentation relevant for PCI pass through? Do I need DMAR for virtio 
devices?

And there is also the virtio-iommu device where I also could use the i440fx 
chipset.
https://michael2012z.medium.com/virtio-iommu-789369049443

When adding "-device virtio-iommu-pci" pci pass through also works
but I get "kvm: virtio_iommu_translate no mapping for 0x1002030f000 for sid=240"
when starting qemu. What could that mean?

What do these parameters 
"disable-legacy=on,disable-modern=off,iommu_platform=on,ats=on"
actually do? When do I need them and on which virtio devices?

And which device should I rather use: virtio-iommu or intel-iommu?

Thanks in advance,
Markus

Re: [PATCH v17 6/9] gfxstream + rutabaga: add initial support for gfxstream

2023-10-09 Thread Marc-André Lureau

On Fri, Oct 6, 2023 at 5:09 AM Gurchetan Singh wrote:
>
> This adds initial support for gfxstream and cross-domain.  Both
> features rely on virtio-gpu blob resources and context types, which
> are also implemented in this patch.
>
> gfxstream has a long and illustrious history in Android graphics
> paravirtualization.  It has been powering graphics in the Android
> Studio Emulator for more than a decade, which is the main developer
> platform.
>
> Originally conceived by Jesse Hall, it was first known as "EmuGL" [a].
> The key design characteristic was a 1:1 threading model and
> auto-generation, which fit nicely with the OpenGLES spec.  It also
> allowed easy layering with ANGLE on the host, which provides the GLES
> implementations on Windows or MacOS environments.
>
> gfxstream has traditionally been maintained by a single engineer, and
> between 2015 and 2021, the goldfish throne passed to Frank Yang.
> Historians often remark this glorious reign ("pax gfxstreama" is the
> academic term) was comparable to that of Augustus and both Queen
> Elizabeths.  Just to name a few accomplishments in a resplendent
> panoply: higher versions of GLES, address space graphics, snapshot
> support and CTS compliant Vulkan [b].
>
> One major drawback was the use of out-of-tree goldfish drivers.
> Android engineers didn't know much about DRM/KMS and especially TTM so
> a simple guest to host pipe was conceived.
>
> Luckily, virtio-gpu 3D started to emerge in 2016 due to the work of
> the Mesa/virglrenderer communities.  In 2018, the initial virtio-gpu
> port of gfxstream was done by Cuttlefish enthusiast Alistair Delva.
> It was a symbol compatible replacement of virglrenderer [c] and named
> "AVDVirglrenderer".  This implementation forms the basis of the
> current gfxstream host implementation still in use today.
>
> cross-domain support follows a similar arc.  Originally conceived by
> Wayland aficionado David Reveman and crosvm enjoyer Zach Reizner in
> 2018, it initially relied on the downstream "virtio-wl" device.
>
> In 2020 and 2021, virtio-gpu was extended to include blob resources
> and multiple timelines by yours truly, features gfxstream/cross-domain
> both require to function correctly.
>
> Right now, we stand at the precipice of a truly fantastic possibility:
> the Android Emulator powered by upstream QEMU and upstream Linux
> kernel.  gfxstream will then be packaged properly, and app
> developers can even fix gfxstream bugs on their own if they encounter
> them.
>
> It's been quite the ride, my friends.  Where will gfxstream head next,
> nobody really knows.  I wouldn't be surprised if it's around for
> another decade, maintained by a new generation of Android graphics
> enthusiasts.
>
> Technical details:
>   - Very simple initial display integration: just used Pixman
>   - Largely, 1:1 mapping of virtio-gpu hypercalls to rutabaga function
> calls
>
> Next steps for Android VMs:
>   - The next step would be improving display integration and UI interfaces
> with the goal of the QEMU upstream graphics being in an emulator
> release [d].
>
> Next steps for Linux VMs for display virtualization:
>   - For widespread distribution, someone needs to package Sommelier or the
> wayland-proxy-virtwl [e] ideally into Debian main. In addition, newer
> versions of the Linux kernel come with DRM_VIRTIO_GPU_KMS option,
> which allows disabling KMS hypercalls.  If anyone cares enough, it'll
> probably be possible to build a custom VM variant that uses this display
> virtualization strategy.
>
> [a] https://android-review.googlesource.com/c/platform/development/+/34470
> [b] 
> https://android-review.googlesource.com/q/topic:%22vulkan-hostconnection-start%22
> [c] 
> https://android-review.googlesource.com/c/device/generic/goldfish-opengl/+/761927
> [d] https://developer.android.com/studio/releases/emulator
> [e] https://github.com/talex5/wayland-proxy-virtwl
>
> Signed-off-by: Gurchetan Singh 
> Tested-by: Alyssa Ross 
> Tested-by: Emmanouil Pitsidianakis 
> Tested-by: Akihiko Odaki 
> Reviewed-by: Emmanouil Pitsidianakis 
> Reviewed-by: Antonio Caggiano 
> Reviewed-by: Akihiko Odaki 
> ---
>  hw/display/virtio-gpu-pci-rutabaga.c |   47 ++
>  hw/display/virtio-gpu-rutabaga.c | 1113 ++
>  hw/display/virtio-vga-rutabaga.c |   50 ++
>  3 files changed, 1210 insertions(+)
>  create mode 100644 hw/display/virtio-gpu-pci-rutabaga.c
>  create mode 100644 hw/display/virtio-gpu-rutabaga.c
>  create mode 100644 hw/display/virtio-vga-rutabaga.c
>
> diff --git a/hw/display/virtio-gpu-pci-rutabaga.c 
> b/hw/display/virtio-gpu-pci-rutabaga.c
> new file mode 100644
> index 00..c96729e198
> --- /dev/null
> +++ b/hw/display/virtio-gpu-pci-rutabaga.c
> @@ -0,0 +1,47 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qemu/module.h"
> +#include "hw/pci/pci.h"
> +#include "hw/qdev-properties.h"
> +

Re: [PATCH v17 0/9] gfxstream + rutabaga_gfx

2023-10-09 Thread Marc-André Lureau

Hi

On Fri, Oct 6, 2023 at 5:08 AM Gurchetan Singh wrote:
>
> From: Gurchetan Singh 
>
> Branch containing changes:
>
> https://gitlab.com/gurchetansingh/qemu/-/commits/qemu-gfxstream-v17
>
> Changes since v16:
>
> - Fixed typo mentioned here:
>
> https://lists.gnu.org/archive/html/qemu-devel/2023-10/msg01407.html
>
> Antonio Caggiano (2):
>   virtio-gpu: CONTEXT_INIT feature
>   virtio-gpu: blob prep
>
> Dr. David Alan Gilbert (1):
>   virtio: Add shared memory capability
>
> Gerd Hoffmann (1):
>   virtio-gpu: hostmem
>
> Gurchetan Singh (5):
>   gfxstream + rutabaga prep: added need defintions, fields, and options
>   gfxstream + rutabaga: add initial support for gfxstream
>   gfxstream + rutabaga: meson support
>   gfxstream + rutabaga: enable rutabaga
>   docs/system: add basic virtio-gpu documentation
>

Except for a few misc style issues, the series looks good to me.

Gerd, as the virtio-gpu "odd fixes" maintainer, any chance you take a
quick look and ack the series? Even better if you send a PR :)

thanks

Re: [PATCH 0/2] topic: meson: add more compiler hardening flags

2023-10-09 Thread Thomas Huth


On 05/10/2023 19.38, Daniel P. Berrangé wrote:
> ...
>
> I also tested enabling -ftrapv, to change signed integer
> overflow from wrapping, to trapping instead. This exposed a
> bug in the string-input-visitor which overflows when parsing
> ranges, and exposed the test-int128 code as (harmlessly)
> overflowing during its testing. Both can be fixed, but I'm
> not entirely sure whether -ftrapv is viable or not. I was
> wondering about TCG and whether it has a need to intentionally
> allow integer overflow for any of its instruction emulation
> requirements?

I'm not an expert when it comes to this question, but as far as I 
understood, we are using -fwrapv (with "w", not "t") on purpose, see 
meson.build:


# We use -fwrapv to tell the compiler that we require a C dialect where
# left shift of signed integers is well defined and has the expected
# 2s-complement style results. (Both clang and gcc agree that it
# provides these semantics.)

And according to the man-page of gcc:

 The options -ftrapv and -fwrapv override each other,
 so using -ftrapv -fwrapv on the command-line results
 in -fwrapv being effective.

If I got that right, this means you cannot use -ftrapv with QEMU.

 Thomas
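
Whichever of the two flags ends up in effect, the kind of overflow
reported for range parsing can be avoided by checking before adding.
A minimal sketch, assuming GCC or Clang (for __builtin_add_overflow);
range_last() is a hypothetical helper for illustration, not the
string-input-visitor code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not QEMU code): compute the last element of a
 * range, start + count - 1, without depending on how signed overflow
 * behaves (-fwrapv, -ftrapv or plain undefined behaviour).
 * Returns false if count is not positive or the result does not fit.
 */
static bool range_last(int64_t start, int64_t count, int64_t *last)
{
    int64_t end;

    if (count <= 0) {
        return false;
    }
    /* count >= 1 here, so count - 1 cannot overflow. */
    if (__builtin_add_overflow(start, count - 1, &end)) {
        return false;   /* would exceed the int64_t range */
    }
    *last = end;
    return true;
}
```

Code written this way behaves the same with or without -fwrapv, so it
would not trip -ftrapv either.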

Re: [PATCH] hw/ppc: Add nest1 chiplet control scoms

2023-10-09 Thread Cédric Le Goater


Hello Chalapathi,

On 10/6/23 18:34, Chalapathi V wrote:

-Create nest1 chiplet model and add nest1 chiplet control scoms.
-Implementation of chiplet control scoms are put in pnv_pervasive.c
  as control scoms are common for all chiplets.


I don't really understand the need for this pnv_pervasive.c file.
Do you have plans for more models using the same set of scoms
registers ?


Anyhow, overall it looks good.

Here are some suggestions for the next respin :

* Please split the model implementation from the wiring in the board.
  Three patches with a cover letter would be nice. The first would
  introduce pnv_pervasive.c with some commit log explaining the
  rationale. Then the model, then the wiring.

  See https://lore.kernel.org/qemu-devel/ for series examples.

* In the commit log, please add more details on the unit being modeled,
  not the specs but a couple of words/sentences describing what is
  the nest1 unit and how it interacts with the rest of the machine.
  What is modeled, what is not, etc. People are simply curious.

* Get rid of the useless white lines

* Add a SPDX-License-Identifier tag in new files.

* Run scripts/checkpatch.pl

Thanks,

C.



Signed-off-by: Chalapathi V 
---
  hw/ppc/meson.build|   2 +
  hw/ppc/pnv.c  |  11 +++
  hw/ppc/pnv_nest1_chiplet.c| 141 +
  hw/ppc/pnv_pervasive.c| 146 ++
  include/hw/ppc/pnv_chip.h |   2 +
  include/hw/ppc/pnv_nest_chiplet.h |  27 ++
  include/hw/ppc/pnv_pervasive.h|  30 ++
  include/hw/ppc/pnv_xscom.h|   3 +
  8 files changed, 362 insertions(+)
  create mode 100644 hw/ppc/pnv_nest1_chiplet.c
  create mode 100644 hw/ppc/pnv_pervasive.c
  create mode 100644 include/hw/ppc/pnv_nest_chiplet.h
  create mode 100644 include/hw/ppc/pnv_pervasive.h

diff --git a/hw/ppc/meson.build b/hw/ppc/meson.build
index 7c2c52434a..541d69cf94 100644
--- a/hw/ppc/meson.build
+++ b/hw/ppc/meson.build
@@ -50,6 +50,8 @@ ppc_ss.add(when: 'CONFIG_POWERNV', if_true: files(
'pnv_bmc.c',
'pnv_homer.c',
'pnv_pnor.c',
+  'pnv_nest1_chiplet.c',
+  'pnv_pervasive.c',
  ))
  # PowerPC 4xx boards
  ppc_ss.add(when: 'CONFIG_PPC405', if_true: files(
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index eb54f93986..0e1c944753 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -1660,6 +1660,8 @@ static void pnv_chip_power10_instance_init(Object *obj)
  object_initialize_child(obj, "occ",  &chip10->occ, TYPE_PNV10_OCC);
  object_initialize_child(obj, "sbe",  &chip10->sbe, TYPE_PNV10_SBE);
  object_initialize_child(obj, "homer", &chip10->homer, TYPE_PNV10_HOMER);
+object_initialize_child(obj, "nest1_chiplet", &chip10->nest1_chiplet,
+TYPE_PNV_NEST1_CHIPLET);
  
  chip->num_pecs = pcc->num_pecs;
  
@@ -1829,6 +1831,15 @@ static void pnv_chip_power10_realize(DeviceState *dev, Error **errp)

  memory_region_add_subregion(get_system_memory(), PNV10_HOMER_BASE(chip),
  &chip10->homer.regs);
  
+/* nest1 chiplet control regs */

+object_property_set_link(OBJECT(&chip10->nest1_chiplet), "chip",
+ OBJECT(chip), &error_abort);
+if (!qdev_realize(DEVICE(&chip10->nest1_chiplet), NULL, errp)) {
+return;
+}
+pnv_xscom_add_subregion(chip, PNV10_XSCOM_NEST1_CTRL_CHIPLET_BASE,
+   &chip10->nest1_chiplet.xscom_ctrl_regs);
+
  /* PHBs */
  pnv_chip_power10_phb_realize(chip, &local_err);
  if (local_err) {
diff --git a/hw/ppc/pnv_nest1_chiplet.c b/hw/ppc/pnv_nest1_chiplet.c
new file mode 100644
index 00..c679428213
--- /dev/null
+++ b/hw/ppc/pnv_nest1_chiplet.c
@@ -0,0 +1,141 @@
+/*
+ * QEMU PowerPC nest1 chiplet model
+ *
+ * Copyright (c) 2023, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/qdev-properties.h"
+
+#include "hw/ppc/pnv.h"
+#include "hw/ppc/pnv_xscom.h"
+#include "hw/ppc/pnv_nest_chiplet.h"
+#include "hw/ppc/pnv_pervasive.h"
+#include "hw/ppc/fdt.h"
+
+#include 
+
+/* This chiplet contains nest1 chiplet control unit. More to come later */
+
+static uint64_t pnv_nest1_chiplet_xscom_read(void *opaque, hwaddr addr,
+ unsigned size)
+{
+PnvNest1Chiplet *nest1_chiplet = PNV_NEST1CHIPLET(opaque);
+int reg = addr >> 3;
+uint64_t val = 0;
+
+switch (reg) {
+case 0x000 ... 0x3FF:
+val = pnv_chiplet_ctrl_read(&nest1_chiplet->ctrl_regs, reg, size);
+break;
+default:
+qemu_log_mask(LOG_UNIMP, "%s: Invalid xscom read at 0x%" PRIx32 "\n",
+  __func__, reg);
+}
+
+return val;
+}
+
+static void pnv_nest1_chiplet_xscom_write(void *opaque, hwaddr addr,
+  uint64_t val, unsigned

Re: [PATCH 1/2] meson: mitigate against ROP exploits with -fzero-call-used-regs

2023-10-09 Thread Thomas Huth


On 05/10/2023 19.38, Daniel P. Berrangé wrote:

To quote wikipedia:

   "Return-oriented programming (ROP) is a computer security exploit
technique that allows an attacker to execute code in the presence
of security defenses such as executable space protection and code
signing.

In this technique, an attacker gains control of the call stack to
hijack program control flow and then executes carefully chosen
machine instruction sequences that are already present in the
machine's memory, called "gadgets". Each gadget typically ends in
a return instruction and is located in a subroutine within the
existing program and/or shared library code. Chained together,
these gadgets allow an attacker to perform arbitrary operations
on a machine employing defenses that thwart simpler attacks."

QEMU is by no means perfect with an ever growing set of CVEs from
flawed hardware device emulation, which could potentially be
exploited using ROP techniques.

Since GCC 11 there has been a compiler option that can mitigate
against this exploit technique:

 -fzero-call-used-regs

To understand it refer to these two resources:

https://www.jerkeby.se/newsletter/posts/rop-reduction-zero-call-user-regs/
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552262.html

I used two programs to scan qemu-system-x86_64 for ROP gadgets:

   https://github.com/0vercl0k/rp
   https://github.com/JonathanSalwan/ROPgadget

When asked to find 8 byte gadgets, the 'rp' tool reports:

   A total of 440278 gadgets found.
   You decided to keep only the unique ones, 156143 unique gadgets found.

While the ROPgadget tool reports:

   Unique gadgets found: 353122

With the --ropchain argument, the latter attempts to use the found
gadgets to produce a chain that can execute arbitrary syscalls. With
current QEMU it succeeds in this task, which is an undesirable
situation.

With QEMU modified to use -fzero-call-used-regs=used-gpr the 'rp' tool
reports

   A total of 528991 gadgets found.
   You decided to keep only the unique ones, 121128 unique gadgets found.

This is 22% fewer unique gadgets

While the ROPgadget tool reports:

   Unique gadgets found: 328605

This is 7% fewer unique gadgets. Crucially though, despite this more
modest reduction, the ROPgadget tool is no longer able to identify a
chain of gadgets for executing arbitrary syscalls. It fails at the
very first step, unable to find gadgets for populating registers for
a future syscall. Having said that, more advanced tools do still
manage to put together a viable ROP chain.

Also this only takes into account QEMU code. QEMU links to many 3rd
party shared libraries and ideally all of them would be compiled with
this same hardening. That becomes a distro policy question though.

In terms of performance impact, TCG was used as an evaluation test
case. We're not interested in protecting TCG since it isn't designed
to provide a security barrier, but it is performance sensitive code,
so useful as a guide to how other areas of QEMU might be impacted.
With the -fzero-call-used-regs=used-gpr argument present, using the
real world test of booting a linux kernel and having init immediately
poweroff, there is a ~1% slow down in performance under TCG. The QEMU
binary size also grows by approximately 1%.

By comparison, using the more aggressive -fzero-call-used-regs=all,
results in a slowdown of over 25% in TCG, which is clearly not an
acceptable impact, and a binary size increase of 5%.

Considering that 'used-gpr' successfully stopped ROPgadget assembling
a chain, this more targeted protection is a justifiable hardening
/ performance tradeoff.

Signed-off-by: Daniel P. Berrangé 
---
  meson.build | 11 +++
  1 file changed, 11 insertions(+)

diff --git a/meson.build b/meson.build
index 20ceeb8158..2003ca1ba4 100644
--- a/meson.build
+++ b/meson.build
@@ -435,6 +435,17 @@ if get_option('fuzzing')
endif
  endif
  
+# Check further flags that make QEMU more robust against malicious parties

+
+hardening_flags = [
+# Zero out registers used during a function call
+# upon its return. This makes it harder to assemble
+# ROP gadgets into something usable
+'-fzero-call-used-regs=used-gpr',
+]
+
+qemu_common_flags += cc.get_supported_arguments(hardening_flags)


Linux kernel uses the same flag and talks about similar performance costs:

 https://github.com/torvalds/linux/commit/a82adfd5c7cb4b

So I think this should be fine to be used in QEMU, too.

Reviewed-by: Thomas Huth
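
For experimenting with the cost on a single hot function rather than
the whole build, GCC 11 and later also offer the same choice as a
function attribute. A minimal sketch, assuming a compiler that supports
__has_attribute; the ZERO_USED_GPR macro and checksum() are made up for
illustration and are not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Expand to nothing on compilers that lack the attribute. */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_USED_GPR __attribute__((zero_call_used_regs("used-gpr")))
# endif
#endif
#ifndef ZERO_USED_GPR
# define ZERO_USED_GPR
#endif

/*
 * Hypothetical helper, not QEMU code. With the attribute in effect the
 * compiler clears the call-used general purpose registers this function
 * touched before it returns; the return value register is left alone,
 * so the result is unchanged.
 */
ZERO_USED_GPR
static uint32_t checksum(const uint8_t *buf, uint32_t len)
{
    uint32_t sum = 0;

    for (uint32_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}
```

The observable behaviour is identical with and without the attribute;
only the register state left behind at the return instruction differs.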

Re: [PATCH v4 3/4] qcow2: add zoned emulation capability

2023-10-09 Thread Sam Li

On Fri, Sep 29, 2023 at 03:17, Eric Blake wrote:
>
> On Mon, Sep 18, 2023 at 05:53:12PM +0800, Sam Li wrote:
> > By adding zone operations and zoned metadata, the zoned emulation
> > capability enables full emulation support of zoned device using
> > a qcow2 file. The zoned device metadata includes zone type,
> > zoned device state and write pointer of each zone, which is stored
> > to an array of unsigned integers.
> >
> > Each zone of a zoned device makes state transitions following
> > the zone state machine. The zone state machine mainly describes
> > five states, IMPLICIT OPEN, EXPLICIT OPEN, FULL, EMPTY and CLOSED.
> > READ ONLY and OFFLINE states will generally be affected by device
> > internal events. The operations on zones cause corresponding state
> > changes.
> >
> > Zoned devices have a limit on zone resources, which puts constraints on
> > write operations into zones.
> >
> > Signed-off-by: Sam Li 
> > ---
> >  block/qcow2.c  | 709 -
> >  block/qcow2.h  |   2 +
> >  block/trace-events |   2 +
> >  docs/interop/qcow2.txt |   6 +
> >  4 files changed, 717 insertions(+), 2 deletions(-)
>
> You may want to look at scripts/git.orderfile; putting spec changes
> (docs/*) first in your output before implementation is generally
> beneficial to reviewers.
>
> > +++ b/docs/interop/qcow2.txt
> > @@ -367,6 +367,12 @@ The fields of the zoned extension are:
> >  The maximal number of 512-byte sectors of a zone
> >  append request that can be issued to the device.
> >
> > +  36 - 43:  zonedmeta_offset
> > +The offset of zoned metadata structure in the file in 
> > bytes.
>
> For the spec to be useful, you also need to add a section describing
> what the layout of the zoned metadata structure actually is.
>
> > +
> > +  44 - 51:  zonedmeta_size
> > +The size of zoned metadata in bytes.
> > +
>
> Can the zoned metadata structure ever occupy more than 4G, or can this
> field be sized at 4 bytes instead of 8?

The zoned metadata consists of the write pointers of all zones. Its size
is nr_zones (uint32_t) * write pointer size (uint64_t). So it will not
occupy more than 4G. But it still needs more than 4 bytes.
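
The size formula above can be checked with a few lines of C. This is a
sketch under the assumption stated in the reply (one 64-bit write
pointer per zone, zone count held in a uint32_t); zoned_meta_size() is
a hypothetical name, not a function from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: one uint64_t write pointer per zone, nothing else. */
static uint64_t zoned_meta_size(uint32_t nr_zones)
{
    /* Widen before multiplying so the product cannot wrap at 32 bits. */
    return (uint64_t)nr_zones * sizeof(uint64_t);
}
```

For the largest zone count a uint32_t can hold, the product is just
under 32 GiB, so whether the metadata stays below 4G depends on the
actual number of zones. Either way the worst case does not fit in a
4-byte field, which supports keeping zonedmeta_size at 8 bytes.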

>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.
> Virtualization:  qemu.org | libguestfs.org
>

Re: [PATCH 2/2] meson: mitigate against use of uninitialize stack for exploits

2023-10-09 Thread Thomas Huth


On 05/10/2023 19.38, Daniel P. Berrangé wrote:

When variables are used without being initialized, there is potential
to take advantage of data that was pre-existing on the stack from an
earlier call, to drive an exploit.

It is good practice to always initialize variables, and the compiler
can warn about flaws when -Wuninitialized is present. This warning,
however, is by no means foolproof with its output varying depending
on compiler version and which optimizations are enabled.

The -ftrivial-auto-var-init option can be used to tell the compiler
to always initialize all variables. This increases the security and
predictability of the program, closing off certain attack vectors,
reducing the risk of unsafe memory disclosure.

While the option takes several possible values, using 'zero' is
considered to be the option that is likely to lead to semantically
correct or safe behaviour[1]. E.g. sizes/indexes are not likely to
lead to out-of-bounds accesses when initialized to zero. Pointers
are less likely to point to something useful if initialized to zero.

Even with -ftrivial-auto-var-init=zero set, GCC will still issue
warnings with -Wuninitialized if it discovers a problem, so we are
not losing diagnostics for developers, just hardening runtime
behaviour and making QEMU behave more predictably in case of hitting
bad codepaths.

[1] https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html
Signed-off-by: Daniel P. Berrangé 
---
  meson.build | 5 +
  1 file changed, 5 insertions(+)

diff --git a/meson.build b/meson.build
index 2003ca1ba4..19faea8d30 100644
--- a/meson.build
+++ b/meson.build
@@ -442,6 +442,11 @@ hardening_flags = [
  # upon its return. This makes it harder to assemble
  # ROP gadgets into something usable
  '-fzero-call-used-regs=used-gpr',
+
+# Initialize all stack variables to zero. This makes
+# it harder to take advantage of uninitialized stack
+# data to drive exploits
+'-ftrivial-auto-var-init=zero',
  ]


I was a little bit torn about using =zero when I first read your patch, but 
after looking at [1], I now also tend to agree that =zero is likely the 
best choice. So from my side:


Reviewed-by: Thomas Huth

virtio via external shared ram

2023-10-09 Thread Janne Karhunen

Hi,

I have created an experimental setup for Linux where all the virtio
data structures and traffic can be allocated by the guest from a ram
blob outside of the guest default ram space. That ram blob can be
hotplugged to the guest or defined via the guest's device tree. This is
done as some hypervisors, including tdx/sev/pkvm and others, would
probably benefit from a simple security policy that removes all
set_memory_{encrypted,decrypted} calls to open up the guest dma memory
in fragments that are not only likely to leak information due to the
widespread use of the DMA API but also slow things down for no obvious
reason. From the hypervisor's point of view the fragmented shadow page
table space is also an unnecessary slowdown and a source of memory
waste.

I have seen forks of SWIOTLB that do similar things, but fundamentally
they are still SWIOTLB behind the curtains and as such unusable for
low latency / high bandwidth applications due to bouncing (copying)
data back and forth into those external buffers. The setup I have
created can act as virtio as it was designed to be, a zero copy data
transport path.

A trial integration into QEMU could probably look something like this
(in virt.c):

..
emem_map = mmap(NULL, EMEM_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED
| MAP_SYNC, fd, 0);
memory_region_init_ram_ptr(sr, OBJECT(machine), "ext-mem",
EMEM_SIZE, emem_map);
..
emem = g_new(MemoryRegion, 1);
memory_region_add_subregion_overlap(sysmem, emem_physaddr, emem, 1000);
..

So my question is: did I understand the QEMU RAM model
correctly, and would something like that lead to known issues
somewhere?


--
Janne

Re: [v3] Help wanted for enabling -Wshadow=local

2023-10-09 Thread Thomas Huth


On 06/10/2023 21.08, Warner Losh wrote:



On Fri, Oct 6, 2023, 11:55 AM Thomas Huth wrote:


On 06/10/2023 18.18, Thomas Huth wrote:
 > On 06/10/2023 16.45, Markus Armbruster wrote:
 >> Local variables shadowing other local variables or parameters make the
 >> code needlessly hard to understand.  Bugs love to hide in such code.
 >> Evidence: "[PATCH v3 1/7] migration/rdma: Fix save_page method to fail
 >> on polling error".
 >>
 >> Enabling -Wshadow would prevent bugs like this one.  But we have to
 >> clean up all the offenders first.
 >>
 >> Quite a few people responded to my calls for help.  Thank you so much!
 >>
 >> I'm collecting patches in my git repo at
 >> https://repo.or.cz/qemu/armbru.git
 >> in branch shadow-next.  All but the
 >> last two are in a pending pull request.
 >>
 >> My test build is down to seven files with warnings.  "[PATCH v2 0/3]
 >> hexagon: GETPC() and shadowing fixes" takes care of four, but it needs a
 >> rebase.
 >>
 >> Remaining three:
 >>
 >>  In file included from ../hw/display/virtio-gpu-virgl.c:19:
 >>  ../hw/display/virtio-gpu-virgl.c: In function ‘virgl_cmd_submit_3d’:
 >>  /work/armbru/qemu/include/hw/virtio/virtio-gpu.h:228:16: warning: declaration of ‘s’ shadows a previous local [-Wshadow=compatible-local]
 >>    228 | size_t s;   \
 >>    |    ^
 >>  ../hw/display/virtio-gpu-virgl.c:215:5: note: in expansion of macro ‘VIRTIO_GPU_FILL_CMD’
 >>    215 | VIRTIO_GPU_FILL_CMD(cs);
 >>    | ^~~
 >>  ../hw/display/virtio-gpu-virgl.c:213:12: note: shadowed declaration is here
 >>    213 | size_t s;
 >>    |    ^
 >>
 >>  In file included from ../contrib/vhost-user-gpu/virgl.h:18,
 >>   from ../contrib/vhost-user-gpu/virgl.c:17:
 >>  ../contrib/vhost-user-gpu/virgl.c: In function ‘virgl_cmd_submit_3d’:
 >>  ../contrib/vhost-user-gpu/vugpu.h:167:16: warning: declaration of ‘s’ shadows a previous local [-Wshadow=compatible-local]
 >>    167 | size_t s;   \
 >>    |    ^
 >>  ../contrib/vhost-user-gpu/virgl.c:203:5: note: in expansion of macro ‘VUGPU_FILL_CMD’
 >>    203 | VUGPU_FILL_CMD(cs);
 >>    | ^~
 >>  ../contrib/vhost-user-gpu/virgl.c:201:12: note: shadowed declaration is here
 >>    201 | size_t s;
 >>    |    ^
 >>
 >>  ../contrib/vhost-user-gpu/vhost-user-gpu.c: In function ‘vg_resource_flush’:
 >>  ../contrib/vhost-user-gpu/vhost-user-gpu.c:837:29: warning: declaration of ‘i’ shadows a previous local [-Wshadow=local]
 >>    837 | pixman_image_t *i =
 >>    | ^
 >>  ../contrib/vhost-user-gpu/vhost-user-gpu.c:757:9: note: shadowed declaration is here
 >>    757 | int i;
 >>    | ^
 >>
 >> Gerd, Marc-André, or anybody else?
 >>
 >> More warnings may lurk in code my test build doesn't compile.  Need a
 >> full CI build with -Wshadow=local to find them.  Anybody care to kick
 >> one off?
 >
 > I ran a build here (with -Werror enabled, so that it's easier to see where
 > it breaks):
 >
 > https://gitlab.com/thuth/qemu/-/pipelines/1028023489
 >
 > ... but I didn't see any additional spots in the logs beside the ones that
 > you already listed.

After adding two more patches to fix the above warnings, things look pretty
good:

https://gitlab.com/thuth/qemu/-/pipelines/1028413030


There are just some warnings left in the BSD code, as Warner already
mentioned in his reply to v2 of your mail:

https://gitlab.com/thuth/qemu/-/jobs/5241420713



I think I have fixes for these. I need to merge what just landed into 
bsd-user fork, rebase, test, then apply them to qemu master branch, retest 
and send them off...


My illness has hung on longer than I thought so I'm still behind...


Get well soon again! ... and no worries about the -Wshadow=local patches in 
the BSD code, there is no hurry - The BSDs are using Clang by default, so 
that option won't get enabled by default there anyway yet - I had to switch 
to GCC in the CI pipeline to trigger those, and I guess only very few people 
will use GCC to compile QEMU on Fre
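
For readers who have not run into this class of bug, the pattern that
-Wshadow=local flags looks roughly like the following. This is a
hypothetical sketch, not the code from the rdma patch or from any of
the files listed in this thread:

```c
#include <assert.h>
#include <stddef.h>

/* Buggy: the inner 'ret' shadows the outer one, so the error is lost. */
static int first_error_buggy(const int *status, size_t n)
{
    int ret = 0;

    for (size_t i = 0; i < n; i++) {
        if (status[i] < 0) {
            int ret = status[i];   /* -Wshadow=local warns here */
            (void)ret;
            break;
        }
    }
    return ret;                    /* always 0 */
}

/* Fixed: assign to the one and only 'ret'. */
static int first_error_fixed(const int *status, size_t n)
{
    int ret = 0;

    for (size_t i = 0; i < n; i++) {
        if (status[i] < 0) {
            ret = status[i];
            break;
        }
    }
    return ret;
}
```

Both versions compile cleanly without the warning enabled, which is why
such bugs tend to survive until the option is turned on.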

[PATCH 00/44] Raspberry Pi 4B machine

2023-10-09 Thread Ben Dooks


Hi, is there a git tree with this series or a newer one available
please?

--
Ben Dooks   http://www.codethink.co.uk/
Senior Engineer Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html

[PATCH] memory: drop needless argument

2023-10-09 Thread marcandre . lureau

From: Marc-André Lureau 

The argument is unused since commit bdc44640c ("cpu: Use QTAILQ for CPU list").

Signed-off-by: Marc-André Lureau 
---
 softmmu/memory_mapping.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/softmmu/memory_mapping.c b/softmmu/memory_mapping.c
index d7f1d096e0..8ba9968f8c 100644
--- a/softmmu/memory_mapping.c
+++ b/softmmu/memory_mapping.c
@@ -291,7 +291,7 @@ void guest_phys_blocks_append(GuestPhysBlockList *list)
 memory_listener_unregister(&g.listener);
 }
 
-static CPUState *find_paging_enabled_cpu(CPUState *start_cpu)
+static CPUState *find_paging_enabled_cpu(void)
 {
 CPUState *cpu;
 
@@ -312,7 +312,7 @@ void qemu_get_guest_memory_mapping(MemoryMappingList *list,
 GuestPhysBlock *block;
 ram_addr_t offset, length;
 
-first_paging_enabled_cpu = find_paging_enabled_cpu(first_cpu);
+first_paging_enabled_cpu = find_paging_enabled_cpu();
 if (first_paging_enabled_cpu) {
 for (cpu = first_paging_enabled_cpu; cpu != NULL;
  cpu = CPU_NEXT(cpu)) {
-- 
2.41.0

[PATCH] memory: follow Error API guidelines

2023-10-09 Thread marcandre . lureau

From: Marc-André Lureau 

Return true/false on success/failure.

Signed-off-by: Marc-André Lureau 
---
 include/hw/core/cpu.h |  4 +++-
 include/hw/core/sysemu-cpu-ops.h  |  2 +-
 include/sysemu/memory_mapping.h   |  2 +-
 target/i386/cpu.h |  2 +-
 hw/core/cpu-sysemu.c  |  6 +++---
 softmmu/memory_mapping.c  | 13 ++---
 target/i386/arch_memory_mapping.c |  6 --
 7 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index e02bc5980f..2373fdde18 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -618,8 +618,10 @@ bool cpu_paging_enabled(const CPUState *cpu);
  * @cpu: The CPU whose memory mappings are to be obtained.
  * @list: Where to write the memory mappings to.
  * @errp: Pointer for reporting an #Error.
+ *
+ * Returns: %true on success, %false otherwise.
  */
-void cpu_get_memory_mapping(CPUState *cpu, MemoryMappingList *list,
+bool cpu_get_memory_mapping(CPUState *cpu, MemoryMappingList *list,
 Error **errp);
 
 #if !defined(CONFIG_USER_ONLY)
diff --git a/include/hw/core/sysemu-cpu-ops.h b/include/hw/core/sysemu-cpu-ops.h
index ee169b872c..24d003fe04 100644
--- a/include/hw/core/sysemu-cpu-ops.h
+++ b/include/hw/core/sysemu-cpu-ops.h
@@ -19,7 +19,7 @@ typedef struct SysemuCPUOps {
 /**
  * @get_memory_mapping: Callback for obtaining the memory mappings.
  */
-void (*get_memory_mapping)(CPUState *cpu, MemoryMappingList *list,
+bool (*get_memory_mapping)(CPUState *cpu, MemoryMappingList *list,
Error **errp);
 /**
  * @get_paging_enabled: Callback for inquiring whether paging is enabled.
diff --git a/include/sysemu/memory_mapping.h b/include/sysemu/memory_mapping.h
index 3bbeb1bcb4..021e0a6230 100644
--- a/include/sysemu/memory_mapping.h
+++ b/include/sysemu/memory_mapping.h
@@ -71,7 +71,7 @@ void guest_phys_blocks_free(GuestPhysBlockList *list);
 void guest_phys_blocks_init(GuestPhysBlockList *list);
 void guest_phys_blocks_append(GuestPhysBlockList *list);
 
-void qemu_get_guest_memory_mapping(MemoryMappingList *list,
+bool qemu_get_guest_memory_mapping(MemoryMappingList *list,
const GuestPhysBlockList *guest_phys_blocks,
Error **errp);
 
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index e1875466b9..471e71dbc5 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2055,7 +2055,7 @@ int x86_cpu_write_elf64_qemunote(WriteCoreDumpFunction f, 
CPUState *cpu,
 int x86_cpu_write_elf32_qemunote(WriteCoreDumpFunction f, CPUState *cpu,
  DumpState *s);
 
-void x86_cpu_get_memory_mapping(CPUState *cpu, MemoryMappingList *list,
+bool x86_cpu_get_memory_mapping(CPUState *cpu, MemoryMappingList *list,
 Error **errp);
 
 void x86_cpu_dump_state(CPUState *cs, FILE *f, int flags);
diff --git a/hw/core/cpu-sysemu.c b/hw/core/cpu-sysemu.c
index 5eaf2e79e6..d0d6a910f9 100644
--- a/hw/core/cpu-sysemu.c
+++ b/hw/core/cpu-sysemu.c
@@ -34,17 +34,17 @@ bool cpu_paging_enabled(const CPUState *cpu)
 return false;
 }
 
-void cpu_get_memory_mapping(CPUState *cpu, MemoryMappingList *list,
+bool cpu_get_memory_mapping(CPUState *cpu, MemoryMappingList *list,
 Error **errp)
 {
 CPUClass *cc = CPU_GET_CLASS(cpu);
 
 if (cc->sysemu_ops->get_memory_mapping) {
-cc->sysemu_ops->get_memory_mapping(cpu, list, errp);
-return;
+return cc->sysemu_ops->get_memory_mapping(cpu, list, errp);
 }
 
 error_setg(errp, "Obtaining memory mappings is unsupported on this CPU.");
+return false;
 }
 
 hwaddr cpu_get_phys_page_attrs_debug(CPUState *cpu, vaddr addr,
diff --git a/softmmu/memory_mapping.c b/softmmu/memory_mapping.c
index 8ba9968f8c..6f884c5b90 100644
--- a/softmmu/memory_mapping.c
+++ b/softmmu/memory_mapping.c
@@ -304,10 +304,11 @@ static CPUState *find_paging_enabled_cpu(void)
 return NULL;
 }
 
-void qemu_get_guest_memory_mapping(MemoryMappingList *list,
+bool qemu_get_guest_memory_mapping(MemoryMappingList *list,
const GuestPhysBlockList *guest_phys_blocks,
Error **errp)
 {
+ERRP_GUARD();
 CPUState *cpu, *first_paging_enabled_cpu;
 GuestPhysBlock *block;
 ram_addr_t offset, length;
@@ -316,14 +317,11 @@ void qemu_get_guest_memory_mapping(MemoryMappingList 
*list,
 if (first_paging_enabled_cpu) {
 for (cpu = first_paging_enabled_cpu; cpu != NULL;
  cpu = CPU_NEXT(cpu)) {
-Error *err = NULL;
-cpu_get_memory_mapping(cpu, list, &err);
-if (err) {
-error_propagate(errp, err);
-return;
+if (!cpu_get_memory_mapping(cpu, list, errp)) {
+return false;
 }
 }
-return;
+re

Re: [PATCH] target/riscv: deprecate capital 'Z' CPU properties

2023-10-09 Thread Andrew Jones

On Sat, Oct 07, 2023 at 02:14:27PM -0300, Daniel Henrique Barboza wrote:
> At this moment there are eleven CPU extension properties that starts
> with capital 'Z': Zifencei, Zicsr, Zihintntl, Zihintpause, Zawrs, Zfa,
> Zfh, Zfhmin, Zve32f, Zve64f and Zve64d. All other extensions are named
> with lower-case letters.
> 
> We want all properties to be named with lower-case letters since it's
> consistent with the riscv-isa string that we create in the FDT. Having
> these 11 properties to be exceptions can be confusing.
> 
> Deprecate all of them. Create their lower-case counterpart to be used as
> maintained CPU properties. When trying to use any deprecated property a
> warning message will be displayed, recommending users to switch to the
> lower-case variant:
> 
> ./build/qemu-system-riscv64 -M virt -cpu rv64,Zifencei=true --nographic
> qemu-system-riscv64: warning: CPU property 'Zifencei' is deprecated. Please 
> use 'zifencei' instead
> 
> This will give users some time to change their scripts before we remove
> the capital 'Z' properties entirely.
> 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  docs/about/deprecated.rst  | 23 ++
>  target/riscv/cpu.c | 39 +++---
>  target/riscv/cpu.h |  1 +
>  target/riscv/tcg/tcg-cpu.c | 31 +-
>  4 files changed, 82 insertions(+), 12 deletions(-)
> 
> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> index 694b878f36..331f10f930 100644
> --- a/docs/about/deprecated.rst
> +++ b/docs/about/deprecated.rst
> @@ -378,6 +378,29 @@ of generic CPUs: rv32 and rv64 as default CPUs and 'max' 
> as a feature complete
>  CPU for both 32 and 64 bit builds. Users are then discouraged to use the 
> 'any'
>  CPU type starting in 8.2.
>  
> +RISC-V CPU properties which start with with capital 'Z' (since 8.2)
  ^ double with

> +^^
> +
> +All RISC-V CPU properties which start with capital 'Z' are being deprecated
> +starting in 8.2. The reason is that they were wrongly added with capital 'Z'
> +in the past. CPU properties were later added with lower-case names, which
> +is the format we want to use from now on.
> +
> +Users which try to use these deprecated properties will receive a warning
> +recommending to switch to their stable counterparts:
> +
> +- "Zifencei" should be replaced with "zifencei"
> +- "Zicsr" should be replaced with "zicsr"
> +- "Zihintntl" should be replaced with "zihintntl"
> +- "Zihintpause" should be replaced with "zihintpause"
> +- "Zawrs" should be replaced with "zawrs"
> +- "Zfa" should be replaced with "zfa"
> +- "Zfh" should be replaced with "zfh"
> +- "Zfhmin" should be replaced with "zfhmin"
> +- "Zve32f" should be replaced with "zve32f"
> +- "Zve64f" should be replaced with "zve64f"
> +- "Zve64d" should be replaced with "zve64d"
> +
>  Block device options
>  
>  
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 521bb88538..1cdc3d2609 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -1246,17 +1246,17 @@ const char *riscv_get_misa_ext_description(uint32_t 
> bit)
>  const RISCVCPUMultiExtConfig riscv_cpu_extensions[] = {
>  /* Defaults for standard extensions */
>  MULTI_EXT_CFG_BOOL("sscofpmf", ext_sscofpmf, false),
> -MULTI_EXT_CFG_BOOL("Zifencei", ext_ifencei, true),
> -MULTI_EXT_CFG_BOOL("Zicsr", ext_icsr, true),
> -MULTI_EXT_CFG_BOOL("Zihintntl", ext_zihintntl, true),
> -MULTI_EXT_CFG_BOOL("Zihintpause", ext_zihintpause, true),
> -MULTI_EXT_CFG_BOOL("Zawrs", ext_zawrs, true),
> -MULTI_EXT_CFG_BOOL("Zfa", ext_zfa, true),
> -MULTI_EXT_CFG_BOOL("Zfh", ext_zfh, false),
> -MULTI_EXT_CFG_BOOL("Zfhmin", ext_zfhmin, false),
> -MULTI_EXT_CFG_BOOL("Zve32f", ext_zve32f, false),
> -MULTI_EXT_CFG_BOOL("Zve64f", ext_zve64f, false),
> -MULTI_EXT_CFG_BOOL("Zve64d", ext_zve64d, false),
> +MULTI_EXT_CFG_BOOL("zifencei", ext_ifencei, true),
> +MULTI_EXT_CFG_BOOL("zicsr", ext_icsr, true),
> +MULTI_EXT_CFG_BOOL("zihintntl", ext_zihintntl, true),
> +MULTI_EXT_CFG_BOOL("zihintpause", ext_zihintpause, true),
> +MULTI_EXT_CFG_BOOL("zawrs", ext_zawrs, true),
> +MULTI_EXT_CFG_BOOL("zfa", ext_zfa, true),
> +MULTI_EXT_CFG_BOOL("zfh", ext_zfh, false),
> +MULTI_EXT_CFG_BOOL("zfhmin", ext_zfhmin, false),
> +MULTI_EXT_CFG_BOOL("zve32f", ext_zve32f, false),
> +MULTI_EXT_CFG_BOOL("zve64f", ext_zve64f, false),
> +MULTI_EXT_CFG_BOOL("zve64d", ext_zve64d, false),
>  MULTI_EXT_CFG_BOOL("sstc", ext_sstc, true),
>  
>  MULTI_EXT_CFG_BOOL("smstateen", ext_smstateen, false),
> @@ -1349,6 +1349,23 @@ const RISCVCPUMultiExtConfig 
> riscv_cpu_experimental_exts[] = {
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> +/* Deprecated entries marked for future removal */
> +const RISCVCPUMultiExtConfig riscv_cpu_deprecated_exts[] = {
> +

Re: [Virtio-fs] (no subject)

2023-10-09 Thread Hanna Czenczek


On 06.10.23 22:49, Alex Bennée wrote:

Hanna Czenczek  writes:


On 06.10.23 17:17, Alex Bennée wrote:

Hanna Czenczek  writes:


On 06.10.23 12:34, Michael S. Tsirkin wrote:

On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote:

On 06.10.23 11:26, Michael S. Tsirkin wrote:

On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote:

On 06.10.23 10:45, Michael S. Tsirkin wrote:

On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote:

On 05.10.23 19:15, Michael S. Tsirkin wrote:

On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote:

On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote:



What I’m saying is, 923b8921d21 introduced SET_STATUS calls that broke all
devices that would implement them as per virtio spec, and even today it’s
broken for stateful devices.  The mentioned performance issue is likely
real, but we can’t address it by making up SET_STATUS calls that are wrong.

I concede that I didn’t think about DRIVER_OK.  Personally, I would do all
final configuration that would happen upon a DRIVER_OK once the first vring
is started (i.e. receives a kick).  That has the added benefit of being
asynchronous because it doesn’t block any vhost-user messages (which are
synchronous, and thus block downtime).

Hanna

For better or worse kick is per ring. It's out of spec to start rings
that were not kicked but I guess you could do configuration ...
Seems somewhat asymmetrical though.

I meant to take the first ring being started as the signal to do the
global configuration, i.e. not do this once per vring, but once
globally.


Let's wait until next week, hopefully Yajun Wu will answer.

I mean, personally I don’t really care about the whole SET_STATUS
thing.  It’s clear that it’s broken for stateful devices.  The fact
that it took until 6f8be29ec17d to fix it for just any device that
would implement it according to spec to me is a strong indication that
nobody does implement it according to spec, and is currently only used
to signal to some specific back-end that all rings have been set up
and should be configured in a single block.

I'm certainly using [GS]ET_STATUS for the proposed F_TRANSPORT
extensions where everything is off-loaded to the vhost-user backend.

How do these back-ends work with the fact that qemu uses SET_STATUS
incorrectly when not offloading?  Do you plan on fixing that?

Mainly having a common base implementation which does it right and
having very lightweight derivations for legacy stubs using it. The
aim is to eliminate the need for QEMU stubs entirely by fully specifying
the device from the vhost-user API.


If the current SET_STATUS use is overhauled, too, that would be good.  I 
wonder why you need the status byte, though.



(I.e. that we send SET_STATUS 0 when the VM is paused, potentially
resetting state that is not recoverable, and that we set DRIVER and
DRIVER_OK simultaneously.)

This is QEMU simulating a SET_STATUS rather than the guest triggering
it?


Yes, and the fact that we simulate it when the guest will not have 
triggered it, i.e. we reset the device (SET_STATUS 0) when the VM is 
paused.  Effectively, qemu injects virtio commands that the guest has 
never requested, which generally feels like a bad idea, because qemu 
will need to get the device back to its previous state before the guest 
is resumed, which may or may not work.  Specifically, it won’t work for 
devices that have internal state.


Furthermore, we use SET_STATUS to set ACKNOWLEDGE | DRIVER | DRIVER_OK 
simultaneously, which is wrong.  ACKNOWLEDGE | DRIVER may perhaps be set 
simultaneously, but then comes feature negotiation (setting and checking 
FEATURES_OK), and then DRIVER_OK.
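
The ordering described above can be sketched as a small check. This is only an illustration, not code from QEMU or any back-end: the helper name status_step_is_ordered() is hypothetical, and it assumes a modern (non-legacy) device where FEATURES_OK is part of the handshake. The bit values are those given for the device status field in the virtio specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Device status bits as defined by the virtio specification. */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE 1
#define VIRTIO_CONFIG_S_DRIVER      2
#define VIRTIO_CONFIG_S_DRIVER_OK   4
#define VIRTIO_CONFIG_S_FEATURES_OK 8

/*
 * Hypothetical helper (assumption, not an existing API): returns true if
 * going from old_status to new_status in a single SET_STATUS respects the
 * ordering ACKNOWLEDGE/DRIVER -> FEATURES_OK -> DRIVER_OK.
 */
static bool status_step_is_ordered(uint8_t old_status, uint8_t new_status)
{
    const uint8_t base = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER;
    uint8_t added = (uint8_t)(new_status & ~old_status);

    if (new_status == 0) {
        /* A reset request; whether it should be sent at all is a separate question. */
        return true;
    }
    /* FEATURES_OK requires ACKNOWLEDGE and DRIVER, set now or earlier. */
    if ((added & VIRTIO_CONFIG_S_FEATURES_OK) && (new_status & base) != base) {
        return false;
    }
    /* DRIVER_OK requires FEATURES_OK from an earlier step, so it could be read back. */
    if ((added & VIRTIO_CONFIG_S_DRIVER_OK) &&
        !(old_status & VIRTIO_CONFIG_S_FEATURES_OK)) {
        return false;
    }
    return true;
}
```

With this check, setting ACKNOWLEDGE | DRIVER | DRIVER_OK in one step is rejected, while the three-step sequence is accepted.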


Finally, how the status byte is to be used is not noted in the 
vhost-user specification, which instead points to the virtio 
specification.  I think if we keep SET_STATUS, it must be documented how 
it interacts with other vhost-user commands.  For example, how the 
FEATURES_OK protocol described in the virtio specification interacts 
with GET_FEATURES/SET_FEATURES, or whether SET_STATUS 0 and RESET_DEVICE 
are equivalent.  Currently, the only implementation of SET_STATUS I know 
(DPDK) ignores SET_STATUS 0, i.e. doesn’t do a reset.  To me that 
indicates that the spec must be clear on what these status values mean 
with regards to the vhost-user protocol as a whole.


So every software implementation with STATUS support that I know 
implements SET_STATUS wrongly right now, and that’s a problem, because 
it prevents implementations like virtiofsd from doing so correctly.


Hanna

Re: [PATCH] hw/virtio/virtio-gpu: Fix compiler warning when compiling with -Wshadow

2023-10-09 Thread Thomas Huth


On 08/10/2023 10.57, Michael S. Tsirkin wrote:

On Fri, Oct 06, 2023 at 06:45:08PM +0200, Thomas Huth wrote:

Avoid using trivial variable names in macros, otherwise we get
the following compiler warning when compiling with -Wshadow=local:

In file included from ../../qemu/hw/display/virtio-gpu-virgl.c:19:
../../home/thuth/devel/qemu/hw/display/virtio-gpu-virgl.c:
  In function ‘virgl_cmd_submit_3d’:
../../qemu/include/hw/virtio/virtio-gpu.h:228:16: error: declaration of ‘s’
  shadows a previous local [-Werror=shadow=compatible-local]
   228 | size_t s;
   |^
../../qemu/hw/display/virtio-gpu-virgl.c:215:5: note: in expansion of macro
  ‘VIRTIO_GPU_FILL_CMD’
   215 | VIRTIO_GPU_FILL_CMD(cs);
   | ^~~
../../qemu/hw/display/virtio-gpu-virgl.c:213:12: note: shadowed declaration
  is here
   213 | size_t s;
   |^
cc1: all warnings being treated as errors

Signed-off-by: Thomas Huth 
---
  include/hw/virtio/virtio-gpu.h | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 390c4642b8..8b7e3faf01 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -225,13 +225,13 @@ struct VhostUserGPU {
  };
  
  #define VIRTIO_GPU_FILL_CMD(out) do {   \

-size_t s;   \
-s = iov_to_buf(cmd->elem.out_sg, cmd->elem.out_num, 0,  \
+size_t s_;  \
+s_ = iov_to_buf(cmd->elem.out_sg, cmd->elem.out_num, 0, \
 &out, sizeof(out));  \
-if (s != sizeof(out)) { \
+if (s_ != sizeof(out)) {\
  qemu_log_mask(LOG_GUEST_ERROR,  \
"%s: command size incorrect %zu vs %zu\n",\
-  __func__, s, sizeof(out));\
+  __func__, s_, sizeof(out));   \
  return; \
  }   \
  } while (0)


This is not really enough I think. Someone might
use another macro as parameter to this macro and we'll get
a mess. We want something that's specific to this macro.
How about VIRTIO_GPU_FILL_CMD_s ?


Sure, can do (also for the other patch).

 Thomas
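
A minimal sketch of the naming idea discussed above, assuming a simplified macro rather than the real VIRTIO_GPU_FILL_CMD: the macro-local variable carries the macro's own name as a prefix, so it cannot shadow a caller's `s` (or another macro's `s_`). FILL_FROM_BUF and fill_with_caller_s() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical macro, loosely modelled on the pattern in the patch.
 * The local is named after the macro itself, so -Wshadow=local stays
 * quiet even when the caller (or a nested macro) declares 's' or 's_'.
 */
#define FILL_FROM_BUF(out, buf, buf_len) do {                         \
        size_t FILL_FROM_BUF_n_ = (buf_len);                          \
        if (FILL_FROM_BUF_n_ > sizeof(out)) {                         \
            FILL_FROM_BUF_n_ = sizeof(out);                           \
        }                                                             \
        memcpy(&(out), (buf), FILL_FROM_BUF_n_);                      \
    } while (0)

/* The caller has its own 's'; it does not clash with the macro's local. */
static unsigned fill_with_caller_s(const unsigned char *buf, size_t len)
{
    size_t s = len;
    unsigned out = 0;

    FILL_FROM_BUF(out, buf, s);
    return out;
}
```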

Re: [PATCH v4 10/15] vfio/ccw: Use vfio_[attach/detach]_device

2023-10-09 Thread Eric Auger

Hi Zhenzhong,

On 10/9/23 03:25, Duan, Zhenzhong wrote:
>
>> -Original Message-
>> From: Eric Auger 
>> Sent: Monday, October 9, 2023 1:46 AM
>> Subject: Re: [PATCH v4 10/15] vfio/ccw: Use vfio_[attach/detach]_device
>>
>> Hi Zhenzhong,
>> On 10/8/23 12:21, Duan, Zhenzhong wrote:
>>> Hi Eric,
>>>
 -Original Message-
 From: Eric Auger 
 Sent: Wednesday, October 4, 2023 11:44 PM
 Subject: [PATCH v4 10/15] vfio/ccw: Use vfio_[attach/detach]_device

 Let the vfio-ccw device use vfio_attach_device() and
 vfio_detach_device(), hence hiding the details of the used
 IOMMU backend.

 Note that the migration reduces the following trace
 "vfio: subchannel %s has already been attached" (featuring
 cssid.ssid.devid) into "device is already attached"

 Also now all the devices have been migrated to use the new
 vfio_attach_device/vfio_detach_device API, let's turn the
 legacy functions into static functions, local to container.c.

 Signed-off-by: Eric Auger 
 Signed-off-by: Yi Liu 
 Signed-off-by: Zhenzhong Duan 
 Reviewed-by: Matthew Rosato 

 ---

 v3:
 - simplified vbasedev->dev setting

 v2 -> v3:
 - Hopefully fix confusion between vbasedev->name, mdevid and sysfsdev
  while taking into account Matthew's comment
  https://lore.kernel.org/qemu-devel/6e04ab8f-dc84-e9c2-deea-
 2b6b31678...@linux.ibm.com/
 ---
 include/hw/vfio/vfio-common.h |   5 --
 hw/vfio/ccw.c | 122 +-
 hw/vfio/common.c  |  10 +--
 3 files changed, 37 insertions(+), 100 deletions(-)

 diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
 index 12fbfbc37d..c486bdef2a 100644
 --- a/include/hw/vfio/vfio-common.h
 +++ b/include/hw/vfio/vfio-common.h
 @@ -202,7 +202,6 @@ typedef struct {
 hwaddr pages;
 } VFIOBitmap;

 -void vfio_put_base_device(VFIODevice *vbasedev);
 void vfio_disable_irqindex(VFIODevice *vbasedev, int index);
 void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index);
 void vfio_mask_single_irqindex(VFIODevice *vbasedev, int index);
 @@ -220,11 +219,7 @@ void vfio_region_unmap(VFIORegion *region);
 void vfio_region_exit(VFIORegion *region);
 void vfio_region_finalize(VFIORegion *region);
 void vfio_reset_handler(void *opaque);
 -VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp);
 -void vfio_put_group(VFIOGroup *group);
 struct vfio_device_info *vfio_get_device_info(int fd);
 -int vfio_get_device(VFIOGroup *group, const char *name,
 -VFIODevice *vbasedev, Error **errp);
 int vfio_attach_device(char *name, VFIODevice *vbasedev,
AddressSpace *as, Error **errp);
 void vfio_detach_device(VFIODevice *vbasedev);
 diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
 index 1e2fce83b0..6ec35fedc9 100644
 --- a/hw/vfio/ccw.c
 +++ b/hw/vfio/ccw.c
 @@ -572,88 +572,15 @@ static void vfio_ccw_put_region(VFIOCCWDevice
 *vcdev)
 g_free(vcdev->io_region);
 }

 -static void vfio_ccw_put_device(VFIOCCWDevice *vcdev)
 -{
 -g_free(vcdev->vdev.name);
 -vfio_put_base_device(&vcdev->vdev);
 -}
 -
 -static void vfio_ccw_get_device(VFIOGroup *group, VFIOCCWDevice *vcdev,
 -Error **errp)
 -{
 -S390CCWDevice *cdev = S390_CCW_DEVICE(vcdev);
 -char *name = g_strdup_printf("%x.%x.%04x", cdev->hostid.cssid,
 - cdev->hostid.ssid,
 - cdev->hostid.devid);
 -VFIODevice *vbasedev;
 -
 -QLIST_FOREACH(vbasedev, &group->device_list, next) {
 -if (strcmp(vbasedev->name, name) == 0) {
 -error_setg(errp, "vfio: subchannel %s has already been 
 attached",
 -   name);
 -goto out_err;
 -}
 -}
 -
 -/*
 - * All vfio-ccw devices are believed to operate in a way compatible 
 with
 - * discarding of memory in RAM blocks, ie. pages pinned in the host 
 are
 - * in the current working set of the guest driver and therefore never
 - * overlap e.g., with pages available to the guest balloon driver.  
 This
 - * needs to be set before vfio_get_device() for vfio common to handle
 - * ram_block_discard_disable().
 - */
 -vcdev->vdev.ram_block_discard_allowed = true;
 -
 -if (vfio_get_device(group, cdev->mdevid, &vcdev->vdev, errp)) {
 -goto out_err;
 -}
 -
 -vcdev->vdev.ops = &vfio_ccw_ops;
 -vcdev->vdev.type = VFIO_DEVICE_TYPE_CCW;
 -vcdev->vdev.name = name;
 -vcdev->vdev.dev = DEVICE(vcdev);
 -
 -return;
 -

Re: [PATCH] memory: follow Error API guidelines

2023-10-09 Thread Philippe Mathieu-Daudé


On 9/10/23 09:53, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau 

Return true/false on success/failure.

Signed-off-by: Marc-André Lureau 
---
  include/hw/core/cpu.h |  4 +++-
  include/hw/core/sysemu-cpu-ops.h  |  2 +-
  include/sysemu/memory_mapping.h   |  2 +-
  target/i386/cpu.h |  2 +-
  hw/core/cpu-sysemu.c  |  6 +++---
  softmmu/memory_mapping.c  | 13 ++---
  target/i386/arch_memory_mapping.c |  6 --
  7 files changed, 19 insertions(+), 16 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé
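
For readers unfamiliar with the convention the patch follows, here is a stand-alone sketch of the pattern: a function taking an error out-parameter also returns true on success and false on failure, so callers can test the return value instead of a local error object. The Error type, get_mapping(), collect_all() and fake_mapping() below are simplified stand-ins invented for this example; they are not QEMU's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for QEMU's Error, only so the sketch compiles on its own. */
typedef struct Error { const char *msg; } Error;

static Error unsupported = { "Obtaining memory mappings is unsupported." };

/* Hypothetical callback type: returns true on success, false with *errp set. */
typedef bool (*get_mapping_fn)(int *list, Error **errp);

static bool get_mapping(get_mapping_fn fn, int *list, Error **errp)
{
    if (fn) {
        return fn(list, errp);      /* forward the callee's result directly */
    }
    if (errp) {
        *errp = &unsupported;
    }
    return false;
}

/* Toy callback that "adds" one mapping and always succeeds. */
static bool fake_mapping(int *list, Error **errp)
{
    (void)errp;
    *list += 1;
    return true;
}

static bool collect_all(const get_mapping_fn *fns, size_t n,
                        int *list, Error **errp)
{
    for (size_t i = 0; i < n; i++) {
        if (!get_mapping(fns[i], list, errp)) {
            return false;           /* errp was set by the callee */
        }
    }
    return true;
}
```

The loop in collect_all() mirrors the shape of the simplified loop in the patch: no local error variable and no explicit propagation step are needed.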

Re: [PATCH] memory: drop needless argument

2023-10-09 Thread Philippe Mathieu-Daudé


On 9/10/23 09:52, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau 

The argument is unused since commit bdc44640c ("cpu: Use QTAILQ for CPU list").


10 years =)

Reviewed-by: Philippe Mathieu-Daudé 


Signed-off-by: Marc-André Lureau 
---
  softmmu/memory_mapping.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

Re: [PATCH v7 05/15] python/qemu: rename command() to cmd()

2023-10-09 Thread Cédric Le Goater


On 10/6/23 17:41, Vladimir Sementsov-Ogievskiy wrote:

Use a shorter name. We are going to move in iotests from qmp() to
command() where possible. But command() is longer than qmp() and doesn't
look better. Let's rename.

You can simply grep for '\.command(' and for 'def command(' to check
that everything is updated (command() in tests/docker/docker.py is
unrelated).

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Eric Blake 
[vsementsov: also update three occurrences in
tests/avocado/machine_aspeed.py and keep r-b]


For aspeed,

Reviewed-by: Cédric Le Goater 

Thanks,

C.





---
  docs/devel/testing.rst|  10 +-
  python/qemu/machine/machine.py|   8 +-
  python/qemu/qmp/legacy.py |   2 +-
  python/qemu/qmp/qmp_shell.py  |   2 +-
  python/qemu/utils/qemu_ga_client.py   |   2 +-
  python/qemu/utils/qom.py  |   8 +-
  python/qemu/utils/qom_common.py   |   2 +-
  python/qemu/utils/qom_fuse.py |   6 +-
  scripts/cpu-x86-uarch-abi.py  |   8 +-
  scripts/device-crash-test |   8 +-
  scripts/render_block_graph.py |   8 +-
  tests/avocado/avocado_qemu/__init__.py|   4 +-
  tests/avocado/cpu_queries.py  |   5 +-
  tests/avocado/hotplug_cpu.py  |  10 +-
  tests/avocado/info_usernet.py |   4 +-
  tests/avocado/machine_arm_integratorcp.py |   6 +-
  tests/avocado/machine_aspeed.py   |  12 +-
  tests/avocado/machine_m68k_nextcube.py|   4 +-
  tests/avocado/machine_mips_malta.py   |   6 +-
  tests/avocado/machine_s390_ccw_virtio.py  |  28 ++--
  tests/avocado/migration.py|  10 +-
  tests/avocado/pc_cpu_hotplug_props.py |   2 +-
  tests/avocado/version.py  |   4 +-
  tests/avocado/virtio_check_params.py  |   6 +-
  tests/avocado/virtio_version.py   |   5 +-
  tests/avocado/x86_cpu_model_versions.py   |  13 +-
  tests/migration/guestperf/engine.py   | 150 +++---
  tests/qemu-iotests/256|  34 ++---
  tests/qemu-iotests/257|  36 +++---
  29 files changed, 204 insertions(+), 199 deletions(-)

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index 5d1fc0aa95..21525e9aae 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -1014,8 +1014,8 @@ class.  Here's a simple usage example:
"""
def test_qmp_human_info_version(self):
self.vm.launch()
-  res = self.vm.command('human-monitor-command',
-command_line='info version')
+  res = self.vm.cmd('human-monitor-command',
+command_line='info version')
self.assertRegexpMatches(res, r'^(\d+\.\d+\.\d)')
  
  To execute your test, run:

@@ -1065,15 +1065,15 @@ and hypothetical example follows:
first_machine.launch()
second_machine.launch()
  
-  first_res = first_machine.command(

+  first_res = first_machine.cmd(
'human-monitor-command',
command_line='info version')
  
-  second_res = second_machine.command(

+  second_res = second_machine.cmd(
'human-monitor-command',
command_line='info version')
  
-  third_res = self.get_vm(name='third_machine').command(

+  third_res = self.get_vm(name='third_machine').cmd(
'human-monitor-command',
command_line='info version')
  
diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py

index dd1a79cb37..c4e80544bd 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -697,16 +697,16 @@ def qmp(self, cmd: str,
  self._quit_issued = True
  return ret
  
-def command(self, cmd: str,

-conv_keys: bool = True,
-**args: Any) -> QMPReturnValue:
+def cmd(self, cmd: str,
+conv_keys: bool = True,
+**args: Any) -> QMPReturnValue:
  """
  Invoke a QMP command.
  On success return the response dict.
  On failure raise an exception.
  """
  qmp_args = self._qmp_args(conv_keys, args)
-ret = self._qmp.command(cmd, **qmp_args)
+ret = self._qmp.cmd(cmd, **qmp_args)
  if cmd == 'quit':
  self._quit_issued = True
  return ret
diff --git a/python/qemu/qmp/legacy.py b/python/qemu/qmp/legacy.py
index e5fa1ce9c4..22a2b5616e 100644
--- a/python/qemu/qmp/legacy.py
+++ b/python/qemu/qmp/legacy.py
@@ -207,7 +207,7 @@ def cmd_raw(self, name: str,
  qmp_cmd['arguments'] = args
  return self.cmd_obj(qmp_cmd)
  
-def command(self, cmd: str, **kwds: object) -> QMPReturnValue:

+def cmd(self, cmd: str, **kwds: object) -> QMPReturnValue:
  """
  Build and send a QMP com

Re: [Virtio-fs] (no subject)

2023-10-09 Thread Hanna Czenczek


On 07.10.23 04:22, Yajun Wu wrote:


On 10/6/2023 6:34 PM, Michael S. Tsirkin wrote:


On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote:

On 06.10.23 11:26, Michael S. Tsirkin wrote:

On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote:

On 06.10.23 10:45, Michael S. Tsirkin wrote:

On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote:

On 05.10.23 19:15, Michael S. Tsirkin wrote:

On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote:

On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote:
There is no clearly defined purpose for the virtio status byte in
vhost-user: For resetting, we already have RESET_DEVICE; and for virtio
feature negotiation, we have [GS]ET_FEATURES. With the REPLY_ACK
protocol extension, it is possible for SET_FEATURES to return errors
(SET_PROTOCOL_FEATURES may be called before SET_FEATURES).

As for implementations, SET_STATUS is not widely implemented.  dpdk does
implement it, but only uses it to signal feature negotiation failure.
While it does log reset requests (SET_STATUS 0) as such, it effectively
ignores them, in contrast to RESET_OWNER (which is deprecated, and today
means the same thing as RESET_DEVICE).

While qemu superficially has support for [GS]ET_STATUS, it does not
forward the guest-set status byte, but instead just makes it up
internally, and actually completely ignores what the back-end returns,
only using it as the template for a subsequent SET_STATUS to add single
bits to it.  Notably, after setting FEATURES_OK, it never reads it back
to see whether the flag is still set, which is the only way in which
dpdk uses the status byte.

As-is, no front-end or back-end can rely on the other side handling this
field in a useful manner, and it also provides no practical use over
other mechanisms the vhost-user protocol has, which are more clearly
defined.  Deprecate it.

Suggested-by: Stefan Hajnoczi 
Signed-off-by: Hanna Czenczek 
---
 docs/interop/vhost-user.rst | 28 +---
 1 file changed, 21 insertions(+), 7 deletions(-)

Reviewed-by: Stefan Hajnoczi

SET_STATUS is the only way to signal failure to acknowledge FEATURES_OK.
The fact current backends never check errors does not mean they never
will. So no, not applying this.

Can this not be done with REPLY_ACK?  I.e., with the following message
order:

1. GET_FEATURES to find out whether VHOST_USER_F_PROTOCOL_FEATURES is
   present
2. GET_PROTOCOL_FEATURES to hopefully get VHOST_USER_PROTOCOL_F_REPLY_ACK
3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK
4. SET_FEATURES with need_reply

If not, the problem is that qemu has sent SET_STATUS 0 for a while when the
vCPUs are stopped, which generally seems to request a device reset.  If we
don’t state at least that SET_STATUS 0 is to be ignored, back-ends that will
implement SET_STATUS later may break with at least these qemu versions.  But
documenting that a particular use of the status byte is to be ignored would
be really strange.

Hanna

Hmm I guess. Though just following virtio spec seems cleaner to me...
vhost-user reconfigures the state fully on start.

Not the internal device state, though.  virtiofsd has internal state, and
other devices like vhost-gpu back-ends would probably, too.

Stefan has recently sent a series
(https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg00709.html) to
put the reset (RESET_DEVICE) into virtio_reset() (when we really need a
reset).

I really don’t like our current approach with the status byte. Following the
virtio specification to me would mean that the guest directly controls this
byte, which it does not.  qemu makes up values as it deems appropriate, and
this includes sending a SET_STATUS 0 when the guest is just paused, i.e.
when the guest really doesn’t want a device reset.

That means that qemu does not treat this as a virtio device field (because
that would mean exposing it to the guest driver), but instead treats it as
part of the vhost(-user) protocol.  It doesn’t feel right to me that we use
a virtio-defined feature for communication on the vhost level, i.e. between
front-end and back-end, and not between guest driver and device.  I think
all vhost-level protocol features should be fully defined in the vhost-user
specification, which REPLY_ACK is.

Hmm that makes sense. Maybe we should have done what stefan's patch
is doing.

Do look at the original commit that introduced it to understand why
it was added.

I don’t understand why this was added to the stop/cont code, though.  If it
is time consuming to make these changes, why are they done every time the VM
is paused and resumed?  It makes sense that this would be done for the initial
configuration (where a reset also wouldn’t hurt), but here it seems wrong.

(To be clear, a reset in the stop/cont code is wrong, because it breaks
stateful devices.)
Also, note

Re: [PATCH RESEND 11/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_[GET|SET]_STATE

2023-10-09 Thread Harsh Prateek Bora





On 9/7/23 09:00, Nicholas Piggin wrote:

On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:

L1 can request to get/set state of any of the supported Guest State
Buffer (GSB) elements using h_guest_[get|set]_state hcalls.
These hcalls need to perform validation checks for each
get/set request based on the flags passed and operation supported.

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
  hw/ppc/spapr_nested.c | 267 ++
  include/hw/ppc/spapr_nested.h |  22 +++
  2 files changed, 289 insertions(+)

diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 6fbb1bcb02..498e7286fa 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -897,6 +897,138 @@ void init_nested(void)
  }
  }
  
+static struct guest_state_element *guest_state_element_next(

+struct guest_state_element *element,
+int64_t *len,
+int64_t *num_elements)
+{
+uint16_t size;
+
+/* size is of element->value[] only. Not whole guest_state_element */
+size = be16_to_cpu(element->size);
+
+if (len) {
+*len -= size + offsetof(struct guest_state_element, value);
+}
+
+if (num_elements) {
+*num_elements -= 1;
+}
+
+return (struct guest_state_element *)(element->value + size);
+}
+
+static
+struct guest_state_element_type *guest_state_element_type_find(uint16_t id)
+{
+int i;
+
+for (i = 0; i < ARRAY_SIZE(guest_state_element_types); i++)
+if (id == guest_state_element_types[i].id) {
+return &guest_state_element_types[i];
+}
+
+return NULL;
+}
+
+static void print_element(struct guest_state_element *element,
+  struct guest_state_request *gsr)
+{
+printf("id:0x%04x size:0x%04x %s ",
+   be16_to_cpu(element->id), be16_to_cpu(element->size),
+   gsr->flags & GUEST_STATE_REQUEST_SET ? "set" : "get");
+printf("buf:0x%016lx ...\n", be64_to_cpu(*(uint64_t *)element->value));


No printfs. These could be GUEST_ERROR qemu logs if anything, make
sure they're relatively well formed messages if you keep them, i.e.,
something a Linux/KVM developer could understand what went wrong.
I.e., no __func__ which is internal to QEMU, use "H_GUEST_GET_STATE"
etc. Ditto for all the rest of the printfs.



Sure, changing to qemu_log_mask(LOG_GUEST_ERROR, "h_guest_%s_state ..."


+}
+
+static bool guest_state_request_check(struct guest_state_request *gsr)
+{
+int64_t num_elements, len = gsr->len;
+struct guest_state_buffer *gsb = gsr->gsb;
+struct guest_state_element *element;
+struct guest_state_element_type *type;
+uint16_t id, size;
+
+/* gsb->num_elements = 0 == 32 bits long */
+assert(len >= 4);


I haven't looked closely, but can't the guest crash the
host with malformed requests here?


The GSB communication is happening between L1 host and L0 only.
L2 guest doesn't participate and remains unaware of this state exchange.
Hence, only L1 with a malformed request can crash itself, not L2.


This API is pretty complicated, make sure you sanitize all inputs
carefully, as early as possible, and without too deep a call and
control flow chain from the API entry point.



Noted.
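
One way to read the advice above is to reject a malformed buffer with an error return instead of asserting on its length. The sketch below walks a raw byte buffer under an assumed layout (a big-endian 32-bit element count, followed by elements made of a 16-bit id, a 16-bit size and `size` bytes of value). That layout is inferred from the quoted patch, and the function names are hypothetical; this is not the code proposed in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read big-endian integers from a byte buffer. */
static uint16_t rd_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical layout check (assumed format, see above): returns false for
 * any buffer whose declared elements do not fit in 'len' bytes, rather
 * than asserting on caller-controlled input.
 */
static bool gsb_layout_ok(const uint8_t *buf, size_t len)
{
    size_t off = 4;                 /* bytes consumed so far */
    uint32_t n;

    if (len < 4) {
        return false;               /* too short for the element count */
    }
    n = rd_be32(buf);
    while (n--) {
        uint16_t size;

        if (len - off < 4) {
            return false;           /* element header does not fit */
        }
        size = rd_be16(buf + off + 2);
        off += 4;
        if (len - off < size) {
            return false;           /* element value does not fit */
        }
        off += size;
    }
    return true;
}
```

Every bounds check happens before the corresponding read, and `off` never exceeds `len`, so the subtraction `len - off` cannot wrap.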




+
+num_elements = be32_to_cpu(gsb->num_elements);
+element = gsb->elements;
+len -= sizeof(gsb->num_elements);
+
+/* Walk the buffer to validate the length */
+while (num_elements) {
+
+id = be16_to_cpu(element->id);
+size = be16_to_cpu(element->size);
+
+if (false) {
+print_element(element, gsr);
+}
+/* buffer size too small */
+if (len < 0) {
+return false;
+}
+
+type = guest_state_element_type_find(id);
+if (!type) {
+printf("%s: Element ID %04x unknown\n", __func__, id);
+print_element(element, gsr);
+return false;
+}
+
+if (id == GSB_HV_VCPU_IGNORED_ID) {
+goto next_element;
+}
+
+if (size != type->size) {
+printf("%s: Size mismatch. Element ID:%04x. Size Exp:%i Got:%i\n",
+   __func__, id, type->size, size);
+print_element(element, gsr);
+return false;
+}
+
+if ((type->flags & GUEST_STATE_ELEMENT_TYPE_FLAG_READ_ONLY) &&
+(gsr->flags & GUEST_STATE_REQUEST_SET)) {
+printf("%s: trying to set a read-only Element ID:%04x.\n",
+   __func__, id);
+return false;
+}
+
+if (type->flags & GUEST_STATE_ELEMENT_TYPE_FLAG_GUEST_WIDE) {
+/* guest wide element type */
+if (!(gsr->flags & GUEST_STATE_REQUEST_GUEST_WIDE)) {
+printf("%s: trying to set a guest wide Element ID:%04x.\n",
+   __func__, id);
+return false;
+}
+} else {
+/* thread wide element type */
+if (gsr->flags & GUEST_S

RE: [PATCH v4 10/15] vfio/ccw: Use vfio_[attach/detach]_device

2023-10-09 Thread Duan, Zhenzhong

Hi Eric,

>-Original Message-
>From: qemu-devel-bounces+zhenzhong.duan=intel@nongnu.org On Behalf Of Eric Auger
>Sent: Monday, October 9, 2023 4:15 PM
>Subject: Re: [PATCH v4 10/15] vfio/ccw: Use vfio_[attach/detach]_device
>
>Hi Zhenzhong,
>
>On 10/9/23 03:25, Duan, Zhenzhong wrote:
>>
>>> -Original Message-
>>> From: Eric Auger 
>>> Sent: Monday, October 9, 2023 1:46 AM
>>> Subject: Re: [PATCH v4 10/15] vfio/ccw: Use vfio_[attach/detach]_device
>>>
>>> Hi Zhenzhong,
>>> On 10/8/23 12:21, Duan, Zhenzhong wrote:
 Hi Eric,

> -Original Message-
> From: Eric Auger 
> Sent: Wednesday, October 4, 2023 11:44 PM
> Subject: [PATCH v4 10/15] vfio/ccw: Use vfio_[attach/detach]_device
>
> Let the vfio-ccw device use vfio_attach_device() and
> vfio_detach_device(), hence hiding the details of the used
> IOMMU backend.
>
> Note that the migration reduces the following trace
> "vfio: subchannel %s has already been attached" (featuring
> cssid.ssid.devid) into "device is already attached"
>
> Also now all the devices have been migrated to use the new
> vfio_attach_device/vfio_detach_device API, let's turn the
> legacy functions into static functions, local to container.c.
>
> Signed-off-by: Eric Auger 
> Signed-off-by: Yi Liu 
> Signed-off-by: Zhenzhong Duan 
> Reviewed-by: Matthew Rosato 
>
> ---
>
> v3:
> - simplified vbasedev->dev setting
>
> v2 -> v3:
> - Hopefully fix confusion between vbasedev->name, mdevid and sysfsdev
>  while taking into account Matthew's comment
>  https://lore.kernel.org/qemu-devel/6e04ab8f-dc84-e9c2-deea-
> 2b6b31678...@linux.ibm.com/
> ---
> include/hw/vfio/vfio-common.h |   5 --
> hw/vfio/ccw.c | 122 +-
> hw/vfio/common.c  |  10 +--
> 3 files changed, 37 insertions(+), 100 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>common.h
> index 12fbfbc37d..c486bdef2a 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -202,7 +202,6 @@ typedef struct {
> hwaddr pages;
> } VFIOBitmap;
>
> -void vfio_put_base_device(VFIODevice *vbasedev);
> void vfio_disable_irqindex(VFIODevice *vbasedev, int index);
> void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index);
> void vfio_mask_single_irqindex(VFIODevice *vbasedev, int index);
> @@ -220,11 +219,7 @@ void vfio_region_unmap(VFIORegion *region);
> void vfio_region_exit(VFIORegion *region);
> void vfio_region_finalize(VFIORegion *region);
> void vfio_reset_handler(void *opaque);
> -VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp);
> -void vfio_put_group(VFIOGroup *group);
> struct vfio_device_info *vfio_get_device_info(int fd);
> -int vfio_get_device(VFIOGroup *group, const char *name,
> -VFIODevice *vbasedev, Error **errp);
> int vfio_attach_device(char *name, VFIODevice *vbasedev,
>AddressSpace *as, Error **errp);
> void vfio_detach_device(VFIODevice *vbasedev);
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index 1e2fce83b0..6ec35fedc9 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -572,88 +572,15 @@ static void vfio_ccw_put_region(VFIOCCWDevice
> *vcdev)
> g_free(vcdev->io_region);
> }
>
> -static void vfio_ccw_put_device(VFIOCCWDevice *vcdev)
> -{
> -g_free(vcdev->vdev.name);
> -vfio_put_base_device(&vcdev->vdev);
> -}
> -
> -static void vfio_ccw_get_device(VFIOGroup *group, VFIOCCWDevice
>*vcdev,
> -Error **errp)
> -{
> -S390CCWDevice *cdev = S390_CCW_DEVICE(vcdev);
> -char *name = g_strdup_printf("%x.%x.%04x", cdev->hostid.cssid,
> - cdev->hostid.ssid,
> - cdev->hostid.devid);
> -VFIODevice *vbasedev;
> -
> -QLIST_FOREACH(vbasedev, &group->device_list, next) {
> -if (strcmp(vbasedev->name, name) == 0) {
> -error_setg(errp, "vfio: subchannel %s has already been 
> attached",
> -   name);
> -goto out_err;
> -}
> -}
> -
> -/*
> - * All vfio-ccw devices are believed to operate in a way compatible 
> with
> - * discarding of memory in RAM blocks, ie. pages pinned in the host 
> are
> - * in the current working set of the guest driver and therefore never
> - * overlap e.g., with pages available to the guest balloon driver.  
> This
> - * needs to be set before vfio_get_device() for vfio common to handle
> - * ram_block_discard_d

Re: [PATCH 0/2] topic: meson: add more compiler hardening flags

2023-10-09 Thread Daniel P . Berrangé

On Mon, Oct 09, 2023 at 09:21:01AM +0200, Thomas Huth wrote:
> On 05/10/2023 19.38, Daniel P. Berrangé wrote:
> ...
> > 
> > I also tested enabling -ftrapv, to change signed integer
> > overflow from wrapping, to trapping instead. This exposed a
> > bug in the string-input-visitor which overflows when parsing
> > ranges, and exposed the test-int128 code as (harmlessly)
> > overflowing during its testing. Both can be fixed, but I'm
> > not entirely sure whether -ftrapv is viable or not. I was
> > wondering about TCG and whether it has a need to intentionally
> > allow integer overflow for any of its instruction emulation
> > requirements ?
> I'm not an expert when it comes to this question, but as far as I
> understood, we are using -fwrapv (with "w", not "t") on purpose, see
> meson.build:
> 
> # We use -fwrapv to tell the compiler that we require a C dialect where
> # left shift of signed integers is well defined and has the expected
> # 2s-complement style results. (Both clang and gcc agree that it
> # provides these semantics.)
> 
> And according to the man-page of gcc:
> 
>  The options -ftrapv and -fwrapv override each other,
>  so using -ftrapv -fwrapv on the command-line results
>  in -fwrapv being effective.
> 
> If I got that right, this means you cannot use -ftrapv with QEMU.

Oops, I didn't notice we had -fwrapv in our flags. That is clearly
mutually exclusive with -ftrapv, so nothing further to do here.
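
With -fwrapv in effect, signed overflow wraps silently, so code such as a
range parser has to check for it explicitly. The following is a minimal sketch
of one way to do that; it is not the string-input-visitor code, range_last()
is a hypothetical helper, and __builtin_add_overflow is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute start + count - 1 for a range without relying
 * on signed wraparound. Returns false if count is not positive or the result
 * does not fit in int64_t.
 */
static bool range_last(int64_t start, int64_t count, int64_t *last)
{
    int64_t tmp;

    if (count <= 0) {
        return false;
    }
    /* count > 0, so count - 1 cannot overflow; the addition is checked. */
    if (__builtin_add_overflow(start, count - 1, &tmp)) {
        return false;
    }
    *last = tmp;
    return true;
}
```

A check like this behaves the same whether the build uses -fwrapv or not,
because the overflowing addition is never evaluated as plain signed
arithmetic.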

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[PATCH v2] contrib/vhost-user-gpu: Fix compiler warning when compiling with -Wshadow

2023-10-09 Thread Thomas Huth

Rename some variables to avoid compiler warnings when compiling
with -Wshadow=local.

Signed-off-by: Thomas Huth 
---
 v2: Renamed the variable to something more unique

 contrib/vhost-user-gpu/vugpu.h  | 8 
 contrib/vhost-user-gpu/vhost-user-gpu.c | 6 +++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/contrib/vhost-user-gpu/vugpu.h b/contrib/vhost-user-gpu/vugpu.h
index 509b679f03..654c392fbb 100644
--- a/contrib/vhost-user-gpu/vugpu.h
+++ b/contrib/vhost-user-gpu/vugpu.h
@@ -164,12 +164,12 @@ struct virtio_gpu_ctrl_command {
 };
 
 #define VUGPU_FILL_CMD(out) do {\
-size_t s;   \
-s = iov_to_buf(cmd->elem.out_sg, cmd->elem.out_num, 0,  \
+size_t vugpufillcmd_s_ =\
+iov_to_buf(cmd->elem.out_sg, cmd->elem.out_num, 0,  \
&out, sizeof(out));  \
-if (s != sizeof(out)) { \
+if (vugpufillcmd_s_ != sizeof(out)) {   \
 g_critical("%s: command size incorrect %zu vs %zu", \
-   __func__, s, sizeof(out));   \
+   __func__, vugpufillcmd_s_, sizeof(out)); \
 return; \
 }   \
 } while (0)
diff --git a/contrib/vhost-user-gpu/vhost-user-gpu.c 
b/contrib/vhost-user-gpu/vhost-user-gpu.c
index aa304475a0..bb41758e34 100644
--- a/contrib/vhost-user-gpu/vhost-user-gpu.c
+++ b/contrib/vhost-user-gpu/vhost-user-gpu.c
@@ -834,7 +834,7 @@ vg_resource_flush(VuGpu *g,
 .width = width,
 .height = height,
 };
-pixman_image_t *i =
+pixman_image_t *img =
 pixman_image_create_bits(pixman_image_get_format(res->image),
  msg->payload.update.width,
  msg->payload.update.height,
@@ -842,11 +842,11 @@ vg_resource_flush(VuGpu *g,
   payload.update.data),
  width * bpp);
 pixman_image_composite(PIXMAN_OP_SRC,
-   res->image, NULL, i,
+   res->image, NULL, img,
extents->x1, extents->y1,
0, 0, 0, 0,
width, height);
-pixman_image_unref(i);
+pixman_image_unref(img);
 vg_send_msg(g, msg, -1);
 g_free(msg);
 }
-- 
2.41.0

[PATCH v2] hw/virtio/virtio-gpu: Fix compiler warning when compiling with -Wshadow

2023-10-09 Thread Thomas Huth

Avoid using trivial variable names in macros, otherwise we get
the following compiler warning when compiling with -Wshadow=local:

In file included from ../../qemu/hw/display/virtio-gpu-virgl.c:19:
../../home/thuth/devel/qemu/hw/display/virtio-gpu-virgl.c:
 In function ‘virgl_cmd_submit_3d’:
../../qemu/include/hw/virtio/virtio-gpu.h:228:16: error: declaration of ‘s’
 shadows a previous local [-Werror=shadow=compatible-local]
  228 | size_t s;
  |^
../../qemu/hw/display/virtio-gpu-virgl.c:215:5: note: in expansion of macro
 ‘VIRTIO_GPU_FILL_CMD’
  215 | VIRTIO_GPU_FILL_CMD(cs);
  | ^~~
../../qemu/hw/display/virtio-gpu-virgl.c:213:12: note: shadowed declaration
 is here
  213 | size_t s;
  |^
cc1: all warnings being treated as errors

Signed-off-by: Thomas Huth 
---
 v2: Renamed the variable to something even less trivial

 include/hw/virtio/virtio-gpu.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 390c4642b8..4739fa4689 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -225,13 +225,13 @@ struct VhostUserGPU {
 };
 
 #define VIRTIO_GPU_FILL_CMD(out) do {   \
-size_t s;   \
-s = iov_to_buf(cmd->elem.out_sg, cmd->elem.out_num, 0,  \
+size_t virtiogpufillcmd_s_ =\
+iov_to_buf(cmd->elem.out_sg, cmd->elem.out_num, 0,  \
&out, sizeof(out));  \
-if (s != sizeof(out)) { \
+if (virtiogpufillcmd_s_ != sizeof(out)) {   \
 qemu_log_mask(LOG_GUEST_ERROR,  \
   "%s: command size incorrect %zu vs %zu\n",\
-  __func__, s, sizeof(out));\
+  __func__, virtiogpufillcmd_s_, sizeof(out));  \
 return; \
 }   \
 } while (0)
-- 
2.41.0
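
The renaming approach used in the two -Wshadow patches above can be sketched
in isolation as follows. All names here (DEMO_FILL_CMD, demo_cmd, demo_parse)
are hypothetical and only mimic the shape of the real macros; the point is
that the macro's temporary carries a macro-specific name, so a caller that
already has a local called `s` should no longer trigger -Wshadow=local.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the FILL_CMD macros: copy up to sizeof(out) bytes
 * from src into out and return from the calling function on a short buffer.
 */
#define DEMO_FILL_CMD(out, src, srclen) do {                    \
        size_t demofillcmd_s_ = (srclen) < sizeof(out) ?        \
                                (srclen) : sizeof(out);         \
        memcpy(&(out), (src), demofillcmd_s_);                  \
        if (demofillcmd_s_ != sizeof(out)) {                    \
            return -1;                                          \
        }                                                       \
    } while (0)

struct demo_cmd {
    unsigned char type;
    unsigned char len;
};

/* The caller has its own `s`, which the macro's temporary does not shadow. */
static int demo_parse(const unsigned char *buf, size_t buflen, size_t *consumed)
{
    struct demo_cmd cmd;
    size_t s = sizeof(cmd);

    DEMO_FILL_CMD(cmd, buf, buflen);
    *consumed = s;
    return cmd.type;
}
```

Because the macro hides a `return`, it can only be used in functions whose
return type matches; the real macros are used in void functions.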

[PATCH RFC v4 2/9] target/loongarch: Define some kvm_arch interfaces

2023-10-09 Thread xianglai li

From: Tianrui Zhao 

Define some functions in target/loongarch/kvm.c, such as
kvm_arch_put_registers, kvm_arch_get_registers and
kvm_arch_handle_exit, etc. which are needed by kvm/kvm-all.c.
Most of these functions are empty for now and will be
implemented in later patches.

Cc: "Michael S. Tsirkin" 
Cc: Cornelia Huck 
Cc: Paolo Bonzini 
Cc: "Marc-André Lureau" 
Cc: "Daniel P. Berrangé" 
Cc: Thomas Huth 
Cc: "Philippe Mathieu-Daudé" 
Cc: Richard Henderson 
Cc: Peter Maydell 
Cc: Bibo Mao 
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: Tianrui Zhao 

Signed-off-by: Tianrui Zhao 
Reviewed-by: Richard Henderson 
Signed-off-by: xianglai li 
---
 target/loongarch/kvm.c | 131 +
 1 file changed, 131 insertions(+)
 create mode 100644 target/loongarch/kvm.c

diff --git a/target/loongarch/kvm.c b/target/loongarch/kvm.c
new file mode 100644
index 00..0d67322fd9
--- /dev/null
+++ b/target/loongarch/kvm.c
@@ -0,0 +1,131 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * QEMU LoongArch KVM
+ *
+ * Copyright (c) 2023 Loongson Technology Corporation Limited
+ */
+
+#include "qemu/osdep.h"
+#include 
+#include 
+
+#include "qemu/timer.h"
+#include "qemu/error-report.h"
+#include "qemu/main-loop.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/kvm.h"
+#include "sysemu/kvm_int.h"
+#include "hw/pci/pci.h"
+#include "exec/memattrs.h"
+#include "exec/address-spaces.h"
+#include "hw/boards.h"
+#include "hw/irq.h"
+#include "qemu/log.h"
+#include "hw/loader.h"
+#include "migration/migration.h"
+#include "sysemu/runstate.h"
+#include "cpu-csr.h"
+#include "kvm_loongarch.h"
+
+static bool cap_has_mp_state;
+const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
+KVM_CAP_LAST_INFO
+};
+
+int kvm_arch_get_registers(CPUState *cs)
+{
+return 0;
+}
+int kvm_arch_put_registers(CPUState *cs, int level)
+{
+return 0;
+}
+
+int kvm_arch_init_vcpu(CPUState *cs)
+{
+return 0;
+}
+
+int kvm_arch_destroy_vcpu(CPUState *cs)
+{
+return 0;
+}
+
+unsigned long kvm_arch_vcpu_id(CPUState *cs)
+{
+return cs->cpu_index;
+}
+
+int kvm_arch_release_virq_post(int virq)
+{
+return 0;
+}
+
+int kvm_arch_msi_data_to_gsi(uint32_t data)
+{
+abort();
+}
+
+int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
+ uint64_t address, uint32_t data, PCIDevice *dev)
+{
+return 0;
+}
+
+int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
+int vector, PCIDevice *dev)
+{
+return 0;
+}
+
+void kvm_arch_init_irq_routing(KVMState *s)
+{
+}
+
+int kvm_arch_get_default_type(MachineState *ms)
+{
+return 0;
+}
+
+int kvm_arch_init(MachineState *ms, KVMState *s)
+{
+return 0;
+}
+
+int kvm_arch_irqchip_create(KVMState *s)
+{
+return 0;
+}
+
+void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
+{
+}
+
+MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
+{
+return MEMTXATTRS_UNSPECIFIED;
+}
+
+int kvm_arch_process_async_events(CPUState *cs)
+{
+return cs->halted;
+}
+
+bool kvm_arch_stop_on_emulation_error(CPUState *cs)
+{
+return true;
+}
+
+bool kvm_arch_cpu_check_are_resettable(void)
+{
+return true;
+}
+
+int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
+{
+return 0;
+}
+
+void kvm_arch_accel_class_init(ObjectClass *oc)
+{
+}
-- 
2.39.1

[PATCH RFC v4 6/9] target/loongarch: Implement kvm_arch_init_vcpu

2023-10-09 Thread xianglai li

From: Tianrui Zhao 

Implement the kvm_arch_init_vcpu interface for LoongArch. In this
function we register a VM change state handler. When the VM state
changes to running, the counter value is written into KVM so that
KVM and QEMU stay consistent, and when the state changes to
stopped, the counter value is refreshed from KVM.

Cc: "Michael S. Tsirkin" 
Cc: Cornelia Huck 
Cc: Paolo Bonzini 
Cc: "Marc-André Lureau" 
Cc: "Daniel P. Berrangé" 
Cc: Thomas Huth 
Cc: "Philippe Mathieu-Daudé" 
Cc: Richard Henderson 
Cc: Peter Maydell 
Cc: Bibo Mao 
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: Tianrui Zhao 

Signed-off-by: Tianrui Zhao 
Signed-off-by: xianglai li 
---
 target/loongarch/cpu.h|  2 ++
 target/loongarch/kvm.c| 23 +++
 target/loongarch/trace-events |  2 ++
 3 files changed, 27 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 2580dc26e1..49edf6b016 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -382,6 +382,8 @@ struct ArchCPU {
 
 /* 'compatible' string for this CPU for Linux device trees */
 const char *dtb_compatible;
+/* used by KVM_REG_LOONGARCH_COUNTER ioctl to access guest time counters */
+uint64_t kvm_state_counter;
 };
 
 #define TYPE_LOONGARCH_CPU "loongarch-cpu"
diff --git a/target/loongarch/kvm.c b/target/loongarch/kvm.c
index 5e3bda444e..9754478e34 100644
--- a/target/loongarch/kvm.c
+++ b/target/loongarch/kvm.c
@@ -443,8 +443,31 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 return ret;
 }
 
+static void kvm_loongarch_vm_stage_change(void *opaque, bool running,
+  RunState state)
+{
+int ret;
+CPUState *cs = opaque;
+LoongArchCPU *cpu = LOONGARCH_CPU(cs);
+
+if (running) {
+ret = kvm_larch_putq(cs, KVM_REG_LOONGARCH_COUNTER,
+ &cpu->kvm_state_counter);
+if (ret < 0) {
+trace_kvm_failed_put_counter(strerror(errno));
+}
+} else {
+ret = kvm_larch_getq(cs, KVM_REG_LOONGARCH_COUNTER,
+ &cpu->kvm_state_counter);
+if (ret < 0) {
+trace_kvm_failed_get_counter(strerror(errno));
+}
+}
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
+qemu_add_vm_change_state_handler(kvm_loongarch_vm_stage_change, cs);
 return 0;
 }
 
diff --git a/target/loongarch/trace-events b/target/loongarch/trace-events
index ceba80121b..f801ad7c76 100644
--- a/target/loongarch/trace-events
+++ b/target/loongarch/trace-events
@@ -9,5 +9,7 @@ kvm_failed_get_fpu(const char *msg) "Failed to get fpu from 
KVM: %s"
 kvm_failed_put_fpu(const char *msg) "Failed to put fpu into KVM: %s"
 kvm_failed_get_mpstate(const char *msg) "Failed to get mp_state from KVM: %s"
 kvm_failed_put_mpstate(const char *msg) "Failed to put mp_state into KVM: %s"
+kvm_failed_get_counter(const char *msg) "Failed to get counter from KVM: %s"
+kvm_failed_put_counter(const char *msg) "Failed to put counter into KVM: %s"
 kvm_failed_get_cpucfg(const char *msg) "Failed to get cpucfg from KVM: %s"
 kvm_failed_put_cpucfg(const char *msg) "Failed to put cpucfg into KVM: %s"
-- 
2.39.1
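
The save/restore pattern of the state-change handler in the patch above can be
modelled without KVM. This is a sketch under stated assumptions: demo_vcpu and
demo_vm_state_change() are hypothetical, and hw_counter merely stands in for
the value that the real code reads and writes through
KVM_REG_LOONGARCH_COUNTER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: hw_counter stands in for the counter held by KVM. */
struct demo_vcpu {
    uint64_t hw_counter;      /* value on the KVM side */
    uint64_t saved_counter;   /* QEMU-side copy, like kvm_state_counter */
};

/* Same shape as the handler: push the saved value on resume, pull on stop. */
static void demo_vm_state_change(struct demo_vcpu *cpu, bool running)
{
    if (running) {
        cpu->hw_counter = cpu->saved_counter;
    } else {
        cpu->saved_counter = cpu->hw_counter;
    }
}
```

In this model, whatever happens to the KVM-side value while the VM is stopped
is overwritten on resume, so the guest continues from the counter value it had
when it was stopped.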

[PATCH RFC v4 5/9] target/loongarch: Implement kvm_arch_init function

2023-10-09 Thread xianglai li

From: Tianrui Zhao 

Implement kvm_arch_init for LoongArch. In this function, the
KVM_CAP_MP_STATE capability is checked via a KVM ioctl.

Cc: "Michael S. Tsirkin" 
Cc: Cornelia Huck 
Cc: Paolo Bonzini 
Cc: "Marc-André Lureau" 
Cc: "Daniel P. Berrangé" 
Cc: Thomas Huth 
Cc: "Philippe Mathieu-Daudé" 
Cc: Richard Henderson 
Cc: Peter Maydell 
Cc: Bibo Mao 
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: Tianrui Zhao 

Signed-off-by: Tianrui Zhao 
Reviewed-by: Richard Henderson 
Signed-off-by: xianglai li 
---
 target/loongarch/kvm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/loongarch/kvm.c b/target/loongarch/kvm.c
index 8fda80b107..5e3bda444e 100644
--- a/target/loongarch/kvm.c
+++ b/target/loongarch/kvm.c
@@ -491,6 +491,7 @@ int kvm_arch_get_default_type(MachineState *ms)

 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
+cap_has_mp_state = kvm_check_extension(s, KVM_CAP_MP_STATE);
 return 0;
 }

-- 
2.39.1

[PATCH RFC v4 9/9] target/loongarch: Add loongarch kvm into meson build

2023-10-09 Thread xianglai li

From: Tianrui Zhao 

Add kvm.c and kvm-stub.c to meson.build: kvm.c is compiled
when KVM is configured, kvm-stub.c otherwise. Also, in the
top-level meson.build, set kvm_targets to loongarch64-softmmu
when the cpu is loongarch64.

Cc: "Michael S. Tsirkin" 
Cc: Cornelia Huck 
Cc: Paolo Bonzini 
Cc: "Marc-André Lureau" 
Cc: "Daniel P. Berrangé" 
Cc: Thomas Huth 
Cc: "Philippe Mathieu-Daudé" 
Cc: Richard Henderson 
Cc: Peter Maydell 
Cc: Bibo Mao 
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: Tianrui Zhao 

Signed-off-by: Tianrui Zhao 
Signed-off-by: xianglai li 
---
 meson.build  | 2 ++
 target/loongarch/meson.build | 1 +
 2 files changed, 3 insertions(+)

diff --git a/meson.build b/meson.build
index 1c71ead833..1f8f3ea136 100644
--- a/meson.build
+++ b/meson.build
@@ -114,6 +114,8 @@ elif cpu in ['riscv32']
   kvm_targets = ['riscv32-softmmu']
 elif cpu in ['riscv64']
   kvm_targets = ['riscv64-softmmu']
+elif cpu in ['loongarch64']
+  kvm_targets = ['loongarch64-softmmu']
 else
   kvm_targets = []
 endif
diff --git a/target/loongarch/meson.build b/target/loongarch/meson.build
index 7fbf045a5d..dc2e452b1c 100644
--- a/target/loongarch/meson.build
+++ b/target/loongarch/meson.build
@@ -27,6 +27,7 @@ loongarch_system_ss.add(files(
 
 common_ss.add(when: 'CONFIG_LOONGARCH_DIS', if_true: [files('disas.c'), gen])
 
+loongarch_ss.add(when: 'CONFIG_KVM', if_true: files('kvm.c'), if_false: 
files('kvm-stub.c'))
 loongarch_ss.add_all(when: 'CONFIG_TCG', if_true: [loongarch_tcg_ss])
 
 target_arch += {'loongarch': loongarch_ss}
-- 
2.39.1

[PATCH RFC v4 8/9] target/loongarch: Implement set vcpu intr for kvm

2023-10-09 Thread xianglai li

From: Tianrui Zhao 

Implement the LoongArch KVM set vcpu interrupt interface.
When an irq is set on a vcpu, we use the KVM_INTERRUPT
ioctl to inject the interrupt into KVM.

Cc: "Michael S. Tsirkin" 
Cc: Cornelia Huck 
Cc: Paolo Bonzini 
Cc: "Marc-André Lureau" 
Cc: "Daniel P. Berrangé" 
Cc: Thomas Huth 
Cc: "Philippe Mathieu-Daudé" 
Cc: Richard Henderson 
Cc: Peter Maydell 
Cc: Bibo Mao 
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: Tianrui Zhao 

Signed-off-by: Tianrui Zhao 
Signed-off-by: xianglai li 
---
 target/loongarch/cpu.c   | 18 +-
 target/loongarch/kvm-stub.c  | 11 +++
 target/loongarch/kvm.c   | 15 +++
 target/loongarch/kvm_loongarch.h | 13 +
 target/loongarch/trace-events|  1 +
 5 files changed, 53 insertions(+), 5 deletions(-)
 create mode 100644 target/loongarch/kvm-stub.c
 create mode 100644 target/loongarch/kvm_loongarch.h

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 61344c7ad2..670612dd0b 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -20,6 +20,11 @@
 #include "sysemu/reset.h"
 #include "tcg/tcg.h"
 #include "vec.h"
+#include "sysemu/kvm.h"
+#include "kvm_loongarch.h"
+#ifdef CONFIG_KVM
+#include 
+#endif
 
 const char * const regnames[32] = {
 "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
@@ -108,12 +113,15 @@ void loongarch_cpu_set_irq(void *opaque, int irq, int 
level)
 return;
 }
 
-env->CSR_ESTAT = deposit64(env->CSR_ESTAT, irq, 1, level != 0);
-
-if (FIELD_EX64(env->CSR_ESTAT, CSR_ESTAT, IS)) {
-cpu_interrupt(cs, CPU_INTERRUPT_HARD);
+if (kvm_enabled()) {
+kvm_loongarch_set_interrupt(cpu, irq, level);
 } else {
-cpu_reset_interrupt(cs, CPU_INTERRUPT_HARD);
+env->CSR_ESTAT = deposit64(env->CSR_ESTAT, irq, 1, level != 0);
+if (FIELD_EX64(env->CSR_ESTAT, CSR_ESTAT, IS)) {
+cpu_interrupt(cs, CPU_INTERRUPT_HARD);
+} else {
+cpu_reset_interrupt(cs, CPU_INTERRUPT_HARD);
+}
 }
 }
 
diff --git a/target/loongarch/kvm-stub.c b/target/loongarch/kvm-stub.c
new file mode 100644
index 00..9965c1f119
--- /dev/null
+++ b/target/loongarch/kvm-stub.c
@@ -0,0 +1,11 @@
+/*
+ * QEMU KVM LoongArch specific function stubs
+ *
+ * Copyright (c) 2023 Loongson Technology Corporation Limited
+ */
+#include "cpu.h"
+
+int kvm_loongarch_set_interrupt(LoongArchCPU *cpu, int irq, int level)
+{
+   g_assert_not_reached();
+}
diff --git a/target/loongarch/kvm.c b/target/loongarch/kvm.c
index 0fe52434ed..df8d5f 100644
--- a/target/loongarch/kvm.c
+++ b/target/loongarch/kvm.c
@@ -574,6 +574,21 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 return ret;
 }
 
+int kvm_loongarch_set_interrupt(LoongArchCPU *cpu, int irq, int level)
+{
+struct kvm_interrupt intr;
+CPUState *cs = CPU(cpu);
+
+if (level) {
+intr.irq = irq;
+} else {
+intr.irq = -irq;
+}
+
+trace_kvm_set_intr(irq, level);
+return kvm_vcpu_ioctl(cs, KVM_INTERRUPT, &intr);
+}
+
 void kvm_arch_accel_class_init(ObjectClass *oc)
 {
 }
diff --git a/target/loongarch/kvm_loongarch.h b/target/loongarch/kvm_loongarch.h
new file mode 100644
index 00..cdef980eec
--- /dev/null
+++ b/target/loongarch/kvm_loongarch.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * QEMU LoongArch kvm interface
+ *
+ * Copyright (c) 2023 Loongson Technology Corporation Limited
+ */
+
+#ifndef QEMU_KVM_LOONGARCH_H
+#define QEMU_KVM_LOONGARCH_H
+
+int  kvm_loongarch_set_interrupt(LoongArchCPU *cpu, int irq, int level);
+
+#endif
diff --git a/target/loongarch/trace-events b/target/loongarch/trace-events
index 6cce653b20..3263406ebe 100644
--- a/target/loongarch/trace-events
+++ b/target/loongarch/trace-events
@@ -14,3 +14,4 @@ kvm_failed_put_counter(const char *msg) "Failed to put 
counter into KVM: %s"
 kvm_failed_get_cpucfg(const char *msg) "Failed to get cpucfg from KVM: %s"
 kvm_failed_put_cpucfg(const char *msg) "Failed to put cpucfg into KVM: %s"
 kvm_arch_handle_exit(int num) "kvm arch handle exit, the reason number: %d"
+kvm_set_intr(int irq, int level) "kvm set interrupt, irq num: %d, level: %d"
-- 
2.39.1
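
The irq encoding that kvm_loongarch_set_interrupt() in the patch above passes
to KVM_INTERRUPT can be shown on its own. This is a sketch: demo_encode_irq()
is a hypothetical helper that only reproduces the sign convention visible in
the patch, without issuing any ioctl.

```c
#include <assert.h>

/*
 * Sketch of the encoding used in the patch: a positive number asks KVM to
 * raise the interrupt, the negated number asks it to clear it. irq 0 cannot
 * express "clear" this way, since -0 == 0.
 */
static int demo_encode_irq(int irq, int level)
{
    return level ? irq : -irq;
}
```

For example, demo_encode_irq(3, 1) yields 3 and demo_encode_irq(3, 0) yields
-3, while irq 0 encodes to 0 for both levels.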

[PATCH RFC v4 4/9] target/loongarch: Implement kvm get/set registers

2023-10-09 Thread xianglai li

From: Tianrui Zhao 

Implement the kvm_arch_get_registers and kvm_arch_put_registers
interfaces. Many registers can be read or written through them,
such as core regs, CSR regs, FPU regs, mp state, etc.

Cc: "Michael S. Tsirkin" 
Cc: Cornelia Huck 
Cc: Paolo Bonzini 
Cc: "Marc-André Lureau" 
Cc: "Daniel P. Berrangé" 
Cc: Thomas Huth 
Cc: "Philippe Mathieu-Daudé" 
Cc: Richard Henderson 
Cc: Peter Maydell 
Cc: Bibo Mao 
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: Tianrui Zhao 

Signed-off-by: Tianrui Zhao 
Signed-off-by: xianglai li 
---
 meson.build   |   1 +
 target/loongarch/cpu.c|   3 +
 target/loongarch/cpu.h|   2 +
 target/loongarch/kvm.c| 406 +-
 target/loongarch/trace-events |  13 ++
 target/loongarch/trace.h  |   1 +
 6 files changed, 424 insertions(+), 2 deletions(-)
 create mode 100644 target/loongarch/trace-events
 create mode 100644 target/loongarch/trace.h

diff --git a/meson.build b/meson.build
index 3bb64b536c..1c71ead833 100644
--- a/meson.build
+++ b/meson.build
@@ -3305,6 +3305,7 @@ if have_system or have_user
 'target/hppa',
 'target/i386',
 'target/i386/kvm',
+'target/loongarch',
 'target/mips/tcg',
 'target/nios2',
 'target/ppc',
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 0d763d8a65..61344c7ad2 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -546,6 +546,9 @@ static void loongarch_cpu_reset_hold(Object *obj)
 #ifndef CONFIG_USER_ONLY
 env->pc = 0x1c00;
 memset(env->tlb, 0, sizeof(env->tlb));
+if (kvm_enabled()) {
+kvm_arch_reset_vcpu(env);
+}
 #endif
 
 restore_fp_status(env);
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index e6a99c83ab..2580dc26e1 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -359,6 +359,7 @@ typedef struct CPUArchState {
 MemoryRegion iocsr_mem;
 bool load_elf;
 uint64_t elf_address;
+uint32_t mp_state;
 /* Store ipistate to access from this struct */
 DeviceState *ipistate;
 #endif
@@ -477,6 +478,7 @@ static inline void cpu_get_tb_cpu_state(CPULoongArchState 
*env, vaddr *pc,
 }
 
 void loongarch_cpu_list(void);
+void kvm_arch_reset_vcpu(CPULoongArchState *env);
 
 #define cpu_list loongarch_cpu_list
 
diff --git a/target/loongarch/kvm.c b/target/loongarch/kvm.c
index 0d67322fd9..8fda80b107 100644
--- a/target/loongarch/kvm.c
+++ b/target/loongarch/kvm.c
@@ -26,19 +26,421 @@
 #include "sysemu/runstate.h"
 #include "cpu-csr.h"
 #include "kvm_loongarch.h"
+#include "trace.h"
 
 static bool cap_has_mp_state;
 const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 KVM_CAP_LAST_INFO
 };
 
+static int kvm_loongarch_get_regs_core(CPUState *cs)
+{
+int ret = 0;
+int i;
+struct kvm_regs regs;
+LoongArchCPU *cpu = LOONGARCH_CPU(cs);
+CPULoongArchState *env = &cpu->env;
+
+/* Get the current register set as KVM sees it */
+ret = kvm_vcpu_ioctl(cs, KVM_GET_REGS, &regs);
+if (ret < 0) {
+trace_kvm_failed_get_regs_core(strerror(errno));
+return ret;
+}
+/* gpr[0] value is always 0 */
+env->gpr[0] = 0;
+for (i = 1; i < 32; i++) {
+env->gpr[i] = regs.gpr[i];
+}
+
+env->pc = regs.pc;
+return ret;
+}
+
+static int kvm_loongarch_put_regs_core(CPUState *cs)
+{
+int ret = 0;
+int i;
+struct kvm_regs regs;
+LoongArchCPU *cpu = LOONGARCH_CPU(cs);
+CPULoongArchState *env = &cpu->env;
+
+/* Set the registers based on QEMU's view of things */
+for (i = 0; i < 32; i++) {
+regs.gpr[i] = env->gpr[i];
+}
+
+regs.pc = env->pc;
+ret = kvm_vcpu_ioctl(cs, KVM_SET_REGS, &regs);
+if (ret < 0) {
+trace_kvm_failed_put_regs_core(strerror(errno));
+}
+
+return ret;
+}
+
+static int kvm_larch_getq(CPUState *cs, uint64_t reg_id,
+ uint64_t *addr)
+{
+struct kvm_one_reg csrreg = {
+.id = reg_id,
+.addr = (uintptr_t)addr
+};
+
+return kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &csrreg);
+}
+
+static int kvm_larch_putq(CPUState *cs, uint64_t reg_id,
+ uint64_t *addr)
+{
+struct kvm_one_reg csrreg = {
+.id = reg_id,
+.addr = (uintptr_t)addr
+};
+
+return kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &csrreg);
+}
+
+#define KVM_GET_ONE_UREG64(cs, ret, regidx, addr) \
+({\
+err = kvm_larch_getq(cs, KVM_IOC_CSRID(regidx), addr);\
+if (err < 0) {\
+ret = err;\
+trace_kvm_failed_get_csr(regidx, strerror(errno));\
+} \
+})
+
+#define KVM_PUT_ONE_UREG64(cs, ret, regidx, addr) \
+({\
+err = kvm_larch_putq

[PATCH RFC v4 3/9] target/loongarch: Supplement vcpu env initial when vcpu reset

2023-10-09 Thread xianglai li

From: Tianrui Zhao 

Supplement the vcpu env initialization on vcpu reset, including
setting CSR_CPUID and CSR_TID to the vcpu's cpu_index. The two
regs will be used in kvm_get/set_csr_ioctl.

Cc: "Michael S. Tsirkin" 
Cc: Cornelia Huck 
Cc: Paolo Bonzini 
Cc: "Marc-André Lureau" 
Cc: "Daniel P. Berrangé" 
Cc: Thomas Huth 
Cc: "Philippe Mathieu-Daudé" 
Cc: Richard Henderson 
Cc: Peter Maydell 
Cc: Bibo Mao 
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: Tianrui Zhao 

Signed-off-by: Tianrui Zhao 
Signed-off-by: xianglai li 
---
 target/loongarch/cpu.c | 2 ++
 target/loongarch/cpu.h | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 2bea7ca5d5..0d763d8a65 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -524,10 +524,12 @@ static void loongarch_cpu_reset_hold(Object *obj)
 
 env->CSR_ESTAT = env->CSR_ESTAT & (~MAKE_64BIT_MASK(0, 2));
 env->CSR_RVACFG = FIELD_DP64(env->CSR_RVACFG, CSR_RVACFG, RBITS, 0);
+env->CSR_CPUID = cs->cpu_index;
 env->CSR_TCFG = FIELD_DP64(env->CSR_TCFG, CSR_TCFG, EN, 0);
 env->CSR_LLBCTL = FIELD_DP64(env->CSR_LLBCTL, CSR_LLBCTL, KLO, 0);
 env->CSR_TLBRERA = FIELD_DP64(env->CSR_TLBRERA, CSR_TLBRERA, ISTLBR, 0);
 env->CSR_MERRCTL = FIELD_DP64(env->CSR_MERRCTL, CSR_MERRCTL, ISMERR, 0);
+env->CSR_TID = cs->cpu_index;
 
 env->CSR_PRCFG3 = FIELD_DP64(env->CSR_PRCFG3, CSR_PRCFG3, TLB_TYPE, 2);
 env->CSR_PRCFG3 = FIELD_DP64(env->CSR_PRCFG3, CSR_PRCFG3, MTLB_ENTRY, 63);
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 40e70a8119..e6a99c83ab 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -318,6 +318,7 @@ typedef struct CPUArchState {
 uint64_t CSR_PWCH;
 uint64_t CSR_STLBPS;
 uint64_t CSR_RVACFG;
+uint64_t CSR_CPUID;
 uint64_t CSR_PRCFG1;
 uint64_t CSR_PRCFG2;
 uint64_t CSR_PRCFG3;
@@ -349,7 +350,6 @@ typedef struct CPUArchState {
 uint64_t CSR_DBG;
 uint64_t CSR_DERA;
 uint64_t CSR_DSAVE;
-uint64_t CSR_CPUID;
 
 #ifndef CONFIG_USER_ONLY
 LoongArchTLB  tlb[LOONGARCH_TLB_MAX];
-- 
2.39.1

[PATCH RFC v4 7/9] target/loongarch: Implement kvm_arch_handle_exit

2023-10-09 Thread xianglai li

From: Tianrui Zhao 

Implement kvm_arch_handle_exit for LoongArch. In this
function, KVM_EXIT_LOONGARCH_IOCSR is handled:
we read or write the iocsr address space according to the addr,
length and is_write fields in kvm_run.

Cc: "Michael S. Tsirkin" 
Cc: Cornelia Huck 
Cc: Paolo Bonzini 
Cc: "Marc-André Lureau" 
Cc: "Daniel P. Berrangé" 
Cc: Thomas Huth 
Cc: "Philippe Mathieu-Daudé" 
Cc: Richard Henderson 
Cc: Peter Maydell 
Cc: Bibo Mao 
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: Tianrui Zhao 

Signed-off-by: Tianrui Zhao 
Reviewed-by: Richard Henderson 
Signed-off-by: xianglai li 
---
 target/loongarch/kvm.c| 24 +++-
 target/loongarch/trace-events |  1 +
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/target/loongarch/kvm.c b/target/loongarch/kvm.c
index 9754478e34..0fe52434ed 100644
--- a/target/loongarch/kvm.c
+++ b/target/loongarch/kvm.c
@@ -549,7 +549,29 @@ bool kvm_arch_cpu_check_are_resettable(void)
 
 int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 {
-return 0;
+int ret = 0;
+LoongArchCPU *cpu = LOONGARCH_CPU(cs);
+CPULoongArchState *env = &cpu->env;
+MemTxAttrs attrs = {};
+
+attrs.requester_id = env_cpu(env)->cpu_index;
+
+trace_kvm_arch_handle_exit(run->exit_reason);
+switch (run->exit_reason) {
+case KVM_EXIT_LOONGARCH_IOCSR:
+address_space_rw(&env->address_space_iocsr,
+ run->iocsr_io.phys_addr,
+ attrs,
+ run->iocsr_io.data,
+ run->iocsr_io.len,
+ run->iocsr_io.is_write);
+break;
+default:
+ret = -1;
+warn_report("KVM: unknown exit reason %d", run->exit_reason);
+break;
+}
+return ret;
 }
 
 void kvm_arch_accel_class_init(ObjectClass *oc)
diff --git a/target/loongarch/trace-events b/target/loongarch/trace-events
index f801ad7c76..6cce653b20 100644
--- a/target/loongarch/trace-events
+++ b/target/loongarch/trace-events
@@ -13,3 +13,4 @@ kvm_failed_get_counter(const char *msg) "Failed to get counter from KVM: %s"
 kvm_failed_put_counter(const char *msg) "Failed to put counter into KVM: %s"
 kvm_failed_get_cpucfg(const char *msg) "Failed to get cpucfg from KVM: %s"
 kvm_failed_put_cpucfg(const char *msg) "Failed to put cpucfg into KVM: %s"
+kvm_arch_handle_exit(int num) "kvm arch handle exit, the reason number: %d"
-- 
2.39.1

[PATCH RFC v4 1/9] linux-headers: Add KVM headers for loongarch

2023-10-09 Thread xianglai li

From: Tianrui Zhao 

This patch is only a placeholder for now; it shows the
KVM structures and macros to reviewers. It will be
replaced by the output of update-linux-headers.sh once
the Linux LoongArch KVM patches are accepted.

Cc: "Michael S. Tsirkin" 
Cc: Cornelia Huck 
Cc: Paolo Bonzini 
Cc: "Marc-André Lureau" 
Cc: "Daniel P. Berrangé" 
Cc: Thomas Huth 
Cc: "Philippe Mathieu-Daudé" 
Cc: Richard Henderson 
Cc: Peter Maydell 
Cc: Bibo Mao 
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: Tianrui Zhao 

Signed-off-by: Tianrui Zhao 
Signed-off-by: xianglai li 
---
 linux-headers/asm-loongarch/kvm.h | 100 ++
 linux-headers/linux/kvm.h |   9 +++
 2 files changed, 109 insertions(+)
 create mode 100644 linux-headers/asm-loongarch/kvm.h

diff --git a/linux-headers/asm-loongarch/kvm.h 
b/linux-headers/asm-loongarch/kvm.h
new file mode 100644
index 00..5e72b83372
--- /dev/null
+++ b/linux-headers/asm-loongarch/kvm.h
@@ -0,0 +1,100 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright (C) 2023 Loongson Technology Corporation Limited
+ */
+
+#ifndef __UAPI_ASM_LOONGARCH_KVM_H
+#define __UAPI_ASM_LOONGARCH_KVM_H
+
+#include 
+
+/*
+ * KVM Loongarch specific structures and definitions.
+ */
+
+#define __KVM_HAVE_READONLY_MEM
+
+#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+
+/*
+ * for KVM_GET_REGS and KVM_SET_REGS
+ */
+struct kvm_regs {
+   /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
+   __u64 gpr[32];
+   __u64 pc;
+};
+
+/*
+ * for KVM_GET_FPU and KVM_SET_FPU
+ */
+struct kvm_fpu {
+   __u32 fcsr;
+   __u32 none;
+   __u64 fcc;    /* 8x8 */
+   struct kvm_fpureg {
+   __u64 val64[4];
+   } fpr[32];
+};
+
+/*
+ * For LoongArch, we use KVM_SET_ONE_REG and KVM_GET_ONE_REG to access various
+ * registers.  The id field is broken down as follows:
+ *
+ *  bits[63..52] - As per linux/kvm.h
+ *  bits[51..32] - Must be zero.
+ *  bits[31..16] - Register set.
+ *
+ * Register set = 0: GP registers from kvm_regs (see definitions below).
+ *
+ * Register set = 1: CSR registers.
+ *
+ * Register set = 2: KVM specific registers (see definitions below).
+ *
+ * Register set = 3: FPU / SIMD registers (see definitions below).
+ *
+ * Other sets registers may be added in the future.  Each set would
+ * have its own identifier in bits[31..16].
+ */
+
+#define KVM_REG_LOONGARCH_GP   (KVM_REG_LOONGARCH | 0x0ULL)
+#define KVM_REG_LOONGARCH_CSR  (KVM_REG_LOONGARCH | 0x1ULL)
+#define KVM_REG_LOONGARCH_KVM  (KVM_REG_LOONGARCH | 0x2ULL)
+#define KVM_REG_LOONGARCH_FPU  (KVM_REG_LOONGARCH | 0x3ULL)
+#define KVM_REG_LOONGARCH_CPUCFG   (KVM_REG_LOONGARCH | 0x4ULL)
+#define KVM_REG_LOONGARCH_MASK (KVM_REG_LOONGARCH | 0x7ULL)
+#define KVM_CSR_IDX_MASK   0x7fff
+#define KVM_CPUCFG_IDX_MASK    0x7fff
+
+/*
+ * KVM_REG_LOONGARCH_KVM - KVM specific control registers.
+ */
+
+#define KVM_REG_LOONGARCH_COUNTER  (KVM_REG_LOONGARCH_KVM | KVM_REG_SIZE_U64 | 3)
+#define KVM_REG_LOONGARCH_VCPU_RESET   (KVM_REG_LOONGARCH_KVM | KVM_REG_SIZE_U64 | 4)
+
+#define LOONGARCH_REG_SHIFT 3
+#define LOONGARCH_REG_64(TYPE, REG) (TYPE | KVM_REG_SIZE_U64 | (REG << LOONGARCH_REG_SHIFT))
+#define KVM_IOC_CSRID(REG) LOONGARCH_REG_64(KVM_REG_LOONGARCH_CSR, REG)
+#define KVM_IOC_CPUCFG(REG) LOONGARCH_REG_64(KVM_REG_LOONGARCH_CPUCFG, REG)
+
+struct kvm_debug_exit_arch {
+};
+
+/* for KVM_SET_GUEST_DEBUG */
+struct kvm_guest_debug_arch {
+};
+
+/* definition of registers in kvm_run */
+struct kvm_sync_regs {
+};
+
+/* dummy definition */
+struct kvm_sregs {
+};
+
+#define KVM_NR_IRQCHIPS 1
+#define KVM_IRQCHIP_NUM_PINS   64
+#define KVM_MAX_CORES  256
+
+#endif /* __UAPI_ASM_LOONGARCH_KVM_H */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 0d74ee999a..0e378bbcbf 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -264,6 +264,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_RISCV_SBI 35
 #define KVM_EXIT_RISCV_CSR 36
 #define KVM_EXIT_NOTIFY   37
+#define KVM_EXIT_LOONGARCH_IOCSR  38
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -336,6 +337,13 @@ struct kvm_run {
__u32 len;
__u8  is_write;
} mmio;
+   /* KVM_EXIT_LOONGARCH_IOCSR */
+   struct {
+   __u64 phys_addr;
+   __u8  data[8];
+   __u32 len;
+   __u8  is_write;
+   } iocsr_io;
/* KVM_EXIT_HYPERCALL */
struct {
__u64 nr;
@@ -1358,6 +1366,7 @@ struct kvm_dirty_tlb {
 #define KVM_REG_ARM64  0x6000ULL
 #define KVM_REG_MIPS   0x7000ULL
 #define KVM_REG_RISCV  0x8000ULL
+#defin

[PATCH RFC v4 0/9] Add loongarch kvm accel support

2023-10-09 Thread xianglai li

This series adds LoongArch KVM support. It mainly implements
the interfaces used by KVM, such as kvm_arch_get/set_regs,
kvm_arch_handle_exit, kvm_loongarch_set_interrupt, etc.

Currently, we are able to boot LoongArch KVM Linux guests.
In a LoongArch VM, MMIO devices and IOCSR devices such as the
APIC, IPI and PCI devices are emulated in user space; other
hardware such as the MMU, timer and CSRs is emulated in the kernel.

It is based on the not-yet-accepted Linux KVM patches:
https://github.com/loongson/linux-loongarch-kvm
We will keep the RFC flag until the Linux KVM patches
are merged.

The running environment of the LoongArch virt machine:
1. Get the Linux source from the link above.
   git checkout kvm-loongarch
   make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- loongson3_defconfig
   make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu-
2. Get the QEMU source: https://github.com/loongson/qemu
   git checkout kvm-loongarch
   ./configure --target-list="loongarch64-softmmu" --enable-kvm
   make
3. Get the UEFI BIOS of the LoongArch virt machine:
   Link: https://github.com/tianocore/edk2-platforms/tree/master/Platform/Loongson/LoongArchQemuPkg#readme
4. Alternatively, use the binary files we have already built:
   https://github.com/yangxiaojuan-loongson/qemu-binary

The command to boot loongarch virt machine:
   $ qemu-system-loongarch64 -machine virt -m 4G -cpu la464 \
   -smp 1 -bios QEMU_EFI.fd -kernel vmlinuz.efi -initrd ramdisk \
   -serial stdio   -monitor telnet:localhost:4495,server,nowait \
   -append "root=/dev/ram rdinit=/sbin/init console=ttyS0,115200" \
   --nographic

Changes for RFC v4:
1. Add the function interfaces kvm_loongarch_get_cpucfg and
kvm_loongarch_put_cpucfg for passing the value of vcpu cfg to KVM.
Move the macro definition KVM_IOC_CSRID from kvm.c to kvm.h.
2. Delete the duplicate CSR_CPUID field in CPUArchState.
3. Add the kvm_arch_get_default_type function in kvm.c.
4. Disable LSX, LASX and LBT in cpucfg2 in KVM.

Changes for RFC v3:
1. Move the code that initializes mp_state to KVM_MP_STATE_RUNNABLE
into kvm.c.
2. Fix some non-standard code in kvm_get/set_regs_ioctl and elsewhere:
sort loongarch to keep alphabetical ordering in meson.build, keep gpr[0]
always 0, remove unnecessary inline statements, etc.
3. Rename the counter_value variable to kvm_state_counter in cpu_env,
and add comments to explain its meaning.

Changes for RFC v2:
1. Mark the "Add KVM headers for loongarch" patch as a placeholder,
as we will use the update-linux-headers.sh to generate the kvm headers
when the linux loongarch KVM patch series are accepted.
2. Remove the DPRINTF macro in kvm.c and replace it with trace
events; we add trace functions such as trace_kvm_handle_exit,
trace_kvm_set_intr, trace_kvm_failed_get_csr, etc.
3. Remove the unused functions in kvm_stub.c and move stub function into
the suitable patch.

Cc: "Michael S. Tsirkin" 
Cc: Cornelia Huck 
Cc: Paolo Bonzini 
Cc: "Marc-André Lureau" 
Cc: "Daniel P. Berrangé" 
Cc: Thomas Huth 
Cc: "Philippe Mathieu-Daudé" 
Cc: Richard Henderson 
Cc: Peter Maydell 
Cc: Bibo Mao 
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: Tianrui Zhao 

Tianrui Zhao (9):
  linux-headers: Add KVM headers for loongarch
  target/loongarch: Define some kvm_arch interfaces
  target/loongarch: Supplement vcpu env initial when vcpu reset
  target/loongarch: Implement kvm get/set registers
  target/loongarch: Implement kvm_arch_init function
  target/loongarch: Implement kvm_arch_init_vcpu
  target/loongarch: Implement kvm_arch_handle_exit
  target/loongarch: Implement set vcpu intr for kvm
  target/loongarch: Add loongarch kvm into meson build

 linux-headers/asm-loongarch/kvm.h | 100 +
 linux-headers/linux/kvm.h |   9 +
 meson.build   |   3 +
 target/loongarch/cpu.c|  23 +-
 target/loongarch/cpu.h|   6 +-
 target/loongarch/kvm-stub.c   |  11 +
 target/loongarch/kvm.c| 594 ++
 target/loongarch/kvm_loongarch.h  |  13 +
 target/loongarch/meson.build  |   1 +
 target/loongarch/trace-events |  17 +
 target/loongarch/trace.h  |   1 +
 11 files changed, 772 insertions(+), 6 deletions(-)
 create mode 100644 linux-headers/asm-loongarch/kvm.h
 create mode 100644 target/loongarch/kvm-stub.c
 create mode 100644 target/loongarch/kvm.c
 create mode 100644 target/loongarch/kvm_loongarch.h
 create mode 100644 target/loongarch/trace-events
 create mode 100644 target/loongarch/trace.h

-- 
2.39.1

Re: [PATCH 1/3] via-ide: Fix legacy mode emulation

2023-10-09 Thread Bernhard Beschow

On 8 October 2023 11:08:58 UTC, BALATON Zoltan wrote:
>On Sun, 8 Oct 2023, Mark Cave-Ayland wrote:
>> On 05/10/2023 23:13, BALATON Zoltan wrote:
>> 
>>> The initial values for the BARs were set in the reset method to emulate
>>> legacy mode at start, but this does not work because the PCI code resets
>>> the BARs after calling the device reset method.
>> 
>> This is certainly something I've noticed when testing previous versions of 
>> the VIA patches. Perhaps it's worth a separate thread to the PCI devs?
>
>I think I brought this up back then but was told the current PCI code won't 
>change, and since changing it could break everything else, that makes sense. 
>So this is something we should take as given and accommodate.

Why not play it safe, like this:
1. add a class property such as `reset_bar_addrs[PCI_NUM_REGIONS]`
2. set all elements to zero in `pci_device_class_init()`
3. respect `reset_bar_addrs` in `pci_reset_regions()`
4. assign the proper reset addresses of TYPE_VIA_IDE in `via_ide_class_init()`

That would pretty obviously preserve the behavior of existing device models 
while allowing TYPE_VIA_IDE to be reset properly. It would also perform the 
main part of the workaround in the code that exhibits the limitation, so the 
code could potentially be simplified at some point without impacting all PCI 
device models.
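
The four steps above could look roughly like this. It is only a sketch: the Demo* types and demo_* functions are hypothetical, heavily simplified stand-ins for PCIDeviceClass, PCIDevice, pci_device_class_init(), pci_reset_regions() and via_ide_class_init(), and the addresses are just the legacy ports from the lines the patch removes with the I/O space bit added for illustration.

```c
#include <stdint.h>
#include <string.h>

#define PCI_NUM_REGIONS 7            /* six BARs plus the expansion ROM */
#define PCI_BASE_ADDRESS_SPACE_IO 1  /* bit 0 set marks an I/O BAR */

/* Step 1: a per-class table of BAR values to apply on reset. */
typedef struct {
    uint32_t reset_bar_addrs[PCI_NUM_REGIONS];
} DemoPCIDeviceClass;

typedef struct {
    const DemoPCIDeviceClass *klass;
    uint32_t bar[PCI_NUM_REGIONS];
} DemoPCIDevice;

/* Step 2: the base class zeroes the table, so existing models are unchanged. */
static void demo_pci_device_class_init(DemoPCIDeviceClass *k)
{
    memset(k->reset_bar_addrs, 0, sizeof(k->reset_bar_addrs));
}

/* Step 3: the generic reset code honours the table instead of forcing zero. */
static void demo_pci_reset_regions(DemoPCIDevice *d)
{
    for (int i = 0; i < PCI_NUM_REGIONS; i++) {
        d->bar[i] = d->klass->reset_bar_addrs[i];
    }
}

/* Step 4: only the VIA IDE class overrides the defaults (values illustrative). */
static void demo_via_ide_class_init(DemoPCIDeviceClass *k)
{
    demo_pci_device_class_init(k);
    k->reset_bar_addrs[0] = 0x1f0 | PCI_BASE_ADDRESS_SPACE_IO;
    k->reset_bar_addrs[1] = 0x3f4 | PCI_BASE_ADDRESS_SPACE_IO;
    k->reset_bar_addrs[2] = 0x170 | PCI_BASE_ADDRESS_SPACE_IO;
    k->reset_bar_addrs[3] = 0x374 | PCI_BASE_ADDRESS_SPACE_IO;
}
```

A device whose class never touches the table still resets all BARs to zero, so the change would be invisible to every other model.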

Best regards,
Bernhard

>
>>> Additionally the values
>>> written to BARs were also wrong.
>> 
>> I don't believe this is correct: according to the datasheet the values on 
>> reset are the ones given in the current reset code, so even if the reset 
>> function is overridden at a later date during PCI bus reset, I would leave 
>> these for now since it is a different issue.
>
>Those values are missing the IO space bit for one so they can't be correct as 
>a BAR value no matter what the datasheet says. And since they are ineffective 
>now I think it's best to remove them to avoid confusion.
>
>>> Move setting the BARs to a callback on writing the PCI config register
>>> that sets the compatibility mode (which firmwares needing this mode
>>> seem to do) and fix their values to program it to use legacy port
>>> numbers. As noted in a comment, we only do this when the BARs were
>>> unset before, because logs from real machine show this is how real
>>> chip works, even if it contradicts the data sheet which is not very
>>> clear about this.
>>> 
>>> Signed-off-by: BALATON Zoltan 
>>> ---
>>>   hw/ide/via.c | 35 ++-
>>>   1 file changed, 30 insertions(+), 5 deletions(-)
>>> 
>>> diff --git a/hw/ide/via.c b/hw/ide/via.c
>>> index fff23803a6..8186190207 100644
>>> --- a/hw/ide/via.c
>>> +++ b/hw/ide/via.c
>>> @@ -132,11 +132,6 @@ static void via_ide_reset(DeviceState *dev)
>>>   pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_FAST_BACK |
>>>PCI_STATUS_DEVSEL_MEDIUM);
>>>   -pci_set_long(pci_conf + PCI_BASE_ADDRESS_0, 0x01f0);
>>> -pci_set_long(pci_conf + PCI_BASE_ADDRESS_1, 0x03f4);
>>> -pci_set_long(pci_conf + PCI_BASE_ADDRESS_2, 0x0170);
>>> -pci_set_long(pci_conf + PCI_BASE_ADDRESS_3, 0x0374);
>>> -pci_set_long(pci_conf + PCI_BASE_ADDRESS_4, 0xcc01); /* BMIBA: 
>>> 20-23h */
>>>   pci_set_long(pci_conf + PCI_INTERRUPT_LINE, 0x010e);
>>> /* IDE chip enable, IDE configuration 1/2, IDE FIFO Configuration*/
>>> @@ -159,6 +154,35 @@ static void via_ide_reset(DeviceState *dev)
>>>   pci_set_long(pci_conf + 0xc0, 0x00020001);
>>>   }
>>>   +static void via_ide_cfg_write(PCIDevice *pd, uint32_t addr,
>>> +  uint32_t val, int len)
>>> +{
>>> +pci_default_write_config(pd, addr, val, len);
>>> +/*
>>> + * Only set BARs if they are unset. Logs from real hardware show that
>>> + * writing class_prog to enable compatibility mode after BARs were set
>>> + * (possibly by firmware) it will use ports set by BARs not ISA ports
>>> + * (e.g. pegasos2 Linux does this and calls it non-100% native mode).
>> 
>> Can you remind me again where the references are to non-100% native mode? 
>> The only thing I can find in Linux is 
>> https://github.com/torvalds/linux/blob/master/arch/powerpc/platforms/chrp/pci.c#L360
>>  but that simply forces a switch to legacy mode, with no mention of 
>> "non-100% native mode".
>
>It was discussed somewhere in the via-ide thread we had when this was last 
>touched for pegasos2 in March 2020. Basically the non-100% native mode is when 
>ports are set by BARs but IRQs are still hard coded to 14-15. Linux can work 
>with all 3 possible modes: legacy (both ports and IRQs are hard coded to ISA 
>values), native (using BARs and PCI config 0x3c for a single interrupt for 
>both channels, vt82c686 data sheet does not document this but vt8231 has a 
>comment saying native mode only) and non-100% native mode where BARs are 
>effective to set port addresses but IRQs don't respect 0x3c but use 14-15 as 
>in legacy mode. Some machines only work in non-100%

Re: [Virtio-fs] (no subject)

2023-10-09 Thread Hanna Czenczek


On 09.10.23 10:21, Hanna Czenczek wrote:

On 07.10.23 04:22, Yajun Wu wrote:


[...]

The main motivation of adding VHOST_USER_SET_STATUS is to let the backend 
DPDK know when the DRIVER_OK bit is valid. It's an indication that all VQ 
configuration has been sent; otherwise DPDK has to rely on the first queue 
pair being ready, then receiving/applying VQ configuration one by one.

During live migration, configuring VQs one by one is very time consuming.


One question I have here is why it wasn’t then introduced in the live 
migration code, but in the general VM stop/cont code instead. It does 
seem time-consuming to do this every time the VM is paused and resumed.



For VIRTIO net vDPA, HW needs to know how many VQs are enabled to set 
RSS (Receive-Side Scaling).


If you don’t want the SET_STATUS message, the backend can remove the protocol 
feature bit VHOST_USER_PROTOCOL_F_STATUS.


The problem isn’t back-ends that don’t want the message, the problem 
is that qemu uses the message wrongly, which prevents well-behaving 
back-ends from implementing the message.


DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device 
close/reset.


So the right thing to do for back-ends is to announce STATUS support 
and then not implement it correctly?


GET_VRING_BASE should not close or reset the device, by the way.  It 
should stop that one vring, not more.  We have a RESET_DEVICE 
command for resetting.


I'm not involved in the discussion about adding SET_STATUS to the vhost 
protocol. This feature is essential for vDPA (the same way vhost-vdpa 
implements VHOST_VDPA_SET_STATUS).


So what I gather from your response is that there is only a 
single use for SET_STATUS, which is the DRIVER_OK bit.  If so, 
documenting that all other bits are to be ignored by both back-end and 
front-end would be fine by me.


I’m not fully serious about that suggestion, but I hear the strong 
implication that nothing but DRIVER_OK was of any concern, and this is 
really important to note when we talk about the status of the STATUS 
feature in vhost today.  It seems to me now that it was not intended 
to be the virtio-level status byte, but just a DRIVER_OK signalling 
path from front-end to back-end.  That makes it a vhost-level protocol 
feature to me.
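

To make that narrow reading concrete, here is a minimal sketch of a back-end that looks at nothing but DRIVER_OK in a SET_STATUS payload. The bit values are the device status bits from the virtio specification; demo_status_signals_driver_ok() is a hypothetical helper and not code from any existing back-end.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device status bits as defined by the virtio specification. */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE 1
#define VIRTIO_CONFIG_S_DRIVER      2
#define VIRTIO_CONFIG_S_DRIVER_OK   4
#define VIRTIO_CONFIG_S_FEATURES_OK 8

/*
 * Hypothetical handler for the narrow interpretation: report whether the
 * front-end signals that all VQ configuration has been sent, and ignore
 * every other bit of the status byte.
 */
static bool demo_status_signals_driver_ok(uint8_t status)
{
    return (status & VIRTIO_CONFIG_S_DRIVER_OK) != 0;
}
```

Under this interpretation a status of 0 carries no reset meaning at all, which is exactly why it could not double as the guest-visible virtio status byte.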


On second thought, it just is a pure vhost-level protocol feature, and 
has nothing to do with the virtio status byte as-is.  The only stated 
purpose is for the front-end to send DRIVER_OK after migration, but 
migration is transparent to the guest, so the guest would never change 
the status byte during migration.  Therefore, if this feature is 
essential, we will never be able to have a status byte that is 
transparently shared between guest and back-end device, i.e. the virtio 
status byte.


Cc-ing Alex on this mail, because to me, this seems like an important 
detail when he plans on using the byte in the future.  If we need a 
virtio status byte, I can’t see how we could use the existing F_STATUS 
for it.


Hanna

[PATCH] cpus: Remove unused smp_cores/smp_threads declarations

2023-10-09 Thread Philippe Mathieu-Daudé

Commit a5e0b33119 ("vl.c: Replace smp global variables
with smp machine properties") removed the last uses of
the smp_cores / smp_threads variables but forgot to
remove their declarations. Do it now.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/sysemu/cpus.h | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index 0535a4c68a..b4a566cfe7 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -50,11 +50,4 @@ void cpu_synchronize_all_post_reset(void);
 void cpu_synchronize_all_post_init(void);
 void cpu_synchronize_all_pre_loadvm(void);
 
-#ifndef CONFIG_USER_ONLY
-/* vl.c */
-/* *-user doesn't have configurable SMP topology */
-extern int smp_cores;
-extern int smp_threads;
-#endif
-
 #endif
-- 
2.41.0

[PATCH v5 02/15] linux-headers: Add iommufd.h

2023-10-09 Thread Eric Auger

From: Zhenzhong Duan 

Since commit da3c22c74a3c ("linux-headers: Update to Linux v6.6-rc1"),
linux-headers has been updated to v6.6-rc1.

As the previous patch added iommufd.h to update-linux-headers.sh,
run the script again against tag v6.6-rc1 to have iommufd.h included.

Signed-off-by: Zhenzhong Duan 
Reviewed-by: Cédric Le Goater 
---
 linux-headers/linux/iommufd.h | 444 ++
 1 file changed, 444 insertions(+)
 create mode 100644 linux-headers/linux/iommufd.h

diff --git a/linux-headers/linux/iommufd.h b/linux-headers/linux/iommufd.h
new file mode 100644
index 00..218bf7ac98
--- /dev/null
+++ b/linux-headers/linux/iommufd.h
@@ -0,0 +1,444 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES.
+ */
+#ifndef _IOMMUFD_H
+#define _IOMMUFD_H
+
+#include 
+#include 
+
+#define IOMMUFD_TYPE (';')
+
+/**
+ * DOC: General ioctl format
+ *
+ * The ioctl interface follows a general format to allow for extensibility. 
Each
+ * ioctl is passed in a structure pointer as the argument providing the size of
+ * the structure in the first u32. The kernel checks that any structure space
+ * beyond what it understands is 0. This allows userspace to use the backward
+ * compatible portion while consistently using the newer, larger, structures.
+ *
+ * ioctls use a standard meaning for common errnos:
+ *
+ *  - ENOTTY: The IOCTL number itself is not supported at all
+ *  - E2BIG: The IOCTL number is supported, but the provided structure has
+ *non-zero in a part the kernel does not understand.
+ *  - EOPNOTSUPP: The IOCTL number is supported, and the structure is
+ *understood, however a known field has a value the kernel does not
+ *understand or support.
+ *  - EINVAL: Everything about the IOCTL was understood, but a field is not
+ *correct.
+ *  - ENOENT: An ID or IOVA provided does not exist.
+ *  - ENOMEM: Out of memory.
+ *  - EOVERFLOW: Mathematics overflowed.
+ *
+ * As well as additional errnos, within specific ioctls.
+ */
+enum {
+   IOMMUFD_CMD_BASE = 0x80,
+   IOMMUFD_CMD_DESTROY = IOMMUFD_CMD_BASE,
+   IOMMUFD_CMD_IOAS_ALLOC,
+   IOMMUFD_CMD_IOAS_ALLOW_IOVAS,
+   IOMMUFD_CMD_IOAS_COPY,
+   IOMMUFD_CMD_IOAS_IOVA_RANGES,
+   IOMMUFD_CMD_IOAS_MAP,
+   IOMMUFD_CMD_IOAS_UNMAP,
+   IOMMUFD_CMD_OPTION,
+   IOMMUFD_CMD_VFIO_IOAS,
+   IOMMUFD_CMD_HWPT_ALLOC,
+   IOMMUFD_CMD_GET_HW_INFO,
+};
+
+/**
+ * struct iommu_destroy - ioctl(IOMMU_DESTROY)
+ * @size: sizeof(struct iommu_destroy)
+ * @id: iommufd object ID to destroy. Can be any destroyable object type.
+ *
+ * Destroy any object held within iommufd.
+ */
+struct iommu_destroy {
+   __u32 size;
+   __u32 id;
+};
+#define IOMMU_DESTROY _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DESTROY)
+
+/**
+ * struct iommu_ioas_alloc - ioctl(IOMMU_IOAS_ALLOC)
+ * @size: sizeof(struct iommu_ioas_alloc)
+ * @flags: Must be 0
+ * @out_ioas_id: Output IOAS ID for the allocated object
+ *
+ * Allocate an IO Address Space (IOAS) which holds an IO Virtual Address (IOVA)
+ * to memory mapping.
+ */
+struct iommu_ioas_alloc {
+   __u32 size;
+   __u32 flags;
+   __u32 out_ioas_id;
+};
+#define IOMMU_IOAS_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_ALLOC)
+
+/**
+ * struct iommu_iova_range - ioctl(IOMMU_IOVA_RANGE)
+ * @start: First IOVA
+ * @last: Inclusive last IOVA
+ *
+ * An interval in IOVA space.
+ */
+struct iommu_iova_range {
+   __aligned_u64 start;
+   __aligned_u64 last;
+};
+
+/**
+ * struct iommu_ioas_iova_ranges - ioctl(IOMMU_IOAS_IOVA_RANGES)
+ * @size: sizeof(struct iommu_ioas_iova_ranges)
+ * @ioas_id: IOAS ID to read ranges from
+ * @num_iovas: Input/Output total number of ranges in the IOAS
+ * @__reserved: Must be 0
+ * @allowed_iovas: Pointer to the output array of struct iommu_iova_range
+ * @out_iova_alignment: Minimum alignment required for mapping IOVA
+ *
+ * Query an IOAS for ranges of allowed IOVAs. Mapping IOVA outside these ranges
+ * is not allowed. num_iovas will be set to the total number of iovas and
+ * the allowed_iovas[] will be filled in as space permits.
+ *
+ * The allowed ranges are dependent on the HW path the DMA operation takes, and
+ * can change during the lifetime of the IOAS. A fresh empty IOAS will have a
+ * full range, and each attached device will narrow the ranges based on that
+ * device's HW restrictions. Detaching a device can widen the ranges. Userspace
+ * should query ranges after every attach/detach to know what IOVAs are valid
+ * for mapping.
+ *
+ * On input num_iovas is the length of the allowed_iovas array. On output it is
+ * the total number of iovas filled in. The ioctl will return -EMSGSIZE and set
+ * num_iovas to the required value if num_iovas is too small. In this case the
+ * caller should allocate a larger output array and re-issue the ioctl.
+ *
+ * out_iova_alignment returns the minimum IOVA alignment that can be given
+ * to IOMMU

[PATCH v5 00/15] Prerequisite changes for IOMMUFD support

2023-10-09 Thread Eric Auger

Hi All,

This is the v5 respin of the IOMMUFD prerequisite series.
This applies on top of vfio-next:
https://github.com/legoater/qemu/, branch vfio-next.

Per Cédric's suggestion, the IOMMUFD patchset v1[1] is now split
into two series, this prerequisite series and the new IOMMUFD backend
introduction support series. Hopefully this will ease the review.
  
The main purpose of this series is to make "common.c" group agnostic:
all group-related code is moved into container.c. This prepares for the
next series: abstract base container, adding a new backend, etc.

This series can be found at
https://github.com/eauger/qemu/tree/prereq_v5
previous: https://github.com/eauger/qemu/tree/prereq_v4

Tests done:
- PCI devices were tested
- device hotplug test
- with or without vIOMMU
- VFIO migration with an E800 net card (no dirty sync support) passthrough
- platform and ccw were only compile-tested due to environment limits

Zhenzhong, Yi, Eric

[1] 
https://lore.kernel.org/all/20230830103754.36461-1-zhenzhong.d...@intel.com/t/#u

Changelog:

v5:
- ap: fix missing return
- ccw: remove vbasedev->sysfsdev g_strdup_printf(), remove name local var
- container.c: restored !vbasedev->container check in vfio_detach_device()
- pci.c: removed vbasedev->name deallocation in error path as this is
  handled in instance_finalize function

v4:
- include qemu/error-report.h in helpers.c
- in ap.c, fix the wrongly added
  vfio_detach_device(vbasedev) and g_free(vbasedev->name);
  also added error_prepend
- simplified vbasedev setting in ccw.c
- vfio_detach_device: dropped check on
  !vbasedev->container
- container.c: restore dropped comment

v3:
- rebased on vfio-next as suggested by Cedric
- added vfio/common: Propagate KVM_SET_DEVICE_ATTR error if any
- collected Cedric's R-b
- Fix some error paths in vfio/pci which now properly detach the device
  and also free the vbasedev->name
- Fix vfio/ccw migration (hopefully) [Matthew inputs]
- Split [PATCH v2 11/12] vfio/common: Introduce two kinds of VFIO device lists
  into 3 patches

v2:
- Refine patch description per Eric
- return errno and errp in vfio_kvm_device_[add/del]_fd per Eric
- make memory listener register/deregister in separate patch per Eric
- Include the .h file first per Cédric
- Add trace event in vfio_attach_device per Cédric
- drop the change to vfio_viommu_preset by refactor per Cédric
- Introduce global VFIO device list and per container list per Alex

Note changelog below are from full IOMMUFD series:

v1:
- Alloc hwpt instead of using auto hwpt
- elaborate iommufd code per Nicolin
- consolidate two patches and drop as.c
- typo error fix and function rename

rfcv4:
- rebase on top of v8.0.3
- Add one patch from Yi which is about vfio device add in kvm
- Remove IOAS_COPY optimization and focus on functions in this patchset
- Fix wrong name issue reported and fix suggested by Matthew
- Fix compilation issue reported and fix suggested by Nicolin
- Use query_dirty_bitmap callback to replace get_dirty_bitmap for better
granularity
- Add dev_iter_next() callback to avoid adding so many callback
  at container scope, add VFIODevice.hwpt to support that
- Restore all functions back to common from container whenever possible,
  mainly migration and reset related functions
- Add --enable/disable-iommufd config option, enabled by default in linux
- Remove VFIODevice.hwpt_next as it's redundant with VFIODevice.next
- Adapt new VFIO_DEVICE_PCI_HOT_RESET uAPI for IOMMUFD backed device
- vfio_kvm_device_add/del_group call vfio_kvm_device_add/del_fd to remove
redundant code
- Add FD passing support for vfio device backed by IOMMUFD
- Fix hot unplug resource leak issue in vfio_legacy_detach_device()
- Fix FD leak in vfio_get_devicefd()

rfcv3:
- rebase on top of v7.2.0
- Fix the compilation with CONFIG_IOMMUFD unset by using true classes for
  VFIO backends
- Fix use after free in error path, reported by Alister
- Split common.c in several steps to ease the review

rfcv2:
- remove the first three patches of rfcv1
- add open cdev helper suggested by Jason
- remove the QOMification of the VFIOContainer and simply use standard ops
(David)
- add "-object iommufd" suggested by Alex


Eric Auger (7):
  scripts/update-linux-headers: Add iommufd.h
  vfio/common: Propagate KVM_SET_DEVICE_ATTR error if any
  vfio/common: Introduce vfio_container_add|del_section_window()
  vfio/pci: Introduce vfio_[attach/detach]_device
  vfio/platform: Use vfio_[attach/detach]_device
  vfio/ap: Use vfio_[attach/detach]_device
  vfio/ccw: Use vfio_[attach/detach]_device

Yi Liu (2):
  vfio/common: Move IOMMU agnostic helpers to a separate file
  vfio/common: Move legacy VFIO backend code into separate container.c

Zhenzhong Duan (6):
  linux-headers: Add iommufd.h
  vfio/common: Extract out vfio_kvm_device_[add/del]_fd
  vfio/common: Move VFIO reset handler registration to a group agnostic
function
  vfio/common: Introduce a per container device list
  vfio/common: Store the parent container in VFIODevice

[PATCH v5 04/15] vfio/common: Propagate KVM_SET_DEVICE_ATTR error if any

2023-10-09 Thread Eric Auger

In the VFIO_SPAPR_TCE_v2_IOMMU container case, when
KVM_SET_DEVICE_ATTR fails, we currently don't propagate the
error as we do on the vfio_spapr_create_window() failure
case. Let's align the code. Take the opportunity to
reword the error message and make it more explicit.
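
As a rough illustration of the pattern being aligned here, the sketch below records the error and jumps to a shared failure path instead of reporting and returning in place. DemoError, demo_error_setg_errno() and demo_set_attr() are hypothetical stand-ins for QEMU's Error, error_setg_errno() and the KVM_SET_DEVICE_ATTR ioctl; none of this is the actual hw/vfio/common.c code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, minimal replacement for QEMU's Error object. */
typedef struct {
    char msg[160];
    int set;
} DemoError;

static void demo_error_setg_errno(DemoError *err, int os_errno,
                                  int tablefd, int groupfd)
{
    snprintf(err->msg, sizeof(err->msg),
             "vfio: failed GROUP_SET_SPAPR_TCE for KVM VFIO device %d "
             "and group fd %d: %s", tablefd, groupfd, strerror(os_errno));
    err->set = 1;
}

/* Hypothetical stand-in for the ioctl: fails for negative group fds. */
static int demo_set_attr(int groupfd)
{
    if (groupfd < 0) {
        errno = EBADF;
        return -1;
    }
    return 0;
}

/* Every failure records its error and leaves through the same label. */
static int demo_region_add(const int *groupfds, int n, int tablefd,
                           DemoError *err)
{
    for (int i = 0; i < n; i++) {
        if (demo_set_attr(groupfds[i])) {
            demo_error_setg_errno(err, errno, tablefd, groupfds[i]);
            goto fail;
        }
    }
    return 0;

fail:
    /* The caller decides how to report or propagate *err. */
    return -1;
}
```

Routing both failure cases through one label keeps the reporting policy in a single place.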

Signed-off-by: Eric Auger 
Reviewed-by: Cédric Le Goater 

---
 hw/vfio/common.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4e122fc4e4..c54a72ec80 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -878,11 +878,11 @@ static void vfio_listener_region_add(MemoryListener *listener,
 QLIST_FOREACH(group, &container->group_list, container_next) {
 param.groupfd = group->fd;
 if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
-error_report("vfio: failed to setup fd %d "
- "for a group with fd %d: %s",
- param.tablefd, param.groupfd,
- strerror(errno));
-return;
+error_setg_errno(&err, errno,
+ "vfio: failed GROUP_SET_SPAPR_TCE for "
+ "KVM VFIO device %d and group fd %d",
+ param.tablefd, param.groupfd);
+goto fail;
 }
 trace_vfio_spapr_group_attach(param.groupfd, param.tablefd);
 }
-- 
2.41.0

[PATCH v5 08/15] vfio/platform: Use vfio_[attach/detach]_device

2023-10-09 Thread Eric Auger

Let the vfio-platform device use vfio_attach_device() and
vfio_detach_device(), hence hiding the details of the used
IOMMU backend.

Drop the trace event for vfio-platform as we have a similar
one in vfio_attach_device.
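
The resulting control flow in vfio_base_device_init() can be summarised with the sketch below. It is a simplified model only: DemoDevice and the demo_* helpers are hypothetical stubs standing in for VFIODevice, vfio_attach_device(), vfio_populate_device() and vfio_detach_device().

```c
#include <errno.h>

/* Hypothetical device state, just enough to observe attach/detach. */
typedef struct {
    int attached;
    int populate_should_fail;
} DemoDevice;

static int demo_attach(DemoDevice *d)
{
    d->attached = 1;
    return 0;
}

static void demo_detach(DemoDevice *d)
{
    d->attached = 0;
}

static int demo_populate(DemoDevice *d)
{
    return d->populate_should_fail ? -EINVAL : 0;
}

/* Attach first; if a later step fails, undo the attach before returning. */
static int demo_base_device_init(DemoDevice *d)
{
    int ret = demo_attach(d);
    if (ret) {
        return ret;
    }

    ret = demo_populate(d);
    if (ret) {
        demo_detach(d);
    }
    return ret;
}
```

The caller no longer needs to know about groups at all; the only invariant it keeps is that a failed init leaves the device detached.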

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Zhenzhong Duan 
Reviewed-by: Cédric Le Goater 
---
 hw/vfio/platform.c   | 43 +++
 hw/vfio/trace-events |  1 -
 2 files changed, 3 insertions(+), 41 deletions(-)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 5af73f9287..8e3d4ac458 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -529,12 +529,7 @@ static VFIODeviceOps vfio_platform_ops = {
  */
 static int vfio_base_device_init(VFIODevice *vbasedev, Error **errp)
 {
-VFIOGroup *group;
-VFIODevice *vbasedev_iter;
-char *tmp, group_path[PATH_MAX], *group_name;
-ssize_t len;
 struct stat st;
-int groupid;
 int ret;
 
 /* @sysfsdev takes precedence over @host */
@@ -557,47 +552,15 @@ static int vfio_base_device_init(VFIODevice *vbasedev, 
Error **errp)
 return -errno;
 }
 
-tmp = g_strdup_printf("%s/iommu_group", vbasedev->sysfsdev);
-len = readlink(tmp, group_path, sizeof(group_path));
-g_free(tmp);
-
-if (len < 0 || len >= sizeof(group_path)) {
-ret = len < 0 ? -errno : -ENAMETOOLONG;
-error_setg_errno(errp, -ret, "no iommu_group found");
-return ret;
-}
-
-group_path[len] = 0;
-
-group_name = basename(group_path);
-if (sscanf(group_name, "%d", &groupid) != 1) {
-error_setg_errno(errp, errno, "failed to read %s", group_path);
-return -errno;
-}
-
-trace_vfio_platform_base_device_init(vbasedev->name, groupid);
-
-group = vfio_get_group(groupid, &address_space_memory, errp);
-if (!group) {
-return -ENOENT;
-}
-
-QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
-if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
-error_setg(errp, "device is already attached");
-vfio_put_group(group);
-return -EBUSY;
-}
-}
-ret = vfio_get_device(group, vbasedev->name, vbasedev, errp);
+ret = vfio_attach_device(vbasedev->name, vbasedev,
+ &address_space_memory, errp);
 if (ret) {
-vfio_put_group(group);
 return ret;
 }
 
 ret = vfio_populate_device(vbasedev, errp);
 if (ret) {
-vfio_put_group(group);
+vfio_detach_device(vbasedev);
 }
 
 return ret;
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 8ac13eb106..0eb2387cf2 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -121,7 +121,6 @@ vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size
 vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
 
 # platform.c
-vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
 vfio_platform_realize(char *name, char *compat) "vfio device %s, compat = %s"
 vfio_platform_eoi(int pin, int fd) "EOI IRQ pin %d (fd=%d)"
 vfio_platform_intp_mmap_enable(int pin) "IRQ #%d still active, stay in slow path"
-- 
2.41.0

[PATCH v5 03/15] vfio/common: Move IOMMU agnostic helpers to a separate file

2023-10-09 Thread Eric Auger

From: Yi Liu 

Move low-level IOMMU-agnostic helpers to a separate helpers.c
file. They relate to regions, interrupts, and device/region
capabilities, among others.

Signed-off-by: Eric Auger 
Signed-off-by: Yi Sun 
Signed-off-by: Yi Liu 
Signed-off-by: Zhenzhong Duan 
Reviewed-by: Cédric Le Goater 

---
v3 -> v4:
- added #include "qemu/error-report.h"
---
 include/hw/vfio/vfio-common.h |   9 +
 hw/vfio/common.c  | 588 
 hw/vfio/helpers.c | 612 ++
 hw/vfio/meson.build   |   1 +
 4 files changed, 622 insertions(+), 588 deletions(-)
 create mode 100644 hw/vfio/helpers.c

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index e9b8954595..e0483893d1 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -196,6 +196,12 @@ typedef struct VFIODisplay {
 } dmabuf;
 } VFIODisplay;
 
+typedef struct {
+unsigned long *bitmap;
+hwaddr size;
+hwaddr pages;
+} VFIOBitmap;
+
 void vfio_put_base_device(VFIODevice *vbasedev);
 void vfio_disable_irqindex(VFIODevice *vbasedev, int index);
 void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index);
@@ -245,6 +251,8 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
  unsigned int *avail);
 struct vfio_info_cap_header *
 vfio_get_device_info_cap(struct vfio_device_info *info, uint16_t id);
+struct vfio_info_cap_header *
+vfio_get_cap(void *ptr, uint32_t cap_offset, uint16_t id);
 #endif
 extern const MemoryListener vfio_prereg_listener;
 
@@ -257,4 +265,5 @@ int vfio_spapr_remove_window(VFIOContainer *container,
 bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
 void vfio_migration_exit(VFIODevice *vbasedev);
 
+int vfio_bitmap_alloc(VFIOBitmap *vbmap, hwaddr size);
 #endif /* HW_VFIO_VFIO_COMMON_H */
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 134649226d..4e122fc4e4 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -62,84 +62,6 @@ static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
 static int vfio_kvm_device_fd = -1;
 #endif
 
-/*
- * Common VFIO interrupt disable
- */
-void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
-{
-struct vfio_irq_set irq_set = {
-.argsz = sizeof(irq_set),
-.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
-.index = index,
-.start = 0,
-.count = 0,
-};
-
-ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
-}
-
-void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index)
-{
-struct vfio_irq_set irq_set = {
-.argsz = sizeof(irq_set),
-.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
-.index = index,
-.start = 0,
-.count = 1,
-};
-
-ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
-}
-
-void vfio_mask_single_irqindex(VFIODevice *vbasedev, int index)
-{
-struct vfio_irq_set irq_set = {
-.argsz = sizeof(irq_set),
-.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
-.index = index,
-.start = 0,
-.count = 1,
-};
-
-ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
-}
-
-static inline const char *action_to_str(int action)
-{
-switch (action) {
-case VFIO_IRQ_SET_ACTION_MASK:
-return "MASK";
-case VFIO_IRQ_SET_ACTION_UNMASK:
-return "UNMASK";
-case VFIO_IRQ_SET_ACTION_TRIGGER:
-return "TRIGGER";
-default:
-return "UNKNOWN ACTION";
-}
-}
-
-static const char *index_to_str(VFIODevice *vbasedev, int index)
-{
-if (vbasedev->type != VFIO_DEVICE_TYPE_PCI) {
-return NULL;
-}
-
-switch (index) {
-case VFIO_PCI_INTX_IRQ_INDEX:
-return "INTX";
-case VFIO_PCI_MSI_IRQ_INDEX:
-return "MSI";
-case VFIO_PCI_MSIX_IRQ_INDEX:
-return "MSIX";
-case VFIO_PCI_ERR_IRQ_INDEX:
-return "ERR";
-case VFIO_PCI_REQ_IRQ_INDEX:
-return "REQ";
-default:
-return NULL;
-}
-}
-
 static int vfio_ram_block_discard_disable(VFIOContainer *container, bool state)
 {
 switch (container->iommu_type) {
@@ -163,183 +85,10 @@ static int vfio_ram_block_discard_disable(VFIOContainer *container, bool state)
 }
 }
 
-int vfio_set_irq_signaling(VFIODevice *vbasedev, int index, int subindex,
-   int action, int fd, Error **errp)
-{
-struct vfio_irq_set *irq_set;
-int argsz, ret = 0;
-const char *name;
-int32_t *pfd;
-
-argsz = sizeof(*irq_set) + sizeof(*pfd);
-
-irq_set = g_malloc0(argsz);
-irq_set->argsz = argsz;
-irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | action;
-irq_set->index = index;
-irq_set->start = subindex;
-irq_set->count = 1;
-pfd = (int32_t *)&irq_set->data;
-*pfd = fd;
-
-if (ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set)) {
-ret = -errno;
-}
-g_free(irq_set);
-
-if (!ret) {
-

[PATCH v5 05/15] vfio/common: Introduce vfio_container_add|del_section_window()

2023-10-09 Thread Eric Auger

Introduce helper functions that isolate the code used for
VFIO_SPAPR_TCE_v2_IOMMU.

Those helpers hide implementation details beneath the container object
and make the vfio_listener_region_add/del() implementations more
readable. No code change intended.
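
One detail the add helper hides is the check that a new section does not
intersect an existing host DMA window. The following is a minimal standalone
sketch of that check, not the QEMU code itself: the sketch_* names and the
struct are hypothetical, and sketch_ranges_overlap() is only assumed to behave
like the ranges_overlap() call used in the patch. The point it illustrates is
the conversion of an inclusive [min_iova, max_iova] window into a length
(max - min + 1) before comparing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inclusive IOVA window, mirroring min_iova/max_iova. */
struct sketch_window {
    uint64_t min_iova;
    uint64_t max_iova; /* inclusive upper bound */
};

/*
 * True if [first1, first1 + len1 - 1] and [first2, first2 + len2 - 1]
 * share at least one address. Both lengths must be non-zero.
 */
bool sketch_ranges_overlap(uint64_t first1, uint64_t len1,
                           uint64_t first2, uint64_t len2)
{
    uint64_t last1, last2;

    assert(len1 > 0 && len2 > 0);
    last1 = first1 + len1 - 1;
    last2 = first2 + len2 - 1;

    return !(last2 < first1 || last1 < first2);
}

/* Does a section starting at @offset with @size bytes hit window @w? */
bool sketch_section_hits_window(const struct sketch_window *w,
                                uint64_t offset, uint64_t size)
{
    /* The window is inclusive, hence the "+ 1" when deriving its length. */
    return sketch_ranges_overlap(w->min_iova,
                                 w->max_iova - w->min_iova + 1,
                                 offset, size);
}
```

A section that starts exactly one byte past max_iova is adjacent to the window
and does not count as an intersection in this sketch.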

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Zhenzhong Duan 
Reviewed-by: Cédric Le Goater 
---
 hw/vfio/common.c | 156 +++
 1 file changed, 89 insertions(+), 67 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index c54a72ec80..0397788aa5 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -807,6 +807,92 @@ static bool vfio_get_section_iova_range(VFIOContainer *container,
 return true;
 }
 
+static int vfio_container_add_section_window(VFIOContainer *container,
+ MemoryRegionSection *section,
+ Error **errp)
+{
+VFIOHostDMAWindow *hostwin;
+hwaddr pgsize = 0;
+int ret;
+
+if (container->iommu_type != VFIO_SPAPR_TCE_v2_IOMMU) {
+return 0;
+}
+
+/* For now intersections are not allowed, we may relax this later */
+QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+if (ranges_overlap(hostwin->min_iova,
+   hostwin->max_iova - hostwin->min_iova + 1,
+   section->offset_within_address_space,
+   int128_get64(section->size))) {
+error_setg(errp,
+"region [0x%"PRIx64",0x%"PRIx64"] overlaps with existing"
+"host DMA window [0x%"PRIx64",0x%"PRIx64"]",
+section->offset_within_address_space,
+section->offset_within_address_space +
+int128_get64(section->size) - 1,
+hostwin->min_iova, hostwin->max_iova);
+return -EINVAL;
+}
+}
+
+ret = vfio_spapr_create_window(container, section, &pgsize);
+if (ret) {
+error_setg_errno(errp, -ret, "Failed to create SPAPR window");
+return ret;
+}
+
+vfio_host_win_add(container, section->offset_within_address_space,
+  section->offset_within_address_space +
+  int128_get64(section->size) - 1, pgsize);
+#ifdef CONFIG_KVM
+if (kvm_enabled()) {
+VFIOGroup *group;
+IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
+struct kvm_vfio_spapr_tce param;
+struct kvm_device_attr attr = {
+.group = KVM_DEV_VFIO_GROUP,
+.attr = KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE,
+.addr = (uint64_t)(unsigned long)&param,
+};
+
+if (!memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_SPAPR_TCE_FD,
+  &param.tablefd)) {
+QLIST_FOREACH(group, &container->group_list, container_next) {
+param.groupfd = group->fd;
+if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+error_setg_errno(errp, errno,
+ "vfio: failed GROUP_SET_SPAPR_TCE for "
+ "KVM VFIO device %d and group fd %d",
+ param.tablefd, param.groupfd);
+return -errno;
+}
+trace_vfio_spapr_group_attach(param.groupfd, param.tablefd);
+}
+}
+}
+#endif
+return 0;
+}
+
+static void vfio_container_del_section_window(VFIOContainer *container,
+  MemoryRegionSection *section)
+{
+if (container->iommu_type != VFIO_SPAPR_TCE_v2_IOMMU) {
+return;
+}
+
+vfio_spapr_remove_window(container,
+ section->offset_within_address_space);
+if (vfio_host_win_del(container,
+  section->offset_within_address_space,
+  section->offset_within_address_space +
+  int128_get64(section->size) - 1) < 0) {
+hw_error("%s: Cannot delete missing window at %"HWADDR_PRIx,
+ __func__, section->offset_within_address_space);
+}
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
  MemoryRegionSection *section)
 {
@@ -833,62 +919,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
 return;
 }
 
-if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
-hwaddr pgsize = 0;
-
-/* For now intersections are not allowed, we may relax this later */
-QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
-if (ranges_overlap(hostwin->min_iova,
-   hostwin->max_iova - hostwin->min_iova + 1,
-   section->offset_within_address_space,
-   int128_get64(section->size))) {
-

[PATCH v5 06/15] vfio/common: Extract out vfio_kvm_device_[add/del]_fd

2023-10-09 Thread Eric Auger

From: Zhenzhong Duan 

Introduce two new helpers, vfio_kvm_device_[add/del]_fd, which take
as input a file descriptor that can be either a group fd or a cdev fd.
They use the new KVM_DEV_VFIO_FILE VFIO KVM device group, which aliases
to the legacy KVM_DEV_VFIO_GROUP.

vfio_kvm_device_[add/del]_group then call those new helpers.

Signed-off-by: Yi Liu 
Signed-off-by: Zhenzhong Duan 
Reviewed-by: Eric Auger 
---
 include/hw/vfio/vfio-common.h |  3 ++
 hw/vfio/common.c  | 69 +++
 2 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index e0483893d1..c4e7c3b4a7 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -226,6 +226,9 @@ struct vfio_device_info *vfio_get_device_info(int fd);
 int vfio_get_device(VFIOGroup *group, const char *name,
 VFIODevice *vbasedev, Error **errp);
 
+int vfio_kvm_device_add_fd(int fd, Error **errp);
+int vfio_kvm_device_del_fd(int fd, Error **errp);
+
 extern const MemoryRegionOps vfio_region_ops;
 typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
 extern VFIOGroupList vfio_group_list;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 0397788aa5..d8ed432cb6 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1818,17 +1818,17 @@ void vfio_reset_handler(void *opaque)
 }
 }
 
-static void vfio_kvm_device_add_group(VFIOGroup *group)
+int vfio_kvm_device_add_fd(int fd, Error **errp)
 {
 #ifdef CONFIG_KVM
 struct kvm_device_attr attr = {
-.group = KVM_DEV_VFIO_GROUP,
-.attr = KVM_DEV_VFIO_GROUP_ADD,
-.addr = (uint64_t)(unsigned long)&group->fd,
+.group = KVM_DEV_VFIO_FILE,
+.attr = KVM_DEV_VFIO_FILE_ADD,
+.addr = (uint64_t)(unsigned long)&fd,
 };
 
 if (!kvm_enabled()) {
-return;
+return 0;
 }
 
 if (vfio_kvm_device_fd < 0) {
@@ -1837,38 +1837,61 @@ static void vfio_kvm_device_add_group(VFIOGroup *group)
 };
 
 if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
-error_report("Failed to create KVM VFIO device: %m");
-return;
+error_setg_errno(errp, errno, "Failed to create KVM VFIO device");
+return -errno;
 }
 
 vfio_kvm_device_fd = cd.fd;
 }
 
 if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
-error_report("Failed to add group %d to KVM VFIO device: %m",
- group->groupid);
+error_setg_errno(errp, errno, "Failed to add fd %d to KVM VFIO device",
+ fd);
+return -errno;
 }
 #endif
+return 0;
+}
+
+int vfio_kvm_device_del_fd(int fd, Error **errp)
+{
+#ifdef CONFIG_KVM
+struct kvm_device_attr attr = {
+.group = KVM_DEV_VFIO_FILE,
+.attr = KVM_DEV_VFIO_FILE_DEL,
+.addr = (uint64_t)(unsigned long)&fd,
+};
+
+if (vfio_kvm_device_fd < 0) {
+error_setg(errp, "KVM VFIO device isn't created yet");
+return -EINVAL;
+}
+
+if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+error_setg_errno(errp, errno,
+ "Failed to remove fd %d from KVM VFIO device", fd);
+return -errno;
+}
+#endif
+return 0;
+}
+
+static void vfio_kvm_device_add_group(VFIOGroup *group)
+{
+Error *err = NULL;
+
+if (vfio_kvm_device_add_fd(group->fd, &err)) {
+error_reportf_err(err, "group ID %d: ", group->groupid);
+}
 }
 
 static void vfio_kvm_device_del_group(VFIOGroup *group)
 {
-#ifdef CONFIG_KVM
-struct kvm_device_attr attr = {
-.group = KVM_DEV_VFIO_GROUP,
-.attr = KVM_DEV_VFIO_GROUP_DEL,
-.addr = (uint64_t)(unsigned long)&group->fd,
-};
+Error *err = NULL;
 
-if (vfio_kvm_device_fd < 0) {
-return;
+if (vfio_kvm_device_del_fd(group->fd, &err)) {
+error_reportf_err(err, "group ID %d: ", group->groupid);
 }
-
-if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
-error_report("Failed to remove group %d from KVM VFIO device: %m",
- group->groupid);
-}
-#endif
 }
 
 static VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
-- 
2.41.0
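
The split above follows a common pattern: the fd-based helper returns a
negative errno value and hands an error object back to the caller, while the
group-level wrapper keeps its void "report and carry on" behaviour and only
prefixes the message with the group ID. The following is a minimal standalone
sketch of that pattern. It does not use QEMU's Error API; struct sketch_error
and the sketch_* functions are hypothetical stand-ins, and the report is
written to a caller-supplied buffer instead of being printed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, much simplified stand-in for an error object. */
struct sketch_error {
    char msg[128];
};

/*
 * Hypothetical fd-based helper: returns 0 on success or a negative errno
 * value, filling in *err on failure.
 */
int sketch_add_fd(int fd, struct sketch_error *err)
{
    assert(err != NULL);

    if (fd < 0) {
        snprintf(err->msg, sizeof(err->msg),
                 "Failed to add fd %d: %s", fd, strerror(EBADF));
        return -EBADF;
    }
    return 0;
}

/*
 * Group-level wrapper: swallows the return code, as the legacy void
 * function did, and only prepends the group ID to the message.
 * On success @report is set to the empty string.
 */
void sketch_add_group(int groupid, int group_fd, char *report, size_t len)
{
    struct sketch_error err = { .msg = "" };

    report[0] = '\0';
    if (sketch_add_fd(group_fd, &err)) {
        snprintf(report, len, "group ID %d: %s", groupid, err.msg);
    }
}
```

Callers that need to fail hard can call the fd-based helper directly and
propagate its return value; callers that used the old group function see no
change in behaviour.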

[PATCH v5 01/15] scripts/update-linux-headers: Add iommufd.h

2023-10-09 Thread Eric Auger

Update the script to import iommufd.h

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Zhenzhong Duan 
Reviewed-by: Cédric Le Goater 
---
 scripts/update-linux-headers.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 35a64bb501..34295c0fe5 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -161,7 +161,8 @@ done
 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
 for header in const.h stddef.h kvm.h vfio.h vfio_ccw.h vfio_zdev.h vhost.h \
-  psci.h psp-sev.h userfaultfd.h memfd.h mman.h nvme_ioctl.h vduse.h; do
+  psci.h psp-sev.h userfaultfd.h memfd.h mman.h nvme_ioctl.h \
+  vduse.h iommufd.h; do
 cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done
 
-- 
2.41.0

[PATCH v5 15/15] vfio/common: Move legacy VFIO backend code into separate container.c

2023-10-09 Thread Eric Auger

From: Yi Liu 

Move all the code that really depends on the legacy VFIO container/group
into a separate file: container.c. What remains in common.c is
the code related to VFIOAddressSpace, MemoryListeners, migration and
all other general operations.

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Zhenzhong Duan 
Reviewed-by: Cédric Le Goater 

---

v4 -> v5:
- restored check on if (!vbasedev->container) in vfio_detach_device

v3 -> v4:
- added dropped comment
---
 include/hw/vfio/vfio-common.h |   35 +
 hw/vfio/common.c  | 1155 +---
 hw/vfio/container.c   | 1161 +
 hw/vfio/meson.build   |1 +
 4 files changed, 1213 insertions(+), 1139 deletions(-)
 create mode 100644 hw/vfio/container.c

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 54905b9dd4..7780b9073a 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -206,6 +206,30 @@ typedef struct {
 hwaddr pages;
 } VFIOBitmap;
 
+void vfio_host_win_add(VFIOContainer *container,
+   hwaddr min_iova, hwaddr max_iova,
+   uint64_t iova_pgsizes);
+int vfio_host_win_del(VFIOContainer *container, hwaddr min_iova,
+  hwaddr max_iova);
+VFIOAddressSpace *vfio_get_address_space(AddressSpace *as);
+void vfio_put_address_space(VFIOAddressSpace *space);
+bool vfio_devices_all_running_and_saving(VFIOContainer *container);
+
+/* container->fd */
+int vfio_dma_unmap(VFIOContainer *container, hwaddr iova,
+   ram_addr_t size, IOMMUTLBEntry *iotlb);
+int vfio_dma_map(VFIOContainer *container, hwaddr iova,
+ ram_addr_t size, void *vaddr, bool readonly);
+int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start);
+int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
+hwaddr iova, hwaddr size);
+
+int vfio_container_add_section_window(VFIOContainer *container,
+  MemoryRegionSection *section,
+  Error **errp);
+void vfio_container_del_section_window(VFIOContainer *container,
+   MemoryRegionSection *section);
+
 void vfio_disable_irqindex(VFIODevice *vbasedev, int index);
 void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index);
 void vfio_mask_single_irqindex(VFIODevice *vbasedev, int index);
@@ -235,6 +259,10 @@ extern const MemoryRegionOps vfio_region_ops;
 typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
 typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
 extern VFIOGroupList vfio_group_list;
+extern VFIODeviceList vfio_device_list;
+
+extern const MemoryListener vfio_memory_listener;
+extern int vfio_kvm_device_fd;
 
 bool vfio_mig_active(void);
 int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp);
@@ -272,4 +300,11 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
 void vfio_migration_exit(VFIODevice *vbasedev);
 
 int vfio_bitmap_alloc(VFIOBitmap *vbmap, hwaddr size);
+bool vfio_devices_all_running_and_mig_active(VFIOContainer *container);
+bool vfio_devices_all_device_dirty_tracking(VFIOContainer *container);
+int vfio_devices_query_dirty_bitmap(VFIOContainer *container,
+VFIOBitmap *vbmap, hwaddr iova,
+hwaddr size);
+int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
+ uint64_t size, ram_addr_t ram_addr);
 #endif /* HW_VFIO_VFIO_COMMON_H */
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 95bc50bcda..9e61de03ee 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -46,9 +46,7 @@
 #include "migration/qemu-file.h"
 #include "sysemu/tpm.h"
 
-VFIOGroupList vfio_group_list =
-QLIST_HEAD_INITIALIZER(vfio_group_list);
-static VFIODeviceList vfio_device_list =
+VFIODeviceList vfio_device_list =
 QLIST_HEAD_INITIALIZER(vfio_device_list);
 static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
 QLIST_HEAD_INITIALIZER(vfio_address_spaces);
@@ -61,39 +59,13 @@ static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
  * initialized, this file descriptor is only released on QEMU exit and
  * we'll re-use it should another vfio device be attached before then.
  */
-static int vfio_kvm_device_fd = -1;
+int vfio_kvm_device_fd = -1;
 #endif
 
-static int vfio_ram_block_discard_disable(VFIOContainer *container, bool state)
-{
-switch (container->iommu_type) {
-case VFIO_TYPE1v2_IOMMU:
-case VFIO_TYPE1_IOMMU:
-/*
- * We support coordinated discarding of RAM via the RamDiscardManager.
- */
-return ram_block_uncoordinated_discard_disable(state);
-default:
-/*
- * VFIO_SPAPR_TCE_IOMMU most probably works just fine with
- * RamDiscardManager, however, it is completely u

[PATCH v5 14/15] vfio/common: Introduce a global VFIODevice list

2023-10-09 Thread Eric Auger

From: Zhenzhong Duan 

Some functions iterate over all the VFIODevices. This is currently
achieved by iterating over all groups/devices. Let's
introduce a global list of VFIODevices to simplify that scan.

This will also be useful while migrating to IOMMUFD by hiding the
group specificity.

Signed-off-by: Eric Auger 
Signed-off-by: Zhenzhong Duan 
Suggested-by: Alex Williamson 
---
 include/hw/vfio/vfio-common.h |  2 ++
 hw/vfio/common.c  | 45 +++
 2 files changed, 21 insertions(+), 26 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index bf12e40667..54905b9dd4 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -131,6 +131,7 @@ typedef struct VFIODeviceOps VFIODeviceOps;
 typedef struct VFIODevice {
 QLIST_ENTRY(VFIODevice) next;
 QLIST_ENTRY(VFIODevice) container_next;
+QLIST_ENTRY(VFIODevice) global_next;
 struct VFIOGroup *group;
 VFIOContainer *container;
 char *sysfsdev;
@@ -232,6 +233,7 @@ int vfio_kvm_device_del_fd(int fd, Error **errp);
 
 extern const MemoryRegionOps vfio_region_ops;
 typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
+typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
 extern VFIOGroupList vfio_group_list;
 
 bool vfio_mig_active(void);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 55f8a113ea..95bc50bcda 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -48,6 +48,8 @@
 
 VFIOGroupList vfio_group_list =
 QLIST_HEAD_INITIALIZER(vfio_group_list);
+static VFIODeviceList vfio_device_list =
+QLIST_HEAD_INITIALIZER(vfio_device_list);
 static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
 QLIST_HEAD_INITIALIZER(vfio_address_spaces);
 
@@ -94,18 +96,15 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
 
 bool vfio_mig_active(void)
 {
-VFIOGroup *group;
 VFIODevice *vbasedev;
 
-if (QLIST_EMPTY(&vfio_group_list)) {
+if (QLIST_EMPTY(&vfio_device_list)) {
 return false;
 }
 
-QLIST_FOREACH(group, &vfio_group_list, next) {
-QLIST_FOREACH(vbasedev, &group->device_list, next) {
-if (vbasedev->migration_blocker) {
-return false;
-}
+QLIST_FOREACH(vbasedev, &vfio_device_list, next) {
+if (vbasedev->migration_blocker) {
+return false;
 }
 }
 return true;
@@ -120,19 +119,16 @@ static Error *multiple_devices_migration_blocker;
  */
 static bool vfio_multiple_devices_migration_is_supported(void)
 {
-VFIOGroup *group;
 VFIODevice *vbasedev;
 unsigned int device_num = 0;
 bool all_support_p2p = true;
 
-QLIST_FOREACH(group, &vfio_group_list, next) {
-QLIST_FOREACH(vbasedev, &group->device_list, next) {
-if (vbasedev->migration) {
-device_num++;
+QLIST_FOREACH(vbasedev, &vfio_device_list, next) {
+if (vbasedev->migration) {
+device_num++;
 
-if (!(vbasedev->migration->mig_flags & VFIO_MIGRATION_P2P)) {
-all_support_p2p = false;
-}
+if (!(vbasedev->migration->mig_flags & VFIO_MIGRATION_P2P)) {
+all_support_p2p = false;
 }
 }
 }
@@ -1777,22 +1773,17 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
 
 void vfio_reset_handler(void *opaque)
 {
-VFIOGroup *group;
 VFIODevice *vbasedev;
 
-QLIST_FOREACH(group, &vfio_group_list, next) {
-QLIST_FOREACH(vbasedev, &group->device_list, next) {
-if (vbasedev->dev->realized) {
-vbasedev->ops->vfio_compute_needs_reset(vbasedev);
-}
+QLIST_FOREACH(vbasedev, &vfio_device_list, next) {
+if (vbasedev->dev->realized) {
+vbasedev->ops->vfio_compute_needs_reset(vbasedev);
 }
 }
 
-QLIST_FOREACH(group, &vfio_group_list, next) {
-QLIST_FOREACH(vbasedev, &group->device_list, next) {
-if (vbasedev->dev->realized && vbasedev->needs_reset) {
-vbasedev->ops->vfio_hot_reset_multi(vbasedev);
-}
+QLIST_FOREACH(vbasedev, &vfio_device_list, next) {
+if (vbasedev->dev->realized && vbasedev->needs_reset) {
+vbasedev->ops->vfio_hot_reset_multi(vbasedev);
 }
 }
 }
@@ -2657,6 +2648,7 @@ int vfio_attach_device(char *name, VFIODevice *vbasedev,
 container = group->container;
 vbasedev->container = container;
 QLIST_INSERT_HEAD(&container->device_list, vbasedev, container_next);
+QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
 
 return ret;
 }
@@ -2669,6 +2661,7 @@ void vfio_detach_device(VFIODevice *vbasedev)
 return;
 }
 
+QLIST_REMOVE(vbasedev, global_next);
 QLIST_REMOVE(vbasedev, container_next);
 vbasedev->container = NULL;
 trace_vfio_detach_device(vbasedev->name, group->groupid);
-- 
2.41.0
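
A device that sits on several intrusive lists at once needs one link field
per list, and every traversal or removal has to name the link field that was
used for the insertion on that list. The following is a minimal standalone
sketch of a device that is on both a per-container list and a global list.
It uses the BSD LIST macros from <sys/queue.h> as a stand-in for QEMU's QLIST
macros (an assumption made so the example builds outside the QEMU tree), and
the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for VFIODevice: one link field per list. */
struct sketch_device {
    LIST_ENTRY(sketch_device) container_next; /* per-container list */
    LIST_ENTRY(sketch_device) global_next;    /* global list */
    bool migration_blocked;
};

LIST_HEAD(sketch_device_list, sketch_device);

struct sketch_device_list sketch_global_list =
    LIST_HEAD_INITIALIZER(sketch_global_list);

void sketch_attach(struct sketch_device_list *container,
                   struct sketch_device *dev)
{
    assert(container != NULL && dev != NULL);
    LIST_INSERT_HEAD(container, dev, container_next);
    LIST_INSERT_HEAD(&sketch_global_list, dev, global_next);
}

void sketch_detach(struct sketch_device *dev)
{
    LIST_REMOVE(dev, global_next);
    LIST_REMOVE(dev, container_next);
}

/* Flat scan over every device, whatever container it belongs to. */
bool sketch_mig_active(void)
{
    struct sketch_device *dev;

    if (LIST_EMPTY(&sketch_global_list)) {
        return false;
    }

    /* Walk the global list through its own link field, global_next. */
    LIST_FOREACH(dev, &sketch_global_list, global_next) {
        if (dev->migration_blocked) {
            return false;
        }
    }
    return true;
}
```

Walking the global list through a different link field would follow the
pointers of another list and visit the wrong set of devices, so the field
name passed to the FOREACH macro is worth double-checking in review.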

[PATCH v5 07/15] vfio/pci: Introduce vfio_[attach/detach]_device

2023-10-09 Thread Eric Auger

We want the VFIO devices to be able to use two different
IOMMU backends, the legacy VFIO one and the new iommufd one.

Introduce vfio_[attach/detach]_device which aim at hiding the
underlying IOMMU backend (IOCTLs, datatypes, ...).

Once vfio_attach_device completes, the device is attached
to a security context and its fd can be used. Conversely,
when vfio_detach_device completes, the device has been
detached from the security context.

At the moment only the implementation based on the legacy
container/group exists. Let's use it from the vfio-pci device.
Subsequent patches will handle other devices.

We also take advantage of this patch to properly free
vbasedev->name on failure.
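
With the legacy backend, attaching starts by resolving the IOMMU group of the
device: the "iommu_group" symlink under the device's sysfs directory is read,
and the last path component of its target is parsed as the group number. The
following is a minimal standalone sketch of just the parsing step. The
function name is hypothetical, the link target is passed in as a string
rather than obtained with readlink(), and the example path in the comment is
an assumption about what such a target typically looks like.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the group number from a symlink target
 * such as "../../../kernel/iommu_groups/26".
 * Returns the group ID, or -1 if the last component is not a number.
 */
int sketch_groupid_from_link(const char *target)
{
    const char *name;
    int groupid;

    assert(target != NULL);

    /* Keep only the last path component, like basename() would. */
    name = strrchr(target, '/');
    name = name ? name + 1 : target;

    if (sscanf(name, "%d", &groupid) != 1) {
        return -1;
    }
    return groupid;
}
```

Keeping this lookup inside the attach function is what allows callers such as
vfio-pci to stop dealing with groups directly.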

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Zhenzhong Duan 
Reviewed-by: Cédric Le Goater 

---

v4 -> v5:
- remove vbasedev->name g_free as it is done on instance_finalize

v2 -> v3:
- added trace_vfio_detach_device
- added a comment explaining why we pass @name to vfio_attach_device
  although vbasedev->name is populated
- free vbasedev->name and detach_device if needed
---
 include/hw/vfio/vfio-common.h |  3 ++
 hw/vfio/common.c  | 74 +++
 hw/vfio/pci.c | 66 +++
 hw/vfio/trace-events  |  3 +-
 4 files changed, 93 insertions(+), 53 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index c4e7c3b4a7..12fbfbc37d 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -225,6 +225,9 @@ void vfio_put_group(VFIOGroup *group);
 struct vfio_device_info *vfio_get_device_info(int fd);
 int vfio_get_device(VFIOGroup *group, const char *name,
 VFIODevice *vbasedev, Error **errp);
+int vfio_attach_device(char *name, VFIODevice *vbasedev,
+   AddressSpace *as, Error **errp);
+void vfio_detach_device(VFIODevice *vbasedev);
 
 int vfio_kvm_device_add_fd(int fd, Error **errp);
 int vfio_kvm_device_del_fd(int fd, Error **errp);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index d8ed432cb6..f4c33c9858 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -2611,3 +2611,77 @@ int vfio_eeh_as_op(AddressSpace *as, uint32_t op)
 }
 return vfio_eeh_container_op(container, op);
 }
+
+static int vfio_device_groupid(VFIODevice *vbasedev, Error **errp)
+{
+char *tmp, group_path[PATH_MAX], *group_name;
+int ret, groupid;
+ssize_t len;
+
+tmp = g_strdup_printf("%s/iommu_group", vbasedev->sysfsdev);
+len = readlink(tmp, group_path, sizeof(group_path));
+g_free(tmp);
+
+if (len <= 0 || len >= sizeof(group_path)) {
+ret = len < 0 ? -errno : -ENAMETOOLONG;
+error_setg_errno(errp, -ret, "no iommu_group found");
+return ret;
+}
+
+group_path[len] = 0;
+
+group_name = basename(group_path);
+if (sscanf(group_name, "%d", &groupid) != 1) {
+error_setg_errno(errp, errno, "failed to read %s", group_path);
+return -errno;
+}
+return groupid;
+}
+
+/*
+ * vfio_attach_device: attach a device to a security context
+ * @name and @vbasedev->name are likely to be different depending
+ * on the type of the device, hence the need for passing @name
+ */
+int vfio_attach_device(char *name, VFIODevice *vbasedev,
+   AddressSpace *as, Error **errp)
+{
+int groupid = vfio_device_groupid(vbasedev, errp);
+VFIODevice *vbasedev_iter;
+VFIOGroup *group;
+int ret;
+
+if (groupid < 0) {
+return groupid;
+}
+
+trace_vfio_attach_device(vbasedev->name, groupid);
+
+group = vfio_get_group(groupid, as, errp);
+if (!group) {
+return -ENOENT;
+}
+
+QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
+error_setg(errp, "device is already attached");
+vfio_put_group(group);
+return -EBUSY;
+}
+}
+ret = vfio_get_device(group, name, vbasedev, errp);
+if (ret) {
+vfio_put_group(group);
+}
+
+return ret;
+}
+
+void vfio_detach_device(VFIODevice *vbasedev)
+{
+VFIOGroup *group = vbasedev->group;
+
+trace_vfio_detach_device(vbasedev->name, group->groupid);
+vfio_put_base_device(vbasedev);
+vfio_put_group(group);
+}
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 898296fd54..40ae46266e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2895,10 +2895,10 @@ static void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
 
 static void vfio_pci_put_device(VFIOPCIDevice *vdev)
 {
+vfio_detach_device(&vdev->vbasedev);
+
 g_free(vdev->vbasedev.name);
 g_free(vdev->msix);
-
-vfio_put_base_device(&vdev->vbasedev);
 }
 
 static void vfio_err_notifier_handler(void *opaque)
@@ -3045,13 +3045,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
 VFIOPCIDevice *vdev = VFIO_PCI(pdev);
 VFIODevice *vbasedev = &vdev->vbasedev;
-

[PATCH v5 13/15] vfio/common: Store the parent container in VFIODevice

2023-10-09 Thread Eric Auger

From: Zhenzhong Duan 

Let's store the parent container within the VFIODevice.
This simplifies the logic in vfio_viommu_preset() and
has the benefit of hiding the group specificity, which
is useful for the IOMMUFD migration.

Signed-off-by: Eric Auger 
Signed-off-by: Zhenzhong Duan 

---
v4 -> v5:
- restore check on !vbasedev->container!

v3 -> v4:
- Dropped check on !vbasedev->container
---
 include/hw/vfio/vfio-common.h | 1 +
 hw/vfio/common.c  | 8 +++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 8ca70dd821..bf12e40667 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -132,6 +132,7 @@ typedef struct VFIODevice {
 QLIST_ENTRY(VFIODevice) next;
 QLIST_ENTRY(VFIODevice) container_next;
 struct VFIOGroup *group;
+VFIOContainer *container;
 char *sysfsdev;
 char *name;
 DeviceState *dev;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ef9dc7c747..55f8a113ea 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -184,7 +184,7 @@ void vfio_unblock_multiple_devices_migration(void)
 
 bool vfio_viommu_preset(VFIODevice *vbasedev)
 {
-return vbasedev->group->container->space->as != &address_space_memory;
+return vbasedev->container->space->as != &address_space_memory;
 }
 
 static void vfio_set_migration_error(int err)
@@ -2655,6 +2655,7 @@ int vfio_attach_device(char *name, VFIODevice *vbasedev,
 }
 
 container = group->container;
+vbasedev->container = container;
 QLIST_INSERT_HEAD(&container->device_list, vbasedev, container_next);
 
 return ret;
@@ -2664,7 +2665,12 @@ void vfio_detach_device(VFIODevice *vbasedev)
 {
 VFIOGroup *group = vbasedev->group;
 
+if (!vbasedev->container) {
+return;
+}
+
 QLIST_REMOVE(vbasedev, container_next);
+vbasedev->container = NULL;
 trace_vfio_detach_device(vbasedev->name, group->groupid);
 vfio_put_base_device(vbasedev);
 vfio_put_group(group);
-- 
2.41.0
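
Storing the parent pointer in the device also gives the detach path a cheap
way to tell whether the device is attached at all: a NULL container means
there is nothing to undo. The following is a minimal standalone sketch of
that guard, with hypothetical sketch_* types and a device counter standing in
for the real list bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for VFIOContainer and VFIODevice. */
struct sketch_container {
    int ndevices;
};

struct sketch_device {
    struct sketch_container *container; /* NULL while detached */
};

void sketch_attach(struct sketch_device *dev, struct sketch_container *c)
{
    assert(dev->container == NULL);
    dev->container = c;
    c->ndevices++;
}

/* Returns true if the device was attached and has now been detached. */
bool sketch_detach(struct sketch_device *dev)
{
    if (!dev->container) {
        return false; /* never attached, or already detached */
    }

    dev->container->ndevices--;
    dev->container = NULL;
    return true;
}
```

With the guard in place, calling the detach function on a device whose attach
failed, or calling it twice, is harmless in this sketch.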

[PATCH v5 12/15] vfio/common: Introduce a per container device list

2023-10-09 Thread Eric Auger

From: Zhenzhong Duan 

Several functions need to iterate over the VFIO devices attached to
a given container.  This is currently achieved by iterating over the
groups attached to the container and then over the devices in the group.
Let's introduce a per container device list that simplifies this
search.

The per-container list is used in the following functions:
vfio_devices_all_dirty_tracking
vfio_devices_all_device_dirty_tracking
vfio_devices_all_running_and_mig_active
vfio_devices_dma_logging_stop
vfio_devices_dma_logging_start
vfio_devices_query_dirty_bitmap

This will also ease the migration to IOMMUFD by hiding the group
specificity.

Suggested-by: Alex Williamson 
Signed-off-by: Zhenzhong Duan 
Signed-off-by: Eric Auger 
Reviewed-by: Cédric Le Goater 
---
 include/hw/vfio/vfio-common.h |   2 +
 hw/vfio/common.c  | 145 +++---
 2 files changed, 67 insertions(+), 80 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index c486bdef2a..8ca70dd821 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -98,6 +98,7 @@ typedef struct VFIOContainer {
 QLIST_HEAD(, VFIOGroup) group_list;
 QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
 QLIST_ENTRY(VFIOContainer) next;
+QLIST_HEAD(, VFIODevice) device_list;
 } VFIOContainer;
 
 typedef struct VFIOGuestIOMMU {
@@ -129,6 +130,7 @@ typedef struct VFIODeviceOps VFIODeviceOps;
 
 typedef struct VFIODevice {
 QLIST_ENTRY(VFIODevice) next;
+QLIST_ENTRY(VFIODevice) container_next;
 struct VFIOGroup *group;
 char *sysfsdev;
 char *name;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 019da387d2..ef9dc7c747 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -218,7 +218,6 @@ bool vfio_device_state_is_precopy(VFIODevice *vbasedev)
 
 static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
 {
-VFIOGroup *group;
 VFIODevice *vbasedev;
 MigrationState *ms = migrate_get_current();
 
@@ -227,19 +226,17 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
 return false;
 }
 
-QLIST_FOREACH(group, &container->group_list, container_next) {
-QLIST_FOREACH(vbasedev, &group->device_list, next) {
-VFIOMigration *migration = vbasedev->migration;
+QLIST_FOREACH(vbasedev, &container->device_list, container_next) {
+VFIOMigration *migration = vbasedev->migration;
 
-if (!migration) {
-return false;
-}
+if (!migration) {
+return false;
+}
 
-if (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF &&
-(vfio_device_state_is_running(vbasedev) ||
- vfio_device_state_is_precopy(vbasedev))) {
-return false;
-}
+if (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF &&
+(vfio_device_state_is_running(vbasedev) ||
+ vfio_device_state_is_precopy(vbasedev))) {
+return false;
 }
 }
 return true;
@@ -247,14 +244,11 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer 
*container)
 
 static bool vfio_devices_all_device_dirty_tracking(VFIOContainer *container)
 {
-VFIOGroup *group;
 VFIODevice *vbasedev;
 
-QLIST_FOREACH(group, &container->group_list, container_next) {
-QLIST_FOREACH(vbasedev, &group->device_list, next) {
-if (!vbasedev->dirty_pages_supported) {
-return false;
-}
+QLIST_FOREACH(vbasedev, &container->device_list, container_next) {
+if (!vbasedev->dirty_pages_supported) {
+return false;
 }
 }
 
@@ -267,27 +261,24 @@ static bool 
vfio_devices_all_device_dirty_tracking(VFIOContainer *container)
  */
 static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
 {
-VFIOGroup *group;
 VFIODevice *vbasedev;
 
 if (!migration_is_active(migrate_get_current())) {
 return false;
 }
 
-QLIST_FOREACH(group, &container->group_list, container_next) {
-QLIST_FOREACH(vbasedev, &group->device_list, next) {
-VFIOMigration *migration = vbasedev->migration;
+QLIST_FOREACH(vbasedev, &container->device_list, container_next) {
+VFIOMigration *migration = vbasedev->migration;
 
-if (!migration) {
-return false;
-}
+if (!migration) {
+return false;
+}
 
-if (vfio_device_state_is_running(vbasedev) ||
-vfio_device_state_is_precopy(vbasedev)) {
-continue;
-} else {
-return false;
-}
+if (vfio_device_state_is_running(vbasedev) ||
+vfio_device_state_is_precopy(vbasedev)) {
+continue;
+} else {
+return false;
 }
 }
 return true;
@@ -1187,20 +1178,17 @@ static bool 
vfio_section_is_vf

Re: [Virtio-fs] (no subject)

2023-10-09 Thread Hanna Czenczek


On 09.10.23 11:07, Hanna Czenczek wrote:

On 09.10.23 10:21, Hanna Czenczek wrote:

On 07.10.23 04:22, Yajun Wu wrote:


[...]

The main motivation of adding VHOST_USER_SET_STATUS is to let backend
DPDK know when DRIVER_OK bit is valid. It's an indication of all VQ
configuration has sent, otherwise DPDK has to rely on first queue pair
is ready, then receiving/applying VQ configuration one by one.

During live migration, configuring VQ one by one is very time consuming.

One question I have here is why it wasn’t then introduced in the live 
migration code, but in the general VM stop/cont code instead. It does 
seem time-consuming to do this every time the VM is paused and resumed.



For VIRTIO net vDPA, HW needs to know how many VQs are enabled to set
RSS (Receive-Side Scaling).


If you don’t want SET_STATUS message, backend can remove protocol
feature bit VHOST_USER_PROTOCOL_F_STATUS.


The problem isn’t back-ends that don’t want the message, the problem 
is that qemu uses the message wrongly, which prevents well-behaving 
back-ends from implementing the message.


DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device 
close/reset.


So the right thing to do for back-ends is to announce STATUS support 
and then not implement it correctly?


GET_VRING_BASE should not close or reset the device, by the 
way.  It should stop that one vring, not more.  We have a 
RESET_DEVICE command for resetting.


I'm not involved in discussion about adding SET_STATUS in Vhost
protocol. This feature is essential for vDPA (same as vhost-vdpa
implements VHOST_VDPA_SET_STATUS).


So what I gather from your response is that there is only a 
single use for SET_STATUS, which is the DRIVER_OK bit.  If so, 
documenting that all other bits are to be ignored by both back-end 
and front-end would be fine by me.
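
As a rough sketch of what "ignore all other bits" could look like on the back-end side: the bit values below are the device status bits from the virtio specification, while both helper functions are hypothetical and not taken from DPDK, QEMU or the vhost-user specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Device status bits as defined by the virtio specification. */
#define VIRTIO_STATUS_ACKNOWLEDGE 0x01
#define VIRTIO_STATUS_DRIVER      0x02
#define VIRTIO_STATUS_DRIVER_OK   0x04
#define VIRTIO_STATUS_FEATURES_OK 0x08

/*
 * Hypothetical back-end helper: look only at DRIVER_OK and ignore
 * every other bit of the value carried by SET_STATUS.
 */
static bool set_status_signals_driver_ok(uint8_t status)
{
    return (status & VIRTIO_STATUS_DRIVER_OK) != 0;
}

/* Hypothetical: the status byte of a fully initialised driver. */
static uint8_t status_fully_initialised(void)
{
    uint8_t s = VIRTIO_STATUS_ACKNOWLEDGE | VIRTIO_STATUS_DRIVER |
                VIRTIO_STATUS_FEATURES_OK | VIRTIO_STATUS_DRIVER_OK;

    assert(set_status_signals_driver_ok(s));
    return s;
}
```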


I’m not fully serious about that suggestion, but I hear the strong 
implication that nothing but DRIVER_OK was of any concern, and this 
is really important to note when we talk about the status of the 
STATUS feature in vhost today.  It seems to me now that it was not 
intended to be the virtio-level status byte, but just a DRIVER_OK 
signalling path from front-end to back-end.  That makes it a 
vhost-level protocol feature to me.


On second thought, it just is a pure vhost-level protocol feature, and 
has nothing to do with the virtio status byte as-is.  The only stated 
purpose is for the front-end to send DRIVER_OK after migration, but 
migration is transparent to the guest, so the guest would never change 
the status byte during migration.  Therefore, if this feature is 
essential, we will never be able to have a status byte that is 
transparently shared between guest and back-end device, i.e. the 
virtio status byte.


On third thought, scratch that.  The guest wouldn’t set it, but 
naturally, after migration, the front-end will need to restore the 
status byte from the source, so the front-end will always need to set 
it, even if it were otherwise controlled only by the guest and the 
back-end device.  So technically, this doesn’t prevent such a use case.  
(In practice, it isn’t controlled by the guest right now, but that could 
be fixed.)


Cc-ing Alex on this mail, because to me, this seems like an important 
detail when he plans on using the byte in the future. If we need a 
virtio status byte, I can’t see how we could use the existing F_STATUS 
for it.


Hanna

[PATCH v5 10/15] vfio/ccw: Use vfio_[attach/detach]_device

2023-10-09 Thread Eric Auger

Let the vfio-ccw device use vfio_attach_device() and
vfio_detach_device(), hence hiding the details of the used
IOMMU backend.

Note that the migration reduces the following trace
"vfio: subchannel %s has already been attached" (featuring
cssid.ssid.devid) into "device is already attached".

Now that all the devices have been migrated to use the new
vfio_attach_device/vfio_detach_device API, let's turn the
legacy functions into static functions, local to container.c.

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Zhenzhong Duan 

---

v4 -> v5:
- do not overwrite sysfsdev [Zhenzhong]
- remove name local variable

v3:
- simplified vbasedev->dev setting

v2 -> v3:
- Hopefully fix confusion between vbasedev->name, mdevid and sysfsdev
  while taking into account Matthew's comment
  
https://lore.kernel.org/qemu-devel/6e04ab8f-dc84-e9c2-deea-2b6b31678...@linux.ibm.com/
---
 include/hw/vfio/vfio-common.h |   5 --
 hw/vfio/ccw.c | 117 --
 hw/vfio/common.c  |  10 +--
 3 files changed, 32 insertions(+), 100 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 12fbfbc37d..c486bdef2a 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -202,7 +202,6 @@ typedef struct {
 hwaddr pages;
 } VFIOBitmap;
 
-void vfio_put_base_device(VFIODevice *vbasedev);
 void vfio_disable_irqindex(VFIODevice *vbasedev, int index);
 void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index);
 void vfio_mask_single_irqindex(VFIODevice *vbasedev, int index);
@@ -220,11 +219,7 @@ void vfio_region_unmap(VFIORegion *region);
 void vfio_region_exit(VFIORegion *region);
 void vfio_region_finalize(VFIORegion *region);
 void vfio_reset_handler(void *opaque);
-VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp);
-void vfio_put_group(VFIOGroup *group);
 struct vfio_device_info *vfio_get_device_info(int fd);
-int vfio_get_device(VFIOGroup *group, const char *name,
-VFIODevice *vbasedev, Error **errp);
 int vfio_attach_device(char *name, VFIODevice *vbasedev,
AddressSpace *as, Error **errp);
 void vfio_detach_device(VFIODevice *vbasedev);
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 1e2fce83b0..6623ae237b 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -572,88 +572,14 @@ static void vfio_ccw_put_region(VFIOCCWDevice *vcdev)
 g_free(vcdev->io_region);
 }
 
-static void vfio_ccw_put_device(VFIOCCWDevice *vcdev)
-{
-g_free(vcdev->vdev.name);
-vfio_put_base_device(&vcdev->vdev);
-}
-
-static void vfio_ccw_get_device(VFIOGroup *group, VFIOCCWDevice *vcdev,
-Error **errp)
-{
-S390CCWDevice *cdev = S390_CCW_DEVICE(vcdev);
-char *name = g_strdup_printf("%x.%x.%04x", cdev->hostid.cssid,
- cdev->hostid.ssid,
- cdev->hostid.devid);
-VFIODevice *vbasedev;
-
-QLIST_FOREACH(vbasedev, &group->device_list, next) {
-if (strcmp(vbasedev->name, name) == 0) {
-error_setg(errp, "vfio: subchannel %s has already been attached",
-   name);
-goto out_err;
-}
-}
-
-/*
- * All vfio-ccw devices are believed to operate in a way compatible with
- * discarding of memory in RAM blocks, ie. pages pinned in the host are
- * in the current working set of the guest driver and therefore never
- * overlap e.g., with pages available to the guest balloon driver.  This
- * needs to be set before vfio_get_device() for vfio common to handle
- * ram_block_discard_disable().
- */
-vcdev->vdev.ram_block_discard_allowed = true;
-
-if (vfio_get_device(group, cdev->mdevid, &vcdev->vdev, errp)) {
-goto out_err;
-}
-
-vcdev->vdev.ops = &vfio_ccw_ops;
-vcdev->vdev.type = VFIO_DEVICE_TYPE_CCW;
-vcdev->vdev.name = name;
-vcdev->vdev.dev = DEVICE(vcdev);
-
-return;
-
-out_err:
-g_free(name);
-}
-
-static VFIOGroup *vfio_ccw_get_group(S390CCWDevice *cdev, Error **errp)
-{
-char *tmp, group_path[PATH_MAX];
-ssize_t len;
-int groupid;
-
-tmp = g_strdup_printf("/sys/bus/css/devices/%x.%x.%04x/%s/iommu_group",
-  cdev->hostid.cssid, cdev->hostid.ssid,
-  cdev->hostid.devid, cdev->mdevid);
-len = readlink(tmp, group_path, sizeof(group_path));
-g_free(tmp);
-
-if (len <= 0 || len >= sizeof(group_path)) {
-error_setg(errp, "vfio: no iommu_group found");
-return NULL;
-}
-
-group_path[len] = 0;
-
-if (sscanf(basename(group_path), "%d", &groupid) != 1) {
-error_setg(errp, "vfio: failed to read %s", group_path);
-return NULL;
-}
-
-return vfio_get_group(groupid, &address_space_memory, errp);
-}
-
 static void vfio_ccw_realize(DeviceState *dev, Error **errp)
 {
-VFIOGroup *group;
 S390CCWDevice *cdev

[PATCH v5 09/15] vfio/ap: Use vfio_[attach/detach]_device

2023-10-09 Thread Eric Auger

Let the vfio-ap device use vfio_attach_device() and
vfio_detach_device(), hence hiding the details of the used
IOMMU backend.

We take the opportunity to use g_path_get_basename() which
is preferred, as suggested by
3e015d815b ("use g_path_get_basename instead of basename")

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Zhenzhong Duan 
Reviewed-by: Matthew Rosato 

---

v4 -> v5:
- restore the 'return' before the error label

v3 -> v4:
- Removed vfio_detach_device(vbasedev) and g_free(vbasedev->name);
  which do not match the intent
- added error_prepend

v2 -> v3:
- Mention g_path_get_basename in commit message and properly free
  vbasedev->name, call vfio_detach_device
---
 hw/vfio/ap.c | 67 ++--
 1 file changed, 13 insertions(+), 54 deletions(-)

diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index 6e21d1da5a..f870f51ffa 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -53,40 +53,6 @@ struct VFIODeviceOps vfio_ap_ops = {
 .vfio_compute_needs_reset = vfio_ap_compute_needs_reset,
 };
 
-static void vfio_ap_put_device(VFIOAPDevice *vapdev)
-{
-g_free(vapdev->vdev.name);
-vfio_put_base_device(&vapdev->vdev);
-}
-
-static VFIOGroup *vfio_ap_get_group(VFIOAPDevice *vapdev, Error **errp)
-{
-GError *gerror = NULL;
-char *symlink, *group_path;
-int groupid;
-
-symlink = g_strdup_printf("%s/iommu_group", vapdev->vdev.sysfsdev);
-group_path = g_file_read_link(symlink, &gerror);
-g_free(symlink);
-
-if (!group_path) {
-error_setg(errp, "%s: no iommu_group found for %s: %s",
-   TYPE_VFIO_AP_DEVICE, vapdev->vdev.sysfsdev, 
gerror->message);
-g_error_free(gerror);
-return NULL;
-}
-
-if (sscanf(basename(group_path), "%d", &groupid) != 1) {
-error_setg(errp, "vfio: failed to read %s", group_path);
-g_free(group_path);
-return NULL;
-}
-
-g_free(group_path);
-
-return vfio_get_group(groupid, &address_space_memory, errp);
-}
-
 static void vfio_ap_req_notifier_handler(void *opaque)
 {
 VFIOAPDevice *vapdev = opaque;
@@ -189,22 +155,15 @@ static void vfio_ap_unregister_irq_notifier(VFIOAPDevice 
*vapdev,
 static void vfio_ap_realize(DeviceState *dev, Error **errp)
 {
 int ret;
-char *mdevid;
 Error *err = NULL;
-VFIOGroup *vfio_group;
 APDevice *apdev = AP_DEVICE(dev);
 VFIOAPDevice *vapdev = VFIO_AP_DEVICE(apdev);
+VFIODevice *vbasedev = &vapdev->vdev;
 
-vfio_group = vfio_ap_get_group(vapdev, errp);
-if (!vfio_group) {
-return;
-}
-
-vapdev->vdev.ops = &vfio_ap_ops;
-vapdev->vdev.type = VFIO_DEVICE_TYPE_AP;
-mdevid = basename(vapdev->vdev.sysfsdev);
-vapdev->vdev.name = g_strdup_printf("%s", mdevid);
-vapdev->vdev.dev = dev;
+vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
+vbasedev->ops = &vfio_ap_ops;
+vbasedev->type = VFIO_DEVICE_TYPE_AP;
+vbasedev->dev = dev;
 
 /*
  * vfio-ap devices operate in a way compatible with discarding of
@@ -214,9 +173,10 @@ static void vfio_ap_realize(DeviceState *dev, Error **errp)
  */
 vapdev->vdev.ram_block_discard_allowed = true;
 
-ret = vfio_get_device(vfio_group, mdevid, &vapdev->vdev, errp);
+ret = vfio_attach_device(vbasedev->name, vbasedev,
+ &address_space_memory, errp);
 if (ret) {
-goto out_get_dev_err;
+goto error;
 }
 
 vfio_ap_register_irq_notifier(vapdev, VFIO_AP_REQ_IRQ_INDEX, &err);
@@ -230,20 +190,19 @@ static void vfio_ap_realize(DeviceState *dev, Error 
**errp)
 
 return;
 
-out_get_dev_err:
-vfio_ap_put_device(vapdev);
-vfio_put_group(vfio_group);
+error:
+error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->name);
+g_free(vbasedev->name);
 }
 
 static void vfio_ap_unrealize(DeviceState *dev)
 {
 APDevice *apdev = AP_DEVICE(dev);
 VFIOAPDevice *vapdev = VFIO_AP_DEVICE(apdev);
-VFIOGroup *group = vapdev->vdev.group;
 
 vfio_ap_unregister_irq_notifier(vapdev, VFIO_AP_REQ_IRQ_INDEX);
-vfio_ap_put_device(vapdev);
-vfio_put_group(group);
+vfio_detach_device(&vapdev->vdev);
+g_free(vapdev->vdev.name);
 }
 
 static Property vfio_ap_properties[] = {
-- 
2.41.0

[PATCH v5 11/15] vfio/common: Move VFIO reset handler registration to a group agnostic function

2023-10-09 Thread Eric Auger

From: Zhenzhong Duan 

Move the reset handler registration/unregistration to a place that is not
group specific. vfio_[get/put]_address_space are the best places for that
purpose.
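
The "register with the first address space, unregister with the last" idea can be sketched as follows. This is only an illustration under simplifying assumptions: `Space` is a hypothetical stand-in for VFIOAddressSpace, the two stub functions stand in for qemu_register_reset() and qemu_unregister_reset(), and the container bookkeeping of the real code is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for VFIOAddressSpace; not the QEMU type. */
typedef struct Space {
    struct Space *next;
} Space;

static Space *spaces;            /* global list, empty at start */
static int handler_registered;   /* how many times the stub is registered */

/* Stubs standing in for qemu_register_reset()/qemu_unregister_reset(). */
static void register_reset_stub(void)   { handler_registered++; }
static void unregister_reset_stub(void) { handler_registered--; }

static Space *get_space(void)
{
    Space *s = calloc(1, sizeof(*s));

    assert(s);
    /* Register once, just before the first address space is inserted. */
    if (spaces == NULL) {
        register_reset_stub();
    }
    s->next = spaces;
    spaces = s;
    return s;
}

static void put_space(Space *s)
{
    Space **p = &spaces;

    while (*p && *p != s) {
        p = &(*p)->next;
    }
    assert(*p == s);
    *p = s->next;
    free(s);
    /* Unregister once the last address space is gone. */
    if (spaces == NULL) {
        unregister_reset_stub();
    }
}
```

Because the list of address spaces exists independently of groups, the handler's lifetime no longer depends on any group being present.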

Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Zhenzhong Duan 
Reviewed-by: Cédric Le Goater 
---
 hw/vfio/common.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 56cfe94d97..019da387d2 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1909,6 +1909,10 @@ static VFIOAddressSpace 
*vfio_get_address_space(AddressSpace *as)
 space->as = as;
 QLIST_INIT(&space->containers);
 
+if (QLIST_EMPTY(&vfio_address_spaces)) {
+qemu_register_reset(vfio_reset_handler, NULL);
+}
+
 QLIST_INSERT_HEAD(&vfio_address_spaces, space, list);
 
 return space;
@@ -1920,6 +1924,9 @@ static void vfio_put_address_space(VFIOAddressSpace 
*space)
 QLIST_REMOVE(space, list);
 g_free(space);
 }
+if (QLIST_EMPTY(&vfio_address_spaces)) {
+qemu_unregister_reset(vfio_reset_handler, NULL);
+}
 }
 
 /*
@@ -2385,10 +2392,6 @@ static VFIOGroup *vfio_get_group(int groupid, 
AddressSpace *as, Error **errp)
 goto close_fd_exit;
 }
 
-if (QLIST_EMPTY(&vfio_group_list)) {
-qemu_register_reset(vfio_reset_handler, NULL);
-}
-
 QLIST_INSERT_HEAD(&vfio_group_list, group, next);
 
 return group;
@@ -2417,10 +2420,6 @@ static void vfio_put_group(VFIOGroup *group)
 trace_vfio_put_group(group->fd);
 close(group->fd);
 g_free(group);
-
-if (QLIST_EMPTY(&vfio_group_list)) {
-qemu_unregister_reset(vfio_reset_handler, NULL);
-}
 }
 
 struct vfio_device_info *vfio_get_device_info(int fd)
-- 
2.41.0

[PATCH] buildsys: Only display Objective-C information when Objective-C is used

2023-10-09 Thread Philippe Mathieu-Daudé

When configuring with '--disable-cocoa --disable-coreaudio'
on Darwin, we get:

 meson.build:4081:58: ERROR: Tried to access compiler for language "objc", not 
specified for host machine.
 meson.build:4097:47: ERROR: Tried to access unknown option 'objc_args'.

Instead of unconditionally displaying Objective-C information
on Darwin, display it when Objective-C is discovered.

Signed-off-by: Philippe Mathieu-Daudé 
---
 meson.build | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index 3bb64b536c..da1a9a7228 100644
--- a/meson.build
+++ b/meson.build
@@ -4074,7 +4074,7 @@ if 'cpp' in all_languages
 else
   summary_info += {'C++ compiler':  false}
 endif
-if targetos == 'darwin'
+if 'objc' in all_languages
   summary_info += {'Objective-C compiler': ' 
'.join(meson.get_compiler('objc').cmd_array())}
 endif
 option_cflags = (get_option('debug') ? ['-g'] : [])
@@ -4085,7 +4085,7 @@ summary_info += {'CFLAGS':' 
'.join(get_option('c_args') + option_cfl
 if 'cpp' in all_languages
   summary_info += {'CXXFLAGS':' '.join(get_option('cpp_args') + 
option_cflags)}
 endif
-if targetos == 'darwin'
+if 'objc' in all_languages
   summary_info += {'OBJCFLAGS':   ' '.join(get_option('objc_args') + 
option_cflags)}
 endif
 link_args = get_option('c_link_args')
-- 
2.41.0

Re: [PATCH] buildsys: Only display Objective-C information when Objective-C is used

2023-10-09 Thread Akihiko Odaki


On 2023/10/09 18:13, Philippe Mathieu-Daudé wrote:

When configuring with '--disable-cocoa --disable-coreaudio'
on Darwin, we get:

  meson.build:4081:58: ERROR: Tried to access compiler for language "objc", not 
specified for host machine.
  meson.build:4097:47: ERROR: Tried to access unknown option 'objc_args'.

Instead of unconditionally displaying Objective-C information
on Darwin, display it when Objective-C is discovered.

Signed-off-by: Philippe Mathieu-Daudé 
---
  meson.build | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index 3bb64b536c..da1a9a7228 100644
--- a/meson.build
+++ b/meson.build
@@ -4074,7 +4074,7 @@ if 'cpp' in all_languages
  else
summary_info += {'C++ compiler':  false}
  endif
-if targetos == 'darwin'
+if 'objc' in all_languages
summary_info += {'Objective-C compiler': ' 
'.join(meson.get_compiler('objc').cmd_array())}
  endif


It would probably be kinder to emit "Objective-C compiler: false", as 
is done for C++, when the compiler is not available.

Re: [PATCH v4 2/4] qcow2: add configurations for zoned format extension

2023-10-09 Thread Sam Li

Hello Eric,

Eric Blake wrote on Thu, Sep 28, 2023 at 23:15:
>
> On Mon, Sep 18, 2023 at 05:53:11PM +0800, Sam Li wrote:
> > To configure the zoned format feature on the qcow2 driver, it
> > requires settings as: the device size, zone model, zone size,
> > zone capacity, number of conventional zones, limits on zone
> > resources (max append sectors, max open zones, and max_active_zones).
> >
> > To create a qcow2 file with zoned format, use command like this:
> > $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> > zone_size=64M -o zone_capacity=64M -o nr_conv_zones=0 -o
> > max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
> > -o zone_model=1
> >
> > Signed-off-by: Sam Li 
> > ---
> >  block/qcow2.c| 186 ++-
> >  block/qcow2.h|  28 +
> >  docs/interop/qcow2.txt   |  36 ++
> >  include/block/block_int-common.h |  13 +++
> >  qapi/block-core.json |  30 -
> >  5 files changed, 291 insertions(+), 2 deletions(-)
>
> Below, I'll focus only on the spec change, not the implementation:
>
> >
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index b48cd9ce63..521276fc51 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -73,6 +73,7 @@ typedef struct {
> >  #define  QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
> >  #define  QCOW2_EXT_MAGIC_BITMAPS 0x23852875
> >  #define  QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
> > +#define  QCOW2_EXT_MAGIC_ZONED_FORMAT 0x7a6264
>
> Why not spell it 0x007a6264 with 8 hex digits, like the others?  (I
> get why you choose that constant, though - ascii 'zbd')
>
> > +++ b/docs/interop/qcow2.txt
> > @@ -331,6 +331,42 @@ The fields of the bitmaps extension are:
> > Offset into the image file at which the bitmap directory
> > starts. Must be aligned to a cluster boundary.
> >
> > +== Zoned extension ==
>
> Where is the magic number for this extension called out?  That's
> missing, and MUST be part of the spec.

It's a part of the header extension type in the spec. I will add it.

>
> Back-compatibility constraints: you should consider what happens in
> both of the following cases:
>
> a program that intends to do read-only access to the qcow2 file but
> which does not understand this extension header (for example, an older
> version of 'qemu-img convert' being used to extract data from a newer
> .qcow2 file with this header present - but also the new 'nbdkit
> qcow2dec' decoder plugin just released in nbdkit 1.36).  Is it safe to
> read the data as-is, by basically ignoring zone information?  Or will
> that ever produce wrong data (for example, if operations on a
> particular zone imply that the guest should read all zeroes after the
> current zone offset within that zone, regardless of whether non-zero
> content was previously stored at those offsets - then not honoring the
> existence of the extension header would require you to add and
> document an incompatible feature bit so that reader apps fail to open
> the file rather than reading wrong data).
>
> a program that intends to edit the qcow2 file but which does not
> understand this extension header (again, consider access by an older
> version of qemu).  Is it safe to just write data anywhere in the disk,
> but where failure to update the zone metadata means that all
> subsequent use of the file MUST behave as if it is now a non-zoned
> device?  If so, then it is sufficient to document an autoclear feature
> bit: any time a newer qcow2 writer creates a file with a zoned
> extension, it also sets the autoclear feature bit; any time an older
> qcow2 writer edits a file with the autoclear bit, it clears the bit
> (because it has no idea if its edits invalidated the unknown
> extension).  Then when the new qcow2 program again accesses the file,
> it knows that the zone information is no longer reliable, and can fall
> back to forcing the image to behave as flat.

Consider access by an older version of qemu ('old qemu' for short) to a
qcow2 file created with the zoned extension ('new file' for short).
Reads from a new file by an old qemu that does not understand zone
information are safe. The zoned extension records the zone state of
every zone, which puts constraints on operations on the zones. For
example, writes to offsets beyond the capacity of a zone are not
allowed, and those offsets read as zeroes. The old qemu ignores that
and reads the new file as a regular one anyway.

However, it is unsafe for an old qemu program to edit a new file. The
new qemu will not see the write pointer changes that old qemu programs
made to the new file, so the zone information is no longer reliable, as
you illustrated.

Therefore I will add an autoclear bit for the latter case. Old qemu
programs clear the bit when they edit the file, which tells new qemu
that the zoned extension is no longer reliable.
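
The autoclear mechanism discussed here can be sketched in a few lines. The rule that a writer clears autoclear bits it does not recognise comes from the qcow2 specification; the bit position `AUTOCLEAR_ZONED`, the `Header` type and the helper names are hypothetical, since the thread does not assign a bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; the thread does not assign one. */
#define AUTOCLEAR_ZONED (UINT64_C(1) << 2)

/* Hypothetical, reduced view of the qcow2 header. */
typedef struct Header {
    uint64_t autoclear_features;
} Header;

/*
 * A program opening the image read-write clears every autoclear bit
 * it does not know about, as the qcow2 specification requires.
 */
static void open_read_write(Header *h, uint64_t known_bits)
{
    assert(h);
    h->autoclear_features &= known_bits;
}

/* A zone-aware program trusts the zone metadata only if the bit survived. */
static bool zone_info_reliable(const Header *h)
{
    return (h->autoclear_features & AUTOCLEAR_ZONED) != 0;
}
```

If `zone_info_reliable()` returns false, a zone-aware program would fall back to treating the image as a flat, non-zoned device, as suggested earlier in the thread.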

>
> > +
> > +The zoned extension is an optional header extension. It contains fields for
> > +emulati

[PATCH] hw/mips/malta: Use sdram_type enum from 'hw/i2c/smbus_eeprom.h'

2023-10-09 Thread Philippe Mathieu-Daudé

Since commit 93198b6cad ("i2c: Split smbus into parts") the SDRAM
types are enumerated as sdram_type in "hw/i2c/smbus_eeprom.h".

Using the enum removes this global shadow warning:

  hw/mips/malta.c:209:12: error: declaration shadows a variable in the global 
scope [-Werror,-Wshadow]
  enum { SDR = 0x4, DDR2 = 0x8 } type;
 ^
  include/hw/i2c/smbus_eeprom.h:33:19: note: previous declaration is here
  enum sdram_type { SDR = 0x4, DDR = 0x7, DDR2 = 0x8 };
^
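
A reduced illustration of the fix, assuming only what the warning above shows: the enum definition mirrors the line quoted from smbus_eeprom.h, while `pick_type()` and its threshold are hypothetical and do not reproduce the logic of generate_eeprom_spd().

```c
#include <assert.h>
#include <stdint.h>

/* File-scope enum, mirroring the line quoted from smbus_eeprom.h. */
enum sdram_type { SDR = 0x4, DDR = 0x7, DDR2 = 0x8 };

/*
 * Hypothetical helper: it reuses the file-scope enum instead of
 * declaring a local "enum { SDR = 0x4, DDR2 = 0x8 } type;", whose
 * enumerators would shadow the ones above and trigger -Wshadow.
 */
static enum sdram_type pick_type(uint32_t ram_size_mib)
{
    enum sdram_type type;

    /* The 512 MiB threshold is an arbitrary value for this sketch. */
    type = (ram_size_mib > 512) ? DDR2 : SDR;
    assert(type == SDR || type == DDR2);
    return type;
}
```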

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/mips/malta.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/mips/malta.c b/hw/mips/malta.c
index dac27fad9d..62d04ed113 100644
--- a/hw/mips/malta.c
+++ b/hw/mips/malta.c
@@ -206,7 +206,7 @@ static eeprom24c0x_t spd_eeprom = {
 
 static void generate_eeprom_spd(uint8_t *eeprom, ram_addr_t ram_size)
 {
-enum { SDR = 0x4, DDR2 = 0x8 } type;
+enum sdram_type type;
 uint8_t *spd = spd_eeprom.contents;
 uint8_t nbanks = 0;
 uint16_t density = 0;
-- 
2.41.0

[PATCH] target/sparc: Clean up global variable shadowing

2023-10-09 Thread Philippe Mathieu-Daudé

Fix:

  target/sparc/translate.c:2823:66: error: declaration shadows a variable in 
the global scope [-Werror,-Wshadow]
  static void gen_load_trap_state_at_tl(TCGv_ptr r_tsptr, TCGv_env tcg_env)
   ^
  include/tcg/tcg.h:579:17: note: previous declaration is here
  extern TCGv_env tcg_env;
  ^

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/sparc/translate.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index f92ff80ac8..26ed371109 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -2820,19 +2820,19 @@ static void gen_fmovq(DisasContext *dc, DisasCompare 
*cmp, int rd, int rs)
 }
 
 #ifndef CONFIG_USER_ONLY
-static void gen_load_trap_state_at_tl(TCGv_ptr r_tsptr, TCGv_env tcg_env)
+static void gen_load_trap_state_at_tl(TCGv_ptr r_tsptr, TCGv_env env)
 {
 TCGv_i32 r_tl = tcg_temp_new_i32();
 
 /* load env->tl into r_tl */
-tcg_gen_ld_i32(r_tl, tcg_env, offsetof(CPUSPARCState, tl));
+tcg_gen_ld_i32(r_tl, env, offsetof(CPUSPARCState, tl));
 
 /* tl = [0 ... MAXTL_MASK] where MAXTL_MASK must be power of 2 */
 tcg_gen_andi_i32(r_tl, r_tl, MAXTL_MASK);
 
 /* calculate offset to current trap state from env->ts, reuse r_tl */
 tcg_gen_muli_i32(r_tl, r_tl, sizeof (trap_state));
-tcg_gen_addi_ptr(r_tsptr, tcg_env, offsetof(CPUSPARCState, ts));
+tcg_gen_addi_ptr(r_tsptr, env, offsetof(CPUSPARCState, ts));
 
 /* tsptr = env->ts[env->tl & MAXTL_MASK] */
 {
-- 
2.41.0

Re: [PATCH] memory: drop needless argument

2023-10-09 Thread David Hildenbrand


On 09.10.23 09:52, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau 

The argument is unused since commit bdc44640c ("cpu: Use QTAILQ for CPU list").

Signed-off-by: Marc-André Lureau 
---


Reviewed-by: David Hildenbrand 

--
Cheers,

David / dhildenb

Re: [PATCH] memory: follow Error API guidelines

2023-10-09 Thread David Hildenbrand


On 09.10.23 09:53, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau 

Return true/false on success/failure.

Signed-off-by: Marc-André Lureau 
---


Reviewed-by: David Hildenbrand 

--
Cheers,

David / dhildenb

Re: [PATCH v2 03/21] preallocate: Don't poll during permission updates

2023-10-09 Thread Denis V. Lunev


On 10/6/23 20:10, Vladimir Sementsov-Ogievskiy wrote:

On 06.10.23 11:56, Kevin Wolf wrote:

On 05.10.2023 at 21:55, Vladimir Sementsov-Ogievskiy wrote:

On 11.09.23 12:46, Kevin Wolf wrote:
When the permission related BlockDriver callbacks are called, we are in
the middle of an operation traversing the block graph. Polling in such a
place is a very bad idea because the graph could change in unexpected
ways. In the future, callers will also hold the graph lock, which is
likely to turn polling into a deadlock.

So we need to get rid of calls to functions like bdrv_getlength() or
bdrv_truncate() there as these functions poll internally. They are
currently used so that when no parent has write/resize permissions on
the image any more, the preallocate filter drops the extra preallocated
area in the image file and gives up write/resize permissions itself.

In order to achieve this without polling in .bdrv_check_perm, don't
immediately truncate the image, but only schedule a BH to do so. The
filter keeps the write/resize permissions a bit longer now until the BH
has executed.

There is one case in which delaying doesn't work: Reopening the image
read-only. In this case, bs->file will likely be reopened read-only,
too, so keeping write permissions a bit longer on it doesn't work. But
we can already cover this case in preallocate_reopen_prepare() and not
rely on the permission updates for it.


Hmm, now I found one more "future" case.

I now try to rebase my "[PATCH v7 0/7] blockdev-replace"
https://patchew.org/QEMU/20230421114102.884457-1-vsement...@yandex-team.ru/ 



And it breaks after this commit.

By accident, blockdev-replace series uses exactly "preallocate" filter
to test insertion/removing of filters. And removing is broken now.

Removing is done as follows:

1. We have filter inserted: disk0 -file-> filter -file-> file0

2. blockdev-replace, replaces file child of disk0, so we should get 
the picture*: disk0 -file-> file0 <-file- filter


3. blockdev-del filter


But step [2] fails, as now preallocate filter doesn't drop permissions
during the operation (postponing this for a while) and the picture* is
impossible. Permission check fails.

Hmmm... Any idea how blockdev-replace and preallocate filter should
work :) ? Maybe, doing truncation in .drain_begin() will help? Will
try


Hm... What preallocate tries to do is really tricky...

Of course, the error is correct, this is an invalid configuration if
preallocate can still resize the image. So it would have to truncate the
file earlier, but the first time that preallocate knows of the change is
already too late to run requests.

Truncating on drain_begin feels more like a hack, but as long as it does
the job... Of course, this will have the preallocation truncated away on
events that have nothing to do with removing the filter. It's not
necessarily a disaster because preallocation is only an optimisation,
but it doesn't feel great.


Hmm, yes, that's not good.



Maybe let's take a step back: Which scenario is the preallocate driver
meant for and why do we even need to truncate the image file after
removing the filter? I suppose the filter doesn't make sense with raw
images because these are fixed size anyway, and pretty much any other
image format should be able to tolerate a permanently rounded up file
size. As long as you don't write to the preallocated area, it shouldn't
take space either on any sane filesystem.

Hmm, actually both VHD and VMDK can have footers, better avoid it with
those... But if truncating the image file on close is critical, what do
you do on crashes? Maybe preallocate should just not be considered
compatible with these formats?



Originally preallocate filter was made to be used with qcow2, on some
proprietary storage, where:

1. Allocating one big chunk works a lot faster than allocating several
   smaller chunks
2. Holes are not free and/or file length is not free, so we really
   want to truncate the file back on close


Den, correct me if I'm wrong.


1. Absolutely correct. This is true when the file attributes
    are stored in a centralized place, aka metadata storage,
    and requests to it do not scale well.

2. This, in my opinion, has a different meaning. We have
    tried to make local storage behavior and distributed
    storage behavior the same when the VM is off, i.e.
    the file should be in the same state (no free blocks
    at the end of the file).



Good thing is that in this scenario we don't need to remove the filter 
in runtime, so there is no problem.



Yes, this filter is not dynamic in that respect. It is either
here or not here.




Now I think that the generic solution is just to add a new handler 
.bdrv_pre_replace, so blockdev-replace may work as follows:


drain_begin

call .bdrv_pre_replace for all affected nodes

do the replace

drain_end

And the preallocate filter would do truncation in this .bdrv_pre_replace 
handler and set some flag, that we have nothing to truncate (the flag 
is a

Re: vIOMMU - PCI pass through to Layer 2 VMs (Nested Virtualization)

2023-10-09 Thread Eric Auger

Hi Markus,

On 10/9/23 09:06, Markus Frank wrote:
> Hello,
> 
> I have already sent this email to qemu-discuss but I did not get a reply.
> https://lists.nongnu.org/archive/html/qemu-discuss/2023-09/msg00034.html
> Maybe someone here could help me and reply to this email or the one on
> qemu-discuss?
> 
> I would like to pass through PCI devices to Layer-2 VMs via Nested
> Virtualization.
> 
> Is there current documentation for this topic somewhere?
> 
> I used these parameters:
> -machine ...,kernel-irqchip=split
> -device intel-iommu
> 
> With these parameters PCI pass through to L2-VMs worked fine.
> 
> 
> Now I come to the part where I get confused.
> 
> https://wiki.qemu.org/Features/VT-d#With_Virtio_Devices
> Is this documentation relevant for PCI pass through? Do I need DMAR for
> virtio devices?
If you just want the host assigned devices to be protected by the
viommu, you don't need to add iommu_platform=on along with the
virtio-pci devices.
> 
> And there is also the virtio-iommu device where I also could use the
> i440fx chipset.
> https://michael2012z.medium.com/virtio-iommu-789369049443

you can use virtio-iommu with q35 machine.
> 
> When adding "-device virtio-iommu-pci" pci pass through also works
> but I get "kvm: virtio_iommu_translate no mapping for 0x1002030f000 for
> sid=240"
> when starting qemu. What could that mean?
Normally you shouldn't get any such error. This means there is no
mapping programmed by the iommu-driver for this requester id (0x240) and
this iova=0x1002030f000. But if I understand correctly this does not
prevent your device from working, correct?
> 
> What do these parameters
> "disable-legacy=on,disable-modern=off,iommu_platform=on,ats=on"
> actually do? When do I need them and on which virtio devices?
you need them if you want your virtio devices to be protected by the
viommu. Otherwise the viommu is bypassed.
> 
> And which device should I rather use: virtio-iommu or intel-iommu?
Both should be working. virtio-iommu is more recent and less used in
production than intel-iommu though.

Thanks

Eric
> 
> Thanks in advance,
> Markus
> 
>

[PATCH] system/vl: Use global &bdo_queue in configure_blockdev()

2023-10-09 Thread Philippe Mathieu-Daudé

Commit d11bf9bf0f ("vl: Factor configure_blockdev() out of main()")
passed &bdo_queue as argument, but this isn't really necessary since
there is only one call, so we still use the global variable.

Dropping the &bdo_queue argument silences this global shadow
warning:

  softmmu/vl.c:678:54: error: declaration shadows a variable in the global 
scope [-Werror,-Wshadow]
  static void configure_blockdev(BlockdevOptionsQueue *bdo_queue,
   ^
  softmmu/vl.c:172:29: note: previous declaration is here
  static BlockdevOptionsQueue bdo_queue = QSIMPLEQ_HEAD_INITIALIZER(bdo_queue);
  ^

Remove a spurious empty line.
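
The shadowing reported above can be reproduced in a few lines. The following is a minimal sketch, not the QEMU code: Queue, pending and both drain functions are hypothetical names chosen for illustration. Building it with -Wshadow should flag the first function only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature queue; not QEMU's BlockdevOptionsQueue. */
typedef struct Queue {
    int items[8];
    size_t len;
} Queue;

static Queue pending;   /* file-scope variable, playing the role of bdo_queue */

/*
 * Before: the parameter carries the same name as the file-scope variable,
 * so inside this function "pending" refers to the parameter. This is what
 * -Wshadow complains about.
 */
static size_t drain_shadowing(Queue *pending)
{
    size_t n = pending->len;

    pending->len = 0;
    return n;
}

/* After: with a single caller, drop the parameter and use the global. */
static size_t drain_global(void)
{
    size_t n = pending.len;

    pending.len = 0;
    return n;
}
```

Both variants behave the same when the caller passes the address of the global; the second one simply removes the name that hides it.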

Signed-off-by: Philippe Mathieu-Daudé 
---
 softmmu/vl.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/softmmu/vl.c b/softmmu/vl.c
index 98e071e63b..bc283b9fd4 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -674,8 +674,7 @@ static void default_drive(int enable, int snapshot, 
BlockInterfaceType type,
 
 }
 
-static void configure_blockdev(BlockdevOptionsQueue *bdo_queue,
-   MachineClass *machine_class, int snapshot)
+static void configure_blockdev(MachineClass *machine_class, int snapshot)
 {
 /*
  * If the currently selected machine wishes to override the
@@ -688,10 +687,10 @@ static void configure_blockdev(BlockdevOptionsQueue 
*bdo_queue,
 }
 
 /* open the virtual block devices */
-while (!QSIMPLEQ_EMPTY(bdo_queue)) {
-BlockdevOptionsQueueEntry *bdo = QSIMPLEQ_FIRST(bdo_queue);
+while (!QSIMPLEQ_EMPTY(&bdo_queue)) {
+BlockdevOptionsQueueEntry *bdo = QSIMPLEQ_FIRST(&bdo_queue);
 
-QSIMPLEQ_REMOVE_HEAD(bdo_queue, entry);
+QSIMPLEQ_REMOVE_HEAD(&bdo_queue, entry);
 loc_push_restore(&bdo->loc);
 qmp_blockdev_add(bdo->bdo, &error_fatal);
 loc_pop(&bdo->loc);
@@ -712,7 +711,6 @@ static void configure_blockdev(BlockdevOptionsQueue 
*bdo_queue,
   CDROM_OPTS);
 default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
 default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
-
 }
 
 static QemuOptsList qemu_smp_opts = {
@@ -1961,7 +1959,7 @@ static void qemu_create_early_backends(void)
  * Note: we need to create audio and block backends before
  * setting machine properties, so they can be referred to.
  */
-configure_blockdev(&bdo_queue, machine_class, snapshot);
+configure_blockdev(machine_class, snapshot);
 audio_init_audiodevs();
 }
 
-- 
2.41.0

[PATCH v2] buildsys: Only display Objective-C information when Objective-C is used

2023-10-09 Thread Philippe Mathieu-Daudé

When configuring with '--disable-cocoa --disable-coreaudio'
on Darwin, we get:

 meson.build:4081:58: ERROR: Tried to access compiler for language "objc", not 
specified for host machine.
 meson.build:4097:47: ERROR: Tried to access unknown option 'objc_args'.

Instead of unconditionally displaying Objective-C information
on Darwin, display it when Objective-C is discovered.

Signed-off-by: Philippe Mathieu-Daudé 
---
v2: Emit 'false' (Akihiko)
---
 meson.build | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index 3bb64b536c..567c1c9add 100644
--- a/meson.build
+++ b/meson.build
@@ -4074,8 +4074,10 @@ if 'cpp' in all_languages
 else
   summary_info += {'C++ compiler':  false}
 endif
-if targetos == 'darwin'
+if 'objc' in all_languages
   summary_info += {'Objective-C compiler': ' 
'.join(meson.get_compiler('objc').cmd_array())}
+else
+  summary_info += {'Objective-C compiler': false}
 endif
 option_cflags = (get_option('debug') ? ['-g'] : [])
 if get_option('optimization') != 'plain'
@@ -4085,7 +4087,7 @@ summary_info += {'CFLAGS':' 
'.join(get_option('c_args') + option_cfl
 if 'cpp' in all_languages
   summary_info += {'CXXFLAGS':' '.join(get_option('cpp_args') + 
option_cflags)}
 endif
-if targetos == 'darwin'
+if 'objc' in all_languages
   summary_info += {'OBJCFLAGS':   ' '.join(get_option('objc_args') + 
option_cflags)}
 endif
 link_args = get_option('c_link_args')
-- 
2.41.0

Re: [PATCH v2] buildsys: Only display Objective-C information when Objective-C is used

2023-10-09 Thread Akihiko Odaki


On 2023/10/09 18:38, Philippe Mathieu-Daudé wrote:

When configuring with '--disable-cocoa --disable-coreaudio'
on Darwin, we get:

  meson.build:4081:58: ERROR: Tried to access compiler for language "objc", not 
specified for host machine.
  meson.build:4097:47: ERROR: Tried to access unknown option 'objc_args'.

Instead of unconditionally displaying Objective-C information
on Darwin, display it when Objective-C is discovered.

Signed-off-by: Philippe Mathieu-Daudé 
---
v2: Emit 'false' (Akihiko)
---
  meson.build | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index 3bb64b536c..567c1c9add 100644
--- a/meson.build
+++ b/meson.build
@@ -4074,8 +4074,10 @@ if 'cpp' in all_languages
  else
summary_info += {'C++ compiler':  false}
  endif
-if targetos == 'darwin'
+if 'objc' in all_languages
summary_info += {'Objective-C compiler': ' 
'.join(meson.get_compiler('objc').cmd_array())}
+else
+  summary_info += {'Objective-C compiler': false}
  endif
  option_cflags = (get_option('debug') ? ['-g'] : [])
  if get_option('optimization') != 'plain'
@@ -4085,7 +4087,7 @@ summary_info += {'CFLAGS':' 
'.join(get_option('c_args') + option_cfl
  if 'cpp' in all_languages
summary_info += {'CXXFLAGS':' '.join(get_option('cpp_args') + 
option_cflags)}
  endif
-if targetos == 'darwin'
+if 'objc' in all_languages
summary_info += {'OBJCFLAGS':   ' '.join(get_option('objc_args') + 
option_cflags)}
  endif
  link_args = get_option('c_link_args')


Reviewed-by: Akihiko Odaki

[PATCH v2 08/10] blockjob: query driver-specific info via a new 'query' driver method

2023-10-09 Thread Fiona Ebner

Signed-off-by: Fiona Ebner 
---

No changes in v2.

 blockjob.c   | 4 
 include/block/blockjob_int.h | 5 +
 2 files changed, 9 insertions(+)

diff --git a/blockjob.c b/blockjob.c
index f8cf6e58e2..7e8cfad0fd 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -376,6 +376,7 @@ BlockJobInfo *block_job_query_locked(BlockJob *job, Error 
**errp)
 {
 BlockJobInfo *info;
 uint64_t progress_current, progress_total;
+const BlockJobDriver *drv = block_job_driver(job);
 
 GLOBAL_STATE_CODE();
 
@@ -405,6 +406,9 @@ BlockJobInfo *block_job_query_locked(BlockJob *job, Error 
**errp)
 g_strdup(error_get_pretty(job->job.err)) :
 g_strdup(strerror(-job->job.ret));
 }
+if (drv->query) {
+drv->query(job, info);
+}
 return info;
 }
 
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index f604985315..4ab88b3c97 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -72,6 +72,11 @@ struct BlockJobDriver {
  * Change the @job's options according to @opts.
  */
 void (*change)(BlockJob *job, BlockJobChangeOptions *opts, Error **errp);
+
+/*
+ * Query information specific to this kind of block job.
+ */
+void (*query)(BlockJob *job, BlockJobInfo *info);
 };
 
 /*
-- 
2.39.2

[PATCH v2 06/10] qapi/block-core: use JobType for BlockJobInfo's type

2023-10-09 Thread Fiona Ebner

In preparation to turn BlockJobInfo into a union with @type as the
discriminator. That requires it to be an enum.

No functional change is intended.
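
As a rough illustration of why an enum discriminator is the natural fit, here is a hand-written tagged union in plain C. It is a sketch under assumptions: the Demo* names are hypothetical, the job type list is cut down, and the code the QAPI generator actually emits looks different.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down job types; the real JobType has more members. */
typedef enum DemoJobType {
    DEMO_JOB_TYPE_STREAM,
    DEMO_JOB_TYPE_MIRROR,
    DEMO_JOB_TYPE__MAX,
} DemoJobType;

static const char *const demo_job_type_names[DEMO_JOB_TYPE__MAX] = {
    [DEMO_JOB_TYPE_STREAM] = "stream",
    [DEMO_JOB_TYPE_MIRROR] = "mirror",
};

/* Enum to string, needed wherever the type used to be printed directly. */
static const char *demo_job_type_str(DemoJobType type)
{
    assert((unsigned)type < (unsigned)DEMO_JOB_TYPE__MAX);
    return demo_job_type_names[type];
}

typedef struct DemoJobInfo {
    DemoJobType type;   /* discriminator: selects the valid union member */
    long offset;
    long len;
    union {
        struct {
            bool actively_synced;
        } mirror;       /* only meaningful when type is MIRROR */
    } u;
} DemoJobInfo;

/* An integer comparison replaces the former string comparison. */
static bool demo_is_stream(const DemoJobInfo *info)
{
    return info->type == DEMO_JOB_TYPE_STREAM;
}
```

With a string-typed field there would be no closed set of values to switch on, which is why the type change has to come before the union conversion.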

Signed-off-by: Fiona Ebner 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---

No changes in v2.

 block/monitor/block-hmp-cmds.c | 4 ++--
 blockjob.c | 2 +-
 qapi/block-core.json   | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index ca2599de44..f9f87e5c47 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -843,7 +843,7 @@ void hmp_info_block_jobs(Monitor *mon, const QDict *qdict)
 }
 
 while (list) {
-if (strcmp(list->value->type, "stream") == 0) {
+if (list->value->type == JOB_TYPE_STREAM) {
 monitor_printf(mon, "Streaming device %s: Completed %" PRId64
" of %" PRId64 " bytes, speed limit %" PRId64
" bytes/s\n",
@@ -855,7 +855,7 @@ void hmp_info_block_jobs(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "Type %s, device %s: Completed %" PRId64
" of %" PRId64 " bytes, speed limit %" PRId64
" bytes/s\n",
-   list->value->type,
+   JobType_str(list->value->type),
list->value->device,
list->value->offset,
list->value->len,
diff --git a/blockjob.c b/blockjob.c
index d53bc775d2..f8cf6e58e2 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -388,7 +388,7 @@ BlockJobInfo *block_job_query_locked(BlockJob *job, Error 
**errp)
   &progress_total);
 
 info = g_new0(BlockJobInfo, 1);
-info->type  = g_strdup(job_type_str(&job->job));
+info->type  = job_type(&job->job);
 info->device= g_strdup(job->job.id);
 info->busy  = job->job.busy;
 info->paused= job->job.pause_count > 0;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 01427c259a..a19718a69f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1396,7 +1396,7 @@
 # Since: 1.1
 ##
 { 'struct': 'BlockJobInfo',
-  'data': {'type': 'str', 'device': 'str', 'len': 'int',
+  'data': {'type': 'JobType', 'device': 'str', 'len': 'int',
'offset': 'int', 'busy': 'bool', 'paused': 'bool', 'speed': 'int',
'io-status': 'BlockDeviceIoStatus', 'ready': 'bool',
'status': 'JobStatus',
-- 
2.39.2

[PATCH v2 07/10] qapi/block-core: turn BlockJobInfo into a union

2023-10-09 Thread Fiona Ebner

In preparation to additionally return job-type-specific information.

Signed-off-by: Fiona Ebner 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---

No changes in v2.

 qapi/block-core.json | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index a19718a69f..950542b735 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1395,13 +1395,15 @@
 #
 # Since: 1.1
 ##
-{ 'struct': 'BlockJobInfo',
-  'data': {'type': 'JobType', 'device': 'str', 'len': 'int',
+{ 'union': 'BlockJobInfo',
+  'base': {'type': 'JobType', 'device': 'str', 'len': 'int',
'offset': 'int', 'busy': 'bool', 'paused': 'bool', 'speed': 'int',
'io-status': 'BlockDeviceIoStatus', 'ready': 'bool',
'status': 'JobStatus',
'auto-finalize': 'bool', 'auto-dismiss': 'bool',
-   '*error': 'str' } }
+   '*error': 'str' },
+  'discriminator': 'type',
+  'data': {} }
 
 ##
 # @query-block-jobs:
-- 
2.39.2

[PATCH v2 04/10] block/mirror: determine copy_to_target only once

2023-10-09 Thread Fiona Ebner

In preparation to allow changing the copy_mode via QMP. When running
in an iothread, it could be that copy_mode is changed from the main
thread in between reading copy_mode in bdrv_mirror_top_pwritev() and
reading copy_mode in bdrv_mirror_top_do_write(), so they might end up
disagreeing about whether copy_to_target is true or false. Avoid that
scenario by determining copy_to_target only once and passing it to
bdrv_mirror_top_do_write() as an argument.
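
The idea can be shown with a small single-threaded sketch. Everything here is a hypothetical stand-in (DemoJob, demo_write() and friends are not the mirror code), and the "concurrent" mode switch is simulated inline rather than done from a second thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the job state; not the QEMU structures. */
typedef enum { MODE_BACKGROUND, MODE_WRITE_BLOCKING } DemoCopyMode;

typedef struct DemoJob {
    DemoCopyMode copy_mode;   /* in the real case another thread may change it */
    int ret;
} DemoJob;

typedef struct DemoWriteResult {
    bool used_bounce_buffer;  /* decided by the outer function */
    bool copied_to_target;    /* decided by the inner helper */
} DemoWriteResult;

/* The single place where the decision is made. */
static bool demo_should_copy_to_target(const DemoJob *job)
{
    return job && job->ret >= 0 && job->copy_mode == MODE_WRITE_BLOCKING;
}

/* The helper trusts the caller's decision and never re-reads copy_mode. */
static void demo_do_write(bool copy_to_target, DemoWriteResult *res)
{
    res->copied_to_target = copy_to_target;
}

static DemoWriteResult demo_write(DemoJob *job)
{
    DemoWriteResult res = { false, false };
    bool copy_to_target = demo_should_copy_to_target(job);  /* read once */

    res.used_bounce_buffer = copy_to_target;

    /* Simulate a mode switch that lands between the two steps. */
    if (job) {
        job->copy_mode = MODE_WRITE_BLOCKING;
    }

    demo_do_write(copy_to_target, &res);
    return res;
}
```

Because the flag is computed once and passed down, the bounce-buffer decision and the copy decision always agree, even if the mode changes in between.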

Signed-off-by: Fiona Ebner 
---

New in v2.

 block/mirror.c | 41 ++---
 1 file changed, 18 insertions(+), 23 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 0ed54754e2..8992c09172 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1463,21 +1463,21 @@ bdrv_mirror_top_preadv(BlockDriverState *bs, int64_t 
offset, int64_t bytes,
 return bdrv_co_preadv(bs->backing, offset, bytes, qiov, flags);
 }
 
+static bool should_copy_to_target(MirrorBDSOpaque *s)
+{
+return s->job && s->job->ret >= 0 &&
+!job_is_cancelled(&s->job->common.job) &&
+s->job->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING;
+}
+
 static int coroutine_fn GRAPH_RDLOCK
 bdrv_mirror_top_do_write(BlockDriverState *bs, MirrorMethod method,
- uint64_t offset, uint64_t bytes, QEMUIOVector *qiov,
- int flags)
+ bool copy_to_target, uint64_t offset, uint64_t bytes,
+ QEMUIOVector *qiov, int flags)
 {
 MirrorOp *op = NULL;
 MirrorBDSOpaque *s = bs->opaque;
 int ret = 0;
-bool copy_to_target = false;
-
-if (s->job) {
-copy_to_target = s->job->ret >= 0 &&
- !job_is_cancelled(&s->job->common.job) &&
- s->job->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING;
-}
 
 if (copy_to_target) {
 op = active_write_prepare(s->job, offset, bytes);
@@ -1523,17 +1523,10 @@ static int coroutine_fn GRAPH_RDLOCK
 bdrv_mirror_top_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
 QEMUIOVector *qiov, BdrvRequestFlags flags)
 {
-MirrorBDSOpaque *s = bs->opaque;
 QEMUIOVector bounce_qiov;
 void *bounce_buf;
 int ret = 0;
-bool copy_to_target = false;
-
-if (s->job) {
-copy_to_target = s->job->ret >= 0 &&
- !job_is_cancelled(&s->job->common.job) &&
- s->job->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING;
-}
+bool copy_to_target = should_copy_to_target(bs->opaque);
 
 if (copy_to_target) {
 /* The guest might concurrently modify the data to write; but
@@ -1550,8 +1543,8 @@ bdrv_mirror_top_pwritev(BlockDriverState *bs, int64_t 
offset, int64_t bytes,
 flags &= ~BDRV_REQ_REGISTERED_BUF;
 }
 
-ret = bdrv_mirror_top_do_write(bs, MIRROR_METHOD_COPY, offset, bytes, qiov,
-   flags);
+ret = bdrv_mirror_top_do_write(bs, MIRROR_METHOD_COPY, copy_to_target,
+   offset, bytes, qiov, flags);
 
 if (copy_to_target) {
 qemu_iovec_destroy(&bounce_qiov);
@@ -1574,15 +1567,17 @@ static int coroutine_fn GRAPH_RDLOCK
 bdrv_mirror_top_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
   int64_t bytes, BdrvRequestFlags flags)
 {
-return bdrv_mirror_top_do_write(bs, MIRROR_METHOD_ZERO, offset, bytes, 
NULL,
-flags);
+bool copy_to_target = should_copy_to_target(bs->opaque);
+return bdrv_mirror_top_do_write(bs, MIRROR_METHOD_ZERO, copy_to_target,
+offset, bytes, NULL, flags);
 }
 
 static int coroutine_fn GRAPH_RDLOCK
 bdrv_mirror_top_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
-return bdrv_mirror_top_do_write(bs, MIRROR_METHOD_DISCARD, offset, bytes,
-NULL, 0);
+bool copy_to_target = should_copy_to_target(bs->opaque);
+return bdrv_mirror_top_do_write(bs, MIRROR_METHOD_DISCARD, copy_to_target,
+offset, bytes, NULL, 0);
 }
 
 static void bdrv_mirror_top_refresh_filename(BlockDriverState *bs)
-- 
2.39.2

[PATCH 1/6] hw/core/cpu: Clean up global variable shadowing

2023-10-09 Thread Philippe Mathieu-Daudé

Fix:

  hw/core/machine.c:1302:22: error: declaration shadows a variable in the 
global scope [-Werror,-Wshadow]
  const CPUArchId *cpus = possible_cpus->cpus;
   ^
  hw/core/numa.c:69:17: error: declaration shadows a variable in the global 
scope [-Werror,-Wshadow]
  uint16List *cpus = NULL;
  ^
  hw/acpi/aml-build.c:2005:20: error: declaration shadows a variable in the 
global scope [-Werror,-Wshadow]
  CPUArchIdList *cpus = ms->possible_cpus;
 ^
  hw/core/machine-smp.c:77:14: error: declaration shadows a variable in the 
global scope [-Werror,-Wshadow]
  unsigned cpus= config->has_cpus ? config->cpus : 0;
   ^
  include/hw/core/cpu.h:589:17: note: previous declaration is here
  extern CPUTailQ cpus;
  ^

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/core/cpu.h | 8 
 cpu-common.c  | 6 +++---
 target/s390x/cpu_models.c | 2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index e02bc5980f..d0dc0a1698 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -586,13 +586,13 @@ static inline CPUArchState *cpu_env(CPUState *cpu)
 }
 
 typedef QTAILQ_HEAD(CPUTailQ, CPUState) CPUTailQ;
-extern CPUTailQ cpus;
+extern CPUTailQ cpus_queue;
 
-#define first_cpuQTAILQ_FIRST_RCU(&cpus)
+#define first_cpuQTAILQ_FIRST_RCU(&cpus_queue)
 #define CPU_NEXT(cpu)QTAILQ_NEXT_RCU(cpu, node)
-#define CPU_FOREACH(cpu) QTAILQ_FOREACH_RCU(cpu, &cpus, node)
+#define CPU_FOREACH(cpu) QTAILQ_FOREACH_RCU(cpu, &cpus_queue, node)
 #define CPU_FOREACH_SAFE(cpu, next_cpu) \
-QTAILQ_FOREACH_SAFE_RCU(cpu, &cpus, node, next_cpu)
+QTAILQ_FOREACH_SAFE_RCU(cpu, &cpus_queue, node, next_cpu)
 
 extern __thread CPUState *current_cpu;
 
diff --git a/cpu-common.c b/cpu-common.c
index 45c745ecf6..c81fd72d16 100644
--- a/cpu-common.c
+++ b/cpu-common.c
@@ -73,7 +73,7 @@ static int cpu_get_free_index(void)
 return max_cpu_index;
 }
 
-CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);
+CPUTailQ cpus_queue = QTAILQ_HEAD_INITIALIZER(cpus_queue);
 static unsigned int cpu_list_generation_id;
 
 unsigned int cpu_list_generation_id_get(void)
@@ -90,7 +90,7 @@ void cpu_list_add(CPUState *cpu)
 } else {
 assert(!cpu_index_auto_assigned);
 }
-QTAILQ_INSERT_TAIL_RCU(&cpus, cpu, node);
+QTAILQ_INSERT_TAIL_RCU(&cpus_queue, cpu, node);
 cpu_list_generation_id++;
 }
 
@@ -102,7 +102,7 @@ void cpu_list_remove(CPUState *cpu)
 return;
 }
 
-QTAILQ_REMOVE_RCU(&cpus, cpu, node);
+QTAILQ_REMOVE_RCU(&cpus_queue, cpu, node);
 cpu->cpu_index = UNASSIGNED_CPU_INDEX;
 cpu_list_generation_id++;
 }
diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index 98f14c09c2..b1e77b3a2b 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -756,7 +756,7 @@ void s390_set_qemu_cpu_model(uint16_t type, uint8_t gen, 
uint8_t ec_ga,
 const S390CPUDef *def = s390_find_cpu_def(type, gen, ec_ga, NULL);
 
 g_assert(def);
-g_assert(QTAILQ_EMPTY_RCU(&cpus));
+g_assert(QTAILQ_EMPTY_RCU(&cpus_queue));
 
 /* build the CPU model */
 s390_qemu_cpu_model.def = def;
-- 
2.41.0

[PATCH v2 00/10] mirror: allow switching from background to active mode

2023-10-09 Thread Fiona Ebner

Changes in v2:
* move bitmap to filter, which avoids draining when
  changing the copy mode
* add patch to determine copy_to_target only once
* drop patches returning redundant information upon query
* update QEMU version in QAPI
* update indentation in QAPI
* update indentation in QAPI (like in a937b6aa73 ("qapi: Reformat
  doc comments to conform to current conventions"))
* add patch to adapt iotest output

Discussion of v1:
https://lists.nongnu.org/archive/html/qemu-devel/2023-02/msg07216.html

With active mode, the guest write speed is limited by the synchronous
writes to the mirror target. For this reason, management applications
might want to start out in background mode and only switch to active
mode later, when certain conditions are met. This series adds a
block-job-change QMP command to achieve that, as well as
job-type-specific information when querying block jobs, which
can be used to decide when the switch should happen.

For now, only the direction background -> active is supported.

The information added upon querying is whether the target is actively
synced, the total data sent, and the remaining dirty bytes.

Initially, I tried to go for a more general 'job-change' command, but
I couldn't figure out a way to avoid mutual inclusion between
block-core.json and job.json.


Fiona Ebner (10):
  blockjob: introduce block-job-change QMP command
  block/mirror: set actively_synced even after the job is ready
  block/mirror: move dirty bitmap to filter
  block/mirror: determine copy_to_target only once
  mirror: implement mirror_change method
  qapi/block-core: use JobType for BlockJobInfo's type
  qapi/block-core: turn BlockJobInfo into a union
  blockjob: query driver-specific info via a new 'query' driver method
  mirror: return mirror-specific information upon query
  iotests: adapt test output for new mirror query property

 block/mirror.c | 95 +++---
 block/monitor/block-hmp-cmds.c |  4 +-
 blockdev.c | 14 +
 blockjob.c | 26 +-
 include/block/blockjob.h   | 11 
 include/block/blockjob_int.h   | 10 
 job.c  |  1 +
 qapi/block-core.json   | 59 +++--
 qapi/job.json  |  4 +-
 tests/qemu-iotests/109.out | 24 -
 10 files changed, 199 insertions(+), 49 deletions(-)

-- 
2.39.2

[PATCH v2 01/10] blockjob: introduce block-job-change QMP command

2023-10-09 Thread Fiona Ebner

which will allow changing job-type-specific options after job
creation.

In the JobVerbTable, the same allow bits as for set-speed are used,
because set-speed can be considered an existing change command.
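
A reduced model of such a verb table may help when reading the job.c hunk further down. This is a sketch with hypothetical names and a deliberately small set of states and verbs; the real table has more of both and a different state order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced sets of states and verbs. */
typedef enum {
    ST_CREATED, ST_RUNNING, ST_PAUSED, ST_CONCLUDED, ST__MAX
} DemoStatus;

typedef enum {
    VERB_SET_SPEED, VERB_CHANGE, VERB_DISMISS, VERB__MAX
} DemoVerb;

/* One row per verb, one column per state; 1 means the verb is allowed. */
static const bool demo_verb_table[VERB__MAX][ST__MAX] = {
    /*                   CREATED RUNNING PAUSED CONCLUDED */
    [VERB_SET_SPEED] = { 1,      1,      1,     0 },
    [VERB_CHANGE]    = { 1,      1,      1,     0 },
    [VERB_DISMISS]   = { 0,      0,      0,     1 },
};

/* Returns 0 if the verb may be applied in this state, -1 otherwise. */
static int demo_apply_verb(DemoStatus status, DemoVerb verb)
{
    assert((unsigned)status < (unsigned)ST__MAX);
    assert((unsigned)verb < (unsigned)VERB__MAX);
    return demo_verb_table[verb][status] ? 0 : -1;
}
```

Giving the change verb the same row as set-speed means a change request is accepted in exactly the states where a speed change already is.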

Signed-off-by: Fiona Ebner 
---

Changes in v2:
* update QEMU version in QAPI
* fix typo in function comment

 blockdev.c   | 14 ++
 blockjob.c   | 20 
 include/block/blockjob.h | 11 +++
 include/block/blockjob_int.h |  5 +
 job.c|  1 +
 qapi/block-core.json | 26 ++
 qapi/job.json|  4 +++-
 7 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 325b7a3bef..d0e274ff8b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3344,6 +3344,20 @@ void qmp_block_job_dismiss(const char *id, Error **errp)
 job_dismiss_locked(&job, errp);
 }
 
+void qmp_block_job_change(BlockJobChangeOptions *opts, Error **errp)
+{
+BlockJob *job;
+
+JOB_LOCK_GUARD();
+job = find_block_job_locked(opts->id, errp);
+
+if (!job) {
+return;
+}
+
+block_job_change_locked(job, opts, errp);
+}
+
 void qmp_change_backing_file(const char *device,
  const char *image_node_name,
  const char *backing_file,
diff --git a/blockjob.c b/blockjob.c
index 58c5d64539..d53bc775d2 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -328,6 +328,26 @@ static bool block_job_set_speed(BlockJob *job, int64_t 
speed, Error **errp)
 return block_job_set_speed_locked(job, speed, errp);
 }
 
+void block_job_change_locked(BlockJob *job, BlockJobChangeOptions *opts,
+ Error **errp)
+{
+const BlockJobDriver *drv = block_job_driver(job);
+
+GLOBAL_STATE_CODE();
+
+if (job_apply_verb_locked(&job->job, JOB_VERB_CHANGE, errp)) {
+return;
+}
+
+if (drv->change) {
+job_unlock();
+drv->change(job, opts, errp);
+job_lock();
+} else {
+error_setg(errp, "Job type does not support change");
+}
+}
+
 void block_job_ratelimit_processed_bytes(BlockJob *job, uint64_t n)
 {
 IO_CODE();
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 058b0c824c..95854f1477 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -172,6 +172,17 @@ bool block_job_has_bdrv(BlockJob *job, BlockDriverState 
*bs);
  */
 bool block_job_set_speed_locked(BlockJob *job, int64_t speed, Error **errp);
 
+/**
+ * block_job_change_locked:
+ * @job: The job to change.
+ * @opts: The new options.
+ * @errp: Error object.
+ *
+ * Change the job according to opts.
+ */
+void block_job_change_locked(BlockJob *job, BlockJobChangeOptions *opts,
+ Error **errp);
+
 /**
  * block_job_query_locked:
  * @job: The job to get information about.
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 104824040c..f604985315 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -67,6 +67,11 @@ struct BlockJobDriver {
 void (*attached_aio_context)(BlockJob *job, AioContext *new_context);
 
 void (*set_speed)(BlockJob *job, int64_t speed);
+
+/*
+ * Change the @job's options according to @opts.
+ */
+void (*change)(BlockJob *job, BlockJobChangeOptions *opts, Error **errp);
 };
 
 /*
diff --git a/job.c b/job.c
index 72d57f0934..99a2e54b54 100644
--- a/job.c
+++ b/job.c
@@ -80,6 +80,7 @@ bool JobVerbTable[JOB_VERB__MAX][JOB_STATUS__MAX] = {
 [JOB_VERB_COMPLETE] = {0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0},
 [JOB_VERB_FINALIZE] = {0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0},
 [JOB_VERB_DISMISS]  = {0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0},
+[JOB_VERB_CHANGE]   = {0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0},
 };
 
 /* Transactional group of jobs */
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 89751d81f2..c6f31a9399 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3044,6 +3044,32 @@
 { 'command': 'block-job-finalize', 'data': { 'id': 'str' },
   'allow-preconfig': true }
 
+##
+# @BlockJobChangeOptions:
+#
+# Block job options that can be changed after job creation.
+#
+# @id: The job identifier
+#
+# @type: The job type
+#
+# Since 8.2
+##
+{ 'union': 'BlockJobChangeOptions',
+  'base': { 'id': 'str', 'type': 'JobType' },
+  'discriminator': 'type',
+  'data': {} }
+
+##
+# @block-job-change:
+#
+# Change the block job's options.
+#
+# Since: 8.2
+##
+{ 'command': 'block-job-change',
+  'data': 'BlockJobChangeOptions', 'boxed': true }
+
 ##
 # @BlockdevDiscardOptions:
 #
diff --git a/qapi/job.json b/qapi/job.json
index 7f0ba090de..b3957207a4 100644
--- a/qapi/job.json
+++ b/qapi/job.json
@@ -105,11 +105,13 @@
 #
 # @finalize: see @job-finalize
 #
+# @change: see @block-job-change (since 8.2)
+#
 # Since: 2.12
 ##
 { 'enum': 'JobVerb',
   'data': ['cancel', 'pause', 'res

[PATCH 6/6] hw/s390x: Clean up global variable shadowing in quiesce_powerdown_req()

2023-10-09 Thread Philippe Mathieu-Daudé

Fix:

  hw/s390x/sclpquiesce.c:90:22: error: declaration shadows a variable in the 
global scope [-Werror,-Wshadow]
  QuiesceNotifier *qn = container_of(n, QuiesceNotifier, notifier);
   ^
  hw/s390x/sclpquiesce.c:86:3: note: previous declaration is here
  } qn;
^

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/s390x/sclpquiesce.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/s390x/sclpquiesce.c b/hw/s390x/sclpquiesce.c
index ce07b16884..a641089929 100644
--- a/hw/s390x/sclpquiesce.c
+++ b/hw/s390x/sclpquiesce.c
@@ -78,12 +78,10 @@ static const VMStateDescription vmstate_sclpquiesce = {
  }
 };
 
-typedef struct QuiesceNotifier QuiesceNotifier;
-
-static struct QuiesceNotifier {
+typedef struct QuiesceNotifier {
 Notifier notifier;
 SCLPEvent *event;
-} qn;
+} QuiesceNotifier;
 
 static void quiesce_powerdown_req(Notifier *n, void *opaque)
 {
@@ -97,6 +95,8 @@ static void quiesce_powerdown_req(Notifier *n, void *opaque)
 
 static int quiesce_init(SCLPEvent *event)
 {
+static QuiesceNotifier qn;
+
 qn.notifier.notify = quiesce_powerdown_req;
 qn.event = event;
 
-- 
2.41.0

[PATCH v2 03/10] block/mirror: move dirty bitmap to filter

2023-10-09 Thread Fiona Ebner

In preparation to allow switching to active mode without draining.
Initialization of the bitmap in mirror_dirty_init() still happens with
the original/backing BlockDriverState, which should be fine, because
the mirror top has the same length.

Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Fiona Ebner 
---

New in v2.

 block/mirror.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index b764ad5108..0ed54754e2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1500,6 +1500,10 @@ bdrv_mirror_top_do_write(BlockDriverState *bs, 
MirrorMethod method,
 abort();
 }
 
+if (!copy_to_target && s->job && s->job->dirty_bitmap) {
+bdrv_set_dirty_bitmap(s->job->dirty_bitmap, offset, bytes);
+}
+
 if (ret < 0) {
 goto out;
 }
@@ -1823,13 +1827,17 @@ static BlockJob *mirror_start_job(
 s->should_complete = true;
 }
 
-s->dirty_bitmap = bdrv_create_dirty_bitmap(bs, granularity, NULL, errp);
+s->dirty_bitmap = bdrv_create_dirty_bitmap(s->mirror_top_bs, granularity,
+   NULL, errp);
 if (!s->dirty_bitmap) {
 goto fail;
 }
-if (s->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING) {
-bdrv_disable_dirty_bitmap(s->dirty_bitmap);
-}
+
+/*
+ * The dirty bitmap is set by bdrv_mirror_top_do_write() when not in active
+ * mode.
+ */
+bdrv_disable_dirty_bitmap(s->dirty_bitmap);
 
 ret = block_job_add_bdrv(&s->common, "source", bs, 0,
  BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE |
-- 
2.39.2

[PATCH 3/6] hw/display/vga: Clean up global variable shadowing

2023-10-09 Thread Philippe Mathieu-Daudé

Fix:

  hw/display/vga.c:2307:29: error: declaration shadows a variable in the global 
scope [-Werror,-Wshadow]
MemoryRegion *address_space_io, bool init_vga_ports)
^
  include/exec/address-spaces.h:35:21: note: previous declaration is here
  extern AddressSpace address_space_io;
  ^

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/display/vga_int.h | 2 +-
 hw/display/vga.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/display/vga_int.h b/hw/display/vga_int.h
index 7cf0d11201..94949d8a0c 100644
--- a/hw/display/vga_int.h
+++ b/hw/display/vga_int.h
@@ -157,7 +157,7 @@ static inline int c6_to_8(int v)
 }
 
 bool vga_common_init(VGACommonState *s, Object *obj, Error **errp);
-void vga_init(VGACommonState *s, Object *obj, MemoryRegion *address_space,
+void vga_init(VGACommonState *s, Object *obj, MemoryRegion *io,
   MemoryRegion *address_space_io, bool init_vga_ports);
 MemoryRegion *vga_init_io(VGACommonState *s, Object *obj,
   const MemoryRegionPortio **vga_ports,
diff --git a/hw/display/vga.c b/hw/display/vga.c
index 37557c3442..bb4cd240ec 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -2304,7 +2304,7 @@ MemoryRegion *vga_init_io(VGACommonState *s, Object *obj,
 }
 
 void vga_init(VGACommonState *s, Object *obj, MemoryRegion *address_space,
-  MemoryRegion *address_space_io, bool init_vga_ports)
+  MemoryRegion *io, bool init_vga_ports)
 {
 MemoryRegion *vga_io_memory;
 const MemoryRegionPortio *vga_ports, *vbe_ports;
@@ -2324,10 +2324,10 @@ void vga_init(VGACommonState *s, Object *obj, 
MemoryRegion *address_space,
 if (init_vga_ports) {
 portio_list_init(&s->vga_port_list, obj, vga_ports, s, "vga");
 portio_list_set_flush_coalesced(&s->vga_port_list);
-portio_list_add(&s->vga_port_list, address_space_io, 0x3b0);
+portio_list_add(&s->vga_port_list, io, 0x3b0);
 }
 if (vbe_ports) {
 portio_list_init(&s->vbe_port_list, obj, vbe_ports, s, "vbe");
-portio_list_add(&s->vbe_port_list, address_space_io, 0x1ce);
+portio_list_add(&s->vbe_port_list, io, 0x1ce);
 }
 }
-- 
2.41.0

[PATCH v2 10/10] iotests: adapt test output for new mirror query property

2023-10-09 Thread Fiona Ebner

Signed-off-by: Fiona Ebner 
---

New in v2.

 tests/qemu-iotests/109.out | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/tests/qemu-iotests/109.out b/tests/qemu-iotests/109.out
index 2611d6a40f..965c9a6a0a 100644
--- a/tests/qemu-iotests/109.out
+++ b/tests/qemu-iotests/109.out
@@ -38,7 +38,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_READY", "data": {"device": "src", "len": 1024, "offset": 1024, 
"speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
-{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", 
"auto-dismiss": true, "busy": false, "len": 1024, "offset": 1024, "status": 
"ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
+{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", 
"auto-dismiss": true, "busy": false, "len": 1024, "offset": 1024, "status": 
"ready", "paused": false, "speed": 0, "ready": true, "type": "mirror", 
"actively-synced": false}]}
 {"execute":"quit"}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
@@ -90,7 +90,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_READY", "data": {"device": "src", "len": 197120, "offset": 197120, 
"speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
-{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", 
"auto-dismiss": true, "busy": false, "len": 197120, "offset": 197120, "status": 
"ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
+{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", 
"auto-dismiss": true, "busy": false, "len": 197120, "offset": 197120, "status": 
"ready", "paused": false, "speed": 0, "ready": true, "type": "mirror", 
"actively-synced": false}]}
 {"execute":"quit"}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
@@ -142,7 +142,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_READY", "data": {"device": "src", "len": 327680, "offset": 327680, 
"speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
-{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", 
"auto-dismiss": true, "busy": false, "len": 327680, "offset": 327680, "status": 
"ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
+{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", 
"auto-dismiss": true, "busy": false, "len": 327680, "offset": 327680, "status": 
"ready", "paused": false, "speed": 0, "ready": true, "type": "mirror", 
"actively-synced": false}]}
 {"execute":"quit"}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
@@ -194,7 +194,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_READY", "data": {"device": "src", "len": 1024, "offset": 1024, 
"speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
-{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", 
"auto-dismiss": true, "busy": false, "len": 1024, "offset": 1024, "status": 
"ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
+{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", 
"auto-dismiss": true, "busy": false, "len": 1024, "offset": 1024, "status": 
"ready", "paused": false, "speed": 0, "ready": true, "type": "mirror", 
"actively-synced": false}]}
 {"execute":"quit"}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
@@ -246,7 +246,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_READY", "data": {"device": "src", "len": 65536, "offset": 65536, 
"speed": 0, "type": "mirror"}}
 {"execute":"query-block-jo

[PATCH 5/6] hw/pci: Clean up global variable shadowing of address_space_io variable

2023-10-09 Thread Philippe Mathieu-Daudé

Fix:

  hw/pci/pci.c:504:54: error: declaration shadows a variable in the global 
scope [-Werror,-Wshadow]
 MemoryRegion *address_space_io,
   ^
  hw/pci/pci.c:533:38: error: declaration shadows a variable in the global 
scope [-Werror,-Wshadow]
 MemoryRegion *address_space_io,
   ^
  hw/pci/pci.c:543:40: error: declaration shadows a variable in the global 
scope [-Werror,-Wshadow]
   MemoryRegion *address_space_io,
 ^
  hw/pci/pci.c:590:45: error: declaration shadows a variable in the global 
scope [-Werror,-Wshadow]
MemoryRegion *address_space_io,
  ^
  include/exec/address-spaces.h:35:21: note: previous declaration is here
  extern AddressSpace address_space_io;
  ^

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/pci/pci.h |  9 +++--
 hw/pci/pci.c | 25 +
 2 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index b70a0b95ff..ea5aff118b 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -279,12 +279,10 @@ bool pci_bus_is_express(const PCIBus *bus);
 
 void pci_root_bus_init(PCIBus *bus, size_t bus_size, DeviceState *parent,
const char *name,
-   MemoryRegion *address_space_mem,
-   MemoryRegion *address_space_io,
+   MemoryRegion *mem, MemoryRegion *io,
uint8_t devfn_min, const char *typename);
 PCIBus *pci_root_bus_new(DeviceState *parent, const char *name,
- MemoryRegion *address_space_mem,
- MemoryRegion *address_space_io,
+ MemoryRegion *mem, MemoryRegion *io,
  uint8_t devfn_min, const char *typename);
 void pci_root_bus_cleanup(PCIBus *bus);
 void pci_bus_irqs(PCIBus *bus, pci_set_irq_fn set_irq,
@@ -304,8 +302,7 @@ int pci_swizzle_map_irq_fn(PCIDevice *pci_dev, int pin);
 PCIBus *pci_register_root_bus(DeviceState *parent, const char *name,
   pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
   void *irq_opaque,
-  MemoryRegion *address_space_mem,
-  MemoryRegion *address_space_io,
+  MemoryRegion *mem, MemoryRegion *io,
   uint8_t devfn_min, int nirq,
   const char *typename);
 void pci_unregister_root_bus(PCIBus *bus);
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index b0d21bf43a..7d09e1a39d 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -500,15 +500,14 @@ bool pci_bus_bypass_iommu(PCIBus *bus)
 }
 
 static void pci_root_bus_internal_init(PCIBus *bus, DeviceState *parent,
-   MemoryRegion *address_space_mem,
-   MemoryRegion *address_space_io,
+   MemoryRegion *mem, MemoryRegion *io,
uint8_t devfn_min)
 {
 assert(PCI_FUNC(devfn_min) == 0);
 bus->devfn_min = devfn_min;
 bus->slot_reserved_mask = 0x0;
-bus->address_space_mem = address_space_mem;
-bus->address_space_io = address_space_io;
+bus->address_space_mem = mem;
+bus->address_space_io = io;
 bus->flags |= PCI_BUS_IS_ROOT;
 
 /* host bridge */
@@ -529,25 +528,21 @@ bool pci_bus_is_express(const PCIBus *bus)
 
 void pci_root_bus_init(PCIBus *bus, size_t bus_size, DeviceState *parent,
const char *name,
-   MemoryRegion *address_space_mem,
-   MemoryRegion *address_space_io,
+   MemoryRegion *mem, MemoryRegion *io,
uint8_t devfn_min, const char *typename)
 {
 qbus_init(bus, bus_size, typename, parent, name);
-pci_root_bus_internal_init(bus, parent, address_space_mem,
-   address_space_io, devfn_min);
+pci_root_bus_internal_init(bus, parent, mem, io, devfn_min);
 }
 
 PCIBus *pci_root_bus_new(DeviceState *parent, const char *name,
- MemoryRegion *address_space_mem,
- MemoryRegion *address_space_io,
+ MemoryRegion *mem, MemoryRegion *io,
  uint8_t devfn_min, const char *typename)
 {
 PCIBus *bus;
 
 bus = PCI_BUS(qbus_new(typename, parent, name));
-pci_root_bus_internal_init(bus, parent, address_space_mem,
-   address_space_io, devfn_min);
+pci_root_bus_internal_init(bus, parent, mem, io, devfn_min);
 return bus;
 }
 
@@ -586,15 +581,13 @@ void pci_bus_irqs_cleanup(PCIBus *bus

[PATCH v2 09/10] mirror: return mirror-specific information upon query

2023-10-09 Thread Fiona Ebner

To start out, only actively-synced is returned.

For example, this is useful for jobs that started out in background
mode and switched to active mode. Once actively-synced is true, it's
clear that the mode switch has been completed. Note that completion of
the switch might happen much earlier, e.g. if the switch happens
before the job is ready, once all background operations have finished.
It's assumed that whether the disks are actively-synced or not is more
interesting than whether the mode switch completed. That information
can still be added if required in the future.
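
For readers unfamiliar with the driver-callback pattern used here, the following is a minimal, self-contained sketch and not the QEMU code itself. All `Toy*` names and the `toy_container_of` macro are hypothetical stand-ins introduced for illustration. It shows how a callback that receives only the generic job can recover the mirror-specific state and copy `actively_synced` into the per-type branch of the reply.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal container_of; QEMU ships its own, more careful macro. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical reduced types standing in for BlockJob and friends. */
typedef struct { int id; } ToyBlockJob;

typedef struct {
    ToyBlockJob common;        /* embedded generic job */
    bool actively_synced;      /* mirror-specific state */
} ToyMirrorJob;

typedef struct { bool actively_synced; } ToyInfoMirror;

typedef struct {
    union { ToyInfoMirror mirror; } u;   /* per-job-type branch */
} ToyJobInfo;

/* Driver callback: fill in only the mirror-specific part of the reply. */
void toy_mirror_query(ToyBlockJob *job, ToyJobInfo *info)
{
    /* Step back from the embedded member to the enclosing mirror job. */
    ToyMirrorJob *s = toy_container_of(job, ToyMirrorJob, common);

    info->u.mirror = (ToyInfoMirror) {
        .actively_synced = s->actively_synced,
    };
}
```

The compound literal zero-initialises any field that is not named, so members added to the mirror branch later would start out with a defined value.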

Signed-off-by: Fiona Ebner 
---

Changes in v2:
* update QEMU version in QAPI
* update indentation in QAPI (like in a937b6aa73 ("qapi: Reformat
  doc comments to conform to current conventions"))

 block/mirror.c   | 10 ++
 qapi/block-core.json | 16 +++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index 83aa4176c2..33b72ec5e5 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1267,6 +1267,15 @@ static void mirror_change(BlockJob *job, 
BlockJobChangeOptions *opts,
 s->copy_mode = MIRROR_COPY_MODE_WRITE_BLOCKING;
 }
 
+static void mirror_query(BlockJob *job, BlockJobInfo *info)
+{
+MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+
+info->u.mirror = (BlockJobInfoMirror) {
+.actively_synced = s->actively_synced,
+};
+}
+
 static const BlockJobDriver mirror_job_driver = {
 .job_driver = {
 .instance_size  = sizeof(MirrorBlockJob),
@@ -1282,6 +1291,7 @@ static const BlockJobDriver mirror_job_driver = {
 },
 .drained_poll   = mirror_drained_poll,
 .change = mirror_change,
+.query  = mirror_query,
 };
 
 static const BlockJobDriver commit_active_job_driver = {
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 950542b735..35d67410cc 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1352,6 +1352,20 @@
 { 'enum': 'MirrorCopyMode',
   'data': ['background', 'write-blocking'] }
 
+##
+# @BlockJobInfoMirror:
+#
+# Information specific to mirror block jobs.
+#
+# @actively-synced: Whether the source is actively synced to the
+# target, i.e. same data and new writes are done synchronously to
+# both.
+#
+# Since: 8.2
+##
+{ 'struct': 'BlockJobInfoMirror',
+  'data': { 'actively-synced': 'bool' } }
+
 ##
 # @BlockJobInfo:
 #
@@ -1403,7 +1417,7 @@
'auto-finalize': 'bool', 'auto-dismiss': 'bool',
'*error': 'str' },
   'discriminator': 'type',
-  'data': {} }
+  'data': { 'mirror': 'BlockJobInfoMirror' } }
 
 ##
 # @query-block-jobs:
-- 
2.39.2

[PATCH 0/6] hw: Clean up global variables shadowing

2023-10-09 Thread Philippe Mathieu-Daudé

Clean up global variables shadowing in hw/ in
order to be able to use -Wshadow with Clang.

Philippe Mathieu-Daudé (6):
  hw/core/cpu: Clean up global variable shadowing
  hw/loader: Clean up global variable shadowing in rom_add_file()
  hw/display/vga: Clean up global variable shadowing
  hw/acpi/pcihp: Clean up global variable shadowing in acpi_pcihp_init()
  hw/pci: Clean up global variable shadowing of address_space_io
variable
  hw/s390x: Clean up global variable shadowing in
quiesce_powerdown_req()

 hw/display/vga_int.h  |  2 +-
 include/hw/acpi/pcihp.h   |  2 +-
 include/hw/core/cpu.h |  8 
 include/hw/loader.h   |  2 +-
 include/hw/pci/pci.h  |  9 +++--
 cpu-common.c  |  6 +++---
 hw/acpi/pcihp.c   |  5 ++---
 hw/core/loader.c  |  4 ++--
 hw/display/vga.c  |  6 +++---
 hw/pci/pci.c  | 25 +
 hw/s390x/sclpquiesce.c|  8 
 target/s390x/cpu_models.c |  2 +-
 12 files changed, 34 insertions(+), 45 deletions(-)

-- 
2.41.0

[PATCH 2/6] hw/loader: Clean up global variable shadowing in rom_add_file()

2023-10-09 Thread Philippe Mathieu-Daudé

Fix:

  hw/core/loader.c:1073:27: error: declaration shadows a variable in the global 
scope [-Werror,-Wshadow]
   bool option_rom, MemoryRegion *mr,
^
  include/sysemu/sysemu.h:57:22: note: previous declaration is here
  extern QEMUOptionRom option_rom[MAX_OPTION_ROMS];
   ^

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/loader.h | 2 +-
 hw/core/loader.c| 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/hw/loader.h b/include/hw/loader.h
index c4c14170ea..8685e27334 100644
--- a/include/hw/loader.h
+++ b/include/hw/loader.h
@@ -272,7 +272,7 @@ void pstrcpy_targphys(const char *name,
 
 ssize_t rom_add_file(const char *file, const char *fw_dir,
  hwaddr addr, int32_t bootindex,
- bool option_rom, MemoryRegion *mr, AddressSpace *as);
+ bool has_option_rom, MemoryRegion *mr, AddressSpace *as);
 MemoryRegion *rom_add_blob(const char *name, const void *blob, size_t len,
size_t max_len, hwaddr addr,
const char *fw_file_name,
diff --git a/hw/core/loader.c b/hw/core/loader.c
index 4dd5a71fb7..7f0cbfb214 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -1070,7 +1070,7 @@ static void *rom_set_mr(Rom *rom, Object *owner, const 
char *name, bool ro)
 
 ssize_t rom_add_file(const char *file, const char *fw_dir,
  hwaddr addr, int32_t bootindex,
- bool option_rom, MemoryRegion *mr,
+ bool has_option_rom, MemoryRegion *mr,
  AddressSpace *as)
 {
 MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
@@ -1139,7 +1139,7 @@ ssize_t rom_add_file(const char *file, const char *fw_dir,
  basename);
 snprintf(devpath, sizeof(devpath), "/rom@%s", fw_file_name);
 
-if ((!option_rom || mc->option_rom_has_mr) && mc->rom_file_has_mr) {
+if ((!has_option_rom || mc->option_rom_has_mr) && mc->rom_file_has_mr) 
{
 data = rom_set_mr(rom, OBJECT(fw_cfg), devpath, true);
 } else {
 data = rom->data;
-- 
2.41.0

[PATCH v2 05/10] mirror: implement mirror_change method

2023-10-09 Thread Fiona Ebner

which allows switching the @copy-mode from 'background' to
'write-blocking'.

This is useful for management applications, so they can start out in
background mode to avoid limiting guest write speed and switch to
active mode when certain criteria are fulfilled.
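
The one-way nature of the switch can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the patch itself: `CopyMode`, `ToyMirrorJob` and `toy_mirror_change()` are hypothetical names, and errors are returned through a plain string pointer rather than QEMU's `Error **errp`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for QEMU's MirrorCopyMode enum. */
typedef enum {
    COPY_MODE_BACKGROUND,
    COPY_MODE_WRITE_BLOCKING,
} CopyMode;

/* Hypothetical reduced job state; the real MirrorBlockJob has many more fields. */
typedef struct {
    CopyMode copy_mode;
} ToyMirrorJob;

/*
 * Returns true on success. On failure, *err points at a static message
 * (QEMU proper reports errors through Error **errp instead).
 */
bool toy_mirror_change(ToyMirrorJob *job, CopyMode requested, const char **err)
{
    if (job->copy_mode == requested) {
        return true;            /* nothing to do */
    }
    if (job->copy_mode == COPY_MODE_WRITE_BLOCKING) {
        *err = "Cannot switch away from copy mode 'write-blocking'";
        return false;
    }
    /* Only background -> write-blocking remains. */
    assert(requested == COPY_MODE_WRITE_BLOCKING);
    job->copy_mode = COPY_MODE_WRITE_BLOCKING;
    return true;
}
```

A management application would start the job in background mode and call the change operation once its own criteria are met. Requesting the mode the job is already in is treated as a no-op rather than an error.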

Signed-off-by: Fiona Ebner 
---

Changes in v2:
* update QEMU version in QAPI
* update indentation in QAPI (like in a937b6aa73 ("qapi: Reformat
  doc comments to conform to current conventions"))
* drop drained section and disable dirty bitmap call. It's already
  disabled, because the bitmap is now attached to the filter and
  set in bdrv_mirror_top_do_write(). See the earlier patch
  "block/mirror: move dirty bitmap to filter"

 block/mirror.c   | 22 ++
 qapi/block-core.json | 13 -
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index b84de56734..83aa4176c2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1246,6 +1246,27 @@ static bool commit_active_cancel(Job *job, bool force)
 return force || !job_is_ready(job);
 }
 
+static void mirror_change(BlockJob *job, BlockJobChangeOptions *opts,
+  Error **errp)
+{
+MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+BlockJobChangeOptionsMirror *change_opts = &opts->u.mirror;
+
+if (s->copy_mode == change_opts->copy_mode) {
+return;
+}
+
+if (s->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING) {
+error_setg(errp, "Cannot switch away from copy mode 'write-blocking'");
+return;
+}
+
+assert(s->copy_mode == MIRROR_COPY_MODE_BACKGROUND &&
+   change_opts->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING);
+
+s->copy_mode = MIRROR_COPY_MODE_WRITE_BLOCKING;
+}
+
 static const BlockJobDriver mirror_job_driver = {
 .job_driver = {
 .instance_size  = sizeof(MirrorBlockJob),
@@ -1260,6 +1281,7 @@ static const BlockJobDriver mirror_job_driver = {
 .cancel = mirror_cancel,
 },
 .drained_poll   = mirror_drained_poll,
+.change = mirror_change,
 };
 
 static const BlockJobDriver commit_active_job_driver = {
diff --git a/qapi/block-core.json b/qapi/block-core.json
index c6f31a9399..01427c259a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3044,6 +3044,17 @@
 { 'command': 'block-job-finalize', 'data': { 'id': 'str' },
   'allow-preconfig': true }
 
+##
+# @BlockJobChangeOptionsMirror:
+#
+# @copy-mode: Switch to this copy mode. Currently, only the switch
+# from 'background' to 'write-blocking' is implemented.
+#
+# Since: 8.2
+##
+{ 'struct': 'BlockJobChangeOptionsMirror',
+  'data': { 'copy-mode' : 'MirrorCopyMode' } }
+
 ##
 # @BlockJobChangeOptions:
 #
@@ -3058,7 +3069,7 @@
 { 'union': 'BlockJobChangeOptions',
   'base': { 'id': 'str', 'type': 'JobType' },
   'discriminator': 'type',
-  'data': {} }
+  'data': { 'mirror': 'BlockJobChangeOptionsMirror' } }
 
 ##
 # @block-job-change:
-- 
2.39.2

[PATCH 4/6] hw/acpi/pcihp: Clean up global variable shadowing in acpi_pcihp_init()

2023-10-09 Thread Philippe Mathieu-Daudé

Fix:

  hw/acpi/pcihp.c:499:36: error: declaration shadows a variable in the global 
scope [-Werror,-Wshadow]
   MemoryRegion *address_space_io,
 ^
  include/exec/address-spaces.h:35:21: note: previous declaration is here
  extern AddressSpace address_space_io;
  ^

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/acpi/pcihp.h | 2 +-
 hw/acpi/pcihp.c | 5 ++---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/hw/acpi/pcihp.h b/include/hw/acpi/pcihp.h
index ef59810c17..ac21a95913 100644
--- a/include/hw/acpi/pcihp.h
+++ b/include/hw/acpi/pcihp.h
@@ -56,7 +56,7 @@ typedef struct AcpiPciHpState {
 } AcpiPciHpState;
 
 void acpi_pcihp_init(Object *owner, AcpiPciHpState *, PCIBus *root,
- MemoryRegion *address_space_io, uint16_t io_base);
+ MemoryRegion *io, uint16_t io_base);
 
 bool acpi_pcihp_is_hotpluggbale_bus(AcpiPciHpState *s, BusState *bus);
 void acpi_pcihp_device_pre_plug_cb(HotplugHandler *hotplug_dev,
diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index cdd6f775a1..4f75c873e2 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -496,8 +496,7 @@ static const MemoryRegionOps acpi_pcihp_io_ops = {
 };
 
 void acpi_pcihp_init(Object *owner, AcpiPciHpState *s, PCIBus *root_bus,
- MemoryRegion *address_space_io,
- uint16_t io_base)
+ MemoryRegion *io, uint16_t io_base)
 {
 s->io_len = ACPI_PCIHP_SIZE;
 s->io_base = io_base;
@@ -506,7 +505,7 @@ void acpi_pcihp_init(Object *owner, AcpiPciHpState *s, 
PCIBus *root_bus,
 
 memory_region_init_io(&s->io, owner, &acpi_pcihp_io_ops, s,
   "acpi-pci-hotplug", s->io_len);
-memory_region_add_subregion(address_space_io, s->io_base, &s->io);
+memory_region_add_subregion(io, s->io_base, &s->io);
 
 object_property_add_uint16_ptr(owner, ACPI_PCIHP_IO_BASE_PROP, &s->io_base,
OBJ_PROP_FLAG_READ);
-- 
2.41.0

[PATCH v2 02/10] block/mirror: set actively_synced even after the job is ready

2023-10-09 Thread Fiona Ebner

In preparation to allow switching from background to active mode. This
ensures that setting actively_synced will not be missed when the
switch happens after the job is ready.

Signed-off-by: Fiona Ebner 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---

No changes in v2.

 block/mirror.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 3cc0757a03..b764ad5108 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1074,9 +1074,9 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
  * the target in a consistent state.
  */
 job_transition_to_ready(&s->common.job);
-if (s->copy_mode != MIRROR_COPY_MODE_BACKGROUND) {
-s->actively_synced = true;
-}
+}
+if (s->copy_mode != MIRROR_COPY_MODE_BACKGROUND) {
+s->actively_synced = true;
 }
 
 should_complete = s->should_complete ||
-- 
2.39.2

Re: [PATCH 6/6] hw/s390x: Clean up global variable shadowing in quiesce_powerdown_req()

2023-10-09 Thread Thomas Huth


On 09/10/2023 11.47, Philippe Mathieu-Daudé wrote:

Fix:

   hw/s390x/sclpquiesce.c:90:22: error: declaration shadows a variable in the 
global scope [-Werror,-Wshadow]
   QuiesceNotifier *qn = container_of(n, QuiesceNotifier, notifier);
^
   hw/s390x/sclpquiesce.c:86:3: note: previous declaration is here
   } qn;
 ^

Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/s390x/sclpquiesce.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/s390x/sclpquiesce.c b/hw/s390x/sclpquiesce.c
index ce07b16884..a641089929 100644
--- a/hw/s390x/sclpquiesce.c
+++ b/hw/s390x/sclpquiesce.c
@@ -78,12 +78,10 @@ static const VMStateDescription vmstate_sclpquiesce = {
   }
  };
  
-typedef struct QuiesceNotifier QuiesceNotifier;

-
-static struct QuiesceNotifier {
+typedef struct QuiesceNotifier {
  Notifier notifier;
  SCLPEvent *event;
-} qn;
+} QuiesceNotifier;
  
  static void quiesce_powerdown_req(Notifier *n, void *opaque)

  {
@@ -97,6 +95,8 @@ static void quiesce_powerdown_req(Notifier *n, void *opaque)
  
  static int quiesce_init(SCLPEvent *event)

  {
+static QuiesceNotifier qn;
+
  qn.notifier.notify = quiesce_powerdown_req;
  qn.event = event;


Reviewed-by: Thomas Huth

Re: [PATCH 6/6] hw/s390x: Clean up global variable shadowing in quiesce_powerdown_req()

2023-10-09 Thread David Hildenbrand


On 09.10.23 11:47, Philippe Mathieu-Daudé wrote:

Fix:

   hw/s390x/sclpquiesce.c:90:22: error: declaration shadows a variable in the 
global scope [-Werror,-Wshadow]
   QuiesceNotifier *qn = container_of(n, QuiesceNotifier, notifier);
^
   hw/s390x/sclpquiesce.c:86:3: note: previous declaration is here
   } qn;
 ^

Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/s390x/sclpquiesce.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/s390x/sclpquiesce.c b/hw/s390x/sclpquiesce.c
index ce07b16884..a641089929 100644
--- a/hw/s390x/sclpquiesce.c
+++ b/hw/s390x/sclpquiesce.c
@@ -78,12 +78,10 @@ static const VMStateDescription vmstate_sclpquiesce = {
   }
  };
  
-typedef struct QuiesceNotifier QuiesceNotifier;

-
-static struct QuiesceNotifier {
+typedef struct QuiesceNotifier {
  Notifier notifier;
  SCLPEvent *event;
-} qn;
+} QuiesceNotifier;
  
  static void quiesce_powerdown_req(Notifier *n, void *opaque)

  {
@@ -97,6 +95,8 @@ static void quiesce_powerdown_req(Notifier *n, void *opaque)
  
  static int quiesce_init(SCLPEvent *event)

  {
+static QuiesceNotifier qn;
+
  qn.notifier.notify = quiesce_powerdown_req;
  qn.event = event;
  


Reviewed-by: David Hildenbrand 

--
Cheers,

David / dhildenb

Re: [PATCH v7 05/15] python/qemu: rename command() to cmd()

2023-10-09 Thread Juan Quintela

Vladimir Sementsov-Ogievskiy  wrote:
> Use a shorter name. We are going to move in iotests from qmp() to
> command() where possible. But command() is longer than qmp() and doesn't
> look better. Let's rename.

I feel your pain O:-)

> You can simply grep for '\.command(' and for 'def command(' to check
> that everything is updated (command() in tests/docker/docker.py is
> unrelated).
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Daniel P. Berrangé 
> Reviewed-by: Eric Blake 
> [vsementsov: also update three occurrences in
>tests/avocado/machine_aspeed.py and keep r-b]

Reviewed-by: Juan Quintela

qemu-devel@nongnu.org

2023-10-09 Thread Alex Bennée

A lot of our vhost-user stubs are large chunks of boilerplate that do
(mostly) the same thing. This series continues the cleanups by
splitting the vhost-user-base and vhost-user-generic implementations.
After adding a new vq_size property the rng, gpio and i2c vhost-user
devices become simple specialisations of the common base defining the
ID, number of queues and potentially the config handling.
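
To make the idea of "simple specialisations of a common base" concrete, here is a deliberately small sketch. It is not the QOM code from the series: the `Toy*` names are hypothetical, and the queue sizes and config size are illustrative values only. The device IDs are the ones assigned by the virtio specification. Shared logic is written once against the base struct, and each device only supplies its parameters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced "base class": only the knobs the text above names. */
typedef struct ToyVhostUserBase {
    uint16_t virtio_id;   /* which virtio device this is */
    uint32_t num_vqs;     /* number of virtqueues */
    uint32_t vq_size;     /* entries per virtqueue */
    size_t   config_size; /* 0 when the device has no config space */
} ToyVhostUserBase;

/* Shared logic lives in one place and is driven by the fields above. */
uint32_t toy_total_descriptors(const ToyVhostUserBase *b)
{
    return b->num_vqs * b->vq_size;
}

/* Each device is then little more than a set of values. */
void toy_rng_init(ToyVhostUserBase *b)
{
    b->virtio_id = 4;     /* VIRTIO_ID_RNG in the virtio specification */
    b->num_vqs = 1;
    b->vq_size = 4;       /* illustrative value */
    b->config_size = 0;
}

void toy_gpio_init(ToyVhostUserBase *b)
{
    b->virtio_id = 41;    /* VIRTIO_ID_GPIO in the virtio specification */
    b->num_vqs = 2;
    b->vq_size = 4;       /* illustrative value */
    b->config_size = 8;   /* illustrative value */
}
```

In the real series the same role is played by a QOM parent type, with the per-device values set when the device is created.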

I've also added Manos' vhost-user-sound while I was at it.

Changes
---

I've dropped the F_TRANSPORT work from this series to keep this small
and ready to merge. The changes for F_TRANSPORT are a bit more
invasive and still need a bit of debugging but I wanted to get this
stuff merged now.

Alex Bennée (5):
  virtio: split into vhost-user-base and vhost-user-device
  hw/virtio: derive vhost-user-rng from vhost-user-base
  hw/virtio: derive vhost-user-gpio from vhost-user-base
  hw/virtio: derive vhost-user-i2c from vhost-user-base
  docs/system: add a basic enumeration of vhost-user devices

Manos Pitsidianakis (1):
  hw/virtio: add vhost-user-snd and virtio-snd-pci devices

 docs/system/devices/vhost-user-rng.rst |   2 +
 docs/system/devices/vhost-user.rst |  41 +++
 include/hw/virtio/vhost-user-base.h|  49 +++
 include/hw/virtio/vhost-user-gpio.h|  23 +-
 include/hw/virtio/vhost-user-i2c.h |  14 +-
 include/hw/virtio/vhost-user-rng.h |  11 +-
 include/hw/virtio/vhost-user-snd.h |  26 ++
 hw/virtio/vhost-user-base.c| 348 +
 hw/virtio/vhost-user-device-pci.c  |  10 +-
 hw/virtio/vhost-user-device.c  | 335 +---
 hw/virtio/vhost-user-gpio.c| 407 ++---
 hw/virtio/vhost-user-i2c.c | 272 +
 hw/virtio/vhost-user-rng.c | 278 ++---
 hw/virtio/vhost-user-snd-pci.c |  75 +
 hw/virtio/vhost-user-snd.c |  67 
 hw/virtio/Kconfig  |   5 +
 hw/virtio/meson.build  |  25 +-
 17 files changed, 705 insertions(+), 1283 deletions(-)
 create mode 100644 include/hw/virtio/vhost-user-base.h
 create mode 100644 include/hw/virtio/vhost-user-snd.h
 create mode 100644 hw/virtio/vhost-user-base.c
 create mode 100644 hw/virtio/vhost-user-snd-pci.c
 create mode 100644 hw/virtio/vhost-user-snd.c

-- 
2.39.2

[PATCH v4 5/6] hw/virtio: add vhost-user-snd and virtio-snd-pci devices

2023-10-09 Thread Alex Bennée

From: Manos Pitsidianakis 

Tested with rust-vmm vhost-user-sound daemon:

RUST_LOG=trace cargo run --bin vhost-user-sound -- --socket /tmp/snd.sock 
--backend null

Invocation:

qemu-system-x86_64  \
-qmp unix:./qmp-sock,server,wait=off  \
-m 4096 \
-numa node,memdev=mem \
-object 
memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \
-D qemu.log \
-d guest_errors,trace:\*snd\*,trace:\*sound\*,trace:\*vhost\* \
-chardev socket,id=vsnd,path=/tmp/snd.sock \
-device vhost-user-snd-pci,chardev=vsnd,id=snd \
/path/to/disk

[AJB: imported from 
https://github.com/epilys/qemu-virtio-snd/commit/54ae1cdd15fef2d88e9e387a175f099a38c636f4.patch]
Signed-off-by: Alex Bennée 

---
v1
  - import and test
---
 include/hw/virtio/vhost-user-snd.h | 26 +++
 hw/virtio/vhost-user-snd-pci.c | 75 ++
 hw/virtio/vhost-user-snd.c | 67 ++
 hw/virtio/Kconfig  |  5 ++
 hw/virtio/meson.build  |  3 ++
 5 files changed, 176 insertions(+)
 create mode 100644 include/hw/virtio/vhost-user-snd.h
 create mode 100644 hw/virtio/vhost-user-snd-pci.c
 create mode 100644 hw/virtio/vhost-user-snd.c

diff --git a/include/hw/virtio/vhost-user-snd.h 
b/include/hw/virtio/vhost-user-snd.h
new file mode 100644
index 00..a1627003a0
--- /dev/null
+++ b/include/hw/virtio/vhost-user-snd.h
@@ -0,0 +1,26 @@
+/*
+ * Vhost-user Sound virtio device
+ *
+ * Copyright (c) 2021 Mathieu Poirier 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef QEMU_VHOST_USER_SND_H
+#define QEMU_VHOST_USER_SND_H
+
+#include "hw/virtio/virtio.h"
+#include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-user.h"
+#include "hw/virtio/vhost-user-base.h"
+
+#define TYPE_VHOST_USER_SND "vhost-user-snd"
+OBJECT_DECLARE_SIMPLE_TYPE(VHostUserSound, VHOST_USER_SND)
+
+struct VHostUserSound {
+/*< private >*/
+VHostUserBase parent;
+/*< public >*/
+};
+
+#endif /* QEMU_VHOST_USER_SND_H */
diff --git a/hw/virtio/vhost-user-snd-pci.c b/hw/virtio/vhost-user-snd-pci.c
new file mode 100644
index 00..d61cfdae63
--- /dev/null
+++ b/hw/virtio/vhost-user-snd-pci.c
@@ -0,0 +1,75 @@
+/*
+ * Vhost-user Sound virtio device PCI glue
+ *
+ * Copyright (c) 2023 Manos Pitsidianakis 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "hw/qdev-properties.h"
+#include "hw/virtio/vhost-user-snd.h"
+#include "hw/virtio/virtio-pci.h"
+
+struct VHostUserSoundPCI {
+VirtIOPCIProxy parent_obj;
+VHostUserSound vdev;
+};
+
+typedef struct VHostUserSoundPCI VHostUserSoundPCI;
+
+#define TYPE_VHOST_USER_SND_PCI "vhost-user-snd-pci-base"
+
+DECLARE_INSTANCE_CHECKER(VHostUserSoundPCI, VHOST_USER_SND_PCI,
+ TYPE_VHOST_USER_SND_PCI)
+
+static Property vhost_user_snd_pci_properties[] = {
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void vhost_user_snd_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
+{
+VHostUserSoundPCI *dev = VHOST_USER_SND_PCI(vpci_dev);
+DeviceState *vdev = DEVICE(&dev->vdev);
+
+vpci_dev->nvectors = 1;
+
+qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
+}
+
+static void vhost_user_snd_pci_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
+PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
+k->realize = vhost_user_snd_pci_realize;
+set_bit(DEVICE_CATEGORY_SOUND, dc->categories);
+device_class_set_props(dc, vhost_user_snd_pci_properties);
+pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
+pcidev_k->device_id = 0; /* Set by virtio-pci based on virtio id */
+pcidev_k->revision = 0x00;
+pcidev_k->class_id = PCI_CLASS_MULTIMEDIA_AUDIO;
+}
+
+static void vhost_user_snd_pci_instance_init(Object *obj)
+{
+VHostUserSoundPCI *dev = VHOST_USER_SND_PCI(obj);
+
+virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+TYPE_VHOST_USER_SND);
+}
+
+static const VirtioPCIDeviceTypeInfo vhost_user_snd_pci_info = {
+.base_name = TYPE_VHOST_USER_SND_PCI,
+.non_transitional_name = "vhost-user-snd-pci",
+.instance_size = sizeof(VHostUserSoundPCI),
+.instance_init = vhost_user_snd_pci_instance_init,
+.class_init = vhost_user_snd_pci_class_init,
+};
+
+static void vhost_user_snd_pci_register(void)
+{
+virtio_pci_types_register(&vhost_user_snd_pci_info);
+}
+
+type_init(vhost_user_snd_pci_register);
diff --git a/hw/virtio/vhost-user-snd.c b/hw/virtio/vhost-user-snd.c
new file mode 100644
index 00..9a217543f8
--- /dev/null
+++ b/hw/virtio/vhost-user-snd.c
@@ -0,0 +1,67 @@
+/*
+ * Vhost-user snd virtio device
+ *
+ * Copyright (c) 2023 Manos Pitsidianakis 
+ *
+ * Simple wrapper of the generic vhost-user-device.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include

[PATCH v4 4/6] hw/virtio: derive vhost-user-i2c from vhost-user-base

2023-10-09 Thread Alex Bennée

Now we can take advantage of the new base class and make
vhost-user-i2c a much simpler boilerplate wrapper. Also as this
doesn't require any target specific hacks we only need to build the
stubs once.

Message-Id: <20230418162140.373219-13-alex.ben...@linaro.org>
Acked-by: Mark Cave-Ayland 
Signed-off-by: Alex Bennée 

---
v2
  - update to new inheritance scheme
  - move build to common code
v3
  - fix merge conflict in meson
  - style updates, remove duplicate includes
v4
  - use vqsize
---
 include/hw/virtio/vhost-user-i2c.h |  14 +-
 hw/virtio/vhost-user-i2c.c | 272 ++---
 hw/virtio/meson.build  |   5 +-
 3 files changed, 23 insertions(+), 268 deletions(-)

diff --git a/include/hw/virtio/vhost-user-i2c.h 
b/include/hw/virtio/vhost-user-i2c.h
index 0f7acd40e3..a8fcb108db 100644
--- a/include/hw/virtio/vhost-user-i2c.h
+++ b/include/hw/virtio/vhost-user-i2c.h
@@ -9,23 +9,17 @@
 #ifndef QEMU_VHOST_USER_I2C_H
 #define QEMU_VHOST_USER_I2C_H
 
+#include "hw/virtio/virtio.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user.h"
+#include "hw/virtio/vhost-user-base.h"
 
 #define TYPE_VHOST_USER_I2C "vhost-user-i2c-device"
+
 OBJECT_DECLARE_SIMPLE_TYPE(VHostUserI2C, VHOST_USER_I2C)
 
 struct VHostUserI2C {
-VirtIODevice parent;
-CharBackend chardev;
-struct vhost_virtqueue *vhost_vq;
-struct vhost_dev vhost_dev;
-VhostUserState vhost_user;
-VirtQueue *vq;
-bool connected;
+VHostUserBase parent;
 };
 
-/* Virtio Feature bits */
-#define VIRTIO_I2C_F_ZERO_LENGTH_REQUEST   0
-
 #endif /* QEMU_VHOST_USER_I2C_H */
diff --git a/hw/virtio/vhost-user-i2c.c b/hw/virtio/vhost-user-i2c.c
index 4eef3f0633..a464f5e039 100644
--- a/hw/virtio/vhost-user-i2c.c
+++ b/hw/virtio/vhost-user-i2c.c
@@ -14,253 +14,22 @@
 #include "qemu/error-report.h"
 #include "standard-headers/linux/virtio_ids.h"
 
-static const int feature_bits[] = {
-VIRTIO_I2C_F_ZERO_LENGTH_REQUEST,
-VIRTIO_F_RING_RESET,
-VHOST_INVALID_FEATURE_BIT
+static Property vi2c_properties[] = {
+DEFINE_PROP_CHR("chardev", VHostUserBase, chardev),
+DEFINE_PROP_END_OF_LIST(),
 };
 
-static void vu_i2c_start(VirtIODevice *vdev)
-{
-BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
-VHostUserI2C *i2c = VHOST_USER_I2C(vdev);
-int ret, i;
-
-if (!k->set_guest_notifiers) {
-error_report("binding does not support guest notifiers");
-return;
-}
-
-ret = vhost_dev_enable_notifiers(&i2c->vhost_dev, vdev);
-if (ret < 0) {
-error_report("Error enabling host notifiers: %d", -ret);
-return;
-}
-
-ret = k->set_guest_notifiers(qbus->parent, i2c->vhost_dev.nvqs, true);
-if (ret < 0) {
-error_report("Error binding guest notifier: %d", -ret);
-goto err_host_notifiers;
-}
-
-i2c->vhost_dev.acked_features = vdev->guest_features;
-
-ret = vhost_dev_start(&i2c->vhost_dev, vdev, true);
-if (ret < 0) {
-error_report("Error starting vhost-user-i2c: %d", -ret);
-goto err_guest_notifiers;
-}
-
-/*
- * guest_notifier_mask/pending not used yet, so just unmask
- * everything here. virtio-pci will do the right thing by
- * enabling/disabling irqfd.
- */
-for (i = 0; i < i2c->vhost_dev.nvqs; i++) {
-vhost_virtqueue_mask(&i2c->vhost_dev, vdev, i, false);
-}
-
-return;
-
-err_guest_notifiers:
-k->set_guest_notifiers(qbus->parent, i2c->vhost_dev.nvqs, false);
-err_host_notifiers:
-vhost_dev_disable_notifiers(&i2c->vhost_dev, vdev);
-}
-
-static void vu_i2c_stop(VirtIODevice *vdev)
-{
-VHostUserI2C *i2c = VHOST_USER_I2C(vdev);
-BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
-int ret;
-
-if (!k->set_guest_notifiers) {
-return;
-}
-
-vhost_dev_stop(&i2c->vhost_dev, vdev, true);
-
-ret = k->set_guest_notifiers(qbus->parent, i2c->vhost_dev.nvqs, false);
-if (ret < 0) {
-error_report("vhost guest notifier cleanup failed: %d", ret);
-return;
-}
-
-vhost_dev_disable_notifiers(&i2c->vhost_dev, vdev);
-}
-
-static void vu_i2c_set_status(VirtIODevice *vdev, uint8_t status)
-{
-VHostUserI2C *i2c = VHOST_USER_I2C(vdev);
-bool should_start = virtio_device_should_start(vdev, status);
-
-if (vhost_dev_is_started(&i2c->vhost_dev) == should_start) {
-return;
-}
-
-if (should_start) {
-vu_i2c_start(vdev);
-} else {
-vu_i2c_stop(vdev);
-}
-}
-
-static uint64_t vu_i2c_get_features(VirtIODevice *vdev,
-uint64_t requested_features, Error **errp)
-{
-VHostUserI2C *i2c = VHOST_USER_I2C(vdev);
-
-virtio_add_feature(&requested_features, VIRTIO_I2C_F_ZERO_LENGTH_REQUEST);
-return vhost_get_features(&i2c->vhost_dev, feature_bits, requested_features);
-}
-
-static void

[PATCH v4 2/6] hw/virtio: derive vhost-user-rng from vhost-user-base

2023-10-09 Thread Alex Bennée

Now we can take advantage of our new base class and make
vhost-user-rng a much simpler boilerplate wrapper. Also, as this
doesn't require any target-specific hacks, we only need to build the
stubs once.

Message-Id: <20230418162140.373219-10-alex.ben...@linaro.org>
Acked-by: Mark Cave-Ayland 
Signed-off-by: Alex Bennée 

---
v2
  - new derivation layout
  - move directly to softmmu_virtio_ss
v3
  - use vqsize
---
 include/hw/virtio/vhost-user-rng.h |  11 +-
 hw/virtio/vhost-user-rng.c | 278 +++--
 hw/virtio/meson.build  |  11 +-
 3 files changed, 31 insertions(+), 269 deletions(-)

diff --git a/include/hw/virtio/vhost-user-rng.h b/include/hw/virtio/vhost-user-rng.h
index ddd9f01eea..6cffe28807 100644
--- a/include/hw/virtio/vhost-user-rng.h
+++ b/include/hw/virtio/vhost-user-rng.h
@@ -12,21 +12,14 @@
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user.h"
-#include "chardev/char-fe.h"
+#include "hw/virtio/vhost-user-base.h"
 
 #define TYPE_VHOST_USER_RNG "vhost-user-rng"
 OBJECT_DECLARE_SIMPLE_TYPE(VHostUserRNG, VHOST_USER_RNG)
 
 struct VHostUserRNG {
 /*< private >*/
-VirtIODevice parent;
-CharBackend chardev;
-struct vhost_virtqueue *vhost_vq;
-struct vhost_dev vhost_dev;
-VhostUserState vhost_user;
-VirtQueue *req_vq;
-bool connected;
-
+VHostUserBase parent;
 /*< public >*/
 };
 
diff --git a/hw/virtio/vhost-user-rng.c b/hw/virtio/vhost-user-rng.c
index efc54cd3fb..01879c863d 100644
--- a/hw/virtio/vhost-user-rng.c
+++ b/hw/virtio/vhost-user-rng.c
@@ -3,7 +3,7 @@
  *
  * Copyright (c) 2021 Mathieu Poirier 
  *
- * Implementation seriously tailored on vhost-user-i2c.c
+ * Simple wrapper of the generic vhost-user-device.
  *
  * SPDX-License-Identifier: GPL-2.0-or-later
  */
@@ -13,281 +13,47 @@
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/vhost-user-rng.h"
-#include "qemu/error-report.h"
 #include "standard-headers/linux/virtio_ids.h"
 
-static const int feature_bits[] = {
-VIRTIO_F_RING_RESET,
-VHOST_INVALID_FEATURE_BIT
-};
-
-static void vu_rng_start(VirtIODevice *vdev)
-{
-VHostUserRNG *rng = VHOST_USER_RNG(vdev);
-BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
-int ret;
-int i;
-
-if (!k->set_guest_notifiers) {
-error_report("binding does not support guest notifiers");
-return;
-}
-
-ret = vhost_dev_enable_notifiers(&rng->vhost_dev, vdev);
-if (ret < 0) {
-error_report("Error enabling host notifiers: %d", -ret);
-return;
-}
-
-ret = k->set_guest_notifiers(qbus->parent, rng->vhost_dev.nvqs, true);
-if (ret < 0) {
-error_report("Error binding guest notifier: %d", -ret);
-goto err_host_notifiers;
-}
-
-rng->vhost_dev.acked_features = vdev->guest_features;
-ret = vhost_dev_start(&rng->vhost_dev, vdev, true);
-if (ret < 0) {
-error_report("Error starting vhost-user-rng: %d", -ret);
-goto err_guest_notifiers;
-}
-
-/*
- * guest_notifier_mask/pending not used yet, so just unmask
- * everything here. virtio-pci will do the right thing by
- * enabling/disabling irqfd.
- */
-for (i = 0; i < rng->vhost_dev.nvqs; i++) {
-vhost_virtqueue_mask(&rng->vhost_dev, vdev, i, false);
-}
-
-return;
-
-err_guest_notifiers:
-k->set_guest_notifiers(qbus->parent, rng->vhost_dev.nvqs, false);
-err_host_notifiers:
-vhost_dev_disable_notifiers(&rng->vhost_dev, vdev);
-}
-
-static void vu_rng_stop(VirtIODevice *vdev)
-{
-VHostUserRNG *rng = VHOST_USER_RNG(vdev);
-BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
-int ret;
-
-if (!k->set_guest_notifiers) {
-return;
-}
-
-vhost_dev_stop(&rng->vhost_dev, vdev, true);
-
-ret = k->set_guest_notifiers(qbus->parent, rng->vhost_dev.nvqs, false);
-if (ret < 0) {
-error_report("vhost guest notifier cleanup failed: %d", ret);
-return;
-}
-
-vhost_dev_disable_notifiers(&rng->vhost_dev, vdev);
-}
-
-static void vu_rng_set_status(VirtIODevice *vdev, uint8_t status)
-{
-VHostUserRNG *rng = VHOST_USER_RNG(vdev);
-bool should_start = virtio_device_should_start(vdev, status);
-
-if (vhost_dev_is_started(&rng->vhost_dev) == should_start) {
-return;
-}
-
-if (should_start) {
-vu_rng_start(vdev);
-} else {
-vu_rng_stop(vdev);
-}
-}
-
-static uint64_t vu_rng_get_features(VirtIODevice *vdev,
-uint64_t requested_features, Error **errp)
-{
-VHostUserRNG *rng = VHOST_USER_RNG(vdev);
-
-return vhost_get_features(&rng->vhost_dev, feature_bits,
-  requested_features);
-}
-
-static void vu_rng_handle_output(VirtIODevice *vdev, VirtQueue *vq)
-{
-/*
-

[PATCH v4 3/6] hw/virtio: derive vhost-user-gpio from vhost-user-base

2023-10-09 Thread Alex Bennée

Now that the new base class supports config handling, we can take
advantage of it and make vhost-user-gpio a much simpler boilerplate
wrapper. Also, as this doesn't require any target-specific hacks, we
only need to build the stubs once.

Message-Id: <20230418162140.373219-12-alex.ben...@linaro.org>
Signed-off-by: Alex Bennée 
Acked-by: Mark Cave-Ayland 

---
v2
  - use new vhost-user-base
  - move build to common code
v3
  - fix inadvertent double link
v4
  - merge conflict
  - update includes
---
 include/hw/virtio/vhost-user-gpio.h |  23 +-
 hw/virtio/vhost-user-gpio.c | 407 ++--
 hw/virtio/meson.build   |   5 +-
 3 files changed, 22 insertions(+), 413 deletions(-)

diff --git a/include/hw/virtio/vhost-user-gpio.h b/include/hw/virtio/vhost-user-gpio.h
index a9d3f9b049..5201d5f072 100644
--- a/include/hw/virtio/vhost-user-gpio.h
+++ b/include/hw/virtio/vhost-user-gpio.h
@@ -12,33 +12,14 @@
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user.h"
-#include "standard-headers/linux/virtio_gpio.h"
-#include "chardev/char-fe.h"
+#include "hw/virtio/vhost-user-base.h"
 
 #define TYPE_VHOST_USER_GPIO "vhost-user-gpio-device"
 OBJECT_DECLARE_SIMPLE_TYPE(VHostUserGPIO, VHOST_USER_GPIO);
 
 struct VHostUserGPIO {
 /*< private >*/
-VirtIODevice parent_obj;
-CharBackend chardev;
-struct virtio_gpio_config config;
-struct vhost_virtqueue *vhost_vqs;
-struct vhost_dev vhost_dev;
-VhostUserState vhost_user;
-VirtQueue *command_vq;
-VirtQueue *interrupt_vq;
-/**
- * There are at least two steps of initialization of the
- * vhost-user device. The first is a "connect" step and
- * second is a "start" step. Make a separation between
- * those initialization phases by using two fields.
- *
- * @connected: see vu_gpio_connect()/vu_gpio_disconnect()
- * @started_vu: see vu_gpio_start()/vu_gpio_stop()
- */
-bool connected;
-bool started_vu;
+VHostUserBase parent;
 /*< public >*/
 };
 
diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
index 3d7fae3984..9f37c25415 100644
--- a/hw/virtio/vhost-user-gpio.c
+++ b/hw/virtio/vhost-user-gpio.c
@@ -11,388 +11,25 @@
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/vhost-user-gpio.h"
-#include "qemu/error-report.h"
 #include "standard-headers/linux/virtio_ids.h"
-#include "trace.h"
+#include "standard-headers/linux/virtio_gpio.h"
 
-#define REALIZE_CONNECTION_RETRIES 3
-#define VHOST_NVQS 2
-
-/* Features required from VirtIO */
-static const int feature_bits[] = {
-VIRTIO_F_VERSION_1,
-VIRTIO_F_NOTIFY_ON_EMPTY,
-VIRTIO_RING_F_INDIRECT_DESC,
-VIRTIO_RING_F_EVENT_IDX,
-VIRTIO_GPIO_F_IRQ,
-VIRTIO_F_RING_RESET,
-VHOST_INVALID_FEATURE_BIT
-};
-
-static void vu_gpio_get_config(VirtIODevice *vdev, uint8_t *config)
-{
-VHostUserGPIO *gpio = VHOST_USER_GPIO(vdev);
-
-memcpy(config, &gpio->config, sizeof(gpio->config));
-}
-
-static int vu_gpio_config_notifier(struct vhost_dev *dev)
-{
-VHostUserGPIO *gpio = VHOST_USER_GPIO(dev->vdev);
-
-memcpy(dev->vdev->config, &gpio->config, sizeof(gpio->config));
-virtio_notify_config(dev->vdev);
-
-return 0;
-}
-
-const VhostDevConfigOps gpio_ops = {
-.vhost_dev_config_notifier = vu_gpio_config_notifier,
+static Property vgpio_properties[] = {
+DEFINE_PROP_CHR("chardev", VHostUserBase, chardev),
+DEFINE_PROP_END_OF_LIST(),
 };
 
-static int vu_gpio_start(VirtIODevice *vdev)
-{
-BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
-VHostUserGPIO *gpio = VHOST_USER_GPIO(vdev);
-struct vhost_dev *vhost_dev = &gpio->vhost_dev;
-int ret, i;
-
-if (!k->set_guest_notifiers) {
-error_report("binding does not support guest notifiers");
-return -ENOSYS;
-}
-
-ret = vhost_dev_enable_notifiers(vhost_dev, vdev);
-if (ret < 0) {
-error_report("Error enabling host notifiers: %d", ret);
-return ret;
-}
-
-ret = k->set_guest_notifiers(qbus->parent, vhost_dev->nvqs, true);
-if (ret < 0) {
-error_report("Error binding guest notifier: %d", ret);
-goto err_host_notifiers;
-}
-
-/*
- * Before we start up we need to ensure we have the final feature
- * set needed for the vhost configuration. The backend may also
- * apply backend_features when the feature set is sent.
- */
-vhost_ack_features(&gpio->vhost_dev, feature_bits, vdev->guest_features);
-
-ret = vhost_dev_start(&gpio->vhost_dev, vdev, false);
-if (ret < 0) {
-error_report("Error starting vhost-user-gpio: %d", ret);
-goto err_guest_notifiers;
-}
-gpio->started_vu = true;
-
-/*
- * guest_notifier_mask/pending not used yet, so just unmask
- * everything here. virtio-pci will do the right thing by
- * enabling/disabling irqfd

[PATCH v4 6/6] docs/system: add a basic enumeration of vhost-user devices

2023-10-09 Thread Alex Bennée

Make it clear the vhost-user-device is intended for expert use only.

Signed-off-by: Alex Bennée 

---
v2
  - make clear vhost-user-device for expert use
---
 docs/system/devices/vhost-user-rng.rst |  2 ++
 docs/system/devices/vhost-user.rst | 41 ++
 2 files changed, 43 insertions(+)

diff --git a/docs/system/devices/vhost-user-rng.rst b/docs/system/devices/vhost-user-rng.rst
index a145d4105c..ead1405326 100644
--- a/docs/system/devices/vhost-user-rng.rst
+++ b/docs/system/devices/vhost-user-rng.rst
@@ -1,3 +1,5 @@
+.. _vhost_user_rng:
+
 QEMU vhost-user-rng - RNG emulation
 ===
 
diff --git a/docs/system/devices/vhost-user.rst b/docs/system/devices/vhost-user.rst
index a80e95a48a..0f9eec3f00 100644
--- a/docs/system/devices/vhost-user.rst
+++ b/docs/system/devices/vhost-user.rst
@@ -15,6 +15,47 @@ to the guest. The code is mostly boilerplate although each device has
 a ``chardev`` option which specifies the ID of the ``--chardev``
 device that connects via a socket to the vhost-user *daemon*.
 
+Each device will have an virtio-mmio and virtio-pci variant. See your
+platform details for what sort of virtio bus to use.
+
+.. list-table:: vhost-user devices
+  :widths: 20 20 60
+  :header-rows: 1
+
+  * - Device
+- Type
+- Notes
+  * - vhost-user-device
+- Generic Development Device
+- You must manually specify ``virtio-id`` and the correct ``num_vqs``. Intended for expert use.
+  * - vhost-user-blk
+- Block storage
+-
+  * - vhost-user-fs
+- File based storage driver
+- See https://gitlab.com/virtio-fs/virtiofsd
+  * - vhost-user-scsi
+- SCSI based storage
+- See contrib/vhost-user/scsi
+  * - vhost-user-gpio
+- Proxy gpio pins to host
+- See https://github.com/rust-vmm/vhost-device
+  * - vhost-user-i2c
+- Proxy i2c devices to host
+- See https://github.com/rust-vmm/vhost-device
+  * - vhost-user-input
+- Generic input driver
+- See contrib/vhost-user-input
+  * - vhost-user-rng
+- Entropy driver
+- :ref:`vhost_user_rng`
+  * - vhost-user-gpu
+- GPU driver
+-
+  * - vhost-user-vsock
+- Socket based communication
+- See https://github.com/rust-vmm/vhost-device
+
 vhost-user daemon
 =
 
-- 
2.39.2

[PATCH v4 1/6] virtio: split into vhost-user-base and vhost-user-device

2023-10-09 Thread Alex Bennée

Let's keep a cleaner split between the base class and the derived
vhost-user-device, which we can use for generic vhost-user stubs. This
includes an update to introduce the vq_size property so the number of
entries in a virtq can be defined.
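
As a rough illustration of the split described above, here is a minimal, standalone sketch. It is not the QEMU code: ToyBase, ToyRng and the toy_*_realize() helpers are hypothetical names, and the queue count and queue size are illustrative values only. The derived type embeds the base as its first member, fixes the parameters the generic device would otherwise take from the user, and then defers to the base realize.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the generic base state (hypothetical, not the QEMU type). */
typedef struct ToyBase {
    uint16_t virtio_id;   /* which virtio device this presents as */
    uint32_t num_vqs;     /* number of virtqueues */
    uint32_t vq_size;     /* entries per virtqueue */
    bool realized;
} ToyBase;

/* Derived stub: the base is the first member, so the wrapper adds no state. */
typedef struct ToyRng {
    ToyBase parent;
} ToyRng;

/* Generic realize: refuses to run until all parameters have been set. */
static bool toy_base_realize(ToyBase *b)
{
    if (b->virtio_id == 0 || b->num_vqs == 0 || b->vq_size == 0) {
        return false;
    }
    b->realized = true;
    return true;
}

/* Specialised realize: fix the parameters, then call the base realize. */
static bool toy_rng_realize(ToyRng *rng)
{
    ToyBase *b = &rng->parent;

    b->virtio_id = 4;   /* entropy source device ID in the virtio spec */
    b->num_vqs = 1;     /* illustrative: a single request queue */
    b->vq_size = 4;     /* illustrative queue depth */

    return toy_base_realize(b);
}
```

With this shape a generic device would expose the three fields as user-settable properties, while a specific stub such as the toy RNG above hard-codes them before chaining to the parent.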

Signed-off-by: Alex Bennée 

---
v1
  - merge and re-base, reset testing/review tags
---
 include/hw/virtio/vhost-user-base.h |  49 
 hw/virtio/vhost-user-base.c | 348 
 hw/virtio/vhost-user-device-pci.c   |  10 +-
 hw/virtio/vhost-user-device.c   | 335 +-
 hw/virtio/meson.build   |   1 +
 5 files changed, 410 insertions(+), 333 deletions(-)
 create mode 100644 include/hw/virtio/vhost-user-base.h
 create mode 100644 hw/virtio/vhost-user-base.c

diff --git a/include/hw/virtio/vhost-user-base.h b/include/hw/virtio/vhost-user-base.h
new file mode 100644
index 00..cad377468b
--- /dev/null
+++ b/include/hw/virtio/vhost-user-base.h
@@ -0,0 +1,49 @@
+/*
+ * Vhost-user generic virtio device
+ *
+ * Copyright (c) 2023 Linaro Ltd
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef QEMU_VHOST_USER_BASE_H
+#define QEMU_VHOST_USER_BASE_H
+
+#include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-user.h"
+
+#define TYPE_VHOST_USER_BASE "vhost-user-base"
+
+OBJECT_DECLARE_TYPE(VHostUserBase, VHostUserBaseClass, VHOST_USER_BASE)
+
+struct VHostUserBase {
+VirtIODevice parent;
+
+/* Properties */
+CharBackend chardev;
+uint16_t virtio_id;
+uint32_t num_vqs;
+uint32_t vq_size; /* can't exceed VIRTIO_QUEUE_MAX */
+uint32_t config_size;
+/* State tracking */
+VhostUserState vhost_user;
+struct vhost_virtqueue *vhost_vq;
+struct vhost_dev vhost_dev;
+GPtrArray *vqs;
+bool connected;
+};
+
+/*
+ * Needed so we can use the base realize after specialisation
+ * tweaks
+ */
+struct VHostUserBaseClass {
+VirtioDeviceClass parent_class;
+
+DeviceRealize parent_realize;
+};
+
+
+#define TYPE_VHOST_USER_DEVICE "vhost-user-device"
+
+#endif /* QEMU_VHOST_USER_BASE_H */
diff --git a/hw/virtio/vhost-user-base.c b/hw/virtio/vhost-user-base.c
new file mode 100644
index 00..a8b811c394
--- /dev/null
+++ b/hw/virtio/vhost-user-base.c
@@ -0,0 +1,348 @@
+/*
+ * Base vhost-user-base implementation. This can be used to derive a
+ * more fully specified vhost-user backend either generically (see
+ * vhost-user-device) or via a specific stub for a device which
+ * encapsulates some fixed parameters.
+ *
+ * Copyright (c) 2023 Linaro Ltd
+ * Author: Alex Bennée 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/qdev-properties.h"
+#include "hw/virtio/virtio-bus.h"
+#include "hw/virtio/vhost-user-base.h"
+#include "qemu/error-report.h"
+
+static void vub_start(VirtIODevice *vdev)
+{
+BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+VHostUserBase *vub = VHOST_USER_BASE(vdev);
+int ret, i;
+
+if (!k->set_guest_notifiers) {
+error_report("binding does not support guest notifiers");
+return;
+}
+
+ret = vhost_dev_enable_notifiers(&vub->vhost_dev, vdev);
+if (ret < 0) {
+error_report("Error enabling host notifiers: %d", -ret);
+return;
+}
+
+ret = k->set_guest_notifiers(qbus->parent, vub->vhost_dev.nvqs, true);
+if (ret < 0) {
+error_report("Error binding guest notifier: %d", -ret);
+goto err_host_notifiers;
+}
+
+vub->vhost_dev.acked_features = vdev->guest_features;
+
+ret = vhost_dev_start(&vub->vhost_dev, vdev, true);
+if (ret < 0) {
+error_report("Error starting vhost-user-base: %d", -ret);
+goto err_guest_notifiers;
+}
+
+/*
+ * guest_notifier_mask/pending not used yet, so just unmask
+ * everything here. virtio-pci will do the right thing by
+ * enabling/disabling irqfd.
+ */
+for (i = 0; i < vub->vhost_dev.nvqs; i++) {
+vhost_virtqueue_mask(&vub->vhost_dev, vdev, i, false);
+}
+
+return;
+
+err_guest_notifiers:
+k->set_guest_notifiers(qbus->parent, vub->vhost_dev.nvqs, false);
+err_host_notifiers:
+vhost_dev_disable_notifiers(&vub->vhost_dev, vdev);
+}
+
+static void vub_stop(VirtIODevice *vdev)
+{
+VHostUserBase *vub = VHOST_USER_BASE(vdev);
+BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+int ret;
+
+if (!k->set_guest_notifiers) {
+return;
+}
+
+vhost_dev_stop(&vub->vhost_dev, vdev, true);
+
+ret = k->set_guest_notifiers(qbus->parent, vub->vhost_dev.nvqs, false);
+if (ret < 0) {
+error_report("vhost guest notifier cleanup failed: %d", ret);
+return;
+}
+
+vhost_dev_disable_notifiers(&vub->vhost_dev, vdev);
+}
+
+static void vub_set_status(VirtIODevice *vdev, uint8_t status)
+{
+VHostUserBase *vub = VHOST_USER_BASE(

[PATCH 01/10] system/qtest: Clean up global variable shadowing in qtest_server_init()

2023-10-09 Thread Philippe Mathieu-Daudé

Rename the variable to fix:

  softmmu/qtest.c:869:13: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
  Object *qtest;
  ^
  softmmu/qtest.c:53:15: note: previous declaration is here
  static QTest *qtest;
^
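
For readers unfamiliar with this class of warning, a minimal standalone sketch of the same situation follows. The Widget type and all widget_* names are hypothetical and unrelated to qtest; the point is only that a local given the same name as a file-scope variable trips -Wshadow, and a distinct local name keeps both usable.

```c
#include <stddef.h>

typedef struct Widget {
    int id;
} Widget;

/* File-scope variable, standing in for the global named in the warning. */
static Widget *widget;

/* Registers w as the current widget and returns its id. */
static int widget_register(Widget *w)
{
    widget = w;
    return w->id;
}

static int widget_create_and_register(Widget *storage, int id)
{
    /*
     * Naming this local "widget" would shadow the file-scope pointer
     * and trip -Wshadow; "wobj" keeps the two clearly distinct.
     */
    Widget *wobj = storage;

    wobj->id = id;
    return widget_register(wobj);
}

/* Returns the id of the registered widget, or -1 if none is set. */
static int widget_current_id(void)
{
    return widget ? widget->id : -1;
}
```

Building this with -Wshadow should be quiet; renaming wobj back to widget inside widget_create_and_register() is expected to bring the diagnostic back.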

Signed-off-by: Philippe Mathieu-Daudé 
---
 softmmu/qtest.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/softmmu/qtest.c b/softmmu/qtest.c
index 35b643a274..7964f0b248 100644
--- a/softmmu/qtest.c
+++ b/softmmu/qtest.c
@@ -866,7 +866,7 @@ void qtest_server_init(const char *qtest_chrdev, const char *qtest_log, Error **
 {
 ERRP_GUARD();
 Chardev *chr;
-Object *qtest;
+Object *qobj;
 
 chr = qemu_chr_new("qtest", qtest_chrdev, NULL);
 if (chr == NULL) {
@@ -875,18 +875,18 @@ void qtest_server_init(const char *qtest_chrdev, const char *qtest_log, Error **
 return;
 }
 
-qtest = object_new(TYPE_QTEST);
-object_property_set_str(qtest, "chardev", chr->label, &error_abort);
+qobj = object_new(TYPE_QTEST);
+object_property_set_str(qobj, "chardev", chr->label, &error_abort);
 if (qtest_log) {
-object_property_set_str(qtest, "log", qtest_log, &error_abort);
+object_property_set_str(qobj, "log", qtest_log, &error_abort);
 }
-object_property_add_child(qdev_get_machine(), "qtest", qtest);
-user_creatable_complete(USER_CREATABLE(qtest), errp);
+object_property_add_child(qdev_get_machine(), "qtest", qobj);
+user_creatable_complete(USER_CREATABLE(qobj), errp);
 if (*errp) {
-object_unparent(qtest);
+object_unparent(qobj);
 }
 object_unref(OBJECT(chr));
-object_unref(qtest);
+object_unref(qobj);
 }
 
 static bool qtest_server_start(QTest *q, Error **errp)
-- 
2.41.0

[PATCH 09/10] tests/aio-multithread: Clean up global variable shadowing

2023-10-09 Thread Philippe Mathieu-Daudé

Rename the argument to avoid:

  tests/unit/test-aio-multithread.c:226:37: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
  static void test_multi_co_mutex(int threads, int seconds)
  ^
  tests/unit/test-aio-multithread.c:401:34: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
  static void test_multi_mutex(int threads, int seconds)
   ^
  tests/unit/test-aio-multithread.c:24:18: note: previous declaration is here
  static IOThread *threads[NUM_CONTEXTS];
   ^
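
The same warning applies to function parameters. A minimal standalone sketch, with hypothetical names (slots, NUM_SLOTS, fill_slots) that are not taken from the test file: the count parameter gets its own name instead of reusing the name of the file-scope array, and is still checked against the array bound.

```c
#include <assert.h>

#define NUM_SLOTS 5

/* File-scope array, standing in for the global named in the warning. */
static int slots[NUM_SLOTS];

/*
 * A parameter called "slots" would shadow the array above and make it
 * unreachable inside the function; "slot_count" avoids that.
 * Returns the number of entries written.
 */
static unsigned fill_slots(unsigned slot_count, int value)
{
    unsigned i;

    assert(slot_count <= NUM_SLOTS);
    for (i = 0; i < slot_count; i++) {
        slots[i] = value;
    }
    return i;
}

/* Sums every entry of the file-scope array. */
static int slot_sum(void)
{
    int total = 0;

    for (unsigned i = 0; i < NUM_SLOTS; i++) {
        total += slots[i];
    }
    return total;
}
```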

Signed-off-by: Philippe Mathieu-Daudé 
---
 tests/unit/test-aio-multithread.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/tests/unit/test-aio-multithread.c b/tests/unit/test-aio-multithread.c
index 08d4570ccb..d587f20667 100644
--- a/tests/unit/test-aio-multithread.c
+++ b/tests/unit/test-aio-multithread.c
@@ -223,7 +223,7 @@ static void coroutine_fn test_multi_co_mutex_entry(void *opaque)
 qatomic_dec(&running);
 }
 
-static void test_multi_co_mutex(int threads, int seconds)
+static void test_multi_co_mutex(unsigned ctx_num, int seconds)
 {
 int i;
 
@@ -233,9 +233,9 @@ static void test_multi_co_mutex(int threads, int seconds)
 now_stopping = false;
 
 create_aio_contexts();
-assert(threads <= NUM_CONTEXTS);
-running = threads;
-for (i = 0; i < threads; i++) {
+assert(ctx_num <= NUM_CONTEXTS);
+running = ctx_num;
+for (i = 0; i < ctx_num; i++) {
 Coroutine *co1 = qemu_coroutine_create(test_multi_co_mutex_entry, NULL);
 aio_co_schedule(ctx[i], co1);
 }
@@ -398,7 +398,7 @@ static void test_multi_mutex_entry(void *opaque)
 qatomic_dec(&running);
 }
 
-static void test_multi_mutex(int threads, int seconds)
+static void test_multi_mutex(unsigned ctx_num, int seconds)
 {
 int i;
 
@@ -408,9 +408,9 @@ static void test_multi_mutex(int threads, int seconds)
 now_stopping = false;
 
 create_aio_contexts();
-assert(threads <= NUM_CONTEXTS);
-running = threads;
-for (i = 0; i < threads; i++) {
+assert(ctx_num <= NUM_CONTEXTS);
+running = ctx_num;
+for (i = 0; i < ctx_num; i++) {
 Coroutine *co1 = qemu_coroutine_create(test_multi_mutex_entry, NULL);
 aio_co_schedule(ctx[i], co1);
 }
-- 
2.41.0

[PATCH 02/10] tests/throttle: Clean up global variable shadowing

2023-10-09 Thread Philippe Mathieu-Daudé

Follow the pattern of all the other tests in this file and use the
global 'cfg' variable to fix:

  tests/unit/test-throttle.c:621:20: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
  ThrottleConfig cfg;
 ^
  tests/unit/test-throttle.c:28:23: note: previous declaration is here
  static ThrottleConfig cfg;
^
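
Where the file already has a shared file-scope object, the other way to silence the warning is to drop the local and use the shared one. A minimal standalone sketch under that assumption; ToyConfig and the toy_* helpers are hypothetical names, not the throttle API:

```c
#include <stdint.h>

typedef struct ToyConfig {
    uint64_t avg;
    uint64_t max;
} ToyConfig;

/* Shared file-scope configuration used by every helper in this file. */
static ToyConfig cfg;

static void toy_config_init(ToyConfig *c)
{
    c->avg = 0;
    c->max = 0;
}

/*
 * No local "ToyConfig cfg;" is declared here: the helper resets and
 * fills the file-scope object instead, so nothing is shadowed.
 * Returns the derived maximum.
 */
static uint64_t toy_setup(uint64_t avg)
{
    toy_config_init(&cfg);
    cfg.avg = avg;
    cfg.max = avg * 2;
    return cfg.max;
}

/* Reads back the average from the shared configuration. */
static uint64_t toy_current_avg(void)
{
    return cfg.avg;
}
```

The trade-off is that state now persists between calls, so each user has to reinitialise the shared object before relying on it, as toy_setup() does here.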

Signed-off-by: Philippe Mathieu-Daudé 
---
 tests/unit/test-throttle.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tests/unit/test-throttle.c b/tests/unit/test-throttle.c
index ac35d65d19..2146cfacd3 100644
--- a/tests/unit/test-throttle.c
+++ b/tests/unit/test-throttle.c
@@ -618,7 +618,6 @@ static bool do_test_accounting(bool is_ops, /* are we testing bps or ops */
  { THROTTLE_OPS_TOTAL,
THROTTLE_OPS_READ,
THROTTLE_OPS_WRITE, } };
-ThrottleConfig cfg;
 BucketType index;
 int i;
 
-- 
2.41.0
