date:20230206

Re: [PATCH 0/5] hw/ppc: Set QDev properties using QDev API (part 2/3)

2023-02-06 Thread Cédric Le Goater


On 2/3/23 22:16, Philippe Mathieu-Daudé wrote:

part 1 [*] cover:
--
QEMU provides the QOM API for core objects.
Devices are modelled on top of QOM as QDev objects.

There is no point in using the lower level QOM API with
QDev; it makes the code more complex and harder to review.

I first converted all the calls using errp=&error_abort or
&errp=NULL, then noticed the other uses weren't really
consistent.


6/8 years ago, we converted models to QOM, supposedly because the qdev
interface was legacy and QOM was the new way. That's not true anymore ?

That said, I am ok with changes, even for the best practices. I would
like to know how to keep track. Do we have a model skeleton/reference ?

Thanks,

C.


A QDev property defined with the DEFINE_PROP_xxx() macros
is always available, thus can't fail. When using hot-plug
devices, we only need to check for optional properties
registered at runtime with the object_property_add_XXX()
API. Some are even always registered in device instance_init.
--

In this series PPC hw is converted. Only one call site in PNV
forwards the Error* argument and its conversion is justified.

Based-on: <20230203180914.49112-1-phi...@linaro.org>
(in particular [PATCH 02/19] hw/qdev: Introduce qdev_prop_set_link():
  https://lore.kernel.org/qemu-devel/20230203180914.49112-3-phi...@linaro.org/)

[*] https://lore.kernel.org/qemu-devel/20230203180914.49112-1-phi...@linaro.org/

Philippe Mathieu-Daudé (5):
   hw/misc/macio: Set QDev properties using QDev API
   hw/pci-host/raven: Set QDev properties using QDev API
   hw/ppc/ppc4xx: Set QDev properties using QDev API
   hw/ppc/spapr: Set QDev properties using QDev API
   hw/ppc/pnv: Set QDev properties using QDev API

  hw/intc/pnv_xive.c | 11 --
  hw/intc/pnv_xive2.c| 15 +-
  hw/intc/spapr_xive.c   | 11 --
  hw/intc/xics.c |  4 ++--
  hw/intc/xive.c |  4 ++--
  hw/misc/macio/macio.c  |  6 ++
  hw/pci-host/pnv_phb3.c |  9 +++--
  hw/pci-host/pnv_phb4.c |  4 ++--
  hw/pci-host/pnv_phb4_pec.c | 10 +++---
  hw/pci-host/raven.c|  6 ++
  hw/ppc/e500.c  |  3 +--
  hw/ppc/pnv.c   | 41 --
  hw/ppc/pnv_psi.c   | 10 +++---
  hw/ppc/ppc405_boards.c |  6 ++
  hw/ppc/ppc405_uc.c |  6 +++---
  hw/ppc/ppc440_bamboo.c |  3 +--
  hw/ppc/ppc4xx_devs.c   |  2 +-
  hw/ppc/sam460ex.c  |  5 ++---
  hw/ppc/spapr_irq.c |  8 +++-
  19 files changed, 62 insertions(+), 102 deletions(-)

Re: [PATCH 2/3] hw/isa/vt82c686: Allow PM controller to switch to ACPI mode

2023-02-06 Thread Philippe Mathieu-Daudé


On 31/1/23 15:54, BALATON Zoltan wrote:

On Sun, 29 Jan 2023, Bernhard Beschow wrote:

Adds missing functionality the real hardware supports.

Signed-off-by: Bernhard Beschow 
---
hw/isa/vt82c686.c | 18 +-
1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
index 2189be6f20..b0765d4ed8 100644
--- a/hw/isa/vt82c686.c
+++ b/hw/isa/vt82c686.c
@@ -37,6 +37,9 @@
#include "qemu/timer.h"
#include "trace.h"


Why does 
acpi_pm1_cnt_update() take two arguments for a bool value? Can these be 
both true or false at the same time?


No, this is a one-bit so boolean is enough...

Maybe unfinished refactor from commit eaba51c573 ("acpi, acpi_piix,
vt82c686: factor out PM1_CNT logic")?

Re: [PATCH 2/3] hw/isa/vt82c686: Allow PM controller to switch to ACPI mode

2023-02-06 Thread Philippe Mathieu-Daudé


Forgot to Cc ACPI maintainers.

On 6/2/23 09:00, Philippe Mathieu-Daudé wrote:

On 31/1/23 15:54, BALATON Zoltan wrote:

On Sun, 29 Jan 2023, Bernhard Beschow wrote:

Adds missing functionality the real hardware supports.

Signed-off-by: Bernhard Beschow 
---
hw/isa/vt82c686.c | 18 +-
1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
index 2189be6f20..b0765d4ed8 100644
--- a/hw/isa/vt82c686.c
+++ b/hw/isa/vt82c686.c
@@ -37,6 +37,9 @@
#include "qemu/timer.h"
#include "trace.h"


Why does acpi_pm1_cnt_update() take two arguments for a bool value? 
Can these be both true or false at the same time?


No, this is a one-bit so boolean is enough...

Maybe unfinished refactor from commit eaba51c573 ("acpi, acpi_piix,
vt82c686: factor out PM1_CNT logic")?

Re: [PATCH 5/5] hw/ppc/pnv: Set QDev properties using QDev API

2023-02-06 Thread Cédric Le Goater


On 2/3/23 22:16, Philippe Mathieu-Daudé wrote:

No need to use the low-level QOM API when an object
inherits from QDev. Directly use the QDev API to set
its properties.

One call in pnv_psi_power8_realize() propagate the
Error* argument:

   if (!object_property_set_int(OBJECT(ics), "nr-irqs",
PSI_NUM_INTERRUPTS, errp)) {
   return;
   }

TYPE_ICS "nr-irqs" is declared in ics_properties[],
itself always registered in ics_class_init(); so converting
this errp to &error_abort is safe.


yes. That's ok. I think this model was one of the first I converted to QOM.



All other calls use either errp=&error_abort or &error_fatal,
so converting to the QDev API is almost a no-op (QDev API
always uses &error_abort).

Signed-off-by: Philippe Mathieu-Daudé 


Reviewed-by: Cédric Le Goater 

Thanks,

C.


---
  hw/intc/pnv_xive.c | 11 --
  hw/intc/pnv_xive2.c| 15 +-
  hw/pci-host/pnv_phb3.c |  9 +++--
  hw/pci-host/pnv_phb4.c |  4 ++--
  hw/pci-host/pnv_phb4_pec.c | 10 +++---
  hw/ppc/pnv.c   | 41 --
  hw/ppc/pnv_psi.c   | 10 +++---
  7 files changed, 37 insertions(+), 63 deletions(-)

diff --git a/hw/intc/pnv_xive.c b/hw/intc/pnv_xive.c
index 622f9d28b7..ccc1ea5380 100644
--- a/hw/intc/pnv_xive.c
+++ b/hw/intc/pnv_xive.c
@@ -1857,17 +1857,14 @@ static void pnv_xive_realize(DeviceState *dev, Error 
**errp)
   * resized dynamically when the controller is configured by the FW
   * to limit accesses to resources not provisioned.
   */
-object_property_set_int(OBJECT(xsrc), "nr-irqs", PNV_XIVE_NR_IRQS,
-&error_fatal);
-object_property_set_link(OBJECT(xsrc), "xive", OBJECT(xive), &error_abort);
+qdev_prop_set_uint32(DEVICE(xsrc), "nr-irqs", PNV_XIVE_NR_IRQS);
+qdev_prop_set_link(DEVICE(xsrc), "xive", OBJECT(xive));
  if (!qdev_realize(DEVICE(xsrc), NULL, errp)) {
  return;
  }
  
-object_property_set_int(OBJECT(end_xsrc), "nr-ends", PNV_XIVE_NR_ENDS,

-&error_fatal);
-object_property_set_link(OBJECT(end_xsrc), "xive", OBJECT(xive),
- &error_abort);
+qdev_prop_set_uint32(DEVICE(end_xsrc), "nr-ends", PNV_XIVE_NR_ENDS);
+qdev_prop_set_link(DEVICE(end_xsrc), "xive", OBJECT(xive));
  if (!qdev_realize(DEVICE(end_xsrc), NULL, errp)) {
  return;
  }
diff --git a/hw/intc/pnv_xive2.c b/hw/intc/pnv_xive2.c
index 7176d70234..d7695f65e7 100644
--- a/hw/intc/pnv_xive2.c
+++ b/hw/intc/pnv_xive2.c
@@ -1821,22 +1821,17 @@ static void pnv_xive2_realize(DeviceState *dev, Error 
**errp)
   * resized dynamically when the controller is configured by the FW
   * to limit accesses to resources not provisioned.
   */
-object_property_set_int(OBJECT(xsrc), "flags", XIVE_SRC_STORE_EOI,
-&error_fatal);
-object_property_set_int(OBJECT(xsrc), "nr-irqs", PNV_XIVE2_NR_IRQS,
-&error_fatal);
-object_property_set_link(OBJECT(xsrc), "xive", OBJECT(xive),
- &error_fatal);
+qdev_prop_set_uint64(DEVICE(xsrc), "flags", XIVE_SRC_STORE_EOI);
+qdev_prop_set_uint32(DEVICE(xsrc), "nr-irqs", PNV_XIVE2_NR_IRQS);
+qdev_prop_set_link(DEVICE(xsrc), "xive", OBJECT(xive));
  qdev_realize(DEVICE(xsrc), NULL, &local_err);
  if (local_err) {
  error_propagate(errp, local_err);
  return;
  }
  
-object_property_set_int(OBJECT(end_xsrc), "nr-ends", PNV_XIVE2_NR_ENDS,

-&error_fatal);
-object_property_set_link(OBJECT(end_xsrc), "xive", OBJECT(xive),
- &error_abort);
+qdev_prop_set_uint32(DEVICE(end_xsrc), "nr-ends", PNV_XIVE2_NR_ENDS);
+qdev_prop_set_link(DEVICE(end_xsrc), "xive", OBJECT(xive));
  qdev_realize(DEVICE(end_xsrc), NULL, &local_err);
  if (local_err) {
  error_propagate(errp, local_err);
diff --git a/hw/pci-host/pnv_phb3.c b/hw/pci-host/pnv_phb3.c
index 7a21497cf8..6da9053ffa 100644
--- a/hw/pci-host/pnv_phb3.c
+++ b/hw/pci-host/pnv_phb3.c
@@ -1029,8 +1029,7 @@ static void pnv_phb3_realize(DeviceState *dev, Error 
**errp)
  /* LSI sources */
  object_property_set_link(OBJECT(&phb->lsis), "xics", OBJECT(pnv),
   &error_abort);
-object_property_set_int(OBJECT(&phb->lsis), "nr-irqs", PNV_PHB3_NUM_LSI,
-&error_abort);
+qdev_prop_set_uint32(DEVICE(&phb->lsis), "nr-irqs", PNV_PHB3_NUM_LSI);
  if (!qdev_realize(DEVICE(&phb->lsis), NULL, errp)) {
  return;
  }
@@ -1046,15 +1045,13 @@ static void pnv_phb3_realize(DeviceState *dev, Error 
**errp)
   &error_abort);
  object_property_set_link(OBJECT(&phb->msis), "xics", OBJECT(pnv),
   &error_abort);
-object_property_set_int(OBJECT(&phb->msis),

Re: [PATCH 4/5] hw/ppc/spapr: Set QDev properties using QDev API

2023-02-06 Thread Cédric Le Goater


On 2/3/23 22:16, Philippe Mathieu-Daudé wrote:

No need to use the low-level QOM API when an object
inherits from QDev. Directly use the QDev API to set
its properties.

All calls use either errp=&error_abort or &error_fatal,
so converting to the QDev API is almost a no-op (QDev
API always uses &error_abort).

Signed-off-by: Philippe Mathieu-Daudé 



Reviewed-by: Cédric Le Goater 

Thanks,

C.


---
  hw/intc/spapr_xive.c | 11 ---
  hw/intc/xics.c   |  4 ++--
  hw/intc/xive.c   |  4 ++--
  hw/ppc/spapr_irq.c   |  8 +++-
  4 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index dc641cc604..213c4cac44 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -310,9 +310,8 @@ static void spapr_xive_realize(DeviceState *dev, Error 
**errp)
  /*
   * Initialize the internal sources, for IPIs and virtual devices.
   */
-object_property_set_int(OBJECT(xsrc), "nr-irqs", xive->nr_irqs,
-&error_fatal);
-object_property_set_link(OBJECT(xsrc), "xive", OBJECT(xive), &error_abort);
+qdev_prop_set_uint32(DEVICE(xsrc), "nr-irqs", xive->nr_irqs);
+qdev_prop_set_link(DEVICE(xsrc), "xive", OBJECT(xive));
  if (!qdev_realize(DEVICE(xsrc), NULL, errp)) {
  return;
  }
@@ -321,10 +320,8 @@ static void spapr_xive_realize(DeviceState *dev, Error 
**errp)
  /*
   * Initialize the END ESB source
   */
-object_property_set_int(OBJECT(end_xsrc), "nr-ends", xive->nr_irqs,
-&error_fatal);
-object_property_set_link(OBJECT(end_xsrc), "xive", OBJECT(xive),
- &error_abort);
+qdev_prop_set_uint32(DEVICE(end_xsrc), "nr-ends", xive->nr_irqs);
+qdev_prop_set_link(DEVICE(end_xsrc), "xive", OBJECT(xive));
  if (!qdev_realize(DEVICE(end_xsrc), NULL, errp)) {
  return;
  }
diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index c7f8abd71e..2fd1a15153 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -382,8 +382,8 @@ Object *icp_create(Object *cpu, const char *type, 
XICSFabric *xi, Error **errp)
  obj = object_new(type);
  object_property_add_child(cpu, type, obj);
  object_unref(obj);
-object_property_set_link(obj, ICP_PROP_XICS, OBJECT(xi), &error_abort);
-object_property_set_link(obj, ICP_PROP_CPU, cpu, &error_abort);
+qdev_prop_set_link(DEVICE(obj), ICP_PROP_XICS, OBJECT(xi));
+qdev_prop_set_link(DEVICE(obj), ICP_PROP_CPU, cpu);
  if (!qdev_realize(DEVICE(obj), NULL, errp)) {
  object_unparent(obj);
  obj = NULL;
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index a986b96843..0e34035bc6 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -799,8 +799,8 @@ Object *xive_tctx_create(Object *cpu, XivePresenter *xptr, 
Error **errp)
  obj = object_new(TYPE_XIVE_TCTX);
  object_property_add_child(cpu, TYPE_XIVE_TCTX, obj);
  object_unref(obj);
-object_property_set_link(obj, "cpu", cpu, &error_abort);
-object_property_set_link(obj, "presenter", OBJECT(xptr), &error_abort);
+qdev_prop_set_link(DEVICE(obj), "cpu", cpu);
+qdev_prop_set_link(DEVICE(obj), "presenter", OBJECT(xptr));
  if (!qdev_realize(DEVICE(obj), NULL, errp)) {
  object_unparent(obj);
  return NULL;
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index a0d1e1298e..283769c074 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -313,9 +313,8 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
  obj = object_new(TYPE_ICS_SPAPR);
  
  object_property_add_child(OBJECT(spapr), "ics", obj);

-object_property_set_link(obj, ICS_PROP_XICS, OBJECT(spapr),
- &error_abort);
-object_property_set_int(obj, "nr-irqs", smc->nr_xirqs, &error_abort);
+qdev_prop_set_link(DEVICE(obj), ICS_PROP_XICS, OBJECT(spapr));
+qdev_prop_set_uint32(DEVICE(obj), "nr-irqs", smc->nr_xirqs);
  if (!qdev_realize(DEVICE(obj), NULL, errp)) {
  return;
  }
@@ -335,8 +334,7 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
   * priority
   */
  qdev_prop_set_uint32(dev, "nr-ends", nr_servers << 3);
-object_property_set_link(OBJECT(dev), "xive-fabric", OBJECT(spapr),
- &error_abort);
+qdev_prop_set_link(dev, "xive-fabric", OBJECT(spapr));
  sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
  
  spapr->xive = SPAPR_XIVE(dev);

Re: [PATCH v2 2/9] tests: fix test-io-channel-command on win32

2023-02-06 Thread Thomas Huth


On 29/01/2023 19.24, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau 

socat "PIPE:"" on Windows are named pipes, not fifo path names.

Fixes: commit 68406d10859 ("tests/unit: cleanups for test-io-channel-command")
Signed-off-by: Marc-André Lureau 
---
  tests/unit/test-io-channel-command.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/tests/unit/test-io-channel-command.c 
b/tests/unit/test-io-channel-command.c
index 096224962c..e76ef2daaa 100644
--- a/tests/unit/test-io-channel-command.c
+++ b/tests/unit/test-io-channel-command.c
@@ -31,8 +31,12 @@ static char *socat = NULL;
  
  static void test_io_channel_command_fifo(bool async)

  {
+#ifdef WIN32
+const gchar *fifo = TEST_FIFO;


Question from a Windows ignorant: Won't this cause a race condition in case 
someone is trying to run tests in parallel (i.e. shouldn't there be a random 
part in the name)? Or are these named pipes local to the current process?


 Thomas


+#else
  g_autofree gchar *tmpdir = g_dir_make_tmp("qemu-test-io-channel.XX", 
NULL);
  g_autofree gchar *fifo = g_build_filename(tmpdir, TEST_FIFO, NULL);
+#endif
  g_autofree gchar *srcargs = g_strdup_printf("%s - PIPE:%s,wronly", socat, 
fifo);
  g_autofree gchar *dstargs = g_strdup_printf("%s PIPE:%s,rdonly -", socat, 
fifo);
  g_auto(GStrv) srcargv = g_strsplit(srcargs, " ", -1);
@@ -57,7 +61,9 @@ static void test_io_channel_command_fifo(bool async)
  object_unref(OBJECT(src));
  object_unref(OBJECT(dst));
  
+#ifndef WIN32

  g_rmdir(tmpdir);
+#endif
  }

Re: [PATCH 1/1] hw/loongarch/virt: add system_powerdown hmp command support

2023-02-06 Thread gaosong


Ping !!

在 2023/1/31 下午8:05, gaosong 写道:

Ping !

在 2023/1/12 下午2:11, Song Gao 写道:

For loongarch virt machine, add powerdown notification callback
and send ACPI_POWER_DOWN_STATUS event by acpi ged. Also add
acpi dsdt table for ACPI_POWER_BUTTON_DEVICE device in this
patch.

Signed-off-by: Song Gao 
---
  hw/loongarch/acpi-build.c   |  1 +
  hw/loongarch/virt.c | 14 ++
  include/hw/loongarch/virt.h |  1 +
  3 files changed, 16 insertions(+)

diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c
index c2b237736d..b7601cb284 100644
--- a/hw/loongarch/acpi-build.c
+++ b/hw/loongarch/acpi-build.c
@@ -261,6 +261,7 @@ build_la_ged_aml(Aml *dsdt, MachineState *machine)
   AML_SYSTEM_MEMORY,
   VIRT_GED_MEM_ADDR);
  }
+    acpi_dsdt_add_power_button(dsdt);
  }
    static void build_pci_device_aml(Aml *scope, 
LoongArchMachineState *lams)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 66be925068..a4998599d3 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -316,6 +316,16 @@ static void virt_machine_done(Notifier 
*notifier, void *data)

  loongarch_acpi_setup(lams);
  }
  +static void virt_powerdown_req(Notifier *notifier, void *opaque)
+{
+    LoongArchMachineState *s = container_of(notifier,
+   LoongArchMachineState, 
powerdown_notifier);

+
+    if (s->acpi_ged) {
+    acpi_send_event(s->acpi_ged, ACPI_POWER_DOWN_STATUS);
+    }
+}
+
  struct memmap_entry {
  uint64_t address;
  uint64_t length;
@@ -859,6 +869,10 @@ static void loongarch_init(MachineState *machine)
 VIRT_PLATFORM_BUS_IRQ);
  lams->machine_done.notify = virt_machine_done;
qemu_add_machine_init_done_notifier(&lams->machine_done);
+ /* connect powerdown request */
+    lams->powerdown_notifier.notify = virt_powerdown_req;
+ qemu_register_powerdown_notifier(&lams->powerdown_notifier);
+
  fdt_add_pcie_node(lams);
  /*
   * Since lowmem region starts from 0 and Linux kernel legacy 
start address

diff --git a/include/hw/loongarch/virt.h b/include/hw/loongarch/virt.h
index f5f818894e..7ae8a91229 100644
--- a/include/hw/loongarch/virt.h
+++ b/include/hw/loongarch/virt.h
@@ -45,6 +45,7 @@ struct LoongArchMachineState {
  /* State for other subsystems/APIs: */
  FWCfgState  *fw_cfg;
  Notifier machine_done;
+    Notifier powerdown_notifier;
  OnOffAuto    acpi;
  char *oem_id;
  char *oem_table_id;

Re: [PATCH v2 2/9] tests: fix test-io-channel-command on win32

2023-02-06 Thread Philippe Mathieu-Daudé


On 29/1/23 19:24, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau 

socat "PIPE:"" on Windows are named pipes, not fifo path names.

Fixes: commit 68406d10859 ("tests/unit: cleanups for test-io-channel-command")
Signed-off-by: Marc-André Lureau 
---
  tests/unit/test-io-channel-command.c | 6 ++
  1 file changed, 6 insertions(+)


This doesn't apply cleanly on top of c906e6fbaa ("tests/unit: drop hacky
race avoidance in test-io-channel-command").

Re: [PULL 00/11] Net patches

2023-02-06 Thread Laurent Vivier


On 2/5/23 13:36, Peter Maydell wrote:

On Sat, 4 Feb 2023 at 20:09, Laurent Vivier  wrote:


On 2/4/23 15:57, Peter Maydell wrote:

On Thu, 2 Feb 2023 at 06:21, Jason Wang  wrote:


The following changes since commit 13356edb87506c148b163b8c7eb0695647d00c2a:

Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into 
staging (2023-01-24 09:45:33 +)

are available in the git repository at:

https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to 2bd492bca521ee8594f1d5db8dc9aac126fc4f85:

vdpa: fix VHOST_BACKEND_F_IOTLB_ASID flag check (2023-02-02 14:16:48 +0800)




Something weird has happened here -- this pullreq is trying to
add tests/qtest/netdev-socket.c, but it already exists in the
tree and doesn't have the same contents as the version in your
pull request.

Can you look at what's happened here and fix it up, please ?


Thomas and Jason have queued the patch:

tests/qtest: netdev: test stream and dgram backends

For Jason it's because it's needed by

net: stream: add a new option to automatically reconnect

For me, both patches (in tree and Jason's one) are identical to my v7
(except the one that is merged does not have Thomas' acked-by).


When I tried to merge the pullreq I got conflicts because they
weren't the same, notably in the set of #include lines.


The differences in the file are introduced by the following patch 10/11:

5b28ced1bc6d net: stream: add a new option to automatically reconnect

Thanks,
Laurent

Re: [PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job

2023-02-06 Thread Juan Quintela

Thomas Huth  wrote:
> On 03/02/2023 22.14, Juan Quintela wrote:
>> Peter Maydell  wrote:
>>> On Fri, 3 Feb 2023 at 15:44, Thomas Huth  wrote:

 On 03/02/2023 13.08, Kevin Wolf wrote:
> Am 03.02.2023 um 12:23 hat Thomas Huth geschrieben:
>> On 30/01/2023 11.58, Daniel P. Berrangé wrote:
>>> On Mon, Jan 30, 2023 at 11:44:46AM +0100, Thomas Huth wrote:
 We can get rid of the build-coroutine-sigaltstack job by moving
 the configure flags that should be tested here to other jobs:
 Move --with-coroutine=sigaltstack to the build-without-defaults job
 and --enable-trace-backends=ftrace to the cross-s390x-kvm-only job.
>>>
>>> The biggest user of coroutines is the block layer. So we probably
>>> ought to have coroutines aligned with a job that triggers the
>>> 'make check-block' for iotests.  IIUC,  the without-defaults
>>> job won't do that. How about, arbitrarily, using either the
>>> 'check-system-debian' or 'check-system-ubuntu' job. Those distros
>>> are closely related, so getting sigaltstack vs ucontext coverage
>>> between them is a good win, and they both trigger the block jobs
>>> IIUC.
>>
>> I gave it a try with the ubuntu job, but this apparently trips up the 
>> iotests:
>>
>>https://gitlab.com/thuth/qemu/-/jobs/3705965062#L212
>>
>> Does anybody have a clue what could be going wrong here?
>
> I'm not sure how changing the coroutine backend could cause it, but
> primarily this looks like an assertion failure in migration code.
>
> Dave, Juan, any ideas what this assertion checks and why it could be
> failing?

 Ah, I think it's the bug that will be fixed by:


 https://lore.kernel.org/qemu-devel/20230202160640.2300-2-quint...@redhat.com/

 The fix hasn't hit the master branch yet (I think), and I had another patch
 in my CI that disables the aarch64 binary in that runner, so the iotests
 suddenly have been executed with the alpha binary there --> migration 
 fails.

 So never mind, it will be fixed as soon as Juan's pull request gets 
 included.
>>>
>>> The migration tests have been flaky for a while now,
>>> including setups where host and guest page sizes are the same.
>>> (For instance, my x86 macos box pretty reliably sees failures
>>> when the machine is under load.)
>> I *thought* that we had fixed all of those.
>> But it is difficult for me to know because:
>> - I only happens when one runs "make check"
>> - running ./migration-test have never failed to me
>> - When it fails (and it has been a while since it has failed to me)
>>it is impossible to me to detect what is going on, and as said, I have
>>never been able to reproduce running only migration-test.
>> I will try to run several at the same time and see if it happens.
>> And as Thomas said, I *think* that the fix that Peter Xu posted
>> should
>> fix this issue.  Famous last words.
>
> The patch from Peter should fix my problems that I triggered via the
> iotests - but the migration-qtest is still unstable independent from
> that issue, I think. See for example the latest staging pipeline:
>
>  https://gitlab.com/qemu-project/qemu/-/pipelines/767961842
>
> The migration qtest failed in both, the x86-freebsd-build and the
> ubuntu-20.04-s390x-all pipelin.
>
>  Thomas

 31/659 qemu:qtest+qtest-aarch64 / qtest-aarch64/migration-test 
  ERROR  48.23s   killed by signal 6 SIGABRT
>>> G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
>>>  QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_BINARY=./qemu-system-aarch64 
>>> MALLOC_PERTURB_=124 
>>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
>>> /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qtest/migration-test
>>>  --tap -k
― ✀  ―
stderr:
Broken pipe
../tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 11 
(Segmentation fault) (core dumped)
TAP parsing error: Too few tests run (expected 41, got 12)
(test program exited with status code -6)
――

I don't know hat to do with this:
- this is aarch64 tcg
- this *works* on f37, or at least I can't reproduce any error with make
  check on my box, and I *think* my configuration is quite extensive (as
  far as I know everything that can be compiled in fedora with packages
  in the distro):

configure file: /mnt/code/qemu/full/configure
--enable-trace-backends=log
--prefix=/usr
--sysconfdir=/etc/sysconfig/
--audio-drv-list=pa,alsa
--with-coroutine=ucontext
--with-git-submodules=validate
--enable-alsa
--enable-attr
--enable-auth-pam
--enable-avx2
--enable-avx512f
--enable-bochs
--en

[PATCH] hw/riscv: virt: Simplify virt_{get,set}_aclint()

2023-02-06 Thread Bin Meng

There is no need to declare an intermediate "MachineState *ms".

Signed-off-by: Bin Meng 
---

 hw/riscv/virt.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 4a11b4b010..bdb6b93115 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -1577,16 +1577,14 @@ static void virt_set_aia(Object *obj, const char *val, 
Error **errp)
 
 static bool virt_get_aclint(Object *obj, Error **errp)
 {
-MachineState *ms = MACHINE(obj);
-RISCVVirtState *s = RISCV_VIRT_MACHINE(ms);
+RISCVVirtState *s = RISCV_VIRT_MACHINE(obj);
 
 return s->have_aclint;
 }
 
 static void virt_set_aclint(Object *obj, bool value, Error **errp)
 {
-MachineState *ms = MACHINE(obj);
-RISCVVirtState *s = RISCV_VIRT_MACHINE(ms);
+RISCVVirtState *s = RISCV_VIRT_MACHINE(obj);
 
 s->have_aclint = value;
 }
-- 
2.25.1

Re: [PATCH 0/5] hw/ppc: Set QDev properties using QDev API (part 2/3)

2023-02-06 Thread Mark Cave-Ayland


On 06/02/2023 08:00, Cédric Le Goater wrote:


On 2/3/23 22:16, Philippe Mathieu-Daudé wrote:

part 1 [*] cover:
--
QEMU provides the QOM API for core objects.
Devices are modelled on top of QOM as QDev objects.

There is no point in using the lower level QOM API with
QDev; it makes the code more complex and harder to review.

I first converted all the calls using errp=&error_abort or
&errp=NULL, then noticed the other uses weren't really
consistent.


6/8 years ago, we converted models to QOM, supposedly because the qdev
interface was legacy and QOM was the new way. That's not true anymore ?


That is a good question, and something that we really should decide first before 
going ahead with these changes. My understanding is that architectures with newer 
machines (particularly ARM and PPC) use QOM APIs directly, however more recently 
Markus did some improvements to qdev which largely eliminated the gap between the 
two. Hence why these days the two are mostly interchangeable: the main difference is 
that qdev has a notion of a parent which can be useful during device modelling.



That said, I am ok with changes, even for the best practices. I would
like to know how to keep track. Do we have a model skeleton/reference ?


Agreed. I've added Peter on CC as I know he has had some thoughts on QOM vs. qdev, 
but certainly as a reviewer it would be great to know which way we should be heading 
in the future.



ATB,

Mark.

Re: [PATCH v2 07/10] hw/ide/piix: Require an ISABus only for user-created instances

2023-02-06 Thread Mark Cave-Ayland


On 06/02/2023 06:51, Philippe Mathieu-Daudé wrote:


On 5/2/23 23:32, Mark Cave-Ayland wrote:

On 05/02/2023 22:21, BALATON Zoltan wrote:


On Sun, 5 Feb 2023, Mark Cave-Ayland wrote:

On 26/01/2023 21:17, Bernhard Beschow wrote:

Internal instances now defer interrupt wiring to the caller which
decouples them from the ISABus. User-created devices still fish out the
ISABus from the QOM tree and the interrupt wiring remains in PIIX IDE.
The latter mechanism is considered a workaround and intended to be
removed once a deprecation period for user-created PIIX IDE devices is
over.

Signed-off-by: Bernhard Beschow 
---
  include/hw/ide/pci.h |  1 +
  hw/ide/piix.c    | 64 ++--
  hw/isa/piix.c    |  5 
  3 files changed, 56 insertions(+), 14 deletions(-)


I haven't checked the datasheet, but I suspect this will be similar to the 
cmd646/via PCI-IDE interfaces in that there will be a PCI configuration register 
that will switch between ISA compatibility mode (and ISA irqs) and PCI mode (with 
PCI IRQs). So it would be the device configuration that would specify PCI or ISA 
mode, rather than the presence of an ISABus.


I forgot about this topic already and haven't follwed this series either so what I 
say may not fully make sense but I think CMD646 and via-ide are different. CMD646 
is a PCI device and should use PCI interrupts while via-ide is part of a 
southbridge/superio complex and connected to the ISA PICs within that southbride, 
so I think via-ide always uses ISA IRQs and the ISA btidge within the same chip 
may convert that to PCI IRQs or not (that part is where I'm lost also because we 
may not actually model it that way). After a long debate we managed to find a 
solution back then that works for every guest we use it for now so I think we 
don't want to touch it now until some real need arises. It does not worth the 
trouble and added complexity to model something that is not used just for the sake 
of correctness. By the time we find a use for that, the ISA emulation may evolve 
so it's easier to implement the missing switching between isa and native mode or 
we may want to do it differently (such as we do things differently now compared to 
what we did years ago). So I think it does not worth keeping the ISA model from 
being simplified for some theoretical uses in the future which we may not actually 
do any time soon. But I don't want to get into this again so just shared my 
thoughts and feel free to ignore it. I don't care where these patches go as long 
as the VIA model keeps working for me.


I have a vague memory that ISA compatibility mode was part of the original 
PCI-BMDMA specification, but it has been a while since I last looked.


Bernhard, is there any mention of this in the PIIX datasheet(s)? For reference the 
cmd646 datasheet specifies that ISA mode or PCI mode is determined by register 
PROG_IF (0x9) in PCI configuration space.


Orthogonal to this discussion, one problem I see here is a device
exposing 2 interfaces: ISA and PCI. QOM does support interfaces,
but ISA and PCI aren't QOM interfaces. The QOM cast macros are
written as a QOM object can only inherit one parent. Should we
stick to QDev and convert ISA/PCI as QOM interfaces? That could
solve some QDev IDE limitations...


The normal approach to this is to encapsulate the chip functionality as a set of 
common functions, for example see esp.c and then have separate files such as 
esp-pci.c and esp-isa.c which can be compiled accordingly using Kconfig dependencies.


FWIW the ability to support a legacy mode is only something that would be present in 
the generation of mixed PCI/ISA motherboards, and so probably not something that is 
worth the effort of reworking separate PCI and ISA QOM interfaces.



ATB,

Mark.

Re: [PATCH] hw/riscv: virt: Simplify virt_{get,set}_aclint()

2023-02-06 Thread Philippe Mathieu-Daudé


On 6/2/23 09:50, Bin Meng wrote:

There is no need to declare an intermediate "MachineState *ms".

Signed-off-by: Bin Meng 
---

  hw/riscv/virt.c | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job

2023-02-06 Thread Juan Quintela

Peter Maydell  wrote:
> On Fri, 3 Feb 2023 at 21:14, Juan Quintela  wrote:
>>
>> Peter Maydell  wrote:
>> > The migration tests have been flaky for a while now,
>> > including setups where host and guest page sizes are the same.
>> > (For instance, my x86 macos box pretty reliably sees failures
>> > when the machine is under load.)
>>
>> I *thought* that we had fixed all of those.
>>
>> But it is difficult for me to know because:
>> - I only happens when one runs "make check"
>> - running ./migration-test have never failed to me
>> - When it fails (and it has been a while since it has failed to me)
>>   it is impossible to me to detect what is going on, and as said, I have
>>   never been able to reproduce running only migration-test.
>
> Yes. If we could improve the logging in the test so that when
> an intermittent failure does happen the test prints better
> clues about what happened, I think that would help a lot.
>
> https://lore.kernel.org/qemu-devel/cafeaca8x_im3hn2-p9f+huxnxfxy+d6fze+leq4erldg7zk...@mail.gmail.com/
> is the thread from late December about the macos failures.

We (red hat) found a similar problem with aarch64, but only when using
zero copy.  Will try to see if I can reproduce this other there.

https://bugzilla.redhat.com/show_bug.cgi?id=2160929

the similar thing to what you have is:
- they are trying to cancel
- they are on aarch64

but:

- they can only reproduce with zero copy enabled.

Later, Juan.

Re: [PATCH 02/10] hw/riscv/virt: Add a switch to enable/disable ACPI

2023-02-06 Thread Bin Meng

On Thu, Feb 2, 2023 at 12:54 PM Sunil V L  wrote:
>
> ACPI is optional. So, add a switch to toggle.
>
> Signed-off-by: Sunil V L 
> ---
>  hw/riscv/virt.c | 38 ++
>  include/hw/riscv/virt.h |  2 ++
>  2 files changed, 40 insertions(+)
>
> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> index 7ad9fda20c..84962962ff 100644
> --- a/hw/riscv/virt.c
> +++ b/hw/riscv/virt.c
> @@ -50,6 +50,7 @@
>  #include "hw/pci-host/gpex.h"
>  #include "hw/display/ramfb.h"
>  #include "hw/acpi/aml-build.h"
> +#include "qapi/qapi-visit-common.h"
>
>  /*
>   * The virt machine physical address space used by some of the devices
> @@ -1525,6 +1526,10 @@ static void virt_machine_init(MachineState *machine)
>
>  static void virt_machine_instance_init(Object *obj)
>  {
> +MachineState *ms = MACHINE(obj);

Drop this

> +RISCVVirtState *s = RISCV_VIRT_MACHINE(ms);
> +
> +s->acpi = ON_OFF_AUTO_OFF;

Is this needed? I believe the purpose of an auto/on/off property is to
have an "auto" value, which is ON_OFF_AUTO_AUTO.

>  }
>
>  static char *virt_get_aia_guests(Object *obj, Error **errp)
> @@ -1601,6 +1606,34 @@ static void virt_set_aclint(Object *obj, bool value, 
> Error **errp)
>  s->have_aclint = value;
>  }
>
> +bool virt_is_acpi_enabled(RISCVVirtState *s)
> +{
> +if (s->acpi == ON_OFF_AUTO_OFF) {
> +return false;
> +}
> +return true;
> +}
> +
> +static void virt_get_acpi(Object *obj, Visitor *v, const char *name,
> +  void *opaque, Error **errp)
> +{
> +MachineState *ms = MACHINE(obj);

ditto

> +RISCVVirtState *s = RISCV_VIRT_MACHINE(ms);
> +
> +OnOffAuto acpi = s->acpi;
> +
> +visit_type_OnOffAuto(v, name, &acpi, errp);
> +}
> +
> +static void virt_set_acpi(Object *obj, Visitor *v, const char *name,
> +  void *opaque, Error **errp)
> +{
> +MachineState *ms = MACHINE(obj);

ditto

> +RISCVVirtState *s = RISCV_VIRT_MACHINE(ms);
> +
> +visit_type_OnOffAuto(v, name, &s->acpi, errp);
> +}
> +
>  static HotplugHandler *virt_machine_get_hotplug_handler(MachineState 
> *machine,
>  DeviceState *dev)
>  {
> @@ -1672,6 +1705,11 @@ static void virt_machine_class_init(ObjectClass *oc, 
> void *data)
>  sprintf(str, "Set number of guest MMIO pages for AIA IMSIC. Valid value "
>   "should be between 0 and %d.", VIRT_IRQCHIP_MAX_GUESTS);
>  object_class_property_set_description(oc, "aia-guests", str);
> +object_class_property_add(oc, "acpi", "OnOffAuto",
> +  virt_get_acpi, virt_set_acpi,
> +  NULL, NULL);

I am not sure about "OnOffAuto" vs. "bool" type. It seems the only
difference is that with "OnOffAuto" type we may silently change the
interpretation of "auto" value across different QEMU versions?

> +object_class_property_set_description(oc, "acpi",
> +  "Enable ACPI");
>  }
>
>  static const TypeInfo virt_machine_typeinfo = {
> diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
> index 6c7885bf89..62efebaa32 100644
> --- a/include/hw/riscv/virt.h
> +++ b/include/hw/riscv/virt.h
> @@ -58,6 +58,7 @@ struct RISCVVirtState {
>  int aia_guests;
>  char *oem_id;
>  char *oem_table_id;
> +OnOffAuto acpi;
>  };
>
>  enum {
> @@ -123,4 +124,5 @@ enum {
>  #define FDT_APLIC_INT_MAP_WIDTH (FDT_PCI_ADDR_CELLS + FDT_PCI_INT_CELLS + \
>   1 + FDT_APLIC_INT_CELLS)
>
> +bool virt_is_acpi_enabled(RISCVVirtState *s);
>  #endif
> --

Regards,
Bin

Re: [PATCH 03/10] hw/riscv/virt: Add memmap pointer to RiscVVirtState

2023-02-06 Thread Bin Meng

On Thu, Feb 2, 2023 at 12:54 PM Sunil V L  wrote:
>
> memmap needs to be exported outside of virt.c so that
> modules like acpi can use it. Hence, add a pointer field
> in RiscVVirtState structure and initialize it with the
> memorymap.
>
> Signed-off-by: Sunil V L 
> ---
>  hw/riscv/virt.c | 2 ++
>  include/hw/riscv/virt.h | 1 +
>  2 files changed, 3 insertions(+)
>

Reviewed-by: Bin Meng

Re: Qemu - how to run in Win?

2023-02-06 Thread Philippe Mathieu-Daudé


Cc'ing Yonggang & Stefan.

On 5/2/23 13:01, Jacob A wrote:

Hello,

After installing Qemu on Win, I don't see any shortcut to run it? There 
is only a link to 'uninstall'. launching exe files doesn't do anything.  
Can you please explain how to launch this application?


Thanks,
J.

Please see the attached image.

Re: [PATCH v3 8/9] hw/i386/x86: Make TYPE_X86_MACHINE the owner of smram

2023-02-06 Thread Juan Quintela

Philippe Mathieu-Daudé  wrote:
> On 4/2/23 16:10, Bernhard Beschow wrote:
>> Treat the smram MemoryRegion analoguous to other memory regions such as
>> ram, pci, io, ... , making the used memory regions more explicit when
>> instantiating q35 or i440fx.
>> Note that the q35 device uses these memory regions only during the
>> realize phase which suggests that it shouldn't be the owner of smram.
>
> Few years ago I tried something similar and it wasn't accepted because
> the MR owner name is used in the migration stream, so this was breaking
> migrating from older machines.

I don't remember the details O:-)

Migration code, really depends on RAMBlocks names, not memory region
names.  But as far as I remember, that don't matter too much because the
memory region names ends tangled quite a bit with the RAMBlock name, right?

> Adding David/Juan for double-checking that.

trace_vmstate_save(se->idstr, se->vmsd ? se->vmsd->name : "(old)");

You can try to enable this trace and see that every section has the same
name with and without your change (i.e. that memory region name is not
seen by the migration stream).

But that is the only help that I can came with.

The code that you are changing (smram) is something that I don't know
about to give you more help.

Looking at the patch, it looks that the name was before and now the
"sram", so perhaps it could help.  But I don't know.

In the i440fx you say that you only use it until realize, so you should
be safe.

For q35, it is not clear to me.

If the trace don't show new names, I will just try:
- migrate a i440fx machine from binary without your patch to one with
  your patch
- the same for q35.

And depending on the result, we can go from there.

Later, Juan.

Re: [PATCH v15 03/11] target/s390x/cpu topology: handle STSI(15) and build the SYSIB

2023-02-06 Thread Pierre Morel





On 2/3/23 18:36, Nina Schoetterl-Glausch wrote:

On Wed, 2023-02-01 at 14:20 +0100, Pierre Morel wrote:

On interception of STSI(15.1.x) the System Information Block
(SYSIB) is built from the list of pre-ordered topology entries.

Signed-off-by: Pierre Morel 
---
  include/hw/s390x/cpu-topology.h |  22 +++
  include/hw/s390x/sclp.h |   1 +
  target/s390x/cpu.h  |  72 +++
  hw/s390x/cpu-topology.c |  10 +
  target/s390x/kvm/cpu_topology.c | 335 
  target/s390x/kvm/kvm.c  |   5 +-
  target/s390x/kvm/meson.build|   3 +-
  7 files changed, 446 insertions(+), 2 deletions(-)
  create mode 100644 target/s390x/kvm/cpu_topology.c


[...]


diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index d654267a71..e1f6925856 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h

[...]

+
+/* CPU type Topology List Entry */
+typedef struct SysIBTl_cpu {
+uint8_t nl;
+uint8_t reserved0[3];
+#define SYSIB_TLE_POLARITY_MASK 0x03
+#define SYSIB_TLE_DEDICATED 0x04
+uint8_t entitlement;


I would just call this flags, since it's multiple fields.


OK




+uint8_t type;
+uint16_t origin;
+uint64_t mask;
+} QEMU_PACKED QEMU_ALIGNED(8) SysIBTl_cpu;
+QEMU_BUILD_BUG_ON(sizeof(SysIBTl_cpu) != 16);

+


[...]

  /**
diff --git a/target/s390x/kvm/cpu_topology.c b/target/s390x/kvm/cpu_topology.c
new file mode 100644
index 00..aba141fb66
--- /dev/null
+++ b/target/s390x/kvm/cpu_topology.c


[...]

+
+/*
+ * Macro to check that the size of data after increment
+ * will not get bigger than the size of the SysIB.
+ */
+#define SYSIB_GUARD(data, x) do {   \
+data += x;  \
+if (data  > sizeof(SysIB)) {\
+return -ENOSPC; \


I would go with ENOMEM here.


Considering your comment of the length in insert_stsi_15_1_x(), I think 
the best would be to return 0.

Because:
- We do not need to report any error reason.
- The value will be forwarded to insert_stsi_15_1_x() as the length of 
the SYSIB to write

- On error we do not write the SYSIB (len = 0)
- In normal case the return value is always non zero and positive.




+}   \
+} while (0)
+
+/**
+ * stsi_set_tle:
+ * @p: A pointer to the position of the first TLE
+ * @level: The nested level wanted by the guest
+ *
+ * Loop inside the s390_topology.list until the sentinelle entry


s/sentinelle/sentinel/


OK, thx,




+ * is found and for each entry:
+ *   - Check using SYSIB_GUARD() that the size of the SysIB is not
+ * reached.
+ *   - Add all the container TLE needed for the level
+ *   - Add the CPU TLE.


I'd focus more on *what* the function does instead of *how*.


Right.



Fill the SYSIB with the topology information as described in the PoP,
nesting containers as appropriate, with the maximum nesting limited by @level.

Or something similar.


Thanks, looks good to me .




+ *
+ * Return value:
+ * s390_top_set_level returns the size of the SysIB_15x after being


You forgot to rename the function here, right?
How about stsi_fill_topology_sysib or stsi_topology_fill_sysib, instead?



OK with stsi_topology_fill_sysib()




+ * filled with TLE on success.
+ * It returns -ENOSPC in the case we would overrun the end of the SysIB.


You would have to change to ENOMEM here than also.


As discussed above, it seems to me that return 0 is even better.




+ */
+static int stsi_set_tle(char *p, int level)
+{
+S390TopologyEntry *entry;
+int last_drawer = -1;
+int last_book = -1;
+int last_socket = -1;
+int drawer_id = 0;
+int book_id = 0;
+int socket_id = 0;
+int n = sizeof(SysIB_151x);
+
+QTAILQ_FOREACH(entry, &s390_topology.list, next) {
+int current_drawer = entry->id.drawer;
+int current_book = entry->id.book;
+int current_socket = entry->id.socket;


This only saves two characters, so you could just use entry->id. ...


OK




+bool drawer_change = last_drawer != current_drawer;
+bool book_change = drawer_change || last_book != current_book;
+bool socket_change = book_change || last_socket != current_socket;


... but keep it if it would make this line too long.


it is OK


You could also rename entry, to current or cur, if you want to emphasize that.


+
+/* If we reach the guard get out */
+if (entry->id.level5) {
+break;
+}
+
+if (level > 3 && drawer_change) {
+SYSIB_GUARD(n, sizeof(SysIBTl_container));
+p = fill_container(p, 3, drawer_id++);
+book_id = 0;
+}
+if (level > 2 && book_change) {
+SYSIB_GUARD(n, sizeof(SysIBTl_container));
+p = fill_container(p, 2, book_id++);
+socket_id = 0;
+}
+if (socket_change) {
+SYSIB_GUARD(n, sizeof(SysIBTl_container));
+p = fill_container(p, 1, s

Re: [PATCH v15 04/11] s390x/sclp: reporting the maximum nested topology entries

2023-02-06 Thread Thomas Huth


On 01/02/2023 14.20, Pierre Morel wrote:

The maximum nested topology entries is used by the guest to
know how many nested topology are available on the machine.

Let change the MNEST value from 2 to 4 in the SCLP READ INFO
structure now that we support books and drawers.

Signed-off-by: Pierre Morel 
Reviewed-by: Nina Schoetterl-Glausch 
---
  include/hw/s390x/sclp.h | 5 +++--
  hw/s390x/sclp.c | 5 +
  2 files changed, 8 insertions(+), 2 deletions(-)


Reviewed-by: Thomas Huth

Re: [PATCH 04/10] hw/riscv/virt: virt-acpi-build.c: Add basic ACPI tables

2023-02-06 Thread Bin Meng

On Thu, Feb 2, 2023 at 12:54 PM Sunil V L  wrote:
>
> Add few basic ACPI tables and DSDT with few devices in a
> new file virt-acpi-build.c.
>
> These are mostly leveraged from arm64.

There are lots of same ACPI codes existing in x86/arm/riscv. I believe
some refactoring work is needed before ACPI support fully lands on
RISC-V.
For example, we can extract the common part among x86/arm/riscv into a
separate file, like hw/acpi/acpi-build.c?

>
> Signed-off-by: Sunil V L 
> ---
>  hw/riscv/virt-acpi-build.c | 292 +
>  include/hw/riscv/virt.h|   1 +
>  2 files changed, 293 insertions(+)
>  create mode 100644 hw/riscv/virt-acpi-build.c
>
> diff --git a/hw/riscv/virt-acpi-build.c b/hw/riscv/virt-acpi-build.c
> new file mode 100644
> index 00..0410b955bd
> --- /dev/null
> +++ b/hw/riscv/virt-acpi-build.c
> @@ -0,0 +1,292 @@
> +/*
> + * Support for generating ACPI tables and passing them to Guests
> + *
> + * RISC-V virt ACPI generation
> + *
> + * Copyright (C) 2008-2010  Kevin O'Connor 
> + * Copyright (C) 2006 Fabrice Bellard
> + * Copyright (C) 2013 Red Hat Inc
> + * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
> + * Copyright (C) 2021-2023 Ventana Micro Systems Inc
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see .
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/acpi/acpi-defs.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/aml-build.h"
> +#include "hw/riscv/virt.h"
> +#include "hw/riscv/numa.h"
> +#include "hw/acpi/pci.h"
> +#include "hw/acpi/utils.h"
> +#include "sysemu/reset.h"
> +#include "hw/pci-host/gpex.h"
> +#include "qapi/error.h"
> +#include "migration/vmstate.h"
> +
> +#define ACPI_BUILD_TABLE_SIZE 0x2
> +
> +typedef struct AcpiBuildState {
> +/* Copy of table in RAM (for patching). */

nits: removing the ending .

> +MemoryRegion *table_mr;
> +MemoryRegion *rsdp_mr;
> +MemoryRegion *linker_mr;
> +/* Is table patched? */
> +bool patched;
> +} AcpiBuildState;
> +
> +static void

nits: please put above in the same line

> +acpi_align_size(GArray *blob, unsigned align)
> +{
> +/*
> + * Align size to multiple of given size. This reduces the chance
> + * we need to change size in the future (breaking cross version 
> migration).
> + */
> +g_array_set_size(blob, ROUND_UP(acpi_data_len(blob), align));
> +}
> +
> +static void
> +acpi_dsdt_add_cpus(Aml *scope, RISCVVirtState *vms)

QEMU convention is to use 's' for the model state, so:

s/vms/s/g

> +{
> +MachineState *ms = MACHINE(vms);
> +uint16_t i;
> +
> +
> +for (i = 0; i < ms->smp.cpus; i++) {
> +Aml *dev = aml_device("C%.03X", i);
> +aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007")));
> +aml_append(dev, aml_name_decl("_UID", aml_int(i)));
> +aml_append(scope, dev);
> +}
> +}
> +
> +static void
> +acpi_dsdt_add_fw_cfg(Aml *scope, const MemMapEntry *fw_cfg_memmap)
> +{
> +Aml *dev = aml_device("FWCF");
> +aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0002")));
> +/* device present, functioning, decoding, not shown in UI */
> +aml_append(dev, aml_name_decl("_STA", aml_int(0xB)));
> +aml_append(dev, aml_name_decl("_CCA", aml_int(1)));
> +
> +Aml *crs = aml_resource_template();
> +aml_append(crs, aml_memory32_fixed(fw_cfg_memmap->base,
> +   fw_cfg_memmap->size, AML_READ_WRITE));
> +aml_append(dev, aml_name_decl("_CRS", crs));
> +aml_append(scope, dev);
> +}
> +
> +/* FADT */
> +static void
> +build_fadt_rev6(GArray *table_data, BIOSLinker *linker,
> + RISCVVirtState *vms, unsigned dsdt_tbl_offset)
> +{
> +/* ACPI v5.1 */
> +AcpiFadtData fadt = {
> +.rev = 6,
> +.minor_ver = 0,
> +.flags = 1 << ACPI_FADT_F_HW_REDUCED_ACPI,
> +.xdsdt_tbl_offset = &dsdt_tbl_offset,
> +};
> +
> +build_fadt(table_data, linker, &fadt, vms->oem_id, vms->oem_table_id);
> +}
> +
> +/* DSDT */
> +static void
> +build_dsdt(GArray *table_data, BIOSLinker *linker, RISCVVirtState *vms)
> +{
> +Aml *scope, *dsdt;
> +const MemMapEntry *memmap = vms->memmap;
> +AcpiTable table = { .sig = "DSDT", .rev = 2, .oem_id = vms->oem_id,
> +.oem_table_id = vms->oem_table_id };
> +
> +
> +acpi_table_begin(&table, table

Re: Qemu - how to run in Win?

2023-02-06 Thread Bin Meng

On Mon, Feb 6, 2023 at 5:55 PM Philippe Mathieu-Daudé  wrote:
>
> Cc'ing Yonggang & Stefan.
>
> On 5/2/23 13:01, Jacob A wrote:
> > Hello,
> >
> > After installing Qemu on Win, I don't see any shortcut to run it? There
> > is only a link to 'uninstall'. launching exe files doesn't do anything.
> > Can you please explain how to launch this application?
> >

QEMU is a command-line utility so you need to run QEMU with some
parameters from a Windows shell.

Regards,
Bin

Re: [PATCH v15 04/11] s390x/sclp: reporting the maximum nested topology entries

2023-02-06 Thread Pierre Morel





On 2/6/23 11:13, Thomas Huth wrote:

On 01/02/2023 14.20, Pierre Morel wrote:

The maximum nested topology entries is used by the guest to
know how many nested topology are available on the machine.

Let change the MNEST value from 2 to 4 in the SCLP READ INFO
structure now that we support books and drawers.

Signed-off-by: Pierre Morel 
Reviewed-by: Nina Schoetterl-Glausch 
---
  include/hw/s390x/sclp.h | 5 +++--
  hw/s390x/sclp.c | 5 +
  2 files changed, 8 insertions(+), 2 deletions(-)


Reviewed-by: Thomas Huth 



Thanks,

Regards,
Pierre

--
Pierre Morel
IBM Lab Boeblingen

[PATCH] meson: Avoid duplicates in generated config-poison.h again

2023-02-06 Thread Markus Armbruster

Commit eed56e9a89f "configure, meson: move config-poison.h to meson"
lost a "| sort -u".  Restore it.  config-poison shrinks from ~4500 to
~700 lines when all targets are enabled.

Signed-off-by: Markus Armbruster 
---
 scripts/make-config-poison.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/make-config-poison.sh b/scripts/make-config-poison.sh
index d222a04304..1892854261 100755
--- a/scripts/make-config-poison.sh
+++ b/scripts/make-config-poison.sh
@@ -13,4 +13,4 @@ exec sed -n \
   -e's///' \
   -e's/ .*//' \
   -e's/^/#pragma GCC poison /p' \
-  -e '}' "$@"
+  -e '}' "$@" | sort -u
-- 
2.39.0

Re: [PATCH 07/10] hw/riscv: meson.build: Build virt-acpi-build.c

2023-02-06 Thread Bin Meng

On Thu, Feb 2, 2023 at 12:53 PM Sunil V L  wrote:
>
> ACPI functions are defined in new file virt-acpi-build.c. Enable
> it to be built as part of virt machine if CONFIG_ACPI is set.
>
> Signed-off-by: Sunil V L 
> ---
>  hw/riscv/meson.build | 1 +
>  1 file changed, 1 insertion(+)
>

Reviewed-by: Bin Meng

Re: [PATCH 08/10] hw/riscv/Kconfig: virt: Enable ACPI config options

2023-02-06 Thread Bin Meng

On Thu, Feb 2, 2023 at 12:54 PM Sunil V L  wrote:
>
> Enable ACPI related config options to build ACPI subsystem
> for virt machine.
>
> Signed-off-by: Sunil V L 
> ---
>  hw/riscv/Kconfig | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig
> index 4550b3b938..92b1a9eb64 100644
> --- a/hw/riscv/Kconfig
> +++ b/hw/riscv/Kconfig
> @@ -44,6 +44,9 @@ config RISCV_VIRT
>  select VIRTIO_MMIO
>  select FW_CFG_DMA
>  select PLATFORM_BUS
> +select ACPI
> +select ACPI_HW_REDUCED

I don't see APIs in ACPI_HW_REDUCED get called by RISC-V virt codes.

> +select ACPI_PCI

Regards,
Bin

Re: [PATCH] meson: Avoid duplicates in generated config-poison.h again

2023-02-06 Thread Marc-André Lureau

On Mon, Feb 6, 2023 at 2:21 PM Markus Armbruster  wrote:
>
> Commit eed56e9a89f "configure, meson: move config-poison.h to meson"
> lost a "| sort -u".  Restore it.  config-poison shrinks from ~4500 to
> ~700 lines when all targets are enabled.
>
> Signed-off-by: Markus Armbruster 

Reviewed-by: Marc-André Lureau 

> ---
>  scripts/make-config-poison.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/make-config-poison.sh b/scripts/make-config-poison.sh
> index d222a04304..1892854261 100755
> --- a/scripts/make-config-poison.sh
> +++ b/scripts/make-config-poison.sh
> @@ -13,4 +13,4 @@ exec sed -n \
>-e's///' \
>-e's/ .*//' \
>-e's/^/#pragma GCC poison /p' \
> -  -e '}' "$@"
> +  -e '}' "$@" | sort -u
> --
> 2.39.0
>
>


-- 
Marc-André Lureau

Re: [PATCH 09/10] hw/riscv/virt.c: Initialize the ACPI tables

2023-02-06 Thread Bin Meng

On Thu, Feb 2, 2023 at 12:53 PM Sunil V L  wrote:
>
> When the "acpi=on", ACPI tables need to be added. Detect the option
> and initialize the ACPI tables.
>
> Signed-off-by: Sunil V L 
> ---
>  hw/riscv/virt.c | 4 
>  1 file changed, 4 insertions(+)
>

Reviewed-by: Bin Meng

Re: [PATCH v15 03/11] target/s390x/cpu topology: handle STSI(15) and build the SYSIB

2023-02-06 Thread Nina Schoetterl-Glausch

On Mon, 2023-02-06 at 11:06 +0100, Pierre Morel wrote:
> 
> On 2/3/23 18:36, Nina Schoetterl-Glausch wrote:
> > On Wed, 2023-02-01 at 14:20 +0100, Pierre Morel wrote:
> > > > On interception of STSI(15.1.x) the System Information Block
> > > > (SYSIB) is built from the list of pre-ordered topology entries.
> > > > 
> > > > Signed-off-by: Pierre Morel 
> > > > ---
> > > >   include/hw/s390x/cpu-topology.h |  22 +++
> > > >   include/hw/s390x/sclp.h |   1 +
> > > >   target/s390x/cpu.h  |  72 +++
> > > >   hw/s390x/cpu-topology.c |  10 +
> > > >   target/s390x/kvm/cpu_topology.c | 335 
> > > >   target/s390x/kvm/kvm.c  |   5 +-
> > > >   target/s390x/kvm/meson.build|   3 +-
> > > >   7 files changed, 446 insertions(+), 2 deletions(-)
> > > >   create mode 100644 target/s390x/kvm/cpu_topology.c
> > > > 
[...]
> > 
> > > > + */
> > > > +void insert_stsi_15_1_x(S390CPU *cpu, int sel2, __u64 addr, uint8_t ar)
> > > > +{
> > > > +SysIB sysib = {0};
> > > > +int len;
> > > > +
> > > > +if (!s390_has_topology() || sel2 < 2 || sel2 > 
> > > > SCLP_READ_SCP_INFO_MNEST) {
> > > > +setcc(cpu, 3);
> > > > +return;
> > > > +}
> > > > +
> > > > +s390_order_tle();
> > > > +
> > > > +len = setup_stsi(&sysib.sysib_151x, sel2);
> > > > +
> > > > +if (len < 0) {
> > 
> > I stumbled a bit over this, maybe rename len to r.
> 
> Why ? it is the length used to fill the length field of the SYSIB.

Well a negative length doesn't make sense, that's what confused me for a bit.
It's the error value of course, I proposed renaming it to the more generic r 
return value,
to signify that the return value isn't exactly the length.

You're solution of using 0 is fine with me, too.
> 
> May be it would be clearer if we give back the length to write and 0 on 
> error then we have here:
> 
>   if (!len) {
>   setcc(cpu, 3);
>   return;
>   }
> 
> > 
> > > > +setcc(cpu, 3);
> > > > +return;
> > > > +}
> > > > +
> > > > +sysib.sysib_151x.length = cpu_to_be16(len);
> > > > +s390_cpu_virt_mem_write(cpu, addr, ar, &sysib, len);
> > > > +setcc(cpu, 0);
> > > > +
> > > > +s390_free_tle();
> > > > +}
> 
> Thanks for the comments.
> If there are only comments and cosmetic changes will I get your RB if I 
> change them according to your wishes?

Usually I review the whole series and then comment, this time sent feedback 
early,
so the review isn't as deep. Probably easiest to give you the R-b for v16.
My impression is that the series is close to final.
> 
> Regards,
> Pierre
>

Re: [PATCH 10/10] MAINTAINERS: Add entry for RISC-V ACPI

2023-02-06 Thread Bin Meng

On Thu, Feb 2, 2023 at 12:54 PM Sunil V L  wrote:
>
> RISC-V ACPI related functionality for virt machine is added in
> virt-acpi-build.c. Add the maintainer entry.
>
> Signed-off-by: Sunil V L 
> ---
>  MAINTAINERS | 6 ++
>  1 file changed, 6 insertions(+)
>

Reviewed-by: Bin Meng

Re: Qemu - how to run in Win?

2023-02-06 Thread Jacob A

Understood, thanks. Will stick to GUI app.

On Mon, 6 Feb 2023 at 11:19, Bin Meng  wrote:

> On Mon, Feb 6, 2023 at 5:55 PM Philippe Mathieu-Daudé 
> wrote:
> >
> > Cc'ing Yonggang & Stefan.
> >
> > On 5/2/23 13:01, Jacob A wrote:
> > > Hello,
> > >
> > > After installing Qemu on Win, I don't see any shortcut to run it? There
> > > is only a link to 'uninstall'. launching exe files doesn't do anything.
> > > Can you please explain how to launch this application?
> > >
>
> QEMU is a command-line utility so you need to run QEMU with some
> parameters from a Windows shell.
>
> Regards,
> Bin
>

Re: [PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job

2023-02-06 Thread Peter Maydell

On Mon, 6 Feb 2023 at 08:46, Juan Quintela  wrote:
>  31/659 qemu:qtest+qtest-aarch64 / qtest-aarch64/migration-test   
> ERROR  48.23s   killed by signal 6 SIGABRT
> >>> G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
> >>>  QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_BINARY=./qemu-system-aarch64 
> >>> MALLOC_PERTURB_=124 
> >>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
> >>> /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qtest/migration-test
> >>>  --tap -k
> ― ✀  ―
> stderr:
> Broken pipe
> ../tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 11 
> (Segmentation fault) (core dumped)
> TAP parsing error: Too few tests run (expected 41, got 12)
> (test program exited with status code -6)
> ――
>
> I don't know hat to do with this:
> - this is aarch64 tcg
> - this *works* on f37, or at least I can't reproduce any error with make
>   check on my box, and I *think* my configuration is quite extensive (as
>   far as I know everything that can be compiled in fedora with packages
>   in the distro):

> - It gives a segmentation fault.  Nothing else.
>
> Can we get at least a backtrace to work from there?

Improving the test program / test harness to provide better
information would help. Ultimately if we're going to be
doing CI in gitlab this is a necessity -- all we are ever
going to have is whatever the test program and harness can
print to the logs...

thanks
-- PMM

Re: [PATCH 02/10] hw/riscv/virt: Add a switch to enable/disable ACPI

2023-02-06 Thread Andrea Bolognani

On Thu, Feb 02, 2023 at 10:22:15AM +0530, Sunil V L wrote:
> +object_class_property_add(oc, "acpi", "OnOffAuto",
> +  virt_get_acpi, virt_set_acpi,
> +  NULL, NULL);
> +object_class_property_set_description(oc, "acpi",
> +  "Enable ACPI");

The way this works on other architectures (x86_64, aarch64) is that
you get ACPI by default and can use -no-acpi to disable it if
desired. Can we have the same on RISC-V, for consistency?

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [PATCH] meson: Avoid duplicates in generated config-poison.h again

2023-02-06 Thread Philippe Mathieu-Daudé


On 6/2/23 11:20, Markus Armbruster wrote:

Commit eed56e9a89f "configure, meson: move config-poison.h to meson"
lost a "| sort -u".  Restore it.  config-poison shrinks from ~4500 to
~700 lines when all targets are enabled.

Signed-off-by: Markus Armbruster 
---
  scripts/make-config-poison.sh | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v15 05/11] s390x/cpu topology: resetting the Topology-Change-Report

2023-02-06 Thread Thomas Huth


On 01/02/2023 14.20, Pierre Morel wrote:

During a subsystem reset the Topology-Change-Report is cleared
by the machine.
Let's ask KVM to clear the Modified Topology Change Report (MTCR)
bit of the SCA in the case of a subsystem reset.

Signed-off-by: Pierre Morel 
---

...

diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index a80a1ebf22..cf63f3dd01 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -85,6 +85,18 @@ static void s390_topology_init(MachineState *ms)
  QTAILQ_INSERT_HEAD(&s390_topology.list, entry, next);
  }
  
+/**

+ * s390_topology_reset:
+ *
+ * Generic reset for CPU topology, calls s390_topology_reset()
+ * s390_topology_reset() to reset the kernel Modified Topology


Duplicated s390_topology_reset() in the comment.


+ * change record.
+ */
+void s390_topology_reset(void)
+{
+s390_cpu_topology_reset();
+}


With the nit fixed:
Reviewed-by: Thomas Huth

Re: [PATCH 1/1] hw/loongarch/virt: add system_powerdown hmp command support

2023-02-06 Thread Philippe Mathieu-Daudé


On 12/1/23 07:11, Song Gao wrote:

For loongarch virt machine, add powerdown notification callback
and send ACPI_POWER_DOWN_STATUS event by acpi ged. Also add
acpi dsdt table for ACPI_POWER_BUTTON_DEVICE device in this
patch.

Signed-off-by: Song Gao 
---
  hw/loongarch/acpi-build.c   |  1 +
  hw/loongarch/virt.c | 14 ++
  include/hw/loongarch/virt.h |  1 +
  3 files changed, 16 insertions(+)


Cc'ing ACPI maintainers:

$ git grep acpi_send_event
include/hw/acpi/acpi_dev_interface.h:30:void acpi_send_event(DeviceState 
*dev, AcpiEventStatusBits event);


$ ./scripts/get_maintainer.pl -f include/hw/acpi/acpi_dev_interface.h
"Michael S. Tsirkin"  (supporter:ACPI/SMBIOS)
Igor Mammedov  (supporter:ACPI/SMBIOS)
Ani Sinha  (reviewer:ACPI/SMBIOS)
qemu-devel@nongnu.org (open list:All patches CC here)


diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c
index c2b237736d..b7601cb284 100644
--- a/hw/loongarch/acpi-build.c
+++ b/hw/loongarch/acpi-build.c
@@ -261,6 +261,7 @@ build_la_ged_aml(Aml *dsdt, MachineState *machine)
   AML_SYSTEM_MEMORY,
   VIRT_GED_MEM_ADDR);
  }
+acpi_dsdt_add_power_button(dsdt);
  }
  
  static void build_pci_device_aml(Aml *scope, LoongArchMachineState *lams)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 66be925068..a4998599d3 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -316,6 +316,16 @@ static void virt_machine_done(Notifier *notifier, void 
*data)
  loongarch_acpi_setup(lams);
  }
  
+static void virt_powerdown_req(Notifier *notifier, void *opaque)

+{
+LoongArchMachineState *s = container_of(notifier,
+   LoongArchMachineState, powerdown_notifier);
+
+if (s->acpi_ged) {


Why check for acpi_ged? AFAICT it is always set.

Otherwise LGTM.

Reviewed-by: Philippe Mathieu-Daudé 


+acpi_send_event(s->acpi_ged, ACPI_POWER_DOWN_STATUS);
+}
+}
+
  struct memmap_entry {
  uint64_t address;
  uint64_t length;
@@ -859,6 +869,10 @@ static void loongarch_init(MachineState *machine)
 VIRT_PLATFORM_BUS_IRQ);
  lams->machine_done.notify = virt_machine_done;
  qemu_add_machine_init_done_notifier(&lams->machine_done);
+ /* connect powerdown request */
+lams->powerdown_notifier.notify = virt_powerdown_req;
+qemu_register_powerdown_notifier(&lams->powerdown_notifier);
+
  fdt_add_pcie_node(lams);
  /*
   * Since lowmem region starts from 0 and Linux kernel legacy start address
diff --git a/include/hw/loongarch/virt.h b/include/hw/loongarch/virt.h
index f5f818894e..7ae8a91229 100644
--- a/include/hw/loongarch/virt.h
+++ b/include/hw/loongarch/virt.h
@@ -45,6 +45,7 @@ struct LoongArchMachineState {
  /* State for other subsystems/APIs: */
  FWCfgState  *fw_cfg;
  Notifier machine_done;
+Notifier powerdown_notifier;
  OnOffAutoacpi;
  char *oem_id;
  char *oem_table_id;

Re: [PATCH 02/10] hw/riscv/virt: Add a switch to enable/disable ACPI

2023-02-06 Thread Philippe Mathieu-Daudé


On 6/2/23 11:54, Andrea Bolognani wrote:

On Thu, Feb 02, 2023 at 10:22:15AM +0530, Sunil V L wrote:

+object_class_property_add(oc, "acpi", "OnOffAuto",
+  virt_get_acpi, virt_set_acpi,
+  NULL, NULL);
+object_class_property_set_description(oc, "acpi",
+  "Enable ACPI");


The way this works on other architectures (x86_64, aarch64) is that
you get ACPI by default and can use -no-acpi to disable it if
desired. Can we have the same on RISC-V, for consistency?


-no-acpi rather seems a x86-specific hack for the ISA PC machine, and
has a high maintenance cost / burden.

If hardware provides ACPI support, QEMU should expose it to the guest.

Actually, what is the value added by '-no-acpi'?

[PATCH RFCv1 2/8] memory: Add last stage indicator to global dirty log synchronization

2023-02-06 Thread Gavin Shan

The global dirty log synchronization is used when KVM and dirty ring
are enabled. There is a particularity for ARM64 where the backup
bitmap is used to track dirty pages in non-running-vcpu situations.
It means the dirty ring works with the combination of ring buffer
and backup bitmap. The dirty bits in the backup bitmap needs to
collected in the last stage of live migration.

In order to identify the last stage of live migration and pass it
down, an extra parameter is added to the relevant functions and
callback. This last stage information isn't used yet.

No functional change intended.

Signed-off-by: Gavin Shan 
---
 accel/kvm/kvm-all.c   |  2 +-
 include/exec/memory.h |  5 +++--
 migration/dirtyrate.c |  4 ++--
 migration/ram.c   |  6 +++---
 softmmu/memory.c  | 10 +-
 5 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 9b26582655..01a6a026af 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1554,7 +1554,7 @@ static void kvm_log_sync(MemoryListener *listener,
 kvm_slots_unlock();
 }
 
-static void kvm_log_sync_global(MemoryListener *l)
+static void kvm_log_sync_global(MemoryListener *l, bool last_stage)
 {
 KVMMemoryListener *kml = container_of(l, KVMMemoryListener, listener);
 KVMState *s = kvm_state;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2e602a2fad..75b2fd9f48 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -929,8 +929,9 @@ struct MemoryListener {
  * its @log_sync must be NULL.  Vice versa.
  *
  * @listener: The #MemoryListener.
+ * @last_stage: The last stage to synchronize the log during migration
  */
-void (*log_sync_global)(MemoryListener *listener);
+void (*log_sync_global)(MemoryListener *listener, bool last_stage);
 
 /**
  * @log_clear:
@@ -2408,7 +2409,7 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr,
  *
  * Synchronizes the dirty page log for all address spaces.
  */
-void memory_global_dirty_log_sync(void);
+void memory_global_dirty_log_sync(bool last_stage);
 
 /**
  * memory_global_dirty_log_sync: synchronize the dirty log for all memory
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 4bfb97fc68..aecc8142e4 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -101,7 +101,7 @@ void global_dirty_log_change(unsigned int flag, bool start)
 static void global_dirty_log_sync(unsigned int flag, bool one_shot)
 {
 qemu_mutex_lock_iothread();
-memory_global_dirty_log_sync();
+memory_global_dirty_log_sync(false);
 if (one_shot) {
 memory_global_dirty_log_stop(flag);
 }
@@ -553,7 +553,7 @@ static void calculate_dirtyrate_dirty_bitmap(struct 
DirtyRateConfig config)
  * skip it unconditionally and start dirty tracking
  * from 2'round of log sync
  */
-memory_global_dirty_log_sync();
+memory_global_dirty_log_sync(false);
 
 /*
  * reset page protect manually and unconditionally.
diff --git a/migration/ram.c b/migration/ram.c
index 334309f1c6..d1b9b270ec 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1188,7 +1188,7 @@ static void migration_bitmap_sync(RAMState *rs)
 }
 
 trace_migration_bitmap_sync_start();
-memory_global_dirty_log_sync();
+memory_global_dirty_log_sync(false);
 
 qemu_mutex_lock(&rs->bitmap_mutex);
 WITH_RCU_READ_LOCK_GUARD() {
@@ -3817,7 +3817,7 @@ void colo_incoming_start_dirty_log(void)
 qemu_mutex_lock_iothread();
 qemu_mutex_lock_ramlist();
 
-memory_global_dirty_log_sync();
+memory_global_dirty_log_sync(false);
 WITH_RCU_READ_LOCK_GUARD() {
 RAMBLOCK_FOREACH_NOT_IGNORED(block) {
 ramblock_sync_dirty_bitmap(ram_state, block);
@@ -4114,7 +4114,7 @@ void colo_flush_ram_cache(void)
 void *src_host;
 unsigned long offset = 0;
 
-memory_global_dirty_log_sync();
+memory_global_dirty_log_sync(false);
 WITH_RCU_READ_LOCK_GUARD() {
 RAMBLOCK_FOREACH_NOT_IGNORED(block) {
 ramblock_sync_dirty_bitmap(ram_state, block);
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 9d64efca26..1cc36ef028 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2224,7 +2224,7 @@ void memory_region_set_dirty(MemoryRegion *mr, hwaddr 
addr,
  * If memory region `mr' is NULL, do global sync.  Otherwise, sync
  * dirty bitmap for the specified memory region.
  */
-static void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
+static void memory_region_sync_dirty_bitmap(MemoryRegion *mr, bool last_stage)
 {
 MemoryListener *listener;
 AddressSpace *as;
@@ -2254,7 +2254,7 @@ static void memory_region_sync_dirty_bitmap(MemoryRegion 
*mr)
  * is to do a global sync, because we are not capable to
  * sync in a finer granularity.
  */
-listener->log_sync_global(listener);
+listener->log_sync_global(listener, last_stage);
 trace_memory_region

[PATCH RFCv1 0/8] hw/arm/virt: Support dirty ring

2023-02-06 Thread Gavin Shan

This series intends to support dirty ring for live migration. The dirty
ring use discrete buffer to track dirty pages. For ARM64, the speciality
is to use backup bitmap to track dirty pages when there is no-running-vcpu
context. It's known that the backup bitmap needs to be synchronized when
KVM device "kvm-arm-gicv3" or "arm-its-kvm" has been enabled. The backup
bitmap is collected in the last stage of migration.

PATCH[1]Synchronize linux-headers for dirty ring
PATCH[2-3]  Introduce indicator of the last stage migration and pass it
all the way down
PATCH[4-5]  Introduce secondary bitmap, corresponding to the backup bitmap
PATCH[6-8]  Enable dirty ring for hw/arm/virt

Testing
===
(1) kvm-unit-tests/its-pending-migration and kvm-unit-tests/its-migration with
dirty ring or normal dirty page tracking mechanism. All test cases passed.

QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \
./its-pending-migration

QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \
./its-migration

QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \
./its-pending-migration

QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \
./its-migration

(2) Combinations of migration, post-copy migration, e1000e and virtio-net
devices. All test cases passed.

-netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown  \
-device e1000e,bus=pcie.5,netdev=net0,mac=52:54:00:f1:26:a0

-netdev tap,id=vnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-device virtio-net-pci,bus=pcie.6,netdev=vnet0,mac=52:54:00:f1:26:b0

Gavin Shan (8):
  linux-headers: Update for dirty ring
  memory: Add last stage indicator to global dirty log synchronization
  migration: Add last stage indicator to global dirty log
synchronization
  kvm: Introduce secondary dirty bitmap
  kvm: Synchronize secondary bitmap in last stage
  kvm: Add helper kvm_dirty_ring_init()
  hw/arm/virt: Enable backup bitmap for dirty ring
  kvm: Enable dirty ring for arm64

 accel/kvm/kvm-all.c   | 138 --
 hw/arm/virt.c |  26 +++
 include/exec/memory.h |   5 +-
 include/sysemu/kvm_int.h  |   1 +
 linux-headers/asm-arm64/kvm.h |   1 +
 linux-headers/linux/kvm.h |   2 +
 migration/dirtyrate.c |   4 +-
 migration/ram.c   |  20 ++---
 softmmu/memory.c  |  10 +--
 target/arm/kvm64.c|  25 ++
 target/arm/kvm_arm.h  |  12 +++
 11 files changed, 185 insertions(+), 59 deletions(-)

-- 
2.23.0

[PATCH RFCv1 1/8] linux-headers: Update for dirty ring

2023-02-06 Thread Gavin Shan

Signed-off-by: Gavin Shan 
---
 linux-headers/asm-arm64/kvm.h | 1 +
 linux-headers/linux/kvm.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index 4bf2d7246e..a7cfefb3a8 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -43,6 +43,7 @@
 #define __KVM_HAVE_VCPU_EVENTS
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+#define KVM_DIRTY_LOG_PAGE_OFFSET 64
 
 #define KVM_REG_SIZE(id)   \
(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index ebdafa576d..5b4e0e4e68 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1175,6 +1175,8 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_VM_DISABLE_NX_HUGE_PAGES 220
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
+#define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.23.0

[PATCH RFCv1 3/8] migration: Add last stage indicator to global dirty log synchronization

2023-02-06 Thread Gavin Shan

For the pre-copy live migration scenario, the last stage indicator
is needed for KVM backend to collect the dirty pages from the backup
bitmap when dirty ring is used. The indicator isn't used so far.

No functional change intended.

Signed-off-by: Gavin Shan 
---
 migration/ram.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index d1b9b270ec..6c9642edee 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1176,7 +1176,7 @@ static void migration_trigger_throttle(RAMState *rs)
 }
 }
 
-static void migration_bitmap_sync(RAMState *rs)
+static void migration_bitmap_sync(RAMState *rs, bool last_stage)
 {
 RAMBlock *block;
 int64_t end_time;
@@ -1188,7 +1188,7 @@ static void migration_bitmap_sync(RAMState *rs)
 }
 
 trace_migration_bitmap_sync_start();
-memory_global_dirty_log_sync(false);
+memory_global_dirty_log_sync(last_stage);
 
 qemu_mutex_lock(&rs->bitmap_mutex);
 WITH_RCU_READ_LOCK_GUARD() {
@@ -1222,7 +1222,7 @@ static void migration_bitmap_sync(RAMState *rs)
 }
 }
 
-static void migration_bitmap_sync_precopy(RAMState *rs)
+static void migration_bitmap_sync_precopy(RAMState *rs, bool last_stage)
 {
 Error *local_err = NULL;
 
@@ -1235,7 +1235,7 @@ static void migration_bitmap_sync_precopy(RAMState *rs)
 local_err = NULL;
 }
 
-migration_bitmap_sync(rs);
+migration_bitmap_sync(rs, last_stage);
 
 if (precopy_notify(PRECOPY_NOTIFY_AFTER_BITMAP_SYNC, &local_err)) {
 error_report_err(local_err);
@@ -2844,7 +2844,7 @@ void ram_postcopy_send_discard_bitmap(MigrationState *ms)
 RCU_READ_LOCK_GUARD();
 
 /* This should be our last sync, the src is now paused */
-migration_bitmap_sync(rs);
+migration_bitmap_sync(rs, false);
 
 /* Easiest way to make sure we don't resume in the middle of a host-page */
 rs->pss[RAM_CHANNEL_PRECOPY].last_sent_block = NULL;
@@ -3034,7 +3034,7 @@ static void ram_init_bitmaps(RAMState *rs)
 /* We don't use dirty log with background snapshots */
 if (!migrate_background_snapshot()) {
 memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
-migration_bitmap_sync_precopy(rs);
+migration_bitmap_sync_precopy(rs, false);
 }
 }
 qemu_mutex_unlock_ramlist();
@@ -3349,7 +3349,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 
 WITH_RCU_READ_LOCK_GUARD() {
 if (!migration_in_postcopy()) {
-migration_bitmap_sync_precopy(rs);
+migration_bitmap_sync_precopy(rs, true);
 }
 
 ram_control_before_iterate(f, RAM_CONTROL_FINISH);
@@ -3407,7 +3407,7 @@ static void ram_save_pending(QEMUFile *f, void *opaque, 
uint64_t max_size,
 remaining_size < max_size) {
 qemu_mutex_lock_iothread();
 WITH_RCU_READ_LOCK_GUARD() {
-migration_bitmap_sync_precopy(rs);
+migration_bitmap_sync_precopy(rs, false);
 }
 qemu_mutex_unlock_iothread();
 remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
-- 
2.23.0

[PATCH RFCv1 4/8] kvm: Introduce secondary dirty bitmap

2023-02-06 Thread Gavin Shan

When dirty ring is enabled on ARM64, the backup bitmap may be used
to track the dirty pages in no-running-vcpu situations. The original
bitmap is the primary one, used for the dirty ring buffer. We need
the secondary bitmap to collect the backup bitmap for ARM64.

No functional change intended.

Signed-off-by: Gavin Shan 
---
 accel/kvm/kvm-all.c  | 50 ++--
 include/sysemu/kvm_int.h |  1 +
 2 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 01a6a026af..1a93985574 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -553,13 +553,29 @@ static void kvm_log_stop(MemoryListener *listener,
 }
 }
 
+static unsigned long *kvm_slot_dirty_bitmap(KVMSlot *slot, bool primary)
+{
+if (primary) {
+return slot->dirty_bmap;
+}
+
+return slot->dirty_bmap +
+   slot->dirty_bmap_size / sizeof(slot->dirty_bmap[0]);
+}
+
 /* get kvm's dirty pages bitmap and update qemu's */
-static void kvm_slot_sync_dirty_pages(KVMSlot *slot)
+static void kvm_slot_sync_dirty_pages(KVMSlot *slot, bool primary)
 {
+KVMState *s = kvm_state;
+unsigned long *bmap = kvm_slot_dirty_bitmap(slot, primary);
 ram_addr_t start = slot->ram_start_offset;
 ram_addr_t pages = slot->memory_size / qemu_real_host_page_size();
 
-cpu_physical_memory_set_dirty_lebitmap(slot->dirty_bmap, start, pages);
+if (!s->kvm_dirty_ring_with_bitmap && !primary) {
+return;
+}
+
+cpu_physical_memory_set_dirty_lebitmap(bmap, start, pages);
 }
 
 static void kvm_slot_reset_dirty_pages(KVMSlot *slot)
@@ -572,6 +588,9 @@ static void kvm_slot_reset_dirty_pages(KVMSlot *slot)
 /* Allocate the dirty bitmap for a slot  */
 static void kvm_slot_init_dirty_bitmap(KVMSlot *mem)
 {
+KVMState *s = kvm_state;
+hwaddr bitmap_size, alloc_size;
+
 if (!(mem->flags & KVM_MEM_LOG_DIRTY_PAGES) || mem->dirty_bmap) {
 return;
 }
@@ -593,9 +612,11 @@ static void kvm_slot_init_dirty_bitmap(KVMSlot *mem)
  * And mem->memory_size is aligned to it (otherwise this mem can't
  * be registered to KVM).
  */
-hwaddr bitmap_size = ALIGN(mem->memory_size / qemu_real_host_page_size(),
-/*HOST_LONG_BITS*/ 64) / 8;
-mem->dirty_bmap = g_malloc0(bitmap_size);
+bitmap_size = ALIGN(mem->memory_size / qemu_real_host_page_size(),
+/*HOST_LONG_BITS*/ 64) / 8;
+alloc_size = s->kvm_dirty_ring_with_bitmap ? 2 * bitmap_size : bitmap_size;
+
+mem->dirty_bmap = g_malloc0(alloc_size);
 mem->dirty_bmap_size = bitmap_size;
 }
 
@@ -603,12 +624,16 @@ static void kvm_slot_init_dirty_bitmap(KVMSlot *mem)
  * Sync dirty bitmap from kernel to KVMSlot.dirty_bmap, return true if
  * succeeded, false otherwise
  */
-static bool kvm_slot_get_dirty_log(KVMState *s, KVMSlot *slot)
+static bool kvm_slot_get_dirty_log(KVMState *s, KVMSlot *slot, bool primary)
 {
 struct kvm_dirty_log d = {};
 int ret;
 
-d.dirty_bitmap = slot->dirty_bmap;
+if (!s->kvm_dirty_ring_with_bitmap && !primary) {
+return false;
+}
+
+d.dirty_bitmap = kvm_slot_dirty_bitmap(slot, primary);
 d.slot = slot->slot | (slot->as_id << 16);
 ret = kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, &d);
 
@@ -839,8 +864,8 @@ static void 
kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
 /* We don't have a slot if we want to trap every access. */
 return;
 }
-if (kvm_slot_get_dirty_log(s, mem)) {
-kvm_slot_sync_dirty_pages(mem);
+if (kvm_slot_get_dirty_log(s, mem, true)) {
+kvm_slot_sync_dirty_pages(mem, true);
 }
 start_addr += slot_size;
 size -= slot_size;
@@ -1353,9 +1378,9 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
 if (kvm_state->kvm_dirty_ring_size) {
 kvm_dirty_ring_reap_locked(kvm_state, NULL);
 } else {
-kvm_slot_get_dirty_log(kvm_state, mem);
+kvm_slot_get_dirty_log(kvm_state, mem, true);
 }
-kvm_slot_sync_dirty_pages(mem);
+kvm_slot_sync_dirty_pages(mem, true);
 }
 
 /* unregister the slot */
@@ -1572,7 +1597,7 @@ static void kvm_log_sync_global(MemoryListener *l, bool 
last_stage)
 for (i = 0; i < s->nr_slots; i++) {
 mem = &kml->slots[i];
 if (mem->memory_size && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
-kvm_slot_sync_dirty_pages(mem);
+kvm_slot_sync_dirty_pages(mem, true);
 /*
  * This is not needed by KVM_GET_DIRTY_LOG because the
  * ioctl will unconditionally overwrite the whole region.
@@ -3701,6 +3726,7 @@ static void kvm_accel_instance_init(Object *obj)
 s->kernel_irqchip_split = ON_OFF_AUTO_AUTO;
 /* KVM dirty ring is by default off */
 s->kvm_dirty_ring_size = 0;
+

[PATCH RFCv1 8/8] kvm: Enable dirty ring for arm64

2023-02-06 Thread Gavin Shan

arm64 has different capability from x86 to enable the dirty ring, which
is KVM_CAP_DIRTY_LOG_RING_ACQ_REL. To enable it in kvm_dirty_ring_init()
when KVM_CAP_DIRTY_LOG_RING isn't supported.

Signed-off-by: Gavin Shan 
---
 accel/kvm/kvm-all.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 399ef0f7e6..8ab31865eb 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1479,13 +1479,19 @@ static int kvm_dirty_ring_reaper_init(KVMState *s)
 static int kvm_dirty_ring_init(KVMState *s)
 {
 uint64_t ring_bytes;
+unsigned int capability = KVM_CAP_DIRTY_LOG_RING;
 int ret;
 
 /*
  * Read the max supported pages. Fall back to dirty logging mode
  * if the dirty ring isn't supported.
  */
-ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING);
+ret = kvm_vm_check_extension(s, capability);
+if (ret <= 0) {
+capability = KVM_CAP_DIRTY_LOG_RING_ACQ_REL;
+ret = kvm_vm_check_extension(s, capability);
+}
+
 if (ret <= 0) {
 warn_report("KVM dirty ring not available, using bitmap method");
 s->kvm_dirty_ring_size = 0;
@@ -1502,7 +1508,7 @@ static int kvm_dirty_ring_init(KVMState *s)
 goto out;
 }
 
-ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes);
+ret = kvm_vm_enable_cap(s, capability, 0, ring_bytes);
 if (ret) {
 error_report("Enabling of KVM dirty ring failed: %s. "
  "Suggested minimum value is 1024.", strerror(-ret));
-- 
2.23.0

[PATCH RFCv1 7/8] hw/arm/virt: Enable backup bitmap for dirty ring

2023-02-06 Thread Gavin Shan

When KVM device "kvm-arm-gicv3" or "arm-its-kvm" is used, we have to
enable the backup bitmap for the dirty ring. Otherwise, the migration
will fail because those two devices are using the backup bitmap to track
dirty guest memory, corresponding to various hardware tables.

Signed-off-by: Gavin Shan 
---
 hw/arm/virt.c| 26 ++
 target/arm/kvm64.c   | 25 +
 target/arm/kvm_arm.h | 12 
 3 files changed, 63 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ba47728288..b91b5972bc 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2025,6 +2025,8 @@ static void machvirt_init(MachineState *machine)
 VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(machine);
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 const CPUArchIdList *possible_cpus;
+const char *gictype = NULL;
+const char *itsclass = NULL;
 MemoryRegion *sysmem = get_system_memory();
 MemoryRegion *secure_sysmem = NULL;
 MemoryRegion *tag_sysmem = NULL;
@@ -2072,6 +2074,30 @@ static void machvirt_init(MachineState *machine)
  */
 finalize_gic_version(vms);
 
+/*
+ * When "kvm-arm-gicv3" or "arm-its-kvm" is used, the backup dirty
+ * bitmap has to be enabled for KVM dirty ring, before any memory
+ * slot is added. Otherwise, the migration will fail with the dirty
+ * ring.
+ */
+if (kvm_enabled()) {
+if (vms->gic_version != VIRT_GIC_VERSION_2) {
+gictype = gicv3_class_name();
+}
+
+if (vms->gic_version != VIRT_GIC_VERSION_2 && vms->its) {
+itsclass = its_class_name();
+}
+
+if (((gictype && !strcmp(gictype, "kvm-arm-gicv3")) ||
+ (itsclass && !strcmp(itsclass, "arm-its-kvm"))) &&
+!kvm_arm_enable_dirty_ring_with_bitmap()) {
+error_report("Failed to enable the backup bitmap for "
+ "KVM dirty ring");
+exit(1);
+}
+}
+
 if (vms->secure) {
 /*
  * The Secure view of the world is the same as the NonSecure,
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 1197253d12..91960e1cd3 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -764,6 +764,31 @@ bool kvm_arm_steal_time_supported(void)
 return kvm_check_extension(kvm_state, KVM_CAP_STEAL_TIME);
 }
 
+bool kvm_arm_enable_dirty_ring_with_bitmap(void)
+{
+int ret;
+
+/* No need to enable the backup bitmap if dirty ring isn't enabled */
+if (!kvm_dirty_ring_enabled()) {
+return true;
+}
+
+ret = kvm_check_extension(kvm_state, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP);
+if (!ret) {
+return false;
+}
+
+ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP,
+0, 1);
+if (ret) {
+return false;
+}
+
+kvm_state->kvm_dirty_ring_with_bitmap = true;
+
+return true;
+}
+
 QEMU_BUILD_BUG_ON(KVM_ARM64_SVE_VQ_MIN != 1);
 
 uint32_t kvm_arm_sve_get_vls(CPUState *cs)
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index 99017b635c..402281e61a 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -282,6 +282,13 @@ void kvm_arm_steal_time_finalize(ARMCPU *cpu, Error 
**errp);
  */
 bool kvm_arm_steal_time_supported(void);
 
+/**
+ * kvm_arm_enable_dirty_ring_with_bitmap:
+ * Returns: true if KVM dirty ring's backup bitmap is enabled
+ * and false otherwise.
+ */
+bool kvm_arm_enable_dirty_ring_with_bitmap(void);
+
 /**
  * kvm_arm_aarch32_supported:
  *
@@ -395,6 +402,11 @@ static inline bool kvm_arm_steal_time_supported(void)
 return false;
 }
 
+static inline bool kvm_arm_enable_dirty_ring_with_bitmap(void)
+{
+return false;
+}
+
 /*
  * These functions should never actually be called without KVM support.
  */
-- 
2.23.0

[PATCH RFCv1 5/8] kvm: Synchronize secondary bitmap in last stage

2023-02-06 Thread Gavin Shan

In the last stage of live migration or memory slot removal, the
backup bitmap needs to be synchronized.

Signed-off-by: Gavin Shan 
---
 accel/kvm/kvm-all.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 1a93985574..9ec117c441 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1377,10 +1377,12 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
  */
 if (kvm_state->kvm_dirty_ring_size) {
 kvm_dirty_ring_reap_locked(kvm_state, NULL);
+kvm_slot_get_dirty_log(kvm_state, mem, false);
 } else {
 kvm_slot_get_dirty_log(kvm_state, mem, true);
 }
 kvm_slot_sync_dirty_pages(mem, true);
+kvm_slot_sync_dirty_pages(mem, false);
 }
 
 /* unregister the slot */
@@ -1604,6 +1606,11 @@ static void kvm_log_sync_global(MemoryListener *l, bool 
last_stage)
  * However kvm dirty ring has no such side effect.
  */
 kvm_slot_reset_dirty_pages(mem);
+
+if (s->kvm_dirty_ring_with_bitmap && last_stage &&
+kvm_slot_get_dirty_log(s, mem, false)) {
+kvm_slot_sync_dirty_pages(mem, false);
+}
 }
 }
 kvm_slots_unlock();
-- 
2.23.0

[PATCH RFCv1 6/8] kvm: Add helper kvm_dirty_ring_init()

2023-02-06 Thread Gavin Shan

Due to multiple capabilities associated with the dirty ring for different
architectures: KVM_CAP_DIRTY_{LOG_RING, LOG_RING_ACQ_REL} for x86 and
arm64 separately. There will be more to be done in order to support the
dirty ring for arm64.

Lets add helper kvm_dirty_ring_init() to enable the dirty ring. With this,
the code looks a bit clean.

No functional change intended.

Signed-off-by: Gavin Shan 
---
 accel/kvm/kvm-all.c | 73 -
 1 file changed, 46 insertions(+), 27 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 9ec117c441..399ef0f7e6 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1476,6 +1476,49 @@ static int kvm_dirty_ring_reaper_init(KVMState *s)
 return 0;
 }
 
+static int kvm_dirty_ring_init(KVMState *s)
+{
+uint64_t ring_bytes;
+int ret;
+
+/*
+ * Read the max supported pages. Fall back to dirty logging mode
+ * if the dirty ring isn't supported.
+ */
+ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING);
+if (ret <= 0) {
+warn_report("KVM dirty ring not available, using bitmap method");
+s->kvm_dirty_ring_size = 0;
+return 0;
+}
+
+ring_bytes = s->kvm_dirty_ring_size * sizeof(struct kvm_dirty_gfn);
+if (ring_bytes > ret) {
+error_report("KVM dirty ring size %" PRIu32 " too big "
+ "(maximum is %ld).  Please use a smaller value.",
+ s->kvm_dirty_ring_size,
+ (long)ret / sizeof(struct kvm_dirty_gfn));
+ret = -EINVAL;
+goto out;
+}
+
+ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes);
+if (ret) {
+error_report("Enabling of KVM dirty ring failed: %s. "
+ "Suggested minimum value is 1024.", strerror(-ret));
+ret = -EIO;
+}
+
+out:
+if (ret) {
+s->kvm_dirty_ring_size = 0;
+} else {
+s->kvm_dirty_ring_bytes = ring_bytes;
+}
+
+return ret;
+}
+
 static void kvm_region_add(MemoryListener *listener,
MemoryRegionSection *section)
 {
@@ -2545,33 +2588,9 @@ static int kvm_init(MachineState *ms)
  * dirty logging mode
  */
 if (s->kvm_dirty_ring_size > 0) {
-uint64_t ring_bytes;
-
-ring_bytes = s->kvm_dirty_ring_size * sizeof(struct kvm_dirty_gfn);
-
-/* Read the max supported pages */
-ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING);
-if (ret > 0) {
-if (ring_bytes > ret) {
-error_report("KVM dirty ring size %" PRIu32 " too big "
- "(maximum is %ld).  Please use a smaller value.",
- s->kvm_dirty_ring_size,
- (long)ret / sizeof(struct kvm_dirty_gfn));
-ret = -EINVAL;
-goto err;
-}
-
-ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes);
-if (ret) {
-error_report("Enabling of KVM dirty ring failed: %s. "
- "Suggested minimum value is 1024.", 
strerror(-ret));
-goto err;
-}
-
-s->kvm_dirty_ring_bytes = ring_bytes;
- } else {
- warn_report("KVM dirty ring not available, using bitmap method");
- s->kvm_dirty_ring_size = 0;
+ret = kvm_dirty_ring_init(s);
+if (ret < 0) {
+goto err;
 }
 }
 
-- 
2.23.0

Re: [PATCH v15 03/11] target/s390x/cpu topology: handle STSI(15) and build the SYSIB

2023-02-06 Thread Thomas Huth


On 01/02/2023 14.20, Pierre Morel wrote:

On interception of STSI(15.1.x) the System Information Block
(SYSIB) is built from the list of pre-ordered topology entries.

Signed-off-by: Pierre Morel 
---

...

diff --git a/include/hw/s390x/sclp.h b/include/hw/s390x/sclp.h
index d3ade40a5a..712fd68123 100644
--- a/include/hw/s390x/sclp.h
+++ b/include/hw/s390x/sclp.h
@@ -112,6 +112,7 @@ typedef struct CPUEntry {
  } QEMU_PACKED CPUEntry;
  
  #define SCLP_READ_SCP_INFO_FIXED_CPU_OFFSET 128

+#define SCLP_READ_SCP_INFO_MNEST2
  typedef struct ReadInfo {
  SCCBHeader h;
  uint16_t rnmax;
diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index d654267a71..e1f6925856 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -560,6 +560,25 @@ typedef struct SysIB_322 {
  } SysIB_322;
  QEMU_BUILD_BUG_ON(sizeof(SysIB_322) != 4096);
  
+#define S390_TOPOLOGY_MAG  6

+#define S390_TOPOLOGY_MAG6 0
+#define S390_TOPOLOGY_MAG5 1
+#define S390_TOPOLOGY_MAG4 2
+#define S390_TOPOLOGY_MAG3 3
+#define S390_TOPOLOGY_MAG2 4
+#define S390_TOPOLOGY_MAG1 5
+/* Configuration topology */
+typedef struct SysIB_151x {
+uint8_t  reserved0[2];
+uint16_t length;
+uint8_t  mag[S390_TOPOLOGY_MAG];
+uint8_t  reserved1;
+uint8_t  mnest;
+uint32_t reserved2;
+char tle[];
+} QEMU_PACKED QEMU_ALIGNED(8) SysIB_151x;
+QEMU_BUILD_BUG_ON(sizeof(SysIB_151x) != 16);
+
  typedef union SysIB {
  SysIB_111 sysib_111;
  SysIB_121 sysib_121;
@@ -567,9 +586,62 @@ typedef union SysIB {
  SysIB_221 sysib_221;
  SysIB_222 sysib_222;
  SysIB_322 sysib_322;
+SysIB_151x sysib_151x;
  } SysIB;
  QEMU_BUILD_BUG_ON(sizeof(SysIB) != 4096);
  
+/*

+ * CPU Topology List provided by STSI with fc=15 provides a list
+ * of two different Topology List Entries (TLE) types to specify
+ * the topology hierarchy.
+ *
+ * - Container Topology List Entry
+ *   Defines a container to contain other Topology List Entries
+ *   of any type, nested containers or CPU.
+ * - CPU Topology List Entry
+ *   Specifies the CPUs position, type, entitlement and polarization
+ *   of the CPUs contained in the last Container TLE.
+ *
+ * There can be theoretically up to five levels of containers, QEMU
+ * uses only three levels, the drawer's, book's and socket's level.
+ *
+ * A container of with a nesting level (NL) greater than 1 can only
+ * contain another container of nesting level NL-1.
+ *
+ * A container of nesting level 1 (socket), contains as many CPU TLE
+ * as needed to describe the position and qualities of all CPUs inside
+ * the container.
+ * The qualities of a CPU are polarization, entitlement and type.
+ *
+ * The CPU TLE defines the position of the CPUs of identical qualities
+ * using a 64bits mask which first bit has its offset defined by
+ * the CPU address orgin field of the CPU TLE like in:
+ * CPU address = origin * 64 + bit position within the mask
+ *
+ */
+/* Container type Topology List Entry */
+typedef struct SysIBTl_container {
+uint8_t nl;
+uint8_t reserved[6];
+uint8_t id;
+} QEMU_PACKED QEMU_ALIGNED(8) SysIBTl_container;
+QEMU_BUILD_BUG_ON(sizeof(SysIBTl_container) != 8);
+
+/* CPU type Topology List Entry */
+typedef struct SysIBTl_cpu {
+uint8_t nl;
+uint8_t reserved0[3];
+#define SYSIB_TLE_POLARITY_MASK 0x03
+#define SYSIB_TLE_DEDICATED 0x04
+uint8_t entitlement;
+uint8_t type;
+uint16_t origin;
+uint64_t mask;
+} QEMU_PACKED QEMU_ALIGNED(8) SysIBTl_cpu;
+QEMU_BUILD_BUG_ON(sizeof(SysIBTl_cpu) != 16);
+
+void insert_stsi_15_1_x(S390CPU *cpu, int sel2, __u64 addr, uint8_t ar);


__u64 is not a portable type (e.g. fails when doing "make 
vm-build-openbsd"), please replace with uint64_t.


 Thanks,
  Thomas

Re: [PULL 00/16] ppc queue

2023-02-06 Thread Peter Maydell

On Sun, 5 Feb 2023 at 10:04, Daniel Henrique Barboza
 wrote:
>
> The following changes since commit ceabf6e500570ecfb311d8896c4ba9da8cf21f2a:
>
>   Merge tag 'linux-user-for-8.0-pull-request' of 
> https://gitlab.com/laurent_vivier/qemu into staging (2023-02-04 17:17:15 
> +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/danielhb/qemu.git tags/pull-ppc-20230205
>
> for you to fetch changes up to bd591dc1b3c39b7f73b8d9f20be6e9001c905238:
>
>   hw/display/sm501: Code style fix (2023-02-05 06:40:28 -0300)
>
> 
> ppc patch queue for 2023-02-05:
>
> This queue includes patches that aren't PPC specific but benefit/impact
> PPC machines, such as the changes to guestperf.py, mv64361 and sm501. As
> for PPC specific changes we have e500 and PNV_PHB5 fixes.
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.0
for any user-visible changes.

-- PMM

Re: [PATCH v15 06/11] s390x/cpu topology: interception of PTF instruction

2023-02-06 Thread Thomas Huth


On 01/02/2023 14.20, Pierre Morel wrote:

When the host supports the CPU topology facility, the PTF
instruction with function code 2 is interpreted by the SIE,
provided that the userland hypervizor activates the interpretation


s/hypervizor/hypervisor/


by using the KVM_CAP_S390_CPU_TOPOLOGY KVM extension.

The PTF instructions with function code 0 and 1 are intercepted
and must be emulated by the userland hypervizor.


dito


During RESET all CPU of the configuration are placed in
horizontal polarity.

Signed-off-by: Pierre Morel 
---

...

  /**
   * s390_topology_reset:
   *
   * Generic reset for CPU topology, calls s390_topology_reset()
   * s390_topology_reset() to reset the kernel Modified Topology
   * change record.
+ * Then set global and all CPUs polarity to POLARITY_HORIZONTAL.


You describe here already what's going to happen...


   */
  void s390_topology_reset(void)
  {
  s390_cpu_topology_reset();
+/* Set global polarity to POLARITY_HORIZONTAL */


... then here again ...


+s390_topology.polarity = POLARITY_HORIZONTAL;


... and the code is (fortunately) also very self-exaplaining...


+/* Set all CPU polarity to POLARITY_HORIZONTAL */
+s390_topology_set_cpus_polarity(POLARITY_HORIZONTAL);


... so I'd rather drop the two comments within the function body.


  }


(rest of the patch looks fine to me)

 Thomas

Re: Call for GSoC and Outreachy project ideas for summer 2023

2023-02-06 Thread Eugenio Perez Martin

On Sun, Feb 5, 2023 at 2:57 PM Stefan Hajnoczi  wrote:
>
> On Sun, 5 Feb 2023 at 03:15, Eugenio Perez Martin  wrote:
> >
> > On Fri, Jan 27, 2023 at 4:18 PM Stefan Hajnoczi  wrote:
> > >
> > > Dear QEMU, KVM, and rust-vmm communities,
> > > QEMU will apply for Google Summer of Code 2023
> > > (https://summerofcode.withgoogle.com/) and has been accepted into
> > > Outreachy May 2023 (https://www.outreachy.org/). You can now
> > > submit internship project ideas for QEMU, KVM, and rust-vmm!
> > >
> > > Please reply to this email by February 6th with your project ideas.
> > >
> > > If you have experience contributing to QEMU, KVM, or rust-vmm you can
> > > be a mentor. Mentors support interns as they work on their project. It's a
> > > great way to give back and you get to work with people who are just
> > > starting out in open source.
> > >
> > > Good project ideas are suitable for remote work by a competent
> > > programmer who is not yet familiar with the codebase. In
> > > addition, they are:
> > > - Well-defined - the scope is clear
> > > - Self-contained - there are few dependencies
> > > - Uncontroversial - they are acceptable to the community
> > > - Incremental - they produce deliverables along the way
> > >
> > > Feel free to post ideas even if you are unable to mentor the project.
> > > It doesn't hurt to share the idea!
> > >
> > > I will review project ideas and keep you up-to-date on QEMU's
> > > acceptance into GSoC.
> > >
> > > Internship program details:
> > > - Paid, remote work open source internships
> > > - GSoC projects are 175 or 350 hours, Outreachy projects are 30
> > > hrs/week for 12 weeks
> > > - Mentored by volunteers from QEMU, KVM, and rust-vmm
> > > - Mentors typically spend at least 5 hours per week during the coding 
> > > period
> > >
> > > For more background on QEMU internships, check out this video:
> > > https://www.youtube.com/watch?v=xNVCX7YMUL8
> > >
> > > Please let me know if you have any questions!
> > >
> > > Stefan
> > >
> >
> > Appending the different ideas here.
>
> Hi Eugenio,
> Thanks for sharing your project ideas. I have added some questions
> below before we add them to the ideas list wiki page.
>
> > VIRTIO_F_IN_ORDER feature support for virtio devices
> > ===
> > This was already a project the last year, and it produced a few series
> > upstream but was never merged. The previous series are totally useful
> > to start with, so it's not starting from scratch with them [1]:
>
> Has Zhi Guo stopped working on the patches?
>

I can ask him for sure.

> What is the state of the existing patches? What work remains to be done?
>

There are some pending comments from upstream. However if somebody
starts it from scratch it needs time to review some of the VirtIO
standard to understand the virtio in_order feature, both in split and
packed vq.


> >
> > Summary
> > ---
> > Implement VIRTIO_F_IN_ORDER in QEMU and Linux (vhost and virtio drivers)
> >
> > The VIRTIO specification defines a feature bit (VIRTIO_F_IN_ORDER)
> > that devices and drivers can negotiate when the device uses
> > descriptors in the same order in which they were made available by the
> > driver.
> >
> > This feature can simplify device and driver implementations and
> > increase performance. For example, when VIRTIO_F_IN_ORDER is
> > negotiated, it may be easier to create a batch of buffers and reduce
> > DMA transactions when the device uses a batch of buffers.
> >
> > Currently the devices and drivers available in Linux and QEMU do not
> > support this feature. An implementation is available in DPDK for the
> > virtio-net driver.
> >
> > Goals
> > ---
> > Implement VIRTIO_F_IN_ORDER for a single device/driver in QEMU and
> > Linux (virtio-net or virtio-serial are good starting points).
> > Generalize your approach to the common virtio core code for split and
> > packed virtqueue layouts.
> > If time allows, support for the packed virtqueue layout can be added
> > to Linux vhost, QEMU's libvhost-user, and/or QEMU's virtio qtest code.
> >
> > Shadow Virtqueue missing virtio features
> > ===
> >
> > Summary
> > ---
> > Some VirtIO devices like virtio-net have a control virtqueue (CVQ)
> > that allows them to dynamically change a number of parameters like MAC
> > or number of active queues. Changes to passthrough devices using vDPA
> > using CVQ are inherently hard to track if CVQ is handled as
> > passthrough data queues, because qemu is not aware of that
> > communication for performance reasons. In this situation, qemu is not
> > able to migrate these devices, as it is not able to tell the actual
> > state of the device.
> >
> > Shadow Virtqueue (SVQ) allows qemu to offer an emulated queue to the
> > device, effectively forwarding the descriptors of that communication,
> > tracking the device internal state, and being able to migrate it to a
> > new destination qemu.
> >
> > To restore that state in the destination, SVQ is able to send these
> > messages as regular CVQ commands. The code to u

Re: [PATCH v15 07/11] target/s390x/cpu topology: activating CPU topology

2023-02-06 Thread Thomas Huth


On 01/02/2023 14.20, Pierre Morel wrote:

The KVM capability KVM_CAP_S390_CPU_TOPOLOGY is used to
activate the S390_FEAT_CONFIGURATION_TOPOLOGY feature and
the topology facility in the host CPU model for the guest
in the case the topology is available in QEMU and in KVM.

The feature is disabled by default and fenced for SE
(secure execution).

Signed-off-by: Pierre Morel 
---
  hw/s390x/cpu-topology.c   |  2 +-
  target/s390x/cpu_models.c |  1 +
  target/s390x/kvm/kvm.c| 12 
  3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index 1028bf4476..c33378577b 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -55,7 +55,7 @@ int s390_socket_nb(S390CPU *cpu)
   */
  bool s390_has_topology(void)
  {
-return false;
+return s390_has_feat(S390_FEAT_CONFIGURATION_TOPOLOGY);
  }
  
  /**

diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index 065ec6d66c..aca2c5c96b 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -254,6 +254,7 @@ bool s390_has_feat(S390Feat feat)
  case S390_FEAT_SIE_CMMA:
  case S390_FEAT_SIE_PFMFI:
  case S390_FEAT_SIE_IBS:
+case S390_FEAT_CONFIGURATION_TOPOLOGY:
  return false;
  break;
  default:
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index fb63be41b7..808e35a7bd 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2470,6 +2470,18 @@ void kvm_s390_get_host_cpu_model(S390CPUModel *model, 
Error **errp)
  set_bit(S390_FEAT_UNPACK, model->features);
  }
  
+/*

+ * If we have kernel support for CPU Topology indicate the
+ * configuration-topology facility.
+ */
+if (kvm_check_extension(kvm_state, KVM_CAP_S390_CPU_TOPOLOGY)) {
+if (kvm_vm_enable_cap(kvm_state, KVM_CAP_S390_CPU_TOPOLOGY, 0) < 0) {
+error_setg(errp, "KVM: Error enabling KVM_CAP_S390_CPU_TOPOLOGY");
+return;
+}
+set_bit(S390_FEAT_CONFIGURATION_TOPOLOGY, model->features);
+}


Not sure, but for the other capabilities, the kvm_vm_enable_cap() is rather 
done in kvm_arch_init() instead ... likely that it is properly available in 
case you don't run with the "host" cpu model? So should the 
kvm_vm_enable_cap(KVM_CAP_S390_CPU_TOPOLOGY) also be moved there (but of 
course keep the set_bit() here in kvm_s390_get_host_cpu_model())?


 Thomas

Re: [PATCH v15 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2023-02-06 Thread Stefan Hajnoczi

)(_On Sun, 29 Jan 2023 at 05:30, Sam Li  wrote:
>
> Add zoned device option to host_device BlockDriver. It will be presented only
> for zoned host block devices. By adding zone management operations to the
> host_block_device BlockDriver, users can use the new block layer APIs
> including Report Zone and four zone management operations
> (open, close, finish, reset, reset_all).
>
> Qemu-io uses the new APIs to perform zoned storage commands of the device:
> zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
> zone_finish(zf).
>
> For example, to test zone_report, use following command:
> $ ./build/qemu-io --image-opts -n driver=host_device, filename=/dev/nullb0
> -c "zrp offset nr_zones"
>
> Signed-off-by: Sam Li 
> Reviewed-by: Hannes Reinecke 
> Reviewed-by: Stefan Hajnoczi 
> ---
>  block/block-backend.c | 147 ++
>  block/file-posix.c| 323 ++
>  block/io.c|  41 
>  include/block/block-io.h  |   7 +
>  include/block/block_int-common.h  |  21 ++
>  include/block/raw-aio.h   |   6 +-
>  include/sysemu/block-backend-io.h |  18 ++
>  meson.build   |   4 +
>  qemu-io-cmds.c| 149 ++
>  9 files changed, 715 insertions(+), 1 deletion(-)
>
> diff --git a/block/block-backend.c b/block/block-backend.c
> index ba7bf1d6bc..a4847b9131 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1451,6 +1451,15 @@ typedef struct BlkRwCo {
>  void *iobuf;
>  int ret;
>  BdrvRequestFlags flags;
> +union {
> +struct {
> +unsigned int *nr_zones;
> +BlockZoneDescriptor *zones;
> +} zone_report;
> +struct {
> +unsigned long op;
> +} zone_mgmt;
> +};
>  } BlkRwCo;
>
>  int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
> @@ -1795,6 +1804,144 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
>  return ret;
>  }
>
> +static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
> +{
> +BlkAioEmAIOCB *acb = opaque;
> +BlkRwCo *rwco = &acb->rwco;
> +
> +rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
> +   rwco->zone_report.nr_zones,
> +   rwco->zone_report.zones);
> +blk_aio_complete(acb);
> +}
> +
> +BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
> +unsigned int *nr_zones,
> +BlockZoneDescriptor  *zones,
> +BlockCompletionFunc *cb, void *opaque)
> +{
> +BlkAioEmAIOCB *acb;
> +Coroutine *co;
> +IO_CODE();
> +
> +blk_inc_in_flight(blk);
> +acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
> +acb->rwco = (BlkRwCo) {
> +.blk= blk,
> +.offset = offset,
> +.ret= NOT_DONE,
> +.zone_report = {
> +.zones = zones,
> +.nr_zones = nr_zones,
> +},
> +};
> +acb->has_returned = false;
> +
> +co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
> +bdrv_coroutine_enter(blk_bs(blk), co);
> +
> +acb->has_returned = true;
> +if (acb->rwco.ret != NOT_DONE) {
> +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
> + blk_aio_complete_bh, acb);
> +}
> +
> +return &acb->common;
> +}
> +
> +static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
> +{
> +BlkAioEmAIOCB *acb = opaque;
> +BlkRwCo *rwco = &acb->rwco;
> +
> +rwco->ret = blk_co_zone_mgmt(rwco->blk, rwco->zone_mgmt.op,
> + rwco->offset, acb->bytes);
> +blk_aio_complete(acb);
> +}
> +
> +BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> +  int64_t offset, int64_t len,
> +  BlockCompletionFunc *cb, void *opaque) {
> +BlkAioEmAIOCB *acb;
> +Coroutine *co;
> +IO_CODE();
> +
> +blk_inc_in_flight(blk);
> +acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
> +acb->rwco = (BlkRwCo) {
> +.blk= blk,
> +.offset = offset,
> +.ret= NOT_DONE,
> +.zone_mgmt = {
> +.op = op,
> +},
> +};
> +acb->bytes = len;
> +acb->has_returned = false;
> +
> +co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
> +bdrv_coroutine_enter(blk_bs(blk), co);
> +
> +acb->has_returned = true;
> +if (acb->rwco.ret != NOT_DONE) {
> +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
> + blk_aio_complete_bh, acb);
> +}
> +
> +return &acb->common;
> +}
> +
> +/*
> + * Send a zone_report command.
> + * offset is a byte offset from the start of the device. No alignment
> + * required for offset.
> + * nr_zones represents IN maximum and OU

Re: [PATCH v15 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2023-02-06 Thread Sam Li

Stefan Hajnoczi  于2023年2月6日周一 20:04写道：
>
> )(_On Sun, 29 Jan 2023 at 05:30, Sam Li  wrote:
> >
> > Add zoned device option to host_device BlockDriver. It will be presented 
> > only
> > for zoned host block devices. By adding zone management operations to the
> > host_block_device BlockDriver, users can use the new block layer APIs
> > including Report Zone and four zone management operations
> > (open, close, finish, reset, reset_all).
> >
> > Qemu-io uses the new APIs to perform zoned storage commands of the device:
> > zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
> > zone_finish(zf).
> >
> > For example, to test zone_report, use following command:
> > $ ./build/qemu-io --image-opts -n driver=host_device, filename=/dev/nullb0
> > -c "zrp offset nr_zones"
> >
> > Signed-off-by: Sam Li 
> > Reviewed-by: Hannes Reinecke 
> > Reviewed-by: Stefan Hajnoczi 
> > ---
> >  block/block-backend.c | 147 ++
> >  block/file-posix.c| 323 ++
> >  block/io.c|  41 
> >  include/block/block-io.h  |   7 +
> >  include/block/block_int-common.h  |  21 ++
> >  include/block/raw-aio.h   |   6 +-
> >  include/sysemu/block-backend-io.h |  18 ++
> >  meson.build   |   4 +
> >  qemu-io-cmds.c| 149 ++
> >  9 files changed, 715 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/block-backend.c b/block/block-backend.c
> > index ba7bf1d6bc..a4847b9131 100644
> > --- a/block/block-backend.c
> > +++ b/block/block-backend.c
> > @@ -1451,6 +1451,15 @@ typedef struct BlkRwCo {
> >  void *iobuf;
> >  int ret;
> >  BdrvRequestFlags flags;
> > +union {
> > +struct {
> > +unsigned int *nr_zones;
> > +BlockZoneDescriptor *zones;
> > +} zone_report;
> > +struct {
> > +unsigned long op;
> > +} zone_mgmt;
> > +};
> >  } BlkRwCo;
> >
> >  int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
> > @@ -1795,6 +1804,144 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
> >  return ret;
> >  }
> >
> > +static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
> > +{
> > +BlkAioEmAIOCB *acb = opaque;
> > +BlkRwCo *rwco = &acb->rwco;
> > +
> > +rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
> > +   rwco->zone_report.nr_zones,
> > +   rwco->zone_report.zones);
> > +blk_aio_complete(acb);
> > +}
> > +
> > +BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
> > +unsigned int *nr_zones,
> > +BlockZoneDescriptor  *zones,
> > +BlockCompletionFunc *cb, void *opaque)
> > +{
> > +BlkAioEmAIOCB *acb;
> > +Coroutine *co;
> > +IO_CODE();
> > +
> > +blk_inc_in_flight(blk);
> > +acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
> > +acb->rwco = (BlkRwCo) {
> > +.blk= blk,
> > +.offset = offset,
> > +.ret= NOT_DONE,
> > +.zone_report = {
> > +.zones = zones,
> > +.nr_zones = nr_zones,
> > +},
> > +};
> > +acb->has_returned = false;
> > +
> > +co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
> > +bdrv_coroutine_enter(blk_bs(blk), co);
> > +
> > +acb->has_returned = true;
> > +if (acb->rwco.ret != NOT_DONE) {
> > +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
> > + blk_aio_complete_bh, acb);
> > +}
> > +
> > +return &acb->common;
> > +}
> > +
> > +static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
> > +{
> > +BlkAioEmAIOCB *acb = opaque;
> > +BlkRwCo *rwco = &acb->rwco;
> > +
> > +rwco->ret = blk_co_zone_mgmt(rwco->blk, rwco->zone_mgmt.op,
> > + rwco->offset, acb->bytes);
> > +blk_aio_complete(acb);
> > +}
> > +
> > +BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> > +  int64_t offset, int64_t len,
> > +  BlockCompletionFunc *cb, void *opaque) {
> > +BlkAioEmAIOCB *acb;
> > +Coroutine *co;
> > +IO_CODE();
> > +
> > +blk_inc_in_flight(blk);
> > +acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
> > +acb->rwco = (BlkRwCo) {
> > +.blk= blk,
> > +.offset = offset,
> > +.ret= NOT_DONE,
> > +.zone_mgmt = {
> > +.op = op,
> > +},
> > +};
> > +acb->bytes = len;
> > +acb->has_returned = false;
> > +
> > +co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
> > +bdrv_coroutine_enter(blk_bs(blk), co);
> > +
> > +acb->has_returned = true;
> > +if (acb->rwco.ret != NOT_DONE) {
> > +replay_bh_schedule_oneshot_event(blk_get_aio_

[PATCH 0/9] target/arm: Housekeeping around NVIC

2023-02-06 Thread Philippe Mathieu-Daudé

Few cleanups while using link properties between CPU/NVIC:
- Restrict code to sysemu|useremu or tcg
- Simplify ID_PFR1 on useremu
- Move NVIC helpersto "hw/intc/armv7m_nvic.h"

Something odd occurs when an ARM CPU is realized, the
CPU reset handler is called, ending calling
armv7m_nvic_neg_prio_requested() on an unrealized NVIC.
I kludged by checking whether the NVIC is realized, but
this rather looks like a code smell...

Philippe Mathieu-Daudé (9):
  target/arm: Restrict v7-M MMU helpers to sysemu TCG
  target/arm: Constify ID_PFR1 on user emulation
  target/arm: Avoid resetting CPUARMState::eabi field
  target/arm: Restrict CPUARMState::arm_boot_info to sysemu
  target/arm: Restrict CPUARMState::gicv3state to sysemu
  target/arm: Restrict CPUARMState::nvic to sysemu and store as
NVICState*
  target/arm: Declare CPU <-> NVIC helpers in 'hw/intc/armv7m_nvic.h'
  hw/intc/armv7m_nvic: Allow calling neg_prio_requested on unrealized
NVIC
  hw/arm/armv7m: Pass CPU/NVIC using object_property_add_const_link()

 hw/arm/armv7m.c   |   4 +-
 hw/intc/armv7m_nvic.c |  44 +--
 include/hw/intc/armv7m_nvic.h | 128 ++-
 target/arm/cpu.c  |   8 +-
 target/arm/cpu.h  | 137 ++
 target/arm/cpu_tcg.c  |   3 +
 target/arm/helper.c   |  14 +++-
 target/arm/m_helper.c |   9 ++-
 8 files changed, 179 insertions(+), 168 deletions(-)

-- 
2.38.1

Re: [PATCH v15 8/8] docs/zoned-storage: add zoned device documentation

2023-02-06 Thread Stefan Hajnoczi

On Sun, 29 Jan 2023 at 05:31, Sam Li  wrote:
>
> Add the documentation about the zoned device support to virtio-blk
> emulation.
>
> Signed-off-by: Sam Li 
> Reviewed-by: Stefan Hajnoczi 
> Reviewed-by: Damien Le Moal 
> Reviewed-by: Dmitry Fomichev 
> ---
>  docs/devel/zoned-storage.rst   | 43 ++
>  docs/system/qemu-block-drivers.rst.inc |  6 
>  2 files changed, 49 insertions(+)
>  create mode 100644 docs/devel/zoned-storage.rst

This patch uses the old "zoned_host_device" BlockDriver name. Please
update it to "host_device".

It's probably a good idea to search the patches for zoned_host_device
to find any other places that need to be updated. That can be done
with git log -p master.. | grep zoned_host_device.

> diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
> new file mode 100644
> index 00..03e52efe2e
> --- /dev/null
> +++ b/docs/devel/zoned-storage.rst
> @@ -0,0 +1,43 @@
> +=
> +zoned-storage
> +=
> +
> +Zoned Block Devices (ZBDs) divide the LBA space into block regions called 
> zones
> +that are larger than the LBA size. They can only allow sequential writes, 
> which
> +can reduce write amplification in SSDs, and potentially lead to higher
> +throughput and increased capacity. More details about ZBDs can be found at:
> +
> +https://zonedstorage.io/docs/introduction/zoned-storage
> +
> +1. Block layer APIs for zoned storage
> +-
> +QEMU block layer supports three zoned storage models:
> +- BLK_Z_HM: The host-managed zoned model only allows sequential writes access
> +to zones. It supports ZBD-specific I/O commands that can be used by a host to
> +manage the zones of a device.
> +- BLK_Z_HA: The host-aware zoned model allows random write operations in
> +zones, making it backward compatible with regular block devices.
> +- BLK_Z_NONE: The non-zoned model has no zones support. It includes both
> +regular and drive-managed ZBD devices. ZBD-specific I/O commands are not
> +supported.
> +
> +The block device information resides inside BlockDriverState. QEMU uses
> +BlockLimits struct(BlockDriverState::bl) that is continuously accessed by the
> +block layer while processing I/O requests. A BlockBackend has a root pointer 
> to
> +a BlockDriverState graph(for example, raw format on top of file-posix). The
> +zoned storage information can be propagated from the leaf BlockDriverState 
> all
> +the way up to the BlockBackend. If the zoned storage model in file-posix is
> +set to BLK_Z_HM, then block drivers will declare support for zoned host 
> device.
> +
> +The block layer APIs support commands needed for zoned storage devices,
> +including report zones, four zone operations, and zone append.
> +
> +2. Emulating zoned storage controllers
> +--
> +When the BlockBackend's BlockLimits model reports a zoned storage device, 
> users
> +like the virtio-blk emulation or the qemu-io-cmds.c utility can use block 
> layer
> +APIs for zoned storage emulation or testing.
> +
> +For example, to test zone_report on a null_blk device using qemu-io is:
> +$ path/to/qemu-io --image-opts -n 
> driver=zoned_host_device,filename=/dev/nullb0
> +-c "zrp offset nr_zones"
> diff --git a/docs/system/qemu-block-drivers.rst.inc 
> b/docs/system/qemu-block-drivers.rst.inc
> index dfe5d2293d..0b97227fd9 100644
> --- a/docs/system/qemu-block-drivers.rst.inc
> +++ b/docs/system/qemu-block-drivers.rst.inc
> @@ -430,6 +430,12 @@ Hard disks
>you may corrupt your host data (use the ``-snapshot`` command
>line option or modify the device permissions accordingly).
>
> +Zoned block devices
> +  Zoned block devices can be passed through to the guest if the emulated 
> storage
> +  controller supports zoned storage. Use ``--blockdev zoned_host_device,
> +  node-name=drive0,filename=/dev/nullb0`` to pass through ``/dev/nullb0``
> +  as ``drive0``.
> +
>  Windows
>  ^^^
>
> --
> 2.38.1
>
>

[PATCH 1/9] target/arm: Restrict v7-M MMU helpers to sysemu TCG

2023-02-06 Thread Philippe Mathieu-Daudé

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/arm/helper.c   | 2 +-
 target/arm/m_helper.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index c62ed05c12..5dbeade787 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11774,7 +11774,7 @@ int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx)
 }
 }
 
-#ifndef CONFIG_TCG
+#if !defined(CONFIG_TCG) || defined(CONFIG_USER_ONLY)
 ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
 {
 g_assert_not_reached();
diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
index e7e746ea18..1e7e4e33bd 100644
--- a/target/arm/m_helper.c
+++ b/target/arm/m_helper.c
@@ -2854,8 +2854,6 @@ uint32_t HELPER(v7m_tt)(CPUARMState *env, uint32_t addr, 
uint32_t op)
 return tt_resp;
 }
 
-#endif /* !CONFIG_USER_ONLY */
-
 ARMMMUIdx arm_v7m_mmu_idx_all(CPUARMState *env,
   bool secstate, bool priv, bool negpri)
 {
@@ -2892,3 +2890,5 @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, 
bool secstate)
 
 return arm_v7m_mmu_idx_for_secstate_and_priv(env, secstate, priv);
 }
+
+#endif /* !CONFIG_USER_ONLY */
-- 
2.38.1

[PATCH 5/9] target/arm: Restrict CPUARMState::gicv3state to sysemu

2023-02-06 Thread Philippe Mathieu-Daudé

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/arm/cpu.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 01d478e9ce..61681101a5 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -772,10 +772,10 @@ typedef struct CPUArchState {
 int eabi;
 #else
 const struct arm_boot_info *boot_info;
-#endif
-void *nvic;
 /* Store GICv3CPUState to access from this struct */
 void *gicv3state;
+#endif
+void *nvic;
 
 #ifdef TARGET_TAGGED_ADDRESSES
 /* Linux syscall tagged address support */
-- 
2.38.1

[PATCH 2/9] target/arm: Constify ID_PFR1 on user emulation

2023-02-06 Thread Philippe Mathieu-Daudé

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/arm/helper.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 5dbeade787..b58800a1a5 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -7021,6 +7021,7 @@ static void define_pmu_regs(ARMCPU *cpu)
 }
 }
 
+#ifndef CONFIG_USER_ONLY
 /*
  * We don't know until after realize whether there's a GICv3
  * attached, and that is what registers the gicv3 sysregs.
@@ -7038,7 +7039,6 @@ static uint64_t id_pfr1_read(CPUARMState *env, const 
ARMCPRegInfo *ri)
 return pfr1;
 }
 
-#ifndef CONFIG_USER_ONLY
 static uint64_t id_aa64pfr0_read(CPUARMState *env, const ARMCPRegInfo *ri)
 {
 ARMCPU *cpu = env_archcpu(env);
@@ -7998,8 +7998,16 @@ void register_cp_regs_for_features(ARMCPU *cpu)
   .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 1, .opc2 = 1,
   .access = PL1_R, .type = ARM_CP_NO_RAW,
   .accessfn = access_aa32_tid3,
+#ifdef CONFIG_USER_ONLY
+  .type = ARM_CP_CONST,
+  .resetvalue = cpu->isar.id_pfr1,
+#else
+  .type = ARM_CP_NO_RAW,
+  .accessfn = access_aa32_tid3,
   .readfn = id_pfr1_read,
-  .writefn = arm_cp_write_ignore },
+  .writefn = arm_cp_write_ignore
+#endif
+},
 { .name = "ID_DFR0", .state = ARM_CP_STATE_BOTH,
   .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 1, .opc2 = 2,
   .access = PL1_R, .type = ARM_CP_CONST,
-- 
2.38.1

[PATCH 7/9] target/arm: Declare CPU <-> NVIC helpers in 'hw/intc/armv7m_nvic.h'

2023-02-06 Thread Philippe Mathieu-Daudé

While dozens of files include "cpu.h", only 3 files require
these NVIC helper declarations.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/intc/armv7m_nvic.h | 123 ++
 target/arm/cpu.c  |   4 +-
 target/arm/cpu.h  | 123 --
 target/arm/cpu_tcg.c  |   3 +
 target/arm/m_helper.c |   3 +
 5 files changed, 132 insertions(+), 124 deletions(-)

diff --git a/include/hw/intc/armv7m_nvic.h b/include/hw/intc/armv7m_nvic.h
index 07f9c21a5f..1ca262fbf8 100644
--- a/include/hw/intc/armv7m_nvic.h
+++ b/include/hw/intc/armv7m_nvic.h
@@ -83,4 +83,127 @@ struct NVICState {
 qemu_irq sysresetreq;
 };
 
+/* Interface between CPU and Interrupt controller.  */
+/**
+ * armv7m_nvic_set_pending: mark the specified exception as pending
+ * @s: the NVIC
+ * @irq: the exception number to mark pending
+ * @secure: false for non-banked exceptions or for the nonsecure
+ * version of a banked exception, true for the secure version of a banked
+ * exception.
+ *
+ * Marks the specified exception as pending. Note that we will assert()
+ * if @secure is true and @irq does not specify one of the fixed set
+ * of architecturally banked exceptions.
+ */
+void armv7m_nvic_set_pending(NVICState *s, int irq, bool secure);
+/**
+ * armv7m_nvic_set_pending_derived: mark this derived exception as pending
+ * @s: the NVIC
+ * @irq: the exception number to mark pending
+ * @secure: false for non-banked exceptions or for the nonsecure
+ * version of a banked exception, true for the secure version of a banked
+ * exception.
+ *
+ * Similar to armv7m_nvic_set_pending(), but specifically for derived
+ * exceptions (exceptions generated in the course of trying to take
+ * a different exception).
+ */
+void armv7m_nvic_set_pending_derived(NVICState *s, int irq, bool secure);
+/**
+ * armv7m_nvic_set_pending_lazyfp: mark this lazy FP exception as pending
+ * @s: the NVIC
+ * @irq: the exception number to mark pending
+ * @secure: false for non-banked exceptions or for the nonsecure
+ * version of a banked exception, true for the secure version of a banked
+ * exception.
+ *
+ * Similar to armv7m_nvic_set_pending(), but specifically for exceptions
+ * generated in the course of lazy stacking of FP registers.
+ */
+void armv7m_nvic_set_pending_lazyfp(NVICState *s, int irq, bool secure);
+/**
+ * armv7m_nvic_get_pending_irq_info: return highest priority pending
+ *exception, and whether it targets Secure state
+ * @s: the NVIC
+ * @pirq: set to pending exception number
+ * @ptargets_secure: set to whether pending exception targets Secure
+ *
+ * This function writes the number of the highest priority pending
+ * exception (the one which would be made active by
+ * armv7m_nvic_acknowledge_irq()) to @pirq, and sets @ptargets_secure
+ * to true if the current highest priority pending exception should
+ * be taken to Secure state, false for NS.
+ */
+void armv7m_nvic_get_pending_irq_info(NVICState *s, int *pirq,
+  bool *ptargets_secure);
+/**
+ * armv7m_nvic_acknowledge_irq: make highest priority pending exception active
+ * @s: the NVIC
+ *
+ * Move the current highest priority pending exception from the pending
+ * state to the active state, and update v7m.exception to indicate that
+ * it is the exception currently being handled.
+ */
+void armv7m_nvic_acknowledge_irq(NVICState *s);
+/**
+ * armv7m_nvic_complete_irq: complete specified interrupt or exception
+ * @s: the NVIC
+ * @irq: the exception number to complete
+ * @secure: true if this exception was secure
+ *
+ * Returns: -1 if the irq was not active
+ *   1 if completing this irq brought us back to base (no active irqs)
+ *   0 if there is still an irq active after this one was completed
+ * (Ignoring -1, this is the same as the RETTOBASE value before completion.)
+ */
+int armv7m_nvic_complete_irq(NVICState *s, int irq, bool secure);
+/**
+ * armv7m_nvic_get_ready_status(void *opaque, int irq, bool secure)
+ * @s: the NVIC
+ * @irq: the exception number to mark pending
+ * @secure: false for non-banked exceptions or for the nonsecure
+ * version of a banked exception, true for the secure version of a banked
+ * exception.
+ *
+ * Return whether an exception is "ready", i.e. whether the exception is
+ * enabled and is configured at a priority which would allow it to
+ * interrupt the current execution priority. This controls whether the
+ * RDY bit for it in the FPCCR is set.
+ */
+bool armv7m_nvic_get_ready_status(NVICState *s, int irq, bool secure);
+/**
+ * armv7m_nvic_raw_execution_priority: return the raw execution priority
+ * @s: the NVIC
+ *
+ * Returns: the raw execution priority as defined by the v8M architecture.
+ * This is the execution priority minus the effects of AIRCR.PRIS,
+ * and minus any PRIMASK/FAULTMASK/BASEPRI priority boosting.
+ * (v8M ARM ARM I_PKLD.)
+ */
+int armv7m_nvic_raw_execution_priority(NVICState

[PATCH 8/9] hw/intc/armv7m_nvic: Allow calling neg_prio_requested on unrealized NVIC

2023-02-06 Thread Philippe Mathieu-Daudé

armv7m_nvic_neg_prio_requested() is called via arm_cpu_reset_hold()
during CPU realize() time, when the NVIC isn't yet realized:

(lldb) bt
  * frame #0:  0x10059ed5c armv7m_nvic_neg_prio_requested(opaque=0x1180087b0, 
secure=true) at armv7m_nvic.c:404:9
frame #1:  0x100383018 arm_v7m_mmu_idx_for_secstate [inlined] 
arm_v7m_mmu_idx_for_secstate_and_priv(...) at m_helper.c:2882:19
frame #2:  0x10038300c arm_v7m_mmu_idx_for_secstate(..., secstate=true) at 
m_helper.c:2893:12
frame #3:  0x10036e9bc arm_mmu_idx_el(...) at helper.c:11799:16 [artificial]
frame #4:  0x100366cd4 arm_rebuild_hflags [inlined] 
rebuild_hflags_internal(env=0x118411f30) at helper.c:12129:25
frame #5:  0x100366c18 arm_rebuild_hflags(env=0x118411f30) at 
helper.c:12142:19
frame #6:  0x10035f1c4 arm_cpu_reset_hold(...) at cpu.c:541:5 [artificial]
frame #7:  0x10066b354 resettable_phase_hold(obj=0x11841, 
opaque=0x0, ...) at resettable.c:0
frame #8:  0x10066ac40 resettable_assert_reset(obj=0x11841, ...) at 
resettable.c:60:5
frame #9:  0x10066ab1c resettable_reset(obj=0x11841, 
type=RESET_TYPE_COLD) at resettable.c:45:5
frame #10: 0x100669568 device_cold_reset(...) at qdev.c:255:5 [artificial]
frame #11: 0x1ca28 cpu_reset(cpu=0x11841) at cpu-common.c:114:5
frame #12: 0x10035ec74 arm_cpu_realizefn(dev=0x11841, errp=0x16fdfb910) 
at cpu.c:2145:5
frame #13: 0x10066a3e0 device_set_realized(...) at qdev.c:519:13
frame #14: 0x100671b98 property_set_bool(obj=0x11841, ...) at 
object.c:2285:5
frame #15: 0x10066fdf4 object_property_set(obj=0x11841, 
name="realized", ...) at object.c:1420:5
frame #16: 0x100673da8 object_property_set_qobject(...) at 
qom-qobject.c:28:10
frame #17: 0x10067026c object_property_set_bool(...) at object.c:1489:15
frame #18: 0x100669600 qdev_realize(...) at qdev.c:292:12 [artificial]
frame #19: 0x1003101bc armv7m_realize(dev=0x118008480, ...) at 
armv7m.c:344:10
frame #20: 0x10066a3e0 device_set_realized(...) at qdev.c:519:13
frame #21: 0x100671b98 property_set_bool(obj=0x118008480, ...) at 
object.c:2285:5
frame #22: 0x10066fdf4 object_property_set(obj=0x118008480, 
name="realized", ...) at object.c:1420:5
frame #23: 0x100673da8 object_property_set_qobject(...) at 
qom-qobject.c:28:10
frame #24: 0x10067026c object_property_set_bool(...) at object.c:1489:15
frame #25: 0x100669600 qdev_realize(...) at qdev.c:292:12 [artificial]
frame #26: 0x100092da8 sysbus_realize(...) at sysbus.c:256:12 [artificial]
frame #27: 0x100350e1c armsse_realize(dev=0x118008150, ...) at 
armsse.c:1043:14
frame #28: 0x10066a3e0 device_set_realized(...) at qdev.c:519:13
frame #29: 0x100671b98 property_set_bool(obj=0x118008150, ...) at 
object.c:2285:5
frame #30: 0x10066fdf4 object_property_set(obj=0x118008150, 
name="realized", ...) at object.c:1420:5
frame #31: 0x100673da8 object_property_set_qobject(...) at 
qom-qobject.c:28:10
frame #32: 0x10067026c object_property_set_bool(...) at object.c:1489:15
frame #33: 0x100669600 qdev_realize(...) at qdev.c:292:12 [artificial]
frame #34: 0x100092da8 sysbus_realize(...) at sysbus.c:256:12 [artificial]
frame #35: 0x100349354 mps2tz_common_init(machine=0x118008000) at 
mps2-tz.c:834:5
frame #36: 0x10008e6b8 machine_run_board_init(machine=0x118008000, ...) at 
machine.c:1405:5
(lldb) frame select 12
frame #12: 0x10035ec74 arm_cpu_realizefn(dev=0x11841, errp=0x16fdfb910) at 
cpu.c:2145:5
   2142 }
   2143
   2144 qemu_init_vcpu(cs);
-> 2145 cpu_reset(cs);
   2146
   2147 acc->parent_realize(dev, errp);
   2148 }

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/intc/armv7m_nvic.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
index e54553283f..d9c7e414bc 100644
--- a/hw/intc/armv7m_nvic.c
+++ b/hw/intc/armv7m_nvic.c
@@ -399,6 +399,11 @@ bool armv7m_nvic_neg_prio_requested(NVICState *s, bool 
secure)
  * mean we don't allow FAULTMASK_NS to actually make the execution
  * priority negative). Compare pseudocode IsReqExcPriNeg().
  */
+
+if (!DEVICE(s)->realized) { /* XXX Why are we called while not realized? */
+return false;
+}
+
 if (s->cpu->env.v7m.faultmask[secure]) {
 return true;
 }
-- 
2.38.1

[PATCH 9/9] hw/arm/armv7m: Pass CPU/NVIC using object_property_add_const_link()

2023-02-06 Thread Philippe Mathieu-Daudé

Avoid having QOM objects poke at each other internals.
Instead, pass references using QOM link properties.

Signed-off-by: Philippe Mathieu-Daudé 
---
Cc: Mark Cave-Ayland 
---
 hw/arm/armv7m.c   | 4 ++--
 hw/intc/armv7m_nvic.c | 3 ++-
 target/arm/cpu.c  | 3 ++-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
index 50a9507c0b..edde774da2 100644
--- a/hw/arm/armv7m.c
+++ b/hw/arm/armv7m.c
@@ -338,8 +338,8 @@ static void armv7m_realize(DeviceState *dev, Error **errp)
  * Tell the CPU where the NVIC is; it will fail realize if it doesn't
  * have one. Similarly, tell the NVIC where its CPU is.
  */
-s->cpu->env.nvic = &s->nvic;
-s->nvic.cpu = s->cpu;
+object_property_add_const_link(OBJECT(s->cpu), "nvic", OBJECT(&s->nvic));
+object_property_add_const_link(OBJECT(&s->nvic), "cpu", OBJECT(s->cpu));
 
 if (!qdev_realize(DEVICE(s->cpu), NULL, errp)) {
 return;
diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
index d9c7e414bc..e43898a9e0 100644
--- a/hw/intc/armv7m_nvic.c
+++ b/hw/intc/armv7m_nvic.c
@@ -2668,7 +2668,8 @@ static void armv7m_nvic_realize(DeviceState *dev, Error 
**errp)
 NVICState *s = NVIC(dev);
 
 /* The armv7m container object will have set our CPU pointer */
-if (!s->cpu || !arm_feature(&s->cpu->env, ARM_FEATURE_M)) {
+s->cpu = ARM_CPU(object_property_get_link(OBJECT(dev), "cpu", 
&error_abort));
+if (!arm_feature(&s->cpu->env, ARM_FEATURE_M)) {
 error_setg(errp, "The NVIC can only be used with a Cortex-M CPU");
 return;
 }
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 876ab8f3bf..f081861947 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1573,12 +1573,13 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
**errp)
  * error and will result in segfaults if not caught here.
  */
 if (arm_feature(env, ARM_FEATURE_M)) {
+env->nvic = NVIC(object_property_get_link(OBJECT(dev), "nvic", NULL));
 if (!env->nvic) {
 error_setg(errp, "This board cannot be used with Cortex-M CPUs");
 return;
 }
 } else {
-if (env->nvic) {
+if (object_property_find(OBJECT(dev), "nvic")) {
 error_setg(errp, "This board can only be used with Cortex-M CPUs");
 return;
 }
-- 
2.38.1

[PATCH 6/9] target/arm: Restrict CPUARMState::nvic to sysemu and store as NVICState*

2023-02-06 Thread Philippe Mathieu-Daudé

There is no point in using a void pointer to access the NVIC.
Use the real type to avoid casting it while debugging.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/intc/armv7m_nvic.c | 38 ++---
 include/hw/intc/armv7m_nvic.h |  5 +---
 target/arm/cpu.c  |  1 +
 target/arm/cpu.h  | 46 ++-
 target/arm/m_helper.c |  2 +-
 5 files changed, 40 insertions(+), 52 deletions(-)

diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
index 1f7763964c..e54553283f 100644
--- a/hw/intc/armv7m_nvic.c
+++ b/hw/intc/armv7m_nvic.c
@@ -389,7 +389,7 @@ static inline int nvic_exec_prio(NVICState *s)
 return MIN(running, s->exception_prio);
 }
 
-bool armv7m_nvic_neg_prio_requested(void *opaque, bool secure)
+bool armv7m_nvic_neg_prio_requested(NVICState *s, bool secure)
 {
 /* Return true if the requested execution priority is negative
  * for the specified security state, ie that security state
@@ -399,8 +399,6 @@ bool armv7m_nvic_neg_prio_requested(void *opaque, bool 
secure)
  * mean we don't allow FAULTMASK_NS to actually make the execution
  * priority negative). Compare pseudocode IsReqExcPriNeg().
  */
-NVICState *s = opaque;
-
 if (s->cpu->env.v7m.faultmask[secure]) {
 return true;
 }
@@ -418,17 +416,13 @@ bool armv7m_nvic_neg_prio_requested(void *opaque, bool 
secure)
 return false;
 }
 
-bool armv7m_nvic_can_take_pending_exception(void *opaque)
+bool armv7m_nvic_can_take_pending_exception(NVICState *s)
 {
-NVICState *s = opaque;
-
 return nvic_exec_prio(s) > nvic_pending_prio(s);
 }
 
-int armv7m_nvic_raw_execution_priority(void *opaque)
+int armv7m_nvic_raw_execution_priority(NVICState *s)
 {
-NVICState *s = opaque;
-
 return s->exception_prio;
 }
 
@@ -506,9 +500,8 @@ static void nvic_irq_update(NVICState *s)
  * if @secure is true and @irq does not specify one of the fixed set
  * of architecturally banked exceptions.
  */
-static void armv7m_nvic_clear_pending(void *opaque, int irq, bool secure)
+static void armv7m_nvic_clear_pending(NVICState *s, int irq, bool secure)
 {
-NVICState *s = (NVICState *)opaque;
 VecInfo *vec;
 
 assert(irq > ARMV7M_EXCP_RESET && irq < s->num_irq);
@@ -666,17 +659,17 @@ static void do_armv7m_nvic_set_pending(void *opaque, int 
irq, bool secure,
 }
 }
 
-void armv7m_nvic_set_pending(void *opaque, int irq, bool secure)
+void armv7m_nvic_set_pending(NVICState *s, int irq, bool secure)
 {
-do_armv7m_nvic_set_pending(opaque, irq, secure, false);
+do_armv7m_nvic_set_pending(s, irq, secure, false);
 }
 
-void armv7m_nvic_set_pending_derived(void *opaque, int irq, bool secure)
+void armv7m_nvic_set_pending_derived(NVICState *s, int irq, bool secure)
 {
-do_armv7m_nvic_set_pending(opaque, irq, secure, true);
+do_armv7m_nvic_set_pending(s, irq, secure, true);
 }
 
-void armv7m_nvic_set_pending_lazyfp(void *opaque, int irq, bool secure)
+void armv7m_nvic_set_pending_lazyfp(NVICState *s, int irq, bool secure)
 {
 /*
  * Pend an exception during lazy FP stacking. This differs
@@ -684,7 +677,6 @@ void armv7m_nvic_set_pending_lazyfp(void *opaque, int irq, 
bool secure)
  * whether we should escalate depends on the saved context
  * in the FPCCR register, not on the current state of the CPU/NVIC.
  */
-NVICState *s = (NVICState *)opaque;
 bool banked = exc_is_banked(irq);
 VecInfo *vec;
 bool targets_secure;
@@ -773,9 +765,8 @@ void armv7m_nvic_set_pending_lazyfp(void *opaque, int irq, 
bool secure)
 }
 
 /* Make pending IRQ active.  */
-void armv7m_nvic_acknowledge_irq(void *opaque)
+void armv7m_nvic_acknowledge_irq(NVICState *s)
 {
-NVICState *s = (NVICState *)opaque;
 CPUARMState *env = &s->cpu->env;
 const int pending = s->vectpending;
 const int running = nvic_exec_prio(s);
@@ -814,10 +805,9 @@ static bool vectpending_targets_secure(NVICState *s)
 exc_targets_secure(s, s->vectpending);
 }
 
-void armv7m_nvic_get_pending_irq_info(void *opaque,
+void armv7m_nvic_get_pending_irq_info(NVICState *s,
   int *pirq, bool *ptargets_secure)
 {
-NVICState *s = (NVICState *)opaque;
 const int pending = s->vectpending;
 bool targets_secure;
 
@@ -831,9 +821,8 @@ void armv7m_nvic_get_pending_irq_info(void *opaque,
 *pirq = pending;
 }
 
-int armv7m_nvic_complete_irq(void *opaque, int irq, bool secure)
+int armv7m_nvic_complete_irq(NVICState *s, int irq, bool secure)
 {
-NVICState *s = (NVICState *)opaque;
 VecInfo *vec = NULL;
 int ret = 0;
 
@@ -915,7 +904,7 @@ int armv7m_nvic_complete_irq(void *opaque, int irq, bool 
secure)
 return ret;
 }
 
-bool armv7m_nvic_get_ready_status(void *opaque, int irq, bool secure)
+bool armv7m_nvic_get_ready_status(NVICState *s, int irq, bool secure)
 {
 /*
  * Return whether an exception is "ready", i.e. it is enabled and is
@@ -926,7 +915,6 @@ b

[PATCH 3/9] target/arm: Avoid resetting CPUARMState::eabi field

2023-02-06 Thread Philippe Mathieu-Daudé

Although the 'eabi' field is only used in user emulation where
CPU reset doesn't occur, it doesn't belong to the area to reset.
Move it after the 'end_reset_fields' for consistency.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/arm/cpu.h | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 7bc97fece9..bbbcf2e153 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -721,11 +721,6 @@ typedef struct CPUArchState {
 ARMVectorReg zarray[ARM_MAX_VQ * 16];
 #endif
 
-#if defined(CONFIG_USER_ONLY)
-/* For usermode syscall translation.  */
-int eabi;
-#endif
-
 struct CPUBreakpoint *cpu_breakpoint[16];
 struct CPUWatchpoint *cpu_watchpoint[16];
 
@@ -772,6 +767,10 @@ typedef struct CPUArchState {
 uint32_t ctrl;
 } sau;
 
+#if defined(CONFIG_USER_ONLY)
+/* For usermode syscall translation.  */
+int eabi;
+#endif
 void *nvic;
 const struct arm_boot_info *boot_info;
 /* Store GICv3CPUState to access from this struct */
-- 
2.38.1

[PATCH 4/9] target/arm: Restrict CPUARMState::arm_boot_info to sysemu

2023-02-06 Thread Philippe Mathieu-Daudé

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/arm/cpu.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index bbbcf2e153..01d478e9ce 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -770,9 +770,10 @@ typedef struct CPUArchState {
 #if defined(CONFIG_USER_ONLY)
 /* For usermode syscall translation.  */
 int eabi;
+#else
+const struct arm_boot_info *boot_info;
 #endif
 void *nvic;
-const struct arm_boot_info *boot_info;
 /* Store GICv3CPUState to access from this struct */
 void *gicv3state;
 
-- 
2.38.1

Re: [PATCH v15 8/8] docs/zoned-storage: add zoned device documentation

2023-02-06 Thread Sam Li

Stefan Hajnoczi  于2023年2月6日周一 20:16写道：
>
> On Sun, 29 Jan 2023 at 05:31, Sam Li  wrote:
> >
> > Add the documentation about the zoned device support to virtio-blk
> > emulation.
> >
> > Signed-off-by: Sam Li 
> > Reviewed-by: Stefan Hajnoczi 
> > Reviewed-by: Damien Le Moal 
> > Reviewed-by: Dmitry Fomichev 
> > ---
> >  docs/devel/zoned-storage.rst   | 43 ++
> >  docs/system/qemu-block-drivers.rst.inc |  6 
> >  2 files changed, 49 insertions(+)
> >  create mode 100644 docs/devel/zoned-storage.rst
>
> This patch uses the old "zoned_host_device" BlockDriver name. Please
> update it to "host_device".
>
> It's probably a good idea to search the patches for zoned_host_device
> to find any other places that need to be updated. That can be done
> with git log -p master.. | grep zoned_host_device.

Sorry for missing that. Will do.

>
> > diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
> > new file mode 100644
> > index 00..03e52efe2e
> > --- /dev/null
> > +++ b/docs/devel/zoned-storage.rst
> > @@ -0,0 +1,43 @@
> > +=
> > +zoned-storage
> > +=
> > +
> > +Zoned Block Devices (ZBDs) divide the LBA space into block regions called 
> > zones
> > +that are larger than the LBA size. They can only allow sequential writes, 
> > which
> > +can reduce write amplification in SSDs, and potentially lead to higher
> > +throughput and increased capacity. More details about ZBDs can be found at:
> > +
> > +https://zonedstorage.io/docs/introduction/zoned-storage
> > +
> > +1. Block layer APIs for zoned storage
> > +-
> > +QEMU block layer supports three zoned storage models:
> > +- BLK_Z_HM: The host-managed zoned model only allows sequential writes 
> > access
> > +to zones. It supports ZBD-specific I/O commands that can be used by a host 
> > to
> > +manage the zones of a device.
> > +- BLK_Z_HA: The host-aware zoned model allows random write operations in
> > +zones, making it backward compatible with regular block devices.
> > +- BLK_Z_NONE: The non-zoned model has no zones support. It includes both
> > +regular and drive-managed ZBD devices. ZBD-specific I/O commands are not
> > +supported.
> > +
> > +The block device information resides inside BlockDriverState. QEMU uses
> > +BlockLimits struct(BlockDriverState::bl) that is continuously accessed by 
> > the
> > +block layer while processing I/O requests. A BlockBackend has a root 
> > pointer to
> > +a BlockDriverState graph(for example, raw format on top of file-posix). The
> > +zoned storage information can be propagated from the leaf BlockDriverState 
> > all
> > +the way up to the BlockBackend. If the zoned storage model in file-posix is
> > +set to BLK_Z_HM, then block drivers will declare support for zoned host 
> > device.
> > +
> > +The block layer APIs support commands needed for zoned storage devices,
> > +including report zones, four zone operations, and zone append.
> > +
> > +2. Emulating zoned storage controllers
> > +--
> > +When the BlockBackend's BlockLimits model reports a zoned storage device, 
> > users
> > +like the virtio-blk emulation or the qemu-io-cmds.c utility can use block 
> > layer
> > +APIs for zoned storage emulation or testing.
> > +
> > +For example, to test zone_report on a null_blk device using qemu-io is:
> > +$ path/to/qemu-io --image-opts -n 
> > driver=zoned_host_device,filename=/dev/nullb0
> > +-c "zrp offset nr_zones"
> > diff --git a/docs/system/qemu-block-drivers.rst.inc 
> > b/docs/system/qemu-block-drivers.rst.inc
> > index dfe5d2293d..0b97227fd9 100644
> > --- a/docs/system/qemu-block-drivers.rst.inc
> > +++ b/docs/system/qemu-block-drivers.rst.inc
> > @@ -430,6 +430,12 @@ Hard disks
> >you may corrupt your host data (use the ``-snapshot`` command
> >line option or modify the device permissions accordingly).
> >
> > +Zoned block devices
> > +  Zoned block devices can be passed through to the guest if the emulated 
> > storage
> > +  controller supports zoned storage. Use ``--blockdev zoned_host_device,
> > +  node-name=drive0,filename=/dev/nullb0`` to pass through ``/dev/nullb0``
> > +  as ``drive0``.
> > +
> >  Windows
> >  ^^^
> >
> > --
> > 2.38.1
> >
> >

Re: [PATCH v15 08/11] qapi/s390x/cpu topology: x-set-cpu-topology monitor command

2023-02-06 Thread Thomas Huth


On 01/02/2023 14.20, Pierre Morel wrote:

The modification of the CPU attributes are done through a monitor
command.

It allows to move the core inside the topology tree to optimise


I'm not a native speaker, but "optimize" is more common, I think?


the cache usage in the case the host's hypervisor previously
moved the CPU.

The same command allows to modify the CPU attributes modifiers
like polarization entitlement and the dedicated attribute to notify
the guest if the host admin modified scheduling or dedication of a vCPU.

With this knowledge the guest has the possibility to optimize the
usage of the vCPUs.

The command is made experimental for the moment.

Signed-off-by: Pierre Morel 
---
  qapi/machine-target.json | 29 +
  include/monitor/hmp.h|  1 +
  hw/s390x/cpu-topology.c  | 88 
  hmp-commands.hx  | 16 
  4 files changed, 134 insertions(+)

diff --git a/qapi/machine-target.json b/qapi/machine-target.json
index 2e267fa458..58df0f5061 100644
--- a/qapi/machine-target.json
+++ b/qapi/machine-target.json
@@ -342,3 +342,32 @@
 'TARGET_S390X',
 'TARGET_MIPS',
 'TARGET_LOONGARCH64' ] } }
+
+##
+# @x-set-cpu-topology:
+#
+# @core: the vCPU ID to be moved
+# @socket: the destination socket where to move the vCPU
+# @book: the destination book where to move the vCPU
+# @drawer: the destination drawer where to move the vCPU
+# @polarity: optional polarity, default is last polarity set by the guest


Hmm, below you do something like that:

if (!has_polarity) {
polarity = POLARITY_VERTICAL_MEDIUM;
}

... so it rather seems like medium polarity is the default? ... that looks 
weird. I think you should maybe rather pass the has_polarity and 
has_dedication flags to the s390_change_topology() function and only change 
the value if they have really been provided?



+# @dedicated: optional, if the vCPU is dedicated to a real CPU
+#
+# Modifies the topology by moving the CPU inside the topology
+# tree or by changing a modifier attribute of a CPU.
+#
+# Returns: Nothing on success, the reason on failure.
+#
+# Since: 


Please replace with "Since: 8.0" ... I hope we'll get it merged during this 
cycle!



+{ 'command': 'x-set-cpu-topology',
+  'data': {
+  'core': 'int',
+  'socket': 'int',
+  'book': 'int',
+  'drawer': 'int',
+  '*polarity': 'int',
+  '*dedicated': 'bool'
+  },
+  'if': { 'all': [ 'TARGET_S390X', 'CONFIG_KVM' ] }
+}
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 1b3bdcb446..12827479cf 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -151,5 +151,6 @@ void hmp_human_readable_text_helper(Monitor *mon,
  HumanReadableText *(*qmp_handler)(Error 
**));
  void hmp_info_stats(Monitor *mon, const QDict *qdict);
  void hmp_pcie_aer_inject_error(Monitor *mon, const QDict *qdict);
+void hmp_x_set_cpu_topology(Monitor *mon, const QDict *qdict);
  
  #endif

diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index c33378577b..6c50050991 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -18,6 +18,10 @@
  #include "target/s390x/cpu.h"
  #include "hw/s390x/s390-virtio-ccw.h"
  #include "hw/s390x/cpu-topology.h"
+#include "qapi/qapi-commands-machine-target.h"
+#include "qapi/qmp/qdict.h"
+#include "monitor/hmp.h"
+#include "monitor/monitor.h"
  
  /*

   * s390_topology is used to keep the topology information.
@@ -379,3 +383,87 @@ void s390_topology_set_cpu(MachineState *ms, S390CPU *cpu, 
Error **errp)
  /* topology tree is reflected in props */
  s390_update_cpu_props(ms, cpu);
  }
+
+/*
+ * qmp and hmp implementations
+ */
+
+static void s390_change_topology(int64_t core_id, int64_t socket_id,
+ int64_t book_id, int64_t drawer_id,
+ int64_t polarity, bool dedicated,
+ Error **errp)
+{
+MachineState *ms = current_machine;
+S390CPU *cpu;
+ERRP_GUARD();
+
+cpu = (S390CPU *)ms->possible_cpus->cpus[core_id].cpu;


I think you should add a sanity check for core_id being in a valid range ... 
otherwise this will cause an access beyond the end of the array if core_id 
is too big or negative.



+if (!cpu) {
+error_setg(errp, "Core-id %ld does not exist!", core_id);
+return;
+}
+
+/* Verify the new topology */
+s390_topology_check(cpu, errp);
+if (*errp) {
+return;
+}
+
+/* Move the CPU into its new socket */
+s390_set_core_in_socket(cpu, drawer_id, book_id, socket_id, true, errp);
+
+/* All checks done, report topology in environment */
+cpu->env.drawer_id = drawer_id;
+cpu->env.book_id = book_id;
+cpu->env.socket_id = socket_id;
+cpu->env.dedicated = dedicated;
+cpu->env.entitlement = polarity;


As mentioned above, I think dedicated and polarity s

Re: [PULL 04/35] tests/unit: drop hacky race avoidance in test-io-channel-command

2023-02-06 Thread Philippe Mathieu-Daudé


Hi Alex, Thomas.

On 26/1/23 12:22, Alex Bennée wrote:

We don't need to play timing games to ensure one socat wins over the
other, just create the fifo they both can use before spawning the
processes. However in the process we need to disable two tests for
Windows platforms as we don't have an abstraction for mkfifo().

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1403
Signed-off-by: Alex Bennée 
Reviewed-by: Thomas Huth 
Message-Id: <20230124180127.1881110-5-alex.ben...@linaro.org>

diff --git a/tests/unit/test-io-channel-command.c 
b/tests/unit/test-io-channel-command.c
index 19f72eab96..425e2f5594 100644
--- a/tests/unit/test-io-channel-command.c
+++ b/tests/unit/test-io-channel-command.c
@@ -20,6 +20,8 @@
  
  #include "qemu/osdep.h"

  #include 
+#include 
+#include 
  #include "io/channel-command.h"
  #include "io-channel-helpers.h"
  #include "qapi/error.h"
@@ -29,6 +31,7 @@
  
  static char *socat = NULL;
  
+#ifndef _WIN32

  static void test_io_channel_command_fifo(bool async)
  {
  g_autofree gchar *tmpdir = g_dir_make_tmp("qemu-test-io-channel.XX", 
NULL);
@@ -40,12 +43,13 @@ static void test_io_channel_command_fifo(bool async)
  QIOChannel *src, *dst;
  QIOChannelTest *test;
  
+if (mkfifo(fifo, 0600)) {

+g_error("mkfifo: %s", strerror(errno));
+}
+
  src = QIO_CHANNEL(qio_channel_command_new_spawn((const char **) srcargv,
  O_WRONLY,
  &error_abort));
-/* try to avoid a race to create the socket */
-g_usleep(1000);
-
  dst = QIO_CHANNEL(qio_channel_command_new_spawn((const char **) dstargv,
  O_RDONLY,
  &error_abort));
@@ -60,7 +64,6 @@ static void test_io_channel_command_fifo(bool async)


Testing on Darwin/Aarch64 I'm getting (reproducible):

78/93 qemu:unit / test-io-channel-command ERROR 
2.38s   killed by signal 13 SIGPIPE
>>> MALLOC_PERTURB_=10 G_TEST_BUILDDIR=./tests/unit 
G_TEST_SRCDIR=tests/unit ./tests/unit/test-io-channel-command --tap -k

―
stderr:
2023/02/03 08:26:49 socat[32507] E 
mkfifo(/var/folders/yj/r7khncsj4d77k04ybz9lw4tmgn/T/qemu-test-io-channel.GMARZ1/test-io-channel-command.fifo, 
438): File exists


(test program exited with status code -13)

TAP parsing error: Too few tests run (expected 4, got 0)
―

We call g_rmdir(), but I see various qtests calling unlink()
before rmdir(). Do we need it here?

+   g_unlink(fifo);


  g_rmdir(tmpdir);
  }
  
-

  static void test_io_channel_command_fifo_async(void)
  {
  if (!socat) {
@@ -80,6 +83,7 @@ static void test_io_channel_command_fifo_sync(void)
  
  test_io_channel_command_fifo(false);

  }
+#endif
  
  
  static void test_io_channel_command_echo(bool async)

@@ -124,10 +128,12 @@ int main(int argc, char **argv)
  
  socat = g_find_program_in_path("socat");
  
+#ifndef _WIN32

  g_test_add_func("/io/channel/command/fifo/sync",
  test_io_channel_command_fifo_sync);
  g_test_add_func("/io/channel/command/fifo/async",
  test_io_channel_command_fifo_async);
+#endif
  g_test_add_func("/io/channel/command/echo/sync",
  test_io_channel_command_echo_sync);
  g_test_add_func("/io/channel/command/echo/async",

Re: [PATCH v8 0/8] Introduce igb

2023-02-06 Thread Akihiko Odaki


Hi Jason,

Let me remind that every patches in this series now has Reviewed-by: or 
Acked-by: tag though I forgot to include tags the prior versions of this 
series received to the latest version:


"Introduce igb"
https://lore.kernel.org/qemu-devel/dbbp189mb143365704198dc9a0684dea595...@dbbp189mb1433.eurp189.prod.outlook.com/

"docs/system/devices/igb: Add igb documentation"
https://lore.kernel.org/qemu-devel/741a0975-9f7a-b4bc-9651-cf45f03d1...@kaod.org/

Regards,
Akihiko Odaki

On 2023/02/04 13:36, Akihiko Odaki wrote:

Based-on: <20230201033539.30049-1-akihiko.od...@daynix.com>
([PATCH v5 00/29] e1000x cleanups (preliminary for IGB))

igb is a family of Intel's gigabit ethernet controllers. This series implements
82576 emulation in particular. You can see the last patch for the documentation.

Note that there is another effort to bring 82576 emulation. This series was
developed independently by Sriram Yagnaraman.
https://lists.gnu.org/archive/html/qemu-devel/2022-12/msg04670.html

V7 -> V8:
- Removed obsolete patch
   "hw/net/net_tx_pkt: Introduce net_tx_pkt_get_eth_hdr" (Cédric Le Goater)

V6 -> V7:
- Reordered statements in igb_receive_internal() so that checksum will be
   calculated only once and it will be more close to e1000e_receive_internal().

V5 -> V6:
- Rebased.
- Renamed "test" to "packet" in tests/qtest/e1000e-test.c.
- Fixed Rx logic so that a Rx pool without enough space won't prevent other
   pools from receiving, based on Sriram Yagnaraman's work.

V4 -> V5:
- Rebased.
- Squashed patches to copy from e1000e code and modify it.
- Listed the implemented features.
- Added a check for interrupts availablity on PF.
- Fixed the declaration of igb_receive_internal(). (Sriram Yagnaraman)

V3 -> V4:
- Rebased.
- Corrected PCIDevice specified for DMA.

V2 -> V3:
- Rebased.
- Fixed PCIDevice reference in hw/net/igbvf.c.
- Fixed TX packet switching when VM loopback is enabled.
- Fixed VMDq enablement check.
- Fixed RX descriptor length parser.
- Fixed the definitions of RQDPC readers.
- Implemented VLAN VM filter.
- Implemented VT_CTL.Def_PL.
- Implemented the combination of VMDq and RSS.
- Noted that igb is tested with Windows HLK.

V1 -> V2:
- Spun off e1000e general improvements to a distinct series.
- Restored vnet_hdr offload as there seems nothing preventing from that.

Akihiko Odaki (8):
   pcie: Introduce pcie_sriov_num_vfs
   e1000: Split header files
   Intrdocue igb device emulation
   tests/qtest/e1000e-test: Fabricate ethernet header
   tests/qtest/libqos/e1000e: Export macreg functions
   igb: Introduce qtest for igb device
   tests/avocado: Add igb test
   docs/system/devices/igb: Add igb documentation

  MAINTAINERS   |9 +
  docs/system/device-emulation.rst  |1 +
  docs/system/devices/igb.rst   |   71 +
  hw/net/Kconfig|5 +
  hw/net/e1000.c|1 +
  hw/net/e1000_common.h |  102 +
  hw/net/e1000_regs.h   |  927 +---
  hw/net/e1000e.c   |3 +-
  hw/net/e1000e_core.c  |1 +
  hw/net/e1000x_common.c|1 +
  hw/net/e1000x_common.h|   74 -
  hw/net/e1000x_regs.h  |  940 
  hw/net/igb.c  |  612 +++
  hw/net/igb_common.h   |  146 +
  hw/net/igb_core.c | 4043 +
  hw/net/igb_core.h |  144 +
  hw/net/igb_regs.h |  648 +++
  hw/net/igbvf.c|  327 ++
  hw/net/meson.build|2 +
  hw/net/trace-events   |   32 +
  hw/pci/pcie_sriov.c   |5 +
  include/hw/pci/pcie_sriov.h   |3 +
  .../org.centos/stream/8/x86_64/test-avocado   |1 +
  tests/avocado/igb.py  |   38 +
  tests/qtest/e1000e-test.c |   25 +-
  tests/qtest/fuzz/generic_fuzz_configs.h   |5 +
  tests/qtest/igb-test.c|  243 +
  tests/qtest/libqos/e1000e.c   |   12 -
  tests/qtest/libqos/e1000e.h   |   14 +
  tests/qtest/libqos/igb.c  |  185 +
  tests/qtest/libqos/meson.build|1 +
  tests/qtest/meson.build   |1 +
  32 files changed, 7600 insertions(+), 1022 deletions(-)
  create mode 100644 docs/system/devices/igb.rst
  create mode 100644 hw/net/e1000_common.h
  create mode 100644 hw/net/e1000x_regs.h
  create mode 100644 hw/net/igb.c
  create mode 100644 hw/net/igb_common.h
  create mode 100644 hw/net/igb_core.c
  create mode 100644 hw/net/igb_core.h
  create mode 100644 hw/net/igb_regs.h
  create mode 100644 hw/net/igbvf.c
  create mode 100644 tests/avocado/igb.py
  create

[PATCH v9 00/14] vfio/migration: Implement VFIO migration protocol v2

2023-02-06 Thread Avihai Horon

Hello,

This v9 of the series adds a patch to block multiple devices migration
and removes redundant P2P code.

Note that when Juan's pull request [1] is merged, I will rebase this
series on it and send a v10.

Following VFIO migration protocol v2 acceptance in kernel, this series
implements VFIO migration according to the new v2 protocol and replaces
the now deprecated v1 implementation.

The main differences between v1 and v2 migration protocols are:
1. VFIO device state is represented as a finite state machine instead of
   a bitmap.

2. The migration interface with kernel is done using VFIO_DEVICE_FEATURE
   ioctl and normal read() and write() instead of the migration region
   used in v1.

3. Pre-copy is made optional in v2 protocol. Support for pre-copy will
   be added later on.

Full description of the v2 protocol and the differences from v1 can be
found here [2].



Patch list:

Patch 1 updates linux headers so we will have the MIG_DATA_SIZE ioctl.

Patches 2-9 are prep patches fixing bugs, adding QEMUFile function
that will be used later and refactoring v1 protocol code to make it
easier to add v2 protocol.

Patches 10-14 implement v2 protocol and remove v1 protocol.

Thanks.



Changes from v8 [3]:
- Added patch that blocks migration of multiple devices. As discussed,
  this is necessary since VFIO migration doesn't support P2P yet.
- Removed unnecessary P2P code. This should be added when P2P support is
  added.
- Fixed vfio_save_block() comment to say -errno is returned on error.
- Added Reviewed-by tag to linux headers sync patch.



Changes from v7 [4]:
- Fixed compilation error on windows in patch #9 reported by Cedric.



Changes from v6 [5]:
- Fixed another compilation error in patch #9 reported by Cedric.
- Added Reviewed-by tags.



Changes from v5 [6]:
- Dropped patch #3.
- Simplified patch #5 as per Alex's suggestion.
- Changed qemu_file_get_to_fd() to return -EIO instead of -1, as
  suggested by Cedric.
  Also changed it so now write returns -errno instead of -1 on error.
- Fixed compilation error reported by Cedric.
- Changed vfio_migration_query_flags() to print error message and return
  -errno in error case as suggested by Cedric.
- Added Reviewed-by tags.



Changes from v4 [7]:
- Rebased on latest master branch.
- Added linux header update to kernel v6.2-rc1.
- Merged preview patches (#13-14) into this series.



Changes from v3 [8]:
- Rebased on latest master branch.

- Dropped patch #1 "migration: Remove res_compatible parameter" as
  it's not mandatory to this series and needs some further discussion.

- Dropped patch #3 "migration: Block migration comment or code is
  wrong" as it has been merged already.

- Addressed overlooked corner case reported by Vladimir in patch #4
  "migration: Simplify migration_iteration_run()".

- Dropped patch #5 "vfio/migration: Fix wrong enum usage" as it has
  been merged already.

- In patch #12 "vfio/migration: Implement VFIO migration protocol v2":
  1. Changed vfio_save_pending() to update res_precopy_only instead of
 res_postcopy_only (as VFIO migration doesn’t support postcopy).
  2. Moved VFIOMigration->data_buffer allocation to vfio_save_setup()
 and its de-allocation to vfio_save_cleanup(), so now it's
 allocated when actually used (during migration and only on source
 side).

- Addressed Alex's comments:
  1. Eliminated code duplication in patch #7 "vfio/migration: Allow
 migration without VFIO IOMMU dirty tracking support".
  2. Removed redundant initialization of vfio_region_info in patch #10
 "vfio/migration: Move migration v1 logic to vfio_migration_init()".
  3. Added comment about VFIO_MIG_DATA_BUFFER_SIZE heuristic (and
 renamed to VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE).
  4. Cast migration structs to their actual types instead of void *.
  5. Return -errno and -EBADF instead of -1 in vfio_migration_set_state().
  6. Set migration->device_state to new_state even in case of data_fd
 out of sync. Although migration will be aborted, setting device
 state succeeded so we should reflect that.
  7. Renamed VFIO_MIG_PENDING_SIZE to VFIO_MIG_STOP_COPY_SIZE, set it
 to 100G and added a comment about the size choice.
  8. Changed vfio_save_block() to return -errno on error.
  9. Squashed Patch #14 to patch #12.
  10. Adjusted migration data buffer size according to MIG_DATA_SIZE
  ioctl.

- In preview patch #17 "vfio/migration: Query device data size in
  vfio_save_pending()" - changed vfio_save_pending() to report
  VFIO_MIG_STOP_COPY_SIZE on any error.
   
- Added another preview patch "vfio/migration: Optimize
  vfio_save_pending()".

- Added ret value on some traces as suggested by David.

- Added Reviewed-By tags.



Changes from v2 [9]:
- Rebased on top of latest master branch.

- Added relevant patches from Juan's RFC [10] with minor changes:
  1. Added Reviewed-by tag to patch #3 in the RFC.
  2. Adjusted patch #6 to work without patch #4 in the RFC.

- Added a new patch "vfio/migration:

[PATCH v9 03/14] vfio/migration: Fix NULL pointer dereference bug

2023-02-06 Thread Avihai Horon

As part of its error flow, vfio_vmstate_change() accesses
MigrationState->to_dst_file without any checks. This can cause a NULL
pointer dereference if the error flow is taken and
MigrationState->to_dst_file is not set.

For example, this can happen if VM is started or stopped not during
migration and vfio_vmstate_change() error flow is taken, as
MigrationState->to_dst_file is not set at that time.

Fix it by checking that MigrationState->to_dst_file is set before using
it.

Fixes: 02a7e71b1e5b ("vfio: Add VM state change handler to know state of VM")
Signed-off-by: Avihai Horon 
Reviewed-by: Juan Quintela 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 hw/vfio/migration.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index e1413ac90c..09fe7c1de2 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -743,7 +743,9 @@ static void vfio_vmstate_change(void *opaque, bool running, 
RunState state)
  */
 error_report("%s: Failed to set device state 0x%x", vbasedev->name,
  (migration->device_state & mask) | value);
-qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
+if (migrate_get_current()->to_dst_file) {
+qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
+}
 }
 vbasedev->migration->vm_running = running;
 trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
-- 
2.26.3

[PATCH v9 02/14] migration: No save_live_pending() method uses the QEMUFile parameter

2023-02-06 Thread Avihai Horon

From: Juan Quintela 

So remove it everywhere.

Signed-off-by: Juan Quintela 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Dr. David Alan Gilbert 
---
 include/migration/register.h   | 3 +--
 migration/savevm.h | 3 +--
 hw/s390x/s390-stattrib.c   | 2 +-
 hw/vfio/migration.c| 3 +--
 migration/block-dirty-bitmap.c | 3 +--
 migration/block.c  | 2 +-
 migration/migration.c  | 4 ++--
 migration/ram.c| 2 +-
 migration/savevm.c | 7 +++
 9 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index c1dcff0f90..eb6266a877 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -46,8 +46,7 @@ typedef struct SaveVMHandlers {
 
 /* This runs outside the iothread lock!  */
 int (*save_setup)(QEMUFile *f, void *opaque);
-void (*save_live_pending)(QEMUFile *f, void *opaque,
-  uint64_t threshold_size,
+void (*save_live_pending)(void *opaque, uint64_t threshold_size,
   uint64_t *res_precopy_only,
   uint64_t *res_compatible,
   uint64_t *res_postcopy_only);
diff --git a/migration/savevm.h b/migration/savevm.h
index 6461342cb4..6dec468cc3 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -40,8 +40,7 @@ void qemu_savevm_state_cleanup(void);
 void qemu_savevm_state_complete_postcopy(QEMUFile *f);
 int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
bool inactivate_disks);
-void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
-   uint64_t *res_precopy_only,
+void qemu_savevm_state_pending(uint64_t max_size, uint64_t *res_precopy_only,
uint64_t *res_compatible,
uint64_t *res_postcopy_only);
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index 9eda1c3b2a..a553a1e850 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -182,7 +182,7 @@ static int cmma_save_setup(QEMUFile *f, void *opaque)
 return 0;
 }
 
-static void cmma_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
+static void cmma_save_pending(void *opaque, uint64_t max_size,
   uint64_t *res_precopy_only,
   uint64_t *res_compatible,
   uint64_t *res_postcopy_only)
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index c74453e0b5..e1413ac90c 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -456,8 +456,7 @@ static void vfio_save_cleanup(void *opaque)
 trace_vfio_save_cleanup(vbasedev->name);
 }
 
-static void vfio_save_pending(QEMUFile *f, void *opaque,
-  uint64_t threshold_size,
+static void vfio_save_pending(void *opaque, uint64_t threshold_size,
   uint64_t *res_precopy_only,
   uint64_t *res_compatible,
   uint64_t *res_postcopy_only)
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 15127d489a..9ec6e9d858 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -762,8 +762,7 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void 
*opaque)
 return 0;
 }
 
-static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
-  uint64_t max_size,
+static void dirty_bitmap_save_pending(void *opaque, uint64_t max_size,
   uint64_t *res_precopy_only,
   uint64_t *res_compatible,
   uint64_t *res_postcopy_only)
diff --git a/migration/block.c b/migration/block.c
index 5da15a62de..47852b8d58 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -863,7 +863,7 @@ static int block_save_complete(QEMUFile *f, void *opaque)
 return 0;
 }
 
-static void block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
+static void block_save_pending(void *opaque, uint64_t max_size,
uint64_t *res_precopy_only,
uint64_t *res_compatible,
uint64_t *res_postcopy_only)
diff --git a/migration/migration.c b/migration/migration.c
index 56859d5869..9e1f9c5736 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3781,8 +3781,8 @@ static MigIterateState 
migration_iteration_run(MigrationState *s)
 uint64_t pending_size, pend_pre, pend_compat, pend_post;
 bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
 
-qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, &pend_pre,
-  &pend_compat, &pend_post);
+qemu_savevm

[PATCH v9 01/14] linux-headers: Update to v6.2-rc1

2023-02-06 Thread Avihai Horon

Update to commit 1b929c02afd3 ("Linux 6.2-rc1").

Signed-off-by: Avihai Horon 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Michael S. Tsirkin 
---
 include/standard-headers/drm/drm_fourcc.h |  63 +++-
 include/standard-headers/linux/ethtool.h  |  81 -
 include/standard-headers/linux/fuse.h |  20 +-
 .../linux/input-event-codes.h |   4 +
 include/standard-headers/linux/pci_regs.h |   2 +
 include/standard-headers/linux/virtio_blk.h   |  19 ++
 include/standard-headers/linux/virtio_bt.h|   8 +
 include/standard-headers/linux/virtio_net.h   |   4 +
 linux-headers/asm-arm64/kvm.h |   1 +
 linux-headers/asm-generic/hugetlb_encode.h|  26 +-
 linux-headers/asm-generic/mman-common.h   |   2 +
 linux-headers/asm-mips/mman.h |   2 +
 linux-headers/asm-riscv/kvm.h |   7 +
 linux-headers/asm-x86/kvm.h   |  11 +-
 linux-headers/linux/kvm.h |  32 +-
 linux-headers/linux/psci.h|  14 +
 linux-headers/linux/userfaultfd.h |   4 +
 linux-headers/linux/vfio.h| 278 +-
 18 files changed, 522 insertions(+), 56 deletions(-)

diff --git a/include/standard-headers/drm/drm_fourcc.h 
b/include/standard-headers/drm/drm_fourcc.h
index 48b620cbef..69cab17b38 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -98,18 +98,42 @@ extern "C" {
 #define DRM_FORMAT_INVALID 0
 
 /* color index */
+#define DRM_FORMAT_C1  fourcc_code('C', '1', ' ', ' ') /* [7:0] 
C0:C1:C2:C3:C4:C5:C6:C7 1:1:1:1:1:1:1:1 eight pixels/byte */
+#define DRM_FORMAT_C2  fourcc_code('C', '2', ' ', ' ') /* [7:0] 
C0:C1:C2:C3 2:2:2:2 four pixels/byte */
+#define DRM_FORMAT_C4  fourcc_code('C', '4', ' ', ' ') /* [7:0] C0:C1 
4:4 two pixels/byte */
 #define DRM_FORMAT_C8  fourcc_code('C', '8', ' ', ' ') /* [7:0] C */
 
-/* 8 bpp Red */
+/* 1 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D1  fourcc_code('D', '1', ' ', ' ') /* [7:0] 
D0:D1:D2:D3:D4:D5:D6:D7 1:1:1:1:1:1:1:1 eight pixels/byte */
+
+/* 2 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D2  fourcc_code('D', '2', ' ', ' ') /* [7:0] 
D0:D1:D2:D3 2:2:2:2 four pixels/byte */
+
+/* 4 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D4  fourcc_code('D', '4', ' ', ' ') /* [7:0] D0:D1 
4:4 two pixels/byte */
+
+/* 8 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D8  fourcc_code('D', '8', ' ', ' ') /* [7:0] D */
+
+/* 1 bpp Red (direct relationship between channel value and brightness) */
+#define DRM_FORMAT_R1  fourcc_code('R', '1', ' ', ' ') /* [7:0] 
R0:R1:R2:R3:R4:R5:R6:R7 1:1:1:1:1:1:1:1 eight pixels/byte */
+
+/* 2 bpp Red (direct relationship between channel value and brightness) */
+#define DRM_FORMAT_R2  fourcc_code('R', '2', ' ', ' ') /* [7:0] 
R0:R1:R2:R3 2:2:2:2 four pixels/byte */
+
+/* 4 bpp Red (direct relationship between channel value and brightness) */
+#define DRM_FORMAT_R4  fourcc_code('R', '4', ' ', ' ') /* [7:0] R0:R1 
4:4 two pixels/byte */
+
+/* 8 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R8  fourcc_code('R', '8', ' ', ' ') /* [7:0] R */
 
-/* 10 bpp Red */
+/* 10 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R10 fourcc_code('R', '1', '0', ' ') /* [15:0] x:R 
6:10 little endian */
 
-/* 12 bpp Red */
+/* 12 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R12 fourcc_code('R', '1', '2', ' ') /* [15:0] x:R 
4:12 little endian */
 
-/* 16 bpp Red */
+/* 16 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R16 fourcc_code('R', '1', '6', ' ') /* [15:0] R 
little endian */
 
 /* 16 bpp RG */
@@ -204,7 +228,9 @@ extern "C" {
 #define DRM_FORMAT_VYUYfourcc_code('V', 'Y', 'U', 'Y') /* 
[31:0] Y1:Cb0:Y0:Cr0 8:8:8:8 little endian */
 
 #define DRM_FORMAT_AYUVfourcc_code('A', 'Y', 'U', 'V') /* 
[31:0] A:Y:Cb:Cr 8:8:8:8 little endian */
+#define DRM_FORMAT_AVUYfourcc_code('A', 'V', 'U', 'Y') /* [31:0] 
A:Cr:Cb:Y 8:8:8:8 little endian */
 #define DRM_FORMAT_XYUVfourcc_code('X', 'Y', 'U', 'V') /* [31:0] 
X:Y:Cb:Cr 8:8:8:8 little endian */
+#define DRM_FORMAT_XVUYfourcc_code('X', 'V', 'U', 'Y') /* [31:0] 
X:Cr:Cb:Y 8:8:8:8 little endian */
 #define DRM_FORMAT_VUY888  fourcc_code('V', 'U', '2', '4') /* [23:0] 
Cr:Cb:Y 8:8:8 little endian */
 #define DRM_FORMAT_VUY101010   fourcc_code('V', 'U', '3', '0') /* Y followed 
by U then V, 10:10:10. Non-linear modifier only */
 
@@ -717,6 +743,35 @@ extern "C" {
  */
 #define DR

[PATCH] build: make meson-buildoptions.sh stable

2023-02-06 Thread Paolo Bonzini

The library directory can change depending on the multilib setup of the host.
It would be even better to detect it in configure with the same algorithm
that Meson uses, but the important thing to avoid confusing developers is
to have identical contents of scripts/meson-buildoptions.sh, independent
of the distro and architecture on which it was created.

So, for now just give a custom default value to libdir.

Signed-off-by: Paolo Bonzini 
---
 scripts/meson-buildoptions.py | 7 +--
 scripts/meson-buildoptions.sh | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/scripts/meson-buildoptions.py b/scripts/meson-buildoptions.py
index 3e2b4785388f..a04dcc70a5b7 100755
--- a/scripts/meson-buildoptions.py
+++ b/scripts/meson-buildoptions.py
@@ -61,7 +61,10 @@
 
 # Convert the default value of an option to the string used in
 # the help message
-def value_to_help(value):
+def get_help(opt):
+if opt["name"] == "libdir":
+return 'system default'
+value = opt["value"]
 if isinstance(value, list):
 return ",".join(value)
 if isinstance(value, bool):
@@ -88,7 +91,7 @@ def sh_print(line=""):
 def help_line(left, opt, indent, long):
 right = f'{opt["description"]}'
 if long:
-value = value_to_help(opt["value"])
+value = get_help(opt)
 if value != "auto" and value != "":
 right += f" [{value}]"
 if "choices" in opt and long:
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 0f71e92dcba6..d663c9cadfbe 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -49,7 +49,7 @@ meson_options_help() {
   printf "%s\n" '  --includedir=VALUE   Header file directory [include]'
   printf "%s\n" '  --interp-prefix=VALUEwhere to find shared libraries 
etc., use %M for'
   printf "%s\n" '   cpu name [/usr/gnemul/qemu-%M]'
-  printf "%s\n" '  --libdir=VALUE   Library directory [lib64]'
+  printf "%s\n" '  --libdir=VALUE   Library directory [system default]'
   printf "%s\n" '  --libexecdir=VALUE   Library executable directory 
[libexec]'
   printf "%s\n" '  --localedir=VALUELocale data directory 
[share/locale]'
   printf "%s\n" '  --localstatedir=VALUELocalstate data directory 
[/var/local]'
-- 
2.38.1

[PATCH v9 07/14] vfio/migration: Block multiple devices migration

2023-02-06 Thread Avihai Horon

Currently VFIO migration doesn't implement some kind of intermediate
quiescent state in which P2P DMAs are quiesced before stopping or
running the device. This can cause problems in multi-device migration
where the devices are doing P2P DMAs, since the devices are not stopped
together at the same time.

Until such support is added, block migration of multiple devices.

Signed-off-by: Avihai Horon 
---
 include/hw/vfio/vfio-common.h |  2 ++
 hw/vfio/common.c  | 51 +++
 hw/vfio/migration.c   |  6 +
 3 files changed, 59 insertions(+)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index e573f5a9f1..56b1683824 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -218,6 +218,8 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
 extern VFIOGroupList vfio_group_list;
 
 bool vfio_mig_active(void);
+int vfio_block_multiple_devices_migration(Error **errp);
+void vfio_unblock_multiple_devices_migration(void);
 int64_t vfio_mig_bytes_transferred(void);
 
 #ifdef CONFIG_LINUX
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 3a35f4afad..01db41b735 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -41,6 +41,7 @@
 #include "qapi/error.h"
 #include "migration/migration.h"
 #include "migration/misc.h"
+#include "migration/blocker.h"
 #include "sysemu/tpm.h"
 
 VFIOGroupList vfio_group_list =
@@ -337,6 +338,56 @@ bool vfio_mig_active(void)
 return true;
 }
 
+Error *multiple_devices_migration_blocker;
+
+static unsigned int vfio_migratable_device_num(void)
+{
+VFIOGroup *group;
+VFIODevice *vbasedev;
+unsigned int device_num = 0;
+
+QLIST_FOREACH(group, &vfio_group_list, next) {
+QLIST_FOREACH(vbasedev, &group->device_list, next) {
+if (vbasedev->migration) {
+device_num++;
+}
+}
+}
+
+return device_num;
+}
+
+int vfio_block_multiple_devices_migration(Error **errp)
+{
+int ret;
+
+if (vfio_migratable_device_num() != 2) {
+return 0;
+}
+
+error_setg(&multiple_devices_migration_blocker,
+   "Migration is currently not supported with multiple "
+   "VFIO devices");
+ret = migrate_add_blocker(multiple_devices_migration_blocker, errp);
+if (ret < 0) {
+error_free(multiple_devices_migration_blocker);
+multiple_devices_migration_blocker = NULL;
+}
+
+return ret;
+}
+
+void vfio_unblock_multiple_devices_migration(void)
+{
+if (vfio_migratable_device_num() != 2) {
+return;
+}
+
+migrate_del_blocker(multiple_devices_migration_blocker);
+error_free(multiple_devices_migration_blocker);
+multiple_devices_migration_blocker = NULL;
+}
+
 static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
 {
 VFIOGroup *group;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 552c2313b2..8085a4ff44 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -880,6 +880,11 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error 
**errp)
 goto add_blocker;
 }
 
+ret = vfio_block_multiple_devices_migration(errp);
+if (ret) {
+return ret;
+}
+
 trace_vfio_migration_probe(vbasedev->name, info->index);
 g_free(info);
 return 0;
@@ -902,6 +907,7 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
 if (vbasedev->migration) {
 VFIOMigration *migration = vbasedev->migration;
 
+vfio_unblock_multiple_devices_migration();
 remove_migration_state_change_notifier(&migration->migration_state);
 qemu_del_vm_change_state_handler(migration->vm_state);
 unregister_savevm(VMSTATE_IF(vbasedev->dev), "vfio", vbasedev);
-- 
2.26.3

[PATCH v9 04/14] vfio/migration: Allow migration without VFIO IOMMU dirty tracking support

2023-02-06 Thread Avihai Horon

Currently, if IOMMU of a VFIO container doesn't support dirty page
tracking, migration is blocked. This is because a DMA-able VFIO device
can dirty RAM pages without updating QEMU about it, thus breaking the
migration.

However, this doesn't mean that migration can't be done at all.
In such case, allow migration and let QEMU VFIO code mark all pages
dirty.

This guarantees that all pages that might have gotten dirty are reported
back, and thus guarantees a valid migration even without VFIO IOMMU
dirty tracking support.

The motivation for this patch is the introduction of iommufd [1].
iommufd can directly implement the /dev/vfio/vfio container IOCTLs by
mapping them into its internal ops, allowing the usage of these IOCTLs
over iommufd. However, VFIO IOMMU dirty tracking is not supported by
this VFIO compatibility API.

This patch will allow migration by hosts that use the VFIO compatibility
API and prevent migration regressions caused by the lack of VFIO IOMMU
dirty tracking support.

[1]
https://lore.kernel.org/kvm/0-v6-a196d26f289e+11787-iommufd_...@nvidia.com/

Signed-off-by: Avihai Horon 
Reviewed-by: Cédric Le Goater 
---
 hw/vfio/common.c| 20 ++--
 hw/vfio/migration.c |  3 +--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 130e5d1dc7..f6dd571549 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -488,6 +488,12 @@ static int vfio_dma_unmap(VFIOContainer *container,
 return -errno;
 }
 
+if (iotlb && vfio_devices_all_running_and_saving(container)) {
+cpu_physical_memory_set_dirty_range(iotlb->translated_addr, size,
+tcg_enabled() ? DIRTY_CLIENTS_ALL :
+DIRTY_CLIENTS_NOCODE);
+}
+
 return 0;
 }
 
@@ -1201,6 +1207,10 @@ static void vfio_set_dirty_page_tracking(VFIOContainer 
*container, bool start)
 .argsz = sizeof(dirty),
 };
 
+if (!container->dirty_pages_supported) {
+return;
+}
+
 if (start) {
 dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
 } else {
@@ -1236,6 +1246,13 @@ static int vfio_get_dirty_bitmap(VFIOContainer 
*container, uint64_t iova,
 uint64_t pages;
 int ret;
 
+if (!container->dirty_pages_supported) {
+cpu_physical_memory_set_dirty_range(ram_addr, size,
+tcg_enabled() ? DIRTY_CLIENTS_ALL :
+DIRTY_CLIENTS_NOCODE);
+return 0;
+}
+
 dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range));
 
 dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
@@ -1409,8 +1426,7 @@ static void vfio_listener_log_sync(MemoryListener 
*listener,
 {
 VFIOContainer *container = container_of(listener, VFIOContainer, listener);
 
-if (vfio_listener_skipped_section(section) ||
-!container->dirty_pages_supported) {
+if (vfio_listener_skipped_section(section)) {
 return;
 }
 
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 09fe7c1de2..552c2313b2 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -860,11 +860,10 @@ int64_t vfio_mig_bytes_transferred(void)
 
 int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
 {
-VFIOContainer *container = vbasedev->group->container;
 struct vfio_region_info *info = NULL;
 int ret = -ENOTSUP;
 
-if (!vbasedev->enable_migration || !container->dirty_pages_supported) {
+if (!vbasedev->enable_migration) {
 goto add_blocker;
 }
 
-- 
2.26.3

[PATCH v9 05/14] migration/qemu-file: Add qemu_file_get_to_fd()

2023-02-06 Thread Avihai Horon

Add new function qemu_file_get_to_fd() that allows reading data from
QEMUFile and writing it straight into a given fd.

This will be used later in VFIO migration code.

Signed-off-by: Avihai Horon 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Cédric Le Goater 
---
 migration/qemu-file.h |  1 +
 migration/qemu-file.c | 34 ++
 2 files changed, 35 insertions(+)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index fa13d04d78..9d0155a2a1 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -148,6 +148,7 @@ int qemu_file_shutdown(QEMUFile *f);
 QEMUFile *qemu_file_get_return_path(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
+int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 2d5f74ffc2..102ab3b439 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -940,3 +940,37 @@ QIOChannel *qemu_file_get_ioc(QEMUFile *file)
 {
 return file->ioc;
 }
+
+/*
+ * Read size bytes from QEMUFile f and write them to fd.
+ */
+int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size)
+{
+while (size) {
+size_t pending = f->buf_size - f->buf_index;
+ssize_t rc;
+
+if (!pending) {
+rc = qemu_fill_buffer(f);
+if (rc < 0) {
+return rc;
+}
+if (rc == 0) {
+return -EIO;
+}
+continue;
+}
+
+rc = write(fd, f->buf + f->buf_index, MIN(pending, size));
+if (rc < 0) {
+return -errno;
+}
+if (rc == 0) {
+return -EIO;
+}
+f->buf_index += rc;
+size -= rc;
+}
+
+return 0;
+}
-- 
2.26.3

[PATCH v9 09/14] vfio/migration: Rename functions/structs related to v1 protocol

2023-02-06 Thread Avihai Horon

To avoid name collisions, rename functions and structs related to VFIO
migration protocol v1. This will allow the two protocols to co-exist
when v2 protocol is added, until v1 is removed. No functional changes
intended.

Signed-off-by: Avihai Horon 
Reviewed-by: Cédric Le Goater 
---
 include/hw/vfio/vfio-common.h |   2 +-
 hw/vfio/common.c  |   6 +-
 hw/vfio/migration.c   | 106 +-
 hw/vfio/trace-events  |  12 ++--
 4 files changed, 63 insertions(+), 63 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 56b1683824..2c0fb1d622 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -62,7 +62,7 @@ typedef struct VFIOMigration {
 struct VFIODevice *vbasedev;
 VMChangeStateEntry *vm_state;
 VFIORegion region;
-uint32_t device_state;
+uint32_t device_state_v1;
 int vm_running;
 Notifier migration_state;
 uint64_t pending_bytes;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 01db41b735..cf772fd335 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -406,8 +406,8 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer 
*container)
 return false;
 }
 
-if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF)
-&& (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING)) {
+if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
+(migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) {
 return false;
 }
 }
@@ -436,7 +436,7 @@ static bool 
vfio_devices_all_running_and_mig_active(VFIOContainer *container)
 return false;
 }
 
-if (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING) {
+if (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
 continue;
 } else {
 return false;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 55b43d24f6..0b77b0bbba 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -107,8 +107,8 @@ static int vfio_mig_rw(VFIODevice *vbasedev, __u8 *buf, 
size_t count,
  * an error is returned.
  */
 
-static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask,
-uint32_t value)
+static int vfio_migration_v1_set_state(VFIODevice *vbasedev, uint32_t mask,
+   uint32_t value)
 {
 VFIOMigration *migration = vbasedev->migration;
 VFIORegion *region = &migration->region;
@@ -145,8 +145,8 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, 
uint32_t mask,
 return ret;
 }
 
-migration->device_state = device_state;
-trace_vfio_migration_set_state(vbasedev->name, device_state);
+migration->device_state_v1 = device_state;
+trace_vfio_migration_v1_set_state(vbasedev->name, device_state);
 return 0;
 }
 
@@ -260,8 +260,8 @@ static int vfio_save_buffer(QEMUFile *f, VFIODevice 
*vbasedev, uint64_t *size)
 return ret;
 }
 
-static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
-uint64_t data_size)
+static int vfio_v1_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
+   uint64_t data_size)
 {
 VFIORegion *region = &vbasedev->migration->region;
 uint64_t data_offset = 0, size, report_size;
@@ -288,7 +288,7 @@ static int vfio_load_buffer(QEMUFile *f, VFIODevice 
*vbasedev,
 data_size = 0;
 }
 
-trace_vfio_load_state_device_data(vbasedev->name, data_offset, size);
+trace_vfio_v1_load_state_device_data(vbasedev->name, data_offset, 
size);
 
 while (size) {
 void *buf;
@@ -394,7 +394,7 @@ static int vfio_load_device_config_state(QEMUFile *f, void 
*opaque)
 return qemu_file_get_error(f);
 }
 
-static void vfio_migration_cleanup(VFIODevice *vbasedev)
+static void vfio_migration_v1_cleanup(VFIODevice *vbasedev)
 {
 VFIOMigration *migration = vbasedev->migration;
 
@@ -405,13 +405,13 @@ static void vfio_migration_cleanup(VFIODevice *vbasedev)
 
 /* -- */
 
-static int vfio_save_setup(QEMUFile *f, void *opaque)
+static int vfio_v1_save_setup(QEMUFile *f, void *opaque)
 {
 VFIODevice *vbasedev = opaque;
 VFIOMigration *migration = vbasedev->migration;
 int ret;
 
-trace_vfio_save_setup(vbasedev->name);
+trace_vfio_v1_save_setup(vbasedev->name);
 
 qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
 
@@ -431,8 +431,8 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
 }
 }
 
-ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
-   VFIO_DEVICE_STATE_V1_SAVING);
+ret = vfio_migration_v1_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
+

[PATCH v9 08/14] vfio/migration: Move migration v1 logic to vfio_migration_init()

2023-02-06 Thread Avihai Horon

Move vfio_dev_get_region_info() logic from vfio_migration_probe() to
vfio_migration_init(). This logic is specific to v1 protocol and moving
it will make it easier to add the v2 protocol implementation later.
No functional changes intended.

Signed-off-by: Avihai Horon 
Reviewed-by: Cédric Le Goater 
---
 hw/vfio/migration.c  | 30 +++---
 hw/vfio/trace-events |  2 +-
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 8085a4ff44..55b43d24f6 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -788,14 +788,14 @@ static void vfio_migration_exit(VFIODevice *vbasedev)
 vbasedev->migration = NULL;
 }
 
-static int vfio_migration_init(VFIODevice *vbasedev,
-   struct vfio_region_info *info)
+static int vfio_migration_init(VFIODevice *vbasedev)
 {
 int ret;
 Object *obj;
 VFIOMigration *migration;
 char id[256] = "";
 g_autofree char *path = NULL, *oid = NULL;
+struct vfio_region_info *info;
 
 if (!vbasedev->ops->vfio_get_object) {
 return -EINVAL;
@@ -806,6 +806,14 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 return -EINVAL;
 }
 
+ret = vfio_get_dev_region_info(vbasedev,
+   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
+   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
+   &info);
+if (ret) {
+return ret;
+}
+
 vbasedev->migration = g_new0(VFIOMigration, 1);
 vbasedev->migration->device_state = VFIO_DEVICE_STATE_V1_RUNNING;
 vbasedev->migration->vm_running = runstate_is_running();
@@ -825,6 +833,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 goto err;
 }
 
+g_free(info);
+
 migration = vbasedev->migration;
 migration->vbasedev = vbasedev;
 
@@ -847,6 +857,7 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 return 0;
 
 err:
+g_free(info);
 vfio_migration_exit(vbasedev);
 return ret;
 }
@@ -860,22 +871,13 @@ int64_t vfio_mig_bytes_transferred(void)
 
 int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
 {
-struct vfio_region_info *info = NULL;
 int ret = -ENOTSUP;
 
 if (!vbasedev->enable_migration) {
 goto add_blocker;
 }
 
-ret = vfio_get_dev_region_info(vbasedev,
-   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
-   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
-   &info);
-if (ret) {
-goto add_blocker;
-}
-
-ret = vfio_migration_init(vbasedev, info);
+ret = vfio_migration_init(vbasedev);
 if (ret) {
 goto add_blocker;
 }
@@ -885,14 +887,12 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error 
**errp)
 return ret;
 }
 
-trace_vfio_migration_probe(vbasedev->name, info->index);
-g_free(info);
+trace_vfio_migration_probe(vbasedev->name);
 return 0;
 
 add_blocker:
 error_setg(&vbasedev->migration_blocker,
"VFIO device doesn't support migration");
-g_free(info);
 
 ret = migrate_add_blocker(vbasedev->migration_blocker, errp);
 if (ret < 0) {
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 73dffe9e00..b259dcc644 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -148,7 +148,7 @@ vfio_display_edid_update(uint32_t prefx, uint32_t prefy) 
"%ux%u"
 vfio_display_edid_write_error(void) ""
 
 # migration.c
-vfio_migration_probe(const char *name, uint32_t index) " (%s) Region %d"
+vfio_migration_probe(const char *name) " (%s)"
 vfio_migration_set_state(const char *name, uint32_t state) " (%s) state %d"
 vfio_vmstate_change(const char *name, int running, const char *reason, 
uint32_t dev_state) " (%s) running %d reason %s device state %d"
 vfio_migration_state_notifier(const char *name, const char *state) " (%s) 
state %s"
-- 
2.26.3

[PATCH v9 10/14] vfio/migration: Implement VFIO migration protocol v2

2023-02-06 Thread Avihai Horon

Implement the basic mandatory part of VFIO migration protocol v2.
This includes all functionality that is necessary to support
VFIO_MIGRATION_STOP_COPY part of the v2 protocol.

The two protocols, v1 and v2, will co-exist and in the following patches
v1 protocol code will be removed.

There are several main differences between v1 and v2 protocols:
- VFIO device state is now represented as a finite state machine instead
  of a bitmap.

- Migration interface with kernel is now done using VFIO_DEVICE_FEATURE
  ioctl and normal read() and write() instead of the migration region.

- Pre-copy is made optional in v2 protocol. Support for pre-copy will be
  added later on.

Detailed information about VFIO migration protocol v2 and its difference
compared to v1 protocol can be found here [1].

[1]
https://lore.kernel.org/all/20220224142024.147653-10-yish...@nvidia.com/

Signed-off-by: Avihai Horon 
Reviewed-by: Cédric Le Goater 
---
 include/hw/vfio/vfio-common.h |   5 +
 hw/vfio/common.c  |  17 +-
 hw/vfio/migration.c   | 453 +++---
 hw/vfio/trace-events  |   7 +
 4 files changed, 443 insertions(+), 39 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 2c0fb1d622..c4eab55af9 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -66,6 +66,11 @@ typedef struct VFIOMigration {
 int vm_running;
 Notifier migration_state;
 uint64_t pending_bytes;
+uint32_t device_state;
+int data_fd;
+void *data_buffer;
+size_t data_buffer_size;
+bool v2;
 } VFIOMigration;
 
 typedef struct VFIOAddressSpace {
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index cf772fd335..8628f2c18c 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -406,10 +406,17 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer 
*container)
 return false;
 }
 
-if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
+if (!migration->v2 &&
+(vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
 (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) {
 return false;
 }
+
+if (migration->v2 &&
+vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF &&
+migration->device_state == VFIO_DEVICE_STATE_RUNNING) {
+return false;
+}
 }
 }
 return true;
@@ -436,7 +443,13 @@ static bool 
vfio_devices_all_running_and_mig_active(VFIOContainer *container)
 return false;
 }
 
-if (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
+if (!migration->v2 &&
+migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
+continue;
+}
+
+if (migration->v2 &&
+migration->device_state == VFIO_DEVICE_STATE_RUNNING) {
 continue;
 } else {
 return false;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 0b77b0bbba..dcffe9235b 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -10,6 +10,7 @@
 #include "qemu/osdep.h"
 #include "qemu/main-loop.h"
 #include "qemu/cutils.h"
+#include "qemu/units.h"
 #include 
 #include 
 
@@ -44,8 +45,101 @@
 #define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xef13ULL)
 #define VFIO_MIG_FLAG_DEV_DATA_STATE(0xef14ULL)
 
+/*
+ * This is an arbitrary size based on migration of mlx5 devices, where 
typically
+ * total device migration size is on the order of 100s of MB. Testing with
+ * larger values, e.g. 128MB and 1GB, did not show a performance improvement.
+ */
+#define VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE (1 * MiB)
+
 static int64_t bytes_transferred;
 
+static const char *mig_state_to_str(enum vfio_device_mig_state state)
+{
+switch (state) {
+case VFIO_DEVICE_STATE_ERROR:
+return "ERROR";
+case VFIO_DEVICE_STATE_STOP:
+return "STOP";
+case VFIO_DEVICE_STATE_RUNNING:
+return "RUNNING";
+case VFIO_DEVICE_STATE_STOP_COPY:
+return "STOP_COPY";
+case VFIO_DEVICE_STATE_RESUMING:
+return "RESUMING";
+default:
+return "UNKNOWN STATE";
+}
+}
+
+static int vfio_migration_set_state(VFIODevice *vbasedev,
+enum vfio_device_mig_state new_state,
+enum vfio_device_mig_state recover_state)
+{
+VFIOMigration *migration = vbasedev->migration;
+uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
+  sizeof(struct vfio_device_feature_mig_state),
+  sizeof(uint64_t))] = {};
+struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
+struct vfio_device_feature_mig_state *mig_state =
+(struct vfio_device_feature_mig_state

[PATCH v9 06/14] vfio/common: Change vfio_devices_all_running_and_saving() logic to equivalent one

2023-02-06 Thread Avihai Horon

vfio_devices_all_running_and_saving() is used to check if migration is
in pre-copy phase. This is done by checking if migration is in setup or
active states and if all VFIO devices are in pre-copy state, i.e.
_SAVING | _RUNNING.

In VFIO migration protocol v2 pre-copy support is made optional. Hence,
a matching v2 protocol pre-copy state can't be used here.

As preparation for adding v2 protocol, change
vfio_devices_all_running_and_saving() logic such that it doesn't use the
VFIO pre-copy state.

The new equivalent logic checks if migration is in active state and if
all VFIO devices are in running state [1]. No functional changes
intended.

[1] Note that checking if migration is in setup or active states and if
all VFIO devices are in running state doesn't guarantee that we are in
pre-copy phase, thus we check if migration is only in active state.

Signed-off-by: Avihai Horon 
Reviewed-by: Cédric Le Goater 
---
 hw/vfio/common.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f6dd571549..3a35f4afad 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -40,6 +40,7 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "migration/migration.h"
+#include "migration/misc.h"
 #include "sysemu/tpm.h"
 
 VFIOGroupList vfio_group_list =
@@ -363,13 +364,16 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer 
*container)
 return true;
 }
 
-static bool vfio_devices_all_running_and_saving(VFIOContainer *container)
+/*
+ * Check if all VFIO devices are running and migration is active, which is
+ * essentially equivalent to the migration being in pre-copy phase.
+ */
+static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
 {
 VFIOGroup *group;
 VFIODevice *vbasedev;
-MigrationState *ms = migrate_get_current();
 
-if (!migration_is_setup_or_active(ms->state)) {
+if (!migration_is_active(migrate_get_current())) {
 return false;
 }
 
@@ -381,8 +385,7 @@ static bool 
vfio_devices_all_running_and_saving(VFIOContainer *container)
 return false;
 }
 
-if ((migration->device_state & VFIO_DEVICE_STATE_V1_SAVING) &&
-(migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING)) {
+if (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING) {
 continue;
 } else {
 return false;
@@ -461,7 +464,7 @@ static int vfio_dma_unmap(VFIOContainer *container,
 };
 
 if (iotlb && container->dirty_pages_supported &&
-vfio_devices_all_running_and_saving(container)) {
+vfio_devices_all_running_and_mig_active(container)) {
 return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
 }
 
@@ -488,7 +491,7 @@ static int vfio_dma_unmap(VFIOContainer *container,
 return -errno;
 }
 
-if (iotlb && vfio_devices_all_running_and_saving(container)) {
+if (iotlb && vfio_devices_all_running_and_mig_active(container)) {
 cpu_physical_memory_set_dirty_range(iotlb->translated_addr, size,
 tcg_enabled() ? DIRTY_CLIENTS_ALL :
 DIRTY_CLIENTS_NOCODE);
-- 
2.26.3

[PATCH v9 13/14] vfio: Alphabetize migration section of VFIO trace-events file

2023-02-06 Thread Avihai Horon

Sort the migration section of VFIO trace events file alphabetically
and move two misplaced traces to common.c section.

Signed-off-by: Avihai Horon 
Reviewed-by: Cédric Le Goater 
---
 hw/vfio/trace-events | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 60c49b2ecf..db9cb94952 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -119,6 +119,8 @@ vfio_region_sparse_mmap_header(const char *name, int index, 
int nr_areas) "Devic
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) 
"sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t 
subtype) "%s index %d, %08x/%0x8"
 vfio_dma_unmap_overflow_workaround(void) ""
+vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t 
bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 
0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
+vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu 
dirty @ 0x%"PRIx64" - 0x%"PRIx64
 
 # platform.c
 vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group 
#%d"
@@ -148,20 +150,18 @@ vfio_display_edid_update(uint32_t prefx, uint32_t prefy) 
"%ux%u"
 vfio_display_edid_write_error(void) ""
 
 # migration.c
+vfio_load_cleanup(const char *name) " (%s)"
+vfio_load_device_config_state(const char *name) " (%s)"
+vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
+vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " 
(%s) size 0x%"PRIx64" ret %d"
+vfio_migration_data_notifier(const char *name, uint64_t stopcopy_size) " (%s) 
stopcopy size 0x%"PRIx64
 vfio_migration_probe(const char *name) " (%s)"
 vfio_migration_set_state(const char *name, const char *state) " (%s) state %s"
-vfio_vmstate_change(const char *name, int running, const char *reason, const 
char *dev_state) " (%s) running %d reason %s device state %s"
 vfio_migration_state_notifier(const char *name, const char *state) " (%s) 
state %s"
-vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data 
buffer size 0x%"PRIx64
+vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
 vfio_save_cleanup(const char *name) " (%s)"
+vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
 vfio_save_device_config_state(const char *name) " (%s)"
 vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, 
uint64_t compatible, uint64_t stopcopy_size) " (%s) precopy 0x%"PRIx64" 
postcopy 0x%"PRIx64" compatible 0x%"PRIx64" stopcopy size 0x%"PRIx64
-vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
-vfio_load_device_config_state(const char *name) " (%s)"
-vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
-vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " 
(%s) size 0x%"PRIx64" ret %d"
-vfio_load_cleanup(const char *name) " (%s)"
-vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t 
bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 
0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
-vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu 
dirty @ 0x%"PRIx64" - 0x%"PRIx64
-vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
-vfio_migration_data_notifier(const char *name, uint64_t stopcopy_size) " (%s) 
stopcopy size 0x%"PRIx64
+vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data 
buffer size 0x%"PRIx64
+vfio_vmstate_change(const char *name, int running, const char *reason, const 
char *dev_state) " (%s) running %d reason %s device state %s"
-- 
2.26.3

[PATCH v9 11/14] vfio/migration: Optimize vfio_save_pending()

2023-02-06 Thread Avihai Horon

During pre-copy phase of migration vfio_save_pending() is called
repeatedly and queries the VFIO device for its pending data size.

As long as pending RAM size is over the threshold, migration can't
converge and be completed. Therefore, during this time there is no
point in querying the VFIO device pending data size.

Avoid these unnecessary queries by issuing them in a RAM pre-copy
notifier instead of vfio_save_pending().

This way the VFIO device is queried only when RAM pending data is
below the threshold, when there is an actual chance for migration to
converge.

Signed-off-by: Avihai Horon 
Reviewed-by: Cédric Le Goater 
---
 include/hw/vfio/vfio-common.h |  2 ++
 hw/vfio/migration.c   | 56 +++
 hw/vfio/trace-events  |  1 +
 3 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index c4eab55af9..3c94660608 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -65,11 +65,13 @@ typedef struct VFIOMigration {
 uint32_t device_state_v1;
 int vm_running;
 Notifier migration_state;
+NotifierWithReturn migration_data;
 uint64_t pending_bytes;
 uint32_t device_state;
 int data_fd;
 void *data_buffer;
 size_t data_buffer_size;
+uint64_t stop_copy_size;
 bool v2;
 } VFIOMigration;
 
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index dcffe9235b..5daeb5a106 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -653,29 +653,19 @@ static void vfio_v1_save_cleanup(void *opaque)
 trace_vfio_save_cleanup(vbasedev->name);
 }
 
-/*
- * Migration size of VFIO devices can be as little as a few KBs or as big as
- * many GBs. This value should be big enough to cover the worst case.
- */
-#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB)
 static void vfio_save_pending(void *opaque, uint64_t threshold_size,
   uint64_t *res_precopy_only,
   uint64_t *res_compatible,
   uint64_t *res_postcopy_only)
 {
 VFIODevice *vbasedev = opaque;
-uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE;
+VFIOMigration *migration = vbasedev->migration;
 
-/*
- * If getting pending migration size fails, VFIO_MIG_STOP_COPY_SIZE is
- * reported so downtime limit won't be violated.
- */
-vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
-*res_precopy_only += stop_copy_size;
+*res_precopy_only += migration->stop_copy_size;
 
 trace_vfio_save_pending(vbasedev->name, *res_precopy_only,
 *res_postcopy_only, *res_compatible,
-stop_copy_size);
+migration->stop_copy_size);
 }
 
 static void vfio_v1_save_pending(void *opaque, uint64_t threshold_size,
@@ -1102,6 +1092,40 @@ static void vfio_migration_state_notifier(Notifier 
*notifier, void *data)
 }
 }
 
+/*
+ * Migration size of VFIO devices can be as little as a few KBs or as big as
+ * many GBs. This value should be big enough to cover the worst case.
+ */
+#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB)
+static int vfio_migration_data_notifier(NotifierWithReturn *n, void *data)
+{
+VFIOMigration *migration = container_of(n, VFIOMigration, migration_data);
+VFIODevice *vbasedev = migration->vbasedev;
+PrecopyNotifyData *pnd = data;
+
+if (pnd->reason != PRECOPY_NOTIFY_AFTER_BITMAP_SYNC) {
+return 0;
+}
+
+/* No need to get pending size when finishing migration */
+if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
+return 0;
+}
+
+if (vfio_query_stop_copy_size(vbasedev, &migration->stop_copy_size)) {
+/*
+ * Failed to get pending migration size. Report big pending size so
+ * downtime limit won't be violated.
+ */
+migration->stop_copy_size = VFIO_MIG_STOP_COPY_SIZE;
+}
+
+trace_vfio_migration_data_notifier(vbasedev->name,
+   migration->stop_copy_size);
+
+return 0;
+}
+
 static void vfio_migration_exit(VFIODevice *vbasedev)
 {
 VFIOMigration *migration = vbasedev->migration;
@@ -1223,6 +1247,9 @@ static int vfio_migration_init(VFIODevice *vbasedev)
 
 migration->vm_state = qdev_add_vm_change_state_handler(
 vbasedev->dev, vfio_vmstate_change, vbasedev);
+
+migration->migration_data.notify = vfio_migration_data_notifier;
+precopy_add_notifier(&migration->migration_data);
 } else {
 register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
  &savevm_vfio_v1_handlers, vbasedev);
@@ -1287,6 +1314,9 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
 VFIOMigration *migration = vbasedev->migration;
 
 vfio_unblock_multiple_devices_migration();
+if (migration->v2) {
+precopy_remove_notifier(&migration->migration_data);
+}
 remove_mi

[PATCH v9 14/14] docs/devel: Align VFIO migration docs to v2 protocol

2023-02-06 Thread Avihai Horon

Now that VFIO migration protocol v2 has been implemented and v1 protocol
has been removed, update the documentation according to v2 protocol.

Signed-off-by: Avihai Horon 
Reviewed-by: Cédric Le Goater 
---
 docs/devel/vfio-migration.rst | 68 ---
 1 file changed, 30 insertions(+), 38 deletions(-)

diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
index 9ff6163c88..1d50c2fe5f 100644
--- a/docs/devel/vfio-migration.rst
+++ b/docs/devel/vfio-migration.rst
@@ -7,46 +7,39 @@ the guest is running on source host and restoring this saved 
state on the
 destination host. This document details how saving and restoring of VFIO
 devices is done in QEMU.
 
-Migration of VFIO devices consists of two phases: the optional pre-copy phase,
-and the stop-and-copy phase. The pre-copy phase is iterative and allows to
-accommodate VFIO devices that have a large amount of data that needs to be
-transferred. The iterative pre-copy phase of migration allows for the guest to
-continue whilst the VFIO device state is transferred to the destination, this
-helps to reduce the total downtime of the VM. VFIO devices can choose to skip
-the pre-copy phase of migration by returning pending_bytes as zero during the
-pre-copy phase.
+Migration of VFIO devices currently consists of a single stop-and-copy phase.
+During the stop-and-copy phase the guest is stopped and the entire VFIO device
+data is transferred to the destination.
+
+The pre-copy phase of migration is currently not supported for VFIO devices.
+Support for VFIO pre-copy will be added later on.
 
 A detailed description of the UAPI for VFIO device migration can be found in
-the comment for the ``vfio_device_migration_info`` structure in the header
-file linux-headers/linux/vfio.h.
+the comment for the ``vfio_device_mig_state`` structure in the header file
+linux-headers/linux/vfio.h.
 
 VFIO implements the device hooks for the iterative approach as follows:
 
-* A ``save_setup`` function that sets up the migration region and sets _SAVING
-  flag in the VFIO device state.
+* A ``save_setup`` function that sets up migration on the source.
 
-* A ``load_setup`` function that sets up the migration region on the
-  destination and sets _RESUMING flag in the VFIO device state.
+* A ``load_setup`` function that sets the VFIO device on the destination in
+  _RESUMING state.
 
 * A ``save_live_pending`` function that reads pending_bytes from the vendor
   driver, which indicates the amount of data that the vendor driver has yet to
   save for the VFIO device.
 
-* A ``save_live_iterate`` function that reads the VFIO device's data from the
-  vendor driver through the migration region during iterative phase.
-
 * A ``save_state`` function to save the device config space if it is present.
 
-* A ``save_live_complete_precopy`` function that resets _RUNNING flag from the
-  VFIO device state and iteratively copies the remaining data for the VFIO
-  device until the vendor driver indicates that no data remains (pending bytes
-  is zero).
+* A ``save_live_complete_precopy`` function that sets the VFIO device in
+  _STOP_COPY state and iteratively copies the data for the VFIO device until
+  the vendor driver indicates that no data remains.
 
 * A ``load_state`` function that loads the config section and the data
-  sections that are generated by the save functions above
+  sections that are generated by the save functions above.
 
 * ``cleanup`` functions for both save and load that perform any migration
-  related cleanup, including unmapping the migration region
+  related cleanup.
 
 
 The VFIO migration code uses a VM state change handler to change the VFIO
@@ -71,13 +64,13 @@ tracking can identify dirtied pages, but any page pinned by 
the vendor driver
 can also be written by the device. There is currently no device or IOMMU
 support for dirty page tracking in hardware.
 
-By default, dirty pages are tracked when the device is in pre-copy as well as
-stop-and-copy phase. So, a page pinned by the vendor driver will be copied to
-the destination in both phases. Copying dirty pages in pre-copy phase helps
-QEMU to predict if it can achieve its downtime tolerances. If QEMU during
-pre-copy phase keeps finding dirty pages continuously, then it understands
-that even in stop-and-copy phase, it is likely to find dirty pages and can
-predict the downtime accordingly.
+By default, dirty pages are tracked during pre-copy as well as stop-and-copy
+phase. So, a page pinned by the vendor driver will be copied to the destination
+in both phases. Copying dirty pages in pre-copy phase helps QEMU to predict if
+it can achieve its downtime tolerances. If QEMU during pre-copy phase keeps
+finding dirty pages continuously, then it understands that even in 
stop-and-copy
+phase, it is likely to find dirty pages and can predict the downtime
+accordingly.
 
 QEMU also provides a per device opt-out option ``pre-copy-dirty-page-tracking``
 which disables

[PATCH v9 12/14] vfio/migration: Remove VFIO migration protocol v1

2023-02-06 Thread Avihai Horon

Now that v2 protocol implementation has been added, remove the
deprecated v1 implementation.

Signed-off-by: Avihai Horon 
Reviewed-by: Cédric Le Goater 
---
 include/hw/vfio/vfio-common.h |   5 -
 hw/vfio/common.c  |  17 +-
 hw/vfio/migration.c   | 703 +-
 hw/vfio/trace-events  |   9 -
 4 files changed, 23 insertions(+), 711 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 3c94660608..f7da9f50a9 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -61,18 +61,13 @@ typedef struct VFIORegion {
 typedef struct VFIOMigration {
 struct VFIODevice *vbasedev;
 VMChangeStateEntry *vm_state;
-VFIORegion region;
-uint32_t device_state_v1;
-int vm_running;
 Notifier migration_state;
 NotifierWithReturn migration_data;
-uint64_t pending_bytes;
 uint32_t device_state;
 int data_fd;
 void *data_buffer;
 size_t data_buffer_size;
 uint64_t stop_copy_size;
-bool v2;
 } VFIOMigration;
 
 typedef struct VFIOAddressSpace {
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 8628f2c18c..f5055df86d 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -406,14 +406,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer 
*container)
 return false;
 }
 
-if (!migration->v2 &&
-(vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
-(migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) {
-return false;
-}
-
-if (migration->v2 &&
-vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF &&
+if (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF &&
 migration->device_state == VFIO_DEVICE_STATE_RUNNING) {
 return false;
 }
@@ -443,13 +436,7 @@ static bool 
vfio_devices_all_running_and_mig_active(VFIOContainer *container)
 return false;
 }
 
-if (!migration->v2 &&
-migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
-continue;
-}
-
-if (migration->v2 &&
-migration->device_state == VFIO_DEVICE_STATE_RUNNING) {
+if (migration->device_state == VFIO_DEVICE_STATE_RUNNING) {
 continue;
 } else {
 return false;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 5daeb5a106..cf5c196299 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -140,220 +140,6 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
 return 0;
 }
 
-static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count,
-  off_t off, bool iswrite)
-{
-int ret;
-
-ret = iswrite ? pwrite(vbasedev->fd, val, count, off) :
-pread(vbasedev->fd, val, count, off);
-if (ret < count) {
-error_report("vfio_mig_%s %d byte %s: failed at offset 0x%"
- HWADDR_PRIx", err: %s", iswrite ? "write" : "read", count,
- vbasedev->name, off, strerror(errno));
-return (ret < 0) ? ret : -EINVAL;
-}
-return 0;
-}
-
-static int vfio_mig_rw(VFIODevice *vbasedev, __u8 *buf, size_t count,
-   off_t off, bool iswrite)
-{
-int ret, done = 0;
-__u8 *tbuf = buf;
-
-while (count) {
-int bytes = 0;
-
-if (count >= 8 && !(off % 8)) {
-bytes = 8;
-} else if (count >= 4 && !(off % 4)) {
-bytes = 4;
-} else if (count >= 2 && !(off % 2)) {
-bytes = 2;
-} else {
-bytes = 1;
-}
-
-ret = vfio_mig_access(vbasedev, tbuf, bytes, off, iswrite);
-if (ret) {
-return ret;
-}
-
-count -= bytes;
-done += bytes;
-off += bytes;
-tbuf += bytes;
-}
-return done;
-}
-
-#define vfio_mig_read(f, v, c, o)   vfio_mig_rw(f, (__u8 *)v, c, o, false)
-#define vfio_mig_write(f, v, c, o)  vfio_mig_rw(f, (__u8 *)v, c, o, true)
-
-#define VFIO_MIG_STRUCT_OFFSET(f)   \
- offsetof(struct vfio_device_migration_info, f)
-/*
- * Change the device_state register for device @vbasedev. Bits set in @mask
- * are preserved, bits set in @value are set, and bits not set in either @mask
- * or @value are cleared in device_state. If the register cannot be accessed,
- * the resulting state would be invalid, or the device enters an error state,
- * an error is returned.
- */
-
-static int vfio_migration_v1_set_state(VFIODevice *vbasedev, uint32_t mask,
-   uint32_t value)
-{
-VFIOMigration *migration = vbasedev->migration;
-VFIORegion *region = &migration->region;
-off_t dev_state_off = region->fd_offset +
-

vhost-user (virtio-fs) migration: back end state

2023-02-06 Thread Hanna Czenczek


Hi Stefan,

For true virtio-fs migration, we need to migrate the daemon’s (back 
end’s) state somehow.  I’m addressing you because you had a talk on this 
topic at KVM Forum 2021. :)


As far as I understood your talk, the only standardized way to migrate a 
vhost-user back end’s state is via dbus-vmstate.  I believe that 
interface is unsuitable for our use case, because we will need to 
migrate more than 1 MB of state.  Now, that 1 MB limit has supposedly 
been chosen arbitrarily, but the introducing commit’s message says that 
it’s based on the idea that the data must be supplied basically 
immediately anyway (due to both dbus and qemu migration requirements), 
and I don’t think we can meet that requirement.


Has there been progress on the topic of standardizing a vhost-user back 
end state migration channel besides dbus-vmstate?  I’ve looked around 
but didn’t find anything.  If there isn’t anything yet, is there still 
interest in the topic?


Of course, we could use a channel that completely bypasses qemu, but I 
think we’d like to avoid that if possible.  First, this would require 
adding functionality to virtiofsd to configure this channel.  Second, 
not storing the state in the central VM state means that migrating to 
file doesn’t work (well, we could migrate to a dedicated state file, 
but...).  Third, setting up such a channel after virtiofsd has sandboxed 
itself is hard.  I guess we should set up the migration channel before 
sandboxing, which constrains runtime configuration (basically this would 
only allow us to set up a listening server, I believe).  Well, and 
finally, it isn’t a standard way, which won’t be great if we’re planning 
to add a standard way anyway.


Thanks!

Hanna

Re: [PATCH 02/10] hw/riscv/virt: Add a switch to enable/disable ACPI

2023-02-06 Thread Andrew Jones

On Mon, Feb 06, 2023 at 12:18:06PM +0100, Philippe Mathieu-Daudé wrote:
> On 6/2/23 11:54, Andrea Bolognani wrote:
> > On Thu, Feb 02, 2023 at 10:22:15AM +0530, Sunil V L wrote:
> > > +object_class_property_add(oc, "acpi", "OnOffAuto",
> > > +  virt_get_acpi, virt_set_acpi,
> > > +  NULL, NULL);
> > > +object_class_property_set_description(oc, "acpi",
> > > +  "Enable ACPI");
> > 
> > The way this works on other architectures (x86_64, aarch64) is that
> > you get ACPI by default and can use -no-acpi to disable it if
> > desired. Can we have the same on RISC-V, for consistency?

Default on, with a user control to turn off, can be done with a boolean.
I'm not sure why/if Auto is needed for acpi. Auto is useful when a
configuration doesn't support a default setting for a feature. If the
user hasn't explicitly requested the feature to be on or off, then the
configuration can silently select what works. If, however, the user
explicitly chooses what doesn't work, then qemu will fail with an error
instead.

> 
> -no-acpi rather seems a x86-specific hack for the ISA PC machine, and
> has a high maintenance cost / burden.
> 
> If hardware provides ACPI support, QEMU should expose it to the guest.
> 
> Actually, what is the value added by '-no-acpi'?

IIRC, when booting, at least arm guests, with edk2 and ACPI tables,
then edk2 will provide the guest ACPI tables instead of DT. To ensure
we can boot with edk2, but still allow the guest to boot with DT, we
need a way to disable the generation of ACPI tables.

Thanks,
drew

Re: [PATCH v15 09/11] machine: adding s390 topology to query-cpu-fast

2023-02-06 Thread Thomas Huth


On 01/02/2023 14.20, Pierre Morel wrote:

S390x provides two more topology containers above the sockets,
books and drawers.


books and drawers are already handled via the entries in 
CpuInstanceProperties, so this sentence looks like a wrong leftover now?


I'd suggest talking about "dedication" and "polarity" instead?


Let's add these CPU attributes to the QAPI command query-cpu-fast.

Signed-off-by: Pierre Morel 
---
  qapi/machine.json  | 13 ++---
  hw/core/machine-qmp-cmds.c |  2 ++
  2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/qapi/machine.json b/qapi/machine.json
index 3036117059..e36c39e258 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -53,11 +53,18 @@
  #
  # Additional information about a virtual S390 CPU
  #
-# @cpu-state: the virtual CPU's state
+# @cpu-state: the virtual CPU's state (since 2.12)
+# @dedicated: the virtual CPU's dedication (since 8.0)
+# @polarity: the virtual CPU's polarity (since 8.0)
  #
  # Since: 2.12
  ##
-{ 'struct': 'CpuInfoS390', 'data': { 'cpu-state': 'CpuS390State' } }
+{ 'struct': 'CpuInfoS390',
+'data': { 'cpu-state': 'CpuS390State',
+  'dedicated': 'bool',
+  'polarity': 'int'
+}
+}
  
  ##

  # @CpuInfoFast:
@@ -70,7 +77,7 @@
  #
  # @thread-id: ID of the underlying host thread
  #
-# @props: properties describing to which node/socket/core/thread
+# @props: properties describing to which node/drawer/book/socket/core/thread


I think this hunk should rather be moved to patch 1 now.

 Thomas

Re: [PATCH v15 09/11] machine: adding s390 topology to query-cpu-fast

2023-02-06 Thread Thomas Huth


On 01/02/2023 14.20, Pierre Morel wrote:

S390x provides two more topology containers above the sockets,
books and drawers.

Let's add these CPU attributes to the QAPI command query-cpu-fast.

Signed-off-by: Pierre Morel 
---
  qapi/machine.json  | 13 ++---
  hw/core/machine-qmp-cmds.c |  2 ++
  2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/qapi/machine.json b/qapi/machine.json
index 3036117059..e36c39e258 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -53,11 +53,18 @@
  #
  # Additional information about a virtual S390 CPU
  #
-# @cpu-state: the virtual CPU's state
+# @cpu-state: the virtual CPU's state (since 2.12)
+# @dedicated: the virtual CPU's dedication (since 8.0)
+# @polarity: the virtual CPU's polarity (since 8.0)
  #
  # Since: 2.12
  ##
-{ 'struct': 'CpuInfoS390', 'data': { 'cpu-state': 'CpuS390State' } }
+{ 'struct': 'CpuInfoS390',
+'data': { 'cpu-state': 'CpuS390State',
+  'dedicated': 'bool',
+  'polarity': 'int'


I think it would also be better to mark the new fields as optional and only 
return them if the guest has the topology enabled, to avoid confusing 
clients that were written before this change.


 Thomas

Re: [PATCH v5 1/3] arm/virt: don't try to spell out the accelerator

2023-02-06 Thread Eric Auger

Hi Connie,

On 2/3/23 14:44, Cornelia Huck wrote:
> Just use current_accel_name() directly.
> 
> Signed-off-by: Cornelia Huck 
Reviewed-by: Eric Auger 

Thanks

Eric
> ---
>  hw/arm/virt.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index ea2413a0bad7..bdc297a4570c 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2123,21 +2123,21 @@ static void machvirt_init(MachineState *machine)
>  if (vms->secure && (kvm_enabled() || hvf_enabled())) {
>  error_report("mach-virt: %s does not support providing "
>   "Security extensions (TrustZone) to the guest CPU",
> - kvm_enabled() ? "KVM" : "HVF");
> + current_accel_name());
>  exit(1);
>  }
>  
>  if (vms->virt && (kvm_enabled() || hvf_enabled())) {
>  error_report("mach-virt: %s does not support providing "
>   "Virtualization extensions to the guest CPU",
> - kvm_enabled() ? "KVM" : "HVF");
> + current_accel_name());
>  exit(1);
>  }
>  
>  if (vms->mte && (kvm_enabled() || hvf_enabled())) {
>  error_report("mach-virt: %s does not support providing "
>   "MTE to the guest CPU",
> - kvm_enabled() ? "KVM" : "HVF");
> + current_accel_name());
>  exit(1);
>  }
>

Re: [PATCH] meson: Avoid duplicates in generated config-poison.h again

2023-02-06 Thread Alex Bennée



Markus Armbruster  writes:

> Commit eed56e9a89f "configure, meson: move config-poison.h to meson"
> lost a "| sort -u".  Restore it.  config-poison shrinks from ~4500 to
> ~700 lines when all targets are enabled.
>
> Signed-off-by: Markus Armbruster 

Reviewed-by: Alex Bennée 
Tested-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH v15 09/11] machine: adding s390 topology to query-cpu-fast

2023-02-06 Thread Daniel P . Berrangé

On Mon, Feb 06, 2023 at 01:41:44PM +0100, Thomas Huth wrote:
> On 01/02/2023 14.20, Pierre Morel wrote:
> > S390x provides two more topology containers above the sockets,
> > books and drawers.
> > 
> > Let's add these CPU attributes to the QAPI command query-cpu-fast.
> > 
> > Signed-off-by: Pierre Morel 
> > ---
> >   qapi/machine.json  | 13 ++---
> >   hw/core/machine-qmp-cmds.c |  2 ++
> >   2 files changed, 12 insertions(+), 3 deletions(-)
> > 
> > diff --git a/qapi/machine.json b/qapi/machine.json
> > index 3036117059..e36c39e258 100644
> > --- a/qapi/machine.json
> > +++ b/qapi/machine.json
> > @@ -53,11 +53,18 @@
> >   #
> >   # Additional information about a virtual S390 CPU
> >   #
> > -# @cpu-state: the virtual CPU's state
> > +# @cpu-state: the virtual CPU's state (since 2.12)
> > +# @dedicated: the virtual CPU's dedication (since 8.0)
> > +# @polarity: the virtual CPU's polarity (since 8.0)
> >   #
> >   # Since: 2.12
> >   ##
> > -{ 'struct': 'CpuInfoS390', 'data': { 'cpu-state': 'CpuS390State' } }
> > +{ 'struct': 'CpuInfoS390',
> > +'data': { 'cpu-state': 'CpuS390State',
> > +  'dedicated': 'bool',
> > +  'polarity': 'int'
> 
> I think it would also be better to mark the new fields as optional and only
> return them if the guest has the topology enabled, to avoid confusing
> clients that were written before this change.

FWIW, I would say that the general expectation of QMP clients is that
they must *always* expect new fields to appear in dicts that are
returned in QMP replies. We add new fields at will on a frequent basis.

So personally I'd keep life simple and unconditionally report the new
fields.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v15 05/11] s390x/cpu topology: resetting the Topology-Change-Report

2023-02-06 Thread Pierre Morel





On 2/6/23 12:05, Thomas Huth wrote:

On 01/02/2023 14.20, Pierre Morel wrote:

During a subsystem reset the Topology-Change-Report is cleared
by the machine.
Let's ask KVM to clear the Modified Topology Change Report (MTCR)
bit of the SCA in the case of a subsystem reset.

Signed-off-by: Pierre Morel 
---

...

diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index a80a1ebf22..cf63f3dd01 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -85,6 +85,18 @@ static void s390_topology_init(MachineState *ms)
  QTAILQ_INSERT_HEAD(&s390_topology.list, entry, next);
  }
+/**
+ * s390_topology_reset:
+ *
+ * Generic reset for CPU topology, calls s390_topology_reset()
+ * s390_topology_reset() to reset the kernel Modified Topology


Duplicated s390_topology_reset() in the comment.


right, thx.




+ * change record.
+ */
+void s390_topology_reset(void)
+{
+    s390_cpu_topology_reset();
+}


With the nit fixed:
Reviewed-by: Thomas Huth 



Thanks!

Regards,
Pierre

--
Pierre Morel
IBM Lab Boeblingen

Re: [PATCH 02/10] hw/riscv/virt: Add a switch to enable/disable ACPI

2023-02-06 Thread Gerd Hoffmann

On Mon, Feb 06, 2023 at 12:18:06PM +0100, Philippe Mathieu-Daudé wrote:
> On 6/2/23 11:54, Andrea Bolognani wrote:
> > On Thu, Feb 02, 2023 at 10:22:15AM +0530, Sunil V L wrote:
> > > +object_class_property_add(oc, "acpi", "OnOffAuto",
> > > +  virt_get_acpi, virt_set_acpi,
> > > +  NULL, NULL);
> > > +object_class_property_set_description(oc, "acpi",
> > > +  "Enable ACPI");
> > 
> > The way this works on other architectures (x86_64, aarch64) is that
> > you get ACPI by default and can use -no-acpi to disable it if
> > desired. Can we have the same on RISC-V, for consistency?
> 
> -no-acpi rather seems a x86-specific hack for the ISA PC machine, and
> has a high maintenance cost / burden.

Under the hood it is actually a OnOffAuto machine property and -no-acpi
is just a shortcut to set that.

> Actually, what is the value added by '-no-acpi'?

On arm(64) the linux kernel can use either device trees or ACPI to find
the hardware.  Historical reasons mostly, when the platform started
there was no ACPI support.  Also the edk2 firmware uses Device Trees
for hardware discovery, likewise for historical reasons.

When ACPI is available for a platform right from the start I see little
reason to offer an option to turn it off though ...

take care,
  Gerd

Re: [PATCH v15 03/11] target/s390x/cpu topology: handle STSI(15) and build the SYSIB

2023-02-06 Thread Pierre Morel





On 2/6/23 12:24, Thomas Huth wrote:

On 01/02/2023 14.20, Pierre Morel wrote:

On interception of STSI(15.1.x) the System Information Block
(SYSIB) is built from the list of pre-ordered topology entries.

Signed-off-by: Pierre Morel 
---

...


...


--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h


...


+
+void insert_stsi_15_1_x(S390CPU *cpu, int sel2, __u64 addr, uint8_t ar);


__u64 is not a portable type (e.g. fails when doing "make 
vm-build-openbsd"), please replace with uint64_t.


done.
Thanks


Regards,
Pierre



--
Pierre Morel
IBM Lab Boeblingen

Re: [PATCH 18/19] hw/rx: Set QDev properties using QDev API

2023-02-06 Thread Yoshinori Sato

On Sat, 04 Feb 2023 03:09:13 +0900,
Philippe Mathieu-Daudé wrote:
> 
> No need to use the low-level QOM API when an object
> inherits from QDev. Directly use the QDev API to set
> its properties.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/rx/rx-gdbsim.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/rx/rx-gdbsim.c b/hw/rx/rx-gdbsim.c
> index 47c17026c7..5d50f36877 100644
> --- a/hw/rx/rx-gdbsim.c
> +++ b/hw/rx/rx-gdbsim.c
> @@ -21,6 +21,7 @@
>  #include "qemu/error-report.h"
>  #include "qemu/guest-random.h"
>  #include "qapi/error.h"
> +#include "hw/qdev-properties.h"
>  #include "hw/loader.h"
>  #include "hw/rx/rx62n.h"
>  #include "sysemu/qtest.h"
> @@ -99,12 +100,10 @@ static void rx_gdbsim_init(MachineState *machine)
>  
>  /* Initialize MCU */
>  object_initialize_child(OBJECT(machine), "mcu", &s->mcu, rxc->mcu_name);
> -object_property_set_link(OBJECT(&s->mcu), "main-bus", OBJECT(sysmem),
> - &error_abort);
> -object_property_set_uint(OBJECT(&s->mcu), "xtal-frequency-hz",
> - rxc->xtal_freq_hz, &error_abort);
> -object_property_set_bool(OBJECT(&s->mcu), "load-kernel",
> - kernel_filename != NULL, &error_abort);
> +qdev_prop_set_link(DEVICE(&s->mcu), "main-bus", OBJECT(sysmem));
> +qdev_prop_set_uint32(DEVICE(&s->mcu), "xtal-frequency-hz",
> + rxc->xtal_freq_hz);
> +qdev_prop_set_bit(DEVICE(&s->mcu), "load-kernel", kernel_filename != 
> NULL);
>  
>  if (!kernel_filename) {
>  if (machine->firmware) {
> -- 
> 2.38.1
> 

Reviewed-by: Yoshinori Sato 

-- 
Yosinori Sato

Re: [PATCH v15 06/11] s390x/cpu topology: interception of PTF instruction

2023-02-06 Thread Pierre Morel





On 2/6/23 12:38, Thomas Huth wrote:

On 01/02/2023 14.20, Pierre Morel wrote:

When the host supports the CPU topology facility, the PTF
instruction with function code 2 is interpreted by the SIE,
provided that the userland hypervizor activates the interpretation


s/hypervizor/hypervisor/


grr, I was pretty sure you already said that to me and I did change 
it... seems I did not.


done now.




by using the KVM_CAP_S390_CPU_TOPOLOGY KVM extension.

The PTF instructions with function code 0 and 1 are intercepted
and must be emulated by the userland hypervizor.


dito


yes, thx.




During RESET all CPU of the configuration are placed in
horizontal polarity.

Signed-off-by: Pierre Morel 
---

...

  /**
   * s390_topology_reset:
   *
   * Generic reset for CPU topology, calls s390_topology_reset()
   * s390_topology_reset() to reset the kernel Modified Topology
   * change record.
+ * Then set global and all CPUs polarity to POLARITY_HORIZONTAL.


You describe here already what's going to happen...


   */
  void s390_topology_reset(void)
  {
  s390_cpu_topology_reset();
+    /* Set global polarity to POLARITY_HORIZONTAL */


... then here again ...


+    s390_topology.polarity = POLARITY_HORIZONTAL;


... and the code is (fortunately) also very self-exaplaining...


+    /* Set all CPU polarity to POLARITY_HORIZONTAL */
+    s390_topology_set_cpus_polarity(POLARITY_HORIZONTAL);


... so I'd rather drop the two comments within the function body.


OK




  }


(rest of the patch looks fine to me)

  Thomas



Thanks.

Regards,
Pierre


--
Pierre Morel
IBM Lab Boeblingen

Re: [PATCH 02/10] hw/riscv/virt: Add a switch to enable/disable ACPI

2023-02-06 Thread Sunil V L

On Mon, Feb 06, 2023 at 01:35:20PM +0100, Andrew Jones wrote:
> On Mon, Feb 06, 2023 at 12:18:06PM +0100, Philippe Mathieu-Daudé wrote:
> > On 6/2/23 11:54, Andrea Bolognani wrote:
> > > On Thu, Feb 02, 2023 at 10:22:15AM +0530, Sunil V L wrote:
> > > > +object_class_property_add(oc, "acpi", "OnOffAuto",
> > > > +  virt_get_acpi, virt_set_acpi,
> > > > +  NULL, NULL);
> > > > +object_class_property_set_description(oc, "acpi",
> > > > +  "Enable ACPI");
> > > 
> > > The way this works on other architectures (x86_64, aarch64) is that
> > > you get ACPI by default and can use -no-acpi to disable it if
> > > desired. Can we have the same on RISC-V, for consistency?
> 
> Default on, with a user control to turn off, can be done with a boolean.
> I'm not sure why/if Auto is needed for acpi. Auto is useful when a
> configuration doesn't support a default setting for a feature. If the
> user hasn't explicitly requested the feature to be on or off, then the
> configuration can silently select what works. If, however, the user
> explicitly chooses what doesn't work, then qemu will fail with an error
> instead.
> 

Since all other architectures use Auto instead of a simple bool, I opted
for the same to keep it consistent.

However, default AUTO looked ambiguous to me. Since we still need to
support external interrupt controllers (IMSIC/APLIC/PLIC), I chose to
keep it OFF by default for now.

Thanks
Sunil

> > 
> > -no-acpi rather seems a x86-specific hack for the ISA PC machine, and
> > has a high maintenance cost / burden.
> > 
> > If hardware provides ACPI support, QEMU should expose it to the guest.
> > 
> > Actually, what is the value added by '-no-acpi'?
> 
> IIRC, when booting, at least arm guests, with edk2 and ACPI tables,
> then edk2 will provide the guest ACPI tables instead of DT. To ensure
> we can boot with edk2, but still allow the guest to boot with DT, we
> need a way to disable the generation of ACPI tables.
> 
> Thanks,
> drew

Re: [PATCH v15 09/11] machine: adding s390 topology to query-cpu-fast

2023-02-06 Thread Thomas Huth


On 06/02/2023 13.49, Daniel P. Berrangé wrote:

On Mon, Feb 06, 2023 at 01:41:44PM +0100, Thomas Huth wrote:

On 01/02/2023 14.20, Pierre Morel wrote:

S390x provides two more topology containers above the sockets,
books and drawers.

Let's add these CPU attributes to the QAPI command query-cpu-fast.

Signed-off-by: Pierre Morel 
---
   qapi/machine.json  | 13 ++---
   hw/core/machine-qmp-cmds.c |  2 ++
   2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/qapi/machine.json b/qapi/machine.json
index 3036117059..e36c39e258 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -53,11 +53,18 @@
   #
   # Additional information about a virtual S390 CPU
   #
-# @cpu-state: the virtual CPU's state
+# @cpu-state: the virtual CPU's state (since 2.12)
+# @dedicated: the virtual CPU's dedication (since 8.0)
+# @polarity: the virtual CPU's polarity (since 8.0)
   #
   # Since: 2.12
   ##
-{ 'struct': 'CpuInfoS390', 'data': { 'cpu-state': 'CpuS390State' } }
+{ 'struct': 'CpuInfoS390',
+'data': { 'cpu-state': 'CpuS390State',
+  'dedicated': 'bool',
+  'polarity': 'int'


I think it would also be better to mark the new fields as optional and only
return them if the guest has the topology enabled, to avoid confusing
clients that were written before this change.


FWIW, I would say that the general expectation of QMP clients is that
they must *always* expect new fields to appear in dicts that are
returned in QMP replies. We add new fields at will on a frequent basis.


Did we change our policy here? I slightly remember I've been told 
differently in the past ... but I can't recall where this was, it's 
certainly been a while.


So a question to the QAPI maintainers: What's the preferred handling for new 
fields nowadays in such situations?


 Thomas

Re: [PATCH v5 2/3] arm/kvm: add support for MTE

2023-02-06 Thread Eric Auger

Hi,

On 2/3/23 21:40, Richard Henderson wrote:
> On 2/3/23 03:44, Cornelia Huck wrote:
>> +static void aarch64_cpu_get_mte(Object *obj, Visitor *v, const char
>> *name,
>> +    void *opaque, Error **errp)
>> +{
>> +    ARMCPU *cpu = ARM_CPU(obj);
>> +    OnOffAuto mte = cpu->prop_mte;
>> +
>> +    visit_type_OnOffAuto(v, name, &mte, errp);
>> +}
> 
> You don't need to copy to a local variable here.
> 
>> +
>> +static void aarch64_cpu_set_mte(Object *obj, Visitor *v, const char
>> *name,
>> +    void *opaque, Error **errp)
>> +{
>> +    ARMCPU *cpu = ARM_CPU(obj);
>> +
>> +    visit_type_OnOffAuto(v, name, &cpu->prop_mte, errp);
>> +}
> 
> ... which makes get and set functions identical.
> No need for both.
This looks like a common pattern though. virt_get_acpi/set_acpi in
virt.c or pc_machine_get_vmport/set_vmport in i386/pc.c and many other
places (microvm ...). Do those other callers also need some simplifications?

Eric
> 
>> +static inline bool arm_machine_has_tag_memory(void)
>> +{
>> +#ifndef CONFIG_USER_ONLY
>> +    Object *obj = object_dynamic_cast(qdev_get_machine(),
>> TYPE_VIRT_MACHINE);
>> +
>> +    /* so far, only the virt machine has support for tag memory */
>> +    if (obj) {
>> +    VirtMachineState *vms = VIRT_MACHINE(obj);
> 
> VIRT_MACHINE() does object_dynamic_cast_assert, and we've just done that.
> 
> As this is startup, it's not the speed that matters.  But it does look
> unfortunate.  Not for this patch set, but perhaps we ought to add
> TRY_OBJ_NAME to DECLARE_INSTANCE_CHECKER?
> 
>> +void arm_cpu_mte_finalize(ARMCPU *cpu, Error **errp)
>> +{
>> +    bool enable_mte;
>> +
>> +    switch (cpu->prop_mte) {
>> +    case ON_OFF_AUTO_OFF:
>> +    enable_mte = false;
>> +    break;
>> +    case ON_OFF_AUTO_ON:
>> +    if (tcg_enabled()) {
>> +    if (cpu_isar_feature(aa64_mte, cpu)) {
>> +    if (!arm_machine_has_tag_memory()) {
>> +    error_setg(errp, "mte=on requires tag memory");
>> +    return;
>> +    }
>> +    } else {
>> +    error_setg(errp, "mte not supported by this CPU type");
>> +    return;
>> +    }
>> +    }
>> +    if (kvm_enabled() && !kvm_arm_mte_supported()) {
>> +    error_setg(errp, "mte not supported by kvm");
>> +    return;
>> +    }
>> +    enable_mte = true;
>> +    break;
> 
> What's here is not wrong, but maybe better structured as
> 
> enable_mte = true;
>     if (qtest_enabled()) {
>     break;
>     }
>     if (tcg_enabled()) {
>     if (arm_machine_tag_mem) {
>     break;
>     }
>     error;
>     return;
>     }
>     if (kvm_enabled() && kvm_arm_mte_supported) {
>     break;
>     }
>     error("mte not supported by %s", current_accel_type());
>     return;
> 
> We only add the property for tcg via -cpu max, so the isar check is
> redundant.
> 
> 
> r~
>

1 2 3 4 5 >

1 - 100 of 421 matches

Mail list logo