date:20231115

[PATCH] tests/qtest: remove unused variables

2023-11-15 Thread zhujun2

These variables are never referenced in the code; just remove them.

Signed-off-by: zhujun2 
---
 tests/qtest/test-filter-mirror.c | 2 +-
 tests/qtest/test-filter-redirector.c | 4 ++--
 tests/qtest/virtio-net-test.c| 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/test-filter-mirror.c b/tests/qtest/test-filter-mirror.c
index adeada3eb8..7aa81daa93 100644
--- a/tests/qtest/test-filter-mirror.c
+++ b/tests/qtest/test-filter-mirror.c
@@ -60,7 +60,7 @@ static void test_mirror(void)
 
 g_assert_cmpint(len, ==, sizeof(send_buf));
 recv_buf = g_malloc(len);
-ret = recv(recv_sock[0], recv_buf, len, 0);
+recv(recv_sock[0], recv_buf, len, 0);
 g_assert_cmpstr(recv_buf, ==, send_buf);
 
 g_free(recv_buf);
diff --git a/tests/qtest/test-filter-redirector.c b/tests/qtest/test-filter-redirector.c
index e72e3b7873..e4dfeff2e0 100644
--- a/tests/qtest/test-filter-redirector.c
+++ b/tests/qtest/test-filter-redirector.c
@@ -117,7 +117,7 @@ static void test_redirector_tx(void)
 
 g_assert_cmpint(len, ==, sizeof(send_buf));
 recv_buf = g_malloc(len);
-ret = recv(recv_sock, recv_buf, len, 0);
+recv(recv_sock, recv_buf, len, 0);
 g_assert_cmpstr(recv_buf, ==, send_buf);
 
 g_free(recv_buf);
@@ -184,7 +184,7 @@ static void test_redirector_rx(void)
 
 g_assert_cmpint(len, ==, sizeof(send_buf));
 recv_buf = g_malloc(len);
-ret = recv(backend_sock[0], recv_buf, len, 0);
+recv(backend_sock[0], recv_buf, len, 0);
 g_assert_cmpstr(recv_buf, ==, send_buf);
 
 close(send_sock);
diff --git a/tests/qtest/virtio-net-test.c b/tests/qtest/virtio-net-test.c
index fab5dd8b05..26df5bbabe 100644
--- a/tests/qtest/virtio-net-test.c
+++ b/tests/qtest/virtio-net-test.c
@@ -90,7 +90,7 @@ static void tx_test(QVirtioDevice *dev,
 g_assert_cmpint(ret, ==, sizeof(len));
 len = ntohl(len);
 
-ret = recv(socket, buffer, len, 0);
+recv(socket, buffer, len, 0);
 g_assert_cmpstr(buffer, ==, "TEST");
 }
 
-- 
2.17.1

Re: [PATCH] tests/qtest: remove unused variables

2023-11-15 Thread Thomas Huth


On 15/11/2023 09.00, zhujun2 wrote:

These variables are never referenced in the code, just remove them

Signed-off-by: zhujun2 
---
  tests/qtest/test-filter-mirror.c | 2 +-
  tests/qtest/test-filter-redirector.c | 4 ++--
  tests/qtest/virtio-net-test.c| 2 +-
  3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/test-filter-mirror.c b/tests/qtest/test-filter-mirror.c
index adeada3eb8..7aa81daa93 100644
--- a/tests/qtest/test-filter-mirror.c
+++ b/tests/qtest/test-filter-mirror.c
@@ -60,7 +60,7 @@ static void test_mirror(void)
  
  g_assert_cmpint(len, ==, sizeof(send_buf));

  recv_buf = g_malloc(len);
-ret = recv(recv_sock[0], recv_buf, len, 0);
+recv(recv_sock[0], recv_buf, len, 0);
  g_assert_cmpstr(recv_buf, ==, send_buf);
  
  g_free(recv_buf);

diff --git a/tests/qtest/test-filter-redirector.c b/tests/qtest/test-filter-redirector.c
index e72e3b7873..e4dfeff2e0 100644
--- a/tests/qtest/test-filter-redirector.c
+++ b/tests/qtest/test-filter-redirector.c
@@ -117,7 +117,7 @@ static void test_redirector_tx(void)
  
  g_assert_cmpint(len, ==, sizeof(send_buf));

  recv_buf = g_malloc(len);
-ret = recv(recv_sock, recv_buf, len, 0);
+recv(recv_sock, recv_buf, len, 0);
  g_assert_cmpstr(recv_buf, ==, send_buf);
  
  g_free(recv_buf);

@@ -184,7 +184,7 @@ static void test_redirector_rx(void)
  
  g_assert_cmpint(len, ==, sizeof(send_buf));

  recv_buf = g_malloc(len);
-ret = recv(backend_sock[0], recv_buf, len, 0);
+recv(backend_sock[0], recv_buf, len, 0);
  g_assert_cmpstr(recv_buf, ==, send_buf);
  
  close(send_sock);

diff --git a/tests/qtest/virtio-net-test.c b/tests/qtest/virtio-net-test.c
index fab5dd8b05..26df5bbabe 100644
--- a/tests/qtest/virtio-net-test.c
+++ b/tests/qtest/virtio-net-test.c
@@ -90,7 +90,7 @@ static void tx_test(QVirtioDevice *dev,
  g_assert_cmpint(ret, ==, sizeof(len));
  len = ntohl(len);
  
-ret = recv(socket, buffer, len, 0);

+recv(socket, buffer, len, 0);
  g_assert_cmpstr(buffer, ==, "TEST");
  }


Wouldn't it be better to check the ret value for success?

 Thomas
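
A minimal sketch of the kind of check being suggested here, not code from the patch. `recv_all` and `demo_round_trip` are hypothetical helpers, and a `socketpair()` stands in for the sockets used by the qtests. The idea is simply to keep the value returned by `recv()` and verify it before comparing buffers.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: receive exactly len bytes from fd.
 * Returns 0 on success, -1 on error or if the peer closes early.
 */
int recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t ret = recv(fd, p, len, 0);

        if (ret < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted, retry */
            }
            return -1;          /* real error */
        }
        if (ret == 0) {
            return -1;          /* connection closed before len bytes */
        }
        p += ret;
        len -= (size_t)ret;
    }
    return 0;
}

/* Hypothetical demo: send "TEST" through a socketpair, checking each step. */
int demo_round_trip(void)
{
    static const char send_buf[] = "TEST";
    char recv_buf[sizeof(send_buf)];
    int sv[2];
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    ok = send(sv[0], send_buf, sizeof(send_buf), 0) == (ssize_t)sizeof(send_buf)
         && recv_all(sv[1], recv_buf, sizeof(recv_buf)) == 0
         && strcmp(recv_buf, send_buf) == 0;
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

In the tests themselves, the equivalent would presumably be a `g_assert_cmpint()` on `ret` right after each `recv()` call, instead of dropping the assignment.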

Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object

2023-11-15 Thread Cédric Le Goater

On 11/15/23 05:06, Duan, Zhenzhong wrote:

-Original Message-
From: Cédric Le Goater 
Sent: Tuesday, November 14, 2023 9:29 PM
Subject: Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object

On 11/14/23 11:09, Zhenzhong Duan wrote:

From: Eric Auger 

Introduce an iommufd object which allows the interaction
with the host /dev/iommu device.

The /dev/iommu device may already have been opened outside of QEMU,
in which case the fd can be passed directly along with the
iommufd object:

This allows the iommufd object to be shared across several
subsystems (VFIO, VDPA, ...). For example, libvirt would open
the /dev/iommu once.

If no fd is passed along with the iommufd object, the /dev/iommu
is opened by the qemu code.

Suggested-by: Alex Williamson 
Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Zhenzhong Duan 

I simplified the object declaration in include/sysemu/iommufd.h and
formatted /dev/iommu in qemu-options.hx. No need to resend.

Good catch, thanks! Maybe it can be further simplified with the below? This is minor.

OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass, IOMMUFD_BACKEND)

Indeed. Done.

Thanks,

C.

Re: [PATCH] tests/avocado/intel_iommu: Add asset hashes to avoid warnings

2023-11-15 Thread Eric Auger

Hi Thomas,

On 11/15/23 07:20, Thomas Huth wrote:
> On 14/11/2023 21.42, Eric Auger wrote:
>> Hi Thomas,
>>
>> On 11/14/23 15:35, Thomas Huth wrote:
>>> The intel_iommu test is currently succeeding with annoying warnings.
>> nit: you could have specified the nature of the warnings or quoted them
>
> The annoying warnings look like this (in the summary):
>
>  (031/174) tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu:
> WARN: Test passed but there were warnings during execution. Check the
> log for details. (67.87 s)
>  (032/174)
> tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: WARN:
> Test passed but there were warnings during execution. Check the log
> for details. (55.83 s)
>  (033/174)
> tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm:
> WARN: Test passed but there were warnings during execution. Check the
> log for details. (56.01 s)
>  (034/174)
> tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: WARN:
> Test passed but there were warnings during execution. Check the log
> for details. (54.06 s)
>
> ... not too helpful to quote them in the commit log, I guess...
Ah OK. Not really helpful indeed ;-)

Eric
>
>>> Add the proper asset hashes to avoid those.
>>>
>>> Signed-off-by: Thomas Huth 
>>> ---
>>>   tests/avocado/intel_iommu.py | 6 --
>>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tests/avocado/intel_iommu.py
>>> b/tests/avocado/intel_iommu.py
>>> index 474d62f6bf..77635ab56c 100644
>>> --- a/tests/avocado/intel_iommu.py
>>> +++ b/tests/avocado/intel_iommu.py
>>> @@ -54,9 +54,11 @@ def common_vm_setup(self, custom_kernel=None):
>>>   return
>>>     kernel_url = self.distro.pxeboot_url + 'vmlinuz'
>>> +    kernel_hash = '5b6f6876e1b5bda314f93893271da0d5777b1f3c'
>>>   initrd_url = self.distro.pxeboot_url + 'initrd.img'
>>> -    self.kernel_path = self.fetch_asset(kernel_url)
>>> -    self.initrd_path = self.fetch_asset(initrd_url)
>>> +    initrd_hash = 'dd0340a1b39bd28f88532babd4581c67649ec5b1'
>>> +    self.kernel_path = self.fetch_asset(kernel_url,
>>> asset_hash=kernel_hash)
>>> +    self.initrd_path = self.fetch_asset(initrd_url,
>>> asset_hash=initrd_hash)
>>>     def run_and_check(self):
>>>   if self.kernel_path:
>> Besides,
>> Reviewed-by: Eric Auger 
>
> Thanks!
>
>  Thomas
>

[PATCH RFC 2/2] hw/arm: Add minimal support for the B-L475E-IOT01A board

2023-11-15 Thread ~inesvarhol

From: Inès Varhol 

This commit adds a new B-L475E-IOT01A board using the STM32L475VG SoC.
The implementation is derived from the Netduino Plus 2 machine.
There are no peripherals implemented, only memory regions.

Signed-off-by: Arnaud Minier 
Signed-off-by: Inès Varhol 
---
 configs/devices/arm-softmmu/default.mak |  1 +
 hw/arm/Kconfig  |  6 +++
 hw/arm/b-l475e-iot01a.c | 71 +
 hw/arm/meson.build  |  1 +
 4 files changed, 79 insertions(+)
 create mode 100644 hw/arm/b-l475e-iot01a.c

diff --git a/configs/devices/arm-softmmu/default.mak b/configs/devices/arm-softmmu/default.mak
index 980c48a7d9..023faa2f75 100644
--- a/configs/devices/arm-softmmu/default.mak
+++ b/configs/devices/arm-softmmu/default.mak
@@ -19,6 +19,7 @@ CONFIG_ARM_VIRT=y
 # CONFIG_NSERIES=n
 # CONFIG_STELLARIS=n
 # CONFIG_STM32VLDISCOVERY=n
+# CONFIG_B_L475E_IOT01A=n
 # CONFIG_REALVIEW=n
 # CONFIG_VERSATILE=n
 # CONFIG_VEXPRESS=n
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 763510afeb..4d4ed96168 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -448,6 +448,12 @@ config STM32F405_SOC
     select STM32F4XX_SYSCFG
     select STM32F4XX_EXTI
 
+config B_L475E_IOT01A
+    bool
+    default y
+    depends on TCG && ARM
+    select STM32L475VG_SOC
+
 config STM32L475VG_SOC
     bool
     select ARM_V7M
diff --git a/hw/arm/b-l475e-iot01a.c b/hw/arm/b-l475e-iot01a.c
new file mode 100644
index 00..bfcb386d52
--- /dev/null
+++ b/hw/arm/b-l475e-iot01a.c
@@ -0,0 +1,71 @@
+/*
+ * B-L475E-IOT01A Discovery Kit machine
+ * (B-L475E-IOT01A IoT Node)
+ *
+ * Copyright (c) 2014 Alistair Francis 
+ * Copyright (c) 2023 Arnaud Minier 
+ * Copyright (c) 2023 Ines Varhol 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/boards.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-clock.h"
+#include "qemu/error-report.h"
+#include "hw/arm/stm32l475vg_soc.h"
+#include "hw/arm/boot.h"
+
+/* B-L475E-IOT01A implementation is derived from netduinoplus2 */
+
+/* Main SYSCLK frequency in Hz (80MHz) */
+#define SYSCLK_FRQ 80000000ULL
+
+static void b_l475e_iot01a_init(MachineState *machine)
+{
+    DeviceState *dev;
+    Clock *sysclk;
+
+    /* This clock doesn't need migration because it is fixed-frequency */
+    sysclk = clock_new(OBJECT(machine), "SYSCLK");
+    clock_set_hz(sysclk, SYSCLK_FRQ);
+
+    dev = qdev_new(TYPE_STM32L475VG_SOC);
+    qdev_prop_set_string(dev, "cpu-type", ARM_CPU_TYPE_NAME("cortex-m4"));
+    qdev_connect_clock_in(dev, "sysclk", sysclk);
+    qdev_realize(DEVICE(dev), NULL, &error_fatal);
+
+    armv7m_load_kernel(ARM_CPU(first_cpu),
+                       machine->kernel_filename,
+                       0, FLASH_SIZE);
+}
+
+static void b_l475e_iot01a_machine_init(MachineClass *mc)
+{
+    mc->desc = "B-L475E-IOT01A Discovery Kit (Cortex-M4)";
+    mc->init = b_l475e_iot01a_init;
+    mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m4");
+
+    /* SRAM pre-allocated as part of the SoC instantiation */
+    mc->default_ram_size = 0;
+}
+
+DEFINE_MACHINE("b-l475e-iot01a", b_l475e_iot01a_machine_init)
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index 6b2e1228e5..579c28f546 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -42,6 +42,7 @@ arm_ss.add(when: 'CONFIG_RASPI', if_true: files('bcm2836.c', 'raspi.c'))
 arm_ss.add(when: 'CONFIG_STM32F100_SOC', if_true: files('stm32f100_soc.c'))
 arm_ss.add(when: 'CONFIG_STM32F205_SOC', if_true: files('stm32f205_soc.c'))
 arm_ss.add(when: 'CONFIG_STM32F405_SOC', if_true: files('stm32f405_soc.c'))
+arm_ss.add(when: 'CONFIG_B_L475E_IOT01A', if_true: files('b-l475e-iot01a.c'))
 arm_ss.add(when: 'CONFIG_STM32L475VG_SOC', if_true: files('stm32l475vg_soc.c'))
 arm_ss.add(when: 'CONFIG_XLNX_ZYNQMP_ARM', if_true: files('xlnx-zynqmp.c', 'xlnx-zcu102.c'))
 arm_ss.add(when: 'CONF

[PATCH RFC 0/2] hw/arm: Add minimal support for the B-L475E-IOT01A board

2023-11-15 Thread ~inesvarhol

This patch series allows emulating the B-L475E-IOT01A ARM Cortex-M4 board.
This is an RFC since the implementation isn't complete yet (no peripherals
are implemented) and it's a first contribution to QEMU.

Inès Varhol (2):
  hw/arm: Add minimal support for the STM32L475VG SoC
  hw/arm: Add minimal support for the B-L475E-IOT01A board

 configs/devices/arm-softmmu/default.mak |   1 +
 hw/arm/Kconfig  |  11 ++
 hw/arm/b-l475e-iot01a.c |  71 +++
 hw/arm/meson.build  |   2 +
 hw/arm/stm32l475vg_soc.c| 241 
 include/hw/arm/stm32l475vg_soc.h|  60 ++
 6 files changed, 386 insertions(+)
 create mode 100644 hw/arm/b-l475e-iot01a.c
 create mode 100644 hw/arm/stm32l475vg_soc.c
 create mode 100644 include/hw/arm/stm32l475vg_soc.h

-- 
2.38.5

[PATCH RFC 1/2] hw/arm: Add minimal support for the STM32L475VG SoC

2023-11-15 Thread ~inesvarhol

From: Inès Varhol 

This patch adds a new STM32L475VG SoC, which is needed to add support for
the B-L475E-IOT01A board.
The implementation is derived from the STM32F405 SoC.
It contains no peripherals; only memory regions are implemented.

Signed-off-by: Arnaud Minier 
Signed-off-by: Inès Varhol 
---
 hw/arm/Kconfig   |   5 +
 hw/arm/meson.build   |   1 +
 hw/arm/stm32l475vg_soc.c | 241 +++
 include/hw/arm/stm32l475vg_soc.h |  60 
 4 files changed, 307 insertions(+)
 create mode 100644 hw/arm/stm32l475vg_soc.c
 create mode 100644 include/hw/arm/stm32l475vg_soc.h

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 3ada335a24..763510afeb 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -448,6 +448,11 @@ config STM32F405_SOC
     select STM32F4XX_SYSCFG
     select STM32F4XX_EXTI
 
+config STM32L475VG_SOC
+    bool
+    select ARM_V7M
+    select OR_IRQ
+
 config XLNX_ZYNQMP_ARM
     bool
     default y if PIXMAN
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index 68245d3ad1..6b2e1228e5 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -42,6 +42,7 @@ arm_ss.add(when: 'CONFIG_RASPI', if_true: files('bcm2836.c', 'raspi.c'))
 arm_ss.add(when: 'CONFIG_STM32F100_SOC', if_true: files('stm32f100_soc.c'))
 arm_ss.add(when: 'CONFIG_STM32F205_SOC', if_true: files('stm32f205_soc.c'))
 arm_ss.add(when: 'CONFIG_STM32F405_SOC', if_true: files('stm32f405_soc.c'))
+arm_ss.add(when: 'CONFIG_STM32L475VG_SOC', if_true: files('stm32l475vg_soc.c'))
 arm_ss.add(when: 'CONFIG_XLNX_ZYNQMP_ARM', if_true: files('xlnx-zynqmp.c', 'xlnx-zcu102.c'))
 arm_ss.add(when: 'CONFIG_XLNX_VERSAL', if_true: files('xlnx-versal.c', 'xlnx-versal-virt.c'))
 arm_ss.add(when: 'CONFIG_FSL_IMX25', if_true: files('fsl-imx25.c', 'imx25_pdk.c'))
diff --git a/hw/arm/stm32l475vg_soc.c b/hw/arm/stm32l475vg_soc.c
new file mode 100644
index 00..7dd5d70eb3
--- /dev/null
+++ b/hw/arm/stm32l475vg_soc.c
@@ -0,0 +1,241 @@
+/*
+ * STM32L475VG SoC
+ *
+ * Copyright (c) 2014 Alistair Francis 
+ * Copyright (c) 2023 Arnaud Minier 
+ * Copyright (c) 2023 Inès Varhol 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "exec/address-spaces.h"
+#include "sysemu/sysemu.h"
+#include "hw/arm/stm32l475vg_soc.h"
+#include "hw/qdev-clock.h"
+#include "hw/misc/unimp.h"
+
+/* stm32l475vg_soc implementation is derived from stm32f405_soc */
+
+static void stm32l475vg_soc_initfn(Object *obj)
+{
+    STM32L475VGState *s = STM32L475VG_SOC(obj);
+
+    object_initialize_child(obj, "armv7m", &s->armv7m, TYPE_ARMV7M);
+
+    s->sysclk = qdev_init_clock_in(DEVICE(s), "sysclk", NULL, NULL, 0);
+    s->refclk = qdev_init_clock_in(DEVICE(s), "refclk", NULL, NULL, 0);
+}
+
+static void stm32l475vg_soc_realize(DeviceState *dev_soc, Error **errp)
+{
+    STM32L475VGState *s = STM32L475VG_SOC(dev_soc);
+    MemoryRegion *system_memory = get_system_memory();
+    DeviceState *armv7m;
+    Error *err = NULL;
+
+    /*
+     * We use s->refclk internally and only define it with qdev_init_clock_in()
+     * so it is correctly parented and not leaked on an init/deinit; it is not
+     * intended as an externally exposed clock.
+     */
+    if (clock_has_source(s->refclk)) {
+        error_setg(errp, "refclk clock must not be wired up by the board code");
+        return;
+    }
+
+    if (!clock_has_source(s->sysclk)) {
+        error_setg(errp, "sysclk clock must be wired up by the board code");
+        return;
+    }
+
+    /*
+     * TODO: ideally we should model the SoC RCC and its ability to
+     * change the sysclk frequency and define different sysclk sources.
+     */
+
+    /* The refclk always runs at frequency HCLK / 8 */
+    clock_set_mul_div(s->refclk, 8, 1);
+    clock_set_source(s->refclk, s->sysclk);
+
+    memory_region_init_rom(&s->flash, OBJECT(dev_soc), "STM32L475VG.flash",
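
The "HCLK / 8" comment in the realize function above can be checked with plain arithmetic. This sketch assumes that the multiplier/divider pair scales the clock period (so a multiplier of 8 divides the frequency by 8) and that sysclk runs at 80 MHz, as the board patch of this series states. `derived_clock_hz` is a hypothetical helper for illustration only, not a QEMU API.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a QEMU API: frequency of a clock whose period is
 * the source period multiplied by mul and divided by div.
 */
uint64_t derived_clock_hz(uint64_t source_hz, uint32_t mul, uint32_t div)
{
    if (mul == 0) {
        return 0;   /* avoid dividing by zero; treat as a stopped clock */
    }
    return source_hz * div / mul;
}
```

Under those assumptions, `derived_clock_hz(80000000, 8, 1)` evaluates to 10000000, i.e. a 10 MHz refclk.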

Re: [PATCH trivial 13/21] hw/net/cadence_gem.c: spelling fixes: Octects

2023-11-15 Thread Luc Michel

On 19:58 Tue 14 Nov , Michael Tokarev wrote:
> Fixes: c755c943aa2e "hw/net/cadence_gem: use REG32 macro for register definitions"
> Cc: Luc Michel 
> Cc: Peter Maydell 
> Signed-off-by: Michael Tokarev 

Reviewed-by: Luc Michel 

> ---
>  hw/net/cadence_gem.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c
> index 5b989f5b52..19adbc0e19 100644
> --- a/hw/net/cadence_gem.c
> +++ b/hw/net/cadence_gem.c
> @@ -225,8 +225,8 @@ REG32(WOLAN, 0xb8) /* Wake on LAN reg */
>  REG32(IPGSTRETCH, 0xbc) /* IPG Stretch reg */
>  REG32(SVLAN, 0xc0) /* Stacked VLAN reg */
>  REG32(MODID, 0xfc) /* Module ID reg */
> -REG32(OCTTXLO, 0x100) /* Octects transmitted Low reg */
> -REG32(OCTTXHI, 0x104) /* Octects transmitted High reg */
> +REG32(OCTTXLO, 0x100) /* Octets transmitted Low reg */
> +REG32(OCTTXHI, 0x104) /* Octets transmitted High reg */
>  REG32(TXCNT, 0x108) /* Error-free Frames transmitted */
>  REG32(TXBCNT, 0x10c) /* Error-free Broadcast Frames */
>  REG32(TXMCNT, 0x110) /* Error-free Multicast Frame */
> @@ -245,8 +245,8 @@ REG32(EXCESSCOLLCNT, 0x140) /* Excessive Collision Frames */
>  REG32(LATECOLLCNT, 0x144) /* Late Collision Frames */
>  REG32(DEFERTXCNT, 0x148) /* Deferred Transmission Frames */
>  REG32(CSENSECNT, 0x14c) /* Carrier Sense Error Counter */
> -REG32(OCTRXLO, 0x150) /* Octects Received register Low */
> -REG32(OCTRXHI, 0x154) /* Octects Received register High */
> +REG32(OCTRXLO, 0x150) /* Octets Received register Low */
> +REG32(OCTRXHI, 0x154) /* Octets Received register High */
>  REG32(RXCNT, 0x158) /* Error-free Frames Received */
>  REG32(RXBROADCNT, 0x15c) /* Error-free Broadcast Frames RX */
>  REG32(RXMULTICNT, 0x160) /* Error-free Multicast Frames RX */
> --
> 2.39.2
> 

--

Re: [PATCH] monitor: flush messages on abort

2023-11-15 Thread Markus Armbruster

Daniel P. Berrangé  writes:

> On Fri, Nov 03, 2023 at 03:51:00PM -0400, Steven Sistare wrote:
>> On 11/3/2023 1:33 PM, Daniel P. Berrangé wrote:
>> > On Fri, Nov 03, 2023 at 09:01:29AM -0700, Steve Sistare wrote:
>> >> Buffered monitor output is lost when abort() is called.  The pattern
>> >> error_report() followed by abort() occurs about 60 times, so valuable
>> >> information is being lost when the abort is called in the context of a
>> >> monitor command.
>> > 
>> > I'm curious, was there a particular abort() scenario that you hit ?
>> 
>> Yes, while tweaking the suspended state, and forgetting to add transitions:
>> 
>> error_report("invalid runstate transition: '%s' -> '%s'",
>> abort();
>> 
>> But I have previously hit this for other errors.

Can you provide a reproducer?

>> > For some crude statistics:
>> > 
>> >   $ for i in abort return exit goto ; do echo -n "$i: " ; git grep --after 1 error_report | grep $i | wc -l ; done
>> >   abort: 47
>> >   return: 512
>> >   exit: 458
>> >   goto: 177
>> > 
>> > to me those numbers say that calling "abort()" after error_report
>> > should be considered a bug, and we can blanket replace all the
>> > abort() calls with exit(EXIT_FAILURE), and thus avoid the need to
>> > special case flushing the monitor.
>> 
>> And presumably add an atexit handler to flush the monitor ala monitor_abort.
>> AFAICT currently no destructor is called for the monitor at exit time.
>
> The HMP monitor flushes at each newline,  and exit() will take care of
> flushing stdout, so I don't think there's anything else needed.

Correct.

>> > Also I think there's a decent case to be made for error_report()
>> > to call monitor_flush().

No, because the messages printed by error_report() all end in newline,
and printing a newline to a monitor triggers monitor_flush_locked().

>> A good start, but that would not help for monitors with skip_flush=true, which
>> need to format the buffered string in a json response, which is the case I
>> tripped over.
>
> 'skip_flush' is only set to 'true' when using a QMP monitor and invoking
> "hmp-monitor-command".

Correct.

> In such a case, the error message needs to be built into a JSON error
> reply and sent over the socket. Your patch doesn't help this case
> since you've just printed to stderr.  I don't think it is reasonable
> to expect QMP monitors to send replies on SIG_ABRT anyway. So I don't
> think the skip_flush=true scenario is a problem to be concerned with.
>
>> >> To fix, install a SIGABRT handler to flush the monitor buffer to stderr.
>> >>
>> >> Signed-off-by: Steve Sistare 
>> >> ---
>> >>  monitor/monitor.c | 38 ++
>> >>  1 file changed, 38 insertions(+)
>> >>
>> >> diff --git a/monitor/monitor.c b/monitor/monitor.c
>> >> index dc352f9..65dace0 100644
>> >> --- a/monitor/monitor.c
>> >> +++ b/monitor/monitor.c
>> >> @@ -701,6 +701,43 @@ void monitor_cleanup(void)
>> >>  }
>> >>  }
>> >>  
>> >> +#ifdef CONFIG_LINUX
>> >> +
>> >> +static void monitor_abort(int signal, siginfo_t *info, void *c)
>> >> +{
>> >> +Monitor *mon = monitor_cur();
>> >> +
>> >> +if (!mon || qemu_mutex_trylock(&mon->mon_lock)) {
>> >> +return;
>> >> +}
>> >> +
>> >> +if (mon->outbuf && mon->outbuf->len) {
>> >> +fputs("SIGABRT received: ", stderr);
>> >> +fputs(mon->outbuf->str, stderr);
>> >> +if (mon->outbuf->str[mon->outbuf->len - 1] != '\n') {
>> >> +fputc('\n', stderr);
>> >> +}
>> >> +}
>> >> +
>> >> +qemu_mutex_unlock(&mon->mon_lock);
>> > 
>> > The SIGABRT handling does not only fire in response to abort()
>> > calls, but also in response to bad memory scenarios, so we have
>> > to be careful what we do in signal handlers.
>> > 
>> > In particular using mutexes in signal handlers is a big red
>> > flag generally. Mutex APIs are not declared async signal
>> > safe, so this code is technically a POSIX compliance
>> > violation.

"Technically a POSIX compliance violation" sounds like something only
pedants would care about.  It's actually a recipe for deadlocks and
crashes.

>> Righto.  I would need to mask all signals in the sigaction to be on the
>> safe(r) side.
>
> This is still doomed, because SIGABRT could fire while 'mon_lock' is
> already held, and so this code would deadlock trying to acquire the
> lock.

Yup.

There is no way to make async signal unsafe code safe.
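
For contrast, here is a minimal sketch of a handler that stays within async-signal-safe territory: it takes no locks, uses no stdio, and only sets a flag and calls `write(2)` on a fixed message. All names are hypothetical, and `SIGUSR1` is used instead of `SIGABRT` so that the demo can return. Note that this does not solve the original problem, since reading a monitor's buffer safely would still need the lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

static const char notice[] = "signal received\n";
static volatile sig_atomic_t got_signal;
static int notice_fd = STDERR_FILENO;

/* Handler restricted to async-signal-safe operations: no locks, no stdio. */
static void on_signal(int sig)
{
    ssize_t r;

    (void)sig;
    got_signal = 1;
    r = write(notice_fd, notice, sizeof(notice) - 1);
    (void)r;
}

/* Hypothetical demo: route the handler's output into a pipe and read it back. */
int demo_safe_handler(void)
{
    struct sigaction sa;
    char buf[sizeof(notice)] = { 0 };
    int fds[2];
    ssize_t n = -1;

    if (pipe(fds) < 0) {
        return -1;
    }
    notice_fd = fds[1];

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == 0) {
        raise(SIGUSR1);         /* handler runs before raise() returns */
        if (got_signal) {
            n = read(fds[0], buf, sizeof(buf) - 1);
        }
    }
    close(fds[0]);
    close(fds[1]);
    return n == (ssize_t)(sizeof(notice) - 1) && strcmp(buf, notice) == 0
           ? 0 : -1;
}
```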

>> > So I think we'd be safer just eliminating the explicit abort()
>> > calls and adding monitor_flush call to error_report.
>> 
>> I like adding a handler because it is future proof.  No need to play
>> whack-a-mole when developers re-introduce abort() calls in the future.  A
>> minor benefit is I would not need ack's from 50 maintainers to change 50 call
>> sites from abort to exit.
>
> That's a bit of a crazy exaggeration. The aborts() don't cover 50 different
> subsystems, and we don't require explicit acks from every subsystem ma

[PATCH 1/4] vfio/pci: Move VFIODevice initializations in vfio_instance_init

2023-11-15 Thread Zhenzhong Duan

Some of the VFIODevice initializations are in vfio_realize;
move all of them into vfio_instance_init.

No functional change intended.

Suggested-by: Cédric Le Goater 
Signed-off-by: Zhenzhong Duan 
---
 hw/vfio/pci.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index b23b492cce..5a2b7a2d6b 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2969,9 +2969,6 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 if (vfio_device_get_name(vbasedev, errp)) {
 return;
 }
-vbasedev->ops = &vfio_pci_ops;
-vbasedev->type = VFIO_DEVICE_TYPE_PCI;
-vbasedev->dev = DEVICE(vdev);
 
 /*
  * Mediated devices *might* operate compatibly with discarding of RAM, but
@@ -3320,6 +3317,7 @@ static void vfio_instance_init(Object *obj)
 {
 PCIDevice *pci_dev = PCI_DEVICE(obj);
 VFIOPCIDevice *vdev = VFIO_PCI(obj);
+VFIODevice *vbasedev = &vdev->vbasedev;
 
 device_add_bootindex_property(obj, &vdev->bootindex,
   "bootindex", NULL,
@@ -3328,7 +3326,11 @@ static void vfio_instance_init(Object *obj)
 vdev->host.bus = ~0U;
 vdev->host.slot = ~0U;
 vdev->host.function = ~0U;
-vdev->vbasedev.fd = -1;
+
+vbasedev->type = VFIO_DEVICE_TYPE_PCI;
+vbasedev->ops = &vfio_pci_ops;
+vbasedev->dev = DEVICE(vdev);
+vbasedev->fd = -1;
 
 vdev->nv_gpudirect_clique = 0xFF;
 
-- 
2.34.1

[PATCH 2/4] vfio/platform: Move VFIODevice initializations in vfio_platform_instance_init

2023-11-15 Thread Zhenzhong Duan

Some of the VFIODevice initializations are in vfio_platform_realize;
move all of them into vfio_platform_instance_init.

No functional change intended.

Suggested-by: Cédric Le Goater 
Signed-off-by: Zhenzhong Duan 
---
 hw/vfio/platform.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index a97d9c6234..506eb8193f 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -581,10 +581,6 @@ static void vfio_platform_realize(DeviceState *dev, Error **errp)
 VFIODevice *vbasedev = &vdev->vbasedev;
 int i, ret;
 
-vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
-vbasedev->dev = dev;
-vbasedev->ops = &vfio_platform_ops;
-
 qemu_mutex_init(&vdev->intp_mutex);
 
 trace_vfio_platform_realize(vbasedev->sysfsdev ?
@@ -659,8 +655,12 @@ static Property vfio_platform_dev_properties[] = {
 static void vfio_platform_instance_init(Object *obj)
 {
 VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(obj);
+VFIODevice *vbasedev = &vdev->vbasedev;
 
-vdev->vbasedev.fd = -1;
+vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
+vbasedev->ops = &vfio_platform_ops;
+vbasedev->dev = DEVICE(vdev);
+vbasedev->fd = -1;
 }
 
 #ifdef CONFIG_IOMMUFD
-- 
2.34.1

[PATCH 4/4] vfio/ccw: Move VFIODevice initializations in vfio_ccw_instance_init

2023-11-15 Thread Zhenzhong Duan

Some of the VFIODevice initializations are in vfio_ccw_realize;
move all of them into vfio_ccw_instance_init.

No functional change intended.

Suggested-by: Cédric Le Goater 
Signed-off-by: Zhenzhong Duan 
---
 hw/vfio/ccw.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index b116b10fe7..8de2fd809b 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -594,20 +594,6 @@ static void vfio_ccw_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-vbasedev->ops = &vfio_ccw_ops;
-vbasedev->type = VFIO_DEVICE_TYPE_CCW;
-vbasedev->dev = dev;
-
-/*
- * All vfio-ccw devices are believed to operate in a way compatible with
- * discarding of memory in RAM blocks, ie. pages pinned in the host are
- * in the current working set of the guest driver and therefore never
- * overlap e.g., with pages available to the guest balloon driver.  This
- * needs to be set before vfio_get_device() for vfio common to handle
- * ram_block_discard_disable().
- */
-vbasedev->ram_block_discard_allowed = true;
-
 ret = vfio_attach_device(cdev->mdevid, vbasedev,
  &address_space_memory, errp);
 if (ret) {
@@ -695,8 +681,22 @@ static const VMStateDescription vfio_ccw_vmstate = {
 static void vfio_ccw_instance_init(Object *obj)
 {
 VFIOCCWDevice *vcdev = VFIO_CCW(obj);
+VFIODevice *vbasedev = &vcdev->vdev;
+
+vbasedev->type = VFIO_DEVICE_TYPE_CCW;
+vbasedev->ops = &vfio_ccw_ops;
+vbasedev->dev = DEVICE(vcdev);
+vbasedev->fd = -1;
 
-vcdev->vdev.fd = -1;
+/*
+ * All vfio-ccw devices are believed to operate in a way compatible with
+ * discarding of memory in RAM blocks, ie. pages pinned in the host are
+ * in the current working set of the guest driver and therefore never
+ * overlap e.g., with pages available to the guest balloon driver.  This
+ * needs to be set before vfio_get_device() for vfio common to handle
+ * ram_block_discard_disable().
+ */
+vbasedev->ram_block_discard_allowed = true;
 }
 
 #ifdef CONFIG_IOMMUFD
-- 
2.34.1

[PATCH 0/4] VFIO device init cleanup

2023-11-15 Thread Zhenzhong Duan

Hi,

This is a cleanup based on Cedric's suggestion at
https://lists.gnu.org/archive/html/qemu-devel/2023-11/msg02722.html

VFIO device initializations are all moved from realize to instance_init.
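
A standalone sketch of the pattern, with no QEMU dependencies. `DemoDevice`, `DemoOps` and both functions are hypothetical stand-ins, not the real VFIO types. Fields that are constant for the device type are set in the init step, which cannot fail, and the realize step only does the work that depends on configuration and may fail.

```c
/* Hypothetical stand-ins for VFIODevice and its ops table; not QEMU types. */
typedef struct DemoOps {
    const char *name;
} DemoOps;

typedef struct DemoDevice {
    int type;
    const DemoOps *ops;
    int fd;
} DemoDevice;

enum { DEMO_DEVICE_TYPE_PCI = 1 };

const DemoOps demo_pci_ops = { "demo-pci" };

/* Runs once per object and cannot fail: set every constant field here. */
void demo_instance_init(DemoDevice *d)
{
    d->type = DEMO_DEVICE_TYPE_PCI;
    d->ops = &demo_pci_ops;
    d->fd = -1;
}

/* May fail: only does work that depends on user-supplied configuration. */
int demo_realize(DemoDevice *d, int fd)
{
    if (fd < 0) {
        return -1;      /* state from instance_init is left untouched */
    }
    d->fd = fd;
    return 0;
}
```

With this split, a failed realize leaves the object in the same well-defined state that the init step produced.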

Based on https://github.com/legoater/qemu/commits/vfio-8.2

Thanks
Zhenzhong

Zhenzhong Duan (4):
  vfio/pci: Move VFIODevice initializations in vfio_instance_init
  vfio/platform: Move VFIODevice initializations in
vfio_platform_instance_init
  vfio/ap: Move VFIODevice initializations in vfio_ap_instance_init
  vfio/ccw: Move VFIODevice initializations in vfio_ccw_instance_init

 hw/vfio/ap.c   | 26 +-
 hw/vfio/ccw.c  | 30 +++---
 hw/vfio/pci.c  | 10 ++
 hw/vfio/platform.c | 10 +-
 4 files changed, 39 insertions(+), 37 deletions(-)

-- 
2.34.1

[PATCH 3/4] vfio/ap: Move VFIODevice initializations in vfio_ap_instance_init

2023-11-15 Thread Zhenzhong Duan

Some of the VFIODevice initializations are in vfio_ap_realize;
move all of them into vfio_ap_instance_init.

No functional change intended.

Suggested-by: Cédric Le Goater 
Signed-off-by: Zhenzhong Duan 
---
 hw/vfio/ap.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index b21f92291e..31ea9644c5 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -164,18 +164,6 @@ static void vfio_ap_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-vbasedev->ops = &vfio_ap_ops;
-vbasedev->type = VFIO_DEVICE_TYPE_AP;
-vbasedev->dev = dev;
-
-/*
- * vfio-ap devices operate in a way compatible with discarding of
- * memory in RAM blocks, as no pages are pinned in the host.
- * This needs to be set before vfio_get_device() for vfio common to
- * handle ram_block_discard_disable().
- */
-vapdev->vdev.ram_block_discard_allowed = true;
-
 ret = vfio_attach_device(vbasedev->name, vbasedev,
  &address_space_memory, errp);
 if (ret) {
@@ -236,8 +224,20 @@ static const VMStateDescription vfio_ap_vmstate = {
 static void vfio_ap_instance_init(Object *obj)
 {
 VFIOAPDevice *vapdev = VFIO_AP_DEVICE(obj);
+VFIODevice *vbasedev = &vapdev->vdev;
 
-vapdev->vdev.fd = -1;
+vbasedev->type = VFIO_DEVICE_TYPE_AP;
+vbasedev->ops = &vfio_ap_ops;
+vbasedev->dev = DEVICE(vapdev);
+vbasedev->fd = -1;
+
+/*
+ * vfio-ap devices operate in a way compatible with discarding of
+ * memory in RAM blocks, as no pages are pinned in the host.
+ * This needs to be set before vfio_get_device() for vfio common to
+ * handle ram_block_discard_disable().
+ */
+vbasedev->ram_block_discard_allowed = true;
 }
 
 #ifdef CONFIG_IOMMUFD
-- 
2.34.1

Re: [PATCH trivial 06/21] docs/devel/migration.rst: spelling fix: doen't

2023-11-15 Thread Michael Tokarev


15.11.2023 09:46, Thomas Huth:
..

The "really-really-fixed" one (without resending):

-  This combination is not possible as the qemu-5.1 doen't understand
+  This combination is not possible as the qemu-5.1 doesn't understand
    pc-5.2 machine type.  So nothing to worry here.

;)


Actually there's more in there than this 1 fix: the change has
other fixes too, and one of them is in heading, so "underline"
in the next line should be fixed too.

These spelling fixes are.. tough :)  Lemme see if I did multiple
changes like this somewhere else, pretending there's just one fix..

Actual commit (with your R-b tag still, as apparently you reviewed
all changes; with updated Subject, new Fixes: tag, and a change in
heading underlining):

From: Michael Tokarev 
Date: Tue Nov 14 19:08:48 2023 +0300

docs/devel/migration.rst: spelling fixes: doen't, diferent, responsability, recomend

Fixes: 593c28c02c81 "migration/doc: How to migrate when hosts have different features"
Fixes: 1aefe2ca1423 "migration/doc: Add documentation for backwards compatiblity"
Reviewed-by: Thomas Huth 
Signed-off-by: Michael Tokarev 

diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
index 5adf4f12f7..ec55089b25 100644
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -1061,7 +1061,7 @@ QEMU version, in this case pc-5.1.

 4 - qemu-5.1 -M pc-5.2  -> migrates to -> qemu-5.1 -M pc-5.2

-  This combination is not possible as the qemu-5.1 doen't understand
+  This combination is not possible as the qemu-5.1 doesn't understand
   pc-5.2 machine type.  So nothing to worry here.

 Now it comes the interesting ones, when both QEMU processes are
@@ -1214,8 +1214,8 @@ machine types to have the right value::
  ...
  };

-A device with diferent features on both sides
----------------------------------------------
+A device with different features on both sides
+----------------------------------------------

 Let's assume that we are using the same QEMU binary on both sides,
 just to make the things easier.  But we have a device that has
@@ -1294,12 +1294,12 @@ Host B:

 $ qemu-system-x86_64 -cpu host,taa-no=off

-And you would be able to migrate between them.  It is responsability
+And you would be able to migrate between them.  It is responsibility
 of the management application or of the user to make sure that the
 configuration is correct.  QEMU doesn't know how to look at this kind
 of features in general.

-Notice that we don't recomend to use -cpu host for migration.  It is
+Notice that we don't recommend to use -cpu host for migration.  It is
 used in this example because it makes the example simpler.

 Other devices have worse control about individual features.  If they

Re: [PATCH for-8.2] linux-user: Fix loaddr computation for some elf files

2023-11-15 Thread Michael Tokarev


14.11.2023 23:17, Richard Henderson:

The file offset of the load segment is not relevant to the
low address, only the beginning of the virtual address page.

Cc: qemu-sta...@nongnu.org
Fixes: a93934fecd4 ("elf: take phdr offset into account when calculating the program load address")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1952
Signed-off-by: Richard Henderson 


Reviewed-by: Michael Tokarev 


---
  linux-user/elfload.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 4cd6891d7b..cf9e74468b 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3308,7 +3308,7 @@ static void load_elf_image(const char *image_name, const ImageSource *src,
  for (i = 0; i < ehdr->e_phnum; ++i) {
  struct elf_phdr *eppnt = phdr + i;
  if (eppnt->p_type == PT_LOAD) {
-abi_ulong a = eppnt->p_vaddr - eppnt->p_offset;
+abi_ulong a = eppnt->p_vaddr & TARGET_PAGE_MASK;
  if (a < loaddr) {
  loaddr = a;
  }
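
To make the one-line change above concrete, here is a small standalone sketch that compares the two ways of computing the low address for a single PT_LOAD segment. It assumes a 4 KiB page; the EXAMPLE_* macros and the low_addr_* helpers are made up for illustration and are not names from linux-user/elfload.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. QEMU's TARGET_PAGE_MASK depends on the target. */
#define EXAMPLE_PAGE_SIZE 0x1000u
#define EXAMPLE_PAGE_MASK (~(uint64_t)(EXAMPLE_PAGE_SIZE - 1))

/* Old computation: subtract the segment's file offset from its vaddr. */
static uint64_t low_addr_old(uint64_t p_vaddr, uint64_t p_offset)
{
    return p_vaddr - p_offset;
}

/* New computation: round the vaddr down to the start of its page. */
static uint64_t low_addr_new(uint64_t p_vaddr)
{
    return p_vaddr & EXAMPLE_PAGE_MASK;
}

/* Worked values for a segment at vaddr 0x401234 stored at file offset 0x5234. */
static void low_addr_selfcheck(void)
{
    assert(low_addr_old(0x401234, 0x5234) == 0x3fc000);
    assert(low_addr_new(0x401234) == 0x401000);
}
```

With these hypothetical values the old formula yields an address several pages below the segment, while the masked form stays on the page that actually contains p_vaddr, which matches the commit message's point that the file offset is not relevant.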

Re: [PATCH v2 3/3] hw/nvme: Add SPDM over DOE support

2023-11-15 Thread Klaus Jensen

On Oct 17 15:21, Alistair Francis wrote:
> From: Wilfred Mallawa 
> 
> Set up Data Object Exchange (DOE) as an extended capability for the NVMe
> controller and connect SPDM to it (CMA).
> 
> Signed-off-by: Wilfred Mallawa 
> Signed-off-by: Alistair Francis 

Acked-by: Klaus Jensen 

I have no problem with picking this up for nvme, but I'd rather not take
the full series through my tree without reviews/acks from the pci
maintainers.


Re: [RFC PATCH v3 65/78] hw/nvme: add fallthrough pseudo-keyword

2023-11-15 Thread Klaus Jensen

On Oct 13 11:46, Emmanouil Pitsidianakis wrote:
> In preparation of raising -Wimplicit-fallthrough to 5, replace all
> fall-through comments with the fallthrough attribute pseudo-keyword.
> 
> Signed-off-by: Emmanouil Pitsidianakis 
> ---
>  hw/nvme/ctrl.c | 24 
>  hw/nvme/dif.c  |  4 ++--
>  2 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index f026245d1e..acb2012fb9 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -1918,7 +1918,7 @@ static uint16_t nvme_zrm_finish(NvmeNamespace *ns, 
> NvmeZone *zone)
>  case NVME_ZONE_STATE_IMPLICITLY_OPEN:
>  case NVME_ZONE_STATE_EXPLICITLY_OPEN:
>  nvme_aor_dec_open(ns);
> -/* fallthrough */
> +fallthrough;
>  case NVME_ZONE_STATE_CLOSED:
>  nvme_aor_dec_active(ns);
>  
> @@ -1929,7 +1929,7 @@ static uint16_t nvme_zrm_finish(NvmeNamespace *ns, 
> NvmeZone *zone)
>  }
>  }
>  
> -/* fallthrough */
> +fallthrough;
>  case NVME_ZONE_STATE_EMPTY:
>  nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_FULL);
>  return NVME_SUCCESS;
> @@ -1946,7 +1946,7 @@ static uint16_t nvme_zrm_close(NvmeNamespace *ns, 
> NvmeZone *zone)
>  case NVME_ZONE_STATE_IMPLICITLY_OPEN:
>  nvme_aor_dec_open(ns);
>  nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED);
> -/* fall through */
> +fallthrough;
>  case NVME_ZONE_STATE_CLOSED:
>  return NVME_SUCCESS;
>  
> @@ -1961,7 +1961,7 @@ static uint16_t nvme_zrm_reset(NvmeNamespace *ns, 
> NvmeZone *zone)
>  case NVME_ZONE_STATE_EXPLICITLY_OPEN:
>  case NVME_ZONE_STATE_IMPLICITLY_OPEN:
>  nvme_aor_dec_open(ns);
> -/* fallthrough */
> +fallthrough;
>  case NVME_ZONE_STATE_CLOSED:
>  nvme_aor_dec_active(ns);
>  
> @@ -1971,12 +1971,12 @@ static uint16_t nvme_zrm_reset(NvmeNamespace *ns, 
> NvmeZone *zone)
>  }
>  }
>  
> -/* fallthrough */
> +fallthrough;
>  case NVME_ZONE_STATE_FULL:
>  zone->w_ptr = zone->d.zslba;
>  zone->d.wp = zone->w_ptr;
>  nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_EMPTY);
> -/* fallthrough */
> +fallthrough;
>  case NVME_ZONE_STATE_EMPTY:
>  return NVME_SUCCESS;
>  
> @@ -2017,7 +2017,7 @@ static uint16_t nvme_zrm_open_flags(NvmeCtrl *n, 
> NvmeNamespace *ns,
>  case NVME_ZONE_STATE_EMPTY:
>  act = 1;
>  
> -/* fallthrough */
> +fallthrough;
>  
>  case NVME_ZONE_STATE_CLOSED:
>  if (n->params.auto_transition_zones) {
> @@ -2040,7 +2040,7 @@ static uint16_t nvme_zrm_open_flags(NvmeCtrl *n, 
> NvmeNamespace *ns,
>  return NVME_SUCCESS;
>  }
>  
> -/* fallthrough */
> +fallthrough;
>  
>  case NVME_ZONE_STATE_IMPLICITLY_OPEN:
>  if (flags & NVME_ZRM_AUTO) {
> @@ -2049,7 +2049,7 @@ static uint16_t nvme_zrm_open_flags(NvmeCtrl *n, 
> NvmeNamespace *ns,
>  
>  nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_EXPLICITLY_OPEN);
>  
> -/* fallthrough */
> +fallthrough;
>  
>  case NVME_ZONE_STATE_EXPLICITLY_OPEN:
>  if (flags & NVME_ZRM_ZRWA) {
> @@ -3582,7 +3582,7 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest 
> *req, bool append,
>  return NVME_INVALID_PROT_INFO | NVME_DNR;
>  }
>  
> -/* fallthrough */
> +fallthrough;
>  
>  case NVME_ID_NS_DPS_TYPE_2:
>  if (piremap) {
> @@ -3737,7 +3737,7 @@ static uint16_t nvme_offline_zone(NvmeNamespace *ns, 
> NvmeZone *zone,
>  switch (state) {
>  case NVME_ZONE_STATE_READ_ONLY:
>  nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_OFFLINE);
> -/* fall through */
> +fallthrough;
>  case NVME_ZONE_STATE_OFFLINE:
>  return NVME_SUCCESS;
>  default:
> @@ -4914,7 +4914,7 @@ static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint8_t 
> csi, uint32_t buf_len,
>  switch (NVME_CC_CSS(ldl_le_p(&n->bar.cc))) {
>  case NVME_CC_CSS_NVM:
>  src_iocs = nvme_cse_iocs_nvm;
> -/* fall through */
> +fallthrough;
>  case NVME_CC_CSS_ADMIN_ONLY:
>  break;
>  case NVME_CC_CSS_CSI:
> diff --git a/hw/nvme/dif.c b/hw/nvme/dif.c
> index 01b19c3373..00dd96bdb3 100644
> --- a/hw/nvme/dif.c
> +++ b/hw/nvme/dif.c
> @@ -161,7 +161,7 @@ static uint16_t nvme_dif_prchk_crc16(NvmeNamespace *ns, 
> NvmeDifTuple *dif,
>  break;
>  }
>  
> -/* fallthrough */
> +fallthrough;
>  case NVME_ID_NS_DPS_TYPE_1:
>  case NVME_ID_NS_DPS_TYPE_2:
>  if (be16_to_cpu(dif->g16.apptag) != 0x) {
> @@ -229,7 +229,7 @@ static uint16_t nvme_dif_prchk_crc64(NvmeNamespace *ns, 
> NvmeDifTuple *dif,
>  break;
>  }
>  
> -/* fallthrough */
> +fallthrough;
>  case NVME

Re: [PATCH] migration: free 'saddr' since be no longer used

2023-11-15 Thread Daniel P . Berrangé

On Wed, Nov 15, 2023 at 11:27:39AM +0800, Zongmin Zhou wrote:
> Since socket_parse() will allocate memory for 'saddr',
> and its value will pass to 'addr' that allocated
> by migrate_uri_parse(),so free 'saddr' to avoid memory leak.
> 
> Fixes: 72a8192e225c ("migration: convert migration 'uri' into 
> 'MigrateAddress'")
> Signed-off-by: Zongmin Zhou
> ---
>  migration/migration.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 28a34c9068..30ed4bf6b6 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -493,6 +493,7 @@ bool migrate_uri_parse(const char *uri, MigrationChannel 
> **channel,
>  }
>  addr->u.socket.type = saddr->type;
>  addr->u.socket.u = saddr->u;

'saddr->u' is a union embedded in SocketAddress, containing:

union { /* union tag is @type */
InetSocketAddressWrapper inet;
UnixSocketAddressWrapper q_unix;
VsockSocketAddressWrapper vsock;
StringWrapper fd;
} u;

This assignment is *shallow* copying the contents of the union.

All the type-specific structs that are members of this union
contain allocated strings, and with this shallow copy we
are stealing the pointers to these allocated strings.


> +qapi_free_SocketAddress(saddr);

This meanwhile is doing a *deep* free of the contents of the
SocketAddress, which includes all the pointers we just stole.

IOW, unless I'm mistaken somehow, this is going to cause a
double-free.
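
To illustrate the ownership issue, here is a minimal sketch with hypothetical stand-in types. ExAddr and the ex_addr_* helpers are invented for this example and are much simpler than the generated QAPI SocketAddress code. It shows that after a shallow union copy both objects refer to the same string, so only one of them may free it; releasing just the source container is one possible way to avoid the leak without a deep free, and is not necessarily the fix that should go into migrate_uri_parse().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a tagged-union address type. */
typedef struct ExAddr {
    int type;
    union {
        struct { char *host; } inet;
        struct { char *path; } q_unix;
    } u;
} ExAddr;

/* Allocate an address whose union member owns a heap-allocated string. */
static ExAddr *ex_addr_new_inet(const char *host)
{
    size_t n = strlen(host) + 1;
    ExAddr *a = calloc(1, sizeof(*a));

    assert(a);
    a->u.inet.host = malloc(n);
    assert(a->u.inet.host);
    memcpy(a->u.inet.host, host, n);
    return a;
}

/*
 * Shallow copy: dst takes over src's string pointer. Only the src
 * container is released here; a deep free of src at this point would
 * leave dst->u.inet.host dangling and lead to a later double-free.
 */
static void ex_addr_move(ExAddr *dst, ExAddr *src)
{
    dst->type = src->type;
    dst->u = src->u;    /* copies the pointer, not the string */
    free(src);          /* container only */
}

/* Release the string now owned by the destination. */
static void ex_addr_clear(ExAddr *a)
{
    free(a->u.inet.host);
    a->u.inet.host = NULL;
}
```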


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v3 0/8] Add powernv10 I2C devices and tests

2023-11-15 Thread Cédric Le Goater


Nick,

On 11/14/23 20:56, Glenn Miles wrote:

This series of patches includes support, tests and fixes for
adding PCA9552 and PCA9554 I2C devices to the powernv10 chip.

The PCA9552 device is used for PCIe slot hotplug power control
and monitoring, while the PCA9554 device is used for presence
detection of IBM CableCard devices.  Both devices are required
by the Power Hypervisor Firmware on Power10 platforms.

Changes from previous version:
   - Added Fixes: tag to commits 3 and 4


Are you preparing a QEMU 8.2 PR for fixes?

Thanks,

C.




   - Fixed formatting errors in commits 2 and 8

Glenn Miles (8):
   ppc/pnv: Add pca9552 to powernv10 for PCIe hotplug power control
   ppc/pnv: Wire up pca9552 GPIO pins for PCIe hotplug power control
   ppc/pnv: PNV I2C engines assigned incorrect XSCOM addresses
   ppc/pnv: Fix PNV I2C invalid status after reset
   ppc/pnv: Use resettable interface to reset child I2C buses
   misc: Add a pca9554 GPIO device model
   ppc/pnv: Add a pca9554 I2C device to powernv10
   ppc/pnv: Test pnv i2c master and connected devices

  MAINTAINERS |   2 +
  hw/misc/Kconfig |   4 +
  hw/misc/meson.build |   1 +
  hw/misc/pca9554.c   | 328 
  hw/ppc/Kconfig  |   2 +
  hw/ppc/pnv.c|  35 +-
  hw/ppc/pnv_i2c.c|  47 ++-
  include/hw/misc/pca9554.h   |  36 ++
  include/hw/misc/pca9554_regs.h  |  19 +
  tests/qtest/meson.build |   1 +
  tests/qtest/pnv-host-i2c-test.c | 650 
  11 files changed, 1103 insertions(+), 22 deletions(-)
  create mode 100644 hw/misc/pca9554.c
  create mode 100644 include/hw/misc/pca9554.h
  create mode 100644 include/hw/misc/pca9554_regs.h
  create mode 100644 tests/qtest/pnv-host-i2c-test.c

Re: [PATCH] linux-headers: Synchronize linux headers from linux v6.7.0-rc1

2023-11-15 Thread gaosong


Hi,

Can this patch be merged in during the 8.2 cycle?

Thanks.
Song Gao

On 2023/11/14 at 9:54 AM, Tianrui Zhao wrote:

Use the scripts/update-linux-headers.sh to synchronize linux
headers from linux v6.7.0-rc1. We mainly want to add the
loongarch linux headers and then add the loongarch kvm support
based on it.

Signed-off-by: Tianrui Zhao 
---
  include/standard-headers/drm/drm_fourcc.h |   2 +
  include/standard-headers/linux/pci_regs.h |  24 ++-
  include/standard-headers/linux/vhost_types.h  |   7 +
  .../standard-headers/linux/virtio_config.h|   5 +
  linux-headers/asm-arm64/kvm.h |  32 
  linux-headers/asm-generic/unistd.h|  14 +-
  linux-headers/asm-loongarch/bitsperlong.h |   1 +
  linux-headers/asm-loongarch/kvm.h | 108 +++
  linux-headers/asm-loongarch/mman.h|   1 +
  linux-headers/asm-loongarch/unistd.h  |   5 +
  linux-headers/asm-mips/unistd_n32.h   |   4 +
  linux-headers/asm-mips/unistd_n64.h   |   4 +
  linux-headers/asm-mips/unistd_o32.h   |   4 +
  linux-headers/asm-powerpc/unistd_32.h |   4 +
  linux-headers/asm-powerpc/unistd_64.h |   4 +
  linux-headers/asm-riscv/kvm.h |  12 ++
  linux-headers/asm-s390/unistd_32.h|   4 +
  linux-headers/asm-s390/unistd_64.h|   4 +
  linux-headers/asm-x86/unistd_32.h |   4 +
  linux-headers/asm-x86/unistd_64.h |   3 +
  linux-headers/asm-x86/unistd_x32.h|   3 +
  linux-headers/linux/iommufd.h | 180 +-
  linux-headers/linux/kvm.h |  11 ++
  linux-headers/linux/psp-sev.h |   1 +
  linux-headers/linux/stddef.h  |   7 +
  linux-headers/linux/userfaultfd.h |   9 +-
  linux-headers/linux/vfio.h|  47 +++--
  linux-headers/linux/vhost.h   |   8 +
  28 files changed, 486 insertions(+), 26 deletions(-)
  create mode 100644 linux-headers/asm-loongarch/bitsperlong.h
  create mode 100644 linux-headers/asm-loongarch/kvm.h
  create mode 100644 linux-headers/asm-loongarch/mman.h
  create mode 100644 linux-headers/asm-loongarch/unistd.h

diff --git a/include/standard-headers/drm/drm_fourcc.h 
b/include/standard-headers/drm/drm_fourcc.h
index 72279f4d25..3afb70160f 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -322,6 +322,8 @@ extern "C" {
   * index 1 = Cr:Cb plane, [39:0] Cr1:Cb1:Cr0:Cb0 little endian
   */
  #define DRM_FORMAT_NV15   fourcc_code('N', 'V', '1', '5') /* 2x2 
subsampled Cr:Cb plane */
+#define DRM_FORMAT_NV20fourcc_code('N', 'V', '2', '0') /* 2x1 
subsampled Cr:Cb plane */
+#define DRM_FORMAT_NV30fourcc_code('N', 'V', '3', '0') /* 
non-subsampled Cr:Cb plane */
  
  /*

   * 2 plane YCbCr MSB aligned
diff --git a/include/standard-headers/linux/pci_regs.h 
b/include/standard-headers/linux/pci_regs.h
index e5f558d964..a39193213f 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -80,6 +80,7 @@
  #define  PCI_HEADER_TYPE_NORMAL   0
  #define  PCI_HEADER_TYPE_BRIDGE   1
  #define  PCI_HEADER_TYPE_CARDBUS  2
+#define  PCI_HEADER_TYPE_MFD   0x80/* Multi-Function Device 
(possible) */
  
  #define PCI_BIST		0x0f	/* 8 bits */

  #define  PCI_BIST_CODE_MASK   0x0f/* Return result */
@@ -637,6 +638,7 @@
  #define PCI_EXP_RTCAP 0x1e/* Root Capabilities */
  #define  PCI_EXP_RTCAP_CRSVIS 0x0001  /* CRS Software Visibility capability */
  #define PCI_EXP_RTSTA 0x20/* Root Status */
+#define  PCI_EXP_RTSTA_PME_RQ_ID 0x /* PME Requester ID */
  #define  PCI_EXP_RTSTA_PME0x0001 /* PME status */
  #define  PCI_EXP_RTSTA_PENDING0x0002 /* PME pending */
  /*
@@ -930,12 +932,13 @@
  
  /* Process Address Space ID */

  #define PCI_PASID_CAP 0x04/* PASID feature register */
-#define  PCI_PASID_CAP_EXEC0x02/* Exec permissions Supported */
-#define  PCI_PASID_CAP_PRIV0x04/* Privilege Mode Supported */
+#define  PCI_PASID_CAP_EXEC0x0002  /* Exec permissions Supported */
+#define  PCI_PASID_CAP_PRIV0x0004  /* Privilege Mode Supported */
+#define  PCI_PASID_CAP_WIDTH   0x1f00
  #define PCI_PASID_CTRL0x06/* PASID control register */
-#define  PCI_PASID_CTRL_ENABLE 0x01/* Enable bit */
-#define  PCI_PASID_CTRL_EXEC   0x02/* Exec permissions Enable */
-#define  PCI_PASID_CTRL_PRIV   0x04/* Privilege Mode Enable */
+#define  PCI_PASID_CTRL_ENABLE 0x0001  /* Enable bit */
+#define  PCI_PASID_CTRL_EXEC   0x0002  /* Exec permissions Enable */
+#define  PCI_PASID_CTRL_PRIV   0x0004  /* Privilege Mode Enable */
  #define PCI_EXT_CAP_PASID_SIZEOF  8
  
  /* Single Root I/O Virtualization */

@@ -975,6 +978,8 @@
  #define  PCI_LTR_VALUE_MASK   0

Re: [PATCH v3 02/70] RAMBlock: Add support of KVM private guest memfd

2023-11-15 Thread Daniel P . Berrangé

On Wed, Nov 15, 2023 at 02:14:11AM -0500, Xiaoyao Li wrote:
> Add KVM guest_memfd support to RAMBlock so both normal hva based memory
> and kvm guest memfd based private memory can be associated in one RAMBlock.
> 
> Introduce new flag RAM_GUEST_MEMFD. When it's set, it calls KVM ioctl to
> create private guest_memfd during RAMBlock setup.
> 
> Note, RAM_GUEST_MEMFD is supposed to be set for memory backends of
> confidential guests, such as TDX VM. How and when to set it for memory
> backends will be implemented in the following patches.
> 
> Introduce memory_region_has_guest_memfd() to query if the MemoryRegion has
> KVM guest_memfd allocated.
> 
> Signed-off-by: Xiaoyao Li 
> ---
> Changes in v3:
> - rename gmem to guest_memfd;
> - close(guest_memfd) when RAMBlock is released; (Daniel P. Berrangé)
> - Squash the patch that introduces memory_region_has_guest_memfd().
> ---
>  accel/kvm/kvm-all.c | 24 
>  include/exec/memory.h   | 13 +
>  include/exec/ramblock.h |  1 +
>  include/sysemu/kvm.h|  2 ++
>  system/memory.c |  5 +
>  system/physmem.c| 27 ---
>  6 files changed, 69 insertions(+), 3 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index c1b40e873531..9f751d4971f8 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -101,6 +101,7 @@ bool kvm_msi_use_devid;
>  bool kvm_has_guest_debug;
>  static int kvm_sstep_flags;
>  static bool kvm_immediate_exit;
> +static bool kvm_guest_memfd_supported;
>  static hwaddr kvm_max_slot_size = ~0;
>  
>  static const KVMCapabilityInfo kvm_required_capabilites[] = {
> @@ -2397,6 +2398,8 @@ static int kvm_init(MachineState *ms)
>  }
>  s->as = g_new0(struct KVMAs, s->nr_as);
>  
> +kvm_guest_memfd_supported = kvm_check_extension(s, KVM_CAP_GUEST_MEMFD);
> +
>  if (object_property_find(OBJECT(current_machine), "kvm-type")) {
>  g_autofree char *kvm_type = 
> object_property_get_str(OBJECT(current_machine),
>  "kvm-type",
> @@ -4078,3 +4081,24 @@ void query_stats_schemas_cb(StatsSchemaList **result, 
> Error **errp)
>  query_stats_schema_vcpu(first_cpu, &stats_args);
>  }
>  }
> +
> +int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
> +{
> +int fd;
> +struct kvm_create_guest_memfd guest_memfd = {
> +.size = size,
> +.flags = flags,
> +};
> +
> +if (!kvm_guest_memfd_supported) {
> +error_setg(errp, "KVM doesn't support guest memfd\n");
> +return -EOPNOTSUPP;

Returning an errno value is unusual when we have an 'Error **errp' parameter
for reporting, and the following codepath merely returns -1, so this is
inconsistent. Just return -1 here too.

> +}
> +
> +fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_GUEST_MEMFD, &guest_memfd);
> +if (fd < 0) {
> +error_setg_errno(errp, errno, "%s: error creating kvm guest 
> memfd\n", __func__);

I'd prefer an explicit 'return -1' here, even though 'fd' is technically going
to be -1 already.

Also including __func__ in the error message is not really needed IMHO

> +}
> +
> +return fd;
> +}
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 831f7c996d9d..f780367ab1bd 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -243,6 +243,9 @@ typedef struct IOMMUTLBEvent {
>  /* RAM FD is opened read-only */
>  #define RAM_READONLY_FD (1 << 11)
>  
> +/* RAM can be private that has kvm gmem backend */
> +#define RAM_GUEST_MEMFD   (1 << 12)
> +
>  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
> IOMMUNotifierFlag flags,
> hwaddr start, hwaddr end,
> @@ -1702,6 +1705,16 @@ static inline bool memory_region_is_romd(MemoryRegion 
> *mr)
>   */
>  bool memory_region_is_protected(MemoryRegion *mr);
>  
> +/**
> + * memory_region_has_guest_memfd: check whether a memory region has 
> guest_memfd
> + * associated
> + *
> + * Returns %true if a memory region's ram_block has valid guest_memfd 
> assigned.
> + *
> + * @mr: the memory region being queried
> + */
> +bool memory_region_has_guest_memfd(MemoryRegion *mr);
> +
>  /**
>   * memory_region_get_iommu: check whether a memory region is an iommu
>   *
> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
> index 69c6a5390293..0a17ba882729 100644
> --- a/include/exec/ramblock.h
> +++ b/include/exec/ramblock.h
> @@ -41,6 +41,7 @@ struct RAMBlock {
>  QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
>  int fd;
>  uint64_t fd_offset;
> +int guest_memfd;
>  size_t page_size;
>  /* dirty bitmap used during migration */
>  unsigned long *bmap;
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index d61487816421..fedc28c7d17f 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -538,4

[PATCH RESEND v2 1/2] i386: Fix conditional CONFIG_SYNDBG enablement

2023-11-15 Thread Vitaly Kuznetsov

Putting HYPERV_FEAT_SYNDBG entry under "#ifdef CONFIG_SYNDBG" in
'kvm_hyperv_properties' array is wrong: as HYPERV_FEAT_SYNDBG is not
the highest feature number, the result is an empty (zeroed) entry in
the array (and not a skipped entry!). hyperv_feature_supported() is
designed to check that all CPUID bits are set but for a zeroed
feature in 'kvm_hyperv_properties' it returns 'true' so QEMU considers
HYPERV_FEAT_SYNDBG as always supported, regardless of whether KVM host
actually supports it.

To fix the issue, leave HYPERV_FEAT_SYNDBG's definition in
'kvm_hyperv_properties' array, there's nothing wrong in having it defined
even when 'CONFIG_SYNDBG' is not set. Instead, put "hv-syndbg" CPU property
under '#ifdef CONFIG_SYNDBG' to alter the existing behavior when the flag
is silently skipped in !CONFIG_SYNDBG builds.

Leave an 'assert' sentinel in hyperv_feature_supported() making sure there
are no 'holes' or improperly defined features in 'kvm_hyperv_properties'.

Fixes: d8701185f40c ("hw: hyperv: Initial commit for Synthetic Debugging 
device")
Signed-off-by: Vitaly Kuznetsov 
---
 target/i386/cpu.c |  2 ++
 target/i386/kvm/kvm.c | 11 +++
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 358d9c0a655a..f5fac3744173 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7842,8 +7842,10 @@ static Property x86_cpu_properties[] = {
   HYPERV_FEAT_TLBFLUSH_DIRECT, 0),
 DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU,
 hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF),
+#ifdef CONFIG_SYNDBG
 DEFINE_PROP_BIT64("hv-syndbg", X86CPU, hyperv_features,
   HYPERV_FEAT_SYNDBG, 0),
+#endif
 DEFINE_PROP_BOOL("hv-passthrough", X86CPU, hyperv_passthrough, false),
 DEFINE_PROP_BOOL("hv-enforce-cpuid", X86CPU, hyperv_enforce_cpuid, false),
 
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 11b8177eff21..2fcb1f6673d8 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -945,7 +945,6 @@ static struct {
  .bits = HV_DEPRECATING_AEOI_RECOMMENDED}
 }
 },
-#ifdef CONFIG_SYNDBG
 [HYPERV_FEAT_SYNDBG] = {
 .desc = "Enable synthetic kernel debugger channel (hv-syndbg)",
 .flags = {
@@ -954,7 +953,6 @@ static struct {
 },
 .dependencies = BIT(HYPERV_FEAT_SYNIC) | BIT(HYPERV_FEAT_RELAXED)
 },
-#endif
 [HYPERV_FEAT_MSR_BITMAP] = {
 .desc = "enlightened MSR-Bitmap (hv-emsr-bitmap)",
 .flags = {
@@ -1206,6 +1204,13 @@ static bool hyperv_feature_supported(CPUState *cs, int 
feature)
 uint32_t func, bits;
 int i, reg;
 
+/*
+ * kvm_hyperv_properties needs to define at least one CPUID flag which
+ * must be used to detect the feature, it's hard to say whether it is
+ * supported or not otherwise.
+ */
+assert(kvm_hyperv_properties[feature].flags[0].func);
+
 for (i = 0; i < ARRAY_SIZE(kvm_hyperv_properties[feature].flags); i++) {
 
 func = kvm_hyperv_properties[feature].flags[i].func;
@@ -3391,13 +3396,11 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 kvm_msr_entry_add(cpu, HV_X64_MSR_TSC_EMULATION_STATUS,
   env->msr_hv_tsc_emulation_status);
 }
-#ifdef CONFIG_SYNDBG
 if (hyperv_feat_enabled(cpu, HYPERV_FEAT_SYNDBG) &&
 has_msr_hv_syndbg_options) {
 kvm_msr_entry_add(cpu, HV_X64_MSR_SYNDBG_OPTIONS,
   hyperv_syndbg_query_options());
 }
-#endif
 }
 if (hyperv_feat_enabled(cpu, HYPERV_FEAT_VAPIC)) {
 kvm_msr_entry_add(cpu, HV_X64_MSR_APIC_ASSIST_PAGE,
-- 
2.41.0
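
The "empty (zeroed) entry" behaviour described in the commit message above can be reproduced with a small standalone model. Everything named Ex* or ex_* below is hypothetical and much simpler than the real 'kvm_hyperv_properties' table; the sketch only assumes, as the commit message states, that the support check skips flags whose 'func' is zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { EX_FEAT_A, EX_FEAT_B, EX_FEAT_C, EX_FEAT_COUNT };

typedef struct {
    const char *desc;
    struct { uint32_t func; uint32_t bits; } flags[2];
} ExFeature;

/*
 * EX_FEAT_B is deliberately left out, as if it sat under a disabled
 * #ifdef. Because EX_FEAT_C has a higher index, the array still has a
 * slot for EX_FEAT_B, and that slot is zero-initialized.
 */
static const ExFeature ex_features[] = {
    [EX_FEAT_A] = { .desc = "a",
                    .flags = { { .func = 0x40000003, .bits = 1u << 0 } } },
    [EX_FEAT_C] = { .desc = "c",
                    .flags = { { .func = 0x40000003, .bits = 1u << 2 } } },
};

/* True when every bit required by every defined flag is set in host_bits. */
static bool ex_feature_supported(int feature, uint32_t host_bits)
{
    size_t i;

    assert(feature >= 0 && feature < EX_FEAT_COUNT);
    for (i = 0; i < 2; i++) {
        uint32_t func = ex_features[feature].flags[i].func;
        uint32_t bits = ex_features[feature].flags[i].bits;

        if (!func) {
            continue;   /* unused flag slot: nothing to check */
        }
        if ((host_bits & bits) != bits) {
            return false;
        }
    }
    return true;        /* also reached when no flag was defined at all */
}

/* The condition the new sentinel checks: at least one flag is defined. */
static bool ex_feature_is_defined(int feature)
{
    return ex_features[feature].flags[0].func != 0;
}
```

In this model ex_feature_supported(EX_FEAT_B, 0) reports the feature as supported even though the host advertises nothing, which is the false positive the patch guards against.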

[PATCH RESEND v2 0/2] i386: Fix Hyper-V Gen1 guests stuck on boot with 'hv-passthrough'

2023-11-15 Thread Vitaly Kuznetsov

Changes since v1/v1 RESEND:
- No changes.

Hyper-V Gen1 guests are getting stuck on boot when 'hv-passthrough' is
used. While 'hv-passthrough' is a debug-only feature, this significantly
limits its usefulness. While debugging the problem, I found that there are
two loosely connected issues:
- 'hv-passthrough' enables 'hv-syndbg' and this is undesired.
- 'hv-syndbg's support by KVM is detected incorrectly when !CONFIG_SYNDBG.

Fix both issues; exclude 'hv-syndbg' from 'hv-passthrough' and don't allow
turning on 'hv-syndbg' for !CONFIG_SYNDBG builds.

Vitaly Kuznetsov (2):
  i386: Fix conditional CONFIG_SYNDBG enablement
  i386: Exclude 'hv-syndbg' from 'hv-passthrough'

 docs/system/i386/hyperv.rst | 13 +
 target/i386/cpu.c   |  2 ++
 target/i386/kvm/kvm.c   | 18 --
 3 files changed, 23 insertions(+), 10 deletions(-)

-- 
2.41.0

[PATCH RESEND v2 2/2] i386: Exclude 'hv-syndbg' from 'hv-passthrough'

2023-11-15 Thread Vitaly Kuznetsov

Windows with the Hyper-V role enabled doesn't boot with 'hv-passthrough' when
no debugger is configured; this significantly limits the usefulness of the
feature as there's no support for subtracting Hyper-V features from CPU
flags at this moment (e.g. "-cpu host,hv-passthrough,-hv-syndbg" does not
work). While this is also theoretically fixable, 'hv-syndbg' is likely
very special and unneeded in the default set. Genuine Hyper-V doesn't seem
to enable it either.

Introduce 'skip_passthrough' flag to 'kvm_hyperv_properties' and use it as
one-off to skip 'hv-syndbg' when enabling features in 'hv-passthrough'
mode. Note, "-cpu host,hv-passthrough,hv-syndbg" can still be used if
needed.

As both 'hv-passthrough' and 'hv-syndbg' are debug features, the change
should not have any effect on production environments.

Signed-off-by: Vitaly Kuznetsov 
---
 docs/system/i386/hyperv.rst | 13 +
 target/i386/kvm/kvm.c   |  7 +--
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/docs/system/i386/hyperv.rst b/docs/system/i386/hyperv.rst
index 2505dc4c86e0..009947e39141 100644
--- a/docs/system/i386/hyperv.rst
+++ b/docs/system/i386/hyperv.rst
@@ -262,14 +262,19 @@ Supplementary features
 ``hv-passthrough``
   In some cases (e.g. during development) it may make sense to use QEMU in
   'pass-through' mode and give Windows guests all enlightenments currently
-  supported by KVM. This pass-through mode is enabled by "hv-passthrough" CPU
-  flag.
+  supported by KVM.
 
   Note: ``hv-passthrough`` flag only enables enlightenments which are known to 
QEMU
   (have corresponding 'hv-' flag) and copies ``hv-spinlocks`` and 
``hv-vendor-id``
   values from KVM to QEMU. ``hv-passthrough`` overrides all other 'hv-' 
settings on
-  the command line. Also, enabling this flag effectively prevents migration as 
the
-  list of enabled enlightenments may differ between target and destination 
hosts.
+  the command line.
+
+  Note: ``hv-passthrough`` does not enable ``hv-syndbg`` which can prevent 
certain
+  Windows guests from booting when used without proper configuration. If 
needed,
+  ``hv-syndbg`` can be enabled additionally.
+
+  Note: ``hv-passthrough`` effectively prevents migration as the list of 
enabled
+  enlightenments may differ between target and destination hosts.
 
 ``hv-enforce-cpuid``
   By default, KVM allows the guest to use all currently supported Hyper-V
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 2fcb1f6673d8..0c745562b667 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -823,6 +823,7 @@ static struct {
 uint32_t bits;
 } flags[2];
 uint64_t dependencies;
+bool skip_passthrough;
 } kvm_hyperv_properties[] = {
 [HYPERV_FEAT_RELAXED] = {
 .desc = "relaxed timing (hv-relaxed)",
@@ -951,7 +952,8 @@ static struct {
 {.func = HV_CPUID_FEATURES, .reg = R_EDX,
  .bits = HV_FEATURE_DEBUG_MSRS_AVAILABLE}
 },
-.dependencies = BIT(HYPERV_FEAT_SYNIC) | BIT(HYPERV_FEAT_RELAXED)
+.dependencies = BIT(HYPERV_FEAT_SYNIC) | BIT(HYPERV_FEAT_RELAXED),
+.skip_passthrough = true,
 },
 [HYPERV_FEAT_MSR_BITMAP] = {
 .desc = "enlightened MSR-Bitmap (hv-emsr-bitmap)",
@@ -1360,7 +1362,8 @@ bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp)
  * hv_build_cpuid_leaf() uses this info to build guest CPUIDs.
  */
 for (feat = 0; feat < ARRAY_SIZE(kvm_hyperv_properties); feat++) {
-if (hyperv_feature_supported(cs, feat)) {
+if (hyperv_feature_supported(cs, feat) &&
+!kvm_hyperv_properties[feat].skip_passthrough) {
 cpu->hyperv_features |= BIT(feat);
 }
 }
-- 
2.41.0

Re: [PATCH] softmmu/memory: use memcpy for multi-byte accesses

2023-11-15 Thread Peter Maydell

On Tue, 14 Nov 2023 at 21:18, Richard Henderson
 wrote:
>
> On 11/14/23 12:55, Patrick Venture wrote:
> > Avoids unaligned pointer issues.
> >
> > Reviewed-by: Chris Rauer 
> > Reviewed-by: Peter Foley 
> > Signed-off-by: Patrick Venture 
> > ---
> >   system/memory.c | 16 
> >   1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/system/memory.c b/system/memory.c
> > index 304fa843ea..02c97d5187 100644
> > --- a/system/memory.c
> > +++ b/system/memory.c
> > @@ -1343,16 +1343,16 @@ static uint64_t memory_region_ram_device_read(void 
> > *opaque,
> >
> >   switch (size) {
> >   case 1:
> > -data = *(uint8_t *)(mr->ram_block->host + addr);
> > +memcpy(&data, mr->ram_block->host + addr, sizeof(uint8_t));
>
>
> This is incorrect, especially for big-endian hosts.
>
> You want to use "qemu/bswap.h", ld*_he_p(), st*_he_p().

More specifically, we have a ldn_he_p() and stn_he_p() that
take the size in bytes of the data to read, so we should be
able to replace the switch-on-size in these functions with
a single call to the appropriate one of those.

thanks
-- PMM

Re: [PATCH] softmmu/memory: use memcpy for multi-byte accesses

2023-11-15 Thread Peter Maydell

On Tue, 14 Nov 2023 at 20:55, Patrick Venture  wrote:
> Avoids unaligned pointer issues.
>

It would be nice to be more specific in the commit message here, by
describing what kind of guest behaviour or machine config runs into this
problem, and whether this happens in a situation users are likely to
run into. If the latter, we should consider tagging the commit
with "Cc: qemu-sta...@nongnu.org" to have it backported to the
stable release branches.

thanks
-- PMM

Re: [PATCH v3 06/70] kvm: Introduce support for memory_attributes

2023-11-15 Thread Daniel P . Berrangé

On Wed, Nov 15, 2023 at 02:14:15AM -0500, Xiaoyao Li wrote:
> Introduce the helper functions to set the attributes of a range of
> memory to private or shared.
> 
> This is necessary to notify KVM the private/shared attribute of each gpa
> range. KVM needs the information to decide the GPA needs to be mapped at
> hva-based shared memory or guest_memfd based private memory.
> 
> Signed-off-by: Xiaoyao Li 
> ---
>  accel/kvm/kvm-all.c  | 42 ++
>  include/sysemu/kvm.h |  3 +++
>  2 files changed, 45 insertions(+)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 69afeb47c9c0..76e2404d54d2 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -102,6 +102,7 @@ bool kvm_has_guest_debug;
>  static int kvm_sstep_flags;
>  static bool kvm_immediate_exit;
>  static bool kvm_guest_memfd_supported;
> +static uint64_t kvm_supported_memory_attributes;
>  static hwaddr kvm_max_slot_size = ~0;
>  
>  static const KVMCapabilityInfo kvm_required_capabilites[] = {
> @@ -1305,6 +1306,44 @@ void kvm_set_max_memslot_size(hwaddr max_slot_size)
>  kvm_max_slot_size = max_slot_size;
>  }
>  
> +static int kvm_set_memory_attributes(hwaddr start, hwaddr size, uint64_t attr)
> +{
> +struct kvm_memory_attributes attrs;
> +int r;
> +
> +attrs.attributes = attr;
> +attrs.address = start;
> +attrs.size = size;
> +attrs.flags = 0;
> +
> +r = kvm_vm_ioctl(kvm_state, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
> +if (r) {
> +warn_report("%s: failed to set memory (0x%lx+%#zx) with attr 0x%lx error '%s'",
> + __func__, start, size, attr, strerror(errno));

This is an error condition rather than a warning condition.

Also again I think __func__ is generally not required in an error message,
if the error message text is suitably descriptive - applies to other
patches in this series too.

> +}
> +return r;
> +}
> +
> +int kvm_set_memory_attributes_private(hwaddr start, hwaddr size)
> +{
> +if (!(kvm_supported_memory_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
> +error_report("KVM doesn't support PRIVATE memory attribute\n");
> +return -EINVAL;
> +}
> +
> +return kvm_set_memory_attributes(start, size, KVM_MEMORY_ATTRIBUTE_PRIVATE);
> +}
> +
> +int kvm_set_memory_attributes_shared(hwaddr start, hwaddr size)
> +{
> +if (!(kvm_supported_memory_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
> +error_report("KVM doesn't support PRIVATE memory attribute\n");
> +return -EINVAL;
> +}
> +
> +return kvm_set_memory_attributes(start, size, 0);
> +}
> +
>  /* Called with KVMMemoryListener.slots_lock held */
>  static void kvm_set_phys_mem(KVMMemoryListener *kml,
>   MemoryRegionSection *section, bool add)
> @@ -2440,6 +2479,9 @@ static int kvm_init(MachineState *ms)
>  
>  kvm_guest_memfd_supported = kvm_check_extension(s, KVM_CAP_GUEST_MEMFD);
>  
> +ret = kvm_check_extension(s, KVM_CAP_MEMORY_ATTRIBUTES);
> +kvm_supported_memory_attributes = ret > 0 ? ret : 0;
> +
>  if (object_property_find(OBJECT(current_machine), "kvm-type")) {
>  g_autofree char *kvm_type = object_property_get_str(OBJECT(current_machine),
>  "kvm-type",
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index fedc28c7d17f..0e88958190a4 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -540,4 +540,7 @@ bool kvm_dirty_ring_enabled(void);
>  uint32_t kvm_dirty_ring_size(void);
>  
>  int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp);
> +
> +int kvm_set_memory_attributes_private(hwaddr start, hwaddr size);
> +int kvm_set_memory_attributes_shared(hwaddr start, hwaddr size);
>  #endif
> -- 
> 2.34.1
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v3 10/70] kvm: handle KVM_EXIT_MEMORY_FAULT

2023-11-15 Thread Daniel P . Berrangé

On Wed, Nov 15, 2023 at 02:14:19AM -0500, Xiaoyao Li wrote:
> From: Chao Peng 
> 
> Currently only KVM_MEMORY_EXIT_FLAG_PRIVATE in flags is valid when
> KVM_EXIT_MEMORY_FAULT happens. It indicates userspace needs to do
> the memory conversion on the RAMBlock to turn the memory into desired
> attribute, i.e., private/shared.
> 
> Note, KVM_EXIT_MEMORY_FAULT makes sense only when the RAMBlock has
> guest_memfd memory backend.
> 
> Note, KVM_EXIT_MEMORY_FAULT returns with -EFAULT, so special handling is
> added.
> 
> Signed-off-by: Chao Peng 
> Co-developed-by: Xiaoyao Li 
> Signed-off-by: Xiaoyao Li 
> ---
>  accel/kvm/kvm-all.c | 76 +++--
>  1 file changed, 66 insertions(+), 10 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 76e2404d54d2..58abbcb6926e 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -2902,6 +2902,50 @@ static void kvm_eat_signals(CPUState *cpu)
>  } while (sigismember(&chkset, SIG_IPI));
>  }
>  
> +static int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
> +{
> +MemoryRegionSection section;
> +ram_addr_t offset;
> +RAMBlock *rb;
> +void *addr;
> +int ret = -1;
> +
> +section = memory_region_find(get_system_memory(), start, size);
> +if (!section.mr) {
> +return ret;
> +}
> +
> +if (memory_region_has_guest_memfd(section.mr)) {
> +if (to_private) {
> +ret = kvm_set_memory_attributes_private(start, size);
> +} else {
> +ret = kvm_set_memory_attributes_shared(start, size);
> +}
> +
> +if (ret) {
> +memory_region_unref(section.mr);
> +return ret;
> +}
> +
> +addr = memory_region_get_ram_ptr(section.mr) +
> +   section.offset_within_region;
> +rb = qemu_ram_block_from_host(addr, false, &offset);
> +/*
> + * With KVM_SET_MEMORY_ATTRIBUTES by kvm_set_memory_attributes(),
> + * operation on underlying file descriptor is only for releasing
> + * unnecessary pages.
> + */
> +ram_block_convert_range(rb, offset, size, to_private);
> +} else {
> +warn_report("Convert non guest_memfd backed memory region "
> +"(0x%"HWADDR_PRIx" ,+ 0x%"HWADDR_PRIx") to %s",
> +start, size, to_private ? "private" : "shared");

Again, if you're returning '-1' to indicate error, then
using warn_report is wrong, it should be error_report.

warn_report is for when you return success, indicating
the problem was non-fatal.
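
As a rough illustration of that rule, the sketch below pairs the reporting level with the return value. demo_warn_report() and demo_error_report() are hypothetical stubs that just count calls and print to stderr; they stand in for QEMU's warn_report() and error_report() so the example compiles on its own, and the two callers are invented for the example.

```c
#include <assert.h>
#include <stdio.h>

static int demo_warnings; /* number of warnings emitted so far */
static int demo_errors;   /* number of errors emitted so far */

/* Hypothetical stub standing in for warn_report(). */
static void demo_warn_report(const char *msg)
{
    demo_warnings++;
    fprintf(stderr, "warning: %s\n", msg);
}

/* Hypothetical stub standing in for error_report(). */
static void demo_error_report(const char *msg)
{
    demo_errors++;
    fprintf(stderr, "%s\n", msg);
}

/* Fatal problem: the caller sees a failure, so report an error. */
static int demo_convert(int backed_by_guest_memfd)
{
    if (!backed_by_guest_memfd) {
        demo_error_report("cannot convert non guest_memfd backed region");
        return -1;
    }
    return 0;
}

/* Non-fatal problem: warn, carry on, and return success. */
static int demo_apply_hint(int hint_supported)
{
    if (!hint_supported) {
        demo_warn_report("hint not supported, continuing without it");
    }
    return 0;
}
```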

> +}
> +
> +memory_region_unref(section.mr);
> +return ret;
> +}
> +
>  int kvm_cpu_exec(CPUState *cpu)
>  {
>  struct kvm_run *run = cpu->kvm_run;
> @@ -2969,18 +3013,20 @@ int kvm_cpu_exec(CPUState *cpu)
>  ret = EXCP_INTERRUPT;
>  break;
>  }
> -fprintf(stderr, "error: kvm run failed %s\n",
> -strerror(-run_ret));
> +if (!(run_ret == -EFAULT && run->exit_reason == KVM_EXIT_MEMORY_FAULT)) {
> +fprintf(stderr, "error: kvm run failed %s\n",
> +strerror(-run_ret));
>  #ifdef TARGET_PPC
> -if (run_ret == -EBUSY) {
> -fprintf(stderr,
> -"This is probably because your SMT is enabled.\n"
> -"VCPU can only run on primary threads with all "
> -"secondary threads offline.\n");
> -}
> +if (run_ret == -EBUSY) {
> +fprintf(stderr,
> +"This is probably because your SMT is enabled.\n"
> +"VCPU can only run on primary threads with all "
> +"secondary threads offline.\n");
> +}
>  #endif
> -ret = -1;
> -break;
> +ret = -1;
> +break;
> +}
>  }
>  
>  trace_kvm_run_exit(cpu->cpu_index, run->exit_reason);
> @@ -3067,6 +3113,16 @@ int kvm_cpu_exec(CPUState *cpu)
>  break;
>  }
>  break;
> +case KVM_EXIT_MEMORY_FAULT:
> +if (run->memory_fault.flags & ~KVM_MEMORY_EXIT_FLAG_PRIVATE) {
> +error_report("KVM_EXIT_MEMORY_FAULT: Unknown flag 0x%" PRIx64,
> + (uint64_t)run->memory_fault.flags);
> +ret = -1;
> +break;
> +}
> +ret = kvm_convert_memory(run->memory_fault.gpa, run->memory_fault.size,
> + run->memory_fault.flags & KVM_MEMORY_EXIT_FLAG_PRIVATE);
> +break;
>  default:
>  DPRINTF("kvm_arch_handle_exit\n");
>  ret = kvm_arch_handle_exit(cpu, run);
> -- 
> 2.34.1
> 

With regards,
Daniel

Re: [PATCH v3 14/70] target/i386: Implement mc->kvm_type() to get VM type

2023-11-15 Thread Daniel P . Berrangé

On Wed, Nov 15, 2023 at 02:14:23AM -0500, Xiaoyao Li wrote:
> Implement mc->kvm_type() for i386 machines. It provides a way for user
> to create SW_PROTECTE_VM.

Small typo there missing final 'D' in 'PROTECTED'

> 
> Also store the vm_type in machinestate to other code to query what the
> VM type is.
> 
> Signed-off-by: Xiaoyao Li 
> ---
>  hw/i386/x86.c  | 12 
>  include/hw/i386/x86.h  |  1 +
>  target/i386/kvm/kvm.c  | 25 +
>  target/i386/kvm/kvm_i386.h |  1 +
>  4 files changed, 39 insertions(+)
> 
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index b3d054889bba..55678279bf3b 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -1377,6 +1377,17 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
>  qapi_free_SgxEPCList(list);
>  }
>  
> +static int x86_kvm_type(MachineState *ms, const char *vm_type)
> +{
> +X86MachineState *x86ms = X86_MACHINE(ms);
> +int kvm_type;
> +
> +kvm_type = kvm_get_vm_type(ms, vm_type);
> +x86ms->vm_type = kvm_type;
> +
> +return kvm_type;
> +}
> +
>  static void x86_machine_initfn(Object *obj)
>  {
>  X86MachineState *x86ms = X86_MACHINE(obj);
> @@ -1401,6 +1412,7 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
>  mc->cpu_index_to_instance_props = x86_cpu_index_to_props;
>  mc->get_default_cpu_node_id = x86_get_default_cpu_node_id;
>  mc->possible_cpu_arch_ids = x86_possible_cpu_arch_ids;
> +mc->kvm_type = x86_kvm_type;
>  x86mc->save_tsc_khz = true;
>  x86mc->fwcfg_dma_enabled = true;
>  nc->nmi_monitor_handler = x86_nmi;
> diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
> index da19ae15463a..ab1d38569019 100644
> --- a/include/hw/i386/x86.h
> +++ b/include/hw/i386/x86.h
> @@ -41,6 +41,7 @@ struct X86MachineState {
>  MachineState parent;
>  
>  /*< public >*/
> +unsigned int vm_type;
>  
>  /* Pointers to devices and objects: */
>  ISADevice *rtc;
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index b4b9ce89842f..2e47fda25f95 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -161,6 +161,31 @@ static KVMMSRHandlers msr_handlers[KVM_MSR_FILTER_MAX_RANGES];
>  static RateLimit bus_lock_ratelimit_ctrl;
>  static int kvm_get_one_msr(X86CPU *cpu, int index, uint64_t *value);
>  
> +static const char* vm_type_name[] = {

Nitpick: 'char *vm_type_name[]' is the normal style.

> +[KVM_X86_DEFAULT_VM] = "default",
> +[KVM_X86_SW_PROTECTED_VM] = "sw-protected-vm",
> +};
> +
> +int kvm_get_vm_type(MachineState *ms, const char *vm_type)
> +{
> +int kvm_type = KVM_X86_DEFAULT_VM;
> +
> +/*
> + * old KVM doesn't support KVM_CAP_VM_TYPES and KVM_X86_DEFAULT_VM
> + * is always supported
> + */
> +if (kvm_type == KVM_X86_DEFAULT_VM) {
> +return kvm_type;
> +}
> +
> +if (!(kvm_check_extension(KVM_STATE(ms->accelerator), KVM_CAP_VM_TYPES) & BIT(kvm_type))) {
> +error_report("vm-type %s not supported by KVM", vm_type_name[kvm_type]);
> +exit(1);
> +}
> +
> +return kvm_type;
> +}
> +
>  bool kvm_has_smm(void)
>  {
>  return kvm_vm_check_extension(kvm_state, KVM_CAP_X86_SMM);
> diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
> index 30fedcffea3e..55fb25fa8e2e 100644
> --- a/target/i386/kvm/kvm_i386.h
> +++ b/target/i386/kvm/kvm_i386.h
> @@ -37,6 +37,7 @@ bool kvm_hv_vpindex_settable(void);
>  bool kvm_enable_sgx_provisioning(KVMState *s);
>  bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp);
>  
> +int kvm_get_vm_type(MachineState *ms, const char *vm_type);
>  void kvm_arch_reset_vcpu(X86CPU *cs);
>  void kvm_arch_after_reset_vcpu(X86CPU *cpu);
>  void kvm_arch_do_init_vcpu(X86CPU *cs);
> -- 
> 2.34.1
> 

With regards,
Daniel

Re: [PATCH v1] target/i386/host-cpu: Use IOMMU addr width for passthrough devices on Intel platforms

2023-11-15 Thread Gerd Hoffmann

  Hi,

> > +if (iommu_phys_bits && phys_bits > iommu_phys_bits) {
> > +phys_bits = iommu_phys_bits;
> > +if (!warned2) {
> > +warn_report("Using physical bits (%u)"
> > +" to prevent VFIO mapping failures",
> > +iommu_phys_bits);
> > +warned2 = true;
> > +}
> > +}
> > +
> >  return phys_bits;
> >  }
 
> - As I (may have) mentioned in my OVMF comments, I'm unsure if narrowing
> the VCPU "phys address bits" property due to host IOMMU limitations is a
> good design. To me it feels like hacking one piece of information into
> another (unrelated) piece of information. It vaguely makes me think
> we're going to regret this later. But I don't have any specific, current
> counter-argument, admittedly.

It boils down to:

  (a) do MIN(cpu-phys-bits,iommu-phys-bits) in qemu (this patch)

or

  (b1) communicate iommu-phys-bits to the guest firmware
  (b2) do MIN(cpu-phys-bits,iommu-phys-bits) in the guest.
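
Variant (a) amounts to a small clamp. The sketch below mirrors the condition in the quoted hunk, where an iommu_phys_bits of 0 is treated as "no known limit". The function name clamp_phys_bits() is hypothetical, and the once-only warning from the patch is left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Variant (a): clamp the guest-visible physical address width to what the
 * host IOMMU can map. Hypothetical helper; iommu_phys_bits == 0 means the
 * IOMMU width is unknown or not relevant, so no clamping is done.
 */
static uint32_t clamp_phys_bits(uint32_t cpu_phys_bits,
                                uint32_t iommu_phys_bits)
{
    if (iommu_phys_bits && cpu_phys_bits > iommu_phys_bits) {
        return iommu_phys_bits;
    }
    return cpu_phys_bits;
}
```

Variant (b) would run the same comparison in the guest firmware instead, which is why it needs an extra channel to hand iommu-phys-bits to the guest.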

We certainly had cases in the past where taking shortcuts in the
design to simplify things caused problems later on.  So variant (a)
leaves the somewhat ugly feeling that we might regret this some day.

On the other hand switching from (a) to (b) at some point in the future
(should the need arise) shouldn't be much different from doing (b) now.
And the whole phys-bits situation is already messy enough even without
a new iommu-phys-bits setting for the firmware.

So, all in all I think I'm fine with taking this approach.

Acked-by: Gerd Hoffmann 

take care,
  Gerd

Re: [PATCH v3 18/70] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES

2023-11-15 Thread Daniel P . Berrangé

On Wed, Nov 15, 2023 at 02:14:27AM -0500, Xiaoyao Li wrote:
> KVM provides TDX capabilities via sub command KVM_TDX_CAPABILITIES of
> IOCTL(KVM_MEMORY_ENCRYPT_OP). Get the capabilities when initializing
> TDX context. It will be used to validate user's setting later.
> 
> Since there is no interface reporting how many cpuid configs contains in
> KVM_TDX_CAPABILITIES, QEMU chooses to try starting with a known number
> and abort when it exceeds KVM_MAX_CPUID_ENTRIES.
> 
> Besides, introduce the interfaces to invoke TDX "ioctls" at different
> scope (KVM, VM and VCPU) in preparation.
> 
> Signed-off-by: Xiaoyao Li 
> ---
> Changes in v3:
> - rename __tdx_ioctl() to tdx_ioctl_internal()
> - Pass errp in get_tdx_capabilities();
> 
> changes in v2:
>   - Make the error message more clear;
> 
> changes in v1:
>   - start from nr_cpuid_configs = 6 for the loop;
>   - stop the loop when nr_cpuid_configs exceeds KVM_MAX_CPUID_ENTRIES;
> ---
>  target/i386/kvm/kvm.c  |   2 -
>  target/i386/kvm/kvm_i386.h |   2 +
>  target/i386/kvm/tdx.c  | 102 -
>  3 files changed, 103 insertions(+), 3 deletions(-)
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 7abcdebb1452..28e60c5ea4a7 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -1687,8 +1687,6 @@ static int hyperv_init_vcpu(X86CPU *cpu)
>  
>  static Error *invtsc_mig_blocker;
>  
> -#define KVM_MAX_CPUID_ENTRIES  100
> -
>  static void kvm_init_xsave(CPUX86State *env)
>  {
>  if (has_xsave2) {
> diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
> index 55fb25fa8e2e..c3ef46a97a7b 100644
> --- a/target/i386/kvm/kvm_i386.h
> +++ b/target/i386/kvm/kvm_i386.h
> @@ -13,6 +13,8 @@
>  
>  #include "sysemu/kvm.h"
>  
> +#define KVM_MAX_CPUID_ENTRIES  100
> +
>  #ifdef CONFIG_KVM
>  
>  #define kvm_pit_in_kernel() \
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index 621a05beeb4e..cb0040187b27 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
> @@ -12,17 +12,117 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/error-report.h"
>  #include "qapi/error.h"
>  #include "qom/object_interfaces.h"
> +#include "sysemu/kvm.h"
>  
>  #include "hw/i386/x86.h"
> +#include "kvm_i386.h"
>  #include "tdx.h"
>  
> +static struct kvm_tdx_capabilities *tdx_caps;
> +
> +enum tdx_ioctl_level{
> +TDX_PLATFORM_IOCTL,
> +TDX_VM_IOCTL,
> +TDX_VCPU_IOCTL,
> +};
> +
> +static int tdx_ioctl_internal(void *state, enum tdx_ioctl_level level, int cmd_id,
> +__u32 flags, void *data)
> +{
> +struct kvm_tdx_cmd tdx_cmd;

Add ' = {}' to initialize to all-zeros, avoiding the explicit
memset call.
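
A small sketch of the two spellings, using a hypothetical struct demo_cmd that mirrors only the three fields assigned in the quoted hunk (the real struct kvm_tdx_cmd is not reproduced here). The sketch writes '= {0}', which is the portable pre-C23 form; the empty-brace '= {}' suggested above is accepted by GCC and Clang and is standard as of C23. Either way every named member starts out zero, so the memset() becomes redundant.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct; not the real kvm_tdx_cmd layout. */
struct demo_cmd {
    uint32_t id;
    uint32_t flags;
    uint64_t data;
};

/* Original style: declare, then zero with an explicit memset(). */
static struct demo_cmd make_cmd_memset(uint32_t id, void *data)
{
    struct demo_cmd cmd;

    memset(&cmd, 0, sizeof(cmd));
    cmd.id = id;
    cmd.data = (uint64_t)(uintptr_t)data;
    return cmd;
}

/* Suggested style: zero-initialize at the declaration. */
static struct demo_cmd make_cmd_init(uint32_t id, void *data)
{
    struct demo_cmd cmd = {0};

    cmd.id = id;
    cmd.data = (uint64_t)(uintptr_t)data;
    return cmd;
}
```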

> +int r;
> +
> +memset(&tdx_cmd, 0x0, sizeof(tdx_cmd));
> +
> +tdx_cmd.id = cmd_id;
> +tdx_cmd.flags = flags;
> +tdx_cmd.data = (__u64)(unsigned long)data;
> +
> +switch (level) {
> +case TDX_PLATFORM_IOCTL:
> +r = kvm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
> +break;
> +case TDX_VM_IOCTL:
> +r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
> +break;
> +case TDX_VCPU_IOCTL:
> +r = kvm_vcpu_ioctl(state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
> +break;
> +default:
> +error_report("Invalid tdx_ioctl_level %d", level);
> +exit(1);
> +}
> +
> +return r;
> +}
> +
> +static inline int tdx_platform_ioctl(int cmd_id, __u32 flags, void *data)
> +{
> +return tdx_ioctl_internal(NULL, TDX_PLATFORM_IOCTL, cmd_id, flags, data);
> +}
> +
> +static inline int tdx_vm_ioctl(int cmd_id, __u32 flags, void *data)
> +{
> +return tdx_ioctl_internal(NULL, TDX_VM_IOCTL, cmd_id, flags, data);
> +}
> +
> +static inline int tdx_vcpu_ioctl(void *vcpu_fd, int cmd_id, __u32 flags,
> + void *data)
> +{
> +return  tdx_ioctl_internal(vcpu_fd, TDX_VCPU_IOCTL, cmd_id, flags, data);
> +}
> +
> +static int get_tdx_capabilities(Error **errp)
> +{
> +struct kvm_tdx_capabilities *caps;
> +/* 1st generation of TDX reports 6 cpuid configs */
> +int nr_cpuid_configs = 6;
> +size_t size;
> +int r;
> +
> +do {
> +size = sizeof(struct kvm_tdx_capabilities) +
> +   nr_cpuid_configs * sizeof(struct kvm_tdx_cpuid_config);
> +caps = g_malloc0(size);
> +caps->nr_cpuid_configs = nr_cpuid_configs;
> +
> +r = tdx_vm_ioctl(KVM_TDX_CAPABILITIES, 0, caps);
> +if (r == -E2BIG) {
> +g_free(caps);
> +nr_cpuid_configs *= 2;
> +if (nr_cpuid_configs > KVM_MAX_CPUID_ENTRIES) {
> +error_setg(errp, "%s: KVM TDX seems broken that number of CPUID "
> +   "entries in kvm_tdx_capabilities exceeds limit %d",
> +   __func__, KVM_MAX_CPUID_ENTRIES);
> +return r;
> +}
> +} else if (r < 0) {
> +g_free(ca

Re: [PATCH v3 26/70] i386/tdx: Initialize TDX before creating TD vcpus

2023-11-15 Thread Daniel P . Berrangé

On Wed, Nov 15, 2023 at 02:14:35AM -0500, Xiaoyao Li wrote:
> Invoke KVM_TDX_INIT in kvm_arch_pre_create_vcpu() that KVM_TDX_INIT
> configures global TD configurations, e.g. the canonical CPUID config,
> and must be executed prior to creating vCPUs.
> 
> Use kvm_x86_arch_cpuid() to setup the CPUID settings for TDX VM.
> 
> Note, this doesn't address the fact that QEMU may change the CPUID
> configuration when creating vCPUs, i.e. punts on refactoring QEMU to
> provide a stable CPUID config prior to kvm_arch_init().
> 
> Signed-off-by: Xiaoyao Li 
> Acked-by: Gerd Hoffmann 
> ---
> Changes in v3:
> - Pass @errp in tdx_pre_create_vcpu() and pass error info to it. (Daniel)
> ---
>  accel/kvm/kvm-all.c|  9 +++-
>  target/i386/kvm/kvm.c  |  9 
>  target/i386/kvm/tdx-stub.c |  5 +
>  target/i386/kvm/tdx.c  | 45 ++
>  target/i386/kvm/tdx.h  |  4 
>  5 files changed, 71 insertions(+), 1 deletion(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 6b5f4d62f961..a92fff471b58 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -441,8 +441,15 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
>  
>  trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>  
> +/*
> + * tdx_pre_create_vcpu() may call cpu_x86_cpuid(). It in turn may call
> + * kvm_vm_ioctl(). Set cpu->kvm_state in advance to avoid NULL pointer
> + * dereference.
> + */
> +cpu->kvm_state = s;
>  ret = kvm_arch_pre_create_vcpu(cpu, errp);
>  if (ret < 0) {
> +cpu->kvm_state = NULL;
>  goto err;
>  }
>  
> @@ -450,11 +457,11 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
>  if (ret < 0) {
>  error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)",
>   kvm_arch_vcpu_id(cpu));
> +cpu->kvm_state = NULL;
>  goto err;
>  }
>  
>  cpu->kvm_fd = ret;
> -cpu->kvm_state = s;
>  cpu->vcpu_dirty = true;
>  cpu->dirty_pages = 0;
>  cpu->throttle_us_per_full = 0;
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index dafe4d262977..fc840653ceb6 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -2268,6 +2268,15 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  return r;
>  }
>  
> +int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
> +{
> +if (is_tdx_vm()) {
> +return tdx_pre_create_vcpu(cpu, errp);
> +}
> +
> +return 0;
> +}
> +
>  int kvm_arch_destroy_vcpu(CPUState *cs)
>  {
>  X86CPU *cpu = X86_CPU(cs);
> diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
> index 1d866d5496bf..3877d432a397 100644
> --- a/target/i386/kvm/tdx-stub.c
> +++ b/target/i386/kvm/tdx-stub.c
> @@ -6,3 +6,8 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
>  {
>  return -EINVAL;
>  }
> +
> +int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
> +{
> +return -EINVAL;
> +}
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index 1f5d8117d1a9..122a37c93de3 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
> @@ -467,6 +467,49 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
>  return 0;
>  }
>  
> +int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
> +{
> +MachineState *ms = MACHINE(qdev_get_machine());
> +X86CPU *x86cpu = X86_CPU(cpu);
> +CPUX86State *env = &x86cpu->env;
> +struct kvm_tdx_init_vm *init_vm;

Mark this as auto-free to avoid the g_free() requirement

  g_autofree  struct kvm_tdx_init_vm *init_vm = NULL;

> +int r = 0;
> +
> +qemu_mutex_lock(&tdx_guest->lock);

   QEMU_LOCK_GUARD(&tdx_guest->lock);

to eliminate the mutex_unlock requirement, thus eliminating all
'goto' jumps and label targets, in favour of a plain 'return -1'
everywhere.
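
For readers unfamiliar with the pattern, here is a dependency-free sketch of the idea. As far as I know both g_autofree and QEMU_LOCK_GUARD() build on the GCC/Clang cleanup variable attribute, which runs a handler when the variable goes out of scope. Everything prefixed demo_ or DEMO_ below is hypothetical and stands in for the glib and QEMU macros; struct demo_lock is a toy flag, not a real mutex.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy lock standing in for QemuMutex so the sketch has no dependencies. */
struct demo_lock {
    int held;
};

static int demo_frees; /* counts non-NULL buffers released by the handler */

static struct demo_lock *demo_lock_acquire(struct demo_lock *lock)
{
    lock->held = 1;
    return lock;
}

/* Cleanup handlers receive a pointer to the variable leaving scope. */
static void demo_unlock_cleanup(struct demo_lock **lockp)
{
    (*lockp)->held = 0;
}

static void demo_free_cleanup(char **bufp)
{
    if (*bufp) {
        demo_frees++;
    }
    free(*bufp);
}

/* Hypothetical equivalents of g_autofree and QEMU_LOCK_GUARD(). */
#define DEMO_AUTOFREE __attribute__((cleanup(demo_free_cleanup)))
#define DEMO_LOCK_GUARD(l)                                          \
    struct demo_lock *demo_guard_                                   \
        __attribute__((cleanup(demo_unlock_cleanup), unused)) =     \
        demo_lock_acquire(l)

static int demo_init(struct demo_lock *lock, int *initialized, int fail)
{
    DEMO_AUTOFREE char *buf = NULL;

    DEMO_LOCK_GUARD(lock);
    if (*initialized) {
        return 0;   /* lock released automatically */
    }

    buf = calloc(1, 64);
    if (!buf || fail) {
        return -1;  /* buffer freed and lock released automatically */
    }

    *initialized = 1;
    return 0;       /* same cleanup on the success path */
}
```

Every exit is a plain return, with no out: or out_free: labels, which is the shape the review asks tdx_pre_create_vcpu() to take.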

> +if (tdx_guest->initialized) {
> +goto out;
> +}
> +
> +init_vm = g_malloc0(sizeof(struct kvm_tdx_init_vm) +
> +sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
> +
> +r = kvm_vm_enable_cap(kvm_state, KVM_CAP_MAX_VCPUS, 0, ms->smp.cpus);
> +if (r < 0) {
> +error_setg(errp, "Unable to set MAX VCPUS to %d", ms->smp.cpus);
> +goto out_free;
> +}
> +
> +init_vm->cpuid.nent = kvm_x86_arch_cpuid(env, init_vm->cpuid.entries, 0);
> +
> +init_vm->attributes = tdx_guest->attributes;
> +
> +do {
> +r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, init_vm);
> +} while (r == -EAGAIN);
> +if (r < 0) {
> +error_setg_errno(errp, -r, "KVM_TDX_INIT_VM failed");
> +goto out_free;
> +}
> +
> +tdx_guest->initialized = true;
> +
> +out_free:
> +g_free(init_vm);
> +out:
> +qemu_mutex_unlock(&tdx_guest->lock);
> +return r;
> +}
> +
>  /* tdx guest */
>  OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
> tdx_guest,
> @@ -479,6 +522,8 @@ static void tdx_guest_init(Object *obj)
>  {
>  TdxGu

Re: [PATCH v5 02/31] target/hppa: Remove object_class_is_abstract()

2023-11-15 Thread BALATON Zoltan


On Wed, 15 Nov 2023, Gavin Shan wrote:

No need to check if @oc is abstract because it has been covered
by cpu_class_by_name().

Signed-off-by: Gavin Shan 
---
target/hppa/cpu.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/target/hppa/cpu.c b/target/hppa/cpu.c
index 04de1689d7..fc4d2abad7 100644
--- a/target/hppa/cpu.c
+++ b/target/hppa/cpu.c
@@ -163,7 +163,6 @@ static ObjectClass *hppa_cpu_class_by_name(const char *cpu_model)
ObjectClass *oc = object_class_by_name(typename);

if (oc &&
-!object_class_is_abstract(oc) &&
object_class_dynamic_cast(oc, TYPE_HPPA_CPU)) {


Might as well remove the line break as the remaining expression fits in 80 
chars.


Regards,
BALATON Zoltan


return oc;
}

Re: [PATCH] tests/avocado/virtio-gpu: Fix test_vhost_user_vga_virgl for edid support

2023-11-15 Thread Antonio Caggiano


Hi,

On 14/11/2023 21:34, Thomas Huth wrote:

The "edid" feature has been added to vhost-user-gpu in commit
c06444261e20 ("contrib/vhost-user-gpu: implement get_edid feature"),
so waiting for "features: +virgl -edid" in the test does not work
anymore, it's "+edid" instead of "-edid" now!

While we're at it, move the expected string to the preceding
exec_command_and_wait_for_pattern() instead (since waiting for
empty string here does not make too much sense).

Signed-off-by: Thomas Huth 


Reviewed-by: Antonio Caggiano 


---
  tests/avocado/virtio-gpu.py | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/tests/avocado/virtio-gpu.py b/tests/avocado/virtio-gpu.py
index 89bfecc715..6091f614a4 100644
--- a/tests/avocado/virtio-gpu.py
+++ b/tests/avocado/virtio-gpu.py
@@ -149,10 +149,8 @@ def test_vhost_user_vga_virgl(self):
  # TODO: probably fails because we are missing the VirGL features
  self.cancel("VirGL not enabled?")
  self.wait_for_console_pattern("as init process")
-exec_command_and_wait_for_pattern(
-self, "/usr/sbin/modprobe virtio_gpu", ""
-)
-self.wait_for_console_pattern("features: +virgl -edid")
+exec_command_and_wait_for_pattern(self, "/usr/sbin/modprobe virtio_gpu",
+  "features: +virgl +edid")
  self.vm.shutdown()
  qemu_sock.close()
  vugp.terminate()

Re: [PATCH v5 02/31] target/hppa: Remove object_class_is_abstract()

2023-11-15 Thread Gavin Shan


On 11/15/23 21:18, BALATON Zoltan wrote:

On Wed, 15 Nov 2023, Gavin Shan wrote:

No need to check if @oc is abstract because it has been covered
by cpu_class_by_name().

Signed-off-by: Gavin Shan 
---
target/hppa/cpu.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/target/hppa/cpu.c b/target/hppa/cpu.c
index 04de1689d7..fc4d2abad7 100644
--- a/target/hppa/cpu.c
+++ b/target/hppa/cpu.c
@@ -163,7 +163,6 @@ static ObjectClass *hppa_cpu_class_by_name(const char *cpu_model)
    ObjectClass *oc = object_class_by_name(typename);

    if (oc &&
-    !object_class_is_abstract(oc) &&
    object_class_dynamic_cast(oc, TYPE_HPPA_CPU)) {


Might as well remove the line break as the remaining expression fits in 80 
chars.



Yes, but the whole chunk of code will be removed in PATCH[03]. So I think
we needn't spend the extra effort to adjust the format in PATCH[02]?

Thanks,
Gavin


    return oc;
    }

Re: [PATCH v5 02/31] target/hppa: Remove object_class_is_abstract()

2023-11-15 Thread BALATON Zoltan


On Wed, 15 Nov 2023, Gavin Shan wrote:

On 11/15/23 21:18, BALATON Zoltan wrote:

On Wed, 15 Nov 2023, Gavin Shan wrote:

No need to check if @oc is abstract because it has been covered
by cpu_class_by_name().

Signed-off-by: Gavin Shan 
---
target/hppa/cpu.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/target/hppa/cpu.c b/target/hppa/cpu.c
index 04de1689d7..fc4d2abad7 100644
--- a/target/hppa/cpu.c
+++ b/target/hppa/cpu.c
@@ -163,7 +163,6 @@ static ObjectClass *hppa_cpu_class_by_name(const char *cpu_model)

    ObjectClass *oc = object_class_by_name(typename);

    if (oc &&
-    !object_class_is_abstract(oc) &&
    object_class_dynamic_cast(oc, TYPE_HPPA_CPU)) {


Might as well remove the line break as the remaining expression fits in 80 
chars.




Yes, but the whole chunk of code will be removed in PATCH[03]. So I think
we needn't spend the extra effort to adjust the format in PATCH[02]?


Yes, if it's gone later then it does not matter.

Regards,
BALATON Zoltan


Thanks,
Gavin


    return oc;
    }

Re: [PATCH 2/2] vhost-scsi: Add support for a worker thread per virtqueue

2023-11-15 Thread Stefano Garzarella


On Mon, Nov 13, 2023 at 06:36:44PM -0600, Mike Christie wrote:

This adds support for vhost-scsi to be able to create a worker thread
per virtqueue. Right now for vhost-net we get a worker thread per
tx/rx virtqueue pair which scales nicely as we add more virtqueues and
CPUs, but for scsi we get the single worker thread that's shared by all
virtqueues. When trying to send IO to more than 2 virtqueues the single
thread becomes a bottleneck.

This patch adds a new setting, virtqueue_workers, which can be set to:

1: Existing behavior where we get the single thread.
-1: Create a worker per IO virtqueue.


I find this setting a bit odd. What about a boolean instead?

`per_virtqueue_workers`:
false: Existing behavior where we get the single thread.
true: Create a worker per IO virtqueue.



Signed-off-by: Mike Christie 
---
hw/scsi/vhost-scsi.c| 68 +
include/hw/virtio/virtio-scsi.h |  1 +
2 files changed, 69 insertions(+)

diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 3126df9e1d9d..5cf669b6563b 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -31,6 +31,9 @@
#include "qemu/cutils.h"
#include "sysemu/sysemu.h"

+#define VHOST_SCSI_WORKER_PER_VQ-1
+#define VHOST_SCSI_WORKER_DEF1
+
/* Features supported by host kernel. */
static const int kernel_feature_bits[] = {
VIRTIO_F_NOTIFY_ON_EMPTY,
@@ -165,6 +168,62 @@ static const VMStateDescription vmstate_virtio_vhost_scsi = {
.pre_save = vhost_scsi_pre_save,
};

+static int vhost_scsi_set_workers(VHostSCSICommon *vsc, int workers_cnt)
+{
+struct vhost_dev *dev = &vsc->dev;
+struct vhost_vring_worker vq_worker;
+struct vhost_worker_state worker;
+int i, ret;
+
+/* Use default worker */
+if (workers_cnt == VHOST_SCSI_WORKER_DEF ||
+dev->nvqs == VHOST_SCSI_VQ_NUM_FIXED + 1) {
+return 0;
+}
+
+if (workers_cnt != VHOST_SCSI_WORKER_PER_VQ) {
+return -EINVAL;
+}
+
+/*
+ * ctl/evt share the first worker since it will be rare for them
+ * to send cmds while IO is running.
+ */
+for (i = VHOST_SCSI_VQ_NUM_FIXED + 1; i < dev->nvqs; i++) {
+memset(&worker, 0, sizeof(worker));
+
+ret = dev->vhost_ops->vhost_new_worker(dev, &worker);


Should we call vhost_free_worker() in the vhost_scsi_unrealize() or are
workers automatically freed when `vhostfd` is closed?

The rest LGTM.

Thanks,
Stefano


+if (ret == -ENOTTY) {
+/*
+ * worker ioctls are not implemented so just ignore and
+ * and continue device setup.
+ */
+ret = 0;
+break;
+} else if (ret) {
+break;
+}
+
+memset(&vq_worker, 0, sizeof(vq_worker));
+vq_worker.worker_id = worker.worker_id;
+vq_worker.index = i;
+
+ret = dev->vhost_ops->vhost_attach_vring_worker(dev, &vq_worker);
+if (ret == -ENOTTY) {
+/*
+ * It's a bug for the kernel to have supported the worker creation
+ * ioctl but not attach.
+ */
+dev->vhost_ops->vhost_free_worker(dev, &worker);
+break;
+} else if (ret) {
+break;
+}
+}
+
+return ret;
+}
+
static void vhost_scsi_realize(DeviceState *dev, Error **errp)
{
VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
@@ -232,6 +291,13 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp)
goto free_vqs;
}

+ret = vhost_scsi_set_workers(vsc, vs->conf.virtqueue_workers);
+if (ret < 0) {
+error_setg(errp, "vhost-scsi: vhost worker setup failed: %s",
+   strerror(-ret));
+goto free_vqs;
+}
+
/* At present, channel and lun both are 0 for bootable vhost-scsi disk */
vsc->channel = 0;
vsc->lun = 0;
@@ -297,6 +363,8 @@ static Property vhost_scsi_properties[] = {
 VIRTIO_SCSI_F_T10_PI,
 false),
DEFINE_PROP_BOOL("migratable", VHostSCSICommon, migratable, false),
+DEFINE_PROP_INT32("virtqueue_workers", VirtIOSCSICommon,
+  conf.virtqueue_workers, VHOST_SCSI_WORKER_DEF),
DEFINE_PROP_END_OF_LIST(),
};

diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 779568ab5d28..f70624ece564 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -51,6 +51,7 @@ typedef struct virtio_scsi_config VirtIOSCSIConfig;
struct VirtIOSCSIConf {
uint32_t num_queues;
uint32_t virtqueue_size;
+int virtqueue_workers;
bool seg_max_adjust;
uint32_t max_sectors;
uint32_t cmd_per_lun;
--
2.34.1

Re: [PATCH 0/4] VFIO device init cleanup

2023-11-15 Thread Philippe Mathieu-Daudé


On 15/11/23 09:32, Zhenzhong Duan wrote:


Zhenzhong Duan (4):
   vfio/pci: Move VFIODevice initializations in vfio_instance_init
   vfio/platform: Move VFIODevice initializations in
 vfio_platform_instance_init
   vfio/ap: Move VFIODevice initializations in vfio_ap_instance_init
   vfio/ccw: Move VFIODevice initializations in vfio_ccw_instance_init


Series:
Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v6 11/21] vfio/pci: Make vfio cdev pre-openable by passing a file handle

2023-11-15 Thread Philippe Mathieu-Daudé


Hi Zhenzhong,

On 14/11/23 11:09, Zhenzhong Duan wrote:

This gives management tools like libvirt a chance to open the vfio
cdev with privilege and pass FD to qemu. This way qemu never needs
to have privilege to open a VFIO or iommu cdev node.

Together with the earlier support of pre-opening /dev/iommu device,
now we have full support of passing a vfio device to unprivileged
qemu by management tool. This mode is no more considered for the
legacy backend. So let's remove the "TODO" comment.

Add helper functions vfio_device_set_fd() and vfio_device_get_name()
to set fd and get device name, they will also be used by other vfio
devices.

There is no easy way to check if a device is mdev with FD passing,
so fail the x-balloon-allowed check unconditionally in this case.

There is also no easy way to get BDF as name with FD passing, so
we fake a name by VFIO_FD[fd].

Signed-off-by: Zhenzhong Duan 
---
v6: simplify CONFIG_IOMMUFD checking code
 introduce a helper vfio_device_set_fd

  include/hw/vfio/vfio-common.h |  3 +++
  hw/vfio/helpers.c | 44 +++
  hw/vfio/iommufd.c | 12 ++
  hw/vfio/pci.c | 28 --
  4 files changed, 71 insertions(+), 16 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 3dac5c167e..567e5f7bea 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -251,4 +251,7 @@ int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
  hwaddr size);
  int vfio_get_dirty_bitmap(VFIOContainerBase *bcontainer, uint64_t iova,
   uint64_t size, ram_addr_t ram_addr);
+


Please add bare documentation:

  /* Returns 0 on success, or a negative errno. */


+int vfio_device_get_name(VFIODevice *vbasedev, Error **errp);


Functions taking an Error** param should return a boolean, so:

  /* Return: true on success, else false setting @errp with error. */


+void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp);
  #endif /* HW_VFIO_VFIO_COMMON_H */




@@ -609,3 +611,45 @@ bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
  
  return ret;

  }
+
+int vfio_device_get_name(VFIODevice *vbasedev, Error **errp)
+{
+struct stat st;
+
+if (vbasedev->fd < 0) {
+if (stat(vbasedev->sysfsdev, &st) < 0) {
+error_setg_errno(errp, errno, "no such host device");
+error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->sysfsdev);
+return -errno;
+}
+/* User may specify a name, e.g: VFIO platform device */
+if (!vbasedev->name) {
+vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
+}
+} else {
+if (!vbasedev->iommufd) {
+error_setg(errp, "Use FD passing only with iommufd backend");
+return -EINVAL;
+}
+/*
+ * Give a name with fd so any function printing out vbasedev->name
+ * will not break.
+ */
+if (!vbasedev->name) {
+vbasedev->name = g_strdup_printf("VFIO_FD%d", vbasedev->fd);
+}
+}
+
+return 0;
+}
+
+void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp)


   bool vfio_device_set_fd(..., Error **errp)


+{
+int fd = monitor_fd_param(monitor_cur(), str, errp);
+
+if (fd < 0) {
+error_prepend(errp, "Could not parse remote object fd %s:", str);
+return;


   return false;


+}
+vbasedev->fd = fd;


   return true;


+}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 3eec428162..e08a217057 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -326,11 +326,15 @@ static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
  uint32_t ioas_id;
  Error *err = NULL;
  
-devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);

-if (devfd < 0) {
-return devfd;
+if (vbasedev->fd < 0) {
+devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
+if (devfd < 0) {
+return devfd;
+}
+vbasedev->fd = devfd;
+} else {
+devfd = vbasedev->fd;
  }
-vbasedev->fd = devfd;
  
  ret = iommufd_cdev_connect_and_bind(vbasedev, errp);

  if (ret) {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index c5984b0598..b23b492cce 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2944,17 +2944,19 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
  VFIODevice *vbasedev = &vdev->vbasedev;
  char *tmp, *subsys;
  Error *err = NULL;
-struct stat st;
  int i, ret;
  bool is_mdev;
  char uuid[UUID_STR_LEN];
  char *name;
  
-if (!vbasedev->sysfsdev) {

+if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
  if (!(~vdev->host.domain || ~vdev->host.bus ||
~vdev->host.slot || ~vdev->host.function)) {
  error_setg
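
The review comment above, that functions taking an Error ** parameter should return a boolean, can be sketched in isolation. This is a minimal illustration only: MockError, mock_error_set() and mock_set_fd() are hypothetical stand-ins and not QEMU's Error API, and parsing a decimal string replaces the monitor_fd_param() lookup that the real patch performs.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMU's Error object; not the real type. */
typedef struct MockError {
    const char *msg;
} MockError;

/* Record an error only if the caller asked for one (errp may be NULL). */
static void mock_error_set(MockError **errp, const char *msg)
{
    if (errp) {
        *errp = malloc(sizeof(**errp));
        if (*errp) {
            (*errp)->msg = msg;
        }
    }
}

/*
 * Return: true on success, else false with @errp set.
 * The output is written only on success, so a failed call leaves
 * the caller's previous fd value untouched.
 */
static bool mock_set_fd(int *fd_out, const char *str, MockError **errp)
{
    char *end;
    long v = strtol(str, &end, 10);

    if (end == str || *end != '\0' || v < 0 || v > INT_MAX) {
        mock_error_set(errp, "could not parse fd");
        return false;
    }
    *fd_out = (int)v;
    return true;
}
```

With this shape a caller can write "if (!mock_set_fd(&fd, str, &err)) { ... }" without having to inspect the error object just to learn whether the call failed.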

Re: [PATCH v1] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER

2023-11-15 Thread Sebastian Ott


Hi,

On Mon, 13 Nov 2023, Shaoqin Huang wrote:

+``pmu-filter={A,D}:start-end[;...]``
+KVM implements pmu event filtering to prevent a guest from being able to
+   sample certain events. It has the following format:
+
+   pmu-filter="{A,D}:start-end[;{A,D}:start-end...]"
+
+   The A means "allow" and D means "deny", start if the first event of the

  ^
  is

Also it should be stated that the first filter action defines if the whole
list is an allow or a deny list.
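
To make the point about the first filter action concrete, here is a small sketch of parsing one entry of the proposed pmu-filter string and deriving the default policy from the first entry. It is an illustration under assumptions, not the patch's code: struct pmu_filter_range and both function names are made up for this example, and the sketch checks the sscanf() return value and rejects start > end, which the quoted patch does not do.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical container for one parsed entry; not a QEMU or KVM type. */
struct pmu_filter_range {
    char action;            /* 'A' (allow) or 'D' (deny) */
    unsigned short start;   /* first event number, parsed as hex */
    unsigned short end;     /* last event number, inclusive */
};

/* Parse a single "{A,D}:start-end" entry; returns false if malformed. */
static bool parse_pmu_filter_entry(const char *str,
                                   struct pmu_filter_range *out)
{
    char act;
    unsigned short start, end;

    /* All three fields must convert; fewer means a malformed entry. */
    if (sscanf(str, "%c:%hx-%hx", &act, &start, &end) != 3) {
        return false;
    }
    if ((act != 'A' && act != 'D') || start > end) {
        return false;
    }
    out->action = act;
    out->start = start;
    out->end = end;
    return true;
}

/*
 * The first entry decides what happens to events that no range names:
 * a leading deny entry makes the list a deny list (everything else is
 * allowed), a leading allow entry makes it an allow list.
 */
static bool pmu_filter_default_allows(const struct pmu_filter_range *first)
{
    return first->action == 'D';
}
```

For example, "A:10-2f" as the first entry would leave every event outside 0x10..0x2f denied, which is the behaviour the documentation should spell out.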


+static void kvm_arm_pmu_filter_init(CPUState *cs)
+{
+struct kvm_pmu_event_filter filter;
+struct kvm_device_attr attr = {
+.group  = KVM_ARM_VCPU_PMU_V3_CTRL,
+.attr   = KVM_ARM_VCPU_PMU_V3_FILTER,
+};
+KVMState *kvm_state = cs->kvm_state;
+char *tmp;
+char *str, act;
+
+if (!kvm_state->kvm_pmu_filter)
+return;
+
+tmp = g_strdup(kvm_state->kvm_pmu_filter);
+
+for (str = strtok(tmp, ";"); str != NULL; str = strtok(NULL, ";")) {
+unsigned short start = 0, end = 0;
+
+sscanf(str, "%c:%hx-%hx", &act, &start, &end);
+if ((act != 'A' && act != 'D') || (!start && !end)) {
+error_report("skipping invalid filter %s\n", str);
+continue;
+}
+
+filter = (struct kvm_pmu_event_filter) {
+.base_event = start,
+.nevents= end - start + 1,
+.action = act == 'A' ? KVM_PMU_EVENT_ALLOW :
+   KVM_PMU_EVENT_DENY,
+};
+
+attr.addr = (uint64_t)&filter;


That could move to the initialization of attr (the address of filter
doesn't change).


+if (!kvm_arm_set_device_attr(cs, &attr, "PMU Event Filter")) {
+error_report("Failed to init PMU Event Filter\n");
+abort();
+}
+}
+
+g_free(tmp);
+}
+
void kvm_arm_pmu_init(CPUState *cs)
{
struct kvm_device_attr attr = {
.group = KVM_ARM_VCPU_PMU_V3_CTRL,
.attr = KVM_ARM_VCPU_PMU_V3_INIT,
};
+static bool pmu_filter_init = false;

if (!ARM_CPU(cs)->has_pmu) {
return;
}
+if (!pmu_filter_init) {
+kvm_arm_pmu_filter_init(cs);
+pmu_filter_init = true;


pmu_filter_init could move inside kvm_arm_pmu_filter_init() - maybe
together with a comment that this only needs to be called for 1 vcpu.

Thanks,
Sebastian
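
The suggestion to keep the run-once flag inside the init function could look roughly like the sketch below. The names are hypothetical and the counter merely stands in for the real filter setup; it only shows the shape of the guard.

```c
#include <stdbool.h>

/* Counts how often the real setup ran; illustration only. */
static int filter_setup_calls;

static void pmu_filter_init_once(void)
{
    /*
     * The filter only needs to be installed for one vCPU, so the
     * guard lives here and later calls return early.
     */
    static bool done;

    if (done) {
        return;
    }
    done = true;
    filter_setup_calls++;   /* stand-in for the actual filter setup */
}
```

The caller then invokes the function unconditionally for every vCPU and no longer needs its own static flag.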

Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object

2023-11-15 Thread Eric Auger

Hi Zhenzhong,

On 11/14/23 11:09, Zhenzhong Duan wrote:
> From: Eric Auger 
>
> Introduce an iommufd object which allows the interaction
> with the host /dev/iommu device.
>
> The /dev/iommu can have been already pre-opened outside of qemu,
> in which case the fd can be passed directly along with the
> iommufd object:
>
> This allows the iommufd object to be shared across several
> subsystems (VFIO, VDPA, ...). For example, libvirt would open
> the /dev/iommu once.
>
> If no fd is passed along with the iommufd object, the /dev/iommu
> is opened by the qemu code.
>
> Suggested-by: Alex Williamson 
> Signed-off-by: Eric Auger 
> Signed-off-by: Yi Liu 
> Signed-off-by: Zhenzhong Duan 
> ---
> v6: remove redundant call, alloc_hwpt, get/put_ioas
>
>  MAINTAINERS  |   7 ++
>  qapi/qom.json|  19 
>  include/sysemu/iommufd.h |  44 
>  backends/iommufd.c   | 228 +++
>  backends/Kconfig |   4 +
>  backends/meson.build |   1 +
>  backends/trace-events|  10 ++
>  qemu-options.hx  |  12 +++
>  8 files changed, 325 insertions(+)
>  create mode 100644 include/sysemu/iommufd.h
>  create mode 100644 backends/iommufd.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ff1238bb98..a4891f7bda 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2166,6 +2166,13 @@ F: hw/vfio/ap.c
>  F: docs/system/s390x/vfio-ap.rst
>  L: qemu-s3...@nongnu.org
>  
> +iommufd
> +M: Yi Liu 
> +M: Eric Auger 
Zhenzhong, don't you want to be added here?
> +S: Supported
> +F: backends/iommufd.c
> +F: include/sysemu/iommufd.h
> +
>  vhost
>  M: Michael S. Tsirkin 
>  S: Supported
> diff --git a/qapi/qom.json b/qapi/qom.json
> index c53ef978ff..1fd8555a75 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -794,6 +794,23 @@
>  { 'struct': 'VfioUserServerProperties',
>'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>  
> +##
> +# @IOMMUFDProperties:
> +#
> +# Properties for iommufd objects.
> +#
> +# @fd: file descriptor name previously passed via 'getfd' command,

"previously passed via 'getfd' command", I wonder if this applies here or 
whether it is copy/paste of 
RemoteObjectProperties.fd doc?

> +# which represents a pre-opened /dev/iommu.  This allows the
> +# iommufd object to be shared across several subsystems
> +# (VFIO, VDPA, ...), and the file descriptor to be shared
> +# with other process, e.g. DPDK.  (default: QEMU opens
> +# /dev/iommu by itself)
> +#
> +# Since: 8.2
> +##
> +{ 'struct': 'IOMMUFDProperties',
> +  'data': { '*fd': 'str' } }
> +
>  ##
>  # @RngProperties:
>  #
> @@ -934,6 +951,7 @@
>  'input-barrier',
>  { 'name': 'input-linux',
>'if': 'CONFIG_LINUX' },
> +'iommufd',
>  'iothread',
>  'main-loop',
>  { 'name': 'memory-backend-epc',
> @@ -1003,6 +1021,7 @@
>'input-barrier':  'InputBarrierProperties',
>'input-linux':{ 'type': 'InputLinuxProperties',
>'if': 'CONFIG_LINUX' },
> +  'iommufd':'IOMMUFDProperties',
>'iothread':   'IothreadProperties',
>'main-loop':  'MainLoopProperties',
>'memory-backend-epc': { 'type': 'MemoryBackendEpcProperties',
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> new file mode 100644
> index 00..9b3a86f57d
> --- /dev/null
> +++ b/include/sysemu/iommufd.h
> @@ -0,0 +1,44 @@
> +#ifndef SYSEMU_IOMMUFD_H
> +#define SYSEMU_IOMMUFD_H
> +
> +#include "qom/object.h"
> +#include "qemu/thread.h"
> +#include "exec/hwaddr.h"
> +#include "exec/cpu-common.h"
> +
> +#define TYPE_IOMMUFD_BACKEND "iommufd"
> +OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass,
> +IOMMUFD_BACKEND)
> +#define IOMMUFD_BACKEND(obj) \
> +OBJECT_CHECK(IOMMUFDBackend, (obj), TYPE_IOMMUFD_BACKEND)
> +#define IOMMUFD_BACKEND_GET_CLASS(obj) \
> +OBJECT_GET_CLASS(IOMMUFDBackendClass, (obj), TYPE_IOMMUFD_BACKEND)
> +#define IOMMUFD_BACKEND_CLASS(klass) \
> +OBJECT_CLASS_CHECK(IOMMUFDBackendClass, (klass), TYPE_IOMMUFD_BACKEND)
> +struct IOMMUFDBackendClass {
> +ObjectClass parent_class;
> +};
> +
> +struct IOMMUFDBackend {
> +Object parent;
> +
> +/*< protected >*/
> +int fd;/* /dev/iommu file descriptor */
> +bool owned;/* is the /dev/iommu opened internally */
> +QemuMutex lock;
> +uint32_t users;
> +
> +/*< public >*/
> +};
> +
> +int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp);
> +void iommufd_backend_disconnect(IOMMUFDBackend *be);
> +
> +int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
> +   Error **errp);
> +void iommufd_backend_free_id(IOMMUFDBackend *be, uint32_t id);
> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
> +ram_addr_t size, void *vaddr, bool r

Re: [PATCH 2/2] vhost-scsi: Add support for a worker thread per virtqueue

2023-11-15 Thread Stefan Hajnoczi

On Wed, Nov 15, 2023 at 12:43:02PM +0100, Stefano Garzarella wrote:
> On Mon, Nov 13, 2023 at 06:36:44PM -0600, Mike Christie wrote:
> > This adds support for vhost-scsi to be able to create a worker thread
> > per virtqueue. Right now for vhost-net we get a worker thread per
> > tx/rx virtqueue pair which scales nicely as we add more virtqueues and
> > CPUs, but for scsi we get the single worker thread that's shared by all
> > virtqueues. When trying to send IO to more than 2 virtqueues the single
> > thread becomes a bottleneck.
> > 
> > This patch adds a new setting, virtqueue_workers, which can be set to:
> > 
> > 1: Existing behavior where we get the single thread.
> > -1: Create a worker per IO virtqueue.
> 
> I find this setting a bit odd. What about a boolean instead?
> 
> `per_virtqueue_workers`:
> false: Existing behavior where we get the single thread.
> true: Create a worker per IO virtqueue.

Me too, I thought there would be round-robin assignment for 1 <
worker_cnt < (dev->nvqs - VHOST_SCSI_VQ_NUM_FIXED) but instead only 1
and -1 have any meaning.

Do you want to implement round-robin assignment?
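
For reference, round-robin assignment for a worker count between 1 and the number of IO virtqueues could be sketched as below. This is not part of the posted patch: worker_for_vq() is a hypothetical helper, NUM_FIXED_VQS is an assumed stand-in for VHOST_SCSI_VQ_NUM_FIXED, and worker 0 is taken to be the default worker that ctl/evt already use.

```c
#include <assert.h>

/* Assumption: two fixed queues (ctl and evt) precede the IO queues. */
#define NUM_FIXED_VQS 2

/*
 * Return the worker (0 .. workers_cnt - 1) that would serve the
 * virtqueue at vq_index. Worker 0 is the default worker, shared by
 * ctl/evt and by every workers_cnt-th IO queue.
 */
static int worker_for_vq(int vq_index, int workers_cnt)
{
    assert(workers_cnt > 0);
    if (vq_index < NUM_FIXED_VQS) {
        return 0;
    }
    return (vq_index - NUM_FIXED_VQS) % workers_cnt;
}
```

With workers_cnt = 1 this degenerates to the existing single-thread behavior, and with workers_cnt equal to the number of IO queues each IO queue gets its own worker, so both currently meaningful settings become special cases of one rule.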

> 
> > 
> > Signed-off-by: Mike Christie 
> > ---
> > hw/scsi/vhost-scsi.c| 68 +
> > include/hw/virtio/virtio-scsi.h |  1 +
> > 2 files changed, 69 insertions(+)
> > 
> > diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
> > index 3126df9e1d9d..5cf669b6563b 100644
> > --- a/hw/scsi/vhost-scsi.c
> > +++ b/hw/scsi/vhost-scsi.c
> > @@ -31,6 +31,9 @@
> > #include "qemu/cutils.h"
> > #include "sysemu/sysemu.h"
> > 
> > +#define VHOST_SCSI_WORKER_PER_VQ -1
> > +#define VHOST_SCSI_WORKER_DEF 1
> > +
> > /* Features supported by host kernel. */
> > static const int kernel_feature_bits[] = {
> > VIRTIO_F_NOTIFY_ON_EMPTY,
> > @@ -165,6 +168,62 @@ static const VMStateDescription vmstate_virtio_vhost_scsi = {
> > .pre_save = vhost_scsi_pre_save,
> > };
> > 
> > +static int vhost_scsi_set_workers(VHostSCSICommon *vsc, int workers_cnt)
> > +{
> > +struct vhost_dev *dev = &vsc->dev;
> > +struct vhost_vring_worker vq_worker;
> > +struct vhost_worker_state worker;
> > +int i, ret;
> > +
> > +/* Use default worker */
> > +if (workers_cnt == VHOST_SCSI_WORKER_DEF ||
> > +dev->nvqs == VHOST_SCSI_VQ_NUM_FIXED + 1) {
> > +return 0;
> > +}
> > +
> > +if (workers_cnt != VHOST_SCSI_WORKER_PER_VQ) {
> > +return -EINVAL;
> > +}
> > +
> > +/*
> > + * ctl/evt share the first worker since it will be rare for them
> > + * to send cmds while IO is running.
> > + */
> > +for (i = VHOST_SCSI_VQ_NUM_FIXED + 1; i < dev->nvqs; i++) {
> > +memset(&worker, 0, sizeof(worker));
> > +
> > +ret = dev->vhost_ops->vhost_new_worker(dev, &worker);
> 
> Should we call vhost_free_worker() in the vhost_scsi_unrealize() or are
> workers automatically freed when `vhostfd` is closed?
> 
> The rest LGTM.
> 
> Thanks,
> Stefano
> 
> > +if (ret == -ENOTTY) {
> > +/*
> > + * worker ioctls are not implemented so just ignore and
> > + * continue device setup.
> > + */
> > +ret = 0;
> > +break;
> > +} else if (ret) {
> > +break;
> > +}
> > +
> > +memset(&vq_worker, 0, sizeof(vq_worker));
> > +vq_worker.worker_id = worker.worker_id;
> > +vq_worker.index = i;
> > +
> > +ret = dev->vhost_ops->vhost_attach_vring_worker(dev, &vq_worker);
> > +if (ret == -ENOTTY) {
> > +/*
> > + * It's a bug for the kernel to have supported the worker creation
> > + * ioctl but not attach.
> > + */
> > +dev->vhost_ops->vhost_free_worker(dev, &worker);
> > +break;
> > +} else if (ret) {
> > +break;
> > +}
> > +}
> > +
> > +return ret;
> > +}
> > +
> > static void vhost_scsi_realize(DeviceState *dev, Error **errp)
> > {
> > VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> > @@ -232,6 +291,13 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp)
> > goto free_vqs;
> > }
> > 
> > +ret = vhost_scsi_set_workers(vsc, vs->conf.virtqueue_workers);
> > +if (ret < 0) {
> > +error_setg(errp, "vhost-scsi: vhost worker setup failed: %s",
> > +   strerror(-ret));
> > +goto free_vqs;
> > +}
> > +
> > /* At present, channel and lun both are 0 for bootable vhost-scsi disk */
> > vsc->channel = 0;
> > vsc->lun = 0;
> > @@ -297,6 +363,8 @@ static Property vhost_scsi_properties[] = {
> >  VIRTIO_SCSI_F_T10_PI,
> >  false),
> > DEFINE_PROP_BOOL("migratable", VHostSCSICommon, migratable, false),
> > +DEFINE_PROP_INT32("virtqueue_workers", VirtIOSCSICommon,
> > +

Re: [PATCH v6 11/21] vfio/pci: Make vfio cdev pre-openable by passing a file handle

2023-11-15 Thread Cédric Le Goater


On 11/15/23 13:09, Philippe Mathieu-Daudé wrote:

Hi Zhenzhong,

On 14/11/23 11:09, Zhenzhong Duan wrote:

This gives management tools like libvirt a chance to open the vfio
cdev with privilege and pass FD to qemu. This way qemu never needs
to have privilege to open a VFIO or iommu cdev node.

Together with the earlier support of pre-opening /dev/iommu device,
now we have full support of passing a vfio device to unprivileged
qemu by management tool. This mode is no more considered for the
legacy backend. So let's remove the "TODO" comment.

Add helper functions vfio_device_set_fd() and vfio_device_get_name()
to set fd and get device name, they will also be used by other vfio
devices.

There is no easy way to check if a device is mdev with FD passing,
so fail the x-balloon-allowed check unconditionally in this case.

There is also no easy way to get BDF as name with FD passing, so
we fake a name by VFIO_FD[fd].

Signed-off-by: Zhenzhong Duan 
---
v6: simplify CONFIG_IOMMUFD checking code
 introduce a helper vfio_device_set_fd

  include/hw/vfio/vfio-common.h |  3 +++
  hw/vfio/helpers.c | 44 +++
  hw/vfio/iommufd.c | 12 ++
  hw/vfio/pci.c | 28 --
  4 files changed, 71 insertions(+), 16 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 3dac5c167e..567e5f7bea 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -251,4 +251,7 @@ int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
  hwaddr size);
  int vfio_get_dirty_bitmap(VFIOContainerBase *bcontainer, uint64_t iova,
   uint64_t size, ram_addr_t ram_addr);
+


Please add bare documentation:

   /* Returns 0 on success, or a negative errno. */


+int vfio_device_get_name(VFIODevice *vbasedev, Error **errp);


Functions taking an Error** param should return a boolean, so:

   /* Return: true on success, else false setting @errp with error. */


+void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp);
  #endif /* HW_VFIO_VFIO_COMMON_H */




@@ -609,3 +611,45 @@ bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
  return ret;
  }
+
+int vfio_device_get_name(VFIODevice *vbasedev, Error **errp)
+{
+    struct stat st;
+
+    if (vbasedev->fd < 0) {
+    if (stat(vbasedev->sysfsdev, &st) < 0) {
+    error_setg_errno(errp, errno, "no such host device");
+    error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->sysfsdev);
+    return -errno;
+    }
+    /* User may specify a name, e.g: VFIO platform device */
+    if (!vbasedev->name) {
+    vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
+    }
+    } else {
+    if (!vbasedev->iommufd) {
+    error_setg(errp, "Use FD passing only with iommufd backend");
+    return -EINVAL;
+    }
+    /*
+ * Give a name with fd so any function printing out vbasedev->name
+ * will not break.
+ */
+    if (!vbasedev->name) {
+    vbasedev->name = g_strdup_printf("VFIO_FD%d", vbasedev->fd);
+    }
+    }
+
+    return 0;
+}
+
+void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp)


    bool vfio_device_set_fd(..., Error **errp)


+{
+    int fd = monitor_fd_param(monitor_cur(), str, errp);
+
+    if (fd < 0) {
+    error_prepend(errp, "Could not parse remote object fd %s:", str);
+    return;


    return false;


+    }
+    vbasedev->fd = fd;


    return true;


If we had a QOM base device object, vfio_device_set_fd() would be passed
directly to object_class_property_add_str(), which expects a setter of the form:

  void (*set)(Object *, const char *, Error **)

I think it is fine to keep it as it is. We might have a QOM base device object
one day! Minor anyway.

Thanks,

C.





+}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 3eec428162..e08a217057 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -326,11 +326,15 @@ static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
  uint32_t ioas_id;
  Error *err = NULL;
-    devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
-    if (devfd < 0) {
-    return devfd;
+    if (vbasedev->fd < 0) {
+    devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
+    if (devfd < 0) {
+    return devfd;
+    }
+    vbasedev->fd = devfd;
+    } else {
+    devfd = vbasedev->fd;
  }
-    vbasedev->fd = devfd;
  ret = iommufd_cdev_connect_and_bind(vbasedev, errp);
  if (ret) {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index c5984b0598..b23b492cce 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2944,17 +2944,19 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
  VFIODevice *vbasedev = &vdev->vbasedev;
  char *tmp, *subsys;
  Error *err

Re: [PATCH 1/4] vfio/pci: Move VFIODevice initializations in vfio_instance_init

2023-11-15 Thread Cédric Le Goater


On 11/15/23 09:32, Zhenzhong Duan wrote:

Some of the VFIODevice initializations are in vfio_realize;
move all of them into vfio_instance_init.

No functional change intended.

Suggested-by: Cédric Le Goater 
Signed-off-by: Zhenzhong Duan 
---
  hw/vfio/pci.c | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index b23b492cce..5a2b7a2d6b 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2969,9 +2969,6 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
  if (vfio_device_get_name(vbasedev, errp)) {
  return;
  }
-vbasedev->ops = &vfio_pci_ops;
-vbasedev->type = VFIO_DEVICE_TYPE_PCI;
-vbasedev->dev = DEVICE(vdev);
  
  /*

   * Mediated devices *might* operate compatibly with discarding of RAM, but
@@ -3320,6 +3317,7 @@ static void vfio_instance_init(Object *obj)
  {
  PCIDevice *pci_dev = PCI_DEVICE(obj);
  VFIOPCIDevice *vdev = VFIO_PCI(obj);
+VFIODevice *vbasedev = &vdev->vbasedev;
  
  device_add_bootindex_property(obj, &vdev->bootindex,

"bootindex", NULL,
@@ -3328,7 +3326,11 @@ static void vfio_instance_init(Object *obj)
  vdev->host.bus = ~0U;
  vdev->host.slot = ~0U;
  vdev->host.function = ~0U;
-vdev->vbasedev.fd = -1;
+
+vbasedev->type = VFIO_DEVICE_TYPE_PCI;
+vbasedev->ops = &vfio_pci_ops;
+vbasedev->dev = DEVICE(vdev);
+vbasedev->fd = -1;


VFIODevice is similar to a base QOM parent. Could we introduce a helper
routine like we did with vfio_device_set_fd()?

Thanks,

C.

  

  vdev->nv_gpudirect_clique = 0xFF;
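
One possible shape for such a helper is sketched below with mock types. Everything here is hypothetical: the mock_ structures only mirror the four fields touched in the hunk above, and mock_vfio_device_init() is an invented name, not an existing QEMU function.

```c
#include <stddef.h>

/* Mock stand-ins for the fields initialized in the quoted hunk. */
enum mock_dev_type { MOCK_TYPE_PCI, MOCK_TYPE_PLATFORM };

struct mock_dev_ops {
    const char *name;
};

struct mock_vfio_device {
    enum mock_dev_type type;
    const struct mock_dev_ops *ops;
    void *dev;   /* back-pointer to the owning device object */
    int fd;      /* -1 until a file descriptor is opened or passed in */
};

/* Set the common base fields in one place for every device flavour. */
static void mock_vfio_device_init(struct mock_vfio_device *vbasedev,
                                  enum mock_dev_type type,
                                  const struct mock_dev_ops *ops,
                                  void *dev)
{
    vbasedev->type = type;
    vbasedev->ops = ops;
    vbasedev->dev = dev;
    vbasedev->fd = -1;
}
```

Each instance_init function (pci, platform, ap, ccw) would then make a single call instead of repeating the four assignments.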

Re: [PATCH] tests/avocado/reverse_debugging: Disable the ppc64 tests by default

2023-11-15 Thread Daniel P . Berrangé

On Wed, Nov 15, 2023 at 07:23:01AM +0100, Thomas Huth wrote:
> On 15/11/2023 02.15, Nicholas Piggin wrote:
> > On Wed Nov 15, 2023 at 4:29 AM AEST, Thomas Huth wrote:
> > > On 14/11/2023 17.37, Philippe Mathieu-Daudé wrote:
> > > > On 14/11/23 17:31, Thomas Huth wrote:
> > > > > The tests seem currently to be broken. Disable them by default
> > > > > until someone fixes them.
> > > > > 
> > > > > Signed-off-by: Thomas Huth 
> > > > > ---
> > > > >    tests/avocado/reverse_debugging.py | 7 ---
> > > > >    1 file changed, 4 insertions(+), 3 deletions(-)
> > > > 
> > > > Similarly, I suspect https://gitlab.com/qemu-project/qemu/-/issues/1961
> > > > which has a fix ready:
> > > > https://lore.kernel.org/qemu-devel/20231110170831.185001-1-richard.hender...@linaro.org/
> > > > 
> > > > Maybe wait the fix gets in first?
> > > 
> > > No, I applied Richard's patch, but the problem persists. Does this test
> > > still work for you?
> > 
> > I bisected it to 1d4796cd008373 ("python/machine: use socketpair() for
> > console connections"),
> 
> Maybe John (who wrote that commit) can help?

I find it hard to believe this commit is a direct root cause of the
problem since all it does is change the QEMU startup sequence so that
instead of QEMU listening for a monitor connection, it is given a
pre-opened monitor connection.

At the very most that should affect the startup timing a little.

I notice all the reverse debugging tests have a skip on gitlab
with a comment:

# unidentified gitlab timeout problem

This makes me suspicious that John's patch has merely made this
(hitherto undiagnosed) timeout more likely to occur.
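
As background for what a pre-opened connection means here, the following sketch uses socketpair() directly. It is not the code from the commit under discussion (that is Python); roundtrip_over_socketpair() is a made-up helper that just shows both endpoints existing before any data flows, with no listen/accept step whose timing could matter.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a connected pair of sockets, send msg through one end and
 * read it back from the other. Returns the number of bytes received,
 * or -1 on error. buf must have room for at least one byte.
 */
static ssize_t roundtrip_over_socketpair(const char *msg, char *buf,
                                         size_t buflen)
{
    int sv[2];
    ssize_t n;

    if (buflen == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }

    /* Both ends are already connected; no listen() or accept(). */
    n = write(sv[0], msg, strlen(msg));
    if (n >= 0) {
        n = read(sv[1], buf, buflen - 1);
    }
    if (n >= 0) {
        buf[n] = '\0';
    }

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

In the test harness case one end would be handed to QEMU as an inherited file descriptor while the other stays with the Python process.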

> > which causes this halfway through the test:
> > 
> > 2023-11-15 10:37:04,600 stacktrace   L0045 ERROR| Traceback (most recent call last):
> > 2023-11-15 10:37:04,600 stacktrace   L0045 ERROR|   File "/home/npiggin/src/qemu/build/pyvenv/lib/python3.11/site-packages/avocado/core/decorators.py", line 90, in wrapper
> > 2023-11-15 10:37:04,600 stacktrace   L0045 ERROR| return function(obj, *args, **kwargs)
> > 2023-11-15 10:37:04,600 stacktrace   L0045 ERROR| ^^
> > 2023-11-15 10:37:04,600 stacktrace   L0045 ERROR|   File "/home/npiggin/src/qemu/build/tests/avocado/reverse_debugging.py", line 264, in test_ppc64_powernv
> > 2023-11-15 10:37:04,600 stacktrace   L0045 ERROR| self.reverse_debugging()
> > 2023-11-15 10:37:04,600 stacktrace   L0045 ERROR|   File "/home/npiggin/src/qemu/build/tests/avocado/reverse_debugging.py", line 173, in reverse_debugging
> > 2023-11-15 10:37:04,600 stacktrace   L0045 ERROR| g.cmd(b'c')
> > 2023-11-15 10:37:04,600 stacktrace   L0045 ERROR|   File "/home/npiggin/src/qemu/build/pyvenv/lib/python3.11/site-packages/avocado/utils/gdb.py", line 783, in cmd
> > 2023-11-15 10:37:04,600 stacktrace   L0045 ERROR| response_payload = self.decode(result)
> > 2023-11-15 10:37:04,600 stacktrace   L0045 ERROR| ^^^
> > 2023-11-15 10:37:04,600 stacktrace   L0045 ERROR|   File "/home/npiggin/src/qemu/build/pyvenv/lib/python3.11/site-packages/avocado/utils/gdb.py", line 738, in decode
> > 2023-11-15 10:37:04,600 stacktrace   L0045 ERROR| raise InvalidPacketError
> > 2023-11-15 10:37:04,600 stacktrace   L0045 ERROR| avocado.utils.gdb.InvalidPacketError
> > 2023-11-15 10:37:04,600 stacktrace   L0046 ERROR|
> > 
> > It doesn't always fail the same gdb command
> > (I saw a bc on line 182 as well). It seems to be receiving a
> > zero length response?
> > 
> > No idea what's happening or why ppc seems to be more fragile.
> > Or why changing console connection affects gdb connection?

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v6 02/21] util/char_dev: Add open_cdev()

2023-11-15 Thread Eric Auger




On 11/14/23 11:09, Zhenzhong Duan wrote:
> From: Yi Liu 
>
> /dev/vfio/devices/vfioX may not exist. In that case it is still possible
> to open /dev/char/$major:$minor instead. Add helper function to abstract
> the cdev open.
>
> Suggested-by: Jason Gunthorpe 
> Signed-off-by: Yi Liu 
> Signed-off-by: Zhenzhong Duan 
Reviewed-by: Eric Auger 


Eric


> ---
>  MAINTAINERS |  3 ++
>  include/qemu/chardev_open.h | 16 
>  util/chardev_open.c | 81 +
>  util/meson.build|  1 +
>  4 files changed, 101 insertions(+)
>  create mode 100644 include/qemu/chardev_open.h
>  create mode 100644 util/chardev_open.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a4891f7bda..869ec3d5af 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2172,6 +2172,9 @@ M: Eric Auger 
>  S: Supported
>  F: backends/iommufd.c
>  F: include/sysemu/iommufd.h
> +F: include/qemu/chardev_open.h
> +F: util/chardev_open.c
> +
>  
>  vhost
>  M: Michael S. Tsirkin 
> diff --git a/include/qemu/chardev_open.h b/include/qemu/chardev_open.h
> new file mode 100644
> index 00..64e8fcfdcb
> --- /dev/null
> +++ b/include/qemu/chardev_open.h
> @@ -0,0 +1,16 @@
> +/*
> + * QEMU Chardev Helper
> + *
> + * Copyright (C) 2023 Intel Corporation.
> + *
> + * Authors: Yi Liu 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_CHARDEV_OPEN_H
> +#define QEMU_CHARDEV_OPEN_H
> +
> +int open_cdev(const char *devpath, dev_t cdev);
> +#endif
> diff --git a/util/chardev_open.c b/util/chardev_open.c
> new file mode 100644
> index 00..f776429788
> --- /dev/null
> +++ b/util/chardev_open.c
> @@ -0,0 +1,81 @@
> +/*
> + * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
> + * Copyright (C) 2023 Intel Corporation.
> + *
> + * This software is available to you under a choice of one of two
> + * licenses.  You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * OpenIB.org BSD license below:
> + *
> + *  Redistribution and use in source and binary forms, with or
> + *  without modification, are permitted provided that the following
> + *  conditions are met:
> + *
> + *  - Redistributions of source code must retain the above
> + *copyright notice, this list of conditions and the following
> + *disclaimer.
> + *
> + *  - Redistributions in binary form must reproduce the above
> + *copyright notice, this list of conditions and the following
> + *disclaimer in the documentation and/or other materials
> + *provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + *
> + * Authors: Yi Liu 
> + *
> + * Copied from
> + * https://github.com/linux-rdma/rdma-core/blob/master/util/open_cdev.c
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/chardev_open.h"
> +
> +static int open_cdev_internal(const char *path, dev_t cdev)
> +{
> +struct stat st;
> +int fd;
> +
> +fd = qemu_open_old(path, O_RDWR);
> +if (fd == -1) {
> +return -1;
> +}
> +if (fstat(fd, &st) || !S_ISCHR(st.st_mode) ||
> +(cdev != 0 && st.st_rdev != cdev)) {
> +close(fd);
> +return -1;
> +}
> +return fd;
> +}
> +
> +static int open_cdev_robust(dev_t cdev)
> +{
> +g_autofree char *devpath = NULL;
> +
> +/*
> + * This assumes that udev is being used and is creating the /dev/char/
> + * symlinks.
> + */
> +devpath = g_strdup_printf("/dev/char/%u:%u", major(cdev), minor(cdev));
> +return open_cdev_internal(devpath, cdev);
> +}
> +
> +int open_cdev(const char *devpath, dev_t cdev)
> +{
> +int fd;
> +
> +fd = open_cdev_internal(devpath, cdev);
> +if (fd == -1 && cdev != 0) {
> +return open_cdev_robust(cdev);
> +}
> +return fd;
> +}
> diff --git a/util/meson.build b/util/meson.build
> index c2322ef6e7..174c133368 100644
> --- a/util/meson.build
> +++ b/util/meson.build
> @@ -108,6 +108,7 @@ if have_block
>  util_ss.add(files('filemonitor-stub.c'))
>endif
>util_ss.add(when: 'CONFIG_LINUX', if_true: files('vfio-helpers.c'))
> +  util_ss.add(when: 'CONFIG_LINUX', if_true: files('chardev_open.c'))
>  endif
>  
>  if cpu == 'aarch64'
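
The fallback described in the commit message relies on the /dev/char/$major:$minor naming. A minimal sketch of building that path is shown below; format_cdev_path() is a hypothetical helper written for this note, and it assumes Linux with glibc's <sys/sysmacros.h> for major() and minor().

```c
#include <stdio.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Write "/dev/char/<major>:<minor>" for cdev into buf.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int format_cdev_path(dev_t cdev, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "/dev/char/%u:%u",
                     (unsigned int)major(cdev),
                     (unsigned int)minor(cdev));

    /* snprintf reports the length it wanted; detect truncation. */
    if (n < 0 || (size_t)n >= buflen) {
        return -1;
    }
    return 0;
}
```

As the comment in the patch notes, opening the resulting path only works when udev (or something equivalent) maintains the /dev/char/ symlinks.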

Re: [PATCH v6 03/21] vfio/common: return early if space isn't empty

2023-11-15 Thread Eric Auger




On 11/14/23 11:09, Zhenzhong Duan wrote:
> This is a trivial optimization. If there is an active container in the space,
> vfio_reset_handler will never be unregistered. So invert the check of
> space->containers and return early.
>
> Signed-off-by: Zhenzhong Duan 
Reviewed-by: Eric Auger 

Thanks

Eric
> ---
>  hw/vfio/common.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 572ae7c934..934f4f5446 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1462,10 +1462,13 @@ VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
>  
>  void vfio_put_address_space(VFIOAddressSpace *space)
>  {
> -if (QLIST_EMPTY(&space->containers)) {
> -QLIST_REMOVE(space, list);
> -g_free(space);
> +if (!QLIST_EMPTY(&space->containers)) {
> +return;
>  }
> +
> +QLIST_REMOVE(space, list);
> +g_free(space);
> +
>  if (QLIST_EMPTY(&vfio_address_spaces)) {
>  qemu_unregister_reset(vfio_reset_handler, NULL);
>  }

Re: [PATCH] block-backend: per-device throttling of BLOCK_IO_ERROR reports

2023-11-15 Thread Markus Armbruster

Vladimir Sementsov-Ogievskiy  writes:

> From: Leonid Kaplan 
>
> BLOCK_IO_ERROR events come from the guest, so we must throttle them.

Really?  Can you describe how a guest can trigger these errors?

> We still want per-device throttling, so let's use device id as a key.
>
> Signed-off-by: Leonid Kaplan 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  monitor/monitor.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/monitor/monitor.c b/monitor/monitor.c
> index 01ede1babd..ad0243e9d7 100644
> --- a/monitor/monitor.c
> +++ b/monitor/monitor.c
> @@ -309,6 +309,7 @@ int error_printf_unless_qmp(const char *fmt, ...)
>  static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
>  /* Limit guest-triggerable events to 1 per second */
>  [QAPI_EVENT_RTC_CHANGE]= { 1000 * SCALE_MS },
> +[QAPI_EVENT_BLOCK_IO_ERROR]= { 1000 * SCALE_MS },
>  [QAPI_EVENT_WATCHDOG]  = { 1000 * SCALE_MS },
>  [QAPI_EVENT_BALLOON_CHANGE]= { 1000 * SCALE_MS },
>  [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS },
> @@ -498,6 +499,10 @@ static unsigned int qapi_event_throttle_hash(const void *key)
>  hash += g_str_hash(qdict_get_str(evstate->data, "qom-path"));
>  }
>  
> +if (evstate->event == QAPI_EVENT_BLOCK_IO_ERROR) {
> +hash += g_str_hash(qdict_get_str(evstate->data, "device"));
> +}
> +
>  return hash;
>  }
>  
> @@ -525,6 +530,11 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b)
> qdict_get_str(evb->data, "qom-path"));
>  }
>  
> +if (eva->event == QAPI_EVENT_BLOCK_IO_ERROR) {
> +return !strcmp(qdict_get_str(eva->data, "device"),
> +   qdict_get_str(evb->data, "device"));
> +}
> +
>  return TRUE;
>  }

Missing:

  diff --git a/qapi/block-core.json b/qapi/block-core.json
  index ca390c5700..32c2c2f030 100644
  --- a/qapi/block-core.json
  +++ b/qapi/block-core.json
  @@ -5559,6 +5559,8 @@
   # Note: If action is "stop", a STOP event will eventually follow the
   # BLOCK_IO_ERROR event
   #
  +# Note: This event is rate-limited.
  +#
   # Since: 0.13
   #
   # Example:
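
For readers following the hash/equal pair above: the two hunks have to agree, so that events for the same device land in the same throttle slot and events for different devices never compare equal. Below is a minimal standalone sketch of that idea. It is not QEMU code: the sketch_* names, the reduced event enum, and the djb2 string hash (standing in for g_str_hash()) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced event set; not QEMU's QAPIEvent enum. */
enum sketch_event {
    SKETCH_EVENT_RTC_CHANGE,
    SKETCH_EVENT_BLOCK_IO_ERROR,
};

/* Hypothetical throttle key: the event plus an optional device id. */
struct sketch_key {
    enum sketch_event event;
    const char *device;     /* only consulted for BLOCK_IO_ERROR */
};

/* djb2 string hash, used here as a stand-in for g_str_hash(). */
static uint32_t sketch_str_hash(const char *s)
{
    uint32_t h = 5381;

    while (*s) {
        h = h * 33 + (unsigned char)*s++;
    }
    return h;
}

/* Mix the device id into the hash only for per-device events. */
static uint32_t sketch_throttle_hash(const struct sketch_key *k)
{
    uint32_t hash = (uint32_t)k->event * 255;

    if (k->event == SKETCH_EVENT_BLOCK_IO_ERROR) {
        assert(k->device != NULL);
        hash += sketch_str_hash(k->device);
    }
    return hash;
}

/* Keys are equal when the event matches and, if relevant, the device. */
static bool sketch_throttle_equal(const struct sketch_key *a,
                                  const struct sketch_key *b)
{
    if (a->event != b->event) {
        return false;
    }
    if (a->event == SKETCH_EVENT_BLOCK_IO_ERROR) {
        return strcmp(a->device, b->device) == 0;
    }
    return true;
}
```

With this shape, two BLOCK_IO_ERROR keys for "virtio0" share one throttle slot, while a key for "virtio1" gets its own, which is the per-device behavior the commit message asks for.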

[PATCH] docs/system: Add recommendations to Hyper-V enlightenments doc

2023-11-15 Thread Vitaly Kuznetsov

While hyperv.rst already documents all currently implemented Hyper-V
enlightenments, it may be unclear which set is recommended to achieve
the best result. Add a corresponding section to the doc.

Signed-off-by: Vitaly Kuznetsov 
---
 docs/system/i386/hyperv.rst | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/docs/system/i386/hyperv.rst b/docs/system/i386/hyperv.rst
index 2505dc4c86e0..1c7c4a3981ea 100644
--- a/docs/system/i386/hyperv.rst
+++ b/docs/system/i386/hyperv.rst
@@ -278,6 +278,36 @@ Supplementary features
   feature alters this behavior and only allows the guest to use exposed Hyper-V
   enlightenments.
 
+Recommendations
+---------------
+
+To achieve the best performance of Windows and Hyper-V guests and unless there
+are any specific requirements (e.g. migration to older QEMU/KVM versions,
+emulating a specific Hyper-V version, ...), it is recommended to enable all
+currently implemented Hyper-V enlightenments with the following exceptions:
+
+- ``hv-syndbg``, ``hv-passthrough``, ``hv-enforce-cpuid`` should not be enabled
+  in production configurations as these are debugging/development features.
+- ``hv-reset`` can be avoided as modern Hyper-V versions don't expose it.
+- ``hv-evmcs`` can (and should) be enabled on Intel CPUs only. While the feature
+  is only used in nested configurations (Hyper-V, WSL2), enabling it for regular
+  Windows guests should not have any negative effects.
+- ``hv-no-nonarch-coresharing`` must only be enabled if vCPUs are properly pinned
+  so no non-architectural core sharing is possible.
+- ``hv-vendor-id``, ``hv-version-id-build``, ``hv-version-id-major``,
+  ``hv-version-id-minor``, ``hv-version-id-spack``, ``hv-version-id-sbranch``,
+  ``hv-version-id-snumber`` can be left unchanged; guests are not supposed to
+  behave differently when a different Hyper-V version is presented to them.
+- ``hv-crash`` must only be enabled if the crash information is consumed via
+  QAPI by higher levels of the virtualization stack. Enabling this feature
+  effectively prevents Windows from creating dumps upon crashes.
+- ``hv-reenlightenment`` can only be used on hardware which supports TSC
+  scaling or when guest migration is not needed.
+- ``hv-spinlocks`` should be set to e.g. 0xfff when host CPUs are overcommitted
+  (meaning there are other scheduled tasks or guests) and can be left unchanged
+  from the default value (0x) otherwise.
+- ``hv-avic``/``hv-apicv`` should not be enabled if the hardware does not
+  support APIC virtualization (Intel APICv, AMD AVIC).
 
 Useful links
 
-- 
2.41.0

Re: [PATCH v6 05/21] vfio/iommufd: Relax assert check for iommufd backend

2023-11-15 Thread Eric Auger




On 11/14/23 11:09, Zhenzhong Duan wrote:
> Currently iommufd doesn't support dirty page sync yet,
> but it will not block us doing live migration if VFIO
> migration is force enabled.
>
> So in this case we allow set_dirty_page_tracking to be NULL.
> Note we don't need same change for query_dirty_bitmap because
> when dirty page sync isn't supported, query_dirty_bitmap will
> never be called.
>
> Suggested-by: Cédric Le Goater 
> Signed-off-by: Zhenzhong Duan 
> Reviewed-by: Cédric Le Goater 

Reviewed-by: Eric Auger 


Eric
> ---
>  hw/vfio/container-base.c | 4 
>  hw/vfio/container.c  | 4 
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 71f7274973..eee2dcfe76 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -55,6 +55,10 @@ void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
>  int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
> bool start)
>  {
> +if (!bcontainer->dirty_pages_supported) {
> +return 0;
> +}
> +
>  g_assert(bcontainer->ops->set_dirty_page_tracking);
>  return bcontainer->ops->set_dirty_page_tracking(bcontainer, start);
>  }
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 6bacf38222..ed2d721b2b 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -216,10 +216,6 @@ static int vfio_legacy_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
>  .argsz = sizeof(dirty),
>  };
>  
> -if (!bcontainer->dirty_pages_supported) {
> -return 0;
> -}
> -
>  if (start) {
>  dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
>  } else {
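
The effect of moving the check can be shown outside of QEMU: once the capability test lives in the generic wrapper, a backend without dirty tracking may leave the callback NULL and the assert is never reached for it. The sketch below is an illustration only. The sketch_* types and the two ops tables are hypothetical stand-ins for VFIOContainerBase and its ops, not the real structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_container;

/* Hypothetical ops table; a backend may leave the callback NULL. */
struct sketch_container_ops {
    int (*set_dirty_page_tracking)(struct sketch_container *c, bool start);
};

/* Hypothetical container with a capability flag set by the backend. */
struct sketch_container {
    bool dirty_pages_supported;
    bool tracking_active;
    const struct sketch_container_ops *ops;
};

/* A "legacy"-style backend that implements tracking. */
static int sketch_legacy_set_tracking(struct sketch_container *c, bool start)
{
    c->tracking_active = start;
    return 0;
}

static const struct sketch_container_ops sketch_legacy_ops = {
    .set_dirty_page_tracking = sketch_legacy_set_tracking,
};

/* A backend without dirty tracking: the callback is simply absent. */
static const struct sketch_container_ops sketch_no_tracking_ops = {
    .set_dirty_page_tracking = NULL,
};

/* Generic wrapper: check the capability before touching the callback. */
static int sketch_set_dirty_page_tracking(struct sketch_container *c,
                                          bool start)
{
    if (!c->dirty_pages_supported) {
        return 0;   /* nothing to do, and the callback may be NULL */
    }

    assert(c->ops->set_dirty_page_tracking);
    return c->ops->set_dirty_page_tracking(c, start);
}
```

A container using sketch_no_tracking_ops with dirty_pages_supported == false returns 0 without tripping the assert, which mirrors the relaxed check described in the commit message.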

Re: [PATCH v3 1/2] qom: new object to associate device to numa node

2023-11-15 Thread Markus Armbruster

 writes:

> From: Ankit Agrawal 
>
> NVIDIA GPUs support the MIG (Multi-Instance GPU) feature [1], which allows
> partitioning of the GPU device resources (including device memory) into
> several (up to 8) isolated instances. Each memory partition needs
> a dedicated NUMA node to operate. The partitions are not fixed and they
> can be created/deleted at runtime.
>
> Unfortunately, Linux does not provide a means to dynamically create/destroy
> NUMA nodes, and implementing such a feature is not expected to be trivial. The
> nodes that the OS discovers at boot time while parsing SRAT remain fixed. So
> we utilize the Generic Initiator Affinity structures, which allow association
> between nodes and devices. Multiple GI structures per BDF are possible,
> allowing creation of multiple nodes by exposing a unique PXM in each of these
> structures.
>
> Introduce a new acpi-generic-initiator object to allow the host admin to
> provide the device and the corresponding NUMA nodes. QEMU maintains this
> association and uses this object to build the requisite GI Affinity Structure.
>
> An admin can provide the range of nodes using a ':' delimited numalist and

Please don't create special-purpose syntax, use existing general-purpose
syntax.  See also review of qom.json below.

> link it to a device by providing its id. The node ids are extracted from
> numalist and stored as a uint16List. The following sample creates 8 nodes
> and links them to the device dev0:
>
> -numa node,nodeid=2 \
> -numa node,nodeid=3 \
> -numa node,nodeid=4 \
> -numa node,nodeid=5 \
> -numa node,nodeid=6 \
> -numa node,nodeid=7 \
> -numa node,nodeid=8 \
> -numa node,nodeid=9 \
> -device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
> -object acpi-generic-initiator,id=gi0,device=dev0,numalist=2:3:4:5:6:7:8:9 \
>
> [1] https://www.nvidia.com/en-in/technologies/multi-instance-gpu
>
> Signed-off-by: Ankit Agrawal 
> ---
>  hw/acpi/acpi-generic-initiator.c | 80 
>  hw/acpi/meson.build  |  1 +
>  include/hw/acpi/acpi-generic-initiator.h | 29 +
>  qapi/qom.json| 16 +
>  4 files changed, 126 insertions(+)
>  create mode 100644 hw/acpi/acpi-generic-initiator.c
>  create mode 100644 include/hw/acpi/acpi-generic-initiator.h
>
> diff --git a/hw/acpi/acpi-generic-initiator.c b/hw/acpi/acpi-generic-initiator.c
> new file mode 100644
> index 00..0699c878e2
> --- /dev/null
> +++ b/hw/acpi/acpi-generic-initiator.c
> @@ -0,0 +1,80 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/qdev-properties.h"
> +#include "qapi/error.h"
> +#include "qapi/visitor.h"
> +#include "qom/object_interfaces.h"
> +#include "qom/object.h"
> +#include "hw/qdev-core.h"
> +#include "hw/vfio/vfio-common.h"
> +#include "hw/vfio/pci.h"
> +#include "hw/pci/pci_device.h"
> +#include "sysemu/numa.h"
> +#include "hw/acpi/acpi-generic-initiator.h"
> +
> +OBJECT_DEFINE_TYPE_WITH_INTERFACES(AcpiGenericInitiator, acpi_generic_initiator,
> +   ACPI_GENERIC_INITIATOR, OBJECT,
> +   { TYPE_USER_CREATABLE },
> +   { NULL })
> +
> +OBJECT_DECLARE_SIMPLE_TYPE(AcpiGenericInitiator, ACPI_GENERIC_INITIATOR)
> +
> +static void acpi_generic_initiator_init(Object *obj)
> +{
> +AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> +gi->device = NULL;
> +gi->nodelist = NULL;
> +}
> +
> +static void acpi_generic_initiator_finalize(Object *obj)
> +{
> +AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> +
> +g_free(gi->device);
> +qapi_free_uint16List(gi->nodelist);
> +}
> +
> +static void acpi_generic_initiator_set_device(Object *obj, const char *val,
> +  Error **errp)
> +{
> +AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> +
> +gi->device = g_strdup(val);
> +}
> +
> +static void acpi_generic_initiator_set_nodelist(Object *obj, const char *val,
> +Error **errp)
> +{
> +AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> +char *value = g_strdup(val);
> +uint16_t node;
> +uint16List **tail = &(gi->nodelist);
> +char *nodestr = value ? strtok(value, ":") : NULL;
> +
> +while (nodestr) {
> +if (sscanf(nodestr, "%hu", &node) != 1) {
> +error_setg(errp, "failed to read node-id");
> +return;
> +}
> +
> +if (node >= MAX_NODES) {
> +error_setg(errp, "invalid node-id");
> +return;
> +}
> +
> +QAPI_LIST_APPEND(tail, node);
> +nodestr = strtok(NULL, ":");
> +}
> +}
> +
> +static void acpi_generic_initiator_class_init(ObjectClass *oc, void *data)
> +{
> +object_class_property_add_str(oc, ACPI_GENERIC_INITIATOR_DEVICE_PROP, NULL,
> +

[PATCH 0/2] hw/pci-host: Fix Designware no address match behavior

2023-11-15 Thread Max Hsu

IMX6DQRM Rev4, in chapter 48.3.9.1, specifies that iATU is instantiated
inside the PCIe core, translating TLPs in and out of the PCIe core.

Currently, the model faces issues with TLPs using memory addresses not
registered on the iATU.
The Designware spec (48.3.9.2 for outbound, 48.3.9.3 for inbound)
mentions that TLPs should continue without address translation.

For inbound access, the model uses iATU inbound region 0 for dummy access.
However, the Linux kernel driver is unaware of this and disables the
region, which blocks TLPs.

For outbound access, the model didn't implement this, blocking any
access outside the iATU outbound list.

This patch series addresses these issues separately for inbound and
outbound. After applying the patches, the model has been tested with
the e1000e Ethernet card, ensuring proper functioning of network
transmissions and MSI interrupts.

Signed-off-by: Max Hsu 

Max Hsu (2):
  hw/pci-host: Designware: Fix inbound iATU no address match behavior
  hw/pci-host: Designware: Add outbound iATU no address match behavior

 hw/pci-host/designware.c | 40 +++-
 1 file changed, 31 insertions(+), 9 deletions(-)

-- 
2.34.1
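
As a reading aid for the series, the "no address match" rule quoted from the reference manual can be modeled in a few lines: look for an enabled region that covers the address, translate if one is found, and otherwise pass the address through unchanged. This is a simplified sketch, not the designware.c implementation (which uses MemoryRegion aliases and priorities). The sketch_iatu_region layout and the function name are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one iATU region. */
struct sketch_iatu_region {
    bool enabled;
    uint64_t base;      /* first address the region matches */
    uint64_t limit;     /* last address the region matches (inclusive) */
    uint64_t target;    /* translated address corresponding to base */
};

/*
 * Translate addr through the first enabled region that covers it.
 * If no region matches, the address continues untranslated.
 */
static uint64_t sketch_iatu_translate(const struct sketch_iatu_region *regions,
                                      size_t count, uint64_t addr)
{
    assert(regions != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        const struct sketch_iatu_region *r = &regions[i];

        if (r->enabled && addr >= r->base && addr <= r->limit) {
            return r->target + (addr - r->base);
        }
    }

    /* No address match: 1:1 pass-through. */
    return addr;
}
```

Note that a disabled region behaves like no region at all, so a driver that disables every region still leaves the pass-through path working. That is the property the inbound fix restores.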

[PATCH 1/2] hw/pci-host: Designware: Fix inbound iATU no address match behavior

2023-11-15 Thread Max Hsu

IMX6DQRM Rev4, in chapter 48.3.9.3, specifies that for inbound iATU
with no address match: 'If there is no match, then the address is
untranslated.'
The current model implementation registers inbound region 0 as
untranslated dummy, intending to serve as a passing medium for
the no-match address behavior using MemoryRegion.

However, a bug exists where the Linux Kernel driver of Designware PCIe
RC is unaware that inbound region 0 is registered for special setup.
During Kernel driver initialization, the driver overwrites the target
address to 0x_ and later disables all regions, rendering the
untranslated passing medium ineffective.
Consequently, TLP cannot pass iATU, and the transaction is blocked.

To address this issue, we propose "inbound untranslated pass" which is
consistently enabled and distinct from the usage of iATU regions.
We achieve this by introducing a new MemoryRegion with the
lowest priority to prevent conflicts with configured iATU regions.

This fix has been tested with the integration of Designware PCIe RC
along with the e1000e Ethernet card, ensuring proper functioning of
network transmissions and MSI interrupts.

Signed-off-by: Max Hsu 
---
 hw/pci-host/designware.c | 29 -
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/hw/pci-host/designware.c b/hw/pci-host/designware.c
index f477f97847..83dd9b1aaf 100644
--- a/hw/pci-host/designware.c
+++ b/hw/pci-host/designware.c
@@ -487,17 +487,28 @@ static void designware_pcie_root_realize(PCIDevice *dev, Error **errp)
 }
 
 /*
- * If no inbound iATU windows are configured, HW defaults to
- * letting inbound TLPs to pass in. We emulate that by explicitly
- * configuring first inbound window to cover all of target's
- * address space.
+ * For HW iATU address no match behavior, the TLP should continue with
+ * untranslated address.
  *
- * NOTE: This will not work correctly for the case when first
- * configured inbound window is window 0
+ * We emulate this behavior by adding extra MemoryRegions to create a
+ * 1:1 mapping between PCI address space and cpu address space within
+ * the 64-bit range, encompassing both inbound and outbound directions.
+ *
+ * To avoid interfering with the configured iATU regions and potentially
+ * producing incorrect addresses, the two untranslated regions are set
+ * to have the lowest priority.
  */
-viewport = &root->viewports[DESIGNWARE_PCIE_VIEWPORT_INBOUND][0];
-viewport->cr[1] = DESIGNWARE_PCIE_ATU_ENABLE;
-designware_pcie_update_viewport(root, viewport);
+MemoryRegion *inbound_untranslated = g_new(MemoryRegion, 1);
+
+memory_region_init_alias(inbound_untranslated, OBJECT(root),
+ "inbound untranslated pass",
+ get_system_memory(), dummy_offset, dummy_size);
+memory_region_add_subregion_overlap(&host->pci.address_space_root,
+dummy_offset, inbound_untranslated,
+INT32_MIN);
+memory_region_set_size(inbound_untranslated, UINT64_MAX);
+memory_region_set_address(inbound_untranslated, 0x0ULL);
+memory_region_set_enabled(inbound_untranslated, true);
 
 memory_region_init_io(&root->msi.iomem, OBJECT(root),
   &designware_pci_host_msi_ops,
-- 
2.34.1

[PATCH 2/2] hw/pci-host: Designware: Add outbound iATU no address match behavior

2023-11-15 Thread Max Hsu

IMX6DQRM Rev4, in chapter 48.3.9.2, specifies for outbound iATU with
no address match: 'If there is no address match, then the address is
untranslated.'
The current model implementation only considers inbound occurrences,
neglecting outbound scenarios.

To address this, we introduce a new MemoryRegion to handle the behavior
of no address match in outbound transactions.

This fix has been tested with the integration of Designware PCIe RC
along with the e1000e Ethernet card, ensuring proper functioning of
network transmissions and MSI interrupts.

Signed-off-by: Max Hsu 
---
 hw/pci-host/designware.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/hw/pci-host/designware.c b/hw/pci-host/designware.c
index 83dd9b1aaf..d0be8dec68 100644
--- a/hw/pci-host/designware.c
+++ b/hw/pci-host/designware.c
@@ -499,6 +499,7 @@ static void designware_pcie_root_realize(PCIDevice *dev, Error **errp)
  * to have the lowest priority.
  */
 MemoryRegion *inbound_untranslated = g_new(MemoryRegion, 1);
+MemoryRegion *outbound_untranslated = g_new(MemoryRegion, 1);
 
 memory_region_init_alias(inbound_untranslated, OBJECT(root),
  "inbound untranslated pass",
@@ -510,6 +511,16 @@ static void designware_pcie_root_realize(PCIDevice *dev, Error **errp)
 memory_region_set_address(inbound_untranslated, 0x0ULL);
 memory_region_set_enabled(inbound_untranslated, true);
 
+memory_region_init_alias(outbound_untranslated, OBJECT(root),
+ "outbound untranslated pass",
+ &host->pci.memory, dummy_offset, dummy_size);
+memory_region_add_subregion_overlap(get_system_memory(),
+dummy_offset, outbound_untranslated,
+INT32_MIN);
+memory_region_set_size(outbound_untranslated, UINT64_MAX);
+memory_region_set_address(outbound_untranslated, 0x0ULL);
+memory_region_set_enabled(outbound_untranslated, true);
+
 memory_region_init_io(&root->msi.iomem, OBJECT(root),
   &designware_pci_host_msi_ops,
   root, "pcie-msi", 0x4);
-- 
2.34.1

Re: [PATCH-for-9.0 0/8] hw/pci-host/designware: QOM shuffling (Host bridge <-> Root function)

2023-11-15 Thread Philippe Mathieu-Daudé


Cc'ing Sifive developers :)

On 12/10/23 14:18, Philippe Mathieu-Daudé wrote:

Hi,

While trying this PCI host bridge in a heterogeneous setup
I noticed a few discrepancies due to the fact that host bridge
pieces were managed by the root function.

This series moves these pieces (ViewPort and MSI regs) to the
host bridge side where they belong. Unfortunately this
breaks migration.

I recommend reviewing using 'git-diff --color-moved=dimmed-zebra'.

Regards,

Phil.

Philippe Mathieu-Daudé (8):
   hw/pci-host/designware: Declare CPU QOM types using DEFINE_TYPES()
 macro
   hw/pci-host/designware: Initialize root function in host bridge
 realize
   hw/pci-host/designware: Add 'host_mem' variable for clarity
   hw/pci-host/designware: Hoist host controller in root function #0
   hw/pci-host/designware: Keep host reference in DesignwarePCIEViewport
   hw/pci-host/designware: Move viewports from root func to host bridge
   hw/pci-host/designware: Move MSI registers from root func to host
 bridge
   hw/pci-host/designware: Create ViewPorts during host bridge
 realization

  include/hw/pci-host/designware.h |  20 +-
  hw/pci-host/designware.c | 376 +++
  2 files changed, 187 insertions(+), 209 deletions(-)

[PATCH] tests/avocado/multiprocess: Add asset hashes to silence warnings

2023-11-15 Thread Thomas Huth

The multiprocess test is currently succeeding with an annoying warning:

 (1/2) tests/avocado/multiprocess.py:Multiprocess.test_multiprocess_x86_64:
   WARN: Test passed but there were warnings during execution. Check
   the log for details

In the log, you can find an entry like:

 WARNI| No hash provided. Cannot check the asset file integrity.

Add the proper asset hashes to avoid those warnings.

Signed-off-by: Thomas Huth 
---
 tests/avocado/multiprocess.py | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/tests/avocado/multiprocess.py b/tests/avocado/multiprocess.py
index 9112a4cacc..ee7490ae08 100644
--- a/tests/avocado/multiprocess.py
+++ b/tests/avocado/multiprocess.py
@@ -18,8 +18,8 @@ class Multiprocess(QemuSystemTest):
 """
 KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
 
-def do_test(self, kernel_url, initrd_url, kernel_command_line,
-machine_type):
+def do_test(self, kernel_url, kernel_hash, initrd_url, initrd_hash,
+kernel_command_line, machine_type):
 """Main test method"""
 self.require_accelerator('kvm')
 self.require_multiprocess()
@@ -30,8 +30,8 @@ def do_test(self, kernel_url, initrd_url, kernel_command_line,
 os.set_inheritable(proxy_sock.fileno(), True)
 os.set_inheritable(remote_sock.fileno(), True)
 
-kernel_path = self.fetch_asset(kernel_url)
-initrd_path = self.fetch_asset(initrd_url)
+kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
+initrd_path = self.fetch_asset(initrd_url, asset_hash=initrd_hash)
 
 # Create remote process
 remote_vm = self.get_vm()
@@ -72,13 +72,16 @@ def test_multiprocess_x86_64(self):
 kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
   '/linux/releases/31/Everything/x86_64/os/images'
   '/pxeboot/vmlinuz')
+kernel_hash = '5b6f6876e1b5bda314f93893271da0d5777b1f3c'
 initrd_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
   '/linux/releases/31/Everything/x86_64/os/images'
   '/pxeboot/initrd.img')
+initrd_hash = 'dd0340a1b39bd28f88532babd4581c67649ec5b1'
 kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
'console=ttyS0 rdinit=/bin/bash')
 machine_type = 'pc'
-self.do_test(kernel_url, initrd_url, kernel_command_line, machine_type)
+self.do_test(kernel_url, kernel_hash, initrd_url, initrd_hash,
+ kernel_command_line, machine_type)
 
 def test_multiprocess_aarch64(self):
 """
@@ -87,10 +90,13 @@ def test_multiprocess_aarch64(self):
 kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
   '/linux/releases/31/Everything/aarch64/os/images'
   '/pxeboot/vmlinuz')
+kernel_hash = '3505f2751e2833c681de78cee8dda1e49cabd2e8'
 initrd_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
   '/linux/releases/31/Everything/aarch64/os/images'
   '/pxeboot/initrd.img')
+initrd_hash = '519a1962daf17d67fc3a9c89d45affcb399607db'
 kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
'rdinit=/bin/bash console=ttyAMA0')
 machine_type = 'virt,gic-version=3'
-self.do_test(kernel_url, initrd_url, kernel_command_line, machine_type)
+self.do_test(kernel_url, kernel_hash, initrd_url, initrd_hash,
+ kernel_command_line, machine_type)
-- 
2.41.0

[PATCH 07/16] hw/uefi: add var-service-auth.c

2023-11-15 Thread Gerd Hoffmann

This implements authenticated variable handling (AuthVariableLib in edk2).

For now this implements the bare minimum to make secure boot work,
by initializing the 'SecureBoot' variable.

Support for authenticated variable updates is not implemented yet, for
now they are read-only so the guest can neither provision secure boot
keys nor update the 'dbx' database.

Signed-off-by: Gerd Hoffmann 
---
 hw/uefi/var-service-auth.c | 91 ++
 1 file changed, 91 insertions(+)
 create mode 100644 hw/uefi/var-service-auth.c

diff --git a/hw/uefi/var-service-auth.c b/hw/uefi/var-service-auth.c
new file mode 100644
index ..e7cff65275c2
--- /dev/null
+++ b/hw/uefi/var-service-auth.c
@@ -0,0 +1,91 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * uefi vars device - AuthVariableLib
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/dma.h"
+
+#include "hw/uefi/var-service.h"
+
+static const uint16_t name_pk[]  = { 'P', 'K',
+ 0 };
+static const uint16_t name_setup_mode[]  = { 'S', 'e', 't', 'u', 'p',
+ 'M', 'o', 'd', 'e',
+ 0 };
+static const uint16_t name_sb[]  = { 'S', 'e', 'c', 'u', 'r', 'e',
+ 'B', 'o', 'o', 't',
+ 0 };
+static const uint16_t name_sb_enable[]  = { 'S', 'e', 'c', 'u', 'r', 'e',
+'B', 'o', 'o', 't',
+'E', 'n', 'a', 'b', 'l', 'e',
+0 };
+static const uint16_t name_custom_mode[]  = { 'C', 'u', 's', 't', 'o', 'm',
+  'M', 'o', 'd', 'e',
+  0 };
+
+/* AuthVariableLibInitialize */
+void uefi_vars_auth_init(uefi_vars_state *uv)
+{
+uefi_variable *pk_var, *sbe_var;
+uint8_t platform_mode, sb, sbe, custom_mode;
+
+/* SetupMode */
+pk_var = uefi_vars_find_variable(uv, EfiGlobalVariable,
+ name_pk, sizeof(name_pk));
+if (!pk_var) {
+platform_mode = SETUP_MODE;
+} else {
+platform_mode = USER_MODE;
+}
+uefi_vars_set_variable(uv, EfiGlobalVariable,
+   name_setup_mode, sizeof(name_setup_mode),
+   EFI_VARIABLE_BOOTSERVICE_ACCESS |
+   EFI_VARIABLE_RUNTIME_ACCESS,
+   &platform_mode, sizeof(platform_mode));
+
+/* TODO: SignatureSupport */
+
+/* SecureBootEnable */
+sbe = SECURE_BOOT_DISABLE;
+sbe_var = uefi_vars_find_variable(uv, EfiSecureBootEnableDisable,
+  name_sb_enable, sizeof(name_sb_enable));
+if (sbe_var) {
+if (platform_mode == USER_MODE) {
+sbe = ((uint8_t*)sbe_var->data)[0];
+}
+} else if (platform_mode == USER_MODE) {
+sbe = SECURE_BOOT_ENABLE;
+uefi_vars_set_variable(uv, EfiSecureBootEnableDisable,
+   name_sb_enable, sizeof(name_sb_enable),
+   EFI_VARIABLE_NON_VOLATILE |
+   EFI_VARIABLE_BOOTSERVICE_ACCESS,
+   &sbe, sizeof(sbe));
+}
+
+/* SecureBoot */
+if ((sbe == SECURE_BOOT_ENABLE) && (platform_mode == USER_MODE)) {
+sb = SECURE_BOOT_MODE_ENABLE;
+} else {
+sb = SECURE_BOOT_MODE_DISABLE;
+}
+uefi_vars_set_variable(uv, EfiGlobalVariable,
+   name_sb, sizeof(name_sb),
+   EFI_VARIABLE_BOOTSERVICE_ACCESS |
+   EFI_VARIABLE_RUNTIME_ACCESS,
+   &sb, sizeof(sb));
+
+/* CustomMode */
+custom_mode = STANDARD_SECURE_BOOT_MODE;
+uefi_vars_set_variable(uv, EfiCustomModeEnable,
+   name_custom_mode, sizeof(name_custom_mode),
+   EFI_VARIABLE_NON_VOLATILE |
+   EFI_VARIABLE_BOOTSERVICE_ACCESS,
+   &custom_mode, sizeof(custom_mode));
+
+/* TODO: certdb */
+/* TODO: certdbv */
+/* TODO: VendorKeysNv */
+/* TODO: VendorKeys */
+}
-- 
2.41.0
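
The initialization above boils down to a small decision table: PK present or not decides SetupMode, a stored SecureBootEnable value is honored only in user mode, and SecureBoot is on only when both agree. The following standalone sketch restates that logic without the variable store. It is an illustration under assumptions: the SKETCH_* constants mirror the values from the edk2 header in this series, while the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror the defines in var-service-edk2.h from this series. */
#define SKETCH_SETUP_MODE           1
#define SKETCH_USER_MODE            0
#define SKETCH_SECURE_BOOT_ENABLE   1
#define SKETCH_SECURE_BOOT_DISABLE  0

/* Hypothetical result bundle for the three derived variables. */
struct sketch_sb_state {
    uint8_t setup_mode;     /* SetupMode */
    uint8_t sb_enable;      /* SecureBootEnable */
    uint8_t secure_boot;    /* SecureBoot */
};

/*
 * have_pk:          whether a PK variable exists.
 * stored_sb_enable: pointer to a stored SecureBootEnable byte,
 *                   or NULL if that variable does not exist yet.
 */
static struct sketch_sb_state
sketch_derive_sb_state(bool have_pk, const uint8_t *stored_sb_enable)
{
    struct sketch_sb_state st;

    st.setup_mode = have_pk ? SKETCH_USER_MODE : SKETCH_SETUP_MODE;

    st.sb_enable = SKETCH_SECURE_BOOT_DISABLE;
    if (stored_sb_enable != NULL) {
        if (st.setup_mode == SKETCH_USER_MODE) {
            st.sb_enable = *stored_sb_enable;
        }
    } else if (st.setup_mode == SKETCH_USER_MODE) {
        /* No stored value yet: default to enabled in user mode. */
        st.sb_enable = SKETCH_SECURE_BOOT_ENABLE;
    }

    st.secure_boot = (st.sb_enable == SKETCH_SECURE_BOOT_ENABLE &&
                      st.setup_mode == SKETCH_USER_MODE) ? 1 : 0;

    /* Invariant: secure boot is never active while in setup mode. */
    assert(st.secure_boot == 0 || st.setup_mode == SKETCH_USER_MODE);
    return st;
}
```

One consequence worth noting is that without a PK, secure boot stays off regardless of any stored SecureBootEnable value, which matches the order of the checks in uefi_vars_auth_init().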

[PATCH 02/16] hw/uefi: add include/hw/uefi/var-service-edk2.h

2023-11-15 Thread Gerd Hoffmann

A bunch of #defines and structs copied over from edk2,
mostly needed to decode and encode the messages in the
communication buffer.

Signed-off-by: Gerd Hoffmann 
---
 include/hw/uefi/var-service-edk2.h | 184 +
 1 file changed, 184 insertions(+)
 create mode 100644 include/hw/uefi/var-service-edk2.h

diff --git a/include/hw/uefi/var-service-edk2.h b/include/hw/uefi/var-service-edk2.h
new file mode 100644
index ..354b74d1d71c
--- /dev/null
+++ b/include/hw/uefi/var-service-edk2.h
@@ -0,0 +1,184 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * uefi-vars device - structs and defines from edk2
+ *
+ * Note: The edk2 UINTN type has been mapped to uint64_t,
+ *   so the structs are compatible with 64bit edk2 builds.
+ */
+#ifndef QEMU_UEFI_VAR_SERVICE_EDK2_H
+#define QEMU_UEFI_VAR_SERVICE_EDK2_H
+
+#include "qemu/uuid.h"
+
+#define MAX_BIT   0x8000ULL
+#define ENCODE_ERROR(StatusCode)  (MAX_BIT | (StatusCode))
+#define EFI_SUCCESS   0
+#define EFI_INVALID_PARAMETER ENCODE_ERROR(2)
+#define EFI_UNSUPPORTED   ENCODE_ERROR(3)
+#define EFI_BAD_BUFFER_SIZE   ENCODE_ERROR(4)
+#define EFI_BUFFER_TOO_SMALL  ENCODE_ERROR(5)
+#define EFI_WRITE_PROTECTED   ENCODE_ERROR(8)
+#define EFI_OUT_OF_RESOURCES  ENCODE_ERROR(9)
+#define EFI_NOT_FOUND ENCODE_ERROR(14)
+#define EFI_ACCESS_DENIED ENCODE_ERROR(15)
+#define EFI_ALREADY_STARTED   ENCODE_ERROR(20)
+
+#define EFI_VARIABLE_NON_VOLATILE   0x01
+#define EFI_VARIABLE_BOOTSERVICE_ACCESS 0x02
+#define EFI_VARIABLE_RUNTIME_ACCESS 0x04
+#define EFI_VARIABLE_HARDWARE_ERROR_RECORD  0x08
+#define EFI_VARIABLE_AUTHENTICATED_WRITE_ACCESS 0x10  // deprecated
+#define EFI_VARIABLE_TIME_BASED_AUTHENTICATED_WRITE_ACCESS  0x20
+#define EFI_VARIABLE_APPEND_WRITE   0x40
+
+/* SecureBootEnable */
+#define SECURE_BOOT_ENABLE 1
+#define SECURE_BOOT_DISABLE0
+
+/* SecureBoot */
+#define SECURE_BOOT_MODE_ENABLE1
+#define SECURE_BOOT_MODE_DISABLE   0
+
+/* CustomMode */
+#define CUSTOM_SECURE_BOOT_MODE1
+#define STANDARD_SECURE_BOOT_MODE  0
+
+/* SetupMode */
+#define SETUP_MODE 1
+#define USER_MODE  0
+
+typedef uint64_t efi_status;
+typedef struct mm_header mm_header;
+
+/* EFI_MM_COMMUNICATE_HEADER */
+struct mm_header {
+QemuUUID  guid;
+uint64_t  length;
+};
+
+/* --- EfiSmmVariableProtocol  */
+
+#define SMM_VARIABLE_FUNCTION_GET_VARIABLE1
+#define SMM_VARIABLE_FUNCTION_GET_NEXT_VARIABLE_NAME  2
+#define SMM_VARIABLE_FUNCTION_SET_VARIABLE3
+#define SMM_VARIABLE_FUNCTION_QUERY_VARIABLE_INFO 4
+#define SMM_VARIABLE_FUNCTION_READY_TO_BOOT   5
+#define SMM_VARIABLE_FUNCTION_EXIT_BOOT_SERVICE   6
+#define SMM_VARIABLE_FUNCTION_LOCK_VARIABLE   8
+#define SMM_VARIABLE_FUNCTION_GET_PAYLOAD_SIZE   11
+
+typedef struct mm_variable mm_variable;
+typedef struct mm_variable_access mm_variable_access;
+typedef struct mm_next_variable mm_next_variable;
+typedef struct mm_next_variable mm_lock_variable;
+typedef struct mm_variable_info mm_variable_info;
+typedef struct mm_get_payload_size mm_get_payload_size;
+
+/* SMM_VARIABLE_COMMUNICATE_HEADER */
+struct mm_variable {
+uint64_t  function;
+uint64_t  status;
+};
+
+/* SMM_VARIABLE_COMMUNICATE_ACCESS_VARIABLE */
+struct QEMU_PACKED mm_variable_access {
+QemuUUID  guid;
+uint64_t  data_size;
+uint64_t  name_size;
+uint32_t  attributes;
+/* Name */
+/* Data */
+};
+
+/* SMM_VARIABLE_COMMUNICATE_GET_NEXT_VARIABLE_NAME */
+struct mm_next_variable {
+QemuUUID  guid;
+uint64_t  name_size;
+/* Name */
+};
+
+/* SMM_VARIABLE_COMMUNICATE_QUERY_VARIABLE_INFO */
+struct QEMU_PACKED mm_variable_info {
+uint64_t max_storage_size;
+uint64_t free_storage_size;
+uint64_t max_variable_size;
+uint32_t attributes;
+};
+
+/* SMM_VARIABLE_COMMUNICATE_GET_PAYLOAD_SIZE */
+struct mm_get_payload_size {
+uint64_t  payload_size;
+};
+
+/* --- VarCheckPolicyLibMmiHandler --- */
+
+#define VAR_CHECK_POLICY_COMMAND_DISABLE 0x01
+#define VAR_CHECK_POLICY_COMMAND_IS_ENABLED  0x02
+#define VAR_CHECK_POLICY_COMMAND_REGISTER0x03
+#define VAR_CHECK_POLICY_COMMAND_DUMP0x04
+#define VAR_CHECK_POLICY_COMMAND_LOCK0x05
+
+typedef struct mm_check_policy mm_check_policy;
+typedef struct mm_check_policy_is_enabled mm_check_policy_is_enabled;
+typedef struct mm_check_policy_dump_params mm_check_policy_dump_params;
+
+/* VAR_CHECK_POLICY_COMM_HEADER */
+struct QEMU_PACKED mm_check_policy {
+uint32_t  signature;
+uint32_t  revision;
+uint32_t  command;
+uint64_t  result;
+};
+
+/* VAR_CHECK_POLICY_COMM_IS_ENABLED_PARAMS */
+struct QEMU_PACKED mm_check_p

[PATCH 15/16] hw/arm: add uefi variable support to virt machine type

2023-11-15 Thread Gerd Hoffmann

Add -machine virt,x-uefi-vars={on,off} property.  Default is off.
When enabled, wire up the uefi-vars-sysbus device.

TODO: wire up jsonfile property.

Signed-off-by: Gerd Hoffmann 
---
 include/hw/arm/virt.h |  2 ++
 hw/arm/virt.c | 41 +
 2 files changed, 43 insertions(+)

diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index f69239850e61..3dd655b880a9 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -76,6 +76,7 @@ enum {
 VIRT_ACPI_GED,
 VIRT_NVDIMM_ACPI,
 VIRT_PVTIME,
+VIRT_UEFI_VARS,
 VIRT_LOWMEMMAP_LAST,
 };
 
@@ -150,6 +151,7 @@ struct VirtMachineState {
 bool ras;
 bool mte;
 bool dtb_randomness;
+bool uefi_vars;
 OnOffAuto acpi;
 VirtGICType gic_version;
 VirtIOMMUType iommu;
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 529f1c089c08..49f692fda7cf 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -65,6 +65,7 @@
 #include "hw/intc/arm_gicv3_common.h"
 #include "hw/intc/arm_gicv3_its_common.h"
 #include "hw/irq.h"
+#include "hw/uefi/var-service-api.h"
 #include "kvm_arm.h"
 #include "hw/firmware/smbios.h"
 #include "qapi/visitor.h"
@@ -155,6 +156,7 @@ static const MemMapEntry base_memmap[] = {
 [VIRT_NVDIMM_ACPI] ={ 0x0909, NVDIMM_ACPI_IO_LEN},
 [VIRT_PVTIME] = { 0x090a, 0x0001 },
 [VIRT_SECURE_GPIO] ={ 0x090b, 0x1000 },
+[VIRT_UEFI_VARS] =  { 0x090c, 0x0010 },
 [VIRT_MMIO] =   { 0x0a00, 0x0200 },
 /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
 [VIRT_PLATFORM_BUS] =   { 0x0c00, 0x0200 },
@@ -1296,6 +1298,24 @@ static FWCfgState *create_fw_cfg(const VirtMachineState *vms, AddressSpace *as)
 return fw_cfg;
 }
 
+static void create_uefi_vars(const VirtMachineState *vms)
+{
+hwaddr base = vms->memmap[VIRT_UEFI_VARS].base;
+hwaddr size = vms->memmap[VIRT_UEFI_VARS].size;
+MachineState *ms = MACHINE(vms);
+char *nodename;
+
+sysbus_create_simple("uefi-vars-sysbus", base, NULL);
+
+nodename = g_strdup_printf("/%s@%" PRIx64, UEFI_VARS_FDT_NODE, base);
+qemu_fdt_add_subnode(ms->fdt, nodename);
+qemu_fdt_setprop_string(ms->fdt, nodename,
+"compatible", UEFI_VARS_FDT_COMPAT);
+qemu_fdt_setprop_sized_cells(ms->fdt, nodename, "reg",
+ 2, base, 2, size);
+g_free(nodename);
+}
+
 static void create_pcie_irq_map(const MachineState *ms,
 uint32_t gic_phandle,
 int first_irq, const char *nodename)
@@ -2306,6 +2326,10 @@ static void machvirt_init(MachineState *machine)
 vms->fw_cfg = create_fw_cfg(vms, &address_space_memory);
 rom_set_fw(vms->fw_cfg);
 
+if (vms->uefi_vars) {
+create_uefi_vars(vms);
+}
+
 create_platform_bus(vms);
 
 if (machine->nvdimms_state->is_enabled) {
@@ -2502,6 +2526,20 @@ static void virt_set_oem_table_id(Object *obj, const 
char *value,
 strncpy(vms->oem_table_id, value, 8);
 }
 
+static bool virt_get_uefi_vars(Object *obj, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+return vms->uefi_vars;
+}
+
+static void virt_set_uefi_vars(Object *obj, bool value, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+vms->uefi_vars = value;
+}
+
 
 bool virt_is_acpi_enabled(VirtMachineState *vms)
 {
@@ -3092,6 +3130,9 @@ static void virt_machine_class_init(ObjectClass *oc, void 
*data)
   "in ACPI table header."
   "The string may be up to 8 bytes in 
size");
 
+object_class_property_add_bool(oc, "x-uefi-vars",
+   virt_get_uefi_vars,
+   virt_set_uefi_vars);
 }
 
 static void virt_instance_init(Object *obj)
-- 
2.41.0

[PATCH 00/16] hw/uefi: add uefi variable service

2023-11-15 Thread Gerd Hoffmann

This patch series adds a virtual device to qemu which the uefi firmware
can use to store variables.  This moves the UEFI variable management from
privileged guest code (managing vars in pflash) to the host.  The main
advantage is that the need for privilege separation in the guest
goes away.

On x86, privileged guest code runs in SMM.  It's supported by kvm, but
not liked much by various stakeholders in the cloud space due to the
complexity SMM emulation brings.

On arm, privileged guest code runs in EL3 (aka secure world).  This is
not supported by kvm, which is unlikely to change anytime soon given
that even EL2 support (nested virt) has been worked on for years and is
not yet in mainline.

The design idea is to reuse the request serialization protocol edk2 uses
for communication between SMM and non-SMM code, so large chunks of the
edk2 variable driver stack can be used unmodified.  Only the driver
which traps into SMM mode must be replaced by a driver which talks to
qemu instead.

An edk2 test branch can be found here (build with "-D QEMU_VARS=TRUE"):
https://github.com/kraxel/edk2/commits/devel/secure-boot-external-vars

The uefi-vars device must re-implement the privileged edk2 protocols
(i.e. the code running in SMM mode).  The implementation is not complete
yet; specifically, updating authenticated variables is not implemented.
These variables are simply read-only for now.

But there is enough functionality working that it is possible to run
guests, including guests in secure boot mode, so I'm sending this out
for feedback (before tackling the remaining 20% which evidently will
need 80% of the time ;)

Because the guest cannot write to authenticated variables (yet), it
cannot enroll secure boot keys itself; this must be done on the host.  The
virt-firmware tools (https://gitlab.com/kraxel/virt-firmware) can be
used for that:

virt-fw-vars --enroll-redhat --secure-boot --output-json uefivars.json

enjoy & take care,
  Gerd

Gerd Hoffmann (16):
  hw/uefi: add include/hw/uefi/var-service-api.h
  hw/uefi: add include/hw/uefi/var-service-edk2.h
  hw/uefi: add include/hw/uefi/var-service.h
  hw/uefi: add var-service-guid.c
  hw/uefi: add var-service-core.c
  hw/uefi: add var-service-vars.c
  hw/uefi: add var-service-auth.c
  hw/uefi: add var-service-policy.c
  hw/uefi: add support for storing persistent variables on disk
  hw/uefi: add trace-events
  hw/uefi: add to Kconfig
  hw/uefi: add to meson
  hw/uefi: add uefi-vars-sysbus device
  hw/uefi: add uefi-vars-isa device
  hw/arm: add uefi variable support to virt machine type
  docs: add uefi variable service documentation and TODO list.

 include/hw/arm/virt.h  |   2 +
 include/hw/uefi/var-service-api.h  |  40 ++
 include/hw/uefi/var-service-edk2.h | 184 +
 include/hw/uefi/var-service.h  | 119 ++
 hw/arm/virt.c  |  41 ++
 hw/uefi/var-service-auth.c |  91 +
 hw/uefi/var-service-core.c | 350 +
 hw/uefi/var-service-guid.c |  61 +++
 hw/uefi/var-service-isa.c  |  88 +
 hw/uefi/var-service-json.c | 194 ++
 hw/uefi/var-service-policy.c   | 390 +++
 hw/uefi/var-service-sysbus.c   |  87 +
 hw/uefi/var-service-vars.c | 602 +
 docs/devel/index-internals.rst |   1 +
 docs/devel/uefi-vars.rst   |  66 
 hw/Kconfig |   1 +
 hw/meson.build |   1 +
 hw/uefi/Kconfig|   9 +
 hw/uefi/TODO.md|  17 +
 hw/uefi/meson.build|  18 +
 hw/uefi/trace-events   |  16 +
 meson.build|   1 +
 qapi/meson.build   |   1 +
 qapi/qapi-schema.json  |   1 +
 qapi/uefi.json |  40 ++
 25 files changed, 2421 insertions(+)
 create mode 100644 include/hw/uefi/var-service-api.h
 create mode 100644 include/hw/uefi/var-service-edk2.h
 create mode 100644 include/hw/uefi/var-service.h
 create mode 100644 hw/uefi/var-service-auth.c
 create mode 100644 hw/uefi/var-service-core.c
 create mode 100644 hw/uefi/var-service-guid.c
 create mode 100644 hw/uefi/var-service-isa.c
 create mode 100644 hw/uefi/var-service-json.c
 create mode 100644 hw/uefi/var-service-policy.c
 create mode 100644 hw/uefi/var-service-sysbus.c
 create mode 100644 hw/uefi/var-service-vars.c
 create mode 100644 docs/devel/uefi-vars.rst
 create mode 100644 hw/uefi/Kconfig
 create mode 100644 hw/uefi/TODO.md
 create mode 100644 hw/uefi/meson.build
 create mode 100644 hw/uefi/trace-events
 create mode 100644 qapi/uefi.json

-- 
2.41.0

[PATCH 08/16] hw/uefi: add var-service-policy.c

2023-11-15 Thread Gerd Hoffmann

Implement variable policies (Edk2VariablePolicyProtocol).

This protocol allows defining restrictions for variables.
It also allows locking down variables (disallowing write access).
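
As an illustration of how a policy name can restrict a family of
variables, here is a minimal standalone sketch of the '#' wildcard
matching that wildcard_strcmp() in this patch performs: a '#' in the
policy name matches exactly one hex digit in the variable name.  The
function name policy_name_matches() and the use of plain uint16_t arrays
with lengths in characters (instead of the patch's structs with sizes in
bytes) are assumptions made for this sketch, not part of the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch.  plen and vlen are lengths
 * in UCS-2 characters.  An empty policy name matches every variable.
 */
static bool policy_name_matches(const uint16_t *pol, size_t plen,
                                const uint16_t *var, size_t vlen)
{
    size_t pos;

    assert(pol != NULL && var != NULL);

    if (plen == 0) {
        return true;
    }
    for (pos = 0; ; pos++) {
        if (pos == plen && pos == vlen) {
            return true;            /* both names ended together */
        }
        if (pos == plen || pos == vlen) {
            return false;           /* one name is longer */
        }
        if (pol[pos] == 0 && var[pos] == 0) {
            return true;            /* both hit the terminator */
        }
        if (pol[pos] == '#') {
            /* '#' stands for exactly one hexadecimal digit */
            if (var[pos] > 0x7f || !isxdigit(var[pos])) {
                return false;
            }
        } else if (pol[pos] != var[pos]) {
            return false;
        }
    }
}
```

With a policy name of "Boot####", a variable named "Boot001A" would
match while "BootNext" would not.  The extra range check before
isxdigit() is a choice of this sketch, since the variable name holds
16-bit characters.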

Signed-off-by: Gerd Hoffmann 
---
 hw/uefi/var-service-policy.c | 390 +++
 1 file changed, 390 insertions(+)
 create mode 100644 hw/uefi/var-service-policy.c

diff --git a/hw/uefi/var-service-policy.c b/hw/uefi/var-service-policy.c
new file mode 100644
index ..f44edb358c8f
--- /dev/null
+++ b/hw/uefi/var-service-policy.c
@@ -0,0 +1,390 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * uefi vars device - VarCheckPolicyLibMmiHandler implementation
+ *
+ * variable policy specs:
+ * 
https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Library/VariablePolicyLib/ReadMe.md
+ */
+#include "qemu/osdep.h"
+#include "sysemu/dma.h"
+#include "migration/vmstate.h"
+
+#include "hw/uefi/var-service.h"
+#include "hw/uefi/var-service-api.h"
+#include "hw/uefi/var-service-edk2.h"
+
+#include "trace/trace-hw_uefi.h"
+
+static void calc_policy(uefi_var_policy *pol);
+
+static int uefi_var_policy_post_load(void *opaque, int version_id)
+{
+uefi_var_policy *pol = opaque;
+
+calc_policy(pol);
+return 0;
+}
+
+const VMStateDescription vmstate_uefi_var_policy = {
+.name = "uefi-var-policy",
+.post_load = uefi_var_policy_post_load,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(entry_size, uefi_var_policy),
+VMSTATE_VBUFFER_ALLOC_UINT32(entry, uefi_var_policy,
+ 0, NULL, entry_size),
+VMSTATE_END_OF_LIST()
+},
+};
+
+static void print_policy_entry(variable_policy_entry *pe)
+{
+uint16_t *name = (void *)pe + pe->offset_to_name;
+
+fprintf(stderr, "%s:\n", __func__);
+
+fprintf(stderr, "name '");
+while (*name) {
+fprintf(stderr, "%c", *name);
+name++;
+}
+fprintf(stderr, "', version=%d.%d, size=%d\n",
+pe->version >> 16, pe->version & 0xffff, pe->size);
+
+if (pe->min_size) {
+fprintf(stderr, "size min=%d\n", pe->min_size);
+}
+if (pe->max_size != UINT32_MAX) {
+fprintf(stderr, "size max=%u\n", pe->max_size);
+}
+if (pe->attributes_must_have) {
+fprintf(stderr, "attr must=0x%x\n", pe->attributes_must_have);
+}
+if (pe->attributes_cant_have) {
+fprintf(stderr, "attr cant=0x%x\n", pe->attributes_cant_have);
+}
+if (pe->lock_policy_type) {
+fprintf(stderr, "lock policy type %d\n", pe->lock_policy_type);
+}
+}
+
+static gboolean wildcard_strcmp(uefi_var_policy *pol,
+uefi_variable *var)
+{
+size_t pos = 0;
+size_t plen = pol->name_size / 2;
+size_t vlen = var->name_size / 2;
+
+if (plen == 0) {
+return true;
+}
+
+for (;;) {
+if (pos == plen && pos == vlen) {
+return true;
+}
+if (pos == plen || pos == vlen) {
+return false;
+}
+if (pol->name[pos] == 0 && var->name[pos] == 0) {
+return true;
+}
+
+if (pol->name[pos] == '#') {
+if (!isxdigit(var->name[pos])) {
+return false;
+}
+} else {
+if (pol->name[pos] != var->name[pos]) {
+return false;
+}
+}
+
+pos++;
+}
+}
+
+static uefi_var_policy *find_policy(uefi_vars_state *uv, QemuUUID guid,
+uint16_t *name, uint64_t name_size)
+{
+uefi_var_policy *pol;
+
+QTAILQ_FOREACH(pol, &uv->var_policies, next) {
+if (!qemu_uuid_is_equal(&pol->entry->namespace, &guid)) {
+continue;
+}
+if (!uefi_str_equal(pol->name, pol->name_size,
+name, name_size)) {
+continue;
+}
+return pol;
+}
+return NULL;
+}
+
+static uefi_var_policy *wildcard_find_policy(uefi_vars_state *uv,
+ uefi_variable *var)
+{
+uefi_var_policy *pol;
+
+QTAILQ_FOREACH(pol, &uv->var_policies, next) {
+if (!qemu_uuid_is_equal(&pol->entry->namespace, &var->guid)) {
+continue;
+}
+if (!wildcard_strcmp(pol, var)) {
+continue;
+}
+return pol;
+}
+return NULL;
+}
+
+static void calc_policy(uefi_var_policy *pol)
+{
+variable_policy_entry *pe = pol->entry;
+unsigned int i;
+
+pol->name = (void *)pol->entry + pe->offset_to_name;
+pol->name_size = pe->size - pe->offset_to_name;
+
+for (i = 0; i < pol->name_size / 2; i++) {
+if (pol->name[i] == '#') {
+pol->hashmarks++;
+}
+}
+}
+
+uefi_var_policy *uefi_vars_add_policy(uefi_vars_state *uv,
+  variable_policy_entry *pe)
+{
+uefi_var_policy *pol, *p;
+
+pol = g_new0(uefi_var_policy, 1);
+pol->entry = g_ma

[PATCH 12/16] hw/uefi: add to meson

2023-11-15 Thread Gerd Hoffmann

Wire up uefi-vars in the build system.

Signed-off-by: Gerd Hoffmann 
---
 hw/meson.build  |  1 +
 hw/uefi/meson.build | 12 
 meson.build |  1 +
 3 files changed, 14 insertions(+)
 create mode 100644 hw/uefi/meson.build

diff --git a/hw/meson.build b/hw/meson.build
index f01fac4617c9..f8a7e4a2d8db 100644
--- a/hw/meson.build
+++ b/hw/meson.build
@@ -37,6 +37,7 @@ subdir('smbios')
 subdir('ssi')
 subdir('timer')
 subdir('tpm')
+subdir('uefi')
 subdir('ufs')
 subdir('usb')
 subdir('vfio')
diff --git a/hw/uefi/meson.build b/hw/uefi/meson.build
new file mode 100644
index ..b620297320d0
--- /dev/null
+++ b/hw/uefi/meson.build
@@ -0,0 +1,12 @@
+uefi_vars_ss = ss.source_set()
+uefi_vars_ss.add(when: 'CONFIG_UEFI_VARS',
+ if_true: files('var-service-core.c',
+'var-service-json.c',
+'var-service-vars.c',
+'var-service-auth.c',
+'var-service-guid.c',
+'var-service-policy.c'))
+
+modules += { 'hw-uefi' : {
+'vars' : uefi_vars_ss,
+}}
diff --git a/meson.build b/meson.build
index dcef8b1e7911..05a60631a81f 100644
--- a/meson.build
+++ b/meson.build
@@ -3290,6 +3290,7 @@ if have_system
 'hw/ssi',
 'hw/timer',
 'hw/tpm',
+'hw/uefi',
 'hw/ufs',
 'hw/usb',
 'hw/vfio',
-- 
2.41.0

[PATCH 10/16] hw/uefi: add trace-events

2023-11-15 Thread Gerd Hoffmann

Signed-off-by: Gerd Hoffmann 
---
 hw/uefi/trace-events | 16 
 1 file changed, 16 insertions(+)
 create mode 100644 hw/uefi/trace-events

diff --git a/hw/uefi/trace-events b/hw/uefi/trace-events
new file mode 100644
index ..baeda81bbe12
--- /dev/null
+++ b/hw/uefi/trace-events
@@ -0,0 +1,16 @@
+# device
+uefi_reg_read(uint64_t addr, unsigned size) "addr 0x%lx, size %d"
+uefi_reg_write(uint64_t addr, uint64_t val, unsigned size) "addr 0x%lx, val 
0x%lx, size %d"
+uefi_hard_reset(void) ""
+
+# generic uefi
+uefi_variable(const char *context, const char *name, uint64_t size, const char 
*uuid) "context %s, name %s, size %ld, uuid %s"
+uefi_status(const char *context, const char *name) "context %s, status %s"
+uefi_event(const char *name) "event %s"
+
+# variable protocol
+uefi_vars_proto_cmd(const char *cmd) "cmd %s"
+
+# variable policy protocol
+uefi_vars_policy_cmd(const char *cmd) "cmd %s"
+uefi_vars_policy_deny(const char *reason) "reason %s"
-- 
2.41.0

[PATCH 13/16] hw/uefi: add uefi-vars-sysbus device

2023-11-15 Thread Gerd Hoffmann

This adds sysbus bindings for the variable service.

Signed-off-by: Gerd Hoffmann 
---
 hw/uefi/var-service-sysbus.c | 87 
 hw/uefi/meson.build  |  3 +-
 2 files changed, 89 insertions(+), 1 deletion(-)
 create mode 100644 hw/uefi/var-service-sysbus.c

diff --git a/hw/uefi/var-service-sysbus.c b/hw/uefi/var-service-sysbus.c
new file mode 100644
index ..2b393fc768a9
--- /dev/null
+++ b/hw/uefi/var-service-sysbus.c
@@ -0,0 +1,87 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * uefi vars device - sysbus variant.
+ */
+#include "qemu/osdep.h"
+#include "migration/vmstate.h"
+
+#include "hw/qdev-properties.h"
+#include "hw/sysbus.h"
+
+#include "hw/uefi/var-service.h"
+#include "hw/uefi/var-service-api.h"
+
+#define TYPE_UEFI_VARS_SYSBUS "uefi-vars-sysbus"
+OBJECT_DECLARE_SIMPLE_TYPE(uefi_vars_sysbus_state, UEFI_VARS_SYSBUS)
+
+struct uefi_vars_sysbus_state {
+SysBusDevice parent_obj;
+struct uefi_vars_state state;
+};
+
+static const VMStateDescription vmstate_uefi_vars_sysbus = {
+.name = "uefi-vars-sysbus",
+.fields = (VMStateField[]) {
+VMSTATE_STRUCT(state, uefi_vars_sysbus_state, 0,
+   vmstate_uefi_vars, uefi_vars_state),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static Property uefi_vars_sysbus_properties[] = {
+DEFINE_PROP_SIZE("size", uefi_vars_sysbus_state, state.max_storage,
+ 256 * 1024),
+DEFINE_PROP_STRING("jsonfile", uefi_vars_sysbus_state, state.jsonfile),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void uefi_vars_sysbus_init(Object *obj)
+{
+uefi_vars_sysbus_state *uv = UEFI_VARS_SYSBUS(obj);
+
+uefi_vars_init(obj, &uv->state);
+}
+
+static void uefi_vars_sysbus_reset(DeviceState *dev)
+{
+uefi_vars_sysbus_state *uv = UEFI_VARS_SYSBUS(dev);
+
+uefi_vars_hard_reset(&uv->state);
+}
+
+static void uefi_vars_sysbus_realize(DeviceState *dev, Error **errp)
+{
+uefi_vars_sysbus_state *uv = UEFI_VARS_SYSBUS(dev);
+SysBusDevice *sysbus = SYS_BUS_DEVICE(dev);
+
+sysbus_init_mmio(sysbus, &uv->state.mr);
+uefi_vars_realize(&uv->state, errp);
+}
+
+static void uefi_vars_sysbus_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->realize = uefi_vars_sysbus_realize;
+dc->reset = uefi_vars_sysbus_reset;
+dc->vmsd = &vmstate_uefi_vars_sysbus;
+device_class_set_props(dc, uefi_vars_sysbus_properties);
+set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+}
+
+static const TypeInfo uefi_vars_sysbus_info = {
+.name  = TYPE_UEFI_VARS_SYSBUS,
+.parent= TYPE_SYS_BUS_DEVICE,
+.instance_size = sizeof(uefi_vars_sysbus_state),
+.instance_init = uefi_vars_sysbus_init,
+.class_init= uefi_vars_sysbus_class_init,
+};
+module_obj(TYPE_UEFI_VARS_SYSBUS);
+
+static void uefi_vars_sysbus_register_types(void)
+{
+type_register_static(&uefi_vars_sysbus_info);
+}
+
+type_init(uefi_vars_sysbus_register_types)
diff --git a/hw/uefi/meson.build b/hw/uefi/meson.build
index b620297320d0..dc363d67 100644
--- a/hw/uefi/meson.build
+++ b/hw/uefi/meson.build
@@ -5,7 +5,8 @@ uefi_vars_ss.add(when: 'CONFIG_UEFI_VARS',
 'var-service-vars.c',
 'var-service-auth.c',
 'var-service-guid.c',
-'var-service-policy.c'))
+'var-service-policy.c',
+'var-service-sysbus.c'))
 
 modules += { 'hw-uefi' : {
 'vars' : uefi_vars_ss,
-- 
2.41.0

[PATCH 05/16] hw/uefi: add var-service-core.c

2023-11-15 Thread Gerd Hoffmann

This is the core code for guest <-> host communication.  It accepts
request messages from the guest, dispatches them to the requested
service, and sends back the response message.

Signed-off-by: Gerd Hoffmann 
---
 hw/uefi/var-service-core.c | 350 +
 1 file changed, 350 insertions(+)
 create mode 100644 hw/uefi/var-service-core.c

diff --git a/hw/uefi/var-service-core.c b/hw/uefi/var-service-core.c
new file mode 100644
index ..b37f5c403d2f
--- /dev/null
+++ b/hw/uefi/var-service-core.c
@@ -0,0 +1,350 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * uefi vars device
+ */
+#include "qemu/osdep.h"
+#include "sysemu/dma.h"
+#include "migration/vmstate.h"
+
+#include "hw/uefi/var-service.h"
+#include "hw/uefi/var-service-api.h"
+#include "hw/uefi/var-service-edk2.h"
+
+#include "trace/trace-hw_uefi.h"
+
+static int uefi_vars_pre_load(void *opaque)
+{
+uefi_vars_state *uv = opaque;
+
+uefi_vars_clear_all(uv);
+uefi_vars_policies_clear(uv);
+g_free(uv->buffer);
+return 0;
+}
+
+static int uefi_vars_post_load(void *opaque, int version_id)
+{
+uefi_vars_state *uv = opaque;
+
+uefi_vars_update_storage(uv);
+uv->buffer = g_malloc(uv->buf_size);
+return 0;
+}
+
+const VMStateDescription vmstate_uefi_vars = {
+.name = "uefi-vars",
+.pre_load = uefi_vars_pre_load,
+.post_load = uefi_vars_post_load,
+.fields = (VMStateField[]) {
+VMSTATE_UINT16(sts, uefi_vars_state),
+VMSTATE_UINT32(buf_size, uefi_vars_state),
+VMSTATE_UINT32(buf_addr_lo, uefi_vars_state),
+VMSTATE_UINT32(buf_addr_hi, uefi_vars_state),
+VMSTATE_BOOL(end_of_dxe, uefi_vars_state),
+VMSTATE_BOOL(ready_to_boot, uefi_vars_state),
+VMSTATE_BOOL(exit_boot_service, uefi_vars_state),
+VMSTATE_BOOL(policy_locked, uefi_vars_state),
+VMSTATE_UINT64(used_storage, uefi_vars_state),
+VMSTATE_QTAILQ_V(variables, uefi_vars_state, 0,
+ vmstate_uefi_variable, uefi_variable, next),
+VMSTATE_QTAILQ_V(var_policies, uefi_vars_state, 0,
+ vmstate_uefi_var_policy, uefi_var_policy, next),
+VMSTATE_END_OF_LIST()
+},
+};
+
+size_t uefi_strlen(const uint16_t *str, size_t len)
+{
+size_t pos = 0;
+
+for (;;) {
+if (pos == len) {
+return pos;
+}
+if (str[pos] == 0) {
+return pos;
+}
+pos++;
+}
+}
+
+gboolean uefi_str_equal(const uint16_t *a, size_t alen,
+const uint16_t *b, size_t blen)
+{
+size_t pos = 0;
+
+alen = alen / 2;
+blen = blen / 2;
+for (;;) {
+if (pos == alen && pos == blen) {
+return true;
+}
+if (pos == alen && b[pos] == 0) {
+return true;
+}
+if (pos == blen && a[pos] == 0) {
+return true;
+}
+if (pos == alen || pos == blen) {
+return false;
+}
+if (a[pos] == 0 && b[pos] == 0) {
+return true;
+}
+if (a[pos] != b[pos]) {
+return false;
+}
+pos++;
+}
+}
+
+char *uefi_ucs2_to_ascii(const uint16_t *ucs2, uint64_t ucs2_size)
+{
+char *str = g_malloc0(ucs2_size / 2 + 1);
+int i;
+
+for (i = 0; i * 2 < ucs2_size; i++) {
+if (ucs2[i] == 0) {
+break;
+}
+if (ucs2[i] < 128) {
+str[i] = ucs2[i];
+} else {
+str[i] = '?';
+}
+}
+str[i] = 0;
+return str;
+}
+
+void uefi_trace_variable(const char *action, QemuUUID guid,
+ const uint16_t *name, uint64_t name_size)
+{
+QemuUUID be = qemu_uuid_bswap(guid);
+char *str_uuid = qemu_uuid_unparse_strdup(&be);
+char *str_name = uefi_ucs2_to_ascii(name, name_size);
+
+trace_uefi_variable(action, str_name, name_size, str_uuid);
+
+g_free(str_name);
+g_free(str_uuid);
+}
+
+void uefi_trace_status(const char *action, efi_status status)
+{
+switch (status) {
+case EFI_SUCCESS:
+trace_uefi_status(action, "success");
+break;
+case EFI_INVALID_PARAMETER:
+trace_uefi_status(action, "invalid parameter");
+break;
+case EFI_UNSUPPORTED:
+trace_uefi_status(action, "unsupported");
+break;
+case EFI_BAD_BUFFER_SIZE:
+trace_uefi_status(action, "bad buffer size");
+break;
+case EFI_BUFFER_TOO_SMALL:
+trace_uefi_status(action, "buffer too small");
+break;
+case EFI_WRITE_PROTECTED:
+trace_uefi_status(action, "write protected");
+break;
+case EFI_OUT_OF_RESOURCES:
+trace_uefi_status(action, "out of resources");
+break;
+case EFI_NOT_FOUND:
+trace_uefi_status(action, "not found");
+break;
+case EFI_ACCESS_DENIED:
+trace_uefi_status(action, "access denied");
+break;
+case EFI_ALREADY_STAR

[PATCH 03/16] hw/uefi: add include/hw/uefi/var-service.h

2023-11-15 Thread Gerd Hoffmann

Add state structs and function declarations for the uefi-vars device.

Signed-off-by: Gerd Hoffmann 
---
 include/hw/uefi/var-service.h | 119 ++
 1 file changed, 119 insertions(+)
 create mode 100644 include/hw/uefi/var-service.h

diff --git a/include/hw/uefi/var-service.h b/include/hw/uefi/var-service.h
new file mode 100644
index ..2b8d3052e59f
--- /dev/null
+++ b/include/hw/uefi/var-service.h
@@ -0,0 +1,119 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * uefi-vars device - state struct and function prototypes
+ */
+#ifndef QEMU_UEFI_VAR_SERVICE_H
+#define QEMU_UEFI_VAR_SERVICE_H
+
+#include "qemu/uuid.h"
+#include "qemu/queue.h"
+
+#include "hw/uefi/var-service-edk2.h"
+
+#define MAX_BUFFER_SIZE (64 * 1024)
+
+typedef struct uefi_variable uefi_variable;
+typedef struct uefi_var_policy uefi_var_policy;
+typedef struct uefi_vars_state uefi_vars_state;
+
+struct uefi_variable {
+QemuUUID  guid;
+uint16_t  *name;
+uint32_t  name_size;
+uint32_t  attributes;
+void  *data;
+uint32_t  data_size;
+QTAILQ_ENTRY(uefi_variable)   next;
+};
+
+struct uefi_var_policy {
+variable_policy_entry *entry;
+uint32_t  entry_size;
+uint16_t  *name;
+uint32_t  name_size;
+uint32_t  hashmarks;
+QTAILQ_ENTRY(uefi_var_policy) next;
+};
+
+struct uefi_vars_state {
+MemoryRegion  mr;
+uint16_t  sts;
+uint32_t  buf_size;
+uint32_t  buf_addr_lo;
+uint32_t  buf_addr_hi;
+uint8_t   *buffer;
+QTAILQ_HEAD(, uefi_variable)  variables;
+QTAILQ_HEAD(, uefi_var_policy)var_policies;
+
+/* boot phases */
+bool  end_of_dxe;
+bool  ready_to_boot;
+bool  exit_boot_service;
+bool  policy_locked;
+
+/* storage accounting */
+uint64_t  max_storage;
+uint64_t  used_storage;
+
+char  *jsonfile;
+int   jsonfd;
+};
+
+/* var-service-guid.c */
+extern QemuUUID EfiGlobalVariable;
+extern QemuUUID EfiImageSecurityDatabase;
+extern QemuUUID EfiCustomModeEnable;
+extern QemuUUID EfiSecureBootEnableDisable;
+extern QemuUUID EfiSmmVariableProtocolGuid;
+extern QemuUUID VarCheckPolicyLibMmiHandlerGuid;
+extern QemuUUID EfiEndOfDxeEventGroupGuid;
+extern QemuUUID EfiEventReadyToBootGuid;
+extern QemuUUID EfiEventExitBootServicesGuid;
+
+/* var-service-core.c */
+extern const VMStateDescription vmstate_uefi_vars;
+size_t uefi_strlen(const uint16_t *str, size_t len);
+gboolean uefi_str_equal(const uint16_t *a, size_t alen,
+const uint16_t *b, size_t blen);
+char *uefi_ucs2_to_ascii(const uint16_t *ucs2, uint64_t ucs2_size);
+void uefi_trace_variable(const char *action, QemuUUID guid,
+ const uint16_t *name, uint64_t name_size);
+void uefi_trace_status(const char *action, efi_status status);
+void uefi_vars_init(Object *obj, uefi_vars_state *uv);
+void uefi_vars_realize(uefi_vars_state *uv, Error **errp);
+void uefi_vars_hard_reset(uefi_vars_state *uv);
+
+/* var-service-json.c */
+void uefi_vars_json_init(uefi_vars_state *uv, Error **errp);
+void uefi_vars_json_save(uefi_vars_state *uv);
+void uefi_vars_json_load(uefi_vars_state *uv, Error **errp);
+
+/* var-service-vars.c */
+extern const VMStateDescription vmstate_uefi_variable;
+uefi_variable *uefi_vars_find_variable(uefi_vars_state *uv, QemuUUID guid,
+   const uint16_t *name,
+   uint64_t name_size);
+void uefi_vars_set_variable(uefi_vars_state *uv, QemuUUID guid,
+const uint16_t *name, uint64_t name_size,
+uint32_t attributes,
+void *data, uint64_t data_size);
+void uefi_vars_clear_volatile(uefi_vars_state *uv);
+void uefi_vars_clear_all(uefi_vars_state *uv);
+void uefi_vars_update_storage(uefi_vars_state *uv);
+uint32_t uefi_vars_mm_vars_proto(uefi_vars_state *uv);
+
+/* var-service-auth.c */
+void uefi_vars_auth_init(uefi_vars_state *uv);
+
+/* var-service-policy.c */
+extern const VMStateDescription vmstate_uefi_var_policy;
+efi_status uefi_vars_policy_check(uefi_vars_state *uv,
+  uefi_variable *var,
+  gboolean is_newvar);
+void uefi_vars_policies_clear(uefi_vars_state *uv);
+uefi_var_policy *uefi_vars_add_policy(uefi_vars_state *uv,
+

[PATCH 04/16] hw/uefi: add var-service-guid.c

2023-11-15 Thread Gerd Hoffmann

Add variables for a bunch of GUIDs we will need.
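
For readers unfamiliar with the byte layout: EFI stores the first three
GUID fields in little-endian order and the remaining eight bytes as
written, which is what a UUID_LE()-style initializer expresses.  The
sketch below shows that layout in standalone C.  struct guid_bytes and
guid_le() are hypothetical names used only for illustration; they are
not QEMU APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the 16 raw GUID bytes. */
struct guid_bytes {
    uint8_t data[16];
};

/*
 * Hypothetical helper: build the in-memory EFI layout.  The first three
 * fields are stored least significant byte first, the last eight bytes
 * are copied verbatim.
 */
static struct guid_bytes guid_le(uint32_t a, uint16_t b, uint16_t c,
                                 const uint8_t d[8])
{
    struct guid_bytes g;

    assert(d != NULL);

    g.data[0] = a & 0xff;
    g.data[1] = (a >> 8) & 0xff;
    g.data[2] = (a >> 16) & 0xff;
    g.data[3] = (a >> 24) & 0xff;
    g.data[4] = b & 0xff;
    g.data[5] = (b >> 8) & 0xff;
    g.data[6] = c & 0xff;
    g.data[7] = (c >> 8) & 0xff;
    memcpy(&g.data[8], d, 8);
    return g;
}
```

For the EfiGlobalVariable values used in this patch (0x8be4df61, 0x93ca,
0x11d2, ...), the first eight bytes come out as 61 df e4 8b ca 93 d2 11.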

Signed-off-by: Gerd Hoffmann 
---
 hw/uefi/var-service-guid.c | 61 ++
 1 file changed, 61 insertions(+)
 create mode 100644 hw/uefi/var-service-guid.c

diff --git a/hw/uefi/var-service-guid.c b/hw/uefi/var-service-guid.c
new file mode 100644
index ..afdc15c4e7e6
--- /dev/null
+++ b/hw/uefi/var-service-guid.c
@@ -0,0 +1,61 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * uefi vars device - GUIDs
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/dma.h"
+
+#include "hw/uefi/var-service.h"
+
+/* variable namespaces */
+
+QemuUUID EfiGlobalVariable = {
+.data = UUID_LE(0x8be4df61, 0x93ca, 0x11d2, 0xaa, 0x0d,
+0x00, 0xe0, 0x98, 0x03, 0x2b, 0x8c)
+};
+
+QemuUUID EfiImageSecurityDatabase = {
+.data = UUID_LE(0xd719b2cb, 0x3d3a, 0x4596, 0xa3, 0xbc,
+0xda, 0xd0, 0x0e, 0x67, 0x65, 0x6f)
+};
+
+QemuUUID EfiCustomModeEnable = {
+.data = UUID_LE(0xc076ec0c, 0x7028, 0x4399, 0xa0, 0x72,
+0x71, 0xee, 0x5c, 0x44, 0x8b, 0x9f)
+};
+
+QemuUUID EfiSecureBootEnableDisable = {
+.data = UUID_LE(0xf0a30bc7, 0xaf08, 0x4556, 0x99, 0xc4,
+0x0, 0x10, 0x9, 0xc9, 0x3a, 0x44)
+};
+
+/* protocols */
+
+QemuUUID EfiSmmVariableProtocolGuid = {
+.data = UUID_LE(0xed32d533, 0x99e6, 0x4209, 0x9c, 0xc0,
+0x2d, 0x72, 0xcd, 0xd9, 0x98, 0xa7)
+};
+
+QemuUUID VarCheckPolicyLibMmiHandlerGuid = {
+.data = UUID_LE(0xda1b0d11, 0xd1a7, 0x46c4, 0x9d, 0xc9,
+0xf3, 0x71, 0x48, 0x75, 0xc6, 0xeb)
+};
+
+/* events */
+
+QemuUUID EfiEndOfDxeEventGroupGuid = {
+.data = UUID_LE(0x02CE967A, 0xDD7E, 0x4FFC, 0x9E, 0xE7,
+0x81, 0x0C, 0xF0, 0x47, 0x08, 0x80)
+};
+
+QemuUUID EfiEventReadyToBootGuid = {
+.data = UUID_LE(0x7CE88FB3, 0x4BD7, 0x4679, 0x87, 0xA8,
+0xA8, 0xD8, 0xDE, 0xE5, 0x0D, 0x2B)
+};
+
+QemuUUID EfiEventExitBootServicesGuid = {
+.data = UUID_LE(0x27ABF055, 0xB1B8, 0x4C26, 0x80, 0x48,
+0x74, 0x8F, 0x37, 0xBA, 0xA2, 0xDF)
+};
-- 
2.41.0

[PATCH 09/16] hw/uefi: add support for storing persistent variables on disk

2023-11-15 Thread Gerd Hoffmann

Define qapi schema for the uefi variable store state.

Use it and the generated visitor helper functions to
store persistent variables in JSON format on disk.
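
In the JSON file the variable payload is stored as a lowercase hex
string, two characters per byte.  The following standalone sketch shows
that round trip.  hex_encode(), hex_value() and hex_decode() are
hypothetical helpers written for this illustration (using malloc rather
than glib); unlike the patch, which maps invalid characters to 0, the
decoder here rejects odd-length and non-hex input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: encode len bytes as a NUL-terminated hex string. */
static char *hex_encode(const uint8_t *data, size_t len)
{
    static const char hex[] = "0123456789abcdef";
    char *out;
    size_t i;

    assert(data != NULL || len == 0);

    out = malloc(len * 2 + 1);
    if (!out) {
        return NULL;
    }
    for (i = 0; i < len; i++) {
        out[i * 2]     = hex[data[i] >> 4];
        out[i * 2 + 1] = hex[data[i] & 15];
    }
    out[len * 2] = '\0';
    return out;
}

/* Returns 0..15, or -1 for a character that is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    if (c >= 'a' && c <= 'f') {
        return c - 'a' + 10;
    }
    if (c >= 'A' && c <= 'F') {
        return c - 'A' + 10;
    }
    return -1;
}

/* Hypothetical helper: decode a hex string, NULL on malformed input. */
static uint8_t *hex_decode(const char *str, size_t *len_out)
{
    size_t len, i;
    uint8_t *out;

    assert(str != NULL && len_out != NULL);

    len = strlen(str);
    if (len % 2 != 0) {
        return NULL;
    }
    out = malloc(len / 2 + 1);      /* +1 so empty input still allocates */
    if (!out) {
        return NULL;
    }
    for (i = 0; i < len; i += 2) {
        int hi = hex_value(str[i]);
        int lo = hex_value(str[i + 1]);

        if (hi < 0 || lo < 0) {
            free(out);
            return NULL;
        }
        out[i / 2] = (uint8_t)(hi << 4 | lo);
    }
    *len_out = len / 2;
    return out;
}
```

Rejecting malformed input is a design choice of this sketch; whether the
device should refuse a damaged JSON file or load it best-effort is left
open here.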

Signed-off-by: Gerd Hoffmann 
---
 hw/uefi/var-service-json.c | 194 +
 qapi/meson.build   |   1 +
 qapi/qapi-schema.json  |   1 +
 qapi/uefi.json |  40 
 4 files changed, 236 insertions(+)
 create mode 100644 hw/uefi/var-service-json.c
 create mode 100644 qapi/uefi.json

diff --git a/hw/uefi/var-service-json.c b/hw/uefi/var-service-json.c
new file mode 100644
index ..d8d74945bbf1
--- /dev/null
+++ b/hw/uefi/var-service-json.c
@@ -0,0 +1,194 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * uefi vars device - serialize non-volatile varstore from/to json,
+ *using qapi
+ *
+ * tools which can read/write these json files:
+ *  - https://gitlab.com/kraxel/virt-firmware
+ *  - https://github.com/awslabs/python-uefivars
+ */
+#include "qemu/osdep.h"
+#include "qemu/cutils.h"
+#include "sysemu/dma.h"
+
+#include "hw/uefi/var-service.h"
+
+#include "qapi/dealloc-visitor.h"
+#include "qapi/qobject-input-visitor.h"
+#include "qapi/qobject-output-visitor.h"
+#include "qapi/qmp/qobject.h"
+#include "qapi/qmp/qjson.h"
+#include "qapi/qapi-types-uefi.h"
+#include "qapi/qapi-visit-uefi.h"
+
+static UefiVarStore *uefi_vars_to_qapi(uefi_vars_state *uv)
+{
+static const char hex[] = {
+'0', '1', '2', '3', '4', '5', '6', '7',
+'8', '9', 'a', 'b', 'c', 'd', 'e', 'f',
+};
+UefiVarStore *vs;
+UefiVariableList **tail;
+UefiVariable *v;
+QemuUUID be;
+uefi_variable *var;
+uint8_t *data;
+unsigned int i;
+
+vs = g_new0(UefiVarStore, 1);
+vs->version = 2;
+tail = &vs->variables;
+
+QTAILQ_FOREACH(var, &uv->variables, next) {
+if (!(var->attributes & EFI_VARIABLE_NON_VOLATILE)) {
+continue;
+}
+
+v = g_new0(UefiVariable, 1);
+be = qemu_uuid_bswap(var->guid);
+v->guid = qemu_uuid_unparse_strdup(&be);
+v->name = uefi_ucs2_to_ascii(var->name, var->name_size);
+v->attr = var->attributes;
+
+v->data = g_malloc(var->data_size * 2 + 1);
+data = var->data;
+for (i = 0; i < var->data_size * 2;) {
+v->data[i++] = hex[*data >> 4];
+v->data[i++] = hex[*data & 15];
+data++;
+}
+v->data[i++] = 0;
+
+QAPI_LIST_APPEND(tail, v);
+}
+return vs;
+}
+
+static unsigned parse_hexchar(char c)
+{
+switch (c) {
+case '0' ... '9': return c - '0';
+case 'a' ... 'f': return c - 'a' + 0xa;
+case 'A' ... 'F': return c - 'A' + 0xA;
+default: return 0;
+}
+}
+
+static void uefi_vars_from_qapi(uefi_vars_state *uv, UefiVarStore *vs)
+{
+UefiVariableList *item;
+UefiVariable *v;
+QemuUUID be;
+uefi_variable *var;
+uint8_t *data;
+size_t i, len;
+
+for (item = vs->variables; item != NULL; item = item->next) {
+v = item->value;
+
+var = g_new0(uefi_variable, 1);
+var->attributes = v->attr;
+qemu_uuid_parse(v->guid, &be);
+var->guid = qemu_uuid_bswap(be);
+
+len = strlen(v->name);
+var->name_size = len * 2 + 2;
+var->name = g_malloc(var->name_size);
+for (i = 0; i <= len; i++) {
+var->name[i] = v->name[i];
+}
+
+len = strlen(v->data);
+var->data_size = len / 2;
+var->data = data = g_malloc(var->data_size);
+for (i = 0; i < len; i += 2) {
+*(data++) =
+parse_hexchar(v->data[i]) << 4 |
+parse_hexchar(v->data[i + 1]);
+}
+
+QTAILQ_INSERT_TAIL(&uv->variables, var, next);
+}
+}
+
+static GString *uefi_vars_to_json(uefi_vars_state *uv)
+{
+UefiVarStore *vs = uefi_vars_to_qapi(uv);
+QObject *qobj = NULL;
+Visitor *v;
+GString *gstr;
+
+v = qobject_output_visitor_new(&qobj);
+if (visit_type_UefiVarStore(v, NULL, &vs, NULL)) {
+visit_complete(v, &qobj);
+}
+visit_free(v);
+qapi_free_UefiVarStore(vs);
+
+gstr = qobject_to_json_pretty(qobj, true);
+qobject_unref(qobj);
+
+return gstr;
+}
+
+void uefi_vars_json_init(uefi_vars_state *uv, Error **errp)
+{
+if (uv->jsonfile) {
+uv->jsonfd = qemu_create(uv->jsonfile, O_RDWR, 0666, errp);
+}
+}
+
+void uefi_vars_json_save(uefi_vars_state *uv)
+{
+GString *gstr;
+
+if (uv->jsonfd == -1) {
+return;
+}
+
+gstr = uefi_vars_to_json(uv);
+
+lseek(uv->jsonfd, 0, SEEK_SET);
+write(uv->jsonfd, gstr->str, gstr->len);
+ftruncate(uv->jsonfd, gstr->len);
+fsync(uv->jsonfd);
+
+g_string_free(gstr, true);
+}
+
+void uefi_vars_json_load(uefi_vars_state *uv, Error **errp)
+{
+UefiVarStore *vs;
+QObject *qobj;
+Visitor *v;
+char *str;
+size_t len;
+
+if (uv->jsonfd == -1) {
+return;
+

[PATCH 01/16] hw/uefi: add include/hw/uefi/var-service-api.h

2023-11-15 Thread Gerd Hoffmann

This file defines the register interface of the uefi-vars device.
It's only a handful of registers: magic value, command and status
registers, location and size of the communication buffer.

Signed-off-by: Gerd Hoffmann 
---
 include/hw/uefi/var-service-api.h | 40 +++
 1 file changed, 40 insertions(+)
 create mode 100644 include/hw/uefi/var-service-api.h

diff --git a/include/hw/uefi/var-service-api.h 
b/include/hw/uefi/var-service-api.h
new file mode 100644
index ..37fdab32741f
--- /dev/null
+++ b/include/hw/uefi/var-service-api.h
@@ -0,0 +1,40 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * uefi-vars device - API of the virtual device for guest/host communication.
+ */
+#ifndef QEMU_UEFI_VAR_SERVICE_API_H
+#define QEMU_UEFI_VAR_SERVICE_API_H
+
+
+/* isa: io range */
+#define UEFI_VARS_IO_BASE   0x520
+
+/* sysbus: fdt node path */
+#define UEFI_VARS_FDT_NODE   "qemu-uefi-vars"
+#define UEFI_VARS_FDT_COMPAT "qemu,uefi-vars"
+
+/* registers */
+#define UEFI_VARS_REG_MAGIC  0x00  /* 16 bit */
+#define UEFI_VARS_REG_CMD_STS0x02  /* 16 bit */
+#define UEFI_VARS_REG_BUFFER_SIZE0x04  /* 32 bit */
+#define UEFI_VARS_REG_BUFFER_ADDR_LO 0x08  /* 32 bit */
+#define UEFI_VARS_REG_BUFFER_ADDR_HI 0x0c  /* 32 bit */
+#define UEFI_VARS_REGS_SIZE  0x10
+
+/* magic value */
+#define UEFI_VARS_MAGIC_VALUE   0xef1
+
+/* command values */
+#define UEFI_VARS_CMD_RESET  0x01
+#define UEFI_VARS_CMD_MM 0x02
+
+/* status values */
+#define UEFI_VARS_STS_SUCCESS0x00
+#define UEFI_VARS_STS_BUSY   0x01
+#define UEFI_VARS_STS_ERR_UNKNOWN0x10
+#define UEFI_VARS_STS_ERR_NOT_SUPPORTED  0x11
+#define UEFI_VARS_STS_ERR_BAD_BUFFER_SIZE0x12
+
+
+#endif /* QEMU_UEFI_VAR_SERVICE_API_H */
-- 
2.41.0
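
For illustration only, here is a minimal sketch of how a guest-side driver
might use the register map above.  The register offsets and values are copied
from the header; everything else is an assumption made for the example: the
`mock_dev` structure, the `mock_read()`/`mock_write()` helpers standing in for
port or MMIO accesses, the order of operations in `guest_setup()`, and the idea
that a reset clears the buffer registers.  The actual device behaviour is
defined by the other patches in this series, not by this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets and values copied from var-service-api.h. */
#define UEFI_VARS_REG_MAGIC              0x00  /* 16 bit */
#define UEFI_VARS_REG_CMD_STS            0x02  /* 16 bit */
#define UEFI_VARS_REG_BUFFER_SIZE        0x04  /* 32 bit */
#define UEFI_VARS_REG_BUFFER_ADDR_LO     0x08  /* 32 bit */
#define UEFI_VARS_REG_BUFFER_ADDR_HI     0x0c  /* 32 bit */
#define UEFI_VARS_MAGIC_VALUE            0xef1
#define UEFI_VARS_CMD_RESET              0x01
#define UEFI_VARS_STS_SUCCESS            0x00
#define UEFI_VARS_STS_BUSY               0x01
#define UEFI_VARS_STS_ERR_NOT_SUPPORTED  0x11

/* Hypothetical stand-in for the device; not part of the patch series. */
struct mock_dev {
    uint16_t sts;
    uint32_t buf_size;
    uint64_t buf_addr;
};

/* Hypothetical register read; a real guest would use inw/inl or MMIO. */
static uint32_t mock_read(struct mock_dev *d, unsigned reg)
{
    switch (reg) {
    case UEFI_VARS_REG_MAGIC:
        return UEFI_VARS_MAGIC_VALUE;
    case UEFI_VARS_REG_CMD_STS:
        return d->sts;
    case UEFI_VARS_REG_BUFFER_SIZE:
        return d->buf_size;
    default:
        return 0;
    }
}

/* Hypothetical register write; only the reset command is modelled. */
static void mock_write(struct mock_dev *d, unsigned reg, uint32_t val)
{
    switch (reg) {
    case UEFI_VARS_REG_CMD_STS:
        if (val == UEFI_VARS_CMD_RESET) {
            d->buf_size = 0;   /* assumption: reset clears the buffer setup */
            d->buf_addr = 0;
            d->sts = UEFI_VARS_STS_SUCCESS;
        } else {
            d->sts = UEFI_VARS_STS_ERR_NOT_SUPPORTED;
        }
        break;
    case UEFI_VARS_REG_BUFFER_SIZE:
        d->buf_size = val;
        break;
    case UEFI_VARS_REG_BUFFER_ADDR_LO:
        d->buf_addr = (d->buf_addr & ~0xffffffffULL) | val;
        break;
    case UEFI_VARS_REG_BUFFER_ADDR_HI:
        d->buf_addr = (d->buf_addr & 0xffffffffULL) | ((uint64_t)val << 32);
        break;
    }
}

/* Assumed setup sequence: probe magic, reset, then program the buffer. */
static int guest_setup(struct mock_dev *d, uint64_t addr, uint32_t size)
{
    if (mock_read(d, UEFI_VARS_REG_MAGIC) != UEFI_VARS_MAGIC_VALUE) {
        return -1;
    }
    mock_write(d, UEFI_VARS_REG_CMD_STS, UEFI_VARS_CMD_RESET);
    if (mock_read(d, UEFI_VARS_REG_CMD_STS) != UEFI_VARS_STS_SUCCESS) {
        return -1;
    }
    mock_write(d, UEFI_VARS_REG_BUFFER_ADDR_LO, (uint32_t)addr);
    mock_write(d, UEFI_VARS_REG_BUFFER_ADDR_HI, (uint32_t)(addr >> 32));
    mock_write(d, UEFI_VARS_REG_BUFFER_SIZE, size);
    return 0;
}
```

The 64-bit buffer address is split across the LO and HI registers because each
register is only 32 bits wide, which is the one detail the header itself fixes.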

[PATCH 14/16] hw/uefi: add uefi-vars-isa device

2023-11-15 Thread Gerd Hoffmann

This adds isa bindings for the variable service.

Usage: qemu-system-x86_64 -device uefi-vars-isa,jsonfile=/path/to/uefivars.json

Signed-off-by: Gerd Hoffmann 
---
 hw/uefi/var-service-isa.c | 88 +++
 hw/uefi/Kconfig   |  6 +++
 hw/uefi/meson.build   |  5 +++
 3 files changed, 99 insertions(+)
 create mode 100644 hw/uefi/var-service-isa.c

diff --git a/hw/uefi/var-service-isa.c b/hw/uefi/var-service-isa.c
new file mode 100644
index ..bdb270c2a643
--- /dev/null
+++ b/hw/uefi/var-service-isa.c
@@ -0,0 +1,88 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * uefi vars device - ISA variant for x64.
+ */
+#include "qemu/osdep.h"
+#include "migration/vmstate.h"
+
+#include "hw/isa/isa.h"
+#include "hw/qdev-properties.h"
+
+#include "hw/uefi/var-service.h"
+#include "hw/uefi/var-service-api.h"
+
+#define TYPE_UEFI_VARS_ISA "uefi-vars-isa"
+OBJECT_DECLARE_SIMPLE_TYPE(uefi_vars_isa_state, UEFI_VARS_ISA)
+
+struct uefi_vars_isa_state {
+ISADevice parent_obj;
+struct uefi_vars_state state;
+};
+
+static const VMStateDescription vmstate_uefi_vars_isa = {
+.name = "uefi-vars-isa",
+.fields = (VMStateField[]) {
+VMSTATE_STRUCT(state, uefi_vars_isa_state, 0,
+   vmstate_uefi_vars, uefi_vars_state),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static Property uefi_vars_isa_properties[] = {
+DEFINE_PROP_SIZE("size", uefi_vars_isa_state, state.max_storage,
+ 256 * 1024),
+DEFINE_PROP_STRING("jsonfile", uefi_vars_isa_state, state.jsonfile),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void uefi_vars_isa_init(Object *obj)
+{
+uefi_vars_isa_state *uv = UEFI_VARS_ISA(obj);
+
+uefi_vars_init(obj, &uv->state);
+}
+
+static void uefi_vars_isa_reset(DeviceState *dev)
+{
+uefi_vars_isa_state *uv = UEFI_VARS_ISA(dev);
+
+uefi_vars_hard_reset(&uv->state);
+}
+
+static void uefi_vars_isa_realize(DeviceState *dev, Error **errp)
+{
+uefi_vars_isa_state *uv = UEFI_VARS_ISA(dev);
+ISADevice *isa = ISA_DEVICE(dev);
+
+isa_register_ioport(isa, &uv->state.mr, UEFI_VARS_IO_BASE);
+uefi_vars_realize(&uv->state, errp);
+}
+
+static void uefi_vars_isa_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->realize = uefi_vars_isa_realize;
+dc->reset = uefi_vars_isa_reset;
+dc->vmsd = &vmstate_uefi_vars_isa;
+device_class_set_props(dc, uefi_vars_isa_properties);
+set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+}
+
+static const TypeInfo uefi_vars_isa_info = {
+.name          = TYPE_UEFI_VARS_ISA,
+.parent        = TYPE_ISA_DEVICE,
+.instance_size = sizeof(uefi_vars_isa_state),
+.instance_init = uefi_vars_isa_init,
+.class_init    = uefi_vars_isa_class_init,
+};
+module_obj(TYPE_UEFI_VARS_ISA);
+module_dep("hw-uefi-vars");
+
+static void uefi_vars_isa_register_types(void)
+{
+type_register_static(&uefi_vars_isa_info);
+}
+
+type_init(uefi_vars_isa_register_types)
diff --git a/hw/uefi/Kconfig b/hw/uefi/Kconfig
index ca6c2bc46a96..feb9f6de5e30 100644
--- a/hw/uefi/Kconfig
+++ b/hw/uefi/Kconfig
@@ -1,3 +1,9 @@
 config UEFI_VARS
bool
 default y if X86_64 || AARCH64
+
+config UEFI_VARS_ISA
+   bool
+default y
+depends on UEFI_VARS
+depends on ISA_BUS
diff --git a/hw/uefi/meson.build b/hw/uefi/meson.build
index dc363d67..959d2a630bbf 100644
--- a/hw/uefi/meson.build
+++ b/hw/uefi/meson.build
@@ -8,6 +8,11 @@ uefi_vars_ss.add(when: 'CONFIG_UEFI_VARS',
 'var-service-policy.c',
 'var-service-sysbus.c'))
 
+uefi_vars_isa_ss = ss.source_set()
+uefi_vars_isa_ss.add(when: 'CONFIG_UEFI_VARS_ISA',
+ if_true: files('var-service-isa.c'))
+
 modules += { 'hw-uefi' : {
 'vars' : uefi_vars_ss,
+'vars-isa' : uefi_vars_isa_ss,
 }}
-- 
2.41.0

[PATCH 11/16] hw/uefi: add to Kconfig

2023-11-15 Thread Gerd Hoffmann

Signed-off-by: Gerd Hoffmann 
---
 hw/Kconfig  | 1 +
 hw/uefi/Kconfig | 3 +++
 2 files changed, 4 insertions(+)
 create mode 100644 hw/uefi/Kconfig

diff --git a/hw/Kconfig b/hw/Kconfig
index 9ca7b38c31f1..af41bd4e0b40 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -38,6 +38,7 @@ source smbios/Kconfig
 source ssi/Kconfig
 source timer/Kconfig
 source tpm/Kconfig
+source uefi/Kconfig
 source ufs/Kconfig
 source usb/Kconfig
 source virtio/Kconfig
diff --git a/hw/uefi/Kconfig b/hw/uefi/Kconfig
new file mode 100644
index ..ca6c2bc46a96
--- /dev/null
+++ b/hw/uefi/Kconfig
@@ -0,0 +1,3 @@
+config UEFI_VARS
+   bool
+default y if X86_64 || AARCH64
-- 
2.41.0

[PATCH 06/16] hw/uefi: add var-service-vars.c

2023-11-15 Thread Gerd Hoffmann

This is the uefi variable service (EfiSmmVariableProtocol),
providing functions for reading and writing variables.

Signed-off-by: Gerd Hoffmann 
---
 hw/uefi/var-service-vars.c | 602 +
 1 file changed, 602 insertions(+)
 create mode 100644 hw/uefi/var-service-vars.c

diff --git a/hw/uefi/var-service-vars.c b/hw/uefi/var-service-vars.c
new file mode 100644
index ..99851a057bb6
--- /dev/null
+++ b/hw/uefi/var-service-vars.c
@@ -0,0 +1,602 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * uefi vars device - EfiSmmVariableProtocol implementation
+ */
+#include "qemu/osdep.h"
+#include "sysemu/dma.h"
+#include "migration/vmstate.h"
+
+#include "hw/uefi/var-service.h"
+#include "hw/uefi/var-service-api.h"
+#include "hw/uefi/var-service-edk2.h"
+
+#include "trace/trace-hw_uefi.h"
+
+const VMStateDescription vmstate_uefi_variable = {
+.name = "uefi-variable",
+.fields = (VMStateField[]) {
+VMSTATE_UINT8_ARRAY_V(guid.data, uefi_variable, sizeof(QemuUUID), 0),
+VMSTATE_UINT32(name_size, uefi_variable),
+VMSTATE_UINT32(data_size, uefi_variable),
+VMSTATE_UINT32(attributes, uefi_variable),
+VMSTATE_VBUFFER_ALLOC_UINT32(name, uefi_variable, 0, NULL, name_size),
+VMSTATE_VBUFFER_ALLOC_UINT32(data, uefi_variable, 0, NULL, data_size),
+VMSTATE_END_OF_LIST()
+},
+};
+
+uefi_variable *uefi_vars_find_variable(uefi_vars_state *uv, QemuUUID guid,
+   const uint16_t *name, uint64_t name_size)
+{
+uefi_variable *var;
+
+QTAILQ_FOREACH(var, &uv->variables, next) {
+if (!uefi_str_equal(var->name, var->name_size,
+name, name_size)) {
+continue;
+}
+if (!qemu_uuid_is_equal(&var->guid, &guid)) {
+continue;
+}
+return var;
+}
+return NULL;
+}
+
+static uefi_variable *add_variable(uefi_vars_state *uv, QemuUUID guid,
+   const uint16_t *name, uint64_t name_size,
+   uint32_t attributes)
+{
+uefi_variable *var;
+
+var = g_new0(uefi_variable, 1);
+var->guid = guid;
+var->name = g_malloc(name_size);
+memcpy(var->name, name, name_size);
+var->name_size = name_size;
+var->attributes = attributes;
+
+QTAILQ_INSERT_TAIL(&uv->variables, var, next);
+return var;
+}
+
+static void del_variable(uefi_vars_state *uv, uefi_variable *var)
+{
+if (!var) {
+return;
+}
+
+QTAILQ_REMOVE(&uv->variables, var, next);
+g_free(var->data);
+g_free(var->name);
+g_free(var);
+}
+
+static size_t variable_size(uefi_variable *var)
+{
+size_t size;
+
+size  = sizeof(*var);
+size += var->name_size;
+size += var->data_size;
+return size;
+}
+
+void uefi_vars_set_variable(uefi_vars_state *uv, QemuUUID guid,
+const uint16_t *name, uint64_t name_size,
+uint32_t attributes,
+void *data, uint64_t data_size)
+{
+uefi_variable *old_var, *new_var;
+
+uefi_trace_variable(__func__, guid, name, name_size);
+
+old_var = uefi_vars_find_variable(uv, guid, name, name_size);
+if (old_var) {
+uv->used_storage -= variable_size(old_var);
+del_variable(uv, old_var);
+}
+
+new_var = add_variable(uv, guid, name, name_size, attributes);
+new_var->data = g_malloc(data_size);
+new_var->data_size = data_size;
+memcpy(new_var->data, data, data_size);
+uv->used_storage += variable_size(new_var);
+}
+
+void uefi_vars_clear_volatile(uefi_vars_state *uv)
+{
+uefi_variable *var, *n;
+
+QTAILQ_FOREACH_SAFE(var, &uv->variables, next, n) {
+if (var->attributes & EFI_VARIABLE_NON_VOLATILE) {
+continue;
+}
+uv->used_storage -= variable_size(var);
+del_variable(uv, var);
+}
+}
+
+void uefi_vars_clear_all(uefi_vars_state *uv)
+{
+uefi_variable *var, *n;
+
+QTAILQ_FOREACH_SAFE(var, &uv->variables, next, n) {
+del_variable(uv, var);
+}
+uv->used_storage = 0;
+}
+
+void uefi_vars_update_storage(uefi_vars_state *uv)
+{
+uefi_variable *var;
+
+uv->used_storage = 0;
+QTAILQ_FOREACH(var, &uv->variables, next) {
+uv->used_storage += variable_size(var);
+}
+}
+
+static efi_status check_secure_boot(uefi_vars_state *uv, uefi_variable *var)
+{
+static const uint16_t pk[]  = { 'P', 'K', 0 };
+static const uint16_t kek[] = { 'K', 'E', 'K', 0 };
+static const uint16_t db[]  = { 'd', 'b', 0 };
+static const uint16_t dbx[] = { 'd', 'b', 'x', 0 };
+
+/* TODO (reject for now) */
+if (qemu_uuid_is_equal(&var->guid, &EfiGlobalVariable) &&
+uefi_str_equal(var->name, var->name_size, pk, sizeof(pk))) {
+return EFI_WRITE_PROTECTED;
+}
+if (qemu_uuid_is_equal(&var->guid, &EfiGlobalVariable) &&
+uefi_str_equal(var->name, v

[PATCH 16/16] docs: add uefi variable service documentation and TODO list.

2023-11-15 Thread Gerd Hoffmann

Signed-off-by: Gerd Hoffmann 
---
 docs/devel/index-internals.rst |  1 +
 docs/devel/uefi-vars.rst   | 66 ++
 hw/uefi/TODO.md| 17 +
 3 files changed, 84 insertions(+)
 create mode 100644 docs/devel/uefi-vars.rst
 create mode 100644 hw/uefi/TODO.md

diff --git a/docs/devel/index-internals.rst b/docs/devel/index-internals.rst
index 6f81df92bcab..eee676704cfa 100644
--- a/docs/devel/index-internals.rst
+++ b/docs/devel/index-internals.rst
@@ -17,6 +17,7 @@ Details about QEMU's various subsystems including how to add features to them.
s390-cpu-topology
s390-dasd-ipl
tracing
+   uefi-vars
vfio-migration
writing-monitor-commands
virtio-backends
diff --git a/docs/devel/uefi-vars.rst b/docs/devel/uefi-vars.rst
new file mode 100644
index ..8da69f3545af
--- /dev/null
+++ b/docs/devel/uefi-vars.rst
@@ -0,0 +1,66 @@
+==============
+UEFI variables
+==============
+
+Guest UEFI variable management
+==============================
+
+Traditional approach for UEFI Variable storage in qemu guests is to
+work as close as possible to physical hardware.  That means provide
+pflash as storage and leave the management of variables and flash to
+the guest.
+
+Secure boot support comes with the requirement that the UEFI variable
+storage must be protected against direct access by the OS.  All update
+requests must pass the sanity checks.  (Parts of) the firmware must
+run with a higher priviledge level than the OS so this can be enforced
+by the firmware.  On x86 this has been implemented using System
+Management Mode (SMM) in qemu and kvm, which again is the same
+approach taken by physical hardware.  Only priviedged code running in
+SMM mode is allowed to access flash storage.
+
+Communication with the firmware code running in SMM mode works by
+serializing the requests to a shared buffer, then trapping into SMM
+mode via SMI.  The SMM code processes the request, stores the reply in
+the same buffer and returns.
+
+Host UEFI variable service
+==========================
+
+Instead of running the priviledged code inside the guest we can run it
+on the host.  The serialization protocol cen be reused.  The
+communication with the host uses a virtual device, which essentially
+allows to configure the shared buffer location and size and to trap to
+the host to process the requests.
+
+The ``uefi-vars`` device implements the UEFI virtual device.  It comes
+in ``uefi-vars-isa`` and ``uefi-vars-sysbus`` flavours.  The device
+reimplements the handlers needed, specifically
+``EfiSmmVariableProtocol`` and ``VarCheckPolicyLibMmiHandler``.  It
+also consumes events (``EfiEndOfDxeEventGroup``,
+``EfiEventReadyToBoot`` and ``EfiEventExitBootServices``).
+
+The advantage of the approach is that we do not need a special
+prividge level for the firmware to protect itself, i.e. it does not
+depend on SMM emulation on x64, which allows to remove a bunch of
+complex code for SMM emulation from the linux kernel
+(CONFIG_KVM_SMM=n).  It also allows to support secure boot on arm
+without implementing secure world (el3) emulation in kvm.
+
+Of course there are also downsides.  The added device increases the
+attack surface of the host, and we are adding some code duplication
+because we have to reimplement some edk2 functionality in qemu.
+
+usage on x86_64 (isa)
+---------------------
+
+.. code::
+
+   qemu-system-x86_64 -device uefi-vars-isa,jsonfile=/path/to/vars.json
+
+usage on aarch64 (sysbus)
+-------------------------
+
+.. code::
+
+   qemu-system-aarch64 -M virt,x-uefi-vars=on
diff --git a/hw/uefi/TODO.md b/hw/uefi/TODO.md
new file mode 100644
index ..5d1cd15a798e
--- /dev/null
+++ b/hw/uefi/TODO.md
@@ -0,0 +1,17 @@
+
+uefi variable service - todo list
+---------------------------------
+
+* implement reading/writing variable update time.
+* implement authenticated variable updates.
+  - used for 'dbx' updates.
+
+known issues and limitations
+----------------------------
+
+* secure boot variables are read-only
+  - due to auth vars not being implemented yet.
+* works only on little endian hosts
+  - accessing structs in guest ram is done without endian conversion.
+* works only for 64-bit guests
+  - UINTN is mapped to uint64_t, for 32-bit guests that would be uint32_t
-- 
2.41.0
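
The "works only on little endian hosts" limitation in the TODO list comes from
reading multi-byte fields straight out of guest memory.  A minimal sketch of
the usual remedy is shown below: assemble and store the value byte by byte so
the result does not depend on host byte order.  The helper names `load_le32()`
and `store_le32()` are hypothetical and chosen for this example; QEMU has its
own accessors for this purpose (for example `ldl_le_p()` and `stl_le_p()`),
and a real fix would presumably use those rather than new helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian 32-bit value on any host. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: write a 32-bit value in little-endian byte order. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}
```

Because the helpers work on bytes, they are also safe for unaligned offsets
inside the shared buffer.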

[PATCH-for-8.2] hw/net/can/xlnx-zynqmp: Avoid underflow while popping TX FIFO

2023-11-15 Thread Philippe Mathieu-Daudé

Per https://docs.xilinx.com/r/en-US/ug1085-zynq-ultrascale-trm/Message-Format

  Message Format

  The same message format is used for RXFIFO, TXFIFO, and TXHPB.
  Each message includes four words (16 bytes). Software must read
  and write all four words regardless of the actual number of data
  bytes and valid fields in the message.

There is no mention in this reference manual of what the
hardware does when not all four words are written. To fix the
reported underflow when the DATA2 register is written,
I chose to fill the data with the previous content of the
ID / DLC / DATA1 registers, which is what I expect the hardware
would do.

Note there is no hardware flag raised under such condition.

Reported-by: Qiang Liu 
Fixes: 98e5d7a2b7 ("hw/net/can: Introduce Xilinx ZynqMP CAN controller")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1425
Signed-off-by: Philippe Mathieu-Daudé 
---
Tested with the CAN tests from 'make check-qtest-aarch64'
---
 hw/net/can/xlnx-zynqmp-can.c | 49 +---
 1 file changed, 46 insertions(+), 3 deletions(-)

diff --git a/hw/net/can/xlnx-zynqmp-can.c b/hw/net/can/xlnx-zynqmp-can.c
index e93e6c5e19..58938b574e 100644
--- a/hw/net/can/xlnx-zynqmp-can.c
+++ b/hw/net/can/xlnx-zynqmp-can.c
@@ -434,6 +434,51 @@ static bool tx_ready_check(XlnxZynqMPCANState *s)
 return true;
 }
 
+static void read_tx_frame(XlnxZynqMPCANState *s, Fifo32 *fifo, uint32_t *data)
+{
+unsigned used = fifo32_num_used(fifo);
+bool is_txhpb = fifo == &s->txhpb_fifo;
+
+assert(used > 0);
+used %= CAN_FRAME_SIZE;
+
+/*
+ * Frame Message Format
+ *
+ * Each frame includes four words (16 bytes). Software must read and write
+ * all four words regardless of the actual number of data bytes and valid
+ * fields in the message.
+ * If software misbehave (not writting all four words), we use the previous
+ * registers content to initialize each missing word.
+ */
+if (used > 0) {
+/* ID, DLC, DATA1 missing */
+data[0] = s->regs[is_txhpb ? R_TXHPB_ID : R_TXFIFO_ID];
+} else {
+data[0] = fifo32_pop(fifo);
+}
+if (used == 1 || used == 2) {
+/* DLC, DATA1 missing */
+data[1] = s->regs[is_txhpb ? R_TXHPB_DLC : R_TXFIFO_DLC];
+} else {
+data[1] = fifo32_pop(fifo);
+}
+if (used == 1) {
+/* DATA1 missing */
+data[2] = s->regs[is_txhpb ? R_TXHPB_DATA1 : R_TXFIFO_DATA1];
+} else {
+data[2] = fifo32_pop(fifo);
+}
+/* DATA2 triggered the transfer thus is always available */
+data[3] = fifo32_pop(fifo);
+
+if (used) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: Incomplete CAN frame (only %u/%u slots used)\n",
+  TYPE_XLNX_ZYNQMP_CAN, used, CAN_FRAME_SIZE);
+}
+}
+
 static void transfer_fifo(XlnxZynqMPCANState *s, Fifo32 *fifo)
 {
 qemu_can_frame frame;
@@ -451,9 +496,7 @@ static void transfer_fifo(XlnxZynqMPCANState *s, Fifo32 *fifo)
 }
 
 while (!fifo32_is_empty(fifo)) {
-for (i = 0; i < CAN_FRAME_SIZE; i++) {
-data[i] = fifo32_pop(fifo);
-}
+read_tx_frame(s, fifo, data);
 
 if (ARRAY_FIELD_EX32(s->regs, STATUS_REGISTER, LBACK)) {
 /*
-- 
2.41.0
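
The fill-in rule used by read_tx_frame() can be modelled outside QEMU.  The
sketch below is a standalone approximation, not the patch itself: the
`word_fifo` type and its push/pop helpers are hypothetical replacements for
QEMU's Fifo32, and `last[]` stands in for the previous contents of the
ID / DLC / DATA1 registers.  It expresses the same rule as the three `if`
blocks in the patch: when the FIFO holds a partial frame, the leading words are
the missing ones, and the words actually present are the trailing ones ending
with DATA2.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_WORDS 4   /* ID, DLC, DATA1, DATA2 */
#define FIFO_SLOTS  16

/* Hypothetical minimal FIFO; QEMU's Fifo32 is not used here. */
struct word_fifo {
    uint32_t slot[FIFO_SLOTS];
    unsigned head;
    unsigned used;
};

static void fifo_push(struct word_fifo *f, uint32_t v)
{
    assert(f->used < FIFO_SLOTS);
    f->slot[(f->head + f->used) % FIFO_SLOTS] = v;
    f->used++;
}

static uint32_t fifo_pop(struct word_fifo *f)
{
    uint32_t v;

    assert(f->used > 0);
    v = f->slot[f->head];
    f->head = (f->head + 1) % FIFO_SLOTS;
    f->used--;
    return v;
}

/*
 * Read one frame.  last[] holds the previous ID, DLC and DATA1 values.
 * Returns the number of words that had to be taken from last[].
 */
static unsigned read_frame(struct word_fifo *f, const uint32_t last[3],
                           uint32_t out[FRAME_WORDS])
{
    unsigned avail, missing, i;

    assert(f->used > 0);
    avail = f->used % FRAME_WORDS;             /* 0 means a complete frame */
    missing = avail ? FRAME_WORDS - avail : 0; /* at most 3, DATA2 is present */

    for (i = 0; i < FRAME_WORDS; i++) {
        out[i] = (i < missing) ? last[i] : fifo_pop(f);
    }
    return missing;
}
```

With two words queued, for instance, ID and DLC come from `last[]` and DATA1
and DATA2 are popped, so the FIFO is never popped while empty.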

QEMU snapshotting

2023-11-15 Thread Brian Cain

Alexander, Bandan, Paolo, Stefan, Manuel,

Hi, I'm Brian and I maintain the Hexagon arch for QEMU.  Elia, a security 
researcher at Qualcomm, is exploring ways to fuzz a Hexagon OS kernel with 
QEMU, in particular by leveraging snapshotting, inspired by your research and 
more.  I'm not an expert on the details, but I'd like to make an introduction 
and see if there's an opportunity for us to learn from one another.  Maybe we 
can have a call to kick things off?

-Brian

Re: [PATCH] monitor: flush messages on abort

2023-11-15 Thread Steven Sistare




On 11/6/2023 5:10 AM, Daniel P. Berrangé wrote:
> On Fri, Nov 03, 2023 at 03:51:00PM -0400, Steven Sistare wrote:
>> On 11/3/2023 1:33 PM, Daniel P. Berrangé wrote:
>>> On Fri, Nov 03, 2023 at 09:01:29AM -0700, Steve Sistare wrote:
>>>> Buffered monitor output is lost when abort() is called.  The pattern
>>>> error_report() followed by abort() occurs about 60 times, so valuable
>>>> information is being lost when the abort is called in the context of a
>>>> monitor command.
>>>
>>> I'm curious, was there a particular abort() scenario that you hit ?
>>
>> Yes, while tweaking the suspended state, and forgetting to add transitions:
>>
>> error_report("invalid runstate transition: '%s' -> '%s'",
>> abort();
>>
>> But I have previously hit this for other errors.
>>
>>> For some crude statistics:
>>>
>>>   $ for i in abort return exit goto ; do echo -n "$i: " ; git grep --after 
>>> 1 error_report | grep $i | wc -l ; done
>>>   abort: 47
>>>   return: 512
>>>   exit: 458
>>>   goto: 177
>>>
>>> to me those numbers say that calling "abort()" after error_report
>>> should be considered a bug, and we can blanket replace all the
>>> abort() calls with exit(EXIT_FAILURE), and thus avoid the need to
>>> special case flushing the monitor.
>>
>> And presumably add an atexit handler to flush the monitor ala monitor_abort.
>> AFAICT currently no destructor is called for the monitor at exit time.
> 
> The HMP monitor flushes at each newline,  and exit() will take care of
> flushing stdout, so I don't think there's anything else needed.
> 
>>> Also I think there's a decent case to be made for error_report()
>>> to call monitor_flush().
>>
>> A good start, but that would not help for monitors with skip_flush=true, 
>> which 
>> need to format the buffered string in a json response, which is the case I 
>> tripped over.
> 
> 'skip_flush' is only set to 'true' when using a QMP monitor and invoking
> "hmp-monitor-command".

OK, that is narrower than I thought.  Now I see that other QMP monitors send 
error_report() to stderr, hence it is visible after abort and exit:

int error_vprintf(const char *fmt, va_list ap) {
if (cur_mon && !monitor_cur_is_qmp())
return monitor_vprintf(cur_mon, fmt, ap);
return vfprintf(stderr, fmt, ap);<-- HERE

That surprises me, I thought that would be returned to the monitor caller in the
json response. I guess the rationale is that the "main" error, if any, will be
set and returned by the err object that is passed down the stack during command
evaluation.

> In such a case, the error message needs to be built into a JSON error
> reply and sent over the socket. Your patch doesn't help this case
> since you've just printed to stderr.  

Same as vfprintf above!

> I don't think it is reasonable
> to expect QMP monitors to send replies on SIG_ABRT anyway. 

I agree.  My patch causes the error to be seen somewhere, anywhere, instead
of being dropped on the floor.

> So I don't
> think the skip_flush=true scenario is a problem to be concerned with.

It is indeed a narrow case, and not worth much effort or code change.
I'm inclined to drop it, but I appreciate the time you have spent reviewing it.

- Steve

>>>> To fix, install a SIGABRT handler to flush the monitor buffer to stderr.
>>>>
>>>> Signed-off-by: Steve Sistare
>>>> ---
>>>>  monitor/monitor.c | 38 ++
>>>>  1 file changed, 38 insertions(+)
>>>>
>>>> diff --git a/monitor/monitor.c b/monitor/monitor.c
>>>> index dc352f9..65dace0 100644
>>>> --- a/monitor/monitor.c
>>>> +++ b/monitor/monitor.c
>>>> @@ -701,6 +701,43 @@ void monitor_cleanup(void)
>>>>  }
>>>>  }
>>>>
>>>> +#ifdef CONFIG_LINUX
>>>> +
>>>> +static void monitor_abort(int signal, siginfo_t *info, void *c)
>>>> +{
>>>> +Monitor *mon = monitor_cur();
>>>> +
>>>> +if (!mon || qemu_mutex_trylock(&mon->mon_lock)) {
>>>> +return;
>>>> +}
>>>> +
>>>> +if (mon->outbuf && mon->outbuf->len) {
>>>> +fputs("SIGABRT received: ", stderr);
>>>> +fputs(mon->outbuf->str, stderr);
>>>> +if (mon->outbuf->str[mon->outbuf->len - 1] != '\n') {
>>>> +fputc('\n', stderr);
>>>> +}
>>>> +}
>>>> +
>>>> +qemu_mutex_unlock(&mon->mon_lock);
>>>
>>> The SIGABRT handling does not only fire in response to abort()
>>> calls, but also in response to bad memory scenarios, so we have
>>> to be careful what we do in signal handlers.
>>>
>>> In particular using mutexes in signal handlers is a big red
>>> flag generally. Mutex APIs are not declare async signal
>>> safe, so this code is technically a POSIX compliance
>>> violation.
>>
>> Righto.  I would need to mask all signals in the sigaction to be on the 
>> safe(r) side.
> 
> This is still doomed, because SIGABRT could fire while 'mon_lock' is
> already held, and so this code would deadlock trying to acquire the
> lock.
> 
>>> So I think we'd be safer just eliminating the explicit abort()
>>> calls and ad

[PATCH-for-8.2?] tests/avocado: Make fetch_asset() inconditionally require a crypto hash

2023-11-15 Thread Philippe Mathieu-Daudé

In a perfect world we'd have reproducible tests,
but then we'd be sure we run the same binaries.
If a binary artifact isn't hashed, we have no idea
what we are running. Therefore enforce hashing for
all our artifacts.

With this change, unhashed artifacts produce:

  $ avocado run tests/avocado/multiprocess.py
   (1/2) tests/avocado/multiprocess.py:Multiprocess.test_multiprocess_x86_64:
   ERROR: QemuBaseTest.fetch_asset() missing 1 required positional argument: 
'asset_hash' (0.19 s)

Inspired-by: Thomas Huth 
Signed-off-by: Philippe Mathieu-Daudé 
---
Based-on: <20231115145852.494052-1-th...@redhat.com>
  "tests/avocado/multiprocess: Add asset hashes to silence warnings"
---
 tests/avocado/avocado_qemu/__init__.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/avocado/avocado_qemu/__init__.py b/tests/avocado/avocado_qemu/__init__.py
index d71e989db6..304c428168 100644
--- a/tests/avocado/avocado_qemu/__init__.py
+++ b/tests/avocado/avocado_qemu/__init__.py
@@ -254,7 +254,7 @@ def setUp(self, bin_prefix):
 self.cancel("No QEMU binary defined or found in the build tree")
 
 def fetch_asset(self, name,
-asset_hash=None, algorithm=None,
+asset_hash, algorithm=None,
 locations=None, expire=None,
 find_only=False, cancel_on_missing=True):
 return super().fetch_asset(name,
-- 
2.41.0

Re: [PATCH-for-8.2?] tests/avocado: Make fetch_asset() inconditionally require a crypto hash

2023-11-15 Thread Philippe Mathieu-Daudé


On 15/11/23 16:32, Philippe Mathieu-Daudé wrote:

In a perfect world we'd have reproducible tests,
but then we'd be sure we run the same binaries.
If a binary artifact isn't hashed, we have no idea
what we are running. Therefore enforce hashing for
all our artifacts.

With this change, unhashed artifacts produce:

   $ avocado run tests/avocado/multiprocess.py
(1/2) tests/avocado/multiprocess.py:Multiprocess.test_multiprocess_x86_64:
ERROR: QemuBaseTest.fetch_asset() missing 1 required positional argument: 
'asset_hash' (0.19 s)

Inspired-by: Thomas Huth 
Signed-off-by: Philippe Mathieu-Daudé 
---
Based-on: <20231115145852.494052-1-th...@redhat.com>
   "tests/avocado/multiprocess: Add asset hashes to silence warnings"


and:

Based-on: <20231114143531.291820-1-th...@redhat.com>
"tests/avocado/intel_iommu: Add asset hashes to avoid warnings"


---
  tests/avocado/avocado_qemu/__init__.py | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/avocado/avocado_qemu/__init__.py b/tests/avocado/avocado_qemu/__init__.py
index d71e989db6..304c428168 100644
--- a/tests/avocado/avocado_qemu/__init__.py
+++ b/tests/avocado/avocado_qemu/__init__.py
@@ -254,7 +254,7 @@ def setUp(self, bin_prefix):
  self.cancel("No QEMU binary defined or found in the build tree")
  
  def fetch_asset(self, name,

-asset_hash=None, algorithm=None,
+asset_hash, algorithm=None,
  locations=None, expire=None,
  find_only=False, cancel_on_missing=True):
  return super().fetch_asset(name,

Re: [PATCH-for-8.2?] tests/avocado: Make fetch_asset() inconditionally require a crypto hash

2023-11-15 Thread Thomas Huth


On 15/11/2023 16.32, Philippe Mathieu-Daudé wrote:

In a perfect world we'd have reproducible tests,
but then we'd be sure we run the same binaries.
If a binary artifact isn't hashed, we have no idea
what we are running. Therefore enforce hashing for
all our artifacts.

With this change, unhashed artifacts produce:

   $ avocado run tests/avocado/multiprocess.py
(1/2) tests/avocado/multiprocess.py:Multiprocess.test_multiprocess_x86_64:
ERROR: QemuBaseTest.fetch_asset() missing 1 required positional argument: 
'asset_hash' (0.19 s)

Inspired-by: Thomas Huth 
Signed-off-by: Philippe Mathieu-Daudé 
---
Based-on: <20231115145852.494052-1-th...@redhat.com>
   "tests/avocado/multiprocess: Add asset hashes to silence warnings"
---
  tests/avocado/avocado_qemu/__init__.py | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/avocado/avocado_qemu/__init__.py b/tests/avocado/avocado_qemu/__init__.py
index d71e989db6..304c428168 100644
--- a/tests/avocado/avocado_qemu/__init__.py
+++ b/tests/avocado/avocado_qemu/__init__.py
@@ -254,7 +254,7 @@ def setUp(self, bin_prefix):
  self.cancel("No QEMU binary defined or found in the build tree")
  
  def fetch_asset(self, name,

-asset_hash=None, algorithm=None,
+asset_hash, algorithm=None,
  locations=None, expire=None,
  find_only=False, cancel_on_missing=True):
  return super().fetch_asset(name,


I think this makes sense, to keep those annoying warnings from creeping 
back in in the future!


Reviewed-by: Thomas Huth

Re: [PATCH] monitor: flush messages on abort

2023-11-15 Thread Steven Sistare

On 11/15/2023 3:41 AM, Markus Armbruster wrote:
> Daniel P. Berrangé  writes:
> 
>> On Fri, Nov 03, 2023 at 03:51:00PM -0400, Steven Sistare wrote:
>>> On 11/3/2023 1:33 PM, Daniel P. Berrangé wrote:
>>>> On Fri, Nov 03, 2023 at 09:01:29AM -0700, Steve Sistare wrote:
>>>>> Buffered monitor output is lost when abort() is called.  The pattern
>>>>> error_report() followed by abort() occurs about 60 times, so valuable
>>>>> information is being lost when the abort is called in the context of a
>>>>> monitor command.
>>>>
>>>> I'm curious, was there a particular abort() scenario that you hit ?
>>>
>>> Yes, while tweaking the suspended state, and forgetting to add transitions:
>>>
>>> error_report("invalid runstate transition: '%s' -> '%s'",
>>> abort();
>>>
>>> But I have previously hit this for other errors.
> 
> Can you provide a reproducer?

I sometimes hit this when developing new code.  I do not have a reproducer for 
upstream
branches. The patch is aimed at helping developers, not users.

>>>> For some crude statistics:
>>>>
>>>>   $ for i in abort return exit goto ; do echo -n "$i: " ; git grep --after 
>>>> 1 error_report | grep $i | wc -l ; done
>>>>   abort: 47
>>>>   return: 512
>>>>   exit: 458
>>>>   goto: 177
>>>>
>>>> to me those numbers say that calling "abort()" after error_report
>>>> should be considered a bug, and we can blanket replace all the
>>>> abort() calls with exit(EXIT_FAILURE), and thus avoid the need to
>>>> special case flushing the monitor.
>>>
>>> And presumably add an atexit handler to flush the monitor ala monitor_abort.
>>> AFAICT currently no destructor is called for the monitor at exit time.
>>
>> The HMP monitor flushes at each newline,  and exit() will take care of
>> flushing stdout, so I don't think there's anything else needed.
> 
> Correct.
> 
>>>> Also I think there's a decent case to be made for error_report()
>>>> to call monitor_flush().
> 
> No, because the messages printed by error_report() all end in newline,
> and printing a newline to a monitor triggers monitor_flush_locked().
> 
>>> A good start, but that would not help for monitors with skip_flush=true, 
>>> which 
>>> need to format the buffered string in a json response, which is the case I 
>>> tripped over.
>>
>> 'skip_flush' is only set to 'true' when using a QMP monitor and invoking
>> "hmp-monitor-command".
> 
> Correct.
> 
>> In such a case, the error message needs to be built into a JSON error
>> reply and sent over the socket. Your patch doesn't help this case
>> since you've just printed to stderr.  I don't think it is reasonable
>> to expect QMP monitors to send replies on SIG_ABRT anyway. So I don't
>> think the skip_flush=true scenario is a problem to be concerned with.
>>
>>>>> To fix, install a SIGABRT handler to flush the monitor buffer to stderr.
>>>>>
>>>>> Signed-off-by: Steve Sistare
>>>>> ---
>>>>>  monitor/monitor.c | 38 ++
>>>>>  1 file changed, 38 insertions(+)
>>>>>
>>>>> diff --git a/monitor/monitor.c b/monitor/monitor.c
>>>>> index dc352f9..65dace0 100644
>>>>> --- a/monitor/monitor.c
>>>>> +++ b/monitor/monitor.c
>>>>> @@ -701,6 +701,43 @@ void monitor_cleanup(void)
>>>>>  }
>>>>>  }
>>>>>
>>>>> +#ifdef CONFIG_LINUX
>>>>> +
>>>>> +static void monitor_abort(int signal, siginfo_t *info, void *c)
>>>>> +{
>>>>> +Monitor *mon = monitor_cur();
>>>>> +
>>>>> +if (!mon || qemu_mutex_trylock(&mon->mon_lock)) {
>>>>> +return;
>>>>> +}
>>>>> +
>>>>> +if (mon->outbuf && mon->outbuf->len) {
>>>>> +fputs("SIGABRT received: ", stderr);
>>>>> +fputs(mon->outbuf->str, stderr);
>>>>> +if (mon->outbuf->str[mon->outbuf->len - 1] != '\n') {
>>>>> +fputc('\n', stderr);
>>>>> +}
>>>>> +}
>>>>> +
>>>>> +qemu_mutex_unlock(&mon->mon_lock);
>>>>
>>>> The SIGABRT handling does not only fire in response to abort()
>>>> calls, but also in response to bad memory scenarios, so we have
>>>> to be careful what we do in signal handlers.
>>>>
>>>> In particular using mutexes in signal handlers is a big red
>>>> flag generally. Mutex APIs are not declare async signal
>>>> safe, so this code is technically a POSIX compliance
>>>> violation.
> 
> "Technically a POSIX compliance violation" sounds like something only
> pedants would care about.  It's actually a recipe for deadlocks and
> crashes.
> 
>>> Righto.  I would need to mask all signals in the sigaction to be on the 
>>> safe(r) side.
>>
>> This is still doomed, because SIGABRT could fire while 'mon_lock' is
>> already held, and so this code would deadlock trying to acquire the
>> lock.
> 
> Yup.
> 
> There is no way to make async signal unsafe code safe.

The handler calls trylock, not lock.  If it cannot get the lock, it bails.

However, I suppose pthread_mutex_trylock could in theory take and briefly hold
some internal lock as part of its implementation.

- Steve

>>>> So I think we'd be safer just eliminating the explicit

Re: [PATCH 16/16] docs: add uefi variable service documentation and TODO list.

2023-11-15 Thread Eric Blake

On Wed, Nov 15, 2023 at 04:12:38PM +0100, Gerd Hoffmann wrote:
> Signed-off-by: Gerd Hoffmann 
> ---
>  docs/devel/index-internals.rst |  1 +
>  docs/devel/uefi-vars.rst   | 66 ++
>  hw/uefi/TODO.md| 17 +
>  3 files changed, 84 insertions(+)
>  create mode 100644 docs/devel/uefi-vars.rst
>  create mode 100644 hw/uefi/TODO.md

> +
> +Guest UEFI variable management
> +==============================
> +
> +Traditional approach for UEFI Variable storage in qemu guests is to

The traditional

> +work as close as possible to physical hardware.  That means provide

providing

> +pflash as storage and leave the management of variables and flash to

leaving

> +the guest.

> +
> +Secure boot support comes with the requirement that the UEFI variable
> +storage must be protected against direct access by the OS.  All update
> +requests must pass the sanity checks.  (Parts of) the firmware must
> +run with a higher priviledge level than the OS so this can be enforced

privilege

> +by the firmware.  On x86 this has been implemented using System
> +Management Mode (SMM) in qemu and kvm, which again is the same
> +approach taken by physical hardware.  Only priviedged code running in

privileged

> +SMM mode is allowed to access flash storage.
> +
> +Communication with the firmware code running in SMM mode works by
> +serializing the requests to a shared buffer, then trapping into SMM
> +mode via SMI.  The SMM code processes the request, stores the reply in
> +the same buffer and returns.
> +
> +Host UEFI variable service
> +==
> +
> +Instead of running the priviledged code inside the guest we can run it

privileged

> +on the host.  The serialization protocol cen be reused.  The

can

> +communication with the host uses a virtual device, which essentially
> +allows to configure the shared buffer location and size and to trap to

s/allows to configure/configures/
s/and to trap/, and traps/

> +the host to process the requests.
> +
> +The ``uefi-vars`` device implements the UEFI virtual device.  It comes
> +in ``uefi-vars-isa`` and ``uefi-vars-sysbus`` flavours.  The device
> +reimplements the handlers needed, specifically
> +``EfiSmmVariableProtocol`` and ``VarCheckPolicyLibMmiHandler``.  It
> +also consumes events (``EfiEndOfDxeEventGroup``,
> +``EfiEventReadyToBoot`` and ``EfiEventExitBootServices``).
> +
> +The advantage of the approach is that we do not need a special
> +prividge level for the firmware to protect itself, i.e. it does not

privilege

> +depend on SMM emulation on x64, which allows to remove a bunch of

s/allows to remove/allows the removal of/

> +complex code for SMM emulation from the linux kernel
> +(CONFIG_KVM_SMM=n).  It also allows to support secure boot on arm

s/to support/support for/

> +without implementing secure world (el3) emulation in kvm.
> +
> +Of course there are also downsides.  The added device increases the
> +attack surface of the host, and we are adding some code duplication
> +because we have to reimplement some edk2 functionality in qemu.
> +

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH v6 06/21] vfio/iommufd: Add support for iova_ranges and pgsizes

2023-11-15 Thread Eric Auger




On 11/14/23 14:46, Cédric Le Goater wrote:
> On 11/14/23 11:09, Zhenzhong Duan wrote:
>> Some vIOMMU such as virtio-iommu use IOVA ranges from host side to
>> setup reserved ranges for passthrough device, so that guest will not
>> use an IOVA range beyond host support.
>>
>> Use an uAPI of IOMMUFD to get IOVA ranges of host side and pass to
>> vIOMMU just like the legacy backend, if this fails, fallback to
>> 64bit IOVA range.
>>
>> Also use out_iova_alignment returned from uAPI as pgsizes instead of
>> qemu_real_host_page_size() as a fallback.
>>
>> Signed-off-by: Zhenzhong Duan 
>> ---
>> v6: propagate iommufd_cdev_get_info_iova_range err and print as warning
>>
>>   hw/vfio/iommufd.c | 55 ++-
>>   1 file changed, 54 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 06282d885c..e5bf528e89 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -267,6 +267,53 @@ static int
>> iommufd_cdev_ram_block_discard_disable(bool state)
>>   return ram_block_uncoordinated_discard_disable(state);
>>   }
>>   +static int iommufd_cdev_get_info_iova_range(VFIOIOMMUFDContainer
>> *container,
>> +    uint32_t ioas_id, Error
>> **errp)
>> +{
>> +    VFIOContainerBase *bcontainer = &container->bcontainer;
>> +    struct iommu_ioas_iova_ranges *info;
>> +    struct iommu_iova_range *iova_ranges;
>> +    int ret, sz, fd = container->be->fd;
>> +
>> +    info = g_malloc0(sizeof(*info));
>> +    info->size = sizeof(*info);
>> +    info->ioas_id = ioas_id;
>> +
>> +    ret = ioctl(fd, IOMMU_IOAS_IOVA_RANGES, info);
>> +    if (ret && errno != EMSGSIZE) {
>> +    goto error;
>> +    }
>> +
>> +    sz = info->num_iovas * sizeof(struct iommu_iova_range);
>> +    info = g_realloc(info, sizeof(*info) + sz);
>> +    info->allowed_iovas = (uintptr_t)(info + 1);
>> +
>> +    ret = ioctl(fd, IOMMU_IOAS_IOVA_RANGES, info);
>> +    if (ret) {
>> +    goto error;
>> +    }
>> +
>> +    iova_ranges = (struct iommu_iova_range
>> *)(uintptr_t)info->allowed_iovas;
>> +
>> +    for (int i = 0; i < info->num_iovas; i++) {
>> +    Range *range = g_new(Range, 1);
>> +
>> +    range_set_bounds(range, iova_ranges[i].start,
>> iova_ranges[i].last);
>> +    bcontainer->iova_ranges =
>> +    range_list_insert(bcontainer->iova_ranges, range);
>> +    }
>> +    bcontainer->pgsizes = info->out_iova_alignment;
>> +
>> +    g_free(info);
>> +    return 0;
>> +
>> +error:
>> +    ret = -errno;
>> +    g_free(info);
>> +    error_setg_errno(errp, errno, "Cannot get IOVA ranges");
>> +    return ret;
>> +}
>> +
>>   static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>>  AddressSpace *as, Error **errp)
>>   {
>> @@ -341,7 +388,13 @@ static int iommufd_cdev_attach(const char *name,
>> VFIODevice *vbasedev,
>>   goto err_discard_disable;
>>   }
>>   -    bcontainer->pgsizes = qemu_real_host_page_size();
>> +    ret = iommufd_cdev_get_info_iova_range(container, ioas_id, &err);
>> +    if (ret) {
>> +    warn_report_err(err);
>> +    err = NULL;
>> +    error_printf("Fallback to default 64bit IOVA range and 4K
>> page size\n");
>
> This would be better :
>
>     error_append_hint(&err,
>    "Fallback to default 64bit IOVA range and 4K page
> size\n");
>     warn_report_err(err);
>
> I will take care of it if you agree. With that,
>
> Reviewed-by: Cédric Le Goater 

With Cédric's suggestion,
Reviewed-by: Eric Auger 
Eric
>
> Thanks,
>
> C.
>
>
>> +    bcontainer->pgsizes = qemu_real_host_page_size();
>> +    }
>>     bcontainer->listener = vfio_memory_listener;
>>   memory_listener_register(&bcontainer->listener,
>> bcontainer->space->as);
>

[PATCH 1/2] linux-user/elfload: test return value of getrlimit

2023-11-15 Thread Thomas Weißschuh

Should getrlimit() fail, the value of dumpsize.rlim_cur may not be
initialized. Avoid reading garbage data by checking the return value of
getrlimit.

Signed-off-by: Thomas Weißschuh 
---
 linux-user/elfload.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 4cd6891d7b6a..799fe8497346 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -4667,8 +4667,7 @@ static int elf_core_dump(int signr, const CPUArchState 
*env)
 init_note_info(&info);
 
 errno = 0;
-getrlimit(RLIMIT_CORE, &dumpsize);
-if (dumpsize.rlim_cur == 0)
+if (getrlimit(RLIMIT_CORE, &dumpsize) == 0 && dumpsize.rlim_cur == 0)
 return 0;
 
 corefile = core_dump_filename(ts);

-- 
2.42.1
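
A standalone sketch of the same check, assuming a hypothetical helper name `core_limit_is_zero()` that does not exist in linux-user. It reads the limit only after getrlimit() has reported success, which is the point of the patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: true only when getrlimit() succeeded and
 * reported a zero soft limit for core files.
 */
static bool core_limit_is_zero(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        return false;   /* lim is indeterminate, do not read it */
    }
    return lim.rlim_cur == 0;
}
```

With this shape a failing getrlimit() falls through to the normal dump path, matching the behaviour of the combined condition in the diff.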

Re: [PATCH v3 1/8] ppc/pnv: Add pca9552 to powernv10 for PCIe hotplug power control

2023-11-15 Thread Miles Glenn

On Wed, 2023-11-15 at 08:28 +0100, Cédric Le Goater wrote:
> On 11/14/23 20:56, Glenn Miles wrote:
> > The Power Hypervisor code expects to see a pca9552 device connected
> > to the 3rd PNV I2C engine on port 1 at I2C address 0x63 (or left-
> > justified address of 0xC6).  This is used by hypervisor code to
> > control PCIe slot power during hotplug events.
> > 
> > Signed-off-by: Glenn Miles 
> > ---
> > Based-on: <20231024181144.4045056-3-mil...@linux.vnet.ibm.com>
> > [PATCH v3 2/2] misc/pca9552: Let external devices set pca9552
> > inputs
> > 
> > No changes from v2
> > 
> >   hw/ppc/Kconfig | 1 +
> >   hw/ppc/pnv.c   | 7 +++
> >   2 files changed, 8 insertions(+)
> > 
> > diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
> > index 56f0475a8e..f77ca773cf 100644
> > --- a/hw/ppc/Kconfig
> > +++ b/hw/ppc/Kconfig
> > @@ -32,6 +32,7 @@ config POWERNV
> >   select XIVE
> >   select FDT_PPC
> >   select PCI_POWERNV
> > +select PCA9552
> >   
> >   config PPC405
> >   bool
> > diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> > index 9c29727337..7afaf1008f 100644
> > --- a/hw/ppc/pnv.c
> > +++ b/hw/ppc/pnv.c
> > @@ -1877,6 +1877,13 @@ static void
> > pnv_chip_power10_realize(DeviceState *dev, Error **errp)
> > qdev_get_gpio_in(DEVICE(&chip10-
> > >psi),
> >  PSIHB9_IRQ_SBE_I2C
> > ));
> >   }
> > +
> > +/*
> > + * Add a PCA9552 I2C device for PCIe hotplug control
> > + * to engine 2, bus 1, address 0x63
> > + */
> > +i2c_slave_create_simple(chip10->i2c[2].busses[1], "pca9552",
> > 0x63);
> 
> You didn't answer my question in v2. Is this a P10 chip device or a
> board/machine device ?
> 
> Thanks,
> 
> C.
> 
> 

Sorry, you're right, I did miss that one, and after looking at the
Denali spec, I see that the topology is indeed different from Rainier
(which is what I have been modeling).  For the Denali, the PCA9552
has a different I2C address (0x62 instead of 0x63) and the GPIO
connections are also different.  Also, there is no PCA9554 chip because
it looks like they were able to cover all of the functionality with
just the GPIOs of the PCA9552.  So, good catch!

I'll look at what they did on the Aspeed machines like you suggested.

Thanks,

Glenn

> 
> >   }
> >   
> >   static uint32_t pnv_chip_power10_xscom_pcba(PnvChip *chip,
> > uint64_t addr)

[PATCH 0/2] linux-user: two fixes to coredump generation

2023-11-15 Thread Thomas Weißschuh

Signed-off-by: Thomas Weißschuh 
---
Thomas Weißschuh (2):
  linux-user/elfload: test return value of getrlimit
  linux-user/elfload: check PR_GET_DUMPABLE before creating coredump

 linux-user/elfload.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)
---
base-commit: 9c673a41eefc50f1cb2fe3c083e7de842c7d276a
change-id: 20231115-qemu-user-dumpable-d499c0396103

Best regards,
-- 
Thomas Weißschuh

[PATCH 2/2] linux-user/elfload: check PR_GET_DUMPABLE before creating coredump

2023-11-15 Thread Thomas Weißschuh

A process can opt out of coredump creation by calling
prctl(PR_SET_DUMPABLE, 0).
linux-user passes this call from the guest through to the
operating system.
From there it can be read back again to avoid creating coredumps from
qemu-user itself if the guest chose so.

Signed-off-by: Thomas Weißschuh 
---
 linux-user/elfload.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 799fe8497346..76d5740af0ca 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -2,6 +2,7 @@
 #include "qemu/osdep.h"
 #include 
 
+#include 
 #include 
 #include 
 
@@ -4667,6 +4668,10 @@ static int elf_core_dump(int signr, const CPUArchState 
*env)
 init_note_info(&info);
 
 errno = 0;
+
+if (prctl(PR_GET_DUMPABLE) == 0)
+return 0;
+
 if (getrlimit(RLIMIT_CORE, &dumpsize) == 0 && dumpsize.rlim_cur == 0)
 return 0;
 

-- 
2.42.1
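
A small Linux-only sketch of the read-back described in the commit message. The helper name `process_not_dumpable()` is hypothetical; the only real interface used is prctl(2) with PR_GET_DUMPABLE, which returns the current flag as its result.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/prctl.h>

/* Hypothetical helper: has this process opted out of core dumps? */
static bool process_not_dumpable(void)
{
    /* PR_GET_DUMPABLE returns the current flag as the call result. */
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0) == 0;
}
```

Since the guest's prctl(PR_SET_DUMPABLE, 0) is passed through to the host process, a check like this sees the guest's choice without any extra bookkeeping.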

Re: [PATCH] migration: free 'saddr' since be no longer used

2023-11-15 Thread Peter Xu

On Wed, Nov 15, 2023 at 09:49:09AM +, Daniel P. Berrangé wrote:
> On Wed, Nov 15, 2023 at 11:27:39AM +0800, Zongmin Zhou wrote:
> > Since socket_parse() will allocate memory for 'saddr',
> > and its value will pass to 'addr' that allocated
> > by migrate_uri_parse(),so free 'saddr' to avoid memory leak.
> > 
> > Fixes: 72a8192e225c ("migration: convert migration 'uri' into 
> > 'MigrateAddress'")
> > Signed-off-by: Zongmin Zhou
> > ---
> >  migration/migration.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 28a34c9068..30ed4bf6b6 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -493,6 +493,7 @@ bool migrate_uri_parse(const char *uri, 
> > MigrationChannel **channel,
> >  }
> >  addr->u.socket.type = saddr->type;
> >  addr->u.socket.u = saddr->u;
> 
> 'saddr->u' is a union embedded in SocketAddress, containing:
> 
> union { /* union tag is @type */
> InetSocketAddressWrapper inet;
> UnixSocketAddressWrapper q_unix;
> VsockSocketAddressWrapper vsock;
> StringWrapper fd;
> } u;
> 
> This assignment is *shallow* copying the contents of the union.
> 
> All the type-specific structs that are members of this union
> contain allocated strings, and with this shallow copy, we
> are stealing the pointers to these allocated strings.
> 
> 
> > +qapi_free_SocketAddress(saddr);
> 
> This meanwhile is doing a *deep* free of the contents of the
> SocketAddress, which includes all the pointers we just stole.
> 
> IOW, unless I'm mistaken somehow, this is going to cause a
> double-free

Right.  I think what we need is a g_free(saddr), with a comment explaining?

Or, is there a better way to do that?  Something like a QAPI_CLONE() but not
exactly: we already have the object allocated.  We want to deep copy it to
the current object only on the fields but not the object itself.

-- 
Peter Xu
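
To illustrate the ownership problem in isolation, here is a sketch using plain malloc/free and made-up types (`SockAddr`, `Embedded`, `sockaddr_move` are stand-ins, not the QAPI-generated ones). After the shallow copy the strings belong to the destination, so only the outer allocation may be released.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for SocketAddress and the embedded copy. */
typedef struct { char *host; char *port; } Inet;
typedef struct { int type; Inet u; } SockAddr;
typedef struct { int type; Inet u; } Embedded;

static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    assert(p);
    memcpy(p, s, n);
    return p;
}

static SockAddr *sockaddr_new(const char *host, const char *port)
{
    SockAddr *sa = malloc(sizeof(*sa));

    assert(sa);
    sa->type = 0;
    sa->u.host = dup_str(host);
    sa->u.port = dup_str(port);
    return sa;
}

/*
 * Move the contents of 'src' into 'dst'.  The strings change owner,
 * so only the outer allocation is released; a deep free here would
 * leave dst->u pointing at freed memory.
 */
static void sockaddr_move(Embedded *dst, SockAddr *src)
{
    dst->type = src->type;
    dst->u = src->u;    /* shallow copy: the pointers are stolen */
    free(src);          /* shell only, the analogue of g_free(saddr) */
}

/* Release the strings now owned by the embedded copy. */
static void embedded_clear(Embedded *e)
{
    free(e->u.host);
    free(e->u.port);
    e->u.host = NULL;
    e->u.port = NULL;
}
```

The strings are then freed exactly once, when the embedding object is destroyed.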

Re: [PATCH 2/2] vhost-scsi: Add support for a worker thread per virtqueue

2023-11-15 Thread Mike Christie

On 11/15/23 5:43 AM, Stefano Garzarella wrote:
> On Mon, Nov 13, 2023 at 06:36:44PM -0600, Mike Christie wrote:
>> This adds support for vhost-scsi to be able to create a worker thread
>> per virtqueue. Right now for vhost-net we get a worker thread per
>> tx/rx virtqueue pair which scales nicely as we add more virtqueues and
>> CPUs, but for scsi we get the single worker thread that's shared by all
>> virtqueues. When trying to send IO to more than 2 virtqueues the single
>> thread becomes a bottleneck.
>>
>> This patch adds a new setting, virtqueue_workers, which can be set to:
>>
>> 1: Existing behavior where we get the single thread.
>> -1: Create a worker per IO virtqueue.
> 
> I find this setting a bit odd. What about a boolean instead?
> 
> `per_virtqueue_workers`:
>     false: Existing behavior where we get the single thread.
>     true: Create a worker per IO virtqueue.

Sounds good.


> 
>>
>> Signed-off-by: Mike Christie 
>> ---
>> hw/scsi/vhost-scsi.c    | 68 +
>> include/hw/virtio/virtio-scsi.h |  1 +
>> 2 files changed, 69 insertions(+)
>>
>> diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
>> index 3126df9e1d9d..5cf669b6563b 100644
>> --- a/hw/scsi/vhost-scsi.c
>> +++ b/hw/scsi/vhost-scsi.c
>> @@ -31,6 +31,9 @@
>> #include "qemu/cutils.h"
>> #include "sysemu/sysemu.h"
>>
>> +#define VHOST_SCSI_WORKER_PER_VQ    -1
>> +#define VHOST_SCSI_WORKER_DEF    1
>> +
>> /* Features supported by host kernel. */
>> static const int kernel_feature_bits[] = {
>>     VIRTIO_F_NOTIFY_ON_EMPTY,
>> @@ -165,6 +168,62 @@ static const VMStateDescription 
>> vmstate_virtio_vhost_scsi = {
>>     .pre_save = vhost_scsi_pre_save,
>> };
>>
>> +static int vhost_scsi_set_workers(VHostSCSICommon *vsc, int workers_cnt)
>> +{
>> +    struct vhost_dev *dev = &vsc->dev;
>> +    struct vhost_vring_worker vq_worker;
>> +    struct vhost_worker_state worker;
>> +    int i, ret;
>> +
>> +    /* Use default worker */
>> +    if (workers_cnt == VHOST_SCSI_WORKER_DEF ||
>> +    dev->nvqs == VHOST_SCSI_VQ_NUM_FIXED + 1) {
>> +    return 0;
>> +    }
>> +
>> +    if (workers_cnt != VHOST_SCSI_WORKER_PER_VQ) {
>> +    return -EINVAL;
>> +    }
>> +
>> +    /*
>> + * ctl/evt share the first worker since it will be rare for them
>> + * to send cmds while IO is running.
>> + */
>> +    for (i = VHOST_SCSI_VQ_NUM_FIXED + 1; i < dev->nvqs; i++) {
>> +    memset(&worker, 0, sizeof(worker));
>> +
>> +    ret = dev->vhost_ops->vhost_new_worker(dev, &worker);
> 
> Should we call vhost_free_worker() in the vhost_scsi_unrealize() or are
> workers automatically freed when `vhostfd` is closed?
> 

All worker threads are freed automatically like how the default worker
created from VHOST_SET_OWNER is freed on close.

Re: [PATCH] monitor: flush messages on abort

2023-11-15 Thread Markus Armbruster

Steven Sistare  writes:

> On 11/6/2023 5:10 AM, Daniel P. Berrangé wrote:
>> On Fri, Nov 03, 2023 at 03:51:00PM -0400, Steven Sistare wrote:
>>> On 11/3/2023 1:33 PM, Daniel P. Berrangé wrote:
>>>> On Fri, Nov 03, 2023 at 09:01:29AM -0700, Steve Sistare wrote:
>>>>> Buffered monitor output is lost when abort() is called.  The pattern
>>>>> error_report() followed by abort() occurs about 60 times, so valuable
>>>>> information is being lost when the abort is called in the context of a
>>>>> monitor command.
>>>>
>>>> I'm curious, was there a particular abort() scenario that you hit ?
>>>
>>> Yes, while tweaking the suspended state, and forgetting to add transitions:
>>>
>>> error_report("invalid runstate transition: '%s' -> '%s'",
>>> abort();
>>>
>>> But I have previously hit this for other errors.
>>>
>>>> For some crude statistics:
>>>>
>>>>   $ for i in abort return exit goto ; do echo -n "$i: " ; git grep --after 1 error_report | grep $i | wc -l ; done
>>>>   abort: 47
>>>>   return: 512
>>>>   exit: 458
>>>>   goto: 177
>>>>
>>>> to me those numbers say that calling "abort()" after error_report
>>>> should be considered a bug, and we can blanket replace all the
>>>> abort() calls with exit(EXIT_FAILURE), and thus avoid the need to
>>>> special case flushing the monitor.
>>>
>>> And presumably add an atexit handler to flush the monitor ala monitor_abort.
>>> AFAICT currently no destructor is called for the monitor at exit time.
>> 
>> The HMP monitor flushes at each newline,  and exit() will take care of
>> flushing stdout, so I don't think there's anything else needed.
>> 
>>>> Also I think there's a decent case to be made for error_report()
>>>> to call monitor_flush().
>>>
>>> A good start, but that would not help for monitors with skip_flush=true,
>>> which need to format the buffered string in a json response, which is the
>>> case I tripped over.
>> 
>> 'skip_flush' is only set to 'true' when using a QMP monitor and invoking
>> "hmp-monitor-command".
>
> OK, that is narrower than I thought.  Now I see that other QMP monitors send 
> error_report() to stderr, hence it is visible after abort and exit:
>
> int error_vprintf(const char *fmt, va_list ap) {
> if (cur_mon && !monitor_cur_is_qmp())
> return monitor_vprintf(cur_mon, fmt, ap);
> return vfprintf(stderr, fmt, ap);<-- HERE
>
> That surprises me, I thought that would be returned to the monitor caller in
> the json response. I guess the rationale is that the "main" error, if any,
> will be set and returned by the err object that is passed down the stack
> during command evaluation.

Three cases:

1. !cur_mon

   Not executing a monitor command.  We want to report errors etc to
   stderr.

2. cur_mon && !monitor_cur_is_qmp()

   Executing an HMP command.  We want to report errors to the current
   monitor.

3. cur_mon && monitor_cur_is_qmp()

   Executing a QMP command.  What we want is less obvious.

   Somewhere up the call stack is the QMP command's handler function.
   It takes an Error **errp argument.

   Within such a function, any errors need to be passed up the call
   chain into that argument.  Reporting them with error_report() is
   *wrong*.  Reporting must be left to the function's caller.

   A QMP command handler returns its output, it doesn't print it.  So
   calling monitor_printf() is wrong, too.

   But what about warn_report()?  Is that wrong, too?  We decided it's
   not, mostly because we have nothing else to offer.

   The stupidest way to keep it useful in QMP command context is to have
   error_vprintf() print to stderr.  So that's what it does.

   We could instead accumulate error_vprintf() output in a buffer, and
   include it with the QMP reply.  However, it's not clear what a
   management application could do with it.  So we stick to stupid.
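
The routing described above can be modelled in a few lines. This is a sketch with hypothetical names (`Sink`, `error_sink`), not the actual error_vprintf() code; it only encodes which destination each combination of "is there a current monitor" and "is it QMP" ends up with.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of where error_vprintf() output ends up. */
typedef enum { SINK_STDERR, SINK_MONITOR } Sink;

static Sink error_sink(bool have_cur_mon, bool cur_mon_is_qmp)
{
    if (have_cur_mon && !cur_mon_is_qmp) {
        return SINK_MONITOR;    /* executing an HMP command */
    }
    /* No monitor command running, or a QMP command: use stderr. */
    return SINK_STDERR;
}
```

Only the HMP case diverts output to the monitor; both other cases share the stderr path, which is why QMP warnings remain visible after abort().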

[...]

Re: [PATCH-for-8.2?] tests/avocado: Make fetch_asset() inconditionally require a crypto hash

2023-11-15 Thread Alex Bennée

Philippe Mathieu-Daudé  writes:

s/inconditionally/unconditionally/

Otherwise:

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 2/2] vhost-scsi: Add support for a worker thread per virtqueue

2023-11-15 Thread Mike Christie

On 11/15/23 6:57 AM, Stefan Hajnoczi wrote:
> On Wed, Nov 15, 2023 at 12:43:02PM +0100, Stefano Garzarella wrote:
>> On Mon, Nov 13, 2023 at 06:36:44PM -0600, Mike Christie wrote:
>>> This adds support for vhost-scsi to be able to create a worker thread
>>> per virtqueue. Right now for vhost-net we get a worker thread per
>>> tx/rx virtqueue pair which scales nicely as we add more virtqueues and
>>> CPUs, but for scsi we get the single worker thread that's shared by all
>>> virtqueues. When trying to send IO to more than 2 virtqueues the single
>>> thread becomes a bottleneck.
>>>
>>> This patch adds a new setting, virtqueue_workers, which can be set to:
>>>
>>> 1: Existing behavior where we get the single thread.
>>> -1: Create a worker per IO virtqueue.
>>
>> I find this setting a bit odd. What about a boolean instead?
>>
>> `per_virtqueue_workers`:
>> false: Existing behavior where we get the single thread.
>> true: Create a worker per IO virtqueue.
> 
> Me too, I thought there would be round-robin assignment for 1 <
> worker_cnt < (dev->nvqs - VHOST_SCSI_VQ_NUM_FIXED) but instead only 1
> and -1 have any meaning.
> 
> Do you want to implement round-robin assignment?
> 

It was an int because I originally did round robin but at some point
dropped it. I found that our users at least:

1. Are used to configuring number of virtqueues.
2. In the userspace guest OS are used to checking the queue to CPU
mappings to figure out how their app should optimize itself.

So users would just do a virtqueue per vCPU or if trying to reduce
mem usage would do N virtqueues < vCPUs. For both cases they just did the
worker per virtqueue.

However, I left it an int in case someone wanted the feature in
the future.

Re: [PATCH] softmmu/memory: use memcpy for multi-byte accesses

2023-11-15 Thread Patrick Venture

On Wed, Nov 15, 2023 at 2:30 AM Peter Maydell 
wrote:

> On Tue, 14 Nov 2023 at 21:18, Richard Henderson
>  wrote:
> >
> > On 11/14/23 12:55, Patrick Venture wrote:
> > > Avoids unaligned pointer issues.
> > >
> > > Reviewed-by: Chris Rauer 
> > > Reviewed-by: Peter Foley 
> > > Signed-off-by: Patrick Venture 
> > > ---
> > >   system/memory.c | 16 
> > >   1 file changed, 8 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/system/memory.c b/system/memory.c
> > > index 304fa843ea..02c97d5187 100644
> > > --- a/system/memory.c
> > > +++ b/system/memory.c
> > > @@ -1343,16 +1343,16 @@ static uint64_t
> memory_region_ram_device_read(void *opaque,
> > >
> > >   switch (size) {
> > >   case 1:
> > > -data = *(uint8_t *)(mr->ram_block->host + addr);
> > > +memcpy(&data, mr->ram_block->host + addr, sizeof(uint8_t));
> >
> >
> > This is incorrect, especially for big-endian hosts.
> >
> > You want to use "qemu/bswap.h", ld*_he_p(), st*_he_p().
>
> More specifically, we have a ldn_he_p() and stn_he_p() that
> take the size in bytes of the data to read, so we should be
> able to replace the switch-on-size in these functions with
> a single call to the appropriate one of those.
>

Thanks!


>
> thanks
> -- PMM
>
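
To make the endianness point concrete: memcpy(&data, p, sizeof(uint8_t)) into a uint64_t fills the lowest-addressed byte of data, which is the most significant byte on a big-endian host. The sketch below shows the shape of a sized host-endian load that avoids this. `load_he` is a hypothetical stand-in written for illustration, not QEMU's ldn_he_p() from "qemu/bswap.h".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for a sized host-endian load: copy into a
 * temporary of exactly the access width, then widen.  memcpy() has
 * no alignment requirement on 'ptr'.
 */
static uint64_t load_he(const void *ptr, unsigned size)
{
    uint8_t v8;
    uint16_t v16;
    uint32_t v32;
    uint64_t v64;

    switch (size) {
    case 1:
        memcpy(&v8, ptr, sizeof(v8));
        return v8;
    case 2:
        memcpy(&v16, ptr, sizeof(v16));
        return v16;
    case 4:
        memcpy(&v32, ptr, sizeof(v32));
        return v32;
    case 8:
        memcpy(&v64, ptr, sizeof(v64));
        return v64;
    default:
        assert(!"unsupported access size");
        return 0;
    }
}
```

Because the temporary has the same width as the access, the widening conversion places the value in the low bits on any host byte order.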

Re: [PATCH] softmmu/memory: use memcpy for multi-byte accesses

2023-11-15 Thread Patrick Venture

On Wed, Nov 15, 2023 at 2:35 AM Peter Maydell 
wrote:

> On Tue, 14 Nov 2023 at 20:55, Patrick Venture  wrote:
> > Avoids unaligned pointer issues.
> >
>
> It would be nice to be more specific in the commit message here, by
> describing what kind of guest behaviour or machine config runs into this
> problem, and whether this happens in a situation users are likely to
> run into. If the latter, we should consider tagging the commit
> with "Cc: qemu-sta...@nongnu.org" to have it backported to the
> stable release branches.
>

Thanks! I'll update the commit message with v2.  We were seeing this in our
infrastructure with unaligned accesses using the pointer dereference as
there are no guarantees on alignment of the incoming values.

>
> thanks
> -- PMM
>

Re: [PATCH v6 07/21] vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info

2023-11-15 Thread Eric Auger




On 11/14/23 11:09, Zhenzhong Duan wrote:
> This helper will be used by both legacy and iommufd backends.
>
> No functional changes intended.
>
> Signed-off-by: Zhenzhong Duan 
> Reviewed-by: Cédric Le Goater 
> Signed-off-by: Cédric Le Goater 
Reviewed-by: Eric Auger 

Eric
> ---
>  hw/vfio/pci.h |  3 +++
>  hw/vfio/pci.c | 54 +++
>  2 files changed, 40 insertions(+), 17 deletions(-)
>
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index fba8737ab2..1006061afb 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -218,6 +218,9 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int 
> nr);
>  
>  extern const PropertyInfo qdev_prop_nv_gpudirect_clique;
>  
> +int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
> +struct vfio_pci_hot_reset_info **info_p);
> +
>  int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp);
>  
>  int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index c62c02f7b6..eb55e8ae88 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2445,22 +2445,13 @@ static bool vfio_pci_host_match(PCIHostDeviceAddress 
> *addr, const char *name)
>  return (strcmp(tmp, name) == 0);
>  }
>  
> -static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
> +int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
> +struct vfio_pci_hot_reset_info **info_p)
>  {
> -VFIOGroup *group;
>  struct vfio_pci_hot_reset_info *info;
> -struct vfio_pci_dependent_device *devices;
> -struct vfio_pci_hot_reset *reset;
> -int32_t *fds;
> -int ret, i, count;
> -bool multi = false;
> +int ret, count;
>  
> -trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
> -
> -if (!single) {
> -vfio_pci_pre_reset(vdev);
> -}
> -vdev->vbasedev.needs_reset = false;
> +assert(info_p && !*info_p);
>  
>  info = g_malloc0(sizeof(*info));
>  info->argsz = sizeof(*info);
> @@ -2468,24 +2459,53 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, 
> bool single)
>  ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
>  if (ret && errno != ENOSPC) {
>  ret = -errno;
> +g_free(info);
>  if (!vdev->has_pm_reset) {
>  error_report("vfio: Cannot reset device %s, "
>   "no available reset mechanism.", 
> vdev->vbasedev.name);
>  }
> -goto out_single;
> +return ret;
>  }
>  
>  count = info->count;
> -info = g_realloc(info, sizeof(*info) + (count * sizeof(*devices)));
> -info->argsz = sizeof(*info) + (count * sizeof(*devices));
> -devices = &info->devices[0];
> +info = g_realloc(info, sizeof(*info) + (count * 
> sizeof(info->devices[0])));
> +info->argsz = sizeof(*info) + (count * sizeof(info->devices[0]));
>  
>  ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
>  if (ret) {
>  ret = -errno;
> +g_free(info);
>  error_report("vfio: hot reset info failed: %m");
> +return ret;
> +}
> +
> +*info_p = info;
> +return 0;
> +}
> +
> +static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
> +{
> +VFIOGroup *group;
> +struct vfio_pci_hot_reset_info *info = NULL;
> +struct vfio_pci_dependent_device *devices;
> +struct vfio_pci_hot_reset *reset;
> +int32_t *fds;
> +int ret, i, count;
> +bool multi = false;
> +
> +trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
> +
> +if (!single) {
> +vfio_pci_pre_reset(vdev);
> +}
> +vdev->vbasedev.needs_reset = false;
> +
> +ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
> +
> +if (ret) {
>  goto out_single;
>  }
> +devices = &info->devices[0];
>  
>  trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
>
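
The helper follows a common "ask twice" pattern: call once with a header-sized buffer to learn the count, grow the buffer, call again. A self-contained sketch of that pattern is below, assuming a made-up reply layout and a fake query function in place of the real ioctl(); `struct info`, `fake_query` and `get_info` are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reply layout: fixed header plus a flexible array. */
struct info {
    uint32_t argsz;      /* in: bytes available in this buffer */
    uint32_t count;      /* out: number of entries that exist */
    uint32_t devices[];  /* out: filled when the buffer is big enough */
};

/* Fake "kernel" side, standing in for the real ioctl(). */
static int fake_query(struct info *info)
{
    const uint32_t have = 3;
    size_t need = sizeof(*info) + have * sizeof(info->devices[0]);

    info->count = have;
    if (info->argsz < need) {
        errno = ENOSPC;
        return -1;
    }
    for (uint32_t i = 0; i < have; i++) {
        info->devices[i] = 100 + i;
    }
    return 0;
}

/* Returns 0 and stores a malloc'ed reply in *info_p, or -errno. */
static int get_info(struct info **info_p)
{
    struct info *info, *bigger;
    size_t size;
    int ret;

    assert(info_p && !*info_p);

    info = calloc(1, sizeof(*info));
    if (!info) {
        return -ENOMEM;
    }
    info->argsz = sizeof(*info);

    /* The first call only learns the count; ENOSPC is expected. */
    if (fake_query(info) && errno != ENOSPC) {
        ret = -errno;   /* save errno before free() can change it */
        free(info);
        return ret;
    }

    size = sizeof(*info) + info->count * sizeof(info->devices[0]);
    bigger = realloc(info, size);
    if (!bigger) {
        free(info);
        return -ENOMEM;
    }
    info = bigger;
    info->argsz = (uint32_t)size;

    if (fake_query(info)) {
        ret = -errno;
        free(info);
        return ret;
    }

    *info_p = info;
    return 0;
}
```

The caller owns the returned buffer on success and must free it; on failure nothing is left allocated, which mirrors the contract of the extracted helper.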

Re: [PATCH] softmmu/memory: use memcpy for multi-byte accesses

2023-11-15 Thread Richard Henderson

On 11/15/23 08:58, Patrick Venture wrote:

> On Wed, Nov 15, 2023 at 2:35 AM Peter Maydell wrote:
> 
>> On Tue, 14 Nov 2023 at 20:55, Patrick Venture wrote:
>>> Avoids unaligned pointer issues.
>>
>> It would be nice to be more specific in the commit message here, by
>> describing what kind of guest behaviour or machine config runs into this
>> problem, and whether this happens in a situation users are likely to
>> run into. If the latter, we should consider tagging the commit
>> with "Cc: qemu-sta...@nongnu.org" to have it backported to the
>> stable release branches.
> 
> Thanks! I'll update the commit message with v2.  We were seeing this in our
> infrastructure with unaligned accesses using the pointer dereference as there
> are no guarantees on alignment of the incoming values.

Which host cpu, for reference?  There aren't many that generate unaligned traps 
these days...

r~

[PATCH for-8.2 2/4] block: Fix deadlocks in bdrv_graph_wrunlock()

2023-11-15 Thread Kevin Wolf

bdrv_graph_wrunlock() calls aio_poll(), which may run callbacks that
have a nested event loop. Nested event loops can depend on other
iothreads making progress, so in order to allow them to make progress it
must not hold the AioContext lock of another thread while calling
aio_poll().

This introduces a @bs parameter to bdrv_graph_wrunlock() whose
AioContext is temporarily dropped (which matches bdrv_graph_wrlock()),
and a bdrv_graph_wrunlock_ctx() that can be used if the BlockDriverState
doesn't necessarily exist any more when unlocking.

Signed-off-by: Kevin Wolf 
---
 include/block/graph-lock.h | 15 ++-
 block.c| 26 +-
 block/backup.c |  2 +-
 block/blklogwrites.c   |  4 ++--
 block/blkverify.c  |  2 +-
 block/block-backend.c  |  6 --
 block/commit.c | 10 +-
 block/graph-lock.c | 23 ++-
 block/mirror.c | 14 +++---
 block/qcow2.c  |  2 +-
 block/quorum.c |  4 ++--
 block/replication.c| 10 +-
 block/snapshot.c   |  2 +-
 block/stream.c |  8 
 block/vmdk.c   | 10 +-
 blockdev.c |  4 ++--
 blockjob.c |  8 
 tests/unit/test-bdrv-drain.c   | 20 ++--
 tests/unit/test-bdrv-graph-mod.c   | 10 +-
 scripts/block-coroutine-wrapper.py |  2 +-
 20 files changed, 109 insertions(+), 73 deletions(-)

diff --git a/include/block/graph-lock.h b/include/block/graph-lock.h
index 6f1cd12745..22b5db1ed9 100644
--- a/include/block/graph-lock.h
+++ b/include/block/graph-lock.h
@@ -123,8 +123,21 @@ bdrv_graph_wrlock(BlockDriverState *bs);
  * bdrv_graph_wrunlock:
  * Write finished, reset global has_writer to 0 and restart
  * all readers that are waiting.
+ *
+ * If @bs is non-NULL, its AioContext is temporarily released.
+ */
+void no_coroutine_fn TSA_RELEASE(graph_lock) TSA_NO_TSA
+bdrv_graph_wrunlock(BlockDriverState *bs);
+
+/*
+ * bdrv_graph_wrunlock_ctx:
+ * Write finished, reset global has_writer to 0 and restart
+ * all readers that are waiting.
+ *
+ * If @ctx is non-NULL, its lock is temporarily released.
  */
-void bdrv_graph_wrunlock(void) TSA_RELEASE(graph_lock) TSA_NO_TSA;
+void no_coroutine_fn TSA_RELEASE(graph_lock) TSA_NO_TSA
+bdrv_graph_wrunlock_ctx(AioContext *ctx);
 
 /*
  * bdrv_graph_co_rdlock:
diff --git a/block.c b/block.c
index eac105a504..efb91d4ffb 100644
--- a/block.c
+++ b/block.c
@@ -1713,7 +1713,7 @@ open_failed:
 bdrv_unref_child(bs, bs->file);
 assert(!bs->file);
 }
-bdrv_graph_wrunlock();
+bdrv_graph_wrunlock(NULL);
 
 g_free(bs->opaque);
 bs->opaque = NULL;
@@ -3577,7 +3577,7 @@ int bdrv_set_backing_hd(BlockDriverState *bs, 
BlockDriverState *backing_hd,
 bdrv_drained_begin(drain_bs);
 bdrv_graph_wrlock(backing_hd);
 ret = bdrv_set_backing_hd_drained(bs, backing_hd, errp);
-bdrv_graph_wrunlock();
+bdrv_graph_wrunlock(backing_hd);
 bdrv_drained_end(drain_bs);
 bdrv_unref(drain_bs);
 
@@ -3796,7 +3796,7 @@ BdrvChild *bdrv_open_child(const char *filename,
 child = bdrv_attach_child(parent, bs, bdref_key, child_class, child_role,
   errp);
 aio_context_release(ctx);
-bdrv_graph_wrunlock();
+bdrv_graph_wrunlock(NULL);
 
 return child;
 }
@@ -4652,7 +4652,7 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, 
Error **errp)
 
 bdrv_graph_wrlock(NULL);
 tran_commit(tran);
-bdrv_graph_wrunlock();
+bdrv_graph_wrunlock(NULL);
 
 QTAILQ_FOREACH_REVERSE(bs_entry, bs_queue, entry) {
 BlockDriverState *bs = bs_entry->state.bs;
@@ -4671,7 +4671,7 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, 
Error **errp)
 abort:
 bdrv_graph_wrlock(NULL);
 tran_abort(tran);
-bdrv_graph_wrunlock();
+bdrv_graph_wrunlock(NULL);
 
 QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
 if (bs_entry->prepared) {
@@ -4857,7 +4857,7 @@ bdrv_reopen_parse_file_or_backing(BDRVReopenState 
*reopen_state,
 ret = bdrv_set_file_or_backing_noperm(bs, new_child_bs, is_backing,
   tran, errp);
 
-bdrv_graph_wrunlock();
+bdrv_graph_wrunlock(new_child_bs);
 
 if (old_ctx != ctx) {
 aio_context_release(ctx);
@@ -5216,7 +5216,7 @@ static void bdrv_close(BlockDriverState *bs)
 
 assert(!bs->backing);
 assert(!bs->file);
-bdrv_graph_wrunlock();
+bdrv_graph_wrunlock(bs);
 
 g_free(bs->opaque);
 bs->opaque = NULL;
@@ -5511,7 +5511,7 @@ int bdrv_drop_filter(BlockDriverState *bs, Error **errp)
 bdrv_drained_begin(child_bs);
 bdrv_graph_wrlock(bs);
 ret = bdrv_replace_node_common(bs, child_bs, true, true, errp);
-bdrv_graph_wrunlock(

[PATCH for-8.2 4/4] iotests: Test two stream jobs in a single iothread

2023-11-15 Thread Kevin Wolf

This tests two parallel stream jobs that will complete around the same
time and run on two different disks in the same iothread. It is loosely
based on the bug report at https://issues.redhat.com/browse/RHEL-1761.

For me, this test hangs reliably with the originally reported bug in
blk_remove_bs(). After fixing it, the test still hangs intermittently
because of the bugs fixed by the later patches: missing AioContext
unlocking in bdrv_graph_wrunlock() and in stream_prepare(). The deadlocks
seem to happen more frequently when the test directory is on tmpfs.
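
As a rough illustration of how the test decides that both jobs are gone, the polling loop can be sketched against a canned event source. FakeVM, wait_for_jobs_gone() and the max_polls guard are hypothetical names used only for this sketch; they are not part of iotests, and the real test talks to a running QEMU instead.

```python
from collections import deque


class FakeVM:
    """Hypothetical stand-in for the test's VM object; replays canned events."""

    def __init__(self, events):
        self._events = deque(events)
        self.query_jobs_calls = 0

    def event_wait(self, name, timeout=0.0):
        # A None entry (or an empty queue) models "no event within the timeout".
        ev = self._events.popleft() if self._events else None
        if ev is None:
            raise TimeoutError
        return ev

    def cmd(self, command):
        # Only count the calls; the real command would query the job list.
        assert command == "query-jobs"
        self.query_jobs_calls += 1


def wait_for_jobs_gone(vm, expected=2, max_polls=100):
    """Poll until `expected` jobs have reported the 'null' status."""
    finished = 0
    for _ in range(max_polls):
        try:
            ev = vm.event_wait("JOB_STATUS_CHANGE", timeout=0.1)
            if ev is not None and ev["data"]["status"] == "null":
                finished += 1
        except TimeoutError:
            pass
        # Like the loop in the patch, issue query-jobs between polls.
        vm.cmd("query-jobs")
        if finished == expected:
            return finished
    raise RuntimeError("jobs did not finish within max_polls iterations")


def status(job_id, value):
    """Build a JOB_STATUS_CHANGE-shaped event (fields assumed for the sketch)."""
    return {"event": "JOB_STATUS_CHANGE", "data": {"id": job_id, "status": value}}


canned = [None, status("job0", "running"), status("job0", "null"),
          None, status("job1", "null")]
vm = FakeVM(canned)
finished_jobs = wait_for_jobs_gone(vm)
```

The guard on max_polls exists only so that the sketch cannot spin forever; the test in the patch relies on the test harness timeout instead.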

Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/tests/iothreads-stream | 73 +++
 tests/qemu-iotests/tests/iothreads-stream.out | 11 +++
 2 files changed, 84 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/iothreads-stream
 create mode 100644 tests/qemu-iotests/tests/iothreads-stream.out

diff --git a/tests/qemu-iotests/tests/iothreads-stream b/tests/qemu-iotests/tests/iothreads-stream
new file mode 100755
index 00..dfd52ac2cc
--- /dev/null
+++ b/tests/qemu-iotests/tests/iothreads-stream
@@ -0,0 +1,73 @@
+#!/usr/bin/env python3
+# group: rw quick auto
+#
+# Copyright (C) 2023 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+# Creator/Owner: Kevin Wolf 
+
+import iotests
+
+iotests.script_initialize(supported_fmts=['qcow2'],
+                          supported_platforms=['linux'])
+iotests.verify_virtio_scsi_pci_or_ccw()
+
+with iotests.FilePath('disk1.img') as base1_path, \
+     iotests.FilePath('disk1-snap.img') as snap1_path, \
+     iotests.FilePath('disk2.img') as base2_path, \
+     iotests.FilePath('disk2-snap.img') as snap2_path, \
+     iotests.VM() as vm:
+
+    img_size = '10M'
+
+    # Only one iothread for both disks
+    vm.add_object('iothread,id=iothread0')
+    vm.add_device('virtio-scsi,iothread=iothread0')
+
+    iotests.log('Preparing disks...')
+    for i, base_path, snap_path in ((0, base1_path, snap1_path),
+                                    (1, base2_path, snap2_path)):
+        iotests.qemu_img_create('-f', iotests.imgfmt, base_path, img_size)
+        iotests.qemu_img_create('-f', iotests.imgfmt, '-b', base_path,
+                                '-F', iotests.imgfmt, snap_path)
+
+        iotests.qemu_io_log('-c', f'write 0 {img_size}', base_path)
+
+        vm.add_blockdev(f'file,node-name=disk{i}-base-file,filename={base_path}')
+        vm.add_blockdev(f'qcow2,node-name=disk{i}-base,file=disk{i}-base-file')
+        vm.add_blockdev(f'file,node-name=disk{i}-file,filename={snap_path}')
+        vm.add_blockdev(f'qcow2,node-name=disk{i},file=disk{i}-file,'
+                        f'backing=disk{i}-base')
+        vm.add_device(f'scsi-hd,drive=disk{i}')
+
+    iotests.log('Launching VM...')
+    vm.launch()
+
+    iotests.log('Starting stream jobs...')
+    iotests.log(vm.qmp('block-stream', device='disk0', job_id='job0'))
+    iotests.log(vm.qmp('block-stream', device='disk1', job_id='job1'))
+
+    finished = 0
+    while True:
+        try:
+            ev = vm.event_wait('JOB_STATUS_CHANGE', timeout=0.1)
+            if ev is not None and ev['data']['status'] == 'null':
+                finished += 1
+                # The test is done once both jobs are gone
+                if finished == 2:
+                    break
+        except TimeoutError:
+            pass
+        vm.cmd('query-jobs')
diff --git a/tests/qemu-iotests/tests/iothreads-stream.out b/tests/qemu-iotests/tests/iothreads-stream.out
new file mode 100644
index 00..ef134165e5
--- /dev/null
+++ b/tests/qemu-iotests/tests/iothreads-stream.out
@@ -0,0 +1,11 @@
+Preparing disks...
+wrote 10485760/10485760 bytes at offset 0
+10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+wrote 10485760/10485760 bytes at offset 0
+10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Launching VM...
+Starting stream jobs...
+{"return": {}}
+{"return": {}}
-- 
2.41.0

[PATCH for-8.2 3/4] stream: Fix AioContext locking during bdrv_graph_wrlock()

2023-11-15 Thread Kevin Wolf

In stream_prepare(), we need to temporarily drop the AioContext lock
that job_prepare_locked() took for us while calling the graph write lock
functions which can poll.

All block nodes related to this block job are in the same AioContext, so
we can pass any of them to bdrv_graph_wrlock()/bdrv_graph_wrunlock().
Unfortunately, the one that we picked is base, which can be NULL. In
that case the AioContext lock is not released and deadlocks can occur.

Fix this by passing s->target_bs, which is never NULL.
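
The effect of passing NULL can be modelled in a few lines. This is only a sketch under stated assumptions: Node, poll_with_ctx_dropped() and lock_held_during_poll() are hypothetical names, a threading.Lock stands in for the AioContext lock, and the real functions are C code that does considerably more.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a block node that knows its context lock."""
    ctx_lock: threading.Lock


def poll_with_ctx_dropped(node: Optional[Node], poll: Callable[[], None]) -> None:
    """Run poll(), dropping the node's context lock around it if a node is given.

    With node=None there is no lock to drop, so poll() runs while the
    caller still holds it. That models the case the patch fixes.
    """
    lock = node.ctx_lock if node is not None else None
    if lock is not None:
        lock.release()
    try:
        poll()
    finally:
        if lock is not None:
            lock.acquire()


def lock_held_during_poll(node: Optional[Node], lock: threading.Lock) -> bool:
    """Report whether `lock` was still held while poll() was running."""
    seen = []
    poll_with_ctx_dropped(node, lambda: seen.append(lock.locked()))
    return seen[0]


ctx_lock = threading.Lock()
ctx_lock.acquire()                      # the caller holds the lock on entry
target = Node(ctx_lock)

held_with_none = lock_held_during_poll(None, ctx_lock)      # nothing is dropped
held_with_target = lock_held_during_poll(target, ctx_lock)  # lock is dropped
held_afterwards = ctx_lock.locked()     # the lock is taken back on return
```

In this model, passing a node that is guaranteed to exist is what makes the lock drop unconditional, which is the role s->target_bs plays in the patch.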

Signed-off-by: Kevin Wolf 
---
 block/stream.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index e3aa696289..01fe7c0f16 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -99,9 +99,9 @@ static int stream_prepare(Job *job)
 }
 }
 
-bdrv_graph_wrlock(base);
+bdrv_graph_wrlock(s->target_bs);
 bdrv_set_backing_hd_drained(unfiltered_bs, base, &local_err);
-bdrv_graph_wrunlock(base);
+bdrv_graph_wrunlock(s->target_bs);
 
 /*
  * This call will do I/O, so the graph can change again from here on.
-- 
2.41.0

[PATCH for-8.2 0/4] block: Fix deadlocks with the stream job

2023-11-15 Thread Kevin Wolf

This series contains three fixes for deadlocks that follow the same
pattern: A nested event loop in the main thread waits for an iothread to
make progress, but the AioContext lock of that iothread is still held by
the main loop, so it can never make progress.

We're planning to fully remove the AioContext lock in 9.0, which would
automatically get rid of this kind of bug, but it's still there in 8.2,
so let's fix them individually for this release.
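
The pattern can be reproduced in miniature with two Python threads. This is a toy model, not QEMU code: wait_for_worker() and its parameters are hypothetical, a threading.Lock stands in for the iothread's AioContext lock, and a timed wait stands in for the nested event loop so that the stuck case ends in a timeout instead of a real hang.

```python
import threading


def wait_for_worker(release_lock_while_waiting: bool, timeout: float = 0.2) -> bool:
    """Return True if the worker made progress while the main thread waited."""
    ctx_lock = threading.Lock()      # stands in for the iothread's AioContext lock
    progressed = threading.Event()   # set once the worker has done its step
    give_up = threading.Event()      # lets the worker exit after a timed-out wait

    def worker() -> None:
        # The worker can only make progress while holding ctx_lock.
        while not give_up.is_set():
            if ctx_lock.acquire(timeout=0.05):
                try:
                    progressed.set()
                finally:
                    ctx_lock.release()
                return

    ctx_lock.acquire()               # the main thread holds the lock on entry
    thread = threading.Thread(target=worker)
    thread.start()

    if release_lock_while_waiting:
        ctx_lock.release()           # temporarily drop the lock around the wait
    ok = progressed.wait(timeout)    # the "nested event loop"
    if release_lock_while_waiting:
        ctx_lock.acquire()           # take it back before returning to the caller

    # Clean up so the worker thread always terminates.
    give_up.set()
    ctx_lock.release()
    thread.join()
    return ok


stuck = wait_for_worker(release_lock_while_waiting=False)
fixed = wait_for_worker(release_lock_while_waiting=True)
```

With the lock held across the wait, the worker never gets to run its step and the wait times out; with the lock dropped around the wait, the worker finishes and the wait returns immediately.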

Kevin Wolf (4):
  block: Fix bdrv_graph_wrlock() call in blk_remove_bs()
  block: Fix deadlocks in bdrv_graph_wrunlock()
  stream: Fix AioContext locking during bdrv_graph_wrlock()
  iotests: Test two stream jobs in a single iothread

 include/block/graph-lock.h                    | 15 +++-
 block.c                                       | 26 +++
 block/backup.c                                |  2 +-
 block/blklogwrites.c                          |  4 +-
 block/blkverify.c                             |  2 +-
 block/block-backend.c                         | 10 ++-
 block/commit.c                                | 10 +--
 block/graph-lock.c                            | 23 +-
 block/mirror.c                                | 14 ++--
 block/qcow2.c                                 |  2 +-
 block/quorum.c                                |  4 +-
 block/replication.c                           | 10 +--
 block/snapshot.c                              |  2 +-
 block/stream.c                                | 10 +--
 block/vmdk.c                                  | 10 +--
 blockdev.c                                    |  4 +-
 blockjob.c                                    |  8 +-
 tests/unit/test-bdrv-drain.c                  | 20 ++---
 tests/unit/test-bdrv-graph-mod.c              | 10 +--
 scripts/block-coroutine-wrapper.py            |  2 +-
 tests/qemu-iotests/tests/iothreads-stream     | 73 +++
 tests/qemu-iotests/tests/iothreads-stream.out | 11 +++
 22 files changed, 197 insertions(+), 75 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/iothreads-stream
 create mode 100644 tests/qemu-iotests/tests/iothreads-stream.out

-- 
2.41.0
