date:20240219

Re: [PATCH v5 02/11] pcie_sriov: Validate NumVFs

2024-02-19 Thread Akihiko Odaki


On 2024/02/19 2:36, Michael S. Tsirkin wrote:

On Sun, Feb 18, 2024 at 01:56:07PM +0900, Akihiko Odaki wrote:

The guest may write NumVFs greater than TotalVFs and that can lead
to buffer overflow in VF implementations.

Cc: qemu-sta...@nongnu.org
Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)")
Signed-off-by: Akihiko Odaki 
---
  hw/pci/pcie_sriov.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index a1fe65f5d801..da209b7f47fd 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -176,6 +176,9 @@ static void register_vfs(PCIDevice *dev)
  
     assert(sriov_cap > 0);
 
     num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
+    if (num_vfs > pci_get_word(dev->config + sriov_cap + PCI_SRIOV_TOTAL_VF)) {
+        return;
+    }
 
     dev->exp.sriov_pf.vf = g_new(PCIDevice *, num_vfs);



This reminds me: how is this num_vfs value set on migration?


That's a good point... Migration was never taken into consideration,
and SR-IOV is completely broken with it.
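
For readers following the fix, the bounds check added to register_vfs() can be modelled outside QEMU. This is only an illustrative Python sketch, not QEMU code: config space is treated as a plain byte array, the helper names (get_word, sriov_num_vfs_valid, make_config) are made up for this example, and the offsets 0x0e and 0x10 are assumed to correspond to PCI_SRIOV_TOTAL_VF and PCI_SRIOV_NUM_VF within the SR-IOV capability.

```python
import struct

# Assumed field offsets inside the SR-IOV capability (illustrative names).
TOTAL_VF_OFF = 0x0E
NUM_VF_OFF = 0x10


def get_word(config: bytes, offset: int) -> int:
    """Read a little-endian 16-bit word, in the spirit of pci_get_word()."""
    return struct.unpack_from("<H", config, offset)[0]


def sriov_num_vfs_valid(config: bytes, sriov_cap: int) -> bool:
    """Return True only if the guest-written NumVFs fits within TotalVFs."""
    num_vfs = get_word(config, sriov_cap + NUM_VF_OFF)
    total_vfs = get_word(config, sriov_cap + TOTAL_VF_OFF)
    return num_vfs <= total_vfs


def make_config(sriov_cap: int, total_vfs: int, num_vfs: int) -> bytearray:
    """Build a toy 4 KiB config space with only the two fields filled in."""
    config = bytearray(4096)
    struct.pack_into("<H", config, sriov_cap + TOTAL_VF_OFF, total_vfs)
    struct.pack_into("<H", config, sriov_cap + NUM_VF_OFF, num_vfs)
    return config
```

A caller that allocates one array slot per VF would refuse to proceed when sriov_num_vfs_valid() is false, which mirrors the early return in the patch.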

[PULL 00/49] ppc-for-9.0 queue

2024-02-19 Thread Nicholas Piggin

The following changes since commit da96ad4a6a2ef26c83b15fa95e7fceef5147269c:

  Merge tag 'hw-misc-20240215' of https://github.com/philmd/qemu into staging (2024-02-16 11:05:14 +)

are available in the Git repository at:

  https://gitlab.com/npiggin/qemu.git tags/pull-ppc-for-9.0-20240219

for you to fetch changes up to 922e408e12315121d3e09304b8b8f462ea051af1:

  target/ppc: optimise ppcemb_tlb_t flushing (2024-02-19 18:09:19 +1000)


* Avocado tests for ppc64 to boot FreeBSD, run guests with emulated
  or nested hypervisor facilities, among other things.
* Update ppc64 CPU defaults to Power10.
* Add a new powernv10-rainier machine to better capture differences
  between the different Power10 systems.
* Implement more device models for powernv.
* 4xx TLB flushing performance and correctness improvements.
* Correct gdb implementation to access some important SPRs.
* Misc cleanups and bug fixes.

I dropped the BHRB patches. They are very close, but a minor issue that
was only noticed recently held them up. Hopefully we can get those and a
number of other outstanding submissions in for 9.0, but this PR was
already taking too long.

Thanks,
Nick

Chalapathi V (3):
  hw/ppc: Add pnv nest pervasive common chiplet model
  hw/ppc: Add N1 chiplet model
  hw/ppc: N1 chiplet wiring

Cédric Le Goater (1):
  spapr: Tag pseries-2.1 - 2.11 machines as deprecated

Glenn Miles (9):
  misc/pca9552: Fix inverted input status
  misc/pca9552: Let external devices set pca9552 inputs
  ppc/pnv: New powernv10-rainier machine type
  ppc/pnv: Add pca9552 to powernv10-rainier for PCIe hotplug power control
  ppc/pnv: Wire up pca9552 GPIO pins for PCIe hotplug power control
  ppc/pnv: Use resettable interface to reset child I2C buses
  misc: Add a pca9554 GPIO device model
  ppc/pnv: Add a pca9554 I2C device to powernv10-rainier
  ppc/pnv: Test pnv i2c master and connected devices

Harsh Prateek Bora (2):
  ppc/spapr: Introduce SPAPR_IRQ_NR_IPIS to refer IRQ range for CPU IPIs.
  ppc/spapr: Initialize max_cpus limit to SPAPR_IRQ_NR_IPIS.

Nicholas Piggin (28):
  target/ppc: Fix lxv/stxv MSR facility check
  target/ppc: Fix crash on machine check caused by ifetch
  tests/avocado: mark boot_linux.py long runtime instead of flaky
  tests/avocado: improve flaky ppc/pnv boot_linux_console.py test
  tests/avocado: ppc add powernv10 boot_linux_console test
  tests/avocado: Add ppc pseries and powernv hash MMU tests
  tests/avocado: Add pseries KVM boot_linux test
  tests/avocado: ppc add hypervisor tests
  tests/avocado: Add FreeBSD distro boot tests for ppc
  tests/avocado: Use default CPU for pseries machine
  ppc/pnv: Update skiboot to v7.1
  target/ppc: Rename registers to match ISA
  ppc/spapr: change pseries machine default to POWER10 CPU
  ppc/pnv: Change powernv default to powernv10
  target/ppc: Rename TBL to TB on 64-bit
  target/ppc: Improve timebase register defines naming
  target/ppc: Fix move-to timebase SPR access permissions
  ppc/pnv: Add POWER9/10 chiptod model
  ppc/pnv: Wire ChipTOD model to powernv9 and powernv10 machines
  ppc/pnv: Implement the ChipTOD to Core transfer
  target/ppc: Implement core timebase state machine and TFMR
  target/ppc: Add SMT support to time facilities
  target/ppc: Fix 440 tlbwe TLB invalidation gaps
  target/ppc: Factor out 4xx ppcemb_tlb_t flushing
  target/ppc: 4xx don't flush TLB for a newly written software TLB entry
  target/ppc: 4xx optimise tlbwe_lo TLB flushing
  target/ppc: 440 optimise tlbwe TLB flushing
  target/ppc: optimise ppcemb_tlb_t flushing

Peter Maydell (1):
  hw/pci-host/raven.c: Mark raven_io_ops as implementing unaligned accesses

Philippe Mathieu-Daudé (4):
  hw/ppc/spapr: Add missing license
  hw/ppc/spapr_hcall: Allow elision of softmmu_resize_hpt_prep
  hw/ppc/spapr_hcall: Rename {softmmu -> vhyp_mmu}_resize_hpt_pr
  hw/ppc/spapr: Rename 'softmmu' -> 'vhyp_mmu'

Saif Abrar (1):
  target/ppc: Update gdbstub to read SPR's CFAR, DEC, HDEC, TB-L/U

 MAINTAINERS  |  11 +-
 docs/about/deprecated.rst|   8 +
 docs/devel/testing.rst   |  11 +
 hw/misc/Kconfig  |   4 +
 hw/misc/meson.build  |   1 +
 hw/misc/pca9552.c|  58 ++-
 hw/misc/pca9554.c| 328 +++
 hw/ppc/Kconfig   |   2 +
 hw/ppc/meson.build   |   5 +-
 hw/ppc/pnv.c | 131 +-
 hw/ppc/pnv_chiptod.c | 586 +++
 hw/ppc/pnv_i2c.c

[PULL 05/49] tests/avocado: ppc add powernv10 boot_linux_console test

2024-02-19 Thread Nicholas Piggin

Add test for POWER10.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 tests/avocado/boot_linux_console.py | 8 
 1 file changed, 8 insertions(+)

diff --git a/tests/avocado/boot_linux_console.py b/tests/avocado/boot_linux_console.py
index af104fff1c..a00202df3c 100644
--- a/tests/avocado/boot_linux_console.py
+++ b/tests/avocado/boot_linux_console.py
@@ -1387,6 +1387,14 @@ def test_ppc_powernv9(self):
         """
         self.do_test_ppc64_powernv('P9')
 
+    def test_ppc_powernv10(self):
+        """
+        :avocado: tags=arch:ppc64
+        :avocado: tags=machine:powernv10
+        :avocado: tags=accel:tcg
+        """
+        self.do_test_ppc64_powernv('P10')
+
     def test_ppc_g3beige(self):
         """
         :avocado: tags=arch:ppc
-- 
2.42.0

[PULL 07/49] tests/avocado: Add pseries KVM boot_linux test

2024-02-19 Thread Nicholas Piggin

ppc has no avocado tests for the KVM backend. Add a KVM boot_linux.py
test for pseries.

Signed-off-by: Nicholas Piggin 
---
 tests/avocado/boot_linux.py | 8 
 1 file changed, 8 insertions(+)

diff --git a/tests/avocado/boot_linux.py b/tests/avocado/boot_linux.py
index de4c8805f7..61ba13dda8 100644
--- a/tests/avocado/boot_linux.py
+++ b/tests/avocado/boot_linux.py
@@ -103,6 +103,14 @@ def test_pseries_tcg(self):
         self.vm.add_args("-accel", "tcg")
         self.launch_and_wait(set_up_ssh_connection=False)
 
+    def test_pseries_kvm(self):
+        """
+        :avocado: tags=machine:pseries
+        :avocado: tags=accel:kvm
+        """
+        self.require_accelerator("kvm")
+        self.vm.add_args("-accel", "kvm")
+        self.launch_and_wait(set_up_ssh_connection=False)
 
 class BootLinuxS390X(LinuxTest):
     """
-- 
2.42.0

[PULL 08/49] tests/avocado: ppc add hypervisor tests

2024-02-19 Thread Nicholas Piggin

The powernv and pseries machines both provide hypervisor facilities
that are supported by KVM. This is a large and complicated set of
features that don't get much system-level testing in ppc tests.

Add a new test case for these which runs QEMU KVM inside the target.
This downloads an Alpine VM image, boots it and downloads and installs
the qemu package, then boots a virtual machine under it, re-using the
original Alpine VM image.

Signed-off-by: Nicholas Piggin 
---
 MAINTAINERS   |   1 +
 tests/avocado/ppc_hv_tests.py | 203 ++
 2 files changed, 204 insertions(+)
 create mode 100644 tests/avocado/ppc_hv_tests.py

diff --git a/MAINTAINERS b/MAINTAINERS
index 7d61fb9319..c0f42e8d4a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1525,6 +1525,7 @@ F: tests/qtest/libqos/*spapr*
 F: tests/qtest/rtas*
 F: tests/qtest/libqos/rtas*
 F: tests/avocado/ppc_pseries.py
+F: tests/avocado/ppc_hv_tests.py
 
 PowerNV (Non-Virtualized)
 M: Cédric Le Goater 
diff --git a/tests/avocado/ppc_hv_tests.py b/tests/avocado/ppc_hv_tests.py
new file mode 100644
index 00..2f80d0d176
--- /dev/null
+++ b/tests/avocado/ppc_hv_tests.py
@@ -0,0 +1,203 @@
+# Tests that specifically try to exercise hypervisor features of the
+# target machines. powernv supports the Power hypervisor ISA, and
+# pseries supports the nested-HV hypervisor spec.
+#
+# Copyright (c) 2023 IBM Corporation
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+from avocado import skipIf, skipUnless
+from avocado.utils import archive
+from avocado_qemu import QemuSystemTest
+from avocado_qemu import wait_for_console_pattern, exec_command
+import os
+import time
+import subprocess
+
+deps = ["xorriso"] # dependent tools needed in the test setup/box.
+
+def which(tool):
+    """ looks up the full path for @tool, returns None if not found
+        or if @tool does not have executable permissions.
+    """
+    paths=os.getenv('PATH')
+    for p in paths.split(os.path.pathsep):
+        p = os.path.join(p, tool)
+        if os.path.exists(p) and os.access(p, os.X_OK):
+            return p
+    return None
+
+def missing_deps():
+    """ returns True if any of the test dependent tools are absent.
+    """
+    for dep in deps:
+        if which(dep) is None:
+            return True
+    return False
+
+# Alpine is a light weight distro that supports QEMU. These tests boot
+# that on the machine then run a QEMU guest inside it in KVM mode,
+# that runs the same Alpine distro image.
+# QEMU packages are downloaded and installed on each test. That's not a
+# large download, but it may be more polite to create qcow2 image with
+# QEMU already installed and use that.
+@skipUnless(os.getenv('AVOCADO_ALLOW_LARGE_STORAGE'), 'storage limited')
+@skipUnless(os.getenv('SPEED') == 'slow', 'runtime limited')
+@skipIf(missing_deps(), 'dependencies (%s) not installed' % ','.join(deps))
+class HypervisorTest(QemuSystemTest):
+
+    timeout = 1000
+    KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 console=hvc0 '
+    panic_message = 'Kernel panic - not syncing'
+    good_message = 'VFS: Cannot open root device'
+
+    def extract_from_iso(self, iso, path):
+        """
+        Extracts a file from an iso file into the test workdir
+
+        :param iso: path to the iso file
+        :param path: path within the iso file of the file to be extracted
+        :returns: path of the extracted file
+        """
+        filename = os.path.basename(path)
+
+        cwd = os.getcwd()
+        os.chdir(self.workdir)
+
+        with open(filename, "w") as outfile:
+            cmd = "xorriso -osirrox on -indev %s -cpx %s %s" % (iso, path, filename)
+            subprocess.run(cmd.split(),
+                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
+
+        os.chdir(cwd)
+
+        # Return complete path to extracted file.  Because callers to
+        # extract_from_iso() specify 'path' with a leading slash, it is
+        # necessary to use os.path.relpath() as otherwise os.path.join()
+        # interprets it as an absolute path and drops the self.workdir part.
+        return os.path.normpath(os.path.join(self.workdir, filename))
+
+    def setUp(self):
+        super().setUp()
+
+        iso_url = ('https://dl-cdn.alpinelinux.org/alpine/v3.18/releases/ppc64le/alpine-standard-3.18.4-ppc64le.iso')
+
+        # Alpine use sha256 so I recalculated this myself
+        iso_sha256 = 'c26b8d3e17c2f3f0fed02b4b1296589c2390e6d5548610099af75300edd7b3ff'
+        iso_path = self.fetch_asset(iso_url, asset_hash=iso_sha256,
+                                    algorithm = "sha256")
+
+        self.iso_path = iso_path
+        self.vmlinuz = self.extract_from_iso(iso_path, '/boot/vmlinuz-lts')
+        self.initramfs = self.extract_from_iso(iso_path, '/boot/initramfs-lts')
+
+def do_start_alpine(self):
+self.vm.set_console()
+ke

[PULL 13/49] hw/ppc/spapr: Add missing license

2024-02-19 Thread Nicholas Piggin

From: Philippe Mathieu-Daudé 

Commit 9fdf0c2995 ("Start implementing pSeries logical partition
machine") added hw/ppc/spapr_hcall.c, then commit 962104f044
("hw/ppc: moved hcalls that depend on softmmu") extracted the
system code to hw/ppc/spapr_softmmu.c. Take the license and
copyrights from the original spapr_hcall.c at commit 9fdf0c2995.

Signed-off-by: Philippe Mathieu-Daudé 
[npiggin: Update file description.]
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/spapr_softmmu.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/ppc/spapr_softmmu.c b/hw/ppc/spapr_softmmu.c
index fc1bbc0b61..2fade94029 100644
--- a/hw/ppc/spapr_softmmu.c
+++ b/hw/ppc/spapr_softmmu.c
@@ -1,3 +1,12 @@
+/*
+ * MMU hypercalls for the sPAPR (pseries) vHyp hypervisor that is used by TCG
+ *
+ * Copyright (c) 2004-2007 Fabrice Bellard
+ * Copyright (c) 2007 Jocelyn Mayer
+ * Copyright (c) 2010 David Gibson, IBM Corporation.
+ *
+ * SPDX-License-Identifier: MIT
+ */
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
 #include "qemu/memalign.h"
-- 
2.42.0

[PULL 02/49] target/ppc: Fix crash on machine check caused by ifetch

2024-02-19 Thread Nicholas Piggin

is_prefix_insn_excp() loads the first word of the instruction address
which caused an exception, to determine whether or not it was prefixed
so the prefix bit can be set in [H]SRR1.

This works if the instruction image can be loaded, but if the exception
was caused by an ifetch, this load could fail and cause a recursive
exception and crash. Machine checks caused by ifetch are not excluded
from the prefix check and can crash (see issue 2108 for an example).

Fix this by excluding machine checks caused by ifetch from the prefix
check.

Cc: qemu-sta...@nongnu.org
Acked-by: Cédric Le Goater 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2108
Fixes: 55a7fa34f89 ("target/ppc: Machine check on invalid real address access on POWER9/10")
Fixes: 5a5d3b23cb2 ("target/ppc: Add SRR1 prefix indication to interrupt handlers")
Signed-off-by: Nicholas Piggin 
---
 target/ppc/excp_helper.c | 36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 2ec6429e36..98952de267 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1312,6 +1312,10 @@ static bool is_prefix_insn_excp(PowerPCCPU *cpu, int excp)
 {
 CPUPPCState *env = &cpu->env;
 
+if (!(env->insns_flags2 & PPC2_ISA310)) {
+return false;
+}
+
 if (!tcg_enabled()) {
 /*
  * This does not load instructions and set the prefix bit correctly
@@ -1322,6 +1326,15 @@ static bool is_prefix_insn_excp(PowerPCCPU *cpu, int excp)
 }
 
 switch (excp) {
+case POWERPC_EXCP_MCHECK:
+if (!(env->error_code & PPC_BIT(42))) {
+/*
+ * Fetch attempt caused a machine check, so attempting to fetch
+ * again would cause a recursive machine check.
+ */
+return false;
+}
+break;
 case POWERPC_EXCP_HDSI:
 /* HDSI PRTABLE_FAULT has the originating access type in error_code */
 if ((env->spr[SPR_HDSISR] & DSISR_PRTABLE_FAULT) &&
@@ -1332,10 +1345,10 @@ static bool is_prefix_insn_excp(PowerPCCPU *cpu, int excp)
  * instruction at NIP would cause recursive faults with the same
  * translation).
  */
-break;
+return false;
 }
-/* fall through */
-case POWERPC_EXCP_MCHECK:
+break;
+
 case POWERPC_EXCP_DSI:
 case POWERPC_EXCP_DSEG:
 case POWERPC_EXCP_ALIGN:
@@ -1346,17 +1359,13 @@ static bool is_prefix_insn_excp(PowerPCCPU *cpu, int excp)
 case POWERPC_EXCP_VPU:
 case POWERPC_EXCP_VSXU:
 case POWERPC_EXCP_FU:
-case POWERPC_EXCP_HV_FU: {
-uint32_t insn = ppc_ldl_code(env, env->nip);
-if (is_prefix_insn(env, insn)) {
-return true;
-}
+case POWERPC_EXCP_HV_FU:
 break;
-}
 default:
-break;
+return false;
 }
-return false;
+
+return is_prefix_insn(env, ppc_ldl_code(env, env->nip));
 }
 #else
 static bool is_prefix_insn_excp(PowerPCCPU *cpu, int excp)
@@ -3224,6 +3233,7 @@ void ppc_cpu_do_transaction_failed(CPUState *cs, hwaddr physaddr,
 
 switch (env->excp_model) {
 #if defined(TARGET_PPC64)
+case POWERPC_EXCP_POWER8:
 case POWERPC_EXCP_POWER9:
 case POWERPC_EXCP_POWER10:
 /*
@@ -3245,6 +3255,10 @@ void ppc_cpu_do_transaction_failed(CPUState *cs, hwaddr physaddr,
 env->error_code |= PPC_BIT(42);
 
 } else { /* Fetch */
+/*
+ * is_prefix_insn_excp() tests !PPC_BIT(42) to avoid fetching
+ * the instruction, so that must always be clear for fetches.
+ */
 env->error_code = PPC_BIT(36) | PPC_BIT(44) | PPC_BIT(45);
 }
 break;
-- 
2.42.0
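
The decision this patch adds for machine checks can be illustrated in isolation. The following is a simplified Python model, not the QEMU implementation: the exception names are plain strings invented for the example, only the machine check and DSI cases are modelled, and ppc_bit() is assumed to follow the IBM numbering used by PPC_BIT() (bit 0 is the most significant bit of a 64-bit value).

```python
# Illustrative exception identifiers; the real ones are QEMU enum members.
EXCP_MCHECK = "MCHECK"
EXCP_DSI = "DSI"
EXCP_ISI = "ISI"


def ppc_bit(bit: int) -> int:
    """IBM bit numbering for a 64-bit value: bit 0 is the most significant."""
    return 0x8000000000000000 >> bit


def may_fetch_for_prefix_check(excp: str, error_code: int) -> bool:
    """Decide whether re-reading the instruction at NIP is safe."""
    if excp == EXCP_MCHECK:
        # Bit 42 clear means the machine check came from an instruction
        # fetch, so fetching again would raise the same fault recursively.
        return bool(error_code & ppc_bit(42))
    # In this toy model only DSI is treated as a data-side exception whose
    # faulting instruction can be loaded and inspected.
    return excp == EXCP_DSI


# Error code used for fetches in ppc_cpu_do_transaction_failed(), per the patch.
FETCH_ERROR_CODE = ppc_bit(36) | ppc_bit(44) | ppc_bit(45)
```

With these definitions, a machine check carrying FETCH_ERROR_CODE is excluded from the prefix check, while one whose error code has bit 42 set is still inspected.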

[PULL 10/49] tests/avocado: Use default CPU for pseries machine

2024-02-19 Thread Nicholas Piggin

Use the default CPU with the pseries machine unless there is a
specific requirement.

Signed-off-by: Nicholas Piggin 
---
 tests/avocado/migration.py | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tests/avocado/migration.py b/tests/avocado/migration.py
index 09b62f813e..be6234b3c2 100644
--- a/tests/avocado/migration.py
+++ b/tests/avocado/migration.py
@@ -123,7 +123,6 @@ class PPC64(MigrationTest):
     """
     :avocado: tags=arch:ppc64
     :avocado: tags=machine:pseries
-    :avocado: tags=cpu:power9_v2.0
     """
 
     def test_migration_with_tcp_localhost(self):
-- 
2.42.0

[PULL 04/49] tests/avocado: improve flaky ppc/pnv boot_linux_console.py test

2024-02-19 Thread Nicholas Piggin

The expected MTD partition detection output does not always appear on
the console, even though the test reaches the boot loader and the string
appears in dmesg. This is possibly due to an init script that quietens
the console output. Using an earlier log message improves reliability.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 tests/avocado/boot_linux_console.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/avocado/boot_linux_console.py b/tests/avocado/boot_linux_console.py
index 3f0180e1f8..af104fff1c 100644
--- a/tests/avocado/boot_linux_console.py
+++ b/tests/avocado/boot_linux_console.py
@@ -1368,7 +1368,8 @@ def do_test_ppc64_powernv(self, proc):
         self.wait_for_console_pattern("CPU: " + proc + " generation processor")
         self.wait_for_console_pattern("zImage starting: loaded")
         self.wait_for_console_pattern("Run /init as init process")
-        self.wait_for_console_pattern("Creating 1 MTD partitions")
+        # Device detection output driven by udev probing is sometimes cut off
+        # from console output, suspect S14silence-console init script.
 
     def test_ppc_powernv8(self):
         """
-- 
2.42.0

[PULL 09/49] tests/avocado: Add FreeBSD distro boot tests for ppc

2024-02-19 Thread Nicholas Piggin

The FreeBSD project provides qcow2 images that work well for testing QEMU.
Add pseries tests for HPT and Radix, KVM and TCG. This uses a short-term
VM image, because FreeBSD has not set up long-term builds for ppc64 at
present.

Other architectures could be added so this does not get a ppc_ prefix
but is instead named similarly to boot_linux.

Reviewed-by: Warner Losh 
Signed-off-by: Nicholas Piggin 

Unfortunately the latest stable (14.0) x86-64 VM image does not seem to
output to console by default and I've not been able to find a reliable
way to edit the filesystem to change the boot loader options, or use
console input in the test case to change it on the fly.
---
 tests/avocado/boot_freebsd.py | 174 ++
 1 file changed, 174 insertions(+)
 create mode 100644 tests/avocado/boot_freebsd.py

diff --git a/tests/avocado/boot_freebsd.py b/tests/avocado/boot_freebsd.py
new file mode 100644
index 00..c01cd06cca
--- /dev/null
+++ b/tests/avocado/boot_freebsd.py
@@ -0,0 +1,174 @@
+# Functional tests that boot FreeBSD in various configurations
+#
+# Copyright (c) 2023 IBM Corporation
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later. See the COPYING file in the top-level directory.
+
+import os
+import subprocess
+
+from avocado import skipUnless
+from avocado_qemu import QemuSystemTest
+from avocado_qemu import wait_for_console_pattern
+from avocado_qemu import exec_command
+from avocado.utils import archive
+from avocado.utils import process
+from avocado.utils.path import find_command
+
+@skipUnless(os.getenv('AVOCADO_ALLOW_LARGE_STORAGE'), 'storage limited')
+@skipUnless(os.getenv('SPEED') == 'slow', 'runtime limited')
+class BootFreeBSDPPC64(QemuSystemTest):
+    """
+    :avocado: tags=arch:ppc64
+    """
+
+    timeout = 360
+
+    def setUp(self):
+        super().setUp()
+
+        # We need zstd for all the tests
+        # See https://github.com/avocado-framework/avocado/issues/5609
+        zstd = find_command('zstd', False)
+        if zstd is False:
+            self.cancel('Could not find "zstd", which is required to '
+                        'decompress rootfs')
+        tar = find_command('tar', False)
+        if tar is False:
+            self.cancel('Could not find "tar", which is required to '
+                        'decompress rootfs')
+
+        drive_url = ('https://artifact.ci.freebsd.org/snapshot/15.0-CURRENT/8a735ffdf04936c6785ac4fa31486639262dd416/powerpc/powerpc64le/disk.qcow2.zst')
+        drive_hash = '95d863dbbc4b60f4899d1ef21d6489fca05bf03d'
+        drive_path_zstd = self.fetch_asset(drive_url, asset_hash=drive_hash)
+        self.drive_path = os.path.join(self.workdir, 'disk.qcow2')
+
+        cmd = f"{zstd} -d {drive_path_zstd} -o {self.drive_path}"
+        process.run(cmd)
+
+        kernel_url = ('https://artifact.ci.freebsd.org/snapshot/15.0-CURRENT/8a735ffdf04936c6785ac4fa31486639262dd416/powerpc/powerpc64le/kernel.txz')
+        kernel_hash = '31d14c2dc658858830a7acab5128a5b91ea548cf'
+        kernel_path_txz = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
+        self.kernel_path = os.path.join(self.workdir, 'kernel')
+
+        with open(self.kernel_path, "w") as outfile:
+            cmd = f"{tar} OJxf {kernel_path_txz} ./boot/kernel/kernel"
+            subprocess.run(cmd.split(), stdout=outfile)
+
+    def set_pseries_devices(self):
+        self.vm.add_args('-drive', f"file={self.drive_path},format=qcow2,if=virtio")
+        self.vm.add_args('-net', 'nic,model=virtio')
+
+    def set_powernv_devices(self):
+        self.vm.add_args('-device', 'nvme,bus=pcie.2,addr=0x0,serial=1234,drive=drive0',
+                         '-device', 'e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0',
+                         '-netdev', 'user,id=net0,hostfwd=::20022-:22,hostname=freebsd')
+        self.vm.add_args("-drive", f"file={self.drive_path},format=qcow2,if=none,id=drive0")
+        self.vm.add_args("-kernel", self.kernel_path)
+
+    def run_pseries_test(self, force_HPT=False):
+        if force_HPT:
+            self.vm.add_args('-m', '4g')
+        else:
+            self.vm.add_args('-m', '1g')
+        self.vm.add_args('-smp', '4')
+        self.set_pseries_devices()
+        self.vm.set_console()
+        self.vm.launch()
+
+        wait_for_console_pattern(self, 'Hit [Enter] to boot immediately, or any other key for command prompt.')
+        if force_HPT:
+            exec_command(self, 'x')
+            wait_for_console_pattern(self, 'OK')
+            exec_command(self, 'set radix_mmu=0')
+            exec_command(self, 'boot')
+            wait_for_console_pattern(self, 'cas: selected hash MMU', 'panic:')
+        else:
+            exec_command(self, '')
+            wait_for_console_pattern(self, 'cas: selected radix MMU', 'panic:')
+
+        wait_for_console_pattern(self, 'FreeBSD 15.0-CURRENT', 'panic:')
+        wait_for_console_pattern(self, 'FreeBSD/SMP: Multipr

[PULL 17/49] ppc/spapr: Introduce SPAPR_IRQ_NR_IPIS to refer IRQ range for CPU IPIs.

2024-02-19 Thread Nicholas Piggin

From: Harsh Prateek Bora 

spapr_irq_init currently uses the existing macro SPAPR_XIRQ_BASE to refer
to the range of CPU IPIs when initializing the nr-irqs property.
It is more appropriate for the IPI range to have its own define, which
can be reused wherever that range is meant.

Suggested-by: Cedric Le Goater 
Reviewed-by: Cédric Le Goater 
Signed-off-by: Harsh Prateek Bora 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/spapr_irq.c |  6 --
 include/hw/ppc/spapr_irq.h | 14 +-
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index a0d1e1298e..97b2fc42ab 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -23,6 +23,8 @@
 
 #include "trace.h"
 
+QEMU_BUILD_BUG_ON(SPAPR_IRQ_NR_IPIS > SPAPR_XIRQ_BASE);
+
 static const TypeInfo spapr_intc_info = {
 .name = TYPE_SPAPR_INTC,
 .parent = TYPE_INTERFACE,
@@ -329,7 +331,7 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
 int i;
 
 dev = qdev_new(TYPE_SPAPR_XIVE);
-qdev_prop_set_uint32(dev, "nr-irqs", smc->nr_xirqs + SPAPR_XIRQ_BASE);
+qdev_prop_set_uint32(dev, "nr-irqs", smc->nr_xirqs + SPAPR_IRQ_NR_IPIS);
 /*
  * 8 XIVE END structures per CPU. One for each available
  * priority
@@ -356,7 +358,7 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
 }
 
 spapr->qirqs = qemu_allocate_irqs(spapr_set_irq, spapr,
-  smc->nr_xirqs + SPAPR_XIRQ_BASE);
+  smc->nr_xirqs + SPAPR_IRQ_NR_IPIS);
 
 /*
  * Mostly we don't actually need this until reset, except that not
diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index c22a72c9e2..4fd2d5853d 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -14,9 +14,21 @@
 #include "qom/object.h"
 
 /*
- * IRQ range offsets per device type
+ * The XIVE IRQ backend uses the same layout as the XICS backend but
+ * covers the full range of the IRQ number space. The IRQ numbers for
+ * the CPU IPIs are allocated at the bottom of this space, below 4K,
+ * to preserve compatibility with XICS which does not use that range.
+ */
+
+/*
+ * CPU IPI range (XIVE only)
  */
 #define SPAPR_IRQ_IPI        0x0
+#define SPAPR_IRQ_NR_IPIS    0x1000
+
+/*
+ * IRQ range offsets per device type
+ */
 
 #define SPAPR_XIRQ_BASE  XICS_IRQ_BASE /* 0x1000 */
 #define SPAPR_IRQ_EPOW   (SPAPR_XIRQ_BASE + 0x)
-- 
2.42.0
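
The arithmetic behind the new define is small enough to spell out. Below is a Python sketch of the layout described in the header comment, assuming SPAPR_XIRQ_BASE is 0x1000 as the comment next to it states; the helper names xive_nr_irqs and is_cpu_ipi are hypothetical and exist only for this illustration.

```python
SPAPR_IRQ_IPI = 0x0          # first CPU IPI number
SPAPR_IRQ_NR_IPIS = 0x1000   # size of the CPU IPI range (XIVE only)
SPAPR_XIRQ_BASE = 0x1000     # XICS_IRQ_BASE, per the header comment

# Python stand-in for QEMU_BUILD_BUG_ON(SPAPR_IRQ_NR_IPIS > SPAPR_XIRQ_BASE):
# the IPI range must fit below the first external IRQ number.
assert SPAPR_IRQ_NR_IPIS <= SPAPR_XIRQ_BASE


def xive_nr_irqs(nr_xirqs: int) -> int:
    """Total IRQ numbers the XIVE backend covers: IPIs plus external IRQs."""
    return nr_xirqs + SPAPR_IRQ_NR_IPIS


def is_cpu_ipi(irq: int) -> bool:
    """True if irq falls in the CPU IPI range at the bottom of the space."""
    return SPAPR_IRQ_IPI <= irq < SPAPR_IRQ_IPI + SPAPR_IRQ_NR_IPIS
```

Because both constants currently have the same value, the computed nr-irqs does not change with this patch; only the intent of the expression becomes explicit.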

[PULL 22/49] hw/pci-host/raven.c: Mark raven_io_ops as implementing unaligned accesses

2024-02-19 Thread Nicholas Piggin

From: Peter Maydell 

The raven_io_ops MemoryRegionOps is the only one in the source tree
which sets .valid.unaligned to indicate that it should support
unaligned accesses and which does not also set .impl.unaligned to
indicate that its read and write functions can do the unaligned
handling themselves.  This is a problem, because at the moment the
core memory system does not implement the support for handling
unaligned accesses by doing a series of aligned accesses and
combining them (system/memory.c:access_with_adjusted_size() has a
TODO comment noting this).

Fortunately raven_io_read() and raven_io_write() will correctly deal
with the case of being passed an unaligned address, so we can fix the
missing unaligned access support by setting .impl.unaligned in the
MemoryRegionOps struct.

Fixes: 9a1839164c9c8f06 ("raven: Implement non-contiguous I/O region")
Reviewed-by: Cédric Le Goater 
Tested-by: Cédric Le Goater 
Signed-off-by: Peter Maydell 
Signed-off-by: Nicholas Piggin 
-- 
2.42.0
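
For context, this is roughly what "handling unaligned accesses by doing a series of aligned accesses and combining them" means. The Python sketch below only illustrates the technique that the commit message says the core memory system does not implement; it is not taken from QEMU, the function names are invented, and it assumes little-endian byte order and a backing buffer large enough for the second aligned read.

```python
def read_aligned(mem: bytes, addr: int, size: int) -> int:
    """Little-endian read that insists on natural alignment."""
    assert addr % size == 0, "aligned accesses only"
    return int.from_bytes(mem[addr:addr + size], "little")


def read_unaligned(mem: bytes, addr: int, size: int) -> int:
    """Compose a read at any address from at most two aligned reads."""
    offset = addr % size
    base = addr - offset
    low = read_aligned(mem, base, size)
    if offset == 0:
        return low
    high = read_aligned(mem, base + size, size)
    shift = offset * 8
    mask = (1 << (size * 8)) - 1
    # Upper bytes of the first word become the low bytes of the result;
    # lower bytes of the second word fill in the rest.
    return ((low >> shift) | (high << (size * 8 - shift))) & mask
```

A device whose read and write callbacks already cope with any address, as raven_io_read() and raven_io_write() are said to, does not need this splitting, which is why setting .impl.unaligned is sufficient there.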

[PULL 25/49] ppc/pnv: New powernv10-rainier machine type

2024-02-19 Thread Nicholas Piggin

From: Glenn Miles 

Create a new powernv machine type, powernv10-rainier, that
will contain rainier-specific devices.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Glenn Miles 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/pnv.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index b949398689..33b905f854 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -2249,7 +2249,7 @@ static void pnv_machine_power9_class_init(ObjectClass *oc, void *data)
 machine_class_allow_dynamic_sysbus_dev(mc, TYPE_PNV_PHB);
 }
 
-static void pnv_machine_power10_class_init(ObjectClass *oc, void *data)
+static void pnv_machine_p10_common_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
 PnvMachineClass *pmc = PNV_MACHINE_CLASS(oc);
@@ -2261,7 +2261,6 @@ static void pnv_machine_power10_class_init(ObjectClass *oc, void *data)
 { TYPE_PNV_PHB_ROOT_PORT, "version", "5" },
 };
 
-mc->desc = "IBM PowerNV (Non-Virtualized) POWER10";
 mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power10_v2.0");
 compat_props_add(mc->compat_props, phb_compat, G_N_ELEMENTS(phb_compat));
 
@@ -2276,6 +2275,22 @@ static void pnv_machine_power10_class_init(ObjectClass *oc, void *data)
 machine_class_allow_dynamic_sysbus_dev(mc, TYPE_PNV_PHB);
 }
 
+static void pnv_machine_power10_class_init(ObjectClass *oc, void *data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+
+pnv_machine_p10_common_class_init(oc, data);
+mc->desc = "IBM PowerNV (Non-Virtualized) POWER10";
+}
+
+static void pnv_machine_p10_rainier_class_init(ObjectClass *oc, void *data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+
+pnv_machine_p10_common_class_init(oc, data);
+mc->desc = "IBM PowerNV (Non-Virtualized) POWER10 Rainier";
+}
+
 static bool pnv_machine_get_hb(Object *obj, Error **errp)
 {
 PnvMachineState *pnv = PNV_MACHINE(obj);
@@ -2381,6 +2396,11 @@ static void pnv_machine_class_init(ObjectClass *oc, void *data)
 }
 
 static const TypeInfo types[] = {
+{
+.name  = MACHINE_TYPE_NAME("powernv10-rainier"),
+.parent= MACHINE_TYPE_NAME("powernv10"),
+.class_init= pnv_machine_p10_rainier_class_init,
+},
 {
 .name  = MACHINE_TYPE_NAME("powernv10"),
 .parent= TYPE_PNV_MACHINE,
-- 
2.42.0

[PULL 16/49] hw/ppc/spapr: Rename 'softmmu' -> 'vhyp_mmu'

2024-02-19 Thread Nicholas Piggin

From: Philippe Mathieu-Daudé 

To reduce the use of the term 'softmmu', rename spapr_softmmu.c
to spapr_vhyp_mmu.c.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Philippe Mathieu-Daudé 
[np: change name]
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/meson.build   | 2 +-
 hw/ppc/{spapr_softmmu.c => spapr_vhyp_mmu.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename hw/ppc/{spapr_softmmu.c => spapr_vhyp_mmu.c} (100%)

diff --git a/hw/ppc/meson.build b/hw/ppc/meson.build
index eba3406e7f..30bd2aaccf 100644
--- a/hw/ppc/meson.build
+++ b/hw/ppc/meson.build
@@ -31,7 +31,7 @@ ppc_ss.add(when: 'CONFIG_PSERIES', if_true: files(
   'pef.c',
 ))
 ppc_ss.add(when: ['CONFIG_PSERIES', 'CONFIG_TCG'], if_true: files(
-  'spapr_softmmu.c',
+  'spapr_vhyp_mmu.c',
 ))
 ppc_ss.add(when: 'CONFIG_SPAPR_RNG', if_true: files('spapr_rng.c'))
 if host_os == 'linux'
diff --git a/hw/ppc/spapr_softmmu.c b/hw/ppc/spapr_vhyp_mmu.c
similarity index 100%
rename from hw/ppc/spapr_softmmu.c
rename to hw/ppc/spapr_vhyp_mmu.c
-- 
2.42.0

[PULL 26/49] ppc/pnv: Add pca9552 to powernv10-rainier for PCIe hotplug power control

2024-02-19 Thread Nicholas Piggin

From: Glenn Miles 

The Power Hypervisor code expects to see a pca9552 device connected
to the 3rd PNV I2C engine on port 1 at I2C address 0x63 (or left-
justified address of 0xC6).  This is used by hypervisor code to
control PCIe slot power during hotplug events.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Glenn Miles 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/Kconfig   |  1 +
 hw/ppc/pnv.c | 25 +
 include/hw/ppc/pnv.h |  1 +
 3 files changed, 27 insertions(+)

diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
index 44263a58c4..8e592e4307 100644
--- a/hw/ppc/Kconfig
+++ b/hw/ppc/Kconfig
@@ -32,6 +32,7 @@ config POWERNV
 select XIVE
 select FDT_PPC
 select PCI_POWERNV
+select PCA9552
 
 config PPC405
 bool
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 33b905f854..78f5c6262a 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -790,6 +790,7 @@ static void pnv_init(MachineState *machine)
 const char *bios_name = machine->firmware ?: FW_FILE_NAME;
 PnvMachineState *pnv = PNV_MACHINE(machine);
 MachineClass *mc = MACHINE_GET_CLASS(machine);
+PnvMachineClass *pmc = PNV_MACHINE_GET_CLASS(machine);
 char *fw_filename;
 long fw_size;
 uint64_t chip_ram_start = 0;
@@ -979,6 +980,13 @@ static void pnv_init(MachineState *machine)
  */
 pnv->powerdown_notifier.notify = pnv_powerdown_notify;
 qemu_register_powerdown_notifier(&pnv->powerdown_notifier);
+
+/*
+ * Create/Connect any machine-specific I2C devices
+ */
+if (pmc->i2c_init) {
+pmc->i2c_init(pnv);
+}
 }
 
 /*
@@ -1879,6 +1887,21 @@ static void pnv_chip_power10_realize(DeviceState *dev, Error **errp)
   qdev_get_gpio_in(DEVICE(&chip10->psi),
PSIHB9_IRQ_SBE_I2C));
 }
+
+}
+
+static void pnv_rainier_i2c_init(PnvMachineState *pnv)
+{
+int i;
+for (i = 0; i < pnv->num_chips; i++) {
+Pnv10Chip *chip10 = PNV10_CHIP(pnv->chips[i]);
+
+/*
+ * Add a PCA9552 I2C device for PCIe hotplug control
+ * to engine 2, bus 1, address 0x63
+ */
+i2c_slave_create_simple(chip10->i2c[2].busses[1], "pca9552", 0x63);
+}
 }
 
 static uint32_t pnv_chip_power10_xscom_pcba(PnvChip *chip, uint64_t addr)
@@ -2286,9 +2309,11 @@ static void pnv_machine_power10_class_init(ObjectClass 
*oc, void *data)
 static void pnv_machine_p10_rainier_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
+PnvMachineClass *pmc = PNV_MACHINE_CLASS(oc);
 
 pnv_machine_p10_common_class_init(oc, data);
 mc->desc = "IBM PowerNV (Non-Virtualized) POWER10 Rainier";
+pmc->i2c_init = pnv_rainier_i2c_init;
 }
 
 static bool pnv_machine_get_hb(Object *obj, Error **errp)
diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h
index 7e5fef7c43..110ac9aace 100644
--- a/include/hw/ppc/pnv.h
+++ b/include/hw/ppc/pnv.h
@@ -76,6 +76,7 @@ struct PnvMachineClass {
 int compat_size;
 
 void (*dt_power_mgt)(PnvMachineState *pnv, void *fdt);
+void (*i2c_init)(PnvMachineState *pnv);
 };
 
 struct PnvMachineState {
-- 
2.42.0

[PULL 35/49] target/ppc: Update gdbstub to read SPR's CFAR, DEC, HDEC, TB-L/U

2024-02-19 Thread Nicholas Piggin

From: Saif Abrar 

The SPRs CFAR, DEC, HDEC and TB-L/U are not implemented as part of the
CPUPPCState env->spr[] array, so gdbstub cannot access them through it.
Update gdb_get_spr_reg() to handle these SPRs specifically, and
gdb_set_spr_reg() to handle CFAR.

Signed-off-by: Saif Abrar 
Signed-off-by: Nicholas Piggin 
---
 target/ppc/gdbstub.c | 40 ++--
 1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/target/ppc/gdbstub.c b/target/ppc/gdbstub.c
index ec5731e5d6..dfe31d0f47 100644
--- a/target/ppc/gdbstub.c
+++ b/target/ppc/gdbstub.c
@@ -394,7 +394,32 @@ static int gdb_get_spr_reg(CPUPPCState *env, GByteArray 
*buf, int n)
 }
 
 len = TARGET_LONG_SIZE;
-gdb_get_regl(buf, env->spr[reg]);
+
+/* Handle those SPRs that are not part of the env->spr[] array */
+target_ulong val;
+switch (reg) {
+#if defined(TARGET_PPC64)
+case SPR_CFAR:
+val = env->cfar;
+break;
+#endif
+case SPR_HDEC:
+val = cpu_ppc_load_hdecr(env);
+break;
+case SPR_TBL:
+val = cpu_ppc_load_tbl(env);
+break;
+case SPR_TBU:
+val = cpu_ppc_load_tbu(env);
+break;
+case SPR_DECR:
+val = cpu_ppc_load_decr(env);
+break;
+default:
+val = env->spr[reg];
+}
+gdb_get_regl(buf, val);
+
 ppc_maybe_bswap_register(env, gdb_get_reg_ptr(buf, len), len);
 return len;
 }
@@ -411,7 +436,18 @@ static int gdb_set_spr_reg(CPUPPCState *env, uint8_t 
*mem_buf, int n)
 
 len = TARGET_LONG_SIZE;
 ppc_maybe_bswap_register(env, mem_buf, len);
-env->spr[reg] = ldn_p(mem_buf, len);
+
+/* Handle those SPRs that are not part of the env->spr[] array */
+target_ulong val = ldn_p(mem_buf, len);
+switch (reg) {
+#if defined(TARGET_PPC64)
+case SPR_CFAR:
+env->cfar = val;
+break;
+#endif
+default:
+env->spr[reg] = val;
+}
 
 return len;
 }
-- 
2.42.0

[PULL 06/49] tests/avocado: Add ppc pseries and powernv hash MMU tests

2024-02-19 Thread Nicholas Piggin

POWER CPUs support hash and radix MMU modes. Linux supports running in
either mode, but defaults to radix. To keep up testing of QEMU's hash
MMU implementation, add some Linux hash boot tests.

Signed-off-by: Nicholas Piggin 
---
 tests/avocado/ppc_powernv.py | 23 +++
 tests/avocado/ppc_pseries.py | 20 +---
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/tests/avocado/ppc_powernv.py b/tests/avocado/ppc_powernv.py
index d0e5c07bde..4342941d5d 100644
--- a/tests/avocado/ppc_powernv.py
+++ b/tests/avocado/ppc_powernv.py
@@ -12,11 +12,11 @@
 class powernvMachine(QemuSystemTest):
 
 timeout = 90
-KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
+KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 console=hvc0 '
 panic_message = 'Kernel panic - not syncing'
 good_message = 'VFS: Cannot open root device'
 
-def do_test_linux_boot(self):
+def do_test_linux_boot(self, command_line = KERNEL_COMMON_COMMAND_LINE):
 self.require_accelerator("tcg")
 kernel_url = ('https://archives.fedoraproject.org/pub/archive'
   '/fedora-secondary/releases/29/Everything/ppc64le/os'
@@ -25,9 +25,8 @@ def do_test_linux_boot(self):
 kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
 
 self.vm.set_console()
-kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'console=hvc0'
 self.vm.add_args('-kernel', kernel_path,
- '-append', kernel_command_line)
+ '-append', command_line)
 self.vm.launch()
 
 def test_linux_boot(self):
@@ -54,6 +53,22 @@ def test_linux_smp_boot(self):
 wait_for_console_pattern(self, console_pattern, self.panic_message)
 wait_for_console_pattern(self, self.good_message, self.panic_message)
 
+def test_linux_smp_hpt_boot(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:powernv
+:avocado: tags=accel:tcg
+"""
+
+self.vm.add_args('-smp', '4')
+self.do_test_linux_boot(self.KERNEL_COMMON_COMMAND_LINE +
+'disable_radix')
+console_pattern = 'smp: Brought up 1 node, 4 CPUs'
+wait_for_console_pattern(self, 'hash-mmu: Initializing hash mmu',
+ self.panic_message)
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+wait_for_console_pattern(self, self.good_message, self.panic_message)
+
 def test_linux_smt_boot(self):
 """
 :avocado: tags=arch:ppc64
diff --git a/tests/avocado/ppc_pseries.py b/tests/avocado/ppc_pseries.py
index a8311e6555..74aaa4ac4a 100644
--- a/tests/avocado/ppc_pseries.py
+++ b/tests/avocado/ppc_pseries.py
@@ -12,11 +12,11 @@
 class pseriesMachine(QemuSystemTest):
 
 timeout = 90
-KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
+KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 console=hvc0 '
 panic_message = 'Kernel panic - not syncing'
 good_message = 'VFS: Cannot open root device'
 
-def do_test_ppc64_linux_boot(self):
+def do_test_ppc64_linux_boot(self, kernel_command_line = KERNEL_COMMON_COMMAND_LINE):
 kernel_url = ('https://archives.fedoraproject.org/pub/archive'
   '/fedora-secondary/releases/29/Everything/ppc64le/os'
   '/ppc/ppc64/vmlinuz')
@@ -24,7 +24,6 @@ def do_test_ppc64_linux_boot(self):
 kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
 
 self.vm.set_console()
-kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'console=hvc0'
 self.vm.add_args('-kernel', kernel_path,
  '-append', kernel_command_line)
 self.vm.launch()
@@ -62,6 +61,21 @@ def test_ppc64_linux_smp_boot(self):
 wait_for_console_pattern(self, console_pattern, self.panic_message)
 wait_for_console_pattern(self, self.good_message, self.panic_message)
 
+def test_ppc64_linux_hpt_smp_boot(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:pseries
+"""
+
+self.vm.add_args('-smp', '4')
+self.do_test_ppc64_linux_boot(self.KERNEL_COMMON_COMMAND_LINE +
+  'disable_radix')
+console_pattern = 'smp: Brought up 1 node, 4 CPUs'
+wait_for_console_pattern(self, 'hash-mmu: Initializing hash mmu',
+ self.panic_message)
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+wait_for_console_pattern(self, self.good_message, self.panic_message)
+
 def test_ppc64_linux_smt_boot(self):
 """
 :avocado: tags=arch:ppc64
-- 
2.42.0

[PULL 31/49] ppc/pnv: Test pnv i2c master and connected devices

2024-02-19 Thread Nicholas Piggin

From: Glenn Miles 

Tests the following for both P9 and P10:
  - I2C master POR status
  - I2C master status after immediate reset

Tests the following for powernv10-rainier only:
  - Configure the pca9552 hotplug device pins as inputs, then
    read the INPUT0/1 registers to verify all pins are high
  - Connected GPIO pin tests of the P10 PCA9552 device.  Tests that
    the output of pins 0-4 affects the input of pins 5-9 respectively.
  - PCA9554 GPIO pins test.  Tests input and output functionality.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Glenn Miles 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/pnv_i2c.c| 131 +
 include/hw/i2c/pnv_i2c_regs.h   | 143 ++
 tests/qtest/meson.build |   1 +
 tests/qtest/pnv-host-i2c-test.c | 491 
 tests/qtest/pnv-xscom-test.c|  61 +---
 tests/qtest/pnv-xscom.h |  80 ++
 6 files changed, 717 insertions(+), 190 deletions(-)
 create mode 100644 include/hw/i2c/pnv_i2c_regs.h
 create mode 100644 tests/qtest/pnv-host-i2c-test.c
 create mode 100644 tests/qtest/pnv-xscom.h

diff --git a/hw/ppc/pnv_i2c.c b/hw/ppc/pnv_i2c.c
index 774946d6b2..4581cc5e5d 100644
--- a/hw/ppc/pnv_i2c.c
+++ b/hw/ppc/pnv_i2c.c
@@ -22,136 +22,7 @@
 
 #include 
 
-/* I2C FIFO register */
-#define I2C_FIFO_REG0x4
-#define I2C_FIFOPPC_BITMASK(0, 7)
-
-/* I2C command register */
-#define I2C_CMD_REG 0x5
-#define I2C_CMD_WITH_START  PPC_BIT(0)
-#define I2C_CMD_WITH_ADDR   PPC_BIT(1)
-#define I2C_CMD_READ_CONT   PPC_BIT(2)
-#define I2C_CMD_WITH_STOP   PPC_BIT(3)
-#define I2C_CMD_INTR_STEERING   PPC_BITMASK(6, 7) /* P9 */
-#define   I2C_CMD_INTR_STEER_HOST   1
-#define   I2C_CMD_INTR_STEER_OCC2
-#define I2C_CMD_DEV_ADDRPPC_BITMASK(8, 14)
-#define I2C_CMD_READ_NOT_WRITE  PPC_BIT(15)
-#define I2C_CMD_LEN_BYTES   PPC_BITMASK(16, 31)
-#define I2C_MAX_TFR_LEN 0xfff0ull
-
-/* I2C mode register */
-#define I2C_MODE_REG0x6
-#define I2C_MODE_BIT_RATE_DIV   PPC_BITMASK(0, 15)
-#define I2C_MODE_PORT_NUM   PPC_BITMASK(16, 21)
-#define I2C_MODE_ENHANCED   PPC_BIT(28)
-#define I2C_MODE_DIAGNOSTIC PPC_BIT(29)
-#define I2C_MODE_PACING_ALLOW   PPC_BIT(30)
-#define I2C_MODE_WRAP   PPC_BIT(31)
-
-/* I2C watermark register */
-#define I2C_WATERMARK_REG   0x7
-#define I2C_WATERMARK_HIGH  PPC_BITMASK(16, 19)
-#define I2C_WATERMARK_LOW   PPC_BITMASK(24, 27)
-
-/*
- * I2C interrupt mask and condition registers
- *
- * NB: The function of 0x9 and 0xa changes depending on whether you're reading
- * or writing to them. When read they return the interrupt condition bits
- * and on writes they update the interrupt mask register.
- *
- *  The bit definitions are the same for all the interrupt registers.
- */
-#define I2C_INTR_MASK_REG   0x8
-
-#define I2C_INTR_RAW_COND_REG   0x9 /* read */
-#define I2C_INTR_MASK_OR_REG0x9 /* write*/
-
-#define I2C_INTR_COND_REG   0xa /* read */
-#define I2C_INTR_MASK_AND_REG   0xa /* write */
-
-#define I2C_INTR_ALLPPC_BITMASK(16, 31)
-#define I2C_INTR_INVALID_CMDPPC_BIT(16)
-#define I2C_INTR_LBUS_PARITY_ERRPPC_BIT(17)
-#define I2C_INTR_BKEND_OVERRUN_ERR  PPC_BIT(18)
-#define I2C_INTR_BKEND_ACCESS_ERR   PPC_BIT(19)
-#define I2C_INTR_ARBT_LOST_ERR  PPC_BIT(20)
-#define I2C_INTR_NACK_RCVD_ERR  PPC_BIT(21)
-#define I2C_INTR_DATA_REQ   PPC_BIT(22)
-#define I2C_INTR_CMD_COMP   PPC_BIT(23)
-#define I2C_INTR_STOP_ERR   PPC_BIT(24)
-#define I2C_INTR_I2C_BUSY   PPC_BIT(25)
-#define I2C_INTR_NOT_I2C_BUSY   PPC_BIT(26)
-#define I2C_INTR_SCL_EQ_1   PPC_BIT(28)
-#define I2C_INTR_SCL_EQ_0   PPC_BIT(29)
-#define I2C_INTR_SDA_EQ_1   PPC_BIT(30)
-#define I2C_INTR_SDA_EQ_0   PPC_BIT(31)
-
-/* I2C status register */
-#define I2C_RESET_I2C_REG   0xb /* write */
-#define I2C_RESET_ERRORS0xc
-#define I2C_STAT_REG0xb /* read */
-#define I2C_STAT_INVALID_CMDPPC_BIT(0)
-#define I2C_STAT_LBUS_PARITY_ERRPPC_BIT(1)
-#define I2C_STAT_BKEND_OVERRUN_ERR  PPC_BIT(2)
-#define I2C_STAT_BKEND_ACCESS_ERR   PPC_BIT(3)
-#define I2C_STAT_ARBT_LOST_ERR  PPC_BIT(4)
-#define I2C_STAT_NACK_RCVD_ERR  PPC_BIT(5)
-#define I2C_STAT_DATA_REQ   PPC_BIT(6)
-#define I2C_STAT_CMD_COMP   PPC_BIT(7)
-#define I2C_STAT_STOP_ERR   PPC_BIT(8)
-#define I2C_STAT_UPPER_THRS PPC_BITMASK(9, 15)
-#define I2C_STAT_ANY_I2C_INTR   PPC_BIT(16)
-#define I2C_STAT_PORT_HISTORY_BUSY  PPC_BIT(19)
-#define I2C_STAT_SCL_INPUT_LEVELPPC_BI

[PULL 24/49] misc/pca9552: Let external devices set pca9552 inputs

2024-02-19 Thread Nicholas Piggin

From: Glenn Miles 

Allow external devices to drive pca9552 input pins by adding
input GPIOs to the model.  This allows a device to connect
its output GPIOs to the pca9552 input GPIOs.

In order for an external device to set the state of a pca9552
pin, the pin must first be configured for high impedance (LED
is off).  If the pca9552 pin is configured to drive the pin low
(LED is on), then external input will be ignored.

Here is a table describing the logical state of a pca9552 pin
given the state being driven by the pca9552 and an external device:

                   PCA9552
                   Configured
                   State

                  | Hi-Z | Low |
            ------+------+-----+
  External   Hi-Z |  Hi  | Low |
  Device    ------+------+-----+
  State      Low  |  Low | Low |
            ------+------+-----+
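
The table above amounts to a wired-AND of the two drivers: the pin
reads high only when neither side pulls it low.  A minimal sketch of
that rule, with hypothetical names (the enum values simply mirror
PCA9552_PIN_LOW and PCA9552_PIN_HIZ from the patch):

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror PCA9552_PIN_LOW / PCA9552_PIN_HIZ from the patch. */
enum pin_drive { PIN_LOW = 0, PIN_HIZ = 1 };

/*
 * Hypothetical helper: logical level of a pin given what the pca9552
 * is configured to drive and what the external device drives.
 * Returns true for logical high, false for logical low.
 */
bool pin_reads_high(enum pin_drive pca_cfg, enum pin_drive ext)
{
    assert(pca_cfg == PIN_LOW || pca_cfg == PIN_HIZ);
    assert(ext == PIN_LOW || ext == PIN_HIZ);

    /* The pullup wins only if both sides are high impedance. */
    return pca_cfg == PIN_HIZ && ext == PIN_HIZ;
}
```

This is the same decision pca955x_update_pin_input() makes per pin in
the diff below, reduced to a pure function for clarity.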

Reviewed-by: Andrew Jeffery 
Signed-off-by: Glenn Miles 
Signed-off-by: Nicholas Piggin 
---
 hw/misc/pca9552.c | 50 +--
 include/hw/misc/pca9552.h |  3 ++-
 2 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/hw/misc/pca9552.c b/hw/misc/pca9552.c
index f00a149d61..2ae13af35e 100644
--- a/hw/misc/pca9552.c
+++ b/hw/misc/pca9552.c
@@ -44,6 +44,8 @@ DECLARE_CLASS_CHECKERS(PCA955xClass, PCA955X,
 #define PCA9552_LED_OFF  0x1
 #define PCA9552_LED_PWM0 0x2
 #define PCA9552_LED_PWM1 0x3
+#define PCA9552_PIN_LOW  0x0
+#define PCA9552_PIN_HIZ  0x1
 
 static const char *led_state[] = {"on", "off", "pwm0", "pwm1"};
 
@@ -110,22 +112,27 @@ static void pca955x_update_pin_input(PCA955xState *s)
 
 for (i = 0; i < k->pin_count; i++) {
 uint8_t input_reg = PCA9552_INPUT0 + (i / 8);
-uint8_t input_shift = (i % 8);
+uint8_t bit_mask = 1 << (i % 8);
 uint8_t config = pca955x_pin_get_config(s, i);
+uint8_t old_value = s->regs[input_reg] & bit_mask;
+uint8_t new_value;
 
 switch (config) {
 case PCA9552_LED_ON:
 /* Pin is set to 0V to turn on LED */
-qemu_set_irq(s->gpio[i], 0);
-s->regs[input_reg] &= ~(1 << input_shift);
+s->regs[input_reg] &= ~bit_mask;
 break;
 case PCA9552_LED_OFF:
 /*
  * Pin is set to Hi-Z to turn off LED and
- * pullup sets it to a logical 1.
+ * pullup sets it to a logical 1 unless
+ * external device drives it low.
  */
-qemu_set_irq(s->gpio[i], 1);
-s->regs[input_reg] |= 1 << input_shift;
+if (s->ext_state[i] == PCA9552_PIN_LOW) {
+s->regs[input_reg] &= ~bit_mask;
+} else {
+s->regs[input_reg] |=  bit_mask;
+}
 break;
 case PCA9552_LED_PWM0:
 case PCA9552_LED_PWM1:
@@ -133,6 +140,12 @@ static void pca955x_update_pin_input(PCA955xState *s)
 default:
 break;
 }
+
+/* update irq state only if pin state changed */
+new_value = s->regs[input_reg] & bit_mask;
+if (new_value != old_value) {
+qemu_set_irq(s->gpio_out[i], !!new_value);
+}
 }
 }
 
@@ -340,6 +353,7 @@ static const VMStateDescription pca9552_vmstate = {
 VMSTATE_UINT8(len, PCA955xState),
 VMSTATE_UINT8(pointer, PCA955xState),
 VMSTATE_UINT8_ARRAY(regs, PCA955xState, PCA955X_NR_REGS),
+VMSTATE_UINT8_ARRAY(ext_state, PCA955xState, PCA955X_PIN_COUNT_MAX),
 VMSTATE_I2C_SLAVE(i2c, PCA955xState),
 VMSTATE_END_OF_LIST()
 }
@@ -358,6 +372,7 @@ static void pca9552_reset(DeviceState *dev)
 s->regs[PCA9552_LS2] = 0x55;
 s->regs[PCA9552_LS3] = 0x55;
 
+memset(s->ext_state, PCA9552_PIN_HIZ, PCA955X_PIN_COUNT_MAX);
 pca955x_update_pin_input(s);
 
 s->pointer = 0xFF;
@@ -380,6 +395,26 @@ static void pca955x_initfn(Object *obj)
 }
 }
 
+static void pca955x_set_ext_state(PCA955xState *s, int pin, int level)
+{
+if (s->ext_state[pin] != level) {
+uint16_t pins_status = pca955x_pins_get_status(s);
+s->ext_state[pin] = level;
+pca955x_update_pin_input(s);
+pca955x_display_pins_status(s, pins_status);
+}
+}
+
+static void pca955x_gpio_in_handler(void *opaque, int pin, int level)
+{
+
+PCA955xState *s = PCA955X(opaque);
+PCA955xClass *k = PCA955X_GET_CLASS(s);
+
+assert((pin >= 0) && (pin < k->pin_count));
+pca955x_set_ext_state(s, pin, level);
+}
+
 static void pca955x_realize(DeviceState *dev, Error **errp)
 {
 PCA955xClass *k = PCA955X_GET_CLASS(dev);
@@ -389,7 +424,8 @@ static void pca955x_realize(DeviceState *dev, Error **errp)
 s->description = g_strdup("pca-unspecified");
 }
 
-qdev_init_gpio_out(dev, s->gpio, k->pin_count);
+qdev_init_gpio_out(dev, s->gpio_out, k->pin_count);
+qdev_init_gpio_in(dev, pca955x_gpio_in_handler, k->pin_count);
 }
 
 static Property pca955x_properties[] = {
diff --git a/include/hw/misc/pca9552.h b/

[PULL 20/49] spapr: Tag pseries-2.1 - 2.11 machines as deprecated

2024-02-19 Thread Nicholas Piggin

From: Cédric Le Goater 

pseries machines before version 2.12 have undergone many changes to
correct issues, mostly regarding migration compatibility. This is
obfuscating the code uselessly and makes maintenance more difficult.
Tag them as deprecated so that they can be removed later, keeping only
the last version of the 2.x series, 2.12, which is still in use by old
distros.

Reviewed-by: Thomas Huth 
Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 docs/about/deprecated.rst | 8 
 hw/ppc/spapr.c| 1 +
 roms/skiboot  | 2 +-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 5a2305ccd6..36bd3e15ef 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -229,6 +229,14 @@ The Nios II architecture is orphan.
 The machine is no longer in existence and has been long unmaintained
 in QEMU. This also holds for the TC51828 16MiB flash that it uses.
 
+``pseries-2.1`` up to ``pseries-2.11`` (since 9.0)
+''''''''''''''''''''''''''''''''''''''''''''''''''
+
+Older pseries machines before version 2.12 have undergone many changes
+to correct issues, mostly regarding migration compatibility. These are
+no longer maintained and removing them will make the code easier to
+read and maintain. Use versions 2.12 and above as a replacement.
+
 Backend options
 ---------------
 
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b442d18317..d1c6d70d8d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -5083,6 +5083,7 @@ static void spapr_machine_2_11_class_options(MachineClass 
*mc)
 spapr_machine_2_12_class_options(mc);
 smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_ON;
 compat_props_add(mc->compat_props, hw_compat_2_11, hw_compat_2_11_len);
+mc->deprecation_reason = "old and not maintained - use a 2.12+ version";
 }
 
 DEFINE_SPAPR_MACHINE(2_11, "2.11", false);
diff --git a/roms/skiboot b/roms/skiboot
index dbd5de6624..24a7eb3596 16
--- a/roms/skiboot
+++ b/roms/skiboot
@@ -1 +1 @@
-Subproject commit dbd5de6624d7466bb67d1eb4e57bc3a8e2ad9e87
+Subproject commit 24a7eb35966d93455520bc2debdd7954314b638b
-- 
2.42.0

[PULL 27/49] ppc/pnv: Wire up pca9552 GPIO pins for PCIe hotplug power control

2024-02-19 Thread Nicholas Piggin

From: Glenn Miles 

For power10-rainier, a pca9552 device is used for PCIe slot hotplug
power control by the Power Hypervisor code.  The code expects that
some time after it enables power to a PCIe slot by asserting one of
the pca9552 GPIO pins 0-4, it should see a "power good" signal asserted
on one of pca9552 GPIO pins 5-9.

To simulate this behavior, we simply connect the GPIO outputs for
pins 0-4 to the GPIO inputs for pins 5-9.

Each PCIe slot is assigned 3 GPIO pins on the pca9552 device, for
control of up to 5 PCIe slots.  The per-slot signal names are:

   SLOTx_EN..........PHYP uses this as an output to enable
                     slot power.  We connect this to the
                     SLOTx_PG pin to simulate a PGOOD signal.
   SLOTx_PG..........PHYP uses this as an input to detect
                     PGOOD for the slot.  For our purposes
                     we just connect this to the SLOTx_EN
                     output.
   SLOTx_Control.....PHYP uses this as an output to prevent
                     a race condition in the real hotplug
                     circuitry, but we can ignore this output
                     for simulation.
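
The loopback described above can be sketched outside of QEMU as a
plain pin mapping.  This is only an illustration under the assumption
that SLOTx_EN for slot x is pin x and SLOTx_PG is pin x + 5, as stated
in this message; the names below are hypothetical and are not part of
the patch.

```c
#include <assert.h>

#define SKETCH_NUM_SLOTS 5   /* slots with an EN/PG pair */
#define SKETCH_PG_OFFSET 5   /* SLOTx_PG pin = SLOTx_EN pin + 5 */
#define SKETCH_NUM_PINS  10  /* pins 0-9 are modelled here */

/* Map a SLOTx_EN pin (0-4) to its SLOTx_PG pin (5-9). */
int slot_pg_pin(int en_pin)
{
    assert(en_pin >= 0 && en_pin < SKETCH_NUM_SLOTS);
    return en_pin + SKETCH_PG_OFFSET;
}

/* Copy each EN level onto the matching PG pin, faking "power good". */
void propagate_pgood(int levels[SKETCH_NUM_PINS])
{
    for (int slot = 0; slot < SKETCH_NUM_SLOTS; slot++) {
        levels[slot_pg_pin(slot)] = levels[slot];
    }
}
```

In the patch itself the same effect is obtained by connecting each
GPIO output to the corresponding GPIO input of the same device.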

Reviewed-by: Cédric Le Goater 
Signed-off-by: Glenn Miles 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/pnv.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 78f5c6262a..97bdfb2d1e 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -1900,7 +1900,19 @@ static void pnv_rainier_i2c_init(PnvMachineState *pnv)
  * Add a PCA9552 I2C device for PCIe hotplug control
  * to engine 2, bus 1, address 0x63
  */
-i2c_slave_create_simple(chip10->i2c[2].busses[1], "pca9552", 0x63);
+I2CSlave *dev = i2c_slave_create_simple(chip10->i2c[2].busses[1],
+"pca9552", 0x63);
+
+/*
+ * Connect PCA9552 GPIO pins 0-4 (SLOTx_EN) outputs to GPIO pins 5-9
+ * (SLOTx_PG) inputs in order to fake the pgood state of PCIe slots
+ * after hypervisor code sets a SLOTx_EN pin high.
+ */
+qdev_connect_gpio_out(DEVICE(dev), 0, qdev_get_gpio_in(DEVICE(dev), 5));
+qdev_connect_gpio_out(DEVICE(dev), 1, qdev_get_gpio_in(DEVICE(dev), 6));
+qdev_connect_gpio_out(DEVICE(dev), 2, qdev_get_gpio_in(DEVICE(dev), 7));
+qdev_connect_gpio_out(DEVICE(dev), 3, qdev_get_gpio_in(DEVICE(dev), 8));
+qdev_connect_gpio_out(DEVICE(dev), 4, qdev_get_gpio_in(DEVICE(dev), 9));
 }
 }
 
-- 
2.42.0

[PULL 47/49] target/ppc: 4xx optimise tlbwe_lo TLB flushing

2024-02-19 Thread Nicholas Piggin

Rather than tlbwe_lo always flushing all TCG TLBs, have it flush just
those corresponding to the old software TLB, and only if it was valid.

Tested-by: BALATON Zoltan 
Acked-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 target/ppc/mmu_helper.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/target/ppc/mmu_helper.c b/target/ppc/mmu_helper.c
index 68632bf54e..923779d052 100644
--- a/target/ppc/mmu_helper.c
+++ b/target/ppc/mmu_helper.c
@@ -813,12 +813,20 @@ void helper_4xx_tlbwe_hi(CPUPPCState *env, target_ulong 
entry,
 void helper_4xx_tlbwe_lo(CPUPPCState *env, target_ulong entry,
  target_ulong val)
 {
+CPUState *cs = env_cpu(env);
 ppcemb_tlb_t *tlb;
 
 qemu_log_mask(CPU_LOG_MMU, "%s entry %i val " TARGET_FMT_lx "\n",
   __func__, (int)entry, val);
 entry &= PPC4XX_TLB_ENTRY_MASK;
 tlb = &env->tlb.tlbe[entry];
+/* Invalidate previous TLB (if it's valid) */
+if (tlb->prot & PAGE_VALID) {
+qemu_log_mask(CPU_LOG_MMU, "%s: invalidate old TLB %d start "
+  TARGET_FMT_lx " end " TARGET_FMT_lx "\n", __func__,
+  (int)entry, tlb->EPN, tlb->EPN + tlb->size);
+ppcemb_tlb_flush(cs, tlb);
+}
 tlb->attr = val & PPC4XX_TLBLO_ATTR_MASK;
 tlb->RPN = val & PPC4XX_TLBLO_RPN_MASK;
 tlb->prot = PAGE_READ;
@@ -836,8 +844,6 @@ void helper_4xx_tlbwe_lo(CPUPPCState *env, target_ulong 
entry,
   tlb->prot & PAGE_WRITE ? 'w' : '-',
   tlb->prot & PAGE_EXEC ? 'x' : '-',
   tlb->prot & PAGE_VALID ? 'v' : '-', (int)tlb->PID);
-
-env->tlb_need_flush |= TLB_NEED_LOCAL_FLUSH;
 }
 
 target_ulong helper_4xx_tlbsx(CPUPPCState *env, target_ulong address)
-- 
2.42.0

[PULL 36/49] target/ppc: Rename TBL to TB on 64-bit

2024-02-19 Thread Nicholas Piggin

From the earliest PowerPC ISA, TBR (later SPR) 268 has been called TB
and accessed with the mftb instruction. The problem is that TB is the
name of the 64-bit register, and 32-bit implementations can only read
the lower half with one instruction, so 268 has also been called TBL,
and on 32-bit it does read only TBL.

Change SPR 268 to be called TB on 64-bit implementations.
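
For readers less familiar with the naming, the split between the
64-bit TB and its 32-bit halves can be sketched as below.  This is an
illustration only; the helper names are hypothetical and the code is
unrelated to how QEMU implements the time base.

```c
#include <assert.h>
#include <stdint.h>

/* Lower 32 bits of the 64-bit time base: what a 32-bit read of 268 returns. */
uint32_t tb_lower(uint64_t tb)
{
    return (uint32_t)tb;
}

/* Upper 32 bits of the 64-bit time base (TBU). */
uint32_t tb_upper(uint64_t tb)
{
    return (uint32_t)(tb >> 32);
}

/* Usage example: splitting an arbitrary 64-bit value. */
void tb_example(void)
{
    uint64_t tb = 0x0123456789abcdefULL;

    assert(tb_upper(tb) == 0x01234567u);
    assert(tb_lower(tb) == 0x89abcdefu);
}
```

On a 64-bit implementation a single read returns the whole value, which
is why "TB" is the more accurate name there.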

Reviewed-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 target/ppc/helper_regs.c  | 4 
 target/ppc/ppc-qmp-cmds.c | 4 
 2 files changed, 8 insertions(+)

diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c
index 8324ff22db..8f5bd1536e 100644
--- a/target/ppc/helper_regs.c
+++ b/target/ppc/helper_regs.c
@@ -460,7 +460,11 @@ void register_generic_sprs(PowerPCCPU *cpu)
 }
 
 /* Time base */
+#if defined(TARGET_PPC64)
+spr_register(env, SPR_VTBL,  "TB",
+#else
 spr_register(env, SPR_VTBL,  "TBL",
+#endif
  &spr_read_tbl, SPR_NOACCESS,
  &spr_read_tbl, SPR_NOACCESS,
  0x00000000);
diff --git a/target/ppc/ppc-qmp-cmds.c b/target/ppc/ppc-qmp-cmds.c
index c0c137d9d7..ee0b99fce7 100644
--- a/target/ppc/ppc-qmp-cmds.c
+++ b/target/ppc/ppc-qmp-cmds.c
@@ -103,7 +103,11 @@ const MonitorDef monitor_defs[] = {
 { "xer", 0, &monitor_get_xer },
 { "msr", offsetof(CPUPPCState, msr) },
 { "tbu", 0, &monitor_get_tbu, },
+#if defined(TARGET_PPC64)
+{ "tb", 0, &monitor_get_tbl, },
+#else
 { "tbl", 0, &monitor_get_tbl, },
+#endif
 { NULL },
 };
 
-- 
2.42.0

[PULL 01/49] target/ppc: Fix lxv/stxv MSR facility check

2024-02-19 Thread Nicholas Piggin

The move to decodetree flipped the inequality test for the VEC / VSX
MSR facility check.

This caused application crashes under Linux, where these facility
unavailable interrupts are used for lazy-switching of VEC/VSX register
sets. Getting the incorrect interrupt would result in wrong registers
being loaded, potentially overwriting live values and/or exposing
stale ones.
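
The corrected selection can be summarised by the sketch below.  It
only restates the condition from the diff as a pure function; the enum
and function names are hypothetical, and the 0-63 range for rt is an
assumption about the combined register number, not something stated
in this message.

```c
#include <assert.h>
#include <stdbool.h>

enum facility { FACILITY_VEC, FACILITY_VSX };

/*
 * Hypothetical helper: which MSR facility must be enabled for an
 * lxv/stxv style access, following the fixed condition in the diff.
 */
enum facility lstxv_required_facility(bool paired, int rt)
{
    assert(rt >= 0 && rt < 64);   /* assumed register number range */

    /* Paired forms and registers 0-31 need VSX, 32-63 need VEC. */
    return (paired || rt < 32) ? FACILITY_VSX : FACILITY_VEC;
}
```

Before the fix the comparison was inverted, so the two non-paired cases
were swapped.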

Cc: qemu-sta...@nongnu.org
Reported-by: Joel Stanley 
Fixes: 70426b5bb738 ("target/ppc: moved stxvx and lxvx from legacy to decodtree")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1769
Tested-by: Harsh Prateek Bora 
Signed-off-by: Nicholas Piggin 
---
 target/ppc/translate/vsx-impl.c.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 6db87ab336..0266f09119 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2268,7 +2268,7 @@ static bool do_lstxv(DisasContext *ctx, int ra, TCGv 
displ,
 
 static bool do_lstxv_D(DisasContext *ctx, arg_D *a, bool store, bool paired)
 {
-if (paired || a->rt >= 32) {
+if (paired || a->rt < 32) {
 REQUIRE_VSX(ctx);
 } else {
 REQUIRE_VECTOR(ctx);
-- 
2.42.0

[PULL 21/49] ppc/pnv: Change powernv default to powernv10

2024-02-19 Thread Nicholas Piggin

POWER10 is the latest IBM Power machine. It is not offered in
"OPAL mode" (i.e., powernv configuration), so there is a case for
keeping the default at powernv9, but most of the development work is
going into powernv10 at the moment. Move the "powernv" alias to
powernv10.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/pnv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 0297871bdd..b949398689 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -2242,8 +2242,6 @@ static void pnv_machine_power9_class_init(ObjectClass 
*oc, void *data)
 
 xfc->match_nvt = pnv_match_nvt;
 
-mc->alias = "powernv";
-
 pmc->compat = compat;
 pmc->compat_size = sizeof(compat);
 pmc->dt_power_mgt = pnv_dt_power_mgt;
@@ -2267,6 +2265,8 @@ static void pnv_machine_power10_class_init(ObjectClass 
*oc, void *data)
 mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power10_v2.0");
 compat_props_add(mc->compat_props, phb_compat, G_N_ELEMENTS(phb_compat));
 
+mc->alias = "powernv";
+
 pmc->compat = compat;
 pmc->compat_size = sizeof(compat);
 pmc->dt_power_mgt = pnv_dt_power_mgt;
-- 
2.42.0

[PULL 15/49] hw/ppc/spapr_hcall: Rename {softmmu -> vhyp_mmu}_resize_hpt_pr

2024-02-19 Thread Nicholas Piggin

From: Philippe Mathieu-Daudé 

Since 'softmmu' is quite a loaded term in QEMU, rename the vhyp MMU
facilities to use the vhyp_mmu_ prefix rather than softmmu_.

vhyp_mmu_ is chosen because the code that manipulates the hash table
via guest software hypercalls is QEMU's implementation of the PAPR
hypervisor interface, called vhyp.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Philippe Mathieu-Daudé 
[npiggin: Pick a different name, explain it in changelog.]
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/spapr_hcall.c   | 4 ++--
 hw/ppc/spapr_softmmu.c | 4 ++--
 include/hw/ppc/spapr.h | 9 ++---
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 0d7d523e6d..75c2d12978 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -124,7 +124,7 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
 if (kvm_enabled()) {
 return H_HARDWARE;
 } else if (tcg_enabled()) {
-return softmmu_resize_hpt_prepare(cpu, spapr, shift);
+return vhyp_mmu_resize_hpt_prepare(cpu, spapr, shift);
 } else {
 g_assert_not_reached();
 }
@@ -194,7 +194,7 @@ static target_ulong h_resize_hpt_commit(PowerPCCPU *cpu,
 if (kvm_enabled()) {
 return H_HARDWARE;
 } else if (tcg_enabled()) {
-return softmmu_resize_hpt_commit(cpu, spapr, flags, shift);
+return vhyp_mmu_resize_hpt_commit(cpu, spapr, flags, shift);
 } else {
 g_assert_not_reached();
 }
diff --git a/hw/ppc/spapr_softmmu.c b/hw/ppc/spapr_softmmu.c
index 2fade94029..b3dd8b3a59 100644
--- a/hw/ppc/spapr_softmmu.c
+++ b/hw/ppc/spapr_softmmu.c
@@ -378,7 +378,7 @@ static void cancel_hpt_prepare(SpaprMachineState *spapr)
 free_pending_hpt(pending);
 }
 
-target_ulong softmmu_resize_hpt_prepare(PowerPCCPU *cpu,
+target_ulong vhyp_mmu_resize_hpt_prepare(PowerPCCPU *cpu,
  SpaprMachineState *spapr,
  target_ulong shift)
 {
@@ -562,7 +562,7 @@ static int rehash_hpt(PowerPCCPU *cpu,
 return H_SUCCESS;
 }
 
-target_ulong softmmu_resize_hpt_commit(PowerPCCPU *cpu,
+target_ulong vhyp_mmu_resize_hpt_commit(PowerPCCPU *cpu,
 SpaprMachineState *spapr,
 target_ulong flags,
 target_ulong shift)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index e91791a1a9..5b5ba9ef77 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -634,10 +634,13 @@ void spapr_register_hypercall(target_ulong opcode, 
spapr_hcall_fn fn);
 target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
  target_ulong *args);
 
-target_ulong softmmu_resize_hpt_prepare(PowerPCCPU *cpu, SpaprMachineState *spapr,
+target_ulong vhyp_mmu_resize_hpt_prepare(PowerPCCPU *cpu,
+ SpaprMachineState *spapr,
  target_ulong shift);
-target_ulong softmmu_resize_hpt_commit(PowerPCCPU *cpu, SpaprMachineState *spapr,
-target_ulong flags, target_ulong shift);
+target_ulong vhyp_mmu_resize_hpt_commit(PowerPCCPU *cpu,
+SpaprMachineState *spapr,
+target_ulong flags,
+target_ulong shift);
 bool is_ram_address(SpaprMachineState *spapr, hwaddr addr);
 void push_sregs_to_kvm_pr(SpaprMachineState *spapr);
 
-- 
2.42.0

[PULL 18/49] ppc/spapr: Initialize max_cpus limit to SPAPR_IRQ_NR_IPIS.

2024-02-19 Thread Nicholas Piggin

From: Harsh Prateek Bora 

Initialize the machine-specific max_cpus limit according to the range
of CPU IPIs available. A maxcpus value above 4096 and up to 8192 fails
with an "IRQ not free" error due to a XIVE/XICS limitation, and a value
beyond 8192 hits an assert in tcg_region_init or spapr_xive_claim_irq.

Logs:

Without patch fix:

[root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=4097
qemu-system-ppc64: IRQ 4096 is not free
[root@host build]#

On LPAR:
[root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=8193
**
ERROR:../tcg/region.c:774:tcg_region_init: assertion failed:
(region_size >= 2 * page_size)
Bail out! ERROR:../tcg/region.c:774:tcg_region_init: assertion failed:
(region_size >= 2 * page_size)
Aborted (core dumped)
[root@host build]#

On x86:
[root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=8193
qemu-system-ppc64: ../hw/intc/spapr_xive.c:596: spapr_xive_claim_irq:
Assertion `lisn < xive->nr_irqs' failed.
Aborted (core dumped)
[root@host build]#

With patch fix:
[root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=4097
qemu-system-ppc64: Invalid SMP CPUs 4097. The max CPUs supported by
machine 'pseries-8.2' is 4096
[root@host build]#
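
The check that now rejects the command line above can be sketched as
follows.  This is not the QEMU implementation; the names are
hypothetical, and the limit of 4096 is taken from the error message in
the log rather than from the definition of SPAPR_IRQ_NR_IPIS.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed limit, taken from the "max CPUs supported ... is 4096" log. */
#define SKETCH_NR_IPIS 4096u

/* Hypothetical helper: is the requested maxcpus within the IPI range? */
bool smp_maxcpus_ok(unsigned int maxcpus)
{
    return maxcpus >= 1 && maxcpus <= SKETCH_NR_IPIS;
}

/* Usage example matching the values from the logs. */
void smp_maxcpus_example(void)
{
    assert(smp_maxcpus_ok(4096));
    assert(!smp_maxcpus_ok(4097));
    assert(!smp_maxcpus_ok(8193));
}
```

Failing early with a clear message is preferable to the "IRQ not free"
error or the assertion failures shown in the logs without the patch.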

Reviewed-by: Cédric Le Goater 
Signed-off-by: Harsh Prateek Bora 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/spapr.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0d72d286d8..0028ce0b67 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4639,13 +4639,10 @@ static void spapr_machine_class_init(ObjectClass *oc, 
void *data)
 mc->block_default_type = IF_SCSI;
 
 /*
- * Setting max_cpus to INT32_MAX. Both KVM and TCG max_cpus values
- * should be limited by the host capability instead of hardcoded.
- * max_cpus for KVM guests will be checked in kvm_init(), and TCG
- * guests are welcome to have as many CPUs as the host are capable
- * of emulate.
+ * While KVM determines max cpus in kvm_init() using kvm_max_vcpus(),
+ * In TCG the limit is restricted by the range of CPU IPIs available.
  */
-mc->max_cpus = INT32_MAX;
+mc->max_cpus = SPAPR_IRQ_NR_IPIS;
 
 mc->no_parallel = 1;
 mc->default_boot_order = "";
-- 
2.42.0

[PULL 46/49] target/ppc: 4xx don't flush TLB for a newly written software TLB entry

2024-02-19 Thread Nicholas Piggin

BookE software TLB is implemented by flushing old translations from the
relevant TCG TLB whenever software TLB entries change. This means a new
software TLB entry should not have any corresponding cached TCG TLB
translations, so there is nothing to flush. The exception is multiple
software TLBs that cover the same address and address space, but that is
a programming error and results in undefined behaviour, and flushing
does not give an obviously better outcome in that case either.

Remove the unnecessary flush of a newly written software TLB entry.

Tested-by: BALATON Zoltan 
Acked-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 target/ppc/mmu_helper.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/target/ppc/mmu_helper.c b/target/ppc/mmu_helper.c
index 949ae87f4f..68632bf54e 100644
--- a/target/ppc/mmu_helper.c
+++ b/target/ppc/mmu_helper.c
@@ -808,13 +808,6 @@ void helper_4xx_tlbwe_hi(CPUPPCState *env, target_ulong entry,
   tlb->prot & PAGE_WRITE ? 'w' : '-',
   tlb->prot & PAGE_EXEC ? 'x' : '-',
   tlb->prot & PAGE_VALID ? 'v' : '-', (int)tlb->PID);
-/* Invalidate new TLB (if valid) */
-if (tlb->prot & PAGE_VALID) {
-qemu_log_mask(CPU_LOG_MMU, "%s: invalidate TLB %d start "
-  TARGET_FMT_lx " end " TARGET_FMT_lx "\n", __func__,
-  (int)entry, tlb->EPN, tlb->EPN + tlb->size);
-ppcemb_tlb_flush(cs, tlb);
-}
 }
 
 void helper_4xx_tlbwe_lo(CPUPPCState *env, target_ulong entry,
-- 
2.42.0

[PULL 33/49] hw/ppc: Add N1 chiplet model

2024-02-19 Thread Nicholas Piggin

From: Chalapathi V 

The N1 chiplet handles the high-speed I/O traffic over PCIe and others.
The N1 chiplet consists of the PowerBus fabric controller,
the nest Memory Management Unit, the chiplet control unit and more.

This commit creates an N1 chiplet model and initializes and realizes the
pervasive chiplet model where chiplet control registers are implemented.

This commit also implements the read/write methods for the PowerBus SCOM
registers.

Signed-off-by: Chalapathi V 
Signed-off-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/meson.build  |   1 +
 hw/ppc/pnv_n1_chiplet.c | 173 
 include/hw/ppc/pnv_n1_chiplet.h |  32 ++
 include/hw/ppc/pnv_xscom.h  |   6 ++
 4 files changed, 212 insertions(+)
 create mode 100644 hw/ppc/pnv_n1_chiplet.c
 create mode 100644 include/hw/ppc/pnv_n1_chiplet.h

diff --git a/hw/ppc/meson.build b/hw/ppc/meson.build
index e46c9bcd7b..196c87e3e0 100644
--- a/hw/ppc/meson.build
+++ b/hw/ppc/meson.build
@@ -54,6 +54,7 @@ ppc_ss.add(when: 'CONFIG_POWERNV', if_true: files(
   'pnv_homer.c',
   'pnv_pnor.c',
   'pnv_nest_pervasive.c',
+  'pnv_n1_chiplet.c',
 ))
 # PowerPC 4xx boards
 ppc_ss.add(when: 'CONFIG_PPC405', if_true: files(
diff --git a/hw/ppc/pnv_n1_chiplet.c b/hw/ppc/pnv_n1_chiplet.c
new file mode 100644
index 00..03ff9fbad0
--- /dev/null
+++ b/hw/ppc/pnv_n1_chiplet.c
@@ -0,0 +1,173 @@
+/*
+ * QEMU PowerPC N1 chiplet model
+ *
+ * Copyright (c) 2023, IBM Corporation.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/qdev-properties.h"
+#include "hw/ppc/pnv.h"
+#include "hw/ppc/pnv_xscom.h"
+#include "hw/ppc/pnv_n1_chiplet.h"
+#include "hw/ppc/pnv_nest_pervasive.h"
+
+/*
+ * The n1 chiplet contains chiplet control unit,
+ * PowerBus/RaceTrack/Bridge logic, nest Memory Management Unit(nMMU)
+ * and more.
+ *
+ * In this model Nest1 chiplet control registers are modelled via common
+ * nest pervasive model and few PowerBus racetrack registers are modelled.
+ */
+
+#define PB_SCOM_EQ0_HP_MODE2_CURR  0xe
+#define PB_SCOM_ES3_MODE   0x8a
+
+static uint64_t pnv_n1_chiplet_pb_scom_eq_read(void *opaque, hwaddr addr,
+  unsigned size)
+{
+PnvN1Chiplet *n1_chiplet = PNV_N1_CHIPLET(opaque);
+uint32_t reg = addr >> 3;
+uint64_t val = ~0ull;
+
+switch (reg) {
+case PB_SCOM_EQ0_HP_MODE2_CURR:
+val = n1_chiplet->eq[0].hp_mode2_curr;
+break;
+default:
+qemu_log_mask(LOG_UNIMP, "%s: Invalid xscom read at 0x%" PRIx32 "\n",
+  __func__, reg);
+}
+return val;
+}
+
+static void pnv_n1_chiplet_pb_scom_eq_write(void *opaque, hwaddr addr,
+   uint64_t val, unsigned size)
+{
+PnvN1Chiplet *n1_chiplet = PNV_N1_CHIPLET(opaque);
+uint32_t reg = addr >> 3;
+
+switch (reg) {
+case PB_SCOM_EQ0_HP_MODE2_CURR:
+n1_chiplet->eq[0].hp_mode2_curr = val;
+break;
+default:
+qemu_log_mask(LOG_UNIMP, "%s: Invalid xscom write at 0x%" PRIx32 "\n",
+  __func__, reg);
+}
+}
+
+static const MemoryRegionOps pnv_n1_chiplet_pb_scom_eq_ops = {
+.read = pnv_n1_chiplet_pb_scom_eq_read,
+.write = pnv_n1_chiplet_pb_scom_eq_write,
+.valid.min_access_size = 8,
+.valid.max_access_size = 8,
+.impl.min_access_size = 8,
+.impl.max_access_size = 8,
+.endianness = DEVICE_BIG_ENDIAN,
+};
+
+static uint64_t pnv_n1_chiplet_pb_scom_es_read(void *opaque, hwaddr addr,
+  unsigned size)
+{
+PnvN1Chiplet *n1_chiplet = PNV_N1_CHIPLET(opaque);
+uint32_t reg = addr >> 3;
+uint64_t val = ~0ull;
+
+switch (reg) {
+case PB_SCOM_ES3_MODE:
+val = n1_chiplet->es[3].mode;
+break;
+default:
+qemu_log_mask(LOG_UNIMP, "%s: Invalid xscom read at 0x%" PRIx32 "\n",
+  __func__, reg);
+}
+return val;
+}
+
+static void pnv_n1_chiplet_pb_scom_es_write(void *opaque, hwaddr addr,
+   uint64_t val, unsigned size)
+{
+PnvN1Chiplet *n1_chiplet = PNV_N1_CHIPLET(opaque);
+uint32_t reg = addr >> 3;
+
+switch (reg) {
+case PB_SCOM_ES3_MODE:
+n1_chiplet->es[3].mode = val;
+break;
+default:
+qemu_log_mask(LOG_UNIMP, "%s: Invalid xscom write at 0x%" PRIx32 "\n",
+  __func__, reg);
+}
+}
+
+static const MemoryRegionOps pnv_n1_chiplet_pb_scom_es_ops = {
+.read = pnv_n1_chiplet_pb_scom_es_read,
+.write = pnv_n1_chiplet_pb_scom_es_write,
+.valid.min_access_size = 8,
+.valid.max_access_size = 8,
+.impl.min_access_size = 8,
+.impl.max_access_size = 8,
+.endianness = DEVICE_BIG_ENDIAN,
+};
+
+static void pnv_n1_chiplet_realize(DeviceState *dev, Error **errp)
+{
+PnvN1Chiplet *n1_chiplet = PNV_N1_CHIPLET(dev);
+
+

[PULL 30/49] ppc/pnv: Add a pca9554 I2C device to powernv10-rainier

2024-02-19 Thread Nicholas Piggin

From: Glenn Miles 

For powernv10-rainier, the Power Hypervisor code expects to see a
pca9554 device connected to the 3rd PNV I2C engine on port 1 at I2C
address 0x25 (or left-justified address of 0x4A).  This is used by
the hypervisor code to detect if a "Cable Card" is present.
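
The relation between the two address forms quoted above is a one-bit
left shift. A small sketch follows; the helper name is hypothetical and
is not part of the QEMU I2C API.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a 7-bit I2C address to its left-justified 8-bit form. */
static uint8_t i2c_left_justify(uint8_t addr7)
{
    assert(addr7 < 0x80);          /* only 7 bits are valid */
    return (uint8_t)(addr7 << 1);  /* bit 0 is left free for the R/W flag */
}
```

With this helper, i2c_left_justify(0x25) yields 0x4A, matching the two
values given in the commit message.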

Reviewed-by: Cédric Le Goater 
Signed-off-by: Glenn Miles 
Signed-off-by: Nicholas Piggin 
---
 hw/misc/Kconfig | 4 
 hw/misc/meson.build | 1 +
 hw/ppc/Kconfig  | 1 +
 hw/ppc/pnv.c| 6 ++
 4 files changed, 12 insertions(+)

diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig
index 4fc6b29b43..83ad849b62 100644
--- a/hw/misc/Kconfig
+++ b/hw/misc/Kconfig
@@ -34,6 +34,10 @@ config PCA9552
 bool
 depends on I2C
 
+config PCA9554
+bool
+depends on I2C
+
 config I2C_ECHO
 bool
 default y if TEST_DEVICES
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index e4ef1da5a5..746686835b 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -4,6 +4,7 @@ system_ss.add(when: 'CONFIG_FW_CFG_DMA', if_true: files('vmcoreinfo.c'))
 system_ss.add(when: 'CONFIG_ISA_DEBUG', if_true: files('debugexit.c'))
 system_ss.add(when: 'CONFIG_ISA_TESTDEV', if_true: files('pc-testdev.c'))
 system_ss.add(when: 'CONFIG_PCA9552', if_true: files('pca9552.c'))
+system_ss.add(when: 'CONFIG_PCA9554', if_true: files('pca9554.c'))
 system_ss.add(when: 'CONFIG_PCI_TESTDEV', if_true: files('pci-testdev.c'))
 system_ss.add(when: 'CONFIG_UNIMP', if_true: files('unimp.c'))
 system_ss.add(when: 'CONFIG_EMPTY_SLOT', if_true: files('empty_slot.c'))
diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
index 8e592e4307..d97743d02f 100644
--- a/hw/ppc/Kconfig
+++ b/hw/ppc/Kconfig
@@ -33,6 +33,7 @@ config POWERNV
 select FDT_PPC
 select PCI_POWERNV
 select PCA9552
+select PCA9554
 
 config PPC405
 bool
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 97bdfb2d1e..0755fab155 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -1913,6 +1913,12 @@ static void pnv_rainier_i2c_init(PnvMachineState *pnv)
 qdev_connect_gpio_out(DEVICE(dev), 2, qdev_get_gpio_in(DEVICE(dev), 7));
 qdev_connect_gpio_out(DEVICE(dev), 3, qdev_get_gpio_in(DEVICE(dev), 8));
 qdev_connect_gpio_out(DEVICE(dev), 4, qdev_get_gpio_in(DEVICE(dev), 9));
+
+/*
+ * Add a PCA9554 I2C device for cable card presence detection
+ * to engine 2, bus 1, address 0x25
+ */
+i2c_slave_create_simple(chip10->i2c[2].busses[1], "pca9554", 0x25);
 }
 }
 
-- 
2.42.0

[PULL 28/49] ppc/pnv: Use resettable interface to reset child I2C buses

2024-02-19 Thread Nicholas Piggin

From: Glenn Miles 

The QEMU I2C buses and devices use the resettable
interface for resetting while the PNV I2C controller
and parent buses and devices have not yet transitioned
to this new interface and use the old reset strategy.
This was preventing the I2C buses and devices wired
to the PNV I2C controller from being reset.

The short term fix for this is to have the PNV I2C
Controller's reset function explicitly call the resettable
interface function, bus_cold_reset(), on all child
I2C buses.

The long term fix should be to transition all PNV parent
devices and buses to use the resettable interface so that
all child buses and devices are automatically reset.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Glenn Miles 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/pnv_i2c.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/pnv_i2c.c b/hw/ppc/pnv_i2c.c
index 656a48eebe..774946d6b2 100644
--- a/hw/ppc/pnv_i2c.c
+++ b/hw/ppc/pnv_i2c.c
@@ -629,6 +629,19 @@ static int pnv_i2c_dt_xscom(PnvXScomInterface *dev, void *fdt,
 return 0;
 }
 
+static void pnv_i2c_sys_reset(void *dev)
+{
+int port;
+PnvI2C *i2c = PNV_I2C(dev);
+
+pnv_i2c_reset(dev);
+
+/* reset all buses connected to this i2c controller */
+for (port = 0; port < i2c->num_busses; port++) {
+bus_cold_reset(BUS(i2c->busses[port]));
+}
+}
+
 static void pnv_i2c_realize(DeviceState *dev, Error **errp)
 {
 PnvI2C *i2c = PNV_I2C(dev);
@@ -654,7 +667,7 @@ static void pnv_i2c_realize(DeviceState *dev, Error **errp)
 
 fifo8_create(&i2c->fifo, PNV_I2C_FIFO_SIZE);
 
-qemu_register_reset(pnv_i2c_reset, dev);
+qemu_register_reset(pnv_i2c_sys_reset, dev);
 
 qdev_init_gpio_out(DEVICE(dev), &i2c->psi_irq, 1);
 }
-- 
2.42.0

[PULL 19/49] ppc/spapr: change pseries machine default to POWER10 CPU

2024-02-19 Thread Nicholas Piggin

POWER10 is the latest pseries CPU.

Reviewed-by: Harsh Prateek Bora 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/spapr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0028ce0b67..b442d18317 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4664,7 +4664,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
 
 smc->dr_lmb_enabled = true;
 smc->update_dt_enabled = true;
-mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power9_v2.2");
+mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power10_v2.0");
 mc->has_hotpluggable_cpus = true;
 mc->nvdimm_supported = true;
 smc->resize_hpt_default = SPAPR_RESIZE_HPT_ENABLED;
-- 
2.42.0

[PULL 48/49] target/ppc: 440 optimise tlbwe TLB flushing

2024-02-19 Thread Nicholas Piggin

Have 440 tlbwe flush only the range corresponding to the addresses
covered by the software TLB entry being modified rather than the
entire TLB. This matches what 4xx does.

Tested-by: BALATON Zoltan 
Acked-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 target/ppc/mmu_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/ppc/mmu_helper.c b/target/ppc/mmu_helper.c
index 923779d052..ba965f1779 100644
--- a/target/ppc/mmu_helper.c
+++ b/target/ppc/mmu_helper.c
@@ -864,7 +864,7 @@ void helper_440_tlbwe(CPUPPCState *env, uint32_t word, target_ulong entry,
 
 /* Invalidate previous TLB (if it's valid) */
 if (tlb->prot & PAGE_VALID) {
-tlb_flush(env_cpu(env));
+ppcemb_tlb_flush(env_cpu(env), tlb);
 }
 
 switch (word) {
-- 
2.42.0

[PULL 37/49] target/ppc: Improve timebase register defines naming

2024-02-19 Thread Nicholas Piggin

The timebase in ppc started out with the mftb instruction which is like
mfspr but addressed timebase registers (TBRs) rather than SPRs. These
instructions could be used to read TB and TBU at 268 and 269. Timebase
could be written via the TBL and TBU SPRs at 284 and 285.

The ISA changed around v2.03 to bring TB and TBU reads into the SPR
space at 268 and 269 (access via mftb TBR-space is still supported
but will be phased out). Later, VTB was added which is an entirely
different register.

The SPR number defines in QEMU are understandably inconsistently named.
Change SPR 268, 269, 284, 285 to TBL, TBU, WR_TBL, WR_TBU, respectively.
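
The commit message gives the SPR numbers in decimal while cpu.h uses
hex. The sketch below restates the mapping after the rename; the enum
values are taken from the diff, and the helper function is hypothetical,
added only for illustration.

```c
#include <stdbool.h>

enum {
    SPR_TBL    = 0x10C,  /* 268: move-from timebase, lower half */
    SPR_TBU    = 0x10D,  /* 269: move-from timebase, upper half */
    SPR_WR_TBL = 0x11C,  /* 284: move-to timebase, lower half */
    SPR_WR_TBU = 0x11D,  /* 285: move-to timebase, upper half */
};

/* True for the move-to (write) timebase SPR numbers. */
static bool spr_is_timebase_write(int spr)
{
    return spr == SPR_WR_TBL || spr == SPR_WR_TBU;
}
```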

Reviewed-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 target/ppc/cpu.h |  8 
 target/ppc/helper_regs.c | 10 +-
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index a44de22ca4..16baea609c 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1750,8 +1750,8 @@ void ppc_compat_add_property(Object *obj, const char *name,
 #define SPR_USPRG5(0x105)
 #define SPR_USPRG6(0x106)
 #define SPR_USPRG7(0x107)
-#define SPR_VTBL  (0x10C)
-#define SPR_VTBU  (0x10D)
+#define SPR_TBL   (0x10C)
+#define SPR_TBU   (0x10D)
 #define SPR_SPRG0 (0x110)
 #define SPR_SPRG1 (0x111)
 #define SPR_SPRG2 (0x112)
@@ -1764,8 +1764,8 @@ void ppc_compat_add_property(Object *obj, const char *name,
 #define SPR_SPRG7 (0x117)
 #define SPR_ASR   (0x118)
 #define SPR_EAR   (0x11A)
-#define SPR_TBL   (0x11C)
-#define SPR_TBU   (0x11D)
+#define SPR_WR_TBL(0x11C)
+#define SPR_WR_TBU(0x11D)
 #define SPR_TBU40 (0x11E)
 #define SPR_SVR   (0x11E)
 #define SPR_BOOKE_PIR (0x11E)
diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c
index 8f5bd1536e..94c9a5a5c1 100644
--- a/target/ppc/helper_regs.c
+++ b/target/ppc/helper_regs.c
@@ -461,22 +461,22 @@ void register_generic_sprs(PowerPCCPU *cpu)
 
 /* Time base */
 #if defined(TARGET_PPC64)
-spr_register(env, SPR_VTBL,  "TB",
+spr_register(env, SPR_TBL, "TB",
 #else
-spr_register(env, SPR_VTBL,  "TBL",
+spr_register(env, SPR_TBL, "TBL",
 #endif
  &spr_read_tbl, SPR_NOACCESS,
  &spr_read_tbl, SPR_NOACCESS,
  0x);
-spr_register(env, SPR_TBL,   "TBL",
+spr_register(env, SPR_WR_TBL, "TBL",
  &spr_read_tbl, SPR_NOACCESS,
  &spr_read_tbl, &spr_write_tbl,
  0x);
-spr_register(env, SPR_VTBU,  "TBU",
+spr_register(env, SPR_TBU, "TBU",
  &spr_read_tbu, SPR_NOACCESS,
  &spr_read_tbu, SPR_NOACCESS,
  0x);
-spr_register(env, SPR_TBU,   "TBU",
+spr_register(env, SPR_WR_TBU, "TBU",
  &spr_read_tbu, SPR_NOACCESS,
  &spr_read_tbu, &spr_write_tbu,
  0x);
-- 
2.42.0

[PULL 34/49] hw/ppc: N1 chiplet wiring

2024-02-19 Thread Nicholas Piggin

From: Chalapathi V 

This part of the patchset connects the nest1 chiplet model to the P10 chip.

Signed-off-by: Chalapathi V 
Signed-off-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/pnv.c  | 15 +++
 include/hw/ppc/pnv_chip.h |  2 ++
 2 files changed, 17 insertions(+)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 0755fab155..acc4db00c1 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -1688,6 +1688,8 @@ static void pnv_chip_power10_instance_init(Object *obj)
 object_initialize_child(obj, "occ",  &chip10->occ, TYPE_PNV10_OCC);
 object_initialize_child(obj, "sbe",  &chip10->sbe, TYPE_PNV10_SBE);
 object_initialize_child(obj, "homer", &chip10->homer, TYPE_PNV10_HOMER);
+object_initialize_child(obj, "n1-chiplet", &chip10->n1_chiplet,
+TYPE_PNV_N1_CHIPLET);
 
 chip->num_pecs = pcc->num_pecs;
 
@@ -1857,6 +1859,19 @@ static void pnv_chip_power10_realize(DeviceState *dev, Error **errp)
 memory_region_add_subregion(get_system_memory(), PNV10_HOMER_BASE(chip),
 &chip10->homer.regs);
 
+/* N1 chiplet */
+if (!qdev_realize(DEVICE(&chip10->n1_chiplet), NULL, errp)) {
+return;
+}
+pnv_xscom_add_subregion(chip, PNV10_XSCOM_N1_CHIPLET_CTRL_REGS_BASE,
+ &chip10->n1_chiplet.nest_pervasive.xscom_ctrl_regs_mr);
+
+pnv_xscom_add_subregion(chip, PNV10_XSCOM_N1_PB_SCOM_EQ_BASE,
+   &chip10->n1_chiplet.xscom_pb_eq_mr);
+
+pnv_xscom_add_subregion(chip, PNV10_XSCOM_N1_PB_SCOM_ES_BASE,
+   &chip10->n1_chiplet.xscom_pb_es_mr);
+
 /* PHBs */
 pnv_chip_power10_phb_realize(chip, &local_err);
 if (local_err) {
diff --git a/include/hw/ppc/pnv_chip.h b/include/hw/ppc/pnv_chip.h
index 0ab5c42308..9b06c8d87c 100644
--- a/include/hw/ppc/pnv_chip.h
+++ b/include/hw/ppc/pnv_chip.h
@@ -4,6 +4,7 @@
 #include "hw/pci-host/pnv_phb4.h"
 #include "hw/ppc/pnv_core.h"
 #include "hw/ppc/pnv_homer.h"
+#include "hw/ppc/pnv_n1_chiplet.h"
 #include "hw/ppc/pnv_lpc.h"
 #include "hw/ppc/pnv_occ.h"
 #include "hw/ppc/pnv_psi.h"
@@ -113,6 +114,7 @@ struct Pnv10Chip {
 PnvOCC   occ;
 PnvSBE   sbe;
 PnvHomer homer;
+PnvN1Chiplet n1_chiplet;
 
 uint32_t nr_quads;
 PnvQuad  *quads;
-- 
2.42.0

[PULL 38/49] target/ppc: Fix move-to timebase SPR access permissions

2024-02-19 Thread Nicholas Piggin

The move-to timebase registers TBU and TBL cannot be read, and they
cannot be written in supervisor mode on hypervisor-capable CPUs.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 target/ppc/helper_regs.c | 31 +++
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c
index 94c9a5a5c1..410b39c231 100644
--- a/target/ppc/helper_regs.c
+++ b/target/ppc/helper_regs.c
@@ -468,18 +468,33 @@ void register_generic_sprs(PowerPCCPU *cpu)
  &spr_read_tbl, SPR_NOACCESS,
  &spr_read_tbl, SPR_NOACCESS,
  0x);
-spr_register(env, SPR_WR_TBL, "TBL",
- &spr_read_tbl, SPR_NOACCESS,
- &spr_read_tbl, &spr_write_tbl,
- 0x);
 spr_register(env, SPR_TBU, "TBU",
  &spr_read_tbu, SPR_NOACCESS,
  &spr_read_tbu, SPR_NOACCESS,
  0x);
-spr_register(env, SPR_WR_TBU, "TBU",
- &spr_read_tbu, SPR_NOACCESS,
- &spr_read_tbu, &spr_write_tbu,
- 0x);
+#ifndef CONFIG_USER_ONLY
+if (env->has_hv_mode) {
+spr_register_hv(env, SPR_WR_TBL, "TBL",
+SPR_NOACCESS, SPR_NOACCESS,
+SPR_NOACCESS, SPR_NOACCESS,
+SPR_NOACCESS, &spr_write_tbl,
+0x);
+spr_register_hv(env, SPR_WR_TBU, "TBU",
+SPR_NOACCESS, SPR_NOACCESS,
+SPR_NOACCESS, SPR_NOACCESS,
+SPR_NOACCESS, &spr_write_tbu,
+0x);
+} else {
+spr_register(env, SPR_WR_TBL, "TBL",
+ SPR_NOACCESS, SPR_NOACCESS,
+ SPR_NOACCESS, &spr_write_tbl,
+ 0x);
+spr_register(env, SPR_WR_TBU, "TBU",
+ SPR_NOACCESS, SPR_NOACCESS,
+ SPR_NOACCESS, &spr_write_tbu,
+ 0x);
+}
+#endif
 }
 
 void register_non_embedded_sprs(CPUPPCState *env)
-- 
2.42.0

[PULL 29/49] misc: Add a pca9554 GPIO device model

2024-02-19 Thread Nicholas Piggin

From: Glenn Miles 

Specs are available here:

https://www.nxp.com/docs/en/data-sheet/PCA9554_9554A.pdf

This is a simple model supporting the basic registers for GPIO
mode.  The device also supports an interrupt output line but the
model does not yet support this.
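
As a rough sketch of the GPIO behaviour this model implements, the
functions below mirror the logic of pca9554_update_pin_input() and the
PCA9554_INPUT read path in the patch, reduced to whole-register bit
operations. The function names and the ext_low bitmask parameter are
assumptions made for this example; the model itself tracks external pin
state per pin.

```c
#include <stdint.h>

/*
 * Compute the raw input register value.
 *   config bit = 1: pin is an input (high impedance, pulled up)
 *   config bit = 0: pin is an output following the output register
 *   ext_low bit = 1: an external device drives that pin low
 * A pin reads 1 when it is not driven low internally (config | output)
 * and no external device pulls it low.
 */
static uint8_t pca9554_raw_input(uint8_t config, uint8_t output,
                                 uint8_t ext_low)
{
    uint8_t hiz = config | output;
    return (uint8_t)(hiz & ~ext_low);
}

/* The value read back from the input register has polarity applied. */
static uint8_t pca9554_read_input(uint8_t raw_input, uint8_t polarity)
{
    return raw_input ^ polarity;
}
```

For example, with all pins configured as inputs and an external device
pulling pin 1 low, the raw input is 0xFD.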

Reviewed-by: Cédric Le Goater 
Signed-off-by: Glenn Miles 
Signed-off-by: Nicholas Piggin 
---
 MAINTAINERS|  10 +-
 hw/misc/pca9554.c  | 328 +
 include/hw/misc/pca9554.h  |  36 
 include/hw/misc/pca9554_regs.h |  19 ++
 4 files changed, 391 insertions(+), 2 deletions(-)
 create mode 100644 hw/misc/pca9554.c
 create mode 100644 include/hw/misc/pca9554.h
 create mode 100644 include/hw/misc/pca9554_regs.h

diff --git a/MAINTAINERS b/MAINTAINERS
index c0f42e8d4a..a74d73960c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1170,9 +1170,7 @@ R: Joel Stanley 
 L: qemu-...@nongnu.org
 S: Maintained
 F: hw/*/*aspeed*
-F: hw/misc/pca9552.c
 F: include/hw/*/*aspeed*
-F: include/hw/misc/pca9552*.h
 F: hw/net/ftgmac100.c
 F: include/hw/net/ftgmac100.h
 F: docs/system/arm/aspeed.rst
@@ -1543,6 +1541,14 @@ F: include/hw/pci-host/pnv*
 F: pc-bios/skiboot.lid
 F: tests/qtest/pnv*
 
+pca955x
+M: Glenn Miles 
+L: qemu-...@nongnu.org
+L: qemu-...@nongnu.org
+S: Odd Fixes
+F: hw/misc/pca955*.c
+F: include/hw/misc/pca955*.h
+
 virtex_ml507
 M: Edgar E. Iglesias 
 L: qemu-...@nongnu.org
diff --git a/hw/misc/pca9554.c b/hw/misc/pca9554.c
new file mode 100644
index 00..778b32e443
--- /dev/null
+++ b/hw/misc/pca9554.c
@@ -0,0 +1,328 @@
+/*
+ * PCA9554 I/O port
+ *
+ * Copyright (c) 2023, IBM Corporation.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qemu/bitops.h"
+#include "hw/qdev-properties.h"
+#include "hw/misc/pca9554.h"
+#include "hw/misc/pca9554_regs.h"
+#include "hw/irq.h"
+#include "migration/vmstate.h"
+#include "qapi/error.h"
+#include "qapi/visitor.h"
+#include "trace.h"
+#include "qom/object.h"
+
+struct PCA9554Class {
+/*< private >*/
+I2CSlaveClass parent_class;
+/*< public >*/
+};
+typedef struct PCA9554Class PCA9554Class;
+
+DECLARE_CLASS_CHECKERS(PCA9554Class, PCA9554,
+   TYPE_PCA9554)
+
+#define PCA9554_PIN_LOW  0x0
+#define PCA9554_PIN_HIZ  0x1
+
+static const char *pin_state[] = {"low", "high"};
+
+static void pca9554_update_pin_input(PCA9554State *s)
+{
+int i;
+uint8_t config = s->regs[PCA9554_CONFIG];
+uint8_t output = s->regs[PCA9554_OUTPUT];
+uint8_t internal_state = config | output;
+
+for (i = 0; i < PCA9554_PIN_COUNT; i++) {
+uint8_t bit_mask = 1 << i;
+uint8_t internal_pin_state = (internal_state >> i) & 0x1;
+uint8_t old_value = s->regs[PCA9554_INPUT] & bit_mask;
+uint8_t new_value;
+
+switch (internal_pin_state) {
+case PCA9554_PIN_LOW:
+s->regs[PCA9554_INPUT] &= ~bit_mask;
+break;
+case PCA9554_PIN_HIZ:
+/*
+ * pullup sets it to a logical 1 unless
+ * external device drives it low.
+ */
+if (s->ext_state[i] == PCA9554_PIN_LOW) {
+s->regs[PCA9554_INPUT] &= ~bit_mask;
+} else {
+s->regs[PCA9554_INPUT] |=  bit_mask;
+}
+break;
+default:
+break;
+}
+
+/* update irq state only if pin state changed */
+new_value = s->regs[PCA9554_INPUT] & bit_mask;
+if (new_value != old_value) {
+if (new_value) {
+/* changed from 0 to 1 */
+qemu_set_irq(s->gpio_out[i], 1);
+} else {
+/* changed from 1 to 0 */
+qemu_set_irq(s->gpio_out[i], 0);
+}
+}
+}
+}
+
+static uint8_t pca9554_read(PCA9554State *s, uint8_t reg)
+{
+switch (reg) {
+case PCA9554_INPUT:
+return s->regs[PCA9554_INPUT] ^ s->regs[PCA9554_POLARITY];
+case PCA9554_OUTPUT:
+case PCA9554_POLARITY:
+case PCA9554_CONFIG:
+return s->regs[reg];
+default:
+qemu_log_mask(LOG_GUEST_ERROR, "%s: unexpected read to register %d\n",
+  __func__, reg);
+return 0xFF;
+}
+}
+
+static void pca9554_write(PCA9554State *s, uint8_t reg, uint8_t data)
+{
+switch (reg) {
+case PCA9554_OUTPUT:
+case PCA9554_CONFIG:
+s->regs[reg] = data;
+pca9554_update_pin_input(s);
+break;
+case PCA9554_POLARITY:
+s->regs[reg] = data;
+break;
+case PCA9554_INPUT:
+default:
+qemu_log_mask(LOG_GUEST_ERROR, "%s: unexpected write to register %d\n",
+  __func__, reg);
+}
+}
+
+static uint8_t pca9554_recv(I2CSlave *i2c)
+{
+PCA9554State *s = PCA9554(i2c);
+uint8_t ret;
+
+ret = pca9554_read(s, s->pointer & 0x3);
+
+return ret;
+}
+
+static int pca9554_send(I2CSlav

[PULL 41/49] ppc/pnv: Implement the ChipTOD to Core transfer

2024-02-19 Thread Nicholas Piggin

One of the functions of the ChipTOD is to transfer TOD to the Core
(aka PC - Pervasive Core) timebase facility.

The ChipTOD can be programmed with a target address to send the TOD
value to. The hardware implementation seems to perform this by
sending the TOD value to a SCOM address.

This implementation grabs the core directly and manipulates the
timebase facility state in the core. This is a hack, but it works
enough for now. A better implementation would implement the transfer
to the PnvCore xscom register and drive the timebase state machine
from there.
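
The target selection can be sketched as follows, based on the SCOM
addressing branch in the diff: bit 35 (IBM bit numbering) selects SCOM
addressing, the upper 32 bits hold the address, and its low 12 bits
select the register. The TxTarget type, the function name and the
IBM_BIT macro are hypothetical stand-ins for this example, not QEMU
definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of 64. */
#define IBM_BIT(n) (0x8000000000000000ULL >> (n))

typedef struct {
    bool     scom_addressing;  /* true when bit 35 is set */
    uint32_t core_base;        /* SCOM base address of the target core */
    uint32_t reg;              /* register offset within that core */
} TxTarget;

/* Split a TX TTYPE control value into core base and register offset. */
static TxTarget decode_tx_target(uint64_t val)
{
    TxTarget t = {0};
    uint32_t addr = (uint32_t)(val >> 32);

    t.scom_addressing = (val & IBM_BIT(35)) != 0;
    if (t.scom_addressing) {
        t.reg = addr & 0xfff;
        t.core_base = addr & ~0xfffu;
    }
    return t;
}
```

In the patch, the register offset is then compared against PC_TOD and
the core is looked up by its xscom base.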

Reviewed-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/pnv.c |  15 
 hw/ppc/pnv_chiptod.c | 132 +++
 include/hw/ppc/pnv.h |   2 +
 include/hw/ppc/pnv_chiptod.h |   4 ++
 target/ppc/cpu.h |  13 
 5 files changed, 166 insertions(+)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 8beddb1313..0b47b92baa 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -2121,6 +2121,21 @@ static void pnv_chip_class_init(ObjectClass *klass, void *data)
 dc->desc = "PowerNV Chip";
 }
 
+PnvCore *pnv_chip_find_core(PnvChip *chip, uint32_t core_id)
+{
+int i;
+
+for (i = 0; i < chip->nr_cores; i++) {
+PnvCore *pc = chip->cores[i];
+CPUCore *cc = CPU_CORE(pc);
+
+if (cc->core_id == core_id) {
+return pc;
+}
+}
+return NULL;
+}
+
 PowerPCCPU *pnv_chip_find_cpu(PnvChip *chip, uint32_t pir)
 {
 int i, j;
diff --git a/hw/ppc/pnv_chiptod.c b/hw/ppc/pnv_chiptod.c
index 6ac3eac9d0..3831a72101 100644
--- a/hw/ppc/pnv_chiptod.c
+++ b/hw/ppc/pnv_chiptod.c
@@ -210,6 +210,79 @@ static void chiptod_power10_broadcast_ttype(PnvChipTOD *sender,
 }
 }
 
+static PnvCore *pnv_chip_get_core_by_xscom_base(PnvChip *chip,
+uint32_t xscom_base)
+{
+PnvChipClass *pcc = PNV_CHIP_GET_CLASS(chip);
+int i;
+
+for (i = 0; i < chip->nr_cores; i++) {
+PnvCore *pc = chip->cores[i];
+CPUCore *cc = CPU_CORE(pc);
+int core_hwid = cc->core_id;
+
+if (pcc->xscom_core_base(chip, core_hwid) == xscom_base) {
+return pc;
+}
+}
+return NULL;
+}
+
+static PnvCore *chiptod_power9_tx_ttype_target(PnvChipTOD *chiptod,
+   uint64_t val)
+{
+/*
+ * skiboot uses Core ID for P9, though SCOM should work too.
+ */
+if (val & PPC_BIT(35)) { /* SCOM addressing */
+uint32_t addr = val >> 32;
+uint32_t reg = addr & 0xfff;
+
+if (reg != PC_TOD) {
+qemu_log_mask(LOG_GUEST_ERROR, "pnv_chiptod: SCOM addressing: "
+  "unimplemented slave register 0x%" PRIx32 "\n", reg);
+return NULL;
+}
+
+return pnv_chip_get_core_by_xscom_base(chiptod->chip, addr & ~0xfff);
+
+} else { /* Core ID addressing */
+uint32_t core_id = GETFIELD(TOD_TX_TTYPE_PIB_SLAVE_ADDR, val) & 0x1f;
+return pnv_chip_find_core(chiptod->chip, core_id);
+}
+}
+
+static PnvCore *chiptod_power10_tx_ttype_target(PnvChipTOD *chiptod,
+   uint64_t val)
+{
+/*
+ * skiboot uses SCOM for P10 because Core ID was unable to be made to
+ * work correctly. For this reason only SCOM addressing is implemented.
+ */
+if (val & PPC_BIT(35)) { /* SCOM addressing */
+uint32_t addr = val >> 32;
+uint32_t reg = addr & 0xfff;
+
+if (reg != PC_TOD) {
+qemu_log_mask(LOG_GUEST_ERROR, "pnv_chiptod: SCOM addressing: "
+  "unimplemented slave register 0x%" PRIx32 "\n", reg);
+return NULL;
+}
+
+/*
+ * This may not deal with P10 big-core addressing at the moment.
+ * The big-core code in skiboot syncs small cores, but it targets
+ * the even PIR (first small-core) when syncing second small-core.
+ */
+return pnv_chip_get_core_by_xscom_base(chiptod->chip, addr & ~0xfff);
+
+} else { /* Core ID addressing */
+qemu_log_mask(LOG_UNIMP, "pnv_chiptod: TX TTYPE Core ID "
+  "addressing is not implemented for POWER10\n");
+return NULL;
+}
+}
+
 static void pnv_chiptod_xscom_write(void *opaque, hwaddr addr,
 uint64_t val, unsigned size)
 {
@@ -231,6 +304,22 @@ static void pnv_chiptod_xscom_write(void *opaque, hwaddr addr,
 chiptod->pss_mss_ctrl_reg = val & PPC_BITMASK(0, 31);
 break;
 
+case TOD_TX_TTYPE_CTRL_REG:
+/*
+ * This register sets the target of the TOD value transfer initiated
+ * by TOD_MOVE_TOD_TO_TB. The TOD is able to send the address to
+ * any target register, though in practice only the PC TOD register
+ * should be used. ChipTOD has a "SCOM addressing" mode which fully
+ * specifies the SCOM address, an

[PULL 42/49] target/ppc: Implement core timebase state machine and TFMR

2024-02-19 Thread Nicholas Piggin

This implements the core timebase state machine, which is the core side
of the time-of-day system in POWER processors. This facility is operated
by control fields in the TFMR register, which also contains status
fields.

The core timebase interacts with the chiptod hardware, primarily to
receive TOD updates, to synchronise timebase with other cores. This
model does not actually update TB values with TOD or updates received
from the chiptod, as timebases are always synchronised. It does step
through the states required to perform the update.

There are several asynchronous state transitions. These are modelled
using mfTFMR to drive state changes, because it is expected that
firmware polls the register to wait for those states. This is good enough
to test basic firmware behaviour without adding real timers. The values
chosen are arbitrary.
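
A minimal sketch of the poll-driven approach, assuming a single
asynchronous transition (SYNC_WAIT to GET_TOD) and an arbitrary poll
count. The state values come from the TBST enum in the diff; the TbModel
type, the function names and POLLS_BEFORE_TRANSITION are hypothetical
and do not reflect the actual helper code.

```c
/* State encodings as listed in the TBST enum of this patch. */
enum {
    TBST_RESET        = 0x0,
    TBST_SEND_TOD_MOD = 0x1,
    TBST_NOT_SET      = 0x2,
    TBST_SYNC_WAIT    = 0x6,
    TBST_GET_TOD      = 0x7,
    TBST_TB_RUNNING   = 0x8,
    TBST_TB_ERROR     = 0x9,
};

#define POLLS_BEFORE_TRANSITION 3  /* arbitrary, for illustration */

typedef struct {
    int state;
    int timer;  /* remaining polls before the async event fires */
} TbModel;

/* Firmware requests a move of the ChipTOD value into the timebase. */
static void tb_request_move_tod(TbModel *tb)
{
    if (tb->state == TBST_NOT_SET) {
        tb->state = TBST_SYNC_WAIT;
        tb->timer = POLLS_BEFORE_TRANSITION;
    }
}

/* Called on each simulated status read; returns the current state. */
static int tb_poll(TbModel *tb)
{
    if (tb->state == TBST_SYNC_WAIT && tb->timer > 0 && --tb->timer == 0) {
        tb->state = TBST_GET_TOD;  /* the "sync pulse" has arrived */
    }
    return tb->state;
}
```

Because the countdown only advances on reads, firmware that never polls
never sees the transition, which is the property the commit relies on.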

Acked-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 target/ppc/cpu.h |  36 ++
 target/ppc/timebase_helper.c | 210 ++-
 2 files changed, 243 insertions(+), 3 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 0e932838aa..ec14574d14 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1188,6 +1188,14 @@ DEXCR_ASPECT(PHIE, 6)
 struct pnv_tod_tbst {
 int tb_ready_for_tod; /* core TB ready to receive TOD from chiptod */
 int tod_sent_to_tb;   /* chiptod sent TOD to the core TB */
+
+/*
+ * "Timers" for async TBST events are simulated by mfTFAC because TFAC
+ * is polled for such events. These are just used to ensure firmware
+ * performs the polling at least a few times.
+ */
+int tb_state_timer;
+int tb_sync_pulse_timer;
 };
 
 /*/
@@ -2661,6 +2669,34 @@ enum {
 HMER_XSCOM_STATUS_MASK  = PPC_BITMASK(21, 23),
 };
 
+/* TFMR */
+enum {
+TFMR_CONTROL_MASK   = PPC_BITMASK(0, 24),
+TFMR_MASK_HMI   = PPC_BIT(10),
+TFMR_TB_ECLIPZ  = PPC_BIT(14),
+TFMR_LOAD_TOD_MOD   = PPC_BIT(16),
+TFMR_MOVE_CHIP_TOD_TO_TB= PPC_BIT(18),
+TFMR_CLEAR_TB_ERRORS= PPC_BIT(24),
+TFMR_STATUS_MASK= PPC_BITMASK(25, 63),
+TFMR_TBST_ENCODED   = PPC_BITMASK(28, 31), /* TBST = TB State */
+TFMR_TBST_LAST  = PPC_BITMASK(32, 35), /* Previous TBST */
+TFMR_TB_ENABLED = PPC_BIT(40),
+TFMR_TB_VALID   = PPC_BIT(41),
+TFMR_TB_SYNC_OCCURED= PPC_BIT(42),
+TFMR_FIRMWARE_CONTROL_ERROR = PPC_BIT(46),
+};
+
+/* TFMR TBST (Time Base State Machine). */
+enum {
+TBST_RESET  = 0x0,
+TBST_SEND_TOD_MOD   = 0x1,
+TBST_NOT_SET= 0x2,
+TBST_SYNC_WAIT  = 0x6,
+TBST_GET_TOD= 0x7,
+TBST_TB_RUNNING = 0x8,
+TBST_TB_ERROR   = 0x9,
+};
+
 /*/
 
 #define is_isa300(ctx) (!!(ctx->insns_flags2 & PPC2_ISA300))
diff --git a/target/ppc/timebase_helper.c b/target/ppc/timebase_helper.c
index f618ed2922..b8b9afe0b6 100644
--- a/target/ppc/timebase_helper.c
+++ b/target/ppc/timebase_helper.c
@@ -18,6 +18,7 @@
  */
 #include "qemu/osdep.h"
 #include "cpu.h"
+#include "hw/ppc/ppc.h"
 #include "exec/helper-proto.h"
 #include "exec/exec-all.h"
 #include "qemu/log.h"
@@ -145,15 +146,218 @@ void helper_store_booke_tsr(CPUPPCState *env, target_ulong val)
 }
 
 #if defined(TARGET_PPC64)
-/* POWER processor Timebase Facility */
+/*
+ * POWER processor Timebase Facility
+ */
+
+/*
+ * The TBST is the timebase state machine, which is a per-core machine that
+ * is used to synchronize the core TB with the ChipTOD. States 3,4,5 are
+ * not used in POWER8/9/10.
+ *
+ * The state machine gets driven by writes to TFMR SPR from the core, and
+ * by signals from the ChipTOD. The state machine table for common
+ * transitions is as follows (according to hardware specs, not necessarily
+ * this implementation):
+ *
+ * | Cur| Event| New |
+ * ++--+-+
+ * | 0 RESET| TFMR |= LOAD_TOD_MOD | 1   |
+ * | 1 SEND_TOD_MOD | "immediate transition"   | 2   |
+ * | 2 NOT_SET  | mttbu/mttbu40/mttbl  | 2   |
+ * | 2 NOT_SET  | TFMR |= MOVE_CHIP_TOD_TO_TB  | 6   |
+ * | 6 SYNC_WAIT| "sync pulse from ChipTOD"| 7   |
+ * | 7 GET_TOD  | ChipTOD xscom MOVE_TOD_TO_TB_REG | 8   |
+ * | 8 TB_RUNNING   | mttbu/mttbu40| 8   |
+ * | 8 TB_RUNNING   | TFMR |= LOAD_TOD_MOD | 1   |
+ * | 8 TB_RUNNING   | mttbl| 9   |
+ * | 9 TB_ERROR | TFMR |= CLEAR_TB_ERRORS  | 0   |
+ *
+ * - LOAD_TOD_MOD will also move states 2,6 to state 1, omitted from table
+ *   because it's not a typical init flow.
+ *
+ * - The ERROR

[PULL 23/49] misc/pca9552: Fix inverted input status

2024-02-19 Thread Nicholas Piggin

From: Glenn Miles 

The pca9552 INPUT0 and INPUT1 registers are supposed to
hold the logical values of the LED pins.  A logical 0
should be seen in the INPUT0/1 registers for a pin when
its corresponding LSn bits are set to 0, which is also
the state needed for turning on an LED in a typical
usage scenario.  Existing code was doing the opposite
and setting INPUT0/1 bit to a 1 when the LSn bit was
set to 0, so this commit fixes that.
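
The corrected mapping can be sketched as below, assuming the usual
layout of two selector bits per pin and four pins per LS register. The
function name and the way two LS registers are combined into one INPUT
register are choices made for this example only; PWM selections are
left untouched, as in the model's TODO.

```c
#include <stdint.h>

#define LED_ON  0x0  /* pin driven low: INPUT bit reads 0 */
#define LED_OFF 0x1  /* pin Hi-Z with pullup: INPUT bit reads 1 */

/* Build one 8-bit INPUT register from two LS registers (4 pins each). */
static uint8_t pca9552_input_from_ls(uint8_t ls_lo, uint8_t ls_hi,
                                     uint8_t prev)
{
    uint16_t ls = (uint16_t)(ls_lo | (ls_hi << 8));
    uint8_t input = prev;

    for (int pin = 0; pin < 8; pin++) {
        unsigned sel = (ls >> (2 * pin)) & 0x3;

        if (sel == LED_ON) {
            input &= (uint8_t)~(1u << pin);
        } else if (sel == LED_OFF) {
            input |= (uint8_t)(1u << pin);
        }
        /* PWM selections leave the bit unchanged in this sketch. */
    }
    return input;
}
```

With every selector at 0x55 (all LEDs off) the result is 0xFF, and
turning on pin 0 (selector 0x54) gives 0xFE, in line with the updated
qtest expectations in this patch.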

Reviewed-by: Andrew Jeffery 
Signed-off-by: Glenn Miles 
Signed-off-by: Nicholas Piggin 
---
 hw/misc/pca9552.c  | 18 +-
 tests/qtest/pca9552-test.c |  6 +++---
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/hw/misc/pca9552.c b/hw/misc/pca9552.c
index 72b653463f..f00a149d61 100644
--- a/hw/misc/pca9552.c
+++ b/hw/misc/pca9552.c
@@ -36,7 +36,10 @@ typedef struct PCA955xClass PCA955xClass;
 
 DECLARE_CLASS_CHECKERS(PCA955xClass, PCA955X,
TYPE_PCA955X)
-
+/*
+ * Note:  The LED_ON and LED_OFF configuration values for the PCA955X
+ *chips are the reverse of the PCA953X family of chips.
+ */
 #define PCA9552_LED_ON   0x0
 #define PCA9552_LED_OFF  0x1
 #define PCA9552_LED_PWM0 0x2
@@ -112,13 +115,18 @@ static void pca955x_update_pin_input(PCA955xState *s)
 
 switch (config) {
 case PCA9552_LED_ON:
-qemu_set_irq(s->gpio[i], 1);
-s->regs[input_reg] |= 1 << input_shift;
-break;
-case PCA9552_LED_OFF:
+/* Pin is set to 0V to turn on LED */
 qemu_set_irq(s->gpio[i], 0);
 s->regs[input_reg] &= ~(1 << input_shift);
 break;
+case PCA9552_LED_OFF:
+/*
+ * Pin is set to Hi-Z to turn off LED and
+ * pullup sets it to a logical 1.
+ */
+qemu_set_irq(s->gpio[i], 1);
+s->regs[input_reg] |= 1 << input_shift;
+break;
 case PCA9552_LED_PWM0:
 case PCA9552_LED_PWM1:
 /* TODO */
diff --git a/tests/qtest/pca9552-test.c b/tests/qtest/pca9552-test.c
index d80ed93cd3..ccca2b3d91 100644
--- a/tests/qtest/pca9552-test.c
+++ b/tests/qtest/pca9552-test.c
@@ -60,7 +60,7 @@ static void send_and_receive(void *obj, void *data, 
QGuestAllocator *alloc)
 g_assert_cmphex(value, ==, 0x55);
 
 value = i2c_get8(i2cdev, PCA9552_INPUT0);
-g_assert_cmphex(value, ==, 0x0);
+g_assert_cmphex(value, ==, 0xFF);
 
 pca9552_init(i2cdev);
 
@@ -68,13 +68,13 @@ static void send_and_receive(void *obj, void *data, 
QGuestAllocator *alloc)
 g_assert_cmphex(value, ==, 0x54);
 
 value = i2c_get8(i2cdev, PCA9552_INPUT0);
-g_assert_cmphex(value, ==, 0x01);
+g_assert_cmphex(value, ==, 0xFE);
 
 value = i2c_get8(i2cdev, PCA9552_LS3);
 g_assert_cmphex(value, ==, 0x54);
 
 value = i2c_get8(i2cdev, PCA9552_INPUT1);
-g_assert_cmphex(value, ==, 0x10);
+g_assert_cmphex(value, ==, 0xEF);
 }
 
 static void pca9552_register_nodes(void)
-- 
2.42.0

[PULL 39/49] ppc/pnv: Add POWER9/10 chiptod model

2024-02-19 Thread Nicholas Piggin

The ChipTOD (for Time-Of-Day) is a chip pervasive facility in IBM POWER
(powernv) processors that keeps a time of day clock.

Of particular relevance to this model are the facilities that initialise
and start the time of day clock, and that synchronise that clock to
cores on the chip and to other chips. In this way, all cores on all
chips can synchronise their timebase (TB).

This model implements functionality sufficient to run the skiboot
chiptod synchronisation procedure (with the following core timebase
state machine implementation). It does not modify the TB in the cores
where the real hardware would, because the QEMU ppc timebase
implementation is always synchronised across all cores.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/meson.build   |   1 +
 hw/ppc/pnv_chiptod.c | 454 +++
 hw/ppc/trace-events  |   4 +
 include/hw/ppc/pnv_chiptod.h |  49 
 include/hw/ppc/pnv_xscom.h   |   9 +
 5 files changed, 517 insertions(+)
 create mode 100644 hw/ppc/pnv_chiptod.c
 create mode 100644 include/hw/ppc/pnv_chiptod.h

diff --git a/hw/ppc/meson.build b/hw/ppc/meson.build
index 196c87e3e0..c56bf0ac8a 100644
--- a/hw/ppc/meson.build
+++ b/hw/ppc/meson.build
@@ -48,6 +48,7 @@ ppc_ss.add(when: 'CONFIG_POWERNV', if_true: files(
   'pnv_i2c.c',
   'pnv_lpc.c',
   'pnv_psi.c',
+  'pnv_chiptod.c',
   'pnv_occ.c',
   'pnv_sbe.c',
   'pnv_bmc.c',
diff --git a/hw/ppc/pnv_chiptod.c b/hw/ppc/pnv_chiptod.c
new file mode 100644
index 00..6ac3eac9d0
--- /dev/null
+++ b/hw/ppc/pnv_chiptod.c
@@ -0,0 +1,454 @@
+/*
+ * QEMU PowerPC PowerNV Emulation of some ChipTOD behaviour
+ *
+ * Copyright (c) 2022-2023, IBM Corporation.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * ChipTOD (aka TOD) is a facility implemented in the nest / pervasive. The
+ * purpose is to keep time-of-day across chips and cores.
+ *
+ * There is a master chip TOD, which sends signals to slave chip TODs to
+ * keep them synchronized. There are two sets of configuration registers
+ * called primary and secondary, which can be used to fail over.
+ *
+ * The chip TOD also distributes synchronisation signals to the timebase
+ * facility in each of the cores on the chip. In particular there is a
+ * feature that can move the TOD value in the ChipTOD to and from the TB.
+ *
+ * Initialisation typically brings all ChipTOD into sync (see tod_state),
+ * and then brings each core TB into sync with the ChipTODs (see timebase
+ * state and TFMR). This model is a very basic simulation of the init sequence
+ * performed by skiboot.
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/reset.h"
+#include "target/ppc/cpu.h"
+#include "qapi/error.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "hw/irq.h"
+#include "hw/qdev-properties.h"
+#include "hw/ppc/fdt.h"
+#include "hw/ppc/ppc.h"
+#include "hw/ppc/pnv.h"
+#include "hw/ppc/pnv_chip.h"
+#include "hw/ppc/pnv_core.h"
+#include "hw/ppc/pnv_xscom.h"
+#include "hw/ppc/pnv_chiptod.h"
+#include "trace.h"
+
+#include 
+
+/* TOD chip XSCOM addresses */
+#define TOD_M_PATH_CTRL_REG 0x /* Master Path ctrl reg */
+#define TOD_PRI_PORT_0_CTRL_REG 0x0001 /* Primary port0 ctrl reg */
+#define TOD_PRI_PORT_1_CTRL_REG 0x0002 /* Primary port1 ctrl reg */
+#define TOD_SEC_PORT_0_CTRL_REG 0x0003 /* Secondary p0 ctrl reg */
+#define TOD_SEC_PORT_1_CTRL_REG 0x0004 /* Secondary p1 ctrl reg */
+#define TOD_S_PATH_CTRL_REG 0x0005 /* Slave Path ctrl reg */
+#define TOD_I_PATH_CTRL_REG 0x0006 /* Internal Path ctrl reg */
+
+/* -- TOD primary/secondary master/slave control register -- */
+#define TOD_PSS_MSS_CTRL_REG    0x0007
+
+/* -- TOD primary/secondary master/slave status register -- */
+#define TOD_PSS_MSS_STATUS_REG  0x0008
+
+/* TOD chip XSCOM addresses */
+#define TOD_CHIP_CTRL_REG   0x0010 /* Chip control reg */
+
+#define TOD_TX_TTYPE_0_REG  0x0011
+#define TOD_TX_TTYPE_1_REG  0x0012 /* PSS switch reg */
+#define TOD_TX_TTYPE_2_REG  0x0013 /* Enable step checkers */
+#define TOD_TX_TTYPE_3_REG  0x0014 /* Request TOD reg */
+#define TOD_TX_TTYPE_4_REG  0x0015 /* Send TOD reg */
+#define TOD_TX_TTYPE_5_REG  0x0016 /* Invalidate TOD reg */
+
+#define TOD_MOVE_TOD_TO_TB_REG  0x0017
+#define TOD_LOAD_TOD_MOD_REG    0x0018
+#define TOD_LOAD_TOD_REG        0x0021
+#define TOD_START_TOD_REG       0x0022
+#define TOD_FSM_REG             0x0024
+
+#define TOD_TX_TTYPE_CTRL_REG   0x0027 /* TX TTYPE Control reg */
+#define   TOD_TX_TTYPE_PIB_SLAVE_ADDR  PPC_BITMASK(26, 31)
+
+/* -- TOD Error interrupt register -- */
+#define TOD_ERROR_REG   0x0030
+
+/* PC unit PIB address which receives the timebase transfer from TOD */
+#defin

[PULL 12/49] target/ppc: Rename registers to match ISA

2024-02-19 Thread Nicholas Piggin

Several registers have names that don't match the ISA (or convention
with other QEMU PPC registers), making them unintuitive to use with
GDB.

Fortunately most of these registers are obscure and/or have not been
correctly implemented in the gdb server (e.g., DEC, TB, CFAR), so risk
of breaking users should be low.

QEMU should follow the ISA for register name convention (where there is
no established GDB name).

Acked-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 target/ppc/cpu_init.c| 20 ++--
 target/ppc/helper_regs.c |  2 +-
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 9931372a08..9bccddb350 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -5062,7 +5062,7 @@ static void register_970_hid_sprs(CPUPPCState *env)
 
 static void register_970_hior_sprs(CPUPPCState *env)
 {
-spr_register(env, SPR_HIOR, "SPR_HIOR",
+spr_register(env, SPR_HIOR, "HIOR",
  SPR_NOACCESS, SPR_NOACCESS,
  &spr_read_hior, &spr_write_hior,
  0x);
@@ -5070,11 +5070,11 @@ static void register_970_hior_sprs(CPUPPCState *env)
 
 static void register_book3s_ctrl_sprs(CPUPPCState *env)
 {
-spr_register(env, SPR_CTRL, "SPR_CTRL",
+spr_register(env, SPR_CTRL, "CTRL",
  SPR_NOACCESS, SPR_NOACCESS,
  SPR_NOACCESS, &spr_write_CTRL,
  0x);
-spr_register(env, SPR_UCTRL, "SPR_UCTRL",
+spr_register(env, SPR_UCTRL, "UCTRL",
  &spr_read_ureg, SPR_NOACCESS,
  &spr_read_ureg, SPR_NOACCESS,
  0x);
@@ -5465,7 +5465,7 @@ static void register_book3s_purr_sprs(CPUPPCState *env)
 static void register_power6_dbg_sprs(CPUPPCState *env)
 {
 #if !defined(CONFIG_USER_ONLY)
-spr_register(env, SPR_CFAR, "SPR_CFAR",
+spr_register(env, SPR_CFAR, "CFAR",
  SPR_NOACCESS, SPR_NOACCESS,
  &spr_read_cfar, &spr_write_cfar,
  0x);
@@ -5483,7 +5483,7 @@ static void register_power5p_common_sprs(CPUPPCState *env)
 static void register_power6_common_sprs(CPUPPCState *env)
 {
 #if !defined(CONFIG_USER_ONLY)
-spr_register_kvm(env, SPR_DSCR, "SPR_DSCR",
+spr_register_kvm(env, SPR_DSCR, "DSCR",
  SPR_NOACCESS, SPR_NOACCESS,
  &spr_read_generic, &spr_write_generic,
  KVM_REG_PPC_DSCR, 0x);
@@ -5695,7 +5695,7 @@ static void register_power8_book4_sprs(CPUPPCState *env)
  &spr_read_generic, &spr_write_generic,
  KVM_REG_PPC_ACOP, 0);
 /* PID is only in BookE in ISA v2.07 */
-spr_register_kvm(env, SPR_BOOKS_PID, "PID",
+spr_register_kvm(env, SPR_BOOKS_PID, "PIDR",
  SPR_NOACCESS, SPR_NOACCESS,
  &spr_read_generic, &spr_write_pidr,
  KVM_REG_PPC_PID, 0);
@@ -5716,7 +5716,7 @@ static void register_power7_book4_sprs(CPUPPCState *env)
  &spr_read_generic, &spr_write_generic,
  KVM_REG_PPC_ACOP, 0);
 /* PID is only in BookE in ISA v2.06 */
-spr_register_kvm(env, SPR_BOOKS_PID, "PID",
+spr_register_kvm(env, SPR_BOOKS_PID, "PIDR",
  SPR_NOACCESS, SPR_NOACCESS,
  &spr_read_generic, &spr_write_generic32,
  KVM_REG_PPC_PID, 0);
@@ -5750,7 +5750,7 @@ static void register_power9_mmu_sprs(CPUPPCState *env)
 &spr_read_generic, &spr_write_generic,
 0x);
 /* PID is part of the BookS ISA from v3.0 */
-spr_register_kvm(env, SPR_BOOKS_PID, "PID",
+spr_register_kvm(env, SPR_BOOKS_PID, "PIDR",
  SPR_NOACCESS, SPR_NOACCESS,
  &spr_read_generic, &spr_write_pidr,
  KVM_REG_PPC_PID, 0);
@@ -5791,7 +5791,7 @@ static void register_power10_dexcr_sprs(CPUPPCState *env)
 &spr_read_generic, &spr_write_generic32,
 0);
 
-spr_register(env, SPR_UDEXCR, "DEXCR",
+spr_register(env, SPR_UDEXCR, "UDEXCR",
 &spr_read_dexcr_ureg, SPR_NOACCESS,
 &spr_read_dexcr_ureg, SPR_NOACCESS,
 0);
@@ -5802,7 +5802,7 @@ static void register_power10_dexcr_sprs(CPUPPCState *env)
 &spr_read_generic, &spr_write_generic32,
 0);
 
-spr_register(env, SPR_UHDEXCR, "HDEXCR",
+spr_register(env, SPR_UHDEXCR, "UHDEXCR",
 &spr_read_dexcr_ureg, SPR_NOACCESS,
 &spr_read_dexcr_ureg, SPR_NOACCESS,
 0);
diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c
index e0b2dcd02e..8324ff22db 100644
--- a/target/ppc/helper_regs.c
+++ b/target/ppc/helper_regs.c
@@ -490,7 +490,7 @@ void register_non_embedded_sprs(CPUPPCState *env)
  &spr_read_generic, &spr_write_generic,
  KVM_REG_PPC_DAR, 0x000

[PULL 49/49] target/ppc: optimise ppcemb_tlb_t flushing

2024-02-19 Thread Nicholas Piggin

Filter TLB flushing by PID and mmuidx.

Zoltan reports that, together with the previous TLB flush changes,
performance of a sam460ex machine running 'lame' to convert a wav to
mp3 is improved nearly 10%:

          CPU time    TLB partial flushes  TLB elided flushes
Before    37s         508238               7680722
After     34s         73                   1143
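
The mmuidx filtering in this patch can be sketched outside QEMU as follows. This is a minimal sketch that mirrors the bit tests in the new ppcemb_tlb_flush(); `sketch_mmuidx_map` is a hypothetical name, and the reading of the two `prot` nibbles as two privilege levels and of `attr` bit 0 as the address space is an assumption drawn from the surrounding patches, not something this message states.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build the bitmap of MMU indexes that can hold
 * cached translations for a software TLB entry with these prot/attr bits.
 */
static unsigned sketch_mmuidx_map(uint32_t prot, uint32_t attr)
{
    unsigned map = 0;

    if (prot & 0xf) {
        map |= 0x1;             /* low nibble grants some access */
    }
    if ((prot >> 4) & 0xf) {
        map |= 0x2;             /* next nibble grants some access */
    }
    if (attr & 1) {
        map <<= 2;              /* attr bit 0 set: use the upper index pair */
    }
    return map;
}
```

An entry that grants no access in either nibble produces an empty bitmap, so in this sketch there would be nothing to flush for it at all.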

Tested-by: BALATON Zoltan 
Acked-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 target/ppc/mmu_helper.c | 43 +++--
 1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/target/ppc/mmu_helper.c b/target/ppc/mmu_helper.c
index ba965f1779..c071b4d5e2 100644
--- a/target/ppc/mmu_helper.c
+++ b/target/ppc/mmu_helper.c
@@ -751,11 +751,20 @@ target_ulong helper_4xx_tlbre_lo(CPUPPCState *env, 
target_ulong entry)
 
 static void ppcemb_tlb_flush(CPUState *cs, ppcemb_tlb_t *tlb)
 {
-target_ulong ea;
+unsigned mmu_idx = 0;
 
-for (ea = tlb->EPN; ea < tlb->EPN + tlb->size; ea += TARGET_PAGE_SIZE) {
-tlb_flush_page(cs, ea);
+if (tlb->prot & 0xf) {
+mmu_idx |= 0x1;
 }
+if ((tlb->prot >> 4) & 0xf) {
+mmu_idx |= 0x2;
+}
+if (tlb->attr & 1) {
+mmu_idx <<= 2;
+}
+
+tlb_flush_range_by_mmuidx(cs, tlb->EPN, tlb->size, mmu_idx,
+  TARGET_LONG_BITS);
 }
 
 void helper_4xx_tlbwe_hi(CPUPPCState *env, target_ulong entry,
@@ -770,7 +779,7 @@ void helper_4xx_tlbwe_hi(CPUPPCState *env, target_ulong 
entry,
 entry &= PPC4XX_TLB_ENTRY_MASK;
 tlb = &env->tlb.tlbe[entry];
 /* Invalidate previous TLB (if it's valid) */
-if (tlb->prot & PAGE_VALID) {
+if ((tlb->prot & PAGE_VALID) && tlb->PID == env->spr[SPR_40x_PID]) {
 qemu_log_mask(CPU_LOG_MMU, "%s: invalidate old TLB %d start "
   TARGET_FMT_lx " end " TARGET_FMT_lx "\n", __func__,
   (int)entry, tlb->EPN, tlb->EPN + tlb->size);
@@ -821,7 +830,7 @@ void helper_4xx_tlbwe_lo(CPUPPCState *env, target_ulong 
entry,
 entry &= PPC4XX_TLB_ENTRY_MASK;
 tlb = &env->tlb.tlbe[entry];
 /* Invalidate previous TLB (if it's valid) */
-if (tlb->prot & PAGE_VALID) {
+if ((tlb->prot & PAGE_VALID) && tlb->PID == env->spr[SPR_40x_PID]) {
 qemu_log_mask(CPU_LOG_MMU, "%s: invalidate old TLB %d start "
   TARGET_FMT_lx " end " TARGET_FMT_lx "\n", __func__,
   (int)entry, tlb->EPN, tlb->EPN + tlb->size);
@@ -851,6 +860,25 @@ target_ulong helper_4xx_tlbsx(CPUPPCState *env, 
target_ulong address)
 return ppcemb_tlb_search(env, address, env->spr[SPR_40x_PID]);
 }
 
+static bool mmubooke_pid_match(CPUPPCState *env, ppcemb_tlb_t *tlb)
+{
+if (tlb->PID == env->spr[SPR_BOOKE_PID]) {
+return true;
+}
+if (!env->nb_pids) {
+return false;
+}
+
+if (env->spr[SPR_BOOKE_PID1] && tlb->PID == env->spr[SPR_BOOKE_PID1]) {
+return true;
+}
+if (env->spr[SPR_BOOKE_PID2] && tlb->PID == env->spr[SPR_BOOKE_PID2]) {
+return true;
+}
+
+return false;
+}
+
 /* PowerPC 440 TLB management */
 void helper_440_tlbwe(CPUPPCState *env, uint32_t word, target_ulong entry,
   target_ulong value)
@@ -863,7 +891,10 @@ void helper_440_tlbwe(CPUPPCState *env, uint32_t word, 
target_ulong entry,
 tlb = &env->tlb.tlbe[entry];
 
 /* Invalidate previous TLB (if it's valid) */
-if (tlb->prot & PAGE_VALID) {
+if ((tlb->prot & PAGE_VALID) && mmubooke_pid_match(env, tlb)) {
+qemu_log_mask(CPU_LOG_MMU, "%s: invalidate old TLB %d start "
+  TARGET_FMT_lx " end " TARGET_FMT_lx "\n", __func__,
+  (int)entry, tlb->EPN, tlb->EPN + tlb->size);
 ppcemb_tlb_flush(env_cpu(env), tlb);
 }
 
-- 
2.42.0

[PULL 40/49] ppc/pnv: Wire ChipTOD model to powernv9 and powernv10 machines

2024-02-19 Thread Nicholas Piggin

Wire the ChipTOD model to powernv9 and powernv10 machines.

Suggested-by: Cédric Le Goater 
Reviewed-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/pnv.c  | 30 ++
 include/hw/ppc/pnv_chip.h |  3 +++
 2 files changed, 33 insertions(+)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index acc4db00c1..8beddb1313 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -1427,6 +1427,8 @@ static void pnv_chip_power9_instance_init(Object *obj)
 
 object_initialize_child(obj, "lpc", &chip9->lpc, TYPE_PNV9_LPC);
 
+object_initialize_child(obj, "chiptod", &chip9->chiptod, 
TYPE_PNV9_CHIPTOD);
+
 object_initialize_child(obj, "occ", &chip9->occ, TYPE_PNV9_OCC);
 
 object_initialize_child(obj, "sbe", &chip9->sbe, TYPE_PNV9_SBE);
@@ -1573,6 +1575,19 @@ static void pnv_chip_power9_realize(DeviceState *dev, 
Error **errp)
 chip->dt_isa_nodename = g_strdup_printf("/lpcm-opb@%" PRIx64 "/lpc@0",
 (uint64_t) PNV9_LPCM_BASE(chip));
 
+/* ChipTOD */
+object_property_set_bool(OBJECT(&chip9->chiptod), "primary",
+ chip->chip_id == 0, &error_abort);
+object_property_set_bool(OBJECT(&chip9->chiptod), "secondary",
+ chip->chip_id == 1, &error_abort);
+object_property_set_link(OBJECT(&chip9->chiptod), "chip", OBJECT(chip),
+ &error_abort);
+if (!qdev_realize(DEVICE(&chip9->chiptod), NULL, errp)) {
+return;
+}
+pnv_xscom_add_subregion(chip, PNV9_XSCOM_CHIPTOD_BASE,
+&chip9->chiptod.xscom_regs);
+
 /* Create the simplified OCC model */
 if (!qdev_realize(DEVICE(&chip9->occ), NULL, errp)) {
 return;
@@ -1685,6 +1700,8 @@ static void pnv_chip_power10_instance_init(Object *obj)
   "xive-fabric");
 object_initialize_child(obj, "psi", &chip10->psi, TYPE_PNV10_PSI);
 object_initialize_child(obj, "lpc", &chip10->lpc, TYPE_PNV10_LPC);
+object_initialize_child(obj, "chiptod", &chip10->chiptod,
+TYPE_PNV10_CHIPTOD);
 object_initialize_child(obj, "occ",  &chip10->occ, TYPE_PNV10_OCC);
 object_initialize_child(obj, "sbe",  &chip10->sbe, TYPE_PNV10_SBE);
 object_initialize_child(obj, "homer", &chip10->homer, TYPE_PNV10_HOMER);
@@ -1820,6 +1837,19 @@ static void pnv_chip_power10_realize(DeviceState *dev, 
Error **errp)
 chip->dt_isa_nodename = g_strdup_printf("/lpcm-opb@%" PRIx64 "/lpc@0",
 (uint64_t) PNV10_LPCM_BASE(chip));
 
+/* ChipTOD */
+object_property_set_bool(OBJECT(&chip10->chiptod), "primary",
+ chip->chip_id == 0, &error_abort);
+object_property_set_bool(OBJECT(&chip10->chiptod), "secondary",
+ chip->chip_id == 1, &error_abort);
+object_property_set_link(OBJECT(&chip10->chiptod), "chip", OBJECT(chip),
+ &error_abort);
+if (!qdev_realize(DEVICE(&chip10->chiptod), NULL, errp)) {
+return;
+}
+pnv_xscom_add_subregion(chip, PNV10_XSCOM_CHIPTOD_BASE,
+&chip10->chiptod.xscom_regs);
+
 /* Create the simplified OCC model */
 if (!qdev_realize(DEVICE(&chip10->occ), NULL, errp)) {
 return;
diff --git a/include/hw/ppc/pnv_chip.h b/include/hw/ppc/pnv_chip.h
index 9b06c8d87c..af4cd7a8b8 100644
--- a/include/hw/ppc/pnv_chip.h
+++ b/include/hw/ppc/pnv_chip.h
@@ -2,6 +2,7 @@
 #define PPC_PNV_CHIP_H
 
 #include "hw/pci-host/pnv_phb4.h"
+#include "hw/ppc/pnv_chiptod.h"
 #include "hw/ppc/pnv_core.h"
 #include "hw/ppc/pnv_homer.h"
 #include "hw/ppc/pnv_n1_chiplet.h"
@@ -79,6 +80,7 @@ struct Pnv9Chip {
 PnvXive  xive;
 Pnv9Psi  psi;
 PnvLpcController lpc;
+PnvChipTOD   chiptod;
 PnvOCC   occ;
 PnvSBE   sbe;
 PnvHomer homer;
@@ -111,6 +113,7 @@ struct Pnv10Chip {
 PnvXive2 xive;
 Pnv9Psi  psi;
 PnvLpcController lpc;
+PnvChipTOD   chiptod;
 PnvOCC   occ;
 PnvSBE   sbe;
 PnvHomer homer;
-- 
2.42.0

[PULL 45/49] target/ppc: Factor out 4xx ppcemb_tlb_t flushing

2024-02-19 Thread Nicholas Piggin

Flushing the TCG TLB pages that cache a software TLB is a common
operation, so factor it into its own function.
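
As a rough illustration of the factored-out loop, the sketch below walks an entry's range page by page and calls a flush callback for each page. The names (`sketch_tlb_flush`, `sketch_pages_flushed`) and the 4 KiB page size are assumptions for the example; the real helper calls tlb_flush_page() with TARGET_PAGE_SIZE.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x1000u    /* assumed page size for the sketch */

typedef void (*sketch_flush_fn)(uint64_t ea, void *opaque);

/* Flush every page covered by [epn, epn + size), one callback per page. */
static void sketch_tlb_flush(uint64_t epn, uint64_t size,
                             sketch_flush_fn flush_page, void *opaque)
{
    for (uint64_t ea = epn; ea < epn + size; ea += SKETCH_PAGE_SIZE) {
        flush_page(ea, opaque);
    }
}

/* Callback that only counts how many pages were flushed. */
static void sketch_count_page(uint64_t ea, void *opaque)
{
    (void)ea;
    ++*(unsigned *)opaque;
}

static unsigned sketch_pages_flushed(uint64_t epn, uint64_t size)
{
    unsigned n = 0;

    sketch_tlb_flush(epn, size, sketch_count_page, &n);
    return n;
}
```

A 16 KiB entry costs four page flushes in this form, which is the per-page cost that the later range-flush change in this series addresses.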

Tested-by: BALATON Zoltan 
Acked-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 target/ppc/mmu_helper.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/target/ppc/mmu_helper.c b/target/ppc/mmu_helper.c
index c140f3c96d..949ae87f4f 100644
--- a/target/ppc/mmu_helper.c
+++ b/target/ppc/mmu_helper.c
@@ -749,12 +749,20 @@ target_ulong helper_4xx_tlbre_lo(CPUPPCState *env, 
target_ulong entry)
 return ret;
 }
 
+static void ppcemb_tlb_flush(CPUState *cs, ppcemb_tlb_t *tlb)
+{
+target_ulong ea;
+
+for (ea = tlb->EPN; ea < tlb->EPN + tlb->size; ea += TARGET_PAGE_SIZE) {
+tlb_flush_page(cs, ea);
+}
+}
+
 void helper_4xx_tlbwe_hi(CPUPPCState *env, target_ulong entry,
  target_ulong val)
 {
 CPUState *cs = env_cpu(env);
 ppcemb_tlb_t *tlb;
-target_ulong page, end;
 
 qemu_log_mask(CPU_LOG_MMU, "%s entry %d val " TARGET_FMT_lx "\n",
   __func__, (int)entry,
@@ -763,13 +771,10 @@ void helper_4xx_tlbwe_hi(CPUPPCState *env, target_ulong 
entry,
 tlb = &env->tlb.tlbe[entry];
 /* Invalidate previous TLB (if it's valid) */
 if (tlb->prot & PAGE_VALID) {
-end = tlb->EPN + tlb->size;
 qemu_log_mask(CPU_LOG_MMU, "%s: invalidate old TLB %d start "
   TARGET_FMT_lx " end " TARGET_FMT_lx "\n", __func__,
-  (int)entry, tlb->EPN, end);
-for (page = tlb->EPN; page < end; page += TARGET_PAGE_SIZE) {
-tlb_flush_page(cs, page);
-}
+  (int)entry, tlb->EPN, tlb->EPN + tlb->size);
+ppcemb_tlb_flush(cs, tlb);
 }
 tlb->size = booke_tlb_to_page_size((val >> PPC4XX_TLBHI_SIZE_SHIFT)
& PPC4XX_TLBHI_SIZE_MASK);
@@ -805,13 +810,10 @@ void helper_4xx_tlbwe_hi(CPUPPCState *env, target_ulong 
entry,
   tlb->prot & PAGE_VALID ? 'v' : '-', (int)tlb->PID);
 /* Invalidate new TLB (if valid) */
 if (tlb->prot & PAGE_VALID) {
-end = tlb->EPN + tlb->size;
 qemu_log_mask(CPU_LOG_MMU, "%s: invalidate TLB %d start "
   TARGET_FMT_lx " end " TARGET_FMT_lx "\n", __func__,
-  (int)entry, tlb->EPN, end);
-for (page = tlb->EPN; page < end; page += TARGET_PAGE_SIZE) {
-tlb_flush_page(cs, page);
-}
+  (int)entry, tlb->EPN, tlb->EPN + tlb->size);
+ppcemb_tlb_flush(cs, tlb);
 }
 }
 
-- 
2.42.0

[PULL 32/49] hw/ppc: Add pnv nest pervasive common chiplet model

2024-02-19 Thread Nicholas Piggin

From: Chalapathi V 

A POWER10 chip is divided into logical units called chiplets. Chiplets
are broadly divided into "core chiplets" (with the processor cores) and
"nest chiplets" (with everything else). Each chiplet has an attachment
to the pervasive bus (PIB) and chiplet-specific registers. All nest
chiplets have a common basic set of registers, and this model provides
the functionality of those common registers for the nest chiplets
(Pervasive Chiplet, PB Chiplet, PCI Chiplets, MC Chiplet, PAU Chiplets).

This commit implements the read/write functions of the chiplet control
registers.
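
A rough sketch of how such a control register block might behave is given below. It assumes that each of the six CPLT_CTRL registers has a write-only alias at offset +0x10 that sets bits and one at +0x20 that clears bits; that layout is inferred from the read handler and from the CPLT_CONF0_OR / CPLT_CONF0_CLEAR naming, and all `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CTRL 6            /* CPLT_CTRL0 to CPLT_CTRL5 */

struct sketch_chiplet {
    uint64_t ctrl[SKETCH_NR_CTRL];
};

/* Reads of the base registers return the value; aliases read as all ones. */
static uint64_t sketch_ctrl_read(const struct sketch_chiplet *c, uint32_t reg)
{
    if (reg < SKETCH_NR_CTRL) {
        return c->ctrl[reg];
    }
    return ~0ull;                   /* write-only alias or unknown register */
}

/* Assumed semantics: base = store, +0x10 = OR bits in, +0x20 = clear bits. */
static void sketch_ctrl_write(struct sketch_chiplet *c, uint32_t reg,
                              uint64_t val)
{
    if (reg < SKETCH_NR_CTRL) {
        c->ctrl[reg] = val;
    } else if (reg >= 0x10 && reg < 0x10 + SKETCH_NR_CTRL) {
        c->ctrl[reg - 0x10] |= val;
    } else if (reg >= 0x20 && reg < 0x20 + SKETCH_NR_CTRL) {
        c->ctrl[reg - 0x20] &= ~val;
    }
}

/* Usage example: store, then set and clear bits through the aliases. */
static uint64_t sketch_set_then_clear(uint64_t initial, uint64_t set,
                                      uint64_t clear)
{
    struct sketch_chiplet c = { { 0 } };

    sketch_ctrl_write(&c, 0x00, initial);
    sketch_ctrl_write(&c, 0x10, set);
    sketch_ctrl_write(&c, 0x20, clear);
    return sketch_ctrl_read(&c, 0x00);
}
```

The alias scheme lets a caller change individual bits without a read-modify-write sequence on the bus, which is presumably why the aliases themselves are write-only.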

Signed-off-by: Chalapathi V 
Signed-off-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/meson.build  |   1 +
 hw/ppc/pnv_nest_pervasive.c | 208 
 include/hw/ppc/pnv_nest_pervasive.h |  32 +
 include/hw/ppc/pnv_xscom.h  |   3 +
 4 files changed, 244 insertions(+)
 create mode 100644 hw/ppc/pnv_nest_pervasive.c
 create mode 100644 include/hw/ppc/pnv_nest_pervasive.h

diff --git a/hw/ppc/meson.build b/hw/ppc/meson.build
index 30bd2aaccf..e46c9bcd7b 100644
--- a/hw/ppc/meson.build
+++ b/hw/ppc/meson.build
@@ -53,6 +53,7 @@ ppc_ss.add(when: 'CONFIG_POWERNV', if_true: files(
   'pnv_bmc.c',
   'pnv_homer.c',
   'pnv_pnor.c',
+  'pnv_nest_pervasive.c',
 ))
 # PowerPC 4xx boards
 ppc_ss.add(when: 'CONFIG_PPC405', if_true: files(
diff --git a/hw/ppc/pnv_nest_pervasive.c b/hw/ppc/pnv_nest_pervasive.c
new file mode 100644
index 00..77476753a4
--- /dev/null
+++ b/hw/ppc/pnv_nest_pervasive.c
@@ -0,0 +1,208 @@
+/*
+ * QEMU PowerPC nest pervasive common chiplet model
+ *
+ * Copyright (c) 2023, IBM Corporation.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/qdev-properties.h"
+#include "hw/ppc/pnv.h"
+#include "hw/ppc/pnv_xscom.h"
+#include "hw/ppc/pnv_nest_pervasive.h"
+
+/*
+ * Status, configuration, and control units in POWER chips is provided
+ * by the pervasive subsystem, which connects registers to the SCOM bus,
+ * which can be programmed by processor cores, other units on the chip,
+ * BMCs, or other POWER chips.
+ *
+ * A POWER10 chip is divided into logical units called chiplets. Chiplets
+ * are broadly divided into "core chiplets" (with the processor cores) and
+ * "nest chiplets" (with everything else). Each chiplet has an attachment
+ * to the pervasive bus (PIB) and with chiplet-specific registers.
+ * All nest chiplets have a common basic set of registers.
+ *
+ * This model will provide the registers functionality for common registers of
+ * nest unit (PB Chiplet, PCI Chiplets, MC Chiplet, PAU Chiplets)
+ *
+ * Currently this model provides the read/write functionality of the chiplet
+ * control scom registers.
+ */
+
+#define CPLT_CONF0               0x08
+#define CPLT_CONF0_OR            0x18
+#define CPLT_CONF0_CLEAR         0x28
+#define CPLT_CONF1               0x09
+#define CPLT_CONF1_OR            0x19
+#define CPLT_CONF1_CLEAR         0x29
+#define CPLT_STAT0               0x100
+#define CPLT_MASK0               0x101
+#define CPLT_PROTECT_MODE        0x3FE
+#define CPLT_ATOMIC_CLOCK        0x3FF
+
+static uint64_t pnv_chiplet_ctrl_read(void *opaque, hwaddr addr, unsigned size)
+{
+PnvNestChipletPervasive *nest_pervasive = PNV_NEST_CHIPLET_PERVASIVE(
+  opaque);
+uint32_t reg = addr >> 3;
+uint64_t val = ~0ull;
+
+/* CPLT_CTRL0 to CPLT_CTRL5 */
+for (int i = 0; i < PNV_CPLT_CTRL_SIZE; i++) {
+if (reg == i) {
+return nest_pervasive->control_regs.cplt_ctrl[i];
+} else if ((reg == (i + 0x10)) || (reg == (i + 0x20))) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Write only register, ignoring "
+   "xscom read at 0x%" PRIx32 "\n",
+   __func__, reg);
+return val;
+}
+}
+
+switch (reg) {
+case CPLT_CONF0:
+val = nest_pervasive->control_regs.cplt_cfg0;
+break;
+case CPLT_CONF0_OR:
+case CPLT_CONF0_CLEAR:
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Write only register, ignoring "
+   "xscom read at 0x%" PRIx32 "\n",
+   __func__, reg);
+break;
+case CPLT_CONF1:
+val = nest_pervasive->control_regs.cplt_cfg1;
+break;
+case CPLT_CONF1_OR:
+case CPLT_CONF1_CLEAR:
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Write only register, ignoring "
+   "xscom read at 0x%" PRIx32 "\n",
+   __func__, reg);
+break;
+case CPLT_STAT0:
+val = nest_pervasive->control_regs.cplt_stat0;
+break;
+case CPLT_MASK0:
+val = nest_pervasive->control_regs.cplt_mask0;
+break;
+case CPLT_PROTECT_MODE:
+val = nest_pervasive->control_regs.ctrl_prote

[PULL 43/49] target/ppc: Add SMT support to time facilities

2024-02-19 Thread Nicholas Piggin

The TB, VTB, PURR, HDEC SPRs are per-LPAR registers, and the TFMR is a
per-core register. Add the necessary SMT synchronisation and value
sharing.

The TFMR can only drive the timebase state machine via thread 0 of the
core, which is almost certainly not right, but it is enough for skiboot
and certain other proprietary firmware.
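
The sharing pattern used by the store helpers can be sketched as below. This is a simplified model rather than the QEMU implementation: `struct sketch_core`, its per-thread `tb` array and the `one_lpar` flag (standing in for POWERPC_FLAG_SMT_1LPAR) are assumptions made so the example is self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_THREADS 8

struct sketch_core {
    unsigned nr_threads;
    bool one_lpar;                      /* all threads share one LPAR */
    uint64_t tb[SKETCH_MAX_THREADS];    /* per-thread copy of the register */
};

/* Store to one thread only, or to every sibling when the value is shared. */
static void sketch_store_tb(struct sketch_core *core, unsigned thread,
                            uint64_t val)
{
    if (core->nr_threads == 1 || !core->one_lpar) {
        core->tb[thread] = val;
        return;
    }
    for (unsigned i = 0; i < core->nr_threads; i++) {
        core->tb[i] = val;
    }
}

/* Usage example: thread 0 stores, then the last sibling's copy is read. */
static uint64_t sketch_sibling_tb_after_store(unsigned nr_threads,
                                              bool one_lpar, uint64_t val)
{
    struct sketch_core core = { .nr_threads = nr_threads,
                                .one_lpar = one_lpar };

    sketch_store_tb(&core, 0, val);
    return core.tb[nr_threads - 1];
}
```

The early return keeps the single-thread and non-1LPAR cases on the original path, so only SMT cores in 1LPAR mode pay for the sibling walk.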

Acked-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 target/ppc/timebase_helper.c | 105 ---
 target/ppc/translate.c   |  42 +-
 2 files changed, 136 insertions(+), 11 deletions(-)

diff --git a/target/ppc/timebase_helper.c b/target/ppc/timebase_helper.c
index b8b9afe0b6..39d397416e 100644
--- a/target/ppc/timebase_helper.c
+++ b/target/ppc/timebase_helper.c
@@ -60,19 +60,55 @@ target_ulong helper_load_purr(CPUPPCState *env)
 
 void helper_store_purr(CPUPPCState *env, target_ulong val)
 {
-cpu_ppc_store_purr(env, val);
+CPUState *cs = env_cpu(env);
+CPUState *ccs;
+uint32_t nr_threads = cs->nr_threads;
+
+if (nr_threads == 1 || !(env->flags & POWERPC_FLAG_SMT_1LPAR)) {
+cpu_ppc_store_purr(env, val);
+return;
+}
+
+THREAD_SIBLING_FOREACH(cs, ccs) {
+CPUPPCState *cenv = &POWERPC_CPU(ccs)->env;
+cpu_ppc_store_purr(cenv, val);
+}
 }
 #endif
 
 #if !defined(CONFIG_USER_ONLY)
 void helper_store_tbl(CPUPPCState *env, target_ulong val)
 {
-cpu_ppc_store_tbl(env, val);
+CPUState *cs = env_cpu(env);
+CPUState *ccs;
+uint32_t nr_threads = cs->nr_threads;
+
+if (nr_threads == 1 || !(env->flags & POWERPC_FLAG_SMT_1LPAR)) {
+cpu_ppc_store_tbl(env, val);
+return;
+}
+
+THREAD_SIBLING_FOREACH(cs, ccs) {
+CPUPPCState *cenv = &POWERPC_CPU(ccs)->env;
+cpu_ppc_store_tbl(cenv, val);
+}
 }
 
 void helper_store_tbu(CPUPPCState *env, target_ulong val)
 {
-cpu_ppc_store_tbu(env, val);
+CPUState *cs = env_cpu(env);
+CPUState *ccs;
+uint32_t nr_threads = cs->nr_threads;
+
+if (nr_threads == 1 || !(env->flags & POWERPC_FLAG_SMT_1LPAR)) {
+cpu_ppc_store_tbu(env, val);
+return;
+}
+
+THREAD_SIBLING_FOREACH(cs, ccs) {
+CPUPPCState *cenv = &POWERPC_CPU(ccs)->env;
+cpu_ppc_store_tbu(cenv, val);
+}
 }
 
 void helper_store_atbl(CPUPPCState *env, target_ulong val)
@@ -102,17 +138,53 @@ target_ulong helper_load_hdecr(CPUPPCState *env)
 
 void helper_store_hdecr(CPUPPCState *env, target_ulong val)
 {
-cpu_ppc_store_hdecr(env, val);
+CPUState *cs = env_cpu(env);
+CPUState *ccs;
+uint32_t nr_threads = cs->nr_threads;
+
+if (nr_threads == 1 || !(env->flags & POWERPC_FLAG_SMT_1LPAR)) {
+cpu_ppc_store_hdecr(env, val);
+return;
+}
+
+THREAD_SIBLING_FOREACH(cs, ccs) {
+CPUPPCState *cenv = &POWERPC_CPU(ccs)->env;
+cpu_ppc_store_hdecr(cenv, val);
+}
 }
 
 void helper_store_vtb(CPUPPCState *env, target_ulong val)
 {
-cpu_ppc_store_vtb(env, val);
+CPUState *cs = env_cpu(env);
+CPUState *ccs;
+uint32_t nr_threads = cs->nr_threads;
+
+if (nr_threads == 1 || !(env->flags & POWERPC_FLAG_SMT_1LPAR)) {
+cpu_ppc_store_vtb(env, val);
+return;
+}
+
+THREAD_SIBLING_FOREACH(cs, ccs) {
+CPUPPCState *cenv = &POWERPC_CPU(ccs)->env;
+cpu_ppc_store_vtb(cenv, val);
+}
 }
 
 void helper_store_tbu40(CPUPPCState *env, target_ulong val)
 {
-cpu_ppc_store_tbu40(env, val);
+CPUState *cs = env_cpu(env);
+CPUState *ccs;
+uint32_t nr_threads = cs->nr_threads;
+
+if (nr_threads == 1 || !(env->flags & POWERPC_FLAG_SMT_1LPAR)) {
+cpu_ppc_store_tbu40(env, val);
+return;
+}
+
+THREAD_SIBLING_FOREACH(cs, ccs) {
+CPUPPCState *cenv = &POWERPC_CPU(ccs)->env;
+cpu_ppc_store_tbu40(cenv, val);
+}
 }
 
 target_ulong helper_load_40x_pit(CPUPPCState *env)
@@ -211,6 +283,21 @@ static uint64_t tfmr_new_tb_state(uint64_t tfmr, unsigned 
int tbst)
 return tfmr;
 }
 
+static void write_tfmr(CPUPPCState *env, target_ulong val)
+{
+CPUState *cs = env_cpu(env);
+
+if (cs->nr_threads == 1) {
+env->spr[SPR_TFMR] = val;
+} else {
+CPUState *ccs;
+THREAD_SIBLING_FOREACH(cs, ccs) {
+CPUPPCState *cenv = &POWERPC_CPU(ccs)->env;
+cenv->spr[SPR_TFMR] = val;
+}
+}
+}
+
 static void tb_state_machine_step(CPUPPCState *env)
 {
 uint64_t tfmr = env->spr[SPR_TFMR];
@@ -224,7 +311,7 @@ static void tb_state_machine_step(CPUPPCState *env)
 env->pnv_tod_tbst.tb_sync_pulse_timer--;
 } else {
 tfmr |= TFMR_TB_SYNC_OCCURED;
-env->spr[SPR_TFMR] = tfmr;
+write_tfmr(env, tfmr);
 }
 
 if (env->pnv_tod_tbst.tb_state_timer) {
@@ -262,7 +349,7 @@ static void tb_state_machine_step(CPUPPCState *env)
 }
 }
 
-env->spr[SPR_TFMR] = tfmr;
+write_tfmr(env, tfmr);
 }
 
 target_ulong helper_load_tfmr(CPUPPCState *env)
@@ -357,7 +444,7 @@

[PULL 44/49] target/ppc: Fix 440 tlbwe TLB invalidation gaps

2024-02-19 Thread Nicholas Piggin

The 440 tlbwe (write entry) instruction misses several cases that must
flush the TCG TLB:

- If the new size is smaller than the existing size, the EA no longer
  covered should be flushed. This looks like an inverted inequality
  test.
- If the TLB PID changes.
- If the TLB attr bit 0 (translation address space) changes.
- If low prot (access control) bits change.

Fix this by removing tricks to avoid TLB flushes, and just invalidate
the TLB if any valid entry is being changed, similarly to 4xx.
Optimisations will be introduced in subsequent changes.
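
To make the first gap concrete, the sketch below compares the removed word-0 flush test with the new one. The function names and scalar parameters are hypothetical; the old predicate reproduces the three conditions deleted by this patch, and the new one is simply "the entry being overwritten is valid".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old word-0 logic: flush on EPN change, on size growth, or on invalidate. */
static bool sketch_old_word0_flush(bool old_valid, uint32_t old_epn,
                                   uint32_t old_size, uint32_t new_epn,
                                   uint32_t new_size, bool new_valid)
{
    if (!old_valid) {
        return false;
    }
    return new_epn != old_epn || old_size < new_size || !new_valid;
}

/* New logic: any change to a valid entry invalidates the TCG TLB. */
static bool sketch_new_flush(bool old_valid)
{
    return old_valid;
}
```

Shrinking a valid 16 KiB entry to 4 KiB at the same EPN is not flushed by the old test, although three pages are no longer covered by the entry; the new test flushes it.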

Tested-by: BALATON Zoltan 
Acked-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 target/ppc/mmu_helper.c | 35 ++-
 1 file changed, 10 insertions(+), 25 deletions(-)

diff --git a/target/ppc/mmu_helper.c b/target/ppc/mmu_helper.c
index f87d35379a..c140f3c96d 100644
--- a/target/ppc/mmu_helper.c
+++ b/target/ppc/mmu_helper.c
@@ -855,49 +855,34 @@ void helper_440_tlbwe(CPUPPCState *env, uint32_t word, 
target_ulong entry,
   target_ulong value)
 {
 ppcemb_tlb_t *tlb;
-target_ulong EPN, RPN, size;
-int do_flush_tlbs;
 
 qemu_log_mask(CPU_LOG_MMU, "%s word %d entry %d value " TARGET_FMT_lx "\n",
   __func__, word, (int)entry, value);
-do_flush_tlbs = 0;
 entry &= 0x3F;
 tlb = &env->tlb.tlbe[entry];
+
+/* Invalidate previous TLB (if it's valid) */
+if (tlb->prot & PAGE_VALID) {
+tlb_flush(env_cpu(env));
+}
+
 switch (word) {
 default:
 /* Just here to please gcc */
 case 0:
-EPN = value & 0xFC00;
-if ((tlb->prot & PAGE_VALID) && EPN != tlb->EPN) {
-do_flush_tlbs = 1;
-}
-tlb->EPN = EPN;
-size = booke_tlb_to_page_size((value >> 4) & 0xF);
-if ((tlb->prot & PAGE_VALID) && tlb->size < size) {
-do_flush_tlbs = 1;
-}
-tlb->size = size;
+tlb->EPN = value & 0xFC00;
+tlb->size = booke_tlb_to_page_size((value >> 4) & 0xF);
 tlb->attr &= ~0x1;
 tlb->attr |= (value >> 8) & 1;
 if (value & 0x200) {
 tlb->prot |= PAGE_VALID;
 } else {
-if (tlb->prot & PAGE_VALID) {
-tlb->prot &= ~PAGE_VALID;
-do_flush_tlbs = 1;
-}
+tlb->prot &= ~PAGE_VALID;
 }
 tlb->PID = env->spr[SPR_440_MMUCR] & 0x00FF;
-if (do_flush_tlbs) {
-tlb_flush(env_cpu(env));
-}
 break;
 case 1:
-RPN = value & 0xFC0F;
-if ((tlb->prot & PAGE_VALID) && tlb->RPN != RPN) {
-tlb_flush(env_cpu(env));
-}
-tlb->RPN = RPN;
+tlb->RPN = value & 0xFC0F;
 break;
 case 2:
 tlb->attr = (tlb->attr & 0x1) | (value & 0xFF00);
-- 
2.42.0

[PULL 14/49] hw/ppc/spapr_hcall: Allow elision of softmmu_resize_hpt_prep

2024-02-19 Thread Nicholas Piggin

From: Philippe Mathieu-Daudé 

Check tcg_enabled() before calling softmmu_resize_hpt_prepare()
and softmmu_resize_hpt_commit() to allow the compiler to elide
their calls. The stubs are then unnecessary, so remove them.
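
The shape of the guarded call can be sketched in isolation as below. All names are hypothetical stand-ins: `SKETCH_TCG_ENABLED` plays the role of tcg_enabled() when it folds to a constant, and `SKETCH_H_HARDWARE` is a placeholder return code. When the predicate is a compile-time 0, a compiler can drop the branch together with the call, which is what makes the stubs unnecessary; the sketch keeps the callee defined so that it builds either way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_TCG_ENABLED 1        /* hypothetical build-time switch */
#define SKETCH_H_HARDWARE  (-1)     /* placeholder error code */

/* Stand-in for softmmu_resize_hpt_prepare(); just echoes its argument. */
static long sketch_softmmu_prepare(long shift)
{
    return shift;
}

static long sketch_resize_prepare(bool kvm, long shift)
{
    if (kvm) {
        return SKETCH_H_HARDWARE;
    } else if (SKETCH_TCG_ENABLED) {
        /* Only reachable, and only needed at link time, with TCG built in */
        return sketch_softmmu_prepare(shift);
    } else {
        abort();                    /* no accelerator matched */
    }
}
```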

Reviewed-by: Nicholas Piggin 
Signed-off-by: Philippe Mathieu-Daudé 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/spapr_hcall.c  | 12 
 target/ppc/tcg-stub.c | 15 ---
 2 files changed, 8 insertions(+), 19 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index fcefd1d1c7..0d7d523e6d 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -123,9 +123,11 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
 
 if (kvm_enabled()) {
 return H_HARDWARE;
+} else if (tcg_enabled()) {
+return softmmu_resize_hpt_prepare(cpu, spapr, shift);
+} else {
+g_assert_not_reached();
 }
-
-return softmmu_resize_hpt_prepare(cpu, spapr, shift);
 }
 
 static void do_push_sregs_to_kvm_pr(CPUState *cs, run_on_cpu_data data)
@@ -191,9 +193,11 @@ static target_ulong h_resize_hpt_commit(PowerPCCPU *cpu,
 
 if (kvm_enabled()) {
 return H_HARDWARE;
+} else if (tcg_enabled()) {
+return softmmu_resize_hpt_commit(cpu, spapr, flags, shift);
+} else {
+g_assert_not_reached();
 }
-
-return softmmu_resize_hpt_commit(cpu, spapr, flags, shift);
 }
 
 
diff --git a/target/ppc/tcg-stub.c b/target/ppc/tcg-stub.c
index aadcf59d26..740d796b98 100644
--- a/target/ppc/tcg-stub.c
+++ b/target/ppc/tcg-stub.c
@@ -28,18 +28,3 @@ void create_ppc_opcodes(PowerPCCPU *cpu, Error **errp)
 void destroy_ppc_opcodes(PowerPCCPU *cpu)
 {
 }
-
-target_ulong softmmu_resize_hpt_prepare(PowerPCCPU *cpu,
-SpaprMachineState *spapr,
-target_ulong shift)
-{
-g_assert_not_reached();
-}
-
-target_ulong softmmu_resize_hpt_commit(PowerPCCPU *cpu,
-   SpaprMachineState *spapr,
-   target_ulong flags,
-   target_ulong shift)
-{
-g_assert_not_reached();
-}
-- 
2.42.0
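
Note: the elision argument in the commit message above can be sketched
outside QEMU. The sketch assumes the accelerator predicate expands to a
compile-time 0 when the accelerator is not built in, so the guarded call
sits in dead code. All toy_* names and TOY_HAVE_TCG are hypothetical. In
the TOY_HAVE_TCG=0 configuration the helper is declared but never
defined, so linking relies on the compiler dropping the dead branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical build switch standing in for a TCG-enabled build. */
#ifndef TOY_HAVE_TCG
#define TOY_HAVE_TCG 1
#endif

#if TOY_HAVE_TCG
static bool toy_tcg_allowed = true;
#define toy_tcg_enabled() (toy_tcg_allowed)
long toy_softmmu_prepare(long shift)
{
    return shift + 1;            /* placeholder result */
}
#else
#define toy_tcg_enabled() 0      /* constant: the branch below is dead */
long toy_softmmu_prepare(long shift);  /* declared, never defined */
#endif

static bool toy_kvm_allowed;
#define toy_kvm_enabled() (toy_kvm_allowed)

#define TOY_H_HARDWARE (-1L)     /* hypothetical error code */

long toy_resize_prepare(long shift)
{
    if (toy_kvm_enabled()) {
        return TOY_H_HARDWARE;
    } else if (toy_tcg_enabled()) {
        return toy_softmmu_prepare(shift);
    } else {
        abort();                 /* no accelerator: should not happen */
    }
}
```

With the default TOY_HAVE_TCG=1 the call is dispatched normally. With 0,
no stub for toy_softmmu_prepare() is needed as long as the optimizer
removes the branch.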

[PULL 03/49] tests/avocado: mark boot_linux.py long runtime instead of flaky

2024-02-19 Thread Nicholas Piggin

The ppc64 and s390x tests were first marked skipIf GITLAB_CI by commit
c0c8687ef0f ("tests/avocado: disable BootLinuxPPC64 test in CI"), and
commit 0f26d94ec9e ("tests/acceptance: skip s390x_ccw_vrtio_tcg on
GitLab") due to being very heavy-weight for gitlab CI.

Commit 9b45cc99318 ("docs/devel: rationalise unstable gitlab tests under
FLAKY_TESTS") changed this to being flaky but it isn't really, it just
had a long runtime.

So take the SPEED=slow variable from qtests and introduce it to avocado,
and make these tests require it.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 docs/devel/testing.rst  | 11 +++
 tests/avocado/boot_linux.py |  8 ++--
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index bd132306c1..5cdc23b90f 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -1346,6 +1346,17 @@ the environment.
 The definition of *large* is a bit arbitrary here, but it usually means an
 asset which occupies at least 1GB of size on disk when uncompressed.
 
+SPEED
+^^^^^
+Tests which have a long runtime will not be run unless ``SPEED=slow`` is
+exported on the environment.
+
+The definition of *long* is a bit arbitrary here, and it depends on the
+usefulness of the test too. A unique test is worth spending more time on,
+small variations on existing tests perhaps less so. As a rough guide,
+a test or set of similar tests which take more than 100 seconds to
+complete.
+
 AVOCADO_ALLOW_UNTRUSTED_CODE
 
 There are tests which will boot a kernel image or firmware that can be
diff --git a/tests/avocado/boot_linux.py b/tests/avocado/boot_linux.py
index 7c4769904e..de4c8805f7 100644
--- a/tests/avocado/boot_linux.py
+++ b/tests/avocado/boot_linux.py
@@ -93,13 +93,11 @@ class BootLinuxPPC64(LinuxTest):
 
 timeout = 360
 
-@skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab')
-
+@skipUnless(os.getenv('SPEED') == 'slow', 'runtime limited')
 def test_pseries_tcg(self):
 """
 :avocado: tags=machine:pseries
 :avocado: tags=accel:tcg
-:avocado: tags=flaky
 """
 self.require_accelerator("tcg")
 self.vm.add_args("-accel", "tcg")
@@ -113,13 +111,11 @@ class BootLinuxS390X(LinuxTest):
 
 timeout = 240
 
-@skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab')
-
+@skipUnless(os.getenv('SPEED') == 'slow', 'runtime limited')
 def test_s390_ccw_virtio_tcg(self):
 """
 :avocado: tags=machine:s390-ccw-virtio
 :avocado: tags=accel:tcg
-:avocado: tags=flaky
 """
 self.require_accelerator("tcg")
 self.vm.add_args("-accel", "tcg")
-- 
2.42.0
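
Note: the gate added above is an exact string comparison on the SPEED
environment variable. The same check, written in C for comparison with
the decorator in the patch, might look like the sketch below. The
function names are hypothetical and this is not how qtests implement it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* True only for the exact value "slow"; unset or anything else is false. */
bool speed_allows_slow(const char *speed)
{
    return speed != NULL && strcmp(speed, "slow") == 0;
}

/* Convenience wrapper that reads the process environment. */
bool slow_tests_enabled(void)
{
    return speed_allows_slow(getenv("SPEED"));
}
```

As with the Python condition, the match is case-sensitive: an unset
variable, SPEED=quick and SPEED=SLOW all leave the long-running tests
skipped.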

Re: [PATCH 1/6] hw/arm: Inline sysbus_create_simple(PL110 / PL111)

2024-02-19 Thread Philippe Mathieu-Daudé


On 16/2/24 20:54, Philippe Mathieu-Daudé wrote:

On 16/2/24 18:14, BALATON Zoltan wrote:

On Fri, 16 Feb 2024, Philippe Mathieu-Daudé wrote:

We want to set another qdev property (a link) for the pl110
and pl111 devices, so we cannot use sysbus_create_simple(), which
only passes the sysbus base address and IRQs as arguments. Inline
it so we can set the link property in the next commit.

Signed-off-by: Philippe Mathieu-Daudé 
---
hw/arm/realview.c    |  5 -
hw/arm/versatilepb.c |  6 +-
hw/arm/vexpress.c    | 10 --
3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/hw/arm/realview.c b/hw/arm/realview.c
index 9058f5b414..77300e92e5 100644
--- a/hw/arm/realview.c
+++ b/hw/arm/realview.c
@@ -238,7 +238,10 @@ static void realview_init(MachineState *machine,
    sysbus_create_simple("pl061", 0x10014000, pic[7]);
    gpio2 = sysbus_create_simple("pl061", 0x10015000, pic[8]);

-    sysbus_create_simple("pl111", 0x1002, pic[23]);
+    dev = qdev_new("pl111");
+    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
+    sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, 0x1002);
+    sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[23]);


Not directly related to this patch but this blows up 1 line into 4 
just to allow setting a property. Maybe just to keep some simplicity 
we'd rather need either a sysbus_realize_simple function that takes a 
sysbus device instead of the name and does not create the device 
itself or some way to pass properties to sysbus create simple (but the 
latter may not be easy to do in a generic way so not sure about that). 
What do you think?


Unfortunately sysbus doesn't scale in a heterogeneous setup.


Regarding the HW modelling API complexity you are pointing at, we'd
like to move from the current imperative programming paradigm to a
declarative one, likely DSL driven. While that is being investigated
(as part of "Dynamic Machine"), I'm trying to get the HW APIs right
for heterogeneous emulation. The current price to pay is a verbose
imperative QDev API, in the hope that we'll later get a trivial declarative
one (like this single sysbus_create_simple call), where we shouldn't have to
worry about the order of low-level calls, whether to use a link or not, etc.

For the big list of issues we are trying to improve, see:
https://lore.kernel.org/qemu-devel/87o7d1i7ky@pond.sub.org/

Re: [PATCH] hw/i386/pc_q35: Populate interrupt handlers before realizing LPC PCI function

2024-02-19 Thread Philippe Mathieu-Daudé


On 17/2/24 11:46, Bernhard Beschow wrote:

The interrupt handlers need to be populated before the device is realized since
internal devices such as the RTC are wired during realize(). If the interrupt
handlers aren't populated, devices such as the RTC will be wired with a NULL
interrupt handler, i.e. MC146818RtcState::irq is NULL.

Fixes: fc11ca08bc29 "hw/i386/q35: Realize LPC PCI function before accessing it"


I think this commit is correct, but exposes a pre-existing bug.

I noticed it for the PC equivalent, so didn't post the
pci_realize_and_unref() change there, but missed that Q35 is
similarly affected.

IMO the problem is how the GSI lines are allocated. The ISA
ones are allocated twice!

Before this patch, the 1st alloc is just overwritten and
ignored, ISA RTC IRQ is assigned to the 2nd alloc.

After this patch, ISA RTC IRQ is assigned to the 1st alloc,
then the 2nd alloc wipes it, and an empty IRQ is eventually
wired later.

The proper fix is to alloc ISA IRQs just once. Either filling
GSI with them, or having GSI take care of that.

Since GSI is not a piece of HW but a concept to simplify
developers writing x86 HW drivers, I currently think we shouldn't
model it as a QOM container.


Cc: Philippe Mathieu-Daudé 
Signed-off-by: Bernhard Beschow 
---
  hw/i386/pc_q35.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index d346fa3b1d..43675bf597 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -240,10 +240,10 @@ static void pc_q35_init(MachineState *machine)
  lpc_dev = DEVICE(lpc);
  qdev_prop_set_bit(lpc_dev, "smm-enabled",
x86_machine_is_smm_enabled(x86ms));
-pci_realize_and_unref(lpc, host_bus, &error_fatal);
  for (i = 0; i < IOAPIC_NUM_PINS; i++) {
  qdev_connect_gpio_out_named(lpc_dev, ICH9_GPIO_GSI, i, x86ms->gsi[i]);
  }
+pci_realize_and_unref(lpc, host_bus, &error_fatal);
  
  rtc_state = ISA_DEVICE(object_resolve_path_component(OBJECT(lpc), "rtc"));

Re: [PATCH RFCv2 2/8] vfio/iommufd: Introduce auto domain creation

2024-02-19 Thread Avihai Horon


Hi Joao,

On 12/02/2024 15:56, Joao Martins wrote:

External email: Use caution opening links or attachments


There are generally two modes of operation for IOMMUFD:

* The simple user API, which is intended for relatively simple uses of
IOMMUs, e.g. DPDK. It generally creates an IOAS, attaches it to VFIO,
and mainly performs IOAS_MAP and UNMAP.

* The native IOMMUFD API, where you have fine-grained control of the
IOMMU domain and model it accordingly. This is where most new features
are being steered.

For dirty tracking, the second mode is required, as it needs to ensure that
the stage-2/parent IOMMU domain will only attach devices
that support dirty tracking (so far it is all homogeneous in x86, likely
not the case for smmuv3). Such an invariant on dirty tracking provides a
useful guarantee to VMMs that will refuse incompatible device
attachments for IOMMU domains.

For dirty tracking, this property is enabled/enforced via HWPT_ALLOC,
which is responsible for creating an IOMMU domain. This is in contrast to
the 'simple API', where the IOMMU domain is created by IOMMUFD
automatically when it attaches to VFIO (usually referred to as autodomains).

To support dirty tracking with the advanced IOMMUFD API, it needs
similar logic, where IOMMU domains are created and devices attached to
compatible domains, essentially mimicking the kernel's
iommufd_device_auto_get_domain(). If this fails (e.g. for mdevs) it falls
back to IOAS attach.

Signed-off-by: Joao Martins 
---
Right now the only alternative to a userspace autodomains implementation
is to mimic all the flags being added to HWPT_ALLOC in VFIO
IOAS attach. So I opted for the userspace autodomains approach to avoid the
duplication of hwpt-alloc flags vs attach-ioas flags. I lack real mdev
drivers atm, so testing with those is still TBD.

Opinions, comments, welcome!
---
  backends/iommufd.c| 29 +
  backends/trace-events |  1 +
  hw/vfio/iommufd.c | 78 +++
  include/hw/vfio/vfio-common.h |  9 
  include/sysemu/iommufd.h  |  4 ++
  5 files changed, 121 insertions(+)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index 8486894f1b3f..2970135af4b9 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -211,6 +211,35 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t 
ioas_id,
  return ret;
  }

+int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
+   uint32_t pt_id, uint32_t flags,
+   uint32_t data_type, uint32_t data_len,
+   void *data_ptr, uint32_t *out_hwpt)
+{
+int ret;
+struct iommu_hwpt_alloc alloc_hwpt = {
+.size = sizeof(struct iommu_hwpt_alloc),
+.flags = flags,
+.dev_id = dev_id,
+.pt_id = pt_id,
+.data_type = data_type,
+.data_len = data_len,
+.data_uptr = (uint64_t)data_ptr,
+.__reserved = 0,
+};
+
+ret = ioctl(iommufd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
+trace_iommufd_backend_alloc_hwpt(iommufd, dev_id, pt_id, flags, data_type,
+ data_len, (uint64_t)data_ptr,
+ alloc_hwpt.out_hwpt_id, ret);
+if (ret) {
+error_report("IOMMU_HWPT_ALLOC failed: %m");
+} else {
+*out_hwpt = alloc_hwpt.out_hwpt_id;
+}
+return !ret ? 0 : -errno;
+}
+
  static const TypeInfo iommufd_backend_info = {
  .name = TYPE_IOMMUFD_BACKEND,
  .parent = TYPE_OBJECT,
diff --git a/backends/trace-events b/backends/trace-events
index d45c6e31a67e..f83a276a4253 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -13,5 +13,6 @@ iommu_backend_set_fd(int fd) "pre-opened /dev/iommu fd=%d"
  iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, void *vaddr, bool readonly, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" addr=%p readonly=%d (%d)"
  iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
  iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
+iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
  iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
  iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 7d39d7a5fa51..ca7ec45e725c 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -219,10 +219,82 @@ static int iommufd_cdev_detach_ioas_hwpt(VFIODevice 
*vbasedev, Error **er

Re: [PATCH RFCv2 3/8] vfio/iommufd: Probe and request hwpt dirty tracking capability

2024-02-19 Thread Avihai Horon


Hi Joao,

On 12/02/2024 15:56, Joao Martins wrote:


Probe hardware dirty tracking support by querying device hw capabilities
via IOMMUFD_GET_HW_INFO.

In preparation for using the dirty tracking UAPI, request dirty tracking in
the HWPT flags when the device doesn't support dirty page tracking or has
it disabled, or when the VF backing IOMMU supports dirty
tracking. The latter covers the possibility of a device being attached
that doesn't have a dirty tracker.

Signed-off-by: Joao Martins 
---
  hw/vfio/common.c  | 18 ++
  hw/vfio/iommufd.c | 25 -
  include/hw/vfio/vfio-common.h |  2 ++
  3 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f7f85160be88..d8fc7077f839 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -216,6 +216,24 @@ bool vfio_devices_all_device_dirty_tracking(const 
VFIOContainerBase *bcontainer)
  return true;
  }

+bool vfio_device_migration_supported(VFIODevice *vbasedev)
+{
+if (!vbasedev->migration) {
+return false;
+}
+
+return vbasedev->migration->mig_flags & VFIO_MIGRATION_STOP_COPY;


I think this is redundant, as (vbasedev->migration != NULL) implies 
(vbasedev->migration->mig_flags & VFIO_MIGRATION_STOP_COPY) == true.



+}
+
+bool vfio_device_dirty_pages_supported(VFIODevice *vbasedev)
+{
+if (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) {
+return false;
+}
+
+return !vbasedev->dirty_pages_supported;
+}
+
  /*
   * Check if all VFIO devices are running and migration is active, which is
   * essentially equivalent to the migration being in pre-copy phase.
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index ca7ec45e725c..edacb6d72748 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -219,11 +219,26 @@ static int iommufd_cdev_detach_ioas_hwpt(VFIODevice 
*vbasedev, Error **errp)
  return ret;
  }

+static bool iommufd_dirty_pages_supported(IOMMUFDDevice *iommufd_dev,
+  Error **errp)
+{
+uint64_t caps;
+int r;
+
+r = iommufd_device_get_hw_capabilities(iommufd_dev, &caps, errp);
+if (r) {
+return false;
+}
+
+return caps & IOMMU_HW_CAP_DIRTY_TRACKING;


The false return value of this function is overloaded, it can indicate 
both error and lack of DPT support.
Should we fail iommufd_cdev_autodomains_get() if 
iommufd_dirty_pages_supported() fails?
Otherwise, errp argument of iommufd_dirty_pages_supported() is redundant 
and we can handle iommufd_device_get_hw_capabilities() error locally.
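
One way to avoid the overloaded return value described above is to
return the status and hand the answer back through an out-parameter. A
minimal, self-contained sketch follows. The toy_* names and
TOY_HW_CAP_DIRTY_TRACKING are hypothetical stand-ins, and the probe is
mocked rather than backed by a real ioctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_HW_CAP_DIRTY_TRACKING (1ULL << 0)  /* hypothetical cap bit */

/*
 * Mock capability probe: returns 0 and fills *caps, or returns the
 * negative errno-style code passed in fail_with.
 */
int toy_get_hw_capabilities(int fail_with, uint64_t hw_caps, uint64_t *caps)
{
    if (fail_with) {
        return fail_with;
    }
    *caps = hw_caps;
    return 0;
}

/*
 * Status goes in the return value, the answer in *supported, so that
 * "probe failed" and "not supported" stay distinguishable.
 */
int toy_dirty_tracking_supported(int fail_with, uint64_t hw_caps,
                                 bool *supported)
{
    uint64_t caps = 0;
    int r = toy_get_hw_capabilities(fail_with, hw_caps, &caps);

    if (r) {
        return r;                /* *supported is left untouched */
    }
    *supported = (caps & TOY_HW_CAP_DIRTY_TRACKING) != 0;
    return 0;
}
```

A caller can then choose to fail the attach on a nonzero return, or to
log it and carry on without dirty tracking, which are the two options
raised above.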



+}
+
  static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
  VFIOIOMMUFDContainer *container,
  Error **errp)
  {
  int iommufd = vbasedev->iommufd_dev.iommufd->fd;
+uint32_t flags = 0;
  VFIOIOASHwpt *hwpt;
  Error *err = NULL;
  int ret = -EINVAL;
@@ -245,9 +260,15 @@ static int iommufd_cdev_autodomains_get(VFIODevice 
*vbasedev,
  }
  }

+if ((vfio_device_migration_supported(vbasedev) &&
+ !vfio_device_dirty_pages_supported(vbasedev)) ||
+iommufd_dirty_pages_supported(&vbasedev->iommufd_dev, &err)) {


I think it's too early to check vfio_device_migration_supported() and 
vfio_device_dirty_pages_supported() here, as vfio_migration_init() 
hasn't been called yet so vbasedev->migration and 
vbasedev->dirty_pages_supported are not initialized.
Why do we need to check this? Can't we simply request IOMMUFD DPT if 
it's supported?


Thanks.


+flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
+}
+
  ret = iommufd_backend_alloc_hwpt(iommufd,
   vbasedev->iommufd_dev.devid,
- container->ioas_id, 0, 0, 0,
+ container->ioas_id, flags, 0, 0,
   NULL, &hwpt_id);
  if (ret) {
  error_append_hint(&err,
@@ -271,6 +292,8 @@ static int iommufd_cdev_autodomains_get(VFIODevice 
*vbasedev,
  vbasedev->hwpt = hwpt;
  QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
  QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
+container->bcontainer.dirty_pages_supported =
+  (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
  return 0;
  }

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 7f7d823221e2..a3e691c126c6 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -271,6 +271,8 @@ bool
  vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer);
  bool
  vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer);
+bool vfio_device_migration_supported(VFIODevice *vbasedev);
+bool vfio_device_dirty_pages_supported(VFIODevice *vbasedev);

Re: [PATCH RFCv2 4/8] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support

2024-02-19 Thread Avihai Horon


Hi Joao,

On 12/02/2024 15:56, Joao Martins wrote:


ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that
enables or disables dirty page tracking.

It is called on the whole list of IOMMU domains being tracked,
and on failure the change is rolled back.

Signed-off-by: Joao Martins 
---
  backends/iommufd.c   | 19 +++
  backends/trace-events|  1 +
  hw/vfio/common.c |  7 ++-
  hw/vfio/iommufd.c| 28 
  include/sysemu/iommufd.h |  3 +++
  5 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index 2970135af4b9..954de61c2da0 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -240,6 +240,25 @@ int iommufd_backend_alloc_hwpt(int iommufd, uint32_t 
dev_id,
  return !ret ? 0 : -errno;
  }

+int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
+   bool start)
+{
+int ret;
+struct iommu_hwpt_set_dirty_tracking set_dirty = {
+.size = sizeof(set_dirty),
+.hwpt_id = hwpt_id,
+.flags = !start ? 0 : IOMMU_HWPT_DIRTY_TRACKING_ENABLE,
+};
+
+ret = ioctl(be->fd, IOMMU_HWPT_SET_DIRTY_TRACKING, &set_dirty);
+trace_iommufd_backend_set_dirty(be->fd, hwpt_id, start, ret);
+if (ret) {
+error_report("IOMMU_HWPT_SET_DIRTY_TRACKING failed: %s",
+ strerror(errno));
+}
+return !ret ? 0 : -errno;
+}
+
  static const TypeInfo iommufd_backend_info = {
  .name = TYPE_IOMMUFD_BACKEND,
  .parent = TYPE_OBJECT,
diff --git a/backends/trace-events b/backends/trace-events
index f83a276a4253..feba2baca5f7 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -16,3 +16,4 @@ iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, 
uint64_t iova, uint64_t si
  iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
  iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
  iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
+iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%d enable=%d (%d)"


s/hwpt=%d/hwpt=%u


diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index d8fc7077f839..a940c0b6ede8 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -190,7 +190,7 @@ static bool 
vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer)
  QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
  VFIOMigration *migration = vbasedev->migration;

-if (!migration) {
+if (!migration && !vbasedev->iommufd_dev.iommufd) {
  return false;
  }

@@ -199,6 +199,11 @@ static bool 
vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer)
   vfio_device_state_is_precopy(vbasedev))) {
  return false;
  }
+
+if (vbasedev->iommufd_dev.iommufd &&
+!bcontainer->dirty_pages_supported) {
+return false;
+}


Why do we need this and the above?
IIUC, vfio_devices_all_dirty_tracking() is used to check if this is a 
"proper time" to issue a dirty page sync (e.g., if migration is active, 
if we are in pre-copy and dirty tracking during pre-copy is enabled).
If it's a "proper time" to do dirty page sync, even if 
bcontainer->dirty_pages_supported is false, we should still issue a 
dirty sync which will mark all dirty.
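
To illustrate the fallback mentioned above: when no dirty tracker is
available, the conservative answer to a sync is "everything is dirty".
A minimal sketch, with a hypothetical toy_sync_dirty() and a plain byte
array standing in for the real bitmap:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Fill 'bitmap' for one sync. With a tracker, copy its result;
 * without one, report every page as dirty.
 */
void toy_sync_dirty(uint8_t *bitmap, size_t nbytes,
                    bool tracker_available, const uint8_t *tracked)
{
    if (tracker_available) {
        memcpy(bitmap, tracked, nbytes);
    } else {
        memset(bitmap, 0xff, nbytes);   /* conservative: all dirty */
    }
}
```

Marking everything dirty costs migration bandwidth but never loses a
write, which is why skipping the sync entirely is the riskier choice.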



  }
  return true;
  }
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index edacb6d72748..361e659288fd 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -25,6 +25,7 @@
  #include "qemu/cutils.h"
  #include "qemu/chardev_open.h"
  #include "pci.h"
+#include "migration/migration.h"


This is redundant.

Thanks.



  static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
  ram_addr_t size, void *vaddr, bool readonly)
@@ -115,6 +116,32 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice 
*vbasedev)
  iommufd_backend_disconnect(vbasedev->iommufd_dev.iommufd);
  }

+static int iommufd_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
+   bool start)
+{
+const VFIOIOMMUFDContainer *container =
+container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+int ret;
+VFIOIOASHwpt *hwpt;
+
+QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+ret = iommufd_backend_set_dirty_tracking(container->be,
+ hwpt->hwpt_id, start);
+if (ret) {
+goto err;
+}
+}
+
+return 0;
+
+err:
+QLIST_F
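
Note: the quoted patch is cut off at this point in the archive. The
behaviour described in its commit message (switch tracking on every
domain, undo on failure) can be sketched independently as below. This
is not the patch's code: ToyDomain and the toy_* functions are
hypothetical, and an array stands in for the hwpt list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool tracking;   /* current dirty-tracking state */
    int fail_code;   /* nonzero: toy_set_tracking() fails with this */
} ToyDomain;

int toy_set_tracking(ToyDomain *d, bool start)
{
    if (d->fail_code) {
        return d->fail_code;
    }
    d->tracking = start;
    return 0;
}

/* Apply 'start' to every domain; on failure, revert the ones done. */
int toy_set_tracking_all(ToyDomain *doms, size_t n, bool start)
{
    for (size_t i = 0; i < n; i++) {
        int ret = toy_set_tracking(&doms[i], start);

        if (ret) {
            /* Undo in reverse order; rollback errors are ignored. */
            while (i-- > 0) {
                toy_set_tracking(&doms[i], !start);
            }
            return ret;
        }
    }
    return 0;
}
```

The first error is what gets reported. Whether a failure during the
rollback itself should also be surfaced is a design choice this sketch
does not try to settle.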

Re: [PATCH RFCv2 5/8] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support

2024-02-19 Thread Avihai Horon


Hi Joao,

On 12/02/2024 15:56, Joao Martins wrote:


ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
that fetches the bitmap that tells what was dirty in an IOVA
range.

A single bitmap is allocated and used across all the hwpts
sharing an IOAS, which is then used in log_sync() to set QEMU
global bitmaps.

Signed-off-by: Joao Martins 
---
  backends/iommufd.c   | 24 
  backends/trace-events|  1 +
  hw/vfio/iommufd.c| 30 ++
  include/sysemu/iommufd.h |  3 +++
  4 files changed, 58 insertions(+)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index 954de61c2da0..dd676d493c37 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -259,6 +259,30 @@ int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, 
uint32_t hwpt_id,
  return !ret ? 0 : -errno;
  }

+int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
+ uint64_t iova, ram_addr_t size,
+ uint64_t page_size, uint64_t *data)
+{
+int ret;
+struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap = {
+.size = sizeof(get_dirty_bitmap),
+.hwpt_id = hwpt_id,
+.iova = iova, .length = size,
+.page_size = page_size, .data = (uintptr_t)data,


Member per line for readability?


+};
+
+ret = ioctl(be->fd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get_dirty_bitmap);
+trace_iommufd_backend_get_dirty_bitmap(be->fd, hwpt_id, iova, size,
+   page_size, ret);
+if (ret) {
+error_report("IOMMU_HWPT_GET_DIRTY_BITMAP (iova: 0x%"PRIx64
+ " size: 0x%"PRIx64") failed: %s", iova,
+ size, strerror(errno));
+}
+
+return !ret ? 0 : -errno;
+}
+
  static const TypeInfo iommufd_backend_info = {
  .name = TYPE_IOMMUFD_BACKEND,
  .parent = TYPE_OBJECT,
diff --git a/backends/trace-events b/backends/trace-events
index feba2baca5f7..11a27cb114b6 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -17,3 +17,4 @@ iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, 
uint32_t pt_id, uint32_
  iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
  iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
  iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%d enable=%d (%d)"
+iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%d iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"


s/hwpt=%d/hwpt=%u


diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 361e659288fd..79b13bd262cc 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -25,6 +25,7 @@
  #include "qemu/cutils.h"
  #include "qemu/chardev_open.h"
  #include "pci.h"
+#include "exec/ram_addr.h"
  #include "migration/migration.h"

  static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
@@ -142,6 +143,34 @@ err:
  return ret;
  }

+static int iommufd_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
+  VFIOBitmap *vbmap, uint64_t iova,
+  uint64_t size)
+{
+VFIOIOMMUFDContainer *container = container_of(bcontainer,
+   VFIOIOMMUFDContainer,
+   bcontainer);
+int ret;
+VFIOIOASHwpt *hwpt;
+unsigned long page_size;
+
+if (!bcontainer->dirty_pages_supported) {


Do we need this check?
IIUC, if we got to iommufd_query_dirty_bitmap(), it means 
bcontainer->dirty_pages_supported is already true.


Thanks.


+return 0;
+}
+
+page_size = qemu_real_host_page_size();
+QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+ret = iommufd_backend_get_dirty_bitmap(container->be, hwpt->hwpt_id,
+   iova, size, page_size,
+   vbmap->bitmap);
+if (ret) {
+break;
+}
+}
+
+return ret;
+}
+
  static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
  {
  long int ret = -ENOTTY;
@@ -765,6 +794,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass 
*klass, void *data)
  vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
  vioc->host_iommu_device_init = vfio_cdev_host_iommu_device_init;
  vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
+vioc->query_dirty_bitmap = iommufd_query_dirty_bitmap;
  };

  static const TypeInfo types[] = {
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 562c189dd92c..ba19b7ea4c19 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -55,5 +55,8 @@ int iommufd

RE: [PATCH] vhost_net: add NOTIFICATION_DATA and IN_ORDER feature bits to vdpa_feature_bits

2024-02-19 Thread Srujana Challa

Ping.

> Subject: RE: [PATCH] vhost_net: add NOTIFICATION_DATA and IN_ORDER
> feature bits to vdpa_feature_bits
> 
> Hi Michael,
> 
> Can you review this feature support patch, appreciate your review and
> comments.
> 
> Patch considers all feature bits supported by vhost net client type as part of
> feature negotiation to address the concerns raised in below thread.
> https://patchew.org/QEMU/1533833677-27512-1-git-send-email-
> i.maxim...@samsung.com/
> 
> Regards
> Vamsi
> 
> > -Original Message-
> > From: Srujana Challa 
> > Sent: Tuesday, January 2, 2024 4:45 PM
> > To: qemu-devel@nongnu.org
> > Cc: m...@redhat.com; Vamsi Krishna Attunuru ;
> > Jerin Jacob Kollanukkaran 
> > Subject: [PATCH] vhost_net: add NOTIFICATION_DATA and IN_ORDER
> feature
> > bits to vdpa_feature_bits
> >
> > Enables VIRTIO_F_NOTIFICATION_DATA and VIRTIO_F_IN_ORDER feature
> bits
> > for vhost vdpa backend. Also adds code to consider all feature bits
> > supported by vhost net client type for feature negotiation, so that
> > vhost backend device supported features can be negotiated with guest.
> >
> > Signed-off-by: Srujana Challa 
> > ---
> >  hw/net/vhost_net.c | 10 ++
> >  net/vhost-vdpa.c   |  2 ++
> >  2 files changed, 12 insertions(+)
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index
> > e8e1661646..65ae8bcece 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -117,6 +117,16 @@ static const int
> > *vhost_net_get_feature_bits(struct
> > vhost_net *net)
> >
> >  uint64_t vhost_net_get_features(struct vhost_net *net, uint64_t
> > features) {
> > +const int *bit = vhost_net_get_feature_bits(net);
> > +
> > +/*
> > + * Consider all feature bits for feature negotiation with vhost 
> > backend,
> > + * so that all backend device supported features can be negotiated.
> > + */
> > +while (*bit != VHOST_INVALID_FEATURE_BIT) {
> > +features |= (1ULL << *bit);
> > +bit++;
> > +}
> >  return vhost_get_features(&net->dev, vhost_net_get_feature_bits(net),
> >  features);
> >  }
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index
> > 3726ee5d67..51334fcfe2 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -57,7 +57,9 @@ typedef struct VhostVDPAState {
> >   */
> >  const int vdpa_feature_bits[] = {
> >  VIRTIO_F_ANY_LAYOUT,
> > +VIRTIO_F_IN_ORDER,
> >  VIRTIO_F_IOMMU_PLATFORM,
> > +VIRTIO_F_NOTIFICATION_DATA,
> >  VIRTIO_F_NOTIFY_ON_EMPTY,
> >  VIRTIO_F_RING_PACKED,
> >  VIRTIO_F_RING_RESET,
> > --
> > 2.25.1

Re: [PATCH] vhost_net: add NOTIFICATION_DATA and IN_ORDER feature bits to vdpa_feature_bits

2024-02-19 Thread Michael S. Tsirkin

On Tue, Jan 02, 2024 at 04:44:32PM +0530, Srujana Challa wrote:
> Enables VIRTIO_F_NOTIFICATION_DATA and VIRTIO_F_IN_ORDER feature bits
> for vhost vdpa backend. Also adds code to consider all feature bits
> supported by vhost net client type for feature negotiation, so that
> vhost backend device supported features can be negotiated with guest.
> 
> Signed-off-by: Srujana Challa 
> ---
>  hw/net/vhost_net.c | 10 ++
>  net/vhost-vdpa.c   |  2 ++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index e8e1661646..65ae8bcece 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -117,6 +117,16 @@ static const int *vhost_net_get_feature_bits(struct 
> vhost_net *net)
>  
>  uint64_t vhost_net_get_features(struct vhost_net *net, uint64_t features)
>  {
> +const int *bit = vhost_net_get_feature_bits(net);
> +
> +/*
> + * Consider all feature bits for feature negotiation with vhost backend,
> + * so that all backend device supported features can be negotiated.
> + */
> +while (*bit != VHOST_INVALID_FEATURE_BIT) {
> +features |= (1ULL << *bit);
> +bit++;
> +}
>  return vhost_get_features(&net->dev, vhost_net_get_feature_bits(net),
>  features);
>  }

I don't think we should do this part. With vdpa QEMU is in control of
which features are exposed and that is intentional since features are
often tied to other behaviour.
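
For reference, the loop in question turns a sentinel-terminated list of
bit numbers into a mask and ORs it into the requested features. A
standalone sketch of that operation, using a hypothetical
TOY_INVALID_FEATURE_BIT sentinel and toy_feature_mask() rather than the
vhost code itself:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_INVALID_FEATURE_BIT 0xff   /* hypothetical list terminator */

/* Build a 64-bit mask with one bit set per entry in 'bits'. */
uint64_t toy_feature_mask(const int *bits)
{
    uint64_t mask = 0;

    for (; *bits != TOY_INVALID_FEATURE_BIT; bits++) {
        mask |= 1ULL << *bits;
    }
    return mask;
}
```

ORing such a mask into 'features' sets every listed bit regardless of
what the caller asked for, which is exactly what takes the decision
away from QEMU.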

> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 3726ee5d67..51334fcfe2 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -57,7 +57,9 @@ typedef struct VhostVDPAState {
>   */
>  const int vdpa_feature_bits[] = {
>  VIRTIO_F_ANY_LAYOUT,
> +VIRTIO_F_IN_ORDER,
>  VIRTIO_F_IOMMU_PLATFORM,
> +VIRTIO_F_NOTIFICATION_DATA,
>  VIRTIO_F_NOTIFY_ON_EMPTY,
>  VIRTIO_F_RING_PACKED,
>  VIRTIO_F_RING_RESET,
> -- 
> 2.25.1

Re: [PATCH] vhost_net: add NOTIFICATION_DATA and IN_ORDER feature bits to vdpa_feature_bits

2024-02-19 Thread Michael S. Tsirkin

Sorry this got tagged for Linux by mistake.
Replied now.

On Mon, Feb 19, 2024 at 09:38:46AM +, Srujana Challa wrote:
> Ping.
> 
> > Subject: RE: [PATCH] vhost_net: add NOTIFICATION_DATA and IN_ORDER
> > feature bits to vdpa_feature_bits
> > 
> > Hi Michael,
> > 
> > Can you review this feature support patch, appreciate your review and
> > comments.
> > 
> > Patch considers all feature bits supported by vhost net client type as part 
> > of
> > feature negotiation to address the concerns raised in below thread.
> > https://patchew.org/QEMU/1533833677-27512-1-git-send-email-
> > i.maxim...@samsung.com/
> > 
> > Regards
> > Vamsi
> > 
> > > -----Original Message-----
> > > From: Srujana Challa 
> > > Sent: Tuesday, January 2, 2024 4:45 PM
> > > To: qemu-devel@nongnu.org
> > > Cc: m...@redhat.com; Vamsi Krishna Attunuru ;
> > > Jerin Jacob Kollanukkaran 
> > > Subject: [PATCH] vhost_net: add NOTIFICATION_DATA and IN_ORDER
> > feature
> > > bits to vdpa_feature_bits
> > >
> > > Enables VIRTIO_F_NOTIFICATION_DATA and VIRTIO_F_IN_ORDER feature
> > bits
> > > for vhost vdpa backend. Also adds code to consider all feature bits
> > > supported by vhost net client type for feature negotiation, so that
> > > vhost backend device supported features can be negotiated with guest.
> > >
> > > Signed-off-by: Srujana Challa 
> > > ---
> > >  hw/net/vhost_net.c | 10 ++
> > >  net/vhost-vdpa.c   |  2 ++
> > >  2 files changed, 12 insertions(+)
> > >
> > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index
> > > e8e1661646..65ae8bcece 100644
> > > --- a/hw/net/vhost_net.c
> > > +++ b/hw/net/vhost_net.c
> > > @@ -117,6 +117,16 @@ static const int
> > > *vhost_net_get_feature_bits(struct
> > > vhost_net *net)
> > >
> > >  uint64_t vhost_net_get_features(struct vhost_net *net, uint64_t
> > > features) {
> > > +const int *bit = vhost_net_get_feature_bits(net);
> > > +
> > > +/*
> > > + * Consider all feature bits for feature negotiation with vhost 
> > > backend,
> > > + * so that all backend device supported features can be negotiated.
> > > + */
> > > +while (*bit != VHOST_INVALID_FEATURE_BIT) {
> > > +features |= (1ULL << *bit);
> > > +bit++;
> > > +}
> > >  return vhost_get_features(&net->dev, vhost_net_get_feature_bits(net),
> > >  features);
> > >  }
> > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index
> > > 3726ee5d67..51334fcfe2 100644
> > > --- a/net/vhost-vdpa.c
> > > +++ b/net/vhost-vdpa.c
> > > @@ -57,7 +57,9 @@ typedef struct VhostVDPAState {
> > >   */
> > >  const int vdpa_feature_bits[] = {
> > >  VIRTIO_F_ANY_LAYOUT,
> > > +VIRTIO_F_IN_ORDER,
> > >  VIRTIO_F_IOMMU_PLATFORM,
> > > +VIRTIO_F_NOTIFICATION_DATA,
> > >  VIRTIO_F_NOTIFY_ON_EMPTY,
> > >  VIRTIO_F_RING_PACKED,
> > >  VIRTIO_F_RING_RESET,
> > > --
> > > 2.25.1

Re: [PATCH v5 10/10] tests/bench: Add bufferiszero-bench

2024-02-19 Thread Daniel P . Berrangé

On Sat, Feb 17, 2024 at 09:21:50AM -1000, Richard Henderson wrote:
> On 2/16/24 23:49, Alexander Monakov wrote:
> > 
> > On Fri, 16 Feb 2024, Richard Henderson wrote:
> > 
> > > Benchmark each acceleration function vs an aligned buffer of zeros.
> > > 
> > > Signed-off-by: Richard Henderson 
> > > ---
> > > +
> > > +static void test(const void *opaque)
> > > +{
> > > +size_t len = 64 * KiB;
> > 
> > This exceeds L1 cache capacity, so the performance ceiling of L2 cache
> > throughput is easier to hit with a suboptimal implementation. It also
> > seems to vastly exceed typical buffer sizes in Qemu.
> > 
> > When preparing the patch we mostly tested at 8 KiB. The size decides
> > whether the branch exiting the loop becomes perfectly predictable in
> > the microbenchmark, e.g. at 128 bytes per iteration it exits on the
> > 63'rd iteration, which Intel predictors cannot track, so we get
> > one mispredict per call.
> > 
> > (so perhaps smaller sizes like 2 or 4 KiB are better)
> 
> Fair.  I've adjusted to loop over 1, 4, 16, 64 KiB.
> 
> # Start of bufferiszero tests
> # buffer_is_zero #0: 1KB 49227.29 MB/sec
> # buffer_is_zero #0: 4KB 137461.28 MB/sec
> # buffer_is_zero #0: 16KB 224220.41 MB/sec
> # buffer_is_zero #0: 64KB 142461.00 MB/sec
> # buffer_is_zero #1: 1KB 45423.59 MB/sec
> # buffer_is_zero #1: 4KB 91409.69 MB/sec
> # buffer_is_zero #1: 16KB 123819.94 MB/sec
> # buffer_is_zero #1: 64KB 71173.75 MB/sec
> # buffer_is_zero #2: 1KB 35465.03 MB/sec
> # buffer_is_zero #2: 4KB 56110.46 MB/sec
> # buffer_is_zero #2: 16KB 68852.28 MB/sec
> # buffer_is_zero #2: 64KB 39043.80 MB/sec

Totally nit-picking, but it would be easier to read with a little
alignment and blank lines:

 # buffer_is_zero #0:  1KB  49227.29 MB/sec
 # buffer_is_zero #0:  4KB 137461.28 MB/sec
 # buffer_is_zero #0: 16KB 224220.41 MB/sec
 # buffer_is_zero #0: 64KB 142461.00 MB/sec
 
 # buffer_is_zero #1:  1KB  45423.59 MB/sec
 # buffer_is_zero #1:  4KB  91409.69 MB/sec
 # buffer_is_zero #1: 16KB 123819.94 MB/sec
 # buffer_is_zero #1: 64KB  71173.75 MB/sec
 
 # buffer_is_zero #2:  1KB  35465.03 MB/sec
 # buffer_is_zero #2:  4KB  56110.46 MB/sec
 # buffer_is_zero #2: 16KB  68852.28 MB/sec
 # buffer_is_zero #2: 64KB  39043.80 MB/sec
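
One way to get that alignment is fixed field widths in the format string. The sketch below is an assumption about how the benchmark could print its results, not the code from the patch; format_result() and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one result line with a right-aligned 2-column size and a
 * right-aligned 9-column throughput. Returns a pointer to a static
 * buffer, so the result is only valid until the next call.
 */
static const char *format_result(int fn_index, size_t kib, double mb_per_sec)
{
    static char buf[80];

    snprintf(buf, sizeof(buf), "# buffer_is_zero #%d: %2zuKB %9.2f MB/sec",
             fn_index, kib, mb_per_sec);
    return buf;
}
```

The widths 2 and 9 are chosen to fit the largest values in the table above (64 and 224220.41); a blank line between groups would be emitted separately whenever fn_index changes.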

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH RFCv2 6/8] backends/iommufd: Add ability to disable hugepages

2024-02-19 Thread Avihai Horon


Hi Joao,

On 12/02/2024 15:56, Joao Martins wrote:


Allow disabling hugepages so that dirty tracking is done at base page
granularity, in a similar vein to vfio_type1_iommu.disable_hugepages
but per IOAS.

Signed-off-by: Joao Martins 
---
  backends/iommufd.c   | 36 
  backends/trace-events|  1 +
  hw/vfio/iommufd.c|  4 
  include/sysemu/iommufd.h |  4 
  qapi/qom.json|  2 +-
  5 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index dd676d493c37..72fd98a9a50c 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -29,6 +29,7 @@ static void iommufd_backend_init(Object *obj)
  be->fd = -1;
  be->users = 0;
  be->owned = true;
+be->hugepages = 1;
  }

  static void iommufd_backend_finalize(Object *obj)
@@ -63,6 +64,14 @@ static bool iommufd_backend_can_be_deleted(UserCreatable *uc)
  return !be->users;
  }

+static void iommufd_backend_set_hugepages(Object *obj, bool enabled,
+  Error **errp)
+{
+IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
+
+be->hugepages = enabled;
+}
+
  static void iommufd_backend_class_init(ObjectClass *oc, void *data)
  {
  UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
@@ -70,6 +79,11 @@ static void iommufd_backend_class_init(ObjectClass *oc, void 
*data)
  ucc->can_be_deleted = iommufd_backend_can_be_deleted;

  object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd);
+
+object_class_property_add_bool(oc, "hugepages", NULL,
+   iommufd_backend_set_hugepages);
+object_class_property_set_description(oc, "hugepages",
+  "Set to 'off' to disable hugepages");
  }

  int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
@@ -106,6 +120,28 @@ out:
  trace_iommufd_backend_disconnect(be->fd, be->users);
  }

+int iommufd_backend_set_option(int fd, uint32_t object_id,
+   uint32_t option_id, uint64_t val64)
+{
+int ret;
+struct iommu_option option = {
+.size = sizeof(option),
+.option_id = option_id,
+.op = IOMMU_OPTION_OP_SET,
+.val64 = val64,
+.object_id = object_id,
+};
+
+ret = ioctl(fd, IOMMU_OPTION, &option);
+if (ret) {
+error_report("Failed to set option %x to value %"PRIx64" %m", 
option_id,
+ val64);
+}
+trace_iommufd_backend_set_option(fd, object_id, option_id, val64, ret);
+
+return ret;
+}
+
  int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
 Error **errp)
  {
diff --git a/backends/trace-events b/backends/trace-events
index 11a27cb114b6..076166552881 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -15,6 +15,7 @@ iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t 
ioas, uint64_t iova, u
  iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " 
iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
  iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t 
hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d 
dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u 
(%d)"
  iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d 
ioas=%d (%d)"
+iommufd_backend_set_option(int iommufd, uint32_t object_id, uint32_t option_id, uint64_t val, int 
ret) " iommufd=%d object_id=%u option_id=%u val64=0x%"PRIx64" (%d)"
  iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d 
(%d)"
  iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " 
iommufd=%d hwpt=%d enable=%d (%d)"
  iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int 
ret) " iommufd=%d hwpt=%d iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" 
(%d)"
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 79b13bd262cc..697d40841d7f 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -521,6 +521,10 @@ static int iommufd_cdev_attach(const char *name, 
VFIODevice *vbasedev,
  goto err_alloc_ioas;
  }

+if (!vbasedev->iommufd_dev.iommufd->hugepages) {
+iommufd_backend_set_option(vbasedev->iommufd_dev.iommufd->fd, ioas_id,
+   IOMMU_OPTION_HUGE_PAGES, 0);


Shouldn't we fail device attach if iommufd_backend_set_option() fails?

Thanks.


+}
  trace_iommufd_cdev_alloc_ioas(vbasedev->iommufd_dev.iommufd->fd, ioas_id);

  container = g_malloc0(sizeof(*container));
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index ba19b7ea4c19..bc6607e3d444 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@

Re: [PATCH RFCv2 7/8] vfio/migration: Don't block migration device dirty tracking is unsupported

2024-02-19 Thread Avihai Horon


Hi Joao,

On 12/02/2024 15:56, Joao Martins wrote:


By default VFIO migration is set to auto, which will support live
migration if the migration capability is set *and* also dirty page
tracking is supported.

For testing purposes one can force enable without dirty page tracking
via enable-migration=on, but that option is generally left for testing
purposes.

So starting with IOMMU dirty tracking, it can be used to accommodate the
lack of VF dirty page tracking, allowing us to minimize the VF requirements
for migration and thus enable migration by default for those devices.

Signed-off-by: Joao Martins 
---
  hw/vfio/iommufd.c| 3 +--
  hw/vfio/migration.c  | 4 +++-
  include/sysemu/iommufd.h | 1 +
  3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 697d40841d7f..78d8f4391b68 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -275,8 +275,7 @@ static int iommufd_cdev_detach_ioas_hwpt(VFIODevice 
*vbasedev, Error **errp)
  return ret;
  }

-static bool iommufd_dirty_pages_supported(IOMMUFDDevice *iommufd_dev,
-  Error **errp)
+bool iommufd_dirty_pages_supported(IOMMUFDDevice *iommufd_dev, Error **errp)
  {
  uint64_t caps;
  int r;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 70e6b1a709f9..674e76b3f3df 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -938,7 +938,9 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error 
**errp)
  return !vfio_block_migration(vbasedev, err, errp);
  }

-if (!vbasedev->dirty_pages_supported) {
+if (!vbasedev->dirty_pages_supported &&
+(vbasedev->iommufd_dev.iommufd &&


Shouldn't we check the type of base_hdev instead?


+ !iommufd_dirty_pages_supported(&vbasedev->iommufd_dev, &err))) {


Maybe we can store IOMMUFD DPT support in iommufd_dev and use it instead 
of querying it here?


Thanks.


  if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
  error_setg(&err,
 "%s: VFIO device doesn't support device dirty 
tracking",
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index bc6607e3d444..d6be49f2ac78 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -53,6 +53,7 @@ typedef struct IOMMUFDDevice {
  void iommufd_device_init(IOMMUFDDevice *idev);
  int iommufd_device_get_hw_capabilities(IOMMUFDDevice *idev, uint64_t *caps,
 Error **errp);
+bool iommufd_dirty_pages_supported(IOMMUFDDevice *idev, Error **errp);
  int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
 uint32_t pt_id, uint32_t flags,
 uint32_t data_type, uint32_t data_len,
--
2.39.3

[PATCH V2 0/1] Change the UEFI loading mode to loongarch

2024-02-19 Thread Xianglai Li

The UEFI loading mode in loongarch is very different
from that in other architectures: loongarch's UEFI code
is in rom, while other architectures' UEFI code is in flash.

loongarch UEFI can be loaded as follows:
-machine virt,pflash=pflash0-format
-bios ./QEMU_EFI.fd

Other architectures load UEFI using the following methods:
-machine virt,pflash0=pflash0-format,pflash1=pflash1-format

loongarch's UEFI loading method makes qemu and libvirt incompatible
when using NVRAM, and the cost of loongarch's current loading method
far outweighs the benefits, so we decided to use the same UEFI loading
scheme as other architectures.

V2:
Change the size of flash0 from 4M to 16M
Add Tested-by

Cc: Andrea Bolognani 
Cc: maob...@loongson.cn
Cc: Philippe Mathieu-Daudé 
Cc: Song Gao 
Cc: zhaotian...@loongson.cn

Xianglai Li (1):
  loongarch: Change the UEFI loading mode to loongarch

 hw/loongarch/acpi-build.c   |  29 +--
 hw/loongarch/virt.c | 101 ++--
 include/hw/loongarch/virt.h |  10 ++--
 3 files changed, 107 insertions(+), 33 deletions(-)

-- 
2.39.1

[PATCH V2 1/1] loongarch: Change the UEFI loading mode to loongarch

2024-02-19 Thread Xianglai Li

The UEFI loading mode in loongarch is very different
from that in other architectures: loongarch's UEFI code
is in rom, while other architectures' UEFI code is in flash.

loongarch UEFI can be loaded as follows:
-machine virt,pflash=pflash0-format
-bios ./QEMU_EFI.fd

Other architectures load UEFI using the following methods:
-machine virt,pflash0=pflash0-format,pflash1=pflash1-format

loongarch's UEFI loading method makes qemu and libvirt incompatible
when using NVRAM, and the cost of loongarch's current loading method
far outweighs the benefits, so we decided to use the same UEFI loading
scheme as other architectures.

Cc: Andrea Bolognani 
Cc: maob...@loongson.cn
Cc: Philippe Mathieu-Daudé 
Cc: Song Gao 
Cc: zhaotian...@loongson.cn
Signed-off-by: Xianglai Li 
Tested-by: Andrea Bolognani 
---
 hw/loongarch/acpi-build.c   |  29 +--
 hw/loongarch/virt.c | 101 ++--
 include/hw/loongarch/virt.h |  10 ++--
 3 files changed, 107 insertions(+), 33 deletions(-)

diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c
index a1c4198741..6c75f216ea 100644
--- a/hw/loongarch/acpi-build.c
+++ b/hw/loongarch/acpi-build.c
@@ -314,16 +314,39 @@ static void build_pci_device_aml(Aml *scope, 
LoongArchMachineState *lams)
 static void build_flash_aml(Aml *scope, LoongArchMachineState *lams)
 {
 Aml *dev, *crs;
+MemoryRegion *flash_mem;
 
-hwaddr flash_base = VIRT_FLASH_BASE;
-hwaddr flash_size = VIRT_FLASH_SIZE;
+hwaddr flash0_base;
+hwaddr flash0_size;
+
+hwaddr flash1_base;
+hwaddr flash1_size;
+
+flash_mem = pflash_cfi01_get_memory(lams->flash[0]);
+flash0_base = flash_mem->addr;
+flash0_size = flash_mem->size;
+
+flash_mem = pflash_cfi01_get_memory(lams->flash[1]);
+flash1_base = flash_mem->addr;
+flash1_size = flash_mem->size;
 
 dev = aml_device("FLS0");
 aml_append(dev, aml_name_decl("_HID", aml_string("LNRO0015")));
 aml_append(dev, aml_name_decl("_UID", aml_int(0)));
 
 crs = aml_resource_template();
-aml_append(crs, aml_memory32_fixed(flash_base, flash_size, 
AML_READ_WRITE));
+aml_append(crs, aml_memory32_fixed(flash0_base, flash0_size,
+   AML_READ_WRITE));
+aml_append(dev, aml_name_decl("_CRS", crs));
+aml_append(scope, dev);
+
+dev = aml_device("FLS1");
+aml_append(dev, aml_name_decl("_HID", aml_string("LNRO0015")));
+aml_append(dev, aml_name_decl("_UID", aml_int(1)));
+
+crs = aml_resource_template();
+aml_append(crs, aml_memory32_fixed(flash1_base, flash1_size,
+   AML_READ_WRITE));
 aml_append(dev, aml_name_decl("_CRS", crs));
 aml_append(scope, dev);
 }
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 0ad7d8c887..a7b9199e70 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -54,7 +54,9 @@ struct loaderparams {
 const char *initrd_filename;
 };
 
-static void virt_flash_create(LoongArchMachineState *lams)
+static PFlashCFI01 *virt_flash_create1(LoongArchMachineState *lams,
+   const char *name,
+   const char *alias_prop_name)
 {
 DeviceState *dev = qdev_new(TYPE_PFLASH_CFI01);
 
@@ -66,45 +68,78 @@ static void virt_flash_create(LoongArchMachineState *lams)
 qdev_prop_set_uint16(dev, "id1", 0x18);
 qdev_prop_set_uint16(dev, "id2", 0x00);
 qdev_prop_set_uint16(dev, "id3", 0x00);
-qdev_prop_set_string(dev, "name", "virt.flash");
-object_property_add_child(OBJECT(lams), "virt.flash", OBJECT(dev));
-object_property_add_alias(OBJECT(lams), "pflash",
+qdev_prop_set_string(dev, "name", name);
+object_property_add_child(OBJECT(lams), name, OBJECT(dev));
+object_property_add_alias(OBJECT(lams), alias_prop_name,
   OBJECT(dev), "drive");
+return PFLASH_CFI01(dev);
+}
 
-lams->flash = PFLASH_CFI01(dev);
+static void virt_flash_create(LoongArchMachineState *lams)
+{
+lams->flash[0] = virt_flash_create1(lams, "virt.flash0", "pflash0");
+lams->flash[1] = virt_flash_create1(lams, "virt.flash1", "pflash1");
 }
 
-static void virt_flash_map(LoongArchMachineState *lams,
-   MemoryRegion *sysmem)
+static void virt_flash_map1(PFlashCFI01 *flash,
+hwaddr base, hwaddr size,
+MemoryRegion *sysmem)
 {
-PFlashCFI01 *flash = lams->flash;
 DeviceState *dev = DEVICE(flash);
-hwaddr base = VIRT_FLASH_BASE;
-hwaddr size = VIRT_FLASH_SIZE;
+BlockBackend *blk;
+hwaddr real_size = size;
+
+blk = pflash_cfi01_get_blk(flash);
+if (blk) {
+real_size = blk_getlength(blk);
+assert(real_size && real_size <= size);
+}
 
-assert(QEMU_IS_ALIGNED(size, VIRT_FLASH_SECTOR_SIZE));
-assert(size / VIRT_FLASH_SECTOR_SIZE <= UINT32_MAX);
+assert(QEMU_IS_ALIGNED(real_size, VIRT_FLASH_SECTOR_SIZE

Re: [PATCH v3 2/3] hw/virtio: cleanup shared resources

2024-02-19 Thread Albert Esteve

On Thu, Feb 15, 2024 at 10:45 AM Albert Esteve  wrote:

>
>
> On Tue, Feb 6, 2024 at 12:11 AM Alex Bennée 
> wrote:
>
>> Albert Esteve  writes:
>>
>> > Ensure that we clean up all virtio shared
>> > resources when the vhost device is cleaned
>> > up (after a hot unplug, or a crash).
>> >
>> > To do so, we add a new function to the virtio_dmabuf
>> > API called `virtio_dmabuf_vhost_cleanup`, which
>> > loops through the table and removes all
>> > resources owned by the vhost device parameter.
>> >
>> > Also, add a test to verify that the new
>> > function in the API behaves as expected.
>> >
>> > Signed-off-by: Albert Esteve 
>> > Acked-by: Stefan Hajnoczi 
>> > ---
>> >  hw/display/virtio-dmabuf.c| 22 +
>> >  hw/virtio/vhost.c |  3 +++
>> >  include/hw/virtio/virtio-dmabuf.h | 10 ++
>> >  tests/unit/test-virtio-dmabuf.c   | 33 +++
>> >  4 files changed, 68 insertions(+)
>> >
>> > diff --git a/hw/display/virtio-dmabuf.c b/hw/display/virtio-dmabuf.c
>> > index 3dba4577ca..6688809777 100644
>> > --- a/hw/display/virtio-dmabuf.c
>> > +++ b/hw/display/virtio-dmabuf.c
>> > @@ -136,6 +136,28 @@ SharedObjectType virtio_object_type(const QemuUUID
>> *uuid)
>> >  return vso->type;
>> >  }
>> >
>> > +static bool virtio_dmabuf_resource_is_owned(gpointer key,
>> > +gpointer value,
>> > +gpointer dev)
>> > +{
>> > +VirtioSharedObject *vso;
>> > +
>> > +vso = (VirtioSharedObject *) value;
>> > +return vso->type == TYPE_VHOST_DEV && vso->value == dev;
>>
>> It's a bit surprising to see vso->value being an anonymous gpointer
>> rather than the proper type and a bit confusing between value and
>> vso->value.
>>
>>
> It is the signature required for this to be used with
> `g_hash_table_foreach_remove`.
> For the naming, the HashMap stores gpointers, that point to
> `VirtioSharedObject`, and
> these point to the underlying type (stored at `vso->value`). It may sound
> a bit confusing,
> but is a byproduct of the VirtioSharedObject indirection. Not sure which
> names could be
> more fit for this, but I'm open to change them.
>
>
>> > +}
>> > +
>> > +int virtio_dmabuf_vhost_cleanup(struct vhost_dev *dev)
>> > +{
>> > +int num_removed;
>> > +
>> > +g_mutex_lock(&lock);
>> > +num_removed = g_hash_table_foreach_remove(
>> > +resource_uuids, (GHRFunc) virtio_dmabuf_resource_is_owned,
>> dev);
>> > +g_mutex_unlock(&lock);
>>
>> I'll note if we used a QemuMutex for the lock we could:
>>
>>   - use WITH_QEMU_LOCK_GUARD(&lock) { }
>>   - enable QSP profiling for the lock
>>
>>
> Was not aware of these QemuMutex's. I wouldn't mind changing the mutex in
> this
> file in a different commit.
>

The problem is that the current lock is a global static, and `QemuMutex`
needs to be initialised by calling `qemu_mutex_init(&lock);`.

Maybe it can be initialised from vhost-user.c by adding a public function?
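
A rough sketch of that idea, using plain pthreads as a stand-in for QemuMutex (an assumption made to keep the example self-contained; it is not the QEMU API). The names shared_table_init(), shared_table_add(), init_calls and resource_count are hypothetical. The public init function is idempotent, so it would not matter which caller reaches it first.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-in for a global QemuMutex that needs explicit initialisation. */
static pthread_mutex_t lock;
static pthread_once_t lock_once = PTHREAD_ONCE_INIT;

/* Counts real initialisations; only here so the behaviour can be checked. */
static int init_calls;

/* Hypothetical stand-in for the state protected by the lock. */
static int resource_count;

static void lock_init_once(void)
{
    pthread_mutex_init(&lock, NULL);
    init_calls++;
}

/*
 * Hypothetical public entry point: safe to call from any user of the
 * table, any number of times. The mutex is initialised exactly once.
 */
static void shared_table_init(void)
{
    pthread_once(&lock_once, lock_init_once);
}

/* Example user: makes sure the lock exists before taking it. */
static int shared_table_add(void)
{
    int count;

    shared_table_init();
    pthread_mutex_lock(&lock);
    count = ++resource_count;
    pthread_mutex_unlock(&lock);
    return count;
}
```

Whether the explicit call belongs in vhost-user.c or the lazy call inside each accessor is preferable is a design choice; the sketch shows that both can coexist.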


>
>
>> > +
>> > +return num_removed;
>> > +}
>> > +
>> >  void virtio_free_resources(void)
>> >  {
>> >  g_mutex_lock(&lock);
>> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> > index 2c9ac79468..c5622eac14 100644
>> > --- a/hw/virtio/vhost.c
>> > +++ b/hw/virtio/vhost.c
>> > @@ -16,6 +16,7 @@
>> >  #include "qemu/osdep.h"
>> >  #include "qapi/error.h"
>> >  #include "hw/virtio/vhost.h"
>> > +#include "hw/virtio/virtio-dmabuf.h"
>> >  #include "qemu/atomic.h"
>> >  #include "qemu/range.h"
>> >  #include "qemu/error-report.h"
>> > @@ -1599,6 +1600,8 @@ void vhost_dev_cleanup(struct vhost_dev *hdev)
>> >  migrate_del_blocker(&hdev->migration_blocker);
>> >  g_free(hdev->mem);
>> >  g_free(hdev->mem_sections);
>> > +/* free virtio shared objects */
>> > +virtio_dmabuf_vhost_cleanup(hdev);
>> >  if (hdev->vhost_ops) {
>> >  hdev->vhost_ops->vhost_backend_cleanup(hdev);
>> >  }
>> > diff --git a/include/hw/virtio/virtio-dmabuf.h
>> b/include/hw/virtio/virtio-dmabuf.h
>> > index 627c3b6db7..73f70fb482 100644
>> > --- a/include/hw/virtio/virtio-dmabuf.h
>> > +++ b/include/hw/virtio/virtio-dmabuf.h
>> > @@ -91,6 +91,16 @@ struct vhost_dev *virtio_lookup_vhost_device(const
>> QemuUUID *uuid);
>> >   */
>> >  SharedObjectType virtio_object_type(const QemuUUID *uuid);
>> >
>> > +/**
>> > + * virtio_dmabuf_vhost_cleanup() - Destroys all entries of the shared
>> > + * resources lookup table that are owned by the vhost backend
>> > + * @dev: the pointer to the vhost device that owns the entries. Data
>> is owned
>> > + *   by the called of the function.
>> > + *
>> > + * Return: the number of resource entries removed.
>> > + */
>> > +int virtio_dmabuf_vhost_cleanup(struct vhost_dev *dev);
>> > +
>> >  /**
>> >   * virtio_free_resources() - Destroys all keys and values of the shared
>> >   * resources lookup table, and frees them
>> > diff --git a/tests/unit/test-virtio-dmabuf.c
>> b/tests/unit/test-virtio-

[PATCH 2/7] hw/ide: Split qdev.c into ide-bus.c and ide-dev.c

2024-02-19 Thread Thomas Huth

qdev.c is a mixture between IDE bus specific functions and IDE device
functions. Let's split it up to make it more obvious which part is
related to bus handling and which part is related to device handling.

Signed-off-by: Thomas Huth 
---
 hw/ide/ide-bus.c | 111 +++
 hw/ide/{qdev.c => ide-dev.c} |  87 +--
 hw/arm/Kconfig   |   2 +
 hw/ide/Kconfig   |  30 ++
 hw/ide/meson.build   |   3 +-
 5 files changed, 134 insertions(+), 99 deletions(-)
 create mode 100644 hw/ide/ide-bus.c
 rename hw/ide/{qdev.c => ide-dev.c} (78%)

diff --git a/hw/ide/ide-bus.c b/hw/ide/ide-bus.c
new file mode 100644
index 00..57fe67b29c
--- /dev/null
+++ b/hw/ide/ide-bus.c
@@ -0,0 +1,111 @@
+/*
+ * ide bus support for qdev.
+ *
+ * Copyright (c) 2009 Gerd Hoffmann 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu/module.h"
+#include "hw/ide/internal.h"
+#include "sysemu/block-backend.h"
+#include "sysemu/blockdev.h"
+#include "sysemu/runstate.h"
+
+static char *idebus_get_fw_dev_path(DeviceState *dev);
+static void idebus_unrealize(BusState *qdev);
+
+static void ide_bus_class_init(ObjectClass *klass, void *data)
+{
+BusClass *k = BUS_CLASS(klass);
+
+k->get_fw_dev_path = idebus_get_fw_dev_path;
+k->unrealize = idebus_unrealize;
+}
+
+static void idebus_unrealize(BusState *bus)
+{
+IDEBus *ibus = IDE_BUS(bus);
+
+if (ibus->vmstate) {
+qemu_del_vm_change_state_handler(ibus->vmstate);
+}
+}
+
+static const TypeInfo ide_bus_info = {
+.name = TYPE_IDE_BUS,
+.parent = TYPE_BUS,
+.instance_size = sizeof(IDEBus),
+.class_init = ide_bus_class_init,
+};
+
+void ide_bus_init(IDEBus *idebus, size_t idebus_size, DeviceState *dev,
+ int bus_id, int max_units)
+{
+qbus_init(idebus, idebus_size, TYPE_IDE_BUS, dev, NULL);
+idebus->bus_id = bus_id;
+idebus->max_units = max_units;
+}
+
+static char *idebus_get_fw_dev_path(DeviceState *dev)
+{
+char path[30];
+
+snprintf(path, sizeof(path), "%s@%x", qdev_fw_name(dev),
+ ((IDEBus *)dev->parent_bus)->bus_id);
+
+return g_strdup(path);
+}
+
+IDEDevice *ide_bus_create_drive(IDEBus *bus, int unit, DriveInfo *drive)
+{
+DeviceState *dev;
+
+dev = qdev_new(drive->media_cd ? "ide-cd" : "ide-hd");
+qdev_prop_set_uint32(dev, "unit", unit);
+qdev_prop_set_drive_err(dev, "drive", blk_by_legacy_dinfo(drive),
+&error_fatal);
+qdev_realize_and_unref(dev, &bus->qbus, &error_fatal);
+return DO_UPCAST(IDEDevice, qdev, dev);
+}
+
+int ide_get_geometry(BusState *bus, int unit,
+ int16_t *cyls, int8_t *heads, int8_t *secs)
+{
+IDEState *s = &DO_UPCAST(IDEBus, qbus, bus)->ifs[unit];
+
+if (s->drive_kind != IDE_HD || !s->blk) {
+return -1;
+}
+
+*cyls = s->cylinders;
+*heads = s->heads;
+*secs = s->sectors;
+return 0;
+}
+
+int ide_get_bios_chs_trans(BusState *bus, int unit)
+{
+return DO_UPCAST(IDEBus, qbus, bus)->ifs[unit].chs_trans;
+}
+
+static void ide_bus_register_type(void)
+{
+type_register_static(&ide_bus_info);
+}
+
+type_init(ide_bus_register_type)
diff --git a/hw/ide/qdev.c b/hw/ide/ide-dev.c
similarity index 78%
rename from hw/ide/qdev.c
rename to hw/ide/ide-dev.c
index 4189313d30..15d088fd06 100644
--- a/hw/ide/qdev.c
+++ b/hw/ide/ide-dev.c
@@ -1,5 +1,5 @@
 /*
- * ide bus support for qdev.
+ * IDE device functions
  *
  * Copyright (c) 2009 Gerd Hoffmann 
  *
@@ -18,71 +18,21 @@
  */
 
 #include "qemu/osdep.h"
-#include "sysemu/dma.h"
 #include "qapi/error.h"
 #include "qapi/qapi-types-block.h"
 #include "qemu/error-report.h"
-#include "qemu/main-loop.h"
 #include "qemu/module.h"
 #include "hw/ide/ide-dev.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/blockdev.h"
 #include "sysemu/sysemu.h"
-#include "sysemu/runstate.h"
 #include "qapi/visitor.h"
 
-/* - */
-
-static char *idebus_get_fw_dev_path(DeviceState *dev);
-static void idebus_unrealize(BusState *qdev);
-
 static Property ide_props[] = {
 DEFINE_PROP_UINT32("unit", IDEDevice, unit, -1),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-static void ide_bus_class_init(Obj

[PATCH 5/7] hw/ide: Move IDE DMA related definitions to a separate header ide-dma.h

2024-02-19 Thread Thomas Huth

These definitions are required outside of the hw/ide/ code, too,
so let's move them from internal.h to a new header called ide-dma.h.

Signed-off-by: Thomas Huth 
---
 include/hw/ide/ide-dma.h  | 30 ++
 include/hw/ide/internal.h | 27 +--
 2 files changed, 31 insertions(+), 26 deletions(-)
 create mode 100644 include/hw/ide/ide-dma.h

diff --git a/include/hw/ide/ide-dma.h b/include/hw/ide/ide-dma.h
new file mode 100644
index 00..fb82966bdd
--- /dev/null
+++ b/include/hw/ide/ide-dma.h
@@ -0,0 +1,30 @@
+#ifndef HW_IDE_DMA_H
+#define HW_IDE_DMA_H
+
+typedef void DMAStartFunc(const IDEDMA *, IDEState *, BlockCompletionFunc *);
+typedef void DMAVoidFunc(const IDEDMA *);
+typedef int DMAIntFunc(const IDEDMA *, bool);
+typedef int32_t DMAInt32Func(const IDEDMA *, int32_t len);
+typedef void DMAu32Func(const IDEDMA *, uint32_t);
+typedef void DMAStopFunc(const IDEDMA *, bool);
+
+struct IDEDMAOps {
+DMAStartFunc *start_dma;
+DMAVoidFunc *pio_transfer;
+DMAInt32Func *prepare_buf;
+DMAu32Func *commit_buf;
+DMAIntFunc *rw_buf;
+DMAVoidFunc *restart;
+DMAVoidFunc *restart_dma;
+DMAStopFunc *set_inactive;
+DMAVoidFunc *cmd_done;
+DMAVoidFunc *reset;
+};
+
+struct IDEDMA {
+const struct IDEDMAOps *ops;
+QEMUIOVector qiov;
+BlockAIOCB *aiocb;
+};
+
+#endif
diff --git a/include/hw/ide/internal.h b/include/hw/ide/internal.h
index 642bd1a979..d1d3fcd23a 100644
--- a/include/hw/ide/internal.h
+++ b/include/hw/ide/internal.h
@@ -9,6 +9,7 @@
 
 #include "hw/ide.h"
 #include "hw/ide/ide-bus.h"
+#include "hw/ide/ide-dma.h"
 
 /* debug IDE devices */
 #define USE_DMA_CDROM
@@ -316,13 +317,6 @@
 #define SMART_DISABLE 0xd9
 #define SMART_STATUS  0xda
 
-typedef void DMAStartFunc(const IDEDMA *, IDEState *, BlockCompletionFunc *);
-typedef void DMAVoidFunc(const IDEDMA *);
-typedef int DMAIntFunc(const IDEDMA *, bool);
-typedef int32_t DMAInt32Func(const IDEDMA *, int32_t len);
-typedef void DMAu32Func(const IDEDMA *, uint32_t);
-typedef void DMAStopFunc(const IDEDMA *, bool);
-
 extern const char *IDE_DMA_CMD_lookup[IDE_DMA__COUNT];
 
 extern const MemoryRegionPortio ide_portio_list[];
@@ -340,25 +334,6 @@ typedef struct IDEBufferedRequest {
 bool orphaned;
 } IDEBufferedRequest;
 
-struct IDEDMAOps {
-DMAStartFunc *start_dma;
-DMAVoidFunc *pio_transfer;
-DMAInt32Func *prepare_buf;
-DMAu32Func *commit_buf;
-DMAIntFunc *rw_buf;
-DMAVoidFunc *restart;
-DMAVoidFunc *restart_dma;
-DMAStopFunc *set_inactive;
-DMAVoidFunc *cmd_done;
-DMAVoidFunc *reset;
-};
-
-struct IDEDMA {
-const struct IDEDMAOps *ops;
-QEMUIOVector qiov;
-BlockAIOCB *aiocb;
-};
-
 /* These are used for the error_status field of IDEBus */
 #define IDE_RETRY_MASK 0xf8
 #define IDE_RETRY_DMA  0x08
-- 
2.43.2

[PATCH 3/7] hw/ide: Move IDE device related definitions to ide-dev.h

2024-02-19 Thread Thomas Huth

Let's start to unentangle internal.h by moving public IDE device
related definitions to ide-dev.h.

Signed-off-by: Thomas Huth 
---
 include/hw/ide/ide-dev.h  | 145 +-
 include/hw/ide/internal.h | 145 +-
 hw/ide/ide-dev.c  |   1 +
 3 files changed, 146 insertions(+), 145 deletions(-)

diff --git a/include/hw/ide/ide-dev.h b/include/hw/ide/ide-dev.h
index 7e9663cda9..de88784a25 100644
--- a/include/hw/ide/ide-dev.h
+++ b/include/hw/ide/ide-dev.h
@@ -20,9 +20,152 @@
 #ifndef IDE_DEV_H
 #define IDE_DEV_H
 
+#include "sysemu/dma.h"
 #include "hw/qdev-properties.h"
 #include "hw/block/block.h"
-#include "hw/ide/internal.h"
+
+typedef struct IDEDevice IDEDevice;
+typedef struct IDEState IDEState;
+typedef struct IDEDMA IDEDMA;
+typedef struct IDEDMAOps IDEDMAOps;
+typedef struct IDEBus IDEBus;
+
+typedef void EndTransferFunc(IDEState *);
+
+#define MAX_IDE_DEVS 2
+
+#define TYPE_IDE_DEVICE "ide-device"
+OBJECT_DECLARE_TYPE(IDEDevice, IDEDeviceClass, IDE_DEVICE)
+
+typedef enum { IDE_HD, IDE_CD, IDE_CFATA } IDEDriveKind;
+
+struct unreported_events {
+bool eject_request;
+bool new_media;
+};
+
+enum ide_dma_cmd {
+IDE_DMA_READ = 0,
+IDE_DMA_WRITE,
+IDE_DMA_TRIM,
+IDE_DMA_ATAPI,
+IDE_DMA__COUNT
+};
+
+/* NOTE: IDEState represents in fact one drive */
+struct IDEState {
+IDEBus *bus;
+uint8_t unit;
+/* ide config */
+IDEDriveKind drive_kind;
+int drive_heads, drive_sectors;
+int cylinders, heads, sectors, chs_trans;
+int64_t nb_sectors;
+int mult_sectors;
+int identify_set;
+uint8_t identify_data[512];
+int drive_serial;
+char drive_serial_str[21];
+char drive_model_str[41];
+uint64_t wwn;
+/* ide regs */
+uint8_t feature;
+uint8_t error;
+uint32_t nsector;
+uint8_t sector;
+uint8_t lcyl;
+uint8_t hcyl;
+/* other part of tf for lba48 support */
+uint8_t hob_feature;
+uint8_t hob_nsector;
+uint8_t hob_sector;
+uint8_t hob_lcyl;
+uint8_t hob_hcyl;
+
+uint8_t select;
+uint8_t status;
+
+bool io8;
+bool reset_reverts;
+
+/* set for lba48 access */
+uint8_t lba48;
+BlockBackend *blk;
+char version[9];
+/* ATAPI specific */
+struct unreported_events events;
+uint8_t sense_key;
+uint8_t asc;
+bool tray_open;
+bool tray_locked;
+uint8_t cdrom_changed;
+int packet_transfer_size;
+int elementary_transfer_size;
+int32_t io_buffer_index;
+int lba;
+int cd_sector_size;
+int atapi_dma; /* true if dma is requested for the packet cmd */
+BlockAcctCookie acct;
+BlockAIOCB *pio_aiocb;
+QEMUIOVector qiov;
+QLIST_HEAD(, IDEBufferedRequest) buffered_requests;
+/* ATA DMA state */
+uint64_t io_buffer_offset;
+int32_t io_buffer_size;
+QEMUSGList sg;
+/* PIO transfer handling */
+int req_nb_sectors; /* number of sectors per interrupt */
+EndTransferFunc *end_transfer_func;
+uint8_t *data_ptr;
+uint8_t *data_end;
+uint8_t *io_buffer;
+/* PIO save/restore */
+int32_t io_buffer_total_len;
+int32_t cur_io_buffer_offset;
+int32_t cur_io_buffer_len;
+uint8_t end_transfer_fn_idx;
+QEMUTimer *sector_write_timer; /* only used for win2k install hack */
+uint32_t irq_count; /* counts IRQs when using win2k install hack */
+/* CF-ATA extended error */
+uint8_t ext_error;
+/* CF-ATA metadata storage */
+uint32_t mdata_size;
+uint8_t *mdata_storage;
+int media_changed;
+enum ide_dma_cmd dma_cmd;
+/* SMART */
+uint8_t smart_enabled;
+uint8_t smart_autosave;
+int smart_errors;
+uint8_t smart_selftest_count;
+uint8_t *smart_selftest_data;
+/* AHCI */
+int ncq_queues;
+};
+
+struct IDEDeviceClass {
+DeviceClass parent_class;
+void (*realize)(IDEDevice *dev, Error **errp);
+};
+
+struct IDEDevice {
+DeviceState qdev;
+uint32_t unit;
+BlockConf conf;
+int chs_trans;
+char *version;
+char *serial;
+char *model;
+uint64_t wwn;
+/*
+ * 0x0000        - rotation rate not reported
+ * 0x0001        - non-rotating medium (SSD)
+ * 0x0002-0x0400 - reserved
+ * 0x0401-0xffe  - rotations per minute
+ * 0xffff        - reserved
+ */
+uint16_t rotation_rate;
+};
 
 typedef struct IDEDrive {
 IDEDevice dev;
diff --git a/include/hw/ide/internal.h b/include/hw/ide/internal.h
index 3bdcc75597..5cc109fe82 100644
--- a/include/hw/ide/internal.h
+++ b/include/hw/ide/internal.h
@@ -8,24 +8,16 @@
  */
 
 #include "hw/ide.h"
-#include "sysemu/dma.h"
-#include "hw/block/block.h"
 #include "exec/ioport.h"
+#include "hw/ide/ide-dev.h"
 
 /* debug IDE devices */
 #define USE_DMA_CDROM
 #include "qom/object.h"
 
-typedef struct IDEDevice IDEDevice;
-typedef struct IDEState IDEState;
-typedef struct IDEDMA IDEDMA;
-typedef struct IDEDMAOps IDEDMAOps;
-
 #define TYPE_IDE_BUS "I

[PATCH 4/7] hw/ide: Move IDE bus related definitions to a new header ide-bus.h

2024-02-19 Thread Thomas Huth

Let's consolidate the public IDE bus related functions in a separate
header.

Signed-off-by: Thomas Huth 
---
 include/hw/ide/ide-bus.h  | 41 +++
 include/hw/ide/internal.h | 38 +---
 2 files changed, 42 insertions(+), 37 deletions(-)
 create mode 100644 include/hw/ide/ide-bus.h

diff --git a/include/hw/ide/ide-bus.h b/include/hw/ide/ide-bus.h
new file mode 100644
index 00..e0460700ed
--- /dev/null
+++ b/include/hw/ide/ide-bus.h
@@ -0,0 +1,41 @@
+#ifndef HW_IDE_BUS_H
+#define HW_IDE_BUS_H
+
+#include "exec/ioport.h"
+#include "hw/ide/ide-dev.h"
+
+struct IDEBus {
+BusState qbus;
+IDEDevice *master;
+IDEDevice *slave;
+IDEState ifs[2];
+QEMUBH *bh;
+
+int bus_id;
+int max_units;
+IDEDMA *dma;
+uint8_t unit;
+uint8_t cmd;
+qemu_irq irq; /* bus output */
+
+int error_status;
+uint8_t retry_unit;
+int64_t retry_sector_num;
+uint32_t retry_nsector;
+PortioList portio_list;
+PortioList portio2_list;
+VMChangeStateEntry *vmstate;
+};
+
+#define TYPE_IDE_BUS "IDE"
+OBJECT_DECLARE_SIMPLE_TYPE(IDEBus, IDE_BUS)
+
+void ide_bus_init(IDEBus *idebus, size_t idebus_size, DeviceState *dev,
+  int bus_id, int max_units);
+IDEDevice *ide_bus_create_drive(IDEBus *bus, int unit, DriveInfo *drive);
+
+int ide_get_geometry(BusState *bus, int unit,
+ int16_t *cyls, int8_t *heads, int8_t *secs);
+int ide_get_bios_chs_trans(BusState *bus, int unit);
+
+#endif
diff --git a/include/hw/ide/internal.h b/include/hw/ide/internal.h
index 5cc109fe82..642bd1a979 100644
--- a/include/hw/ide/internal.h
+++ b/include/hw/ide/internal.h
@@ -8,16 +8,12 @@
  */
 
 #include "hw/ide.h"
-#include "exec/ioport.h"
-#include "hw/ide/ide-dev.h"
+#include "hw/ide/ide-bus.h"
 
 /* debug IDE devices */
 #define USE_DMA_CDROM
 #include "qom/object.h"
 
-#define TYPE_IDE_BUS "IDE"
-OBJECT_DECLARE_SIMPLE_TYPE(IDEBus, IDE_BUS)
-
 /* Device/Head ("select") Register */
 #define ATA_DEV_SELECT  0x10
 /* ATA1,3: Defined as '1'.
@@ -363,29 +359,6 @@ struct IDEDMA {
 BlockAIOCB *aiocb;
 };
 
-struct IDEBus {
-BusState qbus;
-IDEDevice *master;
-IDEDevice *slave;
-IDEState ifs[2];
-QEMUBH *bh;
-
-int bus_id;
-int max_units;
-IDEDMA *dma;
-uint8_t unit;
-uint8_t cmd;
-qemu_irq irq; /* bus output */
-
-int error_status;
-uint8_t retry_unit;
-int64_t retry_sector_num;
-uint32_t retry_nsector;
-PortioList portio_list;
-PortioList portio2_list;
-VMChangeStateEntry *vmstate;
-};
-
 /* These are used for the error_status field of IDEBus */
 #define IDE_RETRY_MASK 0xf8
 #define IDE_RETRY_DMA  0x08
@@ -502,15 +475,6 @@ void ide_cancel_dma_sync(IDEState *s);
 void ide_atapi_cmd(IDEState *s);
 void ide_atapi_cmd_reply_end(IDEState *s);
 
-/* hw/ide/qdev.c */
-void ide_bus_init(IDEBus *idebus, size_t idebus_size, DeviceState *dev,
-  int bus_id, int max_units);
-IDEDevice *ide_bus_create_drive(IDEBus *bus, int unit, DriveInfo *drive);
-
-int ide_get_geometry(BusState *bus, int unit,
- int16_t *cyls, int8_t *heads, int8_t *secs);
-int ide_get_bios_chs_trans(BusState *bus, int unit);
-
 int ide_handle_rw_error(IDEState *s, int error, int op);
 
 #endif /* HW_IDE_INTERNAL_H */
-- 
2.43.2

[PATCH 0/7] hw/ide: Clean up hw/ide/qdev.c and include/hw/ide/internal.h

2024-02-19 Thread Thomas Huth

While trying to make it possible to compile-out the CompactFlash IDE device
in downstream distributions (first patch), we noticed that there are more
things in the IDE code that could use a proper clean up:

First, hw/ide/qdev.c is quite a mix between IDE BUS specific functions
and (disk) device specific functions. Thus the second patch splits qdev.c
into two new separate files to make it more obvious which part belongs
to which kind of devices.

The remaining patches unentangle include/hw/ide/internal.h, which is meant
as a header that should only be used internally to the IDE subsystem, but
which is currently exposed to the world since include/hw/ide/pci.h includes
this header, too. Thus we move the definitions that are also required for
non-IDE code to other new header files, so we can finally change pci.h to
stop including internal.h. After these changes, internal.h is only included
by files in hw/ide/ as it should be.

Thomas Huth (7):
  hw/ide: Add the possibility to disable the CompactFlash device in the
build
  hw/ide: Split qdev.c into ide-bus.c and ide-dev.c
  hw/ide: Move IDE device related definitions to ide-dev.h
  hw/ide: Move IDE bus related definitions to a new header ide-bus.h
  hw/ide: Move IDE DMA related definitions to a separate header
ide-dma.h
  hw/ide: Remove the include/hw/ide.h legacy file
  hw/ide: Stop exposing internal.h to non-IDE files

 include/hw/ide.h |   9 --
 include/hw/ide/ide-bus.h |  41 +++
 include/hw/ide/ide-dev.h | 186 +++
 include/hw/ide/ide-dma.h |  30 +
 include/hw/ide/internal.h| 209 +--
 include/hw/ide/pci.h |   3 +-
 hw/i386/pc.c |   2 +-
 hw/ide/cf.c  |  58 ++
 hw/ide/cmd646.c  |   1 +
 hw/ide/ide-bus.c | 111 +++
 hw/ide/{qdev.c => ide-dev.c} | 137 +--
 hw/ide/pci.c |   1 +
 hw/ide/piix.c|   1 +
 hw/ide/sii3112.c |   1 +
 hw/ide/via.c |   1 +
 hw/arm/Kconfig   |   2 +
 hw/ide/Kconfig   |  32 --
 hw/ide/meson.build   |   4 +-
 18 files changed, 465 insertions(+), 364 deletions(-)
 delete mode 100644 include/hw/ide.h
 create mode 100644 include/hw/ide/ide-bus.h
 create mode 100644 include/hw/ide/ide-dev.h
 create mode 100644 include/hw/ide/ide-dma.h
 create mode 100644 hw/ide/cf.c
 create mode 100644 hw/ide/ide-bus.c
 rename hw/ide/{qdev.c => ide-dev.c} (67%)

-- 
2.43.2

Re: [RFC PATCH 1/5] cxl/core: correct length of DPA field masks

2024-02-19 Thread Shiyang Ruan via





On 2024/2/10 14:34, Dan Williams wrote:

Shiyang Ruan wrote:

The length of Physical Address in General Media Event Record/DRAM Event
Record is 64-bit, so the field mask should be defined as such length.


Can you include the user-visible side effect of this change? It looks like
this could cause uses of CXL_DPA_FLAGS_MASK to return an incorrect
result?


Ok.  Will change it to this:

The length of Physical Address in General Media Event Record/DRAM Event
Record is 64 bits, per CXL Spec r3.0 - 8.2.9.2.1.1, Table 8-43.  Currently
CXL_DPA_FLAGS_MASK is defined as an int (32 bits), so CXL_DPA_MASK is an int
too: it will be 0xFFFFFFC0 when "->dpa & CXL_DPA_MASK" is used to obtain
the real physical address (dropping the flags in the lower bits), and in
that case the upper 32 bits of ->dpa will be lost.

To avoid this, define CXL_DPA_FLAGS_MASK as 64 bits: 0x3FULL.


--
Thanks,
Ruan.





Signed-off-by: Shiyang Ruan 
---
  drivers/cxl/core/trace.h | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 89445435303a..388a87d972c2 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -253,11 +253,11 @@ TRACE_EVENT(cxl_generic_event,
   * DRAM Event Record
   * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
   */
-#define CXL_DPA_FLAGS_MASK 0x3F
+#define CXL_DPA_FLAGS_MASK 0x3FULL
  #define CXL_DPA_MASK  (~CXL_DPA_FLAGS_MASK)
  
-#define CXL_DPA_VOLATILE			BIT(0)
-#define CXL_DPA_NOT_REPAIRABLE BIT(1)
+#define CXL_DPA_VOLATILE   BIT_ULL(0)
+#define CXL_DPA_NOT_REPAIRABLE BIT_ULL(1)
  #define show_dpa_flags(flags) __print_flags(flags, "|",\
{ CXL_DPA_VOLATILE, "VOLATILE"}, \
{ CXL_DPA_NOT_REPAIRABLE,   "NOT_REPAIRABLE"  }  \
--
2.34.1
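
As a side note on the mask-width discussion in this message, here is a
minimal, self-contained C sketch of how the width of a flags mask affects
"dpa & ~mask". The TOY_* macros and toy_* functions are hypothetical and are
not part of the CXL driver. The sketch assumes the narrow mask ends up as a
32-bit unsigned value; a plain signed int constant would instead be
sign-extended when combined with a 64-bit operand, so whether truncation
really occurs depends on the exact types involved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the masks under discussion (not the kernel macros). */
#define TOY_FLAGS_MASK_U32 0x3FU    /* 32-bit unsigned mask (assumption) */
#define TOY_FLAGS_MASK_U64 0x3FULL  /* 64-bit mask, the width the patch uses */

/* ~0x3FU is 0xFFFFFFC0 and zero-extends, so bits 63..32 of dpa are cleared too. */
static uint64_t toy_strip_flags_u32(uint64_t dpa)
{
    return dpa & ~TOY_FLAGS_MASK_U32;
}

/* ~0x3FULL is 0xFFFFFFFFFFFFFFC0, so only the six flag bits are cleared. */
static uint64_t toy_strip_flags_u64(uint64_t dpa)
{
    return dpa & ~TOY_FLAGS_MASK_U64;
}

/* Usage example: a DPA above 4 GiB with some of the low flag bits set. */
static void toy_check_mask_widths(void)
{
    uint64_t dpa = 0x123456789FULL;

    assert(toy_strip_flags_u32(dpa) == 0x34567880ULL);   /* upper bits lost */
    assert(toy_strip_flags_u64(dpa) == 0x1234567880ULL); /* upper bits kept */
}
```

Spelling the constant with an explicit ULL suffix removes the dependence on
integer promotion rules, which is probably the main practical benefit of the
change regardless of how the narrower constant would have been promoted.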

[PATCH 1/7] hw/ide: Add the possibility to disable the CompactFlash device in the build

2024-02-19 Thread Thomas Huth

For distros like downstream RHEL, it would be helpful to allow to disable
the CompactFlash device. For making this possible, we need a separate
Kconfig switch for this device, and the code should reside in a separate
file. Let's also introduce a new header ide-dev.h which can be used to
collect definitions related to IDE devices.

Signed-off-by: Thomas Huth 
---
 include/hw/ide/ide-dev.h | 41 
 hw/ide/cf.c  | 58 
 hw/ide/qdev.c| 51 ++-
 hw/ide/Kconfig   |  4 +++
 hw/ide/meson.build   |  1 +
 5 files changed, 106 insertions(+), 49 deletions(-)
 create mode 100644 include/hw/ide/ide-dev.h
 create mode 100644 hw/ide/cf.c

diff --git a/include/hw/ide/ide-dev.h b/include/hw/ide/ide-dev.h
new file mode 100644
index 00..7e9663cda9
--- /dev/null
+++ b/include/hw/ide/ide-dev.h
@@ -0,0 +1,41 @@
+/*
+ * ide device definitions
+ *
+ * Copyright (c) 2009 Gerd Hoffmann 
+ *
+ * This code is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IDE_DEV_H
+#define IDE_DEV_H
+
+#include "hw/qdev-properties.h"
+#include "hw/block/block.h"
+#include "hw/ide/internal.h"
+
+typedef struct IDEDrive {
+IDEDevice dev;
+} IDEDrive;
+
+#define DEFINE_IDE_DEV_PROPERTIES() \
+DEFINE_BLOCK_PROPERTIES(IDEDrive, dev.conf),\
+DEFINE_BLOCK_ERROR_PROPERTIES(IDEDrive, dev.conf),  \
+DEFINE_PROP_STRING("ver",  IDEDrive, dev.version),  \
+DEFINE_PROP_UINT64("wwn",  IDEDrive, dev.wwn, 0),   \
+DEFINE_PROP_STRING("serial",  IDEDrive, dev.serial),\
+DEFINE_PROP_STRING("model", IDEDrive, dev.model)
+
+void ide_dev_initfn(IDEDevice *dev, IDEDriveKind kind, Error **errp);
+
+#endif
diff --git a/hw/ide/cf.c b/hw/ide/cf.c
new file mode 100644
index 00..2a425cb0f2
--- /dev/null
+++ b/hw/ide/cf.c
@@ -0,0 +1,58 @@
+/*
+ * ide CompactFlash support
+ *
+ * This code is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/ide/ide-dev.h"
+#include "qapi/qapi-types-block.h"
+
+static void ide_cf_realize(IDEDevice *dev, Error **errp)
+{
+ide_dev_initfn(dev, IDE_CFATA, errp);
+}
+
+static Property ide_cf_properties[] = {
+DEFINE_IDE_DEV_PROPERTIES(),
+DEFINE_BLOCK_CHS_PROPERTIES(IDEDrive, dev.conf),
+DEFINE_PROP_BIOS_CHS_TRANS("bios-chs-trans",
+IDEDrive, dev.chs_trans, BIOS_ATA_TRANSLATION_AUTO),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void ide_cf_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+IDEDeviceClass *k = IDE_DEVICE_CLASS(klass);
+
+k->realize  = ide_cf_realize;
+dc->fw_name = "drive";
+dc->desc= "virtual CompactFlash card";
+device_class_set_props(dc, ide_cf_properties);
+}
+
+static const TypeInfo ide_cf_info = {
+.name  = "ide-cf",
+.parent= TYPE_IDE_DEVICE,
+.instance_size = sizeof(IDEDrive),
+.class_init= ide_cf_class_init,
+};
+
+static void ide_cf_register_type(void)
+{
+type_register_static(&ide_cf_info);
+}
+
+type_init(ide_cf_register_type)
diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
index 1b3b4da01d..4189313d30 100644
--- a/hw/ide/qdev.c
+++ b/hw/ide/qdev.c
@@ -24,12 +24,9 @@
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
-#include "hw/ide/internal.h"
-#include "hw/qdev-properties.h"
-#include "hw/qdev-properties-system.h"
+#include "hw/ide/ide-dev.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/blockdev.h"
-#include "hw/block/block.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/runstate.h"
 #include "qapi/visitor.h"
@@ -158,11 +155,7 @@ int ide_get_bios_chs_trans(BusState *bus, int unit)
 
 /*

[PATCH 6/7] hw/ide: Remove the include/hw/ide.h legacy file

2024-02-19 Thread Thomas Huth

There was only one prototype left in this legacy file. Move it to
ide-dev.h to finally get rid of it.

Signed-off-by: Thomas Huth 
---
 include/hw/ide.h  | 9 -
 include/hw/ide/ide-dev.h  | 2 ++
 include/hw/ide/internal.h | 1 -
 3 files changed, 2 insertions(+), 10 deletions(-)
 delete mode 100644 include/hw/ide.h

diff --git a/include/hw/ide.h b/include/hw/ide.h
deleted file mode 100644
index db963bdb77..00
--- a/include/hw/ide.h
+++ /dev/null
@@ -1,9 +0,0 @@
-#ifndef HW_IDE_H
-#define HW_IDE_H
-
-#include "exec/memory.h"
-
-/* ide/core.c */
-void ide_drive_get(DriveInfo **hd, int max_bus);
-
-#endif /* HW_IDE_H */
diff --git a/include/hw/ide/ide-dev.h b/include/hw/ide/ide-dev.h
index de88784a25..ad55997442 100644
--- a/include/hw/ide/ide-dev.h
+++ b/include/hw/ide/ide-dev.h
@@ -181,4 +181,6 @@ typedef struct IDEDrive {
 
 void ide_dev_initfn(IDEDevice *dev, IDEDriveKind kind, Error **errp);
 
+void ide_drive_get(DriveInfo **hd, int max_bus);
+
 #endif
diff --git a/include/hw/ide/internal.h b/include/hw/ide/internal.h
index d1d3fcd23a..0fc2013374 100644
--- a/include/hw/ide/internal.h
+++ b/include/hw/ide/internal.h
@@ -7,7 +7,6 @@
  * non-internal declarations are in hw/ide.h
  */
 
-#include "hw/ide.h"
 #include "hw/ide/ide-bus.h"
 #include "hw/ide/ide-dma.h"
 
-- 
2.43.2

[PATCH 7/7] hw/ide: Stop exposing internal.h to non-IDE files

2024-02-19 Thread Thomas Huth

include/hw/ide/internal.h is currently included by include/hw/ide/pci.h
and thus exposed to a lot of files that are not part of the IDE subsystem.
Stop including internal.h there and use the appropriate new headers
ide-bus.h and ide-dma.h instead.

Signed-off-by: Thomas Huth 
---
 include/hw/ide/pci.h | 3 ++-
 hw/i386/pc.c | 2 +-
 hw/ide/cmd646.c  | 1 +
 hw/ide/pci.c | 1 +
 hw/ide/piix.c| 1 +
 hw/ide/sii3112.c | 1 +
 hw/ide/via.c | 1 +
 7 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index a814a0a7c3..e1e012c387 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -1,7 +1,8 @@
 #ifndef HW_IDE_PCI_H
 #define HW_IDE_PCI_H
 
-#include "hw/ide/internal.h"
+#include "hw/ide/ide-bus.h"
+#include "hw/ide/ide-dma.h"
 #include "hw/pci/pci_device.h"
 #include "qom/object.h"
 
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 196827531a..22d0c29575 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -31,7 +31,7 @@
 #include "hw/i386/fw_cfg.h"
 #include "hw/i386/vmport.h"
 #include "sysemu/cpus.h"
-#include "hw/ide/internal.h"
+#include "hw/ide/ide-bus.h"
 #include "hw/timer/hpet.h"
 #include "hw/loader.h"
 #include "hw/rtc/mc146818rtc.h"
diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index c0bcfa4414..23d213ff01 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -33,6 +33,7 @@
 #include "sysemu/reset.h"
 
 #include "hw/ide/pci.h"
+#include "hw/ide/internal.h"
 #include "trace.h"
 
 /* CMD646 specific */
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index ca85d8474c..73efeec7f4 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -30,6 +30,7 @@
 #include "sysemu/dma.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
+#include "hw/ide/internal.h"
 #include "hw/ide/pci.h"
 #include "trace.h"
 
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 4e5e12935f..1773a068c3 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -30,6 +30,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "hw/pci/pci.h"
+#include "hw/ide/internal.h"
 #include "hw/ide/piix.h"
 #include "hw/ide/pci.h"
 #include "trace.h"
diff --git a/hw/ide/sii3112.c b/hw/ide/sii3112.c
index 63dc4a0494..321b9e46a1 100644
--- a/hw/ide/sii3112.c
+++ b/hw/ide/sii3112.c
@@ -13,6 +13,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "hw/ide/internal.h"
 #include "hw/ide/pci.h"
 #include "qemu/module.h"
 #include "trace.h"
diff --git a/hw/ide/via.c b/hw/ide/via.c
index 3f3c484253..cf151e70ec 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -25,6 +25,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "hw/ide/internal.h"
 #include "hw/pci/pci.h"
 #include "migration/vmstate.h"
 #include "qemu/module.h"
-- 
2.43.2

[PATCH v2] xlnx-versal-ospi: disable reentrancy detection for iomem_dac

2024-02-19 Thread Sai Pavan Boddu

The OSPI DMA reads flash data through the OSPI linear address space (the
iomem_dac region), because of this the reentrancy guard introduced in
commit a2e1753b ("memory: prevent dma-reentracy issues") is disabled for
the memory region.

Signed-off-by: Sai Pavan Boddu 
---
Changes for V2:
Added code comments.

 hw/ssi/xlnx-versal-ospi.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/ssi/xlnx-versal-ospi.c b/hw/ssi/xlnx-versal-ospi.c
index c7b95b1f37..c479138ec1 100644
--- a/hw/ssi/xlnx-versal-ospi.c
+++ b/hw/ssi/xlnx-versal-ospi.c
@@ -1772,6 +1772,12 @@ static void xlnx_versal_ospi_init(Object *obj)
 memory_region_init_io(&s->iomem_dac, obj, &ospi_dac_ops, s,
   TYPE_XILINX_VERSAL_OSPI "-dac", 0x2000);
 sysbus_init_mmio(sbd, &s->iomem_dac);
+/*
+ * The OSPI DMA reads flash data through the OSPI linear address space (the
+ * iomem_dac region), because of this the reentrancy guard needs to be
+ * disabled.
+ */
+s->iomem_dac.disable_reentrancy_guard = true;
 
 sysbus_init_irq(sbd, &s->irq);
 
-- 
2.25.1
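
To illustrate why a DMA engine that reads through a region of its own device
trips a reentrancy guard, here is a toy model in plain C. It is only a sketch
under simplifying assumptions: ToyDevice, ToyRegion, ToyOspi and the toy_*
functions are hypothetical and are not QEMU's memory API, and the real guard
in the memory core is more involved. Only the field name
disable_reentrancy_guard is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types, not QEMU's DeviceState/MemoryRegion. */
typedef struct ToyDevice {
    bool engaged_in_io;             /* set while a guarded region is being accessed */
} ToyDevice;

typedef struct ToyRegion ToyRegion;
struct ToyRegion {
    ToyDevice *owner;
    void *opaque;
    bool disable_reentrancy_guard;  /* mirrors the flag set by the patch */
    int (*handler)(ToyRegion *r);
};

enum { TOY_ACCESS_OK = 0, TOY_ACCESS_BLOCKED = -1 };

/* Dispatch an access, refusing re-entry into a guarded region of a busy device. */
static int toy_access(ToyRegion *r)
{
    bool guarded = !r->disable_reentrancy_guard;
    int ret;

    assert(r->owner != NULL && r->handler != NULL);
    if (guarded) {
        if (r->owner->engaged_in_io) {
            return TOY_ACCESS_BLOCKED;
        }
        r->owner->engaged_in_io = true;
    }
    ret = r->handler(r);
    if (guarded) {
        r->owner->engaged_in_io = false;
    }
    return ret;
}

typedef struct ToyOspi {
    ToyDevice dev;
    ToyRegion regs;  /* register window; an access here starts the DMA */
    ToyRegion dac;   /* linear flash window the DMA reads through */
} ToyOspi;

static int toy_dac_read(ToyRegion *r)
{
    (void)r;
    return TOY_ACCESS_OK;
}

/* The register handler runs the DMA, which re-enters the same device via dac. */
static int toy_regs_start_dma(ToyRegion *r)
{
    ToyOspi *s = r->opaque;

    return toy_access(&s->dac);
}

/* Build a fresh toy device and trigger one DMA through the register window. */
static int toy_run_dma(bool disable_guard_on_dac)
{
    ToyOspi s = { .dev = { .engaged_in_io = false } };

    s.regs = (ToyRegion){ .owner = &s.dev, .opaque = &s,
                          .handler = toy_regs_start_dma };
    s.dac = (ToyRegion){ .owner = &s.dev, .opaque = &s,
                         .handler = toy_dac_read,
                         .disable_reentrancy_guard = disable_guard_on_dac };
    return toy_access(&s.regs);
}
```

In this model toy_run_dma(false) is blocked because the nested access finds
the device already engaged in I/O, while toy_run_dma(true) succeeds because
the dac region opts out of the guard.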

[PATCH] tests/qtest: Fix boot-serial-test when using --without-default-devices

2024-02-19 Thread Thomas Huth

If "configure" has been run with "--without-default-devices", there is
no e1000 device in the binaries, so the boot-serial-test currently fails
in that case since it tries to use the e1000 with the sam460ex machine.

Since we're testing the serial output here, and not the NIC, let's
simply switch to the "pci-bridge" device here instead, which should
always be there for PCIe-based machines like the sam460ex.

Signed-off-by: Thomas Huth 
---
 tests/qtest/boot-serial-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
index 6dd06aeaf4..e3b7d65fe5 100644
--- a/tests/qtest/boot-serial-test.c
+++ b/tests/qtest/boot-serial-test.c
@@ -156,7 +156,7 @@ static const testdef_t tests[] = {
   "Open Firmware" },
 { "ppc64", "powernv8", "", "OPAL" },
 { "ppc64", "powernv9", "", "OPAL" },
-{ "ppc64", "sam460ex", "-device e1000", "8086  100e" },
+{ "ppc64", "sam460ex", "-device pci-bridge,chassis_nr=2", "1b36  0001" },
 { "i386", "isapc", "-cpu qemu32 -M graphics=off", "SeaBIOS" },
 { "i386", "pc", "-M graphics=off", "SeaBIOS" },
 { "i386", "q35", "-M graphics=off", "SeaBIOS" },
-- 
2.43.2

Re: [PATCH 1/6] hw/arm: Inline sysbus_create_simple(PL110 / PL111)

2024-02-19 Thread BALATON Zoltan


On Mon, 19 Feb 2024, Philippe Mathieu-Daudé wrote:

On 16/2/24 20:54, Philippe Mathieu-Daudé wrote:

On 16/2/24 18:14, BALATON Zoltan wrote:

On Fri, 16 Feb 2024, Philippe Mathieu-Daudé wrote:

We want to set another qdev property (a link) for the pl110
and pl111 devices, we can not use sysbus_create_simple() which
only passes sysbus base address and IRQs as arguments. Inline
it so we can set the link property in the next commit.

Signed-off-by: Philippe Mathieu-Daudé 
---
hw/arm/realview.c    |  5 -
hw/arm/versatilepb.c |  6 +-
hw/arm/vexpress.c    | 10 --
3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/hw/arm/realview.c b/hw/arm/realview.c
index 9058f5b414..77300e92e5 100644
--- a/hw/arm/realview.c
+++ b/hw/arm/realview.c
@@ -238,7 +238,10 @@ static void realview_init(MachineState *machine,
    sysbus_create_simple("pl061", 0x10014000, pic[7]);
    gpio2 = sysbus_create_simple("pl061", 0x10015000, pic[8]);

-    sysbus_create_simple("pl111", 0x1002, pic[23]);
+    dev = qdev_new("pl111");
+    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
+    sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, 0x1002);
+    sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[23]);


Not directly related to this patch, but this blows up 1 line into 4 just to
allow setting a property. Maybe, to keep some simplicity, we'd rather need
either a sysbus_realize_simple function that takes a sysbus device instead
of the name and does not create the device itself, or some way to pass
properties to sysbus_create_simple (but the latter may not be easy to do in
a generic way, so I'm not sure about that). What do you think?


Unfortunately sysbus doesn't scale in heterogeneous setup.


Regarding the HW modelling API complexity you are pointing at, we'd
like to move from the current imperative programming paradigm to a
declarative one, likely DSL driven. Meanwhile it is being investigated
(as part of "Dynamic Machine"), I'm trying to get the HW APIs right


I'm aware of that activity but we're currently still using board code to
construct machines and will probably continue to do so for a while. Also,
likely not all current machines will be converted to the new declarative
way, so having a convenient API for that is still useful.


(As for the language to describe the devices of a machine and their
connections declaratively, the device tree does just that, but dts is not a
very user-friendly description language so I haven't brought that up as a
possibility. But you could still get some clues by looking at the problems
it had to solve, to at least get requirements for the machine description
language.)



for heterogeneous emulation. Current price to pay is a verbose
imperative QDev API, hoping we'll get later a trivial declarative one
(like this single sysbus_create_simple call), where we shouldn't worry
about the order of low level calls, whether to use link or not, etc.


Having a detailed low level API does not prevent having a higher level API
on top that is more convenient for current use, so keeping that around for
current machines would allow you to change the low level API without having
to change all the board code, because you'd only need to update the simple
high level API.


Regards,
BALATON Zoltan

Re: [PATCH 3/7] hw/ide: Move IDE device related definitions to ide-dev.h

2024-02-19 Thread Philippe Mathieu-Daudé


On 19/2/24 11:49, Thomas Huth wrote:

Let's start to unentangle internal.h by moving public IDE device
related definitions to ide-dev.h.

Signed-off-by: Thomas Huth 
---
  include/hw/ide/ide-dev.h  | 145 +-
  include/hw/ide/internal.h | 145 +-
  hw/ide/ide-dev.c  |   1 +
  3 files changed, 146 insertions(+), 145 deletions(-)

diff --git a/include/hw/ide/ide-dev.h b/include/hw/ide/ide-dev.h
index 7e9663cda9..de88784a25 100644
--- a/include/hw/ide/ide-dev.h
+++ b/include/hw/ide/ide-dev.h
@@ -20,9 +20,152 @@
  #ifndef IDE_DEV_H
  #define IDE_DEV_H
  
+#include "sysemu/dma.h"


Not required.


  #include "hw/qdev-properties.h"
  #include "hw/block/block.h"
-#include "hw/ide/internal.h"
+
+typedef struct IDEDevice IDEDevice;
+typedef struct IDEState IDEState;



+typedef struct IDEDMA IDEDMA;
+typedef struct IDEDMAOps IDEDMAOps;
+typedef struct IDEBus IDEBus;


Looking at the next patches, it would be better to forward-declare IDEBus
and IDEDMA in "qemu/typedefs.h".

IDEDMAOps and "sysemu/dma.h" belong in "hw/ide/ide-dma.h".
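
For readers less familiar with the forward-declaration pattern suggested
here, a minimal sketch in plain C follows. ToyBus and ToyDevice are
hypothetical stand-ins for IDEBus and IDEState, and the three commented
sections only imitate separate headers inside one file; this is not the
layout of the actual QEMU headers.

```c
#include <assert.h>
#include <stddef.h>

/* "typedefs" part: names only, no struct layout needed yet. */
typedef struct ToyBus ToyBus;
typedef struct ToyDevice ToyDevice;

/* "device header" part: a pointer to the still-incomplete ToyBus is fine. */
struct ToyDevice {
    ToyBus *bus;
    unsigned unit;
};

/* "bus header" part: embedding devices by value needs the full ToyDevice. */
struct ToyBus {
    ToyDevice ifs[2];
};

/* Point each embedded device back at its bus and record its unit number. */
static void toy_bus_init(ToyBus *bus)
{
    assert(bus != NULL);
    for (unsigned i = 0; i < 2; i++) {
        bus->ifs[i].bus = bus;
        bus->ifs[i].unit = i;
    }
}
```

Because the device struct only holds a pointer to the bus, the device
definitions do not have to pull in the bus definitions, whereas the bus
definitions do need the full device struct.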

Re: [PATCH 0/7] hw/ide: Clean up hw/ide/qdev.c and include/hw/ide/internal.h

2024-02-19 Thread Philippe Mathieu-Daudé


On 19/2/24 11:49, Thomas Huth wrote:

While trying to make it possible to compile-out the CompactFlash IDE device
in downstream distributions (first patch), we noticed that there are more
things in the IDE code that could use a proper clean up:

First, hw/ide/qdev.c is quite a mix between IDE BUS specific functions
and (disk) device specific functions. Thus the second patch splits qdev.c
into two new separate files to make it more obvious which part belongs
to which kind of devices.

The remaining patches unentangle include/hw/ide/internal.h, which is meant
as a header that should only be used internally to the IDE subsystem, but
which is currently exposed to the world since include/hw/ide/pci.h includes
this header, too. Thus we move the definitions that are also required for
non-IDE code to other new header files, so we can finally change pci.h to
stop including internal.h. After these changes, internal.h is only included
by files in hw/ide/ as it should be.

Thomas Huth (7):
   hw/ide: Add the possibility to disable the CompactFlash device in the
 build
   hw/ide: Split qdev.c into ide-bus.c and ide-dev.c
   hw/ide: Move IDE device related definitions to ide-dev.h


Modulo comments in "hw/ide/ide-dev.h", series:
Reviewed-by: Philippe Mathieu-Daudé 


   hw/ide: Move IDE bus related definitions to a new header ide-bus.h
   hw/ide: Move IDE DMA related definitions to a separate header
 ide-dma.h
   hw/ide: Remove the include/hw/ide.h legacy file
   hw/ide: Stop exposing internal.h to non-IDE files

Re: [PATCH] tests/qtest: Fix boot-serial-test when using --without-default-devices

2024-02-19 Thread BALATON Zoltan


On Mon, 19 Feb 2024, Thomas Huth wrote:

If "configure" has been run with "--without-default-devices", there is
no e1000 device in the binaries, so the boot-serial-test currently fails
in that case since it tries to use the e1000 with the sam460ex machine.

Since we're testing the serial output here, and not the NIC, let's
simply switch to the "pci-bridge" device here instead, which should
always be there for PCIe-based machines like the sam460ex.


It's actually testing the PCI bus, not PCIe, but I think that does not
matter. PCIe on sam460ex does not work yet; I've only implemented it
partially to pass the firmware init, but devices attached to the PCIe bus
probably won't work. I have some patches to improve that but they are not
ready yet.



Signed-off-by: Thomas Huth 
---
tests/qtest/boot-serial-test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
index 6dd06aeaf4..e3b7d65fe5 100644
--- a/tests/qtest/boot-serial-test.c
+++ b/tests/qtest/boot-serial-test.c
@@ -156,7 +156,7 @@ static const testdef_t tests[] = {
  "Open Firmware" },
{ "ppc64", "powernv8", "", "OPAL" },
{ "ppc64", "powernv9", "", "OPAL" },
-{ "ppc64", "sam460ex", "-device e1000", "8086  100e" },
+{ "ppc64", "sam460ex", "-device pci-bridge,chassis_nr=2", "1b36  0001" },


So if you want to check whether the PCI bus works then maybe there's no need
to add a device at all: just look for the sm501 display chip ("126f 0501")
that's soldered on the board, so it's always created even with -nodefaults
and should always be present on sam460ex. The -device option just adds a
device that appears before the sm501 and stops the test there. Not sure if
this is testing more than looking for a PCI device created by the board code.


Regards,
BALATON Zoltan


{ "i386", "isapc", "-cpu qemu32 -M graphics=off", "SeaBIOS" },
{ "i386", "pc", "-M graphics=off", "SeaBIOS" },
{ "i386", "q35", "-M graphics=off", "SeaBIOS" },

RE: [EXT] Re: [PATCH] vhost_net: add NOTIFICATION_DATA and IN_ORDER feature bits to vdpa_feature_bits

2024-02-19 Thread Srujana Challa




> -----Original Message-----
> From: Michael S. Tsirkin 
> Sent: Monday, February 19, 2024 3:15 PM
> To: Srujana Challa 
> Cc: qemu-devel@nongnu.org; Vamsi Krishna Attunuru
> ; Jerin Jacob ; Jason Wang
> 
> Subject: [EXT] Re: [PATCH] vhost_net: add NOTIFICATION_DATA and
> IN_ORDER feature bits to vdpa_feature_bits
> 
> On Tue, Jan 02, 2024 at 04:44:32PM +0530, Srujana Challa wrote:
> > Enables VIRTIO_F_NOTIFICATION_DATA and VIRTIO_F_IN_ORDER feature
> bits
> > for vhost vdpa backend. Also adds code to consider all feature bits
> > supported by vhost net client type for feature negotiation, so that
> > vhost backend device supported features can be negotiated with guest.
> >
> > Signed-off-by: Srujana Challa 
> > ---
> >  hw/net/vhost_net.c | 10 ++
> >  net/vhost-vdpa.c   |  2 ++
> >  2 files changed, 12 insertions(+)
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index
> > e8e1661646..65ae8bcece 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -117,6 +117,16 @@ static const int
> > *vhost_net_get_feature_bits(struct vhost_net *net)
> >
> >  uint64_t vhost_net_get_features(struct vhost_net *net, uint64_t
> > features)  {
> > +const int *bit = vhost_net_get_feature_bits(net);
> > +
> > +/*
> > + * Consider all feature bits for feature negotiation with vhost 
> > backend,
> > + * so that all backend device supported features can be negotiated.
> > + */
> > +while (*bit != VHOST_INVALID_FEATURE_BIT) {
> > +features |= (1ULL << *bit);
> > +bit++;
> > +}
> >  return vhost_get_features(&net->dev, vhost_net_get_feature_bits(net),
> >  features);
> >  }
> 
> I don't think we should do this part. With vdpa QEMU is in control of which
> features are exposed and that is intentional since features are often tied to
> other behaviour.

With vdpa, QEMU can negotiate with the guest all the features that the vdpa
backend device supports, right?
Guest drivers (userspace or kernel drivers) will negotiate their own
features, so the features supported by the frontend will take precedence.

> 
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index
> > 3726ee5d67..51334fcfe2 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -57,7 +57,9 @@ typedef struct VhostVDPAState {
> >   */
> >  const int vdpa_feature_bits[] = {
> >  VIRTIO_F_ANY_LAYOUT,
> > +VIRTIO_F_IN_ORDER,
> >  VIRTIO_F_IOMMU_PLATFORM,
> > +VIRTIO_F_NOTIFICATION_DATA,
> >  VIRTIO_F_NOTIFY_ON_EMPTY,
> >  VIRTIO_F_RING_PACKED,
> >  VIRTIO_F_RING_RESET,
> > --
> > 2.25.1

Re: [PATCH 2/7] hw/ide: Split qdev.c into ide-bus.c and ide-dev.c

2024-02-19 Thread BALATON Zoltan


On Mon, 19 Feb 2024, Thomas Huth wrote:

qdev.c is a mixture between IDE bus specific functions and IDE device
functions. Let's split it up to make it more obvious which part is
related to bus handling and which part is related to device handling.

Signed-off-by: Thomas Huth 
---
hw/ide/ide-bus.c | 111 +++
hw/ide/{qdev.c => ide-dev.c} |  87 +--
hw/arm/Kconfig   |   2 +
hw/ide/Kconfig   |  30 ++
hw/ide/meson.build   |   3 +-
5 files changed, 134 insertions(+), 99 deletions(-)
create mode 100644 hw/ide/ide-bus.c
rename hw/ide/{qdev.c => ide-dev.c} (78%)

[...]

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 29abe1da29..b372b819a4 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -275,6 +275,8 @@ config SBSA_REF
select USB_XHCI_SYSBUS
select WDT_SBSA
select BOCHS_DISPLAY
+select IDE_BUS
+select IDE_DEV

config SABRELITE
bool
diff --git a/hw/ide/Kconfig b/hw/ide/Kconfig
index b93d6743d5..6dfc5a2129 100644
--- a/hw/ide/Kconfig
+++ b/hw/ide/Kconfig
@@ -1,51 +1,58 @@
config IDE_CORE
bool

-config IDE_QDEV
+config IDE_BUS
bool
select IDE_CORE


Maybe we can assume if something has an IDE bus it also wants to connect 
IDE devices to it so just select IDE_DEV here and not at every place 
IDE_BUS is selected? Or is there a place that only wants IDE_BUS?


Regards,
BALATON Zoltan


+config IDE_DEV
+bool
+depends on IDE_BUS
+
config IDE_PCI
bool
depends on PCI
-select IDE_QDEV
+select IDE_BUS
+select IDE_DEV

config IDE_ISA
bool
depends on ISA_BUS
-select IDE_QDEV
+select IDE_BUS
+select IDE_DEV

config IDE_PIIX
bool
select IDE_PCI
-select IDE_QDEV

config IDE_CMD646
bool
select IDE_PCI
-select IDE_QDEV

config IDE_MACIO
bool
-select IDE_QDEV
+select IDE_BUS
+select IDE_DEV

config IDE_MMIO
bool
-select IDE_QDEV
+select IDE_BUS
+select IDE_DEV

config IDE_VIA
bool
select IDE_PCI
-select IDE_QDEV

config MICRODRIVE
bool
-select IDE_QDEV
+select IDE_BUS
+select IDE_DEV
depends on PCMCIA

config AHCI
bool
-select IDE_QDEV
+select IDE_BUS
+select IDE_DEV

config AHCI_ICH9
bool
@@ -56,8 +63,7 @@ config AHCI_ICH9
config IDE_SII3112
bool
select IDE_PCI
-select IDE_QDEV

config IDE_CF
bool
-default y if IDE_QDEV
+default y if IDE_BUS
diff --git a/hw/ide/meson.build b/hw/ide/meson.build
index d2e5b45c9e..d09705cac0 100644
--- a/hw/ide/meson.build
+++ b/hw/ide/meson.build
@@ -1,15 +1,16 @@
system_ss.add(when: 'CONFIG_AHCI', if_true: files('ahci.c'))
system_ss.add(when: 'CONFIG_AHCI_ICH9', if_true: files('ich.c'))
system_ss.add(when: 'CONFIG_ALLWINNER_A10', if_true: files('ahci-allwinner.c'))
+system_ss.add(when: 'CONFIG_IDE_BUS', if_true: files('ide-bus.c'))
system_ss.add(when: 'CONFIG_IDE_CF', if_true: files('cf.c'))
system_ss.add(when: 'CONFIG_IDE_CMD646', if_true: files('cmd646.c'))
system_ss.add(when: 'CONFIG_IDE_CORE', if_true: files('core.c', 'atapi.c'))
+system_ss.add(when: 'CONFIG_IDE_DEV', if_true: files('ide-dev.c'))
system_ss.add(when: 'CONFIG_IDE_ISA', if_true: files('isa.c', 'ioport.c'))
system_ss.add(when: 'CONFIG_IDE_MACIO', if_true: files('macio.c'))
system_ss.add(when: 'CONFIG_IDE_MMIO', if_true: files('mmio.c'))
system_ss.add(when: 'CONFIG_IDE_PCI', if_true: files('pci.c'))
system_ss.add(when: 'CONFIG_IDE_PIIX', if_true: files('piix.c', 'ioport.c'))
-system_ss.add(when: 'CONFIG_IDE_QDEV', if_true: files('qdev.c'))
system_ss.add(when: 'CONFIG_IDE_SII3112', if_true: files('sii3112.c'))
system_ss.add(when: 'CONFIG_IDE_VIA', if_true: files('via.c'))
system_ss.add(when: 'CONFIG_MICRODRIVE', if_true: files('microdrive.c'))

Re: [PATCH 1/6] hw/arm: Inline sysbus_create_simple(PL110 / PL111)

2024-02-19 Thread Philippe Mathieu-Daudé


On 19/2/24 12:27, BALATON Zoltan wrote:

On Mon, 19 Feb 2024, Philippe Mathieu-Daudé wrote:

On 16/2/24 20:54, Philippe Mathieu-Daudé wrote:

On 16/2/24 18:14, BALATON Zoltan wrote:

On Fri, 16 Feb 2024, Philippe Mathieu-Daudé wrote:

We want to set another qdev property (a link) for the pl110
and pl111 devices, we can not use sysbus_create_simple() which
only passes sysbus base address and IRQs as arguments. Inline
it so we can set the link property in the next commit.

Signed-off-by: Philippe Mathieu-Daudé 
---
hw/arm/realview.c    |  5 -
hw/arm/versatilepb.c |  6 +-
hw/arm/vexpress.c    | 10 --
3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/hw/arm/realview.c b/hw/arm/realview.c
index 9058f5b414..77300e92e5 100644
--- a/hw/arm/realview.c
+++ b/hw/arm/realview.c
@@ -238,7 +238,10 @@ static void realview_init(MachineState *machine,
    sysbus_create_simple("pl061", 0x10014000, pic[7]);
    gpio2 = sysbus_create_simple("pl061", 0x10015000, pic[8]);

-    sysbus_create_simple("pl111", 0x1002, pic[23]);
+    dev = qdev_new("pl111");
+    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
+    sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, 0x1002);
+    sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[23]);


Not directly related to this patch but this blows up 1 line into 4 
just to allow setting a property. Maybe just to keep some simplicity 
we'd rather need either a sysbus_realize_simple function that takes 
a sysbus device instead of the name and does not create the device 
itself or some way to pass properties to sysbus create simple (but 
the latter may not be easy to do in a generic way so not sure about 
that). What do you think?


Unfortunately sysbus doesn't scale in heterogeneous setup.


Regarding the HW modelling API complexity you are pointing at, we'd
like to move from the current imperative programming paradigm to a
declarative one, likely DSL driven. Meanwhile it is being investigated
(as part of "Dynamic Machine"), I'm trying to get the HW APIs right


I'm aware of that activity but we're currently still using board code to 
construct machines and probably will continue to do so for a while. Also 
because likely not all current machines will be converted to new 
declarative way so having a convenient API for that is still useful.


(As for a language to describe the devices of a machine and their 
connections declaratively, the device tree does just that, but dts is not 
a very user-friendly description language so I haven't brought that up 
as a possibility. But you could still get some clues by looking at 
the problems it had to solve, to at least get the requirements for the 
machine description language.)



for heterogeneous emulation. Current price to pay is a verbose
imperative QDev API, hoping we'll get later a trivial declarative one
(like this single sysbus_create_simple call), where we shouldn't worry
about the order of low level calls, whether to use link or not, etc.


Having a detailed low level API does not prevent having a higher level API 
on top that is more convenient for current use, so keeping that around for 
current machines would allow you to change the low level API without having 
to change all the board code, because you'd only need to update the simple 
high level API.


So what is your suggestion here, add a new complex helper to keep
a one-line style?

DeviceState *sysbus_create_simple_dma_link(const char *typename,
   hwaddr baseaddr,
   const char *linkname,
   Object *linkobj,
   qemu_irq irq);

I wonder why this is that important since you never modified
any of the files changed by this series:

$ git shortlog -es hw/arm/realview.c hw/arm/versatilepb.c 
hw/arm/vexpress.c hw/display/pl110.c hw/arm/exynos4210.c 
hw/display/exynos4210_fimd.c hw/i386/kvmvapic.c

66  Peter Maydell 
34  Markus Armbruster 
29  Philippe Mathieu-Daudé 
28  Paolo Bonzini 
17  Andreas Färber 
13  Eduardo Habkost 
 8  Greg Bellows 
 7  Krzysztof Kozlowski 
 6  Gerd Hoffmann 
 5  Richard Henderson 
 5  Jan Kiszka 
 5  Igor Mammedov 
 4  Xiaoqiang Zhao 
 4  Thomas Huth 
 4  Anthony Liguori 
 3  Stefan Weil 
 3  Pavel Dovgaluk 
 3  Guenter Roeck 
 3  Daniel P. Berrangé 
 3  Alistair Francis 
 2  Roy Franz 
 2  Pavel Dovgaluk 
 2  Marcel Apfelbaum 
 2  Linus Walleij 
 2  Like Xu 
 2  Juan Quintela 
 2  Igor Mitsyanko 
 2  Hu Tao 
 2  David Woodhouse 
 1  Zongyuan Li 
 1  Wen, Jianxian 
 1  Vincent Palatin 
 1  Tao Xu 
 1  Sergey Fedorov 
 1  Prasad J Pandit 
 1  Prasad J Pandit 
 1  Pranith Kumar 
 1  Peter Crosthwaite 
 1  Nikita Belov 
 1  Martin Kletzander 
 1  Mark Cave-Ayland 
 1  Marcelo Tosatti 
 1  Marcel Apfelbaum 
 1  Marc

Re: QEMU features useful for Xen development?

2024-02-19 Thread Peter Maydell

On Thu, 31 Aug 2023 at 11:32, Ayan Kumar Halder  wrote:
> On 31/08/2023 11:03, Peter Maydell wrote:
> > On Thu, 31 Aug 2023 at 10:53, Alex Bennée  wrote:
> >> Peter Maydell  writes:
> >>> On Thu, 31 Aug 2023 at 01:57, Stefano Stabellini  
> >>> wrote:
>  As Xen is gaining R52 and R82 support, it would be great to be able to
>  use QEMU for development and testing there as well, but I don't think
>  QEMU can emulate EL2 properly for the Cortex-R architecture. We would
>  need EL2 support in the GIC/timer for R52/R82 as well.

> >>> (What sort of board model would Xen want to use it with?)

> >> We already model a bunch of the mps2/mps3 images so I'm assuming adding
> >> the mps3-an536 would be a fairly simple step to do (mps2tz.c is mostly
> >> tweaking config values). The question is would it be a useful target for
> >> Xen?

> Yes, it will be helpful if Qemu can model this board. We have a
> downstream port of Xen on R52 (upstreaming is in progress).
>
> So, we can test the Qemu model with Xen.

Hi, all. I just wanted to provide an update on this. We've now
completed the mps3-an536 board model, and you can find it if
you check out the head-of-git QEMU. The new board will be in the
9.0 QEMU release, but if you have a chance to give it a spin now
we'll be able to fix any bugs or problems with it before the release.

The documentation for the board is here:
https://www.qemu.org/docs/master/system/arm/mps2.html
and it lists the limitations/missing features. (If any of those
are important let me know and we can look at scheduling the work
to fill them in.)

I'd also like to draw your attention to the note about the UART
ordering on the AN536; unfortunately the hardware setup is a bit
awkward here, so if you have an "I don't see any output" problem
make sure your guest is sending to the same UART you're asking
QEMU to show you the output from :-)

thanks
-- PMM

Re: [PATCH 5/7] hw/ide: Move IDE DMA related definitions to a separate header ide-dma.h

2024-02-19 Thread BALATON Zoltan


On Mon, 19 Feb 2024, Thomas Huth wrote:

These definitions are required outside of the hw/ide/ code, too,
so let's move them from internal.h to a new header called ide-dma.h.

Signed-off-by: Thomas Huth 
---
include/hw/ide/ide-dma.h  | 30 ++
include/hw/ide/internal.h | 27 +--
2 files changed, 31 insertions(+), 26 deletions(-)
create mode 100644 include/hw/ide/ide-dma.h

diff --git a/include/hw/ide/ide-dma.h b/include/hw/ide/ide-dma.h
new file mode 100644
index 00..fb82966bdd
--- /dev/null
+++ b/include/hw/ide/ide-dma.h
@@ -0,0 +1,30 @@
+#ifndef HW_IDE_DMA_H
+#define HW_IDE_DMA_H
+
+typedef void DMAStartFunc(const IDEDMA *, IDEState *, BlockCompletionFunc *);
+typedef void DMAVoidFunc(const IDEDMA *);
+typedef int DMAIntFunc(const IDEDMA *, bool);
+typedef int32_t DMAInt32Func(const IDEDMA *, int32_t len);
+typedef void DMAu32Func(const IDEDMA *, uint32_t);
+typedef void DMAStopFunc(const IDEDMA *, bool);
+
+struct IDEDMAOps {
+DMAStartFunc *start_dma;
+DMAVoidFunc *pio_transfer;
+DMAInt32Func *prepare_buf;
+DMAu32Func *commit_buf;
+DMAIntFunc *rw_buf;
+DMAVoidFunc *restart;
+DMAVoidFunc *restart_dma;
+DMAStopFunc *set_inactive;
+DMAVoidFunc *cmd_done;
+DMAVoidFunc *reset;
+};
+
+struct IDEDMA {
+const struct IDEDMAOps *ops;
+QEMUIOVector qiov;
+BlockAIOCB *aiocb;


Doesn't this need to #include something to define QEMUIOVector and 
BlockAIOCB and some of the DMA and IDE types not defined above?


Regards,
BALATON Zoltan


+};
+
+#endif
diff --git a/include/hw/ide/internal.h b/include/hw/ide/internal.h
index 642bd1a979..d1d3fcd23a 100644
--- a/include/hw/ide/internal.h
+++ b/include/hw/ide/internal.h
@@ -9,6 +9,7 @@

#include "hw/ide.h"
#include "hw/ide/ide-bus.h"
+#include "hw/ide/ide-dma.h"

/* debug IDE devices */
#define USE_DMA_CDROM
@@ -316,13 +317,6 @@
#define SMART_DISABLE 0xd9
#define SMART_STATUS  0xda

-typedef void DMAStartFunc(const IDEDMA *, IDEState *, BlockCompletionFunc *);
-typedef void DMAVoidFunc(const IDEDMA *);
-typedef int DMAIntFunc(const IDEDMA *, bool);
-typedef int32_t DMAInt32Func(const IDEDMA *, int32_t len);
-typedef void DMAu32Func(const IDEDMA *, uint32_t);
-typedef void DMAStopFunc(const IDEDMA *, bool);
-
extern const char *IDE_DMA_CMD_lookup[IDE_DMA__COUNT];

extern const MemoryRegionPortio ide_portio_list[];
@@ -340,25 +334,6 @@ typedef struct IDEBufferedRequest {
bool orphaned;
} IDEBufferedRequest;

-struct IDEDMAOps {
-DMAStartFunc *start_dma;
-DMAVoidFunc *pio_transfer;
-DMAInt32Func *prepare_buf;
-DMAu32Func *commit_buf;
-DMAIntFunc *rw_buf;
-DMAVoidFunc *restart;
-DMAVoidFunc *restart_dma;
-DMAStopFunc *set_inactive;
-DMAVoidFunc *cmd_done;
-DMAVoidFunc *reset;
-};
-
-struct IDEDMA {
-const struct IDEDMAOps *ops;
-QEMUIOVector qiov;
-BlockAIOCB *aiocb;
-};
-
/* These are used for the error_status field of IDEBus */
#define IDE_RETRY_MASK 0xf8
#define IDE_RETRY_DMA  0x08

Re: [PATCH v2 1/3] tests/migration-test: Stick with gicv3 in aarch64 test

2024-02-19 Thread Thomas Huth


On 07/02/2024 01.54, pet...@redhat.com wrote:

From: Peter Xu 

Recently we introduced cross-binary migration test.  It's always wanted
that migration-test uses stable guest ABI for both QEMU binaries in this
case, so that both QEMU binaries will be compatible on the migration
stream with the cmdline specified.

Switch to a static gic version "3" rather than using version "max", so that
GIC should be stable now across any future QEMU binaries for migration-test.

Here the version can actually be anything as long as the ABI is stable.  We
choose "3" because it's the majority of what we already use in QEMU while
still new enough: "git grep gic-version=3" shows 6 hit, while version 4 has
no direct user yet besides "max".

Note that even with this change, aarch64 won't be able to work yet with
migration cross binary test, but then the only missing piece will be the
stable CPU model.

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Peter Xu 
---
  tests/qtest/migration-test.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 7675519cfa..8a5bb1752e 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -819,7 +819,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
  } else if (strcmp(arch, "aarch64") == 0) {
  memory_size = "150M";
  machine_alias = "virt";
-machine_opts = "gic-version=max";
+machine_opts = "gic-version=3";
  arch_opts = g_strdup_printf("-cpu max -kernel %s", bootpath);
  start_address = ARM_TEST_MEM_START;
  end_address = ARM_TEST_MEM_END;


Looks like the migration test now fails on aarch64 when "configure" has been 
run with "--without-default-devices", since that disables the gicv3 in the 
binary ... is there a way to check whether the gicv3 is available, and use 
"=max" instead if it is not?


 Thomas

Re: [PATCH v7 2/4] qcow2: add configurations for zoned format extension

2024-02-19 Thread Markus Armbruster

I apologize for the delayed review.

Sam Li  writes:

> To configure the zoned format feature on the qcow2 driver, it
> requires settings as: the device size, zone model, zone size,
> zone capacity, number of conventional zones, limits on zone
> resources (max append bytes, max open zones, and max_active_zones).
>
> To create a qcow2 image with zoned format feature, use command like
> this:
> qemu-img create -f qcow2 zbc.qcow2 -o size=768M \
> -o zone.size=64M -o zone.capacity=64M -o zone.conventional_zones=0 \
> -o zone.max_append_bytes=4096 -o zone.max_open_zones=6 \
> -o zone.max_active_zones=8 -o zone.mode=host-managed
>
> Signed-off-by: Sam Li 

[...]

> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index ca390c5700..e2e0ec21a5 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -5038,6 +5038,67 @@
>  { 'enum': 'Qcow2CompressionType',
>'data': [ 'zlib', { 'name': 'zstd', 'if': 'CONFIG_ZSTD' } ] }
>  
> +##
> +# @Qcow2ZoneModel:
> +#
> +# Zoned device model used in qcow2 image file
> +#
> +# @host-managed: The host-managed model only allows sequential write over the
> +# device zones.
> +#
> +# Since 8.2
> +##
> +{ 'enum': 'Qcow2ZoneModel',
> +  'data': [ 'host-managed'] }
> +
> +##
> +# @Qcow2ZoneHostManaged:
> +#
> +# The host-managed zone model.  It only allows sequential writes.
> +#
> +# @size: Total number of bytes within zones.

Default?

> +#
> +# @capacity: The number of usable logical blocks within zones
> +# in bytes.  A zone capacity is always smaller or equal to the
> +# zone size.

Default?

> +#
> +# @conventional-zones: The number of conventional zones of the
> +# zoned device (default 0).
> +#
> +# @max-open-zones: The maximal number of open zones.  It is less than
> +# or equal to the number of sequential write required zones of
> +# the device (default 0).
> +#
> +# @max-active-zones: The maximal number of zones in the implicit
> +# open, explicit open or closed state.  It is less than or equal
> +# to the max open zones (default 0).
> +#
> +# @max-append-bytes: The maximal number of bytes of a zone
> +# append request that can be issued to the device.  It must be
> +# 512-byte aligned and less than the zone capacity.

Default?

> +#
> +# Since 8.2
> +##
> +{ 'struct': 'Qcow2ZoneHostManaged',
> +  'data': { '*size':  'size',
> +'*capacity':  'size',
> +'*conventional-zones': 'uint32',
> +'*max-open-zones': 'uint32',
> +'*max-active-zones':   'uint32',
> +'*max-append-bytes':   'size' } }
> +
> +##
> +# @Qcow2ZoneCreateOptions:
> +#
> +# The zone device model for the qcow2 image.
> +#
> +# Since 8.2
> +##
> +{ 'union': 'Qcow2ZoneCreateOptions',
> +  'base': { 'mode': 'Qcow2ZoneModel' },
> +  'discriminator': 'mode',
> +  'data': { 'host-managed': 'Qcow2ZoneHostManaged' } }
> +
>  ##
>  # @BlockdevCreateOptionsQcow2:
>  #
> @@ -5080,6 +5141,9 @@
>  # @compression-type: The image cluster compression method
>  # (default: zlib, since 5.1)
>  #
> +# @zone: The zone device model modes.  The default is that the device is
> +# not zoned.  (since 8.2)
> +#
>  # Since: 2.12
>  ##
>  { 'struct': 'BlockdevCreateOptionsQcow2',
> @@ -5096,7 +5160,8 @@
>  '*preallocation':   'PreallocMode',
>  '*lazy-refcounts':  'bool',
>  '*refcount-bits':   'int',
> -'*compression-type':'Qcow2CompressionType' } }
> +'*compression-type':'Qcow2CompressionType',
> +'*zone':'Qcow2ZoneCreateOptions' } }
>  
>  ##
>  # @BlockdevCreateOptionsQed:
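
For reference, the constraints stated in the quoted Qcow2ZoneHostManaged documentation can be written down as a small check. This is only a sketch of the documented rules (capacity not larger than the zone size, max-append-bytes 512-byte aligned and less than the capacity); `example_zone_opts_valid()` is a made-up name and does not claim to match what the qcow2 driver actually enforces.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Check the relations spelled out in the doc comments:
 *  - zone capacity is smaller than or equal to the zone size
 *  - max append bytes is 512-byte aligned
 *  - max append bytes is less than the zone capacity
 */
static bool example_zone_opts_valid(uint64_t zone_size,
                                    uint64_t zone_capacity,
                                    uint64_t max_append_bytes)
{
    if (zone_capacity > zone_size) {
        return false;
    }
    if (max_append_bytes % 512 != 0) {
        return false;
    }
    if (max_append_bytes >= zone_capacity) {
        return false;
    }
    return true;
}
```

With the values from the commit message (64M zone size and capacity, 4096 max append bytes) the check passes.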

Re: [PATCH 04/21] hw/tricore/testboard: Use qdev_new() instead of QOM basic API

2024-02-19 Thread Bastian Koppelmann

On Fri, Feb 16, 2024 at 12:02:55PM +0100, Philippe Mathieu-Daudé wrote:
> Prefer QDev API for QDev objects, avoid the underlying QOM layer.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/hw/tricore/tricore_testdevice.h | 3 ---
>  hw/tricore/tricore_testboard.c  | 4 +---
>  2 files changed, 1 insertion(+), 6 deletions(-)

Reviewed-by: Bastian Koppelmann 

Cheers,
Bastian

Re: [PATCH v5 01/11] hw/nvme: Use pcie_sriov_num_vfs()

2024-02-19 Thread Klaus Jensen

On Feb 18 13:56, Akihiko Odaki wrote:
> nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV
> configurations to know the number of VFs being disabled due to SR-IOV
> configuration writes, but the logic was flawed and resulted in
> out-of-bound memory access.
> 
> It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled
> VFs, but it actually doesn't in the following cases:
> - PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
> - PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
> - VFs were only partially enabled because of realization failure.
> 
> It is a responsibility of pcie_sriov to interpret SR-IOV configurations
> and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it
> provides, to get the number of enabled VFs before and after SR-IOV
> configuration writes.
> 
> Cc: qemu-sta...@nongnu.org
> Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management command")
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Akihiko Odaki 

Thanks Akihiko,

I'll pick this up for hw/nvme nvme-next as-is.

Reviewed-by: Klaus Jensen 
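
As a stand-alone illustration of the idea in the commit message (compare the number of enabled VFs before and after the config write, and offline only the ones that went away), here is a minimal sketch. The `example_ctrl` structure, `EXAMPLE_MAX_VFS` and `example_offline_removed_vfs()` are invented for this note and only mimic the shape of the loop in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_MAX_VFS 8   /* hypothetical table size */

/* Stand-in for per-VF secondary controller state. */
struct example_ctrl {
    bool vf_online[EXAMPLE_MAX_VFS];
};

/*
 * Offline the VFs in [new_num_vfs, old_num_vfs), i.e. those that were
 * enabled before the write and are no longer enabled after it.
 * Returns how many were taken offline.
 */
static int example_offline_removed_vfs(struct example_ctrl *c,
                                       uint16_t old_num_vfs,
                                       uint16_t new_num_vfs)
{
    int offlined = 0;

    for (uint16_t i = new_num_vfs; i < old_num_vfs && i < EXAMPLE_MAX_VFS; i++) {
        if (c->vf_online[i]) {
            c->vf_online[i] = false;
            offlined++;
        }
    }
    return offlined;
}
```

If the count grows or stays the same, the loop body never runs, so nothing depends on what the guest last wrote to the NumVFs register.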

> ---
>  hw/nvme/ctrl.c | 26 --
>  1 file changed, 8 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index f026245d1e9e..7a56e7b79b4d 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -8466,36 +8466,26 @@ static void nvme_pci_reset(DeviceState *qdev)
>  nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
>  }
>  
> -static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address,
> -  uint32_t val, int len)
> +static void nvme_sriov_post_write_config(PCIDevice *dev, uint16_t old_num_vfs)
>  {
>  NvmeCtrl *n = NVME(dev);
>  NvmeSecCtrlEntry *sctrl;
> -uint16_t sriov_cap = dev->exp.sriov_cap;
> -uint32_t off = address - sriov_cap;
> -int i, num_vfs;
> +int i;
>  
> -if (!sriov_cap) {
> -return;
> -}
> -
> -if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> -if (!(val & PCI_SRIOV_CTRL_VFE)) {
> -num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
> -for (i = 0; i < num_vfs; i++) {
> -sctrl = &n->sec_ctrl_list.sec[i];
> -nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> -}
> -}
> +for (i = pcie_sriov_num_vfs(dev); i < old_num_vfs; i++) {
> +sctrl = &n->sec_ctrl_list.sec[i];
> +nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
>  }
>  }
>  
>  static void nvme_pci_write_config(PCIDevice *dev, uint32_t address,
>uint32_t val, int len)
>  {
> -nvme_sriov_pre_write_ctrl(dev, address, val, len);
> +uint16_t old_num_vfs = pcie_sriov_num_vfs(dev);
> +
>  pci_default_write_config(dev, address, val, len);
>  pcie_cap_flr_write_config(dev, address, val, len);
> +nvme_sriov_post_write_config(dev, old_num_vfs);
>  }
>  
>  static const VMStateDescription nvme_vmstate = {
> 
> -- 
> 2.43.1
> 

-- 
One of us - No more doubt, silence or taboo about mental illness.



Re: [PATCH 1/6] hw/arm: Inline sysbus_create_simple(PL110 / PL111)

2024-02-19 Thread BALATON Zoltan


On Mon, 19 Feb 2024, Philippe Mathieu-Daudé wrote:

On 19/2/24 12:27, BALATON Zoltan wrote:

On Mon, 19 Feb 2024, Philippe Mathieu-Daudé wrote:

On 16/2/24 20:54, Philippe Mathieu-Daudé wrote:

On 16/2/24 18:14, BALATON Zoltan wrote:

On Fri, 16 Feb 2024, Philippe Mathieu-Daudé wrote:

We want to set another qdev property (a link) for the pl110
and pl111 devices, we can not use sysbus_create_simple() which
only passes sysbus base address and IRQs as arguments. Inline
it so we can set the link property in the next commit.

Signed-off-by: Philippe Mathieu-Daudé 
---
hw/arm/realview.c    |  5 -
hw/arm/versatilepb.c |  6 +-
hw/arm/vexpress.c    | 10 --
3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/hw/arm/realview.c b/hw/arm/realview.c
index 9058f5b414..77300e92e5 100644
--- a/hw/arm/realview.c
+++ b/hw/arm/realview.c
@@ -238,7 +238,10 @@ static void realview_init(MachineState *machine,
    sysbus_create_simple("pl061", 0x10014000, pic[7]);
    gpio2 = sysbus_create_simple("pl061", 0x10015000, pic[8]);

-    sysbus_create_simple("pl111", 0x1002, pic[23]);
+    dev = qdev_new("pl111");
+    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
+    sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, 0x1002);
+    sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[23]);


Not directly related to this patch but this blows up 1 line into 4 just 
to allow setting a property. Maybe just to keep some simplicity we'd 
rather need either a sysbus_realize_simple function that takes a sysbus 
device instead of the name and does not create the device itself or some 
way to pass properties to sysbus create simple (but the latter may not 
be easy to do in a generic way so not sure about that). What do you 
think?


Unfortunately sysbus doesn't scale in heterogeneous setup.


Regarding the HW modelling API complexity you are pointing at, we'd
like to move from the current imperative programming paradigm to a
declarative one, likely DSL driven. Meanwhile it is being investigated
(as part of "Dynamic Machine"), I'm trying to get the HW APIs right


I'm aware of that activity but we're currently still using board code to 
construct machines and probably will continue to do so for a while. Also 
because likely not all current machines will be converted to new 
declarative way so having a convenient API for that is still useful.


(As for a language to describe the devices of a machine and their 
connections declaratively, the device tree does just that, but dts is not a 
very user-friendly description language so I haven't brought that up as a 
possibility. But you could still get some clues by looking at the 
problems it had to solve, to at least get the requirements for the machine 
description language.)



for heterogeneous emulation. Current price to pay is a verbose
imperative QDev API, hoping we'll get later a trivial declarative one
(like this single sysbus_create_simple call), where we shouldn't worry
about the order of low level calls, whether to use link or not, etc.


Having a detailed low level API does not prevent having a higher level API 
on top that is more convenient for current use, so keeping that around for 
current machines would allow you to change the low level API without having 
to change all the board code, because you'd only need to update the simple 
high level API.


So what is your suggestion here, add a new complex helper to keep
a one-line style?

DeviceState *sysbus_create_simple_dma_link(const char *typename,
  hwaddr baseaddr,
  const char *linkname,
  Object *linkobj,
  qemu_irq irq);


I think just having sysbus_realize_simple that does the same as 
sysbus_create_simple minus creating the device would be enough because 
then the cases where you need to set properties could still use it after 
qdev_new or init and property_set but hide the realize and connecting the 
device behind this single call.
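
To show the shape of what I mean without depending on the QEMU tree, here is a mock-up with stand-in types. Nothing below is real QEMU API: `ExampleDevice` and the `example_*` functions are placeholders for the device, realize, MMIO mapping and IRQ connection steps, and `example_realize_simple()` plays the role of the proposed sysbus_realize_simple.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a sysbus device (illustration only). */
typedef struct ExampleDevice {
    bool realized;
    uint64_t mmio_base;
    int irq;
    int link;            /* stand-in for a link property set by the board */
} ExampleDevice;

static void example_realize(ExampleDevice *d)
{
    d->realized = true;
}

static void example_mmio_map(ExampleDevice *d, uint64_t base)
{
    d->mmio_base = base;
}

static void example_connect_irq(ExampleDevice *d, int irq)
{
    d->irq = irq;
}

/*
 * The proposed helper: the realize/map/connect sequence in one call,
 * taking an already created device so that properties can be set
 * between creation and this call.
 */
static void example_realize_simple(ExampleDevice *d, uint64_t base, int irq)
{
    example_realize(d);
    example_mmio_map(d, base);
    example_connect_irq(d, irq);
}
```

Board code would then be two or three lines: create the device, set the property, call the helper.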



I wonder why this is that important since you never modified
any of the files changed by this series:


For new people trying to contribute to QEMU, QDev is overwhelming, so having 
some way to need less of it for simple things would help them get 
started.


Regards,
BALATON Zoltan

Re: [PATCH v7 2/4] qcow2: add configurations for zoned format extension

2024-02-19 Thread Markus Armbruster

One more thing...

Markus Armbruster  writes:

> I apologize for the delayed review.
>
> Sam Li  writes:
>
>> To configure the zoned format feature on the qcow2 driver, it
>> requires settings as: the device size, zone model, zone size,
>> zone capacity, number of conventional zones, limits on zone
>> resources (max append bytes, max open zones, and max_active_zones).
>>
>> To create a qcow2 image with zoned format feature, use command like
>> this:
>> qemu-img create -f qcow2 zbc.qcow2 -o size=768M \
>> -o zone.size=64M -o zone.capacity=64M -o zone.conventional_zones=0 \
>> -o zone.max_append_bytes=4096 -o zone.max_open_zones=6 \
>> -o zone.max_active_zones=8 -o zone.mode=host-managed
>>
>> Signed-off-by: Sam Li 
>
> [...]
>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index ca390c5700..e2e0ec21a5 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -5038,6 +5038,67 @@
>>  { 'enum': 'Qcow2CompressionType',
>>'data': [ 'zlib', { 'name': 'zstd', 'if': 'CONFIG_ZSTD' } ] }
>>  
>> +##
>> +# @Qcow2ZoneModel:
>> +#
>> +# Zoned device model used in qcow2 image file
>> +#
>> +# @host-managed: The host-managed model only allows sequential write over the
>> +# device zones.
>> +#
>> +# Since 8.2
>> +##
>> +{ 'enum': 'Qcow2ZoneModel',
>> +  'data': [ 'host-managed'] }
>> +
>> +##
>> +# @Qcow2ZoneHostManaged:
>> +#
>> +# The host-managed zone model.  It only allows sequential writes.
>> +#
>> +# @size: Total number of bytes within zones.
>
> Default?
>
>> +#
>> +# @capacity: The number of usable logical blocks within zones
>> +# in bytes.  A zone capacity is always smaller or equal to the
>> +# zone size.
>
> Default?
>
>> +#
>> +# @conventional-zones: The number of conventional zones of the
>> +# zoned device (default 0).
>> +#
>> +# @max-open-zones: The maximal number of open zones.  It is less than
>> +# or equal to the number of sequential write required zones of
>> +# the device (default 0).
>> +#
>> +# @max-active-zones: The maximal number of zones in the implicit
>> +# open, explicit open or closed state.  It is less than or equal
>> +# to the max open zones (default 0).
>> +#
>> +# @max-append-bytes: The maximal number of bytes of a zone
>> +# append request that can be issued to the device.  It must be
>> +# 512-byte aligned and less than the zone capacity.
>
> Default?
>
>> +#
>> +# Since 8.2
>> +##
>> +{ 'struct': 'Qcow2ZoneHostManaged',
>> +  'data': { '*size':  'size',
>> +'*capacity':  'size',
>> +'*conventional-zones': 'uint32',
>> +'*max-open-zones': 'uint32',
>> +'*max-active-zones':   'uint32',
>> +'*max-append-bytes':   'size' } }
>> +
>> +##
>> +# @Qcow2ZoneCreateOptions:
>> +#
>> +# The zone device model for the qcow2 image.

Please document member @mode.

Fails to build since merge commit 61e7a0d27c1:

qapi/block-core.json: In union 'Qcow2ZoneCreateOptions':
qapi/block-core.json:5135: member 'mode' lacks documentation

>> +#
>> +# Since 8.2
>> +##
>> +{ 'union': 'Qcow2ZoneCreateOptions',
>> +  'base': { 'mode': 'Qcow2ZoneModel' },
>> +  'discriminator': 'mode',
>> +  'data': { 'host-managed': 'Qcow2ZoneHostManaged' } }
>> +
>>  ##
>>  # @BlockdevCreateOptionsQcow2:
>>  #
>> @@ -5080,6 +5141,9 @@
>>  # @compression-type: The image cluster compression method
>>  # (default: zlib, since 5.1)
>>  #
>> +# @zone: The zone device model modes.  The default is that the device is
>> +# not zoned.  (since 8.2)
>> +#
>>  # Since: 2.12
>>  ##
>>  { 'struct': 'BlockdevCreateOptionsQcow2',
>> @@ -5096,7 +5160,8 @@
>>  '*preallocation':   'PreallocMode',
>>  '*lazy-refcounts':  'bool',
>>  '*refcount-bits':   'int',
>> -'*compression-type':'Qcow2CompressionType' } }
>> +'*compression-type':'Qcow2CompressionType',
>> +'*zone':'Qcow2ZoneCreateOptions' } }
>>  
>>  ##
>>  # @BlockdevCreateOptionsQed:

Re: [PATCH v3 2/3] hw/virtio: cleanup shared resources

2024-02-19 Thread Albert Esteve

On Mon, Feb 19, 2024 at 11:45 AM Albert Esteve  wrote:

>
>
>
> On Thu, Feb 15, 2024 at 10:45 AM Albert Esteve  wrote:
>
>>
>>
>> On Tue, Feb 6, 2024 at 12:11 AM Alex Bennée 
>> wrote:
>>
>>> Albert Esteve  writes:
>>>
>>> > Ensure that we clean up all virtio shared
>>> > resources when the vhost device is cleaned
>>> > up (after a hot unplug, or a crash).
>>> >
>>> > To do so, we add a new function to the virtio_dmabuf
>>> > API called `virtio_dmabuf_vhost_cleanup`, which
>>> > loops through the table and removes all
>>> > resources owned by the vhost device parameter.
>>> >
>>> > Also, add a test to verify that the new
>>> > function in the API behaves as expected.
>>> >
>>> > Signed-off-by: Albert Esteve 
>>> > Acked-by: Stefan Hajnoczi 
>>> > ---
>>> >  hw/display/virtio-dmabuf.c| 22 +
>>> >  hw/virtio/vhost.c |  3 +++
>>> >  include/hw/virtio/virtio-dmabuf.h | 10 ++
>>> >  tests/unit/test-virtio-dmabuf.c   | 33 +++
>>> >  4 files changed, 68 insertions(+)
>>> >
>>> > diff --git a/hw/display/virtio-dmabuf.c b/hw/display/virtio-dmabuf.c
>>> > index 3dba4577ca..6688809777 100644
>>> > --- a/hw/display/virtio-dmabuf.c
>>> > +++ b/hw/display/virtio-dmabuf.c
>>> > @@ -136,6 +136,28 @@ SharedObjectType virtio_object_type(const
>>> QemuUUID *uuid)
>>> >  return vso->type;
>>> >  }
>>> >
>>> > +static bool virtio_dmabuf_resource_is_owned(gpointer key,
>>> > +gpointer value,
>>> > +gpointer dev)
>>> > +{
>>> > +VirtioSharedObject *vso;
>>> > +
>>> > +vso = (VirtioSharedObject *) value;
>>> > +return vso->type == TYPE_VHOST_DEV && vso->value == dev;
>>>
>>> It's a bit surprising to see vso->value being an anonymous gpointer
>>> rather than the proper type and a bit confusing between value and
>>> vso->value.
>>>
>>>
>> It is the signature required for this to be used with
>> `g_hash_table_foreach_remove`.
>> For the naming, the HashMap stores gpointers, that point to
>> `VirtioSharedObject`, and
>> these point to the underlying type (stored at `vso->value`). It may sound
>> a bit confusing,
>> but is a byproduct of the VirtioSharedObject indirection. Not sure which
>> names could be
>> more fit for this, but I'm open to change them.
>>
>>
>>> > +}
>>> > +
>>> > +int virtio_dmabuf_vhost_cleanup(struct vhost_dev *dev)
>>> > +{
>>> > +int num_removed;
>>> > +
>>> > +g_mutex_lock(&lock);
>>> > +num_removed = g_hash_table_foreach_remove(
>>> > +resource_uuids, (GHRFunc) virtio_dmabuf_resource_is_owned,
>>> dev);
>>> > +g_mutex_unlock(&lock);
>>>
>>> I'll note if we used a QemuMutex for the lock we could:
>>>
>>>   - use WITH_QEMU_LOCK_GUARD(&lock) { }
>>>   - enable QSP profiling for the lock
>>>
>>>
>> Was not aware of these QemuMutex's. I wouldn't mind changing the mutex in
>> this
>> file in a different commit.
>>
>
> The problem is that the current lock is a global static, and `QemuMutex`
> needs to be initialised by doing `qemu_mutex_init(&lock);`.
>
> Maybe it can be initialised at vhost-user.c by adding a public function?
>

I think `virtio_init` at `virtio.c` will be a better candidate.
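
A minimal sketch of that idea, written with plain pthreads rather than QEMU's `QemuMutex` so that it compiles on its own. The names `dmabuf_lock_init` and `dmabuf_cleanup` are hypothetical, and the counter only stands in for the real resource table; the point is just that a lock needing explicit initialisation wants a public init hook that is safe to call from more than one place.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-in for the file-scope lock; it needs explicit setup before use. */
static pthread_mutex_t lock;
static pthread_once_t lock_once = PTHREAD_ONCE_INIT;

/* Toy stand-in for the shared-resource table: just a counter. */
static int num_resources = 3;

static void lock_init_once(void)
{
    int rc = pthread_mutex_init(&lock, NULL);

    assert(rc == 0);
    (void)rc;
}

/*
 * Hypothetical public hook (assumption, not an existing QEMU function).
 * pthread_once() makes repeated calls harmless, so it can be invoked
 * from a common device init path.
 */
void dmabuf_lock_init(void)
{
    pthread_once(&lock_once, lock_init_once);
}

/* Drop every resource under the lock and report how many were removed. */
int dmabuf_cleanup(void)
{
    int removed;

    pthread_mutex_lock(&lock);
    removed = num_resources;
    num_resources = 0;
    pthread_mutex_unlock(&lock);

    return removed;
}
```

The init hook has to run before the first cleanup call; whether `virtio_init` is the right caller in QEMU is exactly the open question above.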


>
>
>>
>>
>>> > +
>>> > +return num_removed;
>>> > +}
>>> > +
>>> >  void virtio_free_resources(void)
>>> >  {
>>> >  g_mutex_lock(&lock);
>>> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>> > index 2c9ac79468..c5622eac14 100644
>>> > --- a/hw/virtio/vhost.c
>>> > +++ b/hw/virtio/vhost.c
>>> > @@ -16,6 +16,7 @@
>>> >  #include "qemu/osdep.h"
>>> >  #include "qapi/error.h"
>>> >  #include "hw/virtio/vhost.h"
>>> > +#include "hw/virtio/virtio-dmabuf.h"
>>> >  #include "qemu/atomic.h"
>>> >  #include "qemu/range.h"
>>> >  #include "qemu/error-report.h"
>>> > @@ -1599,6 +1600,8 @@ void vhost_dev_cleanup(struct vhost_dev *hdev)
>>> >  migrate_del_blocker(&hdev->migration_blocker);
>>> >  g_free(hdev->mem);
>>> >  g_free(hdev->mem_sections);
>>> > +/* free virtio shared objects */
>>> > +virtio_dmabuf_vhost_cleanup(hdev);
>>> >  if (hdev->vhost_ops) {
>>> >  hdev->vhost_ops->vhost_backend_cleanup(hdev);
>>> >  }
>>> > diff --git a/include/hw/virtio/virtio-dmabuf.h
>>> b/include/hw/virtio/virtio-dmabuf.h
>>> > index 627c3b6db7..73f70fb482 100644
>>> > --- a/include/hw/virtio/virtio-dmabuf.h
>>> > +++ b/include/hw/virtio/virtio-dmabuf.h
>>> > @@ -91,6 +91,16 @@ struct vhost_dev *virtio_lookup_vhost_device(const
>>> QemuUUID *uuid);
>>> >   */
>>> >  SharedObjectType virtio_object_type(const QemuUUID *uuid);
>>> >
>>> > +/**
>>> > + * virtio_dmabuf_vhost_cleanup() - Destroys all entries of the shared
>>> > + * resources lookup table that are owned by the vhost backend
>>> > + * @dev: the pointer to the vhost device that owns the entries. Data
>>> is owned
>>> > + *   by the caller of the function.
>>> > + *
>>> > + * Return: the number of resource entries removed.
>>> > + */
>>>

Re: [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory.

2024-02-19 Thread Jonathan Cameron via

On Thu, 15 Feb 2024 09:30:27 -1000
Richard Henderson  wrote:

> On 2/15/24 05:01, Jonathan Cameron wrote:
> > On i386, after fixing the page walking code to work with pages in
> > MMIO memory (specifically CXL emulated interleaved memory),
> > a crash was seen in an interrupt handling path.
> > 
> > Useful part of bt
> > 
> > Peter identified this as being due to the BQL already being
> > held when the page table walker encounters MMIO memory and attempts
> > to take the lock again.  There are other examples of similar paths
> > in TCG, so this follows the approach taken in those of simply checking
> > if the lock is already held and if it is, don't take it again.
> > 
> > Suggested-by: Peter Maydell 
> > Signed-off-by: Jonathan Cameron 
> > ---
> >   accel/tcg/cputlb.c | 9 +++--
> >   1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> > index 047cd2cc0a..3b8d178707 100644
> > --- a/accel/tcg/cputlb.c
> > +++ b/accel/tcg/cputlb.c
> > @@ -2019,6 +2019,7 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, 
> > CPUTLBEntryFull *full,
> >  int mmu_idx, MMUAccessType type, uintptr_t 
> > ra)
> >   {
> >   MemoryRegionSection *section;
> > +bool locked = bql_locked();
> >   MemoryRegion *mr;
> >   hwaddr mr_offset;
> >   MemTxAttrs attrs;
> > @@ -2030,10 +2031,14 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, 
> > CPUTLBEntryFull *full,
> >   section = io_prepare(&mr_offset, cpu, full->xlat_section, attrs, 
> > addr, ra);
> >   mr = section->mr;
> >   
> > -bql_lock();
> > +if (!locked) {
> > +bql_lock();
> > +}
> >   ret = int_ld_mmio_beN(cpu, full, ret_be, addr, size, mmu_idx,
> > type, ra, mr, mr_offset);
> > -bql_unlock();
> > +if (!locked) {
> > +bql_unlock();
> > +}  
> 
> On top of other comments, I'm never keen on this type of 
> test/lock/test/unlock.  When this 
> kind of thing is encountered, it means we should have been using a recursive 
> lock in the 
> first place.

Hi Richard,

Whilst I agree this stuff is really ugly, is it practical to fix it for this 
case?
Or was the intent here to make a general comment on QEMU locking?

Jonathan
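
For reference, the two styles under discussion can be sketched with plain pthreads. This is only an illustration under assumptions: the BQL is not a pthread mutex exposed like this, and every name below (`plain_locked`, `load_conditional`, `load_recursive`, and so on) is made up for the example. The first half mirrors the test/lock/test/unlock shape of the patch; the second half shows the same nested call with a recursive mutex, where the inner function needs no check.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy device access that must run with the lock held. */
static uint64_t device_read(void)
{
    return 0x1234;
}

/* Style 1: plain mutex plus a per-thread "held" flag. */
static pthread_mutex_t plain_lock = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local bool plain_held;

static bool plain_locked(void) { return plain_held; }

static void plain_acquire(void)
{
    pthread_mutex_lock(&plain_lock);
    plain_held = true;
}

static void plain_release(void)
{
    plain_held = false;
    pthread_mutex_unlock(&plain_lock);
}

/* Same shape as the patch: only lock if the caller has not already. */
uint64_t load_conditional(void)
{
    bool locked = plain_locked();
    uint64_t ret;

    if (!locked) {
        plain_acquire();
    }
    ret = device_read();
    if (!locked) {
        plain_release();
    }
    return ret;
}

/* A caller that already holds the lock, like the interrupt path. */
uint64_t irq_path_conditional(void)
{
    uint64_t ret;

    plain_acquire();
    ret = load_conditional();   /* sees the flag and skips locking */
    plain_release();
    return ret;
}

/* Style 2: recursive mutex, no test needed in the inner function. */
static pthread_mutex_t rec_lock;

void rec_lock_init(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&rec_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

uint64_t load_recursive(void)
{
    uint64_t ret;

    pthread_mutex_lock(&rec_lock);      /* nests if already held */
    ret = device_read();
    pthread_mutex_unlock(&rec_lock);
    return ret;
}

uint64_t irq_path_recursive(void)
{
    uint64_t ret;

    pthread_mutex_lock(&rec_lock);
    ret = load_recursive();
    pthread_mutex_unlock(&rec_lock);
    return ret;
}
```

`rec_lock_init()` must be called once before the recursive variants are used. Whether converting an existing global lock to this model is practical is a separate matter from how the inner function looks.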


> 
> 
> r~

Re: [PATCH v7 2/4] qcow2: add configurations for zoned format extension

2024-02-19 Thread Sam Li

On Mon, Feb 19, 2024 at 13:05, Markus Armbruster wrote:
>
> One more thing...
>
> Markus Armbruster  writes:
>
> > I apologize for the delayed review.

No problems. Thanks for reviewing!

> >
> > Sam Li  writes:
> >
> >> To configure the zoned format feature on the qcow2 driver, it
> >> requires settings as: the device size, zone model, zone size,
> >> zone capacity, number of conventional zones, limits on zone
> >> resources (max append bytes, max open zones, and max_active_zones).
> >>
> >> To create a qcow2 image with zoned format feature, use command like
> >> this:
> >> qemu-img create -f qcow2 zbc.qcow2 -o size=768M \
> >> -o zone.size=64M -o zone.capacity=64M -o zone.conventional_zones=0 \
> >> -o zone.max_append_bytes=4096 -o zone.max_open_zones=6 \
> >> -o zone.max_active_zones=8 -o zone.mode=host-managed
> >>
> >> Signed-off-by: Sam Li 
> >
> > [...]
> >
> >> diff --git a/qapi/block-core.json b/qapi/block-core.json
> >> index ca390c5700..e2e0ec21a5 100644
> >> --- a/qapi/block-core.json
> >> +++ b/qapi/block-core.json
> >> @@ -5038,6 +5038,67 @@
> >>  { 'enum': 'Qcow2CompressionType',
> >>'data': [ 'zlib', { 'name': 'zstd', 'if': 'CONFIG_ZSTD' } ] }
> >>
> >> +##
> >> +# @Qcow2ZoneModel:
> >> +#
> >> +# Zoned device model used in qcow2 image file
> >> +#
> >> +# @host-managed: The host-managed model only allows sequential write over 
> >> the
> >> +# device zones.
> >> +#
> >> +# Since 8.2
> >> +##
> >> +{ 'enum': 'Qcow2ZoneModel',
> >> +  'data': [ 'host-managed'] }
> >> +
> >> +##
> >> +# @Qcow2ZoneHostManaged:
> >> +#
> >> +# The host-managed zone model.  It only allows sequential writes.
> >> +#
> >> +# @size: Total number of bytes within zones.
> >
> > Default?

It should be set by users. No default value provided. If it's unset
then it is zero and an error will be returned.

> >
> >> +#
> >> +# @capacity: The number of usable logical blocks within zones
> >> +# in bytes.  A zone capacity is always smaller or equal to the
> >> +# zone size.
> >
> > Default?

Same.

> >
> >> +# @max-append-bytes: The maximal number of bytes of a zone
> >> +# append request that can be issued to the device.  It must be
> >> +# 512-byte aligned and less than the zone capacity.
> >
> > Default?

Same.

For those values, I guess it could be set when users provide no
information and still want a workable emulated zoned block device.

> >
> >> +#
> >> +# Since 8.2
> >> +##
> >> +{ 'struct': 'Qcow2ZoneHostManaged',
> >> +  'data': { '*size':  'size',
> >> +'*capacity':  'size',
> >> +'*conventional-zones': 'uint32',
> >> +'*max-open-zones': 'uint32',
> >> +'*max-active-zones':   'uint32',
> >> +'*max-append-bytes':   'size' } }
> >> +
> >> +##
> >> +# @Qcow2ZoneCreateOptions:
> >> +#
> >> +# The zone device model for the qcow2 image.
>
> Please document member @mode.
>
> Fails to build since merge commit 61e7a0d27c1:
>
> qapi/block-core.json: In union 'Qcow2ZoneCreateOptions':
> qapi/block-core.json:5135: member 'mode' lacks documentation
>

I see. Will update to the latest commit.

> >> +#
> >> +# Since 8.2
> >> +##
> >> +{ 'union': 'Qcow2ZoneCreateOptions',
> >> +  'base': { 'mode': 'Qcow2ZoneModel' },
> >> +  'discriminator': 'mode',
> >> +  'data': { 'host-managed': 'Qcow2ZoneHostManaged' } }
> >> +
> >>  ##
> >>  # @BlockdevCreateOptionsQcow2:
> >>  #
> >> @@ -5080,6 +5141,9 @@
> >>  # @compression-type: The image cluster compression method
> >>  # (default: zlib, since 5.1)
> >>  #
> >> +# @zone: The zone device model modes.  The default is that the device is
> >> +# not zoned.  (since 8.2)
> >> +#
> >>  # Since: 2.12
> >>  ##
> >>  { 'struct': 'BlockdevCreateOptionsQcow2',
> >> @@ -5096,7 +5160,8 @@
> >>  '*preallocation':   'PreallocMode',
> >>  '*lazy-refcounts':  'bool',
> >>  '*refcount-bits':   'int',
> >> -'*compression-type':'Qcow2CompressionType' } }
> >> +'*compression-type':'Qcow2CompressionType',
> >> +'*zone':'Qcow2ZoneCreateOptions' } }
> >>
> >>  ##
> >>  # @BlockdevCreateOptionsQed:
>

Re: [PATCH] tests/qtest: Fix boot-serial-test when using --without-default-devices

2024-02-19 Thread Thomas Huth


On 19/02/2024 12.37, BALATON Zoltan wrote:

On Mon, 19 Feb 2024, Thomas Huth wrote:

If "configure" has been run with "--without-default-devices", there is
no e1000 device in the binaries, so the boot-serial-test currently fails
in that case since it tries to use the e1000 with the sam460ex machine.

Since we're testing the serial output here, and not the NIC, let's
simply switch to the "pci-bridge" device here instead, which should
always be there for PCIe-based machines like the sam460ex.


It's not actually testing PCIe but PCI bus but I think that does not matter. 
PCIe on sam460ex does not work yet, I've only implemented it partially to 
pass the firmware init but devices attached to the PCIe bus probably won't 
work. I have some patches to improve that but not yet ready.


Ah, ok, I looked at the Kconfig file and saw the "select PCI_EXPRESS" there 
that got selected by PPC440 (which gets selected by SAM460EX), that's why I 
concluded that it must be "PCIe-based" ... I'll drop the "e" here.



Signed-off-by: Thomas Huth 
---
tests/qtest/boot-serial-test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
index 6dd06aeaf4..e3b7d65fe5 100644
--- a/tests/qtest/boot-serial-test.c
+++ b/tests/qtest/boot-serial-test.c
@@ -156,7 +156,7 @@ static const testdef_t tests[] = {
  "Open Firmware" },
    { "ppc64", "powernv8", "", "OPAL" },
    { "ppc64", "powernv9", "", "OPAL" },
-    { "ppc64", "sam460ex", "-device e1000", "8086  100e" },
+    { "ppc64", "sam460ex", "-device pci-bridge,chassis_nr=2", "1b36  
0001" },


So if you want to check if PCI bus works then maybe there's no need to add a 
device at all just look for the sm501 display chip ("126f 0501") that's 
soldered on the board so it's always created even with -nodefaults and 
should always present on sam460ex. The -device option just adds a device 
that appears before the sm501 and stops the test there. Not sure if this is 
testing more than looking for a PCI device created by the board code.


I was considering the sm501, too, but I thought that we might test a little 
bit more if we check that cold-plugging via "-device" works, too, so I'd 
prefer to keep it this way.


 Thomas

Re: [PATCH 1/6] hw/arm: Inline sysbus_create_simple(PL110 / PL111)

2024-02-19 Thread Philippe Mathieu-Daudé


On 19/2/24 13:00, BALATON Zoltan wrote:

On Mon, 19 Feb 2024, Philippe Mathieu-Daudé wrote:

On 19/2/24 12:27, BALATON Zoltan wrote:

On Mon, 19 Feb 2024, Philippe Mathieu-Daudé wrote:

On 16/2/24 20:54, Philippe Mathieu-Daudé wrote:

On 16/2/24 18:14, BALATON Zoltan wrote:

On Fri, 16 Feb 2024, Philippe Mathieu-Daudé wrote:

We want to set another qdev property (a link) for the pl110
and pl111 devices, we can not use sysbus_create_simple() which
only passes sysbus base address and IRQs as arguments. Inline
it so we can set the link property in the next commit.

Signed-off-by: Philippe Mathieu-Daudé 
---
hw/arm/realview.c    |  5 -
hw/arm/versatilepb.c |  6 +-
hw/arm/vexpress.c    | 10 --
3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/hw/arm/realview.c b/hw/arm/realview.c
index 9058f5b414..77300e92e5 100644
--- a/hw/arm/realview.c
+++ b/hw/arm/realview.c
@@ -238,7 +238,10 @@ static void realview_init(MachineState 
*machine,

    sysbus_create_simple("pl061", 0x10014000, pic[7]);
    gpio2 = sysbus_create_simple("pl061", 0x10015000, pic[8]);

-    sysbus_create_simple("pl111", 0x1002, pic[23]);
+    dev = qdev_new("pl111");
+    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
+    sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, 0x1002);
+    sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[23]);


Not directly related to this patch but this blows up 1 line into 4 
just to allow setting a property. Maybe just to keep some 
simplicity we'd rather need either a sysbus_realize_simple 
function that takes a sysbus device instead of the name and does 
not create the device itself or some way to pass properties to 
sysbus create simple (but the latter may not be easy to do in a 
generic way so not sure about that). What do you think?


Unfortunately sysbus doesn't scale in heterogeneous setup.


Regarding the HW modelling API complexity you are pointing at, we'd
like to move from the current imperative programming paradigm to a
declarative one, likely DSL driven. Meanwhile it is being investigated
(as part of "Dynamic Machine"), I'm trying to get the HW APIs right


I'm aware of that activity but we're currently still using board code 
to construct machines and probably will continue to do so for a 
while. Also because likely not all current machines will be converted 
to new declarative way so having a convenient API for that is still 
useful.


(As for the language to describe the devices of a machine and their 
connections declaratively, the device tree does just that, but dts is 
not a very user friendly description language so I haven't brought 
that up as a possibility. But you could still get some clues by 
looking at the problems it had to solve, to at least get 
requirements for the machine description language.)



for heterogeneous emulation. Current price to pay is a verbose
imperative QDev API, hoping we'll get later a trivial declarative one
(like this single sysbus_create_simple call), where we shouldn't worry
about the order of low level calls, whether to use link or not, etc.


Having a detailed low level API does not prevent a more convenient 
higher level API on top for current use, so keeping that around for 
current machines would allow you to change the low level API without 
having to change all the board code, because you'd only need to 
update the simple high level API.


So what is your suggestion here, add a new complex helper to keep
a one-line style?

DeviceState *sysbus_create_simple_dma_link(const char *typename,
  hwaddr baseaddr,
  const char *linkname,
  Object *linkobj,
  qemu_irq irq);


I think just having sysbus_realize_simple that does the same as 
sysbus_create_simple minus creating the device would be enough because 
then the cases where you need to set properties could still use it after 
qdev_new or init and property_set but hide the realize and connecting 
the device behind this single call.


So you suggest splitting sysbus_create_simple() as
sysbus_create_simple() + sysbus_realize_simple(), so we can set
properties between the 2 calls? IOW extract qdev_new() from
sysbus_create_varargs() and rename it as sysbus_realize_simple()?

So we need a massive refactoring of:

- dev = sysbus_create_simple(typename, addr, irq);
+ dev = qdev_new(typename);
+ // optionally set properties
+ sysbus_realize_simple(dev, addr, irq);

- dev = sysbus_create_varargs(typename, addr, irqA, irqB, ...);
+ dev = qdev_new(typename);
+ // optionally set properties
+ sysbus_realize_varargs(dev, addr, irqA, irqB, ...);

I'm not sure it is worth it because we want to move away from
sysbus, merging the non-sysbus specific API to qdev (like indexed
memory regions and IRQs to named ones).
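
To make the proposed split concrete, here is a toy model in plain C. It deliberately does not use the qdev or sysbus APIs; `ToyDevice` and all `toy_*` functions are hypothetical stand-ins, and "realize" only records the base address and IRQ. It shows the shape being discussed: the one-call helper stays as a thin wrapper, while callers that need a property use new, set, then realize.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy device: not a real QEMU type, just enough state for the sketch. */
typedef struct ToyDevice {
    const char *type_name;
    const char *link;       /* optional property, set before realize */
    uint64_t base;
    int irq;
    bool realized;
} ToyDevice;

/* Counterpart of the "create" half: allocate, but do not realize. */
ToyDevice *toy_new(const char *type_name)
{
    ToyDevice *dev = calloc(1, sizeof(*dev));

    assert(dev != NULL);
    dev->type_name = type_name;
    dev->irq = -1;
    return dev;
}

/* Properties may only be set while the device is still unrealized. */
void toy_set_link(ToyDevice *dev, const char *link)
{
    assert(!dev->realized);
    dev->link = link;
}

/* Counterpart of the "realize" half: map and wire up in one call. */
void toy_realize_simple(ToyDevice *dev, uint64_t base, int irq)
{
    dev->base = base;
    dev->irq = irq;
    dev->realized = true;
}

/* The existing one-liner becomes a thin wrapper over the two halves. */
ToyDevice *toy_create_simple(const char *type_name, uint64_t base, int irq)
{
    ToyDevice *dev = toy_new(type_name);

    toy_realize_simple(dev, base, irq);
    return dev;
}
```

With this shape, boards that need no properties keep the single call, and only the call sites that set a link grow by two lines.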


I wonder why this is that important since you never modified
any of the files changed by this series:


For ne

Re: [PATCH v4 11/66] i386: Introduce tdx-guest object

2024-02-19 Thread Markus Armbruster

Xiaoyao Li  writes:

> Introduce tdx-guest object which implements the interface of
> CONFIDENTIAL_GUEST_SUPPORT, and will be used to create TDX VMs (TDs) by
>
>   qemu -machine ...,confidential-guest-support=tdx0   \
>-object tdx-guest,id=tdx0
>
> It has only one member 'attributes' with fixed value 0 and not
> configurable so far.

Really?  Can't see it.

Suggest to add something like "Configuration properties will be added
later in this series."

> Signed-off-by: Xiaoyao Li 
> Acked-by: Gerd Hoffmann 
> Acked-by: Markus Armbruster 

[...]

> diff --git a/qapi/qom.json b/qapi/qom.json
> index 95516ba325e5..5b3c3146947f 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -895,6 +895,16 @@
>  'reduced-phys-bits': 'uint32',
>  '*kernel-hashes': 'bool' } }
>  
> +##
> +# @TdxGuestProperties:
> +#
> +# Properties for tdx-guest objects.
> +#
> +# Since: 9.0
> +##
> +{ 'struct': 'TdxGuestProperties',
> +  'data': { }}
> +
>  ##
>  # @ThreadContextProperties:
>  #
> @@ -974,6 +984,7 @@
>  'sev-guest',
>  'thread-context',
>  's390-pv-guest',
> +'tdx-guest',
>  'throttle-group',
>  'tls-creds-anon',
>  'tls-creds-psk',
> @@ -1041,6 +1052,7 @@
>'secret_keyring': { 'type': 'SecretKeyringProperties',
>'if': 'CONFIG_SECRET_KEYRING' },
>'sev-guest':  'SevGuestProperties',
> +  'tdx-guest':  'TdxGuestProperties',
>'thread-context': 'ThreadContextProperties',
>'throttle-group': 'ThrottleGroupProperties',
>'tls-creds-anon': 'TlsCredsAnonProperties',

[...]

[PATCH] tests/qtest: Don't run the dbus-display-test without CONFIG_VGA_PCI

2024-02-19 Thread Thomas Huth

When compiling with "configure --without-default-devices", the
dbus-display-test fails since it implicitly assumes that the
machine comes with the standard VGA card. Thus add a check to
meson.build to disable the test if the VGA card is not available.

Signed-off-by: Thomas Huth 
---
 tests/qtest/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 2b89e8634b..c8e6d7df40 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -108,7 +108,7 @@ qtests_i386 = \
'numa-test'
   ]
 
-if dbus_display
+if dbus_display and config_all_devices.has_key('CONFIG_VGA_PCI')
   qtests_i386 += ['dbus-display-test']
 endif
 
-- 
2.43.2

Re: [PATCH v2 02/11] hw/audio/virtio-sound: fix segmentation fault in tx/rx xfer handler

2024-02-19 Thread Manos Pitsidianakis

Hello Volker, thanks for working on this,

On Sun, 18 Feb 2024 at 10:33, Volker Rümelin  wrote:
>
> A malicious guest may trigger a segmentation fault in the tx/rx xfer
> handlers. On handler entry the stream variable is initialized with
> NULL. If the first element of the virtio queue has an invalid size
> or an invalid stream id, the error handling code dereferences the
> stream variable NULL pointer.

Why not just add a bounds check and a null check instead?

>
> Don't try to handle the invalid virtio queue element with a stream
> queue. Instead, push the invalid queue element back to the guest
> immediately.

IIRC this will result in an infinite loop, because the code is
emptying the vq until virtqueue_pop returns NULL.

So if you add the invalid message back, the vq will never be empty.
Eventually you will loop over all invalid messages forever.

(Please correct me if I'm wrong of course!)

>
> Signed-off-by: Volker Rümelin 
> ---
>  hw/audio/virtio-snd.c | 100 ++
>  include/hw/audio/virtio-snd.h |   1 -
>  2 files changed, 29 insertions(+), 72 deletions(-)
>
> diff --git a/hw/audio/virtio-snd.c b/hw/audio/virtio-snd.c
> index e604d8f30c..b87653daf4 100644
> --- a/hw/audio/virtio-snd.c
> +++ b/hw/audio/virtio-snd.c
> @@ -456,7 +456,6 @@ static uint32_t virtio_snd_pcm_prepare(VirtIOSound *s, 
> uint32_t stream_id)
>  stream->s = s;
>  qemu_mutex_init(&stream->queue_mutex);
>  QSIMPLEQ_INIT(&stream->queue);
> -QSIMPLEQ_INIT(&stream->invalid);
>
>  /*
>   * stream_id >= s->snd_conf.streams was checked before so this is
> @@ -611,9 +610,6 @@ static size_t 
> virtio_snd_pcm_get_io_msgs_count(VirtIOSoundPCMStream *stream)
>  QSIMPLEQ_FOREACH_SAFE(buffer, &stream->queue, entry, next) {
>  count += 1;
>  }
> -QSIMPLEQ_FOREACH_SAFE(buffer, &stream->invalid, entry, next) {
> -count += 1;
> -}
>  }
>  return count;
>  }
> @@ -831,47 +827,19 @@ static void virtio_snd_handle_event(VirtIODevice *vdev, 
> VirtQueue *vq)
>  trace_virtio_snd_handle_event();
>  }
>
> -static inline void empty_invalid_queue(VirtIODevice *vdev, VirtQueue *vq)
> +static void push_bad_msg_resp(VirtQueue *vq, VirtQueueElement *elem)
>  {
> -VirtIOSoundPCMBuffer *buffer = NULL;
> -VirtIOSoundPCMStream *stream = NULL;
>  virtio_snd_pcm_status resp = { 0 };
> -VirtIOSound *vsnd = VIRTIO_SND(vdev);
> -bool any = false;
> -
> -for (uint32_t i = 0; i < vsnd->snd_conf.streams; i++) {
> -stream = vsnd->pcm->streams[i];
> -if (stream) {
> -any = false;
> -WITH_QEMU_LOCK_GUARD(&stream->queue_mutex) {
> -while (!QSIMPLEQ_EMPTY(&stream->invalid)) {
> -buffer = QSIMPLEQ_FIRST(&stream->invalid);
> -if (buffer->vq != vq) {
> -break;
> -}
> -any = true;
> -resp.status = cpu_to_le32(VIRTIO_SND_S_BAD_MSG);
> -iov_from_buf(buffer->elem->in_sg,
> - buffer->elem->in_num,
> - 0,
> - &resp,
> - sizeof(virtio_snd_pcm_status));
> -virtqueue_push(vq,
> -   buffer->elem,
> -   sizeof(virtio_snd_pcm_status));
> -QSIMPLEQ_REMOVE_HEAD(&stream->invalid, entry);
> -virtio_snd_pcm_buffer_free(buffer);
> -}
> -if (any) {
> -/*
> - * Notify vq about virtio_snd_pcm_status responses.
> - * Buffer responses must be notified separately later.
> - */
> -virtio_notify(vdev, vq);
> -}
> -}
> -}
> -}
> +size_t msg_sz;
> +
> +resp.status = cpu_to_le32(VIRTIO_SND_S_BAD_MSG);
> +msg_sz = iov_from_buf(elem->in_sg,
> +  elem->in_num,
> +  0,
> +  &resp,
> +  sizeof(virtio_snd_pcm_status));
> +virtqueue_push(vq, elem, msg_sz);
> +g_free(elem);
>  }
>
>  /*
> @@ -890,11 +858,7 @@ static void virtio_snd_handle_tx_xfer(VirtIODevice 
> *vdev, VirtQueue *vq)
>  size_t msg_sz, size;
>  virtio_snd_pcm_xfer hdr;
>  uint32_t stream_id;
> -/*
> - * If any of the I/O messages are invalid, put them in stream->invalid 
> and
> - * return them after the for loop.
> - */
> -bool must_empty_invalid_queue = false;
> +bool notify = false;
>
>  if (!virtio_queue_ready(vq)) {
>  return;
> @@ -942,17 +906,16 @@ static void virtio_snd_handle_tx_xfer(VirtIODevice 
> *vdev, VirtQueue *vq)
>  continue;
>
>  tx_err:
> -WITH_QEMU_LOCK_GUARD(&stream->

Re: [PATCH v4 11/66] i386: Introduce tdx-guest object

2024-02-19 Thread Daniel P . Berrangé

On Mon, Feb 19, 2024 at 01:34:37PM +0100, Markus Armbruster wrote:
> Xiaoyao Li  writes:
> 
> > Introduce tdx-guest object which implements the interface of
> > CONFIDENTIAL_GUEST_SUPPORT, and will be used to create TDX VMs (TDs) by
> >
> >   qemu -machine ...,confidential-guest-support=tdx0 \
> >-object tdx-guest,id=tdx0
> >
> > It has only one member 'attributes' with fixed value 0 and not
> > configurable so far.
> 
> Really?  Can't see it.

The 'attributes' referred to is an internal struct field,
rather than a QAPI declared member.

> 
> Suggest to add something like "Configuration properties will be added
> later in this series."
> 
> > Signed-off-by: Xiaoyao Li 
> > Acked-by: Gerd Hoffmann 
> > Acked-by: Markus Armbruster 
> 
> [...]
> 
> > diff --git a/qapi/qom.json b/qapi/qom.json
> > index 95516ba325e5..5b3c3146947f 100644
> > --- a/qapi/qom.json
> > +++ b/qapi/qom.json
> > @@ -895,6 +895,16 @@
> >  'reduced-phys-bits': 'uint32',
> >  '*kernel-hashes': 'bool' } }
> >  
> > +##
> > +# @TdxGuestProperties:
> > +#
> > +# Properties for tdx-guest objects.
> > +#
> > +# Since: 9.0
> > +##
> > +{ 'struct': 'TdxGuestProperties',
> > +  'data': { }}
> > +
> >  ##
> >  # @ThreadContextProperties:
> >  #
> > @@ -974,6 +984,7 @@
> >  'sev-guest',
> >  'thread-context',
> >  's390-pv-guest',
> > +'tdx-guest',
> >  'throttle-group',
> >  'tls-creds-anon',
> >  'tls-creds-psk',
> > @@ -1041,6 +1052,7 @@
> >'secret_keyring': { 'type': 'SecretKeyringProperties',
> >'if': 'CONFIG_SECRET_KEYRING' },
> >'sev-guest':  'SevGuestProperties',
> > +  'tdx-guest':  'TdxGuestProperties',
> >'thread-context': 'ThreadContextProperties',
> >'throttle-group': 'ThrottleGroupProperties',
> >'tls-creds-anon': 'TlsCredsAnonProperties',
> 
> [...]
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
