Re: [RFC] Unable to use qemu-ppc to run 32-bit powerpc executables generated on gcc110 machine
Le 12/08/2022 à 06:16, Thomas Huth a écrit : On 11/08/2022 23.38, Pierre Muller wrote: I am using qemu to check code generated by Free Pascal compiler for various CPUs. Recently, this allowed me to find out that Free Pascal was generating wrong instructions, leading to SIGBUS errors using qemu-mips. The same binaries worked without troubles on mips test machines, probably because SIGBUS is handled directly inside the kernel. Here I would like to report the problem I get when trying to run powerpc executables using shared libs generated on gcc110 machine. I copied over the needed libraries into a sys-root directory. The problem is that the code crashes with a Illegal Instruction after only a very few instructions: muller@gcc186:~/pas/check$ ~/sys-root/bin/qemu-ppc -cpu g2 -d in_asm -L ~/sys-root/powerpc-linux ./twide1 [...] 0x3ffc1d60: f4d7 xxlxor v0, v0, v0 qemu: uncaught target signal 4 (Illegal instruction) - core dumped The problem is the the 'xxlxor' instruction is a VSX extension instruction. There is apparently no cpu in the powerpc cpu list that enabled this extension. The output of cat /proc/cpuinfo on gcc110 gives that: . processor : 63 cpu : POWER7 (architected), altivec supported clock : 3550.00MHz revision : 2.1 (pvr 003f 0201) timebase : 51200 platform : pSeries model : IBM,8231-E2B machine : CHRP IBM,8231-E2B Is there a way to enable cpu features separately for ppc like is done for x86_64? Or would it be possible to define a new cpu inside qemu source that would match the description above? So you are building on a POWER7 host and try to run the binaries on an emulated G2? That sounds weird. Why don't you use The g2 was just an example, I used a script to iterate over all possible cpus (as listed by --cpu help), but I always get a Illegal instruction on xllxor, because none of the cpu in the least seems to enable VSX extension. qemu-ppc64 -cpu power7 ... Because I am interested in testing 32-bit ELF binaries: muller@gcc186:~/pas/check$ file ./twide1 ./twide1: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV), dynamically linked, interpreter /lib/ld.so.1, stripped muller@gcc186:~/pas/check$ qemu-ppc64 -cpu power7 ./twide1 -bash: qemu-ppc64: command not found muller@gcc186:~/pas/check$ ~/sys-root/bin/qemu-ppc64 -cpu power7 ./twide1 qemu-ppc64: ./twide1: Invalid ELF image for this architecture muller@gcc186:~/pas/check$ ~/sys-root/bin/qemu-ppc64 --version qemu-ppc64 version 7.0.0 Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers muller@gcc186:~/pas/check$ ~/gnu/qemu/build-qemu-7.1.0-rc2/qemu-ppc64 -cpu power7 ./twide1 qemu-ppc64: ./twide1: Invalid ELF image for this architecture So qemu-ppc64 (7.0.0 and 7.1.0-rc2) is only able to run 64-bit binaries. Pierre
Re: [PATCH] hw/riscv: opentitan: bump opentitan version
On Fri, Aug 12, 2022 at 10:54 AM Wilfred Mallawa wrote: > > From: Wilfred Mallawa > > The following patch updates opentitan to match the new configuration, > as per, lowRISC/opentitan@217a0168ba118503c166a9587819e3811eeb0c0c > > Note: with this patch we now skip the usage of the opentitan > `boot_rom`. The Opentitan boot rom contains hw verification > for devies which we are currently not supporting in qemu. As of now, > the `boot_rom` has no major significance, however, would be good to > support in the future. > > Tested by running utests from the latest tock [1] > (that supports this version of OT). > > [1] https://github.com/tock/tock/pull/3056 > > Signed-off-by: Wilfred Mallawa Reviewed-by: Alistair Francis Alistair > --- > hw/riscv/opentitan.c | 12 > include/hw/riscv/opentitan.h | 11 ++- > 2 files changed, 14 insertions(+), 9 deletions(-) > > diff --git a/hw/riscv/opentitan.c b/hw/riscv/opentitan.c > index 4495a2c039..af13dbe3b1 100644 > --- a/hw/riscv/opentitan.c > +++ b/hw/riscv/opentitan.c > @@ -29,9 +29,9 @@ > #include "sysemu/sysemu.h" > > static const MemMapEntry ibex_memmap[] = { > -[IBEX_DEV_ROM] ={ 0x8000, 16 * KiB }, > -[IBEX_DEV_RAM] ={ 0x1000, 0x1 }, > -[IBEX_DEV_FLASH] = { 0x2000, 0x8 }, > +[IBEX_DEV_ROM] ={ 0x8000, 0x8000 }, > +[IBEX_DEV_RAM] ={ 0x1000, 0x2 }, > +[IBEX_DEV_FLASH] = { 0x2000, 0x10 }, > [IBEX_DEV_UART] = { 0x4000, 0x1000 }, > [IBEX_DEV_GPIO] = { 0x4004, 0x1000 }, > [IBEX_DEV_SPI_DEVICE] = { 0x4005, 0x1000 }, > @@ -40,6 +40,7 @@ static const MemMapEntry ibex_memmap[] = { > [IBEX_DEV_TIMER] = { 0x4010, 0x1000 }, > [IBEX_DEV_SENSOR_CTRL] ={ 0x4011, 0x1000 }, > [IBEX_DEV_OTP_CTRL] = { 0x4013, 0x4000 }, > +[IBEX_DEV_LC_CTRL] ={ 0x4014, 0x1000 }, > [IBEX_DEV_USBDEV] = { 0x4015, 0x1000 }, > [IBEX_DEV_SPI_HOST0] = { 0x4030, 0x1000 }, > [IBEX_DEV_SPI_HOST1] = { 0x4031, 0x1000 }, > @@ -141,7 +142,8 @@ static void lowrisc_ibex_soc_realize(DeviceState > *dev_soc, Error **errp) > &error_abort); > object_property_set_int(OBJECT(&s->cpus), "num-harts", ms->smp.cpus, > &error_abort); > -object_property_set_int(OBJECT(&s->cpus), "resetvec", 0x8080, > &error_abort); > +object_property_set_int(OBJECT(&s->cpus), "resetvec", 0x2490, > +&error_abort); > sysbus_realize(SYS_BUS_DEVICE(&s->cpus), &error_fatal); > > /* Boot ROM */ > @@ -253,6 +255,8 @@ static void lowrisc_ibex_soc_realize(DeviceState > *dev_soc, Error **errp) > memmap[IBEX_DEV_SENSOR_CTRL].base, > memmap[IBEX_DEV_SENSOR_CTRL].size); > create_unimplemented_device("riscv.lowrisc.ibex.otp_ctrl", > memmap[IBEX_DEV_OTP_CTRL].base, memmap[IBEX_DEV_OTP_CTRL].size); > +create_unimplemented_device("riscv.lowrisc.ibex.lc_ctrl", > +memmap[IBEX_DEV_LC_CTRL].base, memmap[IBEX_DEV_LC_CTRL].size); > create_unimplemented_device("riscv.lowrisc.ibex.pwrmgr", > memmap[IBEX_DEV_PWRMGR].base, memmap[IBEX_DEV_PWRMGR].size); > create_unimplemented_device("riscv.lowrisc.ibex.rstmgr", > diff --git a/include/hw/riscv/opentitan.h b/include/hw/riscv/opentitan.h > index 68892cd8e5..26d960f288 100644 > --- a/include/hw/riscv/opentitan.h > +++ b/include/hw/riscv/opentitan.h > @@ -74,6 +74,7 @@ enum { > IBEX_DEV_TIMER, > IBEX_DEV_SENSOR_CTRL, > IBEX_DEV_OTP_CTRL, > +IBEX_DEV_LC_CTRL, > IBEX_DEV_PWRMGR, > IBEX_DEV_RSTMGR, > IBEX_DEV_CLKMGR, > @@ -105,11 +106,11 @@ enum { > IBEX_UART0_RX_BREAK_ERR_IRQ = 6, > IBEX_UART0_RX_TIMEOUT_IRQ = 7, > IBEX_UART0_RX_PARITY_ERR_IRQ = 8, > -IBEX_TIMER_TIMEREXPIRED0_0= 126, > -IBEX_SPI_HOST0_ERR_IRQ= 150, > -IBEX_SPI_HOST0_SPI_EVENT_IRQ = 151, > -IBEX_SPI_HOST1_ERR_IRQ= 152, > -IBEX_SPI_HOST1_SPI_EVENT_IRQ = 153, > +IBEX_TIMER_TIMEREXPIRED0_0= 127, > +IBEX_SPI_HOST0_ERR_IRQ= 151, > +IBEX_SPI_HOST0_SPI_EVENT_IRQ = 152, > +IBEX_SPI_HOST1_ERR_IRQ= 153, > +IBEX_SPI_HOST1_SPI_EVENT_IRQ = 154, > }; > > #endif > -- > 2.37.1 > >
Re: Missing dll
Am 12.08.22 um 01:34 schrieb Philippe Mathieu-Daudé: Cc'ing qemu-windows@ team On 10/8/22 23:42, Peter Butler wrote: In x64 win10 I today I d/l QEMU into new directory. Then navigated to that dir and… qemu-system-aarch64 -boot d -cdrom f:\Downloads\debian-11.4.0-arm64-netinst.iso -m 2048 Error message:…libncursesw6.dll not found… This should be fixed since 2022-08-09. From notes on https://qemu.weilnetz.de/w64/: History *2022-08-09*: New QEMU installers (7.1.0-rc1). Added missing DLL files. *2022-08-05*: New QEMU installers (7.1.0-rc1). Missing DLL files. * *
Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory
This is the v7 of this series which tries to implement the fd-based KVM guest private memory. The patches are based on latest kvm/queue branch commit: b9b71f43683a (kvm/queue) KVM: x86/mmu: Buffer nested MMU split_desc_cache only by default capacity Introduction In general this patch series introduce fd-based memslot which provides guest memory through memory file descriptor fd[offset,size] instead of hva/size. The fd can be created from a supported memory filesystem like tmpfs/hugetlbfs etc. which we refer as memory backing store. KVM and the the memory backing store exchange callbacks when such memslot gets created. At runtime KVM will call into callbacks provided by the backing store to get the pfn with the fd+offset. Memory backing store will also call into KVM callbacks when userspace punch hole on the fd to notify KVM to unmap secondary MMU page table entries. Comparing to existing hva-based memslot, this new type of memslot allows guest memory unmapped from host userspace like QEMU and even the kernel itself, therefore reduce attack surface and prevent bugs. Based on this fd-based memslot, we can build guest private memory that is going to be used in confidential computing environments such as Intel TDX and AMD SEV. When supported, the memory backing store can provide more enforcement on the fd and KVM can use a single memslot to hold both the private and shared part of the guest memory. mm extension - Introduces new MFD_INACCESSIBLE flag for memfd_create(), the file created with these flags cannot read(), write() or mmap() etc via normal MMU operations. The file content can only be used with the newly introduced memfile_notifier extension. The memfile_notifier extension provides two sets of callbacks for KVM to interact with the memory backing store: - memfile_notifier_ops: callbacks for memory backing store to notify KVM when memory gets invalidated. - backing store callbacks: callbacks for KVM to call into memory backing store to request memory pages for guest private memory. The memfile_notifier extension also provides APIs for memory backing store to register/unregister itself and to trigger the notifier when the bookmarked memory gets invalidated. The patchset also introduces a new memfd seal F_SEAL_AUTO_ALLOCATE to prevent double allocation caused by unintentional guest when we only have a single side of the shared/private memfds effective. memslot extension - Add the private fd and the fd offset to existing 'shared' memslot so that both private/shared guest memory can live in one single memslot. A page in the memslot is either private or shared. Whether a guest page is private or shared is maintained through reusing existing SEV ioctls KVM_MEMORY_ENCRYPT_{UN,}REG_REGION. Test To test the new functionalities of this patch TDX patchset is needed. Since TDX patchset has not been merged so I did two kinds of test: - Regresion test on kvm/queue (this patchset) Most new code are not covered. Code also in below repo: https://github.com/chao-p/linux/tree/privmem-v7 - New Funational test on latest TDX code The patch is rebased to latest TDX code and tested the new funcationalities. See below repos: Linux: https://github.com/chao-p/linux/tree/privmem-v7-tdx QEMU: https://github.com/chao-p/qemu/tree/privmem-v7 While debugging an issue with SEV+UPM, found that fallocate() returns an error in QEMU which is not handled (EINTR). With the below handling of EINTR subsequent fallocate() succeeds: diff --git a/backends/hostmem-memfd-private.c b/backends/hostmem-memfd-private.c index af8fb0c957..e8597ed28d 100644 --- a/backends/hostmem-memfd-private.c +++ b/backends/hostmem-memfd-private.c @@ -39,7 +39,7 @@ priv_memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) MachineState *machine = MACHINE(qdev_get_machine()); uint32_t ram_flags; char *name; - int fd, priv_fd; + int fd, priv_fd, ret; if (!backend->size) { error_setg(errp, "can't create backend with size 0"); @@ -65,7 +65,15 @@ priv_memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) backend->size, ram_flags, fd, 0, errp); g_free(name); - fallocate(priv_fd, 0, 0, backend->size); +again: + ret = fallocate(priv_fd, 0, 0, backend->size); + if (ret) { + perror("Fallocate failed: \n"); + if (errno == EINTR) + goto again; + else + exit(1); + } However, fallocate() preallocates full guest memory before starting the guest. With this behaviour guest memory is *not* demand pinned. Is there a way to prevent fallocate() from reserving full guest memory? Isn't the pinning being handled by the corresponding host memory backend with mmu > notifier and architecture support while doing the memory op
Re: [PATCH v3 1/1] os-posix: asynchronous teardown for shutdown on Linux
On Thu, 11 Aug 2022 23:05:52 -0300 Murilo Opsfelder Araújo wrote: > On 8/11/22 11:02, Daniel P. Berrangé wrote: > [...] > >>> Hmm, I was hoping you could just use SIGKILL to guarantee that this > >>> gets killed off. Is SIGKILL delivered too soon to allow for the > >>> main QEMU process to have exited quickly ? > >> > >> yes, I tried. qemu has not finished exiting when the signal is > >> delivered, the cleanup process dies before qemu, which defeats the > >> purpose > > > > Ok, too bad. > > > >>> If so I wonder what happens when systemd just delivers SIGKILL to > >>> all processes in the cgroup - I'm not sure there's a guarantee it > >>> will SIGKILL the main qemu before it SIGKILLs this helper > >> > >> I'm afraid in that case there is no guarantee. > >> > >> for what it's worth, both virsh shutdown and destroy seem to do things > >> properly. > > > > Hmm, probably because libvirt tells QEMU to exit before systemd comes > > along and tells everything in the cgroup to die with SIGKILL. > > It seems Libvirt sends SIGKILL if qemu process doesn't terminate within 10 > seconds after Libvirt sent SIGTERM: > > https://gitlab.com/libvirt/libvirt/-/blob/0615df084ec9996b5df88d6a1b59c557e22f3a12/src/util/virprocess.c#L375 but this is fine. with asynchronous teardown, qemu will exit almost immediately when receiving SIGTERM, and the cleanup process will start cleaning up. > > So I guess this patch happened to work with Libvirt because the main qemu > process terminated before the timeout and before SIGKILL was delivered. it seems so > > The cleanup process is trying to solve the problem where the main qemu process > takes too long to terminate. However, if the cleanup process itself takes too > long, SIGKILL will be sent by Libvirt anyway. but that is not a problem, the sole purpose of the cleanup process is to terminate _after_ qemu. it doesn't matter what happens after qemu has terminated. if you look at the patch, after going to great lengths to assure that qemu has terminated, all the child process does is _exit(0). > > Perhaps we can describe this situation in the parameter help, e.g.: If > management layer decides to send SIGKILL (e.g.: due to timeout or deliberate > decision), the cleanup process can exit before the main process, deceiving its > purpose. if the management layer (or the user) decides to send SIGKILL immediately to the whole cgroup without sending SIGTERM first, then this whole asynchronous teardown mechanism is defeated, yes.
[PATCH for-7.1?] Fix some typos in documentation (most of them found by codespell)
Signed-off-by: Stefan Weil --- docs/about/deprecated.rst | 2 +- docs/specs/acpi_erst.rst| 4 ++-- docs/system/devices/canokey.rst | 8 docs/system/devices/cxl.rst | 12 ++-- 4 files changed, 13 insertions(+), 13 deletions(-) diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst index 7ee26626d5..91b03115ee 100644 --- a/docs/about/deprecated.rst +++ b/docs/about/deprecated.rst @@ -297,7 +297,7 @@ by using ``-machine graphics=off``. In QEMU versions 6.1, 6.2 and 7.0, the ``nvme-ns`` generates an EUI-64 -identifer that is not globally unique. If an EUI-64 identifer is required, the +identifier that is not globally unique. If an EUI-64 identifier is required, the user must set it explicitly using the ``nvme-ns`` device parameter ``eui64``. ``-device nvme,use-intel-id=on|off`` (since 7.1) diff --git a/docs/specs/acpi_erst.rst b/docs/specs/acpi_erst.rst index a8a9d22d25..2339b60ad7 100644 --- a/docs/specs/acpi_erst.rst +++ b/docs/specs/acpi_erst.rst @@ -108,7 +108,7 @@ Slot 0 contains a backend storage header that identifies the contents as ERST and also facilitates efficient access to the records. Depending upon the size of the backend storage, additional slots will be designated to be a part of the slot 0 header. For example, at 8KiB, -the slot 0 header can accomodate 1021 records. Thus a storage size +the slot 0 header can accommodate 1021 records. Thus a storage size of 8MiB (8KiB * 1024) requires an additional slot for use by the header. In this scenario, slot 0 and slot 1 form the backend storage header, and records can be stored starting at slot 2. @@ -196,5 +196,5 @@ References [2] "Unified Extensible Firmware Interface Specification", version 2.1, October 2008. -[3] "Windows Hardware Error Architecture", specfically +[3] "Windows Hardware Error Architecture", specifically "Error Record Persistence Mechanism". diff --git a/docs/system/devices/canokey.rst b/docs/system/devices/canokey.rst index c2c58ae3e7..cfa6186e48 100644 --- a/docs/system/devices/canokey.rst +++ b/docs/system/devices/canokey.rst @@ -28,9 +28,9 @@ With the same software configuration as a hardware key, the guest OS can use all the functionalities of a secure key as if there was actually an hardware key plugged in. -CanoKey QEMU provides much convenience for debuging: +CanoKey QEMU provides much convenience for debugging: -* libcanokey-qemu supports debuging output thus developers can +* libcanokey-qemu supports debugging output thus developers can inspect what happens inside a secure key * CanoKey QEMU supports trace event thus event * QEMU USB stack supports pcap thus USB packet between the guest @@ -102,8 +102,8 @@ and find CanoKey QEMU there: You may setup the key as guided in [6]_. The console for the key is at [7]_. -Debuging - +Debugging += CanoKey QEMU consists of two parts, ``libcanokey-qemu.so`` and ``canokey.c``, the latter of which resides in QEMU. The former provides core functionality diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst index 36031325cc..f25783a4ec 100644 --- a/docs/system/devices/cxl.rst +++ b/docs/system/devices/cxl.rst @@ -83,7 +83,7 @@ CXL Fixed Memory Windows (CFMW) A CFMW consists of a particular range of Host Physical Address space which is routed to particular CXL Host Bridges. At time of generic software initialization it will have a particularly interleaving -configuration and associated Quality of Serice Throtling Group (QTG). +configuration and associated Quality of Service Throttling Group (QTG). This information is available to system software, when making decisions about how to configure interleave across available CXL memory devices. It is provide as CFMW Structures (CFMWS) in @@ -98,7 +98,7 @@ specification defined register interface called CXL Host Bridge Component Registers (CHBCR). The location of this CHBCR MMIO space is described to system software via a CXL Host Bridge Structure (CHBS) in the CEDT ACPI table. The actual interfaces -are identical to those used for other parts of the CXL heirarchy +are identical to those used for other parts of the CXL hierarchy as CXL Component Registers in PCI BARs. Interfaces provided include: @@ -143,7 +143,7 @@ CXL Memory Devices - Type 3 ~~~ CXL type 3 devices use a PCI class code and are intended to be supported by a generic operating system driver. They have HDM decoders -though in these EP devices, the decoder is reponsible not for +though in these EP devices, the decoder is responsible not for routing but for translation of the incoming host physical address (HPA) into a Device Physical Address (DPA). @@ -209,7 +209,7 @@ Notes: ranges of the system physical address map. Each CFMW has particular interleave setup across the CXL Host Bridges (HB) CFMW0 provides uninterleaved access to HB0, CFW2 p
Re: [RFC] Testing 7.1.0-rc2, qemu-ppc does not give valid disassembly
On Thu, 11 Aug 2022 at 22:26, Pierre Muller wrote: >But as I use machines on which I am not admin, > I needed to compile capstone locally, install it inside my home dir, > and export PKG_CONFIG_PATH to allow the meson configuration > to correctly detect this local installation... Yes, like all of QEMU's many library dependencies, if you aren't in a position to get them installed as system libraries you'll need to sort them out locally as an individual user. This, to the extent it's a pain, is a Linux distro problem, not a QEMU problem. >However, this is not optimal, especially if I want to be able to > copy over the resulting binaries to other test machines, > on which the configuration completely fails, > like gcc188 for which the current clib is too old according to > the configure requirements. >>Is it really required to have glibc 2.56? Do you mean glib or glibc ? The two are different... > On several of these test machines, the version is much older... > I tried to force acceptance by reducing the requirement, > and it did compile successfully after that. > >Is there a solid reason to be so restrictive? We try not to arbitrarily bump the version requirements, so usually when we do there is a reason. More generally, you should check out the "supported build platforms" information at https://www.qemu.org/docs/master/about/build-platforms.html For Linux distros, we support the most recent major version, and we support the major version prior to that for 2 years after it's been superseded. We don't bump things like glib version requirements if this would break one of our supported build platforms. So if you're running into a version problem with the system version of a library, this means that you're trying to build on a host platform which is not in our supported set, probably because it is simply too old. You can always expect that that is going to be potentially awkward and that you're going to have to carry local workarounds or build newer versions of dependent libraries locally. The fix is to upgrade the machine to a newer version of the distro. thanks -- PMM
[BUG] Qemu abort with error "kvm_mem_ioeventfd_add: error adding ioeventfd: File exists (17)"
Hi list, When I did some tests in my virtual domain with live-attached virtio deivces, I got a coredump file of Qemu. The error print from qemu is "kvm_mem_ioeventfd_add: error adding ioeventfd: File exists (17)". And the call trace in the coredump file displays as below: #0 0x89acecc8 in ?? () from /usr/lib64/libc.so.6 #1 0x89a8acbc in raise () from /usr/lib64/libc.so.6 #2 0x89a78d2c in abort () from /usr/lib64/libc.so.6 #3 0xbd7ccf1c in kvm_mem_ioeventfd_add (listener=, section=, match_data=, data=, e=) at ../accel/kvm/kvm-all.c:1607 #4 0xbd6e0304 in address_space_add_del_ioeventfds (fds_old_nb=164, fds_old=0x5c80a1d0, fds_new_nb=160, fds_new=0x5c565080, as=0xbdfa8810 ) at ../softmmu/memory.c:795 #5 address_space_update_ioeventfds (as=0xbdfa8810 ) at ../softmmu/memory.c:856 #6 0xbd6e24d8 in memory_region_commit () at ../softmmu/memory.c:1113 #7 0xbd6e25c4 in memory_region_transaction_commit () at ../softmmu/memory.c:1144 #8 0xbd394eb4 in pci_bridge_update_mappings (br=br@entry=0xe755f7c0) at ../hw/pci/pci_bridge.c:248 #9 0xbd394f4c in pci_bridge_write_config (d=0xe755f7c0, address=44, val=, len=4) at ../hw/pci/pci_bridge.c:272 #10 0xbd39a928 in rp_write_config (d=0xe755f7c0, address=44, val=128, len=4) at ../hw/pci-bridge/pcie_root_port.c:39 #11 0xbd6df328 in memory_region_write_accessor (mr=0xe63898d0, addr=65580, value=, size=4, shift=, mask=, attrs=...) at ../softmmu/memory.c:494 #12 0xbd6dcb6c in access_with_adjusted_size (addr=addr@entry=65580, value=value@entry=0x817adc78, size=size@entry=4, access_size_min=, access_size_max=, access_fn=access_fn@entry=0xbd6df284 , mr=mr@entry=0xe63898d0, attrs=attrs@entry=...) at ../softmmu/memory.c:556 #13 0xbd6e0dc8 in memory_region_dispatch_write (mr=mr@entry=0xe63898d0, addr=65580, data=, op=MO_32, attrs=attrs@entry=...) at ../softmmu/memory.c:1534 #14 0xbd6d0574 in flatview_write_continue (fv=fv@entry=0x5c02da00, addr=addr@entry=275146407980, attrs=attrs@entry=..., ptr=ptr@entry=0x8aa8c028, len=len@entry=4, addr1=, l=, mr=mr@entry=0xe63898d0) at /usr/src/debug/qemu-6.2.0-226.aarch64/include/qemu/host-utils.h:165 #15 0xbd6d4584 in flatview_write (len=4, buf=0x8aa8c028, attrs=..., addr=275146407980, fv=0x5c02da00) at ../softmmu/physmem.c:3375 #16 address_space_write (as=, addr=275146407980, attrs=..., buf=buf@entry=0x8aa8c028, len=4) at ../softmmu/physmem.c:3467 #17 0xbd6d462c in address_space_rw (as=, addr=, attrs=..., attrs@entry=..., buf=buf@entry=0x8aa8c028, len=, is_write=) at ../softmmu/physmem.c:3477 #18 0xbd7cf6e8 in kvm_cpu_exec (cpu=cpu@entry=0xe625dfd0) at ../accel/kvm/kvm-all.c:2970 #19 0xbd7d09bc in kvm_vcpu_thread_fn (arg=arg@entry=0xe625dfd0) at ../accel/kvm/kvm-accel-ops.c:49 #20 0xbd94ccd8 in qemu_thread_start (args=) at ../util/qemu-thread-posix.c:559 By printing more info in the coredump file, I found that the addr of fds_old[146] and fds_new[146] are same, but fds_old[146] belonged to a live-attached virtio-scsi device while fds_new[146] was owned by another live-attached virtio-net. The reason why addr conflicted was then been found from vm's console log. Just before qemu aborted, the guest kernel crashed and kdump.service booted the dump-capture kernel where re-alloced address for the devices. Because those virtio devices were live-attached after vm creating, different addr may been assigned to them in the dump-capture kernel: the initial kernel booting log: [1.663297] pci :00:02.1: BAR 14: assigned [mem 0x1190-0x11af] [1.664560] pci :00:02.1: BAR 15: assigned [mem 0x800180-0x80019f 64bit pref] the dump-capture kernel booting log: [1.845211] pci :00:02.0: BAR 14: assigned [mem 0x1190-0x11bf] [1.846542] pci :00:02.0: BAR 15: assigned [mem 0x800180-0x8001af 64bit pref] I think directly aborting the qemu process may not be the best choice in this case cuz it will interrupt the work of kdump.service so that failed to generate memory dump of the crashed guest kernel. Perhaps, IMO, the error could be simply ignored in this case and just let kdump to reboot the system after memory-dump finishing, but I failed to find a suitable judgment in the codes. Any solution for this problem? Hope I can get some helps here. Hao
Re: [PATCH for-7.1?] Fix some typos in documentation (most of them found by codespell)
On Fri, Aug 12, 2022 at 09:56:42AM +0200, Stefan Weil wrote: > diff --git a/docs/system/devices/canokey.rst b/docs/system/devices/canokey.rst > index c2c58ae3e7..cfa6186e48 100644 > --- a/docs/system/devices/canokey.rst > +++ b/docs/system/devices/canokey.rst > @@ -28,9 +28,9 @@ With the same software configuration as a hardware key, > the guest OS can use all the functionalities of a secure key as if > there was actually an hardware key plugged in. > > -CanoKey QEMU provides much convenience for debuging: > +CanoKey QEMU provides much convenience for debugging: > > -* libcanokey-qemu supports debuging output thus developers can > +* libcanokey-qemu supports debugging output thus developers can >inspect what happens inside a secure key > * CanoKey QEMU supports trace event thus event > * QEMU USB stack supports pcap thus USB packet between the guest > @@ -102,8 +102,8 @@ and find CanoKey QEMU there: > > You may setup the key as guided in [6]_. The console for the key is at [7]_. > > -Debuging > - > +Debugging > += > > CanoKey QEMU consists of two parts, ``libcanokey-qemu.so`` and ``canokey.c``, > the latter of which resides in QEMU. The former provides core functionality Reviewed-by: Hongren (Zenithal) Zheng
Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory
On 12/08/22 12:48, Gupta, Pankaj wrote: > >> >> However, fallocate() preallocates full guest memory before starting the >> guest. >> With this behaviour guest memory is *not* demand pinned. Is there a way >> to >> prevent fallocate() from reserving full guest memory? > > Isn't the pinning being handled by the corresponding host memory backend > with mmu > notifier and architecture support while doing the memory > operations e.g page> migration and swapping/reclaim (not supported > currently AFAIU). But yes, we need> to allocate entire guest memory with > the new flags MEMFILE_F_{UNMOVABLE, UNRECLAIMABLE etc}. That is correct, but the question is when does the memory allocated, as these flags are set, memory is neither moved nor reclaimed. In current scenario, if I start a 32GB guest, all 32GB is allocated. >>> >>> I guess so if guest memory is private by default. >>> >>> Other option would be to allocate memory as shared by default and >>> handle on demand allocation and RMPUPDATE with page state change event. But >>> still that would be done at guest boot time, IIUC. >> >> Sorry! Don't want to hijack the other thread so replying here. >> >> I thought the question is for SEV SNP. For SEV, maybe the hypercall with the >> page state information can be used to allocate memory as we use it or >> something like quota based memory allocation (just thinking). > > But all this would have considerable performance overhead (if by default > memory is shared) and used mostly at boot time. > So, preallocating memory (default memory private) seems better approach for > both SEV & SEV SNP with later page management (pinning, reclaim) taken care > by host memory backend & architecture together. I am not sure how will pre-allocating memory help, even if guest would not use full memory it will be pre-allocated. Which if I understand correctly is not expected. Regards Nikunj
[PATCH for-7.1] docs/system/loongarch: Update the LoongArch document
1. Add some information about how to boot the LoongArch virt machine by uefi bios and linux kernel and how to access the source code or binary file. 2. Move the explanation of LoongArch system emulation in the target/loongarch/README to docs/system/loongarch/loongson3.rst Signed-off-by: Xiaojuan Yang --- docs/system/loongarch/loongson3.rst | 104 +--- target/loongarch/README | 49 + 2 files changed, 97 insertions(+), 56 deletions(-) diff --git a/docs/system/loongarch/loongson3.rst b/docs/system/loongarch/loongson3.rst index fa3acd01c0..66bcd28030 100644 --- a/docs/system/loongarch/loongson3.rst +++ b/docs/system/loongarch/loongson3.rst @@ -15,27 +15,115 @@ The ``virt`` machine supports: - Gpex host bridge - Ls7a RTC device - Ls7a IOAPIC device -- Ls7a ACPI device +- ACPI GED device - Fw_cfg device - PCI/PCIe devices - Memory device -- CPU device. Type: Loongson-3A5000. +- CPU device. Type: la464-loongarch-cpu. CPU and machine Type The ``qemu-system-loongarch64`` provides emulation for virt machine. You can specify the machine type ``virt`` and -cpu type ``Loongson-3A5000``. +cpu type ``la464-loongarch-cpu``. Boot options -Now the ``virt`` machine can run test program in ELF format and the -method of compiling is in target/loongarch/README. +We can boot the LoongArch virt machine by specifying the uefi bios, +initrd, and linux kernel. And those source codes and binary files +can be accessed by following steps. + +(1) booting command: + +.. code-block:: bash + + $ qemu-system-loongarch64 -machine virt -m 4G -cpu la464-loongarch-cpu \ + -smp 1 -bios QEMU_EFI.fd -kernel vmlinuz.efi -initrd initrd.img \ + -append "root=/dev/ram rdinit=/sbin/init consol e=ttyS0,115200" \ + --nographic + +Note: The running speed may be a little slow, as the performance of our +qemu and uefi bios is not perfect, and it is being fixed. + +(2) cross compiler tools: + +.. code-block:: bash + + wget https://github.com/loongson/build-tools/releases/download/ \ + 2022.05.29/loongarch64-clfs-5.0-cross-tools-gcc-full.tar.xz + + tar -vxf loongarch64-clfs-5.0-cross-tools-gcc-full.tar.xz + +(3) qemu compile configure option: + +.. code-block:: bash + + ./configure --disable-rdma --disable-pvrdma --prefix=usr \ + --target-list="loongarch64-softmmu" \ + --disable-libiscsi --disable-libnfs --disable-libpmem \ + --disable-glusterfs --enable-libusb --enable-usb-redir \ + --disable-opengl --disable-xen --enable-spice \ + --enable-debug --disable-capstone --disable-kvm \ + --enable-profiler + make + +(4) uefi bios source code and compile method: + +.. code-block:: bash + + git clone https://github.com/loongson/edk2-LoongarchVirt.git + + cd edk2-LoongarchVirt + + git submodule update --init + + export PATH=$YOUR_COMPILER_PATH/bin:$PATH + + export WORKSPACE=`pwd` + + export PACKAGES_PATH=$WORKSPACE/edk2-LoongarchVirt + + export GCC5_LOONGARCH64_PREFIX=loongarch64-unknown-linux-gnu- + + edk2-LoongarchVirt/edksetup.sh + + make -C edk2-LoongarchVirt/BaseTools + + build --buildtarget=DEBUG --tagname=GCC5 --arch=LOONGARCH64 --platform=OvmfPkg/LoongArchQemu/Loongson.dsc + + build --buildtarget=RELEASE --tagname=GCC5 --arch=LOONGARCH64 --platform=OvmfPkg/LoongArchQemu/Loongson.dsc + +The efi binary file path: + + Build/LoongArchQemu/DEBUG_GCC5/FV/QEMU_EFI.fd + + Build/LoongArchQemu/RELEASE_GCC5/FV/QEMU_EFI.fd + +(5) linux kernel source code and compile method: .. code-block:: bash - $ qemu-system-loongarch64 -machine virt -m 4G -cpu Loongson-3A5000 \ - -smp 1 -kernel hello -monitor none -display none \ - -chardev file,path=hello.out,id=output -serial chardev:output + git clone https://github.com/loongson/linux.git + + export PATH=$YOUR_COMPILER_PATH/bin:$PATH + + export LD_LIBRARY_PATH=$YOUR_COMPILER_PATH/lib:$LD_LIBRARY_PATH + + export LD_LIBRARY_PATH=$YOUR_COMPILER_PATH/loongarch64-unknown-linux-gnu/lib/:$LD_LIBRARY_PATH + + make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- loongson3_defconfig + + make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- + + make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- install + + make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- modules_install + +Note: The branch of linux source code is loongarch-next. + +(6) initrd file: + + You can use busybox tool and the linux modules to make a initrd file. Or you can access the + binary files: https://github.com/yangxiaojuan-loongson/qemu-binary diff --git a/target/loongarch/README b/target/loongarch/README index 1823375d04..0b9dc0d40a 100644 --- a/target/loongarch/README +++ b/target/loongarch/README @@ -11,54 +11,7 @@ - System emulation - Mainly emulate a virt 3A5000 board and ls7a bridge that is not exactly the same as the host. - 3A5000 support multiple interrupt
Re: [PATCH 2/3] hw/ssi: fixup coverity issue
On Fri, Aug 12, 2022 at 02:21:40AM +, Wilfred Mallawa wrote: > On Thu, 2022-08-11 at 16:23 +0200, Andrew Jones wrote: > > On Thu, Aug 11, 2022 at 09:02:00AM +1000, Wilfred Mallawa wrote: > > > From: Wilfred Mallawa > > > > > > This patch addresses the coverity issues specified in [1], > > > as suggested, `FIELD_DP32()`/`FIELD_EX32()` macros have been > > > implemented to clean up the code. > > > > > > Additionally, the `EVENT_ENABLE` register is correctly updated > > > to addr of `0x34`. > > > > This sounds like separate patch material. > > > > > > > > [1] > > > https://www.mail-archive.com/qemu-devel@nongnu.org/msg887713.html > > > > > > Fixes: Coverity CID 1488107 > > > > > > Signed-off-by: Wilfred Mallawa > > > --- > > > hw/ssi/ibex_spi_host.c | 141 +++-- > > > > > > 1 file changed, 78 insertions(+), 63 deletions(-) > > > > > > diff --git a/hw/ssi/ibex_spi_host.c b/hw/ssi/ibex_spi_host.c > > > index 601041d719..8c35bfa95f 100644 > > > --- a/hw/ssi/ibex_spi_host.c > > > +++ b/hw/ssi/ibex_spi_host.c > > > @@ -93,7 +93,7 @@ REG32(ERROR_STATUS, 0x30) > > > FIELD(ERROR_STATUS, CMDINVAL, 3, 1) > > > FIELD(ERROR_STATUS, CSIDINVAL, 4, 1) > > > FIELD(ERROR_STATUS, ACCESSINVAL, 5, 1) > > > -REG32(EVENT_ENABLE, 0x30) > > > +REG32(EVENT_ENABLE, 0x34) > > > FIELD(EVENT_ENABLE, RXFULL, 0, 1) > > > FIELD(EVENT_ENABLE, TXEMPTY, 1, 1) > > > FIELD(EVENT_ENABLE, RXWM, 2, 1) > > > @@ -108,18 +108,20 @@ static inline uint8_t div4_round_up(uint8_t > > > dividend) > > > > > > static void ibex_spi_rxfifo_reset(IbexSPIHostState *s) > > > { > > > + uint32_t data = s->regs[IBEX_SPI_HOST_STATUS]; > > > /* Empty the RX FIFO and assert RXEMPTY */ > > > fifo8_reset(&s->rx_fifo); > > > - s->regs[IBEX_SPI_HOST_STATUS] &= ~R_STATUS_RXFULL_MASK; > > > - s->regs[IBEX_SPI_HOST_STATUS] |= R_STATUS_RXEMPTY_MASK; > > > + data = FIELD_DP32(data, STATUS, RXEMPTY, 1); > > > + s->regs[IBEX_SPI_HOST_STATUS] = data; > > > } > > > > > > static void ibex_spi_txfifo_reset(IbexSPIHostState *s) > > > { > > > + uint32_t data = s->regs[IBEX_SPI_HOST_STATUS]; > > > /* Empty the TX FIFO and assert TXEMPTY */ > > > fifo8_reset(&s->tx_fifo); > > > - s->regs[IBEX_SPI_HOST_STATUS] &= ~R_STATUS_TXFULL_MASK; > > > - s->regs[IBEX_SPI_HOST_STATUS] |= R_STATUS_TXEMPTY_MASK; > > > + data = FIELD_DP32(data, STATUS, TXEMPTY, 1); > > > + s->regs[IBEX_SPI_HOST_STATUS] = data; > > > } > > > > > > static void ibex_spi_host_reset(DeviceState *dev) > > > @@ -162,37 +164,41 @@ static void ibex_spi_host_reset(DeviceState > > > *dev) > > > */ > > > static void ibex_spi_host_irq(IbexSPIHostState *s) > > > { > > > - bool error_en = s->regs[IBEX_SPI_HOST_INTR_ENABLE] > > > - & R_INTR_ENABLE_ERROR_MASK; > > > - bool event_en = s->regs[IBEX_SPI_HOST_INTR_ENABLE] > > > - & R_INTR_ENABLE_SPI_EVENT_MASK; > > > - bool err_pending = s->regs[IBEX_SPI_HOST_INTR_STATE] > > > - & R_INTR_STATE_ERROR_MASK; > > > - bool status_pending = s->regs[IBEX_SPI_HOST_INTR_STATE] > > > - & R_INTR_STATE_SPI_EVENT_MASK; > > > + bool error_en = FIELD_EX32(s->regs[IBEX_SPI_HOST_INTR_ENABLE], > > > + INTR_ENABLE, ERROR); > > > + > > > + bool event_en = FIELD_EX32(s->regs[IBEX_SPI_HOST_INTR_ENABLE], > > > + INTR_ENABLE, SPI_EVENT); > > > + > > > + bool err_pending = FIELD_EX32(s- > > > >regs[IBEX_SPI_HOST_INTR_STATE], > > > + INTR_STATE, ERROR); > > > + > > > + bool status_pending = FIELD_EX32(s- > > > >regs[IBEX_SPI_HOST_INTR_STATE], > > > + INTR_STATE, SPI_EVENT); > > > > uint32_t reg = s->regs[IBEX_SPI_HOST_INTR_ENABLE]; > > bool error_en = FIELD_EX32(reg, INTR_ENABLE, ERROR); > > bool event_en = FIELD_EX32(reg, INTR_ENABLE, SPI_EVENT); > > ... > > > > would like nicer, IMHO. > as per the comment below, do you mean to make all the conditions > variables here? and substitute those for the if statements instead of > the `FIELD_..` macros? For the above code, the suggestion I gave is what I'm thinking. For the below code, keep the FIELD macros in place, but shrink the length of the line by doing uint32_t enable = s->regs[IBEX_SPI_HOST_ERROR_ENABLE]; uint32_t status = s->regs[IBEX_SPI_HOST_ERROR_STATUS]; at the top, and then } else if (FIELD_EX32(enable, ERROR_ENABLE, CMDBUSY) && FIELD_EX32(status, ERROR_STATUS, CMDBUSY)) { ... > > > > > > > + > > > int err_irq = 0, event_irq = 0; > > > > > > /* Error IRQ enabled and Error IRQ Cleared */ > > > if (error_en && !err_pending) { > > > /* Event enabled, Interrupt Test Error */ > > > - if (s->regs[IBEX_SPI_HOST_INTR_TEST] & > > > R_INTR_TEST_ERROR_MASK) { > > > + if (FIELD_EX32(s->regs[IBEX_SPI_HOST_INT
Re: qemu-system-aarch64: Failed to retrieve host CPU features
I've added some more relevant mailing lists to the cc. On Fri, 12 Aug 2022 at 09:45, Vitaly Chikunov wrote: > On Fri, Aug 12, 2022 at 05:14:27AM +0300, Vitaly Chikunov wrote: > > I noticed that we starting to get many errors like this: > > > > qemu-system-aarch64: Failed to retrieve host CPU features > > > > Where many is 1-2% per run, depends on host, host is Kunpeng-920, and > > Linux kernel is v5.15.59, but it started to appear months before that. > > > > strace shows in erroneous case: > > > > 1152244 ioctl(9, KVM_CREATE_VM, 0x30) = -1 EINTR (Interrupted system > > call) > > > > And I see in target/arm/kvm.c:kvm_arm_create_scratch_host_vcpu: > > > > vmfd = ioctl(kvmfd, KVM_CREATE_VM, max_vm_pa_size); > > if (vmfd < 0) { > > goto err; > > } > > > > Maybe it should restart ioctl on EINTR? > > > > I don't see EINTR documented in ioctl(2) nor in Linux' > > Documentation/virt/kvm/api.rst for KVM_CREATE_VM, but for KVM_RUN it > > says "an unmasked signal is pending". > > I am suggested that almost any blocking syscall could return EINTR, so I > checked the strace log and it does not show evidence of arriving a signal, > the log ends like this: > > 1152244 openat(AT_FDCWD, "/dev/kvm", O_RDWR|O_CLOEXEC) = 9 > 1152244 ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_ARM_VM_IPA_SIZE) = 48 > 1152244 ioctl(9, KVM_CREATE_VM, 0x30) = -1 EINTR (Interrupted system call) > 1152244 close(9)= 0 > 1152244 newfstatat(2, "", {st_dev=makedev(0, 0xd), st_ino=57869925, > st_mode=S_IFIFO|0600, st_nlink=1, st_uid=517, st_gid=517, st_blksize=4096, > st_blocks=0, st_size=0, st_atime=1660268019 /* > 2022-08-12T01:33:39.850436293+ */, st_atime_nsec=850436293, > st_mtime=1660268019 /* 2022-08-12T01:33:39.850436293+ */, > st_mtime_nsec=850436293, st_ctime=1660268019 /* > 2022-08-12T01:33:39.850436293+ */, st_ctime_nsec=850436293}, > AT_EMPTY_PATH) = 0 > 1152244 write(2, "qemu-system-aarch64: Failed to r"..., 58) = 58 > 1152244 exit_group(1) = ? > 1152245 <... clock_nanosleep resumed> ) = ? > 1152245 +++ exited with 1 +++ > 1152244 +++ exited with 1 +++ KVM folks: should we expect that KVM_CREATE_VM might fail EINTR and need retrying? (I suspect the answer is "yes", given we do this in the generic code in kvm-all.c.) thanks -- PMM
Re: [PATCH for-7.1] docs/system/loongarch: Update the LoongArch document
On 2022/8/12 下午5:19, Xiaojuan Yang wrote: 1. Add some information about how to boot the LoongArch virt machine by uefi bios and linux kernel and how to access the source code or binary file. 2. Move the explanation of LoongArch system emulation in the target/loongarch/README to docs/system/loongarch/loongson3.rst Signed-off-by: Xiaojuan Yang --- docs/system/loongarch/loongson3.rst | 104 +--- target/loongarch/README | 49 + 2 files changed, 97 insertions(+), 56 deletions(-) Reviewed-by: Song Gao Thanks. Song Gao diff --git a/docs/system/loongarch/loongson3.rst b/docs/system/loongarch/loongson3.rst index fa3acd01c0..66bcd28030 100644 --- a/docs/system/loongarch/loongson3.rst +++ b/docs/system/loongarch/loongson3.rst @@ -15,27 +15,115 @@ The ``virt`` machine supports: - Gpex host bridge - Ls7a RTC device - Ls7a IOAPIC device -- Ls7a ACPI device +- ACPI GED device - Fw_cfg device - PCI/PCIe devices - Memory device -- CPU device. Type: Loongson-3A5000. +- CPU device. Type: la464-loongarch-cpu. CPU and machine Type The ``qemu-system-loongarch64`` provides emulation for virt machine. You can specify the machine type ``virt`` and -cpu type ``Loongson-3A5000``. +cpu type ``la464-loongarch-cpu``. Boot options -Now the ``virt`` machine can run test program in ELF format and the -method of compiling is in target/loongarch/README. +We can boot the LoongArch virt machine by specifying the uefi bios, +initrd, and linux kernel. And those source codes and binary files +can be accessed by following steps. + +(1) booting command: + +.. code-block:: bash + + $ qemu-system-loongarch64 -machine virt -m 4G -cpu la464-loongarch-cpu \ + -smp 1 -bios QEMU_EFI.fd -kernel vmlinuz.efi -initrd initrd.img \ + -append "root=/dev/ram rdinit=/sbin/init consol e=ttyS0,115200" \ + --nographic + +Note: The running speed may be a little slow, as the performance of our +qemu and uefi bios is not perfect, and it is being fixed. + +(2) cross compiler tools: + +.. code-block:: bash + + wget https://github.com/loongson/build-tools/releases/download/ \ + 2022.05.29/loongarch64-clfs-5.0-cross-tools-gcc-full.tar.xz + + tar -vxf loongarch64-clfs-5.0-cross-tools-gcc-full.tar.xz + +(3) qemu compile configure option: + +.. code-block:: bash + + ./configure --disable-rdma --disable-pvrdma --prefix=usr \ + --target-list="loongarch64-softmmu" \ + --disable-libiscsi --disable-libnfs --disable-libpmem \ + --disable-glusterfs --enable-libusb --enable-usb-redir \ + --disable-opengl --disable-xen --enable-spice \ + --enable-debug --disable-capstone --disable-kvm \ + --enable-profiler + make + +(4) uefi bios source code and compile method: + +.. code-block:: bash + + git clone https://github.com/loongson/edk2-LoongarchVirt.git + + cd edk2-LoongarchVirt + + git submodule update --init + + export PATH=$YOUR_COMPILER_PATH/bin:$PATH + + export WORKSPACE=`pwd` + + export PACKAGES_PATH=$WORKSPACE/edk2-LoongarchVirt + + export GCC5_LOONGARCH64_PREFIX=loongarch64-unknown-linux-gnu- + + edk2-LoongarchVirt/edksetup.sh + + make -C edk2-LoongarchVirt/BaseTools + + build --buildtarget=DEBUG --tagname=GCC5 --arch=LOONGARCH64 --platform=OvmfPkg/LoongArchQemu/Loongson.dsc + + build --buildtarget=RELEASE --tagname=GCC5 --arch=LOONGARCH64 --platform=OvmfPkg/LoongArchQemu/Loongson.dsc + +The efi binary file path: + + Build/LoongArchQemu/DEBUG_GCC5/FV/QEMU_EFI.fd + + Build/LoongArchQemu/RELEASE_GCC5/FV/QEMU_EFI.fd + +(5) linux kernel source code and compile method: .. code-block:: bash - $ qemu-system-loongarch64 -machine virt -m 4G -cpu Loongson-3A5000 \ - -smp 1 -kernel hello -monitor none -display none \ - -chardev file,path=hello.out,id=output -serial chardev:output + git clone https://github.com/loongson/linux.git + + export PATH=$YOUR_COMPILER_PATH/bin:$PATH + + export LD_LIBRARY_PATH=$YOUR_COMPILER_PATH/lib:$LD_LIBRARY_PATH + + export LD_LIBRARY_PATH=$YOUR_COMPILER_PATH/loongarch64-unknown-linux-gnu/lib/:$LD_LIBRARY_PATH + + make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- loongson3_defconfig + + make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- + + make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- install + + make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- modules_install + +Note: The branch of linux source code is loongarch-next. + +(6) initrd file: + + You can use busybox tool and the linux modules to make a initrd file. Or you can access the + binary files: https://github.com/yangxiaojuan-loongson/qemu-binary diff --git a/target/loongarch/README b/target/loongarch/README index 1823375d04..0b9dc0d40a 100644 --- a/target/loongarch/README +++ b/target/loongarch/README @@ -11,54 +11,7 @@ - System emulation - Mainly emula
Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory
However, fallocate() preallocates full guest memory before starting the guest. With this behaviour guest memory is *not* demand pinned. Is there a way to prevent fallocate() from reserving full guest memory? Isn't the pinning being handled by the corresponding host memory backend with mmu > notifier and architecture support while doing the memory operations e.g page> migration and swapping/reclaim (not supported currently AFAIU). But yes, we need> to allocate entire guest memory with the new flags MEMFILE_F_{UNMOVABLE, UNRECLAIMABLE etc}. That is correct, but the question is when does the memory allocated, as these flags are set, memory is neither moved nor reclaimed. In current scenario, if I start a 32GB guest, all 32GB is allocated. I guess so if guest memory is private by default. Other option would be to allocate memory as shared by default and handle on demand allocation and RMPUPDATE with page state change event. But still that would be done at guest boot time, IIUC. Sorry! Don't want to hijack the other thread so replying here. I thought the question is for SEV SNP. For SEV, maybe the hypercall with the page state information can be used to allocate memory as we use it or something like quota based memory allocation (just thinking). But all this would have considerable performance overhead (if by default memory is shared) and used mostly at boot time. So, preallocating memory (default memory private) seems better approach for both SEV & SEV SNP with later page management (pinning, reclaim) taken care by host memory backend & architecture together. I am not sure how will pre-allocating memory help, even if guest would not use full memory it will be pre-allocated. Which if I understand correctly is not expected. For SEV I am also not very sure what would be the best way. There could be a tradeoff between memory pinning and performance. As I was also thinking about "Async page fault" aspect of guest in my previous reply. Details needs to be figure out. Quoting my previous reply here: "Or maybe later we can think of something like allowing direct page fault on host memory access for *SEV* guest as there is no strict requirement for memory integrity guarantee and the performance overhead." Thanks, Pankaj
Re: [PATCH v2 0/5] hw/smbios: add core_count2 to smbios table type 4
On Sun, Jul 31, 2022 at 06:21:36PM +0200, Julia Suvorova wrote: > The SMBIOS 3.0 specification provides the ability to reflect over > 255 cores. The 64-bit entry point has been used for a while, but > structure type 4 has not been updated before, so the dmidecode output > looked like this (-smp 280): > > Handle 0x0400, DMI type 4, 42 bytes > Processor Information > ... > Core Count: 24 > Core Enabled: 24 > Thread Count: 1 > ... > > Big update in the bios-tables-test as it couldn't work with SMBIOS 3.0. Looks good to me. I tagged this but just to help make sure it's not lost pls ping me after the release. > v2: > * generate tables type 4 of different sizes based on the > selected smbios version > * use SmbiosEntryPoint* types instead of creating new constants > * refactor smbios_cpu_test [Igor, Ani] > * clarify signature check [Igor] > * add comments with specifications and clarification of the structure > loop [Ani] > > Julia Suvorova (5): > hw/smbios: add core_count2 to smbios table type 4 > bios-tables-test: teach test to use smbios 3.0 tables > tests/acpi: allow changes for core_count2 test > bios-tables-test: add test for number of cores > 255 > tests/acpi: update tables for new core count test > > hw/smbios/smbios_build.h | 9 +- > include/hw/firmware/smbios.h | 11 ++ > hw/smbios/smbios.c | 18 +++- > tests/qtest/bios-tables-test.c | 148 --- > tests/data/acpi/q35/APIC.core-count2 | Bin 0 -> 2478 bytes > tests/data/acpi/q35/DSDT.core-count2 | Bin 0 -> 32414 bytes > tests/data/acpi/q35/FACP.core-count2 | Bin 0 -> 244 bytes > 7 files changed, 144 insertions(+), 42 deletions(-) > create mode 100644 tests/data/acpi/q35/APIC.core-count2 > create mode 100644 tests/data/acpi/q35/DSDT.core-count2 > create mode 100644 tests/data/acpi/q35/FACP.core-count2 > > -- > 2.35.3
Re: [PATCH for-7.1?] Fix some typos in documentation (most of them found by codespell)
On Fri, 12 Aug 2022 at 08:59, Stefan Weil via wrote: > > Signed-off-by: Stefan Weil > --- Reviewed-by: Peter Maydell thanks -- PMM
Re: [PATCH for-7.1?] Fix some typos in documentation (most of them found by codespell)
On Fri, 12 Aug 2022 at 08:59, Stefan Weil via wrote: > > Signed-off-by: Stefan Weil > --- > docs/about/deprecated.rst | 2 +- > docs/specs/acpi_erst.rst| 4 ++-- > docs/system/devices/canokey.rst | 8 > docs/system/devices/cxl.rst | 12 ++-- > 4 files changed, 13 insertions(+), 13 deletions(-) > Docs patches are safe for rc3. I have to do a target-arm pullreq anyway for rc3, so I'll take this via that, unless anybody objects. thanks -- PMM
Re: [PATCH] tests/unit: fix a -Wformat-trunction warning
On Wed, 10 Aug 2022 at 13:20, wrote: > > From: Marc-André Lureau > > ../tests/test-qobject-input-visitor.c: In function ‘test_visitor_in_list’: > ../tests/test-qobject-input-visitor.c:454:49: warning: ‘%d’ directive output > may be truncated writing between 1 and 10 bytes into a region of size 6 > [-Wformat-truncation=] > 454 | snprintf(string, sizeof(string), "string%d", i); > | ^~ > ../tests/test-qobject-input-visitor.c:454:42: note: directive argument in the > range [0, 2147483606] > 454 | snprintf(string, sizeof(string), "string%d", i); > | ^~ > ../tests/test-qobject-input-visitor.c:454:9: note: ‘snprintf’ output between > 8 and 17 bytes into a destination of size 12 > 454 | snprintf(string, sizeof(string), "string%d", i); > | ^~~ > > Not trying to be clever, this is called 3 times during tests, > let simply use g_strdup_printf(). > > Signed-off-by: Marc-André Lureau This is a pretty safe fix and compiler warnings seem like they're worth fixing for rc3. Since I need to do a target-arm pullreq anyway I'll take this via that, unless anybody objects. (I've fixed up the commit message typo in my tree.) thanks -- PMM
Re: [PATCH] hw/arm/virt-acpi-build: Present the GICR structure properly for GICv4
On Fri, 12 Aug 2022 at 03:20, Zenghui Yu wrote: > > With the introduction of the new TCG GICv4, build_madt() is badly broken > as we do not present any GIC Redistributor structure in MADT for GICv4 > guests, so that they have no idea about where the Redistributor > register frames are. This fixes a Linux guest crash at boot time with > ACPI enabled and '-machine gic-version=4'. > > While at it, let's convert the remaining hard coded gic_version into > enumeration VIRT_GIC_VERSION_2 for consistency. > > Signed-off-by: Zenghui Yu Oops, I missed the ACPI side of things when I added GICv4 support :-( Crash-on-boot seems like a bug worth fixing for rc3... Applied to target-arm.next, thanks. -- PMM
Re: add qemu_fdt_setprop_strings() and use it in most places
On Tue, 9 Aug 2022 at 19:57, Ben Dooks wrote: > > Add a helper for qemu_fdt_setprop_strings() to take a set of strings > to put into a device-tree, which removes several open-coded methods > such as setting an char arr[] = {..} or setting char val[] = "str\0str2"; > > This is for hw/arm, hw/mips and hw/riscv as well as a couple of cores. It > is not fully tested over all of those, I may not have caught all places > this is to be replaced. Hi; just as a minor process thing for next time you send a patchset: the usual convention is that the 'cover letter' email has a subject starting with a tag like "[PATCH v4 0/6]". This makes it easier to pick the cover letter out when eyeballing a set of subject lines, and also helps some of the automated tooling (eg I think this is why https://patchew.org/QEMU/ didn't notice this series). Usually 'git format-patch --cover-letter' will get this right automatically for you. thanks -- PMM
[PULL 1/1] linux-user/aarch64: Reset target data on MADV_DONTNEED
From: Vitaly Buka aarch64 stores MTE tags in target_date, and they should be reset by MADV_DONTNEED. Signed-off-by: Vitaly Buka Reviewed-by: Richard Henderson Message-Id: <20220711220028.2467290-1-vitalyb...@google.com> [lv: fix code style issues] Signed-off-by: Laurent Vivier --- accel/tcg/translate-all.c | 26 ++ include/exec/cpu-all.h| 1 + linux-user/mmap.c | 3 +++ 3 files changed, 30 insertions(+) diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c index ef62a199c7db..b83161a08190 100644 --- a/accel/tcg/translate-all.c +++ b/accel/tcg/translate-all.c @@ -2314,6 +2314,32 @@ void page_set_flags(target_ulong start, target_ulong end, int flags) } } +void page_reset_target_data(target_ulong start, target_ulong end) +{ +target_ulong addr, len; + +/* + * This function should never be called with addresses outside the + * guest address space. If this assert fires, it probably indicates + * a missing call to h2g_valid. + */ +assert(end - 1 <= GUEST_ADDR_MAX); +assert(start < end); +assert_memory_lock(); + +start = start & TARGET_PAGE_MASK; +end = TARGET_PAGE_ALIGN(end); + +for (addr = start, len = end - start; + len != 0; + len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) { +PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, 1); + +g_free(p->target_data); +p->target_data = NULL; +} +} + void *page_get_target_data(target_ulong address) { PageDesc *p = page_find(address >> TARGET_PAGE_BITS); diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h index f5bda2c3caa7..491629b9ba7a 100644 --- a/include/exec/cpu-all.h +++ b/include/exec/cpu-all.h @@ -271,6 +271,7 @@ int walk_memory_regions(void *, walk_memory_regions_fn); int page_get_flags(target_ulong address); void page_set_flags(target_ulong start, target_ulong end, int flags); +void page_reset_target_data(target_ulong start, target_ulong end); int page_check_range(target_ulong start, target_ulong len, int flags); /** diff --git a/linux-user/mmap.c b/linux-user/mmap.c index edceaca4a8e1..048c4135af14 100644 --- a/linux-user/mmap.c +++ b/linux-user/mmap.c @@ -894,6 +894,9 @@ abi_long target_madvise(abi_ulong start, abi_ulong len_in, int advice) if (advice == MADV_DONTNEED && can_passthrough_madv_dontneed(start, end)) { ret = get_errno(madvise(g2h_untagged(start), len, MADV_DONTNEED)); +if (ret == 0) { +page_reset_target_data(start, start + len); +} } mmap_unlock(); -- 2.37.1
Re: [PATCH for-7.1] cutils: Add missing dyld(3) include on macOS
On Tue, 9 Aug 2022 at 23:22, Philippe Mathieu-Daudé via wrote: > > Commit 06680b15b4 moved qemu_*_exec_dir() to cutils but forgot > to move the macOS dyld(3) include, resulting in the following > error (when building with Homebrew GCC on macOS Monterey 12.4): > > [313/1197] Compiling C object libqemuutil.a.p/util_cutils.c.o > FAILED: libqemuutil.a.p/util_cutils.c.o > ../../util/cutils.c:1039:13: error: implicit declaration of function > '_NSGetExecutablePath' [-Werror=implicit-function-declaration] >1039 | if (_NSGetExecutablePath(fpath, &len) == 0) { > | ^~~~ > ../../util/cutils.c:1039:13: error: nested extern declaration of > '_NSGetExecutablePath' [-Werror=nested-externs] > > Fix by moving the include line to cutils. > > Fixes: 06680b15b4 ("include: move qemu_*_exec_dir() to cutils") > Signed-off-by: Philippe Mathieu-Daudé > --- > Cc: Marc-André Lureau > Cc: Markus Armbruster I wonder why this doesn't show up with clang? Anyway, obvious fix. I'll take it via target-arm.next for my pull for rc3, unless anybody has a different preference. thanks -- PMM
[PULL 0/1] Linux user for 7.1 patches
The following changes since commit a6b1c53e79d08a99a28cc3e67a3e1a7c34102d6b: Merge tag 'linux-user-for-7.1-pull-request' of https://gitlab.com/laurent_vivier/qemu into staging (2022-08-10 10:26:57 -0700) are available in the Git repository at: https://gitlab.com/laurent_vivier/qemu.git tags/linux-user-for-7.1-pull-request for you to fetch changes up to dbbf89751b14aa5d281bad3af273e9ffaae82262: linux-user/aarch64: Reset target data on MADV_DONTNEED (2022-08-11 11:34:17 +0200) Pull request linux-user 20220812 Vitaly Buka (1): linux-user/aarch64: Reset target data on MADV_DONTNEED accel/tcg/translate-all.c | 26 ++ include/exec/cpu-all.h| 1 + linux-user/mmap.c | 3 +++ 3 files changed, 30 insertions(+) -- 2.37.1
[PATCH] osdep: Introduce qemu_get_fd() to wrap the common codes
socket_get_fd() have much the same codes as monitor_fd_param(), so qemu_get_fd() is introduced to implement the common logic. now socket_get_fd() and monitor_fd_param() directly call this function. Signed-off-by: Guoyi Tu --- include/qemu/osdep.h | 1 + monitor/misc.c | 21 + util/osdep.c | 25 + util/qemu-sockets.c | 17 + 4 files changed, 32 insertions(+), 32 deletions(-) diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h index b1c161c035..b920f128a7 100644 --- a/include/qemu/osdep.h +++ b/include/qemu/osdep.h @@ -491,6 +491,7 @@ int qemu_open_old(const char *name, int flags, ...); int qemu_open(const char *name, int flags, Error **errp); int qemu_create(const char *name, int flags, mode_t mode, Error **errp); int qemu_close(int fd); +int qemu_get_fd(Monitor *mon, const char *fdname, Error **errp); int qemu_unlink(const char *name); #ifndef _WIN32 int qemu_dup_flags(int fd, int flags); diff --git a/monitor/misc.c b/monitor/misc.c index 3d2312ba8d..0d3372cf2b 100644 --- a/monitor/misc.c +++ b/monitor/misc.c @@ -1395,26 +1395,7 @@ void monitor_fdset_dup_fd_remove(int dup_fd) int monitor_fd_param(Monitor *mon, const char *fdname, Error **errp) { - int fd; - Error *local_err = NULL; - - if (!qemu_isdigit(fdname[0]) && mon) { - fd = monitor_get_fd(mon, fdname, &local_err); - } else { - fd = qemu_parse_fd(fdname); - if (fd == -1) { - error_setg(&local_err, "Invalid file descriptor number '%s'", - fdname); - } - } - if (local_err) { - error_propagate(errp, local_err); - assert(fd == -1); - } else { - assert(fd != -1); - } - - return fd; + return qemu_get_fd(mon, fdname, errp); } /* Please update hmp-commands.hx when adding or changing commands */ diff --git a/util/osdep.c b/util/osdep.c index 60fcbbaebe..c57551ca78 100644 --- a/util/osdep.c +++ b/util/osdep.c @@ -23,6 +23,7 @@ */ #include "qemu/osdep.h" #include "qapi/error.h" +#include "qemu/ctype.h" #include "qemu/cutils.h" #include "qemu/sockets.h" #include "qemu/error-report.h" @@ -413,6 +414,30 @@ int qemu_close(int fd) return close(fd); } +int qemu_get_fd(Monitor *mon, const char *fdname, Error **errp) +{ + int fd; + Error *local_err = NULL; + + if (!qemu_isdigit(fdname[0]) && mon) { + fd = monitor_get_fd(mon, fdname, &local_err); + } else { + fd = qemu_parse_fd(fdname); + if (fd == -1) { + error_setg(&local_err, "Invalid file descriptor number '%s'", + fdname); + } + } + if (local_err) { + error_propagate(errp, local_err); + assert(fd == -1); + } else { + assert(fd != -1); + } + + return fd; +} + /* * Delete a file from the filesystem, unless the filename is /dev/fdset/... * diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c index 13b5b197f9..92960ee6eb 100644 --- a/util/qemu-sockets.c +++ b/util/qemu-sockets.c @@ -1142,19 +1142,12 @@ static int socket_get_fd(const char *fdstr, Error **errp) { Monitor *cur_mon = monitor_cur(); int fd; - if (cur_mon) { - fd = monitor_get_fd(cur_mon, fdstr, errp); - if (fd < 0) { - return -1; - } - } else { - if (qemu_strtoi(fdstr, NULL, 10, &fd) < 0) { - error_setg_errno(errp, errno, - "Unable to parse FD number %s", - fdstr); - return -1; - } + + fd = qemu_get_fd(cur_mon, fdstr, errp); + if (fd < 0) { + return -1; } + if (!fd_is_socket(fd)) { error_setg(errp, "File descriptor '%s' is not a socket", fdstr); close(fd); -- 2.25.1
[PATCH] hw/nvme: remove param zoned.auto_transition
The intention of the Zoned Namespace Command Set Specification was never to make an automatic zone transition optional. Excerpt from the nvmexpress.org zns mailing list: """ A question came up internally on the differences between ZNS and ZAC/ZBC that asked about when a controller should transitions a specific zone in the Implicitly Opened state to Closed state. For example, consider a ZNS SSD that supports a max of 20 active zones, and a max of 10 open zones, which has the following actions occur: First, the host writes to ten empty zones, thereby transitioning 10 zones to the Implicitly Opened state. Second, the host issues a write to an 11th empty zone. Given that state, my understanding of the second part is that the ZNS SSD chooses one of the previously 10 zones, and transition the chosen zone to the Closed state, and then proceeds to write to the new zone which also implicitly transition it from the Empty state to the Impl. Open state. After this, there would be 11 active zones in total, 10 in impl. Open state, and one in closed state. The above assumes that a ZNS SSD will always transition an implicitly opened zone to closed state when required to free up resources when another zone is opened. However, it isn’t strictly said in the ZNS spec. The paragraph that should cover it is defined in section 2.1.1.4.1 – Managing Resources: The controller may transition zones in the ZSIO:Implicitly Opened state to the ZSC:Closed state for resource management purposes. However, it doesn’t say “when” it should occur. Thus, as the text stand, it could be misinterpreted that the controller shouldn’t do close a zone to make room for a new zone. The issue with this, is that it makes the point of having implicitly managed zones moot. The ZAC/ZBC specs is more specific and clarifies when a zone should be closed. I think it would be natural to the same here. """ While the Zoned Namespace Command Set Specification hasn't received an errata yet, it is quite clear that the intention was that an automatic zone transition was never supposed to be optional, as then the whole point of having implictly open zones would be pointless. Therefore, remove the param zoned.auto_transition, as this was never supposed to be controller implementation specific. Signed-off-by: Niklas Cassel --- hw/nvme/ctrl.c | 35 --- hw/nvme/nvme.h | 1 - 2 files changed, 12 insertions(+), 24 deletions(-) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index 87aeba0564..b8c8075301 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -34,7 +34,6 @@ * aerl=,aer_max_queued=, \ * mdts=,vsl=, \ * zoned.zasl=, \ - * zoned.auto_transition=, \ * sriov_max_vfs= \ * sriov_vq_flexible= \ * sriov_vi_flexible= \ @@ -106,11 +105,6 @@ * the minimum memory page size (CAP.MPSMIN). The default value is 0 (i.e. * defaulting to the value of `mdts`). * - * - `zoned.auto_transition` - * Indicates if zones in zone state implicitly opened can be automatically - * transitioned to zone state closed for resource management purposes. - * Defaults to 'on'. - * * - `sriov_max_vfs` * Indicates the maximum number of PCIe virtual functions supported * by the controller. The default value is 0. Specifying a non-zero value @@ -1867,8 +1861,8 @@ enum { NVME_ZRM_ZRWA = 1 << 1, }; -static uint16_t nvme_zrm_open_flags(NvmeCtrl *n, NvmeNamespace *ns, -NvmeZone *zone, int flags) +static uint16_t nvme_zrm_open_flags(NvmeNamespace *ns, NvmeZone *zone, +int flags) { int act = 0; uint16_t status; @@ -1880,9 +1874,7 @@ static uint16_t nvme_zrm_open_flags(NvmeCtrl *n, NvmeNamespace *ns, /* fallthrough */ case NVME_ZONE_STATE_CLOSED: -if (n->params.auto_transition_zones) { -nvme_zrm_auto_transition_zone(ns); -} +nvme_zrm_auto_transition_zone(ns); status = nvme_zns_check_resources(ns, act, 1, (flags & NVME_ZRM_ZRWA) ? 1 : 0); if (status) { @@ -1925,10 +1917,9 @@ static uint16_t nvme_zrm_open_flags(NvmeCtrl *n, NvmeNamespace *ns, } } -static inline uint16_t nvme_zrm_auto(NvmeCtrl *n, NvmeNamespace *ns, - NvmeZone *zone) +static inline uint16_t nvme_zrm_auto(NvmeNamespace *ns, NvmeZone *zone) { -return nvme_zrm_open_flags(n, ns, zone, NVME_ZRM_AUTO); +return nvme_zrm_open_flags(ns, zone, NVME_ZRM_AUTO); } static void nvme_advance_zone_wp(NvmeNamespace *ns, NvmeZone *zone, @@ -3065,7 +3056,7 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req) goto invalid; } -status = nvme_zrm_auto(n, ns, iocb->zone); +status = nvme_zrm_auto(ns, iocb->zone); if (status) { goto invalid; } @@ -3471,7 +3462,7 @@ static u
Re: [PATCH] riscv: Make semihosting configurable for all privilege modes
On Thu, Aug 11, 2022 at 01:41:04PM -0700, Furquan Shaikh wrote: > Unlike ARM, RISC-V does not define a separate breakpoint type for > semihosting. Instead, it is entirely ABI. Thus, we need an option > to allow users to configure what the ebreak behavior should be for > different privilege levels - M, S, U, VS, VU. As per the RISC-V > privilege specification[1], ebreak traps into the execution > environment. However, RISC-V debug specification[2] provides > ebreak{m,s,u,vs,vu} configuration bits to allow ebreak behavior to > be configured to trap into debug mode instead. This change adds > settable properties for RISC-V CPUs - `ebreakm`, `ebreaks`, `ebreaku`, > `ebreakvs` and `ebreakvu` to allow user to configure whether qemu > should treat ebreak as semihosting traps or trap according to the > privilege specification. > > [1] > https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-privileged.pdf > [2] > https://github.com/riscv/riscv-debug-spec/blob/release/riscv-debug-release.pdf > > Signed-off-by: Furquan Shaikh > --- > target/riscv/cpu.c| 8 > target/riscv/cpu.h| 7 +++ > target/riscv/cpu_helper.c | 26 +- > 3 files changed, 40 insertions(+), 1 deletion(-) > > diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c > index ac6f82ebd0..082194652b 100644 > --- a/target/riscv/cpu.c > +++ b/target/riscv/cpu.c > @@ -953,6 +953,14 @@ static Property riscv_cpu_properties[] = { > DEFINE_PROP_BOOL("short-isa-string", RISCVCPU, > cfg.short_isa_string, false), > > DEFINE_PROP_BOOL("rvv_ta_all_1s", RISCVCPU, cfg.rvv_ta_all_1s, false), > + > +/* Debug spec */ > +DEFINE_PROP_BOOL("ebreakm", RISCVCPU, cfg.ebreakm, true), > +DEFINE_PROP_BOOL("ebreaks", RISCVCPU, cfg.ebreaks, false), > +DEFINE_PROP_BOOL("ebreaku", RISCVCPU, cfg.ebreaku, false), > +DEFINE_PROP_BOOL("ebreakvs", RISCVCPU, cfg.ebreakvs, false), > +DEFINE_PROP_BOOL("ebreakvu", RISCVCPU, cfg.ebreakvu, false), > + > DEFINE_PROP_END_OF_LIST(), > }; > > diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h > index 5c7acc055a..eee8e487a6 100644 > --- a/target/riscv/cpu.h > +++ b/target/riscv/cpu.h > @@ -454,6 +454,13 @@ struct RISCVCPUConfig { > bool epmp; > bool aia; > bool debug; > + > +/* Debug spec */ > +bool ebreakm; > +bool ebreaks; > +bool ebreaku; > +bool ebreakvs; > +bool ebreakvu; There's only five of these, so having each separate probably makes the most sense, but I wanted to point out that we could keep the properties independent booleans, as we should, but still consolidate the values into a single bitmap like we did for the sve vq bitmap for arm (see cpu_arm_get/set_vq). Maybe worth considering? > uint64_t resetvec; > > bool short_isa_string; > diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c > index 59b3680b1b..be09abbe27 100644 > --- a/target/riscv/cpu_helper.c > +++ b/target/riscv/cpu_helper.c > @@ -1314,6 +1314,30 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr > address, int size, > > return true; > } > + > +static bool semihosting_enabled(RISCVCPU *cpu) > +{ > +CPURISCVState *env = &cpu->env; > + > +switch (env->priv) { > +case PRV_M: > +return cpu->cfg.ebreakm; > +case PRV_S: > +if (riscv_cpu_virt_enabled(env)) { > +return cpu->cfg.ebreakvs; > +} else { > +return cpu->cfg.ebreaks; > +} > +case PRV_U: > +if (riscv_cpu_virt_enabled(env)) { > +return cpu->cfg.ebreakvu; > +} else { > +return cpu->cfg.ebreaku; > +} > +} > + > +return false; > +} > #endif /* !CONFIG_USER_ONLY */ > > /* > @@ -1342,7 +1366,7 @@ void riscv_cpu_do_interrupt(CPUState *cs) > target_ulong mtval2 = 0; > > if (cause == RISCV_EXCP_SEMIHOST) { > -if (env->priv >= PRV_S) { > +if (semihosting_enabled(cpu)) { > do_common_semihosting(cs); > env->pc += 4; > return; > -- > 2.34.1 > Bitmap or no bitmap, Reviewed-by: Andrew Jones
Re: [PATCH] riscv: Make semihosting configurable for all privilege modes
On Thu, 11 Aug 2022 at 21:47, Furquan Shaikh wrote: > > Unlike ARM, RISC-V does not define a separate breakpoint type for > semihosting. Instead, it is entirely ABI. Thus, we need an option > to allow users to configure what the ebreak behavior should be for > different privilege levels - M, S, U, VS, VU. As per the RISC-V > privilege specification[1], ebreak traps into the execution > environment. However, RISC-V debug specification[2] provides > ebreak{m,s,u,vs,vu} configuration bits to allow ebreak behavior to > be configured to trap into debug mode instead. This change adds > settable properties for RISC-V CPUs - `ebreakm`, `ebreaks`, `ebreaku`, > `ebreakvs` and `ebreakvu` to allow user to configure whether qemu > should treat ebreak as semihosting traps or trap according to the > privilege specification. > > [1] > https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-privileged.pdf > [2] > https://github.com/riscv/riscv-debug-spec/blob/release/riscv-debug-release.pdf As a general rule we don't allow userspace to make semihosting calls, as a (rather weak) attempt at fencing off unprivileged guest code from being able to scribble all over the host filesystem. We should try to be consistent across architectures about that, and in particular about how we enable it. I have a half-finished patchset where I was planning to add a --semihosting-config userspace-enable=on option or similar to that effect. It sounds like these ebreak bits are somewhat architectural, so maybe they make sense as a riscv specific thing, but we should consider how they ought to interact with the general behaviour of semihosting. As it stands in QEMU today, we (at least in theory) ought not to permit userspace to make semihosting ebreak calls at all I think. thanks -- PMM
Re: [PATCH v3 1/1] os-posix: asynchronous teardown for shutdown on Linux
On 8/12/22 04:26, Claudio Imbrenda wrote: On Thu, 11 Aug 2022 23:05:52 -0300 Murilo Opsfelder Araújo wrote: On 8/11/22 11:02, Daniel P. Berrangé wrote: [...] Hmm, I was hoping you could just use SIGKILL to guarantee that this gets killed off. Is SIGKILL delivered too soon to allow for the main QEMU process to have exited quickly ? yes, I tried. qemu has not finished exiting when the signal is delivered, the cleanup process dies before qemu, which defeats the purpose Ok, too bad. If so I wonder what happens when systemd just delivers SIGKILL to all processes in the cgroup - I'm not sure there's a guarantee it will SIGKILL the main qemu before it SIGKILLs this helper I'm afraid in that case there is no guarantee. for what it's worth, both virsh shutdown and destroy seem to do things properly. Hmm, probably because libvirt tells QEMU to exit before systemd comes along and tells everything in the cgroup to die with SIGKILL. It seems Libvirt sends SIGKILL if qemu process doesn't terminate within 10 seconds after Libvirt sent SIGTERM: https://gitlab.com/libvirt/libvirt/-/blob/0615df084ec9996b5df88d6a1b59c557e22f3a12/src/util/virprocess.c#L375 but this is fine. with asynchronous teardown, qemu will exit almost immediately when receiving SIGTERM, and the cleanup process will start cleaning up. Under normal and orderly conditions, yes. So I guess this patch happened to work with Libvirt because the main qemu process terminated before the timeout and before SIGKILL was delivered. it seems so The cleanup process is trying to solve the problem where the main qemu process takes too long to terminate. However, if the cleanup process itself takes too long, SIGKILL will be sent by Libvirt anyway. but that is not a problem, the sole purpose of the cleanup process is to terminate _after_ qemu. it doesn't matter what happens after qemu has terminated. if you look at the patch, after going to great lengths to assure that qemu has terminated, all the child process does is _exit(0). Perhaps we can describe this situation in the parameter help, e.g.: If management layer decides to send SIGKILL (e.g.: due to timeout or deliberate decision), the cleanup process can exit before the main process, deceiving its purpose. if the management layer (or the user) decides to send SIGKILL immediately to the whole cgroup without sending SIGTERM first, then this whole asynchronous teardown mechanism is defeated, yes. This situation is what we likely want to describe in the parameter help. I don't want to give users the false impression that this option will *always* behave the manner we expect it to work *most* of the time. -- Murilo
[PATCH 1/2] osdeps: Introduce qemu_socketpair()
From: Guoyi Tu qemu_socketpair() will create a pair of connected sockets with FD_CLOEXEC set Signed-off-by: Guoyi Tu --- include/qemu/sockets.h | 3 +++ util/osdep.c | 24 2 files changed, 27 insertions(+) diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h index 038faa157f..52cf2855df 100644 --- a/include/qemu/sockets.h +++ b/include/qemu/sockets.h @@ -14,6 +14,9 @@ int inet_aton(const char *cp, struct in_addr *ia); /* misc helpers */ bool fd_is_socket(int fd); int qemu_socket(int domain, int type, int protocol); +#ifndef WIN32 +int qemu_socketpair(int domain, int type, int protocol, int sv[2]); +#endif int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen); int socket_set_cork(int fd, int v); int socket_set_nodelay(int fd); diff --git a/util/osdep.c b/util/osdep.c index 60fcbbaebe..4b1ab623c7 100644 --- a/util/osdep.c +++ b/util/osdep.c @@ -481,6 +481,30 @@ int qemu_socket(int domain, int type, int protocol) return ret; } +#ifndef _WIN32 +/* + * Create a pair of connected sockets with FD_CLOEXEC set + */ +int qemu_socketpair(int domain, int type, int protocol, int sv[2]) +{ +int ret; + +#ifdef SOCK_CLOEXEC +ret = socketpair(domain, type | SOCK_CLOEXEC, protocol, sv); +if (ret != -1 || errno != EINVAL) { +return ret; +} +#endif +ret = socketpair(domain, type, protocol, sv);; +if (ret == 0) { +qemu_set_cloexec(sv[0]); +qemu_set_cloexec(sv[1]); +} + +return ret; +} +#endif + /* * Accept a connection and set FD_CLOEXEC */ -- 2.25.1
[PULL 4/5] hw/arm/virt-acpi-build: Present the GICR structure properly for GICv4
From: Zenghui Yu With the introduction of the new TCG GICv4, build_madt() is badly broken as we do not present any GIC Redistributor structure in MADT for GICv4 guests, so that they have no idea about where the Redistributor register frames are. This fixes a Linux guest crash at boot time with ACPI enabled and '-machine gic-version=4'. While at it, let's convert the remaining hard coded gic_version into enumeration VIRT_GIC_VERSION_2 for consistency. Signed-off-by: Zenghui Yu Message-id: 20220812022018.1069-1-yuzeng...@huawei.com Reviewed-by: Peter Maydell Signed-off-by: Peter Maydell --- hw/arm/virt-acpi-build.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index 449fab00805..9b3aee01bf8 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -732,7 +732,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) uint32_t pmu_interrupt = arm_feature(&armcpu->env, ARM_FEATURE_PMU) ? PPI(VIRTUAL_PMU_IRQ) : 0; -if (vms->gic_version == 2) { +if (vms->gic_version == VIRT_GIC_VERSION_2) { physical_base_address = memmap[VIRT_GIC_CPU].base; gicv = memmap[VIRT_GIC_VCPU].base; gich = memmap[VIRT_GIC_HYP].base; @@ -762,7 +762,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) build_append_int_noprefix(table_data, armcpu->mp_affinity, 8); } -if (vms->gic_version == 3) { +if (vms->gic_version != VIRT_GIC_VERSION_2) { build_append_gicr(table_data, memmap[VIRT_GIC_REDIST].base, memmap[VIRT_GIC_REDIST].size); if (virt_gicv3_redist_region_count(vms) == 2) { -- 2.25.1
[PATCH 0/2] introduce qemu_socketpiar()
From: Guoyi Tu introduce qemu_socketpair() to create socket pair fd, and set the close-on-exec flag at default as with the other type of socket does. besides, the live update feature is developing, so it's necessary to do that. Guoyi Tu (2): osdeps: Introduce qemu_socketpair() vhost-user: Call qemu_socketpair() instead of socketpair() hw/display/vhost-user-gpu.c | 3 ++- hw/virtio/vhost-user.c | 2 +- include/qemu/sockets.h | 3 +++ util/osdep.c| 24 4 files changed, 30 insertions(+), 2 deletions(-) -- 2.25.1
[PATCH 2/2] vhost-user: Call qemu_socketpair() instead of socketpair()
From: Guoyi Tu set close-on-exec flag on the new opened file descriptors at default Signed-off-by: Guoyi Tu --- hw/display/vhost-user-gpu.c | 3 ++- hw/virtio/vhost-user.c | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/hw/display/vhost-user-gpu.c b/hw/display/vhost-user-gpu.c index 3340ef9e5f..19c0e20103 100644 --- a/hw/display/vhost-user-gpu.c +++ b/hw/display/vhost-user-gpu.c @@ -11,6 +11,7 @@ */ #include "qemu/osdep.h" +#include "qemu/sockets.h" #include "hw/qdev-properties.h" #include "hw/virtio/virtio-gpu.h" #include "chardev/char-fe.h" @@ -375,7 +376,7 @@ vhost_user_gpu_do_set_socket(VhostUserGPU *g, Error **errp) Chardev *chr; int sv[2]; -if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) { +if (qemu_socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) { error_setg_errno(errp, errno, "socketpair() failed"); return false; } diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 75b8df21a4..4d2b227028 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -1726,7 +1726,7 @@ static int vhost_setup_slave_channel(struct vhost_dev *dev) return 0; } -if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) { +if (qemu_socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) { int saved_errno = errno; error_report("socketpair() failed"); return -saved_errno; -- 2.25.1
[PULL 2/5] Fix some typos in documentation (most of them found by codespell)
From: Stefan Weil Signed-off-by: Stefan Weil Reviewed-by: Hongren (Zenithal) Zheng Message-id: 20220812075642.1200578-1...@weilnetz.de Reviewed-by: Peter Maydell Signed-off-by: Peter Maydell --- docs/about/deprecated.rst | 2 +- docs/specs/acpi_erst.rst| 4 ++-- docs/system/devices/canokey.rst | 8 docs/system/devices/cxl.rst | 12 ++-- 4 files changed, 13 insertions(+), 13 deletions(-) diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst index 7ee26626d5c..91b03115ee2 100644 --- a/docs/about/deprecated.rst +++ b/docs/about/deprecated.rst @@ -297,7 +297,7 @@ by using ``-machine graphics=off``. In QEMU versions 6.1, 6.2 and 7.0, the ``nvme-ns`` generates an EUI-64 -identifer that is not globally unique. If an EUI-64 identifer is required, the +identifier that is not globally unique. If an EUI-64 identifier is required, the user must set it explicitly using the ``nvme-ns`` device parameter ``eui64``. ``-device nvme,use-intel-id=on|off`` (since 7.1) diff --git a/docs/specs/acpi_erst.rst b/docs/specs/acpi_erst.rst index a8a9d22d254..2339b60ad74 100644 --- a/docs/specs/acpi_erst.rst +++ b/docs/specs/acpi_erst.rst @@ -108,7 +108,7 @@ Slot 0 contains a backend storage header that identifies the contents as ERST and also facilitates efficient access to the records. Depending upon the size of the backend storage, additional slots will be designated to be a part of the slot 0 header. For example, at 8KiB, -the slot 0 header can accomodate 1021 records. Thus a storage size +the slot 0 header can accommodate 1021 records. Thus a storage size of 8MiB (8KiB * 1024) requires an additional slot for use by the header. In this scenario, slot 0 and slot 1 form the backend storage header, and records can be stored starting at slot 2. @@ -196,5 +196,5 @@ References [2] "Unified Extensible Firmware Interface Specification", version 2.1, October 2008. -[3] "Windows Hardware Error Architecture", specfically +[3] "Windows Hardware Error Architecture", specifically "Error Record Persistence Mechanism". diff --git a/docs/system/devices/canokey.rst b/docs/system/devices/canokey.rst index c2c58ae3e7c..cfa6186e483 100644 --- a/docs/system/devices/canokey.rst +++ b/docs/system/devices/canokey.rst @@ -28,9 +28,9 @@ With the same software configuration as a hardware key, the guest OS can use all the functionalities of a secure key as if there was actually an hardware key plugged in. -CanoKey QEMU provides much convenience for debuging: +CanoKey QEMU provides much convenience for debugging: -* libcanokey-qemu supports debuging output thus developers can +* libcanokey-qemu supports debugging output thus developers can inspect what happens inside a secure key * CanoKey QEMU supports trace event thus event * QEMU USB stack supports pcap thus USB packet between the guest @@ -102,8 +102,8 @@ and find CanoKey QEMU there: You may setup the key as guided in [6]_. The console for the key is at [7]_. -Debuging - +Debugging += CanoKey QEMU consists of two parts, ``libcanokey-qemu.so`` and ``canokey.c``, the latter of which resides in QEMU. The former provides core functionality diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst index 36031325cca..f25783a4ecf 100644 --- a/docs/system/devices/cxl.rst +++ b/docs/system/devices/cxl.rst @@ -83,7 +83,7 @@ CXL Fixed Memory Windows (CFMW) A CFMW consists of a particular range of Host Physical Address space which is routed to particular CXL Host Bridges. At time of generic software initialization it will have a particularly interleaving -configuration and associated Quality of Serice Throtling Group (QTG). +configuration and associated Quality of Service Throttling Group (QTG). This information is available to system software, when making decisions about how to configure interleave across available CXL memory devices. It is provide as CFMW Structures (CFMWS) in @@ -98,7 +98,7 @@ specification defined register interface called CXL Host Bridge Component Registers (CHBCR). The location of this CHBCR MMIO space is described to system software via a CXL Host Bridge Structure (CHBS) in the CEDT ACPI table. The actual interfaces -are identical to those used for other parts of the CXL heirarchy +are identical to those used for other parts of the CXL hierarchy as CXL Component Registers in PCI BARs. Interfaces provided include: @@ -143,7 +143,7 @@ CXL Memory Devices - Type 3 ~~~ CXL type 3 devices use a PCI class code and are intended to be supported by a generic operating system driver. They have HDM decoders -though in these EP devices, the decoder is reponsible not for +though in these EP devices, the decoder is responsible not for routing but for translation of the incoming host physical address (HPA) into a Device Physical Address (DPA). @@ -209,7 +209,7 @@ Notes: ra
[PULL 1/5] target/arm: Don't report Statistical Profiling Extension in ID registers
The newly added neoverse-n1 CPU has ID register values which indicate the presence of the Statistical Profiling Extension, because the real hardware has this feature. QEMU's TCG emulation does not yet implement SPE, though (not even as a minimal stub implementation), so guests will crash if they try to use it because the SPE system registers don't exist. Force ID_AA64DFR0_EL1.PMSVer to 0 in CPU realize for TCG, so that we don't advertise to the guest a feature that doesn't exist. (We could alternatively do this by editing the value that aarch64_neoverse_n1_initfn() sets for this ID register, but suppressing the field in realize means we won't re-introduce this bug when we add other CPUs that have SPE in hardware, such as the Neoverse-V1.) An example of a non-booting guest is current mainline Linux (5.19), when booting in EL2 on the virt board (ie with -machine virtualization=on). Reported-by: Zenghui Yu Signed-off-by: Peter Maydell Reviewed-by: Richard Henderson Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Zenghui Yu Message-id: 20220811131127.947334-1-peter.mayd...@linaro.org --- target/arm/cpu.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/target/arm/cpu.c b/target/arm/cpu.c index 1b7b3d76bb3..7ec3281da9a 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -1933,6 +1933,17 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp) } #endif +if (tcg_enabled()) { +/* + * Don't report the Statistical Profiling Extension in the ID + * registers, because TCG doesn't implement it yet (not even a + * minimal stub version) and guests will fall over when they + * try to access the non-existent system registers for it. + */ +cpu->isar.id_aa64dfr0 = +FIELD_DP64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, PMSVER, 0); +} + /* MPU can be configured out of a PMSA CPU either by setting has-mpu * to false or by setting pmsav7-dregion to 0. */ -- 2.25.1
[PULL 5/5] cutils: Add missing dyld(3) include on macOS
From: Philippe Mathieu-Daudé Commit 06680b15b4 moved qemu_*_exec_dir() to cutils but forgot to move the macOS dyld(3) include, resulting in the following error (when building with Homebrew GCC on macOS Monterey 12.4): [313/1197] Compiling C object libqemuutil.a.p/util_cutils.c.o FAILED: libqemuutil.a.p/util_cutils.c.o ../../util/cutils.c:1039:13: error: implicit declaration of function '_NSGetExecutablePath' [-Werror=implicit-function-declaration] 1039 | if (_NSGetExecutablePath(fpath, &len) == 0) { | ^~~~ ../../util/cutils.c:1039:13: error: nested extern declaration of '_NSGetExecutablePath' [-Werror=nested-externs] Fix by moving the include line to cutils. Fixes: 06680b15b4 ("include: move qemu_*_exec_dir() to cutils") Signed-off-by: Philippe Mathieu-Daudé Message-id: 20220809222046.30812-1-f4...@amsat.org Reviewed-by: Peter Maydell Signed-off-by: Peter Maydell --- util/cutils.c | 4 util/oslib-posix.c | 4 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/util/cutils.c b/util/cutils.c index cb43dda213c..def9c746ced 100644 --- a/util/cutils.c +++ b/util/cutils.c @@ -39,6 +39,10 @@ #include #endif +#ifdef __APPLE__ +#include +#endif + #ifdef G_OS_WIN32 #include #include diff --git a/util/oslib-posix.c b/util/oslib-posix.c index bffec18869e..d55af69c112 100644 --- a/util/oslib-posix.c +++ b/util/oslib-posix.c @@ -58,10 +58,6 @@ #include #endif -#ifdef __APPLE__ -#include -#endif - #include "qemu/mmap-alloc.h" #ifdef CONFIG_DEBUG_STACK_USAGE -- 2.25.1
[PULL 0/5] target-arm queue
This pullreq has: * two arm bug fixes which fix some "Linux fails to boot" bugs * a docs typo-fixing patch * a couple of compile failure/warning issues I think they're all pretty safe and worth having in rc3. thanks -- PMM The following changes since commit a6b1c53e79d08a99a28cc3e67a3e1a7c34102d6b: Merge tag 'linux-user-for-7.1-pull-request' of https://gitlab.com/laurent_vivier/qemu into staging (2022-08-10 10:26:57 -0700) are available in the Git repository at: https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20220812 for you to fetch changes up to 4311682ea8293f720730f260e8a7601117d79e65: cutils: Add missing dyld(3) include on macOS (2022-08-12 11:33:52 +0100) target-arm queue: * Don't report Statistical Profiling Extension in ID registers * virt ACPI tables: Present the GICR structure properly for GICv4 * Fix some typos in documentation * tests/unit: fix a -Wformat-truncation warning * cutils: Add missing dyld(3) include on macOS Marc-André Lureau (1): tests/unit: fix a -Wformat-truncation warning Peter Maydell (1): target/arm: Don't report Statistical Profiling Extension in ID registers Philippe Mathieu-Daudé (1): cutils: Add missing dyld(3) include on macOS Stefan Weil (1): Fix some typos in documentation (most of them found by codespell) Zenghui Yu (1): hw/arm/virt-acpi-build: Present the GICR structure properly for GICv4 docs/about/deprecated.rst | 2 +- docs/specs/acpi_erst.rst| 4 ++-- docs/system/devices/canokey.rst | 8 docs/system/devices/cxl.rst | 12 ++-- hw/arm/virt-acpi-build.c| 4 ++-- target/arm/cpu.c| 11 +++ tests/unit/test-qobject-input-visitor.c | 3 +-- util/cutils.c | 4 util/oslib-posix.c | 4 9 files changed, 31 insertions(+), 21 deletions(-)
Re: [PATCH 2/2] vhost-user: Call qemu_socketpair() instead of socketpair()
On Fri, 12 Aug 2022 at 12:44, wrote: > > From: Guoyi Tu > > set close-on-exec flag on the new opened file descriptors at default What goes wrong if we don't do this? The commit message is a good place to explain what bug the commit is fixing, and its consequences. thanks -- PMM
[PULL 3/5] tests/unit: fix a -Wformat-truncation warning
From: Marc-André Lureau ../tests/test-qobject-input-visitor.c: In function ‘test_visitor_in_list’: ../tests/test-qobject-input-visitor.c:454:49: warning: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size 6 [-Wformat-truncation=] 454 | snprintf(string, sizeof(string), "string%d", i); | ^~ ../tests/test-qobject-input-visitor.c:454:42: note: directive argument in the range [0, 2147483606] 454 | snprintf(string, sizeof(string), "string%d", i); | ^~ ../tests/test-qobject-input-visitor.c:454:9: note: ‘snprintf’ output between 8 and 17 bytes into a destination of size 12 454 | snprintf(string, sizeof(string), "string%d", i); | ^~~ Rather than trying to be clever, since this is called 3 times during tests, let's simply use g_strdup_printf(). Signed-off-by: Marc-André Lureau Reviewed-by: Markus Armbruster Message-id: 20220810121513.1356081-1-marcandre.lur...@redhat.com Reviewed-by: Peter Maydell [PMM: fixed commit message typos] Signed-off-by: Peter Maydell --- tests/unit/test-qobject-input-visitor.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/tests/unit/test-qobject-input-visitor.c b/tests/unit/test-qobject-input-visitor.c index 14329dabcfe..5f614afdbf1 100644 --- a/tests/unit/test-qobject-input-visitor.c +++ b/tests/unit/test-qobject-input-visitor.c @@ -447,9 +447,8 @@ static void test_visitor_in_list(TestInputVisitorData *data, g_assert(head != NULL); for (i = 0, item = head; item; item = item->next, i++) { -char string[12]; +g_autofree char *string = g_strdup_printf("string%d", i); -snprintf(string, sizeof(string), "string%d", i); g_assert_cmpstr(item->value->string, ==, string); g_assert_cmpint(item->value->integer, ==, 42 + i); } -- 2.25.1
Re: [PATCH v3 1/1] os-posix: asynchronous teardown for shutdown on Linux
On Fri, 12 Aug 2022 08:38:59 -0300 Murilo Opsfelder Araújo wrote: > On 8/12/22 04:26, Claudio Imbrenda wrote: > > On Thu, 11 Aug 2022 23:05:52 -0300 > > Murilo Opsfelder Araújo wrote: > > > >> On 8/11/22 11:02, Daniel P. Berrangé wrote: > >> [...] > > Hmm, I was hoping you could just use SIGKILL to guarantee that this > > gets killed off. Is SIGKILL delivered too soon to allow for the > > main QEMU process to have exited quickly ? > > yes, I tried. qemu has not finished exiting when the signal is > delivered, the cleanup process dies before qemu, which defeats the > purpose > >>> > >>> Ok, too bad. > >>> > > If so I wonder what happens when systemd just delivers SIGKILL to > > all processes in the cgroup - I'm not sure there's a guarantee it > > will SIGKILL the main qemu before it SIGKILLs this helper > > I'm afraid in that case there is no guarantee. > > for what it's worth, both virsh shutdown and destroy seem to do things > properly. > >>> > >>> Hmm, probably because libvirt tells QEMU to exit before systemd comes > >>> along and tells everything in the cgroup to die with SIGKILL. > >> > >> It seems Libvirt sends SIGKILL if qemu process doesn't terminate within 10 > >> seconds after Libvirt sent SIGTERM: > >> > >> https://gitlab.com/libvirt/libvirt/-/blob/0615df084ec9996b5df88d6a1b59c557e22f3a12/src/util/virprocess.c#L375 > >> > > > > but this is fine. > > > > with asynchronous teardown, qemu will exit almost immediately when > > receiving SIGTERM, and the cleanup process will start cleaning up. > > Under normal and orderly conditions, yes. > > >> So I guess this patch happened to work with Libvirt because the main qemu > >> process terminated before the timeout and before SIGKILL was delivered. > > > > it seems so > > > >> > >> The cleanup process is trying to solve the problem where the main qemu > >> process > >> takes too long to terminate. However, if the cleanup process itself takes > >> too > >> long, SIGKILL will be sent by Libvirt anyway. > > > > but that is not a problem, the sole purpose of the cleanup process is > > to terminate _after_ qemu. it doesn't matter what happens after qemu > > has terminated. if you look at the patch, after going to great lengths > > to assure that qemu has terminated, all the child process does is > > _exit(0). > > > >> > >> Perhaps we can describe this situation in the parameter help, e.g.: If > >> management layer decides to send SIGKILL (e.g.: due to timeout or > >> deliberate > >> decision), the cleanup process can exit before the main process, deceiving > >> its > >> purpose. > > > > if the management layer (or the user) decides to send SIGKILL > > immediately to the whole cgroup without sending SIGTERM first, then > > this whole asynchronous teardown mechanism is defeated, yes. > > This situation is what we likely want to describe in the parameter help. I > don't > want to give users the false impression that this option will *always* behave > the manner we expect it to work *most* of the time. fair enough, I'll improve the documentation > > -- > Murilo
Re: [PATCH 1/2] osdeps: Introduce qemu_socketpair()
On Fri, 12 Aug 2022 at 12:44, wrote: > > From: Guoyi Tu > > qemu_socketpair() will create a pair of connected sockets > with FD_CLOEXEC set > > Signed-off-by: Guoyi Tu > --- > include/qemu/sockets.h | 3 +++ > util/osdep.c | 24 > 2 files changed, 27 insertions(+) > > diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h > index 038faa157f..52cf2855df 100644 > --- a/include/qemu/sockets.h > +++ b/include/qemu/sockets.h > @@ -14,6 +14,9 @@ int inet_aton(const char *cp, struct in_addr *ia); > /* misc helpers */ > bool fd_is_socket(int fd); > int qemu_socket(int domain, int type, int protocol); > +#ifndef WIN32 > +int qemu_socketpair(int domain, int type, int protocol, int sv[2]); Any new function declaration in a header file needs a doc-comment documenting what it does, please. > +#endif > int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen); > int socket_set_cork(int fd, int v); > int socket_set_nodelay(int fd); > diff --git a/util/osdep.c b/util/osdep.c > index 60fcbbaebe..4b1ab623c7 100644 > --- a/util/osdep.c > +++ b/util/osdep.c > @@ -481,6 +481,30 @@ int qemu_socket(int domain, int type, int protocol) > return ret; > } > > +#ifndef _WIN32 If this function only exists and is usable on posix hosts, put it in util/oslib-posix.c rather than having it here with a win32 ifdef. thanks -- PMM
Re: [PATCH v2 2/9] target/arm: Don't add all MIDR aliases for Cortex-R
On Mon, 18 Jul 2022 at 12:54, Tobias Roehmel wrote: > > From: Tobias Röhmel > > Cortex-R52 has the MPUIR register which has the same encoding > has the MIDR alias with opc2=4. So we only add that alias > when we are not realizing a Cortex-R. > > Signed-off-by: Tobias Röhmel > --- > target/arm/helper.c | 12 +--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/target/arm/helper.c b/target/arm/helper.c > index 6457e6301c..03bdc3d149 100644 > --- a/target/arm/helper.c > +++ b/target/arm/helper.c > @@ -8189,9 +8189,6 @@ void register_cp_regs_for_features(ARMCPU *cpu) >.fieldoffset = offsetof(CPUARMState, cp15.c0_cpuid), >.readfn = midr_read }, > /* crn = 0 op1 = 0 crm = 0 op2 = 4,7 : AArch32 aliases of MIDR */ Moving the op2=4 register definition makes this comment out of date, so you need to update it (ie remove the '4' here, and add a comment similarly noting what id_v8_midr_alias_cp_reginfo[] is doing). > -{ .name = "MIDR", .type = ARM_CP_ALIAS | ARM_CP_CONST, > - .cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 4, > - .access = PL1_R, .resetvalue = cpu->midr }, > { .name = "MIDR", .type = ARM_CP_ALIAS | ARM_CP_CONST, >.cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 7, >.access = PL1_R, .resetvalue = cpu->midr }, > @@ -8201,6 +8198,11 @@ void register_cp_regs_for_features(ARMCPU *cpu) >.accessfn = access_aa64_tid1, >.type = ARM_CP_CONST, .resetvalue = cpu->revidr }, > }; > +ARMCPRegInfo id_v8_midr_alias_cp_reginfo[] = { > +{ .name = "MIDR", .type = ARM_CP_ALIAS | ARM_CP_CONST, > + .cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 4, > + .access = PL1_R, .resetvalue = cpu->midr }, > +}; If there's only one register, you don't need to define a single-element array, you can use define_one_arm_cp_reg() and pass it a single ARMCPRegInfo struct. > ARMCPRegInfo id_cp_reginfo[] = { > /* These are common to v8 and pre-v8 */ > { .name = "CTR", > @@ -8264,8 +8266,12 @@ void register_cp_regs_for_features(ARMCPU *cpu) > id_mpuir_reginfo.access = PL1_RW; > id_tlbtr_reginfo.access = PL1_RW; > } > + > if (arm_feature(env, ARM_FEATURE_V8)) { > define_arm_cp_regs(cpu, id_v8_midr_cp_reginfo); > +if (!arm_feature(env, ARM_FEATURE_V8_R)) { You can use ARM_FEATURE_PMSA here. > +define_arm_cp_regs(cpu, id_v8_midr_alias_cp_reginfo); > +} > } else { > define_arm_cp_regs(cpu, id_pre_v8_midr_cp_reginfo); > } > -- thanks -- PMM
Re: [PATCH v2 3/9] target/arm: Make RVBAR available for all ARMv8 CPUs
(I've added your rwth-aachen.de address because the quicinc one seems to be bouncing :-( ) On Mon, 18 Jul 2022 at 12:54, Tobias Roehmel wrote: > > From: Tobias Röhmel > > Signed-off-by: Tobias Röhmel Having looked a bit more carefully at the architecture manual, I think this is not complete. In particular you don't actually define the AArch32 RVBAR register (which needs to exist with permissions permitting only EL2 access if EL2 is the highest EL, and with permissions for EL1 access if EL1 is the highest EL, in the same way the AArch64 RVBAR_EL2 and RVBAR_EL1 work (ie by adding AArch32 alias registers where we define RVBAR_EL2 and RVBAR_EL1). (AArch32 has no equivalent of RVBAR_EL3; MVBAR is kind of the same thing, but it's not required that it has the actual reset address in it on reset. So we can ignore that as we're architecturally compliant for A-profile, and for the R52 we don't care about EL3.). The commit message here could also usefully be expanded to explain what we're doing and why it's the right thing. > --- > target/arm/cpu.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/target/arm/cpu.c b/target/arm/cpu.c > index 1b5d535788..9007768418 100644 > --- a/target/arm/cpu.c > +++ b/target/arm/cpu.c > @@ -258,6 +258,10 @@ static void arm_cpu_reset(DeviceState *dev) > env->cp15.cpacr_el1 = FIELD_DP64(env->cp15.cpacr_el1, > CPACR, CP11, 3); > #endif > +if (arm_feature(env, ARM_FEATURE_V8)) { > +env->cp15.rvbar = cpu->rvbar_prop; > +env->regs[15] = cpu->rvbar_prop; > +} > } > > #if defined(CONFIG_USER_ONLY) > @@ -1273,7 +1277,7 @@ void arm_cpu_post_init(Object *obj) > qdev_property_add_static(DEVICE(obj), > &arm_cpu_reset_hivecs_property); > } > > -if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) { > +if (arm_feature(&cpu->env, ARM_FEATURE_V8)) { > object_property_add_uint64_ptr(obj, "rvbar", > &cpu->rvbar_prop, > OBJ_PROP_FLAG_READWRITE); thanks -- PMM
[PATCH 2/2] target/riscv: fence: reconcile with specification
Our decoding of fence-instructions is problematic in respect to the RISC-V ISA specification: - rs and rd are ignored, but need to be 0 - fm is ignored This change adjusts the decode pattern to enfore rs and rd being 0, and validates the fm-field (together with pred/succ for FENCE.TSO) to determine whether a reserved instruction is specified. While the specification allows UNSPECIFIED behaviour for reserved instructions, we now always raise an illegal instruction exception. Signed-off-by: Philipp Tomsich --- target/riscv/insn32.decode | 2 +- target/riscv/insn_trans/trans_rvi.c.inc | 19 ++- 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode index 089128c3dc..4e53df1b62 100644 --- a/target/riscv/insn32.decode +++ b/target/riscv/insn32.decode @@ -150,7 +150,7 @@ srl 000 .. 101 . 0110011 @r sra 010 .. 101 . 0110011 @r or 000 .. 110 . 0110011 @r and 000 .. 111 . 0110011 @r -fence pred:4 succ:4 - 000 - 000 +fencefm:4 pred:4 succ:4 0 000 0 000 fence_i 0 001 0 000 csrrw . 001 . 1110011 @csr csrrs . 010 . 1110011 @csr diff --git a/target/riscv/insn_trans/trans_rvi.c.inc b/target/riscv/insn_trans/trans_rvi.c.inc index ca8e3d1ea1..515bb3b22a 100644 --- a/target/riscv/insn_trans/trans_rvi.c.inc +++ b/target/riscv/insn_trans/trans_rvi.c.inc @@ -795,7 +795,24 @@ static bool trans_srad(DisasContext *ctx, arg_srad *a) static bool trans_fence(DisasContext *ctx, arg_fence *a) { -/* FENCE is a full memory barrier. */ +switch (a->fm) { +case 0b: + /* normal fence */ + break; + +case 0b0001: + /* FENCE.TSO requires PRED and SUCC to be RW */ + if (a->pred != 0xb0011 || a->succ != 0b0011) { +return false; + } + break; + +default: +/* reserved for future use */ +return false; +} + +/* We implement FENCE(.TSO) is a full memory barrier. */ tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC); return true; } -- 2.34.1
[PATCH 1/2] target/riscv: fence.i: update decode pattern
The RISC-V specification specifies imm12, rs1 and rd to be all-zeros, so we can't ignore these bits when decoding into fence.i. Update the decode pattern to reflect the specification. Signed-off-by: Philipp Tomsich --- target/riscv/insn32.decode | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode index 014127d066..089128c3dc 100644 --- a/target/riscv/insn32.decode +++ b/target/riscv/insn32.decode @@ -151,7 +151,7 @@ sra 010 .. 101 . 0110011 @r or 000 .. 110 . 0110011 @r and 000 .. 111 . 0110011 @r fence pred:4 succ:4 - 000 - 000 -fence_i - 001 - 000 +fence_i 0 001 0 000 csrrw . 001 . 1110011 @csr csrrs . 010 . 1110011 @csr csrrc . 011 . 1110011 @csr -- 2.34.1
Re: [PATCH 2/2] target/riscv: fence: reconcile with specification
On Fri, 12 Aug 2022 at 14:17, Philipp Tomsich wrote: > > Our decoding of fence-instructions is problematic in respect to the > RISC-V ISA specification: > - rs and rd are ignored, but need to be 0 > - fm is ignored > > This change adjusts the decode pattern to enfore rs and rd being 0, > and validates the fm-field (together with pred/succ for FENCE.TSO) to > determine whether a reserved instruction is specified. > > While the specification allows UNSPECIFIED behaviour for reserved > instructions, we now always raise an illegal instruction exception. > > Signed-off-by: Philipp Tomsich > > --- > > target/riscv/insn32.decode | 2 +- > target/riscv/insn_trans/trans_rvi.c.inc | 19 ++- > 2 files changed, 19 insertions(+), 2 deletions(-) > > diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode > index 089128c3dc..4e53df1b62 100644 > --- a/target/riscv/insn32.decode > +++ b/target/riscv/insn32.decode > @@ -150,7 +150,7 @@ srl 000 .. 101 . 0110011 @r > sra 010 .. 101 . 0110011 @r > or 000 .. 110 . 0110011 @r > and 000 .. 111 . 0110011 @r > -fence pred:4 succ:4 - 000 - 000 > +fencefm:4 pred:4 succ:4 0 000 0 000 > fence_i 0 001 0 000 > csrrw . 001 . 1110011 @csr > csrrs . 010 . 1110011 @csr > diff --git a/target/riscv/insn_trans/trans_rvi.c.inc > b/target/riscv/insn_trans/trans_rvi.c.inc > index ca8e3d1ea1..515bb3b22a 100644 > --- a/target/riscv/insn_trans/trans_rvi.c.inc > +++ b/target/riscv/insn_trans/trans_rvi.c.inc > @@ -795,7 +795,24 @@ static bool trans_srad(DisasContext *ctx, arg_srad *a) > > static bool trans_fence(DisasContext *ctx, arg_fence *a) > { > -/* FENCE is a full memory barrier. */ > +switch (a->fm) { > +case 0b: > + /* normal fence */ > + break; > + > +case 0b0001: > + /* FENCE.TSO requires PRED and SUCC to be RW */ > + if (a->pred != 0xb0011 || a->succ != 0b0011) { > +return false; > + } > + break; > + > +default: > +/* reserved for future use */ > +return false; > +} I think it would be neater to do this decode in the .decode file, rather than by hand in the trans function. thanks -- PMM
[PATCH v4 1/1] os-posix: asynchronous teardown for shutdown on Linux
This patch adds support for asynchronously tearing down a VM on Linux. When qemu terminates, either naturally or because of a fatal signal, the VM is torn down. If the VM is huge, it can take a considerable amount of time for it to be cleaned up. In case of a protected VM, it might take even longer than a non-protected VM (this is the case on s390x, for example). Some users might want to shut down a VM and restart it immediately, without having to wait. This is especially true if management infrastructure like libvirt is used. This patch implements a simple trick on Linux to allow qemu to return immediately, with the teardown of the VM being performed asynchronously. If the new commandline option -async-teardown is used, a new process is spawned from qemu at startup, using the clone syscall, in such way that it will share its address space with qemu.The new process will have the name "cleanup/". It will wait until qemu terminates completely, and then it will exit itself. This allows qemu to terminate quickly, without having to wait for the whole address space to be torn down. The cleanup process will exit after qemu, so it will be the last user of the address space, and therefore it will take care of the actual teardown. The cleanup process will share the same cgroups as qemu, so both memory usage and cpu time will be accounted properly. If possible, close_range will be used in the cleanup process to close all open file descriptors. If it is not available or if it fails, /proc will be used to determine which file descriptors to close. If the cleanup process is forcefully killed with SIGKILL before the main qemu process has terminated completely, the mechanism is defeated and the teardown will not be asynchronous. This feature can already be used with libvirt by adding the following to the XML domain definition to pass the parameter to qemu directly: http://libvirt.org/schemas/domain/qemu/1.0";> Signed-off-by: Claudio Imbrenda Reviewed-by: Murilo Opsfelder Araujo Tested-by: Murilo Opsfelder Araujo --- include/qemu/async-teardown.h | 22 + meson.build | 1 + os-posix.c| 6 ++ qemu-options.hx | 19 + util/async-teardown.c | 155 ++ util/meson.build | 1 + 6 files changed, 204 insertions(+) create mode 100644 include/qemu/async-teardown.h create mode 100644 util/async-teardown.c diff --git a/include/qemu/async-teardown.h b/include/qemu/async-teardown.h new file mode 100644 index 00..092e7a37e7 --- /dev/null +++ b/include/qemu/async-teardown.h @@ -0,0 +1,22 @@ +/* + * Asynchronous teardown + * + * Copyright IBM, Corp. 2022 + * + * Authors: + * Claudio Imbrenda + * + * This work is licensed under the terms of the GNU GPL, version 2 or (at your + * option) any later version. See the COPYING file in the top-level directory. + * + */ +#ifndef QEMU_ASYNC_TEARDOWN_H +#define QEMU_ASYNC_TEARDOWN_H + +#include "config-host.h" + +#ifdef CONFIG_LINUX +void init_async_teardown(void); +#endif + +#endif diff --git a/meson.build b/meson.build index 294e9a8f32..7bccad93d0 100644 --- a/meson.build +++ b/meson.build @@ -1892,6 +1892,7 @@ config_host_data.set('HAVE_SYS_IOCCOM_H', cc.has_header('sys/ioccom.h')) config_host_data.set('HAVE_SYS_KCOV_H', cc.has_header('sys/kcov.h')) # has_function +config_host_data.set('CONFIG_CLOSE_RANGE', cc.has_function('close_range')) config_host_data.set('CONFIG_ACCEPT4', cc.has_function('accept4')) config_host_data.set('CONFIG_CLOCK_ADJTIME', cc.has_function('clock_adjtime')) config_host_data.set('CONFIG_DUP3', cc.has_function('dup3')) diff --git a/os-posix.c b/os-posix.c index 321fc4bd13..4858650c3e 100644 --- a/os-posix.c +++ b/os-posix.c @@ -39,6 +39,7 @@ #ifdef CONFIG_LINUX #include +#include "qemu/async-teardown.h" #endif /* @@ -150,6 +151,11 @@ int os_parse_cmd_args(int index, const char *optarg) case QEMU_OPTION_daemonize: daemonize = 1; break; +#if defined(CONFIG_LINUX) +case QEMU_OPTION_asyncteardown: +init_async_teardown(); +break; +#endif default: return -1; } diff --git a/qemu-options.hx b/qemu-options.hx index 3f23a42fa8..f913fc307f 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -4743,6 +4743,25 @@ HXCOMM Internal use DEF("qtest", HAS_ARG, QEMU_OPTION_qtest, "", QEMU_ARCH_ALL) DEF("qtest-log", HAS_ARG, QEMU_OPTION_qtest_log, "", QEMU_ARCH_ALL) +#ifdef __linux__ +DEF("async-teardown", 0, QEMU_OPTION_asyncteardown, +"-async-teardown enable asynchronous teardown\n", +QEMU_ARCH_ALL) +#endif +SRST +``-async-teardown`` +Enable asynchronous teardown. A new process called "cleanup/" +will be created at startup sharing the address space with the main qemu +process, using clone. It will wait for the main qemu process to +terminate completely, and then exit. +This allows qemu to terminate very quickly even if the guest was
[PATCH v2] hw/smbios: support for type 8 (port connector)
PATCH v1: add support for SMBIOS type 8 to qemu PATCH v2: incorporate patch v1 feedback and add smbios type=8 to qemu-options internal_reference: internal reference designator external_reference: external reference designator connector_type: hex value for port connector type (see SMBIOS 7.9.2) port_type: hex value for port type (see SMBIOS 7.9.3) After studying various vendor implementationsi (Dell, Lenovo, MSI), the value of internal connector type was hard-coded to 0x0 (None). Example usage: -smbios type=8,internal_reference=JUSB1,external_reference=USB1,connector_type=0x12,port_type=0x10 \ -smbios type=8,internal_reference=JAUD1,external_reference="Audio Jack",connector_type=0x1f,port_type=0x1d \ -smbios type=8,internal_reference=LAN,external_reference=Ethernet,connector_type=0x0b,port_type=0x1f \ -smbios type=8,internal_reference=PS2,external_reference=Mouse,connector_type=0x0f,port_type=0x0e \ -smbios type=8,internal_reference=PS2,external_reference=Keyboard,connector_type=0x0f,port_type=0x0d Signed-off-by: Hal Martin --- hw/smbios/smbios.c | 63 include/hw/firmware/smbios.h | 10 ++ qemu-options.hx | 2 ++ 3 files changed, 75 insertions(+) diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c index 60349ee402..578cae0f0a 100644 --- a/hw/smbios/smbios.c +++ b/hw/smbios/smbios.c @@ -111,6 +111,13 @@ static struct { .processor_id = 0, }; +struct type8_instance { +const char *internal_reference, *external_reference; +uint8_t connector_type, port_type; +QTAILQ_ENTRY(type8_instance) next; +}; +static QTAILQ_HEAD(, type8_instance) type8 = QTAILQ_HEAD_INITIALIZER(type8); + static struct { size_t nvalues; char **values; @@ -337,6 +344,29 @@ static const QemuOptDesc qemu_smbios_type4_opts[] = { { /* end of list */ } }; +static const QemuOptDesc qemu_smbios_type8_opts[] = { +{ +.name = "internal_reference", +.type = QEMU_OPT_STRING, +.help = "internal reference designator", +}, +{ +.name = "external_reference", +.type = QEMU_OPT_STRING, +.help = "external reference designator", +}, +{ +.name = "connector_type", +.type = QEMU_OPT_NUMBER, +.help = "connector type", +}, +{ +.name = "port_type", +.type = QEMU_OPT_NUMBER, +.help = "port type", +}, +}; + static const QemuOptDesc qemu_smbios_type11_opts[] = { { .name = "value", @@ -718,6 +748,26 @@ static void smbios_build_type_4_table(MachineState *ms, unsigned instance) smbios_type4_count++; } +static void smbios_build_type_8_table(void) +{ +unsigned instance = 0; +struct type8_instance *t8; + +QTAILQ_FOREACH(t8, &type8, next) { +SMBIOS_BUILD_TABLE_PRE(8, T0_BASE + instance, true); + +SMBIOS_TABLE_SET_STR(8, internal_reference_str, t8->internal_reference); +SMBIOS_TABLE_SET_STR(8, external_reference_str, t8->external_reference); +/* most vendors seem to set this to None */ +t->internal_connector_type = 0x0; +t->external_connector_type = t8->connector_type; +t->port_type = t8->port_type; + +SMBIOS_BUILD_TABLE_POST; +instance++; +} +} + static void smbios_build_type_11_table(void) { char count_str[128]; @@ -1030,6 +1080,7 @@ void smbios_get_tables(MachineState *ms, smbios_build_type_4_table(ms, i); } +smbios_build_type_8_table(); smbios_build_type_11_table(); #define MAX_DIMM_SZ (16 * GiB) @@ -1346,6 +1397,18 @@ void smbios_entry_add(QemuOpts *opts, Error **errp) UINT16_MAX); } return; +case 8: +if (!qemu_opts_validate(opts, qemu_smbios_type8_opts, errp)) { +return; +} +struct type8_instance *t; +t = g_new0(struct type8_instance, 1); +save_opt(&t->internal_reference, opts, "internal_reference"); +save_opt(&t->external_reference, opts, "external_reference"); +t->connector_type = qemu_opt_get_number(opts, "connector_type", 0); +t->port_type = qemu_opt_get_number(opts, "port_type", 0); +QTAILQ_INSERT_TAIL(&type8, t, next); +return; case 11: if (!qemu_opts_validate(opts, qemu_smbios_type11_opts, errp)) { return; diff --git a/include/hw/firmware/smbios.h b/include/hw/firmware/smbios.h index 4b7ad77a44..e7d386f7c8 100644 --- a/include/hw/firmware/smbios.h +++ b/include/hw/firmware/smbios.h @@ -189,6 +189,16 @@ struct smbios_type_4 { uint16_t processor_family2; } QEMU_PACKED; +/* SMBIOS type 8 - Port Connector Information */ +struct smbios_type_8 { +struct smbios_structure_header header; +uint8_t internal_reference_str; +uint8_t internal_connector_type; +uint8_t external_reference_str; +uint8_t external_connec
Re: [PATCH 1/2] target/riscv: fence.i: update decode pattern
Please use a cover-letter for multi-patch patch series. On Fri, Aug 12, 2022 at 03:13:03PM +0200, Philipp Tomsich wrote: > The RISC-V specification specifies imm12, rs1 and rd to be all-zeros, > so we can't ignore these bits when decoding into fence.i. > > Update the decode pattern to reflect the specification. I got hung-up on this for a bit since there isn't any "must-be-0" fields, only ignored fields, but the next patch gives a clue which helped me make sense of this. The encoding of these instructions with ignored fields set to anything except zero gets into reserved instruction territory, and QEMU may legally raise an illegal-instruction in that case, which this patch will start doing. It'd be nice to have a bit more text in this commit message to make that clear. > > Signed-off-by: Philipp Tomsich > --- > > target/riscv/insn32.decode | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode > index 014127d066..089128c3dc 100644 > --- a/target/riscv/insn32.decode > +++ b/target/riscv/insn32.decode > @@ -151,7 +151,7 @@ sra 010 .. 101 . 0110011 @r > or 000 .. 110 . 0110011 @r > and 000 .. 111 . 0110011 @r > fence pred:4 succ:4 - 000 - 000 > -fence_i - 001 - 000 > +fence_i 0 001 0 000 ^ need two more spaces here to line up with fence. > csrrw . 001 . 1110011 @csr > csrrs . 010 . 1110011 @csr > csrrc . 011 . 1110011 @csr > -- > 2.34.1 > > Thanks, drew
Re: [PATCH 2/2] target/riscv: fence: reconcile with specification
On Fri, Aug 12, 2022 at 03:13:04PM +0200, Philipp Tomsich wrote: > Our decoding of fence-instructions is problematic in respect to the > RISC-V ISA specification: > - rs and rd are ignored, but need to be 0 > - fm is ignored > > This change adjusts the decode pattern to enfore rs and rd being 0, > and validates the fm-field (together with pred/succ for FENCE.TSO) to > determine whether a reserved instruction is specified. > > While the specification allows UNSPECIFIED behaviour for reserved > instructions, we now always raise an illegal instruction exception. > > Signed-off-by: Philipp Tomsich > > --- > > target/riscv/insn32.decode | 2 +- > target/riscv/insn_trans/trans_rvi.c.inc | 19 ++- > 2 files changed, 19 insertions(+), 2 deletions(-) > > diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode > index 089128c3dc..4e53df1b62 100644 > --- a/target/riscv/insn32.decode > +++ b/target/riscv/insn32.decode > @@ -150,7 +150,7 @@ srl 000 .. 101 . 0110011 @r > sra 010 .. 101 . 0110011 @r > or 000 .. 110 . 0110011 @r > and 000 .. 111 . 0110011 @r > -fence pred:4 succ:4 - 000 - 000 > +fencefm:4 pred:4 succ:4 0 000 0 000 > fence_i 0 001 0 000 > csrrw . 001 . 1110011 @csr > csrrs . 010 . 1110011 @csr > diff --git a/target/riscv/insn_trans/trans_rvi.c.inc > b/target/riscv/insn_trans/trans_rvi.c.inc > index ca8e3d1ea1..515bb3b22a 100644 > --- a/target/riscv/insn_trans/trans_rvi.c.inc > +++ b/target/riscv/insn_trans/trans_rvi.c.inc > @@ -795,7 +795,24 @@ static bool trans_srad(DisasContext *ctx, arg_srad *a) > > static bool trans_fence(DisasContext *ctx, arg_fence *a) > { > -/* FENCE is a full memory barrier. */ > +switch (a->fm) { > +case 0b: > + /* normal fence */ > + break; > + > +case 0b0001: > + /* FENCE.TSO requires PRED and SUCC to be RW */ > + if (a->pred != 0xb0011 || a->succ != 0b0011) { > +return false; > + } > + break; > + > +default: > +/* reserved for future use */ > +return false; > +} > + > +/* We implement FENCE(.TSO) is a full memory barrier. */ s/is/as/ Thanks, drew > tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC); > return true; > } > -- > 2.34.1 > >
Re: [PATCH 1/2] target/riscv: fence.i: update decode pattern
On Fri, 12 Aug 2022 at 16:01, Andrew Jones wrote: > > > Update the decode pattern to reflect the specification. > > I got hung-up on this for a bit since there isn't any "must-be-0" fields, Please refer to '“Zifencei” Instruction-Fetch Fence, Version 2.0' in the specification. The encoding diagram clearly states 0 for imm[11:0], 0 for rs1 and 0 for rd. However, there is an explanatory paragraph below (unfortunately, it is not clear whether this is normative or informative): > The unused fields in the FENCE.I instruction, imm[11:0], rs1, and rd, are > reserved for finer-grain fences in future extensions. For forward > compatibility, base implementations shall ignore these fields, and standard > software shall zero these fields. Strictly speaking, this patch may be too restrictive (it violates the "for forward-compatibility" part — which I consider informative only, though). Thanks, Philipp.
Re: [PATCH v2 1/8] parallels: Out of image offset in BAT leads to image inflation
On 11.08.2022 17:00, Alexander Ivanov wrote: When an image is opened, data_end field in BDRVParallelsState is setted as the biggest offset in the BAT plus cluster size. If there is a corrupted offset pointing outside the image, the image size increase accordingly. It potentially leads to attempts to create a file size of petabytes. Set the data_end field with the original file size if the image was opened for checking and repairing purposes or raise an error. v2: No changes. Changelog should be below --- In that case it will not be merged. There are a lot of typos/mistakes inside, I'd better use the comment below. "data_end field in BDRVParallelsState is set to the biggest offset present in BAT. If this offset is outside of the image, any further write will create the cluster at this offset and/or the image will be truncated to this offset on close. This is definitely not correct and should be fixed." With this change: Reviewed-by: Denis V. Lunev Signed-off-by: Alexander Ivanov --- block/parallels.c | 17 + 1 file changed, 17 insertions(+) diff --git a/block/parallels.c b/block/parallels.c index a229c06f25..a76cf9d993 100644 --- a/block/parallels.c +++ b/block/parallels.c @@ -732,6 +732,7 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags, BDRVParallelsState *s = bs->opaque; ParallelsHeader ph; int ret, size, i; +int64_t file_size; QemuOpts *opts = NULL; Error *local_err = NULL; char *buf; @@ -811,6 +812,22 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags, } } +file_size = bdrv_getlength(bs->file->bs); +if (file_size < 0) { +goto fail; +} + +file_size >>= BDRV_SECTOR_BITS; +if (s->data_end > file_size) { +if (flags & BDRV_O_CHECK) { +s->data_end = file_size; +} else { +error_setg(errp, "parallels: Offset in BAT is out of image"); +ret = -EINVAL; +goto fail; +} +} + if (le32_to_cpu(ph.inuse) == HEADER_INUSE_MAGIC) { /* Image was not closed correctly. The check is mandatory */ s->header_unclean = true;
Re: [PATCH 2/2] target/riscv: fence: reconcile with specification
Happy to lower it back into the decode file. However, I initially pulled it up into the trans-function to more closely match the ISA specification: there is only one FENCE instruction with 3 arguments (FM, PRED, and SUCC). One might argue that the decode table for "RV32I Base Instruction Set" in the specification lists FENCE.TSO as a separate instruction, but the normative text doesn't (and FENCE overlaps FENCE.TSO in the tabular representation) — so I would consider the table as informative. I'll wait until we see what consensus emerges from the discussion. Philipp. On Fri, 12 Aug 2022 at 15:21, Peter Maydell wrote: > > On Fri, 12 Aug 2022 at 14:17, Philipp Tomsich > wrote: > > > > Our decoding of fence-instructions is problematic in respect to the > > RISC-V ISA specification: > > - rs and rd are ignored, but need to be 0 > > - fm is ignored > > > > This change adjusts the decode pattern to enfore rs and rd being 0, > > and validates the fm-field (together with pred/succ for FENCE.TSO) to > > determine whether a reserved instruction is specified. > > > > While the specification allows UNSPECIFIED behaviour for reserved > > instructions, we now always raise an illegal instruction exception. > > > > Signed-off-by: Philipp Tomsich > > > > --- > > > > target/riscv/insn32.decode | 2 +- > > target/riscv/insn_trans/trans_rvi.c.inc | 19 ++- > > 2 files changed, 19 insertions(+), 2 deletions(-) > > > > diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode > > index 089128c3dc..4e53df1b62 100644 > > --- a/target/riscv/insn32.decode > > +++ b/target/riscv/insn32.decode > > @@ -150,7 +150,7 @@ srl 000 .. 101 . 0110011 @r > > sra 010 .. 101 . 0110011 @r > > or 000 .. 110 . 0110011 @r > > and 000 .. 111 . 0110011 @r > > -fence pred:4 succ:4 - 000 - 000 > > +fencefm:4 pred:4 succ:4 0 000 0 000 > > fence_i 0 001 0 000 > > csrrw . 001 . 1110011 @csr > > csrrs . 010 . 1110011 @csr > > diff --git a/target/riscv/insn_trans/trans_rvi.c.inc > > b/target/riscv/insn_trans/trans_rvi.c.inc > > index ca8e3d1ea1..515bb3b22a 100644 > > --- a/target/riscv/insn_trans/trans_rvi.c.inc > > +++ b/target/riscv/insn_trans/trans_rvi.c.inc > > @@ -795,7 +795,24 @@ static bool trans_srad(DisasContext *ctx, arg_srad *a) > > > > static bool trans_fence(DisasContext *ctx, arg_fence *a) > > { > > -/* FENCE is a full memory barrier. */ > > +switch (a->fm) { > > +case 0b: > > + /* normal fence */ > > + break; > > + > > +case 0b0001: > > + /* FENCE.TSO requires PRED and SUCC to be RW */ > > + if (a->pred != 0xb0011 || a->succ != 0b0011) { > > +return false; > > + } > > + break; > > + > > +default: > > +/* reserved for future use */ > > +return false; > > +} > > I think it would be neater to do this decode in the > .decode file, rather than by hand in the trans function. > > thanks > -- PMM
Re: [PATCH 1/2] target/riscv: fence.i: update decode pattern
On Fri, 12 Aug 2022 at 15:11, Philipp Tomsich wrote: > > On Fri, 12 Aug 2022 at 16:01, Andrew Jones wrote: > > > > > Update the decode pattern to reflect the specification. > > > > I got hung-up on this for a bit since there isn't any "must-be-0" fields, > > Please refer to '“Zifencei” Instruction-Fetch Fence, Version 2.0' in > the specification. > The encoding diagram clearly states 0 for imm[11:0], 0 for rs1 and 0 for rd. > > However, there is an explanatory paragraph below (unfortunately, it is > not clear whether this is normative or informative): > > The unused fields in the FENCE.I instruction, imm[11:0], rs1, and rd, are > > reserved for finer-grain fences in future extensions. For forward > > compatibility, base implementations shall ignore these fields, and standard > > software shall zero these fields. That's pretty clear that this patch is wrong, then -- QEMU is an implementation, and so we must ignore these fields. Otherwise when a future version of the spec defines a finer-grain fence instruction in this part of the encoding space, older QEMU will incorrectly make software that uses it crash. If you think the spec is insufficiently clear about whether that is normative then that would be something to raise with the spec authors, preferably before anybody builds hardware that enforces must-be-zeroes on these fields... thanks -- PMM
Re: [PATCH v2 2/8] parallels: Move BAT entry setting to a separate function
On 11.08.2022 17:00, Alexander Ivanov wrote: Will need to set BAT entry in multiple places. Move the code of settings entries and marking relevant blocks dirty to a separate helper parallels_set_bat_entry. The comment and the patch text is ambiguous. You say that we need to set BAT in multiple places but patch changes one place only. I think that it would be better to say like the following: "parallels: create parallels_set_bat_entry_helper() to assign BAT value This helper will be reused in next patches during parallels_co_check rework to simplify its code" v2: A new patch - a part of a splitted patch. Same note about version tracking With above taken into account: Reviewed-by: Denis V. Lunev Signed-off-by: Alexander Ivanov --- block/parallels.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/block/parallels.c b/block/parallels.c index a76cf9d993..7f68f3cbc9 100644 --- a/block/parallels.c +++ b/block/parallels.c @@ -165,6 +165,13 @@ static int64_t block_status(BDRVParallelsState *s, int64_t sector_num, return start_off; } +static void parallels_set_bat_entry(BDRVParallelsState *s, +uint32_t index, uint32_t offset) +{ +s->bat_bitmap[index] = offset; +bitmap_set(s->bat_dirty_bmap, bat_entry_off(index) / s->bat_dirty_block, 1); +} + static int64_t allocate_clusters(BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum) { @@ -250,10 +257,9 @@ static int64_t allocate_clusters(BlockDriverState *bs, int64_t sector_num, } for (i = 0; i < to_allocate; i++) { -s->bat_bitmap[idx + i] = cpu_to_le32(s->data_end / s->off_multiplier); +parallels_set_bat_entry(s, idx + i, +cpu_to_le32(s->data_end / s->off_multiplier)); s->data_end += s->tracks; -bitmap_set(s->bat_dirty_bmap, - bat_entry_off(idx + i) / s->bat_dirty_block, 1); } return bat2sect(s, idx) + sector_num % s->tracks;
Re: [PATCH v2 3/8] parallels: Replace bdrv_co_pwrite_sync by bdrv_co_flush for BAT flushing
Use generic infrastructure for BAT writing in parallels_co_check() On 11.08.2022 17:00, Alexander Ivanov wrote: It's too costly to write all the BAT to the disk. Let the flush function write only dirty blocks. Use parallels_set_bat_entry for setting a BAT entry and marking a relevant block as dirty. Move bdrv_co_flush call outside the locked area. This is not correct too: - with a lot of cases inside parallels_co_check, which will be split to smaller functions, we would like to avoid writing BAT once each stage is complete Thus the comment should look like the following: "BAT is written in the context of conventional operations over the image inside bdrv_co_flush() when it calls parallels_co_flush_to_os() callback. Thus we should not modify BAT array directly, but call parallels_set_bat_entry() helper and bdrv_co_flush() further on. After that there is no need to manually write BAT and track its modification. This makes code more generic and allows to split parallels_set_bat_entry() for independent pieces." v2: Patch order was changed so the replacement is done in parallels_co_check. Now we use a helper to set BAT entry and mark the block dirty. Same note about changelog as n other patches. Side note. I like Linux kernel a lot and there is a good guide in. Please look how commit message https://www.kernel.org/doc/html/latest/process/submitting-patches.html May be you could spend some more time on commit message. With a better message: Reviewed-by: Denis V. Lunev Signed-off-by: Alexander Ivanov --- block/parallels.c | 19 +++ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/block/parallels.c b/block/parallels.c index 7f68f3cbc9..6879ea4597 100644 --- a/block/parallels.c +++ b/block/parallels.c @@ -428,7 +428,6 @@ static int coroutine_fn parallels_co_check(BlockDriverState *bs, int64_t size, prev_off, high_off; int ret; uint32_t i; -bool flush_bat = false; size = bdrv_getlength(bs->file->bs); if (size < 0) { @@ -467,9 +466,8 @@ static int coroutine_fn parallels_co_check(BlockDriverState *bs, res->corruptions++; if (fix & BDRV_FIX_ERRORS) { prev_off = 0; -s->bat_bitmap[i] = 0; +parallels_set_bat_entry(s, i, 0); res->corruptions_fixed++; -flush_bat = true; continue; } } @@ -485,15 +483,6 @@ static int coroutine_fn parallels_co_check(BlockDriverState *bs, prev_off = off; } -ret = 0; -if (flush_bat) { -ret = bdrv_co_pwrite_sync(bs->file, 0, s->header_size, s->header, 0); -if (ret < 0) { -res->check_errors++; -goto out; -} -} - res->image_end_offset = high_off + s->cluster_size; if (size > res->image_end_offset) { int64_t count; @@ -522,6 +511,12 @@ static int coroutine_fn parallels_co_check(BlockDriverState *bs, out: qemu_co_mutex_unlock(&s->lock); + +ret = bdrv_co_flush(bs); +if (ret < 0) { +res->check_errors++; +} + return ret; }
Re: [PATCH for-7.2 v4 02/11] ppc/pnv: add phb-id/chip-id PnvPHB4RootBus properties
On 11/08/2022 18:39, Daniel Henrique Barboza wrote: The same rationale provided in the PHB3 bus case applies here. Note: we could have merged both buses in a single object, like we did with the root ports, and spare some boilerplate. The reason we opted to preserve both buses objects is twofold: - there's not user side advantage in doing so. Unifying the root ports presents a clear user QOL change when we enable user created devices back. The buses objects, aside from having a different QOM name, is transparent to the user; - we leave a door opened in case we want to increase the root port limit for phb4/5 later on without having to deal with phb3 code. Reviewed-by: Cédric Le Goater Signed-off-by: Daniel Henrique Barboza --- Reviewed-by: Frederic Barrat Fred hw/pci-host/pnv_phb4.c | 51 ++ include/hw/pci-host/pnv_phb4.h | 10 +++ 2 files changed, 61 insertions(+) diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c index b98c394713..824e1a73fb 100644 --- a/hw/pci-host/pnv_phb4.c +++ b/hw/pci-host/pnv_phb4.c @@ -1551,6 +1551,12 @@ void pnv_phb4_bus_init(DeviceState *dev, PnvPHB4 *phb) pnv_phb4_set_irq, pnv_phb4_map_irq, phb, &phb->pci_mmio, &phb->pci_io, 0, 4, TYPE_PNV_PHB4_ROOT_BUS); + +object_property_set_int(OBJECT(pci->bus), "phb-id", phb->phb_id, +&error_abort); +object_property_set_int(OBJECT(pci->bus), "chip-id", phb->chip_id, +&error_abort); + pci_setup_iommu(pci->bus, pnv_phb4_dma_iommu, phb); pci->bus->flags |= PCI_BUS_EXTENDED_CONFIG_SPACE; } @@ -1708,10 +1714,55 @@ static const TypeInfo pnv_phb5_type_info = { .instance_size = sizeof(PnvPHB4), }; +static void pnv_phb4_root_bus_get_prop(Object *obj, Visitor *v, + const char *name, + void *opaque, Error **errp) +{ +PnvPHB4RootBus *bus = PNV_PHB4_ROOT_BUS(obj); +uint64_t value = 0; + +if (strcmp(name, "phb-id") == 0) { +value = bus->phb_id; +} else { +value = bus->chip_id; +} + +visit_type_size(v, name, &value, errp); +} + +static void pnv_phb4_root_bus_set_prop(Object *obj, Visitor *v, + const char *name, + void *opaque, Error **errp) + +{ +PnvPHB4RootBus *bus = PNV_PHB4_ROOT_BUS(obj); +uint64_t value; + +if (!visit_type_size(v, name, &value, errp)) { +return; +} + +if (strcmp(name, "phb-id") == 0) { +bus->phb_id = value; +} else { +bus->chip_id = value; +} +} + static void pnv_phb4_root_bus_class_init(ObjectClass *klass, void *data) { BusClass *k = BUS_CLASS(klass); +object_class_property_add(klass, "phb-id", "int", + pnv_phb4_root_bus_get_prop, + pnv_phb4_root_bus_set_prop, + NULL, NULL); + +object_class_property_add(klass, "chip-id", "int", + pnv_phb4_root_bus_get_prop, + pnv_phb4_root_bus_set_prop, + NULL, NULL); + /* * PHB4 has only a single root complex. Enforce the limit on the * parent bus diff --git a/include/hw/pci-host/pnv_phb4.h b/include/hw/pci-host/pnv_phb4.h index 20aa4819d3..50d4faa001 100644 --- a/include/hw/pci-host/pnv_phb4.h +++ b/include/hw/pci-host/pnv_phb4.h @@ -45,7 +45,17 @@ typedef struct PnvPhb4DMASpace { QLIST_ENTRY(PnvPhb4DMASpace) list; } PnvPhb4DMASpace; +/* + * PHB4 PCIe Root Bus + */ #define TYPE_PNV_PHB4_ROOT_BUS "pnv-phb4-root" +struct PnvPHB4RootBus { +PCIBus parent; + +uint32_t chip_id; +uint32_t phb_id; +}; +OBJECT_DECLARE_SIMPLE_TYPE(PnvPHB4RootBus, PNV_PHB4_ROOT_BUS) /* * PHB4 PCIe Host Bridge for PowerNV machines (POWER9)
Re: [PATCH for-7.2 v4 03/11] ppc/pnv: set root port chassis and slot using Bus properties
On 11/08/2022 18:39, Daniel Henrique Barboza wrote: For default root ports we have a way of accessing chassis and slot, before root_port_realize(), via pnv_phb_attach_root_port(). For the future user created root ports this won't be the case: we can't use this helper because we don't have access to the PHB phb-id/chip-id values. In earlier patches we've added phb-id and chip-id to pnv-phb-root-bus objects. We're now able to use the bus to retrieve them. The bus is reachable for both user created and default devices, so we're changing all the code paths. This also allow us to validate these changes with the existing default devices. Reviewed-by: Cédric Le Goater Signed-off-by: Daniel Henrique Barboza --- Reviewed-by: Frederic Barrat Fred hw/pci-host/pnv_phb.c | 25 - 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c index c47ed92462..826c0c144e 100644 --- a/hw/pci-host/pnv_phb.c +++ b/hw/pci-host/pnv_phb.c @@ -25,21 +25,19 @@ * QOM id. 'chip_id' is going to be used as PCIE chassis for the * root port. */ -static void pnv_phb_attach_root_port(PCIHostState *pci, int index, int chip_id) +static void pnv_phb_attach_root_port(PCIHostState *pci) { PCIDevice *root = pci_new(PCI_DEVFN(0, 0), TYPE_PNV_PHB_ROOT_PORT); -g_autofree char *default_id = g_strdup_printf("%s[%d]", - TYPE_PNV_PHB_ROOT_PORT, - index); const char *dev_id = DEVICE(root)->id; +g_autofree char *default_id = NULL; +int index; + +index = object_property_get_int(OBJECT(pci->bus), "phb-id", &error_fatal); +default_id = g_strdup_printf("%s[%d]", TYPE_PNV_PHB_ROOT_PORT, index); object_property_add_child(OBJECT(pci->bus), dev_id ? dev_id : default_id, OBJECT(root)); -/* Set unique chassis/slot values for the root port */ -qdev_prop_set_uint8(DEVICE(root), "chassis", chip_id); -qdev_prop_set_uint16(DEVICE(root), "slot", index); - pci_realize_and_unref(root, pci->bus, &error_fatal); } @@ -93,7 +91,7 @@ static void pnv_phb_realize(DeviceState *dev, Error **errp) pnv_phb4_bus_init(dev, PNV_PHB4(phb->backend)); } -pnv_phb_attach_root_port(pci, phb->phb_id, phb->chip_id); +pnv_phb_attach_root_port(pci); } static const char *pnv_phb_root_bus_path(PCIHostState *host_bridge, @@ -162,9 +160,18 @@ static void pnv_phb_root_port_realize(DeviceState *dev, Error **errp) { PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(dev); PnvPHBRootPort *phb_rp = PNV_PHB_ROOT_PORT(dev); +PCIBus *bus = PCI_BUS(qdev_get_parent_bus(dev)); PCIDevice *pci = PCI_DEVICE(dev); uint16_t device_id = 0; Error *local_err = NULL; +int chip_id, index; + +chip_id = object_property_get_int(OBJECT(bus), "chip-id", &error_fatal); +index = object_property_get_int(OBJECT(bus), "phb-id", &error_fatal); + +/* Set unique chassis/slot values for the root port */ +qdev_prop_set_uint8(dev, "chassis", chip_id); +qdev_prop_set_uint16(dev, "slot", index); rpc->parent_realize(dev, &local_err); if (local_err) {
Re: [PATCH v2 4/8] parallels: Move check of unclean image to a separate function
On 11.08.2022 17:00, Alexander Ivanov wrote: v2: Revert the condition with s->header_unclean. same comment about change log as previously And commit message misses motivation part, why we are doing this rework. What is the goal of this change? The code part is clean. Signed-off-by: Alexander Ivanov --- block/parallels.c | 31 +-- 1 file changed, 21 insertions(+), 10 deletions(-) diff --git a/block/parallels.c b/block/parallels.c index 6879ea4597..c53b2810cf 100644 --- a/block/parallels.c +++ b/block/parallels.c @@ -419,6 +419,25 @@ static coroutine_fn int parallels_co_readv(BlockDriverState *bs, return ret; } +static void parallels_check_unclean(BlockDriverState *bs, +BdrvCheckResult *res, +BdrvCheckMode fix) +{ +BDRVParallelsState *s = bs->opaque; + +if (!s->header_unclean) { +return; +} + +fprintf(stderr, "%s image was not closed correctly\n", +fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR"); +res->corruptions++; +if (fix & BDRV_FIX_ERRORS) { +/* parallels_close will do the job right */ +res->corruptions_fixed++; +s->header_unclean = false; +} +} static int coroutine_fn parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, @@ -436,16 +455,8 @@ static int coroutine_fn parallels_co_check(BlockDriverState *bs, } qemu_co_mutex_lock(&s->lock); -if (s->header_unclean) { -fprintf(stderr, "%s image was not closed correctly\n", -fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR"); -res->corruptions++; -if (fix & BDRV_FIX_ERRORS) { -/* parallels_close will do the job right */ -res->corruptions_fixed++; -s->header_unclean = false; -} -} + +parallels_check_unclean(bs, res, fix); res->bfi.total_clusters = s->bat_size; res->bfi.compressed_clusters = 0; /* compression is not supported */
Re: [PATCH for-7.2 v4 01/11] ppc/pnv: add phb-id/chip-id PnvPHB3RootBus properties
On 11/08/2022 18:39, Daniel Henrique Barboza wrote: We rely on the phb-id and chip-id, which are PHB properties, to assign chassis and slot to the root port. For default devices this is no big deal: the root port is being created under pnv_phb_realize() and the values are being passed on via the 'index' and 'chip-id' of the pnv_phb_attach_root_port() helper. If we want to implement user created root ports we have a problem. The user created root port will not be aware of which PHB it belongs to, unless we're willing to violate QOM best practices and access the PHB via dev->parent_bus->parent. What we can do is to access the root bus parent bus. Since we're already assigning the root port as QOM child of the bus, and the bus is initiated using PHB properties, let's add phb-id and chip-id as properties of the bus. This will allow us trivial access to them, for both user-created and default root ports, without doing anything too shady with QOM. Reviewed-by: Cédric Le Goater Signed-off-by: Daniel Henrique Barboza --- Reviewed-by: Frederic Barrat Fred hw/pci-host/pnv_phb3.c | 50 ++ include/hw/pci-host/pnv_phb3.h | 9 +- 2 files changed, 58 insertions(+), 1 deletion(-) diff --git a/hw/pci-host/pnv_phb3.c b/hw/pci-host/pnv_phb3.c index d4c04a281a..af8575c007 100644 --- a/hw/pci-host/pnv_phb3.c +++ b/hw/pci-host/pnv_phb3.c @@ -1006,6 +1006,11 @@ void pnv_phb3_bus_init(DeviceState *dev, PnvPHB3 *phb) &phb->pci_mmio, &phb->pci_io, 0, 4, TYPE_PNV_PHB3_ROOT_BUS); +object_property_set_int(OBJECT(pci->bus), "phb-id", phb->phb_id, +&error_abort); +object_property_set_int(OBJECT(pci->bus), "chip-id", phb->chip_id, +&error_abort); + pci_setup_iommu(pci->bus, pnv_phb3_dma_iommu, phb); } @@ -1105,10 +1110,55 @@ static const TypeInfo pnv_phb3_type_info = { .instance_init = pnv_phb3_instance_init, }; +static void pnv_phb3_root_bus_get_prop(Object *obj, Visitor *v, + const char *name, + void *opaque, Error **errp) +{ +PnvPHB3RootBus *bus = PNV_PHB3_ROOT_BUS(obj); +uint64_t value = 0; + +if (strcmp(name, "phb-id") == 0) { +value = bus->phb_id; +} else { +value = bus->chip_id; +} + +visit_type_size(v, name, &value, errp); +} + +static void pnv_phb3_root_bus_set_prop(Object *obj, Visitor *v, + const char *name, + void *opaque, Error **errp) + +{ +PnvPHB3RootBus *bus = PNV_PHB3_ROOT_BUS(obj); +uint64_t value; + +if (!visit_type_size(v, name, &value, errp)) { +return; +} + +if (strcmp(name, "phb-id") == 0) { +bus->phb_id = value; +} else { +bus->chip_id = value; +} +} + static void pnv_phb3_root_bus_class_init(ObjectClass *klass, void *data) { BusClass *k = BUS_CLASS(klass); +object_class_property_add(klass, "phb-id", "int", + pnv_phb3_root_bus_get_prop, + pnv_phb3_root_bus_set_prop, + NULL, NULL); + +object_class_property_add(klass, "chip-id", "int", + pnv_phb3_root_bus_get_prop, + pnv_phb3_root_bus_set_prop, + NULL, NULL); + /* * PHB3 has only a single root complex. Enforce the limit on the * parent bus diff --git a/include/hw/pci-host/pnv_phb3.h b/include/hw/pci-host/pnv_phb3.h index bff69201d9..4854f6d2f6 100644 --- a/include/hw/pci-host/pnv_phb3.h +++ b/include/hw/pci-host/pnv_phb3.h @@ -104,9 +104,16 @@ struct PnvPBCQState { }; /* - * PHB3 PCIe Root port + * PHB3 PCIe Root Bus */ #define TYPE_PNV_PHB3_ROOT_BUS "pnv-phb3-root" +struct PnvPHB3RootBus { +PCIBus parent; + +uint32_t chip_id; +uint32_t phb_id; +}; +OBJECT_DECLARE_SIMPLE_TYPE(PnvPHB3RootBus, PNV_PHB3_ROOT_BUS) /* * PHB3 PCIe Host Bridge for PowerNV machines (POWER8)
Re: [PATCH v2 5/8] parallels: Move check of cluster outside image to a separate function
On 11.08.2022 17:00, Alexander Ivanov wrote: v2: Move unrelated helper parallels_set_bat_entry creation to a separate patch. same notes as for previous patch Signed-off-by: Alexander Ivanov --- block/parallels.c | 48 ++- 1 file changed, 35 insertions(+), 13 deletions(-) diff --git a/block/parallels.c b/block/parallels.c index c53b2810cf..12104ba5ad 100644 --- a/block/parallels.c +++ b/block/parallels.c @@ -439,6 +439,36 @@ static void parallels_check_unclean(BlockDriverState *bs, } } +static int parallels_check_outside_image(BlockDriverState *bs, + BdrvCheckResult *res, + BdrvCheckMode fix) +{ +BDRVParallelsState *s = bs->opaque; +uint32_t i; +int64_t off, size; + +size = bdrv_getlength(bs->file->bs); +if (size < 0) { +res->check_errors++; +return size; +} + +for (i = 0; i < s->bat_size; i++) { +off = bat2sect(s, i) << BDRV_SECTOR_BITS; +if (off > size) { +fprintf(stderr, "%s cluster %u is outside image\n", +fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i); +res->corruptions++; +if (fix & BDRV_FIX_ERRORS) { +parallels_set_bat_entry(s, i, 0); +res->corruptions_fixed++; +} +} +} + +return 0; +} + static int coroutine_fn parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, BdrvCheckMode fix) @@ -458,6 +488,11 @@ static int coroutine_fn parallels_co_check(BlockDriverState *bs, parallels_check_unclean(bs, res, fix); +ret = parallels_check_outside_image(bs, res, fix); +if (ret < 0) { +goto out; +} + res->bfi.total_clusters = s->bat_size; res->bfi.compressed_clusters = 0; /* compression is not supported */ @@ -470,19 +505,6 @@ static int coroutine_fn parallels_co_check(BlockDriverState *bs, continue; } -/* cluster outside the image */ -if (off > size) { -fprintf(stderr, "%s cluster %u is outside image\n", -fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i); -res->corruptions++; -if (fix & BDRV_FIX_ERRORS) { -prev_off = 0; -parallels_set_bat_entry(s, i, 0); -res->corruptions_fixed++; -continue; -} -} - res->bfi.allocated_clusters++; if (off > high_off) { high_off = off;
Re: [PULL 02/28] target/arm: Add coproc parameter to syn_fp_access_trap
On Fri, 10 Jun 2022 at 17:07, Peter Maydell wrote: > > From: Richard Henderson > > With ARMv8, this field is always RES0. > With ARMv7, targeting EL2 and TA=0, it is always 0xA. I was just looking at this change again because we still have the loose end of syn_simd_access_trap() not being used, and I realized that the claim in this commit message and the comment isn't right. The "RES0 or fill in TA/copro fields" test is not v8 vs v7, but "are we reporting this syndrome to AArch64 in ESR_ELx or to AArch32 in HSR?". I filed https://gitlab.com/qemu-project/qemu/-/issues/1153 to make a note of this since we might not get around to fixing this for a while, given it's not very important. thanks -- PMM
Re: [PATCH v2 6/8] parallels: Move check of leaks to a separate function
On 11.08.2022 17:00, Alexander Ivanov wrote: v2: No changes. same notes about motivation, changelog as before Signed-off-by: Alexander Ivanov --- block/parallels.c | 85 +-- 1 file changed, 52 insertions(+), 33 deletions(-) diff --git a/block/parallels.c b/block/parallels.c index 12104ba5ad..8737eadfb4 100644 --- a/block/parallels.c +++ b/block/parallels.c @@ -469,14 +469,13 @@ static int parallels_check_outside_image(BlockDriverState *bs, return 0; } -static int coroutine_fn parallels_co_check(BlockDriverState *bs, - BdrvCheckResult *res, - BdrvCheckMode fix) +static int parallels_check_leak(BlockDriverState *bs, +BdrvCheckResult *res, +BdrvCheckMode fix) { BDRVParallelsState *s = bs->opaque; -int64_t size, prev_off, high_off; -int ret; -uint32_t i; +int64_t size, off, high_off, count; +int i, ret; size = bdrv_getlength(bs->file->bs); if (size < 0) { @@ -484,41 +483,16 @@ static int coroutine_fn parallels_co_check(BlockDriverState *bs, return size; } -qemu_co_mutex_lock(&s->lock); - -parallels_check_unclean(bs, res, fix); - -ret = parallels_check_outside_image(bs, res, fix); -if (ret < 0) { -goto out; -} - -res->bfi.total_clusters = s->bat_size; -res->bfi.compressed_clusters = 0; /* compression is not supported */ - high_off = 0; -prev_off = 0; for (i = 0; i < s->bat_size; i++) { -int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS; -if (off == 0) { -prev_off = 0; -continue; -} - -res->bfi.allocated_clusters++; +off = bat2sect(s, i) << BDRV_SECTOR_BITS; if (off > high_off) { high_off = off; } - -if (prev_off != 0 && (prev_off + s->cluster_size) != off) { -res->bfi.fragmented_clusters++; -} -prev_off = off; } res->image_end_offset = high_off + s->cluster_size; if (size > res->image_end_offset) { -int64_t count; count = DIV_ROUND_UP(size - res->image_end_offset, s->cluster_size); fprintf(stderr, "%s space leaked at the end of the image %" PRId64 "\n", fix & BDRV_FIX_LEAKS ? "Repairing" : "ERROR", @@ -536,11 +510,56 @@ static int coroutine_fn parallels_co_check(BlockDriverState *bs, if (ret < 0) { error_report_err(local_err); res->check_errors++; -goto out; +return ret; } res->leaks_fixed += count; } } +return 0; +} + +static int coroutine_fn parallels_co_check(BlockDriverState *bs, + BdrvCheckResult *res, + BdrvCheckMode fix) +{ +BDRVParallelsState *s = bs->opaque; +int64_t prev_off; +int ret; +uint32_t i; + + please remove extra empty line. This looks like an accident +qemu_co_mutex_lock(&s->lock); + +parallels_check_unclean(bs, res, fix); + +ret = parallels_check_outside_image(bs, res, fix); +if (ret < 0) { +goto out; +} + +ret = parallels_check_leak(bs, res, fix); +if (ret < 0) { +goto out; +} + +res->bfi.total_clusters = s->bat_size; +res->bfi.compressed_clusters = 0; /* compression is not supported */ + +prev_off = 0; +for (i = 0; i < s->bat_size; i++) { +int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS; +if (off == 0) { +prev_off = 0; +continue; +} + +res->bfi.allocated_clusters++; + +if (prev_off != 0 && (prev_off + s->cluster_size) != off) { +res->bfi.fragmented_clusters++; +} +prev_off = off; +} out: qemu_co_mutex_unlock(&s->lock);
Re: [PATCH v2 7/8] parallels: Move statistic collection to a separate function
On 11.08.2022 17:00, Alexander Ivanov wrote: v2: Move fragmentation counting code to this function too. same note here about ChnageLog and motivation Signed-off-by: Alexander Ivanov --- block/parallels.c | 54 +++ 1 file changed, 31 insertions(+), 23 deletions(-) diff --git a/block/parallels.c b/block/parallels.c index 8737eadfb4..d0364182bb 100644 --- a/block/parallels.c +++ b/block/parallels.c @@ -518,48 +518,56 @@ static int parallels_check_leak(BlockDriverState *bs, return 0; } -static int coroutine_fn parallels_co_check(BlockDriverState *bs, - BdrvCheckResult *res, - BdrvCheckMode fix) +static void parallels_collect_statistics(BlockDriverState *bs, + BdrvCheckResult *res, + BdrvCheckMode fix) { BDRVParallelsState *s = bs->opaque; -int64_t prev_off; -int ret; +int64_t off, prev_off; uint32_t i; - -qemu_co_mutex_lock(&s->lock); - -parallels_check_unclean(bs, res, fix); - -ret = parallels_check_outside_image(bs, res, fix); -if (ret < 0) { -goto out; -} - -ret = parallels_check_leak(bs, res, fix); -if (ret < 0) { -goto out; -} - res->bfi.total_clusters = s->bat_size; res->bfi.compressed_clusters = 0; /* compression is not supported */ prev_off = 0; for (i = 0; i < s->bat_size; i++) { -int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS; +off = bat2sect(s, i) << BDRV_SECTOR_BITS; if (off == 0) { prev_off = 0; continue; } -res->bfi.allocated_clusters++; - if (prev_off != 0 && (prev_off + s->cluster_size) != off) { res->bfi.fragmented_clusters++; } + prev_off = off; +res->bfi.allocated_clusters++; } +} + +static int coroutine_fn parallels_co_check(BlockDriverState *bs, + BdrvCheckResult *res, + BdrvCheckMode fix) +{ +BDRVParallelsState *s = bs->opaque; +int ret; + +qemu_co_mutex_lock(&s->lock); + +parallels_check_unclean(bs, res, fix); + +ret = parallels_check_outside_image(bs, res, fix); +if (ret < 0) { +goto out; +} + +ret = parallels_check_leak(bs, res, fix); +if (ret < 0) { +goto out; +} + +parallels_collect_statistics(bs, res, fix); out: qemu_co_mutex_unlock(&s->lock);
Re: [PATCH for-7.2 v4 04/11] ppc/pnv: add helpers for pnv-phb user devices
On 11/08/2022 18:39, Daniel Henrique Barboza wrote: pnv_parent_qom_fixup() and pnv_parent_bus_fixup() are versions of the helpers that were reverted by commit 9c10d86fee "ppc/pnv: Remove user-created PHB{3,4,5} devices". They are needed to amend the QOM and bus hierarchies of user created pnv-phbs, matching them with default pnv-phbs. A new helper pnv_phb_user_device_init() is created to handle user-created devices setup. We're going to call it inside pnv_phb_realize() in case we're realizing an user created device. This will centralize all user device realated in a single spot, leaving the realize functions of the phb3/phb4 backends untouched. Another helper called pnv_chip_add_phb() was added to handle the particularities of each chip version when adding a new PHB. Signed-off-by: Daniel Henrique Barboza --- Just a minor typo in a comment below. Reviewed-by: Frederic Barrat diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c index f9e5a3d248..2deaac17f7 100644 --- a/hw/ppc/pnv.c +++ b/hw/ppc/pnv.c @@ -281,6 +281,26 @@ static void pnv_dt_icp(PnvChip *chip, void *fdt, uint32_t pir, g_free(reg); } +/* + * Adds a PnvPHB to the chip. Returns the parent obj of the + * PHB which varies with each version (phb version 3 is parented + * by the chip, version 4 and 4 are parented by the PEC typo-^ Fred + * device). + * + * TODO: for version 3 we're still parenting the PHB with the + * chip. We should parent with a (so far not implemented) + * PHB3 PEC device. + */ +Object *pnv_chip_add_phb(PnvChip *chip, PnvPHB *phb, Error **errp) +{ +if (phb->version == 3) { +return OBJECT(chip); +} else { +/* phb4 support will be added later */ +return NULL; +} +} + static void pnv_chip_power8_dt_populate(PnvChip *chip, void *fdt) { static const char compat[] = "ibm,power8-xscom\0ibm,xscom"; diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h index 033d907287..781d0acffa 100644 --- a/include/hw/ppc/pnv.h +++ b/include/hw/ppc/pnv.h @@ -231,6 +231,7 @@ struct PnvMachineState { }; PnvChip *pnv_get_chip(PnvMachineState *pnv, uint32_t chip_id); +Object *pnv_chip_add_phb(PnvChip *chip, PnvPHB *phb, Error **errp); #define PNV_FDT_ADDR 0x0100 #define PNV_TIMEBASE_FREQ 51200ULL
Re: [PATCH for-7.2 v4 05/11] ppc/pnv: turn chip8->phbs[] into a PnvPHB* array
On 11/08/2022 18:39, Daniel Henrique Barboza wrote: When enabling user created PHBs (a change reverted by commit 9c10d86fee) we were handling PHBs created by default versus by the user in different manners. The only difference between these PHBs is that one will have a valid phb3->chip that is assigned during pnv_chip_power8_realize(), while the user created needs to search which chip it belongs to. Aside from that there shouldn't be any difference. Making the default PHBs behave in line with the user created ones will make it easier to re-introduce them later on. It will also make the code easier to follow since we are dealing with them in equal manner. The first step is to turn chip8->phbs[] into a PnvPHB3 pointer array. This will allow us to assign user created PHBs into it later on. The way we initilize the default case is now more in line with that would happen with the user created case: the object is created, parented by the chip because pnv_xscom_dt() relies on it, and then assigned to the array. Signed-off-by: Daniel Henrique Barboza --- Reviewed-by: Frederic Barrat Fred hw/ppc/pnv.c | 27 ++- include/hw/ppc/pnv.h | 6 +- 2 files changed, 27 insertions(+), 6 deletions(-) diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c index 2deaac17f7..e53e9e297d 100644 --- a/hw/ppc/pnv.c +++ b/hw/ppc/pnv.c @@ -294,6 +294,13 @@ static void pnv_dt_icp(PnvChip *chip, void *fdt, uint32_t pir, Object *pnv_chip_add_phb(PnvChip *chip, PnvPHB *phb, Error **errp) { if (phb->version == 3) { +Pnv8Chip *chip8 = PNV8_CHIP(chip); + +phb->chip = chip; + +chip8->phbs[chip8->num_phbs] = phb; +chip8->num_phbs++; + return OBJECT(chip); } else { /* phb4 support will be added later */ @@ -681,7 +688,7 @@ static void pnv_chip_power8_pic_print_info(PnvChip *chip, Monitor *mon) ics_pic_print_info(&chip8->psi.ics, mon); for (i = 0; i < chip8->num_phbs; i++) { -PnvPHB *phb = &chip8->phbs[i]; +PnvPHB *phb = chip8->phbs[i]; PnvPHB3 *phb3 = PNV_PHB3(phb->backend); pnv_phb3_msi_pic_print_info(&phb3->msis, mon); @@ -1174,7 +1181,17 @@ static void pnv_chip_power8_instance_init(Object *obj) chip8->num_phbs = pcc->num_phbs; for (i = 0; i < chip8->num_phbs; i++) { -object_initialize_child(obj, "phb[*]", &chip8->phbs[i], TYPE_PNV_PHB); +Object *phb = object_new(TYPE_PNV_PHB); + +/* + * We need the chip to parent the PHB to allow the DT + * to build correctly (via pnv_xscom_dt()). + * + * TODO: the PHB should be parented by a PEC device that, at + * this moment, is not modelled powernv8/phb3. + */ +object_property_add_child(obj, "phb[*]", phb); +chip8->phbs[i] = PNV_PHB(phb); } } @@ -1290,7 +1307,7 @@ static void pnv_chip_power8_realize(DeviceState *dev, Error **errp) /* PHB controllers */ for (i = 0; i < chip8->num_phbs; i++) { -PnvPHB *phb = &chip8->phbs[i]; +PnvPHB *phb = chip8->phbs[i]; object_property_set_int(OBJECT(phb), "index", i, &error_fatal); object_property_set_int(OBJECT(phb), "chip-id", chip->chip_id, @@ -1958,7 +1975,7 @@ static ICSState *pnv_ics_get(XICSFabric *xi, int irq) } for (j = 0; j < chip8->num_phbs; j++) { -PnvPHB *phb = &chip8->phbs[j]; +PnvPHB *phb = chip8->phbs[j]; PnvPHB3 *phb3 = PNV_PHB3(phb->backend); if (ics_valid_irq(&phb3->lsis, irq)) { @@ -1997,7 +2014,7 @@ static void pnv_ics_resend(XICSFabric *xi) ics_resend(&chip8->psi.ics); for (j = 0; j < chip8->num_phbs; j++) { -PnvPHB *phb = &chip8->phbs[j]; +PnvPHB *phb = chip8->phbs[j]; PnvPHB3 *phb3 = PNV_PHB3(phb->backend); ics_resend(&phb3->lsis); diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h index 781d0acffa..49433281d7 100644 --- a/include/hw/ppc/pnv.h +++ b/include/hw/ppc/pnv.h @@ -81,7 +81,11 @@ struct Pnv8Chip { PnvHomer homer; #define PNV8_CHIP_PHB3_MAX 4 -PnvPHB phbs[PNV8_CHIP_PHB3_MAX]; +/* + * The array is used to allow quick access to the phbs by + * pnv_ics_get_child() and pnv_ics_resend_child(). + */ +PnvPHB *phbs[PNV8_CHIP_PHB3_MAX]; uint32_t num_phbs; XICSFabric*xics;
Re: qemu-system-aarch64: Failed to retrieve host CPU features
Hi Peter, On Fri, 12 Aug 2022 10:25:55 +0100, Peter Maydell wrote: > > I've added some more relevant mailing lists to the cc. > > On Fri, 12 Aug 2022 at 09:45, Vitaly Chikunov wrote: > > On Fri, Aug 12, 2022 at 05:14:27AM +0300, Vitaly Chikunov wrote: > > > I noticed that we starting to get many errors like this: > > > > > > qemu-system-aarch64: Failed to retrieve host CPU features > > > > > > Where many is 1-2% per run, depends on host, host is Kunpeng-920, and > > > Linux kernel is v5.15.59, but it started to appear months before that. > > > > > > strace shows in erroneous case: > > > > > > 1152244 ioctl(9, KVM_CREATE_VM, 0x30) = -1 EINTR (Interrupted system > > > call) > > > > > > And I see in target/arm/kvm.c:kvm_arm_create_scratch_host_vcpu: > > > > > > vmfd = ioctl(kvmfd, KVM_CREATE_VM, max_vm_pa_size); > > > if (vmfd < 0) { > > > goto err; > > > } > > > > > > Maybe it should restart ioctl on EINTR? > > > > > > I don't see EINTR documented in ioctl(2) nor in Linux' > > > Documentation/virt/kvm/api.rst for KVM_CREATE_VM, but for KVM_RUN it > > > says "an unmasked signal is pending". > > > > I am suggested that almost any blocking syscall could return EINTR, so I > > checked the strace log and it does not show evidence of arriving a signal, > > the log ends like this: > > > > 1152244 openat(AT_FDCWD, "/dev/kvm", O_RDWR|O_CLOEXEC) = 9 > > 1152244 ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_ARM_VM_IPA_SIZE) = 48 > > 1152244 ioctl(9, KVM_CREATE_VM, 0x30) = -1 EINTR (Interrupted system > > call) > > 1152244 close(9)= 0 > > 1152244 newfstatat(2, "", {st_dev=makedev(0, 0xd), st_ino=57869925, > > st_mode=S_IFIFO|0600, st_nlink=1, st_uid=517, st_gid=517, st_blksize=4096, > > st_blocks=0, st_size=0, st_atime=1660268019 /* > > 2022-08-12T01:33:39.850436293+ */, st_atime_nsec=850436293, > > st_mtime=1660268019 /* 2022-08-12T01:33:39.850436293+ */, > > st_mtime_nsec=850436293, st_ctime=1660268019 /* > > 2022-08-12T01:33:39.850436293+ */, st_ctime_nsec=850436293}, > > AT_EMPTY_PATH) = 0 > > 1152244 write(2, "qemu-system-aarch64: Failed to r"..., 58) = 58 > > 1152244 exit_group(1) = ? > > 1152245 <... clock_nanosleep resumed> ) = ? > > 1152245 +++ exited with 1 +++ > > 1152244 +++ exited with 1 +++ > > KVM folks: should we expect that KVM_CREATE_VM might fail EINTR > and need retrying? In general, yes. But for this particular one, this is pretty odd. The only path I can so far see that would match this behaviour is if mm_take_all_locks() (called from __mmu_notifier_register()) was getting interrupted by a signal (I'm looking at a 5.19-ish kernel, which may slightly differ from the 5.15 mentioned above). But as Vitaly points out, it doesn't seem to be a signal delivered here. Vitaly: could you please share your exact test case (full qemu command line), and instrument your kernel to see if mm_take_all_locks() is the one failing? Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v2 8/8] parallels: Replace qemu_co_mutex_lock by WITH_QEMU_LOCK_GUARD
On 11.08.2022 17:00, Alexander Ivanov wrote: Replace the way we use mutex in parallels_co_check() for more clean code. I think that "cleaness" is the same, but new code would be just shorter ;) or less error prone. v2: Fix an incorrect usage of WITH_QEMU_LOCK_GUARD. Signed-off-by: Alexander Ivanov --- block/parallels.c | 26 -- 1 file changed, 12 insertions(+), 14 deletions(-) diff --git a/block/parallels.c b/block/parallels.c index d0364182bb..e124a8bb7d 100644 --- a/block/parallels.c +++ b/block/parallels.c @@ -553,24 +553,22 @@ static int coroutine_fn parallels_co_check(BlockDriverState *bs, BDRVParallelsState *s = bs->opaque; int ret; -qemu_co_mutex_lock(&s->lock); +WITH_QEMU_LOCK_GUARD(&s->lock) { +parallels_check_unclean(bs, res, fix); -parallels_check_unclean(bs, res, fix); +ret = parallels_check_outside_image(bs, res, fix); +if (ret < 0) { +return ret; +} -ret = parallels_check_outside_image(bs, res, fix); -if (ret < 0) { -goto out; -} - -ret = parallels_check_leak(bs, res, fix); -if (ret < 0) { -goto out; -} +ret = parallels_check_leak(bs, res, fix); +if (ret < 0) { +return ret; +} -parallels_collect_statistics(bs, res, fix); +parallels_collect_statistics(bs, res, fix); -out: -qemu_co_mutex_unlock(&s->lock); +} ret = bdrv_co_flush(bs); if (ret < 0) {
Re: [PATCH v2] hw/smbios: support for type 8 (port connector)
On Fri, Aug 12, 2022 at 03:51:53PM +0200, Hal Martin wrote: > PATCH v1: add support for SMBIOS type 8 to qemu > PATCH v2: incorporate patch v1 feedback and add smbios type=8 to qemu-options history after --- pls > internal_reference: internal reference designator > external_reference: external reference designator > connector_type: hex value for port connector type (see SMBIOS 7.9.2) > port_type: hex value for port type (see SMBIOS 7.9.3) > > After studying various vendor implementationsi (Dell, Lenovo, MSI), > the value of internal connector type was hard-coded to 0x0 (None). > > Example usage: > -smbios > type=8,internal_reference=JUSB1,external_reference=USB1,connector_type=0x12,port_type=0x10 > \ > -smbios type=8,internal_reference=JAUD1,external_reference="Audio > Jack",connector_type=0x1f,port_type=0x1d \ > -smbios > type=8,internal_reference=LAN,external_reference=Ethernet,connector_type=0x0b,port_type=0x1f > \ > -smbios > type=8,internal_reference=PS2,external_reference=Mouse,connector_type=0x0f,port_type=0x0e > \ > -smbios > type=8,internal_reference=PS2,external_reference=Keyboard,connector_type=0x0f,port_type=0x0d > > > Signed-off-by: Hal Martin We are in freeze, I tagged this for after the release. Just to make sure pls ping me after the release if possible. > --- > hw/smbios/smbios.c | 63 > include/hw/firmware/smbios.h | 10 ++ > qemu-options.hx | 2 ++ > 3 files changed, 75 insertions(+) > > diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c > index 60349ee402..578cae0f0a 100644 > --- a/hw/smbios/smbios.c > +++ b/hw/smbios/smbios.c > @@ -111,6 +111,13 @@ static struct { > .processor_id = 0, > }; > > +struct type8_instance { > +const char *internal_reference, *external_reference; > +uint8_t connector_type, port_type; > +QTAILQ_ENTRY(type8_instance) next; > +}; > +static QTAILQ_HEAD(, type8_instance) type8 = QTAILQ_HEAD_INITIALIZER(type8); > + > static struct { > size_t nvalues; > char **values; > @@ -337,6 +344,29 @@ static const QemuOptDesc qemu_smbios_type4_opts[] = { > { /* end of list */ } > }; > > +static const QemuOptDesc qemu_smbios_type8_opts[] = { > +{ > +.name = "internal_reference", > +.type = QEMU_OPT_STRING, > +.help = "internal reference designator", > +}, > +{ > +.name = "external_reference", > +.type = QEMU_OPT_STRING, > +.help = "external reference designator", > +}, > +{ > +.name = "connector_type", > +.type = QEMU_OPT_NUMBER, > +.help = "connector type", > +}, > +{ > +.name = "port_type", > +.type = QEMU_OPT_NUMBER, > +.help = "port type", > +}, > +}; > + > static const QemuOptDesc qemu_smbios_type11_opts[] = { > { > .name = "value", > @@ -718,6 +748,26 @@ static void smbios_build_type_4_table(MachineState *ms, > unsigned instance) > smbios_type4_count++; > } > > +static void smbios_build_type_8_table(void) > +{ > +unsigned instance = 0; > +struct type8_instance *t8; > + > +QTAILQ_FOREACH(t8, &type8, next) { > +SMBIOS_BUILD_TABLE_PRE(8, T0_BASE + instance, true); > + > +SMBIOS_TABLE_SET_STR(8, internal_reference_str, > t8->internal_reference); > +SMBIOS_TABLE_SET_STR(8, external_reference_str, > t8->external_reference); > +/* most vendors seem to set this to None */ > +t->internal_connector_type = 0x0; > +t->external_connector_type = t8->connector_type; > +t->port_type = t8->port_type; > + > +SMBIOS_BUILD_TABLE_POST; > +instance++; > +} > +} > + > static void smbios_build_type_11_table(void) > { > char count_str[128]; > @@ -1030,6 +1080,7 @@ void smbios_get_tables(MachineState *ms, > smbios_build_type_4_table(ms, i); > } > > +smbios_build_type_8_table(); > smbios_build_type_11_table(); > > #define MAX_DIMM_SZ (16 * GiB) > @@ -1346,6 +1397,18 @@ void smbios_entry_add(QemuOpts *opts, Error **errp) > UINT16_MAX); > } > return; > +case 8: > +if (!qemu_opts_validate(opts, qemu_smbios_type8_opts, errp)) { > +return; > +} > +struct type8_instance *t; > +t = g_new0(struct type8_instance, 1); > +save_opt(&t->internal_reference, opts, "internal_reference"); > +save_opt(&t->external_reference, opts, "external_reference"); > +t->connector_type = qemu_opt_get_number(opts, "connector_type", > 0); > +t->port_type = qemu_opt_get_number(opts, "port_type", 0); > +QTAILQ_INSERT_TAIL(&type8, t, next); > +return; > case 11: > if (!qemu_opts_validate(opts, qemu_smbios_type11_opts, errp)) { > return; > diff --git a/include/hw/firmware/smbios.h b/inclu
Re: [PATCH v3 1/4] accel/tcg: Invalidate translations when clearing PAGE_EXEC
On Thu, 2022-08-11 at 08:42 -0700, Richard Henderson wrote: > On 8/11/22 02:28, Ilya Leoshkevich wrote: > > How is qemu-user's get_page_addr_code() involved here? > > > > I tried to experiment with it, and while I agree that it looks > > buggy, > > it's called only from translation code paths. If we already have a > > translation block, these code paths are not used. > > It's called from tb_lookup too, when we're trying to find an existing > TB. > > > r~ > Oh, I see. I was first worried about direct block chaining with goto_tb, but it turned out that translator_use_goto_tb() prevented it. tb_lookup() skips get_page_addr_code() if tb is found in tb_jmp_cache. I assume it's a bug?
Re: [PATCH v2 0/8] parallels: Refactor the code of images checks and fix a bug
On 11.08.2022 17:00, Alexander Ivanov wrote: Fix image inflation when offset in BAT is out of image. Replace whole BAT syncing by flushing only dirty blocks. Move all the checks outside the main check function in separate functions Use WITH_QEMU_LOCK_GUARD for more clean code. Alexander Ivanov (8): parallels: Out of image offset in BAT leads to image inflation parallels: Move BAT entry setting to a separate function parallels: Replace bdrv_co_pwrite_sync by bdrv_co_flush for BAT flushing parallels: Move check of unclean image to a separate function parallels: Move check of cluster outside image to a separate function parallels: Move check of leaks to a separate function parallels: Move statistic collection to a separate function parallels: Replace qemu_co_mutex_lock by WITH_QEMU_LOCK_GUARD block/parallels.c | 188 -- 1 file changed, 132 insertions(+), 56 deletions(-) I believe that this series is ready to go once commit messages will be improved. Den
Re: qemu-system-aarch64: Failed to retrieve host CPU features
On Fri, 12 Aug 2022 10:25:55 +0100, Peter Maydell wrote: > > I've added some more relevant mailing lists to the cc. > > On Fri, 12 Aug 2022 at 09:45, Vitaly Chikunov wrote: > > On Fri, Aug 12, 2022 at 05:14:27AM +0300, Vitaly Chikunov wrote: > > > I noticed that we starting to get many errors like this: > > > > > > qemu-system-aarch64: Failed to retrieve host CPU features > > > > > > Where many is 1-2% per run, depends on host, host is Kunpeng-920, and > > > Linux kernel is v5.15.59, but it started to appear months before that. > > > > > > strace shows in erroneous case: > > > > > > 1152244 ioctl(9, KVM_CREATE_VM, 0x30) = -1 EINTR (Interrupted system > > > call) > > > > > > And I see in target/arm/kvm.c:kvm_arm_create_scratch_host_vcpu: > > > > > > vmfd = ioctl(kvmfd, KVM_CREATE_VM, max_vm_pa_size); > > > if (vmfd < 0) { > > > goto err; > > > } > > > > > > Maybe it should restart ioctl on EINTR? > > > > > > I don't see EINTR documented in ioctl(2) nor in Linux' > > > Documentation/virt/kvm/api.rst for KVM_CREATE_VM, but for KVM_RUN it > > > says "an unmasked signal is pending". > > > > I am suggested that almost any blocking syscall could return EINTR, so I > > checked the strace log and it does not show evidence of arriving a signal, > > the log ends like this: > > > > 1152244 openat(AT_FDCWD, "/dev/kvm", O_RDWR|O_CLOEXEC) = 9 > > 1152244 ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_ARM_VM_IPA_SIZE) = 48 > > 1152244 ioctl(9, KVM_CREATE_VM, 0x30) = -1 EINTR (Interrupted system > > call) > > 1152244 close(9)= 0 > > 1152244 newfstatat(2, "", {st_dev=makedev(0, 0xd), st_ino=57869925, > > st_mode=S_IFIFO|0600, st_nlink=1, st_uid=517, st_gid=517, st_blksize=4096, > > st_blocks=0, st_size=0, st_atime=1660268019 /* > > 2022-08-12T01:33:39.850436293+ */, st_atime_nsec=850436293, > > st_mtime=1660268019 /* 2022-08-12T01:33:39.850436293+ */, > > st_mtime_nsec=850436293, st_ctime=1660268019 /* > > 2022-08-12T01:33:39.850436293+ */, st_ctime_nsec=850436293}, > > AT_EMPTY_PATH) = 0 > > 1152244 write(2, "qemu-system-aarch64: Failed to r"..., 58) = 58 > > 1152244 exit_group(1) = ? > > 1152245 <... clock_nanosleep resumed> ) = ? > > 1152245 +++ exited with 1 +++ > > 1152244 +++ exited with 1 +++ > > KVM folks: should we expect that KVM_CREATE_VM might fail EINTR > and need retrying? > > (I suspect the answer is "yes", given we do this in the generic > code in kvm-all.c.) Interestingly, this has cropped up in the (distant) past: https://lists.gnu.org/archive/html/qemu-devel/2014-01/msg01031.html and seems to point at the path I was mentioning earlier (the code hasn't changed too much since, apparently). I'd still like to understand the underlying reason though. Thanks, M. -- Without deviation from the norm, progress is not possible.
[PATCH] hw/arm/bcm2835_property: Add support for RPI_FIRMWARE_FRAMEBUFFER_GET_NUM_DISPLAYS
In more recent Raspbian OS Linux kernels, the fb driver gives up immediately if RPI_FIRMWARE_FRAMEBUFFER_GET_NUM_DISPLAYS fails or no displays are reported. This change simply always reports one display. It makes bcm2835_fb work again with these more recent kernels. Signed-off-by: Enrik Berkhan --- hw/misc/bcm2835_property.c | 4 1 file changed, 4 insertions(+) diff --git a/hw/misc/bcm2835_property.c b/hw/misc/bcm2835_property.c index e94e951057..890ae7bae5 100644 --- a/hw/misc/bcm2835_property.c +++ b/hw/misc/bcm2835_property.c @@ -270,6 +270,10 @@ static void bcm2835_property_mbox_push(BCM2835PropertyState *s, uint32_t value) stl_le_phys(&s->dma_as, value + 12, 0); resplen = 4; break; +case 0x00040013: /* Get number of displays */ +stl_le_phys(&s->dma_as, value + 12, 1); +resplen = 4; +break; case 0x00060001: /* Get DMA channels */ /* channels 2-5 */ -- 2.34.1
Re: [PATCH for-7.2 v4 07/11] ppc/pnv: add PHB4 helpers for user created pnv-phb
On 11/08/2022 18:39, Daniel Henrique Barboza wrote: The PHB4 backend relies on a link with the corresponding PEC element. This is trivial to do during machine_init() time for default devices, but not so much for user created ones. pnv_phb4_get_pec() is a small variation of the function that was reverted by commit 9c10d86fee "ppc/pnv: Remove user-created PHB{3,4,5} devices". We'll use it to determine the appropriate PEC for a given user created pnv-phb that uses a PHB4 backend. This is done during realize() time, in pnv_phb_user_device_init(). Signed-off-by: Daniel Henrique Barboza --- Reviewed-by: Frederic Barrat Fred hw/ppc/pnv.c | 35 --- 1 file changed, 32 insertions(+), 3 deletions(-) diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c index e82d6522f0..0644f45a1d 100644 --- a/hw/ppc/pnv.c +++ b/hw/ppc/pnv.c @@ -281,6 +281,34 @@ static void pnv_dt_icp(PnvChip *chip, void *fdt, uint32_t pir, g_free(reg); } +static PnvPhb4PecState *pnv_phb4_get_pec(PnvChip *chip, PnvPHB4 *phb, + Error **errp) +{ +Pnv9Chip *chip9 = PNV9_CHIP(chip); +int chip_id = phb->chip_id; +int index = phb->phb_id; +int i, j; + +for (i = 0; i < chip->num_pecs; i++) { +/* + * For each PEC, check the amount of phbs it supports + * and see if the given phb4 index matches an index. + */ +PnvPhb4PecState *pec = &chip9->pecs[i]; + +for (j = 0; j < pec->num_phbs; j++) { +if (index == pnv_phb4_pec_get_phb_id(pec, j)) { +return pec; +} +} +} +error_setg(errp, + "pnv-phb4 chip-id %d index %d didn't match any existing PEC", + chip_id, index); + +return NULL; +} + /* * Adds a PnvPHB to the chip. Returns the parent obj of the * PHB which varies with each version (phb version 3 is parented @@ -302,10 +330,11 @@ Object *pnv_chip_add_phb(PnvChip *chip, PnvPHB *phb, Error **errp) chip8->num_phbs++; return OBJECT(chip); -} else { -/* phb4 support will be added later */ -return NULL; } + +phb->pec = pnv_phb4_get_pec(chip, PNV_PHB4(phb->backend), errp); + +return OBJECT(phb->pec); } static void pnv_chip_power8_dt_populate(PnvChip *chip, void *fdt)
Re: [PATCH for-7.2 v4 09/11] ppc/pnv: change pnv_phb4_get_pec() to also retrieve chip10->pecs
On 11/08/2022 18:39, Daniel Henrique Barboza wrote: The function assumes that we're always dealing with a PNV9_CHIP() object. This is not the case when the pnv-phb device belongs to a powernv10 machine. Change pnv_phb4_get_pec() to be able to work with PNV10_CHIP() if necessary. Signed-off-by: Daniel Henrique Barboza --- Reviewed-by: Frederic Barrat Fred hw/ppc/pnv.c | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c index ec0558ed1c..e6c14fe231 100644 --- a/hw/ppc/pnv.c +++ b/hw/ppc/pnv.c @@ -284,17 +284,30 @@ static void pnv_dt_icp(PnvChip *chip, void *fdt, uint32_t pir, static PnvPhb4PecState *pnv_phb4_get_pec(PnvChip *chip, PnvPHB4 *phb, Error **errp) { -Pnv9Chip *chip9 = PNV9_CHIP(chip); +PnvPHB *phb_base = phb->phb_base; +PnvPhb4PecState *pecs = NULL; int chip_id = phb->chip_id; int index = phb->phb_id; int i, j; +if (phb_base->version == 4) { +Pnv9Chip *chip9 = PNV9_CHIP(chip); + +pecs = chip9->pecs; +} else if (phb_base->version == 5) { +Pnv10Chip *chip10 = PNV10_CHIP(chip); + +pecs = chip10->pecs; +} else { +g_assert_not_reached(); +} + for (i = 0; i < chip->num_pecs; i++) { /* * For each PEC, check the amount of phbs it supports * and see if the given phb4 index matches an index. */ -PnvPhb4PecState *pec = &chip9->pecs[i]; +PnvPhb4PecState *pec = &pecs[i]; for (j = 0; j < pec->num_phbs; j++) { if (index == pnv_phb4_pec_get_phb_id(pec, j)) {
Re: [PATCH for-7.2 v4 10/11] ppc/pnv: user creatable pnv-phb for powernv10
On 11/08/2022 18:39, Daniel Henrique Barboza wrote: Given that powernv9 and powernv10 uses the same pnv-phb backend, the logic to allow user created pnv-phbs for powernv10 is already in place. Let's flip the switch. Reviewed-by: Cédric Le Goater Signed-off-by: Daniel Henrique Barboza --- Reviewed-by: Frederic Barrat Fred hw/ppc/pnv.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c index e6c14fe231..9bf35ca9d6 100644 --- a/hw/ppc/pnv.c +++ b/hw/ppc/pnv.c @@ -2226,6 +2226,8 @@ static void pnv_machine_power10_class_init(ObjectClass *oc, void *data) pmc->dt_power_mgt = pnv_dt_power_mgt; xfc->match_nvt = pnv10_xive_match_nvt; + +machine_class_allow_dynamic_sysbus_dev(mc, TYPE_PNV_PHB); } static bool pnv_machine_get_hb(Object *obj, Error **errp)
Re: [BUG] cxl can not create region
On Thu, 11 Aug 2022 18:08:57 +0100 Jonathan Cameron via wrote: > On Tue, 9 Aug 2022 17:08:25 +0100 > Jonathan Cameron wrote: > > > On Tue, 9 Aug 2022 21:07:06 +0800 > > Bobo WL wrote: > > > > > Hi Jonathan > > > > > > Thanks for your reply! > > > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron > > > wrote: > > > > > > > > Probably not related to your problem, but there is a disconnect in QEMU > > > > / > > > > kernel assumptionsaround the presence of an HDM decoder when a HB only > > > > has a single root port. Spec allows it to be provided or not as an > > > > implementation choice. > > > > Kernel assumes it isn't provide. Qemu assumes it is. > > > > > > > > The temporary solution is to throw in a second root port on the HB and > > > > not > > > > connect anything to it. Longer term I may special case this so that > > > > the particular > > > > decoder defaults to pass through settings in QEMU if there is only one > > > > root port. > > > > > > > > > > You are right! After adding an extra HB in qemu, I can create a x1 > > > region successfully. > > > But have some errors in Nvdimm: > > > > > > [ 74.925838] Unknown online node for memory at 0x100, assuming > > > node 0 > > > [ 74.925846] Unknown target node for memory at 0x100, assuming > > > node 0 > > > [ 74.927470] nd_region region0: nmem0: is disabled, failing probe > > > > Ah. I've seen this one, but not chased it down yet. Was on my todo list to > > chase > > down. Once I reach this state I can verify the HDM Decode is correct which > > is what > > I've been using to test (Which wasn't true until earlier this week). > > I'm currently testing via devmem, more for historical reasons than because > > it makes > > that much sense anymore. > > *embarassed cough*. We haven't fully hooked the LSA up in qemu yet. > I'd forgotten that was still on the todo list. I don't think it will > be particularly hard to do and will take a look in next few days. > > Very very indirectly this error is causing a driver probe fail that means that > we hit a code path that has a rather odd looking check on NDD_LABELING. > Should not have gotten near that path though - hence the problem is actually > when we call cxl_pmem_get_config_data() and it returns an error because > we haven't fully connected up the command in QEMU. So a least one bug in QEMU. We were not supporting variable length payloads on mailbox inputs (but were on outputs). That hasn't mattered until we get to LSA writes. We just need to relax condition on the supplied length. diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c index c352a935c4..fdda9529fe 100644 --- a/hw/cxl/cxl-mailbox-utils.c +++ b/hw/cxl/cxl-mailbox-utils.c @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) cxl_cmd = &cxl_cmd_set[set][cmd]; h = cxl_cmd->handler; if (h) { -if (len == cxl_cmd->in) { +if (len == cxl_cmd->in || !cxl_cmd->in) { cxl_cmd->payload = cxl_dstate->mbox_reg_state + A_CXL_DEV_CMD_PAYLOAD; ret = (*h)(cxl_cmd, cxl_dstate, &len); This lets the nvdimm/region probe fine, but I'm getting some issues with namespace capacity so I'll look at what is causing that next. Unfortunately I'm not that familiar with the driver/nvdimm side of things so it's take a while to figure out what kicks off what! Jonathan > > Jonathan > > > > > > > > > > And x4 region still failed with same errors, using latest cxl/preview > > > branch don't work. > > > I have picked "Two CXL emulation fixes" patches in qemu, still not > > > working. > > > > > > Bob > >
Re: [RFC v2 00/10] Introduce an extensible static analyzer
On Thu, Aug 4, 2022 at 12:44 PM Marc-André Lureau wrote: > Hi > > Great work so far! This seems easier to hack than my attempt to use > clang-tidy to write some qemu checks > (https://github.com/elmarco/clang-tools-extra) > > The code seems quite generic, I wonder if such a tool in python wasn't > already developed (I couldn't find it easily searching on github). > > Why not make it standalone from qemu? Similar to > https://gitlab.com/qemu-project/python-qemu-qmp, you could have your > own release management, issue tracker, code formatting, license, CI > etc. (you should add copyright header in each file, at least that's > pretty much required in qemu nowadays). You could also have the > qemu-specific checks there imho (clang-tidy has google & llvm specific > checks too) This is an interesting idea. Indeed, the analyzer is essentially a more easily extensible, Python version of clang-tidy (without the big built-in library of checks). I think I'll continue working on it embedded in QEMU for now, mostly because it depends on some aspects of the build system, and gradually generalize it to a point where it could be made into a standalone thing. > It would be nice to write some docs, in docs/devel/testing.rst and > some new meson/ninja/make targets to run the checks directly from a > build tree. Sounds good, I'll work on it. > On fc36, I had several dependencies I needed to install manually (imho > they should have been pulled by python3-clang), but more annoyingly I > got: > clang.cindex.LibclangError: libclang.so: cannot open shared object > file: No such file or directory. To provide a path to libclang use > Config.set_library_path() or Config.set_library_file(). > > clang-libs doesn't install libclang.so, I wonder why. I made a link > manually and it works, but it's probably incorrect. I'll try to open > issues for the clang packaging. That's strange. Thanks for looking into this. Alberto
Re: [RFC v2 02/10] Drop unused static function return values
On Wed, Aug 3, 2022 at 1:30 PM Peter Maydell wrote: > The problem with a patch like this is that it rolls up into a > single patch changes to the API of many functions in multiple > subsystems across the whole codebase. Some of those changes > might be right; some might be wrong. No single person is going > to be in a position to review the whole lot, and a +248-403 > patch email makes it very unwieldy to try to discuss. > > If you want to propose some of these I think you need to: > * split it out so that you're only suggesting changes in >one subsystem at a time > * look at the places you are suggesting changes, to see if >the correct answer is actually "add the missing error >check in the caller(s)" > * not change places that are following standard API patterns >like "return bool and have an Error** argument" Sounds good. For now, I'll limit the changes to a few representative cases e.g. in the block layer, since this patch is mostly intended as a demonstration of the type of issue that the check catches. Once there is agreement on the semantics for the check, I'll probably send a separate tree-wide series with per-subsystem patches. Thanks, Alberto
Re: [PATCH for-7.2 v4 11/11] ppc/pnv: fix QOM parenting of user creatable root ports
On 11/08/2022 18:39, Daniel Henrique Barboza wrote: User creatable root ports are being parented by the 'peripheral' or the 'peripheral-anon' container. This happens because this is the regular QOM schema for sysbus devices that are added via the command line. Let's make this QOM hierarchy similar to what we have with default root ports, i.e. the root port must be parented by the pnv-root-bus. To do that we change the qom and bus parent of the root port during root_port_realize(). The realize() is shared by the default root port code path, so we can remove the code inside pnv_phb_attach_root_port() that was adding the root port as a child of the bus as well. While we're at it, change pnv_phb_attach_root_port() to receive a PCIBus instead of a PCIHostState to make it clear that the function does not make use of the PHB. Signed-off-by: Daniel Henrique Barboza --- hw/pci-host/pnv_phb.c | 35 +++ 1 file changed, 15 insertions(+), 20 deletions(-) diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c index 17d9960aa1..65ed1f9eb4 100644 --- a/hw/pci-host/pnv_phb.c +++ b/hw/pci-host/pnv_phb.c @@ -51,27 +51,11 @@ static void pnv_parent_bus_fixup(DeviceState *parent, DeviceState *child, } } -/* - * Attach a root port device. - * - * 'index' will be used both as a PCIE slot value and to calculate - * QOM id. 'chip_id' is going to be used as PCIE chassis for the - * root port. - */ -static void pnv_phb_attach_root_port(PCIHostState *pci) +static void pnv_phb_attach_root_port(PCIBus *bus) { PCIDevice *root = pci_new(PCI_DEVFN(0, 0), TYPE_PNV_PHB_ROOT_PORT); -const char *dev_id = DEVICE(root)->id; -g_autofree char *default_id = NULL; -int index; - -index = object_property_get_int(OBJECT(pci->bus), "phb-id", &error_fatal); -default_id = g_strdup_printf("%s[%d]", TYPE_PNV_PHB_ROOT_PORT, index); - -object_property_add_child(OBJECT(pci->bus), dev_id ? dev_id : default_id, - OBJECT(root)); -pci_realize_and_unref(root, pci->bus, &error_fatal); +pci_realize_and_unref(root, bus, &error_fatal); } /* @@ -171,7 +155,7 @@ static void pnv_phb_realize(DeviceState *dev, Error **errp) return; } -pnv_phb_attach_root_port(pci); +pnv_phb_attach_root_port(pci->bus); } static const char *pnv_phb_root_bus_path(PCIHostState *host_bridge, @@ -240,12 +224,18 @@ static void pnv_phb_root_port_realize(DeviceState *dev, Error **errp) { PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(dev); PnvPHBRootPort *phb_rp = PNV_PHB_ROOT_PORT(dev); -PCIBus *bus = PCI_BUS(qdev_get_parent_bus(dev)); +BusState *qbus = qdev_get_parent_bus(dev); +PCIBus *bus = PCI_BUS(qbus); PCIDevice *pci = PCI_DEVICE(dev); uint16_t device_id = 0; Error *local_err = NULL; int chip_id, index; +/* + * 'index' will be used both as a PCIE slot value and to calculate + * QOM id. 'chip_id' is going to be used as PCIE chassis for the + * root port. + */ chip_id = object_property_get_int(OBJECT(bus), "chip-id", &error_fatal); index = object_property_get_int(OBJECT(bus), "phb-id", &error_fatal); @@ -253,6 +243,11 @@ static void pnv_phb_root_port_realize(DeviceState *dev, Error **errp) qdev_prop_set_uint8(dev, "chassis", chip_id); qdev_prop_set_uint16(dev, "slot", index); +pnv_parent_qom_fixup(OBJECT(bus), OBJECT(dev), index); +if (!qdev_set_parent_bus(dev, qbus, &error_fatal)) { That line looks surprising at first, because we got qbus from qdev_get_parent_bus() just above, so it looks like a noop. I talked to Daniel about it: the device<->bus relationship is correct when entering the function but the call to pnv_parent_qom_fixup() interferes with the bus relationship, so it needs to be re-established. Short of a better suggestion, it probably need a comment. Fred +return; +} + rpc->parent_realize(dev, &local_err); if (local_err) { error_propagate(errp, local_err);
Re: [BUG] cxl can not create region
Jonathan Cameron wrote: > On Thu, 11 Aug 2022 18:08:57 +0100 > Jonathan Cameron via wrote: > > > On Tue, 9 Aug 2022 17:08:25 +0100 > > Jonathan Cameron wrote: > > > > > On Tue, 9 Aug 2022 21:07:06 +0800 > > > Bobo WL wrote: > > > > > > > Hi Jonathan > > > > > > > > Thanks for your reply! > > > > > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron > > > > wrote: > > > > > > > > > > Probably not related to your problem, but there is a disconnect in > > > > > QEMU / > > > > > kernel assumptionsaround the presence of an HDM decoder when a HB only > > > > > has a single root port. Spec allows it to be provided or not as an > > > > > implementation choice. > > > > > Kernel assumes it isn't provide. Qemu assumes it is. > > > > > > > > > > The temporary solution is to throw in a second root port on the HB > > > > > and not > > > > > connect anything to it. Longer term I may special case this so that > > > > > the particular > > > > > decoder defaults to pass through settings in QEMU if there is only > > > > > one root port. > > > > > > > > > > > > > You are right! After adding an extra HB in qemu, I can create a x1 > > > > region successfully. > > > > But have some errors in Nvdimm: > > > > > > > > [ 74.925838] Unknown online node for memory at 0x100, > > > > assuming node 0 > > > > [ 74.925846] Unknown target node for memory at 0x100, > > > > assuming node 0 > > > > [ 74.927470] nd_region region0: nmem0: is disabled, failing probe > > > > > > Ah. I've seen this one, but not chased it down yet. Was on my todo list > > > to chase > > > down. Once I reach this state I can verify the HDM Decode is correct > > > which is what > > > I've been using to test (Which wasn't true until earlier this week). > > > I'm currently testing via devmem, more for historical reasons than > > > because it makes > > > that much sense anymore. > > > > *embarassed cough*. We haven't fully hooked the LSA up in qemu yet. > > I'd forgotten that was still on the todo list. I don't think it will > > be particularly hard to do and will take a look in next few days. > > > > Very very indirectly this error is causing a driver probe fail that means > > that > > we hit a code path that has a rather odd looking check on NDD_LABELING. > > Should not have gotten near that path though - hence the problem is actually > > when we call cxl_pmem_get_config_data() and it returns an error because > > we haven't fully connected up the command in QEMU. > > So a least one bug in QEMU. We were not supporting variable length payloads > on mailbox > inputs (but were on outputs). That hasn't mattered until we get to LSA > writes. > We just need to relax condition on the supplied length. > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c > index c352a935c4..fdda9529fe 100644 > --- a/hw/cxl/cxl-mailbox-utils.c > +++ b/hw/cxl/cxl-mailbox-utils.c > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) > cxl_cmd = &cxl_cmd_set[set][cmd]; > h = cxl_cmd->handler; > if (h) { > -if (len == cxl_cmd->in) { > +if (len == cxl_cmd->in || !cxl_cmd->in) { > cxl_cmd->payload = cxl_dstate->mbox_reg_state + > A_CXL_DEV_CMD_PAYLOAD; > ret = (*h)(cxl_cmd, cxl_dstate, &len); > > > This lets the nvdimm/region probe fine, but I'm getting some issues with > namespace capacity so I'll look at what is causing that next. > Unfortunately I'm not that familiar with the driver/nvdimm side of things > so it's take a while to figure out what kicks off what! The whirlwind tour is that 'struct nd_region' instances that represent a persitent memory address range are composed of one more mappings of 'struct nvdimm' objects. The nvdimm object is driven by the dimm driver in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking the dimm (if locked) and interrogating the label area to look for namespace labels. The label command calls are routed to the '->ndctl()' callback that was registered when the CXL nvdimm_bus_descriptor was created. That callback handles both 'bus' scope calls, currently none for CXL, and per nvdimm calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands to CXL commands. The 'struct nvdimm' objects that the CXL side registers have the NDD_LABELING flag set which means that namespaces need to be explicitly created / provisioned from region capacity. Otherwise, if drivers/nvdimm/dimm.c does not find a namespace-label-index block then the region reverts to label-less mode and a default namespace equal to the size of the region is instantiated. If you are seeing small mismatches in namespace capacity then it may just be the fact that by default 'ndctl create-namespace' results in an 'fsdax' mode namespace which just means that it is a block device where 1.5% of the capacity is reserved for 'struct page' metadata. You should be able to see namespace capacity
Re: [PATCH for-7.2 v4 08/11] ppc/pnv: enable user created pnv-phb for powernv9
On 11/08/2022 18:39, Daniel Henrique Barboza wrote: Enable pnv-phb user created devices for powernv9 now that we have everything in place. Reviewed-by: Cédric Le Goater Signed-off-by: Daniel Henrique Barboza --- Same comment as in patch 6 regarding the QOM relationship of the user-created root port. Reviewed-by: Frederic Barrat Fred hw/pci-host/pnv_phb.c | 2 +- hw/pci-host/pnv_phb4_pec.c | 6 -- hw/ppc/pnv.c | 2 ++ 3 files changed, 7 insertions(+), 3 deletions(-) diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c index 1f53ff77c5..17d9960aa1 100644 --- a/hw/pci-host/pnv_phb.c +++ b/hw/pci-host/pnv_phb.c @@ -167,7 +167,7 @@ static void pnv_phb_realize(DeviceState *dev, Error **errp) pnv_phb4_bus_init(dev, PNV_PHB4(phb->backend)); } -if (phb->version == 3 && !defaults_enabled()) { +if (!defaults_enabled()) { return; } diff --git a/hw/pci-host/pnv_phb4_pec.c b/hw/pci-host/pnv_phb4_pec.c index 8dc363d69c..9871f462cd 100644 --- a/hw/pci-host/pnv_phb4_pec.c +++ b/hw/pci-host/pnv_phb4_pec.c @@ -146,8 +146,10 @@ static void pnv_pec_realize(DeviceState *dev, Error **errp) pec->num_phbs = pecc->num_phbs[pec->index]; /* Create PHBs if running with defaults */ -for (i = 0; i < pec->num_phbs; i++) { -pnv_pec_default_phb_realize(pec, i, errp); +if (defaults_enabled()) { +for (i = 0; i < pec->num_phbs; i++) { +pnv_pec_default_phb_realize(pec, i, errp); +} } /* Initialize the XSCOM regions for the PEC registers */ diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c index 0644f45a1d..ec0558ed1c 100644 --- a/hw/ppc/pnv.c +++ b/hw/ppc/pnv.c @@ -2188,6 +2188,8 @@ static void pnv_machine_power9_class_init(ObjectClass *oc, void *data) pmc->compat = compat; pmc->compat_size = sizeof(compat); pmc->dt_power_mgt = pnv_dt_power_mgt; + +machine_class_allow_dynamic_sysbus_dev(mc, TYPE_PNV_PHB); } static void pnv_machine_power10_class_init(ObjectClass *oc, void *data)
Re: [BUG] cxl can not create region
On Fri, 12 Aug 2022 09:03:02 -0700 Dan Williams wrote: > Jonathan Cameron wrote: > > On Thu, 11 Aug 2022 18:08:57 +0100 > > Jonathan Cameron via wrote: > > > > > On Tue, 9 Aug 2022 17:08:25 +0100 > > > Jonathan Cameron wrote: > > > > > > > On Tue, 9 Aug 2022 21:07:06 +0800 > > > > Bobo WL wrote: > > > > > > > > > Hi Jonathan > > > > > > > > > > Thanks for your reply! > > > > > > > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron > > > > > wrote: > > > > > > > > > > > > Probably not related to your problem, but there is a disconnect in > > > > > > QEMU / > > > > > > kernel assumptionsaround the presence of an HDM decoder when a HB > > > > > > only > > > > > > has a single root port. Spec allows it to be provided or not as an > > > > > > implementation choice. > > > > > > Kernel assumes it isn't provide. Qemu assumes it is. > > > > > > > > > > > > The temporary solution is to throw in a second root port on the HB > > > > > > and not > > > > > > connect anything to it. Longer term I may special case this so > > > > > > that the particular > > > > > > decoder defaults to pass through settings in QEMU if there is only > > > > > > one root port. > > > > > > > > > > > > > > > > You are right! After adding an extra HB in qemu, I can create a x1 > > > > > region successfully. > > > > > But have some errors in Nvdimm: > > > > > > > > > > [ 74.925838] Unknown online node for memory at 0x100, > > > > > assuming node 0 > > > > > [ 74.925846] Unknown target node for memory at 0x100, > > > > > assuming node 0 > > > > > [ 74.927470] nd_region region0: nmem0: is disabled, failing probe > > > > > > > > > > > > > Ah. I've seen this one, but not chased it down yet. Was on my todo > > > > list to chase > > > > down. Once I reach this state I can verify the HDM Decode is correct > > > > which is what > > > > I've been using to test (Which wasn't true until earlier this week). > > > > I'm currently testing via devmem, more for historical reasons than > > > > because it makes > > > > that much sense anymore. > > > > > > *embarassed cough*. We haven't fully hooked the LSA up in qemu yet. > > > I'd forgotten that was still on the todo list. I don't think it will > > > be particularly hard to do and will take a look in next few days. > > > > > > Very very indirectly this error is causing a driver probe fail that means > > > that > > > we hit a code path that has a rather odd looking check on NDD_LABELING. > > > Should not have gotten near that path though - hence the problem is > > > actually > > > when we call cxl_pmem_get_config_data() and it returns an error because > > > we haven't fully connected up the command in QEMU. > > > > So a least one bug in QEMU. We were not supporting variable length payloads > > on mailbox > > inputs (but were on outputs). That hasn't mattered until we get to LSA > > writes. > > We just need to relax condition on the supplied length. > > > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c > > index c352a935c4..fdda9529fe 100644 > > --- a/hw/cxl/cxl-mailbox-utils.c > > +++ b/hw/cxl/cxl-mailbox-utils.c > > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) > > cxl_cmd = &cxl_cmd_set[set][cmd]; > > h = cxl_cmd->handler; > > if (h) { > > -if (len == cxl_cmd->in) { > > +if (len == cxl_cmd->in || !cxl_cmd->in) { > > cxl_cmd->payload = cxl_dstate->mbox_reg_state + > > A_CXL_DEV_CMD_PAYLOAD; > > ret = (*h)(cxl_cmd, cxl_dstate, &len); > > > > > > This lets the nvdimm/region probe fine, but I'm getting some issues with > > namespace capacity so I'll look at what is causing that next. > > Unfortunately I'm not that familiar with the driver/nvdimm side of things > > so it's take a while to figure out what kicks off what! > > The whirlwind tour is that 'struct nd_region' instances that represent a > persitent memory address range are composed of one more mappings of > 'struct nvdimm' objects. The nvdimm object is driven by the dimm driver > in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking > the dimm (if locked) and interrogating the label area to look for > namespace labels. > > The label command calls are routed to the '->ndctl()' callback that was > registered when the CXL nvdimm_bus_descriptor was created. That callback > handles both 'bus' scope calls, currently none for CXL, and per nvdimm > calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands > to CXL commands. > > The 'struct nvdimm' objects that the CXL side registers have the > NDD_LABELING flag set which means that namespaces need to be explicitly > created / provisioned from region capacity. Otherwise, if > drivers/nvdimm/dimm.c does not find a namespace-label-index block then > the region reverts to label-less mode and a default namespace equal to > the size of the region is instantiated.
Re: [PATCH for-7.2 v4 06/11] ppc/pnv: enable user created pnv-phb for powernv8
On 11/08/2022 18:39, Daniel Henrique Barboza wrote: The bulk of the work was already done by previous patches. Use defaults_enabled() to determine whether we need to create the default devices or not. Reviewed-by: Cédric Le Goater Signed-off-by: Daniel Henrique Barboza --- The QOM relationship for user-created root port is not ideal, but it's addressed in the last patch of the series. Reviewed-by: Frederic Barrat Fred hw/pci-host/pnv_phb.c | 9 +++-- hw/ppc/pnv.c | 32 ++-- 2 files changed, 25 insertions(+), 16 deletions(-) diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c index 5dc44f45d1..1f53ff77c5 100644 --- a/hw/pci-host/pnv_phb.c +++ b/hw/pci-host/pnv_phb.c @@ -17,6 +17,7 @@ #include "hw/ppc/pnv.h" #include "hw/qdev-properties.h" #include "qom/object.h" +#include "sysemu/sysemu.h" /* @@ -166,6 +167,10 @@ static void pnv_phb_realize(DeviceState *dev, Error **errp) pnv_phb4_bus_init(dev, PNV_PHB4(phb->backend)); } +if (phb->version == 3 && !defaults_enabled()) { +return; +} + pnv_phb_attach_root_port(pci); } @@ -201,7 +206,7 @@ static void pnv_phb_class_init(ObjectClass *klass, void *data) dc->realize = pnv_phb_realize; device_class_set_props(dc, pnv_phb_properties); set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories); -dc->user_creatable = false; +dc->user_creatable = true; } static void pnv_phb_root_port_reset(DeviceState *dev) @@ -292,7 +297,7 @@ static void pnv_phb_root_port_class_init(ObjectClass *klass, void *data) device_class_set_parent_reset(dc, pnv_phb_root_port_reset, &rpc->parent_reset); dc->reset = &pnv_phb_root_port_reset; -dc->user_creatable = false; +dc->user_creatable = true; k->vendor_id = PCI_VENDOR_ID_IBM; /* device_id will be written during realize() */ diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c index e53e9e297d..e82d6522f0 100644 --- a/hw/ppc/pnv.c +++ b/hw/ppc/pnv.c @@ -1178,20 +1178,22 @@ static void pnv_chip_power8_instance_init(Object *obj) object_initialize_child(obj, "homer", &chip8->homer, TYPE_PNV8_HOMER); -chip8->num_phbs = pcc->num_phbs; - -for (i = 0; i < chip8->num_phbs; i++) { -Object *phb = object_new(TYPE_PNV_PHB); - -/* - * We need the chip to parent the PHB to allow the DT - * to build correctly (via pnv_xscom_dt()). - * - * TODO: the PHB should be parented by a PEC device that, at - * this moment, is not modelled powernv8/phb3. - */ -object_property_add_child(obj, "phb[*]", phb); -chip8->phbs[i] = PNV_PHB(phb); +if (defaults_enabled()) { +chip8->num_phbs = pcc->num_phbs; + +for (i = 0; i < chip8->num_phbs; i++) { +Object *phb = object_new(TYPE_PNV_PHB); + +/* + * We need the chip to parent the PHB to allow the DT + * to build correctly (via pnv_xscom_dt()). + * + * TODO: the PHB should be parented by a PEC device that, at + * this moment, is not modelled powernv8/phb3. + */ +object_property_add_child(obj, "phb[*]", phb); +chip8->phbs[i] = PNV_PHB(phb); +} } } @@ -2130,6 +2132,8 @@ static void pnv_machine_power8_class_init(ObjectClass *oc, void *data) pmc->compat = compat; pmc->compat_size = sizeof(compat); + +machine_class_allow_dynamic_sysbus_dev(mc, TYPE_PNV_PHB); } static void pnv_machine_power9_class_init(ObjectClass *oc, void *data)
Re: [RFC v2 02/10] Drop unused static function return values
On Wed, Aug 3, 2022 at 12:15 PM Richard W.M. Jones wrote: > If it helps to think about this, Coverity checks for consistency. > Across the whole code base, is the return value of a function used or > ignored consistently. You will see Coverity errors like: > > Error: CHECKED_RETURN (CWE-252): [#def37] > libnbd-1.12.5/fuse/operations.c:180: check_return: Calling "nbd_poll" > without checking return value (as is done elsewhere 5 out of 6 times). > libnbd-1.12.5/examples/aio-connect-read.c:96: example_checked: Example > 1: "nbd_poll(nbd, -1)" has its value checked in "nbd_poll(nbd, -1) == -1". > libnbd-1.12.5/examples/aio-connect-read.c:128: example_checked: Example > 2: "nbd_poll(nbd, -1)" has its value checked in "nbd_poll(nbd, -1) == -1". > libnbd-1.12.5/examples/strict-structured-reads.c:246: example_checked: > Example 3: "nbd_poll(nbd, -1)" has its value checked in "nbd_poll(nbd, -1) == > -1". > libnbd-1.12.5/ocaml/nbd-c.c:2599: example_assign: Example 4: Assigning: > "r" = return value from "nbd_poll(h, timeout)". > libnbd-1.12.5/ocaml/nbd-c.c:2602: example_checked: Example 4 (cont.): > "r" has its value checked in "r == -1". > libnbd-1.12.5/python/methods.c:2806: example_assign: Example 5: > Assigning: "ret" = return value from "nbd_poll(h, timeout)". > libnbd-1.12.5/python/methods.c:2808: example_checked: Example 5 > (cont.): "ret" has its value checked in "ret == -1". > # 178| /* Dispatch work while there are commands in flight. */ > # 179| while (thread->in_flight > 0) > # 180|-> nbd_poll (h, -1); > # 181| } > # 182| > > What it's saying is that in this code base, nbd_poll's return value > was checked by the caller 5 out of 6 times, but ignored here. (This > turned out to be a real bug which we fixed). > > It seems like the check implemented in your patch is: If the return > value is used 0 times anywhere in the code base, change the return > value to 'void'. Coverity would not flag this. > > Maybe a consistent use check is better? Note that the analyzer is currently limited to analyzing a single translation unit at a time, so we would only be able to implement a consistent use check for static functions (this is why the current "return-value-never-used" check only applies to static functions). It may be worthwhile exploring cross-translation unit analysis, although it may be difficult to accomplish while also avoiding reanalyzing the entire tree every time a single translation unit is modified. Alberto
Re: [PATCH 00/62] target/arm: Implement FEAT_HAFDBS
On Sun, 3 Jul 2022 at 09:25, Richard Henderson wrote: > > This is a major reorg to arm page table walking. While the result > here is "merely" Hardware-assited Access Flag and Dirty Bit Setting > (HAFDBS), the ultimate goal is the Realm Management Extension (RME). > RME "recommends" that HAFDBS be implemented (I_CSLWZ). > Richard Henderson (62): > accel/tcg: Introduce PageEntryExtra > target/arm: Enable PageEntryExtra > target/arm: Fix MTE check in sve_ldnfff1_r > target/arm: Record tagged bit for user-only in sve_probe_page > target/arm: Use PageEntryExtra for MTE > target/arm: Use PageEntryExtra for BTI > include/exec: Remove target_tlb_bitN from MemTxAttrs > target/arm: Create GetPhysAddrResult > target/arm: Fix ipa_secure in get_phys_addr > target/arm: Use GetPhysAddrResult in get_phys_addr_lpae > target/arm: Use GetPhysAddrResult in get_phys_addr_v6 > target/arm: Use GetPhysAddrResult in get_phys_addr_v5 > target/arm: Use GetPhysAddrResult in get_phys_addr_pmsav5 > target/arm: Use GetPhysAddrResult in get_phys_addr_pmsav7 > target/arm: Use GetPhysAddrResult in get_phys_addr_pmsav8 > target/arm: Use GetPhysAddrResult in pmsav8_mpu_lookup > target/arm: Remove is_subpage argument to pmsav8_mpu_lookup > target/arm: Add is_secure parameter to v8m_security_lookup > target/arm: Add is_secure parameter to pmsav8_mpu_lookup > target/arm: Add is_secure parameter to get_phys_addr_v5 > target/arm: Add is_secure parameter to get_phys_addr_v6 > target/arm: Add secure parameter to get_phys_addr_pmsav8 > target/arm: Add is_secure parameter to pmsav7_use_background_region > target/arm: Add is_secure parameter to get_phys_addr_lpae > target/arm: Add is_secure parameter to get_phys_addr_pmsav7 > target/arm: Add is_secure parameter to regime_translation_disabled > target/arm: Add is_secure parameter to get_phys_addr_pmsav5 Is it possible to rearrange this patchset so the easy refactoring patches that do "use a struct to return values from get_phys_addr and friends" are at the front (ie before the stuff that touches core code) ? That way they're easy to take into the tree early while the rest of the series is still under review... thanks -- PMM
[PATCH] target/arm: Rearrange cpu64.c so all the CPU initfns are together
cpu64.c has ended up in a slightly odd order -- it starts with the initfns for most of the models-real-hardware CPUs; after that comes a bunch of support code for SVE, SME, pauth and LPA2 properties. Then come the initfns for the 'host' and 'max' CPU types, and then after that one more models-real-hardware CPU initfn, for a64fx. (This ordering is partly historical and partly required because a64fx needs the SVE properties.) Reorder the file into: * CPU property support functions * initfns for real hardware CPUs * initfns for host and max * class boilerplate Signed-off-by: Peter Maydell --- I started off thinking this would be a relatively simple "move the a64fx initfn up to live with the others", but because we effectively have to move all the cpu initfns the diffstat has ended up quite big. On the other hand this patch is purely code motion, and the resulting order in the file does seem to me to be more sensible. --- target/arm/cpu64.c | 712 ++--- 1 file changed, 356 insertions(+), 356 deletions(-) diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c index 78e27f778ac..741f959fbe8 100644 --- a/target/arm/cpu64.c +++ b/target/arm/cpu64.c @@ -37,313 +37,6 @@ #include "internals.h" -static void aarch64_a57_initfn(Object *obj) -{ -ARMCPU *cpu = ARM_CPU(obj); - -cpu->dtb_compatible = "arm,cortex-a57"; -set_feature(&cpu->env, ARM_FEATURE_V8); -set_feature(&cpu->env, ARM_FEATURE_NEON); -set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER); -set_feature(&cpu->env, ARM_FEATURE_AARCH64); -set_feature(&cpu->env, ARM_FEATURE_CBAR_RO); -set_feature(&cpu->env, ARM_FEATURE_EL2); -set_feature(&cpu->env, ARM_FEATURE_EL3); -set_feature(&cpu->env, ARM_FEATURE_PMU); -cpu->kvm_target = QEMU_KVM_ARM_TARGET_CORTEX_A57; -cpu->midr = 0x411fd070; -cpu->revidr = 0x; -cpu->reset_fpsid = 0x41034070; -cpu->isar.mvfr0 = 0x10110222; -cpu->isar.mvfr1 = 0x1211; -cpu->isar.mvfr2 = 0x0043; -cpu->ctr = 0x8444c004; -cpu->reset_sctlr = 0x00c50838; -cpu->isar.id_pfr0 = 0x0131; -cpu->isar.id_pfr1 = 0x00011011; -cpu->isar.id_dfr0 = 0x03010066; -cpu->id_afr0 = 0x; -cpu->isar.id_mmfr0 = 0x10101105; -cpu->isar.id_mmfr1 = 0x4000; -cpu->isar.id_mmfr2 = 0x0126; -cpu->isar.id_mmfr3 = 0x02102211; -cpu->isar.id_isar0 = 0x02101110; -cpu->isar.id_isar1 = 0x13112111; -cpu->isar.id_isar2 = 0x21232042; -cpu->isar.id_isar3 = 0x01112131; -cpu->isar.id_isar4 = 0x00011142; -cpu->isar.id_isar5 = 0x00011121; -cpu->isar.id_isar6 = 0; -cpu->isar.id_aa64pfr0 = 0x; -cpu->isar.id_aa64dfr0 = 0x10305106; -cpu->isar.id_aa64isar0 = 0x00011120; -cpu->isar.id_aa64mmfr0 = 0x1124; -cpu->isar.dbgdidr = 0x3516d000; -cpu->isar.dbgdevid = 0x01110f13; -cpu->isar.dbgdevid1 = 0x2; -cpu->isar.reset_pmcr_el0 = 0x41013000; -cpu->clidr = 0x0a200023; -cpu->ccsidr[0] = 0x701fe00a; /* 32KB L1 dcache */ -cpu->ccsidr[1] = 0x201fe012; /* 48KB L1 icache */ -cpu->ccsidr[2] = 0x70ffe07a; /* 2048KB L2 cache */ -cpu->dcz_blocksize = 4; /* 64 bytes */ -cpu->gic_num_lrs = 4; -cpu->gic_vpribits = 5; -cpu->gic_vprebits = 5; -cpu->gic_pribits = 5; -define_cortex_a72_a57_a53_cp_reginfo(cpu); -} - -static void aarch64_a53_initfn(Object *obj) -{ -ARMCPU *cpu = ARM_CPU(obj); - -cpu->dtb_compatible = "arm,cortex-a53"; -set_feature(&cpu->env, ARM_FEATURE_V8); -set_feature(&cpu->env, ARM_FEATURE_NEON); -set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER); -set_feature(&cpu->env, ARM_FEATURE_AARCH64); -set_feature(&cpu->env, ARM_FEATURE_CBAR_RO); -set_feature(&cpu->env, ARM_FEATURE_EL2); -set_feature(&cpu->env, ARM_FEATURE_EL3); -set_feature(&cpu->env, ARM_FEATURE_PMU); -cpu->kvm_target = QEMU_KVM_ARM_TARGET_CORTEX_A53; -cpu->midr = 0x410fd034; -cpu->revidr = 0x; -cpu->reset_fpsid = 0x41034070; -cpu->isar.mvfr0 = 0x10110222; -cpu->isar.mvfr1 = 0x1211; -cpu->isar.mvfr2 = 0x0043; -cpu->ctr = 0x84448004; /* L1Ip = VIPT */ -cpu->reset_sctlr = 0x00c50838; -cpu->isar.id_pfr0 = 0x0131; -cpu->isar.id_pfr1 = 0x00011011; -cpu->isar.id_dfr0 = 0x03010066; -cpu->id_afr0 = 0x; -cpu->isar.id_mmfr0 = 0x10101105; -cpu->isar.id_mmfr1 = 0x4000; -cpu->isar.id_mmfr2 = 0x0126; -cpu->isar.id_mmfr3 = 0x02102211; -cpu->isar.id_isar0 = 0x02101110; -cpu->isar.id_isar1 = 0x13112111; -cpu->isar.id_isar2 = 0x21232042; -cpu->isar.id_isar3 = 0x01112131; -cpu->isar.id_isar4 = 0x00011142; -cpu->isar.id_isar5 = 0x00011121; -cpu->isar.id_isar6 = 0; -cpu->isar.id_aa64pfr0 = 0x; -cpu->isar.id_aa64dfr0 = 0x10305106; -cpu->isar.id_aa64isar0 = 0x00011120; -cpu->isar.id_aa64mmfr0 = 0x1122; /* 40 bit physical addr */ -cpu->isar.dbgdidr = 0x3516d000; -
Re: [PULL 0/1] Linux user for 7.1 patches
On 8/12/22 03:33, Laurent Vivier wrote: The following changes since commit a6b1c53e79d08a99a28cc3e67a3e1a7c34102d6b: Merge tag 'linux-user-for-7.1-pull-request' of https://gitlab.com/laurent_vivier/qemu into staging (2022-08-10 10:26:57 -0700) are available in the Git repository at: https://gitlab.com/laurent_vivier/qemu.git tags/linux-user-for-7.1-pull-request for you to fetch changes up to dbbf89751b14aa5d281bad3af273e9ffaae82262: linux-user/aarch64: Reset target data on MADV_DONTNEED (2022-08-11 11:34:17 +0200) Pull request linux-user 20220812 Applied, thanks. Please update https://wiki.qemu.org/ChangeLog/7.1 as appropriate. r~ Vitaly Buka (1): linux-user/aarch64: Reset target data on MADV_DONTNEED accel/tcg/translate-all.c | 26 ++ include/exec/cpu-all.h| 1 + linux-user/mmap.c | 3 +++ 3 files changed, 30 insertions(+)
Re: [PATCH 00/62] target/arm: Implement FEAT_HAFDBS
On 8/12/22 09:31, Peter Maydell wrote: Is it possible to rearrange this patchset so the easy refactoring patches that do "use a struct to return values from get_phys_addr and friends" are at the front (ie before the stuff that touches core code) ? That way they're easy to take into the tree early while the rest of the series is still under review... Yes, I think so. r~
Re: [PATCH v3 1/4] accel/tcg: Invalidate translations when clearing PAGE_EXEC
On 8/12/22 08:02, Ilya Leoshkevich wrote: tb_lookup() skips get_page_addr_code() if tb is found in tb_jmp_cache. I assume it's a bug? Yes, I think so. I've rearranged that for other reasons, and so may have inadvertently fix this. I'll post the in-progress work in a moment. r~
[PATCH for-7.2 00/21] accel/tcg: minimize tlb lookups during translate + user-only PROT_EXEC fixes
This is part of a larger body of work, but in the process of reorganizing I was reminded that PROT_EXEC wasn't being enforced properly for user-only. As this has come up in the context of some of Ilya's patches, I thought I'd go ahead and post this part. r~ Ilya Leoshkevich (1): accel/tcg: Introduce is_same_page() Richard Henderson (20): linux-user/arm: Mark the commpage executable linux-user/hppa: Allocate page zero as a commpage linux-user/x86_64: Allocate vsyscall page as a commpage linux-user: Honor PT_GNU_STACK tests/tcg/i386: Move smc_code2 to an executable section accel/tcg: Remove PageDesc code_bitmap accel/tcg: Use bool for page_find_alloc accel/tcg: Merge tb_htable_lookup into caller accel/tcg: Move qemu_ram_addr_from_host_nofail to physmem.c accel/tcg: Properly implement get_page_addr_code for user-only accel/tcg: Use probe_access_internal for softmmu get_page_addr_code_hostp accel/tcg: Add nofault parameter to get_page_addr_code_hostp accel/tcg: Unlock mmap_lock after longjmp accel/tcg: Hoist get_page_addr_code out of tb_lookup accel/tcg: Hoist get_page_addr_code out of tb_gen_code accel/tcg: Raise PROT_EXEC exception early accel/tcg: Remove translator_ldsw accel/tcg: Add pc and host_pc params to gen_intermediate_code accel/tcg: Add fast path for translator_ld* accel/tcg: Use DisasContextBase in plugin_gen_tb_start accel/tcg/internal.h | 7 +- include/elf.h | 1 + include/exec/cpu-common.h | 1 + include/exec/exec-all.h | 87 +--- include/exec/plugin-gen.h | 7 +- include/exec/translator.h | 85 linux-user/arm/target_cpu.h | 4 +- linux-user/qemu.h | 1 + accel/tcg/cpu-exec.c | 184 ++ accel/tcg/cputlb.c| 93 + accel/tcg/plugin-gen.c| 23 +++-- accel/tcg/translate-all.c | 120 -- accel/tcg/translator.c| 122 +- accel/tcg/user-exec.c | 15 +++ linux-user/elfload.c | 80 ++- softmmu/physmem.c | 12 +++ target/alpha/translate.c | 5 +- target/arm/translate.c| 5 +- target/avr/translate.c| 5 +- target/cris/translate.c | 5 +- target/hexagon/translate.c| 6 +- target/hppa/translate.c | 5 +- target/i386/tcg/translate.c | 7 +- target/loongarch/translate.c | 6 +- target/m68k/translate.c | 5 +- target/microblaze/translate.c | 5 +- target/mips/tcg/translate.c | 5 +- target/nios2/translate.c | 5 +- target/openrisc/translate.c | 6 +- target/ppc/translate.c| 5 +- target/riscv/translate.c | 5 +- target/rx/translate.c | 5 +- target/s390x/tcg/translate.c | 5 +- target/sh4/translate.c| 5 +- target/sparc/translate.c | 5 +- target/tricore/translate.c| 6 +- target/xtensa/translate.c | 6 +- tests/tcg/i386/test-i386.c| 2 +- 38 files changed, 532 insertions(+), 424 deletions(-) -- 2.34.1
[PATCH for-7.2 05/21] tests/tcg/i386: Move smc_code2 to an executable section
We're about to start validating PAGE_EXEC, which means that we've got to put this code into a section that is both writable and executable. Note that this test did not run on hardware beforehand either. Signed-off-by: Richard Henderson --- tests/tcg/i386/test-i386.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/tcg/i386/test-i386.c b/tests/tcg/i386/test-i386.c index ac8d5a3c1f..e6b308a2c0 100644 --- a/tests/tcg/i386/test-i386.c +++ b/tests/tcg/i386/test-i386.c @@ -1998,7 +1998,7 @@ uint8_t code[] = { 0xc3, /* ret */ }; -asm(".section \".data\"\n" +asm(".section \".data_x\",\"awx\"\n" "smc_code2:\n" "movl 4(%esp), %eax\n" "movl %eax, smc_patch_addr2 + 1\n" -- 2.34.1
[PATCH for-7.2 01/21] linux-user/arm: Mark the commpage executable
We're about to start validating PAGE_EXEC, which means that we've got to mark the commpage executable. We had been placing the commpage outside of reserved_va, which was incorrect and lead to an abort. Signed-off-by: Richard Henderson --- linux-user/arm/target_cpu.h | 4 ++-- linux-user/elfload.c| 6 +- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/linux-user/arm/target_cpu.h b/linux-user/arm/target_cpu.h index 709d19bc9e..89ba274cfc 100644 --- a/linux-user/arm/target_cpu.h +++ b/linux-user/arm/target_cpu.h @@ -34,9 +34,9 @@ static inline unsigned long arm_max_reserved_va(CPUState *cs) } else { /* * We need to be able to map the commpage. - * See validate_guest_space in linux-user/elfload.c. + * See init_guest_commpage in linux-user/elfload.c. */ -return 0xul; +return 0xul; } } #define MAX_RESERVED_VA arm_max_reserved_va diff --git a/linux-user/elfload.c b/linux-user/elfload.c index ce902dbd56..3e3dc02499 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -398,7 +398,8 @@ enum { static bool init_guest_commpage(void) { -void *want = g2h_untagged(HI_COMMPAGE & -qemu_host_page_size); +abi_ptr commpage = HI_COMMPAGE & -qemu_host_page_size; +void *want = g2h_untagged(commpage); void *addr = mmap(want, qemu_host_page_size, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0); @@ -417,6 +418,9 @@ static bool init_guest_commpage(void) perror("Protecting guest commpage"); exit(EXIT_FAILURE); } + +page_set_flags(commpage, commpage + qemu_host_page_size, + PAGE_READ | PAGE_EXEC | PAGE_VALID); return true; } -- 2.34.1
[PATCH for-7.2 18/21] accel/tcg: Remove translator_ldsw
The only user can easily use translator_lduw and adjust the type to signed during the return. Signed-off-by: Richard Henderson --- include/exec/translator.h | 1 - target/i386/tcg/translate.c | 2 +- 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/include/exec/translator.h b/include/exec/translator.h index 0d0bf3a31e..45b9268ca4 100644 --- a/include/exec/translator.h +++ b/include/exec/translator.h @@ -178,7 +178,6 @@ bool translator_use_goto_tb(DisasContextBase *db, target_ulong dest); #define FOR_EACH_TRANSLATOR_LD(F) \ F(translator_ldub, uint8_t, cpu_ldub_code, /* no swap */) \ -F(translator_ldsw, int16_t, cpu_ldsw_code, bswap16) \ F(translator_lduw, uint16_t, cpu_lduw_code, bswap16)\ F(translator_ldl, uint32_t, cpu_ldl_code, bswap32) \ F(translator_ldq, uint64_t, cpu_ldq_code, bswap64) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index b7972f0ff5..a23417d058 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -2033,7 +2033,7 @@ static inline uint8_t x86_ldub_code(CPUX86State *env, DisasContext *s) static inline int16_t x86_ldsw_code(CPUX86State *env, DisasContext *s) { -return translator_ldsw(env, &s->base, advance_pc(env, s, 2)); +return translator_lduw(env, &s->base, advance_pc(env, s, 2)); } static inline uint16_t x86_lduw_code(CPUX86State *env, DisasContext *s) -- 2.34.1
[PATCH for-7.2 04/21] linux-user: Honor PT_GNU_STACK
Map the stack executable if required by default or on demand. Signed-off-by: Richard Henderson --- include/elf.h| 1 + linux-user/qemu.h| 1 + linux-user/elfload.c | 19 ++- 3 files changed, 20 insertions(+), 1 deletion(-) diff --git a/include/elf.h b/include/elf.h index 3a4bcb646a..3d6b9062c0 100644 --- a/include/elf.h +++ b/include/elf.h @@ -31,6 +31,7 @@ typedef int64_t Elf64_Sxword; #define PT_LOPROC 0x7000 #define PT_HIPROC 0x7fff +#define PT_GNU_STACK (PT_LOOS + 0x474e551) #define PT_GNU_PROPERTY (PT_LOOS + 0x474e553) #define PT_MIPS_REGINFO 0x7000 diff --git a/linux-user/qemu.h b/linux-user/qemu.h index 7d90de1b15..e2e93fbd1d 100644 --- a/linux-user/qemu.h +++ b/linux-user/qemu.h @@ -48,6 +48,7 @@ struct image_info { uint32_telf_flags; int personality; abi_ulong alignment; +boolexec_stack; /* Generic semihosting knows about these pointers. */ abi_ulong arg_strings; /* strings for argv */ diff --git a/linux-user/elfload.c b/linux-user/elfload.c index e315155dad..b1169ca6df 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -232,6 +232,7 @@ static bool init_guest_commpage(void) #define ELF_ARCHEM_386 #define ELF_PLATFORM get_elf_platform() +#define EXSTACK_DEFAULT true static const char *get_elf_platform(void) { @@ -308,6 +309,7 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, const CPUX86State *en #define ELF_ARCHEM_ARM #define ELF_CLASS ELFCLASS32 +#define EXSTACK_DEFAULT true static inline void init_thread(struct target_pt_regs *regs, struct image_info *infop) @@ -776,6 +778,7 @@ static inline void init_thread(struct target_pt_regs *regs, #else #define ELF_CLASS ELFCLASS32 +#define EXSTACK_DEFAULT true #endif @@ -973,6 +976,7 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, const CPUPPCState *en #define ELF_CLASS ELFCLASS64 #define ELF_ARCHEM_LOONGARCH +#define EXSTACK_DEFAULT true #define elf_check_arch(x) ((x) == EM_LOONGARCH) @@ -1068,6 +1072,7 @@ static uint32_t get_elf_hwcap(void) #define ELF_CLASS ELFCLASS32 #endif #define ELF_ARCHEM_MIPS +#define EXSTACK_DEFAULT true #ifdef TARGET_ABI_MIPSN32 #define elf_check_abi(x) ((x) & EF_MIPS_ABI2) @@ -1806,6 +1811,10 @@ static inline void init_thread(struct target_pt_regs *regs, #define bswaptls(ptr) bswap32s(ptr) #endif +#ifndef EXSTACK_DEFAULT +#define EXSTACK_DEFAULT false +#endif + #include "elf.h" /* We must delay the following stanzas until after "elf.h". */ @@ -2081,6 +2090,7 @@ static abi_ulong setup_arg_pages(struct linux_binprm *bprm, struct image_info *info) { abi_ulong size, error, guard; +int prot; size = guest_stack_size; if (size < STACK_LOWER_LIMIT) { @@ -2091,7 +2101,11 @@ static abi_ulong setup_arg_pages(struct linux_binprm *bprm, guard = qemu_real_host_page_size(); } -error = target_mmap(0, size + guard, PROT_READ | PROT_WRITE, +prot = PROT_READ | PROT_WRITE; +if (info->exec_stack) { +prot |= PROT_EXEC; +} +error = target_mmap(0, size + guard, prot, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (error == -1) { perror("mmap stack"); @@ -2919,6 +2933,7 @@ static void load_elf_image(const char *image_name, int image_fd, */ loaddr = -1, hiaddr = 0; info->alignment = 0; +info->exec_stack = EXSTACK_DEFAULT; for (i = 0; i < ehdr->e_phnum; ++i) { struct elf_phdr *eppnt = phdr + i; if (eppnt->p_type == PT_LOAD) { @@ -2961,6 +2976,8 @@ static void load_elf_image(const char *image_name, int image_fd, if (!parse_elf_properties(image_fd, info, eppnt, bprm_buf, &err)) { goto exit_errmsg; } +} else if (eppnt->p_type == PT_GNU_STACK) { +info->exec_stack = eppnt->p_flags & PF_X; } } -- 2.34.1
[PATCH for-7.2 02/21] linux-user/hppa: Allocate page zero as a commpage
We're about to start validating PAGE_EXEC, which means that we've got to mark page zero executable. We had been special casing this entirely within translate. Signed-off-by: Richard Henderson --- linux-user/elfload.c | 34 +++--- 1 file changed, 31 insertions(+), 3 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 3e3dc02499..29d910c4cc 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -1646,6 +1646,34 @@ static inline void init_thread(struct target_pt_regs *regs, regs->gr[31] = infop->entry; } +#define LO_COMMPAGE 0 + +static bool init_guest_commpage(void) +{ +void *want = g2h_untagged(LO_COMMPAGE); +void *addr = mmap(want, qemu_host_page_size, PROT_NONE, + MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0); + +if (addr == MAP_FAILED) { +perror("Allocating guest commpage"); +exit(EXIT_FAILURE); +} +if (addr != want) { +return false; +} + +/* + * On Linux, page zero is normally marked execute only + gateway. + * Normal read or write is supposed to fail (thus PROT_NONE above), + * but specific offsets have kernel code mapped to raise permissions + * and implement syscalls. Here, simply mark the page executable. + * Special case the entry points during translation (see do_page_zero). + */ +page_set_flags(LO_COMMPAGE, LO_COMMPAGE + TARGET_PAGE_SIZE, + PAGE_EXEC | PAGE_VALID); +return true; +} + #endif /* TARGET_HPPA */ #ifdef TARGET_XTENSA @@ -2326,12 +2354,12 @@ static abi_ulong create_elf_tables(abi_ulong p, int argc, int envc, } #if defined(HI_COMMPAGE) -#define LO_COMMPAGE 0 +#define LO_COMMPAGE -1 #elif defined(LO_COMMPAGE) #define HI_COMMPAGE 0 #else #define HI_COMMPAGE 0 -#define LO_COMMPAGE 0 +#define LO_COMMPAGE -1 #define init_guest_commpage() true #endif @@ -2555,7 +2583,7 @@ static void pgb_static(const char *image_name, abi_ulong orig_loaddr, } else { offset = -(HI_COMMPAGE & -align); } -} else if (LO_COMMPAGE != 0) { +} else if (LO_COMMPAGE != -1) { loaddr = MIN(loaddr, LO_COMMPAGE & -align); } -- 2.34.1
[PATCH for-7.2 03/21] linux-user/x86_64: Allocate vsyscall page as a commpage
We're about to start validating PAGE_EXEC, which means that we've got to the vsyscall page executable. We had been special casing this entirely within translate. Signed-off-by: Richard Henderson --- linux-user/elfload.c | 21 + 1 file changed, 21 insertions(+) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 29d910c4cc..e315155dad 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -195,6 +195,27 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, const CPUX86State *en (*regs)[26] = tswapreg(env->segs[R_GS].selector & 0x); } +#define HI_COMMPAGE TARGET_VSYSCALL_PAGE + +static bool init_guest_commpage(void) +{ +/* + * The vsyscall page is at a high negative address aka kernel space, + * which means that we cannot actually allocate it with target_mmap. + * We still should be able to use page_set_flags, unless the user + * has specified -R reserved_va, which would trigger an assert(). + */ +if (reserved_va != 0 && +TARGET_VSYSCALL_PAGE + TARGET_PAGE_SIZE >= reserved_va) { +error_report("Cannot allocate vsyscall page"); +exit(EXIT_FAILURE); +} +page_set_flags(TARGET_VSYSCALL_PAGE, + TARGET_VSYSCALL_PAGE + TARGET_PAGE_SIZE, + PAGE_EXEC | PAGE_VALID); +return true; +} + #else #define ELF_START_MMAP 0x8000 -- 2.34.1
[PATCH for-7.2 08/21] accel/tcg: Merge tb_htable_lookup into caller
This function is used only once, so merge it into its only caller, tb_lookup. This requires moving the support routine, tb_lookup_cmp, and its private data structure, tb_desc, up in the file. Signed-off-by: Richard Henderson --- include/exec/exec-all.h | 3 - accel/tcg/cpu-exec.c| 134 +++- 2 files changed, 64 insertions(+), 73 deletions(-) diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h index 311e5fb422..e7e30d55b8 100644 --- a/include/exec/exec-all.h +++ b/include/exec/exec-all.h @@ -552,9 +552,6 @@ void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr, MemTxAttrs attrs); #endif void tb_flush(CPUState *cpu); void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr); -TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc, - target_ulong cs_base, uint32_t flags, - uint32_t cflags); void tb_set_jmp_target(TranslationBlock *tb, int n, uintptr_t addr); /* GETPC is the true target of the return instruction that we'll execute. */ diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c index a565a3f8ec..f6c0c0aff6 100644 --- a/accel/tcg/cpu-exec.c +++ b/accel/tcg/cpu-exec.c @@ -170,19 +170,60 @@ uint32_t curr_cflags(CPUState *cpu) return cflags; } -/* Might cause an exception, so have a longjmp destination ready */ -static inline TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc, - target_ulong cs_base, - uint32_t flags, uint32_t cflags) +struct tb_desc { +target_ulong pc; +target_ulong cs_base; +CPUArchState *env; +tb_page_addr_t phys_page1; +uint32_t flags; +uint32_t cflags; +uint32_t trace_vcpu_dstate; +}; + +static bool tb_lookup_cmp(const void *p, const void *d) { +const TranslationBlock *tb = p; +const struct tb_desc *desc = d; + +if (tb->pc == desc->pc && +tb->page_addr[0] == desc->phys_page1 && +tb->cs_base == desc->cs_base && +tb->flags == desc->flags && +tb->trace_vcpu_dstate == desc->trace_vcpu_dstate && +tb_cflags(tb) == desc->cflags) { +/* check next page if needed */ +if (tb->page_addr[1] == -1) { +return true; +} else { +tb_page_addr_t phys_page2; +target_ulong virt_page2; + +virt_page2 = (desc->pc & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE; +phys_page2 = get_page_addr_code(desc->env, virt_page2); +if (tb->page_addr[1] == phys_page2) { +return true; +} +} +} +return false; +} + +/* Might cause an exception, so have a longjmp destination ready */ +static TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc, + target_ulong cs_base, + uint32_t flags, uint32_t cflags) +{ +CPUArchState *env = cpu->env_ptr; TranslationBlock *tb; -uint32_t hash; +tb_page_addr_t phys_pc; +struct tb_desc desc; +uint32_t jmp_hash, tb_hash; /* we should never be trying to look up an INVALID tb */ tcg_debug_assert(!(cflags & CF_INVALID)); -hash = tb_jmp_cache_hash_func(pc); -tb = qatomic_rcu_read(&cpu->tb_jmp_cache[hash]); +jmp_hash = tb_jmp_cache_hash_func(pc); +tb = qatomic_rcu_read(&cpu->tb_jmp_cache[jmp_hash]); if (likely(tb && tb->pc == pc && @@ -192,11 +233,25 @@ static inline TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc, tb_cflags(tb) == cflags)) { return tb; } -tb = tb_htable_lookup(cpu, pc, cs_base, flags, cflags); + +desc.env = env; +desc.cs_base = cs_base; +desc.flags = flags; +desc.cflags = cflags; +desc.trace_vcpu_dstate = *cpu->trace_dstate; +desc.pc = pc; +phys_pc = get_page_addr_code(desc.env, pc); +if (phys_pc == -1) { +return NULL; +} +desc.phys_page1 = phys_pc & TARGET_PAGE_MASK; +tb_hash = tb_hash_func(phys_pc, pc, flags, cflags, *cpu->trace_dstate); +tb = qht_lookup_custom(&tb_ctx.htable, &desc, tb_hash, tb_lookup_cmp); if (tb == NULL) { return NULL; } -qatomic_set(&cpu->tb_jmp_cache[hash], tb); + +qatomic_set(&cpu->tb_jmp_cache[jmp_hash], tb); return tb; } @@ -487,67 +542,6 @@ void cpu_exec_step_atomic(CPUState *cpu) end_exclusive(); } -struct tb_desc { -target_ulong pc; -target_ulong cs_base; -CPUArchState *env; -tb_page_addr_t phys_page1; -uint32_t flags; -uint32_t cflags; -uint32_t trace_vcpu_dstate; -}; - -static bool tb_lookup_cmp(const void *p, const void *d) -{ -const TranslationBlock *tb = p; -const struct tb_desc *desc = d; - -if (tb->pc == desc->pc && -tb->page_addr[0] == desc->phys_page1 && -tb->cs_base == desc->cs_base && -tb->flags == desc->flags && -
[PATCH for-7.2 07/21] accel/tcg: Use bool for page_find_alloc
Bool is more appropriate type for the alloc parameter. Signed-off-by: Richard Henderson --- accel/tcg/translate-all.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c index cf99b2b876..65a23f47d6 100644 --- a/accel/tcg/translate-all.c +++ b/accel/tcg/translate-all.c @@ -464,7 +464,7 @@ void page_init(void) #endif } -static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc) +static PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc) { PageDesc *pd; void **lp; @@ -532,11 +532,11 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc) static inline PageDesc *page_find(tb_page_addr_t index) { -return page_find_alloc(index, 0); +return page_find_alloc(index, false); } static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1, - PageDesc **ret_p2, tb_page_addr_t phys2, int alloc); + PageDesc **ret_p2, tb_page_addr_t phys2, bool alloc); /* In user-mode page locks aren't used; mmap_lock is enough */ #ifdef CONFIG_USER_ONLY @@ -650,7 +650,7 @@ static inline void page_unlock(PageDesc *pd) /* lock the page(s) of a TB in the correct acquisition order */ static inline void page_lock_tb(const TranslationBlock *tb) { -page_lock_pair(NULL, tb->page_addr[0], NULL, tb->page_addr[1], 0); +page_lock_pair(NULL, tb->page_addr[0], NULL, tb->page_addr[1], false); } static inline void page_unlock_tb(const TranslationBlock *tb) @@ -839,7 +839,7 @@ void page_collection_unlock(struct page_collection *set) #endif /* !CONFIG_USER_ONLY */ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1, - PageDesc **ret_p2, tb_page_addr_t phys2, int alloc) + PageDesc **ret_p2, tb_page_addr_t phys2, bool alloc) { PageDesc *p1, *p2; tb_page_addr_t page1; @@ -1289,7 +1289,7 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc, * Note that inserting into the hash table first isn't an option, since * we can only insert TBs that are fully initialized. */ -page_lock_pair(&p, phys_pc, &p2, phys_page2, 1); +page_lock_pair(&p, phys_pc, &p2, phys_page2, true); tb_page_add(p, tb, 0, phys_pc & TARGET_PAGE_MASK); if (p2) { tb_page_add(p2, tb, 1, phys_page2); @@ -2224,7 +2224,7 @@ void page_set_flags(target_ulong start, target_ulong end, int flags) for (addr = start, len = end - start; len != 0; len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) { -PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, 1); +PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, true); /* If the write protection bit is set, then we invalidate the code inside. */ -- 2.34.1
[PATCH for-7.2 06/21] accel/tcg: Remove PageDesc code_bitmap
This bitmap is created and discarded immediately. We gain nothing by its existence. Signed-off-by: Richard Henderson --- accel/tcg/translate-all.c | 78 ++- 1 file changed, 4 insertions(+), 74 deletions(-) diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c index ef62a199c7..cf99b2b876 100644 --- a/accel/tcg/translate-all.c +++ b/accel/tcg/translate-all.c @@ -101,21 +101,14 @@ #define assert_memory_lock() tcg_debug_assert(have_mmap_lock()) #endif -#define SMC_BITMAP_USE_THRESHOLD 10 - typedef struct PageDesc { /* list of TBs intersecting this ram page */ uintptr_t first_tb; -#ifdef CONFIG_SOFTMMU -/* in order to optimize self modifying code, we count the number - of lookups we do to a given page to use a bitmap */ -unsigned long *code_bitmap; -unsigned int code_write_count; -#else +#ifdef CONFIG_USER_ONLY unsigned long flags; void *target_data; #endif -#ifndef CONFIG_USER_ONLY +#ifdef CONFIG_SOFTMMU QemuSpin lock; #endif } PageDesc; @@ -906,17 +899,6 @@ void tb_htable_init(void) qht_init(&tb_ctx.htable, tb_cmp, CODE_GEN_HTABLE_SIZE, mode); } -/* call with @p->lock held */ -static inline void invalidate_page_bitmap(PageDesc *p) -{ -assert_page_locked(p); -#ifdef CONFIG_SOFTMMU -g_free(p->code_bitmap); -p->code_bitmap = NULL; -p->code_write_count = 0; -#endif -} - /* Set to NULL all the 'first_tb' fields in all PageDescs. */ static void page_flush_tb_1(int level, void **lp) { @@ -931,7 +913,6 @@ static void page_flush_tb_1(int level, void **lp) for (i = 0; i < V_L2_SIZE; ++i) { page_lock(&pd[i]); pd[i].first_tb = (uintptr_t)NULL; -invalidate_page_bitmap(pd + i); page_unlock(&pd[i]); } } else { @@ -1196,11 +1177,9 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, bool rm_from_page_list) if (rm_from_page_list) { p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS); tb_page_remove(p, tb); -invalidate_page_bitmap(p); if (tb->page_addr[1] != -1) { p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS); tb_page_remove(p, tb); -invalidate_page_bitmap(p); } } @@ -1245,35 +1224,6 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr) } } -#ifdef CONFIG_SOFTMMU -/* call with @p->lock held */ -static void build_page_bitmap(PageDesc *p) -{ -int n, tb_start, tb_end; -TranslationBlock *tb; - -assert_page_locked(p); -p->code_bitmap = bitmap_new(TARGET_PAGE_SIZE); - -PAGE_FOR_EACH_TB(p, tb, n) { -/* NOTE: this is subtle as a TB may span two physical pages */ -if (n == 0) { -/* NOTE: tb_end may be after the end of the page, but - it is not a problem */ -tb_start = tb->pc & ~TARGET_PAGE_MASK; -tb_end = tb_start + tb->size; -if (tb_end > TARGET_PAGE_SIZE) { -tb_end = TARGET_PAGE_SIZE; - } -} else { -tb_start = 0; -tb_end = ((tb->pc + tb->size) & ~TARGET_PAGE_MASK); -} -bitmap_set(p->code_bitmap, tb_start, tb_end - tb_start); -} -} -#endif - /* add the tb in the target page and protect it if necessary * * Called with mmap_lock held for user-mode emulation. @@ -1294,7 +1244,6 @@ static inline void tb_page_add(PageDesc *p, TranslationBlock *tb, page_already_protected = p->first_tb != (uintptr_t)NULL; #endif p->first_tb = (uintptr_t)tb | n; -invalidate_page_bitmap(p); #if defined(CONFIG_USER_ONLY) /* translator_loop() must have made all TB pages non-writable */ @@ -1356,10 +1305,8 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc, /* remove TB from the page(s) if we couldn't insert it */ if (unlikely(existing_tb)) { tb_page_remove(p, tb); -invalidate_page_bitmap(p); if (p2) { tb_page_remove(p2, tb); -invalidate_page_bitmap(p2); } tb = existing_tb; } @@ -1736,7 +1683,6 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages, #if !defined(CONFIG_USER_ONLY) /* if no code remaining, no need to continue to use slow writes */ if (!p->first_tb) { -invalidate_page_bitmap(p); tlb_unprotect_code(start); } #endif @@ -1832,24 +1778,8 @@ void tb_invalidate_phys_page_fast(struct page_collection *pages, } assert_page_locked(p); -if (!p->code_bitmap && -++p->code_write_count >= SMC_BITMAP_USE_THRESHOLD) { -build_page_bitmap(p); -} -if (p->code_bitmap) { -unsigned int nr; -unsigned long b; - -nr = start & ~TARGET_PAGE_MASK; -b = p->code_bitmap[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG - 1)); -if (b & ((1 << len) - 1)) { -goto do_invalidate; -} -} else { -do_invalidate: