date:20220812

On Thu, 11 Aug 2022 at 22:26, Pierre Muller  wrote:
>But as I use machines on which I am not admin,
> I needed to compile capstone locally, install it inside my home dir,
> and export PKG_CONFIG_PATH to allow the meson configuration
> to correctly detect this local installation...

Yes, like all of QEMU's many library dependencies, if you
aren't in a position to get them installed as system libraries
you'll need to sort them out locally as an individual user.
This, to the extent it's a pain, is a Linux distro problem,
not a QEMU problem.

>However, this is not optimal, especially if I want to be able to
> copy over the resulting binaries to other test machines,
> on which the configuration completely fails,
> like gcc188 for which the current clib is too old according to
> the configure requirements.
>>Is it really required to have glibc 2.56?

Do you mean glib or glibc ? The two are different...

> On several of these test machines, the version is much older...
> I tried to force acceptance by reducing the requirement,
> and it did compile successfully after that.
>
>Is there a solid reason to be so restrictive?

We try not to arbitrarily bump the version requirements,
so usually when we do there is a reason. More generally,
you should check out the "supported build platforms"
information at
https://www.qemu.org/docs/master/about/build-platforms.html
For Linux distros, we support the most recent major version,
and we support the major version prior to that for 2 years
after it's been superseded. We don't bump things like
glib version requirements if this would break one of
our supported build platforms.

So if you're running into a version problem with the
system version of a library, this means that you're
trying to build on a host platform which is not in our
supported set, probably because it is simply too old.
You can always expect that that is going to be potentially
awkward and that you're going to have to carry local
workarounds or build newer versions of dependent libraries
locally. The fix is to upgrade the machine to a newer
version of the distro.

thanks
-- PMM

[BUG] Qemu abort with error "kvm_mem_ioeventfd_add: error adding ioeventfd: File exists (17)"

2022-08-12 Thread Hao Wang via

Hi list,

When I did some tests in my virtual domain with live-attached virtio deivces, I 
got a coredump file of Qemu.

The error print from qemu is "kvm_mem_ioeventfd_add: error adding ioeventfd: 
File exists (17)".
And the call trace in the coredump file displays as below:
#0  0x89acecc8 in ?? () from /usr/lib64/libc.so.6
#1  0x89a8acbc in raise () from /usr/lib64/libc.so.6
#2  0x89a78d2c in abort () from /usr/lib64/libc.so.6
#3  0xbd7ccf1c in kvm_mem_ioeventfd_add (listener=, 
section=, match_data=, data=, 
e=) at ../accel/kvm/kvm-all.c:1607
#4  0xbd6e0304 in address_space_add_del_ioeventfds (fds_old_nb=164, 
fds_old=0x5c80a1d0, fds_new_nb=160, fds_new=0x5c565080, 
as=0xbdfa8810 )
at ../softmmu/memory.c:795
#5  address_space_update_ioeventfds (as=0xbdfa8810 ) 
at ../softmmu/memory.c:856
#6  0xbd6e24d8 in memory_region_commit () at ../softmmu/memory.c:1113
#7  0xbd6e25c4 in memory_region_transaction_commit () at 
../softmmu/memory.c:1144
#8  0xbd394eb4 in pci_bridge_update_mappings 
(br=br@entry=0xe755f7c0) at ../hw/pci/pci_bridge.c:248
#9  0xbd394f4c in pci_bridge_write_config (d=0xe755f7c0, 
address=44, val=, len=4) at ../hw/pci/pci_bridge.c:272
#10 0xbd39a928 in rp_write_config (d=0xe755f7c0, address=44, 
val=128, len=4) at ../hw/pci-bridge/pcie_root_port.c:39
#11 0xbd6df328 in memory_region_write_accessor (mr=0xe63898d0, 
addr=65580, value=, size=4, shift=, 
mask=, attrs=...) at ../softmmu/memory.c:494
#12 0xbd6dcb6c in access_with_adjusted_size (addr=addr@entry=65580, 
value=value@entry=0x817adc78, size=size@entry=4, access_size_min=, access_size_max=,
access_fn=access_fn@entry=0xbd6df284 , 
mr=mr@entry=0xe63898d0, attrs=attrs@entry=...) at ../softmmu/memory.c:556
#13 0xbd6e0dc8 in memory_region_dispatch_write 
(mr=mr@entry=0xe63898d0, addr=65580, data=, op=MO_32, 
attrs=attrs@entry=...) at ../softmmu/memory.c:1534
#14 0xbd6d0574 in flatview_write_continue (fv=fv@entry=0x5c02da00, 
addr=addr@entry=275146407980, attrs=attrs@entry=..., 
ptr=ptr@entry=0x8aa8c028, len=len@entry=4,
addr1=, l=, mr=mr@entry=0xe63898d0) at 
/usr/src/debug/qemu-6.2.0-226.aarch64/include/qemu/host-utils.h:165
#15 0xbd6d4584 in flatview_write (len=4, buf=0x8aa8c028, attrs=..., 
addr=275146407980, fv=0x5c02da00) at ../softmmu/physmem.c:3375
#16 address_space_write (as=, addr=275146407980, attrs=..., 
buf=buf@entry=0x8aa8c028, len=4) at ../softmmu/physmem.c:3467
#17 0xbd6d462c in address_space_rw (as=, addr=, attrs=..., attrs@entry=..., buf=buf@entry=0x8aa8c028, len=, is_write=)
at ../softmmu/physmem.c:3477
#18 0xbd7cf6e8 in kvm_cpu_exec (cpu=cpu@entry=0xe625dfd0) at 
../accel/kvm/kvm-all.c:2970
#19 0xbd7d09bc in kvm_vcpu_thread_fn (arg=arg@entry=0xe625dfd0) at 
../accel/kvm/kvm-accel-ops.c:49
#20 0xbd94ccd8 in qemu_thread_start (args=) at 
../util/qemu-thread-posix.c:559


By printing more info in the coredump file, I found that the addr of 
fds_old[146] and fds_new[146] are same, but fds_old[146] belonged to a 
live-attached virtio-scsi device while fds_new[146] was owned by another 
live-attached virtio-net.
The reason why addr conflicted was then been found from vm's console log. Just 
before qemu aborted, the guest kernel crashed and kdump.service booted the 
dump-capture kernel where re-alloced address for the devices.
Because those virtio devices were live-attached after vm creating, different 
addr may been assigned to them in the dump-capture kernel:

the initial kernel booting log:
[1.663297] pci :00:02.1: BAR 14: assigned [mem 0x1190-0x11af]
[1.664560] pci :00:02.1: BAR 15: assigned [mem 
0x800180-0x80019f 64bit pref]

the dump-capture kernel booting log:
[1.845211] pci :00:02.0: BAR 14: assigned [mem 0x1190-0x11bf]
[1.846542] pci :00:02.0: BAR 15: assigned [mem 
0x800180-0x8001af 64bit pref]


I think directly aborting the qemu process may not be the best choice in this 
case cuz it will interrupt the work of kdump.service so that failed to generate 
memory dump of the crashed guest kernel.
Perhaps, IMO, the error could be simply ignored in this case and just let kdump 
to reboot the system after memory-dump finishing, but I failed to find a 
suitable judgment in the codes.

Any solution for this problem? Hope I can get some helps here.

Hao

Re: [PATCH for-7.1?] Fix some typos in documentation (most of them found by codespell)

2022-08-12 Thread Hongren Zheng

On Fri, Aug 12, 2022 at 09:56:42AM +0200, Stefan Weil wrote:
> diff --git a/docs/system/devices/canokey.rst b/docs/system/devices/canokey.rst
> index c2c58ae3e7..cfa6186e48 100644
> --- a/docs/system/devices/canokey.rst
> +++ b/docs/system/devices/canokey.rst
> @@ -28,9 +28,9 @@ With the same software configuration as a hardware key,
>  the guest OS can use all the functionalities of a secure key as if
>  there was actually an hardware key plugged in.
>  
> -CanoKey QEMU provides much convenience for debuging:
> +CanoKey QEMU provides much convenience for debugging:
>  
> -* libcanokey-qemu supports debuging output thus developers can
> +* libcanokey-qemu supports debugging output thus developers can
>inspect what happens inside a secure key
>  * CanoKey QEMU supports trace event thus event
>  * QEMU USB stack supports pcap thus USB packet between the guest
> @@ -102,8 +102,8 @@ and find CanoKey QEMU there:
>  
>  You may setup the key as guided in [6]_. The console for the key is at [7]_.
>  
> -Debuging
> -
> +Debugging
> +=
>  
>  CanoKey QEMU consists of two parts, ``libcanokey-qemu.so`` and ``canokey.c``,
>  the latter of which resides in QEMU. The former provides core functionality

Reviewed-by: Hongren (Zenithal) Zheng

Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-08-12 Thread Nikunj A. Dadhania




On 12/08/22 12:48, Gupta, Pankaj wrote:
> 
>>
>> However, fallocate() preallocates full guest memory before starting the 
>> guest.
>> With this behaviour guest memory is *not* demand pinned. Is there a way 
>> to
>> prevent fallocate() from reserving full guest memory?
>
> Isn't the pinning being handled by the corresponding host memory backend 
> with mmu > notifier and architecture support while doing the memory 
> operations e.g page> migration and swapping/reclaim (not supported 
> currently AFAIU). But yes, we need> to allocate entire guest memory with 
> the new flags MEMFILE_F_{UNMOVABLE, UNRECLAIMABLE etc}.

 That is correct, but the question is when does the memory allocated, as 
 these flags are set,
 memory is neither moved nor reclaimed. In current scenario, if I start a 
 32GB guest, all 32GB is
 allocated.
>>>
>>> I guess so if guest memory is private by default.
>>>
>>> Other option would be to allocate memory as shared by default and
>>> handle on demand allocation and RMPUPDATE with page state change event. But 
>>> still that would be done at guest boot time, IIUC.
>>
>> Sorry! Don't want to hijack the other thread so replying here.
>>
>> I thought the question is for SEV SNP. For SEV, maybe the hypercall with the 
>> page state information can be used to allocate memory as we use it or 
>> something like quota based memory allocation (just thinking).
> 
> But all this would have considerable performance overhead (if by default 
> memory is shared) and used mostly at boot time. 

> So, preallocating memory (default memory private) seems better approach for 
> both SEV & SEV SNP with later page management (pinning, reclaim) taken care 
> by host memory backend & architecture together.

I am not sure how will pre-allocating memory help, even if guest would not use 
full memory it will be pre-allocated. Which if I understand correctly is not 
expected.

Regards
Nikunj

[PATCH for-7.1] docs/system/loongarch: Update the LoongArch document

2022-08-12 Thread Xiaojuan Yang

1. Add some information about how to boot the LoongArch virt
machine by uefi bios and linux kernel and how to access the
source code or binary file.
2. Move the explanation of LoongArch system emulation in the
target/loongarch/README to docs/system/loongarch/loongson3.rst


Signed-off-by: Xiaojuan Yang 
---
 docs/system/loongarch/loongson3.rst | 104 +---
 target/loongarch/README |  49 +
 2 files changed, 97 insertions(+), 56 deletions(-)

diff --git a/docs/system/loongarch/loongson3.rst 
b/docs/system/loongarch/loongson3.rst
index fa3acd01c0..66bcd28030 100644
--- a/docs/system/loongarch/loongson3.rst
+++ b/docs/system/loongarch/loongson3.rst
@@ -15,27 +15,115 @@ The ``virt`` machine supports:
 - Gpex host bridge
 - Ls7a RTC device
 - Ls7a IOAPIC device
-- Ls7a ACPI device
+- ACPI GED device
 - Fw_cfg device
 - PCI/PCIe devices
 - Memory device
-- CPU device. Type: Loongson-3A5000.
+- CPU device. Type: la464-loongarch-cpu.
 
 CPU and machine Type
 
 
 The ``qemu-system-loongarch64`` provides emulation for virt
 machine. You can specify the machine type ``virt`` and
-cpu type ``Loongson-3A5000``.
+cpu type ``la464-loongarch-cpu``.
 
 Boot options
 
 
-Now the ``virt`` machine can run test program in ELF format and the
-method of compiling is in target/loongarch/README.
+We can boot the LoongArch virt machine by specifying the uefi bios,
+initrd, and linux kernel. And those source codes and binary files
+can be accessed by following steps.
+
+(1) booting command:
+
+.. code-block:: bash
+
+  $ qemu-system-loongarch64 -machine virt -m 4G -cpu la464-loongarch-cpu \
+  -smp 1 -bios QEMU_EFI.fd -kernel vmlinuz.efi -initrd initrd.img \
+  -append "root=/dev/ram rdinit=/sbin/init consol e=ttyS0,115200" \
+  --nographic
+
+Note: The running speed may be a little slow, as the performance of our
+qemu and uefi bios is not perfect, and it is being fixed.
+
+(2) cross compiler tools:
+
+.. code-block:: bash
+
+  wget https://github.com/loongson/build-tools/releases/download/ \
+  2022.05.29/loongarch64-clfs-5.0-cross-tools-gcc-full.tar.xz
+
+  tar -vxf loongarch64-clfs-5.0-cross-tools-gcc-full.tar.xz
+
+(3) qemu compile configure option:
+
+.. code-block:: bash
+
+  ./configure --disable-rdma --disable-pvrdma --prefix=usr \
+  --target-list="loongarch64-softmmu" \
+  --disable-libiscsi --disable-libnfs --disable-libpmem \
+  --disable-glusterfs --enable-libusb --enable-usb-redir \
+  --disable-opengl --disable-xen --enable-spice \
+  --enable-debug --disable-capstone --disable-kvm \
+  --enable-profiler
+  make
+
+(4) uefi bios source code and compile method:
+
+.. code-block:: bash
+
+  git clone https://github.com/loongson/edk2-LoongarchVirt.git
+
+  cd edk2-LoongarchVirt
+
+  git submodule update --init
+
+  export PATH=$YOUR_COMPILER_PATH/bin:$PATH
+
+  export WORKSPACE=`pwd`
+
+  export PACKAGES_PATH=$WORKSPACE/edk2-LoongarchVirt
+
+  export GCC5_LOONGARCH64_PREFIX=loongarch64-unknown-linux-gnu-
+
+  edk2-LoongarchVirt/edksetup.sh
+
+  make -C edk2-LoongarchVirt/BaseTools
+
+  build --buildtarget=DEBUG --tagname=GCC5 --arch=LOONGARCH64  
--platform=OvmfPkg/LoongArchQemu/Loongson.dsc
+
+  build --buildtarget=RELEASE --tagname=GCC5 --arch=LOONGARCH64  
--platform=OvmfPkg/LoongArchQemu/Loongson.dsc
+
+The efi binary file path:
+
+  Build/LoongArchQemu/DEBUG_GCC5/FV/QEMU_EFI.fd
+
+  Build/LoongArchQemu/RELEASE_GCC5/FV/QEMU_EFI.fd
+
+(5) linux kernel source code and compile method:
 
 .. code-block:: bash
 
-  $ qemu-system-loongarch64 -machine virt -m 4G -cpu Loongson-3A5000 \
-  -smp 1 -kernel hello -monitor none -display none \
-  -chardev file,path=hello.out,id=output -serial chardev:output
+  git clone https://github.com/loongson/linux.git
+
+  export PATH=$YOUR_COMPILER_PATH/bin:$PATH
+
+  export LD_LIBRARY_PATH=$YOUR_COMPILER_PATH/lib:$LD_LIBRARY_PATH
+
+  export 
LD_LIBRARY_PATH=$YOUR_COMPILER_PATH/loongarch64-unknown-linux-gnu/lib/:$LD_LIBRARY_PATH
+
+  make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- 
loongson3_defconfig
+
+  make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu-
+
+  make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- install
+
+  make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- 
modules_install
+
+Note: The branch of linux source code is loongarch-next.
+
+(6) initrd file:
+
+  You can use busybox tool and the linux modules to make a initrd file. Or you 
can access the
+  binary files: https://github.com/yangxiaojuan-loongson/qemu-binary
diff --git a/target/loongarch/README b/target/loongarch/README
index 1823375d04..0b9dc0d40a 100644
--- a/target/loongarch/README
+++ b/target/loongarch/README
@@ -11,54 +11,7 @@
 
 - System emulation
 
-  Mainly emulate a virt 3A5000 board and ls7a bridge that is not exactly the 
same as the host.
-  3A5000 support multiple interrupt

Re: [PATCH 2/3] hw/ssi: fixup coverity issue

On Fri, Aug 12, 2022 at 02:21:40AM +, Wilfred Mallawa wrote:
> On Thu, 2022-08-11 at 16:23 +0200, Andrew Jones wrote:
> > On Thu, Aug 11, 2022 at 09:02:00AM +1000, Wilfred Mallawa wrote:
> > > From: Wilfred Mallawa 
> > > 
> > > This patch addresses the coverity issues specified in [1],
> > > as suggested, `FIELD_DP32()`/`FIELD_EX32()` macros have been
> > > implemented to clean up the code.
> > > 
> > > Additionally, the `EVENT_ENABLE` register is correctly updated
> > > to addr of `0x34`.
> > 
> > This sounds like separate patch material.
> > 
> > > 
> > > [1]
> > > https://www.mail-archive.com/qemu-devel@nongnu.org/msg887713.html
> > > 
> > > Fixes: Coverity CID 1488107
> > > 
> > > Signed-off-by: Wilfred Mallawa 
> > > ---
> > >  hw/ssi/ibex_spi_host.c | 141 +++--
> > > 
> > >  1 file changed, 78 insertions(+), 63 deletions(-)
> > > 
> > > diff --git a/hw/ssi/ibex_spi_host.c b/hw/ssi/ibex_spi_host.c
> > > index 601041d719..8c35bfa95f 100644
> > > --- a/hw/ssi/ibex_spi_host.c
> > > +++ b/hw/ssi/ibex_spi_host.c
> > > @@ -93,7 +93,7 @@ REG32(ERROR_STATUS, 0x30)
> > >  FIELD(ERROR_STATUS, CMDINVAL, 3, 1)
> > >  FIELD(ERROR_STATUS, CSIDINVAL, 4, 1)
> > >  FIELD(ERROR_STATUS, ACCESSINVAL, 5, 1)
> > > -REG32(EVENT_ENABLE, 0x30)
> > > +REG32(EVENT_ENABLE, 0x34)
> > >  FIELD(EVENT_ENABLE, RXFULL, 0, 1)
> > >  FIELD(EVENT_ENABLE, TXEMPTY, 1, 1)
> > >  FIELD(EVENT_ENABLE, RXWM, 2, 1)
> > > @@ -108,18 +108,20 @@ static inline uint8_t div4_round_up(uint8_t
> > > dividend)
> > >  
> > >  static void ibex_spi_rxfifo_reset(IbexSPIHostState *s)
> > >  {
> > > +    uint32_t data = s->regs[IBEX_SPI_HOST_STATUS];
> > >  /* Empty the RX FIFO and assert RXEMPTY */
> > >  fifo8_reset(&s->rx_fifo);
> > > -    s->regs[IBEX_SPI_HOST_STATUS] &= ~R_STATUS_RXFULL_MASK;
> > > -    s->regs[IBEX_SPI_HOST_STATUS] |= R_STATUS_RXEMPTY_MASK;
> > > +    data = FIELD_DP32(data, STATUS, RXEMPTY, 1);
> > > +    s->regs[IBEX_SPI_HOST_STATUS] = data;
> > >  }
> > >  
> > >  static void ibex_spi_txfifo_reset(IbexSPIHostState *s)
> > >  {
> > > +    uint32_t data = s->regs[IBEX_SPI_HOST_STATUS];
> > >  /* Empty the TX FIFO and assert TXEMPTY */
> > >  fifo8_reset(&s->tx_fifo);
> > > -    s->regs[IBEX_SPI_HOST_STATUS] &= ~R_STATUS_TXFULL_MASK;
> > > -    s->regs[IBEX_SPI_HOST_STATUS] |= R_STATUS_TXEMPTY_MASK;
> > > +    data = FIELD_DP32(data, STATUS, TXEMPTY, 1);
> > > +    s->regs[IBEX_SPI_HOST_STATUS] = data;
> > >  }
> > >  
> > >  static void ibex_spi_host_reset(DeviceState *dev)
> > > @@ -162,37 +164,41 @@ static void ibex_spi_host_reset(DeviceState
> > > *dev)
> > >   */
> > >  static void ibex_spi_host_irq(IbexSPIHostState *s)
> > >  {
> > > -    bool error_en = s->regs[IBEX_SPI_HOST_INTR_ENABLE]
> > > -    & R_INTR_ENABLE_ERROR_MASK;
> > > -    bool event_en = s->regs[IBEX_SPI_HOST_INTR_ENABLE]
> > > -    & R_INTR_ENABLE_SPI_EVENT_MASK;
> > > -    bool err_pending = s->regs[IBEX_SPI_HOST_INTR_STATE]
> > > -    & R_INTR_STATE_ERROR_MASK;
> > > -    bool status_pending = s->regs[IBEX_SPI_HOST_INTR_STATE]
> > > -    & R_INTR_STATE_SPI_EVENT_MASK;
> > > +    bool error_en = FIELD_EX32(s->regs[IBEX_SPI_HOST_INTR_ENABLE],
> > > +   INTR_ENABLE, ERROR);
> > > +
> > > +    bool event_en = FIELD_EX32(s->regs[IBEX_SPI_HOST_INTR_ENABLE],
> > > +   INTR_ENABLE, SPI_EVENT);
> > > +
> > > +    bool err_pending = FIELD_EX32(s-
> > > >regs[IBEX_SPI_HOST_INTR_STATE],
> > > +  INTR_STATE, ERROR);
> > > +
> > > +    bool status_pending = FIELD_EX32(s-
> > > >regs[IBEX_SPI_HOST_INTR_STATE],
> > > + INTR_STATE, SPI_EVENT);
> > 
> >  uint32_t reg = s->regs[IBEX_SPI_HOST_INTR_ENABLE];
> >  bool error_en = FIELD_EX32(reg, INTR_ENABLE, ERROR);
> >  bool event_en = FIELD_EX32(reg, INTR_ENABLE, SPI_EVENT);
> >  ...
> > 
> > would like nicer, IMHO.
> as per the comment below, do you mean to make all the conditions
> variables here? and substitute those for the if statements instead of
> the `FIELD_..` macros? 

For the above code, the suggestion I gave is what I'm thinking. For the
below code, keep the FIELD macros in place, but shrink the length of the
line by doing

 uint32_t enable = s->regs[IBEX_SPI_HOST_ERROR_ENABLE];
 uint32_t status = s->regs[IBEX_SPI_HOST_ERROR_STATUS];

at the top, and then

} else if (FIELD_EX32(enable, ERROR_ENABLE,  CMDBUSY) &&
   FIELD_EX32(status, ERROR_STATUS,  CMDBUSY)) {

...

> > 
> > 
> > > +
> > >  int err_irq = 0, event_irq = 0;
> > >  
> > >  /* Error IRQ enabled and Error IRQ Cleared */
> > >  if (error_en && !err_pending) {
> > >  /* Event enabled, Interrupt Test Error */
> > > -    if (s->regs[IBEX_SPI_HOST_INTR_TEST] &
> > > R_INTR_TEST_ERROR_MASK) {
> > > +    if (FIELD_EX32(s->regs[IBEX_SPI_HOST_INT

Re: qemu-system-aarch64: Failed to retrieve host CPU features

I've added some more relevant mailing lists to the cc.

On Fri, 12 Aug 2022 at 09:45, Vitaly Chikunov  wrote:
> On Fri, Aug 12, 2022 at 05:14:27AM +0300, Vitaly Chikunov wrote:
> > I noticed that we starting to get many errors like this:
> >
> >   qemu-system-aarch64: Failed to retrieve host CPU features
> >
> > Where many is 1-2% per run, depends on host, host is Kunpeng-920, and
> > Linux kernel is v5.15.59, but it started to appear months before that.
> >
> > strace shows in erroneous case:
> >
> >   1152244 ioctl(9, KVM_CREATE_VM, 0x30)   = -1 EINTR (Interrupted system 
> > call)
> >
> > And I see in target/arm/kvm.c:kvm_arm_create_scratch_host_vcpu:
> >
> > vmfd = ioctl(kvmfd, KVM_CREATE_VM, max_vm_pa_size);
> > if (vmfd < 0) {
> > goto err;
> > }
> >
> > Maybe it should restart ioctl on EINTR?
> >
> > I don't see EINTR documented in ioctl(2) nor in Linux'
> > Documentation/virt/kvm/api.rst for KVM_CREATE_VM, but for KVM_RUN it
> > says "an unmasked signal is pending".
>
> I am suggested that almost any blocking syscall could return EINTR, so I
> checked the strace log and it does not show evidence of arriving a signal,
> the log ends like this:
>
>   1152244 openat(AT_FDCWD, "/dev/kvm", O_RDWR|O_CLOEXEC) = 9
>   1152244 ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_ARM_VM_IPA_SIZE) = 48
>   1152244 ioctl(9, KVM_CREATE_VM, 0x30)   = -1 EINTR (Interrupted system call)
>   1152244 close(9)= 0
>   1152244 newfstatat(2, "", {st_dev=makedev(0, 0xd), st_ino=57869925, 
> st_mode=S_IFIFO|0600, st_nlink=1, st_uid=517, st_gid=517, st_blksize=4096, 
> st_blocks=0, st_size=0, st_atime=1660268019 /* 
> 2022-08-12T01:33:39.850436293+ */, st_atime_nsec=850436293, 
> st_mtime=1660268019 /* 2022-08-12T01:33:39.850436293+ */, 
> st_mtime_nsec=850436293, st_ctime=1660268019 /* 
> 2022-08-12T01:33:39.850436293+ */, st_ctime_nsec=850436293}, 
> AT_EMPTY_PATH) = 0
>   1152244 write(2, "qemu-system-aarch64: Failed to r"..., 58) = 58
>   1152244 exit_group(1)   = ?
>   1152245 <... clock_nanosleep resumed> ) = ?
>   1152245 +++ exited with 1 +++
>   1152244 +++ exited with 1 +++

KVM folks: should we expect that KVM_CREATE_VM might fail EINTR
and need retrying?

(I suspect the answer is "yes", given we do this in the generic
code in kvm-all.c.)

thanks
-- PMM

Re: [PATCH for-7.1] docs/system/loongarch: Update the LoongArch document

2022-08-12 Thread gaosong




On 2022/8/12 下午5:19, Xiaojuan Yang wrote:

1. Add some information about how to boot the LoongArch virt
machine by uefi bios and linux kernel and how to access the
source code or binary file.
2. Move the explanation of LoongArch system emulation in the
target/loongarch/README to docs/system/loongarch/loongson3.rst


Signed-off-by: Xiaojuan Yang 
---
  docs/system/loongarch/loongson3.rst | 104 +---
  target/loongarch/README |  49 +
  2 files changed, 97 insertions(+), 56 deletions(-)


Reviewed-by: Song Gao 

Thanks.
Song Gao

diff --git a/docs/system/loongarch/loongson3.rst 
b/docs/system/loongarch/loongson3.rst
index fa3acd01c0..66bcd28030 100644
--- a/docs/system/loongarch/loongson3.rst
+++ b/docs/system/loongarch/loongson3.rst
@@ -15,27 +15,115 @@ The ``virt`` machine supports:
  - Gpex host bridge
  - Ls7a RTC device
  - Ls7a IOAPIC device
-- Ls7a ACPI device
+- ACPI GED device
  - Fw_cfg device
  - PCI/PCIe devices
  - Memory device
-- CPU device. Type: Loongson-3A5000.
+- CPU device. Type: la464-loongarch-cpu.
  
  CPU and machine Type

  
  
  The ``qemu-system-loongarch64`` provides emulation for virt

  machine. You can specify the machine type ``virt`` and
-cpu type ``Loongson-3A5000``.
+cpu type ``la464-loongarch-cpu``.
  
  Boot options

  
  
-Now the ``virt`` machine can run test program in ELF format and the

-method of compiling is in target/loongarch/README.
+We can boot the LoongArch virt machine by specifying the uefi bios,
+initrd, and linux kernel. And those source codes and binary files
+can be accessed by following steps.
+
+(1) booting command:
+
+.. code-block:: bash
+
+  $ qemu-system-loongarch64 -machine virt -m 4G -cpu la464-loongarch-cpu \
+  -smp 1 -bios QEMU_EFI.fd -kernel vmlinuz.efi -initrd initrd.img \
+  -append "root=/dev/ram rdinit=/sbin/init consol e=ttyS0,115200" \
+  --nographic
+
+Note: The running speed may be a little slow, as the performance of our
+qemu and uefi bios is not perfect, and it is being fixed.
+
+(2) cross compiler tools:
+
+.. code-block:: bash
+
+  wget https://github.com/loongson/build-tools/releases/download/ \
+  2022.05.29/loongarch64-clfs-5.0-cross-tools-gcc-full.tar.xz
+
+  tar -vxf loongarch64-clfs-5.0-cross-tools-gcc-full.tar.xz
+
+(3) qemu compile configure option:
+
+.. code-block:: bash
+
+  ./configure --disable-rdma --disable-pvrdma --prefix=usr \
+  --target-list="loongarch64-softmmu" \
+  --disable-libiscsi --disable-libnfs --disable-libpmem \
+  --disable-glusterfs --enable-libusb --enable-usb-redir \
+  --disable-opengl --disable-xen --enable-spice \
+  --enable-debug --disable-capstone --disable-kvm \
+  --enable-profiler
+  make
+
+(4) uefi bios source code and compile method:
+
+.. code-block:: bash
+
+  git clone https://github.com/loongson/edk2-LoongarchVirt.git
+
+  cd edk2-LoongarchVirt
+
+  git submodule update --init
+
+  export PATH=$YOUR_COMPILER_PATH/bin:$PATH
+
+  export WORKSPACE=`pwd`
+
+  export PACKAGES_PATH=$WORKSPACE/edk2-LoongarchVirt
+
+  export GCC5_LOONGARCH64_PREFIX=loongarch64-unknown-linux-gnu-
+
+  edk2-LoongarchVirt/edksetup.sh
+
+  make -C edk2-LoongarchVirt/BaseTools
+
+  build --buildtarget=DEBUG --tagname=GCC5 --arch=LOONGARCH64  
--platform=OvmfPkg/LoongArchQemu/Loongson.dsc
+
+  build --buildtarget=RELEASE --tagname=GCC5 --arch=LOONGARCH64  
--platform=OvmfPkg/LoongArchQemu/Loongson.dsc
+
+The efi binary file path:
+
+  Build/LoongArchQemu/DEBUG_GCC5/FV/QEMU_EFI.fd
+
+  Build/LoongArchQemu/RELEASE_GCC5/FV/QEMU_EFI.fd
+
+(5) linux kernel source code and compile method:
  
  .. code-block:: bash
  
-  $ qemu-system-loongarch64 -machine virt -m 4G -cpu Loongson-3A5000 \

-  -smp 1 -kernel hello -monitor none -display none \
-  -chardev file,path=hello.out,id=output -serial chardev:output
+  git clone https://github.com/loongson/linux.git
+
+  export PATH=$YOUR_COMPILER_PATH/bin:$PATH
+
+  export LD_LIBRARY_PATH=$YOUR_COMPILER_PATH/lib:$LD_LIBRARY_PATH
+
+  export 
LD_LIBRARY_PATH=$YOUR_COMPILER_PATH/loongarch64-unknown-linux-gnu/lib/:$LD_LIBRARY_PATH
+
+  make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- 
loongson3_defconfig
+
+  make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu-
+
+  make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- install
+
+  make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- 
modules_install
+
+Note: The branch of linux source code is loongarch-next.
+
+(6) initrd file:
+
+  You can use busybox tool and the linux modules to make a initrd file. Or you 
can access the
+  binary files: https://github.com/yangxiaojuan-loongson/qemu-binary
diff --git a/target/loongarch/README b/target/loongarch/README
index 1823375d04..0b9dc0d40a 100644
--- a/target/loongarch/README
+++ b/target/loongarch/README
@@ -11,54 +11,7 @@
  
  - System emulation
  
-  Mainly emula

Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-08-12 Thread Gupta, Pankaj






However, fallocate() preallocates full guest memory before starting the guest.
With this behaviour guest memory is *not* demand pinned. Is there a way to
prevent fallocate() from reserving full guest memory?


Isn't the pinning being handled by the corresponding host memory backend with mmu > 
notifier and architecture support while doing the memory operations e.g page> 
migration and swapping/reclaim (not supported currently AFAIU). But yes, we need> to 
allocate entire guest memory with the new flags MEMFILE_F_{UNMOVABLE, UNRECLAIMABLE etc}.


That is correct, but the question is when does the memory allocated, as these 
flags are set,
memory is neither moved nor reclaimed. In current scenario, if I start a 32GB 
guest, all 32GB is
allocated.


I guess so if guest memory is private by default.

Other option would be to allocate memory as shared by default and
handle on demand allocation and RMPUPDATE with page state change event. But 
still that would be done at guest boot time, IIUC.


Sorry! Don't want to hijack the other thread so replying here.

I thought the question is for SEV SNP. For SEV, maybe the hypercall with the 
page state information can be used to allocate memory as we use it or something 
like quota based memory allocation (just thinking).


But all this would have considerable performance overhead (if by default memory 
is shared) and used mostly at boot time.



So, preallocating memory (default memory private) seems better approach for both SEV 
& SEV SNP with later page management (pinning, reclaim) taken care by host memory 
backend & architecture together.


I am not sure how will pre-allocating memory help, even if guest would not use 
full memory it will be pre-allocated. Which if I understand correctly is not 
expected.


For SEV I am also not very sure what would be the best way.
There could be a tradeoff between memory pinning and performance.
As I was also thinking about "Async page fault" aspect of guest
in my previous reply. Details needs to be figure out.

Quoting my previous reply here:
"Or maybe later we can think of something like allowing direct page 
fault on host memory access for *SEV* guest as there is no strict 
requirement for memory integrity guarantee and the performance overhead."


Thanks,
Pankaj

Re: [PATCH v2 0/5] hw/smbios: add core_count2 to smbios table type 4

2022-08-12 Thread Michael S. Tsirkin

On Sun, Jul 31, 2022 at 06:21:36PM +0200, Julia Suvorova wrote:
> The SMBIOS 3.0 specification provides the ability to reflect over
> 255 cores. The 64-bit entry point has been used for a while, but
> structure type 4 has not been updated before, so the dmidecode output
> looked like this (-smp 280):
> 
> Handle 0x0400, DMI type 4, 42 bytes
> Processor Information
> ...
> Core Count: 24
> Core Enabled: 24
> Thread Count: 1
> ...
> 
> Big update in the bios-tables-test as it couldn't work with SMBIOS 3.0.

Looks good to me. I tagged this but just to help make sure it's
not lost pls ping me after the release.

> v2:
> * generate tables type 4 of different sizes based on the
>   selected smbios version
> * use SmbiosEntryPoint* types instead of creating new constants
> * refactor smbios_cpu_test [Igor, Ani]
> * clarify signature check [Igor]
> * add comments with specifications and clarification of the structure 
> loop [Ani]
> 
> Julia Suvorova (5):
>   hw/smbios: add core_count2 to smbios table type 4
>   bios-tables-test: teach test to use smbios 3.0 tables
>   tests/acpi: allow changes for core_count2 test
>   bios-tables-test: add test for number of cores > 255
>   tests/acpi: update tables for new core count test
> 
>  hw/smbios/smbios_build.h |   9 +-
>  include/hw/firmware/smbios.h |  11 ++
>  hw/smbios/smbios.c   |  18 +++-
>  tests/qtest/bios-tables-test.c   | 148 ---
>  tests/data/acpi/q35/APIC.core-count2 | Bin 0 -> 2478 bytes
>  tests/data/acpi/q35/DSDT.core-count2 | Bin 0 -> 32414 bytes
>  tests/data/acpi/q35/FACP.core-count2 | Bin 0 -> 244 bytes
>  7 files changed, 144 insertions(+), 42 deletions(-)
>  create mode 100644 tests/data/acpi/q35/APIC.core-count2
>  create mode 100644 tests/data/acpi/q35/DSDT.core-count2
>  create mode 100644 tests/data/acpi/q35/FACP.core-count2
> 
> -- 
> 2.35.3

Re: [PATCH for-7.1?] Fix some typos in documentation (most of them found by codespell)

On Fri, 12 Aug 2022 at 08:59, Stefan Weil via  wrote:
>
> Signed-off-by: Stefan Weil 
> ---

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [PATCH for-7.1?] Fix some typos in documentation (most of them found by codespell)

On Fri, 12 Aug 2022 at 08:59, Stefan Weil via  wrote:
>
> Signed-off-by: Stefan Weil 
> ---
>  docs/about/deprecated.rst   |  2 +-
>  docs/specs/acpi_erst.rst|  4 ++--
>  docs/system/devices/canokey.rst |  8 
>  docs/system/devices/cxl.rst | 12 ++--
>  4 files changed, 13 insertions(+), 13 deletions(-)
>

Docs patches are safe for rc3. I have to do a target-arm
pullreq anyway for rc3, so I'll take this via that, unless
anybody objects.

thanks
-- PMM

Re: [PATCH] tests/unit: fix a -Wformat-trunction warning

On Wed, 10 Aug 2022 at 13:20,  wrote:
>
> From: Marc-André Lureau 
>
> ../tests/test-qobject-input-visitor.c: In function ‘test_visitor_in_list’:
> ../tests/test-qobject-input-visitor.c:454:49: warning: ‘%d’ directive output 
> may be truncated writing between 1 and 10 bytes into a region of size 6 
> [-Wformat-truncation=]
>   454 | snprintf(string, sizeof(string), "string%d", i);
>   | ^~
> ../tests/test-qobject-input-visitor.c:454:42: note: directive argument in the 
> range [0, 2147483606]
>   454 | snprintf(string, sizeof(string), "string%d", i);
>   |  ^~
> ../tests/test-qobject-input-visitor.c:454:9: note: ‘snprintf’ output between 
> 8 and 17 bytes into a destination of size 12
>   454 | snprintf(string, sizeof(string), "string%d", i);
>   | ^~~
>
> Not trying to be clever, this is called 3 times during tests,
> let simply use g_strdup_printf().
>
> Signed-off-by: Marc-André Lureau 

This is a pretty safe fix and compiler warnings seem like they're
worth fixing for rc3. Since I need to do a target-arm pullreq
anyway I'll take this via that, unless anybody objects.

(I've fixed up the commit message typo in my tree.)

thanks
-- PMM

Re: [PATCH] hw/arm/virt-acpi-build: Present the GICR structure properly for GICv4

On Fri, 12 Aug 2022 at 03:20, Zenghui Yu  wrote:
>
> With the introduction of the new TCG GICv4, build_madt() is badly broken
> as we do not present any GIC Redistributor structure in MADT for GICv4
> guests, so that they have no idea about where the Redistributor
> register frames are. This fixes a Linux guest crash at boot time with
> ACPI enabled and '-machine gic-version=4'.
>
> While at it, let's convert the remaining hard coded gic_version into
> enumeration VIRT_GIC_VERSION_2 for consistency.
>
> Signed-off-by: Zenghui Yu 

Oops, I missed the ACPI side of things when I added GICv4
support :-(

Crash-on-boot seems like a bug worth fixing for rc3...



Applied to target-arm.next, thanks.

-- PMM

Re: add qemu_fdt_setprop_strings() and use it in most places

On Tue, 9 Aug 2022 at 19:57, Ben Dooks  wrote:
>
> Add a helper for qemu_fdt_setprop_strings() to take a set of strings
> to put into a device-tree, which removes several open-coded methods
> such as setting an char arr[] = {..} or setting char val[] = "str\0str2";
>
> This is for hw/arm, hw/mips and hw/riscv as well as a couple of cores. It
> is not fully tested over all of those, I may not have caught all places
> this is to be replaced.

Hi; just as a minor process thing for next time you send a
patchset: the usual convention is that the 'cover letter'
email has a subject starting with a tag like "[PATCH v4 0/6]".
This makes it easier to pick the cover letter out when eyeballing
a set of subject lines, and also helps some of the automated
tooling (eg I think this is why https://patchew.org/QEMU/ didn't
notice this series). Usually 'git format-patch --cover-letter'
will get this right automatically for you.

thanks
-- PMM

[PULL 1/1] linux-user/aarch64: Reset target data on MADV_DONTNEED

2022-08-12 Thread Laurent Vivier

From: Vitaly Buka 

aarch64 stores MTE tags in target_date, and they should be reset by
MADV_DONTNEED.

Signed-off-by: Vitaly Buka 
Reviewed-by: Richard Henderson 
Message-Id: <20220711220028.2467290-1-vitalyb...@google.com>
[lv: fix code style issues]
Signed-off-by: Laurent Vivier 
---
 accel/tcg/translate-all.c | 26 ++
 include/exec/cpu-all.h|  1 +
 linux-user/mmap.c |  3 +++
 3 files changed, 30 insertions(+)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index ef62a199c7db..b83161a08190 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -2314,6 +2314,32 @@ void page_set_flags(target_ulong start, target_ulong 
end, int flags)
 }
 }
 
+void page_reset_target_data(target_ulong start, target_ulong end)
+{
+target_ulong addr, len;
+
+/*
+ * This function should never be called with addresses outside the
+ * guest address space.  If this assert fires, it probably indicates
+ * a missing call to h2g_valid.
+ */
+assert(end - 1 <= GUEST_ADDR_MAX);
+assert(start < end);
+assert_memory_lock();
+
+start = start & TARGET_PAGE_MASK;
+end = TARGET_PAGE_ALIGN(end);
+
+for (addr = start, len = end - start;
+ len != 0;
+ len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
+PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, 1);
+
+g_free(p->target_data);
+p->target_data = NULL;
+}
+}
+
 void *page_get_target_data(target_ulong address)
 {
 PageDesc *p = page_find(address >> TARGET_PAGE_BITS);
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index f5bda2c3caa7..491629b9ba7a 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -271,6 +271,7 @@ int walk_memory_regions(void *, walk_memory_regions_fn);
 
 int page_get_flags(target_ulong address);
 void page_set_flags(target_ulong start, target_ulong end, int flags);
+void page_reset_target_data(target_ulong start, target_ulong end);
 int page_check_range(target_ulong start, target_ulong len, int flags);
 
 /**
diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index edceaca4a8e1..048c4135af14 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -894,6 +894,9 @@ abi_long target_madvise(abi_ulong start, abi_ulong len_in, 
int advice)
 if (advice == MADV_DONTNEED &&
 can_passthrough_madv_dontneed(start, end)) {
 ret = get_errno(madvise(g2h_untagged(start), len, MADV_DONTNEED));
+if (ret == 0) {
+page_reset_target_data(start, start + len);
+}
 }
 mmap_unlock();
 
-- 
2.37.1

Re: [PATCH for-7.1] cutils: Add missing dyld(3) include on macOS

On Tue, 9 Aug 2022 at 23:22, Philippe Mathieu-Daudé via
 wrote:
>
> Commit 06680b15b4 moved qemu_*_exec_dir() to cutils but forgot
> to move the macOS dyld(3) include, resulting in the following
> error (when building with Homebrew GCC on macOS Monterey 12.4):
>
>   [313/1197] Compiling C object libqemuutil.a.p/util_cutils.c.o
>   FAILED: libqemuutil.a.p/util_cutils.c.o
>   ../../util/cutils.c:1039:13: error: implicit declaration of function 
> '_NSGetExecutablePath' [-Werror=implicit-function-declaration]
>1039 | if (_NSGetExecutablePath(fpath, &len) == 0) {
> | ^~~~
>   ../../util/cutils.c:1039:13: error: nested extern declaration of 
> '_NSGetExecutablePath' [-Werror=nested-externs]
>
> Fix by moving the include line to cutils.
>
> Fixes: 06680b15b4 ("include: move qemu_*_exec_dir() to cutils")
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Cc: Marc-André Lureau 
> Cc: Markus Armbruster 

I wonder why this doesn't show up with clang?

Anyway, obvious fix. I'll take it via target-arm.next for my
pull for rc3, unless anybody has a different preference.

thanks
-- PMM

[PULL 0/1] Linux user for 7.1 patches

2022-08-12 Thread Laurent Vivier

The following changes since commit a6b1c53e79d08a99a28cc3e67a3e1a7c34102d6b:

  Merge tag 'linux-user-for-7.1-pull-request' of 
https://gitlab.com/laurent_vivier/qemu into staging (2022-08-10 10:26:57 -0700)

are available in the Git repository at:

  https://gitlab.com/laurent_vivier/qemu.git 
tags/linux-user-for-7.1-pull-request

for you to fetch changes up to dbbf89751b14aa5d281bad3af273e9ffaae82262:

  linux-user/aarch64: Reset target data on MADV_DONTNEED (2022-08-11 11:34:17 
+0200)


Pull request linux-user 20220812



Vitaly Buka (1):
  linux-user/aarch64: Reset target data on MADV_DONTNEED

 accel/tcg/translate-all.c | 26 ++
 include/exec/cpu-all.h|  1 +
 linux-user/mmap.c |  3 +++
 3 files changed, 30 insertions(+)

-- 
2.37.1

[PATCH] osdep: Introduce qemu_get_fd() to wrap the common codes

2022-08-12 Thread Guoyi Tu


socket_get_fd() have much the same codes as monitor_fd_param(),
so qemu_get_fd() is introduced to implement the common logic.
now socket_get_fd() and monitor_fd_param() directly call this
function.

Signed-off-by: Guoyi Tu 
---
 include/qemu/osdep.h |  1 +
 monitor/misc.c   | 21 +
 util/osdep.c | 25 +
 util/qemu-sockets.c  | 17 +
 4 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index b1c161c035..b920f128a7 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -491,6 +491,7 @@ int qemu_open_old(const char *name, int flags, ...);
 int qemu_open(const char *name, int flags, Error **errp);
 int qemu_create(const char *name, int flags, mode_t mode, Error **errp);
 int qemu_close(int fd);
+int qemu_get_fd(Monitor *mon, const char *fdname, Error **errp);
 int qemu_unlink(const char *name);
 #ifndef _WIN32
 int qemu_dup_flags(int fd, int flags);
diff --git a/monitor/misc.c b/monitor/misc.c
index 3d2312ba8d..0d3372cf2b 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -1395,26 +1395,7 @@ void monitor_fdset_dup_fd_remove(int dup_fd)

 int monitor_fd_param(Monitor *mon, const char *fdname, Error **errp)
 {
-    int fd;
-    Error *local_err = NULL;
-
-    if (!qemu_isdigit(fdname[0]) && mon) {
-    fd = monitor_get_fd(mon, fdname, &local_err);
-    } else {
-    fd = qemu_parse_fd(fdname);
-    if (fd == -1) {
-    error_setg(&local_err, "Invalid file descriptor number '%s'",
-   fdname);
-    }
-    }
-    if (local_err) {
-    error_propagate(errp, local_err);
-    assert(fd == -1);
-    } else {
-    assert(fd != -1);
-    }
-
-    return fd;
+    return qemu_get_fd(mon, fdname, errp);
 }

 /* Please update hmp-commands.hx when adding or changing commands */
diff --git a/util/osdep.c b/util/osdep.c
index 60fcbbaebe..c57551ca78 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -23,6 +23,7 @@
  */
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "qemu/ctype.h"
 #include "qemu/cutils.h"
 #include "qemu/sockets.h"
 #include "qemu/error-report.h"
@@ -413,6 +414,30 @@ int qemu_close(int fd)
 return close(fd);
 }

+int qemu_get_fd(Monitor *mon, const char *fdname, Error **errp)
+{
+    int fd;
+    Error *local_err = NULL;
+
+    if (!qemu_isdigit(fdname[0]) && mon) {
+    fd = monitor_get_fd(mon, fdname, &local_err);
+    } else {
+    fd = qemu_parse_fd(fdname);
+    if (fd == -1) {
+    error_setg(&local_err, "Invalid file descriptor number '%s'",
+   fdname);
+    }
+    }
+    if (local_err) {
+    error_propagate(errp, local_err);
+    assert(fd == -1);
+    } else {
+    assert(fd != -1);
+    }
+
+    return fd;
+}
+
 /*
  * Delete a file from the filesystem, unless the filename is 
/dev/fdset/...

  *
diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 13b5b197f9..92960ee6eb 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -1142,19 +1142,12 @@ static int socket_get_fd(const char *fdstr, 
Error **errp)

 {
 Monitor *cur_mon = monitor_cur();
 int fd;
-    if (cur_mon) {
-    fd = monitor_get_fd(cur_mon, fdstr, errp);
-    if (fd < 0) {
-    return -1;
-    }
-    } else {
-    if (qemu_strtoi(fdstr, NULL, 10, &fd) < 0) {
-    error_setg_errno(errp, errno,
- "Unable to parse FD number %s",
- fdstr);
-    return -1;
-    }
+
+    fd = qemu_get_fd(cur_mon, fdstr, errp);
+    if (fd < 0) {
+    return -1;
 }
+
 if (!fd_is_socket(fd)) {
 error_setg(errp, "File descriptor '%s' is not a socket", fdstr);
 close(fd);
--
2.25.1

[PATCH] hw/nvme: remove param zoned.auto_transition

2022-08-12 Thread Niklas Cassel via

The intention of the Zoned Namespace Command Set Specification was
never to make an automatic zone transition optional.

Excerpt from the nvmexpress.org zns mailing list:
"""
A question came up internally on the differences between ZNS and ZAC/ZBC
that asked about when a controller should transitions a specific zone in
the Implicitly Opened state to Closed state.

For example, consider a ZNS SSD that supports a max of 20 active zones,
and a max of 10 open zones, which has the following actions occur:

First, the host writes to ten empty zones, thereby transitioning 10 zones
to the Implicitly Opened state.

Second, the host issues a write to an 11th empty zone.

Given that state, my understanding of the second part is that the ZNS SSD
chooses one of the previously 10 zones, and transition the chosen zone to
the Closed state, and then proceeds to write to the new zone which also
implicitly transition it from the Empty state to the Impl. Open state.
After this, there would be 11 active zones in total, 10 in impl. Open
state, and one in closed state.

The above assumes that a ZNS SSD will always transition an implicitly
opened zone to closed state when required to free up resources when
another zone is opened. However, it isn’t strictly said in the ZNS spec.

The paragraph that should cover it is defined in section
2.1.1.4.1 – Managing Resources:
The controller may transition zones in the ZSIO:Implicitly Opened state
to the ZSC:Closed state for resource management purposes.

However, it doesn’t say “when” it should occur. Thus, as the text stand,
it could be misinterpreted that the controller shouldn’t do close a zone
to make room for a new zone. The issue with this, is that it makes the
point of having implicitly managed zones moot.

The ZAC/ZBC specs is more specific and clarifies when a zone should be
closed. I think it would be natural to the same here.
"""

While the Zoned Namespace Command Set Specification hasn't received an
errata yet, it is quite clear that the intention was that an automatic
zone transition was never supposed to be optional, as then the whole
point of having implictly open zones would be pointless. Therefore,
remove the param zoned.auto_transition, as this was never supposed to
be controller implementation specific.

Signed-off-by: Niklas Cassel 
---
 hw/nvme/ctrl.c | 35 ---
 hw/nvme/nvme.h |  1 -
 2 files changed, 12 insertions(+), 24 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 87aeba0564..b8c8075301 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -34,7 +34,6 @@
  *  aerl=,aer_max_queued=, \
  *  mdts=,vsl=, \
  *  zoned.zasl=, \
- *  zoned.auto_transition=, \
  *  sriov_max_vfs= \
  *  sriov_vq_flexible= \
  *  sriov_vi_flexible= \
@@ -106,11 +105,6 @@
  *   the minimum memory page size (CAP.MPSMIN). The default value is 0 (i.e.
  *   defaulting to the value of `mdts`).
  *
- * - `zoned.auto_transition`
- *   Indicates if zones in zone state implicitly opened can be automatically
- *   transitioned to zone state closed for resource management purposes.
- *   Defaults to 'on'.
- *
  * - `sriov_max_vfs`
  *   Indicates the maximum number of PCIe virtual functions supported
  *   by the controller. The default value is 0. Specifying a non-zero value
@@ -1867,8 +1861,8 @@ enum {
 NVME_ZRM_ZRWA = 1 << 1,
 };
 
-static uint16_t nvme_zrm_open_flags(NvmeCtrl *n, NvmeNamespace *ns,
-NvmeZone *zone, int flags)
+static uint16_t nvme_zrm_open_flags(NvmeNamespace *ns, NvmeZone *zone,
+int flags)
 {
 int act = 0;
 uint16_t status;
@@ -1880,9 +1874,7 @@ static uint16_t nvme_zrm_open_flags(NvmeCtrl *n, 
NvmeNamespace *ns,
 /* fallthrough */
 
 case NVME_ZONE_STATE_CLOSED:
-if (n->params.auto_transition_zones) {
-nvme_zrm_auto_transition_zone(ns);
-}
+nvme_zrm_auto_transition_zone(ns);
 status = nvme_zns_check_resources(ns, act, 1,
   (flags & NVME_ZRM_ZRWA) ? 1 : 0);
 if (status) {
@@ -1925,10 +1917,9 @@ static uint16_t nvme_zrm_open_flags(NvmeCtrl *n, 
NvmeNamespace *ns,
 }
 }
 
-static inline uint16_t nvme_zrm_auto(NvmeCtrl *n, NvmeNamespace *ns,
- NvmeZone *zone)
+static inline uint16_t nvme_zrm_auto(NvmeNamespace *ns, NvmeZone *zone)
 {
-return nvme_zrm_open_flags(n, ns, zone, NVME_ZRM_AUTO);
+return nvme_zrm_open_flags(ns, zone, NVME_ZRM_AUTO);
 }
 
 static void nvme_advance_zone_wp(NvmeNamespace *ns, NvmeZone *zone,
@@ -3065,7 +3056,7 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
 goto invalid;
 }
 
-status = nvme_zrm_auto(n, ns, iocb->zone);
+status = nvme_zrm_auto(ns, iocb->zone);
 if (status) {
 goto invalid;
 }
@@ -3471,7 +3462,7 @@ static u

Re: [PATCH] riscv: Make semihosting configurable for all privilege modes

On Thu, Aug 11, 2022 at 01:41:04PM -0700, Furquan Shaikh wrote:
> Unlike ARM, RISC-V does not define a separate breakpoint type for
> semihosting. Instead, it is entirely ABI. Thus, we need an option
> to allow users to configure what the ebreak behavior should be for
> different privilege levels - M, S, U, VS, VU. As per the RISC-V
> privilege specification[1], ebreak traps into the execution
> environment. However, RISC-V debug specification[2] provides
> ebreak{m,s,u,vs,vu} configuration bits to allow ebreak behavior to
> be configured to trap into debug mode instead. This change adds
> settable properties for RISC-V CPUs - `ebreakm`, `ebreaks`, `ebreaku`,
> `ebreakvs` and `ebreakvu` to allow user to configure whether qemu
> should treat ebreak as semihosting traps or trap according to the
> privilege specification.
> 
> [1] 
> https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-privileged.pdf
> [2] 
> https://github.com/riscv/riscv-debug-spec/blob/release/riscv-debug-release.pdf
> 
> Signed-off-by: Furquan Shaikh 
> ---
>  target/riscv/cpu.c|  8 
>  target/riscv/cpu.h|  7 +++
>  target/riscv/cpu_helper.c | 26 +-
>  3 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index ac6f82ebd0..082194652b 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -953,6 +953,14 @@ static Property riscv_cpu_properties[] = {
>  DEFINE_PROP_BOOL("short-isa-string", RISCVCPU,
> cfg.short_isa_string, false),
> 
>  DEFINE_PROP_BOOL("rvv_ta_all_1s", RISCVCPU, cfg.rvv_ta_all_1s, false),
> +
> +/* Debug spec */
> +DEFINE_PROP_BOOL("ebreakm", RISCVCPU, cfg.ebreakm, true),
> +DEFINE_PROP_BOOL("ebreaks", RISCVCPU, cfg.ebreaks, false),
> +DEFINE_PROP_BOOL("ebreaku", RISCVCPU, cfg.ebreaku, false),
> +DEFINE_PROP_BOOL("ebreakvs", RISCVCPU, cfg.ebreakvs, false),
> +DEFINE_PROP_BOOL("ebreakvu", RISCVCPU, cfg.ebreakvu, false),
> +
>  DEFINE_PROP_END_OF_LIST(),
>  };
> 
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 5c7acc055a..eee8e487a6 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -454,6 +454,13 @@ struct RISCVCPUConfig {
>  bool epmp;
>  bool aia;
>  bool debug;
> +
> +/* Debug spec */
> +bool ebreakm;
> +bool ebreaks;
> +bool ebreaku;
> +bool ebreakvs;
> +bool ebreakvu;

There's only five of these, so having each separate probably makes the
most sense, but I wanted to point out that we could keep the properties
independent booleans, as we should, but still consolidate the values
into a single bitmap like we did for the sve vq bitmap for arm (see
cpu_arm_get/set_vq). Maybe worth considering?

>  uint64_t resetvec;
> 
>  bool short_isa_string;
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index 59b3680b1b..be09abbe27 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -1314,6 +1314,30 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr
> address, int size,
> 
>  return true;
>  }
> +
> +static bool semihosting_enabled(RISCVCPU *cpu)
> +{
> +CPURISCVState *env = &cpu->env;
> +
> +switch (env->priv) {
> +case PRV_M:
> +return cpu->cfg.ebreakm;
> +case PRV_S:
> +if (riscv_cpu_virt_enabled(env)) {
> +return cpu->cfg.ebreakvs;
> +} else {
> +return cpu->cfg.ebreaks;
> +}
> +case PRV_U:
> +if (riscv_cpu_virt_enabled(env)) {
> +return cpu->cfg.ebreakvu;
> +} else {
> +return cpu->cfg.ebreaku;
> +}
> +}
> +
> +return false;
> +}
>  #endif /* !CONFIG_USER_ONLY */
> 
>  /*
> @@ -1342,7 +1366,7 @@ void riscv_cpu_do_interrupt(CPUState *cs)
>  target_ulong mtval2 = 0;
> 
>  if  (cause == RISCV_EXCP_SEMIHOST) {
> -if (env->priv >= PRV_S) {
> +if (semihosting_enabled(cpu)) {
>  do_common_semihosting(cs);
>  env->pc += 4;
>  return;
> --
> 2.34.1
>

Bitmap or no bitmap,

Reviewed-by: Andrew Jones

Re: [PATCH] riscv: Make semihosting configurable for all privilege modes

On Thu, 11 Aug 2022 at 21:47, Furquan Shaikh  wrote:
>
> Unlike ARM, RISC-V does not define a separate breakpoint type for
> semihosting. Instead, it is entirely ABI. Thus, we need an option
> to allow users to configure what the ebreak behavior should be for
> different privilege levels - M, S, U, VS, VU. As per the RISC-V
> privilege specification[1], ebreak traps into the execution
> environment. However, RISC-V debug specification[2] provides
> ebreak{m,s,u,vs,vu} configuration bits to allow ebreak behavior to
> be configured to trap into debug mode instead. This change adds
> settable properties for RISC-V CPUs - `ebreakm`, `ebreaks`, `ebreaku`,
> `ebreakvs` and `ebreakvu` to allow user to configure whether qemu
> should treat ebreak as semihosting traps or trap according to the
> privilege specification.
>
> [1] 
> https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-privileged.pdf
> [2] 
> https://github.com/riscv/riscv-debug-spec/blob/release/riscv-debug-release.pdf

As a general rule we don't allow userspace to make semihosting
calls, as a (rather weak) attempt at fencing off unprivileged
guest code from being able to scribble all over the host
filesystem. We should try to be consistent across architectures
about that, and in particular about how we enable it.

I have a half-finished patchset where I was planning to add
a --semihosting-config userspace-enable=on option or similar
to that effect.

It sounds like these ebreak bits are somewhat architectural,
so maybe they make sense as a riscv specific thing, but we
should consider how they ought to interact with the general
behaviour of semihosting. As it stands in QEMU today, we
(at least in theory) ought not to permit userspace to make
semihosting ebreak calls at all I think.

thanks
-- PMM

Re: [PATCH v3 1/1] os-posix: asynchronous teardown for shutdown on Linux

2022-08-12 Thread Murilo Opsfelder Araújo

On 8/12/22 04:26, Claudio Imbrenda wrote:

On Thu, 11 Aug 2022 23:05:52 -0300
Murilo Opsfelder Araújo wrote:

On 8/11/22 11:02, Daniel P. Berrangé wrote:
[...]

Hmm, I was hoping you could just use SIGKILL to guarantee that this
gets killed off. Is SIGKILL delivered too soon to allow for the
main QEMU process to have exited quickly ?

yes, I tried. qemu has not finished exiting when the signal is
delivered, the cleanup process dies before qemu, which defeats the
purpose

Ok, too bad.

If so I wonder what happens when systemd just delivers SIGKILL to
all processes in the cgroup - I'm not sure there's a guarantee it
will SIGKILL the main qemu before it SIGKILLs this helper

I'm afraid in that case there is no guarantee.

for what it's worth, both virsh shutdown and destroy seem to do things
properly.

Hmm, probably because libvirt tells QEMU to exit before systemd comes
along and tells everything in the cgroup to die with SIGKILL.

It seems Libvirt sends SIGKILL if qemu process doesn't terminate within 10
seconds after Libvirt sent SIGTERM:

https://gitlab.com/libvirt/libvirt/-/blob/0615df084ec9996b5df88d6a1b59c557e22f3a12/src/util/virprocess.c#L375

but this is fine.

with asynchronous teardown, qemu will exit almost immediately when
receiving SIGTERM, and the cleanup process will start cleaning up.

Under normal and orderly conditions, yes.

So I guess this patch happened to work with Libvirt because the main qemu
process terminated before the timeout and before SIGKILL was delivered.

it seems so

The cleanup process is trying to solve the problem where the main qemu process
takes too long to terminate. However, if the cleanup process itself takes too
long, SIGKILL will be sent by Libvirt anyway.

but that is not a problem, the sole purpose of the cleanup process is
to terminate _after_ qemu. it doesn't matter what happens after qemu
has terminated. if you look at the patch, after going to great lengths
to assure that qemu has terminated, all the child process does is
_exit(0).

Perhaps we can describe this situation in the parameter help, e.g.: If
management layer decides to send SIGKILL (e.g.: due to timeout or deliberate
decision), the cleanup process can exit before the main process, deceiving its
purpose.

if the management layer (or the user) decides to send SIGKILL
immediately to the whole cgroup without sending SIGTERM first, then
this whole asynchronous teardown mechanism is defeated, yes.

This situation is what we likely want to describe in the parameter help. I don't
want to give users the false impression that this option will *always* behave
the manner we expect it to work *most* of the time.

--
Murilo

[PATCH 1/2] osdeps: Introduce qemu_socketpair()

2022-08-12 Thread tugy

From: Guoyi Tu 

qemu_socketpair() will create a pair of connected sockets
with FD_CLOEXEC set

Signed-off-by: Guoyi Tu 
---
 include/qemu/sockets.h |  3 +++
 util/osdep.c   | 24 
 2 files changed, 27 insertions(+)

diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
index 038faa157f..52cf2855df 100644
--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -14,6 +14,9 @@ int inet_aton(const char *cp, struct in_addr *ia);
 /* misc helpers */
 bool fd_is_socket(int fd);
 int qemu_socket(int domain, int type, int protocol);
+#ifndef WIN32
+int qemu_socketpair(int domain, int type, int protocol, int sv[2]);
+#endif
 int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
 int socket_set_cork(int fd, int v);
 int socket_set_nodelay(int fd);
diff --git a/util/osdep.c b/util/osdep.c
index 60fcbbaebe..4b1ab623c7 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -481,6 +481,30 @@ int qemu_socket(int domain, int type, int protocol)
 return ret;
 }
 
+#ifndef _WIN32
+/*
+ * Create a pair of connected sockets with FD_CLOEXEC set
+ */
+int qemu_socketpair(int domain, int type, int protocol, int sv[2])
+{
+int ret;
+
+#ifdef SOCK_CLOEXEC
+ret = socketpair(domain, type | SOCK_CLOEXEC, protocol, sv);
+if (ret != -1 || errno != EINVAL) {
+return ret;
+}
+#endif
+ret = socketpair(domain, type, protocol, sv);;
+if (ret == 0) {
+qemu_set_cloexec(sv[0]);
+qemu_set_cloexec(sv[1]);
+}
+
+return ret;
+}
+#endif
+
 /*
  * Accept a connection and set FD_CLOEXEC
  */
-- 
2.25.1

[PULL 4/5] hw/arm/virt-acpi-build: Present the GICR structure properly for GICv4

From: Zenghui Yu 

With the introduction of the new TCG GICv4, build_madt() is badly broken
as we do not present any GIC Redistributor structure in MADT for GICv4
guests, so that they have no idea about where the Redistributor
register frames are. This fixes a Linux guest crash at boot time with
ACPI enabled and '-machine gic-version=4'.

While at it, let's convert the remaining hard coded gic_version into
enumeration VIRT_GIC_VERSION_2 for consistency.

Signed-off-by: Zenghui Yu 
Message-id: 20220812022018.1069-1-yuzeng...@huawei.com
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
---
 hw/arm/virt-acpi-build.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 449fab00805..9b3aee01bf8 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -732,7 +732,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 uint32_t pmu_interrupt = arm_feature(&armcpu->env, ARM_FEATURE_PMU) ?
  PPI(VIRTUAL_PMU_IRQ) : 0;
 
-if (vms->gic_version == 2) {
+if (vms->gic_version == VIRT_GIC_VERSION_2) {
 physical_base_address = memmap[VIRT_GIC_CPU].base;
 gicv = memmap[VIRT_GIC_VCPU].base;
 gich = memmap[VIRT_GIC_HYP].base;
@@ -762,7 +762,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 build_append_int_noprefix(table_data, armcpu->mp_affinity, 8);
 }
 
-if (vms->gic_version == 3) {
+if (vms->gic_version != VIRT_GIC_VERSION_2) {
 build_append_gicr(table_data, memmap[VIRT_GIC_REDIST].base,
   memmap[VIRT_GIC_REDIST].size);
 if (virt_gicv3_redist_region_count(vms) == 2) {
-- 
2.25.1

[PATCH 0/2] introduce qemu_socketpiar()

2022-08-12 Thread tugy

From: Guoyi Tu 

introduce qemu_socketpair() to create socket pair fd, and 
set the close-on-exec flag at default as with the other type
of socket does.

besides, the live update feature is developing, so it's necessary
to do that.

Guoyi Tu (2):
  osdeps: Introduce qemu_socketpair()
  vhost-user: Call qemu_socketpair() instead of socketpair()

 hw/display/vhost-user-gpu.c |  3 ++-
 hw/virtio/vhost-user.c  |  2 +-
 include/qemu/sockets.h  |  3 +++
 util/osdep.c| 24 
 4 files changed, 30 insertions(+), 2 deletions(-)

-- 
2.25.1

[PATCH 2/2] vhost-user: Call qemu_socketpair() instead of socketpair()

2022-08-12 Thread tugy

From: Guoyi Tu 

set close-on-exec flag on the new opened file descriptors at default

Signed-off-by: Guoyi Tu 
---
 hw/display/vhost-user-gpu.c | 3 ++-
 hw/virtio/vhost-user.c  | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/display/vhost-user-gpu.c b/hw/display/vhost-user-gpu.c
index 3340ef9e5f..19c0e20103 100644
--- a/hw/display/vhost-user-gpu.c
+++ b/hw/display/vhost-user-gpu.c
@@ -11,6 +11,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/sockets.h"
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio-gpu.h"
 #include "chardev/char-fe.h"
@@ -375,7 +376,7 @@ vhost_user_gpu_do_set_socket(VhostUserGPU *g, Error **errp)
 Chardev *chr;
 int sv[2];
 
-if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
+if (qemu_socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
 error_setg_errno(errp, errno, "socketpair() failed");
 return false;
 }
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 75b8df21a4..4d2b227028 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1726,7 +1726,7 @@ static int vhost_setup_slave_channel(struct vhost_dev 
*dev)
 return 0;
 }
 
-if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
+if (qemu_socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
 int saved_errno = errno;
 error_report("socketpair() failed");
 return -saved_errno;
-- 
2.25.1

[PULL 2/5] Fix some typos in documentation (most of them found by codespell)

From: Stefan Weil 

Signed-off-by: Stefan Weil 
Reviewed-by: Hongren (Zenithal) Zheng 
Message-id: 20220812075642.1200578-1...@weilnetz.de
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
---
 docs/about/deprecated.rst   |  2 +-
 docs/specs/acpi_erst.rst|  4 ++--
 docs/system/devices/canokey.rst |  8 
 docs/system/devices/cxl.rst | 12 ++--
 4 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 7ee26626d5c..91b03115ee2 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -297,7 +297,7 @@ by using ``-machine graphics=off``.
 
 
 In QEMU versions 6.1, 6.2 and 7.0, the ``nvme-ns`` generates an EUI-64
-identifer that is not globally unique. If an EUI-64 identifer is required, the
+identifier that is not globally unique. If an EUI-64 identifier is required, 
the
 user must set it explicitly using the ``nvme-ns`` device parameter ``eui64``.
 
 ``-device nvme,use-intel-id=on|off`` (since 7.1)
diff --git a/docs/specs/acpi_erst.rst b/docs/specs/acpi_erst.rst
index a8a9d22d254..2339b60ad74 100644
--- a/docs/specs/acpi_erst.rst
+++ b/docs/specs/acpi_erst.rst
@@ -108,7 +108,7 @@ Slot 0 contains a backend storage header that identifies 
the contents
 as ERST and also facilitates efficient access to the records.
 Depending upon the size of the backend storage, additional slots will
 be designated to be a part of the slot 0 header. For example, at 8KiB,
-the slot 0 header can accomodate 1021 records. Thus a storage size
+the slot 0 header can accommodate 1021 records. Thus a storage size
 of 8MiB (8KiB * 1024) requires an additional slot for use by the
 header. In this scenario, slot 0 and slot 1 form the backend storage
 header, and records can be stored starting at slot 2.
@@ -196,5 +196,5 @@ References
 [2] "Unified Extensible Firmware Interface Specification",
 version 2.1, October 2008.
 
-[3] "Windows Hardware Error Architecture", specfically
+[3] "Windows Hardware Error Architecture", specifically
 "Error Record Persistence Mechanism".
diff --git a/docs/system/devices/canokey.rst b/docs/system/devices/canokey.rst
index c2c58ae3e7c..cfa6186e483 100644
--- a/docs/system/devices/canokey.rst
+++ b/docs/system/devices/canokey.rst
@@ -28,9 +28,9 @@ With the same software configuration as a hardware key,
 the guest OS can use all the functionalities of a secure key as if
 there was actually an hardware key plugged in.
 
-CanoKey QEMU provides much convenience for debuging:
+CanoKey QEMU provides much convenience for debugging:
 
-* libcanokey-qemu supports debuging output thus developers can
+* libcanokey-qemu supports debugging output thus developers can
   inspect what happens inside a secure key
 * CanoKey QEMU supports trace event thus event
 * QEMU USB stack supports pcap thus USB packet between the guest
@@ -102,8 +102,8 @@ and find CanoKey QEMU there:
 
 You may setup the key as guided in [6]_. The console for the key is at [7]_.
 
-Debuging
-
+Debugging
+=
 
 CanoKey QEMU consists of two parts, ``libcanokey-qemu.so`` and ``canokey.c``,
 the latter of which resides in QEMU. The former provides core functionality
diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst
index 36031325cca..f25783a4ecf 100644
--- a/docs/system/devices/cxl.rst
+++ b/docs/system/devices/cxl.rst
@@ -83,7 +83,7 @@ CXL Fixed Memory Windows (CFMW)
 A CFMW consists of a particular range of Host Physical Address space
 which is routed to particular CXL Host Bridges.  At time of generic
 software initialization it will have a particularly interleaving
-configuration and associated Quality of Serice Throtling Group (QTG).
+configuration and associated Quality of Service Throttling Group (QTG).
 This information is available to system software, when making
 decisions about how to configure interleave across available CXL
 memory devices.  It is provide as CFMW Structures (CFMWS) in
@@ -98,7 +98,7 @@ specification defined register interface called CXL Host 
Bridge
 Component Registers (CHBCR). The location of this CHBCR MMIO
 space is described to system software via a CXL Host Bridge
 Structure (CHBS) in the CEDT ACPI table.  The actual interfaces
-are identical to those used for other parts of the CXL heirarchy
+are identical to those used for other parts of the CXL hierarchy
 as CXL Component Registers in PCI BARs.
 
 Interfaces provided include:
@@ -143,7 +143,7 @@ CXL Memory Devices - Type 3
 ~~~
 CXL type 3 devices use a PCI class code and are intended to be supported
 by a generic operating system driver. They have HDM decoders
-though in these EP devices, the decoder is reponsible not for
+though in these EP devices, the decoder is responsible not for
 routing but for translation of the incoming host physical address (HPA)
 into a Device Physical Address (DPA).
 
@@ -209,7 +209,7 @@ Notes:
 ra

[PULL 1/5] target/arm: Don't report Statistical Profiling Extension in ID registers

The newly added neoverse-n1 CPU has ID register values which indicate
the presence of the Statistical Profiling Extension, because the real
hardware has this feature.  QEMU's TCG emulation does not yet
implement SPE, though (not even as a minimal stub implementation), so
guests will crash if they try to use it because the SPE system
registers don't exist.

Force ID_AA64DFR0_EL1.PMSVer to 0 in CPU realize for TCG, so that
we don't advertise to the guest a feature that doesn't exist.

(We could alternatively do this by editing the value that
aarch64_neoverse_n1_initfn() sets for this ID register, but
suppressing the field in realize means we won't re-introduce this bug
when we add other CPUs that have SPE in hardware, such as the
Neoverse-V1.)

An example of a non-booting guest is current mainline Linux (5.19),
when booting in EL2 on the virt board (ie with -machine
virtualization=on).

Reported-by: Zenghui Yu 
Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Zenghui Yu 
Message-id: 20220811131127.947334-1-peter.mayd...@linaro.org
---
 target/arm/cpu.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 1b7b3d76bb3..7ec3281da9a 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1933,6 +1933,17 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
**errp)
 }
 #endif
 
+if (tcg_enabled()) {
+/*
+ * Don't report the Statistical Profiling Extension in the ID
+ * registers, because TCG doesn't implement it yet (not even a
+ * minimal stub version) and guests will fall over when they
+ * try to access the non-existent system registers for it.
+ */
+cpu->isar.id_aa64dfr0 =
+FIELD_DP64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, PMSVER, 0);
+}
+
 /* MPU can be configured out of a PMSA CPU either by setting has-mpu
  * to false or by setting pmsav7-dregion to 0.
  */
-- 
2.25.1

[PULL 5/5] cutils: Add missing dyld(3) include on macOS

From: Philippe Mathieu-Daudé 

Commit 06680b15b4 moved qemu_*_exec_dir() to cutils but forgot
to move the macOS dyld(3) include, resulting in the following
error (when building with Homebrew GCC on macOS Monterey 12.4):

  [313/1197] Compiling C object libqemuutil.a.p/util_cutils.c.o
  FAILED: libqemuutil.a.p/util_cutils.c.o
  ../../util/cutils.c:1039:13: error: implicit declaration of function 
'_NSGetExecutablePath' [-Werror=implicit-function-declaration]
   1039 | if (_NSGetExecutablePath(fpath, &len) == 0) {
| ^~~~
  ../../util/cutils.c:1039:13: error: nested extern declaration of 
'_NSGetExecutablePath' [-Werror=nested-externs]

Fix by moving the include line to cutils.

Fixes: 06680b15b4 ("include: move qemu_*_exec_dir() to cutils")
Signed-off-by: Philippe Mathieu-Daudé 
Message-id: 20220809222046.30812-1-f4...@amsat.org
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
---
 util/cutils.c  | 4 
 util/oslib-posix.c | 4 
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/util/cutils.c b/util/cutils.c
index cb43dda213c..def9c746ced 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -39,6 +39,10 @@
 #include 
 #endif
 
+#ifdef __APPLE__
+#include 
+#endif
+
 #ifdef G_OS_WIN32
 #include 
 #include 
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index bffec18869e..d55af69c112 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -58,10 +58,6 @@
 #include 
 #endif
 
-#ifdef __APPLE__
-#include 
-#endif
-
 #include "qemu/mmap-alloc.h"
 
 #ifdef CONFIG_DEBUG_STACK_USAGE
-- 
2.25.1

[PULL 0/5] target-arm queue

This pullreq has:
 * two arm bug fixes which fix some "Linux fails to boot" bugs
 * a docs typo-fixing patch
 * a couple of compile failure/warning issues

I think they're all pretty safe and worth having in rc3.

thanks
-- PMM

The following changes since commit a6b1c53e79d08a99a28cc3e67a3e1a7c34102d6b:

  Merge tag 'linux-user-for-7.1-pull-request' of 
https://gitlab.com/laurent_vivier/qemu into staging (2022-08-10 10:26:57 -0700)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git 
tags/pull-target-arm-20220812

for you to fetch changes up to 4311682ea8293f720730f260e8a7601117d79e65:

  cutils: Add missing dyld(3) include on macOS (2022-08-12 11:33:52 +0100)


target-arm queue:
 * Don't report Statistical Profiling Extension in ID registers
 * virt ACPI tables: Present the GICR structure properly for GICv4
 * Fix some typos in documentation
 * tests/unit: fix a -Wformat-truncation warning
 * cutils: Add missing dyld(3) include on macOS


Marc-André Lureau (1):
  tests/unit: fix a -Wformat-truncation warning

Peter Maydell (1):
  target/arm: Don't report Statistical Profiling Extension in ID registers

Philippe Mathieu-Daudé (1):
  cutils: Add missing dyld(3) include on macOS

Stefan Weil (1):
  Fix some typos in documentation (most of them found by codespell)

Zenghui Yu (1):
  hw/arm/virt-acpi-build: Present the GICR structure properly for GICv4

 docs/about/deprecated.rst   |  2 +-
 docs/specs/acpi_erst.rst|  4 ++--
 docs/system/devices/canokey.rst |  8 
 docs/system/devices/cxl.rst | 12 ++--
 hw/arm/virt-acpi-build.c|  4 ++--
 target/arm/cpu.c| 11 +++
 tests/unit/test-qobject-input-visitor.c |  3 +--
 util/cutils.c   |  4 
 util/oslib-posix.c  |  4 
 9 files changed, 31 insertions(+), 21 deletions(-)

Re: [PATCH 2/2] vhost-user: Call qemu_socketpair() instead of socketpair()

On Fri, 12 Aug 2022 at 12:44,  wrote:
>
> From: Guoyi Tu 
>
> set close-on-exec flag on the new opened file descriptors at default

What goes wrong if we don't do this? The commit message
is a good place to explain what bug the commit is fixing,
and its consequences.

thanks
-- PMM

[PULL 3/5] tests/unit: fix a -Wformat-truncation warning

From: Marc-André Lureau 

../tests/test-qobject-input-visitor.c: In function ‘test_visitor_in_list’:
../tests/test-qobject-input-visitor.c:454:49: warning: ‘%d’ directive output 
may be truncated writing between 1 and 10 bytes into a region of size 6 
[-Wformat-truncation=]
  454 | snprintf(string, sizeof(string), "string%d", i);
  | ^~
../tests/test-qobject-input-visitor.c:454:42: note: directive argument in the 
range [0, 2147483606]
  454 | snprintf(string, sizeof(string), "string%d", i);
  |  ^~
../tests/test-qobject-input-visitor.c:454:9: note: ‘snprintf’ output between 8 
and 17 bytes into a destination of size 12
  454 | snprintf(string, sizeof(string), "string%d", i);
  | ^~~

Rather than trying to be clever, since this is called 3 times during
tests, let's simply use g_strdup_printf().

Signed-off-by: Marc-André Lureau 
Reviewed-by: Markus Armbruster 
Message-id: 20220810121513.1356081-1-marcandre.lur...@redhat.com
Reviewed-by: Peter Maydell 
[PMM: fixed commit message typos]
Signed-off-by: Peter Maydell 
---
 tests/unit/test-qobject-input-visitor.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/unit/test-qobject-input-visitor.c 
b/tests/unit/test-qobject-input-visitor.c
index 14329dabcfe..5f614afdbf1 100644
--- a/tests/unit/test-qobject-input-visitor.c
+++ b/tests/unit/test-qobject-input-visitor.c
@@ -447,9 +447,8 @@ static void test_visitor_in_list(TestInputVisitorData *data,
 g_assert(head != NULL);
 
 for (i = 0, item = head; item; item = item->next, i++) {
-char string[12];
+g_autofree char *string = g_strdup_printf("string%d", i);
 
-snprintf(string, sizeof(string), "string%d", i);
 g_assert_cmpstr(item->value->string, ==, string);
 g_assert_cmpint(item->value->integer, ==, 42 + i);
 }
-- 
2.25.1

Re: [PATCH v3 1/1] os-posix: asynchronous teardown for shutdown on Linux

2022-08-12 Thread Claudio Imbrenda

On Fri, 12 Aug 2022 08:38:59 -0300
Murilo Opsfelder Araújo  wrote:

> On 8/12/22 04:26, Claudio Imbrenda wrote:
> > On Thu, 11 Aug 2022 23:05:52 -0300
> > Murilo Opsfelder Araújo  wrote:
> >  
> >> On 8/11/22 11:02, Daniel P. Berrangé wrote:
> >> [...]  
> > Hmm, I was hoping you could just use SIGKILL to guarantee that this
> > gets killed off.  Is SIGKILL delivered too soon to allow for the
> > main QEMU process to have exited quickly ?  
> 
>  yes, I tried. qemu has not finished exiting when the signal is
>  delivered, the cleanup process dies before qemu, which defeats the
>  purpose  
> >>>
> >>> Ok, too bad.
> >>>  
> > If so I wonder what happens when systemd just delivers SIGKILL to
> > all processes in the cgroup - I'm not sure there's a guarantee it
> > will SIGKILL the main qemu before it SIGKILLs this helper  
> 
>  I'm afraid in that case there is no guarantee.
> 
>  for what it's worth, both virsh shutdown and destroy seem to do things
>  properly.  
> >>>
> >>> Hmm, probably because libvirt tells QEMU to exit before systemd comes
> >>> along and tells everything in the cgroup to die with SIGKILL.  
> >>
> >> It seems Libvirt sends SIGKILL if qemu process doesn't terminate within 10
> >> seconds after Libvirt sent SIGTERM:
> >>
> >> https://gitlab.com/libvirt/libvirt/-/blob/0615df084ec9996b5df88d6a1b59c557e22f3a12/src/util/virprocess.c#L375
> >>   
> >
> > but this is fine.
> >
> > with asynchronous teardown, qemu will exit almost immediately when
> > receiving SIGTERM, and the cleanup process will start cleaning up.  
> 
> Under normal and orderly conditions, yes.
> 
> >> So I guess this patch happened to work with Libvirt because the main qemu
> >> process terminated before the timeout and before SIGKILL was delivered.  
> >
> > it seems so
> >  
> >>
> >> The cleanup process is trying to solve the problem where the main qemu 
> >> process
> >> takes too long to terminate. However, if the cleanup process itself takes 
> >> too
> >> long, SIGKILL will be sent by Libvirt anyway.  
> >
> > but that is not a problem, the sole purpose of the cleanup process is
> > to terminate _after_ qemu. it doesn't matter what happens after qemu
> > has terminated. if you look at the patch, after going to great lengths
> > to assure that qemu has terminated, all the child process does is
> > _exit(0).
> >  
> >>
> >> Perhaps we can describe this situation in the parameter help, e.g.: If
> >> management layer decides to send SIGKILL (e.g.: due to timeout or 
> >> deliberate
> >> decision), the cleanup process can exit before the main process, deceiving 
> >> its
> >> purpose.  
> >
> > if the management layer (or the user) decides to send SIGKILL
> > immediately to the whole cgroup without sending SIGTERM first, then
> > this whole asynchronous teardown mechanism is defeated, yes.  
> 
> This situation is what we likely want to describe in the parameter help. I 
> don't
> want to give users the false impression that this option will *always* behave
> the manner we expect it to work *most* of the time.

fair enough, I'll improve the documentation

> 
> --
> Murilo

Re: [PATCH 1/2] osdeps: Introduce qemu_socketpair()

On Fri, 12 Aug 2022 at 12:44,  wrote:
>
> From: Guoyi Tu 
>
> qemu_socketpair() will create a pair of connected sockets
> with FD_CLOEXEC set
>
> Signed-off-by: Guoyi Tu 
> ---
>  include/qemu/sockets.h |  3 +++
>  util/osdep.c   | 24 
>  2 files changed, 27 insertions(+)
>
> diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
> index 038faa157f..52cf2855df 100644
> --- a/include/qemu/sockets.h
> +++ b/include/qemu/sockets.h
> @@ -14,6 +14,9 @@ int inet_aton(const char *cp, struct in_addr *ia);
>  /* misc helpers */
>  bool fd_is_socket(int fd);
>  int qemu_socket(int domain, int type, int protocol);
> +#ifndef WIN32
> +int qemu_socketpair(int domain, int type, int protocol, int sv[2]);

Any new function declaration in a header file needs a
doc-comment documenting what it does, please.

> +#endif
>  int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
>  int socket_set_cork(int fd, int v);
>  int socket_set_nodelay(int fd);
> diff --git a/util/osdep.c b/util/osdep.c
> index 60fcbbaebe..4b1ab623c7 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -481,6 +481,30 @@ int qemu_socket(int domain, int type, int protocol)
>  return ret;
>  }
>
> +#ifndef _WIN32

If this function only exists and is usable on posix
hosts, put it in util/oslib-posix.c rather than having
it here with a win32 ifdef.

thanks
-- PMM

Re: [PATCH v2 2/9] target/arm: Don't add all MIDR aliases for Cortex-R

On Mon, 18 Jul 2022 at 12:54, Tobias Roehmel  wrote:
>
> From: Tobias Röhmel 
>
> Cortex-R52 has the MPUIR register which has the same encoding
> has the MIDR alias with opc2=4. So we only add that alias
> when we are not realizing a Cortex-R.
>
> Signed-off-by: Tobias Röhmel 
> ---
>  target/arm/helper.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 6457e6301c..03bdc3d149 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -8189,9 +8189,6 @@ void register_cp_regs_for_features(ARMCPU *cpu)
>.fieldoffset = offsetof(CPUARMState, cp15.c0_cpuid),
>.readfn = midr_read },
>  /* crn = 0 op1 = 0 crm = 0 op2 = 4,7 : AArch32 aliases of MIDR */

Moving the op2=4 register definition makes this comment out of date,
so you need to update it (ie remove the '4' here, and add a comment
similarly noting what id_v8_midr_alias_cp_reginfo[] is doing).

> -{ .name = "MIDR", .type = ARM_CP_ALIAS | ARM_CP_CONST,
> -  .cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 4,
> -  .access = PL1_R, .resetvalue = cpu->midr },
>  { .name = "MIDR", .type = ARM_CP_ALIAS | ARM_CP_CONST,
>.cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 7,
>.access = PL1_R, .resetvalue = cpu->midr },
> @@ -8201,6 +8198,11 @@ void register_cp_regs_for_features(ARMCPU *cpu)
>.accessfn = access_aa64_tid1,
>.type = ARM_CP_CONST, .resetvalue = cpu->revidr },
>  };
> +ARMCPRegInfo id_v8_midr_alias_cp_reginfo[] = {
> +{ .name = "MIDR", .type = ARM_CP_ALIAS | ARM_CP_CONST,
> +  .cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 4,
> +  .access = PL1_R, .resetvalue = cpu->midr },
> +};

If there's only one register, you don't need to define a
single-element array, you can use define_one_arm_cp_reg()
and pass it a single ARMCPRegInfo struct.

>  ARMCPRegInfo id_cp_reginfo[] = {
>  /* These are common to v8 and pre-v8 */
>  { .name = "CTR",
> @@ -8264,8 +8266,12 @@ void register_cp_regs_for_features(ARMCPU *cpu)
>  id_mpuir_reginfo.access = PL1_RW;
>  id_tlbtr_reginfo.access = PL1_RW;
>  }
> +
>  if (arm_feature(env, ARM_FEATURE_V8)) {
>  define_arm_cp_regs(cpu, id_v8_midr_cp_reginfo);
> +if (!arm_feature(env, ARM_FEATURE_V8_R)) {

You can use ARM_FEATURE_PMSA here.

> +define_arm_cp_regs(cpu, id_v8_midr_alias_cp_reginfo);
> +}
>  } else {
>  define_arm_cp_regs(cpu, id_pre_v8_midr_cp_reginfo);
>  }
> --

thanks
-- PMM

Re: [PATCH v2 3/9] target/arm: Make RVBAR available for all ARMv8 CPUs

(I've added your rwth-aachen.de address because the quicinc
one seems to be bouncing :-( )

On Mon, 18 Jul 2022 at 12:54, Tobias Roehmel  wrote:
>
> From: Tobias Röhmel 
>
> Signed-off-by: Tobias Röhmel 

Having looked a bit more carefully at the architecture
manual, I think this is not complete. In particular
you don't actually define the AArch32 RVBAR register
(which needs to exist with permissions permitting only
EL2 access if EL2 is the highest EL, and with permissions
for EL1 access if EL1 is the highest EL, in the same way
the AArch64 RVBAR_EL2 and RVBAR_EL1 work (ie by adding
AArch32 alias registers where we define RVBAR_EL2 and
RVBAR_EL1).

(AArch32 has no equivalent of RVBAR_EL3; MVBAR is kind
of the same thing, but it's not required that it has the
actual reset address in it on reset. So we can ignore that
as we're architecturally compliant for A-profile, and for
the R52 we don't care about EL3.).

The commit message here could also usefully be expanded
to explain what we're doing and why it's the right thing.

> ---
>  target/arm/cpu.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 1b5d535788..9007768418 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -258,6 +258,10 @@ static void arm_cpu_reset(DeviceState *dev)
>  env->cp15.cpacr_el1 = FIELD_DP64(env->cp15.cpacr_el1,
>   CPACR, CP11, 3);
>  #endif
> +if (arm_feature(env, ARM_FEATURE_V8)) {
> +env->cp15.rvbar = cpu->rvbar_prop;
> +env->regs[15] = cpu->rvbar_prop;
> +}
>  }
>
>  #if defined(CONFIG_USER_ONLY)
> @@ -1273,7 +1277,7 @@ void arm_cpu_post_init(Object *obj)
>  qdev_property_add_static(DEVICE(obj), 
> &arm_cpu_reset_hivecs_property);
>  }
>
> -if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
> +if (arm_feature(&cpu->env, ARM_FEATURE_V8)) {
>  object_property_add_uint64_ptr(obj, "rvbar",
> &cpu->rvbar_prop,
> OBJ_PROP_FLAG_READWRITE);

thanks
-- PMM

[PATCH 2/2] target/riscv: fence: reconcile with specification

Our decoding of fence-instructions is problematic in respect to the
RISC-V ISA specification:
- rs and rd are ignored, but need to be 0
- fm is ignored

This change adjusts the decode pattern to enfore rs and rd being 0,
and validates the fm-field (together with pred/succ for FENCE.TSO) to
determine whether a reserved instruction is specified.

While the specification allows UNSPECIFIED behaviour for reserved
instructions, we now always raise an illegal instruction exception.

Signed-off-by: Philipp Tomsich 

---

 target/riscv/insn32.decode  |  2 +-
 target/riscv/insn_trans/trans_rvi.c.inc | 19 ++-
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 089128c3dc..4e53df1b62 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -150,7 +150,7 @@ srl  000 .. 101 . 0110011 @r
 sra  010 .. 101 . 0110011 @r
 or   000 .. 110 . 0110011 @r
 and  000 .. 111 . 0110011 @r
-fence pred:4 succ:4 - 000 - 000
+fencefm:4 pred:4 succ:4 0 000 0 000
 fence_i   0 001 0 000
 csrrw . 001 . 1110011 @csr
 csrrs . 010 . 1110011 @csr
diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
b/target/riscv/insn_trans/trans_rvi.c.inc
index ca8e3d1ea1..515bb3b22a 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -795,7 +795,24 @@ static bool trans_srad(DisasContext *ctx, arg_srad *a)
 
 static bool trans_fence(DisasContext *ctx, arg_fence *a)
 {
-/* FENCE is a full memory barrier. */
+switch (a->fm) {
+case 0b:
+ /* normal fence */
+ break;
+
+case 0b0001:
+ /* FENCE.TSO requires PRED and SUCC to be RW */
+ if (a->pred != 0xb0011 || a->succ != 0b0011) {
+return false;
+ }
+ break;
+
+default:
+/* reserved for future use */
+return false;
+}
+
+/* We implement FENCE(.TSO) is a full memory barrier. */
 tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
 return true;
 }
-- 
2.34.1

[PATCH 1/2] target/riscv: fence.i: update decode pattern

The RISC-V specification specifies imm12, rs1 and rd to be all-zeros,
so we can't ignore these bits when decoding into fence.i.

Update the decode pattern to reflect the specification.

Signed-off-by: Philipp Tomsich 
---

 target/riscv/insn32.decode | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 014127d066..089128c3dc 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -151,7 +151,7 @@ sra  010 .. 101 . 0110011 @r
 or   000 .. 110 . 0110011 @r
 and  000 .. 111 . 0110011 @r
 fence pred:4 succ:4 - 000 - 000
-fence_i         - 001 - 000
+fence_i   0 001 0 000
 csrrw . 001 . 1110011 @csr
 csrrs . 010 . 1110011 @csr
 csrrc . 011 . 1110011 @csr
-- 
2.34.1

Re: [PATCH 2/2] target/riscv: fence: reconcile with specification

On Fri, 12 Aug 2022 at 14:17, Philipp Tomsich  wrote:
>
> Our decoding of fence-instructions is problematic in respect to the
> RISC-V ISA specification:
> - rs and rd are ignored, but need to be 0
> - fm is ignored
>
> This change adjusts the decode pattern to enfore rs and rd being 0,
> and validates the fm-field (together with pred/succ for FENCE.TSO) to
> determine whether a reserved instruction is specified.
>
> While the specification allows UNSPECIFIED behaviour for reserved
> instructions, we now always raise an illegal instruction exception.
>
> Signed-off-by: Philipp Tomsich 
>
> ---
>
>  target/riscv/insn32.decode  |  2 +-
>  target/riscv/insn_trans/trans_rvi.c.inc | 19 ++-
>  2 files changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 089128c3dc..4e53df1b62 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -150,7 +150,7 @@ srl  000 .. 101 . 0110011 @r
>  sra  010 .. 101 . 0110011 @r
>  or   000 .. 110 . 0110011 @r
>  and  000 .. 111 . 0110011 @r
> -fence pred:4 succ:4 - 000 - 000
> +fencefm:4 pred:4 succ:4 0 000 0 000
>  fence_i   0 001 0 000
>  csrrw . 001 . 1110011 @csr
>  csrrs . 010 . 1110011 @csr
> diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
> b/target/riscv/insn_trans/trans_rvi.c.inc
> index ca8e3d1ea1..515bb3b22a 100644
> --- a/target/riscv/insn_trans/trans_rvi.c.inc
> +++ b/target/riscv/insn_trans/trans_rvi.c.inc
> @@ -795,7 +795,24 @@ static bool trans_srad(DisasContext *ctx, arg_srad *a)
>
>  static bool trans_fence(DisasContext *ctx, arg_fence *a)
>  {
> -/* FENCE is a full memory barrier. */
> +switch (a->fm) {
> +case 0b:
> + /* normal fence */
> + break;
> +
> +case 0b0001:
> + /* FENCE.TSO requires PRED and SUCC to be RW */
> + if (a->pred != 0xb0011 || a->succ != 0b0011) {
> +return false;
> + }
> + break;
> +
> +default:
> +/* reserved for future use */
> +return false;
> +}

I think it would be neater to do this decode in the
.decode file, rather than by hand in the trans function.

thanks
-- PMM

[PATCH v4 1/1] os-posix: asynchronous teardown for shutdown on Linux

2022-08-12 Thread Claudio Imbrenda

This patch adds support for asynchronously tearing down a VM on Linux.

When qemu terminates, either naturally or because of a fatal signal,
the VM is torn down. If the VM is huge, it can take a considerable
amount of time for it to be cleaned up. In case of a protected VM, it
might take even longer than a non-protected VM (this is the case on
s390x, for example).

Some users might want to shut down a VM and restart it immediately,
without having to wait. This is especially true if management
infrastructure like libvirt is used.

This patch implements a simple trick on Linux to allow qemu to return
immediately, with the teardown of the VM being performed
asynchronously.

If the new commandline option -async-teardown is used, a new process is
spawned from qemu at startup, using the clone syscall, in such way that
it will share its address space with qemu.The new process will have the
name "cleanup/". It will wait until qemu terminates
completely, and then it will exit itself.

This allows qemu to terminate quickly, without having to wait for the
whole address space to be torn down. The cleanup process will exit
after qemu, so it will be the last user of the address space, and
therefore it will take care of the actual teardown. The cleanup
process will share the same cgroups as qemu, so both memory usage and
cpu time will be accounted properly.

If possible, close_range will be used in the cleanup process to close
all open file descriptors. If it is not available or if it fails, /proc
will be used to determine which file descriptors to close.

If the cleanup process is forcefully killed with SIGKILL before the
main qemu process has terminated completely, the mechanism is defeated
and the teardown will not be asynchronous.

This feature can already be used with libvirt by adding the following
to the XML domain definition to pass the parameter to qemu directly:

  http://libvirt.org/schemas/domain/qemu/1.0";>
  
  

Signed-off-by: Claudio Imbrenda 
Reviewed-by: Murilo Opsfelder Araujo 
Tested-by: Murilo Opsfelder Araujo 
---
 include/qemu/async-teardown.h |  22 +
 meson.build   |   1 +
 os-posix.c|   6 ++
 qemu-options.hx   |  19 +
 util/async-teardown.c | 155 ++
 util/meson.build  |   1 +
 6 files changed, 204 insertions(+)
 create mode 100644 include/qemu/async-teardown.h
 create mode 100644 util/async-teardown.c

diff --git a/include/qemu/async-teardown.h b/include/qemu/async-teardown.h
new file mode 100644
index 00..092e7a37e7
--- /dev/null
+++ b/include/qemu/async-teardown.h
@@ -0,0 +1,22 @@
+/*
+ * Asynchronous teardown
+ *
+ * Copyright IBM, Corp. 2022
+ *
+ * Authors:
+ *  Claudio Imbrenda 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at your
+ * option) any later version.  See the COPYING file in the top-level directory.
+ *
+ */
+#ifndef QEMU_ASYNC_TEARDOWN_H
+#define QEMU_ASYNC_TEARDOWN_H
+
+#include "config-host.h"
+
+#ifdef CONFIG_LINUX
+void init_async_teardown(void);
+#endif
+
+#endif
diff --git a/meson.build b/meson.build
index 294e9a8f32..7bccad93d0 100644
--- a/meson.build
+++ b/meson.build
@@ -1892,6 +1892,7 @@ config_host_data.set('HAVE_SYS_IOCCOM_H', 
cc.has_header('sys/ioccom.h'))
 config_host_data.set('HAVE_SYS_KCOV_H', cc.has_header('sys/kcov.h'))
 
 # has_function
+config_host_data.set('CONFIG_CLOSE_RANGE', cc.has_function('close_range'))
 config_host_data.set('CONFIG_ACCEPT4', cc.has_function('accept4'))
 config_host_data.set('CONFIG_CLOCK_ADJTIME', cc.has_function('clock_adjtime'))
 config_host_data.set('CONFIG_DUP3', cc.has_function('dup3'))
diff --git a/os-posix.c b/os-posix.c
index 321fc4bd13..4858650c3e 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -39,6 +39,7 @@
 
 #ifdef CONFIG_LINUX
 #include 
+#include "qemu/async-teardown.h"
 #endif
 
 /*
@@ -150,6 +151,11 @@ int os_parse_cmd_args(int index, const char *optarg)
 case QEMU_OPTION_daemonize:
 daemonize = 1;
 break;
+#if defined(CONFIG_LINUX)
+case QEMU_OPTION_asyncteardown:
+init_async_teardown();
+break;
+#endif
 default:
 return -1;
 }
diff --git a/qemu-options.hx b/qemu-options.hx
index 3f23a42fa8..f913fc307f 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4743,6 +4743,25 @@ HXCOMM Internal use
 DEF("qtest", HAS_ARG, QEMU_OPTION_qtest, "", QEMU_ARCH_ALL)
 DEF("qtest-log", HAS_ARG, QEMU_OPTION_qtest_log, "", QEMU_ARCH_ALL)
 
+#ifdef __linux__
+DEF("async-teardown", 0, QEMU_OPTION_asyncteardown,
+"-async-teardown enable asynchronous teardown\n",
+QEMU_ARCH_ALL)
+#endif
+SRST
+``-async-teardown``
+Enable asynchronous teardown. A new process called "cleanup/"
+will be created at startup sharing the address space with the main qemu
+process, using clone. It will wait for the main qemu process to
+terminate completely, and then exit.
+This allows qemu to terminate very quickly even if the guest was

[PATCH v2] hw/smbios: support for type 8 (port connector)

2022-08-12 Thread Hal Martin

PATCH v1: add support for SMBIOS type 8 to qemu
PATCH v2: incorporate patch v1 feedback and add smbios type=8 to qemu-options

internal_reference: internal reference designator
external_reference: external reference designator
connector_type: hex value for port connector type (see SMBIOS 7.9.2)
port_type: hex value for port type (see SMBIOS 7.9.3)

After studying various vendor implementationsi (Dell, Lenovo, MSI),
the value of internal connector type was hard-coded to 0x0 (None).

Example usage:
-smbios 
type=8,internal_reference=JUSB1,external_reference=USB1,connector_type=0x12,port_type=0x10
 \
-smbios type=8,internal_reference=JAUD1,external_reference="Audio 
Jack",connector_type=0x1f,port_type=0x1d \
-smbios 
type=8,internal_reference=LAN,external_reference=Ethernet,connector_type=0x0b,port_type=0x1f
 \
-smbios 
type=8,internal_reference=PS2,external_reference=Mouse,connector_type=0x0f,port_type=0x0e
 \
-smbios 
type=8,internal_reference=PS2,external_reference=Keyboard,connector_type=0x0f,port_type=0x0d


Signed-off-by: Hal Martin 

---
 hw/smbios/smbios.c   | 63 
 include/hw/firmware/smbios.h | 10 ++
 qemu-options.hx  |  2 ++
 3 files changed, 75 insertions(+)

diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index 60349ee402..578cae0f0a 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -111,6 +111,13 @@ static struct {
 .processor_id = 0,
 };
 
+struct type8_instance {
+const char *internal_reference, *external_reference;
+uint8_t connector_type, port_type;
+QTAILQ_ENTRY(type8_instance) next;
+};
+static QTAILQ_HEAD(, type8_instance) type8 = QTAILQ_HEAD_INITIALIZER(type8);
+
 static struct {
 size_t nvalues;
 char **values;
@@ -337,6 +344,29 @@ static const QemuOptDesc qemu_smbios_type4_opts[] = {
 { /* end of list */ }
 };
 
+static const QemuOptDesc qemu_smbios_type8_opts[] = {
+{
+.name = "internal_reference",
+.type = QEMU_OPT_STRING,
+.help = "internal reference designator",
+},
+{
+.name = "external_reference",
+.type = QEMU_OPT_STRING,
+.help = "external reference designator",
+},
+{
+.name = "connector_type",
+.type = QEMU_OPT_NUMBER,
+.help = "connector type",
+},
+{
+.name = "port_type",
+.type = QEMU_OPT_NUMBER,
+.help = "port type",
+},
+};
+
 static const QemuOptDesc qemu_smbios_type11_opts[] = {
 {
 .name = "value",
@@ -718,6 +748,26 @@ static void smbios_build_type_4_table(MachineState *ms, 
unsigned instance)
 smbios_type4_count++;
 }
 
+static void smbios_build_type_8_table(void)
+{
+unsigned instance = 0;
+struct type8_instance *t8;
+
+QTAILQ_FOREACH(t8, &type8, next) {
+SMBIOS_BUILD_TABLE_PRE(8, T0_BASE + instance, true);
+
+SMBIOS_TABLE_SET_STR(8, internal_reference_str, 
t8->internal_reference);
+SMBIOS_TABLE_SET_STR(8, external_reference_str, 
t8->external_reference);
+/* most vendors seem to set this to None */
+t->internal_connector_type = 0x0;
+t->external_connector_type = t8->connector_type;
+t->port_type = t8->port_type;
+
+SMBIOS_BUILD_TABLE_POST;
+instance++;
+}
+}
+
 static void smbios_build_type_11_table(void)
 {
 char count_str[128];
@@ -1030,6 +1080,7 @@ void smbios_get_tables(MachineState *ms,
 smbios_build_type_4_table(ms, i);
 }
 
+smbios_build_type_8_table();
 smbios_build_type_11_table();
 
 #define MAX_DIMM_SZ (16 * GiB)
@@ -1346,6 +1397,18 @@ void smbios_entry_add(QemuOpts *opts, Error **errp)
UINT16_MAX);
 }
 return;
+case 8:
+if (!qemu_opts_validate(opts, qemu_smbios_type8_opts, errp)) {
+return;
+}
+struct type8_instance *t;
+t = g_new0(struct type8_instance, 1);
+save_opt(&t->internal_reference, opts, "internal_reference");
+save_opt(&t->external_reference, opts, "external_reference");
+t->connector_type = qemu_opt_get_number(opts, "connector_type", 0);
+t->port_type = qemu_opt_get_number(opts, "port_type", 0);
+QTAILQ_INSERT_TAIL(&type8, t, next);
+return;
 case 11:
 if (!qemu_opts_validate(opts, qemu_smbios_type11_opts, errp)) {
 return;
diff --git a/include/hw/firmware/smbios.h b/include/hw/firmware/smbios.h
index 4b7ad77a44..e7d386f7c8 100644
--- a/include/hw/firmware/smbios.h
+++ b/include/hw/firmware/smbios.h
@@ -189,6 +189,16 @@ struct smbios_type_4 {
 uint16_t processor_family2;
 } QEMU_PACKED;
 
+/* SMBIOS type 8 - Port Connector Information */
+struct smbios_type_8 {
+struct smbios_structure_header header;
+uint8_t internal_reference_str;
+uint8_t internal_connector_type;
+uint8_t external_reference_str;
+uint8_t external_connec

Re: [PATCH 1/2] target/riscv: fence.i: update decode pattern

Please use a cover-letter for multi-patch patch series.

On Fri, Aug 12, 2022 at 03:13:03PM +0200, Philipp Tomsich wrote:
> The RISC-V specification specifies imm12, rs1 and rd to be all-zeros,
> so we can't ignore these bits when decoding into fence.i.
> 
> Update the decode pattern to reflect the specification.

I got hung-up on this for a bit since there isn't any "must-be-0" fields,
only ignored fields, but the next patch gives a clue which helped me make
sense of this. The encoding of these instructions with ignored fields set
to anything except zero gets into reserved instruction territory, and QEMU
may legally raise an illegal-instruction in that case, which this patch
will start doing. It'd be nice to have a bit more text in this commit
message to make that clear.

> 
> Signed-off-by: Philipp Tomsich 
> ---
> 
>  target/riscv/insn32.decode | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 014127d066..089128c3dc 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -151,7 +151,7 @@ sra  010 .. 101 . 0110011 @r
>  or   000 .. 110 . 0110011 @r
>  and  000 .. 111 . 0110011 @r
>  fence pred:4 succ:4 - 000 - 000
> -fence_i         - 001 - 000
> +fence_i   0 001 0 000
^ need two more spaces here to line up with fence.

>  csrrw . 001 . 1110011 @csr
>  csrrs . 010 . 1110011 @csr
>  csrrc . 011 . 1110011 @csr
> -- 
> 2.34.1
> 
> 

Thanks,
drew

Re: [PATCH 2/2] target/riscv: fence: reconcile with specification

On Fri, Aug 12, 2022 at 03:13:04PM +0200, Philipp Tomsich wrote:
> Our decoding of fence-instructions is problematic in respect to the
> RISC-V ISA specification:
> - rs and rd are ignored, but need to be 0
> - fm is ignored
> 
> This change adjusts the decode pattern to enfore rs and rd being 0,
> and validates the fm-field (together with pred/succ for FENCE.TSO) to
> determine whether a reserved instruction is specified.
> 
> While the specification allows UNSPECIFIED behaviour for reserved
> instructions, we now always raise an illegal instruction exception.
> 
> Signed-off-by: Philipp Tomsich 
> 
> ---
> 
>  target/riscv/insn32.decode  |  2 +-
>  target/riscv/insn_trans/trans_rvi.c.inc | 19 ++-
>  2 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 089128c3dc..4e53df1b62 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -150,7 +150,7 @@ srl  000 .. 101 . 0110011 @r
>  sra  010 .. 101 . 0110011 @r
>  or   000 .. 110 . 0110011 @r
>  and  000 .. 111 . 0110011 @r
> -fence pred:4 succ:4 - 000 - 000
> +fencefm:4 pred:4 succ:4 0 000 0 000
>  fence_i   0 001 0 000
>  csrrw . 001 . 1110011 @csr
>  csrrs . 010 . 1110011 @csr
> diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
> b/target/riscv/insn_trans/trans_rvi.c.inc
> index ca8e3d1ea1..515bb3b22a 100644
> --- a/target/riscv/insn_trans/trans_rvi.c.inc
> +++ b/target/riscv/insn_trans/trans_rvi.c.inc
> @@ -795,7 +795,24 @@ static bool trans_srad(DisasContext *ctx, arg_srad *a)
>  
>  static bool trans_fence(DisasContext *ctx, arg_fence *a)
>  {
> -/* FENCE is a full memory barrier. */
> +switch (a->fm) {
> +case 0b:
> + /* normal fence */
> + break;
> +
> +case 0b0001:
> + /* FENCE.TSO requires PRED and SUCC to be RW */
> + if (a->pred != 0xb0011 || a->succ != 0b0011) {
> +return false;
> + }
> + break;
> +
> +default:
> +/* reserved for future use */
> +return false;
> +}
> +
> +/* We implement FENCE(.TSO) is a full memory barrier. */

s/is/as/

Thanks,
drew

>  tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
>  return true;
>  }
> -- 
> 2.34.1
> 
>

Re: [PATCH 1/2] target/riscv: fence.i: update decode pattern

On Fri, 12 Aug 2022 at 16:01, Andrew Jones  wrote:
>
> > Update the decode pattern to reflect the specification.
>
> I got hung-up on this for a bit since there isn't any "must-be-0" fields,

Please refer to '“Zifencei” Instruction-Fetch Fence, Version 2.0' in
the specification.
The encoding diagram clearly states 0 for imm[11:0], 0 for rs1 and 0 for rd.

However, there is an explanatory paragraph below (unfortunately, it is
not clear whether this is normative or informative):
> The unused fields in the FENCE.I instruction, imm[11:0], rs1, and rd, are 
> reserved for finer-grain fences in future extensions. For forward 
> compatibility, base implementations shall ignore these fields, and standard 
> software shall zero these fields.

Strictly speaking, this patch may be too restrictive (it violates the
"for forward-compatibility" part — which I consider informative only,
though).

Thanks,
Philipp.

Re: [PATCH v2 1/8] parallels: Out of image offset in BAT leads to image inflation


On 11.08.2022 17:00, Alexander Ivanov wrote:

When an image is opened, data_end field in BDRVParallelsState
is setted as the biggest offset in the BAT plus cluster size.
If there is a corrupted offset pointing outside the image,
the image size increase accordingly. It potentially leads
to attempts to create a file size of petabytes.

Set the data_end field with the original file size if the image
was opened for checking and repairing purposes or raise an error.

v2: No changes.

Changelog should be below ---
In that case it will not be merged.

There are a lot of typos/mistakes inside, I'd better use the comment
below.

"data_end field in BDRVParallelsState is set to the biggest offset present
in BAT. If this offset is outside of the image, any further write will 
create

the cluster at this offset and/or the image will be truncated to this
offset on close. This is definitely not correct and should be fixed."

With this change:
Reviewed-by: Denis V. Lunev 


Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 17 +
  1 file changed, 17 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index a229c06f25..a76cf9d993 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -732,6 +732,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  BDRVParallelsState *s = bs->opaque;
  ParallelsHeader ph;
  int ret, size, i;
+int64_t file_size;
  QemuOpts *opts = NULL;
  Error *local_err = NULL;
  char *buf;
@@ -811,6 +812,22 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  }
  }
  
+file_size = bdrv_getlength(bs->file->bs);

+if (file_size < 0) {
+goto fail;
+}
+
+file_size >>= BDRV_SECTOR_BITS;
+if (s->data_end > file_size) {
+if (flags & BDRV_O_CHECK) {
+s->data_end = file_size;
+} else {
+error_setg(errp, "parallels: Offset in BAT is out of image");
+ret = -EINVAL;
+goto fail;
+}
+}
+
  if (le32_to_cpu(ph.inuse) == HEADER_INUSE_MAGIC) {
  /* Image was not closed correctly. The check is mandatory */
  s->header_unclean = true;

Re: [PATCH 2/2] target/riscv: fence: reconcile with specification

Happy to lower it back into the decode file.
However, I initially pulled it up into the trans-function to more
closely match the ISA specification: there is only one FENCE
instruction with 3 arguments (FM, PRED, and SUCC).
One might argue that the decode table for "RV32I Base Instruction Set"
in the specification lists FENCE.TSO as a separate instruction, but
the normative text doesn't (and FENCE overlaps FENCE.TSO in the
tabular representation) — so I would consider the table as
informative.

I'll wait until we see what consensus emerges from the discussion.

Philipp.

On Fri, 12 Aug 2022 at 15:21, Peter Maydell  wrote:
>
> On Fri, 12 Aug 2022 at 14:17, Philipp Tomsich  
> wrote:
> >
> > Our decoding of fence-instructions is problematic in respect to the
> > RISC-V ISA specification:
> > - rs and rd are ignored, but need to be 0
> > - fm is ignored
> >
> > This change adjusts the decode pattern to enfore rs and rd being 0,
> > and validates the fm-field (together with pred/succ for FENCE.TSO) to
> > determine whether a reserved instruction is specified.
> >
> > While the specification allows UNSPECIFIED behaviour for reserved
> > instructions, we now always raise an illegal instruction exception.
> >
> > Signed-off-by: Philipp Tomsich 
> >
> > ---
> >
> >  target/riscv/insn32.decode  |  2 +-
> >  target/riscv/insn_trans/trans_rvi.c.inc | 19 ++-
> >  2 files changed, 19 insertions(+), 2 deletions(-)
> >
> > diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> > index 089128c3dc..4e53df1b62 100644
> > --- a/target/riscv/insn32.decode
> > +++ b/target/riscv/insn32.decode
> > @@ -150,7 +150,7 @@ srl  000 .. 101 . 0110011 @r
> >  sra  010 .. 101 . 0110011 @r
> >  or   000 .. 110 . 0110011 @r
> >  and  000 .. 111 . 0110011 @r
> > -fence pred:4 succ:4 - 000 - 000
> > +fencefm:4 pred:4 succ:4 0 000 0 000
> >  fence_i   0 001 0 000
> >  csrrw . 001 . 1110011 @csr
> >  csrrs . 010 . 1110011 @csr
> > diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
> > b/target/riscv/insn_trans/trans_rvi.c.inc
> > index ca8e3d1ea1..515bb3b22a 100644
> > --- a/target/riscv/insn_trans/trans_rvi.c.inc
> > +++ b/target/riscv/insn_trans/trans_rvi.c.inc
> > @@ -795,7 +795,24 @@ static bool trans_srad(DisasContext *ctx, arg_srad *a)
> >
> >  static bool trans_fence(DisasContext *ctx, arg_fence *a)
> >  {
> > -/* FENCE is a full memory barrier. */
> > +switch (a->fm) {
> > +case 0b:
> > + /* normal fence */
> > + break;
> > +
> > +case 0b0001:
> > + /* FENCE.TSO requires PRED and SUCC to be RW */
> > + if (a->pred != 0xb0011 || a->succ != 0b0011) {
> > +return false;
> > + }
> > + break;
> > +
> > +default:
> > +/* reserved for future use */
> > +return false;
> > +}
>
> I think it would be neater to do this decode in the
> .decode file, rather than by hand in the trans function.
>
> thanks
> -- PMM

Re: [PATCH 1/2] target/riscv: fence.i: update decode pattern

On Fri, 12 Aug 2022 at 15:11, Philipp Tomsich  wrote:
>
> On Fri, 12 Aug 2022 at 16:01, Andrew Jones  wrote:
> >
> > > Update the decode pattern to reflect the specification.
> >
> > I got hung-up on this for a bit since there isn't any "must-be-0" fields,
>
> Please refer to '“Zifencei” Instruction-Fetch Fence, Version 2.0' in
> the specification.
> The encoding diagram clearly states 0 for imm[11:0], 0 for rs1 and 0 for rd.
>
> However, there is an explanatory paragraph below (unfortunately, it is
> not clear whether this is normative or informative):

> > The unused fields in the FENCE.I instruction, imm[11:0], rs1, and rd, are 
> > reserved for finer-grain fences in future extensions. For forward 
> > compatibility, base implementations shall ignore these fields, and standard 
> > software shall zero these fields.

That's pretty clear that this patch is wrong, then -- QEMU
is an implementation, and so we must ignore these fields.
Otherwise when a future version of the spec defines a finer-grain
fence instruction in this part of the encoding space, older
QEMU will incorrectly make software that uses it crash.

If you think the spec is insufficiently clear about whether that
is normative then that would be something to raise with the
spec authors, preferably before anybody builds hardware that
enforces must-be-zeroes on these fields...

thanks
-- PMM

Re: [PATCH v2 2/8] parallels: Move BAT entry setting to a separate function


On 11.08.2022 17:00, Alexander Ivanov wrote:

Will need to set BAT entry in multiple places.
Move the code of settings entries and marking relevant blocks dirty
to a separate helper parallels_set_bat_entry.

The comment and the patch text is ambiguous.
You say that we need to set BAT in multiple places
but patch changes one place only. I think that it would
be better to say like the following:

"parallels: create parallels_set_bat_entry_helper() to assign BAT value

This helper will be reused in next patches during parallels_co_check
rework to simplify its code"


v2: A new patch - a part of a splitted patch.

Same note about version tracking

With above taken into account:
Reviewed-by: Denis V. Lunev 


Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index a76cf9d993..7f68f3cbc9 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -165,6 +165,13 @@ static int64_t block_status(BDRVParallelsState *s, int64_t 
sector_num,
  return start_off;
  }
  
+static void parallels_set_bat_entry(BDRVParallelsState *s,

+uint32_t index, uint32_t offset)
+{
+s->bat_bitmap[index] = offset;
+bitmap_set(s->bat_dirty_bmap, bat_entry_off(index) / s->bat_dirty_block, 
1);
+}
+
  static int64_t allocate_clusters(BlockDriverState *bs, int64_t sector_num,
   int nb_sectors, int *pnum)
  {
@@ -250,10 +257,9 @@ static int64_t allocate_clusters(BlockDriverState *bs, 
int64_t sector_num,
  }
  
  for (i = 0; i < to_allocate; i++) {

-s->bat_bitmap[idx + i] = cpu_to_le32(s->data_end / s->off_multiplier);
+parallels_set_bat_entry(s, idx + i,
+cpu_to_le32(s->data_end / s->off_multiplier));
  s->data_end += s->tracks;
-bitmap_set(s->bat_dirty_bmap,
-   bat_entry_off(idx + i) / s->bat_dirty_block, 1);
  }
  
  return bat2sect(s, idx) + sector_num % s->tracks;

Re: [PATCH v2 3/8] parallels: Replace bdrv_co_pwrite_sync by bdrv_co_flush for BAT flushing


Use generic infrastructure for BAT writing in parallels_co_check()

On 11.08.2022 17:00, Alexander Ivanov wrote:

It's too costly to write all the BAT to the disk. Let the flush function
write only dirty blocks.
Use parallels_set_bat_entry for setting a BAT entry and marking a relevant
block as dirty.
Move bdrv_co_flush call outside the locked area.

This is not correct too:
- with a lot of cases inside parallels_co_check, which will be split to
  smaller functions, we would like to avoid writing BAT once each
  stage is complete

Thus the comment should look like the following:
"BAT is written in the context of conventional operations over
the image inside bdrv_co_flush() when it calls
parallels_co_flush_to_os() callback. Thus we should not
modify BAT array directly, but call parallels_set_bat_entry()
helper and bdrv_co_flush() further on. After that there is no
need to manually write BAT and track its modification.

This makes code more generic and allows to split
parallels_set_bat_entry() for independent pieces."


v2: Patch order was changed so the replacement is done in parallels_co_check.
 Now we use a helper to set BAT entry and mark the block dirty.

Same note about changelog as n other patches.

Side note. I like Linux kernel a lot and there is a good
guide in. Please look how commit message
https://www.kernel.org/doc/html/latest/process/submitting-patches.html

May be you could spend some more time on commit
message.

With a better message:
Reviewed-by: Denis V. Lunev 


Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 19 +++
  1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 7f68f3cbc9..6879ea4597 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -428,7 +428,6 @@ static int coroutine_fn parallels_co_check(BlockDriverState 
*bs,
  int64_t size, prev_off, high_off;
  int ret;
  uint32_t i;
-bool flush_bat = false;
  
  size = bdrv_getlength(bs->file->bs);

  if (size < 0) {
@@ -467,9 +466,8 @@ static int coroutine_fn parallels_co_check(BlockDriverState 
*bs,
  res->corruptions++;
  if (fix & BDRV_FIX_ERRORS) {
  prev_off = 0;
-s->bat_bitmap[i] = 0;
+parallels_set_bat_entry(s, i, 0);
  res->corruptions_fixed++;
-flush_bat = true;
  continue;
  }
  }
@@ -485,15 +483,6 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  prev_off = off;
  }
  
-ret = 0;

-if (flush_bat) {
-ret = bdrv_co_pwrite_sync(bs->file, 0, s->header_size, s->header, 0);
-if (ret < 0) {
-res->check_errors++;
-goto out;
-}
-}
-
  res->image_end_offset = high_off + s->cluster_size;
  if (size > res->image_end_offset) {
  int64_t count;
@@ -522,6 +511,12 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  
  out:

  qemu_co_mutex_unlock(&s->lock);
+
+ret = bdrv_co_flush(bs);
+if (ret < 0) {
+res->check_errors++;
+}
+
  return ret;
  }

Re: [PATCH for-7.2 v4 02/11] ppc/pnv: add phb-id/chip-id PnvPHB4RootBus properties





On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

The same rationale provided in the PHB3 bus case applies here.

Note: we could have merged both buses in a single object, like we did
with the root ports, and spare some boilerplate. The reason we opted to
preserve both buses objects is twofold:

- there's not user side advantage in doing so. Unifying the root ports
presents a clear user QOL change when we enable user created devices back.
The buses objects, aside from having a different QOM name, is transparent
to the user;

- we leave a door opened in case we want to increase the root port limit
for phb4/5 later on without having to deal with phb3 code.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Daniel Henrique Barboza 
---



Reviewed-by: Frederic Barrat 

  Fred



  hw/pci-host/pnv_phb4.c | 51 ++
  include/hw/pci-host/pnv_phb4.h | 10 +++
  2 files changed, 61 insertions(+)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index b98c394713..824e1a73fb 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -1551,6 +1551,12 @@ void pnv_phb4_bus_init(DeviceState *dev, PnvPHB4 *phb)
   pnv_phb4_set_irq, pnv_phb4_map_irq, phb,
   &phb->pci_mmio, &phb->pci_io,
   0, 4, TYPE_PNV_PHB4_ROOT_BUS);
+
+object_property_set_int(OBJECT(pci->bus), "phb-id", phb->phb_id,
+&error_abort);
+object_property_set_int(OBJECT(pci->bus), "chip-id", phb->chip_id,
+&error_abort);
+
  pci_setup_iommu(pci->bus, pnv_phb4_dma_iommu, phb);
  pci->bus->flags |= PCI_BUS_EXTENDED_CONFIG_SPACE;
  }
@@ -1708,10 +1714,55 @@ static const TypeInfo pnv_phb5_type_info = {
  .instance_size = sizeof(PnvPHB4),
  };
  
+static void pnv_phb4_root_bus_get_prop(Object *obj, Visitor *v,

+   const char *name,
+   void *opaque, Error **errp)
+{
+PnvPHB4RootBus *bus = PNV_PHB4_ROOT_BUS(obj);
+uint64_t value = 0;
+
+if (strcmp(name, "phb-id") == 0) {
+value = bus->phb_id;
+} else {
+value = bus->chip_id;
+}
+
+visit_type_size(v, name, &value, errp);
+}
+
+static void pnv_phb4_root_bus_set_prop(Object *obj, Visitor *v,
+   const char *name,
+   void *opaque, Error **errp)
+
+{
+PnvPHB4RootBus *bus = PNV_PHB4_ROOT_BUS(obj);
+uint64_t value;
+
+if (!visit_type_size(v, name, &value, errp)) {
+return;
+}
+
+if (strcmp(name, "phb-id") == 0) {
+bus->phb_id = value;
+} else {
+bus->chip_id = value;
+}
+}
+
  static void pnv_phb4_root_bus_class_init(ObjectClass *klass, void *data)
  {
  BusClass *k = BUS_CLASS(klass);
  
+object_class_property_add(klass, "phb-id", "int",

+  pnv_phb4_root_bus_get_prop,
+  pnv_phb4_root_bus_set_prop,
+  NULL, NULL);
+
+object_class_property_add(klass, "chip-id", "int",
+  pnv_phb4_root_bus_get_prop,
+  pnv_phb4_root_bus_set_prop,
+  NULL, NULL);
+
  /*
   * PHB4 has only a single root complex. Enforce the limit on the
   * parent bus
diff --git a/include/hw/pci-host/pnv_phb4.h b/include/hw/pci-host/pnv_phb4.h
index 20aa4819d3..50d4faa001 100644
--- a/include/hw/pci-host/pnv_phb4.h
+++ b/include/hw/pci-host/pnv_phb4.h
@@ -45,7 +45,17 @@ typedef struct PnvPhb4DMASpace {
  QLIST_ENTRY(PnvPhb4DMASpace) list;
  } PnvPhb4DMASpace;
  
+/*

+ * PHB4 PCIe Root Bus
+ */
  #define TYPE_PNV_PHB4_ROOT_BUS "pnv-phb4-root"
+struct PnvPHB4RootBus {
+PCIBus parent;
+
+uint32_t chip_id;
+uint32_t phb_id;
+};
+OBJECT_DECLARE_SIMPLE_TYPE(PnvPHB4RootBus, PNV_PHB4_ROOT_BUS)
  
  /*

   * PHB4 PCIe Host Bridge for PowerNV machines (POWER9)

Re: [PATCH for-7.2 v4 03/11] ppc/pnv: set root port chassis and slot using Bus properties





On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

For default root ports we have a way of accessing chassis and slot,
before root_port_realize(), via pnv_phb_attach_root_port(). For the
future user created root ports this won't be the case: we can't use
this helper because we don't have access to the PHB phb-id/chip-id
values.

In earlier patches we've added phb-id and chip-id to pnv-phb-root-bus
objects. We're now able to use the bus to retrieve them. The bus is
reachable for both user created and default devices, so we're changing
all the code paths. This also allow us to validate these changes with
the existing default devices.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Daniel Henrique Barboza 
---



Reviewed-by: Frederic Barrat 

  Fred



  hw/pci-host/pnv_phb.c | 25 -
  1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index c47ed92462..826c0c144e 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -25,21 +25,19 @@
   * QOM id. 'chip_id' is going to be used as PCIE chassis for the
   * root port.
   */
-static void pnv_phb_attach_root_port(PCIHostState *pci, int index, int chip_id)
+static void pnv_phb_attach_root_port(PCIHostState *pci)
  {
  PCIDevice *root = pci_new(PCI_DEVFN(0, 0), TYPE_PNV_PHB_ROOT_PORT);
-g_autofree char *default_id = g_strdup_printf("%s[%d]",
-  TYPE_PNV_PHB_ROOT_PORT,
-  index);
  const char *dev_id = DEVICE(root)->id;
+g_autofree char *default_id = NULL;
+int index;
+
+index = object_property_get_int(OBJECT(pci->bus), "phb-id", &error_fatal);
+default_id = g_strdup_printf("%s[%d]", TYPE_PNV_PHB_ROOT_PORT, index);
  
  object_property_add_child(OBJECT(pci->bus), dev_id ? dev_id : default_id,

OBJECT(root));
  
-/* Set unique chassis/slot values for the root port */

-qdev_prop_set_uint8(DEVICE(root), "chassis", chip_id);
-qdev_prop_set_uint16(DEVICE(root), "slot", index);
-
  pci_realize_and_unref(root, pci->bus, &error_fatal);
  }
  
@@ -93,7 +91,7 @@ static void pnv_phb_realize(DeviceState *dev, Error **errp)

  pnv_phb4_bus_init(dev, PNV_PHB4(phb->backend));
  }
  
-pnv_phb_attach_root_port(pci, phb->phb_id, phb->chip_id);

+pnv_phb_attach_root_port(pci);
  }
  
  static const char *pnv_phb_root_bus_path(PCIHostState *host_bridge,

@@ -162,9 +160,18 @@ static void pnv_phb_root_port_realize(DeviceState *dev, 
Error **errp)
  {
  PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(dev);
  PnvPHBRootPort *phb_rp = PNV_PHB_ROOT_PORT(dev);
+PCIBus *bus = PCI_BUS(qdev_get_parent_bus(dev));
  PCIDevice *pci = PCI_DEVICE(dev);
  uint16_t device_id = 0;
  Error *local_err = NULL;
+int chip_id, index;
+
+chip_id = object_property_get_int(OBJECT(bus), "chip-id", &error_fatal);
+index = object_property_get_int(OBJECT(bus), "phb-id", &error_fatal);
+
+/* Set unique chassis/slot values for the root port */
+qdev_prop_set_uint8(dev, "chassis", chip_id);
+qdev_prop_set_uint16(dev, "slot", index);
  
  rpc->parent_realize(dev, &local_err);

  if (local_err) {

Re: [PATCH v2 4/8] parallels: Move check of unclean image to a separate function


On 11.08.2022 17:00, Alexander Ivanov wrote:

v2: Revert the condition with s->header_unclean.

same comment about change log as previously

And commit message misses motivation part, why we are
doing this rework. What is the goal of this change?

The code part is clean.


Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 31 +--
  1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 6879ea4597..c53b2810cf 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -419,6 +419,25 @@ static coroutine_fn int 
parallels_co_readv(BlockDriverState *bs,
  return ret;
  }
  
+static void parallels_check_unclean(BlockDriverState *bs,

+BdrvCheckResult *res,
+BdrvCheckMode fix)
+{
+BDRVParallelsState *s = bs->opaque;
+
+if (!s->header_unclean) {
+return;
+}
+
+fprintf(stderr, "%s image was not closed correctly\n",
+fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR");
+res->corruptions++;
+if (fix & BDRV_FIX_ERRORS) {
+/* parallels_close will do the job right */
+res->corruptions_fixed++;
+s->header_unclean = false;
+}
+}
  
  static int coroutine_fn parallels_co_check(BlockDriverState *bs,

 BdrvCheckResult *res,
@@ -436,16 +455,8 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  }
  
  qemu_co_mutex_lock(&s->lock);

-if (s->header_unclean) {
-fprintf(stderr, "%s image was not closed correctly\n",
-fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR");
-res->corruptions++;
-if (fix & BDRV_FIX_ERRORS) {
-/* parallels_close will do the job right */
-res->corruptions_fixed++;
-s->header_unclean = false;
-}
-}
+
+parallels_check_unclean(bs, res, fix);
  
  res->bfi.total_clusters = s->bat_size;

  res->bfi.compressed_clusters = 0; /* compression is not supported */

Re: [PATCH for-7.2 v4 01/11] ppc/pnv: add phb-id/chip-id PnvPHB3RootBus properties





On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

We rely on the phb-id and chip-id, which are PHB properties, to assign
chassis and slot to the root port. For default devices this is no big
deal: the root port is being created under pnv_phb_realize() and the
values are being passed on via the 'index' and 'chip-id' of the
pnv_phb_attach_root_port() helper.

If we want to implement user created root ports we have a problem. The
user created root port will not be aware of which PHB it belongs to,
unless we're willing to violate QOM best practices and access the PHB
via dev->parent_bus->parent. What we can do is to access the root bus
parent bus.

Since we're already assigning the root port as QOM child of the bus, and
the bus is initiated using PHB properties, let's add phb-id and chip-id
as properties of the bus. This will allow us trivial access to them, for
both user-created and default root ports, without doing anything too
shady with QOM.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Daniel Henrique Barboza 
---



Reviewed-by: Frederic Barrat 

  Fred



  hw/pci-host/pnv_phb3.c | 50 ++
  include/hw/pci-host/pnv_phb3.h |  9 +-
  2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/hw/pci-host/pnv_phb3.c b/hw/pci-host/pnv_phb3.c
index d4c04a281a..af8575c007 100644
--- a/hw/pci-host/pnv_phb3.c
+++ b/hw/pci-host/pnv_phb3.c
@@ -1006,6 +1006,11 @@ void pnv_phb3_bus_init(DeviceState *dev, PnvPHB3 *phb)
   &phb->pci_mmio, &phb->pci_io,
   0, 4, TYPE_PNV_PHB3_ROOT_BUS);
  
+object_property_set_int(OBJECT(pci->bus), "phb-id", phb->phb_id,

+&error_abort);
+object_property_set_int(OBJECT(pci->bus), "chip-id", phb->chip_id,
+&error_abort);
+
  pci_setup_iommu(pci->bus, pnv_phb3_dma_iommu, phb);
  }
  
@@ -1105,10 +1110,55 @@ static const TypeInfo pnv_phb3_type_info = {

  .instance_init = pnv_phb3_instance_init,
  };
  
+static void pnv_phb3_root_bus_get_prop(Object *obj, Visitor *v,

+   const char *name,
+   void *opaque, Error **errp)
+{
+PnvPHB3RootBus *bus = PNV_PHB3_ROOT_BUS(obj);
+uint64_t value = 0;
+
+if (strcmp(name, "phb-id") == 0) {
+value = bus->phb_id;
+} else {
+value = bus->chip_id;
+}
+
+visit_type_size(v, name, &value, errp);
+}
+
+static void pnv_phb3_root_bus_set_prop(Object *obj, Visitor *v,
+   const char *name,
+   void *opaque, Error **errp)
+
+{
+PnvPHB3RootBus *bus = PNV_PHB3_ROOT_BUS(obj);
+uint64_t value;
+
+if (!visit_type_size(v, name, &value, errp)) {
+return;
+}
+
+if (strcmp(name, "phb-id") == 0) {
+bus->phb_id = value;
+} else {
+bus->chip_id = value;
+}
+}
+
  static void pnv_phb3_root_bus_class_init(ObjectClass *klass, void *data)
  {
  BusClass *k = BUS_CLASS(klass);
  
+object_class_property_add(klass, "phb-id", "int",

+  pnv_phb3_root_bus_get_prop,
+  pnv_phb3_root_bus_set_prop,
+  NULL, NULL);
+
+object_class_property_add(klass, "chip-id", "int",
+  pnv_phb3_root_bus_get_prop,
+  pnv_phb3_root_bus_set_prop,
+  NULL, NULL);
+
  /*
   * PHB3 has only a single root complex. Enforce the limit on the
   * parent bus
diff --git a/include/hw/pci-host/pnv_phb3.h b/include/hw/pci-host/pnv_phb3.h
index bff69201d9..4854f6d2f6 100644
--- a/include/hw/pci-host/pnv_phb3.h
+++ b/include/hw/pci-host/pnv_phb3.h
@@ -104,9 +104,16 @@ struct PnvPBCQState {
  };
  
  /*

- * PHB3 PCIe Root port
+ * PHB3 PCIe Root Bus
   */
  #define TYPE_PNV_PHB3_ROOT_BUS "pnv-phb3-root"
+struct PnvPHB3RootBus {
+PCIBus parent;
+
+uint32_t chip_id;
+uint32_t phb_id;
+};
+OBJECT_DECLARE_SIMPLE_TYPE(PnvPHB3RootBus, PNV_PHB3_ROOT_BUS)
  
  /*

   * PHB3 PCIe Host Bridge for PowerNV machines (POWER8)

Re: [PATCH v2 5/8] parallels: Move check of cluster outside image to a separate function


On 11.08.2022 17:00, Alexander Ivanov wrote:

v2: Move unrelated helper parallels_set_bat_entry creation to
 a separate patch.

same notes as for previous patch



Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 48 ++-
  1 file changed, 35 insertions(+), 13 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index c53b2810cf..12104ba5ad 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -439,6 +439,36 @@ static void parallels_check_unclean(BlockDriverState *bs,
  }
  }
  
+static int parallels_check_outside_image(BlockDriverState *bs,

+ BdrvCheckResult *res,
+ BdrvCheckMode fix)
+{
+BDRVParallelsState *s = bs->opaque;
+uint32_t i;
+int64_t off, size;
+
+size = bdrv_getlength(bs->file->bs);
+if (size < 0) {
+res->check_errors++;
+return size;
+}
+
+for (i = 0; i < s->bat_size; i++) {
+off = bat2sect(s, i) << BDRV_SECTOR_BITS;
+if (off > size) {
+fprintf(stderr, "%s cluster %u is outside image\n",
+fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
+res->corruptions++;
+if (fix & BDRV_FIX_ERRORS) {
+parallels_set_bat_entry(s, i, 0);
+res->corruptions_fixed++;
+}
+}
+}
+
+return 0;
+}
+
  static int coroutine_fn parallels_co_check(BlockDriverState *bs,
 BdrvCheckResult *res,
 BdrvCheckMode fix)
@@ -458,6 +488,11 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  
  parallels_check_unclean(bs, res, fix);
  
+ret = parallels_check_outside_image(bs, res, fix);

+if (ret < 0) {
+goto out;
+}
+
  res->bfi.total_clusters = s->bat_size;
  res->bfi.compressed_clusters = 0; /* compression is not supported */
  
@@ -470,19 +505,6 @@ static int coroutine_fn parallels_co_check(BlockDriverState *bs,

  continue;
  }
  
-/* cluster outside the image */

-if (off > size) {
-fprintf(stderr, "%s cluster %u is outside image\n",
-fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
-res->corruptions++;
-if (fix & BDRV_FIX_ERRORS) {
-prev_off = 0;
-parallels_set_bat_entry(s, i, 0);
-res->corruptions_fixed++;
-continue;
-}
-}
-
  res->bfi.allocated_clusters++;
  if (off > high_off) {
  high_off = off;

Re: [PULL 02/28] target/arm: Add coproc parameter to syn_fp_access_trap

On Fri, 10 Jun 2022 at 17:07, Peter Maydell  wrote:
>
> From: Richard Henderson 
>
> With ARMv8, this field is always RES0.
> With ARMv7, targeting EL2 and TA=0, it is always 0xA.

I was just looking at this change again because we still
have the loose end of syn_simd_access_trap() not being used,
and I realized that the claim in this commit message and the
comment isn't right. The "RES0 or fill in TA/copro fields"
test is not v8 vs v7, but "are we reporting this syndrome
to AArch64 in ESR_ELx or to AArch32 in HSR?".

I filed https://gitlab.com/qemu-project/qemu/-/issues/1153
to make a note of this since we might not get around to
fixing this for a while, given it's not very important.

thanks
-- PMM

Re: [PATCH v2 6/8] parallels: Move check of leaks to a separate function


On 11.08.2022 17:00, Alexander Ivanov wrote:

v2: No changes.

same notes about motivation, changelog as before


Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 85 +--
  1 file changed, 52 insertions(+), 33 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 12104ba5ad..8737eadfb4 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -469,14 +469,13 @@ static int parallels_check_outside_image(BlockDriverState 
*bs,
  return 0;
  }
  
-static int coroutine_fn parallels_co_check(BlockDriverState *bs,

-   BdrvCheckResult *res,
-   BdrvCheckMode fix)
+static int parallels_check_leak(BlockDriverState *bs,
+BdrvCheckResult *res,
+BdrvCheckMode fix)
  {
  BDRVParallelsState *s = bs->opaque;
-int64_t size, prev_off, high_off;
-int ret;
-uint32_t i;
+int64_t size, off, high_off, count;
+int i, ret;
  
  size = bdrv_getlength(bs->file->bs);

  if (size < 0) {
@@ -484,41 +483,16 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  return size;
  }
  
-qemu_co_mutex_lock(&s->lock);

-
-parallels_check_unclean(bs, res, fix);
-
-ret = parallels_check_outside_image(bs, res, fix);
-if (ret < 0) {
-goto out;
-}
-
-res->bfi.total_clusters = s->bat_size;
-res->bfi.compressed_clusters = 0; /* compression is not supported */
-
  high_off = 0;
-prev_off = 0;
  for (i = 0; i < s->bat_size; i++) {
-int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS;
-if (off == 0) {
-prev_off = 0;
-continue;
-}
-
-res->bfi.allocated_clusters++;
+off = bat2sect(s, i) << BDRV_SECTOR_BITS;
  if (off > high_off) {
  high_off = off;
  }
-
-if (prev_off != 0 && (prev_off + s->cluster_size) != off) {
-res->bfi.fragmented_clusters++;
-}
-prev_off = off;
  }
  
  res->image_end_offset = high_off + s->cluster_size;

  if (size > res->image_end_offset) {
-int64_t count;
  count = DIV_ROUND_UP(size - res->image_end_offset, s->cluster_size);
  fprintf(stderr, "%s space leaked at the end of the image %" PRId64 
"\n",
  fix & BDRV_FIX_LEAKS ? "Repairing" : "ERROR",
@@ -536,11 +510,56 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  if (ret < 0) {
  error_report_err(local_err);
  res->check_errors++;
-goto out;
+return ret;
  }
  res->leaks_fixed += count;
  }
  }
+return 0;
+}
+
+static int coroutine_fn parallels_co_check(BlockDriverState *bs,
+   BdrvCheckResult *res,
+   BdrvCheckMode fix)
+{
+BDRVParallelsState *s = bs->opaque;
+int64_t prev_off;
+int ret;
+uint32_t i;
+
+


please remove extra empty line. This looks like an accident

+qemu_co_mutex_lock(&s->lock);
+
+parallels_check_unclean(bs, res, fix);
+
+ret = parallels_check_outside_image(bs, res, fix);
+if (ret < 0) {
+goto out;
+}
+
+ret = parallels_check_leak(bs, res, fix);
+if (ret < 0) {
+goto out;
+}
+
+res->bfi.total_clusters = s->bat_size;
+res->bfi.compressed_clusters = 0; /* compression is not supported */
+
+prev_off = 0;
+for (i = 0; i < s->bat_size; i++) {
+int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS;
+if (off == 0) {
+prev_off = 0;
+continue;
+}
+
+res->bfi.allocated_clusters++;
+
+if (prev_off != 0 && (prev_off + s->cluster_size) != off) {
+res->bfi.fragmented_clusters++;
+}
+prev_off = off;
+}
  
  out:

  qemu_co_mutex_unlock(&s->lock);

Re: [PATCH v2 7/8] parallels: Move statistic collection to a separate function


On 11.08.2022 17:00, Alexander Ivanov wrote:

v2: Move fragmentation counting code to this function too.


same note here about ChnageLog and motivation


Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 54 +++
  1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 8737eadfb4..d0364182bb 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -518,48 +518,56 @@ static int parallels_check_leak(BlockDriverState *bs,
  return 0;
  }
  
-static int coroutine_fn parallels_co_check(BlockDriverState *bs,

-   BdrvCheckResult *res,
-   BdrvCheckMode fix)
+static void parallels_collect_statistics(BlockDriverState *bs,
+ BdrvCheckResult *res,
+ BdrvCheckMode fix)
  {
  BDRVParallelsState *s = bs->opaque;
-int64_t prev_off;
-int ret;
+int64_t off, prev_off;
  uint32_t i;
  
-

-qemu_co_mutex_lock(&s->lock);
-
-parallels_check_unclean(bs, res, fix);
-
-ret = parallels_check_outside_image(bs, res, fix);
-if (ret < 0) {
-goto out;
-}
-
-ret = parallels_check_leak(bs, res, fix);
-if (ret < 0) {
-goto out;
-}
-
  res->bfi.total_clusters = s->bat_size;
  res->bfi.compressed_clusters = 0; /* compression is not supported */
  
  prev_off = 0;

  for (i = 0; i < s->bat_size; i++) {
-int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS;
+off = bat2sect(s, i) << BDRV_SECTOR_BITS;
  if (off == 0) {
  prev_off = 0;
  continue;
  }
  
-res->bfi.allocated_clusters++;

-
  if (prev_off != 0 && (prev_off + s->cluster_size) != off) {
  res->bfi.fragmented_clusters++;
  }
+
  prev_off = off;
+res->bfi.allocated_clusters++;
  }
+}
+
+static int coroutine_fn parallels_co_check(BlockDriverState *bs,
+   BdrvCheckResult *res,
+   BdrvCheckMode fix)
+{
+BDRVParallelsState *s = bs->opaque;
+int ret;
+
+qemu_co_mutex_lock(&s->lock);
+
+parallels_check_unclean(bs, res, fix);
+
+ret = parallels_check_outside_image(bs, res, fix);
+if (ret < 0) {
+goto out;
+}
+
+ret = parallels_check_leak(bs, res, fix);
+if (ret < 0) {
+goto out;
+}
+
+parallels_collect_statistics(bs, res, fix);
  
  out:

  qemu_co_mutex_unlock(&s->lock);

Re: [PATCH for-7.2 v4 04/11] ppc/pnv: add helpers for pnv-phb user devices





On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

pnv_parent_qom_fixup() and pnv_parent_bus_fixup() are versions of the
helpers that were reverted by commit 9c10d86fee "ppc/pnv: Remove
user-created PHB{3,4,5} devices". They are needed to amend the QOM and
bus hierarchies of user created pnv-phbs, matching them with default
pnv-phbs.

A new helper pnv_phb_user_device_init() is created to handle
user-created devices setup. We're going to call it inside
pnv_phb_realize() in case we're realizing an user created device. This
will centralize all user device realated in a single spot, leaving the
realize functions of the phb3/phb4 backends untouched.

Another helper called pnv_chip_add_phb() was added to handle the
particularities of each chip version when adding a new PHB.

Signed-off-by: Daniel Henrique Barboza 
---



Just a minor typo in a comment below.
Reviewed-by: Frederic Barrat 



diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index f9e5a3d248..2deaac17f7 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -281,6 +281,26 @@ static void pnv_dt_icp(PnvChip *chip, void *fdt, uint32_t 
pir,
  g_free(reg);
  }
  
+/*

+ * Adds a PnvPHB to the chip. Returns the parent obj of the
+ * PHB which varies with each version (phb version 3 is parented
+ * by the chip, version 4 and 4 are parented by the PEC



typo-^

  Fred



+ * device).
+ *
+ * TODO: for version 3 we're still parenting the PHB with the
+ * chip. We should parent with a (so far not implemented)
+ * PHB3 PEC device.
+ */
+Object *pnv_chip_add_phb(PnvChip *chip, PnvPHB *phb, Error **errp)
+{
+if (phb->version == 3) {
+return OBJECT(chip);
+} else {
+/* phb4 support will be added later */
+return NULL;
+}
+}
+
  static void pnv_chip_power8_dt_populate(PnvChip *chip, void *fdt)
  {
  static const char compat[] = "ibm,power8-xscom\0ibm,xscom";
diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h
index 033d907287..781d0acffa 100644
--- a/include/hw/ppc/pnv.h
+++ b/include/hw/ppc/pnv.h
@@ -231,6 +231,7 @@ struct PnvMachineState {
  };
  
  PnvChip *pnv_get_chip(PnvMachineState *pnv, uint32_t chip_id);

+Object *pnv_chip_add_phb(PnvChip *chip, PnvPHB *phb, Error **errp);
  
  #define PNV_FDT_ADDR  0x0100

  #define PNV_TIMEBASE_FREQ 51200ULL

Re: [PATCH for-7.2 v4 05/11] ppc/pnv: turn chip8->phbs[] into a PnvPHB* array





On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

When enabling user created PHBs (a change reverted by commit 9c10d86fee)
we were handling PHBs created by default versus by the user in different
manners. The only difference between these PHBs is that one will have a
valid phb3->chip that is assigned during pnv_chip_power8_realize(),
while the user created needs to search which chip it belongs to.

Aside from that there shouldn't be any difference. Making the default
PHBs behave in line with the user created ones will make it easier to
re-introduce them later on. It will also make the code easier to follow
since we are dealing with them in equal manner.

The first step is to turn chip8->phbs[] into a PnvPHB3 pointer array.
This will allow us to assign user created PHBs into it later on. The way
we initilize the default case is now more in line with that would happen
with the user created case: the object is created, parented by the chip
because pnv_xscom_dt() relies on it, and then assigned to the array.

Signed-off-by: Daniel Henrique Barboza 
---



Reviewed-by: Frederic Barrat 

  Fred



  hw/ppc/pnv.c | 27 ++-
  include/hw/ppc/pnv.h |  6 +-
  2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 2deaac17f7..e53e9e297d 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -294,6 +294,13 @@ static void pnv_dt_icp(PnvChip *chip, void *fdt, uint32_t 
pir,
  Object *pnv_chip_add_phb(PnvChip *chip, PnvPHB *phb, Error **errp)
  {
  if (phb->version == 3) {
+Pnv8Chip *chip8 = PNV8_CHIP(chip);
+
+phb->chip = chip;
+
+chip8->phbs[chip8->num_phbs] = phb;
+chip8->num_phbs++;
+
  return OBJECT(chip);
  } else {
  /* phb4 support will be added later */
@@ -681,7 +688,7 @@ static void pnv_chip_power8_pic_print_info(PnvChip *chip, 
Monitor *mon)
  ics_pic_print_info(&chip8->psi.ics, mon);
  
  for (i = 0; i < chip8->num_phbs; i++) {

-PnvPHB *phb = &chip8->phbs[i];
+PnvPHB *phb = chip8->phbs[i];
  PnvPHB3 *phb3 = PNV_PHB3(phb->backend);
  
  pnv_phb3_msi_pic_print_info(&phb3->msis, mon);

@@ -1174,7 +1181,17 @@ static void pnv_chip_power8_instance_init(Object *obj)
  chip8->num_phbs = pcc->num_phbs;
  
  for (i = 0; i < chip8->num_phbs; i++) {

-object_initialize_child(obj, "phb[*]", &chip8->phbs[i], TYPE_PNV_PHB);
+Object *phb = object_new(TYPE_PNV_PHB);
+
+/*
+ * We need the chip to parent the PHB to allow the DT
+ * to build correctly (via pnv_xscom_dt()).
+ *
+ * TODO: the PHB should be parented by a PEC device that, at
+ * this moment, is not modelled powernv8/phb3.
+ */
+object_property_add_child(obj, "phb[*]", phb);
+chip8->phbs[i] = PNV_PHB(phb);
  }
  
  }

@@ -1290,7 +1307,7 @@ static void pnv_chip_power8_realize(DeviceState *dev, 
Error **errp)
  
  /* PHB controllers */

  for (i = 0; i < chip8->num_phbs; i++) {
-PnvPHB *phb = &chip8->phbs[i];
+PnvPHB *phb = chip8->phbs[i];
  
  object_property_set_int(OBJECT(phb), "index", i, &error_fatal);

  object_property_set_int(OBJECT(phb), "chip-id", chip->chip_id,
@@ -1958,7 +1975,7 @@ static ICSState *pnv_ics_get(XICSFabric *xi, int irq)
  }
  
  for (j = 0; j < chip8->num_phbs; j++) {

-PnvPHB *phb = &chip8->phbs[j];
+PnvPHB *phb = chip8->phbs[j];
  PnvPHB3 *phb3 = PNV_PHB3(phb->backend);
  
  if (ics_valid_irq(&phb3->lsis, irq)) {

@@ -1997,7 +2014,7 @@ static void pnv_ics_resend(XICSFabric *xi)
  ics_resend(&chip8->psi.ics);
  
  for (j = 0; j < chip8->num_phbs; j++) {

-PnvPHB *phb = &chip8->phbs[j];
+PnvPHB *phb = chip8->phbs[j];
  PnvPHB3 *phb3 = PNV_PHB3(phb->backend);
  
  ics_resend(&phb3->lsis);

diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h
index 781d0acffa..49433281d7 100644
--- a/include/hw/ppc/pnv.h
+++ b/include/hw/ppc/pnv.h
@@ -81,7 +81,11 @@ struct Pnv8Chip {
  PnvHomer homer;
  
  #define PNV8_CHIP_PHB3_MAX 4

-PnvPHB   phbs[PNV8_CHIP_PHB3_MAX];
+/*
+ * The array is used to allow quick access to the phbs by
+ * pnv_ics_get_child() and pnv_ics_resend_child().
+ */
+PnvPHB   *phbs[PNV8_CHIP_PHB3_MAX];
  uint32_t num_phbs;
  
  XICSFabric*xics;

Re: qemu-system-aarch64: Failed to retrieve host CPU features

2022-08-12 Thread Marc Zyngier

Hi Peter,

On Fri, 12 Aug 2022 10:25:55 +0100,
Peter Maydell  wrote:
> 
> I've added some more relevant mailing lists to the cc.
> 
> On Fri, 12 Aug 2022 at 09:45, Vitaly Chikunov  wrote:
> > On Fri, Aug 12, 2022 at 05:14:27AM +0300, Vitaly Chikunov wrote:
> > > I noticed that we starting to get many errors like this:
> > >
> > >   qemu-system-aarch64: Failed to retrieve host CPU features
> > >
> > > Where many is 1-2% per run, depends on host, host is Kunpeng-920, and
> > > Linux kernel is v5.15.59, but it started to appear months before that.
> > >
> > > strace shows in erroneous case:
> > >
> > >   1152244 ioctl(9, KVM_CREATE_VM, 0x30)   = -1 EINTR (Interrupted system 
> > > call)
> > >
> > > And I see in target/arm/kvm.c:kvm_arm_create_scratch_host_vcpu:
> > >
> > > vmfd = ioctl(kvmfd, KVM_CREATE_VM, max_vm_pa_size);
> > > if (vmfd < 0) {
> > > goto err;
> > > }
> > >
> > > Maybe it should restart ioctl on EINTR?
> > >
> > > I don't see EINTR documented in ioctl(2) nor in Linux'
> > > Documentation/virt/kvm/api.rst for KVM_CREATE_VM, but for KVM_RUN it
> > > says "an unmasked signal is pending".
> >
> > I am suggested that almost any blocking syscall could return EINTR, so I
> > checked the strace log and it does not show evidence of arriving a signal,
> > the log ends like this:
> >
> >   1152244 openat(AT_FDCWD, "/dev/kvm", O_RDWR|O_CLOEXEC) = 9
> >   1152244 ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_ARM_VM_IPA_SIZE) = 48
> >   1152244 ioctl(9, KVM_CREATE_VM, 0x30)   = -1 EINTR (Interrupted system 
> > call)
> >   1152244 close(9)= 0
> >   1152244 newfstatat(2, "", {st_dev=makedev(0, 0xd), st_ino=57869925, 
> > st_mode=S_IFIFO|0600, st_nlink=1, st_uid=517, st_gid=517, st_blksize=4096, 
> > st_blocks=0, st_size=0, st_atime=1660268019 /* 
> > 2022-08-12T01:33:39.850436293+ */, st_atime_nsec=850436293, 
> > st_mtime=1660268019 /* 2022-08-12T01:33:39.850436293+ */, 
> > st_mtime_nsec=850436293, st_ctime=1660268019 /* 
> > 2022-08-12T01:33:39.850436293+ */, st_ctime_nsec=850436293}, 
> > AT_EMPTY_PATH) = 0
> >   1152244 write(2, "qemu-system-aarch64: Failed to r"..., 58) = 58
> >   1152244 exit_group(1)   = ?
> >   1152245 <... clock_nanosleep resumed> ) = ?
> >   1152245 +++ exited with 1 +++
> >   1152244 +++ exited with 1 +++
> 
> KVM folks: should we expect that KVM_CREATE_VM might fail EINTR
> and need retrying?

In general, yes. But for this particular one, this is pretty odd.

The only path I can so far see that would match this behaviour is if
mm_take_all_locks() (called from __mmu_notifier_register()) was
getting interrupted by a signal (I'm looking at a 5.19-ish kernel,
which may slightly differ from the 5.15 mentioned above).

But as Vitaly points out, it doesn't seem to be a signal delivered
here.

Vitaly: could you please share your exact test case (full qemu command
line), and instrument your kernel to see if mm_take_all_locks() is the
one failing?

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.

Re: [PATCH v2 8/8] parallels: Replace qemu_co_mutex_lock by WITH_QEMU_LOCK_GUARD


On 11.08.2022 17:00, Alexander Ivanov wrote:

Replace the way we use mutex in parallels_co_check() for more clean code.

I think that "cleaness" is the same, but new code would be just shorter ;)
or less error prone.


v2: Fix an incorrect usage of WITH_QEMU_LOCK_GUARD.

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 26 --
  1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index d0364182bb..e124a8bb7d 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -553,24 +553,22 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  BDRVParallelsState *s = bs->opaque;
  int ret;
  
-qemu_co_mutex_lock(&s->lock);

+WITH_QEMU_LOCK_GUARD(&s->lock) {
+parallels_check_unclean(bs, res, fix);
  
-parallels_check_unclean(bs, res, fix);

+ret = parallels_check_outside_image(bs, res, fix);
+if (ret < 0) {
+return ret;
+}
  
-ret = parallels_check_outside_image(bs, res, fix);

-if (ret < 0) {
-goto out;
-}
-
-ret = parallels_check_leak(bs, res, fix);
-if (ret < 0) {
-goto out;
-}
+ret = parallels_check_leak(bs, res, fix);
+if (ret < 0) {
+return ret;
+}
  
-parallels_collect_statistics(bs, res, fix);

+parallels_collect_statistics(bs, res, fix);
  
-out:

-qemu_co_mutex_unlock(&s->lock);
+}
  
  ret = bdrv_co_flush(bs);

  if (ret < 0) {

Re: [PATCH v2] hw/smbios: support for type 8 (port connector)

2022-08-12 Thread Michael S. Tsirkin

On Fri, Aug 12, 2022 at 03:51:53PM +0200, Hal Martin wrote:
> PATCH v1: add support for SMBIOS type 8 to qemu
> PATCH v2: incorporate patch v1 feedback and add smbios type=8 to qemu-options

history after --- pls

> internal_reference: internal reference designator
> external_reference: external reference designator
> connector_type: hex value for port connector type (see SMBIOS 7.9.2)
> port_type: hex value for port type (see SMBIOS 7.9.3)
> 
> After studying various vendor implementationsi (Dell, Lenovo, MSI),
> the value of internal connector type was hard-coded to 0x0 (None).
> 
> Example usage:
> -smbios 
> type=8,internal_reference=JUSB1,external_reference=USB1,connector_type=0x12,port_type=0x10
>  \
> -smbios type=8,internal_reference=JAUD1,external_reference="Audio 
> Jack",connector_type=0x1f,port_type=0x1d \
> -smbios 
> type=8,internal_reference=LAN,external_reference=Ethernet,connector_type=0x0b,port_type=0x1f
>  \
> -smbios 
> type=8,internal_reference=PS2,external_reference=Mouse,connector_type=0x0f,port_type=0x0e
>  \
> -smbios 
> type=8,internal_reference=PS2,external_reference=Keyboard,connector_type=0x0f,port_type=0x0d
> 
> 
> Signed-off-by: Hal Martin 

We are in freeze, I tagged this for after the release.
Just to make sure pls ping me after the release if possible.



> ---
>  hw/smbios/smbios.c   | 63 
>  include/hw/firmware/smbios.h | 10 ++
>  qemu-options.hx  |  2 ++
>  3 files changed, 75 insertions(+)
> 
> diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
> index 60349ee402..578cae0f0a 100644
> --- a/hw/smbios/smbios.c
> +++ b/hw/smbios/smbios.c
> @@ -111,6 +111,13 @@ static struct {
>  .processor_id = 0,
>  };
>  
> +struct type8_instance {
> +const char *internal_reference, *external_reference;
> +uint8_t connector_type, port_type;
> +QTAILQ_ENTRY(type8_instance) next;
> +};
> +static QTAILQ_HEAD(, type8_instance) type8 = QTAILQ_HEAD_INITIALIZER(type8);
> +
>  static struct {
>  size_t nvalues;
>  char **values;
> @@ -337,6 +344,29 @@ static const QemuOptDesc qemu_smbios_type4_opts[] = {
>  { /* end of list */ }
>  };
>  
> +static const QemuOptDesc qemu_smbios_type8_opts[] = {
> +{
> +.name = "internal_reference",
> +.type = QEMU_OPT_STRING,
> +.help = "internal reference designator",
> +},
> +{
> +.name = "external_reference",
> +.type = QEMU_OPT_STRING,
> +.help = "external reference designator",
> +},
> +{
> +.name = "connector_type",
> +.type = QEMU_OPT_NUMBER,
> +.help = "connector type",
> +},
> +{
> +.name = "port_type",
> +.type = QEMU_OPT_NUMBER,
> +.help = "port type",
> +},
> +};
> +
>  static const QemuOptDesc qemu_smbios_type11_opts[] = {
>  {
>  .name = "value",
> @@ -718,6 +748,26 @@ static void smbios_build_type_4_table(MachineState *ms, 
> unsigned instance)
>  smbios_type4_count++;
>  }
>  
> +static void smbios_build_type_8_table(void)
> +{
> +unsigned instance = 0;
> +struct type8_instance *t8;
> +
> +QTAILQ_FOREACH(t8, &type8, next) {
> +SMBIOS_BUILD_TABLE_PRE(8, T0_BASE + instance, true);
> +
> +SMBIOS_TABLE_SET_STR(8, internal_reference_str, 
> t8->internal_reference);
> +SMBIOS_TABLE_SET_STR(8, external_reference_str, 
> t8->external_reference);
> +/* most vendors seem to set this to None */
> +t->internal_connector_type = 0x0;
> +t->external_connector_type = t8->connector_type;
> +t->port_type = t8->port_type;
> +
> +SMBIOS_BUILD_TABLE_POST;
> +instance++;
> +}
> +}
> +
>  static void smbios_build_type_11_table(void)
>  {
>  char count_str[128];
> @@ -1030,6 +1080,7 @@ void smbios_get_tables(MachineState *ms,
>  smbios_build_type_4_table(ms, i);
>  }
>  
> +smbios_build_type_8_table();
>  smbios_build_type_11_table();
>  
>  #define MAX_DIMM_SZ (16 * GiB)
> @@ -1346,6 +1397,18 @@ void smbios_entry_add(QemuOpts *opts, Error **errp)
> UINT16_MAX);
>  }
>  return;
> +case 8:
> +if (!qemu_opts_validate(opts, qemu_smbios_type8_opts, errp)) {
> +return;
> +}
> +struct type8_instance *t;
> +t = g_new0(struct type8_instance, 1);
> +save_opt(&t->internal_reference, opts, "internal_reference");
> +save_opt(&t->external_reference, opts, "external_reference");
> +t->connector_type = qemu_opt_get_number(opts, "connector_type", 
> 0);
> +t->port_type = qemu_opt_get_number(opts, "port_type", 0);
> +QTAILQ_INSERT_TAIL(&type8, t, next);
> +return;
>  case 11:
>  if (!qemu_opts_validate(opts, qemu_smbios_type11_opts, errp)) {
>  return;
> diff --git a/include/hw/firmware/smbios.h b/inclu

Re: [PATCH v3 1/4] accel/tcg: Invalidate translations when clearing PAGE_EXEC

2022-08-12 Thread Ilya Leoshkevich

On Thu, 2022-08-11 at 08:42 -0700, Richard Henderson wrote:
> On 8/11/22 02:28, Ilya Leoshkevich wrote:
> > How is qemu-user's get_page_addr_code() involved here?
> > 
> > I tried to experiment with it, and while I agree that it looks
> > buggy,
> > it's called only from translation code paths. If we already have a
> > translation block, these code paths are not used.
> 
> It's called from tb_lookup too, when we're trying to find an existing
> TB.
> 
> 
> r~
> 

Oh, I see. I was first worried about direct block chaining with
goto_tb, but it turned out that translator_use_goto_tb() prevented it.

tb_lookup() skips get_page_addr_code() if tb is found in tb_jmp_cache.
I assume it's a bug?

Re: [PATCH v2 0/8] parallels: Refactor the code of images checks and fix a bug


On 11.08.2022 17:00, Alexander Ivanov wrote:

Fix image inflation when offset in BAT is out of image.

Replace whole BAT syncing by flushing only dirty blocks.

Move all the checks outside the main check function in
separate functions

Use WITH_QEMU_LOCK_GUARD for more clean code.

Alexander Ivanov (8):
   parallels: Out of image offset in BAT leads to image inflation
   parallels: Move BAT entry setting to a separate function
   parallels: Replace bdrv_co_pwrite_sync by bdrv_co_flush for BAT
 flushing
   parallels: Move check of unclean image to a separate function
   parallels: Move check of cluster outside image to a separate function
   parallels: Move check of leaks to a separate function
   parallels: Move statistic collection to a separate function
   parallels: Replace qemu_co_mutex_lock by WITH_QEMU_LOCK_GUARD

  block/parallels.c | 188 --
  1 file changed, 132 insertions(+), 56 deletions(-)


I believe that this series is ready to go once commit
messages will be improved.

Den

Re: qemu-system-aarch64: Failed to retrieve host CPU features

2022-08-12 Thread Marc Zyngier

On Fri, 12 Aug 2022 10:25:55 +0100,
Peter Maydell  wrote:
> 
> I've added some more relevant mailing lists to the cc.
> 
> On Fri, 12 Aug 2022 at 09:45, Vitaly Chikunov  wrote:
> > On Fri, Aug 12, 2022 at 05:14:27AM +0300, Vitaly Chikunov wrote:
> > > I noticed that we starting to get many errors like this:
> > >
> > >   qemu-system-aarch64: Failed to retrieve host CPU features
> > >
> > > Where many is 1-2% per run, depends on host, host is Kunpeng-920, and
> > > Linux kernel is v5.15.59, but it started to appear months before that.
> > >
> > > strace shows in erroneous case:
> > >
> > >   1152244 ioctl(9, KVM_CREATE_VM, 0x30)   = -1 EINTR (Interrupted system 
> > > call)
> > >
> > > And I see in target/arm/kvm.c:kvm_arm_create_scratch_host_vcpu:
> > >
> > > vmfd = ioctl(kvmfd, KVM_CREATE_VM, max_vm_pa_size);
> > > if (vmfd < 0) {
> > > goto err;
> > > }
> > >
> > > Maybe it should restart ioctl on EINTR?
> > >
> > > I don't see EINTR documented in ioctl(2) nor in Linux'
> > > Documentation/virt/kvm/api.rst for KVM_CREATE_VM, but for KVM_RUN it
> > > says "an unmasked signal is pending".
> >
> > I am suggested that almost any blocking syscall could return EINTR, so I
> > checked the strace log and it does not show evidence of arriving a signal,
> > the log ends like this:
> >
> >   1152244 openat(AT_FDCWD, "/dev/kvm", O_RDWR|O_CLOEXEC) = 9
> >   1152244 ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_ARM_VM_IPA_SIZE) = 48
> >   1152244 ioctl(9, KVM_CREATE_VM, 0x30)   = -1 EINTR (Interrupted system 
> > call)
> >   1152244 close(9)= 0
> >   1152244 newfstatat(2, "", {st_dev=makedev(0, 0xd), st_ino=57869925, 
> > st_mode=S_IFIFO|0600, st_nlink=1, st_uid=517, st_gid=517, st_blksize=4096, 
> > st_blocks=0, st_size=0, st_atime=1660268019 /* 
> > 2022-08-12T01:33:39.850436293+ */, st_atime_nsec=850436293, 
> > st_mtime=1660268019 /* 2022-08-12T01:33:39.850436293+ */, 
> > st_mtime_nsec=850436293, st_ctime=1660268019 /* 
> > 2022-08-12T01:33:39.850436293+ */, st_ctime_nsec=850436293}, 
> > AT_EMPTY_PATH) = 0
> >   1152244 write(2, "qemu-system-aarch64: Failed to r"..., 58) = 58
> >   1152244 exit_group(1)   = ?
> >   1152245 <... clock_nanosleep resumed> ) = ?
> >   1152245 +++ exited with 1 +++
> >   1152244 +++ exited with 1 +++
> 
> KVM folks: should we expect that KVM_CREATE_VM might fail EINTR
> and need retrying?
> 
> (I suspect the answer is "yes", given we do this in the generic
> code in kvm-all.c.)

Interestingly, this has cropped up in the (distant) past:

https://lists.gnu.org/archive/html/qemu-devel/2014-01/msg01031.html

and seems to point at the path I was mentioning earlier (the code
hasn't changed too much since, apparently).

I'd still like to understand the underlying reason though.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.

[PATCH] hw/arm/bcm2835_property: Add support for RPI_FIRMWARE_FRAMEBUFFER_GET_NUM_DISPLAYS

2022-08-12 Thread Enrik Berkhan

In more recent Raspbian OS Linux kernels, the fb driver gives up
immediately if RPI_FIRMWARE_FRAMEBUFFER_GET_NUM_DISPLAYS fails or no
displays are reported.

This change simply always reports one display. It makes bcm2835_fb work
again with these more recent kernels.

Signed-off-by: Enrik Berkhan 
---
 hw/misc/bcm2835_property.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/misc/bcm2835_property.c b/hw/misc/bcm2835_property.c
index e94e951057..890ae7bae5 100644
--- a/hw/misc/bcm2835_property.c
+++ b/hw/misc/bcm2835_property.c
@@ -270,6 +270,10 @@ static void 
bcm2835_property_mbox_push(BCM2835PropertyState *s, uint32_t value)
 stl_le_phys(&s->dma_as, value + 12, 0);
 resplen = 4;
 break;
+case 0x00040013: /* Get number of displays */
+stl_le_phys(&s->dma_as, value + 12, 1);
+resplen = 4;
+break;
 
 case 0x00060001: /* Get DMA channels */
 /* channels 2-5 */
-- 
2.34.1

Re: [PATCH for-7.2 v4 07/11] ppc/pnv: add PHB4 helpers for user created pnv-phb





On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

The PHB4 backend relies on a link with the corresponding PEC element.
This is trivial to do during machine_init() time for default devices,
but not so much for user created ones.

pnv_phb4_get_pec() is a small variation of the function that was
reverted by commit 9c10d86fee "ppc/pnv: Remove user-created PHB{3,4,5}
devices". We'll use it to determine the appropriate PEC for a given user
created pnv-phb that uses a PHB4 backend.

This is done during realize() time, in pnv_phb_user_device_init().

Signed-off-by: Daniel Henrique Barboza 
---



Reviewed-by: Frederic Barrat 

  Fred



  hw/ppc/pnv.c | 35 ---
  1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index e82d6522f0..0644f45a1d 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -281,6 +281,34 @@ static void pnv_dt_icp(PnvChip *chip, void *fdt, uint32_t 
pir,
  g_free(reg);
  }
  
+static PnvPhb4PecState *pnv_phb4_get_pec(PnvChip *chip, PnvPHB4 *phb,

+ Error **errp)
+{
+Pnv9Chip *chip9 = PNV9_CHIP(chip);
+int chip_id = phb->chip_id;
+int index = phb->phb_id;
+int i, j;
+
+for (i = 0; i < chip->num_pecs; i++) {
+/*
+ * For each PEC, check the amount of phbs it supports
+ * and see if the given phb4 index matches an index.
+ */
+PnvPhb4PecState *pec = &chip9->pecs[i];
+
+for (j = 0; j < pec->num_phbs; j++) {
+if (index == pnv_phb4_pec_get_phb_id(pec, j)) {
+return pec;
+}
+}
+}
+error_setg(errp,
+   "pnv-phb4 chip-id %d index %d didn't match any existing PEC",
+   chip_id, index);
+
+return NULL;
+}
+
  /*
   * Adds a PnvPHB to the chip. Returns the parent obj of the
   * PHB which varies with each version (phb version 3 is parented
@@ -302,10 +330,11 @@ Object *pnv_chip_add_phb(PnvChip *chip, PnvPHB *phb, 
Error **errp)
  chip8->num_phbs++;
  
  return OBJECT(chip);

-} else {
-/* phb4 support will be added later */
-return NULL;
  }
+
+phb->pec = pnv_phb4_get_pec(chip, PNV_PHB4(phb->backend), errp);
+
+return OBJECT(phb->pec);
  }
  
  static void pnv_chip_power8_dt_populate(PnvChip *chip, void *fdt)

Re: [PATCH for-7.2 v4 09/11] ppc/pnv: change pnv_phb4_get_pec() to also retrieve chip10->pecs





On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

The function assumes that we're always dealing with a PNV9_CHIP()
object. This is not the case when the pnv-phb device belongs to a
powernv10 machine.

Change pnv_phb4_get_pec() to be able to work with PNV10_CHIP() if
necessary.

Signed-off-by: Daniel Henrique Barboza 
---



Reviewed-by: Frederic Barrat 

  Fred



  hw/ppc/pnv.c | 17 +++--
  1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index ec0558ed1c..e6c14fe231 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -284,17 +284,30 @@ static void pnv_dt_icp(PnvChip *chip, void *fdt, uint32_t 
pir,
  static PnvPhb4PecState *pnv_phb4_get_pec(PnvChip *chip, PnvPHB4 *phb,
   Error **errp)
  {
-Pnv9Chip *chip9 = PNV9_CHIP(chip);
+PnvPHB *phb_base = phb->phb_base;
+PnvPhb4PecState *pecs = NULL;
  int chip_id = phb->chip_id;
  int index = phb->phb_id;
  int i, j;
  
+if (phb_base->version == 4) {

+Pnv9Chip *chip9 = PNV9_CHIP(chip);
+
+pecs = chip9->pecs;
+} else if (phb_base->version == 5) {
+Pnv10Chip *chip10 = PNV10_CHIP(chip);
+
+pecs = chip10->pecs;
+} else {
+g_assert_not_reached();
+}
+
  for (i = 0; i < chip->num_pecs; i++) {
  /*
   * For each PEC, check the amount of phbs it supports
   * and see if the given phb4 index matches an index.
   */
-PnvPhb4PecState *pec = &chip9->pecs[i];
+PnvPhb4PecState *pec = &pecs[i];
  
  for (j = 0; j < pec->num_phbs; j++) {

  if (index == pnv_phb4_pec_get_phb_id(pec, j)) {

Re: [PATCH for-7.2 v4 10/11] ppc/pnv: user creatable pnv-phb for powernv10





On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

Given that powernv9 and powernv10 uses the same pnv-phb backend, the
logic to allow user created pnv-phbs for powernv10 is already in place.
Let's flip the switch.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Daniel Henrique Barboza 
---



Reviewed-by: Frederic Barrat 

  Fred



  hw/ppc/pnv.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index e6c14fe231..9bf35ca9d6 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -2226,6 +2226,8 @@ static void pnv_machine_power10_class_init(ObjectClass 
*oc, void *data)
  pmc->dt_power_mgt = pnv_dt_power_mgt;
  
  xfc->match_nvt = pnv10_xive_match_nvt;

+
+machine_class_allow_dynamic_sysbus_dev(mc, TYPE_PNV_PHB);
  }
  
  static bool pnv_machine_get_hb(Object *obj, Error **errp)

Re: [BUG] cxl can not create region

2022-08-12 Thread Jonathan Cameron via

On Thu, 11 Aug 2022 18:08:57 +0100
Jonathan Cameron via  wrote:

> On Tue, 9 Aug 2022 17:08:25 +0100
> Jonathan Cameron  wrote:
> 
> > On Tue, 9 Aug 2022 21:07:06 +0800
> > Bobo WL  wrote:
> >   
> > > Hi Jonathan
> > > 
> > > Thanks for your reply!
> > > 
> > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > >  wrote:
> > > >
> > > > Probably not related to your problem, but there is a disconnect in QEMU 
> > > > /
> > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > has a single root port. Spec allows it to be provided or not as an 
> > > > implementation choice.
> > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > >
> > > > The temporary solution is to throw in a second root port on the HB and 
> > > > not
> > > > connect anything to it.  Longer term I may special case this so that 
> > > > the particular
> > > > decoder defaults to pass through settings in QEMU if there is only one 
> > > > root port.
> > > >  
> > > 
> > > You are right! After adding an extra HB in qemu, I can create a x1
> > > region successfully.
> > > But have some errors in Nvdimm:
> > > 
> > > [   74.925838] Unknown online node for memory at 0x100, assuming 
> > > node 0
> > > [   74.925846] Unknown target node for memory at 0x100, assuming 
> > > node 0
> > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe
> > 
> > Ah. I've seen this one, but not chased it down yet.  Was on my todo list to 
> > chase
> > down. Once I reach this state I can verify the HDM Decode is correct which 
> > is what
> > I've been using to test (Which wasn't true until earlier this week). 
> > I'm currently testing via devmem, more for historical reasons than because 
> > it makes
> > that much sense anymore.
> 
> *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> I'd forgotten that was still on the todo list. I don't think it will
> be particularly hard to do and will take a look in next few days.
> 
> Very very indirectly this error is causing a driver probe fail that means that
> we hit a code path that has a rather odd looking check on NDD_LABELING.
> Should not have gotten near that path though - hence the problem is actually
> when we call cxl_pmem_get_config_data() and it returns an error because
> we haven't fully connected up the command in QEMU.

So a least one bug in QEMU. We were not supporting variable length payloads on 
mailbox
inputs (but were on outputs).  That hasn't mattered until we get to LSA writes.
We just need to relax condition on the supplied length.

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index c352a935c4..fdda9529fe 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
 cxl_cmd = &cxl_cmd_set[set][cmd];
 h = cxl_cmd->handler;
 if (h) {
-if (len == cxl_cmd->in) {
+if (len == cxl_cmd->in || !cxl_cmd->in) {
 cxl_cmd->payload = cxl_dstate->mbox_reg_state +
 A_CXL_DEV_CMD_PAYLOAD;
 ret = (*h)(cxl_cmd, cxl_dstate, &len);


This lets the nvdimm/region probe fine, but I'm getting some issues with
namespace capacity so I'll look at what is causing that next.
Unfortunately I'm not that familiar with the driver/nvdimm side of things
so it's take a while to figure out what kicks off what!

Jonathan

> 
> Jonathan
> 
> 
> >   
> > > 
> > > And x4 region still failed with same errors, using latest cxl/preview
> > > branch don't work.
> > > I have picked "Two CXL emulation fixes" patches in qemu, still not 
> > > working.
> > > 
> > > Bob
> 
>

Re: [RFC v2 00/10] Introduce an extensible static analyzer

2022-08-12 Thread Alberto Faria

On Thu, Aug 4, 2022 at 12:44 PM Marc-André Lureau
 wrote:
> Hi
>
> Great work so far! This seems easier to hack than my attempt to use
> clang-tidy to write some qemu checks
> (https://github.com/elmarco/clang-tools-extra)
>
> The code seems quite generic, I wonder if such a tool in python wasn't
> already developed (I couldn't find it easily searching on github).
>
> Why not make it standalone from qemu? Similar to
> https://gitlab.com/qemu-project/python-qemu-qmp, you could have your
> own release management, issue tracker, code formatting, license, CI
> etc. (you should add copyright header in each file, at least that's
> pretty much required in qemu nowadays). You could also have the
> qemu-specific checks there imho (clang-tidy has google & llvm specific
> checks too)

This is an interesting idea. Indeed, the analyzer is essentially a
more easily extensible, Python version of clang-tidy (without the big
built-in library of checks).

I think I'll continue working on it embedded in QEMU for now, mostly
because it depends on some aspects of the build system, and gradually
generalize it to a point where it could be made into a standalone
thing.

> It would be nice to write some docs, in docs/devel/testing.rst and
> some new meson/ninja/make targets to run the checks directly from a
> build tree.

Sounds good, I'll work on it.

> On fc36, I had several dependencies I needed to install manually (imho
> they should have been pulled by python3-clang), but more annoyingly I
> got:
> clang.cindex.LibclangError: libclang.so: cannot open shared object
> file: No such file or directory. To provide a path to libclang use
> Config.set_library_path() or Config.set_library_file().
>
> clang-libs doesn't install libclang.so, I wonder why. I made a link
> manually and it works, but it's probably incorrect. I'll try to open
> issues for the clang packaging.

That's strange. Thanks for looking into this.

Alberto

Re: [RFC v2 02/10] Drop unused static function return values

2022-08-12 Thread Alberto Faria

On Wed, Aug 3, 2022 at 1:30 PM Peter Maydell  wrote:
> The problem with a patch like this is that it rolls up into a
> single patch changes to the API of many functions in multiple
> subsystems across the whole codebase. Some of those changes
> might be right; some might be wrong. No single person is going
> to be in a position to review the whole lot, and a +248-403
> patch email makes it very unwieldy to try to discuss.
>
> If you want to propose some of these I think you need to:
>  * split it out so that you're only suggesting changes in
>one subsystem at a time
>  * look at the places you are suggesting changes, to see if
>the correct answer is actually "add the missing error
>check in the caller(s)"
>  * not change places that are following standard API patterns
>like "return bool and have an Error** argument"

Sounds good. For now, I'll limit the changes to a few representative
cases e.g. in the block layer, since this patch is mostly intended as
a demonstration of the type of issue that the check catches. Once
there is agreement on the semantics for the check, I'll probably send
a separate tree-wide series with per-subsystem patches.

Thanks,
Alberto

Re: [PATCH for-7.2 v4 11/11] ppc/pnv: fix QOM parenting of user creatable root ports





On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

User creatable root ports are being parented by the 'peripheral' or the
'peripheral-anon' container. This happens because this is the regular
QOM schema for sysbus devices that are added via the command line.

Let's make this QOM hierarchy similar to what we have with default root
ports, i.e. the root port must be parented by the pnv-root-bus. To do
that we change the qom and bus parent of the root port during
root_port_realize(). The realize() is shared by the default root port
code path, so we can remove the code inside pnv_phb_attach_root_port()
that was adding the root port as a child of the bus as well.

While we're at it, change pnv_phb_attach_root_port() to receive a PCIBus
instead of a PCIHostState to make it clear that the function does not
make use of the PHB.

Signed-off-by: Daniel Henrique Barboza 
---
  hw/pci-host/pnv_phb.c | 35 +++
  1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index 17d9960aa1..65ed1f9eb4 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -51,27 +51,11 @@ static void pnv_parent_bus_fixup(DeviceState *parent, 
DeviceState *child,
  }
  }
  
-/*

- * Attach a root port device.
- *
- * 'index' will be used both as a PCIE slot value and to calculate
- * QOM id. 'chip_id' is going to be used as PCIE chassis for the
- * root port.
- */
-static void pnv_phb_attach_root_port(PCIHostState *pci)
+static void pnv_phb_attach_root_port(PCIBus *bus)
  {
  PCIDevice *root = pci_new(PCI_DEVFN(0, 0), TYPE_PNV_PHB_ROOT_PORT);
-const char *dev_id = DEVICE(root)->id;
-g_autofree char *default_id = NULL;
-int index;
-
-index = object_property_get_int(OBJECT(pci->bus), "phb-id", &error_fatal);
-default_id = g_strdup_printf("%s[%d]", TYPE_PNV_PHB_ROOT_PORT, index);
-
-object_property_add_child(OBJECT(pci->bus), dev_id ? dev_id : default_id,
-  OBJECT(root));
  
-pci_realize_and_unref(root, pci->bus, &error_fatal);

+pci_realize_and_unref(root, bus, &error_fatal);
  }
  
  /*

@@ -171,7 +155,7 @@ static void pnv_phb_realize(DeviceState *dev, Error **errp)
  return;
  }
  
-pnv_phb_attach_root_port(pci);

+pnv_phb_attach_root_port(pci->bus);
  }
  
  static const char *pnv_phb_root_bus_path(PCIHostState *host_bridge,

@@ -240,12 +224,18 @@ static void pnv_phb_root_port_realize(DeviceState *dev, 
Error **errp)
  {
  PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(dev);
  PnvPHBRootPort *phb_rp = PNV_PHB_ROOT_PORT(dev);
-PCIBus *bus = PCI_BUS(qdev_get_parent_bus(dev));
+BusState *qbus = qdev_get_parent_bus(dev);
+PCIBus *bus = PCI_BUS(qbus);
  PCIDevice *pci = PCI_DEVICE(dev);
  uint16_t device_id = 0;
  Error *local_err = NULL;
  int chip_id, index;
  
+/*

+ * 'index' will be used both as a PCIE slot value and to calculate
+ * QOM id. 'chip_id' is going to be used as PCIE chassis for the
+ * root port.
+ */
  chip_id = object_property_get_int(OBJECT(bus), "chip-id", &error_fatal);
  index = object_property_get_int(OBJECT(bus), "phb-id", &error_fatal);
  
@@ -253,6 +243,11 @@ static void pnv_phb_root_port_realize(DeviceState *dev, Error **errp)

  qdev_prop_set_uint8(dev, "chassis", chip_id);
  qdev_prop_set_uint16(dev, "slot", index);
  
+pnv_parent_qom_fixup(OBJECT(bus), OBJECT(dev), index);

+if (!qdev_set_parent_bus(dev, qbus, &error_fatal)) {



That line looks surprising at first, because we got qbus from 
qdev_get_parent_bus() just above, so it looks like a noop. I talked to 
Daniel about it: the device<->bus relationship is correct when entering 
the function but the call to pnv_parent_qom_fixup() interferes with the 
bus relationship, so it needs to be re-established.

Short of a better suggestion, it probably need a comment.

  Fred




+return;
+}
+
  rpc->parent_realize(dev, &local_err);
  if (local_err) {
  error_propagate(errp, local_err);

Re: [BUG] cxl can not create region

2022-08-12 Thread Dan Williams

Jonathan Cameron wrote:
> On Thu, 11 Aug 2022 18:08:57 +0100
> Jonathan Cameron via  wrote:
> 
> > On Tue, 9 Aug 2022 17:08:25 +0100
> > Jonathan Cameron  wrote:
> > 
> > > On Tue, 9 Aug 2022 21:07:06 +0800
> > > Bobo WL  wrote:
> > >   
> > > > Hi Jonathan
> > > > 
> > > > Thanks for your reply!
> > > > 
> > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > >  wrote:
> > > > >
> > > > > Probably not related to your problem, but there is a disconnect in 
> > > > > QEMU /
> > > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > > has a single root port. Spec allows it to be provided or not as an 
> > > > > implementation choice.
> > > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > > >
> > > > > The temporary solution is to throw in a second root port on the HB 
> > > > > and not
> > > > > connect anything to it.  Longer term I may special case this so that 
> > > > > the particular
> > > > > decoder defaults to pass through settings in QEMU if there is only 
> > > > > one root port.
> > > > >  
> > > > 
> > > > You are right! After adding an extra HB in qemu, I can create a x1
> > > > region successfully.
> > > > But have some errors in Nvdimm:
> > > > 
> > > > [   74.925838] Unknown online node for memory at 0x100, 
> > > > assuming node 0
> > > > [   74.925846] Unknown target node for memory at 0x100, 
> > > > assuming node 0
> > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe
> > > 
> > > Ah. I've seen this one, but not chased it down yet.  Was on my todo list 
> > > to chase
> > > down. Once I reach this state I can verify the HDM Decode is correct 
> > > which is what
> > > I've been using to test (Which wasn't true until earlier this week). 
> > > I'm currently testing via devmem, more for historical reasons than 
> > > because it makes
> > > that much sense anymore.
> > 
> > *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> > I'd forgotten that was still on the todo list. I don't think it will
> > be particularly hard to do and will take a look in next few days.
> > 
> > Very very indirectly this error is causing a driver probe fail that means 
> > that
> > we hit a code path that has a rather odd looking check on NDD_LABELING.
> > Should not have gotten near that path though - hence the problem is actually
> > when we call cxl_pmem_get_config_data() and it returns an error because
> > we haven't fully connected up the command in QEMU.
> 
> So a least one bug in QEMU. We were not supporting variable length payloads 
> on mailbox
> inputs (but were on outputs).  That hasn't mattered until we get to LSA 
> writes.
> We just need to relax condition on the supplied length.
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index c352a935c4..fdda9529fe 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
>  cxl_cmd = &cxl_cmd_set[set][cmd];
>  h = cxl_cmd->handler;
>  if (h) {
> -if (len == cxl_cmd->in) {
> +if (len == cxl_cmd->in || !cxl_cmd->in) {
>  cxl_cmd->payload = cxl_dstate->mbox_reg_state +
>  A_CXL_DEV_CMD_PAYLOAD;
>  ret = (*h)(cxl_cmd, cxl_dstate, &len);
> 
> 
> This lets the nvdimm/region probe fine, but I'm getting some issues with
> namespace capacity so I'll look at what is causing that next.
> Unfortunately I'm not that familiar with the driver/nvdimm side of things
> so it's take a while to figure out what kicks off what!

The whirlwind tour is that 'struct nd_region' instances that represent a
persitent memory address range are composed of one more mappings of
'struct nvdimm' objects. The nvdimm object is driven by the dimm driver
in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking
the dimm (if locked) and interrogating the label area to look for
namespace labels.

The label command calls are routed to the '->ndctl()' callback that was
registered when the CXL nvdimm_bus_descriptor was created. That callback
handles both 'bus' scope calls, currently none for CXL, and per nvdimm
calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands
to CXL commands.

The 'struct nvdimm' objects that the CXL side registers have the
NDD_LABELING flag set which means that namespaces need to be explicitly
created / provisioned from region capacity. Otherwise, if
drivers/nvdimm/dimm.c does not find a namespace-label-index block then
the region reverts to label-less mode and a default namespace equal to
the size of the region is instantiated.

If you are seeing small mismatches in namespace capacity then it may
just be the fact that by default 'ndctl create-namespace' results in an
'fsdax' mode namespace which just means that it is a block device where
1.5% of the capacity is reserved for 'struct page' metadata. You should
be able to see namespace capacity

Re: [PATCH for-7.2 v4 08/11] ppc/pnv: enable user created pnv-phb for powernv9





On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

Enable pnv-phb user created devices for powernv9 now that we have
everything in place.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Daniel Henrique Barboza 
---



Same comment as in patch 6 regarding the QOM relationship of the 
user-created root port.

Reviewed-by: Frederic Barrat 

  Fred



  hw/pci-host/pnv_phb.c  | 2 +-
  hw/pci-host/pnv_phb4_pec.c | 6 --
  hw/ppc/pnv.c   | 2 ++
  3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index 1f53ff77c5..17d9960aa1 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -167,7 +167,7 @@ static void pnv_phb_realize(DeviceState *dev, Error **errp)
  pnv_phb4_bus_init(dev, PNV_PHB4(phb->backend));
  }
  
-if (phb->version == 3 && !defaults_enabled()) {

+if (!defaults_enabled()) {
  return;
  }
  
diff --git a/hw/pci-host/pnv_phb4_pec.c b/hw/pci-host/pnv_phb4_pec.c

index 8dc363d69c..9871f462cd 100644
--- a/hw/pci-host/pnv_phb4_pec.c
+++ b/hw/pci-host/pnv_phb4_pec.c
@@ -146,8 +146,10 @@ static void pnv_pec_realize(DeviceState *dev, Error **errp)
  pec->num_phbs = pecc->num_phbs[pec->index];
  
  /* Create PHBs if running with defaults */

-for (i = 0; i < pec->num_phbs; i++) {
-pnv_pec_default_phb_realize(pec, i, errp);
+if (defaults_enabled()) {
+for (i = 0; i < pec->num_phbs; i++) {
+pnv_pec_default_phb_realize(pec, i, errp);
+}
  }
  
  /* Initialize the XSCOM regions for the PEC registers */

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 0644f45a1d..ec0558ed1c 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -2188,6 +2188,8 @@ static void pnv_machine_power9_class_init(ObjectClass 
*oc, void *data)
  pmc->compat = compat;
  pmc->compat_size = sizeof(compat);
  pmc->dt_power_mgt = pnv_dt_power_mgt;
+
+machine_class_allow_dynamic_sysbus_dev(mc, TYPE_PNV_PHB);
  }
  
  static void pnv_machine_power10_class_init(ObjectClass *oc, void *data)

Re: [BUG] cxl can not create region

2022-08-12 Thread Jonathan Cameron via

On Fri, 12 Aug 2022 09:03:02 -0700
Dan Williams  wrote:

> Jonathan Cameron wrote:
> > On Thu, 11 Aug 2022 18:08:57 +0100
> > Jonathan Cameron via  wrote:
> >   
> > > On Tue, 9 Aug 2022 17:08:25 +0100
> > > Jonathan Cameron  wrote:
> > >   
> > > > On Tue, 9 Aug 2022 21:07:06 +0800
> > > > Bobo WL  wrote:
> > > > 
> > > > > Hi Jonathan
> > > > > 
> > > > > Thanks for your reply!
> > > > > 
> > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > > >  wrote:  
> > > > > >
> > > > > > Probably not related to your problem, but there is a disconnect in 
> > > > > > QEMU /
> > > > > > kernel assumptionsaround the presence of an HDM decoder when a HB 
> > > > > > only
> > > > > > has a single root port. Spec allows it to be provided or not as an 
> > > > > > implementation choice.
> > > > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > > > >
> > > > > > The temporary solution is to throw in a second root port on the HB 
> > > > > > and not
> > > > > > connect anything to it.  Longer term I may special case this so 
> > > > > > that the particular
> > > > > > decoder defaults to pass through settings in QEMU if there is only 
> > > > > > one root port.
> > > > > >
> > > > > 
> > > > > You are right! After adding an extra HB in qemu, I can create a x1
> > > > > region successfully.
> > > > > But have some errors in Nvdimm:
> > > > > 
> > > > > [   74.925838] Unknown online node for memory at 0x100, 
> > > > > assuming node 0
> > > > > [   74.925846] Unknown target node for memory at 0x100, 
> > > > > assuming node 0
> > > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe   
> > > > >
> > > > 
> > > > Ah. I've seen this one, but not chased it down yet.  Was on my todo 
> > > > list to chase
> > > > down. Once I reach this state I can verify the HDM Decode is correct 
> > > > which is what
> > > > I've been using to test (Which wasn't true until earlier this week). 
> > > > I'm currently testing via devmem, more for historical reasons than 
> > > > because it makes
> > > > that much sense anymore.  
> > > 
> > > *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> > > I'd forgotten that was still on the todo list. I don't think it will
> > > be particularly hard to do and will take a look in next few days.
> > > 
> > > Very very indirectly this error is causing a driver probe fail that means 
> > > that
> > > we hit a code path that has a rather odd looking check on NDD_LABELING.
> > > Should not have gotten near that path though - hence the problem is 
> > > actually
> > > when we call cxl_pmem_get_config_data() and it returns an error because
> > > we haven't fully connected up the command in QEMU.  
> > 
> > So a least one bug in QEMU. We were not supporting variable length payloads 
> > on mailbox
> > inputs (but were on outputs).  That hasn't mattered until we get to LSA 
> > writes.
> > We just need to relax condition on the supplied length.
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index c352a935c4..fdda9529fe 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
> >  cxl_cmd = &cxl_cmd_set[set][cmd];
> >  h = cxl_cmd->handler;
> >  if (h) {
> > -if (len == cxl_cmd->in) {
> > +if (len == cxl_cmd->in || !cxl_cmd->in) {
> >  cxl_cmd->payload = cxl_dstate->mbox_reg_state +
> >  A_CXL_DEV_CMD_PAYLOAD;
> >  ret = (*h)(cxl_cmd, cxl_dstate, &len);
> > 
> > 
> > This lets the nvdimm/region probe fine, but I'm getting some issues with
> > namespace capacity so I'll look at what is causing that next.
> > Unfortunately I'm not that familiar with the driver/nvdimm side of things
> > so it's take a while to figure out what kicks off what!  
> 
> The whirlwind tour is that 'struct nd_region' instances that represent a
> persitent memory address range are composed of one more mappings of
> 'struct nvdimm' objects. The nvdimm object is driven by the dimm driver
> in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking
> the dimm (if locked) and interrogating the label area to look for
> namespace labels.
> 
> The label command calls are routed to the '->ndctl()' callback that was
> registered when the CXL nvdimm_bus_descriptor was created. That callback
> handles both 'bus' scope calls, currently none for CXL, and per nvdimm
> calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands
> to CXL commands.
> 
> The 'struct nvdimm' objects that the CXL side registers have the
> NDD_LABELING flag set which means that namespaces need to be explicitly
> created / provisioned from region capacity. Otherwise, if
> drivers/nvdimm/dimm.c does not find a namespace-label-index block then
> the region reverts to label-less mode and a default namespace equal to
> the size of the region is instantiated.

Re: [PATCH for-7.2 v4 06/11] ppc/pnv: enable user created pnv-phb for powernv8





On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

The bulk of the work was already done by previous patches.

Use defaults_enabled() to determine whether we need to create the
default devices or not.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Daniel Henrique Barboza 
---



The QOM relationship for user-created root port is not ideal, but it's 
addressed in the last patch of the series.

Reviewed-by: Frederic Barrat 

  Fred



  hw/pci-host/pnv_phb.c |  9 +++--
  hw/ppc/pnv.c  | 32 ++--
  2 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index 5dc44f45d1..1f53ff77c5 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -17,6 +17,7 @@
  #include "hw/ppc/pnv.h"
  #include "hw/qdev-properties.h"
  #include "qom/object.h"
+#include "sysemu/sysemu.h"
  
  
  /*

@@ -166,6 +167,10 @@ static void pnv_phb_realize(DeviceState *dev, Error **errp)
  pnv_phb4_bus_init(dev, PNV_PHB4(phb->backend));
  }
  
+if (phb->version == 3 && !defaults_enabled()) {

+return;
+}
+
  pnv_phb_attach_root_port(pci);
  }
  
@@ -201,7 +206,7 @@ static void pnv_phb_class_init(ObjectClass *klass, void *data)

  dc->realize = pnv_phb_realize;
  device_class_set_props(dc, pnv_phb_properties);
  set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
-dc->user_creatable = false;
+dc->user_creatable = true;
  }
  
  static void pnv_phb_root_port_reset(DeviceState *dev)

@@ -292,7 +297,7 @@ static void pnv_phb_root_port_class_init(ObjectClass 
*klass, void *data)
  device_class_set_parent_reset(dc, pnv_phb_root_port_reset,
&rpc->parent_reset);
  dc->reset = &pnv_phb_root_port_reset;
-dc->user_creatable = false;
+dc->user_creatable = true;
  
  k->vendor_id = PCI_VENDOR_ID_IBM;

  /* device_id will be written during realize() */
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index e53e9e297d..e82d6522f0 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -1178,20 +1178,22 @@ static void pnv_chip_power8_instance_init(Object *obj)
  
  object_initialize_child(obj, "homer", &chip8->homer, TYPE_PNV8_HOMER);
  
-chip8->num_phbs = pcc->num_phbs;

-
-for (i = 0; i < chip8->num_phbs; i++) {
-Object *phb = object_new(TYPE_PNV_PHB);
-
-/*
- * We need the chip to parent the PHB to allow the DT
- * to build correctly (via pnv_xscom_dt()).
- *
- * TODO: the PHB should be parented by a PEC device that, at
- * this moment, is not modelled powernv8/phb3.
- */
-object_property_add_child(obj, "phb[*]", phb);
-chip8->phbs[i] = PNV_PHB(phb);
+if (defaults_enabled()) {
+chip8->num_phbs = pcc->num_phbs;
+
+for (i = 0; i < chip8->num_phbs; i++) {
+Object *phb = object_new(TYPE_PNV_PHB);
+
+/*
+ * We need the chip to parent the PHB to allow the DT
+ * to build correctly (via pnv_xscom_dt()).
+ *
+ * TODO: the PHB should be parented by a PEC device that, at
+ * this moment, is not modelled powernv8/phb3.
+ */
+object_property_add_child(obj, "phb[*]", phb);
+chip8->phbs[i] = PNV_PHB(phb);
+}
  }
  
  }

@@ -2130,6 +2132,8 @@ static void pnv_machine_power8_class_init(ObjectClass 
*oc, void *data)
  
  pmc->compat = compat;

  pmc->compat_size = sizeof(compat);
+
+machine_class_allow_dynamic_sysbus_dev(mc, TYPE_PNV_PHB);
  }
  
  static void pnv_machine_power9_class_init(ObjectClass *oc, void *data)

Re: [RFC v2 02/10] Drop unused static function return values

2022-08-12 Thread Alberto Faria

On Wed, Aug 3, 2022 at 12:15 PM Richard W.M. Jones  wrote:
> If it helps to think about this, Coverity checks for consistency.
> Across the whole code base, is the return value of a function used or
> ignored consistently.  You will see Coverity errors like:
>
>   Error: CHECKED_RETURN (CWE-252): [#def37]
>   libnbd-1.12.5/fuse/operations.c:180: check_return: Calling "nbd_poll" 
> without checking return value (as is done elsewhere 5 out of 6 times).
>   libnbd-1.12.5/examples/aio-connect-read.c:96: example_checked: Example 
> 1: "nbd_poll(nbd, -1)" has its value checked in "nbd_poll(nbd, -1) == -1".
>   libnbd-1.12.5/examples/aio-connect-read.c:128: example_checked: Example 
> 2: "nbd_poll(nbd, -1)" has its value checked in "nbd_poll(nbd, -1) == -1".
>   libnbd-1.12.5/examples/strict-structured-reads.c:246: example_checked: 
> Example 3: "nbd_poll(nbd, -1)" has its value checked in "nbd_poll(nbd, -1) == 
> -1".
>   libnbd-1.12.5/ocaml/nbd-c.c:2599: example_assign: Example 4: Assigning: 
> "r" = return value from "nbd_poll(h, timeout)".
>   libnbd-1.12.5/ocaml/nbd-c.c:2602: example_checked: Example 4 (cont.): 
> "r" has its value checked in "r == -1".
>   libnbd-1.12.5/python/methods.c:2806: example_assign: Example 5: 
> Assigning: "ret" = return value from "nbd_poll(h, timeout)".
>   libnbd-1.12.5/python/methods.c:2808: example_checked: Example 5 
> (cont.): "ret" has its value checked in "ret == -1".
>   #  178|   /* Dispatch work while there are commands in flight. */
>   #  179|   while (thread->in_flight > 0)
>   #  180|->   nbd_poll (h, -1);
>   #  181| }
>   #  182|
>
> What it's saying is that in this code base, nbd_poll's return value
> was checked by the caller 5 out of 6 times, but ignored here.  (This
> turned out to be a real bug which we fixed).
>
> It seems like the check implemented in your patch is: If the return
> value is used 0 times anywhere in the code base, change the return
> value to 'void'.  Coverity would not flag this.
>
> Maybe a consistent use check is better?

Note that the analyzer is currently limited to analyzing a single
translation unit at a time, so we would only be able to implement a
consistent use check for static functions (this is why the current
"return-value-never-used" check only applies to static functions).

It may be worthwhile exploring cross-translation unit analysis,
although it may be difficult to accomplish while also avoiding
reanalyzing the entire tree every time a single translation unit is
modified.

Alberto

Re: [PATCH 00/62] target/arm: Implement FEAT_HAFDBS

On Sun, 3 Jul 2022 at 09:25, Richard Henderson
 wrote:
>
> This is a major reorg to arm page table walking.  While the result
> here is "merely" Hardware-assited Access Flag and Dirty Bit Setting
> (HAFDBS), the ultimate goal is the Realm Management Extension (RME).
> RME "recommends" that HAFDBS be implemented (I_CSLWZ).

> Richard Henderson (62):
>   accel/tcg: Introduce PageEntryExtra
>   target/arm: Enable PageEntryExtra
>   target/arm: Fix MTE check in sve_ldnfff1_r
>   target/arm: Record tagged bit for user-only in sve_probe_page
>   target/arm: Use PageEntryExtra for MTE
>   target/arm: Use PageEntryExtra for BTI
>   include/exec: Remove target_tlb_bitN from MemTxAttrs
>   target/arm: Create GetPhysAddrResult
>   target/arm: Fix ipa_secure in get_phys_addr
>   target/arm: Use GetPhysAddrResult in get_phys_addr_lpae
>   target/arm: Use GetPhysAddrResult in get_phys_addr_v6
>   target/arm: Use GetPhysAddrResult in get_phys_addr_v5
>   target/arm: Use GetPhysAddrResult in get_phys_addr_pmsav5
>   target/arm: Use GetPhysAddrResult in get_phys_addr_pmsav7
>   target/arm: Use GetPhysAddrResult in get_phys_addr_pmsav8
>   target/arm: Use GetPhysAddrResult in pmsav8_mpu_lookup
>   target/arm: Remove is_subpage argument to pmsav8_mpu_lookup
>   target/arm: Add is_secure parameter to v8m_security_lookup
>   target/arm: Add is_secure parameter to pmsav8_mpu_lookup
>   target/arm: Add is_secure parameter to get_phys_addr_v5
>   target/arm: Add is_secure parameter to get_phys_addr_v6
>   target/arm: Add secure parameter to get_phys_addr_pmsav8
>   target/arm: Add is_secure parameter to pmsav7_use_background_region
>   target/arm: Add is_secure parameter to get_phys_addr_lpae
>   target/arm: Add is_secure parameter to get_phys_addr_pmsav7
>   target/arm: Add is_secure parameter to regime_translation_disabled
>   target/arm: Add is_secure parameter to get_phys_addr_pmsav5

Is it possible to rearrange this patchset so the easy
refactoring patches that do "use a struct to return
values from get_phys_addr and friends" are at the front
(ie before the stuff that touches core code) ?
That way they're easy to take into the tree early while
the rest of the series is still under review...

thanks
-- PMM

[PATCH] target/arm: Rearrange cpu64.c so all the CPU initfns are together

cpu64.c has ended up in a slightly odd order -- it starts with the
initfns for most of the models-real-hardware CPUs; after that comes a
bunch of support code for SVE, SME, pauth and LPA2 properties.  Then
come the initfns for the 'host' and 'max' CPU types, and then after
that one more models-real-hardware CPU initfn, for a64fx.  (This
ordering is partly historical and partly required because a64fx needs
the SVE properties.)

Reorder the file into:
 * CPU property support functions
 * initfns for real hardware CPUs
 * initfns for host and max
 * class boilerplate

Signed-off-by: Peter Maydell 
---
I started off thinking this would be a relatively simple "move the
a64fx initfn up to live with the others", but because we effectively
have to move all the cpu initfns the diffstat has ended up quite big.
On the other hand this patch is purely code motion, and the resulting
order in the file does seem to me to be more sensible.
---
 target/arm/cpu64.c | 712 ++---
 1 file changed, 356 insertions(+), 356 deletions(-)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 78e27f778ac..741f959fbe8 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -37,313 +37,6 @@
 #include "internals.h"
 
 
-static void aarch64_a57_initfn(Object *obj)
-{
-ARMCPU *cpu = ARM_CPU(obj);
-
-cpu->dtb_compatible = "arm,cortex-a57";
-set_feature(&cpu->env, ARM_FEATURE_V8);
-set_feature(&cpu->env, ARM_FEATURE_NEON);
-set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER);
-set_feature(&cpu->env, ARM_FEATURE_AARCH64);
-set_feature(&cpu->env, ARM_FEATURE_CBAR_RO);
-set_feature(&cpu->env, ARM_FEATURE_EL2);
-set_feature(&cpu->env, ARM_FEATURE_EL3);
-set_feature(&cpu->env, ARM_FEATURE_PMU);
-cpu->kvm_target = QEMU_KVM_ARM_TARGET_CORTEX_A57;
-cpu->midr = 0x411fd070;
-cpu->revidr = 0x;
-cpu->reset_fpsid = 0x41034070;
-cpu->isar.mvfr0 = 0x10110222;
-cpu->isar.mvfr1 = 0x1211;
-cpu->isar.mvfr2 = 0x0043;
-cpu->ctr = 0x8444c004;
-cpu->reset_sctlr = 0x00c50838;
-cpu->isar.id_pfr0 = 0x0131;
-cpu->isar.id_pfr1 = 0x00011011;
-cpu->isar.id_dfr0 = 0x03010066;
-cpu->id_afr0 = 0x;
-cpu->isar.id_mmfr0 = 0x10101105;
-cpu->isar.id_mmfr1 = 0x4000;
-cpu->isar.id_mmfr2 = 0x0126;
-cpu->isar.id_mmfr3 = 0x02102211;
-cpu->isar.id_isar0 = 0x02101110;
-cpu->isar.id_isar1 = 0x13112111;
-cpu->isar.id_isar2 = 0x21232042;
-cpu->isar.id_isar3 = 0x01112131;
-cpu->isar.id_isar4 = 0x00011142;
-cpu->isar.id_isar5 = 0x00011121;
-cpu->isar.id_isar6 = 0;
-cpu->isar.id_aa64pfr0 = 0x;
-cpu->isar.id_aa64dfr0 = 0x10305106;
-cpu->isar.id_aa64isar0 = 0x00011120;
-cpu->isar.id_aa64mmfr0 = 0x1124;
-cpu->isar.dbgdidr = 0x3516d000;
-cpu->isar.dbgdevid = 0x01110f13;
-cpu->isar.dbgdevid1 = 0x2;
-cpu->isar.reset_pmcr_el0 = 0x41013000;
-cpu->clidr = 0x0a200023;
-cpu->ccsidr[0] = 0x701fe00a; /* 32KB L1 dcache */
-cpu->ccsidr[1] = 0x201fe012; /* 48KB L1 icache */
-cpu->ccsidr[2] = 0x70ffe07a; /* 2048KB L2 cache */
-cpu->dcz_blocksize = 4; /* 64 bytes */
-cpu->gic_num_lrs = 4;
-cpu->gic_vpribits = 5;
-cpu->gic_vprebits = 5;
-cpu->gic_pribits = 5;
-define_cortex_a72_a57_a53_cp_reginfo(cpu);
-}
-
-static void aarch64_a53_initfn(Object *obj)
-{
-ARMCPU *cpu = ARM_CPU(obj);
-
-cpu->dtb_compatible = "arm,cortex-a53";
-set_feature(&cpu->env, ARM_FEATURE_V8);
-set_feature(&cpu->env, ARM_FEATURE_NEON);
-set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER);
-set_feature(&cpu->env, ARM_FEATURE_AARCH64);
-set_feature(&cpu->env, ARM_FEATURE_CBAR_RO);
-set_feature(&cpu->env, ARM_FEATURE_EL2);
-set_feature(&cpu->env, ARM_FEATURE_EL3);
-set_feature(&cpu->env, ARM_FEATURE_PMU);
-cpu->kvm_target = QEMU_KVM_ARM_TARGET_CORTEX_A53;
-cpu->midr = 0x410fd034;
-cpu->revidr = 0x;
-cpu->reset_fpsid = 0x41034070;
-cpu->isar.mvfr0 = 0x10110222;
-cpu->isar.mvfr1 = 0x1211;
-cpu->isar.mvfr2 = 0x0043;
-cpu->ctr = 0x84448004; /* L1Ip = VIPT */
-cpu->reset_sctlr = 0x00c50838;
-cpu->isar.id_pfr0 = 0x0131;
-cpu->isar.id_pfr1 = 0x00011011;
-cpu->isar.id_dfr0 = 0x03010066;
-cpu->id_afr0 = 0x;
-cpu->isar.id_mmfr0 = 0x10101105;
-cpu->isar.id_mmfr1 = 0x4000;
-cpu->isar.id_mmfr2 = 0x0126;
-cpu->isar.id_mmfr3 = 0x02102211;
-cpu->isar.id_isar0 = 0x02101110;
-cpu->isar.id_isar1 = 0x13112111;
-cpu->isar.id_isar2 = 0x21232042;
-cpu->isar.id_isar3 = 0x01112131;
-cpu->isar.id_isar4 = 0x00011142;
-cpu->isar.id_isar5 = 0x00011121;
-cpu->isar.id_isar6 = 0;
-cpu->isar.id_aa64pfr0 = 0x;
-cpu->isar.id_aa64dfr0 = 0x10305106;
-cpu->isar.id_aa64isar0 = 0x00011120;
-cpu->isar.id_aa64mmfr0 = 0x1122; /* 40 bit physical addr */
-cpu->isar.dbgdidr = 0x3516d000;
-

Re: [PULL 0/1] Linux user for 7.1 patches


On 8/12/22 03:33, Laurent Vivier wrote:

The following changes since commit a6b1c53e79d08a99a28cc3e67a3e1a7c34102d6b:

   Merge tag 'linux-user-for-7.1-pull-request' of 
https://gitlab.com/laurent_vivier/qemu into staging (2022-08-10 10:26:57 -0700)

are available in the Git repository at:

   https://gitlab.com/laurent_vivier/qemu.git 
tags/linux-user-for-7.1-pull-request

for you to fetch changes up to dbbf89751b14aa5d281bad3af273e9ffaae82262:

   linux-user/aarch64: Reset target data on MADV_DONTNEED (2022-08-11 11:34:17 
+0200)


Pull request linux-user 20220812


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/7.1 as 
appropriate.


r~






Vitaly Buka (1):
   linux-user/aarch64: Reset target data on MADV_DONTNEED

  accel/tcg/translate-all.c | 26 ++
  include/exec/cpu-all.h|  1 +
  linux-user/mmap.c |  3 +++
  3 files changed, 30 insertions(+)

Re: [PATCH 00/62] target/arm: Implement FEAT_HAFDBS


On 8/12/22 09:31, Peter Maydell wrote:

Is it possible to rearrange this patchset so the easy
refactoring patches that do "use a struct to return
values from get_phys_addr and friends" are at the front
(ie before the stuff that touches core code) ?
That way they're easy to take into the tree early while
the rest of the series is still under review...


Yes, I think so.


r~

Re: [PATCH v3 1/4] accel/tcg: Invalidate translations when clearing PAGE_EXEC


On 8/12/22 08:02, Ilya Leoshkevich wrote:

tb_lookup() skips get_page_addr_code() if tb is found in tb_jmp_cache.
I assume it's a bug?


Yes, I think so.  I've rearranged that for other reasons, and so may have inadvertently 
fix this.  I'll post the in-progress work in a moment.



r~

[PATCH for-7.2 00/21] accel/tcg: minimize tlb lookups during translate + user-only PROT_EXEC fixes

This is part of a larger body of work, but in the process of
reorganizing I was reminded that PROT_EXEC wasn't being enforced
properly for user-only.  As this has come up in the context of
some of Ilya's patches, I thought I'd go ahead and post this part.


r~


Ilya Leoshkevich (1):
  accel/tcg: Introduce is_same_page()

Richard Henderson (20):
  linux-user/arm: Mark the commpage executable
  linux-user/hppa: Allocate page zero as a commpage
  linux-user/x86_64: Allocate vsyscall page as a commpage
  linux-user: Honor PT_GNU_STACK
  tests/tcg/i386: Move smc_code2 to an executable section
  accel/tcg: Remove PageDesc code_bitmap
  accel/tcg: Use bool for page_find_alloc
  accel/tcg: Merge tb_htable_lookup into caller
  accel/tcg: Move qemu_ram_addr_from_host_nofail to physmem.c
  accel/tcg: Properly implement get_page_addr_code for user-only
  accel/tcg: Use probe_access_internal for softmmu
get_page_addr_code_hostp
  accel/tcg: Add nofault parameter to get_page_addr_code_hostp
  accel/tcg: Unlock mmap_lock after longjmp
  accel/tcg: Hoist get_page_addr_code out of tb_lookup
  accel/tcg: Hoist get_page_addr_code out of tb_gen_code
  accel/tcg: Raise PROT_EXEC exception early
  accel/tcg: Remove translator_ldsw
  accel/tcg: Add pc and host_pc params to gen_intermediate_code
  accel/tcg: Add fast path for translator_ld*
  accel/tcg: Use DisasContextBase in plugin_gen_tb_start

 accel/tcg/internal.h  |   7 +-
 include/elf.h |   1 +
 include/exec/cpu-common.h |   1 +
 include/exec/exec-all.h   |  87 +---
 include/exec/plugin-gen.h |   7 +-
 include/exec/translator.h |  85 
 linux-user/arm/target_cpu.h   |   4 +-
 linux-user/qemu.h |   1 +
 accel/tcg/cpu-exec.c  | 184 ++
 accel/tcg/cputlb.c|  93 +
 accel/tcg/plugin-gen.c|  23 +++--
 accel/tcg/translate-all.c | 120 --
 accel/tcg/translator.c| 122 +-
 accel/tcg/user-exec.c |  15 +++
 linux-user/elfload.c  |  80 ++-
 softmmu/physmem.c |  12 +++
 target/alpha/translate.c  |   5 +-
 target/arm/translate.c|   5 +-
 target/avr/translate.c|   5 +-
 target/cris/translate.c   |   5 +-
 target/hexagon/translate.c|   6 +-
 target/hppa/translate.c   |   5 +-
 target/i386/tcg/translate.c   |   7 +-
 target/loongarch/translate.c  |   6 +-
 target/m68k/translate.c   |   5 +-
 target/microblaze/translate.c |   5 +-
 target/mips/tcg/translate.c   |   5 +-
 target/nios2/translate.c  |   5 +-
 target/openrisc/translate.c   |   6 +-
 target/ppc/translate.c|   5 +-
 target/riscv/translate.c  |   5 +-
 target/rx/translate.c |   5 +-
 target/s390x/tcg/translate.c  |   5 +-
 target/sh4/translate.c|   5 +-
 target/sparc/translate.c  |   5 +-
 target/tricore/translate.c|   6 +-
 target/xtensa/translate.c |   6 +-
 tests/tcg/i386/test-i386.c|   2 +-
 38 files changed, 532 insertions(+), 424 deletions(-)

-- 
2.34.1

[PATCH for-7.2 05/21] tests/tcg/i386: Move smc_code2 to an executable section

We're about to start validating PAGE_EXEC, which means
that we've got to put this code into a section that is
both writable and executable.

Note that this test did not run on hardware beforehand either.

Signed-off-by: Richard Henderson 
---
 tests/tcg/i386/test-i386.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/tcg/i386/test-i386.c b/tests/tcg/i386/test-i386.c
index ac8d5a3c1f..e6b308a2c0 100644
--- a/tests/tcg/i386/test-i386.c
+++ b/tests/tcg/i386/test-i386.c
@@ -1998,7 +1998,7 @@ uint8_t code[] = {
 0xc3, /* ret */
 };
 
-asm(".section \".data\"\n"
+asm(".section \".data_x\",\"awx\"\n"
 "smc_code2:\n"
 "movl 4(%esp), %eax\n"
 "movl %eax, smc_patch_addr2 + 1\n"
-- 
2.34.1

[PATCH for-7.2 01/21] linux-user/arm: Mark the commpage executable

We're about to start validating PAGE_EXEC, which means
that we've got to mark the commpage executable.  We had
been placing the commpage outside of reserved_va, which
was incorrect and lead to an abort.

Signed-off-by: Richard Henderson 
---
 linux-user/arm/target_cpu.h | 4 ++--
 linux-user/elfload.c| 6 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/linux-user/arm/target_cpu.h b/linux-user/arm/target_cpu.h
index 709d19bc9e..89ba274cfc 100644
--- a/linux-user/arm/target_cpu.h
+++ b/linux-user/arm/target_cpu.h
@@ -34,9 +34,9 @@ static inline unsigned long arm_max_reserved_va(CPUState *cs)
 } else {
 /*
  * We need to be able to map the commpage.
- * See validate_guest_space in linux-user/elfload.c.
+ * See init_guest_commpage in linux-user/elfload.c.
  */
-return 0xul;
+return 0xul;
 }
 }
 #define MAX_RESERVED_VA  arm_max_reserved_va
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index ce902dbd56..3e3dc02499 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -398,7 +398,8 @@ enum {
 
 static bool init_guest_commpage(void)
 {
-void *want = g2h_untagged(HI_COMMPAGE & -qemu_host_page_size);
+abi_ptr commpage = HI_COMMPAGE & -qemu_host_page_size;
+void *want = g2h_untagged(commpage);
 void *addr = mmap(want, qemu_host_page_size, PROT_READ | PROT_WRITE,
   MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
 
@@ -417,6 +418,9 @@ static bool init_guest_commpage(void)
 perror("Protecting guest commpage");
 exit(EXIT_FAILURE);
 }
+
+page_set_flags(commpage, commpage + qemu_host_page_size,
+   PAGE_READ | PAGE_EXEC | PAGE_VALID);
 return true;
 }
 
-- 
2.34.1

[PATCH for-7.2 18/21] accel/tcg: Remove translator_ldsw

The only user can easily use translator_lduw and
adjust the type to signed during the return.

Signed-off-by: Richard Henderson 
---
 include/exec/translator.h   | 1 -
 target/i386/tcg/translate.c | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/exec/translator.h b/include/exec/translator.h
index 0d0bf3a31e..45b9268ca4 100644
--- a/include/exec/translator.h
+++ b/include/exec/translator.h
@@ -178,7 +178,6 @@ bool translator_use_goto_tb(DisasContextBase *db, 
target_ulong dest);
 
 #define FOR_EACH_TRANSLATOR_LD(F)   \
 F(translator_ldub, uint8_t, cpu_ldub_code, /* no swap */)   \
-F(translator_ldsw, int16_t, cpu_ldsw_code, bswap16) \
 F(translator_lduw, uint16_t, cpu_lduw_code, bswap16)\
 F(translator_ldl, uint32_t, cpu_ldl_code, bswap32)  \
 F(translator_ldq, uint64_t, cpu_ldq_code, bswap64)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index b7972f0ff5..a23417d058 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2033,7 +2033,7 @@ static inline uint8_t x86_ldub_code(CPUX86State *env, 
DisasContext *s)
 
 static inline int16_t x86_ldsw_code(CPUX86State *env, DisasContext *s)
 {
-return translator_ldsw(env, &s->base, advance_pc(env, s, 2));
+return translator_lduw(env, &s->base, advance_pc(env, s, 2));
 }
 
 static inline uint16_t x86_lduw_code(CPUX86State *env, DisasContext *s)
-- 
2.34.1

[PATCH for-7.2 04/21] linux-user: Honor PT_GNU_STACK

Map the stack executable if required by default or on demand.

Signed-off-by: Richard Henderson 
---
 include/elf.h|  1 +
 linux-user/qemu.h|  1 +
 linux-user/elfload.c | 19 ++-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/elf.h b/include/elf.h
index 3a4bcb646a..3d6b9062c0 100644
--- a/include/elf.h
+++ b/include/elf.h
@@ -31,6 +31,7 @@ typedef int64_t  Elf64_Sxword;
 #define PT_LOPROC  0x7000
 #define PT_HIPROC  0x7fff
 
+#define PT_GNU_STACK  (PT_LOOS + 0x474e551)
 #define PT_GNU_PROPERTY   (PT_LOOS + 0x474e553)
 
 #define PT_MIPS_REGINFO   0x7000
diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 7d90de1b15..e2e93fbd1d 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -48,6 +48,7 @@ struct image_info {
 uint32_telf_flags;
 int personality;
 abi_ulong   alignment;
+boolexec_stack;
 
 /* Generic semihosting knows about these pointers. */
 abi_ulong   arg_strings;   /* strings for argv */
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index e315155dad..b1169ca6df 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -232,6 +232,7 @@ static bool init_guest_commpage(void)
 #define ELF_ARCHEM_386
 
 #define ELF_PLATFORM get_elf_platform()
+#define EXSTACK_DEFAULT true
 
 static const char *get_elf_platform(void)
 {
@@ -308,6 +309,7 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, 
const CPUX86State *en
 
 #define ELF_ARCHEM_ARM
 #define ELF_CLASS   ELFCLASS32
+#define EXSTACK_DEFAULT true
 
 static inline void init_thread(struct target_pt_regs *regs,
struct image_info *infop)
@@ -776,6 +778,7 @@ static inline void init_thread(struct target_pt_regs *regs,
 #else
 
 #define ELF_CLASS   ELFCLASS32
+#define EXSTACK_DEFAULT true
 
 #endif
 
@@ -973,6 +976,7 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, 
const CPUPPCState *en
 
 #define ELF_CLASS   ELFCLASS64
 #define ELF_ARCHEM_LOONGARCH
+#define EXSTACK_DEFAULT true
 
 #define elf_check_arch(x) ((x) == EM_LOONGARCH)
 
@@ -1068,6 +1072,7 @@ static uint32_t get_elf_hwcap(void)
 #define ELF_CLASS   ELFCLASS32
 #endif
 #define ELF_ARCHEM_MIPS
+#define EXSTACK_DEFAULT true
 
 #ifdef TARGET_ABI_MIPSN32
 #define elf_check_abi(x) ((x) & EF_MIPS_ABI2)
@@ -1806,6 +1811,10 @@ static inline void init_thread(struct target_pt_regs 
*regs,
 #define bswaptls(ptr) bswap32s(ptr)
 #endif
 
+#ifndef EXSTACK_DEFAULT
+#define EXSTACK_DEFAULT false
+#endif
+
 #include "elf.h"
 
 /* We must delay the following stanzas until after "elf.h". */
@@ -2081,6 +2090,7 @@ static abi_ulong setup_arg_pages(struct linux_binprm 
*bprm,
  struct image_info *info)
 {
 abi_ulong size, error, guard;
+int prot;
 
 size = guest_stack_size;
 if (size < STACK_LOWER_LIMIT) {
@@ -2091,7 +2101,11 @@ static abi_ulong setup_arg_pages(struct linux_binprm 
*bprm,
 guard = qemu_real_host_page_size();
 }
 
-error = target_mmap(0, size + guard, PROT_READ | PROT_WRITE,
+prot = PROT_READ | PROT_WRITE;
+if (info->exec_stack) {
+prot |= PROT_EXEC;
+}
+error = target_mmap(0, size + guard, prot,
 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
 if (error == -1) {
 perror("mmap stack");
@@ -2919,6 +2933,7 @@ static void load_elf_image(const char *image_name, int 
image_fd,
  */
 loaddr = -1, hiaddr = 0;
 info->alignment = 0;
+info->exec_stack = EXSTACK_DEFAULT;
 for (i = 0; i < ehdr->e_phnum; ++i) {
 struct elf_phdr *eppnt = phdr + i;
 if (eppnt->p_type == PT_LOAD) {
@@ -2961,6 +2976,8 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 if (!parse_elf_properties(image_fd, info, eppnt, bprm_buf, &err)) {
 goto exit_errmsg;
 }
+} else if (eppnt->p_type == PT_GNU_STACK) {
+info->exec_stack = eppnt->p_flags & PF_X;
 }
 }
 
-- 
2.34.1

[PATCH for-7.2 02/21] linux-user/hppa: Allocate page zero as a commpage

We're about to start validating PAGE_EXEC, which means that we've
got to mark page zero executable.  We had been special casing this
entirely within translate.

Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 34 +++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 3e3dc02499..29d910c4cc 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1646,6 +1646,34 @@ static inline void init_thread(struct target_pt_regs 
*regs,
 regs->gr[31] = infop->entry;
 }
 
+#define LO_COMMPAGE  0
+
+static bool init_guest_commpage(void)
+{
+void *want = g2h_untagged(LO_COMMPAGE);
+void *addr = mmap(want, qemu_host_page_size, PROT_NONE,
+  MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
+
+if (addr == MAP_FAILED) {
+perror("Allocating guest commpage");
+exit(EXIT_FAILURE);
+}
+if (addr != want) {
+return false;
+}
+
+/*
+ * On Linux, page zero is normally marked execute only + gateway.
+ * Normal read or write is supposed to fail (thus PROT_NONE above),
+ * but specific offsets have kernel code mapped to raise permissions
+ * and implement syscalls.  Here, simply mark the page executable.
+ * Special case the entry points during translation (see do_page_zero).
+ */
+page_set_flags(LO_COMMPAGE, LO_COMMPAGE + TARGET_PAGE_SIZE,
+   PAGE_EXEC | PAGE_VALID);
+return true;
+}
+
 #endif /* TARGET_HPPA */
 
 #ifdef TARGET_XTENSA
@@ -2326,12 +2354,12 @@ static abi_ulong create_elf_tables(abi_ulong p, int 
argc, int envc,
 }
 
 #if defined(HI_COMMPAGE)
-#define LO_COMMPAGE 0
+#define LO_COMMPAGE -1
 #elif defined(LO_COMMPAGE)
 #define HI_COMMPAGE 0
 #else
 #define HI_COMMPAGE 0
-#define LO_COMMPAGE 0
+#define LO_COMMPAGE -1
 #define init_guest_commpage() true
 #endif
 
@@ -2555,7 +2583,7 @@ static void pgb_static(const char *image_name, abi_ulong 
orig_loaddr,
 } else {
 offset = -(HI_COMMPAGE & -align);
 }
-} else if (LO_COMMPAGE != 0) {
+} else if (LO_COMMPAGE != -1) {
 loaddr = MIN(loaddr, LO_COMMPAGE & -align);
 }
 
-- 
2.34.1

[PATCH for-7.2 03/21] linux-user/x86_64: Allocate vsyscall page as a commpage

We're about to start validating PAGE_EXEC, which means that we've
got to the vsyscall page executable.  We had been special casing
this entirely within translate.

Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 29d910c4cc..e315155dad 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -195,6 +195,27 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, 
const CPUX86State *en
 (*regs)[26] = tswapreg(env->segs[R_GS].selector & 0x);
 }
 
+#define HI_COMMPAGE  TARGET_VSYSCALL_PAGE
+
+static bool init_guest_commpage(void)
+{
+/*
+ * The vsyscall page is at a high negative address aka kernel space,
+ * which means that we cannot actually allocate it with target_mmap.
+ * We still should be able to use page_set_flags, unless the user
+ * has specified -R reserved_va, which would trigger an assert().
+ */
+if (reserved_va != 0 &&
+TARGET_VSYSCALL_PAGE + TARGET_PAGE_SIZE >= reserved_va) {
+error_report("Cannot allocate vsyscall page");
+exit(EXIT_FAILURE);
+}
+page_set_flags(TARGET_VSYSCALL_PAGE,
+   TARGET_VSYSCALL_PAGE + TARGET_PAGE_SIZE,
+   PAGE_EXEC | PAGE_VALID);
+return true;
+}
+
 #else
 
 #define ELF_START_MMAP 0x8000
-- 
2.34.1

[PATCH for-7.2 08/21] accel/tcg: Merge tb_htable_lookup into caller

This function is used only once, so merge it into
its only caller, tb_lookup.  This requires moving
the support routine, tb_lookup_cmp, and its private
data structure, tb_desc, up in the file.

Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h |   3 -
 accel/tcg/cpu-exec.c| 134 +++-
 2 files changed, 64 insertions(+), 73 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 311e5fb422..e7e30d55b8 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -552,9 +552,6 @@ void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr, 
MemTxAttrs attrs);
 #endif
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
-TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
-   target_ulong cs_base, uint32_t flags,
-   uint32_t cflags);
 void tb_set_jmp_target(TranslationBlock *tb, int n, uintptr_t addr);
 
 /* GETPC is the true target of the return instruction that we'll execute.  */
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index a565a3f8ec..f6c0c0aff6 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -170,19 +170,60 @@ uint32_t curr_cflags(CPUState *cpu)
 return cflags;
 }
 
-/* Might cause an exception, so have a longjmp destination ready */
-static inline TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc,
-  target_ulong cs_base,
-  uint32_t flags, uint32_t cflags)
+struct tb_desc {
+target_ulong pc;
+target_ulong cs_base;
+CPUArchState *env;
+tb_page_addr_t phys_page1;
+uint32_t flags;
+uint32_t cflags;
+uint32_t trace_vcpu_dstate;
+};
+
+static bool tb_lookup_cmp(const void *p, const void *d)
 {
+const TranslationBlock *tb = p;
+const struct tb_desc *desc = d;
+
+if (tb->pc == desc->pc &&
+tb->page_addr[0] == desc->phys_page1 &&
+tb->cs_base == desc->cs_base &&
+tb->flags == desc->flags &&
+tb->trace_vcpu_dstate == desc->trace_vcpu_dstate &&
+tb_cflags(tb) == desc->cflags) {
+/* check next page if needed */
+if (tb->page_addr[1] == -1) {
+return true;
+} else {
+tb_page_addr_t phys_page2;
+target_ulong virt_page2;
+
+virt_page2 = (desc->pc & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+phys_page2 = get_page_addr_code(desc->env, virt_page2);
+if (tb->page_addr[1] == phys_page2) {
+return true;
+}
+}
+}
+return false;
+}
+
+/* Might cause an exception, so have a longjmp destination ready */
+static TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc,
+   target_ulong cs_base,
+   uint32_t flags, uint32_t cflags)
+{
+CPUArchState *env = cpu->env_ptr;
 TranslationBlock *tb;
-uint32_t hash;
+tb_page_addr_t phys_pc;
+struct tb_desc desc;
+uint32_t jmp_hash, tb_hash;
 
 /* we should never be trying to look up an INVALID tb */
 tcg_debug_assert(!(cflags & CF_INVALID));
 
-hash = tb_jmp_cache_hash_func(pc);
-tb = qatomic_rcu_read(&cpu->tb_jmp_cache[hash]);
+jmp_hash = tb_jmp_cache_hash_func(pc);
+tb = qatomic_rcu_read(&cpu->tb_jmp_cache[jmp_hash]);
 
 if (likely(tb &&
tb->pc == pc &&
@@ -192,11 +233,25 @@ static inline TranslationBlock *tb_lookup(CPUState *cpu, 
target_ulong pc,
tb_cflags(tb) == cflags)) {
 return tb;
 }
-tb = tb_htable_lookup(cpu, pc, cs_base, flags, cflags);
+
+desc.env = env;
+desc.cs_base = cs_base;
+desc.flags = flags;
+desc.cflags = cflags;
+desc.trace_vcpu_dstate = *cpu->trace_dstate;
+desc.pc = pc;
+phys_pc = get_page_addr_code(desc.env, pc);
+if (phys_pc == -1) {
+return NULL;
+}
+desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
+tb_hash = tb_hash_func(phys_pc, pc, flags, cflags, *cpu->trace_dstate);
+tb = qht_lookup_custom(&tb_ctx.htable, &desc, tb_hash, tb_lookup_cmp);
 if (tb == NULL) {
 return NULL;
 }
-qatomic_set(&cpu->tb_jmp_cache[hash], tb);
+
+qatomic_set(&cpu->tb_jmp_cache[jmp_hash], tb);
 return tb;
 }
 
@@ -487,67 +542,6 @@ void cpu_exec_step_atomic(CPUState *cpu)
 end_exclusive();
 }
 
-struct tb_desc {
-target_ulong pc;
-target_ulong cs_base;
-CPUArchState *env;
-tb_page_addr_t phys_page1;
-uint32_t flags;
-uint32_t cflags;
-uint32_t trace_vcpu_dstate;
-};
-
-static bool tb_lookup_cmp(const void *p, const void *d)
-{
-const TranslationBlock *tb = p;
-const struct tb_desc *desc = d;
-
-if (tb->pc == desc->pc &&
-tb->page_addr[0] == desc->phys_page1 &&
-tb->cs_base == desc->cs_base &&
-tb->flags == desc->flags &&
-

[PATCH for-7.2 07/21] accel/tcg: Use bool for page_find_alloc

Bool is more appropriate type for the alloc parameter.

Signed-off-by: Richard Henderson 
---
 accel/tcg/translate-all.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index cf99b2b876..65a23f47d6 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -464,7 +464,7 @@ void page_init(void)
 #endif
 }
 
-static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
+static PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc)
 {
 PageDesc *pd;
 void **lp;
@@ -532,11 +532,11 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, 
int alloc)
 
 static inline PageDesc *page_find(tb_page_addr_t index)
 {
-return page_find_alloc(index, 0);
+return page_find_alloc(index, false);
 }
 
 static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
-   PageDesc **ret_p2, tb_page_addr_t phys2, int alloc);
+   PageDesc **ret_p2, tb_page_addr_t phys2, bool 
alloc);
 
 /* In user-mode page locks aren't used; mmap_lock is enough */
 #ifdef CONFIG_USER_ONLY
@@ -650,7 +650,7 @@ static inline void page_unlock(PageDesc *pd)
 /* lock the page(s) of a TB in the correct acquisition order */
 static inline void page_lock_tb(const TranslationBlock *tb)
 {
-page_lock_pair(NULL, tb->page_addr[0], NULL, tb->page_addr[1], 0);
+page_lock_pair(NULL, tb->page_addr[0], NULL, tb->page_addr[1], false);
 }
 
 static inline void page_unlock_tb(const TranslationBlock *tb)
@@ -839,7 +839,7 @@ void page_collection_unlock(struct page_collection *set)
 #endif /* !CONFIG_USER_ONLY */
 
 static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
-   PageDesc **ret_p2, tb_page_addr_t phys2, int alloc)
+   PageDesc **ret_p2, tb_page_addr_t phys2, bool alloc)
 {
 PageDesc *p1, *p2;
 tb_page_addr_t page1;
@@ -1289,7 +1289,7 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
  * Note that inserting into the hash table first isn't an option, since
  * we can only insert TBs that are fully initialized.
  */
-page_lock_pair(&p, phys_pc, &p2, phys_page2, 1);
+page_lock_pair(&p, phys_pc, &p2, phys_page2, true);
 tb_page_add(p, tb, 0, phys_pc & TARGET_PAGE_MASK);
 if (p2) {
 tb_page_add(p2, tb, 1, phys_page2);
@@ -2224,7 +2224,7 @@ void page_set_flags(target_ulong start, target_ulong end, 
int flags)
 for (addr = start, len = end - start;
  len != 0;
  len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
-PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, 1);
+PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, true);
 
 /* If the write protection bit is set, then we invalidate
the code inside.  */
-- 
2.34.1

[PATCH for-7.2 06/21] accel/tcg: Remove PageDesc code_bitmap