Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support

2022-12-08 Thread Klaus Jensen
On Dec  8 08:16, Klaus Jensen wrote:
> On Dec  7 09:49, Guenter Roeck wrote:
> > Hi,
> > 
> > On Thu, Jun 16, 2022 at 08:34:07PM +0800, Jinhao Fan wrote:
> > > Implement Doorbel Buffer Config command (Section 5.7 in NVMe Spec 1.3)
> > > and Shadow Doorbel buffer & EventIdx buffer handling logic (Section 7.13
> > > in NVMe Spec 1.3). For queues created before the Doorbell Buffer Config
> > > command, the nvme_dbbuf_config function tries to associate each existing
> > > SQ and CQ with its Shadow Doorbel buffer and EventIdx buffer address.
> > > Queues created after the Doorbell Buffer Config command will have the
> > > doorbell buffers associated with them when they are initialized.
> > > 
> > > In nvme_process_sq and nvme_post_cqe, proactively check for Shadow
> > > Doorbell buffer changes instead of wait for doorbell register changes.
> > > This reduces the number of MMIOs.
> > > 
> > > In nvme_process_db(), update the shadow doorbell buffer value with
> > > the doorbell register value if it is the admin queue. This is a hack
> > > since hosts like Linux NVMe driver and SPDK do not use shadow
> > > doorbell buffer for the admin queue. Copying the doorbell register
> > > value to the shadow doorbell buffer allows us to support these hosts
> > > as well as spec-compliant hosts that use shadow doorbell buffer for
> > > the admin queue.
> > > 
> > > Signed-off-by: Jinhao Fan 
> > 
> > I noticed that I can no longer boot Linux kernels from nvme on riscv64
> > systems. The problem is seen with qemu v7.1 and qemu v7.2-rc4.
> > The log shows:
> > 
> > [   35.904128] nvme nvme0: I/O 642 (I/O Cmd) QID 1 timeout, aborting
> > [   35.905000] EXT4-fs (nvme0n1): mounting ext2 file system using the ext4 
> > subsystem
> > [   66.623863] nvme nvme0: I/O 643 (I/O Cmd) QID 1 timeout, aborting
> > [   97.343989] nvme nvme0: Abort status: 0x0
> > [   97.344355] nvme nvme0: Abort status: 0x0
> > [   97.344647] nvme nvme0: I/O 7 QID 0 timeout, reset controller
> > [   97.350568] nvme nvme0: I/O 644 (I/O Cmd) QID 1 timeout, aborting
> > 
> > This is with the mainline Linux kernel (v6.1-rc8).
> > 
> > Bisect points to this patch. Reverting this patch and a number of associated
> > patches (to fix conflicts) fixes the problem.
> > 
> > 06143d8771 Revert "hw/nvme: Implement shadow doorbell buffer support"
> > acb4443e3a Revert "hw/nvme: Use ioeventfd to handle doorbell updates"
> > d5fd309feb Revert "hw/nvme: do not enable ioeventfd by default"
> > 1ca1e6c47c Revert "hw/nvme: unregister the event notifier handler on the 
> > main loop"
> > 2d26abd51e Revert "hw/nvme: skip queue processing if notifier is cleared"
> > 99d411b5a5 Revert "hw/nvme: reenable cqe batching"
> > 2293d3ca6c Revert "hw/nvme: Add trace events for shadow doorbell buffer"
> > 
> > Qemu command line:
> > 
> > qemu-system-riscv64 -M virt -m 512M \
> >  -kernel arch/riscv/boot/Image -snapshot \
> >  -device nvme,serial=foo,drive=d0 \
> >  -drive file=rootfs.ext2,if=none,format=raw,id=d0 \
> >  -bios default \
> >  -append "root=/dev/nvme0n1 console=ttyS0,115200 
> > earlycon=uart8250,mmio,0x1000,115200" \
> >  -nographic -monitor none
> > 
> > Guenter
> 
> Hi Guenter,
> 
> Thanks for the bisect.
> 
> The shadow doorbell is also an obvious candidate for this regression. I
> wonder if this could be a kernel bug, since we are not observing this on
> other architectures. The memory barriers required are super finicky, but
> in QEMU all the operations are associated with full memory barriers. The
> barriers are more fine grained in the kernel though.
> 
> I will dig into this together with Keith.

A cq head doorbell mmio is skipped... And it is not the fault of the
kernel. The kernel is in it's good right to skip the mmio since the cq
eventidx is not properly updated.

Adding that and it boots properly on riscv. But I'm perplexed as to why
this didnt show up on our regularly tested platforms.

Gonna try to get this in for 7.2!


signature.asc
Description: PGP signature


Re: [PATCH v11 1/2] vhost-vdpa: Skip the range check while MR is IOMMU

2022-12-08 Thread Cindy Lu
On Thu, 8 Dec 2022 at 15:42, Jason Wang  wrote:
>
> On Wed, Nov 30, 2022 at 1:33 PM Cindy Lu  wrote:
> >
> > Skip the check in vhost_vdpa_listener_skipped_section() while
> > MR is IOMMU, Move this check to  vhost_vdpa_iommu_map_notify()
> >
> > Signed-off-by: Cindy Lu 
> > ---
> >  hw/virtio/vhost-vdpa.c | 21 ++---
> >  1 file changed, 14 insertions(+), 7 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 3ff9ce3501..f0e9963d19 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -60,15 +60,22 @@ static bool 
> > vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
> >   iova_min, section->offset_within_address_space);
> >  return true;
> >  }
> > +/*
> > + * While using vIOMMU, Sometimes the section will be larger than 
> > iova_max
> > + * but the memory that  actually mapping is smaller, So skip the check
> > + * here. Will add the check in vhost_vdpa_iommu_map_notify,
> > + *There is the real size that maps to the kernel
> > + */
> >
>
> I suggest squashing this into the next patch since we haven't
> implemented vhost_vdpa_iommu_map_notify() yet.
>
> Thanks
>
Sure, will do
Thanks
Cindy
> > -llend = vhost_vdpa_section_end(section);
> > -if (int128_gt(llend, int128_make64(iova_max))) {
> > -error_report("RAM section out of device range (max=0x%" PRIx64
> > - ", end addr=0x%" PRIx64 ")",
> > - iova_max, int128_get64(llend));
> > -return true;
> > +if (!memory_region_is_iommu(section->mr)) {
> > +llend = vhost_vdpa_section_end(section);
> > +if (int128_gt(llend, int128_make64(iova_max))) {
> > +error_report("RAM section out of device range (max=0x%" PRIx64
> > + ", end addr=0x%" PRIx64 ")",
> > + iova_max, int128_get64(llend));
> > +return true;
> > +}
> >  }
> > -
> >  return false;
> >  }
> >
> > --
> > 2.34.3
> >
>




RE: [PATCH] target/i386/hax: Add XCR0 support

2022-12-08 Thread Wang, Wenchao
Hi, Paolo,

As HAXM v7.8.0 is released and it added XCR0 support, could you help to merge 
this patch to add corresponding support into HAX user space of QEMU? The patch 
has been attached below. Thanks.


Best Regards,
Wenchao

-

From b1789f2523d06798b8883664bfa9a9df797bfccf Mon Sep 17 00:00:00 2001
From: Wenchao Wang 
Date: Fri, 25 Nov 2022 18:37:34 +0800
Subject: [PATCH] target/i386/hax: Add XCR0 support

Introduce extended control register XCR0 to support XSAVE feature set.

Note: This change requires at least HAXM v7.8.0 to support.

Reviewed-by: Hang Yuan 
Signed-off-by: Wenchao Wang 
---
 target/i386/hax/hax-interface.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/i386/hax/hax-interface.h b/target/i386/hax/hax-interface.h
index 537ae084e9..1d13bb2380 100644
--- a/target/i386/hax/hax-interface.h
+++ b/target/i386/hax/hax-interface.h
@@ -201,6 +201,8 @@ struct vcpu_state_t {
 uint64_t _cr3;
 uint64_t _cr4;
 
+uint64_t _xcr0;
+
 uint64_t _dr0;
 uint64_t _dr1;
 uint64_t _dr2;
-- 
2.17.1


-Original Message-
From: Wang, Wenchao 
Sent: Monday, December 5, 2022 17:10
To: Philippe Mathieu-Daudé ; qemu-devel@nongnu.org
Cc: haxm-team ; Paolo Bonzini 
Subject: RE: [PATCH] target/i386/hax: Add XCR0 support

Thanks for Phillippe's reply.

Hi, Paolo,

Could you help to review the patch of HAX? If there is any concern about it, 
feel free to discuss with me. Thanks a lot.


Best Regards,
Wenchao

-Original Message-
From: Philippe Mathieu-Daudé 
Sent: Monday, December 5, 2022 17:05
To: Wang, Wenchao ; qemu-devel@nongnu.org
Cc: haxm-team ; Paolo Bonzini 
Subject: Re: [PATCH] target/i386/hax: Add XCR0 support

Hi Wenchao,

On 5/12/22 09:35, Wang, Wenchao wrote:
> Hi, Philippe,
> 
> Do you agree with my opinion and is there any further process that I need to 
> follow to get this patch merged? Thanks a lot.

I don't understand this part of HAXM enough, but per your explanation, your 
change looks correct. I'll let Paolo decide :)

Regards,

Phil.

> Best Regards,
> Wenchao
> 
> -Original Message-
> From: Wang, Wenchao
> Sent: Monday, November 28, 2022 16:11
> To: Philippe Mathieu-Daudé ; qemu-devel@nongnu.org
> Cc: haxm-team ; Paolo Bonzini 
> 
> Subject: RE: [PATCH] target/i386/hax: Add XCR0 support
> 
> Hi, Philippe,
> 
> It is just the full patch. Currently, the implementation of HAXM is simple, 
> we did not synchronize the vCPU register for xcr0 from QEMU. HAXM will handle 
> the xcr0 state within the kernel space, including initialization, update, 
> etc. This patch adds the xcr0 variable for allocating extra 8-byte buffer 
> occupation, which will be passed between QEMU and HAXM when 
> hax_sync_vcpu_state() is invoked. We have verified the patched QEMU and it 
> can launch all guest OSes. Thanks for your comments.
> 
> 
> Best Regards,
> Wenchao
> 
> -Original Message-
> From: Philippe Mathieu-Daudé 
> Sent: Friday, November 25, 2022 21:37
> To: Wang, Wenchao ; qemu-devel@nongnu.org
> Cc: haxm-team ; Paolo Bonzini 
> 
> Subject: Re: [PATCH] target/i386/hax: Add XCR0 support
> 
> Hi,
> 
> On 25/11/22 13:18, Wang, Wenchao wrote:
>> Hi, maintainers,
>>
>> As HAXM v7.8.0 is released and it added XCR0 support, could you help 
>> to merge this patch to add corresponding support into HAX user space 
>> of QEMU? The patch has been included in the attachment. Thanks.
> 
> See
> https://www.qemu.org/docs/master/devel/submitting-a-patch.html#submitt
> ing-your-patches on how to send patches to a mailing list.
> 
>>
>> Best Regards,
>>
>> Wenchao
>>
>>   From b1789f2523d06798b8883664bfa9a9df797bfccf Mon Sep 17 00:00:00
>> 2001
>>
>> From: Wenchao Wang 
>>
>> Date: Fri, 25 Nov 2022 18:37:34 +0800
>>
>> Subject: [PATCH] target/i386/hax: Add XCR0 support
>>
>> Introduce extended control register XCR0 to support XSAVE feature set.
>>
>> Note: This change requires at least HAXM v7.8.0 to support.
>>
>> Reviewed-by: Hang Yuan 
>>
>> Signed-off-by: Wenchao Wang 
>>
>> ---
>>
>> target/i386/hax/hax-interface.h | 2 ++
>>
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/target/i386/hax/hax-interface.h 
>> b/target/i386/hax/hax-interface.h
>>
>> index 537ae084e9..1d13bb2380 100644
>>
>> --- a/target/i386/hax/hax-interface.h
>>
>> +++ b/target/i386/hax/hax-interface.h
>>
>> @@ -201,6 +201,8 @@ struct vcpu_state_t {
>>
>>    uint64_t _cr3;
>>
>>    uint64_t _cr4;
>>
>> +    uint64_t _xcr0;
>>
>> +
>>
>>    uint64_t _dr0;
>>
>>    uint64_t _dr1;
>>
>>    uint64_t _dr2;
>>
>> --
>>
>> 2.17.1
>>
> 
> Is that the full patch? It is missing the register use in 
> hax_sync_vcpu_register()...
> 
> Regards,
> 
> Phil.



[PATCH 1/1] hw/nvme: fix missing cq eventidx update

2022-12-08 Thread Klaus Jensen
From: Klaus Jensen 

Prior to reading the shadow doorbell cq head, we have to update the
eventidx. Otherwise, we risk that the driver will skip an mmio doorbell
write. This happens on riscv64, as reported by Guenter.

Adding the missing update to the cq eventidx fixes the issue.

Fixes: 3f7fe8de3d49 ("hw/nvme: Implement shadow doorbell buffer support")
Reported-by: Guenter Roeck 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index e54276dc1dc7..1192919b4869 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1331,6 +1331,13 @@ static inline void nvme_blk_write(BlockBackend *blk, 
int64_t offset,
 }
 }
 
+static void nvme_update_cq_eventidx(const NvmeCQueue *cq)
+{
+pci_dma_write(&cq->ctrl->parent_obj, cq->ei_addr, &cq->head,
+  sizeof(cq->head));
+trace_pci_nvme_eventidx_cq(cq->cqid, cq->head);
+}
+
 static void nvme_update_cq_head(NvmeCQueue *cq)
 {
 pci_dma_read(&cq->ctrl->parent_obj, cq->db_addr, &cq->head,
@@ -1351,6 +1358,7 @@ static void nvme_post_cqes(void *opaque)
 hwaddr addr;
 
 if (n->dbbuf_enabled) {
+nvme_update_cq_eventidx(cq);
 nvme_update_cq_head(cq);
 }
 
-- 
2.38.1




[PATCH 0/1] hw/nvme: shadow doorbells broken on riscv64

2022-12-08 Thread Klaus Jensen
From: Klaus Jensen 

Guenter reports[1] that hw/nvme is broken on riscv64.

This is a regression since 7.1, so this does not warrent an rc5 for 7.2.
I'm sure Guenter can carry this patch in his tree, and maybe we can get
this out in a stable release.

I really wonder why this issue only shows up on riscv64. We have not
observed this on other platforms (yet).

  [1]: https://lore.kernel.org/qemu-devel/20221207174918.ga1151...@roeck-us.net/

Klaus Jensen (1):
  hw/nvme: fix missing cq eventidx update

 hw/nvme/ctrl.c | 8 
 1 file changed, 8 insertions(+)

-- 
2.38.1




Re: [PATCH v10 3/9] KVM: Extend the memslot to support fd-based private memory

2022-12-08 Thread Xiaoyao Li

On 12/2/2022 2:13 PM, Chao Peng wrote:

..


Together with the change, a new config HAVE_KVM_RESTRICTED_MEM is added
and right now it is selected on X86_64 only.



From the patch implementation, I have no idea why 
HAVE_KVM_RESTRICTED_MEM is needed.





Re: [PATCH-for-8.0 1/4] cputlb: Restrict SavedIOTLB to system emulation

2022-12-08 Thread Alex Bennée


Philippe Mathieu-Daudé  writes:

> Commit 2f3a57ee47 ("cputlb: ensure we save the IOTLB data in
> case of reset") added the SavedIOTLB structure -- which is
> system emulation specific -- in the generic CPUState structure.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  accel/tcg/cputlb.c| 4 ++--
>  include/hw/core/cpu.h | 6 --
>  2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 6f1c00682b..0ea96fbcdf 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1395,7 +1395,7 @@ static uint64_t io_readx(CPUArchState *env, 
> CPUTLBEntryFull *full,
>  static void save_iotlb_data(CPUState *cs, MemoryRegionSection *section,
>  hwaddr mr_offset)
>  {
> -#ifdef CONFIG_PLUGIN
> +#if defined(CONFIG_PLUGIN) && !defined(CONFIG_USER_ONLY)

cputlb is softmmu only so I don't think we need to check CONFIG_USER_ONLY here.

>  SavedIOTLB *saved = &cs->saved_iotlb;
>  saved->section = section;
>  saved->mr_offset = mr_offset;
> @@ -1699,7 +1699,7 @@ tb_page_addr_t get_page_addr_code_hostp(CPUArchState 
> *env, target_ulong addr,
>  return qemu_ram_addr_from_host_nofail(p);
>  }
>  
> -#ifdef CONFIG_PLUGIN
> +#if defined(CONFIG_PLUGIN) && !defined(CONFIG_USER_ONLY)
>  /*
>   * Perform a TLB lookup and populate the qemu_plugin_hwaddr structure.
>   * This should be a hot path as we will have just looked this path up
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index 8830546121..bc3229ae13 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -222,7 +222,7 @@ struct CPUWatchpoint {
>  QTAILQ_ENTRY(CPUWatchpoint) entry;
>  };
>  
> -#ifdef CONFIG_PLUGIN
> +#if defined(CONFIG_PLUGIN) && !defined(CONFIG_USER_ONLY)
>  /*
>   * For plugins we sometime need to save the resolved iotlb data before
>   * the memory regions get moved around  by io_writex.
> @@ -406,9 +406,11 @@ struct CPUState {
>  
>  #ifdef CONFIG_PLUGIN
>  GArray *plugin_mem_cbs;
> +#if !defined(CONFIG_USER_ONLY)
>  /* saved iotlb data from io_writex */
>  SavedIOTLB saved_iotlb;
> -#endif
> +#endif /* !CONFIG_USER_ONLY */
> +#endif /* CONFIG_PLUGIN */
>  
>  /* TODO Move common fields from CPUArchState here. */
>  int cpu_index;


-- 
Alex Bennée



Re: [PATCH v2 2/5] target/riscv: Update VS timer whenever htimedelta changes

2022-12-08 Thread Anup Patel
On Thu, Dec 8, 2022 at 9:00 AM Alistair Francis  wrote:
>
> On Tue, Nov 8, 2022 at 11:07 PM Anup Patel  wrote:
> >
> > The htimedelta[h] CSR has impact on the VS timer comparison so we
> > should call riscv_timer_write_timecmp() whenever htimedelta changes.
> >
> > Fixes: 3ec0fe18a31f ("target/riscv: Add vstimecmp suppor")
> > Signed-off-by: Anup Patel 
> > Reviewed-by: Alistair Francis 
>
> This patch breaks my Xvisor test. When running OpenSBI and Xvisor like this:
>
> qemu-system-riscv64 -machine virt \
> -m 1G -serial mon:stdio -serial null -nographic \
> -append 'vmm.console=uart@1000 vmm.bootcmd="vfs mount initrd
> /;vfs run /boot.xscript;vfs cat /system/banner.txt; guest kick guest0;
> vserial bind guest0/uart0"' \
> -smp 4 -d guest_errors \
> -bios none \
> -device loader,file=./images/qemuriscv64/vmm.bin,addr=0x8020 \
> -kernel ./images/qemuriscv64/fw_jump.elf \
> -initrd ./images/qemuriscv64/vmm-disk-linux.img -cpu rv64,h=true
>
> Running:
>
> Xvisor v0.3.0-129-gbc33f339 (Jan  1 1970 00:00:00)
>
> I see this failure:
>
> INIT: bootcmd:  guest kick guest0
>
> guest0: Kicked
>
> INIT: bootcmd:  vserial bind guest0/uart0
>
> [guest0/uart0] cpu_vcpu_stage2_map: guest_phys=0x3B9AC000
> size=0x4096 map failed
>
> do_error: CPU3: VCPU=guest0/vcpu0 page fault failed (error -1)
>
>zero=0x  ra=0x80001B4E
>
>  sp=0x8001CF80  gp=0x
>
>  tp=0x  s0=0x8001CFB0
>
>  s1=0x  a0=0x10001048
>
>  a1=0x  a2=0x00989680
>
>  a3=0x3B9ACA00  a4=0x0048
>
>  a5=0x  a6=0x00019000
>
>  a7=0x  s2=0x
>
>  s3=0x  s4=0x
>
>  s5=0x  s6=0x
>
>  s7=0x  s8=0x
>
>  s9=0x s10=0x
>
> s11=0x  t0=0x4000
>
>  t1=0x0100  t2=0x
>
>  t3=0x  t4=0x
>
>  t5=0x  t6=0x
>
>sepc=0x80001918 sstatus=0x00024120
>
> hstatus=0x0002002001C0 sp_exec=0x10A64000
>
>  scause=0x0017   stval=0x3B9ACAF8
>
>   htval=0x0EE6B2BE  htinst=0x00D03021
>
> I have tried updating to a newer Xvisor release, but with that I don't
> get any serial output.
>
> Can you help get the Xvisor tests back up and running?

I tried the latest Xvisor-next (https://github.com/avpatel/xvisor-next)
with your QEMU riscv-to-apply.next branch and it works fine (both
with and without Sstc).

Here's the QEMU command which I use:

qemu-system-riscv64 -M virt -m 512M -nographic \
-bios opensbi/build/platform/generic/firmware/fw_jump.bin \
-kernel ../xvisor-next/build/vmm.bin \
-initrd rbd_v64.img \
-append "vmm.bootcmd=\"vfs mount initrd /;vfs run /boot.xscript;vfs
cat /system/banner.txt\"" \
-smp 4

Also, I will be releasing Xvisor-0.3.2 by the end of Dec 2022 so I
suggest using this upcoming release in your test.

Regards,
Anup



[PATCH v13 2/7] s390x/cpu topology: reporting the CPU topology to the guest

2022-12-08 Thread Pierre Morel
The guest uses the STSI instruction to get information on the
CPU topology.

Let us implement the STSI instruction for the basis CPU topology
level, level 2.

Signed-off-by: Pierre Morel 
---
 target/s390x/cpu.h  |  76 +
 target/s390x/kvm/cpu_topology.c | 186 
 target/s390x/kvm/kvm.c  |   6 +-
 target/s390x/kvm/meson.build|   3 +-
 4 files changed, 269 insertions(+), 2 deletions(-)
 create mode 100644 target/s390x/kvm/cpu_topology.c

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index 7d6d01325b..729ace321a 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -565,6 +565,80 @@ typedef union SysIB {
 } SysIB;
 QEMU_BUILD_BUG_ON(sizeof(SysIB) != 4096);
 
+/*
+ * CPU Topology List provided by STSI with fc=15 provides a list
+ * of two different Topology List Entries (TLE) types to specify
+ * the topology hierarchy.
+ *
+ * - Container Topology List Entry
+ *   Defines a container to contain other Topology List Entries
+ *   of any type, nested containers or CPU.
+ * - CPU Topology List Entry
+ *   Specifies the CPUs position, type, entitlement and polarization
+ *   of the CPUs contained in the last Container TLE.
+ *
+ * There can be theoretically up to five levels of containers, QEMU
+ * uses only one level, the socket level.
+ *
+ * A container of with a nesting level (NL) greater than 1 can only
+ * contain another container of nesting level NL-1.
+ *
+ * A container of nesting level 1 (socket), contains as many CPU TLE
+ * as needed to describe the position and qualities of all CPUs inside
+ * the container.
+ * The qualities of a CPU are polarization, entitlement and type.
+ *
+ * The CPU TLE defines the position of the CPUs of identical qualities
+ * using a 64bits mask which first bit has its offset defined by
+ * the CPU address orgin field of the CPU TLE like in:
+ * CPU address = origin * 64 + bit position within the mask
+ *
+ */
+/* Container type Topology List Entry */
+typedef struct SysIBTl_container {
+uint8_t nl;
+uint8_t reserved[6];
+uint8_t id;
+} QEMU_PACKED QEMU_ALIGNED(8) SysIBTl_container;
+QEMU_BUILD_BUG_ON(sizeof(SysIBTl_container) != 8);
+
+/* CPU type Topology List Entry */
+typedef struct SysIBTl_cpu {
+uint8_t nl;
+uint8_t reserved0[3];
+uint8_t reserved1:5;
+uint8_t dedicated:1;
+uint8_t polarity:2;
+uint8_t type;
+uint16_t origin;
+uint64_t mask;
+} QEMU_PACKED QEMU_ALIGNED(8) SysIBTl_cpu;
+QEMU_BUILD_BUG_ON(sizeof(SysIBTl_cpu) != 16);
+
+#define S390_TOPOLOGY_MAG  6
+#define S390_TOPOLOGY_MAG6 0
+#define S390_TOPOLOGY_MAG5 1
+#define S390_TOPOLOGY_MAG4 2
+#define S390_TOPOLOGY_MAG3 3
+#define S390_TOPOLOGY_MAG2 4
+#define S390_TOPOLOGY_MAG1 5
+/* Configuration topology */
+typedef struct SysIB_151x {
+uint8_t  reserved0[2];
+uint16_t length;
+uint8_t  mag[S390_TOPOLOGY_MAG];
+uint8_t  reserved1;
+uint8_t  mnest;
+uint32_t reserved2;
+char tle[];
+} QEMU_PACKED QEMU_ALIGNED(8) SysIB_151x;
+QEMU_BUILD_BUG_ON(sizeof(SysIB_151x) != 16);
+
+/* Max size of a SYSIB structure is when all CPU are alone in a container */
+#define S390_TOPOLOGY_SYSIB_SIZE (sizeof(SysIB_151x) + 
\
+  S390_MAX_CPUS * (sizeof(SysIBTl_container) + 
\
+   sizeof(SysIBTl_cpu)))
+
 /* MMU defines */
 #define ASCE_ORIGIN   (~0xfffULL) /* segment table origin 
*/
 #define ASCE_SUBSPACE 0x200   /* subspace group control   
*/
@@ -843,4 +917,6 @@ S390CPU *s390_cpu_addr2state(uint16_t cpu_addr);
 
 #include "exec/cpu-all.h"
 
+void insert_stsi_15_1_x(S390CPU *cpu, int sel2, __u64 addr, uint8_t ar);
+
 #endif
diff --git a/target/s390x/kvm/cpu_topology.c b/target/s390x/kvm/cpu_topology.c
new file mode 100644
index 00..1d6fd4394b
--- /dev/null
+++ b/target/s390x/kvm/cpu_topology.c
@@ -0,0 +1,186 @@
+/*
+ * QEMU S390x CPU Topology
+ *
+ * Copyright IBM Corp. 2022
+ * Author(s): Pierre Morel 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "hw/s390x/pv.h"
+#include "hw/sysbus.h"
+#include "hw/s390x/cpu-topology.h"
+#include "hw/s390x/sclp.h"
+
+/*
+ * s390_topology_add_cpu:
+ * @topo: pointer to the topology
+ * @cpu : pointer to the new CPU
+ *
+ * The topology pointed by S390CPU, gives us the CPU topology
+ * established by the -smp QEMU arguments.
+ * The core-id is used to calculate the position of the CPU inside
+ * the topology:
+ *  - the socket, container TLE, containing the CPU, we have one socket
+ *for every num_cores cores.
+ *  - the CPU TLE inside the socket, we have potentionly up to 4 CPU TLE
+ *in a container TLE with the assumption that all CPU are identical
+ *with the s

[PATCH v13 4/7] s390x/cpu_topology: CPU topology migration

2022-12-08 Thread Pierre Morel
The migration can only take place if both source and destination
of the migration both use or both do not use the CPU topology
facility.

We indicate a change in topology during migration postload for the
case the topology changed between source and destination.

Signed-off-by: Pierre Morel 
---
 target/s390x/cpu.h|  1 +
 hw/s390x/cpu-topology.c   | 49 +++
 target/s390x/cpu-sysemu.c |  8 +++
 3 files changed, 58 insertions(+)

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index bc1a7de932..284c708a6c 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -854,6 +854,7 @@ void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data 
arg);
 int s390_assign_subch_ioeventfd(EventNotifier *notifier, uint32_t sch_id,
 int vq, bool assign);
 void s390_cpu_topology_reset(void);
+int s390_cpu_topology_mtcr_set(void);
 #ifndef CONFIG_USER_ONLY
 unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
 #else
diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index f54afcf550..8a2fe041d4 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -18,6 +18,7 @@
 #include "target/s390x/cpu.h"
 #include "hw/s390x/s390-virtio-ccw.h"
 #include "hw/s390x/cpu-topology.h"
+#include "migration/vmstate.h"
 
 /**
  * s390_has_topology
@@ -129,6 +130,53 @@ static void s390_topology_reset(DeviceState *dev)
 s390_cpu_topology_reset();
 }
 
+/**
+ * cpu_topology_postload
+ * @opaque: a pointer to the S390Topology
+ * @version_id: version identifier
+ *
+ * We check that the topology is used or is not used
+ * on both side identically.
+ *
+ * If the topology is in use we set the Modified Topology Change Report
+ * on the destination host.
+ */
+static int cpu_topology_postload(void *opaque, int version_id)
+{
+int ret;
+
+/* We do not support CPU Topology, all is good */
+if (!s390_has_topology()) {
+return 0;
+}
+
+/* We support CPU Topology, set the MTCR unconditionally */
+ret = s390_cpu_topology_mtcr_set();
+if (ret) {
+error_report("Failed to set MTCR: %s", strerror(-ret));
+}
+return ret;
+}
+
+/**
+ * cpu_topology_needed:
+ * @opaque: The pointer to the S390Topology
+ *
+ * We always need to know if source and destination use the topology.
+ */
+static bool cpu_topology_needed(void *opaque)
+{
+return s390_has_topology();
+}
+
+const VMStateDescription vmstate_cpu_topology = {
+.name = "cpu_topology",
+.version_id = 1,
+.post_load = cpu_topology_postload,
+.minimum_version_id = 1,
+.needed = cpu_topology_needed,
+};
+
 /**
  * topology_class_init:
  * @oc: Object class
@@ -145,6 +193,7 @@ static void topology_class_init(ObjectClass *oc, void *data)
 device_class_set_props(dc, s390_topology_properties);
 set_bit(DEVICE_CATEGORY_MISC, dc->categories);
 dc->reset = s390_topology_reset;
+dc->vmsd = &vmstate_cpu_topology;
 }
 
 static const TypeInfo cpu_topology_info = {
diff --git a/target/s390x/cpu-sysemu.c b/target/s390x/cpu-sysemu.c
index e27864c5f5..a8e3e6219d 100644
--- a/target/s390x/cpu-sysemu.c
+++ b/target/s390x/cpu-sysemu.c
@@ -319,3 +319,11 @@ void s390_cpu_topology_reset(void)
 }
 }
 }
+
+int s390_cpu_topology_mtcr_set(void)
+{
+if (kvm_enabled()) {
+return kvm_s390_topology_set_mtcr(1);
+}
+return -ENOENT;
+}
-- 
2.31.1




[PATCH v13 0/7] s390x: CPU Topology

2022-12-08 Thread Pierre Morel
Hi,

Implementation discussions
==

CPU models
--

Since the S390_FEAT_CONFIGURATION_TOPOLOGY is already in the CPU model
for old QEMU we could not activate it as usual from KVM but needed
a KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.
Checking and enabling this capability enables
S390_FEAT_CONFIGURATION_TOPOLOGY.

Migration
-

Once the S390_FEAT_CONFIGURATION_TOPOLOGY is enabled in the source
host the STFL(11) is provided to the guest.
Since the feature is already in the CPU model of older QEMU,
a migration from a new QEMU enabling the topology to an old QEMU
will keep STFL(11) enabled making the guest get an exception for
illegal operation as soon as it uses the PTF instruction.

A VMState keeping track of the S390_FEAT_CONFIGURATION_TOPOLOGY
allows to forbid the migration in such a case.

Note that the VMState will be used to hold information on the
topology once we implement topology change for a running guest. 

Topology


Until we introduce bookss and drawers, polarization and dedication
the topology is kept very simple and is specified uniquely by
the core_id of the vCPU which is also the vCPU address.

Testing
===

To use the QEMU patches, you will need Linux V6-rc1 or newer,
or use the following Linux mainline patches:

f5ecfee94493 2022-07-20 KVM: s390: resetting the Topology-Change-Report
24fe0195bc19 2022-07-20 KVM: s390: guest support for topology function 
0130337ec45b 2022-07-20 KVM: s390: Cleanup ipte lock access and SIIF fac.. 

Currently this code is for KVM only, I have no idea if it is interesting
to provide a TCG patch. If ever it will be done in another series.

Documentation
=

To have a better understanding of the S390x CPU Topology and its
implementation in QEMU you can have a look at the documentation in the
last patch of this series.

The admin will want to match the host and the guest topology, taking
into account that the guest does not recognize multithreading.
Consequently, two vCPU assigned to threads of the same real CPU should
preferably be assigned to the same socket of the guest machine.

Future developments
===

Two series are actively prepared:
- Adding drawers, book, polarization and dedication to the vCPU.
- changing the topology with a running guest

Regards,
Pierre

Pierre Morel (7):
  s390x/cpu topology: Creating CPU topology device
  s390x/cpu topology: reporting the CPU topology to the guest
  s390x/cpu_topology: resetting the Topology-Change-Report
  s390x/cpu_topology: CPU topology migration
  s390x/cpu_topology: interception of PTF instruction
  s390x/cpu_topology: activating CPU topology
  docs/s390x: document s390x cpu topology

 docs/system/s390x/cpu-topology.rst |  87 ++
 docs/system/target-s390x.rst   |   1 +
 include/hw/s390x/cpu-topology.h|  52 ++
 include/hw/s390x/s390-virtio-ccw.h |   6 +
 target/s390x/cpu.h |  78 +
 target/s390x/kvm/kvm_s390x.h   |   1 +
 hw/s390x/cpu-topology.c| 261 +
 hw/s390x/s390-virtio-ccw.c |   7 +
 target/s390x/cpu-sysemu.c  |  21 +++
 target/s390x/cpu_models.c  |   1 +
 target/s390x/kvm/cpu_topology.c| 186 
 target/s390x/kvm/kvm.c |  46 -
 hw/s390x/meson.build   |   1 +
 target/s390x/kvm/meson.build   |   3 +-
 14 files changed, 749 insertions(+), 2 deletions(-)
 create mode 100644 docs/system/s390x/cpu-topology.rst
 create mode 100644 include/hw/s390x/cpu-topology.h
 create mode 100644 hw/s390x/cpu-topology.c
 create mode 100644 target/s390x/kvm/cpu_topology.c

-- 
2.31.1

- since v12

- suppress new CPU flag "disable-topology" just use ctop

- no use of special fields in CCW machine or in CPU

- modifications in documentation

- insert documentation in tree
  (Cedric)

- moved cpu-topology.c from target/s390 to target/s390/kvm
  to compile smoothly (without topology) for TCG
  (Cedric)

- since v11

- new CPU flag "disable-topology"
  I would have take "topology" if I was able to have
  it false on default.
  (Christian, Thomas)

- Build the topology during the interception of the
  STSI instruction.
  (Cedric)

- return CC3 in case the calculated SYSIB length is
  greater than 4096.
  (Janis)

- minor corections on documentation

- since v10

- change machine attribute "topology-disable" to "topology"
  (Cedric)
- Add preliminary patch for machine properties
  (Cedric)
- Use next machine as 7.2
  (Cedric / Connie)
- Remove unecessary mutex
  (Thomas)
- use ENOTSUP return value for kvm_s390_topology_set_mtcr()
  (Cedric)
- Add explanation on container and cpu TLEs
  (Thomas)
- use again cpu and socket count in topology structure
  (Cedric)
- Suppress the S390TopoTLE structure and integrate
  the TLE masks to the socket structure.
  (-)
- the STSI instruction now finds the topology from the machine
  (Cedric)

- since v9

- remove books and drawers

- remove thread denying a

[PATCH v13 3/7] s390x/cpu_topology: resetting the Topology-Change-Report

2022-12-08 Thread Pierre Morel
During a subsystem reset the Topology-Change-Report is cleared
by the machine.
Let's ask KVM to clear the Modified Topology Change Report (MTCR)
bit of the SCA in the case of a subsystem reset.

Signed-off-by: Pierre Morel 
Reviewed-by: Nico Boehr 
Reviewed-by: Janis Schoetterl-Glausch 
---
 target/s390x/cpu.h   |  1 +
 target/s390x/kvm/kvm_s390x.h |  1 +
 hw/s390x/cpu-topology.c  | 12 
 hw/s390x/s390-virtio-ccw.c   |  1 +
 target/s390x/cpu-sysemu.c| 13 +
 target/s390x/kvm/kvm.c   | 17 +
 6 files changed, 45 insertions(+)

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index 729ace321a..bc1a7de932 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -853,6 +853,7 @@ void s390_enable_css_support(S390CPU *cpu);
 void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data arg);
 int s390_assign_subch_ioeventfd(EventNotifier *notifier, uint32_t sch_id,
 int vq, bool assign);
+void s390_cpu_topology_reset(void);
 #ifndef CONFIG_USER_ONLY
 unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
 #else
diff --git a/target/s390x/kvm/kvm_s390x.h b/target/s390x/kvm/kvm_s390x.h
index f9785564d0..649dae5948 100644
--- a/target/s390x/kvm/kvm_s390x.h
+++ b/target/s390x/kvm/kvm_s390x.h
@@ -47,5 +47,6 @@ void kvm_s390_crypto_reset(void);
 void kvm_s390_restart_interrupt(S390CPU *cpu);
 void kvm_s390_stop_interrupt(S390CPU *cpu);
 void kvm_s390_set_diag318(CPUState *cs, uint64_t diag318_info);
+int kvm_s390_topology_set_mtcr(uint64_t attr);
 
 #endif /* KVM_S390X_H */
diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index b3e59873f6..f54afcf550 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -118,6 +118,17 @@ static Property s390_topology_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+/**
+ * s390_topology_reset:
+ * @dev: the device
+ *
+ * Calls the sysemu topology reset
+ */
+static void s390_topology_reset(DeviceState *dev)
+{
+s390_cpu_topology_reset();
+}
+
 /**
  * topology_class_init:
  * @oc: Object class
@@ -133,6 +144,7 @@ static void topology_class_init(ObjectClass *oc, void *data)
 dc->unrealize = s390_topology_unrealize;
 device_class_set_props(dc, s390_topology_properties);
 set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+dc->reset = s390_topology_reset;
 }
 
 static const TypeInfo cpu_topology_info = {
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 8971ffb871..f03a5aaeea 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -109,6 +109,7 @@ static const char *const reset_dev_types[] = {
 "s390-flic",
 "diag288",
 TYPE_S390_PCI_HOST_BRIDGE,
+TYPE_S390_CPU_TOPOLOGY,
 };
 
 static void subsystem_reset(void)
diff --git a/target/s390x/cpu-sysemu.c b/target/s390x/cpu-sysemu.c
index 948e4bd3e0..e27864c5f5 100644
--- a/target/s390x/cpu-sysemu.c
+++ b/target/s390x/cpu-sysemu.c
@@ -306,3 +306,16 @@ void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data 
arg)
 kvm_s390_set_diag318(cs, arg.host_ulong);
 }
 }
+
+void s390_cpu_topology_reset(void)
+{
+int ret;
+
+if (kvm_enabled()) {
+ret = kvm_s390_topology_set_mtcr(0);
+if (ret) {
+error_report("Failed to set Modified Topology Change Report: %s",
+ strerror(-ret));
+}
+}
+}
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 7dc96f3663..5b6383eab0 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2593,6 +2593,23 @@ int kvm_s390_get_zpci_op(void)
 return cap_zpci_op;
 }
 
+int kvm_s390_topology_set_mtcr(uint64_t attr)
+{
+struct kvm_device_attr attribute = {
+.group = KVM_S390_VM_CPU_TOPOLOGY,
+.attr  = attr,
+};
+
+if (!s390_has_feat(S390_FEAT_CONFIGURATION_TOPOLOGY)) {
+return 0;
+}
+if (!kvm_vm_check_attr(kvm_state, KVM_S390_VM_CPU_TOPOLOGY, attr)) {
+return -ENOTSUP;
+}
+
+return kvm_vm_ioctl(kvm_state, KVM_SET_DEVICE_ATTR, &attribute);
+}
+
 void kvm_arch_accel_class_init(ObjectClass *oc)
 {
 }
-- 
2.31.1




[PATCH v13 5/7] s390x/cpu_topology: interception of PTF instruction

2022-12-08 Thread Pierre Morel
When the host supports the CPU topology facility, the PTF
instruction with function code 2 is interpreted by the SIE,
provided that the userland hypervizor activates the interpretation
by using the KVM_CAP_S390_CPU_TOPOLOGY KVM extension.

The PTF instructions with function code 0 and 1 are intercepted
and must be emulated by the userland hypervizor.

Signed-off-by: Pierre Morel 
Reviewed-by: Janis Schoetterl-Glausch 
---
 include/hw/s390x/s390-virtio-ccw.h |  6 
 hw/s390x/cpu-topology.c| 52 ++
 target/s390x/kvm/kvm.c | 11 +++
 3 files changed, 69 insertions(+)

diff --git a/include/hw/s390x/s390-virtio-ccw.h 
b/include/hw/s390x/s390-virtio-ccw.h
index 9bba21a916..c1d46e78af 100644
--- a/include/hw/s390x/s390-virtio-ccw.h
+++ b/include/hw/s390x/s390-virtio-ccw.h
@@ -30,6 +30,12 @@ struct S390CcwMachineState {
 uint8_t loadparm[8];
 };
 
+#define S390_PTF_REASON_NONE (0x00 << 8)
+#define S390_PTF_REASON_DONE (0x01 << 8)
+#define S390_PTF_REASON_BUSY (0x02 << 8)
+#define S390_TOPO_FC_MASK 0xffUL
+void s390_handle_ptf(S390CPU *cpu, uint8_t r1, uintptr_t ra);
+
 struct S390CcwMachineClass {
 /*< private >*/
 MachineClass parent_class;
diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index 8a2fe041d4..a2a553e362 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -19,6 +19,58 @@
 #include "hw/s390x/s390-virtio-ccw.h"
 #include "hw/s390x/cpu-topology.h"
 #include "migration/vmstate.h"
+#include "target/s390x/cpu.h"
+#include "hw/s390x/s390-virtio-ccw.h"
+
+/*
+ * s390_handle_ptf:
+ *
+ * @register 1: contains the function code
+ *
+ * Function codes 0 and 1 handle the CPU polarization.
+ * We assume an horizontal topology, the only one supported currently
+ * by Linux, consequently we answer to function code 0, requesting
+ * horizontal polarization that it is already the current polarization
+ * and reject vertical polarization request without further explanation.
+ *
+ * Function code 2 is handling topology changes and is interpreted
+ * by the SIE.
+ */
+void s390_handle_ptf(S390CPU *cpu, uint8_t r1, uintptr_t ra)
+{
+CPUS390XState *env = &cpu->env;
+uint64_t reg = env->regs[r1];
+uint8_t fc = reg & S390_TOPO_FC_MASK;
+
+if (!s390_has_feat(S390_FEAT_CONFIGURATION_TOPOLOGY)) {
+s390_program_interrupt(env, PGM_OPERATION, ra);
+return;
+}
+
+if (env->psw.mask & PSW_MASK_PSTATE) {
+s390_program_interrupt(env, PGM_PRIVILEGED, ra);
+return;
+}
+
+if (reg & ~S390_TOPO_FC_MASK) {
+s390_program_interrupt(env, PGM_SPECIFICATION, ra);
+return;
+}
+
+switch (fc) {
+case 0:/* Horizontal polarization is already set */
+env->regs[r1] |= S390_PTF_REASON_DONE;
+setcc(cpu, 2);
+break;
+case 1:/* Vertical polarization is not supported */
+env->regs[r1] |= S390_PTF_REASON_NONE;
+setcc(cpu, 2);
+break;
+default:
+/* Note that fc == 2 is interpreted by the SIE */
+s390_program_interrupt(env, PGM_SPECIFICATION, ra);
+}
+}
 
 /**
  * s390_has_topology
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 5b6383eab0..a79fdf1c79 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -97,6 +97,7 @@
 
 #define PRIV_B9_EQBS0x9c
 #define PRIV_B9_CLP 0xa0
+#define PRIV_B9_PTF 0xa2
 #define PRIV_B9_PCISTG  0xd0
 #define PRIV_B9_PCILG   0xd2
 #define PRIV_B9_RPCIT   0xd3
@@ -1465,6 +1466,13 @@ static int kvm_mpcifc_service_call(S390CPU *cpu, struct 
kvm_run *run)
 }
 }
 
+static void kvm_handle_ptf(S390CPU *cpu, struct kvm_run *run)
+{
+uint8_t r1 = (run->s390_sieic.ipb >> 20) & 0x0f;
+
+s390_handle_ptf(cpu, r1, RA_IGNORED);
+}
+
 static int handle_b9(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
 {
 int r = 0;
@@ -1482,6 +1490,9 @@ static int handle_b9(S390CPU *cpu, struct kvm_run *run, 
uint8_t ipa1)
 case PRIV_B9_RPCIT:
 r = kvm_rpcit_service_call(cpu, run);
 break;
+case PRIV_B9_PTF:
+kvm_handle_ptf(cpu, run);
+break;
 case PRIV_B9_EQBS:
 /* just inject exception */
 r = -1;
-- 
2.31.1




[PATCH v13 1/7] s390x/cpu topology: Creating CPU topology device

2022-12-08 Thread Pierre Morel
We will need a Topology device to transfer the topology
during migration and to implement machine reset.

The device creation is fenced by s390_has_topology().

Signed-off-by: Pierre Morel 
---
 include/hw/s390x/cpu-topology.h |  44 ++
 hw/s390x/cpu-topology.c | 149 
 hw/s390x/s390-virtio-ccw.c  |   6 ++
 hw/s390x/meson.build|   1 +
 4 files changed, 200 insertions(+)
 create mode 100644 include/hw/s390x/cpu-topology.h
 create mode 100644 hw/s390x/cpu-topology.c

diff --git a/include/hw/s390x/cpu-topology.h b/include/hw/s390x/cpu-topology.h
new file mode 100644
index 00..6c3d2d080f
--- /dev/null
+++ b/include/hw/s390x/cpu-topology.h
@@ -0,0 +1,44 @@
+/*
+ * CPU Topology
+ *
+ * Copyright IBM Corp. 2022
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+#ifndef HW_S390X_CPU_TOPOLOGY_H
+#define HW_S390X_CPU_TOPOLOGY_H
+
+#include "hw/sysbus.h"
+#include "hw/qdev-core.h"
+#include "qom/object.h"
+
+#define S390_TOPOLOGY_CPU_IFL 0x03
+#define S390_TOPOLOGY_MAX_ORIGIN ((63 + S390_MAX_CPUS) / 64)
+
+#define S390_TOPOLOGY_POLARITY_HORIZONTAL  0x00
+#define S390_TOPOLOGY_POLARITY_VERTICAL_LOW0x01
+#define S390_TOPOLOGY_POLARITY_VERTICAL_MEDIUM 0x02
+#define S390_TOPOLOGY_POLARITY_VERTICAL_HIGH   0x03
+
+typedef struct S390TopoSocket {
+int active_count;
+uint64_t mask[S390_TOPOLOGY_MAX_ORIGIN];
+} S390TopoSocket;
+
+struct S390Topology {
+SysBusDevice parent_obj;
+uint32_t num_cores;
+uint32_t num_sockets;
+S390TopoSocket *socket;
+};
+
+#define TYPE_S390_CPU_TOPOLOGY "s390-topology"
+OBJECT_DECLARE_SIMPLE_TYPE(S390Topology, S390_CPU_TOPOLOGY)
+
+void s390_init_topology(MachineState *machine, Error **errp);
+bool s390_has_topology(void);
+S390Topology *s390_get_topology(void);
+
+#endif
diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
new file mode 100644
index 00..b3e59873f6
--- /dev/null
+++ b/hw/s390x/cpu-topology.c
@@ -0,0 +1,149 @@
+/*
+ * CPU Topology
+ *
+ * Copyright IBM Corp. 2022
+ * Author(s): Pierre Morel 
+
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "hw/qdev-properties.h"
+#include "hw/boards.h"
+#include "qemu/typedefs.h"
+#include "target/s390x/cpu.h"
+#include "hw/s390x/s390-virtio-ccw.h"
+#include "hw/s390x/cpu-topology.h"
+
+/**
+ * s390_has_topology
+ *
+ * Return false until the commit activating the topology is
+ * commited.
+ */
+bool s390_has_topology(void)
+{
+return false;
+}
+
+/**
+ * s390_get_topology
+ *
+ * Returns a pointer to the topology.
+ *
+ * This function is called when we know the topology exist.
+ * Testing if the topology exist is done with s390_has_topology()
+ */
+S390Topology *s390_get_topology(void)
+{
+static S390Topology *s390Topology;
+
+if (!s390Topology) {
+s390Topology = S390_CPU_TOPOLOGY(
+object_resolve_path(TYPE_S390_CPU_TOPOLOGY, NULL));
+}
+
+assert(s390Topology);
+
+return s390Topology;
+}
+
+/**
+ * s390_init_topology
+ * @machine: The Machine state, used to retrieve the SMP parameters
+ * @errp: the error pointer in case of problem
+ *
+ * This function creates and initialize the S390Topology with
+ * the QEMU -smp parameters we will use during adding cores to the
+ * topology.
+ */
+void s390_init_topology(MachineState *machine, Error **errp)
+{
+DeviceState *dev;
+
+if (machine->smp.threads > 1) {
+error_setg(errp, "CPU Topology do not support multithreading");
+return;
+}
+
+dev = qdev_new(TYPE_S390_CPU_TOPOLOGY);
+
+object_property_add_child(&machine->parent_obj,
+  TYPE_S390_CPU_TOPOLOGY, OBJECT(dev));
+object_property_set_int(OBJECT(dev), "num-cores",
+machine->smp.cores, errp);
+object_property_set_int(OBJECT(dev), "num-sockets",
+machine->smp.sockets, errp);
+
+sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), errp);
+}
+
+/**
+ * s390_topology_realize:
+ * @dev: the device state
+ *
+ * We free the socket array allocated in realize.
+ */
+static void s390_topology_unrealize(DeviceState *dev)
+{
+S390Topology *topo = S390_CPU_TOPOLOGY(dev);
+
+g_free(topo->socket);
+}
+
+/**
+ * s390_topology_realize:
+ * @dev: the device state
+ * @errp: the error pointer (not used)
+ *
+ * During realize the machine CPU topology is initialized with the
+ * QEMU -smp parameters.
+ * The maximum count of CPU TLE in the all Topology can not be greater
+ * than the maximum CPUs.
+ */
+static void s390_topology_realize(DeviceState *dev, Error **errp)
+{
+S390Topology *topo = S390_CPU_TOPOLOGY(dev);
+
+topo->socket = g_new0(

[PATCH v13 6/7] s390x/cpu_topology: activating CPU topology

2022-12-08 Thread Pierre Morel
The KVM capability, KVM_CAP_S390_CPU_TOPOLOGY is used to
activate the S390_FEAT_CONFIGURATION_TOPOLOGY feature and
the topology facility for the guest in the case the topology
is available in QEMU and in KVM.

The feature is disabled by default and fenced for SE
(secure execution).

Signed-off-by: Pierre Morel 
---
 include/hw/s390x/cpu-topology.h | 10 +-
 hw/s390x/cpu-topology.c |  5 ++---
 target/s390x/cpu_models.c   |  1 +
 target/s390x/kvm/kvm.c  | 12 
 4 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/include/hw/s390x/cpu-topology.h b/include/hw/s390x/cpu-topology.h
index 6c3d2d080f..fe25a3bf6b 100644
--- a/include/hw/s390x/cpu-topology.h
+++ b/include/hw/s390x/cpu-topology.h
@@ -38,7 +38,15 @@ struct S390Topology {
 OBJECT_DECLARE_SIMPLE_TYPE(S390Topology, S390_CPU_TOPOLOGY)
 
 void s390_init_topology(MachineState *machine, Error **errp);
-bool s390_has_topology(void);
 S390Topology *s390_get_topology(void);
 
+#ifdef CONFIG_KVM
+bool s390_has_topology(void);
+#else
+static inline bool s390_has_topology(void)
+{
+return false;
+}
+#endif
+
 #endif
diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index a2a553e362..bf125bb4c0 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -75,12 +75,11 @@ void s390_handle_ptf(S390CPU *cpu, uint8_t r1, uintptr_t ra)
 /**
  * s390_has_topology
  *
- * Return false until the commit activating the topology is
- * commited.
+ * checks the S390_FEAT_CONFIGURATION_TOPOLOGY availability.
  */
 bool s390_has_topology(void)
 {
-return false;
+return s390_has_feat(S390_FEAT_CONFIGURATION_TOPOLOGY);
 }
 
 /**
diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index c3a4f80633..3f05e05fd3 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -253,6 +253,7 @@ bool s390_has_feat(S390Feat feat)
 case S390_FEAT_SIE_CMMA:
 case S390_FEAT_SIE_PFMFI:
 case S390_FEAT_SIE_IBS:
+case S390_FEAT_CONFIGURATION_TOPOLOGY:
 return false;
 break;
 default:
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index a79fdf1c79..abf1202c28 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2471,6 +2471,18 @@ void kvm_s390_get_host_cpu_model(S390CPUModel *model, 
Error **errp)
 set_bit(S390_FEAT_UNPACK, model->features);
 }
 
+/*
+ * If we have support for CPU Topology in KVM
+ * activate the CPU TOPOLOGY feature.
+ */
+if (kvm_check_extension(kvm_state, KVM_CAP_S390_CPU_TOPOLOGY)) {
+if (kvm_vm_enable_cap(kvm_state, KVM_CAP_S390_CPU_TOPOLOGY, 0) < 0) {
+error_setg(errp, "KVM: Error enabling KVM_CAP_S390_CPU_TOPOLOGY");
+return;
+}
+set_bit(S390_FEAT_CONFIGURATION_TOPOLOGY, model->features);
+}
+
 /* We emulate a zPCI bus and AEN, therefore we don't need HW support */
 set_bit(S390_FEAT_ZPCI, model->features);
 set_bit(S390_FEAT_ADAPTER_EVENT_NOTIFICATION, model->features);
-- 
2.31.1




[PATCH v13 7/7] docs/s390x: document s390x cpu topology

2022-12-08 Thread Pierre Morel
Add some basic examples for the definition of cpu topology
in s390x.

Signed-off-by: Pierre Morel 
---
 docs/system/s390x/cpu-topology.rst | 87 ++
 docs/system/target-s390x.rst   |  1 +
 2 files changed, 88 insertions(+)
 create mode 100644 docs/system/s390x/cpu-topology.rst

diff --git a/docs/system/s390x/cpu-topology.rst 
b/docs/system/s390x/cpu-topology.rst
new file mode 100644
index 00..12f974cc54
--- /dev/null
+++ b/docs/system/s390x/cpu-topology.rst
@@ -0,0 +1,87 @@
+CPU Topology on s390x
+=
+
+CPU Topology on S390x provides up to 5 levels of topology containers:
+nodes, drawers, books, sockets and CPUs.
+While the higher level containers, Containers Topology List Entries,
+(Containers TLE) define a tree hierarchy, the lowest level of topology
+definition, the CPU Topology List Entry (CPU TLE), provides the placement
+of the CPUs inside the parent container.
+
+Currently QEMU CPU topology uses a single level of container: the sockets.
+
+For backward compatibility, threads can be declared on the ``-smp`` command
+line. They will be seen as CPUs by the guest as long as multithreading
+is not really supported by QEMU for S390.
+
+Prerequisites
+-
+
+To use CPU Topology a Linux QEMU/KVM machine providing the CPU Topology 
facility
+(STFLE bit 11) is required.
+
+However, since this facility has been enabled by default in an early version
+of QEMU, we use a capability, ``KVM_CAP_S390_CPU_TOPOLOGY``, to notify KVM
+QEMU use of the CPU Topology.
+
+Enabling CPU topology
+-
+
+Currently, CPU topology is only enabled in the host model.
+
+Enabling CPU topology in a CPU model is done by setting the CPU flag
+``ctop`` to ``on`` like in:
+
+.. code-block:: bash
+
+   -cpu gen16b,ctop=on
+
+Having the topology disabled by default allows migration between
+old and new QEMU without adding new flags.
+
+Indicating the CPU topology to the Virtual Machine
+--
+
+The CPU Topology, can be specified on the QEMU command line
+with the ``-smp`` or the ``-device`` QEMU command arguments.
+
+In the following machine we define 8 sockets with 4 cores each.
+Note that S390 QEMU machines do not implement multithreading.
+
+.. code-block:: bash
+
+  $ qemu-system-s390x -m 2G \
+-cpu gen16b,ctop=on \
+-smp cpus=5,sockets=8,cores=4,maxcpus=32 \
+-device host-s390x-cpu,core-id=14 \
+
+New CPUs can be plugged using the device_add hmp command like in:
+
+.. code-block:: bash
+
+  (qemu) device_add host-s390x-cpu,core-id=9
+
+The core-id defines the placement of the core in the topology by
+starting with core 0 in socket 0 up to maxcpus.
+
+In the example above:
+
+* There are 5 CPUs provided to the guest with the ``-smp`` command line
+  They will take the core-ids 0,1,2,3,4
+  As we have 4 cores in a socket, we have 4 CPUs provided
+  to the guest in socket 0, with core-ids 0,1,2,3.
+  The last cpu, with core-id 4, will be on socket 1.
+
+* the core with ID 14 provided by the ``-device`` command line will
+  be placed in socket 3, with core-id 14
+
+* the core with ID 9 provided by the ``device_add`` qmp command will
+  be placed in socket 2, with core-id 9
+
+Note that the core ID is machine wide and the CPU TLE masks provided
+by the STSI instruction will be writen in a big endian mask:
+
+* in socket 0: 0xf000 (core id 0,1,2,3)
+* in socket 1: 0x0800 (core id 4)
+* in socket 2: 0x0040 (core id 9)
+* in socket 3: 0x0002 (core id 14)
diff --git a/docs/system/target-s390x.rst b/docs/system/target-s390x.rst
index c636f64113..ff0ffe04f3 100644
--- a/docs/system/target-s390x.rst
+++ b/docs/system/target-s390x.rst
@@ -33,3 +33,4 @@ Architectural features
 .. toctree::
s390x/bootdevices
s390x/protvirt
+   s390x/cpu-topology
-- 
2.31.1




Re: [PATCH-for-8.0 2/4] gdbstub: Use vaddr type for generic insert/remove_breakpoint() API

2022-12-08 Thread Philippe Mathieu-Daudé

On 7/12/22 21:15, Peter Maydell wrote:

On Wed, 7 Dec 2022 at 18:27, Philippe Mathieu-Daudé  wrote:


On 7/12/22 19:23, Peter Maydell wrote:

On Wed, 7 Dec 2022 at 17:42, Philippe Mathieu-Daudé  wrote:


Both insert/remove_breakpoint() handlers are used in system and
user emulation. We can not use the 'hwaddr' type on user emulation,
we have to use 'vaddr' which is defined as "wide enough to contain
any #target_ulong virtual address".



@@ -48,8 +48,8 @@ struct AccelOpsClass {

   /* gdbstub hooks */
   bool (*supports_guest_debug)(void);
-int (*insert_breakpoint)(CPUState *cpu, int type, hwaddr addr, hwaddr len);
-int (*remove_breakpoint)(CPUState *cpu, int type, hwaddr addr, hwaddr len);
+int (*insert_breakpoint)(CPUState *cpu, int type, vaddr addr, vaddr len);
+int (*remove_breakpoint)(CPUState *cpu, int type, vaddr addr, vaddr len);
   void (*remove_all_breakpoints)(CPUState *cpu);
   };


If you're changing the prototype of these methods on AccelOpsClass
don't you also want to change the implementations, eg tcg_breakpoint_insert()?
Interestingly that function calls cpu_breakpoint_insert() which
already takes a 'vaddr' rather than a 'hwaddr'.


Yes I neglected to include these changes here:

-- >8 --
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
diff --git a/gdbstub/softmmu.c b/gdbstub/softmmu.c
diff --git a/gdbstub/user.c b/gdbstub/user.c


Those are the callsites to the methods, not the implementations, I think.


Oh correct, I missed that :/ Thanks!




Re: [PATCH-for-8.0 1/4] cputlb: Restrict SavedIOTLB to system emulation

2022-12-08 Thread Philippe Mathieu-Daudé

On 8/12/22 09:40, Alex Bennée wrote:


Philippe Mathieu-Daudé  writes:


Commit 2f3a57ee47 ("cputlb: ensure we save the IOTLB data in
case of reset") added the SavedIOTLB structure -- which is
system emulation specific -- in the generic CPUState structure.

Signed-off-by: Philippe Mathieu-Daudé 
---
  accel/tcg/cputlb.c| 4 ++--
  include/hw/core/cpu.h | 6 --
  2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 6f1c00682b..0ea96fbcdf 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1395,7 +1395,7 @@ static uint64_t io_readx(CPUArchState *env, 
CPUTLBEntryFull *full,
  static void save_iotlb_data(CPUState *cs, MemoryRegionSection *section,
  hwaddr mr_offset)
  {
-#ifdef CONFIG_PLUGIN
+#if defined(CONFIG_PLUGIN) && !defined(CONFIG_USER_ONLY)


cputlb is softmmu only so I don't think we need to check CONFIG_USER_ONLY here.

Indeed, only "hw/core/cpu.h" requires it, thanks!



[PATCH] gitlab-ci: Check building ppc64 without TCG

2022-12-08 Thread Thomas Huth
Building QEMU for ppc64 hosts with --disable-tcg used to break a couple
of times in the past, see e.g. commit a01b64cee7 ("target/ppc: Put do_rfi
under a TCG-only block") or commit 049b4ad669 ("target/ppc: Fix build
warnings when building with 'disable-tcg'"), so we should test this in
our CI to avoid such regressions.

Signed-off-by: Thomas Huth 
---
 Not sure whether we really should add new shared runner jobs ... but
 I currently also don't see a better option for this: We don't have
 any custom ppc64 runners in our CI, so the only other way to test this
 is travis - but the Travis extension for gitlab recently broke, so
 the results there are currently rather neglected, I think...

 .gitlab-ci.d/crossbuilds.yml | 8 
 1 file changed, 8 insertions(+)

diff --git a/.gitlab-ci.d/crossbuilds.yml b/.gitlab-ci.d/crossbuilds.yml
index c4cd96433d..8dbbb8f881 100644
--- a/.gitlab-ci.d/crossbuilds.yml
+++ b/.gitlab-ci.d/crossbuilds.yml
@@ -112,6 +112,14 @@ cross-ppc64el-user:
   variables:
 IMAGE: debian-ppc64el-cross
 
+cross-ppc64el-kvm-only:
+  extends: .cross_accel_build_job
+  needs:
+job: ppc64el-debian-cross-container
+  variables:
+IMAGE: debian-ppc64el-cross
+EXTRA_CONFIGURE_OPTS: --disable-tcg --without-default-devices
+
 # The riscv64 cross-builds currently use a 'sid' container to get
 # compilers and libraries. Until something more stable is found we
 # allow_failure so as not to block CI.
-- 
2.31.1




Re: [PATCH 1/1] hw/nvme: fix missing cq eventidx update

2022-12-08 Thread Philippe Mathieu-Daudé

On 8/12/22 09:29, Klaus Jensen wrote:

From: Klaus Jensen 

Prior to reading the shadow doorbell cq head, we have to update the
eventidx. Otherwise, we risk that the driver will skip an mmio doorbell
write. This happens on riscv64, as reported by Guenter.

Adding the missing update to the cq eventidx fixes the issue.

Fixes: 3f7fe8de3d49 ("hw/nvme: Implement shadow doorbell buffer support")
Reported-by: Guenter Roeck 
Signed-off-by: Klaus Jensen 
---
  hw/nvme/ctrl.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index e54276dc1dc7..1192919b4869 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1331,6 +1331,13 @@ static inline void nvme_blk_write(BlockBackend *blk, 
int64_t offset,
  }
  }
  
+static void nvme_update_cq_eventidx(const NvmeCQueue *cq)

+{
+pci_dma_write(&cq->ctrl->parent_obj, cq->ei_addr, &cq->head,


'parent_obj' is a private field. Better to use the QOM accessor:

   pci_dma_write(PCI_DEVICE(&cq->ctrl), cq->ei_addr, &cq->head,


+  sizeof(cq->head));
+trace_pci_nvme_eventidx_cq(cq->cqid, cq->head);


Surprisingly the trace event format was already present in Jinhao respin...
https://lore.kernel.org/all/YqsIh+OUcWnHU3zp@apples/T/

Could we move the event before the call?


+}
+
  static void nvme_update_cq_head(NvmeCQueue *cq)
  {
  pci_dma_read(&cq->ctrl->parent_obj, cq->db_addr, &cq->head,
@@ -1351,6 +1358,7 @@ static void nvme_post_cqes(void *opaque)
  hwaddr addr;
  
  if (n->dbbuf_enabled) {

+nvme_update_cq_eventidx(cq);
  nvme_update_cq_head(cq);
  }
  





Re: [PATCH] gitlab-ci: Check building ppc64 without TCG

2022-12-08 Thread Philippe Mathieu-Daudé

On 8/12/22 11:15, Thomas Huth wrote:

Building QEMU for ppc64 hosts with --disable-tcg used to break a couple
of times in the past, see e.g. commit a01b64cee7 ("target/ppc: Put do_rfi
under a TCG-only block") or commit 049b4ad669 ("target/ppc: Fix build
warnings when building with 'disable-tcg'"), so we should test this in
our CI to avoid such regressions.

Signed-off-by: Thomas Huth 
---
  Not sure whether we really should add new shared runner jobs ... but
  I currently also don't see a better option for this: We don't have
  any custom ppc64 runners in our CI, so the only other way to test this
  is travis - but the Travis extension for gitlab recently broke, so
  the results there are currently rather neglected, I think...

  .gitlab-ci.d/crossbuilds.yml | 8 
  1 file changed, 8 insertions(+)

diff --git a/.gitlab-ci.d/crossbuilds.yml b/.gitlab-ci.d/crossbuilds.yml
index c4cd96433d..8dbbb8f881 100644
--- a/.gitlab-ci.d/crossbuilds.yml
+++ b/.gitlab-ci.d/crossbuilds.yml
@@ -112,6 +112,14 @@ cross-ppc64el-user:
variables:
  IMAGE: debian-ppc64el-cross
  
+cross-ppc64el-kvm-only:

+  extends: .cross_accel_build_job
+  needs:
+job: ppc64el-debian-cross-container
+  variables:
+IMAGE: debian-ppc64el-cross
+EXTRA_CONFIGURE_OPTS: --disable-tcg --without-default-devices


Maybe also --disable-tools --disable-docs?

Regardless:

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH] gitlab-ci: Check building ppc64 without TCG

2022-12-08 Thread Thomas Huth

On 08/12/2022 11.17, Philippe Mathieu-Daudé wrote:

On 8/12/22 11:15, Thomas Huth wrote:

Building QEMU for ppc64 hosts with --disable-tcg used to break a couple
of times in the past, see e.g. commit a01b64cee7 ("target/ppc: Put do_rfi
under a TCG-only block") or commit 049b4ad669 ("target/ppc: Fix build
warnings when building with 'disable-tcg'"), so we should test this in
our CI to avoid such regressions.

Signed-off-by: Thomas Huth 
---
  Not sure whether we really should add new shared runner jobs ... but
  I currently also don't see a better option for this: We don't have
  any custom ppc64 runners in our CI, so the only other way to test this
  is travis - but the Travis extension for gitlab recently broke, so
  the results there are currently rather neglected, I think...

  .gitlab-ci.d/crossbuilds.yml | 8 
  1 file changed, 8 insertions(+)

diff --git a/.gitlab-ci.d/crossbuilds.yml b/.gitlab-ci.d/crossbuilds.yml
index c4cd96433d..8dbbb8f881 100644
--- a/.gitlab-ci.d/crossbuilds.yml
+++ b/.gitlab-ci.d/crossbuilds.yml
@@ -112,6 +112,14 @@ cross-ppc64el-user:
    variables:
  IMAGE: debian-ppc64el-cross
+cross-ppc64el-kvm-only:
+  extends: .cross_accel_build_job
+  needs:
+    job: ppc64el-debian-cross-container
+  variables:
+    IMAGE: debian-ppc64el-cross
+    EXTRA_CONFIGURE_OPTS: --disable-tcg --without-default-devices


Maybe also --disable-tools --disable-docs?


These are already added by the template.


Regardless:

Reviewed-by: Philippe Mathieu-Daudé 


Thanks!

 Thomas





Re: [PATCH 1/1] hw/nvme: fix missing cq eventidx update

2022-12-08 Thread Klaus Jensen
On Dec  8 11:14, Philippe Mathieu-Daudé wrote:
> On 8/12/22 09:29, Klaus Jensen wrote:
> > From: Klaus Jensen 
> > 
> > Prior to reading the shadow doorbell cq head, we have to update the
> > eventidx. Otherwise, we risk that the driver will skip an mmio doorbell
> > write. This happens on riscv64, as reported by Guenter.
> > 
> > Adding the missing update to the cq eventidx fixes the issue.
> > 
> > Fixes: 3f7fe8de3d49 ("hw/nvme: Implement shadow doorbell buffer support")
> > Reported-by: Guenter Roeck 
> > Signed-off-by: Klaus Jensen 
> > ---
> >   hw/nvme/ctrl.c | 8 
> >   1 file changed, 8 insertions(+)
> > 
> > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> > index e54276dc1dc7..1192919b4869 100644
> > --- a/hw/nvme/ctrl.c
> > +++ b/hw/nvme/ctrl.c
> > @@ -1331,6 +1331,13 @@ static inline void nvme_blk_write(BlockBackend *blk, 
> > int64_t offset,
> >   }
> >   }
> > +static void nvme_update_cq_eventidx(const NvmeCQueue *cq)
> > +{
> > +pci_dma_write(&cq->ctrl->parent_obj, cq->ei_addr, &cq->head,
> 
> 'parent_obj' is a private field. Better to use the QOM accessor:
> 
>pci_dma_write(PCI_DEVICE(&cq->ctrl), cq->ei_addr, &cq->head,

Ah yes, of course. I think we have a couple of other ->parent_obj
accesses that we should fix also.

> 
> > +  sizeof(cq->head));
> > +trace_pci_nvme_eventidx_cq(cq->cqid, cq->head);
> 
> Surprisingly the trace event format was already present in Jinhao respin...
> https://lore.kernel.org/all/YqsIh+OUcWnHU3zp@apples/T/
> 
> Could we move the event before the call?
> 

Makes sense.

> > +}
> > +
> >   static void nvme_update_cq_head(NvmeCQueue *cq)
> >   {
> >   pci_dma_read(&cq->ctrl->parent_obj, cq->db_addr, &cq->head,
> > @@ -1351,6 +1358,7 @@ static void nvme_post_cqes(void *opaque)
> >   hwaddr addr;
> >   if (n->dbbuf_enabled) {
> > +nvme_update_cq_eventidx(cq);
> >   nvme_update_cq_head(cq);
> >   }
> 


signature.asc
Description: PGP signature


Re: [PATCH 0/1] hw/nvme: shadow doorbells broken on riscv64

2022-12-08 Thread Philippe Mathieu-Daudé

On 8/12/22 09:29, Klaus Jensen wrote:

From: Klaus Jensen 

Guenter reports[1] that hw/nvme is broken on riscv64.

This is a regression since 7.1, so this does not warrent an rc5 for 7.2.
I'm sure Guenter can carry this patch in his tree, and maybe we can get
this out in a stable release.


Delaying to 8.0 and adding a 'Cc: qemu-stable@' tag seems wise.

Cc'ing riscv list in case.


I really wonder why this issue only shows up on riscv64. We have not
observed this on other platforms (yet).

   [1]: 
https://lore.kernel.org/qemu-devel/20221207174918.ga1151...@roeck-us.net/

Klaus Jensen (1):
   hw/nvme: fix missing cq eventidx update

  hw/nvme/ctrl.c | 8 
  1 file changed, 8 insertions(+)






Re: [PATCH v2 01/16] hw/riscv: Select MSI_NONBROKEN in SIFIVE_PLIC

2022-12-08 Thread Philippe Mathieu-Daudé

On 7/12/22 11:03, Bin Meng wrote:

hw/pci/Kconfig says MSI_NONBROKEN should be selected by interrupt
controllers regardless of how MSI is implemented. msi_nonbroken is
initialized to true in sifive_plic_realize().

Let SIFIVE_PLIC select MSI_NONBROKEN and drop the selection from
RISC-V machines.


12 years with this inverted logic... *sigh*.

Reviewed-by: Philippe Mathieu-Daudé 


Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 
---

(no changes since v1)

  hw/intc/Kconfig  | 1 +
  hw/riscv/Kconfig | 5 -
  2 files changed, 1 insertion(+), 5 deletions(-)






Re: [PATCH v2 02/16] hw/intc: Select MSI_NONBROKEN in RISC-V AIA interrupt controllers

2022-12-08 Thread Philippe Mathieu-Daudé

On 7/12/22 11:03, Bin Meng wrote:

hw/pci/Kconfig says MSI_NONBROKEN should be selected by interrupt
controllers regardless of how MSI is implemented. msi_nonbroken is
initialized to true in both riscv_aplic_realize() and
riscv_imsic_realize().

Select MSI_NONBROKEN in RISCV_APLIC and RISCV_IMSIC.

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 
---

(no changes since v1)

  hw/intc/Kconfig | 2 ++
  1 file changed, 2 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PULL 11/21] docs: Build and install all the docs in a single manual

2022-12-08 Thread Peter Maydell
On Thu, 8 Dec 2022 at 06:55, Stefan Weil  wrote:
>
> Am 12.01.21 um 17:57 schrieb Peter Maydell:
> [...]
> > diff --git a/docs/meson.build b/docs/meson.build

> > +  sphinxdocs += this_manual
> > +  install_subdir(output_dir, install_dir: qemu_docdir, strip_directory: 
> > true)
>
> This line causes a build warning with the latest code:
>
> ../../../docs/meson.build:74: WARNING: Project targets '>=0.61.3' but
> uses feature deprecated since '0.60.0': install_subdir with empty
> directory. It worked by accident and is buggy. Use install_emptydir instead.
>
> It looks like `qemu_docdir` is no longer defined anywhere.

It's defined by the top-level meson.build, in line 39:

qemu_docdir = get_option('docdir') / get_option('qemu_suffix')

The warning from meson is not about the value of install_dir,
though : it's saying "the directory you asked me to install from
the build tree doesn't exist, I'm going to create an empty
subdirectory in the install destination, but there's better ways
to do that if that's what you were intending to do". So the
question is why your build tree doesn't have the documentation
built in it -- it should be in $BUILDDIR/docs/manual/ .

thanks
-- PMM



Re: [PATCH v2 04/16] hw/riscv: Sort machines Kconfig options in alphabetical order

2022-12-08 Thread Philippe Mathieu-Daudé

On 7/12/22 11:03, Bin Meng wrote:

SHAKTI_C machine Kconfig option was inserted in disorder. Fix it.

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 
---

(no changes since v1)

  hw/riscv/Kconfig | 16 +---
  1 file changed, 9 insertions(+), 7 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 08/16] hw/intc: sifive_plic: Use error_setg() to propagate the error up via errp in sifive_plic_realize()

2022-12-08 Thread Philippe Mathieu-Daudé

On 7/12/22 11:03, Bin Meng wrote:

The realize() callback has an errp for us to propagate the error up.
While we are here, corret the wrong multi-line comment format.


Typo "correct"

Reviewed-by: Philippe Mathieu-Daudé 


Signed-off-by: Bin Meng 

---

Changes in v2:
- new patch: "hw/intc: sifive_plic: Use error_setg() to propagate the error up via 
errp in sifive_plic_realize()"

  hw/intc/sifive_plic.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)





Re: [PATCH v10 6/9] KVM: Unmap existing mappings when change the memory attributes

2022-12-08 Thread Chao Peng
On Wed, Dec 07, 2022 at 05:16:34PM +, Fuad Tabba wrote:
> Hi,
> 
> On Fri, Dec 2, 2022 at 6:19 AM Chao Peng  wrote:
> >
> > Unmap the existing guest mappings when memory attribute is changed
> > between shared and private. This is needed because shared pages and
> > private pages are from different backends, unmapping existing ones
> > gives a chance for page fault handler to re-populate the mappings
> > according to the new attribute.
> >
> > Only architecture has private memory support needs this and the
> > supported architecture is expected to rewrite the weak
> > kvm_arch_has_private_mem().
> 
> This kind of ties into the discussion of being able to share memory in
> place. For pKVM for example, shared and private memory would have the
> same backend, and the unmapping wouldn't be needed.
> 
> So I guess that, instead of kvm_arch_has_private_mem(), can the check
> be done differently, e.g., with a different function, say
> kvm_arch_private_notify_attribute_change() (but maybe with a more
> friendly name than what I suggested :) )?

Besides controlling the unmapping here, kvm_arch_has_private_mem() is
also used to gate the memslot KVM_MEM_PRIVATE flag in patch09. I know
unmapping is confirmed unnecessary for pKVM, but how about
KVM_MEM_PRIVATE? Will pKVM add its own flag or reuse KVM_MEM_PRIVATE?
If the answer is the latter, then yes we should use a different check
which only works for confidential usages here.

Thanks,
Chao
> 
> Thanks,
> /fuad
> 
> >
> > Also, during memory attribute changing and the unmapping time frame,
> > page fault handler may happen in the same memory range and can cause
> > incorrect page state, invoke kvm_mmu_invalidate_* helpers to let the
> > page fault handler retry during this time frame.
> >
> > Signed-off-by: Chao Peng 
> > ---
> >  include/linux/kvm_host.h |   7 +-
> >  virt/kvm/kvm_main.c  | 168 ++-
> >  2 files changed, 116 insertions(+), 59 deletions(-)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 3d69484d2704..3331c0c92838 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -255,7 +255,6 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t 
> > cr2_or_gpa,
> >  int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
> >  #endif
> >
> > -#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
> >  struct kvm_gfn_range {
> > struct kvm_memory_slot *slot;
> > gfn_t start;
> > @@ -264,6 +263,8 @@ struct kvm_gfn_range {
> > bool may_block;
> >  };
> >  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
> > +
> > +#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
> >  bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
> >  bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
> >  bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
> > @@ -785,11 +786,12 @@ struct kvm {
> >
> >  #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
> > struct mmu_notifier mmu_notifier;
> > +#endif
> > unsigned long mmu_invalidate_seq;
> > long mmu_invalidate_in_progress;
> > gfn_t mmu_invalidate_range_start;
> > gfn_t mmu_invalidate_range_end;
> > -#endif
> > +
> > struct list_head devices;
> > u64 manual_dirty_log_protect;
> > struct dentry *debugfs_dentry;
> > @@ -1480,6 +1482,7 @@ bool kvm_arch_dy_has_pending_interrupt(struct 
> > kvm_vcpu *vcpu);
> >  int kvm_arch_post_init_vm(struct kvm *kvm);
> >  void kvm_arch_pre_destroy_vm(struct kvm *kvm);
> >  int kvm_arch_create_vm_debugfs(struct kvm *kvm);
> > +bool kvm_arch_has_private_mem(struct kvm *kvm);
> >
> >  #ifndef __KVM_HAVE_ARCH_VM_ALLOC
> >  /*
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index ad55dfbc75d7..4e1e1e113bf0 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -520,6 +520,62 @@ void kvm_destroy_vcpus(struct kvm *kvm)
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_destroy_vcpus);
> >
> > +void kvm_mmu_invalidate_begin(struct kvm *kvm)
> > +{
> > +   /*
> > +* The count increase must become visible at unlock time as no
> > +* spte can be established without taking the mmu_lock and
> > +* count is also read inside the mmu_lock critical section.
> > +*/
> > +   kvm->mmu_invalidate_in_progress++;
> > +
> > +   if (likely(kvm->mmu_invalidate_in_progress == 1)) {
> > +   kvm->mmu_invalidate_range_start = INVALID_GPA;
> > +   kvm->mmu_invalidate_range_end = INVALID_GPA;
> > +   }
> > +}
> > +
> > +void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end)
> > +{
> > +   WARN_ON_ONCE(!kvm->mmu_invalidate_in_progress);
> > +
> > +   if (likely(kvm->mmu_invalidate_in_progress == 1)) {
> > +   kvm->mmu_invalidate_range_start = start;
> > +   kvm->mmu_invalidate_range_end = end;
> > +   } else {
> > +   /*
> > +

Re: [PATCH v10 7/9] KVM: Update lpage info when private/shared memory are mixed

2022-12-08 Thread Chao Peng
On Tue, Dec 06, 2022 at 10:42:24PM -0800, Isaku Yamahata wrote:
> On Tue, Dec 06, 2022 at 08:02:24PM +0800,
> Chao Peng  wrote:
> 
> > On Mon, Dec 05, 2022 at 02:49:59PM -0800, Isaku Yamahata wrote:
> > > On Fri, Dec 02, 2022 at 02:13:45PM +0800,
> > > Chao Peng  wrote:
> > > 
> > > > A large page with mixed private/shared subpages can't be mapped as large
> > > > page since its sub private/shared pages are from different memory
> > > > backends and may also treated by architecture differently. When
> > > > private/shared memory are mixed in a large page, the current lpage_info
> > > > is not sufficient to decide whether the page can be mapped as large page
> > > > or not and additional private/shared mixed information is needed.
> > > > 
> > > > Tracking this 'mixed' information with the current 'count' like
> > > > disallow_lpage is a bit challenge so reserve a bit in 'disallow_lpage'
> > > > to indicate a large page has mixed private/share subpages and update
> > > > this 'mixed' bit whenever the memory attribute is changed between
> > > > private and shared.
> > > > 
> > > > Signed-off-by: Chao Peng 
> > > > ---
> > > >  arch/x86/include/asm/kvm_host.h |   8 ++
> > > >  arch/x86/kvm/mmu/mmu.c  | 134 +++-
> > > >  arch/x86/kvm/x86.c  |   2 +
> > > >  include/linux/kvm_host.h|  19 +
> > > >  virt/kvm/kvm_main.c |   9 ++-
> > > >  5 files changed, 169 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/include/asm/kvm_host.h 
> > > > b/arch/x86/include/asm/kvm_host.h
> > > > index 283cbb83d6ae..7772ab37ac89 100644
> > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > @@ -38,6 +38,7 @@
> > > >  #include 
> > > >  
> > > >  #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
> > > > +#define __KVM_HAVE_ARCH_SET_MEMORY_ATTRIBUTES
> > > >  
> > > >  #define KVM_MAX_VCPUS 1024
> > > >  
> > > > @@ -1011,6 +1012,13 @@ struct kvm_vcpu_arch {
> > > >  #endif
> > > >  };
> > > >  
> > > > +/*
> > > > + * Use a bit in disallow_lpage to indicate private/shared pages mixed 
> > > > at the
> > > > + * level. The remaining bits are used as a reference count.
> > > > + */
> > > > +#define KVM_LPAGE_PRIVATE_SHARED_MIXED (1U << 31)
> > > > +#define KVM_LPAGE_COUNT_MAX((1U << 31) - 1)
> > > > +
> > > >  struct kvm_lpage_info {
> > > > int disallow_lpage;
> > > >  };
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index e2c70b5afa3e..2190fd8c95c0 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -763,11 +763,16 @@ static void update_gfn_disallow_lpage_count(const 
> > > > struct kvm_memory_slot *slot,
> > > >  {
> > > > struct kvm_lpage_info *linfo;
> > > > int i;
> > > > +   int disallow_count;
> > > >  
> > > > for (i = PG_LEVEL_2M; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
> > > > linfo = lpage_info_slot(gfn, slot, i);
> > > > +
> > > > +   disallow_count = linfo->disallow_lpage & 
> > > > KVM_LPAGE_COUNT_MAX;
> > > > +   WARN_ON(disallow_count + count < 0 ||
> > > > +   disallow_count > KVM_LPAGE_COUNT_MAX - count);
> > > > +
> > > > linfo->disallow_lpage += count;
> > > > -   WARN_ON(linfo->disallow_lpage < 0);
> > > > }
> > > >  }
> > > >  
> > > > @@ -6986,3 +6991,130 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
> > > > if (kvm->arch.nx_huge_page_recovery_thread)
> > > > kthread_stop(kvm->arch.nx_huge_page_recovery_thread);
> > > >  }
> > > > +
> > > > +static bool linfo_is_mixed(struct kvm_lpage_info *linfo)
> > > > +{
> > > > +   return linfo->disallow_lpage & KVM_LPAGE_PRIVATE_SHARED_MIXED;
> > > > +}
> > > > +
> > > > +static void linfo_set_mixed(gfn_t gfn, struct kvm_memory_slot *slot,
> > > > +   int level, bool mixed)
> > > > +{
> > > > +   struct kvm_lpage_info *linfo = lpage_info_slot(gfn, slot, 
> > > > level);
> > > > +
> > > > +   if (mixed)
> > > > +   linfo->disallow_lpage |= KVM_LPAGE_PRIVATE_SHARED_MIXED;
> > > > +   else
> > > > +   linfo->disallow_lpage &= 
> > > > ~KVM_LPAGE_PRIVATE_SHARED_MIXED;
> > > > +}
> > > > +
> > > > +static bool is_expected_attr_entry(void *entry, unsigned long 
> > > > expected_attrs)
> > > > +{
> > > > +   bool expect_private = expected_attrs & 
> > > > KVM_MEMORY_ATTRIBUTE_PRIVATE;
> > > > +
> > > > +   if (xa_to_value(entry) & KVM_MEMORY_ATTRIBUTE_PRIVATE) {
> > > > +   if (!expect_private)
> > > > +   return false;
> > > > +   } else if (expect_private)
> > > > +   return false;
> > > > +
> > > > +   return true;
> > > > +}
> > > > +
> > > > +static bool mem_attrs_mixed_2m(struct kvm *kvm, unsigned long attrs,
> > > > +  gfn_t start, gfn_t end)
> > > > +

Re: [PATCH v10 6/9] KVM: Unmap existing mappings when change the memory attributes

2022-12-08 Thread Chao Peng
On Wed, Dec 07, 2022 at 04:13:14PM +0800, Yuan Yao wrote:
> On Fri, Dec 02, 2022 at 02:13:44PM +0800, Chao Peng wrote:
> > Unmap the existing guest mappings when memory attribute is changed
> > between shared and private. This is needed because shared pages and
> > private pages are from different backends, unmapping existing ones
> > gives a chance for page fault handler to re-populate the mappings
> > according to the new attribute.
> >
> > Only architecture has private memory support needs this and the
> > supported architecture is expected to rewrite the weak
> > kvm_arch_has_private_mem().
> >
> > Also, during memory attribute changing and the unmapping time frame,
> > page fault handler may happen in the same memory range and can cause
> > incorrect page state, invoke kvm_mmu_invalidate_* helpers to let the
> > page fault handler retry during this time frame.
> >
> > Signed-off-by: Chao Peng 
> > ---
> >  include/linux/kvm_host.h |   7 +-
> >  virt/kvm/kvm_main.c  | 168 ++-
> >  2 files changed, 116 insertions(+), 59 deletions(-)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 3d69484d2704..3331c0c92838 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -255,7 +255,6 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t 
> > cr2_or_gpa,
> >  int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
> >  #endif
> >
> > -#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
> >  struct kvm_gfn_range {
> > struct kvm_memory_slot *slot;
> > gfn_t start;
> > @@ -264,6 +263,8 @@ struct kvm_gfn_range {
> > bool may_block;
> >  };
> >  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
> > +
> > +#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
> >  bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
> >  bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
> >  bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
> > @@ -785,11 +786,12 @@ struct kvm {
> >
> >  #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
> > struct mmu_notifier mmu_notifier;
> > +#endif
> > unsigned long mmu_invalidate_seq;
> > long mmu_invalidate_in_progress;
> > gfn_t mmu_invalidate_range_start;
> > gfn_t mmu_invalidate_range_end;
> > -#endif
> > +
> > struct list_head devices;
> > u64 manual_dirty_log_protect;
> > struct dentry *debugfs_dentry;
> > @@ -1480,6 +1482,7 @@ bool kvm_arch_dy_has_pending_interrupt(struct 
> > kvm_vcpu *vcpu);
> >  int kvm_arch_post_init_vm(struct kvm *kvm);
> >  void kvm_arch_pre_destroy_vm(struct kvm *kvm);
> >  int kvm_arch_create_vm_debugfs(struct kvm *kvm);
> > +bool kvm_arch_has_private_mem(struct kvm *kvm);
> >
> >  #ifndef __KVM_HAVE_ARCH_VM_ALLOC
> >  /*
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index ad55dfbc75d7..4e1e1e113bf0 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -520,6 +520,62 @@ void kvm_destroy_vcpus(struct kvm *kvm)
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_destroy_vcpus);
> >
> > +void kvm_mmu_invalidate_begin(struct kvm *kvm)
> > +{
> > +   /*
> > +* The count increase must become visible at unlock time as no
> > +* spte can be established without taking the mmu_lock and
> > +* count is also read inside the mmu_lock critical section.
> > +*/
> > +   kvm->mmu_invalidate_in_progress++;
> > +
> > +   if (likely(kvm->mmu_invalidate_in_progress == 1)) {
> > +   kvm->mmu_invalidate_range_start = INVALID_GPA;
> > +   kvm->mmu_invalidate_range_end = INVALID_GPA;
> > +   }
> > +}
> > +
> > +void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end)
> > +{
> > +   WARN_ON_ONCE(!kvm->mmu_invalidate_in_progress);
> > +
> > +   if (likely(kvm->mmu_invalidate_in_progress == 1)) {
> > +   kvm->mmu_invalidate_range_start = start;
> > +   kvm->mmu_invalidate_range_end = end;
> > +   } else {
> > +   /*
> > +* Fully tracking multiple concurrent ranges has diminishing
> > +* returns. Keep things simple and just find the minimal range
> > +* which includes the current and new ranges. As there won't be
> > +* enough information to subtract a range after its invalidate
> > +* completes, any ranges invalidated concurrently will
> > +* accumulate and persist until all outstanding invalidates
> > +* complete.
> > +*/
> > +   kvm->mmu_invalidate_range_start =
> > +   min(kvm->mmu_invalidate_range_start, start);
> > +   kvm->mmu_invalidate_range_end =
> > +   max(kvm->mmu_invalidate_range_end, end);
> > +   }
> > +}
> > +
> > +void kvm_mmu_invalidate_end(struct kvm *kvm)
> > +{
> > +   /*
> > +* This sequence increase will notify the kvm page fault that
> > +* the page that is going to be mapped in the spte could have
> > +* been freed.
> > +*/

Re: [PATCH v10 8/9] KVM: Handle page fault for private memory

2022-12-08 Thread Chao Peng
On Thu, Dec 08, 2022 at 10:29:18AM +0800, Yuan Yao wrote:
> On Fri, Dec 02, 2022 at 02:13:46PM +0800, Chao Peng wrote:
> > A KVM_MEM_PRIVATE memslot can include both fd-based private memory and
> > hva-based shared memory. Architecture code (like TDX code) can tell
> > whether the on-going fault is private or not. This patch adds a
> > 'is_private' field to kvm_page_fault to indicate this and architecture
> > code is expected to set it.
> >
> > To handle page fault for such memslot, the handling logic is different
> > depending on whether the fault is private or shared. KVM checks if
> > 'is_private' matches the host's view of the page (maintained in
> > mem_attr_array).
> >   - For a successful match, private pfn is obtained with
> > restrictedmem_get_page() and shared pfn is obtained with existing
> > get_user_pages().
> >   - For a failed match, KVM causes a KVM_EXIT_MEMORY_FAULT exit to
> > userspace. Userspace then can convert memory between private/shared
> > in host's view and retry the fault.
> >
> > Co-developed-by: Yu Zhang 
> > Signed-off-by: Yu Zhang 
> > Signed-off-by: Chao Peng 
> > ---
> >  arch/x86/kvm/mmu/mmu.c  | 63 +++--
> >  arch/x86/kvm/mmu/mmu_internal.h | 14 +++-
> >  arch/x86/kvm/mmu/mmutrace.h |  1 +
> >  arch/x86/kvm/mmu/tdp_mmu.c  |  2 +-
> >  include/linux/kvm_host.h| 30 
> >  5 files changed, 105 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 2190fd8c95c0..b1953ebc012e 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -3058,7 +3058,7 @@ static int host_pfn_mapping_level(struct kvm *kvm, 
> > gfn_t gfn,
> >
> >  int kvm_mmu_max_mapping_level(struct kvm *kvm,
> >   const struct kvm_memory_slot *slot, gfn_t gfn,
> > - int max_level)
> > + int max_level, bool is_private)
> >  {
> > struct kvm_lpage_info *linfo;
> > int host_level;
> > @@ -3070,6 +3070,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
> > break;
> > }
> >
> > +   if (is_private)
> > +   return max_level;
> 
> lpage mixed information already saved, so is that possible
> to query info->disallow_lpage without care 'is_private' ?

Actually we already queried info->disallow_lpage just before this
sentence. The check is needed because later in the function we call
host_pfn_mapping_level() which is shared memory specific.

Thanks,
Chao
> 
> > +
> > if (max_level == PG_LEVEL_4K)
> > return PG_LEVEL_4K;
> >
> > @@ -3098,7 +3101,8 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, 
> > struct kvm_page_fault *fault
> >  * level, which will be used to do precise, accurate accounting.
> >  */
> > fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
> > -fault->gfn, 
> > fault->max_level);
> > +fault->gfn, 
> > fault->max_level,
> > +fault->is_private);
> > if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
> > return;
> >
> > @@ -4178,6 +4182,49 @@ void kvm_arch_async_page_ready(struct kvm_vcpu 
> > *vcpu, struct kvm_async_pf *work)
> > kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
> >  }
> >
> > +static inline u8 order_to_level(int order)
> > +{
> > +   BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
> > +
> > +   if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
> > +   return PG_LEVEL_1G;
> > +
> > +   if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
> > +   return PG_LEVEL_2M;
> > +
> > +   return PG_LEVEL_4K;
> > +}
> > +
> > +static int kvm_do_memory_fault_exit(struct kvm_vcpu *vcpu,
> > +   struct kvm_page_fault *fault)
> > +{
> > +   vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
> > +   if (fault->is_private)
> > +   vcpu->run->memory.flags = KVM_MEMORY_EXIT_FLAG_PRIVATE;
> > +   else
> > +   vcpu->run->memory.flags = 0;
> > +   vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT;
> > +   vcpu->run->memory.size = PAGE_SIZE;
> > +   return RET_PF_USER;
> > +}
> > +
> > +static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> > +  struct kvm_page_fault *fault)
> > +{
> > +   int order;
> > +   struct kvm_memory_slot *slot = fault->slot;
> > +
> > +   if (!kvm_slot_can_be_private(slot))
> > +   return kvm_do_memory_fault_exit(vcpu, fault);
> > +
> > +   if (kvm_restricted_mem_get_pfn(slot, fault->gfn, &fault->pfn, &order))
> > +   return RET_PF_RETRY;
> > +
> > +   fault->max_level = min(order_to_level(order), fault->max_level);
> > +   fault->map_writable = !(slot->flags & KVM_MEM_READONLY);
> > +   return RET_PF_CONTINUE;
> > +}
> > +
> >  static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_pa

Re: [PATCH v10 3/9] KVM: Extend the memslot to support fd-based private memory

2022-12-08 Thread Chao Peng
On Thu, Dec 08, 2022 at 04:37:03PM +0800, Xiaoyao Li wrote:
> On 12/2/2022 2:13 PM, Chao Peng wrote:
> 
> ..
> 
> > Together with the change, a new config HAVE_KVM_RESTRICTED_MEM is added
> > and right now it is selected on X86_64 only.
> > 
> 
> From the patch implementation, I have no idea why HAVE_KVM_RESTRICTED_MEM is
> needed.

The reason is we want KVM further controls the feature enabling. An
opt-in CONFIG_RESTRICTEDMEM can cause problem if user sets that for
unsupported architectures.

Here is the original discussion:
https://lore.kernel.org/all/ykjlfu98hzovt...@google.com/

Thanks,
Chao



Re: [PATCH v2 00/12] qemu-img info: Show protocol-level information

2022-12-08 Thread Hanna Reitz

On 20.06.22 18:26, Hanna Reitz wrote:

Hi,

This series is a v2 to:

https://lists.nongnu.org/archive/html/qemu-block/2022-05/msg00042.html


Ping, it looks like this still applies (to the master branch and kevin’s 
block-next branch at least).


Hanna


So the final state is that despite the QAPI changes, hopefully only the
qemu-img info output changes, and it now prints every image node’s
subgraph.  This results in an output like the following (for a qcow2
image):

image: test.qcow2
file format: qcow2
virtual size: 64 MiB (67108864 bytes)
disk size: 196 KiB
cluster_size: 65536
Format specific information:
 compat: 1.1
 compression type: zlib
 lazy refcounts: false
 refcount bits: 16
 corrupt: false
 extended l2: false
Child node '/file':
 filename: test.qcow2
 protocol type: file
 file length: 192 KiB (197120 bytes)
 disk size: 196 KiB
 Format specific information:
 extent size hint: 1048576





[PATCH v2 3/3] hw/nvme: fix missing cq eventidx update

2022-12-08 Thread Klaus Jensen
From: Klaus Jensen 

Prior to reading the shadow doorbell cq head, we have to update the
eventidx. Otherwise, we risk that the driver will skip an mmio doorbell
write. This happens on riscv64, as reported by Guenter.

Adding the missing update to the cq eventidx fixes the issue.

Fixes: 3f7fe8de3d49 ("hw/nvme: Implement shadow doorbell buffer support")
Cc: qemu-sta...@nongnu.org
Cc: qemu-ri...@nongnu.org
Reported-by: Guenter Roeck 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index cfab21b3436e..f6cc766aba4a 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1334,6 +1334,14 @@ static inline void nvme_blk_write(BlockBackend *blk, 
int64_t offset,
 }
 }
 
+static void nvme_update_cq_eventidx(const NvmeCQueue *cq)
+{
+trace_pci_nvme_update_cq_eventidx(cq->cqid, cq->head);
+
+pci_dma_write(PCI_DEVICE(cq->ctrl), cq->ei_addr, &cq->head,
+  sizeof(cq->head));
+}
+
 static void nvme_update_cq_head(NvmeCQueue *cq)
 {
 pci_dma_read(PCI_DEVICE(cq->ctrl), cq->db_addr, &cq->head,
@@ -1355,6 +1363,7 @@ static void nvme_post_cqes(void *opaque)
 hwaddr addr;
 
 if (n->dbbuf_enabled) {
+nvme_update_cq_eventidx(cq);
 nvme_update_cq_head(cq);
 }
 
-- 
2.38.1




[PATCH v2 0/3] hw/nvme: shadow doorbells broken on riscv64

2022-12-08 Thread Klaus Jensen
From: Klaus Jensen 

Guenter reports[1] that hw/nvme is broken on riscv64.

This is a regression since 7.1, so this does not warrent an rc5 for 7.2.
I'm sure Guenter can carry this patch in his tree, and maybe we can get
this out in a stable release.

I really wonder why this issue only shows up on riscv64. We have not
observed this on other platforms (yet).

  [1]: https://lore.kernel.org/qemu-devel/20221207174918.ga1151...@roeck-us.net/

v2:
 - use QOM accessor (Philippe)
 - added some cleanup patches in front

Klaus Jensen (3):
  hw/nvme: use QOM accessors
  hw/nvme: rename shadow doorbell related trace events
  hw/nvme: fix missing cq eventidx update

 hw/nvme/ctrl.c   | 109 +--
 hw/nvme/trace-events |   8 ++--
 2 files changed, 68 insertions(+), 49 deletions(-)

-- 
2.38.1




[PATCH v2 2/3] hw/nvme: rename shadow doorbell related trace events

2022-12-08 Thread Klaus Jensen
From: Klaus Jensen 

Rename the trace events related to writing the event index and reading
the doorbell value to make it more clear that the event is associated
with an actual update (write or read respectively).

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 11 +++
 hw/nvme/trace-events |  8 
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 6b70c1e39831..cfab21b3436e 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1337,8 +1337,9 @@ static inline void nvme_blk_write(BlockBackend *blk, 
int64_t offset,
 static void nvme_update_cq_head(NvmeCQueue *cq)
 {
 pci_dma_read(PCI_DEVICE(cq->ctrl), cq->db_addr, &cq->head,
-sizeof(cq->head));
-trace_pci_nvme_shadow_doorbell_cq(cq->cqid, cq->head);
+ sizeof(cq->head));
+
+trace_pci_nvme_update_cq_head(cq->cqid, cq->head);
 }
 
 static void nvme_post_cqes(void *opaque)
@@ -6147,16 +6148,18 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 
 static void nvme_update_sq_eventidx(const NvmeSQueue *sq)
 {
+trace_pci_nvme_update_sq_eventidx(sq->sqid, sq->tail);
+
 pci_dma_write(PCI_DEVICE(sq->ctrl), sq->ei_addr, &sq->tail,
   sizeof(sq->tail));
-trace_pci_nvme_eventidx_sq(sq->sqid, sq->tail);
 }
 
 static void nvme_update_sq_tail(NvmeSQueue *sq)
 {
 pci_dma_read(PCI_DEVICE(sq->ctrl), sq->db_addr, &sq->tail,
  sizeof(sq->tail));
-trace_pci_nvme_shadow_doorbell_sq(sq->sqid, sq->tail);
+
+trace_pci_nvme_update_sq_tail(sq->sqid, sq->tail);
 }
 
 static void nvme_process_sq(void *opaque)
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index fccb79f48973..b16f2260b4fd 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -84,8 +84,8 @@ pci_nvme_enqueue_event_noqueue(int queued) "queued %d"
 pci_nvme_enqueue_event_masked(uint8_t typ) "type 0x%"PRIx8""
 pci_nvme_no_outstanding_aers(void) "ignoring event; no outstanding AERs"
 pci_nvme_enqueue_req_completion(uint16_t cid, uint16_t cqid, uint32_t dw0, 
uint32_t dw1, uint16_t status) "cid %"PRIu16" cqid %"PRIu16" dw0 0x%"PRIx32" 
dw1 0x%"PRIx32" status 0x%"PRIx16""
-pci_nvme_eventidx_cq(uint16_t cqid, uint16_t new_eventidx) "cqid %"PRIu16" 
new_eventidx %"PRIu16""
-pci_nvme_eventidx_sq(uint16_t sqid, uint16_t new_eventidx) "sqid %"PRIu16" 
new_eventidx %"PRIu16""
+pci_nvme_update_cq_eventidx(uint16_t cqid, uint16_t new_eventidx) "cqid 
%"PRIu16" new_eventidx %"PRIu16""
+pci_nvme_update_sq_eventidx(uint16_t sqid, uint16_t new_eventidx) "sqid 
%"PRIu16" new_eventidx %"PRIu16""
 pci_nvme_mmio_read(uint64_t addr, unsigned size) "addr 0x%"PRIx64" size %d"
 pci_nvme_mmio_write(uint64_t addr, uint64_t data, unsigned size) "addr 
0x%"PRIx64" data 0x%"PRIx64" size %d"
 pci_nvme_mmio_doorbell_cq(uint16_t cqid, uint16_t new_head) "cqid %"PRIu16" 
new_head %"PRIu16""
@@ -102,8 +102,8 @@ pci_nvme_mmio_start_success(void) "setting controller 
enable bit succeeded"
 pci_nvme_mmio_stopped(void) "cleared controller enable bit"
 pci_nvme_mmio_shutdown_set(void) "shutdown bit set"
 pci_nvme_mmio_shutdown_cleared(void) "shutdown bit cleared"
-pci_nvme_shadow_doorbell_cq(uint16_t cqid, uint16_t new_shadow_doorbell) "cqid 
%"PRIu16" new_shadow_doorbell %"PRIu16""
-pci_nvme_shadow_doorbell_sq(uint16_t sqid, uint16_t new_shadow_doorbell) "sqid 
%"PRIu16" new_shadow_doorbell %"PRIu16""
+pci_nvme_update_cq_head(uint16_t cqid, uint16_t new_head) "cqid %"PRIu16" 
new_head %"PRIu16""
+pci_nvme_update_sq_tail(uint16_t sqid, uint16_t new_tail) "sqid %"PRIu16" 
new_tail %"PRIu16""
 pci_nvme_open_zone(uint64_t slba, uint32_t zone_idx, int all) "open zone, 
slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32""
 pci_nvme_close_zone(uint64_t slba, uint32_t zone_idx, int all) "close zone, 
slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32""
 pci_nvme_finish_zone(uint64_t slba, uint32_t zone_idx, int all) "finish zone, 
slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32""
-- 
2.38.1




[PATCH v2 1/3] hw/nvme: use QOM accessors

2022-12-08 Thread Klaus Jensen
From: Klaus Jensen 

Replace various ->parent_obj use with the equivalent QOM accessors.

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 89 +++---
 1 file changed, 48 insertions(+), 41 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index e54276dc1dc7..6b70c1e39831 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -449,7 +449,7 @@ static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void 
*buf, int size)
 return 0;
 }
 
-return pci_dma_read(&n->parent_obj, addr, buf, size);
+return pci_dma_read(PCI_DEVICE(n), addr, buf, size);
 }
 
 static int nvme_addr_write(NvmeCtrl *n, hwaddr addr, const void *buf, int size)
@@ -469,7 +469,7 @@ static int nvme_addr_write(NvmeCtrl *n, hwaddr addr, const 
void *buf, int size)
 return 0;
 }
 
-return pci_dma_write(&n->parent_obj, addr, buf, size);
+return pci_dma_write(PCI_DEVICE(n), addr, buf, size);
 }
 
 static bool nvme_nsid_valid(NvmeCtrl *n, uint32_t nsid)
@@ -514,24 +514,27 @@ static uint8_t nvme_sq_empty(NvmeSQueue *sq)
 
 static void nvme_irq_check(NvmeCtrl *n)
 {
+PCIDevice *pci = PCI_DEVICE(n);
 uint32_t intms = ldl_le_p(&n->bar.intms);
 
-if (msix_enabled(&(n->parent_obj))) {
+if (msix_enabled(pci)) {
 return;
 }
 if (~intms & n->irq_status) {
-pci_irq_assert(&n->parent_obj);
+pci_irq_assert(pci);
 } else {
-pci_irq_deassert(&n->parent_obj);
+pci_irq_deassert(pci);
 }
 }
 
 static void nvme_irq_assert(NvmeCtrl *n, NvmeCQueue *cq)
 {
+PCIDevice *pci = PCI_DEVICE(n);
+
 if (cq->irq_enabled) {
-if (msix_enabled(&(n->parent_obj))) {
+if (msix_enabled(pci)) {
 trace_pci_nvme_irq_msix(cq->vector);
-msix_notify(&(n->parent_obj), cq->vector);
+msix_notify(pci, cq->vector);
 } else {
 trace_pci_nvme_irq_pin();
 assert(cq->vector < 32);
@@ -546,7 +549,7 @@ static void nvme_irq_assert(NvmeCtrl *n, NvmeCQueue *cq)
 static void nvme_irq_deassert(NvmeCtrl *n, NvmeCQueue *cq)
 {
 if (cq->irq_enabled) {
-if (msix_enabled(&(n->parent_obj))) {
+if (msix_enabled(PCI_DEVICE(n))) {
 return;
 } else {
 assert(cq->vector < 32);
@@ -570,7 +573,7 @@ static void nvme_req_clear(NvmeRequest *req)
 static inline void nvme_sg_init(NvmeCtrl *n, NvmeSg *sg, bool dma)
 {
 if (dma) {
-pci_dma_sglist_init(&sg->qsg, &n->parent_obj, 0);
+pci_dma_sglist_init(&sg->qsg, PCI_DEVICE(n), 0);
 sg->flags = NVME_SG_DMA;
 } else {
 qemu_iovec_init(&sg->iov, 0);
@@ -1333,7 +1336,7 @@ static inline void nvme_blk_write(BlockBackend *blk, 
int64_t offset,
 
 static void nvme_update_cq_head(NvmeCQueue *cq)
 {
-pci_dma_read(&cq->ctrl->parent_obj, cq->db_addr, &cq->head,
+pci_dma_read(PCI_DEVICE(cq->ctrl), cq->db_addr, &cq->head,
 sizeof(cq->head));
 trace_pci_nvme_shadow_doorbell_cq(cq->cqid, cq->head);
 }
@@ -1363,7 +1366,7 @@ static void nvme_post_cqes(void *opaque)
 req->cqe.sq_id = cpu_to_le16(sq->sqid);
 req->cqe.sq_head = cpu_to_le16(sq->head);
 addr = cq->dma_addr + cq->tail * n->cqe_size;
-ret = pci_dma_write(&n->parent_obj, addr, (void *)&req->cqe,
+ret = pci_dma_write(PCI_DEVICE(n), addr, (void *)&req->cqe,
 sizeof(req->cqe));
 if (ret) {
 trace_pci_nvme_err_addr_write(addr);
@@ -4615,6 +4618,7 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest 
*req)
 
 static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
 {
+PCIDevice *pci = PCI_DEVICE(n);
 uint16_t offset = (cq->cqid << 3) + (1 << 2);
 
 n->cq[cq->cqid] = NULL;
@@ -4625,8 +4629,8 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
 event_notifier_set_handler(&cq->notifier, NULL);
 event_notifier_cleanup(&cq->notifier);
 }
-if (msix_enabled(&n->parent_obj)) {
-msix_vector_unuse(&n->parent_obj, cq->vector);
+if (msix_enabled(pci)) {
+msix_vector_unuse(pci, cq->vector);
 }
 if (cq->cqid) {
 g_free(cq);
@@ -4664,8 +4668,10 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, 
uint64_t dma_addr,
  uint16_t cqid, uint16_t vector, uint16_t size,
  uint16_t irq_enabled)
 {
-if (msix_enabled(&n->parent_obj)) {
-msix_vector_use(&n->parent_obj, vector);
+PCIDevice *pci = PCI_DEVICE(n);
+
+if (msix_enabled(pci)) {
+msix_vector_use(pci, vector);
 }
 cq->ctrl = n;
 cq->cqid = cqid;
@@ -4716,7 +4722,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest 
*req)
 trace_pci_nvme_err_invalid_create_cq_addr(prp1);
 return NVME_INVALID_PRP_OFFSET | NVME_DNR;
 }
-if (unlikely(!msix_enabled(&n->parent_obj) && vector)) {
+if (unlikely(!msix_enabled(PCI_DEVICE(n)) && vector)) {
 trace_pci_nvme_err_inva

[PATCH] target/i386/gdbstub: Fix a bug about order of FPU stack in 'g' packets.

2022-12-08 Thread TaiseiIto
Before this commit, when GDB attached an OS working on QEMU, order of FPU
stack registers printed by GDB command 'info float' was wrong. There was a
bug causing the problem in 'g' packets sent by QEMU to GDB. The packets have
values of registers of machine emulated by QEMU containing FPU stack
registers. There are 2 ways to specify a x87 FPU stack register. The first
is specifying by absolute indexed register names (R0, ..., R7). The second
is specifying by stack top relative indexed register names (ST0, ..., ST7).
Values of the FPU stack registers should be located in 'g' packet and be
ordered by the relative index. But QEMU had located these registers ordered
by the absolute index. After this commit, when QEMU reads registers to make
a 'g' packet, QEMU specifies FPU stack registers by the relative index.
Then, the registers are ordered correctly in the packet. As a result, GDB,
the packet receiver, can print FPU stack registers in the correct order.

Signed-off-by: TaiseiIto 
---
 target/i386/gdbstub.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/i386/gdbstub.c b/target/i386/gdbstub.c
index c3a2cf6f28..6109ad166d 100644
--- a/target/i386/gdbstub.c
+++ b/target/i386/gdbstub.c
@@ -121,7 +121,9 @@ int x86_cpu_gdb_read_register(CPUState *cs, GByteArray 
*mem_buf, int n)
 return gdb_get_reg32(mem_buf, env->regs[gpr_map32[n]]);
 }
 } else if (n >= IDX_FP_REGS && n < IDX_FP_REGS + 8) {
-floatx80 *fp = (floatx80 *) &env->fpregs[n - IDX_FP_REGS];
+int st_index = n - IDX_FP_REGS;
+int r_index = (st_index + env->fpstt) % 8;
+floatx80 *fp = (floatx80 *) &env->fpregs[r_index];
 int len = gdb_get_reg64(mem_buf, cpu_to_le64(fp->low));
 len += gdb_get_reg16(mem_buf, cpu_to_le16(fp->high));
 return len;
-- 
2.34.1




Re: [PATCH v2 1/3] hw/nvme: use QOM accessors

2022-12-08 Thread Philippe Mathieu-Daudé

On 8/12/22 13:26, Klaus Jensen wrote:

From: Klaus Jensen 

Replace various ->parent_obj use with the equivalent QOM accessors.

Signed-off-by: Klaus Jensen 
---
  hw/nvme/ctrl.c | 89 +++---
  1 file changed, 48 insertions(+), 41 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 2/3] hw/nvme: rename shadow doorbell related trace events

2022-12-08 Thread Philippe Mathieu-Daudé

On 8/12/22 13:26, Klaus Jensen wrote:

From: Klaus Jensen 

Rename the trace events related to writing the event index and reading
the doorbell value to make it more clear that the event is associated
with an actual update (write or read respectively).

Signed-off-by: Klaus Jensen 
---
  hw/nvme/ctrl.c   | 11 +++
  hw/nvme/trace-events |  8 
  2 files changed, 11 insertions(+), 8 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




[PATCH] target/i386: Fix wrong XSAVE feature names

2022-12-08 Thread Xiaocheng Dong
The previous patch changes the name from FEAT_XSAVE_COMP_{LO|HI}
to FEAT_XSAVE_XCR0_{LO|HI}, the changes for CPUID.0x12.0x1 should be
FEAT_XSAVE_XCR0_{LO|HI}, the SGX can't work in VM if these are not right

Fixes: 301e90675c3f ("target/i386: Enable support for XSAVES based features")

Signed-off-by: Xiaocheng Dong 
---
 target/i386/cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 22b681ca37..0f71ff9fea 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5584,8 +5584,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 } else {
 *eax &= env->features[FEAT_SGX_12_1_EAX];
 *ebx &= 0; /* ebx reserve */
-*ecx &= env->features[FEAT_XSAVE_XSS_LO];
-*edx &= env->features[FEAT_XSAVE_XSS_HI];
+*ecx &= env->features[FEAT_XSAVE_XCR0_LO];
+*edx &= env->features[FEAT_XSAVE_XCR0_HI];
 
 /* FP and SSE are always allowed regardless of XSAVE/XCR0. */
 *ecx |= XSTATE_FP_MASK | XSTATE_SSE_MASK;
-- 
2.31.1




[PATCH] util/oslib-win32: Remove obsolete reference to g_poll code

2022-12-08 Thread Thomas Huth
The comment about g_poll is not required here anymore since
the corresponding code has been removed a while ago already.

Fixes: b4c6036faa ("configure: bump min required glib version to 2.56")
Signed-off-by: Thomas Huth 
---
 util/oslib-win32.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index a67cb3822e..07ade41800 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -24,10 +24,6 @@
  * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  * THE SOFTWARE.
- *
- * The implementation of g_poll (functions poll_rest, g_poll) at the end of
- * this file are based on code from GNOME glib-2 and use a different license,
- * see the license comment there.
  */
 
 #include "qemu/osdep.h"
-- 
2.31.1




WHPX CPUID-0x40000000

2022-12-08 Thread Qi, Yadong

Hi, all
May I know why the Hypervisor vendor signature value is cleared to 0 in 
whpx-all.c?


https://gitlab.com/qemu-project/qemu/-/blob/master/target/i386/whpx/whpx-all.c#L1963
switch (cpuid_fn) {
case 0x4000:
/* Expose the vmware cpufrequency cpuid leaf */
rax = 0x4010;
rbx = rcx = rdx = 0;
break;


--
Best Regard
Yadong



[PATCH] gitlab-ci: Disable docs and GUIs for the build-tci and build-tcg-disabled jobs

2022-12-08 Thread Thomas Huth
These jobs use their own "script:" section and thus do not profit from
the global "--disable-docs" from the template. While we're at it, disable
also some GUI front ends here since we do not gain any additional test
coverage by compiling those here again.

Signed-off-by: Thomas Huth 
---
 .gitlab-ci.d/buildtest.yml | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index d21b4a1fd4..8d60c9c0a6 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -244,6 +244,7 @@ build-tcg-disabled:
 - mkdir build
 - cd build
 - ../configure --disable-tcg --audio-drv-list="" --with-coroutine=ucontext
+   --disable-docs --disable-sdl --disable-gtk --disable-vnc
   || { cat config.log meson-logs/meson-log.txt && exit 1; }
 - make -j"$JOBS"
 - make check-unit
@@ -534,8 +535,9 @@ build-tci:
 - TARGETS="aarch64 alpha arm hppa m68k microblaze ppc64 s390x x86_64"
 - mkdir build
 - cd build
-- ../configure --enable-tcg-interpreter
---target-list="$(for tg in $TARGETS; do echo -n ${tg}'-softmmu '; 
done)" || { cat config.log meson-logs/meson-log.txt && exit 1; }
+- ../configure --enable-tcg-interpreter --disable-docs --disable-gtk 
--disable-vnc
+--target-list="$(for tg in $TARGETS; do echo -n ${tg}'-softmmu '; 
done)"
+|| { cat config.log meson-logs/meson-log.txt && exit 1; }
 - make -j"$JOBS"
 - make tests/qtest/boot-serial-test tests/qtest/cdrom-test 
tests/qtest/pxe-test
 - for tg in $TARGETS ; do
-- 
2.31.1




Re: [PATCH v3 02/13] tcg/s390x: Remove TCG_REG_TB

2022-12-08 Thread Ilya Leoshkevich
On Tue, 2022-12-06 at 18:42 -0600, Richard Henderson wrote:
> On 12/6/22 16:22, Richard Henderson wrote:
> > > Wouldn't it be worth keeping XILF/XIFH here?
> > 
> > I don't know.  It's difficult for me to guess whether a dependency
> > chain like
> > 
> >  val -> xor -> xor
> > 
> > (3 insns with serial dependencies) is better than
> > 
> >  val   --> xor
> >  load  -/
> > 
> > (3 insns, but only one serial dependency) is better.  But there may
> > also be instruction 
> > fusion going on at the micro-architectural level, so that there's
> > really only one xor.
> > 
> > If you have suggestions, I'm all ears.
> 
> Related microarchitectural question:
> 
> If a 32-bit insn and a 64-bit insn have a similar size encoding (and
> perhaps even if they 
> don't), is it better to produce a 64-bit output so that the hw
> doesn't have a false 
> dependency on the upper 32-bits of the register?
> 
> Just wondering whether most of the distinction between 32-bit and 64-
> bit opcodes ought to 
> be discarded, simplifying code generation.  The only items that seem
> most likely to have 
> real execution time differences are multiply and divide.

I think this will definitely lead to false dependencies if one produces
32 bits in one place, and then consumes 64 in the other. But if this
idea is applied consistently, then there should be hopefully not so
many such instances?

Two thing come to mind here: memory and CC generation.

The first is probably not very important: we can implement 32-bit loads
with lgf, which sign-extends to 64 bits.

CC generation can be tricker: for conditional jumps it's still going to
be okay, since the code would produce 64-bit values and consume 32-bit
ones, but if the back-end needs a CC from a 32-bit addition, then we
would probably need to sign-extend the result in order to get rid of a
false dependency later on. However, based on a quick inspection I could
not find any such instances.

So using 64-bit operations instead of 32-bit ones would be an
interesting experiment.

Best regards,
Ilya



Re: [PATCH] util/oslib-win32: Remove obsolete reference to g_poll code

2022-12-08 Thread Philippe Mathieu-Daudé

On 8/12/22 14:32, Thomas Huth wrote:

The comment about g_poll is not required here anymore since
the corresponding code has been removed a while ago already.

Fixes: b4c6036faa ("configure: bump min required glib version to 2.56")
Signed-off-by: Thomas Huth 
---
  util/oslib-win32.c | 4 
  1 file changed, 4 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 





[PATCH 1/4] coroutine: Clean up superfluous inclusion of qemu/coroutine.h

2022-12-08 Thread Markus Armbruster
Signed-off-by: Markus Armbruster 
---
 crypto/block-luks-priv.h| 1 -
 include/block/raw-aio.h | 1 -
 include/scsi/pr-manager.h   | 1 -
 nbd/nbd-internal.h  | 1 -
 blockjob.c  | 1 -
 crypto/block-luks.c | 1 -
 hw/9pfs/codir.c | 1 -
 hw/9pfs/cofile.c| 1 -
 hw/9pfs/cofs.c  | 1 -
 hw/9pfs/coxattr.c   | 1 -
 tests/unit/test-coroutine.c | 1 -
 tests/unit/test-vmstate.c   | 1 -
 util/qemu-coroutine-lock.c  | 1 -
 util/qemu-coroutine-sleep.c | 1 -
 util/qemu-coroutine.c   | 1 -
 15 files changed, 15 deletions(-)

diff --git a/crypto/block-luks-priv.h b/crypto/block-luks-priv.h
index 90a20d432b..dc2dd14e52 100644
--- a/crypto/block-luks-priv.h
+++ b/crypto/block-luks-priv.h
@@ -31,7 +31,6 @@
 #include "crypto/random.h"
 #include "qemu/uuid.h"
 
-#include "qemu/coroutine.h"
 #include "qemu/bitmap.h"
 
 /*
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index 21fc10c4c9..f8cda9df91 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -17,7 +17,6 @@
 #define QEMU_RAW_AIO_H
 
 #include "block/aio.h"
-#include "qemu/coroutine.h"
 #include "qemu/iov.h"
 
 /* AIO request types */
diff --git a/include/scsi/pr-manager.h b/include/scsi/pr-manager.h
index e4ecbe00f6..45de28d354 100644
--- a/include/scsi/pr-manager.h
+++ b/include/scsi/pr-manager.h
@@ -5,7 +5,6 @@
 #include "qapi/visitor.h"
 #include "qom/object_interfaces.h"
 #include "block/aio.h"
-#include "qemu/coroutine.h"
 
 #define TYPE_PR_MANAGER "pr-manager"
 
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 1b2141ab4b..df42fef706 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -13,7 +13,6 @@
 #include "sysemu/block-backend.h"
 #include "io/channel-tls.h"
 
-#include "qemu/coroutine.h"
 #include "qemu/iov.h"
 
 #ifndef _WIN32
diff --git a/blockjob.c b/blockjob.c
index f51d4e18f3..c3d3e14a92 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -32,7 +32,6 @@
 #include "qapi/error.h"
 #include "qapi/qapi-events-block-core.h"
 #include "qapi/qmp/qerror.h"
-#include "qemu/coroutine.h"
 #include "qemu/main-loop.h"
 #include "qemu/timer.h"
 
diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index df2b4105d6..2ee679a2fa 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -32,7 +32,6 @@
 #include "crypto/random.h"
 #include "qemu/uuid.h"
 
-#include "qemu/coroutine.h"
 #include "qemu/bitmap.h"
 
 /*
diff --git a/hw/9pfs/codir.c b/hw/9pfs/codir.c
index 93ba44fb75..7ba63be489 100644
--- a/hw/9pfs/codir.c
+++ b/hw/9pfs/codir.c
@@ -19,7 +19,6 @@
 #include "qemu/osdep.h"
 #include "fsdev/qemu-fsdev.h"
 #include "qemu/thread.h"
-#include "qemu/coroutine.h"
 #include "qemu/main-loop.h"
 #include "coth.h"
 #include "9p-xattr.h"
diff --git a/hw/9pfs/cofile.c b/hw/9pfs/cofile.c
index 20f93a90e7..9c5344039e 100644
--- a/hw/9pfs/cofile.c
+++ b/hw/9pfs/cofile.c
@@ -19,7 +19,6 @@
 #include "qemu/osdep.h"
 #include "fsdev/qemu-fsdev.h"
 #include "qemu/thread.h"
-#include "qemu/coroutine.h"
 #include "qemu/main-loop.h"
 #include "coth.h"
 
diff --git a/hw/9pfs/cofs.c b/hw/9pfs/cofs.c
index 9d0adc2e78..67e3ae5c5c 100644
--- a/hw/9pfs/cofs.c
+++ b/hw/9pfs/cofs.c
@@ -19,7 +19,6 @@
 #include "qemu/osdep.h"
 #include "fsdev/qemu-fsdev.h"
 #include "qemu/thread.h"
-#include "qemu/coroutine.h"
 #include "qemu/main-loop.h"
 #include "coth.h"
 
diff --git a/hw/9pfs/coxattr.c b/hw/9pfs/coxattr.c
index dbcd09e0fd..cd0f8488ac 100644
--- a/hw/9pfs/coxattr.c
+++ b/hw/9pfs/coxattr.c
@@ -19,7 +19,6 @@
 #include "qemu/osdep.h"
 #include "fsdev/qemu-fsdev.h"
 #include "qemu/thread.h"
-#include "qemu/coroutine.h"
 #include "qemu/main-loop.h"
 #include "coth.h"
 
diff --git a/tests/unit/test-coroutine.c b/tests/unit/test-coroutine.c
index e16b80c245..513800d3db 100644
--- a/tests/unit/test-coroutine.c
+++ b/tests/unit/test-coroutine.c
@@ -12,7 +12,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu/coroutine.h"
 #include "qemu/coroutine_int.h"
 #include "qemu/lockable.h"
 
diff --git a/tests/unit/test-vmstate.c b/tests/unit/test-vmstate.c
index 541bb4f63e..79357b29ca 100644
--- a/tests/unit/test-vmstate.c
+++ b/tests/unit/test-vmstate.c
@@ -29,7 +29,6 @@
 #include "migration/qemu-file-types.h"
 #include "../migration/qemu-file.h"
 #include "../migration/savevm.h"
-#include "qemu/coroutine.h"
 #include "qemu/module.h"
 #include "io/channel-file.h"
 
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index 45c6b57374..58f3f77181 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -27,7 +27,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu/coroutine.h"
 #include "qemu/coroutine_int.h"
 #include "qemu/processor.h"
 #include "qemu/queue.h"
diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
index 571ab521ff..af59f9af98 100644
--- a/util/qemu-coroutine-sleep.c
+++ b/util/qemu-coroutine-sleep.c
@@ -12,7 +12,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu/coroutine.h"
 #include "qemu/coroutine_int.h"
 #in

[PATCH 3/4] coroutine: Clean up superfluous inclusion of qemu/lockable.h

2022-12-08 Thread Markus Armbruster
Signed-off-by: Markus Armbruster 
---
 include/qemu/progress_meter.h | 2 +-
 block/progress_meter.c| 2 ++
 tests/unit/test-coroutine.c   | 1 -
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/qemu/progress_meter.h b/include/qemu/progress_meter.h
index dadf822bbf..0f2c0a32d2 100644
--- a/include/qemu/progress_meter.h
+++ b/include/qemu/progress_meter.h
@@ -27,7 +27,7 @@
 #ifndef QEMU_PROGRESS_METER_H
 #define QEMU_PROGRESS_METER_H
 
-#include "qemu/lockable.h"
+#include "qemu/thread.h"
 
 typedef struct ProgressMeter {
 /**
diff --git a/block/progress_meter.c b/block/progress_meter.c
index aa2e60248c..31a170a2cd 100644
--- a/block/progress_meter.c
+++ b/block/progress_meter.c
@@ -23,7 +23,9 @@
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  * THE SOFTWARE.
  */
+
 #include "qemu/osdep.h"
+#include "qemu/coroutine.h"
 #include "qemu/progress_meter.h"
 
 void progress_init(ProgressMeter *pm)
diff --git a/tests/unit/test-coroutine.c b/tests/unit/test-coroutine.c
index 513800d3db..b0d21d673a 100644
--- a/tests/unit/test-coroutine.c
+++ b/tests/unit/test-coroutine.c
@@ -13,7 +13,6 @@
 
 #include "qemu/osdep.h"
 #include "qemu/coroutine_int.h"
-#include "qemu/lockable.h"
 
 /*
  * Check that qemu_in_coroutine() works
-- 
2.37.3




[PATCH 0/4] coroutine: Clean up includes

2022-12-08 Thread Markus Armbruster
Markus Armbruster (4):
  coroutine: Clean up superfluous inclusion of qemu/coroutine.h
  coroutine: Move coroutine_fn to qemu/osdep.h, trim includes
  coroutine: Clean up superfluous inclusion of qemu/lockable.h
  coroutine: Break inclusion loop

 crypto/block-luks-priv.h  |  1 -
 include/block/aio_task.h  |  2 --
 include/block/block-common.h  |  1 -
 include/block/raw-aio.h   |  1 -
 include/monitor/hmp.h |  1 -
 include/qemu/coroutine.h  | 21 -
 include/qemu/lockable.h   |  1 -
 include/qemu/osdep.h  | 16 
 include/qemu/progress_meter.h |  2 +-
 include/qemu/ratelimit.h  |  2 +-
 include/qemu/seqlock.h|  2 +-
 include/scsi/pr-manager.h |  1 -
 linux-user/fd-trans.h |  2 +-
 nbd/nbd-internal.h|  1 -
 backends/tpm/tpm_emulator.c   |  2 +-
 block/progress_meter.c|  2 ++
 blockjob.c|  1 -
 cpus-common.c |  2 +-
 crypto/block-luks.c   |  1 -
 hw/9pfs/codir.c   |  1 -
 hw/9pfs/cofile.c  |  1 -
 hw/9pfs/cofs.c|  1 -
 hw/9pfs/coxattr.c |  1 -
 hw/hyperv/hyperv.c|  2 +-
 hw/usb/ccid-card-emulated.c   |  2 +-
 hw/vfio/platform.c|  2 +-
 plugins/core.c|  2 +-
 plugins/loader.c  |  2 +-
 tests/unit/test-coroutine.c   |  2 --
 tests/unit/test-vmstate.c |  1 -
 ui/spice-display.c|  2 +-
 util/log.c|  2 +-
 util/qemu-coroutine-lock.c|  1 -
 util/qemu-coroutine-sleep.c   |  1 -
 util/qemu-coroutine.c |  1 -
 util/qemu-timer.c |  2 +-
 util/rcu.c|  2 +-
 util/vfio-helpers.c   |  2 +-
 util/yank.c   |  2 +-
 39 files changed, 43 insertions(+), 51 deletions(-)

-- 
2.37.3




[PATCH 2/4] coroutine: Move coroutine_fn to qemu/osdep.h, trim includes

2022-12-08 Thread Markus Armbruster
block/block-hmp-cmds.h and qemu/co-shared-resource.h use coroutine_fn
without including qemu/coroutine.h.  They compile only if it's already
included from elsewhere.

I could fix that, but pulling in qemu/coroutine.h and everything it
includes just for a macro that expands into nothing feels silly.
Instead, move the macro to qemu/osdep.h.

Inclusions of qemu/coroutine.h just for coroutine_fn become
superfluous.  Drop them.

Signed-off-by: Markus Armbruster 
---
 include/block/aio_task.h |  2 --
 include/block/block-common.h |  1 -
 include/monitor/hmp.h|  1 -
 include/qemu/coroutine.h | 18 +++---
 include/qemu/osdep.h | 16 
 5 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/include/block/aio_task.h b/include/block/aio_task.h
index 50bc1e1817..18a9c41f4e 100644
--- a/include/block/aio_task.h
+++ b/include/block/aio_task.h
@@ -25,8 +25,6 @@
 #ifndef BLOCK_AIO_TASK_H
 #define BLOCK_AIO_TASK_H
 
-#include "qemu/coroutine.h"
-
 typedef struct AioTaskPool AioTaskPool;
 typedef struct AioTask AioTask;
 typedef int coroutine_fn (*AioTaskFunc)(AioTask *task);
diff --git a/include/block/block-common.h b/include/block/block-common.h
index 297704c1e9..0f6b8422bd 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -27,7 +27,6 @@
 #include "block/aio.h"
 #include "block/aio-wait.h"
 #include "qemu/iov.h"
-#include "qemu/coroutine.h"
 #include "block/accounting.h"
 #include "block/dirty-bitmap.h"
 #include "block/blockjob.h"
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index dfbc0c9a2f..c92f69da8b 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -15,7 +15,6 @@
 #define HMP_H
 
 #include "qemu/readline.h"
-#include "qemu/coroutine.h"
 #include "qapi/qapi-types-common.h"
 
 bool hmp_handle_error(Monitor *mon, Error *err);
diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 89650a2d7f..2496a4f4ef 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -26,23 +26,19 @@
  * waiting for events to complete.
  *
  * These functions are re-entrant and may be used outside the global mutex.
- */
-
-/**
- * Mark a function that executes in coroutine context
  *
- * Functions that execute in coroutine context cannot be called directly from
- * normal functions.  In the future it would be nice to enable compiler or
- * static checker support for catching such errors.  This annotation might make
- * it possible and in the meantime it serves as documentation.
- *
- * For example:
+ * Functions that execute in coroutine context cannot be called
+ * directly from normal functions.  Use @coroutine_fn to mark such
+ * functions.  For example:
  *
  *   static void coroutine_fn foo(void) {
  *   
  *   }
+ *
+ * In the future it would be nice to have the compiler or a static
+ * checker catch misuse of such functions.  This annotation might make
+ * it possible and in the meantime it serves as documentation.
  */
-#define coroutine_fn
 
 typedef struct Coroutine Coroutine;
 
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index b9c4307779..8e97e5d79a 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -157,6 +157,22 @@ extern "C" {
 
 #include "qemu/typedefs.h"
 
+/**
+ * Mark a function that executes in coroutine context
+ *
+ * Functions that execute in coroutine context cannot be called directly from
+ * normal functions.  In the future it would be nice to enable compiler or
+ * static checker support for catching such errors.  This annotation might make
+ * it possible and in the meantime it serves as documentation.
+ *
+ * For example:
+ *
+ *   static void coroutine_fn foo(void) {
+ *   
+ *   }
+ */
+#define coroutine_fn
+
 /*
  * For mingw, as of v6.0.0, the function implementing the assert macro is
  * not marked as noreturn, so the compiler cannot delete code following an
-- 
2.37.3




[PATCH 4/4] coroutine: Break inclusion loop

2022-12-08 Thread Markus Armbruster
qemu/coroutine.h and qemu/lockable.h include each other.  Neither
header actually needs the other one.

Drop #include "qemu/coroutine.h" from qemu/lockable.h to break the
loop.  All users of lockable.h actually need qemu/coroutine.h, so
adjust their #include directives.

I'm not dropping the #include "qemu/lockable" from qemu/coroutine.h,
because that would require adding it back next to #include
"qemu/coroutine.h" all over the place.  It's in an unusual place,
though.  Move it to the usual place, next to the other #include
directives.

Signed-off-by: Markus Armbruster 
---
 include/qemu/coroutine.h| 3 +--
 include/qemu/lockable.h | 1 -
 include/qemu/ratelimit.h| 2 +-
 include/qemu/seqlock.h  | 2 +-
 linux-user/fd-trans.h   | 2 +-
 backends/tpm/tpm_emulator.c | 2 +-
 cpus-common.c   | 2 +-
 hw/hyperv/hyperv.c  | 2 +-
 hw/usb/ccid-card-emulated.c | 2 +-
 hw/vfio/platform.c  | 2 +-
 plugins/core.c  | 2 +-
 plugins/loader.c| 2 +-
 ui/spice-display.c  | 2 +-
 util/log.c  | 2 +-
 util/qemu-timer.c   | 2 +-
 util/rcu.c  | 2 +-
 util/vfio-helpers.c | 2 +-
 util/yank.c | 2 +-
 18 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 2496a4f4ef..2504d4757f 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -15,6 +15,7 @@
 #ifndef QEMU_COROUTINE_H
 #define QEMU_COROUTINE_H
 
+#include "qemu/lockable.h"
 #include "qemu/queue.h"
 #include "qemu/timer.h"
 
@@ -376,8 +377,6 @@ void qemu_coroutine_inc_pool_size(unsigned int 
additional_pool_size);
  */
 void qemu_coroutine_dec_pool_size(unsigned int additional_pool_size);
 
-#include "qemu/lockable.h"
-
 /**
  * Sends a (part of) iovec down a socket, yielding when the socket is full, or
  * Receives data into a (part of) iovec from a socket,
diff --git a/include/qemu/lockable.h b/include/qemu/lockable.h
index 86db7cb04c..7d6cdeb750 100644
--- a/include/qemu/lockable.h
+++ b/include/qemu/lockable.h
@@ -13,7 +13,6 @@
 #ifndef QEMU_LOCKABLE_H
 #define QEMU_LOCKABLE_H
 
-#include "qemu/coroutine.h"
 #include "qemu/thread.h"
 
 typedef void QemuLockUnlockFunc(void *);
diff --git a/include/qemu/ratelimit.h b/include/qemu/ratelimit.h
index 48bf59e857..4e07e1e2a4 100644
--- a/include/qemu/ratelimit.h
+++ b/include/qemu/ratelimit.h
@@ -14,7 +14,7 @@
 #ifndef QEMU_RATELIMIT_H
 #define QEMU_RATELIMIT_H
 
-#include "qemu/lockable.h"
+#include "qemu/coroutine.h"
 #include "qemu/timer.h"
 
 typedef struct {
diff --git a/include/qemu/seqlock.h b/include/qemu/seqlock.h
index ecb7d2c864..79a0af625b 100644
--- a/include/qemu/seqlock.h
+++ b/include/qemu/seqlock.h
@@ -16,7 +16,7 @@
 
 #include "qemu/atomic.h"
 #include "qemu/thread.h"
-#include "qemu/lockable.h"
+#include "qemu/coroutine.h"
 
 typedef struct QemuSeqLock QemuSeqLock;
 
diff --git a/linux-user/fd-trans.h b/linux-user/fd-trans.h
index 1b9fa2041c..e662d644bc 100644
--- a/linux-user/fd-trans.h
+++ b/linux-user/fd-trans.h
@@ -16,7 +16,7 @@
 #ifndef FD_TRANS_H
 #define FD_TRANS_H
 
-#include "qemu/lockable.h"
+#include "qemu/coroutine.h"
 
 typedef abi_long (*TargetFdDataFunc)(void *, size_t);
 typedef abi_long (*TargetFdAddrFunc)(void *, abi_ulong, socklen_t);
diff --git a/backends/tpm/tpm_emulator.c b/backends/tpm/tpm_emulator.c
index 49cc3d749d..f364b089ad 100644
--- a/backends/tpm/tpm_emulator.c
+++ b/backends/tpm/tpm_emulator.c
@@ -30,7 +30,7 @@
 #include "qemu/error-report.h"
 #include "qemu/module.h"
 #include "qemu/sockets.h"
-#include "qemu/lockable.h"
+#include "qemu/coroutine.h"
 #include "io/channel-socket.h"
 #include "sysemu/runstate.h"
 #include "sysemu/tpm_backend.h"
diff --git a/cpus-common.c b/cpus-common.c
index 793364dc0e..1b1f8f75c9 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -22,7 +22,7 @@
 #include "exec/cpu-common.h"
 #include "hw/core/cpu.h"
 #include "sysemu/cpus.h"
-#include "qemu/lockable.h"
+#include "qemu/coroutine.h"
 
 static QemuMutex qemu_cpu_list_lock;
 static QemuCond exclusive_cond;
diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
index 57b402b956..41f3e9f1ca 100644
--- a/hw/hyperv/hyperv.c
+++ b/hw/hyperv/hyperv.c
@@ -15,7 +15,7 @@
 #include "sysemu/kvm.h"
 #include "qemu/bitops.h"
 #include "qemu/error-report.h"
-#include "qemu/lockable.h"
+#include "qemu/coroutine.h"
 #include "qemu/queue.h"
 #include "qemu/rcu.h"
 #include "qemu/rcu_queue.h"
diff --git a/hw/usb/ccid-card-emulated.c b/hw/usb/ccid-card-emulated.c
index ee41a81801..6962143c29 100644
--- a/hw/usb/ccid-card-emulated.c
+++ b/hw/usb/ccid-card-emulated.c
@@ -30,7 +30,7 @@
 #include 
 
 #include "qemu/thread.h"
-#include "qemu/lockable.h"
+#include "qemu/coroutine.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
 #include "ccid.h"
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 5af73f9287..e252edc04a 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -22,7 +22,7 @@

Re: [RFC PATCH] migration: reduce time of loading non-iterable vmstate

2022-12-08 Thread Chuang Xu


On 2022/12/8 上午6:08, Peter Xu wrote:

On Thu, Dec 08, 2022 at 12:07:03AM +0800, Chuang Xu wrote:

On 2022/12/6 上午12:28, Peter Xu wrote:

Chuang,

No worry on the delay; you're faster than when I read yours. :)

On Mon, Dec 05, 2022 at 02:56:15PM +0800, Chuang Xu wrote:

As a start, maybe you can try with poison address_space_to_flatview() (by
e.g. checking the start_pack_mr_change flag and assert it is not set)
during this process to see whether any call stack can even try to
dereference a flatview.

It's just that I didn't figure a good way to "prove" its validity, even if
I think this is an interesting idea worth thinking to shrink the downtime.

Thanks for your sugguestions!
I used a thread local variable to identify whether the current thread is a
migration thread(main thread of target qemu) and I modified the code of
qemu_coroutine_switch to make sure the thread local variable true only in
process_incoming_migration_co call stack. If the target qemu detects that
start_pack_mr_change is set and address_space_to_flatview() is called in
non-migrating threads or non-migrating coroutine, it will crash.

Are you using the thread var just to avoid the assert triggering in the
migration thread when commiting memory changes?

I think _maybe_ another cleaner way to sanity check this is directly upon
the depth:

static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
  /*
   * Before using any flatview, sanity check we're not during a memory
   * region transaction or the map can be invalid.  Note that this can
   * also be called during commit phase of memory transaction, but that
   * should also only happen when the depth decreases to 0 first.
   */
  assert(memory_region_transaction_depth == 0);
  return qatomic_rcu_read(&as->current_map);
}

That should also cover the safe cases of memory transaction commits during
migration.


Peter, I tried this way and found that the target qemu will crash.

Here is the gdb backtrace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7ff2929d851a in __GI_abort () at abort.c:118
#2  0x7ff2929cfe67 in __assert_fail_base (fmt=, 
assertion=assertion@entry=0x55a32578cdc0 "memory_region_transaction_depth == 0", 
file=file@entry=0x55a32575d9b0 "/data00/migration/qemu-5.2.0/include/exec/memory.h",
 line=line@entry=766, function=function@entry=0x55a32578d6e0 
<__PRETTY_FUNCTION__.20463> "address_space_to_flatview") at assert.c:92
#3  0x7ff2929cff12 in __GI___assert_fail (assertion=assertion@entry=0x55a32578cdc0 
"memory_region_transaction_depth == 0", file=file@entry=0x55a32575d9b0 
"/data00/migration/qemu-5.2.0/include/exec/memory.h", line=line@entry=766,
 function=function@entry=0x55a32578d6e0 <__PRETTY_FUNCTION__.20463> 
"address_space_to_flatview") at assert.c:101
#4  0x55a324b2ed5e in address_space_to_flatview (as=0x55a326132580 
) at 
/data00/migration/qemu-5.2.0/include/exec/memory.h:766
#5  0x55a324e79559 in address_space_to_flatview (as=0x55a326132580 
) at ../softmmu/memory.c:811
#6  address_space_get_flatview (as=0x55a326132580 ) at 
../softmmu/memory.c:805
#7  0x55a324e96474 in address_space_cache_init (cache=cache@entry=0x55a32a4fb000, 
as=, addr=addr@entry=68404985856, len=len@entry=4096, 
is_write=false) at ../softmmu/physmem.c:3307
#8  0x55a324ea9cba in virtio_init_region_cache (vdev=0x55a32985d9a0, n=0) 
at ../hw/virtio/virtio.c:185
#9  0x55a324eaa615 in virtio_load (vdev=0x55a32985d9a0, f=, 
version_id=) at ../hw/virtio/virtio.c:3203
#10 0x55a324c6ab96 in vmstate_load_state (f=f@entry=0x55a329dc0c00, 
vmsd=0x55a325fc1a60 , opaque=0x55a32985d9a0, version_id=1) 
at ../migration/vmstate.c:143
#11 0x55a324cda138 in vmstate_load (f=0x55a329dc0c00, se=0x55a329941c90) at 
../migration/savevm.c:913
#12 0x55a324cdda34 in qemu_loadvm_section_start_full (mis=0x55a3284ef9e0, 
f=0x55a329dc0c00) at ../migration/savevm.c:2741
#13 qemu_loadvm_state_main (f=f@entry=0x55a329dc0c00, 
mis=mis@entry=0x55a3284ef9e0) at ../migration/savevm.c:2939
#14 0x55a324cdf66a in qemu_loadvm_state (f=0x55a329dc0c00) at 
../migration/savevm.c:3021
#15 0x55a324d14b4e in process_incoming_migration_co (opaque=) at ../migration/migration.c:574
#16 0x55a32501ae3b in coroutine_trampoline (i0=, i1=) at ../util/coroutine-ucontext.c:173
#17 0x7ff2929e8000 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#18 0x7ffed80dc2a0 in ?? ()
#19 0x in ?? ()

address_space_cache_init() is the only caller of address_space_to_flatview
I can find in vmstate_load call stack so far. Although I think the mr used
by address_space_cache_init() won't be affected by the delay of
memory_region_transaction_commit(), we really need a mechanism to prevent
the modified mr from being used.

Maybe we can build a stale list:
If a subregion is added, add its parent to the stale list(considering that
new subregion's priority has uncertain effects on flatviews).
If a subregion 

[PATCH 3/3] include/hw/block: Include hw/block/block.h where needed

2022-12-08 Thread Markus Armbruster
hw/block/swim.h needs BlockConf.

Signed-off-by: Markus Armbruster 
---
 include/hw/block/swim.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/hw/block/swim.h b/include/hw/block/swim.h
index c1bd5f6555..9b3dcb029d 100644
--- a/include/hw/block/swim.h
+++ b/include/hw/block/swim.h
@@ -11,6 +11,7 @@
 #ifndef SWIM_H
 #define SWIM_H
 
+#include "hw/block/block.h"
 #include "hw/sysbus.h"
 #include "qom/object.h"
 
-- 
2.37.3




[PATCH 1/3] include/block: Untangle inclusion loops

2022-12-08 Thread Markus Armbruster
We have two inclusion loops:

   block/block.h
-> block/block-global-state.h
-> block/block-common.h
-> block/blockjob.h
-> block/block.h

   block/block.h
-> block/block-io.h
-> block/block-common.h
-> block/blockjob.h
-> block/block.h

I believe these go back to Emanuele's reorganization of the block API,
merged a few months ago in commit d7e2fe4aac8.

Fortunately, breaking them is merely a matter of deleting unnecessary
includes from headers, and adding them back in places where they are
now missing.

Signed-off-by: Markus Armbruster 
---
 include/block/block-common.h   | 10 ++
 include/block/block-copy.h |  4 ++--
 include/block/block-global-state.h |  4 +++-
 include/block/block-hmp-cmds.h |  2 ++
 include/block/block-io.h   |  5 -
 include/block/block.h  |  4 ++--
 include/block/block_backup.h   |  2 +-
 include/block/block_int-common.h   | 14 +-
 include/block/block_int-global-state.h |  7 +--
 include/block/block_int-io.h   |  4 +++-
 include/block/block_int.h  |  4 ++--
 include/block/blockjob.h   |  2 +-
 include/block/blockjob_int.h   |  1 -
 include/block/qapi.h   |  2 +-
 include/block/thread-pool.h|  2 +-
 include/block/throttle-groups.h|  2 +-
 include/sysemu/block-backend-io.h  |  1 +
 block.c|  1 +
 block/amend.c  |  1 +
 block/backup.c |  1 +
 block/blkdebug.c   |  1 +
 block/blklogwrites.c   |  1 +
 block/blkreplay.c  |  1 +
 block/blkverify.c  |  1 +
 block/block-copy.c |  4 
 block/bochs.c  |  1 +
 block/cloop.c  |  1 +
 block/copy-before-write.c  |  1 +
 block/copy-on-read.c   |  1 +
 block/curl.c   |  1 +
 block/dirty-bitmap.c   |  2 ++
 block/dmg.c|  1 +
 block/export/fuse.c|  3 ++-
 block/file-posix.c |  1 +
 block/file-win32.c |  1 +
 block/filter-compress.c|  1 +
 block/gluster.c|  1 +
 block/io.c |  1 +
 block/iscsi.c  |  1 +
 block/mirror.c |  1 +
 block/monitor/bitmap-qmp-cmds.c|  2 ++
 block/nfs.c|  1 +
 block/null.c   |  1 +
 block/nvme.c   |  1 +
 block/parallels-ext.c  |  2 ++
 block/preallocate.c|  1 +
 block/qapi-sysemu.c|  1 +
 block/qapi.c   |  1 +
 block/qcow2-bitmap.c   |  2 ++
 block/qcow2-cache.c|  1 +
 block/qcow2-cluster.c  |  1 +
 block/qcow2-refcount.c |  1 +
 block/qcow2-threads.c  |  1 +
 block/qcow2.c  |  1 +
 block/qed-check.c  |  1 +
 block/qed-table.c  |  1 +
 block/raw-format.c |  1 +
 block/rbd.c|  1 +
 block/ssh.c|  1 +
 block/throttle.c   |  2 ++
 block/vhdx-log.c   |  1 +
 block/vvfat.c  |  1 +
 block/win32-aio.c  |  1 +
 block/write-threshold.c|  1 +
 blockdev.c |  1 +
 blockjob.c |  1 +
 hw/block/block.c   |  1 +
 hw/sparc64/niagara.c   |  1 +
 hw/virtio/virtio-pmem.c|  1 +
 migration/block-dirty-bitmap.c |  1 +
 migration/block.c  |  1 +
 migration/savevm.c |  1 +
 monitor/qmp-cmds.c |  1 +
 nbd/server.c   |  2 ++
 qemu-img.c |  1 +
 softmmu/cpus.c |  1 +
 softmmu/physmem.c  |  1 +
 storage-daemon/qemu-storage-daemon.c   |  1 +
 target/i386/kvm/kvm.c  |  1 +
 tests/unit/test-bdrv-drain.c   |  2 +-
 tests/unit/test-block-iothread.c   |  1 +
 81 files changed, 110 insertions(+), 36 deletions(-)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index 0f6b8422bd..8ef6bc21df 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -24,14 +24,8 @@
 #ifndef BLOCK_COMMON_H
 #define BLOCK_COMMON_H
 
-#include "block/aio.h"
-#include "block/aio-wait.h"
-#include "qemu/iov.h"
-#include "block/accounting.h"
-#include "block/dirty-bitmap.h"
-#include "block/blockjob.h"
-#include "qemu/hbitmap.h"
-#include "qemu/transactions.h"
+#includ

[PATCH 0/3] block: Clean up includes

2022-12-08 Thread Markus Armbruster
Based-on: <20221208142306.2642640-1-arm...@redhat.com>

Markus Armbruster (3):
  include/block: Untangle inclusion loops
  hw/sparc64/niagara: Use blk_name() instead of open-coding it
  include/hw/block: Include hw/block/block.h where needed

 include/block/block-common.h   | 10 ++
 include/block/block-copy.h |  4 ++--
 include/block/block-global-state.h |  4 +++-
 include/block/block-hmp-cmds.h |  2 ++
 include/block/block-io.h   |  5 -
 include/block/block.h  |  4 ++--
 include/block/block_backup.h   |  2 +-
 include/block/block_int-common.h   | 14 +-
 include/block/block_int-global-state.h |  7 +--
 include/block/block_int-io.h   |  4 +++-
 include/block/block_int.h  |  4 ++--
 include/block/blockjob.h   |  2 +-
 include/block/blockjob_int.h   |  1 -
 include/block/qapi.h   |  2 +-
 include/block/thread-pool.h|  2 +-
 include/block/throttle-groups.h|  2 +-
 include/hw/block/swim.h|  1 +
 include/sysemu/block-backend-io.h  |  1 +
 block.c|  1 +
 block/amend.c  |  1 +
 block/backup.c |  1 +
 block/blkdebug.c   |  1 +
 block/blklogwrites.c   |  1 +
 block/blkreplay.c  |  1 +
 block/blkverify.c  |  1 +
 block/block-copy.c |  4 
 block/bochs.c  |  1 +
 block/cloop.c  |  1 +
 block/copy-before-write.c  |  1 +
 block/copy-on-read.c   |  1 +
 block/curl.c   |  1 +
 block/dirty-bitmap.c   |  2 ++
 block/dmg.c|  1 +
 block/export/fuse.c|  3 ++-
 block/file-posix.c |  1 +
 block/file-win32.c |  1 +
 block/filter-compress.c|  1 +
 block/gluster.c|  1 +
 block/io.c |  1 +
 block/iscsi.c  |  1 +
 block/mirror.c |  1 +
 block/monitor/bitmap-qmp-cmds.c|  2 ++
 block/nfs.c|  1 +
 block/null.c   |  1 +
 block/nvme.c   |  1 +
 block/parallels-ext.c  |  2 ++
 block/preallocate.c|  1 +
 block/qapi-sysemu.c|  1 +
 block/qapi.c   |  1 +
 block/qcow2-bitmap.c   |  2 ++
 block/qcow2-cache.c|  1 +
 block/qcow2-cluster.c  |  1 +
 block/qcow2-refcount.c |  1 +
 block/qcow2-threads.c  |  1 +
 block/qcow2.c  |  1 +
 block/qed-check.c  |  1 +
 block/qed-table.c  |  1 +
 block/raw-format.c |  1 +
 block/rbd.c|  1 +
 block/ssh.c|  1 +
 block/throttle.c   |  2 ++
 block/vhdx-log.c   |  1 +
 block/vvfat.c  |  1 +
 block/win32-aio.c  |  1 +
 block/write-threshold.c|  1 +
 blockdev.c |  1 +
 blockjob.c |  1 +
 hw/block/block.c   |  1 +
 hw/sparc64/niagara.c   |  5 ++---
 hw/virtio/virtio-pmem.c|  1 +
 migration/block-dirty-bitmap.c |  1 +
 migration/block.c  |  1 +
 migration/savevm.c |  1 +
 monitor/qmp-cmds.c |  1 +
 nbd/server.c   |  2 ++
 qemu-img.c |  1 +
 softmmu/cpus.c |  1 +
 softmmu/physmem.c  |  1 +
 storage-daemon/qemu-storage-daemon.c   |  1 +
 target/i386/kvm/kvm.c  |  1 +
 tests/unit/test-bdrv-drain.c   |  2 +-
 tests/unit/test-block-iothread.c   |  1 +
 82 files changed, 112 insertions(+), 39 deletions(-)

-- 
2.37.3




[PATCH 2/3] hw/sparc64/niagara: Use blk_name() instead of open-coding it

2022-12-08 Thread Markus Armbruster
Signed-off-by: Markus Armbruster 
---
 hw/sparc64/niagara.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/sparc64/niagara.c b/hw/sparc64/niagara.c
index ab3c4ec346..6725cc61fd 100644
--- a/hw/sparc64/niagara.c
+++ b/hw/sparc64/niagara.c
@@ -23,7 +23,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "block/block_int-common.h"
 #include "qemu/units.h"
 #include "cpu.h"
 #include "hw/boards.h"
@@ -144,10 +143,9 @@ static void niagara_init(MachineState *machine)
 memory_region_add_subregion(get_system_memory(),
 NIAGARA_VDISK_BASE, &s->vdisk_ram);
 dinfo->is_default = 1;
-rom_add_file_fixed(blk_bs(blk)->filename, NIAGARA_VDISK_BASE, -1);
+rom_add_file_fixed(blk_name(blk), NIAGARA_VDISK_BASE, -1);
 } else {
-error_report("could not load ram disk '%s'",
- blk_bs(blk)->filename);
+error_report("could not load ram disk '%s'", blk_name(blk));
 exit(1);
 }
 }
-- 
2.37.3




Re: [PATCH 2/3] hw/sparc64/niagara: Use blk_name() instead of open-coding it

2022-12-08 Thread Philippe Mathieu-Daudé

On 8/12/22 15:39, Markus Armbruster wrote:

Signed-off-by: Markus Armbruster 
---
  hw/sparc64/niagara.c | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 





Re: [PATCH 3/3] include/hw/block: Include hw/block/block.h where needed

2022-12-08 Thread Philippe Mathieu-Daudé

On 8/12/22 15:39, Markus Armbruster wrote:

hw/block/swim.h needs BlockConf.

Signed-off-by: Markus Armbruster 
---
  include/hw/block/swim.h | 1 +
  1 file changed, 1 insertion(+)


Reviewed-by: Philippe Mathieu-Daudé 





[PATCH v3 0/3] target/riscv: Apply KVM policy to ISA extensions

2022-12-08 Thread Mayuresh Chitale
Currently the single and multi letter ISA extensions exposed to the
guest vcpu don't confirm to the KVM policies. This patchset updates the kvm 
headers
and applies policies set in KVM to the extensions exposed to the guest.

These patches can also be found on riscv_kvm_ext_v3 branch at:
https://github.com/mdchitale/qemu.git

Changes in v3:
- Address comments from Bin Meng

Changes in v2:
- Rebase to latest riscv-to-apply.next
- Update linux headers to version 6.1-rc8
- Add reviewed by tags

Mayuresh Chitale (3):
  update-linux-headers: Version 6.1-rc8
  target/riscv: Extend isa_ext_data for single letter extensions
  target/riscv: kvm: Support selecting VCPU extensions

 include/standard-headers/drm/drm_fourcc.h |  34 -
 include/standard-headers/linux/ethtool.h  |  63 +++-
 include/standard-headers/linux/fuse.h |   6 +-
 .../linux/input-event-codes.h |   1 +
 include/standard-headers/linux/virtio_blk.h   |  19 +++
 linux-headers/asm-generic/hugetlb_encode.h|  26 ++--
 linux-headers/asm-generic/mman-common.h   |   2 +
 linux-headers/asm-mips/mman.h |   2 +
 linux-headers/asm-riscv/kvm.h |   4 +
 linux-headers/linux/kvm.h |   1 +
 linux-headers/linux/psci.h|  14 ++
 linux-headers/linux/userfaultfd.h |   4 +
 linux-headers/linux/vfio.h| 142 ++
 target/riscv/cpu.c|  52 ---
 target/riscv/kvm.c|  88 +--
 target/riscv/kvm_riscv.h  |   2 +-
 16 files changed, 408 insertions(+), 52 deletions(-)

-- 
2.34.1




[PATCH v3 1/3] update-linux-headers: Version 6.1-rc8

2022-12-08 Thread Mayuresh Chitale
Sync headers with kernel commit 76dcd734eca2

Signed-off-by: Mayuresh Chitale 
Reviewed-by: Andrew Jones 
---
 include/standard-headers/drm/drm_fourcc.h |  34 -
 include/standard-headers/linux/ethtool.h  |  63 +++-
 include/standard-headers/linux/fuse.h |   6 +-
 .../linux/input-event-codes.h |   1 +
 include/standard-headers/linux/virtio_blk.h   |  19 +++
 linux-headers/asm-generic/hugetlb_encode.h|  26 ++--
 linux-headers/asm-generic/mman-common.h   |   2 +
 linux-headers/asm-mips/mman.h |   2 +
 linux-headers/asm-riscv/kvm.h |   4 +
 linux-headers/linux/kvm.h |   1 +
 linux-headers/linux/psci.h|  14 ++
 linux-headers/linux/userfaultfd.h |   4 +
 linux-headers/linux/vfio.h| 142 ++
 13 files changed, 298 insertions(+), 20 deletions(-)

diff --git a/include/standard-headers/drm/drm_fourcc.h 
b/include/standard-headers/drm/drm_fourcc.h
index 48b620cbef..b868488f93 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -98,18 +98,42 @@ extern "C" {
 #define DRM_FORMAT_INVALID 0
 
 /* color index */
+#define DRM_FORMAT_C1  fourcc_code('C', '1', ' ', ' ') /* [7:0] 
C0:C1:C2:C3:C4:C5:C6:C7 1:1:1:1:1:1:1:1 eight pixels/byte */
+#define DRM_FORMAT_C2  fourcc_code('C', '2', ' ', ' ') /* [7:0] 
C0:C1:C2:C3 2:2:2:2 four pixels/byte */
+#define DRM_FORMAT_C4  fourcc_code('C', '4', ' ', ' ') /* [7:0] C0:C1 
4:4 two pixels/byte */
 #define DRM_FORMAT_C8  fourcc_code('C', '8', ' ', ' ') /* [7:0] C */
 
-/* 8 bpp Red */
+/* 1 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D1  fourcc_code('D', '1', ' ', ' ') /* [7:0] 
D0:D1:D2:D3:D4:D5:D6:D7 1:1:1:1:1:1:1:1 eight pixels/byte */
+
+/* 2 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D2  fourcc_code('D', '2', ' ', ' ') /* [7:0] 
D0:D1:D2:D3 2:2:2:2 four pixels/byte */
+
+/* 4 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D4  fourcc_code('D', '4', ' ', ' ') /* [7:0] D0:D1 
4:4 two pixels/byte */
+
+/* 8 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D8  fourcc_code('D', '8', ' ', ' ') /* [7:0] D */
+
+/* 1 bpp Red (direct relationship between channel value and brightness) */
+#define DRM_FORMAT_R1  fourcc_code('R', '1', ' ', ' ') /* [7:0] 
R0:R1:R2:R3:R4:R5:R6:R7 1:1:1:1:1:1:1:1 eight pixels/byte */
+
+/* 2 bpp Red (direct relationship between channel value and brightness) */
+#define DRM_FORMAT_R2  fourcc_code('R', '2', ' ', ' ') /* [7:0] 
R0:R1:R2:R3 2:2:2:2 four pixels/byte */
+
+/* 4 bpp Red (direct relationship between channel value and brightness) */
+#define DRM_FORMAT_R4  fourcc_code('R', '4', ' ', ' ') /* [7:0] R0:R1 
4:4 two pixels/byte */
+
+/* 8 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R8  fourcc_code('R', '8', ' ', ' ') /* [7:0] R */
 
-/* 10 bpp Red */
+/* 10 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R10 fourcc_code('R', '1', '0', ' ') /* [15:0] x:R 
6:10 little endian */
 
-/* 12 bpp Red */
+/* 12 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R12 fourcc_code('R', '1', '2', ' ') /* [15:0] x:R 
4:12 little endian */
 
-/* 16 bpp Red */
+/* 16 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R16 fourcc_code('R', '1', '6', ' ') /* [15:0] R 
little endian */
 
 /* 16 bpp RG */
@@ -204,7 +228,9 @@ extern "C" {
 #define DRM_FORMAT_VYUYfourcc_code('V', 'Y', 'U', 'Y') /* 
[31:0] Y1:Cb0:Y0:Cr0 8:8:8:8 little endian */
 
 #define DRM_FORMAT_AYUVfourcc_code('A', 'Y', 'U', 'V') /* 
[31:0] A:Y:Cb:Cr 8:8:8:8 little endian */
+#define DRM_FORMAT_AVUYfourcc_code('A', 'V', 'U', 'Y') /* [31:0] 
A:Cr:Cb:Y 8:8:8:8 little endian */
 #define DRM_FORMAT_XYUVfourcc_code('X', 'Y', 'U', 'V') /* [31:0] 
X:Y:Cb:Cr 8:8:8:8 little endian */
+#define DRM_FORMAT_XVUYfourcc_code('X', 'V', 'U', 'Y') /* [31:0] 
X:Cr:Cb:Y 8:8:8:8 little endian */
 #define DRM_FORMAT_VUY888  fourcc_code('V', 'U', '2', '4') /* [23:0] 
Cr:Cb:Y 8:8:8 little endian */
 #define DRM_FORMAT_VUY101010   fourcc_code('V', 'U', '3', '0') /* Y followed 
by U then V, 10:10:10. Non-linear modifier only */
 
diff --git a/include/standard-headers/linux/ethtool.h 
b/include/standard-headers/linux/ethtool.h
index 4537da20cc..1dc56cdc0a 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -736,6 +736,51 @@ enum ethtool_module_power_mode {
ETHTOOL_MODULE_POWER_MODE_HIGH,
 };
 
+/**
+ * enum ethtool_podl_p

[PATCH v3 3/3] target/riscv: kvm: Support selecting VCPU extensions

2022-12-08 Thread Mayuresh Chitale
Set the state of each ISA extension on the vcpu depending on what
is set in the CPU property and what is allowed by KVM for that extension.

Signed-off-by: Mayuresh Chitale 
Reviewed-by: Andrew Jones 
Reviewed-by: Alistair Francis 
---
 target/riscv/cpu.c   | 11 -
 target/riscv/kvm.c   | 88 ++--
 target/riscv/kvm_riscv.h |  2 +-
 3 files changed, 87 insertions(+), 14 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 8c8f085a80..4b2c1dadf4 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1205,10 +1205,19 @@ static void riscv_isa_string_ext(RISCVCPU *cpu, char 
**isa_str)
 {
 char *old = *isa_str;
 char *new = *isa_str;
-int i;
+int i, offset;
 
 for (i = 0; i < ARRAY_SIZE(isa_edata_arr); i++) {
 if (isa_ext_is_enabled(cpu, &isa_edata_arr[i])) {
+offset = isa_edata_arr[i].ext_enable_offset;
+if (kvm_enabled() && !kvm_riscv_ext_supported(offset)) {
+#ifndef CONFIG_USER_ONLY
+info_report("disabling %s extension for hart %d as kvm does "
+"not support it", isa_edata_arr[i].name,
+(int)cpu->env.mhartid);
+#endif
+continue;
+}
 if (isa_edata_arr[i].multi_letter) {
 if (cpu->cfg.short_isa_string) {
 continue;
diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 30f21453d6..79029ad328 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -42,6 +42,29 @@
 #include "migration/migration.h"
 #include "sysemu/runstate.h"
 
+struct isa_ext_info {
+const char *name;
+target_ulong misa_bit;
+int ext_enable_offset;
+};
+
+#define ISA_EXT_DATA_ENTRY(_name, _bit, _prop) \
+{#_name, _bit, offsetof(struct RISCVCPUConfig, _prop)}
+
+static const struct isa_ext_info isa_info_arr[] = {
+ISA_EXT_DATA_ENTRY(a, RVA, ext_a),
+ISA_EXT_DATA_ENTRY(c, RVC, ext_c),
+ISA_EXT_DATA_ENTRY(d, RVD, ext_d),
+ISA_EXT_DATA_ENTRY(f, RVF, ext_f),
+ISA_EXT_DATA_ENTRY(h, RVH, ext_h),
+ISA_EXT_DATA_ENTRY(i, RVI, ext_i),
+ISA_EXT_DATA_ENTRY(m, RVM, ext_m),
+ISA_EXT_DATA_ENTRY(svpbmt, 0, ext_svpbmt),
+ISA_EXT_DATA_ENTRY(sstc, 0, ext_sstc),
+ISA_EXT_DATA_ENTRY(svinval, 0, ext_svinval),
+ISA_EXT_DATA_ENTRY(zihintpause, 0, ext_zihintpause),
+};
+
 static uint64_t kvm_riscv_reg_id(CPURISCVState *env, uint64_t type,
  uint64_t idx)
 {
@@ -394,25 +417,66 @@ void kvm_arch_init_irq_routing(KVMState *s)
 {
 }
 
+bool kvm_riscv_ext_supported(int offset)
+{
+int i;
+
+for (i = 0; i < ARRAY_SIZE(isa_info_arr); ++i) {
+if (isa_info_arr[i].ext_enable_offset == offset) {
+return true;
+}
+}
+return false;
+}
+
+static void kvm_riscv_set_isa_ext(CPUState *cs, CPURISCVState *env)
+{
+RISCVCPU *cpu = RISCV_CPU(cs);
+unsigned long isa_ext_out;
+bool *ext_state;
+uint64_t id;
+int i, ret;
+
+env->misa_ext = 0;
+for (i = 0; i < ARRAY_SIZE(isa_info_arr); i++) {
+ext_state = (void *)&cpu->cfg + isa_info_arr[i].ext_enable_offset;
+id = kvm_riscv_reg_id(env, KVM_REG_RISCV_ISA_EXT, i);
+ret = kvm_get_one_reg(cs, id, &isa_ext_out);
+if (ret) {
+warn_report("Disabling ext %s due to failure %d",
+isa_info_arr[i].name, ret);
+*ext_state = false;
+continue;
+}
+if (isa_ext_out != (*ext_state)) {
+isa_ext_out = *ext_state;
+ret = kvm_set_one_reg(cs, id, &isa_ext_out);
+if (ret) {
+warn_report("Could not %s ext %s due to failure %d",
+(isa_ext_out ? "enable" : "disable"),
+isa_info_arr[i].name, ret);
+*ext_state = !isa_ext_out;
+}
+}
+/*
+ * If the single letter extension is supported by KVM then set
+ * the corresponding misa bit for the guest vcpu.
+ */
+if (isa_info_arr[i].misa_bit && (*ext_state)) {
+env->misa_ext |= isa_info_arr[i].misa_bit;
+}
+}
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
-int ret = 0;
-target_ulong isa;
 RISCVCPU *cpu = RISCV_CPU(cs);
 CPURISCVState *env = &cpu->env;
-uint64_t id;
 
 qemu_add_vm_change_state_handler(kvm_riscv_vm_state_change, cs);
 
-id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CONFIG,
-  KVM_REG_RISCV_CONFIG_REG(isa));
-ret = kvm_get_one_reg(cs, id, &isa);
-if (ret) {
-return ret;
-}
-env->misa_ext = isa;
-
-return ret;
+kvm_riscv_set_isa_ext(cs, env);
+return 0;
 }
 
 int kvm_arch_msi_data_to_gsi(uint32_t data)
diff --git a/target/riscv/kvm_riscv.h b/target/riscv/kvm_riscv.h
index ed281bdce0..bd0da4 100644
--- a/target/riscv/kvm_riscv.h
+++ b/target/riscv/kvm_riscv.h
@@ -21,5 +21,5 @@
 
 void kvm

[PATCH v3 2/3] target/riscv: Extend isa_ext_data for single letter extensions

2022-12-08 Thread Mayuresh Chitale
Currently the ISA string for a CPU is generated from two different
arrays, one for single letter extensions and another for multi letter
extensions. Add all the single letter extensions to the isa_ext_data
array and use it for generating the ISA string. Also drop 'P' and 'Q'
extensions from the list of single letter extensions as those are not
supported yet.

Signed-off-by: Mayuresh Chitale 
Reviewed-by: Andrew Jones 
Reviewed-by: Alistair Francis 
Reviewed-by: Bin Meng 
---
 target/riscv/cpu.c | 41 +++--
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 042fd541b4..8c8f085a80 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -41,8 +41,6 @@
  (QEMU_VERSION_MICRO))
 #define RISCV_CPU_MIMPIDRISCV_CPU_MARCHID
 
-static const char riscv_single_letter_exts[] = "IEMAFDQCPVH";
-
 struct isa_ext_data {
 const char *name;
 bool multi_letter;
@@ -71,6 +69,13 @@ struct isa_ext_data {
  *extensions by an underscore.
  */
 static const struct isa_ext_data isa_edata_arr[] = {
+ISA_EXT_DATA_ENTRY(i, false, PRIV_VERSION_1_10_0, ext_i),
+ISA_EXT_DATA_ENTRY(e, false, PRIV_VERSION_1_10_0, ext_e),
+ISA_EXT_DATA_ENTRY(m, false, PRIV_VERSION_1_10_0, ext_m),
+ISA_EXT_DATA_ENTRY(a, false, PRIV_VERSION_1_10_0, ext_a),
+ISA_EXT_DATA_ENTRY(f, false, PRIV_VERSION_1_10_0, ext_f),
+ISA_EXT_DATA_ENTRY(d, false, PRIV_VERSION_1_10_0, ext_d),
+ISA_EXT_DATA_ENTRY(c, false, PRIV_VERSION_1_10_0, ext_c),
 ISA_EXT_DATA_ENTRY(h, false, PRIV_VERSION_1_12_0, ext_h),
 ISA_EXT_DATA_ENTRY(v, false, PRIV_VERSION_1_12_0, ext_v),
 ISA_EXT_DATA_ENTRY(zicsr, true, PRIV_VERSION_1_10_0, ext_icsr),
@@ -1196,16 +1201,23 @@ static void riscv_cpu_class_init(ObjectClass *c, void 
*data)
 device_class_set_props(dc, riscv_cpu_properties);
 }
 
-static void riscv_isa_string_ext(RISCVCPU *cpu, char **isa_str, int 
max_str_len)
+static void riscv_isa_string_ext(RISCVCPU *cpu, char **isa_str)
 {
 char *old = *isa_str;
 char *new = *isa_str;
 int i;
 
 for (i = 0; i < ARRAY_SIZE(isa_edata_arr); i++) {
-if (isa_edata_arr[i].multi_letter &&
-isa_ext_is_enabled(cpu, &isa_edata_arr[i])) {
-new = g_strconcat(old, "_", isa_edata_arr[i].name, NULL);
+if (isa_ext_is_enabled(cpu, &isa_edata_arr[i])) {
+if (isa_edata_arr[i].multi_letter) {
+if (cpu->cfg.short_isa_string) {
+continue;
+}
+new = g_strconcat(old, "_", isa_edata_arr[i].name, NULL);
+} else {
+new = g_strconcat(old, isa_edata_arr[i].name, NULL);
+}
+
 g_free(old);
 old = new;
 }
@@ -1216,19 +1228,12 @@ static void riscv_isa_string_ext(RISCVCPU *cpu, char 
**isa_str, int max_str_len)
 
 char *riscv_isa_string(RISCVCPU *cpu)
 {
-int i;
-const size_t maxlen = sizeof("rv128") + sizeof(riscv_single_letter_exts);
+const size_t maxlen = sizeof("rv128");
 char *isa_str = g_new(char, maxlen);
-char *p = isa_str + snprintf(isa_str, maxlen, "rv%d", TARGET_LONG_BITS);
-for (i = 0; i < sizeof(riscv_single_letter_exts) - 1; i++) {
-if (cpu->env.misa_ext & RV(riscv_single_letter_exts[i])) {
-*p++ = qemu_tolower(riscv_single_letter_exts[i]);
-}
-}
-*p = '\0';
-if (!cpu->cfg.short_isa_string) {
-riscv_isa_string_ext(cpu, &isa_str, maxlen);
-}
+
+snprintf(isa_str, maxlen, "rv%d", TARGET_LONG_BITS);
+riscv_isa_string_ext(cpu, &isa_str);
+
 return isa_str;
 }
 
-- 
2.34.1




Re: [PATCH for 8.0 0/2] virtio-iommu: Fix Replay

2022-12-08 Thread Peter Xu
On Thu, Dec 08, 2022 at 08:48:09AM +0100, Eric Auger wrote:
> Hi Peter,

Hi, Eric,

> 
> On 12/8/22 00:49, Peter Xu wrote:
> > Hi, Eric,
> >
> > On Wed, Dec 07, 2022 at 02:36:44PM +0100, Eric Auger wrote:
> >> When assigning VFIO devices protected by a virtio-iommu we need to replay
> >> the mappings when adding a new IOMMU MR and when attaching a device to
> >> a domain. While we do a "remap" we currently fail to first unmap the
> >> existing IOVA mapping and just map the new one. With some device/group
> >> topology this can lead to errors in VFIO when trying to DMA_MAP IOVA
> >> ranges onto existing ones.
> > I'm not sure whether virtio-iommu+vfio will suffer from DMA races like when
> > we were working on the vt-d replay for vfio.  The issue is whether DMA can
> > happen right after UNMAP but before MAP of the same page if the page was
> > always mapped.
> 
> I don't think it can race because a mutex is hold while doing the
> virtio_iommu_replay(), and each time a virtio cmd is handled (attach,
> map, unmap), see virtio_iommu_handle_command.
> So I think it is safe.

It's not the race in the code, it's the race between modifying host IOMMU
pgtable with DMA happening in parallel.  The bug triggered with DMA_MAP
returning -EEXIST means there's existing mapping.

If during replay there's mapped ranges and the ranges are prone to DMA,
then IIUC it can happen.

I didn't really check specifically for virtio-iommu and I mostly forget the
details, just to raise this up.  It's possible for some reason it just
can't trigger.  VT-d definitely can, in which case we'll see DMA errors on
the host from the assigned device when the DMA triggers during the "unmap
and map" window.

Thanks,

-- 
Peter Xu




Re: [PATCH 1/4] coroutine: Clean up superfluous inclusion of qemu/coroutine.h

2022-12-08 Thread Stefan Hajnoczi
Probably because block layer, aio.h, and coroutine_int.h header files
already include "qemu/coroutine.h"?

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH 2/4] coroutine: Move coroutine_fn to qemu/osdep.h, trim includes

2022-12-08 Thread Stefan Hajnoczi
On Thu, 8 Dec 2022 at 09:25, Markus Armbruster  wrote:
>
> block/block-hmp-cmds.h and qemu/co-shared-resource.h use coroutine_fn
> without including qemu/coroutine.h.  They compile only if it's already
> included from elsewhere.
>
> I could fix that, but pulling in qemu/coroutine.h and everything it
> includes just for a macro that expands into nothing feels silly.
> Instead, move the macro to qemu/osdep.h.
>
> Inclusions of qemu/coroutine.h just for coroutine_fn become
> superfluous.  Drop them.
>
> Signed-off-by: Markus Armbruster 
> ---
>  include/block/aio_task.h |  2 --
>  include/block/block-common.h |  1 -
>  include/monitor/hmp.h|  1 -
>  include/qemu/coroutine.h | 18 +++---
>  include/qemu/osdep.h | 16 
>  5 files changed, 23 insertions(+), 15 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH 3/4] coroutine: Clean up superfluous inclusion of qemu/lockable.h

2022-12-08 Thread Stefan Hajnoczi
Reviewed-by: Stefan Hajnoczi 



Re: [PATCH 4/4] coroutine: Break inclusion loop

2022-12-08 Thread Stefan Hajnoczi
Reviewed-by: Stefan Hajnoczi 



CVMSEG Emulation

2022-12-08 Thread Christopher Wrogg

In userspace emulation how do I make a set of addresses always valid and
initialized to 0 even though the process does not map it in? In particular
I want to map the CVMSEG for Cavium qemu-mips64 and qemu-mipsn32. The
addresses would be 0x8000 - 0xBFFF. I've looked at
target_mmap but it can't handle addresses that large. The lack of an
emulated mmu for 64 bit guests is going to be a problem.


[RFC PATCH] RISC-V: Save mmu_idx using FIELD_DP32 not OR

2022-12-08 Thread Christoph Muellner
From: Christoph Müllner 

Setting flags using OR might work, but is not optimal
for a couple of reasons:
* No way grep for stores to the field MEM_IDX.
* The return value of cpu_mmu_index() is not masked
  (not a real problem as long as cpu_mmu_index() returns only valid values).
* If the offset of MEM_IDX would get moved to non-0, then this code
  would not work anymore.

Let's use the FIELD_DP32() macro instead of the OR, which is already
used for most other flags.

Signed-off-by: Christoph Müllner 
---
 target/riscv/cpu_helper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 278d163803..d68b6b351d 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -80,7 +80,8 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong 
*pc,
 flags |= TB_FLAGS_MSTATUS_FS;
 flags |= TB_FLAGS_MSTATUS_VS;
 #else
-flags |= cpu_mmu_index(env, 0);
+flags = FIELD_DP32(flags, TB_FLAGS, MEM_IDX, cpu_mmu_index(env, 0));
+
 if (riscv_cpu_fp_enabled(env)) {
 flags |= env->mstatus & MSTATUS_FS;
 }
-- 
2.38.1




[PATCH-for-8.0 v2 0/4] target/cpu: System/User cleanups around hwaddr/vaddr

2022-12-08 Thread Philippe Mathieu-Daudé
We are not supposed to use the 'hwaddr' type on user emulation.

This series is a preparatory cleanup before few refactors to
isolate further System vs User code.

Since v1:
- only restrict SavedIOTLB in header (Alex)
- convert insert/remove_breakpoint implementations (Peter)

Philippe Mathieu-Daudé (4):
  cputlb: Restrict SavedIOTLB to system emulation
  gdbstub: Use vaddr type for generic insert/remove_breakpoint() API
  target/cpu: Restrict cpu_get_phys_page_debug() handlers to sysemu
  target/sparc/sysemu: Remove pointless CONFIG_USER_ONLY guard

 accel/kvm/kvm-all.c| 4 ++--
 accel/kvm/kvm-cpus.h   | 4 ++--
 accel/tcg/tcg-accel-ops.c  | 4 ++--
 gdbstub/gdbstub.c  | 1 -
 gdbstub/internals.h| 6 --
 gdbstub/softmmu.c  | 5 ++---
 gdbstub/user.c | 5 ++---
 include/hw/core/cpu.h  | 6 --
 include/sysemu/accel-ops.h | 6 +++---
 target/alpha/cpu.h | 2 +-
 target/cris/cpu.h  | 3 +--
 target/hppa/cpu.h  | 2 +-
 target/m68k/cpu.h  | 2 +-
 target/nios2/cpu.h | 2 +-
 target/openrisc/cpu.h  | 3 ++-
 target/ppc/cpu.h   | 2 +-
 target/rx/cpu.h| 2 +-
 target/rx/helper.c | 4 ++--
 target/sh4/cpu.h   | 2 +-
 target/sparc/cpu.h | 3 ++-
 target/sparc/mmu_helper.c  | 2 --
 target/xtensa/cpu.h| 2 +-
 22 files changed, 36 insertions(+), 36 deletions(-)

-- 
2.38.1




[PATCH-for-8.0 v2 1/4] cputlb: Restrict SavedIOTLB to system emulation

2022-12-08 Thread Philippe Mathieu-Daudé
Commit 2f3a57ee47 ("cputlb: ensure we save the IOTLB data in
case of reset") added the SavedIOTLB structure -- which is
system emulation specific -- in the generic CPUState structure.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/core/cpu.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 8830546121..bc3229ae13 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -222,7 +222,7 @@ struct CPUWatchpoint {
 QTAILQ_ENTRY(CPUWatchpoint) entry;
 };
 
-#ifdef CONFIG_PLUGIN
+#if defined(CONFIG_PLUGIN) && !defined(CONFIG_USER_ONLY)
 /*
  * For plugins we sometime need to save the resolved iotlb data before
  * the memory regions get moved around  by io_writex.
@@ -406,9 +406,11 @@ struct CPUState {
 
 #ifdef CONFIG_PLUGIN
 GArray *plugin_mem_cbs;
+#if !defined(CONFIG_USER_ONLY)
 /* saved iotlb data from io_writex */
 SavedIOTLB saved_iotlb;
-#endif
+#endif /* !CONFIG_USER_ONLY */
+#endif /* CONFIG_PLUGIN */
 
 /* TODO Move common fields from CPUArchState here. */
 int cpu_index;
-- 
2.38.1




[PATCH-for-8.0 v2 2/4] gdbstub: Use vaddr type for generic insert/remove_breakpoint() API

2022-12-08 Thread Philippe Mathieu-Daudé
Both insert/remove_breakpoint() handlers are used in system and
user emulation. We can not use the 'hwaddr' type on user emulation,
we have to use 'vaddr' which is defined as "wide enough to contain
any #target_ulong virtual address".

gdbstub.c doesn't require to include "exec/hwaddr.h" anymore.

Signed-off-by: Philippe Mathieu-Daudé 
---
 accel/kvm/kvm-all.c| 4 ++--
 accel/kvm/kvm-cpus.h   | 4 ++--
 accel/tcg/tcg-accel-ops.c  | 4 ++--
 gdbstub/gdbstub.c  | 1 -
 gdbstub/internals.h| 6 --
 gdbstub/softmmu.c  | 5 ++---
 gdbstub/user.c | 5 ++---
 include/sysemu/accel-ops.h | 6 +++---
 8 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f99b0becd8..f3b434c717 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3219,7 +3219,7 @@ bool kvm_supports_guest_debug(void)
 return kvm_has_guest_debug;
 }
 
-int kvm_insert_breakpoint(CPUState *cpu, int type, hwaddr addr, hwaddr len)
+int kvm_insert_breakpoint(CPUState *cpu, int type, vaddr addr, vaddr len)
 {
 struct kvm_sw_breakpoint *bp;
 int err;
@@ -3257,7 +3257,7 @@ int kvm_insert_breakpoint(CPUState *cpu, int type, hwaddr 
addr, hwaddr len)
 return 0;
 }
 
-int kvm_remove_breakpoint(CPUState *cpu, int type, hwaddr addr, hwaddr len)
+int kvm_remove_breakpoint(CPUState *cpu, int type, vaddr addr, vaddr len)
 {
 struct kvm_sw_breakpoint *bp;
 int err;
diff --git a/accel/kvm/kvm-cpus.h b/accel/kvm/kvm-cpus.h
index fd63fe6a59..ca40add32c 100644
--- a/accel/kvm/kvm-cpus.h
+++ b/accel/kvm/kvm-cpus.h
@@ -19,8 +19,8 @@ void kvm_cpu_synchronize_post_reset(CPUState *cpu);
 void kvm_cpu_synchronize_post_init(CPUState *cpu);
 void kvm_cpu_synchronize_pre_loadvm(CPUState *cpu);
 bool kvm_supports_guest_debug(void);
-int kvm_insert_breakpoint(CPUState *cpu, int type, hwaddr addr, hwaddr len);
-int kvm_remove_breakpoint(CPUState *cpu, int type, hwaddr addr, hwaddr len);
+int kvm_insert_breakpoint(CPUState *cpu, int type, vaddr addr, vaddr len);
+int kvm_remove_breakpoint(CPUState *cpu, int type, vaddr addr, vaddr len);
 void kvm_remove_all_breakpoints(CPUState *cpu);
 
 #endif /* KVM_CPUS_H */
diff --git a/accel/tcg/tcg-accel-ops.c b/accel/tcg/tcg-accel-ops.c
index 19cbf1db3a..d9228fd403 100644
--- a/accel/tcg/tcg-accel-ops.c
+++ b/accel/tcg/tcg-accel-ops.c
@@ -116,7 +116,7 @@ static inline int xlat_gdb_type(CPUState *cpu, int gdbtype)
 return cputype;
 }
 
-static int tcg_insert_breakpoint(CPUState *cs, int type, hwaddr addr, hwaddr 
len)
+static int tcg_insert_breakpoint(CPUState *cs, int type, vaddr addr, vaddr len)
 {
 CPUState *cpu;
 int err = 0;
@@ -147,7 +147,7 @@ static int tcg_insert_breakpoint(CPUState *cs, int type, 
hwaddr addr, hwaddr len
 }
 }
 
-static int tcg_remove_breakpoint(CPUState *cs, int type, hwaddr addr, hwaddr 
len)
+static int tcg_remove_breakpoint(CPUState *cs, int type, vaddr addr, vaddr len)
 {
 CPUState *cpu;
 int err = 0;
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index be88ca0d71..c3fbc31123 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -48,7 +48,6 @@
 #include "sysemu/runstate.h"
 #include "semihosting/semihost.h"
 #include "exec/exec-all.h"
-#include "exec/hwaddr.h"
 #include "sysemu/replay.h"
 
 #include "internals.h"
diff --git a/gdbstub/internals.h b/gdbstub/internals.h
index eabb0341d1..b23999f951 100644
--- a/gdbstub/internals.h
+++ b/gdbstub/internals.h
@@ -9,9 +9,11 @@
 #ifndef _INTERNALS_H_
 #define _INTERNALS_H_
 
+#include "exec/cpu-common.h"
+
 bool gdb_supports_guest_debug(void);
-int gdb_breakpoint_insert(CPUState *cs, int type, hwaddr addr, hwaddr len);
-int gdb_breakpoint_remove(CPUState *cs, int type, hwaddr addr, hwaddr len);
+int gdb_breakpoint_insert(CPUState *cs, int type, vaddr addr, vaddr len);
+int gdb_breakpoint_remove(CPUState *cs, int type, vaddr addr, vaddr len);
 void gdb_breakpoint_remove_all(CPUState *cs);
 
 #endif /* _INTERNALS_H_ */
diff --git a/gdbstub/softmmu.c b/gdbstub/softmmu.c
index f208c6cf15..129575e510 100644
--- a/gdbstub/softmmu.c
+++ b/gdbstub/softmmu.c
@@ -11,7 +11,6 @@
 
 #include "qemu/osdep.h"
 #include "exec/gdbstub.h"
-#include "exec/hwaddr.h"
 #include "sysemu/cpus.h"
 #include "internals.h"
 
@@ -24,7 +23,7 @@ bool gdb_supports_guest_debug(void)
 return false;
 }
 
-int gdb_breakpoint_insert(CPUState *cs, int type, hwaddr addr, hwaddr len)
+int gdb_breakpoint_insert(CPUState *cs, int type, vaddr addr, vaddr len)
 {
 const AccelOpsClass *ops = cpus_get_accel();
 if (ops->insert_breakpoint) {
@@ -33,7 +32,7 @@ int gdb_breakpoint_insert(CPUState *cs, int type, hwaddr 
addr, hwaddr len)
 return -ENOSYS;
 }
 
-int gdb_breakpoint_remove(CPUState *cs, int type, hwaddr addr, hwaddr len)
+int gdb_breakpoint_remove(CPUState *cs, int type, vaddr addr, vaddr len)
 {
 const AccelOpsClass *ops = cpus_get_accel();
 if (ops->remove_breakpoint) {
diff --git a/gdbstub/user.c b/gdbstub/user.c
index 0

[PATCH-for-8.0 v2 3/4] target/cpu: Restrict cpu_get_phys_page_debug() handlers to sysemu

2022-12-08 Thread Philippe Mathieu-Daudé
The 'hwaddr' type is only available / meaningful on system emulation.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/alpha/cpu.h| 2 +-
 target/cris/cpu.h | 3 +--
 target/hppa/cpu.h | 2 +-
 target/m68k/cpu.h | 2 +-
 target/nios2/cpu.h| 2 +-
 target/openrisc/cpu.h | 3 ++-
 target/ppc/cpu.h  | 2 +-
 target/rx/cpu.h   | 2 +-
 target/rx/helper.c| 4 ++--
 target/sh4/cpu.h  | 2 +-
 target/sparc/cpu.h| 3 ++-
 target/xtensa/cpu.h   | 2 +-
 12 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/target/alpha/cpu.h b/target/alpha/cpu.h
index d0abc949a8..5e67304d81 100644
--- a/target/alpha/cpu.h
+++ b/target/alpha/cpu.h
@@ -276,9 +276,9 @@ extern const VMStateDescription vmstate_alpha_cpu;
 
 void alpha_cpu_do_interrupt(CPUState *cpu);
 bool alpha_cpu_exec_interrupt(CPUState *cpu, int int_req);
+hwaddr alpha_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
 #endif /* !CONFIG_USER_ONLY */
 void alpha_cpu_dump_state(CPUState *cs, FILE *f, int flags);
-hwaddr alpha_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
 int alpha_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
 int alpha_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
 
diff --git a/target/cris/cpu.h b/target/cris/cpu.h
index e6776f25b1..71fa1f96e0 100644
--- a/target/cris/cpu.h
+++ b/target/cris/cpu.h
@@ -193,12 +193,11 @@ bool cris_cpu_exec_interrupt(CPUState *cpu, int int_req);
 bool cris_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
MMUAccessType access_type, int mmu_idx,
bool probe, uintptr_t retaddr);
+hwaddr cris_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
 #endif
 
 void cris_cpu_dump_state(CPUState *cs, FILE *f, int flags);
 
-hwaddr cris_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
-
 int crisv10_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
 int cris_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
 int cris_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
diff --git a/target/hppa/cpu.h b/target/hppa/cpu.h
index 6f3b6beecf..b595ef25a9 100644
--- a/target/hppa/cpu.h
+++ b/target/hppa/cpu.h
@@ -322,11 +322,11 @@ static inline void cpu_hppa_change_prot_id(CPUHPPAState 
*env) { }
 void cpu_hppa_change_prot_id(CPUHPPAState *env);
 #endif
 
-hwaddr hppa_cpu_get_phys_page_debug(CPUState *cs, vaddr addr);
 int hppa_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
 int hppa_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
 void hppa_cpu_dump_state(CPUState *cs, FILE *f, int);
 #ifndef CONFIG_USER_ONLY
+hwaddr hppa_cpu_get_phys_page_debug(CPUState *cs, vaddr addr);
 bool hppa_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
MMUAccessType access_type, int mmu_idx,
bool probe, uintptr_t retaddr);
diff --git a/target/m68k/cpu.h b/target/m68k/cpu.h
index 3a9cfe2f33..68ed531fc3 100644
--- a/target/m68k/cpu.h
+++ b/target/m68k/cpu.h
@@ -176,9 +176,9 @@ struct ArchCPU {
 #ifndef CONFIG_USER_ONLY
 void m68k_cpu_do_interrupt(CPUState *cpu);
 bool m68k_cpu_exec_interrupt(CPUState *cpu, int int_req);
+hwaddr m68k_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
 #endif /* !CONFIG_USER_ONLY */
 void m68k_cpu_dump_state(CPUState *cpu, FILE *f, int flags);
-hwaddr m68k_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
 int m68k_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
 int m68k_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
 
diff --git a/target/nios2/cpu.h b/target/nios2/cpu.h
index f85581ee56..2f43b67a8f 100644
--- a/target/nios2/cpu.h
+++ b/target/nios2/cpu.h
@@ -262,7 +262,6 @@ void nios2_tcg_init(void);
 void nios2_cpu_do_interrupt(CPUState *cs);
 void dump_mmu(CPUNios2State *env);
 void nios2_cpu_dump_state(CPUState *cpu, FILE *f, int flags);
-hwaddr nios2_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
 G_NORETURN void nios2_cpu_do_unaligned_access(CPUState *cpu, vaddr addr,
   MMUAccessType access_type, int 
mmu_idx,
   uintptr_t retaddr);
@@ -288,6 +287,7 @@ static inline int cpu_mmu_index(CPUNios2State *env, bool 
ifetch)
 }
 
 #ifndef CONFIG_USER_ONLY
+hwaddr nios2_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
 bool nios2_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
 MMUAccessType access_type, int mmu_idx,
 bool probe, uintptr_t retaddr);
diff --git a/target/openrisc/cpu.h b/target/openrisc/cpu.h
index 1d5efa5ca2..31a4ae5ad3 100644
--- a/target/openrisc/cpu.h
+++ b/target/openrisc/cpu.h
@@ -312,7 +312,6 @@ struct ArchCPU {
 
 void cpu_openrisc_list(void);
 void openrisc_cpu_dump_state(CPUState *cpu, FILE *f, int flags);
-hwaddr openrisc_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
 int openrisc_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
 int openrisc_cpu_gdb_wri

[PATCH-for-8.0 v2 4/4] target/sparc/sysemu: Remove pointless CONFIG_USER_ONLY guard

2022-12-08 Thread Philippe Mathieu-Daudé
Commit caac44a52a ("target/sparc: Make sparc_cpu_tlb_fill sysemu
only") restricted mmu_helper.c to system emulation. Checking
whether CONFIG_USER_ONLY is defined is now pointless.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/sparc/mmu_helper.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/target/sparc/mmu_helper.c b/target/sparc/mmu_helper.c
index 919448a494..a7e51e4b7d 100644
--- a/target/sparc/mmu_helper.c
+++ b/target/sparc/mmu_helper.c
@@ -924,7 +924,6 @@ hwaddr sparc_cpu_get_phys_page_debug(CPUState *cs, vaddr 
addr)
 return phys_addr;
 }
 
-#ifndef CONFIG_USER_ONLY
 G_NORETURN void sparc_cpu_do_unaligned_access(CPUState *cs, vaddr addr,
   MMUAccessType access_type,
   int mmu_idx,
@@ -942,4 +941,3 @@ G_NORETURN void sparc_cpu_do_unaligned_access(CPUState *cs, 
vaddr addr,
 
 cpu_raise_exception_ra(env, TT_UNALIGNED, retaddr);
 }
-#endif /* !CONFIG_USER_ONLY */
-- 
2.38.1




Re: [PATCH-for-8.0 v2 0/4] target/cpu: System/User cleanups around hwaddr/vaddr

2022-12-08 Thread Richard Henderson

On 12/8/22 09:35, Philippe Mathieu-Daudé wrote:

Philippe Mathieu-Daudé (4):
   cputlb: Restrict SavedIOTLB to system emulation
   gdbstub: Use vaddr type for generic insert/remove_breakpoint() API
   target/cpu: Restrict cpu_get_phys_page_debug() handlers to sysemu
   target/sparc/sysemu: Remove pointless CONFIG_USER_ONLY guard


Reviewed-by: Richard Henderson 


r~



[PATCH] mailmap: Fix Stefan Weil author email

2022-12-08 Thread Philippe Mathieu-Daudé
Fix authorship of commits 266aaedc37~..ac14949821. See commit
3bd2608db7 ("maint: Add .mailmap entries for patches claiming
list authorship") for rationale.

Signed-off-by: Philippe Mathieu-Daudé 
---
 .mailmap | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.mailmap b/.mailmap
index 35dddbe27b..fad2aff5aa 100644
--- a/.mailmap
+++ b/.mailmap
@@ -45,6 +45,7 @@ Ed Swierk  Ed Swierk via 
Qemu-devel  Ian McKellar via Qemu-devel 

 Julia Suvorova  Julia Suvorova via Qemu-devel 

 Justin Terry (VM)  Justin Terry (VM) via Qemu-devel 

+Stefan Weil  Stefan Weil via 
 
 # Next, replace old addresses by a more recent one.
 Aleksandar Markovic  

-- 
2.38.1




Re: [PATCH-for-8.0 v2 2/4] gdbstub: Use vaddr type for generic insert/remove_breakpoint() API

2022-12-08 Thread Fabiano Rosas
Philippe Mathieu-Daudé  writes:

> Both insert/remove_breakpoint() handlers are used in system and
> user emulation. We can not use the 'hwaddr' type on user emulation,
> we have to use 'vaddr' which is defined as "wide enough to contain
> any #target_ulong virtual address".
>
> gdbstub.c doesn't require to include "exec/hwaddr.h" anymore.
>
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Fabiano Rosas 



Re: [RFC PATCH] migration: reduce time of loading non-iterable vmstate

2022-12-08 Thread Peter Xu
On Thu, Dec 08, 2022 at 10:39:11PM +0800, Chuang Xu wrote:
> 
> On 2022/12/8 上午6:08, Peter Xu wrote:
> > On Thu, Dec 08, 2022 at 12:07:03AM +0800, Chuang Xu wrote:
> > > On 2022/12/6 上午12:28, Peter Xu wrote:
> > > > Chuang,
> > > > 
> > > > No worry on the delay; you're faster than when I read yours. :)
> > > > 
> > > > On Mon, Dec 05, 2022 at 02:56:15PM +0800, Chuang Xu wrote:
> > > > > > As a start, maybe you can try with poison 
> > > > > > address_space_to_flatview() (by
> > > > > > e.g. checking the start_pack_mr_change flag and assert it is not 
> > > > > > set)
> > > > > > during this process to see whether any call stack can even try to
> > > > > > dereference a flatview.
> > > > > > 
> > > > > > It's just that I didn't figure a good way to "prove" its validity, 
> > > > > > even if
> > > > > > I think this is an interesting idea worth thinking to shrink the 
> > > > > > downtime.
> > > > > Thanks for your sugguestions!
> > > > > I used a thread local variable to identify whether the current thread 
> > > > > is a
> > > > > migration thread(main thread of target qemu) and I modified the code 
> > > > > of
> > > > > qemu_coroutine_switch to make sure the thread local variable true 
> > > > > only in
> > > > > process_incoming_migration_co call stack. If the target qemu detects 
> > > > > that
> > > > > start_pack_mr_change is set and address_space_to_flatview() is called 
> > > > > in
> > > > > non-migrating threads or non-migrating coroutine, it will crash.
> > > > Are you using the thread var just to avoid the assert triggering in the
> > > > migration thread when commiting memory changes?
> > > > 
> > > > I think _maybe_ another cleaner way to sanity check this is directly 
> > > > upon
> > > > the depth:
> > > > 
> > > > static inline FlatView *address_space_to_flatview(AddressSpace *as)
> > > > {
> > > >   /*
> > > >* Before using any flatview, sanity check we're not during a 
> > > > memory
> > > >* region transaction or the map can be invalid.  Note that this 
> > > > can
> > > >* also be called during commit phase of memory transaction, but 
> > > > that
> > > >* should also only happen when the depth decreases to 0 first.
> > > >*/
> > > >   assert(memory_region_transaction_depth == 0);
> > > >   return qatomic_rcu_read(&as->current_map);
> > > > }
> > > > 
> > > > That should also cover the safe cases of memory transaction commits 
> > > > during
> > > > migration.
> > > > 
> > > Peter, I tried this way and found that the target qemu will crash.
> > > 
> > > Here is the gdb backtrace:
> > > 
> > > #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> > > #1  0x7ff2929d851a in __GI_abort () at abort.c:118
> > > #2  0x7ff2929cfe67 in __assert_fail_base (fmt=, 
> > > assertion=assertion@entry=0x55a32578cdc0 "memory_region_transaction_depth 
> > > == 0", file=file@entry=0x55a32575d9b0 
> > > "/data00/migration/qemu-5.2.0/include/exec/memory.h",
> > >  line=line@entry=766, function=function@entry=0x55a32578d6e0 
> > > <__PRETTY_FUNCTION__.20463> "address_space_to_flatview") at assert.c:92
> > > #3  0x7ff2929cff12 in __GI___assert_fail 
> > > (assertion=assertion@entry=0x55a32578cdc0 
> > > "memory_region_transaction_depth == 0", file=file@entry=0x55a32575d9b0 
> > > "/data00/migration/qemu-5.2.0/include/exec/memory.h", line=line@entry=766,
> > >  function=function@entry=0x55a32578d6e0 <__PRETTY_FUNCTION__.20463> 
> > > "address_space_to_flatview") at assert.c:101
> > > #4  0x55a324b2ed5e in address_space_to_flatview (as=0x55a326132580 
> > > ) at 
> > > /data00/migration/qemu-5.2.0/include/exec/memory.h:766
> > > #5  0x55a324e79559 in address_space_to_flatview (as=0x55a326132580 
> > > ) at ../softmmu/memory.c:811
> > > #6  address_space_get_flatview (as=0x55a326132580 ) 
> > > at ../softmmu/memory.c:805
> > > #7  0x55a324e96474 in address_space_cache_init 
> > > (cache=cache@entry=0x55a32a4fb000, as=, 
> > > addr=addr@entry=68404985856, len=len@entry=4096, is_write=false) at 
> > > ../softmmu/physmem.c:3307
> > > #8  0x55a324ea9cba in virtio_init_region_cache (vdev=0x55a32985d9a0, 
> > > n=0) at ../hw/virtio/virtio.c:185
> > > #9  0x55a324eaa615 in virtio_load (vdev=0x55a32985d9a0, f= > > out>, version_id=) at ../hw/virtio/virtio.c:3203
> > > #10 0x55a324c6ab96 in vmstate_load_state (f=f@entry=0x55a329dc0c00, 
> > > vmsd=0x55a325fc1a60 , opaque=0x55a32985d9a0, 
> > > version_id=1) at ../migration/vmstate.c:143
> > > #11 0x55a324cda138 in vmstate_load (f=0x55a329dc0c00, 
> > > se=0x55a329941c90) at ../migration/savevm.c:913
> > > #12 0x55a324cdda34 in qemu_loadvm_section_start_full 
> > > (mis=0x55a3284ef9e0, f=0x55a329dc0c00) at ../migration/savevm.c:2741
> > > #13 qemu_loadvm_state_main (f=f@entry=0x55a329dc0c00, 
> > > mis=mis@entry=0x55a3284ef9e0) at ../migration/savevm.c:2939
> > > #14 0x55a324cdf66a in qemu_loadvm_state (f=0x55a329dc0c00) at 
> > > ../migration

Re: [SeaBIOS] Re: [PATCH 4/4] be less conservative with the 64bit pci io window

2022-12-08 Thread Igor Mammedov
On Wed, 23 Nov 2022 11:25:08 +0100
Gerd Hoffmann  wrote:

> On Tue, Nov 22, 2022 at 01:43:16PM -0500, Kevin O'Connor wrote:
> > On Mon, Nov 21, 2022 at 11:32:13AM +0100, Gerd Hoffmann wrote:  
> > > Current seabios code will only enable and use the 64bit pci io window in
> > > case it runs out of space in the 32bit pci mmio window below 4G.
> > > 
> > > This patch will also enable the 64bit pci io window when
> > >   (a) RAM above 4G is present, and
> > >   (b) the physical address space size is known, and
> > >   (c) seabios is running on a 64bit capable processor.
> > > 
> > > This operates with the assumption that guests which are ok with memory
> > > above 4G most likely can handle mmio above 4G too.  
> > 
> > Thanks.  In general, the series looks good to me.  Can you elaborate
> > on the background to this change though?  It sounds like there is a
> > (small) risk of a regression, so I think it would be good to have a
> > high level understanding of what is driving this memory reorg.  
> 
> Well, the idea is to adapt to the world moving forward.  Running a
> 64-bit capable OS is standard these days, and the resources needed
> by devices (especially GPUs) are becoming larger and larger.
> 
> Yes, there is the risk that (old) guests are unhappy with their
> PCI bars suddenly being mapped above 4G.  Can happen only in case
> seabios handles pci initialization (i.e. when running on qemu,
> otherwise coreboot initializes the pci bars).  I hope the memory
> check handles the 'old guest' case: when the guest can't handle
> addresses above 4G it is unlikely that qemu is configured to have
> memory mapped above 4G ...

does it break 32-bit PAE enabled guests
(which can have more then 4Gb RAM configured)?

> 
> take care,
>   Gerd
> 
> ___
> SeaBIOS mailing list -- seab...@seabios.org
> To unsubscribe send an email to seabios-le...@seabios.org
> 




Re: [PATCH v4 0/2] Add OCP extended log to nvme QEMU

2022-12-08 Thread Joel Granados
ping.

Is the solution to the guid constant ok?

Best

On Fri, Nov 25, 2022 at 10:48:06AM +0100, Joel Granados wrote:
> The motivation and description are contained in the last patch in this set.
> Will copy paste it here for convenience:
> 
> In order to evaluate write amplification factor (WAF) within the storage
> stack it is important to know the number of bytes written to the
> controller. The existing SMART log value of Data Units Written is too
> coarse (given in units of 500 Kb) and so we add the SMART health
> information extended from the OCP specification (given in units of bytes).
> 
> To accommodate different vendor specific specifications like OCP, we add a
> multiplexing function (nvme_vendor_specific_log) which will route to the
> different log functions based on arguments and log ids. We only return the
> OCP extended smart log when the command is 0xC0 and ocp has been turned on
> in the args.
> 
> Though we add the whole nvme smart log extended structure, we only 
> populate
> the physical_media_units_{read,written}, log_page_version and
> log_page_uuid.
> 
> V4 changes:
> 1. Fixed cpu_to_le64 instead of cpu_to_le32
> 2. Variable naming : uuid -> guid
> 3. Changed how the guid value appears in the code:
>Used to be:
> smart_l.log_page_uuid[0] = 0xA4F2BFEA2810AFC5;
> smart_l.log_page_uuid[1] = 0xAFD514C97C6F4F9C;
> 
>Now is:
> static const uint8_t guid[16] = {
> 0xC5, 0xAF, 0x10, 0x28, 0xEA, 0xBF, 0xF2, 0xA4,
> 0x9C, 0x4F, 0x6F, 0x7C, 0xC9, 0x14, 0xD5, 0xAF
> };
> 
>This is different from what @klaus suggested because I want to keep it
>consistent to what nvme-cli currently implements. I think here we can
>either change both nvme-cli and this patch or leave the order of the
>bytes as they are here. This all depends on how you interpret the Spec
>(which is ambiguous)
> 
> V3 changes:
> 1. Corrected a bunch of checkpatch issues. Since I changed the first patch
>I did not include the reviewed-by.
> 2. Included some documentation in nvme.rst for the ocp argument
> 3. Squashed the ocp arg changes into the main patch.
> 4. Fixed several comments and an open parenthesis
> 5. Hex values are now in lower case.
> 6. Change the reserved format to rsvd
> 7. Made sure that NvmeCtrl is the first arg in all the functions.
> 8. Fixed comment on commit of main patch
> 
> V2 changes:
> 1. I moved the ocp parameter from the namespace to the subsystem as it is
>defined there in the OCP specification
> 2. I now accumulate statistics from all namespaces and report them back on
>the extended log as per the spec.
> 3. I removed the default case in the switch in nvme_vendor_specific_log as
>it does not have any special function.
> 
> Joel Granados (2):
>   nvme: Move adjustment of data_units{read,written}
>   nvme: Add physical writes/reads from OCP log
> 
>  docs/system/devices/nvme.rst |  7 
>  hw/nvme/ctrl.c   | 73 +---
>  hw/nvme/nvme.h   |  1 +
>  include/block/nvme.h | 36 ++
>  4 files changed, 111 insertions(+), 6 deletions(-)
> 
> -- 
> 2.30.2
> 


signature.asc
Description: PGP signature


[RFC PATCH-for-8.0] hw: Avoid using inlined functions with external linkage

2022-12-08 Thread Philippe Mathieu-Daudé
When using Clang ("Apple clang version 14.0.0 (clang-1400.0.29.202)")
and building with -Wall we get:

  hw/arm/smmu-common.c:173:33: warning: static function 
'smmu_hash_remove_by_asid_iova' is used in an inline function with external 
linkage [-Wstatic-in-inline]
  hw/arm/smmu-common.h:170:1: note: use 'static' to give inline function 
'smmu_iotlb_inv_iova' internal linkage
void smmu_iotlb_inv_iova(SMMUState *s, int asid, dma_addr_t iova,
^
static

None of our code base require / use inlined functions with external
linkage. Some places use internal inlining in the hot path. These
two functions are certainly not in any hot path and don't justify
any inlining.

Reported-by: Stefan Weil 
Signed-off-by: Philippe Mathieu-Daudé 
---
RFC: Any better justification?
---
 hw/arm/smmu-common.c | 10 +-
 hw/i386/x86.c|  3 +--
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index e09b9c13b7..298e021cd3 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -116,7 +116,7 @@ void smmu_iotlb_insert(SMMUState *bs, SMMUTransCfg *cfg, 
SMMUTLBEntry *new)
 g_hash_table_insert(bs->iotlb, key, new);
 }
 
-inline void smmu_iotlb_inv_all(SMMUState *s)
+void smmu_iotlb_inv_all(SMMUState *s)
 {
 trace_smmu_iotlb_inv_all();
 g_hash_table_remove_all(s->iotlb);
@@ -146,7 +146,7 @@ static gboolean smmu_hash_remove_by_asid_iova(gpointer key, 
gpointer value,
((entry->iova & ~info->mask) == info->iova);
 }
 
-inline void
+void
 smmu_iotlb_inv_iova(SMMUState *s, int asid, dma_addr_t iova,
 uint8_t tg, uint64_t num_pages, uint8_t ttl)
 {
@@ -174,7 +174,7 @@ smmu_iotlb_inv_iova(SMMUState *s, int asid, dma_addr_t iova,
 &info);
 }
 
-inline void smmu_iotlb_inv_asid(SMMUState *s, uint16_t asid)
+void smmu_iotlb_inv_asid(SMMUState *s, uint16_t asid)
 {
 trace_smmu_iotlb_inv_asid(asid);
 g_hash_table_foreach_remove(s->iotlb, smmu_hash_remove_by_asid, &asid);
@@ -374,7 +374,7 @@ error:
  *
  * return 0 on success
  */
-inline int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
+int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
 SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info)
 {
 if (!cfg->aa64) {
@@ -483,7 +483,7 @@ static void smmu_unmap_notifier_range(IOMMUNotifier *n)
 }
 
 /* Unmap all notifiers attached to @mr */
-inline void smmu_inv_notifiers_mr(IOMMUMemoryRegion *mr)
+void smmu_inv_notifiers_mr(IOMMUMemoryRegion *mr)
 {
 IOMMUNotifier *n;
 
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 78cc131926..9ac1680180 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -64,8 +64,7 @@
 /* Physical Address of PVH entry point read from kernel ELF NOTE */
 static size_t pvh_start_addr;
 
-inline void init_topo_info(X86CPUTopoInfo *topo_info,
-   const X86MachineState *x86ms)
+void init_topo_info(X86CPUTopoInfo *topo_info, const X86MachineState *x86ms)
 {
 MachineState *ms = MACHINE(x86ms);
 
-- 
2.38.1




[PATCH] scripts/archive-source: Use more portable argument with tar command

2022-12-08 Thread Philippe Mathieu-Daudé
When using the archive-source.sh script on Darwin we get:

  tar: Option --concatenate is not supported
  Usage:
List:tar -tf 
Extract: tar -xf 
Create:  tar -cf  [filenames...]
Help:tar --help

Replace the long argument added by commit 8fc76176f6 ("scripts: use
git-archive in archive-source") by their short form to keep this
script functional.

Signed-off-by: Philippe Mathieu-Daudé 
---
 scripts/archive-source.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/archive-source.sh b/scripts/archive-source.sh
index 23e042dacd..6a710a212e 100755
--- a/scripts/archive-source.sh
+++ b/scripts/archive-source.sh
@@ -67,7 +67,7 @@ for sm in $submodules; do
 esac
 (cd $sm; git archive --format tar --prefix "$sm/" $(tree_ish)) > 
"$sub_file"
 test $? -ne 0 && error "failed to archive submodule $sm ($smhash)"
-tar --concatenate --file "$tar_file" "$sub_file"
+tar -c -f "$tar_file" "$sub_file"
 test $? -ne 0 && error "failed append submodule $sm to $tar_file"
 done
 exit 0
-- 
2.38.1




Re: [RFC PATCH-for-8.0] hw: Avoid using inlined functions with external linkage

2022-12-08 Thread Peter Maydell
On Thu, 8 Dec 2022 at 16:11, Philippe Mathieu-Daudé  wrote:
>
> When using Clang ("Apple clang version 14.0.0 (clang-1400.0.29.202)")
> and building with -Wall we get:
>
>   hw/arm/smmu-common.c:173:33: warning: static function 
> 'smmu_hash_remove_by_asid_iova' is used in an inline function with external 
> linkage [-Wstatic-in-inline]
>   hw/arm/smmu-common.h:170:1: note: use 'static' to give inline function 
> 'smmu_iotlb_inv_iova' internal linkage
> void smmu_iotlb_inv_iova(SMMUState *s, int asid, dma_addr_t iova,
> ^
> static
>
> None of our code base require / use inlined functions with external
> linkage. Some places use internal inlining in the hot path. These
> two functions are certainly not in any hot path and don't justify
> any inlining.
>
> Reported-by: Stefan Weil 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> RFC: Any better justification?

I don't really understand what the warning is trying to warn
about, and googling didn't enlighten me. Does anybody understand it?

In any case, it does seem weird to define a function inline and
also have it be defined in a C file rather than as a 'static inline'
in a header file, so these are likely oversights rather than
intentional.

> ---
>  hw/arm/smmu-common.c | 10 +-
>  hw/i386/x86.c|  3 +--
>  2 files changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> index e09b9c13b7..298e021cd3 100644
> --- a/hw/arm/smmu-common.c
> +++ b/hw/arm/smmu-common.c
> @@ -116,7 +116,7 @@ void smmu_iotlb_insert(SMMUState *bs, SMMUTransCfg *cfg, 
> SMMUTLBEntry *new)
>  g_hash_table_insert(bs->iotlb, key, new);
>  }
>
> -inline void smmu_iotlb_inv_all(SMMUState *s)
> +void smmu_iotlb_inv_all(SMMUState *s)
>  {
>  trace_smmu_iotlb_inv_all();
>  g_hash_table_remove_all(s->iotlb);
> @@ -146,7 +146,7 @@ static gboolean smmu_hash_remove_by_asid_iova(gpointer 
> key, gpointer value,
> ((entry->iova & ~info->mask) == info->iova);
>  }
>
> -inline void
> +void
>  smmu_iotlb_inv_iova(SMMUState *s, int asid, dma_addr_t iova,
>  uint8_t tg, uint64_t num_pages, uint8_t ttl)

While we're changing this, can we put the "void" on the same line as
the rest of the function prototype, to match the style of these other
functions ?

>  {
> @@ -174,7 +174,7 @@ smmu_iotlb_inv_iova(SMMUState *s, int asid, dma_addr_t 
> iova,
>  &info);
>  }
>
> -inline void smmu_iotlb_inv_asid(SMMUState *s, uint16_t asid)
> +void smmu_iotlb_inv_asid(SMMUState *s, uint16_t asid)
>  {
>  trace_smmu_iotlb_inv_asid(asid);
>  g_hash_table_foreach_remove(s->iotlb, smmu_hash_remove_by_asid, &asid);
> @@ -374,7 +374,7 @@ error:
>   *
>   * return 0 on success
>   */
> -inline int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags 
> perm,
> +int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
>  SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info)

This second line now needs re-indenting.

>  {
>  if (!cfg->aa64) {
> @@ -483,7 +483,7 @@ static void smmu_unmap_notifier_range(IOMMUNotifier *n)
>  }
>
>  /* Unmap all notifiers attached to @mr */
> -inline void smmu_inv_notifiers_mr(IOMMUMemoryRegion *mr)
> +void smmu_inv_notifiers_mr(IOMMUMemoryRegion *mr)
>  {
>  IOMMUNotifier *n;

> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index 78cc131926..9ac1680180 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -64,8 +64,7 @@
>  /* Physical Address of PVH entry point read from kernel ELF NOTE */
>  static size_t pvh_start_addr;
>
> -inline void init_topo_info(X86CPUTopoInfo *topo_info,
> -   const X86MachineState *x86ms)
> +void init_topo_info(X86CPUTopoInfo *topo_info, const X86MachineState *x86ms)
>  {
>  MachineState *ms = MACHINE(x86ms);

This function is not used anywhere outside this file, so we
can delete the prototype from include/hw/i386/x86.h and
make the function "static void".

With those changes,
Reviewed-by: Peter Maydell 

thanks
-- PMM



Re: [PATCH] scripts/archive-source: Use more portable argument with tar command

2022-12-08 Thread Peter Maydell
On Thu, 8 Dec 2022 at 16:21, Philippe Mathieu-Daudé  wrote:
>
> When using the archive-source.sh script on Darwin we get:
>
>   tar: Option --concatenate is not supported
>   Usage:
> List:tar -tf 
> Extract: tar -xf 
> Create:  tar -cf  [filenames...]
> Help:tar --help
>
> Replace the long argument added by commit 8fc76176f6 ("scripts: use
> git-archive in archive-source") by their short form to keep this
> script functional.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  scripts/archive-source.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/archive-source.sh b/scripts/archive-source.sh
> index 23e042dacd..6a710a212e 100755
> --- a/scripts/archive-source.sh
> +++ b/scripts/archive-source.sh
> @@ -67,7 +67,7 @@ for sm in $submodules; do
>  esac
>  (cd $sm; git archive --format tar --prefix "$sm/" $(tree_ish)) > 
> "$sub_file"
>  test $? -ne 0 && error "failed to archive submodule $sm ($smhash)"
> -tar --concatenate --file "$tar_file" "$sub_file"
> +tar -c -f "$tar_file" "$sub_file"

'-c' is not the short-form option of '--concatenate': that would
be '-A'. The problem is not long vs short options, but that
BSD-style tar does not support the --concatenate functionality at all.

>  test $? -ne 0 && error "failed append submodule $sm to $tar_file"
>  done
>  exit 0

thanks
-- PMM



Re: [PATCH qemu.git 0/1] hw/arm/virt: add 2x sp804 timer

2022-12-08 Thread Axel Heider

Peter,


This patch adds timer peripherals to the arm-virt machine.>>

Is there a reason you can't use the CPU's built-in generic timer
device ? That is what typical guest code does on this system.
I'm a bit reluctant to add more devices to the virt board
because over time it gradually gets increasingly complicated,
and every new device model we expose to the guest is another
thing that's part of the security attack surface for guest
code trying to escape from a KVM VM.


For the seL4 specific case, this is currently not possible in
the standard configuration. It's only exposed for a special
debug and benchmarking configuration.

The catch we have here is, that the virt machine is a nice
generic ARM (and RISC-V) machine for OS testing purposes also,
but it sometimes lacks things (see my other patched for the
UART). So, I wonder what would be the best option to continue
here. Should we consider defining another generic machine
profile that is more suited for the system emulation use case.
This is what OS developer could use then. Or could the virt
machine get some config parameters to customize it further.
So the "Machine-specific options" would  support a "sp804=on"
that would add two timer peripherals then?

The really cool customization option would be passing a DTB
to QEMU that describes exactly what "virt" machine is to be
emulated. I think the Xlinx fork used to support this feature
partly. Not sure if there was ever an attempt to mainline this?
But it would avoid running into a command parameters hell for
customization options.

Axel





Re: [PATCH qemu.git 0/1] hw/arm/virt: add 2x sp804 timer

2022-12-08 Thread Peter Maydell
On Thu, 8 Dec 2022 at 16:59, Axel Heider  wrote:
>
> Peter,
>
> >> This patch adds timer peripherals to the arm-virt machine.>>
> > Is there a reason you can't use the CPU's built-in generic timer
> > device ? That is what typical guest code does on this system.
> > I'm a bit reluctant to add more devices to the virt board
> > because over time it gradually gets increasingly complicated,
> > and every new device model we expose to the guest is another
> > thing that's part of the security attack surface for guest
> > code trying to escape from a KVM VM.
>
> For the seL4 specific case, this is currently not possible in
> the standard configuration. It's only exposed for a special
> debug and benchmarking configuration.

It's not clear to me what you mean here -- the generic
timer in the CPU exists in all configurations, so there
should be no obstacle to seL4 using it.

> The catch we have here is, that the virt machine is a nice
> generic ARM (and RISC-V) machine for OS testing purposes also,
> but it sometimes lacks things (see my other patched for the
> UART). So, I wonder what would be the best option to continue
> here. Should we consider defining another generic machine
> profile that is more suited for the system emulation use case.
> This is what OS developer could use then. Or could the virt
> machine get some config parameters to customize it further.
> So the "Machine-specific options" would  support a "sp804=on"
> that would add two timer peripherals then?
>
> The really cool customization option would be passing a DTB
> to QEMU that describes exactly what "virt" machine is to be
> emulated.

This is a firm "no" -- it sounds on the surface like a good
idea but it doesn't actually work in practice -- DTB files
don't provide enough info to be able to build a board from,
except in some specific restricted situations like the Xilinx one.

-- PMM



Re: [PATCH] scripts/archive-source: Use more portable argument with tar command

2022-12-08 Thread Alex Bennée


Philippe Mathieu-Daudé  writes:

> When using the archive-source.sh script on Darwin we get:
>
>   tar: Option --concatenate is not supported
>   Usage:
> List:tar -tf 
> Extract: tar -xf 
> Create:  tar -cf  [filenames...]
> Help:tar --help
>
> Replace the long argument added by commit 8fc76176f6 ("scripts: use
> git-archive in archive-source") by their short form to keep this
> script functional.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  scripts/archive-source.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/archive-source.sh b/scripts/archive-source.sh
> index 23e042dacd..6a710a212e 100755
> --- a/scripts/archive-source.sh
> +++ b/scripts/archive-source.sh
> @@ -67,7 +67,7 @@ for sm in $submodules; do
>  esac
>  (cd $sm; git archive --format tar --prefix "$sm/" $(tree_ish)) > 
> "$sub_file"
>  test $? -ne 0 && error "failed to archive submodule $sm ($smhash)"
> -tar --concatenate --file "$tar_file" "$sub_file"
> +tar -c -f "$tar_file" "$sub_file"

I'm not sure that is correct. The gnu shortform for --concatenate is -A,
-c is specifically create so I suspect you end up re-creating the
tarball rather than adding to it.

>  test $? -ne 0 && error "failed append submodule $sm to $tar_file"
>  done
>  exit 0


-- 
Alex Bennée



Re: [PATCH] scripts/archive-source: Use more portable argument with tar command

2022-12-08 Thread Daniel P . Berrangé
On Thu, Dec 08, 2022 at 05:20:51PM +0100, Philippe Mathieu-Daudé wrote:
> When using the archive-source.sh script on Darwin we get:
> 
>   tar: Option --concatenate is not supported
>   Usage:
> List:tar -tf 
> Extract: tar -xf 
> Create:  tar -cf  [filenames...]
> Help:tar --help
> 
> Replace the long argument added by commit 8fc76176f6 ("scripts: use
> git-archive in archive-source") by their short form to keep this
> script functional.

Or install a better tar implementation from brew ?

  https://formulae.brew.sh/formula/gnu-tar


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH qemu.git 0/1] hw/arm/virt: add 2x sp804 timer

2022-12-08 Thread Axel Heider

Peter,



For the seL4 specific case, this is currently not possible in
the standard configuration. It's only exposed for a special
debug and benchmarking configuration.


It's not clear to me what you mean here -- the generic
timer in the CPU exists in all configurations, so there
should be no obstacle to seL4 using it.


Access is not exposed to userland in the standard configuration
and the standard kernel API has no no timeouts besides zero and
infinite. It's a design thing in the end. Nothing that could not
be hacked around or be changed in the design in the long run. But
my goal is not to hack around, but have a "proper" machine
simulation instead. Which basically falls down to having a generic
machine in mainline that has a few more customization options.


The really cool customization option would be passing a DTB
to QEMU that describes exactly what "virt" machine is to be
emulated.


This is a firm "no" -- it sounds on the surface like a good
idea but it doesn't actually work in practice -- DTB files
don't provide enough info to be able to build a board from,
except in some specific restricted situations like the Xilinx
one.


I can see the point. But what about supporting an overlay DTB
that takes a stripped down virt machine as base? This might avoid
some limitation. In the long run, customization via a DTB seems
still better then adding parameters to the command line. For the
short term, a few more command line options seem good enough.

What is the general feeling about having a more general system
emulation option when it comes to the "virt" machine, and a way
of resolving the usage (and security) conflict with the KVM
usecase.

Axel






Re: [PATCH 1/4] coroutine: Clean up superfluous inclusion of qemu/coroutine.h

2022-12-08 Thread Markus Armbruster
Stefan Hajnoczi  writes:

> Probably because block layer, aio.h, and coroutine_int.h header files
> already include "qemu/coroutine.h"?

Mostly, but not always.  For instance, crypto/block-luks-priv.h compiles
fine without it, and doesn't include it after this patch.

> Reviewed-by: Stefan Hajnoczi 

Thanks!




Re: [PATCH] scripts/archive-source: Use more portable argument with tar command

2022-12-08 Thread Philippe Mathieu-Daudé

On 8/12/22 18:15, Daniel P. Berrangé wrote:

On Thu, Dec 08, 2022 at 05:20:51PM +0100, Philippe Mathieu-Daudé wrote:

When using the archive-source.sh script on Darwin we get:

   tar: Option --concatenate is not supported
   Usage:
 List:tar -tf 
 Extract: tar -xf 
 Create:  tar -cf  [filenames...]
 Help:tar --help

Replace the long argument added by commit 8fc76176f6 ("scripts: use
git-archive in archive-source") by their short form to keep this
script functional.


Or install a better tar implementation from brew ?

   https://formulae.brew.sh/formula/gnu-tar


Good idea, this works for me:

-- >8 --
diff --git a/scripts/archive-source.sh b/scripts/archive-source.sh
index 23e042dacd..150bdf5536 100755
--- a/scripts/archive-source.sh
+++ b/scripts/archive-source.sh
@@ -20,2 +20,3 @@ fi

+tar=$(command -v gtar || command -v tar)
 tar_file=$(realpath "$1")
@@ -69,3 +70,3 @@ for sm in $submodules; do
 test $? -ne 0 && error "failed to archive submodule $sm ($smhash)"
-tar --concatenate --file "$tar_file" "$sub_file"
+$tar --concatenate --file "$tar_file" "$sub_file"
 test $? -ne 0 && error "failed append submodule $sm to $tar_file"
---



Re: [PATCH v2 3/3] hw/nvme: fix missing cq eventidx update

2022-12-08 Thread Guenter Roeck
On Thu, Dec 08, 2022 at 01:26:42PM +0100, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Prior to reading the shadow doorbell cq head, we have to update the
> eventidx. Otherwise, we risk that the driver will skip an mmio doorbell
> write. This happens on riscv64, as reported by Guenter.
> 
> Adding the missing update to the cq eventidx fixes the issue.
> 
> Fixes: 3f7fe8de3d49 ("hw/nvme: Implement shadow doorbell buffer support")
> Cc: qemu-sta...@nongnu.org
> Cc: qemu-ri...@nongnu.org
> Reported-by: Guenter Roeck 
> Signed-off-by: Klaus Jensen 

Tested-by: Guenter Roeck 

> ---
>  hw/nvme/ctrl.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index cfab21b3436e..f6cc766aba4a 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -1334,6 +1334,14 @@ static inline void nvme_blk_write(BlockBackend *blk, 
> int64_t offset,
>  }
>  }
>  
> +static void nvme_update_cq_eventidx(const NvmeCQueue *cq)
> +{
> +trace_pci_nvme_update_cq_eventidx(cq->cqid, cq->head);
> +
> +pci_dma_write(PCI_DEVICE(cq->ctrl), cq->ei_addr, &cq->head,
> +  sizeof(cq->head));
> +}
> +
>  static void nvme_update_cq_head(NvmeCQueue *cq)
>  {
>  pci_dma_read(PCI_DEVICE(cq->ctrl), cq->db_addr, &cq->head,
> @@ -1355,6 +1363,7 @@ static void nvme_post_cqes(void *opaque)
>  hwaddr addr;
>  
>  if (n->dbbuf_enabled) {
> +nvme_update_cq_eventidx(cq);
>  nvme_update_cq_head(cq);
>  }
>  



Re: [PATCH] FreeBSD: Upgrade to 12.4 release

2022-12-08 Thread Warner Losh
On Thu, Dec 8, 2022 at 12:47 AM Philippe Mathieu-Daudé 
wrote:

> On 8/12/22 07:52, Brad Smith wrote:
> > FreeBSD: Upgrade to 12.4 release
> >
> > Signed-off-by: Brad Smith 
> > ---
> >   .gitlab-ci.d/cirrus.yml | 2 +-
> >   tests/vm/freebsd| 4 ++--
> >   2 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/.gitlab-ci.d/cirrus.yml b/.gitlab-ci.d/cirrus.yml
> > index 634a73a742..785b163aa6 100644
> > --- a/.gitlab-ci.d/cirrus.yml
> > +++ b/.gitlab-ci.d/cirrus.yml
> > @@ -50,7 +50,7 @@ x64-freebsd-12-build:
> >   NAME: freebsd-12
> >   CIRRUS_VM_INSTANCE_TYPE: freebsd_instance
> >   CIRRUS_VM_IMAGE_SELECTOR: image_family
> > -CIRRUS_VM_IMAGE_NAME: freebsd-12-3
> > +CIRRUS_VM_IMAGE_NAME: freebsd-12-4
> >   CIRRUS_VM_CPUS: 8
> >   CIRRUS_VM_RAM: 8G
> >   UPDATE_COMMAND: pkg update
> > diff --git a/tests/vm/freebsd b/tests/vm/freebsd
> > index d6ff4461ba..ba2ba23d24 100755
> > --- a/tests/vm/freebsd
> > +++ b/tests/vm/freebsd
> > @@ -28,8 +28,8 @@ class FreeBSDVM(basevm.BaseVM):
> >   name = "freebsd"
> >   arch = "x86_64"
> >
> > -link = "
> https://download.freebsd.org/ftp/releases/ISO-IMAGES/12.3/FreeBSD-12.3-RELEASE-amd64-disc1.iso.xz
> "
> > -csum =
> "36dd0de50f1fe5f0a88e181e94657656de26fb64254412f74e80e128e8b938b4"
> > +link = "
> https://download.freebsd.org/ftp/releases/ISO-IMAGES/12.4/FreeBSD-12.4-RELEASE-amd64-disc1.iso.xz
> "
> > +csum =
> "1dcf6446e31bf3f81b582e9aba3319a258c29a937a2af6138ee4b181ed719a87"
>
> I don't remember and wonder why we don't use the pre-populated image:
>
> https://download.freebsd.org/ftp/releases/VM-IMAGES/12.4-RELEASE/amd64/Latest/FreeBSD-12.4-RELEASE-amd64.qcow2.xz


QEMU's CI pre-dates the FreeBSD project producing those images. I don't
think there's a big technical reason to not use them, though some of the
scripting would need to change (mostly, I think, to delete things, and
maybe to more-directly change config files to effect some of the settings
done via the installer).


> Anyhow,
>
> Reviewed-by: Philippe Mathieu-Daudé 
> Tested-by: Philippe Mathieu-Daudé 
>

Reviewed by: Warner Losh 


Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support

2022-12-08 Thread Guenter Roeck
On Thu, Dec 08, 2022 at 09:08:12AM +0100, Klaus Jensen wrote:
> On Dec  8 08:16, Klaus Jensen wrote:
> > On Dec  7 09:49, Guenter Roeck wrote:
> > > Hi,
> > > 
> > > On Thu, Jun 16, 2022 at 08:34:07PM +0800, Jinhao Fan wrote:
> > > > Implement Doorbel Buffer Config command (Section 5.7 in NVMe Spec 1.3)
> > > > and Shadow Doorbel buffer & EventIdx buffer handling logic (Section 7.13
> > > > in NVMe Spec 1.3). For queues created before the Doorbell Buffer Config
> > > > command, the nvme_dbbuf_config function tries to associate each existing
> > > > SQ and CQ with its Shadow Doorbel buffer and EventIdx buffer address.
> > > > Queues created after the Doorbell Buffer Config command will have the
> > > > doorbell buffers associated with them when they are initialized.
> > > > 
> > > > In nvme_process_sq and nvme_post_cqe, proactively check for Shadow
> > > > Doorbell buffer changes instead of wait for doorbell register changes.
> > > > This reduces the number of MMIOs.
> > > > 
> > > > In nvme_process_db(), update the shadow doorbell buffer value with
> > > > the doorbell register value if it is the admin queue. This is a hack
> > > > since hosts like Linux NVMe driver and SPDK do not use shadow
> > > > doorbell buffer for the admin queue. Copying the doorbell register
> > > > value to the shadow doorbell buffer allows us to support these hosts
> > > > as well as spec-compliant hosts that use shadow doorbell buffer for
> > > > the admin queue.
> > > > 
> > > > Signed-off-by: Jinhao Fan 
> > > 
> > > I noticed that I can no longer boot Linux kernels from nvme on riscv64
> > > systems. The problem is seen with qemu v7.1 and qemu v7.2-rc4.
> > > The log shows:
> > > 
> > > [   35.904128] nvme nvme0: I/O 642 (I/O Cmd) QID 1 timeout, aborting
> > > [   35.905000] EXT4-fs (nvme0n1): mounting ext2 file system using the 
> > > ext4 subsystem
> > > [   66.623863] nvme nvme0: I/O 643 (I/O Cmd) QID 1 timeout, aborting
> > > [   97.343989] nvme nvme0: Abort status: 0x0
> > > [   97.344355] nvme nvme0: Abort status: 0x0
> > > [   97.344647] nvme nvme0: I/O 7 QID 0 timeout, reset controller
> > > [   97.350568] nvme nvme0: I/O 644 (I/O Cmd) QID 1 timeout, aborting
> > > 
> > > This is with the mainline Linux kernel (v6.1-rc8).
> > > 
> > > Bisect points to this patch. Reverting this patch and a number of 
> > > associated
> > > patches (to fix conflicts) fixes the problem.
> > > 
> > > 06143d8771 Revert "hw/nvme: Implement shadow doorbell buffer support"
> > > acb4443e3a Revert "hw/nvme: Use ioeventfd to handle doorbell updates"
> > > d5fd309feb Revert "hw/nvme: do not enable ioeventfd by default"
> > > 1ca1e6c47c Revert "hw/nvme: unregister the event notifier handler on the 
> > > main loop"
> > > 2d26abd51e Revert "hw/nvme: skip queue processing if notifier is cleared"
> > > 99d411b5a5 Revert "hw/nvme: reenable cqe batching"
> > > 2293d3ca6c Revert "hw/nvme: Add trace events for shadow doorbell buffer"
> > > 
> > > Qemu command line:
> > > 
> > > qemu-system-riscv64 -M virt -m 512M \
> > >  -kernel arch/riscv/boot/Image -snapshot \
> > >  -device nvme,serial=foo,drive=d0 \
> > >  -drive file=rootfs.ext2,if=none,format=raw,id=d0 \
> > >  -bios default \
> > >  -append "root=/dev/nvme0n1 console=ttyS0,115200 
> > > earlycon=uart8250,mmio,0x1000,115200" \
> > >  -nographic -monitor none
> > > 
> > > Guenter
> > 
> > Hi Guenter,
> > 
> > Thanks for the bisect.
> > 
> > The shadow doorbell is also an obvious candidate for this regression. I
> > wonder if this could be a kernel bug, since we are not observing this on
> > other architectures. The memory barriers required are super finicky, but
> > in QEMU all the operations are associated with full memory barriers. The
> > barriers are more fine grained in the kernel though.
> > 
> > I will dig into this together with Keith.
> 
> A cq head doorbell mmio is skipped... And it is not the fault of the
> kernel. The kernel is in it's good right to skip the mmio since the cq
> eventidx is not properly updated.
> 
> Adding that and it boots properly on riscv. But I'm perplexed as to why
> this didnt show up on our regularly tested platforms.
> 
> Gonna try to get this in for 7.2!

I see another problem with sparc64.

[5.261508] could not locate request for tag 0x0
[5.261711] nvme nvme0: invalid id 0 completed on queue 1

That is seen repeatedly until the request times out. I'll test with
your patch to see if it resolves this problem as well, and will bisect
otherwise.

Guenter



Re: [PATCH 3/6] hw/i2c: Allwinner TWI/I2C Emulation

2022-12-08 Thread Strahinja Jankovic
Hi Niek,

On Wed, Dec 7, 2022 at 11:06 PM Niek Linnenbank
 wrote:
>
> Hi Strahinja,
>
> On Sun, Dec 4, 2022 at 12:19 AM Strahinja Jankovic 
>  wrote:
>>
>> This patch implements Allwinner TWI/I2C controller emulation. Only
>> master-mode functionality is implemented.
>>
>> The SPL boot for Cubieboard expects AXP209 PMIC on TWI0/I2C0 bus, so this is
>> first part enabling the TWI/I2C bus operation.
>>
>> Since both Allwinner A10 and H3 use the same module, it is added for
>> both boards.
>
>
> The A10 and H3 datasheets have the same introduction text on the TWI, 
> suggesting re-use indeed. Unfortunately
> the A10 datasheet seems to be missing register documentation, so I can't 
> compare that with the H3 datasheet.

The A10 register documentation for TWI exists in
https://linux-sunxi.org/File:Allwinner_A10_User_manual_V1.5.pdf user
manual (unfortunately, register description for many other modules is
missing). From what I could see, the description matches the H3, so I
thought it would be good to use the same implementation.

>
> At least according to what is implemented in the linux kernel, looks like 
> that indeed both SoCs implement the same I2C module.
> The file drivers/i2c/busses/i2c-mv64xxx.c has the following 
> mv64xxx_i2c_of_match_table:
> { .compatible = "allwinner,sun4i-a10-i2c", .data = 
> &mv64xxx_i2c_regs_sun4i},
> { .compatible = "allwinner,sun6i-a31-i2c", .data = 
> &mv64xxx_i2c_regs_sun4i},
>
> And both SoCs define the sun4i-a10-i2c and sun6i-a31-i2c in their device tree 
> files, respectively.
>
> Could you please also update the documentation files for both boards, so we 
> can show that they now support TWI/I2C?
>   docs/system/arm/cubieboard.rst
>   docs/system/arm/orangepi.rst

Yes, I will update these documents in V2. I will also try to update
the description for the cubieboard to have some examples on how to run
the emulation.

>
>>
>>
>> Signed-off-by: Strahinja Jankovic 
>> ---
>>  hw/arm/Kconfig |   2 +
>>  hw/arm/allwinner-a10.c |   8 +
>>  hw/arm/allwinner-h3.c  |  11 +-
>>  hw/i2c/Kconfig |   4 +
>>  hw/i2c/allwinner-i2c.c | 417 +
>>  hw/i2c/meson.build |   1 +
>>  include/hw/arm/allwinner-a10.h |   2 +
>>  include/hw/arm/allwinner-h3.h  |   3 +
>>  include/hw/i2c/allwinner-i2c.h | 112 +
>>  9 files changed, 559 insertions(+), 1 deletion(-)
>>  create mode 100644 hw/i2c/allwinner-i2c.c
>>  create mode 100644 include/hw/i2c/allwinner-i2c.h
>>
>> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
>> index 140f142ae5..eefe1fd134 100644
>> --- a/hw/arm/Kconfig
>> +++ b/hw/arm/Kconfig
>> @@ -322,6 +322,7 @@ config ALLWINNER_A10
>>  select ALLWINNER_A10_CCM
>>  select ALLWINNER_A10_DRAMC
>>  select ALLWINNER_EMAC
>> +select ALLWINNER_I2C
>>  select SERIAL
>>  select UNIMP
>>
>> @@ -329,6 +330,7 @@ config ALLWINNER_H3
>>  bool
>>  select ALLWINNER_A10_PIT
>>  select ALLWINNER_SUN8I_EMAC
>> +select ALLWINNER_I2C
>>  select SERIAL
>>  select ARM_TIMER
>>  select ARM_GIC
>> diff --git a/hw/arm/allwinner-a10.c b/hw/arm/allwinner-a10.c
>> index a5f7a36ac9..17e439777e 100644
>> --- a/hw/arm/allwinner-a10.c
>> +++ b/hw/arm/allwinner-a10.c
>> @@ -36,6 +36,7 @@
>>  #define AW_A10_OHCI_BASE0x01c14400
>>  #define AW_A10_SATA_BASE0x01c18000
>>  #define AW_A10_RTC_BASE 0x01c20d00
>> +#define AW_A10_I2C0_BASE0x01c2ac00
>>
>>  static void aw_a10_init(Object *obj)
>>  {
>> @@ -56,6 +57,8 @@ static void aw_a10_init(Object *obj)
>>
>>  object_initialize_child(obj, "sata", &s->sata, TYPE_ALLWINNER_AHCI);
>>
>> +object_initialize_child(obj, "i2c0", &s->i2c0, TYPE_AW_I2C);
>> +
>>  if (machine_usb(current_machine)) {
>>  int i;
>>
>> @@ -176,6 +179,11 @@ static void aw_a10_realize(DeviceState *dev, Error 
>> **errp)
>>  /* RTC */
>>  sysbus_realize(SYS_BUS_DEVICE(&s->rtc), &error_fatal);
>>  sysbus_mmio_map_overlap(SYS_BUS_DEVICE(&s->rtc), 0, AW_A10_RTC_BASE, 
>> 10);
>> +
>> +/* I2C */
>> +sysbus_realize(SYS_BUS_DEVICE(&s->i2c0), &error_fatal);
>> +sysbus_mmio_map(SYS_BUS_DEVICE(&s->i2c0), 0, AW_A10_I2C0_BASE);
>> +sysbus_connect_irq(SYS_BUS_DEVICE(&s->i2c0), 0, qdev_get_gpio_in(dev, 
>> 7));
>>  }
>>
>>  static void aw_a10_class_init(ObjectClass *oc, void *data)
>> diff --git a/hw/arm/allwinner-h3.c b/hw/arm/allwinner-h3.c
>> index 308ed15552..bfce3c8d92 100644
>> --- a/hw/arm/allwinner-h3.c
>> +++ b/hw/arm/allwinner-h3.c
>> @@ -53,6 +53,7 @@ const hwaddr allwinner_h3_memmap[] = {
>>  [AW_H3_DEV_UART1]  = 0x01c28400,
>>  [AW_H3_DEV_UART2]  = 0x01c28800,
>>  [AW_H3_DEV_UART3]  = 0x01c28c00,
>> +[AW_H3_DEV_TWI0]   = 0x01c2ac00,
>>  [AW_H3_DEV_EMAC]   = 0x01c3,
>>  [AW_H3_DEV_DRAMCOM]= 0x01c62000,
>>  [AW_H3_DEV_DRAMCTL]= 0x01c63000,
>> @@ -106,7 +107,6 @@ struct AwH3Unimplemented {
>>  { "uart1

Re: [PATCH 6/6] hw/arm: Allwinner A10 enable SPL load from MMC

2022-12-08 Thread Strahinja Jankovic
On Wed, Dec 7, 2022 at 11:39 PM Niek Linnenbank
 wrote:
>
> Hi Strahinja,
>
>
> On Sun, Dec 4, 2022 at 12:19 AM Strahinja Jankovic 
>  wrote:
>>
>> This patch enables copying of SPL from MMC if `-kernel` parameter is not
>> passed when starting QEMU. SPL is copied to SRAM_A.
>>
>> The approach is reused from Allwinner H3 implementation.
>>
>> Tested with Armbian and custom Yocto image.
>>
>> Signed-off-by: Strahinja Jankovic 
>> ---
>>  hw/arm/allwinner-a10.c | 18 ++
>>  hw/arm/cubieboard.c|  5 +
>>  include/hw/arm/allwinner-a10.h | 21 +
>>  3 files changed, 44 insertions(+)
>>
>> diff --git a/hw/arm/allwinner-a10.c b/hw/arm/allwinner-a10.c
>> index 17e439777e..dc1966ff7a 100644
>> --- a/hw/arm/allwinner-a10.c
>> +++ b/hw/arm/allwinner-a10.c
>> @@ -24,7 +24,9 @@
>>  #include "sysemu/sysemu.h"
>>  #include "hw/boards.h"
>>  #include "hw/usb/hcd-ohci.h"
>> +#include "hw/loader.h"
>>
>> +#define AW_A10_SRAM_A_BASE  0x
>>  #define AW_A10_DRAMC_BASE   0x01c01000
>>  #define AW_A10_MMC0_BASE0x01c0f000
>>  #define AW_A10_CCM_BASE 0x01c2
>> @@ -38,6 +40,22 @@
>>  #define AW_A10_RTC_BASE 0x01c20d00
>>  #define AW_A10_I2C0_BASE0x01c2ac00
>>
>> +void allwinner_a10_bootrom_setup(AwA10State *s, BlockBackend *blk)
>> +{
>> +const int64_t rom_size = 32 * KiB;
>> +g_autofree uint8_t *buffer = g_new0(uint8_t, rom_size);
>> +
>> +if (blk_pread(blk, 8 * KiB, rom_size, buffer, 0) < 0) {
>> +error_setg(&error_fatal, "%s: failed to read BlockBackend data",
>> +   __func__);
>> +return;
>> +}
>> +
>> +rom_add_blob("allwinner-a10.bootrom", buffer, rom_size,
>> +  rom_size, AW_A10_SRAM_A_BASE,
>> +  NULL, NULL, NULL, NULL, false);
>> +}
>
>
> Its probably fine for now to do it in the same way here for the A10 indeed. 
> Perhaps in the future, we can try
> to share some overlapping code between the A10 and H3.

That definitely makes sense. I plan on submitting support for A20
after this patch set, so maybe that would be a good opportunity to
refactor the Allwinner support in QEMU.

Best regards,
Strahinja


>
> So the patch looks fine to me:
> Reviewed-by: Niek Linnenbank 
>
> Regards,
> Niek
>
>>
>> +
>>  static void aw_a10_init(Object *obj)
>>  {
>>  AwA10State *s = AW_A10(obj);
>> diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
>> index afc7980414..37659c35fd 100644
>> --- a/hw/arm/cubieboard.c
>> +++ b/hw/arm/cubieboard.c
>> @@ -99,6 +99,11 @@ static void cubieboard_init(MachineState *machine)
>>  memory_region_add_subregion(get_system_memory(), AW_A10_SDRAM_BASE,
>>  machine->ram);
>>
>> +/* Load target kernel or start using BootROM */
>> +if (!machine->kernel_filename && blk && blk_is_available(blk)) {
>> +/* Use Boot ROM to copy data from SD card to SRAM */
>> +allwinner_a10_bootrom_setup(a10, blk);
>> +}
>>  /* TODO create and connect IDE devices for ide_drive_get() */
>>
>>  cubieboard_binfo.ram_size = machine->ram_size;
>> diff --git a/include/hw/arm/allwinner-a10.h b/include/hw/arm/allwinner-a10.h
>> index 763935fca9..b3c9ed24c7 100644
>> --- a/include/hw/arm/allwinner-a10.h
>> +++ b/include/hw/arm/allwinner-a10.h
>> @@ -15,6 +15,7 @@
>>  #include "hw/misc/allwinner-a10-ccm.h"
>>  #include "hw/misc/allwinner-a10-dramc.h"
>>  #include "hw/i2c/allwinner-i2c.h"
>> +#include "sysemu/block-backend.h"
>>
>>  #include "target/arm/cpu.h"
>>  #include "qom/object.h"
>> @@ -47,4 +48,24 @@ struct AwA10State {
>>  OHCISysBusState ohci[AW_A10_NUM_USB];
>>  };
>>
>> +/**
>> + * Emulate Boot ROM firmware setup functionality.
>> + *
>> + * A real Allwinner A10 SoC contains a Boot ROM
>> + * which is the first code that runs right after
>> + * the SoC is powered on. The Boot ROM is responsible
>> + * for loading user code (e.g. a bootloader) from any
>> + * of the supported external devices and writing the
>> + * downloaded code to internal SRAM. After loading the SoC
>> + * begins executing the code written to SRAM.
>> + *
>> + * This function emulates the Boot ROM by copying 32 KiB
>> + * of data from the given block device and writes it to
>> + * the start of the first internal SRAM memory.
>> + *
>> + * @s: Allwinner A10 state object pointer
>> + * @blk: Block backend device object pointer
>> + */
>> +void allwinner_a10_bootrom_setup(AwA10State *s, BlockBackend *blk);
>> +
>>  #endif
>> --
>> 2.30.2
>>
>
>
> --
> Niek Linnenbank
>



Re: [PATCH 0/6] Enable Cubieboard A10 boot SPL from SD card

2022-12-08 Thread Strahinja Jankovic
Hi Niek,

On Wed, Dec 7, 2022 at 9:25 PM Niek Linnenbank  wrote:
>
> Hello Strahinja,
>
> Thanks for contribution these patches, and also taking the H3 into account :-)

Thank you for looking into these patches and all of the comments. I
will try to submit V2 of this patch set in the following days.

>
> I've ran the avocado based acceptance tests for both boards and got these 
> results:
>
> $ ARMBIAN_ARTIFACTS_CACHED=yes AVOCADO_ALLOW_LARGE_STORAGE=yes 
> ./build/tests/venv/bin/avocado --show=app,console run -t machine:orangepi-pc 
> tests/avocado/boot_linux_console.py
> ...
> RESULTS: PASS 5 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | 
> CANCEL 0
> JOB TIME   : 114.24 s
>
> $ ./build/tests/venv/bin/avocado --show=app,console run -t machine:cubieboard 
> tests/avocado/boot_linux_console.py
> ...
> RESULTS: PASS 2 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | 
> CANCEL 0
> JOB TIME   : 22.79 s

I did not think initially about avocado, but maybe I could also add an
SPL/SD boot test for the cubieboard, similarly to the way it is run
for Orange Pi, for V2 of the patch set?

Best regards,
Strahinja



>
> So that shows both machines are still running fine. During startup of the 
> bionic 20.08 image for orangepi-pc it did show this message:
>   console: i2c i2c-0: mv64xxx: I2C bus locked, block: 1, time_left: 0
>   console: sy8106a: probe of 0-0065 failed with error -110
>
> The SY8106a appears to be an peripheral attached to the I2C bus on the 
> orangepi-pc, and we don't emulate the SY8106a yet, so that's an error to be 
> expected:
>   https://linux-sunxi.org/SY8106A
>
> So for the series:
> Tested-by: Niek Linnenbank 
>
> I'll try to reply to each patch as well.
>
> Kind regards,
> Niek
>
> On Sun, Dec 4, 2022 at 12:19 AM Strahinja Jankovic 
>  wrote:
>>
>> This patch series adds missing Allwinner A10 modules needed for
>> successful SPL boot:
>> - Clock controller module
>> - DRAM controller
>> - I2C0 controller (added also for Allwinner H3 since it is the same)
>> - AXP-209 connected to I2C0 bus
>>
>> It also updates Allwinner A10 emulation so SPL is copied from attached
>> SD card if `-kernel` parameter is not passed when starting QEMU
>> (approach adapted from Allwinner H3 implementation).
>>
>> Boot from SD card has been tested with Cubieboard Armbian SD card image and 
>> custom
>> Yocto image built for Cubieboard.
>> Example usage for Armbian image:
>> qemu-system-arm -M cubieboard -nographic -sd 
>> ~/Armbian_22.11.0-trunk_Cubieboard_kinetic_edge_6.0.7.img
>>
>>
>> Strahinja Jankovic (6):
>>   hw/misc: Allwinner-A10 Clock Controller Module Emulation
>>   hw/misc: Allwinner A10 DRAM Controller Emulation
>>   hw/i2c: Allwinner TWI/I2C Emulation
>>   hw/misc: Allwinner AXP-209 Emulation
>>   hw/arm: Add AXP-209 to Cubieboard
>>   hw/arm: Allwinner A10 enable SPL load from MMC
>>
>>  hw/arm/Kconfig|   5 +
>>  hw/arm/allwinner-a10.c|  40 +++
>>  hw/arm/allwinner-h3.c |  11 +-
>>  hw/arm/cubieboard.c   |  11 +
>>  hw/i2c/Kconfig|   4 +
>>  hw/i2c/allwinner-i2c.c| 417 ++
>>  hw/i2c/meson.build|   1 +
>>  hw/misc/Kconfig   |  10 +
>>  hw/misc/allwinner-a10-ccm.c   | 224 ++
>>  hw/misc/allwinner-a10-dramc.c | 179 +++
>>  hw/misc/allwinner-axp-209.c   | 263 
>>  hw/misc/meson.build   |   3 +
>>  include/hw/arm/allwinner-a10.h|  27 ++
>>  include/hw/arm/allwinner-h3.h |   3 +
>>  include/hw/i2c/allwinner-i2c.h| 112 +++
>>  include/hw/misc/allwinner-a10-ccm.h   |  67 +
>>  include/hw/misc/allwinner-a10-dramc.h |  68 +
>>  17 files changed, 1444 insertions(+), 1 deletion(-)
>>  create mode 100644 hw/i2c/allwinner-i2c.c
>>  create mode 100644 hw/misc/allwinner-a10-ccm.c
>>  create mode 100644 hw/misc/allwinner-a10-dramc.c
>>  create mode 100644 hw/misc/allwinner-axp-209.c
>>  create mode 100644 include/hw/i2c/allwinner-i2c.h
>>  create mode 100644 include/hw/misc/allwinner-a10-ccm.h
>>  create mode 100644 include/hw/misc/allwinner-a10-dramc.h
>>
>> --
>> 2.30.2
>>
>
>
> --
> Niek Linnenbank
>



Re: [PATCH v3 1/2] hw/nvme: Implement shadow doorbell buffer support

2022-12-08 Thread Guenter Roeck
On Thu, Dec 08, 2022 at 10:47:42AM -0800, Guenter Roeck wrote:
> > 
> > A cq head doorbell mmio is skipped... And it is not the fault of the
> > kernel. The kernel is in it's good right to skip the mmio since the cq
> > eventidx is not properly updated.
> > 
> > Adding that and it boots properly on riscv. But I'm perplexed as to why
> > this didnt show up on our regularly tested platforms.
> > 
> > Gonna try to get this in for 7.2!
> 
> I see another problem with sparc64.
> 
> [5.261508] could not locate request for tag 0x0
> [5.261711] nvme nvme0: invalid id 0 completed on queue 1
> 
> That is seen repeatedly until the request times out. I'll test with
> your patch to see if it resolves this problem as well, and will bisect
> otherwise.
> 
The second problem is unrelated to the doorbell problem.
It is first seen in qemu v7.1. I'll try to bisect.

Guenter



  1   2   >