[PATCH] linux-user/hppa: Detect glibc ABORT_INSTRUCTION and EXCP_BREAK handler

2022-10-27 Thread Helge Deller
The glibc on the hppa platform uses the "iitlbp %r0,(%sr0, %r0)"
assembler instruction as ABORT_INSTRUCTION.
If this instruction, which is illegal in userspace context, is executed,
dump the registers and report the failure to userspace the same way the
Linux kernel does on physical hardware.

For other illegal instructions report TARGET_ILL_ILLOPC instead of
TARGET_ILL_ILLOPN as si_code.

Additionally, add the missing handler for the EXCP_BREAK exception, which
occurs when the "break x,y" assembler instruction is executed, and report
EXCP_ASSIST traps.

Signed-off-by: Helge Deller 

diff --git a/linux-user/hppa/cpu_loop.c b/linux-user/hppa/cpu_loop.c
index 98c51e9b8b..a42c34e549 100644
--- a/linux-user/hppa/cpu_loop.c
+++ b/linux-user/hppa/cpu_loop.c
@@ -196,15 +196,20 @@ void cpu_loop(CPUHPPAState *env)
 force_sig_fault(TARGET_SIGSEGV, TARGET_SEGV_MAPERR, env->iaoq_f);
 break;
 case EXCP_ILL:
-EXCP_DUMP(env, "qemu: got CPU exception 0x%x - aborting\n", trapnr);
-force_sig_fault(TARGET_SIGILL, TARGET_ILL_ILLOPN, env->iaoq_f);
+EXCP_DUMP(env, "qemu: EXCP_ILL exception %#x\n", trapnr);
+force_sig_fault(TARGET_SIGILL, TARGET_ILL_ILLOPC, env->iaoq_f);
 break;
 case EXCP_PRIV_OPR:
-EXCP_DUMP(env, "qemu: got CPU exception 0x%x - aborting\n", trapnr);
-force_sig_fault(TARGET_SIGILL, TARGET_ILL_PRVOPC, env->iaoq_f);
+/* check for glibc ABORT_INSTRUCTION "iitlbp %r0,(%sr0, %r0)" */
+EXCP_DUMP(env, "qemu: EXCP_PRIV_OPR exception %#x\n", trapnr);
+if (env->cr[CR_IIR] == 0x0400) {
+   force_sig_fault(TARGET_SIGILL, TARGET_ILL_ILLOPC, env->iaoq_f);
+} else {
+   force_sig_fault(TARGET_SIGILL, TARGET_ILL_PRVOPC, env->iaoq_f);
+}
 break;
 case EXCP_PRIV_REG:
-EXCP_DUMP(env, "qemu: got CPU exception 0x%x - aborting\n", trapnr);
+EXCP_DUMP(env, "qemu: EXCP_PRIV_REG exception %#x\n", trapnr);
 force_sig_fault(TARGET_SIGILL, TARGET_ILL_PRVREG, env->iaoq_f);
 break;
 case EXCP_OVERFLOW:
@@ -216,6 +221,10 @@ void cpu_loop(CPUHPPAState *env)
 case EXCP_ASSIST:
 force_sig_fault(TARGET_SIGFPE, 0, env->iaoq_f);
 break;
+case EXCP_BREAK:
+EXCP_DUMP(env, "qemu: EXCP_BREAK exception %#x\n", trapnr);
+force_sig_fault(TARGET_SIGTRAP, TARGET_TRAP_BRKPT, env->iaoq_f & ~3);
+break;
 case EXCP_DEBUG:
 force_sig_fault(TARGET_SIGTRAP, TARGET_TRAP_BRKPT, env->iaoq_f);
 break;



Re: [PATCH v4 2/4] hw/audio: fix tab indentation

2022-10-27 Thread Thomas Huth

On 25/10/2022 16.28, Amarjargal Gundjalam wrote:

The TABs should be replaced with spaces, to make sure that we have a
consistent coding style with an indentation of 4 spaces everywhere.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/370
Reviewed-by: Daniel P. Berrangé 

Signed-off-by: Amarjargal Gundjalam 
---
  hw/audio/fmopl.c  | 1664 ++---
  hw/audio/fmopl.h  |  138 +--
  hw/audio/intel-hda-defs.h |  990 +++---
  hw/audio/wm8750.c |  270 +++---
  4 files changed, 1531 insertions(+), 1531 deletions(-)


Your changes with regard to the TAB cleanup look fine to me here, so for 
this patch:


Reviewed-by: Thomas Huth 

... but when I looked through the fmopl.c part, it really looks like this 
file is completely wrong with regards to the QEMU coding style. I wonder 
whether we should rather use a tool like "astyle" or "indent" to get it into 
proper shape? ... or do we rather want to keep it in its original style in 
case somebody still wants to try to port patches from the original sources 
(MAME)? In that latter case, we should maybe also keep the TABs here? Gerd, 
what do you think?


 Thomas




Re: [PATCH v4 3/4] hw/display: fix tab indentation

2022-10-27 Thread Thomas Huth

On 25/10/2022 16.28, Amarjargal Gundjalam wrote:

The TABs should be replaced with spaces, to make sure that we have a
consistent coding style with an indentation of 4 spaces everywhere.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/370
Reviewed-by: Daniel P. Berrangé 

Signed-off-by: Amarjargal Gundjalam 
---
  hw/display/blizzard.c   |  352 -
  hw/display/cirrus_vga.c | 1606 +++
  hw/display/omap_dss.c   |  598 +++
  hw/display/pxa2xx_lcd.c |  196 ++---
  hw/display/vga_regs.h   |6 +-
  hw/display/xenfb.c  |  260 +++
  6 files changed, 1509 insertions(+), 1509 deletions(-)


Reviewed-by: Thomas Huth 




Re: [PATCH v3 0/24] Convert nanoMIPS disassembler from C++ to C

2022-10-27 Thread Thomas Huth

On 12/09/2022 14.26, Milica Lazarevic wrote:

Hi,

This patchset converts the nanomips disassembler to plain C. C++ features
like class, std::string type, exception handling, and function overloading
have been removed and replaced with the equivalent C code.


 Hi Philippe, hi Stefan,

as far as I can see, this patch set has been completely reviewed, and IMHO 
it would be nice to get this into QEMU 7.2 to finally get rid of the C++ 
dependency in the QEMU code ... could one of you pick this up and send a 
pull request with the patches? Or is there still anything left to do here?


 Thomas




[PATCH] block/block-backend: blk_set_enable_write_cache is IO_CODE

2022-10-27 Thread Emanuele Giuseppe Esposito
blk_set_enable_write_cache() is defined as GLOBAL_STATE_CODE
but can be invoked from iothreads when handling scsi requests.
This triggers an assertion failure:

 0x7fd6c3515ce1 in raise () from /lib/x86_64-linux-gnu/libc.so.6
 0x7fd6c34ff537 in abort () from /lib/x86_64-linux-gnu/libc.so.6
 0x7fd6c34ff40f in ?? () from /lib/x86_64-linux-gnu/libc.so.6
 0x7fd6c350e662 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
 0x56149e2cea03 in blk_set_enable_write_cache (wce=true, blk=0x5614a01c27f0)
   at ../src/block/block-backend.c:1949
 0x56149e2d0a67 in blk_set_enable_write_cache (blk=0x5614a01c27f0,
   wce=) at ../src/block/block-backend.c:1951
 0x56149dfe9c59 in scsi_disk_apply_mode_select (p=0x7fd6b400c00e "\004",
   page=, s=) at ../src/hw/scsi/scsi-disk.c:1520
 mode_select_pages (change=true, len=18, p=0x7fd6b400c00e "\004", r=0x7fd6b4001ff0)
   at ../src/hw/scsi/scsi-disk.c:1570
 scsi_disk_emulate_mode_select (inbuf=, r=0x7fd6b4001ff0) at
   ../src/hw/scsi/scsi-disk.c:1640
 scsi_disk_emulate_write_data (req=0x7fd6b4001ff0) at ../src/hw/scsi/scsi-disk.c:1934
 0x56149e18ff16 in virtio_scsi_handle_cmd_req_submit (req=,
   req=, s=0x5614a12f16b0) at ../src/hw/scsi/virtio-scsi.c:719
 virtio_scsi_handle_cmd_vq (vq=0x7fd6bab92140, s=0x5614a12f16b0) at
   ../src/hw/scsi/virtio-scsi.c:761
 virtio_scsi_handle_cmd (vq=, vdev=) at
   ../src/hw/scsi/virtio-scsi.c:775
 virtio_scsi_handle_cmd (vdev=0x5614a12f16b0, vq=0x7fd6bab92140) at
   ../src/hw/scsi/virtio-scsi.c:765
 0x56149e1a8aa6 in virtio_queue_notify_vq (vq=0x7fd6bab92140) at
   ../src/hw/virtio/virtio.c:2365
 0x56149e3ccea5 in aio_dispatch_handler (ctx=ctx@entry=0x5614a01babe0,
   node=) at ../src/util/aio-posix.c:369
 0x56149e3cd868 in aio_dispatch_ready_handlers (ready_list=0x7fd6c09b2680,
   ctx=0x5614a01babe0) at ../src/util/aio-posix.c:399
 aio_poll (ctx=0x5614a01babe0, blocking=blocking@entry=true) at
   ../src/util/aio-posix.c:713
 0x56149e2a7796 in iothread_run (opaque=opaque@entry=0x56149ffde500) at
   ../src/iothread.c:67
 0x56149e3d0859 in qemu_thread_start (args=0x7fd6c09b26f0) at
   ../src/util/qemu-thread-posix.c:504
 0x7fd6c36b9ea7 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
 0x7fd6c35d9aef in clone () from /lib/x86_64-linux-gnu/libc.so.6

Changing GLOBAL_STATE_CODE into IO_CODE is allowed, since GSC callers are
allowed to call IO_CODE.

Resolves: #1272

Signed-off-by: Emanuele Giuseppe Esposito 
---
 block/block-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index aa4adf06ae..ade4da55e0 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1946,7 +1946,7 @@ bool blk_enable_write_cache(BlockBackend *blk)
 
 void blk_set_enable_write_cache(BlockBackend *blk, bool wce)
 {
-GLOBAL_STATE_CODE();
+IO_CODE();
 blk->enable_write_cache = wce;
 }
 
-- 
2.31.1




[PATCH v4 0/2] vhost-vdpa: add support for vIOMMU

2022-10-27 Thread Cindy Lu
These patches add support for vIOMMU in the vdpa device.

changes in V3:
1. Move the function vfio_get_xlat_addr to memory.c
2. Use the existing memory listener; when the MR is an
iommu MR, call the functions iommu_region_add()/
iommu_region_del()

changes in V4:
1. Make the comments in vfio_get_xlat_addr more general

Cindy Lu (2):
  vfio: move the function vfio_get_xlat_addr() to memory.c
  vhost-vdpa: add support for vIOMMU

 hw/vfio/common.c   |  92 +--
 hw/virtio/vhost-vdpa.c | 131 ++---
 include/exec/memory.h  |   4 +
 include/hw/virtio/vhost-vdpa.h |  10 +++
 softmmu/memory.c   |  84 +
 5 files changed, 222 insertions(+), 99 deletions(-)

-- 
2.34.3




[PATCH v4 2/2] vhost-vdpa: add support for vIOMMU

2022-10-27 Thread Cindy Lu
Add support for vIOMMU by adding new functions to deal with iommu MRs:
- during iommu_region_add, register a specific IOMMU notifier
  and store all notifiers in a list.
- during iommu_region_del, compare and delete the IOMMU notifier from the list.

Verified in vp_vdpa and vdpa_sim_net driver

Signed-off-by: Cindy Lu 
---
 hw/virtio/vhost-vdpa.c | 131 ++---
 include/hw/virtio/vhost-vdpa.h |  10 +++
 2 files changed, 130 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 3ff9ce3501..407f3e9ac2 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -26,6 +26,7 @@
 #include "cpu.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "hw/virtio/virtio-access.h"
 
 /*
  * Return one past the end of the end of section. Be careful with uint64_t
@@ -44,7 +45,6 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
 uint64_t iova_min,
 uint64_t iova_max)
 {
-Int128 llend;
 
 if ((!memory_region_is_ram(section->mr) &&
  !memory_region_is_iommu(section->mr)) ||
@@ -61,14 +61,6 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
 return true;
 }
 
-llend = vhost_vdpa_section_end(section);
-if (int128_gt(llend, int128_make64(iova_max))) {
-error_report("RAM section out of device range (max=0x%" PRIx64
- ", end addr=0x%" PRIx64 ")",
- iova_max, int128_get64(llend));
-return true;
-}
-
 return false;
 }
 
@@ -173,6 +165,115 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
 v->iotlb_batch_begin_sent = false;
 }
 
+static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+{
+struct vdpa_iommu *iommu = container_of(n, struct vdpa_iommu, n);
+
+hwaddr iova = iotlb->iova + iommu->iommu_offset;
+struct vhost_vdpa *v = iommu->dev;
+void *vaddr;
+int ret;
+
+if (iotlb->target_as != &address_space_memory) {
+error_report("Wrong target AS \"%s\", only system memory is allowed",
+ iotlb->target_as->name ? iotlb->target_as->name : "none");
+return;
+}
+RCU_READ_LOCK_GUARD();
+vhost_vdpa_iotlb_batch_begin_once(v);
+
+if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
+bool read_only;
+
+if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only,
+  &address_space_memory)) {
+return;
+}
+ret =
+vhost_vdpa_dma_map(v, iova, iotlb->addr_mask + 1, vaddr, read_only);
+if (ret) {
+error_report("vhost_vdpa_dma_map(%p, 0x%" HWADDR_PRIx ", "
+ "0x%" HWADDR_PRIx ", %p) = %d (%m)",
+ v, iova, iotlb->addr_mask + 1, vaddr, ret);
+}
+} else {
+ret = vhost_vdpa_dma_unmap(v, iova, iotlb->addr_mask + 1);
+if (ret) {
+error_report("vhost_vdpa_dma_unmap(%p, 0x%" HWADDR_PRIx ", "
+ "0x%" HWADDR_PRIx ") = %d (%m)",
+ v, iova, iotlb->addr_mask + 1, ret);
+}
+}
+}
+
+static void vhost_vdpa_iommu_region_add(MemoryListener *listener,
+MemoryRegionSection *section)
+{
+struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
+
+struct vdpa_iommu *iommu;
+Int128 end;
+int iommu_idx;
+IOMMUMemoryRegion *iommu_mr;
+int ret;
+
+if (!memory_region_is_iommu(section->mr)) {
+return;
+}
+
+iommu_mr = IOMMU_MEMORY_REGION(section->mr);
+
+iommu = g_malloc0(sizeof(*iommu));
+end =  int128_add(int128_make64(section->offset_within_region),
+section->size);
+end = int128_sub(end, int128_one());
+iommu_idx = memory_region_iommu_attrs_to_index(iommu_mr,
+MEMTXATTRS_UNSPECIFIED);
+
+iommu->iommu_mr = iommu_mr;
+
+iommu_notifier_init(
+&iommu->n, vhost_vdpa_iommu_map_notify, IOMMU_NOTIFIER_IOTLB_EVENTS,
+section->offset_within_region, int128_get64(end), iommu_idx);
+iommu->iommu_offset =
+section->offset_within_address_space - section->offset_within_region;
+iommu->dev = v;
+
+ret = memory_region_register_iommu_notifier(section->mr, &iommu->n, NULL);
+if (ret) {
+g_free(iommu);
+return;
+}
+
+QLIST_INSERT_HEAD(&v->iommu_list, iommu, iommu_next);
+memory_region_iommu_replay(iommu->iommu_mr, &iommu->n);
+
+return;
+}
+
+static void vhost_vdpa_iommu_region_del(MemoryListener *listener,
+MemoryRegionSection *section)
+{
+struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
+
+struct vdpa_iommu *iommu;
+
+if (!memory_region_is_iommu(section->mr)) {
+return;
+}
+
+QLIST_FORE

[PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c

2022-10-27 Thread Cindy Lu
Move the function vfio_get_xlat_addr() to softmmu/memory.c and rename
it to memory_get_xlat_addr(), so we can use this function in other
devices, such as vDPA devices.

Signed-off-by: Cindy Lu 
---
 hw/vfio/common.c  | 92 ++-
 include/exec/memory.h |  4 ++
 softmmu/memory.c  | 84 +++
 3 files changed, 92 insertions(+), 88 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ace9562a9b..2b5a9f3d8d 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -574,92 +574,6 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
section->offset_within_address_space & (1ULL << 63);
 }
 
-/* Called with rcu_read_lock held.  */
-static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
-   ram_addr_t *ram_addr, bool *read_only)
-{
-MemoryRegion *mr;
-hwaddr xlat;
-hwaddr len = iotlb->addr_mask + 1;
-bool writable = iotlb->perm & IOMMU_WO;
-
-/*
- * The IOMMU TLB entry we have just covers translation through
- * this IOMMU to its immediate target.  We need to translate
- * it the rest of the way through to memory.
- */
-mr = address_space_translate(&address_space_memory,
- iotlb->translated_addr,
- &xlat, &len, writable,
- MEMTXATTRS_UNSPECIFIED);
-if (!memory_region_is_ram(mr)) {
-error_report("iommu map to non memory area %"HWADDR_PRIx"",
- xlat);
-return false;
-} else if (memory_region_has_ram_discard_manager(mr)) {
-RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
-MemoryRegionSection tmp = {
-.mr = mr,
-.offset_within_region = xlat,
-.size = int128_make64(len),
-};
-
-/*
- * Malicious VMs can map memory into the IOMMU, which is expected
- * to remain discarded. vfio will pin all pages, populating memory.
- * Disallow that. vmstate priorities make sure any RamDiscardManager
- * were already restored before IOMMUs are restored.
- */
-if (!ram_discard_manager_is_populated(rdm, &tmp)) {
-error_report("iommu map to discarded memory (e.g., unplugged via"
- " virtio-mem): %"HWADDR_PRIx"",
- iotlb->translated_addr);
-return false;
-}
-
-/*
- * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
- * pages will remain pinned inside vfio until unmapped, resulting in a
- * higher memory consumption than expected. If memory would get
- * populated again later, there would be an inconsistency between pages
- * pinned by vfio and pages seen by QEMU. This is the case until
- * unmapped from the IOMMU (e.g., during device reset).
- *
- * With malicious guests, we really only care about pinning more memory
- * than expected. RLIMIT_MEMLOCK set for the user/process can never be
- * exceeded and can be used to mitigate this problem.
- */
-warn_report_once("Using vfio with vIOMMUs and coordinated discarding of"
- " RAM (e.g., virtio-mem) works, however, malicious"
- " guests can trigger pinning of more memory than"
- " intended via an IOMMU. It's possible to mitigate "
- " by setting/adjusting RLIMIT_MEMLOCK.");
-}
-
-/*
- * Translation truncates length to the IOMMU page size,
- * check that it did not truncate too much.
- */
-if (len & iotlb->addr_mask) {
-error_report("iommu has granularity incompatible with target AS");
-return false;
-}
-
-if (vaddr) {
-*vaddr = memory_region_get_ram_ptr(mr) + xlat;
-}
-
-if (ram_addr) {
-*ram_addr = memory_region_get_ram_addr(mr) + xlat;
-}
-
-if (read_only) {
-*read_only = !writable || mr->readonly;
-}
-
-return true;
-}
-
 static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
 VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
@@ -682,7 +596,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
 bool read_only;
 
-if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
+if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only,
+  &address_space_memory)) {
 goto out;
 }
 /*
@@ -1359,7 +1274,8 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 }
 
 rcu_read_lock();
-if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
+if (memory_get_xlat_addr(iotlb, NULL, &translated_addr, N

Re: [PATCH] avocado: use sha1 for fc31 imgs to avoid first time re-download

2022-10-27 Thread Thomas Huth

On 24/10/2022 11.02, Daniel P. Berrangé wrote:

On Sat, Oct 22, 2022 at 02:03:50PM -0300, Daniel Henrique Barboza wrote:

'make check-avocado' will download any images that aren't present in the
cache via 'get-vm-images' in tests/Makefile.include. The target that
downloads fedora 31 images, get-vm-image-fedora-31, will use 'avocado
vmimage get  --distro=fedora --distro-version=31 --arch=(...)' to
download the image for each arch. Note that this command does not
support any argument to set the hash algorithm used and, based on the
avocado source code [1], DEFAULT_HASH_ALGORITHM is set to "sha1". The
sha1 hash is stored in a Fedora-Cloud-Base-31-1.9.{ARCH}.qcow2-CHECKSUM
in the cache.



For now, in QEMU, let's use sha1 for all Fedora 31 images. This will
immediately spare us at least one extra download for each Fedora 31
image in all our CI runs.

[1] https://github.com/avocado-framework/avocado.git @ 942a5d6972906
[2] https://github.com/avocado-framework/avocado/issues/5496


Can we just ask the Avocado maintainers to fix this problem on their
side, to allow use of a modern hash algorithm, as a priority item? We've
already had this problem in QEMU for over a year AFAICT, so it doesn't
seem like we urgently need a workaround on the QEMU side, and we can
get the Avocado devs to commit to fixing it in the next month.


Do we have such a commitment? ... The avocado version in QEMU is completely
outdated these days; it's still using version 88.1 from May 2021, i.e. there
hasn't been an update in more than a year. I recently tried to bump it to a
newer version on my own (since I'm still suffering from the problem that
find_free_port() does not work if you don't have a local IPv6 address), but
it's not that straightforward, since the recent versions of avocado changed a
lot of things (e.g. the new nrunner - do we want to run tests in parallel? If
so, it breaks a lot of the timeout settings, I think), so an update needs a
lot of careful testing...


So unless someone really commits to spending a lot of time on updating
Avocado in QEMU in the near future, I don't think that such a fix for the
hash algorithm will happen any time soon, and thus I think we should
consider including this workaround for the time being.


 Thomas




Re: Crash in RTC

2022-10-27 Thread Konstantin Kostiuk
ping

On Wed, Aug 31, 2022 at 11:33 AM Vadim Rozenfeld 
wrote:

> Just a bit more info related to this issue.
> Below is a quote from my previous conversation with Yan
>
> 
> The QEMU RTC function periodic_timer_update is called in response
> to Windows HAL calls
> _HalpRtcArmTimer@16
> and
> _HalpRtcStop@4
>
> Windows can change the timer frequency dynamically
> (some more info can be found here
> https://bugzilla.redhat.com/show_bug.cgi?id=1610461 )
> but calculation of the frequency is based on the wallclock time (IIRC).
> And if I'm not mistaken, then lost_tick_policy=delay can lead to the
> wallclock time delay,
> which in my understanding can lead to the incorrect frequency calculation.
>
> Another interesting thing is that they don't use Hyper-V enlightenments at
> all.
> Do you know if there is any particular reason for that?  They might try
> switching
> to hv_stimer instead of RTC.
>
> And one more thing, the frequency of the timer can be adjusted by UM
> applications.
> Some of them, like emulators and servers, use it quite widely. It's worth
> asking them if they are running such kinds of apps.
> 
>
>
> Cheers,
> Vadim.
>
> On Wed, Aug 31, 2022 at 5:46 PM Konstantin Kostiuk 
> wrote:
>
>> CC: Vadim
>>
>> On Wed, Aug 31, 2022 at 10:42 AM Konstantin Kostiuk 
>> wrote:
>>
>>> ping
>>>
>>> On Wed, Aug 24, 2022 at 5:37 PM Konstantin Kostiuk 
>>> wrote:
>>>
 Hi Michael and Paolo,

 I write to you as maintainers of mc146818rtc.c. I am working on bug
 https://bugzilla.redhat.com/show_bug.cgi?id=2054781
 and reproduced it on the current master branch.

 I added some print at line 202 (before assert(lost_clock >= 0),
 https://gitlab.com/qemu-project/qemu/-/blob/master/hw/rtc/mc146818rtc.c#L202)
 and got the following values:

 next_periodic_clock, old_period, last_periodic_clock, cur_clock,
 lost_clock, current_time
 54439076429968, 32, 54439076429936, 54439076430178, 242,
 1661348768010822000
 54439076430224, 512, 54439076429712, 54439076430188, 476,
 166134876807000
 54439076430224, 32, 54439076430192, 54439076429884, -308,
 1661348768001838000

 The current_time value in the last print is lower than in the previous
 one.
 So, the error occurs because time has gone backward.

 I think this is a possible situation during time synchronization.
 My question is what should we do in this case?

 Best Regards,
 Konstantin Kostiuk.

>>>


[PATCH V4 0/4] PASID support for Intel IOMMU

2022-10-27 Thread Jason Wang
Hi All:

This series tries to introduce PASID support for the Intel IOMMU. The
work is based on the previous scalable mode support, implementing
ECAP_PASID. A new "x-pasid-mode" option is introduced to enable this
mode. All internal vIOMMU code was extended to support PASID instead
of the current RID2PASID method. The code is also capable of
provisioning address spaces with PASID. Note that no devices can issue
PASID DMA right now; this needs future work.

This will be used for prototyping PASID based devices like virtio or
future vPASID support for the Intel IOMMU.

Tests have been done with a Linux guest with scalable mode enabled and
disabled. A virtio prototype[1][2] that can issue PASID based DMA
requests was also tested; different PASIDs were used for TX and RX in
those test drivers.

Changes since V3:

- rearrange the member for vtd_iotlb_key structure
- reorder the pasid parameter ahead of addr for vtd_lookup_iotlb()
- allow access size from 1 to 8 for vtd_mem_ir_fault_ops

Changes since V2:

- use PCI_BUILD_BDF() instead of vtd_make_source_id()
- Tweak the comments above vtd_as_hash()
- use PCI_BUS_NUM() instead of open coding
- rename vtd_as to vtd_address_spaces
- rename vtd_qualify_report_fault() to vtd_report_qualify_fault()
- forbid device-iotlb with PASID
- report PASID based qualified fault
- log PASID during errors

Changes since V1:

- speed up IOMMU translation when RID2PASID is not used
- remove the unnecessary L1 PASID invalidation descriptor support
- add support for catching translations to the interrupt range in the
  case of PT and scalable mode
- refine the comments to explain the hash algorithm used in IOTLB
  lookups

Please review.

[1] https://github.com/jasowang/qemu.git virtio-pasid
[2] https://github.com/jasowang/linux.git virtio-pasid

Jason Wang (4):
  intel-iommu: don't warn guest errors when getting rid2pasid entry
  intel-iommu: drop VTDBus
  intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function
  intel-iommu: PASID support

 hw/i386/intel_iommu.c  | 685 ++---
 hw/i386/intel_iommu_internal.h |  16 +-
 hw/i386/trace-events   |   2 +
 include/hw/i386/intel_iommu.h  |  18 +-
 include/hw/pci/pci_bus.h   |   2 +
 5 files changed, 482 insertions(+), 241 deletions(-)

-- 
2.25.1




[PATCH V4 2/4] intel-iommu: drop VTDBus

2022-10-27 Thread Jason Wang
We introduced the VTDBus structure as an intermediate step for searching
the address space. This works well with SID based matching/lookup. But
when we want to support SID plus PASID based address space lookup, this
intermediate step turns out to be a burden. So this patch simply drops
the VTDBus structure and uses the PCIBus and devfn as the key for the
g_hash_table(). This simplifies the code and the future PASID extension.

To avoid being slower for existing vtd_find_as_from_bus_num() callers, a
vtd_as cache indexed by the bus number is introduced to store the most
recent search result of a vtd_as belonging to a specific bus.

Reviewed-by: Peter Xu 
Signed-off-by: Jason Wang 
---
Changes since V2:
- use PCI_BUILD_BDF() instead of vtd_make_source_id()
- Tweak the comments above vtd_as_hash()
- use PCI_BUS_NUM() instead of open coding
- rename vtd_as to vtd_address_spaces
---
 hw/i386/intel_iommu.c | 234 +-
 include/hw/i386/intel_iommu.h |  11 +-
 2 files changed, 118 insertions(+), 127 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 796f924c06..6abe12a8c5 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -61,6 +61,16 @@
 } \
 }
 
+/*
+ * PCI bus number (or SID) is not reliable since the device is usually
+ * initialized before the guest can configure the PCI bridge
+ * (SECONDARY_BUS_NUMBER).
+ */
+struct vtd_as_key {
+PCIBus *bus;
+uint8_t devfn;
+};
+
 static void vtd_address_space_refresh_all(IntelIOMMUState *s);
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
 
@@ -210,6 +220,27 @@ static guint vtd_uint64_hash(gconstpointer v)
 return (guint)*(const uint64_t *)v;
 }
 
+static gboolean vtd_as_equal(gconstpointer v1, gconstpointer v2)
+{
+const struct vtd_as_key *key1 = v1;
+const struct vtd_as_key *key2 = v2;
+
+return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
+}
+
+/*
+ * Note that we use pointer to PCIBus as the key, so hashing/shifting
+ * based on the pointer value is intended. Note that we deal with
+ * collisions through vtd_as_equal().
+ */
+static guint vtd_as_hash(gconstpointer v)
+{
+const struct vtd_as_key *key = v;
+guint value = (guint)(uintptr_t)key->bus;
+
+return (guint)(value << 8 | key->devfn);
+}
+
 static gboolean vtd_hash_remove_by_domain(gpointer key, gpointer value,
   gpointer user_data)
 {
@@ -248,22 +279,14 @@ static gboolean vtd_hash_remove_by_page(gpointer key, gpointer value,
 static void vtd_reset_context_cache_locked(IntelIOMMUState *s)
 {
 VTDAddressSpace *vtd_as;
-VTDBus *vtd_bus;
-GHashTableIter bus_it;
-uint32_t devfn_it;
+GHashTableIter as_it;
 
 trace_vtd_context_cache_reset();
 
-g_hash_table_iter_init(&bus_it, s->vtd_as_by_busptr);
+g_hash_table_iter_init(&as_it, s->vtd_address_spaces);
 
-while (g_hash_table_iter_next (&bus_it, NULL, (void**)&vtd_bus)) {
-for (devfn_it = 0; devfn_it < PCI_DEVFN_MAX; ++devfn_it) {
-vtd_as = vtd_bus->dev_as[devfn_it];
-if (!vtd_as) {
-continue;
-}
-vtd_as->context_cache_entry.context_cache_gen = 0;
-}
+while (g_hash_table_iter_next (&as_it, NULL, (void**)&vtd_as)) {
+vtd_as->context_cache_entry.context_cache_gen = 0;
 }
 s->context_cache_gen = 1;
 }
@@ -993,32 +1016,6 @@ static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level)
 return slpte & rsvd_mask;
 }
 
-/* Find the VTD address space associated with a given bus number */
-static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num)
-{
-VTDBus *vtd_bus = s->vtd_as_by_bus_num[bus_num];
-GHashTableIter iter;
-
-if (vtd_bus) {
-return vtd_bus;
-}
-
-/*
- * Iterate over the registered buses to find the one which
- * currently holds this bus number and update the bus_num
- * lookup table.
- */
-g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
-while (g_hash_table_iter_next(&iter, NULL, (void **)&vtd_bus)) {
-if (pci_bus_num(vtd_bus->bus) == bus_num) {
-s->vtd_as_by_bus_num[bus_num] = vtd_bus;
-return vtd_bus;
-}
-}
-
-return NULL;
-}
-
 /* Given the @iova, get relevant @slptep. @slpte_level will be the last level
  * of the translation, can be used for deciding the size of large page.
  */
@@ -1634,24 +1631,13 @@ static bool vtd_switch_address_space(VTDAddressSpace *as)
 
 static void vtd_switch_address_space_all(IntelIOMMUState *s)
 {
+VTDAddressSpace *vtd_as;
 GHashTableIter iter;
-VTDBus *vtd_bus;
-int i;
-
-g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
-while (g_hash_table_iter_next(&iter, NULL, (void **)&vtd_bus)) {
-for (i = 0; i < PCI_DEVFN_MAX; i++) {
-if (!vtd_bus->dev_as[i]) {
- 

[PATCH V4 1/4] intel-iommu: don't warn guest errors when getting rid2pasid entry

2022-10-27 Thread Jason Wang
We used to warn on a wrong rid2pasid entry. But this error can be
triggered by the guest and can happen during initialization, so let's
not warn in this case.

Signed-off-by: Jason Wang 
---
 hw/i386/intel_iommu.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 6524c2ee32..796f924c06 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1554,8 +1554,10 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, VTDContextEntry *ce)
 if (s->root_scalable) {
 ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe);
 if (ret) {
-error_report_once("%s: vtd_ce_get_rid2pasid_entry error: %"PRId32,
-  __func__, ret);
+/*
+ * This error is guest triggerable. We should assume PT
+ * is not enabled for safety.
+ */
 return false;
 }
 return (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_PT);
-- 
2.25.1




[PATCH V4 4/4] intel-iommu: PASID support

2022-10-27 Thread Jason Wang
This patch introduces ECAP_PASID via "x-pasid-mode". Based on the
existing support for scalable mode, we need to implement the following
missing parts:

1) tag VTDAddressSpace with PASID and support IOMMU/DMA translation
   with PASID
2) tag IOTLB with PASID
3) PASID cache and its flush
4) PASID based IOTLB invalidation

For simplicity the PASID cache is not implemented, so we can simply
implement the PASID cache flush as a no-op and leave it to be
implemented in the future. For PASID based IOTLB invalidation, since we
don't have L1 stage support yet, the PASID based IOTLB invalidation is
not implemented either. For PASID based device IOTLB invalidation,
support from vhost is required, so we forbid enabling device IOTLB when
PASID is enabled for now. This work could be done in the future.

Note that although PASID based IOMMU translation is ready, no device
can issue PASID DMA right now. In this case, PCI_NO_PASID is used as the
PASID to identify the address space without PASID. vtd_find_add_as() has
been extended to provision address spaces with PASID, which could be
utilized by a future extension of the PCI core to allow device models to
use PASID based DMA translation.

This feature would be useful for:

1) prototyping PASID support for devices like virtio
2) future vPASID work
3) future PRS and vSVA work

Signed-off-by: Jason Wang 
---
Changes since V3:
- rearrange the member for vtd_iotlb_key structure
- reorder the pasid parameter ahead of addr for vtd_lookup_iotlb()
- allow access size from 1 to 8 for vtd_mem_ir_fault_ops
Changes since V2:
- forbid device-iotlb with PASID
- report PASID based qualified fault
- log PASID during errors
---
 hw/i386/intel_iommu.c  | 415 +
 hw/i386/intel_iommu_internal.h |  16 +-
 hw/i386/trace-events   |   2 +
 include/hw/i386/intel_iommu.h  |   7 +-
 include/hw/pci/pci_bus.h   |   2 +
 5 files changed, 338 insertions(+), 104 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 6c03ecf3cb..1265e7dacf 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -58,6 +58,14 @@
 struct vtd_as_key {
 PCIBus *bus;
 uint8_t devfn;
+uint32_t pasid;
+};
+
+struct vtd_iotlb_key {
+uint64_t gfn;
+uint32_t pasid;
+uint32_t level;
+uint16_t sid;
 };
 
 static void vtd_address_space_refresh_all(IntelIOMMUState *s);
@@ -199,14 +207,24 @@ static inline gboolean 
vtd_as_has_map_notifier(VTDAddressSpace *as)
 }
 
 /* GHashTable functions */
-static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
+static gboolean vtd_iotlb_equal(gconstpointer v1, gconstpointer v2)
 {
-return *((const uint64_t *)v1) == *((const uint64_t *)v2);
+const struct vtd_iotlb_key *key1 = v1;
+const struct vtd_iotlb_key *key2 = v2;
+
+return key1->sid == key2->sid &&
+   key1->pasid == key2->pasid &&
+   key1->level == key2->level &&
+   key1->gfn == key2->gfn;
 }
 
-static guint vtd_uint64_hash(gconstpointer v)
+static guint vtd_iotlb_hash(gconstpointer v)
 {
-return (guint)*(const uint64_t *)v;
+const struct vtd_iotlb_key *key = v;
+
+return key->gfn | ((key->sid) << VTD_IOTLB_SID_SHIFT) |
+   (key->level) << VTD_IOTLB_LVL_SHIFT |
+   (key->pasid) << VTD_IOTLB_PASID_SHIFT;
 }
 
 static gboolean vtd_as_equal(gconstpointer v1, gconstpointer v2)
@@ -214,7 +232,8 @@ static gboolean vtd_as_equal(gconstpointer v1, 
gconstpointer v2)
 const struct vtd_as_key *key1 = v1;
 const struct vtd_as_key *key2 = v2;
 
-return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
+return (key1->bus == key2->bus) && (key1->devfn == key2->devfn) &&
+   (key1->pasid == key2->pasid);
 }
 
 /*
@@ -302,13 +321,6 @@ static void vtd_reset_caches(IntelIOMMUState *s)
 vtd_iommu_unlock(s);
 }
 
-static uint64_t vtd_get_iotlb_key(uint64_t gfn, uint16_t source_id,
-  uint32_t level)
-{
-return gfn | ((uint64_t)(source_id) << VTD_IOTLB_SID_SHIFT) |
-   ((uint64_t)(level) << VTD_IOTLB_LVL_SHIFT);
-}
-
 static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
 {
 return (addr & vtd_slpt_level_page_mask(level)) >> VTD_PAGE_SHIFT_4K;
@@ -316,15 +328,17 @@ static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t 
level)
 
 /* Must be called with IOMMU lock held */
 static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
-   hwaddr addr)
+   uint32_t pasid, hwaddr addr)
 {
+struct vtd_iotlb_key key;
 VTDIOTLBEntry *entry;
-uint64_t key;
 int level;
 
 for (level = VTD_SL_PT_LEVEL; level < VTD_SL_PML4_LEVEL; level++) {
-key = vtd_get_iotlb_key(vtd_get_iotlb_gfn(addr, level),
-source_id, level);
+key.gfn = vtd_get_iotlb_gfn(addr, level);
+key.level = level;
+key.sid = source_id;
+key.pasid = pasid;
 entry = g_hash_t

[PATCH V4 3/4] intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function

2022-10-27 Thread Jason Wang
We used to have the VTD_PE_GET_FPD_ERR() macro, but it contains an
internal goto which prevents it from being reused. This patch converts
that macro to a dedicated function and lets the caller decide what to
do (e.g. use goto or not). This makes sure it can be re-used by other
functions that require fault reporting.

Signed-off-by: Jason Wang 
---
Changes since V2:
- rename vtd_qualify_report_fault() to vtd_report_qualify_fault()
---
 hw/i386/intel_iommu.c | 42 --
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 6abe12a8c5..6c03ecf3cb 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -49,17 +49,6 @@
 /* pe operations */
 #define VTD_PE_GET_TYPE(pe) ((pe)->val[0] & VTD_SM_PASID_ENTRY_PGTT)
 #define VTD_PE_GET_LEVEL(pe) (2 + (((pe)->val[0] >> 2) & 
VTD_SM_PASID_ENTRY_AW))
-#define VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write) {\
-if (ret_fr) { \
-ret_fr = -ret_fr; \
-if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {   \
-trace_vtd_fault_disabled();   \
-} else {  \
-vtd_report_dmar_fault(s, source_id, addr, ret_fr, is_write);  \
-} \
-goto error;   \
-} \
-}
 
 /*
  * PCI bus number (or SID) is not reliable since the device is usaully
@@ -1718,6 +1707,19 @@ out:
 trace_vtd_pt_enable_fast_path(source_id, success);
 }
 
+static void vtd_report_qualify_fault(IntelIOMMUState *s,
+ int err, bool is_fpd_set,
+ uint16_t source_id,
+ hwaddr addr,
+ bool is_write)
+{
+if (is_fpd_set && vtd_is_qualified_fault(err)) {
+trace_vtd_fault_disabled();
+} else {
+vtd_report_dmar_fault(s, source_id, addr, err, is_write);
+}
+}
+
 /* Map dev to context-entry then do a paging-structures walk to do a iommu
  * translation.
  *
@@ -1778,7 +1780,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
 is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
 if (!is_fpd_set && s->root_scalable) {
 ret_fr = vtd_ce_get_pasid_fpd(s, &ce, &is_fpd_set);
-VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, 
is_write);
+if (ret_fr) {
+vtd_report_qualify_fault(s, -ret_fr, is_fpd_set,
+ source_id, addr, is_write);
+goto error;
+}
 }
 } else {
 ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
@@ -1786,7 +1792,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
 if (!ret_fr && !is_fpd_set && s->root_scalable) {
 ret_fr = vtd_ce_get_pasid_fpd(s, &ce, &is_fpd_set);
 }
-VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write);
+if (ret_fr) {
+vtd_report_qualify_fault(s, -ret_fr, is_fpd_set,
+ source_id, addr, is_write);
+goto error;
+}
 /* Update context-cache */
 trace_vtd_iotlb_cc_update(bus_num, devfn, ce.hi, ce.lo,
   cc_entry->context_cache_gen,
@@ -1822,7 +1832,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
 
 ret_fr = vtd_iova_to_slpte(s, &ce, addr, is_write, &slpte, &level,
&reads, &writes, s->aw_bits);
-VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write);
+if (ret_fr) {
+vtd_report_qualify_fault(s, -ret_fr, is_fpd_set, source_id,
+ addr, is_write);
+goto error;
+}
 
 page_mask = vtd_slpt_level_page_mask(level);
 access_flags = IOMMU_ACCESS_FLAG(reads, writes);
-- 
2.25.1




Re: [PATCH 5/7] block/nfs: Fix 32-bit Windows build

2022-10-27 Thread Kevin Wolf
On 27.10.2022 at 04:45, Bin Meng wrote:
> Hi Kevin,
> [...]
> Will you queue this patch via the block tree?

Just to be sure, you mean only patch 5? Yes, I can do that.

Kevin




Re: [PATCH v10 1/9] s390x/cpu topology: core_id sets s390x CPU topology

2022-10-27 Thread Thomas Huth

On 24/10/2022 21.25, Janis Schoetterl-Glausch wrote:

On Wed, 2022-10-12 at 18:20 +0200, Pierre Morel wrote:

In the S390x CPU topology the core_id specifies the CPU address
and the position of the core within the topology.

Let's build the topology based on the core_id.
s390x/cpu topology: core_id sets s390x CPU topology

In the S390x CPU topology the core_id specifies the CPU address
and the position of the cpu within the topology.

Let's build the topology based on the core_id.

Signed-off-by: Pierre Morel 
---
  include/hw/s390x/cpu-topology.h |  45 +++
  hw/s390x/cpu-topology.c | 132 
  hw/s390x/s390-virtio-ccw.c  |  21 +
  hw/s390x/meson.build|   1 +
  4 files changed, 199 insertions(+)
  create mode 100644 include/hw/s390x/cpu-topology.h
  create mode 100644 hw/s390x/cpu-topology.c

diff --git a/include/hw/s390x/cpu-topology.h b/include/hw/s390x/cpu-topology.h
new file mode 100644
index 00..66c171d0bc
--- /dev/null
+++ b/include/hw/s390x/cpu-topology.h
@@ -0,0 +1,45 @@
+/*
+ * CPU Topology
+ *
+ * Copyright 2022 IBM Corp.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+#ifndef HW_S390X_CPU_TOPOLOGY_H
+#define HW_S390X_CPU_TOPOLOGY_H
+
+#include "hw/qdev-core.h"
+#include "qom/object.h"
+
+typedef struct S390TopoContainer {
+int active_count;
+} S390TopoContainer;
+
+#define S390_TOPOLOGY_CPU_IFL 0x03
+#define S390_TOPOLOGY_MAX_ORIGIN ((63 + S390_MAX_CPUS) / 64)
+typedef struct S390TopoTLE {
+uint64_t mask[S390_TOPOLOGY_MAX_ORIGIN];
+} S390TopoTLE;


Since this actually represents multiple TLEs, you might want to change the
name of the struct to reflect this. S390TopoTLEList maybe?


Didn't TLE mean "Topology List Entry"? (by the way, Pierre, please explain 
this three letter acronym somewhere in this header in a comment)...


So expanding the TLE, this would mean S390TopoTopologyListEntryList ? ... 
this is getting weird... Also, this is not a "list" in the sense of a linked 
list, as one might expect at a first glance, so this is all very confusing 
here. Could you please come up with some better naming?


 Thomas




Re: [PATCH v1 10/12] hw/arm: introduce xenpv machine

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:


> Optional: When CONFIG_TPM is enabled, it also creates a tpm-tis-device, adds a
> TPM emulator and connects to swtpm running on host machine via chardev socket
> and support TPM functionalities for a guest domain.
>
> Extra command line for aarch64 xenpv QEMU to connect to swtpm:
> -chardev socket,id=chrtpm,path=/tmp/myvtpm2/swtpm-sock \
> -tpmdev emulator,id=tpm0,chardev=chrtpm \
>
> swtpm implements a TPM software emulator(TPM 1.2 & TPM 2) built on libtpms and
> provides access to TPM functionality over socket, chardev and CUSE interface.
> Github repo: https://github.com/stefanberger/swtpm
> Example for starting swtpm on host machine:
> mkdir /tmp/vtpm2
> swtpm socket --tpmstate dir=/tmp/vtpm2 \
> --ctrl type=unixio,path=/tmp/vtpm2/swtpm-sock &


> +static void xen_enable_tpm(void)
> +{
> +/* qemu_find_tpm_be is only available when CONFIG_TPM is enabled. */
> +#ifdef CONFIG_TPM
> +Error *errp = NULL;
> +DeviceState *dev;
> +SysBusDevice *busdev;
> +
> +TPMBackend *be = qemu_find_tpm_be("tpm0");
> +if (be == NULL) {
> +DPRINTF("Couldn't fine the backend for tpm0\n");
> +return;
> +}
> +dev = qdev_new(TYPE_TPM_TIS_SYSBUS);
> +object_property_set_link(OBJECT(dev), "tpmdev", OBJECT(be), &errp);
> +object_property_set_str(OBJECT(dev), "tpmdev", be->id, &errp);
> +busdev = SYS_BUS_DEVICE(dev);
> +sysbus_realize_and_unref(busdev, &error_fatal);
> +sysbus_mmio_map(busdev, 0, GUEST_TPM_BASE);

I'm not sure what has gone wrong here but I'm getting:

  ../../hw/arm/xen_arm.c: In function ‘xen_enable_tpm’:
  ../../hw/arm/xen_arm.c:120:32: error: ‘GUEST_TPM_BASE’ undeclared (first use 
in this function); did you mean ‘GUEST_RAM_BASE’?
120 | sysbus_mmio_map(busdev, 0, GUEST_TPM_BASE);
|^~
|GUEST_RAM_BASE
  ../../hw/arm/xen_arm.c:120:32: note: each undeclared identifier is reported 
only once for each function it appears in

In my cross build:

  # Configured with: '../../configure' '--disable-docs' 
'--target-list=aarch64-softmmu' '--disable-kvm' '--enable-xen' 
'--disable-opengl' '--disable-libudev' '--enable-tpm' 
'--disable-xen-pci-passthrough' '--cross-prefix=aarch64-linux-gnu-' 
'--skip-meson'

which makes me wonder if this is a configure failure or a confusion
about being able to have host swtpm implementations during emulation but
needing target tpm for Xen?

-- 
Alex Bennée



Re: [PATCH v4 2/2] vhost-vdpa: add support for vIOMMU

2022-10-27 Thread Jason Wang
On Thu, Oct 27, 2022 at 3:41 PM Cindy Lu  wrote:
>
> Add support for vIOMMU. Add new functions to deal with IOMMU MRs:
> - during iommu_region_add, register a specific IOMMU notifier,
>  and store all notifiers in a list.
> - during iommu_region_del, compare and delete the IOMMU notifier from the list
>
> Verified in vp_vdpa and vdpa_sim_net driver
>
> Signed-off-by: Cindy Lu 

Acked-by: Jason Wang 

(some nits, see below)

> ---
>  hw/virtio/vhost-vdpa.c | 131 ++---
>  include/hw/virtio/vhost-vdpa.h |  10 +++
>  2 files changed, 130 insertions(+), 11 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 3ff9ce3501..407f3e9ac2 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -26,6 +26,7 @@
>  #include "cpu.h"
>  #include "trace.h"
>  #include "qapi/error.h"
> +#include "hw/virtio/virtio-access.h"
>
>  /*
>   * Return one past the end of the end of section. Be careful with uint64_t
> @@ -44,7 +45,6 @@ static bool 
> vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
>  uint64_t iova_min,
>  uint64_t iova_max)
>  {
> -Int128 llend;
>
>  if ((!memory_region_is_ram(section->mr) &&
>   !memory_region_is_iommu(section->mr)) ||
> @@ -61,14 +61,6 @@ static bool 
> vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
>  return true;
>  }
>
> -llend = vhost_vdpa_section_end(section);
> -if (int128_gt(llend, int128_make64(iova_max))) {
> -error_report("RAM section out of device range (max=0x%" PRIx64
> - ", end addr=0x%" PRIx64 ")",
> - iova_max, int128_get64(llend));
> -return true;
> -}
> -
>  return false;
>  }
>
> @@ -173,6 +165,115 @@ static void vhost_vdpa_listener_commit(MemoryListener 
> *listener)
>  v->iotlb_batch_begin_sent = false;
>  }
>
> +static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry 
> *iotlb)
> +{
> +struct vdpa_iommu *iommu = container_of(n, struct vdpa_iommu, n);
> +
> +hwaddr iova = iotlb->iova + iommu->iommu_offset;
> +struct vhost_vdpa *v = iommu->dev;
> +void *vaddr;
> +int ret;
> +
> +if (iotlb->target_as != &address_space_memory) {
> +error_report("Wrong target AS \"%s\", only system memory is allowed",
> + iotlb->target_as->name ? iotlb->target_as->name : 
> "none");
> +return;
> +}
> +RCU_READ_LOCK_GUARD();
> +vhost_vdpa_iotlb_batch_begin_once(v);
> +
> +if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> +bool read_only;
> +
> +if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only,
> +  &address_space_memory)) {
> +return;
> +}
> +ret =
> +vhost_vdpa_dma_map(v, iova, iotlb->addr_mask + 1, vaddr, 
> read_only);
> +if (ret) {
> +error_report("vhost_vdpa_dma_map(%p, 0x%" HWADDR_PRIx ", "
> + "0x%" HWADDR_PRIx ", %p) = %d (%m)",
> + v, iova, iotlb->addr_mask + 1, vaddr, ret);
> +}
> +} else {
> +ret = vhost_vdpa_dma_unmap(v, iova, iotlb->addr_mask + 1);
> +if (ret) {
> +error_report("vhost_vdpa_dma_unmap(%p, 0x%" HWADDR_PRIx ", "
> + "0x%" HWADDR_PRIx ") = %d (%m)",
> + v, iova, iotlb->addr_mask + 1, ret);
> +}
> +}
> +}
> +
> +static void vhost_vdpa_iommu_region_add(MemoryListener *listener,
> +MemoryRegionSection *section)
> +{
> +struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, 
> listener);
> +
> +struct vdpa_iommu *iommu;
> +Int128 end;
> +int iommu_idx;
> +IOMMUMemoryRegion *iommu_mr;
> +int ret;
> +
> +if (!memory_region_is_iommu(section->mr)) {
> +return;

Nit: So we had already had one check in the caller, there's no need to
check twice. (this could be done on top).

> +}
> +
> +iommu_mr = IOMMU_MEMORY_REGION(section->mr);
> +
> +iommu = g_malloc0(sizeof(*iommu));
> +end =  int128_add(int128_make64(section->offset_within_region),
> +section->size);
> +end = int128_sub(end, int128_one());
> +iommu_idx = memory_region_iommu_attrs_to_index(iommu_mr,
> +MEMTXATTRS_UNSPECIFIED);
> +
> +iommu->iommu_mr = iommu_mr;
> +
> +iommu_notifier_init(
> +&iommu->n, vhost_vdpa_iommu_map_notify, IOMMU_NOTIFIER_IOTLB_EVENTS,
> +section->offset_within_region, int128_get64(end), iommu_idx);
> +iommu->iommu_offset =
> +section->offset_within_address_space - section->offset_within_region;
> +iommu->dev = v;
> +
> +ret = memory_region_register_iommu_notifier(section->mr, &iommu->n, 
> NULL);
> +if (ret) {
> +g_free(iommu);
> +return;
> +}
> +
> 

Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c

2022-10-27 Thread Jason Wang
On Thu, Oct 27, 2022 at 3:41 PM Cindy Lu  wrote:
>
> Move the function vfio_get_xlat_addr to softmmu/memory.c, and
> change the name to memory_get_xlat_addr(). So we can use this
> function in other devices, such as vDPA devices.
>
> Signed-off-by: Cindy Lu 

Acked-by: Jason Wang 

> ---
>  hw/vfio/common.c  | 92 ++-
>  include/exec/memory.h |  4 ++
>  softmmu/memory.c  | 84 +++
>  3 files changed, 92 insertions(+), 88 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index ace9562a9b..2b5a9f3d8d 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -574,92 +574,6 @@ static bool 
> vfio_listener_skipped_section(MemoryRegionSection *section)
> section->offset_within_address_space & (1ULL << 63);
>  }
>
> -/* Called with rcu_read_lock held.  */
> -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> -   ram_addr_t *ram_addr, bool *read_only)
> -{
> -MemoryRegion *mr;
> -hwaddr xlat;
> -hwaddr len = iotlb->addr_mask + 1;
> -bool writable = iotlb->perm & IOMMU_WO;
> -
> -/*
> - * The IOMMU TLB entry we have just covers translation through
> - * this IOMMU to its immediate target.  We need to translate
> - * it the rest of the way through to memory.
> - */
> -mr = address_space_translate(&address_space_memory,
> - iotlb->translated_addr,
> - &xlat, &len, writable,
> - MEMTXATTRS_UNSPECIFIED);
> -if (!memory_region_is_ram(mr)) {
> -error_report("iommu map to non memory area %"HWADDR_PRIx"",
> - xlat);
> -return false;
> -} else if (memory_region_has_ram_discard_manager(mr)) {
> -RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
> -MemoryRegionSection tmp = {
> -.mr = mr,
> -.offset_within_region = xlat,
> -.size = int128_make64(len),
> -};
> -
> -/*
> - * Malicious VMs can map memory into the IOMMU, which is expected
> - * to remain discarded. vfio will pin all pages, populating memory.
> - * Disallow that. vmstate priorities make sure any RamDiscardManager
> - * were already restored before IOMMUs are restored.
> - */
> -if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> -error_report("iommu map to discarded memory (e.g., unplugged via"
> - " virtio-mem): %"HWADDR_PRIx"",
> - iotlb->translated_addr);
> -return false;
> -}
> -
> -/*
> - * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> - * pages will remain pinned inside vfio until unmapped, resulting in 
> a
> - * higher memory consumption than expected. If memory would get
> - * populated again later, there would be an inconsistency between 
> pages
> - * pinned by vfio and pages seen by QEMU. This is the case until
> - * unmapped from the IOMMU (e.g., during device reset).
> - *
> - * With malicious guests, we really only care about pinning more 
> memory
> - * than expected. RLIMIT_MEMLOCK set for the user/process can never 
> be
> - * exceeded and can be used to mitigate this problem.
> - */
> -warn_report_once("Using vfio with vIOMMUs and coordinated discarding 
> of"
> - " RAM (e.g., virtio-mem) works, however, malicious"
> - " guests can trigger pinning of more memory than"
> - " intended via an IOMMU. It's possible to mitigate "
> - " by setting/adjusting RLIMIT_MEMLOCK.");
> -}
> -
> -/*
> - * Translation truncates length to the IOMMU page size,
> - * check that it did not truncate too much.
> - */
> -if (len & iotlb->addr_mask) {
> -error_report("iommu has granularity incompatible with target AS");
> -return false;
> -}
> -
> -if (vaddr) {
> -*vaddr = memory_region_get_ram_ptr(mr) + xlat;
> -}
> -
> -if (ram_addr) {
> -*ram_addr = memory_region_get_ram_addr(mr) + xlat;
> -}
> -
> -if (read_only) {
> -*read_only = !writable || mr->readonly;
> -}
> -
> -return true;
> -}
> -
>  static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>  {
>  VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> @@ -682,7 +596,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
> IOMMUTLBEntry *iotlb)
>  if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
>  bool read_only;
>
> -if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
> +if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only,
> +  &address_space_memo

Re: [PATCH v10 2/9] s390x/cpu topology: reporting the CPU topology to the guest

2022-10-27 Thread Thomas Huth

On 12/10/2022 18.21, Pierre Morel wrote:

The guest can use the STSI instruction to get a buffer filled
with the CPU topology description.

Let us implement the STSI instruction for the basis CPU topology
level, level 2.

Signed-off-by: Pierre Morel 
---
  include/hw/s390x/cpu-topology.h |   3 +
  target/s390x/cpu.h  |  48 ++
  hw/s390x/cpu-topology.c |   8 ++-
  target/s390x/cpu_topology.c | 109 
  target/s390x/kvm/kvm.c  |   6 +-
  target/s390x/meson.build|   1 +
  6 files changed, 172 insertions(+), 3 deletions(-)
  create mode 100644 target/s390x/cpu_topology.c

diff --git a/include/hw/s390x/cpu-topology.h b/include/hw/s390x/cpu-topology.h
index 66c171d0bc..61c11db017 100644
--- a/include/hw/s390x/cpu-topology.h
+++ b/include/hw/s390x/cpu-topology.h
@@ -13,6 +13,8 @@
  #include "hw/qdev-core.h"
  #include "qom/object.h"
  
+#define S390_TOPOLOGY_POLARITY_H  0x00

+
  typedef struct S390TopoContainer {
  int active_count;
  } S390TopoContainer;
@@ -29,6 +31,7 @@ struct S390Topology {
  S390TopoContainer *socket;
  S390TopoTLE *tle;
  MachineState *ms;
+QemuMutex topo_mutex;
  };
  
  #define TYPE_S390_CPU_TOPOLOGY "s390-topology"

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index 7d6d01325b..d604aa9c78 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -565,6 +565,52 @@ typedef union SysIB {
  } SysIB;
  QEMU_BUILD_BUG_ON(sizeof(SysIB) != 4096);
  
+/* CPU type Topology List Entry */

+typedef struct SysIBTl_cpu {
+uint8_t nl;
+uint8_t reserved0[3];
+uint8_t reserved1:5;
+uint8_t dedicated:1;
+uint8_t polarity:2;
+uint8_t type;
+uint16_t origin;
+uint64_t mask;
+} QEMU_PACKED QEMU_ALIGNED(8) SysIBTl_cpu;
+QEMU_BUILD_BUG_ON(sizeof(SysIBTl_cpu) != 16);
+
+/* Container type Topology List Entry */
+typedef struct SysIBTl_container {
+uint8_t nl;
+uint8_t reserved[6];
+uint8_t id;
+} QEMU_PACKED QEMU_ALIGNED(8) SysIBTl_container;
+QEMU_BUILD_BUG_ON(sizeof(SysIBTl_container) != 8);
+
+#define TOPOLOGY_NR_MAG  6
+#define TOPOLOGY_NR_MAG6 0
+#define TOPOLOGY_NR_MAG5 1
+#define TOPOLOGY_NR_MAG4 2
+#define TOPOLOGY_NR_MAG3 3
+#define TOPOLOGY_NR_MAG2 4
+#define TOPOLOGY_NR_MAG1 5
+/* Configuration topology */
+typedef struct SysIB_151x {
+uint8_t  reserved0[2];
+uint16_t length;
+uint8_t  mag[TOPOLOGY_NR_MAG];
+uint8_t  reserved1;
+uint8_t  mnest;
+uint32_t reserved2;
+char tle[0];
+} QEMU_PACKED QEMU_ALIGNED(8) SysIB_151x;
+QEMU_BUILD_BUG_ON(sizeof(SysIB_151x) != 16);
+
+/* Maxi size of a SYSIB structure is when all CPU are alone in a container */
+#define S390_TOPOLOGY_SYSIB_SIZE (sizeof(SysIB_151x) + 
\
+  S390_MAX_CPUS * (sizeof(SysIBTl_container) + 
\
+   sizeof(SysIBTl_cpu)))
+
+
  /* MMU defines */
  #define ASCE_ORIGIN   (~0xfffULL) /* segment table origin 
*/
  #define ASCE_SUBSPACE 0x200   /* subspace group control   
*/
@@ -843,4 +889,6 @@ S390CPU *s390_cpu_addr2state(uint16_t cpu_addr);
  
  #include "exec/cpu-all.h"
  
+void insert_stsi_15_1_x(S390CPU *cpu, int sel2, __u64 addr, uint8_t ar);

+
  #endif
diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index 42b22a1831..c73cebfe6f 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -54,8 +54,6 @@ void s390_topology_new_cpu(int core_id)
  return;
  }
  
-socket_id = core_id / topo->cpus;

-
  /*
   * At the core level, each CPU is represented by a bit in a 64bit
   * unsigned long which represent the presence of a CPU.
@@ -76,8 +74,13 @@ void s390_topology_new_cpu(int core_id)
  bit %= 64;
  bit = 63 - bit;
  
+qemu_mutex_lock(&topo->topo_mutex);

+
+socket_id = core_id / topo->cpus;
  topo->socket[socket_id].active_count++;
  set_bit(bit, &topo->tle[socket_id].mask[origin]);
+
+qemu_mutex_unlock(&topo->topo_mutex);
  }
  
  /**

@@ -101,6 +104,7 @@ static void s390_topology_realize(DeviceState *dev, Error 
**errp)
  topo->tle = g_new0(S390TopoTLE, ms->smp.max_cpus);
  
  topo->ms = ms;

+qemu_mutex_init(&topo->topo_mutex);
  }
  
  /**

diff --git a/target/s390x/cpu_topology.c b/target/s390x/cpu_topology.c
new file mode 100644
index 00..df86a98f23
--- /dev/null
+++ b/target/s390x/cpu_topology.c
@@ -0,0 +1,109 @@
+/*
+ * QEMU S390x CPU Topology
+ *
+ * Copyright IBM Corp. 2022
+ * Author(s): Pierre Morel 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "hw/s390x/pv.h"
+#include "hw/sysbus.h"
+#include "hw/s390x/cpu-topology.h"
+#include "hw/s390x/sclp.h"
+
+#define S390_TOPOLOGY_MAX_STSI_SIZE (S3

Re: [PATCH v10 3/9] s390x/cpu_topology: resetting the Topology-Change-Report

2022-10-27 Thread Thomas Huth

On 12/10/2022 18.21, Pierre Morel wrote:

During a subsystem reset the Topology-Change-Report is cleared
by the machine.
Let's ask KVM to clear the Modified Topology Change Report (MTCR)
  bit of the SCA in the case of a subsystem reset.

Signed-off-by: Pierre Morel 
Reviewed-by: Nico Boehr 
Reviewed-by: Janis Schoetterl-Glausch 
---
  target/s390x/cpu.h   |  1 +
  target/s390x/kvm/kvm_s390x.h |  1 +
  hw/s390x/cpu-topology.c  | 12 
  hw/s390x/s390-virtio-ccw.c   |  1 +
  target/s390x/cpu-sysemu.c|  7 +++
  target/s390x/kvm/kvm.c   | 23 +++
  6 files changed, 45 insertions(+)

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index d604aa9c78..9b35795ac8 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -825,6 +825,7 @@ void s390_enable_css_support(S390CPU *cpu);
  void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data arg);
  int s390_assign_subch_ioeventfd(EventNotifier *notifier, uint32_t sch_id,
  int vq, bool assign);
+void s390_cpu_topology_reset(void);
  #ifndef CONFIG_USER_ONLY
  unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
  #else
diff --git a/target/s390x/kvm/kvm_s390x.h b/target/s390x/kvm/kvm_s390x.h
index aaae8570de..a13c8fb9a3 100644
--- a/target/s390x/kvm/kvm_s390x.h
+++ b/target/s390x/kvm/kvm_s390x.h
@@ -46,5 +46,6 @@ void kvm_s390_crypto_reset(void);
  void kvm_s390_restart_interrupt(S390CPU *cpu);
  void kvm_s390_stop_interrupt(S390CPU *cpu);
  void kvm_s390_set_diag318(CPUState *cs, uint64_t diag318_info);
+int kvm_s390_topology_set_mtcr(uint64_t attr);
  
  #endif /* KVM_S390X_H */

diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index c73cebfe6f..9f202621d0 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -107,6 +107,17 @@ static void s390_topology_realize(DeviceState *dev, Error 
**errp)
  qemu_mutex_init(&topo->topo_mutex);
  }
  
+/**

+ * s390_topology_reset:
+ * @dev: the device
+ *
+ * Calls the sysemu topology reset
+ */
+static void s390_topology_reset(DeviceState *dev)
+{
+s390_cpu_topology_reset();
+}
+
  /**
   * topology_class_init:
   * @oc: Object class
@@ -120,6 +131,7 @@ static void topology_class_init(ObjectClass *oc, void *data)
  
  dc->realize = s390_topology_realize;

  set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+dc->reset = s390_topology_reset;
  }
  
  static const TypeInfo cpu_topology_info = {

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index aa99a62e42..362378454a 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -113,6 +113,7 @@ static const char *const reset_dev_types[] = {
  "s390-flic",
  "diag288",
  TYPE_S390_PCI_HOST_BRIDGE,
+TYPE_S390_CPU_TOPOLOGY,
  };
  
  static void subsystem_reset(void)

diff --git a/target/s390x/cpu-sysemu.c b/target/s390x/cpu-sysemu.c
index 948e4bd3e0..707c0b658c 100644
--- a/target/s390x/cpu-sysemu.c
+++ b/target/s390x/cpu-sysemu.c
@@ -306,3 +306,10 @@ void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data 
arg)
  kvm_s390_set_diag318(cs, arg.host_ulong);
  }
  }
+
+void s390_cpu_topology_reset(void)
+{
+if (kvm_enabled()) {
+kvm_s390_topology_set_mtcr(0);
+}
+}
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index f96630440b..9c994d27d5 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2585,3 +2585,26 @@ int kvm_s390_get_zpci_op(void)
  {
  return cap_zpci_op;
  }
+
+int kvm_s390_topology_set_mtcr(uint64_t attr)
+{
+struct kvm_device_attr attribute = {
+.group = KVM_S390_VM_CPU_TOPOLOGY,
+.attr  = attr,
+};
+int ret;
+
+if (!s390_has_feat(S390_FEAT_CONFIGURATION_TOPOLOGY)) {
+return -EFAULT;


EFAULT is something that indicates a bad address (e.g. a segmentation fault) 
... so this definitely sounds like a bad choice for an error code here.


 Thomas





Re: [PATCH 5/7] block/nfs: Fix 32-bit Windows build

2022-10-27 Thread Bin Meng
On Thu, Oct 27, 2022 at 3:55 PM Kevin Wolf  wrote:
>
> On 27.10.2022 at 04:45, Bin Meng wrote:
> > Hi Kevin,
> > [...]
> > Will you queue this patch via the block tree?
>
> Just to be sure, you mean only patch 5? Yes, I can do that.
>

Yes, only this one. Thank you.

Regards,
Bin



Re: [PATCH v5 00/13] Instantiate VT82xx functions in host device

2022-10-27 Thread Bernhard Beschow
On 16 September 2022 14:36:05 UTC, "Philippe Mathieu-Daudé" wrote:
>On 12/9/22 21:50, Bernhard Beschow wrote:
>> On 1 September 2022 11:41:14 UTC, Bernhard Beschow wrote:
>
>>> Testing done:
>>> 
>>> * `qemu-system-ppc -machine pegasos2 -rtc base=localtime -device 
>>> ati-vga,guest_hwcursor=true,romfile="" -cdrom morphos-3.17.iso -kernel 
>>> morphos-3.17/boot.img`
>>> 
>>>   Boots successfully and it is possible to open games and tools.
>>> 
>>> 
>>> 
>>> * I was unable to test the fuloong2e board even before this series since it 
>>> seems to be unfinished [1].
>>> 
>>>   A buildroot-baked kernel [2] booted but doesn't find its root partition, 
>>> though the issues could be in the buildroot receipt I created.
>>> 
>>> 
>>> 
>>> [1] https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2
>>> 
>>> [2] https://github.com/shentok/buildroot/commits/fuloong2e
>>> 
>> 
>> Copying from v2 (just found it in my spam folder :/):
>> Series:
>> Reviewed-by: Philippe Mathieu-Daudé 
>> 
>> Review seems complete, thanks to all who participated! Now we just need 
>> someone to queue this series.
>> 
>> Best regards,
>> Bernhard
>
>Excellent cleanup! Series queued to mips-next.

Hi Phil,

would you mind doing a pull request in time for 7.2?

Thanks,
Bernhard




Re: [PATCH v1 0/3] target/riscv: Apply KVM policy to ISA extensions

2022-10-27 Thread Andrew Jones
On Thu, Oct 27, 2022 at 7:52 AM Mayuresh Chitale
 wrote:
>
> Currently the single and multi letter ISA extensions exposed to the guest
> vcpu don't conform to the KVM policies. This patchset updates the kvm headers
> and applies policies set in KVM to the extensions exposed to the guest.
>
> Mayuresh Chitale (3):
>   update-linux-headers: Version 6.1-rc2
>   target/riscv: Extend isa_ext_data for single letter extensions
>   target/riscv: kvm: Support selecting VCPU extensions
>

I already reviewed this internally and it hasn't changed, so

for the series

Reviewed-by: Andrew Jones 

Thanks,
drew



Re: [PATCH v1 00/12] Introduce xenpv machine for arm architecture

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:

> Hi,
> This series adds a xenpv machine for aarch64. The motivation behind creating a
> xenpv machine with IOREQ and TPM was to enable each guest on Xen aarch64 to
> have its own unique, emulated TPM.
>
> This series does the following:
> 1. Moved common xen functionalities from hw/i386/xen to hw/xen/ so those 
> can
>be used for aarch64.
> 2. We added a minimal xenpv arm machine which creates an IOREQ server and
>supports TPM.

Now I have some CI minutes again:

  https://gitlab.com/stsquad/qemu/-/pipelines/677956972/failures

which broadly break down into:

  * GUEST_TPM_BASE define missing
  * #include  failure on builds that don't enable Xen
  * CPUTLBEntryFull f; breakage (tcg bits in a non-tcg build?)

-- 
Alex Bennée



Re: [PATCH v1 10/12] hw/arm: introduce xenpv machine

2022-10-27 Thread Alex Bennée


Julien Grall  writes:

> Hi,
>
> There seem to be some missing patches on xen-devel (including the
> cover letter). Is that expected?
>
> On 15/10/2022 06:07, Vikram Garhwal wrote:
>> Add a new machine xenpv which creates an IOREQ server to register/connect with
>> Xen Hypervisor.
>
> I don't like the name 'xenpv' because it doesn't convey the fact that
> some of the HW may be emulated rather than para-virtualized. In fact
> one may only want to use for emulating devices.
>
> Potential name would be 'xen-arm' or re-using 'virt' but with
> 'accel=xen' to select a Xen layout.

I don't think you can re-use the machine name and select by accelerator
because the virt machine does quite a lot of other stuff this model
doesn't support. However I've been calling this concept "xen-virt" or
maybe the explicit "xen-virtio" because that is what it is targeting.

-- 
Alex Bennée



Re: [PATCH v1 10/12] hw/arm: introduce xenpv machine

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:

> Add a new machine xenpv which creates an IOREQ server to register/connect
> with the Xen Hypervisor.
>

> Optional: When CONFIG_TPM is enabled, it also creates a tpm-tis-device, adds
> a TPM emulator, connects to swtpm running on the host machine via a chardev
> socket, and supports TPM functionality for a guest domain.

> +
> +static void xen_arm_machine_class_init(ObjectClass *oc, void *data)
> +{
> +
> +MachineClass *mc = MACHINE_CLASS(oc);
> +mc->desc = "Xen Para-virtualized PC";
> +mc->init = xen_arm_init;
> +mc->max_cpus = 1;
> +machine_class_allow_dynamic_sysbus_dev(mc, TYPE_TPM_TIS_SYSBUS);

This needs #ifdef CONFIG_TPM because, when building with --disable-tpm to try
to get the cross build working, it fails with:

../../hw/arm/xen_arm.c: In function ‘xen_arm_machine_class_init’:
../../hw/arm/xen_arm.c:148:48: error: ‘TYPE_TPM_TIS_SYSBUS’ undeclared (first 
use in this function)
  148 | machine_class_allow_dynamic_sysbus_dev(mc, TYPE_TPM_TIS_SYSBUS);
  |^~~
../../hw/arm/xen_arm.c:148:48: note: each undeclared identifier is reported 
only once for each function it appears in

-- 
Alex Bennée



Re: [PATCH v1 08/12] hw/xen/xen-hvm-common: skip ioreq creation on ioreq registration failure

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:

> From: Stefano Stabellini 
>
> On ARM it is possible to have a functioning xenpv machine with only the
> PV backends and no IOREQ server. If the IOREQ server creation fails, continue
> to the PV backend initialization.
>
> Signed-off-by: Stefano Stabellini 
> ---
>  hw/xen/xen-hvm-common.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
> index f848f9e625..7bccf595fc 100644
> --- a/hw/xen/xen-hvm-common.c
> +++ b/hw/xen/xen-hvm-common.c
> @@ -777,7 +777,11 @@ void xen_register_ioreq(XenIOState *state, unsigned int 
> max_cpus,
>  goto err;
>  }
>  
> -xen_create_ioreq_server(xen_domid, &state->ioservid);
> +rc = xen_create_ioreq_server(xen_domid, &state->ioservid);
> +if (rc) {
> +DPRINTF("xen: failed to create ioreq server\n");

This should be a warn_report to properly inform the user.

> +goto no_ioreq;

Maybe pushing the rest of this function into a local subroutine would
reduce the amount of goto messing about. Other candidates for cleaning
up/modernising:

  - g_malloc to g_new0
  - perror -> error_setg_errno

> +}
>  
>  state->exit.notify = xen_exit_notifier;
>  qemu_add_exit_notifier(&state->exit);
> @@ -842,6 +846,7 @@ void xen_register_ioreq(XenIOState *state, unsigned int 
> max_cpus,
>  QLIST_INIT(&state->dev_list);
>  device_listener_register(&state->device_listener);
>  
> +no_ioreq:
>  xen_bus_init();
>  
>  /* Initialize backend core & drivers */


-- 
Alex Bennée



Re: [PATCH v1 04/12] hw/i386/xen/xen-hvm: move x86-specific fields out of XenIOState

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:

> From: Stefano Stabellini 
>
> In preparation for moving most of the xen-hvm code to an arch-neutral
> location, move:
> - shared_vmport_page
> - log_for_dirtybit
> - dirty_bitmap
> - suspend
> - wakeup
>
> out of the XenIOState struct, as these are only used on x86, especially the
> ones related to dirty logging.
> The updated XenIOState can be used for both aarch64 and x86.
>
> Also, remove free_phys_offset as it was unused.
>
> Signed-off-by: Stefano Stabellini 
> Signed-off-by: Vikram Garhwal 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée



Re: [PATCH] avocado: use sha1 for fc31 imgs to avoid first time re-download

2022-10-27 Thread Daniel P . Berrangé
On Thu, Oct 27, 2022 at 09:46:29AM +0200, Thomas Huth wrote:
> On 24/10/2022 11.02, Daniel P. Berrangé wrote:
> > On Sat, Oct 22, 2022 at 02:03:50PM -0300, Daniel Henrique Barboza wrote:
> > > 'make check-avocado' will download any images that aren't present in the
> > > cache via 'get-vm-images' in tests/Makefile.include. The target that
> > > downloads fedora 31 images, get-vm-image-fedora-31, will use 'avocado
> > > vmimage get  --distro=fedora --distro-version=31 --arch=(...)' to
> > > download the image for each arch. Note that this command does not
> > > support any argument to set the hash algorithm used and, based on the
> > > avocado source code [1], DEFAULT_HASH_ALGORITHM is set to "sha1". The
> > > sha1 hash is stored in a Fedora-Cloud-Base-31-1.9.{ARCH}.qcow2-CHECKSUM
> > > in the cache.
> > 
> > > For now, in QEMU, let's use sha1 for all Fedora 31 images. This will
> > > immediately spare us at least one extra download for each Fedora 31
> > > image in all our CI runs.
> > > 
> > > [1] https://github.com/avocado-framework/avocado.git @ 942a5d6972906
> > > [2] https://github.com/avocado-framework/avocado/issues/5496
> > 
> > Can we just ask the Avocado maintainers to fix this problem on their
> > side, as a priority item, to allow use of a modern hash algorithm? We've
> > already had this problem in QEMU for over a year AFAICT, so it doesn't
> > seem like we urgently need a workaround on the QEMU side; we can get
> > the Avocado devs to commit to fixing it in the next month.
> 
> Do we have such a commitment? ... The avocado version in QEMU is completely
> backlevel these days, it's still using version 88.1 from May 2021, i.e.
> there hasn't been any update in more than a year. I recently tried to
> bump it to a newer version on my own (since I'm still suffering from the
> problem that find_free_port() does not work if you don't have a local IPv6
> address), but it's not that straight forward since the recent versions of
> avocado changed a lot of things (e.g. the new nrunner - do we want to run
> tests in parallel? If so it breaks a lot of the timeout settings, I think),
> so an update needs a lot of careful testing...

That it is so difficult to update Avocado after barely more than
1 year is not exactly a strong vote of confidence in our continued
use of Avocado long term :-(

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v3 0/7] hostmem: NUMA-aware memory preallocation using ThreadContext

2022-10-27 Thread David Hildenbrand

On 14.10.22 15:47, David Hildenbrand wrote:

This is a follow-up on "util: NUMA aware memory preallocation" [1] by
Michal.

Setting the CPU affinity of threads from inside QEMU usually isn't
easily possible, because we don't want QEMU -- once started and running
guest code -- to be able to mess up the system. QEMU disallows relevant
syscalls using seccomp, such that any such invocation will fail.

Especially for memory preallocation in memory backends, a suboptimal CPU
affinity can significantly increase guest startup time, for example, when
running large VMs backed by huge/gigantic pages, because of NUMA effects.
For NUMA-aware preallocation, we have to set the CPU affinity. However:

(1) Once preallocation threads are created during preallocation, management
 tools can no longer intervene to change the affinity. These threads
 are created automatically on demand.
(2) QEMU cannot easily set the CPU affinity itself.
(3) The CPU affinity derived from the NUMA bindings of the memory backend
 might not necessarily be exactly the CPUs we actually want to use
 (e.g., CPU-less NUMA nodes, CPUs that are pinned/used for other VMs).

There is an easy "workaround". If we have a thread with the right CPU
affinity, we can simply create new threads on demand via that prepared
context. So, all we have to do is setup and create such a context ahead
of time, to then configure preallocation to create new threads via that
environment.

So, let's introduce a user-creatable "thread-context" object that
essentially consists of a context thread used to create new threads.
QEMU can either try setting the CPU affinity itself ("cpu-affinity",
"node-affinity" property), or upper layers can extract the thread id
("thread-id" property) to configure it externally.

Make memory-backends consume a thread-context object
(via the "prealloc-context" property) and use it when preallocating to
create new threads with the desired CPU affinity. Further, to make it
easier to use, allow creation of "thread-context" objects, including
setting the CPU affinity directly from QEMU, before enabling the
sandbox option.


Quick test on a system with 2 NUMA nodes:

Without CPU affinity:
 time qemu-system-x86_64 \
 -object 
memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind
 \
 -nographic -monitor stdio

 real0m5.383s
 real0m3.499s
 real0m5.129s
 real0m4.232s
 real0m5.220s
 real0m4.288s
 real0m3.582s
 real0m4.305s
 real0m5.421s
 real0m4.502s

 -> It heavily depends on the scheduler CPU selection

With CPU affinity:
 time qemu-system-x86_64 \
 -object thread-context,id=tc1,node-affinity=0 \
 -object 
memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind,prealloc-context=tc1
 \
 -sandbox enable=on,resourcecontrol=deny \
 -nographic -monitor stdio

 real0m1.959s
 real0m1.942s
 real0m1.943s
 real0m1.941s
 real0m1.948s
 real0m1.964s
 real0m1.949s
 real0m1.948s
 real0m1.941s
 real0m1.937s

On reasonably large VMs, the speedup can be quite significant.

While this concept is currently only used for short-lived preallocation
threads, nothing major speaks against reusing the concept for other
threads that are harder to identify/configure -- except that
we need additional (idle) context threads that are otherwise left unused.

This series does not yet tackle concurrent preallocation of memory
backends. Memory backend objects are created and memory is preallocated one
memory backend at a time -- and there is currently no way to do
preallocation asynchronously.

[1] 
https://lkml.kernel.org/r/ffdcd118d59b379ede2b64745144165a40f6a813.1652165704.git.mpriv...@redhat.com

v2 -> v3:
* "util: Introduce ThreadContext user-creatable object"
  -> Further improve documentation and patch description and add ACK. [Markus]
* "util: Add write-only "node-affinity" property for ThreadContext"
  -> Further improve documentation and patch description and add ACK. [Markus]

v1 -> v2:
* Fixed some minor style nits
* "util: Introduce ThreadContext user-creatable object"
  -> Improve documentation and patch description. [Markus]
* "util: Add write-only "node-affinity" property for ThreadContext"
  -> Improve documentation and patch description. [Markus]

RFC -> v1:
* "vl: Allow ThreadContext objects to be created before the sandbox option"
  -> Move parsing of the "name" property before object_create_pre_sandbox
* Added RB's


I'm queuing this to

https://github.com/davidhildenbrand/qemu.git mem-next

And most probably send a MR tomorrow before soft-freeze.

--
Thanks,

David / dhildenb




Re: [PATCH v1 05/12] hw/i386/xen/xen-hvm: create arch_handle_ioreq and arch_xen_set_memory

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:

> From: Stefano Stabellini 
>
> In preparation for moving most of the xen-hvm code to an arch-neutral
> location, move the x86-specific portion of xen_set_memory to
> arch_xen_set_memory.
>
> Also move handle_vmport_ioreq to arch_handle_ioreq.
>
> NOTE: This patch breaks the build. The next patch fixes the build issue.
> The reason for creating this patch is that there is a lot of new code
> addition and pure code movement done for enabling Xen on ARM. Keeping this
> patch separate makes it easier to review.

But you do intend to squash the patches for the final version? We don't
want to intentionally break bisection.

Otherwise:

Reviewed-by: Alex Bennée 


-- 
Alex Bennée



Re: [PATCH v10 1/9] s390x/cpu topology: core_id sets s390x CPU topology

2022-10-27 Thread Janis Schoetterl-Glausch
On Thu, 2022-10-27 at 10:05 +0200, Thomas Huth wrote:
> On 24/10/2022 21.25, Janis Schoetterl-Glausch wrote:
> > On Wed, 2022-10-12 at 18:20 +0200, Pierre Morel wrote:
> > > In the S390x CPU topology the core_id specifies the CPU address
> > > and the position of the core within the topology.
> > > 
> > > Let's build the topology based on the core_id.
> > > 
> > > Signed-off-by: Pierre Morel 
> > > ---
> > >   include/hw/s390x/cpu-topology.h |  45 +++
> > >   hw/s390x/cpu-topology.c | 132 
> > >   hw/s390x/s390-virtio-ccw.c  |  21 +
> > >   hw/s390x/meson.build|   1 +
> > >   4 files changed, 199 insertions(+)
> > >   create mode 100644 include/hw/s390x/cpu-topology.h
> > >   create mode 100644 hw/s390x/cpu-topology.c
> > > 
> > > diff --git a/include/hw/s390x/cpu-topology.h 
> > > b/include/hw/s390x/cpu-topology.h
> > > new file mode 100644
> > > index 00..66c171d0bc
> > > --- /dev/null
> > > +++ b/include/hw/s390x/cpu-topology.h
> > > @@ -0,0 +1,45 @@
> > > +/*
> > > + * CPU Topology
> > > + *
> > > + * Copyright 2022 IBM Corp.
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> > > + * your option) any later version. See the COPYING file in the top-level
> > > + * directory.
> > > + */
> > > +#ifndef HW_S390X_CPU_TOPOLOGY_H
> > > +#define HW_S390X_CPU_TOPOLOGY_H
> > > +
> > > +#include "hw/qdev-core.h"
> > > +#include "qom/object.h"
> > > +
> > > +typedef struct S390TopoContainer {
> > > +int active_count;
> > > +} S390TopoContainer;
> > > +
> > > +#define S390_TOPOLOGY_CPU_IFL 0x03
> > > +#define S390_TOPOLOGY_MAX_ORIGIN ((63 + S390_MAX_CPUS) / 64)
> > > +typedef struct S390TopoTLE {
> > > +uint64_t mask[S390_TOPOLOGY_MAX_ORIGIN];
> > > +} S390TopoTLE;
> > 
> > Since this actually represents multiple TLEs, you might want to change the
> > name of the struct to reflect this. S390TopoTLEList maybe?
> 
> Didn't TLE mean "Topology List Entry"? (by the way, Pierre, please explain 

Yes.

> this three-letter acronym somewhere in this header in a comment)...
> 
> So expanding the TLE, this would mean S390TopoTopologyListEntryList ? ... 
> this is getting weird...

:D indeed. So the leaves of the topology tree as stored by STSI are lists
of CPU-type TLEs which aren't empty, i.e. represent some CPUs.
Whereas this struct is used to track which CPU-type TLEs need to be created.
It doesn't represent a TLE and doesn't represent the list of CPU-type TLEs.
So yeah, you're right, not a good name.

Off the top of my head I'd suggest S390TopoCPUSet. It's a bitmap, which is
kind of a set. Maybe S390TopoSocketCPUSet to reflect that it is the set of
CPUs in a socket, although, if we ever support different polarizations, etc.
that wouldn't really be true anymore, since that creates additional levels,
so maybe not. (In that case the leaf list of CPU-type TLEs is a flattened
tree.)

> Also, this is not a "list" in the sense of a linked 
> list, as one might expect at a first glance, so this is all very confusing 
> here. Could you please come up with some better naming?
> 
>   Thomas
> 
> 




Re: [PATCH v1 11/12] meson.build: enable xenpv machine build for ARM

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:

> Add CONFIG_XEN for aarch64 to support building for ARM targets.

So, to be clear, an --enable-xen-only build for any of these binaries
essentially ends up being the same thing, just with a slightly less
discombobulating name?

Maybe, given there is no real architecture-specific stuff, we should just
create a neutral binary for --enable-xen (e.g. qemu-xen-backend)?

Anyway:

Reviewed-by: Alex Bennée 


>
> Signed-off-by: Vikram Garhwal 
> Signed-off-by: Stefano Stabellini 
> ---
>  meson.build | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/meson.build b/meson.build
> index b686dfef75..0027d7d195 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -125,7 +125,7 @@ endif
>  if cpu in ['x86', 'x86_64', 'arm', 'aarch64']
># i386 emulator provides xenpv machine type for multiple architectures
>accelerator_targets += {
> -'CONFIG_XEN': ['i386-softmmu', 'x86_64-softmmu'],
> +'CONFIG_XEN': ['i386-softmmu', 'x86_64-softmmu', 'aarch64-softmmu'],
>}
>  endif
>  if cpu in ['x86', 'x86_64']


-- 
Alex Bennée



Re: [PATCH v10 3/9] s390x/cpu_topology: resetting the Topology-Change-Report

2022-10-27 Thread Pierre Morel




On 10/27/22 10:14, Thomas Huth wrote:

On 12/10/2022 18.21, Pierre Morel wrote:

During a subsystem reset the Topology-Change-Report is cleared
by the machine.
Let's ask KVM to clear the Modified Topology Change Report (MTCR)
bit of the SCA in the case of a subsystem reset.

Signed-off-by: Pierre Morel 
Reviewed-by: Nico Boehr 
Reviewed-by: Janis Schoetterl-Glausch 
---
  target/s390x/cpu.h   |  1 +
  target/s390x/kvm/kvm_s390x.h |  1 +
  hw/s390x/cpu-topology.c  | 12 
  hw/s390x/s390-virtio-ccw.c   |  1 +
  target/s390x/cpu-sysemu.c    |  7 +++
  target/s390x/kvm/kvm.c   | 23 +++
  6 files changed, 45 insertions(+)

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index d604aa9c78..9b35795ac8 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -825,6 +825,7 @@ void s390_enable_css_support(S390CPU *cpu);
  void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data arg);
  int s390_assign_subch_ioeventfd(EventNotifier *notifier, uint32_t sch_id,
  int vq, bool assign);
+void s390_cpu_topology_reset(void);
  #ifndef CONFIG_USER_ONLY
  unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
  #else
diff --git a/target/s390x/kvm/kvm_s390x.h b/target/s390x/kvm/kvm_s390x.h
index aaae8570de..a13c8fb9a3 100644
--- a/target/s390x/kvm/kvm_s390x.h
+++ b/target/s390x/kvm/kvm_s390x.h
@@ -46,5 +46,6 @@ void kvm_s390_crypto_reset(void);
  void kvm_s390_restart_interrupt(S390CPU *cpu);
  void kvm_s390_stop_interrupt(S390CPU *cpu);
  void kvm_s390_set_diag318(CPUState *cs, uint64_t diag318_info);
+int kvm_s390_topology_set_mtcr(uint64_t attr);
  #endif /* KVM_S390X_H */
diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index c73cebfe6f..9f202621d0 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -107,6 +107,17 @@ static void s390_topology_realize(DeviceState *dev, Error **errp)

  qemu_mutex_init(&topo->topo_mutex);
  }
+/**
+ * s390_topology_reset:
+ * @dev: the device
+ *
+ * Calls the sysemu topology reset
+ */
+static void s390_topology_reset(DeviceState *dev)
+{
+    s390_cpu_topology_reset();
+}
+
  /**
   * topology_class_init:
   * @oc: Object class
@@ -120,6 +131,7 @@ static void topology_class_init(ObjectClass *oc, void *data)

  dc->realize = s390_topology_realize;
  set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    dc->reset = s390_topology_reset;
  }
  static const TypeInfo cpu_topology_info = {
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index aa99a62e42..362378454a 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -113,6 +113,7 @@ static const char *const reset_dev_types[] = {
  "s390-flic",
  "diag288",
  TYPE_S390_PCI_HOST_BRIDGE,
+    TYPE_S390_CPU_TOPOLOGY,
  };
  static void subsystem_reset(void)
diff --git a/target/s390x/cpu-sysemu.c b/target/s390x/cpu-sysemu.c
index 948e4bd3e0..707c0b658c 100644
--- a/target/s390x/cpu-sysemu.c
+++ b/target/s390x/cpu-sysemu.c
@@ -306,3 +306,10 @@ void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data arg)

  kvm_s390_set_diag318(cs, arg.host_ulong);
  }
  }
+
+void s390_cpu_topology_reset(void)
+{
+    if (kvm_enabled()) {
+    kvm_s390_topology_set_mtcr(0);
+    }
+}
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index f96630440b..9c994d27d5 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2585,3 +2585,26 @@ int kvm_s390_get_zpci_op(void)
  {
  return cap_zpci_op;
  }
+
+int kvm_s390_topology_set_mtcr(uint64_t attr)
+{
+    struct kvm_device_attr attribute = {
+    .group = KVM_S390_VM_CPU_TOPOLOGY,
+    .attr  = attr,
+    };
+    int ret;
+
+    if (!s390_has_feat(S390_FEAT_CONFIGURATION_TOPOLOGY)) {
+    return -EFAULT;


EFAULT is something that indicates a bad address (e.g. a segmentation 
fault) ... so this definitely sounds like a bad choice for an error code 
here.


Hmm, yes, ENODEV seems better, no?



  Thomas




--
Pierre Morel
IBM Lab Boeblingen



Re: [PATCH v1 12/12] meson.build: do not set have_xen_pci_passthrough for aarch64 targets

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:

> From: Stefano Stabellini 
>
> have_xen_pci_passthrough is only used for Xen x86 VMs.
>
> Signed-off-by: Stefano Stabellini 

I think this might want to come before 11/12. Anyway:

Reviewed-by: Alex Bennée 

> ---
>  meson.build | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/meson.build b/meson.build
> index 0027d7d195..43e70936ee 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1454,6 +1454,8 @@ have_xen_pci_passthrough = 
> get_option('xen_pci_passthrough') \
> error_message: 'Xen PCI passthrough requested but Xen not 
> enabled') \
>.require(targetos == 'linux',
> error_message: 'Xen PCI passthrough not available on this 
> platform') \
> +  .require(cpu == 'x86'  or cpu == 'x86_64',
> +   error_message: 'Xen PCI passthrough not available on this 
> platform') \
>.allowed()


-- 
Alex Bennée



Re: [PATCH] target/arm: Fixed Privileged Access Never (PAN) for aarch32

2022-10-27 Thread Timofey Kutergin
Hi Peter,
> V8 always implies V7, so we only need to check V7 here.
From a silicon perspective - yes, but as I see it in QEMU,
ARM_FEATURE_V7 and ARM_FEATURE_V8 are independent bits which do not affect
each other in arm_feature() and set_feature(), so they should be tested
separately.
Did I miss something?

Thanks
Best regards
Timofey



On Tue, Oct 25, 2022 at 4:45 PM Peter Maydell 
wrote:

> On Wed, 19 Oct 2022 at 13:15, Timofey Kutergin 
> wrote:
> >
> > - synchronize PSTATE.PAN with changes in CPSR.PAN in aarch32 mode
> > - set PAN bit automatically on exception entry if SCTLR_SPAN bit
> >   is set
> > - throw permission fault during address translation when PAN is
> >   enabled and the kernel tries to access a user-accessible page
> > - ignore SCTLR_XP bit for armv7 and armv8 (conflicts with SCTLR_SPAN).
> >
> > Signed-off-by: Timofey Kutergin 
> > ---
> >  target/arm/helper.c |  6 ++
> >  target/arm/ptw.c| 11 ++-
> >  2 files changed, 16 insertions(+), 1 deletion(-)
>
> Thanks for this patch. I think you've caught all the places
> we aren't correctly implementing AArch32 PAN handling.
>
> > diff --git a/target/arm/helper.c b/target/arm/helper.c
> > index dde64a487a..5299f67e3f 100644
> > --- a/target/arm/helper.c
> > +++ b/target/arm/helper.c
> > @@ -9052,6 +9052,11 @@ void cpsr_write(CPUARMState *env, uint32_t val,
> uint32_t mask,
> >  }
> >  mask &= ~CACHED_CPSR_BITS;
> >  env->uncached_cpsr = (env->uncached_cpsr & ~mask) | (val & mask);
> > +if (env->uncached_cpsr & CPSR_PAN) {
> > +env->pstate |= PSTATE_PAN;
> > +} else {
> > +env->pstate &= ~PSTATE_PAN;
> > +}
>
> This approach means we're storing the PAN bit in two places,
> both in env->uncached_cpsr and in env->pstate. We don't do
> this for any other bits as far as I can see. I think we should
> either:
>  (1) have the code that changes behaviour based on PAN look
>  at either env->pstate or env->uncached_cpsr depending
>  on whether we're AArch64 or AArch32
>  (2) always store the state in env->pstate only, and handle
>  this in read/write of the CPSR the same way we do with
>  other "cached" bits
>
> I think the intention of the current code is (1), and the
> only place we get this wrong is in arm_mmu_idx_el(),
> which is checking env->pstate only. (The other places that
> directly check env->pstate are all in AArch64-only code,
> and various AArch32-only bits of code already check
> env->uncached_cpsr.) A function like
>
> bool arm_pan_enabled(CPUARMState *env)
> {
> if (is_a64(env)) {
> return env->pstate & PSTATE_PAN;
> } else {
> return env->uncached_cpsr & CPSR_PAN;
> }
> }
>
> and then using that in arm_mmu_idx_el() should I think
> mean you don't need to change either cpsr_write() or
> take_aarch32_exception().
>
> >  if (rebuild_hflags) {
> >  arm_rebuild_hflags(env);
> >  }
> > @@ -9592,6 +9597,7 @@ static void take_aarch32_exception(CPUARMState
> *env, int new_mode,
> >  /* ... the target is EL1 and SCTLR.SPAN is 0.  */
> >  if (!(env->cp15.sctlr_el[new_el] & SCTLR_SPAN)) {
> >  env->uncached_cpsr |= CPSR_PAN;
> > +env->pstate |= PSTATE_PAN;
> >  }
> >  break;
> >  }
> > diff --git a/target/arm/ptw.c b/target/arm/ptw.c
> > index 23f16f4ff7..204a73350f 100644
> > --- a/target/arm/ptw.c
> > +++ b/target/arm/ptw.c
> > @@ -659,6 +659,13 @@ static bool get_phys_addr_v6(CPUARMState *env,
> uint32_t address,
> >  goto do_fault;
> >  }
> >
> > +if (regime_is_pan(env, mmu_idx) && !regime_is_user(env,
> mmu_idx) &&
> > +simple_ap_to_rw_prot_is_user(ap >> 1, 1) &&
> > +access_type != MMU_INST_FETCH) {
> > +fi->type = ARMFault_Permission;
> > +goto do_fault;
> > +}
>
> This assumes we're using the SCTLR.AFE==1 simplified
> permissions model, but PAN should apply even if we're using the
> old model. So we need an ap_to_rw_prot_is_user() to check the
> permissions in that model.
>
> The check is also being done before the Access fault check, but
> the architecture says that Access faults take priority over
> Permission faults.
>
> > +
> >  if (arm_feature(env, ARM_FEATURE_V6K) &&
> >  (regime_sctlr(env, mmu_idx) & SCTLR_AFE)) {
> >  /* The simplified model uses AP[0] as an access control
> bit.  */
> > @@ -2506,7 +2513,9 @@ bool get_phys_addr_with_secure(CPUARMState *env,
> target_ulong address,
> >  if (regime_using_lpae_format(env, mmu_idx)) {
> >  return get_phys_addr_lpae(env, address, access_type, mmu_idx,
> >is_secure, false, result, fi);
> > -} else if (regime_sctlr(env, mmu_idx) & SCTLR_XP) {
> > +} else if (arm_feature(env, ARM_FEATURE_V7) ||
> > +   arm_feature(env, ARM_FEATURE_V8) || (
>
> V8 always implies V7, so we only need to check V7 here.

Re: [PATCH v1 09/12] accel/xen/xen-all: export xenstore_record_dm_state

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:

> xenstore_record_dm_state() will also be used in the aarch64 xenpv machine.
>
> Signed-off-by: Vikram Garhwal 
> Signed-off-by: Stefano Stabellini 
> ---
>  accel/xen/xen-all.c  | 2 +-
>  include/hw/xen/xen.h | 2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/accel/xen/xen-all.c b/accel/xen/xen-all.c
> index 69aa7d018b..276625b78b 100644
> --- a/accel/xen/xen-all.c
> +++ b/accel/xen/xen-all.c
> @@ -100,7 +100,7 @@ void xenstore_store_pv_console_info(int i, Chardev *chr)
>  }
>  
>  
> -static void xenstore_record_dm_state(struct xs_handle *xs, const char *state)
> +void xenstore_record_dm_state(struct xs_handle *xs, const char *state)
>  {
>  char path[50];
>  
> diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
> index afdf9c436a..31e9538a5c 100644
> --- a/include/hw/xen/xen.h
> +++ b/include/hw/xen/xen.h
> @@ -9,6 +9,7 @@
>   */
>  
>  #include "exec/cpu-common.h"
> +#include 

This is breaking a bunch of the builds and generally we try and avoid
adding system includes in headers (apart from osdep.h) for this reason.
In fact there is a comment just above to that effect.

I think you can just add struct xs_handle to typedefs.h (or maybe just
xen.h) and directly include xenstore.h in xen-all.c following the usual
rules:

  https://qemu.readthedocs.io/en/latest/devel/style.html#include-directives

It might be worth doing an audit to see what else is including xen.h
needlessly or should be using sysemu/xen.h. 

>  
>  /* xen-machine.c */
>  enum xen_mode {
> @@ -31,5 +32,6 @@ qemu_irq *xen_interrupt_controller_init(void);
>  void xenstore_store_pv_console_info(int i, Chardev *chr);
>  
>  void xen_register_framebuffer(struct MemoryRegion *mr);
> +void xenstore_record_dm_state(struct xs_handle *xs, const char *state);
>  
>  #endif /* QEMU_HW_XEN_H */


-- 
Alex Bennée



Re: [PATCH v1 00/12] Introduce xenpv machine for arm architecture

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:

> Hi,
> This series adds a xenpv machine for aarch64. The motivation behind creating
> a xenpv machine with IOREQ and TPM was to enable each guest on Xen aarch64
> to have its own unique and emulated TPM.
>
> This series does following:
> 1. Moved common Xen functionality from hw/i386/xen to hw/xen/ so that it
>    can be used for aarch64.
> 2. Added a minimal xenpv ARM machine which creates an IOREQ server and
>    supports TPM.
>
> Please note that patch 05/12 breaks the build. Patch 06/12 fixes the build
> issue. If needed we can merge patch 05/12 and 06/12. For now we kept these
> separate to make the changes easier to review.
>
> Also, checkpatch.pl fails for 03/12 and 06/12. These failures are due to
> moving old code to a new place; that code was not QEMU code-style compatible.
> No new code was added.

I've finished my review pass. Please CC me on v2 when it's ready ;-)

-- 
Alex Bennée



Re: [PATCH] target/arm: Fixed Privileged Access Never (PAN) for aarch32

2022-10-27 Thread Peter Maydell
On Thu, 27 Oct 2022 at 10:22, Timofey Kutergin  wrote:
> > V8 always implies V7, so we only need to check V7 here.

> From a silicon perspective - yes, but as I see it in QEMU,
> ARM_FEATURE_V7 and ARM_FEATURE_V8 are independent bits which do not affect 
> each
> other in arm_feature() and set_feature() so they should be tested separately.
> Did I miss something?

In arm_cpu_realizefn() there is code which sets feature flags
that are always implied by other feature flags. There we set
the V7VE flag if V8 is set, and the V7 flag if V7VE is set.
So we can rely on any v8 CPU having the V7 feature flag set.

thanks
-- PMM



Re: [PATCH v5 00/13] Instantiate VT82xx functions in host device

2022-10-27 Thread Daniel Henrique Barboza




On 10/27/22 05:21, Bernhard Beschow wrote:

On 16 September 2022 14:36:05 UTC, "Philippe Mathieu-Daudé" wrote:

On 12/9/22 21:50, Bernhard Beschow wrote:

On 1 September 2022 11:41:14 UTC, Bernhard Beschow wrote:



Testing done:

* `qemu-system-ppc -machine pegasos2 -rtc base=localtime -device 
ati-vga,guest_hwcursor=true,romfile="" -cdrom morphos-3.17.iso -kernel 
morphos-3.17/boot.img`

   Boots successfully and it is possible to open games and tools.



* I was unable to test the fuloong2e board even before this series since it 
seems to be unfinished [1].

   A buildroot-baked kernel [2] booted but doesn't find its root partition, 
though the issue could be in the buildroot recipe I created.



[1] https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2

[2] https://github.com/shentok/buildroot/commits/fuloong2e



Copying from v2 (just found it in my spam folder :/):
Series:
Reviewed-by: Philippe Mathieu-Daudé 

Review seems complete, thanks to all who participated! Now we just need someone 
to queue this series.

Best regards,
Bernhard


Excellent cleanup! Series queued to mips-next.


Hi Phil,

would you mind doing a pull request in time for 7.2?


I believe Phil was having problems with his amsat.org email. It's
better to CC him using his work email phi...@linaro.org (just added
it).

Phil, since this has pegasos2 changes I can queue it up via ppc-next
if you like. I'll toss a PR tomorrow.



Daniel





Thanks,
Bernhard






Re: [PATCH] target/hppa: Fix fid instruction emulation

2022-10-27 Thread Richard Henderson

On 10/27/22 16:31, Helge Deller wrote:

The fid instruction (Floating-Point Identify) puts the FPU model and
revision into the Status Register. Since those values shouldn't be 0,
store values there which a PCX-L2 (for 32-bit) or a PCX-W2 (for 64-bit)
would return.

Signed-off-by: Helge Deller 

diff --git a/target/hppa/insns.decode b/target/hppa/insns.decode
index c7a7e997f9..3ba5f9885a 100644
--- a/target/hppa/insns.decode
+++ b/target/hppa/insns.decode
@@ -388,10 +388,8 @@ fmpyfadd_d  101110 rm1:5 rm2:5 ... 0 1 ..0 0 0 neg:1 
t:5ra3=%rc32

  # Floating point class 0

-# FID.  With r = t = 0, which via fcpy puts 0 into fr0.
-# This is machine/revision = 0, which is reserved for simulator.


Is there something in particular for which this is failing?
Per the manual, 0 means simulator, which we are.
So far we haven't identified as a particular cpu, have we?



+static bool trans_fid_f(DisasContext *ctx, arg_fid_f *a)
+{
+nullify_over(ctx);
+#if TARGET_REGISTER_BITS == 64
+save_frd(0, tcg_const_i64(0x130800)); /* PA8700 (PCX-W2) */
+#else
+save_frd(0, tcg_const_i64(0x0f0800)); /* PA7300LC (PCX-L2) */
+#endif
+return nullify_end(ctx);
+}


Missing ULL suffix.


r~



Re: [PATCH v10 3/9] s390x/cpu_topology: resetting the Topology-Change-Report

2022-10-27 Thread Cédric Le Goater

On 10/27/22 11:11, Pierre Morel wrote:



On 10/27/22 10:14, Thomas Huth wrote:

On 12/10/2022 18.21, Pierre Morel wrote:

During a subsystem reset the Topology-Change-Report is cleared
by the machine.
Let's ask KVM to clear the Modified Topology Change Report (MTCR)
bit of the SCA in the case of a subsystem reset.

Signed-off-by: Pierre Morel 
Reviewed-by: Nico Boehr 
Reviewed-by: Janis Schoetterl-Glausch 
---
  target/s390x/cpu.h   |  1 +
  target/s390x/kvm/kvm_s390x.h |  1 +
  hw/s390x/cpu-topology.c  | 12 
  hw/s390x/s390-virtio-ccw.c   |  1 +
  target/s390x/cpu-sysemu.c    |  7 +++
  target/s390x/kvm/kvm.c   | 23 +++
  6 files changed, 45 insertions(+)

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index d604aa9c78..9b35795ac8 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -825,6 +825,7 @@ void s390_enable_css_support(S390CPU *cpu);
  void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data arg);
  int s390_assign_subch_ioeventfd(EventNotifier *notifier, uint32_t sch_id,
  int vq, bool assign);
+void s390_cpu_topology_reset(void);
  #ifndef CONFIG_USER_ONLY
  unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
  #else
diff --git a/target/s390x/kvm/kvm_s390x.h b/target/s390x/kvm/kvm_s390x.h
index aaae8570de..a13c8fb9a3 100644
--- a/target/s390x/kvm/kvm_s390x.h
+++ b/target/s390x/kvm/kvm_s390x.h
@@ -46,5 +46,6 @@ void kvm_s390_crypto_reset(void);
  void kvm_s390_restart_interrupt(S390CPU *cpu);
  void kvm_s390_stop_interrupt(S390CPU *cpu);
  void kvm_s390_set_diag318(CPUState *cs, uint64_t diag318_info);
+int kvm_s390_topology_set_mtcr(uint64_t attr);
  #endif /* KVM_S390X_H */
diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index c73cebfe6f..9f202621d0 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -107,6 +107,17 @@ static void s390_topology_realize(DeviceState *dev, Error 
**errp)
  qemu_mutex_init(&topo->topo_mutex);
  }
+/**
+ * s390_topology_reset:
+ * @dev: the device
+ *
+ * Calls the sysemu topology reset
+ */
+static void s390_topology_reset(DeviceState *dev)
+{
+    s390_cpu_topology_reset();
+}
+
  /**
   * topology_class_init:
   * @oc: Object class
@@ -120,6 +131,7 @@ static void topology_class_init(ObjectClass *oc, void *data)
  dc->realize = s390_topology_realize;
  set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    dc->reset = s390_topology_reset;
  }
  static const TypeInfo cpu_topology_info = {
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index aa99a62e42..362378454a 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -113,6 +113,7 @@ static const char *const reset_dev_types[] = {
  "s390-flic",
  "diag288",
  TYPE_S390_PCI_HOST_BRIDGE,
+    TYPE_S390_CPU_TOPOLOGY,
  };
  static void subsystem_reset(void)
diff --git a/target/s390x/cpu-sysemu.c b/target/s390x/cpu-sysemu.c
index 948e4bd3e0..707c0b658c 100644
--- a/target/s390x/cpu-sysemu.c
+++ b/target/s390x/cpu-sysemu.c
@@ -306,3 +306,10 @@ void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data 
arg)
  kvm_s390_set_diag318(cs, arg.host_ulong);
  }
  }
+
+void s390_cpu_topology_reset(void)
+{
+    if (kvm_enabled()) {
+    kvm_s390_topology_set_mtcr(0);
+    }
+}
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index f96630440b..9c994d27d5 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2585,3 +2585,26 @@ int kvm_s390_get_zpci_op(void)
  {
  return cap_zpci_op;
  }
+
+int kvm_s390_topology_set_mtcr(uint64_t attr)
+{
+    struct kvm_device_attr attribute = {
+    .group = KVM_S390_VM_CPU_TOPOLOGY,
+    .attr  = attr,
+    };
+    int ret;
+
+    if (!s390_has_feat(S390_FEAT_CONFIGURATION_TOPOLOGY)) {
+    return -EFAULT;


EFAULT is something that indicates a bad address (e.g. a segmentation fault) 
... so this definitely sounds like a bad choice for an error code here.


Hum, yes, ENODEV seems besser no?


-ENOTSUP would be 'meilleur' maybe?  :)

C.






  Thomas









[PATCH v3 1/8] hmat acpi: Don't require initiator value in -numa

2022-10-27 Thread Hesham Almatary via
From: Brice Goglin 

The "Memory Proximity Domain Attributes" structure of the ACPI HMAT
has a "Processor Proximity Domain Valid" flag that is currently
always set because QEMU's -numa requires an initiator=X value
when hmat=on. Unsetting this flag allows creating more complex
memory topologies by having multiple best initiators for a single
memory target.

This patch allows -numa without initiator=X when hmat=on by keeping
the default value MAX_NODES in numa_state->nodes[i].initiator.
All places reading numa_state->nodes[i].initiator already check
whether it's different from MAX_NODES before using it.

Tested with
qemu-system-x86_64 -accel kvm \
 -machine pc,hmat=on \
 -drive if=pflash,format=raw,file=./OVMF.fd \
 -drive media=disk,format=qcow2,file=efi.qcow2 \
 -smp 4 \
 -m 3G \
 -object memory-backend-ram,size=1G,id=ram0 \
 -object memory-backend-ram,size=1G,id=ram1 \
 -object memory-backend-ram,size=1G,id=ram2 \
 -numa node,nodeid=0,memdev=ram0,cpus=0-1 \
 -numa node,nodeid=1,memdev=ram1,cpus=2-3 \
 -numa node,nodeid=2,memdev=ram2 \
 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=10 \
 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=10485760 \
 -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=20 \
 -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=5242880 \
 -numa hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-latency,latency=30 \
 -numa hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=1048576 \
 -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-latency,latency=20 \
 -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=5242880 \
 -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-latency,latency=10 \
 -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=10485760 \
 -numa hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-latency,latency=30 \
 -numa hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=1048576
which reports NUMA node2 at the same distance from both node0 and node1, as seen in lstopo:
Machine (2966MB total) + Package P#0
  NUMANode P#2 (979MB)
  Group0
NUMANode P#0 (980MB)
Core P#0 + PU P#0
Core P#1 + PU P#1
  Group0
NUMANode P#1 (1007MB)
Core P#2 + PU P#2
Core P#3 + PU P#3

Before this patch, we had to add ",initiator=X" to "-numa 
node,nodeid=2,memdev=ram2".
The lstopo output difference between initiator=1 and no initiator is:
@@ -1,10 +1,10 @@
 Machine (2966MB total) + Package P#0
+  NUMANode P#2 (979MB)
   Group0
 NUMANode P#0 (980MB)
 Core P#0 + PU P#0
 Core P#1 + PU P#1
   Group0
 NUMANode P#1 (1007MB)
-NUMANode P#2 (979MB)
 Core P#2 + PU P#2
 Core P#3 + PU P#3

Corresponding changes in the HMAT MPDA structure:
@@ -49,10 +49,10 @@
 [078h 0120   2]   Structure Type :  [Memory Proximity Domain 
Attributes]
 [07Ah 0122   2] Reserved : 
 [07Ch 0124   4]   Length : 0028
-[080h 0128   2]Flags (decoded below) : 0001
-Processor Proximity Domain Valid : 1
+[080h 0128   2]Flags (decoded below) : 
+Processor Proximity Domain Valid : 0
 [082h 0130   2]Reserved1 : 
-[084h 0132   4] Attached Initiator Proximity Domain : 0001
+[084h 0132   4] Attached Initiator Proximity Domain : 0080
 [088h 0136   4]  Memory Proximity Domain : 0002
 [08Ch 0140   4]Reserved2 : 
 [090h 0144   8]Reserved3 : 

Final HMAT SLLB structures:
[0A0h 0160   2]   Structure Type : 0001 [System Locality Latency 
and Bandwidth Information]
[0A2h 0162   2] Reserved : 
[0A4h 0164   4]   Length : 0040
[0A8h 0168   1]Flags (decoded below) : 00
Memory Hierarchy : 0
[0A9h 0169   1]Data Type : 00
[0AAh 0170   2]Reserved1 : 
[0ACh 0172   4] Initiator Proximity Domains # : 0002
[0B0h 0176   4]   Target Proximity Domains # : 0003
[0B4h 0180   4]Reserved2 : 
[0B8h 0184   8]  Entry Base Unit : 2710
[0C0h 0192   4] Initiator Proximity Domain List : 
[0C4h 0196   4] Initiator Proximity Domain List : 0001
[0C8h 0200   4] Target Proximity Domain List : 
[0CCh 0204   4] Target Proximity Domain List : 0001
[0D0h 0208   4] Target Proximity Domain List : 0002
[0D4h 0212   2]Entry : 0001
[0D6h 0214   2]Entry : 0002
[0D8h 0216   2]Entry : 0003
[0DAh 0218   2]Entry : 0002
[0DCh 0220   2] 

[PATCH v3 3/8] tests: acpi: q35: add test for hmat nodes without initiators

2022-10-27 Thread Hesham Almatary via
From: Brice Goglin 

expected HMAT:

[000h    4]Signature : "HMAT"[Heterogeneous Memory 
Attributes Table]
[004h 0004   4] Table Length : 0120
[008h 0008   1] Revision : 02
[009h 0009   1] Checksum : 4F
[00Ah 0010   6]   Oem ID : "BOCHS "
[010h 0016   8] Oem Table ID : "BXPC"
[018h 0024   4] Oem Revision : 0001
[01Ch 0028   4]  Asl Compiler ID : "BXPC"
[020h 0032   4]Asl Compiler Revision : 0001

[024h 0036   4] Reserved : 

[028h 0040   2]   Structure Type :  [Memory Proximity Domain 
Attributes]
[02Ah 0042   2] Reserved : 
[02Ch 0044   4]   Length : 0028
[030h 0048   2]Flags (decoded below) : 0001
Processor Proximity Domain Valid : 1
[032h 0050   2]Reserved1 : 
[034h 0052   4] Attached Initiator Proximity Domain : 
[038h 0056   4]  Memory Proximity Domain : 
[03Ch 0060   4]Reserved2 : 
[040h 0064   8]Reserved3 : 
[048h 0072   8]Reserved4 : 

[050h 0080   2]   Structure Type :  [Memory Proximity Domain 
Attributes]
[052h 0082   2] Reserved : 
[054h 0084   4]   Length : 0028
[058h 0088   2]Flags (decoded below) : 0001
Processor Proximity Domain Valid : 1
[05Ah 0090   2]Reserved1 : 
[05Ch 0092   4] Attached Initiator Proximity Domain : 0001
[060h 0096   4]  Memory Proximity Domain : 0001
[064h 0100   4]Reserved2 : 
[068h 0104   8]Reserved3 : 
[070h 0112   8]Reserved4 : 

[078h 0120   2]   Structure Type :  [Memory Proximity Domain 
Attributes]
[07Ah 0122   2] Reserved : 
[07Ch 0124   4]   Length : 0028
[080h 0128   2]Flags (decoded below) : 
Processor Proximity Domain Valid : 0
[082h 0130   2]Reserved1 : 
[084h 0132   4] Attached Initiator Proximity Domain : 0080
[088h 0136   4]  Memory Proximity Domain : 0002
[08Ch 0140   4]Reserved2 : 
[090h 0144   8]Reserved3 : 
[098h 0152   8]Reserved4 : 

[0A0h 0160   2]   Structure Type : 0001 [System Locality Latency 
and Bandwidth Information]
[0A2h 0162   2] Reserved : 
[0A4h 0164   4]   Length : 0040
[0A8h 0168   1]Flags (decoded below) : 00
Memory Hierarchy : 0
[0A9h 0169   1]Data Type : 00
[0AAh 0170   2]Reserved1 : 
[0ACh 0172   4] Initiator Proximity Domains # : 0002
[0B0h 0176   4]   Target Proximity Domains # : 0003
[0B4h 0180   4]Reserved2 : 
[0B8h 0184   8]  Entry Base Unit : 2710
[0C0h 0192   4] Initiator Proximity Domain List : 
[0C4h 0196   4] Initiator Proximity Domain List : 0001
[0C8h 0200   4] Target Proximity Domain List : 
[0CCh 0204   4] Target Proximity Domain List : 0001
[0D0h 0208   4] Target Proximity Domain List : 0002
[0D4h 0212   2]Entry : 0001
[0D6h 0214   2]Entry : 0002
[0D8h 0216   2]Entry : 0003
[0DAh 0218   2]Entry : 0002
[0DCh 0220   2]Entry : 0001
[0DEh 0222   2]Entry : 0003

[0E0h 0224   2]   Structure Type : 0001 [System Locality Latency 
and Bandwidth Information]
[0E2h 0226   2] Reserved : 
[0E4h 0228   4]   Length : 0040
[0E8h 0232   1]Flags (decoded below) : 00
Memory Hierarchy : 0
[0E9h 0233   1]Data Type : 03
[0EAh 0234   2]Reserved1 : 
[0ECh 0236   4] Initiator Proximity Domains # : 0002
[0F0h 0240   4]   Target Proximity Domains # : 0003
[0F4h 0244   4]Reserved2 : 
[0F8h 0248   8]  Entry Base Unit : 0001
[100h 0256   4] Initiator Proximity Domain List : 
[104h 0260   4] Initiator Proximity Domain List : 0001
[108h 0264   4] Target Proximity Domain List : 
[10Ch 0268   4] Target Proximity Domain List : 0001
[110h 0272   4] Target Proximity Domain List : 0002
[114h 0276   2]Entry : 000A
[116h 0278   2]Entry : 0005
[118h 0280   2]Entry : 0001
[11Ah 0282   2]Entry : 0005
[11Ch 0284   2]   

[PATCH v3 0/8] AArch64/HMAT support and tests

2022-10-27 Thread Hesham Almatary via
This patchset adds support for AArch64/HMAT including a test.
It relies on two other patch sets:

Brice Goglin: to support -numa without initiators on q35/x86.
  https://lore.kernel.org/all/ed23accb-2c8b-90f4-a7a3-f81cc57bf...@inria.fr/
Xiang Chen: to enable/support HMAT on AArch64.
  
https://lore.kernel.org/all/1643102134-15506-1-git-send-email-chenxian...@hisilicon.com/

I further add a test with ACPI/HMAT tables that uses the two
patch sets.

Changes from v2:
- Rebased and fixed a merge conflict

Changes from v1:
- Generate APIC and PPTT ACPI tables for AArch64/virt
- Avoid using legacy syntax in numa/bios tests
- Delete unchanged FACP tables

Brice Goglin (4):
  hmat acpi: Don't require initiator value in -numa
  tests: acpi: add and whitelist *.hmat-noinitiator expected blobs
  tests: acpi: q35: add test for hmat nodes without initiators
  tests: acpi: q35: update expected blobs *.hmat-noinitiators expected
HMAT:

Hesham Almatary (3):
  tests: Add HMAT AArch64/virt empty table files
  tests: acpi: aarch64/virt: add a test for hmat nodes with no
initiators
  tests: virt: Update expected *.acpihmatvirt tables

Xiang Chen (1):
  hw/arm/virt: Enable HMAT on arm virt machine

 hw/arm/Kconfig|   1 +
 hw/arm/virt-acpi-build.c  |   7 ++
 hw/core/machine.c |   4 +-
 tests/data/acpi/q35/APIC.acpihmat-noinitiator | Bin 0 -> 144 bytes
 tests/data/acpi/q35/DSDT.acpihmat-noinitiator | Bin 0 -> 8553 bytes
 tests/data/acpi/q35/HMAT.acpihmat-noinitiator | Bin 0 -> 288 bytes
 tests/data/acpi/q35/SRAT.acpihmat-noinitiator | Bin 0 -> 312 bytes
 tests/data/acpi/virt/APIC.acpihmatvirt| Bin 0 -> 396 bytes
 tests/data/acpi/virt/DSDT.acpihmatvirt| Bin 0 -> 5282 bytes
 tests/data/acpi/virt/HMAT.acpihmatvirt| Bin 0 -> 288 bytes
 tests/data/acpi/virt/PPTT.acpihmatvirt| Bin 0 -> 196 bytes
 tests/data/acpi/virt/SRAT.acpihmatvirt| Bin 0 -> 240 bytes
 tests/qtest/bios-tables-test.c| 109 ++
 13 files changed, 118 insertions(+), 3 deletions(-)
 create mode 100644 tests/data/acpi/q35/APIC.acpihmat-noinitiator
 create mode 100644 tests/data/acpi/q35/DSDT.acpihmat-noinitiator
 create mode 100644 tests/data/acpi/q35/HMAT.acpihmat-noinitiator
 create mode 100644 tests/data/acpi/q35/SRAT.acpihmat-noinitiator
 create mode 100644 tests/data/acpi/virt/APIC.acpihmatvirt
 create mode 100644 tests/data/acpi/virt/DSDT.acpihmatvirt
 create mode 100644 tests/data/acpi/virt/HMAT.acpihmatvirt
 create mode 100644 tests/data/acpi/virt/PPTT.acpihmatvirt
 create mode 100644 tests/data/acpi/virt/SRAT.acpihmatvirt

-- 
2.25.1




[PATCH v2 3/6] target/openrisc: Always exit after mtspr npc

2022-10-27 Thread Richard Henderson
We have called cpu_restore_state asserting will_exit.
Do not go back on that promise.  This affects icount.

Signed-off-by: Richard Henderson 
---
 target/openrisc/sys_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/openrisc/sys_helper.c b/target/openrisc/sys_helper.c
index 09b3c97d7c..a3508e421d 100644
--- a/target/openrisc/sys_helper.c
+++ b/target/openrisc/sys_helper.c
@@ -51,8 +51,8 @@ void HELPER(mtspr)(CPUOpenRISCState *env, target_ulong spr, 
target_ulong rb)
 if (env->pc != rb) {
 env->pc = rb;
 env->dflag = 0;
-cpu_loop_exit(cs);
 }
+cpu_loop_exit(cs);
 break;
 
 case TO_SPR(0, 17): /* SR */
-- 
2.34.1




[PATCH v2 6/6] accel/tcg: Remove reset_icount argument from cpu_restore_state_from_tb

2022-10-27 Thread Richard Henderson
The value passed is always true.

Reviewed-by: Claudio Fontana 
Signed-off-by: Richard Henderson 
---
 accel/tcg/internal.h  |  2 +-
 accel/tcg/tb-maint.c  |  4 ++--
 accel/tcg/translate-all.c | 15 +++
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 9c06b320b7..cb13bade4f 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -107,7 +107,7 @@ TranslationBlock *tb_link_page(TranslationBlock *tb, 
tb_page_addr_t phys_pc,
tb_page_addr_t phys_page2);
 bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc);
 void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
-   uintptr_t host_pc, bool reset_icount);
+   uintptr_t host_pc);
 
 /* Return the current PC from CPU, which may be cached in TB. */
 static inline target_ulong log_pc(CPUState *cpu, const TranslationBlock *tb)
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index c8e921089d..0cdb35548c 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -536,7 +536,7 @@ tb_invalidate_phys_page_range__locked(struct 
page_collection *pages,
  * restore the CPU state.
  */
 current_tb_modified = true;
-cpu_restore_state_from_tb(cpu, current_tb, retaddr, true);
+cpu_restore_state_from_tb(cpu, current_tb, retaddr);
 }
 #endif /* TARGET_HAS_PRECISE_SMC */
 tb_phys_invalidate__locked(tb);
@@ -685,7 +685,7 @@ bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, 
uintptr_t pc)
  * function to partially restore the CPU state.
  */
 current_tb_modified = true;
-cpu_restore_state_from_tb(cpu, current_tb, pc, true);
+cpu_restore_state_from_tb(cpu, current_tb, pc);
 }
 #endif /* TARGET_HAS_PRECISE_SMC */
 tb_phys_invalidate(tb, addr);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 90997fed47..0089578f8f 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -282,12 +282,11 @@ static int cpu_unwind_data_from_tb(TranslationBlock *tb, 
uintptr_t host_pc,
 }
 
 /*
- * The cpu state corresponding to 'host_pc' is restored.
- * When reset_icount is true, current TB will be interrupted and
- * icount should be recalculated.
+ * The cpu state corresponding to 'host_pc' is restored in
+ * preparation for exiting the TB.
  */
 void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
-   uintptr_t host_pc, bool reset_icount)
+   uintptr_t host_pc)
 {
 uint64_t data[TARGET_INSN_START_WORDS];
 #ifdef CONFIG_PROFILER
@@ -300,7 +299,7 @@ void cpu_restore_state_from_tb(CPUState *cpu, 
TranslationBlock *tb,
 return;
 }
 
-if (reset_icount && (tb_cflags(tb) & CF_USE_ICOUNT)) {
+if (tb_cflags(tb) & CF_USE_ICOUNT) {
 assert(icount_enabled());
 /*
  * Reset the cycle counter to the start of the block and
@@ -333,7 +332,7 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc)
 if (in_code_gen_buffer((const void *)(host_pc - tcg_splitwx_diff))) {
 TranslationBlock *tb = tcg_tb_lookup(host_pc);
 if (tb) {
-cpu_restore_state_from_tb(cpu, tb, host_pc, true);
+cpu_restore_state_from_tb(cpu, tb, host_pc);
 return true;
 }
 }
@@ -1032,7 +1031,7 @@ void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr)
 tb = tcg_tb_lookup(retaddr);
 if (tb) {
 /* We can use retranslation to find the PC.  */
-cpu_restore_state_from_tb(cpu, tb, retaddr, true);
+cpu_restore_state_from_tb(cpu, tb, retaddr);
 tb_phys_invalidate(tb, -1);
 } else {
 /* The exception probably happened in a helper.  The CPU state should
@@ -1068,7 +1067,7 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
 cpu_abort(cpu, "cpu_io_recompile: could not find TB for pc=%p",
   (void *)retaddr);
 }
-cpu_restore_state_from_tb(cpu, tb, retaddr, true);
+cpu_restore_state_from_tb(cpu, tb, retaddr);
 
 /*
  * Some guests must re-execute the branch when re-executing a delay
-- 
2.34.1




[PATCH v2 1/6] accel/tcg: Introduce cpu_unwind_state_data

2022-10-27 Thread Richard Henderson
Add a way to examine the unwind data without actually
restoring the data back into env.

Signed-off-by: Richard Henderson 
---
 accel/tcg/internal.h  |  4 +--
 include/exec/exec-all.h   | 21 ---
 accel/tcg/translate-all.c | 74 ++-
 3 files changed, 68 insertions(+), 31 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 1227bb69bd..9c06b320b7 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -106,8 +106,8 @@ void tb_reset_jump(TranslationBlock *tb, int n);
 TranslationBlock *tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
tb_page_addr_t phys_page2);
 bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc);
-int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
-  uintptr_t searched_pc, bool reset_icount);
+void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
+   uintptr_t host_pc, bool reset_icount);
 
 /* Return the current PC from CPU, which may be cached in TB. */
 static inline target_ulong log_pc(CPUState *cpu, const TranslationBlock *tb)
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index e948992a80..7d851f5907 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -39,20 +39,33 @@ typedef ram_addr_t tb_page_addr_t;
 #define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
 #endif
 
+/**
+ * cpu_unwind_state_data:
+ * @cpu: the cpu context
+ * @host_pc: the host pc within the translation
+ * @data: output data
+ *
+ * Attempt to load the unwind state for a host pc occurring in
+ * translated code.  If @host_pc is not in translated code, the
+ * function returns false; otherwise @data is loaded.
+ * This is the same unwind info as given to restore_state_to_opc.
+ */
+bool cpu_unwind_state_data(CPUState *cpu, uintptr_t host_pc, uint64_t *data);
+
 /**
  * cpu_restore_state:
- * @cpu: the vCPU state is to be restore to
- * @searched_pc: the host PC the fault occurred at
+ * @cpu: the cpu context
+ * @host_pc: the host pc within the translation
  * @will_exit: true if the TB executed will be interrupted after some
cpu adjustments. Required for maintaining the correct
icount valus
  * @return: true if state was restored, false otherwise
  *
  * Attempt to restore the state for a fault occurring in translated
- * code. If the searched_pc is not in translated code no state is
+ * code. If @host_pc is not in translated code no state is
  * restored and the function returns false.
  */
-bool cpu_restore_state(CPUState *cpu, uintptr_t searched_pc, bool will_exit);
+bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc, bool will_exit);
 
 G_NORETURN void cpu_loop_exit_noexc(CPUState *cpu);
 G_NORETURN void cpu_loop_exit(CPUState *cpu);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index f185356a36..319becb698 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -247,52 +247,66 @@ static int encode_search(TranslationBlock *tb, uint8_t 
*block)
 return p - block;
 }
 
-/* The cpu state corresponding to 'searched_pc' is restored.
- * When reset_icount is true, current TB will be interrupted and
- * icount should be recalculated.
- */
-int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
-  uintptr_t searched_pc, bool reset_icount)
+static int cpu_unwind_data_from_tb(TranslationBlock *tb, uintptr_t host_pc,
+   uint64_t *data)
 {
-uint64_t data[TARGET_INSN_START_WORDS];
-uintptr_t host_pc = (uintptr_t)tb->tc.ptr;
+uintptr_t iter_pc = (uintptr_t)tb->tc.ptr;
 const uint8_t *p = tb->tc.ptr + tb->tc.size;
 int i, j, num_insns = tb->icount;
-#ifdef CONFIG_PROFILER
-TCGProfile *prof = &tcg_ctx->prof;
-int64_t ti = profile_getclock();
-#endif
 
-searched_pc -= GETPC_ADJ;
+host_pc -= GETPC_ADJ;
 
-if (searched_pc < host_pc) {
+if (host_pc < iter_pc) {
 return -1;
 }
 
-memset(data, 0, sizeof(data));
+memset(data, 0, sizeof(uint64_t) * TARGET_INSN_START_WORDS);
 if (!TARGET_TB_PCREL) {
 data[0] = tb_pc(tb);
 }
 
-/* Reconstruct the stored insn data while looking for the point at
-   which the end of the insn exceeds the searched_pc.  */
+/*
+ * Reconstruct the stored insn data while looking for the point
+ * at which the end of the insn exceeds host_pc.
+ */
 for (i = 0; i < num_insns; ++i) {
 for (j = 0; j < TARGET_INSN_START_WORDS; ++j) {
 data[j] += decode_sleb128(&p);
 }
-host_pc += decode_sleb128(&p);
-if (host_pc > searched_pc) {
-goto found;
+iter_pc += decode_sleb128(&p);
+if (iter_pc > host_pc) {
+return num_insns - i;
 }
 }
 return -1;
+}
+
+/*
+ * The cpu state corresponding to 'host_pc' is restored.
+ * When reset_

[PATCH v2 5/6] accel/tcg: Remove will_exit argument from cpu_restore_state

2022-10-27 Thread Richard Henderson
The value passed is always true, and if the target's
synchronize_from_tb hook is non-trivial, not exiting
may be erroneous.

Reviewed-by: Claudio Fontana 
Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h |  5 +
 accel/tcg/cpu-exec-common.c |  2 +-
 accel/tcg/translate-all.c   | 12 ++--
 target/alpha/helper.c   |  2 +-
 target/alpha/mem_helper.c   |  2 +-
 target/arm/op_helper.c  |  2 +-
 target/arm/tlb_helper.c |  8 
 target/cris/helper.c|  2 +-
 target/i386/tcg/sysemu/svm_helper.c |  2 +-
 target/m68k/op_helper.c |  4 ++--
 target/microblaze/helper.c  |  2 +-
 target/nios2/op_helper.c|  2 +-
 target/openrisc/sys_helper.c|  4 ++--
 target/ppc/excp_helper.c|  2 +-
 target/s390x/tcg/excp_helper.c  |  2 +-
 target/tricore/op_helper.c  |  2 +-
 target/xtensa/helper.c  |  6 +++---
 17 files changed, 25 insertions(+), 36 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 7d851f5907..9b7bfbf09a 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -56,16 +56,13 @@ bool cpu_unwind_state_data(CPUState *cpu, uintptr_t 
host_pc, uint64_t *data);
  * cpu_restore_state:
  * @cpu: the cpu context
  * @host_pc: the host pc within the translation
- * @will_exit: true if the TB executed will be interrupted after some
-   cpu adjustments. Required for maintaining the correct
-   icount valus
  * @return: true if state was restored, false otherwise
  *
  * Attempt to restore the state for a fault occurring in translated
  * code. If @host_pc is not in translated code no state is
  * restored and the function returns false.
  */
-bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc, bool will_exit);
+bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc);
 
 G_NORETURN void cpu_loop_exit_noexc(CPUState *cpu);
 G_NORETURN void cpu_loop_exit(CPUState *cpu);
diff --git a/accel/tcg/cpu-exec-common.c b/accel/tcg/cpu-exec-common.c
index be6fe45aa5..c7bc8c6efa 100644
--- a/accel/tcg/cpu-exec-common.c
+++ b/accel/tcg/cpu-exec-common.c
@@ -71,7 +71,7 @@ void cpu_loop_exit(CPUState *cpu)
 void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
 {
 if (pc) {
-cpu_restore_state(cpu, pc, true);
+cpu_restore_state(cpu, pc);
 }
 cpu_loop_exit(cpu);
 }
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 319becb698..90997fed47 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -318,16 +318,8 @@ void cpu_restore_state_from_tb(CPUState *cpu, 
TranslationBlock *tb,
 #endif
 }
 
-bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc, bool will_exit)
+bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc)
 {
-/*
- * The pc update associated with restore without exit will
- * break the relative pc adjustments performed by TARGET_TB_PCREL.
- */
-if (TARGET_TB_PCREL) {
-assert(will_exit);
-}
-
 /*
  * The host_pc has to be in the rx region of the code buffer.
  * If it is not we will not be able to resolve it here.
@@ -341,7 +333,7 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc, 
bool will_exit)
 if (in_code_gen_buffer((const void *)(host_pc - tcg_splitwx_diff))) {
 TranslationBlock *tb = tcg_tb_lookup(host_pc);
 if (tb) {
-cpu_restore_state_from_tb(cpu, tb, host_pc, will_exit);
+cpu_restore_state_from_tb(cpu, tb, host_pc, true);
 return true;
 }
 }
diff --git a/target/alpha/helper.c b/target/alpha/helper.c
index a5a389b5a3..970c869771 100644
--- a/target/alpha/helper.c
+++ b/target/alpha/helper.c
@@ -532,7 +532,7 @@ G_NORETURN void dynamic_excp(CPUAlphaState *env, uintptr_t 
retaddr,
 cs->exception_index = excp;
 env->error_code = error;
 if (retaddr) {
-cpu_restore_state(cs, retaddr, true);
+cpu_restore_state(cs, retaddr);
 /* Floating-point exceptions (our only users) point to the next PC.  */
 env->pc += 4;
 }
diff --git a/target/alpha/mem_helper.c b/target/alpha/mem_helper.c
index 47283a0612..a39b52c5dd 100644
--- a/target/alpha/mem_helper.c
+++ b/target/alpha/mem_helper.c
@@ -28,7 +28,7 @@ static void do_unaligned_access(CPUAlphaState *env, vaddr 
addr, uintptr_t retadd
 uint64_t pc;
 uint32_t insn;
 
-cpu_restore_state(env_cpu(env), retaddr, true);
+cpu_restore_state(env_cpu(env), retaddr);
 
 pc = env->pc;
 insn = cpu_ldl_code(env, pc);
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index c5bde1cfcc..70672bcd9f 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -78,7 +78,7 @@ void raise_exception_ra(CPUARMState *env, uint32_t excp, 
uint32_t syndrome,
  * we must restore CPU state here before setting the syndrome
  * the caller passed us, and cannot use cpu_loop_exit_

[PATCH v2 0/6] tcg: Fix x86 TARGET_TB_PCREL (#1269)

2022-10-27 Thread Richard Henderson
As per #1269, this affects NetBSD installer boot.

The problem is that one of the x86 acpi callbacks modifies
env->eip during an mmio store, which means that the tracking
that translate.c does is thrown out of whack.

Introduce a method to extract unwind data without the
writeback to env.  This isn't a perfect abstraction, but I
couldn't think of anything better.  There's a couple of lines
of code duplication, but probably less than any abstraction
that we might put on top.

Changes for v2:
  * Rebase on master, 23 patches merged.
  * Comments adjusted per review (claudio)


r~


Richard Henderson (6):
  accel/tcg: Introduce cpu_unwind_state_data
  target/i386: Use cpu_unwind_state_data for tpr access
  target/openrisc: Always exit after mtspr npc
  target/openrisc: Use cpu_unwind_state_data for mfspr
  accel/tcg: Remove will_exit argument from cpu_restore_state
  accel/tcg: Remove reset_icount argument from cpu_restore_state_from_tb

 accel/tcg/internal.h|  4 +-
 include/exec/exec-all.h | 24 +---
 accel/tcg/cpu-exec-common.c |  2 +-
 accel/tcg/tb-maint.c|  4 +-
 accel/tcg/translate-all.c   | 91 +
 target/alpha/helper.c   |  2 +-
 target/alpha/mem_helper.c   |  2 +-
 target/arm/op_helper.c  |  2 +-
 target/arm/tlb_helper.c |  8 +--
 target/cris/helper.c|  2 +-
 target/i386/helper.c| 21 ++-
 target/i386/tcg/sysemu/svm_helper.c |  2 +-
 target/m68k/op_helper.c |  4 +-
 target/microblaze/helper.c  |  2 +-
 target/nios2/op_helper.c|  2 +-
 target/openrisc/sys_helper.c| 17 --
 target/ppc/excp_helper.c|  2 +-
 target/s390x/tcg/excp_helper.c  |  2 +-
 target/tricore/op_helper.c  |  2 +-
 target/xtensa/helper.c  |  6 +-
 20 files changed, 125 insertions(+), 76 deletions(-)

-- 
2.34.1




[PATCH v3 6/8] hw/arm/virt: Enable HMAT on arm virt machine

2022-10-27 Thread Hesham Almatary via
From: Xiang Chen 

Since the patchset ("Build ACPI Heterogeneous Memory Attribute Table (HMAT)"),
HMAT is supported, but only x86 is enabled. Enable HMAT on arm virt machine.

Signed-off-by: Xiang Chen 
Signed-off-by: Hesham Almatary 
Reviewed-by: Igor Mammedov 
---
 hw/arm/Kconfig   | 1 +
 hw/arm/virt-acpi-build.c | 7 +++
 2 files changed, 8 insertions(+)

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 15fa79afd3..17fcde8e1c 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -30,6 +30,7 @@ config ARM_VIRT
 select ACPI_VIOT
 select VIRTIO_MEM_SUPPORTED
 select ACPI_CXL
+select ACPI_HMAT
 
 config CHEETAH
 bool
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 13c6e3e468..7f706f72bb 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -42,6 +42,7 @@
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/generic_event_device.h"
 #include "hw/acpi/tpm.h"
+#include "hw/acpi/hmat.h"
 #include "hw/pci/pcie_host.h"
 #include "hw/pci/pci.h"
 #include "hw/pci/pci_bus.h"
@@ -989,6 +990,12 @@ void virt_acpi_build(VirtMachineState *vms, 
AcpiBuildTables *tables)
 build_slit(tables_blob, tables->linker, ms, vms->oem_id,
vms->oem_table_id);
 }
+
+if (ms->numa_state->hmat_enabled) {
+acpi_add_table(table_offsets, tables_blob);
+build_hmat(tables_blob, tables->linker, ms->numa_state,
+   vms->oem_id, vms->oem_table_id);
+}
 }
 
 if (ms->nvdimms_state->is_enabled) {
-- 
2.25.1




[PATCH v3 8/8] tests: virt: Update expected *.acpihmatvirt tables

2022-10-27 Thread Hesham Almatary via
* Expected ACPI Data Table [HMAT]
[000h    4]Signature : "HMAT"[Heterogeneous
Memory Attributes Table]
[004h 0004   4] Table Length : 0120
[008h 0008   1] Revision : 02
[009h 0009   1] Checksum : 4F
[00Ah 0010   6]   Oem ID : "BOCHS "
[010h 0016   8] Oem Table ID : "BXPC"
[018h 0024   4] Oem Revision : 0001
[01Ch 0028   4]  Asl Compiler ID : "BXPC"
[020h 0032   4]Asl Compiler Revision : 0001

[024h 0036   4] Reserved : 

[028h 0040   2]   Structure Type :  [Memory Proximity
Domain Attributes]
[02Ah 0042   2] Reserved : 
[02Ch 0044   4]   Length : 0028
[030h 0048   2]Flags (decoded below) : 0001
Processor Proximity Domain Valid : 1
[032h 0050   2]Reserved1 : 
[034h 0052   4]   Processor Proximity Domain : 
[038h 0056   4]  Memory Proximity Domain : 
[03Ch 0060   4]Reserved2 : 
[040h 0064   8]Reserved3 : 
[048h 0072   8]Reserved4 : 

[050h 0080   2]   Structure Type :  [Memory Proximity
Domain Attributes]
[052h 0082   2] Reserved : 
[054h 0084   4]   Length : 0028
[058h 0088   2]Flags (decoded below) : 0001
Processor Proximity Domain Valid : 1
[05Ah 0090   2]Reserved1 : 
[05Ch 0092   4]   Processor Proximity Domain : 0001
[060h 0096   4]  Memory Proximity Domain : 0001
[064h 0100   4]Reserved2 : 
[068h 0104   8]Reserved3 : 
[070h 0112   8]Reserved4 : 

[078h 0120   2]   Structure Type :  [Memory Proximity
Domain Attributes]
[07Ah 0122   2] Reserved : 
[07Ch 0124   4]   Length : 0028
[080h 0128   2]Flags (decoded below) : 
Processor Proximity Domain Valid : 0
[082h 0130   2]Reserved1 : 
[084h 0132   4]   Processor Proximity Domain : 0080
[088h 0136   4]  Memory Proximity Domain : 0002
[08Ch 0140   4]Reserved2 : 
[090h 0144   8]Reserved3 : 
[098h 0152   8]Reserved4 : 

[0A0h 0160   2]   Structure Type : 0001 [System Locality
Latency and Bandwidth Information]
[0A2h 0162   2] Reserved : 
[0A4h 0164   4]   Length : 0040
[0A8h 0168   1]Flags (decoded below) : 00
Memory Hierarchy : 0
[0A9h 0169   1]Data Type : 00
[0AAh 0170   2]Reserved1 : 
[0ACh 0172   4] Initiator Proximity Domains # : 0002
[0B0h 0176   4]   Target Proximity Domains # : 0003
[0B4h 0180   4]Reserved2 : 
[0B8h 0184   8]  Entry Base Unit : 2710
[0C0h 0192   4] Initiator Proximity Domain List : 
[0C4h 0196   4] Initiator Proximity Domain List : 0001
[0C8h 0200   4] Target Proximity Domain List : 
[0CCh 0204   4] Target Proximity Domain List : 0001
[0D0h 0208   4] Target Proximity Domain List : 0002
[0D4h 0212   2]Entry : 0001
[0D6h 0214   2]Entry : 0002
[0D8h 0216   2]Entry : 0003
[0DAh 0218   2]Entry : 0002

[PATCH v2 2/6] target/i386: Use cpu_unwind_state_data for tpr access

2022-10-27 Thread Richard Henderson
Avoid cpu_restore_state, and modifying env->eip out from
underneath the translator with TARGET_TB_PCREL.  There is
some slight duplication from x86_restore_state_to_opc,
but it's just a few lines.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1269
Signed-off-by: Richard Henderson 
---
 target/i386/helper.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/target/i386/helper.c b/target/i386/helper.c
index b62a1e48e2..2cd1756f1a 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -509,6 +509,23 @@ void cpu_x86_inject_mce(Monitor *mon, X86CPU *cpu, int 
bank,
 }
 }
 
+static target_ulong get_memio_eip(CPUX86State *env)
+{
+uint64_t data[TARGET_INSN_START_WORDS];
+CPUState *cs = env_cpu(env);
+
+if (!cpu_unwind_state_data(cs, cs->mem_io_pc, data)) {
+return env->eip;
+}
+
+/* Per x86_restore_state_to_opc. */
+if (TARGET_TB_PCREL) {
+return (env->eip & TARGET_PAGE_MASK) | data[0];
+} else {
+return data[0] - env->segs[R_CS].base;
+}
+}
+
 void cpu_report_tpr_access(CPUX86State *env, TPRAccess access)
 {
 X86CPU *cpu = env_archcpu(env);
@@ -519,9 +536,9 @@ void cpu_report_tpr_access(CPUX86State *env, TPRAccess 
access)
 
 cpu_interrupt(cs, CPU_INTERRUPT_TPR);
 } else if (tcg_enabled()) {
-cpu_restore_state(cs, cs->mem_io_pc, false);
+target_ulong eip = get_memio_eip(env);
 
-apic_handle_tpr_access_report(cpu->apic_state, env->eip, access);
+apic_handle_tpr_access_report(cpu->apic_state, eip, access);
 }
 }
 #endif /* !CONFIG_USER_ONLY */
-- 
2.34.1




[PATCH v3 4/8] tests: acpi: q35: update expected blobs *.hmat-noinitiators expected HMAT:

2022-10-27 Thread Hesham Almatary via
From: Brice Goglin 

[000h    4]Signature : "HMAT"[Heterogeneous Memory 
Attributes Table]
[004h 0004   4] Table Length : 0120
[008h 0008   1] Revision : 02
[009h 0009   1] Checksum : 4F
[00Ah 0010   6]   Oem ID : "BOCHS "
[010h 0016   8] Oem Table ID : "BXPC"
[018h 0024   4] Oem Revision : 0001
[01Ch 0028   4]  Asl Compiler ID : "BXPC"
[020h 0032   4]Asl Compiler Revision : 0001

[024h 0036   4] Reserved : 

[028h 0040   2]   Structure Type :  [Memory Proximity Domain 
Attributes]
[02Ah 0042   2] Reserved : 
[02Ch 0044   4]   Length : 0028
[030h 0048   2]Flags (decoded below) : 0001
Processor Proximity Domain Valid : 1
[032h 0050   2]Reserved1 : 
[034h 0052   4] Attached Initiator Proximity Domain : 
[038h 0056   4]  Memory Proximity Domain : 
[03Ch 0060   4]Reserved2 : 
[040h 0064   8]Reserved3 : 
[048h 0072   8]Reserved4 : 

[050h 0080   2]   Structure Type :  [Memory Proximity Domain 
Attributes]
[052h 0082   2] Reserved : 
[054h 0084   4]   Length : 0028
[058h 0088   2]Flags (decoded below) : 0001
Processor Proximity Domain Valid : 1
[05Ah 0090   2]Reserved1 : 
[05Ch 0092   4] Attached Initiator Proximity Domain : 0001
[060h 0096   4]  Memory Proximity Domain : 0001
[064h 0100   4]Reserved2 : 
[068h 0104   8]Reserved3 : 
[070h 0112   8]Reserved4 : 

[078h 0120   2]   Structure Type :  [Memory Proximity Domain 
Attributes]
[07Ah 0122   2] Reserved : 
[07Ch 0124   4]   Length : 0028
[080h 0128   2]Flags (decoded below) : 
Processor Proximity Domain Valid : 0
[082h 0130   2]Reserved1 : 
[084h 0132   4] Attached Initiator Proximity Domain : 0080
[088h 0136   4]  Memory Proximity Domain : 0002
[08Ch 0140   4]Reserved2 : 
[090h 0144   8]Reserved3 : 
[098h 0152   8]Reserved4 : 

[0A0h 0160   2]   Structure Type : 0001 [System Locality Latency 
and Bandwidth Information]
[0A2h 0162   2] Reserved : 
[0A4h 0164   4]   Length : 0040
[0A8h 0168   1]Flags (decoded below) : 00
Memory Hierarchy : 0
[0A9h 0169   1]Data Type : 00
[0AAh 0170   2]Reserved1 : 
[0ACh 0172   4] Initiator Proximity Domains # : 0002
[0B0h 0176   4]   Target Proximity Domains # : 0003
[0B4h 0180   4]Reserved2 : 
[0B8h 0184   8]  Entry Base Unit : 2710
[0C0h 0192   4] Initiator Proximity Domain List : 
[0C4h 0196   4] Initiator Proximity Domain List : 0001
[0C8h 0200   4] Target Proximity Domain List : 
[0CCh 0204   4] Target Proximity Domain List : 0001
[0D0h 0208   4] Target Proximity Domain List : 0002
[0D4h 0212   2]Entry : 0001
[0D6h 0214   2]Entry : 0002
[0D8h 0216   2]Entry : 0003
[0DAh 0218   2]Entry : 0002
[0DCh 0220   2]Entry : 0001
[0DEh 0222   2]Entry : 0003

[0E0h 0224   2]   Structure Type : 0001 [System Locality Latency 
and Bandwidth Information]
[0E2h 0226   2] Reserved : 
[0E4h 0228   4]   Length : 0040
[0E8h 0232   1]Flags (decoded below) : 00
Memory Hierarchy : 0
[0E9h 0233   1]Data Type : 03
[0EAh 0234   2]Reserved1 : 
[0ECh 0236   4] Initiator Proximity Domains # : 0002
[0F0h 0240   4]   Target Proximity Domains # : 0003
[0F4h 0244   4]Reserved2 : 
[0F8h 0248   8]  Entry Base Unit : 0001
[100h 0256   4] Initiator Proximity Domain List : 
[104h 0260   4] Initiator Proximity Domain List : 0001
[108h 0264   4] Target Proximity Domain List : 
[10Ch 0268   4] Target Proximity Domain List : 0001
[110h 0272   4] Target Proximity Domain List : 0002
[114h 0276   2]Entry : 000A
[116h 0278   2]Entry : 0005
[118h 0280   2]Entry : 0001
[11Ah 0282   2]Entry : 0005
[11Ch 0284   2]   

[PATCH v3 2/8] tests: acpi: add and whitelist *.hmat-noinitiator expected blobs

2022-10-27 Thread Hesham Almatary via
From: Brice Goglin 

.. which will be used by the follow-up hmat-noinitiator test case.

Signed-off-by: Brice Goglin 
Signed-off-by: Hesham Almatary 
---
 tests/data/acpi/q35/APIC.acpihmat-noinitiator | 0
 tests/data/acpi/q35/DSDT.acpihmat-noinitiator | 0
 tests/data/acpi/q35/HMAT.acpihmat-noinitiator | 0
 tests/data/acpi/q35/SRAT.acpihmat-noinitiator | 0
 tests/qtest/bios-tables-test-allowed-diff.h   | 4 
 5 files changed, 4 insertions(+)
 create mode 100644 tests/data/acpi/q35/APIC.acpihmat-noinitiator
 create mode 100644 tests/data/acpi/q35/DSDT.acpihmat-noinitiator
 create mode 100644 tests/data/acpi/q35/HMAT.acpihmat-noinitiator
 create mode 100644 tests/data/acpi/q35/SRAT.acpihmat-noinitiator

diff --git a/tests/data/acpi/q35/APIC.acpihmat-noinitiator 
b/tests/data/acpi/q35/APIC.acpihmat-noinitiator
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/q35/DSDT.acpihmat-noinitiator 
b/tests/data/acpi/q35/DSDT.acpihmat-noinitiator
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/q35/HMAT.acpihmat-noinitiator 
b/tests/data/acpi/q35/HMAT.acpihmat-noinitiator
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/q35/SRAT.acpihmat-noinitiator 
b/tests/data/acpi/q35/SRAT.acpihmat-noinitiator
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..245fa66bcc 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,5 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/q35/APIC.acpihmat-noinitiator",
+"tests/data/acpi/q35/DSDT.acpihmat-noinitiator",
+"tests/data/acpi/q35/HMAT.acpihmat-noinitiator",
+"tests/data/acpi/q35/SRAT.acpihmat-noinitiator",
-- 
2.25.1




[PATCH v3 5/8] tests: Add HMAT AArch64/virt empty table files

2022-10-27 Thread Hesham Almatary via
Signed-off-by: Hesham Almatary 
---
 tests/data/acpi/virt/APIC.acpihmatvirt  | 0
 tests/data/acpi/virt/DSDT.acpihmatvirt  | 0
 tests/data/acpi/virt/HMAT.acpihmatvirt  | 0
 tests/data/acpi/virt/PPTT.acpihmatvirt  | 0
 tests/data/acpi/virt/SRAT.acpihmatvirt  | 0
 tests/qtest/bios-tables-test-allowed-diff.h | 5 +
 6 files changed, 5 insertions(+)
 create mode 100644 tests/data/acpi/virt/APIC.acpihmatvirt
 create mode 100644 tests/data/acpi/virt/DSDT.acpihmatvirt
 create mode 100644 tests/data/acpi/virt/HMAT.acpihmatvirt
 create mode 100644 tests/data/acpi/virt/PPTT.acpihmatvirt
 create mode 100644 tests/data/acpi/virt/SRAT.acpihmatvirt

diff --git a/tests/data/acpi/virt/APIC.acpihmatvirt 
b/tests/data/acpi/virt/APIC.acpihmatvirt
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/virt/DSDT.acpihmatvirt 
b/tests/data/acpi/virt/DSDT.acpihmatvirt
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/virt/HMAT.acpihmatvirt 
b/tests/data/acpi/virt/HMAT.acpihmatvirt
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/virt/PPTT.acpihmatvirt 
b/tests/data/acpi/virt/PPTT.acpihmatvirt
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/virt/SRAT.acpihmatvirt 
b/tests/data/acpi/virt/SRAT.acpihmatvirt
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..4f849715bd 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,6 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/virt/APIC.acpihmatvirt",
+"tests/data/acpi/virt/DSDT.acpihmatvirt",
+"tests/data/acpi/virt/HMAT.acpihmatvirt",
+"tests/data/acpi/virt/PPTT.acpihmatvirt",
+"tests/data/acpi/virt/SRAT.acpihmatvirt",
-- 
2.25.1




[PATCH v2 4/6] target/openrisc: Use cpu_unwind_state_data for mfspr

2022-10-27 Thread Richard Henderson
Since we do not plan to exit, use cpu_unwind_state_data
and extract exactly the data requested.

Signed-off-by: Richard Henderson 
---
 target/openrisc/sys_helper.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/target/openrisc/sys_helper.c b/target/openrisc/sys_helper.c
index a3508e421d..dde2fa1623 100644
--- a/target/openrisc/sys_helper.c
+++ b/target/openrisc/sys_helper.c
@@ -199,6 +199,7 @@ target_ulong HELPER(mfspr)(CPUOpenRISCState *env, 
target_ulong rd,
target_ulong spr)
 {
 #ifndef CONFIG_USER_ONLY
+uint64_t data[TARGET_INSN_START_WORDS];
 MachineState *ms = MACHINE(qdev_get_machine());
 OpenRISCCPU *cpu = env_archcpu(env);
 CPUState *cs = env_cpu(env);
@@ -232,14 +233,20 @@ target_ulong HELPER(mfspr)(CPUOpenRISCState *env, 
target_ulong rd,
 return env->evbar;
 
 case TO_SPR(0, 16): /* NPC (equals PC) */
-cpu_restore_state(cs, GETPC(), false);
+if (cpu_unwind_state_data(cs, GETPC(), data)) {
+return data[0];
+}
 return env->pc;
 
 case TO_SPR(0, 17): /* SR */
 return cpu_get_sr(env);
 
 case TO_SPR(0, 18): /* PPC */
-cpu_restore_state(cs, GETPC(), false);
+if (cpu_unwind_state_data(cs, GETPC(), data)) {
+if (data[1] & 2) {
+return data[0] - 4;
+}
+}
 return env->ppc;
 
 case TO_SPR(0, 32): /* EPCR */
-- 
2.34.1




[PATCH v3 7/8] tests: acpi: aarch64/virt: add a test for hmat nodes with no initiators

2022-10-27 Thread Hesham Almatary via
This patch imitates the "tests: acpi: q35: add test for hmat nodes
without initiators" commit to test numa nodes with different HMAT
attributes, but on AArch64/virt.

Tested with:
qemu-system-aarch64 -accel tcg \
-machine virt,hmat=on,gic-version=3  -cpu cortex-a57 \
-bios qemu-efi-aarch64/QEMU_EFI.fd \
-kernel Image -append "root=/dev/vda2 console=ttyAMA0" \
-drive if=virtio,file=aarch64.qcow2,format=qcow2,id=hd \
-device virtio-rng-pci \
-net user,hostfwd=tcp::10022-:22 -net nic \
-device intel-hda -device hda-duplex -nographic \
-smp 4 \
-m 3G \
-object memory-backend-ram,size=1G,id=ram0 \
-object memory-backend-ram,size=1G,id=ram1 \
-object memory-backend-ram,size=1G,id=ram2 \
-numa node,nodeid=0,memdev=ram0,cpus=0-1 \
-numa node,nodeid=1,memdev=ram1,cpus=2-3 \
-numa node,nodeid=2,memdev=ram2 \
-numa
hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=10
 \
-numa 
hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=10485760
 \
-numa 
hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=20
 \
-numa 
hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=5242880
 \
-numa 
hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-latency,latency=30
 \
-numa 
hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=1048576
 \
-numa 
hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-latency,latency=20
 \
-numa 
hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=5242880
 \
-numa 
hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-latency,latency=10
 \
-numa 
hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=10485760
 \
-numa 
hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-latency,latency=30
 \
-numa 
hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=1048576

Signed-off-by: Hesham Almatary 
---
 tests/qtest/bios-tables-test.c | 59 ++
 1 file changed, 59 insertions(+)

diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index 02fe59fbf8..e805b3efec 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -1461,6 +1461,63 @@ static void test_acpi_piix4_tcg_acpi_hmat(void)
 test_acpi_tcg_acpi_hmat(MACHINE_PC);
 }
 
+static void test_acpi_virt_tcg_acpi_hmat(void)
+{
+test_data data = {
+.machine = "virt",
+.tcg_only = true,
+.uefi_fl1 = "pc-bios/edk2-aarch64-code.fd",
+.uefi_fl2 = "pc-bios/edk2-arm-vars.fd",
+.cd = "tests/data/uefi-boot-images/bios-tables-test.aarch64.iso.qcow2",
+.ram_start = 0x4000ULL,
+.scan_len = 128ULL * 1024 * 1024,
+};
+
+data.variant = ".acpihmatvirt";
+
+test_acpi_one(" -machine hmat=on"
+  " -cpu cortex-a57"
+  " -smp 4,sockets=2"
+  " -m 256M"
+  " -object memory-backend-ram,size=64M,id=ram0"
+  " -object memory-backend-ram,size=64M,id=ram1"
+  " -object memory-backend-ram,size=128M,id=ram2"
+  " -numa node,nodeid=0,memdev=ram0"
+  " -numa node,nodeid=1,memdev=ram1"
+  " -numa node,nodeid=2,memdev=ram2"
+  " -numa cpu,node-id=0,socket-id=0"
+  " -numa cpu,node-id=0,socket-id=0"
+  " -numa cpu,node-id=1,socket-id=1"
+  " -numa cpu,node-id=1,socket-id=1"
+  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+  "data-type=access-latency,latency=10"
+  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+  "data-type=access-bandwidth,bandwidth=10485760"
+  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
+  "data-type=access-latency,latency=20"
+  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
+  "data-type=access-bandwidth,bandwidth=5242880"
+  " -numa hmat-lb,initiator=0,target=2,hierarchy=memory,"
+  "data-type=access-latency,latency=30"
+  " -numa hmat-lb,initiator=0,target=2,hierarchy=memory,"
+  "data-type=access-bandwidth,bandwidth=1048576"
+  " -numa hmat-lb,initiator=1,target=0,hierarchy=memory,"
+  "data-type=access-latency,latency=20"
+  " -numa hmat-lb,initiator=1,target=0,hierarchy=memory,"
+  "data-type=access-bandwidth,bandwidth=5242880"
+  " -numa hmat-lb,initiator=1,target=1,hierarchy=memory,"
+  "data-type=access-latency,latency=10"
+  " -numa hmat-lb,initiator=1,target=1,hierarchy=memory,"
+  "data-type=access-bandwidth,bandwidth=10485760"
+  " -numa hmat-lb,initiator=1,targe

Re: [PATCH v10 7/9] s390x/cpu topology: add max_threads machine class attribute

2022-10-27 Thread Cédric Le Goater

Hello Pierre,

On 10/12/22 18:21, Pierre Morel wrote:

The S390 CPU topology accepts the smp.threads argument while
in reality it does not effectively allow multithreading.

Let's keep this behavior for machines older than 7.3 and
refuse to use threads in newer machines until multithreading
is really proposed to the guest by the machine.


This change is unrelated to the rest of the series and we could merge it
for 7.2. We still have time for it.

Thanks,

C.

 

Signed-off-by: Pierre Morel 
---
  include/hw/s390x/s390-virtio-ccw.h |  1 +
  hw/s390x/s390-virtio-ccw.c | 10 ++
  2 files changed, 11 insertions(+)

diff --git a/include/hw/s390x/s390-virtio-ccw.h 
b/include/hw/s390x/s390-virtio-ccw.h
index 6c4b4645fc..319dfac1bb 100644
--- a/include/hw/s390x/s390-virtio-ccw.h
+++ b/include/hw/s390x/s390-virtio-ccw.h
@@ -48,6 +48,7 @@ struct S390CcwMachineClass {
  bool css_migration_enabled;
  bool hpage_1m_allowed;
  bool topology_allowed;
+int max_threads;
  };
  
  /* runtime-instrumentation allowed by the machine */

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 3a13fad4df..d6ce31d168 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -85,8 +85,15 @@ out:
  static void s390_init_cpus(MachineState *machine)
  {
  MachineClass *mc = MACHINE_GET_CLASS(machine);
+S390CcwMachineClass *s390mc = S390_CCW_MACHINE_CLASS(mc);
  int i;
  
+if (machine->smp.threads > s390mc->max_threads) {

+error_report("S390 does not support more than %d threads.",
+ s390mc->max_threads);
+exit(1);
+}
+
  /* initialize possible_cpus */
  mc->possible_cpu_arch_ids(machine);
  
@@ -617,6 +624,7 @@ static void ccw_machine_class_init(ObjectClass *oc, void *data)

  s390mc->css_migration_enabled = true;
  s390mc->hpage_1m_allowed = true;
  s390mc->topology_allowed = true;
+s390mc->max_threads = 1;
  mc->init = ccw_init;
  mc->reset = s390_machine_reset;
  mc->block_default_type = IF_VIRTIO;
@@ -887,12 +895,14 @@ static void ccw_machine_7_2_class_options(MachineClass 
*mc)
  S390CcwMachineClass *s390mc = S390_CCW_MACHINE_CLASS(mc);
  static GlobalProperty compat[] = {
  { TYPE_S390_CPU_TOPOLOGY, "topology-allowed", "off", },
+{ TYPE_S390_CPU_TOPOLOGY, "max_threads", "off", },
  };
  
  ccw_machine_7_3_class_options(mc);

  compat_props_add(mc->compat_props, hw_compat_7_2, hw_compat_7_2_len);
  compat_props_add(mc->compat_props, compat, G_N_ELEMENTS(compat));
  s390mc->topology_allowed = false;
+s390mc->max_threads = S390_MAX_CPUS;
  }
  DEFINE_CCW_MACHINE(7_2, "7.2", false);
  





Re: [PATCH] target/hppa: Fix fid instruction emulation

2022-10-27 Thread Helge Deller

On 10/27/22 11:48, Richard Henderson wrote:

On 10/27/22 16:31, Helge Deller wrote:

The fid instruction (Floating-Point Identify) puts the FPU model and
revision into the Status Register. Since those values shouldn't be 0,
store values there which a PCX-L2 (for 32-bit) or a PCX-W2 (for 64-bit)
would return.

Signed-off-by: Helge Deller 

diff --git a/target/hppa/insns.decode b/target/hppa/insns.decode
index c7a7e997f9..3ba5f9885a 100644
--- a/target/hppa/insns.decode
+++ b/target/hppa/insns.decode
@@ -388,10 +388,8 @@ fmpyfadd_d  101110 rm1:5 rm2:5 ... 0 1 ..0 0 0 neg:1 
t:5    ra3=%rc32

  # Floating point class 0

-# FID.  With r = t = 0, which via fcpy puts 0 into fr0.
-# This is machine/revision = 0, which is reserved for simulator.


Is there something in particular for which this is failing?
Per the manual, 0 means simulator, which we are.


I can't say yet if it's really failing.
I noticed it while trying to get MPE/iX installed in a hppa guest.
In some doc (sorry, I don't know which one right now) I saw that 0/0
values were illegal, which is why I changed the values to
those of a PA7300LC CPU from a B160L machine (which
we currently emulate with the hppa SeaBIOS).


So far we haven't identified as a particular cpu, have we?


Not really, but as just mentioned the SeaBIOS reports back a B160L.
If we support more machines this needs to be adjusted.


+static bool trans_fid_f(DisasContext *ctx, arg_fid_f *a)
+{
+    nullify_over(ctx);
+#if TARGET_REGISTER_BITS == 64
+    save_frd(0, tcg_const_i64(0x130800)); /* PA8700 (PCX-W2) */
+#else
+    save_frd(0, tcg_const_i64(0x0f0800)); /* PA7300LC (PCX-L2) */
+#endif
+    return nullify_end(ctx);
+}


Missing ULL suffix.


Will fix.

Helge



[PATCH] vl: change PID file path resolve error to warning

2022-10-27 Thread Fiona Ebner
Commit 85c4bf8aa6 ("vl: Unlink absolute PID file path") made it a
critical error when the PID file path cannot be resolved. Before this
commit, it was possible to invoke QEMU when the PID file was a file
created with mkstemp that was already unlinked at the time of the
invocation. There might be other similar scenarios.

It should not be a critical error when the PID file unlink notifier
can't be registered, because the path can't be resolved. Turn it into
a warning instead.

Fixes: 85c4bf8aa6 ("vl: Unlink absolute PID file path")
Reported-by: Dominik Csapak 
Suggested-by: Thomas Lamprecht 
Signed-off-by: Fiona Ebner 
---

For completeness, here is a reproducer based on our actual invocation
written in Rust (depends on the "nix" crate). It works fine with QEMU
7.0, but not anymore with 7.1.

use std::fs::File;
use std::io::Read;
use std::os::unix::io::{AsRawFd, FromRawFd};
use std::path::{Path, PathBuf};
use std::process::Command;

fn make_tmp_file<P: AsRef<Path>>(path: P) -> (File, PathBuf) {
let path = path.as_ref();

let mut template = path.to_owned();
template.set_extension("tmp_XX");
match nix::unistd::mkstemp(&template) {
Ok((fd, path)) => (unsafe { File::from_raw_fd(fd) }, path),
Err(err) => panic!("mkstemp {:?} failed: {}", template, err),
}
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
let (mut pidfile, pid_path) = make_tmp_file("/tmp/unlinked.pid.tmp");
nix::unistd::unlink(&pid_path)?;

let mut qemu_cmd = Command::new("./qemu-system-x86_64");
qemu_cmd.args([
"-daemonize",
"-pidfile",
&format!("/dev/fd/{}", pidfile.as_raw_fd()),
]);

let res = qemu_cmd.spawn()?.wait_with_output()?;

if res.status.success() {
let mut pidstr = String::new();
pidfile.read_to_string(&mut pidstr)?;
println!("got PID {}", pidstr);
} else {
panic!("QEMU command unsuccessful");
}
Ok(())
}

 softmmu/vl.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/softmmu/vl.c b/softmmu/vl.c
index b464da25bc..10dfe773a7 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2432,10 +2432,9 @@ static void qemu_maybe_daemonize(const char *pid_file)
 
 pid_file_realpath = g_malloc0(PATH_MAX);
 if (!realpath(pid_file, pid_file_realpath)) {
-error_report("cannot resolve PID file path: %s: %s",
- pid_file, strerror(errno));
-unlink(pid_file);
-exit(1);
+warn_report("not removing PID file on exit: cannot resolve PID file"
+" path: %s: %s", pid_file, strerror(errno));
+return;
 }
 
 qemu_unlink_pidfile_notifier = (struct UnlinkPidfileNotifier) {
-- 
2.30.2





Re: [PATCH 3/3] vdpa: Expose VIRTIO_NET_F_STATUS unconditionally

2022-10-27 Thread Eugenio Perez Martin
On Thu, Oct 27, 2022 at 8:54 AM Jason Wang  wrote:
>
> On Thu, Oct 27, 2022 at 2:47 PM Eugenio Perez Martin
>  wrote:
> >
> > On Thu, Oct 27, 2022 at 6:32 AM Jason Wang  wrote:
> > >
> > >
> > > 在 2022/10/26 17:53, Eugenio Pérez 写道:
> > > > Now that qemu can handle and emulate it if the vdpa backend does not
> > > > support it we can offer it always.
> > > >
> > > > Signed-off-by: Eugenio Pérez 
> > >
> > >
> > > I may be missing something, but isn't it easier to simply remove the
> > > _F_STATUS from vdpa_feature_bits[]?
> > >
> >
> > How is that? if we remove it, the guest cannot ack it so it cannot
> > access the net status, isn't it?
>
> My understanding is that the bits stored in the vdpa_feature_bits[]
> are the features that must be explicitly supported by the vhost
> device.

(Non English native here, so maybe I don't get what you mean :) ) The
device may not support them. net simulator lacks some of them
actually, and it works.

From what I see these are the only features that will be forwarded to
the guest as device_features. If it is not in the list, the feature
will be masked out, as if the device does not support it.

So now _F_STATUS it was forwarded only if the device supports it. If
we remove it from bit_mask, it will never be offered to the guest. But
we want to offer it always, since we will need it for
_F_GUEST_ANNOUNCE.

Things get more complex because we actually need to ack it back if the
device offers it, so the vdpa device can report link_down. We will
only emulate LINK_UP always in the case the device does not support
_F_STATUS.

> So if we remove _F_STATUS, Qemu vhost code won't validate if
> vhost-vdpa device has this support:
>
> uint64_t vhost_get_features(struct vhost_dev *hdev, const int *feature_bits,
> uint64_t features)
> {
> const int *bit = feature_bits;
> while (*bit != VHOST_INVALID_FEATURE_BIT) {
> uint64_t bit_mask = (1ULL << *bit);
> if (!(hdev->features & bit_mask)) {
> features &= ~bit_mask;
> }
> bit++;
> }
> return features;
> }
>

Now maybe I'm the one missing something, but why is this not done as a
masking directly?

Instead of making feature_bits an array of ints, to declare it as a
uint64_t with the valid feature bits and simply return features &
feature_bits.
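
A rough sketch of that alternative, and the one subtlety it would have to preserve (illustrative names only, not the actual QEMU API):

```c
#include <assert.h>
#include <stdint.h>

#define FEATURE_BIT_STATUS  16
#define FEATURE_BIT_MQ      22

/* Current style: walk an array of feature bit numbers, terminated by an
 * invalid sentinel, clearing each listed bit the device lacks. */
static uint64_t mask_by_array(uint64_t dev_features, uint64_t features,
                              const int *bits, int invalid)
{
    for (const int *bit = bits; *bit != invalid; bit++) {
        uint64_t bit_mask = 1ULL << *bit;
        if (!(dev_features & bit_mask)) {
            features &= ~bit_mask;
        }
    }
    return features;
}

/* Suggested style: one precomputed uint64_t of valid feature bits. */
static uint64_t mask_by_bitmap(uint64_t dev_features, uint64_t features,
                               uint64_t valid_mask)
{
    /* Bits outside valid_mask pass through untouched in the array
     * version, so they must pass through here as well; a plain
     * "features & valid_mask" would change behavior for them. */
    return features & (dev_features | ~valid_mask);
}
```

So the direct masking works, but it is not quite `features & feature_bits`: unlisted bits are left alone today, and the `| ~valid_mask` term is what keeps that behavior.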

Thanks!

> Thanks
>
>
>
> >
> > The goal with this patch series is to let the guest access the status
> > always, even if the device doesn't support _F_STATUS.
> >
> > > Thanks
> > >
> > >
> > > > ---
> > > >   include/net/vhost-vdpa.h |  1 +
> > > >   hw/net/vhost_net.c   | 16 ++--
> > > >   net/vhost-vdpa.c |  3 +++
> > > >   3 files changed, 18 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/include/net/vhost-vdpa.h b/include/net/vhost-vdpa.h
> > > > index b81f9a6f2a..cfbcce6427 100644
> > > > --- a/include/net/vhost-vdpa.h
> > > > +++ b/include/net/vhost-vdpa.h
> > > > @@ -17,5 +17,6 @@
> > > >   struct vhost_net *vhost_vdpa_get_vhost_net(NetClientState *nc);
> > > >
> > > >   extern const int vdpa_feature_bits[];
> > > > +extern const uint64_t vhost_vdpa_net_added_feature_bits;
> > > >
> > > >   #endif /* VHOST_VDPA_H */
> > > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > > > index d28f8b974b..7c15cc6e8f 100644
> > > > --- a/hw/net/vhost_net.c
> > > > +++ b/hw/net/vhost_net.c
> > > > @@ -109,10 +109,22 @@ static const int 
> > > > *vhost_net_get_feature_bits(struct vhost_net *net)
> > > >   return feature_bits;
> > > >   }
> > > >
> > > > +static uint64_t vhost_net_add_feature_bits(struct vhost_net *net)
> > > > +{
> > > > +if (net->nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> > > > +return vhost_vdpa_net_added_feature_bits;
> > > > +}
> > > > +
> > > > +return 0;
> > > > +}
> > > > +
> > > >   uint64_t vhost_net_get_features(struct vhost_net *net, uint64_t 
> > > > features)
> > > >   {
> > > > -return vhost_get_features(&net->dev, 
> > > > vhost_net_get_feature_bits(net),
> > > > -features);
> > > > +uint64_t ret = vhost_get_features(&net->dev,
> > > > +  vhost_net_get_feature_bits(net),
> > > > +  features);
> > > > +
> > > > +return ret | vhost_net_add_feature_bits(net);
> > > >   }
> > > >   int vhost_net_get_config(struct vhost_net *net,  uint8_t *config,
> > > >uint32_t config_len)
> > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > > index 6d64000202..24d2857593 100644
> > > > --- a/net/vhost-vdpa.c
> > > > +++ b/net/vhost-vdpa.c
> > > > @@ -99,6 +99,9 @@ static const uint64_t vdpa_svq_device_features =
> > > >   BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
> > > >   BIT_ULL(VIRTIO_NET_F_STANDBY);
> > > >
> > > > +const uint64_t vhost_vdpa_net_added_feature_bits =
> > > > +BIT_ULL(VIRTIO_NET_F_STATUS);
> > > > +
> > > >   VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
> > > >   {
> >

Re: [PATCH 22/24] accel/tcg: Use interval tree for user-only page tracking

2022-10-27 Thread Richard Henderson

On 10/26/22 23:36, Alex Bennée wrote:


Richard Henderson  writes:


Finish weaning user-only away from PageDesc.

Using an interval tree to track page permissions means that
we can represent very large regions efficiently.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/290
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/967
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1214
Signed-off-by: Richard Henderson 
---
  accel/tcg/internal.h   |   4 +-
  accel/tcg/tb-maint.c   |  20 +-
  accel/tcg/user-exec.c  | 614 ++---
  tests/tcg/multiarch/test-vma.c |  22 ++
  4 files changed, 451 insertions(+), 209 deletions(-)
  create mode 100644 tests/tcg/multiarch/test-vma.c

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 250f0daac9..c7e157d1cd 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -24,9 +24,7 @@
  #endif
  
  typedef struct PageDesc {

-#ifdef CONFIG_USER_ONLY
-unsigned long flags;
-#else
+#ifndef CONFIG_USER_ONLY
  QemuSpin lock;
  /* list of TBs intersecting this ram page */
  uintptr_t first_tb;
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 14e8e47a6a..694440cb4a 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -68,15 +68,23 @@ static void page_flush_tb(void)


  
  int page_get_flags(target_ulong address)

  {
-PageDesc *p;
+PageFlagsNode *p = pageflags_find(address, address);
  
-p = page_find(address >> TARGET_PAGE_BITS);

+/*
+ * See util/interval-tree.c re lockless lookups: no false positives but
+ * there are false negatives.  If we find nothing, retry with the mmap
+ * lock acquired.
+ */
  if (!p) {
-return 0;
+if (have_mmap_lock()) {
+return 0;
+}
+mmap_lock();
+p = pageflags_find(address, address);
+mmap_unlock();
+if (!p) {
+return 0;
+}
  }
  return p->flags;


To avoid the brain-twisting of following locks and multiple return legs, how about
this:

   int page_get_flags(target_ulong address)
   {
   PageFlagsNode *p = pageflags_find(address, address);

   /*
* See util/interval-tree.c re lockless lookups: no false positives but
* there are false negatives.  If we had the lock and found
* nothing we are done, otherwise retry with the mmap lock acquired.
*/
   if (have_mmap_lock()) {
   return p ? p->flags : 0;
   }

   mmap_lock();
   p = pageflags_find(address, address);
   mmap_unlock();

   return p ? p->flags : 0;
   }


I'm unwilling to put an expensive test like a function call (have_mmap_lock) before an 
inexpensive test like pointer != NULL.


I don't see what's so brain twisting about the code as is.  The lock tightly surrounds a 
single statement, with a couple of pointer tests.



+/*
+ * Test very large vma allocations.
+ * The qemu out-of-memory condition was within the mmap syscall itself.
+ * If the syscall actually returns with MAP_FAILED, the test succeeded.
+ */
+#include 
+
+int main()
+{
+int n = sizeof(size_t) == 4 ? 32 : 45;
+
+for (int i = 28; i < n; i++) {
+size_t l = (size_t)1 << i;
+void *p = mmap(0, l, PROT_NONE,
+   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
+if (p == MAP_FAILED) {
+break;
+}
+munmap(p, l);
+}
+return 0;
+}


So is the failure mode here that we actually SEGV or SIGBUS out?


SEGV or KILL (via oom) depending on the state of the system. If the host is *really* 
beefy, it may even complete but with an unreasonable timeout.


r~



Re: [PATCH v9 1/8] mm: Introduce memfd_restricted system call to create restricted user memory

2022-10-27 Thread Fuad Tabba
Hi,


On Tue, Oct 25, 2022 at 4:18 PM Chao Peng  wrote:
>
> From: "Kirill A. Shutemov" 
>
> Introduce 'memfd_restricted' system call with the ability to create
> memory areas that are restricted from userspace access through ordinary
> MMU operations (e.g. read/write/mmap). The memory content is expected to
> be used through a new in-kernel interface by a third kernel module.
>
> memfd_restricted() is useful for scenarios where a file descriptor (fd)
> can be used as an interface into mm while restricting userspace's
> ability to operate on the fd. Initially it is designed to provide protections for
> KVM encrypted guest memory.
>
> Normally KVM uses memfd memory via mmapping the memfd into KVM userspace
> (e.g. QEMU) and then using the mmaped virtual address to setup the
> mapping in the KVM secondary page table (e.g. EPT). With confidential
> computing technologies like Intel TDX, the memfd memory may be encrypted
> with a special key for a special software domain (e.g. a KVM guest) and is not
> expected to be directly accessed by userspace. Specifically, userspace
> access to such encrypted memory may lead to a host crash, so it should be
> prevented.
>
> memfd_restricted() provides semantics required for KVM guest encrypted
> memory support that a fd created with memfd_restricted() is going to be
> used as the source of guest memory in a confidential computing environment
> and KVM can directly interact with core-mm without the need to expose
> the memory content into KVM userspace.
>
> KVM userspace is still in charge of the lifecycle of the fd. It should
> pass the created fd to KVM. KVM uses the new restrictedmem_get_page() to
> obtain the physical memory page and then uses it to populate the KVM
> secondary page table entries.
>
> The userspace restricted memfd can be fallocate-ed or hole-punched
> from userspace. When these operations happen, KVM gets notified
> through restrictedmem_notifier and then gets a chance to remove any
> mapped entries of the range in the secondary page tables.
>
> memfd_restricted() itself is implemented as a shim layer on top of real
> memory file systems (currently tmpfs). Pages in restrictedmem are marked
> as unmovable and unevictable; this is required for current confidential
> usage, but might be changed in the future.
>
> By default memfd_restricted() prevents userspace read, write and mmap.
> By defining new bit in the 'flags', it can be extended to support other
> restricted semantics in the future.
>
> The system call is currently wired up for x86 arch.
>
> Signed-off-by: Kirill A. Shutemov 
> Signed-off-by: Chao Peng 
> ---

Reviewed-by: Fuad Tabba 

And I'm working on porting to arm64 and testing V9.

Cheers,
/fuad


>  arch/x86/entry/syscalls/syscall_32.tbl |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl |   1 +
>  include/linux/restrictedmem.h  |  62 ++
>  include/linux/syscalls.h   |   1 +
>  include/uapi/asm-generic/unistd.h  |   5 +-
>  include/uapi/linux/magic.h |   1 +
>  kernel/sys_ni.c|   3 +
>  mm/Kconfig |   4 +
>  mm/Makefile|   1 +
>  mm/restrictedmem.c | 250 +
>  10 files changed, 328 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/restrictedmem.h
>  create mode 100644 mm/restrictedmem.c
>
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index 320480a8db4f..dc70ba90247e 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -455,3 +455,4 @@
>  448i386process_mreleasesys_process_mrelease
>  449i386futex_waitv sys_futex_waitv
>  450i386set_mempolicy_home_node sys_set_mempolicy_home_node
> +451i386memfd_restrictedsys_memfd_restricted
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index c84d12608cd2..06516abc8318 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -372,6 +372,7 @@
>  448common  process_mreleasesys_process_mrelease
>  449common  futex_waitv sys_futex_waitv
>  450common  set_mempolicy_home_node sys_set_mempolicy_home_node
> +451common  memfd_restrictedsys_memfd_restricted
>
>  #
>  # Due to a historical design error, certain syscalls are numbered differently
> diff --git a/include/linux/restrictedmem.h b/include/linux/restrictedmem.h
> new file mode 100644
> index ..9c37c3ea3180
> --- /dev/null
> +++ b/include/linux/restrictedmem.h
> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +#ifndef _LINUX_RESTRICTEDMEM_H
> +
> +#include 
> +#include 
> +#include 
> +
> +struct restrictedmem_notifier;
> +
> +struct restrictedmem_notifier_ops {
> +   void (*invalidate_start)(struct restrictedmem_notifier *notifier,
> +

Re: [PATCH v3 2/2] qtests/arm: add some mte tests

2022-10-27 Thread Cornelia Huck
On Thu, Oct 27 2022, Thomas Huth  wrote:

> On 26/10/2022 18.05, Cornelia Huck wrote:
>> +qtest_add_data_func("/arm/max/query-cpu-model-expansion/tag-memory",
>> +NULL, mte_tests_tag_memory_on);
>
> Is it already possible to compile qemu-system-aarch64 with --disable-tcg ? 

Not yet, the code is too entangled... I tried a while ago, but didn't make
much progress (on my todo list, but won't mind someone else doing it :)

> If so, I'd recommend a qtest_has_accel("tcg") here ... but apart from that:
>
> Acked-by: Thomas Huth 

Thanks!




Re: [PATCH v9 3/8] KVM: Add KVM_EXIT_MEMORY_FAULT exit

2022-10-27 Thread Fuad Tabba
Hi,

On Tue, Oct 25, 2022 at 4:19 PM Chao Peng  wrote:
>
> This new KVM exit allows userspace to handle memory-related errors. It
> indicates that an error happened in KVM at guest memory range [gpa, gpa+size).
> The 'flags' field includes additional information for userspace to handle the
> error. Currently bit 0 is defined as 'private memory', where '1'
> indicates the error happened due to a private memory access and '0' indicates
> it happened due to a shared memory access.
>
> When private memory is enabled, this new exit will be used for KVM to
> exit to userspace for shared <-> private memory conversion in memory
> encryption usage. In such usage, there are typically two kinds of memory
> conversions:
>   - explicit conversion: happens when guest explicitly calls into KVM
> to map a range (as private or shared), KVM then exits to userspace
> to perform the map/unmap operations.
>   - implicit conversion: happens in KVM page fault handler where KVM
> exits to userspace for an implicit conversion when the page is in a
> different state than requested (private or shared).
>
> Suggested-by: Sean Christopherson 
> Co-developed-by: Yu Zhang 
> Signed-off-by: Yu Zhang 
> Signed-off-by: Chao Peng 
> ---

Reviewed-by: Fuad Tabba 

I have tested the V8 version of this patch on arm64/qemu, and
considering this hasn't changed:
Tested-by: Fuad Tabba 

Cheers,
/fuad



>  Documentation/virt/kvm/api.rst | 23 +++
>  include/uapi/linux/kvm.h   |  9 +
>  2 files changed, 32 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index f3fa75649a78..975688912b8c 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6537,6 +6537,29 @@ array field represents return values. The userspace should update the return
>  values of SBI call before resuming the VCPU. For more details on RISC-V SBI
>  spec refer, https://github.com/riscv/riscv-sbi-doc.
>
> +::
> +
> +   /* KVM_EXIT_MEMORY_FAULT */
> +   struct {
> +  #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0)
> +   __u32 flags;
> +   __u32 padding;
> +   __u64 gpa;
> +   __u64 size;
> +   } memory;
> +
> +If exit reason is KVM_EXIT_MEMORY_FAULT then it indicates that the VCPU has
> +encountered a memory error which is not handled by KVM kernel module and
> +userspace may choose to handle it. The 'flags' field indicates the memory
> +properties of the exit.
> +
> + - KVM_MEMORY_EXIT_FLAG_PRIVATE - indicates the memory error is caused by
> +   private memory access when the bit is set. Otherwise the memory error is
> +   caused by shared memory access when the bit is clear.
> +
> +'gpa' and 'size' indicate the memory range the error occurs at. The userspace
> +may handle the error and return to KVM to retry the previous memory access.
> +
>  ::
>
>  /* KVM_EXIT_NOTIFY */
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index f1ae45c10c94..fa60b032a405 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -300,6 +300,7 @@ struct kvm_xen_exit {
>  #define KVM_EXIT_RISCV_SBI35
>  #define KVM_EXIT_RISCV_CSR36
>  #define KVM_EXIT_NOTIFY   37
> +#define KVM_EXIT_MEMORY_FAULT 38
>
>  /* For KVM_EXIT_INTERNAL_ERROR */
>  /* Emulate instruction failed. */
> @@ -538,6 +539,14 @@ struct kvm_run {
>  #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0)
> __u32 flags;
> } notify;
> +   /* KVM_EXIT_MEMORY_FAULT */
> +   struct {
> +#define KVM_MEMORY_EXIT_FLAG_PRIVATE   (1 << 0)
> +   __u32 flags;
> +   __u32 padding;
> +   __u64 gpa;
> +   __u64 size;
> +   } memory;
> /* Fix the size of the union. */
> char padding[256];
> };
> --
> 2.25.1
>



[PATCH] target/i386: Expand eflags updates inline

2022-10-27 Thread Richard Henderson
The helpers for reset_rf, cli, sti, clac, stac are
completely trivial; implement them inline.

Drop some nearby #if 0 code.

Signed-off-by: Richard Henderson 
---
 target/i386/helper.h|  5 -
 target/i386/tcg/cc_helper.c | 41 -
 target/i386/tcg/translate.c | 30 ++-
 3 files changed, 25 insertions(+), 51 deletions(-)

diff --git a/target/i386/helper.h b/target/i386/helper.h
index 88143b2a24..b7de5429ef 100644
--- a/target/i386/helper.h
+++ b/target/i386/helper.h
@@ -56,13 +56,8 @@ DEF_HELPER_2(syscall, void, env, int)
 DEF_HELPER_2(sysret, void, env, int)
 #endif
 DEF_HELPER_FLAGS_2(pause, TCG_CALL_NO_WG, noreturn, env, int)
-DEF_HELPER_1(reset_rf, void, env)
 DEF_HELPER_FLAGS_3(raise_interrupt, TCG_CALL_NO_WG, noreturn, env, int, int)
 DEF_HELPER_FLAGS_2(raise_exception, TCG_CALL_NO_WG, noreturn, env, int)
-DEF_HELPER_1(cli, void, env)
-DEF_HELPER_1(sti, void, env)
-DEF_HELPER_1(clac, void, env)
-DEF_HELPER_1(stac, void, env)
 DEF_HELPER_3(boundw, void, env, tl, int)
 DEF_HELPER_3(boundl, void, env, tl, int)
 
diff --git a/target/i386/tcg/cc_helper.c b/target/i386/tcg/cc_helper.c
index cc7ea9e8b9..6227dbb30b 100644
--- a/target/i386/tcg/cc_helper.c
+++ b/target/i386/tcg/cc_helper.c
@@ -346,44 +346,3 @@ void helper_clts(CPUX86State *env)
 env->cr[0] &= ~CR0_TS_MASK;
 env->hflags &= ~HF_TS_MASK;
 }
-
-void helper_reset_rf(CPUX86State *env)
-{
-env->eflags &= ~RF_MASK;
-}
-
-void helper_cli(CPUX86State *env)
-{
-env->eflags &= ~IF_MASK;
-}
-
-void helper_sti(CPUX86State *env)
-{
-env->eflags |= IF_MASK;
-}
-
-void helper_clac(CPUX86State *env)
-{
-env->eflags &= ~AC_MASK;
-}
-
-void helper_stac(CPUX86State *env)
-{
-env->eflags |= AC_MASK;
-}
-
-#if 0
-/* vm86plus instructions */
-void helper_cli_vm(CPUX86State *env)
-{
-env->eflags &= ~VIF_MASK;
-}
-
-void helper_sti_vm(CPUX86State *env)
-{
-env->eflags |= VIF_MASK;
-if (env->eflags & VIP_MASK) {
-raise_exception_ra(env, EXCP0D_GPF, GETPC());
-}
-}
-#endif
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index e19d5c1c64..dc1fa9f9ed 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2745,6 +2745,26 @@ static void gen_reset_hflag(DisasContext *s, uint32_t mask)
 }
 }
 
+static void gen_set_eflags(DisasContext *s, target_ulong mask)
+{
+TCGv t = tcg_temp_new();
+
+tcg_gen_ld_tl(t, cpu_env, offsetof(CPUX86State, eflags));
+tcg_gen_ori_tl(t, t, mask);
+tcg_gen_st_tl(t, cpu_env, offsetof(CPUX86State, eflags));
+tcg_temp_free(t);
+}
+
+static void gen_reset_eflags(DisasContext *s, target_ulong mask)
+{
+TCGv t = tcg_temp_new();
+
+tcg_gen_ld_tl(t, cpu_env, offsetof(CPUX86State, eflags));
+tcg_gen_andi_tl(t, t, ~mask);
+tcg_gen_st_tl(t, cpu_env, offsetof(CPUX86State, eflags));
+tcg_temp_free(t);
+}
+
 /* Clear BND registers during legacy branches.  */
 static void gen_bnd_jmp(DisasContext *s)
 {
@@ -2775,7 +2795,7 @@ do_gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf, bool jr)
 }
 
 if (s->base.tb->flags & HF_RF_MASK) {
-gen_helper_reset_rf(cpu_env);
+gen_reset_eflags(s, RF_MASK);
 }
 if (recheck_tf) {
 gen_helper_rechecking_single_step(cpu_env);
@@ -5501,12 +5521,12 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 #endif
 case 0xfa: /* cli */
 if (check_iopl(s)) {
-gen_helper_cli(cpu_env);
+gen_reset_eflags(s, IF_MASK);
 }
 break;
 case 0xfb: /* sti */
 if (check_iopl(s)) {
-gen_helper_sti(cpu_env);
+gen_set_eflags(s, IF_MASK);
 /* interruptions are enabled only the first insn after sti */
 gen_update_eip_next(s);
 gen_eob_inhibit_irq(s, true);
@@ -5788,7 +5808,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 || CPL(s) != 0) {
 goto illegal_op;
 }
-gen_helper_clac(cpu_env);
+gen_reset_eflags(s, AC_MASK);
 s->base.is_jmp = DISAS_EOB_NEXT;
 break;
 
@@ -5797,7 +5817,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 || CPL(s) != 0) {
 goto illegal_op;
 }
-gen_helper_stac(cpu_env);
+gen_set_eflags(s, AC_MASK);
 s->base.is_jmp = DISAS_EOB_NEXT;
 break;
 
-- 
2.34.1




Re: [PATCH v9 2/8] KVM: Extend the memslot to support fd-based private memory

2022-10-27 Thread Fuad Tabba
On Tue, Oct 25, 2022 at 4:18 PM Chao Peng  wrote:
>
> In memory encryption usage, guest memory may be encrypted with a special
> key and can be accessed only by the guest itself. We call such memory
> private memory. There is no value, and it can sometimes cause problems,
> in allowing userspace to access guest private memory. This new KVM memslot
> extension allows guest private memory to be provided through a
> restrictedmem-backed file descriptor (fd), and userspace is restricted
> from accessing the bookmarked memory in the fd.
>
> This new extension, indicated by the new flag KVM_MEM_PRIVATE, adds two
> additional KVM memslot fields restricted_fd/restricted_offset to allow
> userspace to instruct KVM to provide guest memory through restricted_fd.
> 'guest_phys_addr' is mapped at the restricted_offset of restricted_fd
> and the size is 'memory_size'.
>
> The extended memslot can still have the userspace_addr (hva). When used, a
> single memslot can maintain both private memory through restricted_fd
> and shared memory through userspace_addr. Whether the private or shared
> part is visible to the guest is maintained by other KVM code.
>
> A restrictedmem_notifier field is also added to the memslot structure to
> allow the restricted_fd's backing store to notify KVM of memory changes,
> so that KVM can invalidate its page table entries.
>
> Together with the change, a new config HAVE_KVM_RESTRICTED_MEM is added
> and right now it is selected on X86_64 only. A KVM_CAP_PRIVATE_MEM is
> also introduced to indicate KVM support for KVM_MEM_PRIVATE.
>
> To make code maintenance easy, internally we use a binary compatible
> alias struct kvm_user_mem_region to handle both the normal and the
> '_ext' variants.
>
> Co-developed-by: Yu Zhang 
> Signed-off-by: Yu Zhang 
> Signed-off-by: Chao Peng 

Reviewed-by: Fuad Tabba 

I have tested the V8 version of this patch on arm64/qemu (which has
the fix to copy_from_user included in this patch), and considering
this hasn't changed much:
Tested-by: Fuad Tabba 

Cheers,
/fuad



> ---
>  Documentation/virt/kvm/api.rst | 48 -
>  arch/x86/kvm/Kconfig   |  2 ++
>  arch/x86/kvm/x86.c |  2 +-
>  include/linux/kvm_host.h   | 13 +++--
>  include/uapi/linux/kvm.h   | 29 
>  virt/kvm/Kconfig   |  3 +++
>  virt/kvm/kvm_main.c| 49 --
>  7 files changed, 128 insertions(+), 18 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index eee9f857a986..f3fa75649a78 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -1319,7 +1319,7 @@ yet and must be cleared on entry.
>  :Capability: KVM_CAP_USER_MEMORY
>  :Architectures: all
>  :Type: vm ioctl
> -:Parameters: struct kvm_userspace_memory_region (in)
> +:Parameters: struct kvm_userspace_memory_region(_ext) (in)
>  :Returns: 0 on success, -1 on error
>
>  ::
> @@ -1332,9 +1332,18 @@ yet and must be cleared on entry.
> __u64 userspace_addr; /* start of the userspace allocated memory */
>};
>
> +  struct kvm_userspace_memory_region_ext {
> +   struct kvm_userspace_memory_region region;
> +   __u64 restricted_offset;
> +   __u32 restricted_fd;
> +   __u32 pad1;
> +   __u64 pad2[14];
> +  };
> +
>/* for kvm_memory_region::flags */
>#define KVM_MEM_LOG_DIRTY_PAGES  (1UL << 0)
>#define KVM_MEM_READONLY (1UL << 1)
> +  #define KVM_MEM_PRIVATE  (1UL << 2)
>
>  This ioctl allows the user to create, modify or delete a guest physical
>  memory slot.  Bits 0-15 of "slot" specify the slot id and this value
> @@ -1365,12 +1374,27 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
>  be identical.  This allows large pages in the guest to be backed by large
>  pages in the host.
>
> -The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
> -KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
> -writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
> -use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
> -to make a new slot read-only.  In this case, writes to this memory will be
> -posted to userspace as KVM_EXIT_MMIO exits.
> +The kvm_userspace_memory_region_ext struct includes all fields of the
> +kvm_userspace_memory_region struct, and adds additional fields for some
> +other features. See the description of the flags field below for more
> +information. It's recommended to use kvm_userspace_memory_region_ext in
> +new userspace code.
> +
> +The flags field supports following flags:
> +
> +- KVM_MEM_LOG_DIRTY_PAGES to instruct KVM to keep track of writes to memory
> +  within the slot.  For more details, see KVM_GET_DIRTY_LOG ioctl.
> +
> +- KVM_MEM_READONLY, if KVM_CAP_READONLY_MEM allows, to make a new slot
> +  read-only.  In this case, writes to this memory will be posted to userspace as
> + 

Re: [PATCH v9 4/8] KVM: Use gfn instead of hva for mmu_notifier_retry

2022-10-27 Thread Fuad Tabba
Hi,

On Tue, Oct 25, 2022 at 4:19 PM Chao Peng  wrote:
>
> Currently in the mmu_notifier invalidate path, the hva range is recorded
> and then checked against in mmu_notifier_retry_hva() in the page fault
> path. However, for the to-be-introduced private memory, a page fault may
> not have an associated hva; checking the gfn (gpa) makes more sense.
>
> For the existing non-private memory case, gfn is expected to continue to
> work. The only downside is when aliasing multiple gfns to a single hva,
> the current algorithm of checking multiple ranges could result in a much
> larger range being rejected. Such aliasing should be uncommon, so the
> impact is expected small.
>
> It also fixes a bug in kvm_zap_gfn_range() which has already been using

nit: Now it's kvm_unmap_gfn_range().

> gfn when calling kvm_mmu_invalidate_begin/end() while these functions
> accept hva in current code.
>
> Signed-off-by: Chao Peng 
> ---

Based on reading this code and my limited knowledge of the x86 MMU code:
Reviewed-by: Fuad Tabba 

Cheers,
/fuad


>  arch/x86/kvm/mmu/mmu.c   |  2 +-
>  include/linux/kvm_host.h | 18 +++-
>  virt/kvm/kvm_main.c  | 45 ++--
>  3 files changed, 39 insertions(+), 26 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 6f81539061d6..33b1aec44fb8 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4217,7 +4217,7 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
> return true;
>
> return fault->slot &&
> -  mmu_invalidate_retry_hva(vcpu->kvm, mmu_seq, fault->hva);
> +  mmu_invalidate_retry_gfn(vcpu->kvm, mmu_seq, fault->gfn);
>  }
>
>  static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 739a7562a1f3..79e5cbc35fcf 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -775,8 +775,8 @@ struct kvm {
> struct mmu_notifier mmu_notifier;
> unsigned long mmu_invalidate_seq;
> long mmu_invalidate_in_progress;
> -   unsigned long mmu_invalidate_range_start;
> -   unsigned long mmu_invalidate_range_end;
> +   gfn_t mmu_invalidate_range_start;
> +   gfn_t mmu_invalidate_range_end;
>  #endif
> struct list_head devices;
> u64 manual_dirty_log_protect;
> @@ -1365,10 +1365,8 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
>  void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
>  #endif
>
> -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
> - unsigned long end);
> -void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start,
> -   unsigned long end);
> +void kvm_mmu_invalidate_begin(struct kvm *kvm, gfn_t start, gfn_t end);
> +void kvm_mmu_invalidate_end(struct kvm *kvm, gfn_t start, gfn_t end);
>
>  long kvm_arch_dev_ioctl(struct file *filp,
> unsigned int ioctl, unsigned long arg);
> @@ -1937,9 +1935,9 @@ static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq)
> return 0;
>  }
>
> -static inline int mmu_invalidate_retry_hva(struct kvm *kvm,
> +static inline int mmu_invalidate_retry_gfn(struct kvm *kvm,
>unsigned long mmu_seq,
> -  unsigned long hva)
> +  gfn_t gfn)
>  {
> lockdep_assert_held(&kvm->mmu_lock);
> /*
> @@ -1949,8 +1947,8 @@ static inline int mmu_invalidate_retry_hva(struct kvm *kvm,
>  * positives, due to shortcuts when handing concurrent invalidations.
>  */
> if (unlikely(kvm->mmu_invalidate_in_progress) &&
> -   hva >= kvm->mmu_invalidate_range_start &&
> -   hva < kvm->mmu_invalidate_range_end)
> +   gfn >= kvm->mmu_invalidate_range_start &&
> +   gfn < kvm->mmu_invalidate_range_end)
> return 1;
> if (kvm->mmu_invalidate_seq != mmu_seq)
> return 1;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8dace78a0278..09c9cdeb773c 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -540,8 +540,7 @@ static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn,
>
>  typedef bool (*hva_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range);
>
> -typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start,
> -unsigned long end);
> +typedef void (*on_lock_fn_t)(struct kvm *kvm, gfn_t start, gfn_t end);
>
>  typedef void (*on_unlock_fn_t)(struct kvm *kvm);
>
> @@ -628,7 +627,8 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
> locked = true;
> KVM_MMU_LOCK(kvm);
> if (!IS_KVM_NULL_

Re: [PATCH v9 8/8] KVM: Enable and expose KVM_MEM_PRIVATE

2022-10-27 Thread Fuad Tabba
Hi,

On Tue, Oct 25, 2022 at 4:20 PM Chao Peng  wrote:
>
> Expose KVM_MEM_PRIVATE and memslot fields restricted_fd/offset to
> userspace. KVM registers/unregisters the private memslot with the fd-based
> memory backing store and responds to invalidation events from
> restrictedmem_notifier to zap the existing memory mappings in the
> secondary page table.
>
> Whether KVM_MEM_PRIVATE is actually exposed to userspace is determined
> by architecture code, which can turn it on by overriding the default
> kvm_arch_has_private_mem().
>
> A 'kvm' reference is added in memslot structure since in
> restrictedmem_notifier callback we can only obtain a memslot reference
> but 'kvm' is needed to do the zapping.
>
> Co-developed-by: Yu Zhang 
> Signed-off-by: Yu Zhang 
> Signed-off-by: Chao Peng 
> ---
>  include/linux/kvm_host.h |   3 +-
>  virt/kvm/kvm_main.c  | 174 +--
>  2 files changed, 171 insertions(+), 6 deletions(-)

Reviewed-by: Fuad Tabba 

Thanks,
/fuad


>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 69300fc6d572..e27d62c30484 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -246,7 +246,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
>  #endif
>
>
> -#if defined(KVM_ARCH_WANT_MMU_NOTIFIER) || defined(CONFIG_KVM_GENERIC_PRIVATE_MEM)
> +#if defined(KVM_ARCH_WANT_MMU_NOTIFIER) || defined(CONFIG_HAVE_KVM_RESTRICTED_MEM)
>  struct kvm_gfn_range {
> struct kvm_memory_slot *slot;
> gfn_t start;
> @@ -583,6 +583,7 @@ struct kvm_memory_slot {
> struct file *restricted_file;
> loff_t restricted_offset;
> struct restrictedmem_notifier notifier;
> +   struct kvm *kvm;
>  };
>
>  static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot 
> *slot)
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 13a37b4d9e97..dae6a2c196ad 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1028,6 +1028,111 @@ static int kvm_vm_ioctl_set_mem_attr(struct kvm *kvm, gpa_t gpa, gpa_t size,
>  }
>  #endif /* CONFIG_KVM_GENERIC_PRIVATE_MEM */
>
> +#ifdef CONFIG_HAVE_KVM_RESTRICTED_MEM
> +static bool restrictedmem_range_is_valid(struct kvm_memory_slot *slot,
> +pgoff_t start, pgoff_t end,
> +gfn_t *gfn_start, gfn_t *gfn_end)
> +{
> +   unsigned long base_pgoff = slot->restricted_offset >> PAGE_SHIFT;
> +
> +   if (start > base_pgoff)
> +   *gfn_start = slot->base_gfn + start - base_pgoff;
> +   else
> +   *gfn_start = slot->base_gfn;
> +
> +   if (end < base_pgoff + slot->npages)
> +   *gfn_end = slot->base_gfn + end - base_pgoff;
> +   else
> +   *gfn_end = slot->base_gfn + slot->npages;
> +
> +   if (*gfn_start >= *gfn_end)
> +   return false;
> +
> +   return true;
> +}
> +
> +static void kvm_restrictedmem_invalidate_begin(struct restrictedmem_notifier *notifier,
> +  pgoff_t start, pgoff_t end)
> +{
> +   struct kvm_memory_slot *slot = container_of(notifier,
> +   struct kvm_memory_slot,
> +   notifier);
> +   struct kvm *kvm = slot->kvm;
> +   gfn_t gfn_start, gfn_end;
> +   struct kvm_gfn_range gfn_range;
> +   int idx;
> +
> +   if (!restrictedmem_range_is_valid(slot, start, end,
> +   &gfn_start, &gfn_end))
> +   return;
> +
> +   idx = srcu_read_lock(&kvm->srcu);
> +   KVM_MMU_LOCK(kvm);
> +
> +   kvm_mmu_invalidate_begin(kvm, gfn_start, gfn_end);
> +
> +   gfn_range.start = gfn_start;
> +   gfn_range.end = gfn_end;
> +   gfn_range.slot = slot;
> +   gfn_range.pte = __pte(0);
> +   gfn_range.may_block = true;
> +
> +   if (kvm_unmap_gfn_range(kvm, &gfn_range))
> +   kvm_flush_remote_tlbs(kvm);
> +
> +   KVM_MMU_UNLOCK(kvm);
> +   srcu_read_unlock(&kvm->srcu, idx);
> +}
> +
> +static void kvm_restrictedmem_invalidate_end(struct restrictedmem_notifier *notifier,
> +pgoff_t start, pgoff_t end)
> +{
> +   struct kvm_memory_slot *slot = container_of(notifier,
> +   struct kvm_memory_slot,
> +   notifier);
> +   struct kvm *kvm = slot->kvm;
> +   gfn_t gfn_start, gfn_end;
> +
> +   if (!restrictedmem_range_is_valid(slot, start, end,
> +   &gfn_start, &gfn_end))
> +   return;
> +
> +   KVM_MMU_LOCK(kvm);
> +   kvm_mmu_invalidate_end(kvm, gfn_start, gfn_end);
> +   KVM_MMU_UNLOCK(kvm);
> +}
> +
> +static struct restrictedmem_notifier_ops kvm_restrictedmem_notifier_ops = {
> +   .invalida

Re: [PATCH v9 5/8] KVM: Register/unregister the guest private memory regions

2022-10-27 Thread Fuad Tabba
Hi,

On Tue, Oct 25, 2022 at 4:19 PM Chao Peng  wrote:
>
> Introduce generic private memory register/unregister by reusing the existing
> SEV ioctls KVM_MEMORY_ENCRYPT_{UN,}REG_REGION. It differs from the SEV case
> by treating the address in the region as a gpa instead of an hva. Which path
> these ioctls take is determined by kvm_arch_has_private_mem().
> Architectures which support KVM_PRIVATE_MEM should override this function.
>
> KVM internally defaults all guest memory to private memory and maintains
> the shared memory in 'mem_attr_array'. The above ioctls operate on this
> field and unmap existing mappings, if any.
>
> Signed-off-by: Chao Peng 
> ---

Reviewed-by: Fuad Tabba 

Cheers,
/fuad


>  Documentation/virt/kvm/api.rst |  17 ++-
>  arch/x86/kvm/Kconfig   |   1 +
>  include/linux/kvm_host.h   |  10 +-
>  virt/kvm/Kconfig   |   4 +
>  virt/kvm/kvm_main.c| 227 +
>  5 files changed, 198 insertions(+), 61 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 975688912b8c..08253cf498d1 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4717,10 +4717,19 @@ Documentation/virt/kvm/x86/amd-memory-encryption.rst.
>  This ioctl can be used to register a guest memory region which may
>  contain encrypted data (e.g. guest RAM, SMRAM etc).
>
> -It is used in the SEV-enabled guest. When encryption is enabled, a guest
> -memory region may contain encrypted data. The SEV memory encryption
> -engine uses a tweak such that two identical plaintext pages, each at
> -different locations will have differing ciphertexts. So swapping or
> +Currently this ioctl supports registering memory regions for two usages:
> +private memory and SEV-encrypted memory.
> +
> +When private memory is enabled, this ioctl is used to register guest private
> +memory region and the addr/size of kvm_enc_region represents guest physical
> +address (GPA). In this usage, this ioctl zaps the existing guest memory
> +mappings in KVM that fall into the region.
> +
> +When SEV-encrypted memory is enabled, this ioctl is used to register guest
> +memory region which may contain encrypted data for a SEV-enabled guest. The
> +addr/size of kvm_enc_region represents userspace address (HVA). The SEV
> +memory encryption engine uses a tweak such that two identical plaintext pages,
> +each at different locations will have differing ciphertexts. So swapping or
>  moving ciphertext of those pages will not result in plaintext being
>  swapped. So relocating (or migrating) physical backing pages for the SEV
>  guest will require some additional steps.
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 8d2bd455c0cd..73fdfa429b20 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -51,6 +51,7 @@ config KVM
> select HAVE_KVM_PM_NOTIFIER if PM
> select HAVE_KVM_RESTRICTED_MEM if X86_64
> select RESTRICTEDMEM if HAVE_KVM_RESTRICTED_MEM
> +   select KVM_GENERIC_PRIVATE_MEM if HAVE_KVM_RESTRICTED_MEM
> help
>   Support hosting fully virtualized guest machines using hardware
>   virtualization extensions.  You will need a fairly recent
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 79e5cbc35fcf..4ce98fa0153c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -245,7 +245,8 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
>  #endif
>
> -#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
> +
> +#if defined(KVM_ARCH_WANT_MMU_NOTIFIER) || defined(CONFIG_KVM_GENERIC_PRIVATE_MEM)
>  struct kvm_gfn_range {
> struct kvm_memory_slot *slot;
> gfn_t start;
> @@ -254,6 +255,9 @@ struct kvm_gfn_range {
> bool may_block;
>  };
>  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
> +#endif
> +
> +#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
>  bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
>  bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
>  bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
> @@ -794,6 +798,9 @@ struct kvm {
> struct notifier_block pm_notifier;
>  #endif
> char stats_id[KVM_STATS_NAME_SIZE];
> +#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
> +   struct xarray mem_attr_array;
> +#endif
>  };
>
>  #define kvm_err(fmt, ...) \
> @@ -1453,6 +1460,7 @@ bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
>  int kvm_arch_post_init_vm(struct kvm *kvm);
>  void kvm_arch_pre_destroy_vm(struct kvm *kvm);
>  int kvm_arch_create_vm_debugfs(struct kvm *kvm);
> +bool kvm_arch_has_private_mem(struct kvm *kvm);
>
>  #ifndef __KVM_HAVE_ARCH_VM_ALLOC
>  /*
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 9ff164c7e0cc..69ca59e82149 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -89,3 +89,7 @@ co

Re: [PATCH v4 0/3] MIPS Bootloader helper

2022-10-27 Thread Jiaxun Yang



> 2022年10月26日 20:18,Philippe Mathieu-Daudé  写道:
> 
> This is a respin of Jiaxun v3 [1] addressing the semihosting review
> comment [2].
> 
> [1] https://lore.kernel.org/qemu-devel/20210127065424.114125-1-jiaxun.y...@flygoat.com/
> [2] https://lore.kernel.org/qemu-devel/5a22bbe1-5023-6fc3-a41b-8d72ec2bb...@flygoat.com/

For the series:

Tested-by: Jiaxun Yang 
Reviewed-by: Jiaxun Yang 

I thought this series was committed in whole... I just forgot that there is
still something remaining :-)

Thanks
- Jiaxun


> 
> *** BLURB HERE ***
> 
> Jiaxun Yang (2):
>  hw/mips: Use bl_gen_kernel_jump to generate bootloaders
>  hw/mips/malta: Use bootloader helper to set BAR registers
> 
> Philippe Mathieu-Daudé (1):
>  hw/mips/bootloader: Allow bl_gen_jump_kernel to optionally set
>register
> 
> hw/mips/bootloader.c |  28 ++--
> hw/mips/boston.c |   5 +-
> hw/mips/fuloong2e.c  |   8 ++-
> hw/mips/malta.c  | 122 ++-
> include/hw/mips/bootloader.h |   8 ++-
> 5 files changed, 86 insertions(+), 85 deletions(-)
> 
> -- 
> 2.37.3
> 

---
Jiaxun Yang




Re: [PULL 00/30] target-arm queue

2022-10-27 Thread Peter Maydell
On Wed, 26 Oct 2022 at 15:52, Stefan Hajnoczi  wrote:
>
> On Tue, 25 Oct 2022 at 12:51, Peter Maydell  wrote:
> > target-arm queue:
> >  * Implement FEAT_E0PD
> >  * Implement FEAT_HAFDBS
>
> A second CI failure:

> libqemu-aarch64-softmmu.fa.p/target_arm_ptw.c.o -MF
> libqemu-aarch64-softmmu.fa.p/target_arm_ptw.c.o.d -o
> libqemu-aarch64-softmmu.fa.p/target_arm_ptw.c.o -c ../target/arm/ptw.c
> ../target/arm/ptw.c: In function ‘arm_casq_ptw’:
> ../target/arm/ptw.c:449:19: error: implicit declaration of function
> ‘qemu_mutex_iothread_locked’; did you mean ‘qemu_mutex_trylock’?
> [-Werror=implicit-function-declaration]
> 449 | bool locked = qemu_mutex_iothread_locked();
> | ^~
> | qemu_mutex_trylock

Oops, sorry about the CI failures. The windows one is an accidental
use of PROT_WRITE when PAGE_WRITE was intended; this one's a missing
include of main-loop.h. I'm not sure why it doesn't show up on my
system -- I guess we're dragging in main-loop.h via some other
header somehow.

thanks
-- PMM



Re: [PATCH 4/9] target/s390x: Use Int128 for return from CKSM

2022-10-27 Thread Ilya Leoshkevich
On Fri, Oct 21, 2022 at 05:30:01PM +1000, Richard Henderson wrote:
> Signed-off-by: Richard Henderson 
> ---
>  target/s390x/helper.h | 2 +-
>  target/s390x/tcg/mem_helper.c | 7 +++
>  target/s390x/tcg/translate.c  | 6 --
>  3 files changed, 8 insertions(+), 7 deletions(-)

Acked-by: Ilya Leoshkevich 



Re: [PATCH 5/9] target/s390x: Use Int128 for return from TRE

2022-10-27 Thread Ilya Leoshkevich
On Fri, Oct 21, 2022 at 05:30:02PM +1000, Richard Henderson wrote:
> Signed-off-by: Richard Henderson 
> ---
>  target/s390x/helper.h | 2 +-
>  target/s390x/tcg/mem_helper.c | 7 +++
>  target/s390x/tcg/translate.c  | 7 +--
>  3 files changed, 9 insertions(+), 7 deletions(-)

Acked-by: Ilya Leoshkevich 



Re: [PATCH v2 1/6] accel/tcg: Introduce cpu_unwind_state_data

2022-10-27 Thread Claudio Fontana
On 10/27/22 12:02, Richard Henderson wrote:
> Add a way to examine the unwind data without actually
> restoring the data back into env.
> 
> Signed-off-by: Richard Henderson 
> ---
>  accel/tcg/internal.h  |  4 +--
>  include/exec/exec-all.h   | 21 ---
>  accel/tcg/translate-all.c | 74 ++-
>  3 files changed, 68 insertions(+), 31 deletions(-)
> 
> diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
> index 1227bb69bd..9c06b320b7 100644
> --- a/accel/tcg/internal.h
> +++ b/accel/tcg/internal.h
> @@ -106,8 +106,8 @@ void tb_reset_jump(TranslationBlock *tb, int n);
>  TranslationBlock *tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
> tb_page_addr_t phys_page2);
>  bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc);
> -int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
> -  uintptr_t searched_pc, bool reset_icount);
> +void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
> +   uintptr_t host_pc, bool reset_icount);
>  
>  /* Return the current PC from CPU, which may be cached in TB. */
>  static inline target_ulong log_pc(CPUState *cpu, const TranslationBlock *tb)
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index e948992a80..7d851f5907 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -39,20 +39,33 @@ typedef ram_addr_t tb_page_addr_t;
>  #define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
>  #endif
>  
> +/**
> + * cpu_unwind_state_data:
> + * @cpu: the cpu context
> + * @host_pc: the host pc within the translation
> + * @data: output data
> + *
> + * Attempt to load the unwind state for a host pc occurring in
> + * translated code.  If @host_pc is not in translated code, the
> + * function returns false; otherwise @data is loaded.
> + * This is the same unwind info as given to restore_state_to_opc.
> + */
> +bool cpu_unwind_state_data(CPUState *cpu, uintptr_t host_pc, uint64_t *data);
> +
>  /**
>   * cpu_restore_state:
> - * @cpu: the vCPU state is to be restore to
> - * @searched_pc: the host PC the fault occurred at
> + * @cpu: the cpu context
> + * @host_pc: the host pc within the translation
>   * @will_exit: true if the TB executed will be interrupted after some
> cpu adjustments. Required for maintaining the correct
> icount values
>   * @return: true if state was restored, false otherwise
>   *
>   * Attempt to restore the state for a fault occurring in translated
> - * code. If the searched_pc is not in translated code no state is
> + * code. If @host_pc is not in translated code no state is
>   * restored and the function returns false.
>   */
> -bool cpu_restore_state(CPUState *cpu, uintptr_t searched_pc, bool will_exit);
> +bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc, bool will_exit);
>  
>  G_NORETURN void cpu_loop_exit_noexc(CPUState *cpu);
>  G_NORETURN void cpu_loop_exit(CPUState *cpu);
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index f185356a36..319becb698 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -247,52 +247,66 @@ static int encode_search(TranslationBlock *tb, uint8_t *block)
>  return p - block;
>  }
>  
> -/* The cpu state corresponding to 'searched_pc' is restored.
> - * When reset_icount is true, current TB will be interrupted and
> - * icount should be recalculated.
> - */
> -int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
> -  uintptr_t searched_pc, bool reset_icount)


Maybe add a small comment about what the return value of this static function 
means?
It can be indirectly inferred from its point of use:

 +int insns_left = cpu_unwind_data_from_tb(tb, host_pc, data);

But I find it useful to have the information about the meaning of a function
and its return value available right there.

IIUC for external functions the standard way is to document in the header 
files, but for the static functions I would think we can do it here.

With that Reviewed-by: Claudio Fontana 


> +static int cpu_unwind_data_from_tb(TranslationBlock *tb, uintptr_t host_pc,
> +   uint64_t *data)
>  {
> -uint64_t data[TARGET_INSN_START_WORDS];
> -uintptr_t host_pc = (uintptr_t)tb->tc.ptr;
> +uintptr_t iter_pc = (uintptr_t)tb->tc.ptr;
>  const uint8_t *p = tb->tc.ptr + tb->tc.size;
>  int i, j, num_insns = tb->icount;
> -#ifdef CONFIG_PROFILER
> -TCGProfile *prof = &tcg_ctx->prof;
> -int64_t ti = profile_getclock();
> -#endif
>  
> -searched_pc -= GETPC_ADJ;
> +host_pc -= GETPC_ADJ;
>  
> -if (searched_pc < host_pc) {
> +if (host_pc < iter_pc) {
>  return -1;
>  }
>  
> -memset(data, 0, sizeof(data));
> +memset(data, 0, sizeof(uint64_t) * TARGET_INSN_START_WORDS);
>  if (!TARGET_TB_PCREL) {
>  data[0] = tb_p

Re: [PATCH] target/arm: Fixed Privileged Access Never (PAN) for aarch32

2022-10-27 Thread Timofey Kutergin
Understood, thank you a lot :)

Best regards
Timofey


On Thu, Oct 27, 2022 at 12:35 PM Peter Maydell 
wrote:

> On Thu, 27 Oct 2022 at 10:22, Timofey Kutergin 
> wrote:
> > > V8 always implies V7, so we only need to check V7 here.
>
> > From silicon perspective - yes, but as I see in qemu,
> > ARM_FEATURE_V7 and ARM_FEATURE_V8 are independent bits which do not
> affect each
> > other in arm_feature() and set_feature() so they should be tested
> separately.
> > Did I miss something?
>
> In arm_cpu_realizefn() there is code which sets feature flags
> that are always implied by other feature flags. There we set
> the V7VE flag if V8 is set, and the V7 flag if V7VE is set.
> So we can rely on any v8 CPU having the V7 feature flag set.
>
> thanks
> -- PMM
>


Re: [PATCH v2 0/2] linux-user: handle /proc/self/exe with execve() syscall

2022-10-27 Thread Michael Tokarev

27.10.2022 09:40, Laurent Vivier wrote:
..
I tried O_CLOEXEC, but it seems the fd is closed before it is needed by execveat() to re-spawn the process, so it exits with an error (something like EBADF)


It works here for me with a simple test program:

#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <stdio.h>
#define AT_EMPTY_PATH   0x1000

static char *argv[] = { "ls", NULL };
static char *envp[] = { NULL };

int main(void) {
  int fd = open("/usr/bin/ls", O_RDONLY);
  fcntl(fd, F_SETFD, O_CLOEXEC);
  //execveat(fd, "", argv, envp, AT_EMPTY_PATH);
  syscall(__NR_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
  return 0;
}


/mjt



Re: [PATCH v2 3/6] target/openrisc: Always exit after mtspr npc

2022-10-27 Thread Philippe Mathieu-Daudé

On 27/10/22 12:02, Richard Henderson wrote:

We have called cpu_restore_state asserting will_exit.
Do not go back on that promise.  This affects icount.

Signed-off-by: Richard Henderson 
---
  target/openrisc/sys_helper.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 3/4] meson: enforce a minimum Linux kernel headers version >= 4.18

2022-10-27 Thread Daniel P . Berrangé
On Tue, Oct 04, 2022 at 10:32:05AM +0100, Daniel P. Berrangé wrote:
> Various areas of QEMU have a dependency on Linux kernel header
> definitions. This falls under the scope of our supported platforms
> matrix, but historically we've not checked for a minimum kernel
> headers version. This has made it unclear when we can drop support
> for older kernel headers.
> 
>   * Alpine 3.14: 5.10
>   * CentOS 8: 4.18
>   * CentOS 9: 5.14
>   * Debian 10: 4.19
>   * Debian 11: 5.10
>   * Fedora 35: 5.19
>   * Fedora 36: 5.19
>   * OpenSUSE 15.3: 5.3.0
>   * Ubuntu 20.04: 5.4
>   * Ubuntu 22.04: 5.15
> 
> The above ignores the 3rd version digit since distros update their
> packages periodically and such updates don't generally affect public
> APIs to the extent that it matters for our build time check.
> 
> Overall, we can set the baseline to 4.18 currently.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  meson.build | 12 
>  1 file changed, 12 insertions(+)

Since there's no agreement, I'll just consider this patch discarded,
along with the next one. I won't repost since Laurent has already
queued the first two patches.

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|




Re: [PATCH] net: improve error message for missing netdev backend

2022-10-27 Thread Daniel P . Berrangé
ping: Jason, are you willing to queue this since it has two
positive reviews.

On Mon, Oct 03, 2022 at 11:06:12AM +0100, Daniel P. Berrangé wrote:
> The current message when using '-net user...' with SLIRP disabled at
> compile time is:
> 
>   qemu-system-x86_64: -net user: Parameter 'type' expects a net backend type 
> (maybe it is not compiled into this binary)
> 
> An observation is that we're using the 'netdev->type' field here which
> is an enum value, produced after QAPI has converted from its string
> form.
> 
> IOW, at this point in the code, we know that the user's specified
> type name was a valid network backend. The only possible scenario that
> can make the backend init function be NULL, is if support for that
> backend was disabled at build time. Given this, we don't need to caveat
> our error message with a 'maybe' hint, we can be totally explicit.
> 
> The use of QERR_INVALID_PARAMETER_VALUE doesn't really lend itself to
> user friendly error message text. Since this is not used to set a
> specific QAPI error class, we can simply stop using this pre-formatted
> error text and provide something better.
> 
> Thus the new message is:
> 
>   qemu-system-x86_64: -net user: network backend 'user' is not compiled into 
> this binary
> 
> The case of passing 'hubport' for -net is also given a message reminding
> people they should have used -netdev/-nic instead, as this backend type
> is only valid for the modern syntax.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
> 
> NB, this does not make any difference to people who were relying on the
> QEMU built-in default hub that was created if you don't list any -net /
> -netdev / -nic argument, only those using explicit args.
> 
>  net/net.c | 18 +++---
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/net/net.c b/net/net.c
> index 2db160e063..8ddafacf13 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -1036,19 +1036,23 @@ static int net_client_init1(const Netdev *netdev, bool is_netdev, Error **errp)
>  if (is_netdev) {
>  if (netdev->type == NET_CLIENT_DRIVER_NIC ||
>  !net_client_init_fun[netdev->type]) {
> -error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "type",
> -   "a netdev backend type");
> +error_setg(errp, "network backend '%s' is not compiled into this binary",
> +   NetClientDriver_str(netdev->type));
>  return -1;
>  }
>  } else {
>  if (netdev->type == NET_CLIENT_DRIVER_NONE) {
>  return 0; /* nothing to do */
>  }
> -if (netdev->type == NET_CLIENT_DRIVER_HUBPORT ||
> -!net_client_init_fun[netdev->type]) {
> -error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "type",
> -   "a net backend type (maybe it is not compiled "
> -   "into this binary)");
> +if (netdev->type == NET_CLIENT_DRIVER_HUBPORT) {
> +error_setg(errp, "network backend '%s' is only supported with -netdev/-nic",
> +   NetClientDriver_str(netdev->type));
> +return -1;
> +}
> +
> +if (!net_client_init_fun[netdev->type]) {
> +error_setg(errp, "network backend '%s' is not compiled into this binary",
> +   NetClientDriver_str(netdev->type));
>  return -1;
>  }
>  
> -- 
> 2.37.3
> 

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|




Re: [PATCH 6/9] target/s390x: Copy wout_x1 to wout_x1_P

2022-10-27 Thread Ilya Leoshkevich
On Fri, Oct 21, 2022 at 05:30:03PM +1000, Richard Henderson wrote:
> Make a copy of wout_x1 before modifying it, as wout_x1_P
> emphasizing that it operates on the out/out2 pair.  The insns
> that use x1_P are data movement that will not change to Int128.
> 
> Signed-off-by: Richard Henderson 
> ---
>  target/s390x/tcg/translate.c   |  8 
>  target/s390x/tcg/insn-data.def | 12 ++--
>  2 files changed, 14 insertions(+), 6 deletions(-)

Acked-by: Ilya Leoshkevich 



Re: [PATCH v3 0/24] Convert nanoMIPS disassembler from C++ to C

2022-10-27 Thread Philippe Mathieu-Daudé

On 27/10/22 09:25, Thomas Huth wrote:

On 12/09/2022 14.26, Milica Lazarevic wrote:

Hi,

This patchset converts the nanomips disassembler to plain C. C++ features
like class, std::string type, exception handling, and function overloading
have been removed and replaced with the equivalent C code.


  Hi Philippe, hi Stefan,

as far as I can see, this patch set has been completely reviewed, and 
IMHO it would be nice to get this into QEMU 7.2 to finally get rid of 
the C++ dependency in the QEMU code ... could one of you pick this up 
and send a pull request with the patches? Or is there still anything 
left to do here?


Sorry I lost track of this series. I'm preparing a pull request and will
look at it later today. Thanks for the reminder!

Phil.



Re: [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices

2022-10-27 Thread Jonathan Cameron via
On Wed, 26 Oct 2022 16:47:18 -0400
Gregory Price  wrote:

> On Wed, Oct 26, 2022 at 08:13:24PM +, Adam Manzanares wrote:
> > On Tue, Oct 25, 2022 at 08:47:33PM -0400, Gregory Price wrote:  
> > > Submitted as an extention to the multi-feature branch maintained
> > > by Jonathan Cameron at:
> > > https://gitlab.com/jic23/qemu/-/tree/cxl-2022-10-24
> > >
> > > 
> > > 
> > > Summary of Changes:
> > > 1) E820 CFMW Bug fix.  
> > > 2) Add CXL_CAPACITY_MULTIPLIER definition to replace magic numbers
> > > 3) Multi-Region and Volatile Memory support for CXL Type-3 Devices
> > > 4) CXL Type-3 SRAT Generation when NUMA node is attached to memdev

+CC Dan for a question on status of Generic Ports ACPI code first ECN.
Given that was done on the mailing list I can't find any public tracking
of whether it was accepted or not - hence whether we can get on with
implementation.  There hasn't been a released ACPI spec since before
that was proposed, so we need that confirmation of the code first proposal
being accepted to get things moving.

> > > 
> > > 
> > > Regarding the E820 fix
> > >   * This bugfix is required for memory regions to work on x86
> > >   * input from Dan Williams and others suggest that E820 entry for
> > > the CFMW should not exist, as it is expected to be dynamically
> > > assigned at runtime.  If this entry exists, it instead blocks
> > > region creation by nature of the memory region being marked as
> > > reserved.  
> > 
> > For CXL 2.0 it is my understanding that volatile capacity present at boot 
> > will
> > be advertised by the firmware. In the absence of EFI I would assume this 
> > would
> > be provided in the e820 map.   

Whilst this is one option, it is certainly not the case that all CXL 2.0
platforms will decide to do any setup of CXL memory (volatile or not) in the
firmware.  They can leave it entirely to the OS, so a cold plug flow.
Early platforms will do the setup in BIOS to support unaware OSes; once
that's not a problem any more the only reason you'd want to do this is if
you don't have other RAM to boot the system, or for some reason you want
your host kernel etc in the CXL attached memory.

I'd expect to see BIOS having OS managed configuration as an option in the
intermediate period where some OSes are aware, others not.
OS knows more about usecase / policy so can make better choices on interleaving
etc of volatile CXL type 3 memory (let alone the fun corner of devices
where you can dynamically change the mix of volatile and non volatile
memory).


> 
> The issue in this case is very explicitly that a double-mapping occurs
> for the same region.  An E820 mapping for RESERVED is set *and* the
> region driver allocates a CXL CFMW mapping.  As a result the region
> drive straight up fails to allocate regions.
> 
> So in either case - at least from my view - the entry added as RESERVED
> is just wrong.
> 
> This is separate from type-3 device SRAT entries and default mappings
> for volatile regions.  For this situation, if you explicitly assign the
> memdev backing a type-3 device to a numa node, then an SRAT area is
> generated and an explicit e820 entry is generated and marked usable -
> though I think there are likely issues with this kind of
> double-referencing.

SRAT setup for CXL type 3 devices is to my mind is a job for a full BIOS,
not QEMU level of faking a few things. That BIOS should also
be doing the full configuration (HDM Decoders and all the rest).  ARM has
a prototype for one of the fixed virtual platforms that does this (talk
at Plumbers Uconf), we should look to do the same if we want to test
those flows using QEMU via appropriate changes in EDK2 to walk topology
and configure everything.  Until the devices are configured there is no
way to configure the SLIT, HMAT entries that align with the SRAT ones
(in theory those can be updated at runtime via _SLI, _HMA but in 
practice, I'm fairly sure Linux doesn't support doing that.)


> 
> > 
> > Is the region driver meant to cover volatile capacity present at boot? I was
> > under the impression that it would be used for hot added volatile memory. It
> > would be good to cover all of these assumptions for the e820 fix.  
> 
> This region appears to cover hotplug memory behind the CFMW.  The
> problem is that this e820 RESERVED mapping blocks the CFMW region from
> being used at all.
> 
> Without this, you can't use a type-3 persistent region, even with
> support, let alone a volatile region.  In attempting to use a persistent
> region as volatile via ndctl and friends, I'm seeing further issues (it
> cannot be assigned to a numa node successfully), but that's a separate
> issue.
For the Numa node bit...

So far, the CDAT table isn't used in Linux (read out for debug purposes
is supported only).  That all needs to be added yet to get the NUMA node
alloca

Re: [PATCH 7/9] tests/tcg/s390x: Add long-double.c

2022-10-27 Thread Ilya Leoshkevich
On Fri, Oct 21, 2022 at 05:30:04PM +1000, Richard Henderson wrote:
> Signed-off-by: Richard Henderson 
> ---
>  tests/tcg/s390x/long-double.c   | 24 
>  tests/tcg/s390x/Makefile.target |  1 +
>  2 files changed, 25 insertions(+)
>  create mode 100644 tests/tcg/s390x/long-double.c

It might be better to do this in asm in order to be sure that a
compiler doesn't perform any magic. But at least as of today gcc
generates all the "interesting" instructions from this code.

Acked-by: Ilya Leoshkevich 



[PATCH v2 2/7] accel/tcg: Use interval tree for TBs in user-only mode

2022-10-27 Thread Richard Henderson
Begin weaning user-only away from PageDesc.

Since, for user-only, all TB (and page) manipulation is done with
a single mutex, and there is no virtual/physical discontinuity to
split a TB across discontinuous pages, place all of the TBs into
a single IntervalTree. This makes it trivial to find all of the
TBs intersecting a range.

Retain the existing PageDesc + linked list implementation for
system mode.  Move the portion of the implementation that overlaps
the new user-only code behind the common ifdef.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 accel/tcg/internal.h  |  16 +-
 include/exec/exec-all.h   |  43 -
 accel/tcg/tb-maint.c  | 388 ++
 accel/tcg/translate-all.c |   4 +-
 4 files changed, 280 insertions(+), 171 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 1227bb69bd..1bd5a02911 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -24,14 +24,13 @@
 #endif
 
 typedef struct PageDesc {
-/* list of TBs intersecting this ram page */
-uintptr_t first_tb;
 #ifdef CONFIG_USER_ONLY
 unsigned long flags;
 void *target_data;
-#endif
-#ifdef CONFIG_SOFTMMU
+#else
 QemuSpin lock;
+/* list of TBs intersecting this ram page */
+uintptr_t first_tb;
 #endif
 } PageDesc;
 
@@ -69,9 +68,6 @@ static inline PageDesc *page_find(tb_page_addr_t index)
  tb; tb = (TranslationBlock *)tb->field[n], n = (uintptr_t)tb & 1, \
  tb = (TranslationBlock *)((uintptr_t)tb & ~1))
 
-#define PAGE_FOR_EACH_TB(pagedesc, tb, n)   \
-TB_FOR_EACH_TAGGED((pagedesc)->first_tb, tb, n, page_next)
-
 #define TB_FOR_EACH_JMP(head_tb, tb, n) \
 TB_FOR_EACH_TAGGED((head_tb)->jmp_list_head, tb, n, jmp_list_next)
 
@@ -89,6 +85,12 @@ void do_assert_page_locked(const PageDesc *pd, const char *file, int line);
 #endif
 void page_lock(PageDesc *pd);
 void page_unlock(PageDesc *pd);
+
+/* TODO: For now, still shared with translate-all.c for system mode. */
+typedef int PageForEachNext;
+#define PAGE_FOR_EACH_TB(start, end, pagedesc, tb, n) \
+TB_FOR_EACH_TAGGED((pagedesc)->first_tb, tb, n, page_next)
+
 #endif
 #if !defined(CONFIG_USER_ONLY) && defined(CONFIG_DEBUG_TCG)
 void assert_no_pages_locked(void);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index e948992a80..11fd635fdc 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -24,6 +24,7 @@
 #ifdef CONFIG_TCG
 #include "exec/cpu_ldst.h"
 #endif
+#include "qemu/interval-tree.h"
 
/* allow to see translation results - the slowdown should be negligible, so we leave it */
 #define DEBUG_DISAS
@@ -549,11 +550,20 @@ struct TranslationBlock {
 
 struct tb_tc tc;
 
-/* first and second physical page containing code. The lower bit
-   of the pointer tells the index in page_next[].
-   The list is protected by the TB's page('s) lock(s) */
+/*
+ * Track tb_page_addr_t intervals that intersect this TB.
+ * For user-only, the virtual addresses are always contiguous,
+ * and we use a unified interval tree.  For system, we use a
+ * linked list headed in each PageDesc.  Within the list, the lsb
+ * of the previous pointer tells the index of page_next[], and the
+ * list is protected by the PageDesc lock(s).
+ */
+#ifdef CONFIG_USER_ONLY
+IntervalTreeNode itree;
+#else
 uintptr_t page_next[2];
 tb_page_addr_t page_addr[2];
+#endif
 
/* jmp_lock placed here to fill a 4-byte hole. Its documentation is below */
 QemuSpin jmp_lock;
@@ -609,24 +619,51 @@ static inline uint32_t tb_cflags(const TranslationBlock *tb)
 
 static inline tb_page_addr_t tb_page_addr0(const TranslationBlock *tb)
 {
+#ifdef CONFIG_USER_ONLY
+return tb->itree.start;
+#else
 return tb->page_addr[0];
+#endif
 }
 
 static inline tb_page_addr_t tb_page_addr1(const TranslationBlock *tb)
 {
+#ifdef CONFIG_USER_ONLY
+tb_page_addr_t next = tb->itree.last & TARGET_PAGE_MASK;
+return next == (tb->itree.start & TARGET_PAGE_MASK) ? -1 : next;
+#else
 return tb->page_addr[1];
+#endif
 }
 
 static inline void tb_set_page_addr0(TranslationBlock *tb,
  tb_page_addr_t addr)
 {
+#ifdef CONFIG_USER_ONLY
+tb->itree.start = addr;
+/*
+ * To begin, we record an interval of one byte.  When the translation
+ * loop encounters a second page, the interval will be extended to
+ * include the first byte of the second page, which is sufficient to
+ * allow tb_page_addr1() above to work properly.  The final corrected
+ * interval will be set by tb_page_add() from tb->size before the
+ * node is added to the interval tree.
+ */
+tb->itree.last = addr;
+#else
 tb->page_addr[0] = addr;
+#endif
 }
 
 static inline void tb_set_page_addr1(TranslationBlock *tb,
  tb_page_addr_t addr)
 {
+#ifdef CONFIG_USER_ONLY
+/* Ext

Re: [PATCH v10 3/9] s390x/cpu_topology: resetting the Topology-Change-Report

2022-10-27 Thread Pierre Morel




On 10/27/22 11:58, Cédric Le Goater wrote:

On 10/27/22 11:11, Pierre Morel wrote:



On 10/27/22 10:14, Thomas Huth wrote:

On 12/10/2022 18.21, Pierre Morel wrote:

During a subsystem reset the Topology-Change-Report is cleared
by the machine.
Let's ask KVM to clear the Modified Topology Change Report (MTCR)
bit of the SCA in the case of a subsystem reset.

Signed-off-by: Pierre Morel 
Reviewed-by: Nico Boehr 
Reviewed-by: Janis Schoetterl-Glausch 
---
  target/s390x/cpu.h   |  1 +
  target/s390x/kvm/kvm_s390x.h |  1 +
  hw/s390x/cpu-topology.c  | 12 
  hw/s390x/s390-virtio-ccw.c   |  1 +
  target/s390x/cpu-sysemu.c    |  7 +++
  target/s390x/kvm/kvm.c   | 23 +++
  6 files changed, 45 insertions(+)

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index d604aa9c78..9b35795ac8 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -825,6 +825,7 @@ void s390_enable_css_support(S390CPU *cpu);
  void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data arg);
  int s390_assign_subch_ioeventfd(EventNotifier *notifier, uint32_t sch_id,
  int vq, bool assign);
+void s390_cpu_topology_reset(void);
  #ifndef CONFIG_USER_ONLY
  unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
  #else
diff --git a/target/s390x/kvm/kvm_s390x.h b/target/s390x/kvm/kvm_s390x.h
index aaae8570de..a13c8fb9a3 100644
--- a/target/s390x/kvm/kvm_s390x.h
+++ b/target/s390x/kvm/kvm_s390x.h
@@ -46,5 +46,6 @@ void kvm_s390_crypto_reset(void);
  void kvm_s390_restart_interrupt(S390CPU *cpu);
  void kvm_s390_stop_interrupt(S390CPU *cpu);
  void kvm_s390_set_diag318(CPUState *cs, uint64_t diag318_info);
+int kvm_s390_topology_set_mtcr(uint64_t attr);
  #endif /* KVM_S390X_H */
diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index c73cebfe6f..9f202621d0 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -107,6 +107,17 @@ static void s390_topology_realize(DeviceState *dev, Error **errp)

  qemu_mutex_init(&topo->topo_mutex);
  }
+/**
+ * s390_topology_reset:
+ * @dev: the device
+ *
+ * Calls the sysemu topology reset
+ */
+static void s390_topology_reset(DeviceState *dev)
+{
+    s390_cpu_topology_reset();
+}
+
  /**
   * topology_class_init:
   * @oc: Object class
@@ -120,6 +131,7 @@ static void topology_class_init(ObjectClass *oc, void *data)

  dc->realize = s390_topology_realize;
  set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    dc->reset = s390_topology_reset;
  }
  static const TypeInfo cpu_topology_info = {
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index aa99a62e42..362378454a 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -113,6 +113,7 @@ static const char *const reset_dev_types[] = {
  "s390-flic",
  "diag288",
  TYPE_S390_PCI_HOST_BRIDGE,
+    TYPE_S390_CPU_TOPOLOGY,
  };
  static void subsystem_reset(void)
diff --git a/target/s390x/cpu-sysemu.c b/target/s390x/cpu-sysemu.c
index 948e4bd3e0..707c0b658c 100644
--- a/target/s390x/cpu-sysemu.c
+++ b/target/s390x/cpu-sysemu.c
@@ -306,3 +306,10 @@ void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data arg)
  kvm_s390_set_diag318(cs, arg.host_ulong);
  }
  }
+
+void s390_cpu_topology_reset(void)
+{
+    if (kvm_enabled()) {
+    kvm_s390_topology_set_mtcr(0);
+    }
+}
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index f96630440b..9c994d27d5 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2585,3 +2585,26 @@ int kvm_s390_get_zpci_op(void)
  {
  return cap_zpci_op;
  }
+
+int kvm_s390_topology_set_mtcr(uint64_t attr)
+{
+    struct kvm_device_attr attribute = {
+    .group = KVM_S390_VM_CPU_TOPOLOGY,
+    .attr  = attr,
+    };
+    int ret;
+
+    if (!s390_has_feat(S390_FEAT_CONFIGURATION_TOPOLOGY)) {
+    return -EFAULT;


EFAULT is something that indicates a bad address (e.g. a segmentation
fault) ... so this definitely sounds like a bad choice for an error code here.


Hum, yes, ENODEV seems better, no?


-ENOTSUP would be 'better' maybe?  :)


yes better :)

thanks,
Pierre

--
Pierre Morel
IBM Lab Boeblingen



[PATCH v2 3/7] accel/tcg: Use interval tree for TARGET_PAGE_DATA_SIZE

2022-10-27 Thread Richard Henderson
Continue weaning user-only away from PageDesc.

Use an interval tree to record target data.
Chunk the data, to minimize allocation overhead.

Signed-off-by: Richard Henderson 
---
 accel/tcg/internal.h  |  1 -
 accel/tcg/user-exec.c | 99 ---
 2 files changed, 74 insertions(+), 26 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 1bd5a02911..8731dc52e2 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -26,7 +26,6 @@
 typedef struct PageDesc {
 #ifdef CONFIG_USER_ONLY
 unsigned long flags;
-void *target_data;
 #else
 QemuSpin lock;
 /* list of TBs intersecting this ram page */
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index fb7d6ee9e9..42a04bdb21 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -210,47 +210,96 @@ tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
 return addr;
 }
 
+#ifdef TARGET_PAGE_DATA_SIZE
+/*
+ * Allocate chunks of target data together.  For the only current user,
+ * if we allocate one hunk per page, we have overhead of 40/128 or 40%.
+ * Therefore, allocate memory for 64 pages at a time for overhead < 1%.
+ */
+#define TPD_PAGES  64
+#define TBD_MASK   (TARGET_PAGE_MASK * TPD_PAGES)
+
+typedef struct TargetPageDataNode {
+IntervalTreeNode itree;
+char data[TPD_PAGES][TARGET_PAGE_DATA_SIZE] __attribute__((aligned));
+} TargetPageDataNode;
+
+static IntervalTreeRoot targetdata_root;
+
 void page_reset_target_data(target_ulong start, target_ulong end)
 {
-#ifdef TARGET_PAGE_DATA_SIZE
-target_ulong addr, len;
+IntervalTreeNode *n, *next;
+target_ulong last;
 
-/*
- * This function should never be called with addresses outside the
- * guest address space.  If this assert fires, it probably indicates
- * a missing call to h2g_valid.
- */
-assert(end - 1 <= GUEST_ADDR_MAX);
-assert(start < end);
 assert_memory_lock();
 
 start = start & TARGET_PAGE_MASK;
-end = TARGET_PAGE_ALIGN(end);
+last = TARGET_PAGE_ALIGN(end) - 1;
 
-for (addr = start, len = end - start;
- len != 0;
- len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
-PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, 1);
+for (n = interval_tree_iter_first(&targetdata_root, start, last),
+ next = n ? interval_tree_iter_next(n, start, last) : NULL;
+ n != NULL;
+ n = next,
+ next = next ? interval_tree_iter_next(n, start, last) : NULL) {
+target_ulong n_start, n_last, p_ofs, p_len;
+TargetPageDataNode *t;
 
-g_free(p->target_data);
-p->target_data = NULL;
+if (n->start >= start && n->last <= last) {
+interval_tree_remove(n, &targetdata_root);
+g_free(n);
+continue;
+}
+
+if (n->start < start) {
+n_start = start;
+p_ofs = (start - n->start) >> TARGET_PAGE_BITS;
+} else {
+n_start = n->start;
+p_ofs = 0;
+}
+n_last = MIN(last, n->last);
+p_len = (n_last + 1 - n_start) >> TARGET_PAGE_BITS;
+
+t = container_of(n, TargetPageDataNode, itree);
+memset(t->data[p_ofs], 0, p_len * TARGET_PAGE_DATA_SIZE);
 }
-#endif
 }
 
-#ifdef TARGET_PAGE_DATA_SIZE
 void *page_get_target_data(target_ulong address)
 {
-PageDesc *p = page_find(address >> TARGET_PAGE_BITS);
-void *ret = p->target_data;
+IntervalTreeNode *n;
+TargetPageDataNode *t;
+target_ulong page, region;
 
-if (!ret) {
-ret = g_malloc0(TARGET_PAGE_DATA_SIZE);
-p->target_data = ret;
+page = address & TARGET_PAGE_MASK;
+region = address & TBD_MASK;
+
+n = interval_tree_iter_first(&targetdata_root, page, page);
+if (!n) {
+/*
+ * See util/interval-tree.c re lockless lookups: no false positives
+ * but there are false negatives.  If we find nothing, retry with
+ * the mmap lock acquired.  We also need the lock for the
+ * allocation + insert.
+ */
+mmap_lock();
+n = interval_tree_iter_first(&targetdata_root, page, page);
+if (!n) {
+t = g_new0(TargetPageDataNode, 1);
+n = &t->itree;
+n->start = region;
+n->last = region | ~TBD_MASK;
+interval_tree_insert(n, &targetdata_root);
+}
+mmap_unlock();
 }
-return ret;
+
+t = container_of(n, TargetPageDataNode, itree);
+return t->data[(page - region) >> TARGET_PAGE_BITS];
 }
-#endif
+#else
+void page_reset_target_data(target_ulong start, target_ulong end) { }
+#endif /* TARGET_PAGE_DATA_SIZE */
 
 /* The softmmu versions of these helpers are in cputlb.c.  */
 
-- 
2.34.1




Re: [PATCH v10 2/9] s390x/cpu topology: reporting the CPU topology to the guest

2022-10-27 Thread Pierre Morel




On 10/27/22 10:12, Thomas Huth wrote:

On 12/10/2022 18.21, Pierre Morel wrote:

The guest can use the STSI instruction to get a buffer filled
with the CPU topology description.

Let us implement the STSI instruction for the basic CPU topology
level, level 2.

Signed-off-by: Pierre Morel 
---
  include/hw/s390x/cpu-topology.h |   3 +
  target/s390x/cpu.h  |  48 ++
  hw/s390x/cpu-topology.c |   8 ++-
  target/s390x/cpu_topology.c | 109 
  target/s390x/kvm/kvm.c  |   6 +-
  target/s390x/meson.build    |   1 +
  6 files changed, 172 insertions(+), 3 deletions(-)
  create mode 100644 target/s390x/cpu_topology.c

diff --git a/include/hw/s390x/cpu-topology.h b/include/hw/s390x/cpu-topology.h
index 66c171d0bc..61c11db017 100644
--- a/include/hw/s390x/cpu-topology.h
+++ b/include/hw/s390x/cpu-topology.h
@@ -13,6 +13,8 @@
  #include "hw/qdev-core.h"
  #include "qom/object.h"
+#define S390_TOPOLOGY_POLARITY_H  0x00
+
  typedef struct S390TopoContainer {
  int active_count;
  } S390TopoContainer;
@@ -29,6 +31,7 @@ struct S390Topology {
  S390TopoContainer *socket;
  S390TopoTLE *tle;
  MachineState *ms;
+    QemuMutex topo_mutex;
  };
  #define TYPE_S390_CPU_TOPOLOGY "s390-topology"
diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index 7d6d01325b..d604aa9c78 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -565,6 +565,52 @@ typedef union SysIB {
  } SysIB;
  QEMU_BUILD_BUG_ON(sizeof(SysIB) != 4096);
+/* CPU type Topology List Entry */
+typedef struct SysIBTl_cpu {
+    uint8_t nl;
+    uint8_t reserved0[3];
+    uint8_t reserved1:5;
+    uint8_t dedicated:1;
+    uint8_t polarity:2;
+    uint8_t type;
+    uint16_t origin;
+    uint64_t mask;
+} QEMU_PACKED QEMU_ALIGNED(8) SysIBTl_cpu;
+QEMU_BUILD_BUG_ON(sizeof(SysIBTl_cpu) != 16);
+
+/* Container type Topology List Entry */
+typedef struct SysIBTl_container {
+    uint8_t nl;
+    uint8_t reserved[6];
+    uint8_t id;
+} QEMU_PACKED QEMU_ALIGNED(8) SysIBTl_container;
+QEMU_BUILD_BUG_ON(sizeof(SysIBTl_container) != 8);
+
+#define TOPOLOGY_NR_MAG  6
+#define TOPOLOGY_NR_MAG6 0
+#define TOPOLOGY_NR_MAG5 1
+#define TOPOLOGY_NR_MAG4 2
+#define TOPOLOGY_NR_MAG3 3
+#define TOPOLOGY_NR_MAG2 4
+#define TOPOLOGY_NR_MAG1 5
+/* Configuration topology */
+typedef struct SysIB_151x {
+    uint8_t  reserved0[2];
+    uint16_t length;
+    uint8_t  mag[TOPOLOGY_NR_MAG];
+    uint8_t  reserved1;
+    uint8_t  mnest;
+    uint32_t reserved2;
+    char tle[0];
+} QEMU_PACKED QEMU_ALIGNED(8) SysIB_151x;
+QEMU_BUILD_BUG_ON(sizeof(SysIB_151x) != 16);
+
+/* Max size of a SYSIB structure is when all CPUs are alone in a container */
+#define S390_TOPOLOGY_SYSIB_SIZE (sizeof(SysIB_151x) + \
+                                  S390_MAX_CPUS * (sizeof(SysIBTl_container) + \
+                                                   sizeof(SysIBTl_cpu)))
+
+
  /* MMU defines */
  #define ASCE_ORIGIN   (~0xfffULL) /* segment table origin */
  #define ASCE_SUBSPACE 0x200   /* subspace group control   */

@@ -843,4 +889,6 @@ S390CPU *s390_cpu_addr2state(uint16_t cpu_addr);
  #include "exec/cpu-all.h"
+void insert_stsi_15_1_x(S390CPU *cpu, int sel2, __u64 addr, uint8_t ar);
+
  #endif
diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index 42b22a1831..c73cebfe6f 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -54,8 +54,6 @@ void s390_topology_new_cpu(int core_id)
  return;
  }
-    socket_id = core_id / topo->cpus;
-
  /*
   * At the core level, each CPU is represented by a bit in a 64bit
   * unsigned long which represent the presence of a CPU.
@@ -76,8 +74,13 @@ void s390_topology_new_cpu(int core_id)
  bit %= 64;
  bit = 63 - bit;
+    qemu_mutex_lock(&topo->topo_mutex);
+
+    socket_id = core_id / topo->cpus;
  topo->socket[socket_id].active_count++;
  set_bit(bit, &topo->tle[socket_id].mask[origin]);
+
+    qemu_mutex_unlock(&topo->topo_mutex);
  }
  /**
@@ -101,6 +104,7 @@ static void s390_topology_realize(DeviceState *dev, Error **errp)
  topo->tle = g_new0(S390TopoTLE, ms->smp.max_cpus);
  topo->ms = ms;
+    qemu_mutex_init(&topo->topo_mutex);
  }
  /**
diff --git a/target/s390x/cpu_topology.c b/target/s390x/cpu_topology.c
new file mode 100644
index 00..df86a98f23
--- /dev/null
+++ b/target/s390x/cpu_topology.c
@@ -0,0 +1,109 @@
+/*
+ * QEMU S390x CPU Topology
+ *
+ * Copyright IBM Corp. 2022
+ * Author(s): Pierre Morel 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "hw/s390x/pv.h"
+#include "hw/sysbus.h"
+#include "hw/s390x/cpu-topology.h"
+#include "hw/s390x/sclp.h"
+
+#define S390_TOPOLOGY_MAX_ST

Re: [PATCH 8/9] target/s390x: Use Int128 for returning float128

2022-10-27 Thread Ilya Leoshkevich
On Fri, Oct 21, 2022 at 05:30:05PM +1000, Richard Henderson wrote:
> Signed-off-by: Richard Henderson 
> ---
>  target/s390x/helper.h  | 22 +++
>  target/s390x/tcg/fpu_helper.c  | 29 ++--
>  target/s390x/tcg/translate.c   | 49 +++---
>  target/s390x/tcg/insn-data.def | 20 +++---
>  4 files changed, 63 insertions(+), 57 deletions(-)
> 

> @@ -2032,7 +2031,7 @@ static DisasJumpType op_cxlgb(DisasContext *s, DisasOps *o)
>  if (!m34) {
>  return DISAS_NORETURN;
>  }
> -gen_helper_cxlgb(o->out, cpu_env, o->in2, m34);
> +gen_helper_cxlgb(o->out_128, cpu_env, o->in2, m34);
>  tcg_temp_free_i32(m34);
>  return_low128(o->out2);
>  return DISAS_NEXT;

Do we still need return_low128() here?

>  static DisasJumpType op_lxeb(DisasContext *s, DisasOps *o)
>  {
> -gen_helper_lxeb(o->out, cpu_env, o->in2);
> +gen_helper_lxeb(o->out_128, cpu_env, o->in2);
>  return_low128(o->out2);
>  return DISAS_NEXT;
>  }

Same question.



[PATCH] target/arm: Fixed Privileged Access Never (PAN) for aarch32

2022-10-27 Thread Timofey Kutergin
- Use CPSR.PAN to check for the PAN state in aarch32 mode
- Throw a permission fault during address translation when PAN is
  enabled and the kernel tries to access a user-accessible page
- Ignore the SCTLR_XP bit for armv7 and armv8 (conflicts with SCTLR_SPAN).

Signed-off-by: Timofey Kutergin 
---
 target/arm/helper.c | 13 +++--
 target/arm/ptw.c| 35 ++-
 2 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index c672903f43..4301478ed8 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -10992,6 +10992,15 @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
 }
 #endif
 
+static bool arm_pan_enabled(CPUARMState *env)
+{
+if (is_a64(env)) {
+return env->pstate & PSTATE_PAN;
+} else {
+return env->uncached_cpsr & CPSR_PAN;
+}
+}
+
 ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
 {
 ARMMMUIdx idx;
@@ -11012,7 +11021,7 @@ ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
 }
 break;
 case 1:
-if (env->pstate & PSTATE_PAN) {
+if (arm_pan_enabled(env)) {
 idx = ARMMMUIdx_E10_1_PAN;
 } else {
 idx = ARMMMUIdx_E10_1;
@@ -11021,7 +11030,7 @@ ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
 case 2:
 /* Note that TGE does not apply at EL2.  */
 if (arm_hcr_el2_eff(env) & HCR_E2H) {
-if (env->pstate & PSTATE_PAN) {
+if (arm_pan_enabled(env)) {
 idx = ARMMMUIdx_E20_2_PAN;
 } else {
 idx = ARMMMUIdx_E20_2;
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 6c5ed56a10..a82accab40 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -433,12 +433,11 @@ static bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
  * @mmu_idx: MMU index indicating required translation regime
  * @ap:  The 3-bit access permissions (AP[2:0])
  * @domain_prot: The 2-bit domain access permissions
+ * @is_user: TRUE if accessing from PL0
  */
-static int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
- int ap, int domain_prot)
+static int ap_to_rw_prot_is_user(CPUARMState *env, ARMMMUIdx mmu_idx,
+ int ap, int domain_prot, bool is_user)
 {
-bool is_user = regime_is_user(env, mmu_idx);
-
 if (domain_prot == 3) {
 return PAGE_READ | PAGE_WRITE;
 }
@@ -482,6 +481,20 @@ static int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
 }
 }
 
+/*
+ * Translate section/page access permissions to page R/W protection flags
+ * @env: CPUARMState
+ * @mmu_idx: MMU index indicating required translation regime
+ * @ap:  The 3-bit access permissions (AP[2:0])
+ * @domain_prot: The 2-bit domain access permissions
+ */
+static int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
+ int ap, int domain_prot)
+{
+   return ap_to_rw_prot_is_user(env, mmu_idx, ap, domain_prot,
+regime_is_user(env, mmu_idx));
+}
+
 /*
  * Translate section/page access permissions to page R/W protection flags.
  * @ap:  The 2-bit simple AP (AP[2:1])
@@ -644,6 +657,7 @@ static bool get_phys_addr_v6(CPUARMState *env, S1Translate *ptw,
 hwaddr phys_addr;
 uint32_t dacr;
 bool ns;
+int user_prot;
 
 /* Pagetable walk.  */
 /* Lookup l1 descriptor.  */
@@ -749,8 +763,10 @@ static bool get_phys_addr_v6(CPUARMState *env, S1Translate *ptw,
 goto do_fault;
 }
 result->f.prot = simple_ap_to_rw_prot(env, mmu_idx, ap >> 1);
+user_prot = simple_ap_to_rw_prot_is_user(ap >> 1, 1);
 } else {
 result->f.prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
+user_prot = ap_to_rw_prot_is_user(env, mmu_idx, ap, domain_prot, 1);
 }
 if (result->f.prot && !xn) {
 result->f.prot |= PAGE_EXEC;
@@ -760,6 +776,14 @@ static bool get_phys_addr_v6(CPUARMState *env, S1Translate *ptw,
 fi->type = ARMFault_Permission;
 goto do_fault;
 }
+if (regime_is_pan(env, mmu_idx) &&
+!regime_is_user(env, mmu_idx) &&
+user_prot &&
+access_type != MMU_INST_FETCH) {
+/* Privileged Access Never fault */
+fi->type = ARMFault_Permission;
+goto do_fault;
+}
 }
 if (ns) {
 /* The NS bit will (as required by the architecture) have no effect if
@@ -2606,7 +2630,8 @@ static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
 if (regime_using_lpae_format(env, mmu_idx)) {
 return get_phys_addr_lpae(env, ptw, address, access_type, false,
   result, fi);
-} else if (regime_sctlr(env, mmu_idx) & SCTLR_XP) {
+} else if (arm_feature(env, ARM_FEATURE_V7) ||
+   regime_sctlr(env, mmu_idx) & SC

[PATCH v2 5/7] accel/tcg: Use interval tree for user-only page tracking

2022-10-27 Thread Richard Henderson
Finish weaning user-only away from PageDesc.

Using an interval tree to track page permissions means that
we can represent very large regions efficiently.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/290
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/967
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1214
Signed-off-by: Richard Henderson 
---
 accel/tcg/internal.h   |   4 +-
 accel/tcg/tb-maint.c   |  20 +-
 accel/tcg/user-exec.c  | 614 ++---
 tests/tcg/multiarch/test-vma.c |  22 ++
 4 files changed, 451 insertions(+), 209 deletions(-)
 create mode 100644 tests/tcg/multiarch/test-vma.c

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 250f0daac9..c7e157d1cd 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -24,9 +24,7 @@
 #endif
 
 typedef struct PageDesc {
-#ifdef CONFIG_USER_ONLY
-unsigned long flags;
-#else
+#ifndef CONFIG_USER_ONLY
 QemuSpin lock;
 /* list of TBs intersecting this ram page */
 uintptr_t first_tb;
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 14e8e47a6a..694440cb4a 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -68,15 +68,23 @@ static void page_flush_tb(void)
 /* Call with mmap_lock held. */
 static void tb_page_add(TranslationBlock *tb, PageDesc *p1, PageDesc *p2)
 {
-/* translator_loop() must have made all TB pages non-writable */
-assert(!(p1->flags & PAGE_WRITE));
-if (p2) {
-assert(!(p2->flags & PAGE_WRITE));
-}
+target_ulong addr;
+int flags;
 
 assert_memory_lock();
-
 tb->itree.last = tb->itree.start + tb->size - 1;
+
+/* translator_loop() must have made all TB pages non-writable */
+addr = tb_page_addr0(tb);
+flags = page_get_flags(addr);
+assert(!(flags & PAGE_WRITE));
+
+addr = tb_page_addr1(tb);
+if (addr != -1) {
+flags = page_get_flags(addr);
+assert(!(flags & PAGE_WRITE));
+}
+
 interval_tree_insert(&tb->itree, &tb_root);
 }
 
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index 22ef780900..d71404f49c 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -135,106 +135,61 @@ bool handle_sigsegv_accerr_write(CPUState *cpu, sigset_t *old_set,
 }
 }
 
-/*
- * Walks guest process memory "regions" one by one
- * and calls callback function 'fn' for each region.
- */
-struct walk_memory_regions_data {
-walk_memory_regions_fn fn;
-void *priv;
-target_ulong start;
-int prot;
-};
+typedef struct PageFlagsNode {
+IntervalTreeNode itree;
+int flags;
+} PageFlagsNode;
 
-static int walk_memory_regions_end(struct walk_memory_regions_data *data,
-   target_ulong end, int new_prot)
+static IntervalTreeRoot pageflags_root;
+
+static PageFlagsNode *pageflags_find(target_ulong start, target_long last)
 {
-if (data->start != -1u) {
-int rc = data->fn(data->priv, data->start, end, data->prot);
-if (rc != 0) {
-return rc;
-}
-}
+IntervalTreeNode *n;
 
-data->start = (new_prot ? end : -1u);
-data->prot = new_prot;
-
-return 0;
+n = interval_tree_iter_first(&pageflags_root, start, last);
+return n ? container_of(n, PageFlagsNode, itree) : NULL;
 }
 
-static int walk_memory_regions_1(struct walk_memory_regions_data *data,
- target_ulong base, int level, void **lp)
+static PageFlagsNode *pageflags_next(PageFlagsNode *p, target_ulong start,
+ target_long last)
 {
-target_ulong pa;
-int i, rc;
+IntervalTreeNode *n;
 
-if (*lp == NULL) {
-return walk_memory_regions_end(data, base, 0);
-}
-
-if (level == 0) {
-PageDesc *pd = *lp;
-
-for (i = 0; i < V_L2_SIZE; ++i) {
-int prot = pd[i].flags;
-
-pa = base | (i << TARGET_PAGE_BITS);
-if (prot != data->prot) {
-rc = walk_memory_regions_end(data, pa, prot);
-if (rc != 0) {
-return rc;
-}
-}
-}
-} else {
-void **pp = *lp;
-
-for (i = 0; i < V_L2_SIZE; ++i) {
-pa = base | ((target_ulong)i <<
-(TARGET_PAGE_BITS + V_L2_BITS * level));
-rc = walk_memory_regions_1(data, pa, level - 1, pp + i);
-if (rc != 0) {
-return rc;
-}
-}
-}
-
-return 0;
+n = interval_tree_iter_next(&p->itree, start, last);
+return n ? container_of(n, PageFlagsNode, itree) : NULL;
 }
 
 int walk_memory_regions(void *priv, walk_memory_regions_fn fn)
 {
-struct walk_memory_regions_data data;
-uintptr_t i, l1_sz = v_l1_size;
+IntervalTreeNode *n;
+int rc = 0;
 
-data.fn = fn;
-data.priv = priv;
-data.start = -1u;
-data.prot = 0;
+mmap_lock();
+for (n = interval_tree_iter_first(&pageflags_root, 0, -1);
+ 

Re: [PATCH v10 7/9] s390x/cpu topology: add max_threads machine class attribute

2022-10-27 Thread Pierre Morel




On 10/27/22 12:00, Cédric Le Goater wrote:

Hello Pierre,

On 10/12/22 18:21, Pierre Morel wrote:

The S390 CPU topology accepts the smp.threads argument while
in reality it does not effectively allow multithreading.

Let's keep this behavior for machines older than 7.3 and
refuse to use threads in newer machines until multithreading
is really proposed to the guest by the machine.


This change is unrelated to the rest of the series and we could merge it
for 7.2. We still have time for it.


OK, then I send it on its own

Regards,
Pierre

...

--
Pierre Morel
IBM Lab Boeblingen



[PATCH v2 7/7] accel/tcg: Move remainder of page locking to tb-maint.c

2022-10-27 Thread Richard Henderson
The only thing that still touches PageDesc in translate-all.c
are some locking routines related to tb-maint.c which have not
yet been moved.  Do so now.

Move some code up in tb-maint.c as well, to untangle the maze
of ifdefs, and allow a sensible final ordering.

Move some declarations from exec/translate-all.h to internal.h,
as they are only used within accel/tcg/.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 accel/tcg/internal.h |  68 ++-
 include/exec/translate-all.h |   6 -
 accel/tcg/tb-maint.c | 352 +--
 accel/tcg/translate-all.c| 301 --
 4 files changed, 352 insertions(+), 375 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index c6c9e02cfd..6ce1437c58 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -23,62 +23,28 @@
 #define assert_memory_lock() tcg_debug_assert(have_mmap_lock())
 #endif
 
-typedef struct PageDesc PageDesc;
-#ifndef CONFIG_USER_ONLY
-struct PageDesc {
-QemuSpin lock;
-/* list of TBs intersecting this ram page */
-uintptr_t first_tb;
-};
-
-PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc);
-
-static inline PageDesc *page_find(tb_page_addr_t index)
-{
-return page_find_alloc(index, false);
-}
-
-void page_table_config_init(void);
-#else
-static inline void page_table_config_init(void) { }
-#endif
-
-/* list iterators for lists of tagged pointers in TranslationBlock */
-#define TB_FOR_EACH_TAGGED(head, tb, n, field)  \
-for (n = (head) & 1, tb = (TranslationBlock *)((head) & ~1);\
- tb; tb = (TranslationBlock *)tb->field[n], n = (uintptr_t)tb & 1, \
- tb = (TranslationBlock *)((uintptr_t)tb & ~1))
-
-#define TB_FOR_EACH_JMP(head_tb, tb, n) \
-TB_FOR_EACH_TAGGED((head_tb)->jmp_list_head, tb, n, jmp_list_next)
-
-/* In user-mode page locks aren't used; mmap_lock is enough */
-#ifdef CONFIG_USER_ONLY
-#define assert_page_locked(pd) tcg_debug_assert(have_mmap_lock())
-static inline void page_lock(PageDesc *pd) { }
-static inline void page_unlock(PageDesc *pd) { }
-#else
-#ifdef CONFIG_DEBUG_TCG
-void do_assert_page_locked(const PageDesc *pd, const char *file, int line);
-#define assert_page_locked(pd) do_assert_page_locked(pd, __FILE__, __LINE__)
-#else
-#define assert_page_locked(pd)
-#endif
-void page_lock(PageDesc *pd);
-void page_unlock(PageDesc *pd);
-
-/* TODO: For now, still shared with translate-all.c for system mode. */
-typedef int PageForEachNext;
-#define PAGE_FOR_EACH_TB(start, end, pagedesc, tb, n) \
-TB_FOR_EACH_TAGGED((pagedesc)->first_tb, tb, n, page_next)
-
-#endif
-#if !defined(CONFIG_USER_ONLY) && defined(CONFIG_DEBUG_TCG)
+#if defined(CONFIG_SOFTMMU) && defined(CONFIG_DEBUG_TCG)
 void assert_no_pages_locked(void);
 #else
 static inline void assert_no_pages_locked(void) { }
 #endif
 
+#ifdef CONFIG_USER_ONLY
+static inline void page_table_config_init(void) { }
+#else
+void page_table_config_init(void);
+#endif
+
+#ifdef CONFIG_SOFTMMU
+struct page_collection;
+void tb_invalidate_phys_page_fast(struct page_collection *pages,
+  tb_page_addr_t start, int len,
+  uintptr_t retaddr);
+struct page_collection *page_collection_lock(tb_page_addr_t start,
+ tb_page_addr_t end);
+void page_collection_unlock(struct page_collection *set);
+#endif /* CONFIG_SOFTMMU */
+
 TranslationBlock *tb_gen_code(CPUState *cpu, target_ulong pc,
   target_ulong cs_base, uint32_t flags,
   int cflags);
diff --git a/include/exec/translate-all.h b/include/exec/translate-all.h
index 3e9cb91565..88602ae8d8 100644
--- a/include/exec/translate-all.h
+++ b/include/exec/translate-all.h
@@ -23,12 +23,6 @@
 
 
 /* translate-all.c */
-struct page_collection *page_collection_lock(tb_page_addr_t start,
- tb_page_addr_t end);
-void page_collection_unlock(struct page_collection *set);
-void tb_invalidate_phys_page_fast(struct page_collection *pages,
-  tb_page_addr_t start, int len,
-  uintptr_t retaddr);
 void tb_invalidate_phys_page(tb_page_addr_t addr);
 void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr);
 
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 31d0a74aa9..8fe2d322db 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -30,6 +30,15 @@
 #include "internal.h"
 
 
+/* List iterators for lists of tagged pointers in TranslationBlock. */
+#define TB_FOR_EACH_TAGGED(head, tb, n, field)  \
+for (n = (head) & 1, tb = (TranslationBlock *)((head) & ~1);\
+ tb; tb = (TranslationBlock *)tb->field[n], n = (uintptr_t)tb & 1, \
+ tb = (TranslationBlock *)((uintptr_t)tb & ~1))
+
+#define TB_FOR_EAC

[PATCH v2 4/7] accel/tcg: Move page_{get,set}_flags to user-exec.c

2022-10-27 Thread Richard Henderson
This page tracking implementation is specific to user-only,
since the system softmmu version is in cputlb.c.  Move it
out of translate-all.c to user-exec.c.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 accel/tcg/internal.h  |  17 ++
 accel/tcg/translate-all.c | 350 --
 accel/tcg/user-exec.c | 346 +
 3 files changed, 363 insertions(+), 350 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 8731dc52e2..250f0daac9 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -33,6 +33,23 @@ typedef struct PageDesc {
 #endif
 } PageDesc;
 
+/*
+ * In system mode we want L1_MAP to be based on ram offsets,
+ * while in user mode we want it to be based on virtual addresses.
+ *
+ * TODO: For user mode, see the caveat re host vs guest virtual
+ * address spaces near GUEST_ADDR_MAX.
+ */
+#if !defined(CONFIG_USER_ONLY)
+#if HOST_LONG_BITS < TARGET_PHYS_ADDR_SPACE_BITS
+# define L1_MAP_ADDR_SPACE_BITS  HOST_LONG_BITS
+#else
+# define L1_MAP_ADDR_SPACE_BITS  TARGET_PHYS_ADDR_SPACE_BITS
+#endif
+#else
+# define L1_MAP_ADDR_SPACE_BITS  MIN(HOST_LONG_BITS, TARGET_ABI_BITS)
+#endif
+
 /* Size of the L2 (and L3, etc) page tables.  */
 #define V_L2_BITS 10
 #define V_L2_SIZE (1 << V_L2_BITS)
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index dc7973eb3b..0f8f8e5bef 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -109,23 +109,6 @@ struct page_collection {
 struct page_entry *max;
 };
 
-/*
- * In system mode we want L1_MAP to be based on ram offsets,
- * while in user mode we want it to be based on virtual addresses.
- *
- * TODO: For user mode, see the caveat re host vs guest virtual
- * address spaces near GUEST_ADDR_MAX.
- */
-#if !defined(CONFIG_USER_ONLY)
-#if HOST_LONG_BITS < TARGET_PHYS_ADDR_SPACE_BITS
-# define L1_MAP_ADDR_SPACE_BITS  HOST_LONG_BITS
-#else
-# define L1_MAP_ADDR_SPACE_BITS  TARGET_PHYS_ADDR_SPACE_BITS
-#endif
-#else
-# define L1_MAP_ADDR_SPACE_BITS  MIN(HOST_LONG_BITS, TARGET_ABI_BITS)
-#endif
-
 /* Make sure all possible CPU event bits fit in tb->trace_vcpu_dstate */
 QEMU_BUILD_BUG_ON(CPU_TRACE_DSTATE_MAX_EVENTS >
   sizeof_field(TranslationBlock, trace_vcpu_dstate)
@@ -1222,339 +1205,6 @@ void cpu_interrupt(CPUState *cpu, int mask)
 qatomic_set(&cpu_neg(cpu)->icount_decr.u16.high, -1);
 }
 
-/*
- * Walks guest process memory "regions" one by one
- * and calls callback function 'fn' for each region.
- */
-struct walk_memory_regions_data {
-walk_memory_regions_fn fn;
-void *priv;
-target_ulong start;
-int prot;
-};
-
-static int walk_memory_regions_end(struct walk_memory_regions_data *data,
-   target_ulong end, int new_prot)
-{
-if (data->start != -1u) {
-int rc = data->fn(data->priv, data->start, end, data->prot);
-if (rc != 0) {
-return rc;
-}
-}
-
-data->start = (new_prot ? end : -1u);
-data->prot = new_prot;
-
-return 0;
-}
-
-static int walk_memory_regions_1(struct walk_memory_regions_data *data,
- target_ulong base, int level, void **lp)
-{
-target_ulong pa;
-int i, rc;
-
-if (*lp == NULL) {
-return walk_memory_regions_end(data, base, 0);
-}
-
-if (level == 0) {
-PageDesc *pd = *lp;
-
-for (i = 0; i < V_L2_SIZE; ++i) {
-int prot = pd[i].flags;
-
-pa = base | (i << TARGET_PAGE_BITS);
-if (prot != data->prot) {
-rc = walk_memory_regions_end(data, pa, prot);
-if (rc != 0) {
-return rc;
-}
-}
-}
-} else {
-void **pp = *lp;
-
-for (i = 0; i < V_L2_SIZE; ++i) {
-pa = base | ((target_ulong)i <<
-(TARGET_PAGE_BITS + V_L2_BITS * level));
-rc = walk_memory_regions_1(data, pa, level - 1, pp + i);
-if (rc != 0) {
-return rc;
-}
-}
-}
-
-return 0;
-}
-
-int walk_memory_regions(void *priv, walk_memory_regions_fn fn)
-{
-struct walk_memory_regions_data data;
-uintptr_t i, l1_sz = v_l1_size;
-
-data.fn = fn;
-data.priv = priv;
-data.start = -1u;
-data.prot = 0;
-
-for (i = 0; i < l1_sz; i++) {
-target_ulong base = i << (v_l1_shift + TARGET_PAGE_BITS);
-int rc = walk_memory_regions_1(&data, base, v_l2_levels, l1_map + i);
-if (rc != 0) {
-return rc;
-}
-}
-
-return walk_memory_regions_end(&data, 0, 0);
-}
-
-static int dump_region(void *priv, target_ulong start,
-target_ulong end, unsigned long prot)
-{
-FILE *f = (FILE *)priv;
-
-(void) fprintf(f, TARGET_FMT_lx"-"TARGET_FMT_lx
-" "TARGET_FMT_lx" %c%c%c\n",
-start, end, end - start,
-((prot & PAGE_READ) ? 'r' : '-'),
-((prot & PA

[PATCH v2 1/7] util: Add interval-tree.c

2022-10-27 Thread Richard Henderson
Copy and simplify the Linux kernel's interval_tree_generic.h,
instantiating for uint64_t.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 include/qemu/interval-tree.h|  99 
 tests/unit/test-interval-tree.c | 209 
 util/interval-tree.c| 882 
 tests/unit/meson.build  |   1 +
 util/meson.build|   1 +
 5 files changed, 1192 insertions(+)
 create mode 100644 include/qemu/interval-tree.h
 create mode 100644 tests/unit/test-interval-tree.c
 create mode 100644 util/interval-tree.c

diff --git a/include/qemu/interval-tree.h b/include/qemu/interval-tree.h
new file mode 100644
index 00..25006debe8
--- /dev/null
+++ b/include/qemu/interval-tree.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Interval trees.
+ *
+ * Derived from include/linux/interval_tree.h and its dependencies.
+ */
+
+#ifndef QEMU_INTERVAL_TREE_H
+#define QEMU_INTERVAL_TREE_H
+
+/*
+ * For now, don't expose Linux Red-Black Trees separately, but retain the
+ * separate type definitions to keep the implementation sane, and allow
+ * the possibility of disentangling them later.
+ */
+typedef struct RBNode
+{
+/* Encodes parent with color in the lsb. */
+uintptr_t rb_parent_color;
+struct RBNode *rb_right;
+struct RBNode *rb_left;
+} RBNode;
+
+typedef struct RBRoot
+{
+RBNode *rb_node;
+} RBRoot;
+
+typedef struct RBRootLeftCached {
+RBRoot rb_root;
+RBNode *rb_leftmost;
+} RBRootLeftCached;
+
+typedef struct IntervalTreeNode
+{
+RBNode rb;
+
+uint64_t start;/* Start of interval */
+uint64_t last; /* Last location _in_ interval */
+uint64_t subtree_last;
+} IntervalTreeNode;
+
+typedef RBRootLeftCached IntervalTreeRoot;
+
+/**
+ * interval_tree_is_empty
+ * @root: root of the tree.
+ *
+ * Returns true if the tree contains no nodes.
+ */
+static inline bool interval_tree_is_empty(const IntervalTreeRoot *root)
+{
+return root->rb_root.rb_node == NULL;
+}
+
+/**
+ * interval_tree_insert
+ * @node: node to insert,
+ * @root: root of the tree.
+ *
+ * Insert @node into @root, and rebalance.
+ */
+void interval_tree_insert(IntervalTreeNode *node, IntervalTreeRoot *root);
+
+/**
+ * interval_tree_remove
+ * @node: node to remove,
+ * @root: root of the tree.
+ *
+ * Remove @node from @root, and rebalance.
+ */
+void interval_tree_remove(IntervalTreeNode *node, IntervalTreeRoot *root);
+
+/**
+ * interval_tree_iter_first:
+ * @root: root of the tree,
+ * @start, @last: the inclusive interval [start, last].
+ *
+ * Locate the "first" of a set of nodes within the tree at @root
+ * that overlap the interval, where "first" is sorted by start.
+ * Returns NULL if no overlap found.
+ */
+IntervalTreeNode *interval_tree_iter_first(IntervalTreeRoot *root,
+   uint64_t start, uint64_t last);
+
+/**
+ * interval_tree_iter_next:
+ * @node: previous search result
+ * @start, @last: the inclusive interval [start, last].
+ *
+ * Locate the "next" of a set of nodes within the tree that overlap the
+ * interval; @node is the result of a previous call to
+ * interval_tree_iter_{first,next}.  Returns NULL if @node was the last
+ * node in the set.
+ */
+IntervalTreeNode *interval_tree_iter_next(IntervalTreeNode *node,
+  uint64_t start, uint64_t last);
+
+#endif /* QEMU_INTERVAL_TREE_H */
diff --git a/tests/unit/test-interval-tree.c b/tests/unit/test-interval-tree.c
new file mode 100644
index 00..119817a019
--- /dev/null
+++ b/tests/unit/test-interval-tree.c
@@ -0,0 +1,209 @@
+/*
+ * Test interval trees
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/interval-tree.h"
+
+static IntervalTreeNode nodes[20];
+static IntervalTreeRoot root;
+
+static void rand_interval(IntervalTreeNode *n, uint64_t start, uint64_t last)
+{
+gint32 s_ofs, l_ofs, l_max;
+
+if (last - start > INT32_MAX) {
+l_max = INT32_MAX;
+} else {
+l_max = last - start;
+}
+s_ofs = g_test_rand_int_range(0, l_max);
+l_ofs = g_test_rand_int_range(s_ofs, l_max);
+
+n->start = start + s_ofs;
+n->last = start + l_ofs;
+}
+
+static void test_empty(void)
+{
+g_assert(root.rb_root.rb_node == NULL);
+g_assert(root.rb_leftmost == NULL);
+g_assert(interval_tree_iter_first(&root, 0, UINT64_MAX) == NULL);
+}
+
+static void test_find_one_point(void)
+{
+/* Create a tree of a single node, which is the point [1,1]. */
+nodes[0].start = 1;
+nodes[0].last = 1;
+
+interval_tree_insert(&nodes[0], &root);
+
+g_assert(interval_tree_iter_first(&root, 0, 9) == &nodes[0]);
+g_assert(interval_tree_iter_next(&nodes[0], 0, 9) == NULL);
+g_assert(interval_tree_iter_first(&root, 0, 0) == NULL);
+g_assert(interval_tree_iter_next(&nodes[0

Re: [PATCH v2 1/6] accel/tcg: Introduce cpu_unwind_state_data

2022-10-27 Thread Richard Henderson

On 10/27/22 20:40, Claudio Fontana wrote:

On 10/27/22 12:02, Richard Henderson wrote:

Add a way to examine the unwind data without actually
restoring the data back into env.

Signed-off-by: Richard Henderson 
---
  accel/tcg/internal.h  |  4 +--
  include/exec/exec-all.h   | 21 ---
  accel/tcg/translate-all.c | 74 ++-
  3 files changed, 68 insertions(+), 31 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 1227bb69bd..9c06b320b7 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -106,8 +106,8 @@ void tb_reset_jump(TranslationBlock *tb, int n);
  TranslationBlock *tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
 tb_page_addr_t phys_page2);
  bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc);
-int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
-  uintptr_t searched_pc, bool reset_icount);
+void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
+   uintptr_t host_pc, bool reset_icount);
  
  /* Return the current PC from CPU, which may be cached in TB. */

  static inline target_ulong log_pc(CPUState *cpu, const TranslationBlock *tb)
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index e948992a80..7d851f5907 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -39,20 +39,33 @@ typedef ram_addr_t tb_page_addr_t;
  #define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
  #endif
  
+/**

+ * cpu_unwind_state_data:
+ * @cpu: the cpu context
+ * @host_pc: the host pc within the translation
+ * @data: output data
+ *
+ * Attempt to load the unwind state for a host pc occurring in
+ * translated code.  If @host_pc is not in translated code, the
+ * function returns false; otherwise @data is loaded.
+ * This is the same unwind info as given to restore_state_to_opc.
+ */
+bool cpu_unwind_state_data(CPUState *cpu, uintptr_t host_pc, uint64_t *data);
+
  /**
   * cpu_restore_state:
- * @cpu: the vCPU state is to be restore to
- * @searched_pc: the host PC the fault occurred at
+ * @cpu: the cpu context
+ * @host_pc: the host pc within the translation
   * @will_exit: true if the TB executed will be interrupted after some
 cpu adjustments. Required for maintaining the correct
 icount values
   * @return: true if state was restored, false otherwise
   *
   * Attempt to restore the state for a fault occurring in translated
- * code. If the searched_pc is not in translated code no state is
+ * code. If @host_pc is not in translated code no state is
   * restored and the function returns false.
   */
-bool cpu_restore_state(CPUState *cpu, uintptr_t searched_pc, bool will_exit);
+bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc, bool will_exit);
  
  G_NORETURN void cpu_loop_exit_noexc(CPUState *cpu);

  G_NORETURN void cpu_loop_exit(CPUState *cpu);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index f185356a36..319becb698 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -247,52 +247,66 @@ static int encode_search(TranslationBlock *tb, uint8_t 
*block)
  return p - block;
  }
  
-/* The cpu state corresponding to 'searched_pc' is restored.

- * When reset_icount is true, current TB will be interrupted and
- * icount should be recalculated.
- */
-int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
-  uintptr_t searched_pc, bool reset_icount)



Maybe add a small comment about what the return value of this static function 
means?
It can be indirectly inferred from its point of use:

  +int insns_left = cpu_unwind_data_from_tb(tb, host_pc, data);

But I find having the information about the meaning of a function and return 
value useful to be available there.

IIUC for external functions the standard way is to document in the header 
files, but for the static functions I would think we can do it here.

With that Reviewed-by: Claudio Fontana 



I added

+/**
+ * cpu_unwind_data_from_tb: Load unwind data for TB
+ * @tb: translation block
+ * @host_pc: the host pc within translation
+ * @data: output array
+ *
+ * Within @tb, locate the guest insn whose translation contains @host_pc,
+ * then load the unwind data created by INDEX_opc_start_insn for that
+ * guest insn.  Return the number of guest insns which remain un-executed
+ * within @tb -- these must be credited back to the cpu's icount budget.
+ *
+ * If we could not determine which guest insn to which @host_pc belongs,
+ * return -1 and do not load unwind data.
+ * FIXME: Such a failure is likely to break the guest, as we were not
+ * expecting to unwind from such a location.  This may be some sort of
+ * backend code generation problem.  Consider asserting instead.
  */

Which I think captures some of your v1 comments as well.


r~



[PATCH v2 6/7] accel/tcg: Move PageDesc tree into tb-maint.c for system

2022-10-27 Thread Richard Henderson
Now that PageDesc is not used for user-only, and for system
it is only used for tb maintenance, move the implementation
into tb-maint.c, appropriately ifdefed.

We have not yet eliminated all references to PageDesc for
user-only, so retain a typedef to the structure without definition.

Signed-off-by: Richard Henderson 
---
 accel/tcg/internal.h  |  49 +++---
 accel/tcg/tb-maint.c  | 130 --
 accel/tcg/translate-all.c |  95 
 3 files changed, 134 insertions(+), 140 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index c7e157d1cd..c6c9e02cfd 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -23,51 +23,13 @@
 #define assert_memory_lock() tcg_debug_assert(have_mmap_lock())
 #endif
 
-typedef struct PageDesc {
+typedef struct PageDesc PageDesc;
 #ifndef CONFIG_USER_ONLY
+struct PageDesc {
 QemuSpin lock;
 /* list of TBs intersecting this ram page */
 uintptr_t first_tb;
-#endif
-} PageDesc;
-
-/*
- * In system mode we want L1_MAP to be based on ram offsets,
- * while in user mode we want it to be based on virtual addresses.
- *
- * TODO: For user mode, see the caveat re host vs guest virtual
- * address spaces near GUEST_ADDR_MAX.
- */
-#if !defined(CONFIG_USER_ONLY)
-#if HOST_LONG_BITS < TARGET_PHYS_ADDR_SPACE_BITS
-# define L1_MAP_ADDR_SPACE_BITS  HOST_LONG_BITS
-#else
-# define L1_MAP_ADDR_SPACE_BITS  TARGET_PHYS_ADDR_SPACE_BITS
-#endif
-#else
-# define L1_MAP_ADDR_SPACE_BITS  MIN(HOST_LONG_BITS, TARGET_ABI_BITS)
-#endif
-
-/* Size of the L2 (and L3, etc) page tables.  */
-#define V_L2_BITS 10
-#define V_L2_SIZE (1 << V_L2_BITS)
-
-/*
- * L1 Mapping properties
- */
-extern int v_l1_size;
-extern int v_l1_shift;
-extern int v_l2_levels;
-
-/*
- * The bottom level has pointers to PageDesc, and is indexed by
- * anything from 4 to (V_L2_BITS + 3) bits, depending on target page size.
- */
-#define V_L1_MIN_BITS 4
-#define V_L1_MAX_BITS (V_L2_BITS + 3)
-#define V_L1_MAX_SIZE (1 << V_L1_MAX_BITS)
-
-extern void *l1_map[V_L1_MAX_SIZE];
+};
 
 PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc);
 
@@ -76,6 +38,11 @@ static inline PageDesc *page_find(tb_page_addr_t index)
 return page_find_alloc(index, false);
 }
 
+void page_table_config_init(void);
+#else
+static inline void page_table_config_init(void) { }
+#endif
+
 /* list iterators for lists of tagged pointers in TranslationBlock */
 #define TB_FOR_EACH_TAGGED(head, tb, n, field)  \
 for (n = (head) & 1, tb = (TranslationBlock *)((head) & ~1);\
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 694440cb4a..31d0a74aa9 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -127,6 +127,121 @@ static PageForEachNext foreach_tb_next(PageForEachNext tb,
 }
 
 #else
+/*
+ * In system mode we want L1_MAP to be based on ram offsets.
+ */
+#if HOST_LONG_BITS < TARGET_PHYS_ADDR_SPACE_BITS
+# define L1_MAP_ADDR_SPACE_BITS  HOST_LONG_BITS
+#else
+# define L1_MAP_ADDR_SPACE_BITS  TARGET_PHYS_ADDR_SPACE_BITS
+#endif
+
+/* Size of the L2 (and L3, etc) page tables.  */
+#define V_L2_BITS 10
+#define V_L2_SIZE (1 << V_L2_BITS)
+
+/*
+ * L1 Mapping properties
+ */
+static int v_l1_size;
+static int v_l1_shift;
+static int v_l2_levels;
+
+/*
+ * The bottom level has pointers to PageDesc, and is indexed by
+ * anything from 4 to (V_L2_BITS + 3) bits, depending on target page size.
+ */
+#define V_L1_MIN_BITS 4
+#define V_L1_MAX_BITS (V_L2_BITS + 3)
+#define V_L1_MAX_SIZE (1 << V_L1_MAX_BITS)
+
+static void *l1_map[V_L1_MAX_SIZE];
+
+void page_table_config_init(void)
+{
+uint32_t v_l1_bits;
+
+assert(TARGET_PAGE_BITS);
+/* The bits remaining after N lower levels of page tables.  */
+v_l1_bits = (L1_MAP_ADDR_SPACE_BITS - TARGET_PAGE_BITS) % V_L2_BITS;
+if (v_l1_bits < V_L1_MIN_BITS) {
+v_l1_bits += V_L2_BITS;
+}
+
+v_l1_size = 1 << v_l1_bits;
+v_l1_shift = L1_MAP_ADDR_SPACE_BITS - TARGET_PAGE_BITS - v_l1_bits;
+v_l2_levels = v_l1_shift / V_L2_BITS - 1;
+
+assert(v_l1_bits <= V_L1_MAX_BITS);
+assert(v_l1_shift % V_L2_BITS == 0);
+assert(v_l2_levels >= 0);
+}
+
+PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc)
+{
+PageDesc *pd;
+void **lp;
+int i;
+
+/* Level 1.  Always allocated.  */
+lp = l1_map + ((index >> v_l1_shift) & (v_l1_size - 1));
+
+/* Level 2..N-1.  */
+for (i = v_l2_levels; i > 0; i--) {
+void **p = qatomic_rcu_read(lp);
+
+if (p == NULL) {
+void *existing;
+
+if (!alloc) {
+return NULL;
+}
+p = g_new0(void *, V_L2_SIZE);
+existing = qatomic_cmpxchg(lp, NULL, p);
+if (unlikely(existing)) {
+g_free(p);
+p = existing;
+}
+}
+
+lp = p + ((index >> (i * V_L2_BITS)) & (V_L2_SIZE - 1));
+}
+
+pd = qatomic_rcu_read(lp);
+   
