[PATCH] linux-user/hppa: Detect glibc ABORT_INSTRUCTION and EXCP_BREAK handler

2022-10-27 Thread Helge Deller
The glibc on the hppa platform uses the "iitlbp %r0,(%sr0, %r0)"
assembler instruction as ABORT_INSTRUCTION.
If this instruction, which is illegal in userspace context, is executed,
dump the registers and report the failure to userspace the same way the
Linux kernel does on physical hardware.

For other illegal instructions report TARGET_ILL_ILLOPC instead of
TARGET_ILL_ILLOPN as si_code.

Additionally, add the missing handler for the EXCP_BREAK exception, which
occurs when the "break x,y" assembler instruction is executed, and report
EXCP_ASSIST traps.

Signed-off-by: Helge Deller 

diff --git a/linux-user/hppa/cpu_loop.c b/linux-user/hppa/cpu_loop.c
index 98c51e9b8b..a42c34e549 100644
--- a/linux-user/hppa/cpu_loop.c
+++ b/linux-user/hppa/cpu_loop.c
@@ -196,15 +196,20 @@ void cpu_loop(CPUHPPAState *env)
 force_sig_fault(TARGET_SIGSEGV, TARGET_SEGV_MAPERR, env->iaoq_f);
 break;
 case EXCP_ILL:
-EXCP_DUMP(env, "qemu: got CPU exception 0x%x - aborting\n", trapnr);
-force_sig_fault(TARGET_SIGILL, TARGET_ILL_ILLOPN, env->iaoq_f);
+EXCP_DUMP(env, "qemu: EXCP_ILL exception %#x\n", trapnr);
+force_sig_fault(TARGET_SIGILL, TARGET_ILL_ILLOPC, env->iaoq_f);
 break;
 case EXCP_PRIV_OPR:
-EXCP_DUMP(env, "qemu: got CPU exception 0x%x - aborting\n", trapnr);
-force_sig_fault(TARGET_SIGILL, TARGET_ILL_PRVOPC, env->iaoq_f);
+/* check for glibc ABORT_INSTRUCTION "iitlbp %r0,(%sr0, %r0)" */
+EXCP_DUMP(env, "qemu: EXCP_PRIV_OPR exception %#x\n", trapnr);
+if (env->cr[CR_IIR] == 0x0400) {
+   force_sig_fault(TARGET_SIGILL, TARGET_ILL_ILLOPC, env->iaoq_f);
+} else {
+   force_sig_fault(TARGET_SIGILL, TARGET_ILL_PRVOPC, env->iaoq_f);
+}
 break;
 case EXCP_PRIV_REG:
-EXCP_DUMP(env, "qemu: got CPU exception 0x%x - aborting\n", trapnr);
+EXCP_DUMP(env, "qemu: EXCP_PRIV_REG exception %#x\n", trapnr);
 force_sig_fault(TARGET_SIGILL, TARGET_ILL_PRVREG, env->iaoq_f);
 break;
 case EXCP_OVERFLOW:
@@ -216,6 +221,10 @@ void cpu_loop(CPUHPPAState *env)
 case EXCP_ASSIST:
 force_sig_fault(TARGET_SIGFPE, 0, env->iaoq_f);
 break;
+case EXCP_BREAK:
+EXCP_DUMP(env, "qemu: EXCP_BREAK exception %#x\n", trapnr);
+force_sig_fault(TARGET_SIGTRAP, TARGET_TRAP_BRKPT, env->iaoq_f & ~3);
+break;
 case EXCP_DEBUG:
 force_sig_fault(TARGET_SIGTRAP, TARGET_TRAP_BRKPT, env->iaoq_f);
 break;



Re: [PATCH v4 2/4] hw/audio: fix tab indentation

2022-10-27 Thread Thomas Huth

On 25/10/2022 16.28, Amarjargal Gundjalam wrote:

The TABs should be replaced with spaces, to make sure that we have a
consistent coding style with an indentation of 4 spaces everywhere.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/370
Reviewed-by: Daniel P. Berrangé 

Signed-off-by: Amarjargal Gundjalam 
---
  hw/audio/fmopl.c  | 1664 ++---
  hw/audio/fmopl.h  |  138 +--
  hw/audio/intel-hda-defs.h |  990 +++---
  hw/audio/wm8750.c |  270 +++---
  4 files changed, 1531 insertions(+), 1531 deletions(-)


Your changes with regard to the TAB cleanup look fine to me here, so for 
this patch:


Reviewed-by: Thomas Huth 

... but when I looked through the fmopl.c part, it really looks like this 
file is completely wrong with regards to the QEMU coding style. I wonder 
whether we should rather use a tool like "astyle" or "indent" to get it into 
proper shape? ... or do we rather want to keep it in its original style in 
case somebody still wants to try to port patches from the original sources 
(MAME)? In that latter case, we should maybe also keep the TABs here? Gerd, 
what do you think?


 Thomas




Re: [PATCH v4 3/4] hw/display: fix tab indentation

2022-10-27 Thread Thomas Huth

On 25/10/2022 16.28, Amarjargal Gundjalam wrote:

The TABs should be replaced with spaces, to make sure that we have a
consistent coding style with an indentation of 4 spaces everywhere.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/370
Reviewed-by: Daniel P. Berrangé 

Signed-off-by: Amarjargal Gundjalam 
---
  hw/display/blizzard.c   |  352 -
  hw/display/cirrus_vga.c | 1606 +++
  hw/display/omap_dss.c   |  598 +++
  hw/display/pxa2xx_lcd.c |  196 ++---
  hw/display/vga_regs.h   |6 +-
  hw/display/xenfb.c  |  260 +++
  6 files changed, 1509 insertions(+), 1509 deletions(-)


Reviewed-by: Thomas Huth 




Re: [PATCH v3 0/24] Convert nanoMIPS disassembler from C++ to C

2022-10-27 Thread Thomas Huth

On 12/09/2022 14.26, Milica Lazarevic wrote:

Hi,

This patchset converts the nanomips disassembler to plain C. C++ features
like class, std::string type, exception handling, and function overloading
have been removed and replaced with the equivalent C code.


 Hi Philippe, hi Stefan,

as far as I can see, this patch set has been completely reviewed, and IMHO 
it would be nice to get this into QEMU 7.2 to finally get rid of the C++ 
dependency in the QEMU code ... could one of you pick this up and send a 
pull request with the patches? Or is there still anything left to do here?


 Thomas




[PATCH] block/block-backend: blk_set_enable_write_cache is IO_CODE

2022-10-27 Thread Emanuele Giuseppe Esposito
blk_set_enable_write_cache() is defined as GLOBAL_STATE_CODE
but can be invoked from iothreads when handling scsi requests.
This triggers an assertion failure:

 0x7fd6c3515ce1 in raise () from /lib/x86_64-linux-gnu/libc.so.6
 0x7fd6c34ff537 in abort () from /lib/x86_64-linux-gnu/libc.so.6
 0x7fd6c34ff40f in ?? () from /lib/x86_64-linux-gnu/libc.so.6
 0x7fd6c350e662 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
 0x56149e2cea03 in blk_set_enable_write_cache (wce=true, blk=0x5614a01c27f0)
   at ../src/block/block-backend.c:1949
 0x56149e2d0a67 in blk_set_enable_write_cache (blk=0x5614a01c27f0,
   wce=) at ../src/block/block-backend.c:1951
 0x56149dfe9c59 in scsi_disk_apply_mode_select (p=0x7fd6b400c00e "\004",
   page=, s=) at ../src/hw/scsi/scsi-disk.c:1520
 mode_select_pages (change=true, len=18, p=0x7fd6b400c00e "\004", r=0x7fd6b4001ff0)
   at ../src/hw/scsi/scsi-disk.c:1570
 scsi_disk_emulate_mode_select (inbuf=, r=0x7fd6b4001ff0) at
   ../src/hw/scsi/scsi-disk.c:1640
 scsi_disk_emulate_write_data (req=0x7fd6b4001ff0) at ../src/hw/scsi/scsi-disk.c:1934
 0x56149e18ff16 in virtio_scsi_handle_cmd_req_submit (req=,
   req=, s=0x5614a12f16b0) at ../src/hw/scsi/virtio-scsi.c:719
 virtio_scsi_handle_cmd_vq (vq=0x7fd6bab92140, s=0x5614a12f16b0) at
   ../src/hw/scsi/virtio-scsi.c:761
 virtio_scsi_handle_cmd (vq=, vdev=) at
   ../src/hw/scsi/virtio-scsi.c:775
 virtio_scsi_handle_cmd (vdev=0x5614a12f16b0, vq=0x7fd6bab92140) at
   ../src/hw/scsi/virtio-scsi.c:765
 0x56149e1a8aa6 in virtio_queue_notify_vq (vq=0x7fd6bab92140) at
   ../src/hw/virtio/virtio.c:2365
 0x56149e3ccea5 in aio_dispatch_handler (ctx=ctx@entry=0x5614a01babe0,
   node=) at ../src/util/aio-posix.c:369
 0x56149e3cd868 in aio_dispatch_ready_handlers (ready_list=0x7fd6c09b2680,
   ctx=0x5614a01babe0) at ../src/util/aio-posix.c:399
 aio_poll (ctx=0x5614a01babe0, blocking=blocking@entry=true) at
   ../src/util/aio-posix.c:713
 0x56149e2a7796 in iothread_run (opaque=opaque@entry=0x56149ffde500) at
   ../src/iothread.c:67
 0x56149e3d0859 in qemu_thread_start (args=0x7fd6c09b26f0) at
   ../src/util/qemu-thread-posix.c:504
 0x7fd6c36b9ea7 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
 0x7fd6c35d9aef in clone () from /lib/x86_64-linux-gnu/libc.so.6

Changing GLOBAL_STATE_CODE into IO_CODE is allowed, since GSC callers are
allowed to call IO_CODE.

Resolves: #1272

Signed-off-by: Emanuele Giuseppe Esposito 
---
 block/block-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index aa4adf06ae..ade4da55e0 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1946,7 +1946,7 @@ bool blk_enable_write_cache(BlockBackend *blk)
 
 void blk_set_enable_write_cache(BlockBackend *blk, bool wce)
 {
-GLOBAL_STATE_CODE();
+IO_CODE();
 blk->enable_write_cache = wce;
 }
 
-- 
2.31.1




[PATCH v4 0/2] vhost-vdpa: add support for vIOMMU

2022-10-27 Thread Cindy Lu
These patches add support for vIOMMU in the vdpa device.

changes in V3:
1. Move the function vfio_get_xlat_addr to memory.c
2. Use the existing memory listener; when the MR is an
iommu MR, call the functions iommu_region_add()/
iommu_region_del()

changes in V4:
1. Make the comments in vfio_get_xlat_addr more general

Cindy Lu (2):
  vfio: move the function vfio_get_xlat_addr() to memory.c
  vhost-vdpa: add support for vIOMMU

 hw/vfio/common.c   |  92 +--
 hw/virtio/vhost-vdpa.c | 131 ++---
 include/exec/memory.h  |   4 +
 include/hw/virtio/vhost-vdpa.h |  10 +++
 softmmu/memory.c   |  84 +
 5 files changed, 222 insertions(+), 99 deletions(-)

-- 
2.34.3




[PATCH v4 2/2] vhost-vdpa: add support for vIOMMU

2022-10-27 Thread Cindy Lu
Add support for vIOMMU by adding new functions to deal with iommu MRs:
- during iommu_region_add, register a specific IOMMU notifier
  and store all notifiers in a list.
- during iommu_region_del, compare and delete the IOMMU notifier from the list.

Verified in vp_vdpa and vdpa_sim_net driver

Signed-off-by: Cindy Lu 
---
 hw/virtio/vhost-vdpa.c | 131 ++---
 include/hw/virtio/vhost-vdpa.h |  10 +++
 2 files changed, 130 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 3ff9ce3501..407f3e9ac2 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -26,6 +26,7 @@
 #include "cpu.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "hw/virtio/virtio-access.h"
 
 /*
  * Return one past the end of the end of section. Be careful with uint64_t
@@ -44,7 +45,6 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
 uint64_t iova_min,
 uint64_t iova_max)
 {
-Int128 llend;
 
 if ((!memory_region_is_ram(section->mr) &&
  !memory_region_is_iommu(section->mr)) ||
@@ -61,14 +61,6 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
 return true;
 }
 
-llend = vhost_vdpa_section_end(section);
-if (int128_gt(llend, int128_make64(iova_max))) {
-error_report("RAM section out of device range (max=0x%" PRIx64
- ", end addr=0x%" PRIx64 ")",
- iova_max, int128_get64(llend));
-return true;
-}
-
 return false;
 }
 
@@ -173,6 +165,115 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
 v->iotlb_batch_begin_sent = false;
 }
 
+static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+{
+struct vdpa_iommu *iommu = container_of(n, struct vdpa_iommu, n);
+
+hwaddr iova = iotlb->iova + iommu->iommu_offset;
+struct vhost_vdpa *v = iommu->dev;
+void *vaddr;
+int ret;
+
+if (iotlb->target_as != &address_space_memory) {
+error_report("Wrong target AS \"%s\", only system memory is allowed",
+ iotlb->target_as->name ? iotlb->target_as->name : "none");
+return;
+}
+RCU_READ_LOCK_GUARD();
+vhost_vdpa_iotlb_batch_begin_once(v);
+
+if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
+bool read_only;
+
+if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only,
+  &address_space_memory)) {
+return;
+}
+ret =
+vhost_vdpa_dma_map(v, iova, iotlb->addr_mask + 1, vaddr, read_only);
+if (ret) {
+error_report("vhost_vdpa_dma_map(%p, 0x%" HWADDR_PRIx ", "
+ "0x%" HWADDR_PRIx ", %p) = %d (%m)",
+ v, iova, iotlb->addr_mask + 1, vaddr, ret);
+}
+} else {
+ret = vhost_vdpa_dma_unmap(v, iova, iotlb->addr_mask + 1);
+if (ret) {
+error_report("vhost_vdpa_dma_unmap(%p, 0x%" HWADDR_PRIx ", "
+ "0x%" HWADDR_PRIx ") = %d (%m)",
+ v, iova, iotlb->addr_mask + 1, ret);
+}
+}
+}
+
+static void vhost_vdpa_iommu_region_add(MemoryListener *listener,
+MemoryRegionSection *section)
+{
+struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
+
+struct vdpa_iommu *iommu;
+Int128 end;
+int iommu_idx;
+IOMMUMemoryRegion *iommu_mr;
+int ret;
+
+if (!memory_region_is_iommu(section->mr)) {
+return;
+}
+
+iommu_mr = IOMMU_MEMORY_REGION(section->mr);
+
+iommu = g_malloc0(sizeof(*iommu));
+end =  int128_add(int128_make64(section->offset_within_region),
+section->size);
+end = int128_sub(end, int128_one());
+iommu_idx = memory_region_iommu_attrs_to_index(iommu_mr,
+MEMTXATTRS_UNSPECIFIED);
+
+iommu->iommu_mr = iommu_mr;
+
+iommu_notifier_init(
+&iommu->n, vhost_vdpa_iommu_map_notify, IOMMU_NOTIFIER_IOTLB_EVENTS,
+section->offset_within_region, int128_get64(end), iommu_idx);
+iommu->iommu_offset =
+section->offset_within_address_space - section->offset_within_region;
+iommu->dev = v;
+
+ret = memory_region_register_iommu_notifier(section->mr, &iommu->n, NULL);
+if (ret) {
+g_free(iommu);
+return;
+}
+
+QLIST_INSERT_HEAD(&v->iommu_list, iommu, iommu_next);
+memory_region_iommu_replay(iommu->iommu_mr, &iommu->n);
+
+return;
+}
+
+static void vhost_vdpa_iommu_region_del(MemoryListener *listener,
+MemoryRegionSection *section)
+{
+struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
+
+struct vdpa_iommu *iommu;
+
+if (!memory_region_is_iommu(section->mr)) {
+return;
+}
+
+QLIST_FORE

[PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c

2022-10-27 Thread Cindy Lu
Move the function vfio_get_xlat_addr() to softmmu/memory.c and rename
it to memory_get_xlat_addr(), so we can use this function in other
devices, such as vDPA devices.

Signed-off-by: Cindy Lu 
---
 hw/vfio/common.c  | 92 ++-
 include/exec/memory.h |  4 ++
 softmmu/memory.c  | 84 +++
 3 files changed, 92 insertions(+), 88 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ace9562a9b..2b5a9f3d8d 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -574,92 +574,6 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
section->offset_within_address_space & (1ULL << 63);
 }
 
-/* Called with rcu_read_lock held.  */
-static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
-   ram_addr_t *ram_addr, bool *read_only)
-{
-MemoryRegion *mr;
-hwaddr xlat;
-hwaddr len = iotlb->addr_mask + 1;
-bool writable = iotlb->perm & IOMMU_WO;
-
-/*
- * The IOMMU TLB entry we have just covers translation through
- * this IOMMU to its immediate target.  We need to translate
- * it the rest of the way through to memory.
- */
-mr = address_space_translate(&address_space_memory,
- iotlb->translated_addr,
- &xlat, &len, writable,
- MEMTXATTRS_UNSPECIFIED);
-if (!memory_region_is_ram(mr)) {
-error_report("iommu map to non memory area %"HWADDR_PRIx"",
- xlat);
-return false;
-} else if (memory_region_has_ram_discard_manager(mr)) {
-RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
-MemoryRegionSection tmp = {
-.mr = mr,
-.offset_within_region = xlat,
-.size = int128_make64(len),
-};
-
-/*
- * Malicious VMs can map memory into the IOMMU, which is expected
- * to remain discarded. vfio will pin all pages, populating memory.
- * Disallow that. vmstate priorities make sure any RamDiscardManager
- * were already restored before IOMMUs are restored.
- */
-if (!ram_discard_manager_is_populated(rdm, &tmp)) {
-error_report("iommu map to discarded memory (e.g., unplugged via"
- " virtio-mem): %"HWADDR_PRIx"",
- iotlb->translated_addr);
-return false;
-}
-
-/*
- * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
- * pages will remain pinned inside vfio until unmapped, resulting in a
- * higher memory consumption than expected. If memory would get
- * populated again later, there would be an inconsistency between pages
- * pinned by vfio and pages seen by QEMU. This is the case until
- * unmapped from the IOMMU (e.g., during device reset).
- *
- * With malicious guests, we really only care about pinning more memory
- * than expected. RLIMIT_MEMLOCK set for the user/process can never be
- * exceeded and can be used to mitigate this problem.
- */
-warn_report_once("Using vfio with vIOMMUs and coordinated discarding of"
- " RAM (e.g., virtio-mem) works, however, malicious"
- " guests can trigger pinning of more memory than"
- " intended via an IOMMU. It's possible to mitigate "
- " by setting/adjusting RLIMIT_MEMLOCK.");
-}
-
-/*
- * Translation truncates length to the IOMMU page size,
- * check that it did not truncate too much.
- */
-if (len & iotlb->addr_mask) {
-error_report("iommu has granularity incompatible with target AS");
-return false;
-}
-
-if (vaddr) {
-*vaddr = memory_region_get_ram_ptr(mr) + xlat;
-}
-
-if (ram_addr) {
-*ram_addr = memory_region_get_ram_addr(mr) + xlat;
-}
-
-if (read_only) {
-*read_only = !writable || mr->readonly;
-}
-
-return true;
-}
-
 static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
 VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
@@ -682,7 +596,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
 bool read_only;
 
-if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
+if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only,
+  &address_space_memory)) {
 goto out;
 }
 /*
@@ -1359,7 +1274,8 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 }
 
 rcu_read_lock();
-if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
+if (memory_get_xlat_addr(iotlb, NULL, &translated_addr, N

Re: [PATCH] avocado: use sha1 for fc31 imgs to avoid first time re-download

2022-10-27 Thread Thomas Huth

On 24/10/2022 11.02, Daniel P. Berrangé wrote:

On Sat, Oct 22, 2022 at 02:03:50PM -0300, Daniel Henrique Barboza wrote:

'make check-avocado' will download any images that aren't present in the
cache via 'get-vm-images' in tests/Makefile.include. The target that
downloads fedora 31 images, get-vm-image-fedora-31, will use 'avocado
vmimage get  --distro=fedora --distro-version=31 --arch=(...)' to
download the image for each arch. Note that this command does not
support any argument to set the hash algorithm used and, based on the
avocado source code [1], DEFAULT_HASH_ALGORITHM is set to "sha1". The
sha1 hash is stored in a Fedora-Cloud-Base-31-1.9.{ARCH}.qcow2-CHECKSUM
in the cache.



For now, in QEMU, let's use sha1 for all Fedora 31 images. This will
immediately spare us at least one extra download for each Fedora 31
image in all our CI runs.

[1] https://github.com/avocado-framework/avocado.git @ 942a5d6972906
[2] https://github.com/avocado-framework/avocado/issues/5496


Can we just ask the Avocado maintainers to fix this problem on their
side, to allow use of a modern hash algorithm, as a priority item? We've
already had this problem in QEMU for over a year AFAICT, so it doesn't
seem like we urgently need a workaround on the QEMU side, and we can
get the Avocado devs to commit to fixing it in the next month.


Do we have such a commitment? ... The avocado version in QEMU is completely
outdated these days; it's still using version 88.1 from May 2021, i.e. there
hasn't been an update in more than a year. I recently tried to bump it to a
newer version on my own (since I'm still suffering from the problem that
find_free_port() does not work if you don't have a local IPv6 address), but
it's not that straightforward, since the recent versions of avocado changed a
lot of things (e.g. the new nrunner - do we want to run tests in parallel? If
so, it breaks a lot of the timeout settings, I think), so an update needs a
lot of careful testing...


So unless someone really commits to spending a lot of time on updating
Avocado in QEMU in the near future, I don't think that such a fix for the
hash algorithm will happen any time soon, and thus I think we should
consider including this workaround for the time being.


 Thomas




Re: Crash in RTC

2022-10-27 Thread Konstantin Kostiuk
ping

On Wed, Aug 31, 2022 at 11:33 AM Vadim Rozenfeld 
wrote:

> Just a bit more info related to this issue.
> Below is a quote from my previous conversation with Yan
>
> 
> The QEMU RTC function periodic_timer_update is called in response
> to Windows HAL calls
> _HalpRtcArmTimer@16
> and
> _HalpRtcStop@4
>
> Windows can change the timer frequency dynamically
> (some more info can be found here
> https://bugzilla.redhat.com/show_bug.cgi?id=1610461 )
> but calculation of the frequency is based on the wallclock time (IIRC).
> And if I'm not mistaken, then lost_tick_policy=delay can lead to the
> wallclock time delay,
> which in my understanding can lead to the incorrect frequency calculation.
>
> Another interesting thing is that they don't use Hyper-V enlightenments at
> all.
> Do you know if there is any particular reason for that?  They might try
> switching
> to hv_stimer instead of RTC.
>
> And one more thing, the frequency of the timer can be adjusted by UM
> applications.
> Some of them, like emulators and servers, use it quite widely. It's worth
> asking them if they are running such kinds of apps.
> 
>
>
> Cheers,
> Vadim.
>
> On Wed, Aug 31, 2022 at 5:46 PM Konstantin Kostiuk 
> wrote:
>
>> CC: Vadim
>>
>> On Wed, Aug 31, 2022 at 10:42 AM Konstantin Kostiuk 
>> wrote:
>>
>>> ping
>>>
>>> On Wed, Aug 24, 2022 at 5:37 PM Konstantin Kostiuk 
>>> wrote:
>>>
 Hi Michael and Paolo,

 I write to you as maintainers of mc146818rtc.c. I am working on bug
 https://bugzilla.redhat.com/show_bug.cgi?id=2054781
 and reproduced it on the current master branch.

 I added some print at line 202 (before assert(lost_clock >= 0),
 https://gitlab.com/qemu-project/qemu/-/blob/master/hw/rtc/mc146818rtc.c#L202)
 and got the following values:

 next_periodic_clock, old_period, last_periodic_clock, cur_clock,
 lost_clock, current_time
 54439076429968, 32, 54439076429936, 54439076430178, 242,
 1661348768010822000
 54439076430224, 512, 54439076429712, 54439076430188, 476,
 166134876807000
 54439076430224, 32, 54439076430192, 54439076429884, -308,
 1661348768001838000

 The current_time value in the last print is lower than in the previous
 one.
 So, the error occurs because time has gone backward.

 I think this is a possible situation during time synchronization.
 My question is what should we do in this case?

 Best Regards,
 Konstantin Kostiuk.

>>>


[PATCH V4 0/4] PASID support for Intel IOMMU

2022-10-27 Thread Jason Wang
Hi All:

This series tries to introduce PASID support for the Intel IOMMU. The
work is based on the previous scalable mode support, implementing
ECAP_PASID. A new "x-pasid-mode" option is introduced to enable this
mode. All internal vIOMMU code was extended to support PASID instead
of the current RID2PASID method. The code is also capable of
provisioning address spaces with PASID. Note that no devices can issue
PASID DMA right now; this needs future work.

This will be used for prototyping PASID based devices like virtio or
future vPASID support for the Intel IOMMU.

Tests have been done with a Linux guest with scalable mode enabled and
disabled. A virtio prototype[1][2] that can issue PASID based DMA
requests was also tested; different PASIDs were used for TX and RX in
those test drivers.

Changes since V3:

- rearrange the member for vtd_iotlb_key structure
- reorder the pasid parameter ahead of addr for vtd_lookup_iotlb()
- allow access size from 1 to 8 for vtd_mem_ir_fault_ops

Changes since V2:

- use PCI_BUILD_BDF() instead of vtd_make_source_id()
- Tweak the comments above vtd_as_hash()
- use PCI_BUS_NUM() instead of open coding
- rename vtd_as to vtd_address_spaces
- rename vtd_qualify_report_fault() to vtd_report_qualify_fault()
- forbid device-iotlb with PASID
- report PASID based qualified fault
- log PASID during errors

Changes since V1:

- speed up IOMMU translation when RID2PASID is not used
- remove the unnecessary L1 PASID invalidation descriptor support
- add support for catching translations to the interrupt range in the
  case of PT and scalable mode
- refine the comments to explain the hash algorithm used in IOTLB
  lookups

Please review.

[1] https://github.com/jasowang/qemu.git virtio-pasid
[2] https://github.com/jasowang/linux.git virtio-pasid

Jason Wang (4):
  intel-iommu: don't warn guest errors when getting rid2pasid entry
  intel-iommu: drop VTDBus
  intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function
  intel-iommu: PASID support

 hw/i386/intel_iommu.c  | 685 ++---
 hw/i386/intel_iommu_internal.h |  16 +-
 hw/i386/trace-events   |   2 +
 include/hw/i386/intel_iommu.h  |  18 +-
 include/hw/pci/pci_bus.h   |   2 +
 5 files changed, 482 insertions(+), 241 deletions(-)

-- 
2.25.1




[PATCH V4 2/4] intel-iommu: drop VTDBus

2022-10-27 Thread Jason Wang
We introduced the VTDBus structure as an intermediate step for searching
the address space. This works well with SID based matching/lookup. But
when we want to support SID plus PASID based address space lookup, this
intermediate step turns out to be a burden. So this patch simply drops
the VTDBus structure and uses the PCIBus and devfn as the key for the
g_hash_table(). This simplifies the code and the future PASID extension.

To avoid being slower for existing vtd_find_as_from_bus_num() callers, a
vtd_as cache indexed by the bus number is introduced to store the most
recent search result of a vtd_as belonging to a specific bus.

Reviewed-by: Peter Xu 
Signed-off-by: Jason Wang 
---
Changes since V2:
- use PCI_BUILD_BDF() instead of vtd_make_source_id()
- Tweak the comments above vtd_as_hash()
- use PCI_BUS_NUM() instead of open coding
- rename vtd_as to vtd_address_spaces
---
 hw/i386/intel_iommu.c | 234 +-
 include/hw/i386/intel_iommu.h |  11 +-
 2 files changed, 118 insertions(+), 127 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 796f924c06..6abe12a8c5 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -61,6 +61,16 @@
 } \
 }
 
+/*
+ * PCI bus number (or SID) is not reliable since the device is usually
+ * initialized before the guest can configure the PCI bridge
+ * (SECONDARY_BUS_NUMBER).
+ */
+struct vtd_as_key {
+PCIBus *bus;
+uint8_t devfn;
+};
+
 static void vtd_address_space_refresh_all(IntelIOMMUState *s);
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
 
@@ -210,6 +220,27 @@ static guint vtd_uint64_hash(gconstpointer v)
 return (guint)*(const uint64_t *)v;
 }
 
+static gboolean vtd_as_equal(gconstpointer v1, gconstpointer v2)
+{
+const struct vtd_as_key *key1 = v1;
+const struct vtd_as_key *key2 = v2;
+
+return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
+}
+
+/*
+ * Note that we use pointer to PCIBus as the key, so hashing/shifting
+ * based on the pointer value is intended. Note that we deal with
+ * collisions through vtd_as_equal().
+ */
+static guint vtd_as_hash(gconstpointer v)
+{
+const struct vtd_as_key *key = v;
+guint value = (guint)(uintptr_t)key->bus;
+
+return (guint)(value << 8 | key->devfn);
+}
+
 static gboolean vtd_hash_remove_by_domain(gpointer key, gpointer value,
   gpointer user_data)
 {
@@ -248,22 +279,14 @@ static gboolean vtd_hash_remove_by_page(gpointer key, gpointer value,
 static void vtd_reset_context_cache_locked(IntelIOMMUState *s)
 {
 VTDAddressSpace *vtd_as;
-VTDBus *vtd_bus;
-GHashTableIter bus_it;
-uint32_t devfn_it;
+GHashTableIter as_it;
 
 trace_vtd_context_cache_reset();
 
-g_hash_table_iter_init(&bus_it, s->vtd_as_by_busptr);
+g_hash_table_iter_init(&as_it, s->vtd_address_spaces);
 
-while (g_hash_table_iter_next (&bus_it, NULL, (void**)&vtd_bus)) {
-for (devfn_it = 0; devfn_it < PCI_DEVFN_MAX; ++devfn_it) {
-vtd_as = vtd_bus->dev_as[devfn_it];
-if (!vtd_as) {
-continue;
-}
-vtd_as->context_cache_entry.context_cache_gen = 0;
-}
+while (g_hash_table_iter_next (&as_it, NULL, (void**)&vtd_as)) {
+vtd_as->context_cache_entry.context_cache_gen = 0;
 }
 s->context_cache_gen = 1;
 }
@@ -993,32 +1016,6 @@ static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level)
 return slpte & rsvd_mask;
 }
 
-/* Find the VTD address space associated with a given bus number */
-static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num)
-{
-VTDBus *vtd_bus = s->vtd_as_by_bus_num[bus_num];
-GHashTableIter iter;
-
-if (vtd_bus) {
-return vtd_bus;
-}
-
-/*
- * Iterate over the registered buses to find the one which
- * currently holds this bus number and update the bus_num
- * lookup table.
- */
-g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
-while (g_hash_table_iter_next(&iter, NULL, (void **)&vtd_bus)) {
-if (pci_bus_num(vtd_bus->bus) == bus_num) {
-s->vtd_as_by_bus_num[bus_num] = vtd_bus;
-return vtd_bus;
-}
-}
-
-return NULL;
-}
-
 /* Given the @iova, get relevant @slptep. @slpte_level will be the last level
  * of the translation, can be used for deciding the size of large page.
  */
@@ -1634,24 +1631,13 @@ static bool vtd_switch_address_space(VTDAddressSpace *as)
 
 static void vtd_switch_address_space_all(IntelIOMMUState *s)
 {
+VTDAddressSpace *vtd_as;
 GHashTableIter iter;
-VTDBus *vtd_bus;
-int i;
-
-g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
-while (g_hash_table_iter_next(&iter, NULL, (void **)&vtd_bus)) {
-for (i = 0; i < PCI_DEVFN_MAX; i++) {
-if (!vtd_bus->dev_as[i]) {
- 

[PATCH V4 1/4] intel-iommu: don't warn guest errors when getting rid2pasid entry

2022-10-27 Thread Jason Wang
We used to warn on a wrong rid2pasid entry. But this error can be
triggered by the guest and can happen during initialization, so let's
not warn in this case.

Signed-off-by: Jason Wang 
---
 hw/i386/intel_iommu.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 6524c2ee32..796f924c06 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1554,8 +1554,10 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, VTDContextEntry *ce)
 if (s->root_scalable) {
 ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe);
 if (ret) {
-error_report_once("%s: vtd_ce_get_rid2pasid_entry error: %"PRId32,
-  __func__, ret);
+/*
+ * This error is guest triggerable. We should assume PT
+ * is not enabled for safety.
+ */
 return false;
 }
 return (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_PT);
-- 
2.25.1




[PATCH V4 4/4] intel-iommu: PASID support

2022-10-27 Thread Jason Wang
This patch introduces ECAP_PASID via "x-pasid-mode". Based on the
existing support for scalable mode, we need to implement the following
missing parts:

1) tag VTDAddressSpace with PASID and support IOMMU/DMA translation
   with PASID
2) tag IOTLB with PASID
3) PASID cache and its flush
4) PASID based IOTLB invalidation

For simplicity the PASID cache is not implemented, so we can simply
implement the PASID cache flush as a no-op and leave it to be
implemented in the future. For PASID based IOTLB invalidation, since we
don't have L1 stage support yet, the PASID based IOTLB invalidation is
not implemented either. For PASID based device IOTLB invalidation,
support from vhost is required, so we forbid enabling device IOTLB when
PASID is enabled for now. This work could be done in the future.

Note that although PASID based IOMMU translation is ready, no device
can issue PASID DMA right now. In this case, PCI_NO_PASID is used as the
PASID to identify the address space without PASID. vtd_find_add_as() has
been extended to provision address spaces with PASID, which could be
utilized by a future extension of the PCI core to allow device models to
use PASID based DMA translation.

This feature would be useful for:

1) prototyping PASID support for devices like virtio
2) future vPASID work
3) future PRS and vSVA work

Signed-off-by: Jason Wang 
---
Changes since V3:
- rearrange the member for vtd_iotlb_key structure
- reorder the pasid parameter ahead of addr for vtd_lookup_iotlb()
- allow access size from 1 to 8 for vtd_mem_ir_fault_ops
Changes since V2:
- forbid device-iotlb with PASID
- report PASID based qualified fault
- log PASID during errors
---
 hw/i386/intel_iommu.c  | 415 +
 hw/i386/intel_iommu_internal.h |  16 +-
 hw/i386/trace-events   |   2 +
 include/hw/i386/intel_iommu.h  |   7 +-
 include/hw/pci/pci_bus.h   |   2 +
 5 files changed, 338 insertions(+), 104 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 6c03ecf3cb..1265e7dacf 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -58,6 +58,14 @@
 struct vtd_as_key {
 PCIBus *bus;
 uint8_t devfn;
+uint32_t pasid;
+};
+
+struct vtd_iotlb_key {
+uint64_t gfn;
+uint32_t pasid;
+uint32_t level;
+uint16_t sid;
 };
 
 static void vtd_address_space_refresh_all(IntelIOMMUState *s);
@@ -199,14 +207,24 @@ static inline gboolean 
vtd_as_has_map_notifier(VTDAddressSpace *as)
 }
 
 /* GHashTable functions */
-static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
+static gboolean vtd_iotlb_equal(gconstpointer v1, gconstpointer v2)
 {
-return *((const uint64_t *)v1) == *((const uint64_t *)v2);
+const struct vtd_iotlb_key *key1 = v1;
+const struct vtd_iotlb_key *key2 = v2;
+
+return key1->sid == key2->sid &&
+   key1->pasid == key2->pasid &&
+   key1->level == key2->level &&
+   key1->gfn == key2->gfn;
 }
 
-static guint vtd_uint64_hash(gconstpointer v)
+static guint vtd_iotlb_hash(gconstpointer v)
 {
-return (guint)*(const uint64_t *)v;
+const struct vtd_iotlb_key *key = v;
+
+return key->gfn | ((key->sid) << VTD_IOTLB_SID_SHIFT) |
+   (key->level) << VTD_IOTLB_LVL_SHIFT |
+   (key->pasid) << VTD_IOTLB_PASID_SHIFT;
 }
 
 static gboolean vtd_as_equal(gconstpointer v1, gconstpointer v2)
@@ -214,7 +232,8 @@ static gboolean vtd_as_equal(gconstpointer v1, 
gconstpointer v2)
 const struct vtd_as_key *key1 = v1;
 const struct vtd_as_key *key2 = v2;
 
-return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
+return (key1->bus == key2->bus) && (key1->devfn == key2->devfn) &&
+   (key1->pasid == key2->pasid);
 }
 
 /*
@@ -302,13 +321,6 @@ static void vtd_reset_caches(IntelIOMMUState *s)
 vtd_iommu_unlock(s);
 }
 
-static uint64_t vtd_get_iotlb_key(uint64_t gfn, uint16_t source_id,
-  uint32_t level)
-{
-return gfn | ((uint64_t)(source_id) << VTD_IOTLB_SID_SHIFT) |
-   ((uint64_t)(level) << VTD_IOTLB_LVL_SHIFT);
-}
-
 static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
 {
 return (addr & vtd_slpt_level_page_mask(level)) >> VTD_PAGE_SHIFT_4K;
@@ -316,15 +328,17 @@ static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t 
level)
 
 /* Must be called with IOMMU lock held */
 static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
-   hwaddr addr)
+   uint32_t pasid, hwaddr addr)
 {
+struct vtd_iotlb_key key;
 VTDIOTLBEntry *entry;
-uint64_t key;
 int level;
 
 for (level = VTD_SL_PT_LEVEL; level < VTD_SL_PML4_LEVEL; level++) {
-key = vtd_get_iotlb_key(vtd_get_iotlb_gfn(addr, level),
-source_id, level);
+key.gfn = vtd_get_iotlb_gfn(addr, level);
+key.level = level;
+key.sid = source_id;
+key.pasid = pasid;
 entry = g_hash_t

[PATCH V4 3/4] intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function

2022-10-27 Thread Jason Wang
We used to have the VTD_PE_GET_FPD_ERR() macro, but it contains an
internal goto which prevents it from being reused. This patch converts
that macro to a dedicated function and lets the caller decide what to
do (e.g. use goto or not). This makes sure it can be re-used by other
functions that require fault reporting.

Signed-off-by: Jason Wang 
---
Changes since V2:
- rename vtd_qualify_report_fault() to vtd_report_qualify_fault()
---
 hw/i386/intel_iommu.c | 42 --
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 6abe12a8c5..6c03ecf3cb 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -49,17 +49,6 @@
 /* pe operations */
 #define VTD_PE_GET_TYPE(pe) ((pe)->val[0] & VTD_SM_PASID_ENTRY_PGTT)
 #define VTD_PE_GET_LEVEL(pe) (2 + (((pe)->val[0] >> 2) & 
VTD_SM_PASID_ENTRY_AW))
-#define VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write) {\
-if (ret_fr) { \
-ret_fr = -ret_fr; \
-if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {   \
-trace_vtd_fault_disabled();   \
-} else {  \
-vtd_report_dmar_fault(s, source_id, addr, ret_fr, is_write);  \
-} \
-goto error;   \
-} \
-}
 
 /*
  * PCI bus number (or SID) is not reliable since the device is usaully
@@ -1718,6 +1707,19 @@ out:
 trace_vtd_pt_enable_fast_path(source_id, success);
 }
 
+static void vtd_report_qualify_fault(IntelIOMMUState *s,
+ int err, bool is_fpd_set,
+ uint16_t source_id,
+ hwaddr addr,
+ bool is_write)
+{
+if (is_fpd_set && vtd_is_qualified_fault(err)) {
+trace_vtd_fault_disabled();
+} else {
+vtd_report_dmar_fault(s, source_id, addr, err, is_write);
+}
+}
+
 /* Map dev to context-entry then do a paging-structures walk to do a iommu
  * translation.
  *
@@ -1778,7 +1780,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
 is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
 if (!is_fpd_set && s->root_scalable) {
 ret_fr = vtd_ce_get_pasid_fpd(s, &ce, &is_fpd_set);
-VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, 
is_write);
+if (ret_fr) {
+vtd_report_qualify_fault(s, -ret_fr, is_fpd_set,
+ source_id, addr, is_write);
+goto error;
+}
 }
 } else {
 ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
@@ -1786,7 +1792,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
 if (!ret_fr && !is_fpd_set && s->root_scalable) {
 ret_fr = vtd_ce_get_pasid_fpd(s, &ce, &is_fpd_set);
 }
-VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write);
+if (ret_fr) {
+vtd_report_qualify_fault(s, -ret_fr, is_fpd_set,
+ source_id, addr, is_write);
+goto error;
+}
 /* Update context-cache */
 trace_vtd_iotlb_cc_update(bus_num, devfn, ce.hi, ce.lo,
   cc_entry->context_cache_gen,
@@ -1822,7 +1832,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
 
 ret_fr = vtd_iova_to_slpte(s, &ce, addr, is_write, &slpte, &level,
&reads, &writes, s->aw_bits);
-VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write);
+if (ret_fr) {
+vtd_report_qualify_fault(s, -ret_fr, is_fpd_set, source_id,
+ addr, is_write);
+goto error;
+}
 
 page_mask = vtd_slpt_level_page_mask(level);
 access_flags = IOMMU_ACCESS_FLAG(reads, writes);
-- 
2.25.1




Re: [PATCH 5/7] block/nfs: Fix 32-bit Windows build

2022-10-27 Thread Kevin Wolf
On 27.10.2022 at 04:45, Bin Meng wrote:
> Hi Kevin,
> [...]
> Will you queue this patch via the block tree?

Just to be sure, you mean only patch 5? Yes, I can do that.

Kevin




Re: [PATCH v10 1/9] s390x/cpu topology: core_id sets s390x CPU topology

2022-10-27 Thread Thomas Huth

On 24/10/2022 21.25, Janis Schoetterl-Glausch wrote:

On Wed, 2022-10-12 at 18:20 +0200, Pierre Morel wrote:

In the S390x CPU topology the core_id specifies the CPU address
and the position of the core within the topology.

Let's build the topology based on the core_id.
s390x/cpu topology: core_id sets s390x CPU topology

In the S390x CPU topology the core_id specifies the CPU address
and the position of the cpu within the topology.

Let's build the topology based on the core_id.

Signed-off-by: Pierre Morel 
---
  include/hw/s390x/cpu-topology.h |  45 +++
  hw/s390x/cpu-topology.c | 132 
  hw/s390x/s390-virtio-ccw.c  |  21 +
  hw/s390x/meson.build|   1 +
  4 files changed, 199 insertions(+)
  create mode 100644 include/hw/s390x/cpu-topology.h
  create mode 100644 hw/s390x/cpu-topology.c

diff --git a/include/hw/s390x/cpu-topology.h b/include/hw/s390x/cpu-topology.h
new file mode 100644
index 00..66c171d0bc
--- /dev/null
+++ b/include/hw/s390x/cpu-topology.h
@@ -0,0 +1,45 @@
+/*
+ * CPU Topology
+ *
+ * Copyright 2022 IBM Corp.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+#ifndef HW_S390X_CPU_TOPOLOGY_H
+#define HW_S390X_CPU_TOPOLOGY_H
+
+#include "hw/qdev-core.h"
+#include "qom/object.h"
+
+typedef struct S390TopoContainer {
+int active_count;
+} S390TopoContainer;
+
+#define S390_TOPOLOGY_CPU_IFL 0x03
+#define S390_TOPOLOGY_MAX_ORIGIN ((63 + S390_MAX_CPUS) / 64)
+typedef struct S390TopoTLE {
+uint64_t mask[S390_TOPOLOGY_MAX_ORIGIN];
+} S390TopoTLE;


Since this actually represents multiple TLEs, you might want to change the
name of the struct to reflect this. S390TopoTLEList maybe?


Didn't TLE mean "Topology List Entry"? (by the way, Pierre, please explain 
this three letter acronym somewhere in this header in a comment)...


So expanding the TLE, this would mean S390TopoTopologyListEntryList ? ... 
this is getting weird... Also, this is not a "list" in the sense of a linked 
list, as one might expect at a first glance, so this is all very confusing 
here. Could you please come up with some better naming?


 Thomas




Re: [PATCH v1 10/12] hw/arm: introduce xenpv machine

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:


> Optional: When CONFIG_TPM is enabled, it also creates a tpm-tis-device, adds a
> TPM emulator and connects to swtpm running on host machine via chardev socket
> and support TPM functionalities for a guest domain.
>
> Extra command line for aarch64 xenpv QEMU to connect to swtpm:
> -chardev socket,id=chrtpm,path=/tmp/myvtpm2/swtpm-sock \
> -tpmdev emulator,id=tpm0,chardev=chrtpm \
>
> swtpm implements a TPM software emulator(TPM 1.2 & TPM 2) built on libtpms and
> provides access to TPM functionality over socket, chardev and CUSE interface.
> Github repo: https://github.com/stefanberger/swtpm
> Example for starting swtpm on host machine:
> mkdir /tmp/vtpm2
> swtpm socket --tpmstate dir=/tmp/vtpm2 \
> --ctrl type=unixio,path=/tmp/vtpm2/swtpm-sock &


> +static void xen_enable_tpm(void)
> +{
> +/* qemu_find_tpm_be is only available when CONFIG_TPM is enabled. */
> +#ifdef CONFIG_TPM
> +Error *errp = NULL;
> +DeviceState *dev;
> +SysBusDevice *busdev;
> +
> +TPMBackend *be = qemu_find_tpm_be("tpm0");
> +if (be == NULL) {
> +DPRINTF("Couldn't fine the backend for tpm0\n");
> +return;
> +}
> +dev = qdev_new(TYPE_TPM_TIS_SYSBUS);
> +object_property_set_link(OBJECT(dev), "tpmdev", OBJECT(be), &errp);
> +object_property_set_str(OBJECT(dev), "tpmdev", be->id, &errp);
> +busdev = SYS_BUS_DEVICE(dev);
> +sysbus_realize_and_unref(busdev, &error_fatal);
> +sysbus_mmio_map(busdev, 0, GUEST_TPM_BASE);

I'm not sure what has gone wrong here but I'm getting:

  ../../hw/arm/xen_arm.c: In function ‘xen_enable_tpm’:
  ../../hw/arm/xen_arm.c:120:32: error: ‘GUEST_TPM_BASE’ undeclared (first use 
in this function); did you mean ‘GUEST_RAM_BASE’?
120 | sysbus_mmio_map(busdev, 0, GUEST_TPM_BASE);
|^~
|GUEST_RAM_BASE
  ../../hw/arm/xen_arm.c:120:32: note: each undeclared identifier is reported 
only once for each function it appears in

In my cross build:

  # Configured with: '../../configure' '--disable-docs' 
'--target-list=aarch64-softmmu' '--disable-kvm' '--enable-xen' 
'--disable-opengl' '--disable-libudev' '--enable-tpm' 
'--disable-xen-pci-passthrough' '--cross-prefix=aarch64-linux-gnu-' 
'--skip-meson'

which makes me wonder if this is a configure failure or a confusion
about being able to have host swtpm implementations during emulation but
needing target tpm for Xen?

-- 
Alex Bennée



Re: [PATCH v4 2/2] vhost-vdpa: add support for vIOMMU

2022-10-27 Thread Jason Wang
On Thu, Oct 27, 2022 at 3:41 PM Cindy Lu  wrote:
>
> Add support for vIOMMU. Add new functions to deal with IOMMU MRs:
> - during iommu_region_add, register a specific IOMMU notifier,
>  and store all notifiers in a list.
> - during iommu_region_del, compare and delete the IOMMU notifier from the list
>
> Verified in vp_vdpa and vdpa_sim_net driver
>
> Signed-off-by: Cindy Lu 

Acked-by: Jason Wang 

(some nits, see below)

> ---
>  hw/virtio/vhost-vdpa.c | 131 ++---
>  include/hw/virtio/vhost-vdpa.h |  10 +++
>  2 files changed, 130 insertions(+), 11 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 3ff9ce3501..407f3e9ac2 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -26,6 +26,7 @@
>  #include "cpu.h"
>  #include "trace.h"
>  #include "qapi/error.h"
> +#include "hw/virtio/virtio-access.h"
>
>  /*
>   * Return one past the end of the end of section. Be careful with uint64_t
> @@ -44,7 +45,6 @@ static bool 
> vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
>  uint64_t iova_min,
>  uint64_t iova_max)
>  {
> -Int128 llend;
>
>  if ((!memory_region_is_ram(section->mr) &&
>   !memory_region_is_iommu(section->mr)) ||
> @@ -61,14 +61,6 @@ static bool 
> vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
>  return true;
>  }
>
> -llend = vhost_vdpa_section_end(section);
> -if (int128_gt(llend, int128_make64(iova_max))) {
> -error_report("RAM section out of device range (max=0x%" PRIx64
> - ", end addr=0x%" PRIx64 ")",
> - iova_max, int128_get64(llend));
> -return true;
> -}
> -
>  return false;
>  }
>
> @@ -173,6 +165,115 @@ static void vhost_vdpa_listener_commit(MemoryListener 
> *listener)
>  v->iotlb_batch_begin_sent = false;
>  }
>
> +static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry 
> *iotlb)
> +{
> +struct vdpa_iommu *iommu = container_of(n, struct vdpa_iommu, n);
> +
> +hwaddr iova = iotlb->iova + iommu->iommu_offset;
> +struct vhost_vdpa *v = iommu->dev;
> +void *vaddr;
> +int ret;
> +
> +if (iotlb->target_as != &address_space_memory) {
> +error_report("Wrong target AS \"%s\", only system memory is allowed",
> + iotlb->target_as->name ? iotlb->target_as->name : 
> "none");
> +return;
> +}
> +RCU_READ_LOCK_GUARD();
> +vhost_vdpa_iotlb_batch_begin_once(v);
> +
> +if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> +bool read_only;
> +
> +if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only,
> +  &address_space_memory)) {
> +return;
> +}
> +ret =
> +vhost_vdpa_dma_map(v, iova, iotlb->addr_mask + 1, vaddr, 
> read_only);
> +if (ret) {
> +error_report("vhost_vdpa_dma_map(%p, 0x%" HWADDR_PRIx ", "
> + "0x%" HWADDR_PRIx ", %p) = %d (%m)",
> + v, iova, iotlb->addr_mask + 1, vaddr, ret);
> +}
> +} else {
> +ret = vhost_vdpa_dma_unmap(v, iova, iotlb->addr_mask + 1);
> +if (ret) {
> +error_report("vhost_vdpa_dma_unmap(%p, 0x%" HWADDR_PRIx ", "
> + "0x%" HWADDR_PRIx ") = %d (%m)",
> + v, iova, iotlb->addr_mask + 1, ret);
> +}
> +}
> +}
> +
> +static void vhost_vdpa_iommu_region_add(MemoryListener *listener,
> +MemoryRegionSection *section)
> +{
> +struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, 
> listener);
> +
> +struct vdpa_iommu *iommu;
> +Int128 end;
> +int iommu_idx;
> +IOMMUMemoryRegion *iommu_mr;
> +int ret;
> +
> +if (!memory_region_is_iommu(section->mr)) {
> +return;

Nit: So we had already had one check in the caller, there's no need to
check twice. (this could be done on top).

> +}
> +
> +iommu_mr = IOMMU_MEMORY_REGION(section->mr);
> +
> +iommu = g_malloc0(sizeof(*iommu));
> +end =  int128_add(int128_make64(section->offset_within_region),
> +section->size);
> +end = int128_sub(end, int128_one());
> +iommu_idx = memory_region_iommu_attrs_to_index(iommu_mr,
> +MEMTXATTRS_UNSPECIFIED);
> +
> +iommu->iommu_mr = iommu_mr;
> +
> +iommu_notifier_init(
> +&iommu->n, vhost_vdpa_iommu_map_notify, IOMMU_NOTIFIER_IOTLB_EVENTS,
> +section->offset_within_region, int128_get64(end), iommu_idx);
> +iommu->iommu_offset =
> +section->offset_within_address_space - section->offset_within_region;
> +iommu->dev = v;
> +
> +ret = memory_region_register_iommu_notifier(section->mr, &iommu->n, 
> NULL);
> +if (ret) {
> +g_free(iommu);
> +return;
> +}
> +
> 

Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c

2022-10-27 Thread Jason Wang
On Thu, Oct 27, 2022 at 3:41 PM Cindy Lu  wrote:
>
> Move the function vfio_get_xlat_addr to softmmu/memory.c, and
> change the name to memory_get_xlat_addr(). So we can use this
> function in other devices, such as vDPA devices.
>
> Signed-off-by: Cindy Lu 

Acked-by: Jason Wang 

> ---
>  hw/vfio/common.c  | 92 ++-
>  include/exec/memory.h |  4 ++
>  softmmu/memory.c  | 84 +++
>  3 files changed, 92 insertions(+), 88 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index ace9562a9b..2b5a9f3d8d 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -574,92 +574,6 @@ static bool 
> vfio_listener_skipped_section(MemoryRegionSection *section)
> section->offset_within_address_space & (1ULL << 63);
>  }
>
> -/* Called with rcu_read_lock held.  */
> -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> -   ram_addr_t *ram_addr, bool *read_only)
> -{
> -MemoryRegion *mr;
> -hwaddr xlat;
> -hwaddr len = iotlb->addr_mask + 1;
> -bool writable = iotlb->perm & IOMMU_WO;
> -
> -/*
> - * The IOMMU TLB entry we have just covers translation through
> - * this IOMMU to its immediate target.  We need to translate
> - * it the rest of the way through to memory.
> - */
> -mr = address_space_translate(&address_space_memory,
> - iotlb->translated_addr,
> - &xlat, &len, writable,
> - MEMTXATTRS_UNSPECIFIED);
> -if (!memory_region_is_ram(mr)) {
> -error_report("iommu map to non memory area %"HWADDR_PRIx"",
> - xlat);
> -return false;
> -} else if (memory_region_has_ram_discard_manager(mr)) {
> -RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
> -MemoryRegionSection tmp = {
> -.mr = mr,
> -.offset_within_region = xlat,
> -.size = int128_make64(len),
> -};
> -
> -/*
> - * Malicious VMs can map memory into the IOMMU, which is expected
> - * to remain discarded. vfio will pin all pages, populating memory.
> - * Disallow that. vmstate priorities make sure any RamDiscardManager
> - * were already restored before IOMMUs are restored.
> - */
> -if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> -error_report("iommu map to discarded memory (e.g., unplugged via"
> - " virtio-mem): %"HWADDR_PRIx"",
> - iotlb->translated_addr);
> -return false;
> -}
> -
> -/*
> - * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> - * pages will remain pinned inside vfio until unmapped, resulting in 
> a
> - * higher memory consumption than expected. If memory would get
> - * populated again later, there would be an inconsistency between 
> pages
> - * pinned by vfio and pages seen by QEMU. This is the case until
> - * unmapped from the IOMMU (e.g., during device reset).
> - *
> - * With malicious guests, we really only care about pinning more 
> memory
> - * than expected. RLIMIT_MEMLOCK set for the user/process can never 
> be
> - * exceeded and can be used to mitigate this problem.
> - */
> -warn_report_once("Using vfio with vIOMMUs and coordinated discarding 
> of"
> - " RAM (e.g., virtio-mem) works, however, malicious"
> - " guests can trigger pinning of more memory than"
> - " intended via an IOMMU. It's possible to mitigate "
> - " by setting/adjusting RLIMIT_MEMLOCK.");
> -}
> -
> -/*
> - * Translation truncates length to the IOMMU page size,
> - * check that it did not truncate too much.
> - */
> -if (len & iotlb->addr_mask) {
> -error_report("iommu has granularity incompatible with target AS");
> -return false;
> -}
> -
> -if (vaddr) {
> -*vaddr = memory_region_get_ram_ptr(mr) + xlat;
> -}
> -
> -if (ram_addr) {
> -*ram_addr = memory_region_get_ram_addr(mr) + xlat;
> -}
> -
> -if (read_only) {
> -*read_only = !writable || mr->readonly;
> -}
> -
> -return true;
> -}
> -
>  static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>  {
>  VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> @@ -682,7 +596,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
> IOMMUTLBEntry *iotlb)
>  if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
>  bool read_only;
>
> -if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
> +if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only,
> +  &address_space_memo

Re: [PATCH v10 2/9] s390x/cpu topology: reporting the CPU topology to the guest

2022-10-27 Thread Thomas Huth

On 12/10/2022 18.21, Pierre Morel wrote:

The guest can use the STSI instruction to get a buffer filled
with the CPU topology description.

Let us implement the STSI instruction for the basis CPU topology
level, level 2.

Signed-off-by: Pierre Morel 
---
  include/hw/s390x/cpu-topology.h |   3 +
  target/s390x/cpu.h  |  48 ++
  hw/s390x/cpu-topology.c |   8 ++-
  target/s390x/cpu_topology.c | 109 
  target/s390x/kvm/kvm.c  |   6 +-
  target/s390x/meson.build|   1 +
  6 files changed, 172 insertions(+), 3 deletions(-)
  create mode 100644 target/s390x/cpu_topology.c

diff --git a/include/hw/s390x/cpu-topology.h b/include/hw/s390x/cpu-topology.h
index 66c171d0bc..61c11db017 100644
--- a/include/hw/s390x/cpu-topology.h
+++ b/include/hw/s390x/cpu-topology.h
@@ -13,6 +13,8 @@
  #include "hw/qdev-core.h"
  #include "qom/object.h"
  
+#define S390_TOPOLOGY_POLARITY_H  0x00

+
  typedef struct S390TopoContainer {
  int active_count;
  } S390TopoContainer;
@@ -29,6 +31,7 @@ struct S390Topology {
  S390TopoContainer *socket;
  S390TopoTLE *tle;
  MachineState *ms;
+QemuMutex topo_mutex;
  };
  
  #define TYPE_S390_CPU_TOPOLOGY "s390-topology"

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index 7d6d01325b..d604aa9c78 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -565,6 +565,52 @@ typedef union SysIB {
  } SysIB;
  QEMU_BUILD_BUG_ON(sizeof(SysIB) != 4096);
  
+/* CPU type Topology List Entry */

+typedef struct SysIBTl_cpu {
+uint8_t nl;
+uint8_t reserved0[3];
+uint8_t reserved1:5;
+uint8_t dedicated:1;
+uint8_t polarity:2;
+uint8_t type;
+uint16_t origin;
+uint64_t mask;
+} QEMU_PACKED QEMU_ALIGNED(8) SysIBTl_cpu;
+QEMU_BUILD_BUG_ON(sizeof(SysIBTl_cpu) != 16);
+
+/* Container type Topology List Entry */
+typedef struct SysIBTl_container {
+uint8_t nl;
+uint8_t reserved[6];
+uint8_t id;
+} QEMU_PACKED QEMU_ALIGNED(8) SysIBTl_container;
+QEMU_BUILD_BUG_ON(sizeof(SysIBTl_container) != 8);
+
+#define TOPOLOGY_NR_MAG  6
+#define TOPOLOGY_NR_MAG6 0
+#define TOPOLOGY_NR_MAG5 1
+#define TOPOLOGY_NR_MAG4 2
+#define TOPOLOGY_NR_MAG3 3
+#define TOPOLOGY_NR_MAG2 4
+#define TOPOLOGY_NR_MAG1 5
+/* Configuration topology */
+typedef struct SysIB_151x {
+uint8_t  reserved0[2];
+uint16_t length;
+uint8_t  mag[TOPOLOGY_NR_MAG];
+uint8_t  reserved1;
+uint8_t  mnest;
+uint32_t reserved2;
+char tle[0];
+} QEMU_PACKED QEMU_ALIGNED(8) SysIB_151x;
+QEMU_BUILD_BUG_ON(sizeof(SysIB_151x) != 16);
+
+/* Maxi size of a SYSIB structure is when all CPU are alone in a container */
+#define S390_TOPOLOGY_SYSIB_SIZE (sizeof(SysIB_151x) + 
\
+  S390_MAX_CPUS * (sizeof(SysIBTl_container) + 
\
+   sizeof(SysIBTl_cpu)))
+
+
  /* MMU defines */
  #define ASCE_ORIGIN   (~0xfffULL) /* segment table origin 
*/
  #define ASCE_SUBSPACE 0x200   /* subspace group control   
*/
@@ -843,4 +889,6 @@ S390CPU *s390_cpu_addr2state(uint16_t cpu_addr);
  
  #include "exec/cpu-all.h"
  
+void insert_stsi_15_1_x(S390CPU *cpu, int sel2, __u64 addr, uint8_t ar);

+
  #endif
diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index 42b22a1831..c73cebfe6f 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -54,8 +54,6 @@ void s390_topology_new_cpu(int core_id)
  return;
  }
  
-socket_id = core_id / topo->cpus;

-
  /*
   * At the core level, each CPU is represented by a bit in a 64bit
   * unsigned long which represent the presence of a CPU.
@@ -76,8 +74,13 @@ void s390_topology_new_cpu(int core_id)
  bit %= 64;
  bit = 63 - bit;
  
+qemu_mutex_lock(&topo->topo_mutex);

+
+socket_id = core_id / topo->cpus;
  topo->socket[socket_id].active_count++;
  set_bit(bit, &topo->tle[socket_id].mask[origin]);
+
+qemu_mutex_unlock(&topo->topo_mutex);
  }
  
  /**

@@ -101,6 +104,7 @@ static void s390_topology_realize(DeviceState *dev, Error 
**errp)
  topo->tle = g_new0(S390TopoTLE, ms->smp.max_cpus);
  
  topo->ms = ms;

+qemu_mutex_init(&topo->topo_mutex);
  }
  
  /**

diff --git a/target/s390x/cpu_topology.c b/target/s390x/cpu_topology.c
new file mode 100644
index 00..df86a98f23
--- /dev/null
+++ b/target/s390x/cpu_topology.c
@@ -0,0 +1,109 @@
+/*
+ * QEMU S390x CPU Topology
+ *
+ * Copyright IBM Corp. 2022
+ * Author(s): Pierre Morel 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "hw/s390x/pv.h"
+#include "hw/sysbus.h"
+#include "hw/s390x/cpu-topology.h"
+#include "hw/s390x/sclp.h"
+
+#define S390_TOPOLOGY_MAX_STSI_SIZE (S3

Re: [PATCH v10 3/9] s390x/cpu_topology: resetting the Topology-Change-Report

2022-10-27 Thread Thomas Huth

On 12/10/2022 18.21, Pierre Morel wrote:

During a subsystem reset the Topology-Change-Report is cleared
by the machine.
Let's ask KVM to clear the Modified Topology Change Report (MTCR)
  bit of the SCA in the case of a subsystem reset.

Signed-off-by: Pierre Morel 
Reviewed-by: Nico Boehr 
Reviewed-by: Janis Schoetterl-Glausch 
---
  target/s390x/cpu.h   |  1 +
  target/s390x/kvm/kvm_s390x.h |  1 +
  hw/s390x/cpu-topology.c  | 12 
  hw/s390x/s390-virtio-ccw.c   |  1 +
  target/s390x/cpu-sysemu.c|  7 +++
  target/s390x/kvm/kvm.c   | 23 +++
  6 files changed, 45 insertions(+)

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index d604aa9c78..9b35795ac8 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -825,6 +825,7 @@ void s390_enable_css_support(S390CPU *cpu);
  void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data arg);
  int s390_assign_subch_ioeventfd(EventNotifier *notifier, uint32_t sch_id,
  int vq, bool assign);
+void s390_cpu_topology_reset(void);
  #ifndef CONFIG_USER_ONLY
  unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
  #else
diff --git a/target/s390x/kvm/kvm_s390x.h b/target/s390x/kvm/kvm_s390x.h
index aaae8570de..a13c8fb9a3 100644
--- a/target/s390x/kvm/kvm_s390x.h
+++ b/target/s390x/kvm/kvm_s390x.h
@@ -46,5 +46,6 @@ void kvm_s390_crypto_reset(void);
  void kvm_s390_restart_interrupt(S390CPU *cpu);
  void kvm_s390_stop_interrupt(S390CPU *cpu);
  void kvm_s390_set_diag318(CPUState *cs, uint64_t diag318_info);
+int kvm_s390_topology_set_mtcr(uint64_t attr);
  
  #endif /* KVM_S390X_H */

diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index c73cebfe6f..9f202621d0 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -107,6 +107,17 @@ static void s390_topology_realize(DeviceState *dev, Error 
**errp)
  qemu_mutex_init(&topo->topo_mutex);
  }
  
+/**

+ * s390_topology_reset:
+ * @dev: the device
+ *
+ * Calls the sysemu topology reset
+ */
+static void s390_topology_reset(DeviceState *dev)
+{
+s390_cpu_topology_reset();
+}
+
  /**
   * topology_class_init:
   * @oc: Object class
@@ -120,6 +131,7 @@ static void topology_class_init(ObjectClass *oc, void *data)
  
  dc->realize = s390_topology_realize;

  set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+dc->reset = s390_topology_reset;
  }
  
  static const TypeInfo cpu_topology_info = {

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index aa99a62e42..362378454a 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -113,6 +113,7 @@ static const char *const reset_dev_types[] = {
  "s390-flic",
  "diag288",
  TYPE_S390_PCI_HOST_BRIDGE,
+TYPE_S390_CPU_TOPOLOGY,
  };
  
  static void subsystem_reset(void)

diff --git a/target/s390x/cpu-sysemu.c b/target/s390x/cpu-sysemu.c
index 948e4bd3e0..707c0b658c 100644
--- a/target/s390x/cpu-sysemu.c
+++ b/target/s390x/cpu-sysemu.c
@@ -306,3 +306,10 @@ void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data 
arg)
  kvm_s390_set_diag318(cs, arg.host_ulong);
  }
  }
+
+void s390_cpu_topology_reset(void)
+{
+if (kvm_enabled()) {
+kvm_s390_topology_set_mtcr(0);
+}
+}
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index f96630440b..9c994d27d5 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2585,3 +2585,26 @@ int kvm_s390_get_zpci_op(void)
  {
  return cap_zpci_op;
  }
+
+int kvm_s390_topology_set_mtcr(uint64_t attr)
+{
+struct kvm_device_attr attribute = {
+.group = KVM_S390_VM_CPU_TOPOLOGY,
+.attr  = attr,
+};
+int ret;
+
+if (!s390_has_feat(S390_FEAT_CONFIGURATION_TOPOLOGY)) {
+return -EFAULT;


EFAULT is something that indicates a bad address (e.g. a segmentation fault) 
... so this definitely sounds like a bad choice for an error code here.


 Thomas





Re: [PATCH 5/7] block/nfs: Fix 32-bit Windows build

2022-10-27 Thread Bin Meng
On Thu, Oct 27, 2022 at 3:55 PM Kevin Wolf  wrote:
>
> On 27.10.2022 at 04:45, Bin Meng wrote:
> > Hi Kevin,
> > [...]
> > Will you queue this patch via the block tree?
>
> Just to be sure, you mean only patch 5? Yes, I can do that.
>

Yes, only this one. Thank you.

Regards,
Bin



Re: [PATCH v5 00/13] Instantiate VT82xx functions in host device

2022-10-27 Thread Bernhard Beschow
On 16 September 2022 14:36:05 UTC, "Philippe Mathieu-Daudé" wrote:
>On 12/9/22 21:50, Bernhard Beschow wrote:
>> On 1 September 2022 11:41:14 UTC, Bernhard Beschow wrote:
>
>>> Testing done:
>>> 
>>> * `qemu-system-ppc -machine pegasos2 -rtc base=localtime -device 
>>> ati-vga,guest_hwcursor=true,romfile="" -cdrom morphos-3.17.iso -kernel 
>>> morphos-3.17/boot.img`
>>> 
>>>   Boots successfully and it is possible to open games and tools.
>>> 
>>> 
>>> 
>>> * I was unable to test the fuloong2e board even before this series since it 
>>> seems to be unfinished [1].
>>> 
>>>   A buildroot-baked kernel [2] booted but doesn't find its root partition, 
>>> though the issues could be in the buildroot receipt I created.
>>> 
>>> 
>>> 
>>> [1] https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2
>>> 
>>> [2] https://github.com/shentok/buildroot/commits/fuloong2e
>>> 
>> 
>> Copying from v2 (just found it in my spam folder :/):
>> Series:
>> Reviewed-by: Philippe Mathieu-Daudé 
>> 
>> Review seems complete, thanks to all who participated! Now we just need 
>> someone to queue this series.
>> 
>> Best regards,
>> Bernhard
>
>Excellent cleanup! Series queued to mips-next.

Hi Phil,

would you mind doing a pull request in time for 7.2?

Thanks,
Bernhard




Re: [PATCH v1 0/3] target/riscv: Apply KVM policy to ISA extensions

2022-10-27 Thread Andrew Jones
On Thu, Oct 27, 2022 at 7:52 AM Mayuresh Chitale
 wrote:
>
> Currently the single and multi letter ISA extensions exposed to the guest
> vcpu don't conform to the KVM policies. This patchset updates the kvm headers
> and applies policies set in KVM to the extensions exposed to the guest.
>
> Mayuresh Chitale (3):
>   update-linux-headers: Version 6.1-rc2
>   target/riscv: Extend isa_ext_data for single letter extensions
>   target/riscv: kvm: Support selecting VCPU extensions
>

I already reviewed this internally and it hasn't changed, so

for the series

Reviewed-by: Andrew Jones 

Thanks,
drew



Re: [PATCH v1 00/12] Introduce xenpv machine for arm architecture

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:

> Hi,
> This series adds a xenpv machine for aarch64. The motivation behind creating a
> xenpv machine with IOREQ and TPM was to enable each guest on Xen aarch64 to
> have its own unique, emulated TPM.
>
> This series does the following:
> 1. Moved common xen functionalities from hw/i386/xen to hw/xen/ so those 
> can
>be used for aarch64.
> 2. We added a minimal xenpv arm machine which creates an IOREQ server and
>supports TPM.

Now I have some CI minutes again:

  https://gitlab.com/stsquad/qemu/-/pipelines/677956972/failures

which broadly break down into:

  * GUEST_TPM_BASE define missing
  * #include  failure on builds that don't enable Xen
  * CPUTLBEntryFull f; breakage (tcg bits in a non-tcg build?)

-- 
Alex Bennée



Re: [PATCH v1 10/12] hw/arm: introduce xenpv machine

2022-10-27 Thread Alex Bennée


Julien Grall  writes:

> Hi,
>
> There seem to be some missing patches on xen-devel (including the
> cover letter). Is that expected?
>
> On 15/10/2022 06:07, Vikram Garhwal wrote:
>> Add a new machine xenpv which creates an IOREQ server to register/connect with
>> Xen Hypervisor.
>
> I don't like the name 'xenpv' because it doesn't convey the fact that
> some of the HW may be emulated rather than para-virtualized. In fact
> one may only want to use for emulating devices.
>
> Potential name would be 'xen-arm' or re-using 'virt' but with
> 'accel=xen' to select a Xen layout.

I don't think you can re-use the machine name and select by accelerator
because the virt machine does quite a lot of other stuff this model
doesn't support. However I've been calling this concept "xen-virt" or
maybe the explicit "xen-virtio" because that is what it is targeting.

-- 
Alex Bennée



Re: [PATCH v1 10/12] hw/arm: introduce xenpv machine

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:

> Add a new machine xenpv which creates an IOREQ server to register/connect
> with the Xen Hypervisor.
>

> Optional: When CONFIG_TPM is enabled, it also creates a tpm-tis-device, adds
> a TPM emulator, connects to swtpm running on the host machine via a chardev
> socket, and supports TPM functionality for a guest domain.

> +
> +static void xen_arm_machine_class_init(ObjectClass *oc, void *data)
> +{
> +
> +MachineClass *mc = MACHINE_CLASS(oc);
> +mc->desc = "Xen Para-virtualized PC";
> +mc->init = xen_arm_init;
> +mc->max_cpus = 1;
> +machine_class_allow_dynamic_sysbus_dev(mc, TYPE_TPM_TIS_SYSBUS);

This needs #ifdef CONFIG_TPM because, when building with --disable-tpm to try
to get the cross build working, it fails with:

../../hw/arm/xen_arm.c: In function ‘xen_arm_machine_class_init’:
../../hw/arm/xen_arm.c:148:48: error: ‘TYPE_TPM_TIS_SYSBUS’ undeclared (first 
use in this function)
  148 | machine_class_allow_dynamic_sysbus_dev(mc, TYPE_TPM_TIS_SYSBUS);
  |^~~
../../hw/arm/xen_arm.c:148:48: note: each undeclared identifier is reported 
only once for each function it appears in

-- 
Alex Bennée



Re: [PATCH v1 08/12] hw/xen/xen-hvm-common: skip ioreq creation on ioreq registration failure

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:

> From: Stefano Stabellini 
>
> On ARM it is possible to have a functioning xenpv machine with only the
> PV backends and no IOREQ server. If the IOREQ server creation fails, continue
> to the PV backend initialization.
>
> Signed-off-by: Stefano Stabellini 
> ---
>  hw/xen/xen-hvm-common.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
> index f848f9e625..7bccf595fc 100644
> --- a/hw/xen/xen-hvm-common.c
> +++ b/hw/xen/xen-hvm-common.c
> @@ -777,7 +777,11 @@ void xen_register_ioreq(XenIOState *state, unsigned int 
> max_cpus,
>  goto err;
>  }
>  
> -xen_create_ioreq_server(xen_domid, &state->ioservid);
> +rc = xen_create_ioreq_server(xen_domid, &state->ioservid);
> +if (rc) {
> +DPRINTF("xen: failed to create ioreq server\n");

This should be a warn_report to properly inform the user.

> +goto no_ioreq;

Maybe pushing the rest of this function into a local subroutine would
reduce the amount of goto messing about. Other candidates for cleaning
up/modernising:

  - g_malloc to g_new0
  - perror -> error_setg_errno

> +}
>  
>  state->exit.notify = xen_exit_notifier;
>  qemu_add_exit_notifier(&state->exit);
> @@ -842,6 +846,7 @@ void xen_register_ioreq(XenIOState *state, unsigned int 
> max_cpus,
>  QLIST_INIT(&state->dev_list);
>  device_listener_register(&state->device_listener);
>  
> +no_ioreq:
>  xen_bus_init();
>  
>  /* Initialize backend core & drivers */


-- 
Alex Bennée



Re: [PATCH v1 04/12] hw/i386/xen/xen-hvm: move x86-specific fields out of XenIOState

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:

> From: Stefano Stabellini 
>
> In preparation for moving most of the xen-hvm code to an arch-neutral
> location, move:
> - shared_vmport_page
> - log_for_dirtybit
> - dirty_bitmap
> - suspend
> - wakeup
>
> out of the XenIOState struct, as these are only used on x86, especially the
> ones related to dirty logging.
> The updated XenIOState can be used for both aarch64 and x86.
>
> Also, remove free_phys_offset as it was unused.
>
> Signed-off-by: Stefano Stabellini 
> Signed-off-by: Vikram Garhwal 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée



Re: [PATCH] avocado: use sha1 for fc31 imgs to avoid first time re-download

2022-10-27 Thread Daniel P . Berrangé
On Thu, Oct 27, 2022 at 09:46:29AM +0200, Thomas Huth wrote:
> On 24/10/2022 11.02, Daniel P. Berrangé wrote:
> > On Sat, Oct 22, 2022 at 02:03:50PM -0300, Daniel Henrique Barboza wrote:
> > > 'make check-avocado' will download any images that aren't present in the
> > > cache via 'get-vm-images' in tests/Makefile.include. The target that
> > > downloads fedora 31 images, get-vm-image-fedora-31, will use 'avocado
> > > vmimage get  --distro=fedora --distro-version=31 --arch=(...)' to
> > > download the image for each arch. Note that this command does not
> > > support any argument to set the hash algorithm used and, based on the
> > > avocado source code [1], DEFAULT_HASH_ALGORITHM is set to "sha1". The
> > > sha1 hash is stored in a Fedora-Cloud-Base-31-1.9.{ARCH}.qcow2-CHECKSUM
> > > in the cache.
> > 
> > > For now, in QEMU, let's use sha1 for all Fedora 31 images. This will
> > > immediately spare us at least one extra download for each Fedora 31
> > > image in all our CI runs.
> > > 
> > > [1] https://github.com/avocado-framework/avocado.git @ 942a5d6972906
> > > [2] https://github.com/avocado-framework/avocado/issues/5496
> > 
> > Can we just ask the Avocado maintainers to fix this problem on their
> > side, as a priority item, to allow use of a modern hash algorithm? We've
> > already had this problem in QEMU for over a year AFAICT, so it doesn't
> > seem like we urgently need a workaround on the QEMU side; we can get
> > the Avocado devs to commit to fixing it in the next month.
> 
> Do we have such a commitment? ... The avocado version in QEMU is completely
> backlevel these days, it's still using version 88.1 from May 2021, i.e.
> there hasn't been any update in more than a year. I recently tried to
> bump it to a newer version on my own (since I'm still suffering from the
> problem that find_free_port() does not work if you don't have a local IPv6
> address), but it's not that straight forward since the recent versions of
> avocado changed a lot of things (e.g. the new nrunner - do we want to run
> tests in parallel? If so it breaks a lot of the timeout settings, I think),
> so an update needs a lot of careful testing...

That it is so difficult to update Avocado after barely more than
1 year is not exactly a strong vote of confidence in our continued
use of Avocado long term :-(

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v3 0/7] hostmem: NUMA-aware memory preallocation using ThreadContext

2022-10-27 Thread David Hildenbrand

On 14.10.22 15:47, David Hildenbrand wrote:

This is a follow-up on "util: NUMA aware memory preallocation" [1] by
Michal.

Setting the CPU affinity of threads from inside QEMU usually isn't
easily possible, because we don't want QEMU -- once started and running
guest code -- to be able to mess up the system. QEMU disallows relevant
syscalls using seccomp, such that any such invocation will fail.

Especially for memory preallocation in memory backends, a suboptimal CPU
affinity can significantly increase guest startup time, for example, when
running large VMs backed by huge/gigantic pages, because of NUMA effects.
For NUMA-aware preallocation, we have to set the CPU affinity. However:

(1) Once preallocation threads are created during preallocation, management
 tools can no longer intervene to change the affinity. These threads
 are created automatically on demand.
(2) QEMU cannot easily set the CPU affinity itself.
(3) The CPU affinity derived from the NUMA bindings of the memory backend
 might not necessarily be exactly the CPUs we actually want to use
 (e.g., CPU-less NUMA nodes, CPUs that are pinned/used for other VMs).

There is an easy "workaround". If we have a thread with the right CPU
affinity, we can simply create new threads on demand via that prepared
context. So, all we have to do is setup and create such a context ahead
of time, to then configure preallocation to create new threads via that
environment.

So, let's introduce a user-creatable "thread-context" object that
essentially consists of a context thread used to create new threads.
QEMU can either try setting the CPU affinity itself ("cpu-affinity",
"node-affinity" property), or upper layers can extract the thread id
("thread-id" property) to configure it externally.

Make memory-backends consume a thread-context object
(via the "prealloc-context" property) and use it when preallocating to
create new threads with the desired CPU affinity. Further, to make it
easier to use, allow creation of "thread-context" objects, including
setting the CPU affinity directly from QEMU, before enabling the
sandbox option.


Quick test on a system with 2 NUMA nodes:

Without CPU affinity:
 time qemu-system-x86_64 \
 -object 
memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind
 \
 -nographic -monitor stdio

 real0m5.383s
 real0m3.499s
 real0m5.129s
 real0m4.232s
 real0m5.220s
 real0m4.288s
 real0m3.582s
 real0m4.305s
 real0m5.421s
 real0m4.502s

 -> It heavily depends on the scheduler CPU selection

With CPU affinity:
 time qemu-system-x86_64 \
 -object thread-context,id=tc1,node-affinity=0 \
 -object 
memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind,prealloc-context=tc1
 \
 -sandbox enable=on,resourcecontrol=deny \
 -nographic -monitor stdio

 real0m1.959s
 real0m1.942s
 real0m1.943s
 real0m1.941s
 real0m1.948s
 real0m1.964s
 real0m1.949s
 real0m1.948s
 real0m1.941s
 real0m1.937s

On reasonably large VMs, the speedup can be quite significant.

While this concept is currently only used for short-lived preallocation
threads, nothing major speaks against reusing the concept for other
threads that are harder to identify/configure -- except that
we need additional (idle) context threads that are otherwise left unused.

This series does not yet tackle concurrent preallocation of memory
backends. Memory backend objects are created and memory is preallocated one
memory backend at a time -- and there is currently no way to do
preallocation asynchronously.

[1] 
https://lkml.kernel.org/r/ffdcd118d59b379ede2b64745144165a40f6a813.1652165704.git.mpriv...@redhat.com

v2 -> v3:
* "util: Introduce ThreadContext user-creatable object"
  -> Further improve documentation and patch description and add ACK. [Markus]
* "util: Add write-only "node-affinity" property for ThreadContext"
  -> Further improve documentation and patch description and add ACK. [Markus]

v1 -> v2:
* Fixed some minor style nits
* "util: Introduce ThreadContext user-creatable object"
  -> Improve documentation and patch description. [Markus]
* "util: Add write-only "node-affinity" property for ThreadContext"
  -> Improve documentation and patch description. [Markus]

RFC -> v1:
* "vl: Allow ThreadContext objects to be created before the sandbox option"
  -> Move parsing of the "name" property before object_create_pre_sandbox
* Added RB's


I'm queuing this to

https://github.com/davidhildenbrand/qemu.git mem-next

And most probably send a MR tomorrow before soft-freeze.

--
Thanks,

David / dhildenb




Re: [PATCH v1 05/12] hw/i386/xen/xen-hvm: create arch_handle_ioreq and arch_xen_set_memory

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:

> From: Stefano Stabellini 
>
> In preparation for moving most of the xen-hvm code to an arch-neutral
> location, move the x86-specific portion of xen_set_memory to
> arch_xen_set_memory.
>
> Also move handle_vmport_ioreq to arch_handle_ioreq.
>
> NOTE: This patch breaks the build. The next patch fixes the build issue.
> The reason for creating this patch is that there is a lot of new code
> addition and pure code movement done for enabling Xen on ARM. Keeping this
> patch separate makes it easier to review.

But you do intend to squash the patches for the final version? We don't
want to intentionally break bisection.

Otherwise:

Reviewed-by: Alex Bennée 


-- 
Alex Bennée



Re: [PATCH v10 1/9] s390x/cpu topology: core_id sets s390x CPU topology

2022-10-27 Thread Janis Schoetterl-Glausch
On Thu, 2022-10-27 at 10:05 +0200, Thomas Huth wrote:
> On 24/10/2022 21.25, Janis Schoetterl-Glausch wrote:
> > On Wed, 2022-10-12 at 18:20 +0200, Pierre Morel wrote:
> > > In the S390x CPU topology the core_id specifies the CPU address
> > > and the position of the core within the topology.
> > > 
> > > Let's build the topology based on the core_id.
> > > 
> > > Signed-off-by: Pierre Morel 
> > > ---
> > >   include/hw/s390x/cpu-topology.h |  45 +++
> > >   hw/s390x/cpu-topology.c | 132 
> > >   hw/s390x/s390-virtio-ccw.c  |  21 +
> > >   hw/s390x/meson.build|   1 +
> > >   4 files changed, 199 insertions(+)
> > >   create mode 100644 include/hw/s390x/cpu-topology.h
> > >   create mode 100644 hw/s390x/cpu-topology.c
> > > 
> > > diff --git a/include/hw/s390x/cpu-topology.h 
> > > b/include/hw/s390x/cpu-topology.h
> > > new file mode 100644
> > > index 00..66c171d0bc
> > > --- /dev/null
> > > +++ b/include/hw/s390x/cpu-topology.h
> > > @@ -0,0 +1,45 @@
> > > +/*
> > > + * CPU Topology
> > > + *
> > > + * Copyright 2022 IBM Corp.
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> > > + * your option) any later version. See the COPYING file in the top-level
> > > + * directory.
> > > + */
> > > +#ifndef HW_S390X_CPU_TOPOLOGY_H
> > > +#define HW_S390X_CPU_TOPOLOGY_H
> > > +
> > > +#include "hw/qdev-core.h"
> > > +#include "qom/object.h"
> > > +
> > > +typedef struct S390TopoContainer {
> > > +int active_count;
> > > +} S390TopoContainer;
> > > +
> > > +#define S390_TOPOLOGY_CPU_IFL 0x03
> > > +#define S390_TOPOLOGY_MAX_ORIGIN ((63 + S390_MAX_CPUS) / 64)
> > > +typedef struct S390TopoTLE {
> > > +uint64_t mask[S390_TOPOLOGY_MAX_ORIGIN];
> > > +} S390TopoTLE;
> > 
> > Since this actually represents multiple TLEs, you might want to change the
> > name of the struct to reflect this. S390TopoTLEList maybe?
> 
> Didn't TLE mean "Topology List Entry"? (by the way, Pierre, please explain 

Yes.

> this three-letter acronym somewhere in this header in a comment)...
> 
> So expanding the TLE, this would mean S390TopoTopologyListEntryList ? ... 
> this is getting weird...

:D indeed. So the leaves of the topology tree as stored by STSI are lists
of CPU-type TLEs which aren't empty, i.e. represent some CPUs.
Whereas this struct is used to track which CPU-type TLEs need to be created.
It doesn't represent a TLE and doesn't represent the list of CPU-type TLEs.
So yeah, you're right, not a good name.

Off the top of my head I'd suggest S390TopoCPUSet. It's a bitmap, which is
kind of a set. Maybe S390TopoSocketCPUSet to reflect that it is the set of
CPUs in a socket, although, if we ever support different polarizations, etc.
that wouldn't really be true anymore, since that creates additional levels,
so maybe not. (In that case the leaf list of CPU-type TLEs is a flattened
tree.)

> Also, this is not a "list" in the sense of a linked 
> list, as one might expect at a first glance, so this is all very confusing 
> here. Could you please come up with some better naming?
> 
>   Thomas
> 
> 




Re: [PATCH v1 11/12] meson.build: enable xenpv machine build for ARM

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:

> Add CONFIG_XEN for aarch64 to support building for ARM targets.

So, to be clear, an --enable-xen-only build for any of these binaries
essentially ends up being the same thing, just with a slightly less
discombobulating name?

Maybe, given there is no real architecture-specific stuff, we should just
create a neutral binary for --enable-xen (e.g. qemu-xen-backend)?

Anyway:

Reviewed-by: Alex Bennée 


>
> Signed-off-by: Vikram Garhwal 
> Signed-off-by: Stefano Stabellini 
> ---
>  meson.build | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/meson.build b/meson.build
> index b686dfef75..0027d7d195 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -125,7 +125,7 @@ endif
>  if cpu in ['x86', 'x86_64', 'arm', 'aarch64']
># i386 emulator provides xenpv machine type for multiple architectures
>accelerator_targets += {
> -'CONFIG_XEN': ['i386-softmmu', 'x86_64-softmmu'],
> +'CONFIG_XEN': ['i386-softmmu', 'x86_64-softmmu', 'aarch64-softmmu'],
>}
>  endif
>  if cpu in ['x86', 'x86_64']


-- 
Alex Bennée



Re: [PATCH v10 3/9] s390x/cpu_topology: resetting the Topology-Change-Report

2022-10-27 Thread Pierre Morel




On 10/27/22 10:14, Thomas Huth wrote:

On 12/10/2022 18.21, Pierre Morel wrote:

During a subsystem reset the Topology-Change-Report is cleared
by the machine.
Let's ask KVM to clear the Modified Topology Change Report (MTCR)
bit of the SCA in the case of a subsystem reset.

Signed-off-by: Pierre Morel 
Reviewed-by: Nico Boehr 
Reviewed-by: Janis Schoetterl-Glausch 
---
  target/s390x/cpu.h   |  1 +
  target/s390x/kvm/kvm_s390x.h |  1 +
  hw/s390x/cpu-topology.c  | 12 
  hw/s390x/s390-virtio-ccw.c   |  1 +
  target/s390x/cpu-sysemu.c    |  7 +++
  target/s390x/kvm/kvm.c   | 23 +++
  6 files changed, 45 insertions(+)

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index d604aa9c78..9b35795ac8 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -825,6 +825,7 @@ void s390_enable_css_support(S390CPU *cpu);
  void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data arg);
  int s390_assign_subch_ioeventfd(EventNotifier *notifier, uint32_t sch_id,
  int vq, bool assign);
+void s390_cpu_topology_reset(void);
  #ifndef CONFIG_USER_ONLY
  unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
  #else
diff --git a/target/s390x/kvm/kvm_s390x.h b/target/s390x/kvm/kvm_s390x.h
index aaae8570de..a13c8fb9a3 100644
--- a/target/s390x/kvm/kvm_s390x.h
+++ b/target/s390x/kvm/kvm_s390x.h
@@ -46,5 +46,6 @@ void kvm_s390_crypto_reset(void);
  void kvm_s390_restart_interrupt(S390CPU *cpu);
  void kvm_s390_stop_interrupt(S390CPU *cpu);
  void kvm_s390_set_diag318(CPUState *cs, uint64_t diag318_info);
+int kvm_s390_topology_set_mtcr(uint64_t attr);
  #endif /* KVM_S390X_H */
diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index c73cebfe6f..9f202621d0 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -107,6 +107,17 @@ static void s390_topology_realize(DeviceState *dev, Error **errp)

  qemu_mutex_init(&topo->topo_mutex);
  }
+/**
+ * s390_topology_reset:
+ * @dev: the device
+ *
+ * Calls the sysemu topology reset
+ */
+static void s390_topology_reset(DeviceState *dev)
+{
+    s390_cpu_topology_reset();
+}
+
  /**
   * topology_class_init:
   * @oc: Object class
@@ -120,6 +131,7 @@ static void topology_class_init(ObjectClass *oc, void *data)

  dc->realize = s390_topology_realize;
  set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    dc->reset = s390_topology_reset;
  }
  static const TypeInfo cpu_topology_info = {
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index aa99a62e42..362378454a 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -113,6 +113,7 @@ static const char *const reset_dev_types[] = {
  "s390-flic",
  "diag288",
  TYPE_S390_PCI_HOST_BRIDGE,
+    TYPE_S390_CPU_TOPOLOGY,
  };
  static void subsystem_reset(void)
diff --git a/target/s390x/cpu-sysemu.c b/target/s390x/cpu-sysemu.c
index 948e4bd3e0..707c0b658c 100644
--- a/target/s390x/cpu-sysemu.c
+++ b/target/s390x/cpu-sysemu.c
@@ -306,3 +306,10 @@ void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data arg)

  kvm_s390_set_diag318(cs, arg.host_ulong);
  }
  }
+
+void s390_cpu_topology_reset(void)
+{
+    if (kvm_enabled()) {
+    kvm_s390_topology_set_mtcr(0);
+    }
+}
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index f96630440b..9c994d27d5 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2585,3 +2585,26 @@ int kvm_s390_get_zpci_op(void)
  {
  return cap_zpci_op;
  }
+
+int kvm_s390_topology_set_mtcr(uint64_t attr)
+{
+    struct kvm_device_attr attribute = {
+    .group = KVM_S390_VM_CPU_TOPOLOGY,
+    .attr  = attr,
+    };
+    int ret;
+
+    if (!s390_has_feat(S390_FEAT_CONFIGURATION_TOPOLOGY)) {
+    return -EFAULT;


EFAULT is something that indicates a bad address (e.g. a segmentation 
fault) ... so this definitely sounds like a bad choice for an error code 
here.


Hmm, yes, ENODEV seems better, no?



  Thomas




--
Pierre Morel
IBM Lab Boeblingen



Re: [PATCH v1 12/12] meson.build: do not set have_xen_pci_passthrough for aarch64 targets

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:

> From: Stefano Stabellini 
>
> have_xen_pci_passthrough is only used for Xen x86 VMs.
>
> Signed-off-by: Stefano Stabellini 

I think this might want to come before 11/12. Anyway:

Reviewed-by: Alex Bennée 

> ---
>  meson.build | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/meson.build b/meson.build
> index 0027d7d195..43e70936ee 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1454,6 +1454,8 @@ have_xen_pci_passthrough = 
> get_option('xen_pci_passthrough') \
> error_message: 'Xen PCI passthrough requested but Xen not 
> enabled') \
>.require(targetos == 'linux',
> error_message: 'Xen PCI passthrough not available on this 
> platform') \
> +  .require(cpu == 'x86'  or cpu == 'x86_64',
> +   error_message: 'Xen PCI passthrough not available on this 
> platform') \
>.allowed()


-- 
Alex Bennée



Re: [PATCH] target/arm: Fixed Privileged Access Never (PAN) for aarch32

2022-10-27 Thread Timofey Kutergin
Hi Peter,
> V8 always implies V7, so we only need to check V7 here.
From a silicon perspective - yes, but as I see it in QEMU,
ARM_FEATURE_V7 and ARM_FEATURE_V8 are independent bits which do not affect
each other in arm_feature() and set_feature(), so they should be tested
separately.
Did I miss something?

Thanks
Best regards
Timofey



On Tue, Oct 25, 2022 at 4:45 PM Peter Maydell 
wrote:

> On Wed, 19 Oct 2022 at 13:15, Timofey Kutergin 
> wrote:
> >
> > - synchronize PSTATE.PAN with changes in CPSR.PAN in aarch32 mode
> > - set PAN bit automatically on exception entry if SCTLR_SPAN bit
> >   is set
> > - throw permission fault during address translation when PAN is
> >   enabled and the kernel tries to access a user-accessible page
> > - ignore SCTLR_XP bit for armv7 and armv8 (conflicts with SCTLR_SPAN).
> >
> > Signed-off-by: Timofey Kutergin 
> > ---
> >  target/arm/helper.c |  6 ++
> >  target/arm/ptw.c| 11 ++-
> >  2 files changed, 16 insertions(+), 1 deletion(-)
>
> Thanks for this patch. I think you've caught all the places
> we aren't correctly implementing AArch32 PAN handling.
>
> > diff --git a/target/arm/helper.c b/target/arm/helper.c
> > index dde64a487a..5299f67e3f 100644
> > --- a/target/arm/helper.c
> > +++ b/target/arm/helper.c
> > @@ -9052,6 +9052,11 @@ void cpsr_write(CPUARMState *env, uint32_t val,
> uint32_t mask,
> >  }
> >  mask &= ~CACHED_CPSR_BITS;
> >  env->uncached_cpsr = (env->uncached_cpsr & ~mask) | (val & mask);
> > +if (env->uncached_cpsr & CPSR_PAN) {
> > +env->pstate |= PSTATE_PAN;
> > +} else {
> > +env->pstate &= ~PSTATE_PAN;
> > +}
>
> This approach means we're storing the PAN bit in two places,
> both in env->uncached_cpsr and in env->pstate. We don't do
> this for any other bits as far as I can see. I think we should
> either:
>  (1) have the code that changes behaviour based on PAN look
>  at either env->pstate or env->uncached_cpsr depending
>  on whether we're AArch64 or AArch32
>  (2) always store the state in env->pstate only, and handle
>  this in read/write of the CPSR the same way we do with
>  other "cached" bits
>
> I think the intention of the current code is (1), and the
> only place we get this wrong is in arm_mmu_idx_el(),
> which is checking env->pstate only. (The other places that
> directly check env->pstate are all in AArch64-only code,
> and various AArch32-only bits of code already check
> env->uncached_cpsr.) A function like
>
> bool arm_pan_enabled(CPUARMState *env)
> {
> if (is_a64(env)) {
> return env->pstate & PSTATE_PAN;
> } else {
> return env->uncached_cpsr & CPSR_PAN;
> }
> }
>
> and then using that in arm_mmu_idx_el() should I think
> mean you don't need to change either cpsr_write() or
> take_aarch32_exception().
>
> >  if (rebuild_hflags) {
> >  arm_rebuild_hflags(env);
> >  }
> > @@ -9592,6 +9597,7 @@ static void take_aarch32_exception(CPUARMState
> *env, int new_mode,
> >  /* ... the target is EL1 and SCTLR.SPAN is 0.  */
> >  if (!(env->cp15.sctlr_el[new_el] & SCTLR_SPAN)) {
> >  env->uncached_cpsr |= CPSR_PAN;
> > +env->pstate |= PSTATE_PAN;
> >  }
> >  break;
> >  }
> > diff --git a/target/arm/ptw.c b/target/arm/ptw.c
> > index 23f16f4ff7..204a73350f 100644
> > --- a/target/arm/ptw.c
> > +++ b/target/arm/ptw.c
> > @@ -659,6 +659,13 @@ static bool get_phys_addr_v6(CPUARMState *env,
> uint32_t address,
> >  goto do_fault;
> >  }
> >
> > +if (regime_is_pan(env, mmu_idx) && !regime_is_user(env,
> mmu_idx) &&
> > +simple_ap_to_rw_prot_is_user(ap >> 1, 1) &&
> > +access_type != MMU_INST_FETCH) {
> > +fi->type = ARMFault_Permission;
> > +goto do_fault;
> > +}
>
> This assumes we're using the SCTLR.AFE==1 simplified
> permissions model, but PAN should apply even if we're using the
> old model. So we need an ap_to_rw_prot_is_user() to check the
> permissions in that model.
>
> The check is also being done before the Access fault check, but
> the architecture says that Access faults take priority over
> Permission faults.
>
> > +
> >  if (arm_feature(env, ARM_FEATURE_V6K) &&
> >  (regime_sctlr(env, mmu_idx) & SCTLR_AFE)) {
> >  /* The simplified model uses AP[0] as an access control
> bit.  */
> > @@ -2506,7 +2513,9 @@ bool get_phys_addr_with_secure(CPUARMState *env,
> target_ulong address,
> >  if (regime_using_lpae_format(env, mmu_idx)) {
> >  return get_phys_addr_lpae(env, address, access_type, mmu_idx,
> >is_secure, false, result, fi);
> > -} else if (regime_sctlr(env, mmu_idx) & SCTLR_XP) {
> > +} else if (arm_feature(env, ARM_FEATURE_V7) ||
> > +   arm_feature(env, ARM_FEATURE_V8) || (
>
> V8 always implies V7, so we only need to check V7 here.

Re: [PATCH v1 09/12] accel/xen/xen-all: export xenstore_record_dm_state

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:

> xenstore_record_dm_state() will also be used in the aarch64 xenpv machine.
>
> Signed-off-by: Vikram Garhwal 
> Signed-off-by: Stefano Stabellini 
> ---
>  accel/xen/xen-all.c  | 2 +-
>  include/hw/xen/xen.h | 2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/accel/xen/xen-all.c b/accel/xen/xen-all.c
> index 69aa7d018b..276625b78b 100644
> --- a/accel/xen/xen-all.c
> +++ b/accel/xen/xen-all.c
> @@ -100,7 +100,7 @@ void xenstore_store_pv_console_info(int i, Chardev *chr)
>  }
>  
>  
> -static void xenstore_record_dm_state(struct xs_handle *xs, const char *state)
> +void xenstore_record_dm_state(struct xs_handle *xs, const char *state)
>  {
>  char path[50];
>  
> diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
> index afdf9c436a..31e9538a5c 100644
> --- a/include/hw/xen/xen.h
> +++ b/include/hw/xen/xen.h
> @@ -9,6 +9,7 @@
>   */
>  
>  #include "exec/cpu-common.h"
> +#include 

This is breaking a bunch of the builds and generally we try and avoid
adding system includes in headers (apart from osdep.h) for this reason.
In fact there is a comment just above to that effect.

I think you can just add struct xs_handle to typedefs.h (or maybe just
xen.h) and directly include xenstore.h in xen-all.c following the usual
rules:

  https://qemu.readthedocs.io/en/latest/devel/style.html#include-directives

It might be worth doing an audit to see what else is including xen.h
needlessly or should be using sysemu/xen.h. 

>  
>  /* xen-machine.c */
>  enum xen_mode {
> @@ -31,5 +32,6 @@ qemu_irq *xen_interrupt_controller_init(void);
>  void xenstore_store_pv_console_info(int i, Chardev *chr);
>  
>  void xen_register_framebuffer(struct MemoryRegion *mr);
> +void xenstore_record_dm_state(struct xs_handle *xs, const char *state);
>  
>  #endif /* QEMU_HW_XEN_H */


-- 
Alex Bennée



Re: [PATCH v1 00/12] Introduce xenpv machine for arm architecture

2022-10-27 Thread Alex Bennée


Vikram Garhwal  writes:

> Hi,
> This series adds a xenpv machine for aarch64. The motivation behind creating
> a xenpv machine with IOREQ and TPM was to enable each guest on Xen aarch64
> to have its own unique and emulated TPM.
>
> This series does following:
> 1. Moved common Xen functionality from hw/i386/xen to hw/xen/ so that it
>    can be used for aarch64.
> 2. Added a minimal xenpv ARM machine which creates an IOREQ server and
>    supports TPM.
>
> Please note that patch 05/12 breaks the build. Patch 06/12 fixes the build
> issue. If needed we can merge patch 05/12 and 06/12. For now we kept these
> separate to make the changes easier to review.
>
> Also, checkpatch.pl fails for 03/12 and 06/12. These failures are due to
> moving old code to a new place; that code was not QEMU code-style compatible.
> No new code was added.

I've finished my review pass. Please CC me on v2 when it's ready ;-)

-- 
Alex Bennée



Re: [PATCH] target/arm: Fixed Privileged Access Never (PAN) for aarch32

2022-10-27 Thread Peter Maydell
On Thu, 27 Oct 2022 at 10:22, Timofey Kutergin  wrote:
> > V8 always implies V7, so we only need to check V7 here.

> From a silicon perspective - yes, but as I see it in QEMU,
> ARM_FEATURE_V7 and ARM_FEATURE_V8 are independent bits which do not affect 
> each
> other in arm_feature() and set_feature() so they should be tested separately.
> Did I miss something?

In arm_cpu_realizefn() there is code which sets feature flags
that are always implied by other feature flags. There we set
the V7VE flag if V8 is set, and the V7 flag if V7VE is set.
So we can rely on any v8 CPU having the V7 feature flag set.

thanks
-- PMM



Re: [PATCH v5 00/13] Instantiate VT82xx functions in host device

2022-10-27 Thread Daniel Henrique Barboza




On 10/27/22 05:21, Bernhard Beschow wrote:

On 16 September 2022 14:36:05 UTC, "Philippe Mathieu-Daudé" wrote:

On 12/9/22 21:50, Bernhard Beschow wrote:

On 1 September 2022 11:41:14 UTC, Bernhard Beschow wrote:



Testing done:

* `qemu-system-ppc -machine pegasos2 -rtc base=localtime -device 
ati-vga,guest_hwcursor=true,romfile="" -cdrom morphos-3.17.iso -kernel 
morphos-3.17/boot.img`

   Boots successfully and it is possible to open games and tools.



* I was unable to test the fuloong2e board even before this series since it 
seems to be unfinished [1].

   A buildroot-baked kernel [2] booted but doesn't find its root partition, 
though the issue could be in the buildroot recipe I created.



[1] https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2

[2] https://github.com/shentok/buildroot/commits/fuloong2e



Copying from v2 (just found it in my spam folder :/):
Series:
Reviewed-by: Philippe Mathieu-Daudé 

Review seems complete, thanks to all who participated! Now we just need someone 
to queue this series.

Best regards,
Bernhard


Excellent cleanup! Series queued to mips-next.


Hi Phil,

would you mind doing a pull request in time for 7.2?


I believe Phil was having problems with his amsat.org email. It's
better to CC him using his work email phi...@linaro.org (just added
it).

Phil, since this has pegasos2 changes I can queue it up via ppc-next
if you like. I'll toss a PR tomorrow.



Daniel





Thanks,
Bernhard






Re: [PATCH] target/hppa: Fix fid instruction emulation

2022-10-27 Thread Richard Henderson

On 10/27/22 16:31, Helge Deller wrote:

The fid instruction (Floating-Point Identify) puts the FPU model and
revision into the Status Register. Since those values shouldn't be 0,
store values there which a PCX-L2 (for 32-bit) or a PCX-W2 (for 64-bit)
would return.

Signed-off-by: Helge Deller 

diff --git a/target/hppa/insns.decode b/target/hppa/insns.decode
index c7a7e997f9..3ba5f9885a 100644
--- a/target/hppa/insns.decode
+++ b/target/hppa/insns.decode
@@ -388,10 +388,8 @@ fmpyfadd_d  101110 rm1:5 rm2:5 ... 0 1 ..0 0 0 neg:1 
t:5ra3=%rc32

  # Floating point class 0

-# FID.  With r = t = 0, which via fcpy puts 0 into fr0.
-# This is machine/revision = 0, which is reserved for simulator.


Is there something in particular for which this is failing?
Per the manual, 0 means simulator, which we are.
So far we haven't identified as a particular cpu, have we?



+static bool trans_fid_f(DisasContext *ctx, arg_fid_f *a)
+{
+nullify_over(ctx);
+#if TARGET_REGISTER_BITS == 64
+save_frd(0, tcg_const_i64(0x130800)); /* PA8700 (PCX-W2) */
+#else
+save_frd(0, tcg_const_i64(0x0f0800)); /* PA7300LC (PCX-L2) */
+#endif
+return nullify_end(ctx);
+}


Missing ULL suffix.


r~



Re: [PATCH v10 3/9] s390x/cpu_topology: resetting the Topology-Change-Report

2022-10-27 Thread Cédric Le Goater

On 10/27/22 11:11, Pierre Morel wrote:



On 10/27/22 10:14, Thomas Huth wrote:

On 12/10/2022 18.21, Pierre Morel wrote:

During a subsystem reset the Topology-Change-Report is cleared
by the machine.
Let's ask KVM to clear the Modified Topology Change Report (MTCR)
bit of the SCA in the case of a subsystem reset.

Signed-off-by: Pierre Morel 
Reviewed-by: Nico Boehr 
Reviewed-by: Janis Schoetterl-Glausch 
---
  target/s390x/cpu.h   |  1 +
  target/s390x/kvm/kvm_s390x.h |  1 +
  hw/s390x/cpu-topology.c  | 12 
  hw/s390x/s390-virtio-ccw.c   |  1 +
  target/s390x/cpu-sysemu.c    |  7 +++
  target/s390x/kvm/kvm.c   | 23 +++
  6 files changed, 45 insertions(+)

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index d604aa9c78..9b35795ac8 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -825,6 +825,7 @@ void s390_enable_css_support(S390CPU *cpu);
  void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data arg);
  int s390_assign_subch_ioeventfd(EventNotifier *notifier, uint32_t sch_id,
  int vq, bool assign);
+void s390_cpu_topology_reset(void);
  #ifndef CONFIG_USER_ONLY
  unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
  #else
diff --git a/target/s390x/kvm/kvm_s390x.h b/target/s390x/kvm/kvm_s390x.h
index aaae8570de..a13c8fb9a3 100644
--- a/target/s390x/kvm/kvm_s390x.h
+++ b/target/s390x/kvm/kvm_s390x.h
@@ -46,5 +46,6 @@ void kvm_s390_crypto_reset(void);
  void kvm_s390_restart_interrupt(S390CPU *cpu);
  void kvm_s390_stop_interrupt(S390CPU *cpu);
  void kvm_s390_set_diag318(CPUState *cs, uint64_t diag318_info);
+int kvm_s390_topology_set_mtcr(uint64_t attr);
  #endif /* KVM_S390X_H */
diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index c73cebfe6f..9f202621d0 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -107,6 +107,17 @@ static void s390_topology_realize(DeviceState *dev, Error 
**errp)
  qemu_mutex_init(&topo->topo_mutex);
  }
+/**
+ * s390_topology_reset:
+ * @dev: the device
+ *
+ * Calls the sysemu topology reset
+ */
+static void s390_topology_reset(DeviceState *dev)
+{
+    s390_cpu_topology_reset();
+}
+
  /**
   * topology_class_init:
   * @oc: Object class
@@ -120,6 +131,7 @@ static void topology_class_init(ObjectClass *oc, void *data)
  dc->realize = s390_topology_realize;
  set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    dc->reset = s390_topology_reset;
  }
  static const TypeInfo cpu_topology_info = {
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index aa99a62e42..362378454a 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -113,6 +113,7 @@ static const char *const reset_dev_types[] = {
  "s390-flic",
  "diag288",
  TYPE_S390_PCI_HOST_BRIDGE,
+    TYPE_S390_CPU_TOPOLOGY,
  };
  static void subsystem_reset(void)
diff --git a/target/s390x/cpu-sysemu.c b/target/s390x/cpu-sysemu.c
index 948e4bd3e0..707c0b658c 100644
--- a/target/s390x/cpu-sysemu.c
+++ b/target/s390x/cpu-sysemu.c
@@ -306,3 +306,10 @@ void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data 
arg)
  kvm_s390_set_diag318(cs, arg.host_ulong);
  }
  }
+
+void s390_cpu_topology_reset(void)
+{
+    if (kvm_enabled()) {
+    kvm_s390_topology_set_mtcr(0);
+    }
+}
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index f96630440b..9c994d27d5 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2585,3 +2585,26 @@ int kvm_s390_get_zpci_op(void)
  {
  return cap_zpci_op;
  }
+
+int kvm_s390_topology_set_mtcr(uint64_t attr)
+{
+    struct kvm_device_attr attribute = {
+    .group = KVM_S390_VM_CPU_TOPOLOGY,
+    .attr  = attr,
+    };
+    int ret;
+
+    if (!s390_has_feat(S390_FEAT_CONFIGURATION_TOPOLOGY)) {
+    return -EFAULT;


EFAULT is something that indicates a bad address (e.g. a segmentation fault) 
... so this definitely sounds like a bad choice for an error code here.


Hum, yes, ENODEV seems besser no?


-ENOTSUP would be 'meilleur' maybe?  :)

C.






  Thomas









[PATCH v3 1/8] hmat acpi: Don't require initiator value in -numa

2022-10-27 Thread Hesham Almatary via
From: Brice Goglin 

The "Memory Proximity Domain Attributes" structure of the ACPI HMAT
has a "Processor Proximity Domain Valid" flag that is currently
always set because QEMU's -numa requires an initiator=X value
when hmat=on. Unsetting this flag allows creating more complex
memory topologies by having multiple best initiators for a single
memory target.

This patch allows -numa without initiator=X when hmat=on by keeping
the default value MAX_NODES in numa_state->nodes[i].initiator.
All places reading numa_state->nodes[i].initiator already check
whether it's different from MAX_NODES before using it.

Tested with
qemu-system-x86_64 -accel kvm \
 -machine pc,hmat=on \
 -drive if=pflash,format=raw,file=./OVMF.fd \
 -drive media=disk,format=qcow2,file=efi.qcow2 \
 -smp 4 \
 -m 3G \
 -object memory-backend-ram,size=1G,id=ram0 \
 -object memory-backend-ram,size=1G,id=ram1 \
 -object memory-backend-ram,size=1G,id=ram2 \
 -numa node,nodeid=0,memdev=ram0,cpus=0-1 \
 -numa node,nodeid=1,memdev=ram1,cpus=2-3 \
 -numa node,nodeid=2,memdev=ram2 \
 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=10 \
 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=10485760 \
 -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=20 \
 -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=5242880 \
 -numa hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-latency,latency=30 \
 -numa hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=1048576 \
 -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-latency,latency=20 \
 -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=5242880 \
 -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-latency,latency=10 \
 -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=10485760 \
 -numa hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-latency,latency=30 \
 -numa hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=1048576
which reports NUMA node2 at the same distance from both node0 and node1, as seen in lstopo:
Machine (2966MB total) + Package P#0
  NUMANode P#2 (979MB)
  Group0
NUMANode P#0 (980MB)
Core P#0 + PU P#0
Core P#1 + PU P#1
  Group0
NUMANode P#1 (1007MB)
Core P#2 + PU P#2
Core P#3 + PU P#3

Before this patch, we had to add ",initiator=X" to "-numa 
node,nodeid=2,memdev=ram2".
The lstopo output difference between initiator=1 and no initiator is:
@@ -1,10 +1,10 @@
 Machine (2966MB total) + Package P#0
+  NUMANode P#2 (979MB)
   Group0
 NUMANode P#0 (980MB)
 Core P#0 + PU P#0
 Core P#1 + PU P#1
   Group0
 NUMANode P#1 (1007MB)
-NUMANode P#2 (979MB)
 Core P#2 + PU P#2
 Core P#3 + PU P#3

Corresponding changes in the HMAT MPDA structure:
@@ -49,10 +49,10 @@
 [078h 0120   2]   Structure Type :  [Memory Proximity Domain 
Attributes]
 [07Ah 0122   2] Reserved : 
 [07Ch 0124   4]   Length : 0028
-[080h 0128   2]Flags (decoded below) : 0001
-Processor Proximity Domain Valid : 1
+[080h 0128   2]Flags (decoded below) : 
+Processor Proximity Domain Valid : 0
 [082h 0130   2]Reserved1 : 
-[084h 0132   4] Attached Initiator Proximity Domain : 0001
+[084h 0132   4] Attached Initiator Proximity Domain : 0080
 [088h 0136   4]  Memory Proximity Domain : 0002
 [08Ch 0140   4]Reserved2 : 
 [090h 0144   8]Reserved3 : 

Final HMAT SLLB structures:
[0A0h 0160   2]   Structure Type : 0001 [System Locality Latency 
and Bandwidth Information]
[0A2h 0162   2] Reserved : 
[0A4h 0164   4]   Length : 0040
[0A8h 0168   1]Flags (decoded below) : 00
Memory Hierarchy : 0
[0A9h 0169   1]Data Type : 00
[0AAh 0170   2]Reserved1 : 
[0ACh 0172   4] Initiator Proximity Domains # : 0002
[0B0h 0176   4]   Target Proximity Domains # : 0003
[0B4h 0180   4]Reserved2 : 
[0B8h 0184   8]  Entry Base Unit : 2710
[0C0h 0192   4] Initiator Proximity Domain List : 
[0C4h 0196   4] Initiator Proximity Domain List : 0001
[0C8h 0200   4] Target Proximity Domain List : 
[0CCh 0204   4] Target Proximity Domain List : 0001
[0D0h 0208   4] Target Proximity Domain List : 0002
[0D4h 0212   2]Entry : 0001
[0D6h 0214   2]Entry : 0002
[0D8h 0216   2]Entry : 0003
[0DAh 0218   2]Entry : 0002
[0DCh 0220   2] 

[PATCH v3 3/8] tests: acpi: q35: add test for hmat nodes without initiators

2022-10-27 Thread Hesham Almatary via
From: Brice Goglin 

expected HMAT:

[000h    4]Signature : "HMAT"[Heterogeneous Memory 
Attributes Table]
[004h 0004   4] Table Length : 0120
[008h 0008   1] Revision : 02
[009h 0009   1] Checksum : 4F
[00Ah 0010   6]   Oem ID : "BOCHS "
[010h 0016   8] Oem Table ID : "BXPC"
[018h 0024   4] Oem Revision : 0001
[01Ch 0028   4]  Asl Compiler ID : "BXPC"
[020h 0032   4]Asl Compiler Revision : 0001

[024h 0036   4] Reserved : 

[028h 0040   2]   Structure Type :  [Memory Proximity Domain 
Attributes]
[02Ah 0042   2] Reserved : 
[02Ch 0044   4]   Length : 0028
[030h 0048   2]Flags (decoded below) : 0001
Processor Proximity Domain Valid : 1
[032h 0050   2]Reserved1 : 
[034h 0052   4] Attached Initiator Proximity Domain : 
[038h 0056   4]  Memory Proximity Domain : 
[03Ch 0060   4]Reserved2 : 
[040h 0064   8]Reserved3 : 
[048h 0072   8]Reserved4 : 

[050h 0080   2]   Structure Type :  [Memory Proximity Domain 
Attributes]
[052h 0082   2] Reserved : 
[054h 0084   4]   Length : 0028
[058h 0088   2]Flags (decoded below) : 0001
Processor Proximity Domain Valid : 1
[05Ah 0090   2]Reserved1 : 
[05Ch 0092   4] Attached Initiator Proximity Domain : 0001
[060h 0096   4]  Memory Proximity Domain : 0001
[064h 0100   4]Reserved2 : 
[068h 0104   8]Reserved3 : 
[070h 0112   8]Reserved4 : 

[078h 0120   2]   Structure Type :  [Memory Proximity Domain 
Attributes]
[07Ah 0122   2] Reserved : 
[07Ch 0124   4]   Length : 0028
[080h 0128   2]Flags (decoded below) : 
Processor Proximity Domain Valid : 0
[082h 0130   2]Reserved1 : 
[084h 0132   4] Attached Initiator Proximity Domain : 0080
[088h 0136   4]  Memory Proximity Domain : 0002
[08Ch 0140   4]Reserved2 : 
[090h 0144   8]Reserved3 : 
[098h 0152   8]Reserved4 : 

[0A0h 0160   2]   Structure Type : 0001 [System Locality Latency 
and Bandwidth Information]
[0A2h 0162   2] Reserved : 
[0A4h 0164   4]   Length : 0040
[0A8h 0168   1]Flags (decoded below) : 00
Memory Hierarchy : 0
[0A9h 0169   1]Data Type : 00
[0AAh 0170   2]Reserved1 : 
[0ACh 0172   4] Initiator Proximity Domains # : 0002
[0B0h 0176   4]   Target Proximity Domains # : 0003
[0B4h 0180   4]Reserved2 : 
[0B8h 0184   8]  Entry Base Unit : 2710
[0C0h 0192   4] Initiator Proximity Domain List : 
[0C4h 0196   4] Initiator Proximity Domain List : 0001
[0C8h 0200   4] Target Proximity Domain List : 
[0CCh 0204   4] Target Proximity Domain List : 0001
[0D0h 0208   4] Target Proximity Domain List : 0002
[0D4h 0212   2]Entry : 0001
[0D6h 0214   2]Entry : 0002
[0D8h 0216   2]Entry : 0003
[0DAh 0218   2]Entry : 0002
[0DCh 0220   2]Entry : 0001
[0DEh 0222   2]Entry : 0003

[0E0h 0224   2]   Structure Type : 0001 [System Locality Latency 
and Bandwidth Information]
[0E2h 0226   2] Reserved : 
[0E4h 0228   4]   Length : 0040
[0E8h 0232   1]Flags (decoded below) : 00
Memory Hierarchy : 0
[0E9h 0233   1]Data Type : 03
[0EAh 0234   2]Reserved1 : 
[0ECh 0236   4] Initiator Proximity Domains # : 0002
[0F0h 0240   4]   Target Proximity Domains # : 0003
[0F4h 0244   4]Reserved2 : 
[0F8h 0248   8]  Entry Base Unit : 0001
[100h 0256   4] Initiator Proximity Domain List : 
[104h 0260   4] Initiator Proximity Domain List : 0001
[108h 0264   4] Target Proximity Domain List : 
[10Ch 0268   4] Target Proximity Domain List : 0001
[110h 0272   4] Target Proximity Domain List : 0002
[114h 0276   2]Entry : 000A
[116h 0278   2]Entry : 0005
[118h 0280   2]Entry : 0001
[11Ah 0282   2]Entry : 0005
[11Ch 0284   2]   

[PATCH v3 0/8] AArch64/HMAT support and tests

2022-10-27 Thread Hesham Almatary via
This patchset adds support for AArch64/HMAT including a test.
It relies on two other patch sets:

Brice Goglin: to support -numa without initiators on q35/x86.
  https://lore.kernel.org/all/ed23accb-2c8b-90f4-a7a3-f81cc57bf...@inria.fr/
Xiang Chen: to enable/support HMAT on AArch64.
  
https://lore.kernel.org/all/1643102134-15506-1-git-send-email-chenxian...@hisilicon.com/

I further add a test with ACPI/HMAT tables that uses the two
patch sets.

Changes from v2:
- Rebased and fixed a merge conflict

Changes from v1:
- Generate APIC and PPTT ACPI tables for AArch64/virt
- Avoid using legacy syntax in numa/bios tests
- Delete unchanged FACP tables

Brice Goglin (4):
  hmat acpi: Don't require initiator value in -numa
  tests: acpi: add and whitelist *.hmat-noinitiator expected blobs
  tests: acpi: q35: add test for hmat nodes without initiators
  tests: acpi: q35: update expected blobs *.hmat-noinitiators expected
HMAT:

Hesham Almatary (3):
  tests: Add HMAT AArch64/virt empty table files
  tests: acpi: aarch64/virt: add a test for hmat nodes with no
initiators
  tests: virt: Update expected *.acpihmatvirt tables

Xiang Chen (1):
  hw/arm/virt: Enable HMAT on arm virt machine

 hw/arm/Kconfig|   1 +
 hw/arm/virt-acpi-build.c  |   7 ++
 hw/core/machine.c |   4 +-
 tests/data/acpi/q35/APIC.acpihmat-noinitiator | Bin 0 -> 144 bytes
 tests/data/acpi/q35/DSDT.acpihmat-noinitiator | Bin 0 -> 8553 bytes
 tests/data/acpi/q35/HMAT.acpihmat-noinitiator | Bin 0 -> 288 bytes
 tests/data/acpi/q35/SRAT.acpihmat-noinitiator | Bin 0 -> 312 bytes
 tests/data/acpi/virt/APIC.acpihmatvirt| Bin 0 -> 396 bytes
 tests/data/acpi/virt/DSDT.acpihmatvirt| Bin 0 -> 5282 bytes
 tests/data/acpi/virt/HMAT.acpihmatvirt| Bin 0 -> 288 bytes
 tests/data/acpi/virt/PPTT.acpihmatvirt| Bin 0 -> 196 bytes
 tests/data/acpi/virt/SRAT.acpihmatvirt| Bin 0 -> 240 bytes
 tests/qtest/bios-tables-test.c| 109 ++
 13 files changed, 118 insertions(+), 3 deletions(-)
 create mode 100644 tests/data/acpi/q35/APIC.acpihmat-noinitiator
 create mode 100644 tests/data/acpi/q35/DSDT.acpihmat-noinitiator
 create mode 100644 tests/data/acpi/q35/HMAT.acpihmat-noinitiator
 create mode 100644 tests/data/acpi/q35/SRAT.acpihmat-noinitiator
 create mode 100644 tests/data/acpi/virt/APIC.acpihmatvirt
 create mode 100644 tests/data/acpi/virt/DSDT.acpihmatvirt
 create mode 100644 tests/data/acpi/virt/HMAT.acpihmatvirt
 create mode 100644 tests/data/acpi/virt/PPTT.acpihmatvirt
 create mode 100644 tests/data/acpi/virt/SRAT.acpihmatvirt

-- 
2.25.1




[PATCH v2 3/6] target/openrisc: Always exit after mtspr npc

2022-10-27 Thread Richard Henderson
We have called cpu_restore_state asserting will_exit.
Do not go back on that promise.  This affects icount.

Signed-off-by: Richard Henderson 
---
 target/openrisc/sys_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/openrisc/sys_helper.c b/target/openrisc/sys_helper.c
index 09b3c97d7c..a3508e421d 100644
--- a/target/openrisc/sys_helper.c
+++ b/target/openrisc/sys_helper.c
@@ -51,8 +51,8 @@ void HELPER(mtspr)(CPUOpenRISCState *env, target_ulong spr, 
target_ulong rb)
 if (env->pc != rb) {
 env->pc = rb;
 env->dflag = 0;
-cpu_loop_exit(cs);
 }
+cpu_loop_exit(cs);
 break;
 
 case TO_SPR(0, 17): /* SR */
-- 
2.34.1




[PATCH v2 6/6] accel/tcg: Remove reset_icount argument from cpu_restore_state_from_tb

2022-10-27 Thread Richard Henderson
The value passed is always true.

Reviewed-by: Claudio Fontana 
Signed-off-by: Richard Henderson 
---
 accel/tcg/internal.h  |  2 +-
 accel/tcg/tb-maint.c  |  4 ++--
 accel/tcg/translate-all.c | 15 +++
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 9c06b320b7..cb13bade4f 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -107,7 +107,7 @@ TranslationBlock *tb_link_page(TranslationBlock *tb, 
tb_page_addr_t phys_pc,
tb_page_addr_t phys_page2);
 bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc);
 void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
-   uintptr_t host_pc, bool reset_icount);
+   uintptr_t host_pc);
 
 /* Return the current PC from CPU, which may be cached in TB. */
 static inline target_ulong log_pc(CPUState *cpu, const TranslationBlock *tb)
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index c8e921089d..0cdb35548c 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -536,7 +536,7 @@ tb_invalidate_phys_page_range__locked(struct 
page_collection *pages,
  * restore the CPU state.
  */
 current_tb_modified = true;
-cpu_restore_state_from_tb(cpu, current_tb, retaddr, true);
+cpu_restore_state_from_tb(cpu, current_tb, retaddr);
 }
 #endif /* TARGET_HAS_PRECISE_SMC */
 tb_phys_invalidate__locked(tb);
@@ -685,7 +685,7 @@ bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, 
uintptr_t pc)
  * function to partially restore the CPU state.
  */
 current_tb_modified = true;
-cpu_restore_state_from_tb(cpu, current_tb, pc, true);
+cpu_restore_state_from_tb(cpu, current_tb, pc);
 }
 #endif /* TARGET_HAS_PRECISE_SMC */
 tb_phys_invalidate(tb, addr);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 90997fed47..0089578f8f 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -282,12 +282,11 @@ static int cpu_unwind_data_from_tb(TranslationBlock *tb, 
uintptr_t host_pc,
 }
 
 /*
- * The cpu state corresponding to 'host_pc' is restored.
- * When reset_icount is true, current TB will be interrupted and
- * icount should be recalculated.
+ * The cpu state corresponding to 'host_pc' is restored in
+ * preparation for exiting the TB.
  */
 void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
-   uintptr_t host_pc, bool reset_icount)
+   uintptr_t host_pc)
 {
 uint64_t data[TARGET_INSN_START_WORDS];
 #ifdef CONFIG_PROFILER
@@ -300,7 +299,7 @@ void cpu_restore_state_from_tb(CPUState *cpu, 
TranslationBlock *tb,
 return;
 }
 
-if (reset_icount && (tb_cflags(tb) & CF_USE_ICOUNT)) {
+if (tb_cflags(tb) & CF_USE_ICOUNT) {
 assert(icount_enabled());
 /*
  * Reset the cycle counter to the start of the block and
@@ -333,7 +332,7 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc)
 if (in_code_gen_buffer((const void *)(host_pc - tcg_splitwx_diff))) {
 TranslationBlock *tb = tcg_tb_lookup(host_pc);
 if (tb) {
-cpu_restore_state_from_tb(cpu, tb, host_pc, true);
+cpu_restore_state_from_tb(cpu, tb, host_pc);
 return true;
 }
 }
@@ -1032,7 +1031,7 @@ void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr)
 tb = tcg_tb_lookup(retaddr);
 if (tb) {
 /* We can use retranslation to find the PC.  */
-cpu_restore_state_from_tb(cpu, tb, retaddr, true);
+cpu_restore_state_from_tb(cpu, tb, retaddr);
 tb_phys_invalidate(tb, -1);
 } else {
 /* The exception probably happened in a helper.  The CPU state should
@@ -1068,7 +1067,7 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
 cpu_abort(cpu, "cpu_io_recompile: could not find TB for pc=%p",
   (void *)retaddr);
 }
-cpu_restore_state_from_tb(cpu, tb, retaddr, true);
+cpu_restore_state_from_tb(cpu, tb, retaddr);
 
 /*
  * Some guests must re-execute the branch when re-executing a delay
-- 
2.34.1




[PATCH v2 1/6] accel/tcg: Introduce cpu_unwind_state_data

2022-10-27 Thread Richard Henderson
Add a way to examine the unwind data without actually
restoring the data back into env.

Signed-off-by: Richard Henderson 
---
 accel/tcg/internal.h  |  4 +--
 include/exec/exec-all.h   | 21 ---
 accel/tcg/translate-all.c | 74 ++-
 3 files changed, 68 insertions(+), 31 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 1227bb69bd..9c06b320b7 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -106,8 +106,8 @@ void tb_reset_jump(TranslationBlock *tb, int n);
 TranslationBlock *tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
tb_page_addr_t phys_page2);
 bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc);
-int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
-  uintptr_t searched_pc, bool reset_icount);
+void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
+   uintptr_t host_pc, bool reset_icount);
 
 /* Return the current PC from CPU, which may be cached in TB. */
 static inline target_ulong log_pc(CPUState *cpu, const TranslationBlock *tb)
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index e948992a80..7d851f5907 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -39,20 +39,33 @@ typedef ram_addr_t tb_page_addr_t;
 #define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
 #endif
 
+/**
+ * cpu_unwind_state_data:
+ * @cpu: the cpu context
+ * @host_pc: the host pc within the translation
+ * @data: output data
+ *
+ * Attempt to load the unwind state for a host pc occurring in
+ * translated code.  If @host_pc is not in translated code, the
+ * function returns false; otherwise @data is loaded.
+ * This is the same unwind info as given to restore_state_to_opc.
+ */
+bool cpu_unwind_state_data(CPUState *cpu, uintptr_t host_pc, uint64_t *data);
+
 /**
  * cpu_restore_state:
- * @cpu: the vCPU state is to be restore to
- * @searched_pc: the host PC the fault occurred at
+ * @cpu: the cpu context
+ * @host_pc: the host pc within the translation
  * @will_exit: true if the TB executed will be interrupted after some
cpu adjustments. Required for maintaining the correct
icount valus
  * @return: true if state was restored, false otherwise
  *
  * Attempt to restore the state for a fault occurring in translated
- * code. If the searched_pc is not in translated code no state is
+ * code. If @host_pc is not in translated code no state is
  * restored and the function returns false.
  */
-bool cpu_restore_state(CPUState *cpu, uintptr_t searched_pc, bool will_exit);
+bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc, bool will_exit);
 
 G_NORETURN void cpu_loop_exit_noexc(CPUState *cpu);
 G_NORETURN void cpu_loop_exit(CPUState *cpu);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index f185356a36..319becb698 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -247,52 +247,66 @@ static int encode_search(TranslationBlock *tb, uint8_t 
*block)
 return p - block;
 }
 
-/* The cpu state corresponding to 'searched_pc' is restored.
- * When reset_icount is true, current TB will be interrupted and
- * icount should be recalculated.
- */
-int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
-  uintptr_t searched_pc, bool reset_icount)
+static int cpu_unwind_data_from_tb(TranslationBlock *tb, uintptr_t host_pc,
+   uint64_t *data)
 {
-uint64_t data[TARGET_INSN_START_WORDS];
-uintptr_t host_pc = (uintptr_t)tb->tc.ptr;
+uintptr_t iter_pc = (uintptr_t)tb->tc.ptr;
 const uint8_t *p = tb->tc.ptr + tb->tc.size;
 int i, j, num_insns = tb->icount;
-#ifdef CONFIG_PROFILER
-TCGProfile *prof = &tcg_ctx->prof;
-int64_t ti = profile_getclock();
-#endif
 
-searched_pc -= GETPC_ADJ;
+host_pc -= GETPC_ADJ;
 
-if (searched_pc < host_pc) {
+if (host_pc < iter_pc) {
 return -1;
 }
 
-memset(data, 0, sizeof(data));
+memset(data, 0, sizeof(uint64_t) * TARGET_INSN_START_WORDS);
 if (!TARGET_TB_PCREL) {
 data[0] = tb_pc(tb);
 }
 
-/* Reconstruct the stored insn data while looking for the point at
-   which the end of the insn exceeds the searched_pc.  */
+/*
+ * Reconstruct the stored insn data while looking for the point
+ * at which the end of the insn exceeds host_pc.
+ */
 for (i = 0; i < num_insns; ++i) {
 for (j = 0; j < TARGET_INSN_START_WORDS; ++j) {
 data[j] += decode_sleb128(&p);
 }
-host_pc += decode_sleb128(&p);
-if (host_pc > searched_pc) {
-goto found;
+iter_pc += decode_sleb128(&p);
+if (iter_pc > host_pc) {
+return num_insns - i;
 }
 }
 return -1;
+}
+
+/*
+ * The cpu state corresponding to 'host_pc' is restored.
+ * When reset_

[PATCH v2 5/6] accel/tcg: Remove will_exit argument from cpu_restore_state

2022-10-27 Thread Richard Henderson
The value passed is always true, and if the target's
synchronize_from_tb hook is non-trivial, not exiting
may be erroneous.

Reviewed-by: Claudio Fontana 
Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h |  5 +
 accel/tcg/cpu-exec-common.c |  2 +-
 accel/tcg/translate-all.c   | 12 ++--
 target/alpha/helper.c   |  2 +-
 target/alpha/mem_helper.c   |  2 +-
 target/arm/op_helper.c  |  2 +-
 target/arm/tlb_helper.c |  8 
 target/cris/helper.c|  2 +-
 target/i386/tcg/sysemu/svm_helper.c |  2 +-
 target/m68k/op_helper.c |  4 ++--
 target/microblaze/helper.c  |  2 +-
 target/nios2/op_helper.c|  2 +-
 target/openrisc/sys_helper.c|  4 ++--
 target/ppc/excp_helper.c|  2 +-
 target/s390x/tcg/excp_helper.c  |  2 +-
 target/tricore/op_helper.c  |  2 +-
 target/xtensa/helper.c  |  6 +++---
 17 files changed, 25 insertions(+), 36 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 7d851f5907..9b7bfbf09a 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -56,16 +56,13 @@ bool cpu_unwind_state_data(CPUState *cpu, uintptr_t 
host_pc, uint64_t *data);
  * cpu_restore_state:
  * @cpu: the cpu context
  * @host_pc: the host pc within the translation
- * @will_exit: true if the TB executed will be interrupted after some
-   cpu adjustments. Required for maintaining the correct
-   icount valus
  * @return: true if state was restored, false otherwise
  *
  * Attempt to restore the state for a fault occurring in translated
  * code. If @host_pc is not in translated code no state is
  * restored and the function returns false.
  */
-bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc, bool will_exit);
+bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc);
 
 G_NORETURN void cpu_loop_exit_noexc(CPUState *cpu);
 G_NORETURN void cpu_loop_exit(CPUState *cpu);
diff --git a/accel/tcg/cpu-exec-common.c b/accel/tcg/cpu-exec-common.c
index be6fe45aa5..c7bc8c6efa 100644
--- a/accel/tcg/cpu-exec-common.c
+++ b/accel/tcg/cpu-exec-common.c
@@ -71,7 +71,7 @@ void cpu_loop_exit(CPUState *cpu)
 void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
 {
 if (pc) {
-cpu_restore_state(cpu, pc, true);
+cpu_restore_state(cpu, pc);
 }
 cpu_loop_exit(cpu);
 }
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 319becb698..90997fed47 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -318,16 +318,8 @@ void cpu_restore_state_from_tb(CPUState *cpu, 
TranslationBlock *tb,
 #endif
 }
 
-bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc, bool will_exit)
+bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc)
 {
-/*
- * The pc update associated with restore without exit will
- * break the relative pc adjustments performed by TARGET_TB_PCREL.
- */
-if (TARGET_TB_PCREL) {
-assert(will_exit);
-}
-
 /*
  * The host_pc has to be in the rx region of the code buffer.
  * If it is not we will not be able to resolve it here.
@@ -341,7 +333,7 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc, 
bool will_exit)
 if (in_code_gen_buffer((const void *)(host_pc - tcg_splitwx_diff))) {
 TranslationBlock *tb = tcg_tb_lookup(host_pc);
 if (tb) {
-cpu_restore_state_from_tb(cpu, tb, host_pc, will_exit);
+cpu_restore_state_from_tb(cpu, tb, host_pc, true);
 return true;
 }
 }
diff --git a/target/alpha/helper.c b/target/alpha/helper.c
index a5a389b5a3..970c869771 100644
--- a/target/alpha/helper.c
+++ b/target/alpha/helper.c
@@ -532,7 +532,7 @@ G_NORETURN void dynamic_excp(CPUAlphaState *env, uintptr_t 
retaddr,
 cs->exception_index = excp;
 env->error_code = error;
 if (retaddr) {
-cpu_restore_state(cs, retaddr, true);
+cpu_restore_state(cs, retaddr);
 /* Floating-point exceptions (our only users) point to the next PC.  */
 env->pc += 4;
 }
diff --git a/target/alpha/mem_helper.c b/target/alpha/mem_helper.c
index 47283a0612..a39b52c5dd 100644
--- a/target/alpha/mem_helper.c
+++ b/target/alpha/mem_helper.c
@@ -28,7 +28,7 @@ static void do_unaligned_access(CPUAlphaState *env, vaddr 
addr, uintptr_t retadd
 uint64_t pc;
 uint32_t insn;
 
-cpu_restore_state(env_cpu(env), retaddr, true);
+cpu_restore_state(env_cpu(env), retaddr);
 
 pc = env->pc;
 insn = cpu_ldl_code(env, pc);
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index c5bde1cfcc..70672bcd9f 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -78,7 +78,7 @@ void raise_exception_ra(CPUARMState *env, uint32_t excp, 
uint32_t syndrome,
  * we must restore CPU state here before setting the syndrome
  * the caller passed us, and cannot use cpu_loop_exit_

[PATCH v2 0/6] tcg: Fix x86 TARGET_TB_PCREL (#1269)

2022-10-27 Thread Richard Henderson
As per #1269, this affects NetBSD installer boot.

The problem is that one of the x86 acpi callbacks modifies
env->eip during an mmio store, which means that the tracking
that translate.c does is thrown out of whack.

Introduce a method to extract unwind data without the
writeback to env.  This isn't a perfect abstraction, but I
couldn't think of anything better.  There's a couple of lines
of code duplication, but probably less than any abstraction
that we might put on top.

Changes for v2:
  * Rebase on master, 23 patches merged.
  * Comments adjusted per review (claudio)


r~


Richard Henderson (6):
  accel/tcg: Introduce cpu_unwind_state_data
  target/i386: Use cpu_unwind_state_data for tpr access
  target/openrisc: Always exit after mtspr npc
  target/openrisc: Use cpu_unwind_state_data for mfspr
  accel/tcg: Remove will_exit argument from cpu_restore_state
  accel/tcg: Remove reset_icount argument from cpu_restore_state_from_tb

 accel/tcg/internal.h|  4 +-
 include/exec/exec-all.h | 24 +---
 accel/tcg/cpu-exec-common.c |  2 +-
 accel/tcg/tb-maint.c|  4 +-
 accel/tcg/translate-all.c   | 91 +
 target/alpha/helper.c   |  2 +-
 target/alpha/mem_helper.c   |  2 +-
 target/arm/op_helper.c  |  2 +-
 target/arm/tlb_helper.c |  8 +--
 target/cris/helper.c|  2 +-
 target/i386/helper.c| 21 ++-
 target/i386/tcg/sysemu/svm_helper.c |  2 +-
 target/m68k/op_helper.c |  4 +-
 target/microblaze/helper.c  |  2 +-
 target/nios2/op_helper.c|  2 +-
 target/openrisc/sys_helper.c| 17 --
 target/ppc/excp_helper.c|  2 +-
 target/s390x/tcg/excp_helper.c  |  2 +-
 target/tricore/op_helper.c  |  2 +-
 target/xtensa/helper.c  |  6 +-
 20 files changed, 125 insertions(+), 76 deletions(-)

-- 
2.34.1




[PATCH v3 6/8] hw/arm/virt: Enable HMAT on arm virt machine

2022-10-27 Thread Hesham Almatary via
From: Xiang Chen 

Since the patchset ("Build ACPI Heterogeneous Memory Attribute Table (HMAT)"),
HMAT is supported, but only x86 is enabled. Enable HMAT on arm virt machine.

Signed-off-by: Xiang Chen 
Signed-off-by: Hesham Almatary 
Reviewed-by: Igor Mammedov 
---
 hw/arm/Kconfig   | 1 +
 hw/arm/virt-acpi-build.c | 7 +++
 2 files changed, 8 insertions(+)

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 15fa79afd3..17fcde8e1c 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -30,6 +30,7 @@ config ARM_VIRT
 select ACPI_VIOT
 select VIRTIO_MEM_SUPPORTED
 select ACPI_CXL
+select ACPI_HMAT
 
 config CHEETAH
 bool
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 13c6e3e468..7f706f72bb 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -42,6 +42,7 @@
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/generic_event_device.h"
 #include "hw/acpi/tpm.h"
+#include "hw/acpi/hmat.h"
 #include "hw/pci/pcie_host.h"
 #include "hw/pci/pci.h"
 #include "hw/pci/pci_bus.h"
@@ -989,6 +990,12 @@ void virt_acpi_build(VirtMachineState *vms, 
AcpiBuildTables *tables)
 build_slit(tables_blob, tables->linker, ms, vms->oem_id,
vms->oem_table_id);
 }
+
+if (ms->numa_state->hmat_enabled) {
+acpi_add_table(table_offsets, tables_blob);
+build_hmat(tables_blob, tables->linker, ms->numa_state,
+   vms->oem_id, vms->oem_table_id);
+}
 }
 
 if (ms->nvdimms_state->is_enabled) {
-- 
2.25.1




[PATCH v3 8/8] tests: virt: Update expected *.acpihmatvirt tables

2022-10-27 Thread Hesham Almatary via
* Expected ACPI Data Table [HMAT]
[000h    4]Signature : "HMAT"[Heterogeneous
Memory Attributes Table]
[004h 0004   4] Table Length : 0120
[008h 0008   1] Revision : 02
[009h 0009   1] Checksum : 4F
[00Ah 0010   6]   Oem ID : "BOCHS "
[010h 0016   8] Oem Table ID : "BXPC"
[018h 0024   4] Oem Revision : 0001
[01Ch 0028   4]  Asl Compiler ID : "BXPC"
[020h 0032   4]Asl Compiler Revision : 0001

[024h 0036   4] Reserved : 

[028h 0040   2]   Structure Type :  [Memory Proximity
Domain Attributes]
[02Ah 0042   2] Reserved : 
[02Ch 0044   4]   Length : 0028
[030h 0048   2]Flags (decoded below) : 0001
Processor Proximity Domain Valid : 1
[032h 0050   2]Reserved1 : 
[034h 0052   4]   Processor Proximity Domain : 
[038h 0056   4]  Memory Proximity Domain : 
[03Ch 0060   4]Reserved2 : 
[040h 0064   8]Reserved3 : 
[048h 0072   8]Reserved4 : 

[050h 0080   2]   Structure Type :  [Memory Proximity
Domain Attributes]
[052h 0082   2] Reserved : 
[054h 0084   4]   Length : 0028
[058h 0088   2]Flags (decoded below) : 0001
Processor Proximity Domain Valid : 1
[05Ah 0090   2]Reserved1 : 
[05Ch 0092   4]   Processor Proximity Domain : 0001
[060h 0096   4]  Memory Proximity Domain : 0001
[064h 0100   4]Reserved2 : 
[068h 0104   8]Reserved3 : 
[070h 0112   8]Reserved4 : 

[078h 0120   2]   Structure Type :  [Memory Proximity
Domain Attributes]
[07Ah 0122   2] Reserved : 
[07Ch 0124   4]   Length : 0028
[080h 0128   2]Flags (decoded below) : 
Processor Proximity Domain Valid : 0
[082h 0130   2]Reserved1 : 
[084h 0132   4]   Processor Proximity Domain : 0080
[088h 0136   4]  Memory Proximity Domain : 0002
[08Ch 0140   4]Reserved2 : 
[090h 0144   8]Reserved3 : 
[098h 0152   8]Reserved4 : 

[0A0h 0160   2]   Structure Type : 0001 [System Locality
Latency and Bandwidth Information]
[0A2h 0162   2] Reserved : 
[0A4h 0164   4]   Length : 0040
[0A8h 0168   1]Flags (decoded below) : 00
Memory Hierarchy : 0
[0A9h 0169   1]Data Type : 00
[0AAh 0170   2]Reserved1 : 
[0ACh 0172   4] Initiator Proximity Domains # : 0002
[0B0h 0176   4]   Target Proximity Domains # : 0003
[0B4h 0180   4]Reserved2 : 
[0B8h 0184   8]  Entry Base Unit : 2710
[0C0h 0192   4] Initiator Proximity Domain List : 
[0C4h 0196   4] Initiator Proximity Domain List : 0001
[0C8h 0200   4] Target Proximity Domain List : 
[0CCh 0204   4] Target Proximity Domain List : 0001
[0D0h 0208   4] Target Proximity Domain List : 0002
[0D4h 0212   2]Entry : 0001
[0D6h 0214   2]Entry : 0002
[0D8h 0216   2]Entry : 0003
[0DAh 0218   2]Entry : 0002

[PATCH v2 2/6] target/i386: Use cpu_unwind_state_data for tpr access

2022-10-27 Thread Richard Henderson
Avoid cpu_restore_state, and modifying env->eip out from
underneath the translator with TARGET_TB_PCREL.  There is
some slight duplication from x86_restore_state_to_opc,
but it's just a few lines.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1269
Signed-off-by: Richard Henderson 
---
 target/i386/helper.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/target/i386/helper.c b/target/i386/helper.c
index b62a1e48e2..2cd1756f1a 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -509,6 +509,23 @@ void cpu_x86_inject_mce(Monitor *mon, X86CPU *cpu, int 
bank,
 }
 }
 
+static target_ulong get_memio_eip(CPUX86State *env)
+{
+uint64_t data[TARGET_INSN_START_WORDS];
+CPUState *cs = env_cpu(env);
+
+if (!cpu_unwind_state_data(cs, cs->mem_io_pc, data)) {
+return env->eip;
+}
+
+/* Per x86_restore_state_to_opc. */
+if (TARGET_TB_PCREL) {
+return (env->eip & TARGET_PAGE_MASK) | data[0];
+} else {
+return data[0] - env->segs[R_CS].base;
+}
+}
+
 void cpu_report_tpr_access(CPUX86State *env, TPRAccess access)
 {
 X86CPU *cpu = env_archcpu(env);
@@ -519,9 +536,9 @@ void cpu_report_tpr_access(CPUX86State *env, TPRAccess 
access)
 
 cpu_interrupt(cs, CPU_INTERRUPT_TPR);
 } else if (tcg_enabled()) {
-cpu_restore_state(cs, cs->mem_io_pc, false);
+target_ulong eip = get_memio_eip(env);
 
-apic_handle_tpr_access_report(cpu->apic_state, env->eip, access);
+apic_handle_tpr_access_report(cpu->apic_state, eip, access);
 }
 }
 #endif /* !CONFIG_USER_ONLY */
-- 
2.34.1




[PATCH v3 4/8] tests: acpi: q35: update expected blobs *.hmat-noinitiators expected HMAT:

2022-10-27 Thread Hesham Almatary via
From: Brice Goglin 

[000h    4]Signature : "HMAT"[Heterogeneous Memory 
Attributes Table]
[004h 0004   4] Table Length : 0120
[008h 0008   1] Revision : 02
[009h 0009   1] Checksum : 4F
[00Ah 0010   6]   Oem ID : "BOCHS "
[010h 0016   8] Oem Table ID : "BXPC"
[018h 0024   4] Oem Revision : 0001
[01Ch 0028   4]  Asl Compiler ID : "BXPC"
[020h 0032   4]Asl Compiler Revision : 0001

[024h 0036   4] Reserved : 

[028h 0040   2]   Structure Type :  [Memory Proximity Domain 
Attributes]
[02Ah 0042   2] Reserved : 
[02Ch 0044   4]   Length : 0028
[030h 0048   2]Flags (decoded below) : 0001
Processor Proximity Domain Valid : 1
[032h 0050   2]Reserved1 : 
[034h 0052   4] Attached Initiator Proximity Domain : 
[038h 0056   4]  Memory Proximity Domain : 
[03Ch 0060   4]Reserved2 : 
[040h 0064   8]Reserved3 : 
[048h 0072   8]Reserved4 : 

[050h 0080   2]   Structure Type :  [Memory Proximity Domain 
Attributes]
[052h 0082   2] Reserved : 
[054h 0084   4]   Length : 0028
[058h 0088   2]Flags (decoded below) : 0001
Processor Proximity Domain Valid : 1
[05Ah 0090   2]Reserved1 : 
[05Ch 0092   4] Attached Initiator Proximity Domain : 0001
[060h 0096   4]  Memory Proximity Domain : 0001
[064h 0100   4]Reserved2 : 
[068h 0104   8]Reserved3 : 
[070h 0112   8]Reserved4 : 

[078h 0120   2]   Structure Type :  [Memory Proximity Domain 
Attributes]
[07Ah 0122   2] Reserved : 
[07Ch 0124   4]   Length : 0028
[080h 0128   2]Flags (decoded below) : 
Processor Proximity Domain Valid : 0
[082h 0130   2]Reserved1 : 
[084h 0132   4] Attached Initiator Proximity Domain : 0080
[088h 0136   4]  Memory Proximity Domain : 0002
[08Ch 0140   4]Reserved2 : 
[090h 0144   8]Reserved3 : 
[098h 0152   8]Reserved4 : 

[0A0h 0160   2]   Structure Type : 0001 [System Locality Latency 
and Bandwidth Information]
[0A2h 0162   2] Reserved : 
[0A4h 0164   4]   Length : 0040
[0A8h 0168   1]Flags (decoded below) : 00
Memory Hierarchy : 0
[0A9h 0169   1]Data Type : 00
[0AAh 0170   2]Reserved1 : 
[0ACh 0172   4] Initiator Proximity Domains # : 0002
[0B0h 0176   4]   Target Proximity Domains # : 0003
[0B4h 0180   4]Reserved2 : 
[0B8h 0184   8]  Entry Base Unit : 2710
[0C0h 0192   4] Initiator Proximity Domain List : 
[0C4h 0196   4] Initiator Proximity Domain List : 0001
[0C8h 0200   4] Target Proximity Domain List : 
[0CCh 0204   4] Target Proximity Domain List : 0001
[0D0h 0208   4] Target Proximity Domain List : 0002
[0D4h 0212   2]Entry : 0001
[0D6h 0214   2]Entry : 0002
[0D8h 0216   2]Entry : 0003
[0DAh 0218   2]Entry : 0002
[0DCh 0220   2]Entry : 0001
[0DEh 0222   2]Entry : 0003

[0E0h 0224   2]   Structure Type : 0001 [System Locality Latency 
and Bandwidth Information]
[0E2h 0226   2] Reserved : 
[0E4h 0228   4]   Length : 0040
[0E8h 0232   1]Flags (decoded below) : 00
Memory Hierarchy : 0
[0E9h 0233   1]Data Type : 03
[0EAh 0234   2]Reserved1 : 
[0ECh 0236   4] Initiator Proximity Domains # : 0002
[0F0h 0240   4]   Target Proximity Domains # : 0003
[0F4h 0244   4]Reserved2 : 
[0F8h 0248   8]  Entry Base Unit : 0001
[100h 0256   4] Initiator Proximity Domain List : 
[104h 0260   4] Initiator Proximity Domain List : 0001
[108h 0264   4] Target Proximity Domain List : 
[10Ch 0268   4] Target Proximity Domain List : 0001
[110h 0272   4] Target Proximity Domain List : 0002
[114h 0276   2]Entry : 000A
[116h 0278   2]Entry : 0005
[118h 0280   2]Entry : 0001
[11Ah 0282   2]Entry : 0005
[11Ch 0284   2]   

[PATCH v3 2/8] tests: acpi: add and whitelist *.hmat-noinitiator expected blobs

2022-10-27 Thread Hesham Almatary via
From: Brice Goglin 

.. which will be used by the follow-up hmat-noinitiator test case.

Signed-off-by: Brice Goglin 
Signed-off-by: Hesham Almatary 
---
 tests/data/acpi/q35/APIC.acpihmat-noinitiator | 0
 tests/data/acpi/q35/DSDT.acpihmat-noinitiator | 0
 tests/data/acpi/q35/HMAT.acpihmat-noinitiator | 0
 tests/data/acpi/q35/SRAT.acpihmat-noinitiator | 0
 tests/qtest/bios-tables-test-allowed-diff.h   | 4 
 5 files changed, 4 insertions(+)
 create mode 100644 tests/data/acpi/q35/APIC.acpihmat-noinitiator
 create mode 100644 tests/data/acpi/q35/DSDT.acpihmat-noinitiator
 create mode 100644 tests/data/acpi/q35/HMAT.acpihmat-noinitiator
 create mode 100644 tests/data/acpi/q35/SRAT.acpihmat-noinitiator

diff --git a/tests/data/acpi/q35/APIC.acpihmat-noinitiator 
b/tests/data/acpi/q35/APIC.acpihmat-noinitiator
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/q35/DSDT.acpihmat-noinitiator 
b/tests/data/acpi/q35/DSDT.acpihmat-noinitiator
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/q35/HMAT.acpihmat-noinitiator 
b/tests/data/acpi/q35/HMAT.acpihmat-noinitiator
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/q35/SRAT.acpihmat-noinitiator 
b/tests/data/acpi/q35/SRAT.acpihmat-noinitiator
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..245fa66bcc 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,5 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/q35/APIC.acpihmat-noinitiator",
+"tests/data/acpi/q35/DSDT.acpihmat-noinitiator",
+"tests/data/acpi/q35/HMAT.acpihmat-noinitiator",
+"tests/data/acpi/q35/SRAT.acpihmat-noinitiator",
-- 
2.25.1




[PATCH v3 5/8] tests: Add HMAT AArch64/virt empty table files

2022-10-27 Thread Hesham Almatary via
Signed-off-by: Hesham Almatary 
---
 tests/data/acpi/virt/APIC.acpihmatvirt  | 0
 tests/data/acpi/virt/DSDT.acpihmatvirt  | 0
 tests/data/acpi/virt/HMAT.acpihmatvirt  | 0
 tests/data/acpi/virt/PPTT.acpihmatvirt  | 0
 tests/data/acpi/virt/SRAT.acpihmatvirt  | 0
 tests/qtest/bios-tables-test-allowed-diff.h | 5 +
 6 files changed, 5 insertions(+)
 create mode 100644 tests/data/acpi/virt/APIC.acpihmatvirt
 create mode 100644 tests/data/acpi/virt/DSDT.acpihmatvirt
 create mode 100644 tests/data/acpi/virt/HMAT.acpihmatvirt
 create mode 100644 tests/data/acpi/virt/PPTT.acpihmatvirt
 create mode 100644 tests/data/acpi/virt/SRAT.acpihmatvirt

diff --git a/tests/data/acpi/virt/APIC.acpihmatvirt 
b/tests/data/acpi/virt/APIC.acpihmatvirt
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/virt/DSDT.acpihmatvirt 
b/tests/data/acpi/virt/DSDT.acpihmatvirt
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/virt/HMAT.acpihmatvirt 
b/tests/data/acpi/virt/HMAT.acpihmatvirt
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/virt/PPTT.acpihmatvirt 
b/tests/data/acpi/virt/PPTT.acpihmatvirt
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/virt/SRAT.acpihmatvirt 
b/tests/data/acpi/virt/SRAT.acpihmatvirt
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..4f849715bd 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,6 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/virt/APIC.acpihmatvirt",
+"tests/data/acpi/virt/DSDT.acpihmatvirt",
+"tests/data/acpi/virt/HMAT.acpihmatvirt",
+"tests/data/acpi/virt/PPTT.acpihmatvirt",
+"tests/data/acpi/virt/SRAT.acpihmatvirt",
-- 
2.25.1




[PATCH v2 4/6] target/openrisc: Use cpu_unwind_state_data for mfspr

2022-10-27 Thread Richard Henderson
Since we do not plan to exit, use cpu_unwind_state_data
and extract exactly the data requested.

Signed-off-by: Richard Henderson 
---
 target/openrisc/sys_helper.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/target/openrisc/sys_helper.c b/target/openrisc/sys_helper.c
index a3508e421d..dde2fa1623 100644
--- a/target/openrisc/sys_helper.c
+++ b/target/openrisc/sys_helper.c
@@ -199,6 +199,7 @@ target_ulong HELPER(mfspr)(CPUOpenRISCState *env, 
target_ulong rd,
target_ulong spr)
 {
 #ifndef CONFIG_USER_ONLY
+uint64_t data[TARGET_INSN_START_WORDS];
 MachineState *ms = MACHINE(qdev_get_machine());
 OpenRISCCPU *cpu = env_archcpu(env);
 CPUState *cs = env_cpu(env);
@@ -232,14 +233,20 @@ target_ulong HELPER(mfspr)(CPUOpenRISCState *env, 
target_ulong rd,
 return env->evbar;
 
 case TO_SPR(0, 16): /* NPC (equals PC) */
-cpu_restore_state(cs, GETPC(), false);
+if (cpu_unwind_state_data(cs, GETPC(), data)) {
+return data[0];
+}
 return env->pc;
 
 case TO_SPR(0, 17): /* SR */
 return cpu_get_sr(env);
 
 case TO_SPR(0, 18): /* PPC */
-cpu_restore_state(cs, GETPC(), false);
+if (cpu_unwind_state_data(cs, GETPC(), data)) {
+if (data[1] & 2) {
+return data[0] - 4;
+}
+}
 return env->ppc;
 
 case TO_SPR(0, 32): /* EPCR */
-- 
2.34.1




[PATCH v3 7/8] tests: acpi: aarch64/virt: add a test for hmat nodes with no initiators

2022-10-27 Thread Hesham Almatary via
This patch imitates the "tests: acpi: q35: add test for hmat nodes
without initiators" commit to test numa nodes with different HMAT
attributes, but on AArch64/virt.

Tested with:
qemu-system-aarch64 -accel tcg \
-machine virt,hmat=on,gic-version=3  -cpu cortex-a57 \
-bios qemu-efi-aarch64/QEMU_EFI.fd \
-kernel Image -append "root=/dev/vda2 console=ttyAMA0" \
-drive if=virtio,file=aarch64.qcow2,format=qcow2,id=hd \
-device virtio-rng-pci \
-net user,hostfwd=tcp::10022-:22 -net nic \
-device intel-hda -device hda-duplex -nographic \
-smp 4 \
-m 3G \
-object memory-backend-ram,size=1G,id=ram0 \
-object memory-backend-ram,size=1G,id=ram1 \
-object memory-backend-ram,size=1G,id=ram2 \
-numa node,nodeid=0,memdev=ram0,cpus=0-1 \
-numa node,nodeid=1,memdev=ram1,cpus=2-3 \
-numa node,nodeid=2,memdev=ram2 \
-numa
hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=10
 \
-numa 
hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=10485760
 \
-numa 
hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=20
 \
-numa 
hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=5242880
 \
-numa 
hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-latency,latency=30
 \
-numa 
hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=1048576
 \
-numa 
hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-latency,latency=20
 \
-numa 
hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=5242880
 \
-numa 
hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-latency,latency=10
 \
-numa 
hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=10485760
 \
-numa 
hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-latency,latency=30
 \
-numa 
hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-bandwidth,bandwidth=1048576

Signed-off-by: Hesham Almatary 
---
 tests/qtest/bios-tables-test.c | 59 ++
 1 file changed, 59 insertions(+)

diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index 02fe59fbf8..e805b3efec 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -1461,6 +1461,63 @@ static void test_acpi_piix4_tcg_acpi_hmat(void)
 test_acpi_tcg_acpi_hmat(MACHINE_PC);
 }
 
+static void test_acpi_virt_tcg_acpi_hmat(void)
+{
+test_data data = {
+.machine = "virt",
+.tcg_only = true,
+.uefi_fl1 = "pc-bios/edk2-aarch64-code.fd",
+.uefi_fl2 = "pc-bios/edk2-arm-vars.fd",
+.cd = "tests/data/uefi-boot-images/bios-tables-test.aarch64.iso.qcow2",
+.ram_start = 0x4000ULL,
+.scan_len = 128ULL * 1024 * 1024,
+};
+
+data.variant = ".acpihmatvirt";
+
+test_acpi_one(" -machine hmat=on"
+  " -cpu cortex-a57"
+  " -smp 4,sockets=2"
+  " -m 256M"
+  " -object memory-backend-ram,size=64M,id=ram0"
+  " -object memory-backend-ram,size=64M,id=ram1"
+  " -object memory-backend-ram,size=128M,id=ram2"
+  " -numa node,nodeid=0,memdev=ram0"
+  " -numa node,nodeid=1,memdev=ram1"
+  " -numa node,nodeid=2,memdev=ram2"
+  " -numa cpu,node-id=0,socket-id=0"
+  " -numa cpu,node-id=0,socket-id=0"
+  " -numa cpu,node-id=1,socket-id=1"
+  " -numa cpu,node-id=1,socket-id=1"
+  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+  "data-type=access-latency,latency=10"
+  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+  "data-type=access-bandwidth,bandwidth=10485760"
+  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
+  "data-type=access-latency,latency=20"
+  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
+  "data-type=access-bandwidth,bandwidth=5242880"
+  " -numa hmat-lb,initiator=0,target=2,hierarchy=memory,"
+  "data-type=access-latency,latency=30"
+  " -numa hmat-lb,initiator=0,target=2,hierarchy=memory,"
+  "data-type=access-bandwidth,bandwidth=1048576"
+  " -numa hmat-lb,initiator=1,target=0,hierarchy=memory,"
+  "data-type=access-latency,latency=20"
+  " -numa hmat-lb,initiator=1,target=0,hierarchy=memory,"
+  "data-type=access-bandwidth,bandwidth=5242880"
+  " -numa hmat-lb,initiator=1,target=1,hierarchy=memory,"
+  "data-type=access-latency,latency=10"
+  " -numa hmat-lb,initiator=1,target=1,hierarchy=memory,"
+  "data-type=access-bandwidth,bandwidth=10485760"
+  " -numa hmat-lb,initiator=1,targe

Re: [PATCH v10 7/9] s390x/cpu topology: add max_threads machine class attribute

2022-10-27 Thread Cédric Le Goater

Hello Pierre,

On 10/12/22 18:21, Pierre Morel wrote:

The S390 CPU topology accepts the smp.threads argument while
in reality it does not effectively allow multithreading.

Let's keep this behavior for machines older than 7.3 and
refuse to use threads in newer machines until multithreading
is really proposed to the guest by the machine.


This change is unrelated to the rest of the series and we could merge it
for 7.2. We still have time for it.

Thanks,

C.

 

Signed-off-by: Pierre Morel 
---
  include/hw/s390x/s390-virtio-ccw.h |  1 +
  hw/s390x/s390-virtio-ccw.c | 10 ++
  2 files changed, 11 insertions(+)

diff --git a/include/hw/s390x/s390-virtio-ccw.h 
b/include/hw/s390x/s390-virtio-ccw.h
index 6c4b4645fc..319dfac1bb 100644
--- a/include/hw/s390x/s390-virtio-ccw.h
+++ b/include/hw/s390x/s390-virtio-ccw.h
@@ -48,6 +48,7 @@ struct S390CcwMachineClass {
  bool css_migration_enabled;
  bool hpage_1m_allowed;
  bool topology_allowed;
+int max_threads;
  };
  
  /* runtime-instrumentation allowed by the machine */

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 3a13fad4df..d6ce31d168 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -85,8 +85,15 @@ out:
  static void s390_init_cpus(MachineState *machine)
  {
  MachineClass *mc = MACHINE_GET_CLASS(machine);
+S390CcwMachineClass *s390mc = S390_CCW_MACHINE_CLASS(mc);
  int i;
  
+if (machine->smp.threads > s390mc->max_threads) {

+error_report("S390 does not support more than %d threads.",
+ s390mc->max_threads);
+exit(1);
+}
+
  /* initialize possible_cpus */
  mc->possible_cpu_arch_ids(machine);
  
@@ -617,6 +624,7 @@ static void ccw_machine_class_init(ObjectClass *oc, void *data)

  s390mc->css_migration_enabled = true;
  s390mc->hpage_1m_allowed = true;
  s390mc->topology_allowed = true;
+s390mc->max_threads = 1;
  mc->init = ccw_init;
  mc->reset = s390_machine_reset;
  mc->block_default_type = IF_VIRTIO;
@@ -887,12 +895,14 @@ static void ccw_machine_7_2_class_options(MachineClass 
*mc)
  S390CcwMachineClass *s390mc = S390_CCW_MACHINE_CLASS(mc);
  static GlobalProperty compat[] = {
  { TYPE_S390_CPU_TOPOLOGY, "topology-allowed", "off", },
+{ TYPE_S390_CPU_TOPOLOGY, "max_threads", "off", },
  };
  
  ccw_machine_7_3_class_options(mc);

  compat_props_add(mc->compat_props, hw_compat_7_2, hw_compat_7_2_len);
  compat_props_add(mc->compat_props, compat, G_N_ELEMENTS(compat));
  s390mc->topology_allowed = false;
+s390mc->max_threads = S390_MAX_CPUS;
  }
  DEFINE_CCW_MACHINE(7_2, "7.2", false);
  





Re: [PATCH] target/hppa: Fix fid instruction emulation

2022-10-27 Thread Helge Deller

On 10/27/22 11:48, Richard Henderson wrote:

On 10/27/22 16:31, Helge Deller wrote:

The fid instruction (Floating-Point Identify) puts the FPU model and
revision into the Status Register. Since those values shouldn't be 0,
store values there which a PCX-L2 (for 32-bit) or a PCX-W2 (for 64-bit)
would return.

Signed-off-by: Helge Deller 

diff --git a/target/hppa/insns.decode b/target/hppa/insns.decode
index c7a7e997f9..3ba5f9885a 100644
--- a/target/hppa/insns.decode
+++ b/target/hppa/insns.decode
@@ -388,10 +388,8 @@ fmpyfadd_d  101110 rm1:5 rm2:5 ... 0 1 ..0 0 0 neg:1 
t:5    ra3=%rc32

  # Floating point class 0

-# FID.  With r = t = 0, which via fcpy puts 0 into fr0.
-# This is machine/revision = 0, which is reserved for simulator.


Is there something in particular for which this is failing?
Per the manual, 0 means simulator, which we are.


I can't say yet if it's really failing.
I noticed it while trying to get MPE/iX installed in a hppa guest.
In some doc (sorry, I don't know which one right now) I saw that 0/0
values were illegal, which is why I changed the values to
those of a PA7300LC CPU from a B160L machine (which
we currently emulate with the hppa SeaBIOS).


So far we haven't identified as a particular cpu, have we?


Not really, but as just mentioned the SeaBIOS reports back a B160L.
If we support more machines this needs to be adjusted.


+static bool trans_fid_f(DisasContext *ctx, arg_fid_f *a)
+{
+    nullify_over(ctx);
+#if TARGET_REGISTER_BITS == 64
+    save_frd(0, tcg_const_i64(0x130800)); /* PA8700 (PCX-W2) */
+#else
+    save_frd(0, tcg_const_i64(0x0f0800)); /* PA7300LC (PCX-L2) */
+#endif
+    return nullify_end(ctx);
+}


Missing ULL suffix.


Will fix.

Helge



[PATCH] vl: change PID file path resolve error to warning

2022-10-27 Thread Fiona Ebner
Commit 85c4bf8aa6 ("vl: Unlink absolute PID file path") made it a
critical error when the PID file path cannot be resolved. Before this
commit, it was possible to invoke QEMU when the PID file was a file
created with mkstemp that was already unlinked at the time of the
invocation. There might be other similar scenarios.

It should not be a critical error when the PID file unlink notifier
can't be registered, because the path can't be resolved. Turn it into
a warning instead.

Fixes: 85c4bf8aa6 ("vl: Unlink absolute PID file path")
Reported-by: Dominik Csapak 
Suggested-by: Thomas Lamprecht 
Signed-off-by: Fiona Ebner 
---

For completeness, here is a reproducer based on our actual invocation
written in Rust (depends on the "nix" crate). It works fine with QEMU
7.0, but not anymore with 7.1.

use std::fs::File;
use std::io::Read;
use std::os::unix::io::{AsRawFd, FromRawFd};
use std::path::{Path, PathBuf};
use std::process::Command;

fn make_tmp_file<P: AsRef<Path>>(path: P) -> (File, PathBuf) {
let path = path.as_ref();

let mut template = path.to_owned();
template.set_extension("tmp_XX");
match nix::unistd::mkstemp(&template) {
Ok((fd, path)) => (unsafe { File::from_raw_fd(fd) }, path),
Err(err) => panic!("mkstemp {:?} failed: {}", template, err),
}
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
let (mut pidfile, pid_path) = make_tmp_file("/tmp/unlinked.pid.tmp");
nix::unistd::unlink(&pid_path)?;

let mut qemu_cmd = Command::new("./qemu-system-x86_64");
qemu_cmd.args([
"-daemonize",
"-pidfile",
&format!("/dev/fd/{}", pidfile.as_raw_fd()),
]);

let res = qemu_cmd.spawn()?.wait_with_output()?;

if res.status.success() {
let mut pidstr = String::new();
pidfile.read_to_string(&mut pidstr)?;
println!("got PID {}", pidstr);
} else {
panic!("QEMU command unsuccessful");
}
Ok(())
}

 softmmu/vl.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/softmmu/vl.c b/softmmu/vl.c
index b464da25bc..10dfe773a7 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2432,10 +2432,9 @@ static void qemu_maybe_daemonize(const char *pid_file)
 
 pid_file_realpath = g_malloc0(PATH_MAX);
 if (!realpath(pid_file, pid_file_realpath)) {
-error_report("cannot resolve PID file path: %s: %s",
- pid_file, strerror(errno));
-unlink(pid_file);
-exit(1);
+warn_report("not removing PID file on exit: cannot resolve PID file"
+" path: %s: %s", pid_file, strerror(errno));
+return;
 }
 
 qemu_unlink_pidfile_notifier = (struct UnlinkPidfileNotifier) {
-- 
2.30.2





Re: [PATCH 3/3] vdpa: Expose VIRTIO_NET_F_STATUS unconditionally

2022-10-27 Thread Eugenio Perez Martin
On Thu, Oct 27, 2022 at 8:54 AM Jason Wang  wrote:
>
> On Thu, Oct 27, 2022 at 2:47 PM Eugenio Perez Martin
>  wrote:
> >
> > On Thu, Oct 27, 2022 at 6:32 AM Jason Wang  wrote:
> > >
> > >
> > > 在 2022/10/26 17:53, Eugenio Pérez 写道:
> > > > Now that qemu can handle and emulate it if the vdpa backend does not
> > > > support it we can offer it always.
> > > >
> > > > Signed-off-by: Eugenio Pérez 
> > >
> > >
> > > I may be missing something, but isn't it easier to simply remove the
> > > _F_STATUS from vdpa_feature_bits[]?
> > >
> >
> > How is that? if we remove it, the guest cannot ack it so it cannot
> > access the net status, isn't it?
>
> My understanding is that the bits stored in the vdpa_feature_bits[]
> are the features that must be explicitly supported by the vhost
> device.

(Non English native here, so maybe I don't get what you mean :) ) The
device may not support them. net simulator lacks some of them
actually, and it works.

From what I see these are the only features that will be forwarded to
the guest as device_features. If it is not in the list, the feature
will be masked out, as if the device does not support it.

So now _F_STATUS it was forwarded only if the device supports it. If
we remove it from bit_mask, it will never be offered to the guest. But
we want to offer it always, since we will need it for
_F_GUEST_ANNOUNCE.

Things get more complex because we actually need to ack it back if the
device offers it, so the vdpa device can report link_down. We will
only emulate LINK_UP always in the case the device does not support
_F_STATUS.

> So if we remove _F_STATUS, Qemu vhost code won't validate if
> vhost-vdpa device has this support:
>
> uint64_t vhost_get_features(struct vhost_dev *hdev, const int *feature_bits,
> uint64_t features)
> {
> const int *bit = feature_bits;
> while (*bit != VHOST_INVALID_FEATURE_BIT) {
> uint64_t bit_mask = (1ULL << *bit);
> if (!(hdev->features & bit_mask)) {
> features &= ~bit_mask;
> }
> bit++;
> }
> return features;
> }
>

Now maybe I'm the one missing something, but why is this not done as a
masking directly?

Instead of making feature_bits an array of ints, to declare it as a
uint64_t with the valid feature bits and simply return features &
feature_bits.
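
A rough sketch of that alternative, and the one subtlety it would have to preserve (illustrative names only, not the actual QEMU API):

```c
#include <assert.h>
#include <stdint.h>

#define FEATURE_BIT_STATUS  16
#define FEATURE_BIT_MQ      22

/* Current style: walk an array of feature bit numbers, terminated by an
 * invalid sentinel, clearing each listed bit the device lacks. */
static uint64_t mask_by_array(uint64_t dev_features, uint64_t features,
                              const int *bits, int invalid)
{
    for (const int *bit = bits; *bit != invalid; bit++) {
        uint64_t bit_mask = 1ULL << *bit;
        if (!(dev_features & bit_mask)) {
            features &= ~bit_mask;
        }
    }
    return features;
}

/* Suggested style: one precomputed uint64_t of valid feature bits. */
static uint64_t mask_by_bitmap(uint64_t dev_features, uint64_t features,
                               uint64_t valid_mask)
{
    /* Bits outside valid_mask pass through untouched in the array
     * version, so they must pass through here as well; a plain
     * "features & valid_mask" would change behavior for them. */
    return features & (dev_features | ~valid_mask);
}
```

So the direct masking works, but it is not quite `features & feature_bits`: unlisted bits are left alone today, and the `| ~valid_mask` term is what keeps that behavior.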

Thanks!

> Thanks
>
>
>
> >
> > The goal with this patch series is to let the guest access the status
> > always, even if the device doesn't support _F_STATUS.
> >
> > > Thanks
> > >
> > >
> > > > ---
> > > >   include/net/vhost-vdpa.h |  1 +
> > > >   hw/net/vhost_net.c   | 16 ++--
> > > >   net/vhost-vdpa.c |  3 +++
> > > >   3 files changed, 18 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/include/net/vhost-vdpa.h b/include/net/vhost-vdpa.h
> > > > index b81f9a6f2a..cfbcce6427 100644
> > > > --- a/include/net/vhost-vdpa.h
> > > > +++ b/include/net/vhost-vdpa.h
> > > > @@ -17,5 +17,6 @@
> > > >   struct vhost_net *vhost_vdpa_get_vhost_net(NetClientState *nc);
> > > >
> > > >   extern const int vdpa_feature_bits[];
> > > > +extern const uint64_t vhost_vdpa_net_added_feature_bits;
> > > >
> > > >   #endif /* VHOST_VDPA_H */
> > > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > > > index d28f8b974b..7c15cc6e8f 100644
> > > > --- a/hw/net/vhost_net.c
> > > > +++ b/hw/net/vhost_net.c
> > > > @@ -109,10 +109,22 @@ static const int 
> > > > *vhost_net_get_feature_bits(struct vhost_net *net)
> > > >   return feature_bits;
> > > >   }
> > > >
> > > > +static uint64_t vhost_net_add_feature_bits(struct vhost_net *net)
> > > > +{
> > > > +if (net->nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> > > > +return vhost_vdpa_net_added_feature_bits;
> > > > +}
> > > > +
> > > > +return 0;
> > > > +}
> > > > +
> > > >   uint64_t vhost_net_get_features(struct vhost_net *net, uint64_t 
> > > > features)
> > > >   {
> > > > -return vhost_get_features(&net->dev, 
> > > > vhost_net_get_feature_bits(net),
> > > > -features);
> > > > +uint64_t ret = vhost_get_features(&net->dev,
> > > > +  vhost_net_get_feature_bits(net),
> > > > +  features);
> > > > +
> > > > +return ret | vhost_net_add_feature_bits(net);
> > > >   }
> > > >   int vhost_net_get_config(struct vhost_net *net,  uint8_t *config,
> > > >uint32_t config_len)
> > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > > index 6d64000202..24d2857593 100644
> > > > --- a/net/vhost-vdpa.c
> > > > +++ b/net/vhost-vdpa.c
> > > > @@ -99,6 +99,9 @@ static const uint64_t vdpa_svq_device_features =
> > > >   BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
> > > >   BIT_ULL(VIRTIO_NET_F_STANDBY);
> > > >
> > > > +const uint64_t vhost_vdpa_net_added_feature_bits =
> > > > +BIT_ULL(VIRTIO_NET_F_STATUS);
> > > > +
> > > >   VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
> > > >   {
> >

Re: [PATCH 22/24] accel/tcg: Use interval tree for user-only page tracking

2022-10-27 Thread Richard Henderson

On 10/26/22 23:36, Alex Bennée wrote:


Richard Henderson  writes:


Finish weaning user-only away from PageDesc.

Using an interval tree to track page permissions means that
we can represent very large regions efficiently.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/290
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/967
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1214
Signed-off-by: Richard Henderson 
---
  accel/tcg/internal.h   |   4 +-
  accel/tcg/tb-maint.c   |  20 +-
  accel/tcg/user-exec.c  | 614 ++---
  tests/tcg/multiarch/test-vma.c |  22 ++
  4 files changed, 451 insertions(+), 209 deletions(-)
  create mode 100644 tests/tcg/multiarch/test-vma.c

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 250f0daac9..c7e157d1cd 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -24,9 +24,7 @@
  #endif
  
  typedef struct PageDesc {

-#ifdef CONFIG_USER_ONLY
-unsigned long flags;
-#else
+#ifndef CONFIG_USER_ONLY
  QemuSpin lock;
  /* list of TBs intersecting this ram page */
  uintptr_t first_tb;
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 14e8e47a6a..694440cb4a 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -68,15 +68,23 @@ static void page_flush_tb(void)


  
  int page_get_flags(target_ulong address)

  {
-PageDesc *p;
+PageFlagsNode *p = pageflags_find(address, address);
  
-p = page_find(address >> TARGET_PAGE_BITS);

+/*
+ * See util/interval-tree.c re lockless lookups: no false positives but
+ * there are false negatives.  If we find nothing, retry with the mmap
+ * lock acquired.
+ */
  if (!p) {
-return 0;
+if (have_mmap_lock()) {
+return 0;
+}
+mmap_lock();
+p = pageflags_find(address, address);
+mmap_unlock();
+if (!p) {
+return 0;
+}
  }
  return p->flags;


To avoid the brain-twisting of following locks and multiple return legs, how about
this:

   int page_get_flags(target_ulong address)
   {
   PageFlagsNode *p = pageflags_find(address, address);

   /*
* See util/interval-tree.c re lockless lookups: no false positives but
* there are false negatives.  If we had the lock and found
* nothing we are done, otherwise retry with the mmap lock acquired.
*/
   if (have_mmap_lock()) {
   return p ? p->flags : 0;
   }

   mmap_lock();
   p = pageflags_find(address, address);
   mmap_unlock();

   return p ? p->flags : 0;
   }


I'm unwilling to put an expensive test like a function call (have_mmap_lock) before an 
inexpensive test like pointer != NULL.


I don't see what's so brain twisting about the code as is.  The lock tightly surrounds a 
single statement, with a couple of pointer tests.



+/*
+ * Test very large vma allocations.
+ * The qemu out-of-memory condition was within the mmap syscall itself.
+ * If the syscall actually returns with MAP_FAILED, the test succeeded.
+ */
+#include 
+
+int main()
+{
+int n = sizeof(size_t) == 4 ? 32 : 45;
+
+for (int i = 28; i < n; i++) {
+size_t l = (size_t)1 << i;
+void *p = mmap(0, l, PROT_NONE,
+   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
+if (p == MAP_FAILED) {
+break;
+}
+munmap(p, l);
+}
+return 0;
+}


So is the failure mode here that we actually SEGV or SIGBUS out?


SEGV or KILL (via oom) depending on the state of the system. If the host is *really* 
beefy, it may even complete but with an unreasonable timeout.


r~



Re: [PATCH v9 1/8] mm: Introduce memfd_restricted system call to create restricted user memory

2022-10-27 Thread Fuad Tabba
Hi,


On Tue, Oct 25, 2022 at 4:18 PM Chao Peng  wrote:
>
> From: "Kirill A. Shutemov" 
>
> Introduce 'memfd_restricted' system call with the ability to create
> memory areas that are restricted from userspace access through ordinary
> MMU operations (e.g. read/write/mmap). The memory content is expected to
> be used through a new in-kernel interface by a third kernel module.
>
> memfd_restricted() is useful for scenarios where a file descriptor (fd)
> can be used as an interface into mm while restricting userspace's
> ability to operate on the fd. Initially it is designed to provide protections for
> KVM encrypted guest memory.
>
> Normally KVM uses memfd memory via mmapping the memfd into KVM userspace
> (e.g. QEMU) and then using the mmaped virtual address to setup the
> mapping in the KVM secondary page table (e.g. EPT). With confidential
> computing technologies like Intel TDX, the memfd memory may be encrypted
> with a special key for a special software domain (e.g. a KVM guest) and is not
> expected to be directly accessed by userspace. Specifically, userspace
> access to such encrypted memory may lead to a host crash, so it should be
> prevented.
>
> memfd_restricted() provides semantics required for KVM guest encrypted
> memory support that a fd created with memfd_restricted() is going to be
> used as the source of guest memory in a confidential computing environment
> and KVM can directly interact with core-mm without the need to expose
> the memory content into KVM userspace.
>
> KVM userspace is still in charge of the lifecycle of the fd. It should
> pass the created fd to KVM. KVM uses the new restrictedmem_get_page() to
> obtain the physical memory page and then uses it to populate the KVM
> secondary page table entries.
>
> The userspace restricted memfd can be fallocate-ed or hole-punched
> from userspace. When these operations happen, KVM gets notified
> through restrictedmem_notifier and then gets a chance to remove any
> mapped entries of the range in the secondary page tables.
>
> memfd_restricted() itself is implemented as a shim layer on top of real
> memory file systems (currently tmpfs). Pages in restrictedmem are marked
> as unmovable and unevictable; this is required for current confidential
> usage, but might be changed in the future.
>
> By default memfd_restricted() prevents userspace read, write and mmap.
> By defining new bit in the 'flags', it can be extended to support other
> restricted semantics in the future.
>
> The system call is currently wired up for x86 arch.
>
> Signed-off-by: Kirill A. Shutemov 
> Signed-off-by: Chao Peng 
> ---

Reviewed-by: Fuad Tabba 

And I'm working on porting to arm64 and testing V9.

Cheers,
/fuad


>  arch/x86/entry/syscalls/syscall_32.tbl |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl |   1 +
>  include/linux/restrictedmem.h  |  62 ++
>  include/linux/syscalls.h   |   1 +
>  include/uapi/asm-generic/unistd.h  |   5 +-
>  include/uapi/linux/magic.h |   1 +
>  kernel/sys_ni.c|   3 +
>  mm/Kconfig |   4 +
>  mm/Makefile|   1 +
>  mm/restrictedmem.c | 250 +
>  10 files changed, 328 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/restrictedmem.h
>  create mode 100644 mm/restrictedmem.c
>
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index 320480a8db4f..dc70ba90247e 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -455,3 +455,4 @@
>  448i386process_mreleasesys_process_mrelease
>  449i386futex_waitv sys_futex_waitv
>  450i386set_mempolicy_home_node sys_set_mempolicy_home_node
> +451i386memfd_restrictedsys_memfd_restricted
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index c84d12608cd2..06516abc8318 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -372,6 +372,7 @@
>  448common  process_mreleasesys_process_mrelease
>  449common  futex_waitv sys_futex_waitv
>  450common  set_mempolicy_home_node sys_set_mempolicy_home_node
> +451common  memfd_restrictedsys_memfd_restricted
>
>  #
>  # Due to a historical design error, certain syscalls are numbered differently
> diff --git a/include/linux/restrictedmem.h b/include/linux/restrictedmem.h
> new file mode 100644
> index ..9c37c3ea3180
> --- /dev/null
> +++ b/include/linux/restrictedmem.h
> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +#ifndef _LINUX_RESTRICTEDMEM_H
> +
> +#include 
> +#include 
> +#include 
> +
> +struct restrictedmem_notifier;
> +
> +struct restrictedmem_notifier_ops {
> +   void (*invalidate_start)(struct restrictedmem_notifier *notifier,
> +

Re: [PATCH v3 2/2] qtests/arm: add some mte tests

2022-10-27 Thread Cornelia Huck
On Thu, Oct 27 2022, Thomas Huth  wrote:

> On 26/10/2022 18.05, Cornelia Huck wrote:
>> +qtest_add_data_func("/arm/max/query-cpu-model-expansion/tag-memory",
>> +NULL, mte_tests_tag_memory_on);
>
> Is it already possible to compile qemu-system-aarch64 with --disable-tcg ? 

Not yet, the code is too entangled... I tried a while ago, but didn't make
much progress (on my todo list, but won't mind someone else doing it :)

> If so, I'd recommend a qtest_has_accel("tcg") here ... but apart from that:
>
> Acked-by: Thomas Huth 

Thanks!




Re: [PATCH v9 3/8] KVM: Add KVM_EXIT_MEMORY_FAULT exit

2022-10-27 Thread Fuad Tabba
Hi,

On Tue, Oct 25, 2022 at 4:19 PM Chao Peng  wrote:
>
> This new KVM exit allows userspace to handle memory-related errors. It
> indicates that an error happened in KVM at guest memory range [gpa, gpa+size).
> The 'flags' field includes additional information for userspace to handle the
> error. Currently bit 0 is defined as 'private memory', where '1'
> indicates the error happened due to a private memory access and '0' indicates
> it happened due to a shared memory access.
>
> When private memory is enabled, this new exit will be used for KVM to
> exit to userspace for shared <-> private memory conversion in memory
> encryption usage. In such usage, there are typically two kinds of memory
> conversions:
>   - explicit conversion: happens when guest explicitly calls into KVM
> to map a range (as private or shared), KVM then exits to userspace
> to perform the map/unmap operations.
>   - implicit conversion: happens in KVM page fault handler where KVM
> exits to userspace for an implicit conversion when the page is in a
> different state than requested (private or shared).
>
> Suggested-by: Sean Christopherson 
> Co-developed-by: Yu Zhang 
> Signed-off-by: Yu Zhang 
> Signed-off-by: Chao Peng 
> ---

Reviewed-by: Fuad Tabba 

I have tested the V8 version of this patch on arm64/qemu, and
considering this hasn't changed:
Tested-by: Fuad Tabba 

Cheers,
/fuad



>  Documentation/virt/kvm/api.rst | 23 +++
>  include/uapi/linux/kvm.h   |  9 +
>  2 files changed, 32 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index f3fa75649a78..975688912b8c 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6537,6 +6537,29 @@ array field represents return values. The userspace should update the return
>  values of SBI call before resuming the VCPU. For more details on RISC-V SBI
>  spec refer, https://github.com/riscv/riscv-sbi-doc.
>
> +::
> +
> +   /* KVM_EXIT_MEMORY_FAULT */
> +   struct {
> +  #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0)
> +   __u32 flags;
> +   __u32 padding;
> +   __u64 gpa;
> +   __u64 size;
> +   } memory;
> +
> +If exit reason is KVM_EXIT_MEMORY_FAULT then it indicates that the VCPU has
> +encountered a memory error which is not handled by KVM kernel module and
> +userspace may choose to handle it. The 'flags' field indicates the memory
> +properties of the exit.
> +
> + - KVM_MEMORY_EXIT_FLAG_PRIVATE - indicates the memory error is caused by
> +   private memory access when the bit is set. Otherwise the memory error is
> +   caused by shared memory access when the bit is clear.
> +
> +'gpa' and 'size' indicate the memory range the error occurs at. The userspace
> +may handle the error and return to KVM to retry the previous memory access.
> +
>  ::
>
>  /* KVM_EXIT_NOTIFY */
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index f1ae45c10c94..fa60b032a405 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -300,6 +300,7 @@ struct kvm_xen_exit {
>  #define KVM_EXIT_RISCV_SBI35
>  #define KVM_EXIT_RISCV_CSR36
>  #define KVM_EXIT_NOTIFY   37
> +#define KVM_EXIT_MEMORY_FAULT 38
>
>  /* For KVM_EXIT_INTERNAL_ERROR */
>  /* Emulate instruction failed. */
> @@ -538,6 +539,14 @@ struct kvm_run {
>  #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0)
> __u32 flags;
> } notify;
> +   /* KVM_EXIT_MEMORY_FAULT */
> +   struct {
> +#define KVM_MEMORY_EXIT_FLAG_PRIVATE   (1 << 0)
> +   __u32 flags;
> +   __u32 padding;
> +   __u64 gpa;
> +   __u64 size;
> +   } memory;
> /* Fix the size of the union. */
> char padding[256];
> };
> --
> 2.25.1
>



[PATCH] target/i386: Expand eflags updates inline

2022-10-27 Thread Richard Henderson
The helpers for reset_rf, cli, sti, clac, stac are
completely trivial; implement them inline.

Drop some nearby #if 0 code.

Signed-off-by: Richard Henderson 
---
 target/i386/helper.h|  5 -
 target/i386/tcg/cc_helper.c | 41 -
 target/i386/tcg/translate.c | 30 ++-
 3 files changed, 25 insertions(+), 51 deletions(-)

diff --git a/target/i386/helper.h b/target/i386/helper.h
index 88143b2a24..b7de5429ef 100644
--- a/target/i386/helper.h
+++ b/target/i386/helper.h
@@ -56,13 +56,8 @@ DEF_HELPER_2(syscall, void, env, int)
 DEF_HELPER_2(sysret, void, env, int)
 #endif
 DEF_HELPER_FLAGS_2(pause, TCG_CALL_NO_WG, noreturn, env, int)
-DEF_HELPER_1(reset_rf, void, env)
 DEF_HELPER_FLAGS_3(raise_interrupt, TCG_CALL_NO_WG, noreturn, env, int, int)
 DEF_HELPER_FLAGS_2(raise_exception, TCG_CALL_NO_WG, noreturn, env, int)
-DEF_HELPER_1(cli, void, env)
-DEF_HELPER_1(sti, void, env)
-DEF_HELPER_1(clac, void, env)
-DEF_HELPER_1(stac, void, env)
 DEF_HELPER_3(boundw, void, env, tl, int)
 DEF_HELPER_3(boundl, void, env, tl, int)
 
diff --git a/target/i386/tcg/cc_helper.c b/target/i386/tcg/cc_helper.c
index cc7ea9e8b9..6227dbb30b 100644
--- a/target/i386/tcg/cc_helper.c
+++ b/target/i386/tcg/cc_helper.c
@@ -346,44 +346,3 @@ void helper_clts(CPUX86State *env)
 env->cr[0] &= ~CR0_TS_MASK;
 env->hflags &= ~HF_TS_MASK;
 }
-
-void helper_reset_rf(CPUX86State *env)
-{
-env->eflags &= ~RF_MASK;
-}
-
-void helper_cli(CPUX86State *env)
-{
-env->eflags &= ~IF_MASK;
-}
-
-void helper_sti(CPUX86State *env)
-{
-env->eflags |= IF_MASK;
-}
-
-void helper_clac(CPUX86State *env)
-{
-env->eflags &= ~AC_MASK;
-}
-
-void helper_stac(CPUX86State *env)
-{
-env->eflags |= AC_MASK;
-}
-
-#if 0
-/* vm86plus instructions */
-void helper_cli_vm(CPUX86State *env)
-{
-env->eflags &= ~VIF_MASK;
-}
-
-void helper_sti_vm(CPUX86State *env)
-{
-env->eflags |= VIF_MASK;
-if (env->eflags & VIP_MASK) {
-raise_exception_ra(env, EXCP0D_GPF, GETPC());
-}
-}
-#endif
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index e19d5c1c64..dc1fa9f9ed 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2745,6 +2745,26 @@ static void gen_reset_hflag(DisasContext *s, uint32_t mask)
 }
 }
 
+static void gen_set_eflags(DisasContext *s, target_ulong mask)
+{
+TCGv t = tcg_temp_new();
+
+tcg_gen_ld_tl(t, cpu_env, offsetof(CPUX86State, eflags));
+tcg_gen_ori_tl(t, t, mask);
+tcg_gen_st_tl(t, cpu_env, offsetof(CPUX86State, eflags));
+tcg_temp_free(t);
+}
+
+static void gen_reset_eflags(DisasContext *s, target_ulong mask)
+{
+TCGv t = tcg_temp_new();
+
+tcg_gen_ld_tl(t, cpu_env, offsetof(CPUX86State, eflags));
+tcg_gen_andi_tl(t, t, ~mask);
+tcg_gen_st_tl(t, cpu_env, offsetof(CPUX86State, eflags));
+tcg_temp_free(t);
+}
+
 /* Clear BND registers during legacy branches.  */
 static void gen_bnd_jmp(DisasContext *s)
 {
@@ -2775,7 +2795,7 @@ do_gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf, bool jr)
 }
 
 if (s->base.tb->flags & HF_RF_MASK) {
-gen_helper_reset_rf(cpu_env);
+gen_reset_eflags(s, RF_MASK);
 }
 if (recheck_tf) {
 gen_helper_rechecking_single_step(cpu_env);
@@ -5501,12 +5521,12 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 #endif
 case 0xfa: /* cli */
 if (check_iopl(s)) {
-gen_helper_cli(cpu_env);
+gen_reset_eflags(s, IF_MASK);
 }
 break;
 case 0xfb: /* sti */
 if (check_iopl(s)) {
-gen_helper_sti(cpu_env);
+gen_set_eflags(s, IF_MASK);
 /* interruptions are enabled only the first insn after sti */
 gen_update_eip_next(s);
 gen_eob_inhibit_irq(s, true);
@@ -5788,7 +5808,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 || CPL(s) != 0) {
 goto illegal_op;
 }
-gen_helper_clac(cpu_env);
+gen_reset_eflags(s, AC_MASK);
 s->base.is_jmp = DISAS_EOB_NEXT;
 break;
 
@@ -5797,7 +5817,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 || CPL(s) != 0) {
 goto illegal_op;
 }
-gen_helper_stac(cpu_env);
+gen_set_eflags(s, AC_MASK);
 s->base.is_jmp = DISAS_EOB_NEXT;
 break;
 
-- 
2.34.1




Re: [PATCH v9 2/8] KVM: Extend the memslot to support fd-based private memory

2022-10-27 Thread Fuad Tabba
On Tue, Oct 25, 2022 at 4:18 PM Chao Peng  wrote:
>
> In memory encryption usage, guest memory may be encrypted with a special
> key and can be accessed only by the guest itself. We call such memory
> private memory. There is no value, and it can sometimes cause problems,
> in allowing userspace to access guest private memory. This new KVM memslot
> extension allows guest private memory to be provided through a
> restrictedmem-backed file descriptor (fd), and userspace is restricted
> from accessing the bookmarked memory in the fd.
>
> This new extension, indicated by the new flag KVM_MEM_PRIVATE, adds two
> additional KVM memslot fields restricted_fd/restricted_offset to allow
> userspace to instruct KVM to provide guest memory through restricted_fd.
> 'guest_phys_addr' is mapped at the restricted_offset of restricted_fd
> and the size is 'memory_size'.
>
> The extended memslot can still have the userspace_addr (hva). When used, a
> single memslot can maintain both private memory through restricted_fd
> and shared memory through userspace_addr. Whether the private or shared
> part is visible to the guest is maintained by other KVM code.
>
> A restrictedmem_notifier field is also added to the memslot structure to
> allow the restricted_fd's backing store to notify KVM of memory changes,
> so that KVM can invalidate its page table entries.
>
> Together with the change, a new config HAVE_KVM_RESTRICTED_MEM is added
> and right now it is selected on X86_64 only. A KVM_CAP_PRIVATE_MEM is
> also introduced to indicate KVM support for KVM_MEM_PRIVATE.
>
> To make code maintenance easy, internally we use a binary compatible
> alias struct kvm_user_mem_region to handle both the normal and the
> '_ext' variants.
>
> Co-developed-by: Yu Zhang 
> Signed-off-by: Yu Zhang 
> Signed-off-by: Chao Peng 

Reviewed-by: Fuad Tabba 

I have tested the V8 version of this patch on arm64/qemu (which has
the fix to copy_from_user included in this patch), and considering
this hasn't changed much:
Tested-by: Fuad Tabba 

Cheers,
/fuad



> ---
>  Documentation/virt/kvm/api.rst | 48 -
>  arch/x86/kvm/Kconfig   |  2 ++
>  arch/x86/kvm/x86.c |  2 +-
>  include/linux/kvm_host.h   | 13 +++--
>  include/uapi/linux/kvm.h   | 29 
>  virt/kvm/Kconfig   |  3 +++
>  virt/kvm/kvm_main.c| 49 --
>  7 files changed, 128 insertions(+), 18 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index eee9f857a986..f3fa75649a78 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -1319,7 +1319,7 @@ yet and must be cleared on entry.
>  :Capability: KVM_CAP_USER_MEMORY
>  :Architectures: all
>  :Type: vm ioctl
> -:Parameters: struct kvm_userspace_memory_region (in)
> +:Parameters: struct kvm_userspace_memory_region(_ext) (in)
>  :Returns: 0 on success, -1 on error
>
>  ::
> @@ -1332,9 +1332,18 @@ yet and must be cleared on entry.
> __u64 userspace_addr; /* start of the userspace allocated memory */
>};
>
> +  struct kvm_userspace_memory_region_ext {
> +   struct kvm_userspace_memory_region region;
> +   __u64 restricted_offset;
> +   __u32 restricted_fd;
> +   __u32 pad1;
> +   __u64 pad2[14];
> +  };
> +
>/* for kvm_memory_region::flags */
>#define KVM_MEM_LOG_DIRTY_PAGES  (1UL << 0)
>#define KVM_MEM_READONLY (1UL << 1)
> +  #define KVM_MEM_PRIVATE  (1UL << 2)
>
>  This ioctl allows the user to create, modify or delete a guest physical
>  memory slot.  Bits 0-15 of "slot" specify the slot id and this value
> @@ -1365,12 +1374,27 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
>  be identical.  This allows large pages in the guest to be backed by large
>  pages in the host.
>
> -The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
> -KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
> -writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
> -use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
> -to make a new slot read-only.  In this case, writes to this memory will be
> -posted to userspace as KVM_EXIT_MMIO exits.
> +The kvm_userspace_memory_region_ext struct includes all fields of the
> +kvm_userspace_memory_region struct, and adds additional fields for some
> +other features. See the description of the flags field below for more
> +information. It's recommended to use kvm_userspace_memory_region_ext in
> +new userspace code.
> +
> +The flags field supports following flags:
> +
> +- KVM_MEM_LOG_DIRTY_PAGES to instruct KVM to keep track of writes to memory
> +  within the slot.  For more details, see KVM_GET_DIRTY_LOG ioctl.
> +
> +- KVM_MEM_READONLY, if KVM_CAP_READONLY_MEM allows, to make a new slot
> +  read-only.  In this case, writes to this memory will be posted to userspace as
> + 

Re: [PATCH v9 4/8] KVM: Use gfn instead of hva for mmu_notifier_retry

2022-10-27 Thread Fuad Tabba
Hi,

On Tue, Oct 25, 2022 at 4:19 PM Chao Peng  wrote:
>
> Currently in the mmu_notifier invalidate path, the hva range is recorded
> and then checked against in mmu_notifier_retry_hva() in the page fault
> path. However, for the to-be-introduced private memory, a page fault may
> not have an associated hva; checking the gfn (gpa) makes more sense.
>
> For the existing non-private memory case, gfn is expected to continue to
> work. The only downside is when aliasing multiple gfns to a single hva,
> the current algorithm of checking multiple ranges could result in a much
> larger range being rejected. Such aliasing should be uncommon, so the
> impact is expected small.
>
> It also fixes a bug in kvm_zap_gfn_range() which has already been using

nit: Now it's kvm_unmap_gfn_range().

> gfn when calling kvm_mmu_invalidate_begin/end() while these functions
> accept hva in current code.
>
> Signed-off-by: Chao Peng 
> ---

Based on reading this code and my limited knowledge of the x86 MMU code:
Reviewed-by: Fuad Tabba 

Cheers,
/fuad


>  arch/x86/kvm/mmu/mmu.c   |  2 +-
>  include/linux/kvm_host.h | 18 +++-
>  virt/kvm/kvm_main.c  | 45 ++--
>  3 files changed, 39 insertions(+), 26 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 6f81539061d6..33b1aec44fb8 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4217,7 +4217,7 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
> return true;
>
> return fault->slot &&
> -  mmu_invalidate_retry_hva(vcpu->kvm, mmu_seq, fault->hva);
> +  mmu_invalidate_retry_gfn(vcpu->kvm, mmu_seq, fault->gfn);
>  }
>
>  static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 739a7562a1f3..79e5cbc35fcf 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -775,8 +775,8 @@ struct kvm {
> struct mmu_notifier mmu_notifier;
> unsigned long mmu_invalidate_seq;
> long mmu_invalidate_in_progress;
> -   unsigned long mmu_invalidate_range_start;
> -   unsigned long mmu_invalidate_range_end;
> +   gfn_t mmu_invalidate_range_start;
> +   gfn_t mmu_invalidate_range_end;
>  #endif
> struct list_head devices;
> u64 manual_dirty_log_protect;
> @@ -1365,10 +1365,8 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
>  void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
>  #endif
>
> -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
> - unsigned long end);
> -void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start,
> -   unsigned long end);
> +void kvm_mmu_invalidate_begin(struct kvm *kvm, gfn_t start, gfn_t end);
> +void kvm_mmu_invalidate_end(struct kvm *kvm, gfn_t start, gfn_t end);
>
>  long kvm_arch_dev_ioctl(struct file *filp,
> unsigned int ioctl, unsigned long arg);
> @@ -1937,9 +1935,9 @@ static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq)
> return 0;
>  }
>
> -static inline int mmu_invalidate_retry_hva(struct kvm *kvm,
> +static inline int mmu_invalidate_retry_gfn(struct kvm *kvm,
>unsigned long mmu_seq,
> -  unsigned long hva)
> +  gfn_t gfn)
>  {
> lockdep_assert_held(&kvm->mmu_lock);
> /*
> @@ -1949,8 +1947,8 @@ static inline int mmu_invalidate_retry_hva(struct kvm *kvm,
>  * positives, due to shortcuts when handing concurrent invalidations.
>  */
> if (unlikely(kvm->mmu_invalidate_in_progress) &&
> -   hva >= kvm->mmu_invalidate_range_start &&
> -   hva < kvm->mmu_invalidate_range_end)
> +   gfn >= kvm->mmu_invalidate_range_start &&
> +   gfn < kvm->mmu_invalidate_range_end)
> return 1;
> if (kvm->mmu_invalidate_seq != mmu_seq)
> return 1;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8dace78a0278..09c9cdeb773c 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -540,8 +540,7 @@ static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn,
>
>  typedef bool (*hva_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range);
>
> -typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start,
> -unsigned long end);
> +typedef void (*on_lock_fn_t)(struct kvm *kvm, gfn_t start, gfn_t end);
>
>  typedef void (*on_unlock_fn_t)(struct kvm *kvm);
>
> @@ -628,7 +627,8 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
> locked = true;
> KVM_MMU_LOCK(kvm);
> if (!IS_KVM_NULL_

Re: [PATCH v9 8/8] KVM: Enable and expose KVM_MEM_PRIVATE

2022-10-27 Thread Fuad Tabba
Hi,

On Tue, Oct 25, 2022 at 4:20 PM Chao Peng  wrote:
>
> Expose KVM_MEM_PRIVATE and memslot fields restricted_fd/offset to
> userspace. KVM registers/unregisters the private memslot with the fd-based
> memory backing store and responds to invalidation events from
> restrictedmem_notifier to zap the existing memory mappings in the
> secondary page table.
>
> Whether KVM_MEM_PRIVATE is actually exposed to userspace is determined
> by architecture code, which can turn it on by overriding the default
> kvm_arch_has_private_mem().
>
> A 'kvm' reference is added in memslot structure since in
> restrictedmem_notifier callback we can only obtain a memslot reference
> but 'kvm' is needed to do the zapping.
>
> Co-developed-by: Yu Zhang 
> Signed-off-by: Yu Zhang 
> Signed-off-by: Chao Peng 
> ---
>  include/linux/kvm_host.h |   3 +-
>  virt/kvm/kvm_main.c  | 174 +--
>  2 files changed, 171 insertions(+), 6 deletions(-)

Reviewed-by: Fuad Tabba 

Thanks,
/fuad


>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 69300fc6d572..e27d62c30484 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -246,7 +246,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
>  #endif
>
>
> -#if defined(KVM_ARCH_WANT_MMU_NOTIFIER) || defined(CONFIG_KVM_GENERIC_PRIVATE_MEM)
> +#if defined(KVM_ARCH_WANT_MMU_NOTIFIER) || defined(CONFIG_HAVE_KVM_RESTRICTED_MEM)
>  struct kvm_gfn_range {
> struct kvm_memory_slot *slot;
> gfn_t start;
> @@ -583,6 +583,7 @@ struct kvm_memory_slot {
> struct file *restricted_file;
> loff_t restricted_offset;
> struct restrictedmem_notifier notifier;
> +   struct kvm *kvm;
>  };
>
>  static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot 
> *slot)
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 13a37b4d9e97..dae6a2c196ad 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1028,6 +1028,111 @@ static int kvm_vm_ioctl_set_mem_attr(struct kvm *kvm, gpa_t gpa, gpa_t size,
>  }
>  #endif /* CONFIG_KVM_GENERIC_PRIVATE_MEM */
>
> +#ifdef CONFIG_HAVE_KVM_RESTRICTED_MEM
> +static bool restrictedmem_range_is_valid(struct kvm_memory_slot *slot,
> +pgoff_t start, pgoff_t end,
> +gfn_t *gfn_start, gfn_t *gfn_end)
> +{
> +   unsigned long base_pgoff = slot->restricted_offset >> PAGE_SHIFT;
> +
> +   if (start > base_pgoff)
> +   *gfn_start = slot->base_gfn + start - base_pgoff;
> +   else
> +   *gfn_start = slot->base_gfn;
> +
> +   if (end < base_pgoff + slot->npages)
> +   *gfn_end = slot->base_gfn + end - base_pgoff;
> +   else
> +   *gfn_end = slot->base_gfn + slot->npages;
> +
> +   if (*gfn_start >= *gfn_end)
> +   return false;
> +
> +   return true;
> +}
> +
> +static void kvm_restrictedmem_invalidate_begin(struct restrictedmem_notifier *notifier,
> +  pgoff_t start, pgoff_t end)
> +{
> +   struct kvm_memory_slot *slot = container_of(notifier,
> +   struct kvm_memory_slot,
> +   notifier);
> +   struct kvm *kvm = slot->kvm;
> +   gfn_t gfn_start, gfn_end;
> +   struct kvm_gfn_range gfn_range;
> +   int idx;
> +
> +   if (!restrictedmem_range_is_valid(slot, start, end,
> +   &gfn_start, &gfn_end))
> +   return;
> +
> +   idx = srcu_read_lock(&kvm->srcu);
> +   KVM_MMU_LOCK(kvm);
> +
> +   kvm_mmu_invalidate_begin(kvm, gfn_start, gfn_end);
> +
> +   gfn_range.start = gfn_start;
> +   gfn_range.end = gfn_end;
> +   gfn_range.slot = slot;
> +   gfn_range.pte = __pte(0);
> +   gfn_range.may_block = true;
> +
> +   if (kvm_unmap_gfn_range(kvm, &gfn_range))
> +   kvm_flush_remote_tlbs(kvm);
> +
> +   KVM_MMU_UNLOCK(kvm);
> +   srcu_read_unlock(&kvm->srcu, idx);
> +}
> +
> +static void kvm_restrictedmem_invalidate_end(struct restrictedmem_notifier *notifier,
> +pgoff_t start, pgoff_t end)
> +{
> +   struct kvm_memory_slot *slot = container_of(notifier,
> +   struct kvm_memory_slot,
> +   notifier);
> +   struct kvm *kvm = slot->kvm;
> +   gfn_t gfn_start, gfn_end;
> +
> +   if (!restrictedmem_range_is_valid(slot, start, end,
> +   &gfn_start, &gfn_end))
> +   return;
> +
> +   KVM_MMU_LOCK(kvm);
> +   kvm_mmu_invalidate_end(kvm, gfn_start, gfn_end);
> +   KVM_MMU_UNLOCK(kvm);
> +}
> +
> +static struct restrictedmem_notifier_ops kvm_restrictedmem_notifier_ops = {
> +   .invalida

Re: [PATCH v9 5/8] KVM: Register/unregister the guest private memory regions

2022-10-27 Thread Fuad Tabba
Hi,

On Tue, Oct 25, 2022 at 4:19 PM Chao Peng  wrote:
>
> Introduce generic private memory register/unregister by reusing the existing
> SEV ioctls KVM_MEMORY_ENCRYPT_{UN,}REG_REGION. It differs from the SEV case
> by treating the address in the region as a gpa instead of an hva. Which path
> these ioctls take is determined by kvm_arch_has_private_mem().
> Architectures which support KVM_PRIVATE_MEM should override this function.
>
> KVM internally defaults all guest memory to private memory and maintains
> the shared memory in 'mem_attr_array'. The above ioctls operate on this
> field and unmap existing mappings, if any.
>
> Signed-off-by: Chao Peng 
> ---

Reviewed-by: Fuad Tabba 

Cheers,
/fuad


>  Documentation/virt/kvm/api.rst |  17 ++-
>  arch/x86/kvm/Kconfig   |   1 +
>  include/linux/kvm_host.h   |  10 +-
>  virt/kvm/Kconfig   |   4 +
>  virt/kvm/kvm_main.c| 227 +
>  5 files changed, 198 insertions(+), 61 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 975688912b8c..08253cf498d1 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4717,10 +4717,19 @@ Documentation/virt/kvm/x86/amd-memory-encryption.rst.
>  This ioctl can be used to register a guest memory region which may
>  contain encrypted data (e.g. guest RAM, SMRAM etc).
>
> -It is used in the SEV-enabled guest. When encryption is enabled, a guest
> -memory region may contain encrypted data. The SEV memory encryption
> -engine uses a tweak such that two identical plaintext pages, each at
> -different locations will have differing ciphertexts. So swapping or
> +Currently this ioctl supports registering memory regions for two usages:
> +private memory and SEV-encrypted memory.
> +
> +When private memory is enabled, this ioctl is used to register guest private
> +memory region and the addr/size of kvm_enc_region represents guest physical
> +address (GPA). In this usage, this ioctl zaps the existing guest memory
> +mappings in KVM that fall into the region.
> +
> +When SEV-encrypted memory is enabled, this ioctl is used to register guest
> +memory region which may contain encrypted data for a SEV-enabled guest. The
> +addr/size of kvm_enc_region represents userspace address (HVA). The SEV
> +memory encryption engine uses a tweak such that two identical plaintext pages,
> +each at different locations will have differing ciphertexts. So swapping or
>  moving ciphertext of those pages will not result in plaintext being
>  swapped. So relocating (or migrating) physical backing pages for the SEV
>  guest will require some additional steps.
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 8d2bd455c0cd..73fdfa429b20 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -51,6 +51,7 @@ config KVM
> select HAVE_KVM_PM_NOTIFIER if PM
> select HAVE_KVM_RESTRICTED_MEM if X86_64
> select RESTRICTEDMEM if HAVE_KVM_RESTRICTED_MEM
> +   select KVM_GENERIC_PRIVATE_MEM if HAVE_KVM_RESTRICTED_MEM
> help
>   Support hosting fully virtualized guest machines using hardware
>   virtualization extensions.  You will need a fairly recent
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 79e5cbc35fcf..4ce98fa0153c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -245,7 +245,8 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
>  #endif
>
> -#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
> +
> +#if defined(KVM_ARCH_WANT_MMU_NOTIFIER) || defined(CONFIG_KVM_GENERIC_PRIVATE_MEM)
>  struct kvm_gfn_range {
> struct kvm_memory_slot *slot;
> gfn_t start;
> @@ -254,6 +255,9 @@ struct kvm_gfn_range {
> bool may_block;
>  };
>  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
> +#endif
> +
> +#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
>  bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
>  bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
>  bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
> @@ -794,6 +798,9 @@ struct kvm {
> struct notifier_block pm_notifier;
>  #endif
> char stats_id[KVM_STATS_NAME_SIZE];
> +#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
> +   struct xarray mem_attr_array;
> +#endif
>  };
>
>  #define kvm_err(fmt, ...) \
> @@ -1453,6 +1460,7 @@ bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
>  int kvm_arch_post_init_vm(struct kvm *kvm);
>  void kvm_arch_pre_destroy_vm(struct kvm *kvm);
>  int kvm_arch_create_vm_debugfs(struct kvm *kvm);
> +bool kvm_arch_has_private_mem(struct kvm *kvm);
>
>  #ifndef __KVM_HAVE_ARCH_VM_ALLOC
>  /*
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 9ff164c7e0cc..69ca59e82149 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -89,3 +89,7 @@ co

Re: [PATCH v4 0/3] MIPS Bootloader helper

2022-10-27 Thread Jiaxun Yang



> 2022年10月26日 20:18,Philippe Mathieu-Daudé  写道:
> 
> This is a respin of Jiaxun v3 [1] addressing the semihosting review
> comment [2].
> 
> [1] https://lore.kernel.org/qemu-devel/20210127065424.114125-1-jiaxun.y...@flygoat.com/
> [2] https://lore.kernel.org/qemu-devel/5a22bbe1-5023-6fc3-a41b-8d72ec2bb...@flygoat.com/

For the series:

Tested-by: Jiaxun Yang 
Reviewed-by: Jiaxun Yang 

I thought this series was committed in whole... I just forgot that there is
still something remaining :-)

Thanks
- Jiaxun


> 
> *** BLURB HERE ***
> 
> Jiaxun Yang (2):
>  hw/mips: Use bl_gen_kernel_jump to generate bootloaders
>  hw/mips/malta: Use bootloader helper to set BAR registers
> 
> Philippe Mathieu-Daudé (1):
>  hw/mips/bootloader: Allow bl_gen_jump_kernel to optionally set
>register
> 
> hw/mips/bootloader.c |  28 ++--
> hw/mips/boston.c |   5 +-
> hw/mips/fuloong2e.c  |   8 ++-
> hw/mips/malta.c  | 122 ++-
> include/hw/mips/bootloader.h |   8 ++-
> 5 files changed, 86 insertions(+), 85 deletions(-)
> 
> -- 
> 2.37.3
> 

---
Jiaxun Yang




Re: [PULL 00/30] target-arm queue

2022-10-27 Thread Peter Maydell
On Wed, 26 Oct 2022 at 15:52, Stefan Hajnoczi  wrote:
>
> On Tue, 25 Oct 2022 at 12:51, Peter Maydell  wrote:
> > target-arm queue:
> >  * Implement FEAT_E0PD
> >  * Implement FEAT_HAFDBS
>
> A second CI failure:

> libqemu-aarch64-softmmu.fa.p/target_arm_ptw.c.o -MF
> libqemu-aarch64-softmmu.fa.p/target_arm_ptw.c.o.d -o
> libqemu-aarch64-softmmu.fa.p/target_arm_ptw.c.o -c ../target/arm/ptw.c
> ../target/arm/ptw.c: In function ‘arm_casq_ptw’:
> ../target/arm/ptw.c:449:19: error: implicit declaration of function
> ‘qemu_mutex_iothread_locked’; did you mean ‘qemu_mutex_trylock’?
> [-Werror=implicit-function-declaration]
> 449 | bool locked = qemu_mutex_iothread_locked();
> | ^~
> | qemu_mutex_trylock

Oops, sorry about the CI failures. The windows one is an accidental
use of PROT_WRITE when PAGE_WRITE was intended; this one's a missing
include of main-loop.h. I'm not sure why it doesn't show up on my
system -- I guess we're dragging in main-loop.h via some other
header somehow.

thanks
-- PMM



Re: [PATCH 4/9] target/s390x: Use Int128 for return from CKSM

2022-10-27 Thread Ilya Leoshkevich
On Fri, Oct 21, 2022 at 05:30:01PM +1000, Richard Henderson wrote:
> Signed-off-by: Richard Henderson 
> ---
>  target/s390x/helper.h | 2 +-
>  target/s390x/tcg/mem_helper.c | 7 +++
>  target/s390x/tcg/translate.c  | 6 --
>  3 files changed, 8 insertions(+), 7 deletions(-)

Acked-by: Ilya Leoshkevich 



Re: [PATCH 5/9] target/s390x: Use Int128 for return from TRE

2022-10-27 Thread Ilya Leoshkevich
On Fri, Oct 21, 2022 at 05:30:02PM +1000, Richard Henderson wrote:
> Signed-off-by: Richard Henderson 
> ---
>  target/s390x/helper.h | 2 +-
>  target/s390x/tcg/mem_helper.c | 7 +++
>  target/s390x/tcg/translate.c  | 7 +--
>  3 files changed, 9 insertions(+), 7 deletions(-)

Acked-by: Ilya Leoshkevich 



Re: [PATCH v2 1/6] accel/tcg: Introduce cpu_unwind_state_data

2022-10-27 Thread Claudio Fontana
On 10/27/22 12:02, Richard Henderson wrote:
> Add a way to examine the unwind data without actually
> restoring the data back into env.
> 
> Signed-off-by: Richard Henderson 
> ---
>  accel/tcg/internal.h  |  4 +--
>  include/exec/exec-all.h   | 21 ---
>  accel/tcg/translate-all.c | 74 ++-
>  3 files changed, 68 insertions(+), 31 deletions(-)
> 
> diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
> index 1227bb69bd..9c06b320b7 100644
> --- a/accel/tcg/internal.h
> +++ b/accel/tcg/internal.h
> @@ -106,8 +106,8 @@ void tb_reset_jump(TranslationBlock *tb, int n);
>  TranslationBlock *tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
> tb_page_addr_t phys_page2);
>  bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc);
> -int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
> -  uintptr_t searched_pc, bool reset_icount);
> +void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
> +   uintptr_t host_pc, bool reset_icount);
>  
>  /* Return the current PC from CPU, which may be cached in TB. */
>  static inline target_ulong log_pc(CPUState *cpu, const TranslationBlock *tb)
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index e948992a80..7d851f5907 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -39,20 +39,33 @@ typedef ram_addr_t tb_page_addr_t;
>  #define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
>  #endif
>  
> +/**
> + * cpu_unwind_state_data:
> + * @cpu: the cpu context
> + * @host_pc: the host pc within the translation
> + * @data: output data
> + *
> + * Attempt to load the unwind state for a host pc occurring in
> + * translated code.  If @host_pc is not in translated code, the
> + * function returns false; otherwise @data is loaded.
> + * This is the same unwind info as given to restore_state_to_opc.
> + */
> +bool cpu_unwind_state_data(CPUState *cpu, uintptr_t host_pc, uint64_t *data);
> +
>  /**
>   * cpu_restore_state:
> - * @cpu: the vCPU state is to be restore to
> - * @searched_pc: the host PC the fault occurred at
> + * @cpu: the cpu context
> + * @host_pc: the host pc within the translation
>   * @will_exit: true if the TB executed will be interrupted after some
> cpu adjustments. Required for maintaining the correct
> icount values
>   * @return: true if state was restored, false otherwise
>   *
>   * Attempt to restore the state for a fault occurring in translated
> - * code. If the searched_pc is not in translated code no state is
> + * code. If @host_pc is not in translated code no state is
>   * restored and the function returns false.
>   */
> -bool cpu_restore_state(CPUState *cpu, uintptr_t searched_pc, bool will_exit);
> +bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc, bool will_exit);
>  
>  G_NORETURN void cpu_loop_exit_noexc(CPUState *cpu);
>  G_NORETURN void cpu_loop_exit(CPUState *cpu);
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index f185356a36..319becb698 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -247,52 +247,66 @@ static int encode_search(TranslationBlock *tb, uint8_t *block)
>  return p - block;
>  }
>  
> -/* The cpu state corresponding to 'searched_pc' is restored.
> - * When reset_icount is true, current TB will be interrupted and
> - * icount should be recalculated.
> - */
> -int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
> -  uintptr_t searched_pc, bool reset_icount)


Maybe add a small comment about what the return value of this static function 
means?
It can be indirectly inferred from its point of use:

 +int insns_left = cpu_unwind_data_from_tb(tb, host_pc, data);

But I find it useful to have the information about the meaning of a function
and its return value available right there.

IIUC for external functions the standard way is to document in the header 
files, but for the static functions I would think we can do it here.

With that Reviewed-by: Claudio Fontana 


> +static int cpu_unwind_data_from_tb(TranslationBlock *tb, uintptr_t host_pc,
> +   uint64_t *data)
>  {
> -uint64_t data[TARGET_INSN_START_WORDS];
> -uintptr_t host_pc = (uintptr_t)tb->tc.ptr;
> +uintptr_t iter_pc = (uintptr_t)tb->tc.ptr;
>  const uint8_t *p = tb->tc.ptr + tb->tc.size;
>  int i, j, num_insns = tb->icount;
> -#ifdef CONFIG_PROFILER
> -TCGProfile *prof = &tcg_ctx->prof;
> -int64_t ti = profile_getclock();
> -#endif
>  
> -searched_pc -= GETPC_ADJ;
> +host_pc -= GETPC_ADJ;
>  
> -if (searched_pc < host_pc) {
> +if (host_pc < iter_pc) {
>  return -1;
>  }
>  
> -memset(data, 0, sizeof(data));
> +memset(data, 0, sizeof(uint64_t) * TARGET_INSN_START_WORDS);
>  if (!TARGET_TB_PCREL) {
>  data[0] = tb_p

Re: [PATCH] target/arm: Fixed Privileged Access Never (PAN) for aarch32

2022-10-27 Thread Timofey Kutergin
Understood, thank you a lot :)

Best regards
Timofey


On Thu, Oct 27, 2022 at 12:35 PM Peter Maydell 
wrote:

> On Thu, 27 Oct 2022 at 10:22, Timofey Kutergin 
> wrote:
> > > V8 always implies V7, so we only need to check V7 here.
>
> > From silicon perspective - yes, but as I see in qemu,
> > ARM_FEATURE_V7 and ARM_FEATURE_V8 are independent bits which do not
> affect each
> > other in arm_feature() and set_feature() so they should be tested
> separately.
> > Did I miss something?
>
> In arm_cpu_realizefn() there is code which sets feature flags
> that are always implied by other feature flags. There we set
> the V7VE flag if V8 is set, and the V7 flag if V7VE is set.
> So we can rely on any v8 CPU having the V7 feature flag set.
>
> thanks
> -- PMM
>


Re: [PATCH v2 0/2] linux-user: handle /proc/self/exe with execve() syscall

2022-10-27 Thread Michael Tokarev

27.10.2022 09:40, Laurent Vivier wrote:
..
I tried O_CLOEXEC, but it seems the fd is closed before it is needed by execveat() to re-spawn the process, so it exits with an error (something like EBADF)


It works here for me with a simple test program:

#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <stdio.h>
#define AT_EMPTY_PATH   0x1000

static char *argv[] = { "ls", NULL };
static char *envp[] = { NULL };

int main(void) {
  int fd = open("/usr/bin/ls", O_RDONLY);
  fcntl(fd, F_SETFD, O_CLOEXEC);
  //execveat(fd, "", argv, envp, AT_EMPTY_PATH);
  syscall(__NR_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
  return 0;
}


/mjt



Re: [PATCH v2 3/6] target/openrisc: Always exit after mtspr npc

2022-10-27 Thread Philippe Mathieu-Daudé

On 27/10/22 12:02, Richard Henderson wrote:

We have called cpu_restore_state asserting will_exit.
Do not go back on that promise.  This affects icount.

Signed-off-by: Richard Henderson 
---
  target/openrisc/sys_helper.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 3/4] meson: enforce a minimum Linux kernel headers version >= 4.18

2022-10-27 Thread Daniel P . Berrangé
On Tue, Oct 04, 2022 at 10:32:05AM +0100, Daniel P. Berrangé wrote:
> Various areas of QEMU have a dependency on Linux kernel header
> definitions. This falls under the scope of our supported platforms
> matrix, but historically we've not checked for a minimum kernel
> headers version. This has made it unclear when we can drop support
> for older kernel headers.
> 
>   * Alpine 3.14: 5.10
>   * CentOS 8: 4.18
>   * CentOS 9: 5.14
>   * Debian 10: 4.19
>   * Debian 11: 5.10
>   * Fedora 35: 5.19
>   * Fedora 36: 5.19
>   * OpenSUSE 15.3: 5.3.0
>   * Ubuntu 20.04: 5.4
>   * Ubuntu 22.04: 5.15
> 
> The above ignores the 3rd version digit since distros update their
> packages periodically and such updates don't generally affect public
> APIs to the extent that it matters for our build time check.
> 
> Overall, we can set the baseline to 4.18 currently.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  meson.build | 12 
>  1 file changed, 12 insertions(+)

Since there's no agreement, I'll just consider this patch discarded,
along with the next one. I won't repost since Laurent has already
queued the first two patches.

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|




Re: [PATCH] net: improve error message for missing netdev backend

2022-10-27 Thread Daniel P . Berrangé
ping: Jason, are you willing to queue this since it has two
positive reviews.

On Mon, Oct 03, 2022 at 11:06:12AM +0100, Daniel P. Berrangé wrote:
> The current message when using '-net user...' with SLIRP disabled at
> compile time is:
> 
>   qemu-system-x86_64: -net user: Parameter 'type' expects a net backend type 
> (maybe it is not compiled into this binary)
> 
> An observation is that we're using the 'netdev->type' field here which
> is an enum value, produced after QAPI has converted from its string
> form.
> 
> IOW, at this point in the code, we know that the user's specified
> type name was a valid network backend. The only possible scenario that
> can make the backend init function be NULL, is if support for that
> backend was disabled at build time. Given this, we don't need to caveat
> our error message with a 'maybe' hint, we can be totally explicit.
> 
> The use of QERR_INVALID_PARAMETER_VALUE doesn't really lend itself to
> user friendly error message text. Since this is not used to set a
> specific QAPI error class, we can simply stop using this pre-formatted
> error text and provide something better.
> 
> Thus the new message is:
> 
>   qemu-system-x86_64: -net user: network backend 'user' is not compiled into 
> this binary
> 
> The case of passing 'hubport' for -net is also given a message reminding
> people they should have used -netdev/-nic instead, as this backend type
> is only valid for the modern syntax.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
> 
> NB, this does not make any difference to people who were relying on the
> QEMU built-in default hub that was created if you don't list any -net /
> -netdev / -nic argument, only those using explicit args.
> 
>  net/net.c | 18 +++---
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/net/net.c b/net/net.c
> index 2db160e063..8ddafacf13 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -1036,19 +1036,23 @@ static int net_client_init1(const Netdev *netdev, bool is_netdev, Error **errp)
>  if (is_netdev) {
>  if (netdev->type == NET_CLIENT_DRIVER_NIC ||
>  !net_client_init_fun[netdev->type]) {
> -error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "type",
> -   "a netdev backend type");
> +error_setg(errp, "network backend '%s' is not compiled into this binary",
> +   NetClientDriver_str(netdev->type));
>  return -1;
>  }
>  } else {
>  if (netdev->type == NET_CLIENT_DRIVER_NONE) {
>  return 0; /* nothing to do */
>  }
> -if (netdev->type == NET_CLIENT_DRIVER_HUBPORT ||
> -!net_client_init_fun[netdev->type]) {
> -error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "type",
> -   "a net backend type (maybe it is not compiled "
> -   "into this binary)");
> +if (netdev->type == NET_CLIENT_DRIVER_HUBPORT) {
> +error_setg(errp, "network backend '%s' is only supported with -netdev/-nic",
> +   NetClientDriver_str(netdev->type));
> +return -1;
> +}
> +
> +if (!net_client_init_fun[netdev->type]) {
> +error_setg(errp, "network backend '%s' is not compiled into this binary",
> +   NetClientDriver_str(netdev->type));
>  return -1;
>  }
>  
> -- 
> 2.37.3
> 

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|




Re: [PATCH 6/9] target/s390x: Copy wout_x1 to wout_x1_P

2022-10-27 Thread Ilya Leoshkevich
On Fri, Oct 21, 2022 at 05:30:03PM +1000, Richard Henderson wrote:
> Make a copy of wout_x1 before modifying it, as wout_x1_P
> emphasizing that it operates on the out/out2 pair.  The insns
> that use x1_P are data movement that will not change to Int128.
> 
> Signed-off-by: Richard Henderson 
> ---
>  target/s390x/tcg/translate.c   |  8 
>  target/s390x/tcg/insn-data.def | 12 ++--
>  2 files changed, 14 insertions(+), 6 deletions(-)

Acked-by: Ilya Leoshkevich 



Re: [PATCH v3 0/24] Convert nanoMIPS disassembler from C++ to C

2022-10-27 Thread Philippe Mathieu-Daudé

On 27/10/22 09:25, Thomas Huth wrote:

On 12/09/2022 14.26, Milica Lazarevic wrote:

Hi,

This patchset converts the nanomips disassembler to plain C. C++ features
like class, std::string type, exception handling, and function overloading
have been removed and replaced with the equivalent C code.


  Hi Philippe, hi Stefan,

as far as I can see, this patch set has been completely reviewed, and 
IMHO it would be nice to get this into QEMU 7.2 to finally get rid of 
the C++ dependency in the QEMU code ... could one of you pick this up 
and send a pull request with the patches? Or is there still anything 
left to do here?


Sorry I lost track of this series. I'm preparing a pull request and will
look at it later today. Thanks for the reminder!

Phil.



Re: [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices

2022-10-27 Thread Jonathan Cameron via
On Wed, 26 Oct 2022 16:47:18 -0400
Gregory Price  wrote:

> On Wed, Oct 26, 2022 at 08:13:24PM +, Adam Manzanares wrote:
> > On Tue, Oct 25, 2022 at 08:47:33PM -0400, Gregory Price wrote:  
> > > Submitted as an extention to the multi-feature branch maintained
> > > by Jonathan Cameron at:
> > > https://gitlab.com/jic23/qemu/-/tree/cxl-2022-10-24
> > >
> > > 
> > > 
> > > Summary of Changes:
> > > 1) E820 CFMW Bug fix.  
> > > 2) Add CXL_CAPACITY_MULTIPLIER definition to replace magic numbers
> > > 3) Multi-Region and Volatile Memory support for CXL Type-3 Devices
> > > 4) CXL Type-3 SRAT Generation when NUMA node is attached to memdev

+CC Dan for a question on status of Generic Ports ACPI code first ECN.
Given that was done on the mailing list I can't find any public tracking
of whether it was accepted or not - hence whether we can get on with
implementation.  There hasn't been a released ACPI spec since before
that was proposed, so we need that confirmation of the code first proposal
being accepted to get things moving.

> > > 
> > > 
> > > Regarding the E820 fix
> > >   * This bugfix is required for memory regions to work on x86
> > >   * input from Dan Williams and others suggest that E820 entry for
> > > the CFMW should not exist, as it is expected to be dynamically
> > > assigned at runtime.  If this entry exists, it instead blocks
> > > region creation by nature of the memory region being marked as
> > > reserved.  
> > 
> > For CXL 2.0 it is my understanding that volatile capacity present at boot 
> > will
> > be advertised by the firmware. In the absence of EFI I would assume this 
> > would
> > be provided in the e820 map.   

Whilst this is one option, it is certainly not the case that all CXL 2.0
platforms will decide to do any setup of CXL memory (volatile or not) in the
firmware.  They can leave it entirely to the OS, so a cold plug flow.
Early platforms will do the setup in BIOS to support unaware OSes; once
that's not a problem any more the only reason you'd want to do this is if
you don't have other RAM to boot the system, or for some reason you want
your host kernel etc in the CXL attached memory.

I'd expect to see BIOS having OS managed configuration as an option in the
intermediate period where some OSes are aware, others not.
OS knows more about usecase / policy so can make better choices on interleaving
etc of volatile CXL type 3 memory (let alone the fun corner of devices
where you can dynamically change the mix of volatile and non volatile
memory).


> 
> The issue in this case is very explicitly that a double-mapping occurs
> for the same region.  An E820 mapping for RESERVED is set *and* the
> region driver allocates a CXL CFMW mapping.  As a result the region
> drive straight up fails to allocate regions.
> 
> So in either case - at least from my view - the entry added as RESERVED
> is just wrong.
> 
> This is separate from type-3 device SRAT entries and default mappings
> for volatile regions.  For this situation, if you explicitly assign the
> memdev backing a type-3 device to a numa node, then an SRAT area is
> generated and an explicit e820 entry is generated and marked usable -
> though I think there are likely issues with this kind of
> double-referencing.

SRAT setup for CXL type 3 devices is to my mind is a job for a full BIOS,
not QEMU level of faking a few things. That BIOS should also
be doing the full configuration (HDM Decoders and all the rest).  ARM has
a prototype for one of the fixed virtual platforms that does this (talk
at Plumbers Uconf), we should look to do the same if we want to test
those flows using QEMU via appropriate changes in EDK2 to walk topology
and configure everything.  Until the devices are configured there is no
way to configure the SLIT, HMAT entries that align with the SRAT ones
(in theory those can be updated at runtime via _SLI, _HMA but in 
practice, I'm fairly sure Linux doesn't support doing that.)


> 
> > 
> > Is the region driver meant to cover volatile capacity present at boot? I was
> > under the impression that it would be used for hot added volatile memory. It
> > would be good to cover all of these assumptions for the e820 fix.  
> 
> This region appears to cover hotplug memory behind the CFMW.  The
> problem is that this e820 RESERVED mapping blocks the CFMW region from
> being used at all.
> 
> Without this, you can't use a type-3 persistent region, even with
> support, let alone a volatile region.  In attempting to use a persistent
> region as volatile via ndctl and friends, I'm seeing further issues (it
> cannot be assigned to a numa node successfully), but that's a separate
> issue.
For the Numa node bit...

So far, the CDAT table isn't used in Linux (read out for debug purposes
is supported only).  That all needs to be added yet to get the NUMA node
alloca

Re: [PATCH 7/9] tests/tcg/s390x: Add long-double.c

2022-10-27 Thread Ilya Leoshkevich
On Fri, Oct 21, 2022 at 05:30:04PM +1000, Richard Henderson wrote:
> Signed-off-by: Richard Henderson 
> ---
>  tests/tcg/s390x/long-double.c   | 24 
>  tests/tcg/s390x/Makefile.target |  1 +
>  2 files changed, 25 insertions(+)
>  create mode 100644 tests/tcg/s390x/long-double.c

It might be better to do this in asm in order to be sure that a
compiler doesn't perform any magic. But at least as of today gcc
generates all the "interesting" instructions from this code.

Acked-by: Ilya Leoshkevich 



[PATCH v2 2/7] accel/tcg: Use interval tree for TBs in user-only mode

2022-10-27 Thread Richard Henderson
Begin weaning user-only away from PageDesc.

Since, for user-only, all TB (and page) manipulation is done with
a single mutex, and there is no virtual/physical discontinuity to
split a TB across discontinuous pages, place all of the TBs into
a single IntervalTree. This makes it trivial to find all of the
TBs intersecting a range.

Retain the existing PageDesc + linked list implementation for
system mode.  Move the portion of the implementation that overlaps
the new user-only code behind the common ifdef.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 accel/tcg/internal.h  |  16 +-
 include/exec/exec-all.h   |  43 -
 accel/tcg/tb-maint.c  | 388 ++
 accel/tcg/translate-all.c |   4 +-
 4 files changed, 280 insertions(+), 171 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 1227bb69bd..1bd5a02911 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -24,14 +24,13 @@
 #endif
 
 typedef struct PageDesc {
-/* list of TBs intersecting this ram page */
-uintptr_t first_tb;
 #ifdef CONFIG_USER_ONLY
 unsigned long flags;
 void *target_data;
-#endif
-#ifdef CONFIG_SOFTMMU
+#else
 QemuSpin lock;
+/* list of TBs intersecting this ram page */
+uintptr_t first_tb;
 #endif
 } PageDesc;
 
@@ -69,9 +68,6 @@ static inline PageDesc *page_find(tb_page_addr_t index)
  tb; tb = (TranslationBlock *)tb->field[n], n = (uintptr_t)tb & 1, \
  tb = (TranslationBlock *)((uintptr_t)tb & ~1))
 
-#define PAGE_FOR_EACH_TB(pagedesc, tb, n)   \
-TB_FOR_EACH_TAGGED((pagedesc)->first_tb, tb, n, page_next)
-
 #define TB_FOR_EACH_JMP(head_tb, tb, n) \
 TB_FOR_EACH_TAGGED((head_tb)->jmp_list_head, tb, n, jmp_list_next)
 
@@ -89,6 +85,12 @@ void do_assert_page_locked(const PageDesc *pd, const char *file, int line);
 #endif
 void page_lock(PageDesc *pd);
 void page_unlock(PageDesc *pd);
+
+/* TODO: For now, still shared with translate-all.c for system mode. */
+typedef int PageForEachNext;
+#define PAGE_FOR_EACH_TB(start, end, pagedesc, tb, n) \
+TB_FOR_EACH_TAGGED((pagedesc)->first_tb, tb, n, page_next)
+
 #endif
 #if !defined(CONFIG_USER_ONLY) && defined(CONFIG_DEBUG_TCG)
 void assert_no_pages_locked(void);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index e948992a80..11fd635fdc 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -24,6 +24,7 @@
 #ifdef CONFIG_TCG
 #include "exec/cpu_ldst.h"
 #endif
+#include "qemu/interval-tree.h"
 
/* allow to see translation results - the slowdown should be negligible, so we leave it */
 #define DEBUG_DISAS
@@ -549,11 +550,20 @@ struct TranslationBlock {
 
 struct tb_tc tc;
 
-/* first and second physical page containing code. The lower bit
-   of the pointer tells the index in page_next[].
-   The list is protected by the TB's page('s) lock(s) */
+/*
+ * Track tb_page_addr_t intervals that intersect this TB.
+ * For user-only, the virtual addresses are always contiguous,
+ * and we use a unified interval tree.  For system, we use a
+ * linked list headed in each PageDesc.  Within the list, the lsb
+ * of the previous pointer tells the index of page_next[], and the
+ * list is protected by the PageDesc lock(s).
+ */
+#ifdef CONFIG_USER_ONLY
+IntervalTreeNode itree;
+#else
 uintptr_t page_next[2];
 tb_page_addr_t page_addr[2];
+#endif
 
/* jmp_lock placed here to fill a 4-byte hole. Its documentation is below */
 QemuSpin jmp_lock;
@@ -609,24 +619,51 @@ static inline uint32_t tb_cflags(const TranslationBlock *tb)
 
 static inline tb_page_addr_t tb_page_addr0(const TranslationBlock *tb)
 {
+#ifdef CONFIG_USER_ONLY
+return tb->itree.start;
+#else
 return tb->page_addr[0];
+#endif
 }
 
 static inline tb_page_addr_t tb_page_addr1(const TranslationBlock *tb)
 {
+#ifdef CONFIG_USER_ONLY
+tb_page_addr_t next = tb->itree.last & TARGET_PAGE_MASK;
+return next == (tb->itree.start & TARGET_PAGE_MASK) ? -1 : next;
+#else
 return tb->page_addr[1];
+#endif
 }
 
 static inline void tb_set_page_addr0(TranslationBlock *tb,
  tb_page_addr_t addr)
 {
+#ifdef CONFIG_USER_ONLY
+tb->itree.start = addr;
+/*
+ * To begin, we record an interval of one byte.  When the translation
+ * loop encounters a second page, the interval will be extended to
+ * include the first byte of the second page, which is sufficient to
+ * allow tb_page_addr1() above to work properly.  The final corrected
+ * interval will be set by tb_page_add() from tb->size before the
+ * node is added to the interval tree.
+ */
+tb->itree.last = addr;
+#else
 tb->page_addr[0] = addr;
+#endif
 }
 
 static inline void tb_set_page_addr1(TranslationBlock *tb,
  tb_page_addr_t addr)
 {
+#ifdef CONFIG_USER_ONLY
+/* Ext

Re: [PATCH v10 3/9] s390x/cpu_topology: resetting the Topology-Change-Report

2022-10-27 Thread Pierre Morel




On 10/27/22 11:58, Cédric Le Goater wrote:

On 10/27/22 11:11, Pierre Morel wrote:



On 10/27/22 10:14, Thomas Huth wrote:

On 12/10/2022 18.21, Pierre Morel wrote:

During a subsystem reset the Topology-Change-Report is cleared
by the machine.
Let's ask KVM to clear the Modified Topology Change Report (MTCR)
bit of the SCA in the case of a subsystem reset.

Signed-off-by: Pierre Morel 
Reviewed-by: Nico Boehr 
Reviewed-by: Janis Schoetterl-Glausch 
---
  target/s390x/cpu.h   |  1 +
  target/s390x/kvm/kvm_s390x.h |  1 +
  hw/s390x/cpu-topology.c  | 12 
  hw/s390x/s390-virtio-ccw.c   |  1 +
  target/s390x/cpu-sysemu.c    |  7 +++
  target/s390x/kvm/kvm.c   | 23 +++
  6 files changed, 45 insertions(+)

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index d604aa9c78..9b35795ac8 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -825,6 +825,7 @@ void s390_enable_css_support(S390CPU *cpu);
  void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data arg);
  int s390_assign_subch_ioeventfd(EventNotifier *notifier, uint32_t sch_id,
  int vq, bool assign);
+void s390_cpu_topology_reset(void);
  #ifndef CONFIG_USER_ONLY
  unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
  #else
diff --git a/target/s390x/kvm/kvm_s390x.h b/target/s390x/kvm/kvm_s390x.h
index aaae8570de..a13c8fb9a3 100644
--- a/target/s390x/kvm/kvm_s390x.h
+++ b/target/s390x/kvm/kvm_s390x.h
@@ -46,5 +46,6 @@ void kvm_s390_crypto_reset(void);
  void kvm_s390_restart_interrupt(S390CPU *cpu);
  void kvm_s390_stop_interrupt(S390CPU *cpu);
  void kvm_s390_set_diag318(CPUState *cs, uint64_t diag318_info);
+int kvm_s390_topology_set_mtcr(uint64_t attr);
  #endif /* KVM_S390X_H */
diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index c73cebfe6f..9f202621d0 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -107,6 +107,17 @@ static void s390_topology_realize(DeviceState *dev, Error **errp)

  qemu_mutex_init(&topo->topo_mutex);
  }
+/**
+ * s390_topology_reset:
+ * @dev: the device
+ *
+ * Calls the sysemu topology reset
+ */
+static void s390_topology_reset(DeviceState *dev)
+{
+    s390_cpu_topology_reset();
+}
+
  /**
   * topology_class_init:
   * @oc: Object class
@@ -120,6 +131,7 @@ static void topology_class_init(ObjectClass *oc, void *data)

  dc->realize = s390_topology_realize;
  set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    dc->reset = s390_topology_reset;
  }
  static const TypeInfo cpu_topology_info = {
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index aa99a62e42..362378454a 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -113,6 +113,7 @@ static const char *const reset_dev_types[] = {
  "s390-flic",
  "diag288",
  TYPE_S390_PCI_HOST_BRIDGE,
+    TYPE_S390_CPU_TOPOLOGY,
  };
  static void subsystem_reset(void)
diff --git a/target/s390x/cpu-sysemu.c b/target/s390x/cpu-sysemu.c
index 948e4bd3e0..707c0b658c 100644
--- a/target/s390x/cpu-sysemu.c
+++ b/target/s390x/cpu-sysemu.c
@@ -306,3 +306,10 @@ void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data arg)
  kvm_s390_set_diag318(cs, arg.host_ulong);
  }
  }
+
+void s390_cpu_topology_reset(void)
+{
+    if (kvm_enabled()) {
+    kvm_s390_topology_set_mtcr(0);
+    }
+}
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index f96630440b..9c994d27d5 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2585,3 +2585,26 @@ int kvm_s390_get_zpci_op(void)
  {
  return cap_zpci_op;
  }
+
+int kvm_s390_topology_set_mtcr(uint64_t attr)
+{
+    struct kvm_device_attr attribute = {
+    .group = KVM_S390_VM_CPU_TOPOLOGY,
+    .attr  = attr,
+    };
+    int ret;
+
+    if (!s390_has_feat(S390_FEAT_CONFIGURATION_TOPOLOGY)) {
+    return -EFAULT;


EFAULT is something that indicates a bad address (e.g. a segmentation
fault) ... so this definitely sounds like a bad choice for an error code here.


Hum, yes, ENODEV seems better, no?


-ENOTSUP would be 'better' maybe?  :)


yes better :)

thanks,
Pierre

--
Pierre Morel
IBM Lab Boeblingen



[PATCH v2 3/7] accel/tcg: Use interval tree for TARGET_PAGE_DATA_SIZE

2022-10-27 Thread Richard Henderson
Continue weaning user-only away from PageDesc.

Use an interval tree to record target data.
Chunk the data, to minimize allocation overhead.

Signed-off-by: Richard Henderson 
---
 accel/tcg/internal.h  |  1 -
 accel/tcg/user-exec.c | 99 ---
 2 files changed, 74 insertions(+), 26 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 1bd5a02911..8731dc52e2 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -26,7 +26,6 @@
 typedef struct PageDesc {
 #ifdef CONFIG_USER_ONLY
 unsigned long flags;
-void *target_data;
 #else
 QemuSpin lock;
 /* list of TBs intersecting this ram page */
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index fb7d6ee9e9..42a04bdb21 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -210,47 +210,96 @@ tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
 return addr;
 }
 
+#ifdef TARGET_PAGE_DATA_SIZE
+/*
+ * Allocate chunks of target data together.  For the only current user,
+ * if we allocate one hunk per page, we have overhead of 40/128 or 40%.
+ * Therefore, allocate memory for 64 pages at a time for overhead < 1%.
+ */
+#define TPD_PAGES  64
+#define TBD_MASK   (TARGET_PAGE_MASK * TPD_PAGES)
+
+typedef struct TargetPageDataNode {
+IntervalTreeNode itree;
+char data[TPD_PAGES][TARGET_PAGE_DATA_SIZE] __attribute__((aligned));
+} TargetPageDataNode;
+
+static IntervalTreeRoot targetdata_root;
+
 void page_reset_target_data(target_ulong start, target_ulong end)
 {
-#ifdef TARGET_PAGE_DATA_SIZE
-target_ulong addr, len;
+IntervalTreeNode *n, *next;
+target_ulong last;
 
-/*
- * This function should never be called with addresses outside the
- * guest address space.  If this assert fires, it probably indicates
- * a missing call to h2g_valid.
- */
-assert(end - 1 <= GUEST_ADDR_MAX);
-assert(start < end);
 assert_memory_lock();
 
 start = start & TARGET_PAGE_MASK;
-end = TARGET_PAGE_ALIGN(end);
+last = TARGET_PAGE_ALIGN(end) - 1;
 
-for (addr = start, len = end - start;
- len != 0;
- len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
-PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, 1);
+for (n = interval_tree_iter_first(&targetdata_root, start, last),
+ next = n ? interval_tree_iter_next(n, start, last) : NULL;
+ n != NULL;
+ n = next,
+ next = next ? interval_tree_iter_next(n, start, last) : NULL) {
+target_ulong n_start, n_last, p_ofs, p_len;
+TargetPageDataNode *t;
 
-g_free(p->target_data);
-p->target_data = NULL;
+if (n->start >= start && n->last <= last) {
+interval_tree_remove(n, &targetdata_root);
+g_free(n);
+continue;
+}
+
+if (n->start < start) {
+n_start = start;
+p_ofs = (start - n->start) >> TARGET_PAGE_BITS;
+} else {
+n_start = n->start;
+p_ofs = 0;
+}
+n_last = MIN(last, n->last);
+p_len = (n_last + 1 - n_start) >> TARGET_PAGE_BITS;
+
+t = container_of(n, TargetPageDataNode, itree);
+memset(t->data[p_ofs], 0, p_len * TARGET_PAGE_DATA_SIZE);
 }
-#endif
 }
 
-#ifdef TARGET_PAGE_DATA_SIZE
 void *page_get_target_data(target_ulong address)
 {
-PageDesc *p = page_find(address >> TARGET_PAGE_BITS);
-void *ret = p->target_data;
+IntervalTreeNode *n;
+TargetPageDataNode *t;
+target_ulong page, region;
 
-if (!ret) {
-ret = g_malloc0(TARGET_PAGE_DATA_SIZE);
-p->target_data = ret;
+page = address & TARGET_PAGE_MASK;
+region = address & TBD_MASK;
+
+n = interval_tree_iter_first(&targetdata_root, page, page);
+if (!n) {
+/*
+ * See util/interval-tree.c re lockless lookups: no false positives
+ * but there are false negatives.  If we find nothing, retry with
+ * the mmap lock acquired.  We also need the lock for the
+ * allocation + insert.
+ */
+mmap_lock();
+n = interval_tree_iter_first(&targetdata_root, page, page);
+if (!n) {
+t = g_new0(TargetPageDataNode, 1);
+n = &t->itree;
+n->start = region;
+n->last = region | ~TBD_MASK;
+interval_tree_insert(n, &targetdata_root);
+}
+mmap_unlock();
 }
-return ret;
+
+t = container_of(n, TargetPageDataNode, itree);
+return t->data[(page - region) >> TARGET_PAGE_BITS];
 }
-#endif
+#else
+void page_reset_target_data(target_ulong start, target_ulong end) { }
+#endif /* TARGET_PAGE_DATA_SIZE */
 
 /* The softmmu versions of these helpers are in cputlb.c.  */
 
-- 
2.34.1




Re: [PATCH v10 2/9] s390x/cpu topology: reporting the CPU topology to the guest

2022-10-27 Thread Pierre Morel




On 10/27/22 10:12, Thomas Huth wrote:

On 12/10/2022 18.21, Pierre Morel wrote:

The guest can use the STSI instruction to get a buffer filled
with the CPU topology description.

Let us implement the STSI instruction for the basic CPU topology
level, level 2.

Signed-off-by: Pierre Morel 
---
  include/hw/s390x/cpu-topology.h |   3 +
  target/s390x/cpu.h  |  48 ++
  hw/s390x/cpu-topology.c |   8 ++-
  target/s390x/cpu_topology.c | 109 
  target/s390x/kvm/kvm.c  |   6 +-
  target/s390x/meson.build    |   1 +
  6 files changed, 172 insertions(+), 3 deletions(-)
  create mode 100644 target/s390x/cpu_topology.c

diff --git a/include/hw/s390x/cpu-topology.h b/include/hw/s390x/cpu-topology.h
index 66c171d0bc..61c11db017 100644
--- a/include/hw/s390x/cpu-topology.h
+++ b/include/hw/s390x/cpu-topology.h
@@ -13,6 +13,8 @@
  #include "hw/qdev-core.h"
  #include "qom/object.h"
+#define S390_TOPOLOGY_POLARITY_H  0x00
+
  typedef struct S390TopoContainer {
  int active_count;
  } S390TopoContainer;
@@ -29,6 +31,7 @@ struct S390Topology {
  S390TopoContainer *socket;
  S390TopoTLE *tle;
  MachineState *ms;
+    QemuMutex topo_mutex;
  };
  #define TYPE_S390_CPU_TOPOLOGY "s390-topology"
diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index 7d6d01325b..d604aa9c78 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -565,6 +565,52 @@ typedef union SysIB {
  } SysIB;
  QEMU_BUILD_BUG_ON(sizeof(SysIB) != 4096);
+/* CPU type Topology List Entry */
+typedef struct SysIBTl_cpu {
+    uint8_t nl;
+    uint8_t reserved0[3];
+    uint8_t reserved1:5;
+    uint8_t dedicated:1;
+    uint8_t polarity:2;
+    uint8_t type;
+    uint16_t origin;
+    uint64_t mask;
+} QEMU_PACKED QEMU_ALIGNED(8) SysIBTl_cpu;
+QEMU_BUILD_BUG_ON(sizeof(SysIBTl_cpu) != 16);
+
+/* Container type Topology List Entry */
+typedef struct SysIBTl_container {
+    uint8_t nl;
+    uint8_t reserved[6];
+    uint8_t id;
+} QEMU_PACKED QEMU_ALIGNED(8) SysIBTl_container;
+QEMU_BUILD_BUG_ON(sizeof(SysIBTl_container) != 8);
+
+#define TOPOLOGY_NR_MAG  6
+#define TOPOLOGY_NR_MAG6 0
+#define TOPOLOGY_NR_MAG5 1
+#define TOPOLOGY_NR_MAG4 2
+#define TOPOLOGY_NR_MAG3 3
+#define TOPOLOGY_NR_MAG2 4
+#define TOPOLOGY_NR_MAG1 5
+/* Configuration topology */
+typedef struct SysIB_151x {
+    uint8_t  reserved0[2];
+    uint16_t length;
+    uint8_t  mag[TOPOLOGY_NR_MAG];
+    uint8_t  reserved1;
+    uint8_t  mnest;
+    uint32_t reserved2;
+    char tle[0];
+} QEMU_PACKED QEMU_ALIGNED(8) SysIB_151x;
+QEMU_BUILD_BUG_ON(sizeof(SysIB_151x) != 16);
+
+/* Max size of a SYSIB structure is when all CPUs are alone in a container */
+#define S390_TOPOLOGY_SYSIB_SIZE (sizeof(SysIB_151x) + \
+                                  S390_MAX_CPUS * (sizeof(SysIBTl_container) + \
+                                                   sizeof(SysIBTl_cpu)))
+
+
  /* MMU defines */
  #define ASCE_ORIGIN   (~0xfffULL) /* segment table origin */
  #define ASCE_SUBSPACE 0x200   /* subspace group control   */

@@ -843,4 +889,6 @@ S390CPU *s390_cpu_addr2state(uint16_t cpu_addr);
  #include "exec/cpu-all.h"
+void insert_stsi_15_1_x(S390CPU *cpu, int sel2, __u64 addr, uint8_t ar);
+
  #endif
diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index 42b22a1831..c73cebfe6f 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -54,8 +54,6 @@ void s390_topology_new_cpu(int core_id)
  return;
  }
-    socket_id = core_id / topo->cpus;
-
  /*
   * At the core level, each CPU is represented by a bit in a 64bit
   * unsigned long which represent the presence of a CPU.
@@ -76,8 +74,13 @@ void s390_topology_new_cpu(int core_id)
  bit %= 64;
  bit = 63 - bit;
+    qemu_mutex_lock(&topo->topo_mutex);
+
+    socket_id = core_id / topo->cpus;
  topo->socket[socket_id].active_count++;
  set_bit(bit, &topo->tle[socket_id].mask[origin]);
+
+    qemu_mutex_unlock(&topo->topo_mutex);
  }
  /**
@@ -101,6 +104,7 @@ static void s390_topology_realize(DeviceState *dev, Error **errp)
  topo->tle = g_new0(S390TopoTLE, ms->smp.max_cpus);
  topo->ms = ms;
+    qemu_mutex_init(&topo->topo_mutex);
  }
  /**
diff --git a/target/s390x/cpu_topology.c b/target/s390x/cpu_topology.c
new file mode 100644
index 00..df86a98f23
--- /dev/null
+++ b/target/s390x/cpu_topology.c
@@ -0,0 +1,109 @@
+/*
+ * QEMU S390x CPU Topology
+ *
+ * Copyright IBM Corp. 2022
+ * Author(s): Pierre Morel 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "hw/s390x/pv.h"
+#include "hw/sysbus.h"
+#include "hw/s390x/cpu-topology.h"
+#include "hw/s390x/sclp.h"
+
+#define S390_TOPOLOGY_MAX_ST

Re: [PATCH 8/9] target/s390x: Use Int128 for returning float128

2022-10-27 Thread Ilya Leoshkevich
On Fri, Oct 21, 2022 at 05:30:05PM +1000, Richard Henderson wrote:
> Signed-off-by: Richard Henderson 
> ---
>  target/s390x/helper.h  | 22 +++
>  target/s390x/tcg/fpu_helper.c  | 29 ++--
>  target/s390x/tcg/translate.c   | 49 +++---
>  target/s390x/tcg/insn-data.def | 20 +++---
>  4 files changed, 63 insertions(+), 57 deletions(-)
> 

> @@ -2032,7 +2031,7 @@ static DisasJumpType op_cxlgb(DisasContext *s, DisasOps *o)
>  if (!m34) {
>  return DISAS_NORETURN;
>  }
> -gen_helper_cxlgb(o->out, cpu_env, o->in2, m34);
> +gen_helper_cxlgb(o->out_128, cpu_env, o->in2, m34);
>  tcg_temp_free_i32(m34);
>  return_low128(o->out2);
>  return DISAS_NEXT;

Do we still need return_low128() here?

>  static DisasJumpType op_lxeb(DisasContext *s, DisasOps *o)
>  {
> -gen_helper_lxeb(o->out, cpu_env, o->in2);
> +gen_helper_lxeb(o->out_128, cpu_env, o->in2);
>  return_low128(o->out2);
>  return DISAS_NEXT;
>  }

Same question.



[PATCH] target/arm: Fixed Privileged Access Never (PAN) for aarch32

2022-10-27 Thread Timofey Kutergin
- Use CPSR.PAN to check for the PAN state in aarch32 mode
- Throw a permission fault during address translation when PAN is
  enabled and the kernel tries to access a user-accessible page
- Ignore the SCTLR_XP bit for armv7 and armv8 (conflicts with SCTLR_SPAN).

Signed-off-by: Timofey Kutergin 
---
 target/arm/helper.c | 13 +++--
 target/arm/ptw.c| 35 ++-
 2 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index c672903f43..4301478ed8 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -10992,6 +10992,15 @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
 }
 #endif
 
+static bool arm_pan_enabled(CPUARMState *env)
+{
+if (is_a64(env)) {
+return env->pstate & PSTATE_PAN;
+} else {
+return env->uncached_cpsr & CPSR_PAN;
+}
+}
+
 ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
 {
 ARMMMUIdx idx;
@@ -11012,7 +11021,7 @@ ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
 }
 break;
 case 1:
-if (env->pstate & PSTATE_PAN) {
+if (arm_pan_enabled(env)) {
 idx = ARMMMUIdx_E10_1_PAN;
 } else {
 idx = ARMMMUIdx_E10_1;
@@ -11021,7 +11030,7 @@ ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
 case 2:
 /* Note that TGE does not apply at EL2.  */
 if (arm_hcr_el2_eff(env) & HCR_E2H) {
-if (env->pstate & PSTATE_PAN) {
+if (arm_pan_enabled(env)) {
 idx = ARMMMUIdx_E20_2_PAN;
 } else {
 idx = ARMMMUIdx_E20_2;
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 6c5ed56a10..a82accab40 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -433,12 +433,11 @@ static bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
  * @mmu_idx: MMU index indicating required translation regime
  * @ap:  The 3-bit access permissions (AP[2:0])
  * @domain_prot: The 2-bit domain access permissions
+ * @is_user: TRUE if accessing from PL0
  */
-static int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
- int ap, int domain_prot)
+static int ap_to_rw_prot_is_user(CPUARMState *env, ARMMMUIdx mmu_idx,
+ int ap, int domain_prot, bool is_user)
 {
-bool is_user = regime_is_user(env, mmu_idx);
-
 if (domain_prot == 3) {
 return PAGE_READ | PAGE_WRITE;
 }
@@ -482,6 +481,20 @@ static int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
 }
 }
 
+/*
+ * Translate section/page access permissions to page R/W protection flags
+ * @env: CPUARMState
+ * @mmu_idx: MMU index indicating required translation regime
+ * @ap:  The 3-bit access permissions (AP[2:0])
+ * @domain_prot: The 2-bit domain access permissions
+ */
+static int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
+ int ap, int domain_prot)
+{
+   return ap_to_rw_prot_is_user(env, mmu_idx, ap, domain_prot,
+regime_is_user(env, mmu_idx));
+}
+
 /*
  * Translate section/page access permissions to page R/W protection flags.
  * @ap:  The 2-bit simple AP (AP[2:1])
@@ -644,6 +657,7 @@ static bool get_phys_addr_v6(CPUARMState *env, S1Translate *ptw,
 hwaddr phys_addr;
 uint32_t dacr;
 bool ns;
+int user_prot;
 
 /* Pagetable walk.  */
 /* Lookup l1 descriptor.  */
@@ -749,8 +763,10 @@ static bool get_phys_addr_v6(CPUARMState *env, S1Translate *ptw,
 goto do_fault;
 }
 result->f.prot = simple_ap_to_rw_prot(env, mmu_idx, ap >> 1);
+user_prot = simple_ap_to_rw_prot_is_user(ap >> 1, 1);
 } else {
 result->f.prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
+user_prot = ap_to_rw_prot_is_user(env, mmu_idx, ap, domain_prot, 1);
 }
 if (result->f.prot && !xn) {
 result->f.prot |= PAGE_EXEC;
@@ -760,6 +776,14 @@ static bool get_phys_addr_v6(CPUARMState *env, S1Translate *ptw,
 fi->type = ARMFault_Permission;
 goto do_fault;
 }
+if (regime_is_pan(env, mmu_idx) &&
+!regime_is_user(env, mmu_idx) &&
+user_prot &&
+access_type != MMU_INST_FETCH) {
+/* Privileged Access Never fault */
+fi->type = ARMFault_Permission;
+goto do_fault;
+}
 }
 if (ns) {
 /* The NS bit will (as required by the architecture) have no effect if
@@ -2606,7 +2630,8 @@ static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
 if (regime_using_lpae_format(env, mmu_idx)) {
 return get_phys_addr_lpae(env, ptw, address, access_type, false,
   result, fi);
-} else if (regime_sctlr(env, mmu_idx) & SCTLR_XP) {
+} else if (arm_feature(env, ARM_FEATURE_V7) ||
+   regime_sctlr(env, mmu_idx) & SC

[PATCH v2 5/7] accel/tcg: Use interval tree for user-only page tracking

2022-10-27 Thread Richard Henderson
Finish weaning user-only away from PageDesc.

Using an interval tree to track page permissions means that
we can represent very large regions efficiently.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/290
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/967
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1214
Signed-off-by: Richard Henderson 
---
 accel/tcg/internal.h   |   4 +-
 accel/tcg/tb-maint.c   |  20 +-
 accel/tcg/user-exec.c  | 614 ++---
 tests/tcg/multiarch/test-vma.c |  22 ++
 4 files changed, 451 insertions(+), 209 deletions(-)
 create mode 100644 tests/tcg/multiarch/test-vma.c

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 250f0daac9..c7e157d1cd 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -24,9 +24,7 @@
 #endif
 
 typedef struct PageDesc {
-#ifdef CONFIG_USER_ONLY
-unsigned long flags;
-#else
+#ifndef CONFIG_USER_ONLY
 QemuSpin lock;
 /* list of TBs intersecting this ram page */
 uintptr_t first_tb;
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 14e8e47a6a..694440cb4a 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -68,15 +68,23 @@ static void page_flush_tb(void)
 /* Call with mmap_lock held. */
 static void tb_page_add(TranslationBlock *tb, PageDesc *p1, PageDesc *p2)
 {
-/* translator_loop() must have made all TB pages non-writable */
-assert(!(p1->flags & PAGE_WRITE));
-if (p2) {
-assert(!(p2->flags & PAGE_WRITE));
-}
+target_ulong addr;
+int flags;
 
 assert_memory_lock();
-
 tb->itree.last = tb->itree.start + tb->size - 1;
+
+/* translator_loop() must have made all TB pages non-writable */
+addr = tb_page_addr0(tb);
+flags = page_get_flags(addr);
+assert(!(flags & PAGE_WRITE));
+
+addr = tb_page_addr1(tb);
+if (addr != -1) {
+flags = page_get_flags(addr);
+assert(!(flags & PAGE_WRITE));
+}
+
 interval_tree_insert(&tb->itree, &tb_root);
 }
 
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index 22ef780900..d71404f49c 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -135,106 +135,61 @@ bool handle_sigsegv_accerr_write(CPUState *cpu, sigset_t *old_set,
 }
 }
 
-/*
- * Walks guest process memory "regions" one by one
- * and calls callback function 'fn' for each region.
- */
-struct walk_memory_regions_data {
-walk_memory_regions_fn fn;
-void *priv;
-target_ulong start;
-int prot;
-};
+typedef struct PageFlagsNode {
+IntervalTreeNode itree;
+int flags;
+} PageFlagsNode;
 
-static int walk_memory_regions_end(struct walk_memory_regions_data *data,
-   target_ulong end, int new_prot)
+static IntervalTreeRoot pageflags_root;
+
+static PageFlagsNode *pageflags_find(target_ulong start, target_long last)
 {
-if (data->start != -1u) {
-int rc = data->fn(data->priv, data->start, end, data->prot);
-if (rc != 0) {
-return rc;
-}
-}
+IntervalTreeNode *n;
 
-data->start = (new_prot ? end : -1u);
-data->prot = new_prot;
-
-return 0;
+n = interval_tree_iter_first(&pageflags_root, start, last);
+return n ? container_of(n, PageFlagsNode, itree) : NULL;
 }
 
-static int walk_memory_regions_1(struct walk_memory_regions_data *data,
- target_ulong base, int level, void **lp)
+static PageFlagsNode *pageflags_next(PageFlagsNode *p, target_ulong start,
+ target_long last)
 {
-target_ulong pa;
-int i, rc;
+IntervalTreeNode *n;
 
-if (*lp == NULL) {
-return walk_memory_regions_end(data, base, 0);
-}
-
-if (level == 0) {
-PageDesc *pd = *lp;
-
-for (i = 0; i < V_L2_SIZE; ++i) {
-int prot = pd[i].flags;
-
-pa = base | (i << TARGET_PAGE_BITS);
-if (prot != data->prot) {
-rc = walk_memory_regions_end(data, pa, prot);
-if (rc != 0) {
-return rc;
-}
-}
-}
-} else {
-void **pp = *lp;
-
-for (i = 0; i < V_L2_SIZE; ++i) {
-pa = base | ((target_ulong)i <<
-(TARGET_PAGE_BITS + V_L2_BITS * level));
-rc = walk_memory_regions_1(data, pa, level - 1, pp + i);
-if (rc != 0) {
-return rc;
-}
-}
-}
-
-return 0;
+n = interval_tree_iter_next(&p->itree, start, last);
+return n ? container_of(n, PageFlagsNode, itree) : NULL;
 }
 
 int walk_memory_regions(void *priv, walk_memory_regions_fn fn)
 {
-struct walk_memory_regions_data data;
-uintptr_t i, l1_sz = v_l1_size;
+IntervalTreeNode *n;
+int rc = 0;
 
-data.fn = fn;
-data.priv = priv;
-data.start = -1u;
-data.prot = 0;
+mmap_lock();
+for (n = interval_tree_iter_first(&pageflags_root, 0, -1);
+ 

Re: [PATCH v10 7/9] s390x/cpu topology: add max_threads machine class attribute

2022-10-27 Thread Pierre Morel




On 10/27/22 12:00, Cédric Le Goater wrote:

Hello Pierre,

On 10/12/22 18:21, Pierre Morel wrote:

The S390 CPU topology accepts the smp.threads argument while
in reality it does not effectively allow multithreading.

Let's keep this behavior for machines older than 7.3 and
refuse to use threads in newer machines until multithreading
is really proposed to the guest by the machine.


This change is unrelated to the rest of the series and we could merge it
for 7.2. We still have time for it.


OK, then I send it on its own

Regards,
Pierre

...

--
Pierre Morel
IBM Lab Boeblingen



[PATCH v2 7/7] accel/tcg: Move remainder of page locking to tb-maint.c

2022-10-27 Thread Richard Henderson
The only thing that still touches PageDesc in translate-all.c
are some locking routines related to tb-maint.c which have not
yet been moved.  Do so now.

Move some code up in tb-maint.c as well, to untangle the maze
of ifdefs, and allow a sensible final ordering.

Move some declarations from exec/translate-all.h to internal.h,
as they are only used within accel/tcg/.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 accel/tcg/internal.h |  68 ++-
 include/exec/translate-all.h |   6 -
 accel/tcg/tb-maint.c | 352 +--
 accel/tcg/translate-all.c| 301 --
 4 files changed, 352 insertions(+), 375 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index c6c9e02cfd..6ce1437c58 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -23,62 +23,28 @@
 #define assert_memory_lock() tcg_debug_assert(have_mmap_lock())
 #endif
 
-typedef struct PageDesc PageDesc;
-#ifndef CONFIG_USER_ONLY
-struct PageDesc {
-QemuSpin lock;
-/* list of TBs intersecting this ram page */
-uintptr_t first_tb;
-};
-
-PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc);
-
-static inline PageDesc *page_find(tb_page_addr_t index)
-{
-return page_find_alloc(index, false);
-}
-
-void page_table_config_init(void);
-#else
-static inline void page_table_config_init(void) { }
-#endif
-
-/* list iterators for lists of tagged pointers in TranslationBlock */
-#define TB_FOR_EACH_TAGGED(head, tb, n, field)  \
-for (n = (head) & 1, tb = (TranslationBlock *)((head) & ~1);\
- tb; tb = (TranslationBlock *)tb->field[n], n = (uintptr_t)tb & 1, \
- tb = (TranslationBlock *)((uintptr_t)tb & ~1))
-
-#define TB_FOR_EACH_JMP(head_tb, tb, n) \
-TB_FOR_EACH_TAGGED((head_tb)->jmp_list_head, tb, n, jmp_list_next)
-
-/* In user-mode page locks aren't used; mmap_lock is enough */
-#ifdef CONFIG_USER_ONLY
-#define assert_page_locked(pd) tcg_debug_assert(have_mmap_lock())
-static inline void page_lock(PageDesc *pd) { }
-static inline void page_unlock(PageDesc *pd) { }
-#else
-#ifdef CONFIG_DEBUG_TCG
-void do_assert_page_locked(const PageDesc *pd, const char *file, int line);
-#define assert_page_locked(pd) do_assert_page_locked(pd, __FILE__, __LINE__)
-#else
-#define assert_page_locked(pd)
-#endif
-void page_lock(PageDesc *pd);
-void page_unlock(PageDesc *pd);
-
-/* TODO: For now, still shared with translate-all.c for system mode. */
-typedef int PageForEachNext;
-#define PAGE_FOR_EACH_TB(start, end, pagedesc, tb, n) \
-TB_FOR_EACH_TAGGED((pagedesc)->first_tb, tb, n, page_next)
-
-#endif
-#if !defined(CONFIG_USER_ONLY) && defined(CONFIG_DEBUG_TCG)
+#if defined(CONFIG_SOFTMMU) && defined(CONFIG_DEBUG_TCG)
 void assert_no_pages_locked(void);
 #else
 static inline void assert_no_pages_locked(void) { }
 #endif
 
+#ifdef CONFIG_USER_ONLY
+static inline void page_table_config_init(void) { }
+#else
+void page_table_config_init(void);
+#endif
+
+#ifdef CONFIG_SOFTMMU
+struct page_collection;
+void tb_invalidate_phys_page_fast(struct page_collection *pages,
+  tb_page_addr_t start, int len,
+  uintptr_t retaddr);
+struct page_collection *page_collection_lock(tb_page_addr_t start,
+ tb_page_addr_t end);
+void page_collection_unlock(struct page_collection *set);
+#endif /* CONFIG_SOFTMMU */
+
 TranslationBlock *tb_gen_code(CPUState *cpu, target_ulong pc,
   target_ulong cs_base, uint32_t flags,
   int cflags);
diff --git a/include/exec/translate-all.h b/include/exec/translate-all.h
index 3e9cb91565..88602ae8d8 100644
--- a/include/exec/translate-all.h
+++ b/include/exec/translate-all.h
@@ -23,12 +23,6 @@
 
 
 /* translate-all.c */
-struct page_collection *page_collection_lock(tb_page_addr_t start,
- tb_page_addr_t end);
-void page_collection_unlock(struct page_collection *set);
-void tb_invalidate_phys_page_fast(struct page_collection *pages,
-  tb_page_addr_t start, int len,
-  uintptr_t retaddr);
 void tb_invalidate_phys_page(tb_page_addr_t addr);
 void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr);
 
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 31d0a74aa9..8fe2d322db 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -30,6 +30,15 @@
 #include "internal.h"
 
 
+/* List iterators for lists of tagged pointers in TranslationBlock. */
+#define TB_FOR_EACH_TAGGED(head, tb, n, field)  \
+for (n = (head) & 1, tb = (TranslationBlock *)((head) & ~1);\
+ tb; tb = (TranslationBlock *)tb->field[n], n = (uintptr_t)tb & 1, \
+ tb = (TranslationBlock *)((uintptr_t)tb & ~1))
+
+#define TB_FOR_EAC

[PATCH v2 4/7] accel/tcg: Move page_{get,set}_flags to user-exec.c

2022-10-27 Thread Richard Henderson
This page tracking implementation is specific to user-only,
since the system softmmu version is in cputlb.c.  Move it
out of translate-all.c to user-exec.c.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 accel/tcg/internal.h  |  17 ++
 accel/tcg/translate-all.c | 350 --
 accel/tcg/user-exec.c | 346 +
 3 files changed, 363 insertions(+), 350 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 8731dc52e2..250f0daac9 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -33,6 +33,23 @@ typedef struct PageDesc {
 #endif
 } PageDesc;
 
+/*
+ * In system mode we want L1_MAP to be based on ram offsets,
+ * while in user mode we want it to be based on virtual addresses.
+ *
+ * TODO: For user mode, see the caveat re host vs guest virtual
+ * address spaces near GUEST_ADDR_MAX.
+ */
+#if !defined(CONFIG_USER_ONLY)
+#if HOST_LONG_BITS < TARGET_PHYS_ADDR_SPACE_BITS
+# define L1_MAP_ADDR_SPACE_BITS  HOST_LONG_BITS
+#else
+# define L1_MAP_ADDR_SPACE_BITS  TARGET_PHYS_ADDR_SPACE_BITS
+#endif
+#else
+# define L1_MAP_ADDR_SPACE_BITS  MIN(HOST_LONG_BITS, TARGET_ABI_BITS)
+#endif
+
 /* Size of the L2 (and L3, etc) page tables.  */
 #define V_L2_BITS 10
 #define V_L2_SIZE (1 << V_L2_BITS)
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index dc7973eb3b..0f8f8e5bef 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -109,23 +109,6 @@ struct page_collection {
 struct page_entry *max;
 };
 
-/*
- * In system mode we want L1_MAP to be based on ram offsets,
- * while in user mode we want it to be based on virtual addresses.
- *
- * TODO: For user mode, see the caveat re host vs guest virtual
- * address spaces near GUEST_ADDR_MAX.
- */
-#if !defined(CONFIG_USER_ONLY)
-#if HOST_LONG_BITS < TARGET_PHYS_ADDR_SPACE_BITS
-# define L1_MAP_ADDR_SPACE_BITS  HOST_LONG_BITS
-#else
-# define L1_MAP_ADDR_SPACE_BITS  TARGET_PHYS_ADDR_SPACE_BITS
-#endif
-#else
-# define L1_MAP_ADDR_SPACE_BITS  MIN(HOST_LONG_BITS, TARGET_ABI_BITS)
-#endif
-
 /* Make sure all possible CPU event bits fit in tb->trace_vcpu_dstate */
 QEMU_BUILD_BUG_ON(CPU_TRACE_DSTATE_MAX_EVENTS >
   sizeof_field(TranslationBlock, trace_vcpu_dstate)
@@ -1222,339 +1205,6 @@ void cpu_interrupt(CPUState *cpu, int mask)
 qatomic_set(&cpu_neg(cpu)->icount_decr.u16.high, -1);
 }
 
-/*
- * Walks guest process memory "regions" one by one
- * and calls callback function 'fn' for each region.
- */
-struct walk_memory_regions_data {
-walk_memory_regions_fn fn;
-void *priv;
-target_ulong start;
-int prot;
-};
-
-static int walk_memory_regions_end(struct walk_memory_regions_data *data,
-   target_ulong end, int new_prot)
-{
-if (data->start != -1u) {
-int rc = data->fn(data->priv, data->start, end, data->prot);
-if (rc != 0) {
-return rc;
-}
-}
-
-data->start = (new_prot ? end : -1u);
-data->prot = new_prot;
-
-return 0;
-}
-
-static int walk_memory_regions_1(struct walk_memory_regions_data *data,
- target_ulong base, int level, void **lp)
-{
-target_ulong pa;
-int i, rc;
-
-if (*lp == NULL) {
-return walk_memory_regions_end(data, base, 0);
-}
-
-if (level == 0) {
-PageDesc *pd = *lp;
-
-for (i = 0; i < V_L2_SIZE; ++i) {
-int prot = pd[i].flags;
-
-pa = base | (i << TARGET_PAGE_BITS);
-if (prot != data->prot) {
-rc = walk_memory_regions_end(data, pa, prot);
-if (rc != 0) {
-return rc;
-}
-}
-}
-} else {
-void **pp = *lp;
-
-for (i = 0; i < V_L2_SIZE; ++i) {
-pa = base | ((target_ulong)i <<
-(TARGET_PAGE_BITS + V_L2_BITS * level));
-rc = walk_memory_regions_1(data, pa, level - 1, pp + i);
-if (rc != 0) {
-return rc;
-}
-}
-}
-
-return 0;
-}
-
-int walk_memory_regions(void *priv, walk_memory_regions_fn fn)
-{
-struct walk_memory_regions_data data;
-uintptr_t i, l1_sz = v_l1_size;
-
-data.fn = fn;
-data.priv = priv;
-data.start = -1u;
-data.prot = 0;
-
-for (i = 0; i < l1_sz; i++) {
-target_ulong base = i << (v_l1_shift + TARGET_PAGE_BITS);
-int rc = walk_memory_regions_1(&data, base, v_l2_levels, l1_map + i);
-if (rc != 0) {
-return rc;
-}
-}
-
-return walk_memory_regions_end(&data, 0, 0);
-}
-
-static int dump_region(void *priv, target_ulong start,
-target_ulong end, unsigned long prot)
-{
-FILE *f = (FILE *)priv;
-
-(void) fprintf(f, TARGET_FMT_lx"-"TARGET_FMT_lx
-" "TARGET_FMT_lx" %c%c%c\n",
-start, end, end - start,
-((prot & PAGE_READ) ? 'r' : '-'),
-((prot & PA

[PATCH v2 1/7] util: Add interval-tree.c

2022-10-27 Thread Richard Henderson
Copy and simplify the Linux kernel's interval_tree_generic.h,
instantiating for uint64_t.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 include/qemu/interval-tree.h|  99 
 tests/unit/test-interval-tree.c | 209 
 util/interval-tree.c| 882 
 tests/unit/meson.build  |   1 +
 util/meson.build|   1 +
 5 files changed, 1192 insertions(+)
 create mode 100644 include/qemu/interval-tree.h
 create mode 100644 tests/unit/test-interval-tree.c
 create mode 100644 util/interval-tree.c

diff --git a/include/qemu/interval-tree.h b/include/qemu/interval-tree.h
new file mode 100644
index 00..25006debe8
--- /dev/null
+++ b/include/qemu/interval-tree.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Interval trees.
+ *
+ * Derived from include/linux/interval_tree.h and its dependencies.
+ */
+
+#ifndef QEMU_INTERVAL_TREE_H
+#define QEMU_INTERVAL_TREE_H
+
+/*
+ * For now, don't expose Linux Red-Black Trees separately, but retain the
+ * separate type definitions to keep the implementation sane, and allow
+ * the possibility of disentangling them later.
+ */
+typedef struct RBNode
+{
+/* Encodes parent with color in the lsb. */
+uintptr_t rb_parent_color;
+struct RBNode *rb_right;
+struct RBNode *rb_left;
+} RBNode;
+
+typedef struct RBRoot
+{
+RBNode *rb_node;
+} RBRoot;
+
+typedef struct RBRootLeftCached {
+RBRoot rb_root;
+RBNode *rb_leftmost;
+} RBRootLeftCached;
+
+typedef struct IntervalTreeNode
+{
+RBNode rb;
+
+uint64_t start;/* Start of interval */
+uint64_t last; /* Last location _in_ interval */
+uint64_t subtree_last;
+} IntervalTreeNode;
+
+typedef RBRootLeftCached IntervalTreeRoot;
+
+/**
+ * interval_tree_is_empty
+ * @root: root of the tree.
+ *
+ * Returns true if the tree contains no nodes.
+ */
+static inline bool interval_tree_is_empty(const IntervalTreeRoot *root)
+{
+return root->rb_root.rb_node == NULL;
+}
+
+/**
+ * interval_tree_insert
+ * @node: node to insert,
+ * @root: root of the tree.
+ *
+ * Insert @node into @root, and rebalance.
+ */
+void interval_tree_insert(IntervalTreeNode *node, IntervalTreeRoot *root);
+
+/**
+ * interval_tree_remove
+ * @node: node to remove,
+ * @root: root of the tree.
+ *
+ * Remove @node from @root, and rebalance.
+ */
+void interval_tree_remove(IntervalTreeNode *node, IntervalTreeRoot *root);
+
+/**
+ * interval_tree_iter_first:
+ * @root: root of the tree,
+ * @start, @last: the inclusive interval [start, last].
+ *
+ * Locate the "first" of a set of nodes within the tree at @root
+ * that overlap the interval, where "first" is sorted by start.
+ * Returns NULL if no overlap found.
+ */
+IntervalTreeNode *interval_tree_iter_first(IntervalTreeRoot *root,
+   uint64_t start, uint64_t last);
+
+/**
+ * interval_tree_iter_next:
+ * @node: previous search result
+ * @start, @last: the inclusive interval [start, last].
+ *
+ * Locate the "next" of a set of nodes within the tree that overlap the
+ * interval; @node is the result of a previous call to
+ * interval_tree_iter_{first,next}.  Returns NULL if @node was the last
+ * node in the set.
+ */
+IntervalTreeNode *interval_tree_iter_next(IntervalTreeNode *node,
+  uint64_t start, uint64_t last);
+
+#endif /* QEMU_INTERVAL_TREE_H */
diff --git a/tests/unit/test-interval-tree.c b/tests/unit/test-interval-tree.c
new file mode 100644
index 00..119817a019
--- /dev/null
+++ b/tests/unit/test-interval-tree.c
@@ -0,0 +1,209 @@
+/*
+ * Test interval trees
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/interval-tree.h"
+
+static IntervalTreeNode nodes[20];
+static IntervalTreeRoot root;
+
+static void rand_interval(IntervalTreeNode *n, uint64_t start, uint64_t last)
+{
+gint32 s_ofs, l_ofs, l_max;
+
+if (last - start > INT32_MAX) {
+l_max = INT32_MAX;
+} else {
+l_max = last - start;
+}
+s_ofs = g_test_rand_int_range(0, l_max);
+l_ofs = g_test_rand_int_range(s_ofs, l_max);
+
+n->start = start + s_ofs;
+n->last = start + l_ofs;
+}
+
+static void test_empty(void)
+{
+g_assert(root.rb_root.rb_node == NULL);
+g_assert(root.rb_leftmost == NULL);
+g_assert(interval_tree_iter_first(&root, 0, UINT64_MAX) == NULL);
+}
+
+static void test_find_one_point(void)
+{
+/* Create a tree of a single node, which is the point [1,1]. */
+nodes[0].start = 1;
+nodes[0].last = 1;
+
+interval_tree_insert(&nodes[0], &root);
+
+g_assert(interval_tree_iter_first(&root, 0, 9) == &nodes[0]);
+g_assert(interval_tree_iter_next(&nodes[0], 0, 9) == NULL);
+g_assert(interval_tree_iter_first(&root, 0, 0) == NULL);
+g_assert(interval_tree_iter_next(&nodes[0

Re: [PATCH v2 1/6] accel/tcg: Introduce cpu_unwind_state_data

2022-10-27 Thread Richard Henderson

On 10/27/22 20:40, Claudio Fontana wrote:

On 10/27/22 12:02, Richard Henderson wrote:

Add a way to examine the unwind data without actually
restoring the data back into env.

Signed-off-by: Richard Henderson 
---
  accel/tcg/internal.h  |  4 +--
  include/exec/exec-all.h   | 21 ---
  accel/tcg/translate-all.c | 74 ++-
  3 files changed, 68 insertions(+), 31 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 1227bb69bd..9c06b320b7 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -106,8 +106,8 @@ void tb_reset_jump(TranslationBlock *tb, int n);
  TranslationBlock *tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
 tb_page_addr_t phys_page2);
  bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc);
-int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
-  uintptr_t searched_pc, bool reset_icount);
+void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
+   uintptr_t host_pc, bool reset_icount);
  
  /* Return the current PC from CPU, which may be cached in TB. */

  static inline target_ulong log_pc(CPUState *cpu, const TranslationBlock *tb)
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index e948992a80..7d851f5907 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -39,20 +39,33 @@ typedef ram_addr_t tb_page_addr_t;
  #define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
  #endif
  
+/**

+ * cpu_unwind_state_data:
+ * @cpu: the cpu context
+ * @host_pc: the host pc within the translation
+ * @data: output data
+ *
+ * Attempt to load the unwind state for a host pc occurring in
+ * translated code.  If @host_pc is not in translated code, the
+ * function returns false; otherwise @data is loaded.
+ * This is the same unwind info as given to restore_state_to_opc.
+ */
+bool cpu_unwind_state_data(CPUState *cpu, uintptr_t host_pc, uint64_t *data);
+
  /**
   * cpu_restore_state:
- * @cpu: the vCPU state is to be restore to
- * @searched_pc: the host PC the fault occurred at
+ * @cpu: the cpu context
+ * @host_pc: the host pc within the translation
   * @will_exit: true if the TB executed will be interrupted after some
 cpu adjustments. Required for maintaining the correct
 icount values
   * @return: true if state was restored, false otherwise
   *
   * Attempt to restore the state for a fault occurring in translated
- * code. If the searched_pc is not in translated code no state is
+ * code. If @host_pc is not in translated code no state is
   * restored and the function returns false.
   */
-bool cpu_restore_state(CPUState *cpu, uintptr_t searched_pc, bool will_exit);
+bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc, bool will_exit);
  
  G_NORETURN void cpu_loop_exit_noexc(CPUState *cpu);

  G_NORETURN void cpu_loop_exit(CPUState *cpu);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index f185356a36..319becb698 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -247,52 +247,66 @@ static int encode_search(TranslationBlock *tb, uint8_t 
*block)
  return p - block;
  }
  
-/* The cpu state corresponding to 'searched_pc' is restored.

- * When reset_icount is true, current TB will be interrupted and
- * icount should be recalculated.
- */
-int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
-  uintptr_t searched_pc, bool reset_icount)



Maybe add a small comment about what the return value of this static function 
means?
It can be indirectly inferred from its point of use:

  +int insns_left = cpu_unwind_data_from_tb(tb, host_pc, data);

But I find having the information about the meaning of a function and return 
value useful to be available there.

IIUC for external functions the standard way is to document in the header 
files, but for the static functions I would think we can do it here.

With that Reviewed-by: Claudio Fontana 



I added

+/**
+ * cpu_unwind_data_from_tb: Load unwind data for TB
+ * @tb: translation block
+ * @host_pc: the host pc within translation
+ * @data: output array
+ *
+ * Within @tb, locate the guest insn whose translation contains @host_pc,
+ * then load the unwind data created by INDEX_opc_start_insn for that
+ * guest insn.  Return the number of guest insns which remain un-executed
+ * within @tb -- these must be credited back to the cpu's icount budget.
+ *
+ * If we could not determine which guest insn to which @host_pc belongs,
+ * return -1 and do not load unwind data.
+ * FIXME: Such a failure is likely to break the guest, as we were not
+ * expecting to unwind from such a location.  This may be some sort of
+ * backend code generation problem.  Consider asserting instead.
  */

Which I think captures some of your v1 comments as well.


r~



[PATCH v2 6/7] accel/tcg: Move PageDesc tree into tb-maint.c for system

2022-10-27 Thread Richard Henderson
Now that PageDesc is not used for user-only, and for system
it is only used for tb maintenance, move the implementation
into tb-maint.c, appropriately ifdefed.

We have not yet eliminated all references to PageDesc for
user-only, so retain a typedef to the structure without definition.

Signed-off-by: Richard Henderson 
---
 accel/tcg/internal.h  |  49 +++---
 accel/tcg/tb-maint.c  | 130 --
 accel/tcg/translate-all.c |  95 
 3 files changed, 134 insertions(+), 140 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index c7e157d1cd..c6c9e02cfd 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -23,51 +23,13 @@
 #define assert_memory_lock() tcg_debug_assert(have_mmap_lock())
 #endif
 
-typedef struct PageDesc {
+typedef struct PageDesc PageDesc;
 #ifndef CONFIG_USER_ONLY
+struct PageDesc {
 QemuSpin lock;
 /* list of TBs intersecting this ram page */
 uintptr_t first_tb;
-#endif
-} PageDesc;
-
-/*
- * In system mode we want L1_MAP to be based on ram offsets,
- * while in user mode we want it to be based on virtual addresses.
- *
- * TODO: For user mode, see the caveat re host vs guest virtual
- * address spaces near GUEST_ADDR_MAX.
- */
-#if !defined(CONFIG_USER_ONLY)
-#if HOST_LONG_BITS < TARGET_PHYS_ADDR_SPACE_BITS
-# define L1_MAP_ADDR_SPACE_BITS  HOST_LONG_BITS
-#else
-# define L1_MAP_ADDR_SPACE_BITS  TARGET_PHYS_ADDR_SPACE_BITS
-#endif
-#else
-# define L1_MAP_ADDR_SPACE_BITS  MIN(HOST_LONG_BITS, TARGET_ABI_BITS)
-#endif
-
-/* Size of the L2 (and L3, etc) page tables.  */
-#define V_L2_BITS 10
-#define V_L2_SIZE (1 << V_L2_BITS)
-
-/*
- * L1 Mapping properties
- */
-extern int v_l1_size;
-extern int v_l1_shift;
-extern int v_l2_levels;
-
-/*
- * The bottom level has pointers to PageDesc, and is indexed by
- * anything from 4 to (V_L2_BITS + 3) bits, depending on target page size.
- */
-#define V_L1_MIN_BITS 4
-#define V_L1_MAX_BITS (V_L2_BITS + 3)
-#define V_L1_MAX_SIZE (1 << V_L1_MAX_BITS)
-
-extern void *l1_map[V_L1_MAX_SIZE];
+};
 
 PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc);
 
@@ -76,6 +38,11 @@ static inline PageDesc *page_find(tb_page_addr_t index)
 return page_find_alloc(index, false);
 }
 
+void page_table_config_init(void);
+#else
+static inline void page_table_config_init(void) { }
+#endif
+
 /* list iterators for lists of tagged pointers in TranslationBlock */
 #define TB_FOR_EACH_TAGGED(head, tb, n, field)  \
 for (n = (head) & 1, tb = (TranslationBlock *)((head) & ~1);\
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 694440cb4a..31d0a74aa9 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -127,6 +127,121 @@ static PageForEachNext foreach_tb_next(PageForEachNext tb,
 }
 
 #else
+/*
+ * In system mode we want L1_MAP to be based on ram offsets.
+ */
+#if HOST_LONG_BITS < TARGET_PHYS_ADDR_SPACE_BITS
+# define L1_MAP_ADDR_SPACE_BITS  HOST_LONG_BITS
+#else
+# define L1_MAP_ADDR_SPACE_BITS  TARGET_PHYS_ADDR_SPACE_BITS
+#endif
+
+/* Size of the L2 (and L3, etc) page tables.  */
+#define V_L2_BITS 10
+#define V_L2_SIZE (1 << V_L2_BITS)
+
+/*
+ * L1 Mapping properties
+ */
+static int v_l1_size;
+static int v_l1_shift;
+static int v_l2_levels;
+
+/*
+ * The bottom level has pointers to PageDesc, and is indexed by
+ * anything from 4 to (V_L2_BITS + 3) bits, depending on target page size.
+ */
+#define V_L1_MIN_BITS 4
+#define V_L1_MAX_BITS (V_L2_BITS + 3)
+#define V_L1_MAX_SIZE (1 << V_L1_MAX_BITS)
+
+static void *l1_map[V_L1_MAX_SIZE];
+
+void page_table_config_init(void)
+{
+uint32_t v_l1_bits;
+
+assert(TARGET_PAGE_BITS);
+/* The bits remaining after N lower levels of page tables.  */
+v_l1_bits = (L1_MAP_ADDR_SPACE_BITS - TARGET_PAGE_BITS) % V_L2_BITS;
+if (v_l1_bits < V_L1_MIN_BITS) {
+v_l1_bits += V_L2_BITS;
+}
+
+v_l1_size = 1 << v_l1_bits;
+v_l1_shift = L1_MAP_ADDR_SPACE_BITS - TARGET_PAGE_BITS - v_l1_bits;
+v_l2_levels = v_l1_shift / V_L2_BITS - 1;
+
+assert(v_l1_bits <= V_L1_MAX_BITS);
+assert(v_l1_shift % V_L2_BITS == 0);
+assert(v_l2_levels >= 0);
+}
+
+PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc)
+{
+PageDesc *pd;
+void **lp;
+int i;
+
+/* Level 1.  Always allocated.  */
+lp = l1_map + ((index >> v_l1_shift) & (v_l1_size - 1));
+
+/* Level 2..N-1.  */
+for (i = v_l2_levels; i > 0; i--) {
+void **p = qatomic_rcu_read(lp);
+
+if (p == NULL) {
+void *existing;
+
+if (!alloc) {
+return NULL;
+}
+p = g_new0(void *, V_L2_SIZE);
+existing = qatomic_cmpxchg(lp, NULL, p);
+if (unlikely(existing)) {
+g_free(p);
+p = existing;
+}
+}
+
+lp = p + ((index >> (i * V_L2_BITS)) & (V_L2_SIZE - 1));
+}
+
+pd = qatomic_rcu_read(lp);
+   
